## Problem Background

Many introductory problems in this course can seem impractical because they have little connection to everyday life. Here, we try to show that coding with real-world data can be fun! In many data-focused jobs and applications, CSV files are used to store large tables of data. In these files, each row is generally one record and each column is one attribute of that record. Here, we are looking at data from the National Hockey League (NHL) that contains the head-to-head (H2H) records of teams since the beginning of the league (with modified names). But we are not interested in just ANY team; we are interested in the best hockey team of all time: the Toronto Maple Leafs. Your goal is to read the data file containing the Leafs' H2H record against other teams and build an application that lets you search for a team and display its head-to-head record, along with the other match-up information contained in the CSV file.
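
To make the row-and-column idea concrete, here is a minimal sketch that reads CSV text with `csv.DictReader`, which uses the first row as column names. The two data rows are copied from TML_WinLoss.csv; `io.StringIO` is only a stand-in for the real file so that the example runs on its own, and the variable names `sample` and `rows` are illustrative.

```python
import csv
import io

# Sample rows copied from TML_WinLoss.csv; io.StringIO stands in for
# the real file so this sketch runs without it.
sample = io.StringIO(
    "Rk,Franchise,GP,W,L,T,OL,PTS,PTS%,GF,GA,GF/G,GA/G\n"
    "1,Anaheim Ducks,48,30,12,5,1,66,.688,159,120,3.31,2.50\n"
    "2,Boston Bruins,680,281,291,98,10,670,.493,2026,2012,2.98,2.96\n"
)

# csv.DictReader treats the first row as the column names,
# so each later row becomes a dict keyed by those names.
rows = list(csv.DictReader(sample))

for row in rows:
    print(row["Franchise"], "played", row["GP"], "games")
```

Every value comes back as a string, so numeric columns such as GP would need `int()` before doing arithmetic on them.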

Let's read in the file:

In [1]:
import csv

nhl_data = []
with open('TML_WinLoss.csv', 'r') as csvfile:
    nhl_stats_reader = csv.reader(csvfile)

    for row in nhl_stats_reader:
        nhl_data.append(row)

for row in nhl_data:
    print(row)

['Rk', 'Franchise', 'GP', 'W', 'L', 'T', 'OL', 'PTS', 'PTS%', 'GF', 'GA', 'GF/G', 'GA/G']
['1', 'Anaheim Ducks', '48', '30', '12', '5', '1', '66', '.688', '159', '120', '3.31', '2.50']
['2', 'Boston Bruins', '680', '281', '291', '98', '10', '670', '.493', '2026', '2012', '2.98', '2.96']
['3', 'Buffalo Sabres', '222', '84', '109', '18', '11', '197', '.444', '628', '792', '2.83', '3.57']
['4', 'Carolina Hurricanes', '127', '49', '60', '11', '7', '116', '.457', '397', '446', '3.13', '3.51']
['5', 'Columbus Blue Jackets', '33', '16', '11', '1', '5', '38', '.576', '104', '99', '3.15', '3.00']
['6', 'Calgary Flames', '149', '68', '63', '12', '6', '154', '.517', '494', '525', '3.32', '3.52']
['7', 'Chicago Blackhawks', '655', '291', '265', '96', '3', '681', '.520', '1971', '1879', '3.01', '2.87']
['8', 'Cleveland Barons', '62', '38', '15', '9', '0', '85', '.685', '238', '171', '3.84', '2.76']
['9', 'Colorado Avalanche', '86', '33', '40', '9', '4', '79', '.459', '285', '315', '3.31', '3.66']
[

With all this data, we want to be able to search for a team in this database of teams. Which data structure have we learned that might be best for this?

In [2]:
# Get the teams from the user and store them in a set
team_set = set()

team = input("Enter a team name (type exit to end): ")
while team != "exit":
    team_set.add(team)
    team = input("Enter a team name (type exit to end): ")

print(team_set)

{'Anaheim Ducks'}
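
A set suits this job because it keeps only one copy of each value and supports membership tests with `in`. The short sketch below illustrates both properties; the list `queries` is made-up input rather than anything typed in the notebook.

```python
# A set keeps one copy of each value, so repeated queries collapse.
queries = ["Anaheim Ducks", "Boston Bruins", "Anaheim Ducks"]
unique_queries = set(queries)

print(len(queries))         # number of entries typed
print(len(unique_queries))  # number of distinct teams

# Membership tests use the `in` operator.
print("Boston Bruins" in unique_queries)
print("boston bruins" in unique_queries)  # matching is case-sensitive
```

Because matching is case-sensitive, a team name has to be typed exactly as it appears in the CSV file.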


One awkward thing about this code is that the user prompt is written twice, so if we want to change it, we have to remember to change it in both places. We can get around this by storing the prompt in a variable.

In [94]:
# Get the teams from the user and store them in a set
team_set = set()

prompt = "Enter a team name (type exit to end):  "

team = input(prompt)
while team != "exit":
    team_set.add(team)
    team = input(prompt)

print(team_set)

{'Anaheim Ducks', 'Montreal Canadiens'}
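
Another way to avoid calling `input` in two places, shown here only as an optional sketch, is the assignment expression `:=` (available from Python 3.8). The function name `collect_teams` and its `read` parameter are hypothetical; `read` exists so that the loop can be tried with canned responses instead of a live user.

```python
def collect_teams(read=input, prompt="Enter a team name (type exit to end): "):
    """Collect team names until the user types exit; return them as a set."""
    teams = set()
    # := stores the response in team and compares it in the same expression
    while (team := read(prompt)) != "exit":
        teams.add(team)
    return teams


# Simulated user: each call returns the next canned response.
responses = iter(["Anaheim Ducks", "Montreal Canadiens", "exit"])
demo_teams = collect_teams(read=lambda _prompt: next(responses))
print(sorted(demo_teams))
```

Calling `collect_teams()` with no arguments would prompt at the keyboard, just like the cell above.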


The question initially asked us to search for a team and print out its H2H information against the Leafs, along with some other info. Let's print the games played, wins, losses, ties, and overtime losses. If the two teams have never played each other, print "No games played".

A pseudo-code plan might look like:

In [4]:
# for each team_name in team_set:
#     search for team_name in the NHL data
#     if found - print out games played, wins, losses, ties, overtime losses
#     else - print out "No games played"

Since we are accessing so many fields from the CSV, it might be wise to write a function that prints everything nicely for us.

In [8]:
def print_headtohead(data, index, team_index):
    # display showing team name, games played, wins, losses, ties, overtime losses
    row = data[index]
    print("Team Name:", row[team_index],
          "Games played:", row[team_index+1],
          "Wins:", row[team_index+2],
          "Losses:", row[team_index+3],
          "Ties: ", row[team_index+4],
          "Overtime Losses: ", row[team_index+5])
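
The offsets `team_index+1` to `team_index+5` are easy to mix up. As a hedged alternative, the sketch below names each column position after the header row and builds the line with an f-string. The constants and the function name `format_headtohead` are illustrative, not part of the original notebook.

```python
# Column positions, taken from the header row:
# Rk, Franchise, GP, W, L, T, OL, ...
FRANCHISE, GP, W, L, T, OL = 1, 2, 3, 4, 5, 6


def format_headtohead(row):
    """Return one line of head-to-head stats for a single CSV row."""
    return (
        f"Team Name: {row[FRANCHISE]} Games played: {row[GP]} "
        f"Wins: {row[W]} Losses: {row[L]} Ties: {row[T]} "
        f"Overtime Losses: {row[OL]}"
    )


ducks = ['1', 'Anaheim Ducks', '48', '30', '12', '5', '1',
         '66', '.688', '159', '120', '3.31', '2.50']
line = format_headtohead(ducks)
print(line)
```

Returning the string instead of printing it makes the function easier to check, since the caller decides what to do with the result.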


So, how can we look up each team in the set? A first attempt might be:

In [11]:
nhl_team_index = 1
for team in team_set:
    print("Head to Head for: ", team)
    for team in nhl_data:
        if team[nhl_team_index] == team:
            print_team(team)
        else:
            print("No Head to Head for: ", team)

Head to Head for:  Anaheim Ducks
No Head to Head for:  ['Rk', 'Franchise', 'GP', 'W', 'L', 'T', 'OL', 'PTS', 'PTS%', 'GF', 'GA', 'GF/G', 'GA/G']
No Head to Head for:  ['1', 'Anaheim Ducks', '48', '30', '12', '5', '1', '66', '.688', '159', '120', '3.31', '2.50']
No Head to Head for:  ['2', 'Boston Bruins', '680', '281', '291', '98', '10', '670', '.493', '2026', '2012', '2.98', '2.96']
No Head to Head for:  ['3', 'Buffalo Sabres', '222', '84', '109', '18', '11', '197', '.444', '628', '792', '2.83', '3.57']
No Head to Head for:  ['4', 'Carolina Hurricanes', '127', '49', '60', '11', '7', '116', '.457', '397', '446', '3.13', '3.51']
No Head to Head for:  ['5', 'Columbus Blue Jackets', '33', '16', '11', '1', '5', '38', '.576', '104', '99', '3.15', '3.00']
No Head to Head for:  ['6', 'Calgary Flames', '149', '68', '63', '12', '6', '154', '.517', '494', '525', '3.32', '3.52']
No Head to Head for:  ['7', 'Chicago Blackhawks', '655', '291', '265', '96', '3', '681', '.520', '1971', '1879', '3.01'

This attempt does not work. The inner loop reuses the variable name team, so the name the user asked for is overwritten by each row of nhl_data, and the comparison then checks a string against a whole list, which is never equal. The call to print_team also refers to a function we never defined. Let's use separate names and a flag that records whether the team was found:

In [12]:
nhl_team_index = 1
for team in team_set:
    print("Head to Head for: ", team)
    found_team = False        # this is the "flag"
    for i in range(len(nhl_data)):
        if nhl_data[i][nhl_team_index] == team:
            print_headtohead(nhl_data, i, nhl_team_index)
            found_team = True

    # if the flag was never set, the team is not in the data
    if not found_team:
        print("No games played")


Head to Head for:  Anaheim Ducks
Team Name: Anaheim Ducks Games played: 48 Wins: 30 Losses: 12 Ties:  5 Overtime Losses:  1
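
The flag in the loop above records whether a match was seen. A common variation, sketched here under the assumption that we only need the first match, moves the search into a function that returns the row index or `None`. The name `find_team_row` and the shortened `sample_data` are hypothetical.

```python
def find_team_row(data, team_name, team_index=1):
    """Return the index of the first row whose team column equals
    team_name, or None if no row matches."""
    for i in range(len(data)):
        if data[i][team_index] == team_name:
            return i
    return None


# Shortened rows (first seven columns only) for illustration.
sample_data = [
    ['Rk', 'Franchise', 'GP', 'W', 'L', 'T', 'OL'],
    ['1', 'Anaheim Ducks', '48', '30', '12', '5', '1'],
    ['2', 'Boston Bruins', '680', '281', '291', '98', '10'],
]

for name in ["Boston Bruins", "Not A Team"]:
    position = find_team_row(sample_data, name)
    if position is None:
        print("No games played:", name)
    else:
        print(name, "found in row", position)
```

The returned `None` plays the role of the flag: the caller checks it once instead of tracking a separate boolean.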


Since problems often have more than one solution, let's think of another data structure we could use:

In [115]:
# Data Structure Design #2. Store the H2H data in a dict
# key is team name : the value will be a list of lists of stats

def nhl_to_dict(nhl_list):
    """ (list of lists) -> (dict of list of lists)
    creates a dict from nhl_data with key being the team name
    and the value being the list of lists of data for each team
    """

    team_dict = {}
    for team in nhl_list:
        if team[1] in team_dict:
            team_dict[team[1]].append(team)
        else:
            team_dict[team[1]] = [team]

    return team_dict

nhl_dict = nhl_to_dict(nhl_data[1:])
print(nhl_dict)

{'Anaheim Ducks': [['1', 'Anaheim Ducks', '48', '30', '12', '5', '1', '66', '.688', '159', '120', '3.31', '2.50']], 'Boston Bruins': [['2', 'Boston Bruins', '680', '281', '291', '98', '10', '670', '.493', '2026', '2012', '2.98', '2.96']], 'Buffalo Sabres': [['3', 'Buffalo Sabres', '222', '84', '109', '18', '11', '197', '.444', '628', '792', '2.83', '3.57']], 'Carolina Hurricanes': [['4', 'Carolina Hurricanes', '127', '49', '60', '11', '7', '116', '.457', '397', '446', '3.13', '3.51']], 'Columbus Blue Jackets': [['5', 'Columbus Blue Jackets', '33', '16', '11', '1', '5', '38', '.576', '104', '99', '3.15', '3.00']], 'Calgary Flames': [['6', 'Calgary Flames', '149', '68', '63', '12', '6', '154', '.517', '494', '525', '3.32', '3.52']], 'Chicago Blackhawks': [['7', 'Chicago Blackhawks', '655', '291', '265', '96', '3', '681', '.520', '1971', '1879', '3.01', '2.87']], 'Cleveland Barons': [['8', 'Cleveland Barons', '62', '38', '15', '9', '0', '85', '.685', '238', '171', '3.84', '2.76']], 'Color

So let's pause and think about this solution.

What is good about it?

What is bad about it?

Brainstorm: what data structures might fix some of the problems?
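
One possible direction for the brainstorm, offered as a sketch rather than the expected answer: if each franchise appears on exactly one row (true of the rows printed above, but an assumption about the rest of the file), the dictionary could map a team name straight to its row instead of to a list of rows. The name `nhl_to_flat_dict` is hypothetical.

```python
def nhl_to_flat_dict(nhl_rows, team_index=1):
    """Map each team name to its single row of stats.
    Assumes team names are unique; a repeated name would overwrite
    the earlier row."""
    team_dict = {}
    for row in nhl_rows:
        team_dict[row[team_index]] = row
    return team_dict


# Shortened rows (first seven columns only) for illustration.
sample_rows = [
    ['1', 'Anaheim Ducks', '48', '30', '12', '5', '1'],
    ['2', 'Boston Bruins', '680', '281', '291', '98', '10'],
]

flat = nhl_to_flat_dict(sample_rows)
print(flat["Boston Bruins"][2])   # games played, still a string
print(flat.get("Not A Team"))     # .get returns None for a missing key
```

The trade-off is that this version silently loses data if a name ever repeats, which the list-of-lists design avoids.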

In [116]:
for team in team_set:
    if team in nhl_dict:
        print("Head to Head for", team)
        # the dict gives us this team's rows directly, so we no longer search nhl_data
        team_rows = nhl_dict[team]
        for i in range(len(team_rows)):
            print_headtohead(team_rows, i, nhl_team_index)

Head to Head for Anaheim Ducks
Team Name: Anaheim Ducks Games played: 48 Wins: 30 Losses: 12 Ties:  5 Overtime Losses:  1
Head to Head for Montreal Canadiens
Team Name: Montreal Canadiens Games played: 761 Wins: 309 Losses: 346 Ties:  88 Overtime Losses:  18


## Programming Plan: Step 4: Translate NHL data to our data structure.

Happily, this is almost done already since we did it in figuring out the previous step. Is there anything we want to change about this code? What don't you like?

In [117]:
def nhl_to_dict(nhl_list):
    """ (list of lists) -> (dict of list of lists)
    creates a dict from nhl_data with key being the team name
    and the value being the list of lists of data for each team
    """

    team_dict = {}
    for team in nhl_list:
        if team[1] in team_dict:
            team_dict[team[1]].append(team)
        else:
            team_dict[team[1]] = [team]

    return team_dict

nhl_dict = nhl_to_dict(nhl_data[1:])
print(nhl_dict)

{'Anaheim Ducks': [['1', 'Anaheim Ducks', '48', '30', '12', '5', '1', '66', '.688', '159', '120', '3.31', '2.50']], 'Boston Bruins': [['2', 'Boston Bruins', '680', '281', '291', '98', '10', '670', '.493', '2026', '2012', '2.98', '2.96']], 'Buffalo Sabres': [['3', 'Buffalo Sabres', '222', '84', '109', '18', '11', '197', '.444', '628', '792', '2.83', '3.57']], 'Carolina Hurricanes': [['4', 'Carolina Hurricanes', '127', '49', '60', '11', '7', '116', '.457', '397', '446', '3.13', '3.51']], 'Columbus Blue Jackets': [['5', 'Columbus Blue Jackets', '33', '16', '11', '1', '5', '38', '.576', '104', '99', '3.15', '3.00']], 'Calgary Flames': [['6', 'Calgary Flames', '149', '68', '63', '12', '6', '154', '.517', '494', '525', '3.32', '3.52']], 'Chicago Blackhawks': [['7', 'Chicago Blackhawks', '655', '291', '265', '96', '3', '681', '.520', '1971', '1879', '3.01', '2.87']], 'Cleveland Barons': [['8', 'Cleveland Barons', '62', '38', '15', '9', '0', '85', '.685', '238', '171', '3.84', '2.76']], 'Color
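
One change worth considering, as a sketch and not necessarily the intended answer, is to let `collections.defaultdict` from the standard library create the empty list for a new key, which removes the if/else. The function name `nhl_to_dict_default` and the shortened `sample_rows` are illustrative.

```python
from collections import defaultdict


def nhl_to_dict_default(nhl_list, team_index=1):
    """Group rows by team name; the value for each name is a list of rows."""
    team_dict = defaultdict(list)
    for row in nhl_list:
        # a missing key starts as an empty list automatically
        team_dict[row[team_index]].append(row)
    # convert back to a plain dict so later lookups of unknown
    # names do not quietly add empty entries
    return dict(team_dict)


# Shortened rows (first seven columns only) for illustration.
sample_rows = [
    ['1', 'Anaheim Ducks', '48', '30', '12', '5', '1'],
    ['2', 'Boston Bruins', '680', '281', '291', '98', '10'],
]

grouped = nhl_to_dict_default(sample_rows)
print(grouped["Anaheim Ducks"])
```

The result has the same shape as the dictionary built by `nhl_to_dict`, so the rest of the code would not need to change.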

## Programming Plan: Step 5: Implement search for teams in the file

How do we search for team information in the file?

In [120]:
for team in team_set:
    if team in nhl_dict:
        print("Head to Head for", team)
        # the dict gives us this team's rows directly, so we no longer search nhl_data
        team_rows = nhl_dict[team]
        for i in range(len(team_rows)):
            print_headtohead(team_rows, i, nhl_team_index)

Head to Head for Anaheim Ducks
Team Name: Anaheim Ducks Games played: 48 Wins: 30 Losses: 12 Ties:  5 Overtime Losses:  1
Head to Head for Montreal Canadiens
Team Name: Montreal Canadiens Games played: 761 Wins: 309 Losses: 346 Ties:  88 Overtime Losses:  18



# Select a Solution

As is common, we had to write code to figure out how to solve the problem. That is fine and normal. What we need to be careful of is assuming that our first solution is the best. Ask some questions:
* Is this the most efficient way to solve this problem?
* What are the weaknesses of the code?
* Can the code be improved?
* Are there other data structures that should be considered? For example, what if each team was represented not by a list of lists inside a dictionary but by a single list or a tuple?
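
As one illustration of the last question, the sketch below stores each team as a `collections.namedtuple`, a tuple whose fields can be read by name. The type name `TeamRecord`, its field names, and the helper `row_to_record` are all hypothetical choices; the column positions follow the header row of the CSV file.

```python
from collections import namedtuple

# Field names are illustrative; positions follow the CSV header
# (Franchise, GP, W, L, T, OL are columns 1 to 6).
TeamRecord = namedtuple(
    "TeamRecord",
    ["franchise", "games", "wins", "losses", "ties", "ot_losses"],
)


def row_to_record(row):
    """Convert one CSV row (a list of strings) to a TeamRecord
    with integer counts."""
    return TeamRecord(row[1], int(row[2]), int(row[3]),
                      int(row[4]), int(row[5]), int(row[6]))


ducks = row_to_record(['1', 'Anaheim Ducks', '48', '30', '12', '5', '1',
                       '66', '.688', '159', '120', '3.31', '2.50'])
print(ducks.franchise, ducks.games)
# the four outcome counts should add up to the games played
print(ducks.wins + ducks.losses + ducks.ties + ducks.ot_losses)
```

Named fields remove the need to remember index offsets, at the cost of an extra conversion step when the file is read.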

Let's put it all together.

In [None]:
import csv

def print_headtohead(data, index, team_index):
    # display showing team name, games played, wins, losses, ties, overtime losses
    print("Team Name:",data[index][team_index], "Games played:",
          data[index][team_index+1], "Wins:", data[index][team_index+2], "Losses:", data[index][team_index+3], "Ties: ", data[index][team_index+4], "Overtime Losses: ",data[index][team_index+5])


def nhl_to_dict(nhl_list):
    """ (list of lists) -> (dict of list of lists)
    creates a dict from nhl_list with key being the team name
    and the value being the list of lists of data for each team
    """

    team_dict = {}
    for team in nhl_list:
        if team[1] in team_dict:
            team_dict[team[1]].append(team)
        else:
            team_dict[team[1]] = [team]

    return team_dict


# read in the database
nhl_data = []
with open('TML_WinLoss.csv', 'r') as csvfile:
    nhl_stats_reader = csv.reader(csvfile)

    for row in nhl_stats_reader:
        nhl_data.append(row)

# convert the rows (skipping the header row) to a dictionary;
# this has to happen after nhl_data has been read
nhl_dict = nhl_to_dict(nhl_data[1:])
print(nhl_dict)


# Get the teams from the user and store them in a set

team_set = set()
prompt = "Enter a team name (type exit to end):  "

team = input(prompt)
while team != "exit":
    team_set.add(team)
    team = input(prompt)

# print(team_set)

nhl_team_index = 1    # column of each row that holds the team name
for team in team_set:
    if team in nhl_dict:
        print("Head to Head for", team)
        team_rows = nhl_dict[team]
        for i in range(len(team_rows)):
            print_headtohead(team_rows, i, nhl_team_index)

Is there anything we might want to fix up? Any functions that we should write?

There are a number of sections of code that could usefully go into a function in order to make the code more readable.

In [126]:
import csv

def print_headtohead(data, index, team_index):
    # display showing team name, games played, wins, losses, ties, overtime losses

    print("Team Name:",data[index][team_index], "Games played:",
          data[index][team_index+1], "Wins:", data[index][team_index+2], "Losses:", data[index][team_index+3], "Ties: ", data[index][team_index+4], "Overtime Losses: ",data[index][team_index+5])


def nhl_to_dict(nhl_list):
    """ (list of lists) -> (dict of list of lists)
    creates a dict from nhl_list with key being the team name
    and the value being the list of lists of data for each team
    """

    team_dict = {}
    for team in nhl_list:
        if team[1] in team_dict:
            team_dict[team[1]].append(team)
        else:
            team_dict[team[1]] = [team]

    return team_dict

def get_nhl_data(filename):
    '''
    (str) -> list of lists of str
    Opens <filename> as a CSV file, reads in each row and returns the list of rows
    '''

    # read in the database from the file named by the parameter
    nhl_data = []
    with open(filename, 'r') as file:
        nhl_stats_reader = csv.reader(file)

        for row in nhl_stats_reader:
            nhl_data.append(row)
    return nhl_data


def get_team_queries():
    '''
    None -> set of str
    Prompts user to enter team names to query database
    '''
    team_set = set()

    # store the prompt once so it only has to be changed in one place
    prompt = "Enter a team name in the format of 'City Mascot' (e.g., 'Montreal Canadiens') (type exit to end): "

    team = input(prompt)
    while team != "exit":
        team_set.add(team)
        team = input(prompt)

    return team_set

def process_queries(teams, nhl):
    '''
    (set of str, dict of list of lists) -> None
    Looks up each entry of teams in the nhl dictionary and prints the team info
    or "No games played" if the team is not found
    '''
    team_index = 1    # column of each row that holds the team name
    for team in teams:
        if team in nhl:
            print("Head to Head for", team)
            # use the parameters, not the global variables
            for i in range(len(nhl[team])):
                print_headtohead(nhl[team], i, team_index)
        else:
            print("No games played against", team)

# *** Main code ***

# Read in NHL data and convert to dictionary
nhl_data = get_nhl_data("TML_WinLoss.csv")
nhl_dict = nhl_to_dict(nhl_data[1:])

# Get the teams from the user and store them in a set
team_set = get_team_queries()

# Run the queries on the NHL CSV
process_queries(team_set, nhl_dict)

Head to Head for Anaheim Ducks
Team Name: Anaheim Ducks Games played: 48 Wins: 30 Losses: 12 Ties:  5 Overtime Losses:  1
Head to Head for Montreal Canadiens
Team Name: Montreal Canadiens Games played: 761 Wins: 309 Losses: 346 Ties:  88 Overtime Losses:  18
