# Module 2 Summative Lab

## Introduction

For today's section, we're going to work on a single big lab to apply everything we've learned in Module 2!

## About This Lab

A quick note before getting started--this lab isn't like the other labs you've seen so far. It is meant to take ~8 hours to complete, so it's much longer and more challenging than the average lab. If you feel like this lab is challenging or that you might be struggling a bit, don't fret--that's by design! With everything we've learned about Web Scraping, APIs, and Databases, the best way to test our knowledge is to build something substantial!

## The Project

In this lab, we're going to make use of everything we've learned about APIs, databases, and Object-Oriented Programming to **_Extract, Transform, and Load_** (or **_ETL_**, for short) some data from a SQL database into a MongoDB Database. 

You'll find a database containing information about soccer teams and the matches they've played in the file `database.sqlite`. For this project, our goal is to get the data we think is important from this SQL database, do some calculations and data transformation, and then store everything in a MongoDB database. 

Let's get into the specifics of this project.

### The Goal

Start by examining the data dictionary for the SQL database we'll be working with, which comes from this [Kaggle page](https://www.kaggle.com/laudanum/footballdelphi). Familiarize yourself with the tables it contains and what each column means. We'll be using this database to get data on each soccer team, calculate some summary statistics, and then store the results for each team in a MongoDB database.

Upon completion of this lab, each unique team in this dataset should have a record in the MongoDB instance containing the following information:

* The name of the team
* The total number of goals scored by the team during the 2011 season
* The total number of wins the team earned during the 2011 season
* A histogram visualization of the team's wins and losses for the 2011 season (store the visualization directly by assigning it to a variable)
* The team's win percentage on days where it was raining during games in the 2011 season. 
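
As a rough sketch of what one finished record might look like, here is a hypothetical document for a single team. The field names are assumptions chosen for illustration (the lab does not prescribe them), and the values are placeholders rather than real results.

```python
# Hypothetical shape of one team's MongoDB document.
# Field names and values are illustrative placeholders only.
example_team_record = {
    "name": "Example FC",         # the name of the team
    "goals_2011": 0,              # total goals scored in the 2011 season
    "wins_2011": 0,               # total wins in the 2011 season
    "win_loss_histogram": b"",    # the plot, e.g. stored as raw image bytes
    "rain_win_pct": 0.0,          # win percentage on rainy game days
}

# A quick completeness check before loading anything into the database.
required_fields = {"name", "goals_2011", "wins_2011",
                   "win_loss_histogram", "rain_win_pct"}
missing = required_fields - example_team_record.keys()
print("missing fields:", missing)
```

Checking for missing fields before loading makes it easier to spot a step of the pipeline that silently produced nothing.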

#### Getting the Weather Data

Note that for this last calculation, you'll need to figure out whether it was raining during the game. The database itself does not contain this information, but it does contain the date on which each game was played. To fill the gap, use the [DarkSky API](https://darksky.net/dev) to get the historical weather data for that day. Each game is played in a different location, and this information is not contained in our SQL database either. However, the teams in this database are largely German, so go ahead and just use the weather in Berlin, Germany as a proxy. If it was raining in Berlin on the day the game was played, count that as a rain game--**_you do not need to try and figure out the actual weather at each game's location, because we don't have that information!_**
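
The request for one historical day can be sketched as follows. This is a minimal sketch, not a definitive client: the helper names `build_time_machine_url` and `was_rainy` are hypothetical, the URL layout mirrors the one used in the solution code later in this notebook, and the response shape (`daily` → `data` → `precipIntensity`) is assumed from that same code. The payload below is made up, so no network call is needed.

```python
BASE_URL = "https://api.darksky.net/forecast"  # endpoint used later in this notebook

def build_time_machine_url(api_key, lat, lon, date):
    """Build a request URL for one historical day (noon on that date)."""
    return (f"{BASE_URL}/{api_key}/{lat},{lon},{date}T12:00:00"
            "?exclude=currently,minutely,hourly,flags")

def was_rainy(payload):
    """Return True if the day's summary reports any precipitation.

    Assumes the shape payload['daily']['data'][0]['precipIntensity'];
    a missing field is treated as "no rain".
    """
    daily = payload.get("daily", {}).get("data", [])
    if not daily:
        return False
    return daily[0].get("precipIntensity", 0) > 0

url = build_time_machine_url("YOUR_API_KEY", 52.52, 13.405, "2011-07-15")
fake_payload = {"daily": {"data": [{"precipIntensity": 0.02}]}}  # made-up response
print(url)
print(was_rainy(fake_payload))
```

Keeping the URL building and the response parsing in separate functions lets you test the parsing on a saved response without spending an API call.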

#### NOTE: The DarkSky API is limited to 1000 free API calls a day, so be sure to test your model on very small samples. Otherwise, you'll hit the rate limit!
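
One way to stay under the limit is to cap the number of new requests per run and save every answer to disk so it is never requested twice. The sketch below assumes a hypothetical helper `cached_lookup` and a stand-in `fake_fetch` function; in the real lab, `fetch` would wrap the actual API request.

```python
import json
import os
import tempfile

def cached_lookup(dates, fetch, cache_path, max_calls=5):
    """Look up each date, reusing saved results and capping new API calls.

    `fetch` is any function that takes a date string and returns a bool.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    calls = 0
    for date in dates:
        if date in cache:
            continue              # already known, no call needed
        if calls >= max_calls:
            break                 # stop before spending more of the daily quota
        cache[date] = fetch(date)
        calls += 1
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return cache, calls

# Demo with a stand-in fetch function (no network access needed).
demo_path = os.path.join(tempfile.mkdtemp(), "rain_cache.json")
fake_fetch = lambda date: date.endswith("7")   # made-up rule, not real weather
dates = ["2011-07-15", "2011-07-16", "2011-07-17"]
first, first_calls = cached_lookup(dates, fake_fetch, demo_path, max_calls=2)
second, second_calls = cached_lookup(dates, fake_fetch, demo_path, max_calls=2)
print(first_calls, second_calls, len(second))
```

The first run spends two calls and stops; the second run reuses those two answers and only needs one more call for the remaining date.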

## Project Architecture

Unlike previous labs, this lab is more open-ended, and will require you to make design decisions and plan out your strategy for building a system with this many working parts. However, **_using Object-Oriented Programming is a requirement for this project--you must create at least 2 separate, well structured classes in your solution!_** Although it may seem easier to "just start coding", this is a classic beginner's mistake. Instead, think about separating out the different functionalities you'll need to reach your goal, and then build classes to handle each. For instance, at minimum, you'll need to:

* Query the SQL database
* Calculate summary statistics
* Get the weather data from the DarkSky API
* Load the data into MongoDB

We **_strongly recommend_** you consider creating separate classes for handling at least some of these tasks. Be sure to plan the inputs, outputs, and methods for each class before you begin coding!
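
To make the idea of separated responsibilities concrete, here is a minimal sketch of three small classes wired together. The class names, method names, and data are all hypothetical, and in-memory lists stand in for the SQL database and MongoDB; it is meant as a planning aid, not as the required architecture.

```python
class Extractor:
    """Pulls raw rows from a source (here: an in-memory list)."""
    def __init__(self, rows):
        self.rows = rows

    def extract(self):
        return list(self.rows)

class Transformer:
    """Turns raw (team, goals, won) rows into one summary per team."""
    def transform(self, rows):
        summary = {}
        for team, goals, won in rows:
            stats = summary.setdefault(team, {"team": team, "goals": 0, "wins": 0})
            stats["goals"] += goals
            stats["wins"] += int(won)
        return list(summary.values())

class Loader:
    """Stores finished records (here: a list standing in for a database)."""
    def __init__(self):
        self.stored = []

    def load(self, records):
        self.stored.extend(records)
        return len(records)

rows = [("A", 2, True), ("B", 1, False), ("A", 0, False)]  # made-up data
loader = Loader()
loaded = loader.load(Transformer().transform(Extractor(rows).extract()))
print(loaded, loader.stored)
```

Because each class only knows about its own inputs and outputs, you can swap the list-based stand-ins for real database code one piece at a time.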

**_NOTE:_** We have provided some empty classes below. You are welcome to delete them and use a different architecture for this project if you so choose. You do not have to use each of them; they are just there to give you an idea of the sorts of classes you may want to consider using.

### Rapid Prototyping and Refactoring

It's totally okay to try to get a task working without using OOP. For instance, when experimenting with the DarkSky API for getting historical weather data, it makes sense to just write the code in the cells and rapidly iterate until you get it all working. However, once you get it working, you're not done--you should then **_Refactor_** your code into functions or classes to make your code more modular, reusable, understandable, and maintainable! 

In short--do what you need to do to get each separate piece of functionality working, and then refactor it into a class after you've figured it out!

### Some Final Advice

You haven't built anything this big or complex thus far, so you may not yet fully realize how much trial and error goes into it. If your code keeps breaking, resist the urge to get frustrated, and just keep working. Software development is an iterative process!  No one writes perfect code that works the first time for something this involved. You're going to run into _a lot_ of small errors in this project, right up until the point where it just works, and then you're done! However, you can reduce these errors by planning out your code, and thinking about how all of the pieces fit together before you begin coding. Once you have some basic understanding of how it all will work, then you'll know what you need to build, and then all that is left is to build it!

In short:

* Plan ahead--you'll thank yourself later!
* Errors and broken code aren't bad, they're normal. 
* Keep working, and stay confident--you can do this!

Good luck--we look forward to seeing your completed project!

In [1]:
import numpy as np
import pandas as pd
import sqlite3
import requests
import json
import pymongo
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from IPython.display import Image
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
%matplotlib inline

# SQL Querying

In [5]:
conn = sqlite3.connect('database.sqlite')
c = conn.cursor()
pd_con = sqlite3.connect('database.sqlite')

## Creating the table of Team Statistics

In [6]:
home_team_df = pd.read_sql_query("SELECT DISTINCT HomeTeam, SUM(FTHG) as HTeamGoals, SUM(FTAG) as HOpponentGoals,\
                                  COUNT(CASE FTR WHEN 'H' THEN 1 ELSE NULL END) as HWins,\
                                  COUNT(CASE FTR WHEN 'A' THEN 1 ELSE NULL END) as HLosses,\
                                  COUNT(CASE FTR WHEN 'D' THEN 1 ELSE NULL END) as HDraws\
                                  FROM Matches\
                                  WHERE Season = 2011\
                                  GROUP BY HomeTeam\
                                  ORDER BY HomeTeam;", pd_con)
print(home_team_df.info())
home_team_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 6 columns):
HomeTeam          56 non-null object
HTeamGoals        56 non-null int64
HOpponentGoals    56 non-null int64
HWins             56 non-null int64
HLosses           56 non-null int64
HDraws            56 non-null int64
dtypes: int64(5), object(1)
memory usage: 2.8+ KB
None


Unnamed: 0,HomeTeam,HTeamGoals,HOpponentGoals,HWins,HLosses,HDraws
0,Aachen,15,24,4,7,6
1,Arsenal,39,17,12,3,4
2,Aston Villa,20,25,4,8,7
3,Augsburg,20,19,6,4,7
4,Bayern Munich,49,6,14,2,1


In [7]:
away_team_df = pd.read_sql_query("SELECT DISTINCT AwayTeam, SUM(FTAG) as ATeamGoals, SUM(FTHG) as AOpponentGoals,\
                                  COUNT(CASE FTR WHEN 'A' THEN 1 ELSE NULL END) as AWins,\
                                  COUNT(CASE FTR WHEN 'H' THEN 1 ELSE NULL END) as ALosses,\
                                  COUNT(CASE FTR WHEN 'D' THEN 1 ELSE NULL END) as ADraws\
                                  FROM Matches\
                                  WHERE Season = 2011\
                                  GROUP BY AwayTeam\
                                  ORDER BY AwayTeam;", pd_con)
print(away_team_df.info())
away_team_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 6 columns):
AwayTeam          56 non-null object
ATeamGoals        56 non-null int64
AOpponentGoals    56 non-null int64
AWins             56 non-null int64
ALosses           56 non-null int64
ADraws            56 non-null int64
dtypes: int64(5), object(1)
memory usage: 2.8+ KB
None


Unnamed: 0,AwayTeam,ATeamGoals,AOpponentGoals,AWins,ALosses,ADraws
0,Aachen,15,23,2,8,7
1,Arsenal,35,32,9,7,3
2,Aston Villa,17,28,3,6,10
3,Augsburg,16,30,2,8,7
4,Bayern Munich,28,16,9,5,3


In [8]:
teams_merging_df = home_team_df.merge(away_team_df, how='inner', left_on='HomeTeam', right_on='AwayTeam')

In [9]:
teams_merging_df.head()

Unnamed: 0,HomeTeam,HTeamGoals,HOpponentGoals,HWins,HLosses,HDraws,AwayTeam,ATeamGoals,AOpponentGoals,AWins,ALosses,ADraws
0,Aachen,15,24,4,7,6,Aachen,15,23,2,8,7
1,Arsenal,39,17,12,3,4,Arsenal,35,32,9,7,3
2,Aston Villa,20,25,4,8,7,Aston Villa,17,28,3,6,10
3,Augsburg,20,19,6,4,7,Augsburg,16,30,2,8,7
4,Bayern Munich,49,6,14,2,1,Bayern Munich,28,16,9,5,3


In [10]:
teams_df = pd.DataFrame()
teams_df['Team'] = teams_merging_df['HomeTeam'].copy()
teams_df['TeamGoals'] = teams_merging_df['HTeamGoals'] + teams_merging_df['ATeamGoals']
teams_df['OpponentGoals'] = teams_merging_df['HOpponentGoals'] + teams_merging_df['AOpponentGoals']
teams_df['TeamWins'] = teams_merging_df['HWins'] + teams_merging_df['AWins']
teams_df['TeamLosses'] = teams_merging_df['HLosses'] + teams_merging_df['ALosses']
teams_df['TeamDraws'] = teams_merging_df['HDraws'] + teams_merging_df['ADraws']

In [11]:
teams_df

Unnamed: 0,Team,TeamGoals,OpponentGoals,TeamWins,TeamLosses,TeamDraws
0,Aachen,30,47,6,15,13
1,Arsenal,74,49,21,10,7
2,Aston Villa,37,53,7,14,17
3,Augsburg,36,49,8,12,14
4,Bayern Munich,77,22,23,7,4
5,Blackburn,48,78,8,23,7
6,Bochum,41,55,10,17,7
7,Bolton,46,77,10,22,6
8,Braunschweig,37,35,10,9,15
9,Chelsea,65,46,18,10,10


## Getting Days for obtaining Rain Status

In [12]:
game_days_df = pd.read_sql_query("SELECT DISTINCT Date\
                                  FROM Matches\
                                  WHERE Season = 2011\
                                  ORDER BY Date;", pd_con)
print(game_days_df.info())
game_days_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165 entries, 0 to 164
Data columns (total 1 columns):
Date    165 non-null object
dtypes: object(1)
memory usage: 1.4+ KB
None


Unnamed: 0,Date
0,2011-07-15
1,2011-07-16
2,2011-07-17
3,2011-07-18
4,2011-07-22


In [13]:
game_day_list = list(game_days_df.Date)

In [14]:
game_day_list

['2011-07-15',
 '2011-07-16',
 '2011-07-17',
 '2011-07-18',
 '2011-07-22',
 '2011-07-23',
 '2011-07-24',
 '2011-07-25',
 '2011-08-05',
 '2011-08-06',
 '2011-08-07',
 '2011-08-08',
 '2011-08-12',
 '2011-08-13',
 '2011-08-14',
 '2011-08-15',
 '2011-08-19',
 '2011-08-20',
 '2011-08-21',
 '2011-08-22',
 '2011-08-26',
 '2011-08-27',
 '2011-08-28',
 '2011-08-29',
 '2011-09-09',
 '2011-09-10',
 '2011-09-11',
 '2011-09-12',
 '2011-09-16',
 '2011-09-17',
 '2011-09-18',
 '2011-09-19',
 '2011-09-23',
 '2011-09-24',
 '2011-09-25',
 '2011-09-26',
 '2011-09-30',
 '2011-10-01',
 '2011-10-02',
 '2011-10-03',
 '2011-10-14',
 '2011-10-15',
 '2011-10-16',
 '2011-10-17',
 '2011-10-21',
 '2011-10-22',
 '2011-10-23',
 '2011-10-28',
 '2011-10-29',
 '2011-10-30',
 '2011-10-31',
 '2011-11-04',
 '2011-11-05',
 '2011-11-06',
 '2011-11-07',
 '2011-11-18',
 '2011-11-19',
 '2011-11-20',
 '2011-11-21',
 '2011-11-25',
 '2011-11-26',
 '2011-11-27',
 '2011-11-28',
 '2011-12-02',
 '2011-12-03',
 '2011-12-04',
 '2011-12-

## Creating a Match Table for Histogram

In [15]:
home_team_match_df = pd.read_sql_query("SELECT HomeTeam as Team, FTHG as TeamGoals, FTAG as OpponentGoals, DATE as Date,\
                                        (CASE FTR WHEN 'H' THEN 1 WHEN 'A' THEN -1 ELSE 0 END) as Result\
                                        FROM Matches\
                                        WHERE Season = 2011\
                                        ORDER BY DATE;", pd_con)
print(home_team_match_df.info())
home_team_match_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 992 entries, 0 to 991
Data columns (total 5 columns):
Team             992 non-null object
TeamGoals        992 non-null int64
OpponentGoals    992 non-null int64
Date             992 non-null object
Result           992 non-null int64
dtypes: int64(3), object(2)
memory usage: 38.9+ KB
None


Unnamed: 0,Team,TeamGoals,OpponentGoals,Date,Result
0,Cottbus,2,1,2011-07-15,1
1,Greuther Furth,2,3,2011-07-15,-1
2,Frankfurt FSV,1,1,2011-07-15,0
3,Erzgebirge Aue,1,0,2011-07-16,1
4,St Pauli,2,0,2011-07-16,1


In [16]:
away_team_match_df = pd.read_sql_query("SELECT AwayTeam as Team, FTAG as TeamGoals, FTHG as OpponentGoals, DATE as Date,\
                                        (CASE FTR WHEN 'A' THEN 1 WHEN 'H' THEN -1 ELSE 0 END) as Result\
                                        FROM Matches\
                                        WHERE Season = 2011\
                                        ORDER BY DATE;", pd_con)
print(away_team_match_df.info())
away_team_match_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 992 entries, 0 to 991
Data columns (total 5 columns):
Team             992 non-null object
TeamGoals        992 non-null int64
OpponentGoals    992 non-null int64
Date             992 non-null object
Result           992 non-null int64
dtypes: int64(3), object(2)
memory usage: 38.9+ KB
None


Unnamed: 0,Team,TeamGoals,OpponentGoals,Date,Result
0,Dresden,1,2,2011-07-15,-1
1,Ein Frankfurt,3,2,2011-07-15,1
2,Union Berlin,1,1,2011-07-15,0
3,Aachen,0,1,2011-07-16,-1
4,Ingolstadt,0,2,2011-07-16,-1


In [17]:
matches_df = pd.concat([home_team_match_df, away_team_match_df], sort=False)

In [18]:
matches_df.sort_values(by='Date', inplace=True)

In [19]:
matches_df.reset_index(inplace=True, drop=True)

In [20]:
matches_df['Differential'] = matches_df['TeamGoals'] - matches_df['OpponentGoals']

In [21]:
matches_df['WinDifferential'] = matches_df['Differential'].apply(lambda x: x if x > 0 else 0)

In [22]:
matches_df['LossDifferential'] = matches_df['Differential'].apply(lambda x: x if x < 0 else 0)

In [23]:
matches_df['Draw'] = matches_df['Differential'].apply(lambda x: 1 if x == 0 else 0)

In [24]:
print(matches_df.info())
matches_df.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1984 entries, 0 to 1983
Data columns (total 9 columns):
Team                1984 non-null object
TeamGoals           1984 non-null int64
OpponentGoals       1984 non-null int64
Date                1984 non-null object
Result              1984 non-null int64
Differential        1984 non-null int64
WinDifferential     1984 non-null int64
LossDifferential    1984 non-null int64
Draw                1984 non-null int64
dtypes: int64(7), object(2)
memory usage: 139.6+ KB
None


Unnamed: 0,Team,TeamGoals,OpponentGoals,Date,Result,Differential,WinDifferential,LossDifferential,Draw
0,Cottbus,2,1,2011-07-15,1,1,1,0,0
1,Greuther Furth,2,3,2011-07-15,-1,-1,0,-1,0
2,Frankfurt FSV,1,1,2011-07-15,0,0,0,0,1
3,Union Berlin,1,1,2011-07-15,0,0,0,0,1
4,Ein Frankfurt,3,2,2011-07-15,1,1,1,0,0
5,Dresden,1,2,2011-07-15,-1,-1,0,-1,0
6,Aachen,0,1,2011-07-16,-1,-1,0,-1,0
7,Ingolstadt,0,2,2011-07-16,-1,-2,0,-2,0
8,Erzgebirge Aue,1,0,2011-07-16,1,1,1,0,0
9,St Pauli,2,0,2011-07-16,1,2,2,0,0


In [25]:
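import os

def create_hist(hist_df):
    """Plot one team's per-game goal differential and save it as a PNG.

    Expects the rows for a single team.
    """
    team_name = hist_df.Team.iloc[0]
    fig = plt.figure(figsize=(15,7))

    # Wins are drawn as positive green bars.
    win_bar = sns.barplot(data=hist_df,
                          x='Date',
                          y='WinDifferential',
                          color=sns.xkcd_rgb["green"],
                          label="Wins")

    # Losses are drawn as negative red bars.
    loss_bar = sns.barplot(data=hist_df,
                           x='Date',
                           y='LossDifferential',
                           color=sns.xkcd_rgb["red"],
                           label='Losses')

    # Draws have no differential, so mark them with a bar from -0.5 to 0.5.
    draw_bar = sns.barplot(data=hist_df,
                           x='Date',
                           y='Draw',
                           bottom=-0.5,
                           color=sns.xkcd_rgb["orange"],
                           label='Draws')

    win_bar.set(xlabel='Date',
                ylabel='Goal Differential',
                title='Game Results: '+team_name)
    win_bar.set_xticklabels(win_bar.get_xticklabels(),
                            rotation=45)

    plt.grid()
    plt.legend()
    # Make sure the output folder exists before saving.
    os.makedirs('histograms', exist_ok=True)
    fig.savefig('histograms/'+team_name+'.png')
    plt.close('all')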

In [26]:
teams_list = list(teams_df.Team)

In [27]:
#looping to get a histogram for every team
for team in teams_list:
    team_hist_df = matches_df.loc[matches_df.Team == team].copy()
    team_hist_df.reset_index(inplace=True, drop=True)
    create_hist(team_hist_df)
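
The lab asks for the visualization to be stored directly in a variable. The cells above write each plot to disk instead; as an alternative, a figure can be serialized to PNG bytes in memory. The sketch below uses a hypothetical helper `figure_to_png_bytes` and made-up counts, and it selects the off-screen `Agg` backend so that it runs without a display (an assumption that may not be needed inside a notebook).

```python
import io
import matplotlib
matplotlib.use("Agg")           # render off-screen; no display required
import matplotlib.pyplot as plt

def figure_to_png_bytes(fig):
    """Serialize a matplotlib figure to PNG bytes held in memory."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)              # release the figure once it is serialized
    return buffer.getvalue()

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(["Wins", "Losses"], [3, 2])   # made-up counts for illustration
ax.set_title("Example team")
png_bytes = figure_to_png_bytes(fig)
print(len(png_bytes) > 0)
```

The resulting `bytes` object can be placed straight into a team's record, with no intermediate file to manage.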

# Rain API Request

In [28]:
my_api_key = '4bda1b770a27edffbd6af00fd8dac1f6'
# Berlin, Germany
berlin_location = '52.5200,13.4050'            

In [33]:
# You don't have to use these classes, but we recommend them as a good place to start!
class WeatherGetter():
    
    def __init__(self, api_key, location, dates_list):
        self.api_key = api_key
        self.location = location
        self.dates_list = dates_list
        self.url_base = 'https://api.darksky.net/forecast/'
        
    def determine_rainy_days(self):
        """Return a DataFrame of True/False rain flags, indexed by date."""
        self.rainy_days = {}
        for date in self.dates_list:
            # Request the daily summary for noon on the given date.
            url = self.url_base+self.api_key+'/'+self.location+','+date+'T12:00:00?exclude=currently,minutely,hourly,flags'
            response = requests.get(url)
            if response.status_code == 200:
                data = response.json()
                rainy_bool = (data['daily']['data'][0]['precipIntensity'] > 0)
                self.rainy_days.update({date: rainy_bool})
            else:
                # Stop at the first failed request (e.g. rate limit reached)
                # and return whatever was collected so far.
                print("Couldn't get data for "+date)
                break
        return pd.DataFrame.from_dict(self.rainy_days, orient='index')

In [34]:
soccer_weather = WeatherGetter(api_key=my_api_key, location=berlin_location, dates_list=game_day_list)

In [35]:
rainy_days_df = soccer_weather.determine_rainy_days()

In [32]:
rainy_days_df.head()

Unnamed: 0,0
2011-07-15,False
2011-07-16,False
2011-07-17,True
2011-07-18,True
2011-07-22,True


# Merging Rain Data into Team Stats

In [226]:
type(rainy_days_df.loc['2011-07-15'])

pandas.core.series.Series

In [227]:
matches_df['RainyDay'] = matches_df['Date'].apply(lambda x: rainy_days_df.loc[x][0])

In [233]:
matches_df.head(11)

Unnamed: 0,Team,TeamGoals,OpponentGoals,Date,Result,Differential,WinDifferential,LossDifferential,Draw,RainyDay
0,Cottbus,2,1,2011-07-15,1,1,1,0,0,False
1,Greuther Furth,2,3,2011-07-15,-1,-1,0,-1,0,False
2,Frankfurt FSV,1,1,2011-07-15,0,0,0,0,1,False
3,Union Berlin,1,1,2011-07-15,0,0,0,0,1,False
4,Ein Frankfurt,3,2,2011-07-15,1,1,1,0,0,False
5,Dresden,1,2,2011-07-15,-1,-1,0,-1,0,False
6,Aachen,0,1,2011-07-16,-1,-1,0,-1,0,False
7,Ingolstadt,0,2,2011-07-16,-1,-2,0,-2,0,False
8,Erzgebirge Aue,1,0,2011-07-16,1,1,1,0,0,False
9,St Pauli,2,0,2011-07-16,1,2,2,0,0,False


In [247]:
rainy_games_by_team = matches_df.groupby(by='Team').RainyDay.count()
rainy_game_wins_by_team = matches_df.loc[matches_df.Result == 1].groupby(by='Team').RainyDay.count()
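
Note that `.count()` tallies every non-null value, `False` entries included, so the two counts above cover all games rather than only rainy ones, and their ratio is each team's overall win rate. A sketch of one way to restrict the calculation to rainy days first is shown below; it uses a small made-up DataFrame with the same `Team`, `Result`, and `RainyDay` columns, and the variable names are hypothetical.

```python
import pandas as pd

# Made-up matches: Result is 1 for a win, 0 for a draw, -1 for a loss.
demo_df = pd.DataFrame({
    "Team":     ["A", "A", "A", "B", "B"],
    "Result":   [1, -1, 1, 0, 1],
    "RainyDay": [True, True, False, False, False],
})

rainy = demo_df[demo_df["RainyDay"]]                 # keep rainy games only
rainy_games = rainy.groupby("Team").size()
rainy_wins = rainy[rainy["Result"] == 1].groupby("Team").size()

# Align on every team: teams with no rainy wins get 0 wins, and teams with
# no rainy games at all end up with NaN (the rate is undefined for them).
all_teams = sorted(demo_df["Team"].unique())
rainy_games = rainy_games.reindex(all_teams)
rainy_wins = rainy_wins.reindex(all_teams, fill_value=0)
rainy_win_rate = rainy_wins / rainy_games
print(rainy_win_rate)
```

Team `A` won one of its two rainy games, giving 0.5, while team `B` never played in the rain, so its rate is left undefined instead of being reported as 0.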

In [302]:
print(teams_df.shape[0])
print(len(rainy_games_by_team))
print(len(rainy_game_wins_by_team))

56
56
56


In [304]:
temp_series = rainy_game_wins_by_team / rainy_games_by_team

In [305]:
temp_df = temp_series.reset_index()

In [306]:
temp_df.head()

Unnamed: 0,Team,RainyDay
0,Aachen,0.176471
1,Arsenal,0.552632
2,Aston Villa,0.184211
3,Augsburg,0.235294
4,Bayern Munich,0.676471


In [307]:
teams_df['RainyDayWinRate'] = temp_df['RainyDay']

In [308]:
teams_df.head()

Unnamed: 0,Team,TeamGoals,OpponentGoals,TeamWins,TeamLosses,TeamDraws,RainyDayWinRate
0,Aachen,30,47,6,15,13,0.176471
1,Arsenal,74,49,21,10,7,0.552632
2,Aston Villa,37,53,7,14,17,0.184211
3,Augsburg,36,49,8,12,14,0.235294
4,Bayern Munich,77,22,23,7,4,0.676471


In [321]:
teams_mongo_df = teams_df[['Team', 'TeamGoals', 'TeamWins', 'RainyDayWinRate']]

In [322]:
teams_mongo_df.head()

Unnamed: 0,Team,TeamGoals,TeamWins,RainyDayWinRate
0,Aachen,30,6,0.176471
1,Arsenal,74,21,0.552632
2,Aston Villa,37,7,0.184211
3,Augsburg,36,8,0.235294
4,Bayern Munich,77,23,0.676471


In [325]:
teams_mongo_dict_list = teams_mongo_df.to_dict('records')

In [326]:
teams_mongo_dict_list

[{'Team': 'Aachen',
  'TeamGoals': 30,
  'TeamWins': 6,
  'RainyDayWinRate': 0.17647058823529413},
 {'Team': 'Arsenal',
  'TeamGoals': 74,
  'TeamWins': 21,
  'RainyDayWinRate': 0.5526315789473685},
 {'Team': 'Aston Villa',
  'TeamGoals': 37,
  'TeamWins': 7,
  'RainyDayWinRate': 0.18421052631578946},
 {'Team': 'Augsburg',
  'TeamGoals': 36,
  'TeamWins': 8,
  'RainyDayWinRate': 0.23529411764705882},
 {'Team': 'Bayern Munich',
  'TeamGoals': 77,
  'TeamWins': 23,
  'RainyDayWinRate': 0.6764705882352942},
 {'Team': 'Blackburn',
  'TeamGoals': 48,
  'TeamWins': 8,
  'RainyDayWinRate': 0.21052631578947367},
 {'Team': 'Bochum',
  'TeamGoals': 41,
  'TeamWins': 10,
  'RainyDayWinRate': 0.29411764705882354},
 {'Team': 'Bolton',
  'TeamGoals': 46,
  'TeamWins': 10,
  'RainyDayWinRate': 0.2631578947368421},
 {'Team': 'Braunschweig',
  'TeamGoals': 37,
  'TeamWins': 10,
  'RainyDayWinRate': 0.29411764705882354},
 {'Team': 'Chelsea',
  'TeamGoals': 65,
  'TeamWins': 18,
  'RainyDayWinRate': 0.47

In [333]:
for team_dict in teams_mongo_dict_list:
    hist_file_path = 'histograms/'+team_dict['Team']+'.png'
    # The with-block closes the file automatically.
    with open(hist_file_path, 'rb') as hist_file:
        hist_img = hist_file.read()
    team_dict.update({'Histogram': hist_img})
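
Before storing image bytes in a record, it can be worth checking that they really are a PNG and that they are small enough to store: MongoDB limits a single document to 16 MB, and that limit covers the whole record, not just the image. The helper name `check_image_payload` below is hypothetical, and the "image" is fake bytes used only for the demo.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"      # first 8 bytes of every PNG file
MAX_BSON_BYTES = 16 * 1024 * 1024         # MongoDB's 16 MB document limit

def check_image_payload(image_bytes):
    """Raise ValueError if the bytes are not a PNG or are too large to store."""
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not look like a PNG image")
    if len(image_bytes) >= MAX_BSON_BYTES:
        raise ValueError("image is too large for a single MongoDB document")
    return len(image_bytes)

size = check_image_payload(PNG_SIGNATURE + b"\x00" * 100)   # fake image body
print(size)   # → 108
```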

In [334]:
teams_mongo_dict_list[0]

{'Team': 'Aachen',
 'TeamGoals': 30,
 'TeamWins': 6,
 'RainyDayWinRate': 0.17647058823529413,
 'Histogram': b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x01\xf8\x08\x06\x00\x00\x00\xaf\xc0]\x9c\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00\x00\tpHYs\x00\x00\x0b\x12\x00\x00\x0b\x12\x01\xd2\xdd~\xfc\x00\x00\x008tEXtSoftware\x00matplotlib version3.1.1, http://matplotlib.org/\x10f\x17\x19\x00\x00 \x00IDATx\x9c\xec\xddy\xb8]\xf3\xdd7\xfe\xf7\xc9@431\xb4\xa5\x82\x18"\x83\x94\x90&\x84\x94\xa0%fj\xaeh)\x8f\x06m\x89R\x0fE\xcd\xd5\xc9\x8d\x9bpSZmZ3!1E\xf4.1\x04i$\xdcmL\x91\x84\x92hB\x12"\xd3\xfa\xfd\xd1\x9f\xf347\x9a#9\xe7\xec\xbd\xcey\xbd\xae\xcb\xe5\x9c\xb5\xd7\xfe~>\xdf\xb5\xf6:{\x9fw\xd6Z\xa7\xa6(\x8a"\x00\x00\x00\x00%\xd6\xa2\xd2\r\x00\x00\x00\x00\xac*\x01\x07\x00\x00\x00Pz\x02\x0e\x00\x00\x00\xa0\xf4\x04\x1c\x00\x00\x00@\xe9\t8\x00\x00\x00\x80\xd2\x13p\x00\x00\x00\x00\xa5\'\xe0\x00\x00\x00\x00JO\xc0\x01\x00\x00\x00\x94\x9e\x80\x03\x00\x00\x00(=\x01\x07\x00\x00\x00Pz\x

# Populating the Mongo Database

In [None]:
myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")

In [None]:
db = myclient['mod_2_final_lab']

In [None]:
mycollection = db['mod_2_final_lab_collection']

In [315]:
class MongoHandler():

    def __init__(self, server="mongodb://127.0.0.1:27017/"):
        self.server = server
    
    def create_client(self):
        self.myclient = pymongo.MongoClient(self.server)
        return self.myclient
    
    def change_db(self, database):
        self.db = self.myclient[database]
        return self.db
    
    def change_coll(self, collection):
        self.mycollection = self.db[collection]
        return self.mycollection

In [316]:
handler = MongoHandler()

In [317]:
myclient = handler.create_client()

In [318]:
db = handler.change_db('mod_2_final_lab')

In [319]:
mycollection = handler.change_coll('mod_2_final_lab_collection')

In [335]:
mycollection.insert_many(teams_mongo_dict_list)

<pymongo.results.InsertManyResult at 0x18db2dd0e88>
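
Re-running `insert_many` adds a second copy of every team. If the notebook is likely to be run more than once, one alternative is to upsert each record keyed on the team name with `update_one(..., upsert=True)`. The sketch below uses a hypothetical helper `upsert_teams` and a tiny in-memory `FakeCollection` stand-in so that it runs without a MongoDB server; with a real pymongo collection, the same `update_one` call applies.

```python
def upsert_teams(collection, records, key="Team"):
    """Insert or update one document per team, matched on `key`."""
    for record in records:
        collection.update_one({key: record[key]}, {"$set": record}, upsert=True)

class FakeCollection:
    """Minimal in-memory stand-in for a pymongo collection (demo only)."""
    def __init__(self):
        self.docs = {}

    def update_one(self, filter, update, upsert=False):
        name = filter["Team"]
        if name in self.docs or upsert:
            self.docs.setdefault(name, {}).update(update["$set"])

demo = FakeCollection()
records = [{"Team": "A", "TeamWins": 1}, {"Team": "B", "TeamWins": 2}]
upsert_teams(demo, records)
upsert_teams(demo, records)        # second run does not create duplicates
print(len(demo.docs))   # → 2
```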

In [344]:
# results = mycollection.find_one({'Team': 'Aachen'})
# results.get('Histogram')

In [345]:
# # successful test
# results = mycollection.find_one({'Team': 'Aachen'})
# newimg = results.get('Histogram')

# with open('/Users/natha/Downloads/Aachen.jpg', 'wb') as f:
#     f.write(newimg)
#     f.close()

# Summary

In this lab, we dug deep and used everything we've learned so far about Python programming, databases, HTTP requests, and API calls to ETL data from a SQL database into a MongoDB instance!