I. Querying the SQL database

II. Calculating summary statistics (team wins and losses; total number of goals scored) + histogram of each team's wins and losses

III. Getting the weather data from the DarkSky API + calculating the team's win percentage on days when it was raining

IV. Loading the data into MongoDB

I. SQL database

In [71]:
import pandas as pd

import seaborn as sns

import sqlite3 

conn = sqlite3.connect('database.sqlite')
cur = conn.cursor()

Inspecting the first table, 'matches'

In [2]:
cur.execute("""SELECT * 
                FROM Matches;""")

df1 = pd.DataFrame(cur.fetchall())

df1.columns = [x[0] for x in cur.description]
df1.head()

Unnamed: 0,Match_ID,Div,Season,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
0,1,D2,2009,2010-04-04,Oberhausen,Kaiserslautern,2,1,H
1,2,D2,2009,2009-11-01,Munich 1860,Kaiserslautern,0,1,A
2,3,D2,2009,2009-10-04,Frankfurt FSV,Kaiserslautern,1,1,D
3,4,D2,2009,2010-02-21,Frankfurt FSV,Karlsruhe,2,1,H
4,5,D2,2009,2009-12-06,Ahlen,Karlsruhe,1,3,A


Table: Matches

Match_ID (int): unique ID per match
Div (str): division the match was played in (D1 = Bundesliga, D2 = Bundesliga 2, E0 = English Premier League)
Season (int): season the match took place in (usually covering August until May of the following year)
Date (str): date of the match
HomeTeam (str): name of the home team
AwayTeam (str): name of the away team
FTHG (int) (Full Time Home Goals): number of goals scored by the home team
FTAG (int) (Full Time Away Goals): number of goals scored by the away team
FTR (str) (Full Time Result): 3-way result of the match (H = Home Win, D = Draw, A = Away Win)
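
As a shorter alternative to cur.fetchall() followed by manual column naming, pandas can run a query and label the columns in one call. The sketch below is only an illustration: instead of opening database.sqlite it builds a small in-memory table with the Matches columns (the three rows are copied from the output above), and the names demo_conn and demo_df are placeholders.

```python
import sqlite3

import pandas as pd

# Stand-in for database.sqlite: an in-memory table with the Matches columns.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    """CREATE TABLE Matches (
           Match_ID INTEGER PRIMARY KEY, Div TEXT, Season INTEGER, Date TEXT,
           HomeTeam TEXT, AwayTeam TEXT, FTHG INTEGER, FTAG INTEGER, FTR TEXT)"""
)
demo_conn.executemany(
    "INSERT INTO Matches VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "D2", 2009, "2010-04-04", "Oberhausen", "Kaiserslautern", 2, 1, "H"),
        (2, "D2", 2009, "2009-11-01", "Munich 1860", "Kaiserslautern", 0, 1, "A"),
        (3, "D2", 2009, "2009-10-04", "Frankfurt FSV", "Kaiserslautern", 1, 1, "D"),
    ],
)

# read_sql_query returns a DataFrame whose columns are named after the result columns.
demo_df = pd.read_sql_query("SELECT * FROM Matches;", demo_conn)
print(demo_df.head())
```

The same call should work against the notebook's conn object, which would remove the need for the repeated df.columns assignment.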

Selecting all columns from the Matches table for the 2011 season, restricted to the two German divisions (D1 and D2) by excluding the English Premier League (E0)

In [3]:
cur.execute("""SELECT * 
                FROM Matches
                WHERE season = '2011' AND Div != 'E0';""")

df2 = pd.DataFrame(cur.fetchall())

df2.columns = [x[0] for x in cur.description]
df2.head()

Unnamed: 0,Match_ID,Div,Season,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
0,1092,D1,2011,2012-03-31,Nurnberg,Bayern Munich,0,1,A
1,1093,D1,2011,2011-12-11,Stuttgart,Bayern Munich,1,2,A
2,1094,D1,2011,2011-08-13,Wolfsburg,Bayern Munich,0,1,A
3,1095,D1,2011,2011-11-27,Mainz,Bayern Munich,3,2,H
4,1096,D1,2011,2012-02-18,Freiburg,Bayern Munich,0,0,D
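
The season and division filters above are written directly into the SQL string. A possible variation, sketched here under the assumption that the same filter will be reused for other seasons, passes the values as query parameters with sqlite3's ? placeholders. The helper matches_for and the "Hypothetical FC" row are made up for the example; the table is a reduced in-memory stand-in for Matches.

```python
import sqlite3

# Reduced in-memory stand-in for the Matches table.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE Matches (Match_ID INTEGER, Div TEXT, Season INTEGER, HomeTeam TEXT)"
)
demo_conn.executemany(
    "INSERT INTO Matches VALUES (?, ?, ?, ?)",
    [
        (1092, "D1", 2011, "Nurnberg"),
        (1133, "D2", 2011, "Cottbus"),
        (9001, "E0", 2011, "Hypothetical FC"),  # made-up row, removed by the Div filter
        (1, "D2", 2009, "Oberhausen"),          # removed by the Season filter
    ],
)


def matches_for(conn, season, excluded_div):
    """Return (Match_ID, Div) rows for one season, skipping one division."""
    query = """SELECT Match_ID, Div
               FROM Matches
               WHERE Season = ? AND Div != ?
               ORDER BY Match_ID;"""
    return conn.execute(query, (season, excluded_div)).fetchall()


rows_2011 = matches_for(demo_conn, 2011, "E0")
print(rows_2011)
```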


Selecting the distinct dates on which Bundesliga teams played, so that weather information can later be requested for each date.

In [4]:
cur.execute("""SELECT DISTINCT Date
                FROM Matches
                WHERE Season = '2011' AND Div != 'E0'
                ORDER BY Date ASC;""")

dates = pd.DataFrame(cur.fetchall())

dates.columns = [x[0] for x in cur.description]
dates.head()

Unnamed: 0,Date
0,2011-07-15
1,2011-07-16
2,2011-07-17
3,2011-07-18
4,2011-07-22


Selecting the relevant columns (Match_ID, division, season, date of match, home team, away team, home team goals, away team goals and full-time result) from the Matches table for the 2011 Bundesliga season.

In [5]:
cur.execute("""SELECT Match_ID, Div, Season, Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR
            FROM Matches
            WHERE season = '2011' AND Div != 'E0';""")

df3 = pd.DataFrame(cur.fetchall())

df3.columns = [x[0] for x in cur.description]
df3.head()

Unnamed: 0,Match_ID,Div,Season,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
0,1092,D1,2011,2012-03-31,Nurnberg,Bayern Munich,0,1,A
1,1093,D1,2011,2011-12-11,Stuttgart,Bayern Munich,1,2,A
2,1094,D1,2011,2011-08-13,Wolfsburg,Bayern Munich,0,1,A
3,1095,D1,2011,2011-11-27,Mainz,Bayern Munich,3,2,H
4,1096,D1,2011,2012-02-18,Freiburg,Bayern Munich,0,0,D


Focus: team wins and losses

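Investigating the matches where the home team won. Because the query groups by HomeTeam without an aggregate function, it returns a single sample home win per team rather than the number of wins; the counts follow in the next query.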

In [6]:
cur.execute("""SELECT DISTINCT HomeTeam, Match_ID, Div, Season, Date, FTHG, FTR
            FROM Matches
            WHERE season = '2011' AND FTR = 'H' AND Div != 'E0'
            GROUP BY HomeTeam;""")

home_wins = pd.DataFrame(cur.fetchall())

home_wins.columns = [x[0] for x in cur.description]
home_wins.head()


Unnamed: 0,HomeTeam,Match_ID,Div,Season,Date,FTHG,FTR
0,Aachen,1247,D2,2011,2012-04-29,1,H
1,Augsburg,1189,D1,2011,2012-03-31,2,H
2,Bayern Munich,1178,D1,2011,2011-12-16,3,H
3,Bochum,1154,D2,2011,2011-09-25,2,H
4,Braunschweig,1347,D2,2011,2011-11-27,4,H


Checking the number of wins for the home team in 2011

In [7]:
cur.execute("""SELECT HomeTeam as HOMETEAM, COUNT(FTR) as home_wins
            FROM Matches
            WHERE season = '2011' AND FTR = 'H' AND Div != 'E0'
            GROUP BY HomeTeam;""")

home_wins = pd.DataFrame(cur.fetchall())

home_wins.columns = [x[0] for x in cur.description]
home_wins

Unnamed: 0,HOMETEAM,home_wins
0,Aachen,4
1,Augsburg,6
2,Bayern Munich,14
3,Bochum,7
4,Braunschweig,6
5,Cottbus,4
6,Dortmund,14
7,Dresden,8
8,Duisburg,8
9,Ein Frankfurt,11


Checking the number of losses for the home team in 2011

In [8]:
cur.execute("""SELECT HomeTeam as hometeam, COUNT(FTR) as home_losses
            FROM Matches
            WHERE season = '2011' AND FTR = 'A' AND Div != 'E0'
            GROUP BY HomeTeam;""")

home_losses = pd.DataFrame(cur.fetchall())

home_losses.columns = [x[0] for x in cur.description]
home_losses

Unnamed: 0,hometeam,home_losses
0,Aachen,7
1,Augsburg,4
2,Bayern Munich,2
3,Bochum,7
4,Braunschweig,3
5,Cottbus,5
6,Dortmund,1
7,Dresden,4
8,Duisburg,7
9,Ein Frankfurt,1


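There is one row missing from the home_losses table: Hannover does not appear in it. The query only counts rows where FTR = 'A', so a team with no such rows produces no group at all. In other words, Hannover did not lose a single home match in 2011.

Since pd.concat with axis = 1 pairs rows by position, both tables need the same teams in the same order. We therefore rerun the home_wins query without Hannover, concatenate the two tables for home team wins and home team losses, and then append Hannover's row by hand.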
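
An alternative to removing and re-adding a team is to join the two tables on the team name, so that row order and missing teams no longer matter. The sketch below is a minimal illustration with three teams whose counts are taken from the outputs in this section; the column name team and the variable names are assumptions, not the notebook's own.

```python
import pandas as pd

# Small stand-ins for the two query results; Hannover has no row in the losses table.
wins = pd.DataFrame({"team": ["Aachen", "Augsburg", "Hannover"], "home_wins": [4, 6, 10]})
losses = pd.DataFrame({"team": ["Aachen", "Augsburg"], "home_losses": [7, 4]})

# An outer merge keeps teams that appear in only one table; the gap becomes NaN.
merged = wins.merge(losses, on="team", how="outer")

# A missing count means the team had no matches of that kind.
merged["home_losses"] = merged["home_losses"].fillna(0).astype(int)
merged = merged.sort_values("team").reset_index(drop=True)
print(merged)
```

With this approach the team name appears once, so the duplicated HOMETEAM/hometeam columns would not need to be dropped later.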

In [9]:
cur.execute("""SELECT HomeTeam as HOMETEAM, COUNT(FTR) as home_wins
            FROM Matches
            WHERE season = '2011' AND FTR = 'H' AND Div != 'E0' AND HomeTeam != 'Hannover'
            GROUP BY HomeTeam;""")

home_wins2 = pd.DataFrame(cur.fetchall())

home_wins2.columns = [x[0] for x in cur.description]
home_wins2

Unnamed: 0,HOMETEAM,home_wins
0,Aachen,4
1,Augsburg,6
2,Bayern Munich,14
3,Bochum,7
4,Braunschweig,6
5,Cottbus,4
6,Dortmund,14
7,Dresden,8
8,Duisburg,8
9,Ein Frankfurt,11


In [10]:
home_wins_losses = pd.concat([home_wins2,home_losses], axis = 1)

In [11]:
home_wins_losses

Unnamed: 0,HOMETEAM,home_wins,hometeam,home_losses
0,Aachen,4,Aachen,7
1,Augsburg,6,Augsburg,4
2,Bayern Munich,14,Bayern Munich,2
3,Bochum,7,Bochum,7
4,Braunschweig,6,Braunschweig,3
5,Cottbus,4,Cottbus,5
6,Dortmund,14,Dortmund,1
7,Dresden,8,Dresden,4
8,Duisburg,8,Duisburg,7
9,Ein Frankfurt,11,Ein Frankfurt,1


In [12]:
Hannover = pd.DataFrame({"HOMETEAM":['Hannover'], 
                    "home_wins":[10], "hometeam":['Hannover'], "home_losses":[0]})

In [13]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
Home_Wins_Losses = pd.concat([home_wins_losses, Hannover], ignore_index = True)

Checking the number of wins for the away teams in 2011

In [14]:
cur.execute("""SELECT AwayTeam as AWAYTEAM, COUNT(FTR) as away_wins
            FROM Matches
            WHERE season = '2011' AND FTR = 'A' AND Div != 'E0'
            GROUP BY AwayTeam;""")

away_wins = pd.DataFrame(cur.fetchall())

away_wins.columns = [x[0] for x in cur.description]
away_wins

Unnamed: 0,AWAYTEAM,away_wins
0,Aachen,2
1,Augsburg,2
2,Bayern Munich,9
3,Bochum,3
4,Braunschweig,4
5,Cottbus,4
6,Dortmund,11
7,Dresden,4
8,Duisburg,2
9,Ein Frankfurt,9


In [15]:
cur.execute("""SELECT AwayTeam as awayteam, COUNT(FTR) as away_losses
            FROM Matches
            WHERE season = '2011' AND FTR = 'H' AND Div != 'E0'
            GROUP BY AwayTeam;""")

away_losses = pd.DataFrame(cur.fetchall())

away_losses.columns = [x[0] for x in cur.description]
away_losses

Unnamed: 0,awayteam,away_losses
0,Aachen,8
1,Augsburg,8
2,Bayern Munich,5
3,Bochum,10
4,Braunschweig,6
5,Cottbus,10
6,Dortmund,2
7,Dresden,9
8,Duisburg,8
9,Ein Frankfurt,5


For the away tables to line up row by row with the home tables, we again exclude Hannover from both away queries and append its row at the end afterwards.

In [16]:
cur.execute("""SELECT AwayTeam as AWAYTEAM, COUNT(FTR) as away_wins
            FROM Matches
            WHERE season = '2011' AND FTR = 'A' AND Div != 'E0' AND AwayTeam != 'Hannover'
            GROUP BY AwayTeam;""")

away_wins2 = pd.DataFrame(cur.fetchall())

away_wins2.columns = [x[0] for x in cur.description]
away_wins2

Unnamed: 0,AWAYTEAM,away_wins
0,Aachen,2
1,Augsburg,2
2,Bayern Munich,9
3,Bochum,3
4,Braunschweig,4
5,Cottbus,4
6,Dortmund,11
7,Dresden,4
8,Duisburg,2
9,Ein Frankfurt,9


In [17]:
cur.execute("""SELECT AwayTeam as awayteam, COUNT(FTR) as away_losses
            FROM Matches
            WHERE season = '2011' AND FTR = 'H' AND Div != 'E0' AND awayteam != 'Hannover'
            GROUP BY AwayTeam;""")

away_losses2 = pd.DataFrame(cur.fetchall())

away_losses2.columns = [x[0] for x in cur.description]
away_losses2

Unnamed: 0,awayteam,away_losses
0,Aachen,8
1,Augsburg,8
2,Bayern Munich,5
3,Bochum,10
4,Braunschweig,6
5,Cottbus,10
6,Dortmund,2
7,Dresden,9
8,Duisburg,8
9,Ein Frankfurt,5


In [18]:
away_wins_losses = pd.concat([away_wins2,away_losses2], axis = 1)

In [19]:
away_wins_losses

Unnamed: 0,AWAYTEAM,away_wins,awayteam,away_losses
0,Aachen,2,Aachen,8
1,Augsburg,2,Augsburg,8
2,Bayern Munich,9,Bayern Munich,5
3,Bochum,3,Bochum,10
4,Braunschweig,4,Braunschweig,6
5,Cottbus,4,Cottbus,10
6,Dortmund,11,Dortmund,2
7,Dresden,4,Dresden,9
8,Duisburg,2,Duisburg,8
9,Ein Frankfurt,9,Ein Frankfurt,5


In [20]:
Hannover = pd.DataFrame({"AWAYTEAM":['Hannover'], 
                    "away_wins":[2], "awayteam":['Hannover'], "away_losses":[10]})

In [21]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
Away_Wins_Losses = pd.concat([away_wins_losses, Hannover], ignore_index = True)

In [22]:
total_wins_losses = pd.concat([Home_Wins_Losses,Away_Wins_Losses], axis = 1)

In [23]:
total_wins_losses

Unnamed: 0,HOMETEAM,home_wins,hometeam,home_losses,AWAYTEAM,away_wins,awayteam,away_losses
0,Aachen,4,Aachen,7,Aachen,2,Aachen,8
1,Augsburg,6,Augsburg,4,Augsburg,2,Augsburg,8
2,Bayern Munich,14,Bayern Munich,2,Bayern Munich,9,Bayern Munich,5
3,Bochum,7,Bochum,7,Bochum,3,Bochum,10
4,Braunschweig,6,Braunschweig,3,Braunschweig,4,Braunschweig,6
5,Cottbus,4,Cottbus,5,Cottbus,4,Cottbus,10
6,Dortmund,14,Dortmund,1,Dortmund,11,Dortmund,2
7,Dresden,8,Dresden,4,Dresden,4,Dresden,9
8,Duisburg,8,Duisburg,7,Duisburg,2,Duisburg,8
9,Ein Frankfurt,11,Ein Frankfurt,1,Ein Frankfurt,9,Ein Frankfurt,5


In [28]:
total_wins_losses['total_wins_overall'] = total_wins_losses['home_wins'] + total_wins_losses['away_wins']

In [29]:
total_wins_losses

Unnamed: 0,HOMETEAM,home_wins,hometeam,home_losses,AWAYTEAM,away_wins,awayteam,away_losses,total_wins_overall
0,Aachen,4,Aachen,7,Aachen,2,Aachen,8,6
1,Augsburg,6,Augsburg,4,Augsburg,2,Augsburg,8,8
2,Bayern Munich,14,Bayern Munich,2,Bayern Munich,9,Bayern Munich,5,23
3,Bochum,7,Bochum,7,Bochum,3,Bochum,10,10
4,Braunschweig,6,Braunschweig,3,Braunschweig,4,Braunschweig,6,10
5,Cottbus,4,Cottbus,5,Cottbus,4,Cottbus,10,8
6,Dortmund,14,Dortmund,1,Dortmund,11,Dortmund,2,25
7,Dresden,8,Dresden,4,Dresden,4,Dresden,9,12
8,Duisburg,8,Duisburg,7,Duisburg,2,Duisburg,8,10
9,Ein Frankfurt,11,Ein Frankfurt,1,Ein Frankfurt,9,Ein Frankfurt,5,20


In [32]:
total_wins_losses['total_losses_overall'] = total_wins_losses['home_losses'] + total_wins_losses['away_losses']

In [33]:
total_wins_losses

Unnamed: 0,HOMETEAM,home_wins,hometeam,home_losses,AWAYTEAM,away_wins,awayteam,away_losses,total_wins_overall,total_losses_overall
0,Aachen,4,Aachen,7,Aachen,2,Aachen,8,6,15
1,Augsburg,6,Augsburg,4,Augsburg,2,Augsburg,8,8,12
2,Bayern Munich,14,Bayern Munich,2,Bayern Munich,9,Bayern Munich,5,23,7
3,Bochum,7,Bochum,7,Bochum,3,Bochum,10,10,17
4,Braunschweig,6,Braunschweig,3,Braunschweig,4,Braunschweig,6,10,9
5,Cottbus,4,Cottbus,5,Cottbus,4,Cottbus,10,8,15
6,Dortmund,14,Dortmund,1,Dortmund,11,Dortmund,2,25,3
7,Dresden,8,Dresden,4,Dresden,4,Dresden,9,12,13
8,Duisburg,8,Duisburg,7,Duisburg,2,Duisburg,8,10,15
9,Ein Frankfurt,11,Ein Frankfurt,1,Ein Frankfurt,9,Ein Frankfurt,5,20,6


The table above shows each team's overall number of wins and losses in the Bundesliga in 2011.

In [44]:
total_wins_losses.drop(['home_wins', 'hometeam', 'home_losses', 'AWAYTEAM', 'away_wins', 'awayteam', 'away_losses'], axis=1)

Unnamed: 0,HOMETEAM,total_wins_overall,total_losses_overall
0,Aachen,6,15
1,Augsburg,8,12
2,Bayern Munich,23,7
3,Bochum,10,17
4,Braunschweig,10,9
5,Cottbus,8,15
6,Dortmund,25,3
7,Dresden,12,13
8,Duisburg,10,15
9,Ein Frankfurt,20,6
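
The overall totals could also be computed in one query, which avoids the four separate tables and keeps teams with zero wins or zero losses. The following is a sketch only: it stacks the home and away view of every match with UNION ALL and sums a 1/0 flag per result. The fixtures between the placeholder teams A, B and C are made up, and the in-memory table carries just the three columns the query needs.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Matches (HomeTeam TEXT, AwayTeam TEXT, FTR TEXT)")
# Made-up fixtures between three placeholder teams, for illustration only.
demo_conn.executemany(
    "INSERT INTO Matches VALUES (?, ?, ?)",
    [("A", "B", "H"), ("B", "A", "D"), ("A", "C", "H"), ("C", "B", "A"), ("C", "A", "A")],
)

query = """
SELECT team,
       SUM(won)  AS total_wins_overall,
       SUM(lost) AS total_losses_overall
FROM (
    -- each match seen from the home side
    SELECT HomeTeam AS team,
           CASE WHEN FTR = 'H' THEN 1 ELSE 0 END AS won,
           CASE WHEN FTR = 'A' THEN 1 ELSE 0 END AS lost
    FROM Matches
    UNION ALL
    -- the same match seen from the away side
    SELECT AwayTeam AS team,
           CASE WHEN FTR = 'A' THEN 1 ELSE 0 END AS won,
           CASE WHEN FTR = 'H' THEN 1 ELSE 0 END AS lost
    FROM Matches
)
GROUP BY team
ORDER BY team;
"""
totals = demo_conn.execute(query).fetchall()
print(totals)
```

Team A has no losses in this toy data and still appears with a count of 0, which is the case that required special handling for Hannover.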


Focus: total number of goals scored

In [36]:
cur.execute("""SELECT HomeTeam as HomeT, SUM(FTHG)
                FROM Matches
                WHERE season = '2011' AND Div != 'E0'
                GROUP BY HomeTeam
                ORDER BY HomeTeam ASC;""")

home_goals= pd.DataFrame(cur.fetchall())

home_goals.columns = [x[0] for x in cur.description]
home_goals

Unnamed: 0,HomeT,SUM(FTHG)
0,Aachen,15
1,Augsburg,20
2,Bayern Munich,49
3,Bochum,23
4,Braunschweig,21
5,Cottbus,18
6,Dortmund,44
7,Dresden,30
8,Duisburg,23
9,Ein Frankfurt,38


In [37]:
cur.execute("""SELECT AwayTeam as AwayT, SUM(FTAG)
                FROM Matches
                WHERE season = '2011' AND Div != 'E0'
                GROUP BY AwayTeam
                ORDER BY AwayTeam ASC;""")

away_goals= pd.DataFrame(cur.fetchall())

away_goals.columns = [x[0] for x in cur.description]
away_goals

Unnamed: 0,AwayT,SUM(FTAG)
0,Aachen,15
1,Augsburg,16
2,Bayern Munich,28
3,Bochum,18
4,Braunschweig,16
5,Cottbus,12
6,Dortmund,36
7,Dresden,20
8,Duisburg,19
9,Ein Frankfurt,38


In [38]:
total_goals = pd.concat([home_goals,away_goals], axis = 1)

In [39]:
total_goals

Unnamed: 0,HomeT,SUM(FTHG),AwayT,SUM(FTAG)
0,Aachen,15,Aachen,15
1,Augsburg,20,Augsburg,16
2,Bayern Munich,49,Bayern Munich,28
3,Bochum,23,Bochum,18
4,Braunschweig,21,Braunschweig,16
5,Cottbus,18,Cottbus,12
6,Dortmund,44,Dortmund,36
7,Dresden,30,Dresden,20
8,Duisburg,23,Duisburg,19
9,Ein Frankfurt,38,Ein Frankfurt,38


In [40]:
total_goals['total'] = total_goals['SUM(FTHG)'] + total_goals['SUM(FTAG)']

In [41]:
total_goals

Unnamed: 0,HomeT,SUM(FTHG),AwayT,SUM(FTAG),total
0,Aachen,15,Aachen,15,30
1,Augsburg,20,Augsburg,16,36
2,Bayern Munich,49,Bayern Munich,28,77
3,Bochum,23,Bochum,18,41
4,Braunschweig,21,Braunschweig,16,37
5,Cottbus,18,Cottbus,12,30
6,Dortmund,44,Dortmund,36,80
7,Dresden,30,Dresden,20,50
8,Duisburg,23,Duisburg,19,42
9,Ein Frankfurt,38,Ein Frankfurt,38,76


Below is the total number of goals (home and away) scored per team in the Bundesliga in 2011.

In [42]:
total_goals.drop(['SUM(FTHG)', 'AwayT', 'SUM(FTAG)'], axis=1)

Unnamed: 0,HomeT,total
0,Aachen,30
1,Augsburg,36
2,Bayern Munich,77
3,Bochum,41
4,Braunschweig,37
5,Cottbus,30
6,Dortmund,80
7,Dresden,50
8,Duisburg,42
9,Ein Frankfurt,76
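
The goal totals can also be derived in pandas once the match rows are in a DataFrame. This is a sketch under the assumption that the columns are named as in the Matches table; the four results between placeholder teams A, B and C are made up. Adding the two grouped Series aligns them on the team name instead of on row position.

```python
import pandas as pd

# Made-up results for three placeholder teams.
matches = pd.DataFrame({
    "HomeTeam": ["A", "B", "A", "C"],
    "AwayTeam": ["B", "A", "C", "B"],
    "FTHG": [2, 0, 1, 3],
    "FTAG": [1, 1, 1, 0],
})

home_goals = matches.groupby("HomeTeam")["FTHG"].sum()
away_goals = matches.groupby("AwayTeam")["FTAG"].sum()

# Series.add aligns on the team name; fill_value=0 covers a team missing on one side.
total_goals = home_goals.add(away_goals, fill_value=0).astype(int)
total_goals.index.name = "team"
print(total_goals)
```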


Statistics for use with the team's winning percentage on days when it was raining.

First, we need to query the Dark Sky API for historical weather data on the match dates. We can then attach this information to the Matches data for all Bundesliga teams in 2011 and calculate summary statistics from there.

In [61]:
cur.execute("""SELECT *
                FROM Matches
                WHERE Season = '2011' AND Div != 'E0'
                ORDER BY Date ASC;""")

dates = pd.DataFrame(cur.fetchall())

dates.columns = [x[0] for x in cur.description]
dates

Unnamed: 0,Match_ID,Div,Season,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
0,1133,D2,2011,2011-07-15,Cottbus,Dresden,2,1,H
1,1167,D2,2011,2011-07-15,Greuther Furth,Ein Frankfurt,2,3,A
2,1551,D2,2011,2011-07-15,Frankfurt FSV,Union Berlin,1,1,D
3,1550,D2,2011,2011-07-16,Erzgebirge Aue,Aachen,1,0,H
4,1678,D2,2011,2011-07-16,St Pauli,Ingolstadt,2,0,H
5,1146,D2,2011,2011-07-17,Karlsruhe,Duisburg,3,2,H
6,1442,D2,2011,2011-07-17,Braunschweig,Munich 1860,3,1,H
7,1602,D2,2011,2011-07-17,Hansa Rostock,Paderborn,1,2,A
8,1360,D2,2011,2011-07-18,Fortuna Dusseldorf,Bochum,2,0,H
9,1505,D2,2011,2011-07-22,Duisburg,Cottbus,1,2,A


The Date column is the one we need for the API, so we select its distinct values.

In [62]:
cur.execute("""SELECT DISTINCT Date
                FROM Matches
                WHERE Season = '2011' AND Div != 'E0'
                ORDER BY Date ASC;""")

dates = pd.DataFrame(cur.fetchall())

dates.columns = [x[0] for x in cur.description]
dates

Unnamed: 0,Date
0,2011-07-15
1,2011-07-16
2,2011-07-17
3,2011-07-18
4,2011-07-22
5,2011-07-23
6,2011-07-24
7,2011-07-25
8,2011-08-05
9,2011-08-06


An initial GET request to the API gives us an overview of the returned JSON.

In [65]:
import requests

In [66]:
m = requests.get(f"https://api.darksky.net/forecast/85591acf50c4208c767c4eb4b18856a5/52.5200,13.4050,2012-05-06T16:00:00").json()

In [67]:
m

{'latitude': 52.52,
 'longitude': 13.405,
 'timezone': 'Europe/Berlin',
 'currently': {'time': 1336312800,
  'summary': 'Mostly Cloudy',
  'icon': 'partly-cloudy-day',
  'precipIntensity': 0,
  'precipProbability': 0,
  'temperature': 51.03,
  'apparentTemperature': 51.03,
  'dewPoint': 37.4,
  'humidity': 0.59,
  'windSpeed': 4,
  'windGust': 4,
  'windBearing': 15,
  'cloudCover': 0.75,
  'uvIndex': 3,
  'visibility': 6.216},
 'hourly': {'summary': 'Possible light rain overnight and in the morning.',
  'icon': 'rain',
  'data': [{'time': 1336255200,
    'summary': 'Mostly Cloudy',
    'icon': 'partly-cloudy-night',
    'precipIntensity': 0,
    'precipProbability': 0,
    'temperature': 43.63,
    'apparentTemperature': 41.02,
    'dewPoint': 38.01,
    'humidity': 0.8,
    'windSpeed': 4.61,
    'windGust': 4.61,
    'windBearing': 0,
    'cloudCover': 0.86,
    'uvIndex': 0,
    'visibility': 6.216},
   {'time': 1336258800,
    'summary': 'Possible Drizzle',
    'icon': 'rain',
   

The 'summary' key inside 'currently' (two levels down in the dictionary parsed from the JSON response) describes the weather at the requested time, including whether it is raining. We can read this value from each response when we request specific dates.
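
As a small illustration of that lookup, the sketch below reads the nested value from a trimmed copy of the response shown above and flags rain with a simple keyword check. The helper names and the keyword tuple are assumptions made for this example, not part of the Dark Sky API; whether summaries such as 'Possible Light Rain' should count as rain is a judgment call.

```python
# Trimmed copy of the response shown above; only the keys used here are kept.
sample_response = {
    "latitude": 52.52,
    "longitude": 13.405,
    "currently": {"summary": "Mostly Cloudy", "icon": "partly-cloudy-day"},
}

RAIN_WORDS = ("rain", "drizzle")  # assumed keywords, not an official Dark Sky list


def current_summary(response):
    """Return response['currently']['summary'], or None if either key is absent."""
    return response.get("currently", {}).get("summary")


def is_rainy(summary):
    """True if the summary text mentions one of the assumed rain keywords."""
    return summary is not None and any(word in summary.lower() for word in RAIN_WORDS)


summary = current_summary(sample_response)
print(summary, is_rainy(summary))
print(is_rainy("Possible Drizzle"))
```

Using .get with a default avoids a KeyError if a response comes back without a 'currently' block.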

As noted on the Dark Sky overview page for requests, we need to specify the latitude and longitude we are interested in (Berlin) together with the time we are interested in (year, month, day and a specific hour). I will use 16:00, as this is a common kick-off time for football matches.

In [63]:
dates_list = []

for i in dates['Date']:
    dates_list.append(i + "T16:00:00")

print(dates_list)

['2011-07-15T16:00:00', '2011-07-16T16:00:00', '2011-07-17T16:00:00', '2011-07-18T16:00:00', '2011-07-22T16:00:00', '2011-07-23T16:00:00', '2011-07-24T16:00:00', '2011-07-25T16:00:00', '2011-08-05T16:00:00', '2011-08-06T16:00:00', '2011-08-07T16:00:00', '2011-08-08T16:00:00', '2011-08-12T16:00:00', '2011-08-13T16:00:00', '2011-08-14T16:00:00', '2011-08-15T16:00:00', '2011-08-19T16:00:00', '2011-08-20T16:00:00', '2011-08-21T16:00:00', '2011-08-22T16:00:00', '2011-08-26T16:00:00', '2011-08-27T16:00:00', '2011-08-28T16:00:00', '2011-08-29T16:00:00', '2011-09-09T16:00:00', '2011-09-10T16:00:00', '2011-09-11T16:00:00', '2011-09-12T16:00:00', '2011-09-16T16:00:00', '2011-09-17T16:00:00', '2011-09-18T16:00:00', '2011-09-19T16:00:00', '2011-09-23T16:00:00', '2011-09-24T16:00:00', '2011-09-25T16:00:00', '2011-09-26T16:00:00', '2011-09-30T16:00:00', '2011-10-01T16:00:00', '2011-10-02T16:00:00', '2011-10-03T16:00:00', '2011-10-14T16:00:00', '2011-10-15T16:00:00', '2011-10-16T16:00:00', '2011-10-1

In [69]:
import json

In [70]:
from dotenv import load_dotenv

import os

load_dotenv()

class WeatherGetter():
    def __init__(self):
        self.BASE_URL = 'https://api.darksky.net'
        self.secretkey = os.getenv('DARKSKY_API')       
        
    def weather_getter(self, date):
        # Request path: /forecast/[key]/[latitude],[longitude],[time]
        forecast = requests.get(f"{self.BASE_URL}/forecast/{self.secretkey}/52.5200,13.4050,{date}")
        return forecast

In [72]:
weather_dates = {}

for date in dates_list:
    call_weather = requests.get(f"https://api.darksky.net/forecast/85591acf50c4208c767c4eb4b18856a5/52.5200,13.4050,{date}").json()
    weather = call_weather['currently']['summary']
    weather_dates[date] = weather
    
weather_dates

{'2011-07-15T16:00:00': 'Mostly Cloudy',
 '2011-07-16T16:00:00': 'Clear',
 '2011-07-17T16:00:00': 'Clear',
 '2011-07-18T16:00:00': 'Partly Cloudy',
 '2011-07-22T16:00:00': 'Light Rain',
 '2011-07-23T16:00:00': 'Mostly Cloudy',
 '2011-07-24T16:00:00': 'Mostly Cloudy',
 '2011-07-25T16:00:00': 'Mostly Cloudy',
 '2011-08-05T16:00:00': 'Mostly Cloudy',
 '2011-08-06T16:00:00': 'Mostly Cloudy',
 '2011-08-07T16:00:00': 'Mostly Cloudy',
 '2011-08-08T16:00:00': 'Light Rain',
 '2011-08-12T16:00:00': 'Mostly Cloudy',
 '2011-08-13T16:00:00': 'Partly Cloudy',
 '2011-08-14T16:00:00': 'Mostly Cloudy',
 '2011-08-15T16:00:00': 'Mostly Cloudy',
 '2011-08-19T16:00:00': 'Partly Cloudy',
 '2011-08-20T16:00:00': 'Clear',
 '2011-08-21T16:00:00': 'Clear',
 '2011-08-22T16:00:00': 'Clear',
 '2011-08-26T16:00:00': 'Clear',
 '2011-08-27T16:00:00': 'Mostly Cloudy',
 '2011-08-28T16:00:00': 'Mostly Cloudy',
 '2011-08-29T16:00:00': 'Possible Drizzle',
 '2011-09-09T16:00:00': 'Mostly Cloudy',
 '2011-09-10T16:00:00': 'M

In [73]:
from collections import Counter 

Counter(weather_dates.values())

Counter({'Mostly Cloudy': 62,
         'Clear': 49,
         'Partly Cloudy': 14,
         'Light Rain': 3,
         'Possible Drizzle': 3,
         'Possible Light Rain': 3,
         'Overcast': 2})
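
With the summaries collected, a team's win percentage on rainy days can be estimated by joining them to the match rows. The sketch below shows one way this might look. The function rain_win_percentage, the keyword tuple and the results for "Team X" are made up for illustration; the three weather entries are copied from the output above. Note that the summaries describe Berlin at 16:00, so this measures rain in Berlin rather than at each stadium.

```python
import pandas as pd

# Weather summaries keyed like weather_dates; values copied from the output above.
weather = {
    "2011-07-22T16:00:00": "Light Rain",
    "2011-07-23T16:00:00": "Mostly Cloudy",
    "2011-08-08T16:00:00": "Light Rain",
}

# Made-up results for a placeholder team, "Team X".
matches = pd.DataFrame({
    "Date": ["2011-07-22", "2011-07-23", "2011-08-08"],
    "HomeTeam": ["Team X", "Team Y", "Team Z"],
    "AwayTeam": ["Team Y", "Team X", "Team X"],
    "FTR": ["H", "A", "H"],
})


def rain_win_percentage(matches, weather, team, rain_words=("rain", "drizzle")):
    """Percentage of a team's rainy-day matches that it won (None if it had none)."""
    played = matches[(matches["HomeTeam"] == team) | (matches["AwayTeam"] == team)]
    # Look up the summary for each match date; unknown dates count as not rainy.
    summaries = (played["Date"] + "T16:00:00").map(weather).fillna("")
    rainy = summaries.str.lower().apply(lambda s: any(w in s for w in rain_words))
    won = ((played["HomeTeam"] == team) & (played["FTR"] == "H")) | (
        (played["AwayTeam"] == team) & (played["FTR"] == "A")
    )
    if not rainy.any():
        return None
    return 100 * won[rainy].mean()


print(rain_win_percentage(matches, weather, "Team X"))
```

In the toy data Team X plays two matches on 'Light Rain' dates, winning one at home and losing one away.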

In [83]:
cur.execute("""SELECT *
                FROM Matches
                WHERE Season = '2011' AND Div != 'E0' AND HomeTeam = 'Aachen'
                ORDER BY Date ASC;""")

dates = pd.DataFrame(cur.fetchall())

dates.columns = [x[0] for x in cur.description]
dates

Unnamed: 0,Match_ID,Div,Season,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
0,1574,D2,2011,2011-07-24,Aachen,Braunschweig,0,2,A
1,1507,D2,2011,2011-08-13,Aachen,Cottbus,0,2,A
2,1702,D2,2011,2011-08-27,Aachen,Fortuna Dusseldorf,0,0,D
3,1529,D2,2011,2011-09-18,Aachen,Greuther Furth,0,0,D
4,1662,D2,2011,2011-09-30,Aachen,Frankfurt FSV,1,3,A
5,1682,D2,2011,2011-10-23,Aachen,Ingolstadt,3,1,H
6,1148,D2,2011,2011-11-06,Aachen,Duisburg,2,2,D
7,1446,D2,2011,2011-12-04,Aachen,Munich 1860,2,2,D
8,1595,D2,2011,2011-12-11,Aachen,Erzgebirge Aue,1,1,D
9,1465,D2,2011,2012-02-04,Aachen,St Pauli,2,1,H


In [84]:
cur.execute("""SELECT *
                FROM Matches
                WHERE Season = '2011' AND Div != 'E0' AND AwayTeam = 'Aachen'
                ORDER BY Date ASC;""")

dates = pd.DataFrame(cur.fetchall())

dates.columns = [x[0] for x in cur.description]
dates

Unnamed: 0,Match_ID,Div,Season,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
0,1550,D2,2011,2011-07-16,Erzgebirge Aue,Aachen,1,0,H
1,1545,D2,2011,2011-08-05,St Pauli,Aachen,3,1,H
2,1535,D2,2011,2011-08-19,Hansa Rostock,Aachen,0,0,D
3,1537,D2,2011,2011-09-09,Paderborn,Aachen,0,0,D
4,1543,D2,2011,2011-09-24,Union Berlin,Aachen,2,0,H
5,1539,D2,2011,2011-10-16,Dresden,Aachen,1,1,D
6,1548,D2,2011,2011-10-28,Bochum,Aachen,1,0,H
7,1534,D2,2011,2011-11-20,Ein Frankfurt,Aachen,4,3,H
8,1549,D2,2011,2011-11-26,Karlsruhe,Aachen,0,2,A
9,1541,D2,2011,2011-12-18,Braunschweig,Aachen,1,1,D
