![Buffalo Soccer Header](https://carload.com/g/BuffaloSoccerWide400.jpg)

# Part 1a, Capture and Convert the OpenLiga data

The data from OpenLigaDB (https://www.openligadb.de/) is delivered as JSON. One league at a time, this code ingests those responses and converts them into a dataframe of the basic match details we need.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import json
from datetime import datetime  # the class, not the module; a bare "import datetime" would be shadowed by this name

In [2]:
with open('data/buffalo/champions_2023.json', 'r') as file:
    data = json.load(file)
#data is a list
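
For orientation, here is a minimal sketch of the shape of one match record as the parsing loop in this notebook reads it. Only the keys are taken from the code here. Every value is a made-up placeholder, and real OpenLigaDB records carry more fields than this.

```python
import json

# Hypothetical record: the keys are the ones this notebook reads,
# the values are placeholders rather than real API output.
sample_text = '''
[
  {
    "matchID": 1,
    "matchDateTime": "2023-09-19T18:45:00",
    "team1": {"shortName": "Home"},
    "team2": {"shortName": "Away"},
    "matchResults": [
      {"resultName": "Halbzeit", "pointsTeam1": 0, "pointsTeam2": 1},
      {"resultName": "Endergebnis", "pointsTeam1": 2, "pointsTeam2": 1}
    ]
  }
]
'''

sample = json.loads(sample_text)  # a list of dicts, like `data`
first = sample[0]

print(type(sample).__name__, type(first).__name__)
print(first["matchDateTime"][:10])                        # date part only
print([r["resultName"] for r in first["matchResults"]])   # halftime, then final
```

Each match is a dict, the team names are nested one level down, and the scores sit in a list with one entry per result type.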

An important step in this process is generating ordinal dates, the integer day counts returned by `toordinal()`. They will be crucial for merging this information with the master Bundesliga season data.
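
As a small illustration of what an ordinal date is, the sketch below converts a single date string by hand. The date 2023-09-19 is chosen because it maps to 738782, the `ordDate` value that appears in the `dfMatch.head()` output. The variable names are illustrative only.

```python
from datetime import date, datetime

just_date = "2023-09-19"  # yyyy-mm-dd, the first ten characters of matchDateTime
ord_date = datetime.strptime(just_date, "%Y-%m-%d").toordinal()
print(ord_date)  # 738782

# The mapping is reversible, and subtracting two ordinals gives a day count.
print(date.fromordinal(ord_date))
print(date(2023, 9, 20).toordinal() - ord_date)  # 1
```

Because the ordinal is a plain integer, two datasets can be joined on it without worrying about time zones or string formats.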

In [3]:
gameCounter = 0
dfMatch = pd.DataFrame({'matchID': [], 'ordDate': [], 'team1Name': [], 'team2Name': [], 'result':[]})

for eachItem in data:
    #eachItem is a dict
    matchID = eachItem['matchID']
    rawDate = eachItem['matchDateTime']
    justDate = rawDate[:10] 
    #justDate is yyyy-mm-dd
    #convert to datetime, then get ordinal
    dateTimeObj = datetime.strptime(justDate, '%Y-%m-%d')
    ordDate = dateTimeObj.toordinal()

    team1Name = eachItem['team1']['shortName']
    team2Name = eachItem['team2']['shortName']
    result = None  # reset per match so a missing final score is not inherited from the previous match
    for partScore in eachItem['matchResults']:
        # 'Halbzeit' (halftime) entries are skipped; only the final score is needed
        if partScore['resultName'] == 'Endergebnis':
            # store the final score
            team1Score = partScore['pointsTeam1']
            team2Score = partScore['pointsTeam2']
            if team1Score > team2Score:
                result = 3 # home team win
            elif team1Score == team2Score:
                result = 1 # draw (ties happen here)
            else:
                result = 0 # visiting team win
                
    dfMatch.loc[gameCounter] = [matchID, ordDate, team1Name, team2Name, result]
    gameCounter = gameCounter + 1
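
An alternative way to organize the same parsing, sketched under assumptions: each match is flattened by a helper and the dataframe is built once from the collected rows, instead of growing it with `.loc` one row at a time. The helpers `result_code` and `match_to_row` are hypothetical names, not part of the notebook, and the input is placeholder data.

```python
import pandas as pd
from datetime import datetime

def result_code(team1_score, team2_score):
    """Encode the outcome as 3 (home win), 1 (draw) or 0 (away win)."""
    if team1_score > team2_score:
        return 3
    if team1_score == team2_score:
        return 1
    return 0

def match_to_row(item):
    """Flatten one match dict into the five columns used in dfMatch."""
    result = None  # stays None when no final score is present
    for part in item["matchResults"]:
        if part["resultName"] == "Endergebnis":
            result = result_code(part["pointsTeam1"], part["pointsTeam2"])
    just_date = item["matchDateTime"][:10]
    return {
        "matchID": item["matchID"],
        "ordDate": datetime.strptime(just_date, "%Y-%m-%d").toordinal(),
        "team1Name": item["team1"]["shortName"],
        "team2Name": item["team2"]["shortName"],
        "result": result,
    }

# Placeholder input, not real OpenLigaDB data
demo = [
    {
        "matchID": 1,
        "matchDateTime": "2023-09-19T18:45:00",
        "team1": {"shortName": "Home A"},
        "team2": {"shortName": "Away A"},
        "matchResults": [
            {"resultName": "Halbzeit", "pointsTeam1": 0, "pointsTeam2": 1},
            {"resultName": "Endergebnis", "pointsTeam1": 2, "pointsTeam2": 1},
        ],
    },
    {
        "matchID": 2,
        "matchDateTime": "2023-09-20T21:00:00",
        "team1": {"shortName": "Home B"},
        "team2": {"shortName": "Away B"},
        "matchResults": [],  # no final score recorded
    },
]

rows = [match_to_row(m) for m in demo]
df_demo = pd.DataFrame(rows)
print(df_demo)
```

Keeping the score encoding in its own function makes it easy to check the three outcomes in isolation, and a match without a final score shows up as a missing value rather than silently reusing an earlier result.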


In [4]:
dfMatch.head()

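|   | matchID | ordDate | team1Name | team2Name | result |
|---|---------|---------|-----------|-----------|--------|
| 0 | 68546 | 738782 | YB | Leipzig | 0 |
| 1 | 68561 | 738782 | ACM | Newcastle | 1 |
| 2 | 68547 | 738782 | Feyenoord | Celtic | 3 |
| 3 | 68548 | 738782 | Lazio Rom |  | 1 |
| 4 | 68549 | 738782 | PSG | BVB | 3 |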


In [5]:
print(dfMatch.shape)

(125, 5)


### Match the Team Names

Here's another tricky, semi-manual process. For each of the Bundesliga teams named in the OpenLiga data, I need to ensure that its name string precisely matches the string used by the primary dataset, which we'll see in Part 2.

In [6]:
teamList = dfMatch['team1Name'].unique()
teamList.sort()
print(teamList)
print(type(teamList))

['' 'ACM' 'Arsenal' 'BVB' 'Barcelona' 'Bayern' 'Belgrad' 'Benfica' 'Braga'
 'Celtic' 'Feyenoord' 'Galatasaray' 'Inter' 'Lazio Rom' 'Leipzig' 'Lens'
 'Madrid' "Man'City" 'ManU' 'Newcastle' 'PSG' 'PSV' 'Porto' 'Union Berlin'
 'YB' 'antwerpen']
<class 'numpy.ndarray'>


In [7]:
team2List = dfMatch['team2Name'].unique()
team2List.sort()
print(team2List)
print(type(team2List))

['' 'ACM' 'Arsenal' 'BVB' 'Barcelona' 'Bayern' 'Belgrad' 'Benfica' 'Braga'
 'Celtic' 'Feyenoord' 'Galatasaray' 'Inter' 'Lazio Rom' 'Leipzig' 'Lens'
 'Madrid' "Man'City" 'ManU' 'Newcastle' 'PSG' 'PSV' 'Porto' 'Union Berlin'
 'YB' 'antwerpen']
<class 'numpy.ndarray'>


In [8]:
# teamList and team2List are numpy arrays despite their names,
# so convert both to sets and take the union of home and away names
allTeams = sorted(set(teamList) | set(team2List))
print(allTeams)

['', 'ACM', 'Arsenal', 'BVB', 'Barcelona', 'Bayern', 'Belgrad', 'Benfica', 'Braga', 'Celtic', 'Feyenoord', 'Galatasaray', 'Inter', 'Lazio Rom', 'Leipzig', 'Lens', 'Madrid', "Man'City", 'ManU', 'Newcastle', 'PSG', 'PSV', 'Porto', 'Union Berlin', 'YB', 'antwerpen']


In [9]:
#Change team names to precisely match the primary dataframe
dfMatch.replace('BVB', 'Dortmund', inplace=True)
dfMatch.replace('Bayern', 'Bayern Munich', inplace=True)
dfMatch.replace('Frankfurt', 'Ein Frankfurt', inplace=True)
#dfMatch.replace('Köln', 'FC Koln', inplace=True)
#dfMatch.replace('Fürth', 'Greuther Furth', inplace=True)
#dfMatch.replace('Gladbach', "M'gladbach", inplace=True)
dfMatch.replace('Leipzig', 'RB Leipzig', inplace=True)
#dfMatch.replace('Schalke', 'Schalke 04', inplace=True)
#dfMatch.replace('Bremen', 'Werder Bremen', inplace=True)
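
The same renaming can also be written with a single mapping, which keeps every name pair in one place. This is only a sketch: `name_map`, `name_cols`, and the `note` column are illustrative names, the frame is placeholder data, and the mapping holds just the four replacements that are active in the cell above.

```python
import pandas as pd

# OpenLigaDB short name -> name used in the primary dataset
name_map = {
    "BVB": "Dortmund",
    "Bayern": "Bayern Munich",
    "Frankfurt": "Ein Frankfurt",
    "Leipzig": "RB Leipzig",
}

# Placeholder frame with the same two name columns as dfMatch
demo = pd.DataFrame({
    "team1Name": ["BVB", "PSG", "Leipzig"],
    "team2Name": ["Bayern", "BVB", "YB"],
    "note": ["BVB", "", ""],  # hypothetical extra column, left untouched
})

# Restrict the replacement to the name columns; whole-cell matches only
name_cols = ["team1Name", "team2Name"]
demo[name_cols] = demo[name_cols].replace(name_map)

print(demo)
```

Limiting the replacement to the two name columns is a design choice. Calling `replace` on the whole dataframe, as the cell above does, gives the same result as long as no other column happens to contain one of the short names.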

In [10]:
team1List = dfMatch['team1Name'].unique()
team1List.sort()
print(team1List)

['' 'ACM' 'Arsenal' 'Barcelona' 'Bayern Munich' 'Belgrad' 'Benfica'
 'Braga' 'Celtic' 'Dortmund' 'Feyenoord' 'Galatasaray' 'Inter' 'Lazio Rom'
 'Lens' 'Madrid' "Man'City" 'ManU' 'Newcastle' 'PSG' 'PSV' 'Porto'
 'RB Leipzig' 'Union Berlin' 'YB' 'antwerpen']


In [11]:
team2List = dfMatch['team2Name'].unique()
team2List.sort()
print(team2List)

['' 'ACM' 'Arsenal' 'Barcelona' 'Bayern Munich' 'Belgrad' 'Benfica'
 'Braga' 'Celtic' 'Dortmund' 'Feyenoord' 'Galatasaray' 'Inter' 'Lazio Rom'
 'Lens' 'Madrid' "Man'City" 'ManU' 'Newcastle' 'PSG' 'PSV' 'Porto'
 'RB Leipzig' 'Union Berlin' 'YB' 'antwerpen']


Now save the dataframe to a pickle file for merging. While we're here, we might as well save it as a CSV file just in case.

In [12]:
dfMatch.to_pickle('data/buffalo/champions_2023.pkl')

In [13]:
dfMatch.to_csv('data/buffalo/champions_2023.csv')
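
A short sketch of reading both files back, assuming a small placeholder frame and a temporary directory rather than the notebook's own paths. With default settings, `to_csv` writes the row index as an unnamed first column, which `read_csv` then labels `Unnamed: 0` unless it is told to treat that column as the index.

```python
import os
import tempfile
import pandas as pd

# Placeholder frame with the same columns as dfMatch
demo = pd.DataFrame({
    "matchID": [1, 2],
    "ordDate": [738782, 738783],
    "team1Name": ["Dortmund", "PSG"],
    "team2Name": ["RB Leipzig", "YB"],
    "result": [3, 1],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "demo.pkl")
    csv_path = os.path.join(tmp, "demo.csv")
    demo.to_pickle(pkl_path)
    demo.to_csv(csv_path)  # row index goes out as an unnamed first column

    from_pickle = pd.read_pickle(pkl_path)
    plain_csv = pd.read_csv(csv_path)                  # index becomes a data column
    indexed_csv = pd.read_csv(csv_path, index_col=0)   # index restored

print(list(plain_csv.columns))
print(list(indexed_csv.columns))
```

The pickle preserves the dataframe exactly, which is why it is the file used for merging. The CSV is a readable backup, with the caveat that empty strings are read back as missing values by default.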