# BUSINESS CASE: EUROLEAGUE CALENDAR

### Welcome to Euroleague Data!

In this scenario, our main goal is to extract useful information from the EuroLeague Basketball calendar.

![](https://i.ytimg.com/vi/V9m_mrusffc/hqdefault.jpg)

Instead of opening a CSV or an Excel file, we will start by opening a JSON file: 

In [1]:
# import the recommended libraries
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import json
import numpy as np

In [2]:
# Open the file in a context manager so it is closed once loading finishes
with open("el2023.json") as f:
    data=json.load(f)

#### First, let's explore the data: 

In [3]:
type(data)       # Check that it's a dictionary

dict

Looking good! It is a dictionary, as expected for JSON. We will now try to load it into a pandas DataFrame: 

In [4]:
df=pd.DataFrame(data)

In [5]:
df.head()

Unnamed: 0,data,total
0,"{'gameCode': 304, 'season': {'name': 'EuroLeag...",306
1,"{'gameCode': 306, 'season': {'name': 'EuroLeag...",306
2,"{'gameCode': 303, 'season': {'name': 'EuroLeag...",306
3,"{'gameCode': 305, 'season': {'name': 'EuroLeag...",306
4,"{'gameCode': 301, 'season': {'name': 'EuroLeag...",306


![](https://media.tenor.com/O2Tz9B1UEMsAAAAM/sxv-wtf.gif)

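What is going on? Each row of the `data` column holds an entire nested dictionary, and `total` just repeats the number of games. Let's explore this JSON file further: 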

In [6]:
data.keys()

dict_keys(['data', 'total'])

In [7]:
data["data"]       # That's a list of dictionaries, so .keys() cannot be applied to it directly

[{'gameCode': 304,
  'season': {'name': 'EuroLeague 2023-24',
   'code': 'E2023',
   'alias': '2023-24',
   'competitionCode': 'E',
   'year': 2023,
   'startDate': '2023-06-29T00:00:00'},
  'group': {'id': '1c8d3521-392b-4637-8aea-7b35f797bbba',
   'order': 1,
   'name': 'Group Regular Season',
   'rawName': 'Regular Season'},
  'phaseType': {'code': 'RS',
   'alias': 'REGULAR SEASON',
   'name': 'Regular Season',
   'isGroupPhase': True},
  'round': 34,
  'roundAlias': 'Round 34',
  'roundName': 'Round 34',
  'played': False,
  'date': '2024-04-12T21:00:00',
  'confirmedDate': True,
  'confirmedHour': True,
  'localTimeZone': 2,
  'localDate': '2024-04-12T21:00:00',
  'utcDate': '2024-04-12T19:00:00Z',
  'local': {'club': {'code': 'ASV',
    'name': 'LDLC ASVEL Villeurbanne',
    'abbreviatedName': 'ASVEL',
    'editorialName': 'ASVEL',
    'tvCode': 'ASV',
    'isVirtual': False,
    'images': {'crest': 'https://media-cdn.incrowdsports.com/e33c6d1a-95ca-4dbc-b8cb-0201812104cc.png'}}

In [8]:
data["data"][300]

{'gameCode': 6,
 'season': {'name': 'EuroLeague 2023-24',
  'code': 'E2023',
  'alias': '2023-24',
  'competitionCode': 'E',
  'year': 2023,
  'startDate': '2023-06-29T00:00:00'},
 'group': {'id': '1c8d3521-392b-4637-8aea-7b35f797bbba',
  'order': 1,
  'name': 'Group Regular Season',
  'rawName': 'Regular Season'},
 'phaseType': {'code': 'RS',
  'alias': 'REGULAR SEASON',
  'name': 'Regular Season',
  'isGroupPhase': True},
 'round': 1,
 'roundAlias': 'Round 1',
 'roundName': 'Round 1',
 'played': True,
 'date': '2023-10-06T19:45:00',
 'confirmedDate': True,
 'confirmedHour': True,
 'localTimeZone': 3,
 'localDate': '2023-10-06T20:45:00',
 'utcDate': '2023-10-06T17:45:00Z',
 'local': {'club': {'code': 'ULK',
   'name': 'Fenerbahce Beko Istanbul',
   'abbreviatedName': 'Fenerbahce',
   'editorialName': 'Fenerbahce',
   'tvCode': 'FBB',
   'isVirtual': False,
   'images': {'crest': 'https://media-cdn.incrowdsports.com/0233ebbb-f3a2-49ea-837c-7fd3e661e672.png'}},
  'score': 85,
  'standin

In [9]:
data["data"][300].keys()

dict_keys(['gameCode', 'season', 'group', 'phaseType', 'round', 'roundAlias', 'roundName', 'played', 'date', 'confirmedDate', 'confirmedHour', 'localTimeZone', 'localDate', 'utcDate', 'local', 'road', 'audience', 'audienceConfirmed', 'socialFeed', 'operationsCode', 'referee1', 'referee2', 'referee3', 'referee4', 'venue', 'isNeutralVenue', 'gameStatus', 'winner'])

In [10]:
data["data"][300]["date"]

'2023-10-06T19:45:00'

In [11]:
data["data"][300]["local"]       # That's another dictionary (with nested dictionaries)

{'club': {'code': 'ULK',
  'name': 'Fenerbahce Beko Istanbul',
  'abbreviatedName': 'Fenerbahce',
  'editorialName': 'Fenerbahce',
  'tvCode': 'FBB',
  'isVirtual': False,
  'images': {'crest': 'https://media-cdn.incrowdsports.com/0233ebbb-f3a2-49ea-837c-7fd3e661e672.png'}},
 'score': 85,
 'standingsScore': 76,
 'partials': {'partials1': 19,
  'partials2': 19,
  'partials3': 24,
  'partials4': 14,
  'extraPeriods': {'1': 9}}}

In [12]:
data["data"][300]["road"]       # That's another dictionary (with nested dictionaries)

{'club': {'code': 'MIL',
  'name': 'EA7 Emporio Armani Milan',
  'abbreviatedName': 'Milan',
  'editorialName': 'Milan',
  'tvCode': 'EA7',
  'isVirtual': False,
  'images': {'crest': 'https://media-cdn.incrowdsports.com/8154f184-c61a-4e7f-b14d-9d802e35cb95.png'}},
 'score': 82,
 'standingsScore': 76,
 'partials': {'partials1': 23,
  'partials2': 12,
  'partials3': 19,
  'partials4': 22,
  'extraPeriods': {'1': 6}}}

In [14]:
data["data"][290]["road"]       # Contents can vary from game to game: here 'extraPeriods' is empty, while game 300 had one extra period

{'club': {'code': 'BAR',
  'name': 'FC Barcelona',
  'abbreviatedName': 'Barcelona',
  'editorialName': 'Barcelona',
  'tvCode': 'BAR',
  'isVirtual': False,
  'images': {'crest': 'https://media-cdn.incrowdsports.com/35dfa503-e417-481f-963a-bdf6f013763e.png'}},
 'score': 77,
 'standingsScore': 77,
 'partials': {'partials1': 23,
  'partials2': 17,
  'partials3': 16,
  'partials4': 21,
  'extraPeriods': {}}}

In [15]:
data["data"][290]["referee1"]

{'code': 'OADJ',
 'name': 'PUKL,SASA',
 'alias': 'PUKL,S.',
 'country': {'code': 'SLO', 'name': 'Slovenia'},
 'images': {'verticalSmall': '8863itxgormdtqhs'},
 'active': True}
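As a side note to the exploration above, pandas can also flatten nested dictionaries like these in one call with `pd.json_normalize`. The sketch below is only an illustration: the two records are trimmed-down stand-ins shaped like the entries of `data["data"]`, keeping just a few of the real keys.

```python
import pandas as pd

# Two trimmed-down records shaped like the entries of data["data"]
# (illustrative stand-ins: only a few of the real keys are kept).
records = [
    {"round": 1,
     "local": {"club": {"name": "Fenerbahce Beko Istanbul"}, "score": 85},
     "road": {"club": {"name": "EA7 Emporio Armani Milan"}, "score": 82}},
    {"round": 1,
     "local": {"club": {"name": "Crvena Zvezda Meridianbet Belgrade"}, "score": 94},
     "road": {"club": {"name": "LDLC ASVEL Villeurbanne"}, "score": 73}},
]

# Nested keys become dotted column names such as "local.club.name".
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```

On the full file this would produce a very wide table (one column per nested key), which is one reason to pick the needed fields by hand as done in the rest of this notebook.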

### We will need to extract our information step by step!
#### Rather than extracting everything and filtering afterwards, we will work the other way around: which basic columns must we obtain from this JSON file?
- fecha (date)
- ronda (round)
- local (home team)
- localscore (home score)
- road (away team)
- roadscore (away score)
- arbitros (referees, as a list)

In [16]:
# Explore the data to find exactly where the useful info is
data["data"][300]["roundName"]

'Round 1'

#### Now that we have everything we need, it is time to extract it and save it in lists!

In [17]:
fecha=[]
ronda=[]
local=[]
localscore=[]
road=[]
roadscore=[]
arbitros=[[] for _ in range(len(data["data"]))]       # One empty list per game; each will hold that game's referee names

In [18]:
count=0
for i in data["data"]:
    fecha.append(i["date"])
    ronda.append(i["round"])
    local.append(i["local"]["club"]["name"])
    localscore.append(i["local"]["score"])
    road.append(i["road"]["club"]["name"])
    roadscore.append(i["road"]["score"])
    # Read referees only for games that have been played (score > 0)
    if i["local"]["score"]>0:
        # Reads referee1..referee3; the records also have a 'referee4' key, which is not read here
        for k in range(3):
            arbitros[count].append(i[f"referee{k+1}"]["name"])
    count+=1
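The same extraction can be sketched without parallel lists and a manual counter, by building one small dictionary per game. This is an alternative under assumptions, not the notebook's method: the two `games` records and the referee name are made-up stand-ins shaped like the real entries, and `referee_names` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical, trimmed-down records shaped like the entries of data["data"].
games = [
    {"date": "2023-10-06T19:45:00", "round": 1,
     "local": {"club": {"name": "Fenerbahce Beko Istanbul"}, "score": 85},
     "road": {"club": {"name": "EA7 Emporio Armani Milan"}, "score": 82},
     "referee1": {"name": "REFEREE A"}, "referee2": None},
    {"date": "2024-04-12T21:00:00", "round": 34,
     "local": {"club": {"name": "LDLC ASVEL Villeurbanne"}, "score": 0},
     "road": {"club": {"name": "FC Barcelona"}, "score": 0}},
]

def referee_names(game, max_referees=3):
    """Collect the referee names that are present in one game record."""
    names = []
    for k in range(1, max_referees + 1):
        ref = game.get(f"referee{k}")  # missing key gives None instead of KeyError
        if ref:                        # skips None and empty dictionaries
            names.append(ref["name"])
    return names

# One dictionary per game; pandas turns the list into rows.
rows = [
    {"Fecha": g["date"], "Ronda": g["round"],
     "Local": g["local"]["club"]["name"], "Localscore": g["local"]["score"],
     "Visitante": g["road"]["club"]["name"], "Roadscore": g["road"]["score"],
     "Arbitros": referee_names(g)}
    for g in games
]
tabla_demo = pd.DataFrame(rows)
print(tabla_demo)
```

Using `.get` means a game with missing referee keys simply ends up with a shorter (or empty) list instead of raising an error.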

#### Let's now save it as a DataFrame

In [19]:
tabla=pd.DataFrame({"Fecha": fecha, 
                    "Ronda": ronda, 
                    "Local": local, 
                    "Visitante": road, 
                    "Arbitros": arbitros, 
                    "Localscore": localscore,
                    "Roadscore": roadscore})

#### Is there any problem when looking at the `.head()` of the data? Is it expected?

In [20]:
tabla.head()

Unnamed: 0,Fecha,Ronda,Local,Visitante,Arbitros,Localscore,Roadscore
0,2024-04-12T21:00:00,34,LDLC ASVEL Villeurbanne,FC Barcelona,[],0,0
1,2024-04-12T20:30:00,34,Partizan Mozzart Bet Belgrade,Valencia Basket,[],0,0
2,2024-04-12T20:30:00,34,Virtus Segafredo Bologna,Baskonia Vitoria-Gasteiz,[],0,0
3,2024-04-12T20:15:00,34,Olympiacos Piraeus,Fenerbahce Beko Istanbul,[],0,0
4,2024-04-11T20:15:00,34,Panathinaikos AKTOR Athens,ALBA Berlin,[],0,0


#### Try the `info` and `describe` methods. Does either of them show something we are not expecting? 
Tick, tock...

In [21]:
tabla.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 306 entries, 0 to 305
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Fecha       306 non-null    object
 1   Ronda       306 non-null    int64 
 2   Local       306 non-null    object
 3   Visitante   306 non-null    object
 4   Arbitros    306 non-null    object
 5   Localscore  306 non-null    int64 
 6   Roadscore   306 non-null    int64 
dtypes: int64(3), object(4)
memory usage: 16.9+ KB


In [22]:
tabla.Fecha=pd.to_datetime(tabla.Fecha)

In [23]:
tabla.info()       # Notice that Dtype for 'Fecha' has changed!!!!!

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 306 entries, 0 to 305
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Fecha       306 non-null    datetime64[ns]
 1   Ronda       306 non-null    int64         
 2   Local       306 non-null    object        
 3   Visitante   306 non-null    object        
 4   Arbitros    306 non-null    object        
 5   Localscore  306 non-null    int64         
 6   Roadscore   306 non-null    int64         
dtypes: datetime64[ns](1), int64(3), object(3)
memory usage: 16.9+ KB


In [24]:
tabla['Dia'] = tabla['Fecha'].map(lambda x: str(x)[:10])       # Option 1: slice the string representation (works, but fragile)
tabla['Hora'] = tabla['Fecha'].map(lambda x: str(x)[11:])

tabla['Dia'] = tabla['Fecha'].dt.strftime('%Y-%m-%d')       # Option 2 (preferred): format through the .dt accessor; this overwrites option 1
tabla['Hora'] = tabla['Fecha'].dt.strftime('%H:%M:%S')      # The format string decides the separators ('-', ':') and the order of the parts
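To see what the `.dt` accessor offers beyond string formatting, here is a small standalone sketch. It uses two dates taken from the outputs above; the variable names are just for illustration.

```python
import pandas as pd

# Two game dates taken from the outputs above, parsed into datetime64 values.
fechas = pd.to_datetime(pd.Series(["2023-10-06T19:45:00", "2023-10-05T19:00:00"]))

dia = fechas.dt.strftime("%Y-%m-%d")    # formatted as strings
hora = fechas.dt.strftime("%H:%M:%S")   # formatted as strings
weekday = fechas.dt.day_name()          # only possible on real datetimes

print(dia.tolist(), hora.tolist(), weekday.tolist())
```

Once the column is a string, operations such as `day_name()` or date arithmetic are no longer available, so it can be worth keeping the original datetime column around until the analysis is done.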

#### Adding columns: Competition and Phase

In [30]:
tabla["Competition"] = "EuroLeague"
tabla["Phase"] = "Regular Season"

In [31]:
tabla.sample()

Unnamed: 0,Ronda,Local,Visitante,Arbitros,Localscore,Roadscore,Dia,Hora,Competititon,Phase,Competition
116,22,Anadolu Efes Istanbul,FC Barcelona,"[HORDOV, TOMISLAV , FOUFIS, IOANNIS, TSAROUCHA...",98,74,2024-01-18,18:30:00,EuroLeague,Regular Season,EuroLeague


#### Column Drop: Any useless information?

In [32]:
tabla.drop(columns="Fecha", inplace=True)       # Let's imagine 'Fecha' is not needed

KeyError: "['Fecha'] not found in axis"
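A `KeyError` like the one above typically appears when the cell is run a second time: the column was already dropped on the first run, so it no longer exists. A minimal sketch of how to make the cell safe to re-run, using a tiny made-up frame and the `errors="ignore"` option of `DataFrame.drop`:

```python
import pandas as pd

# Tiny made-up frame, just to demonstrate the behaviour.
demo = pd.DataFrame({"Fecha": ["2023-10-05"], "Ronda": [1]})

demo = demo.drop(columns="Fecha")                   # first run: the column is removed
demo = demo.drop(columns="Fecha", errors="ignore")  # second run: nothing to drop, no KeyError

print(demo.columns.tolist())
```

Reassigning the result instead of using `inplace=True` also makes it easier to keep the original DataFrame untouched if needed.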

### Rearranging the Columns: Any ideas?

In [33]:
tabla=tabla[['Competition', 'Phase', 'Ronda', 'Dia', 'Hora', 'Local', 'Visitante', 'Localscore', 'Roadscore', 'Arbitros']]

### Now let's try to get rid of the matches without information: BRAINSTORMING

In [34]:
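actual=tabla[tabla["Localscore"]>0].copy()       # .copy() gives an independent DataFrame, so adding columns later does not trigger SettingWithCopyWarning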

In [35]:
actual.head()

Unnamed: 0,Competition,Phase,Ronda,Dia,Hora,Local,Visitante,Localscore,Roadscore,Arbitros
81,EuroLeague,Regular Season,25,2024-02-02,20:30:00,Virtus Segafredo Bologna,Partizan Mozzart Bet Belgrade,88,84,"[LOTTERMOSER, ROBERT, PUKL,SASA, TRAWICKI, TOM..."
82,EuroLeague,Regular Season,25,2024-02-02,20:30:00,AS Monaco,Fenerbahce Beko Istanbul,76,69,"[JAVOR, DAMIR, FOUFIS, IOANNIS, BALAK, AMIT]"
83,EuroLeague,Regular Season,25,2024-02-02,20:00:00,Crvena Zvezda Meridianbet Belgrade,FC Barcelona,76,85,"[ROCHA, FERNANDO, RACYS, SAULIUS, KONSTANTINOV..."
84,EuroLeague,Regular Season,25,2024-02-02,19:00:00,Zalgiris Kaunas,Panathinaikos AKTOR Athens,80,68,"[RADOVIC, SRETEN, PASTUSIAK, PIOTR, CORTES, CA..."
85,EuroLeague,Regular Season,25,2024-02-02,18:30:00,Anadolu Efes Istanbul,EA7 Emporio Armani Milan,79,73,"[PEREZ, MIGUEL ANGEL, KARDUM, LUKA, BAENA, ALB..."


In [36]:
actual.reset_index(inplace=True, drop=True)

In [37]:
actual.head()

Unnamed: 0,Competition,Phase,Ronda,Dia,Hora,Local,Visitante,Localscore,Roadscore,Arbitros
0,EuroLeague,Regular Season,25,2024-02-02,20:30:00,Virtus Segafredo Bologna,Partizan Mozzart Bet Belgrade,88,84,"[LOTTERMOSER, ROBERT, PUKL,SASA, TRAWICKI, TOM..."
1,EuroLeague,Regular Season,25,2024-02-02,20:30:00,AS Monaco,Fenerbahce Beko Istanbul,76,69,"[JAVOR, DAMIR, FOUFIS, IOANNIS, BALAK, AMIT]"
2,EuroLeague,Regular Season,25,2024-02-02,20:00:00,Crvena Zvezda Meridianbet Belgrade,FC Barcelona,76,85,"[ROCHA, FERNANDO, RACYS, SAULIUS, KONSTANTINOV..."
3,EuroLeague,Regular Season,25,2024-02-02,19:00:00,Zalgiris Kaunas,Panathinaikos AKTOR Athens,80,68,"[RADOVIC, SRETEN, PASTUSIAK, PIOTR, CORTES, CA..."
4,EuroLeague,Regular Season,25,2024-02-02,18:30:00,Anadolu Efes Istanbul,EA7 Emporio Armani Milan,79,73,"[PEREZ, MIGUEL ANGEL, KARDUM, LUKA, BAENA, ALB..."


In [38]:
actual["Plusminus"]=actual["Localscore"]-actual["Roadscore"]

In [39]:
actual.head()

Unnamed: 0,Competition,Phase,Ronda,Dia,Hora,Local,Visitante,Localscore,Roadscore,Arbitros,Plusminus
0,EuroLeague,Regular Season,25,2024-02-02,20:30:00,Virtus Segafredo Bologna,Partizan Mozzart Bet Belgrade,88,84,"[LOTTERMOSER, ROBERT, PUKL,SASA, TRAWICKI, TOM...",4
1,EuroLeague,Regular Season,25,2024-02-02,20:30:00,AS Monaco,Fenerbahce Beko Istanbul,76,69,"[JAVOR, DAMIR, FOUFIS, IOANNIS, BALAK, AMIT]",7
2,EuroLeague,Regular Season,25,2024-02-02,20:00:00,Crvena Zvezda Meridianbet Belgrade,FC Barcelona,76,85,"[ROCHA, FERNANDO, RACYS, SAULIUS, KONSTANTINOV...",-9
3,EuroLeague,Regular Season,25,2024-02-02,19:00:00,Zalgiris Kaunas,Panathinaikos AKTOR Athens,80,68,"[RADOVIC, SRETEN, PASTUSIAK, PIOTR, CORTES, CA...",12
4,EuroLeague,Regular Season,25,2024-02-02,18:30:00,Anadolu Efes Istanbul,EA7 Emporio Armani Milan,79,73,"[PEREZ, MIGUEL ANGEL, KARDUM, LUKA, BAENA, ALB...",6


In [40]:
actual["Winner"]=np.where(actual["Plusminus"]>0, actual.Local, actual.Visitante)
#                         condition              if True       if False

In [41]:
actual.head()

Unnamed: 0,Competition,Phase,Ronda,Dia,Hora,Local,Visitante,Localscore,Roadscore,Arbitros,Plusminus,Winner
0,EuroLeague,Regular Season,25,2024-02-02,20:30:00,Virtus Segafredo Bologna,Partizan Mozzart Bet Belgrade,88,84,"[LOTTERMOSER, ROBERT, PUKL,SASA, TRAWICKI, TOM...",4,Virtus Segafredo Bologna
1,EuroLeague,Regular Season,25,2024-02-02,20:30:00,AS Monaco,Fenerbahce Beko Istanbul,76,69,"[JAVOR, DAMIR, FOUFIS, IOANNIS, BALAK, AMIT]",7,AS Monaco
2,EuroLeague,Regular Season,25,2024-02-02,20:00:00,Crvena Zvezda Meridianbet Belgrade,FC Barcelona,76,85,"[ROCHA, FERNANDO, RACYS, SAULIUS, KONSTANTINOV...",-9,FC Barcelona
3,EuroLeague,Regular Season,25,2024-02-02,19:00:00,Zalgiris Kaunas,Panathinaikos AKTOR Athens,80,68,"[RADOVIC, SRETEN, PASTUSIAK, PIOTR, CORTES, CA...",12,Zalgiris Kaunas
4,EuroLeague,Regular Season,25,2024-02-02,18:30:00,Anadolu Efes Istanbul,EA7 Emporio Armani Milan,79,73,"[PEREZ, MIGUEL ANGEL, KARDUM, LUKA, BAENA, ALB...",6,Anadolu Efes Istanbul


In [42]:
actual.rename(columns={"Winner": "Ganador"}, inplace=True)

In [43]:
actual.head()

Unnamed: 0,Competition,Phase,Ronda,Dia,Hora,Local,Visitante,Localscore,Roadscore,Arbitros,Plusminus,Ganador
0,EuroLeague,Regular Season,25,2024-02-02,20:30:00,Virtus Segafredo Bologna,Partizan Mozzart Bet Belgrade,88,84,"[LOTTERMOSER, ROBERT, PUKL,SASA, TRAWICKI, TOM...",4,Virtus Segafredo Bologna
1,EuroLeague,Regular Season,25,2024-02-02,20:30:00,AS Monaco,Fenerbahce Beko Istanbul,76,69,"[JAVOR, DAMIR, FOUFIS, IOANNIS, BALAK, AMIT]",7,AS Monaco
2,EuroLeague,Regular Season,25,2024-02-02,20:00:00,Crvena Zvezda Meridianbet Belgrade,FC Barcelona,76,85,"[ROCHA, FERNANDO, RACYS, SAULIUS, KONSTANTINOV...",-9,FC Barcelona
3,EuroLeague,Regular Season,25,2024-02-02,19:00:00,Zalgiris Kaunas,Panathinaikos AKTOR Athens,80,68,"[RADOVIC, SRETEN, PASTUSIAK, PIOTR, CORTES, CA...",12,Zalgiris Kaunas
4,EuroLeague,Regular Season,25,2024-02-02,18:30:00,Anadolu Efes Istanbul,EA7 Emporio Armani Milan,79,73,"[PEREZ, MIGUEL ANGEL, KARDUM, LUKA, BAENA, ALB...",6,Anadolu Efes Istanbul


In [44]:
actual["Winner"]=np.where(actual["Ganador"]==actual["Local"], "Local", "Visitante")

### Now, let's sort the DataFrame: 
- Should we sort it by more than one column?
- Will it affect the index?

In [46]:
actual.sort_values(["Ronda", "Dia", "Hora"], inplace=True)

In [47]:
actual.reset_index(inplace=True, drop=True)
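To answer the two questions above on a small scale, the sketch below sorts a made-up three-row frame by several columns and shows what happens to the index. `ignore_index=True` is a `sort_values` option that renumbers the rows in the same step.

```python
import pandas as pd

# Made-up rows, deliberately out of order.
demo = pd.DataFrame({
    "Ronda": [2, 1, 1],
    "Dia": ["2023-10-12", "2023-10-06", "2023-10-05"],
    "Hora": ["20:30:00", "19:45:00", "19:00:00"],
})

# Sorting by several columns: ties in "Ronda" are broken by "Dia", then "Hora".
ordered = demo.sort_values(["Ronda", "Dia", "Hora"])
print(ordered.index.tolist())     # the original labels travel with the rows

# Same sort, but the index is renumbered from 0 in one step.
renumbered = demo.sort_values(["Ronda", "Dia", "Hora"], ignore_index=True)
print(renumbered.index.tolist())
```

So sorting does affect the order of the index labels, which is why a reset (or `ignore_index=True`) follows the sort.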

In [48]:
actual.head(4)

Unnamed: 0,Competition,Phase,Ronda,Dia,Hora,Local,Visitante,Localscore,Roadscore,Arbitros,Plusminus,Ganador,Winner
0,EuroLeague,Regular Season,1,2023-10-05,19:00:00,Crvena Zvezda Meridianbet Belgrade,LDLC ASVEL Villeurbanne,94,73,"[ROCHA, FERNANDO, PATERNICO, CARMELO, PASTUSIA...",21,Crvena Zvezda Meridianbet Belgrade,Local
1,EuroLeague,Regular Season,1,2023-10-05,20:05:00,Maccabi Playtika Tel Aviv,Partizan Mozzart Bet Belgrade,96,81,"[PEREZ, MIGUEL ANGEL, VILIUS, GYTIS, KARDUM, L...",15,Maccabi Playtika Tel Aviv,Local
2,EuroLeague,Regular Season,1,2023-10-05,20:30:00,Virtus Segafredo Bologna,Zalgiris Kaunas,79,82,"[BOLTAUZER, MATEJ, PANTHER, ANNE, SILVA, SERGIO]",-3,Zalgiris Kaunas,Visitante
3,EuroLeague,Regular Season,1,2023-10-05,20:30:00,FC Bayern Munich,ALBA Berlin,80,68,"[GARCIA, JUAN CARLOS, LATISEVS, OLEGS, SUKYS, ...",12,FC Bayern Munich,Local


### Adding more columns (based on conditions):

- Plus/minus (point difference)
- Winners of the matches

#### Renaming a particular column

#### What happens if we apply np.where to a column that has already been created?
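A quick way to check is to try it on a tiny frame. In this sketch (two rows taken from the outputs above), assigning the result of `np.where` to an existing column name simply replaces that column's values; no second column is created.

```python
import numpy as np
import pandas as pd

# Two games taken from the outputs above.
demo = pd.DataFrame({
    "Local": ["Virtus Segafredo Bologna", "Crvena Zvezda Meridianbet Belgrade"],
    "Visitante": ["Partizan Mozzart Bet Belgrade", "FC Barcelona"],
    "Plusminus": [4, -9],
})

# First version: the winner's name.
demo["Winner"] = np.where(demo["Plusminus"] > 0, demo["Local"], demo["Visitante"])

# Assigning to the same name again overwrites the previous values.
demo["Winner"] = np.where(demo["Plusminus"] > 0, "Local", "Visitante")

print(demo["Winner"].tolist())
```

This is why the notebook renames the first `Winner` column to `Ganador` before creating the second one: otherwise the club names would be lost.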

## Questions about this basic DF
- Have there been more Local or Road winners?
- Which club has won the most matches?
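One possible way to approach both questions is `value_counts`. The sketch below runs on a five-row sample assembled from the outputs above, so the counts are for illustration only and say nothing about the full season.

```python
import pandas as pd

# Small sample built from rows shown in the outputs above (not the full season).
demo = pd.DataFrame({
    "Ganador": ["Crvena Zvezda Meridianbet Belgrade", "Maccabi Playtika Tel Aviv",
                "Zalgiris Kaunas", "FC Bayern Munich", "Zalgiris Kaunas"],
    "Winner": ["Local", "Local", "Visitante", "Local", "Local"],
})

side_counts = demo["Winner"].value_counts()    # Local vs Visitante wins
club_counts = demo["Ganador"].value_counts()   # wins per club
top_club = club_counts.idxmax()                # club with the most wins in the sample

print(side_counts)
print(top_club)
```

On the real data the same three lines would be applied to `actual` instead of `demo`.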

## Referees

In this particular column we could run into trouble, because each value is not a string but a `list`.

We want to analyze: 

- How many referees have already officiated a match?
- Which referee has officiated the most local victories?
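A common way to work with a column of lists is `DataFrame.explode`, which gives each list element its own row. The sketch below uses placeholder referee names and made-up results, and it reads the first question as "how many distinct referees appear in at least one match", which is an interpretation.

```python
import pandas as pd

# Placeholder referee names and made-up results, for illustration only.
demo = pd.DataFrame({
    "Arbitros": [["REF A", "REF B", "REF C"],
                 ["REF A", "REF D", "REF E"],
                 ["REF A", "REF D", "REF F"]],
    "Winner": ["Local", "Visitante", "Local"],
})

# One row per (match, referee) pair; the other columns are repeated.
long = demo.explode("Arbitros")

n_referees = long["Arbitros"].nunique()        # distinct referees seen so far
local_wins = long.loc[long["Winner"] == "Local", "Arbitros"].value_counts()
top_referee = local_wins.idxmax()              # referee present in the most local wins

print(n_referees, top_referee)
```

Note that `explode` keeps the original index, so the same index label appears once per referee of that match.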

## Geolocalization: Map function

Using the list of countries in the cell below, we want to assign to each row the country of each team, so that we know where each one plays. 

Then, we want to analyze: 

- The average local and road points per country
- The maximum local and road score per country
- The number of times local teams have won on home soil, when playing as the local team

In [47]:
countries=["Serbia", "Israel", "Italy", "Germany", "Spain", "Turkey", "Greece", "Spain", "Spain", "Turkey","France", "Germany", "Lithuania", "France", "Greece", "Italy", "Serbia", "Spain"]
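The assignment itself can be sketched with `Series.map` and a club-to-country dictionary. Everything below is a standalone illustration: `club_country` is a hand-written lookup for three clubs, the rows come from the outputs above, and `LocalCountry` is a hypothetical column name. With the positional `countries` list, a lookup could be built with `dict(zip(clubs, countries))`, assuming `clubs` lists the teams in the same order as the countries.

```python
import pandas as pd

# Hand-written lookup for a few clubs (illustrative, not the full league).
club_country = {
    "Anadolu Efes Istanbul": "Turkey",
    "Fenerbahce Beko Istanbul": "Turkey",
    "Zalgiris Kaunas": "Lithuania",
}

# Three home games taken from the outputs above.
demo = pd.DataFrame({
    "Local": ["Anadolu Efes Istanbul", "Zalgiris Kaunas", "Fenerbahce Beko Istanbul"],
    "Localscore": [79, 80, 85],
    "Roadscore": [73, 68, 82],
})

# map() looks each club name up in the dictionary.
demo["LocalCountry"] = demo["Local"].map(club_country)

# Mean and maximum local score per country.
stats = demo.groupby("LocalCountry")["Localscore"].agg(["mean", "max"])

# Number of home wins per country.
home_wins = (demo["Localscore"] > demo["Roadscore"]).groupby(demo["LocalCountry"]).sum()

print(stats)
print(home_wins)
```

Clubs missing from the dictionary would get `NaN` as their country, which is a handy way to spot names that do not match.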

# Creation of the classification (standings)

Now we are going to create the final classification: 

- Which columns are we going to use?
- Is any transformation needed?
- How can we handle both Local and Road teams?
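One possible approach to the last question is to stack the table twice, once from the local team's point of view and once from the road team's, and then aggregate per team. The sketch below uses three games from the outputs above; the column names `Team`, `PF`, `PA`, `Win` and the sort by wins and point difference are assumptions for illustration, not the official EuroLeague tie-break rules.

```python
import pandas as pd

# Three games taken from the outputs above.
demo = pd.DataFrame({
    "Local": ["Virtus Segafredo Bologna", "Zalgiris Kaunas", "Virtus Segafredo Bologna"],
    "Visitante": ["Zalgiris Kaunas", "Panathinaikos AKTOR Athens", "Partizan Mozzart Bet Belgrade"],
    "Localscore": [79, 80, 88],
    "Roadscore": [82, 68, 84],
})

# Same games seen from each side: PF = points for, PA = points against.
home = demo.rename(columns={"Local": "Team", "Visitante": "Opponent",
                            "Localscore": "PF", "Roadscore": "PA"})
away = demo.rename(columns={"Visitante": "Team", "Local": "Opponent",
                            "Roadscore": "PF", "Localscore": "PA"})
long = pd.concat([home, away], ignore_index=True)   # columns are aligned by name
long["Win"] = (long["PF"] > long["PA"]).astype(int)

# One row per team with games played, wins and points.
standings = long.groupby("Team").agg(
    Played=("Win", "size"), Wins=("Win", "sum"), PF=("PF", "sum"), PA=("PA", "sum")
)
standings["Diff"] = standings["PF"] - standings["PA"]
standings = standings.sort_values(["Wins", "Diff"], ascending=False)

print(standings)
```

After the stacking step every game appears twice, so each team's home and away results are counted with a single `groupby`.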

# WE WILL SEE THIS IN THE LAB
## Let's compare the competitions!

![](https://eurospects.com/wp-content/uploads/2018/10/eurocupeuroleague.png)