# Football in Denmark: Where are we playing?

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [5]:
%pip install git+https://github.com/alemartinello/dstapi #Installing the API (only need to do once)

Collecting git+https://github.com/alemartinello/dstapi
  Cloning https://github.com/alemartinello/dstapi to /private/var/folders/z8/1crkytq93gz7b66n_5673q3r0000gn/T/pip-req-build-9i0n08i2
  Running command git clone --filter=blob:none --quiet https://github.com/alemartinello/dstapi /private/var/folders/z8/1crkytq93gz7b66n_5673q3r0000gn/T/pip-req-build-9i0n08i2
  Resolved https://github.com/alemartinello/dstapi to commit d9eeb5a82cbc70b7d63b2ff44d92632fd77123a4
  Preparing metadata (setup.py) ... done
Note: you may need to restart the kernel to use updated packages.


In [None]:
%pip install pandas-datareader # Installing the data reader (only need to do once)

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams.update({"axes.grid":True,"grid.color":"black","grid.alpha":"0.25","grid.linestyle":"--"})
plt.rcParams.update({'font.size': 14})
import ipywidgets as widgets
# from matplotlib_venn import venn2
from dstapi import DstApi # install with `pip install git+https://github.com/alemartinello/dstapi`

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject


# Read and clean data

Import your data, either through an API or manually, and load it. 

In [30]:
columns_dict = {}
columns_dict['BLSTKOM'] = 'county'
columns_dict['AKTIVITET'] = 'activity'
columns_dict['KON'] = 'sex'
columns_dict['ALDER1'] = 'age'
columns_dict['TID'] = 'year'
columns_dict['INDHOLD'] = 'value'

#var_dict = {} # var is for variable
#var_dict['Football'] = 'football'

**Step 1:** Download all football observations from the `IDRAKT01` table.

In [47]:
idrakt_api = DstApi('IDRAKT01')  #Creating the DST API which will allow us to interact with the API server
params = idrakt_api._define_base_params(language='en') #Creating a parameter dictionary with the language set to English
variables = params['variables'] # Returns a view, that we can edit
variables[1]['values'] = ['A22'] # Select football as the activity (its ID is A22); approach adapted from https://alemartinello.com/2022/02/24/dstapi/
print(variables)

[{'code': 'BLSTKOM', 'values': ['*']}, {'code': 'AKTIVITET', 'values': ['A22']}, {'code': 'KON', 'values': ['*']}, {'code': 'ALDER1', 'values': ['*']}, {'code': 'Tid', 'values': ['*']}]
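
For reference, the same selection can be written out by hand instead of editing the list returned by `_define_base_params`. The sketch below is only an illustration: the `variables` list mirrors the printed output above, while the top-level `table`, `format` and `lang` keys are assumptions about what the helper fills in and should be checked against `params` before use.

```python
# Hand-written version of the parameter dictionary (a sketch, not the dstapi default).
# ASSUMPTION: the top-level keys 'table', 'format' and 'lang' match what
# _define_base_params(language='en') returns; verify by printing `params`.
manual_params = {
    'table': 'idrakt01',
    'format': 'BULK',
    'lang': 'en',
    'variables': [
        {'code': 'BLSTKOM', 'values': ['*']},      # all areas
        {'code': 'AKTIVITET', 'values': ['A22']},  # football only
        {'code': 'KON', 'values': ['*']},          # all sexes
        {'code': 'ALDER1', 'values': ['*']},       # all age groups
        {'code': 'Tid', 'values': ['*']},          # all years
    ],
}

# Look up a variable by its code rather than by its position in the list
by_code = {v['code']: v['values'] for v in manual_params['variables']}
print(by_code['AKTIVITET'])
```

Selecting by code avoids relying on `variables[1]` happening to be the activity variable.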


In [48]:
idrakt = idrakt_api.get_data(params=params) #Downloading the dataset

In [49]:
idrakt.head() #Looking at the dataset

Unnamed: 0,BLSTKOM,AKTIVITET,KON,ALDER1,TID,INDHOLD
0,Herning,Football,Men,"Age, total",2017,4950
1,Herning,Football,Men,0-12 years,2017,1850
2,Herning,Football,Men,13-18 years,2017,1070
3,Herning,Football,Men,19-24 years,2017,660
4,Herning,Football,Men,25-59 years,2017,1250


**Step 2:** Rename columns using `columns_dict`.

In [50]:
idrakt.rename(columns=columns_dict,inplace=True)
idrakt.head()

Unnamed: 0,county,activity,sex,age,year,value
0,Herning,Football,Men,"Age, total",2017,4950
1,Herning,Football,Men,0-12 years,2017,1850
2,Herning,Football,Men,13-18 years,2017,1070
3,Herning,Football,Men,19-24 years,2017,660
4,Herning,Football,Men,25-59 years,2017,1250


**Step 3:** Keep only rows where `age` is `Age, total`, then drop the `age` column.

In [51]:
# Keep only rows with age == 'Age, total', then drop the age column
idrakt = idrakt[idrakt['age'] == 'Age, total'].copy() # .copy() avoids a SettingWithCopyWarning on the in-place drop
idrakt.drop(columns=['age'],inplace=True)
idrakt.head()

Unnamed: 0,county,activity,sex,year,value
0,Herning,Football,Men,2017,4950
6,Herning,Football,"Sex, total",2017,6240
12,Herning,Football,Women,2017,1290
18,Horsens,Football,Men,2017,4580
24,Horsens,Football,"Sex, total",2017,6240


**Step 4:** Keep only rows where `county` is a region or the national total.

In [52]:
# Keep rows where 'county' starts with "Region" or equals "All Denmark"
idrakt = idrakt[idrakt['county'].str.startswith('Region') | (idrakt['county'] == 'All Denmark')].copy()
idrakt.head()

Unnamed: 0,county,activity,sex,year,value
640,All Denmark,Football,Men,2017,293990
646,All Denmark,Football,"Sex, total",2017,358270
652,All Denmark,Football,Women,2017,64280
2088,All Denmark,Football,Men,2021,296210
2094,All Denmark,Football,"Sex, total",2021,368060


**Step 5:** Sort the dataset by county and year

In [53]:
# Sorting the dataset by county and then year
idrakt.sort_values(by=['county','year'],inplace=True)
idrakt.reset_index(drop=True,inplace=True)
idrakt.head()

Unnamed: 0,county,activity,sex,year,value
0,All Denmark,Football,Men,2014,307430
1,All Denmark,Football,"Sex, total",2014,374400
2,All Denmark,Football,Women,2014,66970
3,All Denmark,Football,Men,2015,306540
4,All Denmark,Football,"Sex, total",2015,372640


In [25]:
# Creating a copy of the dataset where we only keep the rows with the value 'All Denmark' in the 'county' column
idrakt_all = idrakt[idrakt['county'] == 'All Denmark'].copy()
idrakt_all.drop(columns=['county'],inplace=True)
idrakt_all.head()

Unnamed: 0,activity,sex,year,value
640,Football,Men,2017,293990
646,Football,"Sex, total",2017,358270
652,Football,Women,2017,64280
2088,Football,Men,2021,296210
2094,Football,"Sex, total",2021,368060


In [29]:
#Sorting the total dataset by year
idrakt_all.sort_values(by='year',inplace=True)
idrakt_all.head()

Unnamed: 0,activity,sex,year,value
10104,Football,"Sex, total",2014,374400
10098,Football,Men,2014,307430
10110,Football,Women,2014,66970
13590,Football,Men,2015,306540
13596,Football,"Sex, total",2015,372640
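
Steps 2 to 5 can be collected in one reusable function, in the spirit of the helper mentioned in the note at the top. The sketch below is a minimal version under stated assumptions: the name `clean_idrakt` is hypothetical and is not taken from `dataproject.py`, and the small `raw` frame only reuses rows shown in the outputs above.

```python
import pandas as pd

# Same mapping as columns_dict above
COLUMNS = {
    'BLSTKOM': 'county', 'AKTIVITET': 'activity', 'KON': 'sex',
    'ALDER1': 'age', 'TID': 'year', 'INDHOLD': 'value',
}

def clean_idrakt(raw):
    """Hypothetical helper: rename, filter, and sort an IDRAKT01 extract."""
    df = raw.rename(columns=COLUMNS)
    # Keep the age total only, then drop the now-constant age column
    df = df[df['age'] == 'Age, total'].drop(columns=['age'])
    # Keep regions and the national total
    keep = df['county'].str.startswith('Region') | (df['county'] == 'All Denmark')
    df = df[keep]
    return df.sort_values(by=['county', 'year']).reset_index(drop=True)

# A few rows copied from the outputs shown earlier in this section
raw = pd.DataFrame({
    'BLSTKOM': ['Herning', 'Herning', 'All Denmark', 'All Denmark', 'All Denmark', 'All Denmark'],
    'AKTIVITET': ['Football'] * 6,
    'KON': ['Men', 'Men', 'Men', 'Women', 'Men', 'Women'],
    'ALDER1': ['Age, total', '0-12 years', 'Age, total', 'Age, total', 'Age, total', 'Age, total'],
    'TID': [2017, 2017, 2017, 2017, 2014, 2014],
    'INDHOLD': [4950, 1850, 293990, 64280, 307430, 66970],
})

cleaned = clean_idrakt(raw)
print(cleaned)
```

Because the function returns a new frame instead of modifying its input, it can be re-run on a fresh download without side effects.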


## Explore each data set

To **explore the raw data**, provide **static** and **interactive plots** that show important developments.
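
A static plot could look like the following. This is a minimal sketch, not a finished figure: the numbers are the `All Denmark` values for men and for both sexes copied from the outputs in the cleaning section, and the axis labels are placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

# National values copied from the outputs in the cleaning section
all_dk = pd.DataFrame({
    'year': [2014, 2015, 2017, 2021],
    'Men': [307430, 306540, 293990, 296210],
    'Sex, total': [374400, 372640, 358270, 368060],
}).set_index('year')

fig, ax = plt.subplots(figsize=(8, 4))
for column in all_dk.columns:
    # One line per series, with markers because the years are not consecutive
    ax.plot(all_dk.index, all_dk[column], marker='o', label=column)
ax.set_xlabel('year')
ax.set_ylabel('value (as reported in IDRAKT01)')
ax.set_title('Football, All Denmark')
ax.legend()
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, `fig.savefig(...)` would be needed to keep it.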

**Interactive plot** :

Explain what you see when moving elements of the interactive plot around. 

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
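
One possible aggregation is the women's share of the national total per year. The sketch below assumes the tidy layout produced in the cleaning section (`sex`, `year`, `value`) and uses only the 2014 and 2017 `All Denmark` rows shown in the outputs above; the column name `women_share` is introduced here for illustration.

```python
import pandas as pd

# All Denmark rows copied from the outputs in the cleaning section
all_dk = pd.DataFrame({
    'sex': ['Men', 'Sex, total', 'Women', 'Men', 'Sex, total', 'Women'],
    'year': [2014, 2014, 2014, 2017, 2017, 2017],
    'value': [307430, 374400, 66970, 293990, 358270, 64280],
})

# Reshape to one row per year and one column per sex
wide = all_dk.pivot(index='year', columns='sex', values='value')

# Sanity check: men and women should add up to the reported total
totals_match = (wide['Men'] + wide['Women'] == wide['Sex, total']).all()

# Women's share of the total in each year
wide['women_share'] = wide['Women'] / wide['Sex, total']

print(totals_match)
print(wide)
```

The same pivot works on the full `idrakt_all` frame once it has been cleaned as above.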

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.