# Project 5 - Machine Learning Model in Python

Artificial intelligence has been with us for a long time, but it has never developed at such a dizzying pace as it does today. Its models accompany us at every step, from phone assistants and online stores that recommend applications or products to complex language models such as ChatGPT or Bard.

In this project, I would like to use a popular set of tools in Python, one of the most popular programming languages, not only to better understand the relationships in the data used in previous projects, but also to build a model that predicts the scores obtained by gymnastics competitors!

Tools used:

- Jupyter notebook

- Python 3.11.2

At the very beginning, we will prepare the environment in which we will work!

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

OK, the environment is ready - now we need data!

In [2]:
# base folder with the project data (local path on the author's machine)
data_dir = "E:/Gymnastics on GitHub!/Gymnastics-on-GitHub/Project 5 - ML model in Python/Data used in this project"
junior_dir = f"{data_dir}/2nd_Junior_World_Championship_2023"
senior_dir = f"{data_dir}/51_FIG_World_Championship_2022"

# 2nd Junior World Championship 2023
df_junior_qualification = pd.read_csv(f"{junior_dir}/Qualification.csv", sep=';')
df_junior_AA = pd.read_csv(f"{junior_dir}/AA_Final.csv", sep=';')
df_junior_FX = pd.read_csv(f"{junior_dir}/FX_Final.csv", sep=';')
df_junior_PH = pd.read_csv(f"{junior_dir}/PH_Final.csv", sep=';')
df_junior_SR = pd.read_csv(f"{junior_dir}/SR_Final.csv", sep=';')
df_junior_VT = pd.read_csv(f"{junior_dir}/VT_Final.csv", sep=';')
df_junior_PB = pd.read_csv(f"{junior_dir}/PB_Final.csv", sep=';')
df_junior_HB = pd.read_csv(f"{junior_dir}/HB_Final.csv", sep=';')

# 51st FIG World Championship 2022
df_senior_qualification = pd.read_csv(f"{senior_dir}/Qualification.csv", sep=';')
df_senior_AA = pd.read_csv(f"{senior_dir}/AA_Final.csv", sep=';')
df_senior_FX = pd.read_csv(f"{senior_dir}/FX_Final.csv", sep=';')
df_senior_PH = pd.read_csv(f"{senior_dir}/PH_Final.csv", sep=';')
df_senior_SR = pd.read_csv(f"{senior_dir}/SR_Final.csv", sep=';')
df_senior_VT = pd.read_csv(f"{senior_dir}/VT_Final.csv", sep=';')
df_senior_PB = pd.read_csv(f"{senior_dir}/PB_Final.csv", sep=';')
df_senior_HB = pd.read_csv(f"{senior_dir}/HB_Final.csv", sep=';')
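
Since every file is read the same way, the loading step could also be written as a loop. The sketch below is only an illustration of that idea, not the code used in this notebook: the helper `load_results` is a hypothetical name, and the two tiny CSV files are made-up placeholders written to a temporary folder so that the example runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_results(data_dir, file_stems, sep=";"):
    """Read <stem>.csv for every stem in file_stems into a dict of DataFrames.

    Hypothetical helper for illustration; the notebook reads each file explicitly.
    """
    data_dir = Path(data_dir)
    return {
        stem: pd.read_csv(data_dir / f"{stem}.csv", sep=sep)
        for stem in file_stems
    }


# Demo on made-up placeholder files (not the real championship data).
with tempfile.TemporaryDirectory() as tmp:
    for stem in ["Qualification", "FX_Final"]:
        Path(tmp, f"{stem}.csv").write_text(
            "Rank;Name;NOC\n1;Example Gymnast;XXX\n", encoding="utf-8"
        )
    frames = load_results(tmp, ["Qualification", "FX_Final"])

print(list(frames))
print(frames["FX_Final"])
```

With the real data, `data_dir` would point at one championship folder and `file_stems` would list the file names without the `.csv` extension. A dictionary keyed by file name also makes it easy to apply the same cleaning step to every table later.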

Ok, we have all the data that we will use to build the model. However, before we start creating it, let's take a look at the data itself and see if it requires cleaning.

In [9]:
print(df_junior_FX.columns)
print(df_junior_HB.columns)
print(df_senior_FX.columns)
print(df_junior_HB.columns)

Index(['Rank', 'Name', 'NOC', 'D\nScore', 'E\nScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'NOC', 'D\nScore', 'E\nScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'NOC', 'D_Score', 'E_Score', 'Penalty', 'Total_Score'], dtype='object')
Index(['Rank', 'Name', 'NOC', 'D\nScore', 'E\nScore', 'Penalty',
       'Total Score'],
      dtype='object')
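
The headers printed above come in two styles: the junior files contain a line break or a space inside the name ('D\nScore', 'Total Score'), while the senior file uses underscores ('D_Score', 'Total_Score'). One possible way to bring them to a common form is to normalise each name with a small function. This is a sketch of that idea only; `normalize_column_name` is a hypothetical helper and is not part of the notebook.

```python
import re


def normalize_column_name(name):
    """Replace every run of whitespace (including line breaks) or underscores
    with a single underscore."""
    return re.sub(r"[\s_]+", "_", name.strip())


# Header names exactly as printed for the junior files above.
raw_headers = ["Rank", "Name", "NOC", "D\nScore", "E\nScore", "Penalty", "Total Score"]
clean_headers = [normalize_column_name(h) for h in raw_headers]
print(clean_headers)
# ['Rank', 'Name', 'NOC', 'D_Score', 'E_Score', 'Penalty', 'Total_Score']
```

Because `DataFrame.rename` accepts a function for its `columns` argument, such a helper could be applied as `frame.rename(columns=normalize_column_name)`. Unlike assigning a whole list to `frame.columns`, this does not depend on the number or order of the columns.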


Looking more closely at the imported data, we notice two problems. First, the columns containing numerical values do not have the correct type, which absolutely needs to be fixed. Second, the column names are inconsistent between files (for example 'D\nScore' in the junior files versus 'D_Score' in the senior file), so let's fix that too.
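
To show what checking and fixing the column types can look like, here is a minimal sketch. It rests on an assumption: the numbers were read as text because they use a decimal comma, which is common in semicolon-separated files. Whether that is the actual cause here is not confirmed by the output above. The rows are made up for illustration, and `to_numeric_columns` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the championship files.
df_example = pd.DataFrame({
    "Name": ["Gymnast A", "Gymnast B"],
    "D_Score": ["5,400", "5,100"],   # assumption: decimal comma stored as text
    "E_Score": ["8,233", "8,000"],
})

# Text columns show up as non-numeric dtypes here.
print(df_example.dtypes)


def to_numeric_columns(frame, columns, decimal=","):
    """Return a copy of frame with the given columns converted to numbers.

    Values that cannot be parsed become NaN instead of raising an error.
    """
    out = frame.copy()
    for col in columns:
        as_text = out[col].astype(str).str.replace(decimal, ".", regex=False)
        out[col] = pd.to_numeric(as_text, errors="coerce")
    return out


converted = to_numeric_columns(df_example, ["D_Score", "E_Score"])
print(converted.dtypes)
```

If the decimal comma really is the cause, passing `decimal=','` to `pd.read_csv` would be another way to get numeric columns directly at load time.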

First, let's fix the column names:

In [18]:
# lists with the correct column names for each type of results table
apparatus_col_names = ['Rank','Name','Country','DScore','EScore','Penalty','Total Score']

vault_col_names = ['Rank','Name','Country','DScore','EScore','Penalty','Total Score']

aa_col_names = ['Rank','Name','Country','DScore','EScore','Penalty']

# list with the apparatus final DataFrames (the vault finals are not included here)
finals_data_frame = [df_junior_FX, df_junior_PH, df_junior_SR, df_junior_PB, df_junior_HB,
                     df_senior_FX, df_senior_PH, df_senior_SR, df_senior_PB, df_senior_HB]

aa_data_frame = [df_junior_qualification, df_junior_AA,
                 df_senior_qualification, df_senior_AA]


# rename the columns of every apparatus final DataFrame
for frame in finals_data_frame:
    frame.columns = apparatus_col_names

# check the results
for frame in finals_data_frame:
    print(frame.columns)

Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Rank', 'Name', 'Country', 'DScore', 'EScore', 'Penalty',
       'Total Score'],
      dtype='object')
Index(['Ra