## Importing Required Libraries
For this lab, we will be using the following libraries:
*  pandas for managing the data.
*  numpy for mathematical operations.
*  sklearn for machine learning and machine-learning-pipeline related functions.
*  seaborn for visualizing the data.
*  matplotlib for additional plotting tools.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# `%matplotlib inline` is an IPython/Jupyter magic command (not Python syntax). It renders matplotlib figures directly below the cell that creates them, so no separate window is needed.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits, load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

*`MLPClassifier` is scikit-learn's neural-network model for classification tasks. MLP stands for Multi-Layer Perceptron, a type of feedforward artificial neural network. The network consists of multiple layers of nodes; each node computes a weighted sum of its inputs and passes the result through an activation function. The output of one layer is the input to the next until the final layer produces the output. `MLPClassifier` is used for supervised learning: it is trained on labeled data to predict the class label of new, unseen data.*
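
The layer-by-layer computation described above can be sketched in plain NumPy. This is only an illustration of the idea, not scikit-learn's internal code: the layer sizes, the random weights, and the helper names `relu` and `softmax` are assumptions made for the example.

```python
import numpy as np

def relu(z):
    """Hidden-layer activation: keep positive values, zero out the rest."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn each row of scores into probabilities that sum to 1."""
    shifted = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): 4 samples, 3 features, 5 hidden nodes, 2 classes.
X = rng.normal(size=(4, 3))
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)

hidden = relu(X @ W1 + b1)         # weighted sum of inputs, then activation
probs = softmax(hidden @ W2 + b2)  # hidden output feeds the next (final) layer
predicted = probs.argmax(axis=1)   # most probable class per sample

print(probs.shape)
print(predicted)
```

Training consists of adjusting `W1`, `b1`, `W2`, and `b2` so that the predicted classes match the labels; `MLPClassifier` handles that step when `fit` is called.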

*`GridSearchCV` is a scikit-learn class that tunes the hyperparameters of a model by exhaustively searching over a specified parameter grid. It trains and evaluates the model with cross-validation (the "CV" in the name) for each combination of hyperparameters in the grid and selects the combination with the best score. This process can be computationally expensive, since the number of fits grows with the product of the grid sizes and the number of folds, but it can lead to improved model performance and generalization.*
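
As a minimal sketch of how the two classes fit together, the example below tunes an `MLPClassifier` on the `load_digits` dataset imported earlier. The grid values, `cv=3`, `max_iter=300`, and the 25% test split are assumptions chosen to keep the run short, not recommended settings.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Load a small built-in dataset and hold out a test set.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative grid: 2 x 2 = 4 hyperparameter combinations.
param_grid = {
    "hidden_layer_sizes": [(32,), (64,)],
    "alpha": [1e-4, 1e-2],
}

# Each combination is evaluated with 3-fold cross-validation on the training set.
search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X_train, y_train)

# By default the best combination is refit on the whole training set,
# so the search object can predict directly.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_)
print(round(test_accuracy, 3))
```

The test set is used only once, after the search, so the reported accuracy is not influenced by the hyperparameter selection.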

## Collect and Analyse the Data
The following are all national team game results from 2022 (source: Kaggle, https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017).

Modified dataset: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IND-GPXX0TUZEN/Training_games.csv

In [3]:
path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IND-GPXX0TUZEN/Training_games.csv'

Game_all = pd.read_csv(path)
Game_all.head()

Unnamed: 0.1,Unnamed: 0,home_team,away_team,home_score,away_score,tournament,city,country,neutral
0,2022-01-02,Gabon,Burkina Faso,0.0,3.0,Friendly,Dubai,United Arab Emirates,True
1,2022-01-02,Sudan,Zimbabwe,0.0,0.0,Friendly,Yaoundé,Cameroon,True
2,2022-01-03,Rwanda,Guinea,3.0,0.0,Friendly,Kigali,Rwanda,False
3,2022-01-04,Mauritania,Gabon,1.0,1.0,Friendly,Dubai,United Arab Emirates,True
4,2022-01-05,Algeria,Ghana,3.0,0.0,Friendly,Al Rayyan,Qatar,True


To get the column names of the `Game_all` DataFrame, use its `columns` attribute. Calling `tolist()` converts the result to a plain Python list, which the cell below prints.

In [5]:
column_list = Game_all.columns.tolist()
print(column_list)

['Unnamed: 0', 'home_team', 'away_team', 'home_score', 'away_score', 'tournament', 'city', 'country', 'neutral']
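
The output shows that the match dates sit in a column named 'Unnamed: 0'. A possible cleanup, sketched here on a two-row sample copied from the table above, is to rename that column and parse it as dates. The new name `date` and the variable names `sample` and `renamed` are assumptions for illustration; the rest of the lab keeps the original column name.

```python
import pandas as pd

# Two rows copied from the head of the dataset, for illustration only.
sample = pd.DataFrame({
    "Unnamed: 0": ["2022-01-02", "2022-01-03"],
    "home_team": ["Gabon", "Rwanda"],
    "away_team": ["Burkina Faso", "Guinea"],
})

# Give the date column a descriptive name ("date" is an assumed name).
renamed = sample.rename(columns={"Unnamed: 0": "date"})

# Convert the strings to real datetime values so they can be sorted or filtered.
renamed["date"] = pd.to_datetime(renamed["date"])

print(renamed.columns.tolist())
print(renamed.dtypes)
```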


To get the contents of a specific column in the `Game_all` DataFrame, you can use indexing with the specific column name. This will print the contents of the specified column.

In [7]:
column_contents = Game_all["home_team"]
print(column_contents)

0           Gabon
1           Sudan
2          Rwanda
3      Mauritania
4         Algeria
          ...    
736        Norway
737        Sweden
738        Kosovo
739        Greece
740          Fiji
Name: home_team, Length: 741, dtype: object


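To remove duplicates from a column of the `Game_all` DataFrame, call `drop_duplicates()` on that column (a pandas Series). The method keeps the first occurrence of each value together with its original index label, which is why the index in the output below has gaps. The cell prints the unique values of the 'home_team' column.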

In [8]:
unique_column_contents = Game_all["home_team"].drop_duplicates()
print(unique_column_contents)

0                  Gabon
1                  Sudan
2                 Rwanda
3             Mauritania
4                Algeria
             ...        
570          Afghanistan
574              Myanmar
597              Mapuche
599               Aymara
603    Brunei Darussalam
Name: home_team, Length: 206, dtype: object
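
pandas offers a few closely related ways to look at distinct values. The sketch below compares them on a small toy Series (made-up rows, not the real dataset) so the difference in return types is easy to see.

```python
import pandas as pd

# Toy column with one repeated team (illustrative data).
home = pd.Series(["Gabon", "Sudan", "Rwanda", "Gabon"], name="home_team")

deduped = home.drop_duplicates()  # Series; keeps the first occurrence and its index label
values = home.unique()            # NumPy array, in order of first appearance
count = home.nunique()            # number of distinct values

print(deduped.index.tolist())
print(list(values))
print(count)
```

`drop_duplicates()` is convenient when the original row positions matter, while `unique()` and `nunique()` are shorter when only the values or their count are needed.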


The next cell extracts the unique team names from the 'home_team' and 'away_team' columns of the `Game_all` DataFrame and uses them to build the `Teams` list. The two lists of unique values are concatenated, duplicates are removed with `set()`, and the result is converted back to a list with `list()`.

It then creates a new DataFrame called `Game_used` that contains only the rows where either the 'home_team' or the 'away_team' value is present in `Teams`. Because `Teams` is built from these same two columns, the filter keeps every row, so `Game_used` has the same rows as `Game_all`.

Finally, `len()` gives the number of elements in `Teams`, which is the number of unique team names.

In [18]:
home_teams = Game_all['home_team'].unique().tolist()
away_teams = Game_all['away_team'].unique().tolist()
Teams = list(set(home_teams + away_teams))
Game_used = Game_all[Game_all['home_team'].isin(Teams) | Game_all['away_team'].isin(Teams)]
print(Teams)
print(len(Teams))

['Guinea-Bissau', 'Luxembourg', 'Senegal', 'Brunei', 'Malaysia', 'Fiji', 'Azerbaijan', 'Wales', 'Jordan', 'Kuwait', 'New Zealand', 'Andorra', 'Kyrgyzstan', 'Iraq', 'Ecuador', 'Philippines', 'Morocco', 'South Africa', 'Palestine', 'El Salvador', 'Northern Ireland', 'Gambia', 'Portugal', 'Mauritius', 'Republic of Ireland', 'Togo', 'Cyprus', 'Myanmar', 'Turks and Caicos Islands', 'Saudi Arabia', 'China PR', 'Saint Martin', 'Timor-Leste', 'Bahrain', 'Slovenia', 'Cook Islands', 'Ukraine', 'Burundi', 'Greece', 'Sweden', 'DR Congo', 'Bangladesh', 'Venezuela', 'Equatorial Guinea', 'Tunisia', 'Somalia', 'Yemen', 'Ivory Coast', 'Zambia', 'Sierra Leone', 'Laos', 'Dominica', 'Netherlands', 'Cayman Islands', 'Hong Kong', 'Antigua and Barbuda', 'Rwanda', 'Sint Maarten', 'Grenada', 'Indonesia', 'Finland', 'Cape Verde', 'Papua New Guinea', 'Gibraltar', 'Norway', 'United States', 'Namibia', 'Uganda', 'Scotland', 'Martinique', 'Israel', 'Peru', 'Aruba', 'Slovakia', 'Trinidad and Tobago', 'England', 'Cro
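
A Python `set` has no guaranteed order, so the printed list can appear in a different order from one run to the next. If a reproducible order is useful, one option is to sort the result. The sketch below uses three rows taken from the table above; `games` and `teams_sorted` are names chosen for this example.

```python
import pandas as pd

# Three rows taken from the head of the dataset, for illustration only.
games = pd.DataFrame({
    "home_team": ["Gabon", "Sudan", "Mauritania"],
    "away_team": ["Burkina Faso", "Zimbabwe", "Gabon"],
})

# Union of both columns, then sorted alphabetically for a stable order.
teams_sorted = sorted(set(games["home_team"]) | set(games["away_team"]))

print(teams_sorted)       # ['Burkina Faso', 'Gabon', 'Mauritania', 'Sudan', 'Zimbabwe']
print(len(teams_sorted))  # 5
```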

In [14]:
print(Game_used)

     Unnamed: 0   home_team         away_team  home_score  away_score  \
0    2022-01-02       Gabon      Burkina Faso         0.0         3.0   
1    2022-01-02       Sudan          Zimbabwe         0.0         0.0   
2    2022-01-03      Rwanda            Guinea         3.0         0.0   
3    2022-01-04  Mauritania             Gabon         1.0         1.0   
4    2022-01-05     Algeria             Ghana         3.0         0.0   
..          ...         ...               ...         ...         ...   
736  2022-09-27      Norway            Serbia         0.0         2.0   
737  2022-09-27      Sweden          Slovenia         1.0         1.0   
738  2022-09-27      Kosovo            Cyprus         5.0         1.0   
739  2022-09-27      Greece  Northern Ireland         3.0         1.0   
740  2022-09-30        Fiji   Solomon Islands         NaN         NaN   

                   tournament        city               country  neutral  
0                    Friendly       Dubai  Unite

To make sure that every team going to the World Cup is present in the training set, compare the list of teams in the training set with the list of World Cup teams.

The code below extracts the unique team names from the 'home_team' and 'away_team' columns of the `Game_all` DataFrame and concatenates them into a single list called `training_teams`. It then uses `set()` and the `-` operator to find any names in `Teams` that are missing from `training_teams`. If nothing is missing, it prints a message saying that all World Cup teams are present in the training set; otherwise, it prints the missing teams.

Note that `Teams` was built above from these same two columns, so the check as written always passes. It becomes meaningful only when `Teams` holds the actual list of World Cup teams.

In [19]:
training_teams = Game_all['home_team'].unique().tolist() + Game_all['away_team'].unique().tolist()
missing_teams = set(Teams) - set(training_teams)

if len(missing_teams) == 0:
    print("All World Cup teams are present in the training set.")
else:
    print("The following World Cup teams are missing from the training set:")
    print(missing_teams)


All World Cup teams are present in the training set.
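
For the comparison to be informative, the reference list has to come from somewhere other than the training data. The sketch below shows the same set-difference idea with a separate list. The name `world_cup_teams` and its three entries are a partial list chosen for illustration, and `games` is a toy DataFrame, not the real dataset.

```python
import pandas as pd

# Toy training data (three rows in the style of the dataset).
games = pd.DataFrame({
    "home_team": ["Algeria", "Norway", "Sweden"],
    "away_team": ["Ghana", "Serbia", "Slovenia"],
})

# Hypothetical, partial reference list defined independently of the data.
world_cup_teams = ["Ghana", "Serbia", "Qatar"]

# Every team that appears in either column of the training data.
training_teams = set(games["home_team"]) | set(games["away_team"])

# Teams on the reference list that never appear in the training data.
missing_teams = sorted(set(world_cup_teams) - training_teams)

if not missing_teams:
    print("All World Cup teams are present in the training set.")
else:
    print("The following World Cup teams are missing from the training set:")
    print(missing_teams)
```

In this toy example the second branch runs, because 'Qatar' does not appear in either column of `games`.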


## Training the set