# Analysis of UEFA Euro Cup Matches

## Introduction

The UEFA European Football Championship, commonly known as the UEFA European Championship and informally as the Euros, is the primary association football competition contested by the senior men's national teams of the members of the Union of European Football Associations (UEFA), determining the continental champion of Europe. Held every four years since 1960, in the even-numbered year between World Cup tournaments, it was originally called the European Nations' Cup, changing to the current name in 1968. Starting with the 1996 tournament, specific championships are often referred to in the form "UEFA Euro [year]"; this format has since been retroactively applied to earlier tournaments.
Prior to entering the tournament, all teams other than the host nations (which qualify automatically) compete in a qualifying process. Until 2016 the championship winners could compete in the following FIFA Confederations Cup, but were not obliged to do so. 
The 15 European Championship tournaments have been won by ten national teams: Germany and Spain have each won three titles, France has two titles, and the Soviet Union, Italy, Czechoslovakia, the Netherlands, Denmark, Greece and Portugal have won one title each. To date, Spain is the only team to have won consecutive titles, doing so in 2008 and 2012. The championship is the second most watched football tournament in the world after the FIFA World Cup. The Euro 2012 final was watched by a global audience of around 300 million.
The most recent championship, hosted by France in 2016, was won by Portugal, who beat France 1–0 after extra time in the final at the Stade de France in Saint-Denis. That final attracted 284 million viewers, making it the second most viewed game in European tournament history.
Source: https://en.wikipedia.org/wiki/UEFA_European_Championship

## Data description

Many matches have been played over the history of the UEFA European Championship. After the FIFA World Cup it is the most popular football cup, and it is held once every four years. All European national teams that qualify take part in it.
The analysis is based on all matches played from 1960 to 2016. The scraped data used for the analysis contains the following fields:

Date – Date of the match

Time – Time of the match

HTN – Home team name

ATN – Away team name

HTG – Home team goals

ATG – Away team goals

Attendance – Number of spectators at the match

Year – Year of the match

PlayerName – Name of the player

Goals – Goals scored by the player

TS – Top scorers in the history of the Cup
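
The abbreviations above differ from the column headers that appear in the data frames in this analysis (for example `HomeTeamName` rather than HTN). A minimal sketch of the correspondence, assuming HTN, ATN, HTG and ATG stand for the columns with the same meaning; the `matches` frame and its values are made up for illustration:

```python
import pandas as pd

# Assumed mapping from the CSV headers to the abbreviations used in the text.
ABBREVIATIONS = {
    "HomeTeamName": "HTN",
    "AwayTeamName": "ATN",
    "HomeTeamGoals": "HTG",
    "AwayTeamGoals": "ATG",
}

# Tiny stand-in for the matches table; the values are illustrative only.
matches = pd.DataFrame({
    "HomeTeamName": ["Team A"],
    "AwayTeamName": ["Team B"],
    "HomeTeamGoals": [2],
    "AwayTeamGoals": [1],
})

# rename() returns a new frame and leaves the original headers untouched.
short = matches.rename(columns=ABBREVIATIONS)
print(list(short.columns))
```

Renaming is optional; the rest of the analysis works just as well with the full column names.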

## Research questions

In this project, I will focus on the following questions:

Analyze the top 10 scorers of the Cup

Analyze the attendance of the Cup

Analyze the home team goals of each Cup

Analyze the away team goals of each Cup

Analyze the goals of each team
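
Two of these questions can be sketched with plain pandas aggregations. The example below is only an illustration under stated assumptions: the helper names `top_scorers` and `goals_per_tournament` are hypothetical, the rows are invented, and it assumes that a captain marker such as `( captain )` inside the player name should be removed before grouping so that the same player is not counted twice.

```python
import pandas as pd

# Illustrative rows only; none of these values come from the real data set.
players = pd.DataFrame({
    "PlayerName(Captain)": ["Player A", "Player A ( captain )", "Player B", "Player C"],
    "Goals": [2, 3, 1, None],
    "Year": [2008, 2012, 2012, 2012],
})
matches = pd.DataFrame({
    "HomeTeamGoals": [2, 0, 1],
    "AwayTeamGoals": [1, 0, 3],
    "Year": [2008, 2008, 2012],
})


def top_scorers(df, n=10):
    """Sum goals per player over all tournaments and return the n highest totals."""
    # Strip the captain marker so one player always maps to one name.
    names = (
        df["PlayerName(Captain)"]
        .str.replace(r"\s*\(\s*captain\s*\)", "", regex=True)
        .str.strip()
    )
    # Treat a missing goal count as zero goals (an assumption).
    goals = df["Goals"].fillna(0)
    return goals.groupby(names).sum().nlargest(n)


def goals_per_tournament(df):
    """Total home and away goals for each tournament year."""
    return df.groupby("Year")[["HomeTeamGoals", "AwayTeamGoals"]].sum()


scorers = top_scorers(players, n=2)
per_year = goals_per_tournament(matches)
print(scorers)
print(per_year)
```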

## Data manipulation

Drop the columns that will not be used

Check for inconsistencies

In [12]:
#Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

#Load the datasets: df1 holds the matches, df2 holds the players
df1 = pd.read_csv('Uefa Euro Cup All Matches.csv')
df2 = pd.read_csv('Uefa Euro Cup All Players.csv')
df1.head(5)
df2.head(5)

#Drop the columns that are not needed
player_cols_to_drop = ['Caps', 'Club', 'ShirtNumber', 'DateofBirth(age)']
df2.drop(columns=player_cols_to_drop, inplace=True)

match_cols_to_drop = ['Stage', 'SpecialWinConditions', 'Stadium', 'City']
df1.drop(columns=match_cols_to_drop, inplace=True)

I have dropped the unused columns from both data frames.

Next, I check whether data frame 1 contains any NaN values.

In [14]:
#Check for NaN values
df1.isnull().any()

Date             False
Time             False
HomeTeamName     False
AwayTeamName     False
HomeTeamGoals    False
AwayTeamGoals    False
Attendance       False
Year             False
dtype: bool

The first data frame has no missing values.

Next, I check data frame 2 for NaN values.

In [29]:
#Checking for NaN values
df2.isnull().any()

Position               False
PlayerName(Captain)    False
Goals                   True
Country                False
Year                   False
dtype: bool

As we can see, there are NaN values in the Goals column.

In [16]:
df2.isnull().sum()

Position                  0
PlayerName(Captain)       0
Goals                  2102
Country                   0
Year                      0
dtype: int64

There are 2102 NaN values, which need to be replaced with 0 for future analysis.

In [25]:
df2.shape

(3410, 5)

The second data frame has 3410 rows, and 2102 of them have NaN in the Goals column.

I treat a NaN in Goals as meaning that the player did not score, so these values should be replaced with 0.
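
The share of missing values follows directly from the two counts reported above. A small sketch, using a made-up toy frame (`players` and its values are illustrative, not the real data), shows how the count and the share can be read off with `isnull()`, followed by the same arithmetic for the reported numbers:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df2; the values are illustrative only.
players = pd.DataFrame({
    "PlayerName(Captain)": ["A", "B", "C", "D"],
    "Goals": [1.0, np.nan, np.nan, 2.0],
})

# Number of missing entries and their share of all rows.
missing_count = players["Goals"].isnull().sum()
missing_share = players["Goals"].isnull().mean()
print(missing_count, missing_share)

# Same arithmetic for the counts reported in this analysis.
reported_share = 2102 / 3410
print(round(reported_share, 3))
```

By this arithmetic roughly 62% of the player rows have no recorded goal count, so the way NaN is handled has a large effect on any per-player statistic.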

In [28]:
#Replace NaN with 0 in the Goals column
df2['Goals'] = df2['Goals'].fillna(0)

In [21]:
#Show data frame 2
df2.head()

   Position         PlayerName(Captain)  Goals         Country  Year
0        GK              Justín Javorek      0  Czechoslovakia  1960
1        GK              Viliam Schrojf      0  Czechoslovakia  1960
2        DF  Ladislav Novák ( captain )      0  Czechoslovakia  1960
3        DF                Ján Popluhár      0  Czechoslovakia  1960
4        DF          František Šafránek      0  Czechoslovakia  1960


The NaN values have been replaced with 0.
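
A minimal sketch of the fill step on a toy frame, assuming that a missing value in Goals means zero goals. The `players` frame is invented for illustration, and the cast to integer is an optional extra that is not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df2; the values are illustrative only.
players = pd.DataFrame({
    "PlayerName(Captain)": ["A", "B", "C"],
    "Goals": [np.nan, 2.0, np.nan],
})

# Fill missing goal counts with 0, then store them as whole numbers.
players["Goals"] = players["Goals"].fillna(0).astype(int)

# Confirm that no missing values remain in the column.
remaining_nan = players["Goals"].isnull().sum()
print(remaining_nan)
print(players["Goals"].tolist())
```

Checking `isnull().sum()` after the fill is a cheap way to confirm that the replacement actually took effect.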

Next, I check whether either data frame contains duplicate rows.

In [22]:
#Checking for duplicates
print(df1.duplicated())

0      False
1      False
2      False
3      False
4      False
       ...  
281    False
282    False
283    False
284    False
285    False
Length: 286, dtype: bool


In [23]:
#Checking for duplicates
print(df2.duplicated())

0       False
1       False
2       False
3       False
4       False
        ...  
3405    False
3406    False
3407    False
3408    False
3409    False
Length: 3410, dtype: bool


There are no duplicated rows.
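
The printed Series above are truncated to the first and last five rows, so a single summary number is easier to verify than the full listing. A sketch on a made-up frame (`matches` and its rows are illustrative only) that counts duplicates, shows every row involved, and removes the repeats:

```python
import pandas as pd

# Toy frame with one repeated row; the values are illustrative only.
matches = pd.DataFrame({
    "HomeTeamName": ["Team A", "Team B", "Team A"],
    "AwayTeamName": ["Team B", "Team C", "Team B"],
    "Year": [2000, 2000, 2000],
})

# duplicated() flags each repeat of an earlier row; the sum is the count.
duplicate_count = matches.duplicated().sum()
print(duplicate_count)

# keep=False marks every member of a duplicated group, which helps inspection.
print(matches[matches.duplicated(keep=False)])

# drop_duplicates() keeps the first occurrence of each row.
deduplicated = matches.drop_duplicates()
print(len(deduplicated))
```

Applied to `df1` and `df2`, a count of 0 from `duplicated().sum()` would confirm the conclusion above for all rows rather than only the ones displayed.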