## <span style="color:darkblue"> First three modules of the class </span>

<font size="5"> 

- **Module 1:**

    1. Introduction to GitHub 
    2. Local computing vs. cloud computing 
    3. Local workflows: Anaconda virtual environments and local folders + GitHub Desktop 
    4. Cloud workflows: AWS Notebooks + GitHub

<span style="color:white">'</span>
- **Module 2:**

    1. Numbers, arithmetic and numpy math functions
    2. Lists and numpy arrays: storing sequences of data
    3. For loops and While loops
    4. If-elif-else statements
    5. Tuples and Dictionaries: other ways to store collections of data

<span style="color:white">'</span>
- **Module 3:**

    1. Built-in functions and User-defined functions
    2. Python scope of variables: local versus global
    3. How to use functions to solve different programming problems

# Module 4: Data Manipulation with Pandas

## <span style="color:darkblue"> Today: Pandas for data analysis in Python </span>

<font size="5"> 

- [Pandas](https://pandas.pydata.org/docs/index.html) is a library for data analysis in Python

- It has functions for analyzing, cleaning, exploring, and manipulating data

- Pandas can help clean messy datasets containing both quantitative and qualitative data (text as data)

```python
import numpy as np
import pandas as pd
```

## <span style="color:darkblue"> 1. Object creation </span>

<font size="5"> 

- Let's start by creating what Pandas calls a ```Series```

- You can think of a ```Series``` as a single column of a data table

- **Example:** FIFA 23

![data](images/data.png)

- We can create a ```Series``` by passing a ```list``` of values, and Pandas will automatically create a default integer index (0, 1, 2, ...)

- To create a ```Series```, we use the Pandas constructor ```pd.Series()```

- **Example:** create a ```Series``` with the last names of three of the players in the previous image

```python
last_names = ['Messi', 'Benzema', 'De Bruyne']
pd.Series(last_names)
```

```
0        Messi
1      Benzema
2    De Bruyne
dtype: object
```
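The automatically created index can be inspected through `.index`, and you can supply your own labels with the `index` parameter of `pd.Series()`. This is a minimal sketch. The ratings are the values from the picture, and using last names as labels is just an illustrative choice.

```python
import pandas as pd

last_names = ['Messi', 'Benzema', 'De Bruyne']

# Default: pandas labels the rows 0, 1, 2, ... (a RangeIndex)
s = pd.Series(last_names)
print(list(s.index))   # [0, 1, 2]
print(s[1])            # Benzema

# Custom index: pass the row labels explicitly with the `index` parameter
overall = pd.Series([91, 91, 91], index=last_names)
print(overall['De Bruyne'])  # 91
```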

<font size="5"> 

- A ```DataFrame``` is a two-dimensional table, and we can build one by combining several ```Series```, one per column

- We can create a ```DataFrame``` by passing a dictionary: each key becomes a column name and each value (for example, a list) becomes that column's data

- To create a ```DataFrame```, we use the Pandas constructor ```pd.DataFrame()```

- **Example:** create a ```DataFrame``` with the last name, overall rating, and potential rating of the players from the previous picture

```python
d = {'last_name': last_names, 'overall': [91, 91, 91], 'potential': [91, 91, 91]}
pd.DataFrame(d)
```

|   | last_name | overall | potential |
|---|---|---|---|
| 0 | Messi | 91 | 91 |
| 1 | Benzema | 91 | 91 |
| 2 | De Bruyne | 91 | 91 |
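Here is a minimal sketch of the "combining ```Series```" idea. It builds one ```Series``` per column and passes them in a dictionary, and pandas lines them up on their shared index. Selecting a column with square brackets then returns a ```Series``` again. The variable names are illustrative.

```python
import pandas as pd

# One Series per column, all sharing the default index 0, 1, 2
last_name = pd.Series(['Messi', 'Benzema', 'De Bruyne'])
overall = pd.Series([91, 91, 91])
potential = pd.Series([91, 91, 91])

# The dictionary keys become the column names
df = pd.DataFrame({'last_name': last_name,
                   'overall': overall,
                   'potential': potential})
print(df.shape)            # (3, 3) -> 3 rows, 3 columns

# Selecting a single column gives back a Series
col = df['last_name']
print(type(col).__name__)  # Series
```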


<font size="5"> 

- What happens if we pass a single ```int``` as a dictionary value instead of a list?

```python
d2 = {'last_name': last_names, 'overall': 91, 'potential': 91}
df = pd.DataFrame(d2)
```
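The cell above builds ```df``` without displaying it. Here is a sketch of what happens. Pandas broadcasts the scalar `91` to every row, because the list `last_names` already fixes the number of rows at three. If every value in the dictionary is a scalar, pandas cannot infer how many rows to create, so it raises a `ValueError` unless you pass an `index`.

```python
import pandas as pd

last_names = ['Messi', 'Benzema', 'De Bruyne']

# The scalar 91 is repeated ("broadcast") to match the 3 rows of last_names
df = pd.DataFrame({'last_name': last_names, 'overall': 91, 'potential': 91})
print(df['overall'].tolist())   # [91, 91, 91]

# With only scalars, pandas cannot infer the number of rows
scalars_only_failed = False
try:
    pd.DataFrame({'overall': 91, 'potential': 91})
except ValueError as err:
    scalars_only_failed = True
    print('ValueError:', err)

# Supplying an index tells pandas how many rows to build
one_row = pd.DataFrame({'overall': 91, 'potential': 91}, index=[0])
print(one_row.shape)            # (1, 2)
```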

<font size="5"> 

- A key feature of a table is that its columns can hold different data types, while all the values within a single column share one type

- Let's check the types of the columns in our table using the attribute ```dtypes```

- [Here](https://regenerativetoday.com/30-very-useful-pandas-functions-for-everyday-data-analysis-tasks/) are some useful Pandas methods!

```python
df.dtypes
```

```
last_name    object
overall       int64
potential     int64
dtype: object
```
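Because each column has exactly one dtype, pandas picks a type that can hold every value in that column. The sketch below uses hypothetical columns and values (`goals`, `rating`, `mixed`, `club`) made up only for illustration. Mixing integers and floats in one column upcasts it to `float64`. Text columns show up as `object`, although newer pandas versions may report a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical data, made up only to illustrate dtype inference
df = pd.DataFrame({
    'goals': [30, 12, 8],                    # all integers -> int64
    'rating': [8.5, 7.0, 6.5],               # floats -> float64
    'mixed': [1, 2.5, 3],                    # ints plus one float -> upcast to float64
    'club': ['Club A', 'Club B', 'Club C'],  # text -> object (a string dtype in newer pandas)
})
print(df.dtypes)

# A single column (a Series) exposes its type through .dtype
print(df['mixed'].dtype)   # float64
```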

## <span style="color:darkblue"> 2. Importing a CSV table </span>

<font size="5"> 

- In practice, rather than typing data in by hand, we usually import it into Python from an external source

- Pandas can read many data formats, including CSV, Excel (XLSX), plain text, JSON, and SQL databases ([here](https://www.cbtnuggets.com/blog/technology/programming/14-file-types-you-can-import-into-pandas) is a list)

- In this case, the data are stored in a CSV file, ```fifa23_players.csv```

- We can use the Pandas function ```pd.read_csv()``` to import all the data into a ```DataFrame```

```python
fifa = pd.read_csv('fifa23_players.csv')
fifa
```

|  | Known As | Full Name | Overall | Potential | Value(in Euro) | Positions Played | Best Position | Nationality | Image Link | Age | ... | LM Rating | CM Rating | RM Rating | LWB Rating | CDM Rating | RWB Rating | LB Rating | CB Rating | RB Rating | GK Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | L. Messi | Lionel Messi | 91 | 91 | 54000000 | RW | CAM | Argentina | https://cdn.sofifa.net/players/158/023/23_60.png | 35 | ... | 91 | 88 | 91 | 67 | 66 | 67 | 62 | 53 | 62 | 22 |
| 1 | K. Benzema | Karim Benzema | 91 | 91 | 64000000 | CF,ST | CF | France | https://cdn.sofifa.net/players/165/153/23_60.png | 34 | ... | 89 | 84 | 89 | 67 | 67 | 67 | 63 | 58 | 63 | 21 |
| 2 | R. Lewandowski | Robert Lewandowski | 91 | 91 | 84000000 | ST | ST | Poland | https://cdn.sofifa.net/players/188/545/23_60.png | 33 | ... | 86 | 83 | 86 | 67 | 69 | 67 | 64 | 63 | 64 | 22 |
| 3 | K. De Bruyne | Kevin De Bruyne | 91 | 91 | 107500000 | CM,CAM | CM | Belgium | https://cdn.sofifa.net/players/192/985/23_60.png | 31 | ... | 91 | 91 | 91 | 82 | 82 | 82 | 78 | 72 | 78 | 24 |
| 4 | K. Mbappé | Kylian Mbappé | 91 | 95 | 190500000 | ST,LW | ST | France | https://cdn.sofifa.net/players/231/747/23_60.png | 23 | ... | 92 | 84 | 92 | 70 | 66 | 70 | 66 | 57 | 66 | 21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18534 | D. Collins | Darren Collins | 47 | 56 | 110000 | ST,RM | CAM | Republic of Ireland | https://cdn.sofifa.net/players/243/725/23_60.png | 21 | ... | 50 | 44 | 50 | 41 | 38 | 41 | 40 | 36 | 40 | 15 |
| 18535 | Yang Dejiang | Dejiang Yang | 47 | 57 | 90000 | CDM | CDM | China PR | https://cdn.sofifa.net/players/261/933/23_60.png | 17 | ... | 45 | 45 | 45 | 47 | 48 | 47 | 49 | 49 | 49 | 15 |
| 18536 | L. Mullan | Liam Mullan | 47 | 67 | 130000 | CM | RM | Northern Ireland | https://cdn.sofifa.net/players/267/823/23_60.png | 18 | ... | 52 | 49 | 52 | 46 | 44 | 46 | 46 | 42 | 46 | 17 |
| 18537 | D. McCallion | Daithí McCallion | 47 | 61 | 100000 | CB | CB | Republic of Ireland | https://cdn.sofifa.net/players/267/824/23_60.png | 17 | ... | 33 | 33 | 33 | 44 | 42 | 44 | 47 | 49 | 47 | 15 |
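If ```fifa23_players.csv``` is not at hand, you can try the same call on a small in-memory CSV. In this sketch, `io.StringIO` stands in for the file, and the few columns and values are copied from the table above. The `usecols` parameter loads only the listed columns, and `.shape` reports (rows, columns). Note that a quoted field such as `"CF,ST"` is read as a single value even though it contains a comma.

```python
import io
import pandas as pd

# A few rows and columns copied from the FIFA 23 table, stored as CSV text
csv_text = '''Known As,Overall,Potential,Positions Played,Nationality
L. Messi,91,91,RW,Argentina
K. Benzema,91,91,"CF,ST",France
K. Mbappé,91,95,"ST,LW",France
'''

# io.StringIO makes the string behave like an open file, so read_csv accepts it
players = pd.read_csv(io.StringIO(csv_text))
print(players.shape)   # (3, 5)

# usecols keeps only the columns we need
small = pd.read_csv(io.StringIO(csv_text), usecols=['Known As', 'Overall'])
print(list(small.columns))  # ['Known As', 'Overall']
```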


## <span style="color:darkblue"> 3. Exploring the data </span>

<font size="5"> 

- Use ```DataFrame.head()``` and ```DataFrame.tail()``` to view the top and bottom rows of the frame, respectively (five rows by default):

```python
fifa.head()
```

|  | Known As | Full Name | Overall | Potential | Value(in Euro) | Positions Played | Best Position | Nationality | Image Link | Age | ... | LM Rating | CM Rating | RM Rating | LWB Rating | CDM Rating | RWB Rating | LB Rating | CB Rating | RB Rating | GK Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | L. Messi | Lionel Messi | 91 | 91 | 54000000 | RW | CAM | Argentina | https://cdn.sofifa.net/players/158/023/23_60.png | 35 | ... | 91 | 88 | 91 | 67 | 66 | 67 | 62 | 53 | 62 | 22 |
| 1 | K. Benzema | Karim Benzema | 91 | 91 | 64000000 | CF,ST | CF | France | https://cdn.sofifa.net/players/165/153/23_60.png | 34 | ... | 89 | 84 | 89 | 67 | 67 | 67 | 63 | 58 | 63 | 21 |
| 2 | R. Lewandowski | Robert Lewandowski | 91 | 91 | 84000000 | ST | ST | Poland | https://cdn.sofifa.net/players/188/545/23_60.png | 33 | ... | 86 | 83 | 86 | 67 | 69 | 67 | 64 | 63 | 64 | 22 |
| 3 | K. De Bruyne | Kevin De Bruyne | 91 | 91 | 107500000 | CM,CAM | CM | Belgium | https://cdn.sofifa.net/players/192/985/23_60.png | 31 | ... | 91 | 91 | 91 | 82 | 82 | 82 | 78 | 72 | 78 | 24 |
| 4 | K. Mbappé | Kylian Mbappé | 91 | 95 | 190500000 | ST,LW | ST | France | https://cdn.sofifa.net/players/231/747/23_60.png | 23 | ... | 92 | 84 | 92 | 70 | 66 | 70 | 66 | 57 | 66 | 21 |


```python
fifa.tail()
```

|  | Known As | Full Name | Overall | Potential | Value(in Euro) | Positions Played | Best Position | Nationality | Image Link | Age | ... | LM Rating | CM Rating | RM Rating | LWB Rating | CDM Rating | RWB Rating | LB Rating | CB Rating | RB Rating | GK Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18534 | D. Collins | Darren Collins | 47 | 56 | 110000 | ST,RM | CAM | Republic of Ireland | https://cdn.sofifa.net/players/243/725/23_60.png | 21 | ... | 50 | 44 | 50 | 41 | 38 | 41 | 40 | 36 | 40 | 15 |
| 18535 | Yang Dejiang | Dejiang Yang | 47 | 57 | 90000 | CDM | CDM | China PR | https://cdn.sofifa.net/players/261/933/23_60.png | 17 | ... | 45 | 45 | 45 | 47 | 48 | 47 | 49 | 49 | 49 | 15 |
| 18536 | L. Mullan | Liam Mullan | 47 | 67 | 130000 | CM | RM | Northern Ireland | https://cdn.sofifa.net/players/267/823/23_60.png | 18 | ... | 52 | 49 | 52 | 46 | 44 | 46 | 46 | 42 | 46 | 17 |
| 18537 | D. McCallion | Daithí McCallion | 47 | 61 | 100000 | CB | CB | Republic of Ireland | https://cdn.sofifa.net/players/267/824/23_60.png | 17 | ... | 33 | 33 | 33 | 44 | 42 | 44 | 47 | 49 | 47 | 15 |
| 18538 | N. Rabha | Nabin Rabha | 47 | 50 | 60000 | LB | LB | India | https://cdn.sofifa.net/players/261/424/23_60.png | 25 | ... | 44 | 40 | 44 | 46 | 43 | 46 | 47 | 47 | 47 | 19 |
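Both methods also accept an optional number of rows, so ```fifa.head(10)``` would show the first ten players. The sketch below runs on a hypothetical 10-row frame (`range(10)` stands in for real data) instead of on ```fifa```.

```python
import pandas as pd

# Hypothetical 10-row frame standing in for the real dataset
df = pd.DataFrame({'player_id': range(10)})

print(df.head(3))       # first 3 rows: player_id 0, 1, 2
print(df.tail(2))       # last 2 rows: player_id 8, 9
print(len(df.head()))   # 5 -> the default number of rows
```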


<font size="5"> 

- Use the method ```describe()``` to show quick summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column

```python
fifa.describe()
```

|  | Overall | Potential | Value(in Euro) | Age | Height(in cm) | Weight(in kg) | TotalStats | BaseStats | Wage(in Euro) | Release Clause | ... | LM Rating | CM Rating | RM Rating | LWB Rating | CDM Rating | RWB Rating | LB Rating | CB Rating | RB Rating | GK Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | ... | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 | 18539.0 |
| mean | 65.852042 | 71.016668 | 2875461.0 | 25.240412 | 181.550839 | 75.173904 | 1602.114569 | 357.946221 | 8824.537462 | 5081688.0 | ... | 58.451319 | 57.374076 | 58.451319 | 56.281569 | 55.928583 | 56.281569 | 55.650251 | 54.528184 | 55.650251 | 23.257134 |
| std | 6.788353 | 6.192866 | 7635129.0 | 4.718163 | 6.858097 | 7.013593 | 273.160237 | 39.628259 | 19460.531154 | 14672030.0 | ... | 13.987122 | 13.171194 | 13.987122 | 13.903836 | 13.87219 | 13.903836 | 14.159466 | 14.743929 | 14.159466 | 15.108925 |
| min | 47.0 | 48.0 | 0.0 | 16.0 | 155.0 | 49.0 | 759.0 | 224.0 | 0.0 | 0.0 | ... | 18.0 | 18.0 | 18.0 | 17.0 | 19.0 | 17.0 | 17.0 | 18.0 | 17.0 | 10.0 |
| 25% | 62.0 | 67.0 | 475000.0 | 21.0 | 177.0 | 70.0 | 1470.0 | 331.0 | 1000.0 | 665000.0 | ... | 54.0 | 53.0 | 54.0 | 51.0 | 48.0 | 51.0 | 49.0 | 45.0 | 49.0 | 17.0 |
| 50% | 66.0 | 71.0 | 1000000.0 | 25.0 | 182.0 | 75.0 | 1640.0 | 358.0 | 3000.0 | 1500000.0 | ... | 62.0 | 60.0 | 62.0 | 59.0 | 59.0 | 59.0 | 59.0 | 58.0 | 59.0 | 18.0 |
| 75% | 70.0 | 75.0 | 2000000.0 | 29.0 | 186.0 | 80.0 | 1786.0 | 385.0 | 8000.0 | 3400000.0 | ... | 67.0 | 66.0 | 67.0 | 66.0 | 66.0 | 66.0 | 65.0 | 66.0 | 65.0 | 20.0 |
| max | 91.0 | 95.0 | 190500000.0 | 44.0 | 206.0 | 105.0 | 2312.0 | 502.0 | 450000.0 | 366700000.0 | ... | 92.0 | 91.0 | 92.0 | 88.0 | 89.0 | 88.0 | 87.0 | 90.0 | 87.0 | 90.0 |
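By default ```describe()``` summarizes only numeric columns, which is why text columns such as `Known As` are missing above. The sketch below uses a tiny frame built from the Overall, Potential and Nationality values of the first five players shown by ```fifa.head()```. It checks the `mean` row by hand and then calls ```describe()``` on a single text column, which reports count, unique, top (most common value) and freq instead.

```python
import pandas as pd

# Overall, Potential and Nationality of the first five players shown by fifa.head()
top5 = pd.DataFrame({
    'Overall': [91, 91, 91, 91, 91],
    'Potential': [91, 91, 91, 91, 95],
    'Nationality': ['Argentina', 'France', 'Poland', 'Belgium', 'France'],
})

summary = top5.describe()   # numeric columns only
print(summary)

# The 'mean' row is the ordinary average: (91*4 + 95) / 5 = 91.8
print(summary.loc['mean', 'Potential'])

# On a text column, describe() reports count, unique, top and freq
nat = top5['Nationality'].describe()
print(nat)
```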


<font size="5"> 

- Use the method ```sort_values()``` to sort the table by a given column; ```ascending=False``` sorts from the largest to the smallest value

```python
fifa.sort_values(by='Overall', ascending=False)
```

|  | Known As | Full Name | Overall | Potential | Value(in Euro) | Positions Played | Best Position | Nationality | Image Link | Age | ... | LM Rating | CM Rating | RM Rating | LWB Rating | CDM Rating | RWB Rating | LB Rating | CB Rating | RB Rating | GK Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | L. Messi | Lionel Messi | 91 | 91 | 54000000 | RW | CAM | Argentina | https://cdn.sofifa.net/players/158/023/23_60.png | 35 | ... | 91 | 88 | 91 | 67 | 66 | 67 | 62 | 53 | 62 | 22 |
| 2 | R. Lewandowski | Robert Lewandowski | 91 | 91 | 84000000 | ST | ST | Poland | https://cdn.sofifa.net/players/188/545/23_60.png | 33 | ... | 86 | 83 | 86 | 67 | 69 | 67 | 64 | 63 | 64 | 22 |
| 3 | K. De Bruyne | Kevin De Bruyne | 91 | 91 | 107500000 | CM,CAM | CM | Belgium | https://cdn.sofifa.net/players/192/985/23_60.png | 31 | ... | 91 | 91 | 91 | 82 | 82 | 82 | 78 | 72 | 78 | 24 |
| 4 | K. Mbappé | Kylian Mbappé | 91 | 95 | 190500000 | ST,LW | ST | France | https://cdn.sofifa.net/players/231/747/23_60.png | 23 | ... | 92 | 84 | 92 | 70 | 66 | 70 | 66 | 57 | 66 | 21 |
| 1 | K. Benzema | Karim Benzema | 91 | 91 | 64000000 | CF,ST | CF | France | https://cdn.sofifa.net/players/165/153/23_60.png | 34 | ... | 89 | 84 | 89 | 67 | 67 | 67 | 63 | 58 | 63 | 21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18534 | D. Collins | Darren Collins | 47 | 56 | 110000 | ST,RM | CAM | Republic of Ireland | https://cdn.sofifa.net/players/243/725/23_60.png | 21 | ... | 50 | 44 | 50 | 41 | 38 | 41 | 40 | 36 | 40 | 15 |
| 18535 | Yang Dejiang | Dejiang Yang | 47 | 57 | 90000 | CDM | CDM | China PR | https://cdn.sofifa.net/players/261/933/23_60.png | 17 | ... | 45 | 45 | 45 | 47 | 48 | 47 | 49 | 49 | 49 | 15 |
| 18536 | L. Mullan | Liam Mullan | 47 | 67 | 130000 | CM | RM | Northern Ireland | https://cdn.sofifa.net/players/267/823/23_60.png | 18 | ... | 52 | 49 | 52 | 46 | 44 | 46 | 46 | 42 | 46 | 17 |
| 18537 | D. McCallion | Daithí McCallion | 47 | 61 | 100000 | CB | CB | Republic of Ireland | https://cdn.sofifa.net/players/267/824/23_60.png | 17 | ... | 33 | 33 | 33 | 44 | 42 | 44 | 47 | 49 | 47 | 15 |
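In the output above, the five players tied at 91 do not come out in their original order (0, 2, 3, 4, 1). To break ties deliberately, ```sort_values()``` also accepts a list of columns, with one `ascending` flag per column. This sketch uses the same five players, with values copied from the table. It sorts by `Overall` and `Potential` (high to low) and then by `Age` (young to old), a tie-breaking rule chosen only for illustration.

```python
import pandas as pd

# The five players tied at Overall = 91 in the output above
players = pd.DataFrame({
    'Known As': ['L. Messi', 'K. Benzema', 'R. Lewandowski', 'K. De Bruyne', 'K. Mbappé'],
    'Overall': [91, 91, 91, 91, 91],
    'Potential': [91, 91, 91, 91, 95],
    'Age': [35, 34, 33, 31, 23],
})

# Sort by Overall, then Potential (both high to low), then Age (young to old)
ranked = players.sort_values(by=['Overall', 'Potential', 'Age'],
                             ascending=[False, False, True])
print(ranked['Known As'].tolist())
# ['K. Mbappé', 'K. De Bruyne', 'R. Lewandowski', 'K. Benzema', 'L. Messi']

# sort_values returns a new DataFrame; the original order is unchanged
print(players['Known As'].iloc[0])   # L. Messi
```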
