# Module 4: Data Manipulation with Pandas

## <span style="color:darkblue"> Review </span>

<font size="5"> 

1. Object creation: ```pd.Series()``` and ```pd.DataFrame()```

2. Importing CSV tables: ```pd.read_csv()```

3. Exploring data: ```DataFrame.head()```, ```DataFrame.tail()```, ```DataFrame.describe()``` and ```DataFrame.sort_values()```

## <span style="color:darkblue"> Today: Selection of data using Pandas </span>

<font size="5"> 

- The first step when using Pandas is to import the package 

- Then, you have to decide what data to use: are you creating or importing data?

In [3]:
import pandas as pd

<font size="5"> 

- When the data you want to import is not in your current working directory, you have to specify where it is located, for example with a relative path such as ```'../lecture12/fifa23_players.csv'```

- To find out your current working directory use the package ```os``` (short for operating system)

- In particular, we use the function ```os.getcwd()``` (short for get current working directory)

In [8]:
import os 
os.getcwd()

'c:\\Users\\juane\\OneDrive - Emory University\\EmoryUniversity\\Courses\\QTM151\\qtm151-notes\\lecture13'

In [11]:
fifa = pd.read_csv('../lecture12/fifa23_players.csv')

In [12]:
fifa

|  | Known As | Full Name | Overall | Potential | Value(in Euro) | Positions Played | Best Position | Nationality | Image Link | Age | ... | LM Rating | CM Rating | RM Rating | LWB Rating | CDM Rating | RWB Rating | LB Rating | CB Rating | RB Rating | GK Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | L. Messi | Lionel Messi | 91 | 91 | 54000000 | RW | CAM | Argentina | https://cdn.sofifa.net/players/158/023/23_60.png | 35 | ... | 91 | 88 | 91 | 67 | 66 | 67 | 62 | 53 | 62 | 22 |
| 1 | K. Benzema | Karim Benzema | 91 | 91 | 64000000 | CF,ST | CF | France | https://cdn.sofifa.net/players/165/153/23_60.png | 34 | ... | 89 | 84 | 89 | 67 | 67 | 67 | 63 | 58 | 63 | 21 |
| 2 | R. Lewandowski | Robert Lewandowski | 91 | 91 | 84000000 | ST | ST | Poland | https://cdn.sofifa.net/players/188/545/23_60.png | 33 | ... | 86 | 83 | 86 | 67 | 69 | 67 | 64 | 63 | 64 | 22 |
| 3 | K. De Bruyne | Kevin De Bruyne | 91 | 91 | 107500000 | CM,CAM | CM | Belgium | https://cdn.sofifa.net/players/192/985/23_60.png | 31 | ... | 91 | 91 | 91 | 82 | 82 | 82 | 78 | 72 | 78 | 24 |
| 4 | K. Mbappé | Kylian Mbappé | 91 | 95 | 190500000 | ST,LW | ST | France | https://cdn.sofifa.net/players/231/747/23_60.png | 23 | ... | 92 | 84 | 92 | 70 | 66 | 70 | 66 | 57 | 66 | 21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18534 | D. Collins | Darren Collins | 47 | 56 | 110000 | ST,RM | CAM | Republic of Ireland | https://cdn.sofifa.net/players/243/725/23_60.png | 21 | ... | 50 | 44 | 50 | 41 | 38 | 41 | 40 | 36 | 40 | 15 |
| 18535 | Yang Dejiang | Dejiang Yang | 47 | 57 | 90000 | CDM | CDM | China PR | https://cdn.sofifa.net/players/261/933/23_60.png | 17 | ... | 45 | 45 | 45 | 47 | 48 | 47 | 49 | 49 | 49 | 15 |
| 18536 | L. Mullan | Liam Mullan | 47 | 67 | 130000 | CM | RM | Northern Ireland | https://cdn.sofifa.net/players/267/823/23_60.png | 18 | ... | 52 | 49 | 52 | 46 | 44 | 46 | 46 | 42 | 46 | 17 |
| 18537 | D. McCallion | Daithí McCallion | 47 | 61 | 100000 | CB | CB | Republic of Ireland | https://cdn.sofifa.net/players/267/824/23_60.png | 17 | ... | 33 | 33 | 33 | 44 | 42 | 44 | 47 | 49 | 47 | 15 |


## <span style="color:darkblue"> 1. Getting individual columns from a data frame </span>

<font size="5"> 

**Task:**

1. Print the names of all the columns in the ```fifa``` data frame

2. Check if there is a column containing the player nationality 

3. Extract the column with the nationality name: this will get you a Pandas ```Series```

4. Using a ```lambda``` function and ```filter```, count the number of players from Brazil (<span style="color:darkred"> we will learn how to do this with Pandas</span>, this is for practice)

In [19]:
# Selecting a single column by name returns a Pandas Series
n = fifa['Nationality']

In [25]:
fifa[['Nationality', 'Full Name']]

|  | Nationality | Full Name |
|---|---|---|
| 0 | Argentina | Lionel Messi |
| 1 | France | Karim Benzema |
| 2 | Poland | Robert Lewandowski |
| 3 | Belgium | Kevin De Bruyne |
| 4 | France | Kylian Mbappé |
| ... | ... | ... |
| 18534 | Republic of Ireland | Darren Collins |
| 18535 | China PR | Dejiang Yang |
| 18536 | Northern Ireland | Liam Mullan |
| 18537 | Republic of Ireland | Daithí McCallion |


In [23]:
# Keep only the entries equal to "Brazil" and store them in a list
b = list(filter(lambda x: x == "Brazil", n))

In [24]:
len(b)

728

In [14]:
fifa.columns

Index(['Known As', 'Full Name', 'Overall', 'Potential', 'Value(in Euro)',
       'Positions Played', 'Best Position', 'Nationality', 'Image Link', 'Age',
       'Height(in cm)', 'Weight(in kg)', 'TotalStats', 'BaseStats',
       'Club Name', 'Wage(in Euro)', 'Release Clause', 'Club Position',
       'Contract Until', 'Club Jersey Number', 'Joined On', 'On Loan',
       'Preferred Foot', 'Weak Foot Rating', 'Skill Moves',
       'International Reputation', 'National Team Name',
       'National Team Image Link', 'National Team Position',
       'National Team Jersey Number', 'Attacking Work Rate',
       'Defensive Work Rate', 'Pace Total', 'Shooting Total', 'Passing Total',
       'Dribbling Total', 'Defending Total', 'Physicality Total', 'Crossing',
       'Finishing', 'Heading Accuracy', 'Short Passing', 'Volleys',
       'Dribbling', 'Curve', 'Freekick Accuracy', 'LongPassing', 'BallControl',
       'Acceleration', 'Sprint Speed', 'Agility', 'Reactions', 'Balance',
       'Shot Powe

<font size="5"> 

- To get more than one column, pass a list of column names instead of a single name; the result is a ```DataFrame``` rather than a ```Series```

- **Example:** extract both name and nationality
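The difference between selecting one column and selecting a list of columns can be sketched on a small stand-in table. The ```players``` data frame below is an assumption made for illustration: it is built by hand from the first five rows shown earlier, so the snippet runs without the CSV file.

```python
import pandas as pd

# Toy stand-in for the `fifa` data frame (hypothetical name `players`),
# built from the first five rows displayed earlier in these notes
players = pd.DataFrame({
    "Full Name": ["Lionel Messi", "Karim Benzema", "Robert Lewandowski",
                  "Kevin De Bruyne", "Kylian Mbappé"],
    "Nationality": ["Argentina", "France", "Poland", "Belgium", "France"],
    "Overall": [91, 91, 91, 91, 91],
    "Age": [35, 34, 33, 31, 23],
})

# A single column name returns a Series
one_column = players["Nationality"]

# A list of column names returns a DataFrame (note the double brackets)
two_columns = players[["Full Name", "Nationality"]]

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns)
```

The outer brackets are the selection operator and the inner brackets create the Python list of names.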

<font size="5"> 

- You can also extract rows with the slicing notation ```[]```: a slice such as ```fifa[0:5]``` selects rows by position, and the end of the slice is excluded
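As a minimal sketch of row slicing, the snippet below again uses a hand-built stand-in table (the name ```players``` is hypothetical and the rows are copied from the output shown earlier).

```python
import pandas as pd

# Toy stand-in for the `fifa` data frame (hypothetical name `players`)
players = pd.DataFrame({
    "Full Name": ["Lionel Messi", "Karim Benzema", "Robert Lewandowski",
                  "Kevin De Bruyne", "Kylian Mbappé"],
    "Nationality": ["Argentina", "France", "Poland", "Belgium", "France"],
    "Overall": [91, 91, 91, 91, 91],
})

# A slice inside [] selects rows by position: here positions 1 and 2
# (the end of the slice, position 3, is not included)
middle_rows = players[1:3]

print(middle_rows)
```

A slice inside ```[]``` works on rows, while a name or a list of names inside ```[]``` works on columns.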

## <span style="color:darkblue"> 2. Selection by label </span>

<font size="5"> 

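- To extract a subset of rows and a subset of columns by their labels, use the notation ```DataFrame.loc[]```: the row labels go first and the column labels second. Unlike ordinary Python slicing, a label slice in ```loc``` includes both endpoints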

**Task:**

- Extract only the name, nationality and overall rating of the players in rows 15 to 20
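A similar selection can be sketched on a smaller table before trying the task on ```fifa```. The ```players``` data frame is a hypothetical stand-in built from the rows shown earlier, and the row labels 1 to 3 are chosen only because the toy table has five rows.

```python
import pandas as pd

# Toy stand-in for the `fifa` data frame (hypothetical name `players`)
players = pd.DataFrame({
    "Full Name": ["Lionel Messi", "Karim Benzema", "Robert Lewandowski",
                  "Kevin De Bruyne", "Kylian Mbappé"],
    "Nationality": ["Argentina", "France", "Poland", "Belgium", "France"],
    "Overall": [91, 91, 91, 91, 91],
    "Age": [35, 34, 33, 31, 23],
})

# Rows with labels 1 through 3 (both ends included) and three named columns
subset = players.loc[1:3, ["Full Name", "Nationality", "Overall"]]

print(subset)
```

Because ```loc``` works with labels, the slice ```1:3``` returns three rows here, not two.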

## <span style="color:darkblue"> 3. Selection by position </span>

<font size="5"> 

- You can select all columns of a given row by passing its position as an integer to ```DataFrame.iloc[]```
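To see that ```iloc``` counts positions rather than labels, the sketch below sorts a hand-built stand-in table first (the name ```players``` is hypothetical and the rows are copied from the output shown earlier).

```python
import pandas as pd

# Toy stand-in for the `fifa` data frame (hypothetical name `players`)
players = pd.DataFrame({
    "Full Name": ["Lionel Messi", "Karim Benzema", "Robert Lewandowski",
                  "Kevin De Bruyne", "Kylian Mbappé"],
    "Nationality": ["Argentina", "France", "Poland", "Belgium", "France"],
    "Age": [35, 34, 33, 31, 23],
})

# After sorting, the row labels no longer match the row positions
by_age = players.sort_values("Age")

# Position 0 is the first row of the sorted table (the youngest player);
# the result is a Series whose index holds the column names
youngest = by_age.iloc[0]

print(youngest)
print(youngest.name)  # the original row label of that player
```

In the sorted table, ```by_age.loc[0]``` would instead return the row whose label is 0, which is the first row of the original table.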

<font size="5"> 

- You can also use slicing for both the rows and the columns 
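Slicing both dimensions with ```iloc``` can be sketched as follows, again on a hypothetical stand-in table named ```players``` built from the rows shown earlier.

```python
import pandas as pd

# Toy stand-in for the `fifa` data frame (hypothetical name `players`)
players = pd.DataFrame({
    "Full Name": ["Lionel Messi", "Karim Benzema", "Robert Lewandowski",
                  "Kevin De Bruyne", "Kylian Mbappé"],
    "Nationality": ["Argentina", "France", "Poland", "Belgium", "France"],
    "Overall": [91, 91, 91, 91, 91],
    "Age": [35, 34, 33, 31, 23],
})

# First two rows (positions 0 and 1) and first three columns (positions 0 to 2);
# as with Python lists, the end of each slice is excluded
corner = players.iloc[0:2, 0:3]

print(corner)
```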

<font size="5"> 

- Finally, you can also pass lists of row and column positions
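A short sketch of list-based selection by position, using the same kind of hypothetical stand-in table (```players```, built from the rows shown earlier):

```python
import pandas as pd

# Toy stand-in for the `fifa` data frame (hypothetical name `players`)
players = pd.DataFrame({
    "Full Name": ["Lionel Messi", "Karim Benzema", "Robert Lewandowski",
                  "Kevin De Bruyne", "Kylian Mbappé"],
    "Nationality": ["Argentina", "France", "Poland", "Belgium", "France"],
    "Overall": [91, 91, 91, 91, 91],
    "Age": [35, 34, 33, 31, 23],
})

# Rows at positions 0 and 4, columns at positions 0 and 3
picked = players.iloc[[0, 4], [0, 3]]

print(picked)
```

Lists are useful when the rows or columns you need are not next to each other, which a slice cannot express.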