# Manipulating Dataframes

In the previous tutorial we covered what a dataframe is, and how to create and access one. Now we will look at how to manipulate dataframes and when a dataframe is the right data structure to use.

Let's start by importing the pandas library and loading the football_players.csv file we used in the previous tutorial.

In [37]:
import pandas as pd

In [38]:
# Load data - pass 'Name' as our index column.
# read_csv() already returns a DataFrame, so no extra pd.DataFrame() call is needed.
df = pd.read_csv('C:/Users/alfred/Downloads/pandas/football_players-a-26.csv', index_col='Name')

# Use the head() function to look at the first 5 rows
df.head()


Name,Age,Nationality,Overall,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,Curve,Dribbling,Finishing,Free kick accuracy,GK diving,GK handling,GK kicking,GK positioning,GK reflexes,Heading accuracy,Interceptions,Jumping,Long passing,Long shots,Marking,Penalties,Positioning,Reactions,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys,Preferred Positions
Cristiano Ronaldo,32,Portugal,94,89,63,89,63,93,95,85,81,91,94,76,7,11,15,14,11,88,29,95,77,92,22,85,95,96,83,94,23,91,92,31,80,85,88,ST LW
L. Messi,30,Argentina,93,92,48,90,95,95,96,77,89,97,95,90,6,11,15,14,8,71,22,68,87,88,13,74,93,95,88,85,26,87,73,28,59,90,85,RW
Neymar,25,Brazil,92,94,56,96,82,95,92,75,81,96,89,84,9,9,15,15,11,62,36,61,75,77,21,81,90,88,81,80,33,90,78,24,53,80,83,LW
L. Suárez,30,Uruguay,92,88,78,86,60,91,83,77,86,86,94,84,27,25,31,33,37,77,41,69,64,86,30,85,92,93,83,87,38,77,89,45,80,84,88,ST
M. Neuer,31,Germany,92,58,29,52,35,48,70,15,14,30,13,11,91,90,95,91,89,25,30,78,59,16,10,47,12,85,55,25,11,61,44,10,83,70,11,GK


## Sorting

One of the first things usually done with a large dataset is to sort it by some criterion. The `sort_values()` function does this; we pass the column to sort by with the `by` argument. Let's look at some examples:

In [3]:
# Sort by age from youngest to oldest (select first 5 entries)
df.sort_values(by='Age').head()

Name,Age,Nationality,Overall,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,...,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys,Preferred Positions
J. Romero,16,Argentina,58,75,36,70,86,55,59,46,...,52,54,25,77,58,26,41,51,45,ST
J. Hove,16,Norway,51,57,57,58,56,51,42,40,...,56,50,56,64,47,51,54,56,29,CM
Javi Vázquez,16,Spain,58,61,53,48,65,47,43,33,...,41,42,59,64,59,63,55,37,29,CB
K. Pierie,16,Netherlands,65,72,64,66,53,65,66,47,...,68,35,62,72,45,67,65,56,32,LB CB
L. Pintor,16,France,54,72,29,70,68,56,43,27,...,30,57,12,68,36,10,34,38,48,ST


When sorting values, the default order is ascending. To sort in descending order, we pass `ascending=False`.

In [4]:
# Sort by age from oldest to youngest (select first 5 entries)
df.sort_values(by='Age', ascending=False).head()

Name,Age,Nationality,Overall,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,...,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys,Preferred Positions
B. Richardson,47,England,46,25,44,35,44,22,44,11,...,12,13,13,25,32,12,47,17,12,GK
E. El Hadary,44,Egypt,70,28,29,21,41,25,57,20,...,16,18,15,39,34,19,73,16,12,GK
O. Pérez,44,Mexico,71,60,26,69,69,23,50,19,...,26,24,11,57,41,12,66,40,13,GK
J. Walker,43,England,55,22,41,53,64,38,35,16,...,22,22,13,35,19,11,62,32,11,GK
D. Coyne,43,Wales,55,36,36,48,58,31,57,13,...,28,21,19,33,37,16,76,33,11,GK
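
A minimal sketch on a small, hypothetical four-row dataframe (the names 'A' to 'D' and their values are made up) to make the default order and `ascending=False` concrete. It also passes a list of columns to `by`, with a matching list for `ascending`, which sorts by the first column and uses the second only to break ties.

```python
import pandas as pd

# Hypothetical toy data (not taken from football_players.csv)
players = pd.DataFrame(
    {'Age': [30, 25, 30, 19], 'Overall': [88, 90, 92, 70]},
    index=pd.Index(['A', 'B', 'C', 'D'], name='Name'),
)

# Default order is ascending: youngest first
youngest_first = players.sort_values(by='Age')

# ascending=False reverses it: oldest first
oldest_first = players.sort_values(by='Age', ascending=False)

# A list of columns sorts by Age first, then breaks ties by Overall;
# ascending can be given per column (Age ascending, Overall descending)
by_age_then_rating = players.sort_values(by=['Age', 'Overall'], ascending=[True, False])

print(by_age_then_rating)
```

Players with equal ages (A and C here) have no guaranteed relative order in a single-column sort unless a tie-breaking column is given or `kind='stable'` is passed.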


## Filtering

A very useful operation is filtering a dataframe so that only the rows you are interested in remain. We do this by passing a condition inside square brackets after the dataframe, i.e. `dataframe[condition]`. The condition is a boolean Series with one True/False value per row, and only the rows where it is True are kept. Let's look at a few examples:

In [5]:
# Filter on players older than 30
df[df['Age'] > 30]

Name,Age,Nationality,Overall,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,...,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys,Preferred Positions
Cristiano Ronaldo,32,Portugal,94,89,63,89,63,93,95,85,...,83,94,23,91,92,31,80,85,88,ST LW
M. Neuer,31,Germany,92,58,29,52,35,48,70,15,...,55,25,11,61,44,10,83,70,11,GK
Sergio Ramos,31,Spain,90,75,84,79,60,84,80,66,...,78,79,91,77,84,89,81,63,66,CB
L. Modrić,31,Croatia,89,75,62,93,94,92,84,78,...,92,73,73,71,82,80,58,90,74,CDM CM
G. Chiellini,32,Italy,89,68,92,59,64,57,82,58,...,59,78,90,78,68,92,91,50,45,CB
G. Buffon,39,Italy,89,49,38,55,49,28,70,13,...,37,39,11,43,39,11,69,50,17,GK
D. Godín,31,Uruguay,88,62,86,63,58,76,82,55,...,79,67,89,67,67,86,80,52,47,CB
Thiago Silva,32,Brazil,88,70,77,74,68,80,83,60,...,79,74,88,74,74,89,81,74,63,CB
Z. Ibrahimović,35,Sweden,88,63,84,81,41,88,91,76,...,84,91,27,67,71,41,88,83,90,ST
A. Robben,33,Netherlands,88,87,47,89,91,89,86,80,...,84,87,26,86,68,26,67,83,86,RW RM


We can also combine several conditions inside the square brackets with the `&` (and) and `|` (or) operators. Each condition must be wrapped in its own round brackets, because `&` and `|` bind more tightly than comparison operators such as `>`.

In [6]:
# Filter on players older than 30 and overall rating greater than 90
df[(df['Age'] > 30) & (df['Overall'] > 90)]

Name,Age,Nationality,Overall,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,...,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys,Preferred Positions
Cristiano Ronaldo,32,Portugal,94,89,63,89,63,93,95,85,...,83,94,23,91,92,31,80,85,88,ST LW
M. Neuer,31,Germany,92,58,29,52,35,48,70,15,...,55,25,11,61,44,10,83,70,11,GK
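
The toy sketch below (hypothetical players 'P1' to 'P4') shows `&`, `|` and `~` (not) on boolean conditions, and what happens when the round brackets are left out: Python evaluates `&` before `>`, so the expression turns into a chained comparison and pandas raises a `ValueError`.

```python
import pandas as pd

# Hypothetical toy data
players = pd.DataFrame(
    {'Age': [32, 30, 31, 22], 'Overall': [94, 93, 85, 70]},
    index=['P1', 'P2', 'P3', 'P4'],
)

both = players[(players['Age'] > 30) & (players['Overall'] > 90)]    # and: P1
either = players[(players['Age'] > 30) | (players['Overall'] > 90)]  # or: P1, P2, P3
not_older = players[~(players['Age'] > 30)]                          # not: P2, P4

# Without brackets, Python reads this as
# players['Age'] > (30 & players['Overall']) > 90, a chained comparison
# that needs the truth value of a whole Series, so pandas raises ValueError
raised = False
try:
    players[players['Age'] > 30 & players['Overall'] > 90]
except ValueError as err:
    raised = True
    print('ValueError:', err)
```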


## Create and Drop Columns

Another useful thing is to create new columns from existing ones as well as to drop columns that are not needed.

### Create Columns

We can create a new column from existing ones by assigning to the new column name inside square brackets, i.e. `df['New column'] = ...`, where the right-hand side is an expression built from the other column(s). Let's look at an example:

In [7]:
# Create column of rating per year of age
df['Rating Per Year of Age'] = df['Overall'] / df['Age']

# Look at first 5 entries
df['Rating Per Year of Age'].head()

Name
Cristiano Ronaldo    2.937500
L. Messi             3.100000
Neymar               3.680000
L. Suárez            3.066667
M. Neuer             2.967742
Name: Rating Per Year of Age, dtype: float64

### Drop Columns

Columns can be dropped with the `drop()` function. We pass the column name (a string) and `axis=1`, which tells pandas to drop a column rather than a row. `drop()` returns a new dataframe, so we assign the result back to `df`.

In [8]:
# Drop column just created
df = df.drop('Rating Per Year of Age', axis=1)

df.head()

Name,Age,Nationality,Overall,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,...,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys,Preferred Positions
Cristiano Ronaldo,32,Portugal,94,89,63,89,63,93,95,85,...,83,94,23,91,92,31,80,85,88,ST LW
L. Messi,30,Argentina,93,92,48,90,95,95,96,77,...,88,85,26,87,73,28,59,90,85,RW
Neymar,25,Brazil,92,94,56,96,82,95,92,75,...,81,80,33,90,78,24,53,80,83,LW
L. Suárez,30,Uruguay,92,88,78,86,60,91,83,77,...,83,87,38,77,89,45,80,84,88,ST
M. Neuer,31,Germany,92,58,29,52,35,48,70,15,...,55,25,11,61,44,10,83,70,11,GK
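
A minimal sketch on a three-row sample (values copied from the first rows above) showing that a new column is computed row by row, and that `drop()` returns a new dataframe instead of modifying the original. `drop(columns=...)` is shown as an equivalent spelling of `axis=1`.

```python
import pandas as pd

# Small sample using values from the first rows of the dataset
players = pd.DataFrame(
    {'Age': [32, 30, 25], 'Overall': [94, 93, 92]},
    index=pd.Index(['Cristiano Ronaldo', 'L. Messi', 'Neymar'], name='Name'),
)

# Assigning to a new column name creates the column; the division is done row by row
players['Rating Per Year of Age'] = players['Overall'] / players['Age']

# drop() returns a new DataFrame and leaves `players` unchanged;
# axis=1 means "drop a column" (axis=0 would drop a row label)
without_ratio = players.drop('Rating Per Year of Age', axis=1)

# Equivalent spelling with the columns= keyword
also_without = players.drop(columns=['Rating Per Year of Age'])

print(players.columns.tolist())
print(without_ratio.columns.tolist())
```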


## Group Data

We can also group the data according to some column(s). This will help if we are interested in certain groups within the dataset. We use the `groupby()` function to do this. 

Note that after the `groupby()` function we need to call an aggregation function on the grouped data, such as `mean()`, `sum()`, `min()` or `max()`. It returns one value per group for each column being aggregated. We can select the columns we want before aggregating; in recent pandas versions, non-numeric columns must either be excluded this way or skipped with `numeric_only=True`, otherwise `mean()` raises an error. Let's look at an example:

In [9]:
# Look at the average rating by age (first 5 rows)
df.groupby('Age').mean(numeric_only=True).head()

Age,Overall
16,57.846154
17,56.089147
18,57.287202
19,59.430309
20,61.559839


It is possible to group by more than one column. We simply need to pass the list of columns.

In [10]:
# Look at the average rating by age and nationality (first 5 rows)
df.groupby(['Age', 'Nationality']).mean(numeric_only=True).head()

Age,Nationality,Overall
16,Argentina,56.0
16,England,61.5
16,France,54.0
16,Germany,57.0
16,Italy,61.0
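
A small sketch with hypothetical data containing a text column (`Nationality`). It contrasts selecting the aggregated column before calling `mean()` with passing `numeric_only=True`, and uses `agg()` to compute several statistics per group in one call.

```python
import pandas as pd

# Hypothetical toy data with one non-numeric column
players = pd.DataFrame({
    'Age': [16, 16, 17, 17, 17],
    'Nationality': ['Norway', 'Spain', 'Spain', 'Spain', 'France'],
    'Overall': [51, 58, 60, 64, 56],
})

# Select the column to aggregate first, so the text column is never averaged
mean_by_age = players.groupby('Age')['Overall'].mean()

# Alternatively, keep only numeric columns when aggregating
mean_numeric = players.groupby('Age').mean(numeric_only=True)

# Several aggregations per group at once
summary = players.groupby('Age')['Overall'].agg(['mean', 'min', 'max', 'count'])

# Grouping by two columns gives a two-level index (Age, Nationality)
mean_by_age_nat = players.groupby(['Age', 'Nationality'])['Overall'].mean()

print(summary)
```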


## Transforming Data

It is often the case that we want to transform one or more columns inside our dataframe. We can transform a single column (a Series) with the `apply()` function, and every cell of several columns at once with the `applymap()` function (renamed `DataFrame.map()` in pandas 2.1). Both take as their argument the function to apply to each value.

We can create that function with a `def` statement or a `lambda` expression.
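
To make the difference between these functions concrete, here is a sketch on a hypothetical two-column dataframe. `Series.apply()` calls the function once per value, `DataFrame.apply()` once per column, and the element-wise method (`DataFrame.map()` from pandas 2.1, `applymap()` before) once per cell. The `hasattr` check is an assumption added so the snippet runs on both older and newer pandas.

```python
import pandas as pd

# Hypothetical toy data
scores = pd.DataFrame({'Acceleration': [89, 92], 'Agility': [89, 90]})

# Series.apply: the function receives one value at a time
doubled_accel = scores['Acceleration'].apply(lambda x: x * 2)

# DataFrame.apply: the function receives a whole column (a Series) at a time
column_ranges = scores.apply(lambda col: col.max() - col.min())

# Element-wise over every cell: DataFrame.map in pandas >= 2.1, applymap before
elementwise = scores.map if hasattr(pd.DataFrame, 'map') else scores.applymap
as_text = elementwise(lambda x: f'{x}/99')

print(as_text)
```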

### Def Expression

Let's format a column in our dataset. We start by creating the function.

In [11]:
# Using def
def year_to_month(x):
    """Convert a number of years to a number of months."""
    return x * 12

In [12]:
# Change the age column to months and look at first 5 entries
df['Age'].apply(year_to_month).head()

Name
Cristiano Ronaldo    384
L. Messi             360
Neymar               300
L. Suárez            360
M. Neuer             372
Name: Age, dtype: int64

### Lambda Expressions

You briefly covered lambda expressions in the Python Fundamentals. But let's have a quick recap:

A lambda expression (also called a lambda function) creates a small, anonymous function object, typically for one-off use. Basic syntax:

* `lambda` *arguments*: *expression*

A lambda can take any number of arguments, but its body is a single expression: it cannot contain statements such as loops or `return`. It evaluates to a function object, which can be assigned to a variable like any other value.

Let's look at an example:

In [13]:
# Using lambda operator
year_to_month_lamb = lambda x: x * 12

# Print function answer
print('def: ', year_to_month(30))
print('lambda: ', year_to_month_lamb(30))

def:  360
lambda:  360


So a lambda does the same job as a `def` function, but it is limited to a single expression and needs no `return` statement: the value of the expression is returned automatically.

Let's use it to format a column in our dataset:

In [14]:
# Change the age column to months and look at first 5 entries
df['Age'].apply(lambda x: x*12).head()

Name
Cristiano Ronaldo    384
L. Messi             360
Neymar               300
L. Suárez            360
M. Neuer             372
Name: Age, dtype: int64
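
The sketch below compares the two approaches on a small hypothetical Series of ages, together with plain column arithmetic. For simple arithmetic like this, `ages * 12` is vectorised: pandas multiplies the whole column at once instead of calling a Python function for each row, and it produces the same values.

```python
import pandas as pd

# Hypothetical sample of ages
ages = pd.Series([32, 30, 25], index=['Ronaldo', 'Messi', 'Neymar'], name='Age')

def year_to_month(x):
    """Convert a number of years to a number of months."""
    return x * 12

with_def = ages.apply(year_to_month)          # named function
with_lambda = ages.apply(lambda x: x * 12)    # anonymous function

# Vectorised arithmetic on the whole column, no per-row Python call
vectorised = ages * 12

print(vectorised.tolist())
```

`apply()` remains useful when the transformation cannot be written as column arithmetic, as with the string logic in the next example.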

### More Examples

For practice, let's create another function to create a column with the player position type, where types can be Forward, Midfielder, Back or GoalKeeper.

In [15]:
# Create function
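def position_type(s):
    """Classify a player's preferred positions (abbreviations) as
    Forward, Midfielder, Back or GoalKeeper."""

    # s[-2] is the second-to-last character of the string, so this
    # relies on the exact spacing of the stored position strings
    if s[-2] in ('T', 'W'):      # e.g. ST, LW, RW
        return 'Forward'
    elif s[-2] == 'M':           # e.g. CM, CDM, CAM, LM, RM
        return 'Midfielder'
    elif s[-2] == 'B':           # e.g. CB, LB, RB
        return 'Back'
    else:                        # GK, but also anything unmatched (e.g. a position ending in 'F')
        return 'GoalKeeper'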

In [16]:
# Create position type column
df['Preferred Positions Type'] = df['Preferred Positions'].apply(position_type)

# Look at first 5 entries
df['Preferred Positions Type'].head()

Name
Cristiano Ronaldo       Forward
L. Messi                Forward
Neymar                  Forward
L. Suárez               Forward
M. Neuer             GoalKeeper
Name: Preferred Positions Type, dtype: object
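
`position_type()` reads `s[-2]`, the second-to-last character of the whole string, so its result depends on the exact spacing of the stored values, and a position ending in 'F' (such as CF) falls through to 'GoalKeeper'. As a hedged alternative, the hypothetical `classify_position()` below splits the string on whitespace and classifies the last listed position, mapping an unrecognised code to 'Unknown' rather than to a real position. The sample strings are made up; check the classification against your own data before relying on it.

```python
import pandas as pd

def classify_position(positions):
    """Hypothetical variant of position_type: classify by the last listed position.

    split() ignores surrounding spaces, so 'ST LW' and 'ST LW ' give the same result.
    """
    last = positions.split()[-1]          # e.g. 'ST LW ' -> 'LW'
    if last == 'GK':
        return 'GoalKeeper'
    if last[-1] in ('T', 'W', 'F'):       # ST, LW, RW, CF
        return 'Forward'
    if last[-1] == 'M':                   # CM, CDM, CAM, LM, RM
        return 'Midfielder'
    if last[-1] == 'B':                   # CB, LB, RB, LWB, RWB
        return 'Back'
    return 'Unknown'                      # flag unexpected codes instead of guessing

sample = pd.Series(['ST LW ', 'RW', 'GK ', 'CDM CM', 'LB CB', 'CF'])
result = sample.apply(classify_position).tolist()
print(result)
```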

Let's create another function to transform many columns at once using the `applymap()` function. 

A lot of the columns look like they should be numbers, but are actually stored as strings. We can check this by using the `info()` function:

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17981 entries, Cristiano Ronaldo to L. Sackey
Data columns (total 39 columns):
Age                         17981 non-null int64
Nationality                 17981 non-null object
Overall                     17981 non-null int64
Acceleration                17981 non-null object
Aggression                  17981 non-null object
Agility                     17981 non-null object
Balance                     17981 non-null object
Ball control                17981 non-null object
Composure                   17981 non-null object
Crossing                    17981 non-null object
Curve                       17981 non-null object
Dribbling                   17981 non-null object
Finishing                   17981 non-null object
Free kick accuracy          17981 non-null object
GK diving                   17981 non-null object
GK handling                 17981 non-null object
GK kicking                  17981 non-null object
GK positioning              

As you can see, most of the number columns (attributes) are stored as strings.

We will create a function that converts all of these attribute columns to floats. Integer values are converted directly; for strings, only the first two characters are converted, so a value with extra characters after a two-digit rating is reduced to that two-digit number.

In [18]:
# Select all attribute columns
cols = ['Overall', 'Acceleration', 'Aggression',
       'Agility', 'Balance', 'Ball control', 'Composure', 'Crossing', 'Curve',
       'Dribbling', 'Finishing', 'Free kick accuracy', 'GK diving',
       'GK handling', 'GK kicking', 'GK positioning', 'GK reflexes',
       'Heading accuracy', 'Interceptions', 'Jumping', 'Long passing',
       'Long shots', 'Marking', 'Penalties', 'Positioning', 'Reactions',
       'Short passing', 'Shot power', 'Sliding tackle', 'Sprint speed',
       'Stamina', 'Standing tackle', 'Strength', 'Vision', 'Volleys']

In [19]:
# Create function
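def to_float(x):
    """Transform an attribute value to type float.

    Strings keep only their first two characters (the two-digit rating);
    numbers are converted directly."""
    if isinstance(x, str):
        return float(x[0:2])
    else:
        return float(x)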

In [20]:
# Use applymap() function to transform all selected columns
df[cols] = df[cols].applymap(to_float)

Now let's look at the data types of the columns:

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17981 entries, Cristiano Ronaldo to L. Sackey
Data columns (total 39 columns):
Age                         17981 non-null int64
Nationality                 17981 non-null object
Overall                     17981 non-null float64
Acceleration                17981 non-null float64
Aggression                  17981 non-null float64
Agility                     17981 non-null float64
Balance                     17981 non-null float64
Ball control                17981 non-null float64
Composure                   17981 non-null float64
Crossing                    17981 non-null float64
Curve                       17981 non-null float64
Dribbling                   17981 non-null float64
Finishing                   17981 non-null float64
Free kick accuracy          17981 non-null float64
GK diving                   17981 non-null float64
GK handling                 17981 non-null float64
GK kicking                  17981 non-null float64
GK positioni
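
A sketch on a hypothetical dataframe, assuming attribute strings look like '81+3' (a two-digit rating followed by extra characters). It runs a copy of the element-wise `to_float` approach and a column-wise alternative that uses the `.str` accessor with `pd.to_numeric()`. Both assume that any value with extra characters starts with exactly two digits; a value such as '7+1' would not parse.

```python
import pandas as pd

# Hypothetical sample values (not from the real dataset)
raw = pd.DataFrame({'Overall': [94, 93],
                    'Acceleration': ['89', '92'],
                    'Agility': ['81+3', '90']})

def to_float(x):
    """Strings keep their first two characters; numbers are converted directly."""
    if isinstance(x, str):
        return float(x[0:2])
    return float(x)

# Element-wise: DataFrame.map in pandas >= 2.1, applymap in older versions
elementwise = raw.map if hasattr(pd.DataFrame, 'map') else raw.applymap
converted = elementwise(to_float)

# Column-wise alternative: slice each column as text, then parse it as numbers
vectorised = raw.apply(lambda col: pd.to_numeric(col.astype(str).str[:2]).astype(float))

print(converted)
```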

**Note:**

**Use the two functions in this section (More Examples) for your project deliverable to distinguish between defenders (backs) and attackers (forwards), and to transform your attribute columns to floats. Perform these two transformations on the data before attempting to create the functions required for the project deliverable.**

## Exercises
#### Using the above dataframe, who is the oldest player with an Overall of at least 70?

In [41]:
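# Keep players rated 70 or more, then list everyone who shares the highest age
rated_70_plus = df[df['Overall'] >= 70]
sorted(rated_70_plus[rated_70_plus['Age'] == rated_70_plus['Age'].max()].index)

['E. El Hadary', 'O. Pérez']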

#### What is the age of the youngest Norwegian player?

In [23]:
df[(df['Nationality'] == 'Norway')]['Age'].min()

16

#### What is the average Overall for Brazilian players older than 26?

In [24]:
df[(df['Nationality'] == 'Brazil') & (df['Age'] > 26)]['Overall'].mean()

71.61961722488039

In [25]:
pd.set_option('display.max_rows',10)

In [26]:
df.groupby(['Nationality'])['Overall'].mean()  == 73.0

Nationality
Afghanistan          False
Albania              False
Algeria              False
Angola               False
Antigua & Barbuda    False
                     ...  
Venezuela            False
Vietnam              False
Wales                False
Zambia               False
Zimbabwe             False
Name: Overall, Length: 165, dtype: bool

In [27]:
df[(df['Nationality'] == 'Algeria')]['Overall'].index[0]

'R. Mahrez'

In [28]:
df[(df['Preferred Positions Type'] == 'Back')]['Sliding tackle'].index[0]

'Sergio Ramos'

In [29]:
pd.set_option('display.max_rows',1000)

In [30]:
df.groupby(['Nationality', 'Preferred Positions Type'])['Overall'].mean()

Nationality           Preferred Positions Type
Afghanistan           Midfielder                  56.333333
Albania               Back                        68.000000
                      Forward                     63.666667
                      GoalKeeper                  76.000000
                      Midfielder                  65.764706
Algeria               Back                        71.857143
                      Forward                     70.300000
                      GoalKeeper                  66.666667
                      Midfielder                  71.166667
Angola                Back                        70.400000
                      Forward                     68.000000
                      GoalKeeper                  63.000000
                      Midfielder                  68.400000
Antigua & Barbuda     Back                        65.000000
                      Forward                     56.500000
                      Midfielder                  57.

In [31]:
df[(df['Age'] == df['Age'].max())]['Nationality']

Name
B. Richardson    England
Name: Nationality, dtype: object

In [32]:
df.groupby(['Preferred Positions Type'])[['Reactions', 'Acceleration']].mean()

Preferred Positions Type,Reactions,Acceleration
Back,61.52223,64.384405
Forward,62.030741,71.011682
GoalKeeper,59.470867,40.229749
Midfielder,62.655931,69.126163


In [33]:
pd.set_option('display.max_columns',1000)

In [34]:
df.groupby(['Preferred Positions Type']).mean(numeric_only=True)

Preferred Positions Type,Age,Overall,Acceleration,Aggression,Agility,Balance,Ball control,Composure,Crossing,Curve,Dribbling,Finishing,Free kick accuracy,GK diving,GK handling,GK kicking,GK positioning,GK reflexes,Heading accuracy,Interceptions,Jumping,Long passing,Long shots,Marking,Penalties,Positioning,Reactions,Short passing,Shot power,Sliding tackle,Sprint speed,Stamina,Standing tackle,Strength,Vision,Volleys
Back,25.436731,66.495383,64.384405,65.494699,60.121067,61.526334,57.074213,58.661594,51.287278,43.687073,51.841997,34.787449,39.387141,10.597982,10.598324,10.602941,10.601744,10.56686,60.916211,63.728112,68.800616,53.082421,39.892613,64.180404,43.962722,43.440834,61.52223,58.344562,51.924077,65.404412,65.448358,68.021888,66.863543,70.360123,45.786423,35.974863
Forward,24.630188,66.209038,71.011682,51.642176,68.46265,65.8838,64.98217,60.876729,50.086689,53.173071,64.525361,65.780203,46.353827,10.499846,10.628343,10.676606,10.499231,10.537658,60.642484,27.517369,66.074085,46.835229,59.583461,23.73901,62.285583,64.881033,62.030741,58.867507,66.457117,24.522902,71.493083,64.523517,26.908392,66.409468,56.045804,58.22533
GoalKeeper,26.01658,64.811937,40.229749,27.578873,41.912364,44.454287,21.96163,37.51208,16.028423,16.665088,16.203221,14.516817,16.12648,62.988157,60.675983,59.424443,60.843676,63.848887,16.108479,18.032212,58.339176,26.71909,15.131691,13.270962,22.028423,13.803411,59.470867,28.401705,24.290857,14.606821,40.526291,32.057793,14.806253,60.914732,37.171009,14.906206
Midfielder,24.867336,66.500813,69.126163,58.194563,70.186143,70.731275,66.696115,61.945191,58.689171,57.126311,65.186586,53.928645,53.280396,10.544837,10.618407,10.672773,10.594918,10.625055,52.087753,49.643817,62.829665,62.454277,57.50229,45.943566,55.285862,58.760674,62.655931,67.055843,63.356774,48.032501,68.42739,68.02083,50.474664,61.50421,62.587236,51.078889


In [35]:
df['Preferred Positions Type'].value_counts()

Midfielder    6769
Back          5848
Forward       3253
GoalKeeper    2111
Name: Preferred Positions Type, dtype: int64

In [36]:
df[(df['Nationality'] == 'Portugal') & (df['Age'] < 25)]['Overall'].index[0]

'Bernardo Silva'

## When to use Dataframes

DataFrames store tabular data as rows of observations and columns of variables, with labels on both axes; in other words, a dataframe is a two-dimensional labeled data structure. We should use a pandas dataframe if all of the following statements hold:

* We have 2-dimensional data (rows and columns)
* The data type is the same within a column (different columns may hold different types)
* We want to access the data through its row (index) and column labels
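
A minimal sketch of those properties on a hypothetical two-row dataframe: both axes carry labels that can be used for lookup with `.loc`, each column has a single dtype, and a column mixing numbers with text falls back to the generic `object` dtype, as the attribute columns did before they were converted.

```python
import pandas as pd

# Rows are observations (players), columns are variables; both axes are labelled
players = pd.DataFrame(
    {'Age': [32, 30], 'Nationality': ['Portugal', 'Argentina'], 'Overall': [94.0, 93.0]},
    index=pd.Index(['Cristiano Ronaldo', 'L. Messi'], name='Name'),
)
print(players.dtypes)

# Label-based access on both axes
print(players.loc['L. Messi', 'Nationality'])

# A column mixing numbers and text is stored with the generic object dtype
mixed = pd.Series([89, '81+3'])
print(mixed.dtype)
```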

That is the end of this tutorial. You should have a better understanding of how to manipulate pandas dataframes. You have now covered all of the tutorials in the Python Data Structures section of the course. Well done!!!