# Week 3 Notebook 2 Data Exploration

## Pandas for Data Exploration

We have used the Pandas data science library to perform some basic data exploration and wrangling in the previous week. 

In this notebook, we will try out more functions for:
* Manipulating pandas DataFrames
* Sorting with Pandas
* Filtering with Pandas

First, we have to import the pandas library.


In [1]:
import pandas as pd

## The Pandas DataFrame

The Pandas library supports a two-dimensional data structure known as a data frame to store, retrieve & manipulate data. The rows & columns of a data frame allow quick & easy access to the data stored.

Let's say we want to keep track of the staff hired by a local hospital in Melbourne. We will store the information provided by each member in a single data frame.

For this purpose, we can make use of the pandas `DataFrame()` constructor method to create a new data frame. 

In [2]:
df = pd.DataFrame({
    'StaffID' : [897654, 290128, 612586, 478132, 108954],
    'FirstName' : ['Harvey', 'Mike', 'Riley', 'James', 'Jane'],
    'LastName' : ['Specter', 'Ross', 'Jackson', 'Bond', 'Austin'],
    'Profession' : ['Nurse', 'Pediatrician', 'Neurologist','Gynaecologist','Ophthalmologist']
})

Displaying the contents of the `df` object will show the nicely structured table format of the data frame. 

In [28]:
df

Unnamed: 0,StaffID,FirstName,LastName,Profession
0,897654,Harvey,Specter,Nurse
1,290128,Mike,Ross,Pediatrician
2,612586,Riley,Jackson,Neurologist
3,478132,James,Bond,Gynaecologist
4,108954,Jane,Austin,Ophthalmologist


Columns can be selected using the label of the column in the square brackets:

In [3]:
df['Profession']

0              Nurse
1       Pediatrician
2        Neurologist
3      Gynaecologist
4    Ophthalmologist
Name: Profession, dtype: object

Alternatively, a column can be selected with the dot operator, as in `dataFrame.columnName`. However, this only works if the column name is a valid Python identifier, so columns whose names are reserved words, start with a digit or contain spaces cannot be accessed using the dot operator.

In [4]:
# selecting column using dot operator
df.Profession

0              Nurse
1       Pediatrician
2        Neurologist
3      Gynaecologist
4    Ophthalmologist
Name: Profession, dtype: object
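
The restriction on the dot operator can be checked with a small sketch. The `rooms` frame, its column names and the `dot_accessible()` helper are made up for illustration and are not part of the hospital data. The helper is only a rough test: it also rejects labels that clash with an existing DataFrame attribute such as `shape`, a case where the dot operator returns the attribute instead of the column.

```python
import keyword
import pandas as pd

# Hypothetical frame with awkward column labels
rooms = pd.DataFrame({
    'Profession': ['Nurse', 'Neurologist'],
    'Room No': [23, 45],            # contains a space
    'class': ['A', 'B'],            # Python reserved word
    'shape': ['round', 'square'],   # clashes with the DataFrame.shape attribute
})

def dot_accessible(label):
    """Rough check for whether frame.<label> would return the column."""
    return (
        label.isidentifier()
        and not keyword.iskeyword(label)
        and not hasattr(pd.DataFrame, label)
    )

dot_ok = {label: dot_accessible(label) for label in rooms.columns}
print(dot_ok)

# Square brackets work for every label
print(rooms['Room No'].tolist())
print(rooms['shape'].tolist())

# The dot operator returns the dimensions here, not the 'shape' column
print(rooms.shape)
```

When in doubt, the square-bracket form is the safer choice because it behaves the same way for every column label.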

To select more than one column, you will also need to use the square brackets, and place the labels of the required columns in a list. A DataFrame object will be returned.

In [33]:
df[['Profession','FirstName']]

Unnamed: 0,Profession,FirstName
0,Nurse,Harvey
1,Pediatrician,Mike
2,Neurologist,Riley
3,Gynaecologist,James
4,Ophthalmologist,Jane


You can select rows from the data frame using the slicing operator:

In [20]:
# select rows indexed 1 to 3
df[1:4]

Unnamed: 0,StaffID,FirstName,LastName,Profession
1,290128,Mike,Ross,Pediatrician
2,612586,Riley,Jackson,Neurologist
3,478132,James,Bond,Gynaecologist


Rows can also be selected based on a condition:

In [34]:
df[df.StaffID == 478132]

Unnamed: 0,StaffID,FirstName,LastName,Profession
3,478132,James,Bond,Gynaecologist


We can add columns to the data frame even after it has been created:

In [35]:
df['RoomNo'] = [23,67,45,12,8]
df

Unnamed: 0,StaffID,FirstName,LastName,Profession,RoomNo
0,897654,Harvey,Specter,Nurse,23
1,290128,Mike,Ross,Pediatrician,67
2,612586,Riley,Jackson,Neurologist,45
3,478132,James,Bond,Gynaecologist,12
4,108954,Jane,Austin,Ophthalmologist,8


Additionally, data can be read in from various file formats including csv, xlsx, and txt. Let's read in some data to do more exploration.
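
As a minimal sketch of reading delimited data, the example below parses CSV text held in memory, so it runs without any file on disk. The `csv_text` contents and the file names in the comments are placeholders, not files supplied with this notebook.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """StaffID,FirstName,RoomNo
897654,Harvey,23
290128,Mike,67
"""

# read_csv accepts a file path or any file-like object
staff = pd.read_csv(io.StringIO(csv_text))
print(staff.shape)
print(staff.dtypes)

# Other readers follow the same pattern (placeholder file names):
# pd.read_excel('staff.xlsx')           # needs an Excel engine such as openpyxl
# pd.read_csv('staff.txt', sep='\t')    # tab-delimited text file
```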

# More Pandas

We are going to practice our data exploration and cleaning skills with a different [data set provided by Pavan Tanniru on Kaggle](https://www.kaggle.com/pavantanniru/-datacleaningforbeginnerusingpandas).

Save the data set into this current working directory. 


In [2]:
import pandas as pd

jobs = pd.read_csv('Data-cleaning-for-beginners-using-pandas.csv')


Before we start operating on our dataset, it is always a good idea to explore the data & understand exactly what we are getting ourselves into. Pandas provides some great functions to begin with!

To start exploring the data set, we can first check how many rows and columns it has, i.e. the *dimensions*, by using the `shape` attribute:

In [3]:
# check the dimensions of the data frame
jobs.shape

(29, 7)

The above values indicate that our dataset contains 29 rows with 7 columns!

Let's now explore the data types that are contained within our dataset!

In [4]:
# check the data types of the columns
jobs.dtypes

Index            int64
Age            float64
Salary          object
Rating         float64
Location        object
Established      int64
Easy Apply      object
dtype: object

We have a combination of integer, float and object columns. Notice that `Salary` is stored as an object, which in pandas usually means text (strings) rather than numbers. Let's have a look at the data.

In [3]:
# view the first five rows
jobs.head()

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
0,0,44.0,$44k-$99k,5.4,"India,In",1999,TRUE
1,1,66.0,$55k-$66k,3.5,"New York,Ny",2002,TRUE
2,2,,$77k-$89k,-1.0,"New York,Ny",-1,-1
3,3,64.0,$44k-$99k,4.4,India In,1988,-1
4,4,25.0,$44k-$99k,6.4,Australia Aus,2002,-1


In [5]:
# view the last five rows
jobs.tail()

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
24,24,13.0,$44k-$99k,-1.0,"New York,Ny",1987,-1
25,25,55.0,$44k-$99k,0.0,Australia Aus,1980,TRUE
26,26,,$55k-$66k,,"India,In",1934,TRUE
27,27,52.0,$44k-$99k,5.4,"India,In",1935,-1
28,28,,$39k-$88k,3.4,Australia Aus,1932,-1


Let's get some basic statistics regarding our dataset:

In [40]:
jobs.describe()

Unnamed: 0,Index,Age,Rating,Established
count,29.0,22.0,28.0,29.0
mean,14.0,39.045455,3.528571,1638.62069
std,8.514693,16.134781,2.825133,762.079599
min,0.0,13.0,-1.0,-1.0
25%,7.0,25.0,1.05,1935.0
50%,14.0,39.5,4.2,1984.0
75%,21.0,50.0,5.4,1999.0
max,28.0,66.0,7.8,2020.0


## Sorting the Data Frame

We can also arrange the data by sorting it using the `sort_values()` method.
You select the column or columns to sort on using the `by=` parameter.

For example, if we want to sort the `jobs` data frame according to age, we would choose the 'Age' column:

In [5]:
jobs.sort_values(by='Age')

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
24,24,13.0,$44k-$99k,-1.0,"New York,Ny",1987,-1
22,22,19.0,$55k-$66k,7.8,"New York,Ny",1988,TRUE
16,16,19.0,$19k-$40k,4.5,"India,In",1984,-1
6,6,21.0,$44k-$99k,0.0,"New York,Ny",-1,-1
9,9,22.0,$44k-$99k,7.7,"India,In",-1,TRUE
4,4,25.0,$44k-$99k,6.4,Australia Aus,2002,-1
13,13,25.0,$44k-$99k,-1.0,Australia Aus,2019,TRUE
19,19,32.0,$44k-$99k,3.3,"New York,Ny",1955,TRUE
21,21,35.0,$44k-$99k,5.0,"New York,Ny",1946,-1
8,8,35.0,$44k-$99k,5.4,"New York,Ny",-1,-1


Running the code cell above will show the jobs in order of Age. We could also sort by more than one column; in this case, the columns have to be provided as a list.



In [9]:
# Sorting by Age, then Salary.
# The 'ascending=' parameter specifies that the ages should be in
# descending order and that, for equal Age values, the data is
# arranged by Salary in ascending order.

jobs.sort_values(by=['Age', 'Salary'], ascending=[False, True])

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
14,14,66.0,$44k-$99k,4.0,Australia Aus,2020,TRUE
1,1,66.0,$55k-$66k,3.5,"New York,Ny",2002,TRUE
3,3,64.0,$44k-$99k,4.4,India In,1988,-1
10,10,55.0,$10k-$49k,5.4,"India,In",2008,TRUE
25,25,55.0,$44k-$99k,0.0,Australia Aus,1980,TRUE
27,27,52.0,$44k-$99k,5.4,"India,In",1935,-1
11,11,44.0,$10k-$49k,6.7,"India,In",2009,-1
0,0,44.0,$44k-$99k,5.4,"India,In",1999,TRUE
7,7,44.0,$44k-$99k,-1.0,Australia Aus,-1,-1
5,5,44.0,$77k-$89k,1.4,"India,In",1999,TRUE


In the example above, the `ascending=` parameter specifies that the data should be sorted by 'Age' in descending order and, for rows with the same Age, by 'Salary' in ascending order. The `ascending=` parameter is optional; by default, sorting is in ascending order.
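
The same kind of sort can be traced by hand on a five-row frame. This is a sketch with made-up values in the style of the `jobs` data; the `mini` name is an assumption for illustration. It also shows two details worth knowing: `sort_values()` returns a new data frame and leaves the original untouched, and rows with a missing Age are placed last by default.

```python
import pandas as pd

# Hypothetical miniature of the jobs data
mini = pd.DataFrame({
    'Age': [44.0, 66.0, None, 44.0, 66.0],
    'Salary': ['$77k-$89k', '$55k-$66k', '$44k-$99k', '$10k-$49k', '$44k-$99k'],
})

# Age descending, then Salary ascending within equal ages
by_age_desc = mini.sort_values(by=['Age', 'Salary'], ascending=[False, True])
print(by_age_desc)

# The original frame keeps its order
print(mini.index.tolist())
```

Because Salary is stored as text, it is compared character by character rather than by amount, so a value starting with `$10k` sorts before one starting with `$9k`.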

## Filtering Data

Filtering data means to choose only a part of the data, which could be certain columns or rows only. Usually we will store the filtered data in a new object.

We can use the columns in a list to specify which columns to select. The columns in the list can be repeated.

In [10]:
# filter some columns
someJobs = jobs[['Age', 'Rating', 'Salary', 'Age']]
print(someJobs)
type(someJobs)

     Age  Rating      Salary   Age
0   44.0     5.4   $44k-$99k  44.0
1   66.0     3.5   $55k-$66k  66.0
2    NaN    -1.0   $77k-$89k   NaN
3   64.0     4.4   $44k-$99k  64.0
4   25.0     6.4   $44k-$99k  25.0
5   44.0     1.4   $77k-$89k  44.0
6   21.0     0.0   $44k-$99k  21.0
7   44.0    -1.0   $44k-$99k  44.0
8   35.0     5.4   $44k-$99k  35.0
9   22.0     7.7   $44k-$99k  22.0
10  55.0     5.4   $10k-$49k  55.0
11  44.0     6.7   $10k-$49k  44.0
12   NaN     0.0   $44k-$99k   NaN
13  25.0    -1.0   $44k-$99k  25.0
14  66.0     4.0   $44k-$99k  66.0
15  44.0     3.0  $88k-$101k  44.0
16  19.0     4.5   $19k-$40k  19.0
17   NaN     5.3   $44k-$99k   NaN
18  35.0     6.7   $44k-$99k  35.0
19  32.0     3.3   $44k-$99k  32.0
20   NaN     5.7   $44k-$99k   NaN
21  35.0     5.0   $44k-$99k  35.0
22  19.0     7.8   $55k-$66k  19.0
23   NaN     2.4   $44k-$99k   NaN
24  13.0    -1.0   $44k-$99k  13.0
25  55.0     0.0   $44k-$99k  55.0
26   NaN     NaN   $55k-$66k   NaN
27  52.0     5.4   $

pandas.core.frame.DataFrame

We can see that `someJobs` is a DataFrame object. Printing `someJobs` also shows that each Salary value is a string that includes the `$` sign.

## Conditions

Another way to filter the data is to specify a condition on the column. This will indicate which rows match the condition.

In [11]:
# Show which rows have Age values > 30
jobs['Age']>30

0      True
1      True
2     False
3      True
4     False
5      True
6     False
7      True
8      True
9     False
10     True
11     True
12    False
13    False
14     True
15     True
16    False
17    False
18     True
19     True
20    False
21     True
22    False
23    False
24    False
25     True
26    False
27     True
28    False
Name: Age, dtype: bool

To filter the data frame based on this condition, we put the condition in the square brackets.

In [12]:
# show the rows where the condition is true
jobs[jobs['Age']>30]

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
0,0,44.0,$44k-$99k,5.4,"India,In",1999,TRUE
1,1,66.0,$55k-$66k,3.5,"New York,Ny",2002,TRUE
3,3,64.0,$44k-$99k,4.4,India In,1988,-1
5,5,44.0,$77k-$89k,1.4,"India,In",1999,TRUE
7,7,44.0,$44k-$99k,-1.0,Australia Aus,-1,-1
8,8,35.0,$44k-$99k,5.4,"New York,Ny",-1,-1
10,10,55.0,$10k-$49k,5.4,"India,In",2008,TRUE
11,11,44.0,$10k-$49k,6.7,"India,In",2009,-1
14,14,66.0,$44k-$99k,4.0,Australia Aus,2020,TRUE
15,15,44.0,$88k-$101k,3.0,Australia Aus,1999,-1
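
Rows with a missing Age never appear in a filter like this, because comparing `NaN` with a number gives `False`. The sketch below uses a made-up `ages` Series to show the Boolean mask on its own and how it can be counted.

```python
import pandas as pd

# Hypothetical ages, one of them missing
ages = pd.Series([44.0, None, 25.0, 66.0], name='Age')

mask = ages > 30               # NaN > 30 evaluates to False
print(mask.tolist())

# True counts as 1, so summing the mask counts the matching rows
n_over_30 = int(mask.sum())
n_missing = int(ages.isna().sum())
print(n_over_30, n_missing)
```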


### Combining Conditions

We can combine Boolean conditions in pandas using the operators:
- `|` for `or`
- `&` for `and`
- `~` for `not`

In [13]:
# using & operator to select the rows that match both conditions
jobs[(jobs['Age']>30) & (jobs['Rating']< 3)]

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
5,5,44.0,$77k-$89k,1.4,"India,In",1999,TRUE
7,7,44.0,$44k-$99k,-1.0,Australia Aus,-1,-1
25,25,55.0,$44k-$99k,0.0,Australia Aus,1980,TRUE


In [17]:
# using ~ and | operators
jobs[~(jobs['Easy Apply']=='TRUE') | (jobs['Established']==-1)]

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
2,2,,$77k-$89k,-1.0,"New York,Ny",-1,-1
3,3,64.0,$44k-$99k,4.4,India In,1988,-1
4,4,25.0,$44k-$99k,6.4,Australia Aus,2002,-1
6,6,21.0,$44k-$99k,0.0,"New York,Ny",-1,-1
7,7,44.0,$44k-$99k,-1.0,Australia Aus,-1,-1
8,8,35.0,$44k-$99k,5.4,"New York,Ny",-1,-1
9,9,22.0,$44k-$99k,7.7,"India,In",-1,TRUE
11,11,44.0,$10k-$49k,6.7,"India,In",2009,-1
12,12,,$44k-$99k,0.0,"India,In",1999,-1
15,15,44.0,$88k-$101k,3.0,Australia Aus,1999,-1
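
Each condition needs its own parentheses because `&`, `|` and `~` bind more tightly than comparison operators such as `>` and `<`. One way to keep this readable is to store each condition in a named variable first. The sketch below uses made-up values, and the names `mini`, `older` and `low_rated` are for illustration only.

```python
import pandas as pd

# Hypothetical miniature of the jobs data
mini = pd.DataFrame({
    'Age': [44.0, 25.0, 55.0, 35.0],
    'Rating': [1.4, 6.4, 0.0, 5.4],
})

# Name each Boolean mask, then combine them
older = mini['Age'] > 30
low_rated = mini['Rating'] < 3

both = mini[older & low_rated]         # rows matching both conditions
either = mini[older | low_rated]       # rows matching at least one
neither = mini[~(older | low_rated)]   # rows matching none

print(both.index.tolist(), either.index.tolist(), neither.index.tolist())
```

Without the parentheses, `mini['Age'] > 30 & mini['Rating'] < 3` would be read as `mini['Age'] > (30 & mini['Rating']) < 3`, which is not the intended test.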


### Using between()

The `between()` method is a useful way to deal with selecting a range of values.

The arguments specify the start and end of the interval, and the `inclusive=` argument controls whether the boundary values are included. It accepts `'both'` (the default), `'neither'`, `'left'` or `'right'`.


In [6]:
# Show jobs established between 2015 and 2020, inclusive
jobs[jobs['Established'].between(2015, 2020, inclusive='both')]

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
13,13,25.0,$44k-$99k,-1.0,Australia Aus,2019,True
14,14,66.0,$44k-$99k,4.0,Australia Aus,2020,True
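
The effect of `inclusive=` is easiest to see on boundary values. This sketch uses a made-up `years` Series and assumes a pandas version that accepts the string values (1.3 or later).

```python
import pandas as pd

# Hypothetical years, including both boundary values
years = pd.Series([2014, 2015, 2018, 2020, 2021], name='Established')

both = years[years.between(2015, 2020, inclusive='both')]
left = years[years.between(2015, 2020, inclusive='left')]
neither = years[years.between(2015, 2020, inclusive='neither')]

print(both.tolist())
print(left.tolist())
print(neither.tolist())

# inclusive='both' matches two combined comparisons
same = years[(years >= 2015) & (years <= 2020)]
print(same.equals(both))
```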


## Subsetting rows and columns

We can filter the data set based on rows **and** columns using the `loc` and `iloc` attributes of the data frame. These allow us to specify based on the (*row*, *column*) required. 

### Using loc

The `.loc` attribute of a data frame is used to specify the rows and columns required, based on the labels. In our `jobs` data frame, the rows are labeled from 0 to 28 and the columns are labeled 'Index', 'Age', 'Salary', 'Rating', 'Location', 'Established' and 'Easy Apply'.
Let's try some examples.

In [19]:
# getting the value in row labeled 3 and column labeled 'Salary'
jobs.loc[3, 'Salary']

'$44k-$99k'

Specifying the column is optional. For example, to obtain the values in the row labeled `3`, we can just use `jobs.loc` with the row label:

In [20]:
# getting the row labeled 3
jobs.loc[3]

Index                  3
Age                 64.0
Salary         $44k-$99k
Rating               4.4
Location        India In
Established         1988
Easy Apply            -1
Name: 3, dtype: object

However, if we wanted to get a specific column without specifying the rows, we would have to use the slicing operator, `:` to get all the rows.

In [54]:
# using : to select all rows for the 'Rating' column
jobs.loc[:,'Rating']

0     5.4
1     3.5
2    -1.0
3     4.4
4     6.4
5     1.4
6     0.0
7    -1.0
8     5.4
9     7.7
10    5.4
11    6.7
12    0.0
13   -1.0
14    4.0
15    3.0
16    4.5
17    5.3
18    6.7
19    3.3
20    5.7
21    5.0
22    7.8
23    2.4
24   -1.0
25    0.0
26    NaN
27    5.4
28    3.4
Name: Rating, dtype: float64

To specify more than one row or column we can use lists:

In [24]:
# selecting rows labeled 13, 14 and 15 and 2 columns
jobs.loc[[13,14,15], ['Age', 'Salary']]

Unnamed: 0,Age,Salary
13,25.0,$44k-$99k
14,66.0,$44k-$99k
15,44.0,$88k-$101k


### Slicing

We can also use the slicing operator `:` to select the rows and columns.

Remember that the slicing operator selects based on the values of *start*:*stop*:*step*.

Using the slicing operator with `loc` is **inclusive** on both bounds, so it will select up to, and including the *stop* value.

You can also use the slicing operator to select columns based on their labels.


In [37]:
# using slicing operator to select rows labeled 13 up to and including 19
# and columns labeled 'Age' up to and including 'Established'
print(jobs.loc[13:19, 'Age': 'Established'])

# adding the step to step by 2
print(jobs.loc[13:19:2, 'Age':'Established':2])

     Age      Salary  Rating       Location  Established
13  25.0   $44k-$99k    -1.0  Australia Aus         2019
14  66.0   $44k-$99k     4.0  Australia Aus         2020
15  44.0  $88k-$101k     3.0  Australia Aus         1999
16  19.0   $19k-$40k     4.5       India,In         1984
17   NaN   $44k-$99k     5.3    New York,Ny         1943
18  35.0   $44k-$99k     6.7    New York,Ny         1954
19  32.0   $44k-$99k     3.3    New York,Ny         1955
     Age  Rating  Established
13  25.0    -1.0         2019
15  44.0     3.0         1999
17   NaN     5.3         1943
19  32.0     3.3         1955


### loc with Conditions

We can also use `loc` to filter rows based on a condition. The conditions are usually based on the values found in specific columns.

In [3]:
# Find rows where Rating value is greater than 4
jobs.loc[jobs['Rating']> 4]

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
0,0,44.0,$44k-$99k,5.4,"India,In",1999,TRUE
3,3,64.0,$44k-$99k,4.4,India In,1988,-1
4,4,25.0,$44k-$99k,6.4,Australia Aus,2002,-1
8,8,35.0,$44k-$99k,5.4,"New York,Ny",-1,-1
9,9,22.0,$44k-$99k,7.7,"India,In",-1,TRUE
10,10,55.0,$10k-$49k,5.4,"India,In",2008,TRUE
11,11,44.0,$10k-$49k,6.7,"India,In",2009,-1
16,16,19.0,$19k-$40k,4.5,"India,In",1984,-1
17,17,,$44k-$99k,5.3,"New York,Ny",1943,TRUE
18,18,35.0,$44k-$99k,6.7,"New York,Ny",1954,TRUE


Using `loc` allows us to filter the rows and determine the columns required to create a subset of the data.

We have to save the filtered data to a new object in order to perform further manipulation on it.


In [4]:
# Filter the data to specific rows and columns and save in a new DataFrame object
highRatingJobs = jobs.loc[jobs['Rating']> 4, ['Age', 'Easy Apply', 'Location']]
highRatingJobs.head()

Unnamed: 0,Age,Easy Apply,Location
0,44.0,TRUE,"India,In"
3,64.0,-1,India In
4,25.0,-1,Australia Aus
8,35.0,-1,"New York,Ny"
9,22.0,TRUE,"India,In"
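
If the saved subset is going to be modified, one common precaution is to take an explicit copy with `.copy()`, so that later changes clearly apply to the new object only. The sketch below uses made-up values; the `mini` and `high` names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of the jobs data
mini = pd.DataFrame({
    'Age': [44.0, 64.0, 21.0],
    'Rating': [5.4, 4.4, 0.0],
    'Location': ['India,In', 'India In', 'New York,Ny'],
})

# Filter rows and columns in one step, then take an independent copy
high = mini.loc[mini['Rating'] > 4, ['Age', 'Location']].copy()

# Changing the copy leaves the original frame untouched
high['Age'] = high['Age'] + 1

print(high)
print(mini['Age'].tolist())
```

Depending on the pandas version, assigning into a filtered frame without `.copy()` can emit a `SettingWithCopyWarning`, so the explicit copy also keeps the output clean.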


### Using iloc

The `.iloc` attribute is another way of selecting rows and columns. The difference between `loc` and `iloc` is that `iloc` uses **integer** positions, rather than labels, to specify the rows and columns required.


In [6]:
# using iloc to select row index 5 and column index 6
# remember that the index always starts from 0
jobs.iloc[5, 6]

'TRUE'

In [8]:
# Specifying rows only selects all columns for those rows
jobs.iloc[[1,7,9]]

Unnamed: 0,Index,Age,Salary,Rating,Location,Established,Easy Apply
1,1,66.0,$55k-$66k,3.5,"New York,Ny",2002,TRUE
7,7,44.0,$44k-$99k,-1.0,Australia Aus,-1,-1
9,9,22.0,$44k-$99k,7.7,"India,In",-1,TRUE


In [10]:
# show all rows but only two columns
jobs.iloc[:, [3,5]]

Unnamed: 0,Rating,Established
0,5.4,1999
1,3.5,2002
2,-1.0,-1
3,4.4,1988
4,6.4,2002
5,1.4,1999
6,0.0,-1
7,-1.0,-1
8,5.4,-1
9,7.7,-1


### Slicing with iloc 
You can use Python's slicing operator to define the range of integer positions to use. As with ordinary Python slicing, `iloc` does not include the upper bound (the *stop* value) in the result.

In [5]:
# Using iloc with slicing
jobs.iloc[2:10:2, 1:5]

Unnamed: 0,Age,Salary,Rating,Location
2,,$77k-$89k,-1.0,"New York,Ny"
4,25.0,$44k-$99k,6.4,Australia Aus
6,21.0,$44k-$99k,0.0,"New York,Ny"
8,35.0,$44k-$99k,5.4,"New York,Ny"
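
In the `jobs` data the row labels happen to equal the row positions, which hides the difference between `loc` and `iloc`. The sketch below gives a made-up `scores` frame row labels that differ from the positions, so the two kinds of slicing can be compared side by side.

```python
import pandas as pd

# Hypothetical frame whose row labels are not 0, 1, 2, ...
scores = pd.DataFrame(
    {'Rating': [5.4, 3.5, 4.4, 6.4]},
    index=[10, 20, 30, 40],
)

by_label = scores.loc[10:30]     # labels 10, 20 and 30 (stop included)
by_position = scores.iloc[0:2]   # positions 0 and 1 (stop excluded)

print(by_label.index.tolist())
print(by_position.index.tolist())
```

The same distinction matters after sorting or filtering, because the rows keep their original labels while their positions change.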


## Exercises

A sample dataset on employment in Asia for 2019 and 2020 has been downloaded from the [International Labour Organization](https://ilostat.ilo.org/data/).

The dataset is generated based on "Employment-to-population ratio by sex and age" to compare data for 2019 and 2020 for adults aged 25+.

Save the dataset `ILOlabour.csv` into this working directory. 

In [1]:
import pandas as pd

empData = pd.read_csv("ILOlabour.csv")

How many rows and columns are in the data?


In [5]:
empData.shape

(117, 4)

In [6]:
empData.head()

Unnamed: 0,Reference area,Sex,2019,2020
0,Afghanistan,Female,20.1,14.0
1,Afghanistan,Male,74.3,67.8
2,Afghanistan,Total,48.0,41.7
3,Armenia,Female,38.0,34.5
4,Armenia,Male,59.7,55.3


Sort the data by Sex, then 2020 values in descending order.

In [8]:
empData.sort_values(by=['Sex','2020'], ascending=[True, False])

Unnamed: 0,Reference area,Sex,2019,2020
60,Lao People's Democratic Republic,Female,81.3,80.4
18,Cambodia,Female,79.1,77.2
51,"Korea, Democratic People's Republic of",Female,77.1,76.1
78,Nepal,Female,82.5,75.4
114,Viet Nam,Female,73.5,72.2
...,...,...,...,...
107,Turkey,Total,49.0,46.4
5,Armenia,Total,47.6,43.7
98,Tajikistan,Total,44.5,43.7
2,Afghanistan,Total,48.0,41.7


Filter the data to select only rows with Sex == 'Total' and store the result in a DataFrame called `totalEmp`.

In [11]:
totalEmp = empData[empData['Sex']=='Total']

Sort the `totalEmp` DataFrame by 2020 data, in descending order.


In [12]:
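# ascending= takes the boolean False, not the string 'False'.
# The output shown below is in ascending order because it was captured
# from a run that passed the string.
totalEmp.sort_values(by='2020', ascending=False)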

Unnamed: 0,Reference area,Sex,2019,2020
41,"Iran, Islamic Republic of",Total,43.3,40.2
2,Afghanistan,Total,48.0,41.7
5,Armenia,Total,47.6,43.7
98,Tajikistan,Total,44.5,43.7
107,Turkey,Total,49.0,46.4
35,India,Total,53.8,49.0
110,Turkmenistan,Total,50.6,49.6
92,Sri Lanka,Total,55.3,51.9
29,Georgia,Total,60.5,52.9
83,Pakistan,Total,54.6,53.5


For the `totalEmp` data, we don't need the `Sex` column any more because every remaining row has the value `Total`, so just select the other columns and save the result into `totalEmp` again.

In [14]:
totalEmp = totalEmp[['Reference area', '2019', '2020']]

In [15]:
totalEmp

Unnamed: 0,Reference area,2019,2020
2,Afghanistan,48.0,41.7
5,Armenia,47.6,43.7
8,Azerbaijan,68.9,64.2
11,Bangladesh,62.9,60.9
14,Bhutan,73.4,68.9
17,Brunei Darussalam,67.4,66.2
20,Cambodia,85.2,84.0
23,China,69.8,68.7
26,Cyprus,63.7,63.1
29,Georgia,60.5,52.9


Using `.loc` on the `totalEmp` data set, find the countries where the total employment-to-population ratio is below 50 in 2020.

In [16]:
totalEmp.loc[totalEmp['2020']<50]

Unnamed: 0,Reference area,2019,2020
2,Afghanistan,48.0,41.7
5,Armenia,47.6,43.7
35,India,53.8,49.0
41,"Iran, Islamic Republic of",43.3,40.2
98,Tajikistan,44.5,43.7
107,Turkey,49.0,46.4
110,Turkmenistan,50.6,49.6


Using `.loc` on the `totalEmp` data set, find the countries where the total employment-to-population ratio is lower in 2020 than 2019.

In [17]:
totalEmp.loc[totalEmp['2020']<totalEmp['2019']]

Unnamed: 0,Reference area,2019,2020
2,Afghanistan,48.0,41.7
5,Armenia,47.6,43.7
8,Azerbaijan,68.9,64.2
11,Bangladesh,62.9,60.9
14,Bhutan,73.4,68.9
17,Brunei Darussalam,67.4,66.2
20,Cambodia,85.2,84.0
23,China,69.8,68.7
26,Cyprus,63.7,63.1
29,Georgia,60.5,52.9
