![](../docs/banner.png)

# Pandas

**Tomas Beuzen, September 2020**

These exercises complement [Chapter 7](../chapters/chapter7-pandas.ipynb).

## Exercises

### 1.

In this set of practice exercises we'll be investigating the carbon footprint of different foods. We'll be leveraging a dataset compiled by [Kasia Kulma](https://r-tastic.co.uk/post/from-messy-to-tidy/) and contributed to [R's Tidy Tuesday project](https://github.com/rfordatascience/tidytuesday).

Start by importing pandas with the alias `pd`.

In [None]:
import pandas as pd

### 2.

The dataset we'll be working with has the following columns:

|column      |description |
|:-------------|:-----------|
|country       | Country Name |
|food_category | Food Category |
|consumption   | Consumption (kg/person/year) |
|co2_emmission | CO2 Emission (kg CO2/person/year) |


Import the dataset as a dataframe named `df` from this url: <https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv>

In [None]:
url = 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv'
# Read every row: the header is the first line, so no rows need skipping
df = pd.read_csv(url)
# Print a few records of df
df.head()

,country,food_category,consumption,co2_emmission
0,Argentina,Pork,10.51,37.20
1,Argentina,Poultry,38.66,41.53
2,Argentina,Beef,55.48,1712.00
3,Argentina,Lamb & Goat,1.56,54.63
4,Argentina,Fish,4.36,6.96


### 3.

How many rows and columns are there in the dataframe?

In [None]:
df.shape

(1430, 4)

There are 1430 rows and 4 columns.

### 4.

What is the type of data in each column of `df`?

In [None]:
df.dtypes

country           object
food_category     object
consumption      float64
co2_emmission    float64
dtype: object

### 5.

What is the mean `co2_emmission` of the whole dataset?

In [None]:
# Select the column first; averaging the whole dataframe would also try the text columns
df['co2_emmission'].mean()

74.383993006993

The mean `co2_emmission` of the whole dataset is about 74.38 kg CO2/person/year.
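Calling `mean()` on the entire dataframe also tries to average the text columns `country` and `food_category`: pandas 1.x dropped them with a deprecation warning, while pandas 2.x raises a `TypeError`. Either select the single column first or ask for numeric columns only with `numeric_only=True`. A minimal sketch on a hypothetical toy frame (the names `toy` and `means` are illustrative only):

```python
import pandas as pd

# Hypothetical toy rows with the same four columns as the dataset
toy = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B"],
    "food_category": ["Beef", "Rice", "Rice"],
    "consumption": [10.0, 20.0, 30.0],
    "co2_emmission": [300.0, 25.0, 45.0],
})

# Average only the numeric columns; the text columns are left out
means = toy.mean(numeric_only=True)
print(means)
```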


### 6.

How many different kinds of foods are there in the dataset? How many countries are in the dataset?

In [None]:
print(f"There are {df['food_category'].nunique()} foods.")
print(f"There are {df['country'].nunique()} countries.")


There are 11 foods.
There are 130 countries.
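`nunique()` returns only the number of distinct values. `value_counts()` shows how many rows each value has, which is a quick sanity check on the layout: 11 foods × 130 countries = 1430, the row count of the complete file, which is consistent with (though not proof of) every country having exactly one row per food. A sketch on hypothetical toy data in which country B is missing one food:

```python
import pandas as pd

# Hypothetical toy rows: country "B" lists only two of the three foods
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "food_category": ["Beef", "Rice", "Pork", "Beef", "Rice"],
})

# Number of distinct foods (what nunique() reports)
n_foods = toy["food_category"].nunique()

# Rows per country: a country with fewer rows than n_foods is missing a food
rows_per_country = toy["country"].value_counts()
print(n_foods)
print(rows_per_country)
```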


### 7.

What is the maximum `co2_emmission` in the dataset and which food type and country does it belong to?

In [None]:
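# idxmax() returns the index label of the largest value; loc looks that row up
top = df.loc[df['co2_emmission'].idxmax()]
print(f"{top['co2_emmission']} kg CO2/person/year: {top['food_category']} in {top['country']}")


1712.0 kg CO2/person/year: Beef in Argentina

The maximum `co2_emmission` in the dataset is 1712 kg CO2/person/year, and it belongs to beef consumption in Argentina.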
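`max()` and `idxmax()` report only the single largest value. To see whether that maximum stands far above the rest, `nlargest` returns the top n rows sorted in descending order. A sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "food_category": ["Beef", "Rice", "Beef", "Lamb & Goat"],
    "co2_emmission": [500.0, 40.0, 900.0, 1100.0],
})

# The two rows with the highest emissions, largest first
top2 = toy.nlargest(2, "co2_emmission")
print(top2)
```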

### 8.

How many countries produce more than 1000 kg CO2/person/year for at least one food type?

In [None]:
# Filter without converting to int: astype(int) would truncate the values and change df for later questions
high_emitters = df[df['co2_emmission'] > 1000]
print(f"There are {high_emitters['country'].nunique()} countries producing more than 1000 kg CO2/person/year for at least one food type.")
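Another way to phrase this question is per country: take each country's largest single-food emission with `groupby`, then count how many of those exceed the threshold. The frame below is hypothetical; note that a value of exactly 1000 does not count, because the comparison is strict.

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "food_category": ["Beef", "Lamb & Goat", "Beef", "Rice", "Beef"],
    "co2_emmission": [1200.0, 1050.0, 400.0, 30.0, 1000.0],
})

# Highest single-food emission per country
max_per_country = toy.groupby("country")["co2_emmission"].max()

# Count the countries whose maximum is strictly above 1000
n_countries = int((max_per_country > 1000).sum())
print(n_countries)
```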


### 9.

Which country consumes the least amount of beef per person per year?

In [None]:

# Keep only the beef rows, then look up the country at the smallest consumption value
beef = df[df['food_category'] == 'Beef']
beef.loc[beef['consumption'].idxmin(), 'country']


'Liberia'

Liberia consumes the least amount of beef per person per year (0.78 kg).
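`idxmin` combined with `groupby` answers the same "who consumes the least" question for every food category at once: the grouped `idxmin` returns one row label per food, and `loc` fetches those rows. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy rows: three countries, two foods
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "food_category": ["Beef", "Beef", "Beef", "Soybeans", "Soybeans", "Soybeans"],
    "consumption": [20.0, 0.5, 8.0, 1.0, 3.0, 12.0],
})

# For each food, the row label of its smallest consumption value
idx = toy.groupby("food_category")["consumption"].idxmin()

# Fetch those rows to see which country each one belongs to
lowest = toy.loc[idx, ["food_category", "country", "consumption"]]
print(lowest)
```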


### 10.

Which country consumes the most soybeans per person per year?

In [None]:

# Keep only the soybean rows, then look up the country at the largest consumption value
soybeans = df[df['food_category'] == 'Soybeans']
soybeans.loc[soybeans['consumption'].idxmax(), 'country']

'Taiwan. ROC'

Taiwan (listed in the dataset as "Taiwan. ROC") consumes the most soybeans per person per year (16.95 kg).
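For "who consumes the most" questions across several foods, it can help to reshape the long table into a wide one with `pivot` (one row per country, one column per food) and then call `idxmax` on each column. This sketch assumes each (country, food_category) pair appears only once, which `pivot` requires; the data are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows in the dataset's long format
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "food_category": ["Beef", "Soybeans", "Beef", "Soybeans"],
    "consumption": [20.0, 1.0, 5.0, 16.0],
})

# Reshape: one row per country, one column per food
wide = toy.pivot(index="country", columns="food_category", values="consumption")

# For each food column, the country (index label) with the highest consumption
top_consumer = wide.idxmax()
print(top_consumer)
```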


### 11.

What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?

In [None]:
# Exact category names of the meat products
meat_products = ['Pork', 'Poultry', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[df['food_category'].isin(meat_products), 'co2_emmission'].sum()
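Selecting categories with `str.contains` uses a regular expression and matches substrings, so `'Lamb'` also matches `'Lamb & Goat'`, whereas `isin` needs the exact category names. A plain `!=` or `==` comparison does no pattern matching at all: it compares against the literal string. A small sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical food categories
foods = pd.Series(["Pork", "Lamb & Goat", "Beef", "Rice", "Eggs"])

# Regex alternation: True where the text contains any of the words
by_regex = foods.str.contains("Pork|Lamb|Beef")

# Exact membership: "Lamb" alone does not match "Lamb & Goat"
by_isin = foods.isin(["Pork", "Lamb", "Beef"])

# Plain comparison: no value equals the literal pattern string, so all are True
by_equality = foods != "Pork|Lamb|Beef"
print(by_regex.tolist(), by_isin.tolist(), by_equality.tolist())
```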

### 12.

What are the total emissions of all other (non-meat) products in the dataset combined?

In [None]:
# Same meat categories as the previous question; ~ negates the mask to keep everything else
meat_products = ['Pork', 'Poultry', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[~df['food_category'].isin(meat_products), 'co2_emmission'].sum()
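Both totals can also be computed in one pass by labelling each row as meat or non-meat and grouping on that label. A sketch with hypothetical toy values, assuming the same five meat categories listed in question 11:

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "food_category": ["Beef", "Rice", "Pork", "Soybeans"],
    "co2_emmission": [100.0, 20.0, 30.0, 5.0],
})

meat_products = ["Pork", "Poultry", "Fish", "Lamb & Goat", "Beef"]

# Label every row, then sum the emissions per label
label = toy["food_category"].isin(meat_products).map({True: "meat", False: "non-meat"})
totals = toy.groupby(label)["co2_emmission"].sum()
print(totals)
```

The two groups always add up to the overall total, which is a convenient cross-check on both answers.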

<hr>
<hr>
<hr>

## Solutions

### 1.

In this set of practice exercises we'll be investigating the carbon footprint of different foods. We'll be leveraging a dataset compiled by [Kasia Kulma](https://r-tastic.co.uk/post/from-messy-to-tidy/) and contributed to [R's Tidy Tuesday project](https://github.com/rfordatascience/tidytuesday).

Start by importing pandas with the alias `pd`.

In [None]:
import pandas as pd

### 2.

The dataset we'll be working with has the following columns:

|column      |description |
|:-------------|:-----------|
|country       | Country Name |
|food_category | Food Category |
|consumption   | Consumption (kg/person/year) |
|co2_emmission | CO2 Emission (kg CO2/person/year) |


Import the dataset as a dataframe named `df` from this url: <https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv>

In [None]:
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv"
df = pd.read_csv(url)
df

,country,food_category,consumption,co2_emmission
0,Argentina,Pork,10.51,37.20
1,Argentina,Poultry,38.66,41.53
2,Argentina,Beef,55.48,1712.00
3,Argentina,Lamb & Goat,1.56,54.63
4,Argentina,Fish,4.36,6.96
...,...,...,...,...
1425,Bangladesh,Milk - inc. cheese,21.91,31.21
1426,Bangladesh,Wheat and Wheat Products,17.47,3.33
1427,Bangladesh,Rice,171.73,219.76
1428,Bangladesh,Soybeans,0.61,0.27


### 3.

How many rows and columns are there in the dataframe?

In [None]:
df.shape

(1430, 4)

### 4.

What is the type of data in each column of `df`?

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1430 entries, 0 to 1429
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   country        1430 non-null   object 
 1   food_category  1430 non-null   object 
 2   consumption    1430 non-null   float64
 3   co2_emmission  1430 non-null   float64
dtypes: float64(2), object(2)
memory usage: 44.8+ KB
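`info()` and `dtypes` report the type of each column; `select_dtypes` goes a step further and picks out the columns of a given kind, which is handy before numeric-only operations such as `mean()`. Using `exclude="number"` for the text columns works whether pandas stores strings as `object` (as in the output above) or, in newer versions, possibly as a dedicated string dtype. A sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy rows with the dataset's four columns
toy = pd.DataFrame({
    "country": ["A", "B"],
    "food_category": ["Beef", "Rice"],
    "consumption": [10.5, 171.7],
    "co2_emmission": [300.0, 219.8],
})

# Numeric columns (float64 here) and everything else (the text columns)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
text_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, text_cols)
```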


### 5.

What is the mean `co2_emmission` of the whole dataset?

In [None]:
df["co2_emmission"].mean()

74.383993006993

### 6.

How many different kinds of foods are there in the dataset? How many countries are in the dataset?

In [None]:
print(f"There are {df['food_category'].nunique()} foods.")
print(f"There are {df['country'].nunique()} countries.")

There are 11 foods.
There are 130 countries.


### 7.

What is the maximum `co2_emmission` in the dataset and which food type and country does it belong to?

In [None]:
df.loc[df['co2_emmission'].idxmax()]

country          Argentina
food_category         Beef
consumption          55.48
co2_emmission         1712
Name: 2, dtype: object
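`idxmax()` returns an index label, so it pairs with `loc` (label-based) rather than `iloc` (position-based). With the default 0..n-1 index the two happen to coincide, but after filtering or sorting they diverge. A sketch on a hypothetical frame whose labels start at 10:

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1 (as after filtering)
toy = pd.DataFrame(
    {"country": ["A", "B", "C"], "co2_emmission": [5.0, 50.0, 20.0]},
    index=[10, 11, 12],
)

label = toy["co2_emmission"].idxmax()  # 11 is an index label, not a position
row = toy.loc[label]                   # label-based lookup finds country B

# iloc treats 11 as a position, and a 3-row frame has no position 11
try:
    toy.iloc[label]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print(label, row["country"], iloc_failed)
```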

### 8.