![](../docs/banner.png)

# Pandas

**Tomas Beuzen, September 2020**

These exercises complement [Chapter 7](../chapters/chapter7-pandas.ipynb).

## Exercises

### 1.

In this set of practice exercises we'll be investigating the carbon footprint of different foods. We'll be leveraging a dataset compiled by [Kasia Kulma](https://r-tastic.co.uk/post/from-messy-to-tidy/) and contributed to [R's Tidy Tuesday project](https://github.com/rfordatascience/tidytuesday).

Start by importing pandas with the alias `pd`.

In [1]:
import pandas as pd

### 2.

The dataset we'll be working with has the following columns:

|column      |description |
|:-------------|:-----------|
|country       | Country Name |
|food_category | Food Category |
|consumption   | Consumption (kg/person/year) |
|co2_emmission | CO2 Emission (kg CO2/person/year) |


Import the dataset as a dataframe named `df` from this url: <https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv>

In [2]:
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv"
df = pd.read_csv(url)

### 3.

How many rows and columns are there in the dataframe?

In [3]:
df.shape

(1430, 4)

### 4.

What is the type of data in each column of `df`?

In [4]:
df.dtypes

country           object
food_category     object
consumption      float64
co2_emmission    float64
dtype: object

### 5.

What is the mean `co2_emmission` of the whole dataset?

In [5]:
df['co2_emmission'].mean()

74.38399300699302

### 6.

How many different kinds of foods are there in the dataset? How many countries are in the dataset?

In [6]:
print(f"There are {df['food_category'].nunique()} foods.")
print(f"There are {df['country'].nunique()} countries.")

There are 11 foods.
There are 130 countries.


### 7.

What is the maximum `co2_emmission` in the dataset and which food type and country does it belong to?

In [7]:
df.loc[df['co2_emmission'].idxmax()]

country          Argentina
food_category         Beef
consumption          55.48
co2_emmission       1712.0
Name: 2, dtype: object
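
A note on the lookup above: `idxmax()` returns an index *label*, not an integer position, so label-based `.loc` is the accessor that matches it. With the default `RangeIndex` that `pd.read_csv` produces here, labels and positions happen to coincide. The sketch below uses a small made-up dataframe (the values and the non-default index are assumptions for illustration only) to show where the two differ.

```python
import pandas as pd

# Toy data with made-up values and a non-default index (illustrative only)
toy = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "co2_emmission": [10.0, 50.0, 30.0],
    },
    index=[100, 200, 300],
)

# idxmax() gives the index label of the row holding the maximum -> 200
label = toy["co2_emmission"].idxmax()

# .loc looks rows up by label, so it works for any index
row = toy.loc[label]
print(label)
print(row["country"])
```

On this toy frame, `toy.iloc[label]` would raise an `IndexError`, because there is no row at integer position 200.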

### 8.

How many countries produce more than 1000 kg CO2/person/year for at least one food type?

In [8]:
df[df['co2_emmission'] > 1000].count()

country          5
food_category    5
consumption      5
co2_emmission    5
dtype: int64
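
Note that `.count()` counts the *rows* that pass the filter, which equals the number of countries only if no country appears more than once among them. One way to count distinct countries directly is `nunique()` on the filtered `country` column. The sketch below uses a small made-up dataframe (all values are assumptions for illustration) in which one country exceeds the threshold for two food types.

```python
import pandas as pd

# Toy data with made-up values: country "A" exceeds 1000 for two food types
toy = pd.DataFrame(
    {
        "country": ["A", "A", "B", "C"],
        "food_category": ["Beef", "Lamb & Goat", "Beef", "Beef"],
        "co2_emmission": [1200.0, 1100.0, 1500.0, 200.0],
    }
)

high = toy[toy["co2_emmission"] > 1000]

n_rows = len(high)                        # rows above the threshold
n_countries = high["country"].nunique()   # distinct countries above the threshold
print(n_rows, n_countries)
```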

### 9.

Which country consumes the least beef per person per year?

In [9]:
df.query("food_category == 'Beef'").sort_values(by="consumption").head(1)

|      | country | food_category | consumption | co2_emmission |
|-----:|:--------|:--------------|------------:|--------------:|
| 1410 | Liberia | Beef | 0.78 | 24.07 |


### 10.

Which country consumes the most soybeans per person per year?

In [10]:
df.query("food_category == 'Soybeans'").sort_values(by="consumption", ascending=False).head(1)

|      | country | food_category | consumption | co2_emmission |
|-----:|:--------|:--------------|------------:|--------------:|
| 1010 | Taiwan. ROC | Soybeans | 16.95 | 7.63 |
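
The sort-then-`head(1)` pattern used in the last two exercises can also be written with `nsmallest()` and `nlargest()`, which select the extreme rows without an explicit sort. This is only an alternative sketch, shown on a small made-up dataframe (all values are assumptions for illustration).

```python
import pandas as pd

# Toy data with made-up consumption values (illustrative only)
toy = pd.DataFrame(
    {
        "country": ["A", "B", "C", "A", "B", "C"],
        "food_category": ["Beef", "Beef", "Beef", "Soybeans", "Soybeans", "Soybeans"],
        "consumption": [12.0, 0.5, 30.0, 1.0, 4.0, 9.0],
    }
)

# Row with the smallest beef consumption
least_beef = toy.query("food_category == 'Beef'").nsmallest(1, "consumption")

# Row with the largest soybean consumption
most_soy = toy.query("food_category == 'Soybeans'").nlargest(1, "consumption")

print(least_beef)
print(most_soy)
```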


### 11.

What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?

In [11]:
meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[df['food_category'].isin(meat), 'co2_emmission'].sum()

74441.13

### 12.

What are the total emissions of all other (non-meat) products in the dataset combined?

In [12]:
meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[~df['food_category'].isin(meat), 'co2_emmission'].sum()

31927.98
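
The meat and non-meat totals can also be computed in a single pass by grouping on the `isin()` mask. The sketch below is one possible approach, shown on a small made-up dataframe (the values and the `"meat"` / `"non-meat"` group labels are assumptions for illustration, not part of the dataset).

```python
import pandas as pd

# Toy data with made-up emission values (illustrative only)
toy = pd.DataFrame(
    {
        "food_category": ["Beef", "Rice", "Pork", "Soybeans"],
        "co2_emmission": [100.0, 5.0, 20.0, 1.0],
    }
)

meat = ["Poultry", "Pork", "Fish", "Lamb & Goat", "Beef"]

# Label each row as "meat" or "non-meat" (hypothetical labels)
group = toy["food_category"].isin(meat).map({True: "meat", False: "non-meat"})

# Sum emissions within each group
totals = toy.groupby(group)["co2_emmission"].sum()
print(totals)
```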

<hr>
<hr>
<hr>

## Solutions

### 1.

In this set of practice exercises we'll be investigating the carbon footprint of different foods. We'll be leveraging a dataset compiled by [Kasia Kulma](https://r-tastic.co.uk/post/from-messy-to-tidy/) and contributed to [R's Tidy Tuesday project](https://github.com/rfordatascience/tidytuesday).

Start by importing pandas with the alias `pd`.

In [None]:
import pandas as pd

### 2.

The dataset we'll be working with has the following columns:

|column      |description |
|:-------------|:-----------|
|country       | Country Name |
|food_category | Food Category |
|consumption   | Consumption (kg/person/year) |
|co2_emmission | CO2 Emission (kg CO2/person/year) |


Import the dataset as a dataframe named `df` from this url: <https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv>

In [None]:
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv"
df = pd.read_csv(url)
df

### 3.

How many rows and columns are there in the dataframe?

In [None]:
df.shape

### 4.

What is the type of data in each column of `df`?

In [None]:
df.info()

### 5.

What is the mean `co2_emmission` of the whole dataset?

In [None]:
df["co2_emmission"].mean()

### 6.

How many different kinds of foods are there in the dataset? How many countries are in the dataset?

In [None]:
print(f"There are {df['food_category'].nunique()} foods.")
print(f"There are {df['country'].nunique()} countries.")

### 7.

What is the maximum `co2_emmission` in the dataset and which food type and country does it belong to?

In [None]:
df.loc[df['co2_emmission'].idxmax()]

### 8.