# Example

Elements of Data Science

by [Allen Downey](https://allendowney.com)

[MIT License](https://opensource.org/licenses/MIT)

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Cocoa bean production

This example is based on [this chart from Our World In Data](https://ourworldindata.org/grapher/cocoa-beans-production-by-region).

The following cell downloads the data:

In [3]:
# Get the data file

import os

filename = 'cocoa-beans-production-by-region.csv'
if not os.path.exists(filename):
    !wget https://github.com/AllenDowney/ElementsOfDataScience/raw/master/data/cocoa-beans-production-by-region.csv
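
The `!wget` line relies on IPython's shell escape and on `wget` being installed. Outside a notebook, a minimal sketch using only the standard library might look like the following. The helper name `download` is hypothetical and not part of the original notebook.

```python
import os
from urllib.request import urlretrieve


def download(url, filename=None):
    """Download `url` unless a local copy already exists; return the local name.

    Hypothetical helper, shown as a portable alternative to `!wget`.
    """
    if filename is None:
        # Use the last path component of the URL as the file name
        filename = os.path.basename(url)
    if not os.path.exists(filename):
        urlretrieve(url, filename)
    return filename


# Example call (commented out so the block runs without network access):
# filename = download('https://github.com/AllenDowney/ElementsOfDataScience/'
#                     'raw/master/data/cocoa-beans-production-by-region.csv')
```

Like the cell above, this skips the download when the file is already present, so rerunning it is cheap.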

Now we can read the data into a Pandas `DataFrame`.

In [4]:
df = pd.read_csv(filename)
df.head()

| | Entity | Code | Year | Crops - Cocoa, beans - 661 - Production - 5510 - tonnes |
|---|---|---|---|---|
| 0 | Africa | | 1961 | 835368 |
| 1 | Africa | | 1962 | 867170 |
| 2 | Africa | | 1963 | 922621 |
| 3 | Africa | | 1964 | 1190061 |
| 4 | Africa | | 1965 | 874245 |


In [5]:
df.columns = ['Entity', 'Code', 'Year', 'Tonnes']
df.head()

| | Entity | Code | Year | Tonnes |
|---|---|---|---|---|
| 0 | Africa | | 1961 | 835368 |
| 1 | Africa | | 1962 | 867170 |
| 2 | Africa | | 1963 | 922621 |
| 3 | Africa | | 1964 | 1190061 |
| 4 | Africa | | 1965 | 874245 |


In [6]:
# Convert tonnes to millions of tonnes
df['MTonnes'] = df['Tonnes'] / 1e6
df['MTonnes'].describe()

count    4639.000000
mean        0.168764
std         0.508442
min         0.000000
25%         0.000499
50%         0.004300
75%         0.051492
max         5.277863
Name: MTonnes, dtype: float64

Here are the values in the `Entity` column, with the number of rows for each.

In [7]:
df['Entity'].value_counts()

El Salvador                        58
Western Africa                     58
Fiji                               58
Sri Lanka                          58
Equatorial Guinea                  58
                                   ..
Pacific Islands Trust Territory    30
Micronesia (country)               24
Benin                              20
Martinique                         12
Tonga                              10
Name: Entity, Length: 84, dtype: int64
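
The `Entity` column mixes aggregates such as `Africa` and `Western Africa` with individual countries. In the rows shown earlier, `Africa` has an empty `Code`, which suggests (the source does not confirm it) that aggregate regions lack a country code. Under that assumption, a rough heuristic for listing them could be sketched as follows. The function name `entities_without_code` is hypothetical.

```python
import pandas as pd


def entities_without_code(df):
    """Return the sorted names of entities whose `Code` is missing.

    Heuristic only: assumes aggregate regions have no country code,
    which may also catch historical or unusual entities.
    """
    missing = df["Code"].isna()
    return sorted(df.loc[missing, "Entity"].unique())
```

Calling `entities_without_code(df)` and inspecting the result by eye is a reasonable check before relying on it.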

We can use Pandas to create a plot similar to the Our World in Data chart. First, we select the regions to include.

In [9]:
regions = ['Africa', 'Asia', 'South America', 'Central America', 'Oceania']
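
One possible way to go from the list of regions to a plot is sketched below. It assumes the chart being imitated is a stacked area chart, that `df` has the `Entity`, `Year`, and `MTonnes` columns created above, and that each entity has at most one row per year. The function names `production_by_entity` and `plot_production` are hypothetical.

```python
import pandas as pd


def production_by_entity(df, entities, value_col="MTonnes"):
    """Return a Year x Entity table of production for the given entities."""
    # Keep only the rows for the requested entities
    subset = df[df["Entity"].isin(entities)]
    # One row per year, one column per entity
    # (pivot raises an error if an entity has duplicate years)
    wide = subset.pivot(index="Year", columns="Entity", values=value_col)
    # Preserve the order of `entities`, skipping any that are absent
    present = [name for name in entities if name in wide.columns]
    return wide[present]


def plot_production(df, entities):
    """Draw a stacked area chart of production in millions of tonnes."""
    wide = production_by_entity(df, entities)
    ax = wide.plot.area()  # requires matplotlib to be installed
    ax.set_xlabel("Year")
    ax.set_ylabel("Cocoa bean production (millions of tonnes)")
    return ax
```

With the data loaded, `plot_production(df, regions)` would draw one band per region, stacked in the order given by the list.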

We could make a similar plot for countries in West Africa as well.

In [13]:
countries = ['Ghana', 'Nigeria', 'Togo', 'Benin', 'Guinea', 'Sierra Leone', 'Liberia']
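
Not every entity covers the same span of years: in the counts above, `Benin` has only 20 rows while most entities have 58. Before plotting the countries together, it may help to check the coverage of each one. The sketch below assumes one row per entity per year. The function name `year_coverage` is hypothetical.

```python
import pandas as pd


def year_coverage(df, names):
    """Summarize which years are available for each requested entity."""
    subset = df[df["Entity"].isin(names)]
    summary = subset.groupby("Entity")["Year"].agg(
        first_year="min", last_year="max", n_rows="count"
    )
    # Reindex so that names missing from the data show up as NaN rows
    return summary.reindex(names)
```

Calling `year_coverage(df, countries)` would show the first and last year and the number of rows for each country, with a row of `NaN` for any name that does not appear in the `Entity` column.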