# 0101 - First Session With Python - Solution Notebook

* Written by Alexandre Gazagnes
* Last update: 2025-02-01

## About 

### Using Jupyter

You have 2 options: 
- Locally: 

    - **Install Anaconda https://www.anaconda.com/ or Jupyter https://jupyter.org/install on your machine**

    - Use Anaconda or Jupyter installed on the Unilasalle PC (**Warning ⚠️**: some packages may be missing) 


- Online:

    - **Use Google Colab https://colab.research.google.com/** (you need to be signed in to your Google account)

    - **Open this notebook directly in Google Colab**

    - Use Jupyter online  https://jupyter.org/try-jupyter (**Warning ⚠️**: External packages cannot be installed) 


### Material

All the material for this course can be found here:
- https://github.com/AlexandreGazagnes/Unilassalle-Public-Ressources/tree/main/4a-data-analysis

### Python / Jupyter?

A few questions:
- Why Python?
- Python vs R?
- What is data analysis?
- What are we talking about?
- What is Jupyter?

### Context

You are a new employee of the NPO named "NPO".

You are in charge of data analysis.

Your first project is about greenhouse gas (GHG) emissions, more precisely those related to bovine meat.

### Data

After a quick search online, you find a very interesting dataset on the FAO website. It contains a list of various indicators. You decide to use this dataset to identify segments (groups) of countries.

- Find relevant data:
    - https://www.kaggle.com/datasets/unitednations/global-food-agriculture-statistics
    - https://www.kaggle.com/datasets/dorbicycle/world-foodfeed-production
    - https://www.fao.org/faostat/en/
    - https://fr-en.openfoodfacts.org/
    - https://fr-en.openfoodfacts.org/data


**You can use a preprocessed version of the dataset [here](https://gist.githubusercontent.com/AlexandreGazagnes/2000e5c0e9149ffdb8c682a751ac448a/raw/35ad83320c26155415b7cccff8a4150ee80ee501/FAO_Unilassalle_raw.csv).** (Best option)



### Mission


Our job is to:
* Prepare notebook environment
* Load data
* Explore data
* Clean data ==> Select relevant data
* Clean data ==> Handle missing values
* Clean data ==> Handle duplicates ? 
* Clean data ==> Handle outliers ?
* Perform some basic analysis and data inspection
* Perform some basic visualisation
* Export our data

### Useful Resources about Google Colab


- On YouTube:
    - https://www.youtube.com/watch?v=8KeJZBZGtYo
    - https://www.youtube.com/watch?v=JJYZ3OE_lGo
    - https://www.youtube.com/watch?v=tCVXoTV12dE

### Useful Resources about Anaconda and Jupyter


- On YouTube:
    - https://www.youtube.com/watch?v=ovlID7gefzE
    - https://www.youtube.com/watch?v=IMrxB8Mq5KU
    - https://www.youtube.com/watch?v=Ou-7G9VQugg
    - https://www.youtube.com/watch?v=5pf0_bpNbkw


### Teacher 

- More info:
    - https://www.linkedin.com/in/alexandregazagnes/
    - https://github.com/AlexandreGazagnes
    

YouTube playlist:
* https://www.youtube.com/playlist?list=PLuU_Vh8r4mJDVJVyG3Lzv5ZTa0gMaWCRp

## Preliminaries

### System

These commands show the current working directory (```pwd```), list its files (```ls```) and move up to the parent directory (```cd ..```):

Uncomment these lines if needed.

In [1]:
# pwd

In [2]:
# cd ..

In [3]:
# ls

In [4]:
# cd ..

In [5]:
# ls

This command installs the required packages:

**Please note that if you are using Google Colab, everything you need is already installed**

In [6]:
# !pip install pandas matplotlib seaborn plotly scikit-learn

This command downloads the dataset as a local file:

**Please note that later in this notebook we read the dataset directly from its URL, so this step is optional**

In [7]:
# !wget https://gist.githubusercontent.com/AlexandreGazagnes/2000e5c0e9149ffdb8c682a751ac448a/raw/35ad83320c26155415b7cccff8a4150ee80ee501/FAO_Unilassalle_raw.csv

### Imports

Import data libraries:

In [8]:
import pandas as pd  # DataFrame
import numpy as np  # Matrix and advanced maths operations

Import graphical libraries:

In [9]:
import matplotlib.pyplot as plt  # Visualisation
import seaborn as sns  # Statistical visualisation (built on matplotlib)
import plotly.express as px  # Interactive visualisation (used for the bar and line plots below)

:warning: **These imports are required: the rest of this notebook cannot run without pandas, matplotlib, etc.**

### Data

1st option: read the dataset directly from the web

In [10]:
# url
url = "https://gist.githubusercontent.com/AlexandreGazagnes/2000e5c0e9149ffdb8c682a751ac448a/raw/35ad83320c26155415b7cccff8a4150ee80ee501/FAO_Unilassalle_raw.csv"
url

'https://gist.githubusercontent.com/AlexandreGazagnes/2000e5c0e9149ffdb8c682a751ac448a/raw/35ad83320c26155415b7cccff8a4150ee80ee501/FAO_Unilassalle_raw.csv'

Read the data:

In [11]:
df = pd.read_csv(url, encoding="latin1")
df.head()

,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,AFG,2,Afghanistan,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
1,AFG,2,Afghanistan,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
2,AFG,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,AFG,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,AFG,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200


2nd option: read the data from a local file

In [12]:
# or, uncomment and adapt the path to your own file:

# fn = "my/awesome/repository/my_awesome_file.csv"
# fn = "./data/source/FAO.csv"
# df = pd.read_csv(fn, encoding='latin1')

## Data Exploration

### Display

Display the first rows of the dataset:

In [13]:
# head

df.head()

,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
0,AFG,2,Afghanistan,2511,Wheat and products,5142,Food,1000 tonnes,33.94,67.71,...,3249.0,3486.0,3704.0,4164.0,4252.0,4538.0,4605.0,4711.0,4810,4895
1,AFG,2,Afghanistan,2805,Rice (Milled Equivalent),5142,Food,1000 tonnes,33.94,67.71,...,419.0,445.0,546.0,455.0,490.0,415.0,442.0,476.0,425,422
2,AFG,2,Afghanistan,2513,Barley and products,5521,Feed,1000 tonnes,33.94,67.71,...,58.0,236.0,262.0,263.0,230.0,379.0,315.0,203.0,367,360
3,AFG,2,Afghanistan,2513,Barley and products,5142,Food,1000 tonnes,33.94,67.71,...,185.0,43.0,44.0,48.0,62.0,55.0,60.0,72.0,78,89
4,AFG,2,Afghanistan,2514,Maize and products,5521,Feed,1000 tonnes,33.94,67.71,...,120.0,208.0,233.0,249.0,247.0,195.0,178.0,191.0,200,200


Display the last rows of the dataset:

In [14]:
# tail

df.tail(10)

,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
21467,ZWE,181,Zimbabwe,2943,Meat,5142,Food,1000 tonnes,-19.02,29.15,...,222.0,228.0,233.0,238.0,242.0,265.0,262.0,277.0,280,258
21468,ZWE,181,Zimbabwe,2945,Offals,5142,Food,1000 tonnes,-19.02,29.15,...,20.0,20.0,21.0,21.0,21.0,21.0,21.0,21.0,22,22
21469,ZWE,181,Zimbabwe,2946,Animal fats,5142,Food,1000 tonnes,-19.02,29.15,...,26.0,26.0,29.0,29.0,27.0,31.0,30.0,25.0,26,20
21470,ZWE,181,Zimbabwe,2949,Eggs,5142,Food,1000 tonnes,-19.02,29.15,...,15.0,18.0,18.0,21.0,22.0,27.0,27.0,24.0,24,25
21471,ZWE,181,Zimbabwe,2948,Milk - Excluding Butter,5521,Feed,1000 tonnes,-19.02,29.15,...,21.0,21.0,21.0,21.0,21.0,23.0,25.0,25.0,30,31
21472,ZWE,181,Zimbabwe,2948,Milk - Excluding Butter,5142,Food,1000 tonnes,-19.02,29.15,...,373.0,357.0,359.0,356.0,341.0,385.0,418.0,457.0,426,451
21473,ZWE,181,Zimbabwe,2960,"Fish, Seafood",5521,Feed,1000 tonnes,-19.02,29.15,...,5.0,4.0,9.0,6.0,9.0,5.0,15.0,15.0,15,15
21474,ZWE,181,Zimbabwe,2960,"Fish, Seafood",5142,Food,1000 tonnes,-19.02,29.15,...,18.0,14.0,17.0,14.0,15.0,18.0,29.0,40.0,40,40
21475,ZWE,181,Zimbabwe,2961,"Aquatic Products, Other",5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
21476,ZWE,181,Zimbabwe,2928,Miscellaneous,5142,Food,1000 tonnes,-19.02,29.15,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


Display a sample of the dataset:

In [15]:
# sample 10

df.sample(10)

,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
6366,SLV,60,El Salvador,2764,"Marine Fish, Other",5521,Feed,1000 tonnes,13.79,-88.9,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
17493,SVN,198,Slovenia,2579,Sesameseed Oil,5142,Food,1000 tonnes,46.15,15.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
1197,AZE,52,Azerbaijan,2949,Eggs,5142,Food,1000 tonnes,40.14,47.58,...,44.0,47.0,42.0,50.0,64.0,68.0,67.0,72.0,70,77
1654,BLR,57,Belarus,2761,Freshwater Fish,5142,Food,1000 tonnes,53.71,27.95,...,12.0,14.0,17.0,25.0,34.0,38.0,40.0,31.0,34,34
2314,BIH,80,Bosnia and Herzegovina,2558,Rape and Mustardseed,5521,Feed,1000 tonnes,43.92,17.68,...,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0,2
13706,NLD,150,Netherlands,2536,Sugar cane,5521,Feed,1000 tonnes,52.13,5.29,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0
4282,CHN,214,"China, Taiwan Province of",2520,"Cereals, Other",5521,Feed,1000 tonnes,23.7,120.96,...,3.0,2.0,3.0,1.0,2.0,1.0,1.0,1.0,1,1
2412,BWA,20,Botswana,2514,Maize and products,5521,Feed,1000 tonnes,-22.33,24.68,...,3.0,3.0,3.0,3.0,4.0,3.0,3.0,4.0,4,8
19141,TLS,176,Timor-Leste,2905,Cereals - Excluding Beer,5142,Food,1000 tonnes,-8.87,125.73,...,139.0,155.0,156.0,159.0,170.0,162.0,178.0,179.0,185,183
9893,ITA,106,Italy,2734,Poultry Meat,5142,Food,1000 tonnes,41.87,12.57,...,917.0,904.0,795.0,940.0,1020.0,1045.0,1063.0,1101.0,1158,1135


In [16]:
# Sample with just 10% of the dataset

df.sample(frac=0.1)

,Area Abbreviation,Area Code,Area,Item Code,Item,Element Code,Element,Unit,latitude,longitude,...,Y2004,Y2005,Y2006,Y2007,Y2008,Y2009,Y2010,Y2011,Y2012,Y2013
4593,COD,46,Congo,2617,Apples and products,5142,Food,1000 tonnes,-0.23,15.83,...,0.0,0.0,1.0,0.0,1.0,1.0,1.0,2.0,2,3
1014,AUT,11,Austria,2640,Pepper,5142,Food,1000 tonnes,47.52,14.55,...,2.0,2.0,2.0,2.0,1.0,1.0,1.0,0.0,1,1
6963,FRA,68,France,2580,Olive Oil,5142,Food,1000 tonnes,46.23,2.21,...,99.0,98.0,100.0,104.0,110.0,116.0,114.0,113.0,114,112
10535,KEN,114,Kenya,2612,"Lemons, Limes and products",5142,Food,1000 tonnes,-0.02,37.91,...,14.0,14.0,14.0,15.0,16.0,18.0,14.0,11.0,23,17
13603,NPL,149,Nepal,2577,Palm Oil,5142,Food,1000 tonnes,28.39,84.12,...,30.0,32.0,24.0,23.0,19.0,19.0,23.0,21.0,12,23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,ALB,3,Albania,2513,Barley and products,5142,Food,1000 tonnes,41.15,20.17,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,1
11070,LVA,119,Latvia,2531,Potatoes and products,5142,Food,1000 tonnes,56.88,24.60,...,284.0,272.0,253.0,213.0,225.0,284.0,243.0,232.0,251,239
8481,GUY,91,Guyana,2848,Milk - Excluding Butter,5142,Food,1000 tonnes,4.86,-58.93,...,100.0,93.0,96.0,94.0,97.0,94.0,92.0,95.0,112,113
16794,STP,193,Sao Tome and Principe,2782,"Fish, Liver Oil",5142,Food,1000 tonnes,0.19,6.61,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0


### Structure

What is the shape of the dataset?

In [17]:
# shape

df.shape

(21477, 63)

What data types are present in the dataset?

In [18]:
# dtypes

df.dtypes

Area Abbreviation     object
Area Code              int64
Area                  object
Item Code              int64
Item                  object
                      ...   
Y2009                float64
Y2010                float64
Y2011                float64
Y2012                  int64
Y2013                  int64
Length: 63, dtype: object

:warning: 
**Please note that these are the main pandas dtypes, together with the Python types they correspond to.**
Data types:
- int64: *integer*: 1, 2, 12332, 1_000_000
- float64: *floating-point number*: 1.243453, 198776.8789, 1.9776
- object: in this example, object stands for *string*: "Paris", "Rouen", "Lea"
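
This minimal sketch uses a small hypothetical DataFrame to show how basic Python values map onto these dtypes. The values come from the list above and the column names are made up. Depending on your pandas version, the text column may show a dedicated string dtype instead of ```object```.

```python
import pandas as pd

# Hypothetical toy DataFrame: one column per basic Python type
toy = pd.DataFrame(
    {
        "integer": [1, 12332, 1_000_000],  # Python int   -> int64
        "floating": [1.243453, 198776.8789, 1.9776],  # Python float -> float64
        "text": ["Paris", "Rouen", "Lea"],  # Python str   -> object (or a string dtype in recent pandas)
    }
)

print(toy.dtypes)
```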

Count the number of columns with specific data types:

In [19]:
# value_counts

df.dtypes.value_counts()

float64    53
object      5
int64       5
Name: count, dtype: int64

Select only string columns:

In [22]:
# select_dtypes

df.select_dtypes(include="object").head()

,Area Abbreviation,Area,Item,Element,Unit
0,AFG,Afghanistan,Wheat and products,Food,1000 tonnes
1,AFG,Afghanistan,Rice (Milled Equivalent),Food,1000 tonnes
2,AFG,Afghanistan,Barley and products,Feed,1000 tonnes
3,AFG,Afghanistan,Barley and products,Food,1000 tonnes
4,AFG,Afghanistan,Maize and products,Feed,1000 tonnes


Count the unique values in each string column:

In [23]:
# nunique

df.select_dtypes(include="object").nunique()

Area Abbreviation    169
Area                 174
Item                 115
Element                2
Unit                   1
dtype: int64

### Select data

Display all the columns:

In [None]:
# columns

df.columns

Keep only a subset of the columns:

In [None]:
columns = [
    "Area Abbreviation",
    "Area Code",
    "Area",
    "Item Code",
    "Item",
    "Element Code",
    "Element",
    "Unit",
    "latitude",
    "longitude",
    "Y2010",
    "Y2011",
    "Y2012",
    "Y2013",
]
columns

Make your column selection and display the output:

In [None]:
# loc ? => JUST THE OUTPUT

df.loc[:, columns].head()

If this transformation looks right, you can overwrite your ```df``` variable:

In [None]:
# loc ? => REWRITE the DF

df = df.loc[:, columns]
df.sample(10)

Use ```iloc``` to select the value at row position n and column position m (positions start at 0, so ```n = 3``` is the 4th row):

In [None]:
# iloc

n = 3
m = 3
df.iloc[n, m]

Use ```iloc``` to select the first n rows and the first m columns (the end position is excluded):

In [None]:
# iloc

n = 3
m = 3
df.iloc[:n, :m]
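
Note that ```iloc``` works with *positions*, while ```loc``` works with *labels* (index values and column names). The two only differ once the index is no longer 0, 1, 2, ..., for instance after filtering rows. Here is a minimal sketch on a hypothetical toy DataFrame:

```python
import pandas as pd

# Hypothetical toy data whose index labels are NOT 0, 1, 2, ...
# (this is what happens after filtering rows)
toy = pd.DataFrame(
    {"Area": ["Albania", "Brazil", "Chad", "Denmark"], "Y2010": [1.0, 2.0, 3.0, 4.0]},
    index=[10, 20, 30, 40],
)

# iloc uses positions (0-based): row position 1, column position 0
by_position = toy.iloc[1, 0]  # 'Brazil'

# loc uses labels: index label 10, column label "Area"
by_label = toy.loc[10, "Area"]  # 'Albania'

# Slices: iloc excludes the end position, loc includes the end label
first_two_rows = toy.iloc[:2, :]  # positions 0 and 1
up_to_label = toy.loc[:20, :]  # labels 10 and 20 (end label included)

print(by_position, by_label)
print(first_two_rows.shape, up_to_label.shape)
```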

Keep in mind what our dataset looks like:

In [None]:
df.sample(10)

And the names of our columns:

In [None]:
df.columns

Columns containing the word *Code* are not relevant for our analysis:

In [None]:
columns = ["Area Code", "Item Code", "Element Code"]
columns

Suppose we had 1_000 columns...

Let's find a more *pythonic* way to extract the *Code* columns:

In [31]:
columns = []
for col in df.columns:
    if "Code" in col:
        columns.append(col)

:clap: We have used:
- a ```list```: ```columns = []```
- a ```for``` loop
- an ```if``` statement
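
The same loop can also be written as a *list comprehension*, which is a very common Python idiom. This minimal sketch uses an empty toy DataFrame that only reuses the FAO column names:

```python
import pandas as pd

# Empty toy DataFrame reusing some FAO column names (no rows needed)
toy = pd.DataFrame(
    columns=["Area Abbreviation", "Area Code", "Area", "Item Code",
             "Item", "Element Code", "Element", "Unit"]
)

# Loop version, as in the cell above
columns = []
for col in toy.columns:
    if "Code" in col:
        columns.append(col)

# Equivalent list comprehension: same result in one line
columns_lc = [col for col in toy.columns if "Code" in col]

print(columns_lc)  # ['Area Code', 'Item Code', 'Element Code']
```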

What is the value of the ```columns``` variable?

In [None]:
columns

Let's drop these columns:

In [None]:
# drop columns

df.drop(columns=columns).head()

Overwrite our DataFrame:

In [None]:
df = df.drop(columns=columns)
df.head()

In [None]:
# drop rows by index label (output only, df is not modified)

df.drop(index=[0, 1, 2]).head()

In [None]:
# errors="ignore": no KeyError if the columns were already dropped (e.g. when re-running the cell)

df = df.drop(columns=columns, errors="ignore")
df.head()
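
Why use ```errors="ignore"```? Dropping a column that no longer exists raises a ```KeyError```, and this happens for instance when a cell is run twice. Here is a minimal sketch on a hypothetical one-row DataFrame:

```python
import pandas as pd

toy = pd.DataFrame({"Area": ["France"], "Area Code": [68]})  # hypothetical one-row data

toy = toy.drop(columns=["Area Code"])  # first run: the column exists and is dropped

# Second run: "Area Code" is gone, so a plain drop raises KeyError
try:
    toy.drop(columns=["Area Code"])
    raised = False
except KeyError:
    raised = True

# With errors="ignore", missing labels are silently skipped
toy = toy.drop(columns=["Area Code"], errors="ignore")
print(raised, list(toy.columns))  # True ['Area']
```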

Another use of ```iloc```:

In [None]:
# Using iloc: keep every column except the first one (Area Abbreviation)

df.iloc[:, 1:].head()

So far so good, we can rewrite our ```df```

In [None]:
# Saving our df

df = df.iloc[:, 1:]
df.head()

Select a specific column:

In [None]:
# 1st implementation

df.Item.head()

In [None]:
# 2nd implementation

df.loc[:, "Item"].head()

How can we list each unique value of the ```Item``` column, in alphabetical order?

In [None]:
# Item unique ?

df.Item.sort_values().unique()

Is ```Meat``` one of the values of our ```Item``` column?

In [None]:
# Meat in Item unique ?

"Meat" in df.Item.unique()

Use a list, a for loop and an if statement to collect all the items whose name contains ```Meat```:

In [None]:
# Select meat items

meat_items = []

for item in df.Item.unique():
    if "Meat" in item:
        meat_items.append(item)

meat_items
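
Note that ```"Meat" in item``` is case-sensitive. An item whose name contains *meat* in lowercase would be missed, and FAO item lists may include names such as *Pigmeat*. This minimal sketch shows a case-insensitive alternative with the pandas ```str.contains``` method, on a hypothetical list of items:

```python
import pandas as pd

# Hypothetical item names (not the full FAO list)
items = pd.Series(["Bovine Meat", "Pigmeat", "Poultry Meat", "Eggs", "Meat"])

# Case-sensitive test, as in the loop above: misses "Pigmeat"
case_sensitive = [item for item in items.unique() if "Meat" in item]

# Case-insensitive test with the pandas string accessor
mask = items.str.contains("meat", case=False)
case_insensitive = items[mask].tolist()

print(case_sensitive)  # ['Bovine Meat', 'Poultry Meat', 'Meat']
print(case_insensitive)  # ['Bovine Meat', 'Pigmeat', 'Poultry Meat', 'Meat']
```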

Build a boolean selector:

In [None]:
# Creating a selector True / False

selector = (df.Item == "Bovine Meat").tolist()
selector[:10]

Select relevant data with the ```loc``` method : 

In [None]:
# .loc

df.loc[selector, :].head()

More concisely, pass the boolean condition directly to ```loc``` and overwrite ```df```:

In [None]:
# More advanced selection

df = df.loc[df.Item == "Bovine Meat"]
df.head()
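
You can combine conditions with ```&``` (and) and ```|``` (or), wrapping each condition in its own parentheses. To test several values at once, ```isin``` is shorter. Here is a minimal sketch on hypothetical toy rows (the numbers are made up):

```python
import pandas as pd

# Hypothetical toy rows mimicking some FAO columns
toy = pd.DataFrame(
    {
        "Area": ["France", "France", "Brazil", "Brazil"],
        "Item": ["Bovine Meat", "Eggs", "Bovine Meat", "Bovine Meat"],
        "Element": ["Food", "Food", "Food", "Feed"],
        "Y2010": [1.0, 2.0, 3.0, 4.0],
    }
)

# Combine conditions with & (and) / | (or); each condition needs its own parentheses
bovine_food = toy.loc[(toy.Item == "Bovine Meat") & (toy.Element == "Food"), :]

# isin() is a compact alternative to chaining several == with |
some_items = toy.loc[toy.Item.isin(["Bovine Meat", "Eggs"]), :]

print(bovine_food.Area.tolist())  # ['France', 'Brazil']
print(len(some_items))  # 4
```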

What about Area?

In [None]:
# Area?

df.Area.unique()[:10]

And the number of unique values in Area?

In [None]:
# Area nunique ?

df.Area.nunique()

Same for Item:

In [None]:
# Item nunique ?

df.Item.nunique()

Same for Unit:

In [None]:
# Unit nunique ?

df.Unit.nunique()

Drop the columns that are now useless:

In [None]:
# Drop other useless columns

columns = [
    "Item",
    "Element",
    "Unit",
    "latitude",
    "longitude",
]

df = df.drop(columns=columns, errors="ignore")
df

### NaN Values

Let's have a look at NaN (Not a Number) values, a.k.a. missing values:

In [None]:
# Nan Values

df.isna().head()

Count the missing values in each column:

In [None]:
# Sum of Nan Values

df.isna().sum()
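
By default ```sum()``` adds up each *column*. To count the missing values in each *row* instead, pass ```axis=1```. Here is a minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values
toy = pd.DataFrame(
    {
        "Area": ["A", "B", "C"],
        "Y2010": [1.0, np.nan, 3.0],
        "Y2011": [1.0, np.nan, np.nan],
    }
)

per_column = toy.isna().sum()  # default axis=0: one count per column
per_row = toy.isna().sum(axis=1)  # axis=1: one count per row (line)

print(per_column.to_dict())  # {'Area': 0, 'Y2010': 1, 'Y2011': 2}
print(per_row.tolist())  # [0, 2, 1]
```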

Focus on a specific column: which rows have a missing ```Y2010``` value?

In [None]:
# Select Nan Values

df.loc[df.Y2010.isna(), :]

Focus on a specific country:

In [None]:
# Other selection
df.loc[df.Area == "Sudan", :]

Drop Sudan from our DataFrame:

In [None]:
# Drop a specific country (output only, df is not modified)

df.loc[df.Area != "Sudan", :].head()

In [None]:
# Drop a specific country and overwrite df

df = df.loc[df.Area != "Sudan", :]

df.head()
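
Dropping a country by name assumes we already know which rows contain missing values. A more general alternative is ```dropna```. This notebook does not use it above; the sketch below runs on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in different columns
toy = pd.DataFrame(
    {
        "Area": ["A", "B", "C"],
        "Y2010": [1.0, np.nan, 3.0],
        "Y2011": [1.0, 2.0, np.nan],
    }
)

# Drop every row that has at least one missing value
no_nan = toy.dropna()

# Only look at some columns when deciding which rows to drop
no_nan_2010 = toy.dropna(subset=["Y2010"])

print(no_nan.Area.tolist())  # ['A']
print(no_nan_2010.Area.tolist())  # ['A', 'C']
```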

Are we done?


In [None]:
df.isna().sum()

Not strictly needed, but handy: the total number of missing values in the whole DataFrame:

In [None]:
df.isna().sum().sum()

Final output of ```df```:


In [None]:
df

### Data Inspection

In [None]:
# Describe

df.describe()

In [None]:
# Better describe ?

df.describe().round(2)

In [None]:
# Recast as int (decimals are truncated, for readability only)

df.describe().astype(int)

In [None]:
# Sort by values

df.sort_values(by="Y2010").head(20)

In [None]:
# Select small values

df.loc[df.Y2010 < 5, :]

In [None]:
# Select small values and sort

df.loc[df.Y2010 < 5, :].sort_values(by="Y2010")

In [None]:
# select 'big' values ==> drop values lower than 5 (complement of the selection above)

df = df.loc[df.Y2010 >= 5, :]
df.head()

In [None]:
# sort by values top :

df.sort_values(by="Y2010", ascending=False).head(20)

In [None]:
# Are we good ?

df.sort_values(by="Y2010", ascending=True).head(20)

In [None]:
# Just to be sure :

df.select_dtypes(include="number").head()

In [71]:
# Creating tmp variable, just with numeric values

tmp = df.select_dtypes(include="number")

In [None]:
# A correlation matrix is not really meaningful here
# (sorry for that 😅)

corr = tmp.corr()
corr.round(4)

In [None]:
# Heatmap ?

sns.heatmap(corr, annot=True)

In [None]:
# Better heatmap ?

sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".4f", vmin=0, vmax=1)

In [None]:
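# Best heatmap ever done ?

# Boolean mask: True on the upper triangle (diagonal included), which is hidden
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".4f", vmin=-1, vmax=1, mask=mask)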

In [76]:
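# Build your first function


def corr_heatmap(df):
    """Plot the lower triangle of the correlation matrix of the numeric columns of df."""
    tmp = df.select_dtypes(include="number")
    corr = tmp.corr()
    mask = np.triu(np.ones_like(corr, dtype=bool))  # hide the upper triangle and diagonal
    sns.heatmap(
        corr, annot=True, cmap="coolwarm", fmt=".4f", vmin=-1, vmax=1, mask=mask
    )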

In [None]:
# Use this function

corr_heatmap(df)

## Visualisation

### Displot

In [None]:
# Just to be sure

df.sort_values("Y2010", ascending=False).head(20)

In [None]:
# Just to be sure

df.sort_values("Y2010", ascending=False).tail(20)

In [None]:
# Distribution plot (histogram + KDE curve) with sns.displot

sns.displot(df.Y2010, kde=True)

In [None]:
# For comparison: a normal distribution (10,000 random draws)

sns.displot(np.random.normal(size=10000), kde=True, bins=100)

In [None]:
# What about skewness ?

df.Y2010.skew()

In [None]:
# What about kurtosis ?

df.Y2010.kurtosis()
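
To get a feel for these two numbers, this minimal sketch compares a symmetric sample (normal) with a sample that has a long right tail (log-normal). Both samples are randomly generated and are not FAO data. pandas reports the *excess* kurtosis, so a normal distribution gives values close to 0 for both statistics.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the numbers are reproducible

normal = pd.Series(rng.normal(size=10_000))  # symmetric reference
right_skewed = pd.Series(rng.lognormal(size=10_000))  # long right tail

# pandas returns the sample skewness and the *excess* kurtosis
# (Fisher's definition: a normal distribution gives about 0 for both)
print(round(normal.skew(), 2), round(normal.kurtosis(), 2))
print(round(right_skewed.skew(), 2), round(right_skewed.kurtosis(), 2))
```

A skewness clearly above 0 means a long tail of large values on the right.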

In [None]:
# Log1p => log(x+1) ?

log_Y2010 = np.log1p(df.Y2010)
sns.displot(log_Y2010, kde=True)
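
Why use ```log1p``` rather than ```log```? ```np.log(0)``` is ```-inf```, whereas ```np.log1p(0)``` is ```0```. To go back to the original scale, use ```np.expm1```. Here is a minimal sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

values = pd.Series([0.0, 9.0, 99.0, 999.0])  # hypothetical, spans several orders of magnitude

logged = np.log1p(values)  # log(1 + x): defined at 0, unlike np.log(0) = -inf
restored = np.expm1(logged)  # exp(x) - 1: the inverse of log1p

print(logged.round(3).tolist())  # [0.0, 2.303, 4.605, 6.908]
```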

In [None]:
# Top 5

top_5 = df.sort_values("Y2010", ascending=False).head(5)
top_5
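
pandas also provides ```nlargest```, which returns the same top rows as ```sort_values(...).head(...)``` in a single call. Here is a minimal sketch on a hypothetical toy DataFrame:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Area": ["A", "B", "C", "D", "E", "F"], "Y2010": [5.0, 60.0, 20.0, 40.0, 10.0, 30.0]}
)

top_sorted = toy.sort_values("Y2010", ascending=False).head(3)
top_nlargest = toy.nlargest(3, "Y2010")  # same rows, same descending order

print(top_nlargest.Area.tolist())  # ['B', 'D', 'F']
```

With ties, the two approaches may not keep exactly the same rows. ```nlargest``` has a ```keep``` parameter for that case.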

### Barplot

In [None]:
# Bar plot

sns.barplot(data=top_5, x="Area", y="Y2010")

In [None]:
# Same but better

px.bar(data_frame=top_5, x="Area", y="Y2010")

### Boxplot

In [None]:
# My favorite plot EVER ;)

sns.boxplot(data=df.Y2010)

In [None]:
# Ok, this one: same boxplot on a log1p scale

sns.boxplot(data=np.log1p(df.Y2010))

In [None]:
# Just another df output

df

### Lineplot

In [None]:
# Melt ?

melt = pd.melt(df, id_vars=["Area"], value_vars=["Y2010", "Y2011", "Y2012", "Y2013"])
melt
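
Here, ```melt``` turns a *wide* table (one column per year) into a *long* one (one row per country and year). The long shape is what ```sns.boxplot``` and ```px.line``` use below. Here is a minimal sketch on a hypothetical two-country table:

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
wide = pd.DataFrame({"Area": ["A", "B"], "Y2010": [1.0, 3.0], "Y2011": [2.0, 4.0]})

# Long table: one row per (country, year) pair, stacked year by year
long = pd.melt(wide, id_vars=["Area"], value_vars=["Y2010", "Y2011"])

print(long.columns.tolist())  # ['Area', 'variable', 'value']
print(long["variable"].tolist())  # ['Y2010', 'Y2010', 'Y2011', 'Y2011']
print(long["value"].tolist())  # [1.0, 3.0, 2.0, 4.0]
```

If you prefer more explicit names than *variable* and *value*, ```pd.melt``` also accepts ```var_name``` and ```value_name```.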

In [None]:
# Boxplot

sns.boxplot(data=melt, x="variable", y="value")

In [None]:
# Line plot

px.line(data_frame=melt, x="variable", y="value", color="Area")

In [None]:
# Melt only top 5

melt = pd.melt(top_5, id_vars=["Area"], value_vars=["Y2010", "Y2011", "Y2012", "Y2013"])
px.line(data_frame=melt, x="variable", y="value", color="Area")

## Export

Export the CSV file:

In [95]:
df.to_csv("data.csv", index=False)
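
To check the export, a common habit is to read the file back. This minimal sketch uses a hypothetical toy DataFrame and writes it to a temporary folder, so it does not overwrite ```data.csv```:

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame({"Area": ["A", "B"], "Y2010": [1.0, 2.0]})  # hypothetical data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.csv")
    toy.to_csv(path, index=False)  # index=False: do not write the row index
    reloaded = pd.read_csv(path)

# Same columns, same values: the round trip is lossless here
pd.testing.assert_frame_equal(toy, reloaded)
print(reloaded.columns.tolist())  # ['Area', 'Y2010']
```

Without ```index=False```, the row index is written as an unnamed first column. When the file is read again, that column comes back as ```Unnamed: 0```.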