# Python and pickle tutorial

**See the course:** https://www.datacamp.com/community/tutorials/pickle-python-tutorial

*Pickling is useful for applications where you need some degree of persistency in your data.*

For this tutorial, you will be pickling a simple dictionary. A dictionary is a collection of `key: value` pairs.

In [1]:
import pickle

# Declare a small `dog: age` dictionary
dogs_dict = { 'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10, 'Barco': 12, 'Balou': 9, 'Laika': 16 }

#### Pickling files

In [2]:
filename = 'data/dogs'

# 'wb' opens the file for writing in binary mode; the with block closes it for us
with open(filename, 'wb') as outfile:
    pickle.dump(dogs_dict, outfile)

#### Unpickling files

In [3]:
# 'rb' opens the file for reading in binary mode
with open(filename, 'rb') as infile:
    new_dict = pickle.load(infile)

**Let's print the dictionary and compare it to the previous one:**

In [4]:
print(new_dict)
print('\n')
print(new_dict==dogs_dict)
print(type(new_dict))

{'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10, 'Barco': 12, 'Balou': 9, 'Laika': 16}


True
<class 'dict'>
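
A quick way to check the same round trip without touching the disk is to pickle to an in-memory `bytes` object. The sketch below uses `pickle.dumps` and `pickle.loads`, the in-memory counterparts of `dump` and `load`, on a shortened copy of the dictionary. It also shows that the unpickled object is an equal copy, not the original object itself.

```python
import pickle

# Shortened copy of the tutorial dictionary, so the sketch is self-contained
dogs_dict = {'Ozzy': 3, 'Filou': 8, 'Luna': 5}

payload = pickle.dumps(dogs_dict)   # serialize to a bytes object
clone = pickle.loads(payload)       # rebuild a new dictionary from the bytes

print(type(payload))                # <class 'bytes'>
print(clone == dogs_dict)           # True: same content
print(clone is dogs_dict)           # False: a distinct object
```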


#### Compressing pickle files 

If you are saving a large dataset and your pickled file takes up a lot of space, you may want to compress it. This can be done using `bzip2` or `gzip`. Both compress files: `bzip2` is generally slower but usually produces smaller files than `gzip` (the course cites roughly half the size, though the actual ratio depends on the data). You'll be using `bzip2` in this tutorial.
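
Since the size difference depends on the data, it can be worth measuring it on your own objects. The sketch below is only an illustration: `sample` is a made-up dictionary (not part of the tutorial data), and the comparison uses the standard-library `gzip.compress` and `bz2.compress` functions on the pickled bytes.

```python
import bz2
import gzip
import pickle

# Hypothetical sample data, larger than dogs_dict so compression has
# something to work with
sample = {f"dog_{i}": i % 17 for i in range(10_000)}

raw = pickle.dumps(sample)   # uncompressed pickle bytes
gz = gzip.compress(raw)      # same bytes, gzip-compressed
bz = bz2.compress(raw)       # same bytes, bzip2-compressed

print("raw  :", len(raw), "bytes")
print("gzip :", len(gz), "bytes")
print("bzip2:", len(bz), "bytes")

# Both formats decompress back to exactly the same pickle bytes
assert gzip.decompress(gz) == raw
assert bz2.decompress(bz) == raw
```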

In [5]:
import bz2
import pickle

# BZ2File compresses the pickled bytes as they are written to disk
with bz2.BZ2File('data/smallerfile', 'w') as sfile:
    pickle.dump(dogs_dict, sfile)
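
Reading the compressed file back works the same way, with `bz2.BZ2File` opened in read mode and handed to `pickle.load`. This is a minimal sketch: it writes to a temporary directory rather than `data/` so that it runs anywhere, and `dogs` and `restored` are illustrative names.

```python
import bz2
import os
import pickle
import tempfile

dogs = {'Ozzy': 3, 'Filou': 8, 'Luna': 5}

# A temporary directory stands in for data/ so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'smallerfile')

    # Write: pickle straight into the compressing file object
    with bz2.BZ2File(path, 'w') as sfile:
        pickle.dump(dogs, sfile)

    # Read: BZ2File decompresses on the fly while pickle reads from it
    with bz2.BZ2File(path, 'r') as sfile:
        restored = pickle.load(sfile)

print(restored == dogs)   # True
```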

#### Unpickling Python 2 objects in Python 3

You might sometimes come across objects that were pickled in Python 2 while running Python 3. This can be a hassle to unpickle.

You could either unpickle it by running Python 2, or do it in Python 3 with `encoding='latin1'` in the `load()` function.

In [6]:
# The with block closes the file once the object is loaded
with open(filename, 'rb') as infile:
    new_dict = pickle.load(infile, encoding='latin1')

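According to the Python documentation, `encoding='latin1'` is what NumPy arrays and `datetime`, `date`, and `time` instances pickled by Python 2 require. If you need the Python 2 8-bit strings back as raw `bytes` objects instead, you could also try `encoding='bytes'`: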

In [7]:
with open(filename, 'rb') as infile:
    new_dict = pickle.load(infile, encoding='bytes')
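
To see what the `encoding` argument changes, the sketch below loads a tiny hand-written byte string that stands in for a Python 2 pickle of an 8-bit `str`. The payload `py2_pickle` is an assumption made for illustration (it was not produced by a real Python 2 interpreter), but it uses the `SHORT_BINSTRING` opcode that Python 2 emits for such strings.

```python
import pickle

# Hand-written stand-in for a Python 2 pickle of the 8-bit string "café"
# encoded in Latin-1: opcode 'U' (SHORT_BINSTRING), length 4, payload, STOP '.'
py2_pickle = b'U\x04caf\xe9.'

# Default encoding is ASCII, so the non-ASCII byte 0xe9 cannot be decoded
try:
    pickle.loads(py2_pickle)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

as_text = pickle.loads(py2_pickle, encoding='latin1')   # decoded to str
as_bytes = pickle.loads(py2_pickle, encoding='bytes')   # left as bytes

print(default_ok)   # False
print(as_text)      # café
print(as_bytes)     # b'caf\xe9'
```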

# Tutorial: `pandas` DataFrames in Python
**See the course:** https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python

Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The DataFrame is one of these structures.



This tutorial covers Pandas DataFrames, from basic manipulations to advanced operations, by tackling 11 of the most popular questions so that you understand (and avoid) the doubts of the Pythonistas who have gone before you.


### Let's create `pandas` data!

In [8]:
import pandas as pd
import os
import numpy as np

In [39]:
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

df

|   | Name | Age | Sex |
|---|------|-----|-----|
| 0 | Braund, Mr. Owen Harris | 22 | male |
| 1 | Allen, Mr. William Henry | 35 | male |
| 2 | Bonnell, Miss. Elizabeth | 58 | female |


In [10]:
print(os.popen("ls ").read())

bash_tuto.ipynb
data
networkx.ipynb
panda_DataFrame_&Pickle_tuto.ipynb
Pandas_plot_hist_etc.ipynb



- ### animals.csv <br>
A small tutorial file with one animal per row <br>
The columns are: `Animal`, `Age`, `Prenom`

In [40]:
animals = pd.read_csv('data/animals.csv')

animals.info()
animals.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Animal  5 non-null      object
 1   Age     5 non-null      int64 
 2   Prenom  5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes


|   | Animal | Age | Prenom |
|---|--------|-----|--------|
| 0 | chat | 9 | Tigrou |
| 1 | chien | 5 | Chipie |
| 2 | chat | 35 | Caramel |
| 3 | hamster | 9 | Zoé |
| 4 | chat | 12 | Bongo |


In [12]:
# Build the Birth_date list (one entry per row of the future column)
with open('data/birth_date.txt') as f:
    birth_date = f.read().splitlines()

print("** birth dates : " + str(birth_date) + "\n** birth length : " + str(len(birth_date)))
print("** birth [0] : " + birth_date[0])

** birth dates : ['(10/08/2016)', '(16/09/2019)', '(03/12/2020)', '(04/09/2017)', '(03/11/2019)']
** birth length : 5
** birth [0] : (10/08/2016)


In [13]:
# Add the Birth_date column to the DataFrame
animals["Birth_date"] = birth_date
animals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Animal      5 non-null      object
 1   Age         5 non-null      int64 
 2   Prenom      5 non-null      object
 3   Birth_date  5 non-null      object
dtypes: int64(1), object(3)
memory usage: 288.0+ bytes


In [14]:
animals.head()
# The Birth_date column now appears

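|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 0 | chat | 9 | Tigrou | (10/08/2016) |
| 1 | chien | 5 | Chipie | (16/09/2019) |
| 2 | chat | 35 | Caramel | (03/12/2020) |
| 3 | hamster | 9 | Zoé | (04/09/2017) |
| 4 | chat | 12 | Bongo | (03/11/2019) |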
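
Assigning a plain list as a new column only works when the list has one value per row, which is why the length of `birth_date` was printed beforehand. The sketch below illustrates this on a small stand-in DataFrame built inline (an assumption, since `data/animals.csv` is not available here): a list of the right length is accepted, a shorter one raises a `ValueError`.

```python
import pandas as pd

# Stand-in for the first rows of animals.csv
animals = pd.DataFrame({
    "Animal": ["chat", "chien", "chat"],
    "Age": [9, 5, 35],
})

# Three rows, three values: the column is added
animals["Prenom"] = ["Tigrou", "Chipie", "Caramel"]

# Three rows, two values: pandas refuses the assignment
try:
    animals["Birth_date"] = ["(10/08/2016)", "(16/09/2019)"]
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("Rejected:", err)

print(animals.columns.tolist())
```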


#### Using `head`, `tail` and `[ ]`

In [15]:
animals.head(2)

|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 0 | chat | 9 | Tigrou | (10/08/2016) |
| 1 | chien | 5 | Chipie | (16/09/2019) |


In [16]:
animals.tail(3)

|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 2 | chat | 35 | Caramel | (03/12/2020) |
| 3 | hamster | 9 | Zoé | (04/09/2017) |
| 4 | chat | 12 | Bongo | (03/11/2019) |


In [17]:
animals[2:4]

|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 2 | chat | 35 | Caramel | (03/12/2020) |
| 3 | hamster | 9 | Zoé | (04/09/2017) |


In [18]:
animals[2:]

|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 2 | chat | 35 | Caramel | (03/12/2020) |
| 3 | hamster | 9 | Zoé | (04/09/2017) |
| 4 | chat | 12 | Bongo | (03/11/2019) |


In [19]:
animals[:2]

|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 0 | chat | 9 | Tigrou | (10/08/2016) |
| 1 | chien | 5 | Chipie | (16/09/2019) |
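
The outputs above suggest that `head`, `tail` and `[ ]` slices are different spellings of the same positional row selection. The sketch below checks this on a stand-in DataFrame built inline (an assumption, since the CSV file is not available here), and compares a slice with its explicit `iloc` equivalent.

```python
import pandas as pd

# Stand-in for animals.csv with the default 0..4 index
animals = pd.DataFrame({
    "Animal": ["chat", "chien", "chat", "hamster", "chat"],
    "Age": [9, 5, 35, 9, 12],
})

# head(n) is the first n rows, tail(n) the last n rows
assert animals.head(2).equals(animals[:2])
assert animals.tail(3).equals(animals[2:])

# A [start:stop] slice selects rows by position, like iloc; stop is excluded
assert animals[2:4].equals(animals.iloc[2:4])

# The original row labels are kept in the result
print(animals[2:4].index.tolist())   # [2, 3]
```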


In [20]:
# Extract the birth dates
animals.loc[:, "Birth_date"]

0    (10/08/2016)
1    (16/09/2019)
2    (03/12/2020)
3    (04/09/2017)
4    (03/11/2019)
Name: Birth_date, dtype: object

In [21]:
# Show only the cats
animals[animals.Animal=='chat']

|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 0 | chat | 9 | Tigrou | (10/08/2016) |
| 2 | chat | 35 | Caramel | (03/12/2020) |
| 4 | chat | 12 | Bongo | (03/11/2019) |


In [22]:
# Show only the cats (same result, using .loc)
animals.loc[animals.Animal=='chat', :]

|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 0 | chat | 9 | Tigrou | (10/08/2016) |
| 2 | chat | 35 | Caramel | (03/12/2020) |
| 4 | chat | 12 | Bongo | (03/11/2019) |


In [23]:
# Show only the ages of the cats
animals.loc[animals.Animal=='chat', 'Age']

0     9
2    35
4    12
Name: Age, dtype: int64

In [24]:
# Get the list of the cats' ages
list(animals.loc[animals.Animal=='chat', 'Age'])

[9, 35, 12]
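
The filters above all rely on the same mechanism: a comparison such as `animals.Animal == 'chat'` produces a boolean Series, and that mask picks the rows. The sketch below makes the mask explicit on a stand-in DataFrame built inline (an assumption). The last step, combining two conditions with `&`, goes slightly beyond the cells above and is shown only as an illustration.

```python
import pandas as pd

# Stand-in for animals.csv
animals = pd.DataFrame({
    "Animal": ["chat", "chien", "chat", "hamster", "chat"],
    "Age": [9, 5, 35, 9, 12],
    "Prenom": ["Tigrou", "Chipie", "Caramel", "Zoé", "Bongo"],
})

# Boolean Series with one True/False per row
is_cat = animals["Animal"] == "chat"

# [] and .loc give the same rows when passed the same mask
cats_a = animals[is_cat]
cats_b = animals.loc[is_cat, :]
assert cats_a.equals(cats_b)

# .loc also lets you pick a column at the same time
cat_ages = animals.loc[is_cat, "Age"].tolist()
print(cat_ages)    # [9, 35, 12]

# Conditions combine with & (and) or | (or); each side needs parentheses
old_cats = animals.loc[is_cat & (animals["Age"] > 10), "Prenom"].tolist()
print(old_cats)    # ['Caramel', 'Bongo']
```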

- Extract several columns from a pandas `dataset`

In [25]:
animals.head()

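|   | Animal | Age | Prenom | Birth_date |
|---|--------|-----|--------|------------|
| 0 | chat | 9 | Tigrou | (10/08/2016) |
| 1 | chien | 5 | Chipie | (16/09/2019) |
| 2 | chat | 35 | Caramel | (03/12/2020) |
| 3 | hamster | 9 | Zoé | (04/09/2017) |
| 4 | chat | 12 | Bongo | (03/11/2019) |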


In [26]:
age_prenom = animals[['Age', 'Prenom']]

In [27]:
age_prenom.head()

|   | Age | Prenom |
|---|-----|--------|
| 0 | 9 | Tigrou |
| 1 | 5 | Chipie |
| 2 | 35 | Caramel |
| 3 | 9 | Zoé |
| 4 | 12 | Bongo |


In [28]:
# iloc[:, -2:] keeps every row and the last two columns (Prenom, Birth_date)
prenom_birthdate = animals.iloc[: , -2:]

In [29]:
prenom_birthdate.head()

|   | Prenom | Birth_date |
|---|--------|------------|
| 0 | Tigrou | (10/08/2016) |
| 1 | Chipie | (16/09/2019) |
| 2 | Caramel | (03/12/2020) |
| 3 | Zoé | (04/09/2017) |
| 4 | Bongo | (03/11/2019) |
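
Selecting columns by name (a list inside `[ ]`) and by position (`iloc`) can give the same result, as long as the column order is what you expect. The sketch below compares the two on a stand-in DataFrame built inline (an assumption). Selecting by name is usually the safer choice, since it keeps working if columns are added or reordered.

```python
import pandas as pd

# Stand-in for the first rows of the animals DataFrame
animals = pd.DataFrame({
    "Animal": ["chat", "chien"],
    "Age": [9, 5],
    "Prenom": ["Tigrou", "Chipie"],
    "Birth_date": ["(10/08/2016)", "(16/09/2019)"],
})

# By name: a list of column labels
by_name = animals[["Prenom", "Birth_date"]]

# By position: all rows, last two columns
by_position = animals.iloc[:, -2:]

assert by_name.equals(by_position)
print(by_position.columns.tolist())   # ['Prenom', 'Birth_date']
```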


- #### Converting a string 'enum' column to integers

In [41]:
animals.head()

|   | Animal | Age | Prenom |
|---|--------|-----|--------|
| 0 | chat | 9 | Tigrou |
| 1 | chien | 5 | Chipie |
| 2 | chat | 35 | Caramel |
| 3 | hamster | 9 | Zoé |
| 4 | chat | 12 | Bongo |


In [42]:
animals['Animal'] = animals['Animal'].map({'chat': 0, 'chien': 1, 'hamster': 2})

In [43]:
animals.head()

|   | Animal | Age | Prenom |
|---|--------|-----|--------|
| 0 | 0 | 9 | Tigrou |
| 1 | 1 | 5 | Chipie |
| 2 | 0 | 35 | Caramel |
| 3 | 2 | 9 | Zoé |
| 4 | 0 | 12 | Bongo |
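
One thing to watch with `map` and a dictionary: any value missing from the dictionary becomes `NaN`, and the column then switches to floats. The sketch below shows this with a made-up extra value (`'lapin'`, not in the tutorial data), then shows one possible alternative, the `category` dtype, which assigns integer codes automatically in sorted order of the labels.

```python
import pandas as pd

# Stand-in column; 'lapin' is a hypothetical value absent from the mapping
animals = pd.Series(["chat", "chien", "chat", "hamster", "lapin"], name="Animal")

# Values not found in the dictionary are mapped to NaN
codes = animals.map({'chat': 0, 'chien': 1, 'hamster': 2})
print(codes.isna().tolist())   # [False, False, False, False, True]
print(codes.dtype)             # float64, because of the NaN

# Alternative: let pandas number the categories (sorted alphabetically here)
auto = animals.astype("category").cat.codes
print(auto.tolist())           # [0, 1, 0, 2, 3]
```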
