**Pickle**

The Python `pickle` module serializes and deserializes Python object structures. Most built-in and user-defined objects can be pickled and saved to disk (a few, such as open file handles and lambda functions, cannot). Pickling converts a Python object (list, dict, etc.) into a byte stream, which is then written to a file. That byte stream contains all the information needed to reconstruct the object in another Python script.
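As a minimal sketch of that round trip, the snippet below serializes an object to an in-memory value and rebuilds it, without touching the disk. The `record` dictionary is a made-up sample used only for illustration.

```python
import pickle

# Hypothetical sample object, used only for illustration.
record = {"name": "sample", "scores": [1, 2, 3], "nested": {"ok": True}}

# dumps() serializes the object into an in-memory bytes value.
payload = pickle.dumps(record)

# loads() rebuilds an equal, but separate, object from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)   # bytes
print(restored == record)       # True
print(restored is record)       # False
```

`pickle.dump` and `pickle.load` do the same thing against a file opened in binary mode, which is what the helper functions later in this notebook use.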

**Advantages**:

1) **Keeps the same column data types**: A CSV file can also be used to save data to disk and read it back, but if you convert column data types before writing the CSV, loading it returns the initial column data types, not the converted ones. Pickle preserves the converted types.

2) **Recursive objects (objects containing references to themselves)**: Pickle keeps track of the objects it has already serialized, so later references to the same object won’t be serialized again. (The marshal module breaks for this.)

3) **Object sharing (references to the same object in different places)**: This is similar to self-referencing objects; pickle stores the object once and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.

4) **User-defined classes and their instances**: Marshal does not support these at all, but pickle can save and restore class instances transparently. The class definition must be importable and live in the same module as when the object was stored.
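Points 2 and 3 can be checked with a small sketch. The `loop`, `shared`, and `container` objects below are hypothetical samples, not part of the notebook's data.

```python
import pickle

# Recursive object: a list that contains a reference to itself.
loop = [1, 2]
loop.append(loop)
loop_copy = pickle.loads(pickle.dumps(loop))

# Shared object: the same list is referenced from two dictionary keys.
shared = ["yes", "no"]
container = {"a": shared, "b": shared}
container_copy = pickle.loads(pickle.dumps(container))

# The self-reference survives the round trip.
print(loop_copy[2] is loop_copy)                     # True

# Both keys still point at one list, so a change made through "a"
# is visible through "b".
print(container_copy["a"] is container_copy["b"])    # True
container_copy["a"].append("maybe")
print(container_copy["b"])                           # ['yes', 'no', 'maybe']
```

The restored objects are new copies, so changing `container_copy` leaves the original `shared` list untouched.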

In [6]:
## Import necessary libraries

import os ## For getting and setting the working directory used for reading/writing files.
import pandas as pd ## To create dataframe/load/write files.
import pickle as pkl ## For writing/loading pickle file.

In [12]:
## Get current working directory.
os.getcwd()

'D:\\Python\\Optimization'

In [13]:
## Set working directory (raw string so the backslashes are not treated as escape sequences).
os.chdir(r"D:\DataScience\Pratice\Memory Optimization")
os.getcwd()

'D:\\DataScience\\Pratice\\Memory Optimization'

In [7]:
## Prepare a data dictionary.
datadict = {
    'Age': [1, 2, 3, 4, 5, 8, 10, 10],
    'Gender': [1, 1, 0, 0, 1, 0, 0, 1],
    'vacc': ['yes', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'no'],
}

In [8]:
## Prepare a dataframe from the data dictionary and display it.
dataframe= pd.DataFrame(datadict)
dataframe

| | Age | Gender | vacc |
|---|---|---|---|
| 0 | 1 | 1 | yes |
| 1 | 2 | 1 | yes |
| 2 | 3 | 0 | no |
| 3 | 4 | 0 | no |
| 4 | 5 | 1 | no |
| 5 | 8 | 0 | yes |
| 6 | 10 | 0 | yes |
| 7 | 10 | 1 | no |


In [9]:
## Get the summary statistics of data.
dataframe.describe(include='all')

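| | Age | Gender | vacc |
|---|---|---|---|
| count | 8.0 | 8.0 | 8 |
| unique | | | 2 |
| top | | | no |
| freq | | | 4 |
| mean | 5.375 | 0.5 | |
| std | 3.543102 | 0.534522 | |
| min | 1.0 | 0.0 | |
| 25% | 2.75 | 0.0 | |
| 50% | 4.5 | 0.5 | |
| 75% | 8.5 | 1.0 | |
| max | 10.0 | 1.0 | |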


In [10]:
## Get the column data types.
dataframe.dtypes

Age        int64
Gender     int64
vacc      object
dtype: object

In [11]:
## Convert to appropriate datatypes.
cats=['Gender','vacc']
for i in cats:
    dataframe[i]=dataframe[i].astype('category')
dataframe.dtypes

Age          int64
Gender    category
vacc      category
dtype: object

In [15]:
## Dump/save the object as a pickle file in the current directory under the given file name.
def pickleFileDump(fileName,df):
    ## The with block closes the file even if dump() raises an error.
    with open(fileName,'wb') as outfile:
        pkl.dump(df,outfile)

In [16]:
## Dump dataframe data into a pickle.
pickleFileDump('pickleFile',dataframe)

In [17]:
## Read the pickle file with the given file name from the current directory.
def readPickleFile(fileName):
    ## The with block closes the file even if load() raises an error.
    with open(fileName,'rb') as infile:
        return pkl.load(infile)

In [18]:
## Read dataframe data from a pickle.
loaded_pkl = readPickleFile('pickleFile')

In [19]:
## Check the data types after loading pickle back.
loaded_pkl.dtypes

Age          int64
Gender    category
vacc      category
dtype: object

## The column data types are identical before dumping to the pickle file and after loading it back: the category dtypes are preserved.

In [21]:
## Save dataframe data into a csv file.
dataframe.to_csv('csvFile.csv',index=False)

In [22]:
## Read the csv file back.
loaded_csv = pd.read_csv('csvFile.csv')

In [23]:
## Check the data types after loading csv file
loaded_csv.dtypes

Age        int64
Gender     int64
vacc      object
dtype: object

## The column data types differ before saving to the csv file and after reading it back: Gender reverts to int64 and vacc to object, so the category conversion is lost.
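The reason is that CSV is a plain-text format with no type information, so pandas has to infer the dtypes again on read. The sketch below shows the same effect with only the standard library. The `rows` list is a hypothetical sample that mirrors the Age/Gender/vacc columns, not the notebook's dataframe.

```python
import csv
import io
import pickle

# Hypothetical rows mirroring the Age/Gender/vacc columns used above.
rows = [[1, 1, "yes"], [2, 1, "yes"], [3, 0, "no"]]

# CSV round trip: values are written as text and read back as strings.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
buffer.seek(0)
csv_rows = list(csv.reader(buffer))

# Pickle round trip: the original Python types are restored.
pkl_rows = pickle.loads(pickle.dumps(rows))

print(csv_rows[0])   # ['1', '1', 'yes']
print(pkl_rows[0])   # [1, 1, 'yes']
```

With pandas, one way to get the converted types back from a CSV is to state them explicitly when reading, for example through the `dtype` argument of `pd.read_csv`.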