## Memory Optimization in Python

This walkthrough measures the memory footprint of a pandas DataFrame and reduces it by converting columns to more compact data types.

## Step 1: Import the Required Library

- Import the pandas library:


In [1]:
import pandas as pd

## Step 2: Load the Dataset

- Load the **HousePrices.csv** dataset using pandas:


In [4]:
df = pd.read_csv('../../Datasets/HousePrices.csv')

In [5]:
df.head(2)

,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,sqft_above,sqft_basement,yr_built,yr_renovated,street,city,statezip,country
0,2014-05-02 00:00:00,313000.0,3.0,1.5,1340,7912,1.5,0,0,3,1340,0,1955,2005,18810 Densmore Ave N,Shoreline,WA 98133,USA
1,2014-05-02 00:00:00,2384000.0,5.0,2.5,3650,9050,2.0,0,4,5,3370,280,1921,0,709 W Blaine St,Seattle,WA 98119,USA


## Step 3: Check the DataFrame's Size and Memory Usage

- Check the size and memory usage of the DataFrame:


In [6]:
df.size

82800

**Observation**

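**df.size** returns the number of elements (rows × columns), not the number of bytes. Here it is 82,800, which is 4,600 rows × 18 columns.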
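To see how the element count differs from the byte count, here is a minimal sketch on a small DataFrame. The name `toy` and its two rows are illustrative only and stand in for the full dataset.

```python
import pandas as pd

# Illustrative 2-row, 2-column frame; not the real dataset.
toy = pd.DataFrame({
    "price": [313000.0, 2384000.0],
    "city": ["Shoreline", "Seattle"],
})

n_rows, n_cols = toy.shape
n_elements = toy.size                          # rows * columns
n_bytes = toy.memory_usage(deep=True).sum()    # bytes, including string contents

print(n_rows, n_cols, n_elements)  # 2 2 4
```

The element count depends only on the shape, whereas the byte count also depends on each column's data type.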

- Pass **deep=True** to **memory_usage()** so that the string contents of the object columns are counted, not just the 8-byte references to them:

In [7]:
df.memory_usage(deep=True) 

Index               128
date             349600
price             36800
bedrooms          36800
bathrooms         36800
sqft_living       36800
sqft_lot          36800
floors            36800
waterfront        36800
view              36800
condition         36800
sqft_above        36800
sqft_basement     36800
yr_built          36800
yr_renovated      36800
street           340484
city             297868
statezip         299000
country          276000
dtype: int64

**Observation**

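The output lists the memory usage of each column in bytes. Every numeric column takes 36,800 bytes (4,600 values × 8 bytes), while the five object columns (**date**, **street**, **city**, **statezip**, and **country**) are far larger, at roughly 276,000 to 350,000 bytes each.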
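The effect of **deep=True** can be sketched on a small frame. The `toy` frame below is illustrative, and the text column is created with an explicit object dtype (an assumption made so the example behaves the same across pandas versions).

```python
import pandas as pd

# Illustrative frame; the string column is forced to object dtype.
toy = pd.DataFrame({
    "sqft_living": [1340, 3650],
    "street": pd.Series(["18810 Densmore Ave N", "709 W Blaine St"], dtype=object),
})

shallow = toy.memory_usage()           # object columns: only the 8-byte references
deep = toy.memory_usage(deep=True)     # object columns: references plus the strings

print(shallow["street"], deep["street"])
total_bytes = deep.sum()               # total footprint of the frame in bytes
```

For numeric columns the two measurements agree; only object columns grow under **deep=True**.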

## Step 4: Check the DataFrame's Data Types

- Check the data types of the DataFrame columns:


In [8]:
df.dtypes

date              object
price            float64
bedrooms         float64
bathrooms        float64
sqft_living        int64
sqft_lot           int64
floors           float64
waterfront         int64
view               int64
condition          int64
sqft_above         int64
sqft_basement      int64
yr_built           int64
yr_renovated       int64
street            object
city              object
statezip          object
country           object
dtype: object
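When a DataFrame has many columns, it can help to split them by kind before deciding what to convert. The following is a rough sketch using `select_dtypes` on an illustrative frame (`toy`, `object_cols`, and `numeric_cols` are names chosen for this example).

```python
import pandas as pd

# Illustrative frame with one float, one integer, and one object column.
toy = pd.DataFrame({
    "price": [313000.0, 2384000.0],
    "condition": [3, 5],
    "city": pd.Series(["Shoreline", "Seattle"], dtype=object),
})

# Object columns are candidates for datetime or category conversion.
object_cols = toy.select_dtypes(include="object").columns.tolist()
# Numeric columns are candidates for a smaller integer or float type.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

print(object_cols)   # ['city']
print(numeric_cols)  # ['price', 'condition']
```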

## Step 5: Convert the Column's Data Types

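- Convert the **date** column to datetime, which stores each value in 8 bytes instead of as a Python string. Note that **astype(str)** leaves the text columns as object dtype, so their memory usage does not change:


In [9]:
# Parse the date strings into datetime values (8 bytes each).
df['date'] = pd.to_datetime(df['date'])
# Ensure the text columns hold strings; the dtype remains object.
df['street'] = df['street'].astype(str)
df['city'] = df['city'].astype(str)
df['statezip'] = df['statezip'].astype(str)
df['country'] = df['country'].astype(str)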
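Because **astype(str)** keeps a column as object dtype, one alternative worth considering for text columns with few distinct values is the category dtype. This is not what the notebook does. The sketch below uses an illustrative Series (`city`), and the actual savings depend on how many distinct values a column has.

```python
import pandas as pd

# Illustrative low-cardinality column: two distinct values repeated many times.
city = pd.Series(["Seattle", "Shoreline"] * 1000, dtype=object)

as_object = city.memory_usage(deep=True)
# category stores each distinct string once plus a small integer code per row.
as_category = city.astype("category").memory_usage(deep=True)

print(as_object, as_category)
```

A column such as **country** would likely benefit from this approach, while a column where nearly every value is unique, such as **street**, may see little or no gain.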

## Step 6: Set the Index for the DataFrame

- Set the **date** column as the index for the DataFrame:


In [10]:
df.set_index('date', inplace=True)

## Step 7: Check the Updated Memory Usage

- Check the memory usage of the DataFrame after converting the data types:


In [11]:
df.memory_usage(deep=True)

Index             36800
price             36800
bedrooms          36800
bathrooms         36800
sqft_living       36800
sqft_lot          36800
floors            36800
waterfront        36800
view              36800
condition         36800
sqft_above        36800
sqft_basement     36800
yr_built          36800
yr_renovated      36800
street           340484
city             297868
statezip         299000
country          276000
dtype: int64

In [12]:
df.size

78200

**Observation**

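Memory usage has dropped. The **date** values took 349,600 bytes as strings and now take 36,800 bytes as a datetime index, which brings the total from 2,041,480 to 1,728,552 bytes. The text columns are unchanged because they are still object dtype. **df.size** fell from 82,800 to 78,200 only because **date** moved from a column to the index, leaving 4,600 rows × 17 columns.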
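Summing the per-column figures gives a single before-and-after number for a conversion. Here is a minimal sketch on an illustrative frame (`toy`) with a repeated date string stored as object dtype.

```python
import pandas as pd

# Illustrative frame: 100 date strings stored as Python objects.
toy = pd.DataFrame({
    "date": pd.Series(["2014-05-02 00:00:00"] * 100, dtype=object),
})

before = toy.memory_usage(deep=True).sum()
toy["date"] = pd.to_datetime(toy["date"])   # 8 bytes per value after parsing
after = toy.memory_usage(deep=True).sum()
saved = before - after

print(before, after, saved)
```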

In [13]:
df.head(2)

date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,sqft_above,sqft_basement,yr_built,yr_renovated,street,city,statezip,country
2014-05-02,313000.0,3.0,1.5,1340,7912,1.5,0,0,3,1340,0,1955,2005,18810 Densmore Ave N,Shoreline,WA 98133,USA
2014-05-02,2384000.0,5.0,2.5,3650,9050,2.0,0,4,5,3370,280,1921,0,709 W Blaine St,Seattle,WA 98119,USA


## Step 8: Further Reduce the Memory Usage by Converting More Columns

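- Convert the **bedrooms** and **price** columns to smaller data types. Casting a float column to an integer type discards any fractional part, so this assumes both columns hold whole numbers that fit the target type (int8 covers -128 to 127, int32 covers roughly ±2.1 billion):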


In [14]:
df['bedrooms'] = df['bedrooms'].astype('int8')

In [15]:
df['price'] = df['price'].astype('int32')
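Before casting to a smaller integer type, it may be worth checking that the values actually fit. The helper `fits_in` below is a hypothetical function written for this example, not a pandas API, and the two Series are illustrative.

```python
import numpy as np
import pandas as pd

def fits_in(series: pd.Series, dtype: str) -> bool:
    """Return True if the series has no NaN, only whole numbers,
    and every value lies within the range of the given integer dtype."""
    info = np.iinfo(dtype)
    has_no_nan = series.notna().all()
    is_whole = (series.dropna() % 1 == 0).all()
    in_range = series.min() >= info.min and series.max() <= info.max
    return bool(has_no_nan and is_whole and in_range)

bedrooms = pd.Series([3.0, 5.0, 3.0])
price = pd.Series([313000.0, 2384000.0])

print(fits_in(bedrooms, "int8"))   # True
print(fits_in(price, "int8"))      # False: values exceed 127
print(fits_in(price, "int32"))     # True
```

A check like this guards against silent truncation or overflow when the data contains fractional or unexpectedly large values.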

## Step 9: Check Final Memory Usage

- Check the memory usage of the DataFrame after all data type conversions:


In [16]:
df.size

78200

- Use **deep=True** to get the memory usage of the object columns:

In [17]:
df.memory_usage(deep=True)

Index             36800
price             18400
bedrooms           4600
bathrooms         36800
sqft_living       36800
sqft_lot          36800
floors            36800
waterfront        36800
view              36800
condition         36800
sqft_above        36800
sqft_basement     36800
yr_built          36800
yr_renovated      36800
street           340484
city             297868
statezip         299000
country          276000
dtype: int64

**Observation**

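The **price** column fell from 36,800 to 18,400 bytes (int32, 4 bytes per value) and **bedrooms** from 36,800 to 4,600 bytes (int8, 1 byte per value), a combined saving of 50,600 bytes. **df.size** stays at 78,200 because the number of elements did not change.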
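The same idea can be applied to the remaining numeric columns. One possible approach, not used in the notebook above, is to let `pd.to_numeric` pick the smallest type through its `downcast` parameter. The frame `toy` below is illustrative, and downcasting floats to float32 reduces precision, which may not be acceptable for every column.

```python
import pandas as pd

# Illustrative frame with 64-bit integer and float columns.
toy = pd.DataFrame({
    "waterfront": [0, 0, 1],
    "yr_built": [1955, 1921, 1966],
    "floors": [1.5, 2.0, 1.0],
})
before = toy.memory_usage(deep=True).sum()

for col in toy.columns:
    if pd.api.types.is_integer_dtype(toy[col]):
        # Smallest signed integer type that holds every value.
        toy[col] = pd.to_numeric(toy[col], downcast="integer")
    elif pd.api.types.is_float_dtype(toy[col]):
        # Smallest float type, no smaller than float32.
        toy[col] = pd.to_numeric(toy[col], downcast="float")

after = toy.memory_usage(deep=True).sum()
print(toy.dtypes)
print(before, after)
```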