# Caching
### This notebook looks at how caching a DataFrame affects query time

In [None]:
%run ../Includes/Classroom-Setup

In [None]:
%fs ls /mnt/davis/fire-calls/fire-calls-truncated-comma.csv

path,name,size
dbfs:/mnt/davis/fire-calls/fire-calls-truncated-comma.csv,fire-calls-truncated-comma.csv,89222803


#### The file is roughly 90 MB (89,222,803 bytes) on the file system.
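
As a quick check on the size reported by `%fs ls`, the byte count can be converted to megabytes. This is a minimal sketch: the helper name `bytes_to_units` is made up for illustration, and only the byte count comes from the listing above.

```python
# Size in bytes, copied from the `%fs ls` output above.
SIZE_BYTES = 89222803

def bytes_to_units(n_bytes):
    """Return (decimal megabytes, binary mebibytes) for a byte count."""
    mb = n_bytes / 1000**2    # 1 MB  = 1,000,000 bytes
    mib = n_bytes / 1024**2   # 1 MiB = 1,048,576 bytes
    return mb, mib

mb, mib = bytes_to_units(SIZE_BYTES)
print(f"{mb:.1f} MB / {mib:.1f} MiB")  # prints "89.2 MB / 85.1 MiB"
```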

In [None]:
%python
from pyspark.sql import SparkSession

# getOrCreate() reuses the active session if one already exists; otherwise it starts a local one
spark = SparkSession.builder.master('local[*]').appName('Cachingdf').getOrCreate()

In [None]:
%python
spark

In [None]:
%python
# Read the CSV with a header row and let Spark infer the column types
df = spark.read.csv('/mnt/davis/fire-calls/fire-calls-truncated-comma.csv', header=True, inferSchema=True)

#### The count takes about 3 seconds when the DataFrame is not cached

In [None]:
%python
df.count()
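
The timings quoted in this notebook come from the cell output. To measure them explicitly, one option is a small wrapper around `time.perf_counter`. This is a sketch: `timed` is a hypothetical helper, not part of PySpark, and the stand-in workload below only exists so the block runs on its own.

```python
import time

def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload so the block runs without Spark.
result, seconds = timed(lambda: sum(range(1_000_000)))
print(f"result={result}, took {seconds:.3f}s")
```

In this notebook the equivalent call would be `rows, seconds = timed(df.count)`, where `df` is the DataFrame read above.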

#### Let's cache the DataFrame
#### Note: `cache()` is lazy in the DataFrame API, so the data is only cached when the first action runs on the DataFrame

In [None]:
%python
df.cache()

#### This count took 12 seconds because, under the hood, Spark is also caching the entire DataFrame in memory

In [None]:
%python
df.count()
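
The lazy behaviour described above can be illustrated with a toy model in plain Python. This is not how Spark is implemented; `LazyCachedData` and its methods are invented for illustration. It only mirrors the pattern: `cache()` sets a flag, the first action pays for reading and storing the data, and later actions reuse the stored copy.

```python
class LazyCachedData:
    """Toy stand-in for a DataFrame with lazy caching (hypothetical, not Spark)."""

    def __init__(self, loader):
        self._loader = loader          # function that (re)reads the source data
        self._cache_requested = False
        self._cached = None
        self.loads = 0                 # how many times the source was read

    def cache(self):
        # Lazy: only records the request, nothing is read yet.
        self._cache_requested = True
        return self

    def unpersist(self):
        # Drop the stored copy and stop caching.
        self._cache_requested = False
        self._cached = None
        return self

    def _rows(self):
        if self._cached is not None:
            return self._cached        # served from the cache
        self.loads += 1
        rows = self._loader()
        if self._cache_requested:
            self._cached = rows        # materialized by the first action
        return rows

    def count(self):
        return len(self._rows())


data = LazyCachedData(lambda: list(range(5)))
data.cache()
print(data.loads)   # 0: cache() alone reads nothing
data.count()        # first action reads the source and fills the cache
data.count()        # second action reuses the cache
print(data.loads)   # 1
```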

Now we can see that the entire DataFrame is cached.
## Spark UI

You'll notice that our cached data actually takes up less space than our file on disk! That is thanks to Project Tungsten, Spark's execution engine, which stores the data in a compact binary format.

Our data takes up ~59 MB in memory, while the file takes up ~90 MB on disk!

<div><img src="http://files.training.databricks.com/images/eLearning/ucdavis/inmemorysize.png" style="height: 300px; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa; margin: 20px"/></div>

#### Let's count again
#### Now the count takes just 0.5 seconds because the entire DataFrame is cached in memory

In [None]:
%python
df.count()

#### We can also check the storage level
#### More about storage levels: [PySpark StorageLevel](https://data-flair.training/blogs/pyspark-storagelevel/)

In [None]:
%python
df.storageLevel
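
A storage level is essentially a set of flags. As a hedged sketch, the helper below reads the attributes that PySpark's `StorageLevel` exposes (`useDisk`, `useMemory`, `useOffHeap`, `deserialized`, `replication`) and reports whether anything is persisted. The function name `summarize_storage_level` is hypothetical, and the `SimpleNamespace` object is a stand-in so that the block runs without a Spark session; its flag values are an example, not the output of the cell above.

```python
from types import SimpleNamespace

def summarize_storage_level(level):
    """Summarize an object with StorageLevel-style attributes."""
    return {
        "persisted": bool(level.useDisk or level.useMemory or level.useOffHeap),
        "disk": level.useDisk,
        "memory": level.useMemory,
        "off_heap": level.useOffHeap,
        "deserialized": level.deserialized,
        "replication": level.replication,
    }

# Stand-in with example flag values (memory and disk, one replica).
example = SimpleNamespace(useDisk=True, useMemory=True, useOffHeap=False,
                          deserialized=True, replication=1)
print(summarize_storage_level(example))
```

With the notebook's DataFrame, `summarize_storage_level(df.storageLevel)` would apply the same check; once a DataFrame has been unpersisted, the `persisted` entry should be `False`.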

#### Let's unpersist the DataFrame

In [None]:
%python
df.unpersist()

#### Now check the storage level after unpersist

In [None]:
%python
df.storageLevel

#### The count takes about 3 seconds again since the DataFrame is no longer cached

In [None]:
%python
df.count()