# Lazy mode

Lazy mode is a key reason why Polars is so fast. Data analysis often involves multiple steps, and we call the full set of steps a query:
* Loading data from an internal/external source
* Data transformation
* Grouping
* Feature extraction
...

Although most steps can be performed one at a time in eager mode (the typical pandas approach), this has certain disadvantages:
* Each line of code is not aware of what the others are doing.
* Each line of code may materialize a full intermediate dataframe in memory.

In contrast, an integrated query in lazy mode lets Polars see all the steps at once, so it can identify efficiencies, minimize memory usage and produce a single output.

#### To summarize:
**Eager mode:** each line of code runs as soon as it is encountered.

**Lazy mode:** each line is added to a query plan, and the query plan is optimized internally before it is executed.

In [None]:
import polars as pl

csv_file = 'Titanic.csv'

When a CSV file is read in eager mode, we get a **DataFrame** as a result.

In [None]:
dfEager = pl.read_csv(csv_file)

Alternatively, when we scan a CSV file in lazy mode, we get a **LazyFrame** as a result.

In [None]:
dfLazy = pl.scan_csv(csv_file)

In [None]:
# Compare the types of the dataframe structures. They are different.

print(type(dfEager))
print(type(dfLazy))

<class 'polars.dataframe.frame.DataFrame'>
<class 'polars.lazyframe.frame.LazyFrame'>


### DataFrame vs LazyFrame comparison

A DataFrame holds the data itself, while a LazyFrame holds a query plan, which is executed only when the LazyFrame is evaluated (for example with collect()).

In other words, when you make changes to a DataFrame the data is updated directly, while changes to a LazyFrame only update the query plan.

In [None]:
# Data

dfEager.head(3)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Ow…","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Mis…","""female""",26.0,0,0,"""STON/O2. 31012…",7.925,,"""S"""


In [None]:
# Query plan

dfLazy

The query plan is passed to a query optimizer instead of being directly executed. The plan itself can be examined with the following command.

In [None]:
print(dfLazy.describe_optimized_plan())


  CSV SCAN Titanic.csv
  PROJECT */12 COLUMNS


### Query optimizations overview

Most query optimizations can be implemented manually if the query is built in an optimal way and we are aware that the optimization exists. Doing this by hand therefore requires three things:

 * Knowing that the optimization exists
 * Remembering to implement the optimization
 * Implementing the optimization correctly

Polars optimizations include:


 * Projection pushdown - limit the number of columns read to those required
 * Predicate pushdown - apply filter conditions as early as possible
 * Slice pushdown - limit rows processed when limited rows are required
 * Combine predicates - combine multiple filter conditions
 * Common subplan elimination - combine duplicated transformations

## Switching between a DataFrame and a LazyFrame

Once a query has been built on a LazyFrame, it is often convenient to switch to a DataFrame for further analysis. To do so, we trigger evaluation of the LazyFrame into a DataFrame by calling the collect() method:

In [None]:
dfLazy.collect().head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Ow…","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Mis…","""female""",26.0,0,0,"""STON/O2. 31012…",7.925,,"""S"""
4,1,1,"""Futrelle, Mrs.…","""female""",35.0,1,0,"""113803""",53.1,"""C123""","""S"""
5,0,3,"""Allen, Mr. Wil…","""male""",35.0,0,0,"""373450""",8.05,,"""S"""


Alternatively, a partial evaluation is also available, which essentially triggers evaluation on a limited number of rows. It is done with the fetch() method instead of collect(). In general, collect() is preferable, while fetch() is useful for development and debugging when you want to avoid running a full query on a large dataset.

In [None]:
dfLazy.fetch(3)

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Ow…","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Mis…","""female""",26.0,0,0,"""STON/O2. 31012…",7.925,,"""S"""


When you want to save intermediate values from a query, it can be useful to convert a LazyFrame into a DataFrame. A query can also contain a transformation that can only be done in eager mode. One example is a pivot, which cannot be done with a LazyFrame because the values need to be known ahead of time. After such a step you can return to lazy mode: the inverse conversion, from DataFrame to LazyFrame, is straightforward:

In [None]:
dfLazyConverted = dfEager.lazy()