What is a DataFrame?

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table, or a dictionary of Series objects. DataFrames are commonly used in data analysis and manipulation tasks.

1. Dictionary and list (as DataFrame input)
2. Create Polars and Pandas DataFrames
3. Convert between DataFrames

Dictionary

A dictionary is a collection of key-value pairs. Each key is unique and maps to a value. In Python, dictionaries are defined using curly braces `{}`.

In [1]:
# Dictionary
data = {
    "ID": 1,
    "Name": 'Kunal',
    "Age": 25
}

In [2]:
data

{'ID': 1, 'Name': 'Kunal', 'Age': 25}

type() is a built-in function that returns the type of an object.

In [4]:
print(type(data))

<class 'dict'>


Here, type(data) outputs <class 'dict'>, indicating that `data` is a dictionary.

In [5]:
len(data)

3
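The key-value behavior described above can be sketched in a few lines. The variable `record` and the missing key `"City"` are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical record mirroring the `data` dictionary above.
record = {"ID": 1, "Name": "Kunal", "Age": 25}

# Look up a value by its key.
name = record["Name"]

# .get() returns a default instead of raising KeyError for a missing key.
city = record.get("City", "unknown")

# Assigning to an existing key overwrites its value, so keys stay unique.
record["Age"] = 26

# Iterate over the key-value pairs in insertion order.
for key, value in record.items():
    print(f"{key}: {value}")
```

Because keys are unique, the assignment to `"Age"` replaces the old value rather than adding a fourth entry, so `len(record)` is still 3.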

Using a Dictionary to Create a Polars DataFrame

In [7]:
import polars as pl

data = {
    "ID": 1,
    "Name": "Kunal",
    "Age": 25
}

In [8]:
df = pl.DataFrame(data)

In [9]:
df

ID,Name,Age
i64,str,i64
1,"""Kunal""",25


In [10]:
type(df)

polars.dataframe.frame.DataFrame

The output of type(df) above shows that df is an instance of the Polars DataFrame class.

https://docs.pola.rs/api/python/stable/reference/dataframe/index.html

In [12]:
df.columns

['ID', 'Name', 'Age']

In [13]:
len(df)

1

List

A list is an ordered collection of items that can contain elements of different types. Lists are defined using square brackets `[]`.

In [14]:
#List
data = [[1, 'Kunal', 25]]

In [15]:
data

[[1, 'Kunal', 25]]

In [16]:
print(type(data))

<class 'list'>


In [23]:
import polars as pl

data = [[1, 'Kunal', 25]]
df = pl.DataFrame(data, schema={"ID": int, "Name": str, "Age": int}, orient="row") # schema as a dictionary; orient="row" treats each inner list as one row
# df = pl.DataFrame(data, schema=["ID", "Name", "Age"], orient="row") # schema as a list of names; Polars infers each column's data type



In [21]:
df

ID,Name,Age
i64,str,i64
1,"""Kunal""",25


In [28]:
type(df)

polars.dataframe.frame.DataFrame

In [29]:
len(df)

1

Creating a Data Variable Using Lists and Dictionaries

In [30]:
data = {
    'Name': ['Kunal', 'John', 'Supriye', 'Sourav'],
    'Age': [25, 30, 28, 22],
    'Scores': [[90, 85, 88], [78, 82, 80], [95, 92, 89], [70, 75, 80]],
    'Details': [{
        'City': 'Delhi','Job': 'Software Engineer',
        'Designation': 'Senior Developer'
    },
    {        'City': 'New York', 'Job': 'Data Scientist',
        'Designation': 'Junior Data Analyst'
    },
    {        'City': 'London', 'Job': 'Data Analyst',
        'Designation': 'Senior Data Analyst'
    },
    {        'City': 'San Francisco', 'Job': 'Product Manager',
        'Designation': 'Senior Product Manager'
    }]
}

Creating Pandas DataFrame using Data variable

Pandas is a powerful data manipulation library in Python that provides data structures like DataFrame for handling structured data.

In [32]:
import pandas as pd
# create pandas DataFrame
df_pandas = pd.DataFrame(data)

In [33]:
df_pandas

,Name,Age,Scores,Details
0,Kunal,25,"[90, 85, 88]","{'City': 'Delhi', 'Job': 'Software Engineer', ..."
1,John,30,"[78, 82, 80]","{'City': 'New York', 'Job': 'Data Scientist', ..."
2,Supriye,28,"[95, 92, 89]","{'City': 'London', 'Job': 'Data Analyst', 'Des..."
3,Sourav,22,"[70, 75, 80]","{'City': 'San Francisco', 'Job': 'Product Mana..."


In [34]:
type(df_pandas)

pandas.core.frame.DataFrame

Creating Polars DataFrame using Data variable

Polars is another data manipulation library in Python that is designed for high performance and efficiency, especially with large datasets.

In [37]:
import polars as pl
# create polars DataFrame

df_polars = pl.DataFrame(data)

In [36]:
df_polars

Name,Age,Scores,Details
str,i64,list[i64],struct[3]
"""Kunal""",25,"[90, 85, 88]","{""Delhi"",""Software Engineer"",""Senior Developer""}"
"""John""",30,"[78, 82, 80]","{""New York"",""Data Scientist"",""Junior Data Analyst""}"
"""Supriye""",28,"[95, 92, 89]","{""London"",""Data Analyst"",""Senior Data Analyst""}"
"""Sourav""",22,"[70, 75, 80]","{""San Francisco"",""Product Manager"",""Senior Product Manager""}"


In [38]:
type(df_polars)

polars.dataframe.frame.DataFrame

Convert Pandas DataFrame to Polars

Convert a Pandas DataFrame to a Polars DataFrame using pl.from_pandas().

In [39]:
df_polars_from_pandas = pl.from_pandas(df_pandas)

In [40]:
df_polars_from_pandas

Name,Age,Scores,Details
str,i64,list[i64],struct[3]
"""Kunal""",25,"[90, 85, 88]","{""Delhi"",""Senior Developer"",""Software Engineer""}"
"""John""",30,"[78, 82, 80]","{""New York"",""Junior Data Analyst"",""Data Scientist""}"
"""Supriye""",28,"[95, 92, 89]","{""London"",""Senior Data Analyst"",""Data Analyst""}"
"""Sourav""",22,"[70, 75, 80]","{""San Francisco"",""Senior Product Manager"",""Product Manager""}"


In [41]:
type(df_polars_from_pandas)

polars.dataframe.frame.DataFrame

Convert Polars DataFrame to Pandas

Convert a Polars DataFrame to a Pandas DataFrame using .to_pandas().

In [42]:
df_pandas_from_polars = df_polars.to_pandas()

In [43]:
df_pandas_from_polars


In [44]:
type(df_pandas_from_polars)

pandas.core.frame.DataFrame

Understanding Lazy Mode in Polars
Introduction

Imagine you have a giant pile of books, and you need to find all books written by a specific author. If you start picking up each book one by one, checking the author’s name, and then putting it back, this would take a lot of time. But what if, instead, you first make a plan: “I will only look at books from the ‘Authors A-Z’ section, and I will filter only those books whose covers mention the author’s name”? That would be much faster and more efficient!

This is exactly how lazy mode in Polars works—it delays execution and optimizes queries before actually processing them.



What is Lazy Mode?

Lazy mode in Polars is a way of handling data where computations are not executed immediately. Instead, Polars waits until all operations are defined before executing them in the most efficient way possible.



How Lazy Mode Works



1. Build a Query Plan: Instead of executing operations one by one, Polars first records what needs to be done.
2. Optimize the Plan: Polars rearranges and optimizes the steps to reduce unnecessary work.
3. Execute Efficiently: Only when needed, Polars runs the optimized query.

Example: Lazy Mode

In [1]:
import polars as pl 
df = pl.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]})
lazy_df = df.lazy() # Convert to lazy mode 
filtered_lazy_df = lazy_df.filter(pl.col("age") > 28) # No execution yet! 
result = filtered_lazy_df.collect() 
print(result)

shape: (2, 2)
┌─────────┬─────┐
│ name    ┆ age │
│ ---     ┆ --- │
│ str     ┆ i64 │
╞═════════╪═════╡
│ Bob     ┆ 30  │
│ Charlie ┆ 35  │
└─────────┴─────┘


Nothing runs until .collect() is called, allowing Polars to optimize the query.

Why Use Lazy Mode?



Faster Execution: Polars avoids unnecessary computations, making it faster for large datasets.

Better Memory Management: The optimizer can skip columns and rows the query never uses, and Polars' streaming engine can process data in batches instead of keeping everything in memory.

Smart Query Optimization: It can rearrange operations for better efficiency, like pushing filters earlier to reduce the amount of data processed.

When to Use Lazy Mode?



Working with large datasets (millions or billions of rows).

Running complex queries involving multiple operations.

Needing efficient memory usage to avoid crashes.

Conclusion

Lazy mode in Polars is like planning a task before doing it—it makes execution faster and more efficient. Instead of performing every step immediately, Polars waits, optimizes, and then executes everything in the best way possible. This makes it an excellent choice for big data processing.