## Polars Python library

Polars is a Python library for efficient data manipulation and analysis, especially on large datasets. Its speed and performance optimizations make it a good choice for handling big data. Its DataFrame interface keeps data manipulation tasks intuitive, and it integrates with other Python libraries such as NumPy and PyArrow, so it can draw on a wide range of existing tools. Polars DataFrames can also be converted to pandas DataFrames, which makes it easy to fit Polars into existing workflows. Overall, Polars offers a broad toolkit for data analysis, whether the task involves complex data types, large datasets, or a need for better performance.

In [17]:
# read csv file
import polars as pl
import pandas as pd

data = pl.read_csv('diamond.csv')


# check the head
data.head()


Carat Weight,Cut,Color,Clarity,Polish,Symmetry,Report,Price
f64,str,str,str,str,str,str,i64
1.1,"""Ideal""","""H""","""SI1""","""VG""","""EX""","""GIA""",5169
0.83,"""Ideal""","""H""","""VS1""","""ID""","""ID""","""AGSL""",3470
0.85,"""Ideal""","""H""","""SI1""","""EX""","""EX""","""GIA""",3183
0.91,"""Ideal""","""E""","""SI1""","""VG""","""VG""","""GIA""",4370
0.83,"""Ideal""","""G""","""SI1""","""EX""","""EX""","""GIA""",3171


In [6]:
type(data)

polars.dataframe.frame.DataFrame

## Selecting and filtering data

In [8]:
# Select specific columns: carat, cut, and price
selected_df = data.select(['Carat Weight', 'Cut', 'Price'])

# show selected_df head
selected_df.head()

Carat Weight,Cut,Price
f64,str,i64
1.1,"""Ideal""",5169
0.83,"""Ideal""",3470
0.85,"""Ideal""",3183
0.91,"""Ideal""",4370
0.83,"""Ideal""",3171


In [9]:
# filter the df with condition
filtered_df = data.filter(pl.col('Carat Weight') > 2.0)


# show filtered_df head
filtered_df.head()

Carat Weight,Cut,Color,Clarity,Polish,Symmetry,Report,Price
f64,str,str,str,str,str,str,i64
2.11,"""Ideal""","""H""","""SI1""","""VG""","""VG""","""GIA""",18609
2.51,"""Very Good""","""G""","""VS2""","""VG""","""VG""","""GIA""",34361
2.2,"""Ideal""","""H""","""VS2""","""EX""","""VG""","""GIA""",22241
2.6,"""Ideal""","""G""","""VS2""","""EX""","""EX""","""GIA""",37621
2.02,"""Good""","""I""","""VVS2""","""EX""","""VG""","""GIA""",19756


## Sorting and ordering data

In [11]:
# sort the df by price
sorted_df = data.sort(by='Price')

# show sorted_df head
sorted_df.head()

Carat Weight,Cut,Color,Clarity,Polish,Symmetry,Report,Price
f64,str,str,str,str,str,str,i64
0.77,"""Good""","""I""","""VS1""","""VG""","""G""","""AGSL""",2184
0.77,"""Good""","""I""","""SI1""","""EX""","""VG""","""GIA""",2241
0.78,"""Very Good""","""I""","""SI1""","""EX""","""VG""","""GIA""",2348
0.75,"""Ideal""","""I""","""SI1""","""VG""","""VG""","""GIA""",2383
0.76,"""Very Good""","""H""","""SI1""","""G""","""G""","""GIA""",2396


## Handling missing values

In [12]:
# drop missing values
cleaned_df = data.drop_nulls()

# show cleaned_df head
cleaned_df.head()

Carat Weight,Cut,Color,Clarity,Polish,Symmetry,Report,Price
f64,str,str,str,str,str,str,i64
1.1,"""Ideal""","""H""","""SI1""","""VG""","""EX""","""GIA""",5169
0.83,"""Ideal""","""H""","""VS1""","""ID""","""ID""","""AGSL""",3470
0.85,"""Ideal""","""H""","""SI1""","""EX""","""EX""","""GIA""",3183
0.91,"""Ideal""","""E""","""SI1""","""VG""","""VG""","""GIA""",4370
0.83,"""Ideal""","""G""","""SI1""","""EX""","""EX""","""GIA""",3171


## Grouping data based on specific columns

In [14]:
# group by cut and calculate the mean price
# (group_by is the spelling in Polars 0.19 and later; older releases used groupby)
grouped_df = data.group_by('Cut').agg(pl.col('Price').mean())

# show grouped_df head
grouped_df.head()

Cut,Price
str,f64
"""Fair""",5886.178295
"""Signature-Idea…",11541.525692
"""Ideal""",13127.331185
"""Good""",9326.65678
"""Very Good""",11484.69687


## Joining and combining DataFrames

In [16]:
# Create the first DataFrame
df1 = pl.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alan', 'Bypu', 'Cendra', 'Davki']
})


# Create the second DataFrame
df2 = pl.DataFrame({
    'id': [2, 3, 5],
    'age': [25, 30, 35]
})


# Perform an inner join on the 'id' column
joined_df = df1.join(df2, on='id')


# Display the joined DataFrame
joined_df

id,name,age
i64,str,i64
2,"""Bypu""",25
3,"""Cendra""",30


## Converting Polars DataFrames to pandas DataFrames

In [18]:
# Create a Polars DataFrame
df_polars = pl.DataFrame({
    'column_A': [1, 2, 3],
    'column_B': ['apple', 'banana', 'orange']
})


# Convert Polars DataFrame to Pandas DataFrame
df_pandas = df_polars.to_pandas()


# Display the Pandas DataFrame
df_pandas

,column_A,column_B
0,1,apple
1,2,banana
2,3,orange
