<div style="background-color:#5D73F2; color:#19180F; font-size:40px; font-family:Arial; padding:10px; border: 5px solid #19180F; border-radius:10px"> 40+ Pandas functions & their Custom and Polars Implementations </div>
<div style="background-color:#A8B4F6; color:#19180F; font-size:30px; font-family:Arial; padding:10px; border: 5px solid #19180F; border-radius:10px"> Objective</div>
<div style="background-color:#D5D9F2; color:#19180F; font-size:15px; font-family:Arial; padding:10px; border: 5px solid #19180F; border-radius:10px">
This notebook collects 40+ built-in functions from the pandas library, together with custom and Polars implementations and practical usage examples. It uses the Titanic - Machine Learning from Disaster dataset, which is suitable for beginners.<br><br>
This notebook is a resource for data scientists that pairs the pandas library's built-in methods with custom and Polars implementations. It covers topics and techniques such as data processing, exploration, cleansing, feature engineering, and visualisation. Each function is explained with an emphasis on its purpose, arguments, and return values. The custom and Polars implementations also give readers an opportunity to deepen their understanding of pandas and improve their programming skills.<br><br>
</div>
    

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
📌
Importing modules
    </div>


In [1]:
import pandas as pd
import polars as pl
import csv

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 1. <b> Reading & displaying the contents: </b>
Reading the dataset with pandas, Polars, and a custom method, then displaying the first 5 and last 5 rows  </div>


In [2]:
pandas_df = pd.read_csv('/kaggle/input/titanic/train.csv')

In [3]:
pandas_df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [4]:
pandas_df.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [5]:
polars_df= pl.read_csv('/kaggle/input/titanic/train.csv')

In [6]:
polars_df.head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
1,0,3,"""Braund, Mr. Ow…","""male""",22.0,1,0,"""A/5 21171""",7.25,,"""S"""
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
3,1,3,"""Heikkinen, Mis…","""female""",26.0,0,0,"""STON/O2. 31012…",7.925,,"""S"""
4,1,1,"""Futrelle, Mrs.…","""female""",35.0,1,0,"""113803""",53.1,"""C123""","""S"""
5,0,3,"""Allen, Mr. Wil…","""male""",35.0,0,0,"""373450""",8.05,,"""S"""


In [7]:
polars_df.tail()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
887,0,2,"""Montvila, Rev.…","""male""",27.0,0,0,"""211536""",13.0,,"""S"""
888,1,1,"""Graham, Miss. …","""female""",19.0,0,0,"""112053""",30.0,"""B42""","""S"""
889,0,3,"""Johnston, Miss…","""female""",,1,2,"""W./C. 6607""",23.45,,"""S"""
890,1,1,"""Behr, Mr. Karl…","""male""",26.0,0,0,"""111369""",30.0,"""C148""","""C"""
891,0,3,"""Dooley, Mr. Pa…","""male""",32.0,0,0,"""370376""",7.75,,"""Q"""


In [8]:
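# newline='' lets the csv module handle line endings inside quoted fields
with open('/kaggle/input/titanic/train.csv', 'r', newline='') as file:
    reader = csv.reader(file)

    # The first row holds the column names
    header = next(reader)

    # Every remaining row is a list of strings
    data = list(reader)

print(header)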


['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']


In [9]:
for row in data[:5]:
    print(row)

['1', '0', '3', 'Braund, Mr. Owen Harris', 'male', '22', '1', '0', 'A/5 21171', '7.25', '', 'S']
['2', '1', '1', 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'female', '38', '1', '0', 'PC 17599', '71.2833', 'C85', 'C']
['3', '1', '3', 'Heikkinen, Miss. Laina', 'female', '26', '0', '0', 'STON/O2. 3101282', '7.925', '', 'S']
['4', '1', '1', 'Futrelle, Mrs. Jacques Heath (Lily May Peel)', 'female', '35', '1', '0', '113803', '53.1', 'C123', 'S']
['5', '0', '3', 'Allen, Mr. William Henry', 'male', '35', '0', '0', '373450', '8.05', '', 'S']


In [10]:
for row in data[-5:]:
    print(row)

['887', '0', '2', 'Montvila, Rev. Juozas', 'male', '27', '0', '0', '211536', '13', '', 'S']
['888', '1', '1', 'Graham, Miss. Margaret Edith', 'female', '19', '0', '0', '112053', '30', 'B42', 'S']
['889', '0', '3', 'Johnston, Miss. Catherine Helen "Carrie"', 'female', '', '1', '2', 'W./C. 6607', '23.45', '', 'S']
['890', '1', '1', 'Behr, Mr. Karl Howell', 'male', '26', '0', '0', '111369', '30', 'C148', 'C']
['891', '0', '3', 'Dooley, Mr. Patrick', 'male', '32', '0', '0', '370376', '7.75', '', 'Q']
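The custom reader above slices the row list directly. As a minimal sketch, the same idea can be wrapped in small helpers that mimic `head()` and `tail()`. The names `read_rows`, `custom_head`, and `custom_tail` and the in-memory `SAMPLE_CSV` text are illustrative assumptions, used here so the example runs without the Kaggle file.

```python
import csv
import io

# Hypothetical stand-in for train.csv so the sketch runs anywhere.
SAMPLE_CSV = """PassengerId,Survived,Name
1,0,"Braund, Mr. Owen Harris"
2,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
3,1,"Heikkinen, Miss. Laina"
"""

def read_rows(text):
    """Return (header, rows) parsed from CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)

def custom_head(rows, n=5):
    """First n rows, similar to DataFrame.head(n)."""
    return rows[:n]

def custom_tail(rows, n=5):
    """Last n rows, similar to DataFrame.tail(n)."""
    # rows[-0:] would return every row, so n == 0 is handled explicitly.
    return rows[-n:] if n > 0 else []

header, rows = read_rows(SAMPLE_CSV)
print(header)
print(custom_head(rows, 2))
print(custom_tail(rows, 1))
```

Every cell stays a string in this approach, so numeric columns need an explicit conversion before any arithmetic.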


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 2. <b> Summary of the dataframe: </b>
Displaying the summary of dataframe  </div>


In [11]:
summary_pandas = pandas_df.describe()
summary_pandas

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


In [12]:
polars_summary = polars_df.describe()
polars_summary


describe,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
str,f64,f64,f64,str,str,f64,f64,f64,str,f64,str,str
"""count""",891.0,891.0,891.0,"""891""","""891""",891.0,891.0,891.0,"""891""",891.0,"""891""","""891"""
"""null_count""",0.0,0.0,0.0,"""0""","""0""",177.0,0.0,0.0,"""0""",0.0,"""687""","""2"""
"""mean""",446.0,0.383838,2.308642,,,29.699118,0.523008,0.381594,,32.204208,,
"""std""",257.353842,0.486592,0.836071,,,14.526497,1.102743,0.806057,,49.693429,,
"""min""",1.0,0.0,1.0,"""Abbing, Mr. An…","""female""",0.42,0.0,0.0,"""110152""",0.0,"""A10""","""C"""
"""max""",891.0,1.0,3.0,"""van Melkebeke,…","""male""",80.0,8.0,6.0,"""WE/P 5735""",512.3292,"""T""","""S"""
"""median""",446.0,0.0,3.0,,,28.0,0.0,0.0,,14.4542,,
"""25%""",223.0,0.0,2.0,,,20.0,0.0,0.0,,7.8958,,
"""75%""",669.0,1.0,3.0,,,38.0,1.0,0.0,,31.0,,


In [13]:
# Custom function
def custom_summary(df):
    """Return mean, median, min, max and std for the numeric columns of a pandas DataFrame."""
    numeric_cols = df.select_dtypes(include=['number']).columns
    summary = {
        'mean': df[numeric_cols].mean(),
        'median': df[numeric_cols].median(),
        'min': df[numeric_cols].min(),
        'max': df[numeric_cols].max(),
        'std': df[numeric_cols].std(),
    }
    return summary

# Displaying the summary, one statistic at a time
summary_custom = custom_summary(pandas_df)
for stat, value in summary_custom.items():
    print(f'{stat}: {value}')

mean: PassengerId    446.000000
Survived         0.383838
Pclass           2.308642
Age             29.699118
SibSp            0.523008
Parch            0.381594
Fare            32.204208
dtype: float64
median: PassengerId    446.0000
Survived         0.0000
Pclass           3.0000
Age             28.0000
SibSp            0.0000
Parch            0.0000
Fare            14.4542
dtype: float64
min: PassengerId    1.00
Survived       0.00
Pclass         1.00
Age            0.42
SibSp          0.00
Parch          0.00
Fare           0.00
dtype: float64
max: PassengerId    891.0000
Survived         1.0000
Pclass           3.0000
Age             80.0000
SibSp            8.0000
Parch            6.0000
Fare           512.3292
dtype: float64
std: PassengerId    257.353842
Survived         0.486592
Pclass           0.836071
Age             14.526497
SibSp            1.102743
Parch            0.806057
Fare            49.693429
dtype: float64
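The custom summary above still relies on pandas for each statistic. A possible pure-Python variant, sketched here with the standard `statistics` module, works on one column of raw CSV strings and treats empty strings as missing. The function name `column_summary` and the short `ages` list are assumptions made for illustration.

```python
import statistics

def column_summary(values):
    """Summarise one column of raw CSV strings, skipping empty (missing) cells."""
    numbers = [float(v) for v in values if v != ""]
    return {
        "count": len(numbers),
        "mean": statistics.mean(numbers),
        "median": statistics.median(numbers),
        "min": min(numbers),
        "max": max(numbers),
        # Sample standard deviation (n - 1 in the denominator), like pandas' default
        "std": statistics.stdev(numbers),
    }

# A few Age values as they appear in the CSV; "" marks a missing age.
ages = ["22", "38", "26", "35", "35", ""]
summary = column_summary(ages)
for stat, value in summary.items():
    print(f"{stat}: {value}")
```

`statistics.stdev` needs at least two values, so a column with fewer non-missing entries would have to be handled separately.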


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 3. <b> Shape of the dataframe: </b>
Displaying the shape of dataframe  </div>


In [14]:
shape_pandas = pandas_df.shape
print(shape_pandas)

(891, 12)


In [15]:
shape_polars = polars_df.shape
print(shape_polars)

(891, 12)


In [16]:
def custom_shape(df):
    num_rows, num_cols = df.shape
    return num_rows, num_cols

shape_custom = custom_shape(pandas_df)
print(f"Number of rows: {shape_custom[0]}")
print(f"Number of columns: {shape_custom[1]}")

Number of rows: 891
Number of columns: 12
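`custom_shape` above simply unpacks `df.shape`. For data read with the `csv` module, a shape can be derived from the header and row lists instead. This is a small sketch; `shape_from_rows` and the sample lists are hypothetical names and values.

```python
def shape_from_rows(header, rows):
    """Return (n_rows, n_cols) for a parsed CSV, mirroring DataFrame.shape."""
    return len(rows), len(header)

# Tiny illustrative sample, not the full dataset.
header = ["PassengerId", "Survived", "Pclass"]
rows = [["1", "0", "3"], ["2", "1", "1"]]

shape = shape_from_rows(header, rows)
print(f"Number of rows: {shape[0]}")
print(f"Number of columns: {shape[1]}")
```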


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 4. <b> Column Labels: </b>
Displaying the column labels of dataframe  </div>


In [17]:
column_labels_pandas = pandas_df.columns
print(column_labels_pandas)

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')


In [18]:
column_labels_polars = polars_df.columns
print(column_labels_polars)

['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 5. <b> Index Labels: </b>
Displaying the index labels of dataframe  </div>


In [19]:
index_labels_pandas = pandas_df.index
print(index_labels_pandas)

RangeIndex(start=0, stop=891, step=1)


In [20]:
# Polars DataFrames have no index, so convert to pandas to obtain one
index_labels_from_polars = polars_df.to_pandas().index
print(index_labels_from_polars)

RangeIndex(start=0, stop=891, step=1)


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 5. <b> Accessing group of rows and columns: </b>
Accessing group of rows and columns by labels or by integer positions  </div>


In [21]:
# Accessing rows and columns using labels (loc)
subset_pandas = pandas_df.loc[2:4, ['PassengerId', 'Survived', 'Name', 'Age']]
print(subset_pandas)

   PassengerId  Survived                                          Name   Age
2            3         1                        Heikkinen, Miss. Laina  26.0
3            4         1  Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0
4            5         0                      Allen, Mr. William Henry  35.0


In [22]:
# Accessing rows and columns using integer positions (iloc)
pandas_df.iloc[1,3]

'Cumings, Mrs. John Bradley (Florence Briggs Thayer)'

In [23]:
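# slice(offset, length) takes 4 rows starting at position 1, so this selects
# positions 1-4 rather than the labels 2-4 returned by loc[2:4] above
subset_polars = polars_df.select(['PassengerId', 'Survived', 'Name', 'Age']).slice(1, 4)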
print(subset_polars)


shape: (4, 4)
┌─────────────┬──────────┬───────────────────────────────────┬──────┐
│ PassengerId ┆ Survived ┆ Name                              ┆ Age  │
│ ---         ┆ ---      ┆ ---                               ┆ ---  │
│ i64         ┆ i64      ┆ str                               ┆ f64  │
╞═════════════╪══════════╪═══════════════════════════════════╪══════╡
│ 2           ┆ 1        ┆ Cumings, Mrs. John Bradley (Flor… ┆ 38.0 │
│ 3           ┆ 1        ┆ Heikkinen, Miss. Laina            ┆ 26.0 │
│ 4           ┆ 1        ┆ Futrelle, Mrs. Jacques Heath (Li… ┆ 35.0 │
│ 5           ┆ 0        ┆ Allen, Mr. William Henry          ┆ 35.0 │
└─────────────┴──────────┴───────────────────────────────────┴──────┘


In [24]:
# Accessing a single value by integer position (the Polars counterpart of iloc)
row_index = 1
col_index = 3
# polars_df[row_index][col_index] chains two row selections and yields an empty frame;
# index with a (row, column) pair instead
value_polars = polars_df[row_index, col_index]
print(value_polars)

Cumings, Mrs. John Bradley (Florence Briggs Thayer)
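Positional and label-based access can also be sketched for the list-of-rows representation produced by the `csv` module. The helpers `value_at` and `select_block` are illustrative names, and the three sample rows are copied from the head of the dataset.

```python
header = ["PassengerId", "Survived", "Pclass", "Name"]
rows = [
    ["1", "0", "3", "Braund, Mr. Owen Harris"],
    ["2", "1", "1", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"],
    ["3", "1", "3", "Heikkinen, Miss. Laina"],
]

def value_at(rows, row_pos, col_pos):
    """Single cell by integer position, similar to iloc[row_pos, col_pos]."""
    return rows[row_pos][col_pos]

def select_block(header, rows, start, stop, columns):
    """Rows start..stop (inclusive, as with loc) restricted to the named columns."""
    positions = [header.index(name) for name in columns]
    return [[row[p] for p in positions] for row in rows[start:stop + 1]]

print(value_at(rows, 1, 3))
print(select_block(header, rows, 1, 2, ["PassengerId", "Name"]))
```

The inclusive `stop` mirrors `loc`, whereas plain Python slicing and `iloc` exclude the end position.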


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 6. <b> Dropping specified labels</b>
Dropping specified labels from rows or columns</div>


In [25]:
# Dropping labels from rows
rows_to_drop = [2,3]  # specify the labels to drop
new_pandas_df = pandas_df.drop(index=rows_to_drop)
new_pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [26]:
columns_to_drop = ['Survived']

# Drop the specified columns
new_polars_df = polars_df.drop(columns_to_drop)

print(new_polars_df)

shape: (891, 11)
┌─────────────┬────────┬────────────────────┬────────┬───┬────────────┬─────────┬───────┬──────────┐
│ PassengerId ┆ Pclass ┆ Name               ┆ Sex    ┆ … ┆ Ticket     ┆ Fare    ┆ Cabin ┆ Embarked │
│ ---         ┆ ---    ┆ ---                ┆ ---    ┆   ┆ ---        ┆ ---     ┆ ---   ┆ ---      │
│ i64         ┆ i64    ┆ str                ┆ str    ┆   ┆ str        ┆ f64     ┆ str   ┆ str      │
╞═════════════╪════════╪════════════════════╪════════╪═══╪════════════╪═════════╪═══════╪══════════╡
│ 1           ┆ 3      ┆ Braund, Mr. Owen   ┆ male   ┆ … ┆ A/5 21171  ┆ 7.25    ┆ null  ┆ S        │
│             ┆        ┆ Harris             ┆        ┆   ┆            ┆         ┆       ┆          │
│ 2           ┆ 1      ┆ Cumings, Mrs. John ┆ female ┆ … ┆ PC 17599   ┆ 71.2833 ┆ C85   ┆ C        │
│             ┆        ┆ Bradley (Flor…     ┆        ┆   ┆            ┆         ┆       ┆          │
│ 3           ┆ 3      ┆ Heikkinen, Miss.   ┆ female ┆ … ┆ STON/O2.   ┆ 7.
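Dropping rows by position and columns by name can be approximated without either library. The sketch below assumes the list-of-rows layout from the custom CSV reader; `drop_rows` and `drop_columns` are hypothetical helper names.

```python
def drop_rows(rows, positions):
    """Return the rows whose integer position is not listed in positions."""
    skip = set(positions)
    return [row for i, row in enumerate(rows) if i not in skip]

def drop_columns(header, rows, names):
    """Return (header, rows) without the named columns."""
    unwanted = set(names)
    keep = [i for i, name in enumerate(header) if name not in unwanted]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows

header = ["PassengerId", "Survived", "Pclass"]
rows = [["1", "0", "3"], ["2", "1", "1"], ["3", "1", "3"], ["4", "1", "1"]]

print(drop_rows(rows, [2, 3]))
print(drop_columns(header, rows, ["Survived"]))
```

Both helpers return new lists and leave their inputs unchanged, which matches the non-mutating behaviour of `drop` in the cells above.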

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 7. <b> Removing missing values</b>
Removing missing values from dataframe</div>


In [27]:
pandas_df = pandas_df.dropna()


In [28]:
pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
...,...,...,...,...,...,...,...,...,...,...,...,...
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
872,873,0,1,"Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0000,B51 B53 B55,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S


In [29]:
# drop_nulls() removes every row that contains at least one null value
polars_df = polars_df.drop_nulls()
polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
4,1,1,"""Futrelle, Mrs.…","""female""",35.0,1,0,"""113803""",53.1,"""C123""","""S"""
7,0,1,"""McCarthy, Mr. …","""male""",54.0,0,0,"""17463""",51.8625,"""E46""","""S"""
11,1,3,"""Sandstrom, Mis…","""female""",4.0,1,1,"""PP 9549""",16.7,"""G6""","""S"""
12,1,1,"""Bonnell, Miss.…","""female""",58.0,0,0,"""113783""",26.55,"""C103""","""S"""
22,1,2,"""Beesley, Mr. L…","""male""",34.0,0,0,"""248698""",13.0,"""D56""","""S"""
24,1,1,"""Sloper, Mr. Wi…","""male""",28.0,0,0,"""113788""",35.5,"""A6""","""S"""
28,0,1,"""Fortune, Mr. C…","""male""",19.0,3,2,"""19950""",263.0,"""C23 C25 C27""","""S"""
53,1,1,"""Harper, Mrs. H…","""female""",49.0,1,0,"""PC 17572""",76.7292,"""D33""","""C"""
55,0,1,"""Ostby, Mr. Eng…","""male""",65.0,0,1,"""113509""",61.9792,"""B30""","""C"""
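A custom counterpart to `dropna()` might look like the sketch below. It assumes the rows come from the `csv` module, where a missing value is an empty string; `drop_missing` and its optional `subset` argument are illustrative.

```python
def drop_missing(header, rows, subset=None):
    """Keep rows with no empty cell in the checked columns (all columns by default)."""
    names = header if subset is None else subset
    positions = [header.index(name) for name in names]
    return [row for row in rows if all(row[p] != "" for p in positions)]

header = ["PassengerId", "Age", "Cabin"]
rows = [
    ["1", "22", ""],      # Cabin missing
    ["2", "38", "C85"],   # complete
    ["6", "", ""],        # Age and Cabin missing
]

print(drop_missing(header, rows))
print(drop_missing(header, rows, subset=["Age"]))
```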


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 8. <b> Fill missing values</b>
Filling missing values with a certain value in the dataframe</div>


In [30]:
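# Fill missing values with a specific value.
# polars_df had its nulls dropped in the previous section, so nothing is left to fill here;
# apply this to the freshly loaded frame to see an effect.
filled_polars_df = polars_df.fill_null(0)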


In [31]:
filled_polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
4,1,1,"""Futrelle, Mrs.…","""female""",35.0,1,0,"""113803""",53.1,"""C123""","""S"""
7,0,1,"""McCarthy, Mr. …","""male""",54.0,0,0,"""17463""",51.8625,"""E46""","""S"""
11,1,3,"""Sandstrom, Mis…","""female""",4.0,1,1,"""PP 9549""",16.7,"""G6""","""S"""
12,1,1,"""Bonnell, Miss.…","""female""",58.0,0,0,"""113783""",26.55,"""C103""","""S"""
22,1,2,"""Beesley, Mr. L…","""male""",34.0,0,0,"""248698""",13.0,"""D56""","""S"""
24,1,1,"""Sloper, Mr. Wi…","""male""",28.0,0,0,"""113788""",35.5,"""A6""","""S"""
28,0,1,"""Fortune, Mr. C…","""male""",19.0,3,2,"""19950""",263.0,"""C23 C25 C27""","""S"""
53,1,1,"""Harper, Mrs. H…","""female""",49.0,1,0,"""PC 17572""",76.7292,"""D33""","""C"""
55,0,1,"""Ostby, Mr. Eng…","""male""",65.0,0,1,"""113509""",61.9792,"""B30""","""C"""


In [32]:
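# Fill missing values with a specific value.
# pandas_df had its missing rows dropped in the previous section, so nothing is left to fill here;
# apply this to the freshly loaded frame to see an effect.
filled_pandas_df = pandas_df.fillna(0)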


In [33]:
filled_pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
...,...,...,...,...,...,...,...,...,...,...,...,...
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
872,873,0,1,"Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0000,B51 B53 B55,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
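Filling can be sketched for CSV rows as well. In this assumed design, `fill_missing` takes a `{column: value}` mapping so that each column can receive its own replacement, and columns absent from the mapping are left untouched. The helper name and sample data are hypothetical.

```python
def fill_missing(header, rows, fill_values):
    """Replace empty cells using a {column: value} mapping; other columns are untouched."""
    return [
        [fill_values.get(name, cell) if cell == "" else cell
         for name, cell in zip(header, row)]
        for row in rows
    ]

header = ["PassengerId", "Age", "Cabin"]
rows = [["1", "22", ""], ["2", "38", "C85"], ["6", "", ""]]

print(fill_missing(header, rows, {"Age": "0"}))
print(fill_missing(header, rows, {"Age": "0", "Cabin": "Unknown"}))
```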


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 9. <b> Return Boolean for Missing</b>
Return Boolean values for NaN or missing values</div>


In [34]:
# Build a Boolean frame that is true wherever a value is null.
# Passing expressions to pl.DataFrame stores the expression objects themselves,
# so they are evaluated with select() instead. is_nan() applies only to
# floating-point columns and is therefore left out.
is_nan_polars_df = polars_df.select(pl.all().is_null())

In [35]:
is_nan_polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
bool,bool,bool,bool,bool,bool,bool,bool,bool,bool,bool,bool
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false
false,false,false,false,false,false,false,false,false,false,false,false


In [36]:
is_null_pandas_df = pandas_df.isnull()


In [37]:
is_null_pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False
6,False,False,False,False,False,False,False,False,False,False,False,False
10,False,False,False,False,False,False,False,False,False,False,False,False
11,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...
871,False,False,False,False,False,False,False,False,False,False,False,False
872,False,False,False,False,False,False,False,False,False,False,False,False
879,False,False,False,False,False,False,False,False,False,False,False,False
887,False,False,False,False,False,False,False,False,False,False,False,False
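Because the frames above already had their missing rows removed, every entry is False. A custom mask over raw CSV rows, where an empty string stands for a missing value, makes the behaviour easier to see. `missing_mask` and `missing_counts` are illustrative names in this sketch.

```python
def missing_mask(rows):
    """Boolean grid with True wherever a cell is empty, similar to isnull()."""
    return [[cell == "" for cell in row] for row in rows]

def missing_counts(header, rows):
    """Number of missing cells per column, similar to isnull().sum()."""
    mask = missing_mask(rows)
    # zip(*mask) transposes the grid so that each tuple is one column.
    return {name: sum(column) for name, column in zip(header, zip(*mask))}

header = ["PassengerId", "Age", "Cabin"]
rows = [["1", "22", ""], ["2", "38", "C85"], ["6", "", ""]]

print(missing_mask(rows))
print(missing_counts(header, rows))
```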


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 10. <b>Grouping a dataframe </b>
Grouping a dataframe by one or more columns</div>


In [38]:
grouped_pandas_df = pandas_df.groupby(['Survived', 'Age'])

In [39]:
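for group, group_df in grouped_pandas_df:
    print(f"Group: {group}")
    # A bare `group_df` expression displays nothing inside a loop;
    # call print(group_df) or display(group_df) here to show each group's rows.
    print()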

Group: (0, 2.0)

Group: (0, 18.0)

Group: (0, 19.0)

Group: (0, 21.0)

Group: (0, 24.0)

Group: (0, 25.0)

Group: (0, 27.0)

Group: (0, 29.0)

Group: (0, 30.0)

Group: (0, 31.0)

Group: (0, 33.0)

Group: (0, 36.0)

Group: (0, 36.5)

Group: (0, 37.0)

Group: (0, 38.0)

Group: (0, 39.0)

Group: (0, 40.0)

Group: (0, 42.0)

Group: (0, 44.0)

Group: (0, 45.0)

Group: (0, 45.5)

Group: (0, 46.0)

Group: (0, 47.0)

Group: (0, 49.0)

Group: (0, 50.0)

Group: (0, 52.0)

Group: (0, 54.0)

Group: (0, 55.0)

Group: (0, 56.0)

Group: (0, 57.0)

Group: (0, 58.0)

Group: (0, 61.0)

Group: (0, 62.0)

Group: (0, 64.0)

Group: (0, 65.0)

Group: (0, 70.0)

Group: (0, 71.0)

Group: (1, 0.92)

Group: (1, 1.0)

Group: (1, 2.0)

Group: (1, 3.0)

Group: (1, 4.0)

Group: (1, 6.0)

Group: (1, 11.0)

Group: (1, 14.0)

Group: (1, 15.0)

Group: (1, 16.0)

Group: (1, 17.0)

Group: (1, 18.0)

Group: (1, 19.0)

Group: (1, 21.0)

Group: (1, 22.0)

Group: (1, 23.0)

Group: (1, 24.0)

Group: (1, 25.0)

Group: (1, 26.0)

In [40]:
# Note: newer Polars releases rename this method to group_by
grouped_polars_df = polars_df.groupby(['Survived', 'Age'])


In [41]:
for group, group_df in grouped_polars_df:
    print(f"Group: {group}")
    print(group_df)
    print()

Group: (1, 34.0)
shape: (2, 12)
┌─────────────┬──────────┬────────┬─────────────────┬───┬────────────┬──────┬───────┬──────────┐
│ PassengerId ┆ Survived ┆ Pclass ┆ Name            ┆ … ┆ Ticket     ┆ Fare ┆ Cabin ┆ Embarked │
│ ---         ┆ ---      ┆ ---    ┆ ---             ┆   ┆ ---        ┆ ---  ┆ ---   ┆ ---      │
│ i64         ┆ i64      ┆ i64    ┆ str             ┆   ┆ str        ┆ f64  ┆ str   ┆ str      │
╞═════════════╪══════════╪════════╪═════════════════╪═══╪════════════╪══════╪═══════╪══════════╡
│ 22          ┆ 1        ┆ 2      ┆ Beesley, Mr.    ┆ … ┆ 248698     ┆ 13.0 ┆ D56   ┆ S        │
│             ┆          ┆        ┆ Lawrence        ┆   ┆            ┆      ┆       ┆          │
│ 517         ┆ 1        ┆ 2      ┆ Lemore, Mrs.    ┆ … ┆ C.A. 34260 ┆ 10.5 ┆ F33   ┆ S        │
│             ┆          ┆        ┆ (Amelia Milley) ┆   ┆            ┆      ┆       ┆          │
└─────────────┴──────────┴────────┴─────────────────┴───┴────────────┴──────┴───────┴──────────
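Grouping by one or more columns can be sketched in plain Python with a dictionary keyed by tuples, which is roughly what iterating over the grouped objects above exposes. The helper `group_rows` and the three sample records are assumptions for illustration.

```python
from collections import defaultdict

def group_rows(records, keys):
    """Map each tuple of key values to the list of records that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    return dict(groups)

records = [
    {"PassengerId": 22, "Survived": 1, "Age": 34.0},
    {"PassengerId": 517, "Survived": 1, "Age": 34.0},
    {"PassengerId": 7, "Survived": 0, "Age": 54.0},
]

groups = group_rows(records, ["Survived", "Age"])
for group, members in groups.items():
    print(f"Group: {group}")
    print([m["PassengerId"] for m in members])
```

Groups appear in order of first occurrence here, whereas pandas sorts group keys by default.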

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 11. <b>Unique value counts</b>
Counting unique values in the dataframe</div>


In [42]:
# DataFrame.n_unique() counts distinct rows, not distinct values per column
unique_counts_polars = polars_df.n_unique()


In [43]:
unique_counts_polars

183

In [44]:
unique_counts_pandas = pandas_df.nunique()


In [45]:
unique_counts_pandas

PassengerId    183
Survived         2
Pclass           3
Name           183
Sex              2
Age             63
SibSp            4
Parch            4
Ticket         127
Fare            93
Cabin          133
Embarked         3
dtype: int64
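Per-column unique counts can be reproduced with sets. The sketch below assumes CSV-style rows and, like `nunique()`, leaves missing values (empty strings here) out of the count. `unique_counts` and the sample rows are made up for illustration.

```python
def unique_counts(header, rows):
    """Number of distinct non-missing values in each column."""
    # zip(*rows) transposes the rows so that each tuple is one column.
    return {
        name: len(set(column) - {""})
        for name, column in zip(header, zip(*rows))
    }

# Made-up sample; "" marks a missing value.
header = ["Survived", "Pclass", "Embarked"]
rows = [
    ["0", "3", "S"],
    ["1", "1", "C"],
    ["1", "3", "S"],
    ["1", "1", "S"],
    ["0", "3", ""],
]

print(unique_counts(header, rows))
```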

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 11. <b>Unique values</b>
Returning the unique values in a series</div>


In [46]:
# DataFrame.unique() drops duplicate rows; it does not return per-column unique values
unique_values_polars = polars_df.unique()


In [47]:
unique_values_polars

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C"""
7,0,1,"""McCarthy, Mr. …","""male""",54.0,0,0,"""17463""",51.8625,"""E46""","""S"""
28,0,1,"""Fortune, Mr. C…","""male""",19.0,3,2,"""19950""",263.0,"""C23 C25 C27""","""S"""
63,0,1,"""Harris, Mr. He…","""male""",45.0,1,0,"""36973""",83.475,"""C83""","""S"""
93,0,1,"""Chaffee, Mr. H…","""male""",46.0,1,0,"""W.E.P. 5734""",61.175,"""E31""","""S"""
98,1,1,"""Greenfield, Mr…","""male""",23.0,0,1,"""PC 17759""",63.3583,"""D10 D12""","""C"""
103,0,1,"""White, Mr. Ric…","""male""",21.0,0,1,"""35281""",77.2875,"""D26""","""S"""
111,0,1,"""Porter, Mr. Wa…","""male""",47.0,0,0,"""110465""",52.0,"""C110""","""S"""
125,0,1,"""White, Mr. Per…","""male""",54.0,0,1,"""35281""",77.2875,"""D26""","""S"""
149,0,2,"""Navratil, Mr. …","""male""",36.5,0,2,"""230080""",26.0,"""F2""","""S"""


In [48]:
unique_values_pandas = {}
for column in pandas_df.columns:
    unique_values_pandas[column] = pandas_df[column].unique()
# Print the unique values of each column

for column, values in unique_values_pandas.items():
    print(f"Column: {column}")
    print(values)
    print()

Column: PassengerId
[  2   4   7  11  12  22  24  28  53  55  63  67  76  89  93  97  98 103
 111 119 124 125 137 138 140 149 152 171 175 178 184 194 195 196 206 210
 216 219 225 231 246 249 252 253 258 263 264 269 270 274 276 292 293 298
 300 306 308 310 311 312 319 320 326 328 330 332 333 337 338 340 341 342
 346 357 367 370 371 378 391 394 395 413 430 431 435 436 439 446 450 453
 454 457 461 463 474 485 487 488 493 497 499 505 506 513 516 517 521 524
 537 540 541 545 551 557 559 572 573 578 582 584 586 588 592 600 610 619
 622 626 628 631 633 642 646 648 660 663 672 680 682 690 691 699 700 701
 702 708 711 713 716 717 718 725 731 738 742 743 746 749 752 760 764 766
 773 780 782 783 790 797 803 807 810 821 824 836 854 858 863 868 872 873
 880 888 890]

Column: Survived
[1 0]

Column: Pclass
[1 3 2]

Column: Name
['Cumings, Mrs. John Bradley (Florence Briggs Thayer)'
 'Futrelle, Mrs. Jacques Heath (Lily May Peel)' 'McCarthy, Mr. Timothy J'
 'Sandstrom, Miss. Marguerite Rut' 'Bonnell, 
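The unique values of a single column can also be collected in order of first appearance without pandas. This short sketch relies on dictionaries preserving insertion order (Python 3.7 and later); `unique_values` and the sample list are illustrative.

```python
def unique_values(values):
    """Distinct values in order of first appearance, similar to Series.unique()."""
    return list(dict.fromkeys(values))

# Illustrative Pclass values.
pclass = [1, 1, 1, 3, 1, 2]
print(unique_values(pclass))  # [1, 3, 2]
```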

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 12. <b>Sort the dataframe </b>
Sorting the dataframe by one or more columns</div>


In [50]:
sorted_pandas_df = pandas_df.sort_values(['Survived', 'Age'])
sorted_pandas_df


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
205,206,0,3,"Strom, Miss. Telma Matilda",female,2.0,0,1,347054,10.4625,G6,S
297,298,0,1,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.5500,C22 C26,S
505,506,0,1,"Penasco y Castellana, Mr. Victor de Satode",male,18.0,1,0,PC 17758,108.9000,C65,C
27,28,0,1,"Fortune, Mr. Charles Alexander",male,19.0,3,2,19950,263.0000,C23 C25 C27,S
715,716,0,3,"Soholt, Mr. Peter Andreas Lauritz Andersen",male,19.0,0,0,348124,7.6500,F G73,S
...,...,...,...,...,...,...,...,...,...,...,...,...
268,269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58.0,0,1,PC 17582,153.4625,C125,S
366,367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60.0,1,0,110813,75.2500,D37,C
587,588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60.0,1,1,13567,79.2000,B41,C
275,276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63.0,1,0,13502,77.9583,D7,S


In [51]:
sorted_polars_df = polars_df.sort(['Survived', 'Age'])
sorted_polars_df


PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
206,0,3,"""Strom, Miss. T…","""female""",2.0,0,1,"""347054""",10.4625,"""G6""","""S"""
298,0,1,"""Allison, Miss.…","""female""",2.0,1,2,"""113781""",151.55,"""C22 C26""","""S"""
506,0,1,"""Penasco y Cast…","""male""",18.0,1,0,"""PC 17758""",108.9,"""C65""","""C"""
28,0,1,"""Fortune, Mr. C…","""male""",19.0,3,2,"""19950""",263.0,"""C23 C25 C27""","""S"""
716,0,3,"""Soholt, Mr. Pe…","""male""",19.0,0,0,"""348124""",7.65,"""F G73""","""S"""
749,0,1,"""Marvin, Mr. Da…","""male""",19.0,1,0,"""113773""",53.1,"""D30""","""S"""
103,0,1,"""White, Mr. Ric…","""male""",21.0,0,1,"""35281""",77.2875,"""D26""","""S"""
119,0,1,"""Baxter, Mr. Qu…","""male""",24.0,0,1,"""PC 17558""",247.5208,"""B58 B60""","""C"""
140,0,1,"""Giglio, Mr. Vi…","""male""",24.0,0,0,"""PC 17593""",79.2,"""B86""","""C"""
76,0,3,"""Moen, Mr. Sigu…","""male""",25.0,0,0,"""348123""",7.65,"""F G73""","""S"""
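Sorting by several columns maps onto Python's `sorted` with a tuple key. The sketch assumes records stored as dictionaries with no missing values in the sort columns (a `None` would not compare with a number); `sort_records` is a hypothetical helper.

```python
def sort_records(records, by, descending=False):
    """Sort dictionaries by the listed columns, compared left to right."""
    return sorted(records, key=lambda r: tuple(r[k] for k in by), reverse=descending)

records = [
    {"PassengerId": 28, "Survived": 0, "Age": 19.0},
    {"PassengerId": 2, "Survived": 1, "Age": 38.0},
    {"PassengerId": 206, "Survived": 0, "Age": 2.0},
    {"PassengerId": 4, "Survived": 1, "Age": 35.0},
]

ordered = sort_records(records, ["Survived", "Age"])
print([r["PassengerId"] for r in ordered])  # [206, 28, 4, 2]
```

`sorted` is stable, so records with equal keys keep their original relative order.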


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 13. <b>Merge the dataframe </b>
Merging the dataframe based on common columns</div>


In [52]:
common_column = "Age"
merged_polars_df = polars_df.join(polars_df.clone(), on=[common_column], suffix="_copy")
merged_polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,PassengerId_copy,Survived_copy,Pclass_copy,Name_copy,Sex_copy,SibSp_copy,Parch_copy,Ticket_copy,Fare_copy,Cabin_copy,Embarked_copy
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str,i64,i64,i64,str,str,i64,i64,str,f64,str,str
2,1,1,"""Cumings, Mrs. …","""female""",38.0,1,0,"""PC 17599""",71.2833,"""C85""","""C""",2,1,1,"""Cumings, Mrs. …","""female""",1,0,"""PC 17599""",71.2833,"""C85""","""C"""
225,1,1,"""Hoyt, Mr. Fred…","""male""",38.0,1,0,"""19943""",90.0,"""C93""","""S""",2,1,1,"""Cumings, Mrs. …","""female""",1,0,"""PC 17599""",71.2833,"""C85""","""C"""
333,0,1,"""Graham, Mr. Ge…","""male""",38.0,0,1,"""PC 17582""",153.4625,"""C91""","""S""",2,1,1,"""Cumings, Mrs. …","""female""",1,0,"""PC 17599""",71.2833,"""C85""","""C"""
717,1,1,"""Endres, Miss. …","""female""",38.0,0,0,"""PC 17757""",227.525,"""C45""","""C""",2,1,1,"""Cumings, Mrs. …","""female""",1,0,"""PC 17599""",71.2833,"""C85""","""C"""
4,1,1,"""Futrelle, Mrs.…","""female""",35.0,1,0,"""113803""",53.1,"""C123""","""S""",4,1,1,"""Futrelle, Mrs.…","""female""",1,0,"""113803""",53.1,"""C123""","""S"""
231,1,1,"""Harris, Mrs. H…","""female""",35.0,1,0,"""36973""",83.475,"""C83""","""S""",4,1,1,"""Futrelle, Mrs.…","""female""",1,0,"""113803""",53.1,"""C123""","""S"""
270,1,1,"""Bissette, Miss…","""female""",35.0,0,0,"""PC 17760""",135.6333,"""C99""","""S""",4,1,1,"""Futrelle, Mrs.…","""female""",1,0,"""113803""",53.1,"""C123""","""S"""
487,1,1,"""Hoyt, Mrs. Fre…","""female""",35.0,1,0,"""19943""",90.0,"""C93""","""S""",4,1,1,"""Futrelle, Mrs.…","""female""",1,0,"""113803""",53.1,"""C123""","""S"""
702,1,1,"""Silverthorne, …","""male""",35.0,0,0,"""PC 17475""",26.2875,"""E24""","""S""",4,1,1,"""Futrelle, Mrs.…","""female""",1,0,"""113803""",53.1,"""C123""","""S"""
738,1,1,"""Lesurer, Mr. G…","""male""",35.0,0,0,"""PC 17755""",512.3292,"""B101""","""C""",4,1,1,"""Futrelle, Mrs.…","""female""",1,0,"""113803""",53.1,"""C123""","""S"""


In [53]:
common_column = "Age"
merged_pandas_df = pd.merge(pandas_df, pandas_df.copy(), on=common_column, suffixes=["", "_copy"])
merged_pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,...,Survived_copy,Pclass_copy,Name_copy,Sex_copy,SibSp_copy,Parch_copy,Ticket_copy,Fare_copy,Cabin_copy,Embarked_copy
0,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,...,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,1,0,PC 17599,71.2833,C85,C
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,...,1,1,"Hoyt, Mr. Frederick Maxfield",male,1,0,19943,90.0000,C93,S
2,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,...,0,1,"Graham, Mr. George Edward",male,0,1,PC 17582,153.4625,C91,S
3,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,...,1,1,"Endres, Miss. Caroline Louise",female,0,0,PC 17757,227.5250,C45,C
4,225,1,1,"Hoyt, Mr. Frederick Maxfield",male,38.0,1,0,19943,90.0000,...,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,1,0,PC 17599,71.2833,C85,C
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,858,1,1,"Daly, Mr. Peter Denis",male,51.0,0,0,113055,26.5500,...,1,1,"Daly, Mr. Peter Denis",male,0,0,113055,26.5500,E17,S
781,773,0,2,"Mack, Mrs. (Mary)",female,57.0,0,0,S.O./P.P. 3,10.5000,...,0,2,"Mack, Mrs. (Mary)",female,0,0,S.O./P.P. 3,10.5000,E77,S
782,780,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton Mc...",female,43.0,0,1,24160,211.3375,...,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton Mc...",female,0,1,24160,211.3375,B3,S
783,803,1,1,"Carter, Master. William Thornton II",male,11.0,1,2,113760,120.0000,...,1,1,"Carter, Master. William Thornton II",male,1,2,113760,120.0000,B96 B98,S
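
Because Age is not unique, the self-join above pairs every row with every other row of the same age, which is why the merged frame has far more rows than the input. The toy frame below is a minimal sketch of that effect (`left`, `right` and their values are made up for illustration); `validate=` is a standard `pd.merge` argument that fails early when a key is not unique.

```python
import pandas as pd

# Hypothetical toy data: two passengers share Age 38.0
left = pd.DataFrame({"Age": [38.0, 38.0, 35.0], "Name": ["A", "B", "C"]})
right = left.copy()

# Inner join on a non-unique key: an Age value with k rows yields k * k rows
merged = pd.merge(left, right, on="Age", suffixes=("", "_copy"))
print(len(merged))  # 2*2 + 1*1 = 5

# validate= raises MergeError when the key is not unique on the stated side
try:
    pd.merge(left, right, on="Age", validate="one_to_one")
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
print(is_one_to_one)
```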


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 14. <b>Pivot Table </b>
Create a spreadsheet-style pivot table</div>


In [54]:
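# Group-by aggregation as the Polars counterpart (newer Polars releases rename `groupby` to `group_by`)
grouped_polars_df = polars_df.groupby("PassengerId").agg(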
    mean_age=pl.mean("Age"), sum_parch=pl.sum("Parch")
)
grouped_polars_df

PassengerId,mean_age,sum_parch
i64,f64,i64
7,54.0,0
880,56.0,1
332,45.5,0
63,45.0,0
461,48.0,0
646,48.0,0
210,40.0,0
485,25.0,0
746,70.0,1
149,36.5,2


In [55]:
pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.0,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
...,...,...,...,...,...,...,...,...,...,...,...,...
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
872,873,0,1,"Carlsson, Mr. Frans Olof",male,33.0,0,0,695,5.0000,B51 B53 B55,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S


In [56]:
pivot_table_pandas = pandas_df.pivot_table(
    index=["PassengerId"],
    columns=["Survived", "Pclass"],
    values=["Age", "Parch"],
    aggfunc={"Age": "mean", "Parch": "sum"},
    margins=True,
)

pivot_table_pandas


Unnamed: 0_level_0,Age,Age,Age,Age,Age,Age,Age,Parch,Parch,Parch,Parch,Parch,Parch,Parch
Survived,0,0,0,1,1,1,All,0,0,0,1,1,1,All
Pclass,1,2,3,1,2,3,Unnamed: 7_level_2,1,2,3,1,2,3,Unnamed: 14_level_2
PassengerId,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3
2,,,,38.000000,,,38.000000,,,,0.0,,,0
4,,,,35.000000,,,35.000000,,,,0.0,,,0
7,54.000000,,,,,,54.000000,0.0,,,,,,0
11,,,,,,4.0,4.000000,,,,,,1.0,1
12,,,,58.000000,,,58.000000,,,,0.0,,,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
873,33.000000,,,,,,33.000000,0.0,,,,,,0
880,,,,56.000000,,,56.000000,,,,1.0,,,1
888,,,,19.000000,,,19.000000,,,,0.0,,,0
890,,,,26.000000,,,26.000000,,,,0.0,,,0
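
Indexing the pivot table by PassengerId leaves one passenger per row, so nothing is actually aggregated. A pivot table is easier to read when the index is a grouping column. The sketch below uses a small made-up frame (not the Titanic data) and shows that `pivot_table` gives the same numbers as a `groupby` followed by `unstack`.

```python
import pandas as pd

# Hypothetical toy data with repeated (Pclass, Survived) combinations
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3],
    "Survived": [1, 1, 0, 1, 0],
    "Age": [38.0, 35.0, 54.0, 4.0, 20.0],
})

# Mean age per class (rows) and survival flag (columns)
table = df.pivot_table(index="Pclass", columns="Survived", values="Age", aggfunc="mean")
print(table)

# The same numbers via groupby + unstack
grouped = df.groupby(["Pclass", "Survived"])["Age"].mean().unstack("Survived")
print((table.values == grouped.values).all())
```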


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 15. <b>Cross Tab</b>
Create a cross tabulation table</div>


In [57]:
# Convert Polars DataFrame to pandas DataFrame
pandas_df_pl = polars_df.to_pandas()

# Create cross-tabulation using pandas
# (Age is tabulated against itself, so only the diagonal is non-zero: it holds each age's count)
cross_tab_pd = pd.crosstab(pandas_df_pl['Age'], pandas_df_pl['Age'])

# Convert pandas cross-tabulation back to Polars DataFrame
cross_tab_pl = pl.from_pandas(cross_tab_pd)
cross_tab_pl

0.92,1.0,2.0,3.0,4.0,6.0,11.0,14.0,15.0,16.0,17.0,18.0,19.0,21.0,22.0,23.0,24.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,32.5,33.0,34.0,35.0,36.0,36.5,37.0,38.0,39.0,40.0,41.0,42.0,43.0,44.0,45.0,45.5,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,60.0,61.0,62.0,63.0,64.0,65.0,70.0,71.0,80.0
i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64,i64
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
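
A cross tabulation is more informative when the two columns differ. The sketch below counts made-up Sex and Survived values with `pd.crosstab`, reproduces the counts with `groupby`, and uses `normalize="index"` to turn them into row-wise shares. The data is an illustrative assumption, not taken from the notebook.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0],
})

# Frequency of each (Sex, Survived) combination; missing combinations become 0
counts = pd.crosstab(df["Sex"], df["Survived"])
print(counts)

# Equivalent counts with groupby + size + unstack
same = df.groupby(["Sex", "Survived"]).size().unstack(fill_value=0)

# Share of each outcome within a row
row_share = pd.crosstab(df["Sex"], df["Survived"], normalize="index")
print(row_share)
```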


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 16. <b> Apply</b>
Apply a function to each column or element of the dataframe</div>


In [58]:
def custom_function(data):
    # `*` doubles numeric values but repeats strings ("male" -> "malemale")
    result = data * 2
    return result

In [59]:
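# Apply custom_function to every column, element by element
# (newer Polars releases rename Series.apply to map_elements)
new_columns = []
for column in polars_df.columns:
    new_column = polars_df[column].apply(custom_function)
    new_columns.append(new_column)

polars_df = polars_df.with_columns(new_columns)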


In [60]:
polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
4,2,2,"""Cumings, Mrs. …","""femalefemale""",76.0,2,0,"""PC 17599PC 175…",142.5666,"""C85C85""","""CC"""
8,2,2,"""Futrelle, Mrs.…","""femalefemale""",70.0,2,0,"""113803113803""",106.2,"""C123C123""","""SS"""
14,0,2,"""McCarthy, Mr. …","""malemale""",108.0,0,0,"""1746317463""",103.725,"""E46E46""","""SS"""
22,2,6,"""Sandstrom, Mis…","""femalefemale""",8.0,2,2,"""PP 9549PP 9549…",33.4,"""G6G6""","""SS"""
24,2,2,"""Bonnell, Miss.…","""femalefemale""",116.0,0,0,"""113783113783""",53.1,"""C103C103""","""SS"""
44,2,4,"""Beesley, Mr. L…","""malemale""",68.0,0,0,"""248698248698""",26.0,"""D56D56""","""SS"""
48,2,2,"""Sloper, Mr. Wi…","""malemale""",56.0,0,0,"""113788113788""",71.0,"""A6A6""","""SS"""
56,0,2,"""Fortune, Mr. C…","""malemale""",38.0,6,4,"""1995019950""",526.0,"""C23 C25 C27C23…","""SS"""
106,2,2,"""Harper, Mrs. H…","""femalefemale""",98.0,2,0,"""PC 17572PC 175…",153.4584,"""D33D33""","""CC"""
110,0,2,"""Ostby, Mr. Eng…","""malemale""",130.0,0,2,"""113509113509""",123.9584,"""B30B30""","""CC"""


In [61]:
pandas_df = pandas_df.apply(custom_function)


In [62]:
pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,4,2,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",femalefemale,76.0,2,0,PC 17599PC 17599,142.5666,C85C85,CC
3,8,2,2,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Fu...",femalefemale,70.0,2,0,113803113803,106.2000,C123C123,SS
6,14,0,2,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J",malemale,108.0,0,0,1746317463,103.7250,E46E46,SS
10,22,2,6,"Sandstrom, Miss. Marguerite RutSandstrom, Miss...",femalefemale,8.0,2,2,PP 9549PP 9549,33.4000,G6G6,SS
11,24,2,2,"Bonnell, Miss. ElizabethBonnell, Miss. Elizabeth",femalefemale,116.0,0,0,113783113783,53.1000,C103C103,SS
...,...,...,...,...,...,...,...,...,...,...,...,...
871,1744,2,2,"Beckwith, Mrs. Richard Leonard (Sallie Monypen...",femalefemale,94.0,2,2,1175111751,105.1084,D35D35,SS
872,1746,0,2,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans Olof",malemale,66.0,0,0,695695,10.0000,B51 B53 B55B51 B53 B55,SS
879,1760,2,2,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)P...",femalefemale,112.0,0,2,1176711767,166.3166,C50C50,CC
887,1776,2,2,"Graham, Miss. Margaret EdithGraham, Miss. Marg...",femalefemale,38.0,0,0,112053112053,60.0000,B42B42,SS
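
The outputs above show the side effect of applying `* 2` to every column: text columns are repeated rather than doubled. A minimal sketch of one way to avoid that, assuming a small made-up frame, is to restrict `apply` to numeric columns with `select_dtypes`. It also shows `axis=1`, which passes rows instead of columns.

```python
import pandas as pd

# Hypothetical toy data; Sex is kept as object dtype on purpose
df = pd.DataFrame({
    "Age": [38.0, 35.0],
    "SibSp": [1, 0],
    "Sex": pd.Series(["female", "male"], dtype=object),
})

def double(data):
    return data * 2

# Default axis=0: the function receives each column as a Series
by_column = df.apply(double)
print(by_column)

# Restrict to numeric columns so text is left out
numeric_only = df.select_dtypes("number").apply(double)
print(numeric_only)

# axis=1: the function receives each row as a Series
row_total = df[["Age", "SibSp"]].apply(lambda row: row["Age"] + row["SibSp"], axis=1)
print(row_total.tolist())
```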


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 17. <b> Map</b>
Apply a function to each element of the dataframe</div>


In [63]:
# applymap works element-wise; pandas 2.1+ deprecates it in favour of DataFrame.map
pandas_df = pandas_df.applymap(custom_function)


In [64]:
pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,8,4,4,"Cumings, Mrs. John Bradley (Florence Briggs Th...",femalefemalefemalefemale,152.0,4,0,PC 17599PC 17599PC 17599PC 17599,285.1332,C85C85C85C85,CCCC
3,16,4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Fu...",femalefemalefemalefemale,140.0,4,0,113803113803113803113803,212.4000,C123C123C123C123,SSSS
6,28,0,4,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J...",malemalemalemale,216.0,0,0,17463174631746317463,207.4500,E46E46E46E46,SSSS
10,44,4,12,"Sandstrom, Miss. Marguerite RutSandstrom, Miss...",femalefemalefemalefemale,16.0,4,4,PP 9549PP 9549PP 9549PP 9549,66.8000,G6G6G6G6,SSSS
11,48,4,4,"Bonnell, Miss. ElizabethBonnell, Miss. Elizabe...",femalefemalefemalefemale,232.0,0,0,113783113783113783113783,106.2000,C103C103C103C103,SSSS
...,...,...,...,...,...,...,...,...,...,...,...,...
871,3488,4,4,"Beckwith, Mrs. Richard Leonard (Sallie Monypen...",femalefemalefemalefemale,188.0,4,4,11751117511175111751,210.2168,D35D35D35D35,SSSS
872,3492,0,4,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans Ol...",malemalemalemale,132.0,0,0,695695695695,20.0000,B51 B53 B55B51 B53 B55B51 B53 B55B51 B53 B55,SSSS
879,3520,4,4,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)P...",femalefemalefemalefemale,224.0,0,4,11767117671176711767,332.6332,C50C50C50C50,CCCC
887,3552,4,4,"Graham, Miss. Margaret EdithGraham, Miss. Marg...",femalefemalefemalefemale,76.0,0,0,112053112053112053112053,120.0000,B42B42B42B42,SSSS


In [65]:
polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,f64,i64,i64,str,f64,str,str
4,2,2,"""Cumings, Mrs. …","""femalefemale""",76.0,2,0,"""PC 17599PC 175…",142.5666,"""C85C85""","""CC"""
8,2,2,"""Futrelle, Mrs.…","""femalefemale""",70.0,2,0,"""113803113803""",106.2,"""C123C123""","""SS"""
14,0,2,"""McCarthy, Mr. …","""malemale""",108.0,0,0,"""1746317463""",103.725,"""E46E46""","""SS"""
22,2,6,"""Sandstrom, Mis…","""femalefemale""",8.0,2,2,"""PP 9549PP 9549…",33.4,"""G6G6""","""SS"""
24,2,2,"""Bonnell, Miss.…","""femalefemale""",116.0,0,0,"""113783113783""",53.1,"""C103C103""","""SS"""
44,2,4,"""Beesley, Mr. L…","""malemale""",68.0,0,0,"""248698248698""",26.0,"""D56D56""","""SS"""
48,2,2,"""Sloper, Mr. Wi…","""malemale""",56.0,0,0,"""113788113788""",71.0,"""A6A6""","""SS"""
56,0,2,"""Fortune, Mr. C…","""malemale""",38.0,6,4,"""1995019950""",526.0,"""C23 C25 C27C23…","""SS"""
106,2,2,"""Harper, Mrs. H…","""femalefemale""",98.0,2,0,"""PC 17572PC 175…",153.4584,"""D33D33""","""CC"""
110,0,2,"""Ostby, Mr. Eng…","""malemale""",130.0,0,2,"""113509113509""",123.9584,"""B30B30""","""CC"""
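
For a single column, `Series.map` is the usual element-wise tool. It accepts either a function or a dictionary. The sketch below uses made-up values and shows that keys missing from the dictionary become NaN.

```python
import pandas as pd

# Hypothetical toy column
sex = pd.Series(["female", "male", "male"], name="Sex")

# Map with a dictionary: a common way to encode categories
encoded = sex.map({"female": 1, "male": 0})
print(encoded.tolist())

# Map with a function applied to each element
lengths = sex.map(len)
print(lengths.tolist())

# Values without a matching key become NaN
partial = sex.map({"female": 1})
print(partial.isna().tolist())
```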


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 18. <b> Replace</b>
Replace specific values in the dataframe</div>


In [66]:
# sample from https://stackoverflow.com/questions/70968749/pandas-replace-equivalent-in-python-polars
df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5]
})

mapper = {
    1: 0,
    2: 0,
    3: 10,
    4: 10
}

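# Values missing from `mapper` fall back to the original column via `default`
# (newer Polars releases deprecate map_dict in favour of replace)
df.select(
    pl.all().map_dict(mapper, default=pl.col("a"))
)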


a
i64
0
0
10
10
5


In [67]:
# Replace specific values in pandas_df
pandas_df['Age'] = pandas_df['Age'].replace(10, 2)


In [68]:
pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,8,4,4,"Cumings, Mrs. John Bradley (Florence Briggs Th...",femalefemalefemalefemale,152.0,4,0,PC 17599PC 17599PC 17599PC 17599,285.1332,C85C85C85C85,CCCC
3,16,4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Fu...",femalefemalefemalefemale,140.0,4,0,113803113803113803113803,212.4000,C123C123C123C123,SSSS
6,28,0,4,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J...",malemalemalemale,216.0,0,0,17463174631746317463,207.4500,E46E46E46E46,SSSS
10,44,4,12,"Sandstrom, Miss. Marguerite RutSandstrom, Miss...",femalefemalefemalefemale,16.0,4,4,PP 9549PP 9549PP 9549PP 9549,66.8000,G6G6G6G6,SSSS
11,48,4,4,"Bonnell, Miss. ElizabethBonnell, Miss. Elizabe...",femalefemalefemalefemale,232.0,0,0,113783113783113783113783,106.2000,C103C103C103C103,SSSS
...,...,...,...,...,...,...,...,...,...,...,...,...
871,3488,4,4,"Beckwith, Mrs. Richard Leonard (Sallie Monypen...",femalefemalefemalefemale,188.0,4,4,11751117511175111751,210.2168,D35D35D35D35,SSSS
872,3492,0,4,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans Ol...",malemalemalemale,132.0,0,0,695695695695,20.0000,B51 B53 B55B51 B53 B55B51 B53 B55B51 B53 B55,SSSS
879,3520,4,4,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)P...",femalefemalefemalefemale,224.0,0,4,11767117671176711767,332.6332,C50C50C50C50,CCCC
887,3552,4,4,"Graham, Miss. Margaret EdithGraham, Miss. Marg...",femalefemalefemalefemale,76.0,0,0,112053112053112053112053,120.0000,B42B42B42B42,SSSS
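
The pandas call above replaces a single value. `replace` also accepts a dictionary, which mirrors the Polars `mapper` example: values without a key are left unchanged. The sketch below reuses that mapper on a pandas Series; the `Embarked` frame and the port name are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], name="a")
mapper = {1: 0, 2: 0, 3: 10, 4: 10}

# Unlike Series.map, unmatched values (here 5) are kept as they are
replaced = s.replace(mapper)
print(replaced.tolist())

# A nested dictionary limits the replacement to one column of a dataframe
df = pd.DataFrame({"Embarked": ["S", "C"], "Cabin": ["S1", "C85"]})
renamed = df.replace({"Embarked": {"S": "Southampton"}})
print(renamed)
```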


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 19. <b> Cast</b>
Cast a column to a specific data type</div>


In [69]:
pandas_df['Age'] = pandas_df['Age'].astype(int)


In [70]:
pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,8,4,4,"Cumings, Mrs. John Bradley (Florence Briggs Th...",femalefemalefemalefemale,152,4,0,PC 17599PC 17599PC 17599PC 17599,285.1332,C85C85C85C85,CCCC
3,16,4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Fu...",femalefemalefemalefemale,140,4,0,113803113803113803113803,212.4000,C123C123C123C123,SSSS
6,28,0,4,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J...",malemalemalemale,216,0,0,17463174631746317463,207.4500,E46E46E46E46,SSSS
10,44,4,12,"Sandstrom, Miss. Marguerite RutSandstrom, Miss...",femalefemalefemalefemale,16,4,4,PP 9549PP 9549PP 9549PP 9549,66.8000,G6G6G6G6,SSSS
11,48,4,4,"Bonnell, Miss. ElizabethBonnell, Miss. Elizabe...",femalefemalefemalefemale,232,0,0,113783113783113783113783,106.2000,C103C103C103C103,SSSS
...,...,...,...,...,...,...,...,...,...,...,...,...
871,3488,4,4,"Beckwith, Mrs. Richard Leonard (Sallie Monypen...",femalefemalefemalefemale,188,4,4,11751117511175111751,210.2168,D35D35D35D35,SSSS
872,3492,0,4,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans Ol...",malemalemalemale,132,0,0,695695695695,20.0000,B51 B53 B55B51 B53 B55B51 B53 B55B51 B53 B55,SSSS
879,3520,4,4,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)P...",femalefemalefemalefemale,224,0,4,11767117671176711767,332.6332,C50C50C50C50,CCCC
887,3552,4,4,"Graham, Miss. Margaret EdithGraham, Miss. Marg...",femalefemalefemalefemale,76,0,0,112053112053112053112053,120.0000,B42B42B42B42,SSSS


In [88]:
polars_df = polars_df.with_columns([
    pl.col('Age').cast(pl.Int64)
])

In [89]:
polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,i64,i64,i64,str,f64,str,str
4,2,2,"""Cumings, Mrs. …","""femalefemale""",76,2,0,"""PC 17599PC 175…",142.5666,"""C85C85""","""CC"""
8,2,2,"""Futrelle, Mrs.…","""femalefemale""",70,2,0,"""113803113803""",106.2,"""C123C123""","""SS"""
14,0,2,"""McCarthy, Mr. …","""malemale""",108,0,0,"""1746317463""",103.725,"""E46E46""","""SS"""
22,2,6,"""Sandstrom, Mis…","""femalefemale""",8,2,2,"""PP 9549PP 9549…",33.4,"""G6G6""","""SS"""
24,2,2,"""Bonnell, Miss.…","""femalefemale""",116,0,0,"""113783113783""",53.1,"""C103C103""","""SS"""
44,2,4,"""Beesley, Mr. L…","""malemale""",68,0,0,"""248698248698""",26.0,"""D56D56""","""SS"""
48,2,2,"""Sloper, Mr. Wi…","""malemale""",56,0,0,"""113788113788""",71.0,"""A6A6""","""SS"""
56,0,2,"""Fortune, Mr. C…","""malemale""",38,6,4,"""1995019950""",526.0,"""C23 C25 C27C23…","""SS"""
106,2,2,"""Harper, Mrs. H…","""femalefemale""",98,2,0,"""PC 17572PC 175…",153.4584,"""D33D33""","""CC"""
110,0,2,"""Ostby, Mr. Eng…","""malemale""",130,0,2,"""113509113509""",123.9584,"""B30B30""","""CC"""
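
The casts above succeed because the Age column in this dataframe has no missing values at this point. With missing values, `astype(int)` raises an error in pandas. The sketch below, on a made-up Series, shows two common ways around that: the nullable `Int64` dtype, or filling the gaps first (the sentinel `-1` is an arbitrary choice for illustration).

```python
import pandas as pd

# Hypothetical column with one missing value
age = pd.Series([38.0, None, 35.0], name="Age")

# Plain int cannot represent NaN, so this raises a ValueError
try:
    age.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)

# Option 1: nullable integer dtype keeps the missing value as <NA>
nullable = age.astype("Int64")
print(nullable)

# Option 2: fill missing values first, then cast
filled = age.fillna(-1).astype(int)
print(filled.tolist())
```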


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 20. <b> Write to CSV</b>
Write dataframe to CSV</div>


In [93]:
pandas_df.to_csv('pandas_data.csv', index=False)

In [94]:
polars_df.write_csv('polars_data.csv', separator=",")
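
The `index=False` argument matters when the file is read back. The sketch below writes a made-up frame to an in-memory buffer instead of a file and compares the two settings. When the index is written, `read_csv` returns it as an extra column named `Unnamed: 0`.

```python
import io
import pandas as pd

# Hypothetical toy data with a non-default index
df = pd.DataFrame({"PassengerId": [2, 4], "Survived": [1, 1]}, index=[1, 3])

# Without the index: only the data columns are written
without_index = df.to_csv(index=False)
print(without_index.splitlines()[0])

# With the index: an unnamed leading column is written
with_index = df.to_csv(index=True)
print(with_index.splitlines()[0])

# Reading the second form back produces an "Unnamed: 0" column
restored = pd.read_csv(io.StringIO(with_index))
print(list(restored.columns))
```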


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 20. <b> Write to SQL</b>
Write dataframe to SQL database</div>


In [95]:
import sqlalchemy
# Assuming you have a pandas DataFrame called 'pandas_df'
engine = sqlalchemy.create_engine('sqlite:///titanic.db')
pandas_df.to_sql('titanic', con=engine, if_exists='replace', index=False)

183

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
 Writing to a SQL database directly from Polars is not supported. Refer to this <a href="https://www.kaggle.com/discussions/questions-and-answers/420549#2326923">link</a>. </div>
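
To check what was written, the table can be queried back into pandas. The sketch below is a self-contained variant that uses the standard-library `sqlite3` module with an in-memory database instead of the SQLAlchemy engine above; the frame and the query are made up for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"PassengerId": [2, 4, 7], "Survived": [1, 1, 0]})

# In-memory SQLite database, so no file is created
conn = sqlite3.connect(":memory:")
df.to_sql("titanic", con=conn, if_exists="replace", index=False)

# Read back only the survivors with a SQL query
survivors = pd.read_sql_query(
    "SELECT PassengerId FROM titanic WHERE Survived = 1 ORDER BY PassengerId",
    conn,
)
conn.close()
print(survivors["PassengerId"].tolist())
```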

<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 20. <b> Pivot</b>
Reshape a dataframe based on column values</div>


In [100]:
pandas_df_pivoted = pd.pivot_table(
    pandas_df,
    index=['PassengerId', 'Survived', 'Pclass', 'Name', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
    columns='Sex',
    values='Name',
    aggfunc='count',
    fill_value=0
)

In [102]:
pandas_df_pivoted

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,Unnamed: 7_level_0,Unnamed: 8_level_0,Unnamed: 9_level_0,Sex
PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
8,4,4,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)Cumings, Mrs. John Bradley (Florence Briggs Thayer)Cumings, Mrs. John Bradley (Florence Briggs Thayer)Cumings, Mrs. John Bradley (Florence Briggs Thayer)",152,4,0,PC 17599PC 17599PC 17599PC 17599,285.1332,C85C85C85C85,CCCC
16,4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Futrelle, Mrs. Jacques Heath (Lily May Peel)Futrelle, Mrs. Jacques Heath (Lily May Peel)Futrelle, Mrs. Jacques Heath (Lily May Peel)",140,4,0,113803113803113803113803,212.4000,C123C123C123C123,SSSS
28,0,4,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy JMcCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J",216,0,0,17463174631746317463,207.4500,E46E46E46E46,SSSS
44,4,12,"Sandstrom, Miss. Marguerite RutSandstrom, Miss. Marguerite RutSandstrom, Miss. Marguerite RutSandstrom, Miss. Marguerite Rut",16,4,4,PP 9549PP 9549PP 9549PP 9549,66.8000,G6G6G6G6,SSSS
48,4,4,"Bonnell, Miss. ElizabethBonnell, Miss. ElizabethBonnell, Miss. ElizabethBonnell, Miss. Elizabeth",232,0,0,113783113783113783113783,106.2000,C103C103C103C103,SSSS
...,...,...,...,...,...,...,...,...,...,...
3488,4,4,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)Beckwith, Mrs. Richard Leonard (Sallie Monypeny)Beckwith, Mrs. Richard Leonard (Sallie Monypeny)Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",188,4,4,11751117511175111751,210.2168,D35D35D35D35,SSSS
3492,0,4,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans OlofCarlsson, Mr. Frans OlofCarlsson, Mr. Frans Olof",132,0,0,695695695695,20.0000,B51 B53 B55B51 B53 B55B51 B53 B55B51 B53 B55,SSSS
3520,4,4,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",224,0,4,11767117671176711767,332.6332,C50C50C50C50,CCCC
3552,4,4,"Graham, Miss. Margaret EdithGraham, Miss. Margaret EdithGraham, Miss. Margaret EdithGraham, Miss. Margaret Edith",76,0,0,112053112053112053112053,120.0000,B42B42B42B42,SSSS


In [121]:
polars_df.pivot(
    index="Age",
    columns="Parch",
    values="Survived",
    aggregate_function=pl.element().tanh().mean(),
)

Age,0,2,4,8
i64,f64,f64,f64,f64
76,0.964028,0.0,,
70,0.964028,,,
108,0.482014,0.0,,
8,,0.964028,0.964028,
116,0.642685,0.964028,0.0,
68,0.964028,,,
56,0.964028,,,
38,0.482014,,0.482014,
98,0.964028,0.0,,
130,0.0,0.0,,
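
In pandas, `pivot` only reshapes and requires every index/column pair to be unique, while `pivot_table` aggregates duplicates. The sketch below shows both on a small made-up frame; the column values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Pclass, Sex) pair
long_df = pd.DataFrame({
    "Pclass": [1, 1, 3, 3],
    "Sex": ["female", "male", "female", "male"],
    "Fare": [80.0, 60.0, 15.0, 10.0],
})

# Pure reshape: Sex values become columns
wide = long_df.pivot(index="Pclass", columns="Sex", values="Fare")
print(wide)

# A duplicated (Pclass, Sex) pair makes pivot fail
dup = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index="Pclass", columns="Sex", values="Fare")
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)

# pivot_table handles the duplicate by aggregating it
agg = dup.pivot_table(index="Pclass", columns="Sex", values="Fare", aggfunc="mean")
print(agg)
```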


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 21. <b> Melt</b>
Unpivot a table from wide format to long format </div>


In [122]:
melted_pandas_df = pandas_df.melt(id_vars=['PassengerId', 'Survived', 'Pclass', 'Name', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'], var_name='Sex', value_name='Value')


In [123]:
melted_pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Sex,Value
0,8,4,4,"Cumings, Mrs. John Bradley (Florence Briggs Th...",152,4,0,PC 17599PC 17599PC 17599PC 17599,285.1332,C85C85C85C85,CCCC,Sex,femalefemalefemalefemale
1,16,4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Fu...",140,4,0,113803113803113803113803,212.4000,C123C123C123C123,SSSS,Sex,femalefemalefemalefemale
2,28,0,4,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J...",216,0,0,17463174631746317463,207.4500,E46E46E46E46,SSSS,Sex,malemalemalemale
3,44,4,12,"Sandstrom, Miss. Marguerite RutSandstrom, Miss...",16,4,4,PP 9549PP 9549PP 9549PP 9549,66.8000,G6G6G6G6,SSSS,Sex,femalefemalefemalefemale
4,48,4,4,"Bonnell, Miss. ElizabethBonnell, Miss. Elizabe...",232,0,0,113783113783113783113783,106.2000,C103C103C103C103,SSSS,Sex,femalefemalefemalefemale
...,...,...,...,...,...,...,...,...,...,...,...,...,...
178,3488,4,4,"Beckwith, Mrs. Richard Leonard (Sallie Monypen...",188,4,4,11751117511175111751,210.2168,D35D35D35D35,SSSS,Sex,femalefemalefemalefemale
179,3492,0,4,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans Ol...",132,0,0,695695695695,20.0000,B51 B53 B55B51 B53 B55B51 B53 B55B51 B53 B55,SSSS,Sex,malemalemalemale
180,3520,4,4,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)P...",224,0,4,11767117671176711767,332.6332,C50C50C50C50,CCCC,Sex,femalefemalefemalefemale
181,3552,4,4,"Graham, Miss. Margaret EdithGraham, Miss. Marg...",76,0,0,112053112053112053112053,120.0000,B42B42B42B42,SSSS,Sex,femalefemalefemalefemale


In [127]:
polars_df.melt(id_vars="Age", value_vars=["Survived","Fare"])

Age,variable,value
i64,str,f64
76,"""Survived""",2.0
70,"""Survived""",2.0
108,"""Survived""",0.0
8,"""Survived""",2.0
116,"""Survived""",2.0
68,"""Survived""",2.0
56,"""Survived""",2.0
38,"""Survived""",0.0
98,"""Survived""",2.0
130,"""Survived""",0.0
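
A melt is easiest to follow on a tiny frame. The sketch below (made-up values) melts two measurement columns into `variable`/`value` pairs and then pivots them back. As in the Polars output above, mixing integer and float columns makes the `value` column float.

```python
import pandas as pd

# Hypothetical wide-format data
wide = pd.DataFrame({
    "PassengerId": [2, 4],
    "Survived": [1, 1],
    "Fare": [71.2833, 53.1],
})

# Wide -> long: one row per (PassengerId, variable)
long_df = wide.melt(id_vars="PassengerId", value_vars=["Survived", "Fare"])
print(long_df)

# Long -> wide again with pivot
restored = long_df.pivot(index="PassengerId", columns="variable", values="value")
print(restored)
```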


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 22. <b> Drop Duplicates</b>
Drop duplicate rows</div>


In [128]:
pandas_df = pandas_df.drop_duplicates()


In [129]:
pandas_df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,8,4,4,"Cumings, Mrs. John Bradley (Florence Briggs Th...",femalefemalefemalefemale,152,4,0,PC 17599PC 17599PC 17599PC 17599,285.1332,C85C85C85C85,CCCC
3,16,4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Fu...",femalefemalefemalefemale,140,4,0,113803113803113803113803,212.4000,C123C123C123C123,SSSS
6,28,0,4,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J...",malemalemalemale,216,0,0,17463174631746317463,207.4500,E46E46E46E46,SSSS
10,44,4,12,"Sandstrom, Miss. Marguerite RutSandstrom, Miss...",femalefemalefemalefemale,16,4,4,PP 9549PP 9549PP 9549PP 9549,66.8000,G6G6G6G6,SSSS
11,48,4,4,"Bonnell, Miss. ElizabethBonnell, Miss. Elizabe...",femalefemalefemalefemale,232,0,0,113783113783113783113783,106.2000,C103C103C103C103,SSSS
...,...,...,...,...,...,...,...,...,...,...,...,...
871,3488,4,4,"Beckwith, Mrs. Richard Leonard (Sallie Monypen...",femalefemalefemalefemale,188,4,4,11751117511175111751,210.2168,D35D35D35D35,SSSS
872,3492,0,4,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans Ol...",malemalemalemale,132,0,0,695695695695,20.0000,B51 B53 B55B51 B53 B55B51 B53 B55B51 B53 B55,SSSS
879,3520,4,4,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)P...",femalefemalefemalefemale,224,0,4,11767117671176711767,332.6332,C50C50C50C50,CCCC
887,3552,4,4,"Graham, Miss. Margaret EdithGraham, Miss. Marg...",femalefemalefemalefemale,76,0,0,112053112053112053112053,120.0000,B42B42B42B42,SSSS


In [132]:
polars_df = polars_df.unique()

In [133]:
polars_df

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,i64,str,str,i64,i64,i64,str,f64,str,str
24,2,2,"""Bonnell, Miss.…","""femalefemale""",116,0,0,"""113783113783""",53.1,"""C103C103""","""SS"""
110,0,2,"""Ostby, Mr. Eng…","""malemale""",130,0,2,"""113509113509""",123.9584,"""B30B30""","""CC"""
126,0,2,"""Harris, Mr. He…","""malemale""",90,2,0,"""3697336973""",166.95,"""C83C83""","""SS"""
186,0,2,"""Chaffee, Mr. H…","""malemale""",92,2,0,"""W.E.P. 5734W.E…",122.35,"""E31E31""","""SS"""
206,0,2,"""White, Mr. Ric…","""malemale""",42,0,2,"""3528135281""",154.575,"""D26D26""","""SS"""
238,0,2,"""Baxter, Mr. Qu…","""malemale""",48,0,2,"""PC 17558PC 175…",495.0416,"""B58 B60B58 B60…","""CC"""
342,0,2,"""Van der hoef, …","""malemale""",122,0,0,"""111240111240""",67.0,"""B19B19""","""SS"""
350,0,2,"""Smith, Mr. Jam…","""malemale""",112,0,0,"""1776417764""",61.3916,"""A7A7""","""CC"""
438,2,2,"""Bazzani, Miss.…","""femalefemale""",64,0,0,"""1181311813""",152.5834,"""D15D15""","""CC"""
450,2,2,"""Hoyt, Mr. Fred…","""malemale""",76,2,0,"""1994319943""",180.0,"""C93C93""","""SS"""
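
With no arguments, `drop_duplicates` removes only rows that match in every column, so it changes nothing when each row has a unique PassengerId. The sketch below, on a made-up frame, shows the `subset` and `keep` arguments, which control which columns are compared and which duplicate survives.

```python
import pandas as pd

# Hypothetical toy data: two passengers share a ticket and fare
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Ticket": ["19950", "19950", "113803"],
    "Fare": [263.0, 263.0, 53.1],
})

# All columns compared: nothing is dropped because Name differs
print(len(df.drop_duplicates()))

# Compare only Ticket and Fare; the first occurrence is kept by default
first = df.drop_duplicates(subset=["Ticket", "Fare"])
print(first["Name"].tolist())

# keep="last" keeps the last occurrence, keep=False drops every duplicated row
last = df.drop_duplicates(subset=["Ticket", "Fare"], keep="last")
none = df.drop_duplicates(subset=["Ticket", "Fare"], keep=False)
print(last["Name"].tolist(), none["Name"].tolist())

# duplicated() returns the boolean mask behind these results
mask = df.duplicated(subset=["Ticket", "Fare"])
print(mask.tolist())
```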


<div style="background-color:#F0E3D2; color:#19180F; font-size:15px; font-family:Verdana; padding:10px; border: 2px solid #19180F; border-radius:10px"> 
    📌 23. <b> Set Index</b>
Set the dataframe index (row labels) </div>


In [137]:
pandas_df = pandas_df.set_index('PassengerId')

In [138]:
pandas_df

Unnamed: 0_level_0,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
PassengerId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
8,4,4,"Cumings, Mrs. John Bradley (Florence Briggs Th...",femalefemalefemalefemale,152,4,0,PC 17599PC 17599PC 17599PC 17599,285.1332,C85C85C85C85,CCCC
16,4,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)Fu...",femalefemalefemalefemale,140,4,0,113803113803113803113803,212.4000,C123C123C123C123,SSSS
28,0,4,"McCarthy, Mr. Timothy JMcCarthy, Mr. Timothy J...",malemalemalemale,216,0,0,17463174631746317463,207.4500,E46E46E46E46,SSSS
44,4,12,"Sandstrom, Miss. Marguerite RutSandstrom, Miss...",femalefemalefemalefemale,16,4,4,PP 9549PP 9549PP 9549PP 9549,66.8000,G6G6G6G6,SSSS
48,4,4,"Bonnell, Miss. ElizabethBonnell, Miss. Elizabe...",femalefemalefemalefemale,232,0,0,113783113783113783113783,106.2000,C103C103C103C103,SSSS
...,...,...,...,...,...,...,...,...,...,...,...
3488,4,4,"Beckwith, Mrs. Richard Leonard (Sallie Monypen...",femalefemalefemalefemale,188,4,4,11751117511175111751,210.2168,D35D35D35D35,SSSS
3492,0,4,"Carlsson, Mr. Frans OlofCarlsson, Mr. Frans Ol...",malemalemalemale,132,0,0,695695695695,20.0000,B51 B53 B55B51 B53 B55B51 B53 B55B51 B53 B55,SSSS
3520,4,4,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)P...",femalefemalefemalefemale,224,0,4,11767117671176711767,332.6332,C50C50C50C50,CCCC
3552,4,4,"Graham, Miss. Margaret EdithGraham, Miss. Marg...",femalefemalefemalefemale,76,0,0,112053112053112053112053,120.0000,B42B42B42B42,SSSS


In [141]:
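# Polars has no row index: from_pandas drops the pandas index, so PassengerId is lost in this round trip
pl.from_pandas(polars_df.to_pandas().set_index("PassengerId"))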


Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
i64,i64,str,str,i64,i64,i64,str,f64,str,str
2,2,"""Bonnell, Miss.…","""femalefemale""",116,0,0,"""113783113783""",53.1,"""C103C103""","""SS"""
0,2,"""Ostby, Mr. Eng…","""malemale""",130,0,2,"""113509113509""",123.9584,"""B30B30""","""CC"""
0,2,"""Harris, Mr. He…","""malemale""",90,2,0,"""3697336973""",166.95,"""C83C83""","""SS"""
0,2,"""Chaffee, Mr. H…","""malemale""",92,2,0,"""W.E.P. 5734W.E…",122.35,"""E31E31""","""SS"""
0,2,"""White, Mr. Ric…","""malemale""",42,0,2,"""3528135281""",154.575,"""D26D26""","""SS"""
0,2,"""Baxter, Mr. Qu…","""malemale""",48,0,2,"""PC 17558PC 175…",495.0416,"""B58 B60B58 B60…","""CC"""
0,2,"""Van der hoef, …","""malemale""",122,0,0,"""111240111240""",67.0,"""B19B19""","""SS"""
0,2,"""Smith, Mr. Jam…","""malemale""",112,0,0,"""1776417764""",61.3916,"""A7A7""","""CC"""
2,2,"""Bazzani, Miss.…","""femalefemale""",64,0,0,"""1181311813""",152.5834,"""D15D15""","""CC"""
2,2,"""Hoyt, Mr. Fred…","""malemale""",76,2,0,"""1994319943""",180.0,"""C93C93""","""SS"""
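
In pandas the index is mainly useful for label-based lookup with `.loc`, and `reset_index` turns it back into an ordinary column. The sketch below uses a small made-up frame. Since Polars dataframes have no index, the usual Polars route is to keep PassengerId as a regular column and filter on it.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"PassengerId": [2, 4, 7], "Age": [38.0, 35.0, 54.0]})

# PassengerId becomes the row label and leaves the columns
indexed = df.set_index("PassengerId")
print(list(indexed.columns))

# Label-based lookup by PassengerId
print(indexed.loc[4, "Age"])

# reset_index moves the labels back into a column
restored = indexed.reset_index()
print(list(restored.columns))
```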
