# Polars Tutorial - Part 1: Installation and Basics

Welcome to the Polars tutorial series! In this notebook, we'll cover:
- Installing Polars
- Creating DataFrames
- Basic operations
- Understanding data types and schemas

## 1. Installation

Polars can be installed using pip:

In [2]:
# Install Polars (run this if not already installed)
# !pip install polars

Collecting polars
  Downloading polars-1.36.1-py3-none-any.whl.metadata (10 kB)
Collecting polars-runtime-32==1.36.1 (from polars)
  Downloading polars_runtime_32-1.36.1-cp39-abi3-macosx_11_0_arm64.whl.metadata (1.5 kB)
Downloading polars-1.36.1-py3-none-any.whl (802 kB)
Downloading polars_runtime_32-1.36.1-cp39-abi3-macosx_11_0_arm64.whl (39.3 MB)
Installing collected packages: polars-runtime-32, polars
Successfully installed polars-1.36.1 polars-runtime-32-1.36.1


For additional features, you can install optional dependencies:

In [None]:
# Install with all optional dependencies
!pip install 'polars[all]'

# Or install specific components
!pip install polars pyarrow  # For Arrow/pandas interoperability (Polars reads and writes Parquet natively)

Collecting deltalake>=1.0.0 (from polars[all])
  Downloading deltalake-1.3.0-cp310-abi3-macosx_11_0_arm64.whl.metadata (5.4 kB)
Collecting pyiceberg>=0.7.1 (from polars[all])
  Downloading pyiceberg-0.10.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (4.9 kB)
Collecting altair>=5.4.0 (from polars[all])
  Downloading altair-6.0.0-py3-none-any.whl.metadata (11 kB)
Collecting great-tables>=0.8.0 (from polars[all])
  Downloading great_tables-0.20.0-py3-none-any.whl.metadata (12 kB)
Collecting narwhals>=1.27.1 (from altair>=5.4.0->polars[all])
  Downloading narwhals-2.14.0-py3-none-any.whl.metadata (13 kB)
Collecting arro3-core>=0.5.0 (from deltalake>=1.0.0->polars[all])
  Downloading arro3_core-0.6.5-cp311-abi3-macosx_11_0_arm64.whl.metadata (363 bytes)
Collecting commonmark>=0.9.1 (from great-tables>=0.8.0->polars[all])
  Downloading commonmark-0.9.1-py2.py3-none-any.whl.metadata (5.7 kB)
Collecting faicons>=0.2.2 (from great-tables>=0.8.0->polars[all])
  Downloading faicons-0.2.2-py3-none-

## 2. Importing Polars

In [3]:
import polars as pl
import numpy as np
from datetime import datetime, timedelta

# Check the version
print(f"Polars version: {pl.__version__}")

Polars version: 1.36.1


## 3. Creating DataFrames

### 3.1 From a Dictionary

In [4]:
# Create a simple DataFrame from a dictionary
df = pl.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'salary': [70000, 80000, 90000, 75000, 85000]
})

print("DataFrame from dictionary:")
print(df)

DataFrame from dictionary:
shape: (5, 4)
┌─────────┬─────┬──────────┬────────┐
│ name    ┆ age ┆ city     ┆ salary │
│ ---     ┆ --- ┆ ---      ┆ ---    │
│ str     ┆ i64 ┆ str      ┆ i64    │
╞═════════╪═════╪══════════╪════════╡
│ Alice   ┆ 25  ┆ New York ┆ 70000  │
│ Bob     ┆ 30  ┆ London   ┆ 80000  │
│ Charlie ┆ 35  ┆ Paris    ┆ 90000  │
│ Diana   ┆ 28  ┆ Tokyo    ┆ 75000  │
│ Eve     ┆ 32  ┆ Sydney   ┆ 85000  │
└─────────┴─────┴──────────┴────────┘


### 3.2 From a List of Dictionaries

In [5]:
# Create DataFrame from list of dictionaries
data = [
    {'product': 'Laptop', 'price': 999.99, 'quantity': 5},
    {'product': 'Mouse', 'price': 29.99, 'quantity': 15},
    {'product': 'Keyboard', 'price': 79.99, 'quantity': 10}
]

df_products = pl.DataFrame(data)
print("\nDataFrame from list of dictionaries:")
print(df_products)


DataFrame from list of dictionaries:
shape: (3, 3)
┌──────────┬────────┬──────────┐
│ product  ┆ price  ┆ quantity │
│ ---      ┆ ---    ┆ ---      │
│ str      ┆ f64    ┆ i64      │
╞══════════╪════════╪══════════╡
│ Laptop   ┆ 999.99 ┆ 5        │
│ Mouse    ┆ 29.99  ┆ 15       │
│ Keyboard ┆ 79.99  ┆ 10       │
└──────────┴────────┴──────────┘


### 3.3 From NumPy Arrays

In [6]:
# Create DataFrame from NumPy arrays
np_data = np.random.randn(5, 3)
df_numpy = pl.DataFrame(
    np_data,
    schema=['col1', 'col2', 'col3']
)

print("\nDataFrame from NumPy array:")
print(df_numpy)


DataFrame from NumPy array:
shape: (5, 3)
┌───────────┬───────────┬───────────┐
│ col1      ┆ col2      ┆ col3      │
│ ---       ┆ ---       ┆ ---       │
│ f64       ┆ f64       ┆ f64       │
╞═══════════╪═══════════╪═══════════╡
│ 1.520388  ┆ 0.752315  ┆ 0.488913  │
│ 0.503146  ┆ -0.790752 ┆ 0.684321  │
│ -0.03704  ┆ -1.67807  ┆ -0.208617 │
│ 0.457854  ┆ 0.241205  ┆ 1.893789  │
│ -0.387471 ┆ -0.863062 ┆ -0.582842 │
└───────────┴───────────┴───────────┘


### 3.4 Creating Series

In [7]:
# Create a Polars Series (single column)
series = pl.Series('numbers', [1, 2, 3, 4, 5])
print("\nPolars Series:")
print(series)

# Series with different data types
string_series = pl.Series('names', ['Alice', 'Bob', 'Charlie'])
bool_series = pl.Series('flags', [True, False, True, False])

print("\nString Series:")
print(string_series)
print("\nBoolean Series:")
print(bool_series)


Polars Series:
shape: (5,)
Series: 'numbers' [i64]
[
	1
	2
	3
	4
	5
]

String Series:
shape: (3,)
Series: 'names' [str]
[
	"Alice"
	"Bob"
	"Charlie"
]

Boolean Series:
shape: (4,)
Series: 'flags' [bool]
[
	true
	false
	true
	false
]


## 4. Basic DataFrame Operations

### 4.1 Viewing Data

In [8]:
# View first few rows
print("First 3 rows:")
print(df.head(3))

# View last few rows
print("\nLast 2 rows:")
print(df.tail(2))

# Get DataFrame shape
print(f"\nShape: {df.shape}")
print(f"Rows: {df.height}, Columns: {df.width}")

First 3 rows:
shape: (3, 4)
┌─────────┬─────┬──────────┬────────┐
│ name    ┆ age ┆ city     ┆ salary │
│ ---     ┆ --- ┆ ---      ┆ ---    │
│ str     ┆ i64 ┆ str      ┆ i64    │
╞═════════╪═════╪══════════╪════════╡
│ Alice   ┆ 25  ┆ New York ┆ 70000  │
│ Bob     ┆ 30  ┆ London   ┆ 80000  │
│ Charlie ┆ 35  ┆ Paris    ┆ 90000  │
└─────────┴─────┴──────────┴────────┘

Last 2 rows:
shape: (2, 4)
┌───────┬─────┬────────┬────────┐
│ name  ┆ age ┆ city   ┆ salary │
│ ---   ┆ --- ┆ ---    ┆ ---    │
│ str   ┆ i64 ┆ str    ┆ i64    │
╞═══════╪═════╪════════╪════════╡
│ Diana ┆ 28  ┆ Tokyo  ┆ 75000  │
│ Eve   ┆ 32  ┆ Sydney ┆ 85000  │
└───────┴─────┴────────┴────────┘

Shape: (5, 4)
Rows: 5, Columns: 4


### 4.2 Getting Information

In [9]:
# Get column names
print("Column names:", df.columns)

# Get data types
print("\nData types:")
print(df.dtypes)

# Get schema
print("\nSchema:")
print(df.schema)

# Get descriptive statistics
print("\nDescriptive statistics:")
print(df.describe())

Column names: ['name', 'age', 'city', 'salary']

Data types:
[String, Int64, String, Int64]

Schema:
Schema({'name': String, 'age': Int64, 'city': String, 'salary': Int64})

Descriptive statistics:
shape: (9, 5)
┌────────────┬───────┬──────────┬────────┬────────────┐
│ statistic  ┆ name  ┆ age      ┆ city   ┆ salary     │
│ ---        ┆ ---   ┆ ---      ┆ ---    ┆ ---        │
│ str        ┆ str   ┆ f64      ┆ str    ┆ f64        │
╞════════════╪═══════╪══════════╪════════╪════════════╡
│ count      ┆ 5     ┆ 5.0      ┆ 5      ┆ 5.0        │
│ null_count ┆ 0     ┆ 0.0      ┆ 0      ┆ 0.0        │
│ mean       ┆ null  ┆ 30.0     ┆ null   ┆ 80000.0    │
│ std        ┆ null  ┆ 3.807887 ┆ null   ┆ 7905.69415 │
│ min        ┆ Alice ┆ 25.0     ┆ London ┆ 70000.0    │
│ 25%        ┆ null  ┆ 28.0     ┆ null   ┆ 75000.0    │
│ 50%        ┆ null  ┆ 30.0     ┆ null   ┆ 80000.0    │
│ 75%        ┆ null  ┆ 32.0     ┆ null   ┆ 85000.0    │
│ max        ┆ Eve   ┆ 35.0     ┆ Tokyo  ┆ 90000.0    │
└────────────┴───────┴──────────┴────────┴────────────┘

### 4.3 Selecting Columns

In [10]:
# Select a single column (returns Series)
print("Single column (Series):")
print(df['name'])

# Select a single column (returns DataFrame)
print("\nSingle column (DataFrame):")
print(df.select('name'))

# Select multiple columns
print("\nMultiple columns:")
print(df.select(['name', 'age', 'salary']))

Single column (Series):
shape: (5,)
Series: 'name' [str]
[
	"Alice"
	"Bob"
	"Charlie"
	"Diana"
	"Eve"
]

Single column (DataFrame):
shape: (5, 1)
┌─────────┐
│ name    │
│ ---     │
│ str     │
╞═════════╡
│ Alice   │
│ Bob     │
│ Charlie │
│ Diana   │
│ Eve     │
└─────────┘

Multiple columns:
shape: (5, 3)
┌─────────┬─────┬────────┐
│ name    ┆ age ┆ salary │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ i64    │
╞═════════╪═════╪════════╡
│ Alice   ┆ 25  ┆ 70000  │
│ Bob     ┆ 30  ┆ 80000  │
│ Charlie ┆ 35  ┆ 90000  │
│ Diana   ┆ 28  ┆ 75000  │
│ Eve     ┆ 32  ┆ 85000  │
└─────────┴─────┴────────┘


### 4.4 Filtering Rows

In [11]:
# Filter using conditions
print("People older than 28:")
print(df.filter(pl.col('age') > 28))

# Multiple conditions
print("\nPeople older than 28 with salary > 80000:")
print(df.filter(
    (pl.col('age') > 28) & (pl.col('salary') > 80000)
))

# Filter using string operations
print("\nCities starting with 'L' or 'P':")
print(df.filter(
    pl.col('city').str.starts_with('L') | pl.col('city').str.starts_with('P')
))

People older than 28:
shape: (3, 4)
┌─────────┬─────┬────────┬────────┐
│ name    ┆ age ┆ city   ┆ salary │
│ ---     ┆ --- ┆ ---    ┆ ---    │
│ str     ┆ i64 ┆ str    ┆ i64    │
╞═════════╪═════╪════════╪════════╡
│ Bob     ┆ 30  ┆ London ┆ 80000  │
│ Charlie ┆ 35  ┆ Paris  ┆ 90000  │
│ Eve     ┆ 32  ┆ Sydney ┆ 85000  │
└─────────┴─────┴────────┴────────┘

People older than 28 with salary > 80000:
shape: (2, 4)
┌─────────┬─────┬────────┬────────┐
│ name    ┆ age ┆ city   ┆ salary │
│ ---     ┆ --- ┆ ---    ┆ ---    │
│ str     ┆ i64 ┆ str    ┆ i64    │
╞═════════╪═════╪════════╪════════╡
│ Charlie ┆ 35  ┆ Paris  ┆ 90000  │
│ Eve     ┆ 32  ┆ Sydney ┆ 85000  │
└─────────┴─────┴────────┴────────┘

Cities starting with 'L' or 'P':
shape: (2, 4)
┌─────────┬─────┬────────┬────────┐
│ name    ┆ age ┆ city   ┆ salary │
│ ---     ┆ --- ┆ ---    ┆ ---    │
│ str     ┆ i64 ┆ str    ┆ i64    │
╞═════════╪═════╪════════╪════════╡
│ Bob     ┆ 30  ┆ London ┆ 80000  │
│ Charlie ┆ 35  ┆ Paris  ┆ 90000  │
└─────────┴─────┴────────┴────────┘

## 5. Data Types in Polars

### 5.1 Common Data Types

In [12]:
# Create DataFrame with various data types
df_types = pl.DataFrame({
    'integers': [1, 2, 3, 4, 5],
    'floats': [1.1, 2.2, 3.3, 4.4, 5.5],
    'strings': ['a', 'b', 'c', 'd', 'e'],
    'booleans': [True, False, True, False, True],
    'dates': pl.date_range(
        datetime(2024, 1, 1),
        datetime(2024, 1, 5),
        interval='1d',
        eager=True
    )
})

print("DataFrame with various types:")
print(df_types)
print("\nData types:")
print(df_types.dtypes)

DataFrame with various types:
shape: (5, 5)
┌──────────┬────────┬─────────┬──────────┬────────────┐
│ integers ┆ floats ┆ strings ┆ booleans ┆ dates      │
│ ---      ┆ ---    ┆ ---     ┆ ---      ┆ ---        │
│ i64      ┆ f64    ┆ str     ┆ bool     ┆ date       │
╞══════════╪════════╪═════════╪══════════╪════════════╡
│ 1        ┆ 1.1    ┆ a       ┆ true     ┆ 2024-01-01 │
│ 2        ┆ 2.2    ┆ b       ┆ false    ┆ 2024-01-02 │
│ 3        ┆ 3.3    ┆ c       ┆ true     ┆ 2024-01-03 │
│ 4        ┆ 4.4    ┆ d       ┆ false    ┆ 2024-01-04 │
│ 5        ┆ 5.5    ┆ e       ┆ true     ┆ 2024-01-05 │
└──────────┴────────┴─────────┴──────────┴────────────┘

Data types:
[Int64, Float64, String, Boolean, Date]


### 5.2 Casting Data Types

In [13]:
# Cast column to different type
df_cast = df_types.with_columns(
    pl.col('integers').cast(pl.Float64).alias('integers_as_float'),
    pl.col('floats').cast(pl.Int64).alias('floats_as_int'),
    pl.col('booleans').cast(pl.Int8).alias('booleans_as_int')
)

print("DataFrame with casted columns:")
print(df_cast)
print("\nNew data types:")
print(df_cast.dtypes)

DataFrame with casted columns:
shape: (5, 8)
┌──────────┬────────┬─────────┬──────────┬────────────┬──────────────┬──────────────┬──────────────┐
│ integers ┆ floats ┆ strings ┆ booleans ┆ dates      ┆ integers_as_ ┆ floats_as_in ┆ booleans_as_ │
│ ---      ┆ ---    ┆ ---     ┆ ---      ┆ ---        ┆ float        ┆ t            ┆ int          │
│ i64      ┆ f64    ┆ str     ┆ bool     ┆ date       ┆ ---          ┆ ---          ┆ ---          │
│          ┆        ┆         ┆          ┆            ┆ f64          ┆ i64          ┆ i8           │
╞══════════╪════════╪═════════╪══════════╪════════════╪══════════════╪══════════════╪══════════════╡
│ 1        ┆ 1.1    ┆ a       ┆ true     ┆ 2024-01-01 ┆ 1.0          ┆ 1            ┆ 1            │
│ 2        ┆ 2.2    ┆ b       ┆ false    ┆ 2024-01-02 ┆ 2.0          ┆ 2            ┆ 0            │
│ 3        ┆ 3.3    ┆ c       ┆ true     ┆ 2024-01-03 ┆ 3.0          ┆ 3            ┆ 1            │
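│ 4        ┆ 4.4    ┆ d       ┆ false    ┆ 2024-01-04 ┆ 4.0          ┆ 4            ┆ 0            │
│ 5        ┆ 5.5    ┆ e       ┆ true     ┆ 2024-01-05 ┆ 5.0          ┆ 5            ┆ 1            │
└──────────┴────────┴─────────┴──────────┴────────────┴──────────────┴──────────────┴──────────────┘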

## 6. Adding and Modifying Columns

### 6.1 Adding New Columns

In [14]:
# Add two new columns: a constant literal and a value derived from 'salary'
df_new = df.with_columns(
    pl.lit('USA').alias('country'),
    (pl.col('salary') * 1.1).alias('salary_after_raise')
)

print("DataFrame with new columns:")
print(df_new)

DataFrame with new columns:
shape: (5, 6)
┌─────────┬─────┬──────────┬────────┬─────────┬────────────────────┐
│ name    ┆ age ┆ city     ┆ salary ┆ country ┆ salary_after_raise │
│ ---     ┆ --- ┆ ---      ┆ ---    ┆ ---     ┆ ---                │
│ str     ┆ i64 ┆ str      ┆ i64    ┆ str     ┆ f64                │
╞═════════╪═════╪══════════╪════════╪═════════╪════════════════════╡
│ Alice   ┆ 25  ┆ New York ┆ 70000  ┆ USA     ┆ 77000.0            │
│ Bob     ┆ 30  ┆ London   ┆ 80000  ┆ USA     ┆ 88000.0            │
│ Charlie ┆ 35  ┆ Paris    ┆ 90000  ┆ USA     ┆ 99000.0            │
│ Diana   ┆ 28  ┆ Tokyo    ┆ 75000  ┆ USA     ┆ 82500.0            │
│ Eve     ┆ 32  ┆ Sydney   ┆ 85000  ┆ USA     ┆ 93500.0            │
└─────────┴─────┴──────────┴────────┴─────────┴────────────────────┘


### 6.2 Renaming Columns

In [15]:
# Rename columns
df_renamed = df.rename({
    'name': 'employee_name',
    'age': 'employee_age'
})

print("DataFrame with renamed columns:")
print(df_renamed)

DataFrame with renamed columns:
shape: (5, 4)
┌───────────────┬──────────────┬──────────┬────────┐
│ employee_name ┆ employee_age ┆ city     ┆ salary │
│ ---           ┆ ---          ┆ ---      ┆ ---    │
│ str           ┆ i64          ┆ str      ┆ i64    │
╞═══════════════╪══════════════╪══════════╪════════╡
│ Alice         ┆ 25           ┆ New York ┆ 70000  │
│ Bob           ┆ 30           ┆ London   ┆ 80000  │
│ Charlie       ┆ 35           ┆ Paris    ┆ 90000  │
│ Diana         ┆ 28           ┆ Tokyo    ┆ 75000  │
│ Eve           ┆ 32           ┆ Sydney   ┆ 85000  │
└───────────────┴──────────────┴──────────┴────────┘


### 6.3 Dropping Columns

In [16]:
# Drop columns
df_dropped = df.drop(['city', 'salary'])

print("DataFrame with dropped columns:")
print(df_dropped)

DataFrame with dropped columns:
shape: (5, 2)
┌─────────┬─────┐
│ name    ┆ age │
│ ---     ┆ --- │
│ str     ┆ i64 │
╞═════════╪═════╡
│ Alice   ┆ 25  │
│ Bob     ┆ 30  │
│ Charlie ┆ 35  │
│ Diana   ┆ 28  │
│ Eve     ┆ 32  │
└─────────┴─────┘


## 7. Sorting Data

In [17]:
# Sort by a single column
print("Sorted by age (ascending):")
print(df.sort('age'))

# Sort by a single column (descending)
print("\nSorted by salary (descending):")
print(df.sort('salary', descending=True))

# Sort by multiple columns
print("\nSorted by city then age:")
print(df.sort(['city', 'age']))

Sorted by age (ascending):
shape: (5, 4)
┌─────────┬─────┬──────────┬────────┐
│ name    ┆ age ┆ city     ┆ salary │
│ ---     ┆ --- ┆ ---      ┆ ---    │
│ str     ┆ i64 ┆ str      ┆ i64    │
╞═════════╪═════╪══════════╪════════╡
│ Alice   ┆ 25  ┆ New York ┆ 70000  │
│ Diana   ┆ 28  ┆ Tokyo    ┆ 75000  │
│ Bob     ┆ 30  ┆ London   ┆ 80000  │
│ Eve     ┆ 32  ┆ Sydney   ┆ 85000  │
│ Charlie ┆ 35  ┆ Paris    ┆ 90000  │
└─────────┴─────┴──────────┴────────┘

Sorted by salary (descending):
shape: (5, 4)
┌─────────┬─────┬──────────┬────────┐
│ name    ┆ age ┆ city     ┆ salary │
│ ---     ┆ --- ┆ ---      ┆ ---    │
│ str     ┆ i64 ┆ str      ┆ i64    │
╞═════════╪═════╪══════════╪════════╡
│ Charlie ┆ 35  ┆ Paris    ┆ 90000  │
│ Eve     ┆ 32  ┆ Sydney   ┆ 85000  │
│ Bob     ┆ 30  ┆ London   ┆ 80000  │
│ Diana   ┆ 28  ┆ Tokyo    ┆ 75000  │
│ Alice   ┆ 25  ┆ New York ┆ 70000  │
└─────────┴─────┴──────────┴────────┘

Sorted by city then age:
shape: (5, 4)
┌─────────┬─────┬──────────┬────────┐

## 8. Handling Missing Values

In [18]:
# Create DataFrame with null values
df_nulls = pl.DataFrame({
    'a': [1, 2, None, 4, 5],
    'b': [None, 2.5, 3.5, 4.5, 5.5],
    'c': ['x', 'y', None, 'z', 'w']
})

print("DataFrame with null values:")
print(df_nulls)

# Check for null values
print("\nNull count per column:")
print(df_nulls.null_count())

# Fill null values with a literal (0 only applies to the numeric columns, so 'c' keeps its null)
print("\nFill nulls with specific values:")
print(df_nulls.fill_null(0))

# Drop rows with null values
print("\nDrop rows with any null:")
print(df_nulls.drop_nulls())

# Fill null with forward fill
print("\nForward fill:")
print(df_nulls.fill_null(strategy='forward'))

DataFrame with null values:
shape: (5, 3)
┌──────┬──────┬──────┐
│ a    ┆ b    ┆ c    │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ f64  ┆ str  │
╞══════╪══════╪══════╡
│ 1    ┆ null ┆ x    │
│ 2    ┆ 2.5  ┆ y    │
│ null ┆ 3.5  ┆ null │
│ 4    ┆ 4.5  ┆ z    │
│ 5    ┆ 5.5  ┆ w    │
└──────┴──────┴──────┘

Null count per column:
shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪═════╡
│ 1   ┆ 1   ┆ 1   │
└─────┴─────┴─────┘

Fill nulls with specific values:
shape: (5, 3)
┌─────┬─────┬──────┐
│ a   ┆ b   ┆ c    │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ f64 ┆ str  │
╞═════╪═════╪══════╡
│ 1   ┆ 0.0 ┆ x    │
│ 2   ┆ 2.5 ┆ y    │
│ 0   ┆ 3.5 ┆ null │
│ 4   ┆ 4.5 ┆ z    │
│ 5   ┆ 5.5 ┆ w    │
└─────┴─────┴──────┘

Drop rows with any null:
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ 2.5 ┆ y   │
│ 4   ┆ 4.5 ┆ z   │
│ 5   ┆ 5.5 ┆ w   │
└─────┴─────┴─────┘

Forward fill:
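shape: (5, 3)
┌─────┬──────┬─────┐
│ a   ┆ b    ┆ c   │
│ --- ┆ ---  ┆ --- │
│ i64 ┆ f64  ┆ str │
╞═════╪══════╪═════╡
│ 1   ┆ null ┆ x   │
│ 2   ┆ 2.5  ┆ y   │
│ 2   ┆ 3.5  ┆ y   │
│ 4   ┆ 4.5  ┆ z   │
│ 5   ┆ 5.5  ┆ w   │
└─────┴──────┴─────┘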

## 9. Summary

In this notebook, we covered:
- ✅ Installing Polars
- ✅ Creating DataFrames from various sources
- ✅ Basic operations (selecting, filtering, sorting)
- ✅ Understanding data types
- ✅ Adding, modifying, and dropping columns
- ✅ Handling missing values

### Key Takeaways:
1. Polars uses `pl.col()` for column references in expressions
2. DataFrame methods return new DataFrames instead of modifying the original, so calls can be chained
3. Polars has a rich set of data types optimized for performance
4. The API is explicit about types: every column has a fixed dtype, and conversions go through `cast()`

**Next:** In the next notebook, we'll explore data import and export with various file formats!