# Polars Cookbook in Rust

This notebook contains short and sweet examples for useful Polars recipes. It is similar to [Pandas' Cookbook](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html), and some examples are taken from there.

Note that I'm a beginner in both Polars and Rust, and I created this notebook as an exercise for myself. I'm sure it contains many errors and inefficient code; feel free to submit a PR to fix them.

## Installation

Add to `Cargo.toml`:

```
[dependencies]
polars = { version = "0.30.0", features = ["lazy"]}
```

**Important**: you do need to specify the "lazy" feature.

For more info, see [Installation (Polars user guide)](https://pola-rs.github.io/polars-book/user-guide/installation/).

For this notebook, we need to specify this instead:

In [2]:
:dep polars = { version = "0.31.0", features = ["lazy", "parquet"]}

## Importing


In [3]:
use polars::prelude::*;

Note: I tried the `use polars::prelude as pl` construct, but some macros immediately stopped working.

## Creating Series

Reference:

- [Series (Polars eager API cookbook)](https://pola-rs.github.io/polars/polars/docs/eager/index.html#series)

### From iterator

In [4]:
{
    let itr = (0..5).map(Some);
    println!("{:?}", itr);
    
    let mut s: Series = itr.collect();
    s.rename("foo");
    println!("\n{}", s);
}

Map { iter: 0..5 }

shape: (5,)
Series: 'foo' [i32]
[
	0
	1
	2
	3
	4
]


()

### From slices

In [127]:
{
    Series::new("foo", &[Some("hello"), Some("world!"), None])
}

shape: (3,)
Series: 'foo' [str]
[
	"hello"
	"world!"
	null
]

### From ChunkedArray

In [128]:
{
    let ca = UInt32Chunked::new("foo", &[Some(1), None, Some(3)]);
    ca.into_series()
}

shape: (3,)
Series: 'foo' [u32]
[
	1
	null
	3
]

## Creating DataFrame

References:

- [DataFrame (Polars API reference)](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html)


### Creating empty DataFrame

In [7]:
{
    let df = DataFrame::default();
    
    assert!(df.is_empty());
    
    df
}


shape: (0, 0)
┌┐
╞╡
└┘

I think an empty DataFrame is not very useful since, AFAIK, there's no way to add rows one by one. Ultimately, you need to construct a DataFrame from arrays or Series using `df!` or `DataFrame::new`, as shown below.

### With df! macro

In [5]:
let df01 = df! (
    "AAA" => &[4, 5, 6, 7],
    "BBB" => &[Some("hello"), Some("world"), Some("!"), None],
    "CCC" => &[1.1, 2.2, 3.3, 4.4],
)?;

println!("{}", df01);

shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘

### From Series

In [6]:
let df02 = {
    let aaa = Series::new("AAA", [4, 5, 6, 7]);
    let bbb = Series::new("BBB", [Some("hello"), Some("world"), Some("!"), None]);
    let ccc = Series::new("CCC", [1.1, 2.2, 3.3, 4.4]);

    DataFrame::new(vec![aaa, bbb, ccc])?
};

assert_eq!(df02, df01);

println!("{}", df02);

shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘


### From array of struct/dict

AFAIK you can't (as of v0.30.0). Ultimately, you need to convert the structs into per-column lists of values that can be fed to the `df!` or `DataFrame::new` API above.

References:

- https://stackoverflow.com/questions/73167416/creating-polars-dataframe-from-vecstruct?rq=3
- https://stackoverflow.com/questions/69112232/rust-dataframe-in-polars-using-a-vector-of-structs?rq=3
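
One way to do that conversion by hand is to walk the structs once and push each field into its own `Vec`, giving one column per field. The sketch below uses only the standard library; the `Reading` struct and the `to_columns` helper are hypothetical names made up for illustration, not part of Polars.

```rust
/// Hypothetical record type, standing in for whatever struct you have.
#[derive(Debug, Clone, PartialEq)]
struct Reading {
    aaa: i32,
    bbb: Option<String>,
    ccc: f64,
}

/// Split a slice of structs into one Vec per field
/// (row-oriented data becomes column-oriented data).
fn to_columns(rows: &[Reading]) -> (Vec<i32>, Vec<Option<String>>, Vec<f64>) {
    let mut aaa = Vec::with_capacity(rows.len());
    let mut bbb = Vec::with_capacity(rows.len());
    let mut ccc = Vec::with_capacity(rows.len());
    for r in rows {
        aaa.push(r.aaa);
        bbb.push(r.bbb.clone());
        ccc.push(r.ccc);
    }
    (aaa, bbb, ccc)
}
```

Each returned vector can then serve as the values for one column when building the frame with `df!` or `DataFrame::new` as in the sections above.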

## DataFrame structure

References:

- [DataFrame (Polars API reference)](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html)
- [DataFrame attributes (Polars Python API Reference)](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/attributes.html)

### Column names

In [78]:
df01.get_column_names()

["AAA", "BBB", "CCC"]

### Schema

The [DataFrame.schema()](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html#method.schema) method returns a `Schema` object:

In [142]:
println!("{:?}", df01.schema());

Schema:
name: AAA, data type: Int32
name: BBB, data type: Utf8
name: CCC, data type: Float64



### Shape, row count, column count

In [114]:
df01.shape()

(4, 3)

In [115]:
df01.height()

4

In [116]:
df01.width()

3

### is_empty()

In [23]:
df01.is_empty()

false

### Add/replace column

The [DataFrame.with_column()](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html#method.with_column) method adds a new column or replaces an existing one.

In [136]:
{
    let mut df = df01.clone();
    let ddd = Series::new("DDD", ["d0", "d1", "d2", "d3"]);
    
    // Add new column
    df.with_column(ddd).unwrap();
    println!("new column:\n{}", df);
    
    // Replace column
    let ccc = Series::new("CCC", ["c0", "c1", "c2", "c3"]);
    df.with_column(ccc).unwrap();
    println!("\nreplaced column:\n{}", df);
}

new column:
shape: (4, 4)
┌─────┬───────┬─────┬─────┐
│ AAA ┆ BBB   ┆ CCC ┆ DDD │
│ --- ┆ ---   ┆ --- ┆ --- │
│ i32 ┆ str   ┆ f64 ┆ str │
╞═════╪═══════╪═════╪═════╡
│ 4   ┆ hello ┆ 1.1 ┆ d0  │
│ 5   ┆ world ┆ 2.2 ┆ d1  │
│ 6   ┆ !     ┆ 3.3 ┆ d2  │
│ 7   ┆ null  ┆ 4.4 ┆ d3  │
└─────┴───────┴─────┴─────┘

replaced column:
shape: (4, 4)
┌─────┬───────┬─────┬─────┐
│ AAA ┆ BBB   ┆ CCC ┆ DDD │
│ --- ┆ ---   ┆ --- ┆ --- │
│ i32 ┆ str   ┆ str ┆ str │
╞═════╪═══════╪═════╪═════╡
│ 4   ┆ hello ┆ c0  ┆ d0  │
│ 5   ┆ world ┆ c1  ┆ d1  │
│ 6   ┆ !     ┆ c2  ┆ d2  │
│ 7   ┆ null  ┆ c3  ┆ d3  │
└─────┴───────┴─────┴─────┘


()

### Add/replace columns (lazy API)

In [138]:
{
    let df = df01.clone();
    println!("before:\n{}\n", df);
    
    let ccc = Series::new("CCC", ["c0", "c1", "c2", "c3"]);
    let ddd = Series::new("DDD", ["d0", "d1", "d2", "d3"]);
    
    let new_df = df.lazy()
                   .with_columns([ccc.lit(), ddd.lit()])
                   .collect()
                   .unwrap();
        
    println!("after:\n{}", new_df);
}

before:
shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘

after:
shape: (4, 4)
┌─────┬───────┬─────┬─────┐
│ AAA ┆ BBB   ┆ CCC ┆ DDD │
│ --- ┆ ---   ┆ --- ┆ --- │
│ i32 ┆ str   ┆ str ┆ str │
╞═════╪═══════╪═════╪═════╡
│ 4   ┆ hello ┆ c0  ┆ d0  │
│ 5   ┆ world ┆ c1  ┆ d1  │
│ 6   ┆ !     ┆ c2  ┆ d2  │
│ 7   ┆ null  ┆ c3  ┆ d3  │
└─────┴───────┴─────┴─────┘


()

### Rename column names

In [117]:
{
    let mut df = df01.clone();
    let cols = df.get_column_names_owned();
    println!("old col names: {:?}", cols);
    
    for col in cols {
        let new_col = format!("new-{}", col);
        df.rename(&col, &new_col)?;
    }
    
    println!("new col names: {:?}", df.get_column_names());
}

old col names: ["AAA", "BBB", "CCC"]
new col names: ["new-AAA", "new-BBB", "new-CCC"]


()

### Drop column

In [144]:
{
    let new_df = df01.drop("AAA").unwrap();
    println!("new_df:\n{:?}\n", new_df);
}

println!("Original df unaffected:\n{:?}", df01);

new_df:
shape: (4, 2)
┌───────┬─────┐
│ BBB   ┆ CCC │
│ ---   ┆ --- │
│ str   ┆ f64 │
╞═══════╪═════╡
│ hello ┆ 1.1 │
│ world ┆ 2.2 │
│ !     ┆ 3.3 │
│ null  ┆ 4.4 │
└───────┴─────┘

Original df unaffected:
shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘


### Drop columns

In [131]:
{
    let cols = vec!["AAA", "BBB"];
    let new_df = df01.drop_many(&cols);
    println!("new_df:\n{:?}\n", new_df);
}

println!("Original df unaffected:\n{:?}", df01);

new_df:
shape: (4, 1)
┌─────┐
│ CCC │
│ --- │
│ f64 │
╞═════╡
│ 1.1 │
│ 2.2 │
│ 3.3 │
│ 4.4 │
└─────┘

Original df unaffected:
shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘


### Drop column in place

This drops the column in place and returns the removed column.

In [111]:
{
    let mut df = df01.clone();
    let ser = df.drop_in_place("AAA").unwrap();
    
    println!("Original df after drop():\n{:?}\n", df);

    ser
}

Original df after drop():
shape: (4, 2)
┌───────┬─────┐
│ BBB   ┆ CCC │
│ ---   ┆ --- │
│ str   ┆ f64 │
╞═══════╪═════╡
│ hello ┆ 1.1 │
│ world ┆ 2.2 │
│ !     ┆ 3.3 │
│ null  ┆ 4.4 │
└───────┴─────┘



shape: (4,)
Series: 'AAA' [i32]
[
	4
	5
	6
	7
]

## Input/Output

### CSV

Reference:

- [CSV (Polars User Guide)](https://pola-rs.github.io/polars-book/user-guide/io/csv/)


#### Writing to CSV

In [110]:
{
    let mut hout = std::fs::File::create("sample.csv")
                                 .expect("could not create file");
    let mut df = df01.clone();
    CsvWriter::new(&mut hout).has_header(true)
                             .with_delimiter(b',')
                             .finish(&mut df)
                             .expect("CSV write error");
}

()

#### Reading from CSV

In [103]:
{
    let df = CsvReader::from_path("sample.csv")
                        .unwrap()
                        .has_header(true)
                        .finish()
                        .unwrap();
    println!("{}", df);
}

shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i64 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘


()

#### Lazy reading CSV


In [104]:
{
    let lazy_df : LazyFrame = LazyCsvReader::new("sample.csv")
                       .finish()
                       .unwrap();
    
    // .. do something with lazy_df
    
    let df = lazy_df
                 .collect()
                 .unwrap();
    println!("{}", df);
}

shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i64 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘


()

### Parquet

**Note**: you must enable the `parquet` feature in the Polars dependency.

References:

- [Parquet (Polars User Guide)](https://pola-rs.github.io/polars-book/user-guide/io/parquet/)

#### Write parquet

In [7]:
{
    let file = std::fs::File::create("sample.parquet")
                             .expect("could not create file");
    let mut df = df01.clone();
    ParquetWriter::new(file)
                  .finish(&mut df)
                  .expect("Parquet write error");
}

()

#### Read parquet

In [8]:
{
    let mut file = std::fs::File::open("sample.parquet").unwrap();

    let df = ParquetReader::new(&mut file).finish().unwrap();
    df
}

shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘

#### Lazy reading parquet

In [9]:
{
    let args = ScanArgsParquet::default();
    let lazy_df : LazyFrame = LazyFrame::scan_parquet("sample.parquet", args)
                                        .unwrap();

    
    // .. do something with lazy_df, collect() when done.
    
    let df = lazy_df
                 .collect()
                 .unwrap();
    df
}

shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘



## Display

### Controlling maximum number of rows/columns displayed

See: [Config with ENV vars (Polars API reference)](https://pola-rs.github.io/polars/polars/index.html#config-with-env-vars)

In [235]:
std::env::set_var("POLARS_FMT_MAX_ROWS", "20"); // Set to -1 to show all
std::env::set_var("POLARS_FMT_MAX_COLUMNS", "20");

### head, tail

In [113]:
df01.head(Some(2))

shape: (2, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
└─────┴───────┴─────┘

In [114]:
df01.tail(Some(2))

shape: (2, 3)
┌─────┬──────┬─────┐
│ AAA ┆ BBB  ┆ CCC │
│ --- ┆ ---  ┆ --- │
│ i32 ┆ str  ┆ f64 │
╞═════╪══════╪═════╡
│ 6   ┆ !    ┆ 3.3 │
│ 7   ┆ null ┆ 4.4 │
└─────┴──────┴─────┘

## Select (Eager API)

### Get item at row/column index

The Rust interface doesn't have a *Series.item()* method like the Python API, so AFAIK this is how to do it.

References:

- [AnyValue enum (Polars API reference)](https://pola-rs.github.io/polars/polars/datatypes/enum.AnyValue.html)

In [121]:
{
    let col: usize = 1;
    let row: usize = 2;
    
    // Get the Series for the column
    let ser = &df01[col];
    
    // Downcast the Series to a typed ChunkedArray with i16(), i32(), f32(), utf8(), etc.
    let ca = ser.utf8().unwrap();
    
    // Get the item from the ChunkedArray; val is an AnyValue
    let val = ca.get_any_value(row).unwrap();

    // Extract the &str from val and copy it into an owned String to return
    let s = String::from( val.get_str().unwrap() );
    
    s
}

"!"

### Getting a column

#### By the name

In [109]:
df01.column("AAA").unwrap()

shape: (4,)
Series: 'AAA' [i32]
[
	4
	5
	6
	7
]

#### By the index

In [107]:
&df01[0]

shape: (4,)
Series: 'AAA' [i32]
[
	4
	5
	6
	7
]

### Select multiple columns

In [200]:
{
    let df = df01.clone();

    df.select(["AAA", "CCC"])
       .unwrap()
}

shape: (4, 2)
┌─────┬─────┐
│ AAA ┆ CCC │
│ --- ┆ --- │
│ i32 ┆ f64 │
╞═════╪═════╡
│ 4   ┆ 1.1 │
│ 5   ┆ 2.2 │
│ 6   ┆ 3.3 │
│ 7   ┆ 4.4 │
└─────┴─────┘

### DataFrame Aggregation

For more DataFrame aggregation methods, see:

- [DataFrame aggregation (Python API Reference)](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/aggregation.html)

In [113]:
df01.max()

shape: (1, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 7   ┆ world ┆ 4.4 │
└─────┴───────┴─────┘

### Eager arithmetic

References:

- [Arithmetic (Polars API reference)](https://pola-rs.github.io/polars/polars/docs/eager/index.html#arithmetic)

Tip: the lazy API provides more powerful expressions (see the next section).

In [244]:
{
    let s_int = Series::new("s_int", &[1,   2,   3]);
    let s_flt = Series::new("s_flt", &[1.0, 2.0, 3.0]);

    DataFrame::new(vec![
        s_int.clone(),
        
        Series::new("add",  &s_int + &s_flt),
        Series::new("sub",  &s_int - &s_flt),
        Series::new("mul",  &s_int * &s_flt),
        Series::new("div",  &s_int / &s_flt),
        Series::new("mod",  &s_int % &s_flt),
        
        // Left side operations
        Series::new("30/s", 30.div(&s_int)),
        Series::new("10-s", 10.sub(&s_int)),
    ])?
}

shape: (3, 8)
┌───────┬─────┬─────┬─────┬─────┬─────┬──────┬──────┐
│ s_int ┆ add ┆ sub ┆ mul ┆ div ┆ mod ┆ 30/s ┆ 10-s │
│ ---   ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ ---  ┆ ---  │
│ i32   ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ i32  ┆ i32  │
╞═══════╪═════╪═════╪═════╪═════╪═════╪══════╪══════╡
│ 1     ┆ 2.0 ┆ 0.0 ┆ 1.0 ┆ 1.0 ┆ 0.0 ┆ 30   ┆ 9    │
│ 2     ┆ 4.0 ┆ 0.0 ┆ 4.0 ┆ 1.0 ┆ 0.0 ┆ 15   ┆ 8    │
│ 3     ┆ 6.0 ┆ 0.0 ┆ 9.0 ┆ 1.0 ┆ 0.0 ┆ 10   ┆ 7    │
└───────┴─────┴─────┴─────┴─────┴─────┴──────┴──────┘
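
To make the dtypes in the table above easier to read, here is a plain-Rust sketch of the same element-wise arithmetic. It is not Polars code: `mixed_op` and `lhs_scalar_op` are hypothetical helpers, and the casting rule they encode (integer combined with float gives `f64`, integer combined with integer stays `i32`) is simply what the output above shows.

```rust
/// Element-wise combination of an integer column with a float column.
/// The integer side is cast to f64 first, matching the f64 result columns above.
fn mixed_op(ints: &[i32], floats: &[f64], op: fn(f64, f64) -> f64) -> Vec<f64> {
    ints.iter()
        .zip(floats)
        .map(|(&i, &f)| op(i as f64, f))
        .collect()
}

/// A scalar on the left-hand side (e.g. `30 / s`) combined with an integer
/// column stays in integer arithmetic.
fn lhs_scalar_op(scalar: i32, ints: &[i32], op: fn(i32, i32) -> i32) -> Vec<i32> {
    ints.iter().map(|&i| op(scalar, i)).collect()
}
```

For example, `mixed_op(&[1, 2, 3], &[1.0, 2.0, 3.0], |a, b| a * b)` gives `[1.0, 4.0, 9.0]`, and `lhs_scalar_op(30, &[1, 2, 3], |a, b| a / b)` gives `[30, 15, 10]`, the same values as the `mul` and `30/s` columns.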

## Select (lazy API)

### LazyFrame select

Use the [LazyFrame::select()](https://docs.rs/polars/latest/polars/prelude/struct.LazyFrame.html#method.select) method. Columns can be selected with [**col**](https://docs.rs/polars/latest/polars/prelude/fn.col.html). Use `col("*")` to select all columns. You can also select columns [by regular expression](https://pola-rs.github.io/polars-book/user-guide/expressions/column_selections/#by-regular-expressions).

In [21]:
{
    let df = df01.clone();
    
    df.lazy()
      .select([
          col("AAA"),
          col("BBB"),
          col("CCC"),
      ])
      .collect()
      .unwrap()
}

shape: (4, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 4   ┆ hello ┆ 1.1 │
│ 5   ┆ world ┆ 2.2 │
│ 6   ┆ !     ┆ 3.3 │
│ 7   ┆ null  ┆ 4.4 │
└─────┴───────┴─────┘

### Select expression

Columns can be built independently with expressions inside the select construct.

In [23]:
{
    let df = df01.clone();
    
    df.lazy()
      .select([
          col("AAA"),
          col("BBB").sort(false),
          col("BBB").first().alias("first B"),
          (col("CCC") * lit(10)).alias("10xC"),
      ])
      .collect()
      .unwrap()
}    


shape: (4, 4)
┌─────┬───────┬─────────┬──────┐
│ AAA ┆ BBB   ┆ first B ┆ 10xC │
│ --- ┆ ---   ┆ ---     ┆ ---  │
│ i32 ┆ str   ┆ str     ┆ f64  │
╞═════╪═══════╪═════════╪══════╡
│ 4   ┆ null  ┆ hello   ┆ 11.0 │
│ 5   ┆ !     ┆ hello   ┆ 22.0 │
│ 6   ┆ hello ┆ hello   ┆ 33.0 │
│ 7   ┆ world ┆ hello   ┆ 44.0 │
└─────┴───────┴─────────┴──────┘

### Select with `with_columns`

[LazyFrame::with_columns()](https://docs.rs/polars/latest/polars/prelude/struct.LazyFrame.html#method.with_columns) is similar to select(). The main difference is that `with_columns` retains the original columns and adds new ones while select drops the original columns.

In [26]:
{
    let df = df01.clone();
    
    df.lazy()
      .with_columns([
          col("AAA").sort(true).alias("sorted-AAA"),
          col("BBB").sort(false).alias("sorted-BBB"),
      ])
      .collect()
      .unwrap()
}    


shape: (4, 5)
┌─────┬───────┬─────┬────────────┬────────────┐
│ AAA ┆ BBB   ┆ CCC ┆ sorted-AAA ┆ sorted-BBB │
│ --- ┆ ---   ┆ --- ┆ ---        ┆ ---        │
│ i32 ┆ str   ┆ f64 ┆ i32        ┆ str        │
╞═════╪═══════╪═════╪════════════╪════════════╡
│ 4   ┆ hello ┆ 1.1 ┆ 7          ┆ null       │
│ 5   ┆ world ┆ 2.2 ┆ 6          ┆ !          │
│ 6   ┆ !     ┆ 3.3 ┆ 5          ┆ hello      │
│ 7   ┆ null  ┆ 4.4 ┆ 4          ┆ world      │
└─────┴───────┴─────┴────────────┴────────────┘

## Iterating rows

- https://stackoverflow.com/questions/72440403/iterate-over-rows-polars-rust?rq=3

In [None]:
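{
    let df = df01.clone();
    
    // DataFrame::get(idx) returns one row as Option<Vec<AnyValue>>.
    // It is slow, so prefer column-wise operations on large frames.
    for idx in 0..df.height() {
        let row = df.get(idx).unwrap();
        println!("{:?}", row);
    }
}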

Method #2:

In [69]:
{
    let df = df01.clone();
    
    // "row_count_df" is dataframe with 1 row/column, containing the row count
    let row_count_df = df.lazy()
                         .select([
                             count()
                          ])
                         .collect()
                         .unwrap();
    
    // sum() returns an Option; fall back to 0 if there is no value
    let row_count = row_count_df[0].sum::<i64>().unwrap_or(0);
    
    row_count
}

4

Method #3:

In [71]:
{
    let df = df01.clone();
    
    // "row_count_df" is dataframe with 1 row/column, containing the row count
    let row_count_df = df.lazy()
                         .select([
                             count()
                          ])
                         .collect()
                         .unwrap();
    
    let row_count = row_count_df[0]  // 1st column
                        .u32()       // unpack to Result< ChunkedArray<u32> >
                        .unwrap()    // etc..
                        .get(0)
                        .unwrap();
    
    row_count
}

4

## Filter (Eager API)

References:

- [Comparisons (Eager API cookbook)](https://pola-rs.github.io/polars/polars/docs/eager/index.html#comparisons)
- [Series methods (Polars API reference)](https://pola-rs.github.io/polars/polars/series/struct.Series.html#method.is_not_nan)

### Creating mask


In [207]:
{
    let aaa = df01.column("AAA")?;
    println!("AAA:\n{:?}\n", aaa);
    
    let mask_vec = vec![
        Series::new("equal(5)",     aaa.equal(5)? ),
        Series::new("not_equal(5)", aaa.not_equal(5)? ),
        Series::new("lt(5)",        aaa.lt(5)? ),
        Series::new("lt_eq(5)",     aaa.lt_eq(5)? ),
        Series::new("gt(5)",        aaa.gt(5)? ),
        Series::new("gt_eq(5)",     aaa.gt_eq(5)? ),
        
        // User defined
        Series::new("user",         vec![false, false, false, true] ),
    ];
    
    println!("Sample masks:");
    
    DataFrame::new(mask_vec).unwrap()
}

AAA:
shape: (4,)
Series: 'AAA' [i32]
[
	4
	5
	6
	7
]

Sample masks:


shape: (4, 7)
┌──────────┬──────────────┬───────┬──────────┬───────┬──────────┬───────┐
│ equal(5) ┆ not_equal(5) ┆ lt(5) ┆ lt_eq(5) ┆ gt(5) ┆ gt_eq(5) ┆ user  │
│ ---      ┆ ---          ┆ ---   ┆ ---      ┆ ---   ┆ ---      ┆ ---   │
│ bool     ┆ bool         ┆ bool  ┆ bool     ┆ bool  ┆ bool     ┆ bool  │
╞══════════╪══════════════╪═══════╪══════════╪═══════╪══════════╪═══════╡
│ false    ┆ true         ┆ true  ┆ true     ┆ false ┆ false    ┆ false │
│ true     ┆ false        ┆ false ┆ true     ┆ false ┆ true     ┆ false │
│ false    ┆ true         ┆ false ┆ false    ┆ true  ┆ true     ┆ false │
│ false    ┆ true         ┆ false ┆ false    ┆ true  ┆ true     ┆ true  │
└──────────┴──────────────┴───────┴──────────┴───────┴──────────┴───────┘

### Filter Series using mask

In [197]:
{
    let aaa = df01.column("AAA")?;
    
    let mask = aaa.equal(5).unwrap();
    println!("mask:\n{:?}\n", mask);
    
    df01.column("BBB")?
        .filter(&mask)
        .unwrap()
}

mask:
shape: (4,)
ChunkedArray: 'AAA' [bool]
[
	false
	true
	false
	false
]



shape: (1,)
Series: 'BBB' [str]
[
	"world"
]

### Filter DataFrame using mask

In [198]:
{
    let aaa = df01.column("AAA")?;
    
    let mask = aaa.equal(5).unwrap();
    println!("mask:\n{:?}\n", mask);
    
    df01.filter(&mask)
        .unwrap()
}

mask:
shape: (4,)
ChunkedArray: 'AAA' [bool]
[
	false
	true
	false
	false
]



shape: (1, 3)
┌─────┬───────┬─────┐
│ AAA ┆ BBB   ┆ CCC │
│ --- ┆ ---   ┆ --- │
│ i32 ┆ str   ┆ f64 │
╞═════╪═══════╪═════╡
│ 5   ┆ world ┆ 2.2 │
└─────┴───────┴─────┘
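
Conceptually, a mask is just one boolean per row, and filtering keeps the rows whose entry is `true`. The plain-Rust sketch below mirrors the two steps above (build the mask from one column, apply it to another). `equal_mask` and `filter_by_mask` are hypothetical helper names for illustration, not Polars functions.

```rust
/// Build a boolean mask: true where the value equals `target`.
fn equal_mask(values: &[i32], target: i32) -> Vec<bool> {
    values.iter().map(|&v| v == target).collect()
}

/// Keep only the items whose mask entry is true.
/// Panics if the mask does not have exactly one entry per item.
fn filter_by_mask<T: Clone>(items: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(items.len(), mask.len(), "mask must have one entry per row");
    items
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(item, _)| item.clone())
        .collect()
}
```

With `aaa = [4, 5, 6, 7]`, `equal_mask(&aaa, 5)` is `[false, true, false, false]`, and applying that mask to `["hello", "world", "!", "n/a"]` keeps only `"world"`, which matches the Series example earlier.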

### DataFrame mask from multiple columns

TBD

## Filter (Lazy API)

In [27]:
{
    let df = df![
        "a" => [1, 2, 3],
        "b" => [None, Some("a"), Some("b")]
    ]?;

    df.lazy()
        .filter(col("a").gt(lit(2)))
        .collect()?
}

shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 3   ┆ b   │
└─────┴─────┘

## Apply functions/closures (Eager API)

References:

- [apply (Eager API cookbook)](https://pola-rs.github.io/polars/polars/docs/eager/index.html#apply-functions-closures)

### Series / ChunkedArrays apply

In [253]:
{
    let foo = Series::new("foo", &[Some(1), Some(2), None]);
    
    // apply a closure over all values
    let mut new_foo = foo.i32()?.apply(|value| value * 20);
    new_foo.rename("new_foo");
    
    println!("{}", foo);
    
    // Notice the type is ChunkedArray
    println!("{:?}", new_foo);
    
    // If you want Series:
    Series::new("new_foo", new_foo)
}

shape: (3,)
Series: 'foo' [i32]
[
	1
	2
	null
]
shape: (3,)
ChunkedArray: 'new_foo' [i32]
[
	20
	40
	null
]


shape: (3,)
Series: 'new_foo' [i32]
[
	20
	40
	null
]

### Series apply and change type

In [255]:
{
    let s = Series::new("foo", &["foo", "bar", "foobar"]);
    
    // count string lengths
    let len_s = s.utf8()?.apply_cast_numeric::<_, UInt64Type>(|str_val| str_val.len() as u64);
    
    // this is ChunkedArray
    len_s
}

shape: (3,)
ChunkedArray: 'foo' [u64]
[
	3
	3
	6
]

### DataFrame apply

In [259]:
{
    let mut df = df![
        "letters" => ["a", "b", "c", "d"],
        "numbers" => [1, 2, 3, 4]
    ]?;

    println!("Original df:\n{}\n", df);

    // coerce numbers to floats
    df.try_apply("numbers", |s: &Series| {
        s.cast(&DataType::Float64)
    })?;

    // transform letters to uppercase letters
    df.try_apply("letters", |s: &Series| {
        Ok(s.utf8()?.to_uppercase())
    })?;
 
    df
}

Original df:
shape: (4, 2)
┌─────────┬─────────┐
│ letters ┆ numbers │
│ ---     ┆ ---     │
│ str     ┆ i32     │
╞═════════╪═════════╡
│ a       ┆ 1       │
│ b       ┆ 2       │
│ c       ┆ 3       │
│ d       ┆ 4       │
└─────────┴─────────┘



shape: (4, 2)
┌─────────┬─────────┐
│ letters ┆ numbers │
│ ---     ┆ ---     │
│ str     ┆ f64     │
╞═════════╪═════════╡
│ A       ┆ 1.0     │
│ B       ┆ 2.0     │
│ C       ┆ 3.0     │
│ D       ┆ 4.0     │
└─────────┴─────────┘

## Sorting (Eager API)

### Sorting DataFrame

Use the basic [DataFrame::sort()](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html#method.sort) method.

Note: it returns a new DataFrame; the original is left unchanged.

In [12]:
{
    let df = df![
        "A" => [1, 2, 3],
        "B" => ["b", "a", "c"]
    ]?;

    let by = &["B", "A"];
    let descending = vec![true, false];

    let sorted_df = df.sort(by, descending, true)?;
    
    println!("Original df after sort:\n{}", df);
    
    sorted_df
}

Original df after sort:
shape: (3, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 1   ┆ b   │
│ 2   ┆ a   │
│ 3   ┆ c   │
└─────┴─────┘


shape: (3, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 3   ┆ c   │
│ 1   ┆ b   │
│ 2   ┆ a   │
└─────┴─────┘
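
To spell out what `by = ["B", "A"]` with `descending = [true, false]` means: rows are ordered by `B` descending, and `A` ascending is used only to break ties. Here is a plain-Rust sketch of that ordering on `(A, B)` tuples; `sort_b_desc_a_asc` is a hypothetical helper, not a Polars function.

```rust
/// Sort (A, B) rows by B descending, then by A ascending to break ties.
/// `sort_by` is stable, so rows that compare equal keep their original order.
fn sort_b_desc_a_asc(rows: &mut [(i32, &str)]) {
    rows.sort_by(|x, y| {
        y.1.cmp(x.1) // operands swapped: descending on B
            .then_with(|| x.0.cmp(&y.0)) // ascending on A for ties
    });
}
```

Sorting `[(1, "b"), (2, "a"), (3, "c")]` this way gives `[(3, "c"), (1, "b"), (2, "a")]`, the same row order as the sorted DataFrame above.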

### Sort by single column

Sort by **single column** with [DataFrame::sort_with_options()](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html#method.sort_with_options) method:

In [13]:
{
    let df = df![
        "A" => [1, 2, 3],
        "B" => ["b", "a", "c"]
    ]?;

    let sorted_df = df.sort_with_options("B", SortOptions {
        descending: true,
        nulls_last: true,
        multithreaded: false,
        maintain_order: true
    })?;
    
    println!("Original df after sort:\n{}", df);
    
    sorted_df
}

Original df after sort:
shape: (3, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 1   ┆ b   │
│ 2   ┆ a   │
│ 3   ┆ c   │
└─────┴─────┘


shape: (3, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 3   ┆ c   │
│ 1   ┆ b   │
│ 2   ┆ a   │
└─────┴─────┘

### Sorting in-place

Sort by multiple columns in-place with [DataFrame::sort_in_place()](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html#method.sort_in_place):

In [37]:
{
    let mut df = df![
        "A" => [1, 2, 3],
        "B" => ["b", "a", "c"]
    ]?;

    let by = ["B", "A"];
    let descending = vec![true, false];
    
    df.sort_in_place(&by, descending, true)?;
    
    println!("{}", df);
}

shape: (3, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 3   ┆ c   │
│ 1   ┆ b   │
│ 2   ┆ a   │
└─────┴─────┘


()

## Sorting (Lazy API)

### Basic LazyFrame sorting

Sort by single column with [LazyFrame::sort()](https://docs.rs/polars/latest/polars/prelude/struct.LazyFrame.html#method.sort) method.

In [17]:
{
    let df = df![
        "A" => [1, 2, 3],
        "B" => ["b", "a", "c"]
    ]?;

    let sorted_df = df.lazy()
                      .sort("B", Default::default())
                      .collect()?;
    // Note that lazy() takes ownership of df
    sorted_df
}

shape: (3, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 2   ┆ a   │
│ 1   ┆ b   │
│ 3   ┆ c   │
└─────┴─────┘

### Sorting LazyFrame

Use [LazyFrame::sort_by_exprs()](https://docs.rs/polars/latest/polars/prelude/struct.LazyFrame.html#method.sort_by_exprs).

In [16]:
{
    let df = df![
        "A" => [1, 2, 3],
        "B" => ["b", "a", "c"]
    ]?;

    let by = vec![col("B"), col("A")];
    let descending = vec![true, false];

    let sorted_df = df.lazy()
                      .sort_by_exprs(by, descending, false, false)
                      .collect()?;
    
    // Note that lazy() takes ownership of df
    
    sorted_df
}

shape: (3, 2)
┌─────┬─────┐
│ A   ┆ B   │
│ --- ┆ --- │
│ i32 ┆ str │
╞═════╪═════╡
│ 3   ┆ c   │
│ 1   ┆ b   │
│ 2   ┆ a   │
└─────┴─────┘

## Joins (Eager API)

References:

- [left join (Polars API reference)](https://pola-rs.github.io/polars/polars/frame/struct.DataFrame.html#method.left_join)
- [joins (Eager API cookbook)](https://pola-rs.github.io/polars/polars/docs/eager/index.html#joins)


In [45]:
{
    // Create first df.
    let temp = df!("days" => &[0, 1, 2, 3, 4],
                   "temp" => &[22.1, 19.9, 7., 2., 3.],
                   "other" => &["zero", "uno", "two", "three", "four"]
    )?;

    // Create second df.
    let rain = df!("days" => &[1, 2, 9],
                   "rain" => &[0.1, 0.2, 0.9],
                   "other" => &["one", "two", "nine"]
    )?;

    // join on a single column
    let df = temp.left_join(&rain, ["days"], ["days"])?;
    println!("left join:\n{}\n", df);
    
    let df = temp.inner_join(&rain, ["days"], ["days"])?;
    println!("inner join:\n{}\n", df);
    
    let df = temp.outer_join(&rain, ["days"], ["days"])?;
    println!("outer join:\n{}\n", df);

    // left join on multiple columns
    let df = temp.join(&rain, 
                       vec!["days", "other"], 
                       vec!["days", "other"], 
                       JoinArgs::new(JoinType::Left)
    )?;
    println!("multi-column left join:\n{}\n", df);
    
    // inner join on multiple columns
    let df = temp.join(&rain, 
                       vec!["days", "other"], 
                       vec!["days", "other"], 
                       JoinArgs::new(JoinType::Inner)
    )?;
    println!("multi-column inner join:\n{}\n", df);
    
}

left join:
shape: (5, 5)
┌──────┬──────┬───────┬──────┬─────────────┐
│ days ┆ temp ┆ other ┆ rain ┆ other_right │
│ ---  ┆ ---  ┆ ---   ┆ ---  ┆ ---         │
│ i32  ┆ f64  ┆ str   ┆ f64  ┆ str         │
╞══════╪══════╪═══════╪══════╪═════════════╡
│ 0    ┆ 22.1 ┆ zero  ┆ null ┆ null        │
│ 1    ┆ 19.9 ┆ uno   ┆ 0.1  ┆ one         │
│ 2    ┆ 7.0  ┆ two   ┆ 0.2  ┆ two         │
│ 3    ┆ 2.0  ┆ three ┆ null ┆ null        │
│ 4    ┆ 3.0  ┆ four  ┆ null ┆ null        │
└──────┴──────┴───────┴──────┴─────────────┘

inner join:
shape: (2, 5)
┌──────┬──────┬───────┬──────┬─────────────┐
│ days ┆ temp ┆ other ┆ rain ┆ other_right │
│ ---  ┆ ---  ┆ ---   ┆ ---  ┆ ---         │
│ i32  ┆ f64  ┆ str   ┆ f64  ┆ str         │
╞══════╪══════╪═══════╪══════╪═════════════╡
│ 1    ┆ 19.9 ┆ uno   ┆ 0.1  ┆ one         │
│ 2    ┆ 7.0  ┆ two   ┆ 0.2  ┆ two         │
└──────┴──────┴───────┴──────┴─────────────┘

outer join:
shape: (6, 5)
┌──────┬──────┬───────┬──────┬─────────────┐
│ days ┆ temp ┆ other

()
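
As a mental model for the difference between the left and inner joins, here is a plain-Rust sketch on `(days, value)` pairs using a `HashMap` lookup. The functions are hypothetical and only named after the Polars methods; they are not the Polars implementation, and they assume each key appears at most once on the right side.

```rust
use std::collections::HashMap;

/// Left join: every left row is kept; the right value is None
/// when the key has no match on the right side.
fn left_join(left: &[(i32, f64)], right: &[(i32, f64)]) -> Vec<(i32, f64, Option<f64>)> {
    // Assumption: keys on the right side are unique.
    let lookup: HashMap<i32, f64> = right.iter().copied().collect();
    left.iter()
        .map(|&(day, temp)| (day, temp, lookup.get(&day).copied()))
        .collect()
}

/// Inner join: only rows whose key appears on both sides survive.
fn inner_join(left: &[(i32, f64)], right: &[(i32, f64)]) -> Vec<(i32, f64, f64)> {
    left_join(left, right)
        .into_iter()
        .filter_map(|(day, temp, rain)| rain.map(|r| (day, temp, r)))
        .collect()
}
```

With the `temp` and `rain` data from the cell above, the left join keeps all five days (days 0, 3, and 4 get `None` for rain), while the inner join keeps only days 1 and 2.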

## GroupBy (Eager)

References:

- [GroupBy struct (Polars API reference)](https://pola-rs.github.io/polars/polars/frame/groupby/struct.GroupBy.html)
- [GroupBy (Polars eager cookbook)](https://pola-rs.github.io/polars/polars/docs/eager/index.html#groupby)

This is the DataFrame for this experiment.

In [48]:
let df = {
    // Create DataFrame
    let dates = &["2020-08-21", "2020-08-21", "2020-08-22", "2020-08-23", "2020-08-22", ];
    let fmt = "%Y-%m-%d";
    let s0 = DateChunked::parse_from_str_slice("date", dates, fmt)
                         .into_series();
    let s1 = Series::new("temp", [20, 10, 7, 9, 1]);
    let s2 = Series::new("rain", [0.2, 0.1, 0.3, 0.1, 0.01]);

    DataFrame::new(vec![s0, s1, s2]).unwrap()
};

df

shape: (5, 3)
┌────────────┬──────┬──────┐
│ date       ┆ temp ┆ rain │
│ ---        ┆ ---  ┆ ---  │
│ date       ┆ i32  ┆ f64  │
╞════════════╪══════╪══════╡
│ 2020-08-21 ┆ 20   ┆ 0.2  │
│ 2020-08-21 ┆ 10   ┆ 0.1  │
│ 2020-08-22 ┆ 7    ┆ 0.3  │
│ 2020-08-23 ┆ 9    ┆ 0.1  │
│ 2020-08-22 ┆ 1    ┆ 0.01 │
└────────────┴──────┴──────┘

### Group-by then aggregate

Below is the basic construct for a group-by followed by an aggregation. See [GroupBy struct (Polars reference)](https://pola-rs.github.io/polars/polars/frame/groupby/struct.GroupBy.html) for more aggregation methods (min, max, sum, etc.).

In [49]:
df.groupby(["date"])?
  .select(&["temp", "rain"])
  .mean()?

shape: (3, 3)
┌────────────┬───────────┬───────────┐
│ date       ┆ temp_mean ┆ rain_mean │
│ ---        ┆ ---       ┆ ---       │
│ date       ┆ f64       ┆ f64       │
╞════════════╪═══════════╪═══════════╡
│ 2020-08-21 ┆ 15.0      ┆ 0.15      │
│ 2020-08-23 ┆ 9.0       ┆ 0.1       │
│ 2020-08-22 ┆ 4.0       ┆ 0.155     │
└────────────┴───────────┴───────────┘
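
To double-check the means above, here is a plain-Rust sketch of the same group-by-then-mean on `(date, temp, rain)` rows. `mean_by_date` is a hypothetical helper, not a Polars function. It uses a `BTreeMap`, so its groups come out sorted by date, whereas the Polars output above is not in sorted order.

```rust
use std::collections::BTreeMap;

/// Group (date, temp, rain) rows by date and return (mean temp, mean rain) per date.
fn mean_by_date(rows: &[(&str, i32, f64)]) -> BTreeMap<String, (f64, f64)> {
    // Accumulate (sum of temp, sum of rain, row count) for each date.
    let mut acc: BTreeMap<String, (f64, f64, usize)> = BTreeMap::new();
    for &(date, temp, rain) in rows {
        let entry = acc.entry(date.to_string()).or_insert((0.0, 0.0, 0));
        entry.0 += temp as f64;
        entry.1 += rain;
        entry.2 += 1;
    }
    // Turn the sums into means.
    acc.into_iter()
        .map(|(date, (t, r, n))| (date, (t / n as f64, r / n as f64)))
        .collect()
}
```

For the five rows of the experiment DataFrame this yields three groups, with mean temperatures 15.0, 4.0, and 9.0 for 2020-08-21, 2020-08-22, and 2020-08-23 respectively.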

### Group members index

In [50]:
df.groupby(["date"])?
  .groups()?

shape: (3, 2)
┌────────────┬───────────┐
│ date       ┆ groups    │
│ ---        ┆ ---       │
│ date       ┆ list[u32] │
╞════════════╪═══════════╡
│ 2020-08-21 ┆ [0, 1]    │
│ 2020-08-22 ┆ [2, 4]    │
│ 2020-08-23 ┆ [3]       │
└────────────┴───────────┘

### Apply function to group

The function can return an arbitrary DataFrame.

In [64]:
{
    fn func(subset: DataFrame) -> Result<DataFrame, PolarsError> {
        // Aggregate this group's columns into single values
        let min_temp : i32 = subset.column("temp")?.min().unwrap();
        let max_temp : i32 = subset.column("temp")?.max().unwrap();
        let avg_temp : f64 = subset.column("temp")?.mean().unwrap();
        let total_rain : f64 = subset.column("rain")?.sum().unwrap();
        
        // Build a one-row DataFrame holding the aggregates for this group
        let result = DataFrame::new(
            vec![
                Series::new("min_temp",   vec![min_temp] ),
                Series::new("max_temp",   vec![max_temp] ),
                Series::new("avg_temp",   vec![avg_temp] ),
                Series::new("total_rain", vec![total_rain] ),
            ]
        )?;
        
        Ok(result)
    }
    
    df.groupby(["date"])?
      .apply( func )?
}


shape: (3, 4)
┌──────────┬──────────┬──────────┬────────────┐
│ min_temp ┆ max_temp ┆ avg_temp ┆ total_rain │
│ ---      ┆ ---      ┆ ---      ┆ ---        │
│ i32      ┆ i32      ┆ f64      ┆ f64        │
╞══════════╪══════════╪══════════╪════════════╡
│ 10       ┆ 20       ┆ 15.0     ┆ 0.3        │
│ 9        ┆ 9        ┆ 9.0      ┆ 0.1        │
│ 1        ┆ 7        ┆ 4.0      ┆ 0.31       │
└──────────┴──────────┴──────────┴────────────┘

## Advanced expressions (Lazy API)

See [Expressions (Polars User Guide)](https://pola-rs.github.io/polars-book/user-guide/expressions/column_selections/#by-multiple-strings)