In [None]:
import polars as pl
import plotly.express as px

csv_file = 'Titanic.csv'

df = pl.read_csv(csv_file)

### Count occurrences on a Series

Counting occurrences in a Series is syntactically the same as in pandas. However, pandas returns a Series, while polars returns a DataFrame with two columns: the distinct values and their counts. Additionally, unlike in pandas, the result is not sorted by count, and its row order is not guaranteed unless sort = True is passed.

In [None]:
df['Pclass'].value_counts()

Pclass,counts
i64,u32
3,491
1,216
2,184


In [None]:
df['Pclass'].value_counts(sort = True)

Pclass,counts
i64,u32
3,491
1,216
2,184


In [None]:
# Alternatively, sort the resulting DataFrame by value with its sort method.

df['Pclass'].value_counts().sort('Pclass')

Pclass,counts
i64,u32
1,216
2,184
3,491


### Value counts as an expression

When value_counts is used as an expression, for example inside a select statement, the output is a single struct column whose fields hold the value and its count. To get a regular DataFrame, call .struct.unnest (formerly .struct.to_frame) on the resulting Series.

In [None]:
# Struct column

df.select(pl.col('Pclass').value_counts())

Pclass
struct[2]
"{3,491}"
"{1,216}"
"{2,184}"


In [None]:
# DataFrame

df.select(pl.col('Pclass').value_counts())['Pclass'].struct.unnest()


Pclass,counts
i64,u32
2,184
1,216
3,491


### Plotting the value counts

To display the output with Plotly, we call value_counts on the Series again, sort the result, and cast the integer Pclass column to a string dtype so that the classes are plotted as discrete categories.

In [None]:
classCounts = df['Pclass'].value_counts().sort('Pclass').with_columns(pl.col('Pclass').cast(pl.Utf8))

px.bar(x=classCounts['Pclass'], y=classCounts['counts'])



### Value counts in lazy mode

Polars has no lazy counterpart to a Series, so in lazy mode value_counts must be called as an expression.

As the output of the value_counts expression is a struct dtype we then:

* trigger evaluation of the LazyFrame
* transform the struct column to a DataFrame

Note that in this scenario polars detects that only the Pclass column needs to be read from the CSV, as the optimized query plan at the end of this section shows.

In [None]:
pl.scan_csv(csv_file).select(pl.col('Pclass').value_counts()).collect()['Pclass'].struct.unnest()



Pclass,counts
i64,u32
1,216
2,184
3,491


In [None]:
# Optimized query plan

print(pl.scan_csv(csv_file).select(pl.col('Pclass').value_counts()).describe_optimized_plan())

 SELECT [col("Pclass").value_counts()] FROM

    CSV SCAN Titanic.csv
    PROJECT 1/12 COLUMNS
