Installing Polars takes a single pip command. Pre-built binaries are available for Python >= 3.6.
$ pip3 install polars
Below is a short snippet that parses a CSV file, applies a filter, and then performs a groupby operation. The eager API should feel familiar to pandas users. The lazy API is more declarative: it describes what you want, not how to compute it.
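import polars as pl

# Eager: each step runs immediately, much like pandas.
df = pl.read_csv("https://j.mp/iriscsv")
df[df["sepal_length"] > 5].groupby("species").sum()

# Lazy: build a query plan, then execute it with collect().
# This assumes a local copy of the data saved as iris.csv.
(pl.scan_csv("iris.csv")
    .filter(pl.col("sepal_length") > 5)
    .groupby("species")
    .agg(pl.col("*").sum())
).collect()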
This outputs:
species | sepal_length_sum | sepal_width_sum | petal_length_sum | petal_width_sum |
---|---|---|---|---|
str | f64 | f64 | f64 | f64 |
"setosa" | 116.9 | 81.7 | 33.2 | 6.1 |
"virginica" | 324.5 | 146.2 | 273.1 | 99.6 |
"versicolor" | 281.9 | 131.8 | 202.9 | 63.3 |
The eager API is similar to pandas. Operations are executed immediately, in an imperative manner. The important data structures are DataFrames and Series.
Read more about the eager DataFrame operations.
Read more about the eager Series operations.
The lazy API builds a query plan. Nothing is executed until you explicitly ask Polars to execute the query (via LazyFrame.collect() or LazyFrame.fetch). This provides Polars with the entire context of the query and allows for optimizations and choosing the fastest algorithm given that context.
A LazyFrame is a DataFrame abstraction that lazily keeps track of the query plan.
Read more about the Lazy DataFrame operations.
The arguments passed to LazyFrame methods are expressions, which can be combined to build simple or complex queries. See the examples in the "how can I?" section of the book.
The API of the Expr can be found here.
The Polars book provides more in-depth information about Polars. Reading it will give you a more thorough understanding of what the lazy API has to offer and which optimizations the query optimizer performs.