In [1]:
import polars as pl

### Read data

In [2]:
db_path = "data/database.sqlite"
connection_string = "sqlite://" + db_path

Read the data directly into memory with the `pl.read_*` functions.

In [4]:
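# read_database_uri takes a connection URI string (as built above); it uses connectorx by default
player_data = pl.read_database_uri("select * from Player_Attributes", connection_string)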
player_data.head()

| id | player_fifa_api_id | player_api_id | date | overall_rating | potential | preferred_foot | attacking_work_rate | defensive_work_rate | crossing | finishing | heading_accuracy | short_passing | volleys | dribbling | curve | free_kick_accuracy | long_passing | ball_control | acceleration | sprint_speed | agility | reactions | balance | shot_power | jumping | stamina | strength | long_shots | aggression | interceptions | positioning | vision | penalties | marking | standing_tackle | sliding_tackle | gk_diving | gk_handling | gk_kicking | gk_positioning | gk_reflexes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i64 | i64 | i64 | str | i64 | i64 | str | str | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 |
| 1 | 218353 | 505942 | "2016-02-18 00:…" | 67 | 71 | "right" | "medium" | "medium" | 49 | 44 | 71 | 61 | 44 | 51 | 45 | 39 | 64 | 49 | 60 | 64 | 59 | 47 | 65 | 55 | 58 | 54 | 76 | 35 | 71 | 70 | 45 | 54 | 48 | 65 | 69 | 69 | 6 | 11 | 10 | 8 | 8 |
| 2 | 218353 | 505942 | "2015-11-19 00:…" | 67 | 71 | "right" | "medium" | "medium" | 49 | 44 | 71 | 61 | 44 | 51 | 45 | 39 | 64 | 49 | 60 | 64 | 59 | 47 | 65 | 55 | 58 | 54 | 76 | 35 | 71 | 70 | 45 | 54 | 48 | 65 | 69 | 69 | 6 | 11 | 10 | 8 | 8 |
| 3 | 218353 | 505942 | "2015-09-21 00:…" | 62 | 66 | "right" | "medium" | "medium" | 49 | 44 | 71 | 61 | 44 | 51 | 45 | 39 | 64 | 49 | 60 | 64 | 59 | 47 | 65 | 55 | 58 | 54 | 76 | 35 | 63 | 41 | 45 | 54 | 48 | 65 | 66 | 69 | 6 | 11 | 10 | 8 | 8 |
| 4 | 218353 | 505942 | "2015-03-20 00:…" | 61 | 65 | "right" | "medium" | "medium" | 48 | 43 | 70 | 60 | 43 | 50 | 44 | 38 | 63 | 48 | 60 | 64 | 59 | 46 | 65 | 54 | 58 | 54 | 76 | 34 | 62 | 40 | 44 | 53 | 47 | 62 | 63 | 66 | 5 | 10 | 9 | 7 | 7 |
| 5 | 218353 | 505942 | "2007-02-22 00:…" | 61 | 65 | "right" | "medium" | "medium" | 48 | 43 | 70 | 60 | 43 | 50 | 44 | 38 | 63 | 48 | 60 | 64 | 59 | 46 | 65 | 54 | 58 | 54 | 76 | 34 | 62 | 40 | 44 | 53 | 47 | 62 | 63 | 66 | 5 | 10 | 9 | 7 | 7 |


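There are also `pl.scan_*` functions, which read data lazily: they return a `LazyFrame` instead of loading everything into memory, so Polars can optimize the whole query (for example, reading only the columns and rows it needs) before doing any work. Prefer them whenever a `scan_*` variant exists for your data source.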

### Eager vs. Lazy execution
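- **Eager**: each operation is executed immediately, in the order written. Polars does little query optimization.
- **Lazy**: operations are only recorded in a query plan and executed when the result is requested (for example with `collect()`). Polars optimizes the whole plan before running it.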

Example of Eager execution:

In [12]:
%%timeit
# Eager: sort the whole DataFrame by potential, then keep rows with a high attacking work rate
attacking_potential_data = (player_data
                            .sort(pl.col("potential"))
                            .filter(pl.col("attacking_work_rate") == "high"))

47 ms ± 13 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


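Example of lazy execution. Calling `lazy()` turns the DataFrame into a `LazyFrame`, so all subsequent operations are recorded instead of executed; calling `collect()` runs the optimized query and returns the result as a DataFrame. Here the optimizer can, for example, apply the filter before the sort, so fewer rows need to be sorted: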

In [13]:
%%timeit
# Lazy: build the query plan, let Polars optimize it, then execute it with collect()
attacking_potential_data = (player_data
                            .lazy()
                            .sort(pl.col("potential"))
                            .filter(pl.col("attacking_work_rate") == "high")
                            .collect())

10.9 ms ± 363 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
