# Exercises: Data Analysis with Python

In [1]:
import polars as pl

## Mean Temperature in Zurich
The weather station in Zurich Fluntern has been recording temperature data since 1864.

#### Import Data
The file *sma_zrh_historical.csv* contains daily measurements from this station, including the daily mean temperature.

In [30]:
df = pl.read_csv('data/sma_zrh_historical.csv', separator=';')
df

station_abbr,reference_timestamp,tre200d0,tre200dx,tre200dn,tre005d0,tre005dx,tre005dn,ure200d0,pva200d0,prestad0,pp0qffd0,ppz850d0,ppz700d0,pp0qnhd0,fkl010d0,fkl010d1,fu3010d0,fu3010d1,fkl010d3,fu3010d3,wcc006d0,rre150d0,rka150d0,htoautd0,gre000d0,oli000d0,olo000d0,osr000d0,ods000d0,sre000d0,sremaxdv,erefaod0,xcd000d0,dkl010d0,xno000d0,xno012d0,rreetsd0
str,str,f64,str,str,str,str,str,f64,f64,f64,str,str,str,str,f64,str,f64,str,str,str,str,f64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""SMA""","""01.01.1864 00:00""",-0.8,,,,,,94.3,5.5,951.1,,,,,2.4,,8.6,,,,,6.3,,,,,,,,,,,,,,,
"""SMA""","""02.01.1864 00:00""",-8.0,,,,,,91.7,3.1,962.4,,,,,3.5,,12.6,,,,,4.5,,,,,,,,,,,,,,,
"""SMA""","""03.01.1864 00:00""",-11.1,,,,,,92.7,2.5,965.0,,,,,8.3,,29.9,,,,,0.1,,,,,,,,,,,,,,,
"""SMA""","""04.01.1864 00:00""",-10.2,,,,,,97.3,2.7,965.5,,,,,1.4,,5.0,,,,,0.2,,,,,,,,,,,,,,,
"""SMA""","""05.01.1864 00:00""",-11.3,,,,,,95.0,2.6,966.9,,,,,0.4,,1.4,,,,,0.0,,,,,,,,,,,,,,,
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
"""SMA""","""27.12.2024 00:00""",-1.8,"""-1.1""","""-2.5""","""-1.3""","""0""","""-2.4""",95.5,5.1,966.5,"""1036.1""",,,"""1032.7""",0.9,"""3.2""",3.2,"""11.5""","""3.1""","""11.2""","""0""",0.0,"""0""","""0""","""17""","""304""",,,,"""0""","""0""","""0.2""","""0""","""191""","""21.8""","""13.8""","""0"""
"""SMA""","""28.12.2024 00:00""",-1.5,"""-0.6""","""-2.2""","""-0.8""","""0.8""","""-1.6""",98.8,5.4,964.4,"""1033.7""",,,"""1030.4""",0.7,"""2.6""",2.5,"""9.4""","""2.5""","""9""","""0""",0.0,"""0""","""0""","""21""","""306""",,,,"""0""","""0""","""0.2""","""0""","""225""","""21.5""","""13.5""","""0"""
"""SMA""","""29.12.2024 00:00""",-1.7,"""-0.5""","""-3.3""","""-0.9""","""1.2""","""-2.7""",99.3,5.3,963.9,"""1033.3""",,,"""1030""",0.8,"""2.6""",2.9,"""9.4""","""2.6""","""9.4""","""0""",0.0,"""0""","""0""","""24""","""304""",,,,"""0""","""0""","""0.2""","""0""","""219""","""21.7""","""13.7""","""0"""
"""SMA""","""30.12.2024 00:00""",-2.3,"""-1.3""","""-3.3""","""-1.6""","""-0.3""","""-2.7""",98.0,5.1,964.7,"""1034.3""",,,"""1030.8""",0.7,"""3.3""",2.5,"""11.9""","""3.2""","""11.5""","""0""",0.0,"""0""","""0""","""19""","""303""",,,,"""0""","""0""","""0.2""","""0""","""198""","""22.3""","""14.3""","""0"""


#### Preparing the data
We are only interested in the date (column 'reference_timestamp') and the daily mean temperature (column 'tre200d0'). Create a dataframe with only these two columns. In the same step, convert the date column from a string to a date type and rename both columns to something more meaningful.

In [13]:
temperature = (
    df
    .select(
        # Parse strings such as '01.01.1864 00:00' into dates
        pl.col('reference_timestamp')
            .str.to_date(format='%d.%m.%Y %H:%M')
            .alias('Date'),
        # Daily mean temperature
        pl.col('tre200d0')
            .alias('Temperature')
    )
)

temperature

Date,Temperature
date,f64
1864-01-01,-0.8
1864-01-02,-8.0
1864-01-03,-11.1
1864-01-04,-10.2
1864-01-05,-11.3
…,…
2024-12-27,-1.8
2024-12-28,-1.5
2024-12-29,-1.7
2024-12-30,-2.3
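
The format string passed to *str.to_date* uses the usual strftime-style codes. As a minimal sketch of what it matches, the same format can be tried with Python's standard library on a single timestamp. The names `TIMESTAMP_FORMAT` and `parse_timestamp` are made up for this illustration and are not part of the exercise.

```python
from datetime import date, datetime

# Same format string as in the Polars expression:
# %d = day, %m = month, %Y = four-digit year, %H:%M = hour and minute
TIMESTAMP_FORMAT = "%d.%m.%Y %H:%M"


def parse_timestamp(raw: str) -> date:
    """Parse a timestamp such as '01.01.1864 00:00' and keep only the date part."""
    return datetime.strptime(raw, TIMESTAMP_FORMAT).date()


first_day = parse_timestamp("01.01.1864 00:00")
print(first_day.isoformat())  # prints 1864-01-01
```

The time of day is always 00:00 in the rows shown above, so dropping it loses no information there.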


#### Calculate yearly means and graph data
Calculate the yearly means using Polars' *group_by_dynamic* method. Graph the yearly mean as a function of time.

In [32]:
yearly_means = (temperature
                .group_by_dynamic('Date', every='1y')
                .agg(pl.col('Temperature').mean())
               )

yearly_means.plot.line(
    x='Date',
    y='Temperature'
)
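
To make the grouping step concrete, here is a rough plain-Python sketch of a yearly mean: every row is assigned to its calendar year and the values in each year are averaged. It only illustrates the idea and is not how Polars implements *group_by_dynamic*. The function name `yearly_mean` and the two 1865 values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def yearly_mean(rows):
    """Average the values of (date, value) pairs per calendar year."""
    by_year = defaultdict(list)
    for day, value in rows:
        by_year[day.year].append(value)
    # Return the years in ascending order
    return {year: mean(values) for year, values in sorted(by_year.items())}


sample = [
    (date(1864, 1, 1), -0.8),  # taken from the table above
    (date(1864, 1, 2), -8.0),  # taken from the table above
    (date(1865, 1, 1), 1.0),   # hypothetical value
    (date(1865, 1, 2), 3.0),   # hypothetical value
]

result = yearly_mean(sample)
for year, value in result.items():
    print(year, round(value, 2))
```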

#### Rolling Average
To reduce the effect of year-to-year fluctuations, we can use a *rolling average*. Polars provides the *rolling* method to calculate the average over a given window.

In [27]:
rolling_av = (yearly_means
              .rolling(index_column='Date', period='10y')
              .agg(pl.col('Temperature').mean())
             )

rolling_av.plot.line(
    x='Date',
    y='Temperature'
)
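
The following plain-Python sketch shows what a trailing rolling average does, assuming the window ends at the current row and looks backwards (this should correspond to the default `closed='right'` of *rolling*, but treat that as an assumption). With one row per year, a 10-year period then covers the current year and up to nine preceding years. The function `trailing_mean` and the input values are hypothetical, and a window of 3 is used to keep the example short.

```python
def trailing_mean(values, window):
    """Mean over the current value and up to window - 1 preceding values."""
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


yearly = [8.0, 9.0, 10.0, 11.0]  # hypothetical yearly means
smoothed = trailing_mean(yearly, window=3)
print(smoothed)  # prints [8.0, 8.5, 9.0, 10.0]
```

Note that the first entries are averaged over fewer values than the full window, so the start of a smoothed curve is less reliable than the rest.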
              

## Rainfall in Zurich
As an additional exercise, we can analyse whether precipitation (column 'rre150d0') has also changed over the years.

In [39]:
precipitation = (
    df
    .select(
        # Parse strings such as '01.01.1864 00:00' into dates
        pl.col('reference_timestamp')
            .str.to_date(format='%d.%m.%Y %H:%M')
            .alias('Date'),
        # Daily precipitation
        pl.col('rre150d0')
            .alias('Precipitation')
    )
)

precipitation

Date,Precipitation
date,f64
1864-01-01,6.3
1864-01-02,4.5
1864-01-03,0.1
1864-01-04,0.2
1864-01-05,0.0
…,…
2024-12-27,0.0
2024-12-28,0.0
2024-12-29,0.0
2024-12-30,0.0


In [41]:
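prec_roll = (
    precipitation
    # Yearly mean of the daily precipitation values
    .group_by_dynamic('Date', every='1y')
    .agg(pl.col('Precipitation').mean())
    # Rolling mean of the yearly values over a 10-year window
    .rolling(index_column='Date', period='10y')
    .agg(pl.col('Precipitation').mean())
)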

prec_roll.plot.line(
    x='Date',
    y='Precipitation'
)