# Real world data - TAWES

## TAWES weather data example

TAWES weather data for Kapfenberg from 1997 to 2018 is in `data/tawes/Messstationen Tagesdaten v2 Datensatz_19970101_20241001_kapfenberg.csv`.
CSV (Comma Separated Values) files, like many other file formats, can be read with pandas.

In [1]:
import pandas
df = pandas.read_csv('../data/tawes/Messstationen Tagesdaten v2 Datensatz_19970101_20241001_kapfenberg.csv')
df.head()

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax
0,1997-01-01T00:00+00:00,13305,,,,,,,
1,1997-01-02T00:00+00:00,13305,,,,,,,
2,1997-01-03T00:00+00:00,13305,,,,,,,
3,1997-01-04T00:00+00:00,13305,,,,,,,
4,1997-01-05T00:00+00:00,13305,,,,,,,


rr is the total precipitation for the whole day, cglo_j is global irradiance, tl_mittel is average air temperature, tlmin and tlmax are the respective temperature extrema on these days, vv_mittel is average wind speed, and p_mittel is average air pressure.
However, the first rows contain no values. pandas marks such missing entries as NaN, which is short for Not a Number: whenever a cell in an otherwise numeric column is empty, pandas simply fills it with this NaN indicator.
The first row in the CSV file looks like this:
`1997-01-01T00:00+00:00,13305,,,,,`
because the weather station only became operational later that year, and it remained in operation only until 2018.
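
To see how empty CSV fields turn into NaN without needing the real file, here is a minimal sketch. The CSV text is made up for illustration and only mimics the layout of the TAWES file; `toy` and `missing_per_column` are hypothetical names.

```python
import io
import pandas

# Made-up CSV text: the first data row has empty fields, like the days
# before the station went into operation.
csv_text = """time,station,rr,tl_mittel
1997-01-01T00:00+00:00,13305,,
1997-03-01T00:00+00:00,13305,-1.0,2.4
1997-03-02T00:00+00:00,13305,-1.0,6.6
"""

toy = pandas.read_csv(io.StringIO(csv_text))

# isna() marks every missing cell with True; summing counts them per column.
missing_per_column = toy.isna().sum()

print(toy)
print(missing_per_column)
```

Counting the missing cells per column this way is a quick check of which measurements are affected before deciding how to drop rows.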

So how do we get rid of those NaNs?


In [2]:
df.dropna().head()

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax
1826,2002-01-01T00:00+00:00,13305,322.0,-1.0,-4.6,1.5,972.4,-9.4,0.2
1827,2002-01-02T00:00+00:00,13305,566.0,0.0,-3.0,2.1,965.6,-9.8,3.8
1828,2002-01-03T00:00+00:00,13305,684.0,-1.0,-3.7,2.1,971.8,-7.4,0.1
1829,2002-01-04T00:00+00:00,13305,668.0,-1.0,-11.7,1.0,975.2,-16.8,-6.6
1830,2002-01-05T00:00+00:00,13305,235.0,-1.0,-12.5,1.0,975.2,-16.7,-8.2
...,...,...,...,...,...,...,...,...,...
7806,2018-05-17T00:00+00:00,13305,1388.0,-1.0,14.4,1.0,955.6,9.5,19.2
7807,2018-05-18T00:00+00:00,13305,1455.0,-1.0,15.9,1.0,958.1,10.0,21.8
7808,2018-05-19T00:00+00:00,13305,2075.0,-1.0,15.9,2.1,959.6,9.3,22.4
7809,2018-05-20T00:00+00:00,13305,1482.0,11.7,14.8,1.0,962.0,8.3,21.2


This, however, drops every row that contains a NaN in any column. cglo_j was not available until later (2002), so we lose a lot of data. We want to know at least when all temperature values were available.

In [3]:
df.dropna(subset=['tl_mittel', 'tlmax', 'tlmin']).head()

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax
59,1997-03-01T00:00+00:00,13305,,-1.0,2.4,1.0,971.7,-1.9,6.6
60,1997-03-02T00:00+00:00,13305,,-1.0,6.6,1.0,969.7,-0.5,13.6
61,1997-03-03T00:00+00:00,13305,,-1.0,6.2,0.6,971.1,-1.2,13.6
62,1997-03-04T00:00+00:00,13305,,2.5,7.6,1.5,972.8,4.9,10.3
63,1997-03-05T00:00+00:00,13305,,-1.0,6.4,0.6,968.0,3.3,9.4


In [4]:
df.dropna(subset=['tl_mittel', 'tlmax', 'tlmin']).tail()

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax
7807,2018-05-18T00:00+00:00,13305,1455.0,-1.0,15.9,1.0,958.1,10.0,21.8
7808,2018-05-19T00:00+00:00,13305,2075.0,-1.0,15.9,2.1,959.6,9.3,22.4
7809,2018-05-20T00:00+00:00,13305,1482.0,11.7,14.8,1.0,962.0,8.3,21.2
7810,2018-05-21T00:00+00:00,13305,1771.0,1.4,16.1,1.0,958.8,10.5,21.7
7811,2018-05-22T00:00+00:00,13305,1732.0,,16.2,1.5,956.8,9.6,22.7
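
The difference between `dropna()` and `dropna(subset=...)`, and a way to read off the first and last complete row instead of looking them up by eye, can be sketched on a small made-up frame. The values and the names `toy`, `complete`, `first_row` and `last_row` are assumptions for illustration only.

```python
import numpy as np
import pandas

# Made-up data: irradiance (cglo_j) starts later than the temperatures.
toy = pandas.DataFrame({
    'cglo_j':    [np.nan, np.nan, np.nan, 322.0, 566.0, np.nan],
    'tl_mittel': [np.nan, np.nan, 2.4, 6.6, 6.2, np.nan],
    'tlmin':     [np.nan, -1.0, -1.9, -0.5, -1.2, np.nan],
    'tlmax':     [np.nan, np.nan, 6.6, 13.6, 13.6, np.nan],
})

# Drops a row if ANY column is NaN: only rows 3 and 4 survive.
strict = toy.dropna()

# Drops a row only if one of the temperature columns is NaN: rows 2, 3, 4.
temperature_columns = ['tl_mittel', 'tlmin', 'tlmax']
complete = toy.dropna(subset=temperature_columns)

# The index labels of the remaining rows tell us where the data starts and ends.
first_row = complete.index[0]
last_row = complete.index[-1]
print(len(strict), len(complete), first_row, last_row)
```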


So rows 59 through 7810 seem to represent the time during which the weather station recorded temperatures. We can simply slice the DataFrame like we would a Python list; as with lists, the end index (7811) is not included.

In [11]:
df = df[59:7811].copy()

There are more elaborate ways to go about this, such as indexing the DataFrame by time instead of by row number, but more on that later.
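
As a side note on slicing by row: plain `df[a:b]` and `.iloc` count positions and exclude the end, while `.loc` uses the index labels and includes the end. The following sketch uses a made-up frame (`toy`) whose labels no longer start at 0, which is the situation after a slice like the one above.

```python
import pandas

# Made-up frame whose index labels start at 59 instead of 0.
toy = pandas.DataFrame({'tl_mittel': [2.4, 6.6, 6.2, 7.6, 6.4]},
                       index=[59, 60, 61, 62, 63])

plain = toy[1:3]             # positions 1 and 2, end excluded
by_position = toy.iloc[1:3]  # same rows, positional access made explicit
by_label = toy.loc[60:62]    # labels 60, 61 and 62, end included

print(list(by_position.index))
print(list(by_label.index))
```

On a freshly loaded DataFrame positions and labels coincide, so the distinction only starts to matter once rows have been dropped or sliced away.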

In [12]:
df


Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff
118,1997-04-29T00:00+00:00,13305,,0.1,11.1,0.6,951.5,6.7,15.5,8.8
119,1997-04-30T00:00+00:00,13305,,-1.0,9.7,1.5,958.0,4.5,14.8,10.3
120,1997-05-01T00:00+00:00,13305,,-1.0,11.0,1.5,964.8,6.0,16.0,10.0
121,1997-05-02T00:00+00:00,13305,,-1.0,12.6,1.5,966.8,4.9,20.2,15.3
122,1997-05-03T00:00+00:00,13305,,-1.0,14.9,1.5,957.8,3.9,25.9,22.0
...,...,...,...,...,...,...,...,...,...,...
7806,2018-05-17T00:00+00:00,13305,1388.0,-1.0,14.4,1.0,955.6,9.5,19.2,9.7
7807,2018-05-18T00:00+00:00,13305,1455.0,-1.0,15.9,1.0,958.1,10.0,21.8,11.8
7808,2018-05-19T00:00+00:00,13305,2075.0,-1.0,15.9,2.1,959.6,9.3,22.4,13.1
7809,2018-05-20T00:00+00:00,13305,1482.0,11.7,14.8,1.0,962.0,8.3,21.2,12.9


## Accessing columns

We can access a single column, or several columns at once, like we would look up keys in a dictionary:

In [13]:
df['tl_mittel']

118     11.1
119      9.7
120     11.0
121     12.6
122     14.9
        ... 
7806    14.4
7807    15.9
7808    15.9
7809    14.8
7810    16.1
Name: tl_mittel, Length: 7693, dtype: float64

In [14]:
df[['tl_mittel', 'tlmin', 'tlmax']]

Unnamed: 0,tl_mittel,tlmin,tlmax
118,11.1,6.7,15.5
119,9.7,4.5,14.8
120,11.0,6.0,16.0
121,12.6,4.9,20.2
122,14.9,3.9,25.9
...,...,...,...
7806,14.4,9.5,19.2
7807,15.9,10.0,21.8
7808,15.9,9.3,22.4
7809,14.8,8.3,21.2
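
The two forms of column access return different types, which a short sketch on made-up values makes visible (the frame `toy` and the variable names are only for illustration):

```python
import pandas

toy = pandas.DataFrame({
    'tl_mittel': [11.1, 9.7],
    'tlmin': [6.7, 4.5],
    'tlmax': [15.5, 14.8],
})

one_column = toy['tl_mittel']          # a string key gives a Series
one_column_frame = toy[['tl_mittel']]  # a list of keys gives a DataFrame
several = toy[['tlmin', 'tlmax']]      # also a DataFrame, columns in list order

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(list(several.columns))
```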


## Basic maths operations

Let's do some basic operations. Make a new column with the temperature differential tlmax - tlmin.

In [15]:
df['tl_diff'] = df['tlmax'] - df['tlmin']
df.head()

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff
118,1997-04-29T00:00+00:00,13305,,0.1,11.1,0.6,951.5,6.7,15.5,8.8
119,1997-04-30T00:00+00:00,13305,,-1.0,9.7,1.5,958.0,4.5,14.8,10.3
120,1997-05-01T00:00+00:00,13305,,-1.0,11.0,1.5,964.8,6.0,16.0,10.0
121,1997-05-02T00:00+00:00,13305,,-1.0,12.6,1.5,966.8,4.9,20.2,15.3
122,1997-05-03T00:00+00:00,13305,,-1.0,14.9,1.5,957.8,3.9,25.9,22.0


**Exercise**: Create a new column named 'freezing' that contains True if the minimum temperature was below 0 and False otherwise.

In [17]:
df['freezing'] = df['tlmin'] < 0
df

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing
118,1997-04-29T00:00+00:00,13305,,0.1,11.1,0.6,951.5,6.7,15.5,8.8,False
119,1997-04-30T00:00+00:00,13305,,-1.0,9.7,1.5,958.0,4.5,14.8,10.3,False
120,1997-05-01T00:00+00:00,13305,,-1.0,11.0,1.5,964.8,6.0,16.0,10.0,False
121,1997-05-02T00:00+00:00,13305,,-1.0,12.6,1.5,966.8,4.9,20.2,15.3,False
122,1997-05-03T00:00+00:00,13305,,-1.0,14.9,1.5,957.8,3.9,25.9,22.0,False
...,...,...,...,...,...,...,...,...,...,...,...
7806,2018-05-17T00:00+00:00,13305,1388.0,-1.0,14.4,1.0,955.6,9.5,19.2,9.7,False
7807,2018-05-18T00:00+00:00,13305,1455.0,-1.0,15.9,1.0,958.1,10.0,21.8,11.8,False
7808,2018-05-19T00:00+00:00,13305,2075.0,-1.0,15.9,2.1,959.6,9.3,22.4,13.1,False
7809,2018-05-20T00:00+00:00,13305,1482.0,11.7,14.8,1.0,962.0,8.3,21.2,12.9,False


**Exercise**: Instead of True and False, the freezing column should contain 1 and 0 (1 if True, 0 if False).

In [23]:
df['freezing'] = (df['tlmin'] < 0).astype(int)
df

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing
118,1997-04-29T00:00+00:00,13305,,0.1,11.1,0.6,951.5,6.7,15.5,8.8,0
119,1997-04-30T00:00+00:00,13305,,-1.0,9.7,1.5,958.0,4.5,14.8,10.3,0
120,1997-05-01T00:00+00:00,13305,,-1.0,11.0,1.5,964.8,6.0,16.0,10.0,0
121,1997-05-02T00:00+00:00,13305,,-1.0,12.6,1.5,966.8,4.9,20.2,15.3,0
122,1997-05-03T00:00+00:00,13305,,-1.0,14.9,1.5,957.8,3.9,25.9,22.0,0
...,...,...,...,...,...,...,...,...,...,...,...
7806,2018-05-17T00:00+00:00,13305,1388.0,-1.0,14.4,1.0,955.6,9.5,19.2,9.7,0
7807,2018-05-18T00:00+00:00,13305,1455.0,-1.0,15.9,1.0,958.1,10.0,21.8,11.8,0
7808,2018-05-19T00:00+00:00,13305,2075.0,-1.0,15.9,2.1,959.6,9.3,22.4,13.1,0
7809,2018-05-20T00:00+00:00,13305,1482.0,11.7,14.8,1.0,962.0,8.3,21.2,12.9,0
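
One detail worth keeping in mind with such comparisons: a NaN compared with a number evaluates to False, so a day without a tlmin measurement would be counted as "not freezing". The sketch below uses made-up values; keeping the missing day as NaN with `where` is one possible choice, not something the original notebook does.

```python
import numpy as np
import pandas

# Made-up values, the last day has no measurement.
toy = pandas.DataFrame({'tlmin': [-9.4, 3.3, np.nan]})

is_freezing = toy['tlmin'] < 0    # True, False, False (NaN < 0 is False)
as_int = is_freezing.astype(int)  # 1, 0, 0

# Optional: keep the flag only where tlmin was actually measured.
freezing_or_missing = as_int.where(toy['tlmin'].notna())  # 1.0, 0.0, NaN

print(as_int.tolist())
print(freezing_or_missing.tolist())
```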



## Basic stats

Get some basic statistics on the data using describe().

In [24]:
df.describe()

Unnamed: 0,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing
count,7693.0,5959.0,7693.0,7693.0,7693.0,7693.0,7693.0,7693.0,7693.0,7693.0
mean,13305.0,1128.234435,1.684752,9.938711,1.081698,957.614754,4.854959,14.985168,10.130209,0.278565
std,0.0,733.033308,6.081223,8.210142,0.508782,7.230782,7.390349,9.525686,4.667688,0.448322
min,13305.0,47.0,-1.0,-14.2,0.0,919.8,-20.0,-9.3,1.0,0.0
25%,13305.0,486.0,-1.0,3.2,0.6,953.4,-0.6,6.7,6.5,0.0
50%,13305.0,998.0,-1.0,10.5,1.0,957.9,5.1,15.8,9.7,0.0
75%,13305.0,1690.0,1.2,16.8,1.5,962.2,11.1,22.7,13.5,1.0
max,13305.0,3085.0,77.4,27.9,5.9,980.4,21.7,37.2,24.6,1.0


## Conditional slicing, finding and counting occurrences

When was the coldest day in Kapfenberg? We can see above that tlmin had a lowest value of -20, but when?
We can use conditions that evaluate to true or false (like the freezing one above) as indexers.

In [25]:
df['tlmin'] == -20.0

118     False
119     False
120     False
121     False
122     False
        ...  
7806    False
7807    False
7808    False
7809    False
7810    False
Name: tlmin, Length: 7693, dtype: bool

In [26]:
df[df['tlmin'] == -20.0]


Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing
2202,2003-01-12T00:00+00:00,13305,691.0,-1.0,-14.2,1.0,970.8,-20.0,-8.4,11.6,1
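
Comparing against a value copied from the `describe()` output works here, but it relies on an exact floating point match. A sketch of two alternatives that avoid typing the number, on made-up rows (`toy` and the variable names are illustrative):

```python
import pandas

# Made-up rows around a cold spell.
toy = pandas.DataFrame(
    {'time': ['2003-01-11', '2003-01-12', '2003-01-13'],
     'tlmin': [-16.8, -20.0, -16.7]},
    index=[2201, 2202, 2203],
)

# idxmin() returns the index label of the smallest value.
coldest_label = toy['tlmin'].idxmin()
coldest_row = toy.loc[coldest_label]

# Comparing against min() keeps all rows in case of ties.
also_coldest = toy[toy['tlmin'] == toy['tlmin'].min()]

print(coldest_label, coldest_row['time'])
```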


We can use this to select whole ranges of data where some condition applies. E.g. select all data where it was freezing.

In [27]:
df['freezing'] = (df['tlmin'] < 0).astype(int)
df[df['freezing'] == 1]

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing
286,1997-10-14T00:00+00:00,13305,,0.6,4.9,1.5,953.5,-0.5,10.2,10.7,1
297,1997-10-25T00:00+00:00,13305,,-1.0,2.4,2.1,962.2,-3.5,8.3,11.8,1
299,1997-10-27T00:00+00:00,13305,,1.8,-0.4,1.0,965.4,-3.9,3.1,7.0,1
300,1997-10-28T00:00+00:00,13305,,-1.0,-1.8,2.6,972.8,-6.5,3.0,9.5,1
301,1997-10-29T00:00+00:00,13305,,-1.0,-2.7,1.0,976.7,-8.8,3.5,12.3,1
...,...,...,...,...,...,...,...,...,...,...,...
7751,2018-03-23T00:00+00:00,13305,1129.0,0.0,3.4,0.6,945.8,-1.0,7.8,8.8,1
7752,2018-03-24T00:00+00:00,13305,953.0,-1.0,4.4,1.0,945.6,-0.3,9.1,9.4,1
7753,2018-03-25T00:00+00:00,13305,1215.0,-1.0,2.9,1.5,949.8,-1.3,7.0,8.3,1
7756,2018-03-28T00:00+00:00,13305,1033.0,0.2,3.8,1.0,950.0,-2.3,9.9,12.2,1
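
Several conditions can be combined into one indexer with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than the comparisons. The data in this sketch is made up.

```python
import pandas

toy = pandas.DataFrame({
    'tlmin': [-0.5, -3.5, 4.9, -6.5],
    'rr':    [0.6, -1.0, 2.5, 1.8],
})

# Freezing AND a positive precipitation value.
freezing_with_rain = toy[(toy['tlmin'] < 0) & (toy['rr'] > 0)]

# Not freezing OR a positive precipitation value.
mild_or_wet = toy[(toy['tlmin'] >= 0) | (toy['rr'] > 0)]

# Negation of a condition.
not_freezing = toy[~(toy['tlmin'] < 0)]

print(list(freezing_with_rain.index))
print(list(mild_or_wet.index))
print(list(not_freezing.index))
```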


We can use value_counts to see what value occurs how often in the dataframe.

In [30]:
df['freezing'].value_counts()

freezing
0    5550
1    2143
Name: count, dtype: int64

**Exercise**: On how many days was the temperature below freezing all day (look at tlmax), and what percentage of all days does this represent?

In [34]:
df['allDayFreezing'] = (df['tlmax'] < 0).astype(int)
counts = df['allDayFreezing'].value_counts()
counts

allDayFreezing
0    7325
1     368
Name: count, dtype: int64

In [36]:
counts[1] / (counts[0] + counts[1]) *100

4.783569478746913
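
The same percentage can be obtained more directly, since the mean of a 0/1 column is the share of ones. A sketch on made-up values, with `value_counts(normalize=True)` as a second route:

```python
import pandas

# Made-up daily maxima: two of five days stay below 0.
toy = pandas.DataFrame({'tlmax': [-8.4, 3.6, -0.1, 15.5, 21.2]})
toy['allDayFreezing'] = (toy['tlmax'] < 0).astype(int)

# Mean of a 0/1 column = fraction of rows that are 1.
share_percent = toy['allDayFreezing'].mean() * 100

# normalize=True returns fractions instead of absolute counts.
shares = toy['allDayFreezing'].value_counts(normalize=True)

print(share_percent)
print(shares)
```

Unlike `counts[1]`, the mean also works when a value does not occur at all, in which case the lookup by label would fail.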

## Time indexed DataFrames

In [37]:
df

Unnamed: 0,time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing,allDayFreezing
118,1997-04-29T00:00+00:00,13305,,0.1,11.1,0.6,951.5,6.7,15.5,8.8,0,0
119,1997-04-30T00:00+00:00,13305,,-1.0,9.7,1.5,958.0,4.5,14.8,10.3,0,0
120,1997-05-01T00:00+00:00,13305,,-1.0,11.0,1.5,964.8,6.0,16.0,10.0,0,0
121,1997-05-02T00:00+00:00,13305,,-1.0,12.6,1.5,966.8,4.9,20.2,15.3,0,0
122,1997-05-03T00:00+00:00,13305,,-1.0,14.9,1.5,957.8,3.9,25.9,22.0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...
7806,2018-05-17T00:00+00:00,13305,1388.0,-1.0,14.4,1.0,955.6,9.5,19.2,9.7,0,0
7807,2018-05-18T00:00+00:00,13305,1455.0,-1.0,15.9,1.0,958.1,10.0,21.8,11.8,0,0
7808,2018-05-19T00:00+00:00,13305,2075.0,-1.0,15.9,2.1,959.6,9.3,22.4,13.1,0,0
7809,2018-05-20T00:00+00:00,13305,1482.0,11.7,14.8,1.0,962.0,8.3,21.2,12.9,0,0


Pandas supports datetime indices. To use one, we convert the time column to timestamps and set it as the index:

In [40]:
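# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps
df['time'] = pandas.to_datetime(df['time'], utc=True)
# Use the timestamps as the row index; the 'time' column itself is dropped
df = df.set_index('time', drop=True)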

In [41]:
df.head()


time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing,allDayFreezing
1997-04-29 00:00:00+00:00,13305,,0.1,11.1,0.6,951.5,6.7,15.5,8.8,0,0
1997-04-30 00:00:00+00:00,13305,,-1.0,9.7,1.5,958.0,4.5,14.8,10.3,0,0
1997-05-01 00:00:00+00:00,13305,,-1.0,11.0,1.5,964.8,6.0,16.0,10.0,0,0
1997-05-02 00:00:00+00:00,13305,,-1.0,12.6,1.5,966.8,4.9,20.2,15.3,0,0
1997-05-03 00:00:00+00:00,13305,,-1.0,14.9,1.5,957.8,3.9,25.9,22.0,0,0


Now we can index rows based on times.

In [42]:
df['2011-03-01 00:00': '2011-03-02 00:00']

time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing,allDayFreezing
2011-03-01 00:00:00+00:00,13305,194.0,-1.0,-0.4,1.5,969.1,-4.3,3.6,7.9,1,0
2011-03-02 00:00:00+00:00,13305,501.0,-1.0,-2.6,1.5,969.5,-7.2,2.0,9.2,1,0
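
With a datetime index, the strings do not have to be full timestamps. A sketch on a made-up daily series shows selecting a whole month with a partial string and a range with `.loc`; note that both ends of a time range are included.

```python
import pandas

# Made-up daily minima from 2011-02-27 to 2011-03-04.
index = pandas.date_range('2011-02-27', periods=6, freq='D', tz='UTC')
toy = pandas.DataFrame({'tlmin': [-3.0, -5.1, -4.3, -7.2, -6.0, -2.5]},
                       index=index)

# A partial date string selects everything that falls into that month.
march = toy.loc['2011-03']

# A range of dates; unlike positional slices, the end is included.
first_days = toy.loc['2011-03-01':'2011-03-02']

print(len(march))
print(first_days['tlmin'].tolist())
```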


We can now resample the DataFrame to a different time resolution. Resampling to a lower frequency is called downsampling; resampling to a higher frequency is called upsampling.

When resampling, one has to specify a frequency and a method.

For example, yearly averages:


In [51]:
df.resample('1YS').mean() 

time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing,allDayFreezing
1997-01-01 00:00:00+00:00,13305.0,,2.032389,12.071255,1.327126,957.073279,7.094737,17.012551,9.917814,0.129555,0.016194
1998-01-01 00:00:00+00:00,13305.0,,1.438904,9.493699,1.167945,957.485205,4.482466,14.470137,9.987671,0.30137,0.043836
1999-01-01 00:00:00+00:00,13305.0,,1.624384,9.472603,1.09589,956.593973,4.757534,14.15589,9.398356,0.30411,0.052055
2000-01-01 00:00:00+00:00,13305.0,,1.778415,10.279508,1.066667,956.969945,5.255464,15.263661,10.008197,0.23224,0.051913
2001-01-01 00:00:00+00:00,13305.0,,0.851507,9.569589,1.067123,956.835342,4.418904,14.687945,10.269041,0.29863,0.060274
2002-01-01 00:00:00+00:00,13305.0,1167.607735,1.838082,10.171781,1.174795,957.592877,5.231233,15.073151,9.841918,0.249315,0.052055
2003-01-01 00:00:00+00:00,13305.0,1215.260989,1.054795,9.716986,1.172877,959.304658,4.058082,15.339178,11.281096,0.345205,0.043836
2004-01-01 00:00:00+00:00,13305.0,1083.517808,1.627869,9.137705,1.126503,958.069399,4.332514,13.911202,9.578689,0.327869,0.038251
2005-01-01 00:00:00+00:00,13305.0,1179.101928,2.273425,8.987123,1.205753,959.006575,4.06,13.883562,9.823562,0.347945,0.093151
2006-01-01 00:00:00+00:00,13305.0,1191.417582,1.35589,9.509041,1.218904,959.675068,4.452877,14.530411,10.077534,0.29589,0.054795
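
The mean is not a sensible method for every column: precipitation and the freezing flag are more naturally summed. One way to choose a method per column is `agg` with a dictionary. The sketch below uses four made-up days spanning a year boundary.

```python
import pandas

# Made-up days: two at the end of 2010, two at the start of 2011.
index = pandas.date_range('2010-12-30', periods=4, freq='D', tz='UTC')
toy = pandas.DataFrame({
    'tl_mittel': [-2.0, -4.0, 1.0, 3.0],
    'rr':        [0.5, 1.5, 0.0, 2.0],
    'freezing':  [1, 1, 1, 0],
}, index=index)

# 'YS' labels each group with the start of its year.
yearly = toy.resample('YS').agg({
    'tl_mittel': 'mean',  # average temperature
    'rr': 'sum',          # total precipitation
    'freezing': 'sum',    # number of freezing days
})

print(yearly)
```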


We can also resample to a higher frequency than the original data.

For example, upsampling to hourly frequency using linear interpolation:

In [52]:
small_df = df['2011-03-01 00:00': '2011-03-07 00:00'].copy() 
small_df.resample('1h').interpolate()


time,station,cglo_j,rr,tl_mittel,vv_mittel,p_mittel,tlmin,tlmax,tl_diff,freezing,allDayFreezing
2011-03-01 00:00:00+00:00,13305.0,194.000000,-1.0,-0.400000,1.500000,969.100000,-4.300000,3.600000,7.900000,1.0,0.0
2011-03-01 01:00:00+00:00,13305.0,206.791667,-1.0,-0.491667,1.500000,969.116667,-4.420833,3.533333,7.954167,1.0,0.0
2011-03-01 02:00:00+00:00,13305.0,219.583333,-1.0,-0.583333,1.500000,969.133333,-4.541667,3.466667,8.008333,1.0,0.0
2011-03-01 03:00:00+00:00,13305.0,232.375000,-1.0,-0.675000,1.500000,969.150000,-4.662500,3.400000,8.062500,1.0,0.0
2011-03-01 04:00:00+00:00,13305.0,245.166667,-1.0,-0.766667,1.500000,969.166667,-4.783333,3.333333,8.116667,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...
2011-03-06 20:00:00+00:00,13305.0,1449.500000,-1.0,0.450000,1.416667,970.633333,-4.083333,4.900000,8.983333,1.0,0.0
2011-03-06 21:00:00+00:00,13305.0,1456.375000,-1.0,0.387500,1.437500,971.125000,-4.112500,4.800000,8.912500,1.0,0.0
2011-03-06 22:00:00+00:00,13305.0,1463.250000,-1.0,0.325000,1.458333,971.616667,-4.141667,4.700000,8.841667,1.0,0.0
2011-03-06 23:00:00+00:00,13305.0,1470.125000,-1.0,0.262500,1.479167,972.108333,-4.170833,4.600000,8.770833,1.0,0.0
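
Linear interpolation is only one way to fill the new rows. For values that describe a whole day, repeating the last known value with `ffill` may be the more honest choice. The sketch compares both on two made-up days, upsampled to 6-hour steps to keep the output short.

```python
import pandas

# Two made-up daily values.
index = pandas.date_range('2011-03-01', periods=2, freq='D', tz='UTC')
toy = pandas.DataFrame({'tl_mittel': [-0.4, -2.6]}, index=index)

# Straight line between the two known values.
interpolated = toy.resample('6h').interpolate()

# Repeat the last known value until the next one arrives.
held = toy.resample('6h').ffill()

print(interpolated['tl_mittel'].tolist())
print(held['tl_mittel'].tolist())
```

Either way, upsampling adds no new information: the intermediate values are estimates, not measurements.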


**Exercise**: Resample the 'freezing' column to yearly frequency, computing not the mean (as in the examples above) but the sum within each year.

In [55]:
df['freezing'].resample('1YS').sum() 

time
1997-01-01 00:00:00+00:00     32
1998-01-01 00:00:00+00:00    110
1999-01-01 00:00:00+00:00    111
2000-01-01 00:00:00+00:00     85
2001-01-01 00:00:00+00:00    109
2002-01-01 00:00:00+00:00     91
2003-01-01 00:00:00+00:00    126
2004-01-01 00:00:00+00:00    120
2005-01-01 00:00:00+00:00    127
2006-01-01 00:00:00+00:00    108
2007-01-01 00:00:00+00:00     87
2008-01-01 00:00:00+00:00     89
2009-01-01 00:00:00+00:00     93
2010-01-01 00:00:00+00:00    115
2011-01-01 00:00:00+00:00    119
2012-01-01 00:00:00+00:00     94
2013-01-01 00:00:00+00:00     97
2014-01-01 00:00:00+00:00     66
2015-01-01 00:00:00+00:00    103
2016-01-01 00:00:00+00:00     91
2017-01-01 00:00:00+00:00    106
2018-01-01 00:00:00+00:00     64
Freq: YS-JAN, Name: freezing, dtype: int64