# Getting started

## Preprocess daily weather observations

First, we need to access some data.

We will use a [CSV
file](https://data.climpact.gr/dataset/1a1f82e5-94da-4dd0-8b54-04016fc9574e/resource/3efa8645-6dde-42db-b71a-f5f39af3fd53/download/hcd_noa.csv)
with daily historical climatic data from the [Thissio
station](https://data.climpact.gr/en/dataset/1ce5d2ce-23df-412e-849e-ef2493319da9/resource/3efa8645-6dde-42db-b71a-f5f39af3fd53)
(Athens, Greece) of the National Observatory of Athens ([Founda,
2011](https://doi.org/10.1080/17512549.2011.582338), [Founda et al.,
2013](https://doi.org/10.1002/asl2.419)).
The dataset is public under [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).

After downloading our file, we should inspect its data and preprocess it so that
it can be used with hotspell.

In [1]:
import pandas as pd

input_file = "/my_path/hcd_noa.csv"  # Replace with your path

df = pd.read_csv(input_file)
print(df.head())

   YEAR  MONTH  DAY  Tmax (oC)  Tmin (oC)  RH (%)  Rain (mm)
0  1901      1    1       14.3        5.9    67.0        0.2
1  1901      1    2       15.0        8.9    80.0        5.8
2  1901      1    3       10.3        4.8    76.0        5.0
3  1901      1    4        7.1        4.8    78.0        3.2
4  1901      1    5       10.4        5.2    83.0        6.6


As we can see, the dataset includes 7 columns. The first 3 columns correspond to
the year, month and day of the observations, columns 4 and 5 to the daily
maximum (Tmax) and daily minimum (Tmin) air temperature (in °C), column 6 to
relative humidity (RH, expressed as percent) and the last column to
precipitation (Rain, in mm).

We need to drop the humidity and rain columns and reorder the temperature
columns so that Tmin comes before Tmax:

In [2]:
df = df[["YEAR", "MONTH", "DAY", "Tmin (oC)", "Tmax (oC)"]]
print(df.head())

   YEAR  MONTH  DAY  Tmin (oC)  Tmax (oC)
0  1901      1    1        5.9       14.3
1  1901      1    2        8.9       15.0
2  1901      1    3        4.8       10.3
3  1901      1    4        4.8        7.1
4  1901      1    5        5.2       10.4
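
Before going further, it can be worth running an optional sanity check on the dates and temperatures. The sketch below is not part of hotspell; it uses a small made-up sample with the same column layout (one day is deliberately left out) and only standard pandas calls to list missing calendar days and rows where Tmin exceeds Tmax.

```python
import pandas as pd

# Made-up sample with the same layout as the tutorial data (not real observations).
sample = pd.DataFrame(
    {
        "YEAR": [1901, 1901, 1901, 1901],
        "MONTH": [1, 1, 1, 1],
        "DAY": [1, 2, 4, 5],  # 3 January is deliberately missing
        "Tmin (oC)": [5.9, 8.9, 4.8, 5.2],
        "Tmax (oC)": [14.3, 15.0, 7.1, 10.4],
    }
)

# pd.to_datetime can assemble dates from columns named year, month and day.
dates = pd.to_datetime(sample[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower))

# Calendar days absent from the record.
expected = pd.date_range(dates.min(), dates.max(), freq="D")
missing_days = expected.difference(pd.DatetimeIndex(dates))

# Rows where the daily minimum is larger than the daily maximum.
suspect_rows = sample[sample["Tmin (oC)"] > sample["Tmax (oC)"]]

print(missing_days)
print(len(suspect_rows))
```

The same two checks can be applied to `df` once the sample is replaced with the real data.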


Suppose that the station metadata state that missing (nodata) values are
present in the time series and are coded as -9999.

We should remove these measurements as follows:

In [3]:
nodata_value = -9999
df = df.loc[
    (df["Tmin (oC)"] != nodata_value) & (df["Tmax (oC)"] != nodata_value)
]
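
An equivalent way to express this step, shown here only as a sketch on a made-up sample, is to convert the nodata code to NaN first and then drop the incomplete rows. This can be handy when more than two columns need the same treatment. The names `sample`, `temp_cols` and `cleaned` are illustrative.

```python
import numpy as np
import pandas as pd

nodata_value = -9999

# Made-up sample: day 2 lacks Tmin and day 3 lacks Tmax.
sample = pd.DataFrame(
    {
        "YEAR": [1901, 1901, 1901],
        "MONTH": [1, 1, 1],
        "DAY": [1, 2, 3],
        "Tmin (oC)": [5.9, nodata_value, 4.8],
        "Tmax (oC)": [14.3, 15.0, nodata_value],
    }
)

temp_cols = ["Tmin (oC)", "Tmax (oC)"]
cleaned = sample.copy()

# Mark the nodata code as NaN, then drop rows with a missing temperature.
cleaned[temp_cols] = cleaned[temp_cols].replace(nodata_value, np.nan)
cleaned = cleaned.dropna(subset=temp_cols)

print(cleaned)
```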

We are now ready to write the output file. When saving, we **must drop** the
header and the index of the DataFrame.

In [4]:
output_file = "/my_path/hcd_noa_processed.csv"  # Replace with your path

df.to_csv(output_file, header=False, index=False)
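
To confirm that a file written this way contains no header or index, it can be read back with `header=None`. The sketch below does this with a made-up two-row sample in a temporary directory, so it does not touch the tutorial files.

```python
import os
import tempfile

import pandas as pd

# Made-up sample with the same column order as the processed data.
sample = pd.DataFrame(
    {
        "YEAR": [1901, 1901],
        "MONTH": [1, 1],
        "DAY": [1, 2],
        "Tmin (oC)": [5.9, 8.9],
        "Tmax (oC)": [14.3, 15.0],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_processed.csv")
    sample.to_csv(path, header=False, index=False)

    # header=None tells pandas that the first line is data, not column names.
    check = pd.read_csv(path, header=None)

# Five columns and as many rows as the sample: no header line, no index column.
print(check.shape)
print(check.iloc[0].tolist())
```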

We have finished preprocessing our data and are ready to use hotspell.

First, we must initialize the heat wave index we want to use.
For this example we will use the index CTX90PCT ([Perkins and Alexander,
2013](https://doi.org/10.1175/JCLI-D-12-00383.1)).


In [5]:
import hotspell

index_name = "ctx90pct"
hw_index = hotspell.index(name=index_name)

This index uses as its threshold the calendar-day 90th percentile of the daily
maximum temperature, computed over a 15-day window. That is, each day of the
year has its own percentile value, with the window centered on the day in
question. A heat wave occurs when the threshold is exceeded for at least
3 consecutive days.
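
To make the idea of a calendar-day percentile concrete, here is a minimal sketch with NumPy and pandas. It is not hotspell's implementation: the temperatures are synthetic, only non-leap years are used, and the window is assumed to wrap around the turn of the year.

```python
import numpy as np
import pandas as pd

# Synthetic daily Tmax for three non-leap years: a seasonal cycle plus noise.
rng = np.random.default_rng(0)
dates = pd.date_range("1961-01-01", "1963-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
tmax = 20 + 12 * np.sin(2 * np.pi * (doy - 110) / 365) + rng.normal(0, 2, len(dates))

half_window = 7  # 15-day window: the day itself plus 7 days on each side
n_days = 365

threshold = np.empty(n_days)
for day in range(1, n_days + 1):
    # Circular distance in days, so the window wraps around 31 December.
    dist = np.abs(doy - day)
    dist = np.minimum(dist, n_days - dist)
    # Pool every value inside the window, across all years, and take the
    # 90th percentile of the pooled sample.
    threshold[day - 1] = np.percentile(tmax[dist <= half_window], 90)

print(threshold[:5].round(1))
```

Each calendar day ends up with its own threshold, built from 15 days times the number of years in the sample.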

The complete list of available heat wave indices can be found in the hotspell
documentation.

The attributes of predefined indices are set automatically.

Using the index we selected above and the data we preprocessed earlier, we can
find the heat wave events in the period covered by the data.

We will use the default arguments for the parameters of get_heatwaves.
That is, the base period used to calculate the percentile values begins in 1961 and ends in 1990.
Heat waves will be detected only for the months June, July and August.
The annual heat wave metrics will also be computed, and the results will be saved in CSV files.
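
As a rough illustration of what a 1961-1990 base period and a June-August season mean in terms of the data, the following pandas sketch selects those subsets from a made-up calendar. It only mirrors the idea and does not use hotspell or its parameter names.

```python
import pandas as pd

# Made-up daily calendar covering a few years around the base period.
dates = pd.date_range("1959-01-01", "1992-12-31", freq="D")
sample = pd.DataFrame({"YEAR": dates.year, "MONTH": dates.month, "DAY": dates.day})

# Days that would feed the percentile calculation (both bounds inclusive).
base_period = sample[sample["YEAR"].between(1961, 1990)]

# Days in which heat waves would be searched for.
summer = sample[sample["MONTH"].isin([6, 7, 8])]

print(base_period["YEAR"].min(), base_period["YEAR"].max())
print(sorted(summer["MONTH"].unique()))
```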

In [6]:
input_file = "/my_path/hcd_noa_processed.csv"  # Replace with your path

hw = hotspell.get_heatwaves(filename=input_file, hw_index=hw_index)

hw.events is a DataFrame that contains the dates of the detected heat wave events, as well as their basic characteristics (duration and temperature statistics).

In [7]:
print(hw.events.head())

           begin_date   end_date  duration  avg_tmax  std_tmax  max_tmax
index                                                                   
1901-08-01 1901-08-01 1901-08-03         3      37.9       0.8      38.8
1902-07-22 1902-07-22 1902-07-24         3      38.2       1.8      40.3
1903-08-14 1903-08-14 1903-08-16         3      36.0       0.2      36.2
1904-08-09 1904-08-09 1904-08-12         4      36.5       0.3      36.8
1905-08-26 1905-08-26 1905-08-31         6      36.5       0.9      38.0
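
A table of this kind can be thought of as the result of grouping consecutive days above the threshold into runs and keeping the runs that are long enough. The sketch below shows one common pandas idiom for that. The temperatures and the constant threshold are invented for illustration, and the code is not taken from hotspell.

```python
import pandas as pd

# Invented daily Tmax values and a hypothetical constant threshold.
dates = pd.date_range("1901-07-28", periods=10, freq="D")
tmax = pd.Series(
    [35.0, 36.5, 34.0, 33.9, 37.1, 38.8, 37.8, 35.2, 36.9, 36.1], index=dates
)
threshold = 36.0
min_duration = 3

hot = tmax > threshold
# A new run starts whenever the hot / not-hot flag changes.
run_id = (hot != hot.shift()).cumsum()

rows = []
for _, run in tmax[hot].groupby(run_id[hot]):
    # Keep only runs of hot days that last at least min_duration days.
    if len(run) >= min_duration:
        rows.append(
            {
                "begin_date": run.index[0],
                "end_date": run.index[-1],
                "duration": len(run),
                "avg_tmax": round(run.mean(), 1),
                "max_tmax": run.max(),
            }
        )
events = pd.DataFrame(rows)

print(events)
```

Only the three-day run is kept; the isolated hot day and the two-day run are shorter than `min_duration` and are discarded.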


In [8]:
print(hw.events.describe())

         duration    avg_tmax    std_tmax    max_tmax
count  198.000000  198.000000  198.000000  198.000000
mean     4.717172   36.745455    1.121212   38.145455
std      2.172932    1.538327    0.637774    2.020794
min      3.000000   32.500000    0.100000   32.800000
25%      3.000000   36.000000    0.700000   36.900000
50%      4.000000   36.950000    1.000000   38.000000
75%      6.000000   37.600000    1.400000   39.200000
max     13.000000   41.000000    3.600000   44.800000


hw.metrics is a DataFrame with the annual heat wave properties:

- hwn: number of events
- hwf: number of days
- hwd: duration of longest event
- hwdm: mean duration of events
- hwm: mean normalized magnitude
- hwma: mean absolute magnitude
- hwa: normalized magnitude of hottest day
- hwaa: absolute magnitude of hottest day

For a more detailed description of heat wave properties, see the documentation.
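
The four count and duration metrics in the list above can be derived from an events table with a single pandas aggregation. The sketch below assumes that an event is assigned to the year of its first day and uses a small invented events table; the magnitude metrics are left out because their exact definitions are given in the documentation.

```python
import pandas as pd

# Invented events table with the two columns needed for the aggregation.
events = pd.DataFrame(
    {
        "begin_date": pd.to_datetime(["1901-08-01", "1902-07-22", "1902-08-10"]),
        "duration": [3, 3, 5],
    }
)

# Group events by the year of their first day (an assumption of this sketch).
annual = events.groupby(events["begin_date"].dt.year)["duration"].agg(
    hwn="count",  # number of events
    hwf="sum",    # number of heat wave days
    hwd="max",    # duration of the longest event
    hwdm="mean",  # mean duration of events
)
annual.index.name = "year"

print(annual)
```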

In [9]:
print(hw.metrics.head())

      hwn  hwf  hwd  hwdm  hwm  hwma  hwa  hwaa
year                                           
1901    1    3  3.0   3.0  7.2  38.8  7.2  38.8
1902    1    3  3.0   3.0  8.7  40.3  8.7  40.3
1903    1    3  3.0   3.0  4.6  36.2  4.6  36.2
1904    1    4  4.0   4.0  5.2  36.8  5.2  36.8
1905    1    6  6.0   6.0  6.4  38.0  6.4  38.0


In [10]:
print(hw.metrics.describe())

              hwn         hwf        hwd       hwdm        hwm       hwma  \
count  121.000000  121.000000  85.000000  85.000000  85.000000  85.000000   
mean     1.702479    7.933884   5.541176   4.383529   6.401176  38.001176   
std      1.710968    9.443459   2.657214   1.264803   1.569007   1.569007   
min      0.000000    0.000000   3.000000   3.000000   2.400000  34.000000   
25%      0.000000    0.000000   4.000000   3.400000   5.600000  37.200000   
50%      1.000000    5.000000   5.000000   4.000000   6.400000  38.000000   
75%      2.000000   12.000000   7.000000   5.000000   7.300000  38.900000   
max      8.000000   54.000000  13.000000   7.500000  10.300000  41.900000   

             hwa       hwaa  
count  85.000000  85.000000  
mean    7.364706  38.964706  
std     2.102703   2.102703  
min     2.700000  34.300000  
25%     6.200000  37.800000  
50%     7.200000  38.800000  
75%     8.700000  40.300000  
max    11.400000  43.000000  
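
Note that the count is 121 for hwn and hwf but 85 for the remaining columns. Together with a minimum hwn of 0, this suggests that years without any heat wave keep a count of zero while their duration and magnitude metrics are left empty (NaN), which `describe()` ignores. The sketch below reproduces this behavior on a tiny invented table; it is an interpretation of the output above, not a statement about hotspell internals.

```python
import numpy as np
import pandas as pd

# Invented metrics table: 1902 has no heat wave, so its duration is undefined.
metrics = pd.DataFrame(
    {"hwn": [1, 0, 2], "hwf": [3, 0, 8], "hwd": [3.0, np.nan, 5.0]},
    index=pd.Index([1901, 1902, 1903], name="year"),
)

# count() skips NaN, so hwd has fewer entries than hwn and hwf.
counts = metrics.count()
years_without_events = int((metrics["hwn"] == 0).sum())

print(counts)
print(years_without_events)
```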


Let's repeat the procedure with a custom, more extreme index that aims to capture at least 4 consecutive days with a maximum temperature above 40 degrees Celsius. See the documentation for a detailed description of the index characteristics that can be set.

In [11]:
hw_extreme_index = hotspell.index(name="extreme", var="tmax", fixed_thres=40, min_duration=4)
input_file = "/my_path/hcd_noa_processed.csv"  # Replace with your path
hw_extreme = hotspell.get_heatwaves(filename=input_file, hw_index=hw_extreme_index)

In [12]:
print(hw_extreme.events)

           begin_date   end_date  duration  avg_tmax  std_tmax  max_tmax
index                                                                   
1987-07-21 1987-07-21 1987-07-27         7      41.6       0.7      42.8
2007-07-22 2007-07-22 2007-07-25         4      41.4       0.5      41.9
