![title](birds-nest-4-eggs.png)

###### image:https://www.publicdomainpictures.net/nl/view-image.php?image=61631&picture=vogels-nest-4-eieren

# The effect of March-July precipitation on bird breeding in the Netherlands, 1990-2020, measured as change compared to the previous year

Sources:

Dutch breeding birds per season as a percentage of the index year (all years and all bird species selected manually):
https://opendata.cbs.nl/statline/#/CBS/nl/dataset/84498NED/table?ts=1673294982549

Monthly precipitation sums in units of 0.1 mm (Royal Netherlands Meteorological Institute, KNMI):
###### De Kooy: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_235_rh24.txt
###### De Bilt: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_260_rh24.txt
###### Leeuwarden: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_270_rh24.txt
###### Eelde: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_280_rh24.txt
###### Twenthe: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_290_rh24.txt
###### Schiphol: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_240_rh24.txt
###### Rotterdam: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_344_rh24.txt
###### Vlissingen: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_310_rh24.txt
###### Eindhoven: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_370_rh24.txt
###### Maastricht/Beek: https://cdn.knmi.nl/knmi/map/page/klimatologie/gegevens/maandgegevens/mndgeg_380_rh24.txt

###### *Coordinates of the stations: http://climexp.knmi.nl/KNMIData/list_dx.txt
###### *GeoJSON of the provinces: https://www.webuildinternet.com/articles/2015-07-19-geojson-data-of-the-netherlands/provinces.geojson






In [None]:
#Importing all necessary libraries
import pandas as pd
import functions_final_assignment as fn
import numpy as np

In [None]:
#Loading the file paths from a YAML config file
config = fn.yaml_config()
#Loading the data into dataframes
precipitation_df = fn.load_concat_df(config["precipitation"])
birds_df = pd.read_excel(io=config["breedingbirds"],sheet_name="Provinciale trends 1990-2020",skiprows=2)
#Loading the province polygons from the GeoJSON file
geo_df = fn.read_geojson("DATA/provinces.geojson")
#Getting the middle points of the provinces
geo_df["middle_point"] = [fn.get_centerpoint(data) for data in geo_df["geometry.coordinates"]]

geo_df = geo_df[["properties.name","middle_point"]]

# Every middle point is calculated correctly except the Y coordinate of Noord-Brabant,
# probably because the province geometry contains enclaves, so it needs to be overwritten.

In [None]:
geo_df.head(30)

In [None]:
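# Take the geographic midpoint of Noord-Brabant from Google Maps (the spot is marked by a monument).
# Row 6 is Noord-Brabant; .at sets the single cell directly, avoiding chained assignment
# (geo_df["middle_point"].loc[6] = ...), which may not write back to geo_df.
geo_df.at[6, "middle_point"] = [51.562212646388495, 5.185266108595458]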
geo_df.head(7)
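As an alternative to hard-coding the Noord-Brabant point, an area-weighted centroid can be computed with the shoelace formula while keeping only the largest outer ring of a GeoJSON MultiPolygon, so that small enclaves or islands cannot pull the midpoint away. This is a hypothetical replacement for `fn.get_centerpoint`, whose implementation is not shown here. It assumes MultiPolygon coordinates and treats degrees as planar coordinates, which is a rough approximation at the scale of a province.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a polygon ring [[x, y], ...] via the shoelace formula.

    Returns (cx, cy, area). Works for clockwise and counter-clockwise rings.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    # Pair every vertex with the next one, wrapping around to the first vertex
    for (x0, y0), (x1, y1) in zip(ring, list(ring[1:]) + list(ring[:1])):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2), abs(area2) / 2


def multipolygon_centroid(coords):
    """Centroid of the largest exterior ring in GeoJSON MultiPolygon coordinates."""
    exterior_rings = [polygon[0] for polygon in coords]  # polygon[0] is the outer ring
    cx, cy, _ = max((ring_centroid(r) for r in exterior_rings), key=lambda c: c[2])
    return cx, cy


# Hypothetical example: a large square plus a tiny detached "enclave"
example = [
    [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
    [[[10, 10], [10.1, 10], [10.1, 10.1], [10, 10.1], [10, 10]]],
]
print(multipolygon_centroid(example))
```

For a plain Polygon geometry, `ring_centroid(coords[0])` gives the centroid of its outer ring directly.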

In [None]:
precipitation_df.head(10)


The precipitation data starts well before 1990. Since only the years 1990 to 2020 are needed, all other years are dropped.
After that, the number of missing values is counted.
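A minimal sketch of these steps on made-up data (the real frame comes from `fn.load_concat_df`). It shows that `Series.between` includes both end points by default, how missing values can be counted per column, and why a plain `astype(int)` needs a gap-free frame: NumPy integer dtypes cannot hold NaN, while pandas' nullable `"Int64"` dtype can.

```python
import numpy as np
import pandas as pd

# Made-up rows in the layout of the KNMI monthly files (values in 0.1 mm)
toy = pd.DataFrame({
    "STN": [260, 260, 260, 260],
    "YYYY": [1989, 1990, 2020, 2021],
    "MAR": [500.0, 620.0, np.nan, 480.0],
})

# between() is inclusive on both ends by default, so 1990 and 2020 are kept
subset = toy[toy["YYYY"].between(1990, 2020)]
print(subset["YYYY"].tolist())  # [1990, 2020]

# Number of missing values per column
missing = subset.isnull().sum()
print(missing)

# astype(int) raises a ValueError when NaN is present
try:
    subset.astype(int)
    int_ok = True
except ValueError:
    int_ok = False

# The nullable Int64 dtype keeps the gap as <NA> instead of failing
safe = subset.astype("Int64")
print(int_ok, safe["MAR"].tolist())
```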

In [None]:
#select only the rows with the values in the YYYY column between 1990 and 2020
precipitation_df = precipitation_df[precipitation_df.YYYY.between(1990,2020)]
#Show the unique values for YYYY to see if the YYYY filtering is done correctly
print(f'{precipitation_df.YYYY.unique()}')


In [None]:
#Count the missing values per column
print(f'The number of missing values per column:\n {precipitation_df.isnull().sum()}')


In [None]:
#Convert all values to integers (this only works if no missing values were reported above)

precipitation_df = precipitation_df.astype(int)

In [None]:
#Adding the location at which the station is found to the dataframe
stn_dict = fn.make_stn_dict(config["stn_coord"])
precipitation_df["COORD"] = [stn_dict[str(s)] for s in precipitation_df.STN]
precipitation_df.head()

In [None]:
#As we want the sum of the precipitation from March through July, a column is added.
#Positional columns 4-8 are MAR-JUL, assuming the KNMI column order STN, YYYY, JAN, FEB, MAR, ...
precipitation_df["MAR-JUL"] = precipitation_df.iloc[:,4:9].sum(axis=1)
precipitation_df = precipitation_df[["STN","YYYY","COORD","MAR-JUL"]]
precipitation_df.head()
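A small sketch of the same March-July sum on made-up data, selecting the months by name instead of by position. It assumes the month columns are labelled `JAN` to `DEC` as in the KNMI files; if `fn.load_concat_df` names them differently, the list needs adjusting. Dividing by 10 converts the 0.1 mm units to millimetres.

```python
import pandas as pd

# Made-up rows in KNMI monthly layout; precipitation in units of 0.1 mm
toy = pd.DataFrame({
    "STN": [260, 280],
    "YYYY": [1990, 1990],
    "JAN": [700, 650], "FEB": [400, 420],
    "MAR": [500, 450], "APR": [300, 350], "MAY": [600, 550],
    "JUN": [700, 650], "JUL": [800, 750],
    "AUG": [900, 850],
})

months = ["MAR", "APR", "MAY", "JUN", "JUL"]
# Selecting by name does not depend on column order; sum(axis=1) sums across each row
toy["MAR-JUL"] = toy[months].sum(axis=1)
# Convert from 0.1 mm to mm for readability
toy["MAR-JUL_mm"] = toy["MAR-JUL"] / 10
print(toy[["STN", "YYYY", "MAR-JUL", "MAR-JUL_mm"]])
```

With this column order, the positional `iloc[:, 4:9]` selection picks exactly the same five months.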

In [None]:
#One row per station with its coordinates (.copy() avoids a SettingWithCopyWarning)
stn_df = precipitation_df[["STN","COORD"]].drop_duplicates(subset=["STN"]).copy()

#Distance between each weather station and the middle point of each province,
#as calculated by fn.calc_point_dist (a sum over the coordinate differences)
for province, middle_point in zip(geo_df["properties.name"], geo_df["middle_point"]):
    stn_df[f'dist_{province}'] = [fn.calc_point_dist(middle_point, x) for x in stn_df["COORD"]]

stn_df.head(12)

Most provinces have good coverage; however, the province of Noord-Brabant is too far from all of the stations to estimate the average precipitation there.
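`fn.calc_point_dist` is not shown here. If it works on raw degrees, note that at about 52°N a degree of longitude is only roughly 0.6 times as long as a degree of latitude, so degree-based distances are distorted. A swapped-in alternative is the haversine great-circle distance in kilometres, sketched below. The station positions are approximate and only for illustration, and the sketch assumes points in (lat, lon) order; GeoJSON stores (lon, lat), so the order of the computed midpoints is worth checking before comparing them with station coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate station positions (lat, lon), for illustration only
stations = {
    "Eindhoven": (51.45, 5.38),
    "De Bilt": (52.10, 5.18),
    "Maastricht/Beek": (50.91, 5.77),
}
# The Noord-Brabant point from the notebook, read as (lat, lon)
brabant = (51.562212646388495, 5.185266108595458)

# Distance to every station, then the nearest one
distances = {name: haversine_km(brabant, pos) for name, pos in stations.items()}
nearest = min(distances, key=distances.get)
print(nearest, round(distances[nearest], 1))
```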


In [None]:
#Plot the March-July precipitation to see whether it is normally distributed,
#and to see its mean and standard deviation.
fn.hist_robust_dist(precipitation_df["MAR-JUL"])

The distribution is somewhat right-skewed: weather extremes are not uncommon, which gives the data a longer right tail.
I tried fitting a gamma distribution to account for the longer tail, but it proved difficult to tune.
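For reference, a minimal sketch of a gamma fit with `scipy.stats` on a synthetic right-skewed sample (not the KNMI data). Fixing the location at zero with `floc=0` leaves only shape and scale to estimate, which usually makes the fit much more stable than a free three-parameter fit; whether that resolves the difficulties met here is an assumption. The 16th and 84th percentiles of the fitted gamma are shown because they roughly correspond to mu - sigma and mu + sigma for a normal distribution.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample standing in for the MAR-JUL sums (0.1 mm)
rng = np.random.default_rng(42)
sample = rng.gamma(shape=18.0, scale=160.0, size=300)

# Maximum-likelihood fit with the location fixed at 0
shape, loc, scale = stats.gamma.fit(sample, floc=0)
print(f"shape={shape:.1f}, scale={scale:.1f}, skew={stats.skew(sample):.2f}")

# Percentile-based cut-offs from the fitted gamma, as an alternative to mu +/- sigma
lower, upper = stats.gamma.ppf([0.16, 0.84], shape, loc=loc, scale=scale)
print(f"dry below {lower:.0f}, wet above {upper:.0f}")
```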

In [None]:
#New column "CONDITION": values above mu + sigma are "wet", values below mu - sigma are "dry",
#and the rest (the middle) are "normal".
#mu and sigma (in 0.1 mm) are the mean and standard deviation read from the distribution plot above.
mu = 2865
sigma = 663.27
lower_lim = mu - sigma
upper_lim = mu + sigma

precipitation_df["CONDITION"] = [fn.validate_precipitation(i,lower_lim,upper_lim) for i in precipitation_df["MAR-JUL"]]
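`fn.validate_precipitation` is not shown in this notebook. A vectorised sketch of the classification described in the comment, using `numpy.select`, could look like this; counting values that lie exactly on a limit as "normal" is an assumption and may differ from the original helper.

```python
import numpy as np
import pandas as pd

mu, sigma = 2865, 663.27  # values used in the notebook, in 0.1 mm
lower_lim, upper_lim = mu - sigma, mu + sigma

# Made-up MAR-JUL sums (0.1 mm)
mar_jul = pd.Series([1900, 2300, 2865, 3500, 3600])

# Assumption: values exactly on a limit count as "normal"
condition = np.select(
    [mar_jul < lower_lim, mar_jul > upper_lim],
    ["dry", "wet"],
    default="normal",
)
print(condition.tolist())  # ['dry', 'normal', 'normal', 'normal', 'wet']
```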



In [None]:
#Print how often each condition occurs. Each row is one station in one year,
#so these are counts of station-years rather than calendar years.

dry_count = len(precipitation_df[precipitation_df["CONDITION"]=="dry"])
wet_count = len(precipitation_df[precipitation_df["CONDITION"]=="wet"])
normal_count = len(precipitation_df[precipitation_df["CONDITION"]=="normal"])
print(f'The number of dry station-years: {dry_count}\n'
      f'The number of wet station-years: {wet_count}\n'
      f'The number of normal station-years: {normal_count}')
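Because each row of `precipitation_df` is one station in one year, these counts are station-years. A sketch on made-up rows showing `value_counts` as a one-call alternative to the three filtered `len()` calls, and how the number of distinct calendar years per condition could be counted instead:

```python
import pandas as pd

# Made-up station-year rows: each row is one station in one year
df = pd.DataFrame({
    "STN": [260, 260, 280, 280, 290],
    "YYYY": [1990, 1991, 1990, 1991, 1990],
    "CONDITION": ["dry", "normal", "dry", "wet", "normal"],
})

# Station-years per condition; reindex keeps absent categories at 0
counts = df["CONDITION"].value_counts().reindex(["dry", "normal", "wet"], fill_value=0)
print(counts)

# Distinct calendar years in which each condition occurs at least once
years_per_condition = df.groupby("CONDITION")["YYYY"].nunique()
print(years_per_condition)
```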

In [None]:
#Now prepare the data in the birds dataframe:

birds_df.head()

The first three and the last four columns contain descriptive information that is not needed.

In [None]:
#drop the first three and the last four columns
birds_df = birds_df.iloc[:,3:-4]

birds_df.head()

In [None]:
#Transpose so that the years become rows; pct_change in the next cell then compares each year with the previous one

birds_df = birds_df.T
birds_df.head()



In [None]:
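#Replace the data with the percentage change compared to the previous year. The first year in the selection has
#no previous year, so pct_change gives NaN there (not 0). np.inf (a rise from a value of 0) is converted to zero.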
birds_df.iloc[2:,2:] = birds_df.iloc[2:,2:].pct_change().replace({np.inf:0})
birds_df.head(10)
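A small sketch of what `pct_change` followed by `replace({np.inf: 0})` does, on made-up index values with years as rows. Each value becomes (x_t / x_(t-1)) - 1; the first year has no predecessor and gives NaN, a rise from 0 gives inf (set to 0 here), and 0 followed by 0 gives NaN. The inputs contain no NaN, so the `fill_method` setting of `pct_change` makes no difference here.

```python
import numpy as np
import pandas as pd

# Made-up index values: rows are years, columns are two hypothetical series
idx = pd.DataFrame(
    {"series_a": [100.0, 110.0, 99.0], "series_b": [0.0, 50.0, 50.0]},
    index=[1990, 1991, 1992],
)

# Year-over-year relative change down the rows; inf (change from 0) becomes 0
change = idx.pct_change().replace({np.inf: 0})
print(change)
```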

