# Yearly Observations

The total number of bird observations per year rises over time. Waarnemingen.be receives more and more data, and more and more people log their sightings on waarnemingen.be. More observations of a bird species over time therefore does not necessarily mean that the species has become more common; it mainly means that more sightings are being logged.

To compensate for this yearly increase in observations, we also evaluate the share of our species relative to the total number of bird observations in that year.
As the observation value we use the number of observations of a given species per 1 000 000 observations of all birds in that year. We mark this with the suffix _pym (per yearly million).

For this we use the observations of the species under study and the yearly total of bird observations.

We assume that the shares of the bird species relative to one another stay constant as long as the populations stay constant. If the share of a particular species rises, we assume that birds of that species have actually become more numerous.
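The normalisation described above can be sketched in a few lines of plain Python. The function name `observations_per_yearly_million` is hypothetical and only serves as an illustration; the example values (3 halsbandparkiet observations out of 2242 bird observations in 1971) are taken from the result table further down in this notebook.

```python
def observations_per_yearly_million(species_count: int, allbirds_count: int) -> float:
    """Scale a species' yearly count to a share per 1 000 000 bird observations."""
    if allbirds_count <= 0:
        raise ValueError("allbirds_count must be positive")
    return species_count * 1_000_000 / allbirds_count


# 1971: 3 halsbandparkiet observations out of 2242 bird observations in total
pym_1971 = observations_per_yearly_million(3, 2242)
print(round(pym_1971, 2))  # 1338.09
```

Because the value is a ratio, a year with ten times more logged sightings but the same species share yields the same _pym value.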

In [41]:
import pandas as pd

# show all columns when displaying a DataFrame
pd.set_option('display.max_columns', None)
# show all rows when displaying a DataFrame
pd.set_option('display.max_rows', None)

## Load clean or gold data

In [42]:
yearly = '../2_cleaning/clean_data/observations_yearly_clean.parquet'
boomklever = '../3_transformation/gold/observations_bk.parquet'
halsbandparkiet = '../3_transformation/gold/observations_hp.parquet'

df_yearly_birds = pd.read_parquet(yearly, engine="pyarrow")
df_boomklever = pd.read_parquet(boomklever, engine="pyarrow")
df_halsbandparkiet = pd.read_parquet(halsbandparkiet, engine="pyarrow")

## Load and transform clean data

In [43]:
first_year = df_yearly_birds.index.min()
last_year = df_yearly_birds.index.max()

print(f'Yearly observations from: {first_year} to {last_year}')

# Year with min observation count
min_observations = df_yearly_birds[(df_yearly_birds['allbirds_observation_count'] == df_yearly_birds['allbirds_observation_count'].min())]
min_observation_count = min_observations['allbirds_observation_count'].values[0]
year_min_observation_count = min_observations.index[0]

print(f'Min observation count: {min_observation_count} in {year_min_observation_count}')

# Year with max observation count
max_observations = df_yearly_birds[(df_yearly_birds['allbirds_observation_count'] == df_yearly_birds['allbirds_observation_count'].max())]
max_observation_count = max_observations['allbirds_observation_count'].values[0]
year_max_observation_count = max_observations.index[0]

print(f'Max observation count: {max_observation_count} in {year_max_observation_count}')

Yearly observations from: 1971 to 2024
Min observation count: 2242 in 1971
Max observation count: 3807834 in 2021
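As an aside, the same lookup can be written more compactly with `Series.idxmin` and `Series.idxmax`, which return the index label of the first minimum or maximum. The sketch below is an alternative, not the code used in this notebook; `sample` is a small stand-in for `df_yearly_birds` that contains only three yearly totals copied from the outputs in this notebook.

```python
import pandas as pd

# Stand-in for df_yearly_birds: three yearly totals copied from this notebook's output
sample = pd.DataFrame(
    {"allbirds_observation_count": [2242, 5281, 3807834]},
    index=pd.Index([1971, 1972, 2021], name="year"),
)

counts = sample["allbirds_observation_count"]
# idxmin / idxmax return the index label (here: the year) of the extreme value
year_min, year_max = counts.idxmin(), counts.idxmax()

print(f"Min observation count: {counts.loc[year_min]} in {year_min}")
print(f"Max observation count: {counts.loc[year_max]} in {year_max}")
```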


In [44]:

# TODO: this could be done with reusable code
# Halsbandparkiet: group observations by year and merge with the yearly totals
df_halsbandparkiet["year"] = df_halsbandparkiet["date"].dt.year
df_halsbandparkiet_yearly = df_halsbandparkiet.reset_index().groupby("year").agg({'observation_id': 'nunique', 'observer_id': 'nunique'}).rename(columns={'observation_id': 'observation_count', 'observer_id': 'observers_count'})
df_halsbandparkiet_yearly = df_yearly_birds.merge(df_halsbandparkiet_yearly, on='year', how='left')
# Share per yearly million bird observations
df_halsbandparkiet_yearly['observations_pym'] = df_halsbandparkiet_yearly['observation_count'] * 1_000_000 / df_halsbandparkiet_yearly['allbirds_observation_count']
# 5-year rolling average (years without observations are still NaN here and are skipped)
df_halsbandparkiet_yearly['observations_pym_5yr_avg'] = df_halsbandparkiet_yearly['observations_pym'].rolling(window=5, min_periods=1).mean()
# % growth of the 5-year average relative to 5 years earlier
df_halsbandparkiet_yearly['observations_growth_5yr_%'] = df_halsbandparkiet_yearly['observations_pym_5yr_avg'].pct_change(periods=5) * 100

# Boomklever: group observations by year and merge with the yearly totals
df_boomklever["year"] = df_boomklever["date"].dt.year
df_boomklever_yearly = df_boomklever.reset_index().groupby("year").agg({'observation_id': 'nunique', 'observer_id': 'nunique'}).rename(columns={'observation_id': 'observation_count', 'observer_id': 'observers_count'})
df_boomklever_yearly = df_yearly_birds.merge(df_boomklever_yearly, on='year', how='left')
# Share per yearly million bird observations
df_boomklever_yearly['observations_pym'] = df_boomklever_yearly['observation_count'] * 1_000_000 / df_boomklever_yearly['allbirds_observation_count']
# 5-year rolling average (years without observations are still NaN here and are skipped)
df_boomklever_yearly['observations_pym_5yr_avg'] = df_boomklever_yearly['observations_pym'].rolling(window=5, min_periods=1).mean()
# % growth of the 5-year average relative to 5 years earlier
df_boomklever_yearly['observations_growth_5yr_%'] = df_boomklever_yearly['observations_pym_5yr_avg'].pct_change(periods=5) * 100

# Merge the two per-species DataFrames
df_observations_yearly = pd.merge(df_halsbandparkiet_yearly, df_boomklever_yearly, on=['year', 'allbirds_observation_count'], how='outer', suffixes=("_hp", "_bk"))

# No observations -> 0 instead of NaN
df_observations_yearly.fillna(0, inplace=True)
df_observations_yearly.sort_index(ascending=True).head(10)



| year | allbirds_observation_count | observation_count_hp | observers_count_hp | observations_pym_hp | observations_pym_5yr_avg_hp | observations_growth_5yr_%_hp | observation_count_bk | observers_count_bk | observations_pym_bk | observations_pym_5yr_avg_bk | observations_growth_5yr_%_bk |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1971 | 2242 | 3 | 3 | 1338.09099 | 1338.09099 | 0.0 | 1.0 | 1.0 | 446.03033 | 446.03033 | 0.0 |
| 1972 | 5281 | 9 | 6 | 1704.222685 | 1521.156838 | 0.0 | 1.0 | 1.0 | 189.358076 | 317.694203 | 0.0 |
| 1973 | 6547 | 3 | 2 | 458.225141 | 1166.846272 | 0.0 | 0.0 | 0.0 | 0.0 | 317.694203 | 0.0 |
| 1974 | 9548 | 9 | 7 | 942.605781 | 1110.786149 | 0.0 | 0.0 | 0.0 | 0.0 | 317.694203 | 0.0 |
| 1975 | 9115 | 33 | 8 | 3620.405924 | 1612.710104 | 0.0 | 3.0 | 2.0 | 329.127811 | 321.505406 | 0.0 |
| 1976 | 7035 | 9 | 7 | 1279.317697 | 1600.955446 | 19.644737 | 0.0 | 0.0 | 0.0 | 259.242944 | -41.877732 |
| 1977 | 7394 | 3 | 2 | 405.734379 | 1341.257785 | -11.826463 | 1.0 | 1.0 | 135.244793 | 232.186302 | -26.915159 |
| 1978 | 11301 | 16 | 9 | 1415.803911 | 1532.773539 | 31.360366 | 1.0 | 1.0 | 88.487744 | 184.286783 | -41.9924 |
| 1979 | 15202 | 10 | 8 | 657.808183 | 1475.814019 | 32.862119 | 5.0 | 2.0 | 328.904092 | 220.44111 | -30.612171 |
| 1980 | 23498 | 38 | 17 | 1617.158907 | 1075.164616 | -33.331811 | 4.0 | 3.0 | 170.227253 | 180.715971 | -43.79069 |
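The TODO in the cell above notes that the two per-species blocks could share code. A minimal sketch of such a helper follows, assuming each species frame has a datetime `date` column plus `observation_id` and `observer_id` (as index or as columns), and that the yearly frame is indexed by `year`. The function name `yearly_species_stats` and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd


def yearly_species_stats(df_species: pd.DataFrame, df_yearly: pd.DataFrame) -> pd.DataFrame:
    """Per-year counts for one species, normalised against the yearly total of all birds."""
    per_year = (
        df_species.reset_index()
        .assign(year=lambda d: d["date"].dt.year)
        .groupby("year")
        .agg(
            observation_count=("observation_id", "nunique"),
            observers_count=("observer_id", "nunique"),
        )
    )
    out = df_yearly.merge(per_year, on="year", how="left")
    # Share per yearly million bird observations
    out["observations_pym"] = out["observation_count"] * 1_000_000 / out["allbirds_observation_count"]
    # 5-year rolling average; NaN years are skipped because min_periods=1
    out["observations_pym_5yr_avg"] = out["observations_pym"].rolling(window=5, min_periods=1).mean()
    # % growth of the 5-year average relative to 5 years earlier
    out["observations_growth_5yr_%"] = out["observations_pym_5yr_avg"].pct_change(periods=5) * 100
    return out


# Toy data (invented for illustration, not taken from waarnemingen.be)
df_yearly = pd.DataFrame(
    {"allbirds_observation_count": [1000, 2000, 4000]},
    index=pd.Index([2001, 2002, 2003], name="year"),
)
df_species = pd.DataFrame(
    {
        "observation_id": [1, 2, 3, 4],
        "observer_id": [10, 10, 11, 12],
        "date": pd.to_datetime(["2001-05-01", "2002-03-10", "2002-06-21", "2003-01-15"]),
    }
).set_index("observation_id")

result = yearly_species_stats(df_species, df_yearly)
print(result)
```

With a helper like this, each species block would reduce to a single call such as `yearly_species_stats(df_boomklever, df_yearly_birds)`, and unlike the cell above it does not add a `year` column to the input frame.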


## Write result to parquet-file in "gold" folder

In [46]:
df_observations_yearly.to_parquet('./gold/yearly_observations.parquet', engine="pyarrow")