In [1]:
pip install meteostat folium

Collecting meteostat
  Downloading meteostat-1.6.8-py3-none-any.whl.metadata (4.6 kB)
Downloading meteostat-1.6.8-py3-none-any.whl (31 kB)
Installing collected packages: meteostat
Successfully installed meteostat-1.6.8


In [2]:
from datetime import datetime,timedelta,date
import matplotlib.pyplot as plt
from meteostat import Daily, Point, Normals, Monthly, Stations
import numpy
from geopy.distance import geodesic
import folium
import pandas as pd
import math

Just a little code snippet to show how to extract the data from the closest 100 stations araound Salzburg, exclude those stations without information and add the distance to Salzburg to all the observations as additional variable. This can be used as a start to build a weather forcasting model.

In [3]:
#Get the 100 closest stations to Salzburg, drop those without data and add the distance to Salzburg as additional parameter.
#This can be used for forcasting methods (as likely stations further away have less influence on the waether in Salzburg)
coord=(47.792,13.0477)
day=datetime(2024,5,20)

stations = Stations()
stations = stations.nearby(coord[0],coord[1])
stationSbg2 = stations.fetch(100)

test=stationSbg2.dropna()
indexes=test.index

for i in indexes:
  data=Daily(str(i),day-timedelta(10000),day)
  data=data.fetch()
  stat=stationSbg2.loc[str(i)]
  locatio=(stat['latitude'],stat['longitude'])
  distance=geodesic(coord,locatio).km
  #print(distance)
  data=data.assign(distance=distance)
  #print(i)
  print(data)

            tavg  tmin  tmax  prcp  snow   wdir  wspd  wpgt    pres  tsun  \
time                                                                        
1997-01-02 -10.6 -15.3  -6.9   0.0  51.0    NaN   3.0   NaN  1022.2   NaN   
1997-01-03  -6.9  -9.4  -2.4   0.0  51.0    NaN   NaN   NaN     NaN   NaN   
1997-01-04  -4.7 -10.8  -2.1   1.0  51.0    NaN   NaN   NaN     NaN   NaN   
1997-01-05  -1.3  -2.9   0.3   0.0  79.0    NaN   4.2   NaN  1010.6   NaN   
1997-01-06  -1.6  -1.8  -0.3   0.0  61.0  314.0   5.8   NaN  1017.5   NaN   
...          ...   ...   ...   ...   ...    ...   ...   ...     ...   ...   
2024-05-16  19.2  16.7  22.3   0.0   NaN  117.0  17.7  47.0  1003.8   NaN   
2024-05-17  13.7  11.2  16.9   3.1   NaN    2.0  11.7  38.9  1007.2   NaN   
2024-05-18  16.0  10.1  21.9   0.9   NaN   29.0  11.1  31.3  1011.2   NaN   
2024-05-19  16.7  11.2  22.0   1.2   NaN    6.0  11.7  55.4  1011.4   NaN   
2024-05-20  18.2  11.8  23.9   3.0   NaN   40.0  13.1  37.1  1009.6   NaN   

We want to analyse climate change on a local scale, so that people can see how climate change affected different areas. Therefore, we need a function that shows for any given reference period and spatial input data by how much the past year differed from the reference period in that area. Therefore, we hand over a grid that consists of real-world coordinates and the start of a reference period. Then, for every point in the grid we look for the closest station that has climate normals data starting in the given year. This climate normals data is our reference of how warm it used to be (on average) in that location. Then we compare the temperature in the past year with the temperature in the reference period.

So we get a data frame that stores for any geographic location by how much the average temperature in the past year differed from the normal temperature in the reference period.

Climate normals take into account a period of 30 years, which is the period suggested by the World Meteorological Organization. Thus it makes sense to stick to this definition, not only for convenience as we can use the function built into meteostat, but also for scientific consistency.

This data could then be used for further analysis or to create some kind of weather map, visualising climate change on a local scale.

In [4]:
#First, we finde the closest station to given coordinates for that climate normals are available for the given time period.
#x,y are latitude and longitude coordinates (e.g 8.753), year is the start of the reference period (e.g. 1961)
def nearest_station(x,y,year):
  stations = Stations()
  stations = stations.nearby(x,y)
  testi = stations.fetch() #all the stations ordered by how close they are to the given coordiantes
  for i in testi.index:
      if math.isnan(Normals(i,year,year+29).fetch()['tavg'].mean())==False: #check if there is a mean temperature of the average temperature in the climate normals. This serves as a not-yet perfect but still okayish condition for the data to be available
        a=i #if this station has climate normals for the given period, store it
        break #we found the closest station satisfying the condition, end search
  return a #id of the station is the output

#Then, we check by how far the temperature changed at a given location  compared to a reference period.
def temp_change(x,y,tstart):
  tend=tstart+29 #reference period has a length of 30 years
  data3=Normals(nearest_station(x,y,tstart),tstart,tend).fetch() #get the climate normals from the closest station to the location
  #we compare this with the past year, starting one month before today. On the one hand, this gives us very recent data, on the other hand it increaes the possibility that the data is already available.
  startdate=datetime.combine(date.today(), datetime.min.time())-timedelta(394) #bring the times in the right format for the functions to work
  data4=Monthly(nearest_station(x,y,tstart),startdate,datetime.combine(date.today(), datetime.min.time())-timedelta(30)).fetch() #Get the data from the past year
  data5=data4['tavg'].mean()-data3['tavg'].mean() #difference of mean tmeperature from reference period and the past year. 'tavg' averages over the days, mean() averages over the months
  return data5

#We combine this together. grid has to be some data frame, containing geographic coordinates, tstart is the starting year of the reference period
def gridapply(grid,tstart):
  grid['station_id']=0 #initialise the id of the closest station
  grid['tempchange']=-999 #initialise the temperature change. Value chosen such that algorithmic errors not producing tempchanges are easily found
  for i in grid.index: #for every location in the grid
    grid['station_id'].iloc[i]=nearest_station(grid['longitude'].iloc[i],grid['latitude'].iloc[i],tstart) #find the nearest station having the desired climate normals
    grid['tempchange'].iloc[i]=temp_change(grid['longitude'].iloc[i],grid['latitude'].iloc[i],tstart) #calculate the temperature change of interest
  return grid




Now we want to test our function. Therefore, we create a grid that is spanned over the world in steps of 20 degree (both longitude and latitude). We then check for every point how the temperature has changed compared to 1961-1990.

Some of the points share the closest station and thus the temperature changes. This is due to the distribution of the stations (especially on the oceans, ehere are none to almost none). So we must take the results with a grain of salt in those cases. However, wherever the nearest station is close to the location, we get useful results.

In [5]:
long=numpy.arange(-180,180,20)
long=numpy.repeat(long,len(long))
lat=numpy.arange(-180,180,20)
lat=numpy.tile(lat,len(lat)) #By using tile and repeat we ensure that the grid long x lat contains all possible combinations and thus is really the desired grid
grid=pd.DataFrame()
grid['latitude']=lat
grid['longitude']=long
gridapply(grid,1961)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Unnamed: 0,latitude,longitude,station_id,tempchange
0,-180,-180,65380,0.200000
1,-160,-180,64458,0.750000
2,-140,-180,63740,1.000000
3,-120,-180,63980,0.812821
4,-100,-180,43497,0.150000
...,...,...,...,...
319,80,160,76675,2.700000
320,100,160,78384,2.535256
321,120,160,78897,0.137179
322,140,160,81405,-0.642308
