# Kazakhstan Climate Change Research Project

The main goal of this project is to analyze how the climate in Kazakhstan is changing over time and to explore regional climate trends. My hypothesis is that Kazakhstan is warming faster than many other regions.

### Project Plan

1. Load historical weather data using Meteostat.
2. Select meteorological stations covering different regions of Kazakhstan.
3. Compare Kazakhstan’s climate data with Europe, the USA, and Japan.

#### Load weather information 

In [1]:
import pandas as pd
from meteostat import Stations

In [14]:

stations = Stations()
stations = stations.region('KZ')

'Stations in KZ:', stations.count()

('Stations in KZ:', 129)

In [15]:
meteostations = stations.fetch()

In [16]:
meteostations.head()

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
28676,Petropavlovsk / Bishkul',KZ,SEV,28676,,54.7925,69.1178,91.0,Asia/Almaty,NaT,NaT,1890-11-19,2025-08-22,1890-01-01,2021-01-01
28679,Petropavlovsk,KZ,SEV,28679,UACP,54.8333,69.15,136.0,Asia/Qyzylorda,2005-01-01,2025-12-09,2020-01-18,2022-04-01,1994-01-01,2012-01-01
28687,Bulayevo / B?laevo,KZ,SEV,28687,,54.9,70.45,134.0,Asia/Qyzylorda,1959-01-01,2021-01-02,1955-01-01,2020-12-31,1955-01-01,2020-01-01
28766,Blacoveschenka,KZ,SEV,28766,,54.3667,66.9667,153.0,Asia/Qyzylorda,1948-01-01,2025-04-02,1936-10-01,2021-03-18,1936-01-01,2021-01-01
28867,Uricky,KZ,KUS,28867,,53.3167,65.55,210.0,Asia/Aqtobe,1953-02-16,2025-04-02,1937-01-01,2021-06-25,1937-01-01,2021-01-01


In [6]:
meteostations.info()

<class 'pandas.core.frame.DataFrame'>
Index: 129 entries, 28676 to UAUU0
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   name           129 non-null    object        
 1   country        129 non-null    string        
 2   region         129 non-null    string        
 3   wmo            120 non-null    string        
 4   icao           19 non-null     string        
 5   latitude       129 non-null    float64       
 6   longitude      129 non-null    float64       
 7   elevation      129 non-null    float64       
 8   timezone       129 non-null    string        
 9   hourly_start   104 non-null    datetime64[ns]
 10  hourly_end     104 non-null    datetime64[ns]
 11  daily_start    128 non-null    datetime64[ns]
 12  daily_end      128 non-null    datetime64[ns]
 13  monthly_start  126 non-null    datetime64[ns]
 14  monthly_end    126 non-null    datetime64[ns]
dtypes: datetime64[ns](6), 

### Selected stations

For better coverage and data continuity, I selected a set of stations using their WMO IDs:

| name          | country | region | wmo   | latitude | longitude | elevation | timezone       | hourly_start | hourly_end |
|---------------|---------|--------|-------|----------|-----------|-----------|----------------|--------------|------------|
| Shymkent      | KZ      | YUZ    | 38328 | 42.3167  | 69.7      | 552       | Asia/Qyzylorda | 1948-01-01   | 2025-11-22 |
| Semipalatinsk | KZ      | VOS    | 36177 | 50.4167  | 80.3      | 196       | Asia/Almaty    | 1932-01-01   | 2025-11-22 |
| Aktjubinsk    | KZ      | AKT    | 35229 | 50.2833  | 57.15     | 227       | Asia/Aqtobe    | 1932-01-01   | 2025-11-22 |
| Panfilov      | KZ      | ALM    | 36859 | 44.1667  | 80.0667   | 640       | Asia/Almaty    | 1946-01-14   | 2025-04-02 |
| Balhash       | KZ      | KAR    | 35796 | 46.8     | 75.0833   | 352       | Asia/Almaty    | 1948-01-01   | 2025-04-02 |
| Atbasar       | KZ      | AKM    | 35078 | 51.8167  | 68.3667   | 308       | Asia/Qyzylorda | 1948-01-01   | 2025-04-02 |
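
One way to shortlist stations with continuous records is to filter the station table on its coverage columns. The sketch below is only an illustration: the helper name `stations_covering` and the small demo table are assumptions and not part of the notebook. It relies only on the `hourly_start` and `hourly_end` columns that appear in the station listing.

```python
import pandas as pd


def stations_covering(stations: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Keep stations whose hourly record spans the whole [start, end] window."""
    start_ts, end_ts = pd.Timestamp(start), pd.Timestamp(end)
    # Comparisons against NaT evaluate to False, so stations without
    # hourly data are dropped automatically.
    mask = (stations["hourly_start"] <= start_ts) & (stations["hourly_end"] >= end_ts)
    return stations[mask]


# Hand-made demo table in the same shape as the station listing (hypothetical rows).
demo = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "hourly_start": pd.to_datetime(["1948-01-01", "2005-01-01", None]),
        "hourly_end": pd.to_datetime(["2025-11-22", "2025-12-09", None]),
    },
    index=pd.Index(["00001", "00002", "00003"], name="id"),
)

covered = stations_covering(demo, "1950-01-01", "2025-01-01")
print(covered["name"].tolist())
```

Only station `A` survives here: `B` starts too late and `C` has no hourly record at all.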


In [9]:
wmo_lists = ['38328','36177','35229','36859','35796','35078']

In [12]:
from meteostat import Hourly
from datetime import datetime

In [13]:
all_data = []
start = datetime(1950, 1, 1)
end = datetime(2025, 1, 1)
for station_id in wmo_lists:
    print(f"Loading: {station_id}")
    data = Hourly(station_id, start, end).fetch()
    data['station'] = station_id
    all_data.append(data)


df_weather = pd.concat(all_data)

Loading: 38328
Loading: 36177
Loading: 35229
Loading: 36859
Loading: 35796
Loading: 35078
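
The loading loop can be wrapped in a small helper so that stations returning no rows are skipped instead of being concatenated as empty frames. This is a hedged sketch: `collect_station_data` and `fake_fetch` are hypothetical names, and the fetch step is passed in as a callable so the example runs without network access. In the notebook it could be something like `lambda sid: Hourly(sid, start, end).fetch()`.

```python
from typing import Callable, Iterable

import pandas as pd


def collect_station_data(
    station_ids: Iterable[str],
    fetch: Callable[[str], pd.DataFrame],
) -> pd.DataFrame:
    """Fetch each station, tag rows with the station ID, and stack the results."""
    frames = []
    for station_id in station_ids:
        data = fetch(station_id)
        if data.empty:
            # Assumption: a station without data yields an empty frame.
            print(f"No data for {station_id}, skipping")
            continue
        frames.append(data.assign(station=station_id))
    return pd.concat(frames) if frames else pd.DataFrame()


def fake_fetch(station_id: str) -> pd.DataFrame:
    """Stand-in for a real download; station '00000' pretends to have no data."""
    if station_id == "00000":
        return pd.DataFrame({"temp": []})
    times = pd.to_datetime(["1950-01-01 00:00", "1950-01-01 06:00"])
    return pd.DataFrame({"temp": [-2.2, -3.9]}, index=pd.DatetimeIndex(times, name="time"))


combined = collect_station_data(["38328", "00000", "36177"], fake_fetch)
print(combined)
```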




In [14]:
df_weather.head()

time,temp,dwpt,rhum,prcp,snow,wdir,wspd,wpgt,pres,tsun,coco,station
1950-01-01 00:00:00,-2.2,-2.8,96.0,,,250.0,42.5,,1022.6,,,38328
1950-01-01 06:00:00,-2.2,,,,,220.0,25.9,,1027.0,,,38328
1950-01-01 12:00:00,-2.2,-3.9,88.0,,,0.0,0.0,,1026.0,,,38328
1950-01-01 18:00:00,-3.9,-6.0,85.0,,,40.0,3.6,,1023.9,,,38328
1950-01-02 00:00:00,-3.9,-11.2,57.0,,,90.0,11.2,,1024.4,,,38328


In [15]:
df_weather.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1392038 entries, 1950-01-01 00:00:00 to 2025-01-01 00:00:00
Data columns (total 12 columns):
 #   Column   Non-Null Count    Dtype  
---  ------   --------------    -----  
 0   temp     1382975 non-null  Float64
 1   dwpt     1373904 non-null  Float64
 2   rhum     1373904 non-null  Float64
 3   prcp     203992 non-null   Float64
 4   snow     0 non-null        Float64
 5   wdir     1359628 non-null  Float64
 6   wspd     1356750 non-null  Float64
 7   wpgt     0 non-null        Float64
 8   pres     1200715 non-null  Float64
 9   tsun     0 non-null        Float64
 10  coco     186547 non-null   Float64
 11  station  1392038 non-null  object 
dtypes: Float64(11), object(1)
memory usage: 152.7+ MB
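
Because the early records are six-hourly and later ones hourly, long-term comparisons are easier on daily means. The following is a minimal sketch of one way to aggregate, assuming a frame shaped like `df_weather` (a `DatetimeIndex` plus `temp` and `station` columns). The helper name `hourly_to_daily_mean`, the output column names, and the `min_obs` threshold are illustrative assumptions, and a simple mean of the available readings is not necessarily identical to an officially reported daily mean.

```python
import pandas as pd


def hourly_to_daily_mean(hourly: pd.DataFrame, min_obs: int = 3) -> pd.DataFrame:
    """Average temperature per station and calendar day.

    Days with fewer than `min_obs` readings get NaN, so that sparse days
    do not masquerade as reliable daily means.
    """
    grouped = hourly.groupby(["station", pd.Grouper(freq="D")])["temp"]
    daily = grouped.agg(["mean", "count"])
    daily.loc[daily["count"] < min_obs, "mean"] = float("nan")
    return daily.rename(columns={"mean": "tavg", "count": "n_obs"})


# Demo using the first five rows shown in the df_weather preview.
times = pd.to_datetime(
    [
        "1950-01-01 00:00",
        "1950-01-01 06:00",
        "1950-01-01 12:00",
        "1950-01-01 18:00",
        "1950-01-02 00:00",
    ]
)
demo = pd.DataFrame(
    {"temp": [-2.2, -2.2, -2.2, -3.9, -3.9], "station": "38328"},
    index=pd.DatetimeIndex(times, name="time"),
)

daily = hourly_to_daily_mean(demo)
print(daily)
```

In the demo, 1 January has four readings and gets a mean, while 2 January has a single reading and is masked.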


In [None]:
# Note: despite the "Daily" in the file name, df_weather holds hourly observations.
df_weather.to_parquet("df_Daily.parquet", index=True)

In [17]:
meteostations=meteostations[meteostations['wmo'].isin(wmo_lists)]

In [18]:
meteostations.to_parquet("kz_meteostations_ref.parquet", index=True)

## Step 2. Comparative climate analysis

In this step, we compare long-term temperature trends in Kazakhstan with several reference regions that represent different climate drivers:

- **Western Europe (North Atlantic region)**  
  Chosen to evaluate the stabilizing effect of the Atlantic Ocean and the Gulf Stream on long-term temperature variability.

- **Central United States**  
  Selected as a continental climate analogue to Kazakhstan, allowing assessment of temperature trends under similar land-dominated conditions.

- **Japan**  
  Chosen to represent a Pacific Ocean-influenced climate with strong oceanic and monsoonal effects.

The objective of this comparison is to evaluate whether continental regions exhibit stronger long-term warming trends and higher temperature variability compared to ocean-influenced regions.
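
To make "warming trend" and "variability" concrete, one common choice is the least-squares slope of annual mean temperature (in °C per decade) and the standard deviation of the residuals around that line. The sketch below shows this on synthetic data; `trend_and_variability` is a hypothetical helper, and the notebook may use a different estimator.

```python
import numpy as np
import pandas as pd


def trend_and_variability(annual_mean: pd.Series) -> tuple[float, float]:
    """Return (trend in °C per decade, std of residuals around the linear fit).

    `annual_mean` is indexed by calendar year.
    """
    clean = annual_mean.dropna()
    years = clean.index.to_numpy(dtype=float)
    temps = clean.to_numpy(dtype=float)
    # Centering the years keeps the least-squares fit well conditioned.
    centered = years - years.mean()
    slope, intercept = np.polyfit(centered, temps, deg=1)
    residuals = temps - (slope * centered + intercept)
    return float(slope * 10), float(residuals.std(ddof=1))


# Synthetic series (not real data): exactly 0.03 °C of warming per year.
years = np.arange(1950, 2025)
synthetic = pd.Series(5.0 + 0.03 * (years - 1950), index=years)

trend, variability = trend_and_variability(synthetic)
print(f"trend: {trend:.2f} °C/decade, residual std: {variability:.3f} °C")
```

Applied per station, these two numbers give a compact basis for comparing continental and ocean-influenced sites.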


### European Atlantic reference stations

To represent Atlantic-influenced European climates, two coastal stations at contrasting latitudes were selected:

- **Bergen (Norway)** — representing a cold, strongly oceanic climate with significant influence from the Gulf Stream.
- **Lisbon (Portugal)** — representing a warm Atlantic climate with partial Mediterranean characteristics.

Although Lisbon exhibits seasonal dry periods typical of southern Europe, both stations are directly influenced by the Atlantic Ocean. This allows evaluation of how ocean-driven climate systems maintain relative temperature stability across different latitudes, compared to continental regions such as Kazakhstan.


In [25]:

stations = Stations()
stations = stations.region('NO')

'Stations in Norway:', stations.count()

('Stations in Norway:', 192)

In [26]:
meteostations_No = stations.fetch()

In [27]:
meteostations_No

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
01001,Jan Mayen,NO,,01001,ENJA,70.9333,-8.6667,10.0,Europe/Oslo,1931-01-01,2025-03-20,1921-12-31,2025-10-30,1922-01-01,2022-01-01
01002,Grahuken,NO,SJ,01002,,79.7833,14.4667,0.0,Europe/Oslo,1986-11-09,2025-03-20,2010-10-07,2020-08-17,NaT,NaT
01003,Hornsund,NO,,01003,,77.0000,15.5000,10.0,Europe/Oslo,1985-06-01,2025-03-20,2009-11-26,2020-08-31,2016-01-01,2017-01-01
01004,New Alesund II,NO,SJ,01004,ENAS,78.9167,11.9333,8.0,Europe/Oslo,1973-01-01,2014-05-23,1968-12-31,1997-03-01,1969-01-01,1974-01-01
01005,Barentsburg,NO,SJ,01005,,78.0667,13.6333,9.0,Arctic/Longyearbyen,NaT,NaT,NaT,NaT,1951-01-01,1980-01-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ENQC0,Troll C Platform,NO,,,ENQC,60.8858,3.6097,-9999.0,Europe/Oslo,2011-05-12,2025-05-14,2011-05-13,2022-03-06,2011-01-01,2021-01-01
ENSF0,Statfjord Oil Rig,NO,,,ENSF,61.2167,1.8333,15.0,Europe/Oslo,1984-12-04,2025-05-14,1987-09-21,2022-03-06,2010-01-01,2020-01-01
ENSL0,Sleipner A Platform,NO,,,ENSL,58.3667,1.9069,-9999.0,Europe/Oslo,2009-02-01,2025-05-14,2010-12-08,2022-03-06,2020-01-01,2021-01-01
ENSO0,Stord / Foldrøyhamn,NO,HO,,ENSO,59.8000,5.3500,49.0,Europe/Oslo,1986-11-25,2025-12-14,2020-01-20,2022-04-26,NaT,NaT


In [28]:
meteostations_No = meteostations_No[
    meteostations_No['name'].str.contains('Bergen', case=False, na=False)
]

In [29]:
meteostations_No=meteostations_No.reset_index()

In [30]:
meteostations_No

,id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
0,1317,Bergen / Florida,NO,HO,1317,,60.3833,5.3333,12.0,Europe/Oslo,1973-01-01,2025-03-21,1957-01-01,2025-10-31,1957-01-01,2022-01-01


In [33]:

stations = Stations()
stations = stations.region('PT')

'Stations in Portugal:', stations.count()

('Stations in Portugal:', 39)

In [34]:
meteostations_Pt = stations.fetch()

In [35]:
meteostations_Pt

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
08000,Água de Pena,PT,MA,8000.0,LPMA,32.7,-16.77,60.0,Atlantic/Madeira,2004-05-10,2025-12-15,2004-05-11,2022-04-25,2008-01-01,2022-01-01
08501,Flores Acores,PT,AZ,8501.0,LPFL,39.45,-31.1333,28.0,Atlantic/Azores,1978-12-01,2025-12-15,1978-12-01,2025-08-24,1980-01-01,2022-01-01
08502,Corvo Acores,PT,AZ,8502.0,LPCR,39.6667,-31.1167,18.0,Europe/Lisbon,2003-10-23,2025-02-04,2021-06-03,2022-03-25,NaT,NaT
08505,Horta / Castelo Branco Acores,PT,AZ,8505.0,LPHR,38.5167,-28.7167,40.0,Atlantic/Azores,1976-02-01,2025-12-15,1999-01-02,2022-04-24,2020-01-01,2022-01-01
08506,Horta Acores,PT,AZ,8506.0,,38.5167,-28.6333,60.0,Atlantic/Azores,NaT,NaT,1973-01-01,2024-10-14,1949-01-01,2021-01-01
08509,Lajes Acores,PT,AZ,8509.0,LPLA,38.7667,-27.1,52.0,Atlantic/Azores,1947-01-01,2025-12-15,1947-01-01,2025-08-24,1947-01-01,2022-01-01
08512,Ponta Delgada / Nordela Acores,PT,AZ,8512.0,LPPD,37.7333,-25.7,71.0,Atlantic/Azores,1931-01-03,2025-12-15,1973-01-02,2023-12-30,1949-01-01,2022-01-01
08513,Ponta Delgada / Obs. Acores,PT,AZ,8513.0,,37.75,-25.6667,35.0,Atlantic/Azores,NaT,NaT,NaT,NaT,1980-01-01,2008-01-01
08515,Santa Maria Acores,PT,AZ,8515.0,LPAZ,36.9667,-25.1667,100.0,Atlantic/Azores,1944-08-07,2025-12-15,1944-08-07,2025-08-24,1944-01-01,2022-01-01
08521,Funchal / S. Catarina,PT,MD,8521.0,LPFU,32.6833,-16.7667,58.0,Atlantic/Madeira,1948-01-11,2025-12-15,1973-01-01,2025-08-01,1949-01-01,2021-01-01


In [36]:
meteostations_Pt=meteostations_Pt[meteostations_Pt['region']=='LI']

In [37]:
meteostations_Pt

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
08532,Sintra / Granja,PT,LI,8532.0,LPST,38.8333,-9.3333,134.0,Europe/Lisbon,1990-10-01,2025-12-12,2006-01-12,2010-11-20,NaT,NaT
08535,Lisboa / Geof,PT,LI,8535.0,,38.7167,-9.15,77.0,Europe/Lisbon,NaT,NaT,1900-12-31,2025-05-19,1901-01-01,2021-01-01
08536,Lisboa / Portela,PT,LI,8536.0,LPPT,38.7667,-9.1333,114.0,Europe/Lisbon,1931-01-03,2025-12-15,1973-01-13,2022-04-25,1949-01-01,2022-01-01
08537,Alverca,PT,LI,8537.0,LPAR,38.8833,-9.0333,3.0,Europe/Lisbon,1999-01-04,2025-12-14,NaT,NaT,NaT,NaT
08539,Ota,PT,LI,8539.0,LPOT,39.1167,-8.9833,41.0,Europe/Lisbon,2024-12-13,2024-12-13,NaT,NaT,NaT,NaT
08579,Lisboa / Gago Coutinho,PT,LI,8579.0,,38.7667,-9.1333,104.0,Europe/Lisbon,2018-01-27,2025-06-16,NaT,NaT,NaT,NaT
LPCS0,Cascais / Tires,PT,LI,,LPCS,38.725,-9.3552,99.0,Europe/Lisbon,2020-01-14,2025-12-14,2020-01-15,2022-04-24,2020-01-01,2021-01-01


In [39]:
meteostations_Pt=meteostations_Pt[meteostations_Pt['name']=='Lisboa / Geof']

In [41]:
meteostations_Pt=meteostations_Pt.reset_index()

In [42]:
meteostations_Pt

,id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
0,8535,Lisboa / Geof,PT,LI,8535,,38.7167,-9.15,77.0,Europe/Lisbon,NaT,NaT,1900-12-31,2025-05-19,1901-01-01,2021-01-01


In [43]:
meteostations_Eu=pd.concat([meteostations_No,meteostations_Pt])

In [44]:
meteostations_Eu

,id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
0,1317,Bergen / Florida,NO,HO,1317,,60.3833,5.3333,12.0,Europe/Oslo,1973-01-01,2025-03-21,1957-01-01,2025-10-31,1957-01-01,2022-01-01
0,8535,Lisboa / Geof,PT,LI,8535,,38.7167,-9.15,77.0,Europe/Lisbon,NaT,NaT,1900-12-31,2025-05-19,1901-01-01,2021-01-01


### Central USA reference stations

To represent the continental climate of central North America, two stations at contrasting latitudes were selected:

- **Marquette (Michigan)**: the northern station, representing a cold continental climate with long winters, large seasonal temperature variation, and moderate precipitation.
- **Dallas (Texas)**: the southern station, representing a warmer continental climate with hotter summers, milder winters, and a different precipitation regime.

Both stations are far from the oceans, and they differ in latitude and climate intensity. This allows evaluation of how continental climate systems vary with latitude and seasonal extremes, which can then be compared with Kazakhstan’s similarly continental climate patterns.
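
A simple way to quantify "climate intensity" is the seasonal amplitude: the gap between the warmest and coldest long-term monthly means. The sketch below uses made-up values purely to show the calculation; `seasonal_amplitude` and the two demo series are assumptions, not results from the stations.

```python
import pandas as pd


def seasonal_amplitude(daily_tavg: pd.Series) -> float:
    """Warmest minus coldest long-term monthly mean temperature (°C)."""
    monthly_climatology = daily_tavg.groupby(daily_tavg.index.month).mean()
    return float(monthly_climatology.max() - monthly_climatology.min())


# Made-up daily temperatures for one year (illustration only).
days = pd.date_range("2000-01-01", "2000-12-31", freq="D")
continental = pd.Series(
    [22.0 if m == 7 else -15.0 if m == 1 else 5.0 for m in days.month], index=days
)
oceanic = pd.Series(
    [15.0 if m == 7 else 3.0 if m == 1 else 9.0 for m in days.month], index=days
)

print(seasonal_amplitude(continental))  # 37.0
print(seasonal_amplitude(oceanic))      # 12.0
```

A larger amplitude indicates stronger continentality, which gives a single number per station for the comparison with Kazakhstan.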


In [45]:
stations = Stations()
stations = stations.region('US')

'Stations in USA:', stations.count()

('Stations in USA:', 2701)

In [46]:
meteostations_USA = stations.fetch()

In [47]:
meteostations_USA

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
04AEH,Norwich,US,NY,,KOIC,42.5665,-75.5242,312.0,America/New_York,2022-04-23,2025-12-15,2022-04-23,2022-04-26,NaT,NaT
0MV8M,Hill Air Force Base,US,UT,,KHIF,41.1111,-111.9623,1459.0,America/Denver,2022-04-23,2025-12-15,2022-04-23,2022-04-25,NaT,NaT
0NNEW,Effingham County Memorial Airport,US,IL,,K1H2,39.0706,-88.5333,179.0,America/Chicago,2022-05-06,2025-12-15,NaT,NaT,NaT,NaT
0OBKP,Live Oak County Airport,US,TX,,K8T6,28.3628,-98.1165,39.0,America/Chicago,2022-05-06,2025-12-15,NaT,NaT,NaT,NaT
0RJDR,Hotel (Gurley),US,NE,,K1HW,41.3200,-102.8300,1263.0,America/Denver,2022-05-06,2025-12-15,NaT,NaT,NaT,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ZGK9P,Victorville Airport,US,CA,,KVCV,34.5972,-117.3828,876.0,America/Los_Angeles,2022-04-23,2025-12-14,2022-04-23,2022-04-23,NaT,NaT
ZJ8AR,India - Sidney,US,NE,,K1IW,41.0500,-102.8700,1293.0,America/Denver,2022-05-06,2025-12-15,NaT,NaT,NaT,NaT
ZNWZW,Columbus Municipal Airport,US,NE,,KOLU,41.4500,-97.3333,440.0,America/Chicago,2022-04-23,2025-12-15,2022-04-24,2022-04-24,NaT,NaT
ZUQJS,Ephraim-Gibraltar Airport,US,WI,,K3D2,45.1357,-87.1881,234.0,America/Menominee,2022-05-06,2025-12-15,NaT,NaT,NaT,NaT


In [49]:
meteostations_USA=meteostations_USA[meteostations_USA['region'].isin(['MI','TX'])]

In [56]:
meteostations_USA_TX = meteostations_USA[
    (meteostations_USA['region'] == 'TX') & 
    (meteostations_USA['daily_start'].notna())
]

In [59]:
meteostations_USA_TX=meteostations_USA_TX.sort_values(by='daily_start')
meteostations_USA_TX.head(20)

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
72351,Wichita Falls Sheppard Afb,US,TX,72351.0,KSPS,33.9833,-98.4928,309.0,America/Chicago,1942-04-13,2025-12-15,1897-01-01,2025-12-10,1897-01-01,2022-01-01
72250,Brownsville/South Padre Is,US,TX,72250.0,KBRO,25.9,-97.4231,7.0,America/Chicago,1947-01-01,2025-12-15,1898-12-01,2025-12-10,1898-01-01,2022-01-01
72263,Mathis Field,US,TX,72263.0,KSJT,31.35,-100.5,584.0,America/Chicago,1948-02-15,2025-12-15,1907-08-01,2025-12-10,1907-01-01,2022-01-01
72260,"Stephenville, Clark Field Municipal Airport",US,TX,72260.0,KSEP,32.2167,-98.1833,403.0,America/Chicago,1984-01-01,2025-12-15,1918-05-01,2025-12-10,1918-01-01,2022-01-01
72265,Midland International Airport,US,TX,72265.0,KMAF,31.9333,-102.2,875.0,America/Chicago,1948-02-01,2025-12-15,1930-06-01,2025-12-10,1930-01-01,2022-01-01
72244,Houston / Olcott,US,TX,72244.0,KEFD,29.6073,-95.1587,10.0,America/Chicago,1941-07-01,2025-12-15,1930-08-01,2025-12-10,1930-01-01,2022-01-01
KSKF0,San Antonio / Lackland Air Force Base,US,TX,,KSKF,29.3842,-98.5812,210.0,America/Chicago,1937-07-07,2025-12-15,1937-08-26,2022-04-24,1937-01-01,2021-01-01
72270,El Paso International Airport,US,TX,72270.0,KELP,31.8167,-106.3833,1206.0,America/Denver,1941-04-01,2025-12-15,1938-04-01,2025-12-10,1938-01-01,2022-01-01
KRND0,San Antonio / Universal City / Airport City,US,TX,,KRND,29.5289,-98.278,232.0,America/Chicago,1938-04-01,2025-12-15,1938-04-02,2022-04-25,1938-01-01,2021-01-01
72254,Camp Mabry/Austin City Asos,US,TX,72254.0,KATT,30.3167,-97.7667,201.0,America/Chicago,2000-01-01,2025-12-15,1938-06-01,2025-12-10,1938-01-01,2022-01-01


In [60]:
meteostations_USA_TX=meteostations_USA_TX[meteostations_USA_TX['name']=='Dallas / Oldham']
meteostations_USA_TX

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
72258,Dallas / Oldham,US,TX,72258,KDAL,32.8471,-96.8518,148.0,America/Chicago,1946-07-26,2025-12-15,1939-08-01,2025-12-10,1939-01-01,2022-01-01


In [61]:
meteostations_USA_MI= meteostations_USA[
    (meteostations_USA['region'] == 'MI') & 
    (meteostations_USA['daily_start'].notna())
]

In [62]:
meteostations_USA_MI

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
72019,Munsing Lakeshore,US,MI,72019,KP53,46.4167,-86.6500,187.0,America/Detroit,2006-01-01,2025-12-15,2006-01-01,2022-04-24,2006-01-01,2022-01-01
72034,Frankfort / Elberta,US,MI,72034,KFKS,44.6252,-86.2008,193.0,America/Detroit,2006-01-01,2025-12-15,2006-01-01,2022-04-25,2006-01-01,2022-01-01
72537,Detroit Metropolitan,US,MI,72537,KDTW,42.2333,-83.0000,195.0,America/Detroit,1942-08-01,2025-12-15,1942-08-01,2025-08-27,1942-01-01,2022-01-01
72538,Copper Harbor,US,MI,72538,KP59,47.4667,-87.8833,191.0,America/Detroit,2006-01-01,2025-12-15,2004-11-11,2025-12-10,2004-01-01,2022-01-01
72539,Capital City Airport,US,MI,72539,KLAN,42.7833,-84.5833,262.0,America/Detroit,1973-01-01,2025-12-15,1948-01-01,2025-12-10,1948-01-01,2022-01-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
KY310,West Branch / Twin Pines Mobile Home Park,US,MI,,KY31,44.2448,-84.1798,269.0,America/Detroit,2020-01-14,2025-12-15,2020-01-14,2022-04-25,2020-01-01,2022-01-01
KY700,Ionia / Canterbury Estates Mobile Home Park,US,MI,,KY70,42.9380,-85.0606,249.0,America/Detroit,2014-07-31,2025-12-15,2014-08-01,2022-04-25,2020-01-01,2022-01-01
KYIP0,Detroit / Denton,US,MI,,KYIP,42.2393,-83.5310,218.0,America/Detroit,1973-01-01,2025-12-15,1973-01-03,2022-04-24,2005-01-01,2022-01-01
NO755,Luce County Airport,US,MI,,KERY,46.3111,-85.4572,265.0,America/Detroit,2022-04-23,2025-12-15,2022-04-23,2022-04-25,NaT,NaT


In [63]:
meteostations_USA_MI=meteostations_USA_MI.sort_values(by='daily_start')
meteostations_USA_MI.head(20)

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
72743,Marquette,US,MI,72743.0,KMQT,46.5333,-87.55,434.0,America/Detroit,1973-01-01,2025-04-30,1857-09-01,2025-12-08,1857-01-01,2022-01-01
72648,Escanaba,US,MI,72648.0,KESC,45.75,-87.0333,180.0,America/Detroit,2006-01-01,2025-12-15,1871-05-25,2025-12-11,1948-01-01,2022-01-01
72636,Muskegon County Airport,US,MI,72636.0,KMKG,43.1667,-86.2333,191.0,America/Detroit,1973-01-01,2025-12-15,1896-06-02,2025-12-10,1896-01-01,2022-01-01
72639,Alpena County Regional Airport,US,MI,72639.0,KAPN,45.0833,-83.5667,210.0,America/Detroit,1973-01-01,2025-12-15,1916-10-21,2025-12-10,1916-01-01,2022-01-01
72734,Sault Ste. Marie Muni,US,MI,72734.0,KANJ,46.4833,-84.35,218.0,America/Detroit,1973-01-01,2025-12-15,1931-01-01,2025-12-10,1931-01-01,2022-01-01
72537,Detroit Metropolitan,US,MI,72537.0,KDTW,42.2333,-83.0,195.0,America/Detroit,1942-08-01,2025-12-15,1942-08-01,2025-08-27,1942-01-01,2022-01-01
KBTL0,Battle Creek / Avenue A Mobile Home Estates,US,MI,,KBTL,42.3073,-85.2515,290.0,America/Detroit,1942-10-01,2025-12-15,1942-11-02,2022-04-24,1942-01-01,2022-01-01
KOSC0,Oscoda / Whispering Woods Mobile Home Community,US,MI,,KOSC,44.4515,-83.3942,193.0,America/Detroit,1943-07-01,2025-12-09,1943-07-02,2022-04-25,1943-01-01,2022-01-01
72539,Capital City Airport,US,MI,72539.0,KLAN,42.7833,-84.5833,262.0,America/Detroit,1973-01-01,2025-12-15,1948-01-01,2025-12-10,1948-01-01,2022-01-01
72744,Hancock Houghton Cty. Memo,US,MI,72744.0,KCMX,47.1667,-88.4833,334.0,America/Detroit,1973-01-01,2025-12-15,1948-01-01,2025-12-10,1948-01-01,2022-01-01


In [64]:
meteostations_USA_MI=meteostations_USA_MI[meteostations_USA_MI['name']=='Marquette']
meteostations_USA_MI

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
72743,Marquette,US,MI,72743,KMQT,46.5333,-87.55,434.0,America/Detroit,1973-01-01,2025-04-30,1857-09-01,2025-12-08,1857-01-01,2022-01-01


In [65]:
meteostations_USA=pd.concat([meteostations_USA_TX,meteostations_USA_MI])

In [68]:
meteostations_USA=meteostations_USA.reset_index()

In [69]:
meteostations_USA

,id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
0,72258,Dallas / Oldham,US,TX,72258,KDAL,32.8471,-96.8518,148.0,America/Chicago,1946-07-26,2025-12-15,1939-08-01,2025-12-10,1939-01-01,2022-01-01
1,72743,Marquette,US,MI,72743,KMQT,46.5333,-87.55,434.0,America/Detroit,1973-01-01,2025-04-30,1857-09-01,2025-12-08,1857-01-01,2022-01-01


### Japanese reference stations

To capture the variation in Pacific-influenced climates across Japan, two stations at contrasting latitudes were selected:

- **Nemuro (Hokkaido)**: representing the cold climate of northern Japan, with a stronger continental influence during the winter months.
- **Tokyo (Honshu)**: representing a temperate climate in central Japan with significant Pacific Ocean and monsoon influence, resulting in milder winters and warmer, humid summers.

These two stations allow analysis of how ocean-driven climate systems in the Pacific region influence seasonal temperature stability across different latitudes, providing a point of comparison with continental Kazakhstan and the Atlantic-influenced European stations.


In [70]:
stations = Stations()
stations = stations.region('JP')

'Stations in Japan:', stations.count()

('Stations in Japan:', 278)

In [71]:
meteostations_Japan = stations.fetch()

In [74]:
meteostations_Japan

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
47218,Tomisato,JP,HK,47218,RJEO,42.0700,139.4500,36.0,Asia/Tokyo,1975-05-21,1999-02-28,NaT,NaT,NaT,NaT
47401,Wakkanai,JP,HK,47401,,45.4167,141.6833,3.0,Asia/Tokyo,NaT,NaT,1951-01-01,2024-09-28,1938-01-01,2021-01-01
47402,Kitamiesashi,JP,HK,47402,,44.9333,142.5833,7.0,Asia/Tokyo,NaT,NaT,1951-01-01,2025-06-09,1951-01-01,2020-01-01
47404,Haboro,JP,HK,47404,,44.3667,141.7000,8.0,Asia/Tokyo,NaT,NaT,1951-01-01,2025-06-09,1951-01-01,2021-01-01
47405,Omu,JP,HK,47405,,44.5833,142.9667,14.0,Asia/Tokyo,NaT,NaT,1951-01-01,2025-06-09,1951-01-01,2020-01-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
RORH0,Hateruma,JP,ON,,RORH,24.0667,123.8000,13.0,Asia/Tokyo,1979-09-13,1999-01-29,NaT,NaT,NaT,NaT
RORK0,Kitadaito Island / Nakanoku,JP,,,RORK,25.9167,131.3333,22.0,Asia/Tokyo,1980-05-02,2025-12-15,2019-03-19,2022-04-25,2020-01-01,2021-01-01
RORS0,Shimoji-Shima Island / Shimojishima / shimochi,JP,ON,,RORS,24.8333,125.1500,16.0,Asia/Tokyo,1979-10-16,2025-12-15,2020-05-22,2022-04-01,NaT,NaT
RORT0,Tarama Island,JP,,,RORT,24.6500,124.7000,9.0,Asia/Tokyo,1979-10-12,2025-12-15,NaT,NaT,NaT,NaT


In [76]:
meteostations_Japan= meteostations_Japan[meteostations_Japan['daily_start'].notna()].sort_values(by='daily_start')

In [78]:
meteostations_Japan=meteostations_Japan[meteostations_Japan['region'].isin(['HK','TK'])]

In [79]:
meteostations_Japan

id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
47420,Nemuro,JP,HK,47420.0,,43.3333,145.5833,25.0,Asia/Tokyo,NaT,NaT,1931-01-01,2025-01-23,1880-01-01,2021-01-01
47981,Iwojima,JP,TK,47981.0,RJAW,24.7833,141.3167,113.0,Asia/Tokyo,1945-03-01,2025-12-15,1945-03-12,2022-04-20,1945-01-01,1960-01-01
47425,Chitose Ab,JP,HK,47425.0,RJCC,42.8,141.6667,27.0,Asia/Tokyo,1945-10-31,2025-12-15,1945-11-01,2022-04-25,1945-01-01,2022-01-01
47660,Tachikawa Ab,JP,TK,47660.0,RJTC,35.7,139.4,95.0,Asia/Tokyo,1945-11-15,2025-12-14,1945-11-16,2004-10-13,1946-01-01,1969-01-01
47642,Yokota Ab,JP,TK,47642.0,RJTY,35.75,139.35,139.0,Asia/Tokyo,1999-01-01,2025-12-14,1949-01-01,2022-04-25,1949-01-01,2021-01-01
47675,Oshima,JP,TK,47675.0,,34.75,139.3667,74.0,Asia/Tokyo,NaT,NaT,1951-01-01,2025-04-09,1938-01-01,2021-01-01
47401,Wakkanai,JP,HK,47401.0,,45.4167,141.6833,3.0,Asia/Tokyo,NaT,NaT,1951-01-01,2024-09-28,1938-01-01,2021-01-01
47678,Hachijojima,JP,TK,47678.0,,33.1,139.7833,79.0,Asia/Tokyo,NaT,NaT,1951-01-01,2025-04-09,1949-01-01,2021-01-01
47677,Miyakejima,JP,TK,47677.0,,34.1167,139.5167,36.0,Asia/Tokyo,NaT,NaT,1951-01-01,2025-04-09,1951-01-01,2020-01-01
47662,Tokyo,JP,TK,47662.0,RJTD,35.6833,139.7667,5.0,Asia/Tokyo,1952-12-31,2025-06-17,1951-01-01,2025-04-09,1875-01-01,2021-01-01


In [80]:
meteostations_Japan=meteostations_Japan[meteostations_Japan['name'].isin(['Nemuro','Tokyo'])]

In [82]:
meteostations_Japan=meteostations_Japan.reset_index()

In [83]:
meteostations_foreign=pd.concat([meteostations_USA,meteostations_Eu,meteostations_Japan])


In [84]:
meteostations_foreign

,id,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,monthly_start,monthly_end
0,72258,Dallas / Oldham,US,TX,72258,KDAL,32.8471,-96.8518,148.0,America/Chicago,1946-07-26,2025-12-15,1939-08-01,2025-12-10,1939-01-01,2022-01-01
1,72743,Marquette,US,MI,72743,KMQT,46.5333,-87.55,434.0,America/Detroit,1973-01-01,2025-04-30,1857-09-01,2025-12-08,1857-01-01,2022-01-01
0,1317,Bergen / Florida,NO,HO,1317,,60.3833,5.3333,12.0,Europe/Oslo,1973-01-01,2025-03-21,1957-01-01,2025-10-31,1957-01-01,2022-01-01
0,8535,Lisboa / Geof,PT,LI,8535,,38.7167,-9.15,77.0,Europe/Lisbon,NaT,NaT,1900-12-31,2025-05-19,1901-01-01,2021-01-01
0,47420,Nemuro,JP,HK,47420,,43.3333,145.5833,25.0,Asia/Tokyo,NaT,NaT,1931-01-01,2025-01-23,1880-01-01,2021-01-01
1,47662,Tokyo,JP,TK,47662,RJTD,35.6833,139.7667,5.0,Asia/Tokyo,1952-12-31,2025-06-17,1951-01-01,2025-04-09,1875-01-01,2021-01-01


In [85]:
meteostations_foreign.to_parquet("foreign_meteostations_ref.parquet", index=False)

In [87]:
from meteostat import Daily
from datetime import datetime

In [89]:
all_data = []
start = datetime(1950, 1, 1)
end = datetime(2025, 1, 1)
for station_id in meteostations_foreign['wmo']:
    print(f"Loading: {station_id}")
    data = Daily(station_id, start, end).fetch()
    data['station'] = station_id
    all_data.append(data)


df_daily = pd.concat(all_data)


Loading: 72258
Loading: 72743
Loading: 01317
Loading: 08535
Loading: 47420
Loading: 47662

In [94]:
# Move the time index into a column first; otherwise index=False below would drop it.
df_daily=df_daily.reset_index()

In [95]:
df_daily.to_parquet("foreign_meteostations_daily.parquet", index=False)
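
For the later comparison it helps to attach station names and a climate grouping to every daily row. The sketch below shows one possible approach with `DataFrame.merge`, using tiny stand-in tables with invented temperatures. The `CLIMATE_GROUP` mapping and the `climate_group` column are assumptions introduced here for illustration.

```python
import pandas as pd

# Stand-in for the station reference table (subset of columns).
reference = pd.DataFrame(
    {
        "wmo": ["72258", "01317", "47662"],
        "name": ["Dallas / Oldham", "Bergen / Florida", "Tokyo"],
        "country": ["US", "NO", "JP"],
    }
)

# Stand-in for df_daily after reset_index(); tavg values are invented.
daily = pd.DataFrame(
    {
        "time": pd.to_datetime(["1950-01-01", "1950-01-02", "1950-01-01", "1950-01-01"]),
        "tavg": [8.0, 9.5, 1.2, 4.4],
        "station": ["72258", "72258", "01317", "47662"],
    }
)

# Hypothetical grouping that mirrors the regions described in Step 2.
CLIMATE_GROUP = {
    "KZ": "continental",
    "US": "continental",
    "NO": "Atlantic",
    "PT": "Atlantic",
    "JP": "Pacific",
}

labelled = daily.merge(
    reference,
    left_on="station",
    right_on="wmo",
    how="left",
    validate="many_to_one",  # fail loudly if a WMO ID appears twice in the reference
).drop(columns="wmo")
labelled["climate_group"] = labelled["country"].map(CLIMATE_GROUP)
print(labelled)
```

Keeping the station IDs as strings on both sides matters here, since IDs such as `01317` lose their leading zero if parsed as numbers.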