# Kazakhstan Climate Change Research Project

The main goal of this project is to analyze how the climate in Kazakhstan is changing over time and to explore regional climate trends. My hypothesis is that Kazakhstan is warming faster than many other regions.

### Project Plan

1. Load historical weather data using Meteostat.
2. Select meteorological stations covering different regions of Kazakhstan.
3. Compare Kazakhstan's climate data with Europe and the USA.
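The plan above needs some way to quantify "change over time" before regions can be compared. One possible measure, sketched here as an assumption rather than the method this project uses, is the least-squares slope of annual mean temperature. The function name `warming_rate_per_decade` is hypothetical, and the demo series is synthetic, not real station data.

```python
import numpy as np
import pandas as pd


def warming_rate_per_decade(temps: pd.Series) -> float:
    """Least-squares slope of annual mean temperature, in °C per decade.

    `temps` is assumed to be a temperature series with a DatetimeIndex.
    """
    # Average all observations within each calendar year.
    annual = temps.groupby(temps.index.year).mean().dropna()
    years = annual.index.to_numpy(dtype=float)
    # Degree-1 polyfit returns [slope, intercept]; the slope is °C per year.
    slope_per_year = np.polyfit(years, annual.to_numpy(dtype=float), 1)[0]
    return slope_per_year * 10


# Synthetic demo (NOT real data): 0.03 °C/year warming plus a seasonal cycle.
idx = pd.date_range("1950-01-01", "2024-12-31", freq="D")
years_elapsed = (idx.year - 1950).to_numpy()
day_of_year = idx.dayofyear.to_numpy()
synthetic = pd.Series(
    5.0 + 0.03 * years_elapsed + 15.0 * np.sin(2 * np.pi * day_of_year / 365.25),
    index=idx,
)

rate = warming_rate_per_decade(synthetic)
print(f"{rate:.2f} °C per decade")
```

The same function could be applied to one region's temperature series at a time, which would give comparable numbers for Kazakhstan, Europe, and the USA. A linear fit is only a first approximation and ignores gaps in station coverage.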

#### Load weather information 

In [1]:
import pandas as pd
from meteostat import Stations

In [14]:

stations = Stations()
stations = stations.region('KZ')

'Stations in KZ:', stations.count()

('Stations in KZ:', 129)

In [15]:
meteostations = stations.fetch()

In [16]:
meteostations.head()

| id | name | country | region | wmo | icao | latitude | longitude | elevation | timezone | hourly_start | hourly_end | daily_start | daily_end | monthly_start | monthly_end |
|----|------|---------|--------|-----|------|----------|-----------|-----------|----------|--------------|------------|-------------|-----------|---------------|-------------|
| 28676 | Petropavlovsk / Bishkul' | KZ | SEV | 28676 | | 54.7925 | 69.1178 | 91.0 | Asia/Almaty | NaT | NaT | 1890-11-19 | 2025-08-22 | 1890-01-01 | 2021-01-01 |
| 28679 | Petropavlovsk | KZ | SEV | 28679 | UACP | 54.8333 | 69.15 | 136.0 | Asia/Qyzylorda | 2005-01-01 | 2025-12-09 | 2020-01-18 | 2022-04-01 | 1994-01-01 | 2012-01-01 |
| 28687 | Bulayevo / B?laevo | KZ | SEV | 28687 | | 54.9 | 70.45 | 134.0 | Asia/Qyzylorda | 1959-01-01 | 2021-01-02 | 1955-01-01 | 2020-12-31 | 1955-01-01 | 2020-01-01 |
| 28766 | Blacoveschenka | KZ | SEV | 28766 | | 54.3667 | 66.9667 | 153.0 | Asia/Qyzylorda | 1948-01-01 | 2025-04-02 | 1936-10-01 | 2021-03-18 | 1936-01-01 | 2021-01-01 |
| 28867 | Uricky | KZ | KUS | 28867 | | 53.3167 | 65.55 | 210.0 | Asia/Aqtobe | 1953-02-16 | 2025-04-02 | 1937-01-01 | 2021-06-25 | 1937-01-01 | 2021-01-01 |


In [6]:
meteostations.info()

<class 'pandas.core.frame.DataFrame'>
Index: 129 entries, 28676 to UAUU0
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   name           129 non-null    object        
 1   country        129 non-null    string        
 2   region         129 non-null    string        
 3   wmo            120 non-null    string        
 4   icao           19 non-null     string        
 5   latitude       129 non-null    float64       
 6   longitude      129 non-null    float64       
 7   elevation      129 non-null    float64       
 8   timezone       129 non-null    string        
 9   hourly_start   104 non-null    datetime64[ns]
 10  hourly_end     104 non-null    datetime64[ns]
 11  daily_start    128 non-null    datetime64[ns]
 12  daily_end      128 non-null    datetime64[ns]
 13  monthly_start  126 non-null    datetime64[ns]
 14  monthly_end    126 non-null    datetime64[ns]
dtypes: datetime64[ns](6), 

### Selected stations

For better coverage and data continuity, I selected a set of six stations using their WMO IDs:

| name          | country | region | wmo   | latitude | longitude | elevation | timezone       | hourly_start | hourly_end |
|---------------|---------|--------|-------|----------|-----------|-----------|----------------|--------------|------------|
| Shymkent      | KZ      | YUZ    | 38328 | 42.3167  | 69.7      | 552       | Asia/Qyzylorda | 1948-01-01   | 2025-11-22 |
| Semipalatinsk | KZ      | VOS    | 36177 | 50.4167  | 80.3      | 196       | Asia/Almaty    | 1932-01-01   | 2025-11-22 |
| Aktjubinsk    | KZ      | AKT    | 35229 | 50.2833  | 57.15     | 227       | Asia/Aqtobe    | 1932-01-01   | 2025-11-22 |
| Panfilov      | KZ      | ALM    | 36859 | 44.1667  | 80.0667   | 640       | Asia/Almaty    | 1946-01-14   | 2025-04-02 |
| Balhash       | KZ      | KAR    | 35796 | 46.8     | 75.0833   | 352       | Asia/Almaty    | 1948-01-01   | 2025-04-02 |
| Atbasar       | KZ      | AKM    | 35078 | 51.8167  | 68.3667   | 308       | Asia/Qyzylorda | 1948-01-01   | 2025-04-02 |
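The notebook does not show how these stations were picked, so the following is only a sketch of one way to shortlist candidates by record length. It assumes the `hourly_start` and `hourly_end` columns of the stations table; the helper `covers_period` and the `sample` frame are hypothetical, with rows copied from the `head()` output earlier.

```python
import pandas as pd

# Small stand-in for `meteostations`, using rows from the head() output.
sample = pd.DataFrame(
    {
        "name": [
            "Petropavlovsk / Bishkul'",
            "Petropavlovsk",
            "Bulayevo / B?laevo",
            "Blacoveschenka",
        ],
        "hourly_start": pd.to_datetime([None, "2005-01-01", "1959-01-01", "1948-01-01"]),
        "hourly_end": pd.to_datetime([None, "2025-12-09", "2021-01-02", "2025-04-02"]),
    },
    index=pd.Index(["28676", "28679", "28687", "28766"], name="id"),
)


def covers_period(stations: pd.DataFrame, start: pd.Timestamp, end: pd.Timestamp) -> pd.DataFrame:
    """Keep stations whose hourly record spans [start, end].

    Rows with NaT in either column compare as False and drop out.
    """
    mask = (stations["hourly_start"] <= start) & (stations["hourly_end"] >= end)
    return stations[mask]


long_record = covers_period(sample, pd.Timestamp("1950-01-01"), pd.Timestamp("2025-01-01"))
print(long_record.index.tolist())
```

A filter like this only checks the first and last dates of the record. It says nothing about gaps in between, so a shortlist would still need a manual check for continuity and regional spread.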


In [9]:
wmo_lists = ['38328','36177','35229','36859','35796','35078']

In [12]:
from meteostat import Hourly
from datetime import datetime

In [13]:
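all_data = []
start = datetime(1950, 1, 1)
end = datetime(2025, 1, 1)

# Fetch hourly observations for each selected station (the WMO IDs are
# passed as station IDs) and tag every row with the station it came from.
for station_id in wmo_lists:
    print(f"Loading: {station_id}")
    data = Hourly(station_id, start, end).fetch()
    data['station'] = station_id
    all_data.append(data)

# Stack the per-station frames into one long table.
df_weather = pd.concat(all_data)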

Loading: 38328
Loading: 36177
Loading: 35229
Loading: 36859
Loading: 35796
Loading: 35078
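The loop above stops at the first station that fails to download. A more forgiving variant might skip failed or empty stations instead. The sketch below is an assumption, not part of the original notebook: `load_stations` is a hypothetical helper that takes the fetch step as a callable, so it can be tried here with a fake fetcher and no network access.

```python
from typing import Callable, Iterable

import pandas as pd


def load_stations(
    station_ids: Iterable[str],
    fetch: Callable[[str], pd.DataFrame],
) -> pd.DataFrame:
    """Fetch each station, skip failures and empty results, and concatenate."""
    frames = []
    for station_id in station_ids:
        try:
            data = fetch(station_id)
        except Exception as exc:  # broad on purpose: any download error is skipped
            print(f"Skipping {station_id}: {exc}")
            continue
        if data.empty:
            print(f"Skipping {station_id}: no rows returned")
            continue
        # assign() returns a copy, so the fetched frame is not modified in place.
        frames.append(data.assign(station=station_id))
    if not frames:
        raise ValueError("no station returned any data")
    return pd.concat(frames)


def fake_fetch(station_id: str) -> pd.DataFrame:
    """Stand-in for a real download; '00000' simulates a missing station."""
    if station_id == "00000":
        raise RuntimeError("station not found")
    idx = pd.to_datetime(["1950-01-01 00:00", "1950-01-01 06:00"])
    return pd.DataFrame({"temp": [-2.2, -3.9]}, index=idx)


demo = load_stations(["38328", "00000", "36177"], fake_fetch)
print(demo["station"].value_counts().to_dict())

# In this notebook the real call would presumably look like:
# df_weather = load_stations(wmo_lists, lambda sid: Hourly(sid, start, end).fetch())
```

Passing the fetch step in as a parameter keeps the skip-and-continue logic separate from Meteostat, which also makes it easy to swap `Hourly` for another data class later.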




In [14]:
df_weather.head()

| time | temp | dwpt | rhum | prcp | snow | wdir | wspd | wpgt | pres | tsun | coco | station |
|------|------|------|------|------|------|------|------|------|------|------|------|---------|
| 1950-01-01 00:00:00 | -2.2 | -2.8 | 96.0 | | | 250.0 | 42.5 | | 1022.6 | | | 38328 |
| 1950-01-01 06:00:00 | -2.2 | | | | | 220.0 | 25.9 | | 1027.0 | | | 38328 |
| 1950-01-01 12:00:00 | -2.2 | -3.9 | 88.0 | | | 0.0 | 0.0 | | 1026.0 | | | 38328 |
| 1950-01-01 18:00:00 | -3.9 | -6.0 | 85.0 | | | 40.0 | 3.6 | | 1023.9 | | | 38328 |
| 1950-01-02 00:00:00 | -3.9 | -11.2 | 57.0 | | | 90.0 | 11.2 | | 1024.4 | | | 38328 |


In [15]:
df_weather.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1392038 entries, 1950-01-01 00:00:00 to 2025-01-01 00:00:00
Data columns (total 12 columns):
 #   Column   Non-Null Count    Dtype  
---  ------   --------------    -----  
 0   temp     1382975 non-null  Float64
 1   dwpt     1373904 non-null  Float64
 2   rhum     1373904 non-null  Float64
 3   prcp     203992 non-null   Float64
 4   snow     0 non-null        Float64
 5   wdir     1359628 non-null  Float64
 6   wspd     1356750 non-null  Float64
 7   wpgt     0 non-null        Float64
 8   pres     1200715 non-null  Float64
 9   tsun     0 non-null        Float64
 10  coco     186547 non-null   Float64
 11  station  1392038 non-null  object 
dtypes: Float64(11), object(1)
memory usage: 152.7+ MB
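The `info()` output shows that `snow`, `wpgt`, and `tsun` contain no values at all, and that `prcp` and `coco` are sparse. A quick way to see this as a share per column, and to drop the fully empty columns, might look like the sketch below. The helper names `coverage_report` and `drop_empty_columns` are hypothetical, and the demo frame is a tiny stand-in built from the `head()` rows.

```python
import numpy as np
import pandas as pd


def coverage_report(df: pd.DataFrame) -> pd.Series:
    """Share of non-null values per column, sparsest columns first."""
    return df.notna().mean().sort_values()


def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove columns in which every value is missing."""
    return df.dropna(axis="columns", how="all")


# Tiny stand-in for df_weather: first four rows of temp, dwpt and snow.
demo = pd.DataFrame(
    {
        "temp": [-2.2, -2.2, -2.2, -3.9],
        "dwpt": [-2.8, np.nan, -3.9, -6.0],
        "snow": [np.nan, np.nan, np.nan, np.nan],
    }
)

print(coverage_report(demo))
print(list(drop_empty_columns(demo).columns))
```

On the full frame, `coverage_report(df_weather)` would be one way to decide which variables are complete enough to analyze.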


In [None]:
# Note: despite the file name, this frame holds hourly observations, not daily aggregates.
df_weather.to_parquet("df_Daily.parquet", index=True)
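The file name suggests daily data, while `df_weather` holds hourly observations. If daily values are what later steps need, one possible aggregation is a per-station daily mean, sketched below. This is an assumption about a later step, not something the notebook does. `hourly_to_daily_mean` and its `min_obs` threshold are hypothetical; the default of 4 matches the 6-hour spacing of the early records shown in `head()`.

```python
import pandas as pd


def hourly_to_daily_mean(df: pd.DataFrame, min_obs: int = 4) -> pd.Series:
    """Daily mean temperature per station.

    Days with fewer than `min_obs` temperature readings are dropped so that
    a single observation does not stand in for a whole day.
    """
    # Group by station and by calendar day of the DatetimeIndex.
    grouped = df.groupby(["station", pd.Grouper(freq="D")])["temp"]
    daily = grouped.mean()
    counts = grouped.count()
    return daily[counts >= min_obs]


# Demo built from the five head() rows of station 38328.
demo = pd.DataFrame(
    {
        "temp": [-2.2, -2.2, -2.2, -3.9, -3.9],
        "station": ["38328"] * 5,
    },
    index=pd.to_datetime(
        [
            "1950-01-01 00:00",
            "1950-01-01 06:00",
            "1950-01-01 12:00",
            "1950-01-01 18:00",
            "1950-01-02 00:00",
        ]
    ),
)

daily = hourly_to_daily_mean(demo)
print(daily)
```

In the demo, 1950-01-01 has four readings and is kept, while 1950-01-02 has only one and is dropped. A plain mean treats all readings equally, so days sampled unevenly across day and night would be biased.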

In [17]:
meteostations = meteostations[meteostations['wmo'].isin(wmo_lists)]

In [18]:
meteostations.to_parquet("kz_meteostations_ref.parquet", index=True)