In this notebook we take the data from the "windfarm_weather_data" folder and combine it into a single CSV: one column holds the time in UTC, and the remaining columns hold the weather data for each wind farm.

In [2]:
import os
import pandas as pd

We have a file listing all wind farms, including some projects that are not yet finished (status 'Planned'). We remove the unfinished wind farms and add a column of numerical labels, which makes it simpler to tell the wind farms apart in the large combined file.

In [7]:
# Load the CSV file of all wind farms
wf = pd.read_csv("hydroquebec_wind_farms.csv")

# Remove rows where the 'status' column has the value 'Planned'
wf = wf[wf['status'] != 'Planned']


In [16]:
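# Add a new column with a numerical label (1, 2, 3, ...) for each wind farm
labels = list(range(1, len(wf) + 1))

# Insert the new column right after 'name'. The guard makes the cell safe to
# re-run; without it, a second run raises
# "ValueError: cannot insert labels, already exists".
if 'labels' not in wf.columns:
    wf.insert(1, 'labels', labels)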

In [17]:
wf

Unnamed: 0,name,labels,project_type,capacity_MW,region,status,commissioning_date,latitude,longitude
0,Baie-des-Sables wind farm,1,Wind farm,109.5,Bas-Saint-Laurent,In service,2006-11-22,48.70221,-67.872489
1,Carleton wind farm,2,Wind farm,109.5,Gaspésie–Îles-de-la-Madel,In service,2008-11-22,48.202737,-66.128295
2,Mont-Rothery wind farm,3,Wind farm,74.0,Gaspésie–Îles-de-la-Madel,In service,2015-12-01,48.978875,-65.373544
3,De L'Érable wind farm,4,Wind farm,100.0,Centre-du-Québec,In service,2013-11-16,46.096983,-71.647639
4,Des Moulins wind farm,5,Wind farm,156.85,Chaudière-Appalaches,In service,2013-12-07,46.173945,-71.351327
5,Frampton wind farm,6,Wind farm,24.0,Chaudière-Appalaches,In service,2015-12-15,46.438045,-70.741843
6,Gros-Morne wind farm,7,Wind farm,211.5,Gaspésie–Îles-de-la-Madel,In service,"Phase-1 : 2011-11-29, phase-2 : 2012-11-06.",49.206341,-65.448274
7,Côte-de-Beaupré wind farm,8,Wind farm,23.5,Capitale-National,In service,2015-11-19,47.311169,-70.881464
8,La Mitis wind farm,9,Wind farm,24.6,Bas-Saint-Laurent,In service,2014-10-17,48.358704,-67.821154
9,Lac-Alfred wind farm,10,Wind farm,300.0,Bas-Saint-Laurent,In service,"Phase-1 : 2013-01-19, phase-2 : 2013-08-31.",48.38917,-67.751679


In [19]:
#save a new csv
wf.to_csv("hydroquebec_wind_farms_in_service.csv", index=False)

Now we will use this file to separate out the weather data files that correspond to wind farms which are in service.

In [21]:
# Paths
wf_csv_path = "hydroquebec_wind_farms_in_service.csv"
source_folder = "./windfarm_weather_data"
destination_folder = "./data/windfarm_weather_data_in_service"


In [23]:
# make sure the destination folder exists (create it if needed)
os.makedirs(destination_folder, exist_ok=True)

In [25]:
# read csv
wf = pd.read_csv(wf_csv_path)

In [33]:
# Move files
for filename in wf["name"]:
    fullfilename = filename + " hourly weather 2019-2024.csv"
    src_path = os.path.join(source_folder, fullfilename)
    dst_path = os.path.join(destination_folder, fullfilename)

    if os.path.exists(src_path):
        os.rename(src_path, dst_path)
        print(f"Moved: {fullfilename}")
    else:
        print(f"File not found: {fullfilename}")

Moved: Baie-des-Sables wind farm hourly weather 2019-2024.csv
File not found: Carleton wind farm hourly weather 2019-2024.csv
File not found: Mont-Rothery wind farm hourly weather 2019-2024.csv
File not found: De L'Érable wind farm hourly weather 2019-2024.csv
File not found: Des Moulins wind farm hourly weather 2019-2024.csv
File not found: Frampton wind farm hourly weather 2019-2024.csv
File not found: Gros-Morne wind farm hourly weather 2019-2024.csv
File not found: Côte-de-Beaupré wind farm hourly weather 2019-2024.csv
File not found: La Mitis wind farm hourly weather 2019-2024.csv
File not found: Lac-Alfred wind farm hourly weather 2019-2024.csv
File not found: L'Anse-à-Valleau wind farm hourly weather 2019-2024.csv
File not found: Le Granit wind farm hourly weather 2019-2024.csv
File not found: Le Plateau wind farm hourly weather 2019-2024.csv
File not found: Le Plateau 2 wind farm hourly weather 2019-2024.csv
File not found: Massif du Sud wind farm hourly weather 2019-2024.csv
F
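The "File not found" lines above are most likely what a second run of the cell looks like: `os.rename` moves each file, so on a re-run the source folder no longer contains it. A minimal sketch of a re-runnable variant, assuming it is acceptable to leave the originals in place, copies instead of moving. The helper name `copy_weather_files` and the temporary demo folders are hypothetical and not part of the notebook.

```python
import os
import shutil
import tempfile

def copy_weather_files(names, source_folder, destination_folder,
                       suffix=" hourly weather 2019-2024.csv"):
    """Copy each wind farm's CSV; return (copied, missing) lists of file names."""
    os.makedirs(destination_folder, exist_ok=True)
    copied, missing = [], []
    for name in names:
        fullfilename = name + suffix
        src_path = os.path.join(source_folder, fullfilename)
        dst_path = os.path.join(destination_folder, fullfilename)
        if os.path.exists(src_path):
            shutil.copy2(src_path, dst_path)  # the source file stays where it is
            copied.append(fullfilename)
        else:
            missing.append(fullfilename)
    return copied, missing

# Demo on temporary folders (hypothetical farm names, empty placeholder file)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    with open(os.path.join(src, "Farm A hourly weather 2019-2024.csv"), "w") as f:
        f.write("")
    first_run = copy_weather_files(["Farm A", "Farm B"], src, dst)
    second_run = copy_weather_files(["Farm A", "Farm B"], src, dst)
```

Because nothing is removed from the source folder, running the helper twice reports the same result both times.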

We have now moved all the relevant CSV files into one folder. They all have the same column names, so I want to append each wind farm's numerical label to its column names so that they can be told apart in the big CSV.

## WARNING: this code overwrites the files in the folder titled "windfarm_weather_data_in_service"! Save a copy of this folder before running!
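One way to make that copy from within Python is sketched below using only the standard library. The helper name `backup_folder` and the `_backup` suffix are assumptions for illustration; the demo runs on a temporary folder rather than the real data.

```python
import os
import shutil
import tempfile

def backup_folder(folder, backup_suffix="_backup"):
    """Copy `folder` to a sibling folder named folder + backup_suffix; return its path."""
    backup_path = folder.rstrip("/\\") + backup_suffix
    # dirs_exist_ok=True (Python 3.8+) lets the call succeed if the backup folder
    # already exists; files with the same name in the backup are overwritten
    shutil.copytree(folder, backup_path, dirs_exist_ok=True)
    return backup_path

# Demo on a temporary folder standing in for the real data folder
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "windfarm_weather_data_in_service")
    os.makedirs(folder)
    with open(os.path.join(folder, "example.csv"), "w") as f:
        f.write("time,temperature_2m\n")
    backup_path = backup_folder(folder)
    backup_files = sorted(os.listdir(backup_path))
    backup_name = os.path.basename(backup_path)
```

The backup should be taken before the renaming cell runs; taking it afterwards would copy the already modified files.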

In [66]:
wf.columns

Index(['name', 'labels', 'project_type', 'capacity_MW', 'region', 'status',
       'commissioning_date', 'latitude', 'longitude'],
      dtype='object')

In [44]:
for filename, label in zip(wf["name"], wf["labels"]):
    
    file_path = os.path.join(destination_folder, filename + " hourly weather 2019-2024.csv")

    if os.path.exists(file_path):
        try:
            df = pd.read_csv(file_path)

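            # Keep the first two columns ('Unnamed: 0' and 'time') unchanged and
            # append the wind farm's numerical label to every other column name
            new_columns = list(df.columns[:2]) + [f"{column}_{label}" for column in df.columns[2:]]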
            df.columns = new_columns

            # Save the file (overwrite)
            df.to_csv(file_path, index=False)
            print(f"Updated columns in {filename}")
        except Exception as e:
            print(f"Error processing {filename}: {e}")
    else:
        print(f"File not found: {filename}")

Updated columns in Baie-des-Sables wind farm
Updated columns in Carleton wind farm
Updated columns in Mont-Rothery wind farm
Updated columns in De L'Érable wind farm
Updated columns in Des Moulins wind farm
Updated columns in Frampton wind farm
Updated columns in Gros-Morne wind farm
Updated columns in Côte-de-Beaupré wind farm
Updated columns in La Mitis wind farm
Updated columns in Lac-Alfred wind farm
Updated columns in L'Anse-à-Valleau wind farm
Updated columns in Le Granit wind farm
Updated columns in Le Plateau wind farm
Updated columns in Le Plateau 2 wind farm
Updated columns in Massif du Sud wind farm
Updated columns in Montagne Sèche wind farm
Updated columns in Montérégie wind farm
Updated columns in Mont-Louis wind farm
Updated columns in New Richmond wind farm
Updated columns in Pierre-De Saurel wind farm
Updated columns in Mesgi'g Ugju's'n wind farm
Updated columns in Rivière-du-Moulin wind farm
Updated columns in Des Cultures wind farm
Updated columns in Saint-Damase win
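Since the cell above overwrites the files, running it a second time would append the label again (for example `temperature_2m_1_1`). A possible guard, sketched here under the assumption that no original column name already ends in `_<label>`, only adds the suffix when it is missing. The helper name `add_label_suffix` is hypothetical.

```python
import pandas as pd

def add_label_suffix(df, label, keep=2):
    """Return a copy of df whose columns after the first `keep` end in _<label>."""
    suffix = f"_{label}"
    new_columns = list(df.columns[:keep]) + [
        # leave a column alone if it already carries this label
        col if str(col).endswith(suffix) else f"{col}{suffix}"
        for col in df.columns[keep:]
    ]
    out = df.copy()
    out.columns = new_columns
    return out

# Small in-memory stand-in for one weather file
df = pd.DataFrame({
    "Unnamed: 0": [0],
    "time": ["2019-01-01T00:00"],
    "temperature_2m": [-7.4],
    "location": ["Baie-des-Sables wind farm"],
})
once = add_label_suffix(df, 1)
twice = add_label_suffix(once, 1)  # second application changes nothing
```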

In [75]:
# Before combining the dataframes, check that every dataframe has the same values in the 'time' column and in the row-number column ('Unnamed: 0')

In [74]:
# pick one CSV as the reference, read from the same folder as the files in the loop below
testdf = pd.read_csv(os.path.join(destination_folder, "Baie-des-Sables wind farm hourly weather 2019-2024.csv"))

for filename in wf["name"]:
    
    file_path = os.path.join(destination_folder, filename + " hourly weather 2019-2024.csv")

    if os.path.exists(file_path):
        try:
            df = pd.read_csv(file_path)
            if (df['time'] == testdf['time']).all() and (df['Unnamed: 0'] == testdf['Unnamed: 0']).all():
                print('check all good')
            else:
                print(f"error at {filename}")
        except Exception as e:
            print(f"Error checking {filename}: {e}")
    else:
        print(f"File not found: {filename}")

check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
check all good
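The comparison above uses `==` between two Series, which raises an error when the two files have different numbers of rows (the `except` branch would then report it). A slightly more direct variant, sketched here with a hypothetical helper `same_keys`, uses `Series.equals`, which simply returns `False` when the lengths, dtypes, or values differ.

```python
import pandas as pd

def same_keys(df, reference, key_columns=("Unnamed: 0", "time")):
    """True if every key column of df matches the reference exactly."""
    return all(df[col].equals(reference[col]) for col in key_columns)

# Small in-memory stand-ins for the weather files
reference = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "time": ["2019-01-01T00:00", "2019-01-01T01:00", "2019-01-01T02:00"],
})
identical = reference.copy()
shorter = reference.iloc[:2]                       # one row missing
shifted = reference.assign(
    time=["2019-01-01T01:00", "2019-01-01T02:00", "2019-01-01T03:00"]
)                                                  # same length, different times

results = [same_keys(d, reference) for d in (identical, shorter, shifted)]
```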


In [76]:
testdf.columns

Index(['Unnamed: 0', 'time', 'temperature_2m_1', 'relative_humidity_2m_1',
       'wind_speed_10m_1', 'wind_direction_10m_1', 'location_1'],
      dtype='object')

In [85]:
# Start from the CSV corresponding to the first label
initialdf = pd.read_csv(os.path.join(destination_folder, "Baie-des-Sables wind farm hourly weather 2019-2024.csv"))
mergeddf = initialdf
for filename in wf["name"]:
    if filename == "Baie-des-Sables wind farm":
        continue  # already loaded as the starting dataframe
    file_path = os.path.join(destination_folder, filename + " hourly weather 2019-2024.csv")
    df = pd.read_csv(file_path)
    # Join on the shared row-number and time columns
    mergeddf = pd.merge(mergeddf, df, on=['Unnamed: 0', 'time'])


mergeddf

Unnamed: 0.1,Unnamed: 0,time,temperature_2m_1,relative_humidity_2m_1,wind_speed_10m_1,wind_direction_10m_1,location_1,temperature_2m_2,relative_humidity_2m_2,wind_speed_10m_2,...,temperature_2m_38,relative_humidity_2m_38,wind_speed_10m_38,wind_direction_10m_38,location_38,temperature_2m_39,relative_humidity_2m_39,wind_speed_10m_39,wind_direction_10m_39,location_39
0,0,2019-01-01T00:00,-7.4,81,17.9,146,Baie-des-Sables wind farm,-13.4,85,11.4,...,-5.6,84,18.8,144,Viger-Denonville wind farm,-4.8,59,13.9,287,Dune-du-Nord wind farm
1,1,2019-01-01T01:00,-7.0,80,18.9,145,Baie-des-Sables wind farm,-13.6,87,12.8,...,-5.8,84,19.2,142,Viger-Denonville wind farm,-4.7,59,10.8,274,Dune-du-Nord wind farm
2,2,2019-01-01T02:00,-7.1,80,21.3,146,Baie-des-Sables wind farm,-14.1,89,13.6,...,-5.8,83,17.9,142,Viger-Denonville wind farm,-4.5,64,12.3,255,Dune-du-Nord wind farm
3,3,2019-01-01T03:00,-6.8,80,22.2,148,Baie-des-Sables wind farm,-12.8,86,15.5,...,-6.0,84,15.0,136,Viger-Denonville wind farm,-4.2,65,10.0,232,Dune-du-Nord wind farm
4,4,2019-01-01T04:00,-6.8,81,22.4,143,Baie-des-Sables wind farm,-11.9,88,16.2,...,-6.3,86,15.8,131,Viger-Denonville wind farm,-3.8,66,13.0,208,Dune-du-Nord wind farm
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52603,52603,2024-12-31T19:00,1.9,85,12.8,219,Baie-des-Sables wind farm,-0.5,90,8.4,...,0.9,84,15.8,222,Viger-Denonville wind farm,2.9,99,23.1,191,Dune-du-Nord wind farm
52604,52604,2024-12-31T20:00,2.2,83,12.3,218,Baie-des-Sables wind farm,-1.1,92,6.5,...,0.4,85,14.4,224,Viger-Denonville wind farm,2.8,98,23.8,196,Dune-du-Nord wind farm
52605,52605,2024-12-31T21:00,1.9,86,13.5,223,Baie-des-Sables wind farm,-2.4,94,5.6,...,-0.1,90,13.5,216,Viger-Denonville wind farm,3.3,94,24.2,204,Dune-du-Nord wind farm
52606,52606,2024-12-31T22:00,1.4,87,13.6,215,Baie-des-Sables wind farm,-2.4,95,6.2,...,-0.7,94,11.8,203,Viger-Denonville wind farm,3.4,94,23.9,209,Dune-du-Nord wind farm
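Because the earlier check confirmed that every file shares the same 'Unnamed: 0' and 'time' values, the repeated `pd.merge` could also be expressed as one `pd.concat` along the columns. The sketch below is an alternative under that assumption, not the method used in this notebook; the helper name `combine_on_keys` is hypothetical, and the sample values are the first rows shown in the output above.

```python
import pandas as pd

def combine_on_keys(frames, keys=("Unnamed: 0", "time")):
    """Align frames on the key columns and place their other columns side by side."""
    indexed = [frame.set_index(list(keys)) for frame in frames]
    # join="inner" keeps only the key values present in every frame,
    # which mirrors the default inner join of pd.merge
    return pd.concat(indexed, axis=1, join="inner").reset_index()

times = ["2019-01-01T00:00", "2019-01-01T01:00", "2019-01-01T02:00"]
a = pd.DataFrame({"Unnamed: 0": [0, 1, 2], "time": times,
                  "temperature_2m_1": [-7.4, -7.0, -7.1]})
b = pd.DataFrame({"Unnamed: 0": [0, 1, 2], "time": times,
                  "temperature_2m_2": [-13.4, -13.6, -14.1]})

combined = combine_on_keys([a, b])
partial = combine_on_keys([a, b.iloc[:2]])  # second frame is missing its last row
```

With a few dozen files, a single concatenation avoids rebuilding the growing dataframe on every loop iteration.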


In [87]:
# whoops, I forgot to remove the location columns (one per wind farm, labels 1 to 39)
locations = [f"location_{i}" for i in range(1, 40)]
mergeddf = mergeddf.drop(columns=locations)
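The list of location columns above hard-codes the label range 1 to 39. A sketch that does not depend on the number of wind farms selects the columns by their `location_` prefix instead; the helper name `drop_location_columns` is hypothetical.

```python
import pandas as pd

def drop_location_columns(df, prefix="location_"):
    """Return df without the columns whose name starts with `prefix`."""
    to_drop = [col for col in df.columns if str(col).startswith(prefix)]
    return df.drop(columns=to_drop)

# Small in-memory stand-in for the merged dataframe
sample = pd.DataFrame({
    "time": ["2019-01-01T00:00"],
    "temperature_2m_1": [-7.4],
    "location_1": ["Baie-des-Sables wind farm"],
    "temperature_2m_39": [-4.8],
    "location_39": ["Dune-du-Nord wind farm"],
})
cleaned = drop_location_columns(sample)
```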


In [88]:
mergeddf

Unnamed: 0.1,Unnamed: 0,time,temperature_2m_1,relative_humidity_2m_1,wind_speed_10m_1,wind_direction_10m_1,temperature_2m_2,relative_humidity_2m_2,wind_speed_10m_2,wind_direction_10m_2,...,wind_speed_10m_37,wind_direction_10m_37,temperature_2m_38,relative_humidity_2m_38,wind_speed_10m_38,wind_direction_10m_38,temperature_2m_39,relative_humidity_2m_39,wind_speed_10m_39,wind_direction_10m_39
0,0,2019-01-01T00:00,-7.4,81,17.9,146,-13.4,85,11.4,79,...,18.2,141,-5.6,84,18.8,144,-4.8,59,13.9,287
1,1,2019-01-01T01:00,-7.0,80,18.9,145,-13.6,87,12.8,79,...,18.7,140,-5.8,84,19.2,142,-4.7,59,10.8,274
2,2,2019-01-01T02:00,-7.1,80,21.3,146,-14.1,89,13.6,73,...,19.9,135,-5.8,83,17.9,142,-4.5,64,12.3,255
3,3,2019-01-01T03:00,-6.8,80,22.2,148,-12.8,86,15.5,85,...,20.9,134,-6.0,84,15.0,136,-4.2,65,10.0,232
4,4,2019-01-01T04:00,-6.8,81,22.4,143,-11.9,88,16.2,90,...,17.3,135,-6.3,86,15.8,131,-3.8,66,13.0,208
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52603,52603,2024-12-31T19:00,1.9,85,12.8,219,-0.5,90,8.4,259,...,7.5,188,0.9,84,15.8,222,2.9,99,23.1,191
52604,52604,2024-12-31T20:00,2.2,83,12.3,218,-1.1,92,6.5,268,...,7.6,180,0.4,85,14.4,224,2.8,98,23.8,196
52605,52605,2024-12-31T21:00,1.9,86,13.5,223,-2.4,94,5.6,272,...,7.2,167,-0.1,90,13.5,216,3.3,94,24.2,204
52606,52606,2024-12-31T22:00,1.4,87,13.6,215,-2.4,95,6.2,277,...,4.8,153,-0.7,94,11.8,203,3.4,94,23.9,209


In [89]:
#save this dataframe
mergeddf.to_csv('windfarmweatherdata_notinUTC.csv', index=False)

In [103]:
# Convert the time column to UTC; the time column is currently in GMT-4
# (parse the merged dataframe's own 'time' column, not the last file loaded into df)
mergeddf['time'] = pd.to_datetime(mergeddf['time'])
mergeddf['time'] = mergeddf['time'].dt.tz_localize('Etc/GMT+4')
mergeddf['time'] = mergeddf['time'].dt.tz_convert('UTC')
mergeddf['time'] = mergeddf['time'].dt.tz_localize(None)
mergeddf

Unnamed: 0.1,Unnamed: 0,time,temperature_2m_1,relative_humidity_2m_1,wind_speed_10m_1,wind_direction_10m_1,temperature_2m_2,relative_humidity_2m_2,wind_speed_10m_2,wind_direction_10m_2,...,wind_direction_10m_37,temperature_2m_38,relative_humidity_2m_38,wind_speed_10m_38,wind_direction_10m_38,temperature_2m_39,relative_humidity_2m_39,wind_speed_10m_39,wind_direction_10m_39,time_utc
0,0,2019-01-01 04:00:00,-7.4,81,17.9,146,-13.4,85,11.4,79,...,141,-5.6,84,18.8,144,-4.8,59,13.9,287,2019-01-01 04:00:00+00:00
1,1,2019-01-01 05:00:00,-7.0,80,18.9,145,-13.6,87,12.8,79,...,140,-5.8,84,19.2,142,-4.7,59,10.8,274,2019-01-01 05:00:00+00:00
2,2,2019-01-01 06:00:00,-7.1,80,21.3,146,-14.1,89,13.6,73,...,135,-5.8,83,17.9,142,-4.5,64,12.3,255,2019-01-01 06:00:00+00:00
3,3,2019-01-01 07:00:00,-6.8,80,22.2,148,-12.8,86,15.5,85,...,134,-6.0,84,15.0,136,-4.2,65,10.0,232,2019-01-01 07:00:00+00:00
4,4,2019-01-01 08:00:00,-6.8,81,22.4,143,-11.9,88,16.2,90,...,135,-6.3,86,15.8,131,-3.8,66,13.0,208,2019-01-01 08:00:00+00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52603,52603,2024-12-31 23:00:00,1.9,85,12.8,219,-0.5,90,8.4,259,...,188,0.9,84,15.8,222,2.9,99,23.1,191,2024-12-31 23:00:00+00:00
52604,52604,2025-01-01 00:00:00,2.2,83,12.3,218,-1.1,92,6.5,268,...,180,0.4,85,14.4,224,2.8,98,23.8,196,2025-01-01 00:00:00+00:00
52605,52605,2025-01-01 01:00:00,1.9,86,13.5,223,-2.4,94,5.6,272,...,167,-0.1,90,13.5,216,3.3,94,24.2,204,2025-01-01 01:00:00+00:00
52606,52606,2025-01-01 02:00:00,1.4,87,13.6,215,-2.4,95,6.2,277,...,153,-0.7,94,11.8,203,3.4,94,23.9,209,2025-01-01 02:00:00+00:00
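The zone name 'Etc/GMT+4' can look backwards: in the IANA `Etc/` zones the sign is inverted, so 'Etc/GMT+4' means four hours behind UTC. The small sketch below repeats the three conversion steps on the first and last timestamps of the table and should reproduce the values shown above. The helper name `local_to_utc_naive` is hypothetical, and the fixed GMT-4 offset (no daylight saving) is taken from the notebook's own comment rather than verified against the data source.

```python
import pandas as pd

def local_to_utc_naive(times, tz="Etc/GMT+4"):
    """Interpret naive timestamps as `tz`, convert to UTC, then drop the tz info."""
    localized = pd.to_datetime(times).dt.tz_localize(tz)
    return localized.dt.tz_convert("UTC").dt.tz_localize(None)

sample = pd.Series(["2019-01-01T00:00", "2024-12-31T22:00"])
converted = local_to_utc_naive(sample)
```

If the source timestamps followed local Québec time with daylight saving instead, a regional zone would be needed and the fixed offset would be wrong for part of the year.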


In [104]:
mergeddf.to_csv('windfarmweatherdata_inUTC.csv', index=False)