# About

This script reads the Electricity Demand components from the ETYS Spatial CSV file, which is created by running the SAS script "Write_ETYS-Demands-CSV.sas", and processes them into the CSV files required for the visualisation map.

## Setup

In [1]:
#Import all libraries and set root directory location
import shutil, os, json, pandas as pd, math
os.chdir("C:/Users/thomas.laskowski/Documents/Python/regional-fes-resources-master")

In [2]:
#Delete the Output folder and all its contents (to remove any old data).
dir_path = r".\ETYS data\Output"

output_folders = ["Active" , "DG", "DSR" ,"Sub1MW"]
            
try:
    shutil.rmtree(dir_path)
except OSError as e:
    print("Error: %s : %s" % (dir_path, e.strerror))

# We create a new empty Output folder.
try:
    os.mkdir(dir_path)
except OSError:
    print ("WARNING: Creation of the directory %s failed" % dir_path)
else:
    print ("Successfully created the directory %s " % dir_path)
    for items in output_folders:
        path = os.path.join(dir_path, items)
        os.mkdir(path)

Successfully created the directory .\ETYS data\Output 
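
The folder reset above can also be written more compactly. The following is a sketch only: `shutil.rmtree(..., ignore_errors=True)` tolerates a missing folder on a first run, and `os.makedirs(..., exist_ok=True)` creates the parent folder and each subfolder in one call. The helper name `reset_output_dir` and the temporary demo location are assumptions made for illustration and are not part of the notebook.

```python
import os
import shutil
import tempfile


def reset_output_dir(dir_path, subfolders):
    """Delete dir_path if it exists, then recreate it with empty subfolders."""
    # ignore_errors=True: a missing folder (e.g. on the first run) is not an error
    shutil.rmtree(dir_path, ignore_errors=True)
    os.makedirs(dir_path, exist_ok=True)
    for name in subfolders:
        os.makedirs(os.path.join(dir_path, name), exist_ok=True)
    return sorted(os.listdir(dir_path))


# Demonstration in a temporary location rather than the real Output folder
demo_root = os.path.join(tempfile.mkdtemp(), "Output")
created = reset_output_dir(demo_root, ["Active", "DG", "DSR", "Sub1MW"])
print(created)
```

Unlike the cell above, this variant stays silent when the folder does not exist yet, which may or may not be what you want for an audit trail.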


## Get list of regions in the spatial visualisation

In [3]:
#Get the list of Regions in the spatial visualisation
with open(r".\Geographies\GSPs 2019\GSP_post.geojson") as f:
    data = json.load(f) 

# Write one region per line under a "Region" header; the file is closed automatically
with open("./ETYS data/GSPs_VisualisationList.csv", "wt") as fout:
    fout.write("Region\n")
    for feature in data['features']:
        fout.write(feature['properties']['GSP(s)'] + '\n')

#Read resultant CSV in as GSP_Regions dataframe
GSP_regions = pd.read_csv(r".\ETYS data\GSPs_VisualisationList.csv")

GSP_regions

Unnamed: 0,Region
0,ABHA1
1,ABNE_P
2,ABTH_1
3,ACTL_2;CBNK_H;GREE_H;PERI_H
4,ALNE_P
...,...
312,WOHI_P
313,WTHU31
314,WWEY_1
315,WYLF_1
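
If the intermediate `GSPs_VisualisationList.csv` file is not needed elsewhere (an assumption; it may well be used by other tools), the same data frame can be built straight from the parsed GeoJSON. The sketch below uses a small hand-written dictionary in place of the real `GSP_post.geojson` content, and the name `GSP_regions_demo` is illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary returned by json.load(f)
data = {
    "features": [
        {"properties": {"GSP(s)": "ABHA1"}},
        {"properties": {"GSP(s)": "ACTL_2;CBNK_H;GREE_H;PERI_H"}},
    ]
}

# One row per map region, taken from each feature's "GSP(s)" property
regions = [feature["properties"]["GSP(s)"] for feature in data["features"]]
GSP_regions_demo = pd.DataFrame({"Region": regions})
print(GSP_regions_demo)
```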


In [4]:
#explode GSP_regions into grouped_regions

grouped_regions = pd.concat([pd.Series(row['Region'], row['Region'].split(';'))              
                    for _, row in GSP_regions.iterrows()]).reset_index()    

grouped_regions

Unnamed: 0,index,0
0,ABHA1,ABHA1
1,ABNE_P,ABNE_P
2,ABTH_1,ABTH_1
3,ACTL_2,ACTL_2;CBNK_H;GREE_H;PERI_H
4,CBNK_H,ACTL_2;CBNK_H;GREE_H;PERI_H
...,...,...
354,WOHI_P,WOHI_P
355,WTHU31,WTHU31
356,WWEY_1,WWEY_1
357,WYLF_1,WYLF_1
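
The same GSP-to-region lookup can be produced with `str.split` followed by `DataFrame.explode`, which avoids iterating over rows and gives the columns meaningful names. This is a sketch on made-up rows; the column name `GSP` and the variable names are assumptions, whereas the notebook's own lookup uses the columns `index` and `0`.

```python
import pandas as pd

# Two illustrative regions: one single-GSP region and one grouped region
GSP_regions_demo = pd.DataFrame({"Region": ["ABHA1", "ACTL_2;CBNK_H"]})

grouped_demo = (
    GSP_regions_demo
    .assign(GSP=GSP_regions_demo["Region"].str.split(";"))  # list of GSPs per region
    .explode("GSP")                                         # one row per GSP
    .reset_index(drop=True)
)

# Dictionary view of the lookup: GSP -> region it belongs to
gsp_to_region = dict(zip(grouped_demo["GSP"], grouped_demo["Region"]))
print(grouped_demo)
```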


## Read input data CSVs

We drop unused columns as soon as possible. This prevents us from using them later without first considering what corrections need to be applied to them.
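
An alternative way to enforce the same rule is to allow-list the wanted columns at read time with the `usecols` parameter of `pd.read_csv`, so the other columns never enter the data frame. The sketch below reads from an in-memory string; the column order and the values in that string are invented for illustration, and only the column names come from this notebook.

```python
import io

import pandas as pd

# Invented one-row sample; the real active.csv layout may differ
csv_text = (
    "scenario,GSP,DemandPk,DemandAM,DemandPM,type,year\n"
    "SP,ABHA1,79.890,1.0,2.0,C,20\n"
)

wanted = ["scenario", "GSP", "DemandPk", "type", "year"]
df_demo = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(df_demo.columns.tolist())
```

With `usecols`, a column added to the input file in future is ignored by default, whereas the `drop(columns=...)` approach would silently let it through.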

In [5]:
#pd.options.mode.chained_assignment = None  # default='warn'

#pd.options.mode.chained_assignment = "warn"

# Read in the CSVs to dataframes
df_active_csv = pd.read_csv(r".\ETYS data\Input\active.csv")
df_active_csv = df_active_csv.drop(columns=['DemandAM', 'DemandPM'])
print(df_active_csv.head(10))

df_DG_csv = pd.read_csv(r".\ETYS data\Input\DG.csv")
df_DG_csv = df_DG_csv.drop(columns=['wintpk', 'summam', 'summpm'])
print(df_DG_csv.head(10))

df_Sub1MW_csv = pd.read_csv(r".\ETYS data\Input\Sub1MW.csv")
df_Sub1MW_csv = df_Sub1MW_csv.drop(columns=['wintpk', 'summam', 'summpm'])
print(df_Sub1MW_csv.head(10))

df_DSR_csv = pd.read_csv(r".\ETYS data\Input\DSR.csv")
print(df_DSR_csv.head(10))

  scenario    GSP  DemandPk type  year
0       SP  ABHA1    79.890    C    20
1       SP  ABHA1    81.231    C    21
2       SP  ABHA1    82.074    C    22
3       SP  ABHA1    83.289    C    23
4       SP  ABHA1    84.134    C    24
5       SP  ABHA1    84.998    C    25
6       SP  ABHA1    85.526    C    26
7       SP  ABHA1    85.903    C    27
8       SP  ABHA1    86.262    C    28
9       SP  ABHA1    86.619    C    29
  scenario   tech  year etys_location  capacity
0       CF  Hydro    20        ALNE_P      4.00
1       CF  Hydro    20        ARDK_P      5.20
2       CF  Hydro    20        ARMO_P      1.20
3       CF  Hydro    20        BEAU_P      1.99
4       CF  Hydro    20        BOAG_P      6.95
5       CF  Hydro    20        BRAC_P      3.00
6       CF  Hydro    20        BROA_P      1.20
7       CF  Hydro    20        CAAD_P      2.40
8       CF  Hydro    20          CAFA     12.00
9       CF  Hydro    20        CASS_P      4.50
  scenario     tech  year etys_location  ca

## Apply corrections
The ETYS spatial demand data has some edge case changes that we need to reverse out. These are:

- Several Scottish GSPs are split in the ETYS data. These are the "G_EXTRA" locations and we will need to set them back to their Elexon registered GSP for the data visualisation.
- There are cases of new GSPs added in the future taking a proportion of demand from other (existing) GSPs. As our visualisation does not show future new GSPs, we need to reverse the logic and add the demands back to their present GSP.
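
The arithmetic behind the second correction can be sketched as follows. If a future GSP takes a share `s` of an existing GSP's demand, the ETYS figure for the existing GSP is `original * (1 - s)`, so dividing by `(1 - s)` restores the original. The function name `restore_demand` and the 100 MW figure are made up for illustration; the 0.13 share is the one this notebook applies to LINM.

```python
def restore_demand(reported_mw, share_moved):
    """Undo a proportional transfer of demand to a future GSP."""
    if not 0 <= share_moved < 1:
        raise ValueError("share_moved must be in the range [0, 1)")
    return reported_mw / (1 - share_moved)


original = 100.0                       # illustrative demand before the split, in MW
reported = original * (1 - 0.13)       # what the ETYS data shows after the split
restored = restore_demand(reported, 0.13)
print(round(restored, 3))
```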

In [6]:
#Active
df_active = df_active_csv.copy()
df_active.loc[(df_active.GSP == 'G_EXTRA_1'),'GSP']='DUMF'
df_active.loc[(df_active.GSP == 'G_EXTRA_2'),'GSP']='DUMF'
df_active.loc[(df_active.GSP == 'G_EXTRA_3'),'GSP']='DUMF'
df_active.loc[(df_active.GSP == 'G_EXTRA_4'),'GSP']='GRMO'
df_active.loc[(df_active.GSP == 'G_EXTRA_5'),'GSP']='GRMO'
df_active.loc[(df_active.GSP == 'G_EXTRA_6'),'GSP']='KILB'
df_active.loc[(df_active.GSP == 'G_EXTRA_7'),'GSP']='KILB'
df_active.loc[(df_active.GSP == 'G_EXTRA_8'),'GSP']='SACO'
df_active.loc[(df_active.GSP == 'G_EXTRA_9'),'GSP']='SACO'
df_active.loc[(df_active.GSP == 'G_EXTRA_10'),'GSP']='CROO'
df_active.loc[(df_active.GSP == 'G_EXTRA_11'),'GSP']='CROO'
df_active.loc[(df_active.GSP == 'DUNB_A'),'GSP']='DUNB'
df_active.loc[(df_active.GSP == 'DUNB_B'),'GSP']='DUNB'

##Reverse out G_EXTRA_13
df_active.loc[(df_active.GSP == 'KINT_P') & (df_active.year >= 21),'DemandPk'] = df_active.DemandPk / (1 - 0.1164)
df_active.loc[(df_active.GSP == 'KEIT_P') & (df_active.year >= 21),'DemandPk'] = df_active.DemandPk / (1 - 0.14457)

##Reverse out G_EXTRA_14
#Nothing to do for FES 2021.

##Reverse out G_EXTRA_15
df_active.loc[(df_active.GSP == 'LINM') & (df_active.year >= 25),'DemandPk'] = df_active.DemandPk / (1 - 0.13)

##Reverse out G_EXTRA_16
df_active.loc[(df_active.GSP == 'CHAP') & (df_active.year >= 23),'DemandPk'] = df_active.DemandPk / (1 - 0.42)

##Reverse out ISLI_1
#Commented out as we have actually added ISLI_1 to the geojson, making the approximate assumption that it splits only from WHAM_1.
#df_active.loc[(df_active.GSP == 'LODR_6') & (df_active.year >= 22),'DemandPk'] = df_active.DemandPk / (1 - 0.25)
#df_active.loc[(df_active.GSP == 'WHAM_1') & (df_active.year >= 22),'DemandPk'] = df_active.DemandPk / (1 - 0.45)


#Convert year column from YY to YYYY format
df_active.loc[:,'year'] =  (df_active.loc[:,'year'] + 2000).astype(int)

In [7]:
#DG
df_DG = df_DG_csv.copy()
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_1'),'etys_location']='DUMF'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_2'),'etys_location']='DUMF'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_3'),'etys_location']='DUMF'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_4'),'etys_location']='GRMO'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_5'),'etys_location']='GRMO'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_6'),'etys_location']='KILB'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_7'),'etys_location']='KILB'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_8'),'etys_location']='SACO'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_9'),'etys_location']='SACO'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_10'),'etys_location']='CROO'
df_DG.loc[(df_DG.etys_location == 'G_EXTRA_11'),'etys_location']='CROO'
df_DG.loc[(df_DG.etys_location == 'DUNB_A'),'etys_location']='DUNB'
df_DG.loc[(df_DG.etys_location == 'DUNB_B'),'etys_location']='DUNB'

##Reverse out others
#We don't have the ability to do that at this point in the analysis as we have aggregated technology types together.
#TODO: Look at our upstream processes and change them so that we produce the visualisation data at that stage.

#Renaming etys_location column to GSP
df_DG.rename(columns = {'etys_location': 'GSP'}, inplace = True)

#Convert year column from YY to YYYY format
df_DG.loc[:,'year'] =  (df_DG.loc[:,'year'] + 2000).astype(int)

In [8]:
#Sub1MW
df_Sub1MW = df_Sub1MW_csv.copy()
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_1'),'etys_location']='DUMF'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_2'),'etys_location']='DUMF'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_3'),'etys_location']='DUMF'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_4'),'etys_location']='GRMO'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_5'),'etys_location']='GRMO'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_6'),'etys_location']='KILB'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_7'),'etys_location']='KILB'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_8'),'etys_location']='SACO'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_9'),'etys_location']='SACO'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_10'),'etys_location']='CROO'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'G_EXTRA_11'),'etys_location']='CROO'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'DUNB_A'),'etys_location']='DUNB'
df_Sub1MW.loc[(df_Sub1MW.etys_location == 'DUNB_B'),'etys_location']='DUNB'

##Reverse out others
#We don't have the ability to do that at this point in the analysis as we have aggregated technology types together.
#TODO: Look at our upstream processes and change them so that we produce the visualisation data at that stage.

#Renaming etys_location column to GSP
df_Sub1MW.rename(columns = {'etys_location': 'GSP'}, inplace = True)

#Convert year column from YY to YYYY format
df_Sub1MW.loc[:,'year'] =  (df_Sub1MW.loc[:,'year'] + 2000).astype(int)

In [9]:
#DSR
df_DSR = df_DSR_csv.copy()
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_1'),'GSP']='DUMF'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_2'),'GSP']='DUMF'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_3'),'GSP']='DUMF'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_4'),'GSP']='GRMO'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_5'),'GSP']='GRMO'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_6'),'GSP']='KILB'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_7'),'GSP']='KILB'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_8'),'GSP']='SACO'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_9'),'GSP']='SACO'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_10'),'GSP']='CROO'
df_DSR.loc[(df_DSR.GSP == 'G_EXTRA_11'),'GSP']='CROO'
df_DSR.loc[(df_DSR.GSP == 'DUNB_A'),'GSP']='DUNB'
df_DSR.loc[(df_DSR.GSP == 'DUNB_B'),'GSP']='DUNB'

##Reverse out others
#TODO.

#Convert year column from YY to YYYY format
df_DSR.loc[:,'year'] =  (df_DSR.loc[:,'year'] + 2000).astype(int)
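
The four cells above repeat the same thirteen renames. One possible refactoring, shown only as a sketch, keeps the mapping in a single dictionary and applies it with `Series.replace`, which matches whole values, so `G_EXTRA_1` cannot accidentally match `G_EXTRA_10`. The names `GSP_RENAMES` and `rename_split_gsps` are hypothetical; the mapping values are copied from the cells above.

```python
import pandas as pd

# ETYS location -> Elexon registered GSP, as used in the cells above
GSP_RENAMES = {
    "G_EXTRA_1": "DUMF", "G_EXTRA_2": "DUMF", "G_EXTRA_3": "DUMF",
    "G_EXTRA_4": "GRMO", "G_EXTRA_5": "GRMO",
    "G_EXTRA_6": "KILB", "G_EXTRA_7": "KILB",
    "G_EXTRA_8": "SACO", "G_EXTRA_9": "SACO",
    "G_EXTRA_10": "CROO", "G_EXTRA_11": "CROO",
    "DUNB_A": "DUNB", "DUNB_B": "DUNB",
}


def rename_split_gsps(df, column):
    """Return a copy of df with split GSPs mapped back to their registered GSP."""
    out = df.copy()
    out[column] = out[column].replace(GSP_RENAMES)
    return out


# Illustrative rows only
demo = pd.DataFrame({"GSP": ["G_EXTRA_1", "ABHA1", "G_EXTRA_10", "DUNB_B"]})
renamed = rename_split_gsps(demo, "GSP")
print(renamed["GSP"].tolist())
```

For the DG and Sub1MW frames the same helper would be called with `"etys_location"` as the column, before that column is renamed to `GSP`.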

In [10]:
df_DSR

Unnamed: 0,scenario,GSP,DSR,year
0,SP,ABHA1,-4.347,2020
1,SP,ABHA1,-4.574,2021
2,SP,ABHA1,-4.795,2022
3,SP,ABHA1,-5.013,2023
4,SP,ABHA1,-5.213,2024
...,...,...,...,...
45379,LW,WYLF_1,-9.728,2046
45380,LW,WYLF_1,-9.826,2047
45381,LW,WYLF_1,-9.900,2048
45382,LW,WYLF_1,-9.975,2049


## Produce outputs

This step takes the input data frames and merges them with the grouped_regions data frame, aggregating the GSPs to their regions.

After merging, it pivots the data so that years run across the top and regions are in the leftmost column.

Built-in QA checks flag to the user if any GSPs have not been assigned a region, or if a region is missing data from one or more GSPs.

The cells below also write the output data files.
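
The merge, check and pivot sequence can be illustrated on a toy data set. The sketch below uses the `indicator=True` option of `DataFrame.merge`, which is a different technique from the string-based checks in the cells that follow: it labels each row `left_only`, `right_only` or `both`, so unmatched rows on either side can be listed directly. All GSP names, values and variable names here are invented.

```python
import pandas as pd

# Toy demand data: X9 has no region in the lookup
demand = pd.DataFrame({
    "GSP": ["A1", "A2", "B1", "X9"],
    "year": [2020, 2020, 2020, 2020],
    "DemandPk": [1.0, 2.0, 3.0, 4.0],
})
# Toy lookup: C1 has no demand data
lookup = pd.DataFrame({
    "GSP": ["A1", "A2", "B1", "C1"],
    "Region": ["A1;A2", "A1;A2", "B1", "C1"],
})

merged = demand.merge(lookup, on="GSP", how="outer", indicator=True)

# QA: GSPs with data but no region, and regions with a GSP that has no data
unassigned_gsps = sorted(merged.loc[merged["_merge"] == "left_only", "GSP"])
regions_missing_gsps = sorted(merged.loc[merged["_merge"] == "right_only", "Region"])

# Pivot the matched rows: regions down the side, years across the top
matched = merged[merged["_merge"] == "both"].astype({"year": int})
pivot = pd.pivot_table(matched, index="Region", columns="year",
                       values="DemandPk", aggfunc="sum")
print(unassigned_gsps, regions_missing_gsps)
print(pivot)
```

In this toy case the demand at `X9` never reaches the pivot because it has no region, which is the kind of data loss the warnings below are meant to surface.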


In [11]:
#Active.csv

for category in ["C", "D", "E", "H", "I", "R", "Z"]:
    for scenario in ["SP", "CT", "ST", "LW"]:
        # Create df as a filter of df_csv
        
        df = df_active[(df_active['scenario'] == scenario) & (df_active['type'] == category)]
        
        #Drop transmission demand
        df = df[~df.type.str.contains("T")]
    
        # Merge df on grouped_regions. Merge type 'outer' to prevent dropping of data from 
        # df_active and grouped_regions dataframes. Drop the index column and rename column 0 to Region.
        df = df.merge(grouped_regions, left_on = "GSP", right_on = "index", how = 'outer').drop(columns = ["index"]).rename(columns = {0:"Region"})

        # Perform checks on df merged with grouped_regions

        # Creating new column containing GSPs which have not been assigned regions in the merge
        df['Region_check'] = df['Region'].fillna(df['GSP'])
        df["Region_check"] = df.apply(lambda x: x["Region_check"].replace(str(x["Region"]), "").strip(), axis=1)


        # Creating new column containing Regions which are missing one or more GSPs
        df['GSP_check'] = df['GSP'].fillna(df['Region'])
        df["GSP_check"] = df.apply(lambda x: x["GSP_check"].replace(str(x["GSP"]), "").strip(), axis=1)

        #Checking which Regions and which GSPs are missing values
        check_Region_for_NaN = df['Region'].isna()
        check_GSP_for_NaN = df['GSP'].isna()
        NaN_Region_count = check_Region_for_NaN.sum()
        NaN_GSP_count = check_GSP_for_NaN.sum()

        #Printing warning messages to user.
        if NaN_Region_count > 0 and NaN_GSP_count > 0:
            print('Warning: There are blanks in both GSP and Region columns after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned a correct region and that some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
            print('\n')
            print('The GSPs in Scenario ' + scenario + ' and Demand Type ' + category + ' which have not been assigned regions are: ')
            print(df.Region_check.unique())
            print('\n')
            print('The regions in Scenario ' + scenario + ' and Demand Type ' + category + ' with missing GSPs are: ')
            print(df.GSP_check.unique())
            print('\n')
        elif NaN_Region_count > 0 and NaN_GSP_count == 0:
            print('Warning: There are blanks in the Region column after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned to their correct region. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
            print('\n')
            print('The GSPs in Scenario ' + scenario + ' and Demand Type ' + category + ' which have not been assigned regions are: ')
            print(df.Region_check.unique())
            print('\n')
        elif NaN_Region_count == 0 and NaN_GSP_count > 0:
            print('There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
            print('\n')
            print('The Regions in Scenario ' + scenario + ' and Demand Type ' + category + ' which are missing GSPs are:')
            print(df.GSP_check.unique())
            print('\n')
        else:
            print('Data is okay. Please proceed.')         
        
        
        #converting years from float to integer
        df['year'] = df['year'].astype('Int64')
                   
        # Pivot to have years across the top
        df = pd.pivot_table(df, index='Region', columns= 'year', values='DemandPk', aggfunc = 'sum')
        
        
        # Export to CSV
        df.index.name = 'Primary'
        filename = scenario + "-DemandPk-" + category + ".csv"
        
        df.to_csv(os.path.join(dir_path, "Active", filename), index=True, float_format='%.3f')



The GSPs in Scenario SP and Demand Type C which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario SP and Demand Type C with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario CT and Demand Type C which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario CT and Demand Type C with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario ST and Demand Type C which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario ST and Demand Type C with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario LW and Demand Type C which have not been ass



The GSPs in Scenario CT and Demand Type H which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario CT and Demand Type H with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario ST and Demand Type H which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario ST and Demand Type H with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario LW and Demand Type H which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario LW and Demand Type H with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario SP and Demand Type I which have not been ass



The GSPs in Scenario ST and Demand Type Z which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario ST and Demand Type Z with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario LW and Demand Type Z which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario LW and Demand Type Z with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




In [12]:
#Active.csv (Aggregated)

for scenario in ["SP", "CT", "ST", "LW"]:
    # Create df as a filter of df_csv
    df = df_active[(df_active['scenario'] == scenario)]

    #Drop transmission demand
    df = df[~df.type.str.contains("T")]
    
    # Merge df on grouped_regions. Merge type 'outer' to prevent dropping of data from 
    # df_active and grouped_regions dataframes. Drop the index column and rename column 0 to Region.
    df = df.merge(grouped_regions, left_on = "GSP", right_on = "index", how = 'outer').drop(columns = ["index"]).rename(columns = {0:"Region"})

    # Perform checks on df merged with grouped_regions

    # Creating new column containing GSPs which have not been assigned regions in the merge
    df['Region_check'] = df['Region'].fillna(df['GSP'])
    df["Region_check"] = df.apply(lambda x: x["Region_check"].replace(str(x["Region"]), "").strip(), axis=1)


    # Creating new column containing Regions which are missing one or more GSPs
    df['GSP_check'] = df['GSP'].fillna(df['Region'])
    df["GSP_check"] = df.apply(lambda x: x["GSP_check"].replace(str(x["GSP"]), "").strip(), axis=1)

    #Checking which Regions and which GSPs are missing values
    check_Region_for_NaN = df['Region'].isna()
    check_GSP_for_NaN = df['GSP'].isna()
    NaN_Region_count = check_Region_for_NaN.sum()
    NaN_GSP_count = check_GSP_for_NaN.sum()

    #Printing warning messages to user.
    if NaN_Region_count > 0 and NaN_GSP_count > 0:
        print('Warning: There are blanks in both GSP and Region columns after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned a correct region and that some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
        print('\n')
        print('The GSPs in Scenario: ' + scenario + ' which have not been assigned regions are: ')
        print(df.Region_check.unique())
        print('\n')
        print('The regions in Scenario: ' + scenario + ' with missing GSPs are: ')
        print(df.GSP_check.unique())
        print('\n')
    elif NaN_Region_count > 0 and NaN_GSP_count == 0:
        print('Warning: There are blanks in the Region column after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned to their correct region. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
        print('\n')
        print('The GSPs in Scenario: ' + scenario + ' which have not been assigned regions are: ')
        print(df.Region_check.unique())
        print('\n')
    elif NaN_Region_count == 0 and NaN_GSP_count > 0:
        print('There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.') 
        print('\n')
        print('The Regions in Scenario: ' + scenario + ' which are missing GSPs are:')
        print(df.GSP_check.unique())
        print('\n')      
    else:
        print('Data is okay. Please proceed.')         
    
    
    df['year'] = df['year'].astype('Int64')
    
    # Pivot to have years across the top
    df = pd.pivot_table(df, index='Region', columns= 'year', values='DemandPk', aggfunc = 'sum')

    # Export to CSV
    df.index.name = 'Primary'
    filename = scenario + "-DemandPk-" + "All" + ".csv"
    df.to_csv(os.path.join(dir_path, "Active", filename), index=True, float_format='%.3f')




The GSPs in Scenario: SP which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario: SP with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario: CT which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario: CT with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario: ST which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario: ST with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12']




The GSPs in Scenario: LW which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario: LW with missing

In [13]:
#DG.csv

for technology in ["Hydro", "Other", "Solar", "Storage", "Wind"]:
    for scenario in ["SP", "CT", "ST", "LW"]:
        # Create df as a filter of df_csv
        df = df_DG[(df_DG['scenario'] == scenario) & (df_DG['tech'] == technology)]
          
        # Merge df on grouped_regions. Merge type 'outer' to prevent dropping of data from 
        # df_DG and grouped_regions dataframes. Drop the index column and rename column 0 to Region.
        df = df.merge(grouped_regions, left_on = "GSP", right_on = "index", how = 'outer').drop(columns = ["index"]).rename(columns = {0:"Region"})
        
        
        # Perform checks on df merged with grouped_regions
        
        # Creating new column containing GSPs which have not been assigned regions in the merge
        df['Region_check'] = df['Region'].fillna(df['GSP'])
        df["Region_check"] = df.apply(lambda x: x["Region_check"].replace(str(x["Region"]), "").strip(), axis=1)


        # Creating new column containing Regions which are missing one or more GSPs
        df['GSP_check'] = df['GSP'].fillna(df['Region'])
        df["GSP_check"] = df.apply(lambda x: x["GSP_check"].replace(str(x["GSP"]), "").strip(), axis=1)

        #Checking which Regions and which GSPs are missing values
        check_Region_for_NaN = df['Region'].isna()
        check_GSP_for_NaN = df['GSP'].isna()
        NaN_Region_count = check_Region_for_NaN.sum()
        NaN_GSP_count = check_GSP_for_NaN.sum()

        #Printing warning messages to user.
        if NaN_Region_count > 0 and NaN_GSP_count > 0:
            print('Warning: There are blanks in both GSP and Region columns after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned a correct region and that some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
            print('\n')
            print('The GSPs in Scenario ' + scenario + ' and Technology Type ' + technology + ' which have not been assigned regions are: ')
            print(df.Region_check.unique())
            print('\n')
            print('The regions in Scenario ' + scenario + ' and Technology Type ' + technology + ' with missing GSPs are: ')
            print(df.GSP_check.unique())
            print('\n')
        elif NaN_Region_count > 0 and NaN_GSP_count == 0:
            print('Warning: There are blanks in the Region column after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned to their correct region. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
            print('\n')
            print('The GSPs in Scenario ' + scenario + ' and Technology Type ' + technology + ' which have not been assigned regions are: ')
            print(df.Region_check.unique())
            print('\n')
        elif NaN_Region_count == 0 and NaN_GSP_count > 0:
            print('There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.') 
            print('\n')
            print('The Regions in Scenario ' + scenario + ' and Technology Type ' + technology + ' which are missing GSPs are:')
            print(df.GSP_check.unique())
            print('\n')
        else:
            print('Data is okay. Please proceed.')        
        
        df['year'] = df['year'].astype('Int64')
        
        # Pivot to have years across the top
        df = pd.pivot_table(df, index='Region', columns= 'year', values='capacity', aggfunc = 'sum')
        
        #Export to CSV
        df.index.name = 'Primary'
        filename = scenario + "-DxCapacity-" + technology + ".csv"
        df.to_csv(os.path.join(dir_path, "DG", filename), index=True, float_format='%.3f')

There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.


The Regions in Scenario SP and Technology Type Hydro which are missing GSPs are:
['' 'ABHA1' 'ABNE_P' 'ABTH_1' 'ACTL_2;CBNK_H;GREE_H;PERI_H' 'ALVE1'
 'AMEM_1' 'ARBR_P' 'AXMI1' 'AYRR' 'BAGA' 'BAIN' 'BARKC1;BARKW3'
 'BEAU_P;ORRI_P' 'BEDDT1' 'BEDD_1' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P'
 'BERW' 'BESW_1' 'BICF_1' 'BIRK_1' 'BISW_1' 'BLYTB1' 'BLYTH132' 'BOLN_1'
 'BONN' 'BOTW_1' 'BRAI_1' 'BRAP' 'BRAW_1' 'BRED_1' 'BRFO_1;CLT03' 'BRID_P'
 'BRIM_1' 'BROR_P' 'BROX' 'BRWA1' 'BUMU_P' 'BURM_1' 'BUSH_1' 'BUST_1'
 'CAMB_01' 'CANTN1;RICH1;RICH_J' 'CAPEA1' 'CARE_1' 'CARR_1' 'CATY'
 'CELL_1' 'CHAP' 'CHAR_P' 'CHAS' 'CHIC_1' 'CHSI_1' 'CHTE_1' 'CITR_1'
 'CLAY_P' 'CLYM' 'COAT' 'COCK' 'CONQA1;SASA' 'COUA_P' 'COVE_1'
 'COWL_1;ECLA_H' 'COYL' 'CRAI_P' 'CREB_1' 'CROO' 'CUMB' 'CU

There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.


The Regions in Scenario SP and Technology Type Other which are missing GSPs are:
['' 'ACTL_2;CBNK_H;GREE_H;PERI_H' 'ARBR_P' 'ARDK_P;CLAC_P' 'AYRR' 'BAGA'
 'BAIN' 'BEAU_P;ORRI_P' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'BERW'
 'BLYTH132' 'BOAG_P' 'BRAC_P' 'BRAP' 'BRFO_1;CLT03' 'BROA_P' 'BROR_P'
 'BROX' 'BUMU_P' 'CAAD_P' 'CANTN1;RICH1;RICH_J' 'CASS_P' 'CEAN_P' 'CHAP'
 'CHAS' 'COAT' 'COCK' 'CONQA1;SASA' 'COUA_P' 'COWL_1;ECLA_H' 'CRAI_P'
 'CROO' 'CURR' 'DEVM' 'DOUN_P' 'DRCR' 'DRUM' 'DUBE_P' 'DUGR_P' 'DUMF'
 'ECCL' 'EERH' 'EKIL' 'ELGI_P' 'ERSK' 'FASN_P' 'FAUG_P' 'FERRB1;FERRB_M'
 'FETT_P;KINT_P' 'FIDD_P' 'FINN' 'FOUR_1' 'FRAS_P' 'FWIL_P' 'GALA' 'GIFF'
 'GLNI' 'GORG' 'GRUB_P' 'G_EXTRA_12' 'HAGR' 'HARK_1;HUTT_1;RRIG' 'HARM_6'
 'HAWI' 'HELE' 'HEYS_1;HEYS1;ORMO' 

There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.


The Regions in Scenario LW and Technology Type Solar which are missing GSPs are:
['' 'AYRR' 'BEAU_P;ORRI_P' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'BRAP'
 'BRID_P' 'BROR_P' 'CANTN1;RICH1;RICH_J' 'CEAN_P' 'CHAR_P' 'COCK'
 'CONQA1;SASA' 'COWL_1;ECLA_H' 'CRAI_P' 'CURR' 'DEWP' 'DRUM' 'DUBE_P'
 'EERH' 'ELDE' 'ERSK' 'FERRB1;FERRB_M' 'FETT_P;KINT_P' 'FINN' 'GORG'
 'GOVA' 'G_EXTRA_12' 'HAGR' 'HARK_1;HUTT_1;RRIG' 'HELE'
 'HEYS_1;HEYS1;ORMO' 'KINL_P' 'LYND_P' 'NAIR_P' 'NORM_1;SALL1' 'PAIS'
 'PART' 'PEHG_P' 'PEHS_P;STRI_P' 'PENW_1;STAH_1;WABO' 'SANX' 'SHIN_P'
 'SHRU' 'SPAV' 'WASF_1;KIBY_G' 'WGEO' 'WHAM_1;ISLI_1' 'WHHO' 'WIOW_P'
 'WISH;RAVE' 'WOHI_P']




The GSPs in Scenario SP and Technology Type Storage which have not been assigned regions are: 
['' 'G_EXTRA_13'



The GSPs in Scenario LW and Technology Type Storage which have not been assigned regions are: 
['' 'G_EXTRA_13']


The regions in Scenario LW and Technology Type Storage with missing GSPs are: 
['' 'ABNE_P' 'ABTH_1' 'ACTL_2;CBNK_H;GREE_H;PERI_H' 'ALNE_P' 'ARBR_P'
 'ARDK_P;CLAC_P' 'AYRR' 'BAGA' 'BAIN' 'BARKC1;BARKW3' 'BEAU_P;ORRI_P'
 'BEDDT1' 'BEDD_1' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'BERW' 'BIRK_1'
 'BLYTB1' 'BLYTH132' 'BOAG_P' 'BRAP' 'BRFO_1;CLT03' 'BRID_P' 'BRIM_1'
 'BROA_P' 'BROR_P' 'BROX' 'BRWA1' 'BUSH_1' 'CAAD_P' 'CAMB_01'
 'CANTN1;RICH1;RICH_J' 'CARE_1' 'CARR_1' 'CASS_P' 'CATY' 'CEAN_P' 'CHAP'
 'CHAR_P' 'CHAS' 'CHSI_1' 'CITR_1' 'CLAY_P' 'COAT' 'COCK' 'CONQA1;SASA'
 'COUA_P' 'COWL_1;ECLA_H' 'COYL' 'CRAI_P' 'CROO' 'CUMB' 'CUPA' 'CURR'
 'DEVM' 'DEVO' 'DEWP' 'DOUN_P' 'DRAX_1' 'DRUM' 'DUBE_P' 'DUDH_P' 'DUGR_P'
 'DUNB' 'DUNF' 'DUNO_P' 'EALI_6' 'ECCL' 'EERH' 'EKIL' 'EKIS' 'ELDE'
 'ELGI_P' 'ELLA_1' 'FASN_P' 'FAUG_P' 'FAWL_1' 'FERRA2' 'FERRB1;FERRB_M'
 'FETT_P;KINT_P' 'FIDD_P' 'FIDF



The GSPs in Scenario LW and Technology Type Wind which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_15']


The regions in Scenario LW and Technology Type Wind with missing GSPs are: 
['' 'ABHA1' 'ABTH_1' 'ACTL_2;CBNK_H;GREE_H;PERI_H' 'ARBR_P'
 'ARDK_P;CLAC_P' 'AXMI1' 'AYRR' 'BARKC1;BARKW3' 'BEAU_P;ORRI_P' 'BEDDT1'
 'BEDD_1' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'BESW_1' 'BISW_1' 'BOAG_P'
 'BOLN_1' 'BOTW_1' 'BRAC_P' 'BRAP' 'BRED_1' 'BRFO_1;CLT03' 'BRID_P'
 'BRIM_1' 'BROA_P' 'BROR_P' 'BROX' 'BRWA1' 'BUMU_P' 'BUST_1'
 'CANTN1;RICH1;RICH_J' 'CAPEA1' 'CARR_1' 'CASS_P' 'CATY' 'CEAN_P' 'CELL_1'
 'CHAR_P' 'CHAS' 'CHIC_1' 'CHSI_1' 'CITR_1' 'CLAY_P' 'CLYM' 'COAT' 'COCK'
 'CONQA1;SASA' 'COWL_1;ECLA_H' 'CRAI_P' 'CUPA' 'CURR' 'DALM3' 'DEWP'
 'DRAK_1' 'DRCR' 'DRUM' 'DUBE_P' 'DUDH_P' 'DUGR_P' 'DUNF' 'DUNO_P'
 'EALI_6' 'ECLA_1' 'EERH' 'EKIL' 'ELDE' 'ELGI_P' 'ELST_1' 'ERSK' 'EXET1'
 'FASN_P' 'FAUG_P' 'FAWL_1' 'FERRB1;FERRB_M' 'FETT_P;KINT_P' 'FIDF_1'
 'FINN' 'FLEE_1;BRLE_1' 'FWIL_P' 

In [14]:
#Sub1MW.csv

for technology in ["Hydro", "Other", "Solar", "Battery", "Wind"]:
    for scenario in ["SP", "CT", "ST", "LW"]:
        # Create df as a filter of df_csv
        df = df_Sub1MW[(df_Sub1MW['scenario'] == scenario) & (df_Sub1MW['tech'] == technology)]
        
        
        # Merge df on grouped_regions. Merge type 'outer' to prevent dropping of data from 
        # df_Sub1MW and grouped_regions dataframes. Drop the index column and rename column 0 to Region.
        df = df.merge(grouped_regions, left_on = "GSP", right_on = "index", how = 'outer').drop(columns = ["index"]).rename(columns = {0:"Region"})
        
        
        # Perform checks on df merged with grouped_regions
        
        # Creating new column containing GSPs which have not been assigned regions in the merge
        df['Region_check'] = df['Region']
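        # Assign the result back: an inplace fillna on an attribute-accessed column may not update df in newer pandas
        df['Region_check'] = df['Region_check'].fillna(df['GSP'])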
        df["Region_check"] = df.apply(lambda x: x["Region_check"].replace(str(x["Region"]), "").strip(), axis=1)


        # Creating new column containing Regions which are missing one or more GSPs
        df['GSP_check'] = df['GSP']
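        # Assign the result back: an inplace fillna on an attribute-accessed column may not update df in newer pandas
        df['GSP_check'] = df['GSP_check'].fillna(df['Region'])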
        df["GSP_check"] = df.apply(lambda x: x["GSP_check"].replace(str(x["GSP"]), "").strip(), axis=1)

        #Checking which Regions and which GSPs are missing values
        check_Region_for_NaN = df['Region'].isna()
        check_GSP_for_NaN = df['GSP'].isna()
        NaN_Region_count = check_Region_for_NaN.sum()
        NaN_GSP_count = check_GSP_for_NaN.sum()

        #Printing warning messages to user.
        if NaN_Region_count > 0 and NaN_GSP_count > 0:
            print('Warning: There are blanks in both GSP and Region columns after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned a correct region and that some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.') 
            print('\n')
            print('The GSPs in Scenario ' + scenario + ' and Technology Type ' + technology + ' which have not been assigned regions are: ')
            print(df.Region_check.unique())
            print('\n')
            print('The regions in Scenario ' + scenario + ' and Technology Type ' + technology + ' with missing GSPs are: ')
            print(df.GSP_check.unique())
            print('\n')
        elif NaN_Region_count > 0 and NaN_GSP_count == 0:
            print('Warning: There are blanks in the Region column after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned to their correct region. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
            print('\n')
            print('The GSPs in Scenario ' + scenario + ' and Technology Type ' + technology + ' which have not been assigned regions are: ')
            print(df.Region_check.unique())
            print('\n')
        elif NaN_Region_count == 0 and NaN_GSP_count > 0:
            print('There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
            print('\n')
            print('The Regions in Scenario ' + scenario + ' and Technology Type ' + technology + ' which are missing GSPs are:')
            print(df.GSP_check.unique())
            print('\n')
        else:
            print('Data is okay. Please proceed.')     
       
        df['year'] = df['year'].astype('Int64')
    
        
        # Pivot to have years across the top
        df = pd.pivot_table(df, index='Region', columns= 'year', values='capacity', aggfunc = 'sum')
        
        
        # Export to CSV
        df.index.name = 'Primary'
        if technology == "Battery":
            filename = scenario + "-MxCapacity-" + "Storage" + ".csv"
        else:
            filename = scenario + "-MxCapacity-" + technology + ".csv"
        df.to_csv(r".\ETYS data\Output\Sub1MW\\" + filename, index=True, float_format='%.3f')

There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.


The Regions in Scenario SP and Technology Type Hydro which are missing GSPs are:
['' 'ABTH_1' 'ACTL_2;CBNK_H;GREE_H;PERI_H' 'AMEM_1' 'ARBR_P'
 'ARDK_P;CLAC_P' 'AYRR' 'BAGA' 'BAIN' 'BARKC1;BARKW3' 'BEAU_P;ORRI_P'
 'BEDDT1' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'BERW' 'BICF_1' 'BIRK_1'
 'BLYTB1' 'BOLN_1' 'BOTW_1' 'BRAI_1' 'BRAP' 'BRFO_1;CLT03' 'BRIM_1' 'BROX'
 'BURM_1' 'BUST_1' 'CAMB_01' 'CANTN1;RICH1;RICH_J' 'CAPEA1' 'CARE_1'
 'CATY' 'CEAN_P' 'CHAR_P' 'CHAS' 'CHSI_1' 'CITR_1' 'CLAY_P' 'CLYM' 'COAT'
 'CONQA1;SASA' 'COVE_1' 'COWL_1;ECLA_H' 'CRAI_P' 'CREB_1' 'CROO' 'CUMB'
 'CUPA' 'CURR' 'DALM3' 'DEWP' 'DOUN_P' 'DRCR' 'DRUM' 'DUBE_P' 'DUDH_P'
 'DUGR_P' 'DUNB' 'DUNF' 'EALI_6' 'ECCL' 'ECLA_1' 'EERH' 'EKIS' 'ELDE'
 'ENDE_1' 'ERSK' 'FASN_P' 'FAWL_1' 'FERRA2' 'F

There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.


The Regions in Scenario SP and Technology Type Other which are missing GSPs are:
['' 'ACTL_2;CBNK_H;GREE_H;PERI_H' 'BARKC1;BARKW3' 'BEAU_P;ORRI_P'
 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'BRFO_1;CLT03' 'CANTN1;RICH1;RICH_J'
 'CONQA1;SASA' 'COWL_1;ECLA_H' 'CROO' 'EKIS' 'FASN_P' 'FERRB1;FERRB_M'
 'FETT_P;KINT_P' 'G_EXTRA_12' 'HACK_1;HACK_6' 'HARK_1;HUTT_1;RRIG'
 'HEYS_1;HEYS1;ORMO' 'IMPK_1' 'KEAR_1;KEAR_3' 'KILB' 'NORM_1;SALL1'
 'PEHG_P' 'PEHS_P;STRI_P' 'PENW_1;STAH_1;WABO' 'QUOI_P' 'SACO' 'SALH_1'
 'TONG;CAFA;EAST;GLLE;KEOO' 'TYNE_1;TYNE_2' 'UPPB_1;UPPB_3'
 'USKM_1;ALST_3' 'WHAM_1;ISLI_1' 'WIMBS1;WIMBN1' 'WISD_6;WISD_1;ACTL_C']


There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one

There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.


The Regions in Scenario ST and Technology Type Battery which are missing GSPs are:
['' 'ACTL_2;CBNK_H;GREE_H;PERI_H' 'ARDK_P;CLAC_P' 'BARKC1;BARKW3'
 'BEAU_P;ORRI_P' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'BRFO_1;CLT03'
 'CAMB_01' 'CANTN1;RICH1;RICH_J' 'CASS_P' 'CEAN_P' 'CHAR_P' 'CLYM'
 'CONQA1;SASA' 'COWL_1;ECLA_H' 'DRCR' 'DUBE_P' 'EKIS' 'FASN_P'
 'FERRB1;FERRB_M' 'FETT_P;KINT_P' 'FINN' 'FRAS_P' 'G_EXTRA_12'
 'HACK_1;HACK_6' 'HARK_1;HUTT_1;RRIG' 'HEYS_1;HEYS1;ORMO' 'HUER' 'IMPK_1'
 'KEAR_1;KEAR_3' 'MYBS_P' 'NETS' 'NORM_1;SALL1' 'PEHG_P' 'PEHS_P;STRI_P'
 'PENW_1;STAH_1;WABO' 'QUOI_P' 'SALH_1' 'SANX' 'SFEG_P' 'SLOY_P' 'TEMP_3'
 'TONG;CAFA;EAST;GLLE;KEOO' 'TYNE_1;TYNE_2' 'UPPB_1;UPPB_3'
 'USKM_1;ALST_3' 'WGEO' 'WHAM_1;ISLI_1' 'WIBA_3' 'WIMBS1;WIMBN1'
 'WI
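
The checks in the cell above build `Region_check` and `GSP_check` columns by hand to find rows that did not match in the outer merge. As an alternative, the `indicator=True` option of `DataFrame.merge` labels each row as `both`, `left_only` or `right_only`. The sketch below is a minimal illustration on made-up data, not the notebook's method: `merge_with_report`, `demand` and `regions` are hypothetical names, and the two small frames only imitate the layout of `df_Sub1MW` and `grouped_regions`.

```python
import pandas as pd

# Hypothetical stand-in for a filtered df_Sub1MW (one scenario, one technology).
demand = pd.DataFrame({
    "GSP": ["ABHA1", "ACTL_2", "G_EXTRA_13"],
    "year": [2020, 2020, 2020],
    "capacity": [1.5, 2.0, 0.3],
})

# Hypothetical stand-in for grouped_regions: "index" holds a single GSP,
# column 0 holds the (possibly multi-GSP) region it belongs to.
regions = pd.DataFrame({
    "index": ["ABHA1", "ACTL_2", "CBNK_H"],
    0: ["ABHA1", "ACTL_2;CBNK_H;GREE_H;PERI_H", "ACTL_2;CBNK_H;GREE_H;PERI_H"],
})


def merge_with_report(data, lookup):
    """Outer-merge data onto the region lookup and report unmatched keys."""
    merged = data.merge(
        lookup, left_on="GSP", right_on="index", how="outer", indicator=True
    ).rename(columns={0: "Region"})

    # GSPs present in the data but absent from the region lookup.
    unassigned_gsps = sorted(
        merged.loc[merged["_merge"] == "left_only", "GSP"].unique()
    )
    # GSPs present in the region lookup but with no data rows.
    gsps_without_data = sorted(
        merged.loc[merged["_merge"] == "right_only", "index"].unique()
    )
    merged = merged.drop(columns=["index", "_merge"])
    return merged, unassigned_gsps, gsps_without_data


merged, unassigned_gsps, gsps_without_data = merge_with_report(demand, regions)
print("GSPs without a region:", unassigned_gsps)
print("Lookup GSPs without data:", gsps_without_data)
```

Wrapping the check in one function would also let the Sub1MW and DSR cells share the same logic instead of repeating it.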

In [15]:
#DSR.csv

for scenario in ["SP", "CT", "ST", "LW"]:
    # Create df as a filter of df_DSR for this scenario
    df = df_DSR[(df_DSR['scenario'] == scenario)]
    
    # Merge df on grouped_regions. Merge type 'outer' prevents dropping of data from
    # the df_DSR and grouped_regions dataframes.
    # Drop the index column and rename column 0 to Region.
    df = df.merge(grouped_regions, left_on = "GSP", right_on = "index", how = 'outer').drop(columns = ["index"]).rename(columns = {0:"Region"})
    
    # Perform checks on df merged with grouped_regions
        
    # Creating new column containing GSPs which have not been assigned regions in the merge
    df['Region_check'] = df['Region']
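    # Assign the result back: an inplace fillna on an attribute-accessed column may not update df in newer pandas
    df['Region_check'] = df['Region_check'].fillna(df['GSP'])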
    df["Region_check"] = df.apply(lambda x: x["Region_check"].replace(str(x["Region"]), "").strip(), axis=1)
    
    
    # Creating new column containing Regions which are missing one or more GSPs
    df['GSP_check'] = df['GSP']
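    # Assign the result back: an inplace fillna on an attribute-accessed column may not update df in newer pandas
    df['GSP_check'] = df['GSP_check'].fillna(df['Region'])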
    df["GSP_check"] = df.apply(lambda x: x["GSP_check"].replace(str(x["GSP"]), "").strip(), axis=1)
    
    #Checking which Regions and which GSPs are missing values
    check_Region_for_NaN = df['Region'].isna()
    check_GSP_for_NaN = df['GSP'].isna()
    
    NaN_Region_count = check_Region_for_NaN.sum()
    NaN_GSP_count = check_GSP_for_NaN.sum()
    
    
    #Printing warning messages to user.
    if NaN_Region_count > 0 and NaN_GSP_count > 0:
        print('Warning: There are blanks in both GSP and Region columns after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned a correct region and that some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.') 
        print('\n')
        print('The GSPs in Scenario ' + scenario + '  which have not been assigned regions are: ')
        print(df.Region_check.unique())
        print('\n')
        print('The regions in Scenario ' + scenario + ' with missing GSPs are: ')
        print(df.GSP_check.unique())
        print('\n')
    elif NaN_Region_count > 0 and NaN_GSP_count == 0:
        print('Warning: There are blanks in the Region column after the merge with the grouped_regions data frame. There is a risk some GSPs have not been assigned to their correct region. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
        print('\n')
        print('The GSPs in Scenario ' + scenario + ' which have not been assigned regions are: ')
        print(df.Region_check.unique())
        print('\n')
    elif NaN_Region_count == 0 and NaN_GSP_count > 0:
        print('There are blanks in the GSP column after the merge with the grouped_regions data frame. Some regions are missing data from one or more GSPs. This may lead to a loss of data when data is pivoted. Please check input and output data files.')
        print('\n')
        print('The Regions in Scenario ' + scenario + ' which are missing GSPs are: ')
        print(df.GSP_check.unique())
        print('\n')
    else:
        print('Data is okay. Please proceed.')    
    
    df['year'] = df['year'].astype('Int64')
    
    # Pivot to have years across the top
    df = pd.pivot_table(df, index='Region', columns= 'year', values='DSR', aggfunc = 'sum')

    # Export to CSV
    df.index.name = 'Primary'
    filename = scenario + "-DSR-" + ".csv"
    df.to_csv(r".\ETYS data\Output\DSR\\" + filename, index=True, float_format='%.3f')



The GSPs in Scenario SP  which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario SP with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12' 'WYMOM_1']




The GSPs in Scenario CT  which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario CT with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12' 'WYMOM_1']




The GSPs in Scenario ST  which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions in Scenario ST with missing GSPs are: 
['' 'BERB_P;CAIF_P;DALL_P;GLEF_P;KEIT_P' 'CANTN1;RICH1;RICH_J'
 'COWL_1;ECLA_H' 'G_EXTRA_12' 'WYMOM_1']




The GSPs in Scenario LW  which have not been assigned regions are: 
['' 'G_EXTRA_13' 'G_EXTRA_14' 'G_EXTRA_15' 'G_EXTRA_16']


The regions i
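
The warnings above mention a possible loss of data when the merged frame is pivoted. The reason is that `pd.pivot_table` drops rows whose index key is missing, so any GSP left without a `Region` after the merge does not reach the output CSV. The sketch below shows the effect on a tiny made-up frame. The values are invented for illustration and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up merged frame: the last row is a GSP that received no Region.
merged = pd.DataFrame({
    "Region": ["ABHA1", "ABHA1", np.nan],
    "year": [2020, 2021, 2020],
    "DSR": [1.0, 2.0, 5.0],
})

# Same pivot call shape as in the DSR cell above.
pivoted = pd.pivot_table(
    merged, index="Region", columns="year", values="DSR", aggfunc="sum"
)

# Compare totals before and after the pivot to quantify what was dropped.
total_before = merged["DSR"].sum()
total_after = np.nansum(pivoted.to_numpy())
lost = total_before - total_after

print(pivoted)
print("DSR lost in pivot:", lost)
```

Comparing the two totals in this way is one simple check that could be run before each export.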

# QA

Note: The active dataset also includes transmission sites that are not included in the Regional Visualisation. Their names start with "T_", "M_" or "B_EXTRA".
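
One way to separate these transmission sites during QA is to filter on the name prefixes listed in the note. The sketch below is only an illustration: the `active` frame, its `demand` column and the example site names (`T_EXAMPLE`, `M_EXAMPLE`, `B_EXTRA_1`) are hypothetical, and only the three prefixes come from the note itself.

```python
import pandas as pd

# Prefixes taken from the note above.
TRANSMISSION_PREFIXES = ("T_", "M_", "B_EXTRA")

# Hypothetical sample of the active dataset; site names are invented.
active = pd.DataFrame({
    "GSP": ["ABHA1", "T_EXAMPLE", "M_EXAMPLE", "B_EXTRA_1", "WYLF_1"],
    "demand": [10.0, 3.0, 4.0, 5.0, 7.0],
})

# Python's str.startswith accepts a tuple of prefixes.
is_transmission = active["GSP"].map(
    lambda name: str(name).startswith(TRANSMISSION_PREFIXES)
)

transmission = active[is_transmission]
distribution = active[~is_transmission]

print("Excluded transmission sites:", list(transmission["GSP"]))
print("Demand kept for the visualisation:", distribution["demand"].sum())
```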