#### Development of employment figures in relation to gender in Austria from 2013 to 2022
University of Salzburg; IPSDI W24/25  
Last update: 07/01/2025  
Adana Mirzoyan and Ethel Ogallo  

Pre-processing and Processing  
Data and sources:  
* NUTS regions shapefile from Eurostat [https://ec.europa.eu/eurostat/web/gisco/geodata/statistical-units/territorial-units-statistics]
* Classification of rural and urban areas at NUTS 3 level from Statistik Austria [https://www.statistik.at/atlas/?mapid=topo_stadt_land&languageid=1]
* Employment data by gender from Statistik Austria [https://www.statistik.at/atlas/?mapid=topo_stadt_land&languageid=1]

In [1]:
# libraries
import geopandas as gpd
import pandas as pd
import os

Load the data downloaded from the sources for processing. View the structure of each dataset to identify which columns are needed and which can be dropped.

In [3]:
# Austria NUTS regions
nuts_filepath = "Data/NUTS_RG_20M_2024_3857.shp/NUTS_RG_20M_2024_3857.shp" #change to relative path
nuts = gpd.read_file(nuts_filepath)

# Filter for NUTS 3 region
nuts3 = nuts[(nuts['LEVL_CODE'] == 3) & (nuts['CNTR_CODE'] == 'AT')] 

# keep relevant columns 
nuts3_df = nuts3[['NUTS_ID', 'NUTS_NAME','geometry']]
print(nuts3_df.head(5))

   NUTS_ID                NUTS_NAME  \
93   AT111         Mittelburgenland   
94   AT112           Nordburgenland   
95   AT113            Südburgenland   
96   AT121  Mostviertel-Eisenwurzen   
97   AT122     Niederösterreich-Süd   

                                             geometry  
93  POLYGON ((1853048.743 6015277.17, 1841177.225 ...  
94  POLYGON ((1824661.549 6087262.951, 1841177.225...  
95  POLYGON ((1829397.983 5999870.843, 1831889.589...  
96  POLYGON ((1694063.316 6072668.513, 1679599.198...  
97  POLYGON ((1707514.893 6088554.48, 1733109.016 ...  


The NUTS 3 regions are classified into three classes: PU (Predominantly Urban), IN (Intermediate) and PR (Predominantly Rural). We simplify the classification into two classes, Rural and Urban, by counting PU and IN as Urban and PR as Rural.

In [5]:
# Rural-Urban classification at NUTS 3 level
region_filepath = "Data/classifications_of_urban_and_rural_areas.xlsx"
region_df = pd.read_excel(region_filepath, header = 2)

# Rename classifications
region_df['region'] = region_df['Value'].replace({'PU': 'Urban', 'IN': 'Urban', 'PR': 'Rural'})

# keep relevant columns 
region_df = region_df[['ID', 'Name','region']]
print(region_df.head(5))

      ID                     Name region
0  AT111         Mittelburgenland  Rural
1  AT112           Nordburgenland  Rural
2  AT113            Südburgenland  Rural
3  AT121  Mostviertel-Eisenwurzen  Rural
4  AT122     Niederösterreich-Süd  Urban
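
The reclassification can also be written so that an unexpected code fails loudly. With `replace`, a code outside PU/IN/PR would pass through unchanged and end up as a third value in `region`. The following is a minimal sketch, assuming those three codes are the only valid ones. The helper name `reclassify` and the sample rows are illustrative and not part of the original notebook.

```python
import pandas as pd

# Mapping taken from the notebook: intermediate regions count as urban.
CLASS_MAP = {"PU": "Urban", "IN": "Urban", "PR": "Rural"}

def reclassify(values: pd.Series) -> pd.Series:
    """Map PU/IN/PR codes to Urban/Rural and raise on any other value."""
    unknown = set(values.unique()) - set(CLASS_MAP)
    if unknown:
        raise ValueError(
            f"Unexpected classification codes: {sorted(map(str, unknown))}"
        )
    return values.map(CLASS_MAP)

# Hypothetical sample rows, not the real Statistik Austria table
sample = pd.DataFrame({"ID": ["X1", "X2", "X3"], "Value": ["PU", "IN", "PR"]})
sample["region"] = reclassify(sample["Value"])
print(sample)
```

In the notebook this would replace the `replace` call with `region_df['region'] = reclassify(region_df['Value'])`.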


The employment data by gender was downloaded as a separate file per year and gender for the 10-year period. We merged these files into one comprehensive dataset, with the year and gender of each file stored as columns.

In [6]:
# Employment data by gender at political district level

# file path of the folder with all the Excel files
employment_data_folder = "Data/employment data tables"

# List to store individual DataFrames
dataframes = []

# Load all Excel files
for file in os.listdir(employment_data_folder):
    if file.endswith(".xlsx"):
        filepath = os.path.join(employment_data_folder, file)
        df = pd.read_excel(filepath, header=2)  
        # Extract year and gender from the filename
        year = int(file.split('_')[5].replace(".xlsx", ""))  # Extract year
        gender = file.split('_')[4]  # Extract 'total', 'men', or 'women' (renamed below)
        df['Year'] = year
        df['Gender'] = gender
        dataframes.append(df)

# Concatenate all DataFrames into one
employment_df = pd.concat(dataframes, ignore_index=True)

# Rename gender
employment_df['Gender'] = employment_df['Gender'].replace({'men': 'Male', 'women': 'Female'})

# Drop the percentage column and rename the absolute count
employment_df = employment_df.drop(columns=['%'])
employment_df = employment_df.rename(columns={'abs.': 'num_employed'})

# Check the result
print(employment_df.head())

    ID                 Name  num_employed  Year Gender
0  101    Eisenstadt(Stadt)          3249  2013   Male
1  102          Rust(Stadt)           459  2013   Male
2  103  Eisenstadt-Umgebung         10529  2013   Male
3  104              Güssing          6479  2013   Male
4  105          Jennersdorf          4581  2013   Male
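
The loop above reads year and gender from fixed positions in the file name, so a file with a different naming pattern would either raise an error or be labelled incorrectly. The sketch below shows one way to isolate and validate that step. It assumes that gender and year are always the last two underscore-separated tokens, which matches positions 4 and 5 when a name has exactly six tokens. The function name and the example file name are hypothetical.

```python
from pathlib import Path

def parse_employment_filename(filename: str) -> tuple[int, str]:
    """Return (year, gender) from a name ending in '<gender>_<year>.xlsx'."""
    tokens = Path(filename).stem.split("_")
    if len(tokens) < 2:
        raise ValueError(f"Cannot parse year and gender from {filename!r}")
    gender, year_text = tokens[-2], tokens[-1]
    # Require a four-digit year so a misplaced token is caught early
    if not (year_text.isdigit() and len(year_text) == 4):
        raise ValueError(f"No four-digit year at the end of {filename!r}")
    return int(year_text), gender

# Hypothetical file name following the assumed pattern
print(parse_employment_filename("a_b_c_d_women_2013.xlsx"))
```

Keeping the parsing in one function also makes it easy to adapt if the download naming scheme changes.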


The employment dataset reports figures at district level. In order to classify the employment figures as rural or urban, we aggregated them to NUTS 3 regions. Each district is assigned to one NUTS 3 region, identified by its NUTS 3 code.

In [7]:
# Get all unique district names from the 'Name' column
unique_names = employment_df['Name'].unique()

# Display the unique names
print(unique_names)

['Eisenstadt(Stadt)' 'Rust(Stadt)' 'Eisenstadt-Umgebung' 'Güssing'
 'Jennersdorf' 'Mattersburg' 'Neusiedl am See' 'Oberpullendorf' 'Oberwart'
 'Klagenfurt Stadt' 'Villach Stadt' 'Hermagor' 'Klagenfurt Land'
 'Sankt Veit an der Glan' 'Spittal an der Drau' 'Villach Land'
 'Völkermarkt' 'Wolfsberg' 'Feldkirchen' 'Krems an der Donau(Stadt)'
 'Sankt Pölten(Stadt)' 'Waidhofen an der Ybbs(Stadt)'
 'Wiener Neustadt(Stadt)' 'Amstetten' 'Baden' 'Bruck an der Leitha'
 'Gänserndorf' 'Gmünd' 'Hollabrunn' 'Horn' 'Korneuburg' 'Krems(Land)'
 'Lilienfeld' 'Melk' 'Mistelbach' 'Mödling' 'Neunkirchen'
 'Sankt Pölten(Land)' 'Scheibbs' 'Tulln' 'Waidhofen an der Thaya'
 'Wiener Neustadt(Land)' 'Zwettl' 'Stadt Linz' 'Stadt Steyr' 'Stadt Wels'
 'Braunau' 'Eferding' 'Freistadt' 'Gmunden' 'Grieskirchen' 'Kirchdorf'
 'Linz-Land' 'Perg' 'Ried' 'Rohrbach' 'Schärding' 'Steyr-Land'
 'Urfahr-Umgebung' 'Vöcklabruck' 'Wels-Land' 'Salzburg(Stadt)' 'Hallein'
 'Salzburg-Umgebung' 'St. Johann im Pongau' 'Tamsweg' 'Zell am S

In [8]:
# District aggregation to NUTS 3 mapping 
district_to_nuts3 = {
    # Burgenland
    "Eisenstadt(Stadt)": "AT112", "Rust(Stadt)": "AT112", "Eisenstadt-Umgebung": "AT112", "Güssing": "AT113",
    "Jennersdorf": "AT113", "Mattersburg": "AT112", "Neusiedl am See": "AT112", "Oberpullendorf": "AT111", 
    "Oberwart": "AT113",

    # Kärnten (Carinthia)
    "Klagenfurt Stadt": "AT211", "Villach Stadt": "AT211", "Hermagor": "AT212", "Klagenfurt Land": "AT211", 
    "Sankt Veit an der Glan": "AT213", "Spittal an der Drau": "AT212", "Villach Land": "AT211", "Völkermarkt": "AT213", 
    "Wolfsberg": "AT213", "Feldkirchen": "AT212",

    # Niederösterreich (Lower Austria)
    "Krems an der Donau(Stadt)": "AT124", "Sankt Pölten(Stadt)": "AT123", "Waidhofen an der Ybbs(Stadt)": "AT121", 
    "Wiener Neustadt(Stadt)": "AT122", "Amstetten": "AT121", "Baden": "AT127", "Bruck an der Leitha": "AT127", 
    "Gänserndorf": "AT126", "Gmünd": "AT124", "Hollabrunn": "AT125", "Horn": "AT124", "Korneuburg": "AT126", 
    "Krems(Land)": "AT124", "Lilienfeld": "AT122", "Melk": "AT121", "Mistelbach": "AT125", "Mödling": "AT127", 
    "Neunkirchen": "AT122", "Sankt Pölten(Land)": "AT123", "Scheibbs": "AT121", "Tulln": "AT126", "Waidhofen an der Thaya": "AT124", 
    "Wiener Neustadt(Land)": "AT122", "Zwettl": "AT124",

    # Oberösterreich (Upper Austria)
    "Stadt Linz": "AT312", "Stadt Steyr": "AT314", "Stadt Wels": "AT312", "Braunau": "AT311", "Eferding": "AT312", 
    "Freistadt": "AT313", "Gmunden": "AT315", "Grieskirchen": "AT311", "Kirchdorf": "AT314", "Linz-Land": "AT312", 
    "Perg": "AT313", "Ried": "AT311", "Rohrbach": "AT313", "Schärding": "AT311", "Steyr-Land": "AT314", 
    "Urfahr-Umgebung": "AT313", "Vöcklabruck": "AT315", "Wels-Land": "AT312",

    # Salzburg
    "Salzburg(Stadt)": "AT323", "Hallein": "AT323", "Salzburg-Umgebung": "AT323", "St. Johann im Pongau": "AT322", 
    "Tamsweg": "AT321", "Zell am See": "AT322",

    # Steiermark (Styria)
    "Graz(Stadt)": "AT221", "Deutschlandsberg": "AT225", "Graz-Umgebung": "AT221", "Leibnitz": "AT225", 
    "Leoben": "AT223", "Liezen": "AT222", "Murau": "AT226", "Voitsberg": "AT225", "Weiz": "AT224", 
    "Murtal": "AT226", "Bruck-Mürzzuschlag": "AT223", "Hartberg-Fürstenfeld": "AT224", "Südoststeiermark": "AT224",

    # Tirol (Tyrol)
    "Innsbruck-Stadt": "AT332", "Imst": "AT334", "Innsbruck-Land": "AT332", "Kitzbühel": "AT335", "Kufstein": "AT335", 
    "Landeck": "AT334", "Lienz": "AT333", "Reutte": "AT331", "Schwaz": "AT335",

    # Vorarlberg
    "Bludenz": "AT341", "Bregenz": "AT341", "Dornbirn": "AT342", "Feldkirch": "AT342",

    # Wien (Vienna)
    "Wien(Stadt)": "AT130"
}


# Map districts to NUTS 3 regions
employment_df['NUTS3_ID'] = employment_df['Name'].map(district_to_nuts3)
#print(employment_df.head(5))

# Group by NUTS 3 regions and sum employment figures
# (select num_employed so the district 'ID' and 'Name' columns are not aggregated)
employment_nuts3_df = employment_df.groupby(['NUTS3_ID', 'Year', 'Gender'], as_index=False)['num_employed'].sum()
print(employment_nuts3_df.head(5))


  NUTS3_ID  Year  Gender  num_employed
0    AT111  2013  Female          7816
1    AT111  2013    Male          9332
2    AT111  2013   total         17148
3    AT111  2014  Female          7746
4    AT111  2014    Male          9254
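
`Series.map` returns `NaN` for any district name that is missing from the dictionary, and `groupby` drops rows with a `NaN` key by default. A misspelled or renamed district would therefore vanish from the totals without any error. The sketch below is one possible check to run before grouping. The helper name and the sample data are illustrative only. The mapping excerpt uses two entries from the dictionary above plus a deliberately misspelled name.

```python
import pandas as pd

def unmapped_districts(df: pd.DataFrame, mapping: dict, name_col: str = "Name") -> list:
    """Return the sorted district names in df that have no entry in mapping."""
    names = df[name_col].dropna().unique()
    return sorted(name for name in names if name not in mapping)

# Two entries from the district mapping
mapping_excerpt = {"Güssing": "AT113", "Oberwart": "AT113"}

# Hypothetical sample: the last name has a trailing space and will not match
sample = pd.DataFrame({"Name": ["Güssing", "Oberwart", "Oberwart "]})

missing = unmapped_districts(sample, mapping_excerpt)
if missing:
    print("Districts without a NUTS 3 code:", missing)
```

In the notebook the call would be `unmapped_districts(employment_df, district_to_nuts3)`, and an empty list means every district was assigned.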


The aggregated employment figures, the rural-urban classification and the NUTS 3 geometries were then merged into one dataset for further analysis.

In [9]:
# Merge employment data with regional classification data
merged_df = pd.merge(employment_nuts3_df, region_df, left_on='NUTS3_ID', right_on='ID')
#print(merged_df.head(6))

# Merge with NUTS 3 geometries
combined_gdf = nuts3_df.merge(merged_df, left_on='NUTS_ID', right_on='NUTS3_ID')

#keep only the required columns
combined_gdf = combined_gdf[['NUTS_ID', 'NUTS_NAME','geometry','Year','Gender','num_employed','region']]
print(combined_gdf.head(6))


  NUTS_ID         NUTS_NAME  \
0   AT111  Mittelburgenland   
1   AT111  Mittelburgenland   
2   AT111  Mittelburgenland   
3   AT111  Mittelburgenland   
4   AT111  Mittelburgenland   
5   AT111  Mittelburgenland   

                                            geometry  Year  Gender  \
0  POLYGON ((1853048.743 6015277.17, 1841177.225 ...  2013  Female   
1  POLYGON ((1853048.743 6015277.17, 1841177.225 ...  2013    Male   
2  POLYGON ((1853048.743 6015277.17, 1841177.225 ...  2013   total   
3  POLYGON ((1853048.743 6015277.17, 1841177.225 ...  2014  Female   
4  POLYGON ((1853048.743 6015277.17, 1841177.225 ...  2014    Male   
5  POLYGON ((1853048.743 6015277.17, 1841177.225 ...  2014   total   

   num_employed region  
0          7816  Rural  
1          9332  Rural  
2         17148  Rural  
3          7746  Rural  
4          9254  Rural  
5         17000  Rural  
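
Both merges use the default inner join, which silently discards rows whose key has no match in the other table. As a hedged sketch of how this could be checked, `pandas.merge` accepts `validate` to assert the expected key relationship and `indicator` to mark where each row came from. The frames below are small stand-ins. The code `AT999` is made up to show a non-matching key.

```python
import pandas as pd

# Stand-in for the aggregated employment table (AT999 is a made-up code)
employment = pd.DataFrame({
    "NUTS3_ID": ["AT111", "AT111", "AT999"],
    "num_employed": [7816, 9332, 1],
})

# Stand-in for the classification table: one row per NUTS 3 region
regions = pd.DataFrame({"ID": ["AT111"], "region": ["Rural"]})

checked = employment.merge(
    regions,
    left_on="NUTS3_ID",
    right_on="ID",
    how="left",               # keep unmatched employment rows for inspection
    validate="many_to_one",   # raises if a region ID appears twice on the right
    indicator=True,           # adds a '_merge' column
)

# Keys present in the employment data but absent from the classification
unmatched = checked.loc[checked["_merge"] == "left_only", "NUTS3_ID"].unique()
print(unmatched)
```

If `unmatched` is empty, the inner join used in the notebook loses no employment rows.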


In [10]:
# Save as a GeoPackage
combined_gdf.to_file("Data/austria_employment_data.gpkg", driver="GPKG")


Data validation

In [11]:
# load the merged dataset (same relative path it was saved to above)
gdf = gpd.read_file("Data/austria_employment_data.gpkg")

# keep only features with valid geometries
gdf = gdf[gdf.is_valid]
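
Beyond geometry validity, the attribute values can be checked for internal consistency. The sketch below assumes that the `total` series is meant to equal the sum of `Female` and `Male` for each region and year, which holds for the AT111 figures printed earlier. The function name is hypothetical. The sample reuses those printed AT111 values.

```python
import pandas as pd

def gender_sum_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Return region-year rows where Female + Male differs from total."""
    wide = df.pivot_table(
        index=["NUTS_ID", "Year"],
        columns="Gender",
        values="num_employed",
        aggfunc="sum",
    )
    diff = wide["Female"] + wide["Male"] - wide["total"]
    # A missing gender gives NaN, which also compares unequal to 0
    return wide[diff != 0]

# AT111 figures for 2013 and 2014 as printed in the merged output
sample = pd.DataFrame({
    "NUTS_ID": ["AT111"] * 6,
    "Year": [2013] * 3 + [2014] * 3,
    "Gender": ["Female", "Male", "total"] * 2,
    "num_employed": [7816, 9332, 17148, 7746, 9254, 17000],
})

print(gender_sum_mismatches(sample))
```

Applied to the full dataset as `gender_sum_mismatches(gdf)`, an empty result indicates that the three series agree for every region and year.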