# Generate Yearly Employment Rate Data by Region

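Filter `annual_employment_rate_cleaned.csv` and generate one data table per year for 2019-2024.

Filter criteria:
- Age Group = 'All ages'
- Sex = 'Both sexes'
- NUTS 2 Region != 'Ireland' (keep only the three specific regions)

One CSV file is generated per year, containing the employment rate for the three regions.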
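
As a quick illustration of the criteria above, here is a minimal sketch on a toy DataFrame. The values are copied from the 2019 rows printed later in this notebook; the helper name `filter_regions` and the variable `sample` are hypothetical and exist only for this example.

```python
import pandas as pd

# Toy stand-in for annual_employment_rate_cleaned.csv
# (values taken from the 2019 rows shown in this notebook)
sample = pd.DataFrame({
    'Year': [2019, 2019, 2019, 2019, 2019],
    'Age Group': ['All ages'] * 5,
    'Sex': ['Both sexes', 'Both sexes', 'Both sexes', 'Both sexes', 'Male'],
    'NUTS 2 Region': ['Ireland', 'Northern and Western', 'Southern',
                      'Eastern and Midland', 'Ireland'],
    'employment_rate': [74.9, 74.0, 72.8, 76.6, 81.3],
})


def filter_regions(df):
    """Keep 'All ages' / 'Both sexes' rows for the three regions (hypothetical helper)."""
    mask = (
        (df['Age Group'] == 'All ages')
        & (df['Sex'] == 'Both sexes')
        & (df['NUTS 2 Region'] != 'Ireland')
    )
    return df[mask]


sample_filtered = filter_regions(sample)

# Each year should be left with exactly three regional rows
rows_per_year = sample_filtered.groupby('Year').size()
assert (rows_per_year == 3).all()
print(sample_filtered)
```

The row count check is a cheap guard: if a region label were misspelled in the source file, a year would end up with fewer than three rows and the assertion would fail.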

In [1]:
import pandas as pd
import os

In [2]:
# Load the cleaned annual employment rate data
df = pd.read_csv('annual_employment_rate_cleaned.csv')

print(f"Original data shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print(f"\nFirst few rows:")
print(df.head(10))

Original data shape: (720, 5)

Columns: ['Year', 'Age Group', 'Sex', 'NUTS 2 Region', 'employment_rate']

First few rows:
   Year Age Group         Sex         NUTS 2 Region  employment_rate
0  2019  All ages  Both sexes               Ireland             74.9
1  2019  All ages  Both sexes  Northern and Western             74.0
2  2019  All ages  Both sexes              Southern             72.8
3  2019  All ages  Both sexes   Eastern and Midland             76.6
4  2019  All ages        Male               Ireland             81.3
5  2019  All ages        Male  Northern and Western             80.3
6  2019  All ages        Male              Southern             79.2
7  2019  All ages        Male   Eastern and Midland             82.9
8  2019  All ages      Female               Ireland             68.7
9  2019  All ages      Female  Northern and Western             67.8


In [3]:
# Filter the data based on conditions
filtered_df = df[
    (df['Age Group'] == 'All ages') & 
    (df['Sex'] == 'Both sexes') & 
    (df['NUTS 2 Region'] != 'Ireland')
]

print(f"Filtered data shape: {filtered_df.shape}")
print(f"\nUnique years: {sorted(filtered_df['Year'].unique().tolist())}")
print(f"\nUnique regions: {filtered_df['NUTS 2 Region'].unique()}")
print(f"\nFiltered data:")
print(filtered_df)

Filtered data shape: (18, 5)

Unique years: [2019, 2020, 2021, 2022, 2023, 2024]

Unique regions: ['Northern and Western' 'Southern' 'Eastern and Midland']

Filtered data:
     Year Age Group         Sex         NUTS 2 Region  employment_rate
1    2019  All ages  Both sexes  Northern and Western             74.0
2    2019  All ages  Both sexes              Southern             72.8
3    2019  All ages  Both sexes   Eastern and Midland             76.6
121  2020  All ages  Both sexes  Northern and Western             70.9
122  2020  All ages  Both sexes              Southern             70.7
123  2020  All ages  Both sexes   Eastern and Midland             73.2
241  2021  All ages  Both sexes  Northern and Western             76.0
242  2021  All ages  Both sexes              Southern             72.9
243  2021  All ages  Both sexes   Eastern and Midland             75.9
361  2022  All ages  Both sexes  Northern and Western     

In [4]:
# Create output directory if it doesn't exist
output_dir = 'yearly_regional_data'
os.makedirs(output_dir, exist_ok=True)

# Generate CSV files for each year (2019-2024)
years = range(2019, 2025)
generated_files = []

for year in years:
    # Filter data for specific year
    year_data = filtered_df[filtered_df['Year'] == year]
    
    # Select only necessary columns: region and employment_rate
    year_data_export = year_data[['NUTS 2 Region', 'employment_rate']].copy()
    
    # Rename column for clarity
    year_data_export = year_data_export.rename(columns={'NUTS 2 Region': 'region'})
    
    # Build the output path in a platform-independent way
    filename = os.path.join(output_dir, f'employment_rate_{year}.csv')
    
    # Export to CSV
    year_data_export.to_csv(filename, index=False)
    
    generated_files.append(filename)
    
    print(f"\n{'='*50}")
    print(f"Year {year}:")
    print(f"  File: {filename}")
    print(f"  Rows: {len(year_data_export)}")
    print(f"  Data:")
    print(year_data_export.to_string(index=False))

print(f"\n\n{'='*50}")
print(f"Successfully generated {len(generated_files)} files!")
print(f"\nGenerated files:")
for f in generated_files:
    print(f"  - {f}")


Year 2019:
  File: yearly_regional_data/employment_rate_2019.csv
  Rows: 3
  Data:
              region  employment_rate
Northern and Western             74.0
            Southern             72.8
 Eastern and Midland             76.6

Year 2020:
  File: yearly_regional_data/employment_rate_2020.csv
  Rows: 3
  Data:
              region  employment_rate
Northern and Western             70.9
            Southern             70.7
 Eastern and Midland             73.2

Year 2021:
  File: yearly_regional_data/employment_rate_2021.csv
  Rows: 3
  Data:
              region  employment_rate
Northern and Western             76.0
            Southern             72.9
 Eastern and Midland             75.9

Year 2022:
  File: yearly_regional_data/employment_rate_2022.csv
  Rows: 3
  Data:
              region  employment_rate
Northern and Western             78.5
            Southern             76.4
 Eastern and Midland             79.0

Year 2023:
  File: yearly_regional_data/employment_rate

In [5]:
# Verify the generated files
print("Verification of generated files:\n")

for year in range(2019, 2025):
    filename = os.path.join(output_dir, f'employment_rate_{year}.csv')
    verify_df = pd.read_csv(filename)
    
    print(f"\n{year}:")
    print(f"  Shape: {verify_df.shape}")
    print(f"  Columns: {list(verify_df.columns)}")
    print(f"  Regions: {verify_df['region'].tolist()}")
    print(f"  Employment rates: {verify_df['employment_rate'].tolist()}")

Verification of generated files:


2019:
  Shape: (3, 2)
  Columns: ['region', 'employment_rate']
  Regions: ['Northern and Western', 'Southern', 'Eastern and Midland']
  Employment rates: [74.0, 72.8, 76.6]

2020:
  Shape: (3, 2)
  Columns: ['region', 'employment_rate']
  Regions: ['Northern and Western', 'Southern', 'Eastern and Midland']
  Employment rates: [70.9, 70.7, 73.2]

2021:
  Shape: (3, 2)
  Columns: ['region', 'employment_rate']
  Regions: ['Northern and Western', 'Southern', 'Eastern and Midland']
  Employment rates: [76.0, 72.9, 75.9]

2022:
  Shape: (3, 2)
  Columns: ['region', 'employment_rate']
  Regions: ['Northern and Western', 'Southern', 'Eastern and Midland']
  Employment rates: [78.5, 76.4, 79.0]

2023:
  Shape: (3, 2)
  Columns: ['region', 'employment_rate']
  Regions: ['Northern and Western', 'Southern', 'Eastern and Midland']
  Employment rates: [78.2, 78.3, 80.7]

2024:
  Shape: (3, 2)
  Columns: ['region', 'employment_rate']
  Regions: ['Northern and Wester

## Summary

Six CSV files were generated (2019-2024), each containing:
- **Columns**: `region`, `employment_rate`
- **Rows**: 3 (one per NUTS 2 region)
- **Regions**: Northern and Western, Southern, Eastern and Midland

The files are saved in the `yearly_regional_data/` directory.
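
If the per-year files later need to be compared side by side, they can be read back and pivoted into one region-by-year table. The sketch below is one possible approach, not part of the original workflow: the function name `combine_yearly_files` is hypothetical, and the demo writes two years of values (copied from the 2019 and 2020 output above) into a temporary directory so that it runs on its own.

```python
import os
import tempfile

import pandas as pd


def combine_yearly_files(directory, years):
    """Read employment_rate_<year>.csv files and return a region x year table.

    Hypothetical helper; assumes each file has 'region' and 'employment_rate' columns.
    """
    frames = []
    for year in years:
        path = os.path.join(directory, f'employment_rate_{year}.csv')
        frame = pd.read_csv(path)
        frame['year'] = year  # the year lives only in the filename, so add it back
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True)
    return combined.pivot(index='region', columns='year', values='employment_rate')


# Demo on a temporary directory so the sketch does not touch real output files
regions = ['Northern and Western', 'Southern', 'Eastern and Midland']
demo = {
    2019: [74.0, 72.8, 76.6],
    2020: [70.9, 70.7, 73.2],
}

with tempfile.TemporaryDirectory() as tmp:
    for year, rates in demo.items():
        pd.DataFrame({'region': regions, 'employment_rate': rates}).to_csv(
            os.path.join(tmp, f'employment_rate_{year}.csv'), index=False)
    wide = combine_yearly_files(tmp, demo.keys())

print(wide)
```

Against the real output, the call would presumably be `combine_yearly_files('yearly_regional_data', range(2019, 2025))`.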