German Federal Election Analysis

This Jupyter Notebook analyzes the structural data and election results of the German federal elections. Both datasets are first cleaned and preprocessed so that they can be analyzed consistently. The notebook covers the following steps:

    Import the required libraries
    Load and preprocess structural data
    Rename columns for better readability
    Perform calculations on the data
    Save the cleaned structural data to a CSV file
    Load and preprocess election data
    Rename columns for better readability
    Clean and format the data
    Save the cleaned election data to a CSV file
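Much of the "clean and format" work comes down to parsing numbers written in German format, where '.' separates thousands and ',' marks the decimal point. The sketch below uses a small hypothetical CSV snippet. The column names `Bevoelkerung` and `Anteil` are made up for illustration. It shows two pandas approaches: converting text columns with `str.replace(..., regex=False)`, or passing `thousands='.'` and `decimal=','` to `pd.read_csv`. Whether the second option can be used directly on the raw files is an assumption, because their preamble and multi-row headers may still need the manual handling done in the cells below.

```python
import io

import pandas as pd

# Hypothetical CSV snippet in German number format:
# ';' as delimiter, '.' as thousands separator, ',' as decimal separator.
raw = "Wahlkreis;Bevoelkerung;Anteil\nA;1.234,5;12,5\nB;987,0;4,25\n"
numeric_cols = ["Bevoelkerung", "Anteil"]

# Option 1: read everything as text, then convert each numeric column.
df_text = pd.read_csv(io.StringIO(raw), sep=";", dtype=str)
for col in numeric_cols:
    df_text[col] = (
        df_text[col]
        .str.replace(".", "", regex=False)   # drop thousands separators
        .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
        .astype(float)
    )

# Option 2: let the CSV parser handle both separators directly.
df_parsed = pd.read_csv(io.StringIO(raw), sep=";", thousands=".", decimal=",")

print(df_text)
print(df_parsed.equals(df_text))
```

Passing `regex=False` matters because, as a regular expression, '.' would match every character rather than only the literal dot.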

In [None]:
import pandas as pd

# Load and preprocess structural data
import_data = pd.read_csv('/Users/lutz/Downloads/btw21_structural_data_raw.csv', sep=';')
# Skip the first 7 parsed rows and use the next row as the column headers
structural_data = import_data.iloc[7:, :]
structural_data.columns = structural_data.iloc[0]
structural_data = structural_data.drop(structural_data.index[0]).reset_index(drop=True)

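# Remove unnecessary rows: the state totals ('Land insgesamt') and row 315
# (a hard-coded index that only fits this particular raw file)
rows_to_remove = structural_data[structural_data.iloc[:, 2] == 'Land insgesamt']
structural_data = structural_data.drop(rows_to_remove.index)
structural_data.drop([315], inplace=True)
# Drop the footnotes column ('Fußnoten')
structural_data.drop(['Fußnoten'], inplace=True, axis=1)
structural_data.reset_index(inplace=True, drop=True)
# Renumber the constituencies from 1 to 299
structural_data['Wahlkreis-Nr.'] = list(range(1, 300))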

# Clean and format the data
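# Convert the numeric columns from German number format ('1.234,5') to float:
# remove the '.' thousands separators, then turn the decimal comma into a point.
# regex=False treats '.' as a literal dot instead of the regex wildcard.
slice_1 = structural_data.iloc[:, 3:].apply(
    lambda x: x.str.replace('.', '', regex=False).str.replace(',', '.', regex=False).astype('float')
)
# Keep the first three (identifier) columns unchanged
slice_2 = structural_data.iloc[:, :3]
structural_data = pd.concat([slice_2, slice_1], axis=1)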

# Rename columns for better readability
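column_name_mapping = {
    # Fill in: original column names as keys, shortened names as values.
    # The calculations below expect at least 'Total_Population',
    # 'Foreign_Population_Perc' and 'Vocational_School_Graduates'.
}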

structural_data.rename(columns=column_name_mapping, inplace=True)

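# Perform calculations on the data
# Scale the population by 1000 (the raw value is assumed to be in thousands)
structural_data['Total_Population'] = structural_data['Total_Population'] * 1000
# German share of the population as the complement of the foreign share (in %)
structural_data['German_Population_Perc'] = 100 - structural_data['Foreign_Population_Perc']
# Express vocational school graduates as a percentage of the total population
structural_data['Vocational_School_Graduates'] = structural_data['Vocational_School_Graduates'] / structural_data['Total_Population'] * 100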

# Save the cleaned structural data to a CSV file (without the row index)
structural_data.to_csv('/Users/lutz/Documents/german_federal_elections/structural_data.csv', index=False)

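The cell above removes aggregate rows partly by a hard-coded index (315). That approach silently breaks if the raw file changes. A minimal sketch of a label-based alternative follows. It assumes aggregate rows can be recognized by a label in the third column. The toy data, the column names `Nr` and `Gebiet`, and the federal-total label `'Insgesamt'` are hypothetical; only `'Land insgesamt'` comes from the cell above. The sketch builds one boolean mask and checks the row count before renumbering.

```python
import pandas as pd

# Hypothetical toy data: two constituency rows, one state total and one
# federal total. The label 'Insgesamt' for the federal total is an assumption.
df = pd.DataFrame({
    "Land": ["Land X", "Land X", "Land X", "Bund"],
    "Nr": ["1", "2", "901", "999"],
    "Gebiet": ["Wahlkreis A", "Wahlkreis B", "Land insgesamt", "Insgesamt"],
})

# Labels that mark aggregate rows rather than single constituencies.
AGGREGATE_LABELS = {"Land insgesamt", "Insgesamt"}

# One boolean mask over the third column replaces the hard-coded drops.
is_aggregate = df.iloc[:, 2].isin(AGGREGATE_LABELS)
constituencies = df.loc[~is_aggregate].reset_index(drop=True)

# Fail loudly if the row count differs from what the renumbering expects
# (299 for the full Bundestag data; 2 for this toy example).
EXPECTED_CONSTITUENCIES = 2
if len(constituencies) != EXPECTED_CONSTITUENCIES:
    raise ValueError(f"expected {EXPECTED_CONSTITUENCIES} rows, got {len(constituencies)}")
constituencies["Wahlkreis-Nr."] = range(1, len(constituencies) + 1)
print(constituencies)
```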

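The next cell drops every column in which some cell contains `'Vorperiode'` ("previous period") by looping over the columns. The same selection can also be written as one column-wise `apply` that returns a boolean mask. This sketch uses a hypothetical toy frame shaped like a CSV read with `header=None`. The party labels and row layout are made up for illustration.

```python
import pandas as pd

# Hypothetical frame shaped like a CSV read with header=None: three header
# rows (party, vote type, period) followed by one data row.
raw = pd.DataFrame([
    ["Nr", "Gebiet", "Partei A", None, "Partei B", None],
    [None, None, "Erststimmen", None, "Erststimmen", None],
    [None, None, "Endgültig", "Vorperiode", "Endgültig", "Vorperiode"],
    ["1", "Wahlkreis A", "100", "90", "80", "85"],
])

# For each column, check whether any cell contains 'Vorperiode';
# regex=False matches the plain string rather than a pattern.
has_previous = raw.apply(
    lambda col: col.astype(str).str.contains("Vorperiode", regex=False).any()
).astype(bool)

# Keep only the columns without previous-period values.
current_only = raw.loc[:, ~has_previous]
print(current_only)
```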

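The next cell builds column names from a header spread over three rows. It joins the cells and then strips the string `'nan'`. Stripping `'nan'` would also remove those letters from any real label that contains them. The hedged variant below skips empty cells with `dropna()` before joining, so no `'nan'` text is ever created. It is shown on hypothetical toy data, and `merge_header_rows` is a made-up helper name. Unlike the notebook's version, it keeps any commas that appear inside cell text and only removes spaces.

```python
import pandas as pd


def merge_header_rows(df: pd.DataFrame, n_rows: int) -> pd.DataFrame:
    """Turn the first n_rows of df into column names and return the remaining rows.

    Empty header cells are skipped with dropna() and spaces are removed, so the
    cells 'Partei A', 'Erststimmen', 'Endgültig' become 'ParteiAErststimmenEndgültig'.
    """
    header = df.iloc[:n_rows].apply(
        lambda col: "".join(col.dropna().astype(str)).replace(" ", "")
    )
    body = df.iloc[n_rows:].reset_index(drop=True)
    body.columns = list(header)
    return body


# Hypothetical raw frame read with header=None.
raw = pd.DataFrame([
    ["Nr", "Gebiet", "Partei A", "Partei B"],
    [None, None, "Erststimmen", "Zweitstimmen"],
    [None, None, "Endgültig", "Endgültig"],
    ["1", "Wahlkreis A", "100", "80"],
    ["2", "Wahlkreis B", "120", "95"],
])

merged = merge_header_rows(raw, n_rows=3)
print(merged.columns.tolist())
```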
In [None]:
# Load and preprocess election data
import_data_election = pd.read_csv('/Users/lutz/Downloads/election_data_raw.csv', header=None, sep=';')
# Skip the first two rows and keep only the first 51 columns
election_data = import_data_election.iloc[2:, :51]

# Clean and format the data
# Remove columns with previous-election results, i.e. any column with a cell containing 'Vorperiode' ("previous period")
string_to_search = "Vorperiode"
cols_to_remove = []

for col in election_data.columns:
    if election_data[col].apply(lambda x: string_to_search in str(x)).any():
        cols_to_remove.append(col)

election_data = election_data.drop(columns=cols_to_remove)

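# Merge the first three (header) rows into one name per column: join the cells,
# then strip the 'nan' text of empty cells, spaces and commas
merged_row = election_data.iloc[:3, :].apply(lambda x: ', '.join(x.astype(str)), axis=0)
merged_row = merged_row.str.replace('nan', '', regex=False).str.replace(' ', '', regex=False).str.replace(',', '', regex=False)
election_data.columns = list(merged_row)
# Drop the three header rows (labels 2 to 4, since iloc[2:] kept the original labels)
election_data.drop([2, 3, 4], inplace=True)
# Drop rows 335 and 336 (hard-coded labels that only fit this raw file)
election_data.drop([335, 336], inplace=True)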

# Remove unnecessary rows
rows_to_remove = election_data[election_data.iloc[:, 2
