# Ranking 3

Source: BFS (Swiss Federal Statistical Office) and Wikipedia

In [1]:
import pandas as pd
import requests

## Data fetching and cleaning

This ranking pulls its data directly from the Wikipedia page and from the BFS, so no manual download of the data is needed.

In [2]:
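# load data from wikipedia
# NOTE: read_html defaults to thousands=',', so the German decimal comma is dropped
# and e.g. '3,90' is parsed as 390 (hundredths of km²). Assuming every area is listed
# with two decimals, the min-max normalization further down cancels this constant factor.
url_wikipedia = 'https://de.wikipedia.org/wiki/Gemeinden_des_Kantons_Waadt'
df_communities_wikipedia = pd.read_html(url_wikipedia)[0]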

# clean up the data
df_communities_wikipedia = df_communities_wikipedia.rename(columns={
    'Name der Gemeinde': 'community_name',
    'Fläche in km² [1]': 'area_km2',
}).drop(columns=['Wappen', 'Einwohner (31.\xa0Dezember 2022)', 'Einw. pro km²', 'Bezirk (bis 2007)', 'Bezirk (ab 2008)'])
df_communities_wikipedia = df_communities_wikipedia[df_communities_wikipedia['community_name'] != 'Total (300)']

# load data from bfs
url_bfs = 'https://www.atlas.bfs.admin.ch/core/projects/13/xshared/csv/27598_131.csv'
response = requests.get(url_bfs)
# fail loudly instead of silently reading a stale or missing file
response.raise_for_status()
# save the file to the raw/ folder (the folder must already exist)
with open('raw/SteuerbaresEinkommen_CH.csv', 'wb') as f:
    f.write(response.content)
df_communities_bfs = pd.read_csv('raw/SteuerbaresEinkommen_CH.csv', sep=';')
# keep only the rows with the taxable income per taxpayer, in francs
df_communities_bfs = df_communities_bfs[df_communities_bfs['VARIABLE'] == 'Steuerbares Einkommen pro Steuerpflichtigem/-r, in Franken']
# clean up the data
# NOTE: despite the column name, VALUE is the taxable income per taxpayer in CHF, not in millions
df_communities_bfs = df_communities_bfs.rename(columns={
    'GEO_NAME': 'community_name',
    'VALUE'   : 'taxable_income_million_CHF'
})[['community_name', 'taxable_income_million_CHF']]

# merge the data
df_communities = df_communities_wikipedia.merge(df_communities_bfs, on='community_name', how='left')
# communities without a BFS match get 0 (plain assignment instead of chained inplace fillna)
df_communities['taxable_income_million_CHF'] = df_communities['taxable_income_million_CHF'].fillna(0)

print("Sample of the prepared data:")
display(df_communities)

Sample of the prepared data:


|  | community_name | area_km2 | taxable_income_million_CHF |
|---|---|---|---|
| 0 | Aclens | 390 | 79585.0 |
| 1 | Agiez | 546 | 70694.0 |
| 2 | Aigle | 1641 | 60548.0 |
| 3 | Allaman | 260 | 98512.0 |
| 4 | Arnex-sur-Nyon | 204 | 100518.0 |
| ... | ... | ... | ... |
| 295 | Vully-les-Lacs | 2092 | 74996.0 |
| 296 | Yens | 951 | 116253.0 |
| 297 | Yverdon-les-Bains | 1354 | 61649.0 |
| 298 | Yvonand | 1340 | 67055.0 |
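
Because the merge is a left join on the name and missing incomes are filled with 0, a municipality whose name is spelled differently in the two sources silently ends up with an income of 0. A minimal sketch of how such mismatches could be surfaced with the `indicator=True` option of `merge`, using small made-up frames (the rows and the name `Beispielwil` are hypothetical, not taken from the real data):

```python
import pandas as pd

# hypothetical stand-ins for the Wikipedia and BFS frames
wiki = pd.DataFrame({'community_name': ['Aclens', 'Aigle', 'Beispielwil'],
                     'area_km2': [390, 1641, 100]})
bfs = pd.DataFrame({'community_name': ['Aclens', 'Aigle'],
                    'taxable_income_million_CHF': [79585.0, 60548.0]})

# indicator=True adds a '_merge' column that says where each row was found
merged = wiki.merge(bfs, on='community_name', how='left', indicator=True)

# rows present only in the left frame had no BFS match
unmatched = merged.loc[merged['_merge'] == 'left_only', 'community_name'].tolist()
print(unmatched)
```

Listing the unmatched names before filling with 0 makes it possible to tell a real gap in the BFS data from a mere spelling difference.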


## Calculate first criterion

Area of the municipality:
The area of the municipality is squared, and the result is awarded as points.

In [3]:
df_communities['criteria1'] = df_communities['area_km2'] ** 2
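
To illustrate the effect of squaring, here is a small sketch with the areas of Aclens and Aigle from the sample table, read as 3.90 and 16.41 km² (an assumption about how the raw values 390 and 1641 should be interpreted):

```python
# areas in km², assuming the raw values 390 and 1641 mean 3.90 and 16.41
areas = {'Aclens': 3.90, 'Aigle': 16.41}

# criterion 1: the squared area is awarded as points
points = {name: round(area ** 2, 4) for name, area in areas.items()}
print(points)

# squaring amplifies differences: the ratio of the points is the squared ratio of the areas
ratio_area = areas['Aigle'] / areas['Aclens']
ratio_points = points['Aigle'] / points['Aclens']
print(round(ratio_area, 2), round(ratio_points, 2))
```

Aigle is roughly four times as large as Aclens but receives roughly seventeen times the points, so large municipalities dominate this criterion.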

## Calculate second criterion

Taxable income:
The taxable income per taxpayer is awarded as points. Incomes above CHF 60,000 count double (that is, twice the taxable income is awarded as points).

In [4]:
def taxable_income_rating(x):
    # incomes strictly above 60,000 CHF count double
    if x > 60000:
        return x * 2
    return x

df_communities['criteria2'] = df_communities['taxable_income_million_CHF'].apply(taxable_income_rating)
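
A quick check of the threshold behaviour, as a sketch that repeats the rule on a few incomes. The first two values come from the sample table; 60000 and 0 are added to show the boundary and the fill value for a missing BFS match:

```python
def taxable_income_rating(x):
    # incomes strictly above 60000 CHF count double
    return x * 2 if x > 60000 else x

incomes = [79585.0, 60548.0, 60000.0, 0.0]
ratings = [taxable_income_rating(x) for x in incomes]
print(ratings)  # [159170.0, 121096.0, 60000.0, 0.0]
```

An income of exactly 60000 is not doubled, so the points jump from 60000 to just over 120000 right at the threshold.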

## Calculate third criterion

Number of words in the municipality name:
The number of words in the municipality name (the name split on spaces) is awarded as points. Municipalities with an even number of words in their name receive double the points (that is, twice the word count is awarded as points).

In [5]:
def get_number_of_words(x):
    length = len(x.split(' '))
    if length % 2 == 0:
        return length * 2
    return length

df_communities['criteria3'] = df_communities['community_name'].apply(get_number_of_words)
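
The following sketch applies the same rule to a few names that appear in the tables of this notebook, to show what counts as a word when splitting on spaces only:

```python
def get_number_of_words(x):
    length = len(x.split(' '))
    # an even word count is doubled
    return length * 2 if length % 2 == 0 else length

names = ['Aclens', 'Yverdon-les-Bains', 'Le Chenit', 'Villeneuve (VD)']
word_points = {name: get_number_of_words(name) for name in names}
print(word_points)
# {'Aclens': 1, 'Yverdon-les-Bains': 1, 'Le Chenit': 4, 'Villeneuve (VD)': 4}
```

Hyphenated names count as a single word, while a canton suffix such as `(VD)` counts as a word of its own.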

## Calculate the final score

In [6]:
# normalize criteria (0-100 scale)
def normalize_column(df, column_name):
    min_val = df[column_name].min()
    max_val = df[column_name].max()
    return ((df[column_name] - min_val) / (max_val - min_val)) * 100

df_communities['criteria1'] = normalize_column(df_communities, 'criteria1')
df_communities['criteria2'] = normalize_column(df_communities, 'criteria2')
df_communities['criteria3'] = normalize_column(df_communities, 'criteria3')

# compute final score
df_communities['score'] = df_communities['criteria1'] * 0.4 + df_communities['criteria2'] * 0.2 + df_communities['criteria3'] * 0.4
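
As a sanity check of the weighting, the score of Le Chenit can be recomputed from its normalized criteria as listed in the results table. The sketch also shows the min-max formula on a made-up list of values; the helper names `min_max` and `weighted_score` are illustrative and not part of the notebook.

```python
def min_max(values):
    # rescale to the 0-100 range; assumes max(values) > min(values)
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * 100 for v in values]

def weighted_score(c1, c2, c3):
    # weights used in the notebook: 0.4 area, 0.2 income, 0.4 name
    return c1 * 0.4 + c2 * 0.2 + c3 * 0.4

# made-up raw points 1, 3, 4: the smallest maps to 0, the largest to 100
scaled = min_max([1, 3, 4])
print([round(v, 2) for v in scaled])

# normalized criteria of Le Chenit as shown in the results table
score = weighted_score(76.158690, 11.403459, 100.0)
print(round(score, 6))  # 72.744168
```

Note that the min-max formula divides by zero when all values of a criterion are equal, which is why the sketch states that assumption explicitly.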

In [7]:
# display results
df_communities_to_display = df_communities.sort_values('score', ascending=False)[['community_name', 'criteria1', 'criteria2', 'criteria3', 'score']]
print('Municipality ranking:')
display(df_communities_to_display)
print("Selected municipalities:")
display(df_communities_to_display.query('`community_name` == "Lavey-Morcles" or `community_name` == "Le Chenit" or `community_name` == "Mauraz"'))

Municipality ranking:


|  | community_name | criteria1 | criteria2 | criteria3 | score |
|---|---|---|---|---|---|
| 152 | Le Chenit | 76.158690 | 11.403459 | 100.0 | 72.744168 |
| 6 | Arzier-Le Muids | 20.850052 | 20.388279 | 100.0 | 52.417677 |
| 153 | Le Lieu | 8.195654 | 11.644236 | 100.0 | 45.607109 |
| 285 | Villeneuve (VD) | 7.915968 | 11.137155 | 100.0 | 45.393818 |
| 146 | La Rippe | 2.129752 | 19.562634 | 100.0 | 44.764428 |
| ... | ... | ... | ... | ... | ... |
| 192 | Mutrux | 0.080015 | 0.000000 | 0.0 | 0.032006 |
| 58 | Chêne-Pâquier | 0.033719 | 0.000000 | 0.0 | 0.013488 |
| 193 | Novalles | 0.032105 | 0.000000 | 0.0 | 0.012842 |
| 236 | Rossenges | 0.008285 | 0.000000 | 0.0 | 0.003314 |


Selected municipalities:


|  | community_name | criteria1 | criteria2 | criteria3 | score |
|---|---|---|---|---|---|
| 152 | Le Chenit | 76.15869 | 11.403459 | 100.0 | 72.744168 |
| 150 | Lavey-Morcles | 1.560118 | 5.052441 | 0.0 | 1.634536 |
| 173 | Mauraz | 0.001191 | 0.0 | 0.0 | 0.000477 |


In [8]:
# export results
df_export = df_communities.sort_values('score', ascending=False)
df_export.to_csv('rankings/3_ranking.csv', index=False)