# Analysing some numbers about Luxembourg

In this notebook, I look at a few datasets that caught my eye on [the open data portal](https://data.public.lu/en/). Namely:

1. Weather data (and comparison to Switzerland)
1. Citizens
1. Composition of the population
1. Energy production
1. Driving licenses

In [3]:
import pandas as pd
import numpy as np
from plotnine import *

## 1. Weather Data


In [4]:
# Importing Swiss Data
swiss_weather = (pd.read_csv("Weather/weather_data.csv")
    .filter(["date", "STG_precipitation"])
    .assign(date = lambda df: pd.to_datetime(df["date"], format="%d/%m/%Y"),
            location = "CH")
    .rename(columns={"STG_precipitation":"precipitation"})
)

swiss_weather.head(3)

Unnamed: 0,date,precipitation,location
0,1900-01-01,0.0,CH
1,1900-01-02,0.0,CH
2,1900-01-03,0.1,CH


In [5]:
# Importing Luxembourgish data
lu_weather = (pd.read_csv("Weather/Findel Daily Weather since 1947.csv")
    .assign(date = lambda df: pd.to_datetime(df["DATE"], format="%d.%m.%Y"),
            location = "LU")
    .filter(["date", "precipitation_mm", "location"])
    .rename(columns={"precipitation_mm":"precipitation"})
)

lu_weather.head(3)

Unnamed: 0,date,precipitation,location
0,1947-01-01,0.0,LU
1,1947-01-02,0.0,LU
2,1947-01-03,0.0,LU


In [6]:
# Concatenate the two to one for plots and aggregation
weather_data = pd.concat([lu_weather, swiss_weather], axis=0)

weather_data

Unnamed: 0,date,precipitation,location
0,1947-01-01,0.0,LU
1,1947-01-02,0.0,LU
2,1947-01-03,0.0,LU
3,1947-01-04,0.0,LU
4,1947-01-05,0.0,LU
...,...,...,...
44555,2021-12-27,1.6,CH
44556,2021-12-28,3.6,CH
44557,2021-12-29,20.3,CH
44558,2021-12-30,0.3,CH


With this, I can now make the visualisations! Some questions to explore:

- Rain hours (more than x mm per hour) per year, which would need hourly rather than daily data
- Does it rain longer but less in Luxembourg than in Switzerland?
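
A minimal sketch of how the "longer but less" question could be approached with daily data: count the wet days and the total precipitation per year and location, then compare the average amount per wet day. The `weather_sample` frame is a made-up stand-in for `weather_data`, and the 1 mm wet-day threshold (`WET_DAY_MM`) is an assumption, not something taken from the datasets.

```python
import pandas as pd

# Tiny synthetic stand-in for `weather_data` (same columns, invented values)
weather_sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                            "2020-01-01", "2020-01-02", "2020-01-03"]),
    "precipitation": [0.0, 5.0, 1.0, 2.0, 2.0, 2.0],
    "location": ["LU", "LU", "LU", "CH", "CH", "CH"],
})

WET_DAY_MM = 1.0  # assumed threshold for calling a day "wet"

yearly = (weather_sample
    .assign(year = lambda df: df["date"].dt.year,
            wet_day = lambda df: df["precipitation"] >= WET_DAY_MM)
    .groupby(["location", "year"], as_index=False)
    .agg(total_mm=("precipitation", "sum"),   # yearly precipitation in mm
         wet_days=("wet_day", "sum"))         # number of days at or above the threshold
    .assign(mm_per_wet_day = lambda df: df["total_mm"] / df["wet_days"])
)

print(yearly)
```

More wet days with a lower `mm_per_wet_day` would point towards "longer but less". Swapping `weather_sample` for `weather_data` should work as long as the column names match.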

## 2. Citizens

For this one, I have data on:

1. Nationalities
1. Number of kids

In [7]:
# Nationalities
nationalities = pd.read_csv("Citizen/01-01-2023-Number of citizens per nationality and municipality.csv", encoding='latin-1')

nationalities

Unnamed: 0,COMMUNE_CODE,COMMUNE_NOM,NATIONALITE_ISO3,NATIONALITE_NOM,NOMBRE_TOTAL
0,1101,Beaufort,ERI,érythrée,32
1,1101,Beaufort,ETH,éthiopienne,7
2,1101,Beaufort,AFG,afghane,16
3,1101,Beaufort,ALB,albanaise,7
4,1101,Beaufort,DZA,algérienne,6
...,...,...,...,...,...
7628,1209,Wormeldange,THA,thailandaise,1
7629,1209,Wormeldange,TUN,tunisienne,1
7630,1209,Wormeldange,TUR,turque,5
7631,1209,Wormeldange,UKR,ukrainienne,39


In [8]:
# Number of children by municipality
kids = pd.read_csv("Citizen/Anzahl Kinder nach Gemeinde.csv", encoding='latin-1')

kids

Unnamed: 0,COMMUNE_CODE,COMMUNE_NOM,NOMBRE_ENFANTS,NOMBRE_PARENTS
0,1,Luxembourg,1,17025
1,1,Luxembourg,2,19726
2,1,Luxembourg,3,7058
3,1,Luxembourg,4,1810
4,1,Luxembourg,5,480
...,...,...,...,...
805,1310,Schengen,5,30
806,1310,Schengen,6,13
807,1310,Schengen,7,5
808,1310,Schengen,8,1
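
One thing this table seems to allow is an average number of children per parent for each commune. The sketch below assumes that `NOMBRE_PARENTS` is the number of parents who have exactly `NOMBRE_ENFANTS` children, which is my reading of the column names and not something the dataset confirms. `kids_sample` is a made-up stand-in for `kids`.

```python
import pandas as pd

# Synthetic stand-in for `kids` (invented communes and counts)
kids_sample = pd.DataFrame({
    "COMMUNE_NOM": ["A", "A", "B", "B"],
    "NOMBRE_ENFANTS": [1, 2, 1, 3],
    "NOMBRE_PARENTS": [10, 10, 20, 20],
})

avg_kids = (kids_sample
    # Weight each family size by the number of parents reporting it
    .assign(weighted = lambda df: df["NOMBRE_ENFANTS"] * df["NOMBRE_PARENTS"])
    .groupby("COMMUNE_NOM", as_index=False)
    .agg(weighted=("weighted", "sum"), parents=("NOMBRE_PARENTS", "sum"))
    .assign(avg_children = lambda df: df["weighted"] / df["parents"])
)

print(avg_kids)
```

Note that this is an average per parent, not per household: if both parents of a child are counted, the child contributes to the weighted sum twice.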


## 3. Composition of the Population

Main takeaway here: this data is a mess. I had to clean a lot before I could do this:

In [53]:
import glob
import os

path = r'C:\Users\mathi\OneDrive\Python\52-Weeks-of-Python-and-R-2023\Week 15 - Luxembourg Stats\Luxembourg Etat de la Population'
# Sort the paths: glob returns files in arbitrary order, and the year is assigned by position
all_files = sorted(glob.glob(os.path.join(path, "*.csv")))

li = []

# Assumes one file per year, starting in 2010, with names that sort chronologically
for year_index, filename in enumerate(all_files, start=2010):
    df = (pd.read_csv(filename, encoding="latin-1", delimiter=";")
        .assign(year = year_index))
    li.append(df)

In [54]:
composition = pd.concat(li, axis=0, ignore_index=True)

composition

Unnamed: 0,Age,Sexe,Nationalite,Quartier,year
0,0.0,M,A,GASPERICH,2010
1,0.0,M,A,NEUDORF/WEIMERSHOF,2010
2,0.0,M,"A,DK",WEIMERSKIRCH,2010
3,0.0,M,APA,BONNEVOIE-SUD,2010
4,0.0,M,APA,BELAIR,2010
...,...,...,...,...,...
1209137,35.0,M,R,VILLE-HAUTE,2020
1209138,0.0,M,ETHKJ,NEUDORF/WEIMERSHOF,2020
1209139,56.0,M,P,BONNEVOIE-NORD/VERLORENKOST,2020
1209140,25.0,M,GB,GASPERICH,2020


In [60]:
iso_codes = pd.read_csv("Nationalites_ISO3_Nom.csv", encoding="latin-1", delimiter=";")

iso_codes

Unnamed: 0,NATIONALITE_CODEPAYS,NATIONALITE_ISO3,NATIONALITE_NOM
0,A,AUT,autrichienne
1,AFG,AFG,afghane
2,AL,ALB,albanaise
3,AND,AND,andorrane
4,ANG,AGO,angolaise
...,...,...,...
166,XXX,XXX,indéterminée
167,YMN,YEM,yéménite
168,YU,YUG,yougoslave
169,YV,VEN,vénézuélienne


Open questions: what is the mix of age, sex and nationality in each quartier, and how does it change over time?
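
A first step towards answering this could be to attach the ISO codes to the population rows and count residents per year, quartier and nationality. The sketch below assumes that each row in `composition` is one resident and that `Nationalite` matches `NATIONALITE_CODEPAYS`; both are assumptions on my side. `composition_sample` and `iso_sample` are made-up stand-ins for the two frames.

```python
import pandas as pd

# Synthetic stand-ins for `composition` and `iso_codes` (invented rows)
composition_sample = pd.DataFrame({
    "Age": [0.0, 35.0, 56.0, 25.0],
    "Sexe": ["M", "M", "F", "M"],
    "Nationalite": ["A", "A", "AL", "A,DK"],
    "Quartier": ["GASPERICH", "GASPERICH", "BELAIR", "BELAIR"],
    "year": [2010, 2010, 2010, 2011],
})
iso_sample = pd.DataFrame({
    "NATIONALITE_CODEPAYS": ["A", "AL"],
    "NATIONALITE_ISO3": ["AUT", "ALB"],
    "NATIONALITE_NOM": ["autrichienne", "albanaise"],
})

# Left join keeps every resident; codes without a match (e.g. "A,DK") get NaN
labelled = composition_sample.merge(
    iso_sample, how="left",
    left_on="Nationalite", right_on="NATIONALITE_CODEPAYS")

counts = (labelled
    .groupby(["year", "Quartier", "NATIONALITE_ISO3"], dropna=False)
    .size()                              # one row per resident, so size = head count
    .reset_index(name="residents")
)

print(counts)
```

Keeping the unmatched rows (`dropna=False`) makes it visible how many nationality codes still need cleaning, such as combined entries like `A,DK`.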

## 4. Energy Production by Canton

In [64]:
energy = pd.read_csv("Energieproduktion nach Kanton und Gemeinde Luxemburg 2022.csv")

energy

Unnamed: 0,LAU1,Canton,LAU2,Commune,Type,kW
0,1,Capellen,101,Dippach,Installation photovoltaïque,1.32
1,1,Capellen,101,Dippach,Installation photovoltaïque,1.40
2,1,Capellen,101,Dippach,Installation photovoltaïque,1.40
3,1,Capellen,101,Dippach,Installation photovoltaïque,1.90
4,1,Capellen,101,Dippach,Installation photovoltaïque,1.98
...,...,...,...,...,...,...
10822,12,Remich,1208,Waldbredimus,Installation photovoltaïque,29.90
10823,12,Remich,1208,Waldbredimus,Installation photovoltaïque,29.97
10824,12,Remich,1208,Waldbredimus,Installation photovoltaïque,30.00
10825,12,Remich,1208,Waldbredimus,Installation photovoltaïque,37.50
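
To get from this long list to something per canton, the rows could be summed up by canton and type. This is only a sketch: it assumes that each row is one installation and that `kW` is its rated power, which the file itself does not spell out. `energy_sample` is a made-up stand-in for `energy`.

```python
import pandas as pd

# Synthetic stand-in for `energy` (invented rows in the same shape)
energy_sample = pd.DataFrame({
    "Canton": ["Capellen", "Capellen", "Remich"],
    "Type": ["Installation photovoltaïque"] * 3,
    "kW": [1.32, 1.40, 30.00],
})

capacity = (energy_sample
    .groupby(["Canton", "Type"], as_index=False)
    .agg(installations=("kW", "size"),   # number of rows per canton and type
         total_kW=("kW", "sum"))         # summed kW per canton and type
    .sort_values("total_kW", ascending=False)
)

print(capacity)
```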


## 5. Driving Licenses

In [68]:
licenses = pd.read_csv("Examens d’obtention de permis de conduire par catégorie, type d’examen et résultat 2018-2021.csv",
                       encoding="latin-1")

licenses

Unnamed: 0,Categorie,Type_Examen,Resultat,Nombre
0,A,Examen contrôle pratique,S - Réussi,4
1,A,Examen contrôle théorie,F - Raté,1
2,A,Examen contrôle théorie,S - Réussi,3
3,A1,Examen contrôle pratique,S - Réussi,1
4,A1,Examen pratique,N - Interrompu,2
...,...,...,...,...
92,F,Examen pratique,S - Réussi,112
93,F,Examen théorique,A - Candidat ne s'est pas présenté,1
94,F,Examen théorique,R - Candidat refusé,4
95,F,Examen théorique,F - Raté,38
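
An obvious thing to compute here is a pass rate per category and exam type. The sketch below treats every result starting with `S` as a pass and everything else (failed, interrupted, no-show, refused) as not passed; that grouping is my assumption. `licenses_sample` is a made-up stand-in for `licenses`, and its numbers are invented.

```python
import pandas as pd

# Synthetic stand-in for `licenses` (invented counts)
licenses_sample = pd.DataFrame({
    "Categorie": ["F", "F", "F", "F"],
    "Type_Examen": ["Examen pratique", "Examen théorique",
                    "Examen théorique", "Examen théorique"],
    "Resultat": ["S - Réussi", "S - Réussi", "F - Raté", "R - Candidat refusé"],
    "Nombre": [112, 60, 38, 2],
})

pass_rate = (licenses_sample
    # Keep the count only for passed exams, 0 for every other result
    .assign(passed = lambda df: df["Nombre"].where(df["Resultat"].str.startswith("S"), 0))
    .groupby(["Categorie", "Type_Examen"], as_index=False)
    .agg(passed=("passed", "sum"), total=("Nombre", "sum"))
    .assign(pass_rate = lambda df: df["passed"] / df["total"])
)

print(pass_rate)
```

Whether no-shows should count in the denominator is a judgment call; filtering them out before the `groupby` would give the pass rate among candidates who actually sat the exam.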
