# Project Group 16

Members: Jurian Fijen, Quirine Japikse, Christos Paschalidis, Kristian Terlien, Timo Locher

Student numbers: 

# Research Objective

*Requires data modeling and quantitative research in Transport, Infrastructure & Logistics*

# Contribution Statement

*Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling*

**Author 1**: Jurian Fijen

**Author 2**: Quirine Japikse

**Author 3**: Christos Paschalidis

**Author 4**: Kristian Terlien

**Author 5**: Timo Locher

# Data Used

Research question: What is the effect of a municipality's population density on the first- and last-mile transport to and from its NS stations?

The plan is to combine first- and last-mile transport data with CBS data on the built environment around each train station. Cumbersome first- and last-mile transport can be a barrier to choosing public transport. The goal is to better understand how these factors play out in multimodal trips. Data sources are NS, CBS and ODiN. The plan is to use data from a single year (probably 2019) and to include only NS train stations in the Netherlands.

Data sources:

- https://dashboards.nsjaarverslag.nl/reizigersgedrag/
- https://opendata.cbs.nl/statline/#/CBS/nl/
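
The combination step described above could look roughly like the sketch below. It is only an illustration: the helper `add_density`, the station-to-municipality mapping and the `density` key are hypothetical names, and the density value is a placeholder rather than a CBS figure. The Delft bicycle shares are the 2019 dashboard values shown in the pipeline output further down.

```python
# Hypothetical helper: attach a municipality's population density to each station row.
def add_density(station_rows, station_to_municipality, density_by_municipality):
    """Return copies of the station rows with 'municipality' and 'density' added.

    Stations whose municipality or density is unknown are skipped, so the
    result only contains rows that can be used in the analysis.
    """
    combined = []
    for row in station_rows:
        municipality = station_to_municipality.get(row["station"])
        density = density_by_municipality.get(municipality)
        if density is None:
            continue
        combined.append({**row, "municipality": municipality, "density": density})
    return combined


# Bicycle shares for Delft in 2019 (53% before, 37% after the train trip).
station_rows = [{"station": "delft", "year": "2019", "Fiets_voor": 0.53, "Fiets_na": 0.37}]
station_to_municipality = {"delft": "Delft"}
# Placeholder density (inhabitants per km2), NOT a real CBS figure.
density_by_municipality = {"Delft": 1000.0}

combined = add_density(station_rows, station_to_municipality, density_by_municipality)
print(combined)
```

Skipping unmatched stations rather than filling in a default keeps missing CBS matches visible: comparing `len(combined)` with `len(station_rows)` shows how many stations were dropped.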

# Data Pipeline

# Load Libraries

In [1]:
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  

# Data Import / Filtering

In [2]:
def read_ns_data(station: str, year: str) -> pd.DataFrame:
    """
    Input: NS station (URL slug, e.g. 'delft') and year
    Method: Use Selenium, which drives a local browser to load the webpage
    Reason: The NS page is dynamic (not static), so it loads in several phases (due to animations)
    Output: One-row DataFrame with the statistics for the input station and year
    """
    url = f"https://dashboards.nsjaarverslag.nl/reizigersgedrag/{station}?dtYear={year}"
    browser = webdriver.Safari()  # Safari is only available on macOS
    try:
        browser.get(url)
        time.sleep(3)  # fixed wait so the animated KPI tiles can finish rendering
        html_source = browser.page_source
    finally:
        browser.quit()  # always close the browser, even if loading fails
    soup = BeautifulSoup(html_source, 'html.parser')
    kpi_value_containers = soup.find_all(class_='db-kpi_value-container')

    # Collect every value under its KPI title; a title can occur more than once
    kpi_values = {}
    for container in kpi_value_containers:
        # The title is the nearest preceding element with class 'db-kpi_title'
        title = container.find_previous(class_='db-kpi_title').text.strip()

        # The value is assumed to sit in a <span class="db-kpi_value">
        value_element = container.find('span', class_='db-kpi_value')
        value = value_element.text.strip()

        kpi_values.setdefault(title, []).append(value)

    # Titles with two values are split into a '_voor' (first value) and
    # a '_na' (second value) column; all other titles keep their list
    data = {}
    for title, value in kpi_values.items():
        if len(value) == 2:
            data[title + '_voor'] = value[0]
            data[title + '_na'] = value[1]
        else:
            data[title] = value
    df = pd.DataFrame(data)
    df.index = pd.MultiIndex.from_tuples([(station, year)], names=['Station', 'Year'])
    return df
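
Because the page loads in several phases, a fixed `time.sleep(3)` can be too short on a slow connection and needlessly long on a fast one. A possible alternative is to poll until a condition holds. The helper below is a minimal sketch of that idea using only the standard library; `wait_until` and its parameters are hypothetical names and are not part of the notebook or of Selenium.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for a page that needs three polls before its KPI tiles exist.
polls = {"count": 0}


def tiles_present():
    polls["count"] += 1
    return polls["count"] >= 3


wait_until(tiles_present, timeout=2.0, interval=0.01)
print(polls["count"])
```

In `read_ns_data` the predicate could, for instance, check whether `soup.find_all(class_='db-kpi_value-container')` returns any elements for the current `browser.page_source`. Selenium's own `WebDriverWait` offers the same polling pattern.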

In [3]:
read_ns_data('delft','2019')

Station,Year,Reizigers per dag,Klantoordeel,In- en uitstappers,Overstappers,Ochtendspits,Avondspits,Daluren,Lopend_voor,Lopend_na,Fiets_voor,Fiets_na,Bus/tram/metro_voor,Bus/tram/metro_na,Auto (bestuurder)_voor,Auto (bestuurder)_na,Auto (passagier)_voor,Auto (passagier)_na,(Deel)taxi_voor,(Deel)taxi_na
delft,2019,40.818,92%,40.435,383,22%,19%,59%,27%,38%,53%,37%,17%,21%,1%,1%,2%,3%,0%,0%


In [4]:
ns_stations = ['alphen-aan-den-rijn','barendrecht','bodegraven','boskoop','boskoop-snijdelwijk','capelle-schollevaar',
               'de-vink','delft','delft-campus','den-haag-centraal','den-haag-hs','den-haag-laan-van-noi','den-haag-mariahoeve',
               'den-haag-moerwijk','den-haag-ypenburg','dordrecht','dordrecht-zuid','gouda','gouda-goverwelle','hillegom',
               'lansingerland-zoetermeer','leiden-centraal','leiden-lammenschans','nieuwerkerk-a-d-ijssel','rijswijk',
               'rotterdam-alexander','rotterdam-blaak','rotterdam-centraal','rotterdam-lombardijen','rotterdam-noord',
               'rotterdam-zuid','sassenheim','schiedam-centrum','voorburg','voorhout','voorschoten','waddinxveen','waddinxveen-noord',
               'waddinxveen-triangel','zoetermeer','zoetermeer-oost','zwijndrecht'
              ]
years = ['2019','2020','2021','2022']

In [None]:
frames = []
# Loop through stations and years
for station in ns_stations:
    for year in years:
        # Scrape one (station, year) page; this opens a browser each time
        frames.append(read_ns_data(station, year))

# Concatenate all one-row DataFrames into the final DataFrame in one go
df_ns_data = pd.concat(frames)

# Print or return the final DataFrame
#print(df_ns_data)
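
With 42 stations and 4 years, the loop above opens 168 browser sessions, and a single failed page load aborts the whole run. The sketch below shows one way to keep going after a failure and to skip pairs that were already scraped. `scrape_all`, `fake_fetch` and the `done` argument are hypothetical names; `fake_fetch` stands in for `read_ns_data` so the example runs without a browser.

```python
from itertools import product


def scrape_all(fetch, stations, years, done=()):
    """Call fetch(station, year) for every pair that is not already in `done`.

    Returns (results, failed): `results` maps (station, year) to whatever
    fetch returned, `failed` lists the pairs for which fetch raised.
    """
    results, failed = {}, []
    for station, year in product(stations, years):
        if (station, year) in done:
            continue  # already scraped in an earlier run
        try:
            results[(station, year)] = fetch(station, year)
        except Exception:
            failed.append((station, year))
    return results, failed


# Hypothetical stand-in for read_ns_data that fails for one pair.
def fake_fetch(station, year):
    if station == "boskoop" and year == "2020":
        raise RuntimeError("page did not load")
    return f"{station}-{year}"


results, failed = scrape_all(
    fake_fetch, ["delft", "boskoop"], ["2019", "2020"], done={("delft", "2019")}
)
print(sorted(results))
print(failed)
```

The pairs in `failed` can be passed through `scrape_all` again later, which avoids repeating the pages that already succeeded.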

In [None]:
#df_ns_data.to_csv('ns_data.csv')

In [6]:
df_test = pd.read_csv('ns_data.csv',index_col = [0, 1])
df_test

Station,Year,Reizigers per dag,Klantoordeel,In- en uitstappers,Overstappers,Ochtendspits,Avondspits,Daluren,Lopend_voor,Lopend_na,Fiets_voor,Fiets_na,Bus/tram/metro_voor,Bus/tram/metro_na,Auto (bestuurder)_voor,Auto (bestuurder)_na,Auto (passagier)_voor,Auto (passagier)_na,(Deel)taxi_voor,(Deel)taxi_na
alphen-aan-den-rijn,2019,12.630,72%,10.996,1.634,32%,13%,55%,23%,46%,48%,24%,17%,17%,8%,2%,4%,11%,0%,0%
alphen-aan-den-rijn,2020,5.900,0%,5.124,776.000,27%,14%,59%,23%,46%,48%,24%,17%,17%,8%,2%,4%,11%,0%,0%
alphen-aan-den-rijn,2021,6.426,0%,5.550,876.000,26%,14%,60%,23%,46%,48%,24%,17%,17%,8%,2%,4%,11%,0%,0%
alphen-aan-den-rijn,2022,9.029,0%,7.786,1.243,28%,13%,59%,23%,46%,48%,24%,17%,17%,8%,2%,4%,11%,0%,0%
barendrecht,2019,6.066,79%,6.066,0.000,33%,12%,55%,21%,40%,34%,22%,4%,8%,28%,6%,13%,24%,0%,0%
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zoetermeer-oost,2022,1.428,0%,1.428,0.000,27%,16%,57%,32%,43%,32%,21%,8%,18%,21%,6%,7%,12%,0%,0%
zwijndrecht,2019,5.551,62%,5.551,0.000,37%,10%,53%,21%,45%,48%,20%,7%,14%,16%,4%,8%,17%,0%,0%
zwijndrecht,2020,2.653,0%,2.653,0.000,31%,11%,58%,21%,45%,48%,20%,7%,14%,16%,4%,8%,17%,0%,0%
zwijndrecht,2021,2.687,0%,2.687,0.000,28%,12%,60%,21%,45%,48%,20%,7%,14%,16%,4%,8%,17%,0%,0%
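
In the table above, the Overstappers column shows `1.634` next to `776.000`. This suggests that the dot in the scraped counts is a Dutch thousands separator which `read_csv` has interpreted as a decimal point, so that `1.634` would stand for 1634 transfers. That reading is an inference from the output, not something the dashboard confirms here. Under that assumption, the raw strings could be converted with two small helpers; `parse_dutch_int` and `parse_percent` are hypothetical names.

```python
def parse_dutch_int(text):
    """Convert a Dutch-formatted count such as '40.818' to the integer 40818."""
    return int(text.strip().replace(".", ""))


def parse_percent(text):
    """Convert a percentage string such as '92%' to the fraction 0.92."""
    return float(text.strip().rstrip("%")) / 100


# Raw strings as they appear in the Delft 2019 row of the scraper output.
print(parse_dutch_int("40.818"))
print(parse_dutch_int("383"))
print(parse_percent("92%"))
```

These helpers expect the original strings, so they would have to be applied before the values are converted to floats, for example right after scraping or after reading the CSV with `dtype=str`.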
