# Dynamic train ridership on the Swedish Southern Main Line, 2018

Having collected the static ridership from Sampers, we now estimate the dynamic train ridership on the Southern Main Line during 2018.

## Read data

Let us first read the relevant existing data, namely:
- Static ridership (from Sampers), i.e., the average number of passengers on board on each link between consecutive stations, per line and weekday.
- Lines, including the first and last stations and the average number of *turer* (round trips) per weekday. Note that some lines share the same first and last stations but are nevertheless different lines, because they serve different stops.
- Traffic data (from Lupp), needed later; here we are mainly interested in the scheduled number of trains during peak and off-peak hours.

In [2]:
import pandas as pd
import os
import warnings

# Suppress warnings (e.g., from pandas or others)
warnings.filterwarnings("ignore")

# Get the current working directory
current_directory = os.getcwd()

# Define file paths: the lines file is in a 'data' subdirectory,
# while the station and static ridership files are in the working directory
file_path_lines = os.path.join(current_directory, 'data', 'Linjer_sträckor_turer.xlsx')
file_path_stations = os.path.join(current_directory, 'Plats_sign_pos.xlsx')
file_path_static_pass = os.path.join(current_directory, 'static_pass_SSB_2018.xlsx')


# Replace decimal commas with dots and remove hidden line-break characters;
# non-string values are returned unchanged
def clean_column_values(value):
    if isinstance(value, str):
        value = value.replace(',', '.')  # Replace commas with dots for decimals
        value = value.replace('\r', '').replace('\n', '')  # Remove carriage returns and line breaks
    return value

# Read the Excel files into DataFrames, keeping identifier columns as text
df_lines = pd.read_excel(file_path_lines, dtype={"Linje": str})  # Lines data
df_stations = pd.read_excel(file_path_stations, dtype={"Plats": str})  # Station names and positions
df_static_pass = pd.read_excel(file_path_static_pass, dtype={"line": str})  # Static ridership data

# Clean every cell (clean_column_values already skips non-string values).
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
df_lines = df_lines.applymap(clean_column_values)
df_stations = df_stations.applymap(clean_column_values)
df_static_pass = df_static_pass.applymap(clean_column_values)
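
`clean_column_values` replaces decimal commas with dots, but the cleaned values are still strings, so a column that was read as text because of decimal commas stays text after cleaning. A minimal sketch of a follow-up conversion, assuming that identifier columns such as `line` and `Linje` must stay text; the helper `to_numeric_where_possible` is hypothetical and not part of the original notebook:

```python
import pandas as pd


def to_numeric_where_possible(df, exclude=("line", "Linje")):
    """Hypothetical helper: convert text columns to numbers when every value parses.

    Assumes decimal commas were already replaced by dots (as clean_column_values
    does). Columns listed in `exclude` (line identifiers) are always kept as text.
    """
    out = df.copy()
    for col in out.columns:
        text_like = (pd.api.types.is_object_dtype(out[col])
                     or pd.api.types.is_string_dtype(out[col]))
        if col in exclude or not text_like:
            continue  # identifiers and non-text columns are left as they are
        converted = pd.to_numeric(out[col], errors="coerce")
        # Accept the conversion only if no non-missing value turned into NaN
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    return out


# Toy example (illustrative values, not taken from the data files)
example = pd.DataFrame({"line": ["01", "2"],
                        "Plats": ["Flen", "Järna"],
                        "Latitud": ["59.06", "59.09"]})
print(to_numeric_where_possible(example).dtypes)
```

Requiring every non-missing value to parse keeps name columns such as `Plats` untouched instead of silently turning them into NaN.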

Overview of the loaded data (column names and types).

In [6]:
# Print DataFrame summaries for verification
print("\nLines DataFrame Column Types:")
print(df_lines.dtypes)

print("\nStations DataFrame Column Types:")
print(df_stations.dtypes)

print("\nStatic Passenger DataFrame Column Types:")
print(df_static_pass.dtypes)



Lines DataFrame Column Types:
Linje                    object
Sträcka                  object
Vehicle type              int64
Mode                     object
Antal dubbel-turer        int64
Linjetid minuter        float64
Linjelängd kilometer    float64
dtype: object

Stations DataFrame Column Types:
Plats        object
Signatur     object
Latitud     float64
Longitud    float64
dtype: object

Static Passenger DataFrame Column Types:
line                          object
From                          object
To                            object
Nat_Priv_Ombord                int64
Nat_Tj_Ombord                  int64
Reg_arb_Ombord                 int64
Reg_tj_Ombord                  int64
Reg_övr_Ombord                 int64
Nationella_tot_Ombord          int64
Regionala_tot_Ombord           int64
Tot_Ombord                     int64
Nat_Priv_Påstigande          float64
Nat_Tj_Påstigande            float64
Reg_arb_Påstigande           float64
Reg_tj_Påstigande            float64
Reg_öv

## Clean up

Keep only the lines whose trains pass through at least two distinct stations of the Southern Main Line (SSB), counting every station that appears as the origin (`From`) or destination (`To`) of one of the line's links.

In [5]:
import pandas as pd

# df_pax: static ridership per line and link (columns 'line', 'From', 'To', ...), assumed to be loaded already

# List of ordered stations on the Southern Main Line (SSB)
stations_SSB_ordered_north_to_south = [
    "Stockholms Central", "Stockholms Södra", "Årstaberg", "Älvsjö", "Stuvsta",
    "Huddinge", "Flemingsberg", "Björnkulla", "Malmsjö", "Södertälje syd övre",
    "Bränninge", "Järna", "Mölnbo", "Gnesta", "Kolke", "Björnlunda", "Stjärnhov",
    "Nyckelsjön", "Sparreholm", "Skebokvarn", "Flen", "Sköldinge", "Stolpstugan",
    "Katrineholms central", "Strångsjö", "Simonstorp", "Åby", "Norrköpings central",
    "Fiskeby", "Kimstad", "Norsholm", "Gistad", "Linghem", "Linköpings central",
    "Vikingstad", "Mantorp", "Mjölby", "Lindekullen", "Boxholm", "Sommen",
    "Tranås", "Gripenberg", "Frinnaryd", "Ralingsås", "Aneby", "Flisby", "Vimnarp",
    "Gamlarp", "Nässjö central", "Grimstorp", "Bodafors", "Ulvstorp", "Sävsjö",
    "Aleholm", "Stockaryd", "Rörvik", "Lammhult", "Grevaryd", "Lidnäs", "Moheda",
    "Gåvetorp", "Alvesta", "Blädinge", "Vislanda", "Eneryda", "Diö Norra", "Diö Södra",
    "Älmhult", "Killeberg", "Tunneby", "Osby", "Hästveda", "Mosselund", "Ballingslöv",
    "Hässleholm", "Mellby", "Sösdala", "Vätteryd", "Tjörnarp", "Höör", "Stehag", "Eslöv",
    "Dammstorp", "Örtofta", "Stångby", "Tornhill", "Lunds central", "Klostergården",
    "Flackarp", "Hjärup", "Åkarps norra", "Åkarp", "Burlöv", "Arlöv", "Malmö godsbangård",
    "Malmö central"
]

# Set version of the station list for fast membership tests
ssb_stations = set(stations_SSB_ordered_north_to_south)

# Return True if the line's links touch at least two distinct SSB stations
# (counting both the 'From' and the 'To' station of every link)
def passes_two_or_more_stations_for_line(line_df):
    stations_on_ssb = (set(line_df['From']) | set(line_df['To'])) & ssb_stations
    return len(stations_on_ssb) >= 2

# Analyze each line separately and collect those that meet the criterion
lines_meeting_criteria = [
    line for line, line_df in df_pax.groupby('line')
    if passes_two_or_more_stations_for_line(line_df)
]

# Keep all rows of the qualifying lines; .copy() creates an independent DataFrame,
# so later column assignments do not trigger SettingWithCopyWarning
filtered_df_pax = df_pax[df_pax['line'].isin(lines_meeting_criteria)].copy()

Next, remove all station pairs (links) that are not on the Southern Main Line (SSB).
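
A minimal sketch of this step, assuming a link counts as "on the SSB" when both its `From` and `To` stations are in the ordered SSB station list. Because the ordering runs north to south, the same positions also give a travel direction; `filtered_df_pax` already has a `Direction` column whose derivation is not shown, so the sketch writes to a separate, hypothetical `ssb_direction` column instead of overwriting it. The helper `keep_ssb_links` and the shortened `SSB_ORDER` list are illustrative only.

```python
import pandas as pd

# Shortened north-to-south station order for illustration only; in the notebook
# pass the full stations_SSB_ordered_north_to_south list instead.
SSB_ORDER = ["Stockholms Central", "Södertälje syd övre", "Katrineholms central",
             "Norrköpings central", "Linköpings central", "Nässjö central",
             "Alvesta", "Hässleholm", "Lunds central", "Malmö central"]


def keep_ssb_links(df, ordered_stations):
    """Keep rows whose 'From' and 'To' stations both lie on the SSB.

    Adds a hypothetical 'ssb_direction' column: 'southbound' when the 'To'
    station comes later in the north-to-south order than the 'From' station,
    otherwise 'northbound'.
    """
    position = {name: i for i, name in enumerate(ordered_stations)}
    on_ssb = df["From"].isin(ordered_stations) & df["To"].isin(ordered_stations)
    out = df[on_ssb].copy()
    southbound = out["To"].map(position) > out["From"].map(position)
    out["ssb_direction"] = southbound.map({True: "southbound", False: "northbound"})
    return out


# Toy links (illustrative values, not Sampers output); Örebro is off the SSB
toy = pd.DataFrame({
    "line": ["A", "A", "A", "B"],
    "From": ["Stockholms Central", "Södertälje syd övre", "Katrineholms central", "Malmö central"],
    "To": ["Södertälje syd övre", "Katrineholms central", "Örebro central", "Lunds central"],
    "Tot_Ombord": [300, 280, 120, 150],
})
ssb_links = keep_ssb_links(toy, SSB_ORDER)
print(ssb_links[["line", "From", "To", "ssb_direction"]])
```

Since each Sampers link joins consecutive stops of a line, a line that keeps at least one row here has two consecutive stops on the SSB, which is a stricter test than the "two distinct stations" criterion used in the clean-up step.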

## Plotting 

We first plot the ridership (passengers on board, *ombord*) for a few randomly chosen southbound lines.
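
A minimal plotting sketch, assuming that `Tot_Ombord` is the on-board measure to plot and that "southbound" can be read off the north-to-south station order (rather than from the existing `Direction` column, whose values are not shown). The function names, the seed, and the shortened `SSB_ORDER` list are hypothetical; matplotlib is imported inside the plotting function so that the profile helper also works without it.

```python
import random

import pandas as pd

# Shortened north-to-south order for illustration; use the full SSB list in practice
SSB_ORDER = ["Stockholms Central", "Södertälje syd övre", "Katrineholms central",
             "Norrköpings central", "Linköpings central", "Nässjö central",
             "Alvesta", "Hässleholm", "Lunds central", "Malmö central"]


def southbound_load_profile(df, line, ordered_stations, value_col="Tot_Ombord"):
    """On-board passengers per southbound SSB link of one line, ordered north to south."""
    position = {name: i for i, name in enumerate(ordered_stations)}
    rows = df[(df["line"] == line)
              & df["From"].isin(ordered_stations)
              & df["To"].isin(ordered_stations)].copy()
    rows["from_pos"] = rows["From"].map(position)
    rows["to_pos"] = rows["To"].map(position)
    # Southbound links move to a later position in the north-to-south list
    rows = rows[rows["to_pos"] > rows["from_pos"]].sort_values("from_pos")
    return rows[["From", "To", "from_pos", value_col]].reset_index(drop=True)


def plot_random_southbound_lines(df, ordered_stations, n=3, seed=0, value_col="Tot_Ombord"):
    """Step plot of the southbound load profile for n randomly chosen lines."""
    import matplotlib.pyplot as plt  # imported here so the profile helper works without matplotlib

    lines = sorted(df["line"].unique())
    chosen = random.Random(seed).sample(lines, k=min(n, len(lines)))
    fig, ax = plt.subplots(figsize=(10, 4))
    for line in chosen:
        profile = southbound_load_profile(df, line, ordered_stations, value_col)
        if profile.empty:
            continue  # the line has no southbound SSB links
        # With where="post", each value holds from its link's 'From' position
        # up to the next plotted point
        ax.step(profile["from_pos"], profile[value_col], where="post",
                marker="o", label=f"line {line}")
    ax.set_xlabel(f"Position of 'From' station on the SSB (0 = {ordered_stations[0]})")
    ax.set_ylabel(value_col)
    ax.legend()
    fig.tight_layout()
    return fig
```

In the notebook, a call such as `plot_random_southbound_lines(filtered_df_pax, stations_SSB_ordered_north_to_south)` would draw up to three lines. Plotting against the station position, rather than the link index, keeps lines that cover different stretches of the SSB aligned on the same axis.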

## Exporting to an Excel file

In [12]:
filtered_df_pax.columns

Index(['line', 'From', 'To', 'Nat_Priv_Ombord', 'Nat_Tj_Ombord',
       'Reg_arb_Ombord', 'Reg_tj_Ombord', 'Reg_övr_Ombord',
       'Nationella_tot_Ombord', 'Regionala_tot_Ombord', 'Tot_Ombord',
       'Nat_Priv_Påstigande', 'Nat_Tj_Påstigande', 'Reg_arb_Påstigande',
       'Reg_tj_Påstigande', 'Reg_övr_Påstigande', 'Nationella_tot_Påstigande',
       'Regionala_tot_Påstigande', 'Totalt Påstigande', 'Nat_Priv_Avstigande',
       'Nat_Tj_Avstigande', 'Reg_arb_Avstigande', 'Reg_tj_Avstigande',
       'Reg_övr_Avstigande', 'Nationella_tot_Avstigande',
       'Regionala_tot_Avstigande', 'Tot_Avstigande', 'Direction'],
      dtype='object')

In [13]:
# Ensure the 'line' column is treated as text (string type)
filtered_df_pax['line'] = filtered_df_pax['line'].astype(str)

# Save the DataFrame to an Excel file
output_filename = "static_pass_SSB_2018.xlsx"
filtered_df_pax.to_excel(output_filename, index=False)

print(f"Data successfully saved to {output_filename}")

Data successfully saved to static_pass_SSB_2018.xlsx
