# Matching the location of each TCR to contracts

We start by reading the cleaned TCRs for 2024, which have already been matched with contract numbers, together with Karin's matching between each bandel (track section) and its number of hours of servicefönster (maintenance windows).

## Förbindelser and BIS (for mapping Bandelnr <-> Kontrakt)

Load the Excel file containing the dictionary for bandel matching, and keep only relevant columns and bandel.

In [1]:
import pandas as pd

# Update: the new file includes spårnummer and type of track
dictionary_file_path = "BIS-data 2024-01-09 - Bandel, plats och förbindelselinje, alla spår.xlsx"

# Read the entire dictionary into a DataFrame
dictionary_df = pd.read_excel(dictionary_file_path)

FileNotFoundError: [Errno 2] No such file or directory: 'BIS-data 2024-01-09 - Bandel, plats och förbindelselinje, alla spår.xlsx'

In [33]:
# Focus on the main tracks
# keep only rows where column Spår_huvud_sido is nhsp
main_tracks_df = dictionary_df[dictionary_df["Spår_huvud_sido"] == "nhsp"]

In [34]:
# Step 1: Group by 'BdlNr', 'Bandel', 'Plats_sign', 'Plats' and sum 'Banlangd'
grouped_by_plats = main_tracks_df.groupby(['BdlNr', 'Bandel', 'Plats_sign', 'Plats'])['Banlangd'].sum().reset_index()

# Step 2: Group by 'BdlNr', 'Bandel', 'Forbind' and sum 'Banlangd'
grouped_by_forbind = main_tracks_df.groupby(['BdlNr', 'Bandel', 'Forbind'])['Banlangd'].sum().reset_index()

# Step 3: Add 'Plats_sign' and 'Plats' columns with NaN to 'grouped_by_forbind' for consistency
grouped_by_forbind['Plats_sign'] = pd.NA
grouped_by_forbind['Plats'] = pd.NA

# Step 4: Add 'Forbind' column with NaN to 'grouped_by_plats' for consistency
grouped_by_plats['Forbind'] = pd.NA

# Step 5: Combine the two DataFrames using outer concatenation
combined_df = pd.concat([grouped_by_plats, grouped_by_forbind], ignore_index=True, sort=False)
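
The grouping and concatenation can be tried on a toy table. This is only a sketch: the values are invented, and it assumes each BIS row describes either a place (Plats_sign and Plats filled) or a link (Forbind filled), which the real file may not guarantee. It also shows that `pd.concat` fills columns missing from one frame with NaN on its own, so the explicit `pd.NA` columns mainly make the intent visible.

```python
import pandas as pd

# Toy stand-in for main_tracks_df; all values are invented for illustration.
toy_tracks = pd.DataFrame({
    "BdlNr": [401, 401, 401, 401],
    "Bandel": ["A-C", "A-C", "A-C", "A-C"],
    "Plats_sign": ["A", "A", None, None],
    "Plats": ["Astad", "Astad", None, None],
    "Forbind": [None, None, "A-B", "A-B"],
    "Banlangd": [100, 50, 1000, 1200],
})

keys = ["BdlNr", "Bandel"]

# groupby drops rows whose grouping key is missing (dropna=True by default),
# so link rows do not enter the per-place sums and place rows do not enter
# the per-link sums.
by_plats = toy_tracks.groupby(keys + ["Plats_sign", "Plats"])["Banlangd"].sum().reset_index()
by_forbind = toy_tracks.groupby(keys + ["Forbind"])["Banlangd"].sum().reset_index()

# Columns present in only one frame are filled with NaN in the other rows.
combined = pd.concat([by_plats, by_forbind], ignore_index=True, sort=False)
print(combined)
```

Here the place row sums to 150 and the link row to 2200.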

## Bandelar from contracts

In [36]:
# Step 1: Load the Excel file containing service contracts for each bandel
#excel_file_path = "servicekontrakt_per_bandel_Abdou.xlsx"
excel_file_path = "more_servicekontrakt_per_bandel.xlsx"

#sheet_name = "uppdaterad"
sheet_name = "tid per bandel"

# Read the selected sheet into a DataFrame
servicekontrakt_df = pd.read_excel(excel_file_path, sheet_name=sheet_name)

## Matching TCR:s Förbindelser with bandelar

There are some rows where the bandel is not identified because the Från trafikplats is not in the dictionary. For these rows we use the bandel identified in a related row, i.e., a row with the same TCR-id whose Platssekvensnummer is adjacent (the Platssekvensnummer of the unidentified row minus or plus 1).
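
One possible way to do this neighbour fill is sketched below on toy data. It assumes the bandel is stored in a column named identified_BdlNr (the name used later in this notebook) and that a missing bandel is NaN. Preferring the previous row over the next one is an arbitrary choice made here for illustration.

```python
import pandas as pd

# Toy TCR rows; all values are invented for illustration.
toy_tcr = pd.DataFrame({
    "TCR-id": ["T1", "T1", "T1", "T2", "T2"],
    "Platssekvensnummer": [1, 2, 3, 1, 3],
    "identified_BdlNr": [401.0, None, 402.0, None, 500.0],
})

toy_tcr = toy_tcr.sort_values(["TCR-id", "Platssekvensnummer"]).reset_index(drop=True)
grp = toy_tcr.groupby("TCR-id")

# Bandel and sequence number of the previous and next row within the same TCR
prev_bdl = grp["identified_BdlNr"].shift(1)
next_bdl = grp["identified_BdlNr"].shift(-1)
prev_seq = grp["Platssekvensnummer"].shift(1)
next_seq = grp["Platssekvensnummer"].shift(-1)

# Accept a neighbour only if its sequence number differs by exactly 1
from_prev = prev_bdl.where(toy_tcr["Platssekvensnummer"] - prev_seq == 1)
from_next = next_bdl.where(next_seq - toy_tcr["Platssekvensnummer"] == 1)

# Fill missing values from the previous neighbour first, then from the next one
toy_tcr["identified_BdlNr"] = (
    toy_tcr["identified_BdlNr"].fillna(from_prev).fillna(from_next)
)
print(toy_tcr)
```

In this toy data the gap in T1 is filled with 401, while the first row of T2 stays empty because its only neighbour has Platssekvensnummer 3, which is not adjacent to 1.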

In [None]:
# For each (TCR-id, Starttid) group, keep only the first and last rows
def get_first_last_rows(group):
    # Rows with the minimum and maximum Platssekvensnummer
    first_row = group[group['Platssekvensnummer'] == group['Platssekvensnummer'].min()]
    last_row = group[group['Platssekvensnummer'] == group['Platssekvensnummer'].max()]
    return pd.concat([first_row, last_row])

# Apply the function to each group
filtered_tcr_df = tcr_df.groupby(['TCR-id', 'Starttid'], as_index=False).apply(get_first_last_rows)

# Reset index if needed
filtered_tcr_df = filtered_tcr_df.reset_index(drop=True)

Before calculating the length between two consecutive places, we need to reshape tcr_df so that each row is paired with the next row in Platssekvensnummer order, if there is one (the final row of a sequence has no successor). We create a new column called förbind_list that concatenates the Från trafikplats of two consecutive rows: the first row gets A-B (where A is the trafikplats of the first row and B that of the second), the next row gets B-C, and so on until the final förbind in the sequence.

In [47]:
# Step 1: Sort the DataFrame by 'TCR-id', 'Starttid' and 'Platssekvensnummer'
filtered_tcr_df = filtered_tcr_df.sort_values(by=['TCR-id', 'Starttid', 'Platssekvensnummer']).reset_index(drop=True)

# Step 4: Create 'next_trafikplats' and 'next_Från_inkluderad'
filtered_tcr_df['next_trafikplats'] = filtered_tcr_df.groupby(['TCR-id', 'Starttid'])['Från trafikplats'].shift(-1)
filtered_tcr_df['next_Från_inkluderad'] = filtered_tcr_df.groupby(['TCR-id', 'Starttid'])['Från inkluderad'].shift(-1)

# Step 5: Create 'förbind_list' with conditional parentheses
def format_trafikplats(trafikplats, inkluderad):
    """Format trafikplats name with parentheses based on inclusion status."""
    if inkluderad != 'Helt':
        return f"({trafikplats})"
    return trafikplats

def create_förbind(row):
    """Create förbind string for a row, connecting two trafikplats names."""
    if pd.isna(row['next_trafikplats']):
        return None

    # Same place on both rows: return the place name on its own
    if row['Från trafikplats'] == row['next_trafikplats']:
        return f"{row['Från trafikplats']}"

    from_tp = format_trafikplats(row['Från trafikplats'], row['Från inkluderad'])
    to_tp = format_trafikplats(row['next_trafikplats'], row['next_Från_inkluderad'])

    return f"{from_tp}-{to_tp}"

# Apply the function to create förbind_list
filtered_tcr_df['förbind_list'] = filtered_tcr_df.apply(create_förbind, axis=1)
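
To see what förbind_list looks like, the same pairing logic can be run on a toy sequence. This is a sketch under assumptions: the place names are invented, and "Delvis" is a made-up stand-in for any Från inkluderad value other than 'Helt'.

```python
import pandas as pd

# Toy rows for a single TCR; all values are invented for illustration.
toy = pd.DataFrame({
    "TCR-id": ["T1"] * 3,
    "Starttid": ["2024-01-01"] * 3,
    "Platssekvensnummer": [1, 2, 3],
    "Från trafikplats": ["A", "B", "C"],
    "Från inkluderad": ["Helt", "Delvis", "Helt"],
})

keys = ["TCR-id", "Starttid"]
toy = toy.sort_values(keys + ["Platssekvensnummer"]).reset_index(drop=True)

# Pull the next row's place and inclusion status up onto the current row
toy["next_trafikplats"] = toy.groupby(keys)["Från trafikplats"].shift(-1)
toy["next_Från_inkluderad"] = toy.groupby(keys)["Från inkluderad"].shift(-1)

def fmt(name, inkluderad):
    # Places that are not fully included are wrapped in parentheses
    return name if inkluderad == "Helt" else f"({name})"

def link(row):
    if pd.isna(row["next_trafikplats"]):
        return None  # last row of the sequence has no successor
    if row["Från trafikplats"] == row["next_trafikplats"]:
        return row["Från trafikplats"]
    start = fmt(row["Från trafikplats"], row["Från inkluderad"])
    end = fmt(row["next_trafikplats"], row["next_Från_inkluderad"])
    return f"{start}-{end}"

toy["förbind_list"] = toy.apply(link, axis=1)
print(toy["förbind_list"].iloc[:2].tolist())  # ['A-(B)', '(B)-C']
```

The third row has no successor, so its förbind_list is missing and it is the row that gets dropped in the next step.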

In [48]:
# Step 6: Remove temporary 'next_trafikplats' and 'next_Från_inkluderad' columns
#filtered_tcr_df = filtered_tcr_df.drop(columns=['next_trafikplats', 'next_Från_inkluderad'])

# Step 7: Remove the final row in each sequence
filtered_tcr_df = filtered_tcr_df.dropna(subset=['förbind_list']).reset_index(drop=True)

In [49]:
# Continue with a copy of the filtered rows as tcr_df
tcr_df = filtered_tcr_df.copy()

We need to update identified_BdlNr based on förbind_list. If the förbind_list value is found in the dictionary (column Forbind) and the corresponding BdlNr differs from the current identified_BdlNr, we update it. Otherwise we leave it as it is.

In [50]:
# # Step 1: Create a mapping from 'Forbind' to 'BdlNr' for quick lookups
# forbind_to_bdl_map = dictionary_df.set_index('Forbind')['BdlNr'].to_dict()

In [51]:
# # Step 2: Update 'identified_BdlNr' based on 'förbind_list'
# def update_bandel(row):
#     # Check if 'förbind_list' exists in the dictionary
#     if row['förbind_list'] in forbind_to_bdl_map:
#         new_bdl_nr = forbind_to_bdl_map[row['förbind_list']]
#         # Update only if the new BdlNr is different
#         if new_bdl_nr != row['identified_BdlNr']:
#             return new_bdl_nr
    
#     # If not found, try the inverted link
#     inverted_link = '-'.join(reversed(row['förbind_list'].split('-')))
#     if inverted_link in forbind_to_bdl_map:
#         new_bdl_nr = forbind_to_bdl_map[inverted_link]
#         # Update only if the new BdlNr is different
#         if new_bdl_nr != row['identified_BdlNr']:
#             return new_bdl_nr
    
#     # If no update is needed or not found, return the current value
#     return row['identified_BdlNr']

# # Apply the function to update 'identified_BdlNr'
# tcr_df['identified_BdlNr'] = tcr_df.apply(update_bandel, axis=1)

Now that BdlNr has been identified, we can use servicekontrakt_df to add a column with Kontraktsområdesnamn.

In [52]:
# Create the contract map with float conversion
contract_map = servicekontrakt_df.drop_duplicates(subset=['Bandelnr']).copy()
contract_map['Bandelnr'] = contract_map['Bandelnr'].astype(float)
contract_map = contract_map.set_index('Bandelnr')['Kontraktsområdesnamn'].to_dict()

# Map with the contract map
tcr_df['identified_BdlNr'] = tcr_df['identified_BdlNr'].astype(float)
tcr_df['Kontraktsområdesnamn'] = tcr_df['identified_BdlNr'].map(contract_map)
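
The mapping can be checked on toy data, including a quick look at which bandel numbers found no contract. This is a sketch with invented contract names and bandel numbers. It stores the bandel numbers as strings on the TCR side to show why both sides are cast to float before mapping.

```python
import pandas as pd

# Toy contract table and TCR rows; all values are invented for illustration.
toy_contracts = pd.DataFrame({
    "Bandelnr": [401, 402, 402],
    "Kontraktsområdesnamn": ["Nord", "Syd", "Syd"],
})
toy_tcr = pd.DataFrame({"identified_BdlNr": ["401", "402", "777"]})

# Lookup Series keyed by bandel number, with keys cast to float
contract_map = (
    toy_contracts.drop_duplicates(subset=["Bandelnr"])
    .assign(Bandelnr=lambda d: d["Bandelnr"].astype(float))
    .set_index("Bandelnr")["Kontraktsområdesnamn"]
)

# Cast the TCR side to the same type so that "401" and 401 match
toy_tcr["identified_BdlNr"] = toy_tcr["identified_BdlNr"].astype(float)
toy_tcr["Kontraktsområdesnamn"] = toy_tcr["identified_BdlNr"].map(contract_map)

# Bandel numbers that did not match any contract
missing = toy_tcr["Kontraktsområdesnamn"].isna()
unmapped = sorted(toy_tcr.loc[missing, "identified_BdlNr"].unique().tolist())
print(unmapped)  # [777.0]
```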

We can also add a similar column with the contract name, this time based on the BIS file.

In [53]:
# Additional mapping using the bandel_contract_dict
tcr_df['kontrakt_från_bandel'] = tcr_df['identified_BdlNr'].map(bandel_contract_dict)

Now that we have identified BdlNr, we want the total length (langd, in meters) for the Från trafikplats of each row, stored in a column (sum_langd). The idea is to follow the order of förbind_list, look up the corresponding rows in dictionary_df within the same bandel (BdlNr equal to identified_BdlNr), and accumulate the length from the Banlangd column. The entries in förbind_list are normally chained, e.g., A-B, B-C, etc.

In [54]:
# Apply the function to create the 'sum_langd' column for tcr_df
# tcr_df['sum_langd'] = tcr_df.apply(
#     lambda row: calculate_sum_langd(
#         row['förbind_list'], 
#         row['identified_BdlNr'], 
#         dictionary_df
#     ), 
#     axis=1
# )

tcr_df['sum_langd'] = tcr_df.apply(
    lambda row: calculate_sum_langd_for_bandelnamn(row, dictionary_df),
    axis=1
)
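
The helper calculate_sum_langd_for_bandelnamn is not defined in this section. As an illustration of the lookup described above, a hypothetical helper named sketch_sum_langd could look like the following. It is not the notebook's function. It assumes parentheses can be ignored, links may be stored in either direction, and all matching rows on the bandel should be summed.

```python
import pandas as pd

# Toy dictionary; all values are invented for illustration.
toy_dictionary = pd.DataFrame({
    "BdlNr": [401, 401, 401, 402],
    "Forbind": ["A-B", "A-B", "B-C", "A-B"],
    "Banlangd": [1000, 1200, 500, 9999],
})

def sketch_sum_langd(forbind, bdl_nr, dictionary):
    """Hypothetical helper: total Banlangd (meters) for one link on one bandel."""
    # Assumption: parentheses only mark partial inclusion and can be removed
    plain = forbind.replace("(", "").replace(")", "")
    reversed_link = "-".join(reversed(plain.split("-")))
    # Match the link in either direction, restricted to the given bandel
    mask = (dictionary["BdlNr"] == bdl_nr) & dictionary["Forbind"].isin(
        [plain, reversed_link]
    )
    if not mask.any():
        return None  # no matching rows: leave the length missing
    return float(dictionary.loc[mask, "Banlangd"].sum())

toy_tcr = pd.DataFrame({
    "förbind_list": ["A-B", "(C)-B", "X-Y"],
    "identified_BdlNr": [401.0, 401.0, 401.0],
})
toy_tcr["sum_langd"] = toy_tcr.apply(
    lambda row: sketch_sum_langd(
        row["förbind_list"], row["identified_BdlNr"], toy_dictionary
    ),
    axis=1,
)
print(toy_tcr)
```

In the toy data, A-B on bandel 401 sums to 2200 (the 9999 row belongs to bandel 402 and is ignored), (C)-B is found through the reversed link B-C, and X-Y stays missing.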

## Validations

In [None]:
# Print any rows in tcr_df where sum_langd is missing
print(tcr_df[tcr_df['sum_langd'].isna()])

## Export to Excel files