## Note on the GML File and Network Modeling

Our quarterly GML datasets are not GML files in the sense of the **Graph Modelling Language** that `networkx.read_gml()` expects. Instead, they use the **Geography Markup Language (GML)**, an XML-based format for representing geographical features such as lines, points, and polygons.

While `networkx.read_gml()` expects a GML file structured as a graph with named nodes and edges, the dataset in question contains georeferenced lines in the form of `LineString` geometries. These lines represent bicycle paths defined by their coordinates but do not explicitly define a network with uniquely labeled nodes and directed edges.
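To make the distinction concrete, the following minimal sketch guesses which of the two "GML" formats a piece of text belongs to. The helper name `guess_gml_flavor` and the two sample strings are hypothetical and only serve as illustration; the check is a rough heuristic (XML starts with `<`, a Graph Modelling Language file contains a `graph [` block), not a validator.

```python
import re


def guess_gml_flavor(text: str) -> str:
    """Hypothetical helper: guess which 'GML' a file's leading text belongs to."""
    # Drop a possible byte-order mark and leading whitespace
    head = text.lstrip("\ufeff \t\r\n")
    # Geography Markup Language is XML, so the document starts with '<'
    if head.startswith("<"):
        return "geography-markup"
    # Graph Modelling Language files contain a top-level 'graph [' block
    if re.search(r"\bgraph\s*\[", head):
        return "graph-modelling"
    return "unknown"


# Made-up samples, not taken from the dataset
xml_sample = '<?xml version="1.0"?><gml:FeatureCollection></gml:FeatureCollection>'
graph_sample = 'graph [\n  node [ id 0 label "a" ]\n]'

print(guess_gml_flavor(xml_sample))    # geography-markup
print(guess_gml_flavor(graph_sample))  # graph-modelling
```

Only the second kind of file can be passed to `networkx.read_gml()`; the first kind has to go through a geospatial reader.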


To build a `networkx`-compatible network from this file, the following steps were required:

1. **Load the file using `geopandas`**: The geographic GML format can be read directly with GeoPandas.
2. **Extract geometries**: The geometries in the file represent paths, and each geometry can be used to define edges in the network.
3. **Determine the start and end points** of each line: These serve as nodes in the graph.
4. **Create a directed graph with `networkx`**: Each line becomes two directed edges, one from start to end carrying `TRACKS_FWD` and one from end to start carrying `TRACKS_BAC`. Both edges also store attributes such as `SPEED_REL`, `YEAR`, and `MONTH`. Keeping the two directions on separate edges allows directional analysis of the paths.

These steps make it possible to transform geospatial data into a network structure suitable for advanced analysis such as **centrality measures**, **clustering**, or **prediction tasks**.
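Steps 3 and 4 can be sketched on toy data without reading any file. The `segments` list below is invented for illustration (plain coordinate tuples instead of `LineString` objects, made-up counts), and the sketch assumes `networkx` is installed; it is not the full conversion, only the core idea of turning line endpoints into nodes and each line into two directed edges.

```python
import networkx as nx

# Hypothetical sample: each segment is (coordinates, TRACKS_FWD, TRACKS_BAC)
segments = [
    ([(9.7171, 52.3732), (9.7160, 52.3741), (9.7152, 52.3750)], 78, 95),
    ([(9.7152, 52.3750), (9.7140, 52.3770)], 12, 7),
]

G = nx.DiGraph()
for coords, tracks_fwd, tracks_bac in segments:
    # Step 3: the first and last vertex of the line become the nodes;
    # rounding lets nearly identical endpoints collapse into one node
    start = tuple(round(v, 3) for v in coords[0])
    end = tuple(round(v, 3) for v in coords[-1])
    # Step 4: one directed edge per travel direction
    G.add_edge(start, end, tracks_fwd=tracks_fwd)
    G.add_edge(end, start, tracks_bac=tracks_bac)

print(G.number_of_nodes(), G.number_of_edges())  # 3 4
```

The two segments share the endpoint `(9.715, 52.375)` after rounding, which is why three nodes rather than four appear in the graph.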

To install the required libraries, use the following command:
```bash
pip install geopandas networkx shapely
```

In [None]:
import geopandas as gpd
import networkx as nx
import os

def create_bike_network(file_path, month, year):
    """
    Creates a directed bike network graph from the provided GML file,
    filtered by the specified month and year. The graph is then saved
    as a GraphML file under ../graphs/<year>/.

    Parameters:
    file_path (str): The file path to the GML file containing the bike network data.
    month (int): The month to filter the data (zero-based in this dataset: 0 = January).
    year (int): The year to filter the data.
    """

    # 1. Load the GML file
    gdf = gpd.read_file(file_path)

    # 2. Initialize the network (directed graph)
    G = nx.DiGraph()

    # 3. Iterate over each row in the GeoDataFrame
    for idx, row in gdf.iterrows():
        # Only process rows with the specified month and year
        if row.get('MONTH') != month or row.get('YEAR') != year:
            continue

        # Get the geometry of the row; skip rows without a usable geometry
        geom = row.geometry
        if geom is None or geom.is_empty:
            continue

        # Multi-part geometries expose their parts via .geoms; a plain
        # LineString does not, so wrap it in a list
        lines = geom.geoms if hasattr(geom, 'geoms') else [geom]

        for line in lines:
            # Round the coordinates to 3 decimal places (any z value is ignored)
            coords_rounded = [(round(x, 3), round(y, 3)) for x, y, *_ in line.coords]

            start = coords_rounded[0]
            end = coords_rounded[-1]

            # Skip if the start and end points are the same (no edge)
            if start == end:
                continue

            # add_edge() creates missing nodes, so no explicit add_node() is needed

            # Add or update the forward edge (start -> end), which carries TRACKS_FWD
            if G.has_edge(start, end) and 'tracks_fwd' in G[start][end]:
                G[start][end]['tracks_fwd'] += row.get('TRACKS_FWD', 0)
            elif G.has_edge(start, end) and 'tracks_bac' in G[start][end]:
                # The edge was created as the backward edge of a reversed line;
                # this row's traffic in that direction is still TRACKS_FWD
                G[start][end]['tracks_bac'] += row.get('TRACKS_FWD', 0)
            else:
                # Create the forward edge if it doesn't exist
                G.add_edge(
                    start, end,
                    id=row.get('ID'),
                    tracks_fwd=row.get('TRACKS_FWD', 0),
                    year=row.get('YEAR'),
                    month=row.get('MONTH'),
                    speed_rel=row.get('SPEED_REL')
                )

            # Add or update the backward edge (end -> start), which carries TRACKS_BAC
            if G.has_edge(end, start) and 'tracks_bac' in G[end][start]:
                G[end][start]['tracks_bac'] += row.get('TRACKS_BAC', 0)
            elif G.has_edge(end, start) and 'tracks_fwd' in G[end][start]:
                # The edge was created as the forward edge of a reversed line
                G[end][start]['tracks_fwd'] += row.get('TRACKS_BAC', 0)
            else:
                # Create the backward edge if it doesn't exist
                G.add_edge(
                    end, start,
                    id=row.get('ID'),
                    tracks_bac=row.get('TRACKS_BAC', 0),
                    year=row.get('YEAR'),
                    month=row.get('MONTH'),
                    speed_rel=row.get('SPEED_REL')
                )
     

    # 4. Print the number of nodes and edges in the graph
    print(f"Nodes: {len(G.nodes)}, Edges: {len(G.edges)}")
    print("First 5 nodes:", list(G.nodes)[:5])
    print("First 5 edges:", list(G.edges(data=True))[:5])

    # 5. Build the output path and make sure the target directory exists
    output_dir = os.path.join('..', 'graphs', str(year))
    os.makedirs(output_dir, exist_ok=True)
    output_path = os.path.join(output_dir, f"bike_network_{year}_{month}.graphml")

    # 6. Save the graph in GraphML format
    nx.write_graphml(G, output_path)

    print(f"Graph saved at: {output_path}")
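
A note on the rounding step in `create_bike_network()`: rounding to 3 decimal places decides how close two endpoints must be to merge into one node. The sketch below estimates the ground size of one rounding step. It assumes the coordinates are longitude/latitude in degrees (values such as `(9.717, 52.373)` in the output suggest this, but it is an assumption) and uses a rough spherical-Earth figure of about 111.32 km per degree of latitude. The helper name `rounding_cell_size_m` is hypothetical.

```python
import math


def rounding_cell_size_m(lat_deg: float, decimals: int = 3):
    """Approximate ground size in metres of one rounding step in lon/lat degrees."""
    step_deg = 10 ** -decimals
    metres_per_deg_lat = 111_320.0  # rough spherical-Earth approximation
    north_south = step_deg * metres_per_deg_lat
    # A degree of longitude shrinks with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


ew, ns = rounding_cell_size_m(52.373)
print(f"east-west: {ew:.0f} m, north-south: {ns:.0f} m")
```

Under these assumptions one step is roughly 68 m east-west and 111 m north-south, so endpoints that lie within that distance of each other can collapse into the same node. A larger `decimals` value keeps more nodes separate but may leave small gaps between lines that should connect.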



## Generating Bike Network Graphs for Each Year and Quarter

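The function `generate_networks()` generates one bike network graph per month for the years 2021 through 2024. The data is delivered as one GML file per quarter, so the function iterates over the years, the four quarters, and the three months of each quarter, and calls `create_bike_network()` on the quarterly file once per month. Months are zero-based here (0 = January, 11 = December), which matches the `MONTH` values in the sample output below. Missing files are reported and skipped.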
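
The relationship between a zero-based month and its quarterly file can be sketched as a small lookup. The helper `quarter_file_for` is hypothetical and not part of the notebook; it only reuses the file naming pattern from the code cell that follows.

```python
import os


def quarter_file_for(year: int, month: int) -> str:
    """Hypothetical helper: path of the quarterly file covering a zero-based month."""
    if not 0 <= month <= 11:
        raise ValueError("month must be in 0..11 (zero-based)")
    # Months 0-2 map to Q1, 3-5 to Q2, 6-8 to Q3, 9-11 to Q4
    quarter = f"Q{month // 3 + 1}"
    return os.path.join("..", "data", str(year), f"bike_citizens_rh_{quarter}_{year}.gml")


for month in (0, 2, 3, 11):
    print(month, os.path.basename(quarter_file_for(2021, month)))
# 0 bike_citizens_rh_Q1_2021.gml
# 2 bike_citizens_rh_Q1_2021.gml
# 3 bike_citizens_rh_Q2_2021.gml
# 11 bike_citizens_rh_Q4_2021.gml
```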


In [2]:
def generate_networks():
    # Years from 2021 to 2024
    years = range(2021, 2025)
    
    # Quarters and their zero-based months (0 = January, 11 = December)
    quarters = {
        "Q1": range(0, 3),   # January to March
        "Q2": range(3, 6),   # April to June
        "Q3": range(6, 9),   # July to September
        "Q4": range(9, 12)   # October to December
    }
    
    # Loop through each year and each quarter, then each month
    for year in years:
        for quarter, months in quarters.items():
            for month in months:
                # Create the file path based on the schema
                file_path = os.path.join("..", "data", str(year), f"bike_citizens_rh_{quarter}_{year}.gml")
                
                # Check if the file exists
                if os.path.exists(file_path):
                    print(f"Processing file: {file_path} (Year: {year}, Quarter: {quarter}, Month: {month})")
                    create_bike_network(file_path, month, year)
                else:
                    print(f"File not found: {file_path} (Year: {year}, Quarter: {quarter}, Month: {month})")

# Example call of the script
generate_networks()


Processing file: ..\data\2021\bike_citizens_rh_Q1_2021.gml (Year: 2021, Quarter: Q1, Month: 0)
Nodes: 7061, Edges: 16190
First 5 nodes: [(9.717, 52.373), (9.715, 52.375), (9.714, 52.377), (9.705, 52.378), (9.705, 52.377)]
First 5 edges: [((9.717, 52.373), (9.715, 52.375), {'id': 85, 'tracks_fwd': 78, 'year': 2021, 'month': 0, 'speed_rel': 0.8453424565}), ((9.717, 52.373), (9.718, 52.372), {'id': 7849, 'tracks_bac': 99, 'year': 2021, 'month': 0, 'speed_rel': 0.9411203871}), ((9.717, 52.373), (9.718, 52.374), {'id': 30249, 'tracks_fwd': 9, 'year': 2021, 'month': 0, 'speed_rel': 0.6649648386}), ((9.717, 52.373), (9.716, 52.373), {'id': 42548, 'tracks_fwd': 28, 'year': 2021, 'month': 0, 'speed_rel': 0.794345224}), ((9.715, 52.375), (9.717, 52.373), {'id': 85, 'tracks_bac': 95, 'year': 2021, 'month': 0, 'speed_rel': 0.8453424565})]
Graph saved at: ..\graphs\2021\bike_network_2021_0.graphml
Processing file: ..\data\2021\bike_citizens_rh_Q1_2021.gml (Year: 2021, Quarter: Q1, Month: 1)
Nodes: 