# Dataset conversion
This dataset, published by the City of Bologna, contains hourly bicycle traffic counts collected continuously at various locations throughout the city.
For the San Donato, Parri, Ercolani, and Sabotino counters, data is updated monthly, while for all other counters updates are provided daily.

The dataset is available in RDF/XML format at [data.europa.eu](http://data.europa.eu/88u/dataset/c_a944-colonnine-conta-bici).

In this notebook the dataset is converted from the original RDF/XML into a tabular format (CSV), which is faster and easier to work with in subsequent operations.

In [22]:
import xml.etree.ElementTree as ET
import csv
from pathlib import Path

## Specify input/output parameters
Edit the input and output paths below if needed.

In [23]:
rdf_path = Path("colonnine-conta-bici.rdf")
csv_path = Path("colonnine-conta-bici.csv")

## Function Definitions

This cell defines two functions used to convert the `colonnine-conta-bici.rdf` file into a structured CSV.

The first function, `detect_namespace`, takes the root of the parsed RDF/XML tree and the local name of the record tag (`colonnine-conta-bici-record`). It iterates over the XML elements until it finds the first tag with that local name, then returns the namespace URI of that tag. This is necessary because ElementTree exposes namespaced tags in the fully qualified form `{namespace-uri}local-name`, so elements cannot be looked up by local name alone.
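
To illustrate the tag notation that `detect_namespace` relies on, the following minimal sketch parses a tiny in-memory document. The namespace URI `http://example.org/ns#` and the sample markup are made up for the illustration and are not taken from the Bologna dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the namespace URI is an assumption for illustration only.
sample = (
    '<root xmlns:ex="http://example.org/ns#">'
    '<ex:colonnine-conta-bici-record/>'
    '</root>'
)
root = ET.fromstring(sample)
record = root[0]

# ElementTree stores the tag as "{namespace-uri}local-name".
print(record.tag)

# Splitting on the closing brace separates the URI from the local name.
uri, local = record.tag[1:].split("}", 1)
print(uri)
print(local)
```

The prefix used in the file (`ex` here) does not appear in the parsed tag; only the URI does, which is why the URI has to be discovered before elements can be queried.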

The second function, `rdfxml_to_csv`, handles the actual conversion. It parses the RDF file, determines the namespace using `detect_namespace`, and finds all the `<colonnine-conta-bici-record>` elements. It then extracts six predefined fields (`colonnina`, `totale`, `direzione_periferia`, `direzione_centro`, `geo_point_2d`, and `data`) from each record and writes them to a CSV file. Each field is retrieved as a direct child of the record element; if a field is missing or empty, an empty string is written in its place.

In [24]:
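def detect_namespace(root, record_local_name: str) -> str:
    """Return the namespace URI of the first element whose local name matches."""
    for elem in root.iter():
        # Namespaced tags look like "{uri}local"; comments and PIs have non-string tags
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            # Split only on the first "}" so the unpacking always yields two parts
            uri, local = elem.tag[1:].split("}", 1)
            if local == record_local_name:
                return uri
    raise ValueError(f"Record tag '{record_local_name}' not found")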

def rdfxml_to_csv(rdf_path: Path, csv_path: Path):
    """Convert the RDF/XML file at rdf_path into a CSV file at csv_path."""
    tree = ET.parse(rdf_path)
    root = tree.getroot()
    record_local = "colonnine-conta-bici-record"
    fields = ["colonnina", "totale", "direzione_periferia", "direzione_centro", "geo_point_2d", "data"]
    # Discover the namespace of the record elements instead of hard-coding it
    ns_uri = detect_namespace(root, record_local)
    ns = {"ns": ns_uri}
    records = root.findall(f".//ns:{record_local}", ns)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(fields)  # header row
        for rec in records:
            row = []
            for fld in fields:
                # Look up each field as a direct child of the record
                el = rec.find(f"ns:{fld}", ns)
                # Missing or empty elements become empty strings
                row.append(el.text.strip() if el is not None and el.text else "")
            writer.writerow(row)
    print(f"[OK] {len(records)} records written in {csv_path}")

In [25]:
rdfxml_to_csv(rdf_path, csv_path)

[OK] 298436 records written in colonnine-conta-bici.csv
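
As a quick sanity check, the number of data rows in the CSV can be compared with the record count printed by the conversion. The sketch below is not part of the original notebook; `count_csv_rows` is a hypothetical helper introduced here for illustration, and it assumes the output file name used earlier in this notebook.

```python
import csv
from pathlib import Path


def count_csv_rows(path: Path) -> int:
    """Return the number of data rows in a CSV file, excluding the header."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


# Assumed to match the output path used for the conversion.
csv_path = Path("colonnine-conta-bici.csv")
if csv_path.exists():
    print(count_csv_rows(csv_path))
```

Using `csv.reader` rather than counting raw lines keeps the count correct even if a quoted field were to contain a line break.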


## Example table preview
Display the first 10 rows of the converted dataset.

In [37]:
import pandas as pd

df = pd.read_csv(csv_path)

pd.set_option('display.max_columns', None)

display(df.head(10))


|   | colonnina | totale | direzione_periferia | direzione_centro | geo_point_2d | data |
|---|---|---|---|---|---|---|
| 0 | Orti_II | 3.0 | 3.0 |  | 44.47624162079627,11.37609044129954 | 2025-06-09 21:00:00+00:00 |
| 1 | Murri_I | 5.0 | 5.0 |  | 44.48440745989066,11.35658715560829 | 2025-06-09 20:00:00+00:00 |
| 2 | Mazzini_II | 130.0 |  | 130.0 | 44.48936290709702,11.35940582976347 | 2025-06-10 05:00:00+00:00 |
| 3 | Sturzo_II | 14.0 | 14.0 |  | 44.48820778081575,11.29599058158018 | 2025-06-10 07:00:00+00:00 |
| 4 | Massarenti_II | 33.0 |  | 33.0 | 44.49300884795814,11.37056742338153 | 2025-06-10 02:00:00+00:00 |
| 5 | Zanardi_I |  |  |  | 44.50817288812807,11.32990298071598 | 2025-06-09 22:00:00+00:00 |
| 6 | Murri_II | 7.0 |  | 7.0 | 44.48418981668801,11.35719458905058 | 2025-06-09 21:00:00+00:00 |
| 7 | Zanardi_II | 191.0 | 191.0 |  | 44.5082267800901,11.32964907760978 | 2025-06-10 04:00:00+00:00 |
| 8 | Zanardi_II | 1.0 | 1.0 |  | 44.5082267800901,11.32964907760978 | 2025-06-09 20:00:00+00:00 |
| 9 | Zanardi_II | 68.0 | 68.0 |  | 44.5082267800901,11.32964907760978 | 2025-06-10 08:00:00+00:00 |
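
The `geo_point_2d` and `data` columns are stored as plain strings in the CSV. The sketch below shows one possible way to turn them into typed values using only the standard library. `split_geo_point` is a hypothetical helper, not part of the original notebook, and reading the pair as latitude followed by longitude is an interpretation based on the values in the preview (Bologna lies at roughly 44.5° N, 11.3° E).

```python
from datetime import datetime


def split_geo_point(value: str) -> tuple[float, float]:
    """Split a "lat,lon" string into a (latitude, longitude) pair of floats."""
    lat, lon = value.split(",")
    return float(lat), float(lon)


# Values copied from the first row of the preview.
lat, lon = split_geo_point("44.47624162079627,11.37609044129954")
timestamp = datetime.fromisoformat("2025-06-09 21:00:00+00:00")

print(lat, lon)
print(timestamp.isoformat())
```

The same helper could be applied to a whole DataFrame column, for example with `Series.map`; rows with a missing `geo_point_2d` would need to be handled separately.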


## Dataset Integration with Speed Limits