# Collisions Study - Pipeline
This notebook uses UrbanPipeline to efficiently analyse collision data, counting collisions per intersection.

## Data Sources

- **[NYC DOT Motor Vehicle Collisions](https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95)**  


⚠️ Note on the documentation's interactive examples ⚠️

Some of our Jupyter notebooks cannot be made interactive and are displayed as is in the documentation. You can tell whether a notebook is interactive by checking whether the output of each cell is visible. Feel free to install the library and run the notebooks locally.

Because storing datasets in a GitHub (or any other Git) repository is not good practice, we tried to load urban datasets from `HuggingFace` using `from_huggingface(...)` rather than `from_file(...)`, which requires the file to be available locally. However, (1) this was not always possible, since some datasets are not on `HuggingFace`, and (2) it does not prevent you from using `from_file(...)` or any other loader listed under the `Loader` module in the API reference.

In [None]:
import urban_mapper as um
from urban_mapper.pipeline import UrbanPipeline

# Note: for the documentation's interactive mode, we only query 5000 records from the dataset.
# Remove `number_of_rows` for a more realistic analysis.
data = (
    um.UrbanMapper()
    .loader
    .from_huggingface("oscur/NYC_vehicle_collisions", number_of_rows=5000, streaming=True)
    .with_columns(longitude_column="LONGITUDE", latitude_column="LATITUDE")
    .load()
)

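# Make sure the coordinate columns are numeric before they are used for mapping.
data['LONGITUDE'] = data['LONGITUDE'].astype(float)
data['LATITUDE'] = data['LATITUDE'].astype(float)

# Save a local copy so the pipeline's loader step can read it with `from_file`.
data.to_csv("./NYC_Motor_Vehicle_Collisions_Mar_12_2025.csv")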

In [None]:
mapper = um.UrbanMapper()

# Define the pipeline
pipeline = UrbanPipeline([
    ("urban_layer", (
        mapper
        .urban_layer
        .with_type("streets_intersections")
        .from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
        .with_mapping(
            longitude_column="LONGITUDE",
            latitude_column="LATITUDE",
            output_column="nearest_intersection"
        )
        .build()
    )),
    ("loader", (
        mapper
        .loader
        .from_file("./NYC_Motor_Vehicle_Collisions_Mar_12_2025.csv")
        .with_columns(longitude_column="LONGITUDE", latitude_column="LATITUDE")
        .build()
    )),
    ("imputer", (
        mapper
        .imputer
        .with_type("SimpleGeoImputer")
        .on_columns("LONGITUDE", "LATITUDE")
        .build()
    )),
    ("filter", um.UrbanMapper().filter.with_type("BoundingBoxFilter").build()),
    ("enricher", (
        mapper
        .enricher
        .with_data(group_by="nearest_intersection")
        .count_by(output_column="collision_count")
        .build()
    )),
    ("visualiser", (
        mapper
        .visual
        .with_type("Interactive")
        .with_style({"tiles": "CartoDB dark_matter", "colorbar_text_color": "white"})
        .build()
    ))
])
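
To make the intent of the steps above concrete, here is a plain-Python sketch of the idea behind them: discard records without coordinates, keep only records inside the bounding box of the intersections, assign each record to its nearest intersection, and count records per intersection. This is a conceptual illustration only, not UrbanMapper's implementation. The helper names (`haversine_m`, `count_collisions`), the intersection ids, and the coordinates are all made up for the example, and the sketch assumes the imputer drops rows with missing coordinates.

```python
import math
from collections import Counter


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def count_collisions(collisions, intersections):
    """Count collisions per nearest intersection (hypothetical helper).

    collisions: list of (lon, lat) tuples; either value may be None.
    intersections: dict mapping an intersection id to its (lon, lat).
    """
    # Step 1 (assumed imputer behaviour): drop records with missing coordinates.
    points = [(lon, lat) for lon, lat in collisions if lon is not None and lat is not None]

    # Step 2 (bounding-box filter): keep points inside the intersections' extent.
    lons = [lon for lon, _ in intersections.values()]
    lats = [lat for _, lat in intersections.values()]
    points = [
        (lon, lat)
        for lon, lat in points
        if min(lons) <= lon <= max(lons) and min(lats) <= lat <= max(lats)
    ]

    # Steps 3 and 4 (mapping and enrichment): nearest intersection, then count.
    counts = Counter()
    for lon, lat in points:
        nearest = min(
            intersections,
            key=lambda node: haversine_m(lon, lat, *intersections[node]),
        )
        counts[nearest] += 1
    return counts


# Made-up sample data: two intersections and five collision records.
intersections = {"A": (-73.9870, 40.6920), "B": (-73.9850, 40.6940)}
collisions = [
    (-73.9869, 40.6921),  # closest to A
    (-73.9851, 40.6939),  # closest to B
    (-73.9868, 40.6922),  # closest to A
    (None, 40.6900),      # missing longitude, dropped in step 1
    (-73.9000, 40.7500),  # outside the bounding box, dropped in step 2
]

counts = count_collisions(collisions, intersections)
print(dict(counts))
```

A brute-force nearest-neighbour search like this is only reasonable for small inputs; it is used here to keep the sketch short and dependency-free.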

In [None]:
# Execute the pipeline
mapped_data, enriched_layer = pipeline.compose_transform()

In [None]:
# Visualise the collision counts per intersection
pipeline.visualise(["collision_count"])

In [None]:
# Save the pipeline
pipeline.save("./collisions_pipeline.dill")