# Welcome to OSMNxMapping ☀️!

_Chapter 4_ is similar to chapter 1, but explores enricher options other than the _defaults_ used previously.

We therefore use a different dataset: we will map urban taxi trip data onto New York City's street network, once more step by step for now.

- Import the library.
- Initialise an OSMNxMapping instance.
- Search for taxi trip datasets using Auctus (or load manually).
- Load the selected dataset.
- Convert the dataset for OSMNxMapping.
- Query a road network.
- Preprocess the data (imputing and filtering).
- Map data to street nodes.
- Enrich the network by counting trips (with a preview).
- Visualise the results.

Let's dive in! 🚕

## Step 1: Import the Library

We begin by importing the `osmnx_mapping` library with the alias `oxm` for convenience.

In [None]:
import osmnx_mapping as oxm

## Step 2: Initialise an OSMNxMapping Instance

Next, we create an instance of `OSMNxMapping` named `taxi_trips`. This instance will manage our taxi trip data and road network analysis. At this stage, no data is loaded or queried; it is just the foundation for our next steps.

In [None]:
taxi_trips = oxm.OSMNxMapping()

## Step 3: Search for Datasets

We use the `AuctusSearchMixin` to search the Auctus API for datasets matching "taxis, NYC". Setting `display_initial_results=True` shows an interactive grid of dataset cards in the notebook.

> **Note**: After running this cell, browse the displayed datasets and click "Select This Dataset" on the one you want to use (e.g., a NYC taxi trip dataset). Alternatively, you can load taxi trip data manually from a file; see the `examples/` folder in the repository for examples such as loading from CSV or Parquet files.

In [None]:
collection = taxi_trips.auctus.search_datasets(search_query="taxis, NYC", size=100, display_initial_results=True)

## Step 4: Load the Selected Dataset

After selecting a dataset in Step 3, we load it into memory using `load_dataset_from_auctus()`. This returns a `pandas.DataFrame` or `geopandas.GeoDataFrame` and displays an interactive table preview by default.

In [None]:
dataset = taxi_trips.auctus.load_dataset_from_auctus()

## Step 5: Load Your Auctus Dataset into OSMNxMapping

We convert the loaded dataset into a format compatible with OSMNxMapping using `load_from_dataframe`. This method converts the data into a `geopandas.GeoDataFrame`, using the specified latitude and longitude columns. We then display it interactively with `interactive_display` (using Skrub).

> **Note**: Adjust `"latitude"` and `"longitude"` to match your dataset’s actual column names if they differ (e.g., `"pickup_latitude"` and `"pickup_longitude"` for taxi data).

In [None]:
loaded_data = taxi_trips.loader.load_from_dataframe(
    input_dataframe=dataset,
    latitude_column="latitude",  # Replace with your dataset's latitude column name
    longitude_column="longitude"  # Replace with your dataset's longitude column name
)
taxi_trips.table_vis.interactive_display(loaded_data)
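If you are unsure which columns hold the coordinates, a small helper can guess them from the column names. The sketch below is hypothetical: `find_coordinate_columns` and its `prefer` parameter are not part of OSMNxMapping, and the column names in the example are illustrative only.

```python
# Hypothetical helper (not part of OSMNxMapping): guess which columns hold
# latitude and longitude, so the names passed to load_from_dataframe match
# the dataset (e.g. "pickup_latitude" in many taxi trip tables).
def find_coordinate_columns(columns, prefer="pickup"):
    """Return (latitude_column, longitude_column) or raise ValueError."""
    lowered = {c.lower(): c for c in columns}

    def pick(keyword, short):
        # 1. Exact names such as "latitude" / "lat" (case-insensitive)
        for name in (keyword, short):
            if name in lowered:
                return lowered[name]
        # 2. Names containing the keyword, favouring the preferred prefix
        matches = [orig for low, orig in lowered.items() if keyword in low]
        preferred = [m for m in matches if m.lower().startswith(prefer)]
        if preferred or matches:
            return (preferred or matches)[0]
        raise ValueError(f"No {keyword} column found in {list(columns)}")

    return pick("latitude", "lat"), pick("longitude", "lon")


cols = ["trip_id", "pickup_longitude", "pickup_latitude",
        "dropoff_longitude", "dropoff_latitude"]
lat_col, lon_col = find_coordinate_columns(cols)
print(lat_col, lon_col)  # pickup_latitude pickup_longitude
```

Whatever names you settle on, the later preprocessing and mapping cells also pass `"latitude"` and `"longitude"`, so they would need the same replacement.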

## Step 6: Query a Road Network

We query the road network for Manhattan using `network_from_place`. Setting `render=True` displays a plot of the network.

In [None]:
graph, nodes, edges = taxi_trips.road_networks.network_from_place("Manhattan, New York City, USA", render=True)

## Step 7: Geo Preprocessing Your Dataset

We perform two preprocessing steps:

1. **Impute Missing Values**: Drop rows with missing latitude or longitude values using the default `SimpleGeoImputer`.
2. **Filter Data**: Keep only points within the road network’s bounding box using `BoundingBoxFilter`.

> **Note**: Each `PreprocessingMixin` instance can only perform one action (impute or filter). We reuse the mixin for each step here. For advanced preprocessing, see the `PreprocessingMixin` API.

### Substep 1: Impute Missing Values

In [None]:
loaded_data = (
    taxi_trips.preprocessing
    .with_default_imputer(latitude_column_name="latitude", longitude_column_name="longitude")
    .transform(input_data=loaded_data)
)
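To make the imputation step concrete, here is a minimal sketch of "drop rows with missing coordinates", assuming (as described above) that this is what the default `SimpleGeoImputer` does. It uses plain Python dicts rather than a GeoDataFrame, and `drop_missing_coordinates` is a hypothetical name, not the library's API.

```python
import math

# Minimal sketch of a "drop missing coordinates" imputer. Rows are plain
# dicts here, not a GeoDataFrame; the function name is hypothetical.
def drop_missing_coordinates(rows, latitude_column="latitude",
                             longitude_column="longitude"):
    def is_missing(value):
        # Treat None (or an absent key) and float NaN as missing
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [
        row for row in rows
        if not is_missing(row.get(latitude_column))
        and not is_missing(row.get(longitude_column))
    ]


toy_trips = [
    {"trip_id": 1, "latitude": 40.758, "longitude": -73.985},
    {"trip_id": 2, "latitude": None, "longitude": -73.990},
    {"trip_id": 3, "latitude": 40.730, "longitude": float("nan")},
]
print([r["trip_id"] for r in drop_missing_coordinates(toy_trips)])  # [1]
```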

### Substep 2: Filter Data Within Bounding Box

This step is usually worthwhile. For example, if your data covers all of NYC but you are focusing only on Manhattan, you should filter out the points outside Manhattan's network; otherwise those outliers could skew the rest of your urban analysis pipeline (a trip from Brooklyn, for instance, would still be mapped to whichever Manhattan node is closest).

In [None]:
loaded_data = (
    taxi_trips.preprocessing
    .with_default_filter(nodes=nodes)
    .transform(input_data=loaded_data)
)
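The idea behind the bounding-box filter can be sketched in a few lines: derive the box from the network's node coordinates, then keep only the points inside it. This is an illustration, not `BoundingBoxFilter`'s code; it assumes nodes are given as (longitude, latitude) pairs, which in an OSMnx node GeoDataFrame correspond to the `x` and `y` columns. The coordinates are illustrative only.

```python
# Minimal sketch of a bounding-box filter (not the library's implementation).
# Nodes and points are (longitude, latitude) pairs.
def bounding_box(node_coords):
    lons = [lon for lon, _ in node_coords]
    lats = [lat for _, lat in node_coords]
    return min(lons), min(lats), max(lons), max(lats)


def filter_within_bbox(points, bbox):
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        (lon, lat) for lon, lat in points
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    ]


# Illustrative coordinates only (roughly Manhattan-sized box)
nodes_xy = [(-74.02, 40.70), (-73.91, 40.88), (-73.97, 40.78)]
trips_xy = [(-73.985, 40.758),   # Midtown: kept
            (-73.944, 40.678)]   # Brooklyn: south of the box, dropped
print(filter_within_bbox(trips_xy, bounding_box(nodes_xy)))
# [(-73.985, 40.758)]
```

Because a bounding box is a rectangle, points in its corners that lie outside Manhattan proper (for instance across the rivers) can still pass this kind of filter.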

## Step 8: Map the Loaded Data to the Nearest Street Nodes

We map each taxi trip to the nearest node in the road network using `map_nearest_street`. This adds a column (default: `"nearest_node"`) to `loaded_data` with the ID of the closest node, essential for the enrichment step.

In [None]:
loaded_data = taxi_trips.road_networks.map_nearest_street(
    data=loaded_data,
    longitude_column="longitude",
    latitude_column="latitude"
)
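Conceptually, mapping to the nearest street node is a nearest-neighbour search. The sketch below is a brute-force stand-in that uses haversine distances; it is not how `map_nearest_street` is implemented internally, and the node IDs and coordinates are made up for illustration.

```python
import math

# Brute-force nearest-node lookup, a simplified stand-in for what
# map_nearest_street does. Distances use the haversine formula on
# (longitude, latitude) pairs, in metres.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lon1, lat1, lon2, lat2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_node(point, node_coords):
    """Return the ID of the node closest to point; node_coords maps id -> (lon, lat)."""
    lon, lat = point
    return min(node_coords,
               key=lambda node_id: haversine_m(lon, lat, *node_coords[node_id]))


# Toy data with hypothetical node IDs (distinct names so the notebook's
# `nodes` variable is not overwritten)
toy_nodes = {101: (-73.985, 40.758), 102: (-73.968, 40.785), 103: (-74.006, 40.713)}
toy_trips = [(-73.984, 40.757), (-74.005, 40.712)]
print([nearest_node(p, toy_nodes) for p in toy_trips])  # [101, 103]
```

This loop checks every node for every trip, so its cost grows with trips × nodes; for a full Manhattan network, a spatial index such as a k-d tree or R-tree is the usual choice.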

## Step 9: Enrich the Network with the Loaded Data

We enrich the network by counting the number of taxi trips per street segment, using the `CreateEnricher` factory with `count_by` instead of `aggregate_with` (as used in chapter 1). This counts occurrences by `"nearest_node"` without aggregating a specific value column.

Before building the enricher, we preview its configuration with `preview()` to compare it with chapter 1's approach.

> **Tip**: Compare this `count_by` method with chapter 1’s `aggregate_with` to see the difference in enrichment strategies!
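At node level, the difference between the two strategies can be sketched with plain Python: counting needs only the grouping column, while aggregating reduces a value column per group. This mirrors the idea of `count_by` versus `aggregate_with`, not the library's implementation; the `fare` column and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy trips; "fare" is a hypothetical value column for illustration
toy_trips = [
    {"nearest_node": 101, "fare": 12.5},
    {"nearest_node": 101, "fare": 8.0},
    {"nearest_node": 103, "fare": 20.0},
]


def count_by_node(rows, group_by="nearest_node"):
    # count_by-style: how many rows fall on each node; no value column needed
    counts = defaultdict(int)
    for row in rows:
        counts[row[group_by]] += 1
    return dict(counts)


def aggregate_by_node(rows, value_column, group_by="nearest_node", func=mean):
    # aggregate_with-style: reduce a value column per node (mean by default)
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value_column])
    return {node: func(values) for node, values in groups.items()}


print(count_by_node(toy_trips))              # {101: 2, 103: 1}
print(aggregate_by_node(toy_trips, "fare"))  # {101: 10.25, 103: 20.0}
```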

In [None]:
from osmnx_mapping.modules.enricher import CreateEnricher

# Configure the enricher to count trips
count_enricher = (
    CreateEnricher()
    .with_data(group_by="nearest_node")
    .count_by(edge_method="average", output_column="total_trips")
)

# Preview the enricher configuration
print(count_enricher.preview())

# Build the enricher
count_enricher = count_enricher.build()

# Apply the enricher to the network
enriched_data, graph, nodes, edges = taxi_trips.enricher(count_enricher).enrich_network(
    input_data=loaded_data,
    input_graph=graph,
    input_nodes=nodes,
    input_edges=edges
)
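Counts are computed per node, yet the result is described per street segment. One plausible reading of `edge_method="average"`, and it is only an assumption here (check the enricher documentation), is that each edge takes the mean of its two endpoint nodes' values. The hypothetical `node_values_to_edges` below sketches that propagation step.

```python
# ASSUMPTION: edge_method="average" is read as "an edge's value is the mean of
# the values at its two endpoint nodes". Hypothetical sketch, not library code.
def node_values_to_edges(edge_list, node_values, fill_value=0):
    """edge_list: iterable of (u, v) node-ID pairs; returns {(u, v): value}."""
    result = {}
    for u, v in edge_list:
        # Nodes with no mapped trips contribute fill_value (hypothetical default)
        a = node_values.get(u, fill_value)
        b = node_values.get(v, fill_value)
        result[(u, v)] = (a + b) / 2
    return result


total_trips = {101: 2, 103: 1}
# Distinct name so the notebook's `edges` variable is not overwritten
toy_edges = [(101, 102), (102, 103), (101, 103)]
print(node_values_to_edges(toy_edges, total_trips))
# {(101, 102): 1.0, (102, 103): 0.5, (101, 103): 1.5}
```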

## Step 10: Visualise Your Enriched Network

We visualise the enriched network in two ways:

1. **Static Visualisation**: Using the default `StaticVisualiser` to create a Matplotlib plot.
2. **Interactive Visualisation**: Using `InteractiveVisualiser` for a Folium map.

> **Note**: Ensure you have the necessary Jupyter extensions installed for interactive visualisations (see the README’s Installation section).

### Substep 1: Static Visualisation (Matplotlib)

In [None]:
viz = taxi_trips.visual.visualise(graph=graph, nodes=nodes, edges=edges, result_columns="total_trips")
viz

### Substep 2: Interactive Visualisation (Folium)

In [None]:
from osmnx_mapping.modules.visualiser.visualisers import InteractiveVisualiser

taxi_trips = oxm.OSMNxMapping()  # Re-create the instance so the static visualiser configured above is not reused for the interactive map.

viz = taxi_trips.visual(visualiser=InteractiveVisualiser()).visualise(graph=graph, nodes=nodes, edges=edges, result_columns="total_trips")
viz

## Conclusion

Voila! 🥐 You’ve successfully mapped taxi trip data to Manhattan’s street network, enriched it by counting trips per street segment, and visualised the results both statically and interactively. 🎉

Compare this notebook with chapter 1 and notice how we used `count_by` here instead of `aggregate_with`, simply counting trips rather than averaging a value such as building floors. The `preview()` output highlights this difference in enrichment strategy.

For more advanced features or to explore manual loading options, dive into the [OSMNxMapping API](https://github.com/VIDA-NYU/OSMNXMapping#api) and the `examples/` directory.

Happy urban mapping! 🌆