# Taxi Trips Study - Step-by-Step
This notebook analyzes taxi trip data, mapping pickups and dropoffs to street segments and visualizing counts.

## Data Sources

- **[Yellow NYC Taxis 2015](https://arc.net/l/quote/pwljlsqk)**: Sample taxi trip data for NYC.

⚠️ Please note: about the documentation's interactive examples ⚠️

Some of our Jupyter notebooks cannot be made interactive and are displayed as-is in the documentation; feel free to install the library and run them locally. A notebook is interactive when you can see the output of each cell. Because storing datasets in a GitHub (or any other Git) repository is not good practice, the examples load urban datasets from `HuggingFace` with `from_huggingface(.)` where possible, rather than `from_file(.)`, which requires the file to be available locally. This was not always viable, since certain datasets are not on `HuggingFace`, and it does not prevent you from using `from_file(.)` or any other loader listed in the API reference's `Loader` module.

In [None]:
import urban_mapper as um

# Initialise UrbanMapper
mapper = um.UrbanMapper()

# Step 1: Create urban layer for street segments
layer = (
    mapper.urban_layer
    .with_type("streets_roads")
    .from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
    .build()
)

In [None]:
# Step 2: Load taxi trip data
# Note: to keep the documentation's interactive mode fast, we load only 5000 records from the dataset.
# Feel free to remove the `number_of_rows` limit for a more realistic analysis.
data = (
    mapper.loader
    .from_huggingface("oscur/taxisvis1M", number_of_rows=5000, streaming=True)
    .with_columns(longitude_column="pickup_longitude", latitude_column="pickup_latitude")
    .load()
)

data['pickup_longitude'] = data['pickup_longitude'].astype(float)
data['pickup_latitude'] = data['pickup_latitude'].astype(float)

data['dropoff_longitude'] = data['dropoff_longitude'].astype(float)
data['dropoff_latitude'] = data['dropoff_latitude'].astype(float)
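
The casts above assume every coordinate value can be parsed as a number; a cast such as `astype(float)` fails on a string like `"n/a"`. As a minimal sketch of a more tolerant conversion, the hypothetical helper `to_float` below (not part of UrbanMapper) turns unparseable values into NaN so that a later step can handle them. The sample values are made up for illustration.

```python
import math


def to_float(value):
    """Convert a raw value to float, returning NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # None raises TypeError; "" or "n/a" raise ValueError.
        return math.nan


# Made-up raw values, as they might appear in a loosely typed column.
raw_longitudes = ["-73.9857", "-73.9900", "", None, "n/a"]
longitudes = [to_float(v) for v in raw_longitudes]
```

In pandas, `pd.to_numeric(series, errors="coerce")` gives comparable behaviour on a whole column.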

In [None]:
# Step 3: Impute missing coordinates
imputer_pickup = (
    mapper.imputer
    .with_type("SimpleGeoImputer")
    .on_columns("pickup_longitude", "pickup_latitude")
    .build()
)
data = imputer_pickup.transform(data, layer)

imputer_dropoff = (
    mapper.imputer
    .with_type("SimpleGeoImputer")
    .on_columns("dropoff_longitude", "dropoff_latitude")
    .build()
)
data = imputer_dropoff.transform(data, layer)
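
How `SimpleGeoImputer` treats missing coordinates is not shown in this notebook. One simple strategy, assumed here purely for illustration, is to discard any record whose longitude or latitude is missing. The sketch below uses plain dictionaries and the hypothetical helpers `is_missing` and `drop_missing_coordinates`; it is not UrbanMapper's implementation.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_coordinates(records, lon_key, lat_key):
    """Keep only the records that have both a longitude and a latitude."""
    return [
        r for r in records
        if not is_missing(r.get(lon_key)) and not is_missing(r.get(lat_key))
    ]


# Made-up trips: the second lacks a latitude, the third a longitude.
trips = [
    {"pickup_longitude": -73.985, "pickup_latitude": 40.692},
    {"pickup_longitude": -73.990, "pickup_latitude": math.nan},
    {"pickup_longitude": None, "pickup_latitude": 40.700},
]
clean_trips = drop_missing_coordinates(trips, "pickup_longitude", "pickup_latitude")
```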

In [None]:
# Step 4: Filter to bounding box
filter_step = mapper.filter.with_type("BoundingBoxFilter").build()
data = filter_step.transform(data, layer)
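
Conceptually, a bounding box filter keeps only the records whose coordinates fall inside a rectangle, here presumably the extent of the street layer. The sketch below illustrates that idea with a hypothetical `within_bounds` helper and rough, made-up bounds; the internals of `BoundingBoxFilter` may differ.

```python
def within_bounds(lon, lat, bounds):
    """Return True when (lon, lat) lies inside the rectangle, edges included."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Illustrative bounds (min_lon, min_lat, max_lon, max_lat); not the layer's actual extent.
bounds = (-74.00, 40.68, -73.97, 40.70)

# Made-up pickup points as (longitude, latitude) pairs.
points = [(-73.985, 40.692), (-73.950, 40.750), (-73.999, 40.681)]
kept = [p for p in points if within_bounds(p[0], p[1], bounds)]
```

Filtering before the mapping step avoids assigning far-away trips to a street segment that merely happens to be the closest one.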

In [None]:
# Step 5: Map pickups and dropoffs
import copy
tmp_layer = copy.deepcopy(layer)

_, mapped_pickups = layer.map_nearest_layer(
    data,
    longitude_column="pickup_longitude",
    latitude_column="pickup_latitude",
    output_column="pickup_segment"
)

_, mapped_dropoffs = tmp_layer.map_nearest_layer(
    data,
    longitude_column="dropoff_longitude",
    latitude_column="dropoff_latitude",
    output_column="dropoff_segment"
)
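
To give an intuition for what "mapping to the nearest segment" means, the sketch below computes the distance from a point to each line segment and picks the closest one. It is a toy version under stated assumptions: planar coordinates, straight two-point segments, and a brute-force search. The names `point_segment_distance` and `nearest_segment` are hypothetical, and `map_nearest_layer` may well work differently (for example with projected coordinates and a spatial index).

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_segment(p, segments):
    """Return the id of the segment closest to p."""
    return min(segments, key=lambda sid: point_segment_distance(p, *segments[sid]))


# Two made-up parallel streets, keyed by segment id.
segments = {"s1": ((0, 0), (10, 0)), "s2": ((0, 5), (10, 5))}
pickup_segment = nearest_segment((3, 1), segments)
dropoff_segment = nearest_segment((3, 4), segments)
```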

In [None]:
# Step 6: Enrich with counts
enricher_pickup = (
    mapper.enricher
    .with_data(group_by="pickup_segment")
    .count_by(output_column="pickup_count")
    .build()
)
enriched_layer_pickup = enricher_pickup.enrich(mapped_pickups, layer)

enricher_dropoff = (
    mapper.enricher
    .with_data(group_by="dropoff_segment")
    .count_by(output_column="dropoff_count")
    .build()
)
enriched_layer = enricher_dropoff.enrich(mapped_dropoffs, enriched_layer_pickup)
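
The enrichment above amounts to a group-by count: for each segment id, count how many mapped records point to it, then attach that number to the segment. A minimal sketch with the standard library follows; the records and segment ids are made up, and filling segments without trips with 0 is an assumption of this sketch rather than documented UrbanMapper behaviour.

```python
from collections import Counter

# Made-up mapped records, each carrying the id of its nearest segment.
mapped = [
    {"pickup_segment": "s1"},
    {"pickup_segment": "s2"},
    {"pickup_segment": "s1"},
    {"pickup_segment": "s1"},
]

# Count the records per segment id.
pickup_counts = Counter(r["pickup_segment"] for r in mapped)

# Attach the counts to every segment of the layer; segments with no trips get 0 here.
segment_ids = ["s1", "s2", "s3"]
layer_counts = {sid: pickup_counts.get(sid, 0) for sid in segment_ids}
```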

In [None]:
# Step 7: Visualize interactively
visualiser = (
    mapper.visual
    .with_type("Interactive")
    .with_style({"tiles": "CartoDB dark_matter", "colorbar_text_color": "white"})
    .build()
)
fig = visualiser.render(enriched_layer.get_layer(), columns=["pickup_count", "dropoff_count"])
fig