# Enricher

In this notebook we’ll sprinkle some extra magic onto your urban layers: we’ll add the average number of building floors to a layer and watch it sparkle!

**Data source used**:
- PLUTO data from NYC Open Data. https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change

Let’s jazz things up! 🏙️

⚠️ Please Note — Within The Documentation's Interactive Examples ⚠️

First, please bear with us: some of our Jupyter notebooks cannot be made interactive and are displayed as-is in the documentation, so feel free to install the library and run them locally. You can tell whether a notebook is interactive by whether the output of each cell is shown. Second, because it is not good practice to store datasets in a GitHub (or any other Git) repository, we tried to load urban datasets from `HuggingFace` using `from_huggingface(.)` rather than `from_file(.)`, which requires a local file. This was not always viable (certain datasets are not on `HuggingFace`), and it does not prevent you from using `from_file(.)` or any other loader listed in the API reference's `Loader` module.

In [None]:
import urban_mapper as um

# Start UrbanMapper
mapper = um.UrbanMapper()

## Loading Data and Creating a Layer

First, let’s grab some PLUTO data and set up a street intersections layer for Downtown Brooklyn.

Note that:

- Loader example can be seen in `examples/Basics/loader.ipynb`
- Urban Layer example can be seen in `examples/Basics/urban_layer.ipynb`
- Imputer example can be seen in `examples/Basics/imputer.ipynb`

In [None]:
# Load data
# Note: For the documentation's interactive mode, we only query 5000 records from the dataset. Feel free to remove this limit for a more realistic analysis.
data = (
    mapper
    .loader # From the loader module
    .from_huggingface("oscur/pluto", number_of_rows=5000, streaming=True) # From the oscur/pluto dataset on the HuggingFace OSCUR datasets hub
    .with_columns("longitude", "latitude") # With the `longitude` and `latitude` columns
    .load()
)

# Create urban layer
layer = (
    mapper
    .urban_layer # From the urban_layer module
    .with_type("streets_intersections")  # With the type streets_intersections
    .from_place("Downtown Brooklyn, New York City, USA") # From place
    .build()
)

# Impute your data if it contains missing values
data = (
    mapper
    .imputer # From the imputer module
    .with_type("SimpleGeoImputer")  # With the type SimpleGeoImputer
    .on_columns(longitude_column="longitude", latitude_column="latitude") # On the columns longitude and latitude
    .transform(data, layer)  # All imputers require access to the urban layer in case they need to extract information from it.
)
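
To give a feel for what a simple geographic imputation step can look like, here is a minimal, library-free sketch. It assumes the simplest possible strategy, namely dropping records whose coordinates are missing; whether `SimpleGeoImputer` behaves exactly this way is an assumption, and `drop_missing_coordinates` is a hypothetical helper, not part of UrbanMapper.

```python
import math


def drop_missing_coordinates(records, lon_key="longitude", lat_key="latitude"):
    """Keep only records whose longitude and latitude are both usable numbers.

    Hypothetical helper for illustration; not an UrbanMapper function.
    """
    def is_valid(value):
        # None, strings and NaN are all treated as missing.
        return isinstance(value, (int, float)) and not math.isnan(value)

    return [r for r in records if is_valid(r.get(lon_key)) and is_valid(r.get(lat_key))]


# Toy records standing in for PLUTO rows (values are made up).
records = [
    {"longitude": -73.9857, "latitude": 40.6925, "numfloors": 12},
    {"longitude": None, "latitude": 40.6930, "numfloors": 4},
    {"longitude": -73.9870, "latitude": float("nan"), "numfloors": 7},
]
clean_records = drop_missing_coordinates(records)
print(len(clean_records), "record(s) kept")
```

Dropping rows is the bluntest option; it keeps the later nearest-intersection mapping well defined at the cost of losing records.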

## Enriching the Layer with Debug Enabled

Now that we've gathered the ingredients, let's enrich our urban layer with, for example, the average number of floors per intersection. We’ll map the data, set up the enricher with the debug feature enabled, and apply it.

For further reading, feel free to explore our Figma system workflow at: https://www.figma.com/board/0uaU4vJiwyZJSntljJDKWf/Developer-Experience-Flow-Diagram---Snippet-Code?node-id=0-1&t=mESZ52qU1D2lfzvH-1

In [None]:
# Map data to the nearest layer
# The point here is to record which intersection of the city each record in your data maps to,
# so that we can take this into account when enriching.
_, mapped_data = layer.map_nearest_layer(
    data,
    longitude_column="longitude",
    latitude_column="latitude",
    output_column="nearest_intersection", # Will create this column in the data, so that we can re-use that throughout the enriching process below.
)

# Set up and apply enricher with debug enabled
enricher = (
    mapper
    .enricher # From the enricher module
    .with_data(
        group_by="nearest_intersection", values_from="numfloors"
    ) # Reading: With data grouped by the nearest intersection, and the values from the attribute numfloors
    .aggregate_by(
        method="mean", output_column="avg_floors"
    ) # Reading: Aggregate by using the mean and output the computation into the avg_floors new attribute of the urban layer
    .with_debug()  # Enable debug to add DEBUG_avg_floors column which will contain the list of indices from the input data used for each enrichment
    .build()
)
enriched_layer = enricher.enrich(
    mapped_data, layer
)  # Data to use, Urban Layer to Enrich.
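
The mapping step in the cell above can be pictured with a small brute-force sketch: for each record, find the closest intersection and store its id in a new column. This is only an illustration under stated assumptions; `haversine_m` and `map_to_nearest` are hypothetical helpers, the coordinates are made up, and the library itself may well use a spatial index and projected coordinates rather than this loop.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    radius_m = 6_371_000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def map_to_nearest(records, nodes, output_column="nearest_intersection"):
    """Return copies of the records, each tagged with the id of its closest node."""
    mapped = []
    for record in records:
        nearest_id = min(
            nodes,
            key=lambda node_id: haversine_m(
                record["longitude"], record["latitude"], *nodes[node_id]
            ),
        )
        mapped.append({**record, output_column: nearest_id})
    return mapped


# Hypothetical intersections: id -> (longitude, latitude)
intersections = {"A": (-73.9900, 40.6900), "B": (-73.9800, 40.7000)}
toy_records = [
    {"longitude": -73.9895, "latitude": 40.6905, "numfloors": 3},
    {"longitude": -73.9805, "latitude": 40.6995, "numfloors": 20},
]
mapped_records = map_to_nearest(toy_records, intersections)
for row in mapped_records:
    print(row["numfloors"], "->", row["nearest_intersection"])
```

Once every record carries a `nearest_intersection` value, the enricher only needs to group on that column.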

## Inspecting the Enriched Layer with Debug Information

Let’s take a look at the enriched layer, which now includes the `avg_floors` column and the `DEBUG_avg_floors` column with the list of indices from the input data used for each enrichment.

In [None]:
# Preview the enriched layer with debug information
print(enriched_layer.layer[['avg_floors', 'DEBUG_avg_floors']].head(50))
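
To make the two columns printed above more concrete, here is a rough, library-free sketch of a group-by mean that also keeps track of which input rows fed each group, in the spirit of the debug column. `enrich_with_mean` is a hypothetical helper and the rows are toy values; this is not how the enricher is actually implemented.

```python
from collections import defaultdict
from statistics import mean


def enrich_with_mean(rows, group_by, values_from):
    """Group rows by `group_by`; return {group: (mean value, source row indices)}."""
    values = defaultdict(list)
    indices = defaultdict(list)
    for index, row in enumerate(rows):
        values[row[group_by]].append(row[values_from])
        indices[row[group_by]].append(index)  # remember where each value came from
    return {group: (mean(values[group]), indices[group]) for group in values}


# Toy mapped data: each row already knows its nearest intersection.
toy_rows = [
    {"nearest_intersection": "A", "numfloors": 2},
    {"nearest_intersection": "B", "numfloors": 10},
    {"nearest_intersection": "A", "numfloors": 4},
]
summary = enrich_with_mean(toy_rows, "nearest_intersection", "numfloors")
for group, (avg_floors, debug_indices) in summary.items():
    print(group, avg_floors, debug_indices)
```

Keeping the indices next to the aggregate is what lets you trace a surprising average back to the exact records behind it.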

## Previewing Your Enricher

Fancy a peek at your enricher? Use `preview()` to see the setup—great for when you’re digging into someone else’s work!

In [None]:
# Preview enricher
print(enricher.preview())

## More Enricher / Aggregator Primitives?

Yes! We also provide `count_by` as an alternative to `aggregate_by`; it simply counts the number of records per group rather than aggregating their values. Later examples outside `Basics` show it in more detail.
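
As a rough illustration of how counting differs from aggregating, the sketch below tallies how many records fall on each intersection and ignores their values entirely. It uses only the standard library; `count_records` is a hypothetical helper and the rows are toy values, not a description of the library's internals.

```python
from collections import Counter


def count_records(rows, group_by):
    """Count how many rows belong to each value of `group_by`."""
    return Counter(row[group_by] for row in rows)


# Toy mapped data: the numfloors values play no role in a count.
toy_rows = [
    {"nearest_intersection": "A", "numfloors": 2},
    {"nearest_intersection": "B", "numfloors": 10},
    {"nearest_intersection": "A", "numfloors": 4},
]
counts = count_records(toy_rows, "nearest_intersection")
print(dict(counts))
```

A `Counter` returns 0 for groups it has never seen, which is a convenient default for intersections with no nearby records.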

Want more? Come and shout about it at https://github.com/VIDA-NYU/UrbanMapper/issues/11

## Wrapping Up

Smashing work! 🎉 Your layer’s now enriched with average floors and includes debug information to trace back to the original data. Try visualising it next with `visualiser`.