# Polygon labeling widget

This standalone notebook contains an `ipywidgets` app that lets the user visually inspect polygons alongside their NDVI time-series plots and label them accordingly. To keep things simple, we use the following labels:

• **Farm:** parcels of land where crops are visibly growing, or where there are markings indicating farmland
• **Field:** parcels of land that are not used for farming (as far as we can tell from visual inspection)
• **Tree:** groups of trees; a single tree might not produce an NDVI signature characteristic of trees, depending on how Google Earth Engine (GEE) aggregates pixels over the polygon
• **Other:** man-made structures (houses, roads, etc.)
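Because the widget's label field accepts free text, a typo such as `farm ` or `Feild` would be stored as a separate class. One way to enforce the four-label scheme is a small normalization step before saving. The helper below is only a sketch: `normalize_label` and `VALID_LABELS` are hypothetical names, not part of the notebook.

```python
# Hypothetical helper: the notebook itself stores whatever is typed into the Label field.
VALID_LABELS = ("Farm", "Field", "Tree", "Other")


def normalize_label(raw: str) -> str:
    """Map free-text input such as ' farm ' onto one of the canonical labels.

    Raises ValueError for anything outside the label scheme, so typos are
    rejected instead of silently becoming new classes.
    """
    cleaned = raw.strip().lower()
    for label in VALID_LABELS:
        if cleaned == label.lower():
            return label
    raise ValueError(f"Unknown label {raw!r}; expected one of {VALID_LABELS}")


print(normalize_label(" farm "))  # Farm
```

Calling it inside `save_label` (for example `labels_dict[current_uuid] = normalize_label(label_text.value)`) would reject typos; replacing the text field with `widgets.Dropdown(options=VALID_LABELS)` is another option.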

In [1]:
import pandas as pd 
import geopandas as gpd
from scipy.signal import savgol_filter
import ipywidgets as widgets
from IPython.display import display
import holoviews as hv
hv.extension("bokeh")

## Import polygon geopandas dataframe

The first step is to load the GeoPandas dataframe for the region of interest. The polygon files are stored as `.gpkg` (GeoPackage) files in the `samgeo_aws_ec2/vectors/` directory.

In [2]:
def gdf_tile(file_path: str, tile_name: str) -> gpd.GeoDataFrame:
    """ 
    This function creates a subset of a larger geopandas dataframe depending
    on the particular tile required.

    Args: (i) file_path: file path to the .gpkg file
          (ii) tile_name: tile number under consideration ('tile_0', 'tile_1',..., 'tile_24')
    
    Returns: gdf_subset - gdf containing only the polygons in the required tile
    """

    gdf = gpd.read_file(file_path)
    gdf_subset = gdf[gdf["tile_name"]==tile_name]

    return gdf_subset
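`gdf_tile` re-reads the entire `.gpkg` on every call, and a misspelled tile name (for example `"tile13"`) silently yields an empty dataframe. The sketch below separates reading from filtering and fails loudly instead; `tile_subset_checked` is a hypothetical name, and the example uses a plain pandas dataframe because a `GeoDataFrame` supports the same boolean indexing.

```python
import pandas as pd


def tile_subset_checked(df: pd.DataFrame, tile_name: str) -> pd.DataFrame:
    """Filter rows to one tile, raising KeyError if the tile is not present.

    Works unchanged on a GeoDataFrame, which subclasses pd.DataFrame.
    """
    available = set(df["tile_name"].unique())
    if tile_name not in available:
        raise KeyError(f"{tile_name!r} not found; available tiles: {sorted(available)}")
    return df[df["tile_name"] == tile_name]


# Toy stand-in for the polygon table; the real one also has a geometry column.
polygons = pd.DataFrame({
    "uuid": ["a", "b", "c"],
    "tile_name": ["tile_13", "tile_13", "tile_14"],
})
subset = tile_subset_checked(polygons, "tile_13")
print(subset["uuid"].tolist())  # ['a', 'b']
```

With the real data, the file would be read once (`gdf = gpd.read_file(FILE_PATH)`) and `tile_subset_checked(gdf, "tile_13")` called for each tile needed.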

In [29]:
FILE_PATH = "../../../samgeo_aws_ec2/vectors/Kajiado_1.gpkg"

gdf_subset = gdf_tile(FILE_PATH, "tile_13")

gdf_subset.explore(tiles="Esri.WorldImagery")

## Import and clean raw NDVI time-series

Depending on which tile of a given region was chosen, the corresponding raw NDVI time-series `.csv` file is loaded for processing. The raw files are in the `crop_classification/time_series_analyses/ndvi_series_raw/` directory.
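The raw export is in wide format: one row per polygon, one column per acquisition date, plus a `uuid` column and the Earth Engine metadata columns `system:index` and `.geo`. The toy example below walks through the same drop, interpolate, and melt steps that `clean_ndvi_series` performs. The uuids, dates, and values are made up, and ISO-formatted date column names are an assumption about the export.

```python
import pandas as pd

# Toy stand-in for a raw export: one row per polygon, one column per date.
raw = pd.DataFrame({
    "system:index": ["0", "1"],
    "2023-01-01": [0.20, 0.50],
    "2023-01-06": [None, 0.55],
    "2023-01-11": [0.40, 0.60],
    ".geo": ["{}", "{}"],
    "uuid": ["poly-a", "poly-b"],
})

wide = raw.drop(columns=["system:index", ".geo"]).set_index("uuid")
# Fill gaps along each row, i.e. across the dates of a single polygon.
wide = wide.interpolate(method="linear", axis=1)

long_df = (
    wide.reset_index()
        .melt(id_vars="uuid", var_name="date", value_name="ndvi")
        .assign(date=lambda d: pd.to_datetime(d["date"]))
        .sort_values(["uuid", "date"])
        .reset_index(drop=True)
)
print(long_df)  # poly-a's missing 2023-01-06 value becomes 0.30
```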

In [4]:
def clean_ndvi_series(df: pd.DataFrame) -> pd.DataFrame:
    """ 
    This function cleans the raw NDVI data by dropping unnecessary columns, changing it
    from a wide to a long format and filling in NaN values through interpolation.

    Args: df - the raw NDVI time-series data for a single tile

    Returns: df_melted - the cleaned version of the time-series data
    """
    df = df.drop(columns=["system:index", ".geo"]) # Drop Earth Engine export metadata columns

    # Move the "uuid" column to the front so that columns 1: hold the dated NDVI values
    new_cols = ["uuid"] + [col for col in df.columns if col != "uuid"]
    df = df.reindex(columns=new_cols)

    # Linearly interpolate gaps along each row (i.e. across dates for one polygon)
    df.iloc[:, 1:] = df.iloc[:, 1:].interpolate(method="linear", axis=1)

    # Use melt() to transform the wide-format data to long-format
    df_melted = df.melt(id_vars="uuid", var_name="date", value_name="ndvi")
    df_melted["date"] = pd.to_datetime(df_melted["date"])

    # Sort each polygon's observations from newest to oldest and drop repeated dates
    df_melted = (
        df_melted.sort_values(by=["uuid", "date"], ascending=[True, False])
                 .drop_duplicates(subset=["uuid", "date"], keep="first")
    )

    return df_melted.reset_index(drop=True)

def date_resample(df: pd.DataFrame) -> pd.DataFrame:
    """ 
    This function resamples the time series of a single polygon (one uuid chunk)
    onto a regular 5-day grid to remove irregular sampling, then interpolates
    the NDVI values at the newly created dates.
    """
    df = df.sort_values("date")  # resample() and diff() assume ascending dates

    if df["date"].diff().nunique() > 1:
        # More than one distinct sampling interval: the series is irregular
        df = df.set_index("date").resample("5D").asfreq()

        df[["ndvi", "ndvi_smoothed"]] = df[["ndvi", "ndvi_smoothed"]].interpolate()
        df["uuid"] = df["uuid"].fillna(df["uuid"].mode()[0])

        return df.reset_index()
    else:
        return df
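In `date_resample`, `resample("5D").asfreq()` is effectively a reindex onto the 5-day grid: any observation that does not fall exactly on a grid date is discarded before interpolation. If those readings should still count, one alternative is to interpolate by elapsed time over the union of the original and grid dates, then keep only the grid dates. This is a sketch, not the notebook's method, and `resample_5d_time_interp` is a hypothetical name.

```python
import pandas as pd


def resample_5d_time_interp(series: pd.Series) -> pd.Series:
    """Interpolate an irregular, datetime-indexed NDVI series onto a 5-day grid.

    Assumes the index holds unique timestamps.
    """
    s = series.sort_index()
    grid = pd.date_range(s.index.min(), s.index.max(), freq="5D")
    # Add the grid dates to the original dates, then interpolate by elapsed time
    combined = s.reindex(s.index.union(grid)).interpolate(method="time")
    return combined.reindex(grid)


dates = pd.to_datetime(["2023-01-01", "2023-01-04", "2023-01-11"])
ndvi = pd.Series([0.2, 0.5, 0.4], index=dates)

via_asfreq = ndvi.resample("5D").asfreq().interpolate()  # 2023-01-06 -> 0.3, the 0.5 reading is ignored
via_time = resample_5d_time_interp(ndvi)                 # 2023-01-06 -> about 0.471
```

The trade-off is that the time-weighted version keeps every measurement's influence, at the cost of a slightly longer intermediate series per polygon.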

In [30]:
ndvi_raw = pd.read_csv("../ndvi_series_raw/ndvi_series_Kajiado_1_tile_13.csv")

ndvi_clean = clean_ndvi_series(ndvi_raw)

# Smooth each polygon's series separately with a Savitzky-Golay filter
groups = []
for uuid, group in ndvi_clean.groupby("uuid"):
    group = group.copy()  # avoid SettingWithCopyWarning when adding a column to a group slice
    group["ndvi_smoothed"] = savgol_filter(group["ndvi"], window_length=7, polyorder=3)
    #groups.append(date_resample(group))
    groups.append(group)

ndvi_clean = pd.concat(groups)
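With SciPy's default `mode="interp"`, `savgol_filter` raises a `ValueError` when `window_length` exceeds the number of samples, so a polygon with fewer than 7 dates would stop the smoothing loop. A guarded variant, applied per polygon with `groupby(...).transform`, is sketched below. The `safe_savgol` name and the shrink-the-window fallback are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def safe_savgol(values, window_length: int = 7, polyorder: int = 3) -> np.ndarray:
    """Savitzky-Golay smoothing that tolerates short series.

    The window is shrunk to the largest odd length that fits the series;
    if that leaves no room for a polynomial of degree `polyorder`,
    the raw values are returned unchanged.
    """
    x = np.asarray(values, dtype=float)
    w = min(window_length, len(x))
    if w % 2 == 0:
        w -= 1  # keep the window odd
    if w <= polyorder:
        return x.copy()
    return savgol_filter(x, window_length=w, polyorder=polyorder)


# Toy data: polygon "a" has 10 dates, polygon "b" only 3
toy = pd.DataFrame({
    "uuid": ["a"] * 10 + ["b"] * 3,
    "ndvi": list(np.linspace(0.2, 0.7, 10)) + [0.3, 0.6, 0.4],
})
toy["ndvi_smoothed"] = toy.groupby("uuid")["ndvi"].transform(lambda s: safe_savgol(s.to_numpy()))
```

Inside the original loop, `group["ndvi_smoothed"] = safe_savgol(group["ndvi"])` would be a drop-in replacement. Note that leading NaNs left by the forward-only interpolation in `clean_ndvi_series` would still propagate through the filter.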


## Labeling widget

The following cell contains the labeling widget. A plotting function draws the NDVI time series (raw and Savitzky-Golay smoothed) above a map of the corresponding polygon on satellite imagery, and it is re-run whenever a new UUID is selected from the dropdown menu. Labels are typed into the text field, and each click of **Save Label** stores the entry in the `labels_dict` dictionary, overwriting any earlier label for the same UUID.
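When working through hundreds of polygons, it helps to jump straight to the next one that has not been labeled yet. The sketch below (a hypothetical `next_unlabeled` helper, not part of the notebook) scans `uuid_list` after the current UUID, wrapping around at the end, and skips UUIDs already present in `labels_dict`.

```python
from typing import Optional


def next_unlabeled(uuid_list: list, labels: dict, current: str) -> Optional[str]:
    """Return the first uuid after `current` that has no entry in `labels`.

    The search wraps around to the start of `uuid_list`; None means every
    uuid is already labeled.
    """
    n = len(uuid_list)
    start = uuid_list.index(current) + 1 if current in uuid_list else 0
    for offset in range(n):
        candidate = uuid_list[(start + offset) % n]
        if candidate not in labels:
            return candidate
    return None
```

Inside `save_label`, assigning the result to `uuid_dropdown.value` (when it is not `None`) would fire the existing `observe` callback and redraw the plots for the next polygon.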

In [31]:
# Outputs for plotting and map
plot_out = widgets.Output()
map_out = widgets.Output()

# Plot function
def plot_timeseries(uuid: str):
    """ 
    This function plots the NDVI time-series of a given polygon identified by its uuid.
    """
    df_subset = ndvi_clean[ndvi_clean["uuid"] == uuid].set_index("date")

    with plot_out:
        plot_out.clear_output(wait=True)

        # HoloViews plot
        s1 = hv.Scatter(df_subset["ndvi"], label="raw").opts(size=8, ylim=(-0.1, 1.1), line_color="black", tools=["hover"])
        c1 = hv.Curve(df_subset["ndvi"])
        s2 = hv.Scatter(df_subset["ndvi_smoothed"], label="Savitzky-Golay").opts(size=8, line_color="black", tools=["hover"])
        c2 = hv.Curve(df_subset["ndvi_smoothed"])
        overlay_1 = (s1 * c1 * s2 * c2).opts(width=1050, height=400, show_grid=True, title="NDVI Time Series")
        display(overlay_1)

    with map_out:
        map_out.clear_output(wait=True)

        # Folium/GeoPandas map
        gdf_polygon = gdf_subset[gdf_subset["uuid"] == uuid]

        if not gdf_polygon.empty:
            fmap = gdf_polygon.explore(tiles="Esri.WorldImagery", color="red", zoom=15)
            display(fmap)

# Widgets
uuid_list = ndvi_clean["uuid"].unique().tolist()
uuid_dropdown = widgets.Dropdown(options=uuid_list, description="UUID:")
label_text = widgets.Text(description="Label")
save_button = widgets.Button(description="Save Label", button_style="success")

# Dictionary where the labels are stored
labels_dict = {}

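def save_label(b):
    """Store the text-field label for the currently selected uuid (callback for save_button)."""
    current_uuid = uuid_dropdown.value
    label = label_text.value
    labels_dict[current_uuid] = label  # overwrites any earlier label for this uuid
    print(f"Saved label for {current_uuid}: {label}")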

save_button.on_click(save_label)

# UI layout
ui = widgets.HBox([uuid_dropdown, label_text, save_button])
uuid_dropdown.observe(lambda change: plot_timeseries(change['new']), names='value')

# Initial display
display(ui, plot_out, map_out)
plot_timeseries(uuid_list[0])  # Load first plot by default



Save the labels regularly: restarting the Jupyter kernel or closing the notebook deletes the `labels_dict` variable and every label stored in it.
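One way to guard against losing work is to write `labels_dict` to disk after every save and reload it when the notebook starts. The sketch below uses the standard-library `csv` module; the function names and the CSV layout (a `uuid,class` header) are assumptions, chosen to match the `class` column used by `make_labels_df`.

```python
from __future__ import annotations

import csv
from pathlib import Path


def save_labels(labels: dict[str, str], path: str | Path) -> None:
    """Overwrite `path` with one `uuid,class` row per labeled polygon."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["uuid", "class"])
        writer.writerows(sorted(labels.items()))


def load_labels(path: str | Path) -> dict[str, str]:
    """Read labels written by `save_labels`; a missing file gives an empty dict."""
    path = Path(path)
    if not path.exists():
        return {}
    with open(path, newline="") as f:
        return {row["uuid"]: row["class"] for row in csv.DictReader(f)}
```

For example, `labels_dict = load_labels("Kajiado_1_tile_13_labels.csv")` could replace `labels_dict = {}`, with a `save_labels(labels_dict, ...)` call added at the end of `save_label`; the file name is only an illustration.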

In [8]:
def make_labels_df(labels_dict: dict[str, str]) -> pd.DataFrame:
    """ 
    This function converts the `labels_dict` dictionary into
    a pandas dataframe with one row per uuid and a single "class" column.
    """

    labels_df = pd.DataFrame.from_dict(labels_dict, orient="index", columns=["class"])
    labels_df.index.name = "uuid"

    #labels_df.to_csv("Trans_Nzoia_1_tile_1_NDVI_labels_4.csv")

    return labels_df

In [9]:
# labels_df = make_labels_df(labels_dict)

In [10]:
# labels_count_df = labels_df["class"].value_counts().reset_index()

# bar = hv.Bars(labels_count_df, kdims="class", vdims="count").opts(
#     color="class",
#     cmap='Category10', 
#     line_color="black",
#     line_width=2,
#     width=600,
#     height=400,
#     title="Trans Nzoia tile_0 labels",
#     tools=["hover"]
# )

# bar
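The commented-out chart cell expects `value_counts().reset_index()` to return columns named `class` and `count`, which is the pandas 2.x behavior; older pandas versions name them `index` and `class` instead. Naming both columns explicitly, as in this sketch on made-up labels, works with either version.

```python
import pandas as pd

# Made-up labels standing in for the output of make_labels_df(labels_dict)
labels_df = pd.DataFrame.from_dict(
    {"u1": "Farm", "u2": "Farm", "u3": "Tree", "u4": "Other"},
    orient="index",
    columns=["class"],
)

labels_count_df = (
    labels_df["class"]
    .value_counts()
    .rename_axis("class")       # name of the category column
    .reset_index(name="count")  # name of the count column
)
# Fraction of labeled polygons in each class
labels_count_df["share"] = labels_count_df["count"] / labels_count_df["count"].sum()
```

`labels_count_df` can then be passed to `hv.Bars(labels_count_df, kdims="class", vdims="count")` as in the commented cell.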