# recalculate_sflm_processed_features

This notebook pulls the Single Feature to Label Mapping (SFLM) Google sheet, reprocesses its `unprocessed_features` column, and writes the result back to the `processed_features` column.

This is necessary because `pull_objects_labelling_sheet.ipynb` processes any features from the `manual_data_labelling` sheet and then checks for duplicates between the newly processed output and the current SFLM using its `label`, `unprocessed_features`, and `processed_features` columns.

If the processing function changes between runs of `pull_objects_labelling_sheet.ipynb`, the `processed_features` stored in the SFLM no longer match the newly processed output, so the duplicate check mistakenly reports previously added labels as new.
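The failure mode can be reproduced with a toy example. This is a minimal sketch, not the code in `pull_objects_labelling_sheet.ipynb`: `old_process`, `new_process` and `find_new_rows` are hypothetical, and the duplicate check is assumed to behave like a left anti-join on the three key columns.

```python
import pandas as pd


# Hypothetical processing functions: the "old" version only strips whitespace,
# the "new" version also lowercases. Neither is the phoenix implementation.
def old_process(text: str) -> str:
    return text.strip()


def new_process(text: str) -> str:
    return text.strip().lower()


KEY_COLUMNS = ["label", "unprocessed_features", "processed_features"]


def find_new_rows(incoming: pd.DataFrame, sflm: pd.DataFrame) -> pd.DataFrame:
    """Return incoming rows that have no exact match in the SFLM on KEY_COLUMNS."""
    merged = incoming.merge(
        sflm[KEY_COLUMNS].drop_duplicates(), on=KEY_COLUMNS, how="left", indicator=True
    )
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


# An SFLM row written earlier, processed with the old function.
sflm = pd.DataFrame({"label": ["economy"], "unprocessed_features": [" Inflation "]})
sflm["processed_features"] = sflm["unprocessed_features"].map(old_process)

# The same manual label arriving again, processed with the new function.
incoming = pd.DataFrame({"label": ["economy"], "unprocessed_features": [" Inflation "]})
incoming["processed_features"] = incoming["unprocessed_features"].map(new_process)

print(len(find_new_rows(incoming, sflm)))  # 1: the existing label looks new

# Reprocessing the SFLM with the new function restores the match.
sflm["processed_features"] = sflm["unprocessed_features"].map(new_process)
print(len(find_new_rows(incoming, sflm)))  # 0
```

Only the `processed_features` value differs between the two rows, which is enough to break the exact-match comparison.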

The `reprocess_sflm` function reprocesses the SFLM loaded from the Google sheet, and the notebook then writes the newly processed features back to the sheet. If a row whose processed features changed was active, the function also sets that row's `status` to `"analyst_action_needed"` and its `use_processed_features` to `False`. The analyst then checks whether the newly processed feature should be used in its processed form, or whether the unprocessed feature should be used instead.
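The behaviour described above can be sketched in pandas. This is an illustration under stated assumptions, not the source of `sflm_processing.reprocess_sflm`: `process_feature` and `reprocess_sflm_sketch` are hypothetical, and `"active"` is assumed to be the `status` value that marks a row as in use.

```python
import pandas as pd


# Hypothetical stand-in for the real processing function: lowercase and
# collapse runs of whitespace.
def process_feature(text: str) -> str:
    return " ".join(text.lower().split())


def reprocess_sflm_sketch(sflm: pd.DataFrame) -> pd.DataFrame:
    """Recompute processed_features and flag active rows whose value changed.

    Assumes "active" is the status value for rows currently in use; this is
    an assumption, not taken from the phoenix source.
    """
    df = sflm.copy()
    new_processed = df["unprocessed_features"].map(process_feature)
    changed = new_processed != df["processed_features"]
    active_changed = changed & (df["status"] == "active")

    # Every row gets the new processed form; only active changed rows are flagged.
    df["processed_features"] = new_processed
    df.loc[active_changed, "status"] = "analyst_action_needed"
    df.loc[active_changed, "use_processed_features"] = False
    return df


sflm = pd.DataFrame(
    {
        "label": ["economy", "economy", "health"],
        "unprocessed_features": ["Rising  Prices", "inflation", "Vaccine"],
        "processed_features": ["rising  prices", "inflation", "Vaccine"],
        "status": ["active", "active", "inactive"],
        "use_processed_features": [True, True, True],
    }
)
result = reprocess_sflm_sketch(sflm)
print(result[["processed_features", "status", "use_processed_features"]])
```

In this toy data the first row changes and is active, so it is flagged; the third row also changes but is not active, so it only receives the new processed value.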


In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import logging

from phoenix.common import artifacts, run_params, utils
from phoenix.tag.labelling import sflm_processing


In [None]:
utils.setup_notebook_output()
utils.setup_notebook_logging()
papermill_logger = logging.getLogger("papermill")

In [None]:
# Parameters
# See phoenix/common/run_datetime.py for the expected format of this parameter
RUN_DATETIME = None
TENANT_ID = "test"

# See phoenix/common/artifacts/registry_environment.py for the expected format of this parameter
ARTIFACTS_ENVIRONMENT_KEY = "local"

OBJECT_TYPE = "facebook_posts"


In [None]:
cur_run_params = run_params.general.create(ARTIFACTS_ENVIRONMENT_KEY, TENANT_ID, RUN_DATETIME)

# INPUT
SPREADSHEET_NAME = f"{TENANT_ID}_class_mappings"
WORKSHEET_NAME = f"{OBJECT_TYPE}_feature_mappings"

TENANT_FOLDER_ID = cur_run_params.tenant_config.google_drive_folder_id


In [None]:
# Display params.
print(
    cur_run_params.run_dt.dt,
    cur_run_params.tenant_config,
    SPREADSHEET_NAME,
    WORKSHEET_NAME,
    sep="\n",
)

In [None]:
google_client = artifacts.google_sheets.get_client()

In [None]:
labelled_objects_df = artifacts.google_sheets.get(
    google_client, TENANT_FOLDER_ID, SPREADSHEET_NAME, WORKSHEET_NAME
)

In [None]:
labelled_objects_df

In [None]:
reprocessed_sflm = sflm_processing.reprocess_sflm(labelled_objects_df)

In [None]:
num_action_needed_rows = (reprocessed_sflm["status"] == "analyst_action_needed").sum()
papermill_logger.info(
    f"{num_action_needed_rows} processed features changed for active rows; "
    "please notify an analyst that action is needed"
)
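The log line only reports a count. To give the analyst the affected rows as well, a cell along these lines could filter them. This is a sketch on a made-up frame: only the column names come from the notebook, and in practice `reprocessed_sflm` would take the place of `reprocessed`.

```python
import pandas as pd

# Made-up frame standing in for `reprocessed_sflm`; only the column names
# come from the notebook.
reprocessed = pd.DataFrame(
    {
        "label": ["economy", "economy", "health"],
        "unprocessed_features": ["Rising  Prices", "inflation", "Vaccine"],
        "processed_features": ["rising prices", "inflation", "vaccine"],
        "status": ["analyst_action_needed", "active", "inactive"],
    }
)

action_mask = reprocessed["status"] == "analyst_action_needed"
num_action_needed_rows = int(action_mask.sum())

# The rows the analyst needs to review, with both feature forms side by side.
to_review = reprocessed.loc[action_mask, ["label", "unprocessed_features", "processed_features"]]
print(num_action_needed_rows)
print(to_review.to_string(index=False))
```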

In [None]:
artifacts.google_sheets.persist(
    google_client, TENANT_FOLDER_ID, SPREADSHEET_NAME, WORKSHEET_NAME, reprocessed_sflm
)