# Processing History Data from Input Messages
Process the input message and feed a matrix (ideally shaped as shown below) to esteemer for incorporation into the algorithm.
- NOTE: could instead do the math here and feed just a value matrix to esteemer to slap right into the rank algo; depends on the es.score setup...

### Import step outcome matrix (ideally)
|Month     |Template Name      |Message datetime|Measure |Message Instance ID |
|--|--|--|--|--|
|2023-XX-01|'Not Top Performer'| 'str'| 'BP01'| 'str'|
|2023-XX-01|'Not Top Performer'| 'str'| 'BP01'| 'str'|
|2023-XX-01|'Not Top Performer'| 'str'| 'BP01'| 'str'|
|2023-XX-01|'Not Top Performer'| 'str'| 'BP01'| 'str'|
|2023-XX-01|'Not Top Performer'| 'str'| 'BP01'| 'str'|
|2023-XX-01|'Not Top Performer'| 'str'| 'BP01'| 'str'|

### Layout of input message metadata
```json
    "History": {
      "2023-11-01": {
        "message_template_name":"Not Top Performer",
        "message_generated_datetime": "2023-11-01T1850.426262",
        "measure":"BP01",
        "message_instance_id":"history-test-alpha"
      },
      "2023-12-01": {
      }
    },
```

In [None]:
import pandas as pd
import sys
from loguru import logger

logger.remove()
logger.add(sys.stdout, format="<b><level>{level}</></> \t{message}")

## Pull in history component of input message
def retrieve_history_data(input_message_json):
    logger.trace('Running esteemer.process_history.retrieve_history_data...')
    # Define blank output matrix as a list of dictionaries
    history_component_matrix = []

    # Iterate through items in 'History'
    for month, data in input_message_json['History'].items():
        if data:  # Check if the month has data
            
            # Create a dictionary for each row in the output matrix per month in history
            matrix_row = {
                'Month': month,
                'Measure': data.get('measure', ''),
                'Template Name': data.get('message_template_name', ''),
                'Message Instance ID': data.get('message_instance_id', ''),
                'Message datetime': data.get('message_generated_datetime', ''),
            }
            # Append the row to the output matrix
            history_component_matrix.append(matrix_row)

    # Convert the list of dictionaries to a pandas DataFrame
    outcome_matrix = pd.DataFrame(history_component_matrix)

    # Print the DataFrame (optional, for debugging)
    logger.debug(outcome_matrix)

    return outcome_matrix

## Step 2: Math Time
This math should probably live in Esteemer itself; writing it in the notebook for my own sanity :)

Objectives: 
1) Pull in requisite data for calculations:
    - the current_feedback_month (month for which feedback is being generated), assign as time 0
    - Acceptable Candidates' measure(s)
    - Acceptable Candidates' message template name(s)
2) Make empty dicts for storing message and measure recency values

3) Compare row 'month' to t0, convert to integers representing distance from t0
   - Ex: if t0 = Dec, then Nov = t(-1), Oct = t(-2), etc.
4) Calculate message recency
   - if acceptable_candidate['message_template_name'] matches a message_template_name value in the matrix, calculate how many months it has been since that match
   - Ex: if the current month is December and the candidate template is 'Not Top Performer', calculate months since the last 'Not Top Performer' noted in history
5) Calculate measure recency
   - if acceptable_candidate['measure'] matches a measure in the matrix, calculate months since that match
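
The objectives above can be sketched on a toy history table. This is only an illustration, not the Esteemer implementation: the helper names `months_between` and `recency`, the `Months_Since` column, and the `'Top Performer'`, `'BP02'`, and `'BP99'` values are all made up for the example, and distance is counted here as a positive number of whole calendar months before t0.

```python
import pandas as pd


def months_between(t0, month):
    """Whole calendar months from `month` up to `t0` (positive when `month` is earlier)."""
    t0 = pd.Timestamp(t0)
    month = pd.Timestamp(month)
    return (t0.year - month.year) * 12 + (t0.month - month.month)


def recency(history, column, value):
    """Months since the most recent row where `column` equals `value`, or None if never seen."""
    rows = history[history[column] == value]
    return None if rows.empty else int(rows['Months_Since'].min())


# Hypothetical history rows, shaped like the outcome matrix described earlier
history = pd.DataFrame([
    {'Month': '2023-09-01', 'Template Name': 'Not Top Performer', 'Measure': 'BP01'},
    {'Month': '2023-11-01', 'Template Name': 'Not Top Performer', 'Measure': 'BP01'},
    {'Month': '2023-10-01', 'Template Name': 'Top Performer', 'Measure': 'BP02'},
])

# Objective 3: distance of every history row from t0 (December here)
t0 = '2023-12-01'
history['Months_Since'] = history['Month'].apply(lambda m: months_between(t0, m))

# Objectives 4 and 5: the smallest distance among matching rows is the most recent match
message_recency = recency(history, 'Template Name', 'Not Top Performer')
measure_recency = recency(history, 'Measure', 'BP02')
never_seen = recency(history, 'Measure', 'BP99')
```

Taking the minimum distance (rather than the maximum) is what makes this "time since the last occurrence": a template sent in both September and November was last sent one month before December, not three.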


In [None]:
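def rank_history_component(input_message_json, acceptable_candidates, current_feedback_month):
    # Assumption: the draft read the month, the history, and the candidates all from one
    # `acceptable_candidate` object, which cannot be both a dict and a list. Here they are
    # passed separately: the input message (with 'History'), a list of candidate dicts
    # (each with 'message_template_name' and 'measure'), and the feedback month.

    # 1) Assign time 0 (current feedback month)
    this_month = pd.to_datetime(current_feedback_month)

    # 2) Define dicts for data output and input
    message_recency_dict = {}
    measure_recency_dict = {}
    history_matrix = retrieve_history_data(input_message_json)

    # No history rows means no columns to compare against: every recency is None
    if history_matrix.empty:
        for candidate in acceptable_candidates:
            message_recency_dict[candidate['message_template_name']] = None
            measure_recency_dict[candidate['measure']] = None
        return message_recency_dict, measure_recency_dict

    # 3) Compare row 'month' to t0, convert to integers representing time in months between
    #    extant feedback and current month (positive integer). Calendar arithmetic is used
    #    because a timedelta cannot be cast to whole months in current pandas.
    history_months = pd.to_datetime(history_matrix['Month'])
    history_matrix['Time_Since_t0'] = (
        (this_month.year - history_months.dt.year) * 12
        + (this_month.month - history_months.dt.month)
    )

    # 4) Calculate message recency
    for candidate in acceptable_candidates:

        # Filter matrix for the specific candidate (same template and same measure)
        candidate_rows = history_matrix[
            (history_matrix['Template Name'] == candidate['message_template_name']) &
            (history_matrix['Measure'] == candidate['measure'])
        ]

        # Months since last occurrence of same message (smallest distance = most recent)
        if not candidate_rows.empty:
            last_occurrence = int(candidate_rows['Time_Since_t0'].min())
            message_recency_dict[candidate['message_template_name']] = last_occurrence
        else:
            message_recency_dict[candidate['message_template_name']] = None

    # 5) Calculate measure recency
    for candidate in acceptable_candidates:
        candidate_measure = candidate['measure']

        # Filter matrix for the specific candidate measure
        candidate_rows = history_matrix[history_matrix['Measure'] == candidate_measure]

        # Months since last occurrence of same measure (smallest distance = most recent)
        if not candidate_rows.empty:
            last_occurrence = int(candidate_rows['Time_Since_t0'].min())
            measure_recency_dict[candidate_measure] = last_occurrence
        else:
            measure_recency_dict[candidate_measure] = None

    # Log the recency dictionaries (loguru fills '{}' placeholders from the extra arguments)
    logger.debug("Message Recency: {}", message_recency_dict)
    logger.debug("Measure Recency: {}", measure_recency_dict)

    return message_recency_dict, measure_recency_dict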


### Transform code
# Need to turn the raw numbers in the recency dicts (simply time since last received, in months)
# into terms that can be added to the rank summation. Being able to use boolean conditions actually simplifies the
# transform math, so we don't really need to do anything to it now, assuming the weighting terms in MPM are correct
#   Transform function (for each term) is:  weight_adjustment*(e^(-X)) (X+1)^(-1))
'''
def transform_recency_terms(recency_dict, mpm_df):
    if recency_dict is None:
        return 0
'''
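
One possible reading of the transform comment above, sketched for a single recency value. The parentheses in that comment do not balance, so this assumes the two factors are multiplied, i.e. `weight_adjustment * e^(-X) / (X + 1)`; the function name `transform_recency` and the default weight of 1.0 are hypothetical, and in practice the weight would presumably come from the MPM.

```python
import math


def transform_recency(months_since, weight_adjustment=1.0):
    """Turn a recency value (months since last received) into a rank term.

    Assumed form: weight_adjustment * e^(-X) / (X + 1), with X = months_since.
    A value of None (never received) contributes nothing, as in the stub above.
    """
    if months_since is None:
        return 0.0
    return weight_adjustment * math.exp(-months_since) / (months_since + 1)


# Hypothetical recency dict, as produced by the recency step
example_recency = {'Not Top Performer': 1, 'Top Performer': None}
example_terms = {
    name: transform_recency(months) for name, months in example_recency.items()
}
```

Under this reading the term is largest when the message was received in the current month (X = 0) and decays quickly as the last occurrence moves further into the past.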