In [11]:
import pandas as pd

# Load moderator data
moderator_data = pd.read_excel("../EDA/Datasets/moderator-data-cleaned.xlsx")

# Filter out moderators with a handling time of 0
moderator_data = moderator_data[moderator_data['handling time'] > 0].copy()  # .copy() so later column assignments act on an independent DataFrame

# Creating a Composite Scoring System for Moderators
To create a scoring system for the moderators, we will use the `Productivity` and `accuracy` columns to determine how "good" the moderators are.

**Normalization**

Since the two metrics (`Productivity` and `accuracy`) are on different scales, we normalize them to a common range. This ensures that no single metric disproportionately affects the composite score.

We will use Min-Max scaling to normalize the features.
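
To make the scaling concrete, here is a minimal plain-Python sketch of Min-Max scaling. The helper name `min_max_scale` and the sample values are hypothetical and only illustrate the arithmetic; the notebook itself applies the same formula directly to the pandas columns.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative accuracy values (hypothetical, not read from the dataset)
accuracy = [0.25, 0.93, 1.00]
scaled_accuracy = min_max_scale(accuracy)  # smallest value maps to 0.0, largest to 1.0
```

The smallest value always maps to 0 and the largest to 1, so a single extreme value compresses everything else toward one end of the range.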

**Scoring System**
$$
\text{score} = \beta_1 \times \text{productivity} + \beta_2 \times \text{accuracy}
$$
The coefficients $\beta_1$ and $\beta_2$ are weights that determine the relative importance of `productivity` and `accuracy` (both Min-Max normalized). Ideally, these coefficients would be calibrated and optimized based on feedback and desired outcomes. For this analysis, however, we assume each weight is 0.5, so the score is the simple average of the two normalized metrics.
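
As a rough illustration of how the weights shape the score, the sketch below evaluates the formula for one moderator under two weightings. The function name `composite_score`, the input values, and the 0.3/0.7 weighting are hypothetical; only the equal 0.5/0.5 weighting is used in this notebook.

```python
def composite_score(norm_productivity, norm_accuracy, beta_1=0.5, beta_2=0.5):
    """Weighted sum of normalized productivity and normalized accuracy."""
    return beta_1 * norm_productivity + beta_2 * norm_accuracy

# Hypothetical moderator: low productivity, high accuracy
equal_weights = composite_score(0.2, 0.9)                            # about 0.55
accuracy_heavy = composite_score(0.2, 0.9, beta_1=0.3, beta_2=0.7)   # about 0.69
```

Keeping the weights summing to 1 keeps the score in [0, 1] whenever both inputs are in [0, 1].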

In [12]:
# Assign the average accuracy of the other moderators to those with missing (NaN) accuracy.
# This is done before normalizing so that every moderator receives a normalized value.
average_accuracy = moderator_data['accuracy'].mean()
moderator_data['accuracy'] = moderator_data['accuracy'].fillna(average_accuracy)

# Min-Max normalize accuracy and productivity to the range [0, 1]
moderator_data['normalized_accuracy'] = (moderator_data['accuracy'] - moderator_data['accuracy'].min()) / \
                                        (moderator_data['accuracy'].max() - moderator_data['accuracy'].min())

moderator_data['normalized_productivity'] = (moderator_data['Productivity'] - moderator_data['Productivity'].min()) / \
                                           (moderator_data['Productivity'].max() - moderator_data['Productivity'].min())

# Compute moderator score as the average of normalized accuracy and normalized productivity (both weights 0.5)
moderator_data['moderator_score'] = (moderator_data['normalized_accuracy'] + moderator_data['normalized_productivity']) / 2

# Display the first few rows with the new columns
moderator_data[['moderator', 'normalized_accuracy', 'normalized_productivity', 'moderator_score']].head()

Unnamed: 0,moderator,normalized_accuracy,normalized_productivity,moderator_score
0,1704427801912322,0.906667,0.20714,0.556903
1,1712377365906433,0.828,0.333465,0.580732
2,1705699742139394,0.826667,0.276388,0.551527
3,1759969798094866,0.708,0.302229,0.505114
4,9060023,0.066667,0.007371,0.037019


**Calculating Max Tasks per Day**

When the queue optimization system allocates ads to moderators, it should ensure that the ads are evenly allocated among all suitable moderators. As the TikTok Data Science team expects `Utilisation %` to increase by 10%, we use that as the threshold (i.e. each moderator's utilisation may increase by at most 10%).

With this threshold, we can calculate the maximum number of tasks each moderator can take on per day using the formula below. It assumes that TikTok moderators work 8 paid hours per day and that `handling time` is in milliseconds (as shown in the EDA).
$$
\text{max\_tasks\_per\_day} = \frac{0.1 \times 8 \times 60 \times 60 \times 1000}{\text{handling\_time}}
$$
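
As a worked example of the formula, the sketch below evaluates it for a single handling time. The helper name `max_tasks_per_day` and its parameter names are hypothetical; 119688 ms is the handling time of the first moderator in the dataset.

```python
MS_PER_HOUR = 60 * 60 * 1000  # milliseconds in one hour

def max_tasks_per_day(handling_time_ms, paid_hours=8, utilisation_increase=0.1):
    """Extra tasks per day that fit into the allowed share of paid working time."""
    if handling_time_ms <= 0:
        raise ValueError("handling_time_ms must be positive")
    return utilisation_increase * paid_hours * MS_PER_HOUR / handling_time_ms

# 10% of an 8-hour day is 2,880,000 ms; dividing by 119688 ms per task gives about 24 tasks
tasks = max_tasks_per_day(119688)
```

Because the result is inversely proportional to handling time, moderators with very short recorded handling times receive very large caps, which is worth keeping in mind when reading the results.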

In [13]:
# Assumed paid hours per day for TikTok moderators
PAID_HOURS_PER_DAY = 8

# Calculate the maximum number of tasks each moderator can handle in a day based on a 10% increase in utilization
moderator_data['max_tasks_per_day'] = (0.1 * PAID_HOURS_PER_DAY * 60 * 60 * 1000) / moderator_data['handling time']

# Display the first few rows with the updated max_tasks_per_day
moderator_data[['moderator', 'handling time', 'Utilisation %', 'max_tasks_per_day']].head()

Unnamed: 0,moderator,handling time,Utilisation %,max_tasks_per_day
0,1704427801912322,119688,1.28725,24.062563
1,1712377365906433,102324,1.157927,28.14589
2,1705699742139394,76773,1.150042,37.513188
3,1759969798094866,100732,1.146969,28.590716
4,9060023,340,1.133573,8470.588235


**Generating Expertise**

In the additional models we proposed, we implemented a Sentence Transformers model that identifies ad descriptions and categorizes them into a fixed list of categories. We would then match these categories with the moderators' expertise to ensure timely and accurate moderation.

However, since the current dataset does not include an `expertise` feature, we will assume for now that no moderator has any expertise. The optimization matching model takes this into account and still ensures that each moderator is given at most 3 categories per day, so that their work stays focused and efficient. Below, we generate an empty list for each moderator in the `expertise` column.
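
The 3-category cap could be checked along the lines of the sketch below. This is only an assumption about how such a check might look, not the matching model's actual logic; the function name `can_assign` and the category names are hypothetical.

```python
MAX_CATEGORIES_PER_DAY = 3  # daily cap on distinct categories per moderator

def can_assign(categories_today, new_category, limit=MAX_CATEGORIES_PER_DAY):
    """Return True if an ad of new_category keeps the moderator within the cap."""
    # A category already handled today never counts against the cap again
    return new_category in categories_today or len(categories_today) < limit

# Hypothetical moderator who has already handled three categories today
seen = {"beauty", "gaming", "finance"}
ok_existing = can_assign(seen, "gaming")  # True: category already assigned today
ok_new = can_assign(seen, "travel")       # False: would be a fourth category
```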

In [14]:
# Add a new column with empty lists
moderator_data["expertise"] = [[] for _ in range(len(moderator_data))]

moderator_data

Unnamed: 0.1,Unnamed: 0,moderator,market,Productivity,Utilisation %,handling time,accuracy,normalized_accuracy,normalized_productivity,moderator_score,max_tasks_per_day,expertise
0,0,1704427801912322,"[""RO""]",274.5480,1.287250,119688,0.930,0.906667,0.207140,0.556903,24.062563,[]
1,1,1712377365906433,"[""KH""]",441.6525,1.157927,102324,0.871,0.828000,0.333465,0.580732,28.145890,[]
2,2,1705699742139394,"[""KH""]",366.1500,1.150042,76773,0.870,0.826667,0.276388,0.551527,37.513188,[]
3,3,1759969798094866,"[""KH""]",400.3325,1.146969,100732,0.781,0.708000,0.302229,0.505114,28.590716,[]
4,4,9060023,"[""GB"", ""IT"", ""IE""]",10.2900,1.133573,340,0.300,0.066667,0.007371,0.037019,8470.588235,[]
...,...,...,...,...,...,...,...,...,...,...,...,...
1280,1280,7579980,"[""TH""]",5.3500,0.000000,172,0.667,0.556000,0.003636,0.279818,16744.186047,[]
1281,1281,5827188,"[""US"", ""CA""]",37.2000,0.000000,3103,0.571,0.428000,0.027714,0.227857,928.134064,[]
1282,1282,7167613,"[""UA"", ""RU"", ""GB"", ""IE""]",0.5400,0.000000,99,0.556,0.408000,0.000000,0.204000,29090.909091,[]
1283,1283,9020538,"[""PL"", ""PT"", ""GB"", ""IL""]",125.2900,0.000000,30487,0.250,0.000000,0.094306,0.047153,94.466494,[]
