In [8]:
import pandas as pd
import numpy as np
import pickle
from tqdm import tqdm
from sklearn.metrics import ndcg_score
tqdm.pandas()

In [9]:
size = 'demo'
type_ = 'validation'
amount = 100
predictions_df_path = f'./files/pickle/predictions_df_{size}_{type_}_{str(amount)}.pkl'
predictions_df = pd.read_pickle(predictions_df_path)
print('Predictions df shape:                      ',predictions_df.shape)

Predictions df shape:                       (100, 6)


In [10]:
predictions_df.head(2)

Unnamed: 0,user_id,article_ids_inview,article_ids_clicked,Predicted_read_times,Predicted_tuples_sorted,Predicted_article_ids
0,76658,"[9788239, 9780702, 9553264, 9787499, 6741781, ...",[9783042],"[23.87838, 21.764265, 20.772623, 24.495409, 11...","[(9787499, 24.495409), (9788239, 23.87838), (9...","[9787499, 9788239, 9780702, 9553264, 9783042, ..."
1,76658,"[9788521, 9786217, 9553264, 9788361, 9788352, ...",[9788125],"[14.02666, 13.230712, 20.772623, 12.180795, 18...","[(9553264, 20.772623), (9788352, 18.921907), (...","[9553264, 9788352, 9788125, 9788521, 9786217, ..."


In [11]:
def get_reciprocal_rank(row):
    predicted_item_list = row['Predicted_article_ids']
    clicked_article = row['article_ids_clicked'][0]
    try:
        index = predicted_item_list.index(clicked_article)
        # Return the reciprocal rank
        return 1 / (index + 1)
    except ValueError:
        # If the clicked article is not in the predicted list, return 0
        return 0
    
def calculate_precision(target, predictions):
    tp = predictions.count(target)  # Count true positives
    fp = len(predictions) - tp  # Calculate false positives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0  # Compute precision
    return precision

def calculate_recall(target, predictions):
    tp = predictions.count(target)  # True Positives: target in predictions
    fn = 1 if tp == 0 else 0  # False Negatives: target not in predictions
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0  # Compute recall
    return recall

In [12]:
predictions_df['MMR_rank'] = predictions_df.progress_apply(get_reciprocal_rank,axis=1)
predictions_df['Precision@10']= predictions_df.progress_apply(lambda row: calculate_precision(row['article_ids_clicked'][0],row['Predicted_article_ids']),axis=1)
predictions_df['Recall@10']= predictions_df.progress_apply(lambda row: calculate_recall(row['article_ids_clicked'][0],row['Predicted_article_ids']),axis=1)
predictions_df.head(2)


100%|██████████| 100/100 [00:00<00:00, 22047.43it/s]
100%|██████████| 100/100 [00:00<00:00, 31439.20it/s]
100%|██████████| 100/100 [00:00<00:00, 48691.71it/s]


Unnamed: 0,user_id,article_ids_inview,article_ids_clicked,Predicted_read_times,Predicted_tuples_sorted,Predicted_article_ids,MMR_rank,Precision@10,Recall@10
0,76658,"[9788239, 9780702, 9553264, 9787499, 6741781, ...",[9783042],"[23.87838, 21.764265, 20.772623, 24.495409, 11...","[(9787499, 24.495409), (9788239, 23.87838), (9...","[9787499, 9788239, 9780702, 9553264, 9783042, ...",0.2,0.166667,1.0
1,76658,"[9788521, 9786217, 9553264, 9788361, 9788352, ...",[9788125],"[14.02666, 13.230712, 20.772623, 12.180795, 18...","[(9553264, 20.772623), (9788352, 18.921907), (...","[9553264, 9788352, 9788125, 9788521, 9786217, ...",0.333333,0.142857,1.0


In [13]:
print("MRR: ", predictions_df['MMR_rank'].sum()/predictions_df.shape[0])
print("Precision@10: ", predictions_df['Precision@10'].sum()/predictions_df.shape[0])
print("Recall@10: ", predictions_df['Recall@10'].sum()/predictions_df.shape[0])

MRR:  0.32375
Precision@10:  0.13027380952380951
Recall@10:  0.83


### Interpretation

- An MRR of approximately 0.324 corresponds to a harmonic-mean rank of about 3 for the clicked article (since 1/0.324≈3.09). Rows where the clicked article is missing from the predicted list contribute 0 to this average. This is a reasonably good result, indicating that the recommender system often ranks the clicked article near the top of the recommendation list.

- A Precision@10 of approximately 0.130 needs careful reading. Only the first clicked article of each impression counts as relevant, so a row's precision is either 0 or 1 divided by the length of the predicted list (for example, 0.166667 = 1/6 in the first row shown above). The average therefore reflects how long the lists are and how often the clicked article appears in them, not the number of relevant items per 10 recommendations. Note also that the functions score the whole list they receive, so the "@10" label holds only if `Predicted_article_ids` was cut to ten items upstream.

- A Recall@10 of 0.83 means that the clicked article appears in the predicted list for 83 of the 100 impressions. Because there is a single relevant item per row, recall is 0 or 1 for each row and the average is effectively a hit rate. This high value indicates that the system usually surfaces the clicked article, though not always at the very top of the list.

### Overall Interpretation
- MRR (0.324): The clicked article is typically ranked around the third position.
- Precision@10 (0.130): With one relevant article per impression, this value is capped by the list length, so it mainly tracks list size and hit rate.
- Recall@10 (0.83): The clicked article appears in the predicted list for 83% of impressions, showing strong recall performance.
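
As a sanity check on the three metrics, the following minimal sketch computes them for one hypothetical impression with the cutoff applied explicitly. The function names, the `k` parameter, and the article IDs are illustrative assumptions; they are not part of the notebook's code, which scores whatever list it is given.

```python
def reciprocal_rank_at_k(clicked, ranked, k=10):
    """Return 1 / rank of the clicked article within the top k, or 0.0 if absent."""
    top_k = ranked[:k]
    return 1.0 / (top_k.index(clicked) + 1) if clicked in top_k else 0.0


def precision_at_k(clicked, ranked, k=10):
    """With a single relevant item, precision is 1 / len(top_k) on a hit, else 0.0."""
    top_k = ranked[:k]
    return 1.0 / len(top_k) if clicked in top_k else 0.0


def recall_at_k(clicked, ranked, k=10):
    """With a single relevant item, recall is 1.0 on a hit, else 0.0."""
    return 1.0 if clicked in ranked[:k] else 0.0


# Hypothetical impression: six ranked articles, clicked article at rank 5.
ranked = [101, 102, 103, 104, 105, 106]
clicked = 105

rr = reciprocal_rank_at_k(clicked, ranked)
precision = precision_at_k(clicked, ranked)
recall = recall_at_k(clicked, ranked)
print(rr, precision, recall)
```

A six-item list with the click at rank 5 gives a reciprocal rank of 1/5, a precision of 1/6, and a recall of 1, which is the same pattern as the first row of the results table.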

To improve precision and MRR, consider changes to the preprocessing, the model, and the recommendation logic. Here are some approaches:

1. Data Preprocessing Improvements
    1. Feature Engineering:

        - Add More Features: Incorporate additional features that could influence reading time predictions, such as article length, topic, author, publication date, user preferences, etc.
        - Normalization and Scaling: Ensure that your features are properly normalized and scaled to help the model learn more effectively.
        - Categorical Features: Use techniques like one-hot encoding or embeddings for categorical features (e.g., article categories or authors).

    2. Data Cleaning:

        - Remove Outliers: Identify and remove or handle outliers in reading times to ensure the model isn’t biased by extreme values.
        - Handle Missing Values: Ensure any missing values in the dataset are properly handled through imputation or removal.

    3. Data Augmentation:

        - Synthetic Data: If your dataset is small, consider generating synthetic data to improve model training.

2. Model Improvements

    1. Model Architecture:

        - Experiment with Different Architectures: Try different neural network architectures such as deeper networks, recurrent neural networks (RNNs), transformers, etc.
        - Hyperparameter Tuning: Perform hyperparameter tuning to find the optimal parameters for your model (learning rate, batch size, number of layers, units per layer, etc.).

    2. Training Techniques:

        - Regularization: Use regularization techniques like dropout, L2 regularization, and early stopping to prevent overfitting.
        - Ensemble Methods: Combine predictions from multiple models using ensemble methods (e.g., bagging, boosting) to improve accuracy and robustness.

3. Recommendation Logic Improvements

    1. Post-Processing Predictions:

        - Re-Ranking: After predicting reading times, re-rank articles using additional criteria such as user preferences, recent trends, or article popularity.

    2. Hybrid Recommendations:

        - Combine Models: Use a hybrid approach that combines collaborative filtering, content-based filtering, and your read time prediction model to recommend articles.
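
The re-ranking idea from the list above can be sketched as a weighted blend of the predicted read time and a second signal. This is only one possible approach under stated assumptions: the `popularity` values, the `alpha` weight, and the function names are hypothetical and do not come from the notebook.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; constant lists map to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(article_ids, read_times, popularity, alpha=0.7):
    """Order articles by alpha * read time + (1 - alpha) * popularity (both scaled)."""
    rt = min_max(read_times)
    pop = min_max(popularity)
    scores = [alpha * r + (1 - alpha) * p for r, p in zip(rt, pop)]
    order = sorted(range(len(article_ids)), key=lambda i: scores[i], reverse=True)
    return [article_ids[i] for i in order]


# Hypothetical inputs: three articles with predicted read times and view counts.
ids = [1, 2, 3]
read_times = [24.0, 20.0, 12.0]
popularity = [5, 100, 50]

reranked = rerank(ids, read_times, popularity, alpha=0.5)
print(reranked)
```

Setting `alpha=1.0` reproduces the pure read-time ordering, so the weight can be tuned on the validation set against MRR.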

In [14]:

### Feature engineering
## 1. Outlier detection: trained with read time < 60, no notable improvement
## 2. Scroll percentage * read time: decide how to handle zero scroll percentage (NaN is not a good idea)
## 3. Normalization / scaling
## 4. Threshold
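
The four feature-engineering notes above can be sketched in pandas as follows. The column names `read_time` and `scroll_percentage`, the cap of 60, the choice to treat a missing scroll percentage as 0, and the 0.2 threshold are all assumptions made for illustration; capping is shown here as an alternative to dropping rows.

```python
import pandas as pd

# Hypothetical interaction data; one extreme read time and one missing scroll value.
df = pd.DataFrame({
    "read_time": [5.0, 30.0, 400.0, 12.0],
    "scroll_percentage": [100.0, 50.0, 80.0, None],
})

# 1. Outliers: cap extreme read times instead of dropping the rows.
df["read_time_clipped"] = df["read_time"].clip(upper=60)

# 2. Scroll percentage * read time: treat a missing scroll value as 0 so the
#    product is 0 rather than NaN.
df["scroll_filled"] = df["scroll_percentage"].fillna(0.0)
df["engagement"] = df["read_time_clipped"] * df["scroll_filled"] / 100.0

# 3. Min-max scaling to [0, 1], guarding against a constant column.
span = df["engagement"].max() - df["engagement"].min()
df["engagement_scaled"] = (df["engagement"] - df["engagement"].min()) / span if span > 0 else 0.0

# 4. Threshold: binary label for "engaged" interactions.
df["engaged"] = df["engagement_scaled"] >= 0.2

print(df[["engagement", "engagement_scaled", "engaged"]])
```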

### Model Architecture
## 1. Learning Rate
## 2. Early Stopping
## 3. Dropout
## 4. Add more Dense layers

### Hybrid Approach
## 1. Content-based

### Recommendations
## 1. Apply clustering

### Evaluation
## 1. Different metric: F1
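
A minimal sketch of the F1 idea, computed per row from the precision and recall values already in the dataframe. The nDCG helper is an additional assumption, included because `ndcg_score` is imported but not used in the notebook; it relies on the fact that with a single relevant item the ideal DCG is 1, so nDCG reduces to 1 / log2(rank + 1). All function names are hypothetical.

```python
import math


def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def ndcg_single_click(clicked, ranked, k=10):
    """nDCG@k when exactly one item is relevant: 1 / log2(rank + 1), or 0.0 on a miss."""
    top_k = ranked[:k]
    if clicked not in top_k:
        return 0.0
    rank = top_k.index(clicked) + 1
    return 1.0 / math.log2(rank + 1)


# Values matching the first result row: precision 1/6, recall 1.0, click at rank 5.
f1 = f1_from(1.0 / 6.0, 1.0)
ndcg = ndcg_single_click(105, [101, 102, 103, 104, 105, 106])
print(f1, ndcg)
```

Averaging per-row F1 values is not the same as taking the F1 of the averaged precision and recall, so it is worth stating which of the two is reported.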



### Check the join behaviours and history
