# Rank prediction using Random Forest Classifier

-- Vishwa Sheth

<b><u>Key components of the model</u></b>

Data Preprocessing: After imputing missing values using KNN, we convert categorical data to numeric using the get_dummies function in pandas. This conversion helps to format the data in a way that is suitable for the model.
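The preprocessing step can be illustrated on a toy frame. This is a minimal sketch, not the project's actual pipeline: the values are invented, `n_neighbors=2` is an arbitrary choice, and the use of scikit-learn's `KNNImputer` is an assumption (the source only says "KNN").

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data with hypothetical values; column names are borrowed from the dataset.
toy = pd.DataFrame({
    "Height": [74.0, 71.0, np.nan, 76.0],
    "Weight": [238.0, 179.0, 296.0, 270.0],
    "Position": ["OLB", "WR", "DT", "DE"],
})

# Impute numeric columns from the 2 nearest rows (n_neighbors is an assumption).
numeric_cols = ["Height", "Weight"]
imputer = KNNImputer(n_neighbors=2)
toy[numeric_cols] = imputer.fit_transform(toy[numeric_cols])

# One-hot encode the remaining categorical column.
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())
```

In this toy case the missing height is filled with the mean height of the two rows closest in weight, and `Position` expands into one indicator column per category.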

Feature selection: We remove irrelevant features such as "Name" and "College" from consideration for prediction. Additionally, "Round" and "Pick" are excluded as they are part of the target feature.

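Target feature: Currently, the target is derived from the "Round" feature: first-round picks are labeled 1 and all other players 0, and players are ranked by the predicted probability of class 1. We plan to incorporate "Pick" before final submission.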

Dataset split: Given that this is a ranking problem, the training dataset includes all years except for 2023. Data from 2023 will be used solely for predicting the rank.

The hyperparameters are tuned with a grid search using 5-fold cross-validation, currently scored on accuracy. Comparing the baseline measurements with the best-fit measurements shows an improvement in accuracy and in the top-of-ranking metrics (MRR, NDCG@10, P@10, R@10), while ROC AUC and MAP do not improve.

Comparative Analysis of Baseline and Best-Fit Random Forest Models for Ranking Prediction

<u>Note</u>: This work is in progress; we aim to improve the evaluation metrics, use a ranking metric instead of accuracy as the cross-validation scoring criterion, and include "Pick" in the target feature.
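One possible way to score cross-validation on ranking quality rather than accuracy is scikit-learn's built-in `"average_precision"` scorer, which evaluates predicted scores instead of thresholded labels. The sketch below uses synthetic data and a deliberately small, hypothetical grid; it is an illustration of the idea, not the configuration used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data (about 10% positives) standing in for the real features.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

# Small illustrative grid (hypothetical values).
param_grid = {"n_estimators": [10, 20], "min_samples_leaf": [1, 5]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid=param_grid,
    cv=cv,
    scoring="average_precision",  # ranking-aware: uses predicted scores, not labels
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the share of positives roughly constant per fold, which matters when the positive class is rare.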

In [1]:
import pandas as pd

# Read the CSV file
df = pd.read_csv("data/imputed_data.csv")
print(df.columns)

Index(['Name', 'Position', 'College', 'Round', 'Pick', 'Stat URL', 'Height',
       'Weight', '40 Yard Dash', 'Bench Press', 'Vertical Jump', 'Broad Jump',
       '3 Cone Drill', 'Shuttle', 'conf_abbr', 'games', 'seasons',
       'tackles_solo', 'tackles_assists', 'tackles_total', 'tackles_loss',
       'sacks', 'def_int', 'def_int_yds', 'def_int_td', 'pass_defended',
       'fumbles_rec', 'fumbles_rec_yds', 'fumbles_rec_td', 'fumbles_forced',
       'rec', 'rec_yds', 'rec_yds_per_rec', 'rec_td', 'rush_att', 'rush_yds',
       'rush_yds_per_att', 'rush_td', 'scrim_att', 'scrim_yds',
       'scrim_yds_per_att', 'scrim_td', 'Year'],
      dtype='object')


In [2]:
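df.head  # without call parentheses this echoes the bound method and the full frame; df.head() shows the first five rows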

<bound method NDFrame.head of                  Name Position          College  Round  Pick  \
0       Emmanuel Acho      OLB            Texas      6   204   
1           Joe Adams       WR         Arkansas      4   104   
2        Chas Alecxih       DT       Pittsburgh      0     0   
3     Frank Alexander       DE         Oklahoma      4   103   
4       Antonio Allen        S   South Carolina      7   242   
...               ...      ...              ...    ...   ...   
3679      Luke Wypler        C         Ohio St.      6   190   
3680      Bryce Young       QB          Alabama      1     1   
3681      Byron Young       DT          Alabama      3    70   
3682      Byron Young     EDGE        Tennessee      3    77   
3683    Cameron Young       DT  Mississippi St.      4   123   

                                               Stat URL  Height  Weight  \
0     https://www.sports-reference.com/cfb/players/e...    74.0   238.0   
1     https://www.sports-reference.com/cfb/players/

In [3]:
# Binarize the target: 1 = first-round pick, 0 = everyone else
df.loc[df.Round != 1, "Round"] = 0

# Drop the columns that do not contribute to the prediction
all_X = df.drop(["Name", "Round", "Pick", "College"], axis=1)
all_X = pd.get_dummies(all_X)

# Splitting testing and training sets
train_X = all_X[(all_X.Year != 2023)].drop(["Year"], axis=1)
test_X = all_X[all_X.Year == 2023].drop(["Year"], axis=1)
train_y = df[(df.Year != 2023)].Round
test_y = df[df.Year == 2023].Round



In [4]:
train_X.head()

Unnamed: 0,Height,Weight,40 Yard Dash,Bench Press,Vertical Jump,Broad Jump,3 Cone Drill,Shuttle,games,seasons,...,conf_abbr_CUSA,conf_abbr_Ind,conf_abbr_MAC,conf_abbr_MVC,conf_abbr_MWC,conf_abbr_Pac-10,conf_abbr_Pac-12,conf_abbr_SEC,conf_abbr_Sun Belt,conf_abbr_WAC
0,74.0,238.0,4.64,24.0,35.5,118.0,7.13,4.28,37.0,3.0,...,False,False,False,False,False,False,False,False,False,False
1,71.0,179.0,4.51,14.59,36.0,123.0,7.09,4.12,40.0,4.0,...,False,False,False,False,False,False,False,False,False,False
2,76.0,296.0,5.31,19.0,25.5,99.0,7.74,4.62,34.0,3.0,...,False,False,False,False,False,False,False,False,False,False
3,76.0,270.0,4.8,24.48,31.13,115.26,7.19,4.48,37.0,4.0,...,False,False,False,False,False,False,False,False,False,False
4,73.0,210.0,4.58,17.0,34.0,118.0,7.02,4.25,42.0,4.0,...,False,False,False,False,False,False,False,False,False,False


In [5]:
test_X.head()

Unnamed: 0,Height,Weight,40 Yard Dash,Bench Press,Vertical Jump,Broad Jump,3 Cone Drill,Shuttle,games,seasons,...,conf_abbr_CUSA,conf_abbr_Ind,conf_abbr_MAC,conf_abbr_MVC,conf_abbr_MWC,conf_abbr_Pac-10,conf_abbr_Pac-12,conf_abbr_SEC,conf_abbr_Sun Belt,conf_abbr_WAC
3400,70.0,216.0,4.51,19.42,33.64,115.58,7.03,4.28,31.0,3.0,...,False,False,False,False,False,False,False,False,False,False
3401,73.0,237.0,4.47,17.09,36.5,129.0,7.22,4.25,53.0,5.0,...,False,False,False,False,False,False,False,False,False,False
3402,69.0,188.0,4.32,14.92,33.0,119.26,7.02,4.19,30.0,3.0,...,False,False,False,False,False,False,False,True,False,False
3403,71.0,173.0,4.49,15.14,34.0,122.0,7.0,4.16,35.0,3.0,...,False,False,False,False,False,False,False,False,False,False
3404,74.0,282.0,4.49,27.0,37.5,125.0,7.22,4.47,36.0,4.0,...,False,False,False,False,False,False,False,False,False,False


In [6]:
import numpy as np

# train_y is a pandas Series of binary labels (1 = first round)
class_counts = np.unique(train_y, return_counts=True)

# Print class labels and their counts
for label, count in zip(class_counts[0], class_counts[1]):
    print(f"Class {label}: {count} instances")

Class 0: 3064 instances
Class 1: 336 instances


In [7]:
from sklearn.ensemble import RandomForestClassifier

# Define the parameter values as baseline
n_estimators = 1      
max_depth = None      
min_samples_split = 1000  
min_samples_leaf = 1000   
max_features = None   
bootstrap = False     

# Initialize the Random Forest classifier with custom parameters and class weights
baseline_rf = RandomForestClassifier(n_estimators=n_estimators,
                                    max_depth=max_depth,
                                    min_samples_split=min_samples_split,
                                    min_samples_leaf=min_samples_leaf,
                                    max_features=max_features,
                                    bootstrap=bootstrap,
                                    class_weight='balanced')

# Train the baseline Random Forest classifier
baseline_rf.fit(train_X, train_y)


In [8]:
# Make predictions on test data
baseline_preds = preds = baseline_rf.predict_proba(test_X)

# Rank players by predicted probability of class 1 (first round), highest first
test_names = df[df.Year == 2023].reset_index()["Name"]
ranked_rows = pd.DataFrame(baseline_preds).sort_values(by=1, ascending=False).index
for count, i in enumerate(ranked_rows, start=1):
    print(str(count) + " " + str(test_names.at[i]))

1 Israel Abanikanda
2 Mike Morris
3 Tashawn Manning
4 Michael Mayer
5 Warren McClendon
6 Jordan McFadden
7 Tanner McKee
8 Kendre Miller
9 Marvin Mims
10 Keaton Mitchell
11 Wanya Morris
12 Calijah Kancey
13 Myles Murphy
14 Lukas Van Ness
15 John Ojukwu
16 BJ Ojulari
17 Jarrett Patterson
18 Kyle Patterson
19 Jack Podlesny
20 Asim Richards
21 Jaxson Kirkland
22 Darrell Luter Jr.
23 Anton Harrison
24 Clark Phillips III
25 Malik Heath
26 Nick Herbig
27 Ronnie Hickman
28 Brandon Hill
29 Xavier Hutchinson
30 Jalin Hyatt
31 Andre Carter II
32 Rashad Torrence II
33 Thomas Incoom
34 Paris Johnson Jr.
35 Rakim Jarrett
36 Antonio Johnson
37 Quentin Johnston
38 Broderick Jones
39 Dawand Jones
40 Jaylon Jones
41 Will Anderson Jr.
42 Emil Ekiyor Jr.
43 Anthony Richardson
44 Eli Ricks
45 Kelee Ringo
46 Parker Washington
47 DJ Turner
48 Carrington Valentine
49 Deuce Vaughn
50 Andrew Vorhees
51 Dalton Wagner
52 Alex Ward
53 Carter Warren
54 Darnell Washington
55 Tyrus Wheat
56 Tavius Robinson
57 Blake W

In [9]:
print(baseline_preds)

[[0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.40744303 0.59255697]
 [0.6722166  0.3277834 ]
 [0.6722166  0.3277834 ]
 [0.40744303 0.59255697]


In [10]:
predicted_labels = (baseline_preds[:, 1] > 0.5).astype(int)

In [11]:
print(predicted_labels)

[1 0 1 1 0 1 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 1 1 0 1 1 1 1
 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 0 0 0 0 1 1 0 1 0 1 0 1
 1 1 0 1 1 1 1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 0 1 0
 1 0 1 0 1 0 1 0 0 1 0 1 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 0 0 1 0 1 1 0 0 0 0
 0 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0 1 0 0 1 0 0 1 1 0 1 0 0 0 1 0 0 0 1 1 0 0
 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 1 1 1 0 1 0 1
 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 1 0 1 0 1 0 0 1 1 1 0 1 1 1
 0 1 1 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 1 0 1 0]


In [12]:
from sklearn.metrics import accuracy_score, roc_auc_score
import numpy as np

# # Convert predicted probabilities to binary predictions based on a threshold (e.g., 0.5)
# predicted_labels = (baseline_preds[:, 1] > 0.5).astype(int)

# Evaluation for ranking metrics
# Sort the predictions based on probability scores
sorted_indices = np.argsort(-preds[:, 1])
k = 10
num_relevant = sum(test_y)

def calculate_MRR(sorted_indices, test_y):
    # Reciprocal rank of the first relevant item; with a single ranked list, MRR equals this value
    mrr = 0
    for idx, i in enumerate(sorted_indices):
        if test_y.iloc[i] == 1:  # Use iloc to access test_y by index
            mrr = 1 / (idx + 1)
            break
    return mrr

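def calculate_MAP(sorted_indices, test_y):
    # Calculate Average Precision over the single ranked list (reported as MAP)
    # NOTE: test_y.iloc[:idx + 1] counts positives in the original row order;
    # precision at rank idx + 1 would use test_y.iloc[sorted_indices[:idx + 1]].
    ap = 0
    for idx, i in enumerate(sorted_indices):
        if test_y.iloc[i] == 1:
            ap += sum(test_y.iloc[:idx + 1]) / (idx + 1)
    map_score = ap / num_relevant
    return map_score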

def calculate_NDCG(sorted_indices, test_y):
    # Calculate NDCG at k with binary relevance; the ideal DCG assumes at least k relevant items
    dcg = 0
    idcg = sum(1 / np.log2(np.arange(2, k + 2)))
    for idx, i in enumerate(sorted_indices[:k]):
        if test_y.iloc[i] == 1:
            dcg += 1 / np.log2(idx + 2)
    ndcg = dcg / idcg
    return ndcg

def calculate_PAK(sorted_indices, test_y):
    # Calculate Precision at k (P@k) 
    tp_at_k = sum(test_y.iloc[sorted_indices[:k]])
    precision_at_k = tp_at_k / k
    return precision_at_k

def calculate_RAK(sorted_indices, test_y):
    # Calculate Recall at k (R@k)
    tp_at_k = sum(test_y.iloc[sorted_indices[:k]])
    recall_at_k = tp_at_k / num_relevant
    return recall_at_k
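For reference, average precision over a single ranked list is the mean of precision@rank taken at each relevant item, with hits counted in ranked order. The helper below is a hypothetical sketch (`average_precision_ranked` is not part of this notebook) that assumes at least one positive label, and it is checked against scikit-learn's `average_precision_score` on toy data.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def average_precision_ranked(y_true, scores):
    """AP for one ranked list: mean of precision@rank at each relevant item."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores), kind="stable")  # best score first
    ranked = y_true[order]                    # labels in ranked order
    hits = np.cumsum(ranked)                  # relevant items seen up to each rank
    ranks = np.arange(1, len(ranked) + 1)
    precisions = hits / ranks                 # precision@rank for every rank
    return float(precisions[ranked == 1].sum() / ranked.sum())

# Toy example: relevant items end up at ranks 1 and 3.
y_toy = np.array([0, 1, 0, 1, 0])
s_toy = np.array([0.2, 0.9, 0.8, 0.4, 0.1])
ap = average_precision_ranked(y_toy, s_toy)
print(round(ap, 4))  # 0.8333
```

Here the precisions at the two hits are 1/1 and 2/3, so the result is (1 + 2/3) / 2. With distinct scores this agrees with `average_precision_score(y_toy, s_toy)`.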

In [13]:
pip install tabulate

Note: you may need to restart the kernel to use updated packages.


In [14]:
from tabulate import tabulate

# Calculate all measurements
baseline_measurements = [
    ("Accuracy", accuracy_score(test_y, predicted_labels)),
    ("ROC AUC Score", roc_auc_score(test_y, baseline_preds[:, 1])),
    ("Mean Reciprocal Rank (MRR)", calculate_MRR(sorted_indices, test_y)),
    ("Mean Average Precision (MAP)", calculate_MAP(sorted_indices, test_y)),
    ("Normalized Discounted Cumulative Gain (NDCG) at k=10", calculate_NDCG(sorted_indices, test_y)),
    ("Precision at k (P@k) at k=10", calculate_PAK(sorted_indices, test_y)),
    ("Recall at k (R@k) at k=10", calculate_RAK(sorted_indices, test_y))
]

# Print measurements in a table format
print("Baseline measurements")
print(tabulate(baseline_measurements, headers=["Metric", "Value"]))


Baseline measurements
Metric                                                    Value
----------------------------------------------------  ---------
Accuracy                                              0.619718
ROC AUC Score                                         0.696552
Mean Reciprocal Rank (MRR)                            0.125
Mean Average Precision (MAP)                          0.115021
Normalized Discounted Cumulative Gain (NDCG) at k=10  0.0694312
Precision at k (P@k) at k=10                          0.1
Recall at k (R@k) at k=10                             0.0344828


In [17]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for hyperparameter tuning
param_grid = {
    'n_estimators': [50, 100, 150],  # example values, adjust as needed
    'max_depth': [None, 10, 20],      # example values, adjust as needed
    'min_samples_split': [2, 5, 10],  # example values, adjust as needed
    'min_samples_leaf': [1, 2, 4],
    'class_weight': ['balanced']
}

# Initialize RandomForestClassifier
rf_classifier = RandomForestClassifier(class_weight='balanced', random_state=42)

# Perform grid search with cross-validation
grid_search = GridSearchCV(estimator=rf_classifier, param_grid=param_grid, cv=5, scoring='accuracy')

# Train the model with cross-validation
grid_search.fit(train_X, train_y)

# Get the best parameters and best model from grid search
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Print the best parameters
print("Best Parameters:", best_params)

# Mean 5-fold cross-validated accuracy of the best parameter combination (on training data)
accuracy = grid_search.best_score_
print("Accuracy of best model is : ", accuracy)
# Ranking metrics for the best model are computed on the 2023 test data in the cells that follow

# You can also use the best model for predictions on new data
# best_model.predict(test_X)


Best Parameters: {'class_weight': 'balanced', 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 5, 'n_estimators': 50}
Accuracy of best model is :  0.9017647058823529


In [18]:
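# Make predictions on test data with the tuned model
# Note: the names baseline_preds and preds are reused here and now hold the tuned model's probabilities
baseline_preds = preds = best_model.predict_proba(test_X)

# Rank players by predicted probability of class 1 (first round), highest first
test_names = df[df.Year == 2023].reset_index()["Name"]
ranked_rows = pd.DataFrame(baseline_preds).sort_values(by=1, ascending=False).index
for count, i in enumerate(ranked_rows, start=1):
    print(str(count) + " " + str(test_names.at[i]))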

1 C.J. Stroud
2 Bryce Young
3 Wanya Morris
4 Anthony Richardson
5 Christian Gonzalez
6 Jalen Redmond
7 Jakorian Bennett
8 Isaiah Foskey
9 YaYa Diaby
10 Jaren Hall
11 Ryan Hayes
12 Tavius Robinson
13 Emmanuel Forbes
14 Tyler Steen
15 Joe Tippmann
16 Anton Harrison
17 John Ojukwu
18 Malaesala Aumavae-Laulu
19 Aidan O'Connell
20 Hendon Hooker
21 Julius Brents
22 Isaiah McGuire
23 Zacch Pickens
24 Lukas Van Ness
25 Derick Hall
26 Yasir Abdullah
27 Blake Freeland
28 Carrington Valentine
29 Nolan Smith
30 Kelee Ringo
31 Cory Trice
32 Earl Bostick Jr.
33 Rejzohn Wright
34 Dante Stills
35 Matthew Bergeron
36 Mazi Smith
37 Paris Johnson Jr.
38 Rashee Rice
39 Riley Moss
40 Jalin Hyatt
41 Darnell Wright
42 Malik Cunningham
43 Nick Herbig
44 Charlie Thomas
45 Clayton Tune
46 Andre Carter II
47 Byron Young
48 Darrell Luter Jr.
49 Jonathan Mingo
50 Ali Gaye
51 Calijah Kancey
52 A.T. Perry
53 Carter Warren
54 Will Anderson Jr.
55 Chamarri Conner
56 Tyreque Jones
57 Trenton Simpson
58 Demario Douglas


In [19]:
predicted_labels = (baseline_preds[:, 1] > 0.5).astype(int)


In [20]:
predicted_labels

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [21]:

# Evaluation for ranking metrics
# Sort the predictions based on probability scores
sorted_indices = np.argsort(-preds[:, 1])

# Calculate all measurements
best_rf_measurements = [
    ("Accuracy", accuracy_score(test_y, predicted_labels)),
    ("ROC AUC Score", roc_auc_score(test_y, baseline_preds[:, 1])),
    ("Mean Reciprocal Rank (MRR)", calculate_MRR(sorted_indices, test_y)),
    ("Mean Average Precision (MAP)", calculate_MAP(sorted_indices, test_y)),
    ("Normalized Discounted Cumulative Gain (NDCG) at k=10", calculate_NDCG(sorted_indices, test_y)),
    ("Precision at k (P@k) at k=10", calculate_PAK(sorted_indices, test_y)),
    ("Recall at k (R@k) at k=10", calculate_RAK(sorted_indices, test_y))
]

# Print measurements in a table format
print("Best Fit Random Forest measurements")
print(tabulate(best_rf_measurements, headers=["Metric", "Value"]))

Best Fit Random Forest measurements
Metric                                                   Value
----------------------------------------------------  --------
Accuracy                                              0.901408
ROC AUC Score                                         0.659905
Mean Reciprocal Rank (MRR)                            1
Mean Average Precision (MAP)                          0.110052
Normalized Discounted Cumulative Gain (NDCG) at k=10  0.538886
Precision at k (P@k) at k=10                          0.4
Recall at k (R@k) at k=10                             0.137931


# Comparative Analysis of Baseline and Best-Fit Random Forest Models for Ranking Prediction

The comparison below covers the baseline and best-fit Random Forest models on the 2023 test data, metric by metric:

1. **Accuracy:**
   - Baseline: 0.619718
   - Best Fit Random Forest: 0.901408
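   - Interpretation: The best-fit model's accuracy is much higher than the baseline's, but accuracy is dominated by class imbalance here: at the 0.5 threshold the best-fit model predicts class 1 for only one test player, so most of its accuracy comes from predicting the majority class. <br>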



2. **ROC AUC Score:**
   - Baseline: 0.696552
   - Best Fit Random Forest: 0.659905
   - Interpretation: The baseline model has the higher ROC AUC score, which summarizes how well the scores separate positive from negative instances over the whole list rather than at the top. Note that the printed baseline probabilities take only two distinct values, so the baseline ranking contains large groups of ties. <br>



3. **Mean Reciprocal Rank (MRR):**
   - Baseline: 0.125
   - Best Fit Random Forest: 1
   - Interpretation: A reciprocal rank of 1 means that the best-fit model's top-ranked player is an actual first-round pick, whereas the baseline's first correct player appears at rank 8 (1/8 = 0.125). With a single ranked list, this metric reflects only the position of the first hit. <br>



4. **Mean Average Precision (MAP):**
   - Baseline: 0.115021
   - Best Fit Random Forest: 0.110052
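   - Interpretation: The best-fit Random Forest model achieves a slightly lower value than the baseline (0.110 vs. 0.115). These numbers should be read with caution: calculate_MAP accumulates hits with test_y.iloc[:idx + 1], i.e. in the original row order rather than in ranked order. <br>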



5. **Normalized Discounted Cumulative Gain (NDCG) at k=10:**
   - Baseline: 0.0694312
   - Best Fit Random Forest: 0.538886
   - Interpretation: The best-fit Random Forest model achieves a significantly higher NDCG score at k=10 compared to the baseline model. This indicates that the best-fit model provides better ranking quality for the top 10 predictions. <br>



6. **Precision at k (P@k) at k=10:**
   - Baseline: 0.1
   - Best Fit Random Forest: 0.4
   - Interpretation: Four of the best-fit model's top 10 players are actual first-round picks, compared with one of the baseline's top 10. <br>



7. **Recall at k (R@k) at k=10:**
   - Baseline: 0.0344828
   - Best Fit Random Forest: 0.137931
   - Interpretation: The best-fit model's top 10 contains a larger share of all actual first-round picks than the baseline's top 10. Recall at k=10 is bounded above by 10 divided by the number of positives, so low absolute values are expected here.<br>



Overall, the best-fit Random Forest model improves on five of the seven metrics compared to the baseline model (accuracy, mean reciprocal rank, NDCG, precision at k, and recall at k), while ROC AUC and MAP are slightly lower.
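Two reference points help put the accuracy and recall figures in context. The arithmetic below is a sketch based on values inferred from the outputs above, not taken from the data directly: 284 test rows (the length of the printed label arrays) and 29 positives (implied by R@10 = 0.137931 with 4 hits in the top 10).

```python
n_test = 284       # inferred: length of the predicted_labels arrays printed above
n_positive = 29    # inferred: 4 hits in the top 10 and R@10 = 0.137931 give 4 / 29
k = 10

# Accuracy of a trivial model that always predicts "not first round".
majority_accuracy = (n_test - n_positive) / n_test

# Best achievable recall at k: every one of the top k players is a positive.
max_recall_at_k = min(k, n_positive) / n_positive

print(round(majority_accuracy, 6), round(max_recall_at_k, 6))
```

Under these assumptions, always predicting the majority class would already reach an accuracy of about 0.898, and recall at k=10 cannot exceed about 0.345.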
