# Experimental Setup & Evaluation (Unsupervised Detection) - MovieLens 1M

This notebook consolidates the experimental setup and evaluation protocol for unsupervised detection models.
It assumes the following artifacts already exist:

- `movielens1m_user_features.csv` (user-level features)
- `movielens1m_unsupervised_scores.csv` (model anomaly scores + ensemble score)

Main goals:
1) Define a consistent thresholding protocol for suspicious users (top-p% / percentile).
2) Provide label-free evaluation signals (stability, agreement, score distribution).
3) Produce paper-ready tables and optional qualitative audits using the original ratings.


In [1]:
import os
import numpy as np
import pandas as pd

ROOT = os.getcwd()
FEATURES_PATH = os.path.join(ROOT, "movielens1m_user_features.csv")
SCORES_PATH   = os.path.join(ROOT, "movielens1m_unsupervised_scores.csv")

# Optional: original MovieLens files for qualitative audit
RATINGS_PATH = os.path.join(ROOT, "ml-1m", "ratings.dat")

pd.set_option("display.max_columns", 200)


Load features + scores

In [2]:
features = pd.read_csv(FEATURES_PATH)
scores   = pd.read_csv(SCORES_PATH)

assert "user_id" in features.columns
assert "user_id" in scores.columns
assert "ensemble_score" in scores.columns

print("features:", features.shape)
print("scores  :", scores.shape)

scores.head()


features: (6040, 21)
scores  : (6040, 8)


Unnamed: 0,user_id,iso_score,svm_score,ae_score,iso_score_n,svm_score_n,ae_score_n,ensemble_score
0,4486,0.673695,1.648691,0.594374,1.0,1.0,1.0,1.0
1,3598,0.616883,1.648691,0.457052,0.819323,1.0,0.768188,0.862504
2,4463,0.604229,1.179753,0.427355,0.77908,0.86898,0.718057,0.788706
3,46,0.598529,1.011191,0.399165,0.760954,0.821884,0.67047,0.751103
4,164,0.668234,0.657408,0.307285,0.982632,0.723038,0.515367,0.740346
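
The five rows shown above are consistent with `ensemble_score` being the unweighted mean of the three normalized scores (for example, (0.819323 + 1.0 + 0.768188) / 3 ≈ 0.862504). This is an inference from the displayed output, not something the scores file documents. A minimal sketch of a sanity check follows; the helper name `ensemble_matches_mean` and the tolerance are assumptions.

```python
import numpy as np
import pandas as pd

# Rows copied from the scores.head() output above (first three users only).
toy = pd.DataFrame({
    "user_id": [4486, 3598, 4463],
    "iso_score_n": [1.0, 0.819323, 0.77908],
    "svm_score_n": [1.0, 1.0, 0.86898],
    "ae_score_n": [1.0, 0.768188, 0.718057],
    "ensemble_score": [1.0, 0.862504, 0.788706],
})

NORM_COLS = ["iso_score_n", "svm_score_n", "ae_score_n"]


def ensemble_matches_mean(frame: pd.DataFrame, tol: float = 1e-5) -> bool:
    """Hypothetical check: is ensemble_score the row mean of the normalized scores?"""
    row_mean = frame[NORM_COLS].mean(axis=1)
    # The tolerance absorbs the 6-decimal rounding of the displayed values.
    return bool(np.allclose(row_mean, frame["ensemble_score"], atol=tol))


print(ensemble_matches_mean(toy))
```

Running the same check on the full `scores` frame would confirm or refute the assumption for all 6,040 users.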


Merge features + scores (for analysis)

In [3]:
df = scores.merge(features, on="user_id", how="left")
print("merged df:", df.shape)
df.head()


merged df: (6040, 28)


Unnamed: 0,user_id,iso_score,svm_score,ae_score,iso_score_n,svm_score_n,ae_score_n,ensemble_score,num_ratings,mean_rating,std_rating,min_rating,max_rating,entropy_rating,ratio_1,ratio_5,extreme_ratio,mean_abs_dev,delta_mean_s,delta_std_s,profile_span_s,ratings_per_day,burst_ratio_10min,index,mean_item_pop,std_item_pop,min_item_pop,max_item_pop
0,4486,0.673695,1.648691,0.594374,1.0,1.0,1.0,1.0,36,1.083333,0.363242,1.0,3.0,0.365099,0.944444,0.0,0.944444,0.157407,5.057143,10.45519,177.0,17572.881356,0.972222,4485,513.416667,787.70794,23,2883
1,3598,0.616883,1.648691,0.457052,0.819323,1.0,0.768188,0.862504,60,1.016667,0.128019,1.0,2.0,0.122292,0.983333,0.0,0.983333,0.032778,14.79661,30.2928,873.0,5938.14433,0.983333,3597,719.083333,788.694941,28,2990
2,4463,0.604229,1.179753,0.427355,0.77908,0.86898,0.718057,0.788706,20,3.95,0.920598,2.0,5.0,1.782229,0.0,0.3,0.3,0.675,3487618.0,14796390.0,66264734.0,0.026077,0.9,4462,945.0,807.48226,66,2653
3,46,0.598529,1.011191,0.399165,0.760954,0.821884,0.67047,0.751103,38,4.368421,1.458569,1.0,5.0,0.629249,0.157895,0.842105,1.0,1.063712,17.24324,31.33895,638.0,5146.081505,0.973684,45,382.342105,416.07725,26,2250
4,164,0.668234,0.657408,0.307285,0.982632,0.723038,0.515367,0.740346,26,4.384615,0.624926,3.0,5.0,1.31432,0.0,0.461538,0.461538,0.568047,2145513.0,8082084.0,53637824.0,0.041881,0.846154,163,739.0,655.382918,63,2269
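
The merged shape (6040, 28) matches 8 + 21 - 1 columns, which suggests a clean one-to-one join. A hedged sketch of how that could be enforced rather than inferred uses the `validate` and `indicator` parameters of `DataFrame.merge`. The helper name `checked_merge` and the toy frames are assumptions for illustration.

```python
import pandas as pd


def checked_merge(scores: pd.DataFrame, features: pd.DataFrame):
    """Left-join on user_id, failing on duplicate keys and counting unmatched rows."""
    merged = scores.merge(
        features,
        on="user_id",
        how="left",
        validate="one_to_one",  # raises pandas.errors.MergeError on duplicate user_id
        indicator=True,         # adds a _merge column: "both" or "left_only"
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), unmatched


# Toy frames (not MovieLens data): user 3 has a score but no feature row.
scores_toy = pd.DataFrame({"user_id": [1, 2, 3], "ensemble_score": [0.9, 0.5, 0.1]})
features_toy = pd.DataFrame({"user_id": [1, 2], "num_ratings": [36, 60]})

merged_toy, n_unmatched = checked_merge(scores_toy, features_toy)
print(merged_toy.shape, n_unmatched)
```

A nonzero unmatched count would mean some scored users have all-NaN feature columns in the merged frame.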


### Thresholding protocol (paper-ready)
Helper: select top-p% and thresholds

In [4]:
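def select_top_percent(df: pd.DataFrame, score_col: str, p: float) -> tuple[pd.DataFrame, float]:
    """Return rows with score >= the (100 - p)th percentile, plus that threshold."""
    assert 0 < p < 100, "p must be a percentage strictly between 0 and 100"
    # np.percentile interpolates linearly, so thr may fall between two observed scores.
    thr = np.percentile(df[score_col].values, 100 - p)
    out = df[df[score_col] >= thr].copy()
    out = out.sort_values(score_col, ascending=False).reset_index(drop=True)
    return out, float(thr)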

def threshold_summary(df: pd.DataFrame, score_col: str, ps=(1,2,5,10)):
    rows = []
    for p in ps:
        top, thr = select_top_percent(df, score_col, p)
        rows.append({
            "score_col": score_col,
            "p_top_percent": p,
            "threshold": thr,
            "num_flagged": len(top)
        })
    return pd.DataFrame(rows)

threshold_summary(df, "ensemble_score", ps=(1,2,5,10))


Unnamed: 0,score_col,p_top_percent,threshold,num_flagged
0,ensemble_score,1,0.461688,61
1,ensemble_score,2,0.414934,121
2,ensemble_score,5,0.347694,302
3,ensemble_score,10,0.293319,604
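
The flagged counts above (61, 121, 302, 604) equal ceil(6040 × p / 100), because the percentile cut combined with `>=` keeps every score at or above the interpolated threshold. With tied scores at the cut, the percentile rule can flag more than p% of users. A minimal sketch of a rank-based alternative that always flags exactly ceil(n × p / 100) rows follows; `select_top_count` is a hypothetical helper, not part of the notebook.

```python
import math

import numpy as np
import pandas as pd


def select_top_count(frame: pd.DataFrame, score_col: str, p: float):
    """Hypothetical variant: flag exactly ceil(n * p / 100) rows by rank."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    k = math.ceil(len(frame) * p / 100)
    # nlargest keeps the first k rows in score order; ties at the cut are broken by position.
    top = frame.nlargest(k, score_col).reset_index(drop=True)
    return top, float(top[score_col].iloc[-1])


# Toy scores with a three-way tie at the top (not MovieLens data).
toy = pd.DataFrame({
    "user_id": range(1, 11),
    "score": [0.9, 0.9, 0.9, 0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.05],
})

top, cut = select_top_count(toy, "score", p=20)

# Same toy data under the percentile rule, for contrast.
pct_thr = np.percentile(toy["score"].values, 80)
n_percentile = int((toy["score"] >= pct_thr).sum())
print(len(top), cut, n_percentile)
```

Which rule to report is a design choice: the percentile rule treats tied users identically, while the rank rule keeps the flagged count fixed.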


Create suspicious sets for multiple p values

In [5]:
P_LIST = [1, 2, 5]  # common choices for paper reporting

suspicious = {}
thresholds = {}

for p in P_LIST:
    top, thr = select_top_percent(df, "ensemble_score", p)
    suspicious[p] = top
    thresholds[p] = thr

for p in P_LIST:
    print(f"Top-{p}% | threshold={thresholds[p]:.6f} | flagged={len(suspicious[p])}")


Top-1% | threshold=0.461688 | flagged=61
Top-2% | threshold=0.414934 | flagged=121
Top-5% | threshold=0.347694 | flagged=302


Paper-ready table: top-20 suspicious users

In [6]:
P_FOR_TABLE = 2  # choose a single p to report in the main table (common: 2% or 5%)
TOP_N = 20

cols = ["user_id", "ensemble_score", "iso_score_n", "svm_score_n", "ae_score_n",
        "num_ratings", "mean_rating", "std_rating", "extreme_ratio", "burst_ratio_10min", "ratings_per_day"]

table_top = suspicious[P_FOR_TABLE][cols].head(TOP_N).copy()
table_top


Unnamed: 0,user_id,ensemble_score,iso_score_n,svm_score_n,ae_score_n,num_ratings,mean_rating,std_rating,extreme_ratio,burst_ratio_10min,ratings_per_day
0,4486,1.0,1.0,1.0,1.0,36,1.083333,0.363242,0.944444,0.972222,17572.881356
1,3598,0.862504,0.819323,1.0,0.768188,60,1.016667,0.128019,0.983333,0.983333,5938.14433
2,4463,0.788706,0.77908,0.86898,0.718057,20,3.95,0.920598,0.3,0.9,0.026077
3,46,0.751103,0.760954,0.821884,0.67047,38,4.368421,1.458569,1.0,0.973684,5146.081505
4,164,0.740346,0.982632,0.723038,0.515367,26,4.384615,0.624926,0.461538,0.846154,0.041881
5,5635,0.689863,0.781946,0.871207,0.416437,24,3.583333,0.953794,0.208333,0.875,0.024522
6,5411,0.673361,0.873057,0.802114,0.344913,23,2.826087,1.493527,0.478261,0.913043,0.045813
7,89,0.672197,0.83625,0.765165,0.415177,21,3.238095,1.659167,0.619048,0.857143,0.111451
8,2744,0.670874,0.676869,0.960783,0.374969,137,1.306569,1.050284,0.992701,0.992701,8096.30643
9,195,0.666262,0.54845,0.913641,0.536694,804,3.883085,1.015434,0.34204,0.449005,1.016332
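
The optional qualitative audit against the original ratings is not shown in this notebook. A minimal sketch of what it might look like follows, assuming the standard MovieLens 1M `ratings.dat` layout (`UserID::MovieID::Rating::Timestamp`, no header row). The helper names `load_ratings` and `audit_user`, the column names, and the in-memory demo rows are assumptions.

```python
import io

import pandas as pd

RATING_COLS = ["user_id", "movie_id", "rating", "timestamp"]


def load_ratings(path_or_buffer) -> pd.DataFrame:
    """Read a '::'-delimited ratings file; multi-character separators need the python engine."""
    return pd.read_csv(path_or_buffer, sep="::", engine="python",
                       header=None, names=RATING_COLS)


def audit_user(ratings: pd.DataFrame, user_id: int) -> dict:
    """Summarize one user's raw ratings: count, rating histogram, active time span."""
    r = ratings[ratings["user_id"] == user_id]
    counts = r["rating"].value_counts().sort_index()
    return {
        "num_ratings": int(len(r)),
        "rating_counts": {int(k): int(v) for k, v in counts.items()},
        "span_seconds": int(r["timestamp"].max() - r["timestamp"].min()) if len(r) else 0,
    }


# Synthetic rows in the same format (not real MovieLens records).
demo = io.StringIO("1::10::1::100\n1::11::1::160\n2::10::5::500\n")
ratings = load_ratings(demo)
print(audit_user(ratings, 1))
```

In the notebook, `load_ratings(RATINGS_PATH)` followed by `audit_user(ratings, uid)` for each `user_id` in `table_top` would give a per-user view to read alongside the aggregate features.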


### Label-free evaluation signals
Agreement across models (Top-K overlap)

In [7]:
def topk_set(df: pd.DataFrame, col: str, k: int) -> set:
    return set(df.sort_values(col, ascending=False).head(k)["user_id"].astype(int).tolist())

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if len(a | b) else 0.0

K = 50
top_iso = topk_set(df, "iso_score_n", K)
top_svm = topk_set(df, "svm_score_n", K)
top_ae  = topk_set(df, "ae_score_n",  K)
top_ens = topk_set(df, "ensemble_score", K)

agreement = pd.DataFrame({
    "pair": ["ISO vs SVM", "ISO vs AE", "SVM vs AE", "ENS vs ISO", "ENS vs SVM", "ENS vs AE"],
    "jaccard_topK": [
        jaccard(top_iso, top_svm),
        jaccard(top_iso, top_ae),
        jaccard(top_svm, top_ae),
        jaccard(top_ens, top_iso),
        jaccard(top_ens, top_svm),
        jaccard(top_ens, top_ae),
    ]
})
agreement


Unnamed: 0,pair,jaccard_topK
0,ISO vs SVM,0.282051
1,ISO vs AE,0.162791
2,SVM vs AE,0.282051
3,ENS vs ISO,0.470588
4,ENS vs SVM,0.449275
5,ENS vs AE,0.408451
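
To help read these values: for two sets of equal size K with m shared users, the Jaccard index is m / (2K - m), so each value maps back to an overlap count. Two top-K sets drawn independently at random from N users would share K² / N users on average. The sketch below works through that arithmetic; both function names are hypothetical.

```python
def overlap_from_jaccard(j: float, k: int) -> int:
    """Invert J = m / (2k - m) for two sets of size k: m = 2k * J / (1 + J)."""
    return round(2 * k * j / (1 + j))


def chance_overlap(k: int, n: int) -> float:
    """Expected shared items between two independent random k-subsets of n items."""
    return k * k / n


K, N = 50, 6040
iso_vs_svm = overlap_from_jaccard(0.282051, K)  # ISO vs SVM from the table above
iso_vs_ae = overlap_from_jaccard(0.162791, K)   # ISO vs AE from the table above
baseline = chance_overlap(K, N)
print(iso_vs_svm, iso_vs_ae, round(baseline, 3))
```

Under these assumptions, even the weakest pair (ISO vs AE) shares far more users than the random baseline of under one user.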


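Stability across p (overlap between Top-1%, Top-2%, Top-5%)

Note: all three sets are cut from the same `ensemble_score` column, so they are nested by construction. The overlap count always equals the size of the smaller set, and the Jaccard value reduces to the ratio of set sizes (for example, 61 / 121 ≈ 0.504).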

In [8]:
def set_for_p(p: int) -> set:
    return set(suspicious[p]["user_id"].astype(int).tolist())

sets = {p: set_for_p(p) for p in P_LIST}

rows = []
for i, p1 in enumerate(P_LIST):
    for p2 in P_LIST[i+1:]:
        rows.append({
            "p1": p1,
            "p2": p2,
            "overlap_count": len(sets[p1] & sets[p2]),
            "jaccard": jaccard(sets[p1], sets[p2])
        })

pd.DataFrame(rows)


Unnamed: 0,p1,p2,overlap_count,jaccard
0,1,2,61,0.504132
1,1,5,61,0.201987
2,2,5,121,0.400662
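
Because the top-p% sets share one score column, a complementary stability signal is to perturb the ensemble itself. One option, sketched below under the assumption that the ensemble is the unweighted mean of the three normalized scores, is to drop one model at a time and measure how much the flagged set changes. The function names and the random toy data are hypothetical; this is not the notebook's protocol.

```python
import math

import numpy as np
import pandas as pd

NORM_COLS = ["iso_score_n", "svm_score_n", "ae_score_n"]


def top_ids(frame: pd.DataFrame, score: pd.Series, k: int) -> set:
    """User ids of the k highest values of an ad hoc score series."""
    return set(frame.assign(_s=score).nlargest(k, "_s")["user_id"])


def leave_one_out_stability(frame: pd.DataFrame, p: float) -> pd.DataFrame:
    """Jaccard between the full-ensemble top-p% set and each two-model ensemble."""
    k = math.ceil(len(frame) * p / 100)
    full = top_ids(frame, frame[NORM_COLS].mean(axis=1), k)
    rows = []
    for dropped in NORM_COLS:
        kept = [c for c in NORM_COLS if c != dropped]
        reduced = top_ids(frame, frame[kept].mean(axis=1), k)
        rows.append({"dropped": dropped,
                     "jaccard": len(full & reduced) / len(full | reduced)})
    return pd.DataFrame(rows)


# Random toy scores for 200 users (not MovieLens data).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((200, 3)), columns=NORM_COLS)
toy.insert(0, "user_id", np.arange(1, 201))

result = leave_one_out_stability(toy, p=5)
print(result)
```

Values near 1 would indicate that no single model dominates the flagged set; low values would point to one model driving the ensemble.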


Descriptive stats of ensemble scores (paper text support)

In [9]:
desc = df["ensemble_score"].describe(percentiles=[0.90, 0.95, 0.98, 0.99]).to_frame("ensemble_score")
desc

Unnamed: 0,ensemble_score
count,6040.0
mean,0.188484
std,0.082143
min,0.037357
50%,0.170668
90%,0.293319
95%,0.347694
98%,0.414934
99%,0.461688
max,1.0
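
The 90%, 95%, 98% and 99% rows above coincide with the top-10%, 5%, 2% and 1% thresholds reported earlier. This is expected if both computations use linear interpolation, which is the default for `np.percentile` and for `Series.quantile` (and therefore `describe`). A small check on synthetic values, not the MovieLens scores:

```python
import numpy as np
import pandas as pd

# 101 evenly spaced synthetic scores in [0, 1].
x = pd.Series(np.linspace(0.0, 1.0, 101))

# For each top-p%, compare the helper's percentile cut with the matching quantile.
checks = {
    p: bool(np.isclose(np.percentile(x.values, 100 - p), x.quantile(1 - p / 100)))
    for p in (1, 2, 5, 10)
}
print(checks)
```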


### Practical selection of p (how to justify in the paper)
Create a compact threshold table for the paper

In [10]:
paper_thresholds = threshold_summary(df, "ensemble_score", ps=(1,2,5))
paper_thresholds

Unnamed: 0,score_col,p_top_percent,threshold,num_flagged
0,ensemble_score,1,0.461688,61
1,ensemble_score,2,0.414934,121
2,ensemble_score,5,0.347694,302


### Save paper-ready artifacts
Export tables (thresholds + top suspicious)

In [11]:
OUT_THRESHOLDS = os.path.join(ROOT, "paper_thresholds_ensemble.csv")
OUT_TOP_TABLE  = os.path.join(ROOT, f"paper_top{TOP_N}_suspicious_p{P_FOR_TABLE}.csv")
OUT_AGREEMENT  = os.path.join(ROOT, "paper_agreement_topK.csv")

paper_thresholds.to_csv(OUT_THRESHOLDS, index=False)
table_top.to_csv(OUT_TOP_TABLE, index=False)
agreement.to_csv(OUT_AGREEMENT, index=False)

print("Saved:")
print("-", OUT_THRESHOLDS)
print("-", OUT_TOP_TABLE)
print("-", OUT_AGREEMENT)


Saved:
- C:\Users\USUARIO\Desktop\app\paper_thresholds_ensemble.csv
- C:\Users\USUARIO\Desktop\app\paper_top20_suspicious_p2.csv
- C:\Users\USUARIO\Desktop\app\paper_agreement_topK.csv
