These results provide a detailed view of how well your logistic regression model distinguishes between the two classes (0 and 1) on your validation and test data. Here's what the numbers tell you about the experiment:

## 1. **Overall Performance**

- **Accuracy:**  
  - Validation: ~91%  
  - Test: ~85%  
  The model correctly classifies about 91% of validation samples (48 of 53) and 85% of test samples (46 of 54). A drop on unseen test data is common, and with sets this small the gap amounts to only three additional misclassifications (8 errors versus 5).

- **Interpretation:**  
  Accuracy is high on both sets, so most predictions are correct. Because the classes are only mildly imbalanced (30 versus 23 or 24 samples), accuracy is a reasonable headline metric here.
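
Because each evaluation set holds only about 50 samples, the accuracy figures carry noticeable uncertainty. The sketch below computes a 95% Wilson score interval for each accuracy. The correct-prediction counts (48 of 53 and 46 of 54) are inferred from the reported accuracies and supports, and `wilson_interval` is a helper name introduced here for illustration.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Counts inferred from the report: accuracy * support, rounded (an assumption)
val_low, val_high = wilson_interval(48, 53)
test_low, test_high = wilson_interval(46, 54)
print(f"Validation accuracy 95% CI: {val_low:.3f} to {val_high:.3f}")
print(f"Test accuracy 95% CI:       {test_low:.3f} to {test_high:.3f}")
```

The two intervals overlap, which is consistent with reading the validation/test gap as ordinary sampling variation.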

## 2. **Precision (Positive Predictive Value)**

- Validation Class 0: 91%  
- Validation Class 1: 90%  
- Test Class 0: 83%  
- Test Class 1: 87%  

**Meaning:**  
When the model predicts a class, that prediction is correct 83–91% of the time, depending on class and set. Precision is a few points lower on test, meaning a slightly larger share of the predictions for each class are wrong there. The gap is modest and consistent with reasonable generalization.
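
The precision figures can be reproduced from confusion-matrix counts. The notebook does not print these counts; the ones below are reconstructed from the reported recall and support values, so treat them as an inference. `precision_per_class` is an illustrative helper.

```python
# Confusion matrices as [[true0_pred0, true0_pred1], [true1_pred0, true1_pred1]],
# reconstructed from the reported recall and support (an inference, not notebook output)
val_cm = [[20, 3], [2, 28]]
test_cm = [[20, 4], [4, 26]]

def precision_per_class(cm):
    """Precision for class k = correct predictions of k / all predictions of k."""
    return [cm[k][k] / (cm[0][k] + cm[1][k]) for k in range(2)]

for name, cm in [("Validation", val_cm), ("Test", test_cm)]:
    p0, p1 = precision_per_class(cm)
    print(f"{name}: precision class 0 = {p0:.2f}, class 1 = {p1:.2f}")
```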

## 3. **Recall (Sensitivity or True Positive Rate)**

- Validation Class 0: 87%  
- Validation Class 1: 93%  
- Test Class 0: 83%  
- Test Class 1: 87%  

**Meaning:**  
Recall is computed per class: the model recovers 83–93% of the samples that truly belong to each class. On validation, recall is higher for Class 1 (93%) than for Class 0 (87%), so fewer Class 1 samples are missed. On test, recall falls for both classes (to 87% and 83%), so the model misses a slightly larger share of each class than on validation.
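
To make the recall percentages concrete, the sketch below converts them into counts of missed samples. The row counts are inferred from the reported recall and support values and are not printed by the notebook; `recall_per_class` is a name chosen for this example.

```python
# Rows are true classes: [predicted as 0, predicted as 1].
# Counts are inferred from the reported recall and support (an assumption).
val_rows = {0: [20, 3], 1: [2, 28]}
test_rows = {0: [20, 4], 1: [4, 26]}

def recall_per_class(rows):
    """Recall for class k = correctly predicted members of k / all true members of k."""
    return {k: counts[k] / sum(counts) for k, counts in rows.items()}

for name, rows in [("Validation", val_rows), ("Test", test_rows)]:
    for k, r in recall_per_class(rows).items():
        missed = sum(rows[k]) - rows[k][k]
        print(f"{name} class {k}: recall = {r:.2f} ({missed} of {sum(rows[k])} missed)")
```

In absolute terms, the test set has only one or two more missed samples per class than validation.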

## 4. **F1-Score (Harmonic Mean of Precision and Recall)**

- Validation Class 0: 89%  
- Validation Class 1: 92%  
- Test Class 0: 83%  
- Test Class 1: 87%  

**Meaning:**  
F1 combines precision and recall into a single number per class. Values above 80% for both classes indicate that the model keeps false positives and false negatives reasonably balanced, with somewhat stronger results on validation.
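
As a small worked example, F1 can be computed either from precision and recall or directly from error counts. The counts used here for validation Class 1 (28 true positives, 3 false positives, 2 false negatives) are inferred from the report, and both function names are illustrative.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def f1_from_counts(tp, fp, fn):
    """Equivalent form: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Validation, Class 1 (counts inferred from the report, an assumption)
exact = f1_from_counts(tp=28, fp=3, fn=2)
from_rounded = f1_from_pr(0.90, 0.93)
print(f"F1 from counts:          {exact:.3f}")
print(f"F1 from rounded P and R: {from_rounded:.3f}")
```

The two results differ slightly because the report rounds precision and recall to two decimals, so recomputing F1 from the printed values will not always match the printed F1.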

## 5. **ROC-AUC (Area under Receiver Operating Characteristic Curve)**

- Validation: 0.9681  
- Test: 0.9417  

**Meaning:**  
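ROC-AUC measures how well the predicted probabilities rank Class 1 samples above Class 0 samples: 1.0 is a perfect ranking and 0.5 is chance level. Values above 0.9 are generally considered excellent, so the model separates the classes well across decision thresholds, not only at the default cut-off.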
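
One way to build intuition for ROC-AUC is its ranking interpretation: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The sketch below implements that definition directly on a four-sample toy example; the data and the `pairwise_auc` name are made up for illustration and are unrelated to the experiment.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: one of the four pairs (0.35 vs 0.4) is ranked the wrong way round
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))  # 0.75
```

Read this way, the test ROC-AUC of 0.9417 says that a randomly chosen Class 1 sample outscores a randomly chosen Class 0 sample about 94% of the time.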

## 6. **PRC-AUC (Area under Precision-Recall Curve)**

- Validation: 0.9748  
- Test: 0.9606  

**Meaning:**  
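PRC-AUC summarizes precision and recall for the positive class (Class 1) across thresholds. The metric is most informative on imbalanced data; here the classes are nearly balanced, so it mainly confirms the ROC-AUC result: the model keeps both precision and recall high as the threshold varies.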
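
Unlike ROC-AUC, PRC-AUC has no fixed chance level: a classifier with no skill scores roughly the share of positives in the data. The sketch below computes that baseline from the support counts in the report so the PRC-AUC values can be read against it; `positive_prevalence` is an illustrative helper.

```python
# Support counts per class, taken from the classification reports
val_support = {0: 23, 1: 30}
test_support = {0: 24, 1: 30}

def positive_prevalence(support):
    """Share of Class 1 samples, the approximate PRC-AUC of a no-skill classifier."""
    return support[1] / sum(support.values())

for name, support, reported in [
    ("Validation", val_support, 0.9748),
    ("Test", test_support, 0.9606),
]:
    baseline = positive_prevalence(support)
    print(f"{name}: no-skill baseline ~{baseline:.3f}, reported PRC-AUC {reported:.4f}")
```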

## **Summary Interpretation**

- **Strong classification model:** Logistic regression separates the classes effectively, with high precision and recall.
- **Generalizes well:** The drop from validation to test is small and expected.
- **Balanced errors:** Comparable precision and recall suggest no strong bias toward false positives or false negatives.
- **Potential improvements:** Test accuracy and recall are below 90%, which leaves room for improvement, possibly through feature engineering, more complex models, or data augmentation.
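
As one possible next step before trying more complex models, the features could be standardized and the model scored with stratified cross-validation, which gives a steadier estimate than a single small validation set. This is a sketch rather than part of the original experiment: it runs on synthetic data from `make_classification` as a stand-in for the real feature matrix, and the sample and feature counts are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real X and y (assumed size: a few hundred rows)
X_demo, y_demo = make_classification(n_samples=265, n_features=10, random_state=42)

# Scaling inside the pipeline is refit on each training fold, avoiding leakage
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X_demo, y_demo, cv=cv, scoring="roc_auc")

print(f"ROC-AUC per fold: {scores.round(3)}")
print(f"Mean +/- std: {scores.mean():.3f} +/- {scores.std():.3f}")
```

To apply this to the experiment, `X_demo` and `y_demo` would be replaced by the training portion of the real data, keeping the test set untouched for a final check.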



In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, classification_report, roc_auc_score,
    precision_recall_curve, auc
)

# === Replace these with your actual raw GitHub URLs ===
# # NON Member: 1
# files_class1 = [
#    X "https://raw.githubusercontent.com/large-lang-model/Mech-MIA/main/data/csv/knowman-book.csv",
#     "https://raw.githubusercontent.com/large-lang-model/Mech-MIA/main/data/csv/replica4.csv"
# ]
# # MEMBER: 0
# files_class0 = [
#     "https://raw.githubusercontent.com/large-lang-model/Mech-MIA/main/data/csv/worlds_facts.csv",
#   X  "https://raw.githubusercontent.com/large-lang-model/Mech-MIA/main/data/csv/real_authers.csv"
# ]



# NON Member: 1
files_class1 = [

    "https://raw.githubusercontent.com/large-lang-model/Mech-MIA/main/data/csv/replica4.csv"
]
# MEMBER: 0
files_class0 = [
    "https://raw.githubusercontent.com/large-lang-model/Mech-MIA/main/data/csv/worlds_facts.csv"

]
# ======================================================

# Load each CSV and attach its class label
def load_labeled(files, label):
    """Read each CSV (skipping malformed rows) and tag every row with `label`."""
    frames = []
    for file in files:
        df = pd.read_csv(
            file,
            engine='python',
            quotechar='"',
            on_bad_lines='skip',
            sep=',',
            encoding='utf-8'
        )
        df["target"] = label
        frames.append(df)
    return frames

dfs = load_labeled(files_class1, 1) + load_labeled(files_class0, 0)

# Combine into one dataframe
data = pd.concat(dfs).reset_index(drop=True)

# Features: keep only numeric columns
feature_cols = data.select_dtypes(include=['number']).columns.tolist()
# 'target' is an integer column, so select_dtypes picks it up; drop it to avoid label leakage
if 'target' in feature_cols:
    feature_cols.remove('target')

X = data[feature_cols]
y = data["target"]

# Train/val/test split (60/20/20)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)

# Train logistic regression
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate
def eval_metrics(y_true, preds, probs):
    """Print the classification report, accuracy, ROC-AUC and PRC-AUC."""
    acc = accuracy_score(y_true, preds)
    roc_auc = roc_auc_score(y_true, probs)
    # PRC-AUC: trapezoidal area under the precision-recall curve for class 1
    prec, recall, _ = precision_recall_curve(y_true, probs)
    prc_auc = auc(recall, prec)
    print(classification_report(y_true, preds))
    print(f"Accuracy: {acc:.4f}")
    print(f"ROC-AUC: {roc_auc:.4f}")
    print(f"PRC-AUC: {prc_auc:.4f}")

val_preds = clf.predict(X_val)
val_probs = clf.predict_proba(X_val)[:, 1]
print("\nValidation set metrics:")
eval_metrics(y_val, val_preds, val_probs)

test_preds = clf.predict(X_test)
test_probs = clf.predict_proba(X_test)[:, 1]
print("\nTest set metrics:")
eval_metrics(y_test, test_preds, test_probs)


Validation set metrics:
              precision    recall  f1-score   support

           0       0.91      0.87      0.89        23
           1       0.90      0.93      0.92        30

    accuracy                           0.91        53
   macro avg       0.91      0.90      0.90        53
weighted avg       0.91      0.91      0.91        53

Accuracy: 0.9057
ROC-AUC: 0.9681
PRC-AUC: 0.9748

Test set metrics:
              precision    recall  f1-score   support

           0       0.83      0.83      0.83        24
           1       0.87      0.87      0.87        30

    accuracy                           0.85        54
   macro avg       0.85      0.85      0.85        54
weighted avg       0.85      0.85      0.85        54

Accuracy: 0.8519
ROC-AUC: 0.9417
PRC-AUC: 0.9606
