# ⚖️ Gatekeeper CTF Challenge, Level 5: The Weight of Truth

## The Story So Far

The reformed thief, now a respected data engineer at the bank, has proven himself time and again. After he fixed the broken Sentinel in Level 4, the bank's leadership noticed his rare talent for understanding systems from the inside out.

One evening, the Chief Data Officer called him into a private meeting room.

*"We have a situation,"* she said, sliding a folder across the table. *"Our core prediction engine, the one that drives all downstream decisions, was built by a mathematician who left the company years ago. He didn't document anything. No saved model, no training logs, no notes. All we have is the data he used."*

She paused.

*"But here's the thing. We know the model was a perfect linear system. Zero residual error. Every prediction matched the output exactly. We just don't know the exact weights he used."*

She leaned forward.

*"I need you to recover those weights. Look at the data. Use whatever tools, techniques, or methods you want: code, math, ML libraries, brute force, anything. Just find the exact weights that produce a perfect fit."*

---

## 🎯 Your Mission

The original model is a **linear regression**: it takes input features and produces an output using a set of **weights (coefficients)** and a **bias (intercept)**.

Your task is to **find the exact weights and bias** that the original model used.

### 📌 Key Facts
- The dataset contains **input features** and a **target output** (`y`)
- The original model achieved **absolute 0 residual error** and an **R² score of exactly 1.0**, a mathematically perfect fit
- This means there is an **exact, unique set of weights** that perfectly maps inputs to outputs (unique as long as the feature columns and the constant term are linearly independent)
- You may use **any method** you want: mathematical reasoning, coding, ML libraries, visualisation, trial and error, or whatever else works
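
One way to read the facts above: with two features (the later cells call them `X1` and `X2`), a zero-residual model satisfies `y = w1*X1 + w2*X2 + b` on every row, so three suitably independent rows give three equations in three unknowns. The sketch below solves such a system with exact rational arithmetic. The helper name `solve_three_rows`, the rows, and the weights are all made up for illustration and are not taken from `dataset.csv`.

```python
from fractions import Fraction

def solve_three_rows(rows):
    """Solve w1*x1 + w2*x2 + b = y exactly from three (x1, x2, y) rows.

    Uses Gauss-Jordan elimination over Fractions, so no rounding occurs.
    """
    # Augmented matrix: [x1, x2, 1 | y] for each row
    m = [[Fraction(x1), Fraction(x2), Fraction(1), Fraction(y)] for x1, x2, y in rows]
    n = 3
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("rows do not determine a unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot entry becomes 1
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[col])]
    return m[0][3], m[1][3], m[2][3]

# Hypothetical rows generated from w1=2, w2=-3, b=5 (illustration only)
example_rows = [(1, 0, 7), (0, 1, 2), (2, 2, 3)]
w1, w2, b = solve_three_rows(example_rows)
print(w1, w2, b)
```

If the three chosen rows are degenerate (for example, all sharing the same `X1`), the helper raises instead of returning a misleading answer, and a different trio of rows would be needed.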

### 🔑 Submission
Once you determine the weights, submit them using the **submission cell** at the bottom of this notebook.

The evaluation server will verify your weights against the original dataset. The flag is revealed **only** if:
- **R² = 1.0** (absolute, not 0.999...)
- **Residual error = 0** (absolute zero)
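
To get a feel for how strict these two conditions are, the sketch below compares exact weights with weights that are off by `1e-9`. The helper `max_abs_residual` and the data are hypothetical, and how the server computes its metrics is not documented here. A tiny error like this can still leave a computed R² that rounds to 1.0, which may be why the residual is checked as well.

```python
def max_abs_residual(rows, w1, w2, b):
    """Largest |y - (w1*x1 + w2*x2 + b)| over (x1, x2, y) rows."""
    return max(abs(y - (w1 * x1 + w2 * x2 + b)) for x1, x2, y in rows)

# Hypothetical data generated from w1=2, w2=-3, b=5 (illustration only)
rows = [(1, 0, 7), (0, 1, 2), (2, 2, 3), (4, 1, 10)]

exact = max_abs_residual(rows, 2.0, -3.0, 5.0)
nearly = max_abs_residual(rows, 2.0 + 1e-9, -3.0, 5.0)

print(exact == 0.0)   # exact weights leave no residual on this data
print(nearly == 0.0)  # a 1e-9 error in one weight already leaves a residual
```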

---

## 📦 Step 1: Load and Explore the Dataset

Start by loading the dataset and understanding its structure. Look at the features and the target variable.

In [None]:
import pandas as pd
import numpy as np

# Load the dataset
df = pd.read_csv('dataset.csv')
print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print(f"\n--- First 10 rows ---")
df.head(10)

In [None]:
# Basic statistics
df.describe()

## 🔍 Step 2: Analyse the Data

Study the relationships between the input features and the target variable. Use any technique (correlations, scatter plots, pattern recognition, or anything else) that helps you understand how `y` is derived from `X1` and `X2`.
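
One pattern that may be worth checking, assuming the dataset happens to contain such rows: two rows that differ in only one feature isolate that feature's weight, because the change in `y` equals the weight times the change in that feature. A minimal sketch with a hypothetical helper, `slopes_from_pairs`, and made-up rows:

```python
from fractions import Fraction
from itertools import combinations

def slopes_from_pairs(rows):
    """Collect weight estimates from row pairs that differ in exactly one feature.

    rows: iterable of (x1, x2, y). Returns (w1_estimates, w2_estimates) as sets.
    """
    w1_est, w2_est = set(), set()
    for (a1, a2, ay), (b1, b2, by) in combinations(rows, 2):
        if a2 == b2 and a1 != b1:
            # Only X1 changed, so the change in y is w1 * (change in X1)
            w1_est.add(Fraction(by - ay) / Fraction(b1 - a1))
        elif a1 == b1 and a2 != b2:
            # Only X2 changed, so the change in y is w2 * (change in X2)
            w2_est.add(Fraction(by - ay) / Fraction(b2 - a2))
    return w1_est, w2_est

# Hypothetical rows generated from w1=2, w2=-3, b=5 (illustration only)
rows = [(1, 0, 7), (3, 0, 11), (3, 4, -1), (0, 4, -7)]
w1_est, w2_est = slopes_from_pairs(rows)
print(w1_est, w2_est)
```

If every qualifying pair yields the same value, each set holds a single element, which is consistent with a purely linear relationship. More than one value in a set would suggest noise or a non-linear term.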

In [None]:
# Explore the data using any approach you like
# Some ideas: correlations, scatter plots, checking special rows, etc.

# Correlation matrix
print("Correlations:")
print(df.corr())

# Scatter plots
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].scatter(df['X1'], df['y'], color='steelblue', edgecolor='black')
axes[0].set_xlabel('X1'); axes[0].set_ylabel('y'); axes[0].set_title('X1 vs y')
axes[0].grid(True, alpha=0.3)

axes[1].scatter(df['X2'], df['y'], color='coral', edgecolor='black')
axes[1].set_xlabel('X2'); axes[1].set_ylabel('y'); axes[1].set_title('X2 vs y')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 🧩 Step 3: Find the Weights

Use whatever method or technique you prefer to determine the model's weights (coefficients) and bias (intercept). Write your solution below.
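
A numerical fit can return weights that are close to, but not exactly, round numbers. Under the assumption that the original weights are simple values (integers or short fractions, which the challenge does not state), one option is to snap each estimate to the nearest fraction with a small denominator. The helper `snap` and the estimates below are hypothetical:

```python
from fractions import Fraction

def snap(value, max_denominator=100):
    """Return the closest fraction to `value` whose denominator is at most `max_denominator`."""
    return Fraction(value).limit_denominator(max_denominator)

# Hypothetical noisy estimates, e.g. from a numerical least-squares fit
estimates = {"w1": 1.9999999999998, "w2": -3.0000000000004, "b": 5.25000000000001}
snapped = {name: snap(v) for name, v in estimates.items()}

for name, frac in snapped.items():
    print(name, frac, float(frac))
```

Snapped candidates are only guesses until they pass the Step 4 verification. If they do, one possible route is to assign the `float(frac)` values to a fitted scikit-learn model's `coef_` and `intercept_` attributes before submitting.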

In [None]:
# ============================================================
# YOUR SOLUTION HERE
# ============================================================
# Build a linear regression model that perfectly fits the data.
# Use any method you want: sklearn, numpy, manual math, etc.
#
# Your model must achieve:
#   - R² = 1.0 (absolute)
#   - Residual error = 0 (absolute)
#
# Example using sklearn:
# from sklearn.linear_model import LinearRegression
# model = LinearRegression()
# model.fit(df[['X1', 'X2']], df['y'])

# --- Write your solution below ---

model = None  # Replace with your trained model

## ✅ Step 4: Verify Your Weights

If your weights are correct, the model should produce **zero residual error** across every row and an **R² score of exactly 1.0**.

Run the verification cell below to check.

In [None]:
from sklearn.metrics import r2_score, mean_squared_error

if model is None:
    print("❌ You haven't built your model yet! Go back to Step 3.")
else:
    # Predict using your model
    y_pred = model.predict(df[['X1', 'X2']])
    
    # Calculate residuals
    residuals = df['y'].values - y_pred
    
    # Metrics
    r2 = r2_score(df['y'], y_pred)
    mse = mean_squared_error(df['y'], y_pred)
    
    print("=" * 50)
    print("        MODEL VERIFICATION REPORT")
    print("=" * 50)
    print(f"  R² Score        : {r2}")
    print(f"  MSE (residual)  : {mse}")
    print(f"  Max |residual|  : {max(abs(residuals))}")
    print("=" * 50)
    
    # Show model weights if available
    if hasattr(model, 'coef_'):
        print(f"\n  Coefficients    : {model.coef_}")
    if hasattr(model, 'intercept_'):
        print(f"  Intercept       : {model.intercept_}")
    
    if r2 == 1.0 and mse == 0.0:
        print("\n🎉 PERFECT FIT! Your model has recovered the exact weights!")
        print("   Proceed to Step 5 to submit your model.")
    else:
        print("\n❌ Not a perfect fit. Keep working on your model.")
        print("\nResiduals per row:")
        result_df = df[['X1', 'X2', 'y']].copy()
        result_df['y_pred'] = y_pred
        result_df['residual'] = residuals
        print(result_df)

## 📊 Optional: Visualise the Fit

Plot the predicted values against the actual values. A perfect model will show all points lying exactly on the diagonal line $y = x$.

In [None]:
import matplotlib.pyplot as plt

# Recompute predictions and residuals here: df has no 'y_pred' or
# 'residual' columns, so they cannot be read from the DataFrame.
y_true = df['y'].values
y_pred = model.predict(df[['X1', 'X2']])
residuals = y_true - y_pred

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Actual vs Predicted
axes[0].scatter(y_true, y_pred, color='steelblue', edgecolor='black', s=80, zorder=5)
min_val = min(y_true.min(), y_pred.min()) - 2
max_val = max(y_true.max(), y_pred.max()) + 2
axes[0].plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=1.5, label='Perfect fit (y=x)')
axes[0].set_xlabel('Actual y', fontsize=12)
axes[0].set_ylabel('Predicted y', fontsize=12)
axes[0].set_title('Actual vs Predicted', fontsize=14)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Residuals
axes[1].bar(range(len(df)), residuals, color='coral', edgecolor='black')
axes[1].axhline(y=0, color='black', linewidth=1)
axes[1].set_xlabel('Data Point Index', fontsize=12)
axes[1].set_ylabel('Residual (y - y_pred)', fontsize=12)
axes[1].set_title('Residuals', fontsize=14)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## 📤 Step 5: Submit Your Model

Run the cell below to save your model as a `.pkl` file and submit it to the evaluation server.

The server will load your model, run it on the original dataset, and check the results.

**The flag is revealed only if R² = 1.0 (absolute) and residual error = 0 (absolute).**

In [None]:
import requests
import joblib
import os

# ============================================================
# SUBMISSION: saves your model and uploads it to the server
# ============================================================

# Set the evaluation server URL (change if running elsewhere)
EVAL_URL = "http://localhost:5000/challenge-5"
USERNAME   = ""  # Enter your CTF platform username here
PASSWORD   = ""  # Enter your CTF platform password here
MODEL_PATH = "model.pkl"

if model is None:
    print("❌ You haven't built your model yet! Go back to Step 3.")
else:
    # Save the model to a .pkl file
    joblib.dump(model, MODEL_PATH)
    print(f"✅ Model saved to {MODEL_PATH}")

    print("\n" + "=" * 50)
    print("   SUBMITTING MODEL TO EVALUATION SERVER")
    print("=" * 50)

    try:
        with open(MODEL_PATH, "rb") as f:
            response = requests.post(
                EVAL_URL,
                files={"file": ("model.pkl", f, "application/octet-stream")},
                data={"username": USERNAME, "password": PASSWORD},
                timeout=30
            )
        result = response.json()

        print(f"\n📊 Result: {result['result'].upper()}")
        print(f"   {result['message']}")

        if result.get('r2_score') is not None:
            print(f"\n   R² Score       : {result['r2_score']}")
        if result.get('max_abs_error') is not None:
            print(f"   Max |Error|    : {result['max_abs_error']}")
        if result.get('mse') is not None:
            print(f"   MSE            : {result['mse']}")
        if result.get('coefficients') is not None:
            print(f"   Coefficients   : {result['coefficients']}")
        if result.get('intercept') is not None:
            print(f"   Intercept      : {result['intercept']}")

        if result.get('flag'):
            print(f"\n🏴 FLAG: {result['flag']}")

    except requests.exceptions.ConnectionError:
        print(f"\n❌ Could not connect to the evaluation server at {EVAL_URL}")
        print("   Make sure the server is running.")
    except Exception as e:
        print(f"\n❌ Error: {e}")