# Will You Get Hired? Let's Ask the Data

## Job Placement Prediction | EDA | Statistics | ML

---

**Ah yes, the job hunt.** That beautiful period of life where you question every decision you've ever made while refreshing LinkedIn 47 times a day.

But what if we could predict who gets placed and who doesn't? What actually matters - your grades? Your MBA score? That one internship you barely survived? Your ability to not panic during interviews?

**This dataset has 215 candidates** with everything from their 10th grade scores to their interview performance. Some got placed. Some didn't. Let's figure out why.

**What we're working with:**
- Education history (SSC, HSC, Degree, MBA percentages)
- Work experience & internships
- Skills match, certifications
- Interview scores
- Company tier & competition level

**The goal:** Build a model that predicts placement. And maybe learn what actually matters in this whole hiring game.

Let's go.

---

## Part 1: Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (classification_report, confusion_matrix,
                            accuracy_score, precision_score, recall_score,
                            f1_score, roc_auc_score, roc_curve)

import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Ready to predict some futures.")

Standard toolkit loaded. Nothing fancy - just the weapons every data scientist needs to answer life's big questions. Like "why didn't they call me back?"

---

In [None]:
df = pd.read_csv('Job_Placement_Data_Enhanced.csv')

print(f"Dataset: {df.shape[0]} candidates, {df.shape[1]} columns")
print(f"\nColumns: {list(df.columns)}")

**215 candidates, 20 columns** (19 features plus the `status` target). That's their entire professional existence reduced to 20 numbers and categories. Beautiful and slightly terrifying.

---

In [None]:
df.head(10)

**First look at our candidates.** SSC = 10th grade, HSC = 12th grade. We've got their entire academic journey plus work experience, interview scores, and whether they landed the job. 

Notice the mix - some high scorers got placed, but so did some average ones. Already interesting.

---

## Part 2: EDA - Who Gets Hired?

In [None]:
# The big question: how many made it?
status_counts = df['status'].value_counts()

print("PLACEMENT RESULTS")
print("="*40)
print(f"Placed:     {status_counts['Placed']} ({status_counts['Placed']/len(df)*100:.1f}%)")
print(f"Not Placed: {status_counts['Not Placed']} ({status_counts['Not Placed']/len(df)*100:.1f}%)")

fig, ax = plt.subplots(figsize=(8, 5))
colors = ['#2ecc71', '#e74c3c']
bars = ax.bar(['Placed\n(Got the job!)', 'Not Placed\n(Back to LinkedIn)'], 
              [status_counts['Placed'], status_counts['Not Placed']], 
              color=colors, edgecolor='black', linewidth=2)
ax.set_ylabel('Number of Candidates')
ax.set_title('The Harsh Reality of Job Placement')
for bar, count in zip(bars, [status_counts['Placed'], status_counts['Not Placed']]):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2, 
            str(count), ha='center', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

**69% got placed, 31% didn't.** Not terrible odds, but that 31% is someone's sleepless nights and existential crisis. 

Also note: the classes are imbalanced, roughly 69/31 in favor of "Placed". A lazy model could predict "Placed" for everyone and still score about 69% accuracy, so that's the baseline to beat. We'll watch for that.
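
To make the "lazy model" worry concrete, here is a minimal sketch of the accuracy you get by always predicting the most common class. The helper name `majority_baseline_accuracy` and the toy labels are illustrative and not part of the notebook.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Toy labels with roughly the same 69/31 split as the notebook (illustrative only)
toy_labels = ["Placed"] * 69 + ["Not Placed"] * 31
label, baseline = majority_baseline_accuracy(toy_labels)
print(f"Always predicting '{label}' scores {baseline:.0%} accuracy")
```

In the notebook, `majority_baseline_accuracy(df['status'])` should give the number every real model has to beat.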

---

In [None]:
# Quick data health check
print("DATA HEALTH CHECK")
print("="*40)
print(f"\nMissing values: {df.isnull().sum().sum()}")
print(f"Duplicates: {df.duplicated().sum()}")

print(f"\nNumerical features: {len(df.select_dtypes(include=[np.number]).columns)}")
print(f"Categorical features: {len(df.select_dtypes(include=['object']).columns)}")

**Clean data.** No missing values, no duplicates. Someone did the preprocessing for us. Bless them.

Mix of numerical (scores, percentages) and categorical (gender, board, degree type) features. We'll need to encode the categorical ones later.

---

In [None]:
# Numerical features summary
num_cols = df.select_dtypes(include=[np.number]).columns.tolist()
print("NUMERICAL FEATURES SUMMARY")
print("="*50)
df[num_cols].describe().round(2)

**The numbers behind the candidates:**
- SSC/HSC/Degree percentages: 40s to 90s range. Normal academic spread.
- Interview scores: 50-98. Wide range - some crushed it, some choked.
- Skills match: 40-98%. How well your skills fit the job.
- Years experience: 0-5. Fresh grads to slightly seasoned folks.
- Certifications: 0-5. The "I took online courses" metric.

---

In [None]:
# Categorical features breakdown
cat_cols = df.select_dtypes(include=['object']).columns.tolist()
cat_cols.remove('status')  # That's our target

print("CATEGORICAL FEATURES BREAKDOWN")
print("="*50)
for col in cat_cols:
    print(f"\n{col}:")
    print(df[col].value_counts().to_string())

**The categorical breakdown:**
- Gender: More males in the dataset (typical in many fields)
- Boards: Central vs Others (education board types)
- Degrees: Commerce & Management dominates, followed by Sci&Tech
- Specialization: Marketing & Finance vs Marketing & HR
- Company tiers: MNC, Mid-tier, Startup
- Competition: Low, Medium, High

---

In [None]:
# The money question: What's different between Placed and Not Placed?
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Key numerical features comparison
key_features = ['mba_percent', 'interview_score', 'skills_match_percent', 
                'emp_test_percentage', 'degree_percentage', 'years_experience']

for idx, col in enumerate(key_features):
    ax = axes[idx//3, idx%3]
    placed = df[df['status']=='Placed'][col]
    not_placed = df[df['status']=='Not Placed'][col]
    
    ax.hist(placed, bins=15, alpha=0.6, color='green', label='Placed', edgecolor='black')
    ax.hist(not_placed, bins=15, alpha=0.6, color='red', label='Not Placed', edgecolor='black')
    ax.set_xlabel(col)
    ax.set_ylabel('Count')
    ax.set_title(f'{col} Distribution')
    ax.legend()

plt.suptitle('Placed vs Not Placed: Key Features Comparison', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

**Interesting patterns emerging:**
- **MBA percent:** Placed folks tend to score higher. Makes sense.
- **Interview score:** This one's important - placed candidates cluster higher.
- **Skills match:** Higher match = better chances. Duh.
- **Years experience:** Surprisingly, not a huge separator.
- **Degree percentage:** Some overlap, but placed folks skew higher.

**Early hypothesis:** Interview score and skills match might be key predictors.
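
Overlapping histograms can be hard to read, so a per-group summary table is a handy companion. This is a small sketch on a made-up frame; the column names match the notebook, the values do not come from the dataset.

```python
import pandas as pd

# Toy stand-in for the notebook's df (values are invented for illustration)
toy = pd.DataFrame({
    "status": ["Placed", "Placed", "Placed", "Not Placed", "Not Placed"],
    "interview_score": [82, 90, 76, 61, 70],
    "skills_match_percent": [88, 79, 91, 55, 66],
})

# Mean and median of each feature within each placement group
features = ["interview_score", "skills_match_percent"]
summary = toy.groupby("status")[features].agg(["mean", "median"])
print(summary.round(1))
```

Swapping `toy` for `df` and `features` for `key_features` would put numbers next to the patterns listed above.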

---

In [None]:
# Categorical features vs Placement
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

cat_features = ['gender', 'work_experience', 'specialisation', 
                'company_tier', 'job_competition_level', 'undergrad_degree']

for idx, col in enumerate(cat_features):
    ax = axes[idx//3, idx%3]
    cross = pd.crosstab(df[col], df['status'], normalize='index') * 100
    cross.plot(kind='bar', stacked=True, color=['#e74c3c', '#2ecc71'], ax=ax, edgecolor='black')
    ax.set_title(f'Placement Rate by {col}')
    ax.set_ylabel('Percentage')
    ax.legend(['Not Placed', 'Placed'], loc='upper right')
    ax.tick_params(axis='x', rotation=45)

plt.suptitle('How Categories Affect Your Chances', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

**Category insights:**
- **Gender:** Pretty similar rates. Good - no obvious gap in this sample.
- **Work experience:** Having experience helps! (shocking, I know)
- **Specialization:** Mkt&Fin slightly edges out Mkt&HR
- **Company tier:** MNCs place better than startups? Interesting.
- **Competition level:** High competition = fewer placements. Math checks out.
- **Degree:** Sci&Tech and Comm&Mgmt have similar rates

---

In [None]:
# Correlation heatmap for numerical features
num_cols_for_corr = ['ssc_percentage', 'hsc_percentage', 'degree_percentage', 
                     'emp_test_percentage', 'mba_percent', 'years_experience',
                     'skills_match_percent', 'num_certifications', 'interview_score']

fig, ax = plt.subplots(figsize=(10, 8))
corr = df[num_cols_for_corr].corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, cmap='RdBu_r', center=0, 
            fmt='.2f', ax=ax, square=True)
ax.set_title('Feature Correlations')
plt.tight_layout()
plt.show()

**Correlation insights:**
- Academic scores correlate with each other (good student = consistently good)
- Interview score shows some independence - you can be book smart but interview poorly
- No extremely high correlations - features are reasonably independent

Good news: No major multicollinearity issues to worry about.
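
Eyeballing a heatmap only catches pairwise correlations. One common follow-up is the variance inflation factor (VIF), which can be read off the diagonal of the inverse correlation matrix. The sketch below uses synthetic data; `variance_inflation_factors` is a hypothetical helper, not something the notebook defines.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)  # nearly a copy of a, so a and c are collinear

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(np.round(vifs, 1))
```

In the notebook this would be called as `variance_inflation_factors(df[num_cols_for_corr].to_numpy())`. A VIF above roughly 5 to 10 is a common rule of thumb for "worth a second look".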

---

## Part 3: Statistical Study

Let's get serious. Which features ACTUALLY matter statistically?

In [None]:
# T-tests for numerical features
print("STATISTICAL SIGNIFICANCE: NUMERICAL FEATURES")
print("="*60)
print("\nH0: No difference between Placed and Not Placed")
print("H1: Significant difference exists")
print("-"*60)

placed = df[df['status'] == 'Placed']
not_placed = df[df['status'] == 'Not Placed']

stat_results = []
for col in num_cols_for_corr:
    t_stat, p_val = stats.ttest_ind(placed[col], not_placed[col])
    
    # Cohen's d
    pooled_std = np.sqrt((placed[col].std()**2 + not_placed[col].std()**2) / 2)
    cohens_d = (placed[col].mean() - not_placed[col].mean()) / pooled_std if pooled_std > 0 else 0
    
    stat_results.append({
        'Feature': col,
        'Placed_Mean': placed[col].mean(),
        'NotPlaced_Mean': not_placed[col].mean(),
        'Difference': placed[col].mean() - not_placed[col].mean(),
        'T_Stat': t_stat,
        'P_Value': p_val,
        'Cohens_d': cohens_d,
        'Significant': 'YES' if p_val < 0.05 else 'NO'
    })

stat_df = pd.DataFrame(stat_results).sort_values('P_Value')

for _, row in stat_df.iterrows():
    sig = "***" if row['Significant'] == 'YES' else "   "
    print(f"{sig} {row['Feature']:25s} | p={row['P_Value']:.4f} | d={row['Cohens_d']:+.3f} | {row['Significant']}")

print("-"*60)
print(f"\nSignificant features: {(stat_df['Significant']=='YES').sum()}/{len(stat_df)}")

**Statistical verdict on numerical features:**
- **p < 0.05** = statistically significant difference between placed and not placed
- **Cohen's d** = effect size (positive means placed folks score higher)

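The significant features are our most promising predictors. Non-significant ones? They don't reliably distinguish the two groups in this sample. Keep in mind that with nine tests at the 0.05 level, an occasional false alarm is expected.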
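
The two groups differ in size (about 69% vs 31%) and may differ in spread, so a variant worth considering is Welch's t-test together with a Cohen's d that weights the pooled standard deviation by group size. This is a sketch on synthetic scores, not a replacement for the cell above; the function name is made up.

```python
import numpy as np
from scipy import stats

def welch_and_cohens_d(group_a, group_b):
    """Welch's t-test plus Cohen's d using a size-weighted pooled SD."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # equal_var=False switches ttest_ind from Student's to Welch's test
    t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)
    # Pooled variance weighted by each group's degrees of freedom
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    d = (a.mean() - b.mean()) / np.sqrt(pooled_var)
    return t_stat, p_val, d

rng = np.random.default_rng(42)
placed_scores = rng.normal(loc=78, scale=8, size=150)      # synthetic "Placed" group
not_placed_scores = rng.normal(loc=66, scale=12, size=65)  # synthetic "Not Placed" group

t_stat, p_val, d = welch_and_cohens_d(placed_scores, not_placed_scores)
print(f"t={t_stat:.2f}, p={p_val:.4f}, d={d:+.2f}")
```

On real columns the call would look like `welch_and_cohens_d(placed[col], not_placed[col])`.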

---

In [None]:
# Chi-square tests for categorical features
print("STATISTICAL SIGNIFICANCE: CATEGORICAL FEATURES")
print("="*60)
print("\nChi-Square Tests for Independence")
print("-"*60)

chi_results = []
for col in cat_cols:
    contingency = pd.crosstab(df[col], df['status'])
    chi2, p_val, dof, expected = stats.chi2_contingency(contingency)
    
    # Cramer's V
    n = contingency.sum().sum()
    min_dim = min(contingency.shape) - 1
    cramers_v = np.sqrt(chi2 / (n * min_dim)) if min_dim > 0 else 0
    
    chi_results.append({
        'Feature': col,
        'Chi2': chi2,
        'P_Value': p_val,
        'Cramers_V': cramers_v,
        'Significant': 'YES' if p_val < 0.05 else 'NO'
    })

chi_df = pd.DataFrame(chi_results).sort_values('P_Value')

for _, row in chi_df.iterrows():
    sig = "***" if row['Significant'] == 'YES' else "   "
    print(f"{sig} {row['Feature']:25s} | p={row['P_Value']:.4f} | V={row['Cramers_V']:.3f} | {row['Significant']}")

print("-"*60)
print(f"\nSignificant features: {(chi_df['Significant']=='YES').sum()}/{len(chi_df)}")

**Chi-square results:**
- Tests if categorical features are independent of placement status
- **Cramér's V** = strength of association (0 = none, 1 = perfect)
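
For reference, the quantity the cell computes can be written as follows, where $r$ and $c$ are the number of rows and columns in the contingency table and $n$ is the total count:

```latex
V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\; c - 1)}}
```

Because `status` has only two levels, $\min(r - 1, c - 1) = 1$ for every feature with at least two categories, so here the formula reduces to $V = \sqrt{\chi^2 / n}$.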

Significant categorical features are associated with whether you get placed (association, not proof of cause). Non-significant ones? No detectable link in this sample.
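
Running many tests at the 0.05 level inflates the chance of a false positive. One standard remedy is the Holm step-down correction, sketched here in plain Python on made-up p-values. `holm_adjust` is an illustrative helper, not part of the notebook.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p to largest
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

raw = [0.001, 0.04, 0.03, 0.20]  # made-up p-values
adj = holm_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw={p:.3f} -> adjusted={q:.3f}")
```

In the notebook, the inputs could be `stat_df['P_Value'].tolist()` and `chi_df['P_Value'].tolist()` combined, with significance then judged on the adjusted values.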

---

In [None]:
# Visualize effect sizes
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Cohen's d for numerical
stat_df_sorted = stat_df.sort_values('Cohens_d')
colors = ['#2ecc71' if d > 0 else '#e74c3c' for d in stat_df_sorted['Cohens_d']]
axes[0].barh(stat_df_sorted['Feature'], stat_df_sorted['Cohens_d'], color=colors, edgecolor='black')
axes[0].axvline(x=0, color='black', linewidth=1)
axes[0].axvline(x=0.5, color='orange', linestyle='--', alpha=0.7, label='Medium effect')
axes[0].axvline(x=-0.5, color='orange', linestyle='--', alpha=0.7)
axes[0].set_xlabel("Cohen's d")
axes[0].set_title('Effect Size: Numerical Features\n(Green = Placed higher, Red = Not Placed higher)')
axes[0].legend()

# Cramer's V for categorical
chi_df_sorted = chi_df.sort_values('Cramers_V')
colors_v = ['#2ecc71' if row['Significant']=='YES' else '#95a5a6' for _, row in chi_df_sorted.iterrows()]
axes[1].barh(chi_df_sorted['Feature'], chi_df_sorted['Cramers_V'], color=colors_v, edgecolor='black')
axes[1].axvline(x=0.1, color='orange', linestyle='--', alpha=0.7, label='Small effect (0.1)')
axes[1].set_xlabel("Cramer's V")
axes[1].set_title('Effect Size: Categorical Features\n(Green = Significant)')
axes[1].legend()

plt.tight_layout()
plt.show()

**Effect size visualization:**
- Bigger bars = stronger relationship with placement
- Green numerical features = placed candidates score higher
- These are the features our models should lean on

---

## Part 4: Model Building

Time to see if machines can predict who gets hired.

In [None]:
# Prepare data
df_model = df.copy()

# Encode categorical variables
label_encoders = {}
for col in cat_cols:
    le = LabelEncoder()
    df_model[col] = le.fit_transform(df_model[col])
    label_encoders[col] = le

# Encode target
le_target = LabelEncoder()
df_model['status'] = le_target.fit_transform(df_model['status'])
print(f"Target encoding: {dict(zip(le_target.classes_, le_target.transform(le_target.classes_)))}")

# Features and target
X = df_model.drop('status', axis=1)
y = df_model['status']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"\nTraining: {len(X_train)} | Test: {len(X_test)}")
print(f"Features: {X.shape[1]}")

**Data prep done:**
- Categorical features encoded to numbers
- 80/20 train/test split
- Stratified so train and test keep the same placed/not-placed ratio
- Scaled for algorithms that need it
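
One caveat on the encoding step: `LabelEncoder` turns categories into arbitrary integers, which distance-based and linear models may read as an ordering. A possible alternative for nominal columns is one-hot encoding. The sketch below uses `pd.get_dummies` on a toy frame; the category names mirror the company tiers listed earlier, the rows are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "company_tier": ["MNC", "Startup", "Mid-tier", "MNC"],
    "interview_score": [82, 64, 71, 90],
})

# One indicator column per category; drop_first removes one redundant column
encoded = pd.get_dummies(toy, columns=["company_tier"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

Tree-based models usually cope fine with integer codes, so this matters mostly for Logistic Regression, KNN, and SVM.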

---

In [None]:
# Model battle
models = {
    'Logistic Regression': (LogisticRegression(max_iter=1000, random_state=42), True),
    'Decision Tree': (DecisionTreeClassifier(random_state=42), False),
    'Random Forest': (RandomForestClassifier(n_estimators=100, random_state=42), False),
    'Gradient Boosting': (GradientBoostingClassifier(random_state=42), False),
    'K-Nearest Neighbors': (KNeighborsClassifier(n_neighbors=5), True),
    'SVM': (SVC(kernel='rbf', probability=True, random_state=42), True),
    'Naive Bayes': (GaussianNB(), True)
}

results = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

print("MODEL BATTLE")
print("="*70)
print(f"{'Model':<25} | {'Test Acc':>10} | {'CV Mean':>10} | {'F1':>8} | {'ROC-AUC':>8}")
print("-"*70)

for name, (model, use_scaled) in models.items():
    X_tr = X_train_scaled if use_scaled else X_train
    X_te = X_test_scaled if use_scaled else X_test
    
    model.fit(X_tr, y_train)
    y_pred = model.predict(X_te)
    y_prob = model.predict_proba(X_te)[:, 1] if hasattr(model, 'predict_proba') else None
    
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    roc = roc_auc_score(y_test, y_prob) if y_prob is not None else np.nan
    
    cv_X = X_train_scaled if use_scaled else X_train
    cv_scores = cross_val_score(model, cv_X, y_train, cv=cv)
    
    results.append({
        'Model': name,
        'Test_Acc': acc,
        'CV_Mean': cv_scores.mean(),
        'CV_Std': cv_scores.std(),
        'F1': f1,
        'ROC_AUC': roc
    })
    
    print(f"{name:<25} | {acc*100:>9.2f}% | {cv_scores.mean()*100:>9.2f}% | {f1:>7.3f} | {roc:>7.3f}")

print("-"*70)

**Model comparison:**
- **Test Acc:** Performance on held-out data
- **CV Mean:** 5-fold cross-validation (more reliable)
- **F1:** Balance of precision and recall
- **ROC-AUC:** Overall discrimination ability

Higher is better across all metrics. Let's see who won.
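
A detail worth knowing about the CV column: the scaler above is fit on the whole training set before cross-validation, so each validation fold has already influenced the scaling. Wrapping the scaler and model in a pipeline re-fits the scaler inside every fold. This is a sketch on synthetic data shaped roughly like the training split, not the notebook's actual numbers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train / y_train (172 rows, roughly 31/69 classes)
X_demo, y_demo = make_classification(
    n_samples=172, n_features=10, weights=[0.31, 0.69], random_state=42
)

# The scaler is re-fit on the training portion of every fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_demo, y_demo, cv=cv_demo, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With `StandardScaler` the difference is usually small, but the pipeline habit pays off once preprocessing gets heavier.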

---

In [None]:
# Leaderboard
results_df = pd.DataFrame(results).sort_values('Test_Acc', ascending=False)

print("\n" + "="*60)
print("FINAL LEADERBOARD")
print("="*60)

for rank, (_, row) in enumerate(results_df.iterrows(), 1):
    medal = "CHAMPION" if rank == 1 else f"#{rank}"
    print(f"[{medal:^8}] {row['Model']:<25} | {row['Test_Acc']*100:.2f}%")

champion = results_df.iloc[0]
print(f"\nWINNER: {champion['Model']}")
print(f"Test Accuracy: {champion['Test_Acc']*100:.2f}%")
print(f"ROC-AUC: {champion['ROC_AUC']:.3f}")

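**We have a winner!** The best model for predicting job placement has been crowned. Note that the crown goes by test accuracy on a small hold-out set, so a close finish is worth double-checking against the CV Mean column.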
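
How much should one trust a single test accuracy? A 20% split of 215 rows leaves only about 43 test candidates, so the uncertainty is wide. Here is a rough sketch of a Wilson score interval in plain Python. The 37-out-of-43 figure is hypothetical, not the notebook's result.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for an accuracy of correct/total."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Hypothetical outcome: 37 of 43 test candidates classified correctly
low, high = wilson_interval(37, 43)
print(f"Accuracy {37 / 43:.1%}, 95% interval from {low:.1%} to {high:.1%}")
```

For this made-up example the interval runs from roughly 73% to 93%, which is wide enough that models a few points apart on the leaderboard are hard to rank with confidence.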

---

In [None]:
# Visual comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy comparison
colors = ['gold' if acc == results_df['Test_Acc'].max() else 'steelblue' for acc in results_df['Test_Acc']]
axes[0].barh(results_df['Model'], results_df['Test_Acc'], color=colors, edgecolor='black')
axes[0].set_xlabel('Accuracy')
axes[0].set_title('Test Accuracy')
axes[0].set_xlim(0.6, 1.0)
for i, (_, row) in enumerate(results_df.iterrows()):
    axes[0].text(row['Test_Acc'] + 0.01, i, f"{row['Test_Acc']*100:.1f}%", va='center', fontweight='bold')

# CV with error bars
axes[1].barh(results_df['Model'], results_df['CV_Mean'], 
             xerr=results_df['CV_Std'], color='steelblue', edgecolor='black', capsize=5)
axes[1].set_xlabel('Accuracy')
axes[1].set_title('Cross-Validation Accuracy (with std)')
axes[1].set_xlim(0.6, 1.0)

plt.tight_layout()
plt.show()

**Visual leaderboard:**
- Gold bar = champion
- Error bars show stability across CV folds
- Smaller error bars = more reliable performance

---

In [None]:
# Best model deep dive
best_name = results_df.iloc[0]['Model']
best_model = models[best_name][0]
use_scaled = models[best_name][1]

X_te_final = X_test_scaled if use_scaled else X_test
y_pred_final = best_model.predict(X_te_final)
y_prob_final = best_model.predict_proba(X_te_final)[:, 1]

print(f"CHAMPION MODEL: {best_name}")
print("="*60)
print("\nClassification Report:")
print(classification_report(y_test, y_pred_final, target_names=['Not Placed', 'Placed']))

**Classification report breakdown:**
- **Precision for Placed:** When we predict "Placed", how often are we right?
- **Recall for Placed:** Of everyone who actually got placed, how many did we catch?
- **Precision for Not Placed:** When we predict rejection, how often are we right?
- **Recall for Not Placed:** Of everyone rejected, how many did we identify?

---

In [None]:
# Confusion matrix and ROC
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred_final)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=['Not Placed', 'Placed'], 
            yticklabels=['Not Placed', 'Placed'],
            annot_kws={'size': 16, 'weight': 'bold'})
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Actual')
axes[0].set_title(f'Confusion Matrix: {best_name}')

# ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_prob_final)
auc = roc_auc_score(y_test, y_prob_final)
axes[1].plot(fpr, tpr, 'b-', linewidth=2, label=f'ROC (AUC = {auc:.3f})')
axes[1].plot([0, 1], [0, 1], 'k--', label='Random Guess')
axes[1].fill_between(fpr, tpr, alpha=0.3)
axes[1].set_xlabel('False Positive Rate')
axes[1].set_ylabel('True Positive Rate')
axes[1].set_title('ROC Curve')
axes[1].legend()

plt.tight_layout()
plt.show()

print(f"\nConfusion Matrix:")
print(f"  Correctly predicted NOT placed: {cm[0,0]}")
print(f"  Correctly predicted placed:     {cm[1,1]}")
print(f"  Missed placements (False Neg):  {cm[1,0]}")
print(f"  False hopes (False Pos):        {cm[0,1]}")

**The verdict:**
- **True Negatives:** Correctly told people they wouldn't get placed
- **True Positives:** Correctly predicted placements
- **False Negatives:** Said "no" to people who actually got placed (sad)
- **False Positives:** Gave false hope (also sad, differently)
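
The four cells of the confusion matrix are all you need to rebuild the headline metrics for the "Placed" class. A small sketch with invented counts (the function name is illustrative):

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Precision, recall and F1 for the positive class ("Placed")."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Made-up counts in the order cm[0,0], cm[0,1], cm[1,0], cm[1,1]
precision, recall, f1 = metrics_from_confusion(tn=9, fp=4, fn=2, tp=28)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With the notebook's matrix, `metrics_from_confusion(*cm.ravel())` unpacks the cells in that same order.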

---

In [None]:
# Feature importance (for tree-based models)
if hasattr(best_model, 'feature_importances_'):
    feat_imp = pd.DataFrame({
        'Feature': X.columns,
        'Importance': best_model.feature_importances_
    }).sort_values('Importance', ascending=True)
else:
    # Use Random Forest for feature importance
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    rf.fit(X_train, y_train)
    feat_imp = pd.DataFrame({
        'Feature': X.columns,
        'Importance': rf.feature_importances_
    }).sort_values('Importance', ascending=True)

fig, ax = plt.subplots(figsize=(10, 8))
colors = plt.cm.RdYlGn(np.linspace(0.2, 0.8, len(feat_imp)))
ax.barh(feat_imp['Feature'], feat_imp['Importance'], color=colors, edgecolor='black')
ax.set_xlabel('Importance')
ax.set_title('What Actually Matters for Getting Hired?')
plt.tight_layout()
plt.show()

print("\nTOP 5 FEATURES FOR PLACEMENT:")
for _, row in feat_imp.tail(5).iloc[::-1].iterrows():
    print(f"  {row['Feature']:25s}: {row['Importance']:.4f}")

**The hiring cheat code (sort of):**

These are the features the model leans on most. That's predictive importance, not proof of cause, so treat "focus on the top ones" as a hint rather than a plan. I'm a notebook, not your career counselor.
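
The `feature_importances_` attribute of tree ensembles is computed from the training data and can favor features with many distinct values. A common cross-check is permutation importance on the held-out set: shuffle one column at a time and measure how much the score drops. The sketch below runs on synthetic data, so the feature indices mean nothing beyond the demo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the placement data
X_demo, y_demo = make_classification(
    n_samples=215, n_features=8, n_informative=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

rf_demo = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle each column on the test set and record the mean drop in accuracy
result = permutation_importance(rf_demo, X_te, y_te, n_repeats=20, random_state=42)
for i in result.importances_mean.argsort()[::-1][:3]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```

If the two rankings agree on the top features, that's a good sign the story holds up beyond the training data.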

---

## Part 5: Conclusion

What did we learn about getting hired?

In [None]:
print("\n" + "="*70)
print("                         FINAL SUMMARY")
print("="*70)

print(f"""
DATASET:
--------
- {df.shape[0]} candidates, {X.shape[1]} features
- Placed: {status_counts['Placed']} ({status_counts['Placed']/len(df)*100:.1f}%) | Not Placed: {status_counts['Not Placed']} ({status_counts['Not Placed']/len(df)*100:.1f}%)

STATISTICAL INSIGHTS:
--------------------
- {(stat_df['Significant']=='YES').sum()}/{len(stat_df)} numerical features significantly different between groups
- {(chi_df['Significant']=='YES').sum()}/{len(chi_df)} categorical features significantly associated with placement

MODEL PERFORMANCE:
-----------------
- Champion: {champion['Model']}
- Test Accuracy: {champion['Test_Acc']*100:.2f}%
- ROC-AUC: {champion['ROC_AUC']:.3f}

KEY FINDINGS:
------------""")

print(f"Top 3 important features: {', '.join(feat_imp.tail(3)['Feature'].tolist()[::-1])}")

print(f"""
REAL TALK:
---------
- No single feature guarantees placement
- It's a combination of academics, skills, and interview performance
- Work experience helps but isn't everything
- The model is ~{champion['Test_Acc']*100:.0f}% accurate - not perfect, but useful

Remember: This is data, not destiny. Models can predict probabilities,
not determine your future. Keep grinding.
""")

print("="*70)
print("                      GOOD LUCK OUT THERE")
print("="*70)

---

## The Bottom Line

We took 215 candidates, analyzed their entire academic and professional history, ran statistical tests, and threw 7 machine learning algorithms at the problem.

**What actually matters for getting placed:**
- Your academic scores DO matter (but aren't everything)
- Interview performance is crucial
- Skills match and relevant experience help
- It's a combination, not a single magic number

**What the model achieved:**
- Solid prediction accuracy
- Can identify likely placements with reasonable confidence
- Not perfect - humans are messy and unpredictable

**Final thought:** 
Machine learning can find patterns in placement data, but it can't capture everything that makes someone hireable - your energy in the interview, your ability to connect with people, that random project you did that perfectly matched what the company needed.

Data is a guide, not a guarantee. Use it wisely.

---

**Connect:** [GitHub](https://github.com/Rekhii) | [Kaggle](https://kaggle.com/seki32)