# Admissions Classifier

Binary classifier that predicts **Accepted** vs **Rejected** for undergraduate college applications using GPA, SAT score, and extracurricular activities.

**Project requirements (instructor)**: University/College Admissions — a pipeline from beginning to end. Beginning: a student comes to the university/college for an admission enquiry. End product: Admitted.

**Objective**: Train a neural network on the [Kaggle student admission dataset](https://www.kaggle.com/datasets/amanace/student-admission-dataset).

**Model type**: Binary classification (sigmoid output, binary cross-entropy).

**Workflow**:
1. Load data
2. Preprocess and binarize target (Accepted=1, Rejected/Waitlisted=0)
3. Train/test split, scale features
4. Build and train neural network
5. Evaluate and save model + scaler

## Run in the browser (no local setup)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adzuci/admissions-classifier/blob/main/admissions_classifier.ipynb)

Works on **Colab**, **Windows**, and **Mac**. On Colab, the notebook fetches the dataset from GitHub. For a local Mac install, run `pip install tensorflow-macos tensorflow-metal`; for local Windows, run `pip install tensorflow`. When running locally, make sure `student_admission_dataset.csv` is in the same folder as the notebook (download it from [Kaggle](https://www.kaggle.com/datasets/amanace/student-admission-dataset) if needed).

In [10]:
# Colab: install tensorflow, sklearn, etc. Local: pip install tensorflow-macos tensorflow-metal scikit-learn
try:
    import google.colab
    get_ipython().run_line_magic('pip', 'install -q tensorflow pandas joblib scikit-learn')
except (ImportError, NameError):
    pass  # local: deps assumed installed (Mac: tensorflow-macos+tensorflow-metal; Windows: tensorflow)

## STEP 1: Imports and SimpleScaler

In [11]:
# Mac: tensorflow-macos + tensorflow-metal (Metal GPU). Colab & Windows: standard tensorflow.
import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from tensorflow import keras
from tensorflow.keras import layers


class SimpleScaler:
    """Minimal scaler with fit/transform (no sklearn)."""

    def fit(self, X):
        self.mean_ = np.mean(X, axis=0)
        self.scale_ = np.std(X, axis=0)
        self.scale_[self.scale_ == 0] = 1.0
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_


# keras.__version__ reports the Keras version, not the TensorFlow version
print(f"Keras: {keras.__version__}")

Keras: 3.12.1
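
As a quick sanity check of the `SimpleScaler` above, the sketch below repeats the same class so the snippet runs on its own, then applies it to a small made-up array (the values are illustrative, not taken from the dataset). After fitting, each non-constant column should have mean 0 and standard deviation 1, while the constant column is mapped to 0 instead of triggering a division by zero.

```python
import numpy as np


class SimpleScaler:
    """Same logic as the notebook's scaler, repeated so this snippet is self-contained."""

    def fit(self, X):
        self.mean_ = np.mean(X, axis=0)
        self.scale_ = np.std(X, axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # constant columns: avoid dividing by zero
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_


# Hypothetical rows: GPA, SAT score, and a constant third column
X_demo = np.array([
    [3.0, 1000.0, 5.0],
    [3.5, 1200.0, 5.0],
    [4.0, 1400.0, 5.0],
])

scaled = SimpleScaler().fit(X_demo).transform(X_demo)
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print("means:", col_means)
print("stds: ", col_stds)
```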


## STEP 2: Load data

In [12]:
# Colab: fetch from GitHub. Local (Mac/Windows): use student_admission_dataset.csv
try:
    import google.colab
    csv_name = "https://raw.githubusercontent.com/adzuci/admissions-classifier/main/student_admission_dataset.csv"
except ImportError:
    csv_name = "student_admission_dataset.csv"

df = pd.read_csv(csv_name)
df.head(10)

Unnamed: 0,GPA,SAT_Score,Extracurricular_Activities,Admission_Status
0,3.46,1223,8,Rejected
1,2.54,974,8,Rejected
2,2.91,909,9,Rejected
3,2.83,1369,5,Accepted
4,3.6,1536,7,Accepted
5,3.52,1476,9,Accepted
6,3.84,1002,8,Rejected
7,2.63,975,6,Waitlisted
8,3.13,1450,8,Waitlisted
9,2.54,1118,7,Rejected


## STEP 3: Preprocess

In [13]:
# Binarize: Accepted=1, Rejected/Waitlisted=0
df["Admit"] = (df["Admission_Status"] == "Accepted").astype(int)

feature_cols = ["GPA", "SAT_Score", "Extracurricular_Activities"]
X = df[feature_cols].values.astype(np.float32)
y = df["Admit"].values

print(f"X shape: {X.shape}, y shape: {y.shape}")
print(f"Class balance: Admit=1 {y.sum()}, Admit=0 {len(y) - y.sum()}")

X shape: (250, 3), y shape: (250,)
Class balance: Admit=1 81, Admit=0 169


In [14]:
# Stratified train/test split (80/20) — preserves class balance
np.random.seed(42)
idx_1 = np.where(y == 1)[0]
idx_0 = np.where(y == 0)[0]
np.random.shuffle(idx_1)
np.random.shuffle(idx_0)
n_test_1 = max(1, int(len(idx_1) * 0.2))
n_test_0 = max(1, int(len(idx_0) * 0.2))
test_idx = np.concatenate([idx_1[:n_test_1], idx_0[:n_test_0]])
train_idx = np.concatenate([idx_1[n_test_1:], idx_0[n_test_0:]])
np.random.shuffle(train_idx)
np.random.shuffle(test_idx)

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Scale features
scaler = SimpleScaler()
scaler.fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
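
The manual split above can also be written with scikit-learn, which the notebook already imports for the comparison models. The sketch below is an alternative, not the notebook's method: it calls `train_test_split` with `stratify=y` on synthetic labels that mimic the 81/169 class balance (the feature values are random placeholders). One difference to be aware of is that scikit-learn rounds the test size up, so it holds out 50 rows, whereas the `int(...)` truncation in the manual code yields 16 + 33 = 49.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the dataset: 250 rows, 3 features, 81 positives
X_demo = rng.normal(size=(250, 3)).astype(np.float32)
y_demo = np.array([1] * 81 + [0] * 169)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# Stratification keeps the positive rate nearly identical in both splits
train_pos_rate = y_tr.mean()
test_pos_rate = y_te.mean()
print(len(y_tr), len(y_te), round(train_pos_rate, 3), round(test_pos_rate, 3))
```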

## STEP 4: Build and train model

In [15]:
model = keras.Sequential([
    keras.Input(shape=(3,)),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
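
`model.summary()` prints per-layer parameter counts. As a worked check, a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and dropout layers add none. The short calculation below applies that formula to the three dense layers defined above; it does not need TensorFlow.

```python
def dense_params(n_in, n_out):
    """Trainable parameters in a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


# Input width followed by the unit counts of the three Dense layers
layer_sizes = [3, 32, 16, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)  # [128, 528, 17] 673
```

The split code leaves 250 - 49 = 201 training rows, so the network has more trainable parameters than training examples, which is one reason overfitting is a concern on this dataset.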

In [16]:
# Class weights to handle imbalance (81 Accepted vs 169 Rejected/Waitlisted overall);
# the weights themselves are computed from the training split only
n_pos, n_neg = int(y_train.sum()), len(y_train) - int(y_train.sum())
class_weight = {0: 1.0, 1: n_neg / n_pos}

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=15, restore_best_weights=True, verbose=0
)
history = model.fit(
    X_train_s, y_train,
    epochs=100, batch_size=16,
    validation_split=0.15,
    class_weight=class_weight,
    callbacks=[early_stop],
    verbose=1,
)

Epoch 1/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 32ms/step - accuracy: 0.5471 - loss: 0.9761 - val_accuracy: 0.3871 - val_loss: 0.7393
Epoch 2/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6176 - loss: 0.9219 - val_accuracy: 0.3871 - val_loss: 0.7372
Epoch 3/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.4529 - loss: 1.0558 - val_accuracy: 0.3871 - val_loss: 0.7285
Epoch 4/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.5000 - loss: 1.0122 - val_accuracy: 0.3871 - val_loss: 0.7316
Epoch 5/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6059 - loss: 0.9131 - val_accuracy: 0.3871 - val_loss: 0.7295
Epoch 6/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.5471 - loss: 0.9917 - val_accuracy: 0.4516 - val_loss: 0.7255
Epoch 7/100
...

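The class-weight rule used in the training cell, `{0: 1.0, 1: n_neg / n_pos}`, scales the loss so that both classes contribute equally in total. The sketch below works through the arithmetic for the training split implied by the earlier cells (65 accepted and 136 not accepted, assuming the 16/33 test split that the split code produces) and compares it with scikit-learn's `"balanced"` heuristic, which differs only by a constant factor.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Training-split counts implied by the notebook: 81 - 16 positives, 169 - 33 negatives
n_pos, n_neg = 65, 136
y_train_demo = np.array([1] * n_pos + [0] * n_neg)

# Rule used in the notebook: the majority class keeps weight 1
manual = {0: 1.0, 1: n_neg / n_pos}

# scikit-learn's "balanced" rule: n_samples / (n_classes * class_count)
balanced = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_train_demo
)

ratio_manual = manual[1] / manual[0]
ratio_balanced = balanced[1] / balanced[0]
print(f"manual:   {manual}")
print(f"balanced: {balanced}")
print(f"positive/negative weight ratio: {ratio_manual:.3f} vs {ratio_balanced:.3f}")
```

Both rules give the positive class roughly 2.09 times the weight of the negative class, so the choice between them mainly affects the overall scale of the loss.
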
## STEP 5: Evaluate

In [17]:
loss, acc = model.evaluate(X_test_s, y_test, verbose=0)
print(f"Test accuracy: {acc:.4f}")

preds = (model.predict(X_test_s, verbose=0) > 0.5).astype(int).flatten()
print(f"Sample predictions: {preds[:15]}...")

Test accuracy: 0.5510
Sample predictions: [1 0 1 0 1 1 0 0 1 0 0 0 0 0 1]...


**Why is accuracy ~55%?** The dataset is small (250 samples, 49 of them in the test set) and has only 3 features. Many students with similar GPA, SAT score, and extracurriculars have different outcomes (accepted vs rejected), so there is inherent ambiguity. Neural nets tend to need more data; simpler models often do better on small tabular data.
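
Accuracy alone can be misleading on an imbalanced test set. Based on the split code above, the test set holds 16 accepted and 33 not-accepted students, so a model that always predicts "rejected" would score 33/49. The sketch below computes that baseline, plus precision and recall from confusion-matrix counts, using only NumPy. The prediction vector is a made-up placeholder, not the notebook's output.

```python
import numpy as np

# Test-set labels implied by the split: 16 accepted (1), 33 not accepted (0)
y_true = np.array([1] * 16 + [0] * 33)
majority_acc = max(y_true.mean(), 1 - y_true.mean())

# Hypothetical predictions for illustration only
y_pred = np.zeros_like(y_true)
y_pred[:10] = 1     # 10 accepted students correctly flagged
y_pred[16:28] = 1   # 12 not-accepted students wrongly flagged

tp = int(((y_pred == 1) & (y_true == 1)).sum())
fp = int(((y_pred == 1) & (y_true == 0)).sum())
fn = int(((y_pred == 0) & (y_true == 1)).sum())
tn = int(((y_pred == 0) & (y_true == 0)).sum())

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"majority-class baseline: {majority_acc:.3f}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

The baseline works out to about 0.67, which is higher than the neural net's 0.5510. Some of that gap is expected, since class weighting deliberately trades overall accuracy for better recall on the accepted class, so precision and recall are worth reporting alongside accuracy.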

## Alternative models: Logistic Regression & Random Forest

Compare with classical ML models that often outperform neural nets on small tabular data.

In [18]:
# Train and evaluate Logistic Regression and Random Forest
lr = LogisticRegression(max_iter=500, class_weight="balanced", random_state=42)
rf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42)

lr.fit(X_train_s, y_train)
rf.fit(X_train_s, y_train)

acc_nn = model.evaluate(X_test_s, y_test, verbose=0)[1]
acc_lr = (lr.predict(X_test_s) == y_test).mean()
acc_rf = (rf.predict(X_test_s) == y_test).mean()

print("Test accuracy comparison:")
print(f"  Neural net:    {acc_nn:.4f}")
print(f"  Logistic Reg: {acc_lr:.4f}")
print(f"  Random Forest: {acc_rf:.4f}")
print(f"\nBest: {'Logistic Regression' if acc_lr >= max(acc_nn, acc_rf) else 'Random Forest' if acc_rf >= max(acc_nn, acc_lr) else 'Neural net'}")

Test accuracy comparison:
  Neural net:    0.5510
  Logistic Reg: 0.6531
  Random Forest: 0.6327

Best: Logistic Regression
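
With only 49 test rows, a single prediction changes accuracy by about 0.02, which is exactly the gap between Logistic Regression (0.6531) and Random Forest (0.6327) above. Stratified k-fold cross-validation gives a steadier comparison. The sketch below shows the pattern on synthetic data; the features, labels, and the choice of 5 folds are assumptions for illustration. Wrapping the scaler and the logistic regression in a pipeline keeps each fold's scaling fitted on that fold's training portion only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 250 rows, 3 features, noisy labels tied to the first two features
X_demo = rng.normal(size=(250, 3))
noise = rng.normal(scale=1.5, size=250)
y_demo = (X_demo[:, 0] + X_demo[:, 1] + noise > 0.8).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
models = {
    # Scaling lives inside the pipeline so it is refit on every training fold
    "Logistic Reg": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=500, class_weight="balanced"),
    ),
    # Tree ensembles do not depend on feature scale, so no scaler is needed here
    "Random Forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42
    ),
}

cv_scores = {
    name: cross_val_score(est, X_demo, y_demo, cv=cv, scoring="accuracy")
    for name, est in models.items()
}
for name, scores in cv_scores.items():
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```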


## STEP 6: Save model and scaler

In [19]:
model.save("model.keras")
joblib.dump(scaler, "scaler.joblib")
print("Saved model.keras and scaler.joblib")
print("In Colab: use Files pane to download these files.")

Saved model.keras and scaler.joblib
In Colab: use Files pane to download these files.
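
One caveat when reloading: `joblib.dump` pickles the `SimpleScaler` instance by reference to its class, so the class definition has to be available wherever `scaler.joblib` is loaded. A possible alternative, sketched below, is to store only the fitted statistics with `np.savez` and rebuild the transform from them. The file name `scaler_params.npz` and the example statistics are hypothetical.

```python
import os
import tempfile

import numpy as np

# Example fitted statistics (placeholders, not the notebook's actual values)
mean_ = np.array([3.2, 1200.0, 6.0])
scale_ = np.array([0.4, 180.0, 2.0])

# Save plain arrays; no custom class is needed to read them back
path = os.path.join(tempfile.mkdtemp(), "scaler_params.npz")
np.savez(path, mean_=mean_, scale_=scale_)

with np.load(path) as params:
    loaded_mean = params["mean_"]
    loaded_scale = params["scale_"]

# Apply the same transform as SimpleScaler.transform
student = np.array([[3.5, 1400.0, 7.0]])  # GPA, SAT_Score, Extracurricular_Activities
student_scaled = (student - loaded_mean) / loaded_scale
print(student_scaled)
```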


## Predict a single student

In [20]:
# Predict a single hypothetical student (all models)
student = np.array([[3.5, 1400, 7]])  # GPA, SAT_Score, Extracurricular_Activities
student_s = scaler.transform(student)

prob_nn = model.predict(student_s, verbose=0)[0][0]
pred_nn = "ACCEPTED" if prob_nn > 0.5 else "REJECTED"
pred_lr = "ACCEPTED" if lr.predict(student_s)[0] else "REJECTED"
pred_rf = "ACCEPTED" if rf.predict(student_s)[0] else "REJECTED"

print(f"Student: GPA={student[0,0]}, SAT={int(student[0,1])}, Extracurriculars={int(student[0,2])}")
print(f"  Neural net:      {pred_nn} (prob={prob_nn:.3f})")
print(f"  Logistic Reg:   {pred_lr}")
print(f"  Random Forest:  {pred_rf}")

Student: GPA=3.5, SAT=1400, Extracurriculars=7
  Neural net:      ACCEPTED (prob=0.529)
  Logistic Reg:   ACCEPTED
  Random Forest:  REJECTED
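
The three models disagree on this student, and the neural net's probability (0.529) sits close to the 0.5 threshold. Printing probabilities for the scikit-learn models as well makes such borderline cases easier to spot. The sketch below uses `predict_proba` on models trained with synthetic data (the data, the scaled student vector, and the 0.5 threshold are placeholders); the notebook's fitted `lr` and `rf` could be queried in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic, already-scaled training data for illustration
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] + rng.normal(size=200) > 0.5).astype(int)

lr_demo = LogisticRegression(max_iter=500, class_weight="balanced").fit(X_demo, y_demo)
rf_demo = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42
).fit(X_demo, y_demo)

student_s = np.array([[0.6, 0.9, 0.3]])  # hypothetical scaled features
threshold = 0.5

# Column 1 of predict_proba is the probability of class 1 (accepted)
probs = {
    "Logistic Reg": lr_demo.predict_proba(student_s)[0, 1],
    "Random Forest": rf_demo.predict_proba(student_s)[0, 1],
}
for name, p in probs.items():
    label = "ACCEPTED" if p > threshold else "REJECTED"
    print(f"{name}: {label} (prob={p:.3f})")
```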
