# Appointment Scheduling Model

This notebook loads `student.csv` and `professor.csv` and trains a Random Forest classifier. The classifier predicts whether a student-professor pairing gives a good 30-minute appointment slot (Monday to Thursday, weekdays 0-3, times in 24-hour format). The model is saved with joblib as `appointment_model.pkl` for use in a Flask backend. The training labels favor pairs that share a free day and a time slot. They also favor students with fewer preferred lecture days and fewer post-lecture slots. The file paths and the Drive mount below are specific to Google Colab.
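The description above implies a fixed grid of candidate slots. Here is a minimal sketch of that grid. It assumes the working day runs from 08:00 to 14:00, because the sample data below only shows times between 08:00 and 13:30. It also assumes weekday 0 is Monday. `candidate_slots` is a hypothetical helper and is not part of the notebook.

```python
from datetime import datetime, timedelta

# Assumption: weekday 0 = Monday, as in Python's datetime.weekday()
WEEKDAYS = {0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday"}


def candidate_slots(start="08:00", end="14:00", step_minutes=30):
    """Return (weekday, 'HH:MM') pairs for every 30-minute slot, Monday to Thursday.

    The start/end bounds are assumptions inferred from the sample data.
    """
    t = datetime.strptime(start, "%H:%M")
    stop = datetime.strptime(end, "%H:%M")
    times = []
    while t < stop:  # end is exclusive, so the last slot starts at 13:30
        times.append(t.strftime("%H:%M"))
        t += timedelta(minutes=step_minutes)
    return [(day, hhmm) for day in WEEKDAYS for hhmm in times]


slots = candidate_slots()
print(len(slots))  # 48
print(slots[:2])   # [(0, '08:00'), (0, '08:30')]
```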

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [9]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import joblib

# Set random seed for reproducibility
np.random.seed(42)

## Load Data

Load `student.csv` and `professor.csv` from `/content`. Then restore the leading zeros that `read_csv` strips from the 4-character day bitmasks.
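The padding step is needed because `pd.read_csv` infers integer dtypes, so a bitmask like `0001` is read as `1`. The self-contained check below uses an inline two-row CSV, with values taken from the preview that follows. It shows the problem and two fixes. One is the notebook's `astype(str)` plus `zfill(4)`. The other is passing `dtype=str` for those columns up front.

```python
import io

import pandas as pd

# Two rows shaped like student.csv (values copied from the preview output)
csv_text = "pre_lecs,empty_day,empty_slot\n0001,1110,10:30\n1000,0111,11:30\n"

# Default parsing: '0001' becomes the integer 1 and the leading zeros are lost
default = pd.read_csv(io.StringIO(csv_text))
print(default["pre_lecs"].tolist())  # [1, 1000]

# Declaring the bitmask columns as strings keeps them intact from the start
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"pre_lecs": str, "empty_day": str})
print(as_str["pre_lecs"].tolist())  # ['0001', '1000']

# The notebook's approach (astype(str), then zfill(4)) recovers the same values
restored = default["pre_lecs"].astype(str).str.zfill(4)
print(restored.tolist())  # ['0001', '1000']
```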

In [10]:
# Load datasets
students = pd.read_csv('/content/student.csv')
professors = pd.read_csv('/content/professor.csv')

# read_csv parses bitmasks such as '0111' as integers (111), so convert
# these columns back to strings before padding
for col in ['pre_lecs', 'empty_day', 'empty_slot']:
    students[col] = students[col].astype(str)
for col in ['availability_day', 'availability_slots']:
    professors[col] = professors[col].astype(str)

# Pad the Monday-to-Thursday bitmasks back to 4 characters
for col in ['pre_lecs', 'empty_day']:
    students[col] = students[col].str.zfill(4)
professors['availability_day'] = professors['availability_day'].str.zfill(4)

print('Students:')
print(students.head())
print('\nProfessors:')
print(professors.head())

Students:
  pre_lecs  count  post_lec_count empty_day empty_slot  is_best_appointment
0     1000      1               1      0111      11:30                    1
1     0001      1               2      1110      10:30                    0
2     1000      1               2      0111      08:30                    1
3     0010      1               1      1101      13:00                    1
4     1001      2               1      0110      11:30                    0

Professors:
  availability_day       availability_slots
0             1111  08:00,08:30,10:30,09:30
1             1010        08:30,11:30,11:00
2             1110        12:30,08:30,10:30
3             1010        13:30,13:00,09:00
4             0111        08:00,09:00,11:00
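Each `*_day` column is a 4-character mask over Monday to Thursday. The sketch below decodes a mask into weekday indices. It assumes the leftmost character is Monday (weekday 0). The notebook itself never decodes the masks, so this ordering is an assumption. Intersecting two decoded masks gives the same count that `compute_overlap` produces in the next section.

```python
def decode_days(mask: str) -> list:
    """Return weekday indices (0=Monday ... 3=Thursday) whose bit is '1'.

    Assumes the leftmost character of the 4-character mask is Monday.
    """
    if len(mask) != 4 or set(mask) - {"0", "1"}:
        raise ValueError(f"expected a 4-character 0/1 string, got {mask!r}")
    return [i for i, bit in enumerate(mask) if bit == "1"]


print(decode_days("0111"))  # [1, 2, 3]
print(decode_days("1010"))  # [0, 2]

# Student row 0 (empty_day 0111) and professor row 0 (availability_day 1111)
shared = sorted(set(decode_days("0111")) & set(decode_days("1111")))
print(shared, len(shared))  # [1, 2, 3] 3
```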


## Feature Engineering

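Sample 1,000 student-professor pairs at random (pairs may repeat) and compute four features for each pair. Then assign a synthetic `is_best_appointment` label using a probabilistic rule. The `is_best_appointment` column already present in `student.csv` is not used.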
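Here is a worked example for a single pair, using student row 0 and professor row 1 from the preview above. It repeats `compute_overlap` so that it runs on its own. The `.strip()` applied when splitting slots is an extra safeguard against spaces after commas. The notebook does not apply it.

```python
def compute_overlap(str1, str2):
    # Count positions where both day masks have a '1'
    return sum(a == "1" and b == "1" for a, b in zip(str1, str2))


student = {"empty_day": "0111", "empty_slot": "11:30", "count": 1, "post_lec_count": 1}
professor = {"availability_day": "1010", "availability_slots": "08:30,11:30,11:00"}

day_overlap = compute_overlap(student["empty_day"], professor["availability_day"])
prof_slots = [s.strip() for s in professor["availability_slots"].split(",")]
slot_match = int(student["empty_slot"] in prof_slots)

print(day_overlap, slot_match)  # 1 1
```

Only the third day is free for both, so `day_overlap` is 1. The student's 11:30 slot is in the professor's list, so this pair falls in the rule's 0.8-probability branch.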

In [11]:
# Function to compute overlap between binary strings
def compute_overlap(str1, str2):
    return sum(a == '1' and b == '1' for a, b in zip(str1, str2))

# Generate pairs
n_pairs = 1000
data = {
    'day_overlap': [],
    'slot_match': [],
    'student_count': [],
    'post_lec_count': [],
    'is_best_appointment': []
}

for _ in range(n_pairs):
    student = students.sample(1).iloc[0]
    professor = professors.sample(1).iloc[0]

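    # Number of weekdays on which both the student and the professor are free
    day_overlap = compute_overlap(student['empty_day'], professor['availability_day'])
    # 1 if the student's free slot appears among the professor's slots (the day is not checked)
    prof_slots = professor['availability_slots'].split(',')
    slot_match = 1 if student['empty_slot'] in prof_slots else 0

    # Synthetic label: only pairs with a matching slot and at least one shared day can be
    # positive. Students with one preferred lecture day and one post-lecture slot are
    # labeled 1 with probability 0.8; those with at most two of each, with probability 0.5.
    if slot_match == 1 and day_overlap > 0:
        if student['count'] == 1 and student['post_lec_count'] == 1:
            is_best = np.random.choice([0, 1], p=[0.2, 0.8])
        elif student['count'] <= 2 and student['post_lec_count'] <= 2:
            is_best = np.random.choice([0, 1], p=[0.5, 0.5])
        else:
            is_best = 0
    else:
        is_best = 0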

    data['day_overlap'].append(day_overlap)
    data['slot_match'].append(slot_match)
    data['student_count'].append(student['count'])
    data['post_lec_count'].append(student['post_lec_count'])
    data['is_best_appointment'].append(is_best)

# Create DataFrame
df = pd.DataFrame(data)
print('\nFeature DataFrame:')
print(df.head())


Feature DataFrame:
   day_overlap  slot_match  student_count  post_lec_count  is_best_appointment
0            0           1              2               2                    0
1            3           0              1               1                    0
2            3           0              1               1                    0
3            3           0              1               1                    0
4            1           0              2               2                    0
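The labeling rule in the cell above can be restated as a probability per pair. The sketch below makes the rule explicit; the function name is hypothetical. It also helps explain why precision for class 1 stays below 1.0 in the evaluation below. Even an ideal pair is labeled negative 20% of the time, and the middle group is a coin flip.

```python
import numpy as np


def positive_label_probability(slot_match, day_overlap, count, post_lec_count):
    """Probability that the notebook's rule assigns is_best_appointment = 1."""
    if slot_match == 1 and day_overlap > 0:
        if count == 1 and post_lec_count == 1:
            return 0.8
        if count <= 2 and post_lec_count <= 2:
            return 0.5
    return 0.0


cases = {
    "matching slot, shared day, 1 lecture day, 1 post-lecture slot": (1, 2, 1, 1),
    "matching slot, shared day, 2 lecture days, 2 post-lecture slots": (1, 2, 2, 2),
    "matching slot but no shared day": (1, 0, 1, 1),
    "no matching slot": (0, 3, 1, 1),
}
for name, args in cases.items():
    print(f"{name}: {positive_label_probability(*args)}")

# Empirical check of the 0.8 branch, drawn with the same probabilities as the notebook
rng = np.random.default_rng(0)
draws = rng.choice([0, 1], size=10_000, p=[0.2, 0.8])
print(f"observed positive rate: {draws.mean():.3f}")  # close to 0.8
```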


## Train Model

Train a Random Forest classifier on an 80/20 train/test split. Use `class_weight='balanced'` to compensate for the small share of positive labels.
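Only about one pair in eight is positive (25 of the 200 test rows in the report below). With so few positives, a plain random split can leave the two halves with different positive rates. The sketch below swaps in an alternative: passing `stratify=y` to `train_test_split`, so both splits keep the same class ratio. The arrays are toy stand-ins, not the notebook's `df`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data with a 12.5% positive rate, similar to the notebook's test split
rng = np.random.default_rng(42)
X = rng.integers(0, 4, size=(1000, 4))
y = np.array([1] * 125 + [0] * 875)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(y_tr.mean(), y_te.mean())  # 0.125 0.125
```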

In [19]:
X = df[['day_overlap', 'slot_match', 'student_count', 'post_lec_count']]
y = df['is_best_appointment']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest model with class balancing
model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')
model.fit(X_train, y_train)

# Save the trained model
joblib.dump(model, 'appointment_model.pkl')

# Evaluate model
y_pred = model.predict(X_test)
print('\nClassification Report:')
print(classification_report(y_test, y_pred))
print('\nConfusion Matrix:')
print(confusion_matrix(y_test, y_pred))

# Feature importance
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
})
print('\nFeature Importance:')
print(feature_importance.sort_values('importance', ascending=False))


Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.94      0.97       175
           1       0.71      1.00      0.83        25

    accuracy                           0.95       200
   macro avg       0.86      0.97      0.90       200
weighted avg       0.96      0.95      0.95       200


Confusion Matrix:
[[165  10]
 [  0  25]]

Feature Importance:
          feature  importance
1      slot_match    0.944585
2   student_count    0.022406
0     day_overlap    0.021289
3  post_lec_count    0.011720
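The report above comes from a single 200-row test split. `slot_match` carries about 94% of the feature importance, which is consistent with the labeling rule. The sketch below gives a more stable estimate using stratified 5-fold cross-validation, scored by F1 on the positive class. To keep the block self-contained, the data is regenerated synthetically with the same rule. The scores will therefore not match the notebook's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
n = 1000
# Synthetic features with the notebook's names and plausible ranges (an assumption)
day_overlap = rng.integers(0, 5, n)
slot_match = rng.integers(0, 2, n)
count = rng.integers(1, 4, n)
post_lec = rng.integers(1, 4, n)
X = np.column_stack([day_overlap, slot_match, count, post_lec])

# Same rule as the notebook: 0.8 / 0.5 / 0 probability of a positive label
p = np.where(
    (slot_match == 1) & (day_overlap > 0),
    np.where((count == 1) & (post_lec == 1), 0.8,
             np.where((count <= 2) & (post_lec <= 2), 0.5, 0.0)),
    0.0,
)
y = (rng.random(n) < p).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight="balanced")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
print(scores.round(2), scores.mean().round(2))
```

The spread across the five folds indicates how much a single split's F1 can move on data of this size.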


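## Load Saved Model

The model was saved to `appointment_model.pkl` in the training cell. This cell reloads it with joblib, as the Flask backend would, and checks one optimal and one non-optimal input.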
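In a Flask backend, each request must be turned into a one-row DataFrame. The DataFrame needs the same four columns, in the same order, that were used for `X` during training. Below is a minimal sketch that assumes the request JSON uses the feature names as keys. `predict_best_appointment` and `FEATURE_COLUMNS` are hypothetical names. A stub model stands in for the object returned by `joblib.load("appointment_model.pkl")`, so the block runs without the file.

```python
import pandas as pd

# Same columns, same order, as X in the training cell
FEATURE_COLUMNS = ["day_overlap", "slot_match", "student_count", "post_lec_count"]


def predict_best_appointment(model, payload: dict) -> bool:
    """Hypothetical helper a Flask route could call with request.get_json()."""
    missing = [c for c in FEATURE_COLUMNS if c not in payload]
    if missing:
        # Fail loudly instead of predicting on incomplete input
        raise KeyError(f"missing features: {missing}")
    row = pd.DataFrame([[payload[c] for c in FEATURE_COLUMNS]], columns=FEATURE_COLUMNS)
    return bool(model.predict(row)[0] == 1)


class StubModel:
    """Stand-in for the loaded RandomForestClassifier: predicts 1 when slot_match == 1."""

    def predict(self, X):
        return (X["slot_match"] == 1).astype(int).to_numpy()


optimal = {"day_overlap": 2, "slot_match": 1, "student_count": 1, "post_lec_count": 1}
print(predict_best_appointment(StubModel(), optimal))  # True
```

Building the row from `FEATURE_COLUMNS` rather than from the payload's key order guards against clients that send the fields in a different order.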

In [20]:
import pandas as pd
import joblib

# Load the saved model
model = joblib.load("appointment_model.pkl")


# Optimal case (day and time match, low counts)
new_data = pd.DataFrame({
    'day_overlap': [2],
    'slot_match': [1],
    'student_count': [1],
    'post_lec_count': [1]
})

# Predict
prediction = model.predict(new_data)
print('\nExample Prediction:')
print(f'Best Appointment? {prediction[0] == 1}')

# Non-optimal case (no time match, higher counts)
new_data_non_optimal = pd.DataFrame({
    'day_overlap': [1],
    'slot_match': [0],
    'student_count': [2],
    'post_lec_count': [2]
})

prediction_non_optimal = model.predict(new_data_non_optimal)
print('Non-Optimal Case Prediction:')
print(f'Best Appointment? {prediction_non_optimal[0] == 1}')



Example Prediction:
Best Appointment? True
Non-Optimal Case Prediction:
Best Appointment? False


## Test Model

Test the model on two sample inputs, one optimal and one non-optimal.

In [21]:
# Example input (optimal case: high overlap, matching slot, low counts)
new_data = pd.DataFrame({
    'day_overlap': [2],
    'slot_match': [1],
    'student_count': [1],
    'post_lec_count': [1]
})

# Predict
prediction = model.predict(new_data)
print('\nExample Prediction:')
print(f'Best Appointment? {prediction[0] == 1}')

# Non-optimal case: no slot match
new_data_non_optimal = pd.DataFrame({
    'day_overlap': [1],
    'slot_match': [0],
    'student_count': [2],
    'post_lec_count': [2]
})

prediction_non_optimal = model.predict(new_data_non_optimal)
print('Non-Optimal Case Prediction:')
print(f'Best Appointment? {prediction_non_optimal[0] == 1}')


Example Prediction:
Best Appointment? True
Non-Optimal Case Prediction:
Best Appointment? False
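The predictions above are yes/no per pair, whereas the introduction talks about recommending slots. One way to get a ranking is to score each candidate slot with `predict_proba` and sort by score. This is a sketch under assumptions: the tiny training set and the three candidate slots are made up for illustration. In the notebook you would use the loaded `model` and features computed as in the Feature Engineering cell.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["day_overlap", "slot_match", "student_count", "post_lec_count"]

# Made-up training grid: positive only when the slot matches and a day is shared
train = pd.DataFrame(
    [(d, s, c, p) for d in range(4) for s in (0, 1) for c in (1, 2) for p in (1, 2)],
    columns=FEATURES,
)
y = ((train["slot_match"] == 1) & (train["day_overlap"] > 0)).astype(int)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(train, y)

# Hypothetical candidate slots for one student, already turned into features
candidates = pd.DataFrame(
    {
        "slot": ["Mon 08:30", "Tue 11:30", "Wed 13:00"],
        "day_overlap": [0, 2, 1],
        "slot_match": [1, 1, 0],
        "student_count": [1, 1, 1],
        "post_lec_count": [1, 1, 1],
    }
)
# Column 1 of predict_proba is the probability of class 1 ("best appointment")
candidates["score"] = model.predict_proba(candidates[FEATURES])[:, 1]
ranked = candidates.sort_values("score", ascending=False)
print(ranked[["slot", "score"]])
```

Sorting by probability rather than by the hard 0/1 prediction still yields an order when several candidates are all predicted positive.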
