Starting from Anzhelika's initial code, and with some help from ChatGPT in writing the actual code, this version should take each project's capacity into account and handle over-subscription, as required by the stable matching problem.

Remember: in a stable matching, no student and project would both rather be matched with each other than keep their current assignments.
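Stated more precisely for the capacitated case: write $\mu(s)$ for the project student $s$ is matched to, $\mu(p)$ for the set of students assigned to project $p$, $c_p$ for the capacity of $p$, and $\succ_s$, $\succ_p$ for the preferences of $s$ and $p$. An unmatched student is taken to prefer every project on their list to staying unmatched, and (as in the code below) a project with a free seat is assumed to accept any student. A pair $(s, p)$ blocks $\mu$ when

$$
(s, p) \text{ blocks } \mu \iff p \succ_s \mu(s) \ \text{ and } \ \Big( |\mu(p)| < c_p \ \text{ or } \ \exists\, s' \in \mu(p) : s \succ_p s' \Big)
$$

and $\mu$ is stable when no pair blocks it.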


In [1]:
import pandas as pd
from collections import defaultdict

In [6]:
# Preparation:

# Read excel files containing the student and project data
# 'students_df' contains student names and their project preferences
# 'projects_df' contains project names, their capacities, and preferences for students
students_df = pd.read_excel('test_data/td_1_students.xlsx')  
projects_df = pd.read_excel('test_data/td_1_projects.xlsx')  

# Initialize data structures for the algorithm:
# 'students' list contains the names of all students
# 'projects' list contains the names of all projects
# 'project_capacity' maps each project to its maximum student capacity
students = students_df['student_names'].tolist()
projects = projects_df['project_name'].tolist()
project_capacity = projects_df.set_index('project_name')['max_students'].to_dict()

# Preferences setup:
# 'student_prefs' maps each student to a list of their project preferences
# 'project_prefs' maps each project to a list of its preferred students
# iterrows() iterates over the rows; dropna() skips empty preference cells
student_prefs = {row['student_names']: row[['1st_choice', '2nd_choice', '3rd_choice']].dropna().tolist()
                 for _, row in students_df.iterrows()}
project_prefs = {row['project_name']: row[['1st_choice', '2nd_choice', '3rd_choice', '4th_choice', '5th_choice']].dropna().tolist()
                 for _, row in projects_df.iterrows()}
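To make the expected spreadsheet layout concrete, here is a small sketch that builds the same dictionaries from in-memory DataFrames instead of the Excel files. The demo data are hypothetical; only the column names come from the cell above. The helper `build_prefs` and the extra `project_rank` lookup are illustrative additions, not part of the notebook: `project_rank` turns each project's list into a dict so a rank comparison is a constant-time lookup rather than a `list.index()` call.

```python
import pandas as pd

STUDENT_COLS = ["1st_choice", "2nd_choice", "3rd_choice"]
PROJECT_COLS = ["1st_choice", "2nd_choice", "3rd_choice", "4th_choice", "5th_choice"]


def build_prefs(students_df, projects_df):
    """Build preference lists, capacities and a rank lookup from the two tables."""
    # dropna() removes empty cells, so shorter lists never contain NaN or None
    student_prefs = {row["student_names"]: row[STUDENT_COLS].dropna().tolist()
                     for _, row in students_df.iterrows()}
    project_prefs = {row["project_name"]: row[PROJECT_COLS].dropna().tolist()
                     for _, row in projects_df.iterrows()}
    project_capacity = projects_df.set_index("project_name")["max_students"].to_dict()
    # Hypothetical extra: rank lookup per project (0 = most preferred)
    project_rank = {p: {s: i for i, s in enumerate(prefs)} for p, prefs in project_prefs.items()}
    return student_prefs, project_prefs, project_capacity, project_rank


# Hypothetical demo tables using the same column names as the Excel files
demo_students = pd.DataFrame({
    "student_names": ["s1", "s2", "s3"],
    "1st_choice": ["p1", "p1", "p2"],
    "2nd_choice": ["p2", "p2", "p1"],
    "3rd_choice": ["p3", None, "p3"],  # s2 listed only two projects
})
demo_projects = pd.DataFrame({
    "project_name": ["p1", "p2", "p3"],
    "max_students": [1, 1, 1],
    "1st_choice": ["s2", "s1", "s3"],
    "2nd_choice": ["s1", "s3", None],
    "3rd_choice": ["s3", None, None],
    "4th_choice": [None, None, None],
    "5th_choice": [None, None, None],
})

demo_student_prefs, demo_project_prefs, demo_capacity, demo_rank = build_prefs(demo_students, demo_projects)
print(demo_student_prefs)  # {'s1': ['p1', 'p2', 'p3'], 's2': ['p1', 'p2'], 's3': ['p2', 'p1', 'p3']}
print(demo_rank["p1"])     # {'s2': 0, 's1': 1, 's3': 2}
```

Without `dropna()` on the student side, a student who lists fewer than three projects would get a missing value in their list, and looking it up in `project_capacity` would fail.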

In [8]:
# Matching algorithm:

# Initialize matching and availability:
# 'matches' maps students to their assigned projects
# 'project_assignments' maps projects to lists of assigned students
matches = {}
project_assignments = defaultdict(list)  # a project with no students yet maps to an empty list

# The algorithm repeatedly lets each unmatched student apply to their listed projects in order.
# A project with a free seat accepts the student; a full project accepts them only if it prefers
# them to one of its current students, who is then displaced and has to apply again.
# Students a project did not rank count as less preferred than every ranked student.
while len(matches) < len(students):
    progress = False  # did anyone get a place during this pass?
    for student in students:
        if student not in matches:
            for project in student_prefs[student]:
                # Check if the project can accept more students
                if len(project_assignments[project]) < project_capacity[project]:
                    # If yes, assign the student to the project
                    matches[student] = project
                    project_assignments[project].append(student)
                    progress = True
                    break
                else:
                    # Handling over-subscription:
                    # the project is full, so check whether it prefers the new student to a current assignee
                    current_assignees = project_assignments[project]
                    ranking = project_prefs[project]
                    # Rank the current assignees together with the new student (listed last, so on a tie
                    # the current assignee keeps the seat) and keep as many as the capacity allows
                    candidates = current_assignees + [student]
                    preferred_assignees = sorted(
                        candidates,
                        key=lambda s: ranking.index(s) if s in ranking else len(ranking),
                    )[:project_capacity[project]]

                    if student in preferred_assignees:
                        # The new student is preferred: replace the least preferred assigned student
                        to_remove = [s for s in current_assignees if s not in preferred_assignees][0]
                        project_assignments[project].remove(to_remove)
                        del matches[to_remove]  # the displaced student becomes unmatched again
                        project_assignments[project].append(student)
                        matches[student] = project
                        progress = True
                        break
    if not progress:
        # Nobody was placed during a whole pass: every remaining student has been turned down
        # by all of their listed projects, so stop instead of looping forever
        break

# Print the final student-project matches.
for student, project in matches.items():
    print(f"{student} is matched with {project}")
for student in students:
    if student not in matches:
        print(f"{student} could not be matched")

s1 is matched with p3
s2 is matched with p1
s3 is matched with p2
s4 is matched with p3
s5 is matched with p1
s6 is matched with p3
s7 is matched with p2
s8 is matched with p1
s9 is matched with p2


This code should ensure that:   
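- Every student is matched with one of their listed projects, as high in their preference order as possible (a student can stay unmatched if all of their listed projects fill up with students those projects prefer).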
- Projects do not exceed their capacity.
- The matching is stable: there is no student-project pair in which the student prefers the project to their current assignment and the project either has a free seat or prefers that student to one of its current students.
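These three properties can be checked directly on a result. The sketch below is a hypothetical helper (the name `check_matching` and its message strings are illustrative); it takes dictionaries shaped like the notebook's `matches`, `student_prefs`, `project_prefs` and `project_capacity`. Like the matching code, it assumes a project with a free seat accepts any student and that students a project did not rank come after all ranked ones.

```python
from collections import defaultdict


def check_matching(matches, student_prefs, project_prefs, project_capacity):
    """Return a list of problems found in `matches` (an empty list means all checks passed)."""
    problems = []

    # 1. Every student is matched.
    for s in student_prefs:
        if s not in matches:
            problems.append(f"unmatched student: {s}")

    # 2. No project exceeds its capacity.
    assigned = defaultdict(list)
    for s, p in matches.items():
        assigned[p].append(s)
    for p, members in assigned.items():
        if len(members) > project_capacity[p]:
            problems.append(f"over capacity: {p}")

    # 3. No blocking pair (s, p): s prefers p to its current project, and p either
    #    has a free seat or ranks s above one of its current members.
    def rank(p, s):
        prefs = project_prefs[p]
        return prefs.index(s) if s in prefs else len(prefs)  # unranked students come last

    for s, prefs in student_prefs.items():
        current = matches.get(s)
        # Projects s ranks above its current one (its whole list if s is unmatched)
        better = prefs[:prefs.index(current)] if current in prefs else prefs
        for p in better:
            members = assigned[p]
            if len(members) < project_capacity[p] or any(rank(p, s) < rank(p, m) for m in members):
                problems.append(f"blocking pair: ({s}, {p})")
    return problems


# Hypothetical example data
example_students = {"s1": ["p1", "p2"], "s2": ["p1", "p2"]}
example_projects = {"p1": ["s2", "s1"], "p2": ["s1", "s2"]}
example_capacity = {"p1": 1, "p2": 1}

print(check_matching({"s1": "p1", "s2": "p2"}, example_students, example_projects, example_capacity))
# ['blocking pair: (s2, p1)']
print(check_matching({"s1": "p2", "s2": "p1"}, example_students, example_projects, example_capacity))
# []
```

In the first matching, s2 and p1 would both rather be together, so it is not stable; the second has no blocking pair. On the notebook's result the call would be `check_matching(matches, student_prefs, project_prefs, project_capacity)`.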

This version takes project capacities into account and, when a project is over-subscribed, uses the project's preferences to decide whether to replace a currently matched student with a more preferred one.
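The displacement rule described above is the core of student-proposing deferred acceptance (Gale-Shapley, extended to project capacities). The sketch below is an alternative structure, not the notebook's code: it keeps a queue of students still looking for a place and, for each student, the index of the next project to apply to, so a displaced student continues down their list instead of restarting from the top on every pass. Students a project did not rank are assumed to be less preferred than all ranked ones; all names are illustrative.

```python
from collections import deque


def deferred_acceptance(student_prefs, project_prefs, project_capacity):
    """Student-proposing deferred acceptance with capacities. Returns (matches, unmatched)."""
    rank = {p: {s: i for i, s in enumerate(prefs)} for p, prefs in project_prefs.items()}

    def r(p, s):
        return rank[p].get(s, len(rank[p]))  # unranked students come after ranked ones

    next_choice = {s: 0 for s in student_prefs}   # index of the next project s applies to
    assigned = {p: [] for p in project_capacity}  # tentative members of each project
    free = deque(student_prefs)                   # students still looking for a place
    unmatched = []

    while free:
        s = free.popleft()
        if next_choice[s] >= len(student_prefs[s]):
            unmatched.append(s)                   # rejected by every listed project
            continue
        p = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        members = assigned[p]
        if len(members) < project_capacity[p]:
            members.append(s)                     # free seat: accept tentatively
            continue
        worst = max(members, key=lambda m: r(p, m), default=None)
        if worst is not None and r(p, s) < r(p, worst):
            members.remove(worst)                 # over-subscribed: displace the least preferred
            members.append(s)
            free.append(worst)                    # displaced student resumes at their next choice
        else:
            free.append(s)                        # rejected: s will try their next choice

    matches = {s: p for p, members in assigned.items() for s in members}
    return matches, unmatched


# Hypothetical example: p1 has one seat and ranks s3 first
example_students = {"s1": ["p1", "p2"], "s2": ["p1", "p2"], "s3": ["p1", "p2"]}
example_projects = {"p1": ["s3", "s1", "s2"], "p2": ["s1", "s2", "s3"]}
example_capacity = {"p1": 1, "p2": 2}

matches, unmatched = deferred_acceptance(example_students, example_projects, example_capacity)
print(matches, unmatched)  # {'s3': 'p1', 's2': 'p2', 's1': 'p2'} []
```

Here s1 first takes the only seat in p1, s2 is turned away, and s3 then displaces s1 because p1 ranks s3 first; s1 moves on to p2. A project's set of students only gets better as the algorithm runs, so a student never needs to reapply to a project that already turned them down.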