## Imports ##

In [57]:
#imports
import math, random, copy, datetime

## Student Object 
Defines a class called `Student`. Each instance holds one student's record, with every column of the csv data file stored as an attribute.

***

### Attributes 
1. tutorial_group (_str_): First column in the data file (index 0), specifies which tutorial group this student belongs to. E.g. G-100

    ```self.tutorial_group = tutorial_group```

2. id (_int_): Second column in the data file (index 1), serves as the primary key since it uniquely identifies each student (similar to an NRIC number). E.g. 2432

    ```self.id = id```

3. school (_str_): Third column in the data file (index 2), specifies the school that this student belongs to. E.g. CCDS

    ```self.school = school```

4. name (_str_): Fourth column in the data file (index 3), contains the name of this student. E.g. John Doe

    ```self.name = name```

5. gender (_str_): Fifth column in the data file (index 4), is either 'Male' or 'Female', specifying the gender of this student.

    ```self.gender = gender```

6. cgpa (_float_): Sixth and last column in data file (index 5), specifies the cgpa of this student. E.g. 4.31

    ```self.cgpa = cgpa```
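
As a minimal sketch of how one row of the data file could map onto these attributes, the snippet below builds a student from a list of six strings. The `student_from_row` helper and the sample row are hypothetical, and the casts to `int` and `float` are assumptions based on the types listed above.

```python
class Student:
    """Stand-in with the six attributes listed above."""
    def __init__(self, tutorial_group, id, school, name, gender, cgpa):
        self.tutorial_group = tutorial_group
        self.id = id
        self.school = school
        self.name = name
        self.gender = gender
        self.cgpa = cgpa


def student_from_row(row):
    """Hypothetical helper: turn one split csv row (six strings) into a Student."""
    tutorial_group, student_id, school, name, gender, cgpa = row
    # Assumed casts: id is documented as int and cgpa as float
    return Student(tutorial_group, int(student_id), school, name, gender, float(cgpa))


# Made-up row in column order: tutorial group, id, school, name, gender, cgpa
row = "G-100,2432,CCDS,John Doe,Male,4.31".split(",")
student = student_from_row(row)
print(student.id, student.school, student.cgpa)
```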


In [58]:
#Define Student structure
class Student:
    def __init__(self, tutorial_group, id, school, name, gender, cgpa) -> None:
        """To initialise a student object

        Args:
            tutorial_group (str): Student's class
            id (int): Student identifier
            school (str): School that student belongs to
            name (str): Name of student
            gender (str): Either Male or Female
            cgpa (float): GPA of the student
        """
        self.tutorial_group = tutorial_group
        self.id = id
        self.school = school
        self.name = name
        self.gender = gender
        self.cgpa = cgpa

## Split_Student Class

Class containing the methods ***split_students_by_school***, ***split_students_by_gender*** and ***split_students_by_cgpa***, each of which splits students by one specific factor: 'school', 'gender' or 'cgpa'.

***

### Split Students by ***School*** Function 

##### **Input:** ```split_students_by_school(self, students, **kwargs)```

1. students (list of student objects, that need to be split by school)
2. **kwargs (to allow for generic calling of functions with getattr)

##### **Output:**

1. A dictionary, where key: school & value: list of students in that school

##### **How it works**

1. A dictionary called *sorted_students* is first initialized, which will be the one returned in **output**.

   ```sorted_students = {}```

2. We use a _for loop_ to start iterating through each student
    
    ``` for student in students:```

3. For each student, we first check whether the *school* attribute of the student is already among the keys of *sorted_students*.
    
    ```if student.school not in sorted_students.keys():```

4. If the school is **not** already present as a key in *sorted_students*, we will create a new key:value pair in *sorted_students*, where the key is the school of the current student, and the value is the current student object in a list. 
    
    ```sorted_students[student.school] = [student]```

5. Else, if the school is present, we simply append the current student object to the list for that school, by indexing *sorted_students* with the school of the current student as the key.

    ```sorted_students[student.school].append(student)```
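
The five steps above can be sketched as a standalone function. The `SimpleNamespace` records are hypothetical stand-ins for student objects, and the membership test uses `in sorted_students` directly, which behaves the same as checking against `.keys()`.

```python
from types import SimpleNamespace


def split_by_school(students):
    """Group students into a dict keyed by school, following the steps above."""
    sorted_students = {}
    for student in students:
        if student.school not in sorted_students:
            # First student seen from this school: start a new list
            sorted_students[student.school] = [student]
        else:
            # School already recorded: add to its list
            sorted_students[student.school].append(student)
    return sorted_students


# Made-up records standing in for Student objects
students = [
    SimpleNamespace(name="A", school="CCDS"),
    SimpleNamespace(name="B", school="EEE"),
    SimpleNamespace(name="C", school="CCDS"),
]
by_school = split_by_school(students)
print({school: [s.name for s in group] for school, group in by_school.items()})
# {'CCDS': ['A', 'C'], 'EEE': ['B']}
```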

***

### Split Students by ***Gender*** Function 

##### **Input:** ```split_students_by_gender(self, students, **kwargs)```

1. students (list of student objects, that need to be split by gender)
2. **kwargs (to allow for generic calling of functions with getattr)

##### **Output:**

1. A dictionary, where key: gender & value: list of students of that gender

##### **How it works**

1. A dictionary called *sorted_students* containing two key:value pairs is first initialized, which will be the one returned in **output**. The two key:value pairs are *"Male":[]* & *"Female":[]*, where the keys are the 2 possible genders and each value is a list that will contain the students of that gender.

   ```sorted_students = {"Male":[], "Female":[]}```

2. We use a _for loop_ to start iterating through each student
    
    ``` for student in students:```

3. For each student, we first check if the *gender* attribute of the student is "Male".
    
    ```if student.gender == "Male":```

4. If the _gender_ attribute is "Male", this means that this student is male, and we will append this student to the list in the key:value pair where the key is "Male" in *sorted_students*. 
    
    ```sorted_students["Male"].append(student)```

5. Else, the only other possibility is that the student is "Female", and hence we will append this student to the list in the key:value pair where the key is "Female" in *sorted_students*.

    ```sorted_students["Female"].append(student)```

6. To account for the cases where a tutorial group only has "Female"s, we check the length of the list in the key:value pair where the key is "Male" in *sorted_students*. If there are no "Male" members present, we would simply delete this key:value pair.

    ```
    if len(sorted_students["Male"]) == 0:
        del sorted_students["Male"]
    ```

7. Similarly, to account for a tutorial group with only "Male"s, we check the length of the list in the key:value pair where the key is "Female" in *sorted_students*. If there are no "Female" members present, we would simply delete this key:value pair.

    ```
    if len(sorted_students["Female"]) == 0:
        del sorted_students["Female"]
    ```
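
As a rough sketch of the steps above, the function below reproduces the split and the removal of empty keys for a group containing only one gender. The `SimpleNamespace` records are hypothetical stand-ins for student objects. Note that, as in the steps above, any value other than "Male" ends up in the "Female" list.

```python
from types import SimpleNamespace


def split_by_gender(students):
    """Split students into "Male" and "Female" lists, dropping an empty list."""
    sorted_students = {"Male": [], "Female": []}
    for student in students:
        if student.gender == "Male":
            sorted_students["Male"].append(student)
        else:
            sorted_students["Female"].append(student)

    # Remove a gender that has no members in this group
    if len(sorted_students["Male"]) == 0:
        del sorted_students["Male"]
    if len(sorted_students["Female"]) == 0:
        del sorted_students["Female"]
    return sorted_students


# Made-up group with no male students
all_female = [
    SimpleNamespace(name="A", gender="Female"),
    SimpleNamespace(name="B", gender="Female"),
]
result = split_by_gender(all_female)
print(list(result.keys()))
# ['Female']
```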

***

### Split Students by ***CGPA*** Function 

##### **Input:** ```split_students_by_cgpa(self, students, **kwargs)```

1. students (list of student objects, that need to be split by cgpa)
2. **kwargs (to allow for generic calling of functions with getattr)
    * Should include a parameter size (which is the size of groups to split into)

##### **Output:**

1. A dictionary, where key: cgpa band & value: list of students in that cgpa band

##### **How it works**

1. We first retrieve the *size* variable from kwargs, which is the size of groups to split the whole tutorial class into

    ```size = kwargs["size"]```

2. Next, we find the percentile width of each band by calculating 100/(*size*). For example, if there are 5 people in each group, 100/5 = 20, so we would be splitting the students at the percentiles 20%, 40%, 60% & 80%.

    ```percentile = 100/size```

3. Then, we sort the students by their cgpa in ascending order, using .sort() with the key set to the *cgpa* attribute of the student object through a lambda function.

    ```students.sort(key=lambda student: student.cgpa)```

4. Next, we will split the students into the different cgpa bands. But first we initialize the variables used by the while loop:
    * A dictionary called *sorted_students*, which will be the one returned in **output**.

        ```sorted_students = {}```

    * A variable called *band_number*, which serves as a counter for the while loop and label of that specific band. This variable will be incremented upon the completion of each iteration of the while loop.

        ```band_number = 0```

5. Now, we calculate the index at which we need to cut off the students for each band. To find this cutoff, we have 3 steps.
    * First, we convert the percentile into a fraction by dividing it by 100.

        ```percentile/100```

    * Next, we take this float and multiply it by the total number of students, which gives us the index of the student at which to stop.

        ```(percentile/100)*len(students)```
    
    * Lastly, since the above value is a float, we would wish to round it up, using *math.ceil()* in this case.

        ```cutoff = math.ceil((percentile/100)*len(students))```

6. For the while loop, we set an exit condition of *band_number* >= *size*, as the number of bands should be the same as the size of groups, with the band number being zero indexed.

    ```while band_number < size:```

7. In the while loop, for each band, we would create a new key:value pair in *sorted_students* where the key is the *band_number* and the value being the list of students sliced at our previously calculated *cutoff* point. 

    ```sorted_students[band_number] = students[:cutoff]```

8. Then, we need to remove the students already assigned to a band from those yet to be assigned, deleting them from the list of *students*

    ```del students[:cutoff]```

9. In the case where the students run out before all bands have been filled (for example, when there are fewer students than bands), we add a break condition so that no empty bands are created.

    ```
    if len(students) == 0:
        break
    ```

10. Lastly, at the end of each iteration, we need to increment the *band_number* by 1.

    ```band_number += 1```
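
A small worked example may help make the cutoff arithmetic concrete. The sketch below follows the steps above with one deliberate difference: it sorts a copy of the list with `sorted()`, whereas the steps above sort and empty the list that is passed in. The `SimpleNamespace` records and the cgpa values are made up for illustration.

```python
import math
from types import SimpleNamespace


def split_by_cgpa(students, size):
    """Split students into at most `size` cgpa bands, lowest cgpa first."""
    percentile = 100 / size
    # Work on a sorted copy so the caller's list is left untouched (assumption of this sketch)
    remaining = sorted(students, key=lambda student: student.cgpa)
    cutoff = math.ceil((percentile / 100) * len(remaining))

    sorted_students = {}
    band_number = 0
    while band_number < size:
        sorted_students[band_number] = remaining[:cutoff]
        del remaining[:cutoff]
        if len(remaining) == 0:
            break
        band_number += 1
    return sorted_students


# 7 made-up students, group size 3: cutoff = ceil(7 / 3) = 3 students per band
cgpas = [3.2, 4.8, 2.9, 4.1, 3.7, 4.5, 3.0]
students = [SimpleNamespace(cgpa=value) for value in cgpas]
bands = split_by_cgpa(students, size=3)
print({band: [s.cgpa for s in group] for band, group in bands.items()})
# {0: [2.9, 3.0, 3.2], 1: [3.7, 4.1, 4.5], 2: [4.8]}
```

The last band holds only one student because the cutoff is rounded up, so earlier bands take the extra members.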

In [59]:
#Class of functions to split students
class Split_Student:
    
    def __init__(self):
        pass
    
    
    #Functions to split students into different categories
    def split_students_by_school(self, students, **kwargs) -> dict:
        """Splits students by school

        Args:
            students (list): List of Student objects to be split
            
        Returns:
            dict: where key is the school, and the value is a list of students in the school 
        """
        #Separate students by school
        sorted_students = {}

        for student in students:
            if student.school not in sorted_students.keys(): #Current student's school is not recorded yet
                sorted_students[student.school] = [student]

            else: #student's school is already present
                sorted_students[student.school].append(student)

        return sorted_students


    def split_students_by_gender(self, students, **kwargs) -> dict:
        """Splits students by gender

        Args:
            students (list): List of Student objects to be split

        Returns:
            dict: where key is the gender, and the value is a list of students split by gender
        """
        #Separate into male & female
        sorted_students = {"Male":[], "Female":[]}

        for student in students:
            if student.gender == "Male": #if student is male
                sorted_students["Male"].append(student)

            else: #if student is female
                sorted_students["Female"].append(student)


        #If no male or female, remove it from dict
        if len(sorted_students["Male"]) == 0:
            del sorted_students["Male"]

        if len(sorted_students["Female"]) == 0:
            del sorted_students["Female"]

        return sorted_students


    def split_students_by_cgpa(self, students, **kwargs) -> dict:
        """Splits students by cgpa

        Args:
            students (list): List of Student objects to be split
            size (int): size of groups to sort into, expected to be input from kwargs

        Returns:
            dict: where key is the cgpa band number, and the value is a list of students in that band
        """
        #Retrieve the group size from kwargs
        size = kwargs["size"]
        
        #Find the percentile width of each band, e.g. 50 for groups of 2
        percentile = 100/size

        #Sort students based on cgpa first
        students.sort(key=lambda student: student.cgpa)

        #Split at the various percentiles
        sorted_students = {}
        band_number = 0
        cutoff = math.ceil((percentile/100)*len(students))

        while band_number < size:
            sorted_students[band_number] = students[:cutoff]

            #Remove assigned students from unassigned
            del students[:cutoff]

            #Check if any students left after the deletion, if no exit
            if len(students) == 0:
                break

            #increment band_number
            band_number += 1

        return sorted_students

### Choose Students Function

This function is designed to randomly assign the students to teams, while taking into consideration the balance of the three factors, **School**, **Gender** & **CGPA**.

#### **Input:** ```choose_students(students, schools, factor_order, number_of_groups, size)```

1. students (dict): nested dictionary of students, split layer by layer in factor order
2. schools (list): list of schools that students are from
3. factor_order (list): order in which the factors were considered, with first factor considered at index 0
4. number_of_groups (int): number of groups to split into
5. size (int): Size of groups to sort into

#### **Output:**

1. A list containing (*number_of_groups*) lists, where each nested list is a group containing the student objects of the students who belong in that group
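
To make the shapes of the input and output concrete, here is a small sketch assuming a factor order of gender, then school, then cgpa band. Short strings stand in for student objects, and the `count_students` helper and all values are hypothetical.

```python
# Hypothetical layered input for factor_order = ["gender", "school", "cgpa"]
students = {
    "Male": {
        "CCDS": {0: ["s1"], 1: ["s2"]},
        "EEE": {0: ["s3"]},
    },
    "Female": {
        "CCDS": {1: ["s4"]},
    },
}


def count_students(layered):
    """Count the students stored at the bottom layer of the nested dict."""
    return sum(
        len(profile)
        for second_layer in layered.values()
        for third_layer in second_layer.values()
        for profile in third_layer.values()
    )


# One lookup per factor reaches the list of students sharing that profile
profile = students["Male"]["CCDS"][1]
print(profile)                   # ['s2']
print(count_students(students))  # 4

# A possible output for 2 groups of size 2: a list of lists
example_output = [["s1", "s4"], ["s2", "s3"]]
```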

#### **How it works**

1. A list called *all_grouping* is initialized, which will be the list returned in the **output**

    ```all_grouping = []```

2. We initialize 3 variables, *school_choices*, *gender_choices*, *cgpa_choices*, which are the possible selections to make in choosing a student.
    * *school_choices* : Same as the *schools* input, which is a list of all unique schools that the students belong to.

        ```school_choices = schools```

    * *gender_choices* : A List of 2 choices, "Male" or "Female"

        ```gender_choices = ["Male", "Female"]```

    * *cgpa_choices* : A list of the cgpa band numbers, from 0 to size-1

        ```cgpa_choices = [band for band in range(size)]```

3. Next, we enter a for loop that will iterate the same number of times as the *number_of_groups*, where in each iteration we select the students for each group.

    ```for grouping in range(number_of_groups):```

4. Then, we make a copy of each of the choices lists using *copy.deepcopy()* to ensure that brand new instances of the lists are created, instead of references to the original lists.

    ```
    schools_not_selected_yet = copy.deepcopy(school_choices)
    gender_not_selected_yet = copy.deepcopy(gender_choices)
    cgpas_not_selected_yet = copy.deepcopy(cgpa_choices)
    ```

5. Before we start on selecting students, we first initialize 2 variables *current_group* and *students_selected*.
    * *current_group* : An empty list where selected students will be appended into.

        ```current_group = []```
    
    * *students_selected* : 0, which is the number of students already added into the current group, and will be incremented after each successful choice.

        ```students_selected = 0```

6. For the while loop, we set a break condition of *students_selected* >= *size*, so the while loop will iterate until the number of students selected is the same as the required group size.

    ```while students_selected < size:```

7. Firstly, we will work on selecting a school.
    * Before choosing a school, we check if the *schools_not_selected_yet* is empty, which means that all schools have been chosen at least once. This would mean that we can refresh the choices, equating this variable to *school_choices* again with *copy.deepcopy()*.

        ```
        if len(schools_not_selected_yet) == 0: 
            schools_not_selected_yet = copy.deepcopy(school_choices)
        ```
    
    * Next, we reseed *random* with the current timestamp from the *datetime* module, so that the choices vary from run to run. (Python's *random* module is already seeded automatically on import, so this step is optional.)

        ```random.seed(datetime.datetime.now().timestamp())```
    
    * Now, we make the choice for the school of the chosen student with *random.choice()*.

        ```school_choice = random.choice(schools_not_selected_yet)```
    
    * To avoid picking the same school again while other schools are still available, we remove the chosen school from this iteration of *schools_not_selected_yet*

        ```schools_not_selected_yet.remove(school_choice)```

8. Secondly, we perform a similar operation for selecting the gender of the student.
    * Before choosing a gender, we check if the *gender_not_selected_yet* is empty, which means that all genders have been chosen at least once. This would mean that we can refresh the choices, equating this variable to *gender_choices* again with *copy.deepcopy()*.

        ```
        if len(gender_not_selected_yet) == 0: 
            gender_not_selected_yet = copy.deepcopy(gender_choices)
        ```
    
    * Next, we reseed *random* with the current timestamp from the *datetime* module, so that the choices vary from run to run.

        ```random.seed(datetime.datetime.now().timestamp())```
    
    * Now, we make the choice for the gender of the chosen student with *random.choice()*.

        ```gender_choice = random.choice(gender_not_selected_yet)```
    
    * To avoid picking the same gender again while the other gender is still available, we remove the chosen gender from this iteration of *gender_not_selected_yet*

        ```gender_not_selected_yet.remove(gender_choice)```

9. Thirdly, we repeat this operation once again, but this time for choosing the cgpa of the student.
    * Before choosing a cgpa, we check if the *cgpas_not_selected_yet* is empty, which means that all cgpa bands have been chosen at least once. This would mean that we can refresh the choices, equating this variable to *cgpa_choices* again with *copy.deepcopy()*.

        ```
        if len(cgpas_not_selected_yet) == 0:
            cgpas_not_selected_yet = copy.deepcopy(cgpa_choices)
        ```
    
    * Next, we reseed *random* with the current timestamp from the *datetime* module, so that the choices vary from run to run.

        ```random.seed(datetime.datetime.now().timestamp())```
    
    * Now, we make the choice for the cgpa band of the chosen student with *random.choice()*.

        ```cgpa_choice = random.choice(cgpas_not_selected_yet)```
    
    * To avoid picking the same cgpa band again while other bands are still available, we remove the chosen cgpa band from this iteration of *cgpas_not_selected_yet*

        ```cgpas_not_selected_yet.remove(cgpa_choice)```

10. Having made the choices for each factor, stored in *school_choice*, *gender_choice* and *cgpa_choice*, we can attempt to retrieve the student object. In this case, we use **try except** to account for cases where the chosen combination of school, gender and cgpa band has no remaining students. In these cases, the error is caught by the exception handlers and a new attempt is made in the next while loop iteration.

    ```
    try:
        ...  #Steps 11 to 16 go here

    except IndexError:
        pass

    except KeyError:
        pass
    ```

11. In retrieving the correct student object, we need to arrange the choices in the correct order. Using a dictionary and *factor_order*, we are able to convert the three choices into *first_choice*, *second_choice* and *third_choice* respectively based on the *factor_order*.

    ```
    choices = {"school": school_choice, "gender": gender_choice, "cgpa": cgpa_choice}
                
    first_choice = choices[factor_order[0]]
    second_choice = choices[factor_order[1]]
    third_choice = choices[factor_order[2]]
    ```

12. Since there may be multiple students fitting the same profile, we add an index of 0 to select the first person in the list.

    ```chosen_one = students[first_choice][second_choice][third_choice][0]```

13. Once a student is selected, we remove the student from *students* to prevent them from being chosen again.

    ```students[first_choice][second_choice][third_choice].remove(chosen_one)```

14. Next, we add the chosen student to the *current_group* list which should eventually contain everyone in the group.

    ```current_group.append(chosen_one)```

15. Upon this successful choice of a student, only then would we increment the counter *students_selected* by 1.

    ```students_selected += 1```

16. Having removed a student, we have to check if the current profile is empty in *students*. If yes, then we should delete it for better housekeeping. We do this to all 3 layers, from *third_choice* to *first_choice*.

    ```
    if len(students[first_choice][second_choice][third_choice]) == 0:
        del students[first_choice][second_choice][third_choice]

    if len(students[first_choice][second_choice]) == 0:
        del students[first_choice][second_choice]

    if len(students[first_choice]) == 0:
        del students[first_choice]
    ```

17. Exiting the while loop, which means that we have successfully selected everyone for the *current_group*, we append it into the *all_grouping* list and move on to the next iteration (next group)

    ```all_grouping.append(current_group)```
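
The pattern repeated in steps 7 to 9 (pick at random from the options not yet used, then refresh the pool once it runs out) can be sketched in isolation. The `pick_without_repeat` helper is hypothetical and is not part of the notebook. It refills the pool in place rather than rebinding the variable, and it uses a plain `list()` copy, which is enough for flat lists of strings or integers.

```python
import random


def pick_without_repeat(not_selected_yet, all_choices):
    """Hypothetical helper: pick a random unused option, refreshing the pool when empty."""
    if len(not_selected_yet) == 0:
        # Every option has been used once already, so all become available again
        not_selected_yet.extend(all_choices)
    choice = random.choice(not_selected_yet)
    not_selected_yet.remove(choice)
    return choice


# Made-up list of schools
school_choices = ["CCDS", "EEE", "MAE"]
schools_not_selected_yet = list(school_choices)

picks = [pick_without_repeat(schools_not_selected_yet, school_choices) for _ in range(6)]
print(picks[:3])  # each school appears exactly once, in random order
print(picks[3:])  # the pool was refreshed, so each school appears once more
```

Within every run of three picks, no school repeats, which is the balancing effect the steps above aim for.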

In [60]:
#Grouping students into their respective teams (REDO)
def choose_students(students, schools, factor_order, number_of_groups, size) -> list:
    """Splits students into number of groups specified, with balanced teams based on school, gender and cgpa

    Args:
        students (dict): nested dict of students, split by the factors in factor order
        schools (list): list of schools that students are from
        factor_order (list): order in which the factors were considered, with first factor considered at index 0
        number_of_groups (int): number of groups to split into
        size (int): Size of groups to sort into

    Returns:
        list: Contains lists, where each list is a group of students
    """
    #Initialise the final grouping list
    all_grouping = []

    #Possible profiles
    school_choices = schools #Schools
    gender_choices = ["Male", "Female"]
    cgpa_choices = [band for band in range(size)] #GPA bands


    for grouping in range(number_of_groups):
        #Duplicate the choices so it can be edited, using copy.deepcopy() so it does not affect original copy
        schools_not_selected_yet = copy.deepcopy(school_choices)
        gender_not_selected_yet = copy.deepcopy(gender_choices)
        cgpas_not_selected_yet = copy.deepcopy(cgpa_choices)

        #For each group iteration
        current_group = []
        students_selected = 0
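        #Also stop once no students remain (e.g. last group when the class size is not a multiple of size), otherwise the loop never ends
        while students_selected < size and len(students) > 0: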
            
            #Select a school
            if len(schools_not_selected_yet) == 0: #all schools chosen once already
                schools_not_selected_yet = copy.deepcopy(school_choices)

            random.seed(datetime.datetime.now().timestamp()) #Changes seed every time a random choice is made, increasing randomness
            
            school_choice = random.choice(schools_not_selected_yet) #of those not chosen randomly choose 1
            schools_not_selected_yet.remove(school_choice) #Prevent repeat of same choice when there are other options


            #Select a Gender
            if len(gender_not_selected_yet) == 0: #all genders chosen once already
                gender_not_selected_yet = copy.deepcopy(gender_choices)

            random.seed(datetime.datetime.now().timestamp()) #Changes seed every time a random choice is made, increasing randomness
            
            gender_choice = random.choice(gender_not_selected_yet) #of those not chosen randomly choose 1
            gender_not_selected_yet.remove(gender_choice) #Prevent repeat of same choice when there are other options


            #Select a cgpa band
            if len(cgpas_not_selected_yet) == 0:
                cgpas_not_selected_yet = copy.deepcopy(cgpa_choices)

            random.seed(datetime.datetime.now().timestamp()) #Changes seed every time a random choice is made, increasing randomness
            
            cgpa_choice = random.choice(cgpas_not_selected_yet) #of those not chosen randomly choose 1
            cgpas_not_selected_yet.remove(cgpa_choice) #Prevent repeat of same choice when there are other options

            

            try: 
                #Convert the choices in the same order, based on factor order
                choices = {"school": school_choice, "gender": gender_choice, "cgpa": cgpa_choice}
                
                first_choice = choices[factor_order[0]]
                second_choice = choices[factor_order[1]]
                third_choice = choices[factor_order[2]]

                #Extract that student
                chosen_one = students[first_choice][second_choice][third_choice][0] #Take the first one in the chosen profile

                #Remove the chosen student from future choices
                students[first_choice][second_choice][third_choice].remove(chosen_one)

                #Add the chosen student to the group
                current_group.append(chosen_one)

                #Increment the counter
                students_selected += 1



                #Check if the chosen one's parents are empty, if yes remove them
                #Check third choice
                if len(students[first_choice][second_choice][third_choice]) == 0:
                    del students[first_choice][second_choice][third_choice]

                #Check Second Choice
                if len(students[first_choice][second_choice]) == 0:
                    del students[first_choice][second_choice]

                #Check First Choice
                if len(students[first_choice]) == 0:
                    del students[first_choice]

            except IndexError:
                pass

            except KeyError:
                pass


        #Add current group to all groups
        all_grouping.append(current_group)


    return all_grouping

## Sort Students Function

This function is designed to first decide on the order in which the factors are considered, based on how many unique values each factor has, then split the students with the methods of the *Split_Student* class in that order. After the split, it feeds the layered data into the *choose_students* function to group them.

### **Input:** ```sort_students(students, size)```

1. students (list): List of Student objects
2. size (int): Size of groups to sort into

### **Output**

1. A list containing (*number_of_groups*) lists, where each nested list is a group containing the student objects of the students who belong in that group

### How it works

1. Find the uniqueness of each factor by calculating the number of unique elements in each factor in the *students* data.
    * Schools Factor: Length of the set of all schools of the students, where a set in python ensures that all elements it contains are unique.
    
    ```schools = len(set([student.school for student in students]))```

    * Gender Factor: Length of the set of all genders of the students, where a set in python ensures that all elements it contains are unique.

    ```genders = len(set([student.gender for student in students]))```

    * CGPA Bands: Same as the size of each group, as we would like each member to come from a different band to ensure balance.

    ```cgpa_bands = size```

2. For better selection, we put these values into a dictionary where the key is "school", "gender" and "cgpa" respectively.

    ```factors = {"school": schools, "gender": genders, "cgpa": cgpa_bands}```

3. In deciding the order of factors, we select the factor with the **fewest** unique elements to be the first factor, and the factor with the **most** unique elements to be the last factor. This ensures that when choosing the students based on the factors, all factors are considered so that the groups are balanced. What we want our data to look like after the splitting is a pyramid, with the fewest splits at the top and the most splits at the bottom.

    * E.g. When the first factor has 2 unique elements (A,B), second factor 3 unique elements (C,D,E) and third factor 4 unique elements (F,G,H,I), our list of students should look like this:

        ```
        Layer 1:                                     A                                                           B                           
                                   /                 |                 \                       /                 |                 \         
        Layer 2:                  C                  D                  E                     C                  D                  E        
                            /   /   \   \      /   /   \   \      /   /   \   \         /   /   \   \      /   /   \   \      /   /   \   \  
        Layer 3:           F   G     H   I    F   G     H   I    F   G     H   I       F   G     H   I    F   G     H   I    F   G     H   I 
        ```
    
    * Hence, we use *min* function to find the factors with the least unique elements. (We delete the factor after its order has been decided)

        ```
        first_factor = min(factors, key=factors.get)
        del factors[first_factor]
        
        second_factor = min(factors, key=factors.get)
        del factors[second_factor]
        
        third_factor = min(factors, key=factors.get)
        del factors[third_factor]
        ```

4. Before accessing the splitting functions, we first have to create an instance of the *Split_Student()* class.

    ```spliting_functions = Split_Student()```

5. For the first factor, we can simply feed the *students* data and *size* of groups in, using *getattr* function to call our intended function that splits by the first factor.

    ```sorted_students = getattr(spliting_functions, "split_students_by_" + first_factor)(students, size=size)```

6. For the second factor, we have to access the values in the *sorted_students* to sort it by the second factor. We do this by using the *items()* method, and feeding the value to our intended function for each key:value pair.

    ```
    for first_key, first_value in sorted_students.items(): #Access Second layer
        sorted_students[first_key] = getattr(spliting_functions, "split_students_by_" + second_factor)(first_value, size=size)
    ```

7. Similarly, we do the same for the third factor. However, since the students have already been split by the first two factors, we have **2** layers to go through, hence **2** for loops.

    ```
    for first_key, first_value in sorted_students.items(): #Access Second layer
        for second_key,second_value in first_value.items(): #Access Third layer
            sorted_students[first_key][second_key] = getattr(spliting_functions, "split_students_by_" + third_factor)(second_value, size=size)
    ```

8. With our students already split, we initialize a few variables before choosing the students.
    * *no_of_groups* : The number of groups to be split into based on the number of students. *math.ceil()* is used to round up the result.

        ```no_of_groups = math.ceil(len(students)/size)```
    
    * *factor_order* : The factors in a list, with the *first_factor* at start and *third_factor* last.

        ```factor_order = [first_factor, second_factor, third_factor]```
    
    * *schools* : List of the unique schools that the students belong to, found using a set, which keeps only one copy of each element

        ```schools = list(set([student.school for student in students]))```

9. Next, we call the *choose_students()* function to sort the students into their respective groupings, with *final_grouping* being the finalized grouping for all students in this specific tutorial group.

    ```final_grouping = choose_students(sorted_students, schools, factor_order, no_of_groups, size)```
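
Two ideas from the steps above can be sketched together: ordering the factors with repeated calls to `min`, and calling a method by name with `getattr`. The `order_factors` helper and the `Splitter` class are hypothetical stand-ins, and the counts are made up. The split methods here only return a description string rather than splitting anything.

```python
def order_factors(factors):
    """Return factor names ordered from fewest to most unique values."""
    remaining = dict(factors)  # copy, so the caller's dict is not emptied
    order = []
    while remaining:
        factor = min(remaining, key=remaining.get)
        order.append(factor)
        del remaining[factor]
    return order


class Splitter:
    """Hypothetical stand-in for a class of split_students_by_* methods."""

    def split_students_by_gender(self, students, **kwargs):
        return f"gender split of {len(students)} students"

    def split_students_by_cgpa(self, students, **kwargs):
        return f"cgpa split of {len(students)} students into {kwargs['size']} bands"


# Made-up counts: 4 schools, 2 genders, groups of 5
factors = {"school": 4, "gender": 2, "cgpa": 5}
factor_order = order_factors(factors)
print(factor_order)
# ['gender', 'school', 'cgpa']

# Build the method name from the factor and call it. Every method accepts
# **kwargs, so size can always be passed even when a method ignores it.
splitter = Splitter()
method = getattr(splitter, "split_students_by_" + factor_order[0])
print(method(["a", "b", "c"], size=5))
# gender split of 3 students
```

When two factors have the same count, `min` returns the first one it encounters, so ties are resolved by the insertion order of the dictionary.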

In [61]:
#Parent function to execute subfunctions for organising students and grouping into teams
def sort_students(students, size) -> list:
    """Sort students based on School, Gender, CGPA in groups of given size

    Args:
        students (list): List of Student objects
        size (int): Size of groups to sort into

    Returns:
        list: list of students sorted into groups of given size
    """
    #Decide order of factors based on unique values
    schools = len(set([student.school for student in students]))
    genders = len(set([student.gender for student in students]))
    cgpa_bands = size #we are splitting gpa bands by number of ppl in each group

    factors = {"school": schools, "gender": genders, "cgpa": cgpa_bands}
    
    #First factor to consider
    first_factor = min(factors, key=factors.get)
    del factors[first_factor]
    
    #Second factor to consider
    second_factor = min(factors, key=factors.get)
    del factors[second_factor]
    
    #Third factor to consider
    third_factor = min(factors, key=factors.get)
    del factors[third_factor]
    
    
    #Split and organise students based on order of factors
    spliting_functions = Split_Student()
    #First factor
    sorted_students = getattr(spliting_functions, "split_students_by_" + first_factor)(students, size=size)
    
    #Second factor
    for first_key, first_value in sorted_students.items(): #Access Second layer
        sorted_students[first_key] = getattr(spliting_functions, "split_students_by_" + second_factor)(first_value, size=size)
    
    #Third factor
    for first_key, first_value in sorted_students.items(): #Access Second layer
        for second_key,second_value in first_value.items(): #Access Third layer
            sorted_students[first_key][second_key] = getattr(spliting_functions, "split_students_by_" + third_factor)(second_value, size=size)


    #Number of groups
    no_of_groups = math.ceil(len(students)/size) #Rounded up

    #Store the order of factors
    factor_order = [first_factor, second_factor, third_factor]
    
    #Get unique schools of students
    schools = list(set([student.school for student in students]))
    
    #Group the students
    final_grouping = choose_students(sorted_students, schools, factor_order, no_of_groups, size)

    return final_grouping
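
The three `min()` and `del` steps in `sort_students()` could also be written as a single stable sort. The sketch below is one possible way, not the notebook's own code; `order_factors` and `order_factors_min_del` are hypothetical helper names, and the counts in `factors` are made up.

```python
def order_factors(factors):
    """Return factor names from fewest to most unique values (hypothetical helper)."""
    # sorted() is stable, so factors with equal counts keep the dictionary's
    # insertion order, which matches what repeated min() + del produces
    return sorted(factors, key=factors.get)


def order_factors_min_del(factors):
    """Same ordering, written the way sort_students() does it."""
    remaining = dict(factors)  # copy so the caller's dictionary is untouched
    order = []
    while remaining:
        key = min(remaining, key=remaining.get)
        order.append(key)
        del remaining[key]
    return order


# Made-up counts of unique values per factor
factors = {"school": 12, "gender": 2, "cgpa": 5}
first_factor, second_factor, third_factor = order_factors(factors)
print(first_factor, second_factor, third_factor)  # gender cgpa school
```

Both helpers return the same order, including when two factors have the same count.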

## Main Function

The main function reads and cleans the data, calls the other functions to group the students into teams, and writes the result to a csv file.

### How it works

1. Extract and clean the csv data, then convert it into a list using a list comprehension. *.strip().split(",")* strips each record and then splits it by comma, since csv is comma-delimited. *records.readlines()[1:]* reads every line in the csv file and drops the header row, since we slice from index 1 ([1:]).

    ```
    with open("records.csv", "r") as records:
        student_data = [record.strip().split(",") for record in records.readlines()[1:]]
    ```

2. We ask the user how many students should be in each team. Using a while True loop, try/except and the *int()* function, we keep asking until the user gives a valid input, which is an integer.

    ```
    while True:
        try:
            size = int(input("How many students in each team: ").strip())
            break #Break only if input is converted into an integer

        except ValueError:
            pass
    ```

3. We want to sort the students first into their respective tutorial groups. Hence, we first initialize a dictionary *all_tutorial_groups* where the key is the tutorial group and the value is a list of students that belong to that group.

    ```all_tutorial_groups = {}```

4. Next, we use a for loop to iterate through each student.

    ```for record in student_data:```

5. We retrieve the *tutorial_group* the student belongs to, and also create a *student* object so that its attributes are easier to access later. The attributes are passed in this order: tutorial_group (str), id (int), school (str), name (str), gender (str), cgpa (float).

    ```
    tutorial_group = record[0]

    student = Student(record[0],int(record[1]),record[2],record[3],record[4],float(record[5]))
    ```

6. Next, we add the student into *all_tutorial_groups*. If the student's *tutorial_group* already exists as a key, we simply append the student; otherwise we create a new entry in the dictionary, where the key is *tutorial_group* and the value is a list containing the student.

    ```
    if tutorial_group in all_tutorial_groups.keys():
        all_tutorial_groups[tutorial_group].append(student)
        
    else:
        all_tutorial_groups[tutorial_group] = [student]
    ```

7. Having prepared the data, we are ready to sort the students into their respective teams. But first we initialize a dictionary *final_grouping* which will store the final groupings, where the key is the tutorial group and the value is a list of lists, where each nested list is a team.

    ```final_grouping = {}```

8. Since students can only be grouped with others in the same tutorial group, we sort them one tutorial group at a time. We use a for loop to iterate through the *.items()* of *all_tutorial_groups*. We take each ***tutorial_group:students*** pair and feed the students into the function *sort_students()*, which returns a list of the students already split into their teams. We then insert this result into the *final_grouping* dictionary with the key being the *tutorial_group*.

    ```
    sorted_students = sort_students(students, size)

    final_grouping[tutorial_group] = sorted_students
    ```

9. With the students split into teams, we need to write the results into a csv.
    * Since we need to label each team, we initialize a counter *group_number*, which is incremented after every team and serves as the team number.

        
        ```group_number = 1```
    
    * Here, we use **with** to open the output file in *w+* mode as *output*.
        
        ```with open("final_grouping.csv", "w+") as output:```
    
    * First, we write in the headers of the data.
        
        ```output.write("Tutorial_Group,Student ID,School,Name,Gender,CGPA,Team Assigned\n")```
    
    * We then iterate through each tutorial_group.
        
        ```for tutorial_group, students in final_grouping.items():```
    
    * In each tutorial group, we have a list containing lists that represent each team. Once again, we use a for loop to access each team.

        ```for group in students:```
    
    * In each team, the elements are the student objects that belong to the team. Using a for loop, we access each student object and write its attributes to the file.
        
        ```
        for student in group:
            output.write(f"{tutorial_group},{student.id},{student.school},{student.name},{student.gender},{student.cgpa},{group_number}\n")
        ```
    
    * After all members of a team are written to the output file, we increment the team number by 1.
                
        ```group_number += 1```
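
The reading, grouping and writing steps above split and join records on commas by hand, which would break if a field such as a name ever contained a comma. A minimal sketch of an alternative using the standard `csv` module and `collections.defaultdict` is shown below. The sample rows, the input header names and the use of `io.StringIO` in place of real files are all assumptions for illustration, and each tutorial group is treated as a single team purely to keep the example short.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the same column order as records.csv (header names assumed)
sample = io.StringIO(
    "Tutorial Group,Student ID,School,Name,Gender,CGPA\n"
    'G-1,2432,CCDS,"Doe, John",Male,4.31\n'
    "G-1,1001,EEE,Jane Tan,Female,3.90\n"
    "G-2,1002,CCDS,Alex Lim,Male,4.05\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

# defaultdict(list) creates the empty list the first time a key is seen,
# so no explicit "is this key already present" check is needed
all_tutorial_groups = defaultdict(list)
for record in reader:
    all_tutorial_groups[record[0]].append(record)

# csv.writer quotes any field that contains a comma
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["Tutorial_Group", "Student ID", "School", "Name",
                 "Gender", "CGPA", "Team Assigned"])

group_number = 1
for tutorial_group, records in all_tutorial_groups.items():
    for record in records:
        writer.writerow(record + [group_number])
    group_number += 1  # one "team" per tutorial group, for brevity only

print(output.getvalue())
```

With real files, `open("records.csv", newline="")` and `open("final_grouping.csv", "w", newline="")` would replace the two `io.StringIO` objects.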
    

In [62]:
def main():
    #Extract and split the data
    with open("records.csv", "r") as records:
        student_data = [record.strip().split(",") for record in records.readlines()[1:]] #Removing first row headers

    #Ask the group size from the user
    while True:
        try:
            size = int(input("How many students in each team: ").strip())
            break #Break only if input is converted into an integer

        except ValueError:
            pass

    #Initialize dictionary for all students
    all_tutorial_groups = {}

    #Apply the student class for easier access & move students into their tutorial groups
    for record in student_data:
        #Retrieve tutorial group
        tutorial_group = record[0]
        
        #Set up student object, attributes order: tutorial_group (str), id (int), school (str), name (str), gender (str), cgpa (float)
        student = Student(record[0],int(record[1]),record[2],record[3],record[4],float(record[5]))
        
        if tutorial_group in all_tutorial_groups.keys(): #If there was already a student from this group
            all_tutorial_groups[tutorial_group].append(student)
        
        else: #This tutorial group is not included yet
            all_tutorial_groups[tutorial_group] = [student] #Create new entry, with value being a list containing the student


    #Perform grouping for all students
    final_grouping = {}
    for tutorial_group, students in all_tutorial_groups.items():
        #For us to know progress
        print(f"Currently sorting tutorial group: {tutorial_group}")
        
        #Sort into their groups
        sorted_students = sort_students(students, size)

        #Store the sorted students
        final_grouping[tutorial_group] = sorted_students


    #Output the final grouping (Parse it back into the original csv format)
    group_number = 1
    with open("final_grouping.csv", "w+") as output:
        #Write headers
        output.write("Tutorial_Group,Student ID,School,Name,Gender,CGPA,Team Assigned\n")
        for tutorial_group, students in final_grouping.items():
            for group in students:
                for student in group:
                    output.write(f"{tutorial_group},{student.id},{student.school},{student.name},{student.gender},{student.cgpa},{group_number}\n")

                #Increment group number by 1 after current group is complete
                group_number += 1
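
The input loop in `main()` accepts any integer, including 0 and negative numbers, and a size of 0 would raise a `ZeroDivisionError` when `sort_students()` computes `math.ceil(len(students)/size)`. One possible way to guard against this is sketched below; `parse_team_size` and `ask_team_size` are hypothetical helpers, not part of the notebook.

```python
def parse_team_size(text):
    """Return a positive team size, or None if the text is not usable (hypothetical helper)."""
    try:
        size = int(text.strip())
    except ValueError:
        return None  # not an integer at all
    return size if size > 0 else None  # reject 0 and negative sizes


def ask_team_size(read=input):
    """Keep asking until a positive integer is given (hypothetical helper).

    The read parameter defaults to input() and exists only so that the
    function can be exercised without a keyboard.
    """
    while True:
        size = parse_team_size(read("How many students in each team: "))
        if size is not None:
            return size
        print("Please enter a whole number greater than 0.")
```

In `main()`, the loop could then be replaced by `size = ask_team_size()`.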

In [63]:
if __name__ == "__main__":
    main()

Currently sorting tutorial group: G-1
Currently sorting tutorial group: G-10
Currently sorting tutorial group: G-100
Currently sorting tutorial group: G-101
Currently sorting tutorial group: G-102
Currently sorting tutorial group: G-103
Currently sorting tutorial group: G-104
Currently sorting tutorial group: G-105
Currently sorting tutorial group: G-106
Currently sorting tutorial group: G-107
Currently sorting tutorial group: G-108
Currently sorting tutorial group: G-109
Currently sorting tutorial group: G-11
Currently sorting tutorial group: G-110
Currently sorting tutorial group: G-111
Currently sorting tutorial group: G-112
Currently sorting tutorial group: G-113
Currently sorting tutorial group: G-114
Currently sorting tutorial group: G-115
Currently sorting tutorial group: G-116
Currently sorting tutorial group: G-117
Currently sorting tutorial group: G-118
Currently sorting tutorial group: G-119
Currently sorting tutorial group: G-12
Currently sorting tutorial group: G-120
Curre