# Using Genetic Algorithms to Form Optimal Groups

We have all been in that situation: you have to do a group project in a class, and the lecturer assigns the groups randomly. You find yourself in a group with people you don’t know and end up doing all the work by yourself. We want to change that, so we ask you to implement a genetic algorithm that optimizes groups.


## Task 1


You have to implement the initial population, crossover, mutation, and fitness function. We divided it into smaller subtasks for you and implemented some of the functions for you. If you want to do your own implementation, feel free to ignore our hints and functions. 

Your initial population consists of 50 group distributions, where 100 students are assigned to 20 groups. Each student has a Student ID, a name, their spoken language, their 2 Majors (we assume everyone is doing their Masters), their ambition in the course, their preferred meeting place, their personality type, their gender, a friend that they want to be in a group with, and their preferred meeting day. 

To get you even more involved with the task, we want every member of your group to add themselves to the data. We are using Myers-Briggs personality types, which may not be the most scientific measure but are certainly entertaining. If you don’t know your type, you can take the test here (approx. 10 minutes): https://www.16personalities.com/free-personality-test.

Run this cell to load the packages:

In [1]:
import numpy as np
import random
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets
import IPython

Hyperparameters:

In [2]:
students = pd.read_csv(r'../dataset_full.csv')
student_ids = students.ID.tolist()

# hyperparameters
num_individuals = 50
groupsize = 5
# between 0 and 1
mutation_rate = 0.05

Execute this cell to create the dataset:

We create one random individual:

In [3]:
def create_random_individual(student_ids):
    #You don't need to do anything here.

    individual = student_ids.copy()
    random.shuffle(individual)

    return individual

print(create_random_individual(student_ids))

[71, 59, 27, 97, 56, 89, 96, 22, 6, 54, 48, 10, 39, 93, 36, 42, 80, 92, 66, 46, 43, 62, 14, 33, 87, 30, 38, 13, 60, 17, 98, 86, 11, 70, 94, 8, 3, 2, 75, 25, 85, 1, 95, 53, 52, 9, 5, 18, 79, 63, 40, 91, 82, 78, 68, 65, 100, 51, 83, 57, 72, 12, 31, 90, 55, 24, 35, 88, 37, 4, 74, 81, 23, 69, 73, 32, 64, 20, 26, 45, 99, 21, 77, 49, 47, 7, 84, 16, 67, 76, 34, 28, 19, 41, 50, 15, 61, 44, 29, 58]




Create the initial population of 50 (=num_individuals) individuals:


In [None]:
def create_initial_population(students, groupsize):
    
    #should return a numpy array of the whole population
    pass
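
One possible shape for this function is sketched below. It assumes the population is stored as a 2-D array with one shuffled permutation of student IDs per row. The name `sketch_initial_population` and its parameters are illustrative and not part of the provided code.

```python
import random

import numpy as np


def sketch_initial_population(student_ids, num_individuals):
    """Return an array of shape (num_individuals, len(student_ids)).

    Each row is an independent random permutation of the student IDs.
    """
    population = []
    for _ in range(num_individuals):
        individual = list(student_ids)
        random.shuffle(individual)
        population.append(individual)
    return np.array(population)


# Hypothetical toy data: 10 students instead of the 100 in the dataset.
toy_population = sketch_initial_population(list(range(1, 11)), num_individuals=4)
print(toy_population.shape)  # (4, 10)
```

Storing the population as a rectangular array only works because every individual has the same length, which holds here since each one is a permutation of the same IDs.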

We need a fitness function that computes how good the group distribution is. You have to take into consideration that we have many parameters that are not equally important and should therefore be weighted differently. We already coded some of the specific evaluation functions for the parameters for you. You have to implement the remaining evaluation functions. Think about what is desirable to have in a group and how you can calculate it. For example, you might have to change the data type of a parameter to make meaningful calculations.

At the end, you have to weigh all evaluation functions in one fitness function. 

In [2]:
def evaluate_language(group):
    pass


In [10]:

def evaluate_majors(group):
    #Remember that the majors are stored as a string like "('NS','AI')". 
    # You might need to preprocess those to actual tuples.
    pass
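
As a hedged illustration of that preprocessing step, the sketch below turns such strings into real tuples with `ast.literal_eval` from the standard library. The column name `Majors` and the toy values are assumptions made for this example, so check them against the actual dataset.

```python
import ast

import pandas as pd

# Hypothetical toy group; the column name "Majors" is an assumption.
toy_group = pd.DataFrame({"Majors": ["('NS','AI')", "('AI','CL')", "('NS','AI')"]})

# ast.literal_eval safely evaluates a string that contains a Python literal.
parsed_majors = toy_group["Majors"].apply(ast.literal_eval)

# Collect the distinct majors represented in the group.
distinct_majors = set()
for majors in parsed_majors:
    distinct_majors.update(majors)

print(parsed_majors.tolist())   # [('NS', 'AI'), ('AI', 'CL'), ('NS', 'AI')]
print(sorted(distinct_majors))  # ['AI', 'CL', 'NS']
```

Whether a group should score higher for many distinct majors or for shared majors is a design decision left to you.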


In [None]:

def evaluate_ambition(group):
    #it might be useful to convert the strings to an integer scale
    
    pass


In [None]:

def evaluate_meeting_place(group):
    #You don't need to code anything in here.
    #This is an example evaluation function.
    
    # number of groupmembers for each preferred meeting place
    meeting_place = group['Preferred meeting place'].value_counts()

    # if all prefer the same meeting place return 5, else 0
    # (value_counts sorts descending, so position 0 holds the most common place)
    if meeting_place.iloc[0] == groupsize:
        return 5

    return 0

In [None]:
def evaluate_gender(group):
    pass

In [None]:
def evaluate_friends(array,students):
    pass

In [None]:
def evaluate_personality(group):
    #You don't need to change anything in here 

    #information about compatible personality types is taken from
    # Montequín, Vicente Rodríguez, et al. "Using Myers-Briggs type indicator (MBTI) as a tool for setting up student teams for information technology projects." Journal of Information Technology and Application in Education 1.1 (2012): 28-34.

    #count existing personality types in each group
    personalities = group['Personality type']
    types = personalities.value_counts()

    #fitness function starts with 0 and gets better
    # with every good group member
    fitness = 0

    #it's good if there is a group leader like an ISTJ or an ESTJ, but only one
    # .get() returns 0 for an absent type, so a group with only one of the two types still counts
    leaders = types.get('ISTJ', 0) + types.get('ESTJ', 0)
    if leaders == 1:
        fitness+=5
    elif leaders >= 2:
        fitness-=5

    #compare compatibility of group members
    for i, personality_a in enumerate(personalities.tolist()):
        for j, personality_b in enumerate(personalities.tolist()):
            # skip same group member and members already compared
            if i <= j:
                continue

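            # increase fitness if the pair differs in exactly one of the middle letters (S/N, T/F)
            # and in at least one of the outer letters (E/I, J/P)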
            if (personality_a[1] != personality_b[1]) ^ (personality_a[2] != personality_b[2]):
                if (personality_a[0] != personality_b[0]) or (personality_a[3] != personality_b[3]):
                    fitness+=1

    return fitness

In [None]:
def evaluate_meeting_day(group):
    pass

And now put everything together.
The function is almost done, but remember to add the weights.

In case you want to add hard constraints, feel free to do that in this function.



In [3]:
def evaluate_fitness(individual, students):
    # split individual into student groups of the groupsize
    groups = np.array_split(individual, len(individual) // groupsize)

    # iterate over groups and calculate scores for the different parameters
    scores = []
    for group_ids in groups:
        # get full data for students in this group from pd dataframe
        group = students.loc[students['ID'].isin(group_ids)]

        # get individual scores for parameters
        language_score = evaluate_language(group)
        major_score = evaluate_majors(group)
        ambition_score = evaluate_ambition(group)
        place_score = evaluate_meeting_place(group)
        gender_score = evaluate_gender(group)
        friend_score = evaluate_friends(group, students)
        personality_score = evaluate_personality(group)
        day_score = evaluate_meeting_day(group)

        # formula for adding and weighting different scores
        scores.append(language_score+major_score+ambition_score+place_score+gender_score+friend_score+personality_score+day_score)

    #Convert to series to calculate mean more easily
    return pd.Series(scores).mean()
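
One way to add the weights is to keep them in a dictionary and compute a weighted sum per group, as in the minimal sketch below. The helper `weighted_group_score`, the parameter names, and all numeric values are hypothetical placeholders and not recommended settings.

```python
def weighted_group_score(scores, weights):
    """Return the weighted sum of the per-parameter scores of one group."""
    return sum(weights[name] * value for name, value in scores.items())


# Hypothetical weights and scores for a single group.
toy_weights = {"language": 3, "place": 1, "personality": 2}
toy_scores = {"language": 5, "place": 0, "personality": 4}

print(weighted_group_score(toy_scores, toy_weights))  # 23
```

Keeping the weights in one place makes it easier to experiment with them later without touching the evaluation functions.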
    

Now, you have to code a crossover function that takes two parents, chosen according to their fitness, and produces a child from them. Use tournament selection to choose the parents.
For the crossover itself, we want you to use uniform order crossover with random templates. 

In [11]:
def tournament_selection(population, tournament_size):
    

    #should return 2 parents as a list
    pass
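
A minimal sketch of tournament selection follows. Unlike the stub above, it takes the fitness function as an extra argument (`fitness_fn`), which is an assumption made to keep the example self-contained. In the notebook you would probably call `evaluate_fitness` instead.

```python
import random


def sketch_tournament_selection(population, tournament_size, fitness_fn):
    """Pick two parents, each as the fittest member of a random tournament."""
    parents = []
    for _ in range(2):
        # Draw distinct competitors by index so this works for lists and arrays.
        competitor_ids = random.sample(range(len(population)), tournament_size)
        winner_id = max(competitor_ids, key=lambda idx: fitness_fn(population[idx]))
        parents.append(population[winner_id])
    return parents


# Toy usage: the fitness of an individual is simply its first gene.
toy_population = [[3, 1, 2], [1, 2, 3], [2, 3, 1]]
toy_parents = sketch_tournament_selection(
    toy_population, tournament_size=3, fitness_fn=lambda ind: ind[0]
)
print(toy_parents)  # [[3, 1, 2], [3, 1, 2]] because every tournament contains all three
```

Note that the two tournaments are independent, so the same individual can be chosen as both parents. A smaller `tournament_size` lowers the selection pressure.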


Use a boolean template with the same length as the individual. It can be hardcoded or generated randomly every time to add more variance. Wherever the template is true, use the gene from parent1. Then take all the genes from parent2 that have not been used yet and fill the empty places in the child with them, in the same order as they appear in parent2.

In [12]:
def uniform_order_crossover(parent1, parent2, template):
    pass
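
The procedure described above can be sketched as follows. This is one possible implementation, assuming both parents are permutations of the same IDs and that `None` never occurs as a gene. The `sketch_` prefix and the toy data are illustrative.

```python
import random


def sketch_uniform_order_crossover(parent1, parent2, template):
    """Combine two permutations into a child that is also a permutation."""
    # Copy parent1's genes wherever the template is True, leave gaps elsewhere.
    child = [gene if keep else None for gene, keep in zip(parent1, template)]
    used = {gene for gene in child if gene is not None}
    # Remaining genes, in the order in which they appear in parent2.
    remaining = iter(gene for gene in parent2 if gene not in used)
    return [gene if gene is not None else next(remaining) for gene in child]


toy_parent1 = [1, 2, 3, 4, 5, 6]
toy_parent2 = [6, 5, 4, 3, 2, 1]
toy_template = [True, False, True, False, False, True]
toy_child = sketch_uniform_order_crossover(toy_parent1, toy_parent2, toy_template)
print(toy_child)  # [1, 5, 3, 4, 2, 6]

# A fresh random template can be drawn for every crossover.
random_template = [random.random() < 0.5 for _ in toy_parent1]
random_child = sketch_uniform_order_crossover(toy_parent1, toy_parent2, random_template)
```

Because the gaps are filled only with genes that are not yet in the child, no student can appear twice or go missing.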

The last thing you need is the mutation function. It should take the individual produced by the crossover function and mutate it with a given probability, for example 5%. A mutation swaps the group assignments of two people. 

In [None]:
def mutation(individual, mutation_rate):
    pass
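
A swap mutation along these lines could look like the sketch below. It assumes that an individual is a flat sequence of student IDs, so swapping two positions swaps the group slots of two students. The function name is illustrative.

```python
import random


def sketch_mutation(individual, mutation_rate):
    """With probability mutation_rate, swap two randomly chosen positions."""
    mutated = list(individual)  # work on a copy so the input stays unchanged
    if random.random() < mutation_rate:
        i, j = random.sample(range(len(mutated)), 2)
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


toy_individual = [1, 2, 3, 4, 5, 6]
always_mutated = sketch_mutation(toy_individual, mutation_rate=1.0)
never_mutated = sketch_mutation(toy_individual, mutation_rate=0.0)
print(always_mutated)
print(never_mutated)  # [1, 2, 3, 4, 5, 6]
```

In this simple version the two positions may belong to the same group, in which case the grouping does not change. You could redraw the positions until they fall into different groups if you want every mutation to have an effect.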


You can now execute the code below and see if everything is working.

In [7]:
#episodes is the number of episodes after which the algorithm stops
#num_replace is the number of unfit individuals that will be replaced
def genetic_algorithm(episodes, num_replace): 
    
    
    
    #visualization:
    pass

In [None]:
genetic_algorithm(20,2)
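
To show how `episodes` and `num_replace` could interact, here is a self-contained toy loop. It is only a sketch under stated assumptions: it uses a made-up fitness function (`toy_fitness`) instead of `evaluate_fitness`, compact stand-ins for the operators, and a steady-state scheme in which the `num_replace` weakest individuals are replaced by new children in every episode. All names are hypothetical.

```python
import random


def toy_fitness(individual):
    # Toy objective: number of adjacent pairs that are in ascending order.
    return sum(a < b for a, b in zip(individual, individual[1:]))


def toy_crossover(parent1, parent2):
    # Uniform order crossover with a freshly drawn random template.
    template = [random.random() < 0.5 for _ in parent1]
    kept = {gene for gene, keep in zip(parent1, template) if keep}
    rest = iter(gene for gene in parent2 if gene not in kept)
    return [gene if keep else next(rest) for gene, keep in zip(parent1, template)]


def toy_mutation(individual, mutation_rate):
    mutated = list(individual)
    if random.random() < mutation_rate:
        i, j = random.sample(range(len(mutated)), 2)
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def sketch_genetic_algorithm(population, episodes, num_replace,
                             mutation_rate=0.05, tournament_size=3):
    population = [list(individual) for individual in population]
    best_per_episode = []
    for _episode in range(episodes):
        children = []
        for _child in range(num_replace):
            # Tournament selection: the fittest of a random subset, done twice.
            parents = [
                max(random.sample(population, tournament_size), key=toy_fitness)
                for _parent in range(2)
            ]
            children.append(toy_mutation(toy_crossover(*parents), mutation_rate))
        # Sort best first, then overwrite the num_replace weakest individuals.
        population.sort(key=toy_fitness, reverse=True)
        population[len(population) - num_replace:] = children
        best_per_episode.append(max(toy_fitness(ind) for ind in population))
    return population, best_per_episode


toy_start = [random.sample(range(1, 9), 8) for _ in range(10)]
final_population, history = sketch_genetic_algorithm(toy_start, episodes=20, num_replace=2)
print(history)
```

Since only the weakest individuals are replaced, the best fitness in `history` never decreases as long as `num_replace` is smaller than the population size. A list like `history` is also a natural input for the visualization step.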

## Task 2

Play around with the different values, such as the initial population size, mutation rate, fitness function, and number of students, and observe when the algorithm works best. Write your insights down here. 