# University of Waterloo's Engineering Program Classifier

__Check Out the Program Classifier in Action on the [UW Engineering Placement Quiz!](https://waterlooengineeringquiz.herokuapp.com/)__

For context, including the steps leading up to model building and details on how the model was used and deployed, please refer to: LINK HERE FOR THE OTHER PAGE.

As a quick summary: the goal was to build the best-performing model that could classify and rank engineering programs for high school students. Students would take a quiz to find out which of the University of Waterloo's 15 engineering programs could be suitable for them and receive a ranked list based on their responses.

Existing UW undergraduate students and alumni were surveyed. In total, twenty-two questions were asked. Over a data collection phase of 1.5 months, approximately 1650 responses were collected from current students and alumni of all 15 UWaterloo Engineering programs. Once the data was cleaned (all “No” responses to the “are you happy with what your program provided you with?” question were removed), 1300 rows of usable data remained.

## Model Building Process:
To ensure that the best-performing model was selected for the quiz, several different models were closely compared. A few different scoring methods were established as part of an evaluation scheme that summarized the performance of each model built.

Training and testing datasets were created using the data collected from the survey. Initially, a large balanced dataset was created from students who indicated that they were “happy” with their program: every program-gender combination (e.g. Management-female) contributed the same number of data points, set by the combination with the fewest data points. This large dataset was then split in half to create distinct training and testing datasets with 7 data points for each program-gender combination.
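
The balancing and splitting step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `balanced_split` and the row keys `"program"` and `"gender"` are assumptions made for illustration, and the toy rows are invented.

```python
import random
from collections import defaultdict


def balanced_split(rows, seed=0):
    """Downsample every (program, gender) group to the size of the smallest
    group, then split each group's sample in half into train and test."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[(row["program"], row["gender"])].append(row)

    smallest = min(len(members) for members in groups.values())
    half = smallest // 2  # data points per group in each of train/test

    train, test = [], []
    for key in sorted(groups):
        sample = rng.sample(groups[key], 2 * half)
        train.extend(sample[:half])
        test.extend(sample[half:])
    return train, test


# Toy data (hypothetical): one group of 14 rows and one of 20 rows.
rows = [{"id": i, "program": "Management", "gender": "female"} for i in range(14)]
rows += [{"id": 100 + i, "program": "Civil", "gender": "male"} for i in range(20)]
train, test = balanced_split(rows)
print(len(train), len(test))
```

With a smallest group of 14 rows, each group contributes 7 rows to training and 7 to testing, which mirrors the 7 data points per program-gender combination mentioned above.
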
To avoid settling on a local maximum rather than the best model, several variations of models were built and scored. Each unique dataset that was used to build models is referred to as a “hypothesis”. There were 96 hypotheses created, and they varied based on the questions asked and the datasets used. There were 51 unique models built for each of the 96 hypotheses. The models varied with respect to the type of encoding, the model type, and the label classifier used. A breakdown of the sixteen types of models being built can be seen below.

<img src="images/uw_placement_quiz_classifier_model_tree.png">
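
One of the encoding variations, one-hot encoding, can be illustrated with a short sketch. The question choices below are invented for the example and the helper name `one_hot_encode` is hypothetical; the real quiz has twenty-two questions with their own answer options.

```python
def one_hot_encode(responses, choices_per_question):
    """Turn one student's chosen answers into a flat 0/1 feature vector.

    Each question contributes one slot per possible choice; the slot for
    the chosen answer is 1 and all others are 0.
    """
    vector = []
    for answer, choices in zip(responses, choices_per_question):
        vector.extend(1 if answer == choice else 0 for choice in choices)
    return vector


# Hypothetical two-question quiz.
choices_per_question = [["A", "B", "C"], ["yes", "no"]]
encoded = one_hot_encode(["B", "no"], choices_per_question)
print(encoded)  # [0, 1, 0, 0, 1]
```
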

The three model types explored for this quiz were Naive Bayes, Logistic Regression, and Support Vector Machines. Naive Bayes works well when there is limited training data available, but it has the disadvantage of assuming that all the features (in our case, the questions) are independent of each other. Logistic Regression is known for being simple to implement and for adapting easily to changes in the feature set, but it can have poor predictive performance. Support Vector Machines are known for working well when there is a clear distinction between classes, but they do not perform well if there is a lot of noise in the dataset.
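
To show how the independence assumption lets Naive Bayes produce a ranked list of programs, here is a minimal pure-Python sketch of a categorical Naive Bayes classifier with Laplace smoothing. It is an illustration under stated assumptions, not the model used in the quiz: the function names, the smoothing parameter `alpha`, and the toy answers are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count how often each program and each (program, question, answer) occurs."""
    class_counts = Counter(labels)
    # answer_counts[label][question_index][answer] -> count
    answer_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct answers seen for each question
    for answers, label in zip(samples, labels):
        for q, answer in enumerate(answers):
            answer_counts[label][q][answer] += 1
            vocab[q].add(answer)
    return class_counts, answer_counts, vocab


def rank_programs(model, answers, alpha=1.0):
    """Return programs sorted from most to least probable for these answers."""
    class_counts, answer_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_prob = math.log(n / total)  # prior
        for q, answer in enumerate(answers):
            count = answer_counts[label][q][answer]
            # Each question is treated as independent given the program.
            log_prob += math.log((count + alpha) / (n + alpha * len(vocab[q])))
        scores[label] = log_prob
    return sorted(scores, key=scores.get, reverse=True)


# Toy training data (hypothetical answers to two questions).
samples = [("hands-on", "teams"), ("hands-on", "solo"),
           ("theory", "solo"), ("theory", "solo")]
labels = ["Civil", "Civil", "Software", "Software"]
model = train_naive_bayes(samples, labels)
ranking = rank_programs(model, ("theory", "solo"))
print(ranking)  # ['Software', 'Civil']
```

Sorting by the per-program score is what turns a classifier into a ranked list of recommendations.
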

When creating new hypotheses, an iterative process was used: the results of previous high-scoring models were analyzed, then various modifications were made in an attempt to improve on them.


## Scoring Models

Several metrics were used in combination to determine the best-scoring model, which was then implemented in the quiz. The five scoring methods used were Accuracy, Reciprocal Rank (over all programs), t3 Rank, Reassignment Rate, and Top X Rate.

Accuracy was defined as how often a model outputs a happy student’s program as its number one recommendation. For example, given the responses of a happy Civil Engineering student, the model would be 100% accurate for that data point if it outputted Civil as its top recommendation.

Reciprocal Rank measures where in the list of all fifteen recommended programs a student's actual program is located. The closer it is to the number one spot, the more accurate the model; the farther away, the less accurate. The calculation is shown in Formula 1 below:

<img src="images/uw_placement_quiz_classifier_reciprocal_rank.png">
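
A minimal sketch of this metric, assuming Formula 1 is the standard reciprocal rank (one divided by the position of the student's actual program) averaged over all students. The function names and toy rankings are illustrative and not taken from the project code.

```python
def reciprocal_rank(ranked_programs, actual_program):
    """1 / position of the actual program in the ranked list (1-based)."""
    position = ranked_programs.index(actual_program) + 1
    return 1.0 / position


def mean_reciprocal_rank(rankings, actual_programs):
    """Average the reciprocal rank over all students."""
    scores = [reciprocal_rank(r, a) for r, a in zip(rankings, actual_programs)]
    return sum(scores) / len(scores)


# Two hypothetical Civil students: one ranked 1st, one ranked 2nd.
rankings = [
    ["Civil", "Mechanical", "Software"],
    ["Software", "Civil", "Mechanical"],
]
mrr = mean_reciprocal_rank(rankings, ["Civil", "Civil"])
print(mrr)  # 0.75
```
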


The t3 Rank is similar to the Reciprocal Rank, but it focuses on the top 3 recommendations, as shown in Formula 2 below.

<img src="images/uw_placement_quiz_classifier_t3_rate.png">
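
One plausible reading of Formula 2, sketched below, is a reciprocal rank that is truncated after the third position: the score is 1, 1/2, or 1/3 when the actual program is ranked first, second, or third, and 0 otherwise. This truncation rule is an assumption, and the name `t3_score` is hypothetical.

```python
def t3_score(ranked_programs, actual_program, cutoff=3):
    """Reciprocal rank if the actual program is within the cutoff, else 0."""
    position = ranked_programs.index(actual_program) + 1
    return 1.0 / position if position <= cutoff else 0.0


ranking = ["Software", "Civil", "Mechanical", "Chemical"]
print(t3_score(ranking, "Civil"))     # 0.5
print(t3_score(ranking, "Chemical"))  # 0.0
```
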

The Reassignment Rate uses the data points from the survey consisting of students who reported being “unhappy” with their program. It measures how often the model places such a student into a program other than their own. Although this is an important metric to have, it is also important to remember that it is possible for a student to be unhappy in a program that is best suited for them. For that reason, it is not expected that the model will place every single unhappy student into a different program.

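
As a rough sketch, the Reassignment Rate can be computed by checking whether the top recommendation differs from the program the unhappy student was in. Treating "placed into a different program" as "the top recommendation differs" is an assumption, and the names below are illustrative.

```python
def reassignment_rate(rankings, current_programs):
    """Fraction of unhappy students whose top recommendation is a different program."""
    moved = sum(
        1 for ranking, current in zip(rankings, current_programs)
        if ranking[0] != current
    )
    return moved / len(current_programs)


# Three hypothetical unhappy students; two are moved to another program.
rankings = [
    ["Software", "Civil"],
    ["Civil", "Software"],
    ["Mechanical", "Civil"],
]
rate = reassignment_rate(rankings, ["Civil", "Civil", "Software"])
print(round(rate, 3))  # 0.667
```
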
Finally, the Top X Rate, where X is an integer from 1 to 15, was calculated to determine how often the student’s program is within the top X recommendations. For example, for the Top 3 Rate, a data point from a Chemical Engineering student scores 100% if Chemical Engineering is among the top 3 recommendations, and 0 if it is ranked 4th to 15th.

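
The Top X Rate, and Accuracy as its special case with X = 1, can be sketched as follows. The function name and the toy rankings are illustrative only.

```python
def top_x_rate(rankings, actual_programs, x):
    """Fraction of students whose actual program is in the top x recommendations."""
    hits = sum(
        1 for ranking, actual in zip(rankings, actual_programs)
        if actual in ranking[:x]
    )
    return hits / len(actual_programs)


rankings = [
    ["Chemical", "Civil", "Software", "Mechanical"],
    ["Civil", "Software", "Chemical", "Mechanical"],
    ["Civil", "Software", "Mechanical", "Chemical"],
    ["Software", "Chemical", "Civil", "Mechanical"],
]
actual = ["Chemical"] * 4
accuracy = top_x_rate(rankings, actual, 1)  # Accuracy is the Top 1 Rate
top_3 = top_x_rate(rankings, actual, 3)
print(accuracy, top_3)  # 0.25 0.75
```
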
In total, 4896 models were built and scored. A Naive Bayes model was selected as the most accurate model. This model was built using Naive Bayes, one-hot encoding, and a multi-label classifier. It was trained using 100 responses for each program, except for Computer Engineering, which was lowered to 40 responses to correct for an existing bias. The scores for this model are summarized in Table 1 below.

<table>
 <tr>
 <th>Metric</th>
 <th>Score</th>
 </tr>
 <tr>
 <td>Reciprocal Rank</td>
 <td>0.643</td>
 </tr>
 <tr>
 <td>t3</td>
 <td>0.606</td>
 </tr>
 <tr>
 <td>Reassignment Rate</td>
 <td>0.833</td>
 </tr>
 <tr>
 <td>Accuracy</td>
 <td>0.462</td>
 </tr>
 <tr>
 <td>Top 2</td>
 <td>0.681</td>
 </tr>
 <tr>
 <td>Top 3</td>
 <td>0.786</td>
 </tr>
 <tr>
 <td>Top 4</td>
 <td>0.838</td>
 </tr>
 <tr>
 <td>Top 5</td>
 <td>0.890</td>
 </tr>
 <tr>
 <td>Top 10</td>
 <td>0.967</td>
 </tr>
 <tr>
 <td>Top 15</td>
 <td>1.000</td>
 </tr>
</table>
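
As a consistency check on the table, the t3 score can be approximately recovered from the Accuracy, Top 2, and Top 3 values. This assumes that Accuracy equals the Top 1 Rate and that t3 is a reciprocal rank truncated after the third position; both are assumptions about the formulas, not statements from the project.

```python
# Values copied from the table above.
top_rates = {1: 0.462, 2: 0.681, 3: 0.786}

# Share of students whose actual program sits at exactly rank 1, 2, or 3.
share_at_rank = {
    1: top_rates[1],
    2: top_rates[2] - top_rates[1],
    3: top_rates[3] - top_rates[2],
}

# Truncated reciprocal rank: each share is weighted by 1 / rank.
t3_estimate = sum(share / rank for rank, share in share_at_rank.items())
print(round(t3_estimate, 3))
```

Under these assumptions the estimate comes out at roughly 0.6065, which agrees with the reported t3 of 0.606 to within rounding.
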



## Evaluating the Model

After selecting the best-performing model, an extensive verification process was completed. This process included verifying that the model did not have a gender bias or distribution bias and validating that the quiz was accurately predicting a student's program.

__Gender Bias__: The graph below was created by running the testing dataset through the selected model. This was done to ensure that the selected model did not have a gender bias. Two specific criteria were checked: that the model was not placing certain genders into certain programs, and that the number of males being recommended a program was similar to the number of females being recommended that same program. As seen in Figure 1, the selected model satisfies these criteria.

<img src="images/uw_engineering_placement_qui_gender_bias.png">
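
The counts behind a chart like this can be produced by tallying each student's top recommendation by gender. The sketch below is illustrative: the function name, the toy rankings, and the use of only the top recommendation are assumptions.

```python
from collections import Counter


def top_recommendation_counts(rankings, genders):
    """Count how often each program is the top recommendation, per gender."""
    counts = Counter()
    for ranking, gender in zip(rankings, genders):
        counts[(ranking[0], gender)] += 1
    return counts


# Hypothetical test-set output for four students.
rankings = [
    ["Civil", "Software"],
    ["Civil", "Software"],
    ["Software", "Civil"],
    ["Software", "Civil"],
]
genders = ["female", "male", "female", "male"]
counts = top_recommendation_counts(rankings, genders)

for program in ("Civil", "Software"):
    print(program, counts[(program, "female")], counts[(program, "male")])
```

Similar female and male counts for each program correspond to the second criterion described above.
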

__Program Bias__: To visualize the model's bias regarding programs, the testing dataset, consisting of an even ratio of responses from each program, was used to generate a confusion matrix. One of the major complaints about the pre-existing solution was its bias towards certain programs in its recommendations. This bias is seen in the confusion matrix developed using the old quiz’s testing dataset, shown below. Students of all programs are primarily recommended Civil, Mechanical, Software, and Mechatronics Engineering, while some programs like Geological and Nanotechnology Engineering were not recommended at all.

<img src="images/uw_engineering_quiz_old_team_matrix.png">

Below is a confusion matrix for the selected model, built using the new testing dataset. The strong diagonal line indicates that the model does not have a significant bias towards any single program and that there is a relatively even distribution of recommended programs. Some outliers, like the six Architectural engineers being recommended the School of Architecture, can be explained by the similarities between the two programs and the limited data collected from the program. Although the data was balanced by replicating data points, the limited underlying information can still contribute to bias.


<img src="images/uw_engineering_placement_quiz_confusion_matrix.png">
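
A confusion matrix of this kind can be built by counting, for each actual program, which program the model recommended first. This is a minimal sketch with invented data; the function name and the choice of rows for actual programs and columns for recommendations are assumptions.

```python
def confusion_matrix(actual_programs, top_recommendations, programs):
    """Rows are actual programs, columns are the top recommended programs."""
    index = {program: i for i, program in enumerate(programs)}
    matrix = [[0] * len(programs) for _ in programs]
    for actual, predicted in zip(actual_programs, top_recommendations):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


programs = ["Architectural", "Architecture", "Civil"]
actual = ["Architectural", "Architectural", "Civil"]
predicted = ["Architectural", "Architecture", "Civil"]
matrix = confusion_matrix(actual, predicted, programs)
for program, row in zip(programs, matrix):
    print(program, row)
```

Counts on the diagonal are students whose top recommendation matches their actual program; off-diagonal counts, such as the Architectural student recommended Architecture here, are the kind of outlier discussed above.
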

Below is a heatmap depicting where each program is ranked in the list of recommended programs (i.e. at the top, middle, or end of the list). A Reciprocal Rank score is used to calculate the values shown in this heatmap, meaning that the closer the value is to 1.00, the more likely the program is to be a top recommendation. Taking the row of students who are actually in Biomedical Engineering (BMED) as an example, the scores are 0.8 for the prediction of BMED and 0.79 for Systems Design Engineering (SYDE). This shows that BMED is most often ranked first in the list of recommendations, with SYDE following in second. This is consistent with research indicating several similarities between the programs.


<img src="images/uw_engineering_placement_quiz_ranking_heatmap.png">
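
The heatmap values could be computed along the following lines, assuming each cell is the mean reciprocal rank of the recommended program over all students from a given actual program. The exact aggregation used for the figure is not stated, so this is only a sketch with hypothetical names and data.

```python
from collections import Counter, defaultdict


def rank_heatmap(rankings, actual_programs):
    """Mean reciprocal rank of every recommended program, per actual program."""
    sums = defaultdict(float)
    students = Counter(actual_programs)
    for ranking, actual in zip(rankings, actual_programs):
        for position, program in enumerate(ranking, start=1):
            sums[(actual, program)] += 1.0 / position
    return {
        (actual, program): total / students[actual]
        for (actual, program), total in sums.items()
    }


# Two hypothetical BMED students.
rankings = [
    ["BMED", "SYDE", "Civil"],
    ["SYDE", "BMED", "Civil"],
]
heat = rank_heatmap(rankings, ["BMED", "BMED"])
print(heat[("BMED", "BMED")], heat[("BMED", "SYDE")])  # 0.75 0.75
```
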


__Ranking Performance__: Several models were analyzed to understand where in the list of recommended programs the student’s actual program was placed. In the chart below, these values are displayed for the selected model (in purple) and three other models (various greys).

<img src="images/uw_engineering_placement_quiz_scores.png">

This graph was used to validate that displaying five recommendations to the user was optimal. As seen in the figure, the Top X scores of the selected model start to flatten out after Top 5. This point is marked with a purple dot on the graph and has a value of 89%, indicating that 89% of the time a student’s actual program was listed as one of their top five recommendations.
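
The flattening can be made concrete with the Top X values from Table 1 by computing the average gain per additional recommendation shown. Accuracy is used as the Top 1 value, which is an assumption; only the cutoffs reported in the table are available, so gains between Top 5, Top 10, and Top 15 are averaged over five positions.

```python
# Top X rates from Table 1 (Accuracy used as Top 1).
top_rates = {1: 0.462, 2: 0.681, 3: 0.786, 4: 0.838, 5: 0.890, 10: 0.967, 15: 1.000}

cutoffs = sorted(top_rates)
gains = {}
for previous, current in zip(cutoffs, cutoffs[1:]):
    # Average improvement per extra recommendation between two cutoffs.
    gains[current] = (top_rates[current] - top_rates[previous]) / (current - previous)

for cutoff in cutoffs[1:]:
    print(f"Top {cutoff}: +{gains[cutoff]:.4f} per added recommendation")
```

Up to Top 5, each extra recommendation adds about five percentage points or more, while beyond Top 5 the average gain drops to under two points per recommendation.
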