## **University of Illinois Chicago**
CS 418 - Fall 2024 Team 5

## **Data-Driven Course Insights: Predicting Grade Trends**

## **Authors:**
| **Name**  | **Email** | **Github Handle** |
|---|---|---|
| Arlette Diaz | adiaz218@uic.edu | adiaz218 |
| Marianne Hernandez | mhern85@uic.edu | marhern19 |
| Nandini Jirobe | njiro2@uic.edu | nandinijirobe |
| Sharadruthi Muppidi | smuppi2@uic.edu | sharadruthi-uic |
| Sonina Mut | smut3@uic.edu | snina22 |
| Yuting Lu | lyuti@uic.edu | yutinglu103 |

**Github Repository Link: https://github.com/cs418-fa24/project-check-in-team-5**

## **Project Description**

This project predicts course grade distributions and popularity rankings for upcoming semesters so that students can make informed decisions about class selection. Rather than predicting individual grades, it focuses on overall course outcomes, giving insight into grading trends and demand. It uses clustering to rank courses by student performance and popularity, and topic-based grouping to help students discover courses aligned with their interests, factoring in professor expertise and class attributes. The resulting tool is intended to surface patterns in course data that can inform both students and academic planning.
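
As a rough illustration of the clustering idea described above, the following minimal sketch groups courses by grade level and enrollment. The course names, the `avg_gpa` and `enrollment` columns, and all values are hypothetical placeholders, not project data, and the choice of three clusters is an assumption made only for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-course summary; names, columns, and values are illustrative only.
courses = pd.DataFrame({
    "course": ["A 101", "A 102", "B 201", "B 202", "C 301", "C 302"],
    "avg_gpa": [3.6, 3.5, 2.4, 2.5, 3.0, 3.1],
    "enrollment": [300, 280, 60, 50, 150, 160],
})

# Standardize so GPA and enrollment contribute on a comparable scale.
features = StandardScaler().fit_transform(courses[["avg_gpa", "enrollment"]])

# Assumed number of clusters; in practice this would need tuning.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
courses["cluster"] = kmeans.fit_predict(features)

# Order the clusters by mean GPA to obtain a simple performance ranking.
cluster_rank = (
    courses.groupby("cluster")["avg_gpa"].mean().sort_values(ascending=False)
)
print(courses)
print(cluster_rank)
```

The cluster labels themselves are arbitrary integers, so any ranking has to come from a summary statistic such as the per-cluster mean GPA used here.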

## **Project Update**

### **Import Packages**

In [8]:
import sys
python_loc = sys.executable

!{python_loc} -m pip install pandas
!{python_loc} -m pip install numpy
!{python_loc} -m pip install scikit-learn
!{python_loc} -m pip install matplotlib
!{python_loc} -m pip install seaborn
!{python_loc} -m pip install tabulate



In [9]:
# import useful libraries
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns

from tabulate import tabulate  # run 'pip install tabulate' if this library is not installed

### **Part 1: Load Datasets**

In [10]:
# Grade distribution data 
cs_grades = pd.read_csv('uic_GD_CS_14_24.csv')
meie_grades = pd.read_csv('uic_GD_MEIE_14_24.csv')

# Rate My Professor Data
cs_rmp = pd.read_csv('uic_RMP_CS_14_24.csv')
meie_rmp = pd.read_csv('CS418_Team5_DataSet - RMP_MEIE_14_24.csv')

# Google Scholar Data
cs_gs = pd.read_csv('CS418_Team5_DataSet - GS_CS_14_24.csv')
meie_gs = pd.read_csv('CS418_Team5_DataSet - GS_MEIE_14_24.csv')

# Lecture Data
cs_lectures = pd.read_csv('uic_CS_lectures_all_semesters.csv')
me_lectures = pd.read_csv('uic_ME_lectures_all_semesters.csv')
ie_lectures = pd.read_csv('uic_IE_lectures_all_semesters.csv')

# Course Description Data
cs_descrip = pd.read_csv('CS418_Team5_DataSet - CS_Descrip.csv')

In [11]:
cs_lectures.head(5)

# print(cs_lectures['Method'].unique())

| Unnamed: 0 | Course Code | Course Title | CRN | Section Type | Time | Days | Instructor | Method | Semester | Year |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CS 100 | Discovering Computer Science | 17397.0 | LCD | 02:00 PM - 02:50 PM | MWF | Reed, D | | Spring | 2014 |
| 1 | CS 107 | Introduction to Computing and Programming | 17412.0 | LEC | 12:30 PM - 01:45 PM | TR | Theys, M | | Spring | 2014 |
| 2 | CS 109 | C/C ++ Programming for Engineers with MatLab | 19466.0 | LCD | 02:00 PM - 02:50 PM | MW | Hummel, J | | Spring | 2014 |
| 3 | CS 111 | Program Design I | 34013.0 | LCD | 02:00 PM - 03:15 PM | TR | Troy, P | | Spring | 2014 |
| 4 | CS 141 | Program Design II | 34447.0 | LCD | 01:00 PM - 01:50 PM | MWF | Reed, D | | Spring | 2014 |
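
The `Unnamed: 0` column in the preview above usually appears when a CSV was written with its row index included. Assuming that is the case here, a minimal sketch of reading such a file with `index_col=0` follows. The inline CSV text is a small stand-in built from two of the rows shown above, not the real lecture file, and the cast of `CRN` to a nullable integer is an optional step that assumes every CRN is a whole number or missing.

```python
import io
import pandas as pd

# Stand-in for a CSV saved with its index; the real file is not read here.
csv_text = """,Course Code,Course Title,CRN
0,CS 100,Discovering Computer Science,17397.0
1,CS 107,Introduction to Computing and Programming,17412.0
"""

# Default read: the unnamed first column becomes a data column.
default_read = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Optional: store CRNs as nullable integers rather than floats.
indexed_read["CRN"] = indexed_read["CRN"].astype("Int64")

print(list(default_read.columns))
print(list(indexed_read.columns))
```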


### **Part 2: Data Cleaning**

#### **Dataset 2 - Rate My Professor - Computer Science Department**

In [12]:

# Match the instructor column name used in the Rate My Professor data
cs_grades.rename(columns={'Primary Instructor': 'Instructor'}, inplace=True)

# Keep only courses numbered 100-599
cs_grades = cs_grades[cs_grades['CRS NBR'].between(100, 599)]

# Left join keeps every grade row, including instructors with no Rate My Professor entry
merged_data = pd.merge(cs_grades, cs_rmp, on='Instructor', how='left')

# Replace missing values with "N/A" for display
merged_data['Rating'] = merged_data['Rating'].fillna("N/A")
merged_data['Num Reviews'] = merged_data['Num Reviews'].fillna("N/A")
merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']] = merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']].fillna("N/A")

# Select relevant columns and sort by course number (CRS NBR)
result_data = merged_data[['CRS SUBJ CD', 'CRS NBR', 'CRS TITLE', 'Instructor', 'Rating', 'Num Reviews']]
result_data = result_data.sort_values(by=['CRS NBR'])

print(tabulate(result_data, headers='keys', tablefmt='fancy_grid', showindex=False))

╒═══════════════╤═══════════╤════════════════════════════════╤═══════════════════════════════════════╤══════════╤═══════════════╕
│ CRS SUBJ CD   │   CRS NBR │ CRS TITLE                      │ Instructor                            │ Rating   │ Num Reviews   │
╞═══════════════╪═══════════╪════════════════════════════════╪═══════════════════════════════════════╪══════════╪═══════════════╡
│ CS            │       100 │ Discovering Computer Science   │ Reed, Dale F                          │ 3.5      │ 128.0         │
├───────────────┼───────────┼────────────────────────────────┼───────────────────────────────────────┼──────────┼───────────────┤
│ CS            │       100 │ Discovering Computer Science   │ Bell, John T                          │ 2.5      │ 117.0         │
├───────────────┼───────────┼────────────────────────────────┼───────────────────────────────────────┼──────────┼───────────────┤
│ CS            │       100 │ Discovering Computer Science   │ Hogan, Douglas Joseph      

#### **Dataset 2 - Rate My Professor - Mechanical & Industrial Engineering Department**

In [7]:
# Match the instructor column name used in the Rate My Professor data
meie_grades.rename(columns={'Primary Instructor': 'Instructor'}, inplace=True)

# Keep only courses numbered 100-599
meie_grades = meie_grades[meie_grades['CRS NBR'].between(100, 599)]

# Left join keeps every grade row, including instructors with no Rate My Professor entry
merged_data = pd.merge(meie_grades, meie_rmp, on='Instructor', how='left')

# Replace missing values with "N/A" for display
merged_data['Rating'] = merged_data['Rating'].fillna("N/A")
merged_data['Num Reviews'] = merged_data['Num Reviews'].fillna("N/A")
merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']] = merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']].fillna("N/A")

# Select relevant columns then sort them by course number
result_data = merged_data[['CRS SUBJ CD', 
                           'CRS NBR', 
                           'CRS TITLE', 
                           'Instructor', 
                           'Rating', 
                           'Num Reviews']]
result_data = result_data.sort_values(by=['CRS NBR'])

print(tabulate(result_data, headers='keys', tablefmt='fancy_grid', showindex=False))

╒═══════════════╤═══════════╤════════════════════════════════╤═══════════════════════════════╤══════════╤═══════════════╕
│ CRS SUBJ CD   │   CRS NBR │ CRS TITLE                      │ Instructor                    │ Rating   │ Num Reviews   │
╞═══════════════╪═══════════╪════════════════════════════════╪═══════════════════════════════╪══════════╪═══════════════╡
│ IE            │       118 │ Energy for Sustainable Society │ Alonso, Matthew Paul          │ N/A      │ N/A           │
├───────────────┼───────────┼────────────────────────────────┼───────────────────────────────┼──────────┼───────────────┤
│ ME            │       118 │ Energy for Sustainable Society │ Alonso, Matthew Paul          │ N/A      │ N/A           │
├───────────────┼───────────┼────────────────────────────────┼───────────────────────────────┼──────────┼───────────────┤
│ IE            │       201 │ Financial Engineering          │ Hu, Mengqi                    │ 2.2      │ 12            │
├───────────────┼───────
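
The two cleaning cells above apply the same steps to each department, and the printed tables repeat a row for every grade record of the same course and instructor. One possible way to share the logic is sketched below. The helper name `merge_grades_with_rmp` and the `has_rmp` flag are hypothetical additions, and the toy frames reuse only the instructor values visible in the output above rather than the real datasets. The sketch keeps missing ratings as `NaN` instead of `"N/A"` so the columns stay numeric.

```python
import pandas as pd

def merge_grades_with_rmp(grades, rmp):
    """Left-join grade rows with ratings and flag instructors without a match."""
    grades = grades.rename(columns={"Primary Instructor": "Instructor"})
    grades = grades[grades["CRS NBR"].between(100, 599)]
    # indicator=True adds a "_merge" column recording where each row came from.
    merged = grades.merge(rmp, on="Instructor", how="left", indicator=True)
    merged["has_rmp"] = merged["_merge"] == "both"
    cols = ["CRS SUBJ CD", "CRS NBR", "CRS TITLE", "Instructor",
            "Rating", "Num Reviews", "has_rmp"]
    # Collapse repeated course/instructor rows, then sort by course number.
    return (merged[cols]
            .drop_duplicates()
            .sort_values("CRS NBR")
            .reset_index(drop=True))

# Toy frames for illustration only.
toy_grades = pd.DataFrame({
    "CRS SUBJ CD": ["IE", "IE", "ME"],
    "CRS NBR": [201, 201, 118],
    "CRS TITLE": ["Financial Engineering", "Financial Engineering",
                  "Energy for Sustainable Society"],
    "Primary Instructor": ["Hu, Mengqi", "Hu, Mengqi", "Alonso, Matthew Paul"],
})
toy_rmp = pd.DataFrame({
    "Instructor": ["Hu, Mengqi"], "Rating": [2.2], "Num Reviews": [12],
})

result = merge_grades_with_rmp(toy_grades, toy_rmp)
print(result)
print("Share of rows with a rating:", result["has_rmp"].mean())
```

Tracking the share of matched rows can help show how many instructors are missing from the Rate My Professor data before any ratings are used as features.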

#### **Dataset 3 - Class Scheduler Data**

### **Part 3: Exploratory Data Analysis**

### **Part 4: Data Visualizations**

### **Part 5: Machine Learning Analysis**

## **Reflection**

**What is the hardest part of the project that you’ve encountered so far?**


<br>**What are your initial insights?**


<br>**Are there any concrete results you can show at this point? If not, why not?**


<br>**Going forward, what are the current biggest problems you’re facing?**


<br>**Do you think you are on track with your project? If not, what parts do you need to dedicate more time to?**


<br>**Given your initial exploration of the data, is it worth proceeding with your project, why? If not, how are you going to change your project and why do you think it’s better than your current results?**



## **Roles/Coordination (important)**

**Arlette Diaz:** 
* Text

<br>**Marianne Hernandez:** 
* Text

<br>**Nandini Jirobe:** 
* Collected Rate My Professor ratings for professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected Rate My Professor ratings for professors who taught Computer Science classes from 2014-2024
* Collected Google Scholar research interests of professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected Google Scholar research interests of professors who taught Computer Science classes from 2014-2024
* Collected course description data for Computer Science courses taught at UIC

<br>**Sharadruthi Muppidi:** 
* Text

<br>**Sonina Mut:** 
* Collected UIC grade distribution data for professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected UIC grade distribution data for professors who taught Computer Science classes from 2014-2024
* Collected Rate My Professor ratings for professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected Rate My Professor ratings for professors who taught Computer Science classes from 2014-2024

<br>**Yuting Lu:** 
* Text

## **Next Steps**