## **University of Illinois Chicago**
CS 418 - Fall 2024 Team 5

## **Data-Driven Course Insights: Predicting Grade Trends**

## **Authors:**
| **Name**  | **Email** | **Github Handle** |
|---|---|---|
| Arlette Diaz | adiaz218@uic.edu | adiaz218 |
| Marianne Hernandez | mhern85@uic.edu | marhern19 |
| Nandini Jirobe | njiro2@uic.edu | nandinijirobe |
| Sharadruthi Muppidi | smuppi2@uic.edu | sharadruthi-uic |
| Sonina Mut | smut3@uic.edu | snina22 |
| Yuting Lu | lyuti@uic.edu | yutinglu103 |

**Github Repository Link: https://github.com/cs418-fa24/project-check-in-team-5**

## **Project Description**

This project predicts course grade distributions and popularity rankings for upcoming semesters so that students can make informed decisions when selecting classes. By shifting the focus from individual grade predictions to overall course outcomes, it provides insight into course grading trends and demand. It uses clustering to rank courses by student performance and popularity, and topic-based grouping to help students discover courses aligned with their interests, factoring in professor expertise and class attributes. The resulting tool is intended to surface patterns that support both students and academic planning.

## **Project Update**

In [2]:
import sys
python_loc = sys.executable

!{python_loc} -m pip install pandas
!{python_loc} -m pip install scikit-learn
!{python_loc} -m pip install matplotlib
!{python_loc} -m pip install seaborn



In [3]:
# import useful libraries
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

### **Part 1: Load Datasets**

In [4]:
# Grade distribution data 
cs_grades = pd.read_csv('uic_GD_CS_14_24.csv')
meie_grades = pd.read_csv('uic_GD_MEIE_14_24.csv')

# Rate My Professor Data
cs_rmp = pd.read_csv('uic_RMP_CS_14_24.csv')
meie_rmp = pd.read_csv('CS418_Team5_DataSet - RMP_MEIE_14_24.csv')

# Google Scholar Data
cs_gs = pd.read_csv('CS418_Team5_DataSet - GS_CS_14_24.csv')
meie_gs = pd.read_csv('CS418_Team5_DataSet - GS_MEIE_14_24.csv')

# Lecture Data
cs_lectures = pd.read_csv('uic_CS_lectures_all_semesters.csv')
me_lectures = pd.read_csv('uic_ME_lectures_all_semesters.csv')
ie_lectures = pd.read_csv('uic_IE_lectures_all_semesters.csv')

# Course Description Data
cs_descrip = pd.read_csv('CS418_Team5_DataSet - CS_Descrip.csv')

### **Part 2: Data Cleaning**

#### **Dataset 1 - Grade Distribution**

In [5]:
# Grade distribution data cleaning
# Drop columns where all values are zero
cs_grades = cs_grades.loc[:, (cs_grades != 0).any(axis=0)]
meie_grades = meie_grades.loc[:, (meie_grades != 0).any(axis=0)]

# Drop rows where CRS TITLE (course title) contains "research" or "seminar" (case-insensitive)
cs_grades = cs_grades[~cs_grades['CRS TITLE'].str.contains("research|seminar", case=False, na=False)]
meie_grades = meie_grades[~meie_grades['CRS TITLE'].str.contains("research|seminar", case=False, na=False)]

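# Convert columns to numeric where every value parses; leave text columns unchanged.
# (errors='ignore' is deprecated in recent pandas, so the fallback is handled explicitly.)
def to_numeric_if_possible(series):
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series

cs_grades = cs_grades.apply(to_numeric_if_possible)
meie_grades = meie_grades.apply(to_numeric_if_possible)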

# Save the cleaned data (note: this overwrites the original input CSV files)
cs_grades.to_csv("uic_GD_CS_14_24.csv", index=False)
meie_grades.to_csv("uic_GD_MEIE_14_24.csv", index=False)

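The cleaning steps above can be checked on a small table before running them on the real files. The sketch below is illustrative only: the toy values and the `AUD` column are made up, and `to_numeric_if_possible` is a hypothetical helper name, not something from the project repository.

```python
import pandas as pd

# Toy grade table; 'CRS TITLE' mirrors the real files, all values are made up
toy = pd.DataFrame({
    "CRS TITLE": ["Data Science", "Research Seminar", "Thesis Research", "Algorithms"],
    "A": ["10", "3", "0", "7"],   # numbers stored as text
    "AUD": [0, 0, 0, 0],          # all-zero column that should be dropped
})

# Keep only columns that contain at least one nonzero value
toy = toy.loc[:, (toy != 0).any(axis=0)]

# Drop research/seminar rows; .copy() avoids chained-assignment warnings later
mask = toy["CRS TITLE"].str.contains("research|seminar", case=False, na=False)
toy = toy[~mask].copy()

def to_numeric_if_possible(series):
    """Return a numeric version of the column, or the column unchanged if parsing fails."""
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series

toy = toy.apply(to_numeric_if_possible)
print(toy)
print(toy.dtypes)
```

Taking a `.copy()` after filtering is a design choice here: it makes later column assignments act on an independent frame rather than on a slice of the original.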


#### **Dataset 2.1 - Rate My Professor - Computer Science Department**

The table below is the result of joining two files: the CS grade distribution file (uic_GD_CS_14_24.csv), which contains course details and instructors, and the Rate My Professors (RMP) file (uic_RMP_CS_14_24.csv), which contains each instructor's rating and number of reviews. Each row represents one course record together with its instructor and that instructor's RMP rating.

In [6]:
cs_grades.rename(columns={'Primary Instructor': 'Instructor'}, inplace=True)

# Keep only courses numbered 100-599
cs_grades = cs_grades[cs_grades['CRS NBR'].between(100, 599)]

merged_data = pd.merge(cs_grades, cs_rmp, on='Instructor', how='left')

# Fill missing values with "N/A" for NULL columns
merged_data['Rating'] = merged_data['Rating'].fillna("N/A")
merged_data['Num Reviews'] = merged_data['Num Reviews'].fillna("N/A")
merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']] = merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']].fillna("N/A")

# Select relevant columns and sort by course number (CRS NBR)
result_data = merged_data[['CRS SUBJ CD', 'CRS NBR', 'CRS TITLE', 'Instructor', 'Rating', 'Num Reviews']]
result_data = result_data.sort_values(by=['CRS NBR'])

# print(tabulate(result_data, headers='keys', tablefmt='fancy_grid', showindex=False))
print(result_data.head(20).to_string(index=False))

CRS SUBJ CD  CRS NBR                    CRS TITLE              Instructor Rating Num Reviews
         CS      100 Discovering Computer Science            Reed, Dale F    3.5       128.0
         CS      100 Discovering Computer Science                       ,    N/A         N/A
         CS      100 Discovering Computer Science            Bell, John T    2.5       117.0
         CS      100 Discovering Computer Science         Kidane, Ellen G    1.9       108.0
         CS      100 Discovering Computer Science   Hogan, Douglas Joseph    2.6        34.0
         CS      100 Discovering Computer Science            Reed, Dale F    3.5       128.0
         CS      100 Discovering Computer Science            Reed, Dale F    3.5       128.0
         CS      100 Discovering Computer Science         Kidane, Ellen G    1.9       108.0
         CS      100 Discovering Computer Science          Parker, Kendal    N/A         N/A
         CS      100 Discovering Computer Science            Bell, Joh
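Several rows above have no rating, so it can help to measure how many grade records actually found an RMP match. The following is a minimal sketch on toy data, assuming the same `Instructor` join key. The two instructor names and the 3.5 / 128 values come from the output above, while course number 141 is invented for illustration.

```python
import pandas as pd

# Toy stand-ins for cs_grades and cs_rmp
grades = pd.DataFrame({
    "CRS NBR": [100, 100, 141],
    "Instructor": ["Reed, Dale F", "Parker, Kendal", "Reed, Dale F"],
})
rmp = pd.DataFrame({
    "Instructor": ["Reed, Dale F"],
    "Rating": [3.5],
    "Num Reviews": [128],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join.
# validate='many_to_one' raises if an instructor appears twice in the RMP table,
# which would otherwise silently duplicate grade rows.
merged = pd.merge(grades, rmp, on="Instructor", how="left",
                  indicator=True, validate="many_to_one")

unmatched = sorted(merged.loc[merged["_merge"] == "left_only", "Instructor"].unique())
match_rate = (merged["_merge"] == "both").mean()

print("Instructors without an RMP entry:", unmatched)
print("Share of grade rows with a rating:", round(match_rate, 2))
```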

#### **Dataset 2.2 - Rate My Professor - Mechanical & Industrial Engineering Department**

The table below is the result of joining two files: the MEIE grade distribution file (uic_GD_MEIE_14_24.csv), which contains course details and instructors, and the Rate My Professors (RMP) file (CS418_Team5_DataSet - RMP_MEIE_14_24.csv), which contains each instructor's rating and number of reviews. Each row represents one course record together with its instructor and that instructor's RMP rating.

In [9]:
meie_grades.rename(columns={'Primary Instructor': 'Instructor'}, inplace=True)

# Keep only courses numbered 100-599
meie_grades = meie_grades[meie_grades['CRS NBR'].between(100, 599)]

merged_data = pd.merge(meie_grades, meie_rmp, on='Instructor', how='left')

# Fill missing values with "N/A" for Null columns
merged_data['Rating'] = merged_data['Rating'].fillna("N/A")
merged_data['Num Reviews'] = merged_data['Num Reviews'].fillna("N/A")
merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']] = merged_data[['CRS SUBJ CD', 'CRS TITLE', 'Instructor']].fillna("N/A")

# Select relevant columns then sort them by course number
result_data = merged_data[['CRS SUBJ CD', 
                           'CRS NBR', 
                           'CRS TITLE', 
                           'Instructor', 
                           'Rating', 
                           'Num Reviews']]
result_data = result_data.sort_values(by=['CRS NBR'])

# print(tabulate(result_data, headers='keys', tablefmt='fancy_grid', showindex=False))
print(result_data.head(20).to_string(index=False))

CRS SUBJ CD  CRS NBR                      CRS TITLE           Instructor Rating Num Reviews
         IE      118 Energy for Sustainable Society Alonso, Matthew Paul    N/A         N/A
         ME      118 Energy for Sustainable Society Alonso, Matthew Paul    N/A         N/A
         IE      201          Financial Engineering   Banerjee, Prashant    1.8          10
         IE      201          Financial Engineering     Darabi, Houshang    3.3          28
         IE      201          Financial Engineering     Haghighi, Azadeh    4.8           6
         IE      201          Financial Engineering   Banerjee, Prashant    1.8          10
         IE      201          Financial Engineering           Hu, Mengqi    2.2          12
         IE      201          Financial Engineering     Darabi, Houshang    3.3          28
         IE      201          Financial Engineering     Haghighi, Azadeh    4.8           6
         IE      201          Financial Engineering      Anahideh, Hadis    2.5 
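Filling missing ratings with the string "N/A" turns `Rating` and `Num Reviews` into text columns, which gets in the way of later averaging or clustering. One possible alternative, sketched here on toy rows taken from the output above, is to keep the columns numeric (missing values as NaN) for analysis and build a separate copy only for display. The name `display_table` is hypothetical.

```python
import pandas as pd

# Toy merged table; ratings arrive as text with missing entries
merged = pd.DataFrame({
    "Instructor": ["Banerjee, Prashant", "Darabi, Houshang", "Alonso, Matthew Paul"],
    "Rating": ["1.8", "3.3", None],
    "Num Reviews": ["10", "28", None],
})

# Parse to numbers; anything missing or unparseable becomes NaN
for col in ["Rating", "Num Reviews"]:
    merged[col] = pd.to_numeric(merged[col], errors="coerce")

# Numeric columns support aggregation directly; NaN values are skipped
mean_rating = merged["Rating"].mean()

# Separate copy for printing, with "N/A" shown in place of NaN
display_table = merged.astype(object).where(merged.notna(), "N/A")

print(display_table.to_string(index=False))
print("Mean rating:", round(mean_rating, 2))
```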

#### **Dataset 3 - Class Scheduler Data**

In [10]:
# Function to determine the time of day
def get_time_of_day(start_time):
    if 5 <= start_time.hour < 12:
        return "morning"
    elif 12 <= start_time.hour < 17:
        return "afternoon"
    else:
        return "evening"

# Function to calculate class duration in minutes
def calculate_duration(start_time, end_time):
    duration = end_time - start_time
    return duration.total_seconds() / 60  # Convert seconds to minutes

# All cs lectures
# Lists to store calculated values
times_of_day = []
durations = []

for time_range in cs_lectures['Time']:
    try:
        # Split the time range (e.g., "08:00 AM - 09:15 AM")
        start_str, end_str = time_range.split(" - ")
        
        # Convert to datetime objects
        start_time = datetime.strptime(start_str.strip(), "%I:%M %p")
        end_time = datetime.strptime(end_str.strip(), "%I:%M %p")
        
        # Determine time of day and calculate duration
        times_of_day.append(get_time_of_day(start_time))
        durations.append(calculate_duration(start_time, end_time))
        
    except Exception as e:
        # Handle any parsing errors by setting defaults
        times_of_day.append("unknown")
        durations.append(None)

# Assign the lists directly to the new columns
cs_lectures['Time of Day'] = times_of_day
cs_lectures['Duration of Class (minutes)'] = durations

unwanted_columns = ['Morning', 'Afternoon', 'Evening', 'Duration of Class']
cs_lectures.drop(columns=unwanted_columns, errors='ignore', inplace=True)

cs_lectures.to_csv("uic_CS_lectures_all_semesters.csv", index=False)

# All me lectures
# Lists to store calculated values
times_of_day = []
durations = []

for time_range in me_lectures['Time']:
    try:
        # Split the time range (e.g., "08:00 AM - 09:15 AM")
        start_str, end_str = time_range.split(" - ")
        
        # Convert to datetime objects
        start_time = datetime.strptime(start_str.strip(), "%I:%M %p")
        end_time = datetime.strptime(end_str.strip(), "%I:%M %p")
        
        # Determine time of day and calculate duration
        times_of_day.append(get_time_of_day(start_time))
        durations.append(calculate_duration(start_time, end_time))
        
    except Exception as e:
        # Handle any parsing errors by setting defaults
        times_of_day.append("unknown")
        durations.append(None)

# Assign the lists directly to the new columns
me_lectures['Time of Day'] = times_of_day
me_lectures['Duration of Class (minutes)'] = durations

unwanted_columns = ['Morning', 'Afternoon', 'Evening', 'Duration of Class']
me_lectures.drop(columns=unwanted_columns, errors='ignore', inplace=True)

me_lectures.to_csv("uic_ME_lectures_all_semesters.csv", index=False)

# All ie lectures
# Lists to store calculated values
times_of_day = []
durations = []

for time_range in ie_lectures['Time']:
    try:
        # Split the time range (e.g., "08:00 AM - 09:15 AM")
        start_str, end_str = time_range.split(" - ")
        
        # Convert to datetime objects
        start_time = datetime.strptime(start_str.strip(), "%I:%M %p")
        end_time = datetime.strptime(end_str.strip(), "%I:%M %p")
        
        # Determine time of day and calculate duration
        times_of_day.append(get_time_of_day(start_time))
        durations.append(calculate_duration(start_time, end_time))
        
    except Exception as e:
        # Handle any parsing errors by setting defaults
        times_of_day.append("unknown")
        durations.append(None)

# Assign the lists directly to the new columns
ie_lectures['Time of Day'] = times_of_day
ie_lectures['Duration of Class (minutes)'] = durations

unwanted_columns = ['Morning', 'Afternoon', 'Evening', 'Duration of Class']
ie_lectures.drop(columns=unwanted_columns, errors='ignore', inplace=True)

ie_lectures.to_csv("uic_IE_lectures_all_semesters.csv", index=False)
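The three loops above differ only in the DataFrame they process, so the logic could be factored into one helper and applied to each department. The sketch below assumes the same `Time` column format ("08:00 AM - 09:15 AM") and the same time-of-day cutoffs. `parse_time_range` and `add_time_features` are hypothetical names, and the toy rows are made up.

```python
from datetime import datetime
import pandas as pd

def parse_time_range(time_range):
    """Return (time_of_day, duration_minutes) for a string like '08:00 AM - 09:15 AM'."""
    try:
        start_str, end_str = time_range.split(" - ")
        start = datetime.strptime(start_str.strip(), "%I:%M %p")
        end = datetime.strptime(end_str.strip(), "%I:%M %p")
    except (AttributeError, ValueError):
        # Missing values (no .split) or unparseable text such as "TBA"
        return "unknown", None
    if 5 <= start.hour < 12:
        time_of_day = "morning"
    elif 12 <= start.hour < 17:
        time_of_day = "afternoon"
    else:
        time_of_day = "evening"
    return time_of_day, (end - start).total_seconds() / 60

def add_time_features(lectures):
    """Return a copy of the lecture table with time-of-day and duration columns added."""
    parsed = [parse_time_range(t) for t in lectures["Time"]]
    out = lectures.copy()
    out["Time of Day"] = [p[0] for p in parsed]
    out["Duration of Class (minutes)"] = [p[1] for p in parsed]
    return out

# Toy example covering a normal row, a placeholder, an evening class, and a missing value
toy = pd.DataFrame({"Time": ["08:00 AM - 09:15 AM", "TBA", "05:00 PM - 07:30 PM", None]})
result = add_time_features(toy)
print(result)
```

With such a helper, each department would need a single call, for example `cs_lectures = add_time_features(cs_lectures)`. Catching only `AttributeError` and `ValueError` keeps unexpected failures visible instead of silently labeling them "unknown".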

#### **Dataset 4 - Google Scholar**

### **Part 3: Exploratory Data Analysis**

### **Part 4: Data Visualizations**

### **Part 5: Machine Learning Analysis**

## **Reflection**

**What is the hardest part of the project that you’ve encountered so far?**


<br>**What are your initial insights?**


<br>**Are there any concrete results you can show at this point? If not, why not?**


<br>**Going forward, what are the current biggest problems you’re facing?**


<br>**Do you think you are on track with your project? If not, what parts do you need to dedicate more time to?**


<br>**Given your initial exploration of the data, is it worth proceeding with your project, why? If not, how are you going to change your project and why do you think it’s better than your current results?**



## **Roles/Coordination (important)**

**Arlette Diaz:** 
* Text

<br>**Marianne Hernandez:** 
* Text

<br>**Nandini Jirobe:** 
* Collected Rate My Professor ratings for professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected Rate My Professor ratings for professors in the Computer Science classes from 2014-2024
* Collected Google Scholar research interests of professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected Google Scholar research interests of professors in the Computer Science classes from 2014-2024
* Collected course description data for Computer Science courses taught at UIC

<br>**Sharadruthi Muppidi:** 
* Text

<br>**Sonina Mut:** 
* Collected UIC Grade Distribution for professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected UIC Grade Distribution for professors in the Computer Science classes from 2014-2024
* Collected Rate My Professor ratings for professors who taught Mechanical and Industrial Engineering classes from 2014-2024
* Collected Rate My Professor ratings for professors in the Computer Science classes from 2014-2024

<br>**Yuting Lu:** 
* Text

## **Next Steps**