Skip to content

The ETL process of creating a dataset containing salary, demographic, position, and other information of professors at the University of Arizona.

Notifications You must be signed in to change notification settings

drewscottt/UArizona-RMP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 

Repository files navigation

Description

The included notebook contains the steps I took in the ETL process for my final project in ISTA 322 (Data Engineering) at University of Arizona in Fall 2021. The project's goal was to merge two datasets (a public salary dataset of UArizona employees and the RateMyProfessor information for all UArizona professors) and load them into a database to allow for easy access. A few sample queries have been included at the end of the notebook to show what sorts of queries can be asked.

Usage

Connecting to the database:

import mysql.connector as mysql
conn = mysql.connect(host="54.184.86.6", database="rmp_ista", user="select_rmp", password="select_rmp")

Tables:

  • Professor
+---------------------+--------------+------+-----+---------+-------+  
| Field               | Type         | Null | Key | Default | Extra |  
+---------------------+--------------+------+-----+---------+-------+  
| FirstName           | varchar(50)  | YES  |     | NULL    |       |  
| LastName            | varchar(50)  | YES  |     | NULL    |       |  
| ProfID              | int          | NO   | PRI | NULL    |       |  
| NumRatings          | int          | YES  |     | NULL    |       |  
| AverageRating       | double       | YES  |     | NULL    |       |  
| Title               | varchar(100) | YES  |     | NULL    |       |  
| AcutalSalary        | int          | YES  |     | NULL    |       |  
| FTESalary           | int          | YES  |     | NULL    |       |  
| Department          | varchar(100) | YES  |     | NULL    |       |  
| InferredGender      | varchar(10)  | YES  |     | NULL    |       |  
| EthnicityConfidence | double       | YES  |     | NULL    |       |  
| InferredEthnicity   | varchar(10)  | YES  |     | NULL    |       |  
+---------------------+--------------+------+-----+---------+-------+    
  • Review
+---------------+---------------+------+-----+---------+-------+  
| Field         | Type          | Null | Key | Default | Extra |  
+---------------+---------------+------+-----+---------+-------+  
| ReviewID      | int           | NO   | PRI | NULL    |       |  
| ClarityRating | double        | YES  |     | NULL    |       |  
| CourseID      | varchar(20)   | YES  |     | NULL    |       |  
| Comment       | varchar(5000) | YES  |     | NULL    |       |  
| ReviewDate    | date          | YES  |     | NULL    |       |  
| EasyRating    | double        | YES  |     | NULL    |       |  
| HelpfulRating | double        | YES  |     | NULL    |       |  
| OverallRating | double        | YES  |     | NULL    |       |  
| CourseGrade   | varchar(20)   | YES  |     | NULL    |       |  
| ProfID        | int           | YES  |     | NULL    |       |  
| CoursePercent | double        | YES  |     | NULL    |       |  
+---------------+---------------+------+-----+---------+-------+  
  • Department
+-----------------+--------------+------+-----+---------+-------+  
| Field           | Type         | Null | Key | Default | Extra |  
+-----------------+--------------+------+-----+---------+-------+  
| Department      | varchar(100) | NO   | PRI | NULL    |       |  
| NumProfessors   | int          | YES  |     | NULL    |       |  
| AvgRating       | double       | YES  |     | NULL    |       |  
| AvgFTESalary    | double       | YES  |     | NULL    |       |  
| TotalProfPay    | int          | YES  |     | NULL    |       |  
| Percentwhite    | double       | YES  |     | NULL    |       |  
| Percentblack    | double       | YES  |     | NULL    |       |  
| Percentapi      | double       | YES  |     | NULL    |       |  
| Percentaian     | double       | YES  |     | NULL    |       |  
| Percent2prace   | double       | YES  |     | NULL    |       |  
| Percenthispanic | double       | YES  |     | NULL    |       |  
| Percentnonwhite | double       | YES  |     | NULL    |       |  
+-----------------+--------------+------+-----+---------+-------+  
  • Course
+----------------+--------------+------+-----+---------+-------+  
| Field          | Type         | Null | Key | Default | Extra |  
+----------------+--------------+------+-----+---------+-------+  
| CourseID       | varchar(20)  | NO   | PRI | NULL    |       |  
| AvgEasy        | double       | YES  |     | NULL    |       |  
| AvgHelpful     | double       | YES  |     | NULL    |       |  
| AvgClarity     | double       | YES  |     | NULL    |       |  
| AvgOverall     | double       | YES  |     | NULL    |       |  
| MostCommonProf | int          | YES  |     | NULL    |       |  
| AvgGrade       | double       | YES  |     | NULL    |       |  
| NumRatings     | int          | YES  |     | NULL    |       |  
| ProfID         | int          | YES  |     | NULL    |       |  
| Department     | varchar(100) | YES  |     | NULL    |       |  
+----------------+--------------+------+-----+---------+-------+  

About

The ETL process of creating a dataset containing salary, demographic, position, and other information of professors at the University of Arizona.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published