Skip to content

Latest commit

 

History

History
86 lines (82 loc) · 4.86 KB

README.md

File metadata and controls

86 lines (82 loc) · 4.86 KB

Description

The included notebook contains the steps I took in the ETL process for my final project in ISTA 322 (Data Engineering) at University of Arizona in Fall 2021. The project's goal was to merge two datasets (a public salary dataset of UArizona employees and the RateMyProfessor information for all UArizona professors) and load them into a database to allow for easy access. A few sample queries have been included at the end of the notebook to show what sorts of queries can be asked.

Usage

Connecting to the database:

import mysql.connector as mysql
conn = mysql.connect(host="54.184.86.6", database="rmp_ista", user="select_rmp", password="select_rmp")

Tables:

  • Professor
+---------------------+--------------+------+-----+---------+-------+  
| Field               | Type         | Null | Key | Default | Extra |  
+---------------------+--------------+------+-----+---------+-------+  
| FirstName           | varchar(50)  | YES  |     | NULL    |       |  
| LastName            | varchar(50)  | YES  |     | NULL    |       |  
| ProfID              | int          | NO   | PRI | NULL    |       |  
| NumRatings          | int          | YES  |     | NULL    |       |  
| AverageRating       | double       | YES  |     | NULL    |       |  
| Title               | varchar(100) | YES  |     | NULL    |       |  
| AcutalSalary        | int          | YES  |     | NULL    |       |  
| FTESalary           | int          | YES  |     | NULL    |       |  
| Department          | varchar(100) | YES  |     | NULL    |       |  
| InferredGender      | varchar(10)  | YES  |     | NULL    |       |  
| EthnicityConfidence | double       | YES  |     | NULL    |       |  
| InferredEthnicity   | varchar(10)  | YES  |     | NULL    |       |  
+---------------------+--------------+------+-----+---------+-------+    
  • Review
+---------------+---------------+------+-----+---------+-------+  
| Field         | Type          | Null | Key | Default | Extra |  
+---------------+---------------+------+-----+---------+-------+  
| ReviewID      | int           | NO   | PRI | NULL    |       |  
| ClarityRating | double        | YES  |     | NULL    |       |  
| CourseID      | varchar(20)   | YES  |     | NULL    |       |  
| Comment       | varchar(5000) | YES  |     | NULL    |       |  
| ReviewDate    | date          | YES  |     | NULL    |       |  
| EasyRating    | double        | YES  |     | NULL    |       |  
| HelpfulRating | double        | YES  |     | NULL    |       |  
| OverallRating | double        | YES  |     | NULL    |       |  
| CourseGrade   | varchar(20)   | YES  |     | NULL    |       |  
| ProfID        | int           | YES  |     | NULL    |       |  
| CoursePercent | double        | YES  |     | NULL    |       |  
+---------------+---------------+------+-----+---------+-------+  
  • Department
+-----------------+--------------+------+-----+---------+-------+  
| Field           | Type         | Null | Key | Default | Extra |  
+-----------------+--------------+------+-----+---------+-------+  
| Department      | varchar(100) | NO   | PRI | NULL    |       |  
| NumProfessors   | int          | YES  |     | NULL    |       |  
| AvgRating       | double       | YES  |     | NULL    |       |  
| AvgFTESalary    | double       | YES  |     | NULL    |       |  
| TotalProfPay    | int          | YES  |     | NULL    |       |  
| Percentwhite    | double       | YES  |     | NULL    |       |  
| Percentblack    | double       | YES  |     | NULL    |       |  
| Percentapi      | double       | YES  |     | NULL    |       |  
| Percentaian     | double       | YES  |     | NULL    |       |  
| Percent2prace   | double       | YES  |     | NULL    |       |  
| Percenthispanic | double       | YES  |     | NULL    |       |  
| Percentnonwhite | double       | YES  |     | NULL    |       |  
+-----------------+--------------+------+-----+---------+-------+  
  • Course
+----------------+--------------+------+-----+---------+-------+  
| Field          | Type         | Null | Key | Default | Extra |  
+----------------+--------------+------+-----+---------+-------+  
| CourseID       | varchar(20)  | NO   | PRI | NULL    |       |  
| AvgEasy        | double       | YES  |     | NULL    |       |  
| AvgHelpful     | double       | YES  |     | NULL    |       |  
| AvgClarity     | double       | YES  |     | NULL    |       |  
| AvgOverall     | double       | YES  |     | NULL    |       |  
| MostCommonProf | int          | YES  |     | NULL    |       |  
| AvgGrade       | double       | YES  |     | NULL    |       |  
| NumRatings     | int          | YES  |     | NULL    |       |  
| ProfID         | int          | YES  |     | NULL    |       |  
| Department     | varchar(100) | YES  |     | NULL    |       |  
+----------------+--------------+------+-----+---------+-------+