# Presentation: Analysis of PolyRatings Reviews

 <center> <h4> Robert Hensley - DATA 301 Final Project </h4> </center>

<img src="img/website.jpg" />

## N. Purpose

What is Polyratings? (for the uninitiated)


**Is Polyratings Dead?**

Over the last year, Polyratings has appeared dormant in terms of moderator support. The last significant change to the website, a polished new design, came over a year ago, and there haven’t been any significant updates since. Although users can still write reviews of existing professors, the site no longer accepts new professor submissions. There is also no way to contact the current moderators, as the email address for reporting bugs (errors@polyratings.com) appears to be down.
- [Polyratings revamped site (September 2016)](https://www.reddit.com/r/CalPoly/comments/53j9uu/polyratings_got_a_massive_design_update_check_it/)
- [Discussion on the state of Polyratings](https://www.reddit.com/r/CalPoly/comments/8pn0t0/is_there_any_way_to_fix_polyratings/)
- [Discussion about the lack of moderator support](https://www.reddit.com/r/CalPoly/comments/8gu309/scouting_profsno_info_on_polyratings/)

Polyratings is far from dead, though.

<img src="img/pass_plebs.jpg" >

**Combating Toxicity**

<center> <img src="img/toxic.jpg" /> </center>

## I. Collection

Polyratings is a relatively small website, since it is a review site specific to one school (a Google search for "Polyratings" yields only 11,300 results). It is therefore understandable that the website does not offer a developer API.

To collect reviews, I wrote a web-scraping function with Beautiful Soup that iterates through the Polyratings directory of teacher profiles and scrapes each profile individually. The HTML was a bit messy, but its structure was fortunately consistent across profiles. I therefore read each page’s HTML into an array of lines and used offsets (indexes) into that array to collect the following information:

- Class Information
    - class (full name)
    - subject abbreviation
    - class number (or level, stored as an int)


- Review Information:
    - the review contents (as a large string)
    - the month of the review (string)
    - the year of the review (as an int)


- Student Information:
    - grade in the class
        - stored as a string (to account for credit/no credit/withdrawn classes)
        - also stored as a float (gpa) with the corresponding grade points:
            - A: 4.0
            - B: 3.0
            - C: 2.0
            - D: 1.0
            - F: 0.0
            - Withdrawn/Credit/No Credit: NaN
    - the reviewer’s academic standing (as an integer and string)
        - Freshman: 0
        - Sophomore: 1
        - Junior: 2
        - Senior: 3
        - 5th/6th Year Senior: 4
        - Graduate Student: 5
    - the student’s reason for taking the class (required for the major, required as support, general education, or elective)


- Teacher Information:
    - Name
    - Field of Study
    - Material Presentation Score (float)
    - Overall Rating (float)
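
The grade and standing encodings listed above can be sketched as two lookup tables. This is a minimal illustration, not the scraper's actual code: the names `GRADE_POINTS`, `STANDING_RANK`, `grade_to_gpa`, and `standing_to_rank` are hypothetical, and the exact label strings used on the site are assumed to match the list.

```python
import math

# Grade points as listed above. Any other grade label
# (withdrawn, credit, no credit) falls through to NaN.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Academic standing encoded as an ordered integer rank.
STANDING_RANK = {
    "Freshman": 0,
    "Sophomore": 1,
    "Junior": 2,
    "Senior": 3,
    "5th/6th Year Senior": 4,
    "Graduate Student": 5,
}


def grade_to_gpa(grade: str) -> float:
    """Return grade points for a letter grade, or NaN if it has none."""
    return GRADE_POINTS.get(grade.strip().upper(), math.nan)


def standing_to_rank(standing: str) -> int:
    """Return the integer rank for a standing label (KeyError if unknown)."""
    return STANDING_RANK[standing.strip()]
```

Keeping both the string and the numeric form means credit/no credit and withdrawn reviews remain in the data while dropping out of any GPA average as NaN.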


Some of the offsets were fixed, such as those for the teacher’s name and scores, which sat at the same position in every HTML file. Others varied with the number of classes the teacher offered and the lengths of the reviews. To collect this data, I used delimiters to separate the content. For example, when reading in a review, I knew I had reached its end when the next line was the name of a class or the school standing of the next reviewer.
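
The delimiter idea can be sketched roughly as follows. The page layout here is simplified and hypothetical (the real pages have more fields between the standing and the review text), the class-name pattern is an assumption based on names like `CHEM 101`, and `is_delimiter` and `read_review` are illustrative names rather than functions from the actual scraper.

```python
import re

# Labels that mark the start of the next reviewer's block.
STANDINGS = {
    "Freshman", "Sophomore", "Junior", "Senior",
    "5th/6th Year Senior", "Graduate Student",
}

# Assumed shape of a class heading: subject abbreviation plus a 3-digit number.
CLASS_RE = re.compile(r"^[A-Z]{2,4} \d{3}$")


def is_delimiter(line: str) -> bool:
    """True if the line is a class name or a reviewer's standing."""
    return bool(CLASS_RE.match(line)) or line in STANDINGS


def read_review(lines, start):
    """Join lines from `start` up to the next delimiter.

    Returns the review text and the index of the delimiter line
    (or len(lines) if the review runs to the end of the page).
    """
    parts = []
    i = start
    while i < len(lines) and not is_delimiter(lines[i]):
        parts.append(lines[i])
        i += 1
    return " ".join(parts), i


page = ["CHEM 101", "Freshman", "Snape, Snape,", "Severus Snape.",
        "Sophomore", "I miss him. Always."]
text, next_index = read_review(page, 2)
print(text)        # Snape, Snape, Severus Snape.
print(next_index)  # 4
```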

Here’s an example of what one of my DataFrames looked like. These are the profile contents of the revered Chemistry teacher [Professor Snape](http://polyratings.com/eval.php?profid=3485):

<center> <img src="img/snape.jpg" /> </center>

```python
import pandas as pd

pd.read_csv("polyratings_profiles/snape_severus.csv")
```

```text
Unnamed: 0.1,Unnamed: 0,class,class_abrv,class_number,review_content,review_month,review_year,student_gpa,student_grade,student_major,student_rank,student_standing,teacher_difficulties,teacher_field,teacher_name,teacher_presentaion,teacher_rating
0,0,CHEM 101,CHEM,101,"Snape, Snape, Severus Snape.",May,2014,3.0,B,General Ed,0,Freshman,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
1,1,CHEM 110,CHEM,110,This professor is the most amazing teacher in ...,Mar,2014,4.0,A,Elective,0,Freshman,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
2,2,CHEM 124,CHEM,124,I tried really hard in his class but he seemed...,May,2015,1.0,D,Required (Major),0,Freshman,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
3,3,CHEM 124,CHEM,124,"Very cramped handwriting, difficult to read on...",Jan,2018,2.0,C,Required (Support),0,Freshman,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
4,4,CHEM 125,CHEM,125,Coming in to the class I had heard some things...,Jan,2015,2.0,C,Elective,1,Sophomore,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
5,5,CHEM 125,CHEM,125,I really wanted to get him for defense against...,May,2016,3.0,B,Required (Major),1,Sophomore,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
6,6,CHEM 129,CHEM,129,"Prof. Snape knows his chemistry, but seems per...",Jun,2015,2.0,C,Required (Support),0,Freshman,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
7,7,CHEM 202,CHEM,202,Professor Snape is a fantastic professor. Bare...,Jan,2018,4.0,A,General Ed,1,Sophomore,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
8,8,CHEM 211,CHEM,211,"He can teach you how to bewitch the mind, and ...",Mar,2014,3.0,B,General Ed,0,Freshman,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
9,9,CHEM 216,CHEM,216,I miss him. Always.,Jan,2016,3.0,B,Required (Major),1,Sophomore,2.33,Chemistry and Biochemistry,"Snape, Severus",3.33,3.17
```


**Error Handling**

While scraping, I wrote one CSV per professor and then concatenated these into one large CSV. Although this made the collection process somewhat longer, it allowed me to isolate web-scraping errors. Problems I encountered included:
- invalid profile pages (pages that initially existed but were taken down)
- teachers that had N/A ratings (there were few of these; I filtered them out because they would not be useful for my machine learning model)
- ‘\r’ (carriage return) characters in the review text
    - these caused some of my DataFrame rows to become fragmented and unreadable
    - I replaced these characters with a blank character before adding the string to the DataFrame
- pages with only one review had slightly different offsets
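
A rough sketch of the two steps described here: stripping `\r` from review text and concatenating the per-professor files. It uses only the standard library rather than pandas, `clean_text` and `merge_profiles` are hypothetical names, and it assumes every per-professor CSV shares the same header. Reading "blank character" as a single space is also an assumption.

```python
import csv
import glob
import os


def clean_text(text: str) -> str:
    """Replace carriage returns so they cannot break a CSV row."""
    return text.replace("\r", " ")


def merge_profiles(profile_dir: str, out_path: str) -> int:
    """Concatenate every *.csv in profile_dir into out_path.

    Returns the number of data rows written. out_path should live
    outside profile_dir so it is not picked up as an input.
    """
    paths = sorted(glob.glob(os.path.join(profile_dir, "*.csv")))
    rows_written = 0
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                if reader.fieldnames is None:
                    continue  # skip empty files
                if writer is None:
                    # Take the header from the first non-empty file.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Because each professor has a separate file, a profile that fails to parse can be re-scraped or skipped without redoing the rest.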

**Statistics**

- collecting and merging all of the reviews took approximately 30 minutes
    - I had to sleep for 0.1 seconds between profile scrapes, because without this delay I initially flooded the website with requests
    
    
- there are 2,478 visible profiles
- I was able to collect 65,020 reviews from PolyRatings (producing a 44.1 MB CSV file)
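
The throttling described above might look something like this sketch. `scrape_all` is a hypothetical name, and `fetch` stands in for whatever function downloads and parses a single profile; neither comes from the original scraper.

```python
import time


def scrape_all(profile_ids, fetch, delay=0.1):
    """Call fetch(pid) for each profile id, pausing between requests.

    Failures are recorded instead of aborting the run, so one bad
    profile page does not stop the collection.
    """
    pages = {}
    failed = []
    for n, pid in enumerate(profile_ids):
        if n:
            # Pause before every request except the first.
            time.sleep(delay)
        try:
            pages[pid] = fetch(pid)
        except Exception as exc:
            failed.append((pid, exc))
    return pages, failed
```

If only the 2,478 visible profiles were requested, the pauses alone would add roughly 2,478 × 0.1 ≈ 248 seconds, or about 4 minutes, so most of the 30 minutes would have gone to the requests and parsing themselves.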

## II. Analysis

**Site Use Over Time**

<img src="img/pr_usage.png" > <img src="img/pr_usage_bar.png" >

Notice the spike of reviews written around late 2016. This was around the time that the website was redesigned (September 2016), which probably revitalized use of the site.
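
Counts like the ones behind these charts can be derived from the `review_year` column. The sketch below is only an illustration using the standard library; `reviews_per_year` is a hypothetical name, and the rows are assumed to be dictionaries such as those produced by `csv.DictReader` over the merged CSV.

```python
from collections import Counter


def reviews_per_year(rows):
    """Count reviews by the integer value of their review_year field."""
    return Counter(int(row["review_year"]) for row in rows)


# Years taken from the first few rows of the sample profile above.
sample = [
    {"review_year": "2014"},
    {"review_year": "2014"},
    {"review_year": "2015"},
    {"review_year": "2018"},
]
counts = reviews_per_year(sample)
for year in sorted(counts):
    print(year, counts[year])
```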

**Demographics**



## III. Machine Learning  

## IV. Conclusions

<img src="img/logo_p_small.png" />