
Final project for W3101 Programming Languages - Python. Analyzes Culpa reviews for a given culpa.info page. Done with Lusa Zhan and Winston Lin.


Alex-Fabbri/CulpaReviews


Culpa Reviews: Alex Fabbri (arf2145), Winston Lin (wyl2106), Lusa Zhan (lz2371)

To run the program, first install the required modules and download NLTK's punkt tokenizer data:

pip install -r requirements.txt

python -m nltk.downloader punkt

In a terminal, run app.py (for example, python app.py). This starts the web app on your local server.

You will get a message similar to this: "* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit) * Restarting with stat".

After opening that address (page.html), you are prompted to enter the link to a professor's Culpa page. The link must have the form http://culpa.info/professors/ followed by the number of the professor's page. Some test links: http://culpa.info/professors/44 and http://culpa.info/professors/2643. If no link is provided, a popup appears; if the link is improperly formatted, an error message is shown on the next page. In either case, you can go back and enter a proper link.

Given a proper link, you are taken to a results page (results.html). It opens with a message containing the professor's name. It then displays quotes, if any, from reviews that contain the words 'easy', 'difficult', 'best', or 'worst', since these polar indicator words give a sense of the overall opinion of a professor. Next, for each review, it displays the review number followed by how many positive and how many negative words the review contains (according to two word lists), out of its total number of words.

Below that, the page states which reviews are reliable and which are not. The culpa.info site lets readers agree with a review, disagree with it, or mark it as funny, and each review keeps a counter for each category. If a review has more disagree votes than agree votes, we alert the user that it is not very credible. Finally, using the positive and negative word counts from above, we state the ratio of positive to negative reviews among agreeable reviews (those for which more people voted 'agree' than 'disagree'), and then the ratio among all reviews (if every review is agreeable, the two ratios are the same).
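Two of the steps above, checking the link format and picking out quotes that contain the polar words, could look roughly like the sketch below. The names `professor_id`, `polar_quotes`, `PROFESSOR_LINK`, and `POLAR_WORDS` are hypothetical, not taken from app.py, and a "quote" is assumed to be the sentence containing the word. The sentence split is a simple regex stand-in for NLTK's punkt tokenizer.

```python
import re

# Hypothetical pattern for the required link format: the culpa.info
# professors path followed by the page number (digits only).
PROFESSOR_LINK = re.compile(r"http://culpa\.info/professors/(\d+)")

# The polar indicator words the results page looks for.
POLAR_WORDS = {"easy", "difficult", "best", "worst"}


def professor_id(link):
    """Return the professor's page number, or None if the link is malformed."""
    match = PROFESSOR_LINK.fullmatch(link.strip())
    return int(match.group(1)) if match else None


def polar_quotes(review_text):
    """Return the sentences of a review that contain a polar indicator word."""
    # Naive split after '.', '!' or '?'; NLTK's punkt tokenizer (listed in
    # the setup steps) handles abbreviations and other edge cases better.
    sentences = re.split(r"(?<=[.!?])\s+", review_text.strip())
    quotes = []
    for sentence in sentences:
        # Whole-word match, so "easygoing" does not count as "easy".
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & POLAR_WORDS:
            quotes.append(sentence)
    return quotes


print(professor_id("http://culpa.info/professors/44"))   # 44
print(professor_id("http://culpa.info/professors/abc"))  # None
print(polar_quotes("The exams were easy. Lectures run long! Best professor I have had."))
# ['The exams were easy.', 'Best professor I have had.']
```

Returning `None` for a malformed link gives the web layer a single value to test before it renders either the error message or the results page.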

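The reliability check and the two ratios described above can be sketched as follows. The `Review` class and `review_ratio` function are hypothetical. The README does not say how a single review is labeled positive or negative from its word counts, so this sketch assumes a review is positive when it has more positive than negative words, negative when it has fewer, and counts ties toward neither side.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical container for the data scraped from one review."""
    positive: int   # positive words found in the review text
    negative: int   # negative words found in the review text
    agree: int      # "agree" votes on culpa.info
    disagree: int   # "disagree" votes on culpa.info

    @property
    def not_credible(self) -> bool:
        # Flagged when disagree votes outnumber agree votes.
        return self.disagree > self.agree

    @property
    def agreeable(self) -> bool:
        # More people voted "agree" than "disagree".
        return self.agree > self.disagree


def review_ratio(reviews):
    """Return (positive_reviews, negative_reviews) for a list of reviews.

    Assumption: positive means more positive than negative words, negative
    means fewer; ties count toward neither side.
    """
    pos = sum(1 for r in reviews if r.positive > r.negative)
    neg = sum(1 for r in reviews if r.positive < r.negative)
    return pos, neg


reviews = [Review(10, 2, 5, 1), Review(1, 6, 0, 4), Review(4, 1, 2, 2)]
for number, review in enumerate(reviews, start=1):
    if review.not_credible:
        print(f"Review {number} is not very credible")  # Review 2 is not very credible

agreeable = [r for r in reviews if r.agreeable]
print("Agreeable reviews, positive:negative = %d:%d" % review_ratio(agreeable))  # 1:0
print("All reviews, positive:negative = %d:%d" % review_ratio(reviews))          # 2:1
```

Returning the two counts instead of a single float avoids a division by zero when no review is negative. A review with equal agree and disagree votes (the third one here) is neither flagged nor agreeable.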
Some details on the project: this project allowed us to learn how to build a Flask web app while learning and testing some basic NLP. After learning the basic setup of a Flask project, we integrated our functions into the app.py file and output the results on the webpage. Beautiful Soup and regular expressions were key to extracting the review data from a given page.

We considered multiple strategies for judging whether a review is positive or negative. We looked into known techniques that use Naive Bayes classifiers for this and put our tests in Classifiers.py. We built two classifiers: one trained only on the lists of positive and negative words, and one trained on the movie review corpus from nltk. Tested on part of the movie review data, they were about 52 and 79 percent accurate, respectively. On Culpa reviews, however, they were often inaccurate, so we ultimately decided to score reviews by the occurrences of words, a more directly quantitative measure. Using the open word lists we could find, we split each review into individual words and compared them against the positive and negative words for matches. Note that we do try to account for the words 'not' or 'n't' appearing in a review through a basic boolean test. Other options, such as using dependency trees, require a much deeper knowledge of NLP, so we judged this test the best fit.

Overall, we found Python's libraries, such as Beautiful Soup, and the ability to pickle objects very helpful (we pickle the classifiers and the function that reads the positive and negative words, so the program doesn't have to iterate through the files or train on examples each time). Additionally, Flask allowed us to display our results in a relatively simple manner.
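The word-matching approach, including the 'not'/'n't' check, might look roughly like the sketch below. The function name `count_sentiment` is hypothetical, and the README does not say exactly how its boolean test works. Here it is assumed to be a flag, set by "not" or an "n't" contraction, that flips the polarity of the next sentiment word and is then cleared.

```python
import re


def count_sentiment(review, positive_words, negative_words):
    """Return (positive, negative, total) word counts for one review.

    Assumption: the 'not' / "n't" boolean flips the polarity of the next
    sentiment word and is then cleared.
    """
    # Split into lowercase words, keeping apostrophes inside contractions.
    raw = re.findall(r"[a-z']+", review.lower())
    tokens = [w.strip("'") for w in raw if w.strip("'")]
    positive = negative = 0
    negated = False  # set by "not" or an "n't" contraction
    for token in tokens:
        if token == "not" or token.endswith("n't"):
            negated = True
        elif token in positive_words:
            if negated:
                negative += 1
            else:
                positive += 1
            negated = False
        elif token in negative_words:
            if negated:
                positive += 1
            else:
                negative += 1
            negated = False
    return positive, negative, len(tokens)


POSITIVE = {"good", "great"}
NEGATIVE = {"hate", "boring"}
print(count_sentiment("Not a good lecturer, but the TA was great. I didn't hate it.",
                      POSITIVE, NEGATIVE))  # (2, 1, 13)
```

Because the flag persists until the next sentiment word, a "not" can flip a word several tokens later, even across a sentence boundary. This is one of the limitations that a parse-based approach such as dependency trees would avoid.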

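Caching with pickle, so the word lists are read from disk only once, can be sketched as below. `pickle` stores a function by reference rather than by its results, so this sketch caches the word sets that the reading step produces. The helper names `read_words` and `load_word_lists`, and the word-list file format (one word per line, with blank lines and lines starting with ';' skipped), are assumptions.

```python
import os
import pickle


def read_words(path):
    """Read one word per line, skipping blanks and ';' comment lines (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f
                if line.strip() and not line.startswith(";")}


def load_word_lists(pos_path, neg_path, cache_path):
    """Return (positive, negative) word sets, reading the text files only once."""
    if os.path.exists(cache_path):
        # Later runs: load the already-built sets straight from the pickle.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # First run: build the sets from the text files and cache them.
    lists = (read_words(pos_path), read_words(neg_path))
    with open(cache_path, "wb") as f:
        pickle.dump(lists, f)
    return lists
```

A call such as `load_word_lists("positive-words.txt", "negative-words.txt", "word_lists.pickle")` (hypothetical file names) would build the cache on the first run and reuse it afterward. Delete the cache file whenever the word lists change, since the sketch never checks whether the cache is stale.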