# Abstract

# Introduction

For most websites driven by user-generated content (UGC), the quality of the content shown to users is paramount to the site's success. On YouTube, for example, the quality of the videos recommended to users is a key factor in whether they keep using the site. Sites overrun with spam or low-quality content are often abandoned in favor of sites with less noise.

MiddCourses is a website that allows Middlebury students to review courses they have taken. Like the sites above, it is driven by UGC, so the quality of the reviews shown to users is a key factor in determining whether a user will:

1. gain valuable information about the course they are interested in

2. continue to use the site

3. recommend the site to other users


A method for estimating the quality of a review on MiddCourses would enable better review curation and on-page sorting of reviews. This would help users find the information they are looking for more easily and increase the value of the site to them.

Additionally, a method for identifying low-quality reviews could reduce the administrative burden on the site's moderators. Currently, moderators manually check every submitted review for prohibited content. This is a time-consuming process that could be made more efficient by using a classifier to reduce the number of reviews that need manual checking.

Our problem shares some similarities with spam detection but differs in key ways:

1. Low-quality content is a superset of spam

    While spam is certainly low-quality, not all low-quality content is spam. For example, a review that provides little information about the course is low-quality but not spam. 

2. Existing filters remove most extremely low-quality content

    When reviews are submitted to MiddCourses, they are sent through a verification pipeline that tries to determine whether the review is spam or contains prohibited content. This pipeline is fairly effective at keeping extremely low-quality content off the site. It focuses on blocking the following types of content:

    1. Reviews that are too long or too short
    2. Reviews that contain any profanity
    3. Reviews that contain junk characters (e.g., "asdfasdfa")
    4. Reviews that are too self-similar (e.g., repeating the same phrase over and over)
    5. Reviews that don't seem to be in English
    6. Reviews that contain padding to reach the minimum character count
    7. Reviews that are copy-pasted from the course description

    While the pipeline does not explicitly filter out spam, the site's incentives and authentication structure give users little reason to post commercial-type spam reviews. The site is not monetized, and reviews of non-100-level courses are not indexed by search engines. Together with the requirement to have a Middlebury student email address, this means there is little incentive or ability to post spam reviews. And because of the authentication requirement, users who do post spam reviews can be easily identified and banned from the site.

    As a result, the remaining low-quality reviews do not contain the kinds of content typically found in spam, so the methods used to detect spam are unlikely to catch them.

    These remaining low-quality reviews are typically low-quality for one of the following reasons:

    1. **Low-Effort Reviews**: characterized by a lack of detail and a lack of information about the course
    2. **Fraudulent Reviews**: characterized by randomness and a lack of coherence
    3. **Hyperbolic Reviews**: characterized by excessive polarity and a limited perspective
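
The verification pipeline described above is given only in prose. A few of its checks (length bounds, junk characters, self-similarity, and padding) can be approximated with simple heuristics. The following is a minimal sketch under stated assumptions, not the site's actual pipeline: the thresholds, the compression-ratio test for self-similarity, and the repeated-substring test for junk characters are hypothetical choices, and the profanity, language, and course-description checks are omitted.

```python
import re
import zlib

# Hypothetical thresholds; the pipeline's real values are not documented here.
MIN_CHARS = 50
MAX_CHARS = 5000
MIN_COMPRESSION_RATIO = 0.3  # below this, text is treated as too self-similar

# A run of 3+ word characters immediately repeated, e.g. "asdf" in "asdfasdfa".
# (Heuristic: it will also flag a few real words such as "murmur".)
JUNK_PATTERN = re.compile(r"(\w{3,})\1")
# Five or more identical non-space characters in a row, e.g. "....." as filler.
PADDING_PATTERN = re.compile(r"(\S)\1{4,}")


def length_ok(text: str) -> bool:
    """Reject reviews that are too short or too long."""
    return MIN_CHARS <= len(text.strip()) <= MAX_CHARS


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; repetitive text compresses much better."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw)) / len(raw)


def too_self_similar(text: str) -> bool:
    return compression_ratio(text) < MIN_COMPRESSION_RATIO


def has_junk(text: str) -> bool:
    return JUNK_PATTERN.search(text.lower()) is not None


def has_padding(text: str) -> bool:
    return PADDING_PATTERN.search(text) is not None


def passes_pipeline(text: str) -> bool:
    """Apply the sketched checks; profanity, language, and copy-paste checks are omitted."""
    return (
        length_ok(text)
        and not too_self_similar(text)
        and not has_junk(text)
        and not has_padding(text)
    )
```

The compression ratio is a cheap proxy for self-similarity: a review that repeats one phrase compresses to a small fraction of its size, while ordinary prose does not. Heuristics like these catch only surface-level problems, which is why the low-effort, fraudulent, and hyperbolic reviews listed above tend to slip through.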



This means our problem is best framed as a review helpfulness prediction problem rather than a spam detection task.




## Related Work

We started our research by looking at existing work on review helpfulness prediction. We found that there are two main approaches to this problem:

1. **Metadata-based**: uses metadata and computed features to predict helpfulness (@du2019feature; @singh2017singh; @mauro2021user; @liu2007low)
2. **Text-based**: uses NLP-based methods or deep learning on review text to predict helpfulness (@9416474; @salminen2022creating; @8288877)
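
The metadata-based approach relies on computed features rather than raw text. As a rough illustration only, the sketch below computes a handful of such features for a single review. The `Review` fields, the 1 to 10 rating scale, and the feature set are hypothetical assumptions, not the MiddCourses schema or the features used in the cited works; the comments note which low-quality category from the Introduction each feature is meant to hint at.

```python
import re
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    rating: int      # overall rating; a 1-10 scale is assumed here
    upvotes: int
    downvotes: int


WORD_PATTERN = re.compile(r"[a-z']+")
SENTENCE_SPLIT = re.compile(r"[.!?]+")


def computed_features(review: Review, rating_scale: int = 10) -> dict[str, float]:
    """Summarize a review with features that do not depend on specific word choice."""
    words = WORD_PATTERN.findall(review.text.lower())
    sentences = [s for s in SENTENCE_SPLIT.split(review.text) if s.strip()]
    n_words = len(words)
    midpoint = (1 + rating_scale) / 2
    return {
        # Low-effort reviews tend to be short.
        "n_words": float(n_words),
        "avg_sentence_len": n_words / max(len(sentences), 1),
        # Share of distinct words; repetitive text scores low.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        # 0 at the scale midpoint, 1 at either extreme (a rough proxy for hyperbole).
        "rating_extremity": abs(review.rating - midpoint) / (midpoint - 1),
        "net_votes": float(review.upvotes - review.downvotes),
    }
```

For example, `computed_features(Review("Great class. Great professor!", 10, 3, 1))` yields 4 words over 2 sentences, a type-token ratio of 0.75, and a rating extremity of 1.0. Because none of these features encode which particular words a review uses, they are one possible starting point for the second-order features discussed in the Values Statement.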




# Values Statement

Our project has direct implications for the MiddCourses website. It will affect people on three levels:

1. **Users who are looking for information about a course**: 

    Our project aims to allow users to more easily find the valuable information they are searching for. This will increase the value of the site to users and increase the likelihood that they will continue to use the site and recommend it to other users.

    Nevertheless, if bias is introduced into the model, it could harm these users. For example, if the model were biased against reviews written by non-native English speakers, those opinions would be less likely to be shown to users. This systematic de-ranking would harm users who share similar characteristics.

    This impact is partially mitigated by the fact that the model will only be used to sort reviews on the page; users can still see all reviews by scrolling down. However, a user's first impression of a course will still be shaped by the model's predictions.


2. **Users who are writing reviews**:

    For users who are writing reviews, our project will directly affect their experience. If a user's review is flagged as low-quality, it may receive additional scrutiny from the site's moderators, which could increase the chance that the review is removed.
    

    Additionally, if the model is used to sort reviews on the page, the user's review may be less likely to be seen by other users. This may lead to a decreased sense of value for the user and may lead to them using the site less frequently.

    Conversely, if a user's review is flagged as high-quality, it may be more likely to be seen by other users. This may lead to an increased sense of value for the user and may lead to them using the site more frequently.

    Since MiddCourses does not offer analytics of reviews to users, this impact is largely mitigated. Users would only be able to infer the impact of the model on their reviews based on the position of their review on the page. But even this is not a perfect indicator since review sorting takes into account other factors such as the date the review was posted and the number of user votes the review has received.

3. **Instructors who are being reviewed**:

    Instructors face the largest potential for harm from our project. MiddCourses, like other review sites, is a controversial topic among instructors. On one hand, it allows them to get feedback from students and improve their courses. On the other hand, it allows students to publicly criticize them and their courses.
    
    These sites are well known in the literature for being biased against women, people of color, and non-native English speakers. This bias often arises because these groups are more likely to be criticized for their teaching style and communication skills, or from discrimination against these groups and the perception that they are less qualified to teach.
    
    Our project has the potential to exacerbate this bias. Because the model will be trained on real reviews, its training data are likely to be biased, and the model may learn these biases. A biased model would be one that promotes biased reviews with a higher probability than unbiased reviews.

    If these biases exist in the reviews, it is likely that features like word choice and overall rating will be influenced by these biases. 

    At the model level, we can try to mitigate these biases by not training directly on these features. Instead, we will try to build second-order features that are less directly influenced by these biases. For example, we can build features based on the sentiment of the review rather than the specific words it uses. This lets us capture the sentiment of the review without being overly influenced by word choice.

    Additionally, we use a triple-annotation method to reduce the impact of biased reviews on the model. This method allows us to downvote reviews that annotators flag as biased. It lets us identify egregious cases of bias but may miss subtler ones.
    
    If the model is used to sort reviews on a course page, reviews about an instructor may be more or less likely to be seen by users. This may inflate or deflate students' perception of that instructor, which in turn may affect course enrollment and instructor evaluations.

    Early review bias also influences the impact of the model on instructors. If the first reviews about an instructor are biased and are treated as high-quality by the model, it may influence the perception of the instructor for future students. This may lead to a feedback loop where the instructor is perceived as being better (or worse) than they actually are.

    With this in mind, we worry about feedback loops in reviews (poisoning the well) where early reviews influence later reviews. Given our model's interaction with how reviews are presented to users, we must consider these multiple-order effects. We attempt to mitigate the feedback loop in two ways:

    1. We use a time-decay factor in the review sorting algorithm, so that, all else being equal, more recent reviews are ranked higher than older ones.

    2. New users are not able to see reviews until they have submitted 2 of their own reviews in the previous 6 months. This helps to ensure that users are not influenced by existing reviews when writing their own reviews.

    With a biased model, instructors could be presented with negative or positive bias.
    
    Negative bias against instructors has a higher potential for harm than positive bias, because it can create a feedback loop in which instructors are perceived as worse than they actually are. This can lower enrollment in their courses and their instructor evaluations, which can in turn reduce their pay and job security, especially for instructors who lack tenure, including those on the tenure track. We also worry about the emotional impact of unfair criticism, which can increase instructors' stress and anxiety and harm both their mental health and their ability to teach effectively.

    At the review sorting level, negative bias is partially mitigated by a hardcoded bias for positive reviews. This means that reviews with a positive overall rating will be ranked higher than reviews with a negative overall rating. The intention of this choice is to shift bias towards the less-harmful positive direction.

    Positive bias, on the other hand, primarily harms students. While positive bias can unfairly benefit an instructor, its cost is borne by students who choose that instructor because of a biased review and then have a worse experience than they expected.

    There is a large potential for harm here and we must be careful not to exacerbate existing biases. There is the potential for large-scale feedback loops if our model is biased. And certainly we see pathways to real-world harm.
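
The sorting factors mentioned in this section (the model's quality estimate, user votes, a time-decay factor, and a hardcoded bonus for positive reviews) could be combined into a single ranking score. The sketch below is hypothetical: the half-life, weights, 1 to 10 rating scale, positivity threshold, and the way the terms are combined are all assumptions for illustration, not the site's actual sorting algorithm.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

# All constants below are hypothetical tuning values, not the site's real ones.
HALF_LIFE_DAYS = 365.0   # the decayed term halves every year
POSITIVE_BONUS = 0.1     # hardcoded nudge toward positive reviews
VOTE_WEIGHT = 0.05       # weight of the log-scaled net vote count
POSITIVE_THRESHOLD = 5   # ratings above this count as positive (assumed 1-10 scale)


@dataclass
class RankedReview:
    review_id: str
    posted_at: datetime   # timezone-aware posting time
    quality: float        # model's quality estimate, assumed to lie in [0, 1]
    net_votes: int        # upvotes minus downvotes
    rating: int           # overall rating on an assumed 1-10 scale


def sort_score(review: RankedReview, now: datetime) -> float:
    """Combine quality, positivity, recency, and votes into one ranking score."""
    age_days = max((now - review.posted_at).total_seconds() / 86400.0, 0.0)
    # Exponential time decay: 1.0 for a brand-new review, 0.5 after one half-life.
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    bonus = POSITIVE_BONUS if review.rating > POSITIVE_THRESHOLD else 0.0
    # Signed log scaling so a handful of votes cannot swamp the model's estimate.
    votes = VOTE_WEIGHT * math.copysign(math.log1p(abs(review.net_votes)), review.net_votes)
    return (review.quality + bonus) * decay + votes


def rank_reviews(reviews: list[RankedReview], now: datetime) -> list[RankedReview]:
    """Return reviews sorted from highest to lowest score."""
    return sorted(reviews, key=lambda r: sort_score(r, now), reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    reviews = [
        RankedReview("old", datetime(2020, 1, 1, tzinfo=timezone.utc), 0.8, 0, 8),
        RankedReview("new", datetime(2023, 12, 1, tzinfo=timezone.utc), 0.8, 0, 8),
    ]
    print([r.review_id for r in rank_reviews(reviews, now)])
```

The decay multiplies only the quality-plus-bonus term, which is never negative, so aging a review always lowers its rank rather than raising it. Under these assumed constants the positive bonus is small relative to the quality estimate, so it acts as a tie-breaker toward positive reviews rather than overriding the model.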



Overall, we believe that our model will make the world (Middlebury) a better place. By identifying high- and low-quality reviews on MiddCourses, we hope to make it easier for Middlebury students to find high-quality information about the courses they are considering, and therefore to find courses they will enjoy. This belief assumes that:

1. Higher quality reviews lead to better outcomes for students.

2. Our classification of reviews is fair and does not unduly discriminate or disadvantage specific groups of students.

Our primary focus is on the model's impact on students, because students are the primary users of MiddCourses and therefore the primary beneficiaries of our model. We believe that our model will have a positive impact on students by making it easier for them to find high-quality information about the courses they are considering. This will allow them to make better decisions about which courses to take and therefore improve their experience at Middlebury.






# Material and Methods



## Data

## Approach

# Results

# Concluding Discussion

# Group Contributions

# Personal Reflection




# References
