# Sentiment Analysis using Regex on Riliv App Reviews

This notebook demonstrates how to use regular expressions (regex) to pull sentiment-relevant words and phrases out of app reviews as a first step toward sentiment analysis.

### Dataset Overview:
We are using review data from the Riliv Mental Health app, with columns such as:
- `content`: Review text
- `score`: Rating score
- `sentiment_rating`: Sentiment (Positive/Negative)
- `wordCount`: Number of words before stopword removal
- `tweet_with_stopwords`: Review text with stopwords
- `wordCount_after_stopwords`: Number of words after stopword removal
- `tweet_stemmed`: Review text after stemming
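
Before running any pattern, it can help to confirm that the loaded table actually has these columns. The sketch below is a minimal check under stated assumptions: `missing_columns` and `example_header` are hypothetical names introduced here, and the header is made up rather than read from the real file.

```python
# Column names exactly as listed in the overview above.
EXPECTED_COLUMNS = [
    "content",
    "score",
    "sentiment_rating",
    "wordCount",
    "tweet_with_stopwords",
    "wordCount_after_stopwords",
    "tweet_stemmed",
]


def missing_columns(columns):
    """Return the expected column names that do not appear in `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Made-up header standing in for a DataFrame's `.columns` (illustration only).
example_header = ["content", "score", "sentiment_rating"]
print(missing_columns(example_header))
```

With a pandas DataFrame, the same helper could be called on its `.columns` attribute; an empty list would mean every listed column is present.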

### Regex Examples:
Below are some examples of regex patterns applied to the reviews.

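1. **Words starting with a capital letter**:
   - `\b[A-Z]\w*` (the anchored form `^[A-Z]\w+` matches only a capitalized word at the very start of the review text)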

2. **Finding mental health-related terms**:
   - `(terapi|psikolog|depresi|cemas|stres|mental)`

3. **Positive sentiment words**:
   - `(baik|membantu|efektif|mudah|puas|bagus)`

4. **Negative sentiment words**:
   - `(buruk|sulit|tidak puas|mengecewakan|jelek)`

5. **Numeric values (e.g., ratings or durations)**:
   - `\b\d+\b`

6. **Phrases containing 'tidak' followed by another word**:
   - `tidak \w+`

7. **Questions (sentences ending with '?')**:
   - `[^.!?]+\?` (the shorter `\w+\?` captures only the last word before the question mark)

8. **Terms related to app features**:
   - `(fitur|aplikasi|fungsi|antarmuka|login)`

9. **Abbreviations (e.g., BPJS, NIK)**:
   - `\b[A-Z]{2,}\b`

10. **Quoted phrases**:
    - `\"[^\"]+\"`
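
The patterns above can be tried with `re.findall` on a single string before applying them to the whole dataset. The following is a minimal sketch: the `sample` text is made up for illustration and is not taken from the Riliv reviews, and the dictionary keys are hypothetical labels. It includes both an anchored and a word-boundary form of the capital-letter pattern, and both a last-word and a whole-sentence form of the question pattern, since they match different things. Groups are written as non-capturing `(?:...)` so that `re.findall` returns the full match.

```python
import re

# Made-up review text for illustration only; not taken from the dataset.
sample = (
    "Aplikasi ini membantu saya, tapi fitur login tidak stabil. "
    'Apakah BPJS bisa dipakai? Saya beri 4 dari 5 untuk "sesi terapi".'
)

patterns = {
    # Matches only a capitalized word at the very start of the text.
    "first_word_capitalized": r"^[A-Z]\w+",
    # Matches every word that starts with a capital letter.
    "capitalized_words": r"\b[A-Z]\w*",
    "mental_health": r"(?:terapi|psikolog|depresi|cemas|stres|mental)",
    "positive": r"(?:baik|membantu|efektif|mudah|puas|bagus)",
    "numbers": r"\b\d+\b",
    "negation": r"tidak \w+",
    # Only the last word before '?', not the whole question.
    "last_word_of_question": r"\w+\?",
    # The whole sentence up to and including '?'.
    "question_sentence": r"[^.!?]+\?",
    # Case-sensitive: lowercase 'aplikasi' will not match 'Aplikasi'.
    "features": r"(?:fitur|aplikasi|fungsi|antarmuka|login)",
    "abbreviations": r"\b[A-Z]{2,}\b",
    "quoted": r'"[^"]+"',
}

# Collect all matches of each pattern in the sample text.
matches = {name: re.findall(pattern, sample) for name, pattern in patterns.items()}

for name, found in matches.items():
    print(f"{name}: {found}")
```

Note that matching is case-sensitive here. Passing `re.IGNORECASE` would help the keyword patterns but would make the abbreviation pattern match every word of two or more letters, so flags are best chosen per pattern.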


In [None]:
import pandas as pd

# Load the dataset
df_rilivrev = pd.read_csv('/mnt/data/rilivrev_final.csv', sep=';')

# Display the first few rows of the dataset
df_rilivrev.head()

In [None]:
import re

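# Example: find reviews that mention mental health terms such as 'terapi', 'psikolog', or 'depresi'.
# The non-capturing group (?:...) avoids the pandas UserWarning about match groups in str.contains.
mental_health_pattern = r'(?:terapi|psikolog|depresi|cemas|stres|mental)'
mental_health_reviews = df_rilivrev[df_rilivrev['content'].str.contains(mental_health_pattern, case=False, na=False)]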

# Display the first few matching reviews
mental_health_reviews.head()
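
The positive and negative word lists can also be combined into a rough rule-based label. The sketch below is one possible approach, not the notebook's established method. The function name `regex_sentiment`, the `"Neutral"` fallback label, the word boundaries, and the rule that `tidak` directly before a positive word counts as negative are all assumptions made for illustration. The phrase `tidak puas` is left out of the negative list here because the negation rule already covers it.

```python
import re

# Word lists taken from the regex examples; 'tidak puas' is handled by the negation rule.
POSITIVE_WORDS = r"(?:baik|membantu|efektif|mudah|puas|bagus)"
NEGATIVE_WORDS = r"(?:buruk|sulit|mengecewakan|jelek)"

POSITIVE = re.compile(rf"\b{POSITIVE_WORDS}\b", re.IGNORECASE)
NEGATIVE = re.compile(rf"\b{NEGATIVE_WORDS}\b", re.IGNORECASE)
# Assumption: 'tidak' immediately before a positive word flips its polarity.
NEGATED_POSITIVE = re.compile(rf"\btidak\s+{POSITIVE_WORDS}\b", re.IGNORECASE)


def regex_sentiment(text):
    """Label a review by comparing counts of positive and negative matches."""
    negated = len(NEGATED_POSITIVE.findall(text))
    # Negated positives are removed from the positive count and added to the negative one.
    positive = len(POSITIVE.findall(text)) - negated
    negative = len(NEGATIVE.findall(text)) + negated
    if positive > negative:
        return "Positive"
    if negative > positive:
        return "Negative"
    return "Neutral"  # hypothetical fallback label, not a value from the dataset


# Made-up reviews for illustration only.
examples = [
    "Aplikasi ini sangat membantu dan mudah dipakai",
    "Saya tidak puas, layanannya buruk",
    "Login pakai NIK",
]
for review in examples:
    print(review, "->", regex_sentiment(review))
```

On the real data, the function could be applied with `df_rilivrev['content'].fillna('').apply(regex_sentiment)` and the result compared against `sentiment_rating`. Because of the word boundaries, inflected forms such as a positive word with a suffix are not counted, so running it on the stemmed text may give different results.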
