# Yelp Pulse

##### Generating Positive vs Negative Mentions from Yelp reviews for deeper sentiment understanding, leveraging LDA and BERT.

Hypothetical Requirement: [https://github.com/clement-hironimus/yelp-pulse-sentiment-analysis/blob/main/README.md](https://github.com/clement-hironimus/yelp-pulse-sentiment-analysis/blob/main/README.md)

&nbsp;
PROJECT OUTLINE:
- Step 1: Data Wrangling & Cleaning (see: 01_yelp_pulse_data_wrangling_and_cleaning.ipynb)
- Step 2: Exploratory Data Analysis (see: yelp_pulse_exploratory_data_analysis.ipynb)
- Step 3: Topic Extraction and Sentiment Classification (THIS NOTEBOOK)

Steps:
1. Topic Identification using LDA
2. Segmenting the Topics
3. Feature Extraction with DistilBERT
4. Integration of Numerical/Categorical Features
5. Sentiment Classification using Dense/LSTM Neural Network
6. Model Validation
7. Visualization and Reporting

Reference on combining models (concatenation): https://www.educative.io/answers/how-to-merge-two-different-models-in-keras

## Detailed Steps with Examples:

Step 1: Topic Identification with LDA
- **Input**: Collection of reviews.
- **Process**: Apply LDA to identify prevalent topics.
- **Output**:
  - Topics: "Ambiance", "Beverages", "Food"

Step 2: Segmentation Based on Topics
- **Input**: Full review text.
- **Process**: Segment text by topics identified in Step 1.
- **Output**:
  - "Ambiance": "The ambiance at Cafe Paris is cozy and inviting."
  - "Beverages": "The coffee is excellent."
  - "Food": "The cakes are too sweet for my taste."
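
One simple way to sketch the segmentation, assuming a sentence-level split and keyword matching per topic (the notebook may segment differently, for example by scoring each sentence with the fitted LDA model). `TOPIC_KEYWORDS` and `segment_by_topic` are hypothetical names, and the review text is assembled from the three example segments above.

```python
import re

# Hypothetical keyword lists; in practice these could come from the top words
# of each LDA topic after a person has named the topics.
TOPIC_KEYWORDS = {
    "Ambiance": {"ambiance", "atmosphere", "cozy", "decor", "music"},
    "Beverages": {"coffee", "tea", "latte", "espresso", "drink"},
    "Food": {"cake", "cakes", "pasta", "bread", "dessert", "sandwich"},
}


def segment_by_topic(review, topic_keywords=TOPIC_KEYWORDS):
    """Split a review into sentences and group them under the best-matching topic."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    segments = {}
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        # Score each topic by how many of its keywords appear in the sentence.
        scores = {topic: len(tokens & kws) for topic, kws in topic_keywords.items()}
        best_topic = max(scores, key=scores.get)
        if scores[best_topic] == 0:
            continue  # no keyword matched; leave the sentence unassigned
        segments.setdefault(best_topic, []).append(sentence)
    return {topic: " ".join(parts) for topic, parts in segments.items()}


review = (
    "The ambiance at Cafe Paris is cozy and inviting. "
    "The coffee is excellent. "
    "The cakes are too sweet for my taste."
)
segments = segment_by_topic(review)
for topic, text in segments.items():
    print(f"{topic}: {text}")
```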

Step 3: Feature Extraction with DistilBERT
- **Input**: Text segments.
- **Process**: Use DistilBERT to obtain deep contextual embeddings for each text segment.
- **Output**:
  - Embeddings for "Ambiance", "Beverages", "Food".
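
A hedged sketch of the embedding step with the Hugging Face `transformers` library. The checkpoint name `distilbert-base-uncased` and the choice of mean pooling over tokens are assumptions; taking the first-token vector is a common alternative. `mean_pool` and `embed_segments` are hypothetical helper names.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"  # assumed checkpoint; the notebook may use another


def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings per segment, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed_segments(segments, model_name=MODEL_NAME):
    """Return one embedding per text segment as a (n_segments, hidden_size) tensor."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(segments, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    return mean_pool(output.last_hidden_state, batch["attention_mask"])
```

Calling `embed_segments(["The coffee is excellent."])` downloads the checkpoint on first use and, for this checkpoint, yields a tensor of shape `(1, 768)`.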

Step 4: Integration of Numerical/Categorical Features
- **Input**: Embeddings from DistilBERT; numerical features like review stars and useful counts.
- **Process**: Concatenate numerical features with text embeddings.
- **Output**:
  - Combined feature vectors:
    - "Ambiance": [Embeddings, 4 (Stars), 5 (Useful)]
    - "Beverages": [Embeddings, 4, 5]
    - "Food": [Embeddings, 4, 5]
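
The concatenation can be sketched with NumPy as follows. The zero vector is a placeholder standing in for a real DistilBERT embedding, and `combine_features` is a hypothetical helper; the values 4 (stars) and 5 (useful) come from the example above.

```python
import numpy as np

EMBEDDING_DIM = 768  # hidden size of the base DistilBERT checkpoints


def combine_features(embedding, stars, useful):
    """Append numerical review features to a segment embedding."""
    numeric = np.array([stars, useful], dtype=np.float32)
    return np.concatenate([np.asarray(embedding, dtype=np.float32), numeric])


# Placeholder embedding (zeros), not an actual model output.
ambiance_embedding = np.zeros(EMBEDDING_DIM, dtype=np.float32)
ambiance_features = combine_features(ambiance_embedding, stars=4, useful=5)
print(ambiance_features.shape)  # (770,)
```

Because star ratings and useful counts live on a different scale than embedding values, standardizing the numerical columns before concatenation is a common precaution.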

Step 5: Sentiment Classification using Dense Neural Network
- **Input**: Combined feature vectors for each segment.
- **Process**: Utilize a neural network with a dense layer followed by softmax activation for classification.
- **Output**:
  - Sentiments for each segment:
    - "Ambiance": Positive
    - "Beverages": Positive
    - "Food": Negative
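
To make "a dense layer followed by softmax" concrete, here is a forward-pass sketch in NumPy. The two-class label set is an assumption, and the weights are random, untrained placeholders, so the predicted labels carry no meaning; only the computation is illustrated.

```python
import numpy as np

LABELS = ["Negative", "Positive"]  # assumed two-class setup


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def dense_softmax(features, weights, bias):
    """One dense layer followed by softmax: probabilities = softmax(xW + b)."""
    return softmax(features @ weights + bias)


rng = np.random.default_rng(0)
# 3 segments, each a 768-dim embedding plus 2 numerical features (placeholders).
features = rng.normal(size=(3, 770))
weights = rng.normal(scale=0.01, size=(770, len(LABELS)))  # untrained weights
bias = np.zeros(len(LABELS))

probs = dense_softmax(features, weights, bias)
predicted = [LABELS[i] for i in probs.argmax(axis=1)]
```

Each row of `probs` sums to 1, and the predicted sentiment for a segment is the label with the highest probability.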

Step 6: Model Validation
- **Input**: Sentiment predictions.
- **Process**: Evaluate the model's performance using metrics like F1-score and accuracy, comparing against a manually annotated test set.
- **Output**: Evaluation results and performance metrics.
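
The two metrics named above can be written out in a few lines of plain Python. The labels below are toy values invented for illustration, not results from this project; `sklearn.metrics` offers `accuracy_score` and `f1_score` as library equivalents.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive="Positive"):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: "annotated" plays the role of the manually annotated test set.
annotated = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]
predicted = ["Positive", "Negative", "Negative", "Negative", "Positive", "Positive"]

print(f"accuracy = {accuracy(annotated, predicted):.3f}")
print(f"F1 (Positive) = {f1_score(annotated, predicted):.3f}")
```

Here four of six predictions match, and precision and recall for the positive class are both 2/3, so accuracy and F1 both come out at about 0.667.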

Step 7: Visualization and Reporting
- Develop a dashboard to display sentiments by topics for each business, allowing users to understand the nuanced sentiments regarding different aspects like ambiance, beverages, and food.
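
Before any dashboard is built, the per-segment predictions need to be aggregated by business and topic. A minimal sketch using only the standard library follows; the record layout and `summarize_sentiment` are hypothetical, and the fourth record is invented to show a mixed topic.

```python
from collections import defaultdict


def summarize_sentiment(records):
    """Count positive and negative mentions per (business, topic) pair."""
    summary = defaultdict(lambda: {"Positive": 0, "Negative": 0})
    for record in records:
        key = (record["business"], record["topic"])
        summary[key][record["sentiment"]] += 1
    return dict(summary)


# Toy predictions for illustration only.
records = [
    {"business": "Cafe Paris", "topic": "Ambiance", "sentiment": "Positive"},
    {"business": "Cafe Paris", "topic": "Beverages", "sentiment": "Positive"},
    {"business": "Cafe Paris", "topic": "Food", "sentiment": "Negative"},
    {"business": "Cafe Paris", "topic": "Food", "sentiment": "Positive"},
]

summary = summarize_sentiment(records)
for (business, topic), counts in sorted(summary.items()):
    total = counts["Positive"] + counts["Negative"]
    print(f"{business} / {topic}: {counts['Positive']}/{total} positive")
```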



    Numerical/Categorical features (Input) --> Dense Neural Network --\
                                                                        Concatenate --> Output
    Text Embeddings (Input) --> LSTM ---------------------------------/

## Executive Summary: Key Preprocessing Steps and Decisions