
aasthaguptaa/Big-Data-Sentiment-Analysis


Sentiment Analysis


Overview

– This project implements a sentiment analysis pipeline using Kafka, Flink, Elasticsearch, and Kibana. The pipeline processes Reddit comments, analyzes their sentiment, and visualizes the results in a Kibana dashboard. Users can enter two keywords and see how the sentiment for each keyword changes over time in the graph.
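To make the keyword graph concrete, here is a minimal sketch (not the project's actual dashboard configuration) of an Elasticsearch search body for a single keyword: it matches comments whose "body" contains the keyword, buckets them by "created_utc", and averages "sentiment" per bucket, which yields one line per keyword. The function name, the example keywords, and the daily interval are assumptions; the body could be sent to POST /sentiment_analysis/_search.

```python
import json


def keyword_sentiment_query(keyword: str, interval: str = "1d") -> dict:
    """Hypothetical helper: average sentiment per time bucket for comments matching `keyword`."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching comments themselves
        "query": {"match": {"body": keyword}},  # full-text match on the "body" text field
        "aggs": {
            "over_time": {
                # Bucket matching comments by their creation time ("created_utc" is a date field)
                "date_histogram": {"field": "created_utc", "calendar_interval": interval},
                # Average the float "sentiment" field inside each bucket
                "aggs": {"avg_sentiment": {"avg": {"field": "sentiment"}}},
            }
        },
    }


# One query per keyword gives one line per keyword (example keywords are made up).
queries = {kw: keyword_sentiment_query(kw) for kw in ("python", "rust")}
print(json.dumps(queries["python"], indent=2))
```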

Architecture

– Reddit Comments Dataset -> Kafka -> Flink -> Elasticsearch -> Kibana

  1. Kafka: Acts as the message broker that streams the Reddit comments to Flink.
  2. Flink: Preprocesses the comments and runs model inference to score their sentiment.
  3. Elasticsearch: Stores the processed data so that Kibana can query it.
  4. Kibana: Visualizes the sentiment analysis results (one line per keyword).
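The per-comment work in the Flink stage can be pictured as a plain function from a Kafka message to an Elasticsearch document. This is a hedged sketch: it assumes each Kafka message is a JSON-encoded comment that uses the same field names as the index mapping, and score_sentiment is a hypothetical word-list placeholder standing in for the project's real model. In a PyFlink job, a function like enrich would typically be applied with a map over the Kafka source stream.

```python
import json
from datetime import datetime, timezone


def score_sentiment(text: str) -> float:
    """Hypothetical placeholder for model inference: a toy word-list score clamped to [-1, 1]."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = text.lower().split()
    hits = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(-1.0, min(1.0, hits / max(len(words), 1)))


def enrich(message: bytes) -> dict:
    """Turn one Kafka message (an assumed JSON-encoded Reddit comment) into an index document."""
    comment = json.loads(message)
    body = comment.get("body", "").strip()  # minimal preprocessing: trim surrounding whitespace
    return {
        "author": comment.get("author"),
        "body": body,
        "controversiality": int(comment.get("controversiality", 0)),
        "created_utc": int(comment["created_utc"]),  # epoch seconds, matching the epoch_second mapping
        "id": comment["id"],
        "ingest_time": datetime.now(timezone.utc).isoformat(),  # ISO 8601 string for the date field
        "score": int(comment.get("score", 0)),
        "sentiment": score_sentiment(body),
        "subreddit": comment.get("subreddit"),
    }


# A made-up comment used only to exercise the function.
example = json.dumps({
    "id": "abc123", "author": "someone", "body": "  I love this  ",
    "created_utc": 1700000000, "score": 3, "controversiality": 0, "subreddit": "python",
}).encode("utf-8")
doc = enrich(example)
print(doc)
```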

Prerequisites

  1. For Cloud Run:
    • A web browser and an internet connection.
  2. For Local Run:
    • Docker Desktop and VS Code

Quick Start

  1. For Cloud Run:
  2. For Local Run:
    1. Create the "sentiment_analysis" index in Elasticsearch with the field mappings the pipeline expects:
```sh
curl -X PUT "http://localhost:9200/sentiment_analysis" \
  -H 'Content-Type: application/json' \
  -d '{
  "mappings": {
    "properties": {
      "author":           { "type": "keyword" },
      "body":             { "type": "text" },
      "controversiality": { "type": "integer" },
      "created_utc":      { "type": "date", "format": "epoch_second" },
      "id":               { "type": "keyword" },
      "ingest_time":      { "type": "date" },
      "score":            { "type": "integer" },
      "sentiment":        { "type": "float" },
      "subreddit":        { "type": "keyword" }
    }
  }
}'
```
    2. Start the Flink job:
```sh
docker-compose exec jobmanager flink run -py /project/jobs/flink_sentiment_job_es.py --detached
```
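After creating the index, it can help to confirm that Elasticsearch actually stored the intended types. One common cause of a wrong mapping is documents being indexed before the index was created, in which case dynamic mapping stores "created_utc" as a plain number (long) instead of a date. The sketch below reads the response of the get-mapping API (GET /sentiment_analysis/_mapping) and reports mismatches; the helper names are hypothetical and the local URL is an assumption.

```python
import json
import urllib.request
from typing import Dict, List

# Field types the pipeline relies on, taken from the index-creation command.
EXPECTED_TYPES: Dict[str, str] = {"created_utc": "date", "sentiment": "float"}


def fetch_mapping(base_url: str = "http://localhost:9200", index: str = "sentiment_analysis") -> dict:
    """Call the Elasticsearch get-mapping API and return the parsed JSON response."""
    with urllib.request.urlopen(f"{base_url}/{index}/_mapping", timeout=5) as resp:
        return json.load(resp)


def mapping_problems(mapping_response: dict, index: str = "sentiment_analysis") -> List[str]:
    """Return one message per field whose mapped type differs from EXPECTED_TYPES."""
    properties = mapping_response[index]["mappings"]["properties"]
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        actual = properties.get(field, {}).get("type")  # None if the field is not mapped at all
        if actual != expected:
            problems.append(f"{field}: expected {expected}, got {actual}")
    return problems
```

Against a running local instance, print(mapping_problems(fetch_mapping())) prints an empty list when the mapping is as intended.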

Accessing Services

– Flink Dashboard: http://34.32.114.98:8081 (port 8081)
– Kibana: http://34.32.114.98:5601 (port 5601)
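A quick way to check whether both UIs are reachable is to request a lightweight endpoint on each port. This sketch assumes Flink's REST API answers GET /overview on port 8081 and Kibana answers GET /api/status on port 5601; if Kibana security is enabled, the unauthenticated request may be rejected and the service will be reported as down. The helper names are hypothetical.

```python
import urllib.request
from typing import Dict

# Assumed health endpoints on the ports listed above.
ENDPOINTS: Dict[str, str] = {
    "flink": "http://{host}:8081/overview",
    "kibana": "http://{host}:5601/api/status",
}


def service_urls(host: str) -> Dict[str, str]:
    """Fill the host into each endpoint template."""
    return {name: template.format(host=host) for name, template in ENDPOINTS.items()}


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False
```

For example, calling is_up on each value of service_urls("34.32.114.98"), or service_urls("localhost") for a local run, reports which UI is reachable.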

Common errors & quick fixes

– Elasticsearch: If you can't see any data in the Kibana dashboard, go to Stack Management -> Index Management -> sentiment_analysis -> Mappings and check the data types of "created_utc" (should be date) and "sentiment" (should be float).

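  • Docker: If the producer container crashes, make sure the job parallelism and the number of TaskManager task slots agree:
    • Flink job: env.set_parallelism(10)
    • Flink configuration: taskmanager.numberOfTaskSlots: 10
    Depending on your system, you can increase or decrease both values; with a single TaskManager, the job cannot run with a parallelism higher than its number of task slots.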
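The parallelism rule can be checked before submitting the job. With Flink's default slot sharing, a job needs as many slots as its highest operator parallelism, so the parallelism must not exceed slots per TaskManager times the number of TaskManagers. This minimal sketch assumes the slot count appears as a flat "taskmanager.numberOfTaskSlots: N" line (as in a flink-conf.yaml file or a FLINK_PROPERTIES block); the function names are hypothetical.

```python
import re
from typing import Optional


def task_slots(flink_conf: str) -> Optional[int]:
    """Read taskmanager.numberOfTaskSlots from flat 'key: value' Flink configuration text."""
    match = re.search(r"^\s*taskmanager\.numberOfTaskSlots\s*:\s*(\d+)", flink_conf, re.MULTILINE)
    return int(match.group(1)) if match else None


def check_parallelism(parallelism: int, slots_per_tm: int, task_managers: int = 1) -> Optional[str]:
    """Return an error message if the job would need more slots than the cluster offers."""
    available = slots_per_tm * task_managers
    if parallelism > available:
        return f"parallelism {parallelism} exceeds {available} available task slots"
    return None


conf_text = "jobmanager.rpc.address: jobmanager\ntaskmanager.numberOfTaskSlots: 10\n"
slots = task_slots(conf_text)
print(check_parallelism(10, slots))  # None means env.set_parallelism(10) fits
```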
