This is my Insight Data Engineering project, completed in three weeks. It builds a distributed streaming pipeline on AWS that processes a stream of Twitter messages to find the hashtags that co-occur with any given hashtag within a given time window. In addition, it computes trending hashtag clusters in real time.
Slide Deck link
Video link
Spark, Kafka, Cassandra, Flask
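
To illustrate the co-occurrence idea described above, here is a minimal single-machine sketch in plain Python. It is not the project's Spark/Kafka code: the function names (`extract_hashtags`, `cooccurrence_counts`, `cooccurring`), the `(timestamp, text)` tweet format, and the half-open time window are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the distinct lowercase hashtags in a tweet, sorted."""
    return sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})


def cooccurrence_counts(tweets, start, end):
    """Count hashtag pairs over tweets with start <= timestamp < end.

    `tweets` is an iterable of (timestamp, text) tuples (hypothetical format).
    """
    counts = Counter()
    for timestamp, text in tweets:
        if start <= timestamp < end:
            # Sorted hashtags make each pair canonical: (a, b) with a < b.
            counts.update(combinations(extract_hashtags(text), 2))
    return counts


def cooccurring(counts, hashtag):
    """Return (other_hashtag, count) pairs for one hashtag, most frequent first."""
    hashtag = hashtag.lower().lstrip("#")
    related = Counter()
    for (a, b), n in counts.items():
        if a == hashtag:
            related[b] += n
        elif b == hashtag:
            related[a] += n
    return related.most_common()


if __name__ == "__main__":
    sample = [
        (1, "#spark and #kafka"),
        (2, "#Spark #cassandra #kafka"),
        (10, "#spark #flask"),  # falls outside the window used below
    ]
    print(cooccurring(cooccurrence_counts(sample, 0, 5), "#spark"))
```

In a distributed setting the same pair-counting step would presumably run per micro-batch or window in the streaming layer, with the aggregated counts written to a store and served through a web API; that mapping onto the listed technologies is a guess, not something stated here.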