Real-Time Web Server Log Processing
CSC 591 Data Intensive Computing
- Daxkumar Amin (dkamin)
- Khantil Choksi (khchoksi)
- Riken Shah (rshah9)
- Make sure you are in the US-WEST-2 AWS region.
- Create the EC2 instance from the CloudFormation template.
- Create the DynamoDB table that consumers will write their analysis results to.
- Producers replay the raw dataset to simulate a high-velocity input stream.
- Configure conf.py with your desired input/output stream rates and how you want to visualize the results.
- Run the scripts on an EC2 instance with Python 3 installed.
- Run setup.sh to install pip3 and the project dependencies.
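The CloudFormation step above also needs a Kinesis stream for producers and consumers to share. A minimal template fragment might look like the following sketch; the resource name, stream name, and shard count are assumptions (the project's actual template also provisions the EC2 host):

```yaml
# Sketch of a CloudFormation fragment for the Kinesis stream.
# "WebLogStream", "weblog-stream", and ShardCount are illustrative values.
Resources:
  WebLogStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: weblog-stream
      ShardCount: 2
      RetentionPeriodHours: 24
```

Each shard supports one consumer process, matching the `<shard_id>` argument the consumer command takes below.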
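The producer's replay of the dataset can be sketched as follows. This is an illustrative sketch, not the project's actual producer.py: the stream name `weblog-stream`, the file name `weblog.csv`, the `IP,Time,URL,Status` column layout, and the rate are all assumptions. Using the client IP as the partition key keeps each client's requests ordered within one shard.

```python
import json
import os
import time

def log_line_to_record(line):
    """Map one CSV log line (assumed layout: IP,Time,URL,Status) to a
    Kinesis put_record payload, keyed on the client IP."""
    ip, ts, url, status = line.strip().split(",")
    data = json.dumps({"ip": ip, "time": ts, "url": url, "status": status})
    return {"Data": data.encode("utf-8"), "PartitionKey": ip}

def replay(lines, records_per_second, put_record):
    """Replay log lines at a fixed rate to simulate a high-velocity stream."""
    delay = 1.0 / records_per_second
    for line in lines:
        put_record(log_line_to_record(line))
        time.sleep(delay)

if __name__ == "__main__" and os.path.exists("weblog.csv"):
    # Only runs when the (assumed) dataset file is present.
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-west-2")
    with open("weblog.csv") as f:
        next(f)  # skip the CSV header row
        replay(f, records_per_second=100,
               put_record=lambda r: kinesis.put_record(
                   StreamName="weblog-stream", **r))
```

Raising `records_per_second` is how the "high velocity" of the stream is tuned.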
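conf.py might contain settings along these lines; every name and value here is an assumption, since the file is project-specific:

```python
# conf.py -- sketch only; adjust names and values to match your setup.
AWS_REGION = "us-west-2"
KINESIS_STREAM_NAME = "weblog-stream"
DYNAMODB_TABLE_NAME = "weblog-analysis"
PRODUCER_RECORDS_PER_SECOND = 100   # input stream rate
CONSUMER_BATCH_SIZE = 100           # records fetched per poll
PLOT_REFRESH_SECONDS = 5            # how often the visualization redraws
```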
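setup.sh could be as simple as the following sketch; the package names and the Ubuntu/Debian-style package manager are assumptions about the EC2 image:

```shell
#!/usr/bin/env bash
# Sketch of setup.sh -- installs pip3 and the project dependencies.
set -u
# Install pip3 if it is missing (Debian/Ubuntu-style AMI assumed).
command -v pip3 >/dev/null 2>&1 || sudo apt-get install -y python3-pip
# Assumed dependency list; warn rather than abort if the network is down.
pip3 install --user boto3 || echo "pip3 install failed; check network access"
```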
- Producer:
  python3 ./datagenerator/producer.py
- Consumer:
  python3 ./consumer/consumer.py <shard_id>
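The consumer's shard-polling loop can be sketched as below. This is an assumption-laden sketch, not the project's actual consumer.py: the stream and table names, the DynamoDB attribute names, and the poll interval are all illustrative, and it presumes the producer wrote JSON payloads with `ip`/`time`/`url`/`status` fields.

```python
import json
import sys
import time

def record_to_item(raw_data):
    """Map one Kinesis record payload (JSON bytes) to a DynamoDB item.
    Attribute names are assumptions; match them to your table schema."""
    event = json.loads(raw_data)
    return {
        "ip": {"S": event["ip"]},
        "time": {"S": event["time"]},
        "url": {"S": event["url"]},
        "status": {"S": str(event["status"])},
    }

def consume(shard_id, stream="weblog-stream", table="weblog-analysis"):
    """Poll one shard and mirror its records into DynamoDB for analysis."""
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-west-2")
    dynamodb = boto3.client("dynamodb", region_name="us-west-2")
    it = kinesis.get_shard_iterator(
        StreamName=stream, ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON")["ShardIterator"]
    while it:
        resp = kinesis.get_records(ShardIterator=it, Limit=100)
        for rec in resp["Records"]:
            dynamodb.put_item(TableName=table, Item=record_to_item(rec["Data"]))
        it = resp.get("NextShardIterator")
        time.sleep(1)  # stay under the per-shard read throughput limit

if __name__ == "__main__" and len(sys.argv) > 1:
    consume(sys.argv[1])  # shard id from the command line, as shown above
```

Running one consumer per shard (each with its own `<shard_id>`) is what lets the pipeline keep up with the producer's stream rate.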
- Also follow the steps in: https://aws.amazon.com/blogs/big-data/perform-near-real-time-analytics-on-streaming-data-with-amazon-kinesis-and-amazon-elasticsearch-service/
Project documents:
- Project Proposal
- Project Overview
- Project Status Report
References:
- Web log dataset: https://www.kaggle.com/shawon10/web-log-dataset
- Apache Kafka Javadoc: https://kafka.apache.org/10/javadoc/
- AWS CloudFormation Kinesis stream examples: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-kinesis-stream.html#aws-resource-kinesis-stream-examples
- kafka-python API documentation: https://kafka-python.readthedocs.io/en/master/apidoc/modules.html
- AWS Big Data Blog, "Perform near-real-time analytics on streaming data with Amazon Kinesis and Amazon Elasticsearch Service": https://aws.amazon.com/blogs/big-data/perform-near-real-time-analytics-on-streaming-data-with-amazon-kinesis-and-amazon-elasticsearch-service/