No-click bias analytics for online news sources. MicroStrategy Analytics Prize winner at VTHacks 2019.
A real-time data processing pipeline using AWS Kinesis Firehose, S3, Lambda, Comprehend, and RDS.
- AWS Lambda functions
- Adding to AWS Firehose (Python)
The main Lambda function, which processes the Firehose data, does the following:
- Reads Firehose data from S3
- Calls AWS Comprehend API to add sentiment to data
- Adds the records to the PostgreSQL database
The other Lambda function can be scheduled to run at a fixed interval to stream data into Kinesis Firehose. It does the following:
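The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: it assumes each Firehose record is a newline-terminated JSON object, and the field names (`source`, `title`, `url`), the `articles` table, and the `DB_*` environment variables are all hypothetical. `psycopg2` is not part of the Lambda runtime, so it would need to be bundled in the zip.

```python
import json
import os
import urllib.parse


def parse_firehose_object(body):
    """Split an S3 object written by Firehose into JSON records.

    Assumes the producer appended a newline after every JSON record.
    """
    lines = body.decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def add_sentiment(records, comprehend, text_field="title"):
    """Attach a Comprehend sentiment label to each record.

    text_field is a hypothetical name for the field that holds the text.
    """
    for record in records:
        text = record.get(text_field) or ""
        if not text:
            record["sentiment"] = None  # nothing to analyze
            continue
        result = comprehend.detect_sentiment(Text=text, LanguageCode="en")
        record["sentiment"] = result["Sentiment"]
    return records


def insert_records(records, cursor):
    """Insert records through a DB-API cursor (hypothetical table and columns)."""
    rows = [
        (r.get("source"), r.get("title"), r.get("url"), r.get("sentiment"))
        for r in records
    ]
    cursor.executemany(
        "INSERT INTO articles (source, title, url, sentiment) "
        "VALUES (%s, %s, %s, %s)",
        rows,
    )


def lambda_handler(event, context):
    # Imported here so the helpers above can be tested without AWS libraries.
    import boto3
    import psycopg2

    s3 = boto3.client("s3")
    comprehend = boto3.client("comprehend")
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],          # hypothetical variable names
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    try:
        for item in event["Records"]:  # one entry per new S3 object
            bucket = item["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = urllib.parse.unquote_plus(item["s3"]["object"]["key"])
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            records = add_sentiment(parse_firehose_object(body), comprehend)
            with conn, conn.cursor() as cursor:  # commits on success
                insert_records(records, cursor)
    finally:
        conn.close()
```

Keeping the parsing, sentiment, and insert steps as separate functions that take their clients as arguments makes it possible to exercise them locally with stand-in objects before deploying.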
- Calls NewsAPI to collect data for streaming
- Cleans NewsAPI response
- Puts each record into AWS Firehose
- API keys and environment variables are not included; be sure to add your own if you want to use these scripts.
- The Lambda functions are included as lambda_(general function). Make sure to zip the contents of each one before adding it as a Lambda function.
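A rough sketch of this producer function is shown below. It is an illustration under assumptions rather than the repo's implementation: the fields kept by `clean_article`, the `country` default, and the `NEWSAPI_KEY` and `FIREHOSE_STREAM` environment variable names are hypothetical. It uses the NewsAPI `top-headlines` endpoint and appends a newline to each record so that a consumer can split the concatenated S3 object line by line.

```python
import json
import os
import urllib.parse
import urllib.request

NEWSAPI_URL = "https://newsapi.org/v2/top-headlines"


def fetch_articles(api_key, country="us"):
    """Fetch the current top headlines from NewsAPI."""
    query = urllib.parse.urlencode({"country": country, "apiKey": api_key})
    with urllib.request.urlopen(f"{NEWSAPI_URL}?{query}", timeout=10) as resp:
        return json.load(resp).get("articles", [])


def clean_article(article):
    """Keep only the fields needed downstream (a hypothetical selection)."""
    return {
        "source": (article.get("source") or {}).get("name"),
        "title": (article.get("title") or "").strip(),
        "url": article.get("url"),
        "published_at": article.get("publishedAt"),
    }


def put_articles(articles, firehose, stream_name):
    """Send each cleaned article to Firehose; return how many were sent."""
    sent = 0
    for article in articles:
        cleaned = clean_article(article)
        if not cleaned["title"]:
            continue  # skip articles with no usable text
        data = (json.dumps(cleaned) + "\n").encode("utf-8")
        firehose.put_record(
            DeliveryStreamName=stream_name,
            Record={"Data": data},
        )
        sent += 1
    return sent


def lambda_handler(event, context):
    # Imported here so the helpers above can be tested without boto3.
    import boto3

    firehose = boto3.client("firehose")
    articles = fetch_articles(os.environ["NEWSAPI_KEY"])
    sent = put_articles(articles, firehose, os.environ["FIREHOSE_STREAM"])
    return {"records_sent": sent}
```

For larger batches, `put_record_batch` would reduce the number of API calls; `put_record` is used here to mirror the one-record-at-a-time step described above.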
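As one possible way to build the zip, the helper below packages the contents of a function's directory with paths relative to that directory, so the handler file sits at the root of the archive. The function name and the example paths are hypothetical; substitute the actual directory in this repo.

```python
import zipfile
from pathlib import Path


def zip_lambda_dir(source_dir, zip_path):
    """Zip the contents of source_dir (not the directory itself)."""
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories and the archive itself if it lives in source_dir.
            if path.is_file() and path.resolve() != target:
                archive.write(path, path.relative_to(source))
    return zip_path
```

For example, `zip_lambda_dir("path/to/function_dir", "function.zip")` would produce an archive that can be uploaded through the Lambda console or the AWS CLI.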