This project is an end-to-end ML pipeline for natural language processing with Amazon SageMaker. The main objective is to build infrastructure for continuous analytics and to automate the pipeline. The project involves training and tuning a text classifier that predicts the star rating (1-5) of product reviews, using the state-of-the-art BERT model for language representation. To build the BERT-based NLP text classifier, I used a product reviews dataset where each record contains the review text and a star rating (1-5). Advanced model training and deployment techniques such as hyper-parameter tuning, A/B testing, and bandit testing are also performed. Lastly, a real-time streaming analytics and data science pipeline is set up to perform window-based aggregations and anomaly detection.
- Data Ingestion and Analysis with AWS S3, Redshift, and Athena
- Exploring and visualizing the data with Athena and Matplotlib
- Building an Automated Data Pipeline with EventBridge and Step Functions
- Testing different models in live production with AB testing
- Dynamically shifting the traffic to better performing BERT model with Multi-Armed Bandits
- Continuous Analytics and ML over Streaming data with AWS Kinesis
- Setup
- Todos
- Acknowledgements
- Citation
- Connect with me
For data ingestion, I have used Amazon S3 as the central data lake, where all raw data is stored in TSV/Parquet format. This data needs to be accessed by both the data science / machine learning team and the business intelligence / data analyst team.
The data science and machine learning teams need access to all of the raw data and must be able to explore it quickly. They leverage Amazon Athena, an interactive query service, to analyze data in Amazon S3 using standard SQL without moving the data.
The business intelligence team and data analysts mostly want a subset of the data in a data warehouse, which they can then transform and query with their standard SQL clients to create reports and visualize trends. For this, I have leveraged Amazon Redshift, a fully managed data warehouse service.
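As a minimal sketch of how the data science team can query the raw reviews in S3 with standard SQL, the snippet below builds an Athena query and shows (commented out) how it would be submitted with boto3. The database, table, and bucket names here are illustrative placeholders, not the repo's actual names.

```python
# Hypothetical sketch: querying the reviews data in place via Athena.
def build_athena_query(database: str, table: str) -> str:
    """Return a standard-SQL query Athena can run directly against data in S3."""
    return (
        f"SELECT product_category, AVG(star_rating) AS avg_rating "
        f"FROM {database}.{table} "
        f"GROUP BY product_category ORDER BY avg_rating DESC"
    )

query = build_athena_query("dsoaws", "amazon_reviews_parquet")

# With AWS credentials configured, the query would be submitted like this:
# import boto3
# athena = boto3.client("athena")
# athena.start_query_execution(
#     QueryString=query,
#     ResultConfiguration={"OutputLocation": "s3://<your-bucket>/athena/results/"},
# )
```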
Explore the data ingestion notebook Ingest.ipynb for more details.
This section involves exploring, visualizing, and understanding part of the raw data with Athena and Matplotlib. I have covered these important questions regarding the problem statement and raw data:
- Which Product Categories are Highest Rated by Average Rating?
- Which Product Categories Have the Most Reviews?
- When did each product category become available in the Data catalog based on the date of the first review?
- What is the breakdown of ratings (1-5) per product category?
- How Many Reviews per Star Rating? (5, 4, 3, 2, 1)
- How Did Star Ratings Change Over Time?
- Which Star Ratings (1-5) are Most Helpful?
- Which Products have Most Helpful Reviews? How Long are the Most Helpful Reviews?
- What is the Ratio of Positive (5, 4) to Negative (3, 2, 1) Reviews?
- Which Customers are Abusing the Review System by Repeatedly Reviewing the Same Product? What Was Their Average Star Rating for Each Product?
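To make the first of these questions concrete, here is a toy, locally runnable version of the aggregation that the Athena queries perform at scale. The records and field names below are illustrative stand-ins for the real dataset schema.

```python
from collections import defaultdict

# Toy records mimicking the review schema (review text omitted for brevity);
# the real dataset lives in S3 and is queried with Athena.
reviews = [
    {"product_category": "Books", "star_rating": 5},
    {"product_category": "Books", "star_rating": 4},
    {"product_category": "Toys", "star_rating": 2},
    {"product_category": "Toys", "star_rating": 3},
    {"product_category": "Toys", "star_rating": 4},
]

# "Which product categories are highest rated by average rating?"
totals = defaultdict(lambda: [0, 0])  # category -> [sum of stars, review count]
for r in reviews:
    totals[r["product_category"]][0] += r["star_rating"]
    totals[r["product_category"]][1] += 1

avg_rating = {cat: s / n for cat, (s, n) in totals.items()}
ranked = sorted(avg_rating.items(), key=lambda kv: kv[1], reverse=True)
```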
Explore the data analysis notebook Analysis.ipynb for more details.
This section involves creating an automated training and deployment pipeline using AWS Step Functions. The AWS Step Functions Data Science SDK enables us to construct and run machine learning workflows that use AWS infrastructure directly from Python, instantiate common training pipelines, and create standard machine learning workflows in a Jupyter notebook from templates. Using this SDK, we can create steps, chain them together into a workflow, create that workflow in AWS Step Functions, and execute it in the AWS cloud.
To automate the entire training and deployment process, I have used Amazon EventBridge, with AWS S3 acting as the event source and the Step Functions state machine as the target; AWS CloudTrail logs the S3 events. Amazon EventBridge is a serverless event bus that makes it easy to connect applications using data from our own applications, integrated Software-as-a-Service (SaaS) applications, and AWS services. We can choose an event source (e.g. Amazon S3) and select a target from a number of AWS services, including AWS Step Functions, AWS Lambda, Amazon SNS, and Amazon Kinesis Data Firehose. Amazon EventBridge delivers the events automatically in near real time.
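The S3-to-Step-Functions trigger described above can be sketched as an EventBridge rule. The event pattern below is built locally; the bucket name, rule name, and ARNs are placeholders, and the actual boto3 calls are shown commented out since they require AWS credentials.

```python
import json

# Hypothetical EventBridge rule pattern: trigger the training pipeline
# whenever a new object lands in the training-data bucket. S3 object-level
# events are surfaced through CloudTrail, hence the detail-type below.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject"],
        "requestParameters": {"bucketName": ["<your-training-bucket>"]},
    },
}

pattern_json = json.dumps(event_pattern)

# With credentials configured, the rule and its Step Functions target
# would be created roughly like this:
# import boto3
# events = boto3.client("events")
# events.put_rule(Name="retrain-on-new-data", EventPattern=pattern_json)
# events.put_targets(
#     Rule="retrain-on-new-data",
#     Targets=[{"Id": "1",
#               "Arn": "arn:aws:states:<region>:<account>:stateMachine:<name>",
#               "RoleArn": "arn:aws:iam::<account>:role/<events-role>"}],
# )
```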
Explore the training and deployment pipeline notebook TrainDeploy_Pipeline.ipynb for more details.
A/B testing is a statistical approach for comparing two or more versions/features to evaluate not only which one works better but also whether the difference is statistically significant. A/B testing can be used for a variety of purposes, such as refining the messaging and design of marketing campaigns, increasing conversion rates by improving the user experience, and optimizing assets such as web pages and ads based on user involvement.
This section involves A/B testing two models that have been trained on different subsets of the data: Model A has been trained on one-month-old data, and Model B has been trained on the most recent data. We use traffic splitting to direct subsets of users to different model variants for the purpose of comparing and testing the models in live production. The goal is to see which variant performs better. Often, these tests need to run for a long period of time (weeks) to be statistically significant. The figure shows the two models deployed with a random 50-50 traffic split between the two variants.
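The 50-50 split can be illustrated with a tiny routing simulation: each incoming request is sent to one of the two variants at random, which mirrors what a SageMaker multi-variant endpoint does when both production variants carry equal weight. This is a local sketch, not the endpoint configuration from the notebook.

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def route_request() -> str:
    """Route a single request to variant A or B with equal probability."""
    return "variant-A" if random.random() < 0.5 else "variant-B"

# Simulate 10,000 incoming inference requests.
counts = {"variant-A": 0, "variant-B": 0}
for _ in range(10_000):
    counts[route_request()] += 1
```

Over many requests, each variant receives roughly half the traffic, which is what makes the comparison between the two models fair.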
Explore the A/B testing notebook AB-Test.ipynb for more details.
Unlike traditional A/B tests, the bandit model learns the best BERT model (action) for a given context over time and begins to shift traffic to the best model. Depending on the aggressiveness of the selected bandit algorithm, the bandit model continuously explores the under-performing models but starts to favor and exploit the over-performing ones. And unlike A/B tests, multi-armed bandits allow us to add a new action (i.e. a new BERT model) dynamically throughout the life of the experiment. When the bandit model sees the new BERT model, it starts sending it traffic and exploring its accuracy alongside the existing BERT models in the experiment.
This implementation continuously updates a Vowpal Wabbit reinforcement learning model using Amazon SageMaker, DynamoDB, Kinesis, and S3.
The client application, a recommender system with a review service in our case, pings the SageMaker hosting endpoint that serves the bandit model. The application sends an event with the context (i.e. user, product, and review text) to the bandit model and receives a recommended action in return. In our case, the action is one of the two BERT models we are testing. The bandit model stores this event data (given context and recommended action) in S3 using Amazon Kinesis.
The client application uses the recommended BERT model to classify the review text as a star rating of 1 through 5 and compares the predicted star rating to the user-selected star rating. If the BERT model correctly predicts the star rating of the review text (i.e. matches the user-selected star rating), the bandit model receives reward=1. If the BERT model incorrectly classifies the star rating, the bandit model is not rewarded (reward=0).
The client application stores the rewards data in S3 using Amazon Kinesis. Periodically (e.g. every 100 rewards), we incrementally train an updated bandit model with the latest reward and event data. This updated bandit model is evaluated against the current model using a holdout dataset of rewards and events. If the updated model's accuracy is above a given threshold relative to the existing model, it is automatically deployed in a blue/green manner with no downtime. SageMaker RL supports offline evaluation by performing counterfactual analysis (CFA); by default, we apply the doubly robust (DR) estimation method. The bandit model tries to minimize the cost (1 - reward), so a smaller evaluation score indicates better bandit model performance.
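The choose/reward/shift loop above can be sketched with a toy epsilon-greedy bandit. The real project uses a Vowpal Wabbit contextual bandit on SageMaker RL; this simplified, context-free version (with made-up model accuracies) only illustrates how traffic drifts toward the better-performing BERT model as rewards accumulate.

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Hypothetical true accuracies of the two candidate BERT models.
TRUE_ACCURACY = {"bert-A": 0.60, "bert-B": 0.75}
EPSILON = 0.1  # fraction of traffic reserved for exploration

stats = {m: {"pulls": 0, "reward": 0.0} for m in TRUE_ACCURACY}

def choose_model() -> str:
    """Epsilon-greedy: usually exploit the best mean reward, sometimes explore."""
    if random.random() < EPSILON or any(s["pulls"] == 0 for s in stats.values()):
        return random.choice(list(stats))  # explore
    return max(stats, key=lambda m: stats[m]["reward"] / stats[m]["pulls"])

for _ in range(5_000):
    model = choose_model()
    # reward=1 when the chosen model's predicted star rating matches the
    # user-selected rating; simulated here by the model's true accuracy.
    reward = 1.0 if random.random() < TRUE_ACCURACY[model] else 0.0
    stats[model]["pulls"] += 1
    stats[model]["reward"] += reward
```

After a few thousand simulated requests, the higher-accuracy model ends up receiving most of the traffic, while the under-performer still gets occasional exploration traffic, just as the description above says.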
Explore the multi-armed bandit notebook Bandit-Test.ipynb for more details.
In this section, I have tried to move from the customer reviews training dataset to a real-world scenario. Customer feedback about products appears in all of a company's social media channels, on partner websites, in customer support messages, etc. We need to capture this valuable customer sentiment as quickly as possible to spot trends and react fast. I have focused on analyzing a continuous stream of product review messages collected from all available online channels. In this project, I have used Kinesis Data Firehose to prepare and load the data continuously to a destination of our choice, Kinesis Data Analytics to process and analyze the data as it arrives, and Kinesis Data Streams to handle the ingestion of data streams for custom applications.
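A review message enters the pipeline as a small JSON record put onto the stream. The sketch below builds such a payload locally; the field names and stream name are illustrative, and the actual Kinesis call is shown commented out since it needs AWS credentials.

```python
import json

# Hypothetical review message as it would be put onto the Kinesis stream.
message = {
    "review_id": "R123",
    "product_category": "Books",
    "review_body": "Loved this book, arrived quickly!",
    "star_rating": 5,
}
payload = json.dumps(message).encode("utf-8")

# With credentials configured, the record would be sent like this:
# import boto3
# kinesis = boto3.client("kinesis")
# kinesis.put_record(
#     StreamName="<your-review-stream>",
#     Data=payload,
#     PartitionKey=message["review_id"],
# )
```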
In the first step, I analyze the sentiment of each customer so we can identify which customers might need high-priority attention. Next, we run continuous streaming analytics over the incoming review messages to capture the average sentiment per product category. We visualize the continuous average sentiment in a metrics dashboard for the line-of-business owners, who can then detect sentiment trends quickly and take action. We also calculate an anomaly score for the incoming messages to detect anomalies in the data schema or data values; in case of a rising anomaly score, we can alert the application developers in charge to investigate the root cause. As a last metric, we calculate a continuous approximate count of the received messages, which the digital marketing team could use to measure the effectiveness of social media campaigns.
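The three streaming metrics can be sketched in plain Python. The real pipeline computes them in Kinesis Data Analytics SQL (with RANDOM_CUT_FOREST producing the anomaly score); here a simple distance-from-window-mean stands in for the anomaly metric, and the stream, categories, and window size are illustrative.

```python
from collections import defaultdict

WINDOW = 5  # size of the tumbling window, in messages per category

def window_averages(stream):
    """Average the last WINDOW sentiment scores per product category."""
    buckets = defaultdict(list)
    for category, sentiment in stream:
        buckets[category].append(sentiment)
    return {c: sum(v[-WINDOW:]) / len(v[-WINDOW:]) for c, v in buckets.items()}

def anomaly_score(history, value):
    """Rough stand-in for an anomaly score: distance from the window mean."""
    if not history:
        return 0.0
    mean = sum(history) / len(history)
    return abs(value - mean)

# Toy stream of (product_category, sentiment) messages.
stream = [("Books", 0.9), ("Books", 0.8), ("Toys", -0.4), ("Books", 0.85)]
avgs = window_averages(stream)
approx_count = len(stream)  # the real pipeline approximates this continuously
```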
Explore the streaming analytics notebook StreamingAnalytics.ipynb for more details.
Let's get this thing running! Follow these steps:
- Create an AWS account and launch SageMaker Studio.
- Configure IAM to run the notebooks: attach the AdministratorAccess policy.
- Launch a terminal in your SageMaker Jupyter instance and clone the repo: git clone https://github.com/abideenml/RealTime-StarRatingPrediction-with-AWSKinesis
- Navigate into the project directory: cd path_to_repo
- Create a new venv environment and run pip install -r requirements.txt.
- Run the Ingest.ipynb, Analysis.ipynb, TrainDeploy_Pipeline.ipynb, StreamingAnalytics.ipynb, AB-Test.ipynb, and Bandit-Test.ipynb notebooks in order for ingestion, exploration, model training, real-time model prediction, A/B testing, and bandit testing.
That's it!
Finally, there are a few more todos that I hope to add soon:
- Test Data quality with Deequ and also add workflow to capture data drift.
- Build an AWS QuickSight dashboard to view KPIs and other metrics.
- Use Kubeflow for managing machine learning workflows.
I found these resources useful (while developing this one):
- BERT Paper
- AWS Sagemaker
- Multi-Armed Bandit
- AWS Kinesis
- AWS Serverless
- Working with Contextual Bandits
If you find this code useful, please cite the following:
@misc{Zain2023Realtime-ratingprediction-productreviews,
author = {Zain, Abideen},
title = {realtime-ratingprediction-productreviews},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/abideenml/RealTime-StarRatingPrediction-with-AWSKinesis}},
}
If you'd love to have some more AI-related content in your life 🤓, consider:
- Connect with me on LinkedIn and Twitter
- Follow me on 📚 Medium
- Subscribe to my 📢 weekly AI newsletter!