From 8140023adad3d82733bc30e5c2681d87c0c316fa Mon Sep 17 00:00:00 2001 From: Jason Andrews Date: Thu, 2 Jan 2025 13:24:14 -0600 Subject: [PATCH] first review of sentiment analysis Learning Path --- ... monitoring with Prometheus and Grafana.md | 49 +++++++++----- ...onitoring with Elasticsearch and Kibana.md | 17 +++-- .../Sentiment Analysis.md | 65 ++++++++++++++----- .../Understand the basics.md | 27 ++++---- .../sentiment-analysis-eks/_index.md | 19 +++--- 5 files changed, 117 insertions(+), 60 deletions(-) diff --git a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Cluster monitoring with Prometheus and Grafana.md b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Cluster monitoring with Prometheus and Grafana.md index e16092944f..7ac80c7525 100644 --- a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Cluster monitoring with Prometheus and Grafana.md +++ b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Cluster monitoring with Prometheus and Grafana.md @@ -1,33 +1,48 @@ --- -title: Cluster monitoring with Prometheus and Grafana in Amazon EKS +title: Monitor the cluster with Prometheus and Grafana weight: 5 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## CPU and RAM usage statistics with Prometheus and Grafana +## Monitor CPU and RAM usage with Prometheus and Grafana -Prometheus is a monitoring and alerting tool. It is used for collecting and querying real-time metrics in cloud-native environments like Kubernetes. Prometheus collects essential metrics (e.g., CPU, memory usage, pod counts, request latency) that help in monitoring the health and performance of Kubernetes clusters. Grafana is a visualization and analytics tool that integrates with data sources from Prometheus, to create interactive dashboards to monitor and analyze Kubernetes metrics over time. +Prometheus is a monitoring and alerting tool. 
It is used for collecting and querying real-time metrics in cloud-native environments like Kubernetes. Prometheus collects essential metrics about CPU usage, memory usage, pod counts, and request latency. This helps you monitor the health and performance of your Kubernetes clusters.
+Grafana is a visualization and analytics tool that integrates with data sources from Prometheus to create interactive dashboards to monitor and analyze Kubernetes metrics over time.
 
-## Install Prometheus on Arm-based EKS cluster
+## Install Prometheus on your EKS cluster
 
-This learning path uses `helm` to install prometheus on the Kubernetes cluster. Follow the [helm documentation](https://helm.sh/docs/intro/install/) to install it on your laptop.
+You can use Helm to install Prometheus on the Kubernetes cluster.
 
-Create a namespace in your EKS cluster to host `prometheus` pods
+Follow the [Helm documentation](https://helm.sh/docs/intro/install/) to install it on your computer.
+
+Confirm Helm is installed by running the version command:
+
+```console
+helm version
+```
+
+The output is similar to:
+
+```output
+version.BuildInfo{Version:"v3.16.3", GitCommit:"cfd07493f46efc9debd9cc1b02a0961186df7fdf", GitTreeState:"clean", GoVersion:"go1.22.7"}
+```
+
+Create a namespace in your EKS cluster to host `prometheus` pods:
 
 ```console
 kubectl create namespace prometheus
 ```
 
-Add the following helm repo for prometheus
+Add the following Helm repo for Prometheus:
 
 ```console
 helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
 ```
 
-Install `prometheus` on the cluster with the following command
+Install Prometheus on the cluster with the following command:
 
 ```console
 helm install prometheus prometheus-community/prometheus \
@@ -36,22 +51,21 @@ helm install prometheus prometheus-community/prometheus \
 --set server.persistentVolume.storageClass="gp2"
 ```
 
-Check all pods are up and running
+Check all pods are up and running:
 
 ```console
 kubectl get pods -n prometheus
``` +## Install Grafana on your EKS cluster -## Install Grafana on Arm-based EKS cluster - -Add the following helm repo for grafana +Add the following Helm repo for Grafana: ```console helm repo add grafana https://grafana.github.io/helm-charts ``` -Create `grafana.yaml` file with the following contents +Use a text editor to create a `grafana.yaml` file with the following contents: ```console datasources: @@ -65,13 +79,13 @@ datasources: isDefault: true ``` -Create another namespace for `grafana` pods +Create another namespace for Grafana pods: ```console kubectl create namespace grafana ``` -Install `grafana` on the cluster with the following command +Install Grafana on the cluster with the following command: ```console helm install grafana grafana/grafana \ @@ -82,12 +96,15 @@ helm install grafana grafana/grafana \ --values grafana.yaml \ --set service.type=LoadBalancer ``` + Check all pods are up and running ```console kubectl get pods -n grafana ``` -Login to the grafana dashboard using the LoadBalancer IP and click on `Dashboards` in the left navigation page. Locate a `Kubernetes / Compute Resources / Node` dashboard and click on it. You should see a dashboard like below for your Kubernetes cluster +Login to the grafana dashboard using the LoadBalancer IP and click on `Dashboards` in the left navigation page. Locate a `Kubernetes / Compute Resources / Node` dashboard and click on it. 
+
+You see a dashboard similar to the one below for your Kubernetes cluster:
 
 ![grafana #center](_images/grafana.png)
 
diff --git a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Monitoring with Elasticsearch and Kibana.md b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Monitoring with Elasticsearch and Kibana.md
index 7281c02570..671bc3e8e9 100644
--- a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Monitoring with Elasticsearch and Kibana.md
+++ b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Monitoring with Elasticsearch and Kibana.md
@@ -1,5 +1,5 @@
 ---
-title: Monitoring the sentiments with Elasticsearch and Kibana
+title: Monitoring sentiment with Elasticsearch and Kibana
 
 weight: 4
 
 ### FIXED, DO NOT MODIFY
@@ -8,11 +8,13 @@ layout: learningpathall
 
 ## Deploy Elasticsearch and Kibana on Arm-based EC2 instance
 
-Elasticsearch is a NoSQL database and search & analytics engine. It's designed to store, search and analyze large amounts of data. It has real-time indexing capability which is crucial for handling high-velocity data streams like tweets. Kibana is a dashboard and visualization tool that integrates seamlessly with Elasticsearch. It provides an interface to interact with twitter data, apply filters and receive alerts. There are multiple ways to install Elasticsearch and Kibana, one of the methods is shown below.
+Elasticsearch is a NoSQL database, search, and analytics engine. It's designed to store, search, and analyze large amounts of data. It has real-time indexing capability, which is crucial for handling high-velocity data streams like Tweets.
+Kibana is a dashboard and visualization tool that integrates seamlessly with Elasticsearch. It provides an interface to interact with Twitter data, apply filters, and receive alerts.
There are multiple ways to install Elasticsearch and Kibana; one method is shown below.
 
-Before you begin, ensure that docker and docker compose have been installed on your laptop.
+Before you begin, ensure that Docker and Docker Compose have been installed on your computer.
 
-Create the following docker-compose.yml file
+Use a text editor to create a `docker-compose.yml` file with the contents below:
 
 ```yml
 version: '2.18.1'
@@ -47,15 +49,18 @@ networks:
   elk:
     driver: bridge
 ```
+
 Use the following command to deploy Elasticsearch and Kibana Dashboard.
 
+```console
 docker-compose up
+```
 
-After the dashboard is up, use the the public IP of your server on the port 5601 to access the Kibana dashboard.
+After the dashboard is up, use the public IP of your server on port 5601 to access the Kibana dashboard.
 
 ![kibana #center](_images/kibana.png)
 
-Now switch to the stack management using the menu on the left side as shown in below image.
+Switch to the stack management using the menu on the left side, as shown in the image below.
 
 ![kibana-data #center](_images/Kibana-data.png)
 
@@ -71,7 +76,7 @@ One of the sample dashboard structures looks as below, showing the records of di
 
 ![kibana-dashboard2 #center](_images/Kibana-dashboard2.png)
 
-Similarly, you can desgin and create dashboards to analyze a particular set of data. The screenshot below shows the dashboard designed for this learning path
+Similarly, you can design and create dashboards to analyze a particular set of data. The screenshot below shows the dashboard designed for this Learning Path.
 
 ![kibana-dashboard3 #center](_images/Kibana-dashboard3.png)
 
diff --git a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Sentiment Analysis.md b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Sentiment Analysis.md
index dea389afdb..8ae7d8dfe0 100644
--- a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Sentiment Analysis.md
+++ b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Sentiment Analysis.md
@@ -8,20 +8,38 @@ layout: learningpathall
 
 ## Before you begin
 
-You will need an [AWS account](https://aws.amazon.com/).
Create an account if needed.
+You will need an [AWS account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-creating.html). Create an account if needed.
 
-Three tools are required on your local machine. Follow the links to install the required tools.
+Four tools are required on your local machine. Follow the links to install each tool.
 
 * [Kubectl](/install-guides/kubectl/)
-* [AWS CLI](/install-guides/aws-cli)
-* [Docker](/install-guides/docker)
-* [Terraform](/install-guides/terraform)
+* [AWS CLI](/install-guides/aws-cli/)
+* [Docker](/install-guides/docker/)
+* [Terraform](/install-guides/terraform/)
+
+To use the AWS CLI, you will need to generate AWS access keys and configure the CLI. Follow the [AWS Credentials](/install-guides/aws_access_keys/) install guide for instructions.
 
-## Setup sentiment analysis
+## Set up sentiment analysis
 
-Clone this github [repository](https://github.com/koleini/spark-sentiment-analysis) on your local workstation. Navigate to `eks` directory and update the `variables.tf` file with your AWS region.
+Take a look at the [GitHub repository](https://github.com/koleini/spark-sentiment-analysis), then clone it on your local computer:
+
+```console
+git clone https://github.com/koleini/spark-sentiment-analysis.git
+cd spark-sentiment-analysis
+```
+
+Edit the file `eks/variables.tf` if you want to change the default AWS region.
+
+The default value is at the top of the file and is set to `us-east-1`.
+
+```output
+variable "AWS_region" {
+  default     = "us-east-1"
+  description = "AWS region"
+}
+```
 
-Execute the following commands to create the Amazon EKS cluster with pre-configured labels.
+Execute the following commands to create the Amazon EKS cluster:
 
 ```console
 terraform init
@@ -30,8 +48,10 @@ terraform apply --auto-approve
 ```
 
 Update the `kubeconfig` file to access the deployed EKS cluster with the following command:
 
+If you want to use an AWS CLI profile not named `default`, change the profile name before running the command.
+
 ```console
-aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile
+aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name) --profile default
 ```
 
-Create a service account for Apache spark
+Create a service account for Apache Spark:
 
 ```console
 kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
 ```
 
 ## Build the sentiment analysis JAR file
 
-Navigate to the `sentiment_analysis` folder and create a JAR file for the sentiment analyzer
+Navigate to the `sentiment_analysis` folder and create a JAR file for the sentiment analyzer:
 
 ```console
 cd sentiment_analysis
 sbt assembly
 ```
 
-You should see a JAR file created at the following location
+A JAR file is created at the following location:
 
 ```console
 sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar
 ```
 
-## Create Spark docker container image
+## Create a Spark container image
 
-Create a repository in Amazon ECR to store the docker images. You can also use Docker Hub.
+Create a repository in Amazon ECR to store the Docker images. You can also use Docker Hub.
 
-The Spark repository contains a script to build the Docker image needed for running inside the Kubernetes cluster. Execute this script on your Arm-based laptop to build the arm64 image.
+The Spark repository contains a script to build the container image you need to run inside the Kubernetes cluster.
+
+Execute this script on your Arm-based computer to build the arm64 image.
 
-In the current working directory, clone the `apache spark` github repository prior to building the image
+In the current working directory, clone the Apache Spark GitHub repository prior to building the image:
 
 ```console
 git clone https://github.com/apache/spark.git
 cd spark
 git checkout v3.4.3
 ```
-Build the docker container using the following commands:
+
+Build the Docker container using the following commands. Substitute the name of your container repository before running the commands.
 ```console
 cp ../sentiment_analysis/target/scala-2.13/bigdata-assembly-0.1.jar jars/
 bin/docker-image-tool.sh -r -t sentiment-analysis build
 bin/docker-image-tool.sh -r -t sentiment-analysis push
 ```
+
 ## Run Spark computation on the cluster
 
 Execute the `spark-submit` command within the Spark folder to deploy the application. The following commands will run the application with two executors, each with 12 cores, and allocate 24GB of memory for both the executors and driver pods.
 
-Set the following variables before executing the `spark-submit` command
+Set the following variables before executing the `spark-submit` command:
 
 ```console
 export MASTER_ADDRESS=
 export ES_ADDRESS=
 export CHECKPOINT_BUCKET=
 export EKS_ADDRESS=
 ```
-Execute the following command
+
+Execute the `spark-submit` command:
 
 ```console
 bin/spark-submit \
@@ -122,16 +147,20 @@ spark-twitter 1/1 Running 0 12m
 ```
 
 ## Twitter sentiment analysis
 
-Create a twitter(X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and create a `bearer token`. Using the following script to fetch the tweets
+Create a Twitter (X) [developer account](https://developer.x.com/en/docs/x-api/getting-started/getting-access-to-the-x-api) and create a `bearer token`.
+
+Use the following commands to set the token and fetch the Tweets:
 
 ```console
 export BEARER_TOKEN=
 python3 scripts/xapi_tweets.py
 ```
 
-You can modify the script `xapi_tweets.py` with your own keywords. Update the following section in the script to do so
+You can modify the script `xapi_tweets.py` with your own keywords.
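
For example, a helper along these lines could build a customized query for the X API recent search endpoint. This is only a sketch: the function name `build_query_params` and this structure are illustrative assumptions, not the exact code in `xapi_tweets.py`.

```python
# Sketch of building customized query parameters for the X API recent
# search endpoint. The helper name and structure are illustrative; the
# keywords below match the ones used in this Learning Path.
def build_query_params(keywords, lang="en", exclude_retweets=True):
    # Combine the keywords with OR, optionally excluding retweets,
    # and restrict results to a single language.
    query = "(" + " OR ".join(keywords) + ")"
    if exclude_retweets:
        query += " -is:retweet"
    query += f" lang:{lang}"
    return {"query": query, "tweet.fields": "lang"}

params = build_query_params(["#onArm", "@Arm", "#Arm", "#GenAI"])
print(params["query"])
# (#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en
```

Swapping in your own hashtags or handles in the `keywords` list is all that is needed to track a different topic.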
-```console +Here is the code which includes the keywords: + +```output query_params = {'query': "(#onArm OR @Arm OR #Arm OR #GenAI) -is:retweet lang:en", 'tweet.fields': 'lang'} ``` diff --git a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Understand the basics.md b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Understand the basics.md index 62a2be94f9..0b9aedae6a 100644 --- a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Understand the basics.md +++ b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/Understand the basics.md @@ -1,27 +1,30 @@ --- -title: What is Twitter Sentiment Analysis +title: Understand sentiment analysis weight: 2 ### FIXED, DO NOT MODIFY layout: learningpathall --- -## What is Sentiment Analysis +## What is sentiment analysis? -Sentiment analysis is a natural language processing technique used to identify and categorize opinions expressed in a piece of text, such as a tweet or a product review. It can help to gauge public opinion, identify trends and patterns, and improve decision-making. Social media platforms, such as Twitter, provide a wealth of information about public opinion, trends, and events. Sentiment analysis is important because it provides insights into how people feel about a particular topic or issue, and can help to identify emerging trends and patterns. +Sentiment analysis is a natural language processing technique used to identify and categorize opinions expressed in a piece of text, such as a tweet or a product review. It can help gauge public opinion, identify trends and patterns, and improve decision-making. Social media platforms, such as Twitter (X), provide a wealth of information about public opinion, trends, and events. Sentiment analysis is important because it provides insights into how people feel about a particular topic or issue, and can help to identify emerging trends and patterns. 
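+
+The idea can be illustrated with a deliberately simple, lexicon-based scorer. This is only a sketch to show the concept; the solution in this Learning Path uses a trained model running on Apache Spark, and the word lists below are made up for illustration.
+
```python
# Minimal lexicon-based sentiment scorer, for illustration only.
# Real pipelines use trained models rather than fixed word lists.
POSITIVE = {"great", "love", "fast", "excellent", "happy"}
NEGATIVE = {"slow", "hate", "terrible", "broken", "sad"}

def classify(text: str) -> str:
    # Normalize each word and count matches against the two lexicons.
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love how fast this runs!"))  # positive
print(classify("Terrible, everything is broken."))  # negative
```
+
+A production system replaces the word lists with a model that understands context, negation, and tone, but the input and output are the same: text in, a sentiment label out.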
+## Can I perform real-time sentiment analysis using an Arm-based Amazon EKS cluster?
 
-## Real-time sentiment analysis with Arm-based Amazon EKS clusters
+Yes, you can use an Arm-based EKS cluster for sentiment analysis.
 
-Real-time sentiment analysis is a compute-intensive task and can quickly drive up resources and increase costs if not managed effectively. Tracking real-time changes enables organizations to understand sentiment patterns and make informed decisions promptly, allowing for timely and appropriate actions.
+Real-time sentiment analysis is a compute-intensive task and can quickly drive up resource usage and increase costs if not managed effectively. Tracking real-time changes enables you to understand sentiment patterns and make informed decisions promptly, allowing for timely and appropriate actions.
+
+The architecture used for the solution is shown below:
 
 ![sentiment analysis #center](_images/Sentiment-Analysis.png)
 
-The high-level technology stack for the solutions is as follows:
+The technology stack for the solution includes the following steps:
 
-- Twitter(X) Developer API to fetch tweets based on certain keywords
-- Captured data is processed using Amazon Kinesis
-- Sentiment Analyzer model to classify the text and tone of tweets
-- Process the sentiment of tweets using Apache Spark streaming API
-- Elasticsearch and Kibana to store the processed tweets and showcase on dashboard
-- Prometheus and Grafana to monitor the CPU and RAM resources of the Amazon EKS cluster
+- Use the Twitter (X) developer API to fetch Tweets based on certain keywords
+- Process the captured data using Amazon Kinesis
+- Run a sentiment analysis model to classify the text and tone of each Tweet
+- Process the sentiment of Tweets using the Apache Spark streaming API
+- Use Elasticsearch and Kibana to store the processed Tweets and showcase the activity on a dashboard
+- Monitor the CPU and RAM resources of the Amazon EKS cluster with Prometheus and Grafana
 
diff --git
a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/_index.md b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/_index.md index 08e6aab885..5fae514538 100644 --- a/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/sentiment-analysis-eks/_index.md @@ -1,19 +1,22 @@ --- -title: Learn how to perform Twitter(X) Sentiment Analysis on Arm-based EKS clusters +title: Learn how to perform Twitter (X) sentiment analysis on Arm-based EKS clusters + +draft: true +cascade: + draft: true minutes_to_complete: 60 -who_is_this_for: This is an advanced topic for software developers who like to build an end-to-end solution ML solution to analyze the sentiments of live tweets with Arm-based Amazon EKS cluster +who_is_this_for: This is an advanced topic for software developers who want to build an end-to-end ML sentiment analysis solution to analyze live Tweets on an Arm-based Amazon EKS cluster. learning_objectives: - - Deploy text classification model on Amazon EKS with Apache Spark - - Learn how to deploy Elasticsearch and Kibana dashboard to analyze the tweets - - Deploy Prometheus and Grafana dashboard to keep track of CPU and RAM usage of Kubernetes nodes + - Deploy a text classification model on Amazon EKS with Apache Spark. + - Use Elasticsearch and a Kibana dashboard to analyze the Tweets. + - Deploy Prometheus and Grafana dashboards to keep track of CPU and RAM usage of Kubernetes nodes. prerequisites: - - An [AWS account](https://aws.amazon.com/). Create an account if needed. - - A computer with [Amazon eksctl CLI](/install-guides/eksctl) and [kubectl](/install-guides/kubectl/)installed. - - Docker installed on local computer [Docker](/install-guides/docker) + - An AWS account. + - A computer with Docker, Terraform, the Amazon eksctl CLI, and kubectl installed. author_primary: Pranay Bakre, Masoud Koleini, Nobel Chowdary Mandepudi, Na Li