
In [13]:
# Bhagavan Mahamrityunjaya Mahadev

# Step 1:  Define the Architecture

In a microservices architecture, each service is a standalone component that can be scaled and deployed independently. In this case, we will deploy a Bandit-based recommendation system made up of the following services:

* Front-end service: This interacts with the users.
* Bandit algorithm service: This handles the decision-making for real-time personalized recommendations.
* Recommendation database: Stores and updates information about user preferences and historical data.
* Model management service: This can handle retraining of the model based on new data.
* Monitoring service: This tracks the performance and accuracy of the recommendations.

The architecture includes:

* 1.] API Gateway: Entry point for incoming requests.
* 2.] Bandit Algorithm Service: Deployed as a microservice using Docker and Kubernetes.
* 3.] Database: Stores data like user history, action-reward logs.
* 4.] Monitoring and Metrics: Provides insight into recommendation performance and supports A/B testing.
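
As a rough illustration of what the action-reward log in the database could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names, and the `log_action` / `log_reward` helpers are assumptions made for this example; a production system would likely use a dedicated database server.

```python
import sqlite3
import time

# Hypothetical schema for the action-reward log; all names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS action_reward_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    chosen_arm INTEGER NOT NULL,
    reward REAL,              -- NULL until feedback arrives
    created_at REAL NOT NULL  -- Unix timestamp
)
"""


def log_action(conn, user_id, chosen_arm):
    """Record a recommendation that was served and return its row id."""
    cur = conn.execute(
        "INSERT INTO action_reward_log (user_id, chosen_arm, created_at) VALUES (?, ?, ?)",
        (user_id, chosen_arm, time.time()),
    )
    conn.commit()
    return cur.lastrowid


def log_reward(conn, row_id, reward):
    """Attach the observed reward to a previously logged action."""
    conn.execute(
        "UPDATE action_reward_log SET reward = ? WHERE id = ?", (reward, row_id)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(SCHEMA)
row_id = log_action(conn, user_id="user-123", chosen_arm=2)
log_reward(conn, row_id, reward=1.0)
print(conn.execute("SELECT user_id, chosen_arm, reward FROM action_reward_log").fetchall())
```

Logging the action first and attaching the reward later mirrors the two-step flow of the service, where the recommendation and the feedback arrive in separate requests.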

# Step 2: Implement the Bandit Algorithm

Let’s start by implementing the Bandit-based reinforcement learning algorithm.

For simplicity, we will use an Epsilon-Greedy Bandit.

In [7]:
import random

class EpsilonGreedyBandit:
    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = [0] * n_arms  # Track the number of times each arm was chosen
        self.values = [0.0] * n_arms  # Estimated value of each arm

    def select_arm(self):
        # With probability epsilon, choose a random arm (exploration)
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        # Otherwise choose the arm with the highest estimated value (exploitation)
        return max(range(self.n_arms), key=lambda x: self.values[x])

    def update(self, chosen_arm, reward):
        # Update counts
        self.counts[chosen_arm] += 1
        # Update the estimated value of the chosen arm
        n = self.counts[chosen_arm]
        value = self.values[chosen_arm]
        # Incremental update to the mean
        self.values[chosen_arm] = ((n - 1) / n) * value + (1 / n) * reward
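
To sanity-check the class, it can be run against simulated users. The sketch below repeats a condensed copy of the class so that it runs on its own; the `simulate` helper, the seed, and the click rates are made-up values for illustration, not measurements.

```python
import random


class EpsilonGreedyBandit:
    """Condensed copy of the class above so this cell runs on its own."""

    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda arm: self.values[arm])

    def update(self, chosen_arm, reward):
        self.counts[chosen_arm] += 1
        n = self.counts[chosen_arm]
        value = self.values[chosen_arm]
        self.values[chosen_arm] = ((n - 1) / n) * value + (1 / n) * reward


def simulate(true_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Run the bandit against simulated Bernoulli arms (hypothetical helper)."""
    random.seed(seed)
    bandit = EpsilonGreedyBandit(n_arms=len(true_rates), epsilon=epsilon)
    for _ in range(n_rounds):
        arm = bandit.select_arm()
        # Reward is 1 with the arm's true click probability, otherwise 0
        reward = 1 if random.random() < true_rates[arm] else 0
        bandit.update(arm, reward)
    return bandit


bandit = simulate(true_rates=[0.2, 0.5, 0.8])  # made-up click rates
print("pulls per arm:", bandit.counts)
print("estimated values:", [round(v, 2) for v in bandit.values])
```

With enough rounds, the arm with the highest true rate should receive most of the pulls, and its estimated value should approach that rate.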


# Step 3: Build the Bandit Algorithm Microservice with Flask

Now we encapsulate the Bandit algorithm into a microservice using Flask.

This service will have two endpoints:
* /recommend: to get the recommendation (i.e., select an arm).
* /reward: to update the model with a reward after an action is taken.

In [10]:
# Flask API for Bandit Microservice:

#This code creates a simple microservice that listens for incoming requests,
#selects a recommendation (an arm), and updates the model based on the reward.

from flask import Flask, request, jsonify
# Outside the notebook, save the class in bandit.py and uncomment this import:
#from bandit import EpsilonGreedyBandit

app = Flask(__name__)

# Initialize the Bandit algorithm with 3 arms
bandit = EpsilonGreedyBandit(n_arms=3)

@app.route('/recommend', methods=['GET'])
def recommend():
    chosen_arm = bandit.select_arm()
    return jsonify({'chosen_arm': chosen_arm})

@app.route('/reward', methods=['POST'])
def reward():
    data = request.json
    chosen_arm = data['chosen_arm']
    reward = data['reward']
    bandit.update(chosen_arm, reward)
    return jsonify({'message': 'Reward updated successfully'})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)



 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://172.28.0.12:5000
INFO:werkzeug:Press CTRL+C to quit
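
The `/reward` handler above trusts the request body as-is, so a missing key or an out-of-range arm would surface as a server error. A minimal sketch of a validation step follows; `parse_reward_payload` is a hypothetical helper, not part of Flask, and the exact rules (integer arm, numeric reward) are assumptions.

```python
def parse_reward_payload(data, n_arms):
    """Return (chosen_arm, reward) or raise ValueError with a readable message."""
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    arm = data.get("chosen_arm")
    reward = data.get("reward")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(arm, int) or isinstance(arm, bool):
        raise ValueError("chosen_arm must be an integer")
    if not 0 <= arm < n_arms:
        raise ValueError(f"chosen_arm must be between 0 and {n_arms - 1}")
    if not isinstance(reward, (int, float)) or isinstance(reward, bool):
        raise ValueError("reward must be a number")
    return arm, float(reward)


print(parse_reward_payload({"chosen_arm": 2, "reward": 1}, n_arms=3))
```

Inside the handler, a caught `ValueError` could be turned into a 400 response instead of calling `bandit.update`.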


# Step 4: Containerize the Microservice Using Docker

Next, we will containerize the Flask microservice using Docker so it can be deployed in any environment.

In [None]:
# Dockerfile:

# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install the required packages
RUN pip install flask

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Run app.py when the container launches
CMD ["python", "app.py"]


In [None]:
# Build and run the Docker container:

# Build the image
docker build -t bandit-service .

# Run the container
docker run -p 5000:5000 bandit-service

# Step 5: Deploy on Kubernetes

To scale and orchestrate this microservice, we will deploy it on Kubernetes.

Here’s how to deploy it on Kubernetes using Minikube or a cloud provider like AWS or GCP.

In [None]:
# Kubernetes Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bandit-service-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bandit-service
  template:
    metadata:
      labels:
        app: bandit-service
    spec:
      containers:
      - name: bandit-service
        image: bandit-service:latest
        ports:
        - containerPort: 5000
        resources:
          requests:
            cpu: "100m"
            memory: "200Mi"
          limits:
            cpu: "500m"
            memory: "500Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: bandit-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 5000
  selector:
    app: bandit-service


This YAML file deploys 3 replicas of the Bandit microservice and exposes them through a load balancer, so traffic is spread across the replicas and the service stays available if a single pod fails.
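
One caveat worth noting: the Flask service in Step 3 keeps its bandit state in process memory, so each replica learns only from the requests it happens to receive. A minimal sketch of one way to reconcile them is to pool the estimates with a count-weighted mean. It assumes each replica can export its `counts` and `values` lists; the `merge_replica_states` helper and the sample numbers are hypothetical.

```python
def merge_replica_states(states):
    """Pool per-replica (counts, values) pairs into one set of estimates.

    Each element of `states` is a (counts, values) tuple as kept by one
    replica. The pooled value of an arm is the count-weighted mean.
    """
    n_arms = len(states[0][0])
    merged_counts = [0] * n_arms
    merged_values = [0.0] * n_arms
    for arm in range(n_arms):
        total = sum(counts[arm] for counts, _ in states)
        merged_counts[arm] = total
        if total > 0:
            weighted_sum = sum(counts[arm] * values[arm] for counts, values in states)
            merged_values[arm] = weighted_sum / total
    return merged_counts, merged_values


# Made-up states from two replicas, for illustration only
replica_a = ([10, 0, 30], [0.2, 0.0, 0.8])
replica_b = ([30, 20, 10], [0.4, 0.5, 0.6])
print(merge_replica_states([replica_a, replica_b]))
```

In practice, keeping the counts and values in a shared store, such as the recommendation database from Step 1, would avoid the need to merge at all.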

In [None]:
# Deploy on Kubernetes:

# Apply the deployment and service
kubectl apply -f bandit-deployment.yaml

# Check the deployment status
kubectl get deployments

# Get the service URL (on cloud providers like AWS or GCP)
kubectl get svc

# Step 6: Add Monitoring and Metrics

For monitoring, you can integrate Prometheus and Grafana to track key metrics like:
* Number of recommendations served
* Reward rates
* System resource usage (CPU, memory)
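
The first two metrics are simple aggregates over the action-reward log. As a sketch of the arithmetic only (in production these would more likely be Prometheus counters), the hypothetical `summarize_log` helper below computes them from a list of `(chosen_arm, reward)` pairs; the log entries are made up.

```python
from collections import Counter


def summarize_log(log):
    """Compute per-arm serve counts and reward rates from (chosen_arm, reward) pairs."""
    served = Counter(arm for arm, _ in log)
    reward_sum = Counter()
    for arm, reward in log:
        reward_sum[arm] += reward
    return {
        arm: {"served": served[arm], "reward_rate": reward_sum[arm] / served[arm]}
        for arm in sorted(served)
    }


# Made-up log entries, for illustration only
log = [(0, 0), (2, 1), (2, 1), (1, 0), (2, 0), (1, 1)]
for arm, stats in summarize_log(log).items():
    print(arm, stats)
```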

In [None]:
# Here’s an example of integrating Prometheus:

# 1.] Add a Prometheus client to your Flask service.
!pip install prometheus-flask-exporter

# 2.] Modify your Flask app to expose Prometheus metrics.
from prometheus_flask_exporter import PrometheusMetrics

metrics = PrometheusMetrics(app)

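# 3.] Deploy Prometheus and Grafana on Kubernetes using Helm
#     (add the chart repositories first):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus
helm install grafana grafana/grafana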


# Step 7: Scale the System

In [None]:
#With Kubernetes, you can easily scale the system horizontally by increasing the number of replicas
#for the Bandit microservice:

kubectl scale deployment bandit-service-deployment --replicas=5

#Kubernetes will automatically load balance incoming requests across the replicas, ensuring scalability.

# Summary

We implemented a scalable microservices architecture for a Bandit-based reinforcement learning system using:

* 1.] Flask for serving the Bandit algorithm as an API.
* 2.] Docker to containerize the service.
* 3.] Kubernetes for scalable deployment and orchestration.
* 4.] Prometheus/Grafana for monitoring.

This system is designed to handle real-time requests, scale horizontally on Kubernetes, and provide personalized recommendations with
continuous feedback loops.

# Bandit System Interaction with Users and Frontend-Backend Integration

In a Bandit-based recommendation system, the interaction between the backend (where the Bandit algorithm is deployed) and the frontend (where users make requests) happens through APIs.

The frontend is responsible for collecting user input and displaying recommendations, while the backend (our Bandit service) handles decision-making and learning from the user’s interactions.


# 1.] User Visits the Website or Application

When a user visits a website or opens the application where the Bandit-based recommendation system is implemented, they might be looking for product recommendations, content, or any personalized suggestions.

# 2.] Frontend Sends a Request to the Bandit API

When a user action is triggered (e.g., clicking on a "Recommend Me" button), the frontend (usually built using HTML/JavaScript or a framework like React/Angular) sends an HTTP request to the backend. This request typically hits the /recommend endpoint of the Bandit microservice.


In [None]:
#Here’s an example of how this might look in JavaScript on the frontend:

function getRecommendation() {
  // Sending request to the Bandit service
  fetch('http://backend-service-url/recommend', {
    method: 'GET',
  })
  .then(response => response.json())
  .then(data => {
    // Get the chosen arm from the response (this is the recommendation)
    let recommendedItem = data.chosen_arm;
    // Display the recommended item in the UI
    displayRecommendation(recommendedItem);
  })
  .catch(error => console.error('Error fetching recommendation:', error));
}

function displayRecommendation(item) {
  document.getElementById('recommendation').innerText = `We recommend item ${item}`;
}



# 3.] Backend Returns a Recommendation

When the backend receives the request at the /recommend endpoint, the Bandit algorithm will select an arm (i.e., a recommended item) based on the current state of the algorithm.

In [None]:

@app.route('/recommend', methods=['GET'])
def recommend():
    chosen_arm = bandit.select_arm()
    return jsonify({'chosen_arm': chosen_arm})


# The backend will respond with a JSON object, for example: {"chosen_arm": 2}.
#The chosen arm represents the recommended content or product (e.g., item ID 2).
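
The arm index on its own is only a number, so something has to translate it into an actual item. A minimal sketch of that lookup follows, assuming a fixed catalog whose list positions match the arm indices; the `CATALOG` entries and the `item_for_arm` helper are hypothetical.

```python
# Hypothetical catalog: position in the list corresponds to the arm index.
CATALOG = [
    {"item_id": "sku-100", "title": "Wireless headphones"},
    {"item_id": "sku-200", "title": "Running shoes"},
    {"item_id": "sku-300", "title": "Coffee grinder"},
]


def item_for_arm(chosen_arm, catalog=CATALOG):
    """Translate an arm index from /recommend into a catalog entry."""
    if not 0 <= chosen_arm < len(catalog):
        raise IndexError(f"arm {chosen_arm} has no catalog entry")
    return catalog[chosen_arm]


response = {"chosen_arm": 2}  # shape of the /recommend response
print(item_for_arm(response["chosen_arm"])["title"])
```

Whether this lookup lives in the backend or the frontend is a design choice; keeping it in the backend means the frontend never needs to know about arm indices.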

# 4.] Frontend Displays the Recommendation to the User

Once the frontend receives the recommended item from the Bandit system, it updates the UI to display the recommendation.



In [None]:

# In the example JavaScript code above, this might look like:

<div id="recommendation">We recommend item X</div>

#When the recommendation is received, the inner text of this element is updated to reflect the recommendation.

# 5.] User Interacts with the Recommendation (e.g., Clicks, Purchases, etc.)

The user interacts with the recommendation by taking an action such as clicking on the recommended item, watching a recommended video, or purchasing a suggested product.

# 6.] Frontend Sends Feedback to the Bandit System

After the user interacts with the recommendation, the frontend sends feedback to the Bandit system.
This feedback is crucial for training the algorithm. For example, if the user clicked on the recommended item,
the frontend can send a positive reward to the Bandit system.


In [None]:
# Here's how the frontend can send the reward back to the Bandit system using the /reward endpoint:

function sendReward(chosenArm, reward) {
  fetch('http://backend-service-url/reward', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      chosen_arm: chosenArm,
      reward: reward  // 1 for success (e.g., clicked), 0 for failure
    })
  })
  .then(response => response.json())
  .then(data => console.log('Reward sent successfully:', data))
  .catch(error => console.error('Error sending reward:', error));
}

# Reward: If the user clicked or engaged with the recommended item, a reward (e.g., 1) is sent.
#If the user ignored the recommendation or gave negative feedback, a different reward (e.g., 0) is sent.

#This feedback helps the Bandit system update its internal model, learning from the user's behavior to improve future recommendations.

# Example: json
{
  "chosen_arm": 2,
  "reward": 1
}
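
The same feedback can also be sent from Python, for example from a test script or a backend job that reports rewards on the user's behalf. The sketch below uses only the standard library and builds the request without sending it; the `build_reward_request` helper is hypothetical and the base URL assumes the service is running locally on port 5000.

```python
import json
import urllib.request


def build_reward_request(base_url, chosen_arm, reward):
    """Build (but do not send) the POST request for the /reward endpoint."""
    body = json.dumps({"chosen_arm": chosen_arm, "reward": reward}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/reward",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reward_request("http://localhost:5000", chosen_arm=2, reward=1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it against a running service:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```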

In [None]:
# The Bandit system then updates its internal state:

@app.route('/reward', methods=['POST'])
def reward():
    data = request.json
    chosen_arm = data['chosen_arm']
    reward = data['reward']
    bandit.update(chosen_arm, reward)
    return jsonify({'message': 'Reward updated successfully'})

# 7.] Backend Updates the Bandit Algorithm

When the backend receives the reward data, the Bandit algorithm updates its internal values for the chosen arm based on the received reward. This helps the algorithm learn and improve its decision-making process over time.


In [None]:

#In the Bandit service, the following method updates the Bandit’s knowledge of which arms (recommendations) are performing well:
def update(self, chosen_arm, reward):
    self.counts[chosen_arm] += 1
    n = self.counts[chosen_arm]
    value = self.values[chosen_arm]
    self.values[chosen_arm] = ((n - 1) / n) * value + (1 / n) * reward
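
The last line of `update` is a running average: after n rewards for an arm, the stored value equals the plain mean of those n rewards, without keeping the full history. A short check of that claim, using a hypothetical `incremental_mean` helper and a made-up reward sequence:

```python
import math


def incremental_mean(rewards):
    """Apply the update rule one reward at a time and return the final estimate."""
    value, n = 0.0, 0
    for reward in rewards:
        n += 1
        value = ((n - 1) / n) * value + (1 / n) * reward
    return value


rewards = [1, 0, 0, 1, 1]  # made-up feedback for a single arm
batch_mean = sum(rewards) / len(rewards)
print(incremental_mean(rewards), batch_mean)
assert math.isclose(incremental_mean(rewards), batch_mean)
```

The rule can equivalently be written as `value + (reward - value) / n`, which makes it clear that each new reward nudges the estimate by a step of size 1/n.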



# 8.] User Receives Improved Recommendations

Over time, as the system receives more feedback and learns from user interactions, the recommendations become more personalized and accurate, improving the user experience and engagement.

# Visual Interaction Flow

* User Request: User clicks "Recommend Me" → Frontend sends GET request to /recommend.
* Backend Response: Bandit service responds with the recommended item (e.g., Item 2).
* Frontend Action: Frontend displays the recommendation to the user.
* User Interaction: User clicks on the recommended item.
* Feedback Loop: Frontend sends POST request to /reward with the chosen arm and reward (1 for positive interaction).
* Learning: Bandit service updates its model based on the reward, improving future recommendations.

In [None]:
# Example of Simplified Frontend Implementation

#This is a simplified HTML and JavaScript frontend that interacts with the Bandit recommendation system:
'''
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Bandit Recommendation System</title>
</head>
<body>
  <h1>Personalized Recommendations</h1>
  <div id="recommendation">Click the button to get a recommendation</div>
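  <button onclick="getRecommendation()">Recommend Me</button>
  <!-- Feedback buttons so that sendReward() is actually called -->
  <button onclick="sendReward(1)">I like it</button>
  <button onclick="sendReward(0)">Not interested</button>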

  <script>
    let currentRecommendation;

    function getRecommendation() {
      fetch('http://localhost:5000/recommend', {
        method: 'GET'
      })
      .then(response => response.json())
      .then(data => {
        currentRecommendation = data.chosen_arm;
        document.getElementById('recommendation').innerText = `We recommend item ${currentRecommendation}`;
      })
      .catch(error => console.error('Error fetching recommendation:', error));
    }

    function sendReward(reward) {
      fetch('http://localhost:5000/reward', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          chosen_arm: currentRecommendation,
          reward: reward  // Reward is either 1 for clicked, or 0 for not clicked
        })
      })
      .then(response => response.json())
      .then(data => console.log('Reward sent successfully:', data))
      .catch(error => console.error('Error sending reward:', error));
    }
  </script>
</body>
</html>

'''