
# **Project Name: Multi-Armed Bandit-Based SEO Optimization System**

---

### **Purpose of the Project**

The **“Multi-Armed Bandit-Based SEO Optimization System”** is designed to help website owners and digital marketers continuously improve their website's performance in search engines by optimizing different aspects of their website pages in real time. This system uses an advanced mathematical model known as the **Multi-Armed Bandit (MAB) algorithm** to automatically test different strategies on webpages and figure out which ones work best for increasing user engagement, such as views, time spent on the page, clicks, and more.

#### **Breaking it Down Step-by-Step:**

1. **What is SEO?**
   - **SEO (Search Engine Optimization)** is a way to make a website more visible on search engines like Google so that more people visit it. This includes optimizing keywords, titles, content, and other elements to rank higher in search results.
   
2. **The Problem with Traditional SEO Optimization:**
   - Traditionally, optimizing SEO requires a lot of manual testing and constant tweaking to figure out what works best. This is not only time-consuming but also challenging because user behavior can change quickly, making it hard to keep up.

3. **The Role of the Multi-Armed Bandit Algorithm:**
   - Imagine a scenario where you have multiple options for headlines, keywords, or webpage layouts, and you want to know which option will attract the most users. The Multi-Armed Bandit algorithm acts like a smart decision-making system that tests each option, observes how users react, and then focuses more on the options that perform the best.
   - It’s called a “bandit” because it works like a gambler playing slot machines (or “arms”), where each arm represents a different strategy. The goal is to maximize rewards (like user engagement) by finding and sticking with the best-performing options over time.

4. **Purpose of the Project:**
   - The main purpose of this project is to automate the process of **SEO optimization** using the Multi-Armed Bandit algorithm. Instead of manually testing and tweaking different SEO strategies, the system continuously tests and adapts the website to select the best-performing strategies in real time.
   - This helps website owners and marketers quickly respond to changes in user behavior and ensures that the website remains optimized for maximum engagement and visibility.

5. **Benefits of the Project:**
   - **Real-Time Optimization**: The system adapts and improves SEO strategies on-the-fly, which saves time and effort.
   - **Data-Driven Decisions**: The algorithm uses actual user data to make intelligent decisions about which strategies to prioritize.
   - **Reduced Manual Effort**: By automating the testing and selection process, the system reduces the need for constant manual adjustments.
   - **Increased Engagement and Traffic**: The goal is to drive more traffic and engagement to the website by focusing on what works best.

---

### **Simple Example:**

Imagine you have a webpage with three different headlines: “Best Tips for SEO,” “SEO Tips You Need,” and “Top SEO Tricks.” The Multi-Armed Bandit system will test all three headlines by showing them to different visitors. It will track which headline gets the most clicks, engagement, and positive user interactions. If “Top SEO Tricks” performs the best, the system will start showing that headline more often while still occasionally testing the others to make sure they haven’t improved. This ensures that your website always uses the best strategies to attract visitors.
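The behavior described above ("show the best headline more often while still occasionally testing the others") matches a common policy called epsilon-greedy. The sketch below is a minimal simulation of it, not the selection rule used by the `MultiArmedBandit` class later in this notebook. The click-through rates are hypothetical and only drive the simulated visitors; the algorithm never sees them.

```python
import random


def epsilon_greedy(headlines, true_ctr, rounds=10_000, epsilon=0.1, seed=42):
    """Simulate epsilon-greedy headline selection against hypothetical click rates."""
    rng = random.Random(seed)
    shows = {h: 0 for h in headlines}
    clicks = {h: 0 for h in headlines}

    def observed_rate(h):
        # Untried headlines get top priority so each one is shown at least once
        return clicks[h] / shows[h] if shows[h] else float("inf")

    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.choice(headlines)              # explore: random headline
        else:
            choice = max(headlines, key=observed_rate)  # exploit: best observed click rate
        shows[choice] += 1
        if rng.random() < true_ctr[choice]:             # simulated visitor clicks
            clicks[choice] += 1
    return shows, clicks


headlines = ["Best Tips for SEO", "SEO Tips You Need", "Top SEO Tricks"]
true_ctr = {"Best Tips for SEO": 0.02, "SEO Tips You Need": 0.03, "Top SEO Tricks": 0.10}  # hypothetical
shows, clicks = epsilon_greedy(headlines, true_ctr)
for h in headlines:
    print(f"{h!r}: shown {shows[h]} times, {clicks[h]} clicks")
```

With `epsilon=0.1`, roughly one visitor in ten sees a randomly chosen headline, which is how the weaker options keep getting "occasionally tested."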

---


### **Understanding Multi-Armed Bandit Algorithms for SEO**

#### **What is a Multi-Armed Bandit Algorithm?**
To understand Multi-Armed Bandit Algorithms, imagine you’re at a casino with multiple slot machines (often called “one-armed bandits” due to their lever). Each machine has different but unknown chances of winning. Your goal is to find the machine that gives the best rewards. This scenario captures the essence of the "multi-armed bandit problem." Similarly, when applied to SEO, this algorithm helps select and continuously improve strategies that bring the best results (like website visits, clicks, or conversions) by "testing" different SEO actions and sticking with the ones that perform the best.
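The casino analogy is usually formalized as follows (standard bandit notation, introduced here rather than taken from the project code). Each arm $a$ has an unknown mean reward $\mu_a$, the best arm has mean $\mu^{*} = \max_a \mu_a$, and a policy that plays arm $a_t$ in round $t$ is judged by its expected cumulative regret after $T$ rounds:

$$R_T = T\,\mu^{*} - \mathbb{E}\left[\sum_{t=1}^{T} \mu_{a_t}\right]$$

Low regret means little traffic was spent on weaker options while the best one was being identified, which is the precise sense in which a bandit "finds the machine that gives the best rewards."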

#### **Use Cases for Multi-Armed Bandit Algorithms in SEO**
- **Optimizing Content Headlines:** Test multiple headlines for a web page to see which draws the most traffic or engagement.
- **Keyword Optimization:** Automatically identify and use the best-performing keywords.
- **A/B Testing for Web Design and SEO Tactics:** A traditional A/B test keeps splitting traffic evenly between variants until the test ends. A multi-armed bandit instead shifts traffic toward the better-performing option as evidence accumulates, so fewer visitors see weaker variants while the test is still running.
- **Ad Campaign Optimization:** Continuously test different ad copies or keywords to maximize conversions.

#### **Real-Life Implementation Examples**
- **E-commerce Websites:** Automatically test different product page titles, descriptions, and layouts to find what drives the most sales.
- **Content Websites or Blogs:** Use multi-armed bandits to find which article topics, headings, or tags drive the most engagement or organic traffic.
- **Landing Pages:** Continuously optimize landing page elements (e.g., text, images, CTAs) based on visitor interactions.

#### **Multi-Armed Bandit Algorithms for Websites**
For a website, a multi-armed bandit algorithm would "test" different SEO strategies by evaluating user behavior metrics like clicks, bounce rates, and conversion rates in real time. The algorithm automatically shifts traffic toward better-performing strategies and reduces the need for constant manual adjustment, unlike traditional methods that require detailed, repeated tests over time.

#### **Data Requirements for Multi-Armed Bandit Algorithms**
- **Page URLs and Content Data:** If the focus is on content optimization (like testing headlines), data about webpage content would be necessary, which can be retrieved through web scraping or input as CSV files with relevant content data (e.g., page URL, content type, headline, etc.).
- **User Behavior Metrics:** This data is crucial. Metrics like click-through rates, bounce rates, time on page, and conversions can be fed into the algorithm to guide and adjust its selections in real time.
- **CSV Data vs. Web Scraping:** CSV format (with structured data columns) can work if you have collected relevant SEO data. However, if the algorithm needs to dynamically adjust based on live website content, automated data extraction from URLs may be required.
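A Bernoulli bandit ultimately needs, for each arm, a count of successes and failures. The sketch below shows one way to derive those from a CSV export with pandas. The column names (`variant`, `impressions`, `clicks`) and the numbers are hypothetical illustrations, not the project's dataset, whose columns appear later in this notebook.

```python
import io

import pandas as pd

# Hypothetical CSV layout: one row per variant with impression and click totals
csv_text = """variant,impressions,clicks
Best Tips for SEO,1200,48
SEO Tips You Need,1150,52
Top SEO Tricks,1300,91
"""

df = pd.read_csv(io.StringIO(csv_text))
df["successes"] = df["clicks"]                     # a click counts as a success
df["failures"] = df["impressions"] - df["clicks"]  # an impression without a click is a failure
df["ctr"] = df["clicks"] / df["impressions"]       # observed click-through rate
print(df[["variant", "successes", "failures", "ctr"]])
```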

#### **Outputs of a Multi-Armed Bandit Algorithm in SEO Context**
- **Best-Performing Strategy Selection:** The output often highlights which option (headline, keyword, page element, etc.) is currently performing the best.
- **Performance Metrics:** It may provide metrics such as conversion rates, engagement rates, or traffic data for each option tested.
- **Recommended Actions:** The model may suggest actions like redirecting traffic toward a high-performing version of a webpage or tweaking underperforming elements.
  
#### **How Multi-Armed Bandit Algorithms Optimize SEO in Real-Time**
The algorithm continuously tests variations (e.g., different keywords or page titles) and gathers performance data. Based on what works best (highest engagement or conversion rates), it gradually pushes more traffic toward better-performing options. Unlike traditional A/B testing that runs static comparisons, multi-armed bandits dynamically adapt, reducing wasted traffic on ineffective options and speeding up the optimization process.
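Thompson sampling is one widely used way to "gradually push more traffic toward better-performing options." The sketch below is a minimal Beta-Bernoulli version, assuming each impression yields a binary reward (engaged or not) and using hypothetical success rates to simulate visitors. It is a different technique from the selection rule in the notebook's `MultiArmedBandit` class.

```python
import numpy as np


def thompson_sampling(true_rates, rounds=5_000, seed=0):
    """Beta-Bernoulli Thompson sampling over arms with hypothetical success rates."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_rates)
    alpha = np.ones(n_arms)  # 1 + observed successes (uniform Beta(1, 1) prior)
    beta = np.ones(n_arms)   # 1 + observed failures
    pulls = np.zeros(n_arms, dtype=int)

    for _ in range(rounds):
        samples = rng.beta(alpha, beta)                  # one plausible success rate per arm
        arm = int(np.argmax(samples))                    # serve the arm with the highest sample
        reward = float(rng.random() < true_rates[arm])   # simulated engagement: 1.0 or 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls, alpha / (alpha + beta)


pulls, posterior_means = thompson_sampling([0.03, 0.05, 0.09])
print("pulls per arm:", pulls)
print("posterior mean rates:", posterior_means.round(3))
```

Arms whose posterior is still wide occasionally produce a high sample and get traffic, so exploration happens automatically and fades as the evidence firms up.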


In [None]:
import pandas as pd
import numpy as np
import random

### Explanation of the Code Snippet

1. **`import pandas as pd`**:
   - **What it does**: This line imports the `pandas` library and gives it the alias `pd`.
   - **Why it's used**: `pandas` is a powerful library for data manipulation and analysis. It provides data structures like DataFrames that make it easy to read, write, and process data in various formats, including CSV, Excel, and more.
   - **Use Case**: You will typically use `pandas` to read datasets into DataFrames, clean and transform data, and perform data analysis.

2. **`import numpy as np`**:
   - **What it does**: This line imports the `numpy` library and gives it the alias `np`.
   - **Why it's used**: `numpy` is a fundamental library for numerical computing in Python. It provides support for arrays and matrices, as well as mathematical functions to operate on these data structures.
   - **Use Case**: `numpy` is often used for performing mathematical operations on large datasets, creating arrays, generating random numbers, and performing other numerical tasks.

3. **`import random`**:
   - **What it does**: This line imports Python’s built-in `random` module.
   - **Why it's used**: The `random` module provides functions to generate random numbers, shuffle data, and select random elements from a list.
   - **Use Case**: In data science and modeling, you might use `random` for tasks like random sampling, simulating data, or selecting a random element (e.g., in a Multi-Armed Bandit model).
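The bandit code later in this notebook draws random numbers from both Python's `random` module and NumPy's global generator, so each run can give a different result. A minimal sketch of the calls involved is shown below, with seeding added for reproducible runs (an assumption: the original notebook does not seed). The arm labels and weights are illustrative.

```python
import random

import numpy as np

arms = ["/contact-us", "/features", "/blogs"]  # illustrative arm labels
weights = [0.6, 0.3, 0.1]                     # np.random.choice requires p to sum to 1

random.seed(7)     # seed Python's random module
np.random.seed(7)  # seed NumPy's legacy global generator used by np.random.choice

uniform_pick = random.choice(arms)                    # every arm equally likely
weighted_pick = np.random.choice(arms, p=weights)     # arms drawn in proportion to weights
sampled = random.choices(arms, weights=weights, k=5)  # stdlib weighted sampling, with replacement

print(uniform_pick, weighted_pick, sampled)
```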


In [None]:
import pandas as pd  # Importing the pandas library for data manipulation and analysis
import numpy as np  # Importing the numpy library for numerical operations (used later for weighted random selection in the bandit)
import random  # Importing the random module for generating random values (useful for simulations)

# Step 1: Load the dataset containing user engagement metrics
# Explanation: We are loading a CSV file that contains information about how users engage with various pages on a website.
# This data is stored in a pandas DataFrame named 'user_engagement_data' for further processing.
# File Path Note: Make sure the file path is correct and accessible for your environment.
try:
    user_engagement_data = pd.read_csv('/content/drive/MyDrive/Multi-Armed Bandit Datasets/User Engagement Metrics.csv')
    print("Dataset loaded successfully!\n")
except FileNotFoundError:
    print("Error: The dataset could not be found. Please check the file path.\n")
    user_engagement_data = None  # Set to None if the dataset couldn't be loaded

# Display the first few rows of the dataset to confirm successful loading
if user_engagement_data is not None:
    print("First few rows of the dataset:")
    print(user_engagement_data.head())  # Displaying the first few rows for inspection

# Step 2: Define URLs as different "arms" (strategies) to evaluate
# Explanation: We are creating a list of URLs representing different pages or strategies on the website.
# Each URL is treated as a "strategy" or "arm" that can be tested for its effectiveness in user engagement.
urls = [
    "https://webtool.co/",
    "https://webtool.co/adult-seo-service/",
    "https://webtool.co/cosine-similarity/",
    "https://webtool.co/advanced-seo-service/",
    "https://webtool.co/contact-us/",
    "https://webtool.co/diamond-seo-package/",
    "https://webtool.co/silver-seo-package/",
    "https://webtool.co/proximity/",
    "https://webtool.co/adult-seo-service-thailand/",
    "https://webtool.co/features/",
    "https://webtool.co/about-us-webtool/",
    "https://webtool.co/blogs/",
    "https://webtool.co/cora/",
    "https://webtool.co/content-optimization-using-ai/",
    "https://webtool.co/sitemap-checker/",
    "https://webtool.co/lda/"
]

# Step 3: Extract paths from URLs for matching (strip domain and normalize)
# Explanation: This function takes a full URL and extracts the "path" portion (e.g., converting 'https://webtool.co/contact-us/' to '/contact-us').
# The purpose of this function is to normalize the paths by removing the protocol (e.g., 'https://') and the domain, converting to lowercase, and removing trailing slashes.
# Note: the home page 'https://webtool.co/' becomes the empty string '' because its only '/' is stripped as a trailing slash.
# This normalization ensures accurate matching with data entries.
def extract_path(url):
    """Extract the path from a full URL."""
    if "://" in url:
        url = url.split("://")[1]  # Remove the protocol (e.g., 'https://')
    path = '/' + '/'.join(url.split('/')[1:])  # Extract the path after the domain and add a leading '/'
    return path.lower().rstrip('/')  # Convert to lowercase and remove any trailing slash for consistent matching

# Normalize the URLs for comparison
normalized_urls = [extract_path(url) for url in urls]
print("\nNormalized URLs:")
print(normalized_urls)  # Print the list of normalized URLs for verification

# Step 4: Normalize data for matching with URLs
# Explanation: This function cleans and standardizes data in the provided DataFrame.
# It removes missing values, trims whitespace from column names, and ensures values in the 'Page path and screen class' column
# (if it exists) are formatted consistently (lowercase, no trailing slashes).
# This step makes it easier to match URLs with corresponding entries in the dataset.
def preprocess_data(df):
    """Clean and standardize data."""
    if df is None:
        print("No data to preprocess.\n")
        return None  # Return None if no data was loaded
    df = df.dropna().copy()  # Remove rows with missing data; .copy() gives an independent DataFrame so the column update below avoids SettingWithCopyWarning
    df.columns = df.columns.str.strip()  # Trim any extra spaces from column names
    # Check if the 'Page path and screen class' column exists in the dataset
    if 'Page path and screen class' in df.columns:
        # Clean and normalize values in this column for easier matching with URLs
        df['Page path and screen class'] = df['Page path and screen class'].str.strip().str.lower().str.rstrip('/')
    return df  # Return the cleaned and standardized DataFrame

# Apply the preprocessing function to the dataset
processed_data = preprocess_data(user_engagement_data)

# Display the first few rows of the processed data for verification
if processed_data is not None:
    print("\nProcessed Data:")
    print(processed_data.head())  # Print the first few rows of the processed DataFrame
else:
    print("No processed data available.\n")


Dataset loaded successfully!

First few rows of the dataset:
  Page path and screen class  Views  Active users  Views per active user  \
0                          /   1699           622               2.731511   
1        /adult-seo-service/    246            86               2.860465   
2        /cosine-similarity/    206            76               2.710526   
3     /advanced-seo-service/    186            55               3.381818   
4               /contact-us/    108            43               2.511628   

   Average engagement time per active user  Event count  Key events  \
0                                40.651125         5980           0   
1                                44.593023          856           0   
2                                64.421053          740           0   
3                                47.163636          600           0   
4                                41.720930          318           0   

   Total revenue  
0              0  
1              0 

### Detailed Explanation of Each Step

---

#### **Step 1: Load the Dataset**
```python
try:
    user_engagement_data = pd.read_csv('/content/drive/MyDrive/Multi-Armed Bandit Datasets/User Engagement Metrics.csv')
    print("Dataset loaded successfully!\n")
except FileNotFoundError:
    print("Error: The dataset could not be found. Please check the file path.\n")
    user_engagement_data = None  # Set to None if the dataset couldn't be loaded
```
- **What This Does**: This code tries to load a dataset from a CSV file using the `pandas` library, which is a powerful tool for data analysis.
- **Explanation**:
  - It attempts to read a file located at the specified path. If the file is found, it is loaded into a `pandas` DataFrame called `user_engagement_data`.
  - If the file is not found, an error message is printed, and the `user_engagement_data` variable is set to `None`.
- **Example**:
  - If the file exists and contains data like:
    ```
    Page path and screen class,Views,Engagement Time
    /home,1000,300
    /about-us,500,200
    ```
  - The DataFrame `user_engagement_data` will store this data in a table-like structure.
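To try the loading step without the Google Drive file, the example rows can be read from a string with `io.StringIO`. The required-column check is an added safeguard (an assumption, not part of the original cell) that fails early if the export's headers change; `REQUIRED_COLUMNS` is a hypothetical name.

```python
import io

import pandas as pd

example_csv = """Page path and screen class,Views,Engagement Time
/home,1000,300
/about-us,500,200
"""

REQUIRED_COLUMNS = {"Page path and screen class", "Views"}  # columns the later steps rely on

data = pd.read_csv(io.StringIO(example_csv))
missing = REQUIRED_COLUMNS - set(data.columns.str.strip())  # compare against trimmed header names
if missing:
    raise ValueError(f"Dataset is missing expected columns: {sorted(missing)}")
print(data)
```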

---

#### **Display the First Few Rows of the Dataset**
```python
if user_engagement_data is not None:
    print("First few rows of the dataset:")
    print(user_engagement_data.head())  # Displaying the first few rows for inspection
```
- **What This Does**: If the dataset was loaded successfully, it displays the first few rows of the data.
- **Explanation**:
  - `user_engagement_data.head()` prints the first few rows of the DataFrame so you can quickly check if the data was loaded correctly.
- **Example**:
  - If the dataset contains:
    ```
    Page path and screen class,Views,Engagement Time
    /home,1000,300
    /about-us,500,200
    ```
  - The output would show:
    ```
    First few rows of the dataset:
       Page path and screen class  Views  Engagement Time
    0                      /home   1000              300
    1                   /about-us    500              200
    ```

---

#### **Step 2: Define URLs as Different "Arms" (Strategies)**
```python
urls = [
    "https://webtool.co/",
    "https://webtool.co/adult-seo-service/",
    "https://webtool.co/cosine-similarity/",
    # ... more URLs
]
```
- **What This Does**: This code creates a list of URLs that represent different pages on a website.
- **Explanation**:
  - Each URL is treated as a "strategy" or "arm" that will be tested to see how effective it is at engaging users.
- **Example**:
  - If you are trying to optimize which page to focus on, each URL represents a different option that you want to evaluate.

---

#### **Step 3: Extract Paths from URLs for Matching (Strip Domain and Normalize)**
```python
def extract_path(url):
    """Extract the path from a full URL."""
    if "://" in url:
        url = url.split("://")[1]  # Remove the protocol (e.g., 'https://')
    path = '/' + '/'.join(url.split('/')[1:])  # Extract the path after the domain and add a leading '/'
    return path.lower().rstrip('/')  # Convert to lowercase and remove any trailing slash for consistent matching
```
- **What This Does**: This function extracts the "path" portion of a URL, converts it to lowercase, and removes any trailing slashes.
- **Explanation**:
  - Removing the scheme (`https://`), the domain, and any trailing slash puts the URLs in the same form as the paths stored in the dataset.
- **Example**:
  - Input: `"https://webtool.co/contact-us/"`
  - Output: `"/contact-us"`
  - Input: `"https://webtool.co/"` (the home page)
  - Output: `""` (an empty string, because the lone `/` is removed as a trailing slash)
  - This makes it easier to compare and match URLs with data.
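An alternative sketch using the standard library's `urllib.parse.urlparse` is shown below. The function name `normalize_path` is hypothetical. Unlike `extract_path`, it keeps the home page as `/` and drops query strings and `#fragments`; it accepts both full URLs and bare paths, so the same function could be applied to the dataset column (for example with `Series.map(normalize_path)`) to keep both sides on one convention.

```python
from urllib.parse import urlparse


def normalize_path(value):
    """Normalize a full URL or a bare path to a comparable page path (hypothetical helper)."""
    path = urlparse(value.strip()).path.lower().rstrip("/")  # drops scheme, domain, query, fragment
    return path or "/"                                       # keep the home page as '/'


print(normalize_path("https://webtool.co/contact-us/"))  # /contact-us
print(normalize_path("https://webtool.co/"))             # /
print(normalize_path("/Contact-Us/?utm_source=news"))    # /contact-us
```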

---

#### **Normalize the URLs for Comparison**
```python
normalized_urls = [extract_path(url) for url in urls]
print("\nNormalized URLs:")
print(normalized_urls)  # Print the list of normalized URLs for verification
```
- **What This Does**: This code applies the `extract_path` function to each URL in the `urls` list to create a list of normalized paths.
- **Explanation**:
  - This step prepares the URLs for accurate matching with data entries.
- **Example**:
  - Input list: `["https://webtool.co/contact-us/", "https://webtool.co/about-us/"]`
  - Output: `["/contact-us", "/about-us"]`

---

#### **Step 4: Normalize Data for Matching with URLs**
```python
def preprocess_data(df):
    """Clean and standardize data."""
    if df is None:
        print("No data to preprocess.\n")
        return None  # Return None if no data was loaded
    df = df.dropna().copy()  # Remove rows with missing data; .copy() gives an independent DataFrame so the column update below avoids SettingWithCopyWarning
    df.columns = df.columns.str.strip()  # Trim any extra spaces from column names
    # Check if the 'Page path and screen class' column exists in the dataset
    if 'Page path and screen class' in df.columns:
        # Clean and normalize values in this column for easier matching with URLs
        df['Page path and screen class'] = df['Page path and screen class'].str.strip().str.lower().str.rstrip('/')
    return df  # Return the cleaned and standardized DataFrame
```
- **What This Does**: This function cleans and standardizes the data in a DataFrame to ensure consistent formatting for comparison.
- **Explanation**:
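  - Drops every row that contains at least one missing value.
  - Trims whitespace from column names.
  - Normalizes the `Page path and screen class` column (if it exists) by trimming whitespace, converting values to lowercase, and removing trailing slashes. The home page path `/` therefore becomes `''`, the same value `extract_path` produces for the root URL.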
- **Example**:
  - Input data: `[" /about-us ", "/Contact-Us/ "]`
  - Output after processing: `["/about-us", "/contact-us"]`

---

#### **Apply the Preprocessing Function**
```python
processed_data = preprocess_data(user_engagement_data)
```
- **What This Does**: Applies the `preprocess_data` function to the `user_engagement_data` DataFrame.

#### **Display the First Few Rows of the Processed Data**
```python
if processed_data is not None:
    print("\nProcessed Data:")
    print(processed_data.head())  # Print the first few rows of the processed DataFrame
else:
    print("No processed data available.\n")
```
- **What This Does**: Displays the first few rows of the processed data for verification.

---


In [None]:
# Step 5: Match normalized paths with dataset entries
# Explanation: This step compares each normalized URL (created earlier) with the 'Page path and screen class' column
# of the preprocessed DataFrame 'processed_data'. Exact equality is used instead of substring matching, so that
# '/adult-seo-service' does not also match '/adult-seo-service-thailand' and the root path '' does not match every row.
# If a match is found, the URL is considered valid (i.e., there is data available for it).
valid_urls = [url for url in normalized_urls if (processed_data['Page path and screen class'] == url).any()]

# Step 6: Check if valid URLs are found
# Explanation: If no valid URLs are found in the dataset, a message is printed to inform the user.
# Otherwise, it prints the list of valid URLs that were matched with the data.
if not valid_urls:
    print("\nNo valid URLs with available data found. Please check the data and format.")
else:
    print(f"\nValid URLs with data: {valid_urls}")

    # Step 7: Define the Multi-Armed Bandit Model
    # Explanation: Here, we create a class called 'MultiArmedBandit' to simulate the Multi-Armed Bandit problem.
    # This model helps us find the best-performing strategy (in this case, a URL) based on user engagement data.
    class MultiArmedBandit:
        def __init__(self, arms):
            """
            Initialize the Multi-Armed Bandit model with a list of valid arms (URLs).
            Each arm represents a possible strategy to evaluate.
            """
            self.arms = arms  # List of valid strategies (valid URLs with data)
            self.successes = {arm: 0 for arm in arms}  # Track the number of successes for each arm
            self.failures = {arm: 0 for arm in arms}  # Track the number of failures for each arm

        def select_arm(self):
            """
            Select an arm (strategy) to test based on a probabilistic approach.
            The selection process uses a simple algorithm where the probability of choosing an arm
            is based on its past performance (successes vs. failures).
            """
            # Laplace-smoothed success rate per arm: (successes + 1) / (trials + 2).
            # Every score is > 0 (an untested arm scores 0.5), so each arm keeps a chance of being selected.
            scores = [(self.successes[arm] + 1) / (self.successes[arm] + self.failures[arm] + 2) for arm in self.arms]
            total = sum(scores)
            return np.random.choice(self.arms, p=[score / total for score in scores])  # Sample arms in proportion to their scores

        def update(self, arm, success):
            """
            Update the performance of an arm based on observed success or failure.
            This function adjusts the success and failure counts for a given arm based on the observed result.
            """
            if success:
                self.successes[arm] += 1  # Increment successes if the trial was successful
            else:
                self.failures[arm] += 1  # Increment failures if the trial was not successful

    # Step 8: Initialize the Multi-Armed Bandit with valid URLs only
    # Explanation: We create an instance of the Multi-Armed Bandit model using the valid URLs identified earlier.
    bandit = MultiArmedBandit(valid_urls)

    # Step 9: Simulate the Multi-Armed Bandit Process
    # Explanation: This loop simulates multiple rounds of testing to find the best-performing URL.
    # In each round, a URL (arm) is selected, and its performance is evaluated based on the data.
    for _ in range(100):  # Simulate 100 rounds of testing
        selected_arm = bandit.select_arm()  # Select an arm (URL) to test
        # Retrieve the row for the selected URL (arm) from the preprocessed dataset (exact path match)
        relevant_data = processed_data[processed_data['Page path and screen class'] == selected_arm]
        # Check if there is data for the selected URL
        if not relevant_data.empty:
            # Calculate a simple success probability based on the 'Views' column
            # (This is just an example; in practice, you might use more complex calculations)
            # Scale views to a probability, capped at 1.0 because pages with over 1,000 views would otherwise exceed 1
            success_probability = min(relevant_data['Views'].mean() / 1000, 1.0)
            success = random.random() < success_probability  # Determine if this trial is a success or failure
            bandit.update(selected_arm, success)  # Update the bandit's records based on the result

    # Step 10: Identify the Best-Performing Strategy
    # Explanation: We find the URL (arm) with the highest number of successes.
    best_strategy = max(bandit.successes, key=bandit.successes.get)
    print(f"\nThe best-performing strategy (URL) is: {best_strategy}")

    # Step 11: Display Metrics for the Best Strategy
    # Explanation: We retrieve and display data related to the best-performing URL to gain insights into its performance.
    best_strategy_metrics = processed_data[processed_data['Page path and screen class'] == best_strategy]
    if not best_strategy_metrics.empty:
        print("\nPerformance Metrics for the Best Strategy:")
        print(best_strategy_metrics)
    else:
        print("\nNo performance data available for the best-performing strategy.")



Valid URLs with data: ['', '/adult-seo-service', '/cosine-similarity', '/advanced-seo-service', '/contact-us', '/diamond-seo-package', '/silver-seo-package', '/proximity', '/adult-seo-service-thailand', '/features', '/about-us-webtool', '/blogs', '/cora', '/content-optimization-using-ai', '/sitemap-checker', '/lda']

The best-performing strategy (URL) is: /advanced-seo-service

Performance Metrics for the Best Strategy:
  Page path and screen class  Views  Active users  Views per active user  \
3     /advanced-seo-service/    186            55               3.381818   

   Average engagement time per active user  Event count  Key events  \
3                                47.163636          600           0   

   Total revenue  
3              0  


---

### **Step-by-Step Code Explanation**

#### **Step 5: Match Normalized Paths with Dataset Entries**
```python
valid_urls = [url for url in normalized_urls if (processed_data['Page path and screen class'] == url).any()]
```
- **What This Does**: This line matches the normalized URLs (prepared earlier) against the `Page path and screen class` column of the preprocessed `processed_data` DataFrame.
- **Explanation**:
  - For each URL in the list of normalized URLs, it checks whether at least one row has exactly that path.
  - If a match is found, the URL is added to the `valid_urls` list.
  - Exact equality is used deliberately: a substring test such as `str.contains` would let `/adult-seo-service` also match `/adult-seo-service-thailand`, and the normalized home page path `''` would match every row.
  - **Example**: If `/contact-us` is one of the URLs and there is data for it in the dataset, it is considered "valid" and added to the list.
- **Why This Is Important**: This ensures that only URLs with data are considered for further processing.
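The difference between substring and exact matching can be checked on a few paths. The sketch below uses a hypothetical `paths` Series that mimics the preprocessed column (including the home page as `''`). It calls `str.contains` with `regex=False` for clarity; the notebook's original call used the default `regex=True`, which gives the same result for these paths.

```python
import pandas as pd

# Hypothetical paths in the form preprocess_data leaves them (lowercase, no trailing slash)
paths = pd.Series(["", "/adult-seo-service", "/adult-seo-service-thailand", "/contact-us"])


def matches_substring(path):
    return paths[paths.str.contains(path, regex=False)].tolist()


def matches_exact(path):
    return paths[paths == path].tolist()


print(matches_substring("/adult-seo-service"))  # ['/adult-seo-service', '/adult-seo-service-thailand']
print(matches_exact("/adult-seo-service"))      # ['/adult-seo-service']
print(len(matches_substring("")))               # 4: the empty string is contained in every path
print(matches_exact(""))                        # ['']
```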

#### **Step 6: Check if Valid URLs are Found**
```python
if not valid_urls:
    print("\nNo valid URLs with available data found. Please check the data and format.")
else:
    print(f"\nValid URLs with data: {valid_urls}")
```
- **What This Does**: Checks if any valid URLs were found in the previous step.
- **Explanation**:
  - If `valid_urls` is empty, it means no matches were found, and a message is printed.
  - Otherwise, it prints the list of valid URLs that have corresponding data.
- **Example**: If `valid_urls` contains `["/contact-us", "/about-us"]`, it will print these URLs.

#### **Step 7: Define the Multi-Armed Bandit Model**
```python
class MultiArmedBandit:
    def __init__(self, arms):
        """
        Initialize the Multi-Armed Bandit model with a list of valid arms (URLs).
        Each arm represents a possible strategy to evaluate.
        """
        self.arms = arms  # List of valid strategies (valid URLs with data)
        self.successes = {arm: 0 for arm in arms}  # Track the number of successes for each arm
        self.failures = {arm: 0 for arm in arms}  # Track the number of failures for each arm
```
- **What This Does**: Defines a class called `MultiArmedBandit` that represents the Multi-Armed Bandit model.
- **Explanation**:
  - The `__init__` method initializes the model with a list of "arms" (valid URLs).
  - It also creates dictionaries to keep track of successes and failures for each URL.
- **Example**:
  - If `valid_urls` contains `["/contact-us", "/about-us"]`, `self.successes` will be `{'/contact-us': 0, '/about-us': 0}` and `self.failures` will be `{'/contact-us': 0, '/about-us': 0}`.

---

#### **Step 8: Selecting an Arm (URL) to Test**
```python
def select_arm(self):
    """
    Select an arm (strategy) to test based on a probabilistic approach.
    The selection process uses a simple algorithm where the probability of choosing an arm
    is based on its past performance (successes vs. failures).
    """
    # Laplace-smoothed success rate per arm: (successes + 1) / (trials + 2)
    scores = [(self.successes[arm] + 1) / (self.successes[arm] + self.failures[arm] + 2) for arm in self.arms]
    total = sum(scores)
    return np.random.choice(self.arms, p=[score / total for score in scores])  # Sample arms in proportion to their scores
```
- **What This Does**: Selects a URL (arm) to test based on a probabilistic approach.
- **Explanation**:
  - Scores each URL with its smoothed success rate, (successes + 1) / (successes + failures + 2). An untested URL scores 0.5, so no score is ever zero.
  - Normalizes the scores so they sum to 1 and samples one arm with `np.random.choice`, so higher-scoring arms are chosen more often while every arm can still be picked.
- **Example**: If `/contact-us` has a high success rate, it is more likely to be chosen for testing.
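A worked example of this weighting, written as a standalone function (the name `selection_probabilities` and the counts are hypothetical). Note that this rule samples arms in proportion to smoothed success rates; it is a probability-matching heuristic, not Thompson sampling, which would instead draw a random rate from each arm's Beta posterior.

```python
def selection_probabilities(successes, failures):
    """Reproduce the select_arm weighting: smoothed rates normalized to sum to 1."""
    scores = {arm: (successes[arm] + 1) / (successes[arm] + failures[arm] + 2) for arm in successes}
    total = sum(scores.values())
    return {arm: score / total for arm, score in scores.items()}


probs = selection_probabilities(
    successes={"/contact-us": 3, "/about-us": 0},
    failures={"/contact-us": 1, "/about-us": 0},
)
print({arm: round(p, 3) for arm, p in probs.items()})  # {'/contact-us': 0.571, '/about-us': 0.429}
```

Here `/contact-us` scores 4/6 and the untested `/about-us` scores 1/2, so after normalization they are chosen with probabilities 4/7 and 3/7. Because the scores stay close together, this rule keeps exploring heavily even after an arm looks clearly better.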

#### **Step 9: Updating the Performance of an Arm**
```python
def update(self, arm, success):
    """
    Update the performance of an arm based on observed success or failure.
    This function adjusts the success and failure counts for a given arm based on the observed result.
    """
    if success:
        self.successes[arm] += 1  # Increment successes if the trial was successful
    else:
        self.failures[arm] += 1  # Increment failures if the trial was not successful
```
- **What This Does**: Updates the number of successes or failures for a given URL based on the result of the trial.
- **Explanation**:
  - If the trial was successful, the success count for the arm is increased.
  - Otherwise, the failure count is increased.

#### **Step 10: Initialize the Multi-Armed Bandit Model**
```python
bandit = MultiArmedBandit(valid_urls)
```
- **What This Does**: Creates an instance of the `MultiArmedBandit` class using the valid URLs found earlier.

#### **Step 11: Simulate the Multi-Armed Bandit Process**
```python
for _ in range(100):  # Simulate 100 rounds of testing
    selected_arm = bandit.select_arm()  # Select an arm (URL) to test
    # Retrieve relevant data for the selected URL (arm) from the dataset
    relevant_data = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(selected_arm, na=False)]
    # Check if there is data for the selected URL
    if not relevant_data.empty:
        # Calculate a simple success probability based on the 'Views' column
        success_probability = relevant_data['Views'].mean() / 1000  # Normalize views for success probability
        success = random.random() < success_probability  # Determine if this trial is a success or failure
        bandit.update(selected_arm, success)  # Update the bandit's records based on the result
```
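- **What This Does**: Simulates 100 rounds in which the bandit picks a URL and records a simulated success or failure for it.
- **Explanation**:
  - In each round, `select_arm()` picks a URL, and the rows whose path contains that URL are retrieved from `user_engagement_data`.
  - The success probability is the mean of the `Views` column divided by 1000; a uniform random number below that value counts as a success. If no rows match, the round is skipped and no counts change.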
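A minimal sketch of the reward step, assuming the same views/1000 scaling as the loop above. The hypothetical helper `simulated_success` clips the probability to [0, 1] explicitly. This does not change the outcome, because `random.random() < p` is always true for p >= 1 and always false for p <= 0, but it makes that behavior visible. The `scale` parameter and the seeded `random.Random` instance are assumptions added for illustration and repeatability.

```python
import random

def simulated_success(mean_views, scale=1000, rng=random):
    """Hypothetical helper: one Bernoulli trial with probability mean_views / scale, clipped to [0, 1].

    Returns (success, probability_used).
    """
    p = min(max(mean_views / scale, 0.0), 1.0)
    return rng.random() < p, p

rng = random.Random(42)  # seeded so the run is reproducible
print(simulated_success(206, rng=rng))   # probability 0.206
print(simulated_success(1500, rng=rng))  # probability clipped to 1.0, always a success
```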

#### **Step 12: Identify the Best-Performing Strategy**
```python
best_strategy = max(bandit.successes, key=bandit.successes.get)
print(f"\nThe best-performing strategy (URL) is: {best_strategy}")
```
- **What This Does**: Finds the URL with the highest number of successes.
- **Example**: If `/about-us` has the most successes, it is considered the best-performing URL.
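`max(..., key=...)` returns the first key it encounters among equal maxima, so ties are broken by the order of `valid_urls`, where the root path `''` comes first in this notebook. A hypothetical helper that reports every tied URL is one way to surface ties; the counts below are made up for illustration.

```python
def best_arms(successes):
    """Hypothetical helper: return every arm tied for the highest success count."""
    top = max(successes.values())
    return [arm for arm, count in successes.items() if count == top]

# Hypothetical counts: two arms tie for first place
successes = {"": 3, "/cosine-similarity": 5, "/advanced-seo-service": 5}

print(max(successes, key=successes.get))  # '/cosine-similarity' (first of the tied keys)
print(best_arms(successes))               # ['/cosine-similarity', '/advanced-seo-service']
```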

#### **Step 13: Display Metrics for the Best Strategy**
```python
best_strategy_metrics = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(best_strategy, na=False)]
if not best_strategy_metrics.empty:
    print("\nPerformance Metrics for the Best Strategy:")
    print(best_strategy_metrics)
else:
    print("\nNo performance data available for the best-performing strategy.")
```
- **What This Does**: Retrieves and displays data related to the best-performing URL.
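One caveat applies to the `str.contains` lookup: the root URL is normalized to the empty string `''`, and an empty pattern is found in every string, so selecting or displaying `''` pulls in every row of the dataset. The toy DataFrame below uses hypothetical rows with the notebook's column name to contrast substring matching with an exact comparison.

```python
import pandas as pd

# Hypothetical rows in the same format as user_engagement_data after preprocessing
df = pd.DataFrame({
    "Page path and screen class": ["", "/cosine-similarity", "/advanced-seo-service"],
    "Views": [500, 206, 186],
})

# An empty pattern is contained in every string, so every row matches
print(len(df[df["Page path and screen class"].str.contains("", na=False)]))  # 3

# An exact comparison returns only the row for the root path
exact = df[df["Page path and screen class"] == ""]
print(len(exact))  # 1
```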

---


```python
# Step 8: Initialize the Multi-Armed Bandit with valid URLs only
# Explanation: Here, we create an instance of the Multi-Armed Bandit class using only the valid URLs found in previous steps.
# These valid URLs represent strategies that have data and will be tested using the Multi-Armed Bandit approach.
bandit = MultiArmedBandit(valid_urls)

# Step 9: Simulate the Multi-Armed Bandit Process
# Explanation: We simulate a process of selecting and testing strategies (URLs) using the Multi-Armed Bandit model.
# This simulation runs for 100 rounds, where each round represents testing one of the URLs.
for _ in range(100):  # Simulate 100 rounds of testing
    selected_arm = bandit.select_arm()  # Select an arm (URL) to test using the Multi-Armed Bandit model's selection logic
    # Retrieve relevant data for the selected URL (arm) from the dataset
    relevant_data = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(selected_arm, na=False)]
    # Check if there is data for the selected URL
    if not relevant_data.empty:
        # Calculate a simple success probability based on the average number of views
        # This normalization divides the mean view count by 1000 to create a probability value
        success_probability = relevant_data['Views'].mean() / 1000  # Normalize views for success calculation
        # Determine if this trial is successful using a random value compared against the success probability
        success = random.random() < success_probability
        # Update the model's success/failure record for this URL (arm)
        bandit.update(selected_arm, success)

# Step 10: Identify the Best-Performing Strategy
# Explanation: After running the simulation, we identify the URL (strategy) with the highest number of successes.
best_strategy = max(bandit.successes, key=bandit.successes.get)  # Find the URL with the most successes
print(f"The best-performing strategy (URL) is: {best_strategy}")  # Print the best-performing URL

# Step 11: Display Metrics for the Best Strategy
# Explanation: We retrieve data related to the best-performing URL to gain insights into its performance.
best_strategy_metrics = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(best_strategy, na=False)]
if not best_strategy_metrics.empty:
    print("Performance Metrics for the Best Strategy:")
    print(best_strategy_metrics)  # Print relevant metrics for the best-performing URL
else:
    print("No performance data available for the best-performing strategy.")

# Step 12: Provide Recommendations
# Explanation: Based on the results, we provide a recommendation to redirect traffic or optimize similar pages.
recommendations = f"Consider redirecting more traffic to {best_strategy} or optimizing similar pages based on observed engagement metrics."
print(recommendations)
```


```
The best-performing strategy (URL) is: /advanced-seo-service
Performance Metrics for the Best Strategy:
  Page path and screen class  Views  Active users  Views per active user  \
3     /advanced-seo-service/    186            55               3.381818   

   Average engagement time per active user  Event count  Key events  \
3                                47.163636          600           0   

   Total revenue  
3              0  
Consider redirecting more traffic to /advanced-seo-service or optimizing similar pages based on observed engagement metrics.
```


---

### Code Explanation

#### **Step 8: Initialize the Multi-Armed Bandit with Valid URLs Only**
```python
bandit = MultiArmedBandit(valid_urls)
```
- **Purpose**: This line creates an instance of the `MultiArmedBandit` class using a list of valid URLs.
- **Explanation**:
  - These valid URLs were identified earlier as having data available in the dataset.
  - The model will test these URLs to find out which one performs best.
- **Example**:
  - If `valid_urls` is `["/contact-us", "/about-us"]`, the `bandit` object will track and evaluate these URLs using the Multi-Armed Bandit approach.

#### **Step 9: Simulate the Multi-Armed Bandit Process**
```python
for _ in range(100):  # Simulate 100 rounds of testing
    selected_arm = bandit.select_arm()  # Select an arm (URL) to test using the Multi-Armed Bandit model's selection logic
    relevant_data = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(selected_arm, na=False)]
```
- **Purpose**: This loop simulates 100 rounds of testing to evaluate the performance of different URLs (arms).
- **Explanation**:
  - The `bandit.select_arm()` method selects a URL to test based on its past performance (successes and failures).
  - Data related to the selected URL is retrieved from the `user_engagement_data` DataFrame.
- **Example**:
  - If `/contact-us` is selected, the data for `/contact-us` from the dataset will be retrieved for analysis.

```python
if not relevant_data.empty:
    success_probability = relevant_data['Views'].mean() / 1000  # Normalize views for success calculation
    success = random.random() < success_probability
    bandit.update(selected_arm, success)
```
- **Purpose**: This block evaluates the performance of the selected URL.
- **Explanation**:
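  - If data is found for the selected URL, a success probability is calculated as the mean number of views divided by 1000. This maps typical view counts into the 0 to 1 range; a mean of 1000 views or more gives a probability of at least 1, so every trial for that URL succeeds.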
  - A random value is generated, and if it is less than the success probability, the trial is considered a success.
  - The `bandit.update()` method is called to record whether the trial was a success or failure.
- **Example**:
  - If the mean views for `/contact-us` is 500, the success probability is `500 / 1000 = 0.5`.
  - A random value is generated (e.g., 0.4), and since `0.4 < 0.5`, the trial is a success, and the success count for `/contact-us` is increased.
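The claim that a 0.5 success probability yields successes about half the time can be checked with a quick seeded simulation; the seed and the trial count are arbitrary choices for this sketch.

```python
import random

rng = random.Random(7)  # seeded for reproducibility
success_probability = 500 / 1000  # mean views of 500, as in the example
trials = 10_000

# Count how many uniform draws fall below the success probability
successes = sum(rng.random() < success_probability for _ in range(trials))
print(successes / trials)  # close to 0.5
```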

---

#### **Step 10: Identify the Best-Performing Strategy**
```python
best_strategy = max(bandit.successes, key=bandit.successes.get)  # Find the URL with the most successes
print(f"The best-performing strategy (URL) is: {best_strategy}")  # Print the best-performing URL
```
- **Purpose**: This line identifies the URL (strategy) with the highest number of successes.
- **Explanation**:
  - The `bandit.successes` dictionary contains the number of successes for each URL.
  - The `max()` function finds the URL with the most successes.
- **Example**:
  - If `/contact-us` has 30 successes and `/about-us` has 20 successes, `/contact-us` is identified as the best-performing URL.
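Ranking by raw success count favors URLs that were selected more often. The sketch below reuses the 30 and 20 successes from the example and adds hypothetical failure counts to show that ranking by success rate can give a different answer.

```python
successes = {"/contact-us": 30, "/about-us": 20}
failures = {"/contact-us": 70, "/about-us": 5}  # hypothetical failure counts

# Ranking used by the notebook: most successes
by_count = max(successes, key=successes.get)

# Alternative ranking: successes divided by total trials for each URL
rates = {arm: successes[arm] / (successes[arm] + failures[arm]) for arm in successes}
by_rate = max(rates, key=rates.get)

print(by_count)  # '/contact-us'
print(rates)     # {'/contact-us': 0.3, '/about-us': 0.8}
print(by_rate)   # '/about-us'
```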

#### **Step 11: Display Metrics for the Best Strategy**
```python
best_strategy_metrics = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(best_strategy, na=False)]
if not best_strategy_metrics.empty:
    print("Performance Metrics for the Best Strategy:")
    print(best_strategy_metrics)  # Print relevant metrics for the best-performing URL
else:
    print("No performance data available for the best-performing strategy.")
```
- **Purpose**: This block retrieves and displays data related to the best-performing URL.
- **Explanation**:
  - Data for the best-performing URL is retrieved from the `user_engagement_data` DataFrame.
  - If data is found, it is displayed; otherwise, a message is printed indicating no data is available.
- **Example**:
  - If `/contact-us` is the best-performing URL, metrics like views and engagement will be displayed.
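Substring matching can return more than one page: `/adult-seo-service` is contained in `/adult-seo-service-thailand`, so a lookup for the former returns both rows. This small pandas sketch uses a hypothetical Series built from paths in this notebook's URL list to compare `str.contains` with an exact `==` comparison. Passing `regex=False` to `str.contains` stops characters such as `.` or `+` from being read as regular-expression syntax, but it would not remove this overlap.

```python
import pandas as pd

# Hypothetical Series of normalized paths taken from the notebook's URL list
paths = pd.Series(["/adult-seo-service", "/adult-seo-service-thailand", "/contact-us"])

# Substring matching: the shorter path also matches the longer one
print(paths[paths.str.contains("/adult-seo-service", na=False)].tolist())
# ['/adult-seo-service', '/adult-seo-service-thailand']

# Exact matching keeps only the intended page
print(paths[paths == "/adult-seo-service"].tolist())
# ['/adult-seo-service']
```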

#### **Step 12: Provide Recommendations**
```python
recommendations = f"Consider redirecting more traffic to {best_strategy} or optimizing similar pages based on observed engagement metrics."
print(recommendations)
```
- **Purpose**: This line provides a recommendation based on the results of the simulation.
- **Explanation**:
  - Suggests redirecting more traffic to the best-performing URL or optimizing similar pages to improve engagement.
- **Example**:
  - If `/contact-us` is the best-performing URL, the recommendation would be: "Consider redirecting more traffic to `/contact-us` or optimizing similar pages based on observed engagement metrics."

---


```python
from google.colab import drive
drive.mount('/content/drive/')
```

```
Mounted at /content/drive/
```


```python
import pandas as pd
import numpy as np
import random

# Step 1: Load datasets (assuming datasets are loaded correctly)
user_engagement_data = pd.read_csv('/content/drive/MyDrive/Multi-Armed Bandit Datasets/User Engagement Metrics.csv')

# Step 2: Define URLs as different "arms" (strategies)
urls = [
    "https://webtool.co/",
    "https://webtool.co/adult-seo-service/",
    "https://webtool.co/cosine-similarity/",
    "https://webtool.co/advanced-seo-service/",
    "https://webtool.co/contact-us/",
    "https://webtool.co/diamond-seo-package/",
    "https://webtool.co/silver-seo-package/",
    "https://webtool.co/proximity/",
    "https://webtool.co/adult-seo-service-thailand/",
    "https://webtool.co/features/",
    "https://webtool.co/about-us-webtool/",
    "https://webtool.co/blogs/",
    "https://webtool.co/cora/",
    "https://webtool.co/content-optimization-using-ai/",
    "https://webtool.co/sitemap-checker/",
    "https://webtool.co/lda/"
]

# Step 3: Extract paths from URLs for matching (strip domain and normalize)
def extract_path(url):
    """Extract the path from a full URL (the root URL becomes '')."""
    if "://" in url:
        url = url.split("://")[1]  # Remove protocol (e.g., 'https://')
    path = '/' + '/'.join(url.split('/')[1:])  # Extract path and add leading '/'
    return path.lower().rstrip('/')  # Normalize to lowercase and remove trailing slash

# Step 4: Normalize data for matching
def preprocess_data(df):
    """Clean and standardize data."""
    df = df.dropna()
    df.columns = df.columns.str.strip()
    if 'Page path and screen class' in df.columns:
        df['Page path and screen class'] = df['Page path and screen class'].str.strip().str.lower().str.rstrip('/')
    return df

user_engagement_data = preprocess_data(user_engagement_data)
normalized_urls = [extract_path(url) for url in urls]

# Step 5: Match normalized paths with dataset entries
# Note: str.contains matches substrings, so '' (the root path) matches every row
valid_urls = [url for url in normalized_urls if not user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(url, na=False)].empty]

# Step 6: Check if valid URLs are found
if not valid_urls:
    print("\nNo valid URLs with available data found. Please check the data and format.")
else:
    print(f"\nValid URLs with data: {valid_urls}")

    # Step 7: Define the Multi-Armed Bandit Model
    class MultiArmedBandit:
        def __init__(self, arms):
            """
            Initialize the Multi-Armed Bandit model with a list of valid arms (URLs).
            """
            self.arms = arms  # List of strategies (valid URLs)
            self.successes = {arm: 0 for arm in arms}
            self.failures = {arm: 0 for arm in arms}

        def select_arm(self):
            """
            Select an arm (strategy) by sampling in proportion to Laplace-smoothed success rates.
            """
            probabilities = [(self.successes[arm] + 1) / (self.successes[arm] + self.failures[arm] + 2) for arm in self.arms]
            if sum(probabilities) == 0:
                return random.choice(self.arms)  # Defensive fallback; every smoothed estimate is positive
            return np.random.choice(self.arms, p=[p / sum(probabilities) for p in probabilities])

        def update(self, arm, success):
            """
            Update the performance of an arm based on observed success or failure.
            """
            if success:
                self.successes[arm] += 1
            else:
                self.failures[arm] += 1

    # Step 8: Initialize the Multi-Armed Bandit with valid URLs only
    bandit = MultiArmedBandit(valid_urls)

    # Step 9: Simulate the Multi-Armed Bandit Process
    for _ in range(100):  # Simulate 100 rounds
        selected_arm = bandit.select_arm()
        relevant_data = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(selected_arm, na=False)]
        if not relevant_data.empty:
            success_probability = relevant_data['Views'].mean() / 1000  # Example normalization for success calculation
            success = random.random() < success_probability
            bandit.update(selected_arm, success)

    # Step 10: Identify the Best-Performing Strategy
    best_strategy = max(bandit.successes, key=bandit.successes.get)
    print(f"The best-performing strategy (URL) is: {best_strategy}")

    # Step 11: Display Metrics for the Best Strategy
    best_strategy_metrics = user_engagement_data[user_engagement_data['Page path and screen class'].str.contains(best_strategy, na=False)]
    if not best_strategy_metrics.empty:
        print("Performance Metrics for the Best Strategy:")
        print(best_strategy_metrics)
    else:
        print("No performance data available for the best-performing strategy.")

    # Step 12: Provide Recommendations
    recommendations = f"Consider redirecting more traffic to {best_strategy} or optimizing similar pages based on observed engagement metrics."
    print(recommendations)
```



```
Valid URLs with data: ['', '/adult-seo-service', '/cosine-similarity', '/advanced-seo-service', '/contact-us', '/diamond-seo-package', '/silver-seo-package', '/proximity', '/adult-seo-service-thailand', '/features', '/about-us-webtool', '/blogs', '/cora', '/content-optimization-using-ai', '/sitemap-checker', '/lda']
The best-performing strategy (URL) is: /cosine-similarity
Performance Metrics for the Best Strategy:
  Page path and screen class  Views  Active users  Views per active user  \
2         /cosine-similarity    206            76               2.710526   

   Average engagement time per active user  Event count  Key events  \
2                                64.421053          740           0   

   Total revenue  
2              0  
Consider redirecting more traffic to /cosine-similarity or optimizing similar pages based on observed engagement metrics.
```


### **Explanation of the Output**

#### **1. Valid URLs with Data**
- **What it means**: Before the bandit runs, the script checks which URLs from your list have matching rows in the dataset. These URLs represent different pages on your website.
- **Valid URLs Found**: The output shows a list of valid URLs that were matched with data entries in your dataset. For example, some of the valid URLs include:
  - `/adult-seo-service`
  - `/cosine-similarity`
  - `/advanced-seo-service`
  - And so on...
- **Why it's important**: This step is crucial because the MAB algorithm can only test and optimize pages that have available data. If a page doesn't have any data, it can't be included in the optimization process.

#### **2. The Best-Performing Strategy (URL)**
- **What it means**: The bandit ran 100 simulated rounds over the valid URLs. In each round, the selected URL's chance of a recorded success was its mean `Views` divided by 1000, and at the end the URL with the most recorded successes was reported as the best.
- **Best-Performing URL**: In your case, the best-performing URL is `/cosine-similarity`.
- **Why it's important**: This points to the page that did best in this simulation, where success depended only on views. Because both page selection and success are random, another run can pick a different page; an earlier run in this notebook reported `/advanced-seo-service`.
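Both `select_arm` and the success check draw random numbers, so the reported winner depends on the run. The sketch below replays the notebook's selection and update logic on the mean views shown in this notebook's two outputs (186 for `/advanced-seo-service`, 206 for `/cosine-similarity`), treating each as the only row for its URL, which is a simplifying assumption. `run_simulation` is a hypothetical helper; seeding a `random.Random` instance and `numpy.random.default_rng` makes a given run repeatable.

```python
import random
import numpy as np

def run_simulation(mean_views, rounds=100, seed=None):
    """Hypothetical helper: replay the notebook's bandit loop on a dict of URL -> mean views."""
    py_rng = random.Random(seed)           # drives the success check
    np_rng = np.random.default_rng(seed)   # drives arm selection
    arms = list(mean_views)
    successes = {arm: 0 for arm in arms}
    failures = {arm: 0 for arm in arms}
    for _ in range(rounds):
        # Laplace-smoothed estimates, normalized into selection probabilities
        est = [(successes[a] + 1) / (successes[a] + failures[a] + 2) for a in arms]
        total = sum(est)
        arm = arms[np_rng.choice(len(arms), p=[e / total for e in est])]
        # Success probability is mean views / 1000, as in the notebook
        if py_rng.random() < mean_views[arm] / 1000:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return max(successes, key=successes.get)

views = {"/advanced-seo-service": 186, "/cosine-similarity": 206}
winners = [run_simulation(views, seed=s) for s in range(10)]
print(winners)  # the winner can change from seed to seed
print(run_simulation(views, seed=3) == run_simulation(views, seed=3))  # True: same seed, same result
```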

#### **3. Performance Metrics for the Best Strategy**
- **What it means**: After identifying the best-performing URL (`/cosine-similarity`), the algorithm retrieved specific performance metrics from the dataset for this page.
- **Metrics Displayed**:
  - **Page Path and Screen Class**: Shows the URL path being analyzed (`/cosine-similarity`).
  - **Views**: The page had 206 views, meaning it was visited 206 times during the period covered by the data.
  - **Active Users**: There were 76 active users interacting with this page.
  - **Views per Active User**: On average, each active user viewed the page approximately 2.71 times.
  - **Average Engagement Time per Active User**: The average amount of time each active user spent on the page was 64.42 units of time (e.g., seconds or minutes, depending on your dataset).
  - **Event Count**: This indicates the number of tracked interactions or events that occurred on the page, which was 740 in this case.
  - **Key Events**: This metric is `0`, meaning there were no specific key events tracked or recorded for this page.
  - **Total Revenue**: The revenue generated from this page was `0`, indicating no revenue during the tracked period.
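- **Why it's important**: These metrics describe how users interacted with the page. High views, engagement time, and event counts suggest strong interest, but only `Views` fed into the bandit's success probability; engagement time, events, key events, and revenue did not influence which page was selected.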
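The reported figures are consistent with each other, and a couple of further numbers follow from them. The arithmetic below uses the values in the output; the total engagement time and events per active user are derived here, not columns in the dataset, and the total shares the dataset's unstated time unit.

```python
views = 206
active_users = 76
avg_engagement_per_user = 64.421053
event_count = 740

print(round(views / active_users, 6))                 # 2.710526, matching "Views per active user"
print(round(avg_engagement_per_user * active_users))  # approximate total engagement time: 4896
print(round(event_count / active_users, 2))           # events per active user: 9.74
```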

#### **4. Recommendations**
- **What it means**: Based on the performance of the best-performing URL, the algorithm provides a recommendation. It suggests that you:
  - **Redirect More Traffic**: Consider driving more users to the `/cosine-similarity` page since it performed well.
  - **Optimize Similar Pages**: You could also improve pages with similar content or structure to boost overall engagement.
- **Why it's important**: These recommendations help you focus your efforts on pages that are already performing well, potentially increasing traffic and engagement across your website.

### Use Cases and Next Steps
1. **Focus on High-Performing Pages**: Use the insights to drive more traffic to `/cosine-similarity`. This could involve updating content, promoting the page through marketing campaigns, or optimizing SEO keywords.
2. **Optimize Other Pages**: Analyze why this page performs well and apply similar strategies (e.g., content type, layout, keywords) to other pages.
3. **Track Improvements**: Continue using the MAB algorithm to monitor and adjust your SEO strategy based on changing user behaviors and engagement metrics.


### **Analysis of the Output**

1. **Valid URLs with Data**:
   - The output indicates that a list of valid URLs with data was found from the `user_engagement_data` dataset. These URLs were normalized (removing domains) to match the format of entries in the dataset.
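   - Valid URLs include paths such as `'/adult-seo-service'`, `'/cosine-similarity'`, and so forth. The empty string `''` is the root URL `https://webtool.co/`: `extract_path` removes the trailing slash, so `/` becomes `''`. Because every string contains the empty string, `str.contains('')` matches every row, so this arm is scored on the mean views of the whole dataset rather than on the home page alone.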

2. **Best-Performing Strategy**:
   - The MAB model identified `'/cosine-similarity'` as the best-performing strategy based on the simulations.
   - This selection is made using a probabilistic approach in which the MAB model samples strategies in proportion to their smoothed success rates and updates their success and failure counts after each trial.

3. **Performance Metrics for the Best Strategy**:
   - The model retrieved detailed performance metrics for the selected strategy `'/cosine-similarity'`.
   - Metrics include:
     - **Views**: 206 views for the page.
     - **Active Users**: 76 active users.
     - **Views per Active User**: Approximately 2.71 views per user.
     - **Average Engagement Time per Active User**: 64.42 units of time (likely in seconds or minutes, depending on your dataset).
     - **Event Count and Key Events**: Indicates user interactions and any specific tracked events (none in this case).
     - **Total Revenue**: 0, indicating no revenue was generated from this page in the dataset.

4. **Recommendations**:
   - The model suggests redirecting more traffic to `'/cosine-similarity'` or optimizing similar pages based on observed engagement metrics. This recommendation aligns with the best-performing strategy and suggests a data-driven approach to SEO improvement.

### **Comparison with Expected Output**

- **Best-Performing Strategy Selection**: The model successfully highlighted the best-performing URL based on available data, which matches the expected output.
- **Performance Metrics**: Detailed metrics were provided for the best-performing URL, such as views, engagement time, etc., as expected.
- **Recommended Actions**: The output offered actionable recommendations to improve SEO performance, consistent with the expected behavior of a Multi-Armed Bandit model for SEO optimization.

### Conclusion
- **Match with Expected Output**: Yes, the output aligns with what we were expecting from the MAB model. The model identified a best-performing strategy, provided detailed performance metrics, and made actionable recommendations based on data.
- **Further Considerations**: You can refine the recommendations further or introduce more complex metrics (e.g., conversion rates) if your dataset supports it.
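As a starting point for the conversion-rate idea, a reward could be key events per active user, a simple proxy rather than GA4's own conversion metric. The helper below is a hypothetical sketch, not part of the notebook. With the data shown above, both reported pages have 0 key events, so this reward would be 0 for each and could not separate them until key events are tracked. The ratio can also exceed 1 when users trigger several key events, so it would need clipping before being used as a success probability.

```python
def conversion_rate(key_events, active_users):
    """Hypothetical reward: key events per active user, 0.0 when there are no users."""
    return key_events / active_users if active_users else 0.0

# Values from the outputs above: neither page recorded any key events
print(conversion_rate(0, 76))  # /cosine-similarity -> 0.0
print(conversion_rate(0, 55))  # /advanced-seo-service -> 0.0
```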

### Simple Explanation for Non-Technical Understanding
- **What Happened**: The MAB model tested different webpage paths to see which one was performing best based on user engagement data (e.g., views, time spent). After many rounds of testing, it decided that `'/cosine-similarity'` was performing the best.
- **Metrics**: The model looked at how many people visited, how much time they spent, and how engaged they were.
- **Recommendations**: It suggested focusing on `'/cosine-similarity'` because it performed well compared to other pages, meaning you could improve your website’s SEO by focusing more traffic or optimization efforts there.

---