<a href="https://colab.research.google.com/github/Abhiss123/AlmaBetter-Projects/blob/main/Automated_SEO_A_B_Testing_with_Machine_Learning_for_Performance_Improvement.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name: Automated SEO A/B Testing with Machine Learning for Performance Improvement**

---

### **Purpose of the Project:**

The purpose of this project is to **help website owners optimize their web pages** by automatically testing and analyzing changes made to their content, titles, and descriptions. The goal is to improve important website performance metrics, like **click-through rates (CTR)**, **engagement**, and **conversion rates**. This project uses **machine learning** to identify and recommend the best SEO (Search Engine Optimization) strategies that can boost a website’s visibility on search engines like Google. Let’s break down each part to understand the purpose in more detail.

### 1. **Automated Testing of SEO Changes**

In this project, **SEO A/B testing** refers to creating and testing two versions (A and B) of specific elements on a webpage, such as titles, descriptions, or content. The purpose of this testing is to see which version performs better. For example:
   - Version A might have a simpler title, while Version B has a more detailed, keyword-rich title.
   - This test compares which version gets more clicks or engages users better.

The **automation** part means that instead of manually comparing every element, the project uses machine learning to analyze the differences between these two versions, helping website owners make data-backed decisions faster and with less effort.
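
As a hedged illustration of what "comparing which version gets more clicks" can look like in practice, the sketch below applies a two-proportion z-test to click counts. The choice of test and all of the numbers are assumptions made for illustration; the project itself does not prescribe a particular statistical test.

```python
import math

def compare_ctr(clicks_a, impressions_a, clicks_b, impressions_b):
    """Two-proportion z-test on the CTR of versions A and B (illustrative)."""
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    # Pooled CTR under the null hypothesis that both versions perform the same
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value

# Hypothetical counts: version A (simple title) vs. version B (keyword-rich title)
ctr_a, ctr_b, z, p_value = compare_ctr(120, 4000, 156, 4000)
print(f"CTR A={ctr_a:.2%}, CTR B={ctr_b:.2%}, z={z:.2f}, p={p_value:.3f}")
```

A small p-value suggests the difference in CTR is unlikely to be due to chance alone; with few impressions the test has little power, so results from short experiments should be read cautiously.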

### 2. **Using Machine Learning for Better Insights**

Machine learning is used in this project to process large amounts of data and spot trends that might not be immediately obvious. Here’s why machine learning is important:
   - **Predictive Insights**: The model can learn from past data to predict which SEO changes will likely work best.
   - **Keyword Analysis**: It can identify the most effective keywords to include in a title, description, or content, increasing the chance of ranking higher in search results.
   - **Pattern Recognition**: Machine learning can detect patterns, like which types of words or phrases lead to more user engagement.

By using machine learning, the project goes beyond simple data analysis; its insights and recommendations are based on patterns learned from data about what tends to work best for SEO.

### 3. **Improving Website Performance Metrics**

The **main objective** is to help website owners improve specific performance metrics. Here’s how:
   - **Click-Through Rate (CTR)**: This measures the share of people who click on a webpage link after seeing it in search results (clicks divided by impressions). Higher CTR usually means the title and description are appealing to users.
   - **Engagement**: This measures how long users stay on the page and how they interact with it. Improving engagement is essential for keeping visitors interested.
   - **Conversion Rate**: This is the percentage of visitors who take an action on the page, like signing up or making a purchase. This is crucial for websites focused on sales or lead generation.

Machine learning helps identify what elements (e.g., keywords, phrases) influence these metrics, and A/B testing shows which version performs better, helping the website gradually improve its performance.
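
The metrics above reduce to simple ratios. The following minimal sketch shows how CTR and conversion rate could be computed from raw counts; the `page` dictionary, its field names, and its values are hypothetical.

```python
def safe_rate(events, opportunities):
    """Return events / opportunities, or 0.0 when there were no opportunities."""
    return events / opportunities if opportunities else 0.0

# Hypothetical raw counts for one page version
page = {"impressions": 5000, "clicks": 150, "visits": 150, "conversions": 12}

ctr = safe_rate(page["clicks"], page["impressions"])
conversion_rate = safe_rate(page["conversions"], page["visits"])
print(f"CTR: {ctr:.1%}, conversion rate: {conversion_rate:.1%}")
```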

### 4. **Providing SEO Recommendations Based on Data**

This project doesn’t just analyze data; it also provides actionable SEO recommendations. For example:
   - If the machine learning model finds that certain keywords are highly effective, it will suggest including them in the page’s title, description, or content.
   - The recommendations are based on what actually works according to the data, not just general SEO rules, so they are customized to the specific needs of the website.

This way, website owners can confidently make changes knowing that they’re backed by real, data-driven insights.

---

### Summary: The Practical Benefits of This Project

To summarize, "Automated SEO A/B Testing with Machine Learning for Performance Improvement" is designed to:
1. **Automate the testing process** of different SEO elements (like titles, descriptions) on a webpage.
2. **Use machine learning** to analyze data and suggest the best options for improving SEO.
3. **Focus on boosting critical website metrics** (CTR, engagement, and conversions), helping the site attract and retain more visitors.
4. **Provide clear SEO recommendations** based on data, allowing website owners to make informed decisions on optimizing their web pages.

This project ultimately helps website owners **increase visibility, attract more visitors, and drive higher engagement and conversions** on their sites, all with the help of smart, automated insights from machine learning.

### **Understanding SEO A/B Testing with Machine Learning**

**SEO (Search Engine Optimization) A/B Testing** with Machine Learning is a technique where changes made to a website, such as adjusting titles, descriptions, content, or keywords, are tested to see which version performs better in search rankings and attracts more user clicks. Machine learning models are used to predict which changes will likely improve outcomes like click-through rates (CTR) or conversions, based on previous data. These models analyze trends and patterns in data to predict the effectiveness of various SEO strategies.

### **Use Cases of SEO A/B Testing with Machine Learning**

1. **Optimizing Click-Through Rates (CTR)**: Testing which page titles, meta descriptions, or content types attract more clicks.
2. **Improving Conversion Rates**: Analyzing user behavior to find out which web pages or content layouts lead to more conversions (e.g., sign-ups, purchases).
3. **Reducing Bounce Rates**: Determining what changes make users stay on the page longer instead of leaving quickly.
4. **Testing Content Relevance**: Identifying content changes that improve relevance for search terms users are entering.

### **Real-Life Implementation on Websites**

In a website context, SEO A/B Testing with Machine Learning involves making two versions of a page (let’s call them version A and version B), each with slight changes, and then using a machine learning model to analyze which version performs better. For example, if a website wants to improve CTR on a particular blog post, it might test two different titles or meta descriptions to see which one brings more clicks. A machine learning algorithm would analyze the click patterns, time spent on the page, and other interactions to predict and choose the best-performing option.

### **Data Requirements for SEO A/B Testing with Machine Learning**

A machine learning model for SEO A/B testing needs specific types of data for training. Typically, this data is collected in one of two forms:

1. **CSV Format**: The CSV (Comma-Separated Values) format is a common way to store large amounts of data. For an SEO A/B test, you might have a CSV file that lists URLs, page titles, meta descriptions, CTR, bounce rates, and other performance metrics for each version of the page (A and B).
   
2. **Direct Website URLs and Text Content**: If your machine learning model needs to analyze the actual text content on each page, it might directly use the URLs to fetch content. For example, if testing the effect of content length or keywords, the model may need to analyze the live page content from the URLs.

In practice, **CSV format** is usually more manageable, as it’s straightforward to structure, analyze, and store. However, some A/B tests, especially content analysis-based ones, might require scraping text content directly from URLs.
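
To make the CSV option concrete, here is a small sketch that reads A/B test rows with Python's standard `csv` module and aggregates CTR per version. The column names (`url`, `version`, `title`, `impressions`, `clicks`) and the rows are assumptions for illustration, not a required schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export; the column names and values are assumptions
raw_csv = """url,version,title,impressions,clicks
https://example.com/blog,A,Simple Guide to SEO,4000,120
https://example.com/blog,B,SEO Guide: 10 Keyword Tips for Beginners,4000,156
"""

# Sum impressions and clicks for each version
totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
for row in csv.DictReader(io.StringIO(raw_csv)):
    totals[row["version"]]["impressions"] += int(row["impressions"])
    totals[row["version"]]["clicks"] += int(row["clicks"])

ctr_by_version = {v: t["clicks"] / t["impressions"] for v, t in totals.items()}
print(ctr_by_version)
```

In a real project the `io.StringIO` wrapper would be replaced by an open file handle.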

### **Steps in SEO A/B Testing with Machine Learning**

1. **Data Collection**: Gather data from your website. This might include metrics like page views, CTR, bounce rates, time spent on the page, and conversion rates for different versions.
   
2. **Data Processing and Cleaning**: Machine learning models need clean, structured data to work effectively. You’ll filter out irrelevant data, standardize formats, and organize data points (e.g., CTR, conversions) for each page version in your CSV.

3. **Model Training and Testing**: Using the processed data, you train the model to recognize patterns in which changes (e.g., title wording, meta description) increase performance. During training, the model learns the factors that impact SEO metrics and becomes better at predicting outcomes for future tests.

4. **Analysis and Recommendations**: Once the model is trained, it can be applied to new A/B test scenarios. For instance, if you want to try a new meta description, the model can predict how this change might impact CTR or bounce rate. The model’s output includes which version (A or B) is likely to perform better and why.
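
The training and prediction steps could be sketched as follows. This is only a toy illustration under stated assumptions: the titles and labels are invented, the dataset is far too small for real use, and the choice of a bag-of-words logistic regression is one possible model rather than the method this project mandates.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical history: past titles and whether each improved CTR (1) or not (0)
titles = [
    "SEO Services",
    "Advanced SEO Services for Higher Rankings",
    "Marketing",
    "Digital Marketing Services That Grow Revenue",
    "Our Company",
    "AI Powered SEO Agency for Better Rankings",
]
improved_ctr = [0, 1, 0, 1, 0, 1]

# Turn titles into unigram/bigram counts, then fit a simple classifier
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(titles, improved_ctr)

# Score two hypothetical candidate titles for a new A/B test
candidates = ["SEO", "Advanced SEO Services for Better Rankings"]
probabilities = model.predict_proba(candidates)
for title, prob in zip(candidates, probabilities[:, 1]):
    print(f"{title!r}: estimated probability of improved CTR = {prob:.2f}")
```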



### **Can an SEO A/B Testing Model Work with Just URLs?**

Ideally, an SEO A/B testing model needs more detailed data than URLs alone, such as page titles, meta descriptions, click-through rates (CTR), and user behavior metrics like bounce rate and conversion rates (e.g., form sign-ups or purchases). These metrics are usually obtained from tools like Google Analytics or other tracking software.

Without access to such data, the model can still work with the website URLs alone by **scraping** the content of those pages to get some of the basic information, but there are limitations. Here’s a breakdown of what’s possible and what isn’t:

### What the Model Can Do with Just URLs and Scraping

If you only provide the URLs, a scraping tool can extract specific parts of each webpage, such as:
1. **Page Titles**: The title that appears in search results (e.g., “SEO Services for Better Rankings | Thatware”).
2. **Meta Descriptions**: The brief description that shows up under the title in search results (e.g., “Discover our range of AI-based SEO solutions designed to boost your ranking…”).
3. **Content Structure**: The headings, main body content, images, and keywords used on each page.

From this scraped data, the model can perform certain analyses:
- **Analyze Content Structure and Keywords**: The model can analyze if some content types or keyword patterns are more optimized for SEO or are likely to attract more clicks based on general SEO guidelines.
- **Suggest Optimizations for Titles and Descriptions**: Based on patterns in popular SEO strategies, the model can recommend adjustments in title length, keyword usage, or description tone.

However, since scraping won’t provide **user behavior data** (like how many people clicked, stayed, or converted), the model **cannot predict accurately which changes will improve CTR, bounce rates, or conversions** without this additional data. The model would instead focus on **content-based optimizations** rather than user behavior-based predictions.

### What Data Would Improve SEO A/B Testing Accuracy

To conduct a truly effective SEO A/B test, the following metrics would ideally be added to improve model predictions:
- **Click-Through Rate (CTR)**: The percentage of users clicking on the search result, which helps identify effective titles and descriptions.
- **Bounce Rate**: The percentage of users who leave the page without interacting, which helps in determining how engaging or relevant the content is.
- **Dwell Time or Time Spent**: How long a user stays on the page, indicating content engagement.
- **Conversion Data**: Information on whether a user completed an action, like signing up or making a purchase, showing the effectiveness of layout and content.

Without access to analytics data, the model can only focus on **optimizing on-page SEO elements** (like title tags, meta descriptions, keywords) but won’t be able to determine how these changes impact user behavior or conversions.

### What Output Can Be Expected from This Model with Only URL Data?

If you proceed with just URLs for scraping, here are the types of insights and output you can expect:
- **SEO Content Quality Analysis**: Based on keyword density, title relevance, and meta description structure, the model can suggest improvements that align with SEO best practices.
- **On-Page SEO Suggestions**: Recommendations on how to improve title tags, meta descriptions, and keyword usage, which are factors that influence click rates.
- **Comparative Content Insights**: The model might identify which types of content (e.g., listicles, how-to guides, in-depth articles) are commonly optimized across your site and suggest expanding or refining these based on SEO best practices.
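
As a rough sketch of the content-quality checks mentioned above, the snippet below measures title length and keyword density. The `keyword_density` helper and the sample strings are hypothetical, and no particular target value is implied.

```python
import re

def keyword_density(text, keyword):
    """Share of words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return words.count(keyword.lower()) / len(words) if words else 0.0

# Hypothetical page elements
title = "SEO Services for Better Rankings"
body = "Our SEO services improve rankings. SEO audits find issues early."

title_length = len(title)
density = keyword_density(body, "seo")
print(f"Title length: {title_length} characters")
print(f"Density of 'seo' in body: {density:.1%}")
```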



In [None]:
# Import necessary libraries for web scraping, text processing, and keyword extraction
import requests  # Used to make HTTP requests to each URL to access webpage content
from bs4 import BeautifulSoup  # Used to parse HTML and extract content from web pages
import re  # Used for cleaning text with regular expressions
from sklearn.feature_extraction.text import CountVectorizer  # Used to extract unigrams, bigrams, and trigrams
from collections import Counter  # Used to count occurrences of keywords


### Explanation of Each Step

```python
# Import necessary libraries for web scraping, text processing, and keyword extraction
```
This line is a comment. Comments are added to the code for explanation purposes and are not run as part of the program. This comment tells us that the following code will bring in (or import) certain libraries, which are collections of code written by other developers to help with specific tasks like fetching web content, cleaning text, and analyzing keywords.

---

```python
import requests  # Used to make HTTP requests to each URL to access webpage content
```
- **Purpose**: The `requests` library helps us connect to websites, like when you type a web address into your browser. It sends a request to a website and pulls (or "fetches") the content for us to use in our program.
- **Example**: If we want to get content from `https://example.com`, `requests` will allow us to connect to that website and get the HTML code (the building blocks of a webpage) to work with.

---

```python
from bs4 import BeautifulSoup  # Used to parse HTML and extract content from web pages
```
- **Purpose**: `BeautifulSoup` is a tool that helps us look at the website's HTML code and extract specific parts, like paragraphs, titles, or images.
- **Example**: Suppose the HTML of a page has a section that looks like this:
  ```html
  <p>Welcome to our website!</p>
  ```
  BeautifulSoup allows us to find and extract just the phrase "Welcome to our website!" without all the other HTML tags.
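
A minimal sketch of that extraction, using a small inline HTML string (an assumption standing in for a downloaded page):

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet stands in for a downloaded page
html = "<html><body><h1>Home</h1><p>Welcome to our website!</p><p>We offer SEO services.</p></body></html>"

soup = BeautifulSoup(html, "html.parser")
# Collect the text of every <p> tag, without the surrounding markup
paragraph_texts = [p.get_text() for p in soup.find_all("p")]
print(paragraph_texts)
```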

---

```python
import re  # Used for cleaning text with regular expressions
```
- **Purpose**: The `re` library (short for "regular expressions") is used to search for and remove unwanted characters, symbols, or words in text.
- **Example**: If a sentence has extra punctuation, like "Hello!!!" or numbers like "Order #1234", `re` can help us remove the extra punctuation and numbers, leaving us with a clean version of the text, such as just "Hello".
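
The example above can be reproduced with two `re.sub` calls. The patterns are the same ones the notebook uses later in `clean_text`; the `strip_noise` helper name is hypothetical.

```python
import re

def strip_noise(text):
    """Remove punctuation/symbols and digits, then trim leftover whitespace."""
    text = re.sub(r"[^\w\s]", "", text)  # drop anything that is not a word character or whitespace
    text = re.sub(r"\d+", "", text)      # drop runs of digits
    return text.strip()

cleaned = [strip_noise(sample) for sample in ["Hello!!!", "Order #1234"]]
print(cleaned)
```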

---

```python
from sklearn.feature_extraction.text import CountVectorizer  # Used to extract unigrams, bigrams, and trigrams
```
- **Purpose**: `CountVectorizer` helps us identify common words or phrases in the text. It counts how often each word or phrase appears.
- **Terms**:
  - **Unigram**: A single word (e.g., "SEO").
  - **Bigram**: A pair of words that appear together (e.g., "SEO services").
  - **Trigram**: Three words that appear together (e.g., "best SEO services").
- **Example**: If the content of a page includes "SEO services are essential," `CountVectorizer` (which lowercases text by default) can count "seo," "services," and "seo services," flagging them as common phrases if they appear frequently across the page.

---

```python
from collections import Counter  # Used to count occurrences of keywords
```
- **Purpose**: `Counter` is a simple tool to count how often each item appears in a list. In this case, it can be used to see which words or phrases show up most often in the text, helping us focus on the most important keywords.
- **Example**: If we have a list of words like ["SEO", "SEO", "services", "marketing", "SEO"], `Counter` will tell us that "SEO" appears three times, "services" once, and "marketing" once.
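
The same example, written out as a short runnable sketch:

```python
from collections import Counter

words = ["SEO", "SEO", "services", "marketing", "SEO"]
counts = Counter(words)

print(counts["SEO"])          # how often "SEO" appears
print(counts.most_common(1))  # the single most frequent item with its count
```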

---


In [None]:
# Import necessary libraries for web scraping, text processing, and keyword extraction
import requests  # Used to make HTTP requests to each URL to access webpage content
from bs4 import BeautifulSoup  # Used to parse HTML and extract content from web pages
import re  # Used for cleaning text with regular expressions

# List of URLs to scrape content from for SEO analysis
urls = [
    'https://thatware.co/',  # Main homepage of the website
    'https://thatware.co/advanced-seo-services/',  # Advanced SEO services page
    'https://thatware.co/digital-marketing-services/',  # Digital marketing services page
    # You can add more URLs here if you want to analyze more pages
]

# Function to clean the text content by removing unwanted characters and stopwords
def clean_text(text):
    """
    This function cleans up the text content by:
    - Removing common words that don't add meaning (stopwords)
    - Removing punctuation and special symbols
    - Converting all text to lowercase to ensure uniformity
    """

    # Define a list of common stopwords that are not useful for SEO analysis
    stopwords = set([
        "the", "is", "in", "and", "to", "of", "a", "for", "on", "with", "as", "by",
        "this", "an", "or", "at", "from", "but", "be", "not", "it", "if", "are", "that",
        "can", "will", "has", "have", "we", "you", "they", "your", "our", "any", "other", "also"
    ])

    # Remove punctuation, symbols, digits, and convert text to lowercase for uniformity
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation and special symbols
    text = re.sub(r'\d+', '', text)  # Remove digits
    text = text.lower()  # Convert to lowercase to make analysis easier

    # Remove stopwords from text
    cleaned_text = ' '.join(word for word in text.split() if word not in stopwords)
    return cleaned_text  # Return the cleaned text for analysis

# Loop through each URL, fetch content, clean it, and display the output
for url in urls:
    try:
        # Make a request to the URL and get the content of the webpage
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')  # Parse the webpage content

        # Extract the page title, meta description, and main content (paragraphs)
        title = soup.title.string if soup.title else ''  # Get page title if available
        meta_desc = soup.find('meta', attrs={'name': 'description'})  # Find meta description tag
        meta_desc = meta_desc['content'] if meta_desc else ''  # Get meta description content if available

        # Extract all paragraph text from the page content for the main analysis
        paragraphs = soup.find_all('p')
        content = ' '.join([para.get_text() for para in paragraphs])  # Join all paragraph text

        # Clean the main content text using the clean_text function
        cleaned_content = clean_text(content)

        # Display the extracted title, meta description, original, and cleaned content for each URL
        print(f"\nURL: {url}")
        print(f"Title: {title}")
        print(f"Meta Description: {meta_desc}")
        print(f"\nOriginal Content from {url}:\n", content[:500])  # Display first 500 characters of original text
        print(f"\nCleaned Content from {url}:\n", cleaned_content[:500])  # Display first 500 characters of cleaned text

    except Exception as e:
        print(f"Error fetching content from {url}: {e}")



URL: https://thatware.co/
Title: THATWARE® - AI Powered SEO & Best Advanced SEO Agency
Meta Description: ThatWare® is world's first SEO company seamlessly integrating the power of AI into its strategies. Leveraging advanced SEO methods such as Semantics.

Original Content from https://thatware.co/:
 $ RevenueGenerated via SEO Qualified LeadsGenerated  
 8 years ago, we embarked on a journey to unravel the intricacies of the Google algorithm—a cryptic enigma begging to be deciphered. Consider it akin to unlocking a closely guarded secret, comparable only to the recipe of Coca Cola or the security measures surrounding the Crown Jewels of London. To traverse the Google maze, we decided to rewrite the rules and carve our own path. Our strategy? Develop proprietary AI algorithms to adeptly monit

Cleaned Content from https://thatware.co/:
 revenuegenerated via seo qualified leadsgenerated years ago embarked journey unravel intricacies google algorithma cryptic enigma begging deciphered con


### **Detailed Code Explanation with Examples**

### Setting Up the URLs for Analysis

```python
# List of URLs to scrape content from for SEO analysis
urls = [
    'https://thatware.co/',  # Main homepage of the website
    'https://thatware.co/advanced-seo-services/',  # Advanced SEO services page
    'https://thatware.co/digital-marketing-services/',  # Digital marketing services page
    # You can add more URLs here if you want to analyze more pages
]
```
- **Purpose**: `urls` is a list containing the web addresses (URLs) we want to analyze for SEO. We’ll analyze the title, description, and main content of each URL.
- **Example**: We’ve added three URLs here from ThatWare’s website: the homepage and two service-specific pages. You can add as many URLs as needed.
- **Usage**: The code will go through each URL, fetch its content, clean it, and display the results.

---

### Defining the Cleaning Function

```python
# Function to clean the text content by removing unwanted characters and stopwords
def clean_text(text):
    """
    This function cleans up the text content by:
    - Removing common words that don't add meaning (stopwords)
    - Removing punctuation and special symbols
    - Converting all text to lowercase to ensure uniformity
    """
```
- **Purpose**: The `clean_text` function is designed to remove unnecessary parts of the text that don’t help with SEO, such as punctuation, numbers, and very common words.
- **Explanation**: The docstring explains that the function will:
   - Remove common words that don’t add much meaning, like “the” or “is.”
   - Remove punctuation marks and special characters.
   - Convert everything to lowercase so we treat "SEO" and "seo" the same.

---

#### Cleaning Process Within `clean_text`

```python
    # Define a list of common stopwords that are not useful for SEO analysis
    stopwords = set([
        "the", "is", "in", "and", "to", "of", "a", "for", "on", "with", "as", "by",
        "this", "an", "or", "at", "from", "but", "be", "not", "it", "if", "are", "that",
        "can", "will", "has", "have", "we", "you", "they", "your", "our", "any", "other", "also"
    ])
```
- **Purpose**: This set of `stopwords` contains common words that don’t add any significant meaning to the text for SEO purposes.
- **Example**: From a sentence like "SEO is the key to success," only "seo key success" remains once the text has been lowercased and the stopwords removed.
- **Usage**: This helps the code focus on keywords that actually contribute to SEO analysis.

---

```python
    # Remove punctuation, symbols, digits, and convert text to lowercase for uniformity
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation and special symbols
    text = re.sub(r'\d+', '', text)  # Remove digits
    text = text.lower()  # Convert to lowercase to make analysis easier
```
- **Purpose**: These lines clean up the text further by:
   - Removing punctuation, like commas and periods, and special symbols like @, &, $, etc.
   - Removing numbers, since they usually don’t add value to SEO analysis.
   - Converting everything to lowercase so words like "SEO" and "seo" are treated the same.
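- **Example**: "SEO in 2023!!!" would become "seo in" at this stage (apart from leftover whitespace); the stopword "in" is removed in the next step, leaving "seo".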
- **Usage**: This standardizes the text to make it easier to analyze consistently.
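
One side effect is visible in the sample output earlier in this notebook: deleting punctuation outright can glue neighboring words together, which is why the dash between "algorithm" and "a" disappears and the cleaned text contains "algorithma". The sketch below contrasts the notebook's pattern with a possible variant that substitutes a space instead. `strip_punctuation_spaced` is a hypothetical helper, not part of the original code.

```python
import re

def strip_punctuation(text):
    """Notebook behaviour: delete punctuation and symbols outright."""
    return re.sub(r"[^\w\s]", "", text)

def strip_punctuation_spaced(text):
    """Hypothetical variant: replace punctuation with a space, then collapse whitespace."""
    spaced = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", spaced).strip()

# "\u2014" is the dash character that appears in the scraped homepage text
sample = "the Google algorithm\u2014a cryptic enigma"
glued = strip_punctuation(sample)
separated = strip_punctuation_spaced(sample)
print(glued)
print(separated)
```

The trade-off is that the spaced variant also splits contractions such as "don't" into two tokens, so which behaviour is preferable depends on the text being analyzed.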

---

```python
    # Remove stopwords from text
    cleaned_text = ' '.join(word for word in text.split() if word not in stopwords)
    return cleaned_text  # Return the cleaned text for analysis
```
- **Purpose**: This part removes the stopwords from the text and joins the remaining words back into a sentence.
- **Example**: Because the text has already been lowercased, "seo is the key to success" becomes "seo key success".
- **Usage**: The cleaned text is ready for further SEO analysis to identify the main keywords.

---

### Fetching, Cleaning, and Displaying the Content

```python
# Loop through each URL, fetch content, clean it, and display the output
for url in urls:
    try:
        # Make a request to the URL and get the content of the webpage
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')  # Parse the webpage content
```
- **Purpose**: For each URL in the `urls` list, the code:
   - Connects to the webpage using `requests.get(url)`.
   - Parses (or reads) the webpage content using `BeautifulSoup`.
- **Usage**: This is where we start gathering the raw text content from each webpage.

---

#### Extracting Title, Meta Description, and Paragraphs from Each Webpage

```python
        # Extract the page title, meta description, and main content (paragraphs)
        title = soup.title.string if soup.title else ''  # Get page title if available
        meta_desc = soup.find('meta', attrs={'name': 'description'})  # Find meta description tag
        meta_desc = meta_desc['content'] if meta_desc else ''  # Get meta description content if available
```
- **Purpose**: These lines check if the page has a title and meta description, and extract their content if available.
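   - **Title**: The `<title>` tag holds the page title shown in the browser tab and, typically, as the headline of the search result (the on-page main heading is usually an `<h1>`).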
      - **Example**: `<title>Best SEO Services</title>` would yield "Best SEO Services".
   - **Meta Description**: The `<meta name="description">` tag often has a short summary of the page.
      - **Example**: `<meta name="description" content="Leading SEO services for top rankings">` would yield "Leading SEO services for top rankings".
- **Usage**: Title and meta description are useful for SEO analysis, as they’re visible in search engine results.
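
The extraction logic can be tried on inline HTML without fetching a page. The sketch below is a slightly more defensive variant of the notebook's code, not a copy of it: the hypothetical helper `extract_title_and_description` uses `get_text()` and `.get("content", "")` so that a missing tag or attribute yields an empty string.

```python
from bs4 import BeautifulSoup

def extract_title_and_description(html):
    """Return (title, meta description); each is '' when the tag or attribute is missing."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    meta_tag = soup.find("meta", attrs={"name": "description"})
    # .get() avoids a KeyError when the tag exists but has no content attribute
    description = meta_tag.get("content", "") if meta_tag else ""
    return title, description

full_page = (
    "<html><head><title>Best SEO Services</title>"
    '<meta name="description" content="Leading SEO services for top rankings">'
    "</head></html>"
)
bare_page = "<html><head></head><body><p>No metadata here.</p></body></html>"

print(extract_title_and_description(full_page))
print(extract_title_and_description(bare_page))
```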

---

```python
        # Extract all paragraph text from the page content for the main analysis
        paragraphs = soup.find_all('p')
        content = ' '.join([para.get_text() for para in paragraphs])  # Join all paragraph text
```
- **Purpose**: This part finds all `<p>` (paragraph) tags, which usually contain the main text, and combines them into a single text block.
- **Example**: If the page has `<p>Learn more about SEO services</p>`, this will pull "Learn more about SEO services" into `content`.
- **Usage**: This gives us the main content of the page for keyword analysis.

---

```python
        # Clean the main content text using the clean_text function
        cleaned_content = clean_text(content)
        
        # Display the extracted title, meta description, original, and cleaned content for each URL
        print(f"\nURL: {url}")
        print(f"Title: {title}")
        print(f"Meta Description: {meta_desc}")
        print(f"\nOriginal Content from {url}:\n", content[:500])  # Display first 500 characters of original text
        print(f"\nCleaned Content from {url}:\n", cleaned_content[:500])  # Display first 500 characters of cleaned text
```
- **Purpose**:
   - Cleans the `content` using `clean_text`, resulting in a simpler text with only meaningful keywords.
   - Prints the URL, title, meta description, original content, and cleaned content.
- **Example Output**:
   - **Original Content**: "SEO is the key to success in digital marketing for 2023!"
   - **Cleaned Content**: "seo key success digital marketing"

---

### Handling Errors

```python
    except Exception as e:
        print(f"Error fetching content from {url}: {e}")
```
- **Purpose**: This section catches any errors that occur while fetching the content (e.g., if the page is down). If there’s an error, it prints a message with the URL and the error.
- **Example**: If the website is temporarily down, it might print "Error fetching content from https://thatware.co/: Connection error".
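
A somewhat stricter variant of this error handling is sketched below. It is an optional refinement rather than the notebook's code: `fetch_html` is a hypothetical helper, and the 10-second timeout is an arbitrary assumption.

```python
import requests

def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails (illustrative helper)."""
    try:
        response = requests.get(url, timeout=timeout)  # avoid hanging on slow servers
        response.raise_for_status()                    # treat 4xx/5xx responses as errors
        return response.text
    except requests.RequestException as exc:
        print(f"Error fetching content from {url}: {exc}")
        return None

# A URL without a scheme is rejected before any network traffic is sent
result = fetch_html("not-a-valid-url")
print(result)
```

Catching `requests.RequestException` instead of every `Exception` keeps network problems separate from programming errors, which would otherwise be silently reported as fetch failures.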

---


In [None]:
from sklearn.feature_extraction.text import CountVectorizer  # Import CountVectorizer for n-gram extraction

# Function to extract unigrams, bigrams, and trigrams from the content with contextual filtering
def extract_ngrams(content, main_keywords=["seo", "services", "optimization", "digital", "marketing"]):
    """
    This function extracts high-frequency unigrams, bigrams, and trigrams from the page content.
    - Filters trigrams based on specific main keywords to avoid irrelevant phrases.
    """

    # CountVectorizer captures n-grams, which are phrases of 1, 2, or 3 words
    vectorizer = CountVectorizer(ngram_range=(1, 3))  # ngram_range=(1,3) captures unigrams, bigrams, and trigrams

    # Generate an n-gram frequency matrix from the content
    ngram_matrix = vectorizer.fit_transform([content])

    # Calculate frequency for each n-gram
    ngram_counts = ngram_matrix.sum(axis=0)  # Sum up occurrences of each n-gram
    ngram_counts = [(word, ngram_counts[0, idx]) for word, idx in vectorizer.vocabulary_.items()]  # Create list of (n-gram, count)

    # Sort the n-grams by their frequency, so the most frequent appear first
    sorted_terms = sorted(ngram_counts, key=lambda x: x[1], reverse=True)

    # Extract top unigrams, bigrams, and filtered trigrams based on main keywords
    n_unigrams = 5  # Limit to the top 5 unigrams
    n_bigrams = 7   # Limit to the top 7 bigrams
    n_trigrams = 7  # Limit to the top 7 trigrams

    # Extract unigrams, bigrams, and filter trigrams based on main keywords
    unigrams = [kw for kw, _ in sorted_terms if len(kw.split()) == 1][:n_unigrams]
    bigrams = [kw for kw, _ in sorted_terms if len(kw.split()) == 2][:n_bigrams]
    trigrams = [
        kw for kw, _ in sorted_terms
        if len(kw.split()) == 3 and any(word in kw.split() for word in main_keywords)
    ][:n_trigrams]  # Filter trigrams based on main keywords

    # Display the top unigrams, bigrams, and trigrams
    print("\nTop Unigrams:", unigrams)
    print("Top Bigrams:", bigrams)
    print("Top Trigrams:", trigrams)

    # Return the extracted n-grams in a dictionary format for further use
    return {'unigrams': unigrams, 'bigrams': bigrams, 'trigrams': trigrams}

# Example content to test extract_ngrams function
sample_content = """
SEO is critical in today’s digital marketing landscape. By focusing on SEO services and optimization techniques,
businesses can improve their online visibility. Digital marketing strategies like SEO optimization, content marketing,
and keyword optimization help drive results.
"""

# Call the extract_ngrams function on sample content to see output
ngram_output = extract_ngrams(sample_content)



Top Unigrams: ['seo', 'marketing', 'optimization', 'digital', 'and']
Top Bigrams: ['digital marketing', 'seo is', 'is critical', 'critical in', 'in today', 'today digital', 'marketing landscape']
Top Trigrams: ['seo is critical', 'in today digital', 'today digital marketing', 'digital marketing landscape', 'marketing landscape by', 'focusing on seo', 'on seo services']


---

### **Code Breakdown**

```python
from sklearn.feature_extraction.text import CountVectorizer  # Import CountVectorizer for n-gram extraction
```
- **Purpose**: We import `CountVectorizer`, a tool for counting and analyzing words in text.
- **Example**: `CountVectorizer` can turn the phrase "SEO is important for digital marketing" into a list of lowercased words or phrases (like “seo,” “digital marketing”) and count how often each appears.

---

### Define the extract_ngrams Function

```python
# Function to extract unigrams, bigrams, and trigrams from the content with contextual filtering
def extract_ngrams(content, main_keywords=["seo", "services", "optimization", "digital", "marketing"]):
    """
    This function extracts high-frequency unigrams, bigrams, and trigrams from the page content.
    - Filters trigrams based on specific main keywords to avoid irrelevant phrases.
    """
```
- **Purpose**: `extract_ngrams` is a function, or a reusable section of code, created to find and count different types of word combinations:
   - **Unigrams**: Single words, like "SEO".
   - **Bigrams**: Two-word phrases, like "digital marketing".
   - **Trigrams**: Three-word phrases, like "SEO digital marketing".
- **Explanation of `main_keywords`**: We pass a list of important keywords (like "seo" and "marketing") and keep only the three-word phrases that contain at least one of them, so unrelated phrases are dropped.

---

### Setting Up CountVectorizer for N-Grams

```python
    # CountVectorizer captures n-grams, which are phrases of 1, 2, or 3 words
    vectorizer = CountVectorizer(ngram_range=(1, 3))  # ngram_range=(1,3) captures unigrams, bigrams, and trigrams
```
- **Purpose**: This line sets up `CountVectorizer` to capture unigrams, bigrams, and trigrams.
- **Explanation of `ngram_range=(1, 3)`**: This setting makes the function look for unigrams (one word), bigrams (two words), and trigrams (three words).
   - **Example**: In the sentence “SEO helps with digital marketing,” this setting will pick up individual words like "SEO," two-word pairs like "digital marketing," and three-word combinations like "SEO helps with".
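
To make the `ngram_range=(1, 3)` idea concrete, here is a minimal standard-library sketch that lists every one-, two-, and three-word phrase in a sentence. It is a stand-in for illustration, not `CountVectorizer` itself: the helper name `word_ngrams` is hypothetical, and the token pattern is assumed to mirror `CountVectorizer`'s default (lowercased tokens of two or more word characters).

```python
import re

def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max, in order of appearance."""
    # Lowercase the text and keep tokens of two or more word characters
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the token list
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

grams = word_ngrams("SEO helps with digital marketing")
print(grams)
```

A five-word sentence yields five unigrams, four bigrams, and three trigrams, which is why the raw n-gram list grows quickly on real pages.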

---

### Generating the N-Gram Frequency Matrix

```python
    # Generate an n-gram frequency matrix from the content
    ngram_matrix = vectorizer.fit_transform([content])
```
- **Purpose**: `ngram_matrix` is a sparse matrix with one row (the single document passed in as `[content]`) and one column per n-gram; each cell holds how often that n-gram appears in `content`.
- **Example**: If `content` is "SEO helps with digital marketing. SEO is useful," `ngram_matrix` might show "SEO" appears twice, "digital marketing" appears once, etc.
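
The counting step can be approximated with the standard library. This is a rough sketch using `collections.Counter` as a stand-in for the matrix, assuming the same tokenization as `CountVectorizer`'s default; it is not how the library stores the data internally.

```python
import re
from collections import Counter

content = "SEO helps with digital marketing. SEO is useful"

# Lowercase and tokenize (tokens of two or more word characters)
tokens = re.findall(r"\b\w\w+\b", content.lower())

# Count single words and adjacent word pairs
unigram_counts = Counter(tokens)
bigram_counts = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))

print(unigram_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Note that punctuation is discarded before pairing, so a bigram such as "marketing seo" spans the sentence boundary. The recorded notebook output shows the same effect in phrases like "marketing landscape by".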

---

### Calculate Frequency for Each N-Gram

```python
    # Calculate frequency for each n-gram
    ngram_counts = ngram_matrix.sum(axis=0)  # Sum up occurrences of each n-gram
    ngram_counts = [(word, ngram_counts[0, idx]) for word, idx in vectorizer.vocabulary_.items()]  # Create list of (n-gram, count)
```
- **Purpose**:
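   - `ngram_matrix.sum(axis=0)` adds up the occurrences of each n-gram column; the list comprehension then pairs every n-gram with its total as an `(n-gram, count)` tuple.
   - `vectorizer.vocabulary_.items()` yields each n-gram together with its column index in `ngram_matrix` (not its position in the text), which is how the code looks up the matching count.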
- **Example**: If “SEO” appears three times, it will show as `("SEO", 3)` in `ngram_counts`.
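
The lookup pattern above (term to column index, then index to count) can be sketched without scikit-learn. The names `vocabulary` and `count_row` are hypothetical stand-ins for `vectorizer.vocabulary_` and the summed matrix row, and the alphabetical index assignment is an assumption made for illustration.

```python
from collections import Counter

tokens = ["seo", "helps", "seo", "marketing"]
term_counts = Counter(tokens)

# Stand-in for vectorizer.vocabulary_: maps each term to a column index
vocabulary = {term: idx for idx, term in enumerate(sorted(term_counts))}

# Stand-in for the summed matrix row: counts laid out by column index
count_row = [0] * len(vocabulary)
for term, idx in vocabulary.items():
    count_row[idx] = term_counts[term]

# Same pairing as in extract_ngrams: (term, count found at that term's column)
ngram_counts = [(term, count_row[idx]) for term, idx in vocabulary.items()]
print(vocabulary)
print(ngram_counts)
```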

---

### Sort the N-Grams by Frequency

```python
    # Sort the n-grams by their frequency, so the most frequent appear first
    sorted_terms = sorted(ngram_counts, key=lambda x: x[1], reverse=True)
```
- **Purpose**: `sorted_terms` organizes the n-grams from most to least frequent, so we see the most common words and phrases first.
- **Example**: If `ngram_counts` contains `[("SEO", 3), ("digital marketing", 2), ("services", 1)]`, then `sorted_terms` will also show "SEO" first, since it appears the most.
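
A small sketch of the sorting step with made-up counts. Python's `sorted` is stable, so n-grams with the same count keep their original relative order even with `reverse=True`; this is why tied terms in the results are not ordered alphabetically.

```python
# Hypothetical (n-gram, count) pairs in arbitrary order
ngram_counts = [
    ("services", 1),
    ("seo", 3),
    ("digital marketing", 2),
    ("marketing", 1),
]

# Sort by the count (second element of each pair), highest first
sorted_terms = sorted(ngram_counts, key=lambda pair: pair[1], reverse=True)
print(sorted_terms)
```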

---

### Extract Top Unigrams, Bigrams, and Filtered Trigrams

```python
    # Extract top unigrams, bigrams, and filtered trigrams based on main keywords
    n_unigrams = 5  # Limit to the top 5 unigrams
    n_bigrams = 7   # Limit to the top 7 bigrams
    n_trigrams = 7  # Limit to the top 7 trigrams
```
- **Purpose**: Here, we set limits to display only the top 5 unigrams, top 7 bigrams, and top 7 trigrams to keep the results concise and focused on the most important phrases.

---

```python
    # Extract unigrams, bigrams, and filter trigrams based on main keywords
    unigrams = [kw for kw, _ in sorted_terms if len(kw.split()) == 1][:n_unigrams]
    bigrams = [kw for kw, _ in sorted_terms if len(kw.split()) == 2][:n_bigrams]
    trigrams = [
        kw for kw, _ in sorted_terms
        if len(kw.split()) == 3 and any(word in kw.split() for word in main_keywords)
    ][:n_trigrams]  # Filter trigrams based on main keywords
```
- **Explanation of Unigrams and Bigrams**:
   - `unigrams`: Finds all the one-word phrases (single words) in `sorted_terms` and stores the top 5.
   - `bigrams`: Finds all the two-word phrases in `sorted_terms` and stores the top 7.
   - **Example**: If the text has "SEO," "services," and "marketing" as top words, `unigrams` will capture them.
- **Explanation of Trigrams**:
   - `trigrams` keeps only the three-word phrases that contain at least one of the `main_keywords` (like "seo" or "services") and stores the top 7.
   - **Example**: If a phrase like “SEO services optimization” appears in the text, it will be kept because it contains “SEO” and “services”.
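
The keyword filter can be tried in isolation. The sketch below reuses the same list comprehension on a few made-up `(n-gram, count)` pairs; the sample pairs are assumptions for illustration.

```python
main_keywords = ["seo", "services", "optimization", "digital", "marketing"]

# Hypothetical, already-sorted (n-gram, count) pairs
sorted_terms = [
    ("seo", 3),
    ("seo services optimization", 2),
    ("is critical in", 1),
    ("content marketing and", 1),
]

# Keep three-word phrases that contain at least one main keyword as a whole word
trigrams = [
    kw for kw, _ in sorted_terms
    if len(kw.split()) == 3 and any(word in kw.split() for word in main_keywords)
][:7]
print(trigrams)
```

Because the check is `word in kw.split()`, it matches whole words only. A phrase still passes if just one of its three words is a main keyword, so filler such as "content marketing and" is kept.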

---

### Display and Return Results

```python
    # Display the top unigrams, bigrams, and trigrams
    print("\nTop Unigrams:", unigrams)
    print("Top Bigrams:", bigrams)
    print("Top Trigrams:", trigrams)
    
    # Return the extracted n-grams in a dictionary format for further use
    return {'unigrams': unigrams, 'bigrams': bigrams, 'trigrams': trigrams}    
```
- **Purpose**:
   - `print` statements display the top unigrams, bigrams, and trigrams directly in the output.
   - The dictionary `{'unigrams': unigrams, 'bigrams': bigrams, 'trigrams': trigrams}` makes these results available for further analysis or use.
- **Example Output** (n-grams are lowercase because `CountVectorizer` lowercases text by default):
   - **Top Unigrams**: `["seo", "digital", "marketing"]`
   - **Top Bigrams**: `["seo services", "digital marketing"]`
   - **Top Trigrams**: `["seo services optimization"]`

---

### Testing the Function with Example Content

```python
# Example content to test extract_ngrams function
sample_content = """
SEO is critical in today’s digital marketing landscape. By focusing on SEO services and optimization techniques,
businesses can improve their online visibility. Digital marketing strategies like SEO optimization, content marketing,
and keyword optimization help drive results.
"""

# Call the extract_ngrams function on sample content to see output
ngram_output = extract_ngrams(sample_content)
```
- **Explanation of `sample_content`**: This is a sample text containing several relevant SEO terms. It simulates real content to see how the function identifies common words and phrases.
- **Explanation of `extract_ngrams(sample_content)`**: This line runs the function on `sample_content` and should print the top unigrams, bigrams, and trigrams based on frequency.

### Example Output from Running the Code

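When you run this code on `sample_content`, the output is:

```plaintext
Top Unigrams: ['seo', 'marketing', 'optimization', 'digital', 'and']
Top Bigrams: ['digital marketing', 'seo is', 'is critical', 'critical in', 'in today', 'today digital', 'marketing landscape']
Top Trigrams: ['seo is critical', 'in today digital', 'today digital marketing', 'digital marketing landscape', 'marketing landscape by', 'focusing on seo', 'on seo services']
```

Because `extract_ngrams` does not remove stopwords itself, filler words and phrases such as "and" or "seo is" appear alongside the SEO terms.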


In [None]:
# Function to generate customized SEO suggestions based on analysis results
def generate_suggestions(insight):
    """
    This function generates specific SEO suggestions based on the analysis:
    - It evaluates the length of the title and meta description.
    - Provides high-density keyword suggestions.
    """
    suggestions = []  # Initialize an empty list to store the suggestions

    # Analyze title length and provide a recommendation
    if 10 <= insight['title_length'] <= 60:
        suggestions.append("Title length is optimal.")  # Suggestion if title length is good
    else:
        suggestions.append("Adjust title length to be within 10-60 words for better SEO.")  # Suggestion if title length needs adjustment

    # Analyze meta description length and provide a recommendation
    if 150 <= len(insight['meta_desc']) <= 160:
        suggestions.append("Meta description length is optimal.")  # Suggestion if meta description length is good
    else:
        suggestions.append("Adjust meta description to be within 150-160 characters.")  # Suggestion if meta description length needs adjustment

    # Analyze keyword density and recommend high-density keywords for focus
    if len(insight['unigrams']) > 0:
        suggestions.append(f"Focus on high-density keywords: {insight['unigrams']}")  # Suggestion to focus on frequently used keywords

    return suggestions  # Return the list of suggestions


# Example data to test the generate_suggestions function
sample_insight = {
    'title_length': 12,  # Example title length in words
    'meta_desc': "Discover advanced SEO strategies that can boost your online presence effectively.",  # Example meta description
    'unigrams': ['seo', 'digital', 'optimization', 'marketing', 'services']  # Example high-density keywords
}

# Call the generate_suggestions function on the example data
suggestions_output = generate_suggestions(sample_insight)

# Display the output suggestions
print("SEO Suggestions Based on Analysis:")
for suggestion in suggestions_output:
    print(f"- {suggestion}")


SEO Suggestions Based on Analysis:
- Title length is optimal.
- Adjust meta description to be within 150-160 characters.
- Focus on high-density keywords: ['seo', 'digital', 'optimization', 'marketing', 'services']


---

### **Step-by-Step Code Explanation**

```python
# Function to generate customized SEO suggestions based on analysis results
def generate_suggestions(insight):
    """
    This function generates specific SEO suggestions based on the analysis:
    - It evaluates the length of the title and meta description.
    - Provides high-density keyword suggestions.
    """
```

- **What It Does**: This code defines a function called `generate_suggestions` which is designed to take in `insight`, a set of SEO data, and provide helpful suggestions based on that data.
- **Purpose**: This function checks three main things:
  1. The **length of the title** (to see if it’s within an ideal word count range).
  2. The **length of the meta description** (to ensure it’s the optimal length for search engines).
  3. The **main keywords** (to suggest which words to focus on based on their frequency in the text).

---

```python
    suggestions = []  # Initialize an empty list to store the suggestions
```
- **What It Does**: Here, `suggestions` is a blank list where we’ll store our SEO recommendations.
- **Purpose**: Each time we make a suggestion (like “Title length is optimal.”), we’ll add it to this list. At the end, we’ll return the full list of suggestions.

---

### Analyzing the Title Length

```python
    # Analyze title length and provide a recommendation
    if 10 <= insight['title_length'] <= 60:
        suggestions.append("Title length is optimal.")  # Suggestion if title length is good
    else:
        suggestions.append("Adjust title length to be within 10-60 words for better SEO.")  # Suggestion if title length needs adjustment
```
- **What It Does**: This part checks the length of the title and provides a suggestion based on the length.
- **Explanation**:
  - `if 10 <= insight['title_length'] <= 60`: This line checks if the title is between 10 and 60 words.
     - If **yes**, it adds “Title length is optimal” to `suggestions`, meaning no change is needed.
     - If **no**, it adds “Adjust title length to be within 10-60 words for better SEO.”
- **Example**: If the title length is 12 words, this part of the code will add "Title length is optimal" to the suggestions.

---

### Analyzing the Meta Description Length

```python
    # Analyze meta description length and provide a recommendation
    if 150 <= len(insight['meta_desc']) <= 160:
        suggestions.append("Meta description length is optimal.")  # Suggestion if meta description length is good
    else:
        suggestions.append("Adjust meta description to be within 150-160 characters.")  # Suggestion if meta description length needs adjustment
```
- **What It Does**: This section checks the length of the meta description and provides feedback.
- **Explanation**:
   - `if 150 <= len(insight['meta_desc']) <= 160`: This line checks if the meta description is between 150 and 160 characters (the ideal length).
      - If **yes**, it adds “Meta description length is optimal.”
      - If **no**, it adds “Adjust meta description to be within 150-160 characters.”
- **Example**: If the meta description is “Discover advanced SEO strategies that can boost your online presence effectively.” (81 characters, including the final period), this part of the code will add “Adjust meta description to be within 150-160 characters” to the suggestions.
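
A quick way to check a description against this rule is to measure it directly. The sketch below uses the sample description from this notebook; the variable names are illustrative. It also shows that character count and word count are different measurements, which matters because this check uses characters.

```python
meta_desc = "Discover advanced SEO strategies that can boost your online presence effectively."

char_count = len(meta_desc)          # Characters, which is what the check uses
word_count = len(meta_desc.split())  # Words, shown only for comparison
is_optimal = 150 <= char_count <= 160

print(f"{char_count} characters, {word_count} words, optimal: {is_optimal}")
```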

---

### Analyzing High-Density Keywords

```python
    # Analyze keyword density and recommend high-density keywords for focus
    if len(insight['unigrams']) > 0:
        suggestions.append(f"Focus on high-density keywords: {insight['unigrams']}")  # Suggestion to focus on frequently used keywords
```
- **What It Does**: This part checks for the presence of any keywords in `insight['unigrams']`.
- **Explanation**:
   - `if len(insight['unigrams']) > 0`: This line checks if there are any frequently used single words (or “unigrams”).
      - If there are, it suggests focusing on those keywords by adding a suggestion to `suggestions`.
      - **Example**: If `unigrams` contains `['seo', 'digital', 'optimization']`, it will add “Focus on high-density keywords: ['seo', 'digital', 'optimization']” to `suggestions`.

---

### Returning All Suggestions

```python
    return suggestions  # Return the list of suggestions
```
- **What It Does**: This line gives back the complete list of suggestions that were added to `suggestions` throughout the function.
- **Example**: If the function has created three suggestions like “Title length is optimal,” “Adjust meta description…,” and “Focus on high-density keywords…,” they will all be returned in one list.

---

### Example Data to Test the Function

```python
# Example data to test the generate_suggestions function
sample_insight = {
    'title_length': 12,  # Example title length in words
    'meta_desc': "Discover advanced SEO strategies that can boost your online presence effectively.",  # Example meta description
    'unigrams': ['seo', 'digital', 'optimization', 'marketing', 'services']  # Example high-density keywords
}
```
- **What This Is**: `sample_insight` is a pretend set of data (like a practice input) to see what suggestions `generate_suggestions` will give.
- **Explanation**:
   - `title_length`: This says the title has 12 words.
   - `meta_desc`: This is a short description about SEO strategies.
   - `unigrams`: This is a list of keywords to focus on, like “SEO” and “digital.”

---

### Calling the Function and Displaying the Suggestions

```python
# Call the generate_suggestions function on the example data
suggestions_output = generate_suggestions(sample_insight)

# Display the output suggestions
print("SEO Suggestions Based on Analysis:")
for suggestion in suggestions_output:
    print(f"- {suggestion}")
```
- **What It Does**:
   - Calls `generate_suggestions` using the `sample_insight` data.
   - **Display**: Prints “SEO Suggestions Based on Analysis:” and lists each suggestion on a new line with a bullet point (`-`).
- **Example Output**:
   - This example data might print:
     ```plaintext
     SEO Suggestions Based on Analysis:
     - Title length is optimal.
     - Adjust meta description to be within 150-160 characters.
     - Focus on high-density keywords: ['seo', 'digital', 'optimization', 'marketing', 'services']
     ```


In [None]:
# Function to perform SEO analysis and generate insights
def seo_analysis(data):
    """
    This function performs a complete SEO analysis on each URL's content:
    - It extracts keywords and evaluates title/meta description lengths.
    - Generates insights and customized suggestions for each URL.
    """
    seo_insights = []  # Stores the analysis results for each URL

    # Loop through the data, analyzing each URL's content
    for item in data:
        if item:  # Ensure the data item is not empty
            # Extract top keywords in unigrams, bigrams, and trigrams
            ngrams = extract_ngrams(item['content'])

            # Get title and meta description lengths for analysis
            title_length = len(item['title'].split())  # Count words in the title
            meta_desc_length = len(item['meta_desc'].split())  # Count words in the meta description

            # Store each analysis result as a dictionary with detailed insights
            seo_insight = {
                'url': item['url'],
                'title_length': title_length,
                'meta_desc_length': meta_desc_length,
                'unigrams': ngrams['unigrams'],  # Top unigrams
                'bigrams': ngrams['bigrams'],    # Top bigrams
                'trigrams': ngrams['trigrams'],  # Top trigrams
                'meta_desc': item['meta_desc']   # Meta description text
            }

            # Generate customized SEO suggestions based on analysis
            seo_insight['suggestions'] = generate_suggestions(seo_insight)  # Call function to generate suggestions
            seo_insights.append(seo_insight)  # Append the result to the insights list

    return seo_insights  # Return the list of insights for all URLs


# Example data to simulate URL content for testing
data = [
    {
        'url': 'https://thatware.co/',
        'title': 'Advanced SEO Services for Your Business',
        'meta_desc': 'Discover how our advanced SEO services can enhance your business visibility.',
        'content': "Our SEO services are designed to improve your online presence. We use advanced SEO techniques to ensure better ranking and higher visibility on search engines."
    },
    {
        'url': 'https://thatware.co/advanced-seo-services/',
        'title': 'Explore Our Advanced SEO Services',
        'meta_desc': 'Get top-notch SEO services that bring results and visibility.',
        'content': "Advanced SEO services can provide more visibility for your website. Our services include keyword optimization, content marketing, and technical SEO improvements."
    }
]

# Run SEO analysis on the data
seo_insights = seo_analysis(data)

# Display SEO insights for each URL
for insight in seo_insights:
    print(f"URL: {insight['url']}")
    print(f"Title Length: {insight['title_length']} words")
    print(f"Meta Description Length: {insight['meta_desc_length']} words")
    print("Top Unigrams:", insight['unigrams'])
    print("Top Bigrams:", insight['bigrams'])
    print("Top Trigrams:", insight['trigrams'])
    print("SEO Suggestions:")
    for suggestion in insight['suggestions']:
        print(f"- {suggestion}")
    print("-" * 80)  # Separator for each URL's output



Top Unigrams: ['seo', 'to', 'our', 'services', 'are']
Top Bigrams: ['our seo', 'seo services', 'services are', 'are designed', 'designed to', 'to improve', 'improve your']
Top Trigrams: ['our seo services', 'seo services are', 'services are designed', 'use advanced seo', 'advanced seo techniques', 'seo techniques to']

Top Unigrams: ['seo', 'services', 'advanced', 'can', 'provide']
Top Bigrams: ['advanced seo', 'seo services', 'services can', 'can provide', 'provide more', 'more visibility', 'visibility for']
Top Trigrams: ['advanced seo services', 'seo services can', 'services can provide', 'website our services', 'our services include', 'services include keyword', 'include keyword optimization']
URL: https://thatware.co/
Title Length: 6 words
Meta Description Length: 11 words
Top Unigrams: ['seo', 'to', 'our', 'services', 'are']
Top Bigrams: ['our seo', 'seo services', 'services are', 'are designed', 'designed to', 'to improve', 'improve your']
Top Trigrams: ['our seo services', 'se

---

### **Full Code Breakdown**

```python
# Function to perform SEO analysis and generate insights
def seo_analysis(data):
    """
    This function performs a complete SEO analysis on each URL's content:
    - It extracts keywords and evaluates title/meta description lengths.
    - Generates insights and customized suggestions for each URL.
    """
    seo_insights = []  # Stores the analysis results for each URL
```

- **What It Does**: This is the `seo_analysis` function, which will analyze SEO elements for each URL in `data`.
- **Purpose**: For each URL, it checks the title length, meta description length, and finds common keywords. It then generates recommendations on improving SEO.
- **Example**: Suppose `data` contains information about multiple URLs. This function will go through each one, analyzing and generating insights.

---

### Looping Through Each URL’s Data

```python
    # Loop through the data, analyzing each URL's content
    for item in data:
        if item:  # Ensure the data item is not empty
```

- **What It Does**: This part goes through each item (URL) in the list `data`.
- **Purpose**: `for item in data` visits each URL's record one by one. `if item` skips empty entries, such as the `None` that `scrape_content` returns when a page cannot be scraped, so the loop does not raise an error on them.
- **Example**: If `data` has two URLs, this loop will analyze them one at a time.
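
The effect of the `if item` guard can be seen on a small list. This is a sketch with placeholder records (the `example.com` URLs are hypothetical): both `None` and an empty dictionary are falsy in Python, so both are skipped.

```python
# Placeholder records: two valid entries, one failed scrape (None), one empty dict
data = [
    {"url": "https://example.com/a"},
    None,
    {},
    {"url": "https://example.com/b"},
]

# Same truthiness test as `if item:` in seo_analysis
kept = [item for item in data if item]
print([item["url"] for item in kept])
```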

---

### Extracting Keywords: Unigrams, Bigrams, and Trigrams

```python
            # Extract top keywords in unigrams, bigrams, and trigrams
            ngrams = extract_ngrams(item['content'])
```

- **What It Does**: This line runs the function `extract_ngrams` on the `content` (main text) of each URL to find common keywords and phrases.
- **Explanation of N-Grams**:
   - **Unigrams**: Single words like “SEO” or “business.”
   - **Bigrams**: Two-word phrases like “SEO services.”
   - **Trigrams**: Three-word phrases like “SEO for businesses.”
- **Example**: For the content “Advanced SEO services for your business,” the unigrams might be “SEO” and “services,” the bigram could be “SEO services,” and a trigram might be “Advanced SEO services.”

---

### Counting Title and Meta Description Length

```python
            # Get title and meta description lengths for analysis
            title_length = len(item['title'].split())  # Count words in the title
            meta_desc_length = len(item['meta_desc'].split())  # Count words in the meta description
```

- **What It Does**:
   - `title_length`: Counts the number of words in the title.
   - `meta_desc_length`: Counts the number of words in the meta description.
- **Purpose**: Knowing how many words are in the title and meta description helps determine if they’re the right length for SEO (too short or too long).
- **Example**: If the title is “Advanced SEO Services for Your Business,” `title_length` would be 6. If the meta description is “Discover our advanced SEO services,” `meta_desc_length` would be 5.
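
The word count relies on `str.split()` with no arguments, which splits on any run of whitespace and ignores leading and trailing spaces. A minimal sketch, using a deliberately untidy copy of the sample title as an assumption:

```python
# Sample title with extra spaces, as scraped text often has
title = "  Advanced SEO   Services for Your Business "

words = title.split()      # Splits on any whitespace run, drops empty pieces
title_length = len(words)  # Word count, as computed in seo_analysis

print(words)
print(title_length)
```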

---

### Storing Each Analysis Result

```python
            # Store each analysis result as a dictionary with detailed insights
            seo_insight = {
                'url': item['url'],
                'title_length': title_length,
                'meta_desc_length': meta_desc_length,
                'unigrams': ngrams['unigrams'],  # Top unigrams
                'bigrams': ngrams['bigrams'],    # Top bigrams
                'trigrams': ngrams['trigrams'],  # Top trigrams
                'meta_desc': item['meta_desc']   # Meta description text
            }
```

- **What It Does**: This block creates a dictionary (a type of data structure) called `seo_insight` to store all SEO-related information for a particular URL.
- **Explanation of Each Key**:
   - `url`: The URL being analyzed.
   - `title_length`: Number of words in the title.
   - `meta_desc_length`: Number of words in the meta description.
   - `unigrams`, `bigrams`, `trigrams`: Lists of common keywords or phrases (generated by `extract_ngrams`).
   - `meta_desc`: The actual meta description text.
- **Example**: For the first sample URL, “https://thatware.co/”, which has a 6-word title and an 11-word meta description, the dictionary looks like:
  ```python
  {
      'url': 'https://thatware.co/',
      'title_length': 6,
      'meta_desc_length': 11,
      'unigrams': ['seo', 'to', 'our', 'services', 'are'],
      'bigrams': ['our seo', 'seo services', 'services are', 'are designed', 'designed to', 'to improve', 'improve your'],
      'trigrams': ['our seo services', 'seo services are', 'services are designed', 'use advanced seo', 'advanced seo techniques', 'seo techniques to'],
      'meta_desc': 'Discover how our advanced SEO services can enhance your business visibility.'
  }
  ```

---

### Generating SEO Suggestions

```python
            # Generate customized SEO suggestions based on analysis
            seo_insight['suggestions'] = generate_suggestions(seo_insight)  # Call function to generate suggestions
            seo_insights.append(seo_insight)  # Append the result to the insights list
```

- **What It Does**:
   - `generate_suggestions(seo_insight)`: Calls another function we defined earlier to generate specific SEO recommendations based on the `seo_insight` data.
   - `seo_insights.append(seo_insight)`: Adds the completed `seo_insight` dictionary to `seo_insights` (a list that stores all insights for each URL).
- **Purpose**: This provides specific feedback for each URL, telling the user how to improve titles, descriptions, or keywords.
- **Example**: If the title is too short, `generate_suggestions` might add “Adjust title length to be within 10-60 words for better SEO.”

---

### Returning the List of All SEO Insights

```python
    return seo_insights  # Return the list of insights for all URLs
```

- **What It Does**: Returns `seo_insights`, a list containing all SEO analyses and suggestions for each URL.
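- **Example**: For the sample data, this looks something like:
  ```python
  [
      {
          'url': 'https://thatware.co/',
          'title_length': 6,
          'meta_desc_length': 11,
          'unigrams': ['seo', 'to', 'our', 'services', 'are'],
          'bigrams': ['our seo', 'seo services', 'services are', 'are designed', 'designed to', 'to improve', 'improve your'],
          'trigrams': ['our seo services', 'seo services are', 'services are designed', 'use advanced seo', 'advanced seo techniques', 'seo techniques to'],
          'meta_desc': 'Discover how our advanced SEO services can enhance your business visibility.',
          'suggestions': [
              'Adjust title length to be within 10-60 words for better SEO.',
              'Adjust meta description to be within 150-160 characters.',
              "Focus on high-density keywords: ['seo', 'to', 'our', 'services', 'are']"
          ]
      },
      ...
  ]
  ```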

---

### Example Data and Running the Function

```python
# Example data to simulate URL content for testing
data = [
    {
        'url': 'https://thatware.co/',
        'title': 'Advanced SEO Services for Your Business',
        'meta_desc': 'Discover how our advanced SEO services can enhance your business visibility.',
        'content': "Our SEO services are designed to improve your online presence. We use advanced SEO techniques to ensure better ranking and higher visibility on search engines."
    },
    {
        'url': 'https://thatware.co/advanced-seo-services/',
        'title': 'Explore Our Advanced SEO Services',
        'meta_desc': 'Get top-notch SEO services that bring results and visibility.',
        'content': "Advanced SEO services can provide more visibility for your website. Our services include keyword optimization, content marketing, and technical SEO improvements."
    }
]
```

- **Explanation**: `data` simulates two URLs with details about their title, meta description, and main content, allowing us to test the function.

### Running and Displaying the Results

```python
# Run SEO analysis on the data
seo_insights = seo_analysis(data)

# Display SEO insights for each URL
for insight in seo_insights:
    print(f"URL: {insight['url']}")
    print(f"Title Length: {insight['title_length']} words")
    print(f"Meta Description Length: {insight['meta_desc_length']} words")
    print("Top Unigrams:", insight['unigrams'])
    print("Top Bigrams:", insight['bigrams'])
    print("Top Trigrams:", insight['trigrams'])
    print("SEO Suggestions:")
    for suggestion in insight['suggestions']:
        print(f"- {suggestion}")
    print("-" * 80)  # Separator for each URL's output
```

- **Purpose**: Loops through each result in `seo_insights` and prints the URL, title and description lengths, top keywords, and SEO recommendations.

### Expected Output

```plaintext
URL: https://thatware.co/
Title Length: 6 words
Meta Description Length: 11 words
Top Unigrams: ['seo', 'to', 'our', 'services', 'are']
Top Bigrams: ['our seo', 'seo services', 'services are', 'are designed', 'designed to', 'to improve', 'improve your']
Top Trigrams: ['our seo services', 'seo services are', 'services are designed', 'use advanced seo', 'advanced seo techniques', 'seo techniques to']
SEO Suggestions:
- Adjust title length to be within 10-60 words for better SEO.
- Adjust meta description to be within 150-160 characters.
- Focus on high-density keywords: ['seo', 'to', 'our', 'services', 'are']
--------------------------------------------------------------------------------
```

---


In [None]:
# Import necessary libraries for web scraping, text processing, and keyword extraction
import requests  # Used to make HTTP requests to each URL to access webpage content
from bs4 import BeautifulSoup  # Used to parse HTML and extract content from web pages
import re  # Used for cleaning text with regular expressions
from sklearn.feature_extraction.text import CountVectorizer  # Used to extract unigrams, bigrams, and trigrams
from collections import Counter  # Used to count occurrences of keywords

# List of URLs to scrape content from for SEO analysis
urls = [
    'https://thatware.co/',
    'https://thatware.co/advanced-seo-services/',
    'https://thatware.co/digital-marketing-services/',
    # Add more URLs as needed...
]

# Function to clean the text content by removing unwanted characters and stopwords
def clean_text(text):
    """
    This function cleans up the text content by:
    - Removing common words that don't add meaning (stopwords)
    - Removing punctuation and special symbols
    - Converting all text to lowercase to ensure uniformity
    """

    # Define a list of common stopwords that are not useful for SEO analysis
    stopwords = set([
        "the", "is", "in", "and", "to", "of", "a", "for", "on", "with", "as", "by",
        "this", "an", "or", "at", "from", "but", "be", "not", "it", "if", "are", "that",
        "can", "will", "has", "have", "we", "you", "they", "your", "our", "any", "other", "also"
    ])

    # Remove punctuation, symbols, digits, and convert text to lowercase for uniformity
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation and special symbols
    text = re.sub(r'\d+', '', text)  # Remove digits
    text = text.lower()  # Convert to lowercase to make analysis easier

    # Remove stopwords from text
    cleaned_text = ' '.join(word for word in text.split() if word not in stopwords)
    return cleaned_text  # Return the cleaned text for analysis

# Function to scrape content from a URL
def scrape_content(url):
    """
    This function scrapes and processes content from a given URL.
    - It extracts the page title, meta description, and main text content.
    - It then cleans the content to make it suitable for SEO analysis.
    """
    try:
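        # Send an HTTP request to the URL and parse the page content with BeautifulSoup
        response = requests.get(url, timeout=10)  # Time out instead of hanging on a slow server
        response.raise_for_status()  # Treat HTTP errors (4xx/5xx) as failures instead of parsing an error page
        soup = BeautifulSoup(response.content, 'html.parser')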

        # Extract the page title, meta description, and main content (paragraphs)
        title = soup.title.string.strip() if soup.title and soup.title.string else ''  # .string can be None
        meta_desc = soup.find('meta', attrs={'name': 'description'})
        meta_desc = meta_desc.get('content', '') if meta_desc else ''  # Tolerate a tag with no content attribute

        # Extract all paragraph text from the page content for the main analysis
        paragraphs = soup.find_all('p')
        content = ' '.join([para.get_text() for para in paragraphs])

        # Clean the main content text using the clean_text function
        cleaned_content = clean_text(content)

        return {'url': url, 'title': title, 'meta_desc': meta_desc, 'content': cleaned_content}
    except Exception as e:
        print(f"Error scraping {url}: {e}")
        return None  # In case of an error, return None

# Loop through each URL and scrape its content, storing results in a list
data = [scrape_content(url) for url in urls]

# Function to extract unigrams, bigrams, and trigrams from the content with contextual filtering
def extract_ngrams(content, main_keywords=["seo", "services", "optimization", "digital", "marketing"]):
    """
    This function extracts high-frequency unigrams, bigrams, and trigrams from the page content.
    - Filters trigrams based on specific main keywords to avoid irrelevant phrases.
    """

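    # Nothing to analyze: CountVectorizer raises ValueError when there are no tokens to count
    if not content.strip():
        return {'unigrams': [], 'bigrams': [], 'trigrams': []}

    # CountVectorizer captures n-grams, which are phrases of 1, 2, or 3 words
    vectorizer = CountVectorizer(ngram_range=(1, 3))

    # Generate an n-gram frequency matrix from the content
    ngram_matrix = vectorizer.fit_transform([content])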

    # Calculate frequency for each n-gram
    ngram_counts = ngram_matrix.sum(axis=0)
    ngram_counts = [(word, ngram_counts[0, idx]) for word, idx in vectorizer.vocabulary_.items()]

    # Sort the n-grams by their frequency, so the most frequent appear first
    sorted_terms = sorted(ngram_counts, key=lambda x: x[1], reverse=True)

    # Extract top unigrams, bigrams, and filtered trigrams based on main keywords
    n_unigrams = 5  # Limit to the top 5 unigrams
    n_bigrams = 7   # Limit to the top 7 bigrams
    n_trigrams = 7  # Limit to the top 7 trigrams

    # Extract unigrams, bigrams, and filter trigrams based on main keywords
    unigrams = [kw for kw, _ in sorted_terms if len(kw.split()) == 1][:n_unigrams]
    bigrams = [kw for kw, _ in sorted_terms if len(kw.split()) == 2][:n_bigrams]
    trigrams = [
        kw for kw, _ in sorted_terms
        if len(kw.split()) == 3 and any(word in kw.split() for word in main_keywords)
    ][:n_trigrams]  # Filter trigrams based on main keywords

    return {'unigrams': unigrams, 'bigrams': bigrams, 'trigrams': trigrams}

# Function to generate customized SEO suggestions based on analysis results
def generate_suggestions(insight):
    """
    This function generates specific SEO suggestions based on the analysis:
    - It evaluates the length of the title and meta description.
    - Provides high-density keyword suggestions.
    """
    suggestions = []

    # Analyze title length and provide a recommendation
    if 10 <= insight['title_length'] <= 60:
        suggestions.append("Title length is optimal.")
    else:
        suggestions.append("Adjust title length to be within 10-60 words for better SEO.")

    # Analyze meta description length and provide a recommendation
    if 150 <= len(insight['meta_desc']) <= 160:
        suggestions.append("Meta description length is optimal.")
    else:
        suggestions.append("Adjust meta description to be within 150-160 characters.")

    # Analyze keyword density and recommend high-density keywords for focus
    if len(insight['unigrams']) > 0:
        suggestions.append(f"Focus on high-density keywords: {insight['unigrams']}")

    return suggestions

# Function to perform SEO analysis and generate insights
def seo_analysis(data):
    """
    This function performs a complete SEO analysis on each URL's content:
    - It extracts keywords and evaluates title/meta description lengths.
    - Generates insights and customized suggestions for each URL.
    """
    seo_insights = []  # Stores the analysis results for each URL

    # Loop through the data, analyzing each URL's content
    for item in data:
        if item:
            # Extract top keywords in unigrams, bigrams, and trigrams
            ngrams = extract_ngrams(item['content'])

            # Title length is counted in words; meta description length in characters,
            # matching the 150-160 character check in generate_suggestions
            title_length = len(item['title'].split())
            meta_desc_length = len(item['meta_desc'])

            # Store each analysis result as a dictionary with detailed insights
            seo_insight = {
                'url': item['url'],
                'title_length': title_length,
                'meta_desc_length': meta_desc_length,
                'unigrams': ngrams['unigrams'],
                'bigrams': ngrams['bigrams'],
                'trigrams': ngrams['trigrams'],
                'meta_desc': item['meta_desc']
            }

            # Generate customized SEO suggestions based on analysis
            seo_insight['suggestions'] = generate_suggestions(seo_insight)
            seo_insights.append(seo_insight)  # Append the result to the insights list

    return seo_insights

# Run SEO analysis on the data
seo_insights = seo_analysis(data)

# Function to display the SEO suggestions based on the analysis
def display_seo_suggestions(seo_insights):
    """
    This function displays SEO suggestions based on insights:
    - It prints the analysis results, including title/meta description lengths, keywords, and suggestions.
    """
    for insight in seo_insights:
        print(f"URL: {insight['url']}")
        print(f"Title Length: {insight['title_length']} words")
        print(f"Meta Description Length: {len(insight['meta_desc'])} characters")
        print("Top Unigrams:", insight['unigrams'])
        print("Top Bigrams:", insight['bigrams'])
        print("Top Trigrams:", insight['trigrams'])
        print("Dynamic SEO Suggestions:")
        for suggestion in insight['suggestions']:
            print(f" - {suggestion}")
        print("-" * 80)

# Run the display function to show the SEO suggestions
display_seo_suggestions(seo_insights)


URL: https://thatware.co/
Title Length: 10 words
Meta Description Length: 149 characters
Top Unigrams: ['seo', 'services', 'ai', 'advanced', 'algorithms']
Top Bigrams: ['seo services', 'ai seo', 'ai algorithms', 'advanced seo', 'data science', 'seo algorithms', 'seo strategy']
Top Trigrams: ['ai seo algorithms', 'optimization backlink building', 'seo agency globally', 'search engine optimization', 'revenuegenerated via seo', 'via seo qualified', 'seo qualified leadsgenerated']
Dynamic SEO Suggestions:
 - Title length is optimal.
 - Adjust meta description to be within 150-160 characters.
 - Focus on high-density keywords: ['seo', 'services', 'ai', 'advanced', 'algorithms']
--------------------------------------------------------------------------------
URL: https://thatware.co/advanced-seo-services/
Title Length: 9 words
Meta Description Length: 135 characters
Top Unigrams: ['seo', 'advanced', 'services', 'digital', 'business']
Top Bigrams: ['advanced seo', 'seo services', 'digital mar

### 1. Understanding the Structure of the Output

The output shows SEO insights for each URL (webpage) of the website. These insights include information about the **title length**, **meta description length**, **top keywords** in the form of **unigrams**, **bigrams**, and **trigrams**, and **SEO suggestions**. Each part of this output provides specific insights about how well a webpage is optimized for search engines and suggests possible improvements.

Let's break down each of these terms and parts of the output:

---

### Explanation of Each Section in the Output

#### URL
Each section of the output starts with a **URL** (web address) of the page analyzed. This URL tells us which webpage the insights are for. For example:
- **URL**: `https://thatware.co/`

This is the specific webpage for which the SEO insights are being shown.

---

#### Title Length

**Title Length** refers to the number of words in the title of the webpage. Titles are important for SEO because they are one of the first things that search engines and users see. Titles help in attracting users to click on a link in search results.

- **Example**: `Title Length: 10 words`
- **What it means**: This webpage has a title that is **10 words** long.
- **Optimal Length**: The script treats titles of **10 to 60 words** as optimal. Note that this threshold is counted in words, while the guideline usually quoted for search results is around 50-60 characters. Titles that are too short may lack enough information to attract users, while titles that are too long may get cut off in search results.
- **What to do**: If the title length is far below or above this range, consider adjusting the title to make it more appealing and informative within this length.
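
Because the script counts title length in words while truncation in search results depends on characters, it can help to look at both numbers side by side. The helper below is a minimal sketch, not part of the original notebook: `title_length_report` is a hypothetical name, the word range mirrors the 10-60 word check used by `generate_suggestions`, and the 60-character limit is an assumed rule of thumb.

```python
def title_length_report(title, min_words=10, max_words=60, max_chars=60):
    """Report a title's length in words and characters with two simple checks.

    min_words/max_words mirror the script's word-based range; max_chars is an
    assumed rule of thumb for how much of a title fits in a search result.
    """
    words = len(title.split())
    chars = len(title)
    return {
        'words': words,
        'chars': chars,
        'within_word_range': min_words <= words <= max_words,
        'within_char_limit': chars <= max_chars,
    }


report = title_length_report("AI SEO Services for Boosting Search Engine Rankings")
print(report)
```

A title can pass one check and fail the other, which is why reporting both units is more informative than a single pass/fail message.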

---

#### Meta Description Length

**Meta Description Length** indicates the number of characters in the meta description. A **meta description** is a short summary of the page's content that appears below the title in search results. It gives users an idea of what the page is about before they click on it.

- **Example**: `Meta Description Length: 149 characters`
- **What it means**: This page’s meta description is **149 characters** long, one character short of the recommended range, so the script suggests adjusting it.
- **Optimal Length**: The recommendation is to keep the meta description within **150-160 characters**. Meta descriptions of this length tend to give enough information without getting cut off in search results.
- **What to do**: If the meta description is too short, consider adding more detail to make it more compelling. If it's too long, make it more concise to avoid it being cut off.
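
To see how far a description is from the recommended range, the check can return the size of the gap instead of a fixed message. This is a small sketch under the same 150-160 character range used by the script; the function name `meta_description_status` is hypothetical.

```python
def meta_description_status(meta_desc, min_chars=150, max_chars=160):
    """Return (status, gap) where gap is the number of characters to add or cut."""
    length = len(meta_desc)
    if length < min_chars:
        return ('too short', min_chars - length)
    if length > max_chars:
        return ('too long', length - max_chars)
    return ('ok', 0)


# Placeholder strings stand in for real meta descriptions of a given length
print(meta_description_status("x" * 149))
print(meta_description_status("x" * 155))
print(meta_description_status("x" * 170))
```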

---

#### Top Unigrams, Bigrams, and Trigrams

**Top Unigrams, Bigrams, and Trigrams** refer to the most important and frequently used keywords or phrases on the webpage. These keywords are categorized into:
- **Unigrams**: Single words.
- **Bigrams**: Two-word phrases.
- **Trigrams**: Three-word phrases.

These keywords help understand which topics or terms the webpage emphasizes. The presence and frequency of keywords can help search engines understand the relevance of a page to certain search terms.

##### Unigrams
- **Example**: `Top Unigrams: ['seo', 'services', 'ai', 'advanced', 'algorithms']`
- **What it means**: These are the **most frequently occurring single words** (unigrams) on the webpage. In this case, words like "SEO," "services," and "AI" are commonly used, which are relevant to the topics the page covers.
- **What to do**: Make sure these unigrams align with the key topics you want to rank for. For instance, if you want to attract users searching for "advanced SEO," having "SEO" and "advanced" as unigrams is beneficial.

##### Bigrams
- **Example**: `Top Bigrams: ['seo services', 'ai seo', 'ai algorithms', 'advanced seo', 'data science', 'seo algorithms', 'seo strategy']`
- **What it means**: Bigrams are the **most common two-word phrases** on the page. These phrases give a bit more context than single words. Here, phrases like "SEO services" and "AI SEO" indicate that the page may be discussing SEO services that involve AI technology.
- **What to do**: Bigrams help create a more specific idea of the page’s focus. If any of these phrases seem unrelated to the topic, you might consider revising the content to focus on relevant phrases.

##### Trigrams
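- **Example**: `Top Trigrams: ['ai seo algorithms', 'optimization backlink building', 'seo agency globally', 'search engine optimization', 'revenuegenerated via seo', 'via seo qualified', 'seo qualified leadsgenerated']`
- **What it means**: Trigrams are **three-word phrases** that appear frequently on the page. They provide the most context and show specific phrases or services the page might be targeting. The script keeps only trigrams that contain at least one of its main keywords (`seo`, `services`, `optimization`, `digital`, `marketing`). Fused words such as "revenuegenerated" are artifacts of text extraction, not real phrases from the page.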
- **What to do**: If the top trigrams align with your SEO goals, it means the content is well-focused. If any trigrams don’t align with the purpose of the page, it might be worth revising the content to better target your desired search terms.
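
The idea behind unigrams, bigrams, and trigrams can be illustrated on a toy sentence. The sketch below uses only the standard library (`collections.Counter`) as a simplified stand-in for the `CountVectorizer` call in the script; `top_ngrams` and the sample text are made up for illustration.

```python
from collections import Counter


def top_ngrams(text, n, k=3):
    """Return the k most frequent n-word phrases in whitespace-tokenized text."""
    words = text.lower().split()
    # Slide a window of n words across the text to build every n-gram
    grams = [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [gram for gram, _ in Counter(grams).most_common(k)]


sample = "seo services ai seo services advanced seo services"
print("Unigrams:", top_ngrams(sample, 1))
print("Bigrams: ", top_ngrams(sample, 2))
print("Trigrams:", top_ngrams(sample, 3))
```

In this sample, "seo services" is the most frequent bigram because it occurs three times, while every trigram occurs only once.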

---

#### SEO Suggestion

The **SEO Suggestions** list, printed under `Dynamic SEO Suggestions:`, provides recommendations based on the above insights.

- **Example**: `Title length is optimal.`, `Adjust meta description to be within 150-160 characters.`, `Focus on high-density keywords: ['seo', 'services', 'ai', 'advanced', 'algorithms']`
- **What it means**: Each line reports one check: whether the title falls within the script's word range, whether the meta description falls within 150-160 characters, and which unigrams appear most often on the page. Using these keywords strategically in the title and meta description helps improve the page’s relevance for search engines.
- **What to do**: Review the title and meta description. Make sure they include some of the top keywords identified in the analysis, as this can help search engines understand what your page is about and may help improve ranking.
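
The review step can be partly automated by checking which top keywords already appear in the title and meta description. The following is a minimal sketch; `keyword_coverage` is a hypothetical helper, and the title and meta description strings are illustrative rather than scraped values.

```python
import re


def keyword_coverage(keywords, title, meta_desc):
    """Map each keyword to whether it appears as a whole word or phrase in the title and meta description."""
    def contains(text, keyword):
        # Word boundaries stop 'ai' from matching inside words like 'maintain'
        pattern = r'\b' + re.escape(keyword) + r'\b'
        return re.search(pattern, text, flags=re.IGNORECASE) is not None

    return {
        kw: {'in_title': contains(title, kw), 'in_meta_desc': contains(meta_desc, kw)}
        for kw in keywords
    }


coverage = keyword_coverage(
    ['seo', 'ai', 'algorithms'],
    title="Advanced SEO Services",
    meta_desc="Offering advanced SEO services using AI and data-driven strategies.",
)
for keyword, found in coverage.items():
    print(keyword, found)
```

Keywords that are missing from both fields are candidates for the next title or description revision.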

---

### Summary: What This Output Conveys and Next Steps

This output provides a detailed SEO analysis for each webpage. It gives information on whether the **title** and **meta description** meet SEO length standards, identifies the most frequently used **keywords and phrases** on each page (unigrams, bigrams, trigrams), and provides **SEO suggestions** based on these findings.

#### What to Do Next:
1. **Adjust Title and Meta Description Lengths**: If any page titles or meta descriptions are too short or too long, adjust them to meet recommended lengths for better SEO performance.
2. **Use Keywords Effectively**: Incorporate the most relevant keywords from the unigrams, bigrams, and trigrams into the title, meta description, and main content. This can improve the page’s chances of ranking well for those keywords.
3. **Follow SEO Suggestions**: Use the SEO suggestion as a checklist to make sure primary keywords are present in titles and descriptions and to confirm the content is focused on the topics you want to rank for.

This output acts as a guide to help optimize each webpage, making them more attractive to search engines and improving their chances of appearing higher in search results. By following the suggestions, you can align your content more closely with SEO best practices and potentially improve the page's visibility and click-through rates.

### 1. Title Length

**How it Helps**: The title of a webpage is the first thing users see in search engine results. It affects both **click-through rates** (CTR) and search engine rankings. If your title is too short, it may not contain enough information to attract users. If it’s too long, search engines may cut it off, meaning users won’t see the full message.

**Steps to Take**:
- **Check Each Title’s Length**: Look at the “Title Length” in the output and ensure it’s between **10-60 words** (or around 50-60 characters).
- **Example**: If you see that a title is only 3 words long, like “AI SEO Services,” you could expand it to something more descriptive, like “AI SEO Services for Boosting Search Engine Rankings.”
- **Impact of Making This Change**: A more descriptive and engaging title could increase CTR because users get a better idea of what the page offers. This can drive more traffic to your site as more people click on your link in search results.

### 2. Meta Description Length

**How it Helps**: The meta description appears under the title in search results. Although it doesn’t directly impact SEO rankings, a well-written meta description can increase the **likelihood of clicks** because it gives users a summary of what they’ll find on the page.

**Steps to Take**:
- **Check Meta Description Length**: Look at “Meta Description Length” and see if it’s close to the **150-160 character** range.
- **Example**: If the description is only 6 words, like “Learn about our AI-based SEO solutions,” you might expand it to: “Discover our AI-powered SEO services designed to enhance your online presence and drive more organic traffic.”
- **Impact of Making This Change**: A compelling meta description encourages more users to click on your page when it appears in search results, leading to better traffic and engagement with your content.

### 3. Top Unigrams, Bigrams, and Trigrams

**How it Helps**: These are the most frequently used words and phrases (keywords) on your page. Keywords help search engines understand what your page is about and can affect your ranking for those terms. This section helps you identify if your page content aligns with the keywords you want to target.

**Steps to Take**:
- **Review the Keywords**: Look at the unigrams, bigrams, and trigrams. Ensure they align with the topics and terms you want your page to rank for.
- **Example**: If your top keywords are “SEO,” “AI,” and “services,” but you want to target “advanced SEO techniques,” consider revising the content to include phrases like “advanced SEO” more frequently.
- **Add or Adjust Content**: Based on the keywords identified, you may need to add more relevant content. For instance, if you see “AI algorithms” as a bigram but want to focus more on “data-driven SEO,” add more content that mentions “data-driven SEO” explicitly.
- **Impact of Making This Change**: Aligning your content with relevant keywords makes it more likely that search engines will rank your page higher for those keywords, which can increase organic traffic from users searching for those terms.
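
Before revising content around a target phrase, it is useful to measure how often that phrase currently appears. The sketch below counts occurrences and computes a simple density figure; `phrase_density` is a hypothetical helper, and density is defined here (as an assumption) as the share of all words that belong to occurrences of the phrase.

```python
import re


def phrase_density(content, phrase):
    """Return (occurrences, density %) of a phrase in the content.

    Density is the percentage of all words that are part of an occurrence
    of the phrase; this is one possible definition, chosen for illustration.
    """
    words = re.findall(r'\w+', content.lower())
    target = re.findall(r'\w+', phrase.lower())
    if not words or not target:
        return 0, 0.0
    n = len(target)
    # Compare each n-word window of the content against the target phrase
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits, round(100 * hits * n / len(words), 1)


content = "Our advanced SEO audits pair advanced SEO tooling with AI algorithms."
print(phrase_density(content, "advanced SEO"))
print(phrase_density(content, "data-driven SEO"))
```

A phrase with zero occurrences, like "data-driven SEO" in this sample, signals that the content does not yet mention the term it is meant to target.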

### 4. SEO Suggestion

**How it Helps**: This section gives you recommendations based on the analysis. It advises ensuring that your **title** and **meta description** contain primary keywords and that they are engaging to attract users. This ensures that the SEO basics are in place for better search engine visibility.

**Steps to Take**:
- **Ensure Primary Keywords Are Present**: Make sure that important keywords from the unigrams, bigrams, and trigrams appear in both the title and meta description, as well as the main content.
- **Example**: If “advanced SEO services” is a target keyword, include it in the title, meta description, and content. For example, your meta description could read, “Offering advanced SEO services using AI and data-driven strategies.”
- **Impact of Making This Change**: When keywords are included in the title and description, search engines understand the page’s content better. This can improve rankings for those keywords, making your site more visible to users searching for related topics.

---

### Putting It All Together: What Actions to Take and Their Benefits

1. **Optimize Titles**: Ensure titles are informative and within the recommended length. A well-optimized title can attract more clicks, which can lead to higher rankings over time due to better user engagement.

2. **Write Engaging Meta Descriptions**: Use meta descriptions to summarize the page content attractively. Even though they don’t directly affect rankings, they can increase the likelihood of users clicking through to your site, which is beneficial for traffic.

3. **Adjust Content for Relevant Keywords**: Make sure that your page content includes relevant keywords, focusing on unigrams, bigrams, and trigrams identified. This improves the page’s alignment with search queries, helping to increase the page’s relevance and ranking potential.

4. **Follow SEO Suggestions**: Take the SEO recommendations provided to ensure your title, meta description, and content are aligned with best practices. This can help improve both click-through rates and search engine visibility, benefiting your site in the long run.

---
