##### Copyright 2024 Google LLC.

All rights reserved by Google. Without them, this wouldn't be possible.

# Before you start:
* Get your Gemini API key from [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey).
* In Colab, click the key icon (Secrets) in the left sidebar.
* Add a new secret named `GOOGLE_API_KEY` and paste your key as its value.

# User Guide: How to Use the SEO Bot

**Prerequisites**

* **A website URL:** Have the URL of the website you'd like to analyze ready.
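* **Python Environment:** Copy this notebook or run it directly in Google Colab ([https://colab.research.google.com/](https://colab.research.google.com/)). The API-key cell reads the key from Colab Secrets (`google.colab.userdata`), so outside Colab you need to supply the key another way.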

**Steps**

1. **Run the Code:**
   * Open this notebook in Google Colab (or another Jupyter environment).
   * Run the cells from top to bottom using each cell's play button (or Shift+Enter).

2. **Initialization:** Wait for the bot to initialize. The first run installs the packages and downloads the NLTK data and the `facebook/bart-large-cnn` summarization model, which can take a while.

3. **Input Website URL:** You'll be prompted to enter the website URL you want to analyze. Type in the URL and press Enter.

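4. **Analysis and Tip Generation:** The bot will:
   * Crawl the page you entered and follow links to other pages on the same site.
   * Extract each page's SEO elements (images and alt text, headers, paragraphs, meta tags), summarize its text, and compute a simple SEO score.
   * Use Gemini to generate optimization tips for each page.
   * Print a progress update as each page is analyzed.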
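The crawl in step 4 is essentially a breadth-first walk over same-site links. The standard-library sketch below illustrates that idea under stated assumptions; it is not the notebook's implementation (which uses `requests` and BeautifulSoup). The `fetch` callable, the in-memory `site` dictionary, and the `max_pages` cap are hypothetical, and only links on exactly the starting host are followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def same_site_links(html, page_url):
    """Absolute http(s) links in `html` that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href).split("#")[0]  # resolve relative links, drop #fragments
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl from start_url; `fetch(url)` must return the page's HTML."""
    queue = deque([start_url])
    seen = set()
    order = []  # pages in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        queue.extend(same_site_links(fetch(url), url))
    return order


# Usage with a hypothetical three-page site held in memory instead of the network
site = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://twitter.com/x">X</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/contact#form">Contact</a>',
    "https://example.com/contact": "<p>Contact us</p>",
}
print(crawl("https://example.com/", lambda url: site.get(url, "")))
```

The queue makes pages closer to the start URL get analyzed first, and the `seen` set keeps a page that is linked from several places from being analyzed (and sent to Gemini) more than once.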

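5. **Save Results:**
   * Once the analysis is complete, the results are written to a file named `seo_analysis.txt`; pages that fail are logged to `seo_analysis_error.txt`.
   * In Colab, open the Files panel in the left sidebar to find it (it is saved in the notebook's working directory, `/content` by default), then open it to read the analysis and tips for each page.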
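If you want to process the report programmatically, a minimal parser might look like the sketch below. It assumes the layout the notebook writes (a `URL:` line, a `SEO Tips:` line whose text may span several lines, then a line of 20 dashes) and that Gemini's tips never contain a line of exactly 20 dashes. `read_seo_report` is a hypothetical helper, not part of the notebook, and any extra lines between `URL:` and `SEO Tips:` are ignored.

```python
SEPARATOR = "-" * 20  # the line the notebook writes between pages


def read_seo_report(text):
    """Parse seo_analysis.txt content into {url: tips}."""
    pages = {}
    for block in text.split(SEPARATOR + "\n"):
        lines = block.strip("\n").splitlines()
        if not lines or not lines[0].startswith("URL: "):
            continue  # empty trailing block or unexpected content
        url = lines[0][len("URL: "):]
        # Find the "SEO Tips:" line; everything from there on belongs to the tips
        for i, line in enumerate(lines):
            if line.startswith("SEO Tips: "):
                tips = [line[len("SEO Tips: "):]] + lines[i + 1:]
                pages[url] = "\n".join(tips)
                break
    return pages


sample = (
    "URL: https://example.com/\n"
    "SEO Tips: Add alt text to images.\nShorten the title.\n"
    + SEPARATOR + "\n"
    "URL: https://example.com/about\n"
    "SEO Tips: Analysis not available\n"
    + SEPARATOR + "\n"
)
print(read_seo_report(sample))

# On a real run: with open("seo_analysis.txt") as f: report = read_seo_report(f.read())
```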

**Example Usage**

Let's say you want to analyze the website [www.example.com](https://www.example.com). Run the notebook and, when prompted, enter the full URL including the scheme: `https://www.example.com`.

**Additional Notes**

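* **Analysis Time:** The time it takes depends on the site's size and complexity: every page the bot finds is downloaded, summarized, and sent to Gemini, so larger sites take longer.
* **Customization:** If you're familiar with Python, you can customize the SEO bot's behavior by modifying the code, for example by adding domains to `SOCIAL_DOMAINS` so that links to them are skipped.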
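One thing to watch when customizing: the notebook's `SOCIAL_DOMAINS` check uses substring matching (`domain in subpage_domain`), so an entry such as `x.com` would also match `dropbox.com`. Matching whole hostnames and their subdomains is safer. A minimal sketch, where `BLOCKED_DOMAINS` and `is_blocked` are hypothetical names rather than part of the notebook:

```python
from urllib.parse import urlparse

# Hypothetical blocklist; edit to taste (same idea as SOCIAL_DOMAINS)
BLOCKED_DOMAINS = {"facebook.com", "instagram.com", "x.com", "twitter.com"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlparse(url).hostname or ""  # .hostname is already lower-cased
    return any(host == d or host.endswith("." + d) for d in blocked)


print(is_blocked("https://www.facebook.com/page"))  # True
print(is_blocked("https://dropbox.com/share"))      # False, although "x.com" is a substring of it
```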


### Install the SDKs


In [None]:
!pip install -q -U google-generativeai
!pip install -q nltk textblob requests beautifulsoup4 transformers textstat



### Import packages

Import the necessary packages.

In [None]:
import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown
from transformers import pipeline

In [None]:
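import os

try:  # Colab: read the key saved under Secrets as GOOGLE_API_KEY
    from google.colab import userdata
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
except ImportError:  # Elsewhere: fall back to an environment variable
    GOOGLE_API_KEY = os.environ['GOOGLE_API_KEY']

genai.configure(api_key=GOOGLE_API_KEY)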


In [None]:
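import nltk
# Only stop words and the Punkt tokenizer are used ('punkt_tab' is required by newer NLTK releases)
for resource in ('stopwords', 'punkt', 'punkt_tab'):
    nltk.download(resource, quiet=True)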



### Running the bot

In [None]:
import requests
from bs4 import BeautifulSoup
import textstat

from nltk.corpus import stopwords
from nltk import word_tokenize
from nltk.probability import FreqDist
from textblob import TextBlob
import os.path
from urllib.parse import urlparse, urljoin




summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# you can add to this list
SOCIAL_DOMAINS = [
    "facebook.com",
    "instagram.com",
    "github.com",
    "youtube.com",
    "telegram.org",
    "twitter.com",
    "linkedin.com",
    "githubstatus.com",
    "vercel.app"
]
IGNORED_EXTENSIONS = [
    ".pdf", ".doc", ".docx", ".xlsx", ".ppt", ".pptx",  # Add more as needed
]

# *** Helper Functions ***
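def crawl_page(url):
    """Download `url` and parse it as HTML (raises on HTTP errors and timeouts)."""
    response = requests.get(url, timeout=15)
    response.raise_for_status()  # Treat 4xx/5xx responses as errors instead of analyzing them
    return BeautifulSoup(response.content, 'html.parser')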

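def extract_subpages(soup, base_domain, base_url):
    """Return links on `soup` that point to other HTML pages of the same site."""
    subpages = []
    skipped_count = 0  # Counter for skipped links

    for link in soup.find_all('a', href=True):
        # Resolve relative URLs against the current page and drop #fragments
        subpage_url = urljoin(base_url, link['href']).split('#')[0]
        parsed = urlparse(subpage_url)
        subpage_domain = parsed.netloc

        # Skip mailto:, tel:, javascript: and other non-web links
        if parsed.scheme not in ('http', 'https'):
            skipped_count += 1
            continue

        # Skip social/ignored websites (but never the site being analyzed)
        if subpage_domain != base_domain and any(domain in subpage_domain for domain in SOCIAL_DOMAINS):
            skipped_count += 1
            continue

        # Stay on the starting host or its subdomains (e.g. blog.example.com for example.com)
        if subpage_domain != base_domain and not subpage_domain.endswith('.' + base_domain):
            skipped_count += 1
            continue

        # Skip documents such as PDFs that can't be analyzed as HTML
        if parsed.path.lower().endswith(tuple(IGNORED_EXTENSIONS)):
            skipped_count += 1
            continue

        if subpage_url not in subpages:
            subpages.append(subpage_url)

    print(f"Skipped {skipped_count} links on {base_url}")
    return subpages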



def extract_seo_elements(soup):
    img_tags = soup.find_all('img')
    header_tags = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])  # In document order
    p_tags = soup.find_all('p')
    meta_tags = {}  # Maps meta name -> content
    for tag in soup.find_all('meta'):
        if 'name' in tag.attrs:
            meta_tags[tag['name'].lower()] = tag.get('content', '')  # Meta names are case-insensitive
    # The page title lives in <title>, not in a <meta> tag; store it so the score's title check can find it
    if soup.title and soup.title.string:
        meta_tags.setdefault('title', soup.title.string.strip())

    return img_tags, header_tags, p_tags, meta_tags


def extract_text_from_tags(img_tags, header_tags, p_tags):
    text_content = ""
    for tag in img_tags:
        if 'alt' in tag.attrs:
            text_content += tag['alt'] + " "
    for tag in header_tags:
        text_content += tag.text.strip() + " "
    for tag in p_tags:
        text_content += tag.text.strip() + " "
    return text_content

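def generate_website_summary(text_content):
    """Summarize the page text with BART; text beyond the model's input limit is truncated."""
    if not text_content.strip():
        return ""  # Nothing to summarize (no headers, paragraphs or alt text found)
    summary = summarizer(text_content, max_length=250, min_length=150, truncation=True)
    return summary[0]['summary_text']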


def build_prompt(summary, img_tags, header_tags, meta_tags):
    prompt = f"""You are an SEO expert and a staff software engineer who understands how SEO drives marketing. I will provide you with:

* **Website Summary:** {summary}
* **SEO Elements:**
    * **Image Tags:** {img_tags}
    * **Headers:** {header_tags}
    * **Meta Tags:** {meta_tags}

Analyze this website. Provide specific SEO tips with examples for the following:

* **Image Optimization:** How can images be improved for SEO? **For example:**
    * Adding descriptive alt text to images (e.g., <img alt="Golden Retriever puppy playing fetch" src="puppy.jpg"/>).
    * Optimizing image file names to include relevant keywords (e.g., "golden-retriever-puppy-playing.jpg").
* **Header Structure:** How can the headers be better organized to improve SEO?  **For example:**
    * Using a clear hierarchy (H1, H2, H3, etc.), with the most important keywords in the H1 tag.
    * Keeping headers concise and informative.
* **Content:** Are there areas for improvement with keywords or readability? **For example:**
    * Naturally integrating relevant keywords throughout the text, especially in headings.
    * Using short paragraphs and bullet points to improve readability.
* **Meta Tags:** Are they optimized and present? Can they be improved?  **For example:**
    * Ensuring the meta title tag accurately describes the page content.
    * Writing a compelling meta description that summarizes the page and entices users to click.

**Please provide clear instructions and actionable examples. Include as many points as you think will help this page rank well.**
"""
    return prompt


def calculate_seo_score(seo_elements, text_content):
    score = 0
    stop_words = set(stopwords.words('english'))

    # Keyword analysis: share of the most frequent non-stop word
    content_words = word_tokenize(text_content.lower())
    filtered_words = [w for w in content_words if w not in stop_words and w.isalpha()]
    fdist = FreqDist(filtered_words)

    if filtered_words:  # Guard against pages with no usable text (avoids IndexError / ZeroDivisionError)
        most_common_keyword, keyword_count = fdist.most_common(1)[0]
        keyword_density = keyword_count / len(filtered_words)
        if keyword_density > 0.02:
            score += 15

    # Image optimization check
    if any("alt" in tag.attrs and tag['alt'] for tag in seo_elements['img_tags']):
        score += 10

    # Meta tags check
    if 'title' in seo_elements['meta_tags'] and \
        'description' in seo_elements['meta_tags'] and \
        'author' in seo_elements['meta_tags']:
        score += 10

    # Readability
    readability = textstat.flesch_reading_ease(text_content)
    if readability > 60:
        score += 5

    return score

# *** Main Analysis Function ***
def analyze_website(url, max_pages=10):
    """Analyze `url` and the same-site pages it links to, stopping after `max_pages` pages."""
    base_domain = urlparse(url).netloc
    results = {
        "pages": {}  # To organize results per page
    }
    visited_urls = set()
    urls_to_visit = [url]
    total_subpages = 0  # Counter for tracking analyzed pages
    error_file_name = "seo_analysis_error.txt"
    results_file_name = "seo_analysis.txt"
    # Create the model once; 'gemini-pro' may have been retired, so if this fails,
    # replace it with a current model name from the Gemini API documentation
    model = genai.GenerativeModel('gemini-pro')

    while urls_to_visit and len(visited_urls) < max_pages:
        current_url = urls_to_visit.pop(0)
        if current_url in visited_urls:
            continue

        visited_urls.add(current_url)
        results['pages'][current_url] = {}  # Initialize the page's data
        try:
            soup = crawl_page(current_url)

            # SEO Analysis
            img_tags, header_tags, p_tags, meta_tags = extract_seo_elements(soup)
            text_content = extract_text_from_tags(img_tags, header_tags, p_tags)
            website_summary = generate_website_summary(text_content)
            prompt = build_prompt(website_summary, img_tags, header_tags, meta_tags)

            # SEO Score Calculation (out of 40)
            seo_score = calculate_seo_score(
                {"img_tags": img_tags, "header_tags": header_tags, "meta_tags": meta_tags},
                text_content
            )
            results['pages'][current_url]['seo_score'] = seo_score

            # Gemini Interaction
            response = model.generate_content(prompt)
            results['pages'][current_url]['seo_tips'] = response.text
            results['pages'][current_url]['img_tags'] = img_tags  # Store for clarity
            results['pages'][current_url]['header_tags'] = header_tags
            results['pages'][current_url]['p_tags'] = p_tags
            results['pages'][current_url]['meta_tags'] = meta_tags

            # Discover subpages from the HTML already downloaded (no second request)
            subpages = extract_subpages(soup, base_domain, current_url)
            urls_to_visit.extend(u for u in subpages if u not in visited_urls)

            total_subpages += 1
            print(f"Analyzed {total_subpages} pages so far...")

        except Exception as e:
            with open(error_file_name, "a") as error_file:
                error_file.write(f"Error analyzing {current_url}\n Error: {e}\n")
            print(f"Error analyzing {current_url}. Details written to '{error_file_name}'")

        else:  # Save after every successful page so partial results survive an interruption
            with open(results_file_name, "w") as f:
                for page_url, data in results['pages'].items():  # page_url avoids shadowing `url`
                    f.write(f"URL: {page_url}\n")
                    f.write(f"SEO Score: {data.get('seo_score', 'n/a')}/40\n")
                    f.write(f"SEO Tips: {data.get('seo_tips', 'Analysis not available')}\n")
                    f.write("-" * 20 + "\n")

            print(f"SEO analysis saved to '{results_file_name}'")

    return results

# Get website URL as input
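my_website_url = input("Enter the website URL: ").strip()
if not urlparse(my_website_url).scheme:
    my_website_url = "https://" + my_website_url  # requests needs http:// or https://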

# Analyze and save results
results = analyze_website(my_website_url)



Enter the website URL: https://alivexem.vercel.app/
 skipped 1 times
 skipped 2 times
 skipped 3 times
 skipped 4 times
 skipped 5 times
 skipped 6 times
 skipped 7 times
 skipped 9 times
 skipped 10 times
 skipped 11 times
 skipped 12 times
 skipped 13 times
 skipped 14 times
 skipped 15 times
Analyzed 1 subpages so far...
SEO analysis saved to 'seo_analysis.txt'
SEO analysis saved to seo_analysis.txt
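`calculate_seo_score` awards 15 points when the most frequent non-stop word makes up more than 2% of the words, 10 when any image has non-empty alt text, 10 when `title`, `description` and `author` are all present, and 5 when the Flesch reading ease exceeds 60, for a maximum of 40. The sketch below mirrors that point scheme with the standard library only, as an illustration rather than the notebook's code: a regex tokenizer and a small hypothetical stop-word set stand in for NLTK, and the alt-text flag, meta names and reading-ease value are passed in directly instead of coming from BeautifulSoup and textstat.

```python
import re
from collections import Counter

# Tiny stand-in for NLTK's English stop-word list (assumption for this sketch)
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on", "with"}


def keyword_density(text):
    """Return (most common non-stop word, its share of all non-stop words)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    if not words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)


def seo_points(text, has_alt_text, meta_names, reading_ease):
    """Mirror the notebook's point scheme; returns a score out of 40."""
    score = 0
    if keyword_density(text)[1] > 0.02:
        score += 15
    if has_alt_text:
        score += 10
    if {"title", "description", "author"} <= set(meta_names):
        score += 10
    if reading_ease > 60:
        score += 5
    return score


text = "Golden retriever puppies love fetch. A golden retriever needs daily exercise."
print(keyword_density(text))
print(seo_points(text, True, ["description", "author"], 72.0))  # 30: no 'title', so the meta points are missed
```

Because the keyword check uses the share of the single most frequent word, it always passes on pages with fewer than 50 non-stop words: the top word's share is then at least 1/49, which is above 0.02.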
