<a href="https://colab.research.google.com/github/Abhiss123/AlmaBetter-Projects/blob/main/The_Ontological_SEO_Architect_Structuring_Content_with_Topic_Maps.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name: The Ontological SEO Architect: Structuring Content with Topic Maps**

---
# Purpose of the Project:


The purpose of this project is to help websites improve their visibility and ranking on search engines like Google by organizing and optimizing their content in a smarter and more meaningful way. It uses advanced technologies like **ontology** and **topic maps** to create a structured and interconnected content system that aligns better with how search engines work.

---

### Why is This Project Important?

1. **Search Engine Optimization (SEO) is Key**:
   - When people search for something on Google, search engines show websites that are the most relevant to the query.
   - For a website to appear at the top, its content must be well-organized, easy to understand, and relevant.
   - This project helps achieve that by using **ontology** and **topic maps**, which are advanced techniques to structure information.

2. **Understanding Ontology and Topic Maps**:
   - **Ontology**: Think of it as creating a "map of concepts" related to your content. It organizes ideas, keywords, and relationships in a way that both humans and search engines can understand.
   - **Topic Maps**: These act as a "visual and logical blueprint" to show how different topics on your website are connected. This makes navigation and understanding much easier.

3. **Tackling the Complexity of Modern Search Engines**:
   - Search engines have become smarter and now understand the relationships between different topics and keywords.
   - This project bridges the gap by creating a structure that speaks the language of modern search engines, ensuring that your website content gets noticed.

---

### What Does This Project Do?

1. **Extracts Key Information**:
   - It scans and extracts important topics and keywords from your website content.
   - These topics are refined and cleaned to remove irrelevant or repetitive terms.

2. **Builds Relationships Between Topics**:
   - The project identifies how topics are related to each other. For example, if your website is about "Digital Marketing," it might connect topics like "SEO," "Social Media," and "Analytics."

3. **Creates a Structured Content Map**:
   - Using the extracted information, it creates a **topic map** that organizes your content logically and shows how everything is interconnected.
   - This makes it easier for search engines to understand your content, which can help it rank higher.

4. **Provides Actionable Insights**:
   - The project suggests ways to improve your content. For example:
     - Adding new related topics to fill content gaps.
     - Linking pages internally to improve user experience and navigation.

5. **Generates Metadata and Recommendations**:
   - It creates SEO-friendly **titles**, **descriptions**, and **schemas** for your web pages, making them more attractive to both search engines and users.
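
As a rough illustration of the metadata step, the sketch below builds a meta title, a meta description, and a schema.org `WebPage` JSON-LD snippet from one page record. The `build_metadata` helper, the 60- and 160-character limits, and the sample record are assumptions made for this example, not the project's actual implementation.

```python
import json


def build_metadata(page, title_limit=60, description_limit=160):
    """Build simple SEO metadata for one page record (hypothetical helper)."""
    # Trim the title and description to the assumed length limits.
    title = page["Title"].strip()[:title_limit]
    description = page["Content"].strip()[:description_limit]

    # A minimal schema.org WebPage description in JSON-LD form.
    schema = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": page["URL"],
        "name": title,
        "description": description,
    }
    return {"title": title, "description": description, "schema": schema}


# Sample record, made up for illustration.
page = {
    "URL": "https://example.com/seo/",
    "Title": "SEO Services",
    "Content": "We help businesses organize content around related topics.",
}
metadata = build_metadata(page)
print(json.dumps(metadata["schema"], indent=2))
```

The record keys (`URL`, `Title`, `Content`) mirror the columns produced by the scraper in Part 1.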

---

### How Does This Help Your Website?

1. **Improves Search Rankings**:
   - By structuring content in a way that search engines prefer, your website is more likely to rank higher in search results.

2. **Enhances User Experience**:
   - Visitors can find what they’re looking for more easily because your content is better organized and linked.

3. **Saves Time and Effort**:
   - The project automates much of the complex work of analyzing, organizing, and optimizing content.

4. **Future-Proofs Your Website**:
   - Search engines are constantly evolving, and a well-structured content model makes it easier to adapt your website to new algorithms.

---

### Who Can Benefit From This Project?

1. **Website Owners and Businesses**:
   - Small businesses, startups, and large companies can use this project to attract more visitors to their websites.

2. **SEO Professionals**:
   - SEO experts can use the tools and insights from this project to improve their strategies.

3. **Content Creators**:
   - Writers and marketers can identify which topics to focus on and how to link them effectively.

4. **Developers and Technologists**:
   - Developers can integrate the tools and recommendations into their workflows to create better websites.

---

### In Simple Words:

This project is like creating a **map and guidebook** for your website. It organizes your content in a way that makes it easier for search engines like Google to find, understand, and rank your website higher. At the same time, it improves the experience for people visiting your website by making it easier for them to find what they need. Whether you're a business owner, SEO professional, or content creator, this project helps you make your website more successful.

---

### What is Ontological SEO with Topic Maps?

1. **Ontological SEO**:
   - **Ontology** is about defining relationships between concepts and organizing knowledge in a structured way.
   - **SEO (Search Engine Optimization)** is the practice of improving the visibility of a website on search engines like Google.
   - Combining the two, **Ontological SEO** uses structured relationships between concepts to create better-organized content that aligns with how search engines and users think.

2. **Topic Maps**:
   - A **Topic Map** is a visual or structured representation of related topics. It shows how concepts in your niche are connected to each other.
   - For example, in the context of a fitness website:
     - **Main Topic**: Fitness
     - **Subtopics**: Exercise, Nutrition, Mental Health
     - **Relationships**: Exercise is related to Weight Loss, Nutrition supports Muscle Gain, etc.
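
The fitness example above can be written down as a small data structure. This is only a sketch: the dictionary layout, the relationship labels, and the `related_to` helper are illustrative assumptions, not a format the project prescribes.

```python
# Main topic with its subtopics, taken from the fitness example.
topic_map = {
    "Fitness": {"subtopics": ["Exercise", "Nutrition", "Mental Health"]},
}

# Relationships stored as (source, label, target) triples.
relationships = [
    ("Exercise", "is related to", "Weight Loss"),
    ("Nutrition", "supports", "Muscle Gain"),
]


def related_to(topic, relationships):
    """Return (label, target) pairs for every relationship that starts at `topic`."""
    return [(label, target) for source, label, target in relationships if source == topic]


print(related_to("Exercise", relationships))  # [('is related to', 'Weight Loss')]
```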

---

### Why Use Ontological SEO with Topic Maps?

1. **Improved SEO**:
   - Search engines like Google increasingly use AI to understand relationships between concepts. A well-structured Topic Map helps them understand your content and its context.
   - This can improve your site’s visibility for related search queries.

2. **Better User Experience**:
   - When your website content is organized around clear relationships between topics, it becomes easier for users to navigate and find relevant information.

3. **Content Strategy**:
   - A Topic Map helps you identify content gaps and plan your blog posts, articles, or resources based on the relationships between topics.
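
One simple way to surface content gaps is to compare the topics you plan to cover with the topics your pages already cover. The sketch below assumes both are available as sets of topic names; the `find_content_gaps` helper and the sample topics are hypothetical.

```python
def find_content_gaps(planned_topics, covered_topics):
    """Return planned topics that no existing page covers, in alphabetical order."""
    return sorted(set(planned_topics) - set(covered_topics))


# Sample data, made up for illustration.
planned_topics = {"Budgeting", "Investing", "Saving", "Risk Management"}
covered_topics = {"Budgeting", "Saving"}

content_gaps = find_content_gaps(planned_topics, covered_topics)
print(content_gaps)  # ['Investing', 'Risk Management']
```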

---

### Use Cases of Ontological SEO with Topic Maps (Website Context)

1. **E-commerce Websites**:
   - Main Topic: Electronics
   - Subtopics: Smartphones, Laptops, Accessories
   - Relationships: Smartphones are related to Accessories like cases or chargers.

2. **Blog Websites**:
   - Main Topic: Personal Finance
   - Subtopics: Budgeting, Investing, Saving
   - Relationships: Budgeting is linked to Saving, Investing is linked to Risk Management.

3. **Educational Websites**:
   - Main Topic: Science
   - Subtopics: Physics, Chemistry, Biology
   - Relationships: Physics relates to Mathematics, Chemistry relates to Materials Science.

4. **Local Business Websites**:
   - Main Topic: Home Services
   - Subtopics: Plumbing, Electrical, Landscaping
   - Relationships: Plumbing may relate to Emergency Repairs or Maintenance Tips.

---

### What Does the Model Need to Work?

1. **Data Inputs**:
   - **URLs of Website Pages**: If you’re working with an existing website, the model can scrape and process the text content of your web pages to build the Topic Map.
   - **CSV Data**: If you have pre-organized data (e.g., in a spreadsheet), this can also be used. The data should include topics, subtopics, and any known relationships.

   Example CSV format:
   ```
   Topic, Subtopic, Relationship
   Fitness, Exercise, Supports
   Exercise, Weight Loss, Leads to
   Nutrition, Muscle Gain, Promotes
   ```

2. **Processing**:
   - The model uses Natural Language Processing (NLP) to analyze the content, extract key topics, and identify relationships between them.
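
A CSV in the format shown above can be read with Python's standard `csv` module. This is a minimal sketch, assuming the three column names from the example; the `load_relationships` helper is hypothetical and not part of the project's scripts.

```python
import csv
import io
from collections import defaultdict

# The example CSV from above, inlined so the sketch is self-contained.
csv_text = """Topic, Subtopic, Relationship
Fitness, Exercise, Supports
Exercise, Weight Loss, Leads to
Nutrition, Muscle Gain, Promotes
"""


def load_relationships(csv_file):
    """Map each topic to a list of (relationship, subtopic) pairs."""
    # skipinitialspace=True drops the space that follows each comma.
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    graph = defaultdict(list)
    for row in reader:
        topic = row["Topic"].strip()
        graph[topic].append((row["Relationship"].strip(), row["Subtopic"].strip()))
    return dict(graph)


graph = load_relationships(io.StringIO(csv_text))
print(graph["Exercise"])  # [('Leads to', 'Weight Loss')]
```

To read a real file instead, pass `open("topics.csv", newline="", encoding="utf-8")` (a placeholder file name) in place of the `io.StringIO` object.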

---

### What Kind of Output Does the Model Provide?

1. **Structured Topic Map**:
   - A graphical or tabular representation of topics and their relationships.
   - Example: A diagram showing how "Fitness" is connected to "Exercise," "Nutrition," and "Mental Health."

2. **SEO Strategy Suggestions**:
   - Recommendations for creating new content to fill gaps in your topic structure.
   - Suggestions for internal linking based on topic relationships.

3. **Content Categorization**:
   - Automatically organizes your website’s content into categories and subcategories.

4. **Enhanced Metadata**:
   - Suggestions for meta titles, descriptions, and schema markup to improve search engine understanding of your pages.
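
Internal linking suggestions can be derived from a topic map that records which pages mention each topic. The sketch below is one possible approach, assuming each topic entry has a `sourceURLs` list (as in the topic map built in Part 2); the `suggest_internal_links` helper and the sample URLs are hypothetical.

```python
from itertools import combinations


def suggest_internal_links(topic_map):
    """Map each pair of pages to the topics they share (candidate internal links)."""
    suggestions = {}
    for topic, details in topic_map.items():
        # Every pair of pages that mentions the same topic is a link candidate.
        for page_a, page_b in combinations(sorted(details["sourceURLs"]), 2):
            suggestions.setdefault((page_a, page_b), []).append(topic)
    return suggestions


# Sample topic map, made up for illustration.
topic_map = {
    "seo": {"sourceURLs": ["https://example.com/a/", "https://example.com/b/"]},
    "audit": {"sourceURLs": ["https://example.com/a/"]},
    "links": {"sourceURLs": ["https://example.com/b/", "https://example.com/a/"]},
}

links = suggest_internal_links(topic_map)
for (page_a, page_b), shared in links.items():
    print(f"{page_a} <-> {page_b}: shared topics {shared}")
```

Pairs that share more topics are stronger candidates, so sorting by the length of the shared-topic list is a reasonable way to prioritize.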

---

### Real-Life Implementation Steps for Your Project

1. **Prepare Your Data**:
   - If the client has a website, collect all page URLs or export the site’s content to a CSV file.
   - If no website exists, brainstorm a list of topics, subtopics, and relationships relevant to the niche.

2. **Use Ontological SEO Tools**:
   - Tools like Python libraries (spaCy, NLTK) or specialized SEO platforms can process the data to create Topic Maps.

3. **Visualize the Topic Map**:
   - Use visualization tools like Gephi or even PowerPoint to represent the relationships between topics.

4. **Content Optimization**:
   - Based on the Topic Map, rewrite existing content or create new pages targeting uncovered relationships.

5. **Iterate and Improve**:
   - Regularly update the Topic Map based on new data, trends, or changes in the website.
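
For the visualization step, a topic map can be exported as an edge list that Gephi's spreadsheet importer reads through its `Source` and `Target` columns. This is a sketch under the assumption that each topic entry has a `related` list, as in the topic map built in Part 2; the `write_edge_list` helper and the sample data are hypothetical.

```python
import csv
import io


def write_edge_list(topic_map, csv_file):
    """Write one 'Source,Target' row per topic relationship."""
    writer = csv.writer(csv_file)
    writer.writerow(["Source", "Target"])
    for topic, details in topic_map.items():
        for related in details["related"]:
            writer.writerow([topic, related])


# Sample topic map, made up for illustration.
topic_map = {
    "fitness": {"related": ["exercise", "nutrition"]},
    "exercise": {"related": ["weight_loss"]},
}

buffer = io.StringIO()
write_edge_list(topic_map, buffer)
print(buffer.getvalue())
```

To produce a file for import, pass `open("edges.csv", "w", newline="", encoding="utf-8")` (a placeholder file name) instead of the in-memory buffer.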

---



# **Part 1: Dynamic Ontological Scraper**
### **File Name:** `dynamic_ontological_scraper.py`

### **Purpose:**
This part of the code is responsible for **scraping webpages** to extract key information such as titles and content. It then saves the data in structured formats like JSON and CSV for further analysis.

### **Key Functions:**
1. **`clean_and_summarize_text`**:
   - Cleans raw text from webpages (e.g., removes extra spaces, special characters).
   - Truncates the content to a maximum length (3,500 characters by default) so it stays concise.

2. **`extract_content`**:
   - Extracts titles, headers (h1, h2, h3), and paragraphs (p) from webpages.
   - Skips tags whose own class or ID marks them as a menu or footer.

3. **`save_to_csv` and `save_to_json`**:
   - Save the scraped data into CSV and JSON files for storage and further analysis.

4. **`process_urls`**:
   - Processes multiple URLs, extracts content, and saves the results.

### **Output:**
- Files: `dynamic_content_output.json` and `dynamic_content_output.csv`
- These files contain the cleaned content for further processing.
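
To make the cleaning step concrete, the sketch below applies the same three steps as `clean_and_summarize_text` (collapse whitespace, drop characters outside a small allowed set, truncate) to a short sample string. The function name `clean_text` and the sample inputs are only for illustration.

```python
import re


def clean_text(text, max_length=3500):
    """Collapse whitespace, keep only word characters and . , - and truncate."""
    text = re.sub(r"\s+", " ", text)          # Collapse runs of whitespace.
    text = re.sub(r"[^\w\s.,-]", "", text)    # Drop everything else, including ! ? :
    text = text.strip()
    return text[:max_length] + "..." if len(text) > max_length else text


sample = "Hello,   world!\n Visit us: today?"
print(clean_text(sample))              # Hello, world Visit us today
print(clean_text("a" * 10, max_length=5))  # aaaaa...
```

Note that the character filter removes more than emojis: exclamation marks, question marks, colons, and quotes are dropped as well.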

---


In [None]:
# File: dynamic_ontological_scraper.py

# Importing necessary libraries
import requests  # Used to make HTTP requests to fetch webpage data
from bs4 import BeautifulSoup  # Helps parse HTML content from webpages
import re  # Regular expressions for cleaning text
import pandas as pd  # Data manipulation and saving to CSV
import json  # Handling JSON file creation
from tabulate import tabulate  # Displaying data in a table format for readability


def clean_and_summarize_text(text, max_length=3500):
    """
    Cleans and summarizes the extracted text by:
    - Removing unnecessary spaces and special characters.
    - Limiting the length of the text while retaining its essential context.
    - Ensuring readability for further analysis.

    Args:
        text (str): Raw text extracted from the webpage.
        max_length (int): Maximum length for the summarized text.

    Returns:
        str: Cleaned and summarized text.
    """
    # Remove extra spaces or line breaks to make the text uniform.
    text = re.sub(r'\s+', ' ', text)

    # Keep only word characters, whitespace, periods, commas, and hyphens.
    # This strips emojis and symbols, but also punctuation such as ! ? : ; and quotes.
    text = re.sub(r'[^\w\s.,-]', '', text)

    # Trim leading and trailing spaces for a neat appearance.
    text = text.strip()

    # Shorten the text if it exceeds the maximum length to keep it concise.
    return text[:max_length] + "..." if len(text) > max_length else text


def extract_content(url):
    """
    Extracts the title and main content of a webpage dynamically.

    This function focuses on extracting key content like headers (h1, h2, h3) and
    paragraphs (p) while skipping irrelevant sections like menus or footers.

    Args:
        url (str): URL of the webpage to scrape.

    Returns:
        tuple: Title of the page and its summarized content.
    """
    try:
        # Step 1: Fetch the webpage content.
        # Sending a GET request to the URL to retrieve its HTML source code.
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Check if the request was successful.

        # Step 2: Parse the HTML content using BeautifulSoup.
        soup = BeautifulSoup(response.text, 'html.parser')

        # Step 3: Extract the page title (usually displayed in browser tabs).
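        # If no title is found, use a placeholder.
        # soup.title.string is None when <title> is empty or contains nested tags,
        # so check it before use instead of passing None to the cleaner.
        title = soup.title.string if soup.title and soup.title.string else "No Title Found"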

        # Step 4: Extract headers (h1, h2, h3) and paragraphs (p).
        content = []  # List to store meaningful text content.
        for tag in soup.find_all(['h1', 'h2', 'h3', 'p']):
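            # Skip menu or footer elements based on the tag's own class or ID.
            # Note: 'class' is a list (exact class-name match) while 'id' is a string
            # (substring match); text nested inside a menu/footer container is not skipped.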
            if tag.get('class') and ('menu' in tag.get('class') or 'footer' in tag.get('class')):
                continue
            if tag.get('id') and ('menu' in tag.get('id') or 'footer' in tag.get('id')):
                continue

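            # Append the text content from relevant tags.
            # Note: get_text(strip=True) joins text from nested tags without a separator,
            # which is why the output contains merged words such as "RevenueGenerated".
            # Use tag.get_text(" ", strip=True) if words should stay separated.
            content.append(tag.get_text(strip=True))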

        # Step 5: Clean and summarize the extracted data for better readability.
        title = clean_and_summarize_text(title)  # Clean the title.
        content = clean_and_summarize_text(" ".join(content), max_length=3500)  # Combine and clean the content.

        # Return the extracted title and content for further processing.
        return title, content

    except Exception as e:
        # If any error occurs during extraction, return an error message.
        return "Error", f"Failed to fetch content from {url}: {e}"


def save_to_csv(data, filename="output.csv"):
    """
    Saves the extracted data into a CSV file for easy access and analysis.

    Args:
        data (list): List of dictionaries containing URLs, titles, and content.
        filename (str): Name of the output CSV file.
    """
    try:
        # Convert the data into a DataFrame for structured storage.
        df = pd.DataFrame(data)

        # Save the DataFrame to a CSV file.
        df.to_csv(filename, index=False)

        # Inform the user that the file has been saved.
        print(f"Content saved to CSV file: {filename}")
    except Exception as e:
        # Handle errors during the saving process.
        print(f"Error saving to CSV: {e}")


def save_to_json(data, filename="output.json"):
    """
    Saves the extracted data into a JSON file for flexibility and integration.

    Args:
        data (list): List of dictionaries containing URLs, titles, and content.
        filename (str): Name of the output JSON file.
    """
    try:
        # Save the data into a JSON file with indentation for readability.
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=4)

        # Inform the user that the file has been saved.
        print(f"Content saved to JSON file: {filename}")
    except Exception as e:
        # Handle errors during the saving process.
        print(f"Error saving to JSON: {e}")


def process_urls(url_list, csv_file, json_file):
    """
    Processes multiple URLs to extract meaningful content and save results.

    This function iterates through a list of URLs, scrapes the content, and
    stores the results in CSV and JSON files.

    Args:
        url_list (list): List of URLs to scrape.
        csv_file (str): Path to the output CSV file.
        json_file (str): Path to the output JSON file.
    """
    data = []  # List to hold the extracted data from all URLs.

    for url in url_list:
        # Inform the user about the URL being processed.
        print(f"Processing URL: {url}")

        # Extract the title and content of the webpage.
        title, content = extract_content(url)

        # Append the results as a dictionary to the data list.
        data.append({"URL": url, "Title": title, "Content": content})

    # Save the results into structured files (CSV and JSON).
    save_to_csv(data, csv_file)
    save_to_json(data, json_file)

    # Display a preview of the first 10 rows of the extracted data.
    print("\n--- Preview of Extracted Content ---")
    print(tabulate(pd.DataFrame(data).head(10), headers="keys", tablefmt="grid"))


if __name__ == "__main__":
    # List of URLs to scrape for content.
    urls = [
        'https://thatware.co/',
        'https://thatware.co/advanced-seo-services/',
        'https://thatware.co/digital-marketing-services/',
        'https://thatware.co/business-intelligence-services/',
        'https://thatware.co/link-building-services/',
        'https://thatware.co/branding-press-release-services/',
        'https://thatware.co/conversion-rate-optimization/',
        'https://thatware.co/social-media-marketing/',
        'https://thatware.co/content-proofreading-services/',
        'https://thatware.co/website-design-services/',
        'https://thatware.co/web-development-services/',
        'https://thatware.co/app-development-services/',
        'https://thatware.co/website-maintenance-services/',
        'https://thatware.co/bug-testing-services/',
        'https://thatware.co/software-development-services/',
        'https://thatware.co/competitor-keyword-analysis/'
    ]

    # Define the output file names.
    csv_output_file = "dynamic_content_output.csv"
    json_output_file = "dynamic_content_output.json"

    # Execute the scraping process for all URLs in the list.
    process_urls(urls, csv_output_file, json_output_file)


Processing URL: https://thatware.co/
Processing URL: https://thatware.co/advanced-seo-services/
Processing URL: https://thatware.co/digital-marketing-services/
Processing URL: https://thatware.co/business-intelligence-services/
Processing URL: https://thatware.co/link-building-services/
Processing URL: https://thatware.co/branding-press-release-services/
Processing URL: https://thatware.co/conversion-rate-optimization/
Processing URL: https://thatware.co/social-media-marketing/
Processing URL: https://thatware.co/content-proofreading-services/
Processing URL: https://thatware.co/website-design-services/
Processing URL: https://thatware.co/web-development-services/
Processing URL: https://thatware.co/app-development-services/
Processing URL: https://thatware.co/website-maintenance-services/
Processing URL: https://thatware.co/bug-testing-services/
Processing URL: https://thatware.co/software-development-services/
Processing URL: https://thatware.co/competitor-keyword-analysis/
Content s

---
# **Columns and Their Meanings**

1. **URL**:
   - **What it is**: This is the link to the webpage that was scraped.
   - **Why it matters**: It identifies the specific webpage. Without the URL, we wouldn’t know which page the title and content belong to.
   - **Example**:
     ```
     https://thatware.co/
     ```
     This is the homepage of the ThatWare website.

2. **Title**:
   - **What it is**: The title of the webpage. It usually appears in the browser tab or as the clickable headline in search engine results.
   - **Why it matters**: The title is crucial for SEO (Search Engine Optimization) and user engagement. It summarizes the page’s content in a concise and appealing way.
   - **Example**:
     ```
     THATWARE - Revolutionizing SEO with Hyper-Intelligence
     ```
     This title is likely designed to grab attention and highlight ThatWare’s expertise in AI-driven SEO.

3. **Content**:
   - **What it is**: The textual content or description of the webpage. It includes details about the services, features, or purpose of the page.
   - **Why it matters**: Search engines use this content to understand the relevance of the page for specific search queries. It’s also important for users to understand what the page offers.
   - **Example**:
     ```
     Home RevenueGenerated via SEO Qualified LeadsGenerated GET A CUSTOMIZED SEO AUDIT DIGITAL MARKETING STRATEGY FOR YOUR BUSINESS...
     ```
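     This snippet is the start of the page's scraped text: navigation labels and call-to-action headings. Some words run together (e.g., `RevenueGenerated`) because text from nested HTML elements was joined without a space.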

---


### **Detailed Explanation of Rows**

#### **Row 0 (Homepage)**

1. **URL**: `https://thatware.co/`
   - This is the homepage URL for ThatWare.

2. **Title**: `THATWARE - Revolutionizing SEO with Hyper-Intelligence`
   - The title emphasizes ThatWare's unique approach to SEO using AI, aiming to attract users looking for innovative solutions.

3. **Content**:
   ```
   Home RevenueGenerated via SEO Qualified LeadsGenerated GET A CUSTOMIZED SEO AUDIT DIGITAL MARKETING STRATEGY FOR YOUR BUSINESS...
   ```
   - This snippet indicates the page’s focus:
     - Highlighting revenue and qualified leads generated through SEO.
     - Inviting businesses to request a customized SEO audit and digital marketing strategy.
   - **Importance**: This content must be clear and engaging to attract both search engines and potential customers.

#### **Row 1 (Advanced SEO Services)**

1. **URL**: `https://thatware.co/advanced-seo-services/`
   - This URL points to the "Advanced SEO Services" section.

2. **Title**: `Advanced SEO Services - Professional SEO Agency - ThatWare`
   - The title highlights advanced services and positions ThatWare as a professional agency.

3. **Content**:
   ```
   Advanced SEO Services GET A FREE CUSTOMIZED ADVANCED SEO AUDIT DIGITAL MARKETING STRATEGY NOW...
   ```
   - This section opens with a call to action for a free, customized advanced SEO audit and promotes ThatWare’s expertise.
   - **Purpose**: This encourages users to get a customized SEO audit, driving lead generation.

---




# **Part 2: Advanced Topic Map Generator**
### **File Name:** `advanced_topic_map_generator_resolved.py`

### **Purpose:**
This code uses the content scraped in Part 1 to generate a **topic map**, which is a structured representation of key topics and their relationships.

### **Key Functions:**
1. **`load_cleaned_data`**:
   - Loads the JSON content extracted in Part 1 for topic mapping.

2. **`sanitize_topic_name`**:
   - Cleans and normalizes topic names for consistency.

3. **`extract_topics_with_spacy`**:
   - Uses Natural Language Processing (NLP) to extract nouns as candidate topics.

4. **`build_topic_map`**:
   - Creates a map showing the frequency of each topic, its related topics, and the source URLs.

5. **`save_topic_map_as_json` and `save_topic_map_as_rdf`**:
   - Saves the topic map in JSON and RDF/Turtle formats for analysis or integration with other tools.

### **Output:**
- Files: `final_topic_map.json` and `final_topic_map.ttl`
- These contain each topic with its related topics, frequency, source URLs, and importance rating.
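
The core of the topic map can be sketched without spaCy by starting from topic lists that have already been extracted. The example below follows the same idea as `build_topic_map`: count each topic, record its source URL, and link it to the topic that follows it in the text. The helper name `build_topic_map_from_topics`, the `high_threshold` parameter, and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def build_topic_map_from_topics(topics_by_url, high_threshold=5):
    """Build a topic map from pre-extracted topic lists (one list per URL)."""
    topic_map = defaultdict(lambda: {"related": set(), "frequency": 0, "sourceURLs": set()})

    for url, topics in topics_by_url.items():
        for i, topic in enumerate(topics):
            topic_map[topic]["frequency"] += 1
            topic_map[topic]["sourceURLs"].add(url)
            # Relate each topic to the one that follows it in the text.
            if i < len(topics) - 1:
                topic_map[topic]["related"].add(topics[i + 1])

    # Convert sets to sorted lists so the result is JSON-serializable and stable.
    return {
        topic: {
            "related": sorted(details["related"]),
            "frequency": details["frequency"],
            "sourceURLs": sorted(details["sourceURLs"]),
            "importance": "high" if details["frequency"] > high_threshold else "low",
        }
        for topic, details in topic_map.items()
    }


# Sample input, made up for illustration.
topics_by_url = {"https://example.com/": ["home", "seo", "audit", "seo", "strategy"]}
result = build_topic_map_from_topics(topics_by_url)
print(result["seo"]["related"])    # ['audit', 'strategy']
print(result["seo"]["frequency"])  # 2
```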

---


In [None]:
# File: advanced_topic_map_generator_resolved.py

# Required installations (uncomment and run the next lines if the packages are missing):
# !pip install rdflib spacy
# !python -m spacy download en_core_web_sm

import os  # For interacting with the file system
import json  # To handle JSON data
import logging  # For tracking errors and progress
from collections import defaultdict  # To create a flexible dictionary for topic mapping
import spacy  # A Natural Language Processing (NLP) library to extract topics from text
from rdflib import Graph, Literal, URIRef, Namespace  # To generate RDF/Turtle format for semantic data

# Set up logging for better tracking of what the script is doing.
# If anything goes wrong, it will show detailed error messages in the console.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")


def load_cleaned_data(file_path):
    """
    Loads webpage content data from a JSON file and ensures it is valid.

    Purpose:
    - Read and parse the JSON file containing URLs and their content.
    - Validate that the file exists and the data is correctly formatted.

    Args:
        file_path (str): Path to the JSON file containing webpage data.

    Returns:
        dict: A dictionary where keys are URLs, and values are the content of those URLs.
        If the file is not found or data is invalid, it returns an empty dictionary.
    """
    try:
        if not os.path.exists(file_path):
            logging.error(f"File not found: {file_path}")
            return {}  # Return an empty dictionary if the file does not exist.

        # Open the file and load its content as JSON.
        with open(file_path, "r", encoding="utf-8") as file:
            data = json.load(file)

        # Extract valid URLs and their associated content into a dictionary.
        return {item["URL"]: item["Content"] for item in data if "URL" in item and "Content" in item}
    except Exception as e:
        logging.error(f"Error loading data: {e}")
        return {}  # Return an empty dictionary in case of any error.


def sanitize_topic_name(topic):
    """
    Cleans topic names to make them machine-readable and valid.

    Purpose:
    - Remove unwanted characters and spaces from topic names.
    - Ensure the topic names can be safely used in RDF or Turtle formats.

    Args:
        topic (str): The raw topic name from the text.

    Returns:
        str: A sanitized topic name with no invalid characters.
    """
    # Replace spaces and dashes with underscores, remove periods and '#', and lowercase the result.
    sanitized = topic.strip().replace(" ", "_").replace(".", "").replace("-", "_").replace("#", "").lower()
    return sanitized if sanitized else None  # Return None for invalid or empty topics.


def extract_topics_with_spacy(content):
    """
    Extracts key topics from webpage content using NLP (spaCy).

    Purpose:
    - Use Natural Language Processing to identify important nouns from the text.
    - These nouns are potential topics for the topic map.

    Args:
        content (str): The full text content of the webpage.

    Returns:
        list: A list of sanitized topic names extracted from the text.
    """
    try:
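        # Load the pre-trained spaCy English model once and reuse it on later calls;
        # reloading it for every page is slow.
        global _NLP
        if "_NLP" not in globals():
            _NLP = spacy.load("en_core_web_sm")
        nlp = _NLP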

        # Process the content to break it into tokens (words).
        doc = nlp(content.lower())  # Normalize the text to lowercase.

        # Extract nouns (key subjects of sentences) and sanitize them.
        topics = [sanitize_topic_name(token.text) for token in doc if token.pos_ == "NOUN"]

        # Filter out any None values or invalid entries.
        return list(filter(None, topics))
    except Exception as e:
        logging.error(f"Error extracting topics: {e}")
        return []  # Return an empty list if there's an error.


def build_topic_map(cleaned_data):
    """
    Creates a map of topics and their relationships from webpage content.

    Purpose:
    - Identify how topics are connected based on their order in the text.
    - Track how often each topic appears and where it came from (URLs).

    Args:
        cleaned_data (dict): Dictionary mapping URLs to their content.

    Returns:
        dict: A structured topic map with topics, their relationships, and metadata.
    """
    # Initialize a dictionary to store topic details.
    topic_map = defaultdict(lambda: {"related": set(), "frequency": 0, "sourceURLs": set()})

    # Iterate over all URLs and their corresponding content.
    for url, content in cleaned_data.items():
        # Extract a list of topics from the content using NLP.
        topics = extract_topics_with_spacy(content)
        if not topics:  # Skip if no topics were found.
            continue

        # Build relationships and metadata for each topic.
        for i, topic in enumerate(topics):
            if topic:
                topic_map[topic]["frequency"] += 1  # Increment the frequency count for the topic.
                topic_map[topic]["sourceURLs"].add(url)  # Record the URL where the topic was found.

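                # Link adjacent topics in reading order: each topic's "related" set
                # receives the topic that follows it. The first branch records the same
                # forward link (previous -> current), so links are one-directional.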
                if i > 0:
                    topic_map[topics[i - 1]]["related"].add(topic)
                if i < len(topics) - 1:
                    topic_map[topic]["related"].add(topics[i + 1])

    # Convert the relationships from sets to lists for easier handling and add importance metadata.
    return {
        key: {
            "related": list(value["related"]),
            "frequency": value["frequency"],
            "sourceURLs": list(value["sourceURLs"]),
            "importance": "high" if value["frequency"] > 5 else "low"  # Mark as high importance if frequently mentioned.
        }
        for key, value in topic_map.items() if key != "_"
    }  # Exclude invalid topic names like "_".


def save_topic_map_as_json(topic_map, file_path):
    """
    Saves the topic map as a JSON file for easy sharing and analysis.

    Args:
        topic_map (dict): The topic map to save.
        file_path (str): Path to the output JSON file.
    """
    try:
        with open(file_path, "w", encoding="utf-8") as file:
            json.dump(topic_map, file, ensure_ascii=False, indent=4)
        logging.info(f"Topic map saved as JSON: {file_path}")
    except Exception as e:
        logging.error(f"Error saving topic map as JSON: {e}")


def save_topic_map_as_rdf(topic_map, file_path, base_url="http://thatware.co/topic_map#"):
    """
    Saves the topic map in RDF/Turtle format for semantic web use.

    Purpose:
    - Represent the topic map in a format that can be used in knowledge graphs or linked data.

    Args:
        topic_map (dict): The topic map to save.
        file_path (str): Path to the output Turtle file.
        base_url (str): Base namespace URL for the RDF data.
    """
    try:
        g = Graph()  # Create a new RDF graph.
        ex = Namespace(base_url)  # Define a namespace for the topics.
        g.bind("ex", ex)  # Bind the namespace for RDF prefixes.

        # Add topics and their metadata to the RDF graph.
        for topic, metadata in topic_map.items():
            topic_node = URIRef(f"{base_url}{topic}")  # Create a unique URI for the topic.
            g.add((topic_node, ex.type, Literal("Topic")))  # Add the topic type.
            g.add((topic_node, ex.frequency, Literal(metadata["frequency"])))  # Add frequency metadata.

            # Add the source URLs and related topics.
            for url in metadata["sourceURLs"]:
                g.add((topic_node, ex.sourceURL, Literal(url)))
            for related_topic in metadata["related"]:
                related_node = URIRef(f"{base_url}{related_topic}")
                g.add((topic_node, ex.hasRelatedTopic, related_node))

        # Save the RDF graph to a Turtle file.
        g.serialize(destination=file_path, format="turtle")
        logging.info(f"Topic map saved as RDF: {file_path}")
    except Exception as e:
        logging.error(f"Error saving topic map as RDF: {e}")


if __name__ == "__main__":
    # File path to the input data (generated from the previous script).
    input_file_path = "dynamic_content_output.json"

    # Load the cleaned webpage data.
    cleaned_data = load_cleaned_data(input_file_path)

    if not cleaned_data:
        logging.error("No valid data loaded. Exiting.")
        raise SystemExit(1)

    # Build the topic map from the cleaned data.
    topic_map = build_topic_map(cleaned_data)

    # Save the topic map in both JSON and RDF formats.
    save_topic_map_as_json(topic_map, "final_topic_map.json")
    save_topic_map_as_rdf(topic_map, "final_topic_map.ttl")

    # Display a JSON preview in the console for verification.
    print(json.dumps(topic_map, indent=4))


{
    "home": {
        "related": [
            "seo"
        ],
        "frequency": 1,
        "sourceURLs": [
            "https://thatware.co/"
        ],
        "importance": "low"
    },
    "seo": {
        "related": [
            "analysts",
            "core",
            "editing",
            "site",
            "playbook",
            "term",
            "competitor",
            "tool",
            "mystery",
            "strategy",
            "success",
            "pay",
            "companies",
            "operations",
            "language",
            "developers",
            "factors",
            "leadsgenerated",
            "comfort",
            "services",
            "intricacies",
            "andoff",
            "search",
            "specialists",
            "mind",
            "nuances",
            "link",
            "systems",
            "analysis",
            "activities",
            "levels",
            "teams",
            "optimization",

---
# **What Is This Output?**

This output is a **structured analysis** of the keywords found in a website's content (in this case, ThatWare’s pages). It is the topic map printed by the script above, and it forms part of the **SEO and content analysis process**. Here’s what it does:

1. **Identifies Important Keywords**:
   - Extracts the words or phrases used across the analyzed pages, including ones that appear only once (for example, `"home"` has a frequency of 1).
   - Groups them into a structured format to show their relationships, frequency, and importance.

2. **Provides Context for Each Keyword**:
   - Highlights related terms or topics.
   - Lists how often the keyword appears (frequency).
   - Links the pages where the keyword is found (sourceURLs).
   - Indicates how critical or relevant the keyword is (importance).

3. **Purpose**:
   - Helps in **keyword optimization** by showing which terms are overused, underutilized, or missing.
   - Guides content creators to strategically improve the content based on keyword importance.

---

### **Breaking Down Each Part**

1. **Keyword Name**:
   - Each main entry represents a **keyword** or **topic** found on the website.
   - Example: `"ppc"` represents "Pay-Per-Click" advertising.

2. **Related Terms**:
   - The `related` field lists words or topics that are commonly associated with the main keyword.
   - Example:
     ```json
     "related": [
         "advertising",
         "content",
         "search",
         "services"
     ]
     ```
     This means that the keyword "ppc" is often used in the context of advertising, content, search, and services.

3. **Frequency**:
   - This shows **how many times the keyword appeared** across the website.
   - Example:
     ```json
     "frequency": 5
     ```
     For "ppc", it appeared five times across all analyzed content.

4. **Source URLs**:
   - The `sourceURLs` field lists the specific pages where the keyword was found.
   - Example:
     ```json
     "sourceURLs": [
         "https://thatware.co/competitor-keyword-analysis/",
         "https://thatware.co/web-development-services/",
         "https://thatware.co/branding-press-release-services/"
     ]
     ```
     This means the keyword "ppc" is used on these three pages.

5. **Importance**:
   - Indicates the **priority or relevance** of the keyword.
   - Example:
     ```json
     "importance": "low"
     ```
     This suggests that the keyword "ppc" has low priority or impact based on the analysis.
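
To make the field layout above concrete, here is a minimal sketch that summarizes one topic-map entry. The sample dictionary only mirrors the `ppc` example values quoted in this section, and `describe_entry` is a hypothetical helper introduced for illustration; it is not part of the project code.

```python
def describe_entry(name, entry):
    """Return a one-line, human-readable summary of a topic-map entry."""
    related = ", ".join(entry.get("related", [])) or "none"
    return (
        f"{name}: frequency={entry.get('frequency', 0)}, "
        f"importance={entry.get('importance', 'unknown')}, "
        f"pages={len(entry.get('sourceURLs', []))}, related=[{related}]"
    )


# Illustrative entry assembled from the example values quoted above.
sample_topic_map = {
    "ppc": {
        "related": ["advertising", "content", "search", "services"],
        "frequency": 5,
        "sourceURLs": [
            "https://thatware.co/competitor-keyword-analysis/",
            "https://thatware.co/web-development-services/",
            "https://thatware.co/branding-press-release-services/",
        ],
        "importance": "low",
    }
}

for name, entry in sample_topic_map.items():
    print(describe_entry(name, entry))
```

The same helper could be pointed at the dictionary loaded from `final_topic_map.json`, since each entry there uses the same four fields.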

---

### **Detailed Explanation of Each Keyword (Example)**

#### **Keyword: `ppc`**
- **Related Terms**: "advertising", "content", "search", "services".
  - These terms show the context in which "ppc" is used.
  - For example, "ppc" might be discussed alongside advertising strategies or content marketing.
  
- **Frequency**: 5
  - This keyword appeared 5 times in the analyzed pages.

- **Source URLs**:
  - Pages like `https://thatware.co/competitor-keyword-analysis/` mention "ppc."
  - This tells us where the keyword is being used and if the content aligns with PPC-related topics.

- **Importance**: Low
  - Indicates that "ppc" might not be a significant focus for the website’s content.

---

#### **Keyword: `management`**
- **Related Terms**: "clients", "systemcrm", "tools".
  - Suggests that "management" is being discussed in relation to client handling, CRM systems, and other tools.

- **Frequency**: 4
  - Appears 4 times across the website.

- **Source URLs**:
  - Includes pages like `https://thatware.co/social-media-marketing/`.
  - This provides actionable insight to content creators: they might need to expand or refine "management" topics.

- **Importance**: Low
  - Indicates "management" is not a high-priority keyword but could be strategically important depending on the content goals.

---

### **Why Is This Output Important?**

1. **For SEO Improvement**:
   - It helps identify which keywords are frequently used and which are missing.
   - Website owners can focus on underused high-impact keywords to improve their rankings.

2. **Content Optimization**:
   - Shows how keywords relate to each other, helping to improve content structure and flow.
   - Example: If "management" appears often but lacks depth, new articles or improvements can be made.

3. **Strategic Insights**:
   - Guides decisions on which pages to enhance based on keyword focus.
   - Example: Pages with important keywords like "SEO" or "digital marketing" should be optimized to drive traffic.

---

### **How Can Website Owners Use This Output?**

1. **Prioritize Keywords**:
   - Focus on high-frequency, high-importance keywords.
   - Example: If "SEO" has high importance and frequency, make it a core focus of content strategy.

2. **Identify Content Gaps**:
   - Look at low-frequency, related terms to find opportunities for new content.
   - Example: "ppc" appears 5 times but has low importance—create more engaging and relevant content around PPC to boost its value.

3. **Refine Pages**:
   - Use the `sourceURLs` to locate pages where a keyword is mentioned.
   - Example: For "management," check if the content aligns with user intent or requires updates.
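
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`rank_topics`, `find_content_gaps`, `pages_for`) and the `max_frequency` cut-off are hypothetical, and the sample data reuses values quoted in this section rather than the full topic map.

```python
def rank_topics(topic_map):
    """Sort topic names by frequency, highest first (ties broken alphabetically)."""
    return sorted(topic_map, key=lambda t: (-topic_map[t].get("frequency", 0), t))


def find_content_gaps(topic_map, max_frequency=2):
    """Return related terms that are missing from the map or appear at most `max_frequency` times."""
    gaps = set()
    for entry in topic_map.values():
        for term in entry.get("related", []):
            if topic_map.get(term, {}).get("frequency", 0) <= max_frequency:
                gaps.add(term)
    return sorted(gaps)


def pages_for(topic_map, topic):
    """List the pages on which a topic was found."""
    return list(topic_map.get(topic, {}).get("sourceURLs", []))


# Small illustrative map; the real one would be loaded from final_topic_map.json.
topic_map = {
    "ppc": {"related": ["advertising", "content", "search", "services"], "frequency": 5,
            "sourceURLs": ["https://thatware.co/competitor-keyword-analysis/"], "importance": "low"},
    "management": {"related": ["clients", "systemcrm", "tools"], "frequency": 4,
                   "sourceURLs": ["https://thatware.co/social-media-marketing/"], "importance": "low"},
    "home": {"related": ["seo"], "frequency": 1,
             "sourceURLs": ["https://thatware.co/"], "importance": "low"},
}

print("Priority order:", rank_topics(topic_map))
print("Possible content gaps:", find_content_gaps(topic_map))
print("Pages to review for 'management':", pages_for(topic_map, "management"))
```

The cut-off of 2 is arbitrary; a site with more content would likely need a higher value before a related term stops counting as a gap.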



---

### **Conclusion**

This output is a detailed, keyword-level analysis aimed at helping website owners improve their content and SEO. It gives insights into:
- Keyword usage.
- Content opportunities.
- Page-specific improvements.



---
# **Part 3: SEO Insights Generator**
### **File Name:**  `seo_insights.ttl`

### **Purpose:**
This part analyzes the topic map from Part 2 to generate **SEO recommendations**, including content expansion ideas and internal linking suggestions.

### **Key Functions:**
1. **`calculate_similarity`**:
   - Measures how semantically similar two topics are.

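2. **`is_relevant_topic`**:
   - Filters out topics that appear in a list of irrelevant terms, then applies a simulated (randomly generated) engagement score as a second filter.

3. **`is_valid_link`**:
   - Validates whether a link between two topics is meaningful: the topics must differ, and their semantic similarity must exceed a threshold.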

4. **`generate_seo_insights`**:
   - Provides:
     - **Content expansion ideas**: Suggests what topics to explore further.
     - **Internal linking ideas**: Recommends links to improve navigation and SEO.

5. **`save_insights_and_display`**:
   - Saves SEO insights as JSON and RDF/Turtle files for implementation.
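
The similarity measure behind `calculate_similarity` is cosine similarity between embedding vectors. The sketch below shows the calculation in plain Python on made-up 2-D vectors, so it runs without downloading a model; the function name and the zero-vector convention are assumptions for illustration, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity_sketch(a, b):
    """Cosine of the angle between two equal-length vectors; the result lies in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for zero vectors (an assumption).
    return dot / (norm_a * norm_b)


# Toy vectors, invented purely to show the three characteristic cases.
print(cosine_similarity_sketch([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_similarity_sketch([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Because the value depends only on the angle between the vectors, it can be negative; thresholds such as those used for link validation only ever select the positive end of that range.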

### **Output:**
- Files: `seo_insights.json` and `seo_insights.ttl`
- These files contain actionable recommendations for improving SEO.

---

In [None]:
import json
import logging
from rdflib import Graph, Namespace, URIRef, Literal
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
import random

# Configure logging to provide detailed debug information.
# This helps in tracing the program flow and identifying issues during execution.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

# Load a pre-trained model for semantic similarity.
# This model converts text into "embeddings": numerical vectors that capture
# the meaning of the text, so that similar topics end up with similar vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

def calculate_similarity(source, target):
    """
    Calculate how similar two topics are in meaning.

    Purpose:
    - Helps identify relationships between topics by comparing their semantic meaning.
    - Uses cosine similarity to measure how close the embeddings (numerical representations) of the two topics are.

    Args:
        source (str): The first topic.
        target (str): The second topic.

    Returns:
        float: A cosine similarity score in the range [-1, 1]; values near 1 indicate
            closely related topics, values near 0 (or below) indicate unrelated ones.
    """
    embeddings = model.encode([source, target])  # Convert the topics into embeddings.
    similarity = cosine_similarity([embeddings[0]], [embeddings[1]])[0, 0]  # Calculate similarity between embeddings.
    logging.debug(f"Calculated similarity between '{source}' and '{target}': {similarity:.2f}")
    return similarity

def is_relevant_topic(topic, irrelevant_terms):
    """
    Check if a topic is relevant for content expansion.

    Purpose:
    - Filters out topics that are not useful based on a predefined list of "irrelevant terms."
    - Simulates a relevance score for prioritizing topics dynamically.

    Args:
        topic (str): The topic to evaluate.
        irrelevant_terms (set): A set of terms considered irrelevant.

    Returns:
        bool: True if the topic is relevant, False otherwise.
    """
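    # Generate a random "engagement score" to simulate dynamic prioritization of topics.
    # NOTE: this is a placeholder, so results differ between runs (roughly 20% of topics are
    # dropped at random); call random.seed(...) beforehand if reproducible output is needed.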
    engagement_score = random.uniform(0.5, 1)

    # A topic is considered relevant if it's not in the irrelevant terms and its score is above a threshold.
    is_relevant = topic.lower() not in irrelevant_terms and engagement_score > 0.6
    logging.debug(f"Topic '{topic}' is relevant: {is_relevant} (Engagement score: {engagement_score:.2f})")
    return is_relevant

def is_valid_link(source, target, thresholds=(0.4, 0.5, 0.6)):
    """
    Determine if an internal link between two topics makes sense.

    Purpose:
    - Avoids linking a topic to itself.
    - Validates the link by ensuring the similarity between topics exceeds at least one threshold.

    Args:
        source (str): The source topic.
        target (str): The target topic.
        thresholds (sequence of float): Similarity thresholds. Exceeding any one of them
            (in effect, the smallest) makes the link valid.

    Returns:
        bool: True if the link is valid, False otherwise.
    """
    if source == target:  # Skip linking topics to themselves.
        logging.debug(f"Skipped linking '{source}' to itself.")
        return False

    # Calculate the similarity between the source and target topics.
    similarity = calculate_similarity(source, target)

    # Check if the similarity exceeds any of the thresholds.
    valid = any(similarity > threshold for threshold in thresholds)
    logging.debug(f"Link valid between '{source}' and '{target}': {valid} (Similarity: {similarity:.2f})")
    return valid

def fallback_related_topics(topic, all_topics):
    """
    Suggest related topics when no explicit links exist.

    Purpose:
    - Generates "backup" related topics using similarity scores.
    - Ensures that even isolated topics have meaningful connections.

    Args:
        topic (str): The current topic.
        all_topics (list): List of all available topics.

    Returns:
        list: A list of fallback related topics.
    """
    fallback = [other for other in all_topics if topic != other and calculate_similarity(topic, other) > 0.3]
    logging.debug(f"Fallback topics for '{topic}': {fallback}")
    return fallback

def parse_turtle_to_dict(turtle_file, base_url="http://thatware.co/topic_map#"):
    """
    Read an RDF Turtle file and convert it into a structured dictionary.

    Purpose:
    - Extracts topics, their metadata (e.g., frequency, related topics), and relationships from the file.
    - This structured data is used for generating SEO insights.

    Args:
        turtle_file (str): Path to the RDF Turtle file.
        base_url (str): The base URL namespace for the RDF data.

    Returns:
        dict: A dictionary representing the topic map.
    """
    try:
        g = Graph()  # Create an RDF graph.
        g.parse(turtle_file, format="turtle")  # Parse the Turtle file.
        ex = Namespace(base_url)  # Define the RDF namespace.

        topic_map = {}
        # Extract all topics and their metadata.
        for s in g.subjects(predicate=ex.type, object=Literal("Topic")):
            topic = s.split("#")[-1]  # Extract the topic name from its URI.
            topic_map[topic] = {
                "frequency": int(next(g.objects(subject=s, predicate=ex.frequency), Literal(0))),  # Frequency data.
                "related": [o.split("#")[-1] for o in g.objects(subject=s, predicate=ex.hasRelatedTopic)],  # Related topics.
            }
        logging.info(f"Parsed {len(topic_map)} topics from the Turtle file.")
        return topic_map
    except Exception as e:
        logging.error(f"Error parsing RDF: {e}")
        return {}

def generate_seo_insights(topic_map, irrelevant_terms):
    """
    Generate suggestions for improving SEO through content expansion and internal linking.

    Purpose:
    - Identifies opportunities to expand existing content.
    - Suggests meaningful internal links between related topics.

    Args:
        topic_map (dict): The structured topic map.
        irrelevant_terms (set): Set of terms to exclude from suggestions.

    Returns:
        dict: A dictionary with content expansion and internal linking suggestions.
    """
    seo_insights = {"content_expansion": [], "internal_links": []}  # Initialize suggestions.
    all_topics = list(topic_map.keys())  # Get a list of all topics.

    for topic, metadata in topic_map.items():
        frequency = metadata.get("frequency", 0)  # How often this topic appears.
        related_topics = metadata.get("related", []) or fallback_related_topics(topic, all_topics)  # Get related topics.

        # Suggest content expansion for less frequently mentioned topics.
        if is_relevant_topic(topic, irrelevant_terms) and (frequency < 3 or len(related_topics) < 2):
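            # Name up to three related topics; use generic wording when none are available.
            aspects = ", ".join(related_topics[:3]) or "closely related subtopics"
            suggestion = f"Expand content for topic '{topic}' by exploring aspects like {aspects}."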
            seo_insights["content_expansion"].append(suggestion)

        # Suggest adding internal links between related topics.
        for related_topic in related_topics:
            if is_valid_link(topic, related_topic):
                suggestion = f"Add an internal link from '{topic}' to '{related_topic}' for better navigation."
                seo_insights["internal_links"].append(suggestion)

    logging.info(f"Generated {len(seo_insights['content_expansion'])} content expansion suggestions.")
    logging.info(f"Generated {len(seo_insights['internal_links'])} internal linking suggestions.")
    return seo_insights

def save_insights_and_display(seo_insights, json_file, turtle_file, base_url="http://thatware.co/seo_insights#"):
    """
    Save SEO insights to JSON and Turtle formats and display them.

    Purpose:
    - Ensure the insights are saved in a reusable format.
    - Display a summary for quick verification.

    Args:
        seo_insights (dict): The generated SEO insights.
        json_file (str): Path to the JSON file.
        turtle_file (str): Path to the RDF Turtle file.
        base_url (str): The base URL namespace for the insights.
    """
    # Save as JSON.
    with open(json_file, "w", encoding="utf-8") as file:
        json.dump(seo_insights, file, ensure_ascii=False, indent=4)
    logging.info(f"Insights saved to JSON file: {json_file}")

    # Save as RDF/Turtle.
    g = Graph()
    ex = Namespace(base_url)
    g.bind("ex", ex)

    for suggestion in seo_insights["content_expansion"]:
        g.add((URIRef(f"{base_url}content_expansion"), ex.insight, Literal(suggestion)))

    for suggestion in seo_insights["internal_links"]:
        g.add((URIRef(f"{base_url}internal_links"), ex.insight, Literal(suggestion)))

    g.serialize(destination=turtle_file, format="turtle")
    logging.info(f"Insights saved to Turtle file: {turtle_file}")

    # Display the suggestions in the console.
    print("\nContent Expansion Suggestions:")
    for suggestion in seo_insights["content_expansion"]:
        print(f"- {suggestion}")

    print("\nInternal Linking Suggestions:")
    for suggestion in seo_insights["internal_links"]:
        print(f"- {suggestion}")

if __name__ == "__main__":
    # Define terms to exclude from suggestions.
    irrelevant_terms = {"angles", "bin", "agencywe"}

    # File paths for input and output.
    turtle_file = "final_topic_map.ttl"
    json_file = "seo_insights.json"
    turtle_file_out = "seo_insights.ttl"

    # Parse the topic map and generate insights.
    topic_map = parse_turtle_to_dict(turtle_file)
    if topic_map:
        seo_insights = generate_seo_insights(topic_map, irrelevant_terms)
        save_insights_and_display(seo_insights, json_file, turtle_file_out)



Content Expansion Suggestions:
- Expand content for topic 'commerce' by exploring aspects like store.
- Expand content for topic 'home' by exploring aspects like seo.
- Expand content for topic 'account' by exploring aspects like software.
- Expand content for topic 'aconversion' by exploring aspects like rate.
- Expand content for topic 'acquisition' by exploring aspects like website.
- Expand content for topic 'addition' by exploring aspects like business.
- Expand content for topic 'adherence' by exploring aspects like testers.
- Expand content for topic 'adjustments' by exploring aspects like checkout.
- Expand content for topic 'ads' by exploring aspects like bids.
- Expand content for topic 'advertisement' by exploring aspects like marketing.
- Expand content for topic 'affinity' by exploring aspects like name.
- Expand content for topic 'agreements' by exploring aspects like information.
- Expand content for topic 'ahrefs' by exploring aspects like seo.
- Expand content for top

---
# **What Is This Output?**

This output is a **list of topics with suggestions for improvement and internal linking** within a website's content. It focuses on:

1. **Expanding Content**: Suggestions on how to add more value to specific topics by diving deeper into related aspects.
2. **Internal Linking**: Recommendations for creating links between related pages on the website to improve navigation and user experience.

Both parts aim to improve the **content quality**, **SEO performance**, and **user engagement** of the website.

---

### **Part 1: Expanding Content**

#### **What Does "Expanding Content" Mean?**
Expanding content means providing more detailed, informative, and engaging information on a particular topic. The suggestions in this part aim to enhance the depth and quality of content, making it more useful for readers and search engines.

#### **How Does It Look in the Output?**
Here’s an example from the output:
```text
- Expand content for topic 'article' by exploring aspects like detail.
- Expand content for topic 'awareness' by exploring aspects like affinity.
- Expand content for topic 'budget' by exploring aspects like marketing.
```

#### **Explanation of Each Part:**
1. **Topic**:
   - The word or phrase (e.g., 'article', 'awareness') represents the focus of the content.

2. **Suggested Expansion**:
   - These are additional areas related to the topic that should be explored.
   - Example: For the topic **‘article’**, exploring **‘detail’** means adding more specific or comprehensive explanations in articles.

3. **Why Is This Important?**:
   - Improves **content quality** by addressing more aspects of the topic.
   - Helps attract **search engine traffic** by targeting related keywords.
   - Provides a better **user experience** by answering readers’ questions more thoroughly.

#### **How a Client Can Use It**:
- Identify topics on their website (e.g., blog posts, product descriptions) where content is thin or generic.
- Use the suggestions to add relevant details and examples.
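
To work through these suggestions systematically, each line can be split into its topic and suggested aspects. The following is a minimal sketch that assumes the sentence template shown in the example output; `EXPANSION_PATTERN` and `parse_expansion` are hypothetical names introduced here, not functions from the project.

```python
import re

# Assumes the sentence template seen in the output above.
EXPANSION_PATTERN = re.compile(
    r"Expand content for topic '(?P<topic>.+?)' by exploring aspects like (?P<aspects>.*)\.$"
)


def parse_expansion(suggestion):
    """Return (topic, [aspects]) for one suggestion line, or None if it does not match."""
    text = suggestion.strip().lstrip("- ")  # Tolerate the "- " bullet used in console output.
    match = EXPANSION_PATTERN.match(text)
    if match is None:
        return None
    aspects = [a.strip() for a in match.group("aspects").split(",") if a.strip()]
    return match.group("topic"), aspects


lines = [
    "Expand content for topic 'article' by exploring aspects like detail.",
    "Expand content for topic 'awareness' by exploring aspects like affinity.",
    "Expand content for topic 'budget' by exploring aspects like marketing.",
]

for line in lines:
    topic, aspects = parse_expansion(line)
    print(f"{topic}: {aspects}")
```

Lines that do not fit the template, such as a truncated final line, return `None` and can simply be skipped.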

---

### **Part 2: Internal Linking**

#### **What Does "Internal Linking" Mean?**
Internal linking involves creating hyperlinks between pages on the same website. This helps users navigate easily and ensures search engines understand the structure and relationships of your website.

#### **How Does It Look in the Output?**
Here’s an example from the output:
```text
- Add an internal link from 'app' to 'apps' for better navigation.
- Add an internal link from 'business' to 'marketing' for better navigation.
- Add an internal link from 'seo' to 'strategy' for better navigation.
```

#### **Explanation of Each Part:**
1. **Source Topic**:
   - The starting topic (e.g., 'app', 'business') where the link will be placed.

2. **Destination Topic**:
   - The target page or topic (e.g., 'apps', 'marketing') to which the link will lead.

3. **Purpose**:
   - **Better Navigation**: Helps users find related content easily.
   - **SEO Benefits**: Signals search engines about the relationship between pages and distributes authority across the website.

4. **Why Is This Important?**:
   - Keeps users engaged by guiding them to other relevant content.
   - Reduces bounce rates (users leaving the site after viewing one page).
   - Helps search engines crawl the site more efficiently.

#### **How a Client Can Use It**:
- Identify pages where internal links are missing.
- Create links between related topics as suggested in the output.
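
One way to act on these recommendations is to regroup them into a per-page to-do list. The sketch below assumes the sentence template shown in the example output; `LINK_PATTERN` and `group_links_by_source` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# Pattern for lines shaped like the internal-linking suggestions in the output.
LINK_PATTERN = re.compile(
    r"Add an internal link from '(?P<source>.+?)' to '(?P<target>.+?)' for better navigation\."
)


def group_links_by_source(suggestions):
    """Map each source topic to the list of target topics it should link to."""
    links = defaultdict(list)
    for line in suggestions:
        match = LINK_PATTERN.search(line)
        if match:  # Lines that do not fit the pattern are skipped.
            links[match.group("source")].append(match.group("target"))
    return dict(links)


suggestions = [
    "- Add an internal link from 'app' to 'apps' for better navigation.",
    "- Add an internal link from 'business' to 'marketing' for better navigation.",
    "- Add an internal link from 'seo' to 'strategy' for better navigation.",
]

for source, targets in group_links_by_source(suggestions).items():
    print(f"{source}: link to {', '.join(targets)}")
```

Grouping by source topic means each page only needs to be edited once, with all of its new links added in the same pass.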

---

### **Examples Explained in Simple Terms**

#### **Expanding Content Example**:
- **Topic**: 'awareness'
- **Suggested Aspect**: 'affinity'
- **What to Do**:
  - Add information about how businesses can create awareness by building a connection (affinity) with their audience.
  - Include examples, case studies, or actionable tips.

#### **Internal Linking Example**:
- **Link Suggestion**: From 'seo' to 'strategy'
- **What to Do**:
  - Add a hyperlink on the 'seo' page pointing to the 'strategy' page.
  - Example: On the 'seo' page, write: “Learn more about creating a winning SEO strategy here.” Link the word 'strategy' to the corresponding page.

---

### **How This Helps Website Owners**

1. **Improves Content Quality**:
   - By expanding on suggested topics, the website becomes more informative and valuable to users.
   - Increases the chances of ranking higher in search engine results.

2. **Enhances User Experience**:
   - Internal links make it easier for visitors to explore the website.
   - Keeps users engaged longer, increasing the likelihood of conversions (e.g., purchases, sign-ups).

3. **Boosts SEO Performance**:
   - Search engines prefer well-structured websites with comprehensive content and proper internal linking.
   - These changes help distribute authority (link juice) and improve rankings for target keywords.

---

### **Key Takeaways**

1. **Understand the Output**:
   - It’s a roadmap for improving content and navigation on a website.
   - Two main actions: Expanding content and creating internal links.

2. **Practical Steps for Implementation**:
   - Use the suggested topics and aspects to create new content or enhance existing pages.
   - Add internal links as recommended to connect related pages.

3. **Why It’s Valuable**:
   - Helps website owners create a more user-friendly, SEO-optimized, and engaging online presence.

---
# **Part 4: Content Categorization and Recommendations**
### **File Name:** `content_categorization_and_recommendations.py`

### **Purpose:**
This part clusters topics into categories and evaluates them for **relevance, coverage, and actionability**. It also provides specific recommendations for improving content quality.

### **Key Functions:**
1. **`categorize_topics`**:
   - Groups similar topics into categories using clustering (K-Means algorithm).

2. **`evaluate_suggestions`**:
   - Calculates per-category metrics:
     - **Relevance**: Average topic frequency, scaled by the number of topics in the category.
     - **Coverage**: Average number of related topics per topic, scaled the same way.
     - **Actionability**: Ease of implementing suggestions (currently simulated with a random value).

3. **`generate_recommendations`**:
   - Provides actionable recommendations based on evaluation metrics.

4. **`save_results`**:
   - Saves categorized topics, evaluations, and recommendations to JSON and CSV files.

### **Output:**
- Files: `categorized_topics_with_recommendations.json` and `evaluation_metrics_with_recommendations.csv`
- These contain detailed evaluations and recommendations for content improvement.
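
As a rough illustration of how the relevance and coverage scores and the resulting recommendation fit together, here is a dependency-free sketch. It follows the formulas and thresholds used in this part's script, but the function names, the short recommendation labels, and the sample data are assumptions; actionability is passed in explicitly because the script only simulates it.

```python
from statistics import mean


def category_scores(topic_map, topics):
    """Average relevance and coverage for one category of topics."""
    if not topics:
        return {"relevance": 0.0, "coverage": 0.0}
    size = len(topics)
    relevance = [topic_map.get(t, {}).get("frequency", 1) / size for t in topics]
    coverage = [len(topic_map.get(t, {}).get("related", [])) / size for t in topics]
    return {"relevance": mean(relevance), "coverage": mean(coverage)}


def recommend(relevance, coverage, actionability):
    """Pick a recommendation label using the same thresholds and order of checks."""
    if relevance > 0.5 and coverage > 0.5:
        return "leverage"
    if relevance < 0.3 and coverage < 0.3:
        return "expand"
    if actionability > 0.8:
        return "quick_win"
    return "research"


# Illustrative two-topic category built from values quoted earlier in this notebook.
topic_map = {
    "ppc": {"frequency": 5, "related": ["advertising", "content", "search", "services"]},
    "management": {"frequency": 4, "related": ["clients", "systemcrm", "tools"]},
}
scores = category_scores(topic_map, ["ppc", "management"])
print(scores)
print(recommend(scores["relevance"], scores["coverage"], actionability=0.7))
```

With these formulas the scores are not confined to the range 0 to 1: a small category of frequent topics can score well above 1, which is worth keeping in mind when reading the 0.3 and 0.5 thresholds.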

---


In [None]:
import json
import logging
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer
from collections import defaultdict

# Configure logging for detailed debugging and transparency.
# This helps track the flow of execution and detect issues at runtime.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

# Load a pre-trained model for semantic embeddings.
# The model encodes text into numerical representations that capture meaning and context,
# enabling similarity comparisons and clustering.
model = SentenceTransformer("all-MiniLM-L6-v2")

def load_topic_map(json_file):
    """
    Load the topic map from a JSON file.

    Purpose:
    - Read the structured topic data generated earlier.
    - Ensure that all necessary information (topics, relationships, metadata) is accessible for processing.

    Args:
        json_file (str): Path to the JSON file containing the topic map.

    Returns:
        dict: A dictionary containing the topic map data, or an empty dictionary if loading fails.
    """
    try:
        with open(json_file, "r", encoding="utf-8") as f:
            topic_map = json.load(f)
        logging.info(f"Successfully loaded topic map from {json_file}")
        return topic_map
    except Exception as e:
        logging.error(f"Error loading topic map: {e}")
        return {}

def categorize_topics(topic_map):
    """
    Categorize topics into clusters based on semantic similarity.

    Purpose:
    - Groups topics that are closely related in meaning.
    - Clustering helps structure content for easier analysis and actionable insights.

    Args:
        topic_map (dict): Dictionary containing topics and their metadata.

    Returns:
        dict: A mapping of category labels (e.g., 'Category_0') to their corresponding topics.
    """
    topics = list(topic_map.keys())  # Extract all topic names.
    embeddings = model.encode(topics)  # Generate embeddings (numerical representations) for the topics.

    # Use K-Means clustering to group topics into at most 5 clusters.
    # The cluster count is min(number of topics, 5), because K-Means cannot form more clusters than samples.
    n_clusters = min(len(topics), 5)
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)  # Explicit n_init avoids version-dependent defaults.
    labels = kmeans.fit_predict(embeddings)  # Assign each topic to a cluster.

    # Organize topics into categories using the cluster labels.
    categorized_topics = defaultdict(list)
    for topic, label in zip(topics, labels):
        categorized_topics[f"Category_{label}"].append(topic)

    logging.info("Successfully categorized topics into clusters.")
    return dict(categorized_topics)

def evaluate_suggestions(topic_map, categorized_topics):
    """
    Evaluate categories for their relevance, coverage, and actionability.

    Purpose:
    - Provides scores for each category to identify improvement opportunities.
    - Helps prioritize categories based on their potential impact.

    Args:
        topic_map (dict): Dictionary containing topics and their metadata.
        categorized_topics (dict): Categories with their associated topics.

    Returns:
        dict: A dictionary of evaluation metrics (relevance, coverage, actionability) for each category.
    """
    evaluation = {}

    for category, topics in categorized_topics.items():
        relevance_scores = []  # Measures how important the topics are.
        coverage_scores = []   # Measures how well the topics cover related concepts.
        actionability_scores = []  # Measures how easy it is to act on the topics.

        for topic in topics:
            metadata = topic_map.get(topic, {})
            # Relevance is the topic's frequency divided by the number of topics in the category.
            relevance = metadata.get("frequency", 1) / max(1, len(topics))
            relevance_scores.append(relevance)

            # Coverage is the topic's total number of related topics (inside or outside the
            # category), divided by the number of topics in the category.
            related_count = len(metadata.get("related", []))
            coverage = related_count / max(1, len(topics))
            coverage_scores.append(coverage)

            # Actionability is simulated as a random value (to be replaced with real-world data).
            actionability = np.random.uniform(0.6, 1.0)
            actionability_scores.append(actionability)

        # Calculate average scores for the category.
        evaluation[category] = {
            "relevance": np.mean(relevance_scores),
            "coverage": np.mean(coverage_scores),
            "actionability": np.mean(actionability_scores),
        }

    logging.info("Successfully evaluated topic categories.")
    return evaluation

def generate_recommendations(evaluation):
    """
    Generate actionable recommendations based on category evaluations.

    Purpose:
    - Provides specific actions to improve content quality and SEO performance.

    Args:
        evaluation (dict): Evaluation metrics for each category.

    Returns:
        dict: A dictionary of recommendations for each category.
    """
    recommendations = {}

    for category, metrics in evaluation.items():
        relevance = metrics["relevance"]
        coverage = metrics["coverage"]
        actionability = metrics["actionability"]

        # Generate recommendations based on the evaluation metrics.
        if relevance > 0.5 and coverage > 0.5:
            rec = f"Leverage '{category}' for SEO by targeting high-impact internal links and specific content creation."
        elif relevance < 0.3 and coverage < 0.3:
            rec = f"Expand content for '{category}' to improve relevance and topic coverage."
        elif actionability > 0.8:
            rec = f"Quick wins possible for '{category}'. Focus on easily actionable improvements."
        else:
            rec = f"Research and refine '{category}' to improve relevance and coverage."

        recommendations[category] = rec
        logging.info(f"Generated recommendation for {category}: {rec}")

    return recommendations

def save_results(categorized_topics, evaluation, recommendations, json_file, csv_file):
    """
    Save the categorized topics, evaluations, and recommendations.

    Purpose:
    - Stores the results in JSON and CSV formats for further analysis and implementation.

    Args:
        categorized_topics (dict): Categories and their topics.
        evaluation (dict): Evaluation metrics for each category.
        recommendations (dict): Recommendations for each category.
        json_file (str): Path to the JSON output file.
        csv_file (str): Path to the CSV output file.
    """
    try:
        # Save results as a JSON file.
        with open(json_file, "w", encoding="utf-8") as f:
            json.dump({
                "categories": categorized_topics,
                "evaluation": evaluation,
                "recommendations": recommendations
            }, f, indent=4)
        logging.info(f"Results saved to JSON: {json_file}")

        # Save results as a CSV file.
        df = pd.DataFrame(evaluation).T  # Convert evaluation metrics to a DataFrame.
        df["recommendations"] = [recommendations[cat] for cat in df.index]  # Add recommendations to the DataFrame.
        df.to_csv(csv_file, index=True)
        logging.info(f"Results saved to CSV: {csv_file}")
    except Exception as e:
        logging.error(f"Error saving results: {e}")

def display_summary(evaluation, recommendations):
    """
    Display a summary of evaluation metrics and recommendations.

    Purpose:
    - Provides a quick overview of results for review and discussion.

    Args:
        evaluation (dict): Evaluation metrics for each category.
        recommendations (dict): Recommendations for each category.
    """
    print("\n--- Evaluation Summary ---")
    for category, metrics in evaluation.items():
        print(f"Category: {category}")
        print(f"  Relevance: {metrics['relevance']:.2f}")
        print(f"  Coverage: {metrics['coverage']:.2f}")
        print(f"  Actionability: {metrics['actionability']:.2f}")
        print(f"  Recommendation: {recommendations[category]}")
    print("--------------------------")

if __name__ == "__main__":
    # Define file paths for input and output.
    input_json = "final_topic_map.json"
    output_json = "categorized_topics_with_recommendations.json"
    output_csv = "evaluation_metrics_with_recommendations.csv"

    # Step 1: Load the topic map from the JSON file.
    topic_map = load_topic_map(input_json)
    if not topic_map:
        logging.error("No topic map loaded. Exiting.")
        exit(1)

    # Step 2: Categorize topics into clusters.
    categorized_topics = categorize_topics(topic_map)

    # Step 3: Evaluate the categorized topics for improvement opportunities.
    evaluation = evaluate_suggestions(topic_map, categorized_topics)

    # Step 4: Generate actionable recommendations based on the evaluations.
    recommendations = generate_recommendations(evaluation)

    # Step 5: Save results to JSON and CSV files for review and implementation.
    save_results(categorized_topics, evaluation, recommendations, output_json, output_csv)

    # Step 6: Display a summary of the evaluations and recommendations.
    display_summary(evaluation, recommendations)



--- Evaluation Summary ---
Category: Category_1
  Relevance: 0.01
  Coverage: 0.01
  Actionability: 0.81
  Recommendation: Expand content for 'Category_1' to improve relevance and topic coverage.
Category: Category_2
  Relevance: 0.02
  Coverage: 0.02
  Actionability: 0.78
  Recommendation: Expand content for 'Category_2' to improve relevance and topic coverage.
Category: Category_0
  Relevance: 0.03
  Coverage: 0.03
  Actionability: 0.81
  Recommendation: Expand content for 'Category_0' to improve relevance and topic coverage.
Category: Category_3
  Relevance: 0.05
  Coverage: 0.04
  Actionability: 0.80
  Recommendation: Expand content for 'Category_3' to improve relevance and topic coverage.
Category: Category_4
  Relevance: 0.02
  Coverage: 0.01
  Actionability: 0.79
  Recommendation: Expand content for 'Category_4' to improve relevance and topic coverage.
--------------------------


---
# **Understanding the Structure of the Output**

The output contains three main sections:
1. **Categories**: Lists topics grouped into different categories.
2. **Evaluation**: Analyzes the categories in terms of relevance, coverage, and actionability.
3. **Recommendations**: Provides advice on how to improve the content for each category.


---

### **1. Categories Section**

This section organizes related terms into groups (categories). Here’s what each part means:

#### **Example:**
```json
"categories": {
    "Category_1": [
        "home",
        "years",
        "journey",
        "crown",
        ...
    ],
    "Category_2": [
        "seo",
        "enigma",
        "recipe",
        "path",
        ...
    ],
    ...
}
```

#### **Explanation:**
1. **What are Categories?**
   - Categories group similar or related words/terms from the content.
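   - Example: In **Category_1**, terms like "home", "years", and "journey" are grouped because the categorization step placed them together. The labels are generic, so the shared theme (for example, website navigation or user experience) has to be interpreted by reading the terms.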

2. **Why Are They Grouped?**
   - Grouping terms helps identify key themes or topics covered in the content.
   - It makes it easier to see which topics are well-covered and which need improvement.

3. **How to Use This?**
   - Analyze the words in each category to determine if they align with the intended focus of your content.
   - If important words are missing from a category, it suggests that the content needs to be expanded.
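
The last check can be sketched in a few lines of Python. This is a minimal sketch and not part of the original pipeline: the function name `find_missing_terms` and the `expected_terms` list are hypothetical, and the sample `categories` dict only mirrors the shape of the `"categories"` section shown above.

```python
def find_missing_terms(categories, expected_terms):
    """Return the expected terms that appear in no category (case-insensitive)."""
    present = {term.lower() for terms in categories.values() for term in terms}
    return [term for term in expected_terms if term.lower() not in present]


# Sample data shaped like the "categories" section (terms taken from the example above).
categories = {
    "Category_1": ["home", "years", "journey", "crown"],
    "Category_2": ["seo", "enigma", "recipe", "path"],
}

# Hypothetical list of terms the site owner wants the content to cover.
expected_terms = ["SEO", "journey", "analytics"]

print(find_missing_terms(categories, expected_terms))
```

Any term the function returns is a candidate topic for new or expanded content.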

---

### **2. Evaluation Section**

This section evaluates each category based on three metrics: relevance, coverage, and actionability.

#### **Example:**
```json
"evaluation": {
    "Category_1": {
        "relevance": 0.011,
        "coverage": 0.010,
        "actionability": 0.803
    },
    ...
}
```

#### **Explanation:**
1. **Metrics Explained:**
   - **Relevance**:
     - Measures how important or meaningful the terms in this category are to your overall content goals.
     - A low score (e.g., 0.01) means the category is not very aligned with the website's focus.
   - **Coverage**:
     - Indicates how well the category's topics are addressed in the content.
     - A low score (e.g., 0.01) suggests that many related terms/topics are missing or underrepresented.
   - **Actionability**:
     - Measures how easy it is to take action to improve the category.
     - A high score (e.g., 0.80) means the category can be improved with relatively simple updates or additions.

2. **Why Are These Metrics Important?**
   - They help you understand the strengths and weaknesses of your content.
   - For example, a category with low relevance but high actionability means it’s easy to adjust the content to make it more relevant.

3. **How to Use This?**
   - Focus on categories with low relevance or coverage but high actionability. These are the easiest wins to improve your content.
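
One way to apply this filter is sketched below. The helper name `quick_wins` is hypothetical, and the default thresholds (`0.3` and `0.8`) are an assumption borrowed from the cut-offs used in `generate_recommendations`; adjust them to suit your own data.

```python
def quick_wins(evaluation, low=0.3, high=0.8):
    """Return categories with low relevance or coverage but high actionability."""
    return [
        category
        for category, m in evaluation.items()
        if (m["relevance"] < low or m["coverage"] < low) and m["actionability"] > high
    ]


# Values copied (rounded) from the evaluation summary printed earlier.
evaluation = {
    "Category_1": {"relevance": 0.01, "coverage": 0.01, "actionability": 0.81},
    "Category_2": {"relevance": 0.02, "coverage": 0.02, "actionability": 0.78},
    "Category_3": {"relevance": 0.05, "coverage": 0.04, "actionability": 0.80},
}

print(quick_wins(evaluation))
```

With these sample values only `Category_1` clears the actionability threshold, because the comparison is strict and `0.80` does not exceed `0.8`.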

---

### **3. Recommendations Section**

This section provides specific advice for improving each category.

#### **Example:**
```json
"recommendations": {
    "Category_1": "Expand content for 'Category_1' to improve relevance and topic coverage.",
    ...
}
```

#### **Explanation:**
1. **What Does This Mean?**
   - The recommendation highlights what needs to be done for each category.
   - Example: For **Category_1**, the advice is to "expand content" to address missing topics or make the category more relevant.

2. **Why Are These Recommendations Important?**
   - They guide content creators on where to focus their efforts for improvement.
   - Expanding content based on these recommendations ensures better coverage and relevance for your audience and search engines.

3. **How to Use This?**
   - For each category, review the associated words and identify areas where content can be added or enhanced.
   - Example: If **Category_1** contains words like "home" and "journey," consider adding content about user journeys on the homepage.
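
Every category in the summary received the same advice, and the order of the rules in `generate_recommendations` explains why. The sketch below restates those rules as a standalone function; the name `pick_rule` and the short return labels are illustrative and do not appear in the original script.

```python
def pick_rule(relevance, coverage, actionability):
    """Mirror the branch order used in generate_recommendations."""
    if relevance > 0.5 and coverage > 0.5:
        return "leverage"
    elif relevance < 0.3 and coverage < 0.3:
        return "expand"
    elif actionability > 0.8:
        return "quick_win"
    return "research"


# Category_1 from the summary: low relevance and coverage match the "expand"
# branch first, so the actionability check is never reached.
print(pick_rule(0.01, 0.01, 0.81))

# A hypothetical category with middling scores does reach the actionability check.
print(pick_rule(0.40, 0.40, 0.85))
```

Because the "expand" rule is tested before the actionability rule, a category with very low relevance and coverage gets the "Expand content" advice no matter how actionable it is.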

---

### **Evaluation Summary**

This summary provides a quick overview of the evaluation for all categories.

#### **Example:**
```plaintext
--- Evaluation Summary ---
Category: Category_1
  Relevance: 0.01
  Coverage: 0.01
  Actionability: 0.81
  Recommendation: Expand content for 'Category_1' to improve relevance and topic coverage.
...
```

#### **Explanation:**
1. **What Does It Summarize?**
   - Repeats the evaluation metrics for each category in a more readable format.
   - Adds a specific recommendation for each category.

2. **How Is This Useful?**
   - Provides an at-a-glance view of which categories need attention and what actions to take.

---

### **How This Helps the Website Owner**

1. **Content Improvement**:
   - Identifies gaps in the content and provides actionable advice to fill them.
   - Ensures that all important topics are well-covered and relevant.

2. **SEO Optimization**:
   - Helps boost relevance and coverage of important keywords, improving search engine rankings.
   - Enhances user engagement by addressing topics that matter to the audience.

3. **User Experience**:
   - Ensures that content is comprehensive and easy to navigate, leading to higher user satisfaction.

---

### **Key Takeaways**

1. **Purpose of Output**:
   - To analyze website content and provide guidance for improvement.
   - Focuses on expanding relevant content and improving internal structure.

2. **Action Steps**:
   - Review each category and its terms.
   - Follow the recommendations to expand content or improve relevance and coverage.

3. **Benefits**:
   - Better SEO rankings, improved user engagement, and a more comprehensive website.



---
# **Part 5: Enhanced Dynamic Metadata Recommendations**
### **File Name:** `enhanced_dynamic_metadata_recommendations_v2.py`

### **Purpose:**
This part generates **dynamic metadata** for webpages, including optimized meta titles, descriptions, schema markup, and recommendations.

### **Key Functions:**
1. **`summarize_content`**:
   - Creates SEO-friendly meta descriptions using summarization models.

2. **`detect_author`**:
   - Dynamically detects the author or organization name.

3. **`analyze_keywords`**:
   - Identifies missing keywords for better SEO optimization.

4. **`generate_dynamic_metadata`**:
   - Generates:
     - **Meta titles**: Concise, clickable headlines.
     - **Meta descriptions**: Summarized page content.
     - **Schema markup**: Structured data for search engines.
     - **Recommendations**: Actionable insights for improving metadata.

5. **`save_metadata`**:
   - Saves metadata and recommendations to JSON and CSV files.

### **Output:**
- Files: `enhanced_metadata_recommendations_v2.json` and `enhanced_metadata_recommendations_v2.csv`
- These contain metadata and SEO suggestions for each webpage.
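
Each record in the JSON file uses the keys returned by `generate_dynamic_metadata` (`URL`, `MetaTitle`, `MetaDescription`, `SchemaMarkup`, `Recommendations`). As a minimal sketch of how the file could be reviewed afterwards, the hypothetical helper below flattens the nested `Recommendations` dictionaries into simple rows. The sample record is invented for illustration.

```python
def flatten_recommendations(metadata_list):
    """Turn each page's Recommendations dict into (URL, action, advice) rows."""
    rows = []
    for page in metadata_list:
        for action, advice in page.get("Recommendations", {}).items():
            rows.append((page.get("URL", "Unknown URL"), action, advice))
    return rows


# Hypothetical record using the keys that generate_dynamic_metadata returns.
sample = [
    {
        "URL": "https://example.com/",
        "MetaTitle": "Example page",
        "MetaDescription": "A short description.",
        "SchemaMarkup": {},
        "Recommendations": {
            "Add Testimonials": "Include testimonials to build credibility.",
        },
    }
]

for row in flatten_recommendations(sample):
    print(row)
```

To use it on the real output, load `enhanced_metadata_recommendations_v2.json` with `json.load` and pass the resulting list to the function.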

---


In [None]:
import json
import logging
import pandas as pd
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Configure logging for progress tracking and debugging
# Logs provide visibility into the code execution, making it easier to trace issues.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

# Load the semantic similarity model for analyzing the context of text.
# This model is pre-trained and converts sentences into numerical embeddings for similarity checks.
semantic_model = SentenceTransformer("all-MiniLM-L6-v2")

# Load the summarization model to create concise summaries of web content.
# This advanced model helps generate meaningful and SEO-optimized descriptions.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def load_cleaned_content(json_file):
    """
    Load cleaned content from a JSON file.

    Purpose:
    - Fetch structured data from a file containing the URLs, titles, and content of web pages.
    - Ensure the data is ready for further processing to generate metadata.

    Args:
        json_file (str): Path to the JSON file containing cleaned web page content.

    Returns:
        list: A list of dictionaries where each dictionary contains URL, Title, and Content fields.
    """
    try:
        with open(json_file, "r", encoding="utf-8") as f:
            data = json.load(f)
        logging.info(f"Loaded cleaned content from {json_file}.")
        return data
    except Exception as e:
        logging.error(f"Error loading cleaned content: {e}")
        return []  # Return an empty list if loading fails.

def summarize_content(content, max_length=100):
    """
    Generate a concise summary of the web page content.

    Purpose:
    - Create an SEO-friendly meta description that provides a quick overview of the page's content.
    - Summaries improve click-through rates on search engines.

    Args:
        content (str): Full text content of a web page.
        max_length (int): Maximum summary length in model tokens (not characters).
            The truncation fallback reuses it as a character limit.

    Returns:
        str: A summarized version of the content.
    """
    try:
        # Use the summarization model to create a concise description.
        summary = summarizer(content, max_length=max_length, min_length=50, do_sample=False)[0]["summary_text"]
        return summary
    except Exception as e:
        # If the summarization model fails, fallback to truncating the content.
        logging.error(f"Summarization failed: {e}")
        return content[:max_length] + "..."

def detect_author(content):
    """
    Detect the author or organization name from the content.

    Purpose:
    - Dynamically identify the entity responsible for the web page, making metadata more authentic.

    Args:
        content (str): The full text content of the web page.

    Returns:
        str: The name of the author or organization.
    """
    # Basic logic to detect the organization name, extendable with NLP for more accuracy.
    if "Thatware" in content:
        return "Thatware.co"  # Return organization name if found in the content.
    return "Unknown Author"  # Default value if no author is identified.

def analyze_keywords(content):
    """
    Analyze the content for important keywords.

    Purpose:
    - Identify missing keywords that could improve the page's ranking in search results.
    - Suggest adding high-ranking keywords to optimize the content.

    Args:
        content (str): Full text content of the web page.

    Returns:
        list: List of missing keywords to be added.
    """
    # Predefined list of important keywords relevant to the domain.
    important_keywords = ["SEO", "digital marketing", "services", "rankings"]
    # Compare in lowercase on both sides; otherwise "SEO" can never match the lowercased content.
    content_lower = content.lower()
    missing_keywords = [kw for kw in important_keywords if kw.lower() not in content_lower]
    return missing_keywords  # Return keywords that are not found in the content.

def generate_recommendations(content):
    """
    Generate actionable recommendations for improving the web page.

    Purpose:
    - Suggest specific actions like adding testimonials, expanding content, or optimizing keywords.

    Args:
        content (str): Full text content of the web page.

    Returns:
        dict: A dictionary of improvement recommendations.
    """
    recommendations = {}

    # Suggest adding testimonials if the page mentions "services."
    if "services" in content.lower():
        recommendations["Add Testimonials"] = "Include testimonials to build credibility."

    # Suggest expanding content if it is too short.
    if len(content) < 200:
        recommendations["Expand Content"] = "Consider adding more details to improve engagement."

    # Analyze and suggest missing keywords.
    missing_keywords = analyze_keywords(content)
    if missing_keywords:
        recommendations["Optimize Keywords"] = f"Add these keywords: {', '.join(missing_keywords)}."

    return recommendations

def generate_dynamic_metadata(content):
    """
    Dynamically create metadata for a web page.

    Purpose:
    - Automate the creation of metadata elements like titles, descriptions, schema markup, and recommendations.

    Args:
        content (dict): A dictionary containing URL, Title, and Content fields.

    Returns:
        dict: A dictionary containing metadata and improvement recommendations.
    """
    url = content.get("URL", "Unknown URL")
    raw_title = content.get("Title", "Untitled")
    raw_content = content.get("Content", "")

    # Create a meta title that fits within SEO limits (usually 60 characters),
    # reserving three characters for the ellipsis when truncating.
    meta_title = (raw_title[:57].rstrip() + "...") if len(raw_title) > 60 else raw_title

    # Generate a meta description using the summarization function.
    meta_description = summarize_content(raw_content)

    # Detect the author or organization name from the content.
    author_name = detect_author(raw_content)

    # Create structured data (schema markup) for search engines.
    schema = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "name": meta_title,
        "description": meta_description,
        "author": {
            "@type": "Organization",
            "name": author_name
        }
    }

    # Generate specific recommendations for improving the page.
    recommendations = generate_recommendations(raw_content)

    return {
        "URL": url,
        "MetaTitle": meta_title,
        "MetaDescription": meta_description,
        "SchemaMarkup": schema,
        "Recommendations": recommendations
    }

def save_metadata(metadata_list, json_file, csv_file):
    """
    Save metadata and recommendations to files.

    Purpose:
    - Store results in JSON and CSV formats for review and implementation.

    Args:
        metadata_list (list): List of dictionaries containing metadata and recommendations.
        json_file (str): Path to save the JSON file.
        csv_file (str): Path to save the CSV file.
    """
    try:
        # Save metadata to a JSON file for easy readability.
        with open(json_file, "w", encoding="utf-8") as f:
            json.dump(metadata_list, f, ensure_ascii=False, indent=4)
        logging.info(f"Metadata saved to JSON: {json_file}")

        # Save metadata to a CSV file for easy tabular representation.
        df = pd.DataFrame(metadata_list)
        df.to_csv(csv_file, index=False)
        logging.info(f"Metadata saved to CSV: {csv_file}")
    except Exception as e:
        logging.error(f"Error saving metadata: {e}")

def display_metadata(metadata_list):
    """
    Display metadata and recommendations in the console.

    Purpose:
    - Quickly verify the generated metadata for accuracy and completeness.

    Args:
        metadata_list (list): List of dictionaries containing metadata and recommendations.
    """
    print("\n--- Metadata Recommendations ---")
    for metadata in metadata_list:
        print(f"URL: {metadata['URL']}")
        print(f"Meta Title: {metadata['MetaTitle']}")
        print(f"Meta Description: {metadata['MetaDescription']}")
        print(f"Schema Markup: {json.dumps(metadata['SchemaMarkup'], indent=2)}")
        print(f"Recommendations: {metadata['Recommendations']}")
        print("--------------------------------")

if __name__ == "__main__":
    # File paths for input and output.
    input_json = "dynamic_content_output.json"
    output_json = "enhanced_metadata_recommendations_v2.json"
    output_csv = "enhanced_metadata_recommendations_v2.csv"

    # Step 1: Load the cleaned content from the input JSON file.
    cleaned_content = load_cleaned_content(input_json)
    if not cleaned_content:
        logging.error("No cleaned content loaded. Exiting.")
        exit(1)

    # Step 2: Generate metadata dynamically for each content item.
    metadata_list = [generate_dynamic_metadata(item) for item in cleaned_content]

    # Step 3: Save metadata and recommendations to JSON and CSV files.
    save_metadata(metadata_list, output_json, output_csv)

    # Step 4: Display the metadata in the console for review.
    display_metadata(metadata_list)





--- Metadata Recommendations ---
URL: https://thatware.co/
Meta Title: THATWARE - Revolutionizing SEO with Hyper-Intelligence
Meta Description: Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms. Utilizing 927 proprietary AI algorithms, our SEO strategy implementation delivers improved SERP results. Google undergoes over 5000 changes to its algorithm each year. We leverage AI to empower our clients to seamlessly adapt to Googles algorithmic shifts.
Schema Markup: {
  "@context": "https://schema.org",
  "@type": "WebPage",
  "url": "https://thatware.co/",
  "name": "THATWARE - Revolutionizing SEO with Hyper-Intelligence",
  "description": "Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms. Utilizing 927 proprietary AI algorithms, our SEO strategy implementation delivers improved SERP results. Google undergoes over 5000 changes to its algorithm each year. We leverage AI to empower our clients to seamlessly adapt to G

---

### **What is Schema Markup?**
Schema Markup is a form of structured data written in a specific format (like JSON-LD) that helps search engines understand the content of your webpage better. It’s essentially a way to provide additional details about your content in a structured manner so that search engines can use it to create enhanced search results, like rich snippets.

---

### **Breaking Down the Schema Markup Example**

Let’s take this Schema Markup as an example:

```json
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "url": "https://thatware.co/",
  "name": "THATWARE - Revolutionizing SEO with Hyper-Intelligence",
  "description": "Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms. Utilizing 927 proprietary AI algorithms, our SEO strategy implementation delivers improved SERP results. Google undergoes over 5000 changes to its algorithm each year. We leverage AI to empower our clients to seamlessly adapt to Googles algorithmic shifts.",
  "author": {
    "@type": "Organization",
    "name": "Thatware.co"
  }
}
```

Here’s what each part means:

- **`@context`:**
  - This defines the context of the data.
  - For Schema Markup, it’s always `"https://schema.org"`, which tells search engines that the data format adheres to Schema.org standards.

- **`@type`:**
  - This specifies the type of entity being described.
  - In this example, it’s `"WebPage"`, which means this schema is describing a webpage.

- **`url`:**
  - The full URL of the page this Schema Markup applies to.
  - Example: `"https://thatware.co/"`.

- **`name`:**
  - The title of the webpage.
  - Example: `"THATWARE - Revolutionizing SEO with Hyper-Intelligence"`.

- **`description`:**
  - A short description of the webpage’s content, usually the same as the meta description.
  - Example: `"Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms..."`.

- **`author`:**
  - Specifies who created the content of the webpage.
  - Example: `"Thatware.co"` as the organization responsible.
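
As a quick sanity check before publishing, the fields described above can be verified programmatically. The sketch below is illustrative only: `EXPECTED_FIELDS` lists the fields this project emits, not a set that Schema.org itself requires, and `missing_schema_fields` is a hypothetical helper.

```python
# Fields produced by this project's schema (an assumption, not a Schema.org rule).
EXPECTED_FIELDS = ("@context", "@type", "url", "name", "description", "author")


def missing_schema_fields(schema):
    """Return the expected fields that are absent or empty in the schema dict."""
    return [field for field in EXPECTED_FIELDS if not schema.get(field)]


# The example schema from above with the description deliberately left out.
schema = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://thatware.co/",
    "name": "THATWARE - Revolutionizing SEO with Hyper-Intelligence",
    "author": {"@type": "Organization", "name": "Thatware.co"},
}

print(missing_schema_fields(schema))
```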

---

### **How to Implement Schema Markup on Your Website**

1. **Generate the Schema Code:**
   - Use tools like [Google’s Structured Data Markup Helper](https://www.google.com/webmasters/markup-helper/) or manually write the JSON-LD code, as shown in the example.

2. **Embed the Code:**
   - Add the JSON-LD Schema Markup into the `<head>` section or the body of your HTML code.
   - Example:
     ```html
     <script type="application/ld+json">
     {
       "@context": "https://schema.org",
       "@type": "WebPage",
       "url": "https://thatware.co/",
       "name": "THATWARE - Revolutionizing SEO with Hyper-Intelligence",
       "description": "Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms...",
       "author": {
         "@type": "Organization",
         "name": "Thatware.co"
       }
     }
     </script>
     ```

3. **Test the Schema Markup:**
   - Use Google’s [Rich Results Test](https://search.google.com/test/rich-results) to validate the Schema Markup and ensure there are no errors.

4. **Deploy to Live Website:**
   - Once validated, upload the updated HTML to your web server.
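
When the schema is generated in Python, as in this project, the embedding step can be automated. The function below is a minimal sketch under that assumption: `to_jsonld_script` is a hypothetical helper, and how the returned string gets into the page template depends on your site setup.

```python
import json


def to_jsonld_script(schema):
    """Serialize a schema dict and wrap it in a JSON-LD <script> tag."""
    payload = json.dumps(schema, ensure_ascii=False, indent=2)
    # Escape "</" so text inside the payload cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


schema = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://thatware.co/",
    "name": "THATWARE - Revolutionizing SEO with Hyper-Intelligence",
    "author": {"@type": "Organization", "name": "Thatware.co"},
}

print(to_jsonld_script(schema))
```

The printed snippet can be pasted into the `<head>` of the page and then checked with the Rich Results Test as described in step 3.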

---

### **Benefits of Schema Markup**

1. **Improved Search Engine Understanding:**
   - Schema Markup provides detailed information about the content of your webpage. For example, it can tell Google that the webpage is an article, product, or FAQ.

2. **Rich Snippets:**
   - Proper Schema Markup can result in enhanced search results, such as:
     - Ratings (e.g., stars for a product or service).
     - FAQs displayed directly in search results.
     - Additional links or details like event dates or prices.

3. **Higher Click-Through Rates (CTR):**
   - Rich snippets stand out on the search engine results page (SERP), making users more likely to click on your link.

4. **Increased Visibility:**
   - Rich snippets take up more space and stand out on the results page, so a listing gets noticed more even when its ranking position does not change.

5. **Better Voice Search Results:**
   - Schema Markup helps with answering questions in voice searches by providing clear, structured data.

---

### **How It Benefits Website Owners**

- **Better User Engagement:**
  - Users are more likely to visit a site that appears informative and trustworthy in search results.

- **Competitive Advantage:**
  - Listings enhanced by Schema Markup can draw attention away from competitors who aren’t using it, even at a similar ranking position.

- **Tracking and Insights:**
  - Rich results produced by some schema types, like those for reviews, can be monitored in tools such as Google Search Console, which helps show how users respond to your listings.

---

### **How Schema Markup Impacts SEO**

1. **Indirect SEO Benefit:**
   - Schema Markup itself doesn’t directly increase rankings. It improves how a listing looks in search results, which can lead to higher CTRs and more traffic.

2. **Enhanced Crawling:**
   - Search engines can better understand and index your content, potentially boosting relevance for certain keywords.

3. **Local SEO:**
   - For local businesses, implementing schema types like `LocalBusiness` or `Organization` can help in appearing in local packs.
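
For illustration, a `LocalBusiness` schema can be built the same way as the `WebPage` schema in this project. Every business detail below is a placeholder, and the choice of properties (`telephone`, `address` with a nested `PostalAddress`) is one reasonable selection from Schema.org rather than a required set.

```python
import json

# Placeholder business details; replace them with real values before publishing.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bakery",
    "url": "https://example.com/",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example Street",
        "addressLocality": "Springfield",
        "postalCode": "00000",
        "addressCountry": "US",
    },
}

print(json.dumps(local_business, indent=2))
```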

---

### **In Summary**
The Schema Markup in the output is a structured way to give search engines additional details about your webpage. By implementing it:

- You help search engines understand your content better.
- Your webpage becomes eligible for rich snippets, improving its visibility and CTR.
- You gain a competitive edge in how your listing appears in search results.



# **Ontological SEO with Topic Maps Model**

In [None]:
# File: dynamic_ontological_scraper.py

import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import json
from tabulate import tabulate


def clean_and_summarize_text(text, max_length=3500):
    """
    Cleans and summarizes the extracted text by:
    - Removing unnecessary spaces and special characters.
    - Limiting the length while retaining essential context.
    - Ensuring readability for detailed analysis.

    Args:
        text (str): Raw text extracted from the webpage.
        max_length (int): Maximum length for the summary.

    Returns:
        str: Cleaned and summarized text.
    """
    text = re.sub(r'\s+', ' ', text)  # Replace multiple spaces/newlines with a single space
    text = re.sub(r'[^\w\s.,-]', '', text)  # Remove special characters except punctuation
    text = text.strip()  # Trim leading and trailing spaces
    return text[:max_length] + "..." if len(text) > max_length else text


def extract_content(url):
    """
    Dynamically extracts the title and meaningful content from a webpage.
    - Targets headers (h1, h2, h3) and paragraphs (p) for hierarchical context.
    - Filters out irrelevant sections like menus, footers, and sidebars.

    Args:
        url (str): URL of the webpage.

    Returns:
        tuple: Title of the page and cleaned, summarized content.
    """
    try:
        # Step 1: Fetch the webpage content
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise exception for failed requests
        soup = BeautifulSoup(response.text, 'html.parser')

        # Step 2: Extract the page title
        # soup.title.string is None when <title> is empty or contains nested tags.
        title = soup.title.string if soup.title and soup.title.string else "No Title Found"

        # Step 3: Extract relevant headers and paragraphs
        content = []
        for tag in soup.find_all(['h1', 'h2', 'h3', 'p']):
            # Exclude sections with known irrelevant classes/IDs (e.g., menus, footers)
            if tag.get('class') and ('menu' in tag.get('class') or 'footer' in tag.get('class')):
                continue
            if tag.get('id') and ('menu' in tag.get('id') or 'footer' in tag.get('id')):
                continue

            # Append the cleaned text content
            content.append(tag.get_text(strip=True))

        # Combine and summarize the content
        title = clean_and_summarize_text(title)
        content = clean_and_summarize_text(" ".join(content), max_length=3500)

        return title, content

    except Exception as e:
        return "Error", f"Failed to fetch content from {url}: {e}"


def save_to_csv(data, filename="output.csv"):
    """
    Saves the extracted data into a CSV file.

    Args:
        data (list): List of dictionaries containing the data.
        filename (str): Name of the output CSV file.
    """
    try:
        df = pd.DataFrame(data)
        df.to_csv(filename, index=False)
        print(f"Content saved to CSV file: {filename}")
    except Exception as e:
        print(f"Error saving to CSV: {e}")


def save_to_json(data, filename="output.json"):
    """
    Saves the extracted data into a JSON file.

    Args:
        data (list): List of dictionaries containing the data.
        filename (str): Name of the output JSON file.
    """
    try:
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=4)
        print(f"Content saved to JSON file: {filename}")
    except Exception as e:
        print(f"Error saving to JSON: {e}")


def process_urls(url_list, csv_file, json_file):
    """
    Processes multiple URLs to extract meaningful content and saves the results to CSV and JSON files.

    Args:
        url_list (list): List of URLs to scrape.
        csv_file (str): Path to the output CSV file.
        json_file (str): Path to the output JSON file.

    Returns:
        None
    """
    data = []

    for url in url_list:
        print(f"Processing URL: {url}")  # Log the current URL being processed
        title, content = extract_content(url)
        data.append({"URL": url, "Title": title, "Content": content})

    # Save the extracted data to CSV and JSON files
    save_to_csv(data, csv_file)
    save_to_json(data, json_file)

    # Display a preview of the extracted data
    print("\n--- Preview of Extracted Content ---")
    print(tabulate(pd.DataFrame(data).head(10), headers="keys", tablefmt="grid"))


if __name__ == "__main__":
    # List of URLs to scrape
    urls = [
        'https://thatware.co/',
        'https://thatware.co/advanced-seo-services/',
        'https://thatware.co/digital-marketing-services/',
        'https://thatware.co/business-intelligence-services/',
        'https://thatware.co/link-building-services/',
        'https://thatware.co/branding-press-release-services/',
        'https://thatware.co/conversion-rate-optimization/',
        'https://thatware.co/social-media-marketing/',
        'https://thatware.co/content-proofreading-services/',
        'https://thatware.co/website-design-services/',
        'https://thatware.co/web-development-services/',
        'https://thatware.co/app-development-services/',
        'https://thatware.co/website-maintenance-services/',
        'https://thatware.co/bug-testing-services/',
        'https://thatware.co/software-development-services/',
        'https://thatware.co/competitor-keyword-analysis/',
    ]

    # Output files
    csv_output_file = "dynamic_content_output.csv"
    json_output_file = "dynamic_content_output.json"

    # Run the scraping process
    process_urls(urls, csv_output_file, json_output_file)



# File: advanced_topic_map_generator_resolved.py

!pip install rdflib

import os
import json
import logging
from collections import defaultdict
import spacy  # Natural Language Processing library for topic extraction
from rdflib import Graph, Literal, URIRef, Namespace  # For generating RDF/Turtle format

# Configuring logging to monitor the process
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

def load_cleaned_data(file_path):
    """
    Load content data from a JSON file.

    Purpose:
    - Ensure the input file exists and is readable.
    - Extract URLs and their corresponding content.

    Args:
        file_path (str): Path to the JSON file containing web content.

    Returns:
        dict: A dictionary mapping URLs to their content, or an empty dictionary if issues occur.
    """
    try:
        if not os.path.exists(file_path):
            logging.error(f"File not found: {file_path}")
            return {}
        with open(file_path, "r", encoding="utf-8") as file:
            data = json.load(file)
        # Return a dictionary of valid URLs and content
        return {item["URL"]: item["Content"] for item in data if "URL" in item and "Content" in item}
    except Exception as e:
        logging.error(f"Error loading data: {e}")
        return {}

def sanitize_topic_name(topic):
    """
    Clean and validate topic names.

    Purpose:
    - Remove invalid characters and ensure non-empty topic names.
    - Make topic names machine-readable (e.g., replacing spaces with underscores).

    Args:
        topic (str): Raw topic name to sanitize.

    Returns:
        str: Sanitized topic name, or None if the topic is invalid.
    """
    sanitized = topic.strip().replace(" ", "_").replace(".", "").replace("-", "_").replace("#", "").lower()
    return sanitized if sanitized else None

def extract_topics_with_spacy(content):
    """
    Extract important topics from text content using spaCy.

    Purpose:
    - Identify nouns as potential topics from the input text.
    - Sanitize the topics for RDF compatibility.

    Args:
        content (str): Text content to analyze.

    Returns:
        list: List of valid, sanitized topic names.
    """
    try:
        nlp = spacy.load("en_core_web_sm")
        doc = nlp(content.lower())  # Normalize text to lowercase
        # Extract nouns and sanitize each one
        topics = [sanitize_topic_name(token.text) for token in doc if token.pos_ == "NOUN"]
        return list(filter(None, topics))  # Remove invalid entries
    except Exception as e:
        logging.error(f"Error extracting topics: {e}")
        return []

def build_topic_map(cleaned_data):
    """
    Build a structured map of topics and their relationships.

    Purpose:
    - Identify relationships between topics based on their order in the text.
    - Track frequency and add metadata (e.g., source URLs).

    Args:
        cleaned_data (dict): Dictionary mapping URLs to content.

    Returns:
        dict: A structured topic map containing topics, relationships, and metadata.
    """
    topic_map = defaultdict(lambda: {"related": set(), "frequency": 0, "sourceURLs": set()})

    for url, content in cleaned_data.items():
        topics = extract_topics_with_spacy(content)  # Extract topics from the content
        if not topics:
            continue
        for i, topic in enumerate(topics):
            if topic:
                topic_map[topic]["frequency"] += 1  # Increment frequency
                topic_map[topic]["sourceURLs"].add(url)  # Add source URL
                # Add relationships to adjacent topics
                if i > 0:
                    topic_map[topics[i - 1]]["related"].add(topic)
                if i < len(topics) - 1:
                    topic_map[topic]["related"].add(topics[i + 1])

    # Convert relationships from sets to lists and enrich metadata
    return {
        key: {
            "related": list(value["related"]),
            "frequency": value["frequency"],
            "sourceURLs": list(value["sourceURLs"]),
            "importance": "high" if value["frequency"] > 5 else "low"  # Add derived metadata
        }
        for key, value in topic_map.items() if key != "_"
    }  # Exclude invalid topics like "_"

def save_topic_map_as_json(topic_map, file_path):
    """
    Save the topic map in JSON format.

    Args:
        topic_map (dict): The topic map to save.
        file_path (str): Path to the output JSON file.
    """
    try:
        with open(file_path, "w", encoding="utf-8") as file:
            json.dump(topic_map, file, ensure_ascii=False, indent=4)
        logging.info(f"Topic map saved as JSON: {file_path}")
    except Exception as e:
        logging.error(f"Error saving topic map as JSON: {e}")

def save_topic_map_as_rdf(topic_map, file_path, base_url="http://thatware.co/topic_map#"):
    """
    Save the topic map in RDF/Turtle format.

    Args:
        topic_map (dict): The topic map to save.
        file_path (str): Path to the output Turtle file.
        base_url (str): Base namespace URL for the RDF data.
    """
    try:
        g = Graph()
        ex = Namespace(base_url)
        g.bind("ex", ex)

        for topic, metadata in topic_map.items():
            topic_node = URIRef(f"{base_url}{topic}")
            g.add((topic_node, ex.type, Literal("Topic")))
            g.add((topic_node, ex.frequency, Literal(metadata["frequency"])))
            for url in metadata["sourceURLs"]:
                g.add((topic_node, ex.sourceURL, Literal(url)))
            for related_topic in metadata["related"]:
                related_node = URIRef(f"{base_url}{related_topic}")
                g.add((topic_node, ex.hasRelatedTopic, related_node))
        g.serialize(destination=file_path, format="turtle")
        logging.info(f"Topic map saved as RDF: {file_path}")
    except Exception as e:
        logging.error(f"Error saving topic map as RDF: {e}")

if __name__ == "__main__":
    input_file_path = "dynamic_content_output.json"
    cleaned_data = load_cleaned_data(input_file_path)

    if not cleaned_data:
        logging.error("No valid data loaded. Exiting.")
        raise SystemExit(1)  # Portable exit; exit() is an interactive helper

    topic_map = build_topic_map(cleaned_data)
    save_topic_map_as_json(topic_map, "final_topic_map.json")
    save_topic_map_as_rdf(topic_map, "final_topic_map.ttl")

    # Print a JSON preview for verification
    print(json.dumps(topic_map, indent=4))
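
The adjacency rule in `build_topic_map` can be tried without spaCy. The sketch below is a simplified stand-in, not the function above: it takes an already extracted list of nouns (the sample list is made up) and links each noun to the one that follows it, counting frequency along the way. The name `link_adjacent` is hypothetical.

```python
from collections import defaultdict


def link_adjacent(topics):
    """Link each topic to the topic that follows it and count occurrences."""
    topic_map = defaultdict(lambda: {"related": set(), "frequency": 0})
    for i, topic in enumerate(topics):
        topic_map[topic]["frequency"] += 1
        if i < len(topics) - 1:
            topic_map[topic]["related"].add(topics[i + 1])
    # Sort the sets so the result is deterministic and JSON-friendly.
    return {
        key: {"related": sorted(value["related"]), "frequency": value["frequency"]}
        for key, value in topic_map.items()
    }


# Made-up noun sequence, as if extracted from one page.
sample = ["seo", "services", "content", "seo", "strategy"]
result = link_adjacent(sample)
print(result["seo"])
```

Both branches in `build_topic_map` add an edge from a topic to its successor, so the relation is directional: in this sample `strategy` ends up with no related topics because nothing follows it.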



import json
import logging
from rdflib import Graph, Namespace, URIRef, Literal
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
import random

# Configure logging for debugging purposes. This will help us trace any issues in the code.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

# Load a pre-trained semantic similarity model. This model converts text into numerical embeddings
# that capture the contextual meaning of words for similarity calculations.
model = SentenceTransformer("all-MiniLM-L6-v2")

def calculate_similarity(source, target):
    """
    Calculate semantic similarity between two topics using embeddings.
    - This function is crucial to identify relationships between topics.
    - It converts the source and target into embeddings and computes their cosine similarity.

    Returns:
    - A cosine similarity score in [-1, 1]; values close to 1 indicate closely related topics.
    """
    embeddings = model.encode([source, target])
    similarity = cosine_similarity([embeddings[0]], [embeddings[1]])[0, 0]
    logging.debug(f"Calculated similarity between '{source}' and '{target}': {similarity:.2f}")
    return similarity

def is_relevant_topic(topic, irrelevant_terms):
    """
    Check if a topic is relevant for content expansion.
    - Filters out irrelevant topics based on a predefined list of terms.
    - Generates a random engagement score to simulate topic priority.

    Returns:
    - True if the topic is relevant, False otherwise.
    """
    engagement_score = random.uniform(0.5, 1)  # Simulate relevance dynamically
    is_relevant = topic.lower() not in irrelevant_terms and engagement_score > 0.6
    logging.debug(f"Topic '{topic}' is relevant: {is_relevant} (Engagement score: {engagement_score:.2f})")
    return is_relevant

def is_valid_link(source, target, thresholds=(0.4, 0.5, 0.6)):
    """
    Validate if an internal link between two topics is meaningful.
    - Prevents linking a topic to itself.
    - Accepts the link if similarity exceeds any threshold, so the lowest threshold decides.

    Returns:
    - True if the similarity exceeds any of the thresholds, False otherwise.
    """
    if source == target:
        logging.debug(f"Skipped linking '{source}' to itself.")
        return False
    similarity = calculate_similarity(source, target)
    valid = any(similarity > threshold for threshold in thresholds)
    logging.debug(f"Link valid between '{source}' and '{target}': {valid} (Similarity: {similarity:.2f})")
    return valid

def fallback_related_topics(topic, all_topics):
    """
    Generate fallback related topics when no explicit links are provided.
    - Uses similarity to find alternative topics.

    Returns:
    - A list of fallback related topics.
    """
    fallback = [other for other in all_topics if topic != other and calculate_similarity(topic, other) > 0.3]
    logging.debug(f"Fallback topics for '{topic}': {fallback}")
    return fallback

def parse_turtle_to_dict(turtle_file, base_url="http://thatware.co/topic_map#"):
    """
    Parse an RDF Turtle file and convert it into a structured dictionary.
    - Helps process the input data for content expansion and internal linking.

    Returns:
    - A dictionary of topics and their metadata.
    """
    try:
        g = Graph()
        g.parse(turtle_file, format="turtle")
        ex = Namespace(base_url)
        topic_map = {}
        for s in g.subjects(predicate=ex.type, object=Literal("Topic")):
            topic = s.split("#")[-1]
            topic_map[topic] = {
                "frequency": int(next(g.objects(subject=s, predicate=ex.frequency), Literal(0))),
                "related": [o.split("#")[-1] for o in g.objects(subject=s, predicate=ex.hasRelatedTopic)],
            }
        logging.info(f"Parsed {len(topic_map)} topics from the Turtle file.")
        return topic_map
    except Exception as e:
        logging.error(f"Error parsing RDF: {e}")
        return {}

def generate_seo_insights(topic_map, irrelevant_terms):
    """
    Generate suggestions for content expansion and internal linking.
    - This is the core logic for the Ontological SEO model.

    Returns:
    - A dictionary containing content expansion and internal linking suggestions.
    """
    seo_insights = {"content_expansion": [], "internal_links": []}
    all_topics = list(topic_map.keys())

    for topic, metadata in topic_map.items():
        frequency = metadata.get("frequency", 0)
        related_topics = metadata.get("related", []) or fallback_related_topics(topic, all_topics)

        # Content Expansion Suggestions
        if is_relevant_topic(topic, irrelevant_terms) and (frequency < 3 or len(related_topics) < 2):
            suggestion = f"Expand content for topic '{topic}' by exploring aspects like {', '.join(related_topics[:3])}."
            seo_insights["content_expansion"].append(suggestion)

        # Internal Linking Suggestions
        for related_topic in related_topics:
            if is_valid_link(topic, related_topic):
                suggestion = f"Add an internal link from '{topic}' to '{related_topic}' for better navigation."
                seo_insights["internal_links"].append(suggestion)

    logging.info(f"Generated {len(seo_insights['content_expansion'])} content expansion suggestions.")
    logging.info(f"Generated {len(seo_insights['internal_links'])} internal linking suggestions.")
    return seo_insights

def save_insights_and_display(seo_insights, json_file, turtle_file, base_url="http://thatware.co/seo_insights#"):
    """
    Save SEO insights to JSON and Turtle formats and display them in the console.
    - This step ensures the results are accessible and easy to preview.
    """
    # Save as JSON
    with open(json_file, "w", encoding="utf-8") as file:
        json.dump(seo_insights, file, ensure_ascii=False, indent=4)
    logging.info(f"Insights saved to JSON file: {json_file}")

    # Save as Turtle
    g = Graph()
    ex = Namespace(base_url)
    g.bind("ex", ex)

    for suggestion in seo_insights["content_expansion"]:
        g.add((URIRef(f"{base_url}content_expansion"), ex.insight, Literal(suggestion)))

    for suggestion in seo_insights["internal_links"]:
        g.add((URIRef(f"{base_url}internal_links"), ex.insight, Literal(suggestion)))

    g.serialize(destination=turtle_file, format="turtle")
    logging.info(f"Insights saved to Turtle file: {turtle_file}")

    # Display insights in the console
    print("\nContent Expansion Suggestions:")
    for suggestion in seo_insights["content_expansion"]:
        print(f"- {suggestion}")

    print("\nInternal Linking Suggestions:")
    for suggestion in seo_insights["internal_links"]:
        print(f"- {suggestion}")

if __name__ == "__main__":
    # Define irrelevant terms
    irrelevant_terms = {"angles", "bin", "agencywe"}

    # File paths
    turtle_file = "final_topic_map.ttl"
    json_file = "seo_insights.json"
    turtle_file_out = "seo_insights.ttl"

    # Parse the input file and generate insights
    topic_map = parse_turtle_to_dict(turtle_file)
    if topic_map:
        seo_insights = generate_seo_insights(topic_map, irrelevant_terms)
        save_insights_and_display(seo_insights, json_file, turtle_file_out)
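
Because `is_valid_link` accepts a link when the similarity exceeds any threshold, the check reduces to a comparison against the smallest threshold. The sketch below illustrates this with a pure-Python cosine similarity on made-up vectors, so it runs without downloading the sentence-transformers model. The names `cosine` and `passes_any` are illustrative and not part of the project code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def passes_any(similarity, thresholds=(0.4, 0.5, 0.6)):
    """Same rule as is_valid_link: True if any threshold is exceeded."""
    return any(similarity > t for t in thresholds)


# Made-up 3-D "embeddings" for illustration only.
u = [1.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]  # 45 degrees away from u
w = [0.0, 1.0, 0.0]  # orthogonal to u

for name, vec in [("v", v), ("w", w)]:
    sim = cosine(u, vec)
    # The last two columns always agree: any() over thresholds == compare to min().
    print(name, round(sim, 3), passes_any(sim), sim > min((0.4, 0.5, 0.6)))
```

If stricter linking is wanted, passing a single higher threshold (for example `thresholds=(0.6,)`) would be one way to tighten the rule.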



# File: content_categorization_and_recommendations.py

import json
import logging
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer
from collections import defaultdict

# Configure logging for detailed debugging
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

# Pre-trained model for semantic embeddings to analyze topic similarities
model = SentenceTransformer("all-MiniLM-L6-v2")

def load_topic_map(json_file):
    """
    Load the topic map generated in Part 3 from a JSON file.

    Purpose:
    - Access the topics and metadata required for categorization and evaluation.

    Args:
        json_file (str): Path to the JSON file containing the topic map.

    Returns:
        dict: Dictionary representation of the topic map.
    """
    try:
        with open(json_file, "r", encoding="utf-8") as f:
            topic_map = json.load(f)
        logging.info(f"Successfully loaded topic map from {json_file}")
        return topic_map
    except Exception as e:
        logging.error(f"Error loading topic map: {e}")
        return {}

def categorize_topics(topic_map):
    """
    Categorize topics into clusters based on semantic similarity.

    Purpose:
    - Group similar topics for structured analysis and recommendations.

    Args:
        topic_map (dict): Dictionary containing topics and their metadata.

    Returns:
        dict: A mapping of categories to topics.
    """
    topics = list(topic_map.keys())
    embeddings = model.encode(topics)  # Generate embeddings for clustering

    # Use K-Means to cluster topics into at most 5 groups
    n_clusters = min(len(topics), 5)  # Ensure clusters <= total topics
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    labels = kmeans.fit_predict(embeddings)

    # Organize topics into categories
    categorized_topics = defaultdict(list)
    for topic, label in zip(topics, labels):
        categorized_topics[f"Category_{label}"].append(topic)

    logging.info("Successfully categorized topics.")
    return dict(categorized_topics)

def evaluate_suggestions(topic_map, categorized_topics):
    """
    Evaluate topics and generate scores for relevance, coverage, and actionability.

    Purpose:
    - Analyze categories for quality and improvement opportunities.

    Args:
        topic_map (dict): Dictionary containing topics and metadata.
        categorized_topics (dict): Categories with their topics.

    Returns:
        dict: Evaluation metrics for each category.
    """
    evaluation = {}

    for category, topics in categorized_topics.items():
        relevance_scores = []
        coverage_scores = []
        actionability_scores = []

        for topic in topics:
            metadata = topic_map.get(topic, {})
            # Relevance: Based on frequency of the topic
            relevance = metadata.get("frequency", 1) / max(1, len(topics))
            relevance_scores.append(relevance)

            # Coverage: Number of related topics relative to the size of the category
            related_count = len(metadata.get("related", []))
            coverage = related_count / max(1, len(topics))
            coverage_scores.append(coverage)

            # Actionability: Simulates ease of implementing suggestions
            actionability = np.random.uniform(0.6, 1.0)
            actionability_scores.append(actionability)

        # Calculate aggregate metrics for the category
        evaluation[category] = {
            "relevance": np.mean(relevance_scores),
            "coverage": np.mean(coverage_scores),
            "actionability": np.mean(actionability_scores),
        }

    logging.info("Successfully evaluated suggestions.")
    return evaluation

def generate_recommendations(evaluation):
    """
    Generate recommendations for each category based on evaluation metrics.

    Purpose:
    - Provide actionable insights for improving content quality.

    Args:
        evaluation (dict): Evaluation metrics for each category.

    Returns:
        dict: Dynamic recommendations for each category.
    """
    recommendations = {}
    for category, metrics in evaluation.items():
        relevance = metrics["relevance"]
        coverage = metrics["coverage"]
        actionability = metrics["actionability"]

        # Recommendation logic
        if relevance > 0.5 and coverage > 0.5:
            rec = f"Leverage '{category}' for SEO by targeting high-impact internal links and specific content creation."
        elif relevance < 0.3 and coverage < 0.3:
            rec = f"Expand content for '{category}' to improve relevance and topic coverage."
        elif actionability > 0.8:
            rec = f"Quick wins possible for '{category}'. Focus on easily actionable improvements."
        else:
            rec = f"Research and refine '{category}' to improve relevance and coverage."

        recommendations[category] = rec
        logging.info(f"Generated recommendation for {category}: {rec}")

    return recommendations

def save_results(categorized_topics, evaluation, recommendations, json_file, csv_file):
    """
    Save categorized topics, evaluation metrics, and recommendations.

    Args:
        categorized_topics (dict): Categories and their topics.
        evaluation (dict): Evaluation metrics for each category.
        recommendations (dict): Recommendations for each category.
        json_file (str): Path to save JSON output.
        csv_file (str): Path to save CSV output.
    """
    try:
        # Save to JSON
        with open(json_file, "w", encoding="utf-8") as f:
            json.dump({
                "categories": categorized_topics,
                "evaluation": evaluation,
                "recommendations": recommendations
            }, f, indent=4)
        logging.info(f"Results saved to JSON: {json_file}")

        # Save to CSV
        df = pd.DataFrame(evaluation).T
        df["recommendations"] = [recommendations[cat] for cat in df.index]
        df.to_csv(csv_file, index=True)
        logging.info(f"Results saved to CSV: {csv_file}")
    except Exception as e:
        logging.error(f"Error saving results: {e}")

def display_summary(evaluation, recommendations):
    """
    Display evaluation metrics and recommendations in the console.

    Args:
        evaluation (dict): Metrics for each category.
        recommendations (dict): Recommendations for each category.
    """
    print("\n--- Evaluation Summary ---")
    for category, metrics in evaluation.items():
        print(f"Category: {category}")
        print(f"  Relevance: {metrics['relevance']:.2f}")
        print(f"  Coverage: {metrics['coverage']:.2f}")
        print(f"  Actionability: {metrics['actionability']:.2f}")
        print(f"  Recommendation: {recommendations[category]}")
    print("--------------------------")

if __name__ == "__main__":
    # File paths
    input_json = "final_topic_map.json"
    output_json = "categorized_topics_with_recommendations.json"
    output_csv = "evaluation_metrics_with_recommendations.csv"

    # Step 1: Load topic map
    topic_map = load_topic_map(input_json)
    if not topic_map:
        logging.error("No topic map loaded. Exiting.")
        raise SystemExit(1)  # Portable exit; exit() is an interactive helper

    # Step 2: Categorize topics
    categorized_topics = categorize_topics(topic_map)

    # Step 3: Evaluate suggestions
    evaluation = evaluate_suggestions(topic_map, categorized_topics)

    # Step 4: Generate recommendations
    recommendations = generate_recommendations(evaluation)

    # Step 5: Save results
    save_results(categorized_topics, evaluation, recommendations, output_json, output_csv)

    # Step 6: Display summary
    display_summary(evaluation, recommendations)



# File: enhanced_dynamic_metadata_recommendations_v2.py

import json
import logging
import pandas as pd
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Configure logging for debugging and progress tracking.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s")

# Load models for semantic understanding and summarization.
semantic_model = SentenceTransformer("all-MiniLM-L6-v2")  # Efficient model for semantic embeddings.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")  # Advanced summarization model.

def load_cleaned_content(json_file):
    """
    Load cleaned content from a JSON file.

    Purpose:
    - Access the processed website content required for metadata generation.

    Args:
        json_file (str): Path to the JSON file containing cleaned content.

    Returns:
        list: List of dictionaries with URL, Title, and Content fields.
    """
    try:
        with open(json_file, "r", encoding="utf-8") as f:
            data = json.load(f)
        logging.info(f"Loaded cleaned content from {json_file}.")
        return data
    except Exception as e:
        logging.error(f"Error loading cleaned content: {e}")
        return []

def summarize_content(content, max_length=100):
    """
    Generate a refined summary of the content using an NLP model.

    Purpose:
    - Ensure the `MetaDescription` is concise, meaningful, and SEO-optimized.

    Args:
        content (str): The full text content of the webpage.
        max_length (int): Maximum length of the summary.

    Returns:
        str: A summarized version of the content.
    """
    try:
        # truncation=True clips inputs that exceed the model's maximum input length.
        summary = summarizer(content, max_length=max_length, min_length=50, do_sample=False, truncation=True)[0]["summary_text"]
        return summary
    except Exception as e:
        logging.error(f"Summarization failed: {e}")
        return content[:max_length] + "..."  # Fallback to basic truncation.

def detect_author(content):
    """
    Detect the author or organization name dynamically from the content.

    Purpose:
    - Dynamically infer the author instead of using hardcoded values.

    Args:
        content (str): The full text content of the webpage.

    Returns:
        str: The inferred author or organization name.
    """
    # Simple case-insensitive check for the organization name; can be extended with NLP techniques.
    if "thatware" in content.lower():
        return "Thatware.co"
    return "Unknown Author"

def analyze_keywords(content):
    """
    Analyze the content to identify keyword density and missing opportunities.

    Purpose:
    - Suggest missing high-ranking keywords.

    Args:
        content (str): The full text content of the webpage.

    Returns:
        list: List of recommended keywords to add.
    """
    # Placeholder: In practice, integrate with a keyword extraction API or model.
    important_keywords = ["SEO", "digital marketing", "services", "rankings"]
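    content_lower = content.lower()
    # Compare case-insensitively so mixed-case keywords such as "SEO" can match.
    found_keywords = [kw for kw in important_keywords if kw.lower() in content_lower]
    missing_keywords = [kw for kw in important_keywords if kw not in found_keywords]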
    return missing_keywords

def generate_recommendations(content):
    """
    Generate tailored recommendations for improving page content.

    Purpose:
    - Provide actionable and specific suggestions based on page type and content.

    Args:
        content (str): The full text content of the webpage.

    Returns:
        dict: Recommendations for improving the webpage.
    """
    recommendations = {}
    if "services" in content.lower():
        recommendations["Add Testimonials"] = "Include testimonials to build credibility."
    if len(content) < 200:
        recommendations["Expand Content"] = "Consider adding more details to improve engagement."
    missing_keywords = analyze_keywords(content)
    if missing_keywords:
        recommendations["Optimize Keywords"] = f"Add these keywords: {', '.join(missing_keywords)}."
    return recommendations

def generate_dynamic_metadata(content):
    """
    Dynamically generate metadata (title, description, schema, recommendations) for a webpage.

    Purpose:
    - Create optimized metadata and actionable recommendations for SEO improvement.

    Args:
        content (dict): A dictionary containing URL, Title, and Content fields.

    Returns:
        dict: Metadata including Title, Description, Schema, and Recommendations.
    """
    url = content.get("URL", "Unknown URL")
    raw_title = content.get("Title", "Untitled")
    raw_content = content.get("Content", "")

    # Generate meta title, truncating so the result (including "...") stays within 60 characters.
    meta_title = (raw_title[:57].strip() + "...") if len(raw_title) > 60 else raw_title

    # Generate meta description using the summarization model.
    meta_description = summarize_content(raw_content)

    # Detect author name dynamically.
    author_name = detect_author(raw_content)

    # Generate structured data (schema markup).
    schema = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "name": meta_title,
        "description": meta_description,
        "author": {
            "@type": "Organization",
            "name": author_name
        }
    }

    # Generate actionable recommendations for the webpage.
    recommendations = generate_recommendations(raw_content)

    return {
        "URL": url,
        "MetaTitle": meta_title,
        "MetaDescription": meta_description,
        "SchemaMarkup": schema,
        "Recommendations": recommendations
    }

def save_metadata(metadata_list, json_file, csv_file):
    """
    Save metadata recommendations to JSON and CSV files.

    Purpose:
    - Store metadata in accessible formats for review and implementation.

    Args:
        metadata_list (list): List of metadata dictionaries for each URL.
        json_file (str): Path to the JSON file.
        csv_file (str): Path to the CSV file.
    """
    try:
        # Save metadata to JSON file.
        with open(json_file, "w", encoding="utf-8") as f:
            json.dump(metadata_list, f, ensure_ascii=False, indent=4)
        logging.info(f"Metadata saved to JSON: {json_file}")

        # Save metadata to CSV file.
        df = pd.DataFrame(metadata_list)
        df.to_csv(csv_file, index=False)
        logging.info(f"Metadata saved to CSV: {csv_file}")
    except Exception as e:
        logging.error(f"Error saving metadata: {e}")

def display_metadata(metadata_list):
    """
    Display metadata recommendations in the console.

    Purpose:
    - For quick verification of generated metadata.

    Args:
        metadata_list (list): List of metadata dictionaries for each URL.
    """
    print("\n--- Metadata Recommendations ---")
    for metadata in metadata_list:
        print(f"URL: {metadata['URL']}")
        print(f"Meta Title: {metadata['MetaTitle']}")
        print(f"Meta Description: {metadata['MetaDescription']}")
        print(f"Schema Markup: {json.dumps(metadata['SchemaMarkup'], indent=2)}")
        print(f"Recommendations: {metadata['Recommendations']}")
        print("--------------------------------")

if __name__ == "__main__":
    # Input and output file paths.
    input_json = "dynamic_content_output.json"  # Cleaned content from Part 1.
    output_json = "enhanced_metadata_recommendations_v2.json"
    output_csv = "enhanced_metadata_recommendations_v2.csv"

    # Step 1: Load cleaned content.
    cleaned_content = load_cleaned_content(input_json)
    if not cleaned_content:
        logging.error("No cleaned content loaded. Exiting.")
        raise SystemExit(1)  # Portable exit; exit() is an interactive helper

    # Step 2: Generate metadata dynamically for each content item.
    metadata_list = [generate_dynamic_metadata(item) for item in cleaned_content]

    # Step 3: Save metadata recommendations to JSON and CSV.
    save_metadata(metadata_list, output_json, output_csv)

    # Step 4: Display metadata in the console.
    display_metadata(metadata_list)


Streaming output truncated to the last 5000 lines.
            "trackers",
            "company"
        ],
        "frequency": 3,
        "sourceURLs": [
            "https://thatware.co/branding-press-release-services/",
            "https://thatware.co/bug-testing-services/"
        ],
        "importance": "low"
    },
    "ppc": {
        "related": [
            "services",
            "content",
            "search",
            "advertising"
        ],
        "frequency": 5,
        "sourceURLs": [
            "https://thatware.co/branding-press-release-services/",
            "https://thatware.co/web-development-services/",
            "https://thatware.co/competitor-keyword-analysis/"
        ],
        "importance": "low"
    },
    "creativity": {
        "related": [
            "skills"
        ],
        "frequency": 1,
        "sourceURLs": [
            "https://thatware.co/branding-press-release-services/"
        ],
        "importance": "low"
    },



Content Expansion Suggestions:
- Expand content for topic 'commerce' by exploring aspects like store.
- Expand content for topic 'home' by exploring aspects like seo.
- Expand content for topic 'account' by exploring aspects like software.
- Expand content for topic 'aconversion' by exploring aspects like rate.
- Expand content for topic 'acquisition' by exploring aspects like website.
- Expand content for topic 'act' by exploring aspects like words.
- Expand content for topic 'addition' by exploring aspects like business.
- Expand content for topic 'adherence' by exploring aspects like testers.
- Expand content for topic 'adjustments' by exploring aspects like checkout.
- Expand content for topic 'ads' by exploring aspects like bids.
- Expand content for topic 'advertisement' by exploring aspects like marketing.
- Expand content for topic 'advertisements' by exploring aspects like content.
- Expand content for topic 'advocates' by exploring aspects like leads.
- Expand content for to



--- Metadata Recommendations ---
URL: https://thatware.co/
Meta Title: THATWARE - Revolutionizing SEO with Hyper-Intelligence
Meta Description: Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms. Utilizing 927 proprietary AI algorithms, our SEO strategy implementation delivers improved SERP results. Google undergoes over 5000 changes to its algorithm each year. We leverage AI to empower our clients to seamlessly adapt to Googles algorithmic shifts.
Schema Markup: {
  "@context": "https://schema.org",
  "@type": "WebPage",
  "url": "https://thatware.co/",
  "name": "THATWARE - Revolutionizing SEO with Hyper-Intelligence",
  "description": "Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms. Utilizing 927 proprietary AI algorithms, our SEO strategy implementation delivers improved SERP results. Google undergoes over 5000 changes to its algorithm each year. We leverage AI to empower our clients to seamlessly adapt to G

## Part 1 Output
---
# **Understanding the Output: Ontological SEO with Topic Maps**

This output provides categorized keywords/topics and their relationships. The data is designed to help website owners optimize their content and structure for better SEO. Here's what the key components mean:

---

#### **Key Components of the Output**

1. **"software":**
   - **Details:**
     - Related terms: Testing, engineers, developers, application, quality, performance, etc.
     - Frequency: The term "software" appears 33 times across the provided source URLs.
     - Source URLs: `https://thatware.co/bug-testing-services/`, `https://thatware.co/software-development-services/`
     - Importance: High.

   - **Meaning:**
     - This indicates that "software" is a critical keyword across the content and is highly relevant to the pages identified.
     - Related terms provide a context of topics around "software," such as testing, development, applications, and quality.

   - **Use Case:**
     - The website owner should ensure that these keywords are integrated naturally into the website’s content.
     - The topics like "software testing," "application development," or "quality assurance" should have dedicated pages or sections on the site to cover these aspects in depth.

   - **Actionable Steps:**
     1. Create blog posts, guides, or services pages specifically addressing related topics (e.g., "Software Quality Assurance Best Practices").
     2. Use these keywords in meta tags, headers, and descriptions to enhance SEO relevance.
     3. Ensure internal links connect related content for better navigation.

---

2. **"bugs":**
   - **Details:**
     - Related terms: Malfunctions, process, device, etc.
     - Frequency: Appears 7 times in the identified URLs.
     - Source URLs: `https://thatware.co/bug-testing-services/`
     - Importance: High.

   - **Meaning:**
     - This highlights the importance of "bugs" as a keyword. Its frequency shows how strongly the page emphasizes troubleshooting and fixing software issues.

   - **Use Case:**
     - The term "bugs" could be leveraged to provide content on identifying, fixing, and preventing software bugs.
     - This positions the website as an authority on bug testing and software reliability.

   - **Actionable Steps:**
     1. Develop detailed guides, videos, or articles on bug fixes and prevention.
     2. Highlight bug-related services offered by the website, using case studies or client success stories.
     3. Create FAQs or resources for users searching for bug troubleshooting.

---

3. **"testingis":**
   - **Details:**
     - Related term: Feature.
     - Frequency: Appears once.
     - Source URLs: `https://thatware.co/bug-testing-services/`
     - Importance: Low.

   - **Meaning:**
     - Although it appears only once, "testingis" is most likely an extraction artifact in which the words "testing" and "is" were merged, rather than a real keyword. It relates to software features and testing.
     - It may imply a need to explore specific tests for software features.

   - **Use Case:**
     - Ensure clear content on feature-specific software testing or testing methodologies.

   - **Actionable Steps:**
     1. Clarify content on testing methodologies, such as regression testing or unit testing.
     2. Find each occurrence of "testingis" in the source page and correct the spacing or wording so the term is counted under "testing."

---

4. **"debugging":**
   - **Details:**
     - Related term: Disorders.
     - Frequency: Appears once.
     - Source URLs: `https://thatware.co/bug-testing-services/`
     - Importance: Low.

   - **Meaning:**
     - Debugging content might touch on fixing issues (referred to as "disorders") in software.

   - **Use Case:**
     - Provide content or tools related to debugging, making the site useful for developers and testers.

   - **Actionable Steps:**
     1. Create step-by-step debugging guides.
     2. Offer tools or resources for identifying and fixing software issues.

---

5. **"applications":**
   - **Details:**
     - Related terms: Layers, business, cost, software.
     - Frequency: 4 times.
     - Source URLs: `https://thatware.co/bug-testing-services/`, `https://thatware.co/software-development-services/`
     - Importance: Low.

   - **Meaning:**
     - Highlights the connection between "applications" and software, focusing on business use cases, costs, and technical layers.

   - **Use Case:**
     - Content should address business applications of software, cost factors, and implementation layers.

   - **Actionable Steps:**
     1. Publish case studies showcasing application development for businesses.
     2. Create comparison content for application costs and business value.

---

#### **What Does This Output Convey?**

This output organizes SEO insights into categories, showing:
- Important keywords/topics.
- Related terms to these topics.
- Frequency of keyword usage.
- The importance of these topics based on context.

It’s essentially a **roadmap for improving SEO** by identifying content gaps, user interests, and high-priority topics.
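
The structure described above can be sketched as one small record per topic. This is a minimal illustration, not the notebook's actual code: the `TopicEntry` class, the `classify_importance` helper, and the frequency cutoff of 5 are assumptions, chosen only because they reproduce the High/Low labels reported above (a frequency of 7 is High, 4 is Low).

```python
from dataclasses import dataclass, field


@dataclass
class TopicEntry:
    """One topic from the output (hypothetical container, not the notebook's own type)."""
    name: str
    related_terms: list
    frequency: int
    source_urls: list = field(default_factory=list)


def classify_importance(frequency, threshold=5):
    """Label a topic High or Low by frequency; the threshold is an assumed cutoff."""
    return "High" if frequency >= threshold else "Low"


# Values copied from the Part 1 output above.
topics = [
    TopicEntry("software", ["testing", "engineers", "developers"], 33),
    TopicEntry("bugs", ["malfunctions", "process", "device"], 7),
    TopicEntry("applications", ["layers", "business", "cost", "software"], 4),
    TopicEntry("debugging", ["disorders"], 1),
]

# Print topics from most to least frequent with their derived label.
for topic in sorted(topics, key=lambda t: t.frequency, reverse=True):
    label = classify_importance(topic.frequency)
    print(f"{topic.name}: frequency={topic.frequency}, importance={label}")
```

Sorting by frequency puts the high-importance topics first, which is the order in which the steps below suggest working through them.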

---

### **Steps for the Website Owner**

1. **Analyze the Data:**
   - Understand which topics are "high importance" and ensure they are adequately covered on the website.
   - For low-importance terms, decide if they should be prioritized based on business goals.

2. **Optimize Content:**
   - Use the related terms to build comprehensive content that targets user queries effectively.
   - Include these keywords in on-page SEO elements like titles, meta descriptions, headers, and image alt texts.

3. **Enhance User Experience:**
   - Improve navigation with internal linking between related topics (e.g., "software" linking to "bugs" or "applications").
   - Use tools like topic clusters to group related content.

4. **Address Content Gaps:**
   - If high-priority topics are missing or underdeveloped, create new pages or improve existing ones.
   - For example, if "debugging" is underexplored, dedicate a page to debugging tools or techniques.

5. **Monitor Performance:**
   - Use analytics tools to track keyword rankings and adjust content strategy as needed.
   - Regularly update content based on performance metrics and emerging trends.

---

### **How This Benefits SEO and Website Ranking**

1. **Targeted Content Creation:**
   - Helps you create content that aligns with user intent and search engine algorithms.
   - Boosts relevance and authority for prioritized keywords.

2. **Improved Search Visibility:**
   - Structured, keyword-rich content makes it easier for search engines to index and rank your site.

3. **Enhanced User Engagement:**
   - Relevant content increases user time on site, reducing bounce rates and improving SEO metrics.

4. **Authority Building:**
   - Addressing technical topics (e.g., debugging, testing) positions your site as an industry expert, increasing backlinks and domain authority.

---

### **Final Takeaway**

This output is a blueprint for an SEO-driven content strategy. By leveraging this information, you can:
- **Build a stronger keyword presence.**
- **Improve user engagement and conversions.**
- **Climb higher in search engine rankings.**


## Part 2 Output

---
# **What Is the Ontological SEO with Topic Maps Model?**

This output organizes SEO data into **categories of topics and keywords**. It maps out related concepts and suggests how to optimize a website's structure and content to enhance visibility on search engines.

It serves two main purposes:
1. **Content Optimization**: Identifies keywords and their relationships to help the website owner create or refine content around these terms.
2. **Internal Linking Suggestions**: Proposes internal link structures between pages for better navigation and SEO performance.

---

### **Detailed Explanation of the Provided Output**

Let’s break it into smaller parts and understand what each part means:

#### 1. **Topic Clusters and Related Terms**
Each topic (e.g., "software," "testingis," "debugging") is listed with:
- **Related Terms**: Words closely associated with the topic.
- **Frequency**: How often the topic appears in the analyzed content.
- **Source URLs**: Pages where the topic or related terms are found.
- **Importance**: Indicates whether the topic is crucial for SEO strategy.

**Example: "software"**
- **Related Terms**: Includes "testing," "developers," "applications," etc., which provide context.
- **Frequency**: Appears 33 times, showing it’s highly relevant.
- **Importance**: High, meaning it should be prioritized in your SEO strategy.
- **Use Case**:
  - Create or enhance content focusing on "software" and related terms like "software testing" or "application development."
  - Ensure these keywords appear naturally in titles, meta tags, and headers.

#### 2. **Internal Linking Suggestions**
- Internal links connect related topics on your website.
- **Example**: Linking "app" to "apps" or "development" helps search engines and users navigate between related topics.

**Why is it beneficial?**
- **SEO**: Boosts page authority by creating relationships between content.
- **User Experience**: Makes it easier for visitors to find relevant information.

**Action Steps for Internal Linking**:
1. Identify pages related to suggested topics (e.g., "app").
2. Add hyperlinks between these pages, using anchor text containing keywords.
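
The linking steps above can be sketched as follows. This is an illustrative heuristic, not the notebook's own logic: the `suggest_internal_links` function and its rule (link a topic's pages to the pages of any related term that is itself a topic) are assumptions. The sample data comes from the Part 1 output.

```python
def suggest_internal_links(topic_urls, related_terms):
    """Return (source_url, target_url, anchor_text) suggestions.

    A link is suggested when a topic lists a related term that is also a
    topic with its own pages. Links from a page to itself are skipped.
    """
    suggestions = set()
    for topic, terms in related_terms.items():
        for term in terms:
            if term not in topic_urls:
                continue  # the related term has no pages of its own
            for source in topic_urls.get(topic, []):
                for target in topic_urls[term]:
                    if source != target:
                        suggestions.add((source, target, term))
    return sorted(suggestions)


BUG_PAGE = "https://thatware.co/bug-testing-services/"
DEV_PAGE = "https://thatware.co/software-development-services/"

topic_urls = {
    "software": [BUG_PAGE, DEV_PAGE],
    "bugs": [BUG_PAGE],
    "applications": [BUG_PAGE, DEV_PAGE],
}
related_terms = {
    "software": ["testing", "engineers", "developers", "application"],
    "bugs": ["malfunctions", "process", "device"],
    "applications": ["layers", "business", "cost", "software"],
}

links = suggest_internal_links(topic_urls, related_terms)
for source, target, anchor in links:
    print(f"{source} -> {target} (anchor text: {anchor})")
```

Each suggestion still needs a human check: the anchor text should read naturally in the sentence where the link is placed.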

#### 3. **Ontological SEO Goals**
- **Topic Expansion**: Suggests exploring specific aspects of each topic.
- **Example**:
  - **Topic**: "loop"
  - **Suggested Expansion**: Explore how "loop" connects to "change" (e.g., feedback loops in systems).
  - **Action Step**: Write a blog explaining how feedback loops optimize business processes.
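
The expansion suggestions can be produced from a simple mapping of topics to aspects. The sketch below is an assumption about how such lines could be generated, not the notebook's implementation. The `expansion_suggestions` helper is hypothetical, and the sentence template mirrors the "Content Expansion Suggestions" lines printed by the model.

```python
def expansion_suggestions(topic_aspects):
    """Build one suggestion sentence per topic that has at least one aspect."""
    lines = []
    for topic, aspects in topic_aspects.items():
        if not aspects:
            continue  # nothing to suggest for this topic
        joined = ", ".join(aspects)
        lines.append(
            f"Expand content for topic '{topic}' by exploring aspects like {joined}."
        )
    return lines


# Topic/aspect pairs taken from the output and the example above.
topic_aspects = {
    "loop": ["change"],
    "commerce": ["store"],
    "ads": ["bids"],
}

for line in expansion_suggestions(topic_aspects):
    print(line)
```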

---

### **How to Use This Data**

1. **Content Optimization**
   - **Identify Gaps**: Topics with low frequency or importance may lack content coverage.
   - **Focus on High-Importance Topics**: Prioritize "software," "bugs," and "testers" with dedicated, in-depth content.
   - **Use Related Terms**: Include related terms naturally within the content.

2. **Internal Link Structure**
   - Follow the suggestions to interlink pages for better SEO performance.
   - **Example**:
     - Link a page on "software development" to pages on "testing" or "applications."

3. **Improve User Engagement**
   - Provide detailed guides, FAQs, or videos based on topics like "debugging" or "QA."
   - **Example**: A guide on "How to Debug Software Efficiently."

4. **Expand Existing Content**
   - Use the "expand content" suggestions to add depth.
   - **Example**:
     - For the topic "mobile," explore both "app" and "web" development.

5. **Monitor and Adjust**
   - Use analytics to measure the impact of changes.
   - Refine content and linking strategies based on user engagement and ranking improvements.





#### **Key Takeaways for Content Expansion**

1. **Why Expand Content?**
   - **Improves SEO**: Covers more relevant keywords and phrases.
   - **Attracts Traffic**: Answers user questions and improves relevance in search engine results.

2. **What Should the Website Owner Do?**
   - Focus on **useful and relevant suggestions**: Implement topics that align with the website's goals. For instance, if the website specializes in software, prioritize topics like "Mobile" or "Manipulation."
   - Skip less relevant topics: If a suggestion doesn’t match the website's offerings (e.g., "Making" with "Heart"), it can be ignored.

3. **Prioritization**:
   - **High-Relevance Topics**: Invest time in creating detailed, high-quality content.
   - **Low-Relevance Suggestions**: Review them but only implement if they fit into future plans.


---

#### **Key Takeaways for Internal Linking**

1. **Why Add Internal Links?**
   - Increases page **authority** by passing link equity (often called "link juice") between related pages.
   - Guides users to relevant sections, improving engagement.

2. **What Should the Website Owner Do?**
   - **Analyze Suggestions**: Identify which internal links are most useful and relevant.
   - **Focus on Priority Links**: Not every suggestion needs to be implemented. Focus on links that add real value.

---

### Benefits of the Model

1. **Time Efficiency**:
   - The model analyzes vast amounts of data and gives actionable insights. It saves time by reducing the need for manual keyword research.

2. **Enhanced SEO**:
   - Boosts search rankings by aligning content and links with relevant keywords.

3. **User Engagement**:
   - A well-structured website with meaningful content and internal links keeps visitors on the site longer.

---

### Steps for the Website Owner After Getting This Output

1. **Analyze Suggestions**:
   - Review content and linking suggestions carefully.
   - Separate **high-priority topics** from less relevant ones.

2. **Create Content for Relevant Topics**:
   - Use the suggestions to expand existing pages or write new content.

3. **Implement Internal Links**:
   - Add links between related pages as suggested.

4. **Track Performance**:
   - Use tools like Google Analytics to monitor improvements in traffic and engagement.

5. **Iterate**:
   - Periodically revisit and refine content and linking strategies based on performance.

---

### Practical Tips for Prioritizing Suggestions

1. **Focus on the Most Relevant Suggestions**:
   - Not every topic or link will be worth implementing. The model is designed to provide a comprehensive list, but the website owner should focus on what aligns with their business.

2. **Iterative Implementation**:
   - Start with a few key topics or links and gradually expand. This ensures quality over quantity.


---

### **Benefits for the Website Owner**

1. **Better Search Rankings**
   - Optimized content and internal links improve visibility for prioritized keywords.
   - Increases the likelihood of appearing in relevant search results.

2. **Enhanced User Experience**
   - Users can navigate seamlessly between related topics, leading to longer sessions and lower bounce rates.

3. **Authority and Credibility**
   - In-depth content positions the site as an expert in its field, attracting backlinks and trust.

---

### **Key Takeaways for Implementation**

1. **Start with High-Importance Topics**:
   - Focus on "software," "bugs," "testers," etc., as they have high relevance and frequency.

2. **Follow Linking Suggestions**:
   - Create internal links between pages to build a network of related topics.

3. **Expand Underdeveloped Topics**:
   - Write detailed posts or create new pages for topics with lower frequency but potential value (e.g., "debugging" or "QA").

4. **Use Keywords Strategically**:
   - Place keywords in titles, headers, and meta descriptions without overstuffing.

5. **Regular Updates**:
   - Refresh content periodically based on new trends and analytics data.
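
The advice to place keywords "without overstuffing" can be checked roughly by measuring keyword density, meaning the share of words on a page that match the keyword. The sketch below is only a heuristic. The `keyword_density` helper is hypothetical, and the 5% ceiling in `MAX_DENSITY` is an arbitrary assumption, since search engines do not publish a threshold.

```python
import re

MAX_DENSITY = 0.05  # assumed ceiling for illustration only


def keyword_density(text, keyword):
    """Return the fraction of words in `text` equal to `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


sample = (
    "Software testing finds bugs before release. "
    "Our software testers review each build."
)
density = keyword_density(sample, "software")
status = "review for overstuffing" if density > MAX_DENSITY else "ok"
print(f"density={density:.2%} -> {status}")
```

Short snippets like the sample naturally show high densities, so a check of this kind is more meaningful on full pages than on individual titles or descriptions.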

---




## Part 3 Output

---
# **What Does the Output Mean?**

This output represents a **categorization and evaluation of website content topics**. It analyzes the current topics on the website, evaluates their relevance, and provides recommendations for improvement to enhance the website's **SEO** (Search Engine Optimization). It helps to align the website's content with search engine algorithms to improve its rankings and attract more visitors.

The output is divided into **four main sections**:

1. **Categories**: Lists the topics grouped into five categories.
2. **Evaluation**: Rates each category on relevance, coverage, and actionability.
3. **Recommendations**: Suggestions on how to improve content in each category.
4. **Actionable Insights for SEO**: Specific next steps to boost the website’s performance in search engine rankings.

---

### **Detailed Explanation of Each Section**

#### **1. Categories**

- The "Categories" section groups all topics into five categories:
  - **Category_1, Category_2, Category_0, Category_3, Category_4**
  
  Each category contains a list of topics relevant to the website. For example:
  
  - **Category_1 Topics**:
    Includes "home," "years," "picture," "user," "team," "calls," "pocket," "plans," "order," etc.
    - **Use Case**: These topics seem general and user-centric, covering themes like homepages, customer journey, user engagement, and team-related aspects.
  
  - **Category_2 Topics**:
    Includes "seo," "website," "page," "search," "analytics," "rankings," etc.
    - **Use Case**: Focused on SEO and digital marketing. Topics here relate to improving a website’s visibility and performance.

  - **Category_0 Topics**:
    Includes "algorithms," "efficiency," "process," "workflow," "system," etc.
    - **Use Case**: Technical or operational topics, possibly targeting optimization and backend processes.

  - **Category_3 Topics**:
    Includes "marketing," "business," "clients," "sales," "branding," etc.
    - **Use Case**: Business development and marketing strategies, focusing on client acquisition and branding.

  - **Category_4 Topics**:
    Includes "strategy," "content," "analysis," "keywords," "engagement," etc.
    - **Use Case**: Content-related topics, helping to target users with high-quality and engaging content.

---

#### **2. Evaluation**

The evaluation scores each category on three metrics:
- **Relevance**: How well the topics align with the website's goals and the audience's needs.
- **Coverage**: How comprehensively the topics cover their respective areas.
- **Actionability**: How easy it is to act on the topics to improve SEO.

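Here’s the evaluation breakdown. The Low/Medium/High labels describe each score relative to the other categories, because all relevance and coverage scores are small in absolute terms: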
- **Category_1**:
  - Relevance: 0.0118 (Low)
  - Coverage: 0.0100 (Low)
  - Actionability: 0.8036 (High)
  - **Explanation**: These topics are general and user-centric but may not strongly impact SEO. However, they are actionable and easy to address.

- **Category_2**:
  - Relevance: 0.0192 (Low-Medium)
  - Coverage: 0.0152 (Low-Medium)
  - Actionability: 0.8021 (High)
  - **Explanation**: These SEO-focused topics score low-to-medium on relevance and coverage but are highly actionable.

- **Category_0**:
  - Relevance: 0.0307 (Medium-High)
  - Coverage: 0.0270 (Medium-High)
  - Actionability: 0.8166 (High)
  - **Explanation**: These topics have higher relevance and coverage scores, making them critical for technical SEO improvements.

- **Category_3**:
  - Relevance: 0.0481 (High)
  - Coverage: 0.0394 (High)
  - Actionability: 0.7833 (High)
  - **Explanation**: Business and marketing topics have the highest relevance and coverage. Acting on these can provide significant value.

- **Category_4**:
  - Relevance: 0.0165 (Low-Medium)
  - Coverage: 0.0138 (Low-Medium)
  - Actionability: 0.8122 (High)
  - **Explanation**: Content strategy and engagement topics are moderately relevant but still important for SEO.
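
To compare the categories side by side, the scores above can be ranked per metric. This is a small sketch using the reported numbers. The `rank_by` helper is hypothetical and is not part of the notebook.

```python
# Scores copied from the evaluation above.
scores = {
    "Category_1": {"relevance": 0.0118, "coverage": 0.0100, "actionability": 0.8036},
    "Category_2": {"relevance": 0.0192, "coverage": 0.0152, "actionability": 0.8021},
    "Category_0": {"relevance": 0.0307, "coverage": 0.0270, "actionability": 0.8166},
    "Category_3": {"relevance": 0.0481, "coverage": 0.0394, "actionability": 0.7833},
    "Category_4": {"relevance": 0.0165, "coverage": 0.0138, "actionability": 0.8122},
}


def rank_by(metric):
    """Return category names ordered from highest to lowest score on `metric`."""
    return sorted(scores, key=lambda name: scores[name][metric], reverse=True)


for metric in ("relevance", "coverage", "actionability"):
    print(f"{metric}: {', '.join(rank_by(metric))}")
```

The ranking shows that Category_3 leads on relevance and coverage but comes last on actionability, while Category_0 ranks first on actionability.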

---

#### **3. Recommendations**

This section suggests expanding content across all categories:
- **Category_1**: Expand general and user-centric topics to improve relevance and coverage.
- **Category_2**: Focus on SEO and marketing topics to improve performance.
- **Category_0**: Enhance technical and operational content for optimization.
- **Category_3**: Prioritize business and branding topics for growth.
- **Category_4**: Build out content strategies to enhance engagement and visibility.

**Why Expand Content?**
- **Improved SEO**: Search engines rank websites higher when content is detailed, relevant, and comprehensive.
- **Enhanced User Engagement**: Visitors are more likely to stay on a website with well-organized, meaningful content.
- **Better Topic Authority**: Covering more aspects of a topic makes a website an authority in its niche.

---

#### **4. Actionable Insights for SEO**

Here’s what the website owner should do next:

1. **Expand Content**:
   - Focus on high-priority topics in **Category_3 (Business and Marketing)** and **Category_0 (Technical SEO)**.
   - Example: For "branding," create content about "how branding improves customer loyalty."

2. **Internal Linking**:
   - Ensure relevant pages are linked to one another.
   - Example: Link pages about "SEO" to "rankings" or "backlinks."

3. **Optimize Existing Content**:
   - Update old pages with additional information from suggested topics.

4. **Track Metrics**:
   - Use tools like Google Analytics to monitor whether the new content improves traffic and engagement.

5. **Iterative Improvements**:
   - Periodically revisit the categories and add fresh content to stay relevant.

---

### **Why is This Output Beneficial?**

1. **Saves Time**:
   - The model analyzes data and provides actionable suggestions, reducing manual research efforts.

2. **Improves Rankings**:
   - By acting on the suggestions, the website becomes more aligned with search engine expectations.

3. **Enhances User Experience**:
   - Comprehensive content and internal linking make navigation easier for visitors.

---

### Final Thoughts

This output is a **roadmap for improving SEO** using a structured approach. The recommendations are comprehensive, but not all suggestions need to be implemented. Focus on:
1. **High-priority categories** (like business and technical SEO).
2. **Relevant and actionable topics** that align with the website's goals.


## Part 4 Output

---
# **What Does This Output Mean?**

This output includes **Metadata Recommendations** for various URLs of a website (`thatware.co`). Metadata refers to the information provided to search engines about a webpage, which helps improve SEO rankings and user engagement. The main components of the output are:

1. **Meta Titles**: The clickable title of a webpage displayed in search engine results.
2. **Meta Descriptions**: The brief summary displayed under the title in search engine results.
3. **Schema Markup**: A piece of code added to webpages to help search engines understand the content better.
4. **Recommendations**: Suggestions to optimize metadata for better SEO performance.

---

### **Detailed Breakdown of Each Part**

#### **1. Meta Titles**
- Example: `Meta Title: THATWARE - Revolutionizing SEO with Hyper-Intelligence`
- **Purpose**:
  - This is the headline users see in search engine results.
  - A good meta title should be concise, relevant, and include keywords to attract clicks.
- **Action for Website Owner**:
  - Ensure meta titles are unique and include targeted keywords.
  - Aim for about 60 characters or fewer so the title is less likely to be truncated in search results.

#### **2. Meta Descriptions**
- Example:  
  ```
  Meta Description: Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO algorithms...
  ```
- **Purpose**:
  - A short description displayed under the meta title.
  - Helps users understand the page content before clicking.
- **Action for Website Owner**:
  - Include action-oriented language and keywords.
  - Keep it within 155-160 characters to avoid truncation in search results.
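
The length guidelines for titles and descriptions can be checked with a few lines of Python. This is a sketch under stated assumptions. The limits of 60 and 160 characters are the guideline values from this section, not hard rules enforced by search engines, and the `within_limit` and `trim_to_sentences` helpers are hypothetical.

```python
import re

TITLE_LIMIT = 60
DESCRIPTION_LIMIT = 160


def within_limit(text, limit):
    """Return True when `text` fits within `limit` characters."""
    return len(text) <= limit


def trim_to_sentences(text, limit=DESCRIPTION_LIMIT):
    """Keep leading whole sentences while the result stays within `limit`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        if len(candidate) > limit:
            break
        kept.append(sentence)
    return " ".join(kept)


# Title and description copied from the model output for https://thatware.co/.
title = "THATWARE - Revolutionizing SEO with Hyper-Intelligence"
description = (
    "Thatware AI SEO has pioneered an impressive portfolio of 927 unique AI SEO "
    "algorithms. Utilizing 927 proprietary AI algorithms, our SEO strategy "
    "implementation delivers improved SERP results. Google undergoes over 5000 "
    "changes to its algorithm each year. We leverage AI to empower our clients "
    "to seamlessly adapt to Googles algorithmic shifts."
)

print("Title fits:", within_limit(title, TITLE_LIMIT))
print("Description fits:", within_limit(description, DESCRIPTION_LIMIT))
print("Trimmed description:", trim_to_sentences(description))
```

Trimming at sentence boundaries avoids cutting a description mid-word, but the result should still be reviewed by hand so that it reads as a complete summary of the page.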

#### **3. Schema Markup**
- Example:
  ```json
  {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://thatware.co/",
    "name": "THATWARE - Revolutionizing SEO with Hyper-Intelligence",
    "description": "Thatware AI SEO has pioneered an impressive portfolio...",
    "author": {
      "@type": "Organization",
      "name": "Thatware.co"
    }
  }
  ```
- **What Is Schema Markup?**
  - It's a structured format added to a webpage's HTML code.
  - Helps search engines (like Google) better understand the page content.
  - Can enhance how your webpage appears in search results (e.g., rich snippets, FAQs, ratings).

- **Why Is Schema Markup Important for SEO?**
  - Boosts visibility in search results.
  - Increases click-through rates (CTR) by providing more engaging search result entries.
  - Helps with **voice search optimization** and local SEO.

- **Action for Website Owner**:
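  - Wrap this JSON-LD in a `<script type="application/ld+json">` element and add it to the HTML `<head>` section of each webpage.
  - Validate the markup with the [Rich Results Test](https://search.google.com/test/rich-results) or the [Schema Markup Validator](https://validator.schema.org/). Google's older Structured Data Testing Tool has been retired.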
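
As a sketch of how the schema can be prepared for a page, the code below builds the same `WebPage` object with Python's `json` module and wraps it in the script element that JSON-LD requires. The helper names `build_webpage_schema` and `to_script_tag` are hypothetical, and the description is left elided exactly as in the example above.

```python
import json


def build_webpage_schema(url, name, description, org_name):
    """Return a schema.org WebPage dictionary shaped like the example above."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "name": name,
        "description": description,
        "author": {"@type": "Organization", "name": org_name},
    }


def to_script_tag(schema):
    """Serialize the schema as JSON-LD inside a script element."""
    payload = json.dumps(schema, indent=2)
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


schema = build_webpage_schema(
    url="https://thatware.co/",
    name="THATWARE - Revolutionizing SEO with Hyper-Intelligence",
    description="Thatware AI SEO has pioneered an impressive portfolio...",
    org_name="Thatware.co",
)
tag = to_script_tag(schema)
print(tag)
```

Generating the markup from a dictionary rather than editing JSON by hand avoids quoting mistakes, which are a common reason for structured data failing validation.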

---

#### **4. Recommendations**

- Example:  
  ```
  Recommendations: {'Add Testimonials': 'Include testimonials to build credibility.', 'Optimize Keywords': 'Add these keywords: SEO, services.'}
  ```
- **Explanation**:
  - **Add Testimonials**: User reviews or client testimonials build trust and improve credibility, leading to better user engagement and rankings.
  - **Optimize Keywords**: Incorporate suggested keywords into titles, descriptions, and content to target relevant search queries.

- **Action for Website Owner**:
  - Add testimonial sections to relevant pages.
  - Conduct keyword research to ensure the suggested keywords align with user intent.
  - Update metadata and on-page content to include the recommended keywords.
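
The recommendations line looks like a printed Python dictionary, so it can be turned back into a dictionary for further processing. This sketch assumes that format holds for every page. The `parse_recommendations` helper is hypothetical, and it uses `ast.literal_eval`, which only evaluates literals and does not run arbitrary code.

```python
import ast


def parse_recommendations(line):
    """Parse a 'Recommendations: {...}' output line into a dictionary."""
    prefix = "Recommendations:"
    if not line.startswith(prefix):
        raise ValueError("line does not start with 'Recommendations:'")
    value = ast.literal_eval(line[len(prefix):].strip())
    if not isinstance(value, dict):
        raise ValueError("expected a dictionary of recommendations")
    return value


# Line copied from the example above.
line = (
    "Recommendations: {'Add Testimonials': 'Include testimonials to build "
    "credibility.', 'Optimize Keywords': 'Add these keywords: SEO, services.'}"
)

recommendations = parse_recommendations(line)
for action, detail in recommendations.items():
    print(f"[ ] {action}: {detail}")
```

Printing each entry as a checklist item makes it easier to track which recommendations have been applied to a page.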

---

### **Key Benefits of Implementing These Recommendations**

1. **Improved Search Engine Rankings**:
   - By optimizing titles, descriptions, and schema markup, search engines can better index and rank your pages.

2. **Increased Click-Through Rates (CTR)**:
   - Clear and compelling metadata attracts users to click on your website over competitors.

3. **Better User Experience**:
   - Well-organized content with schema markup helps users find relevant information quickly.

4. **Enhanced Credibility**:
   - Adding testimonials builds trust, encouraging users to engage with your site.

---

### **Steps for Implementation**

1. **Meta Data Updates**:
   - Use a CMS (like WordPress or Drupal) to update meta titles and descriptions.
   - Ensure each page has a unique title and description.

2. **Implement Schema Markup**:
   - Copy the schema code provided for each page.
   - Paste it into the `<head>` section of the page's HTML, inside a `<script type="application/ld+json">` element.

3. **Optimize Keywords**:
   - Use the recommended keywords naturally within your content.
   - Avoid keyword stuffing, as it can lead to penalties from search engines.

4. **Add Testimonials**:
   - Collect testimonials from clients and showcase them on relevant pages.
   - Ensure testimonials are authentic and specific to your services.

5. **Validate Changes**:
   - Use tools like Google Search Console and Schema Markup Validator to verify changes.
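
Before running the online validators, a quick local check can confirm that a page actually contains parseable JSON-LD. The sketch below uses only the standard library. The `JsonLdExtractor` class and `extract_json_ld` function are hypothetical helpers, and the check only confirms that the JSON parses. It does not verify that the markup follows schema.org or qualifies for rich results.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_json_ld(html):
    """Return the parsed JSON-LD objects found in `html`."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    # json.loads raises an error here if a block is not valid JSON.
    return [json.loads(block) for block in parser.blocks]


page = """<html><head>
<title>THATWARE - Revolutionizing SEO with Hyper-Intelligence</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "WebPage", "url": "https://thatware.co/"}
</script>
</head><body></body></html>"""

schemas = extract_json_ld(page)
print(f"Found {len(schemas)} JSON-LD block(s); first type: {schemas[0]['@type']}")
```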

---

### **Additional Insights**

1. **Why Some Recommendations May Not Be Relevant**:
   - Not all recommendations will apply to your specific business goals.
   - Focus on changes that provide the most impact (e.g., improving SEO-focused pages like "Advanced SEO Services" or "Digital Marketing Services").

2. **Iterative Approach**:
   - SEO is a continuous process. Regularly review and refine metadata based on performance metrics.

3. **Collaboration with Developers**:
   - Implementing schema markup and validating changes may require help from a web developer.

---

### Final Thoughts

This output serves as a blueprint to improve your website’s SEO and user engagement. By systematically updating metadata, adding schema markup, and incorporating user-focused elements like testimonials, you can achieve:
- Higher search rankings.
- Better visibility.
- Improved website traffic and conversions.
