# Hot Readme Generator (WIP)

<div style="display:flex; align-items:center; padding: 50px;">
<p style="margin-right:10px;">
    <img height="200px" style="width:auto;" width="200px" src="https://avatars.githubusercontent.com/u/192148546?s=400&u=95d76fbb02e6c09671d87c9155f17ca1e4ef8f21&v=4"> 
</p>
<p style="margin-right:10px;">
    <img height="200px" style="width:auto;" width="200px" src="https://languageicon.org/language-icon.png"> 
</p>

</div>

## Description:

This app automates the generation of structured README files by creating a table of contents, integrating badges, and providing customizable configurations for headers, sections, and taglines. It uses a large language model (LLM) to tailor the content to specific project needs and ensures optimal formatting and professionalism.




## Step 1: Setup and Installation

The first step in any project is setting up the environment. Here, we install the necessary requirements and check if the environment variables are properly configured.

#### Explanation:

- **install_requirements()**: Installs the Python packages listed in `requirements.txt` with `pip`. If the installation fails, it retries up to three times and exits if every attempt fails.

- **setup_env()**: Loads environment variables from a `.env` file using `load_dotenv()`, then checks that each required variable, such as `OPENAI_API_KEY`, is set. If one is missing, it prints a message asking you to set it and exits.
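
The retry behaviour described for `install_requirements()` can be sketched in isolation. This is a minimal illustration, not the notebook's implementation: `run_with_retries` and the fake `action` are hypothetical names, and the real function calls `pip` instead of a stand-in callable.

```python
from typing import Callable


def run_with_retries(action: Callable[[], bool], max_retries: int = 3) -> bool:
    """Call `action` until it returns True or the retry budget is spent."""
    # One initial attempt plus up to `max_retries` retries.
    for attempt in range(max_retries + 1):
        if action():
            return True
        if attempt < max_retries:
            print(f"Attempt {attempt + 1} failed. Retrying...")
    return False


# Hypothetical action that fails twice and then succeeds.
outcomes = iter([False, False, True])
succeeded = run_with_retries(lambda: next(outcomes))
print(succeeded)
```

A loop keeps the retry counter local, which avoids the shared global state that a recursive retry needs.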

#### Key Libraries:

- **dotenv**: Loads environment variables from a `.env` file to configure settings for the project.

- **os**: Allows access to system functions and environment variables, helping the code to verify and load required configurations.
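
As a rough sketch of the check that `setup_env()` performs, the helper below reports which required variables are missing. The name `missing_env_vars`, the `OTHER_KEY` variable, and the placeholder key value are assumptions made for illustration; a plain dictionary stands in for the real environment so the example runs anywhere.

```python
import os
from typing import List, Mapping, Optional


def missing_env_vars(required: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are not set in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if env.get(name) is None]


# A plain dict stands in for the real environment here.
fake_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(missing_env_vars(["OPENAI_API_KEY", "OTHER_KEY"], fake_env))
```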


In [None]:
# Boilerplate: This block goes into every notebook.
# It sets up the environment, installs the requirements, and checks for the required environment variables.

from IPython.display import clear_output
from dotenv import load_dotenv
import os

requirements_installed = False
max_retries = 3
retries = 0
REQUIRED_ENV_VARS = ["OPENAI_API_KEY"]


def install_requirements():
    """Installs the requirements from requirements.txt file"""
    # retries is reassigned below, so it must be declared global as well
    global requirements_installed, retries
    if requirements_installed:
        print("Requirements already installed.")
        return

    print("Installing requirements...")
    install_status = os.system("pip install -r requirements.txt")
    if install_status == 0:
        print("Requirements installed successfully.")
        requirements_installed = True
    else:
        print("Failed to install requirements.")
        if retries < max_retries:
            print("Retrying...")
            retries += 1
            return install_requirements()
        exit(1)
    return


def setup_env():
    """Sets up the environment variables"""

    def check_env(env_var):
        value = os.getenv(env_var)
        if value is None:
            print(f"Please set the {env_var} environment variable.")
            exit(1)
        else:
            print(f"{env_var} is set.")

    load_dotenv()

    variables_to_check = REQUIRED_ENV_VARS

    for var in variables_to_check:
        check_env(var)


install_requirements()
clear_output()
setup_env()
print("🚀 Setup complete. Continue to the next cell.")

## Step 2: Understanding the EasyLLM Class and Its Methods

### 1. Import Statements

- **os**: Allows interaction with the operating system, mainly used here for environment variables.

- **openai**: The OpenAI Python client library, used to interact with OpenAI's GPT models.

- **BaseModel from pydantic**: This is used to define data models with validation, making it easier to work with structured data.

- **traceback**: Helps with printing detailed error stack traces for debugging.

- **Union from typing**: A way to specify that a value could be one of several types.

- **json**: Used to parse and serialize JSON data.

- **re**: Provides regular expression matching operations, used here to estimate tokens.


### 2. Constants Definition

- **DEFAULT_OPENAI_MODEL**: Specifies the default OpenAI model to use (gpt-4o-mini).

- **DEFAULT_SYSTEM_PROMPT**: The default system message that instructs the AI model on how to behave during a conversation.

- **DEFAULT_TEMPERATURE**: Controls the randomness of the AI’s output (higher values make the output more random).

- **DEFAULT_MAX_TOKENS**: Sets the maximum number of tokens (words or pieces of words) the model can generate in response.


### 3. EasyLLM Class

This is the main class that wraps the OpenAI API, simplifying its usage. The class provides methods to generate text and objects, interact with OpenAI’s models, and handle common tasks like estimating token usage.

### 4. `__init__()` Method (Constructor)

- **api_key**: The API key used to authenticate with OpenAI, fetched from environment variables (os.getenv).

- **model**: The OpenAI model to use, with a default set to gpt-4o-mini.

- **verbose**: Flag to control the level of logging output.

- **debug**: Flag to enable or disable debugging information.


This constructor initializes the class by setting up the OpenAI client and model, as well as handling logging based on the verbose and debug flags.

**Key steps:**

- Initializes the openai object using the provided api_key.
- Sets the model to the provided value (or the default).
- Prints log messages if verbose is True.

### 5. `generate_text()` Method

- **prompt**: The input text prompt to send to the model.

- **system**: The system message that guides the AI's behavior (defaults to the predefined prompt).

- **temperature**: Controls the randomness of the model’s response.

- **max_tokens**: Specifies the maximum number of tokens to generate in the response.


This method sends a request to OpenAI's API to generate a text response based on the provided prompt and settings.

**Key Steps:**

- If debug is enabled, builds a params dictionary from the input settings and prints it as JSON.

- Calls OpenAI’s chat.completions.create() function to generate a response based on the provided parameters.

- Returns the response text if successful; returns None if an error occurs.
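
The request sent by `generate_text()` carries a two-message conversation: the system prompt first, then the user prompt. The sketch below builds only that list, without calling the API; `build_messages` is a hypothetical helper and the sample strings are made up.

```python
from typing import Dict, List


def build_messages(system: str, prompt: str) -> List[Dict[str, str]]:
    """Return the two-message list: system instructions first, then the user prompt."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]


messages = build_messages("You are a helpful assistant.", "Write a tagline.")
for message in messages:
    print(message["role"], "->", message["content"])
```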


### 6. `generate_object()` Method

- **prompt**: The input prompt to be processed by the model.

- **response_model**: A Pydantic model used to parse and validate the response.

- **system, temperature, max_tokens**: Same as in generate_text.


This method is similar to generate_text() but returns the response in the form of a structured Python object based on the provided response_model.

**Key Steps:**

- Similar to generate_text(), it constructs the request and calls OpenAI's API.

- Parses the response using response_model, which is a Pydantic model that validates the data.

- Returns the parsed response or None in case of an error.
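
To show what "parse and validate" means here without calling the API, the sketch below uses a standard-library stand-in: a dataclass plus manual checks instead of a Pydantic model. `SectionsSketch` and `parse_sections` are hypothetical names, and the real method relies on Pydantic and the OpenAI client rather than this code.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SectionsSketch:
    sections: List[str]


def parse_sections(raw_json: str) -> Optional[SectionsSketch]:
    """Parse a JSON string into SectionsSketch, or return None if it is invalid."""
    try:
        data = json.loads(raw_json)
        sections = data["sections"]
        if not isinstance(sections, list) or not all(isinstance(s, str) for s in sections):
            raise ValueError("'sections' must be a list of strings")
        return SectionsSketch(sections=sections)
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as error:
        # Mirror the method's behaviour: report the problem and return None.
        print(f"Failed to parse object. Error: {error}")
        return None


print(parse_sections('{"sections": ["About", "Usage"]}'))
print(parse_sections('{"sections": "About"}'))
```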


### 7. `get_model()` Method

This method simply returns the name of the current OpenAI model being used.

### 8. `set_model()` Method

- **model**: The new model to be used.

This method changes the model used by the EasyLLM instance. It recreates the OpenAI client with the stored API key and updates the internal model attribute; the model name itself is only sent to the API when a request is made.

**Key Steps:**

- Attempts to set the new model and handle errors if any.

- Prints logs if verbose or debug is enabled.


### 9. `estimate_tokens()` Method

- **text**: The input text whose tokens need to be estimated.

This method gives a rough estimate of the number of tokens in a string. It uses a regular expression to count whitespace-separated chunks, so punctuation stays attached to the neighboring word. The result is a word-level approximation, not the count the model's tokenizer would produce.

**Key Steps:**

- `re.findall(r"\S+", text)` matches every run of non-whitespace characters; each run is counted as one token.

- Returns the number of tokens found.
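
A quick standalone check makes the behaviour of the pattern visible. The function below repeats the same regular expression outside the class; the sample sentence is made up.

```python
import re


def estimate_tokens(text: str) -> int:
    """Count whitespace-separated chunks, as the method above does."""
    return len(re.findall(r"\S+", text))


# Punctuation stays attached to the word it follows.
print(re.findall(r"\S+", "Hello, world! How are you?"))
print(estimate_tokens("Hello, world! How are you?"))
```

The first line prints `['Hello,', 'world!', 'How', 'are', 'you?']` and the second prints `5`.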

### Summary:

This EasyLLM class is a Python wrapper around OpenAI's GPT API that simplifies interactions with the model. It includes methods for generating text, handling responses as objects, managing model selection, and estimating token usage.

By using EasyLLM, you can easily interact with OpenAI's GPT models without needing to manually handle API requests, set up system prompts, or process model responses.


In [None]:
import os
import openai
from pydantic import BaseModel
import traceback
from typing import Union
import json
import re

DEFAULT_OPENAI_MODEL = "gpt-4o-mini"
DEFAULT_SYSTEM_PROMPT = "You are an intelligent AI assistant. The user will give you a prompt, respond appropriately."
DEFAULT_TEMPERATURE = 0.5
DEFAULT_MAX_TOKENS = 1024


class EasyLLM:
    """
    A simple abstraction for the OpenAI API. It provides easy-to-use methods to generate text and objects using the OpenAI API.
    A demonstration for the "How to build an Abstraction with Open AI API" blog post.
    Author: Aditya Patange (AdiPat)
    """

    def __init__(
        self,
        api_key=os.getenv("OPENAI_API_KEY"),
        model=DEFAULT_OPENAI_MODEL,
        verbose=True,
        debug=True,
    ):
        self.verbose = verbose
        self.debug = debug

        if self.verbose:
            print("EasyLLM: Powering up! 🚀")

        self.api_key = api_key
        self.openai = openai.OpenAI(api_key=api_key)
        self.model = model

        if self.verbose:
            print(f"EasyLLM: Model set to {model}.")
            print("EasyLLM: Ready for some Generative AI action! ⚡️")

    def generate_text(
        self,
        prompt: str,
        system=DEFAULT_SYSTEM_PROMPT,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS,
    ) -> Union[str, None]:
        """Generates text using the OpenAI API."""
        try:
            if self.verbose or self.debug:
                print(f"Generating text for prompt: {prompt}")

            if self.debug:
                params = {
                    "prompt": prompt,
                    "system": system,
                    "temperature": temperature,
                    "max_tokens": max_tokens,
                    "model": self.model,
                }
                params = json.dumps(params, indent=2)
                print(f"Params: {params}")
            response = self.openai.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": prompt},
                ],
                temperature=temperature,
                max_tokens=max_tokens,
            )

            response = response.choices[0].message.content

            if self.verbose or self.debug:
                print("Text generated successfully. 🎉")

            if self.debug:
                # Serialize only for logging so the returned text is not JSON-quoted.
                print(f"EasyLLM Response: {json.dumps(response)}")
            return response
        except Exception as e:
            print(f"Failed to generate text. Error: {str(e)}")
            if self.debug:
                traceback.print_exc()
            return None

    def generate_object(
        self,
        prompt: str,
        response_model: type[BaseModel],  # the model class itself, not an instance
        system=DEFAULT_SYSTEM_PROMPT,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS,
    ) -> Union[BaseModel, None]:
        """Generates an object using the OpenAI API and given response model."""
        try:
            if self.verbose or self.debug:
                print(f"Generating object for prompt: {prompt}")

            if self.debug:
                params = {
                    "prompt": prompt,
                    "system": system,
                    "temperature": temperature,
                    "max_tokens": max_tokens,
                    "model": self.model,
                }
                params = json.dumps(params, indent=2)
                print(f"Params: {params}")

            response = self.openai.beta.chat.completions.parse(
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": prompt},
                ],
                response_format=response_model,
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
            )

            if self.verbose or self.debug:
                print("Object generated successfully. 🎉")

            if self.debug:
                response_json = response.model_dump_json()
                print(f"EasyLLM Response: {response_json}")
            return response.choices[0].message.parsed
        except Exception as e:
            print(f"Failed to generate object. Error: {str(e)}")
            if self.debug:
                traceback.print_exc()
            return None

    def get_model(self) -> str:
        """Gets the current model."""
        return self.model

    def set_model(self, model: str) -> None:
        """Sets the model to the given model."""
        try:
            if self.verbose or self.debug:
                print(f"Setting model to {model}")
            self.openai = openai.OpenAI(api_key=self.api_key)
            self.model = model
            if self.verbose or self.debug:
                print(f"Model set to {model}")
        except Exception as e:
            print(f"Failed to set model.\nError: {str(e)}")
            if self.debug:
                traceback.print_exc()
            return None

    def estimate_tokens(self, text: str) -> int:
        """
        Estimate the number of tokens in a given string.

        Args:
            text (str): The input string.

        Returns:
            int: Estimated token count.
        """
        # Split text by whitespace and count tokens, including punctuation
        tokens = re.findall(r"\S+", text)
        return len(tokens)

## Step 3: Define Classes, Models, and Functions for README Generation

### 1. Classes and Models

#### Project Class:

The Project class is defined using Pydantic's BaseModel. This model is used to structure project-related data like:

- repo_name: The repository name of the project.

- title: The title of the project.

- description: A short description of the project.

- contributors: A list of contributors to the project.

- references: External references for the project (e.g., articles, research papers).

- tech_stack: A list of technologies used in the project.

- tags: Keywords associated with the project.

- license: The license type of the project.

#### Sections Class:

The Sections class, also based on Pydantic, represents a list of sections in a README file:

- sections: A list of strings representing the different sections of the README.

### 2. Functions and Their Purpose

#### `extract_badges_from_markdown`

**Purpose**: This function extracts badges and their URLs from the given markdown content.

**Arguments**:  
- markdown_text (str): The markdown content (typically from a README file) containing badges.

**Returns**:  
- A dictionary where the keys are badge names and the values are URLs.

**How it works**:  
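The function uses a regular expression to find all linked badges in the markdown text, that is, badge images wrapped in a link: `[![badge-name](image-url)](link-url)`. It captures the badge name and the image URL. A bare image such as `![badge-name](url)` that is not wrapped in a link is not matched.  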

The function then returns a dictionary with badge names as keys and corresponding URLs as values.
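
The pattern can be tried on a small sample to see what it captures. The markdown below is made up for illustration; only the regular expression comes from the notebook.

```python
import re
from typing import Dict

# Same pattern as in extract_badges_from_markdown.
BADGE_PATTERN = r"\[!\[([^\]]+)\]\((https?://[^\)]+)\)\]"

sample_markdown = """
[![Python](https://img.shields.io/badge/Python-3776AB)](#)
![Plain image](https://example.com/plain.svg)
"""

# Each match is a (badge name, image URL) pair.
badges: Dict[str, str] = dict(re.findall(BADGE_PATTERN, sample_markdown))
print(badges)
```

Only the linked badge is picked up, so the output is `{'Python': 'https://img.shields.io/badge/Python-3776AB'}`.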

#### `get_badges`

**Purpose**: Fetches the badges markdown from an online source.

**Arguments**:  
- None

**Returns**:  
- A dictionary of badge names and their URLs.

**How it works**:  
The function fetches markdown content from a public GitHub repository that contains badges.  

The `extract_badges_from_markdown` function is then called to extract badges and their URLs from this markdown content.

#### `get_table_of_contents`

**Purpose**: Generates the table of contents (ToC) for a project's README.

**Arguments**:  
- llm (EasyLLM): An instance of the EasyLLM class for generating sections dynamically.

- project (Project): An instance of the Project class containing project details.

**Returns**:  
- A list of sections that should be included in the README's table of contents.

**How it works**:  
The function starts by defining mandatory sections (e.g., "About", "Problem Statement", "License") that must be included in every README.  

It then generates new, unique sections for the README by prompting the EasyLLM class to generate content based on the project details.  

The generated sections are added to the table of contents, ensuring no duplication of mandatory sections.
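
The ordering and de-duplication can be sketched without an LLM call. In this minimal version, `merge_sections` is a hypothetical helper and the `generated` list stands in for the sections the model would return.

```python
from typing import List


def merge_sections(first_half: List[str], generated: List[str], second_half: List[str]) -> List[str]:
    """Combine section lists in order, skipping names already present."""
    table_of_contents = list(first_half)  # copy so the input is not mutated
    for section in generated + second_half:
        if section not in table_of_contents:
            table_of_contents.append(section)
    return table_of_contents


toc = merge_sections(
    ["About", "Usage"],
    ["Knowledge Graphs in Practice", "Usage"],  # "Usage" repeats a mandatory section
    ["License", "References"],
)
print(toc)
```

The generated sections land between the two mandatory halves, and the repeated "Usage" is dropped.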

#### `get_readme_generator_config`

**Purpose**: Returns the configuration for generating a README with specific visual sections.

**Arguments**:  
- title (str): The title of the project.

**Returns**:  
- A dictionary containing the configuration for generating the README header and table of contents.

**How it works**:  
This function defines a README_GENERATOR_CONFIG dictionary containing prompts for generating:  

- A project header section with a title, badges, a message to encourage stars on GitHub, and a marketing tagline.

- A table of contents with mandatory sections like "About", "License", "Installation", and more.

The function returns this configuration to guide the README generation process.

### 3. Integration and Workflow

The entire setup leverages the EasyLLM instance to dynamically generate parts of the README (like the table of contents and specific sections) based on the project details.

### How it all ties together:

- Badges are fetched and extracted from an external markdown source (via the `get_badges` function).

- Table of contents for the README is dynamically generated by calling the `get_table_of_contents` function, which relies on the EasyLLM instance to add relevant sections specific to the project.

- README generation configuration is prepared using `get_readme_generator_config`, including the title and badges.

The code provides a structured and modular approach for generating well-documented, rich README files that are customized based on project details and contributions.


In [None]:
import json  # used by get_table_of_contents and get_readme_generator_config
import re
from typing import Any, Dict, List

import requests
from pydantic import BaseModel


class Project(BaseModel):
    repo_name: str
    title: str
    description: str
    contributors: list[str]
    references: list[str]
    tech_stack: list[str]
    tags: list[str]
    license: str


class Sections(BaseModel):
    sections: list[str]


def extract_badges_from_markdown(markdown_text: str) -> Dict[str, str]:
    """
    Extract badges and their URLs from the given markdown text.

    Args:
        markdown_text (str): The markdown content containing badges.

    Returns:
        dict: A dictionary with badge names as keys and URLs as values.
    """
    pattern = r"\[!\[([^\]]+)\]\((https?://[^\)]+)\)\]"
    matches = re.findall(pattern, markdown_text)
    return {badge: url for badge, url in matches}


def get_badges() -> Dict[str, str]:
    """
    Get the markdown content of the README file containing badges.

    Args:
        None

    Returns:
        dict: A dictionary with badge names as keys and URLs as values.

    """
    url = (
        "https://raw.githubusercontent.com/inttter/md-badges/refs/heads/main/README.md"
    )
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    badges_markdown = response.text
    return extract_badges_from_markdown(badges_markdown)


def get_table_of_contents(llm: EasyLLM, project: Project) -> List[str]:
    """
    Generates the table of contents for the README.

    Args:
        llm (EasyLLM): The EasyLLM instance.
        project (Project): The Project instance.

    Returns:
        list: A list of sections for the table of contents.
    """
    mandatory_sections_first_half = [
        "About",
        "Problem Statement",
        "Research Areas",
        "Installation",
        "Usage",
    ]
    mandatory_sections_second_half = [
        "Contributing",
        "License",
        "Acknowledgements",
        "Authors",
        "References",
    ]
    # Copy the list so appending below does not mutate the mandatory sections.
    table_of_contents = list(mandatory_sections_first_half)

    prompt = f"""
        Given the table of contents for the GitHub README of Project: '{project.title}' with the following sections:
        Mandatory Sections: {json.dumps(mandatory_sections_first_half + mandatory_sections_second_half)}
        These sections should be linked to the respective sections in the README.
        These are mandatory sections for a good README.
        You can add more sections if you want as per the project requirements.
        For instance, a web scraping project would have "AI & Web Scraping" as a separate section. 
        Then, one of our projects, knowledge-grapher would have "Knowledge Graphs in Practice" and "How Google uses Knowledge Graphs at scale" as a separate section.

        Generate any 3 new sections unique, and specific to the project.
        Make sure you don't repeat the same sections and don't include the mandatory sections again.

        Project Details: '''{project.model_dump_json()}'''
    """
    sections = llm.generate_object(prompt, Sections)

    if sections and len(sections.sections) > 0:
        for section in sections.sections:
            if section not in table_of_contents:
                table_of_contents.append(section)

    for section in mandatory_sections_second_half:
        if section not in table_of_contents:
            table_of_contents.append(section)

    return table_of_contents


def get_readme_generator_config(title: str) -> Dict[str, Any]:
    README_GENERATOR_CONFIG = {
        "visual_config": {
            "header": {
                "prompt": f"""
                Generate a good header section for the README
                1. TITLE: It should contain the title of the project with an emoji. 
                Example: `# 🚀 My Awesome Project`
                Be creative with the emoji and title and try to ensure that the title is relevant to the project and README.
                TITLE: {title}

                2. BADGES: After the title add badges.
                Available Badges: {json.dumps(get_badges())}

                3. Message to star us on GitHub.
                Example: Please give us a ⭐ on GitHub if this project helped you!

                4. Marketing Tagline: Add a marketing tagline for the project.
                Example: #### Simplify your ETL pipelines with LitETL 🔥 — the lightweight ETL framework!
                """
            },
            "table_of_contents": {
                "prompt": """
                Generate a table of contents for the README.
                This should include "About", "Problem Statement", "Research Areas", "Installation", "Usage", "Contributing", "License", "Acknowledgements", "Authors", "References".
                These sections should be linked to the respective sections in the README.
                These are mandatory sections for a good README.
                You can add more sections if you want as per the project requirements.
                For instance, a web scraping project would have "AI & Web Scraping" as a separate section. 
                Then, one of our projects, knowledge-grapher would have "Knowledge Graphs in Practice" and "How Google uses Knowledge Graphs at scale" as a separate section.
                """
            },
        }
    }
    return README_GENERATOR_CONFIG

## Step 4: Extract and Process Badges for README

### 1. Importing Dependencies

- `import json`: The json module is used to handle JSON data, specifically for converting the badges dictionary into a well-formatted JSON string.

### 2. Creating an Instance of EasyLLM

- `llm = EasyLLM()`: An instance of the EasyLLM class is created, which will be used to estimate the number of tokens for the badges JSON data.

### 3. Getting Badges

- `badges = get_badges()`: This calls the `get_badges()` function defined earlier. It fetches the badge data from a specific markdown file and returns it as a dictionary, where each key is a badge name and each value is the corresponding URL.

### 4. Converting Badges to JSON

- `badges_json = json.dumps(badges, indent=2)`: The `json.dumps()` function converts the badges dictionary into a well-formatted JSON string. The `indent=2` argument ensures that the JSON is neatly indented for better readability.

### 5. Displaying Badge Count

- `print(f"Badges found in the README: {len(badges)}")`: This prints the number of badges extracted from the README file by displaying the length of the badges dictionary.

### 6. Estimating Tokens

- `print(f"Tokens: {llm.estimate_tokens(badges_json)}")`: The `estimate_tokens()` method of the EasyLLM instance is called to estimate how many tokens the badges JSON will consume in the context window of the language model. The result is printed.

### 7. Token Estimation

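- The token count determines how much of the model's context window the data occupies. Estimating it up front helps avoid exceeding that window, which could lead to truncation or errors. Because `estimate_tokens()` counts whitespace-separated chunks, it typically undercounts (a long URL counts as one chunk), so treat the number as a rough guide rather than an exact figure.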
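
A budget check built on the same whitespace-based estimate might look like the sketch below. `fits_in_budget` and the budget values are assumptions for illustration; they are not real model limits.

```python
import re


def fits_in_budget(text: str, token_budget: int) -> bool:
    """Return True if the whitespace-based estimate is within `token_budget`."""
    estimated = len(re.findall(r"\S+", text))
    return estimated <= token_budget


# A one-badge JSON document: 4 whitespace-separated chunks.
sample = '{\n  "License": "https://img.shields.io/badge/License-MIT-blue"\n}'
print(fits_in_budget(sample, token_budget=10))
```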

### 8. Printing the Badges in JSON Format

- `print(badges_json)`: This prints the formatted JSON representation of the badges, allowing you to visually inspect the extracted badge data and their URLs.

### **Expected Output**

- **Badge Count**: The number of badges found in the README file will be displayed.

- **Token Count**: The estimated number of tokens required to process the badges JSON data will be printed.

- **Badges in JSON Format**: The badges, along with their respective URLs, will be displayed as a JSON string.

### **Example Output**

```plaintext
Badges found in the README: 5
Tokens: 13
{
  "Build Status": "https://img.shields.io/badge/Build_Status-Passing-brightgreen",
  "License": "https://img.shields.io/badge/License-MIT-blue",
  "Version": "https://img.shields.io/badge/Version-1.0.0-blue",
  "Docs": "https://img.shields.io/badge/Docs-Available-brightgreen",
  "Downloads": "https://img.shields.io/badge/Downloads-1000%2B-orange"
}
```



In [None]:
import json

llm = EasyLLM()
badges = get_badges()
badges_json = json.dumps(badges, indent=2)


print(f"Badges found in the README: {len(badges)}")
# For estimating if it will fit in the LLM context window
print(f"Tokens: {llm.estimate_tokens(badges_json)}")
print(badges_json)

## Step 5: Generate Table of Contents for README

### 1. Importing clear_output from IPython

- `from IPython.display import clear_output`: This import allows clearing the output of the Jupyter notebook cell. It's useful for cleaning up the console output before displaying new results.

### 2. Generating Table of Contents

- `table_of_contents = get_table_of_contents(...)`: The `get_table_of_contents()` function is called with the following parameters:

  - `llm (EasyLLM instance)`: An instance of the EasyLLM class that helps generate sections for the ToC based on the project details.

  - `Project`: A Project object created using the Project class (defined earlier). The Project is initialized with sample data such as the repository name, title, description, contributors, references, tech stack, tags, and license. This object is passed to the `get_table_of_contents()` function to help generate relevant sections for the table of contents.

  - The `get_table_of_contents()` function uses this data to generate sections dynamically, considering both mandatory sections and additional sections specific to the project.

### 3. Clearing Output

- `clear_output()`: This function clears the output of the current Jupyter notebook cell. It is useful for resetting the output display before showing new or updated information, making the output look cleaner.

### 4. Printing the Table of Contents

- `print(f"Table of Contents: {table_of_contents}")`: After generating the table of contents, this line prints the resulting list of sections that will be included in the README file. These sections are typically linked in the document to facilitate navigation.
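
The printed list can be turned into linked markdown entries. The sketch below assumes a simplified anchor rule (lowercase, punctuation removed, each whitespace character replaced by a hyphen) that approximates how GitHub builds heading anchors; `to_anchor` and `render_toc` are hypothetical helpers that are not part of the notebook.

```python
import re
from typing import List


def to_anchor(heading: str) -> str:
    """Simplified slug: lowercase, drop punctuation, turn spaces into hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # remove punctuation such as "&"
    return re.sub(r"\s", "-", slug)  # one hyphen per whitespace character


def render_toc(sections: List[str]) -> str:
    """Render section names as a markdown list of in-page links."""
    return "\n".join(f"- [{name}](#{to_anchor(name)})" for name in sections)


print(render_toc(["About", "Problem Statement", "AI & Web Scraping"]))
```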

### **Expected Output**

- The code will generate a table of contents (ToC) that includes a combination of mandatory and project-specific sections. The output should look something like this:

```plaintext
Table of Contents: ['About', 'Problem Statement', 'Research Areas', 'Installation', 'Usage', 'Additional Section 1', 'Additional Section 2', 'Additional Section 3', 'Contributing', 'License', 'Acknowledgements', 'Authors', 'References']
```


In [None]:
from IPython.display import clear_output

table_of_contents = get_table_of_contents(
    llm,
    Project(
        repo_name="test",
        title="Test",
        description="Test",
        contributors=["Adi"],
        references=["Ref"],
        tech_stack=["Python"],
        tags=["Test"],
        license="MIT",
    ),
)

clear_output()

print(f"Table of Contents: {table_of_contents}")

## Step 6: Implementing the EasyLLM Class

### Imports and Constants

- ``import os, import openai, import traceback, import json, and import re``: These are necessary imports to handle environment variables, make requests to the OpenAI API, handle exceptions, process JSON data, and work with regular expressions.

### DEFAULT Constants:

- ``DEFAULT_OPENAI_MODEL``: Specifies the default model to use with the OpenAI API, in this case, "gpt-4o-mini".

- ``DEFAULT_SYSTEM_PROMPT``: The system message that gives context to the AI model, which helps it understand the task. Here, it's set to guide the AI to behave as an intelligent assistant.

- ``DEFAULT_TEMPERATURE``: Controls the randomness of the AI's responses. A temperature of 0.5 means it will generate moderately creative responses.

- ``DEFAULT_MAX_TOKENS``: Sets the maximum number of tokens (words, punctuation, etc.) that the model will generate in its response, capped at 1024 tokens.

### EasyLLM Class

This class serves as an abstraction over the OpenAI API, making it easier to interact with the API and handle different tasks like generating text and estimating tokens.

### Constructor (__init__)

The constructor initializes the EasyLLM instance with several parameters, including the API key, model, and debug/verbose flags.

- ``api_key=os.getenv("OPENAI_API_KEY")``: Retrieves the OpenAI API key from the environment variable OPENAI_API_KEY.

- ``self.openai = openai.OpenAI(api_key=api_key)``: Initializes the OpenAI object with the API key.

- If ``verbose=True``, it prints messages to let the user know that the class is ready for use.

### `generate_text` Method

This method is responsible for generating text using the OpenAI API based on a prompt provided by the user.

### Parameters:

- ``prompt``: The user’s input prompt that the model will respond to.

- ``system``: A system prompt that sets the context for the model (default is ``DEFAULT_SYSTEM_PROMPT``).

- ``temperature``: Determines the randomness of the response (default is 0.5).

- ``max_tokens``: Maximum number of tokens for the model's response (default is 1024).

### Flow:

- First, it checks if ``verbose`` or ``debug`` is enabled to print the details of the prompt and parameters.

- The ``openai.chat.completions.create()`` method is called to send the prompt to the API and generate a response.

- The response content is extracted and returned.

- If an error occurs, it catches the exception and prints an error message, as well as a traceback if debugging is enabled.
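
The error-handling flow can be shown on its own, with a callable standing in for the API request. `safe_call` and `failing_action` are hypothetical names used only for this sketch.

```python
import traceback
from typing import Callable, Optional


def safe_call(action: Callable[[], str], debug: bool = False) -> Optional[str]:
    """Run `action`; on any exception, log it and return None."""
    try:
        return action()
    except Exception as error:
        print(f"Failed to generate text. Error: {error}")
        if debug:
            traceback.print_exc()  # full stack trace only in debug mode
        return None


def failing_action() -> str:
    raise RuntimeError("simulated API failure")


print(safe_call(lambda: "hello"))
print(safe_call(failing_action))
```

Returning None keeps the notebook running after a failed request, at the cost of requiring callers to check the result before using it.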

### `generate_object` Method

This method generates an object using the OpenAI API, but instead of raw text, it returns a parsed object according to a response model defined by the user.

### Parameters:

- ``prompt``: The input prompt provided by the user.

- ``response_model``: A Pydantic BaseModel class that defines the structure of the response object.

- Other parameters like ``system``, ``temperature``, and ``max_tokens`` work similarly to the ``generate_text`` method.

### Flow:

- Similar to ``generate_text``, it sends the prompt to the OpenAI API.

- The response is parsed into the defined Pydantic model and returned.

- If an error occurs, it prints the error and traceback if debugging is enabled.

### `get_model` and `set_model` Methods

- `get_model`: Returns the currently set model.

- `set_model`: Changes the model to the one given in the ``model`` parameter. The OpenAI client is recreated with the stored API key, and the new model name is stored for subsequent calls.

### `estimate_tokens` Method

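This method gives a rough estimate of the number of tokens in a given text string. Tokens are the units that the model processes; the estimate here counts whitespace-separated chunks, so it approximates, rather than reproduces, the model's real tokenizer count.

### Flow:

- The function uses the regular expression ``re.findall(r"\S+", text)`` to find runs of non-whitespace characters, treating each run as one token (punctuation stays attached to the word it touches).

- It returns the number of such runs in the input text.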
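
To make the behaviour concrete, here is a small standalone sketch of the same whitespace-based count. The second function, `estimate_tokens_by_chars`, is a hypothetical alternative based on the common rule of thumb of about four characters per token; it is an assumption added for comparison, not something the notebook uses.

```python
import re


def estimate_tokens(text: str) -> int:
    """Count runs of non-whitespace characters, as described above."""
    return len(re.findall(r"\S+", text))


def estimate_tokens_by_chars(text: str) -> int:
    """Hypothetical alternative: roughly 4 characters per token."""
    return max(1, len(text) // 4)


sample = "Hello, world! This is a test."
url = "https://img.shields.io/badge/Python-3776AB"  # illustrative URL

print(estimate_tokens(sample))        # 6
print(estimate_tokens(url))           # 1
print(estimate_tokens_by_chars(url))  # 10
```

The URL case shows the main weakness of the whitespace approach: a long string without spaces counts as a single token, whereas a real tokenizer would split it into many.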

# General Overview of the Flow

1. ``Initialization (__init__)``: Sets up the instance with API keys and model configuration.

2. ``Text Generation (generate_text)``: Generates text responses from the AI based on a prompt.

3. ``Object Generation (generate_object)``: Similar to ``generate_text``, but parses the AI’s response into a structured object.

4. ``Model Management (get_model, set_model)``: Allows getting or changing the AI model.

5. ``Token Estimation (estimate_tokens)``: Estimates the number of tokens in a given string, which helps with understanding input/output limitations in the OpenAI context.


In [49]:
import os
import openai
from pydantic import BaseModel
import traceback
from typing import Union
import json
import re

DEFAULT_OPENAI_MODEL = "gpt-4o-mini"
DEFAULT_SYSTEM_PROMPT = "You are an intelligent AI assistant. The user will give you a prompt, respond appropriately."
DEFAULT_TEMPERATURE = 0.5
DEFAULT_MAX_TOKENS = 1024


class EasyLLM:
    """
    A simple abstraction for the OpenAI API. It provides easy-to-use methods to generate text and objects using the OpenAI API.
    A demonstration for the "How to build an Abstraction with Open AI API" blog post.
    Author: Aditya Patange (AdiPat)
    """

    def __init__(
        self,
        api_key=os.getenv("OPENAI_API_KEY"),
        model=DEFAULT_OPENAI_MODEL,
        verbose=True,
        debug=True,
    ):
        self.verbose = verbose
        self.debug = debug

        if self.verbose:
            print("EasyLLM: Powering up! 🚀")

        self.api_key = api_key
        self.openai = openai.OpenAI(api_key=api_key)
        self.model = model

        if self.verbose:
            print(f"EasyLLM: Model set to {model}.")
            print("EasyLLM: Ready for some Generative AI action! ⚡️")

    def generate_text(
        self,
        prompt: str,
        system=DEFAULT_SYSTEM_PROMPT,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS,
    ) -> Union[str, None]:
        """Generates text using the OpenAI API."""
        try:
            if self.verbose or self.debug:
                print(f"Generating text for prompt: {prompt}")

            if self.debug:
                params = {
                    "prompt": prompt,
                    "system": system,
                    "temperature": temperature,
                    "max_tokens": max_tokens,
                    "model": self.model,
                }
                params = json.dumps(params, indent=2)
                print(f"Params: {params}")
            response = self.openai.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": prompt},
                ],
                temperature=temperature,
                max_tokens=max_tokens,
            )

            response = response.choices[0].message.content

            if self.verbose or self.debug:
                print("Text generated successfully. 🎉")

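            if self.debug:
                # Log a JSON-escaped copy; leave `response` itself unchanged
                # so the caller gets the plain text, not a quoted JSON string.
                print(f"EasyLLM Response: {json.dumps(response)}")
            return response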
        except Exception as e:
            print(f"Failed to generate text. Error: {str(e)}")
            if self.debug:
                traceback.print_exc()
            return None

    def generate_object(
        self,
        prompt: str,
        response_model: type[BaseModel],  # the model class itself, not an instance
        system=DEFAULT_SYSTEM_PROMPT,
        temperature=DEFAULT_TEMPERATURE,
        max_tokens=DEFAULT_MAX_TOKENS,
    ) -> Union[BaseModel, None]:
        """Generates an object using the OpenAI API and given response model."""
        try:
            if self.verbose or self.debug:
                print(f"Generating object for prompt: {prompt}")

            if self.debug:
                params = {
                    "prompt": prompt,
                    "system": system,
                    "temperature": temperature,
                    "max_tokens": max_tokens,
                    "model": self.model,
                }
                params = json.dumps(params, indent=2)
                print(f"Params: {params}")

            response = self.openai.beta.chat.completions.parse(
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": prompt},
                ],
                response_format=response_model,
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
            )

            if self.verbose or self.debug:
                print("Object generated successfully. 🎉")

            if self.debug:
                response_json = response.model_dump_json()
                print(f"EasyLLM Response: {response_json}")
            return response.choices[0].message.parsed
        except Exception as e:
            print(f"Failed to generate object. Error: {str(e)}")
            if self.debug:
                traceback.print_exc()
            return None

    def get_model(self) -> str:
        """Gets the current model."""
        return self.model

    def set_model(self, model: str) -> None:
        """Sets the model to the given model."""
        try:
            if self.verbose or self.debug:
                print(f"Setting model to {model}")
            self.openai = openai.OpenAI(api_key=self.api_key)
            self.model = model
            if self.verbose or self.debug:
                print(f"Model set to {model}")
        except Exception as e:
            print(f"Failed to set model.\nError: {str(e)}")
            if self.debug:
                traceback.print_exc()
            return None

    def estimate_tokens(self, text: str) -> int:
        """
        Estimate the number of tokens in a given string.

        Args:
            text (str): The input string.

        Returns:
            int: Estimated token count.
        """
        # Split text by whitespace and count tokens, including punctuation
        tokens = re.findall(r"\S+", text)
        return len(tokens)

## Step 7: Generate README Configuration and Content

## Imports

- ``Dict``, ``Any``, and ``List`` are imported from ``typing`` for type annotations, and ``BaseModel`` is imported from ``pydantic`` for validation.

- ``requests`` is used for making HTTP requests, specifically to retrieve the content of a file from a URL.

- ``re`` is imported for regular expressions, which are used to extract badges from markdown.

- ``json`` is used for serializing data into JSON format.

## Classes

### Project Class (Pydantic BaseModel)

This class defines the schema for a project and uses Pydantic's BaseModel to enforce validation.

- ``repo_name``: The name of the repository.

- ``title``: The title of the project.

- ``description``: A description of the project.

- ``contributors``: A list of contributors to the project (list of strings).

- ``references``: A list of references related to the project (list of strings).

- ``tech_stack``: The technologies used in the project (list of strings).

- ``tags``: Tags associated with the project (list of strings).

- ``license``: The license type for the project.

### Sections Class (Pydantic BaseModel)

This class defines a list of sections that are to be used in the table of contents. It's a simple schema with a single field ``sections``, which is a list of strings.

## Functions

### `extract_badges_from_markdown` (markdown_text: str) -> Dict[str, str]

This function is responsible for extracting badges from a markdown text.

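- It uses a regular expression (``r"\[!\[([^\]]+)\]\((https?://[^\)]+)\)\]"``) to match badges written as linked images, i.e. ``[![badge_name](image_URL)](link)``. Only the badge name and the image URL are captured; a bare ``![badge_name](URL)`` without the surrounding link does not match.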

- The ``re.findall()`` function is used to extract the badge names and URLs from the markdown text.

- The function returns a dictionary where the badge names are the keys, and the URLs are the values.
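
The following sketch runs the same pattern against a small hand-written sample. The markdown snippet and its badge URLs are made up for illustration and are not taken from the md-badges README.

```python
import re

# Same pattern as in extract_badges_from_markdown
BADGE_PATTERN = r"\[!\[([^\]]+)\]\((https?://[^\)]+)\)\]"

# Illustrative markdown; the badge URLs are made-up examples.
sample_markdown = """
| [![Python](https://img.shields.io/badge/Python-3776AB)](#) | linked badge |
| ![Rust](https://img.shields.io/badge/Rust-000000) | bare image, no link |
"""

# findall returns (name, url) tuples, which dict() turns into a mapping
badges = dict(re.findall(BADGE_PATTERN, sample_markdown))
print(badges)
# {'Python': 'https://img.shields.io/badge/Python-3776AB'}
```

Only the linked badge is picked up; the bare image is skipped because the pattern requires the opening `[` of the surrounding link. If the same badge name appears more than once, the last URL wins, since dictionary keys are unique.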

### `get_badges` () -> Dict[str, str]

This function retrieves the markdown content for badges from a specific URL (``https://raw.githubusercontent.com/inttter/md-badges/refs/heads/main/README.md``). It calls ``requests.get()`` to fetch the README file from this URL, then passes the content to ``extract_badges_from_markdown()`` to extract the badge names and URLs.

- The function returns a dictionary of badges and URLs.

### `get_table_of_contents` (llm: EasyLLM, project: Project) -> List[str]

This function generates a table of contents for the README, using a mandatory list of sections and adding custom ones specific to the project.

- It starts by defining two lists of mandatory sections: one for the first half of the README (e.g., "About", "Problem Statement", "Usage") and the second half (e.g., "Contributing", "License", "References").

- The function then uses the ``EasyLLM`` instance (the wrapper class defined earlier in this notebook) to generate additional project-specific sections.

- A prompt is constructed, providing the mandatory sections and asking the LLM to suggest three new sections that are unique and specific to the project.

- The sections generated by the LLM are appended after the first-half sections, skipping any that are already present. The second-half mandatory sections are then appended last, again skipping duplicates.

- Finally, the function returns the full table of contents.
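
The merging step can be tried in isolation, without an LLM call. The helper `merge_sections` below is a hypothetical name; it is a minimal sketch of the ordering and de-duplication logic described above, fed with hand-written "suggested" sections in place of a model response.

```python
def merge_sections(first_half, suggested, second_half):
    """First-half sections, then new suggestions, then second-half sections."""
    toc = list(first_half)  # copy, so the caller's list is not mutated
    for section in list(suggested) + list(second_half):
        if section not in toc:
            toc.append(section)
    return toc


first = ["About", "Usage"]
second = ["Contributing", "License"]
# Pretend the LLM suggested one new section and repeated two mandatory ones.
suggested = ["Benchmarks", "Usage", "License"]

toc = merge_sections(first, suggested, second)
print(toc)
# ['About', 'Usage', 'Benchmarks', 'License', 'Contributing']
```

Note the side effect visible in the output: if the model repeats a second-half section such as "License", it is placed where the model suggested it rather than at the end. Filtering the suggestions against both mandatory lists first would avoid that.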

## `get_readme_generator_config` (title: str) -> Dict[str, Any]

This function generates the configuration for the README generator, focusing on the visual aspects (header and table of contents).

### Visual Configuration for Header:

- The header prompt asks the LLM to generate a header section for the README. It includes:

  - A title with an emoji (e.g., ``# 🚀 My Awesome Project``).

  - A list of badges to be included below the title (using the ``get_badges()`` function).

  - A call to action to encourage users to star the project on GitHub.

  - A marketing tagline that succinctly describes the project.

### Table of Contents:

- The table-of-contents prompt asks the LLM for a table of contents with the mandatory sections and any additional project-specific sections.

- The function returns a dictionary with ``visual_config`` that includes the generated prompts for the header and table of contents.
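
The notebook does not show how this configuration is consumed, so the following is only one possible sketch. It assumes each entry under ``visual_config`` is sent to a text-generation function; `render_visual_sections`, `sample_config`, and `fake_generate` are hypothetical names, and the fake generator stands in for a real LLM call.

```python
from typing import Any, Callable, Dict


def render_visual_sections(
    config: Dict[str, Any], generate: Callable[[str], str]
) -> Dict[str, str]:
    """Send each visual_config prompt to `generate` and collect the results."""
    return {
        name: generate(entry["prompt"])
        for name, entry in config["visual_config"].items()
    }


# Stand-in config with the same shape as the dictionary described above
sample_config = {
    "visual_config": {
        "header": {"prompt": "Generate a header for Test"},
        "table_of_contents": {"prompt": "Generate a table of contents"},
    }
}


def fake_generate(prompt: str) -> str:
    """Stand-in for an LLM call; just echoes the prompt."""
    return f"<output for: {prompt}>"


rendered = render_visual_sections(sample_config, fake_generate)
print(rendered["header"])  # <output for: Generate a header for Test>
```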

## Overall Workflow

### Badges Extraction:

- The code starts by retrieving badges from a specific URL (``get_badges()``) and parsing them using regular expressions (``extract_badges_from_markdown()``).

### Generating Table of Contents:

- The ``get_table_of_contents()`` function uses the LLM to generate the table of contents by combining predefined sections and custom sections specific to the project.

### Generating README Configuration:

- The ``get_readme_generator_config()`` function sets up the README header and table of contents configuration, which will be used to generate the final README file for the project.

### Example of Usage

- If you call ``get_table_of_contents()`` with an instance of ``Project`` and an LLM object, the function will return a list of sections that should be included in the project's README.

- Calling ``get_readme_generator_config()`` with the title of the project will generate the visual configuration required to display the README header and its contents.


In [51]:
import json  # used below to serialize section lists and badges into prompts
import re
from typing import Any, Dict, List

import requests
from pydantic import BaseModel


class Project(BaseModel):
    repo_name: str
    title: str
    description: str
    contributors: list[str]
    references: list[str]
    tech_stack: list[str]
    tags: list[str]
    license: str


class Sections(BaseModel):
    sections: list[str]


def extract_badges_from_markdown(markdown_text: str) -> Dict[str, str]:
    """
    Extract badges and their URLs from the given markdown text.

    Args:
        markdown_text (str): The markdown content containing badges.

    Returns:
        dict: A dictionary with badge names as keys and URLs as values.
    """
    pattern = r"\[!\[([^\]]+)\]\((https?://[^\)]+)\)\]"
    matches = re.findall(pattern, markdown_text)
    return {badge: url for badge, url in matches}


def get_badges() -> Dict[str, str]:
    """
    Get the markdown content of the README file containing badges.

    Args:
        None

    Returns:
        dict: A dictionary with badge names as keys and URLs as values.

    """
    url = (
        "https://raw.githubusercontent.com/inttter/md-badges/refs/heads/main/README.md"
    )
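    # Fail fast on network problems instead of hanging or parsing an error page
    response = requests.get(url, timeout=10)
    response.raise_for_status()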
    badges_markdown = response.text
    return extract_badges_from_markdown(badges_markdown)


def get_table_of_contents(llm: EasyLLM, project: Project) -> List[str]:
    """
    Generates the table of contents for the README.

    Args:
        llm (EasyLLM): The EasyLLM instance.
        project (Project): The Project instance.

    Returns:
        list: A list of sections for the table of contents.
    """
    mandatory_sections_first_half = [
        "About",
        "Problem Statement",
        "Research Areas",
        "Installation",
        "Usage",
    ]
    mandatory_sections_second_half = [
        "Contributing",
        "License",
        "Acknowledgements",
        "Authors",
        "References",
    ]
    # Copy the list so appending below does not mutate the mandatory list
    table_of_contents = list(mandatory_sections_first_half)

    prompt = f"""
        Given the table of contents for the GitHub README of Project: '{project.title}' with the following sections:
        Mandatory Sections: {json.dumps(mandatory_sections_first_half + mandatory_sections_second_half)}
        These sections should be linked to the respective sections in the README.
        These are mandatory sections for a good README.
        You can add more sections if you want as per the project requirements.
        For instance, a web scraping project would have "AI & Web Scraping" as a separate section. 
        Then, one of our projects, knowledge-grapher would have "Knowledge Graphs in Practice" and "How Google uses Knowledge Graphs at scale" as a separate section.

        Generate any 3 new sections unique, and specific to the project.
        Make sure you don't repeat the same sections and don't include the mandatory sections again.

        Project Details: '''{project.model_dump_json()}'''
    """
    sections = llm.generate_object(prompt, Sections)

    if sections and len(sections.sections) > 0:
        for section in sections.sections:
            if section not in table_of_contents:
                table_of_contents.append(section)

    for section in mandatory_sections_second_half:
        if section not in table_of_contents:
            table_of_contents.append(section)

    return table_of_contents


def get_readme_generator_config(title: str) -> Dict[str, Any]:
    README_GENERATOR_CONFIG = {
        "visual_config": {
            "header": {
                "prompt": f"""
                Generate a good header section for the README
                1. TITLE: It should contain the title of the project with an emoji. 
                Example: `# 🚀 My Awesome Project`
                Be creative with the emoji and title and try to ensure that the title is relevant to the project and README.
                TITLE: {title}

                2. BADGES: After the title add badges.
                Available Badges: {json.dumps(get_badges())}

                3. Message to star us on GitHub.
                Example: Please give us a ⭐ on GitHub if this project helped you!

                4. Marketing Tagline: Add a marketing tagline for the project.
                Example: #### Simplify your ETL pipelines with LitETL 🔥 — the lightweight ETL framework!
                """
            },
            "table_of_contents": {
                "prompt": """
                Generate a table of contents for the README.
                This should include "About", "Problem Statement", "Research Areas", "Installation", "Usage", "Contributing", "License", "Acknowledgements", "Authors", "References".
                These sections should be linked to the respective sections in the README.
                These are mandatory sections for a good README.
                You can add more sections if you want as per the project requirements.
                For instance, a web scraping project would have "AI & Web Scraping" as a separate section. 
                Then, one of our projects, knowledge-grapher would have "Knowledge Graphs in Practice" and "How Google uses Knowledge Graphs at scale" as a separate section.
                """
            },
        }
    }
    return README_GENERATOR_CONFIG

## Step 8: Retrieve and Format Badges, Estimate Token Usage

This code is designed to retrieve badges from a markdown file, format them as JSON, and estimate how many tokens the resulting JSON will consume in the context of a large language model (LLM).

### Initializing EasyLLM Instance:

- An instance of ``EasyLLM`` (the wrapper class defined earlier in this notebook) is created to interact with the LLM. In this step it is used only for its ``estimate_tokens()`` method.

### Fetching Badges:

- The ``get_badges()`` function is called to retrieve the badges from a markdown file located at a specified URL. This function parses the markdown, extracting badge names and their corresponding URLs, returning them as a dictionary.

### Converting Badges to JSON:

- The ``json.dumps()`` function is used to convert the dictionary of badges into a JSON-formatted string. The badges are indented with 2 spaces for better readability. This string (``badges_json``) now represents the badges in a structured, machine-readable format.

### Printing the Number of Badges:

- The code prints the number of badges retrieved by calculating the length of the badges dictionary. This helps in understanding how many badges were extracted from the markdown file.

### Estimating Token Usage:

- The ``estimate_tokens()`` method of the ``EasyLLM`` instance is used to estimate the number of tokens the JSON string will consume if passed to the LLM. This matters because LLMs have a maximum token limit. Since the estimate counts whitespace-separated chunks, treat it as a rough figure that tends to undercount (a long URL counts as a single token), not as a guarantee that the data fits.

### Displaying the JSON Representation of Badges:

- Finally, the formatted JSON string (``badges_json``) is printed to display the badge names and their URLs in a readable format.

### Purpose:

The overall purpose of this code is to extract badges from a README markdown file, format them as JSON for structured data handling, and gauge whether the data can be passed to the LLM without exceeding token limits. The token estimation step is particularly useful when dealing with large datasets, where it gives an early warning of truncation or errors.
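
If the estimate turns out to be too large, one option is to pass only as many badges as fit. The sketch below is an assumption about how that could be done, not part of the notebook: `trim_badges_to_budget` is a hypothetical helper, the badge data is made up, and the budget is deliberately tiny so the effect is visible.

```python
import json
import re


def estimate_tokens(text: str) -> int:
    """Same whitespace-based estimate as EasyLLM.estimate_tokens."""
    return len(re.findall(r"\S+", text))


def trim_badges_to_budget(badges: dict[str, str], budget: int) -> dict[str, str]:
    """Keep badges, in order, while the JSON estimate stays within budget."""
    kept: dict[str, str] = {}
    for name, url in badges.items():
        candidate = {**kept, name: url}
        if estimate_tokens(json.dumps(candidate, indent=2)) > budget:
            break
        kept = candidate
    return kept


# Made-up badge data and a deliberately tiny budget, for illustration only
sample_badges = {
    "Python": "https://example.com/python.svg",
    "Rust": "https://example.com/rust.svg",
    "Go": "https://example.com/go.svg",
}
trimmed = trim_badges_to_budget(sample_badges, budget=6)
print(list(trimmed))  # ['Python', 'Rust']
```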


In [None]:
import json

llm = EasyLLM()
badges = get_badges()
badges_json = json.dumps(badges, indent=2)


print(f"Badges found in the README: {len(badges)}")
# For estimating if it will fit in the LLM context window
print(f"Tokens: {llm.estimate_tokens(badges_json)}")
print(badges_json)

## Step 9: Generate and Display Table of Contents for README

### Importing clear_output from IPython.display:

- This imports the ``clear_output()`` function, which is used to clear the output of the cell in a Jupyter Notebook. This can be useful for keeping the notebook interface clean by removing previous outputs.

### Calling get_table_of_contents() Function:

- The ``get_table_of_contents()`` function is called to generate the table of contents for a README file.
- A ``Project`` instance is created with the following attributes:

  - ``repo_name``: "test"

  - ``title``: "Test"

  - ``description``: "Test"

  - ``contributors``: ["Adi"]

  - ``references``: ["Ref"]

  - ``tech_stack``: ["Python"]

  - ``tags``: ["Test"]

  - ``license``: "MIT"

- The ``llm`` instance is passed along with the ``Project`` instance to generate the table of contents. The LLM suggests project-specific sections, which are combined with the predefined mandatory sections.

### Clearing the Output:

- This clears any output that was previously displayed in the notebook cell. It is typically used to remove clutter, especially when running multiple iterations or operations in a notebook environment.

### Printing the Table of Contents:

- The table of contents, which was generated by the ``get_table_of_contents()`` function, is printed. This will show the list of sections that should be included in the README file for the "Test" project.

### Purpose:

The goal of this code is to generate and display the table of contents for a README file based on the ``Project`` instance and a large language model (LLM). The output is cleared beforehand to ensure that only the relevant table of contents is shown in the notebook.
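
Since the prompts ask for sections that are linked to their headings, the returned list could be turned into a linked markdown table of contents. The sketch below is one way to do that under stated assumptions: `to_anchor` and `render_toc` are hypothetical helpers, and the anchor rule is only an approximation of how GitHub derives heading anchors.

```python
import re


def to_anchor(section: str) -> str:
    """Approximate a GitHub-style heading anchor for a section title."""
    slug = section.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation such as "&"
    return re.sub(r"\s", "-", slug)       # each whitespace character becomes "-"


def render_toc(sections: list[str]) -> str:
    """Render one markdown bullet with a link per section."""
    return "\n".join(f"- [{s}](#{to_anchor(s)})" for s in sections)


toc_markdown = render_toc(["About", "Problem Statement", "AI & Web Scraping"])
print(toc_markdown)
# - [About](#about)
# - [Problem Statement](#problem-statement)
# - [AI & Web Scraping](#ai--web-scraping)
```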


In [None]:
from IPython.display import clear_output

table_of_contents = get_table_of_contents(
    llm,
    Project(
        repo_name="test",
        title="Test",
        description="Test",
        contributors=["Adi"],
        references=["Ref"],
        tech_stack=["Python"],
        tags=["Test"],
        license="MIT",
    ),
)

clear_output()

print(f"Table of Contents: {table_of_contents}")

## Conclusion:

This application streamlines the process of generating structured and informative README files for projects by automating key components such as the table of contents, badge extraction, and visual configuration. By leveraging a large language model (LLM) and predefined templates, the app ensures that each README is comprehensive and aligned with best practices.

### Key highlights of the app:

- `Automated Table of Contents Generation`: It dynamically creates a table of contents based on mandatory and custom sections, tailored to each project's specific needs.

- `Badge Integration`: The app can fetch badges from external markdown sources and include them in the README for a more polished presentation.

- `Customizable Configuration`: Through configurable prompts, it generates headers, section titles, and marketing taglines, ensuring a professional and engaging README.

- `Token Estimation for LLM`: The app estimates token usage for large data to prevent issues with token limits when working with the LLM.
    
Overall, this app significantly reduces manual effort, enhances the quality of README files, and provides a consistent and automated way to document projects effectively.

---

# Thank You for visiting The Hackers Playbook! 🌐

If you liked this research material:

- [Subscribe to our newsletter.](https://thehackersplaybook.substack.com)

- [Follow us on LinkedIn.](https://www.linkedin.com/company/the-hackers-playbook/)

- [Leave a star on our GitHub.](https://www.github.com/thehackersplaybook)

<div style="display:flex; align-items:center; padding: 50px;">
<p style="margin-right:10px;">
    <img height="200px" style="width:auto;" width="200px" src="https://avatars.githubusercontent.com/u/192148546?s=400&u=95d76fbb02e6c09671d87c9155f17ca1e4ef8f21&v=4"> 
</p>
</div>
