# Automated Content Generation Tool - PoC

## Importing libraries and instantiating the clients

In [1]:
from dotenv import load_dotenv
import os 
from groq import Groq
from openai import OpenAI
import pandas as pd
import json

_ = load_dotenv()

GROQ_API_KEY = os.getenv('GROQ_API_KEY')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

client_groq = Groq(
    api_key=GROQ_API_KEY,
)

client_openai = OpenAI(
    api_key=OPENAI_API_KEY,
)
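`os.getenv` returns `None` when a variable is missing, so a forgotten `.env` entry only surfaces later as an authentication error. The helper below is a minimal sketch of a fail-fast check; `require_env` is a hypothetical name introduced for illustration and is not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file."
        )
    return value


# Usage (commented out so the sketch runs without real credentials):
# GROQ_API_KEY = require_env("GROQ_API_KEY")
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
```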

## Prompts and chat completions

First, let's define a helper function that outputs the formatted article. The article_generation() function takes four arguments and returns an article in the following JSON format, where the `sections` array can hold any number of section objects:

```json
{
  "article_title": "...",
  "sections": [
    {
      "headline": "headline of the section",
      "body": "The text body of the section"
    }
  ]
}
```
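To make the expected shape concrete, the following sketch checks a Python dictionary against that format using only the standard library. The function name `is_valid_article` and the sample data are assumptions made for illustration.

```python
def is_valid_article(article) -> bool:
    """Check that `article` has exactly the keys and value types of the format above."""
    if not isinstance(article, dict):
        return False
    if set(article) != {"article_title", "sections"}:
        return False
    if not isinstance(article["article_title"], str):
        return False
    if not isinstance(article["sections"], list):
        return False
    for section in article["sections"]:
        # Each section must be an object with exactly a headline and a body.
        if not isinstance(section, dict) or set(section) != {"headline", "body"}:
            return False
        if not all(isinstance(section[key], str) for key in ("headline", "body")):
            return False
    return True


# Hypothetical sample data, used only to exercise the check.
sample = {
    "article_title": "Getting Started with Arduino",
    "sections": [{"headline": "What Is Arduino?", "body": "A short introduction."}],
}
print(is_valid_article(sample))
```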

In [8]:
def article_generation(
        client, 
        system_prompt: str, 
        user_prompt: str, 
        model: str = "gpt-4o"):
    """
    Generates an article using a language model with a structured JSON schema output.

    Args:
        client: The OpenAI API client instance.
        system_prompt (str): The initial prompt given to the system to set the tone or context.
        user_prompt (str): The content or prompt provided by the user, typically a string generated from a DataFrame or similar source.
        model (str): The model to use for generating the completion. Default is 'llama3-70b-8192'.

    Returns:
        str: The structured JSON content of the generated article.
    """
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            response_format={
                "type": "json_schema",
                "json_schema": {
                    "name": "article_response",
                    "schema": {
                        "type": "object",
                        "properties": {
                            "article_title": {"type": "string"},
                            "sections": {
                                "type": "array",
                                "items": {
                                    "type": "object",
                                    "properties": {
                                        "headline": {"type": "string"},
                                        "body": {"type": "string"}
                                    },
                                    "required": ["headline", "body"],
                                    "additionalProperties": False
                                }
                            }
                        },
                        "required": ["article_title", "sections"],
                        "additionalProperties": False
                    },
                    "strict": True
                }
            }
        )
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

    return json.loads(response.choices[0].message.content)
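Note that the final `json.loads` call sits outside the `try` block, so a missing or truncated message body would raise instead of returning `None`. The sketch below shows one defensive way to parse the content; `parse_article_content` is a hypothetical helper, not part of the function above.

```python
import json
from typing import Optional


def parse_article_content(content: Optional[str]) -> Optional[dict]:
    """Parse the model output, returning None instead of raising on bad input."""
    if content is None:
        # No text was returned for this message.
        return None
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        # The output was cut off or is otherwise not valid JSON.
        return None
    # The article format is a JSON object, so reject any other JSON value.
    return parsed if isinstance(parsed, dict) else None
```

Inside article_generation(), the last line could then become `return parse_article_content(response.choices[0].message.content)`, under the assumption that returning `None` is the desired failure signal.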

The article_generation() function takes two prompts as arguments:

1. System prompt: defines the model's overall goal. This is where the prompting technique is applied; here we evaluate Few-Shot and Chain-of-Thought (CoT) prompting.

2. User prompt: carries the user input to the model. The input comes from a CSV file loaded into a pandas DataFrame with two columns: Main Keyword and Secondary Keywords.

References:

- [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
- [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629) and [https://react-lm.github.io](https://react-lm.github.io)

In [13]:
FS_PROMPT = """
You are an advanced AI language model designed to generate high-quality, SEO-friendly articles. The target audience for these articles includes high school students and university new joiners. Your writing should be objective, clear, and use a friendly tone that is easy for beginners to understand.

Below are examples of how you should structure the output when given a set of main and secondary keywords.

Example 1:
Main Keyword: Learning Python
Secondary Keywords: Python basics, programming tutorials, beginner Python projects, Python syntax, coding exercises, Python for data science

Article Title: Mastering Python: A Beginner's Guide
Sections:
1. Headline: Introduction to Python
   Body: Python is a versatile and powerful programming language that is easy to learn for beginners. This section introduces Python's key features and why it is a popular choice for new programmers.
2. Headline: Getting Started with Python Syntax
   Body: Understanding Python syntax is the first step in learning to code. This section covers the basics of Python syntax, explained in a clear and approachable manner for beginners.
3. Headline: Hands-on Python Projects for Beginners
   Body: One of the best ways to learn Python is by working on simple projects. This section provides ideas for beginner projects that help reinforce your understanding of Python.

Example 2:
Main Keyword: Learning English
Secondary Keywords: English grammar, vocabulary building, English speaking practice, English writing tips, language learning strategies, English listening exercises

Article Title: Effective Strategies for Learning English
Sections:
1. Headline: Introduction to Learning English
   Body: English is a global language that opens doors to numerous opportunities. This section discusses the importance of learning English and how to get started, especially for high school students and university new joiners.
2. Headline: Building a Strong Vocabulary
   Body: A strong vocabulary is essential for mastering English. This section provides strategies for expanding your vocabulary, including tips on using flashcards and reading regularly.
3. Headline: Practicing English Speaking Skills
   Body: Speaking is one of the most challenging aspects of learning English. This section offers practical advice on how to improve your English speaking skills through regular practice and interaction with native speakers.

When provided with new keywords, generate a structured, SEO-friendly article following the same format as the examples above. The output should include an article title and sections with headlines and body text, all presented in a clear, objective, and friendly tone suitable for high school students and university new joiners.

The output should follow the JSON response format.
"""

COT_PROMPT = """
You are an advanced AI language model designed to generate high-quality, SEO-friendly articles. The target audience for these articles includes high school students and university new joiners. Your writing should be objective, clear, and use a friendly tone that is easy for beginners to understand.

Let's think step-by-step. Here’s how you should approach the task:

1. **Understand the Keywords**: First, identify the main keyword and the secondary keywords. These will be used to determine the central theme and subtopics of the article. The primary objective is to generate an SEO-friendly article on the topic defined by these keywords.

2. **Determine the Article Title**: Based on the main keyword and the secondary keywords, generate a concise and descriptive article title that captures the essence of the content. The title should be clear and engaging, tailored to high school students and new university joiners.

3. **Outline the Article Sections**: Break down the article into sections. Each section should correspond to one or more of the secondary keywords. For each section:
   - Generate a clear and informative headline.
   - Write a detailed body text that explains or elaborates on the topic introduced in the headline. Ensure that the explanation is objective, clear, and uses a friendly tone that is easy to understand.

4. **Generate the Article**: Compile the sections into a structured article. Ensure that the article flows logically from one section to the next, providing valuable information related to the keywords. The tone should remain friendly and approachable throughout, keeping in mind the target audience.

Example:

Main Keyword: Learning Python
Secondary Keywords: Python basics, programming tutorials, beginner Python projects, Python syntax, coding exercises, Python for data science

Step 1: Keywords Analysis
- Main Keyword: Learning Python
- Secondary Keywords: Python basics, programming tutorials, beginner Python projects, Python syntax, coding exercises, Python for data science

Step 2: Title Generation
- Title: Mastering Python: A Beginner's Guide

Step 3: Section Outlines
- Section 1: Headline: Introduction to Python
   Body: Python is a versatile and powerful programming language that is easy to learn for beginners. This section introduces Python's key features and why it is a popular choice for new programmers.
- Section 2: Headline: Getting Started with Python Syntax
   Body: Understanding Python syntax is the first step in learning to code. This section covers the basics of Python syntax, explained in a clear and approachable manner for beginners.
- Section 3: Headline: Hands-on Python Projects for Beginners
   Body: One of the best ways to learn Python is by working on simple projects. This section provides ideas for beginner projects that help reinforce your understanding of Python.

Example:

Main Keyword: Learning English
Secondary Keywords: English grammar, vocabulary building, English speaking practice, English writing tips, language learning strategies, English listening exercises

Step 1: Keywords Analysis
- Main Keyword: Learning English
- Secondary Keywords: English grammar, vocabulary building, English speaking practice, English writing tips, language learning strategies, English listening exercises

Step 2: Title Generation
- Title: Effective Strategies for Learning English

Step 3: Section Outlines
- Section 1: Headline: Introduction to Learning English
   Body: English is a global language that opens doors to numerous opportunities. This section discusses the importance of learning English and how to get started, especially for high school students and university new joiners.
- Section 2: Headline: Building a Strong Vocabulary
   Body: A strong vocabulary is essential for mastering English. This section provides strategies for expanding your vocabulary, including tips on using flashcards and reading regularly.
- Section 3: Headline: Practicing English Speaking Skills
   Body: Speaking is one of the most challenging aspects of learning English. This section offers practical advice on how to improve your English speaking skills through regular practice and interaction with native speakers.

Step 4: Compile and Present
- Compile the title and sections into a well-structured article that is SEO-friendly and suitable for high school students and university new joiners.

Use this approach to generate the article step by step, ensuring clarity, objectivity, and a friendly tone throughout the process.

The output should follow the JSON response format.
"""

## Testing the system

First, we need to load our CSV file, which we do with pandas.

For our examples, we have keywords related to learning Arduino.

In [21]:
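# Load the CSV file
df = pd.read_csv('../src/dataset/arduino_seo_keywords.csv')

# Select the keywords from the second row (index 1)
main_keyword = df['Main Keyword'][1]
sec_keywords = df['Secondary Keywords'][1]

# Preview the first rows (a notebook displays only the last expression in a cell)
df.head()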

In [15]:
user_prompt = f"""Generate a 200 words SEO-friendly article about the theme in the keywords:
Main keyword = {str(main_keyword)}
Secondary Keywords = {str(sec_keywords)}
"""

Now let's generate an article using the previously defined Few-Shot prompt:

In [17]:
article_fs = article_generation(
        client_openai, 
        system_prompt = FS_PROMPT, 
        user_prompt   = user_prompt, 
        model         = "gpt-4o-2024-08-06")

with open('article_fs_output.json', 'w') as file:
    json.dump(article_fs, file, indent=2)

Then let's do the same with the Chain-of-Thought prompt:

In [18]:
article_cot = article_generation(
        client_openai, 
        system_prompt = COT_PROMPT, 
        user_prompt   = user_prompt, 
        model         = "gpt-4o-2024-08-06")

with open('article_cot_output.json', 'w') as file:
    json.dump(article_cot, file, indent=2)
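Since the user prompt asks for roughly 200 words, one simple way to compare the two outputs is to count the words in each article and render it for reading. The sketch below assumes the dictionary format returned by article_generation(); the helper names and the sample article are hypothetical.

```python
def article_word_count(article: dict) -> int:
    """Count whitespace-separated words across all section bodies."""
    return sum(len(section["body"].split()) for section in article["sections"])


def article_to_markdown(article: dict) -> str:
    """Render the article as Markdown with one heading per section."""
    lines = [f"# {article['article_title']}", ""]
    for section in article["sections"]:
        lines += [f"## {section['headline']}", "", section["body"], ""]
    return "\n".join(lines).strip() + "\n"


# Hypothetical article, standing in for article_fs or article_cot.
sample_article = {
    "article_title": "Getting Started with Arduino",
    "sections": [
        {"headline": "What Is Arduino?",
         "body": "Arduino is an open-source electronics platform."},
        {"headline": "First Steps",
         "body": "Install the IDE and upload a sketch."},
    ],
}

print(article_word_count(sample_article))
print(article_to_markdown(sample_article))
```

Both helpers expect a dictionary, so check for `None` before calling them on `article_fs` or `article_cot`, because article_generation() returns `None` when the API call fails.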