# BlogGPT Demo: Automated Blog Post Generation

Welcome to the demonstration notebook for BlogGPT, an ambitious project that uses state-of-the-art large language model (LLM) technology to streamline and automate the writing of blog posts on trending topics.

### 1. Purpose of this Notebook:
The aim of this notebook is to provide a comprehensive overview of the BlogGPT project. Through it, I hope to shed light on the intricacies of my work and present it as a testament to my skills and potential to prospective employers. What you'll see here represents the project as of mid-July, and BlogGPT is an evolving endeavor: as it gets polished and refined, the ultimate goal is to deploy a production-ready version, with its debut slated for Medium.

### 2. Motivation:
The inception of BlogGPT can be attributed to two primary factors:

- **The Information Quandary**: Every day, the internet is abuzz with a plethora of trending topics. But comprehensively understanding them often entails wading through a morass of platforms, leading to a considerable time sink and the ever-persistent problem of information overload. This prompted the desire for a unified platform to expedite and simplify this learning process.
  
- **LLM Exploration**: My year-long foray into the realm of LLMs revealed the vast potential waiting to be tapped. It sparked the idea of harnessing these models to emulate a personal blogger, automating content creation while keeping it engaging and up to date. Thus, BlogGPT was born: tailored not just for me, but for inquisitive minds worldwide.

### 3. Project Overview:

#### Technologies:
- **ChatGPT API (GPT-3.5)**: The backbone, responsible for the core content generation.
- **LangChain**: An LLM application framework that ties together the document loaders, text splitter, vector store, and embeddings used here. One noteworthy inspiration was its "QA over Documents" use case (retrieve relevant passages, then have the model answer from them), which was reimagined here to suit a writing task.
- **TensorFlow Encoder**: Deployed to vectorize textual data for similarity calculations. (The code in this notebook uses LangChain's `OpenAIEmbeddings` for this step.)
- **ChromaDB**: A vector database for storing the text chunks and retrieving the ones most relevant to each section.
- **API Integrations**: Tapping into Wikipedia, Google News, and Google Trends ensures that BlogGPT is always equipped with current and trending information.

#### Challenges:
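- **Internet Restrictions**: The ChatGPT API cannot browse the web and its knowledge stops at its training cutoff, so third-party APIs became indispensable for fresh information.
- **Token Limitations**: ChatGPT's context-window (token) limit made it necessary to generate longer posts section by section rather than in a single request.
- **Information Overload**: Balancing the quality and quantity of context was pivotal: too much retrieved text wastes tokens and dilutes the prompt, while too little leaves the model short of facts.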
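
To make the token constraint concrete, here is a rough budget check. It is a sketch under stated assumptions: tokens are estimated at about 4 characters each (a common rule of thumb for English, not an exact count), and the 4,096-token window is assumed to be that of the original `gpt-3.5-turbo`. The helpers `estimate_tokens` and `fits_in_context` are hypothetical and not part of the BlogGPT toolkit.

```python
import math

# Assumption: English text averages roughly 4 characters per token. This is a
# rule of thumb; exact counts require the model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the number of tokens in `text` from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(prompt: str, max_completion_tokens: int,
                    context_window: int = 4096) -> bool:
    """True if the prompt plus the tokens reserved for the reply fit the window.

    The 4,096-token default is an assumption (the original gpt-3.5-turbo window).
    """
    return estimate_tokens(prompt) + max_completion_tokens <= context_window


# Hypothetical prompt sizes: all context for a whole article vs. one section.
whole_article_prompt = "x" * 12_000
section_prompt = "x" * 5_200

print(fits_in_context(whole_article_prompt, max_completion_tokens=2000))  # False
print(fits_in_context(section_prompt, max_completion_tokens=400))         # True
```

Under these assumptions, one request carrying the whole article's context plus a long reply overflows the window, while a section-sized request fits comfortably, which is the reasoning behind section-wise generation.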

---

#### Method:

1. **Trend Identification**:
    - Use `pytrends`, an unofficial Python API for Google Trends, to identify trending topics in Canada and their related Google search terms.

2. **Link & Title Collection**:
    - Retrieve relevant links and titles associated with the chosen keyword.
    - Use the `GoogleNews` Python package to find pertinent news links and headlines.
    - Ask ChatGPT to predict relevant Wikipedia page titles from those headlines; the pages themselves are fetched later through the Wikipedia API.

3. **Content Planning with ChatGPT**:
    - Use the ChatGPT API to draft a writing plan informed by the keyword, related Google queries and topics, and news headlines.

4. **Contextual Data Extraction**:
    - Extract textual content from the identified links. The code below does this with LangChain document loaders: `NewsURLLoader` for news articles and `WikipediaLoader` for Wikipedia pages. (`requests` and `BeautifulSoup` are also imported for general-purpose web scraping.)

5. **Vector Generation**:
    - Divide the sourced text into chunks of at most `CHUNK_SIZE` characters with LangChain's `RecursiveCharacterTextSplitter`.
    - Vectorize these chunks with an embedding model (`OpenAIEmbeddings` in the code below) so they can be compared by similarity.

6. **Vector Database Creation**:
    - Establish a vector database through ChromaDB. This facilitates the calculation of similarity scores to pinpoint the most relevant context for each section of the blog.

7. **Section-wise Blog Composition**:
    - Write each section of the blog with the ChatGPT API, guided by the earlier writing plan and the context retrieved from the vector database.

8. **Final Blog Compilation**:
    - To ensure a cohesive and engaging read, employ ChatGPT for one last pass to polish and unify the individual sections into a comprehensive blog post.
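
Steps 5 and 6 boil down to "split, embed, rank by similarity". The sketch below shows that idea with the standard library only. It is not what ChromaDB or `OpenAIEmbeddings` do internally: the hypothetical `bag_of_words` function stands in for a real embedding model, and `chunk_text` is a simplified fixed-width splitter rather than LangChain's `RecursiveCharacterTextSplitter` (which prefers to break on paragraphs, lines, and spaces).

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows (simplified splitter)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The league has 20 clubs and each team plays 38 matches.",
    "Brentford expects its goalkeeper to complete a transfer.",
    "Founded in 1992, the league broke away from the Football League.",
]
print(top_k("how many clubs and matches are in the league", chunks, k=1))
# ['The league has 20 clubs and each team plays 38 matches.']
```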

---
Let this notebook serve as your guided tour through the marvels and challenges of automating content creation. Dive in, explore, and witness the synergy of technology and creativity in BlogGPT!

In [1]:
# General Libraries
import time
import random
import json
import logging
from datetime import datetime, timedelta

# Data Processing
import pandas as pd

# OpenAI API
import openai

# Web Scraping and Data Retrieval
import requests
from bs4 import BeautifulSoup
import wikipedia  # also required by LangChain's WikipediaLoader
from GoogleNews import GoogleNews

# Google Trends
from pytrends.request import TrendReq
from pytrends.exceptions import TooManyRequestsError  # raised on HTTP 429 (rate limiting), a frequent Google Trends error

# Langchain Libraries
from langchain.document_loaders.news import NewsURLLoader
from langchain.document_loaders.wikipedia import WikipediaLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Custom Toolkit
import credentials  # Store and retrieve API keys
from toolkit import chatgpt, select_link_content

# Setting up the OpenAI API key for usage
openai.api_key = credentials.OPENAI_APIKEY

# Configuring logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


### Trend Identification:

To keep our blog posts timely and relevant, we start by identifying the top trending search terms in Canada with Google Trends. The `find_trend` function does this by:

1. Connecting to Google Trends with the `pytrends` library.
2. Fetching all trending searches for the target `COUNTRY`.
3. Selecting one keyword by its position in that list (`topic_rank`, where 0 is the top trend).
4. Retrieving the top related search queries and rising related topics for that keyword. Note that `request_google_trends` queries worldwide data (`geo=''`) over the past five years (`timeframe='today 5-y'`).
5. Returning the gathered data in a dictionary.
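
`request_google_trends` below waits a fixed 60 seconds between attempts. A common alternative is exponential backoff with jitter; the helper below is a hypothetical sketch of that idea, not part of `pytrends` (which has its own `retries` and `backoff_factor` options on `TrendReq`). The `sleep` parameter is injectable so the example runs instantly.

```python
import random
import time


def call_with_backoff(func, *, retry_on=(Exception,), max_retries=3,
                      base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(); on an exception listed in retry_on, wait and try again.

    Makes up to max_retries retries after the first attempt. The delay doubles
    after each failure (base_delay, 2*base_delay, ...), is capped at max_delay,
    and is scaled by random jitter so parallel clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


# Hypothetical flaky call: fails twice with a connection error, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


print(call_with_backoff(flaky, retry_on=(ConnectionError,), sleep=lambda s: None))  # ok
```

In the notebook, one might wrap a call such as `pytrend.related_queries` with `retry_on=(TooManyRequestsError, requests.exceptions.RequestException)`; that wiring is an untested suggestion.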


In [3]:
def request_google_trends(pytrend, keyword, max_retries=3):
    """Request related queries and topics from Google Trends with retry logic."""
    retries = 0
    while retries < max_retries:
        try:
            pytrend.build_payload(kw_list=[keyword], cat=0, timeframe='today 5-y', geo='', gprop='')
            
            related_queries_data = pytrend.related_queries()
            first_value = next(iter(related_queries_data.values()))
            queries = first_value['top']['query'].to_string()
            
            related_topic_data = pytrend.related_topics()
            first_value = next(iter(related_topic_data.values()))
            topics = first_value['rising'][['topic_title', 'topic_type']].to_string()
            
            return queries, topics

        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError, TooManyRequestsError) as e:
            # ConnectionError also covers ConnectTimeout; TooManyRequestsError is pytrends' HTTP 429 error
            retries += 1
            if retries < max_retries:
                logging.warning(f"Exception occurred: {e}. Retry {retries}/{max_retries}. Waiting for 60 seconds before retrying...")
                time.sleep(60)
    logging.error(f"Failed to fetch data from Google Trends after {max_retries} retries.")
    raise RuntimeError(f"Failed to fetch data from Google Trends after {max_retries} retries.")

def find_trend(country, topic_rank=0, max_retries=3):
    """Fetch the top trending searches from Google Trends for a specified country."""
    pytrend = TrendReq(timeout=(10,25), retries=3)

    all_keywords = pytrend.trending_searches(country)
    if topic_rank >= len(all_keywords) or topic_rank < 0:
        logging.error(f"Invalid topic rank {topic_rank}. Please select within the range of available trending searches.")
        raise ValueError(f"Invalid topic rank {topic_rank}. Please select within the range of available trending searches.")

    keyword = all_keywords.iloc[topic_rank, 0]
    queries, topics = request_google_trends(pytrend, keyword, max_retries=max_retries)
    print(f"Found trend: ### {keyword} ###")

    return {
        "keyword": keyword,
        "related_queries": queries,
        "related_topics": topics
    }


In [4]:
# configuration
COUNTRY = 'canada'
TOPIC_RANK = 0 # the first trending topic

# find trend
trend = find_trend(country=COUNTRY, topic_rank=TOPIC_RANK)

Found trend: ### Premier League ###


#### News Retrieval:

The `get_news` function fetches news articles about the trending keyword from preferred sources using the `GoogleNews` package. It searches articles from the past `days` days (15 by default), keeps those whose media name contains one of the preferred sources, and randomly selects up to `n` of them. The returned links are Google News redirect URLs (`news.google.com/./articles/...`) rather than the publishers' own URLs.
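
The filtering and sampling in `get_news` can be isolated as pure functions, which makes two edge cases easy to see: `random.sample` raises `ValueError` when fewer than `n` articles match, and the scheme check should only add `https://` to links that lack a scheme. The function names and the sample `results` below are hypothetical; the dictionary keys (`title`, `link`, `media`) mirror the ones `get_news` reads from GoogleNews results.

```python
import random


def normalize_url(link: str) -> str:
    """Add https:// only when the link has no scheme at all."""
    if link.startswith(("http://", "https://")):
        return link
    return "https://" + link


def filter_and_sample(results, sources, n, seed=None):
    """Keep results whose lowercase media name contains any preferred source,
    then sample up to n of them (never more than are available)."""
    matches = []
    for r in results:
        media = r["media"].lower()
        if any(src in media for src in sources):
            matches.append({"title": r["title"], "url": normalize_url(r["link"]), "media": media})
    rng = random.Random(seed)  # seeded generator for reproducible picks
    return rng.sample(matches, min(n, len(matches)))


# Hypothetical search results in the shape get_news reads.
results = [
    {"title": "Match preview", "link": "news.google.com/./articles/abc", "media": "BBC"},
    {"title": "Player quiz", "link": "https://example.com/quiz", "media": "CBS Sports"},
    {"title": "Transfer rumour", "link": "https://example.com/rumour", "media": "Some Blog"},
]
picked = filter_and_sample(results, sources=["bbc", "cbs"], n=5, seed=0)
print(len(picked))  # 2: only two results match, so n=5 is capped instead of raising
```

Matching is by substring, so a short source name such as `'abc'` will match any media name that contains those letters.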

In [30]:
def get_news(trend, sources, n=5, days=15):
    """
    Fetches news articles for a given keyword from specified sources using GoogleNews.
    
    Parameters:
    - keyword: The search term.
    - n: Number of articles to return.
    - sources: List of news sources to consider.

    Returns:
    - Dictionary with search results, titles, and URLs of selected articles.
    """
    # Extract keyword for the search
    keyword = trend['keyword']
    print(f"Searching news for '{keyword}'...")
    
    # Define date range for the search
    end_date = datetime.today().strftime('%m/%d/%Y')
    start_date = (datetime.now() - timedelta(days=days)).strftime('%m/%d/%Y') 

    # Initialize the GoogleNews object and search
    googlenews = GoogleNews(lang='en', start=start_date, end=end_date)
    googlenews.get_news(keyword)
    search_results = googlenews.results(sort=True)

    selected_sources = []
    
    # Filter news articles by specified sources
    for article in search_results:
        media_name = article['media'].lower()
        link = article['link']
        title = article['title']
        
        if any(source in media_name for source in sources):
            # add a scheme only when the link has none
            if not link.startswith(('http://', 'https://')):
                link = 'https://' + link

            selected_sources.append({
                "title": title, 
                "url": link, 
                "media": media_name
            })

    # Randomly select up to `n` articles (random.sample raises ValueError if n exceeds the number available)
    articles = random.sample(selected_sources, min(n, len(selected_sources)))
    
    # Prepare articles for printing
    prep_print = [f"{article['media']}: {article['title'][:50]}...\n{article['url']}" for article in articles]
    print(*prep_print, sep='\n')
    
    output = {
        "search_results": search_results,
        "articles": articles
    }
    
    return output


In [31]:
# configuration
SOURCES = ['yahoo', 'cbc', 'global news', 'ctv', 'cnn', 'bbc', 'the times', 'fox', 'cbs', 'daily news', 'new york times', 'abc', 'wall street', 'washington post', 'usa today']
NEWS_NUM = 5
DAYS = 15

news = get_news(trend, sources=SOURCES, n=NEWS_NUM, days=DAYS)

Searching news for 'Premier League'...
bbc: David Raya: Brentford boss Thomas Frank expects go...
https://news.google.com/./articles/CBMiK2h0dHBzOi8vd3d3LmJiYy5jb20vc3BvcnQvZm9vdGJhbGwvNjY0NDUzODnSAQA?hl=en-CA&gl=CA&ceid=CA%3Aen
cbs sports: Premier League preview: 10 questions facing Manche...
https://news.google.com/./articles/CBMijAFodHRwczovL3d3dy5jYnNzcG9ydHMuY29tL3NvY2Nlci9uZXdzL3ByZW1pZXItbGVhZ3VlLXByZXZpZXctMTAtcXVlc3Rpb25zLWZhY2luZy1tYW5jaGVzdGVyLWNpdHktYXJzZW5hbC1tYW5jaGVzdGVyLXVuaXRlZC1jaGVsc2VhLWFuZC1tb3JlL9IBkAFodHRwczovL3d3dy5jYnNzcG9ydHMuY29tL3NvY2Nlci9uZXdzL3ByZW1pZXItbGVhZ3VlLXByZXZpZXctMTAtcXVlc3Rpb25zLWZhY2luZy1tYW5jaGVzdGVyLWNpdHktYXJzZW5hbC1tYW5jaGVzdGVyLXVuaXRlZC1jaGVsc2VhLWFuZC1tb3JlL2FtcC8?hl=en-CA&gl=CA&ceid=CA%3Aen
bbc: Brentford v Tottenham Hotspur preview: Team news a...
https://news.google.com/./articles/CBMiLWh0dHBzOi8vd3d3LmJiYy5jby51ay9zcG9ydC9mb290YmFsbC82NjQxOTkwONIBAA?hl=en-CA&gl=CA&ceid=CA%3Aen
fox sports: English Premier League 2023-24 predictions: F

#### Wikipedia Page Retrieval

`get_wiki()` asks ChatGPT to predict the Wikipedia articles most relevant to the news headlines.  
  
**Steps:**
1. **News Selection:** Randomly selects two headlines from the articles returned by `get_news`.
2. **Model Prompt:** Assembles a prompt asking the chatbot for `n` related Wikipedia page titles in JSON.
3. **Model Interaction:** Sends the prompt to ChatGPT and parses its JSON reply into a list of predicted Wikipedia titles.
4. **Output Compilation:** Organizes the selected news, the raw chatbot response, and the Wikipedia titles into a dictionary.
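
`get_wiki` passes the model's reply straight to `json.loads`. Even when told to output JSON, chat models sometimes wrap the object in a Markdown code fence or add a sentence around it (an observation about typical model behaviour, not something this notebook's output shows). A defensive parser, sketched below with hypothetical names, falls back to the outermost `{...}` span before giving up.

```python
import json


def parse_json_reply(reply: str) -> dict:
    """Parse a JSON object from a chat reply, tolerating text around it."""
    try:
        return json.loads(reply, strict=False)
    except json.JSONDecodeError:
        # Fall back to the span from the first '{' to the last '}', which also
        # strips a surrounding Markdown code fence or a leading sentence.
        start, end = reply.find("{"), reply.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(reply[start:end + 1], strict=False)


def extract_titles(reply: str, key: str = "wiki_titles") -> list[str]:
    """Return the non-empty, stripped strings stored under `key`."""
    data = parse_json_reply(reply)
    titles = data.get(key, []) if isinstance(data, dict) else []
    if not isinstance(titles, list):
        raise ValueError(f"expected a list under {key!r}, got {type(titles).__name__}")
    return [t.strip() for t in titles if isinstance(t, str) and t.strip()]


reply = 'Sure! Here are the pages:\n{"wiki_titles": ["Premier League", " Brentford F.C. "]}'
print(extract_titles(reply))  # ['Premier League', 'Brentford F.C.']
```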

In [32]:
def get_wiki(news, n):
    """
    Predicts relevant Wikipedia pages based on given news titles.
    
    Parameters:
    - news (dict): Dictionary containing news options, particularly titles.

    Returns:
    - dict: Dictionary containing selected news, chatgpt response, Wikipedia titles, and URLs.
    """
    
    # Randomly select 2 news titles from the given options
    selected_news = random.sample(news['articles'], 2)
    news_headlines = '\n'.join([news['title'] for news in selected_news])
    print(f"Predicts relevant Wikipedia pages based on: \n{news_headlines}\n")
    
    # Construct a message to prompt the model for predictions
    message = [
        {"role": "assistant", "content": f"Today's breaking news: '{news_headlines}'."},
        {"role": "user", "content": f"Predict {n} most relevant Wikipedia pages to help readers understand what's happening."},
        {"role": "user", "content": 'Output the page titles in JSON format: {"wiki_titles": []}'}
    ]
    
    # Use the model to fetch relevant Wikipedia page titles
    response = chatgpt(message, content=True)
    
    # Convert the model's response into a list of Wikipedia titles
    wiki_titles = json.loads(response, strict=False)['wiki_titles']
    print("Suggested Wikipedia Pages:")
    print(*wiki_titles, sep='\n')
    
    # Prepare the output containing the selected news, the model's response, and the processed titles
    output = {
        'referred_news': selected_news, 
        'chatgpt_response': response, 
        'wiki_titles': wiki_titles
    }
    
    return output


In [33]:
# configuration
WIKI_NUM = 5

wiki = get_wiki(news, n=WIKI_NUM)

Predicts relevant Wikipedia pages based on: 
Brentford v Tottenham Hotspur preview: Team news and stats
Premier League quiz: Can you name this current or former player?

Suggested Wikipedia Pages:
Brentford F.C.
Tottenham Hotspur F.C.
Premier League
Brentford Community Stadium
Football statistics and records


### Content Planning with ChatGPT

`make_plan` employs ChatGPT to design a structured content plan around trending keywords and related news.  

**Steps:**
1. **Keyword Extraction**: Extracts the primary keyword from the given trend.
2. **Data Compilation**: Collates related queries, topics, and recent news headlines linked to the keyword.
3. **Plan Message Construction**: Builds a message prompting ChatGPT to brainstorm and structure a blog post based on the provided data.
4. **Bot Interaction Setup**: Sets up a conversation structure instructing the chatbot on the desired format and context.
5. **Response Handling**: Captures ChatGPT's response, retrieves the message content, and parses the JSON answer.
6. **Output Construction**: Builds a dictionary with the message body, ChatGPT response, token usage, and the structured blog plan.
7. **Result Display**: Prints the total token count and the title of each planned section.
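
The format hint in `make_plan` ends with a literal `...`, leaving the model to infer the keys for the remaining sections. One option, sketched below with hypothetical helper names, is to generate the full skeleton for `sec_1` through `sec_n` and to validate the parsed plan before moving on to writing.

```python
import json


def plan_format_hint(n_sec: int) -> str:
    """Build an explicit JSON skeleton with keys sec_1 ... sec_n for the prompt."""
    skeleton = {f"sec_{i}": {"section_title": "", "explanation": ""} for i in range(1, n_sec + 1)}
    return "Format in JSON: " + json.dumps(skeleton, indent=1)


def validate_plan(plan: dict, n_sec: int) -> list[str]:
    """List problems with a parsed plan; an empty list means it looks usable."""
    problems = []
    expected = [f"sec_{i}" for i in range(1, n_sec + 1)]
    for key in expected:
        section = plan.get(key)
        if not isinstance(section, dict):
            problems.append(f"missing section {key}")
            continue
        for field in ("section_title", "explanation"):
            if not str(section.get(field, "")).strip():
                problems.append(f"{key}: empty {field}")
    extra = sorted(set(plan) - set(expected))
    if extra:
        problems.append(f"unexpected keys: {extra}")
    return problems


# Hypothetical, deliberately incomplete plan.
sample_plan = {
    "sec_1": {"section_title": "Introduction to the Premier League", "explanation": "Set the scene."},
    "sec_2": {"section_title": "Key Facts and Figures", "explanation": ""},
}
print(validate_plan(sample_plan, n_sec=3))
# ['sec_2: empty explanation', 'missing section sec_3']
```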

In [34]:
def make_plan(trend, news, n_sec):
    """
    Generate a content plan based on a given trend.
    
    Parameters:
    - trend (dict): Contains keyword and related queries and topics about a trend.
    
    Returns:
    - output (dict): Contains the message body, response, cost, and answer from the chatbot.
    """
    
    # Extract the keyword from the trend
    keyword = trend["keyword"]
    print(f"Planning a blog post about {keyword}")
    
    # Extract related queries, topics and news
    queries = trend["related_queries"]
    topics = trend["related_topics"]
    news_headlines = [article['title'] for article in news['articles']]
    
    # Construct the message for the plan
    plan_message = f"""
    People are searching "{keyword}" like crazy today, so I want to write a blog post on the topic.
    Help me brainstorm what I should write about. Specifically, plan a blog post based on the related trending topics, Google queries, and recent news.
    Remember that the motivation of this blog is to help people learn about the trending topic "{keyword}", so make sure to include this in the opening for readers.
    I want {n_sec} parts in total, including the opening and closing paragraphs.
    related topics:
    {topics}
    related queries:
    {queries}
    news:
    {news_headlines}
    """
    
    # Define the message body to communicate with the chatgpt
    message_body = [
        {
            "role": "system", 
            "content": """You are a very helpful and professional article writer and instructor. 
                          Keep in mind: 
                          1. forget previous commands
                          2. answer precisely and concisely
                          3. the user knows you are a language model
                          4. answer in JSON only"""
        },
        {"role": "user", "content": plan_message},
        {"role": "user", "content": f"""Format in JSON: {{
             "sec_1": {{
                "section_title": "",
                "explanation": ""
             }}, 
             ...
             "sec_{n_sec}": ...  
        }}"""}
    ]
    
    # Use the chatgpt to get the response
    response = chatgpt(message_body)
    content = response['choices'][0]['message']['content']
    answer = json.loads(content, strict=False)
    cost = response["usage"]
    
    # Construct the output dictionary
    output = {
        "message_body": message_body,
        "response": response,
        "cost": cost,
        "answer": answer
    }
    
    # Display the total token usage and the planned section titles
    total_cost = str(cost["total_tokens"])
    print(f"Done! Planning cost: {total_cost}\n")
    for key, values in answer.items():
        print(f"{key}: {values['section_title']}")
    
    return output


In [35]:
# configuration
SECTION_NUM = 7

plan = make_plan(trend, news, n_sec=SECTION_NUM)

Planning a blog post about Premier League
Done! Planning cost: 1382

sec_1: Introduction to the Premier League
sec_2: Key Facts and Figures
sec_3: Top Contenders of the Current Season
sec_4: Exciting Matches and Memorable Moments
sec_5: The Impact of Premier League on English Football
sec_6: Player Spotlight: Rising Stars and Legends
sec_7: Conclusion: The Premier League Phenomenon


In [36]:
def load_news_link(news, split_chunk_size):
    """Load each selected news article, split it into chunks, and tag the chunks with type and media."""
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=split_chunk_size, chunk_overlap=0)
    
    documents = []
    
    for article in news['articles']:
        url = article['url']
        media = article['media']
        loader = NewsURLLoader([url])
        splits = loader.load_and_split(text_splitter)
        for doc in splits:
            doc.metadata['type'] = 'news'
            doc.metadata['media'] = media
            # Chroma accepts only str, int, float, or bool metadata values,
            # so stringify anything else (e.g. author lists or a None publish date)
            for key, value in doc.metadata.items():
                if not isinstance(value, (str, int, float)):
                    doc.metadata[key] = str(value)
            
        documents.extend(splits)

    return documents

def load_wiki_title(wiki, split_chunk_size):
    """Load up to two Wikipedia pages per predicted title, split them into chunks, and tag the chunks as 'wiki'."""
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=split_chunk_size, chunk_overlap=0)
    
    documents = []
    
    for title in wiki['wiki_titles']:
        loader = WikipediaLoader(query=title, load_max_docs=2)
        splits = loader.load_and_split(text_splitter)
        for doc in splits:
            doc.metadata['type'] = 'wiki'
            # stringify metadata values Chroma cannot store (anything but str/int/float/bool)
            for key, value in doc.metadata.items():
                if not isinstance(value, (str, int, float)):
                    doc.metadata[key] = str(value)
            
        documents.extend(splits)

    return documents
            
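def create_vectorstore(news, wiki, split_chunk_size):
    """Chunk the news and Wikipedia content, embed the chunks with OpenAI embeddings, and store them in Chroma."""
    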
    # create documents
    news_documents = load_news_link(news, split_chunk_size=split_chunk_size)
    wiki_documents = load_wiki_title(wiki, split_chunk_size=split_chunk_size)
    all_docs = news_documents + wiki_documents
    
    # store documents
    vectorstore = Chroma.from_documents(documents=all_docs, embedding=OpenAIEmbeddings(openai_api_key=credentials.OPENAI_APIKEY))

    return vectorstore

In [37]:
# configuration
CHUNK_SIZE = 600  # maximum number of characters per chunk

vectorstore = create_vectorstore(news, wiki, split_chunk_size=CHUNK_SIZE)

2023-08-11 14:00:46,072 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.


In [38]:
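# see what's inside the vectorstore
documents = vectorstore.get()  # Chroma omits embeddings from get() results by default, hence "embedding: None" below
print(f"vectorstore store these information:\n- {documents.keys()}")
# note: CHUNK_SIZE is an upper bound; the splitter can produce shorter chunks
print(f"There are {len(documents['ids'])} documents each with a size of {CHUNK_SIZE} characters\n")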
print('An Example Document:\n')
print(f"id: {documents['ids'][0]}\n")
print(f"embedding: {documents['embeddings']}\n")
print(f"metadata: \n{documents['metadatas'][0]}\n")
print(f"document content: \n{documents['documents'][0]}\n")

vectorstore store these information:
- dict_keys(['ids', 'embeddings', 'metadatas', 'documents'])
There are 163 documents each with a size of 600 characters

An Example Document:

id: ff0e8df8-3870-11ee-bfa0-0a623052d3b6

embedding: None

metadata: 
{'authors': '[]', 'description': 'Brentford boss Thomas Frank says he expects goalkeeper David Raya to complete a transfer to Arsenal.', 'language': 'en', 'link': 'https://news.google.com/./articles/CBMiK2h0dHBzOi8vd3d3LmJiYy5jb20vc3BvcnQvZm9vdGJhbGwvNjY0NDUzODnSAQA?hl=en-CA&gl=CA&ceid=CA%3Aen', 'media': 'bbc', 'publish_date': 'None', 'title': 'David Raya: Brentford boss Thomas Frank expects goalkeeper to complete Arsenal transfer', 'type': 'news'}

document content: 
Last updated on .From the section Arsenal

David Raya came on at half-time in Brentford's Premier League Summer Series match against Fulham during their pre-season tour of the United States

Brentford boss Thomas Frank says he expects goalkeeper David Raya to complete a move t

In [39]:
def instruct(keyword, section, structure, vectorstore, body_context=False):
    """
    Generates the instructions for writing a specific section of the blog post.
    """
    
    title = section["section_title"]
    explanation = section["explanation"]
    structure = "\n".join(structure)
    
    # make instruction
    parag_n = 2 if body_context else 1
    instruction = f"""I want to write this blog post about {keyword}:\n{structure}.
        Help me write this part in {parag_n} paragraphs:\n{title}\n{explanation}"""
    
    # select context information
    query = "\n\n".join([title, explanation, structure])
    
    # wiki
    wiki_docs = vectorstore.similarity_search(query=query, k=5, filter={'type': 'wiki'})
    contents = [doc.page_content for doc in wiki_docs]
    wiki_context = "\n\n".join(contents)
    # news
    news_docs = vectorstore.similarity_search(query=query, k=5, filter={'type': 'news'})
    contents = [doc.page_content for doc in news_docs]
    news_context = "\n\n".join(contents)
    
    context = f"Use these as context:\n\nRelevant news:\n{news_context}\n\nWikipedia information:\n{wiki_context}"

    # make message
    message = [
        {
            "role": "system", 
            "content": """You are a very helpful and interesting writer.
                          1. Write precisely and concisely in an appropriate tone
                          2. Answer in JSON format
                          3. Add numbers and facts where possible"""
        },
        {"role": "user", "content": instruction},
        {"role": "user", "content": context},
        {"role": "user", "content": """Output in JSON format: {"writing": ""}"""}
    ]
    
    return message

def expand_writing(trend, wiki, news, vectorstore, plan):
    """
    Expands the given writing plan into a detailed content using the trend, wiki, and news.
    """
    # Extract
    keyword = trend['keyword']
    writing_plan = plan["answer"]
    structure = [values["section_title"] for _, values in writing_plan.items()]

    # Prepare the output structure
    output = {
        "messages": {}, 
        "responses": {},
        "costs": {},
        "section_titles": structure, 
        "writings": {}
    }
    total = len(writing_plan)
    for key, section in writing_plan.items():
        print(f"Writing '{section['section_title']}'...")
        # The opening and closing sections get one paragraph; body sections get two
        body_context = key not in ('sec_1', f'sec_{total}')
        
        message = instruct(keyword, section, structure, vectorstore, body_context=body_context)
        
        # Retrieve response from ChatGPT
        start_time = time.time() 
        
        response = chatgpt(message)
        
        end_time = time.time()
        duration = end_time-start_time
        
        # Extract details from the response
        content = response['choices'][0]['message']['content']
        cost = response['usage']
        print(f'Done. {duration:.2f} seconds. \nCost:\n{cost}\n\nResult (length: {len(content)}): {content[:50]}...{content[-20:]}\n\n')

        # Update output with details
        output["messages"][key] = message
        output["responses"][key] = response
        output["costs"][key] = cost
        
        # Fall back to the raw content if the reply is not JSON or lacks a 'writing' field
        try:
            output["writings"][key] = json.loads(content, strict=False)['writing']
        except (json.JSONDecodeError, KeyError, TypeError):
            output["writings"][key] = content

    return output


In [40]:
writing_output = expand_writing(trend, wiki, news, vectorstore, plan)

Writing 'Introduction to the Premier League'...
Done. 2.70 seconds. 
Cost:
{
  "prompt_tokens": 1306,
  "completion_tokens": 70,
  "total_tokens": 1376
}

Result (length: 350): {
  "writing": "In this blog post, we will explore...es it so special."
}


Writing 'Key Facts and Figures'...
Done. 8.68 seconds. 
Cost:
{
  "prompt_tokens": 1312,
  "completion_tokens": 255,
  "total_tokens": 1567
}

Result (length: 1228): {
  "writing": "The Premier League, also known as ...iverpool with 1." 
}


Writing 'Top Contenders of the Current Season'...
Done. 5.00 seconds. 
Cost:
{
  "prompt_tokens": 1310,
  "completion_tokens": 251,
  "total_tokens": 1561
}

Result (length: 1279): {
  "writing": "The Premier League is renowned for...ce for the title."
}


Writing 'Exciting Matches and Memorable Moments'...
Done. 10.69 seconds. 
Cost:
{
  "prompt_tokens": 1308,
  "completion_tokens": 320,
  "total_tokens": 1628
}

Result (length: 1711): {"writing": "The Premier League is a thrilling and... across the

In [41]:
def final_touch(writing_output, keyword):
    """Ask ChatGPT to revise the assembled sections into one polished blog post."""

    # put the article together (the writings dict preserves section order)
    writings = list(writing_output['writings'].values())
    titles = list(writing_output['section_titles'])
    raw_article = "\n\n".join(f"{title}\n{writing}" for writing, title in zip(writings, titles))

    # improve (an f-string, so that {keyword} is actually filled in)
    instruction = f"""People are searching "{keyword}" like crazy today, so I wrote a blog post on the topic.
                Help me revise my blog post:
                1. make sure the tone is appropriate, like a human talking
                2. well-structured and coherent
                3. readable and interesting
                4. add emotions where possible
                5. add professionalism where possible
                6. add numbers where possible"""
    
    context = f"""my blog post:\n"{raw_article}" """
    
    message = [
            {"role": "system", "content": """You are a very helpful and passionate writer. 
                                            Keep in mind: 
                                            1. answer precisely and concisely
                                            2. the user knows you are a language model
                                            3. always answer with a JSON object"""},
            {"role": "user", "content": instruction},
            {"role": "user", "content": context},
            {"role": "user", "content": """output JSON object: {"blog_title": "","blog_content": "", "word_count":""}"""}
        ]
    
    # getting response from ChatGPT, retrying once on a connection error
    start_time = time.time() 
    try:
        print("Starting...")
        response = chatgpt(message, max_tokens=2000)
    except openai.error.APIConnectionError as e:
        print("ChatGPT Disconnected", e)
        print('trying again')
        response = chatgpt(message, max_tokens=2000)

    end_time = time.time()
    running_time = end_time - start_time
    print(f"- Finished! | time: {running_time:.5f} seconds - ")
    
    # save outputs
    content = response['choices'][0]['message']['content']
    cost = response['usage']
    print("- Cost tokens:\n", cost)

    output = {
        "message": message,
        "response": response,
        "cost": cost,
    }
    try:
        json_content = json.loads(content, strict=False)
        blog_title = json_content['blog_title']
        blog_content = json_content['blog_content']
        output["word_count"] = json_content.get('word_count')
        output["answer"] = "\n\n".join([blog_title, blog_content])
    except (json.JSONDecodeError, KeyError, TypeError):
        print("Content is not in the expected JSON format; returning the raw text.")
        output["answer"] = content

    return output


In [43]:
blog_post = final_touch(writing_output, keyword=trend['keyword'])

- Finished! | time: 53.02400 seconds - 
- Cost tokens:
 {
  "prompt_tokens": 1759,
  "completion_tokens": 1631,
  "total_tokens": 3390
}


In [44]:
print(blog_post['answer'])

The Premier League Phenomenon

In this blog post, we will explore the exciting world of the Premier League, a popular and highly competitive soccer league in England. With its rich history and passionate fanbase, the Premier League has become a global phenomenon. Let's dive into the fascinating world of the Premier League and discover what makes it so special.

Key Facts and Figures
The Premier League, also known as The Football Association Premier League, is the highest level of the English football league system. It was founded on February 20, 1992, after the decision of First Division clubs to break away from the English Football League. The league consists of 20 clubs and operates on a system of promotion and relegation with the English Football League (EFL). Each team plays 38 matches, both home and away, during the season which typically runs from August to May. Matches are mostly played on weekends, with occasional weekday evening fixtures.

The Premier League holds the top spot