# Enhance Blog Content Automation with LangChain and Google Search

### Providing the Title, Context, and Text to Expand

In [1]:
"""THREE VARIABLES IN TOTAL"""

#Article's Title
title = "OpenAI CEO: AI regulation ‘is essential’"
# Content
text_all = """ Altman highlighted the potential benefits of AI technologies like ChatGPT and Dall-E 2 to help address significant challenges such as climate change and cancer, but he also stressed the need to mitigate the risks associated with increasingly powerful AI models. Altman proposed that governments consider implementing licensing and testing requirements for AI models that surpass a certain threshold of capabilities. He highlighted OpenAI’s commitment to safety and extensive testing before releasing any new systems, emphasising the company’s belief that ensuring the safety of AI is crucial. Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology. Blumenthal raised concerns about various risks associated with AI, including deepfakes, weaponised disinformation, discrimination, harassment, and impersonation fraud. He also emphasised the potential displacement of workers in the face of a new industrial revolution driven by AI."""
# Part of the text we want to expand upon
text_to_change = """ Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology."""

### Installing Required Libraries

In [2]:
!pip install langchain==0.0.208 deeplake openai==0.27.8 tiktoken



In [3]:
!pip install -q newspaper3k==0.2.8

### Setting Up OPENAI_API_KEY

In [4]:
import os
# Placeholders only: never commit real credentials to a notebook
os.environ['OPENAI_API_KEY'] = '<YOUR_OPENAI_API_KEY>'

os.environ['ACTIVELOOP_TOKEN'] = '<YOUR_ACTIVELOOP_TOKEN>'
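
Keys written directly into a cell are easy to leak when the notebook is shared. As a minimal sketch, the check below assumes the variables were exported in the shell before the notebook started; `require_env` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage with the variable names this notebook relies on:
# openai_key = require_env("OPENAI_API_KEY")
# activeloop_token = require_env("ACTIVELOOP_TOKEN")
```

Failing early like this surfaces a missing key before any API call is made.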

### Generating Search Queries

In [5]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

template = """ You are an exceptional copywriter and content creator.

You're reading an article with the following title:
----------------
{title}
----------------

You've just read the following piece of text from that article.
----------------
{text_all}
----------------

Inside that text, there's the following TEXT TO CONSIDER that you want to enrich with new details.
----------------
{text_to_change}
----------------

What are some simple and high-level Google queries that you'd do to search for more info to add to that paragraph?
Write 3 queries as a bullet point list, prepending each line with -.
"""

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=template,
        input_variables=["text_to_change", "text_all", "title"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])

### Initializing the Model and Setting the Temperature

In [6]:
# Set the model's temperature to 0.9 so it samples more freely and generates more diverse queries
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.9)
chain = LLMChain(llm=chat, prompt=chat_prompt_template)

response = chain.run({
    "text_to_change": text_to_change,
    "text_all": text_all,
    "title": title
})

In [7]:
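# Keep only bullet lines and strip the leading "-" so blank lines do not become empty queries
queries = [line.strip()[1:].strip() for line in response.split("\n") if line.strip().startswith("-")]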
print(queries)

['AI voice cloning software uses in politics', 'AI technology impact on elections and security', "Senators' perspectives on AI regulation and ethics"]


### Setting up GOOGLE_API_KEY & GOOGLE_CSE_ID

In [8]:
import os
# Placeholders only: never commit real credentials to a notebook
os.environ["GOOGLE_API_KEY"] = "<YOUR_GOOGLE_API_KEY>"
os.environ["GOOGLE_CSE_ID"] = "<YOUR_GOOGLE_CSE_ID>"

### Getting Search Results

In [9]:
from langchain.tools import Tool
from langchain.utilities import GoogleSearchAPIWrapper

# Remember to set the "GOOGLE_CSE_ID" and "GOOGLE_API_KEY" environment variables.
search = GoogleSearchAPIWrapper()
TOP_N_RESULTS = 5

def top_n_results(query):
    return search.results(query, TOP_N_RESULTS)

tool = Tool(
    name = "Google Search",
    description="Search Google for recent results.",
    func=top_n_results
)

all_results = []

for query in queries:
    results = tool.run(query)
    all_results += results
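
Different queries can return the same page, so the combined list may contain repeated links. The sketch below is one way to drop repeats while keeping the first occurrence; it assumes each result is a dict with a `link` key (as the formatting cell expects) and skips entries that lack one. `dedupe_by_link` is a hypothetical helper name.

```python
def dedupe_by_link(results):
    """Keep the first result for each link, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        link = result.get("link")
        # Skip entries without a link and links we have already kept
        if link is None or link in seen:
            continue
        seen.add(link)
        unique.append(result)
    return unique


# Hypothetical usage:
# all_results = dedupe_by_link(all_results)
```

Deduplicating before download avoids fetching and embedding the same article twice.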

In [17]:
# print(all_results)

In [18]:
# all_results variable holds 15 web addresses. (3 queries from ChatGPT x 5 top Google search results)
formatted_results = [
    f"Title: {result['title']}\nLink: {result['link']}\nSnippet: {result['snippet']}"
    for result in all_results
]
# Join the formatted results with double newline characters and print them
print("\n\n".join(formatted_results))

Title: AI voice clones mimic politicians and celebrities, reshaping reality
Link: https://www.washingtonpost.com/technology/2023/10/13/ai-voice-cloning-deepfakes/
Snippet: Oct 13, 2023 ... Meanwhile, few software companies have guardrails to prevent illicit use. Previously, voice cloning software churned out robotic ...

Title: My Journey Inside ElevenLabs' Voice-Clone Factory - The Atlantic
Link: https://www.theatlantic.com/technology/archive/2024/05/elevenlabs-ai-voice-cloning-deepfakes/678288/
Snippet: May 4, 2024 ... ... voice but the way the software uses context clues to modulate delivery. If ... ” Voice cloning already existed, but it was crude: I used an AI ...

Title: Tests find AI tools readily create election lies from the voices of well ...
Link: https://apnews.com/article/artificial-intelligence-audio-voice-cloning-elections-2024-2500813b642169478c27c168aab1b3e3
Snippet: May 31, 2024 ... ... AI voice cloning tool. One tool, Invideo AI, not only created the fake ... used it

In [21]:
# """
# The all_results variable contains 15 web addresses (3 queries from ChatGPT x 5 top Google search results). However, it's not optimal to use all the contents due to several considerations:

# 1. Token Limitations: LLMs have input length restrictions ranging from 2K to 4K tokens, depending on the model. While different chain types can overcome this, adhering to the model's window size is more efficient and yields better outcomes.

# 2. Cost: Providing more words to the API increases costs, as model usage is determined by token count. Dividing a prompt into multiple chains is possible but costly.

# 3. Relevance: The content from stored search results will likely be similar in context, so it's better to use the most relevant results to maintain efficiency and relevance.
# """
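
To make the token and cost considerations concrete, here is a rough sketch of selecting texts under a token budget. It assumes about 4 characters per token, which is only a rule of thumb for English text; an exact count would need the model's tokenizer (for example the `tiktoken` package installed earlier). `estimate_tokens` and `fits_budget` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token for English text
    return max(1, len(text) // 4)


def fits_budget(texts, max_tokens):
    """Return the leading texts whose estimated token total stays within max_tokens."""
    selected, used = [], 0
    for text in texts:
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            break
        selected.append(text)
        used += cost
    return selected, used


# Hypothetical usage: keep as many texts as fit in about 3,000 tokens
# selected, used = fits_budget(candidate_texts, 3000)
```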

### Finding the Most Relevant Results


In [23]:
# newspaper package can extract the contents of a web link using the .parse() method

In [24]:
import newspaper

pages_content = []

for result in all_results:
    try:
        article = newspaper.Article(result["link"])
        article.download()
        article.parse()

        if len(article.text) > 0:
            pages_content.append({"url": result["link"], "text": article.text})
    except Exception:
        # Skip links that cannot be downloaded or parsed (e.g. PDFs or blocked pages)
        continue

print("Number of pages: ", len(pages_content))

Number of pages:  15


In [25]:
# The number of pages can be lower than the 15 links collected.
#  There are specific scenarios in which the newspaper library may encounter
#  difficulties extracting information.
#  These include search results that lead to a PDF file
#  or websites that restrict access to web scraping.
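
When pages are skipped silently it is hard to tell which links failed and why. The sketch below records failures alongside successes. It takes the fetching function as a parameter so it stays independent of any scraping library; `extract_pages` and `fetch_text` are hypothetical names, and with newspaper the fetcher would wrap the download and parse calls used above.

```python
def extract_pages(links, fetch_text):
    """Call fetch_text(link) for each link and return (pages, failures)."""
    pages, failures = [], []
    for link in links:
        try:
            text = fetch_text(link)
        except Exception as exc:
            # Record the reason instead of dropping the link silently
            failures.append((link, repr(exc)))
            continue
        if text:
            pages.append({"url": link, "text": text})
        else:
            failures.append((link, "empty text"))
    return pages, failures
```

Printing `failures` after the loop shows at a glance whether the missing pages were PDFs, blocked sites, or empty extractions.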

### Splitting the Saved Contents into Smaller Chunks So the Articles Do Not Exceed the Model’s Input Length


In [27]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=100)

docs = []
for d in pages_content:
    chunks = text_splitter.split_text(d["text"])
    for chunk in chunks:
        new_doc = Document(page_content=chunk, metadata={ "source": d["url"] })
        docs.append(new_doc)

print("Number of chunks: ", len(docs))

Number of chunks:  72


In [None]:
# 72 chunks of data are in the docs variable
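
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width chunker. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and line breaks); `chunk_with_overlap` is a hypothetical helper that only shows the sliding-window idea.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role the 100-character overlap plays at the 3000-character scale used above.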

### Time to find the most relevant chunks to pass them as context to the large language model


In [28]:
# We use OpenAI embeddings to map the texts into a vector space that captures their meaning

In [29]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
# Embedding document chunks
docs_embeddings = embeddings.embed_documents([doc.page_content for doc in docs])
# Embedding desired sentence from the main article that was chosen for expansion
query_embedding = embeddings.embed_query(text_to_change)

In [30]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def get_top_k_indices(list_of_doc_vectors, query_vector, top_k):
    # convert the lists of vectors to numpy arrays
    list_of_doc_vectors = np.array(list_of_doc_vectors)
    query_vector = np.array(query_vector)

    # compute cosine similarities
    similarities = cosine_similarity(query_vector.reshape(1, -1), list_of_doc_vectors).flatten()

    # sort the vectors based on cosine similarity
    sorted_indices = np.argsort(similarities)[::-1]

    # retrieve the top K indices from the sorted list
    top_k_indices = sorted_indices[:top_k]

    return top_k_indices

top_k = 3
best_indexes = get_top_k_indices(docs_embeddings, query_embedding, top_k)
# Index by rank so the documents stay in similarity order, best match first
best_k_documents = [docs[i] for i in best_indexes]
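
The ranking step can be checked by hand on toy vectors. The sketch below reimplements cosine similarity in plain Python, purely for illustration; the 2-dimensional vectors are made up and stand in for the real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(doc_vectors, query_vector, k):
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(doc_vectors)),
        key=lambda i: cosine(doc_vectors[i], query_vector),
        reverse=True,
    )
    return ranked[:k]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.2]
print(top_k_by_cosine(toy_docs, toy_query, 2))
# [0, 2]
```

The query points mostly along the first axis, so the first vector ranks highest, followed by the diagonal one.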

### Top 3 documents as specified

In [34]:
best_k_documents

[Document(page_content='The other companies cited in the report didn’t respond to emailed requests for comment.\n\nThe findings come after AI-generated audio clips already have been used in attempts to sway voters in elections across the globe.\n\nIn fall 2023, just days before Slovakia’s parliamentary elections, audio clips resembling the voice of the liberal party chief were shared widely on social media. The deepfakes purportedly captured him talking about hiking beer prices and rigging the vote.\n\nEarlier this year, AI-generated robocalls mimicked Biden’s voice and told New Hampshire primary voters to stay home and “save” their votes for November. A New Orleans magician who created the audio for a Democratic political consultant demonstrated to the AP how he made it, using ElevenLabs software.\n\nExperts say AI-generated audio has been an early preference for bad actors, in part because the technology has improved so quickly. Only a few seconds of real audio are needed to create a

In [36]:
# """We can now define the prompt using the additional information from Google search.
# There are six input variables in the template:

# 1. title holds the main article’s title;
# 2. text_all presents the whole article we are working on;
# 3. text_to_change is the selected part of the article that requires expansion;
# 4-6. doc_1, doc_2, and doc_3 hold the closest Google search results as context."""


### Defining the Template for the Expanded Text

In [37]:
template = """You are an exceptional copywriter and content creator.

You're reading an article with the following title:
----------------
{title}
----------------

You've just read the following piece of text from that article.
----------------
{text_all}
----------------

Inside that text, there's the following TEXT TO CONSIDER that you want to enrich with new details.
----------------
{text_to_change}
----------------

Searching around the web, you've found this ADDITIONAL INFORMATION from distinct articles.
----------------
{doc_1}
----------------
{doc_2}
----------------
{doc_3}
----------------

Modify the previous TEXT TO CONSIDER by enriching it with information from the previous ADDITIONAL INFORMATION.
"""

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=template,
        input_variables=["text_to_change", "text_all", "title", "doc_1", "doc_2", "doc_3"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])


In [38]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.9)
chain = LLMChain(llm=chat, prompt=chat_prompt_template)

response = chain.run({
    "text_to_change": text_to_change,
    "text_all": text_all,
    "title": title,
    "doc_1": best_k_documents[0].page_content,
    "doc_2": best_k_documents[1].page_content,
    "doc_3": best_k_documents[2].page_content
})

### Final Output

In [42]:
print("Text to Change: ", text_to_change)
print("Expanded Variation:", response)

Text to Change:   Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology.
Expanded Variation: Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology. This demonstration highlights the growing concerns around AI-generated audio clips being used to deceive and manipulate voters in elections worldwide. Recent incidents, like the sharing of deepfake audio clips resembling political figures discussing controversial topics or the use of AI-generate

In [43]:
def split_sentences(text):
    # Put each sentence on its own line without doubling the final period
    joined = '.\n'.join(text.split('. ')).rstrip()
    return joined if joined.endswith('.') else joined + '.'

In [46]:
# Print the results
print("Original Text:")
print(split_sentences(text_to_change))
print("\nExpanded Variation:")
print(split_sentences(response))

Original Text:
 Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security.
Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology.

Expanded Variation:
Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security.
Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology.
This demonstration highlights the growing concerns around AI-generated audio clips being used to deceive and manipulate voters in elections worldwide.
Recent incidents, like the sharing of deepfake audio clips resembling political figures discussing controversial topics or the use of AI-generate
