# Workflow


In [None]:
"""First we generate candidate search queries from the selected paragraph that we want to expand. The queries are then used to extract relevant documents using a search engine (e.g. Bing or Google Search), which are the split into small chunks. We then compute embeddings of these chunks and save chunks and embeddings in a Deep Lake dataset. Last, the most similar chunks to the paragraph that we want to expand are retrieved from Deep Lake, and used in a prompt to expand the paragraph with further knowledge."""

In [10]:
title = "OpenAI CEO: AI regulation is essential"

text_all = """ Altman highlighted the potential benefits of AI technologies like ChatGPT and Dall-E 2 to help address significant challenges such as climate change and cancer, but he also stressed the need to mitigate the risks associated with increasingly powerful AI models. Altman proposed that governments consider implementing licensing and testing requirements for AI models that surpass a certain threshold of capabilities. He highlighted OpenAI's commitment to safety and extensive testing before releasing any new systems, emphasising the company's belief that ensuring the safety of AI is crucial. Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology. Blumenthal raised concerns about various risks associated with AI, including deepfakes, weaponised disinformation, discrimination, harassment, and impersonation fraud. He also emphasised the potential displacement of workers in the face of a new industrial revolution driven by AI."""

text_to_change = """ Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology."""

# Generate Search Queries

In [11]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (ChatPromptTemplate, HumanMessagePromptTemplate)

In [12]:
template = """You are an exceptional copywriter and content creator.

You're reading an article with the following title:
----------------
{title}
----------------

You've just read the following piece of text from that article.
----------------
{text_all}
----------------

Inside that text, there's the following TEXT TO CONSIDER that you want to enrich with new details.
----------------
{text_to_change}
----------------

What are some simple and high-level Google queries that you'd do to search for more info to add to that paragraph?
Write 3 queries as a bullet point list, prepending each line with -.
"""

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(template=template, input_variables=["title", "text_all", "text_to_change"])
)

chat_prompt_template = ChatPromptTemplate.from_messages(
    [human_message_prompt]
)

llm = ChatGoogleGenerativeAI(model='gemini-pro', temperature=0.8, convert_system_message_to_human=True)

chain = LLMChain(llm=llm, prompt=chat_prompt_template)

response = chain.run({
    "text_to_change": text_to_change,
    "text_all": text_all,
    "title": title
})


In [5]:
response

'- AI implications for elections\n- AI job displacement\n- AI and national security'

In [13]:
# drop the leading "- " from each bullet and ignore blank lines
queries = [line[2:].strip() for line in response.split('\n') if line.strip()]

In [14]:
queries

['AI implications on elections',
 'How AI affects jobs',
 'AI implications for national security']
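
Slicing off the first two characters relies on the model returning exactly `- ` at the start of every line. A slightly more tolerant parser is sketched below; the helper name `parse_bullet_queries` and the set of accepted bullet markers are assumptions made for illustration, not part of the notebook.

```python
import re
from typing import List


def parse_bullet_queries(response: str) -> List[str]:
    """Extract one query per bullet line from an LLM response (hypothetical helper)."""
    queries = []
    for line in response.splitlines():
        # Remove an optional leading bullet marker ("-", "*" or "•") and whitespace.
        cleaned = re.sub(r"^\s*[-*•]\s*", "", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    return queries


sample = "- AI implications for elections\n- AI job displacement\n- AI and national security"
print(parse_bullet_queries(sample))
```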

# Get Search Results

In [17]:
from dotenv import load_dotenv
import os

load_dotenv('C:/Users/dell/OneDrive/Documents/LangChain and Vector database/.env')

serp_api_key = os.getenv('SERP_API_KEY')
google_api_key = os.getenv("GOOGLE_API_KEY_1")
search_engine_id = os.getenv("GOOGLE_CSE_ID")
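
`os.getenv` returns `None` when a variable is missing, and that usually only surfaces later as a confusing error inside the search call. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, and the variable names in the commented usage line are the ones read by the cell above.

```python
import os
from typing import Dict, Iterable, Mapping


def require_env(names: Iterable[str], env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the requested variables, or fail early listing the missing ones."""
    names = list(names)
    # Treat unset and empty values alike as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Usage after load_dotenv(...), assuming the same variable names as above:
# settings = require_env(["GOOGLE_API_KEY_1", "GOOGLE_CSE_ID"])
```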


In [23]:
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.tools import Tool

os.environ["GOOGLE_API_KEY"] = google_api_key

search = GoogleSearchAPIWrapper()
TOP_N_RESULTS = 5

def top_n_results(query):
    return search.results(query, TOP_N_RESULTS)


tool = Tool(
    name="Google Search",
    description="Search Google for recent results.",
    func=top_n_results,
    verbose=True
)

all_results = []

for query in queries:
    results = tool.run(query)
    all_results += results


In [24]:
all_results

[{'title': 'How AI Puts Elections at Risk — And the Needed Safeguards ...',
  'link': 'https://www.brennancenter.org/our-work/analysis-opinion/how-ai-puts-elections-risk-and-needed-safeguards',
  'snippet': 'Jun 13, 2023 ... Beyond outright falsehoods, the proliferation of AI-generated content may accelerate the loss of trust in the overall election information\xa0...'},
 {'title': 'How AI will transform the 2024 elections | Brookings',
  'link': 'https://www.brookings.edu/articles/how-ai-will-transform-the-2024-elections/',
  'snippet': 'May 3, 2023 ... By analyzing this material in real-time, AI will enable campaigners to go after specific voting blocs with appeals that nudge them around\xa0...'},
 {'title': "How AI could sway voters in 2024's big elections | Chatham House ...",
  'link': 'https://www.chathamhouse.org/publications/the-world-today/2023-10/how-ai-could-sway-voters-2024s-big-elections',
  'snippet': 'Sep 29, 2023 ... In the US, with a year to go until a momentous presid
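
Because the three queries overlap in topic, the same page can appear in more than one result list. The sketch below removes repeated links before any page is downloaded. `dedupe_by_link` is a hypothetical helper; it assumes each result is a dict shaped like the entries above (`title`, `link`, `snippet`) and skips entries without a `link` key, on the assumption that a search wrapper may return a placeholder entry when nothing is found.

```python
from typing import Dict, List


def dedupe_by_link(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first result for each link, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        link = result.get("link")
        # Skip entries with no link and links that were already collected.
        if link is None or link in seen:
            continue
        seen.add(link)
        unique.append(result)
    return unique


# Usage, assuming `all_results` from the cell above:
# all_results = dedupe_by_link(all_results)
```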

# Find the Most Relevant Results

In [27]:
import newspaper

pages_content = []

for result in all_results:
    try:
        article = newspaper.Article(result['link'])
        article.download()
        article.parse()
        # keep only pages where some text was actually extracted
        if len(article.text) > 0:
            pages_content.append({"url": result["link"], "text": article.text})
    except Exception:
        # skip pages that fail to download or parse
        continue


print("Number of pages: ", len(pages_content))

Number of pages:  13


In [28]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=100)

docs = []
for d in pages_content:
    chunks = text_splitter.split_text(d['text'])
    for chunk in chunks:
        new_doc = Document(page_content=chunk, metadata={'source': d['url']})
        docs.append(new_doc)

print("Number of chunks: ", len(docs))

Number of chunks:  57
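
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sliding-window splitter. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which tries to break on separators such as paragraph and line boundaries first; `sliding_window_chunks` is a hypothetical function that only shows how consecutive chunks share `chunk_overlap` characters.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence cut at a window boundary still appears whole in at least one neighbouring chunk, at the cost of embedding some text twice.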


In [34]:
from langchain_google_genai.embeddings import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

docs_embeddings = embeddings.embed_documents([doc.page_content for doc in docs])

In [35]:
query_embedding = embeddings.embed_query(text_to_change)

# Compute Similarity Between Embeddings

In [36]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def get_top_k_indices(list_of_doc_vectors, query_vector, top_k):
    # convert the lists of vectors to numpy arrays
    list_of_doc_vectors = np.array(list_of_doc_vectors)
    query_vector = np.array(query_vector)

    # compute cosine similarities
    similarities = cosine_similarity(query_vector.reshape(1, -1), list_of_doc_vectors).flatten()

    # sort the vectors based on cosine similarity
    sorted_indices = np.argsort(similarities)[::-1]

    # retrieve the top K indices from the sorted list
    top_k_indices = sorted_indices[:top_k]

    return top_k_indices

top_k = 3
best_indexes = get_top_k_indices(docs_embeddings, query_embedding, top_k)
best_k_documents = [doc for i, doc in enumerate(docs) if i in best_indexes]
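
For reference, the cosine similarity between a query vector $q$ and a document vector $d$ is $\frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}$. The sketch below computes it in plain Python instead of calling scikit-learn, and returns the indices in ranking order. The names `cosine` and `top_k_in_rank_order` are hypothetical, and the vectors are toy two-dimensional values rather than real embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_in_rank_order(
    doc_vectors: Sequence[Sequence[float]], query_vector: Sequence[float], top_k: int
) -> List[int]:
    """Indices of the top_k most similar vectors, most similar first."""
    ranked = sorted(
        range(len(doc_vectors)),
        key=lambda i: cosine(doc_vectors[i], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.1]]
print(top_k_in_rank_order(toy_docs, [1.0, 0.0], top_k=3))
# [0, 3, 2]
```

Selecting documents with `[docs[i] for i in indices]` keeps the most similar chunk first, whereas filtering `enumerate(docs)` by membership returns the same chunks in their original document order.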

In [37]:
best_k_documents

[Document(page_content='AI since the 2022 elections\n\nWhile AI has been able to synthesize photo-quality “deepfake” profile pictures of nonexistent people for several years, it is only in recent months that the technology has progressed to the point where users can conjure lifelike images of nearly anything with a simple text prompt. Adept users have long been able to use Photoshop to edit images, but vast numbers of people can now create convincing images from scratch in a matter of seconds at a very low — or no — cost. Deepfake audio has similarly made enormous strides and can now clone a person’s voice with remarkably little training data.\n\nWhile forerunners to the wildly popular app ChatGPT have been around for several years, OpenAI’s latest iteration is leaps and bounds beyond its predecessors in both popularity and capability. Apps like ChatGPT are powered by large language models, which are systems for encoding words as collections of numbers that reflect their usage in the v

# Extend the Sentence

In [39]:
template = """You are an exceptional copywriter and content creator.

You're reading an article with the following title:
----------------
{title}
----------------

You've just read the following piece of text from that article.
----------------
{text_all}
----------------

Inside that text, there's the following TEXT TO CONSIDER that you want to enrich with new details.
----------------
{text_to_change}
----------------

Searching around the web, you've found this ADDITIONAL INFORMATION from distinct articles.
----------------
{doc_1}
----------------
{doc_2}
----------------
{doc_3}
----------------

Modify the previous TEXT TO CONSIDER by enriching it with information from the previous ADDITIONAL INFORMATION.
"""

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template=template,
        input_variables=["text_to_change", "text_all", "title", "doc_1", "doc_2", "doc_3"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])

chain = LLMChain(llm=llm, prompt=chat_prompt_template)

response = chain.run({
    "text_to_change": text_to_change,
    "text_all": text_all,
    "title": title,
    "doc_1": best_k_documents[0].page_content,
    "doc_2": best_k_documents[1].page_content,
    "doc_3": best_k_documents[2].page_content
})

print("Text to Change: ", text_to_change)
print("Expanded Variation:", response)

Text to Change:   Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology.
Expanded Variation: Sure, here's the modified TEXT TO CONSIDER, enriched with information from the ADDITIONAL INFORMATION:

Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implications for elections, jobs, and security. Blumenthal played an audio introduction using an AI voice cloning software trained on his speeches, demonstrating the potential of the technology.

**Enriched with additional information:**

Senators Josh Hawley and Richard Blumenthal expressed their recognition of the transformative nature of AI and the need to understand its implicatio
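
The prompt above hard-codes three placeholders (`doc_1`, `doc_2`, `doc_3`), so it fails with an `IndexError` if fewer than three chunks were retrieved. One possible variation, sketched below, joins however many chunks are available into a single block. The `{additional_information}` placeholder, `join_chunks`, and `render_additional_information` are hypothetical names introduced for this illustration; the separator matches the one used in the template.

```python
from typing import List

SEPARATOR = "\n----------------\n"

# Hypothetical replacement for the ADDITIONAL INFORMATION part of the template.
ADDITIONAL_INFO_TEMPLATE = (
    "Searching around the web, you've found this ADDITIONAL INFORMATION from distinct articles.\n"
    "----------------\n"
    "{additional_information}\n"
    "----------------\n"
)


def join_chunks(chunks: List[str], max_chunks: int = 3) -> str:
    """Join up to max_chunks non-empty chunks with the template's separator."""
    selected = [chunk.strip() for chunk in chunks if chunk.strip()][:max_chunks]
    return SEPARATOR.join(selected)


def render_additional_information(chunks: List[str], max_chunks: int = 3) -> str:
    """Fill the single placeholder with the joined chunks."""
    return ADDITIONAL_INFO_TEMPLATE.format(
        additional_information=join_chunks(chunks, max_chunks)
    )


print(render_additional_information(["first chunk", "second chunk"]))
```

With this layout the chain would take one `additional_information` input, for example `join_chunks([doc.page_content for doc in best_k_documents])`, instead of three separate document inputs.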