### Semantic News Search
This notebook lets users enter a prompt describing the news they are interested in, then returns the most similar articles found through the Media Stack API.
##### Methodology
1. User enters a prompt
2. A few-shot prompt identifies the parameters in the user's input that can be used for the Media Stack API call
3. Process the LLM output to turn it into the appropriate format for the Media Stack API
4. Validate the API call to make sure the LLM is not adding incorrect or invalid parameters
5. Call the API and get the top responses from Media Stack
6. Calculate the similarity between the user's input and each article
7. Display the most similar articles as the final response

In [5]:
## Import the libraries
from langchain import PromptTemplate, FewShotPromptTemplate, HuggingFaceHub
from langchain.chains import LLMChain
from sentence_transformers import SentenceTransformer, util
import numpy as np
import torch
import os
from IPython.display import Markdown as md
from mediastack import get_news

In [6]:
USER_INPUT = "What are new AI and technology achievements?"
number_of_articles_to_display = 5
API_TOKEN = os.getenv("HUGGINGFACE_API_TOKEN")
## If you do not have a HuggingFace API token, you can create one for free. I'm sticking to free LLMs for this project.
##   This limits me to fairly small LLMs (<10GB), so a more advanced LLM would likely give better results.
##   Note: the token is read from an environment variable because putting an API token directly in a notebook is not secure. I can also send you mine if you need it.


In [8]:
llm = HuggingFaceHub(repo_id="bigscience/bloom", huggingfacehub_api_token = API_TOKEN)

I chose to use BLOOM since it performed the best at extracting the parameters compared to the other freely available LLMs.

A few-shot prompt works well here since it allows me to have more control over the output of the model, which helps make sure that the output can be parsed properly and doesn't include incorrect parameters.

In [19]:
available_parameters = """
categories must be taken from this list : [ general, business, entertainment, health, science, sports, technology ]
categories can be excluded by prefixing a '-', such as -entertainment to exclude entertainment.
"""
examples = [
    { "input": "I want to know about new technologies that companies are using to make money",
     "parameters":"""
    categories:technology,business
    keywords:new technologies money"""
       },

    { "input": "Give me news about achievements in health science",
     "parameters":"""
    categories:health,science
    keywords:health science achievements"""
       },

    { "input": "What are the best non-sports movies to go see in theaters now",
     "parameters":"""
    categories:entertainment,-sports
    keywords:movies theaters"""
       },
]

example_template = """
input: {input},
parameters: {parameters}\n"""

example_prompt = PromptTemplate(input_variables=['input','parameters'], template=example_template)

parameter_parser_template = FewShotPromptTemplate(
    examples = examples,
    example_prompt=example_prompt,
    prefix = f"Extract the parameters out of the user's prompt and output the parameters in json format. The parameters should include categories and keywords. {available_parameters}",
    suffix="input: {input}\nparameters:",
    input_variables=["input"],
    example_separator="\n"
)

In [20]:
print("Here is the prompt that will extract the parameters from the user's input:\n\n")
print(parameter_parser_template.format(input=USER_INPUT))

Here is the prompt that will extract the parameters from the user's input:


Extract the parameters out of the user's prompt and output the parameters in json format. The parameters should include categories and keywords. 
categories must be taken from this list : [ general, business, entertainment, health, science, sports, technology ]
categories can be excluded by prefixing a '-', such as -entertainment to exclude entertainment.


input: I want to know about new technologies that companies are using to make money,
parameters: 
    categories:technology,business
    keywords:new technologies money


input: Give me news about achievements in health science,
parameters: 
    categories:health,science
    keywords:health science achievements


input: What are the best non-sports movies to go see in theaters now,
parameters: 
    categories:entertainment,-sports
    keywords:movies theaters

input: What are new AI and technology achievements?
parameters:


In [21]:
chain = LLMChain(llm=llm, prompt = parameter_parser_template)
parameters = chain.run(USER_INPUT)

In [22]:
## After calling the LLM, I parse the response into the parameters for the Media Stack API call that retrieves the relevant news articles
params = {}
for param in parameters.strip().split("\n"):
    ## Split on the first colon only, so a value that contains ':' is not discarded
    vals = param.split(":", 1)
    if len(vals) == 2:
        params[vals[0].strip()] = vals[1].strip()
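
The parsing step above could also be wrapped in a small reusable function. This is a minimal sketch rather than the notebook's exact code: `parse_parameters` and `sample_output` are hypothetical names, and the sample text only imitates the `name:value` lines that the few-shot prompt asks the model to produce.

```python
def parse_parameters(llm_output: str) -> dict[str, str]:
    """Turn 'name:value' lines from the LLM into a dict, skipping malformed lines."""
    params = {}
    for line in llm_output.strip().splitlines():
        # Split on the first colon only, so a value that contains ':' is kept whole
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Ignore lines with no colon, no name, or no value
        if sep and name and value:
            params[name] = value
    return params


# Hypothetical model output, shaped like the few-shot examples
sample_output = """
    categories:technology,general
    keywords:ai technology achievements
"""
print(parse_parameters(sample_output))
```

Keeping the parser in its own function makes it easy to check against awkward model output (blank lines, stray text) without calling the LLM.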

In [23]:
print("Here are the parameters extracted from the user prompt and then sent to the API")
print(params)
## Retrieve the top news stories that fit the parameters
## The get_news helper function that I made also includes some validation of the parameters to help ensure I'm not making bad API calls
news = get_news(params)

Here are the parameters extracted from the user prompt and then sent to the API
{'categories': 'technology,general', 'keywords': 'ai technology achievements'}
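
The validation inside `get_news` is not shown in this notebook. As a rough sketch of what such a check might look like, the hypothetical `validate_params` below keeps only the two parameters the prompt asks for and drops any category that is not in the list given to the model. The function name and the exact rules are assumptions for illustration, not the helper's actual implementation.

```python
# Category names copied from the prompt's list of available parameters
ALLOWED_CATEGORIES = {"general", "business", "entertainment", "health",
                      "science", "sports", "technology"}
# Assumption: only these two parameters are ever passed on to the API
ALLOWED_PARAMETERS = {"categories", "keywords"}


def validate_params(params: dict[str, str]) -> dict[str, str]:
    """Keep only known parameters and known categories (hypothetical helper)."""
    clean = {}
    for name, value in params.items():
        if name not in ALLOWED_PARAMETERS:
            continue  # drop parameters the prompt never asked for
        if name == "categories":
            kept = []
            for category in value.split(","):
                category = category.strip().lower()
                # A single leading '-' marks an excluded category
                bare = category[1:] if category.startswith("-") else category
                if bare in ALLOWED_CATEGORIES:
                    kept.append(category)
            value = ",".join(kept)
        value = value.strip()
        if value:
            clean[name] = value
    return clean


print(validate_params({"categories": "technology, -sports, cooking",
                       "keywords": "ai",
                       "country": "us"}))
```

Filtering against a fixed allow-list means a hallucinated category or parameter is silently dropped instead of producing a failed API call.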


In [24]:
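## Here I use all-MiniLM-L6-v2 to embed the articles and the user's input, then rank the articles by cosine similarity to the input
model = SentenceTransformer("all-MiniLM-L6-v2")
## A title or description may be missing (None) in an API response, so fall back to an empty string
article_texts = [f"{article.get('title') or ''} {article.get('description') or ''}" for article in news]
article_embeddings = model.encode(article_texts)
user_embedding = model.encode(USER_INPUT)
similarity = util.cos_sim(user_embedding, article_embeddings)
## torch.topk raises an error if k is larger than the number of articles returned
top_k = min(number_of_articles_to_display, len(news))
article_idxs = torch.topk(similarity, top_k).indices.numpy()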
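
To make the ranking step concrete, here is a dependency-free sketch of what cosine similarity followed by a top-k selection computes. The function names and the two-dimensional toy vectors are made up for illustration; the notebook itself relies on `util.cos_sim` and `torch.topk` over real sentence embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, candidates, k):
    """Return the indices of the k candidates most similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]  # slicing tolerates k larger than the candidate count


# Toy stand-ins for the user embedding and three article embeddings
toy_query = [1.0, 0.0]
toy_articles = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
print(top_k_similar(toy_query, toy_articles, k=2))  # prints [1, 2]
```

The second toy article points almost in the same direction as the query, so it ranks first; the first one is orthogonal to the query and ranks last.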

In [25]:
results = []
for idx in article_idxs[0]:
    article = news[idx]
    ## Fall back to the source name when the article has no author
    byline = article['author'] if article['author'] is not None else article['source']
    url = article['url'].strip('<').strip('>')
    results.append(f"### {article['title']}\n\n{article['description']}\n\n[Read Full Article]({url}) by {byline}")

In [26]:
print(f'Based on the User input of: "{USER_INPUT}" here are the top results:')
md("\n\n_________________________________________________\n".join(results))

Based on the User input of: "What are new AI and technology achievements?" here are the top results:


### DeepMind Co-Founder Wants the ‘New Turing Test’ to Be Based on How Good an AI Is at Getting Rich

With artificial intelligence hype leaving venture capitalist firms foaming at the mouth, ready to buy into any new company that sticks an “A” and “I” in its name, we sure as hell need a new way of defining what constitutes an AI. The Turing Test fails to define real intelligence in today’s world of large language…Read more...

[Read Full Article](https://gizmodo.com/deepmind-suleyman-new-turing-test-make-money-1850557322) by Kyle Barr

_________________________________________________
### AI could change the future of yogurt—and turn Danone around

Making the yogurt of the future requires a cast of 21st-century helpers: machine learning, gut science and even a mysterious artificial stomach.

[Read Full Article](https://phys.org/news/2023-06-ai-future-yogurtand-danone.html) by physorg

_________________________________________________
### NTIA Publishes Over 1,400 Comments on Proposed AI Accountability Policy

More than 1,400 written responses were received by the National Telecommunications and Information Administration commenting on its draft artificial intelligence accountability policy. The comments have been published two months after NTIA issued its request for feedback, which will help shape its policy recommendations to ensure earned trust in AI technology. The agency drafted the AI Accountability Policy in an aim to guide the development of assessments, audits, and certifications of AI systems. The document is part of a larger effort by the Biden administration to govern AI-related risks...

[Read Full Article](https://executivegov.com/2023/06/ntia-publishes-over-1400-comments-on-proposed-ai-accountability-policy/) by Jamie Bennet

_________________________________________________
### AI's Impact on Job Market Remains a Mystery, But Rise of Internet May Have Some Clues

Wes Cummins, CEO, Applied Digital, joined TheStreet to discuss how artificial intelligence will impact the job market.

[Read Full Article](https://www.thestreet.com/investing/ais-impact-on-job-market-remains-a-mystery-but-rise-of-internet-may-have-some-clues) by Rebecca Mezistrano

_________________________________________________
### President Biden visits Bay Area, uses social media as cautionary tale to guide AI laws

Worries that government and tech firms will move to slowly on the new generative AI have arisen as the cutting-edge technology makes lightning-fast inroads into many aspects of life and commerce.

[Read Full Article](https://www.mercurynews.com/2023/06/20/president-biden-visits-bay-area-uses-social-media-as-cautionary-tale-to-guide-ai-laws/) by Elissa Miolene, Ethan Baron