# Search with LLMs: Recent News Summarizer
---
In this example, we will build an LLM-based news summarizer app with the Exa API to keep us up to date with the latest news on a given topic.

This Jupyter notebook is available on [Colab](https://colab.research.google.com/drive/1w1WaPpbdm8fAPw_B5M0U-GHIZpcIbUaw) for easy experimentation. You can also [check it out on Github](https://github.com/exa-labs/exa-py/tree/master/examples/newssummarizer/summarizer.ipynb), including a [plain Python version](https://github.com/exa-labs/exa-py/tree/master/examples/newssummarizer/summarizer.py) if you want to skip to a complete product.

To play with this code, we first need an [Exa API key](https://dashboard.exa.ai/overview) and an [OpenAI API key](https://platform.openai.com/api-keys). Get 1000 Exa searches per month free just for [signing up](https://dashboard.exa.ai/overview)!


In [None]:
# install Exa and OpenAI SDKs
!pip install exa_py
!pip install openai


In [None]:
from google.colab import userdata # comment this out if you're not using Colab

EXA_API_KEY = userdata.get('EXA_API_KEY') # replace with your api key, or add to Colab Secrets
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY') # replace with your api key, or add to Colab Secrets
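
If we are not running in Colab, `userdata` is unavailable, so one option is to read the keys from environment variables instead. Here is a minimal sketch; `get_api_key` is a hypothetical helper written for this example and is not part of the Exa or OpenAI SDKs.

```python
import os


def get_api_key(name: str) -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper for this notebook; raises early with a readable
    message instead of failing later inside an API call.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before running this notebook."
        )
    return value
```

Outside Colab, the two assignments above could then become `EXA_API_KEY = get_api_key("EXA_API_KEY")` and `OPENAI_API_KEY = get_api_key("OPENAI_API_KEY")`.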

### Retrieving news with Exa
Let's use the Exa neural search engine to search the web for relevant links to the user's question.

First, we ask the LLM to generate a search engine query based on the question.

In [None]:
import openai
from exa_py import Exa

openai.api_key = OPENAI_API_KEY
exa = Exa(EXA_API_KEY)

SYSTEM_MESSAGE = "You are a helpful assistant that generates search queries based on user questions. Only generate one search query."
USER_QUESTION = "What's the recent news in physics this week?"

completion = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_QUESTION},
    ],
)

search_query = completion.choices[0].message.content

print("Search query:")
print(search_query)

Search query:
"Recent news in physics this week"

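Notice that the model wrapped its query in quotation marks. If we would rather pass the bare text to the search engine, a small cleanup step is enough. This is only a sketch; `clean_query` is a hypothetical helper, not something the OpenAI or Exa SDKs provide.

```python
def clean_query(raw: str) -> str:
    """Trim whitespace and one pair of matching surrounding quotes."""
    query = raw.strip()
    # Only strip quotes when the same quote character opens and closes the text.
    if len(query) >= 2 and query[0] == query[-1] and query[0] in "\"'":
        query = query[1:-1].strip()
    return query
```

We could apply it right after the completion, for example `search_query = clean_query(completion.choices[0].message.content)`.
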

Looks good! Now let's put the search query into Exa. Let's also use `start_published_date` to filter the results to pages published in the last week. Notice that we set `use_autoprompt=True`, which lets the Exa API further optimize our search query. Exa queries work best when they are phrased in a particular way, and `autoprompt` handles that rephrasing for us automatically.

In [None]:
from datetime import datetime, timedelta

one_week_ago = (datetime.now() - timedelta(days=7))
date_cutoff = one_week_ago.strftime("%Y-%m-%d")

search_response = exa.search_and_contents(
    search_query, use_autoprompt=True, start_published_date=date_cutoff
)

urls = [result.url for result in search_response.results]
print("URLs:")
for url in urls:
    print(url)

URLs:
https://phys.org/news/2024-07-scientists-successfully-crystal-giant-atoms.html
https://phys.org/news/2024-07-multimode-coupler-advances-scalable-quantum.html?utm_source=twitter.com&utm_medium=social&utm_campaign=v2
https://physicsworld.com/a/matter-wave-interferometry-puts-new-limits-on-chameleon-particles/
https://phys.org/news/2024-07-method-tenfold-quantum-coherence-destructive.html?utm_source=twitter.com&utm_medium=social&utm_campaign=v2
https://www.sciencedaily.com/releases/2024/06/240628125241.htm
https://phys.org/news/2024-07-webb-captures-staggering-quasar-galaxy.html?utm_source=twitter.com&utm_medium=social&utm_campaign=v2
https://www.nature.com/articles/d41586-024-02134-w
https://phys.org/news/2024-07-incompletely-rifted-microcontinent-greenland-canada.html?utm_source=twitter.com&utm_medium=social&utm_campaign=v2
https://phys.org/news/2024-07-archaeologists-ancient-temple-theater-peru.html?utm_source=twitter.com&utm_medium=social&utm_campaign=v2
https://news.umich.edu/a

Now we're getting somewhere! Exa gave our app a list of relevant, useful URLs based on the original question.
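
Several of these URLs carry `utm_*` tracking parameters. If we plan to display or deduplicate the links, we might strip those first. The sketch below uses only the standard library; `strip_tracking_params` and `unique_clean_urls` are hypothetical helpers for illustration, and treating every `utm_`-prefixed parameter as removable is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking_params(url: str) -> str:
    """Drop query parameters whose names start with 'utm_'; keep the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    # urlencode re-serializes the remaining parameters (possibly none).
    return urlunsplit(parts._replace(query=urlencode(kept)))


def unique_clean_urls(urls):
    """Strip tracking parameters and drop duplicates, preserving order."""
    seen = set()
    cleaned = []
    for url in urls:
        clean = strip_tracking_params(url)
        if clean not in seen:
            seen.add(clean)
            cleaned.append(clean)
    return cleaned
```

Calling `unique_clean_urls(urls)` on the list above would give us the same links without the Twitter campaign suffixes.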

By the way, we might be wondering what makes Exa special. Why can't we just search with Google? Well, [let's take a look for ourselves](https://www.google.com/search?q=Recent+breakthroughs+in+physics+news) at the Google search results. It gives us the front pages of lots of news aggregators, but not the news articles themselves. And since we used Exa's `search_and_contents`, our search came with the webpage contents, so we can use Exa to skip writing a web crawler and access the knowledge directly!

In [None]:
results = search_response.results
result_item = results[0]
print(f"{len(results)} items total, printing the first one:")
print(result_item.text)

10 items total, printing the first one:
Experimental protocol and mean-field phase diagram. Credit: Nature Physics (2024). DOI: 10.1038/s41567-024-02542-9

A crystal is an arrangement of atoms that repeats itself in space, in regular intervals: At every point, the crystal looks exactly the same. In 2012, Nobel Prize winner Frank Wilczek raised the question: Could there also be a time crystal—an object that repeats itself not in space but in time? And could it be possible that a periodic rhythm emerges, even though no specific rhythm is imposed on the system and the interaction between the particles is completely independent of time?
For years, Frank Wilczek's idea has caused much controversy. Some considered time crystals to be impossible in principle, while others tried to find loopholes and realize time crystals under certain special conditions.
Now, a particularly spectacular kind of time crystal has successfully been created at Tsinghua University in China, with the support from TU

Awesome! That's really interesting, or it would be if we had bothered to read it all. But there's no way we're doing that, so let's ask the LLM to summarize it for us:

In [None]:
import textwrap

SYSTEM_MESSAGE = "You are a helpful assistant that briefly summarizes the content of a webpage. Summarize the user's input."

completion = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": result_item.text},
    ],
)

summary = completion.choices[0].message.content

print(f"Summary for {urls[0]}:")
print(result_item.title)
print(textwrap.fill(summary, 80))

Summary for https://phys.org/news/2024-07-scientists-successfully-crystal-giant-atoms.html:
Scientists successfully create a time crystal made of giant atoms
The webpage discusses the creation of a time crystal made of giant atoms by a
team at Tsinghua University in China, with support from TU Wien in Austria. The
experiment involved using laser light and Rydberg atoms in a gas of rubidium
atoms to generate spontaneous oscillations between atomic states, resulting in
regular light intensity patterns. This breakthrough offers a deeper
understanding of time crystals and potential applications in sensor
technologies.
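
We only summarized the first result here. To cover all of them, we could loop over `results` and cap the length of each page so that long articles stay within the model's context window. The following is a sketch under stated assumptions: `truncate_text` and `summarize_results` are hypothetical helpers, the 8000-character default is an arbitrary placeholder rather than a documented limit, and `summarize` stands for any function that turns text into a summary.

```python
def truncate_text(text: str, max_chars: int = 8000) -> str:
    """Cut text to at most max_chars characters, preferring a word boundary."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last space so we do not end in the middle of a word.
    head, _, _ = cut.rpartition(" ")
    return head or cut


def summarize_results(results, summarize, max_chars: int = 8000):
    """Return a (title, url, summary) tuple for each result that has text.

    `results` is any iterable of objects with .title, .url and .text,
    and `summarize` is a callable mapping text to a summary string.
    """
    summaries = []
    for result in results:
        text = (result.text or "").strip()
        if not text:
            continue  # nothing to summarize for this page
        summary = summarize(truncate_text(text, max_chars))
        summaries.append((result.title, result.url, summary))
    return summaries
```

In this notebook, `summarize` could be a small function that wraps the `openai.chat.completions.create` call from the cell above and returns `completion.choices[0].message.content`.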


And we're done! We built an app that translates a question into a search query, uses Exa to search for useful links, uses Exa to grab clean content from those links, and summarizes that content to answer our question about the latest news, or about anything else we want to ask.

We can be sure that the information is fresh, we have the source in front of us, and we did all this with one Exa query and two LLM calls. No web scraping or crawling needed!
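
To recap the flow in one place, here is a sketch of the whole pipeline as a single function. It is an illustration, not the notebook's actual code: `news_digest` is a hypothetical name, and the three steps are passed in as plain callables so the function can be tried without API keys.

```python
from datetime import datetime, timedelta


def news_digest(question, make_query, search, summarize, days=7, now=None):
    """Question -> search query -> date-filtered search -> one summary per result.

    make_query(question) returns a query string, search(query, date_cutoff)
    returns objects with .title, .url and .text, and summarize(text)
    returns a summary string. All three are supplied by the caller.
    """
    now = now or datetime.now()
    # Same cutoff format as the search cell earlier: YYYY-MM-DD.
    date_cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d")
    query = make_query(question)
    results = search(query, date_cutoff)
    return [
        {"title": r.title, "url": r.url, "summary": summarize(r.text)}
        for r in results
    ]
```

With the objects from this notebook, `search` could be `lambda q, cutoff: exa.search_and_contents(q, use_autoprompt=True, start_published_date=cutoff).results`, while `make_query` and `summarize` would wrap the two chat completion calls shown earlier.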

**Through Exa, we have given our LLM access to the entire Internet.** The possibilities are endless.