<a href="https://colab.research.google.com/github/NHirt32/LLM-News-Digest/blob/dev/API_LLM_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Setup

Setting up the environment by installing the required packages and importing the modules used throughout the notebook.


In [None]:
!pip install python-dotenv
!pip install openai
!pip install jupyter-ui-poll

In [None]:
# For API requests
import requests
# For mounting files and securing sensitive data
from google.colab import drive
import os
from dotenv import load_dotenv
# For OpenAI requests
import openai
# For interactive UI
import ipywidgets as widgets
from IPython.display import display
import time
from jupyter_ui_poll import ui_events

Here I'm mounting my Google Drive and loading a .env file stored there, which keeps my API keys out of my GitHub repository.

In [None]:
drive.mount('/content/drive')
env_path = '/content/drive/MyDrive/API-LLM-Project/.env'
load_dotenv(env_path)
news_key = os.getenv("NEWS_API_KEY")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
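
If the .env path is wrong or the variable name is misspelled, `os.getenv` returns `None` without complaint and the failure only surfaces later as a rejected API request. A minimal sketch of a guard for that case follows; `require_env` is a hypothetical helper, not part of python-dotenv or the original notebook.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Fail early with a message that points at the likely cause
        raise RuntimeError(f"{name} is not set; check the .env file and its path")
    return value

# In the notebook this could replace the plain lookup:
# news_key = require_env("NEWS_API_KEY")
```

Failing at load time keeps the error next to its cause instead of inside a later HTTP response.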


# News API Setup

Getting user input about desired country.

In [None]:
# Define the dropdown widget with a placeholder option
country_dropdown = widgets.Dropdown(
    options=[
        ('Select a Country', 'select'),
        ('United States', 'US'),
        ('United Kingdom', 'GB'),
        ('Canada', 'CA'),
        ('Australia', 'AU'),
        ('Germany', 'DE'),
        ('France', 'FR'),
    ],
    value='select',  # default value
    description='Country:',
)

display(country_dropdown)

# Essentially a loop that will alternate between polling ui events and sleeping until the user makes a selection
with ui_events() as poll:
  while country_dropdown.value == 'select':
    poll(10)
    time.sleep(0.1)

Dropdown(description='Country:', options=(('Select a Country', 'select'), ('United States', 'US'), ('United Ki…
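
The loop above waits indefinitely if no country is ever selected. One possible variant, sketched here under the assumption that giving up after a fixed time is acceptable, wraps the same poll-and-sleep pattern in a helper with a timeout. `wait_until` and its parameters are hypothetical names introduced for illustration; the `poll` argument is expected to be the callable yielded by `ui_events()`.

```python
import time
from typing import Callable

def wait_until(condition: Callable[[], bool],
               poll: Callable[[int], None],
               timeout_s: float = 60.0,
               interval_s: float = 0.1) -> bool:
    """Poll UI events until condition() is true or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while not condition():
        if time.monotonic() >= deadline:
            return False
        poll(10)                 # let the UI process up to 10 pending events
        time.sleep(interval_s)   # avoid a busy loop between polls
    return True

# Possible use inside the notebook (not executed here):
# with ui_events() as poll:
#     selected = wait_until(lambda: country_dropdown.value != 'select', poll)
```

The return value lets the calling cell decide whether to continue with the selection or stop with a clear message.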

Setting up API parameters

In [None]:
# Top-headlines request syntax: url?country={chosen_country}&apiKey=API_KEY
top_headlines_url = 'https://newsapi.org/v2/top-headlines'
temp_country = country_dropdown.value
news_params = {
    'country': temp_country,
    'apiKey': news_key
}
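
To make the URL syntax in the comment concrete, the sketch below builds the query string by hand with the standard library. It is only an illustration of roughly what `requests` assembles from the `params` dictionary; `DEMO_KEY` is a placeholder, not a real credential.

```python
from urllib.parse import urlencode

base_url = 'https://newsapi.org/v2/top-headlines'
demo_params = {'country': 'US', 'apiKey': 'DEMO_KEY'}  # placeholder values

# urlencode turns the dictionary into "key=value" pairs joined by "&"
full_url = f"{base_url}?{urlencode(demo_params)}"
print(full_url)
# https://newsapi.org/v2/top-headlines?country=US&apiKey=DEMO_KEY
```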

Here, we retrieve a response from the News API. If an error occurs, we halt the execution of the notebook; otherwise, we combine the titles and descriptions into a single, readable string to provide to our LLM.

In [None]:
news_response = requests.get(top_headlines_url, params=news_params)
# If the response from News API is not "OK", stop the execution of the project
if news_response.status_code != 200:
  raise RuntimeError(f"Error: {news_response.status_code}, {news_response.reason}")
# Converting the JSON response body to a dictionary
news_data = news_response.json()
# Now we combine all titles and descriptions into a single legible string
combined_content = '\n\n'.join(
    f"Title: {article['title']}\nDescription: {article['description']}"
    for article in news_data['articles']
)
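
The combining step can be tried without calling the API. The sketch below runs it on made-up articles and assumes that a title or description may be missing or null, in which case a fallback string is used so the literal text "None" does not reach the LLM. `combine_articles` and the sample data are hypothetical.

```python
def combine_articles(articles):
    """Join article titles and descriptions into one prompt-ready string."""
    blocks = []
    for article in articles:
        # Fall back to a readable placeholder when a field is missing or None
        title = article.get('title') or 'Untitled'
        description = article.get('description') or 'No description provided'
        blocks.append(f"Title: {title}\nDescription: {description}")
    return '\n\n'.join(blocks)

sample_articles = [  # made-up data for illustration
    {'title': 'Example headline A', 'description': 'Short summary A.'},
    {'title': 'Example headline B', 'description': None},
]
print(combine_articles(sample_articles))
```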

# OpenAI API Setup

Here we set up our OpenAI API parameters.

In [None]:
openai.api_key = os.getenv("OPENAI_API_KEY")
openai_params = {
    'model': 'gpt-4o',
    'messages': [
        {"role": "system", "content": "You are an intelligent summarization assistant tasked with sorting news articles into the following categories based on their content: Political, Economic, Social Issues, Environment, Technology, and Entertainment. Your goal is to analyze the articles, assign each one to a single category, and report the percentage of articles that falls into each category. List a maximum of 3 articles in each category, ranked by the article's importance. Ensure that each category is clearly defined and summarized concisely. Use headings or bullet points for clarity. Do not repeat the same article in multiple categories. Do not consider interplanetary news to be environmental. If there are no articles in a category, the percentage for that category should be 0%."},
        # Doubled braces render as literal braces inside the f-string
        {"role": "user", "content": f"Analyze the following articles and categorize them appropriately. Afterwards, provide a structured dictionary with category names as keys and percentages as values. Example format: {{'Political': 40, 'Economic': 20, ...}}\n\n{combined_content}"}
    ],
    # Response length variable
    'max_tokens': 1000
}

Now, using our API parameters, we make a request to OpenAI to summarize the combined content of our News API data.

In [None]:
try:
  openai_response = openai.chat.completions.create(**openai_params)
except Exception as e:
  print(f"An error occurred: {e}")
  # Re-raise so later cells do not run with openai_response undefined
  raise

In [None]:
article_summary = openai_response.choices[0].message.content.strip()
print(article_summary)
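
Since the prompt asks for category percentages in a dictionary-like format, those numbers can be pulled out of the reply for later use. The sketch below is one tolerant way to do this, assuming the reply contains quoted category names followed by a colon and a number. The model's output format is not guaranteed, so `extract_percentages` is a hypothetical helper that may return an empty dictionary, and the sample reply is made up.

```python
import re

def extract_percentages(text: str) -> dict:
    """Collect 'Category': number pairs from free-form model output."""
    # Matches a quoted name, a colon, then an integer or decimal number
    pairs = re.findall(r"""['"]([^'"]+)['"]\s*:\s*(\d+(?:\.\d+)?)""", text)
    return {name: float(value) for name, value in pairs}

sample_reply = (  # made-up reply for illustration
    "Political\n- Example story\n\n"
    "{'Political': 40, 'Economic': 20, 'Technology': 40}"
)
print(extract_percentages(sample_reply))
# {'Political': 40.0, 'Economic': 20.0, 'Technology': 40.0}
```

Matching individual pairs instead of parsing the whole dictionary keeps the helper working whether the model wraps the pairs in braces or parentheses.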

In [None]:
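# Print up to the first 10 titles; slicing avoids an IndexError when fewer articles are returned
for article in news_data['articles'][:10]:
  print(article['title'])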