# Using LLMs to classify whether social media posts are political (Part II)

I'm working on a project that involves gathering social media posts from [Bluesky](https://bsky.app/) and analyzing them. Part of that project requires knowing which posts are about political or social topics and, if so, which political side they support. Off-the-shelf ML classifiers don't work that well for this, so I'm building my own classification scheme using LLMs, and I've found that they work quite well for the task. I've used Llama3-8b and Llama3-70b via [Groq](https://groq.com/) so far, but I'm also open to experimenting with other open-source models (I have the on-prem infrastructure to host my own models, which is much cheaper at scale).

Previously, I tried naive text classification and then added context to the classifications. Now that I've shown that individual prompts work, I've also worked on batching and some (simple) scaling for the prompts. I'd like to improve the approach even more.

Specifically, there are a few new experiments that I'd like to try:
- Can we improve the context? Can we add context about current events?
- How does our model perform with other LLMs (e.g., Mixtral)?
- Can we experiment with optimizing the prompt (e.g., with [dspy](https://github.com/stanfordnlp/dspy))?

In this notebook, I'll go over the first point:
- Can we improve the context? Can we add context about current events?

### Adding context about current events

LLMs have a hard knowledge cutoff date. Although they generally encode world knowledge, they won't have specific knowledge of what's currently in the news. If we want to classify political content, we have to provide the LLM with knowledge of how certain topics are being covered; otherwise it won't be able to make informed classifications.

#### Example post that could use current events knowledge

Here is an example post whose classification would change if the LLM knew what was happening in the news:

![Example Bluesky post](assets/images/sample_post_3.png "Example Bluesky post")

If we use the following prompt, which has the original text of the post plus some additional context:

```plaintext
Pretend that you are a classifier that predicts whether a post has sociopolitical content or not. Sociopolitical refers to whether a given post is related to politics (government, elections,
politicians, activism, etc.) or social issues (major issues that affect a large group of people, such as the economy, inequality, racism, education, immigration, human rights, the environment, etc.).
We refer to any content that is classified as being either of these two categories as "sociopolitical"; otherwise they are not sociopolitical. Please classify the following text as "sociopolitical" or
"not sociopolitical".

Then, if the post is sociopolitical, classify the text based on the political lean of the opinion or argument it presents. Your options are "democrat", "republican", or 'unclear'. You are analyzing
text that has been pre-identified as 'political' in nature. If the text is not sociopolitical, return "unclear".

Think through your response step by step.

Return in a JSON format in the following way:
{
    "sociopolitical": <two values, 'sociopolitical' or 'not sociopolitical'>,
    "political_ideology": <three values, 'democrat', 'republican', 'unclear'. If the post is not sociopolitical, return an empty string, "">,
    "reason_sociopolitical": <optional, a 1 sentence reason for why the text is sociopolitical or not.>,
    "reason_political_ideology": <optional, a 1 sentence reason for why the text has the given political ideology or is unclear. If the post is not sociopolitical, return an empty string, "">
}

All of the fields in the JSON must be present for the response to be valid, and the answer must be returned in JSON format.


Here is the post text that needs to be classified:
<text>
Faculty walkout at Columbia. I think it's safe to say President Shafiq & other leaders have lost their confidence.


The following JSON object contains the post and its context:

{'context': {'content_referenced_in_post': {'embedded_content_type': None,
                                            'embedded_record_with_media_context': {'description': 'Massive faculty walkout at @Columbia opposing the university’s decision to call in NYPD on '
                                                                                                  'Palestine solidarity protests: https://t.co/DcCSxObtx9',
                                                                                   'title': 'Bassam Khawaja on X: "Massive faculty walkout at @Columbia opposing the university’s decision to call in '
                                                                                            'NYPD on Palestine solidarity protests: https://t.co/DcCSxObtx9" / X'},
                                            'has_embedded_content': True},
             'post_author_context': {'post_author_is_reputable_news_org': False},
             'post_tags_labels': {'post_labels': '', 'post_tags': ''},
             'post_thread': {'thread_parent_post': {'embedded_image_alt_text': None, 'text': None}, 'thread_root_post': {'embedded_image_alt_text': None, 'text': None}},
             'urls_in_post': {'embed_url_context': {'is_trustworthy_news_article': False, 'url': 'https://twitter.com/Bassam_Khawaja/status/1782471549007606164'},
                              'url_in_text_context': {'has_trustworthy_news_links': False}}},
 'text': "Faculty walkout at Columbia. I think it's safe to say President Shafiq & other leaders have lost their confidence."}


Justifications are not necessary.
```

We get the following label:

```plaintext
{
    "sociopolitical": "sociopolitical",
    "political_ideology": "unclear",
    "reason_sociopolitical": "The post discusses a faculty walkout at Columbia University, which is a social issue.",
    "reason_political_ideology": ""
}
```

Our context is not helpful because the LLM doesn't have knowledge of the Columbia protests.

If we change the prompt to something like this:

```plaintext
<same prompt as before>

In the news right now, this is what each political side has said about the Columbia protests. Use this as context for classifying political ideology:
- Democrat: The students and faculty have lost confidence in Columbia for calling the NYPD on the Palestine solidarity protests. People have lost confidence in the leaders of Columbia.
- Republican: The students are occupying these schools and are antisemitic. The administration was in the right to protect the campus by using the NYPD.
```

We get the following result:

```plaintext
{
    "sociopolitical": "sociopolitical",
    "political_ideology": "democrat",
    "reason_sociopolitical": "The text discusses a faculty walkout at Columbia, relating to decisions made by university leadership, which is a matter of public concern and activism.",
    "reason_political_ideology": "The text implies a loss of confidence in leadership due to decisions affecting social activism, aligning with the Democratic perspective supporting the protests."
}
```

This is what context about current events gives us: the key information the model needs to classify posts that talk about current events.
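To make the prompt change above concrete, here is a minimal sketch of appending per-side news context to an existing prompt. The function name `add_current_events_context` and the shape of the `stances` dictionary are my own assumptions for illustration, not part of the existing pipeline.

```python
def add_current_events_context(
    base_prompt: str, topic: str, stances: dict[str, str]
) -> str:
    """Append a per-side summary of a news topic to a classification prompt.

    `stances` maps a political side (e.g. "democrat") to a short summary of
    what that side has said about `topic`. Hypothetical helper.
    """
    if not stances:
        # Nothing to add, so leave the prompt untouched.
        return base_prompt
    lines = [
        f"In the news right now, this is what each political side has said "
        f"about {topic}. Use this as context for classifying political ideology:"
    ]
    for side, summary in stances.items():
        lines.append(f"- {side.capitalize()}: {summary}")
    return base_prompt.rstrip() + "\n\n" + "\n".join(lines)


# Example with the Columbia post from above (summaries shortened).
prompt = add_current_events_context(
    "<same prompt as before>",
    "the Columbia protests",
    {
        "democrat": "Students and faculty have lost confidence in Columbia's leaders.",
        "republican": "The administration was right to call in the NYPD.",
    },
)
print(prompt)
```

Keeping the summaries in a plain dictionary means the same function works no matter where the summaries come from (hand-written, as here, or generated from news headlines).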

**Caveat**: We won't be able to catch every possible nuance or every single trend or piece of slang. The references have to be pretty direct and clear for us to easily detect them. Still, it is worth a shot to implement at least a simplified means of getting knowledge about current events to the LLM and having it use that as context for its decisions.
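Separately, since the prompt requires every JSON field to be present, it can help to validate the model's reply before using it. Below is a rough sketch of such a check. The function name `parse_classification` and the decision to raise `ValueError` are assumptions on my part; the field names and allowed values come from the prompt above.

```python
import json

REQUIRED_FIELDS = (
    "sociopolitical",
    "political_ideology",
    "reason_sociopolitical",
    "reason_political_ideology",
)
VALID_SOCIOPOLITICAL = {"sociopolitical", "not sociopolitical"}
# The prompt allows an empty string when the post is not sociopolitical.
VALID_IDEOLOGY = {"democrat", "republican", "unclear", ""}


def parse_classification(raw_response: str) -> dict:
    """Parse the LLM's JSON reply; raise ValueError if it breaks the contract."""
    try:
        result = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc
    if not isinstance(result, dict):
        raise ValueError("Response must be a JSON object.")
    missing = [field for field in REQUIRED_FIELDS if field not in result]
    if missing:
        raise ValueError(f"Response is missing fields: {missing}")
    if result["sociopolitical"] not in VALID_SOCIOPOLITICAL:
        raise ValueError(f"Unexpected label: {result['sociopolitical']!r}")
    if result["political_ideology"] not in VALID_IDEOLOGY:
        raise ValueError(f"Unexpected ideology: {result['political_ideology']!r}")
    return result


# The first response from the example above.
raw = """{
    "sociopolitical": "sociopolitical",
    "political_ideology": "unclear",
    "reason_sociopolitical": "The post discusses a faculty walkout at Columbia University, which is a social issue.",
    "reason_political_ideology": ""
}"""
label = parse_classification(raw)
print(label["political_ideology"])
```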

### Setting up access to the news

We can use the [NewsAPI](https://newsapi.org/) service to grab the latest news. This is a free API that gives us access to recent headlines, as covered by a variety of news outlets.

We can set up access to the NewsAPI service through their [Python client](https://github.com/mattlisiv/newsapi-python):

```python
from newsapi import NewsApiClient

newsapi_client = NewsApiClient(api_key="<NEWSAPI_API_KEY>")
```

We can then get the list of US news domains that are available:

```python
# https://newsapi.org/docs/endpoints/sources
# `get_sources` returns a dict; the list of outlets lives under "sources".
sources: dict = newsapi_client.get_sources(country="us")
urls: list[str] = [source["url"] for source in sources["sources"]]

def parse_domain_from_url(url: str) -> str:
    """Given a URL, strip the scheme and any leading "www.".

    Example:
    >>> parse_domain_from_url("https://www.nytimes.com")
    'nytimes.com'
    >>> parse_domain_from_url("https://www.foxnews.com")
    'foxnews.com'
    >>> parse_domain_from_url("http://www.foxnews.com")
    'foxnews.com'
    >>> parse_domain_from_url("www.foxnews.com")
    'foxnews.com'
    """
    link = url
    # Strip the scheme only when it is a prefix, not anywhere in the string.
    for scheme in ("https://", "http://"):
        if link.startswith(scheme):
            link = link[len(scheme):]
            break
    if link.startswith("www."):
        link = link[len("www."):]
    return link


def parse_url(url: str) -> str:
    """Given the URL, parse it. Grab domain plus do postprocessing.

    Get the domain, plus remove anything like trailing subpages.
    """
    parsed_url = parse_domain_from_url(url)
    parsed_url = parsed_url.split("/")[0]
    return parsed_url


news_domains = [parse_url(url) for url in urls]
```

By doing this, we get the following list of domains:

```plaintext
['abcnews.go.com', 'aljazeera.com', 'arstechnica.com', 'apnews.com', 'axios.com', 'bleacherreport.com', 'bloomberg.com', 'breitbart.com', 'businessinsider.com', 'buzzfeed.com', 'cbsnews.com', 'us.cnn.com', 'cnnespanol.cnn.com', 'ccn.com', 'engadget.com','ew.com', 'espn.com',
 'espncricinfo.com', 'fortune.com', 'foxnews.com','foxsports.com', 'news.google.com', 'news.ycombinator.com', 'ign.com', 'mashable.com', 'medicalnewstoday.com', 'msnbc.com', 'mtv.com', 'news.nationalgeographic.com', 'nationalreview.com', 'nbcnews.com', 'newscientist.com', 'newsweek.com', 'nymag.com', 'nextbigfuture.com', 'nfl.com', 'nhl.com', 'politico.com', 'polygon.com', 'recode.net', 'reddit.com', 'reuters.com', 'techcrunch.com', 'techradar.com', 'theamericanconservative.com', 'thehill.com', 'huffingtonpost.com', 'thenextweb.com', 'theverge.com',
 'wsj.com', 'washingtonpost.com', 'washingtontimes.com', 'time.com', 'usatoday.com', 'news.vice.com','wired.com']
```

We definitely won't need all of these, but we likely want a set that's representative of news coverage across the political spectrum. Luckily, we can use [AllSides](https://allsides.com/) for that information. AllSides collects news coverage from across the political spectrum and presents how different outlets cover the same topic. They provide a [Media Bias Chart](https://www.allsides.com/media-bias/media-bias-chart) that shows the relative media bias of different news outlets. They determine these bias ratings through a combination of crowdsourced polls of their readers and expert opinions.

Conveniently, AllSides also provides specific information on their bias scoring for individual news orgs, such as [Bloomberg](https://www.allsides.com/news-source/bloomberg-media-bias).

![AllSides Media Bias Chart](assets/images/allsides_media_bias_chart.png "AllSides Media Bias Chart")

Using this chart and cross-referencing the AllSides website, we can create a mapping of what political leaning each news outlet has:

```python
political_party_to_news_outlet_domains_map = {
    "democrat": [
        "abcnews.go.com", "msnbc.com", "apnews.com", "axios.com",
        "bloomberg.com", "cbsnews.com", "us.cnn.com", "politico.com",
        "washingtonpost.com", "time.com", "usatoday.com", "news.vice.com",
        "aljazeera.com", "businessinsider.com", "news.google.com"
    ],
    "moderate": [
        "wsj.com", "newsweek.com", "reuters.com", "fortune.com", "thehill.com"
    ],
    "conservative": [
        "theamericanconservative.com", "nationalreview.com",
        "washingtontimes.com", "foxnews.com", "breitbart.com"
    ]
}
```
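When we later see a URL in a post or an article, we'll usually want to go the other way: from a domain to its political lean. Here is a small sketch of inverting the mapping. The helper name `build_domain_to_party_map` is hypothetical, and the sample mapping is a shortened copy of the one above.

```python
def build_domain_to_party_map(party_to_domains: dict[str, list[str]]) -> dict[str, str]:
    """Invert a party -> domains mapping into a domain -> party lookup."""
    domain_to_party: dict[str, str] = {}
    for party, domains in party_to_domains.items():
        for domain in domains:
            if domain in domain_to_party:
                # A domain listed under two parties is almost certainly a typo.
                raise ValueError(f"{domain} is listed under more than one party")
            domain_to_party[domain] = party
    return domain_to_party


# Shortened sample of the mapping above.
sample_map = {
    "democrat": ["abcnews.go.com", "msnbc.com"],
    "moderate": ["reuters.com"],
    "conservative": ["foxnews.com"],
}
domain_to_party = build_domain_to_party_map(sample_map)
print(domain_to_party["reuters.com"])
```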

### Loading news from each source

For a given source, we can use the Python client's [`get_top_headlines`](https://github.com/mattlisiv/newsapi-python/blob/master/newsapi/newsapi_client.py#L34) method to get all the top headlines.

To do this, we need the IDs of the different news outlets, not just the domains, so we need to look those up:

```python
political_party_to_news_outlets = {
    'conservative': [
        {'domain': 'theamericanconservative.com', 'id': 'the-american-conservative'},
        {'domain': 'nationalreview.com', 'id': 'national-review'},
        {'domain': 'washingtontimes.com', 'id': 'the-washington-times'},
        {'domain': 'foxnews.com', 'id': 'fox-news'},
        {'domain': 'breitbart.com', 'id': 'breitbart-news'}
    ],
    'democrat': [
        {'domain': 'abcnews.go.com', 'id': 'abc-news'},
        {'domain': 'msnbc.com', 'id': 'msnbc'},
        {'domain': 'apnews.com', 'id': 'associated-press'},
        {'domain': 'axios.com', 'id': 'axios'},
        {'domain': 'bloomberg.com', 'id': 'bloomberg'},
        {'domain': 'cbsnews.com', 'id': 'cbs-news'},
        {'domain': 'us.cnn.com', 'id': 'cnn'},
        {'domain': 'politico.com', 'id': 'politico'},
        {'domain': 'washingtonpost.com', 'id': 'the-washington-post'},
        {'domain': 'time.com', 'id': 'time'},
        {'domain': 'usatoday.com', 'id': 'usa-today'},
        {'domain': 'news.vice.com', 'id': 'vice-news'},
        {'domain': 'aljazeera.com', 'id': 'al-jazeera-english'},
        {'domain': 'businessinsider.com', 'id': 'business-insider'},
        {'domain': 'news.google.com', 'id': 'google-news'}
    ],
    'moderate': [
        {'domain': 'wsj.com', 'id': 'the-wall-street-journal'},
        {'domain': 'newsweek.com', 'id': 'newsweek'},
        {'domain': 'reuters.com', 'id': 'reuters'},
        {'domain': 'fortune.com', 'id': 'fortune'},
        {'domain': 'thehill.com', 'id': 'the-hill'}
    ]
}
```
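One way to build a mapping like this is to join the domain lists against the sources returned by NewsAPI, since each source carries both an `id` and a `url`. The sketch below assumes that shape; `normalize_domain` and `attach_outlet_ids` are hypothetical helper names, and the sample sources are hand-written rather than a real API response.

```python
def normalize_domain(url: str) -> str:
    """Strip the scheme, a leading "www.", and any path from a URL."""
    domain = url.split("://", 1)[-1]
    domain = domain.removeprefix("www.")
    return domain.split("/", 1)[0]


def attach_outlet_ids(
    sources: list[dict], party_to_domains: dict[str, list[str]]
) -> dict[str, list[dict]]:
    """Pair each domain with the source ID that has the same domain."""
    domain_to_id = {normalize_domain(s["url"]): s["id"] for s in sources}
    return {
        party: [
            {"domain": domain, "id": domain_to_id[domain]}
            for domain in domains
            if domain in domain_to_id  # skip domains with no matching source
        ]
        for party, domains in party_to_domains.items()
    }


# Hand-written sample in the shape of the "sources" list.
sample_sources = [
    {"id": "abc-news", "url": "https://abcnews.go.com"},
    {"id": "fox-news", "url": "http://www.foxnews.com"},
    {"id": "espn", "url": "https://www.espn.com"},
]
sample_domains = {"democrat": ["abcnews.go.com"], "conservative": ["foxnews.com"]}
outlets = attach_outlet_ids(sample_sources, sample_domains)
print(outlets)
```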

Let's see what we get if we get the top headlines for one of these news outlets.

```python
headlines = newsapi_client.get_top_headlines(sources="abc-news")
headlines["articles"][0:3]
```

```plaintext
[
    {
        'source': {'id': 'abc-news', 'name': 'ABC News'},
        'author': 'TOM MURPHY AP health writer',
        'title': 'CVS Health chops 2024 forecast as cost struggles with Medicare Advantage persist',
        'description': 'CVS Health missed first-quarter expectations and chopped its 2024 outlook more than a dollar below Wall Street’s forecast',
        'url': 'https://abcnews.go.com/Business/wireStory/cvs-health-chops-2024-forecast-cost-struggles-medicare-109821867',
        'urlToImage': 'https://i.abcnewsfe.com/a/621b804d-d800-4c41-8e67-e0b453912b53/wirestory_1dcdd232c1648f86a2cfc87dc09ab72f_16x9.jpg?w=1600',
        'publishedAt': '2024-05-01T12:15:40Z',
        'content': 'CVS Health missed first-quarter expectations and chopped its 2024 outlook more than a dollar below Wall Streets forecast.\r\nShares of the health care giant plunged Wednesday morning after the company … [+1291 chars]'
    },
    {
        'source': {'id': 'abc-news', 'name': 'ABC News'},
        'author': 'The Associated Press',
        'title': '17-year-old boy charged with attempted murder after assaulting 3 at school in England',
        'description': 'Police say a 17-year-old boy has been arrested on suspicion of attempted murder after three people were assaulted with a sharp object at a secondary school in Sheffield in northern England',
        'url': 'https://abcnews.go.com/International/wireStory/17-year-boy-charged-attempted-murder-after-assaulting-109818807',
        'urlToImage': 'https://i.abcnewsfe.com/a/d920dad6-cb32-4731-83a5-99fc4bd2f43f/wirestory_79206eaebef0d6b379f8053ecc062584_16x9.jpg?w=1600',
        'publishedAt': '2024-05-01T10:21:40Z',
        'content': 'LONDON -- A 17-year-old boy has been arrested on suspicion of attempted murder after three people were assaulted with a sharp object at a secondary school in northern England, South Yorkshire Police … [+303 chars]'
    },
    {
        'source': {'id': 'abc-news', 'name': 'ABC News'},
        'author': 'Aaron Katersky, Meredith Deliso',
        'title': 'Harvey Weinstein scheduled to appear in court after sex crimes conviction overturned',
        'description': 'His attorney said they are prepared to go to trial again, "if it comes to that."',
        'url': 'https://abcnews.go.com/US/harvey-weinstein-manhattan-court-appearance/story?id=109795381',
        'urlToImage': 'https://i.abcnewsfe.com/a/7084e61b-6abb-4f8e-899f-0c7364e53cdb/harvey-weinstein-gty-jef-240430_1714498214340_hpMain_16x9.jpg?w=1600',
        'publishedAt': '2024-05-01T09:15:22Z',
        'content': "Harvey Weinstein is scheduled to appear in court in Manhattan on Wednesday for the first time since New York's highest court overturned his sex crimes conviction.\r\nHe is scheduled to appear in Manhat… [+3133 chars]"
    }
]
```

```python
headlines["articles"][1]["content"]
```

```plaintext
'LONDON -- A 17-year-old boy has been arrested on suspicion of attempted murder after three people were assaulted with a sharp object at a secondary school in northern England, South Yorkshire Police … [+303 chars]'
```



### Getting the latest news across outlets

Now we can access the news articles from a given news provider. The limitation of this approach is that we can't get the full text of a news article; all we get is the headline and a preview of the content. Still, as people, we can generally tell what stance a news outlet has towards a topic based on what it chooses to focus on, and we can provide that as context to our model.

```python
def get_latest_top_headlines_for_source(source_id: str) -> list[dict]:
    """Get the latest top headlines from a news source."""
    articles = newsapi_client.get_top_headlines(sources=source_id)["articles"]
    return [
        {
            "title": article["title"],
            "description": article["description"],
            "url": article["url"],
            "content": article["content"],
        }
        for article in articles
    ]


def get_latest_top_headlines_for_political_party(
    political_party: str
) -> list[dict]:
    """Get the latest top headlines from every outlet mapped to a political party."""
    news_outlets = political_party_to_news_outlets[political_party]
    return [
        {
            "source": news_outlet["id"],
            "articles": get_latest_top_headlines_for_source(news_outlet["id"]),
        }
        for news_outlet in news_outlets
    ]


def get_latest_top_headlines() -> dict:
    """Get the latest top headlines, grouped by political party."""
    return {
        "conservative": get_latest_top_headlines_for_political_party("conservative"), # noqa
        "moderate": get_latest_top_headlines_for_political_party("moderate"),
        "democrat": get_latest_top_headlines_for_political_party("democrat"),
    }

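from pprint import pprint

latest_top_headlines = get_latest_top_headlines()
conservative_headlines = latest_top_headlines['conservative']
# Each entry is a dict ({"source": ..., "articles": [...]}), so it can't be sliced.
pprint(conservative_headlines[1])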
```

```plaintext
{
    'articles': [
        {
            'content': None,
            'description': 'Performers allegedly wore sexually suggestive '
                              'clothing and prosthetic female breasts and '
                              'genitalia.',
            'title': 'DeSantis Admin Revokes Liquor License of Orlando '
                    'Venue That Hosted Sexual Drag Show for Children',
            'url': 'https://www.nationalreview.com/news/desantis-admin-revokes-liquor-license-of-orlando-venue-that-hosted-sexual-drag-show-for-children/'
        },
        {
            'content': None,
            'description': 'Isabel Vaughan-Spruce has nevertheless decided to pursue a verdict in court to clear her name.',
            'title': 'Charges Dropped against British Woman Arrested for Praying outside Abortion Clinic',
            'url': 'https://www.nationalreview.com/news/charges-dropped-against-british-woman-arrested-for-praying-outside-abortion-clinic/'
        },
        {
            'content': None,
            'description': 'The former representatives who wrote the HEROES Act told SCOTUS it’s being ‘misused and distorted’ to advance the administration’s student-loan plan.',
            'title': 'Authors of the Law Biden Used to Wipe Out Student Debt Declare He Doesn’t Have the Authority',
            'url': 'https://www.nationalreview.com/2023/02/authors-of-the-law-biden-used-to-wipe-out-student-debt-declare-he-doesnt-have-the-authority/'
        },
        {
            'content': None,
            'description': 'Universities have been conducting an unprecedented experiment in historical amnesia on American students.',
            'title': 'Why We Need Western Civ',
            'url': 'https://www.nationalreview.com/2023/02/why-we-need-western-civ/'
        },
        {
            'content': None,
            'description': 'Chinese officials confirmed the balloon is theirs, but claimed it’s a civilian airship used for research that was blown off course.',
            'title': 'Pentagon Press Secretary Disputes China’s Claim That Balloon Is Civilian Airship',
            'url': 'https://www.nationalreview.com/news/mccarthy-demands-gang-of-eight-briefing-on-chinese-spy-balloon/'
        },
        {
            'content': None,
            'description': 'The contradictions of the Colorado Court of Appeals opinion premises regarding Jack Phillips are very clear.',
            'title': 'Speech for Me, Not for Thee',
            'url': 'https://www.nationalreview.com/corner/speech-for-me-not-for-thee/'
        },
        {
            'content': None,
            'description': 'Joe Biden remains Democrats’ best choice for 2024, and that’s not really a good thing.',
            'title': 'Democrats Are Stuck with Biden',
            'url': 'https://www.nationalreview.com/2023/02/democrats-are-stuck-with-biden/'
        },
        {
            'content': None,
            'description': 'Eunice Dwumfour reportedly had just stopped by her house and was about to leave again in her car when the assailant confronted her and shot her.',
            'title': 'GOP New Jersey Councilwoman Shot to Death outside Her Home',
            'url': 'https://www.nationalreview.com/news/gop-new-jersey-councilwoman-shot-to-death-outside-her-home/'
        },
        {
            'content': None,
            'description': '‘I didn’t feel right being in either category. . . . The only thing that felt right to me would be to abstain from nomination consideration’',
            'title': 'Nonbinary Broadway Star Rejects Tony Award Consideration over Gendered Categories',
            'url': 'https://www.nationalreview.com/news/nonbinary-broadway-star-rejects-tony-award-consideration-over-gendered-categories/'
        },
        {
            'content': None,
            'description': 'The AG offices of Missouri, Texas, and Arizona have led the fight against Biden overreach, but insiders tell NR they’re not in a position to keep it up.',
            'title': 'Legal Resistance to Biden Administration in Doubt as Powerhouse AG Offices Stumble',
            'url': 'https://www.nationalreview.com/news/legal-resistance-to-biden-administration-in-doubt-as-powerhouse-ag-offices-stumble/?utm_source=recirc-%5BSCREENSIZE%5D&#038;utm_medium=homepage&#038;utm_campaign=hero&#038;utm_content=related&#038;utm_term=first'
        }
    ],
    'source': 'national-review'
}
```
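To hand these headlines to the model, we need to flatten them into text. Here is one possible sketch that turns the nested structure above into a short block per political lean. The function name `format_headlines_as_context`, the wording of the section header, and the `max_per_outlet` cutoff are all assumptions for illustration.

```python
def format_headlines_as_context(
    headlines_by_party: dict[str, list[dict]], max_per_outlet: int = 3
) -> str:
    """Render headlines as plain text, one section per political lean."""
    sections = []
    for party, outlets in headlines_by_party.items():
        lines = [f"{party.capitalize()} outlets are currently covering:"]
        for outlet in outlets:
            for article in outlet["articles"][:max_per_outlet]:
                title = article.get("title")
                if not title:
                    continue
                line = f"- [{outlet['source']}] {title}"
                # The description is often the only body text we get.
                if article.get("description"):
                    line += f": {article['description']}"
                lines.append(line)
        if len(lines) > 1:  # skip parties with no usable headlines
            sections.append("\n".join(lines))
    return "\n\n".join(sections)


sample_headlines = {
    "conservative": [
        {
            "source": "national-review",
            "articles": [
                {"title": "Why We Need Western Civ", "description": "D1", "url": "u1", "content": None},
                {"title": "Democrats Are Stuck with Biden", "description": None, "url": "u2", "content": None},
            ],
        }
    ],
    "moderate": [],
}
context_text = format_headlines_as_context(sample_headlines)
print(context_text)
```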

### Store articles in a database

Let's first store our articles in a database. I'm using SQLite for my database and [peewee](https://pypi.org/project/peewee/) as a simple ORM.

```python
import os
import sqlite3

import peewee

current_file_directory = os.path.dirname(os.path.abspath(__file__))
SQLITE_DB_NAME = "news.db"
SQLITE_DB_PATH = os.path.join(current_file_directory, SQLITE_DB_NAME)

db = peewee.SqliteDatabase(SQLITE_DB_PATH)
db_version = 2

conn = sqlite3.connect(SQLITE_DB_PATH)
cursor = conn.cursor()


class BaseModel(peewee.Model):
    class Meta:
        database = db


class NewsOutlet(BaseModel):
    """News outlet model."""
    outlet_id = peewee.CharField(primary_key=True)
    domain = peewee.CharField()
    political_party = peewee.CharField()


class NewsArticle(BaseModel):
    """News article model."""
    url = peewee.CharField(primary_key=True)
    title = peewee.CharField()
    # NewsAPI often returns None for these two fields, so allow NULLs.
    content = peewee.TextField(null=True)
    description = peewee.TextField(null=True)
    publishedAt = peewee.CharField()
    news_outlet_source_id = peewee.ForeignKeyField(
        NewsOutlet, field='outlet_id', backref='articles'
    )


if db.is_closed():
    db.connect()
    db.create_tables([NewsOutlet, NewsArticle])


def create_initial_tables() -> None:
    with db.atomic():
        db.create_tables([NewsOutlet, NewsArticle])


def insert_news_outlet(news_outlet: dict) -> None:
    with db.atomic():
        NewsOutlet.create(**news_outlet)


def insert_news_article(news_article: dict) -> None:
    with db.atomic():
        NewsArticle.create(**news_article)
```

Now we can store all of our news outlets and news articles into the database.
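For the outlets, this mostly means reshaping the party-to-outlets mapping into one row per outlet, with keys that match the `NewsOutlet` fields. A minimal sketch, where `outlets_to_rows` is a hypothetical helper and the sample mapping is a shortened copy of the one above:

```python
def outlets_to_rows(party_to_outlets: dict[str, list[dict]]) -> list[dict]:
    """Flatten the party -> outlets mapping into NewsOutlet-shaped rows."""
    return [
        {
            "outlet_id": outlet["id"],
            "domain": outlet["domain"],
            "political_party": party,
        }
        for party, outlets in party_to_outlets.items()
        for outlet in outlets
    ]


# Shortened sample of political_party_to_news_outlets.
sample_outlets = {
    "conservative": [{"domain": "foxnews.com", "id": "fox-news"}],
    "moderate": [{"domain": "reuters.com", "id": "reuters"}],
}
outlet_rows = outlets_to_rows(sample_outlets)
print(outlet_rows)
```

Each row could then be passed to `insert_news_outlet`. Outlets should be inserted before articles, since `NewsArticle` holds a foreign key to `NewsOutlet`.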



### Creating embeddings for outlets

We can create embeddings for each of the news articles, which gives us vector representations that we can later use for similarity matching. To figure out what context to provide for a post, we need a way to find the "most relevant" articles.

The steps that I'll follow are:
- Create an embedding scheme.
- Create separate embedding databases for each political lean (democrat, moderate, conservative).
- Embed the news articles for the given political party.
- After that, create a cache that, for a given topic, is prepopulated with the top articles per political party.
    - We can use a clustering model (or something similar) to get the top 5-10 topics per day.
- At inference time, we can start with a keyword match for each topic.
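To illustrate the "most relevant articles" step, here is a toy sketch of similarity ranking. It uses a bag-of-words count vector as a stand-in for a real embedding model, which is purely an assumption so the example runs without extra dependencies; the ranking logic (cosine similarity, then top-k) would stay the same with real embeddings. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_articles(post_text: str, articles: list[dict], k: int = 3) -> list[dict]:
    """Return up to k articles most similar to the post, best first."""
    post_vector = embed(post_text)
    scored = []
    for article in articles:
        # Title plus description is all the text NewsAPI reliably gives us.
        text = f"{article['title']} {article.get('description') or ''}"
        score = cosine_similarity(post_vector, embed(text))
        if score > 0:  # drop articles with no overlap at all
            scored.append((score, article))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:k]]


sample_articles = [
    {"title": "CVS Health chops 2024 forecast", "description": "Outlook cut"},
    {"title": "Faculty walkout at Columbia over NYPD decision", "description": None},
]
matches = top_k_articles("Faculty walkout at Columbia.", sample_articles)
print([article["title"] for article in matches])
```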

#### Creating an embedding scheme

For 

```python
# To enable auto-reload: https://saturncloud.io/blog/jupyter-notebook-reload-module-a-comprehensive-guide/
# %load_ext autoreload
# %reload_ext autoreload
# %autoreload 2
import importlib

from services.add_context.current_events_enrichment import newsapi_context
importlib.reload(newsapi_context)
```

```python
latest_top_headlines = newsapi_context.get_latest_top_headlines()
latest_top_headlines.keys()
```

```plaintext
dict_keys(['conservative', 'moderate', 'democrat'])
```

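```python
def store_latest_top_headlines_into_db(latest_top_headlines: dict) -> None:
    """Store the latest top headlines into the database."""
    for party in latest_top_headlines.keys():
        for news_source_articles_dict in latest_top_headlines[party]:
            news_outlet_source_id = news_source_articles_dict["source"]
            articles = news_source_articles_dict["articles"]
            for article in articles:
                # Assumption: rows are written with `insert_news_article` from
                # the database section. `publishedAt` is only available if the
                # fetch step keeps it, so missing values fall back to "".
                article_obj = {
                    "url": article["url"],
                    "title": article["title"],
                    "content": article["content"] or "",
                    "description": article["description"] or "",
                    "publishedAt": article.get("publishedAt") or "",
                    "news_outlet_source_id": news_outlet_source_id,
                }
                insert_news_article(article_obj)
```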

```python
from pprint import pprint

conservative_headlines = latest_top_headlines['conservative']
# note: I'll have to filter if there are blanks
pprint(conservative_headlines[0])
```

```plaintext
{'articles': [{'content': None,
               'description': None,
               'title': 'Culture Archives - The American Conservative',
               'url': 'https://www.theamericanconservative.com/category/culture/'},
              {'content': None,
               'description': None,
               'title': 'Foreign Affairs Archives - The American Conservative',
               'url': 'https://www.theamericanconservative.com/category/foreign-affairs/'},
              {'content': None,
               'description': None,
               'title': 'Politics Archives - The American Conservative',
               'url': 'https://www.theamericanconservative.com/category/politics/'}],
 'source': 'the-american-conservative'}
```
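Entries like these archive pages carry no description or content, so they add nothing as context. A simple filter, sketched below, drops them. The rule used here (keep an article only if it has a title and at least one of description or content) and the helper names are my own assumptions about what "blank" should mean.

```python
def has_usable_text(article: dict) -> bool:
    """True if the article has a title and some body text to go with it."""
    return bool(article.get("title")) and bool(
        article.get("description") or article.get("content")
    )


def drop_blank_articles(outlet_headlines: list[dict]) -> list[dict]:
    """Remove blank articles, and outlets left with no articles at all."""
    cleaned = []
    for outlet in outlet_headlines:
        articles = [a for a in outlet["articles"] if has_usable_text(a)]
        if articles:
            cleaned.append({"source": outlet["source"], "articles": articles})
    return cleaned


sample = [
    {
        "source": "the-american-conservative",
        "articles": [
            {"title": "Culture Archives - The American Conservative",
             "description": None, "content": None, "url": "u1"},
        ],
    },
    {
        "source": "national-review",
        "articles": [
            {"title": "Why We Need Western Civ",
             "description": "Some description", "content": None, "url": "u2"},
        ],
    },
]
filtered = drop_blank_articles(sample)
print([outlet["source"] for outlet in filtered])
```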


```python
from services.add_context.current_events_enrichment.newsapi_context import newsapi_client

articles = newsapi_client.get_top_headlines(sources="abc-news")["articles"]
articles[0]
```

```plaintext
{'source': {'id': 'abc-news', 'name': 'ABC News'},
 'author': 'COLLIN BINKLEY AP education writer',
 'title': 'The botched FAFSA rollout leaves students in limbo. Some wonder if their college dreams will survive',
 'description': 'The bungled rollout of a new federal student aid form has left millions of students in limbo and some wondering if their college dreams will survive',
 'url': 'https://abcnews.go.com/US/wireStory/botched-fafsa-rollout-leaves-students-limbo-college-dreams-109822369',
 'urlToImage': 'https://i.abcnewsfe.com/a/d5b2bcb5-558b-4ce8-8fb2-ef8bfe14d6a5/wirestory_9637c87ed2496491609c4a517cec9be3_16x9.jpg?w=1600',
 'publishedAt': '2024-05-01T12:36:11Z',
 'content': "WASHINGTON -- The last thing standing between Ashnaelle Bijoux and her college dream is the FAFSA form a financial aid application that's supposed to help students go to college, but is blocking her … [+7483 chars]"}
```