# Development Notebook

Figuring out all these APIs in a notebook first.

## Imports & Setup

In [91]:
import datetime as dt
import json
import random
import tomllib

import pandas as pd
import requests
import yfinance as yf

In [3]:
with open(".secrets", "rb") as secrets_file:
    secrets = tomllib.load(secrets_file)
print(secrets.keys())

dict_keys(['ALPHA_VANTAGE_KEY', 'GUARDIAN_KEY', 'NASDAQ_KEY'])


## Grab Market Data

Thank you, Yahoo Finance, for continuing to offer a free API.

In [4]:
nasdaq = yf.Ticker("^IXIC")
nasdaq.info

{'maxAge': 86400,
 'priceHint': 2,
 'previousClose': 14127.282,
 'open': 14319.2,
 'dayLow': 14243.563,
 'dayHigh': 14360.199,
 'regularMarketPreviousClose': 14127.282,
 'regularMarketOpen': 14319.2,
 'regularMarketDayLow': 14243.563,
 'regularMarketDayHigh': 14360.199,
 'volume': 2389571000,
 'regularMarketVolume': 2389571000,
 'averageVolume': 4864780983,
 'averageVolume10days': 4767984000,
 'averageDailyVolume10Day': 4767984000,
 'bid': 0.0,
 'ask': 0.0,
 'bidSize': 0,
 'askSize': 0,
 'fiftyTwoWeekLow': 10088.83,
 'fiftyTwoWeekHigh': 14446.55,
 'fiftyDayAverage': 13452.713,
 'twoHundredDayAverage': 11882.2,
 'currency': 'USD',
 'exchange': 'NIM',
 'quoteType': 'INDEX',
 'symbol': '^IXIC',
 'underlyingSymbol': '^IXIC',
 'shortName': 'NASDAQ Composite',
 'longName': 'NASDAQ Composite',
 'firstTradeDateEpochUtc': 34612200,
 'timeZoneFullName': 'America/New_York',
 'timeZoneShortName': 'EDT',
 'uuid': '6b51a47d-53e9-30d4-8a47-289ac3188b0f',
 'messageBoardId': 'finmb_INDEXIXIC',
 'gmtOff

Okay, so this resolved nicely. Let's see what historical data I can fetch.

In [12]:
nasdaq.history(
    interval="1d", start=dt.date.today() - dt.timedelta(days=3), end=dt.date.today()
)

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2023-07-24 00:00:00-04:00,14081.629883,14110.150391,13997.129883,14058.870117,4083070000,0.0,0.0
2023-07-25 00:00:00-04:00,14093.240234,14201.910156,14092.519531,14144.55957,3812470000,0.0,0.0
2023-07-26 00:00:00-04:00,14123.519531,14187.349609,14041.950195,14127.280273,4322000000,0.0,0.0


Nice! That's _literally_ all I need.

Now to mix it up, there are a few tickers I want to track. Looks like I can download them all in a single call.

In [13]:
tickers: dict[str, str] = {
    "NASDAQ": "^IXIC",
    "Dow": "^DJI",
    "S&P 500": "^GSPC",
    "Nikkei": "^N225",
    "FTSE": "^FTSE",
    "Capital One Stock": "COF",
}

In [23]:
stonks = yf.download(
    " ".join(tickers.values()),
    start=dt.date.today() - dt.timedelta(days=3),
    end=dt.date.today(),
)
stonks

[*********************100%***********************]  6 of 6 completed


,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Close,Close,Close,Close,...,Open,Open,Open,Open,Volume,Volume,Volume,Volume,Volume,Volume
Date,COF,^DJI,^FTSE,^GSPC,^IXIC,^N225,COF,^DJI,^FTSE,^GSPC,...,^FTSE,^GSPC,^IXIC,^N225,COF,^DJI,^FTSE,^GSPC,^IXIC,^N225
2023-07-24,117.220001,35411.238281,7678.600098,4554.640137,14058.870117,32700.939453,117.220001,35411.238281,7678.600098,4554.640137,...,7663.700195,4543.390137,14081.629883,32648.140625,3173500,284460000,521826500,3856250000,4083070000,83500000
2023-07-25,114.510002,35438.070312,7691.799805,4567.459961,14144.55957,32682.509766,114.510002,35438.070312,7691.799805,4567.459961,...,7678.600098,4555.189941,14093.240234,32705.390625,2586100,299530000,444691500,3812470000,3812470000,101000000
2023-07-26,114.040001,35520.121094,7676.899902,4566.75,14127.280273,32668.339844,114.040001,35520.121094,7676.899902,4566.75,...,7691.799805,4558.959961,14123.519531,32704.960938,2666000,346240000,807357200,3990290000,4322000000,85200000


And really I want to boil this down to a single data point for each.

In [27]:
100 * (stonks.Close - stonks.Open) / stonks.Open

Date,COF,^DJI,^FTSE,^GSPC,^IXIC,^N225
2023-07-24,1.462826,0.512192,0.194422,0.247612,-0.161627,0.161721
2023-07-25,-2.41179,0.046813,0.171903,0.269364,0.364141,-0.069961
2023-07-26,-0.886491,0.492652,-0.193712,0.170873,0.026628,-0.111974


And honestly I want it even more basic than that.

In [28]:
(stonks.Close - stonks.Open) > 0

Date,COF,^DJI,^FTSE,^GSPC,^IXIC,^N225
2023-07-24,True,True,True,True,False,True
2023-07-25,False,True,True,True,True,False
2023-07-26,False,True,False,True,True,False


## News Stories

Yet another reason to love The Guardian.

In [38]:
result = requests.get(
    "https://content.guardianapis.com/sections",
    params={
        "api-key": secrets["GUARDIAN_KEY"],
        "from-date": dt.date.today() - dt.timedelta(days=3),
        "to-date": dt.date.today(),
    },
)
print(result)

<Response [200]>


In [41]:
[section["webTitle"] for section in json.loads(result.text)["response"]["results"]]

['About',
 'Animals farmed',
 'Art and design',
 'Australia news',
 'Better Business',
 'Books',
 'Business',
 'Business to business',
 'Cardiff',
 "Children's books",
 'Cities',
 'Opinion',
 'Community',
 'Crosswords',
 'Culture',
 'Culture Network',
 'Culture professionals network',
 'Edinburgh',
 'Education',
 'Guardian Enterprise Network',
 'Environment',
 'Extra',
 'Fashion',
 'Film',
 'Food',
 'Football',
 'Games',
 'Global development',
 'Global Development Professionals Network',
 'Guardian Government Computing',
 'Guardian Foundation',
 'Guardian Professional',
 'Healthcare Professionals Network',
 'Help',
 'Higher Education Network',
 'Housing Network',
 'Inequality',
 'Info',
 'Jobs',
 'Katine',
 'Law',
 'Leeds',
 'Life and style',
 'Local',
 'Local Leaders Network',
 'Media',
 'Media Network',
 'Membership',
 'Money',
 'Music',
 'News',
 'Politics',
 'Public Leaders Network',
 'Science',
 'Search',
 'Guardian Small Business Network',
 'Social Care Network',
 'Social Enterpr

So ideally we want to filter out finance-related topics and a few other sections that probably won't map well.

In [58]:
excluded_sections = [
    "About",
    "Better Business",
    "Business",
    "Business to business",
    "Opinion",
    "Community",
    "Crosswords",
    "Global development",
    "Help",
    "Inequality",
    "Info",
    "Jobs",
    "Membership",
    "Money",
    "News",
    "Politics",
    "Search",
    "From the Guardian",
    "From the Observer",
    "Guardian holiday offers",
    "World news",
]
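
One caveat: the names above are display titles (`webTitle`), while the Guardian `section` filter, as far as I can tell from the docs, matches section IDs such as `sport` or `commentisfree`. A minimal sketch of translating titles to IDs before building the filter, assuming each entry from the `/sections` endpoint also carries an `id` field, that a leading `-` negates, and that a comma means AND. `build_section_filter` and `sample_sections` are hypothetical names with illustrative data.

```python
def build_section_filter(sections: list[dict], excluded_titles: list[str]) -> str:
    """Build a negated section filter from section display titles.

    `sections` is assumed to look like the "results" list of the
    /sections response: dicts with at least "id" and "webTitle".
    """
    excluded = set(excluded_titles)
    ids = [s["id"] for s in sections if s["webTitle"] in excluded]
    return ",".join(f"-{section_id}" for section_id in ids)


# Illustrative stand-in for the real /sections response.
sample_sections = [
    {"id": "business", "webTitle": "Business"},
    {"id": "commentisfree", "webTitle": "Opinion"},
    {"id": "sport", "webTitle": "Sport"},
]
print(build_section_filter(sample_sections, ["Business", "Opinion"]))
```

Titles that do not appear in the sections list are silently dropped, so it may be worth comparing the lengths of the input and output when debugging the exclusion list.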

In [69]:
result = requests.get(
    "https://content.guardianapis.com/search",
    params={
        "api-key": secrets["GUARDIAN_KEY"],
        "from-date": dt.date.today() - dt.timedelta(days=3),
        "to-date": dt.date.today(),
        "page-size": 50,
        "section": ",".join([f"-{section}" for section in excluded_sections]),
    },
)
print(result)

<Response [200]>


In [70]:
json.loads(result.text)["response"]["results"][0]

{'id': 'sport/live/2023/jul/27/england-v-australia-ashes-fifth-test-day-one-live-scores-updates-results-aus-vs-eng-cricket-the-oval',
 'type': 'liveblog',
 'sectionId': 'sport',
 'sectionName': 'Sport',
 'webPublicationDate': '2023-07-27T17:45:43Z',
 'webTitle': 'England v Australia: Ashes fifth Test, day one – live reaction',
 'webUrl': 'https://www.theguardian.com/sport/live/2023/jul/27/england-v-australia-ashes-fifth-test-day-one-live-scores-updates-results-aus-vs-eng-cricket-the-oval',
 'apiUrl': 'https://content.guardianapis.com/sport/live/2023/jul/27/england-v-australia-ashes-fifth-test-day-one-live-scores-updates-results-aus-vs-eng-cricket-the-oval',
 'isHosted': False,
 'pillarId': 'pillar/sport',
 'pillarName': 'Sport'}

In [71]:
[article["webTitle"] for article in json.loads(result.text)["response"]["results"]]

['England v Australia: Ashes fifth Test, day one – live reaction',
 'Trump says lawyers were given no indication of looming indictment from DoJ – live',
 'Quashing of Andrew Malkinson’s rape conviction confirms failings of criminal review watchdog',
 'The Labour party is walking a fine line on trans rights | Letters',
 'Home Office is racist for blocking Siyabonga Twala from returning | Letter',
 'Disabled people are receiving degrading treatment in an overstretched NHS | Letter',
 'A short-lived guide to saving the Earth',
 'Sunak under pressure to block ex-Ukip deputy from potential Tory candidacy',
 'The Guardian view on the freeing of Andy Malkinson: a case for reform | Editorial',
 'The Guardian view on levelling up: widening regional pay gaps expose Conservative failure | Editorial',
 'Muslim leaders decry ‘double standard’ of Farage bank account closure furore',
 'First 50 people coming to Bibby Stockholm asylum barge despite safety worries',
 'Russia-Ukraine war live: Putin say

Okay, some cleanup to do.

In [73]:
def clean_headline(headline: str) -> str:
    """Get to the relevant "sound bite" of a headline"""
    return headline.split("–")[0].split(":")[-1].split(";")[0].split("|")[0].strip()


[
    clean_headline(article["webTitle"])
    for article in json.loads(result.text)["response"]["results"]
]

['Ashes fifth Test, day one',
 'Trump says lawyers were given no indication of looming indictment from DoJ',
 'Quashing of Andrew Malkinson’s rape conviction confirms failings of criminal review watchdog',
 'The Labour party is walking a fine line on trans rights',
 'Home Office is racist for blocking Siyabonga Twala from returning',
 'Disabled people are receiving degrading treatment in an overstretched NHS',
 'A short-lived guide to saving the Earth',
 'Sunak under pressure to block ex-Ukip deputy from potential Tory candidacy',
 'a case for reform',
 'widening regional pay gaps expose Conservative failure',
 'Muslim leaders decry ‘double standard’ of Farage bank account closure furore',
 'First 50 people coming to Bibby Stockholm asylum barge despite safety worries',
 'Putin says Ukrainian attacks have intensified',
 'Norwegian woman claims record time for climbing world’s 14 highest peaks',
 'Jordan Henderson no longer an LGBTQ+ ally after Saudi move, says Hitzlsperger',
 'Bereaved
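
To see what each split in `clean_headline` contributes, the chain can be unrolled one step at a time. This is only a sketch for inspection; `clean_headline_steps` is a hypothetical helper that applies the same splits in the same order and records each intermediate string.

```python
def clean_headline_steps(headline: str) -> list[str]:
    """Return the headline after each stage of the cleanup chain."""
    steps = [headline]
    steps.append(steps[-1].split("–")[0])   # drop anything after an en dash ("– live")
    steps.append(steps[-1].split(":")[-1])  # keep only the text after the last colon
    steps.append(steps[-1].split(";")[0])   # drop anything after a semicolon
    steps.append(steps[-1].split("|")[0])   # drop trailing tags such as "| Letters"
    steps.append(steps[-1].strip())         # trim leftover whitespace
    return steps


for step in clean_headline_steps(
    "England v Australia: Ashes fifth Test, day one – live reaction"
):
    print(repr(step))
```

The colon rule is the aggressive one: it discards the whole lead-in, which is how "The Guardian view on the freeing of Andy Malkinson: a case for reform | Editorial" ends up as just "a case for reform".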

I think I can work with this. Let's expand the sample size.

Now, for a given article, can I also get the lede?

In [87]:
json.loads(
    requests.get(
        json.loads(result.text)["response"]["results"][5]["apiUrl"],
        params={
            "api-key": secrets["GUARDIAN_KEY"],
            "show-fields": ["trailText", "headline", "body"],
        },
    ).text
)["response"]

{'status': 'ok',
 'userTier': 'developer',
 'total': 1,
 'content': {'id': 'society/2023/jul/27/disabled-people-are-receiving-degrading-treatment-in-an-overstretched-nhs',
  'type': 'article',
  'sectionId': 'society',
  'sectionName': 'Society',
  'webPublicationDate': '2023-07-27T17:34:59Z',
  'webTitle': 'Disabled people are receiving degrading treatment in an overstretched NHS | Letter',
  'webUrl': 'https://www.theguardian.com/society/2023/jul/27/disabled-people-are-receiving-degrading-treatment-in-an-overstretched-nhs',
  'apiUrl': 'https://content.guardianapis.com/society/2023/jul/27/disabled-people-are-receiving-degrading-treatment-in-an-overstretched-nhs',
  'fields': {'body': '<p>It’s deeply concerning to hear that frontline NHS staff are feeling “moral distress” from having too little time to spend with their patients (<a href="https://www.theguardian.com/society/2023/jul/24/most-nhs-staff-say-they-dont-have-enough-time-to-spend-with-patients" title="">Most NHS staff say the

In [88]:
_["content"]["fields"]

{'body': '<p>It’s deeply concerning to hear that frontline NHS staff are feeling “moral distress” from having too little time to spend with their patients (<a href="https://www.theguardian.com/society/2023/jul/24/most-nhs-staff-say-they-dont-have-enough-time-to-spend-with-patients" title="">Most NHS staff say they don’t have enough time to spend with patients, 24 July</a>). It creates a worrying picture for people with a learning disability, who often need extra time for appointments and already struggle to access basic healthcare.</p> <p>Our helpline regularly receives calls from families seriously worried about quality of care and feeling they need to be there to advocate for their loved one. We hear stories of people being left in incontinence pads rather than being supported to go to the toilet, or having medication and care plans changed without clear explanation. We know that people on specialised diets have been fed the wrong food, sometimes with tragic consequences that could h

In [89]:
_.keys()

dict_keys(['body'])

Feh. It'll do, though.
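
One possible explanation for only `body` coming back: a list value gets encoded as a repeated query parameter, and the API may keep just one of the repeats. The Guardian docs describe `show-fields` as a comma-separated list, so joining the names first seems worth trying. The sketch below only compares the two encodings with the standard library (`urlencode(..., doseq=True)` mirrors how a list value ends up repeated); it is untested against the live API, so treat the diagnosis as a guess.

```python
from urllib.parse import urlencode

fields = ["trailText", "headline", "body"]

# A list value becomes one key=value pair per item.
repeated = urlencode({"show-fields": fields}, doseq=True)

# Joining first yields a single comma-separated value (the comma is
# percent-encoded as %2C on the wire).
joined = urlencode({"show-fields": ",".join(fields)})

print(repeated)
print(joined)
```

In the notebook's request that would mean passing `",".join(["trailText", "headline", "body"])` as the `show-fields` value instead of the list.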

## Combine

In [90]:
up_words: list[str] = [
    "soars",
    "skyrockets",
    "catapults",
    "zooms",
    "jumps",
    "shoots up",
]

down_words: list[str] = [
    "plummets",
    "tanks",
    "in free fall",
    "dives",
    # "crashes",  # let's not go that far
    "plunges",
]

In [92]:
random.seed()

In [99]:
indicator, symbol = random.choice(list(tickers.items()))
is_up = (stonks.Close - stonks.Open).at[stonks.index[-1], symbol] > 0

word = random.choice(up_words if is_up else down_words)

article = random.choice(json.loads(result.text)["response"]["results"])
base_headline = clean_headline(article["webTitle"])

print(" ".join([indicator, word, "as", base_headline]))

Nikkei dives as The US at this World Cup are young, talented … and running out of time to peak
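
The combine step is small enough to wrap in a function, which also makes it testable without hitting either API. A minimal sketch: `make_headline` and the optional `rng` argument are hypothetical additions, and the word lists are copied from the cell above (with "crashes" still left out).

```python
import random

UP_WORDS = ["soars", "skyrockets", "catapults", "zooms", "jumps", "shoots up"]
DOWN_WORDS = ["plummets", "tanks", "in free fall", "dives", "plunges"]


def make_headline(indicator, is_up, base_headline, rng=None):
    """Join an indicator, a random direction word, and a cleaned headline."""
    rng = rng or random  # fall back to the module-level generator
    word = rng.choice(UP_WORDS if is_up else DOWN_WORDS)
    return f"{indicator} {word} as {base_headline}"


# Passing a seeded generator makes the word choice reproducible.
print(
    make_headline(
        "Nikkei", False, "A short-lived guide to saving the Earth", random.Random(0)
    )
)
```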


Omg. That's epic. Now gimme the lede.

In [103]:
print(
    json.loads(
        requests.get(
            article["apiUrl"],
            params={"api-key": secrets["GUARDIAN_KEY"], "show-fields": ["body"]},
        ).text
    )["response"]["content"]["fields"]["body"].split("</p>")[0][3:]
)

When the United States won the 2019 Women’s World Cup, they did so with a team certain of their identity, one that pressured opponents into submission early in each match. The Americans scored in the 12th minute or earlier in each of their first six games of that campaign. A defining characteristic of the USA in that era was their high press and counter-press after losing possession. It was suffocating and relentless – and it forced some of the best teams in the world to panic. 
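
The `split("</p>")[0][3:]` trick assumes the body starts with a bare `<p>` and leaves any inline markup (links, emphasis) in the lede. A slightly more forgiving sketch using the standard library's `html.parser`; `FirstParagraphParser` and `first_paragraph` are hypothetical names, and this is one possible approach rather than anything the Guardian API prescribes.

```python
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collect the text of the first <p> element, ignoring inline tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.in_first = False   # currently inside the first <p>
        self.finished = False   # first <p> has already been closed
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.finished:
            self.in_first = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_first:
            self.in_first = False
            self.finished = True

    def handle_data(self, data):
        if self.in_first:
            self.parts.append(data)


def first_paragraph(body_html: str) -> str:
    parser = FirstParagraphParser()
    parser.feed(body_html)
    parser.close()
    return "".join(parser.parts).strip()


print(first_paragraph('<p>Hello <a href="x">world</a>.</p> <p>Second</p>'))
```

This keeps the link text but drops the tags around it, and it also copes with a `<p>` that carries attributes.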


And finally a link to the content:

In [104]:
article["webUrl"]

'https://www.theguardian.com/football/2023/jul/27/uswnt-usa-womens-world-cup-soccer-netherlands'