# Homework 4: Data Acquisition and Ingestion

## Task 1: API Pull

Loading environment variables below.

I used the FMP (Financial Modeling Prep) API for the data below, with yfinance as a fallback, because Alpha Vantage's time series endpoint is not freely available.

In [7]:
import os
import sys
import json
import time
from datetime import datetime
from pathlib import Path

import pandas as pd
import requests
import yfinance
from bs4 import BeautifulSoup

%reload_ext autoreload
%autoreload 2


config_path = Path.cwd().parent / "src"
sys.path.append(str(config_path))

from config import *
from utils import *

env_path = Path.cwd().parent / ".env"

load_env(env_path)

try:
    fmp_key = get_key("API_KEY")
    # print(f"Loaded api key: {fmp_key}")
except Exception as e:
    # Keep fmp_key defined so the next cell can test it and fall back to yfinance.
    fmp_key = None
    print(f"Error retrieving key: {e}; yfinance will be used instead")
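
`load_env` and `get_key` come from my `utils` module and are not shown in this notebook. As a rough illustration only, helpers of this kind could look like the sketch below. The names `parse_env_text`, `load_env_sketch` and `get_key_sketch` are hypothetical, and the sketch assumes a plain `KEY=VALUE` file; the real `utils` code may differ.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments and malformed lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_sketch(path: Path) -> None:
    """Copy the file's entries into os.environ without overwriting existing ones."""
    if path.exists():
        for key, value in parse_env_text(path.read_text()).items():
            os.environ.setdefault(key, value)


def get_key_sketch(name: str) -> str:
    """Return the variable, or raise so the caller can fall back to yfinance."""
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"{name} is not set")
    return value
```

Raising on a missing key, rather than returning an empty string, is what lets the `try`/`except` in the cell above decide between FMP and yfinance.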

The cell below pulls the raw data and keeps one year of adjusted close prices for the given ticker in `df_raw_api`. It raises an assertion error if `date` or `adjClose` is missing from the API response.

In [2]:
ticker = "AAPL"
use_api = bool(fmp_key)

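# FMP expects "from" to be the earlier date, so this assumes the helper
# returns (earlier, later) despite the variable names.
end_date, start_date = year_date_interval_creator()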
if use_api:

    url = f"https://financialmodelingprep.com/api/v3/historical-price-full/{ticker}"
    params = {
        "apikey": fmp_key,
        "from": end_date.strftime("%Y-%m-%d"),
        "to": start_date.strftime("%Y-%m-%d") 
    }
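    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()  # surface HTTP errors (bad key, rate limit) early
    data = r.json()
    historical = data.get("historical", [])
    # Require both fields on every record; an empty list also fails the check.
    has_fields = bool(historical) and all("date" in k and "adjClose" in k for k in historical)
    assert has_fields, f"Unexpected response keys: {list(data.keys())}"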
    df_raw_api = pd.DataFrame(historical)
    df_raw_api['date'] = pd.to_datetime(df_raw_api['date'])
    df_raw_api['adjClose']=pd.to_numeric(df_raw_api['adjClose'])
    df_raw_api = df_raw_api[['date','adjClose']]

else:

    import yfinance as yf
    df_raw_api = yf.download(ticker, period="1y", interval="1d",auto_adjust=True).reset_index()[[('Date',''), ('Close',ticker)]]
    df_raw_api.columns = ['date','adjClose']
    df_raw_api['date'] = pd.to_datetime(df_raw_api['date'])
    df_raw_api['adjClose']=pd.to_numeric(df_raw_api['adjClose'])
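
The field check on the API response can be isolated into a small function that is easy to test without calling the API. This is a minimal sketch: `missing_fields` is a hypothetical helper (it is not part of `utils`), and the sample records use made-up values purely for illustration.

```python
def missing_fields(records, required=("date", "adjClose")):
    """Return the required field names that are absent from at least one record.

    An empty record list counts as missing everything.
    """
    if not records:
        return list(required)
    return [name for name in required if any(name not in rec for rec in records)]


# Made-up records shaped like entries of the "historical" list.
sample_ok = [{"date": "2025-01-02", "adjClose": 100.0}]
sample_bad = [{"date": "2025-01-02"}]

print(missing_fields(sample_ok))   # no fields missing
print(missing_fields(sample_bad))  # adjClose is missing
```

Returning the list of missing names, instead of a bare boolean, makes the failure message more useful than listing the top-level response keys.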



The pulled data is validated and saved using functions from the `utils` module.

In [8]:
msg_dict = validate_df(df_raw_api,['date','adjClose'],{'date':'datetime64[ns]','adjClose':'float64'})
msg_dict

fname = safe_filename(prefix="api", meta={"source": "fmp" if use_api else "yfinance", "symbol": ticker})
raw_data_path = Path.cwd().parent / "data" / "raw"
raw_data_path.mkdir(parents=True, exist_ok=True)  # create data/raw on first run
out_path = raw_data_path / fname
df_raw_api.to_csv(out_path, index=False)
print("Saved:", out_path)

Saved: /Users/svolety/Desktop/bootcamp IV/bootcamp/homework/homework4and5/data/raw/api_source-fmp_symbol-AAPL_20250816-204301.csv
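
The saved filename above follows the pattern `prefix_key-value_..._YYYYMMDD-HHMMSS.csv`. The real `safe_filename` lives in `utils`; the sketch below is only my guess at how such a helper could produce that pattern. The name `safe_filename_sketch` and the `ext` and `now` parameters are assumptions added for illustration and testing.

```python
import re
from datetime import datetime


def safe_filename_sketch(prefix, meta, ext="csv", now=None):
    """Build 'prefix_key-value_..._YYYYMMDD-HHMMSS.ext' from a metadata dict."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    parts = [f"{key}-{value}" for key, value in meta.items()]
    raw = "_".join([prefix, *parts, stamp])
    # Replace anything that is unsafe in a filename (slashes, spaces, ...) with '-'.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", raw)
    return f"{cleaned}.{ext}"


name = safe_filename_sketch(
    "api",
    {"source": "fmp", "symbol": "AAPL"},
    now=datetime(2025, 8, 16, 20, 43, 1),
)
print(name)  # api_source-fmp_symbol-AAPL_20250816-204301.csv
```

Passing `now` explicitly keeps the function deterministic in tests, while the default still timestamps each pull so earlier raw files are never overwritten.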


## Task 2: Web Scraping

Most websites with market data (Yahoo Finance, Morningstar, FMP, Alpha Vantage) either block scripted requests or render their tables with JavaScript, so a plain request returns a blank table and Selenium would be needed to run the JS before the table can be read. I'll return to this part of the assignment after Monday's lecture.
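
To make the blank-table problem concrete, here is a minimal sketch that counts the rows inside `<tbody>` for a given HTML string. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies, and both HTML snippets are made up for illustration; they are not taken from any of the sites above.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <tr> elements that appear inside a <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def count_body_rows(html: str) -> int:
    """Return the number of table body rows present in the raw HTML."""
    parser = TableRowCounter()
    parser.feed(html)
    return parser.rows


# A server-rendered table: the rows are already in the HTML.
static_html = (
    "<table><tbody>"
    "<tr><td>2025-01-02</td><td>100.0</td></tr>"
    "<tr><td>2025-01-03</td><td>101.0</td></tr>"
    "</tbody></table>"
)
# A JS-rendered page: the table shell is empty until a script fills it in.
js_shell_html = "<table><tbody></tbody></table><script>renderTable()</script>"

print(count_body_rows(static_html))
print(count_body_rows(js_shell_html))
```

A row count of zero on the HTML returned by `requests.get` is a quick signal that the page needs a browser driver such as Selenium before it can be scraped.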