## 1. Data Sources & URLs
## Scrape
FDIC Failed Bank List
https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/
## API
AlphaVantage, with Yahoo Finance as the fallback
https://www.alphavantage.co/query (fallback: the yfinance Python library)

## 2. Request Parameters
## Scrape
- URL: https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/
- Headers: `User-Agent: AFE-Course-Notebook/1.0 (contact: instructor@example.edu)`
- Timeout: 30 s
- Tooling: requests + BeautifulSoup + pandas.read_html
- Fallback: a simulated HTML table is used if scraping fails
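
The scrape settings above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the names `fetch_html` and `parse_first_table`, the injectable `get` argument, and the contents of `FALLBACK_HTML` are assumptions made for illustration.

```python
from io import StringIO
from typing import Callable

import pandas as pd
import requests

FDIC_URL = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/"
HEADERS = {"User-Agent": "AFE-Course-Notebook/1.0 (contact: instructor@example.edu)"}
TIMEOUT_S = 30

# Hypothetical stand-in for the simulated table; the rows are placeholders, not FDIC data.
FALLBACK_HTML = """
<table>
  <tr><th>Bank Name</th><th>Closing Date</th></tr>
  <tr><td>Example Bank A</td><td>January 1, 2020</td></tr>
  <tr><td>Example Bank B</td><td>February 1, 2021</td></tr>
</table>
"""


def fetch_html(url: str = FDIC_URL, get: Callable[..., requests.Response] = requests.get) -> str:
    """Return the page HTML, or FALLBACK_HTML if the request fails."""
    try:
        resp = get(url, headers=HEADERS, timeout=TIMEOUT_S)
        resp.raise_for_status()  # treat 4xx/5xx responses as failures
        return resp.text
    except requests.RequestException:
        return FALLBACK_HTML


def parse_first_table(html: str) -> pd.DataFrame:
    """Parse the first <table> in the HTML (needs lxml, or bs4 + html5lib, installed)."""
    return pd.read_html(StringIO(html))[0]
```

In a notebook this could be called as `df_scrape = parse_first_table(fetch_html())`. The `get` argument exists only so the fallback path can be exercised without network access.
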
## API
- Provider: AlphaVantage (TIME_SERIES_DAILY)
- Parameters:
  - symbol: "AAPL"
  - outputsize: "compact"
  - apikey: loaded from .env
  - datatype: "json"
- Fallback: yfinance.download("AAPL", period="6mo", interval="1d")
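
As a rough illustration of how these parameters map onto a request and a tidy frame, consider the sketch below. The helper names `build_params` and `parse_daily` are hypothetical. The JSON keys (`"Time Series (Daily)"`, `"4. close"`) reflect the TIME_SERIES_DAILY response format as commonly documented and should be checked against a live response. Note that this endpoint returns an unadjusted close, so mapping it to `adj_close` is an assumption of this sketch.

```python
import os
from typing import Any, Dict

import pandas as pd

AV_URL = "https://www.alphavantage.co/query"


def build_params(symbol: str = "AAPL") -> Dict[str, str]:
    """Assemble the query parameters listed above; the key is read from the environment."""
    return {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "outputsize": "compact",
        "datatype": "json",
        # Assumes something like python-dotenv has already loaded .env into os.environ.
        "apikey": os.environ.get("ALPHAVANTAGE_API_KEY", ""),
    }


def parse_daily(payload: Dict[str, Any]) -> pd.DataFrame:
    """Turn a TIME_SERIES_DAILY JSON payload into a frame with date and adj_close."""
    series = payload.get("Time Series (Daily)")
    if not series:
        # Error and rate-limit responses omit the series and carry a message instead.
        raise ValueError(f"No daily series in response; keys: {sorted(payload)}")
    rows = [
        # Assumption: the unadjusted "4. close" field stands in for adj_close.
        {"date": day, "adj_close": fields["4. close"]}
        for day, fields in series.items()
    ]
    df = pd.DataFrame(rows)
    df["date"] = pd.to_datetime(df["date"])
    df["adj_close"] = pd.to_numeric(df["adj_close"])
    return df.sort_values("date").reset_index(drop=True)
```

The payload would typically come from something like `requests.get(AV_URL, params=build_params()).json()`. Raising on a missing series gives the caller a clear signal to switch to the yfinance fallback.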

## 3. Validation Logic
Both data sources are validated using the same helper function:

In [None]:
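import pandas as pd
from typing import Dict, List

def validate_df(df: pd.DataFrame, required_cols: List[str], dtypes_map: Dict[str, str]) -> Dict[str, str]:
    """Return validation messages covering missing columns, dtype coercion, and NA counts."""
    msgs = {}
    # Report any required columns that are absent from the frame.
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        msgs["missing_cols"] = f"Missing columns: {missing}"
    # Try to coerce each mapped column; the result is discarded, only failures are recorded.
    for col, dtype in dtypes_map.items():
        if col in df.columns:
            try:
                if dtype == "datetime64[ns]":
                    pd.to_datetime(df[col])
                elif dtype == "float":
                    pd.to_numeric(df[col])
            except Exception as e:
                msgs[f"dtype_{col}"] = f"Failed to coerce {col} to {dtype}: {e}"
    # Count NA values across the whole frame (always reported, even when zero).
    na_counts = df.isna().sum().sum()
    msgs["na_total"] = f"Total NA values: {na_counts}"
    return msgs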

## API Validation

In [None]:
msgs = validate_df(df_api, required_cols=["date", "adj_close"], dtypes_map={"date": "datetime64[ns]", "adj_close": "float"})

## Scrape Validation

In [None]:
msgs2 = validate_df(df_scrape, required_cols=list(df_scrape.columns), dtypes_map={})
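
Because `required_cols` is taken from `df_scrape` itself, the missing-column check in the call above can never fire, and with an empty `dtypes_map` only the NA total is reported. One way to make the check meaningful is to compare against an explicit list. The sketch below assumes a handful of column names for the FDIC table. They are illustrative guesses, as is the helper name `missing_expected`, and should be verified against the live page.

```python
from typing import List

import pandas as pd

# Assumed column names for the FDIC table; verify against the live page before relying on them.
EXPECTED_SCRAPE_COLS = ["Bank Name", "City", "State", "Closing Date"]


def missing_expected(df: pd.DataFrame, expected: List[str]) -> List[str]:
    """Return the expected columns absent from df, ignoring surrounding whitespace in headers."""
    present = {str(c).strip() for c in df.columns}
    return [c for c in expected if c not in present]
```

An explicit list such as `EXPECTED_SCRAPE_COLS` could equally be passed to `validate_df` as `required_cols`, so that a change in the scraped table shows up as a `missing_cols` message.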

## 4. Assumptions & Risks
## API
Assumptions:
- The .env file defines a valid ALPHAVANTAGE_API_KEY.
- The AlphaVantage TIME_SERIES_DAILY endpoint remains accessible and keeps its current response format.

Risks:
- AlphaVantage enforces rate limits; if the limit is exceeded or the key is missing, the request fails and the yfinance fallback is needed.
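
The rate-limit risk is what the yfinance fallback from section 2 is meant to cover. A minimal sketch of that pattern, assuming the two fetchers are wrapped as zero-argument callables (the name `with_fallback` and the source labels are hypothetical):

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> Tuple[T, str]:
    """Run primary; if it raises, run fallback. Returns (result, source_label)."""
    try:
        return primary(), "primary"
    except Exception as exc:  # broad on purpose: missing key, HTTP error, rate-limit payload
        print(f"Primary source failed ({exc}); using fallback.")
        return fallback(), "fallback"
```

Here `primary` would wrap the AlphaVantage request and raise when the response carries no data, while `fallback` would wrap `yfinance.download("AAPL", period="6mo", interval="1d")`. Returning the label alongside the data records which source actually produced `df_api`.
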
## Scrape
Assumptions:
- The FDIC page contains a well-formed `<table>` that pandas.read_html can parse.
- All expected columns and rows are present in the static HTML rather than loaded separately by JavaScript.

Risks:
- If the FDIC changes the page structure, the scraper may break, leaving only the simulated HTML table as a fallback.