# Homework 4
## Web Crawler Using Alpha Vantage

### Data Source
- **Provider**: [Alpha Vantage](https://www.alphavantage.co/)
- **Endpoint**: `TIME_SERIES_DAILY` (the function the code below requests)
- **Ticker**: `AAPL` (Apple Inc.)
- **Access**: Free tier (requires API key)
- **API Documentation**: [Alpha Vantage Docs](https://www.alphavantage.co/documentation/)


### Data Range

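- **Output Size**: `compact` (last 100 data points)
- **Frequency**: Daily
- **Coverage**: Raw (as-traded) historical OHLCV (Open, High, Low, Close, Volume). Values adjusted for corporate actions such as splits and dividends come from the separate `TIME_SERIES_DAILY_ADJUSTED` function, which this notebook does not call.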
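To make the request parameters concrete, here is a minimal sketch of how the query URL for this data range is composed. It assumes the `TIME_SERIES_DAILY` function requested by the code in this notebook; `build_query_url` and the `YOUR_KEY` placeholder are hypothetical names used only for illustration, and the standard library's `urlencode` stands in for the encoding that `requests` performs internally.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_query_url(symbol: str, api_key: str, outputsize: str = "compact") -> str:
    """Compose the GET URL for a daily time-series request (hypothetical helper)."""
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "outputsize": outputsize,  # "compact" = last 100 data points
        "datatype": "json",
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# "YOUR_KEY" is a placeholder, not a working key
url = build_query_url("AAPL", "YOUR_KEY")
print(url)
```

Passing the same dictionary to `requests.get(..., params=params)` should produce an equivalent URL, so the helper is mainly useful for logging or debugging a request.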

### Logic & Workflow

1. **Environment Setup**  
   Load necessary libraries and API key from `.env` using `dotenv`.
     
2. **API Request**  
   Use `requests.get()` to call the Alpha Vantage API, passing in function, symbol, and key.
     
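3. **Parsing & Cleaning**  
   Extract the `Time Series (Daily)` object from the JSON response, load it into a DataFrame indexed by date, sort it chronologically, strip the numeric prefixes from the column names (`1. open` becomes `open`), and cast all values to `float`.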

4. **Validation**  
   Print the DataFrame shape (expected: 100 rows, 5 columns) and the number of missing values per column.

5. **Save to Local**  
   Ensure output directory exists: `data/raw/`.
   Save as `aapl_api.csv`.
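The parsing and validation steps can be sketched offline, without calling the API. The example below assumes a payload shaped like a `TIME_SERIES_DAILY` response; the two rows use made-up prices, and `parse_daily` and `validate` are hypothetical helper names, not part of any library.

```python
import pandas as pd

# Hypothetical payload with made-up values, shaped like a daily response
sample = {
    "Time Series (Daily)": {
        "2024-01-03": {"1. open": "101.0", "2. high": "104.0", "3. low": "100.5",
                       "4. close": "102.5", "5. volume": "2000"},
        "2024-01-02": {"1. open": "100.0", "2. high": "105.0", "3. low": "99.0",
                       "4. close": "101.0", "5. volume": "1000"},
    }
}


def parse_daily(payload: dict) -> pd.DataFrame:
    """Turn the JSON payload into a date-indexed, float-typed DataFrame."""
    ts = payload.get("Time Series (Daily)")
    if not ts:
        raise ValueError("payload has no 'Time Series (Daily)' section")
    df = pd.DataFrame.from_dict(ts, orient="index")
    df.index = pd.to_datetime(df.index)
    df = df.sort_index()  # oldest row first
    df.columns = [col.split(". ")[1] for col in df.columns]  # "1. open" -> "open"
    return df.astype(float)


def validate(df: pd.DataFrame) -> None:
    """Basic sanity checks; raises AssertionError when one fails."""
    assert list(df.columns) == ["open", "high", "low", "close", "volume"]
    assert not df.isna().any().any(), "missing values found"
    assert df.index.is_monotonic_increasing, "dates are not sorted"
    assert (df["high"] >= df["low"]).all(), "high below low"


df = parse_daily(sample)
validate(df)
print(df.shape)
```

Keeping parsing in a function that takes a plain dictionary means it can be exercised with a saved response, which avoids spending API quota while developing.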

### ⚠️ Assumptions & Risks

| Type | Details |
|------|---------|
| **Assumptions** | - API key is valid and available in `.env`<br>- API response structure is stable<br>- "compact" size (last ~100 days) is sufficient |
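| **Risks** | - Free-tier rate limits (recorded here as 5 requests/min and 500/day; confirm the current quota on the Alpha Vantage site)<br>- API format or endpoint may change<br>- The API key leaks if `.env` is not listed in `.gitignore`<br>- Market holidays leave gaps in the date sequence |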
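One way to guard against the rate-limit and format risks is to inspect the decoded JSON before parsing it. The sketch below is an assumption-laden illustration: `classify_payload` is a hypothetical helper, and the `Error Message`, `Note`, and `Information` keys are the ones this API has been seen to use for errors and quota notices, so they should be checked against an actual response.

```python
def classify_payload(payload: dict) -> str:
    """Return 'ok', 'error', 'throttled', or 'unknown' for a decoded JSON response."""
    if "Time Series (Daily)" in payload:
        return "ok"
    if "Error Message" in payload:
        return "error"  # e.g. an invalid symbol or function name
    # Assumed keys for quota or informational notices
    if "Note" in payload or "Information" in payload:
        return "throttled"
    return "unknown"


# Hand-written payloads for illustration only
print(classify_payload({"Time Series (Daily)": {}}))
print(classify_payload({"Note": "call frequency exceeded"}))
```

A caller could wait and retry on `'throttled'` and fail immediately on `'error'` or `'unknown'`.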


## Code

import os
import requests
import pandas as pd
from dotenv import load_dotenv
from pathlib import Path

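# Read ALPHAVANTAGE_API_KEY from the local .env file
load_dotenv()

API_KEY = os.getenv("ALPHAVANTAGE_API_KEY")
if not API_KEY:
    raise RuntimeError("ALPHAVANTAGE_API_KEY is not set; add it to .env")
symbol = "AAPL"
url = "https://www.alphavantage.co/query"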
params = {
    "function": "TIME_SERIES_DAILY",
    "symbol": symbol,
    "outputsize": "compact",  # last 100 data points
    "datatype": "json",
    "apikey": API_KEY
}

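response = requests.get(url, params=params, timeout=30)
response.raise_for_status()  # fail early on HTTP errors
data = response.json()

# The API can answer with JSON that lacks the time-series key (error or quota message)
ts = data.get("Time Series (Daily)")
if not ts:
    raise RuntimeError(f"Unexpected API response: {data}")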
df = pd.DataFrame.from_dict(ts, orient="index")
df.index = pd.to_datetime(df.index)
df = df.sort_index()

# Strip the numeric prefixes from the column names: "1. open" -> "open"
df.columns = [col.split(". ")[1] for col in df.columns]
df = df.astype(float)

print("✅ Data shape:", df.shape)
print("✅ Missing values:\n", df.isna().sum())

project_root = Path().resolve().parent  # root directory; Frank holds the rights (copyright)
output_path = project_root/"data"/"raw"/"aapl_api.csv"
output_path.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(output_path, index_label="date")  # keep the date index as the first column
print(f"✅ Saved to {output_path.resolve()}")

✅ Data shape: (100, 5)
✅ Missing values:
 open      0
high      0
low       0
close     0
volume    0
dtype: int64
✅ Saved to D:\bootcamp_Haochen_Zou\homework\homework4\data\raw\aapl_api.csv
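As a follow-up check, a saved file can be reloaded and scanned for weekdays with no row, which is how market holidays show up in daily data. This is a minimal sketch under stated assumptions: the CSV has a `date` column holding the index, the sample rows use made-up prices, and a temporary file stands in for `aapl_api.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up prices for four weekdays; 2024-01-10 (a Wednesday) is left out on purpose
sample = pd.DataFrame(
    {"close": [100.0, 101.0, 102.0, 103.0]},
    index=pd.to_datetime(["2024-01-08", "2024-01-09", "2024-01-11", "2024-01-12"]),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"  # stand-in for the real output file
    sample.to_csv(csv_path, index_label="date")
    df = pd.read_csv(csv_path, index_col="date", parse_dates=True)

# Every Monday-to-Friday date between the first and last row
expected = pd.bdate_range(df.index.min(), df.index.max())
missing = expected.difference(df.index)
print(missing.strftime("%Y-%m-%d").tolist())
```

Weekdays reported here are candidates for exchange holidays; `bdate_range` only knows about weekends, so each one should be confirmed against a trading calendar before being treated as a data error.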
