# 01: Data Acquisition Guide
## Healthcare Resource Optimization Project

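This notebook walks you through acquiring the CDC National Hospital Ambulatory Medical Care Survey (NHAMCS) dataset, creating the project's directory structure, and setting up Reddit API credentials.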

## 1. CDC NHAMCS Dataset

### Download Instructions:
1. Visit: https://www.cdc.gov/nchs/ahcd/datasets_documentation_related.htm
2. Download the Emergency Department (ED) data files
3. Save them to the `data/raw/nhamcs/` directory (the next code cell creates it)
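
If you would rather script the download than click through the CDC page, the sketch below streams a single file into `data/raw/nhamcs/` using only the standard library. The `download_file` helper is hypothetical. You still have to copy the real file URL from the CDC page, and by default the local filename is simply the last segment of that URL.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest_dir="data/raw/nhamcs", filename=None, overwrite=False):
    """Hypothetical helper: download `url` into `dest_dir` and return the local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Fall back to the last URL path segment when no filename is given
    name = filename or url.rstrip("/").rsplit("/", 1)[-1]
    target = dest / name
    if target.exists() and not overwrite:
        print(f"Skipping (already downloaded): {target}")
        return target
    # Stream the response to disk instead of loading the whole file into memory
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    print(f"Downloaded: {target}")
    return target
```

Usage: `download_file("<URL copied from the CDC page>")`. Re-running the call skips files that already exist, so the cell is safe to execute more than once.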

### Required Files:
- NHAMCS ED 2021 data file (CSV or SAS format)
- Documentation file
- Codebook
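
Once the files are saved, a quick check can confirm that each required item is present. The sketch below assumes that the ED data file ends in `.csv`, `.sas7bdat`, or `.xpt`, and that the documentation and codebook were saved as PDFs. These filename patterns and the `check_nhamcs_files` helper are hypothetical, so adjust them to match the files you actually downloaded.

```python
from pathlib import Path

# Hypothetical filename patterns: adjust to the names of the files you downloaded
REQUIRED_FILES = {
    "ED data file": ["*.csv", "*.sas7bdat", "*.xpt"],
    "Documentation / codebook": ["*.pdf"],
}


def check_nhamcs_files(directory="data/raw/nhamcs", required=REQUIRED_FILES):
    """Map each required file type to the matching file names found in `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return {label: [] for label in required}
    return {
        label: sorted({p.name for pattern in patterns for p in root.glob(pattern)})
        for label, patterns in required.items()
    }


for label, matches in check_nhamcs_files().items():
    status = "[OK]" if matches else "[MISSING]"
    print(f"{status} {label}: {', '.join(matches) if matches else '-'}")
```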

In [None]:
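import os

# Create the project directory structure (safe to re-run: existing directories are kept)
directories = [
    'data/raw/nhamcs',
    'data/raw/cdc_news',
    'data/raw/reddit_health',
    'data/raw/twitter_health',
    'data/processed',
    'models',
    'logs',
    'visualizations'
]

for directory in directories:
    already_exists = os.path.isdir(directory)
    os.makedirs(directory, exist_ok=True)
    # Report accurately whether the directory was new or already there
    status = "Exists " if already_exists else "Created"
    print(f"{status}: {directory}")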
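
The cell above resolves its paths relative to the current working directory. If Jupyter was launched from a subfolder (for example, a hypothetical `notebooks/` directory), the data folders would end up inside that subfolder. The sketch below is one way to anchor paths to the project root instead. It walks up from the working directory until it finds a marker file. The `find_project_root` helper and its marker choice (the `.env` file from step 2, or a `.git` folder) are assumptions, not part of the original project.

```python
from pathlib import Path


def find_project_root(start=None, markers=(".env", ".git")):
    """Walk up from `start` (default: cwd) to the first directory containing a marker."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return current  # no marker found: fall back to the starting directory


PROJECT_ROOT = find_project_root()
print(f"Project root: {PROJECT_ROOT}")
# Build paths from the root, e.g. PROJECT_ROOT / "data" / "raw" / "nhamcs"
```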

## 2. Reddit API Setup

### Steps:
1. Go to: https://www.reddit.com/prefs/apps
2. Click "Create App" or "Create Another App"
3. Fill in:
   - Name: Healthcare Trend Analyzer
   - Type: Script
   - Redirect URI: http://localhost:8080
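4. Copy the Client ID and Secret into the project's `.env` file as `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET`, and add a `REDDIT_USER_AGENT` string (the check below reads these three variables)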
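
To avoid typos in the variable names, you can generate a `.env` skeleton and then fill in the real values. The sketch below is a hypothetical helper that never overwrites an existing file unless asked. The placeholder values are illustrative. The user-agent example follows the `<platform>:<app ID>:<version> (by u/<username>)` pattern that Reddit recommends. Keep `.env` out of version control, because it holds your secret.

```python
from pathlib import Path

# Placeholder values: replace with the client ID and secret shown on
# https://www.reddit.com/prefs/apps. The user-agent string is only an example.
ENV_TEMPLATE = """\
REDDIT_CLIENT_ID=your_client_id_here
REDDIT_CLIENT_SECRET=your_client_secret_here
REDDIT_USER_AGENT=script:healthcare-trend-analyzer:v0.1 (by u/your_username)
"""


def write_env_template(path=".env", overwrite=False):
    """Hypothetical helper: create a .env skeleton without clobbering an existing file."""
    env_path = Path(path)
    if env_path.exists() and not overwrite:
        print(f"{env_path} already exists; leaving it untouched.")
        return False
    env_path.write_text(ENV_TEMPLATE)
    print(f"Wrote template to {env_path}; fill in your credentials.")
    return True
```

Usage: run `write_env_template()` once from the project root, then edit the three placeholder values.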

In [None]:
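# Check that the Reddit API credentials were loaded from .env
# (this only verifies configuration; it does not contact the Reddit API)
from dotenv import load_dotenv
import os

load_dotenv()  # reads KEY=value pairs from .env into os.environ

def is_set(name):
    # Treat missing and empty/whitespace-only values as "not configured"
    return bool(os.environ.get(name, "").strip())

print("Reddit API Configuration:")
print(f"Client ID configured: {is_set('REDDIT_CLIENT_ID')}")
print(f"Client Secret configured: {is_set('REDDIT_CLIENT_SECRET')}")
print(f"User Agent configured: {is_set('REDDIT_USER_AGENT')}")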
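
The check above confirms that the variables exist, but it does not confirm that Reddit accepts them. The sketch below gathers the three values into the keyword arguments that PRAW's `praw.Reddit` constructor accepts (`client_id`, `client_secret`, `user_agent`). Given only those three, without a username or password, PRAW runs in read-only mode, which is enough for reading public posts. The `reddit_credentials` and `connect_reddit` helpers are hypothetical. `praw` is imported inside `connect_reddit` so that the credential check still works on its own.

```python
import os

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def reddit_credentials(env=None):
    """Hypothetical helper: build praw.Reddit kwargs, failing loudly on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"Missing Reddit credentials in .env: {', '.join(missing)}")
    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }


def connect_reddit():
    """Create a read-only PRAW client from the environment (requires `praw`)."""
    import praw  # imported lazily so reddit_credentials() works without praw

    return praw.Reddit(**reddit_credentials())
```

Usage: after `reddit = connect_reddit()`, `reddit.read_only` should be `True`. A live request such as `next(reddit.subreddit("health").hot(limit=1)).title` then confirms that the credentials are accepted. The subreddit name here is only an example.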

## 3. Verify Installation

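Check that all required packages are installed. Some pip package names differ from the module name you import: `scikit-learn` is imported as `sklearn`, and `beautifulsoup4` as `bs4`.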

In [None]:
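import importlib
import sys

# pip package name -> importable module name (they differ for some packages)
required_packages = {
    'pandas': 'pandas',
    'numpy': 'numpy',
    'scipy': 'scipy',
    'scikit-learn': 'sklearn',
    'requests': 'requests',
    'beautifulsoup4': 'bs4',
    'praw': 'praw',
    'textblob': 'textblob',
    'matplotlib': 'matplotlib',
    'seaborn': 'seaborn',
    'plotly': 'plotly',
    'python-dotenv': 'dotenv',  # used by the Reddit configuration check above
}

print("Package Installation Check:")
print("-" * 50)

for package, module_name in required_packages.items():
    try:
        importlib.import_module(module_name)
        print(f"✓ {package}")
    except ImportError:
        print(f"✗ {package} - NOT INSTALLED")

print("-" * 50)
print(f"Python version: {sys.version}")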
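
Reporting installed versions is often more useful than a simple yes/no, for example when filing a bug report. The standard library's `importlib.metadata.version` (Python 3.8+) looks packages up by their pip distribution name, so it avoids the import-name mismatch altogether. The `installed_versions` helper below is a hypothetical sketch, and the list passed to it is just a sample of the packages above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each pip distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = version(dist)
        except PackageNotFoundError:
            versions[dist] = None
    return versions


for dist, ver in installed_versions(["pandas", "scikit-learn", "beautifulsoup4", "praw"]).items():
    print(f"{dist:<16} {ver or 'NOT INSTALLED'}")
```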

## 4. Next Steps

After completing data acquisition:
1. Run notebook 02: Web Scraping - CDC
2. Run notebook 03: Web Scraping - Reddit
3. Run notebook 04: Web Scraping - Twitter
4. Proceed to data cleaning and analysis