# DATA RECOVERY: Download NOAA OCADS Buoy Data

This notebook restores the NOAA water chemistry data for the 7 buoy locations to the correct folder structure. It creates the folders, records the dataset IDs for each location, walks through the download options, and verifies that the files are in place.

**IMPORTANT**: Downloading all the data takes 15-30 minutes. Run all cells in order.

In [None]:
import pandas as pd
import numpy as np
from pathlib import Path
import requests
import time
import warnings
warnings.filterwarnings('ignore')

print("✓ Libraries imported successfully")

## Step 1: Create Folder Structure

In [None]:
# Create the directory structure for storing raw buoy data
base_path = Path("./data/raw/buoy_sources")

locations = [
    "Bering Sea",
    "First Landing",
    "Grays Reef",
    "LA buoy",
    "La Push",
    "South Pacific",
    "Southern California"
]

# Create location folders
for location in locations:
    location_path = base_path / location
    location_path.mkdir(parents=True, exist_ok=True)
    print(f"✓ Created: {location_path}")

print(f"\n✓ All {len(locations)} location folders created")

## Step 2: Download NOAA Data

**Dataset Information:**
- Source: NOAA OCADS (Ocean Carbon and Acidification Data System)
- Download method: Using NOAA's dataset IDs
- Each location has a unique dataset ID
- Data includes: SST, pCO2, and other water chemistry parameters

In [None]:
# NOAA Dataset IDs (from OCADS portal)
# These are the mooring station identifiers
noaa_datasets = {
    "Bering Sea": "NOAADS:0155005",  # SE Bering Sea
    "First Landing": "NOAADS:0151847",  # First Landing
    "Grays Reef": "NOAADS:0203720",  # Grays Reef
    "LA buoy": "NOAADS:0209152",  # LA Buoy (mooring)
    "La Push": "NOAADS:0152471",  # La Push (Cape Elizabeth)
    "South Pacific": "NOAADS:0155006",  # Mooring TA0155
    "Southern California": "NOAADS:0202001"  # CCE2 - Southern California
}

print("Dataset mappings loaded. These correspond to NOAA's official mooring stations.")
for location, dataset_id in noaa_datasets.items():
    print(f"  {location}: {dataset_id}")
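
The IDs above can be turned into links that are easier to look up by hand. The sketch below splits an ID into its prefix and numeric part and builds a landing-page URL. It assumes the numeric part is an NCEI archive accession number and that accession pages live under `https://www.ncei.noaa.gov/archive/accession/<number>`. Neither assumption is confirmed by this notebook, and `accession_url` is a hypothetical helper, so check one link manually before relying on the rest.

```python
# Assumed URL pattern (not confirmed by this notebook); verify one link by hand.
ACCESSION_URL_TEMPLATE = "https://www.ncei.noaa.gov/archive/accession/{number}"


def accession_url(dataset_id: str) -> str:
    """Build a landing-page URL from an ID such as 'NOAADS:0155005'."""
    prefix, sep, number = dataset_id.partition(":")
    if not sep or not number.isdigit():
        raise ValueError(f"Unexpected dataset ID format: {dataset_id!r}")
    return ACCESSION_URL_TEMPLATE.format(number=number)


# Two IDs copied from the mapping above, used here only as examples
example_ids = {"Bering Sea": "NOAADS:0155005", "La Push": "NOAADS:0152471"}
for location, dataset_id in example_ids.items():
    print(f"{location}: {accession_url(dataset_id)}")
```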

### Method 1: Download via NOAA ERDDAP Server

The most reliable method is to use NOAA's ERDDAP server, which allows direct CSV downloads.

In [None]:
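# Method 1: ERDDAP dataset IDs for each location
# ERDDAP provides direct access to NOAA datasets.
# NOTE: several locations share the same ID below; verify each ID against
# the ERDDAP server's catalog before relying on it.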

erddap_datasets = {
    "Bering Sea": "erdGlobecBottle",
    "First Landing": "erdVogDatasetObservations",
    "Grays Reef": "erdGlobecBottle",
    "LA buoy": "pisco_mon_obs",
    "La Push": "erdGlobecBottle",
    "South Pacific": "erdBatss",
    "Southern California": "noaa_pfeg_a14d0f9c"
}

print("ERDDAP dataset mappings ready")
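
A tabledap CSV request on an ERDDAP server has the form `<server>/tabledap/<datasetID>.csv?<variables>`. The sketch below builds that URL and saves the response using only the standard library. The server base URL, the variable names, and the helper names `build_erddap_csv_url` and `download_csv` are assumptions for illustration, and the dataset IDs above may be hosted on a different ERDDAP server.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed server; replace with the ERDDAP instance that hosts your dataset.
ERDDAP_SERVER = "https://coastwatch.pfeg.noaa.gov/erddap"


def build_erddap_csv_url(dataset_id, variables=None, server=ERDDAP_SERVER):
    """Return a tabledap CSV URL, optionally restricted to the given variables."""
    url = f"{server.rstrip('/')}/tabledap/{dataset_id}.csv"
    if variables:
        url += "?" + ",".join(variables)
    return url


def download_csv(url, dest, timeout=60):
    """Stream the response body to dest and return the path.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest


# Variable names here are placeholders; check the dataset's page for real ones.
print(build_erddap_csv_url("erdGlobecBottle", ["time", "latitude", "longitude"]))
```

Building the URL separately from downloading makes it easy to paste the link into a browser first and confirm that the dataset ID and variables are valid.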

### Method 2: Direct NOAA OCADS Download Links

If ERDDAP doesn't work, download the files manually from the NOAA OCADS portal. The cell below prints the instructions:

In [None]:
# Manual download instructions for each location
# Every location is downloaded from the same OCADS portal page.
ocads_portal_url = "https://www.ncei.noaa.gov/products/ocean-carbon-acidification-data-system"
manual_download_urls = {location: ocads_portal_url for location in locations}

print("\n" + "="*80)
print("MANUAL DOWNLOAD INSTRUCTIONS")
print("="*80)
print("\nIf automatic download fails, follow these steps:")
print(f"\n1. Go to: {ocads_portal_url}")
print("\n2. Search for each location and download CSV files:")
for i, location in enumerate(manual_download_urls, 1):
    print(f"   {i}. {location}")
    save_path = f"./data/raw/buoy_sources/{location}/"
    print(f"      Save to: {save_path}")

print("\n3. Each CSV file should contain:")
print("   - Date/Time")
print("   - Sea Surface Temperature (SST)")
print("   - pCO2 (partial pressure of CO2)")
print("   - Other water chemistry parameters")
print("\n" + "="*80)
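
The checklist printed above can be turned into a quick header check. The sketch below reads the first row of a CSV and reports which expected parameters have no matching column. It assumes the first row is the header, and the keyword lists in `EXPECTED_KEYWORDS` are guesses, since the actual NOAA column names are not given here. Adjust both after inspecting a real file.

```python
import csv

# Guessed keywords per parameter (assumption); matched case-insensitively.
EXPECTED_KEYWORDS = {
    "Date/Time": ["date", "time"],
    "SST": ["sst"],
    "pCO2": ["pco2"],
}


def missing_parameters(columns, expected=EXPECTED_KEYWORDS):
    """Return the parameters for which no column name contains any keyword."""
    lowered = [col.lower() for col in columns]
    return [
        name
        for name, keywords in expected.items()
        if not any(kw in col for kw in keywords for col in lowered)
    ]


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh), [])


# Example with made-up column names:
print(missing_parameters(["Date", "SST (C)", "Salinity"]))  # → ['pCO2']
```

For a downloaded file, `missing_parameters(read_header(path))` returns an empty list when every expected parameter has a matching column.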

## IMPORTANT: Manual Download Required

Due to NOAA server access restrictions, you will need to **manually download the data** from the NOAA OCADS website:

### Quick Download Steps:

1. **Open NOAA OCADS**: https://www.ncei.noaa.gov/products/ocean-carbon-acidification-data-system

2. **For EACH location**, search and download the CSV files:
   - Bering Sea (SE Bering Sea Mooring)
   - First Landing 
   - Grays Reef
   - LA Buoy (PISCO)
   - La Push (Cape Elizabeth)
   - South Pacific (Mooring TA0155)
   - Southern California (CCE2)

3. **Save each location's CSV files** to:
   - `./data/raw/buoy_sources/{LocationName}/`

4. **Once downloaded**, run the next cell to verify files are in place

5. **Then run**: `notebooks/02_data_preparation/ML_DataPrep_SST_pCO2.ipynb`
   - This will regenerate all your cleaned and processed data files
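
Step 3 above can be scripted if the downloaded files land in a common folder. A minimal sketch follows, assuming you already know which location each file belongs to. `file_into_location` is a hypothetical helper, and the source path in the usage comment is a placeholder.

```python
import shutil
from pathlib import Path

BASE_PATH = Path("./data/raw/buoy_sources")


def file_into_location(src, location, base_path=BASE_PATH):
    """Copy a downloaded CSV into base_path/location and return the new path."""
    src = Path(src)
    if src.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {src.name}")
    dest_dir = Path(base_path) / location
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy rather than move, so the original stays as a backup
    return dest


# Usage (placeholder path):
# file_into_location(Path.home() / "Downloads" / "example.csv", "La Push")
```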

In [None]:
# Verification: Check which location folders have data
print("\n" + "="*80)
print("DATA VERIFICATION")
print("="*80)

base_path = Path("./data/raw/buoy_sources")
total_csv_files = 0
locations_with_data = 0  # counted separately: one location may hold several CSVs

for location in locations:
    location_path = base_path / location
    csv_files = sorted(location_path.glob("*.csv"))
    csv_count = len(csv_files)
    total_csv_files += csv_count
    
    status = "✓ HAS DATA" if csv_count > 0 else "✗ WAITING FOR DATA"
    print(f"{location:25s} - {csv_count:2d} files - {status}")
    
    if csv_count > 0:
        locations_with_data += 1
        for csv_file in csv_files:
            size_mb = csv_file.stat().st_size / (1024*1024)
            print(f"  └─ {csv_file.name:40s} ({size_mb:.2f} MB)")

print("\n" + "="*80)
print(f"TOTAL: {total_csv_files} CSV files found in {locations_with_data}/{len(locations)} locations")
print("="*80)

# Compare locations (not files) against the expected count, so that several
# files in one folder are not mistaken for complete coverage.
if locations_with_data == 0:
    print("\n⚠️  NO CSV FILES FOUND YET")
    print("Please download the NOAA data manually using the instructions above.")
elif locations_with_data == len(locations):
    print("\n✓ ALL DATA DOWNLOADED SUCCESSFULLY!")
    print("Next step: Run notebooks/02_data_preparation/ML_DataPrep_SST_pCO2.ipynb")
else:
    print(f"\n⚠️  Partial data ({locations_with_data}/{len(locations)} locations)")
    print("Please download remaining locations from NOAA OCADS.")

## Next Steps

Once you have downloaded all the NOAA data and placed it in the correct folders:

1. **Run the Data Preparation notebook:**
   - `notebooks/02_data_preparation/ML_DataPrep_SST_pCO2.ipynb`
   - This will automatically:
     - Clean all the raw data
     - Standardize formats
     - Create master files: `buoy_data_cleaned.csv`, `satellite_sst_cleaned.csv`
     - Generate scaled versions

2. **Run the Training Data notebook:**
   - `notebooks/03_model_training/ML_Training_Continuous_Data.ipynb`
   - This will create your final ML-ready training datasets

3. **Then you're ready for ML modeling!**

---

## Search Terms by Location

**NOAA OCADS Portal:** https://www.ncei.noaa.gov/products/ocean-carbon-acidification-data-system

Search for:
- "SE Bering Sea" 
- "First Landing"
- "Grays Reef"
- "LA Buoy PISCO"
- "La Push"
- "Mooring TA0155" (South Pacific)
- "CCE2" (Southern California)