# NOAA PrecipRate GRIB2 File Access Demo

**Prepared for:** Texas Department of Transportation (TxDOT)  
**Purpose:** Technical demonstration of NOAA MRMS PrecipRate data access methods

This notebook demonstrates three methods for accessing NOAA MRMS PrecipRate GRIB2 files:

1. **Programmatic API/SDK Access** (Python - boto3)
2. **Direct HTTPS Download** (Cross-platform - curl/wget/Python requests)
3. **Production Implementation Reference** (Python - our implementation)

## Setup

This notebook will:
- Detect your Python environment
- Install required libraries (boto3, requests)
- Set up the working directory
- Test connectivity

---


## Prerequisites and Installation

**Required packages:** `boto3` and `requests`

**Quick install:**
```bash
pip install boto3 requests
```

**Alternative:** The setup cell below will automatically install dependencies if needed.

---

## Step 0: Environment Setup and Testing

This cell will:
1. Check your Python version and install required libraries
2. Create working directory for downloads  
3. Test S3 connectivity


In [None]:
#!/usr/bin/env python3
"""
Environment Setup for NOAA PrecipRate Access Demo
This cell handles setup and works from any directory
"""

import sys
import subprocess
import os
from pathlib import Path

print("AUTOMATED SETUP STARTING...")
print("=" * 50)

# 1. Detect the notebook directory (where the notebook is located)
current_dir = Path.cwd()
notebook_name = "noaa_preciprate_access_demo.ipynb"

# Use the parent directory only when the notebook lives there and not in the
# current directory; otherwise default to the current directory
if not (current_dir / notebook_name).exists() and (current_dir.parent / notebook_name).exists():
    notebook_dir = current_dir.parent
else:
    notebook_dir = current_dir

print(f"Detected notebook directory: {notebook_dir.name}")
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# 2. Create working directory for downloads in the same location as notebook
work_dir = notebook_dir / "grib2_downloads"
work_dir.mkdir(exist_ok=True)
print(f"Created working directory: {work_dir.name}")

# 3. Function to install packages safely
def install_package(package_name):
    """Install a package using pip, with error handling"""
    try:
        __import__(package_name.split('==')[0])  # Check if already installed
        print(f"{package_name} is already installed")
        return True
    except ImportError:
        print(f"Installing {package_name}...")
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package_name, "--quiet"])
            print(f"Successfully installed {package_name}")
            return True
        except subprocess.CalledProcessError as e:
            print(f"Failed to install {package_name}: {e}")
            return False

# 4. Install required packages
required_packages = ["boto3", "requests"]
all_installed = True

for package in required_packages:
    if not install_package(package):
        all_installed = False

if all_installed:
    print("\nALL PACKAGES INSTALLED SUCCESSFULLY!")
else:
    print("\nSome packages failed to install. You may need to install them manually.")

# 5. Test imports
print("\nTESTING IMPORTS...")
try:
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config
    from datetime import datetime
    import requests
    print("All required libraries imported successfully!")
    
    # 6. Test AWS connection (no credentials needed)
    print("\nTESTING AWS S3 CONNECTION...")
    s3_test = boto3.client('s3', region_name='us-east-1', config=Config(signature_version=UNSIGNED))
    
    # Quick test - list a few files from today to verify connection
    today = datetime.utcnow().strftime("%Y%m%d")
    test_prefix = f'CONUS/PrecipRate_00.00/{today}/'
    
    resp = s3_test.list_objects_v2(Bucket='noaa-mrms-pds', Prefix=test_prefix, MaxKeys=1)
    if 'Contents' in resp:
        print("Successfully connected to NOAA S3 bucket!")
        print(f"Found data files for {today}")
    else:
        print("Connected to S3 but no files found for today yet (this might be normal if it's early in the day)")
    
except Exception as e:
    print(f"Import or connection test failed: {e}")
    print("Please check your internet connection and try again.")

print("\n" + "=" * 50)
print("SETUP FINISHED. If no errors are shown above, you can run the remaining cells.")
print(f"Files will be downloaded to: {work_dir.name}")
print(f"Working directory: {notebook_dir.name}")
print("=" * 50)


## Method 1: Programmatic API/SDK Access (Python - boto3)

**Platform:** Python only  
**Requirements:** boto3 library (installed automatically above)  
**Credentials:** None required  

This method uses the AWS SDK for Python to programmatically list and download files.

### Interactive Examples:


In [None]:
# Import the libraries we installed above
import boto3
from botocore import UNSIGNED
from botocore.config import Config
from datetime import datetime, timedelta
import os

# Set up the S3 client for public (unsigned) access
s3 = boto3.client('s3', region_name='us-east-1', config=Config(signature_version=UNSIGNED))
bucket = 'noaa-mrms-pds'

# Create prefix for today's files
today = datetime.utcnow().strftime("%Y%m%d")
prefix = f'CONUS/PrecipRate_00.00/{today}/'

def round_to_even_minute(dt):
    """
    Round a datetime down to the previous even minute (00, 02, 04, 06, 08, etc.).
    NOAA MRMS PrecipRate files are only available at even-minute intervals.
    
    Args:
        dt (datetime): Input datetime object
        
    Returns:
        datetime: Datetime floored to an even minute, with seconds and microseconds zeroed
    """
    # Subtract the remainder: even minutes stay unchanged, odd minutes drop by one
    return dt.replace(minute=dt.minute - dt.minute % 2, second=0, microsecond=0)

def get_grib2_filename(dt):
    """
    Generate the expected GRIB2 filename for a given datetime.
    Automatically rounds down to the previous even minute.
    
    Args:
        dt (datetime): Input datetime object
        
    Returns:
        str: GRIB2 filename
    """
    # Round to even minute first
    rounded_dt = round_to_even_minute(dt)
    date_str = rounded_dt.strftime('%Y%m%d')
    time_str = rounded_dt.strftime('%H%M%S')
    return f"MRMS_PrecipRate_00.00_{date_str}-{time_str}.grib2.gz"

print(f"S3 client configured for bucket: {bucket}")
print(f"Looking for files with prefix: {prefix}")
print(f"Today's date (UTC): {today}")

# Example of timestamp rounding
now = datetime.utcnow()
rounded_now = round_to_even_minute(now)
print(f"\nTimestamp rounding example:")
print(f"Current time: {now.strftime('%H:%M:%S')}")
print(f"Rounded time: {rounded_now.strftime('%H:%M:%S')}")
print(f"Example filename: {get_grib2_filename(now)}")


In [None]:
# List all objects in today's folder
latest_file = None
try:
    print("Searching for recent files...")
    # Page through the whole day's listing. Capping the request (e.g. MaxKeys=10)
    # would return only the earliest keys of the day, not the most recent ones.
    paginator = s3.get_paginator('list_objects_v2')
    files = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        files.extend(obj['Key'] for obj in page.get('Contents', []))
    
    if files:
        # Keys embed the timestamp, so lexicographic order is chronological order
        files.sort()
        print(f"Found {len(files)} files for today. Showing the 5 most recent:")
        for i, file in enumerate(files[-5:]):
            print(f"   {i+1}. {file}")
        
        latest_file = files[-1]
        print(f"\nLatest file: {latest_file}")
        
    else:
        print(f"No files found for today ({today}). This might be normal if it's early in the day.")
        print("Files are typically updated every 2 minutes. Please try again later.")
            
except Exception as e:
    print(f"Error listing files: {e}")


In [None]:
# Download the latest file (if found) to our working directory
if latest_file:
    local_filename = work_dir / latest_file.split('/')[-1]  # Save to working directory
    
    try:
        print(f"Downloading {latest_file}")
        print(f"Saving to: {local_filename.name}")
        s3.download_file(bucket, latest_file, str(local_filename))
        
        # Check if file was downloaded and get its size
        if local_filename.exists():
            file_size = local_filename.stat().st_size
            print(f"Successfully downloaded: {local_filename.name}")
            print(f"   File size: {file_size:,} bytes ({file_size / (1024*1024):.2f} MB)")
            print(f"   Full path: {local_filename}")
        else:
            print("Download failed: File not found locally")
            
    except Exception as e:
        print(f"Error downloading file: {e}")
else:
    print("No file to download.")


In [None]:
# Download a specific file by timestamp (with automatic even-minute rounding)
# This demonstrates how to download a file for a specific time, even if you specify an odd minute

# Example: Try to get a file for a specific time (even if it has odd minutes)
target_time = datetime.utcnow() - timedelta(hours=1)  # 1 hour ago
print(f"Target time: {target_time.strftime('%Y-%m-%d %H:%M:%S')}")

# Round to even minute and generate filename
rounded_time = round_to_even_minute(target_time)
filename = get_grib2_filename(target_time)
s3_key = f"CONUS/PrecipRate_00.00/{rounded_time.strftime('%Y%m%d')}/{filename}"

print(f"Rounded time: {rounded_time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"S3 key: {s3_key}")
print(f"Filename: {filename}")

# Check if the file exists and download it
try:
    print(f"\nChecking if file exists: {filename}")
    resp = s3.head_object(Bucket=bucket, Key=s3_key)
    print("SUCCESS: File exists!")
    
    # Download the specific file
    local_filename_specific = work_dir / filename
    if not local_filename_specific.exists():
        print(f"Downloading specific file: {filename}")
        s3.download_file(bucket, s3_key, str(local_filename_specific))
        
        file_size = local_filename_specific.stat().st_size
        print(f"SUCCESS: Successfully downloaded: {filename}")
        print(f"   File size: {file_size:,} bytes ({file_size / (1024*1024):.2f} MB)")
    else:
        print(f"File already exists locally: {filename}")
        
except Exception as e:
    print(f"ERROR: File not found or error: {e}")
    print(f"This might happen if the file is too old or not yet available.")
    print(f"Files are typically available for the last few days.")


## Method 2: Direct HTTPS Download (Cross-Platform)

**Platform:** Any (Windows/Mac/Linux)  
**Requirements:** curl, wget, or Python requests (requests installed automatically above)  
**Credentials:** None required  

This method downloads files directly using their public HTTPS URLs.


### Method 2A: Using curl (Terminal/Command Prompt)

**Platform:** Any terminal with curl installed  
**Copy-paste examples:**

**Windows:**
```cmd
curl -O "https://noaa-mrms-pds.s3.amazonaws.com/CONUS/PrecipRate_00.00/20241225/MRMS_PrecipRate_00.00_20241225-120000.grib2.gz"
```

**Mac/Linux:**
```bash
curl -O https://noaa-mrms-pds.s3.amazonaws.com/CONUS/PrecipRate_00.00/20241225/MRMS_PrecipRate_00.00_20241225-120000.grib2.gz
```

**Note:** Replace the date and time in the URL with your desired timestamp. See the timestamp requirements section below.
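
As a rough illustration of that substitution, the sketch below builds the download URL from a Python `datetime`. The helper name `build_preciprate_url` is hypothetical and not part of this notebook's code. It assumes the URL layout shown in the curl examples above, treats the input as UTC, and floors odd minutes to the previous even minute.

```python
from datetime import datetime

# Bucket endpoint taken from the curl examples in this section
BASE_URL = "https://noaa-mrms-pds.s3.amazonaws.com"

def build_preciprate_url(dt: datetime) -> str:
    """Hypothetical helper: HTTPS URL of the PrecipRate file for `dt` (assumed UTC)."""
    # Floor to the previous even minute and drop seconds
    dt = dt.replace(minute=dt.minute - dt.minute % 2, second=0, microsecond=0)
    date_str = dt.strftime("%Y%m%d")
    time_str = dt.strftime("%H%M%S")
    return (f"{BASE_URL}/CONUS/PrecipRate_00.00/{date_str}/"
            f"MRMS_PrecipRate_00.00_{date_str}-{time_str}.grib2.gz")

example_url = build_preciprate_url(datetime(2024, 12, 25, 12, 1, 30))
print(example_url)
```

The printed URL can be pasted into the `curl -O` command in place of the hard-coded example.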


### Method 2B: Using wget (Linux/Mac Terminal)

**Platform:** Linux/Mac with wget installed

**Example:**
```bash
wget https://noaa-mrms-pds.s3.amazonaws.com/CONUS/PrecipRate_00.00/20241225/MRMS_PrecipRate_00.00_20241225-120000.grib2.gz
```

**Install wget on Mac (if needed):**
```bash
brew install wget
```


### Method 2C: Using Python requests (Python Only)

**Platform:** Python only  
**Requirements:** requests library  

**Interactive Python Code:**


In [None]:
import requests

# Method A: Download the latest file found above
if latest_file:
    https_url = f"https://noaa-mrms-pds.s3.amazonaws.com/{latest_file}"
    local_filename_https = work_dir / f"https_{latest_file.split('/')[-1]}"
    
    print(f"Method A - Downloading latest file via HTTPS: {https_url}")
    print(f"Saving to: {local_filename_https.name}")
    
    try:
        response = requests.get(https_url, stream=True, timeout=30)  # timeout avoids hanging on a stalled connection
        response.raise_for_status()  # Raise an exception for bad status codes
        
        with open(local_filename_https, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        
        file_size = local_filename_https.stat().st_size
        print(f"SUCCESS: Successfully downloaded via HTTPS: {local_filename_https.name}")
        print(f"   File size: {file_size:,} bytes ({file_size / (1024*1024):.2f} MB)")
        
    except Exception as e:
        print(f"ERROR: Error downloading via HTTPS: {e}")
else:
    print("No latest file available to download via HTTPS.")

print("\n" + "="*60)

# Method B: Download by constructing URL manually (with even-minute rounding)
print("Method B - Manual URL construction with timestamp rounding:")

# Example: Create URL for a specific time (demonstrating even-minute rounding)
specific_time = datetime.utcnow() - timedelta(minutes=30)  # 30 minutes ago
rounded_time = round_to_even_minute(specific_time)
manual_filename = get_grib2_filename(specific_time)

# Construct the HTTPS URL manually
date_str = rounded_time.strftime('%Y%m%d')
manual_url = f"https://noaa-mrms-pds.s3.amazonaws.com/CONUS/PrecipRate_00.00/{date_str}/{manual_filename}"

print(f"Original time: {specific_time.strftime('%H:%M:%S')}")
print(f"Rounded time: {rounded_time.strftime('%H:%M:%S')}")
print(f"Manual URL: {manual_url}")
print(f"Filename: {manual_filename}")

# Try to download the manually constructed URL
local_filename_manual = work_dir / f"manual_{manual_filename}"

try:
    print(f"\nAttempting to download: {manual_filename}")
    response = requests.get(manual_url, stream=True, timeout=30)  # timeout avoids hanging on a stalled connection
    response.raise_for_status()
    
    with open(local_filename_manual, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    
    file_size = local_filename_manual.stat().st_size
    print(f"SUCCESS: Successfully downloaded manually constructed URL!")
    print(f"   File size: {file_size:,} bytes ({file_size / (1024*1024):.2f} MB)")
    
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 404:
        print(f"ERROR: File not found (404). The file might be too old or not yet available.")
        print(f"   Files are typically available for the last few days.")
    else:
        print(f"ERROR: HTTP Error: {e}")
except Exception as e:
    print(f"ERROR: Error downloading manually constructed URL: {e}")


## Summary: Even-Minute Timestamp Requirement

**Important:** NOAA MRMS PrecipRate files are only available at **even-minute intervals** (00, 02, 04, 06, 08, etc.).

### Why Even Minutes Only?

- **Data Collection**: NOAA generates precipitation data every 2 minutes
- **File Naming**: Files are timestamped at the exact 2-minute intervals when data is processed
- **No Interpolation**: There are no files for odd minutes (01, 03, 05, etc.)

### What This Means for Your Code

**Valid timestamps:**
- `120000` (12:00:00)
- `120200` (12:02:00)
- `120400` (12:04:00)
- `120600` (12:06:00)

**Invalid timestamps:**
- `120100` (12:01:00)
- `120300` (12:03:00)
- `120500` (12:05:00)
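
The valid/invalid split above can be checked mechanically. The following is a minimal sketch, assuming the `HHMMSS` form used in the filenames; `is_valid_preciprate_time` is a hypothetical name introduced only for this illustration.

```python
from datetime import datetime

def is_valid_preciprate_time(hhmmss: str) -> bool:
    """Hypothetical check: True for a six-digit time on an even minute with zero seconds."""
    if len(hhmmss) != 6:
        return False
    try:
        t = datetime.strptime(hhmmss, "%H%M%S")
    except ValueError:
        # Not a parseable time of day (non-digits, hour > 23, etc.)
        return False
    return t.minute % 2 == 0 and t.second == 0

for stamp in ("120000", "120200", "120100", "120300"):
    print(stamp, is_valid_preciprate_time(stamp))
```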

### Utility Functions Provided

The notebook includes helper functions to handle this automatically:

- `round_to_even_minute(datetime)` - Rounds any timestamp down to the previous even minute
- `get_grib2_filename(datetime)` - Generates the correct filename with even-minute rounding

### Best Practices

1. **Always round timestamps** before constructing file paths or URLs
2. **Use the provided utility functions** for consistency
3. **Test with recent timestamps** (files are typically available for the last few days)
4. **Handle 404 errors gracefully** when files might not exist yet
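
One way to combine practices 1 and 4 is to step back in 2-minute increments until a file is found. The sketch below is illustrative only: `find_latest_available` and its `exists` callback are hypothetical names, and the callback stands in for a real check such as `s3.head_object` or an HTTP HEAD request.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

def find_latest_available(start: datetime,
                          exists: Callable[[datetime], bool],
                          max_steps: int = 15) -> Optional[datetime]:
    """Walk back from `start` in 2-minute steps until `exists` reports a file."""
    # Floor to the previous even minute before probing
    t = start.replace(minute=start.minute - start.minute % 2, second=0, microsecond=0)
    for _ in range(max_steps):
        if exists(t):
            return t
        t -= timedelta(minutes=2)
    return None  # nothing found within the search window

# Stand-in availability check: pretend files up to 12:04 UTC have been published
published_until = datetime(2024, 12, 25, 12, 4)
found = find_latest_available(datetime(2024, 12, 25, 12, 9, 41),
                              exists=lambda t: t <= published_until)
print(found)
```

Capping the search with `max_steps` keeps a long outage from turning into an unbounded run of requests.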


## Method 3: Production Implementation Reference

**Platform:** Python  
**Requirements:** boto3, specialized libraries for processing  

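The excerpt below shows how the production code (`grib2_processor.py`) handles downloads. Both functions are methods of the processor class; `self.output_dir`, `self.get_grib2_filename`, and `logger` are used here but are set up in parts of the module not shown in this excerpt.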

```python
def __init__(self, output_dir=None):
    self.s3_client = boto3.client('s3', 
        region_name='us-east-1', 
        config=Config(signature_version=UNSIGNED, max_pool_connections=50))
    self.bucket_name = 'noaa-mrms-pds'

def download_grib2(self, utc_time):
    """Download GRIB2 file from S3 for a given UTC time"""
    date_str = utc_time.strftime('%Y%m%d')
    filename = self.get_grib2_filename(utc_time)
    s3_key = f"CONUS/PrecipRate_00.00/{date_str}/{filename}"
    
    local_path = self.output_dir / filename
    if local_path.exists():
        return local_path
        
    try:
        self.s3_client.download_file(self.bucket_name, s3_key, str(local_path))
        return local_path
    except Exception as e:
        logger.error(f"Error downloading S3 object {s3_key}: {e}")
        return None
```

### Key Production Features

1. **Concurrent Downloads**: Uses asyncio semaphores to limit simultaneous downloads  
2. **Retry Logic**: Implements exponential backoff for failed downloads
3. **File Management**: Tracks downloaded files for cleanup
4. **Error Handling**: Comprehensive logging and error recovery
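
Features 1 and 2 can be sketched together as follows. This is not the production code. It is a minimal illustration, assuming a caller-supplied `fetch` coroutine (hypothetical; a real one might wrap `s3_client.download_file` in `asyncio.to_thread`). The names `download_with_retry` and `download_all` are invented for this example.

```python
import asyncio
import random

async def download_with_retry(key, fetch, semaphore, max_attempts=4, base_delay=0.5):
    """Call `fetch(key)` under a concurrency limit, retrying with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            async with semaphore:  # only a limited number of fetches run at once
                return await fetch(key)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each retry (base, 2*base, 4*base, ...) plus a little jitter
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay / 10)
            await asyncio.sleep(delay)

async def download_all(keys, fetch, max_concurrent=5, base_delay=0.5):
    """Download every key, returning results (or exceptions) in input order."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [download_with_retry(k, fetch, semaphore, base_delay=base_delay) for k in keys]
    return await asyncio.gather(*tasks, return_exceptions=True)

# Stand-in fetch that fails once per key, then succeeds
_seen = set()

async def flaky_fetch(key):
    if key not in _seen:
        _seen.add(key)
        raise ConnectionError("simulated transient failure")
    return f"downloaded {key}"

results = asyncio.run(download_all(["a.grib2.gz", "b.grib2.gz"], flaky_fetch, base_delay=0.01))
print(results)
```

The semaphore is released while a task waits out its backoff delay, so a failing download does not hold a slot that another file could use. Inside a notebook cell, where an event loop is already running, `await download_all(...)` would take the place of `asyncio.run(...)`.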


## File Naming and S3 Structure

**File naming pattern:**
```
MRMS_PrecipRate_00.00_{YYYYMMDD}-{HHMMSS}.grib2.gz
```

**S3 path structure:**
```
noaa-mrms-pds/CONUS/PrecipRate_00.00/{YYYYMMDD}/{filename}
```

**Example:**
```
noaa-mrms-pds/CONUS/PrecipRate_00.00/20241225/MRMS_PrecipRate_00.00_20241225-120000.grib2.gz
```
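
Going the other direction, a key or filename can be turned back into a timestamp. The sketch below assumes the naming pattern shown above and treats the embedded time as UTC, as the rest of this notebook does; `parse_preciprate_key` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone

# Matches the path structure and filename pattern documented above
KEY_PATTERN = re.compile(
    r"CONUS/PrecipRate_00\.00/(?P<day>\d{8})/"
    r"MRMS_PrecipRate_00\.00_(?P<date>\d{8})-(?P<time>\d{6})\.grib2\.gz$"
)

def parse_preciprate_key(key: str) -> datetime:
    """Hypothetical helper: extract the UTC timestamp encoded in a PrecipRate key."""
    m = KEY_PATTERN.search(key)
    # The folder date and the filename date should agree
    if m is None or m["day"] != m["date"]:
        raise ValueError(f"Not a PrecipRate key: {key}")
    stamp = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)

example_key = ("noaa-mrms-pds/CONUS/PrecipRate_00.00/20241225/"
               "MRMS_PrecipRate_00.00_20241225-120000.grib2.gz")
parsed = parse_preciprate_key(example_key)
print(parsed.isoformat())
```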


## Summary

### Available Methods

| Method | Platform | Requirements | Best For |
|--------|----------|--------------|----------|
| **Method 1: boto3** | Python | boto3 library | Automated applications, production |
| **Method 2A: curl** | Any | curl installed | Quick downloads, scripting |
| **Method 2B: wget** | Linux/Mac | wget installed | Terminal downloads |
| **Method 2C: requests** | Python | requests library | Python integration |
| **Method 3: Production** | Python | Custom modules | Large-scale processing |

### Key Information

- **Access**: Public S3 bucket - no AWS credentials required
- **File format**: Compressed `.grib2.gz` files  
- **Update frequency**: New files every 2 minutes
- **Timestamp requirement**: Even-minute intervals only (00, 02, 04, 06, 08, etc.)
- **Bucket**: `noaa-mrms-pds`
- **Path**: `CONUS/PrecipRate_00.00/{YYYYMMDD}/{filename}`
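
Because the files arrive gzip-compressed, they usually need to be unpacked before a GRIB2 reader can open them. The sketch below uses only the Python standard library; `gunzip_file` is a hypothetical helper name, and the demo runs on a throwaway file rather than real MRMS data so it works without network access.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def gunzip_file(gz_path: Path) -> Path:
    """Decompress `name.grib2.gz` to `name.grib2` alongside it and return the new path."""
    out_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files stay out of memory
    return out_path

# Demo on a temporary file with placeholder bytes (not real GRIB2 content)
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "MRMS_PrecipRate_00.00_20241225-120000.grib2.gz"
    with gzip.open(sample, "wb") as f:
        f.write(b"GRIB demo payload")
    unpacked = gunzip_file(sample)
    unpacked_name, unpacked_bytes = unpacked.name, unpacked.read_bytes()

print(unpacked_name)
```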
