# Tutorial to Download NOAA Satellite Data Files

This tutorial is adapted from a much longer tutorial written in December 2022 by Dr. Amy Huff, IMSG at NOAA/NESDIS/STAR (amy.huff@noaa.gov), and Dr. Rebekah Esmaili, STC at NOAA/JPSS (rebekah.esmaili@noaa.gov), available here: https://github.com/modern-tools-workshop/AMS-python-workshop-2023 . The original demonstrates how to download satellite data files from the GOES-R and JPSS Amazon Web Services (AWS) Simple Storage Service (S3) buckets and from the NOAA/NESDIS/STAR gridded aerosol data archive website. This adaptation covers only how to download JPSS satellite data files from AWS S3 for the fire use case.

The downloaded files will include:
- From the JPSS S3 bucket:
    - NOAA-20 VIIRS L2 Active Fires (AF) I-band Environmental Data Record (EDR) data files for Oct 16, 2022 at 21:16-21:19 UTC (3 files)

(For the entire tutorial, please go to: https://github.com/modern-tools-workshop/AMS-python-workshop-2023 )

## Topic 1: Getting Started with Jupyter Notebook

### Step 1.1: Import Python packages

We will use two Python packages (libraries) and two Python modules in this tutorial:
- The **S3Fs** library is used to set up a filesystem interface with the Amazon Simple Storage Service (S3)
- The **Requests** library is used to send HTTP requests
- The **datetime** module is used to manipulate dates and times
- The **pathlib** module is used to set filesystem paths for the user's operating system

In [None]:
%pip install s3fs  -q # requests

In [None]:
import s3fs

import requests

import datetime

from pathlib import Path

### Step 1.2: Set directory path where satellite data files will be saved

We set the directory path for the satellite data files using the [pathlib module](https://docs.python.org/3/library/pathlib.html#module-pathlib), which automatically uses the correct format for the user's operating system. This helps avoid errors when more than one person uses the same code file, because Windows uses backslashes in directory paths, while macOS and Linux use forward slashes.

To keep things simple for this training, we save the downloaded satellite data files in the current working directory (```Path.cwd()```), i.e., the same Jupyter Notebook folder where this code file is located.

In [None]:
directory_path = Path.cwd()
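
If you would rather keep the downloaded files separate from the notebook, pathlib can also create a dedicated subfolder. The following is a minimal sketch, not part of the original tutorial: the helper name `make_download_directory` and the folder name `viirs_af_data` are hypothetical choices made for illustration.

```python
from pathlib import Path


def make_download_directory(base_directory, folder_name='viirs_af_data'):
    """Create (if needed) and return a subfolder of base_directory for downloads.

    The function name and default folder name are illustrative assumptions.
    """
    # The "/" operator joins path components with the correct separator for the OS
    download_directory = Path(base_directory) / folder_name
    # parents=True creates missing parent folders; exist_ok=True avoids an error on re-runs
    download_directory.mkdir(parents=True, exist_ok=True)
    return download_directory
```

For example, `directory_path = make_download_directory(Path.cwd())` would save the files in a `viirs_af_data` folder next to this notebook instead of in the notebook folder itself.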

## Topic 2: Downloading Satellite Data Files

### Step 2.1: Connect to AWS S3 (Simple Storage Service)

The [NOAA Open Data Dissemination (NODD) program](https://www.noaa.gov/information-technology/open-data-dissemination) is increasing access to NOAA satellite data, including data from the GOES-R geostationary satellites and JPSS polar-orbiting satellites. 

The NODD program disseminates data through collaborations with AWS, Google Earth Engine, and Microsoft Azure. We will use the AWS S3 buckets in this tutorial because they are free to access and do not require any additional registration or a password.

Think of the S3 buckets as online data archives. You do **not** need an AWS cloud computing account to access NOAA satellite data!

The [S3Fs package](https://s3fs.readthedocs.io/en/latest/) allows us to set up a filesystem (```fs```) interface to S3 buckets. We use an anonymous connection (```anon=True```) because the NODD S3 buckets are publicly available and read-only.

In [None]:
fs = s3fs.S3FileSystem(anon=True)

### Step 2.2: Download data from the S3 bucket

The NODD program makes NOAA JPSS polar-orbiting satellite data from the SNPP and NOAA-20 satellites available via [AWS](https://registry.opendata.aws/noaa-jpss/).

There is one S3 bucket that contains all of the JPSS data, which can be viewed in a web browser: 
- [JPSS satellites](https://noaa-jpss.s3.amazonaws.com/index.html)

The JPSS satellites generate an enormous volume of data products, which are gradually being added to the NODD. As a result, JPSS data availability on the NODD varies widely; some JPSS products are not yet included in the NODD, and some products don't have a full archive of files on the NODD. More products are being added all the time, in response to end user requests.

**We thank Lihang Zhou of NOAA/NESDIS/JPSS for her leadership of the JPSS NODD, and Gian Dilawari of NOAA/NESDIS/JPSS and his team for their hard work adding the massive JPSS datasets to the NODD!**

#### Step 2.2.1: Browse the noaa-nesdis-n20-pds S3 bucket

Note that NOAA has moved the Active Fires product to the ```noaa-nesdis-n20-pds``` bucket, so we use that bucket instead of the JPSS bucket.

Data files in the JPSS S3 bucket are organized by satellite (SNPP or NOAA-20) and sensor name. There is also a category for blended products (containing data from both satellites).

In this tutorial, we are going to download three data files, all from the NOAA-20 satellite.

Let's access the ```noaa-nesdis-n20-pds``` bucket, list (```fs.ls```) its top-level folders (stored in ```sensors``` in the code below), and then print the folder names.

In [None]:
# s3://noaa-nesdis-n20-pds/
bucket = 'noaa-nesdis-n20-pds'

sensors = fs.ls(bucket)

for sensor in sensors:
    print(sensor.split('/')[-1])

#### Step 2.3.6: Browse NOAA-20 VIIRS data

We need to download three NOAA-20 VIIRS data files. Let's list (```fs.ls```) the contents of the ```VIIRS-AF-Iband-EDR``` folder to see what is available.

In [None]:
bucket = 'noaa-nesdis-n20-pds'
sensor = 'VIIRS-AF-Iband-EDR'

products_path = bucket + '/' + sensor

products = fs.ls(products_path)

for product in products:
    print(product.split('/')[-1])

#### Step 2.3.7: Browse NOAA-20 VIIRS AF I-band data for October 16, 2022

There are a lot of VIIRS data products: > 80! We are going to download VIIRS Active Fires (AF) I-band Environmental Data Record (EDR) files for October 16, 2022 at 21:16-21:19 UTC, when wildfires in the US Pacific Northwest underwent explosive growth. You will combine these three individual netCDF4 files into one large netCDF4 file, and use the data in this file to plot fire detections on a map.

Let's list (```fs.ls```) the available NOAA-20 VIIRS AF I-band data for October 16, 2022, and then print the total number of files and the first 10 file names.

In [None]:
bucket = 'noaa-nesdis-n20-pds'
sensor = 'VIIRS-AF-Iband-EDR'
year = 2022
month = 10
day = 16

files_path = bucket + '/' + sensor + '/' + str(year) + '/' + str(month).zfill(2)  + '/' + str(day).zfill(2)

files = fs.ls(files_path)

print('Total number of files:', len(files), '\n')

for file in files[:10]:
    print(file.split('/')[-1])
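
The folder path above is assembled from separate year, month, and day values with `zfill` padding. As an alternative sketch, the same path can be built from a single `datetime.date` object, which handles the zero-padding itself. The helper name `build_files_path` is hypothetical; the bucket and product names are the ones used in this step.

```python
import datetime


def build_files_path(bucket, product, date):
    """Return the S3 folder path '<bucket>/<product>/YYYY/MM/DD' for a given date."""
    # strftime zero-pads the month and day, so no zfill() calls are needed
    return bucket + '/' + product + '/' + date.strftime('%Y/%m/%d')


files_path = build_files_path('noaa-nesdis-n20-pds', 'VIIRS-AF-Iband-EDR', datetime.date(2022, 10, 16))
print(files_path)  # prints noaa-nesdis-n20-pds/VIIRS-AF-Iband-EDR/2022/10/16
```

Using a date object also makes it easier to loop over several days, for example by adding `datetime.timedelta(days=1)` to the date on each pass.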

#### Step 2.3.8: Find the NOAA-20 VIIRS AF I-band data files for October 16, 2022 at 21:16-21:19 UTC

We can see there are a lot of VIIRS AF I-band EDR files for October 16: 1,011! This is because the JPSS satellites have global coverage.

We use slicing and a list comprehension to identify the three files we want, using the information in the file names to match starting (```s```) observation times in the range ```2116``` to ```2119```. Then we print the file names to confirm they are the ones we want and check the approximate size of each file (```fs.size```) before we download them.

In [None]:
start_time = '2116'
end_time = '2119'

# Characters 9-12 of the fourth underscore-separated field (the "s" field) hold the start time as HHMM
matches = [file for file in files if start_time <= file.split('/')[-1].split('_')[3][9:13] <= end_time]

for match in matches:
    print(match.split('/')[-1])
    print('Approximate file size (MB):', round((fs.size(match)/1.0E6), 2))
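
The slicing above compares start times as four-character strings. If you need real time arithmetic, the start field can be parsed into a `datetime` object instead. This is a sketch under stated assumptions: the helper names `parse_start_time` and `select_by_start_time` are hypothetical, and the final digit of the field is assumed to be tenths of a second and is ignored. The example file names are the three listed at the end of this tutorial.

```python
import datetime


def parse_start_time(file_name):
    """Return the observation start time encoded in the 's' field of a file name."""
    # Fields are separated by underscores; the fourth one looks like 's202210162116424'
    start_field = file_name.split('/')[-1].split('_')[3]
    # Skip the leading 's' and read YYYYMMDDHHMMSS; the last digit is ignored
    # (assumed here to be tenths of a second)
    return datetime.datetime.strptime(start_field[1:15], '%Y%m%d%H%M%S')


def select_by_start_time(file_names, start, end):
    """Return names whose start time, truncated to the minute, lies in [start, end]."""
    selected = []
    for name in file_names:
        # Dropping the seconds reproduces the HHMM comparison used in this step
        start_minute = parse_start_time(name).time().replace(second=0)
        if start <= start_minute <= end:
            selected.append(name)
    return selected


example_names = [
    'AF-Iband_v1r0_j01_s202210162116424_e202210162118070_c202210162142397.nc',
    'AF-Iband_v1r0_j01_s202210162118082_e202210162119327_c202210162142235.nc',
    'AF-Iband_v1r0_j01_s202210162119340_e202210162120567_c202210162143123.nc',
]

print(parse_start_time(example_names[0]))  # prints 2022-10-16 21:16:42
selected = select_by_start_time(example_names, datetime.time(21, 16), datetime.time(21, 19))
print(len(selected))  # prints 3
```

The same functions accept the full S3 paths in `files`, because only the part after the last `/` is parsed.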

#### Step 2.3.9: Download the NOAA-20 VIIRS AF I-band data files for October 16, 2022 at 21:16-21:19 UTC

We use ```fs.get``` to download the NOAA-20 VIIRS AF I-band files to our local computer, saving each file in ```directory_path``` under its original file name.

In [None]:
for match in matches:
    fs.get(match, str(directory_path / match.split('/')[-1]))
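
Re-running the cell above downloads every file again. One possible refinement, sketched here and not taken from the original tutorial, is to skip files that already exist locally. The helper name `download_missing` is hypothetical; it only assumes that `fs` provides the same `get(remote_path, local_path)` call used above.

```python
from pathlib import Path


def download_missing(fs, remote_paths, directory):
    """Download each remote file into directory unless a local copy already exists.

    Returns the list of local paths that were newly downloaded.
    """
    downloaded = []
    for remote_path in remote_paths:
        # Keep the original file name (the part after the last "/")
        local_path = Path(directory) / remote_path.split('/')[-1]
        if local_path.exists():
            continue  # A local copy is present; skip to save time and bandwidth
        fs.get(remote_path, str(local_path))
        downloaded.append(local_path)
    return downloaded
```

In this notebook it could be called as `download_missing(fs, matches, directory_path)`. Note that this sketch checks only whether a file exists, so an interrupted, partial download would be skipped as well; delete such a file before re-running.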

In [None]:
# do not close this notebook until you see the files 
# AF-Iband_v1r0_j01_s202210162116424_e202210162118070_c202210162142397.nc
# AF-Iband_v1r0_j01_s202210162118082_e202210162119327_c202210162142235.nc
# AF-Iband_v1r0_j01_s202210162119340_e202210162120567_c202210162143123.nc
# ... have been downloaded in your current working directory.
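
Rather than checking the folder by eye, the presence of the three files can be confirmed with pathlib. This is a small sketch: the helper name `find_missing_files` is hypothetical, and the file names are the ones listed in the comment above.

```python
from pathlib import Path


def find_missing_files(directory, file_names):
    """Return the names in file_names that do not exist as files in directory."""
    return [name for name in file_names if not (Path(directory) / name).is_file()]


expected_files = [
    'AF-Iband_v1r0_j01_s202210162116424_e202210162118070_c202210162142397.nc',
    'AF-Iband_v1r0_j01_s202210162118082_e202210162119327_c202210162142235.nc',
    'AF-Iband_v1r0_j01_s202210162119340_e202210162120567_c202210162143123.nc',
]

# Check the current working directory, where this tutorial saves its downloads
missing = find_missing_files(Path.cwd(), expected_files)
print('Missing files:', missing if missing else 'none')
```

An empty result means all three files are in the current working directory and the notebook can be closed safely.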