# Processing and Correcting NEON Hyperspectral Flight Lines for Scalable Spectral Data Analysis

Welcome to this vignette! This guide provides a detailed walkthrough for processing NEON (National Ecological Observatory Network) flight line data, taking you from raw downloads to actionable outputs. The workflow includes converting raw NEON flight lines into ENVI-compatible formats, applying essential data corrections (such as topographic and BRDF adjustments), and extracting hyperspectral data to build comprehensive tables for numerical and statistical analysis.

This workflow is designed for large datasets: it keeps memory use in check and preserves data integrity so that processing fits within a single machine with 250 GB of RAM. By the end of this guide, you will have the tools and understanding to transform raw hyperspectral data into corrected, high-quality datasets ready for advanced ecological and environmental research.

---

## Table of Contents
1. [Introduction](#1-introduction)
2. [Prerequisites](#2-prerequisites)
3. [Python Setup](#3-python-setup)
4. [Understanding and Finding NEON Flight Lines](#4-understanding-and-finding-neon-flight-lines)
5. [Running the `jefe` Function](#5-running-the-jefe-function)
6. [Handling Large Data Processing](#handling-large-data-processing)
7. [Conclusion](#conclusion)
8. [References](#references)

---

## 1. Introduction

Hyperspectral data collected along NEON flight lines provides a high-resolution spectral view of the Earth's surface, capturing detailed information about vegetation, soils, water, and other environmental components. However, NEON distributes flight-line reflectance as HDF5 (`.h5`) files, which must be converted and corrected before the data can be used for meaningful analysis.

This vignette walks step by step through downloading, converting, and correcting NEON flight line data while keeping memory use under control. Along the way, it creates the specific file types needed for both corrections and analysis, bridging the gap between raw hyperspectral data and actionable insights.

---

### **Why Extract Hyperspectral Signals?**
Hyperspectral data is invaluable for ecological and environmental research as it provides detailed spectral signatures across hundreds of bands. By extracting these signals and applying corrections, researchers can:
- **Translate Patterns Across Scales:** Connect fine-scale field measurements to broader regional or global observations.
- **Quantify Environmental Changes:** Monitor vegetation health, water quality, or land cover changes over time.
- **Improve Decision-Making Tools:** Build robust models for ecological resilience, biodiversity, and conservation planning.

---

### **Applications of This Workflow**

This workflow is particularly suited for:
- **Scaling Insights Across Spatial Domains:** Translating fine-scale hyperspectral data to broader landscapes ensures consistency and comparability across scales.
- **Monitoring Environmental Changes:** Creating corrected and high-fidelity datasets to track vegetation health, water quality, or land cover over time.
- **Enabling Cross-Sensor Calibration:** Harmonizing hyperspectral data across platforms by applying consistent corrections and resampling techniques.

---

## 2. Prerequisites

Before you begin, ensure you have the following:

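- **Hardware Requirements:**
  - A machine with at least **250 GB RAM** to handle large datasets efficiently.
  - Ample free disk space: a single reflectance flight line download is about 3.4 GB, and processing writes several additional large files.

- **Software Requirements:**
  - Access to a pre-configured Python environment with necessary libraries installed, including:
    - `geopandas`, `rasterio`, `pandas`, `numpy`, `hytools`, `scikit-learn`, `matplotlib`, `requests`, `h5py`, `ray`, and `spectral` (installed in the setup section).
  - The Earth Lab spectral tools module (`spectral_unmixing_tools_original`), which provides `jefe`.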

- **Data Requirements:**
  - NEON flight line data (downloaded automatically by `jefe`).
  - Flight codes identifying the flight lines you want to process.

- **Additional Tools:**
  - A Jupyter Notebook interface to follow this vignette step by step.

---

## 3. Python Setup

To follow this vignette, you'll need a Python environment configured with the necessary dependencies. The cells below import the core libraries, install `spectral`, and load the custom Earth Lab spectral tools. This guide assumes you are working in a Jupyter Notebook.

### Required Libraries
Ensure the following libraries are available in your environment:

```python
import hytools as ht
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
import nbconvert
import time
```

```python
%pip install spectral
```

Note: you may need to restart the kernel to use updated packages.


### Loading Earth Lab Spectral Tools

```python
# 1. Enable autoreload so edits to the tools module are picked up without restarting the kernel
%load_ext autoreload
%autoreload 2

# 2. Import the custom tools module (it must be importable, e.g. located in the notebook's directory)
import spectral_unmixing_tools_original as el_spectral

# 3. Verify that the tools loaded correctly by printing the module's attributes
print(dir(el_spectral))
```

The printed list (truncated here) should include `jefe` and the other workflow functions:

```text
['ENVIProcessor', 'GradientBoostingRegressor', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'apply_topo_and_brdf_corrections', 'boosted_quantile_plot', 'boosted_quantile_plot_by_sensor', 'box', 'clean_data_and_write_to_csv', 'concatenate_sensors', 'control_function', 'download_neon_file', 'download_neon_flight_lines', 'extract_overlapping_layers_to_2d_dataframe', 'find_raster_files', 'fit_models_with_different_alpha', 'flight_lines_to_envi', 'generate_config_json', 'generate_correction_configs', 'generate_correction_configs_for_directory', 'get_spectral_data_and_wavelengths', 'glob', 'go_forth_and_multiply', 'gpd', 'h5py', 'ht', 'jefe', 'json', 'load_and_combine_rasters', 'load_spectra', 'mask', 'np', 'os', 'pd', 'plot_each_sensor_with_highlight', 'plot_spectral_data', 'plot_with_highlighted_sensors', 'plt', 'prepare_spectral_data', 'process_all_subdirectories', 'process_and_flatten_array', 'process_hdf5_with_neon2envi', 'ran
```


## 4. Understanding and Finding NEON Flight Lines

NEON flight lines are aerial survey paths designed to collect high-resolution spectral data across various ecological sites. These datasets are vital for studying vegetation, soil, water bodies, and other environmental parameters, forming the foundation for many ecological and environmental analyses.

### **How to Find Flight Codes**
To process NEON flight lines with the `jefe` function, you’ll need the flight codes corresponding to your desired data. Follow these steps to find them:
1. **Access the NEON Data Portal:** Visit the [NEON Data Portal](https://data.neonscience.org/) to browse available datasets.
2. **Navigate to Flight Line Data:** Locate the section for flight line spectral data at your site of interest.
3. **Identify Relevant Flight Codes:** Each flight line has associated metadata, including its unique flight code. Record the codes for the lines you wish to process.
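As an alternative to browsing the portal, flight codes can be read from the file names returned by the NEON Data API. The sketch below is an assumption-laden helper, not part of `el_spectral`: it assumes the API endpoint `/api/v0/data/{productCode}/{siteCode}/{yearMonth}` returns JSON with a `data.files` list whose entries have a `name`, and that reflectance files follow the naming pattern of the example line used in this vignette. Check both against the NEON API documentation before relying on it.

```python
import json
import re
import urllib.request

NEON_API = "https://data.neonscience.org/api/v0"

# Assumed naming pattern, based on the example file in this vignette:
# NEON_D13_NIWO_DP1_20200807_170802_reflectance.h5 -> D13_NIWO_DP1_20200807_170802
FLIGHT_FILE_RE = re.compile(r"^NEON_(D\d{2}_[A-Z]{4}_DP1_\d{8}_\d{6})_reflectance\.h5$")


def flight_codes_from_filenames(names):
    """Return the sorted, de-duplicated flight line codes found in a list of file names."""
    codes = set()
    for name in names:
        match = FLIGHT_FILE_RE.match(name)
        if match:
            codes.add(match.group(1))
    return sorted(codes)


def list_flight_codes(site_code, product_code, year_month, timeout=60):
    """Query the NEON Data API for one site/product/month and return its flight line codes."""
    url = f"{NEON_API}/data/{product_code}/{site_code}/{year_month}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return flight_codes_from_filenames(f["name"] for f in payload["data"]["files"])
```

For example, `list_flight_codes("NIWO", "DP1.30006.001", "2020-08")` should return codes in the same form as `'D13_NIWO_DP1_20200807_170802'`, ready to paste into the `flight_lines` parameter. Keeping the file-name parsing in its own function means it can be checked without network access.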

### **Important Considerations**
1. **Data Availability:**
   - NEON’s Airborne Observation Platform (AOP) data is generally available 60 days after the last collection day at a site.
   - Data collection schedules may shift due to weather or logistical factors. For the latest updates, consult the [NEON Flight Schedules and Coverage page](https://www.neonscience.org/data-collection/flight-schedules-coverage).

2. **Data Quality Updates:**
   - NEON regularly updates its data products to address quality concerns or implement new processing methods.
   - Stay informed about updates or changes that could affect your datasets by checking the [AOP Data Availability Notification](https://www.neonscience.org/impact/observatory-blog/aop-data-availability-notification-release-2024).

---

## 5. Running the `jefe` Function

The `jefe` function orchestrates the entire workflow: downloading flight lines, converting them into the file formats needed downstream, applying corrections, and extracting pixel data to build tables.

### Parameters for `jefe`

To use `jefe` effectively, each parameter must be specified accurately. The guide below describes each parameter and where to find the information it requires.

#### **`base_folder` (str)**
- **Description:** The directory where output files will be stored.
- **How to Specify:** Choose or create a directory path on your local system where you want the processed data to be saved.

#### **`site_code` (str)**
- **Description:** The NEON site code representing the specific field site.
- **How to Find:**
  - NEON assigns unique four-letter codes to each field site (e.g., "NIWO" for Niwot Ridge).
  - You can find these codes on the [NEON Field Sites page](https://www.neonscience.org/field-sites/explore).

#### **`product_code` (str)**
- **Description:** The NEON data product code identifying the specific data product.
- **How to Find:**
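  - NEON data products have unique identifiers. This vignette uses "DP1.30006.001", the flight-line spectrometer surface directional reflectance product; other codes identify non-hyperspectral products, such as "DP1.30003.001" for discrete return LiDAR point cloud data.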
  - Browse the [NEON Data Products Catalog](https://data.neonscience.org/data-products/explore) to locate the product code relevant to your research.

#### **`year_month` (str)**
- **Description:** The year and month of data collection in `'YYYY-MM'` format.
- **How to Determine:**
  - Data collection periods vary by site and product. Consult the [NEON Data Availability page](https://data.neonscience.org/visualizations/data-availability) to check when data was collected for your site and product of interest.
  - **Important Note:** Data availability is subject to change due to factors like weather conditions and program planning adjustments.

#### **`flight_lines` (list)**
- **Description:** A list of flight line codes to process.
- **How to Find:**
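  - Flight line codes correspond to specific aerial survey paths and follow the pattern of the example `D13_NIWO_DP1_20200807_170802`: NEON domain, site code, product level, collection date (`YYYYMMDD`), and a time stamp.
  - Access the [NEON Data Portal](https://data.neonscience.org/) and navigate to the desired data product and site.
  - Flight line codes are listed in the metadata associated with each dataset and are embedded in the reflectance file names (e.g., `NEON_D13_NIWO_DP1_20200807_170802_reflectance.h5`).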
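Because a `jefe` run can take hours, a quick format check on the arguments before starting can catch typos early. The helper below is hypothetical (it is not part of `el_spectral`), and its patterns are inferred from the example values in this vignette rather than from any NEON specification, so treat a reported problem as a prompt to double-check, not a hard rule.

```python
import re

# Hypothetical format checks inferred from the example values in this vignette;
# jefe itself may accept or reject other forms.
SITE_RE = re.compile(r"^[A-Z]{4}$")
PRODUCT_RE = re.compile(r"^DP\d\.\d{5}\.\d{3}$")
YEAR_MONTH_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")
FLIGHT_LINE_RE = re.compile(r"^D\d{2}_([A-Z]{4})_DP1_(\d{8})_\d{6}$")


def check_jefe_args(site_code, product_code, year_month, flight_lines):
    """Return a list of problems found; an empty list means every value looks well formed."""
    problems = []
    if not SITE_RE.match(site_code):
        problems.append(f"site_code {site_code!r} is not four capital letters")
    if not PRODUCT_RE.match(product_code):
        problems.append(f"product_code {product_code!r} does not look like DPx.xxxxx.xxx")
    if not YEAR_MONTH_RE.match(year_month):
        problems.append(f"year_month {year_month!r} is not in 'YYYY-MM' format")
    if not flight_lines:
        problems.append("flight_lines is empty")
    for line in flight_lines:
        match = FLIGHT_LINE_RE.match(line)
        if not match:
            problems.append(f"flight line {line!r} does not match the expected pattern")
            continue
        site, date = match.groups()
        # The site and collection month embedded in the code should agree with the other arguments.
        if site != site_code:
            problems.append(f"flight line {line!r} is from site {site}, not {site_code}")
        if date[:6] != year_month.replace("-", ""):
            problems.append(f"flight line {line!r} was not collected in {year_month}")
    return problems
```

With the parameters from the example usage in this section, `check_jefe_args('NIWO', 'DP1.30006.001', '2020-08', ['D13_NIWO_DP1_20200807_170802'])` returns `[]`.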

---




### Example Usage

```python
# jefe takes 3-5 hours to run and creates many large files.
# Make sure you have 200+ GB of RAM and free disk space available.
base_folder = "NIWOT_calibration_flight_08_2020"  # output directory
site_code = 'NIWO'               # Niwot Ridge
product_code = 'DP1.30006.001'   # flight-line spectrometer reflectance
year_month = '2020-08'
flight_lines = [
    'D13_NIWO_DP1_20200807_170802'
]

# Run the jefe function with the example parameters
el_spectral.jefe(base_folder, site_code, product_code, year_month, flight_lines)
```

Sample output (truncated):

```text
Processing flight line: D13_NIWO_DP1_20200807_170802
Data retrieved successfully for 2020-08!
Downloading NEON_D13_NIWO_DP1_20200807_170802_reflectance.h5 from https://storage.googleapis.com/neon-aop-products/2020/FullSite/D13/2020_NIWO_4/L1/Spectrometer/ReflectanceH5/2020080714/NEON_D13_NIWO_DP1_20200807_170802_reflectance.h5

--2024-12-04 22:50:15--  https://storage.googleapis.com/neon-aop-products/2020/FullSite/D13/2020_NIWO_4/L1/Spectrometer/ReflectanceH5/2020080714/NEON_D13_NIWO_DP1_20200807_170802_reflectance.h5
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.217.123, 142.251.33.91, 142.251.33.123, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.217.123|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 3620916313 (3.4G) [application/octet-stream]
Saving to: ‘NEON_D13_NIWO_DP1_20200807_170802_reflectance.h5’

     0K .......... .......... .......... .......... ..........  0%  613K 96m9s
    50K .......... .......... .......... .......... ..........  0% 1.28M 70m34s
   100K .......... .......... .......... .......... ..........  0% 2.16M 55m56s
   150K .......... .......... .......... .......... ..........  0% 2.90M 46m54s
   200K .......... .......... .......... .......... .......... 

Download completed.

Processing: ./NEON_D13_NIWO_DP1_20200807_170802_reflectance.h5
Command executed successfully
Standard Output: Here we GO!

Exporting ancillary data
(HyTools pid=20099) Exported NEON_D13_NIWO_DP1_20200807_170802_reflectance to ENVI format.
(HyTools pid=20099) Ancillary_Imagery/Path_Length /NIWO/Reflectance/Metadata/Ancillary_Imagery/Path_Length
(HyTools pid=20099) to-sensor_Azimuth_Angle /NIWO/Reflectance/Metadata/to-sensor_Azimuth_Angle
(HyTools pid=20099) to-sensor_Zenith_Angle /NIWO/Reflectance/Metadata/to-sensor_Zenith_Angle
(HyTools pid=20099) Logs/Solar_Azimuth_Angle /NIWO/Reflectance/Metadata/Logs/Solar_Azimuth_Angle
(HyTools pid=20099) Logs/Solar_Zenith_Angle /NIWO/Reflectance/Metadata/Logs/Solar_Zenith_Angle
(HyTools pid=20099) Ancillary_Imagery/Slope /NIWO/Reflectance/Metadata/Ancillary_Imagery/Slope
(HyTools pid=20099) Ancillary_Imagery/Aspect /NIWO/Reflectance/Metadata/Ancillary_Imag
```

### What Happens When `jefe` Runs

When you run the `jefe` function, a sequence of operations is executed, and multiple outputs are generated. Here's a detailed breakdown:

1. **Downloading Raw Data:**
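   - The NEON reflectance flight line, distributed as an HDF5 (`.h5`) file (about 3.4 GB for the example line), is downloaded to the specified output directory.
   - The file contains the reflectance data and associated metadata, including the ancillary layers (slope, aspect, and solar and sensor angles) used for corrections.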

2. **Conversion to Multiple Formats:**
   - The downloaded HDF5 file is converted into the additional formats required for analysis.
   - Output files are named systematically by the processing step or correction applied. For example:
     - **`_envi`:** Reflectance data in ENVI format.
     - **`__mask`:** Mask files indicating areas to include or exclude during analysis.
     - **`.hdr`:** Header files describing the structure of the associated data.
     - **`.json`:** Configuration files for corrections and processing steps.

3. **Application of Corrections:**
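   - Topographic (TOPO) corrections compensate for illumination differences caused by terrain slope and aspect, and bidirectional reflectance distribution function (BRDF) corrections compensate for brightness differences caused by varying sun and sensor viewing geometry across the flight line.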
   - Outputs include:
     - **`_brdf_coeffs__envi.json`:** Coefficients for BRDF corrections.
     - **`_topo_coeffs__envi.json`:** Coefficients for topographic corrections.

4. **Data Extraction and Processing:**
   - Spectral data is extracted pixel by pixel and saved as tables (CSV files) for further analysis.
   - Each extraction is written to disk incrementally, so the full table never has to be held in memory.
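The topographic correction step can be illustrated with a small NumPy sketch. This vignette does not state which topographic method `jefe` configures, so the code below implements SCS+C, one widely used method, purely as an illustration; the function names are hypothetical, and slope, aspect, and solar angles are assumed to be in degrees.

```python
import numpy as np


def cosine_incidence(solar_zenith, solar_azimuth, slope, aspect):
    """Cosine of the local solar incidence angle on sloped terrain (all inputs in degrees)."""
    sz, sa, s, a = (np.radians(x) for x in (solar_zenith, solar_azimuth, slope, aspect))
    return np.cos(sz) * np.cos(s) + np.sin(sz) * np.sin(s) * np.cos(sa - a)


def scs_c_correct(reflectance, cos_i, solar_zenith, slope):
    """SCS+C topographic correction of a (pixels, bands) reflectance array.

    For each band, reflectance is regressed on cos_i (reflectance = m * cos_i + b)
    and C = b / m. Each pixel is then scaled by (cos(slope) cos(sz) + C) / (cos_i + C).
    """
    sz, s = np.radians(solar_zenith), np.radians(slope)
    corrected = np.empty(reflectance.shape, dtype=float)
    for band in range(reflectance.shape[1]):
        m, b = np.polyfit(cos_i, reflectance[:, band], 1)  # returns [slope, intercept]
        c = b / m
        corrected[:, band] = reflectance[:, band] * (np.cos(s) * np.cos(sz) + c) / (cos_i + c)
    return corrected
```

Fitting C separately for each band lets the strength of the correction vary with wavelength. On flat pixels (slope 0), `cos_i` equals `cos(solar_zenith)`, so the scaling factor is exactly 1 and those pixels are left unchanged.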

---

### Example Outputs from a Single Flight Line

After running the `jefe` function, the output directory contains processed files at the top level and a folder for the original raw data. Here’s what you can expect for a single trial run:

---

#### **Main Output Directory:**
- **Processed Files:** Includes ENVI-format files, masks, headers, and configuration files. These represent the final processed outputs ready for analysis.
- **Raw Folder:** A subdirectory containing the original reflectance data downloaded from NEON.

| File Name                                             | Description                                         |
|-------------------------------------------------------|-----------------------------------------------------|
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance__envi` | Reflectance data converted to ENVI format.         |
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance__mask` | Mask file for the reflectance data.                |
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance.hdr`   | Header file describing the ENVI data structure.    |
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance__brdf_coeffs__envi.json` | BRDF correction coefficients. |
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance__topo_coeffs__envi.json` | TOPO correction coefficients. |
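ENVI format stores the pixels in a flat binary file and describes its layout (dimensions, data type, band interleave, byte order, wavelengths) in a plain-text `.hdr` header. The sketch below reads a header and memory-maps the binary file with NumPy, so slices are read from disk on demand instead of loading a whole flight line into RAM. It is a minimal reader under stated assumptions (uncompressed data, standard ENVI data-type codes, and the header and data paths passed in explicitly), not the loader `jefe` uses; the `spectral` package installed earlier provides a full-featured ENVI reader.

```python
import re

import numpy as np

# Standard ENVI "data type" codes mapped to NumPy type strings.
ENVI_DTYPES = {1: "u1", 2: "i2", 3: "i4", 4: "f4", 5: "f8", 12: "u2", 13: "u4", 14: "i8", 15: "u8"}

# "key = value" where value is a {...} block (may span lines) or the rest of the line.
HEADER_FIELD = re.compile(r"^[ \t]*([^=\n]+?)[ \t]*=[ \t]*(?:\{([^}]*)\}[ \t]*|(.*))$", re.M)


def read_envi_header(hdr_path):
    """Parse an ENVI .hdr file into a dict of lower-case keys and string values."""
    with open(hdr_path) as f:
        text = f.read()
    if not text.startswith("ENVI"):
        raise ValueError(f"{hdr_path} does not look like an ENVI header")
    header = {}
    for key, braced, plain in HEADER_FIELD.findall(text):
        header[key.lower()] = braced.strip() if braced else plain.strip()
    return header


def open_envi_memmap(hdr_path, data_path):
    """Memory-map ENVI image data as a read-only (lines, samples, bands) array."""
    h = read_envi_header(hdr_path)
    lines, samples, bands = int(h["lines"]), int(h["samples"]), int(h["bands"])
    order = ">" if int(h.get("byte order", "0")) == 1 else "<"
    dtype = np.dtype(order + ENVI_DTYPES[int(h["data type"])])
    offset = int(h.get("header offset", "0"))
    interleave = h.get("interleave", "bsq").lower()
    # On-disk axis order for each interleave type.
    shapes = {"bsq": (bands, lines, samples), "bil": (lines, bands, samples), "bip": (lines, samples, bands)}
    raw = np.memmap(data_path, dtype=dtype, mode="r", offset=offset, shape=shapes[interleave])
    # Reorder axes to (lines, samples, bands); transpose returns a view, so nothing is copied.
    axes = {"bsq": (1, 2, 0), "bil": (0, 2, 1), "bip": (0, 1, 2)}
    return raw.transpose(axes[interleave])
```

Wavelengths can be recovered from the header as `[float(w) for w in header["wavelength"].split(",")]`, and a spectrum is then `image[row, col, :]`.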

---

#### **Raw Folder (Inside the Output Directory):**
- **Original Files:** Contains the raw reflectance data downloaded directly from NEON before any processing steps.

| File Name                                             | Description                                         |
|-------------------------------------------------------|-----------------------------------------------------|
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance`       | Original reflectance data from NEON.               |
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance_ancillary` | Ancillary metadata for corrections.               |
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance_config__envi.json` | Configuration for ENVI data processing. |
| `NEON_D13_NIWO_DP1_20200807_170802_reflectance_config__anc.json`  | Configuration for ancillary corrections. |

---

This structure ensures that:
1. The **processed files** are readily available in the main directory for analysis.
2. The **raw data** is preserved in its original form for reference or reprocessing if needed.

By organizing outputs this way, you can easily navigate between raw and processed data while maintaining a clear workflow history.


### Process Overview

1. **Data Conversion:**
   - Converts NEON reflectance data to formats compatible with ENVI tools and downstream analyses.

2. **Data Corrections:**
   - Applies topographic and BRDF corrections to improve data quality.

3. **Outputs Generated:**
   - Corrected reflectance data.
   - Mask files for regions of interest.
   - Configuration files describing the processing steps.
   - Coefficients for TOPO and BRDF corrections.

By the end of this process, you will have a comprehensive set of files ready for analysis, including corrected reflectance data, metadata, and configurations.

---

## 6. Handling Large Data Processing<a name="handling-large-data-processing"></a>

Processing NEON flight lines involves managing large amounts of spectral data. This workflow incorporates strategies to optimize memory usage and prevent bottlenecks.

### Key Strategies

1. **Chunk Processing:** Processes data in smaller chunks to avoid memory overload.
2. **Direct Disk Writing:** Saves intermediate and final results directly to storage.
3. **Optimized Data Structures:** Uses efficient formats like NumPy arrays and Pandas DataFrames.
4. **Parallel Processing:** Utilizes libraries like `ray` for distributed processing.
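Chunk processing and direct disk writing can be combined as in the sketch below. A (lines, samples, bands) image, such as a memory-mapped ENVI file, is converted to a pixel table one block of rows at a time, and each block is appended to a CSV file before the next block is read. The function name, the `wl_` column naming, and the use of `-9999` as the no-data value are assumptions for illustration, not details taken from `jefe`.

```python
import numpy as np
import pandas as pd


def image_to_csv_in_chunks(image, wavelengths, csv_path, rows_per_chunk=100, nodata=-9999):
    """Write a (lines, samples, bands) image to a CSV pixel table, one block of rows at a time.

    Only one block is held in memory, so `image` can be an np.memmap larger than RAM.
    Pixels whose first band equals `nodata` are skipped. Returns the number of pixels written.
    """
    lines, samples, bands = image.shape
    band_columns = [f"wl_{w:g}" for w in wavelengths]
    written = 0
    first_chunk = True
    for start in range(0, lines, rows_per_chunk):
        # Slicing a memmap reads only these rows from disk.
        block = np.asarray(image[start:start + rows_per_chunk], dtype=float)
        n_rows = block.shape[0]
        rows, cols = np.meshgrid(np.arange(start, start + n_rows), np.arange(samples), indexing="ij")
        pixels = block.reshape(-1, bands)  # one row per pixel, row-major like rows/cols
        keep = pixels[:, 0] != nodata
        table = pd.DataFrame(pixels[keep], columns=band_columns)
        table.insert(0, "col", cols.ravel()[keep])
        table.insert(0, "row", rows.ravel()[keep])
        # Overwrite on the first chunk (writing the header), then append without it.
        table.to_csv(csv_path, mode="w" if first_chunk else "a", header=first_chunk, index=False)
        first_chunk = False
        written += len(table)
    return written
```

Peak memory is set by `rows_per_chunk` rather than by the size of the flight line, which is the trade-off these strategies aim for: smaller chunks use less RAM at the cost of more disk writes.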

---


## 7. Conclusion<a name="conclusion"></a>

This vignette provided a comprehensive, step-by-step guide to processing NEON flight line data, highlighting key techniques and strategies for handling large, complex datasets. The workflow included downloading NEON flight lines, converting them into suitable file formats, applying critical corrections, and extracting hyperspectral data from pixels before writing the results to CSV files for further numerical analysis.

By completing this process, you gain the ability to transform raw NEON airborne data into actionable datasets, enabling robust ecological and environmental research. This workflow is designed to balance efficiency, accuracy, and scalability, so that large datasets can be processed within the memory available on a single machine.

### **Key Takeaways**
1. **Efficient Data Handling:** 
   - From downloading raw flight line data to saving corrected and processed outputs, this workflow demonstrates how to manage large-scale operations effectively.
   - Chunk processing and direct-to-disk writing ensure that memory constraints are respected while maintaining high data fidelity.

2. **Robust Data Corrections:** 
   - The inclusion of topographic and BRDF corrections ensures that the processed data is accurate and reliable for downstream analysis, accounting for variability in reflectance and terrain.

3. **Hyperspectral Data for Analysis:** 
   - The extraction of hyperspectral data from individual pixels provides a valuable resource for detailed numerical and statistical studies, enabling deeper insights into ecological and environmental processes.

4. **Scalability and Reproducibility:** 
   - This workflow is scalable to handle additional flight lines, datasets, and sites, making it a versatile tool for researchers working across diverse geographies and ecological systems.
   - By following standardized steps and leveraging robust tools, you can ensure that your processing is reproducible and aligned with scientific best practices.

---

## 8. References<a name="references"></a>

- **NEON Data Portal:** [https://data.neonscience.org/](https://data.neonscience.org/)
- **GeoPandas Documentation:** [https://geopandas.org/](https://geopandas.org/)
- **Rasterio Documentation:** [https://rasterio.readthedocs.io/](https://rasterio.readthedocs.io/)
- **NumPy Documentation:** [https://numpy.org/doc/](https://numpy.org/doc/)
- **HyTools Documentation:** [https://hytools.readthedocs.io/](https://hytools.readthedocs.io/)
- **Ray Documentation:** [https://docs.ray.io/en/latest/](https://docs.ray.io/en/latest/)
- **NEON Field Sites Page:** [https://www.neonscience.org/field-sites/explore](https://www.neonscience.org/field-sites/explore)
- **NEON Data Products Catalog:** [https://data.neonscience.org/data-products/explore](https://data.neonscience.org/data-products/explore)
- **NEON Data Availability Page:** [https://data.neonscience.org/visualizations/data-availability](https://data.neonscience.org/visualizations/data-availability)
- **NEON Flight Schedules and Coverage:** [https://www.neonscience.org/data-collection/flight-schedules-coverage](https://www.neonscience.org/data-collection/flight-schedules-coverage)
- **AOP Data Availability Notification:** [https://www.neonscience.org/impact/observatory-blog/aop-data-availability-notification-release-2024](https://www.neonscience.org/impact/observatory-blog/aop-data-availability-notification-release-2024)

---