# NDVI Calculation

You will work with Sentinel satellite imagery to display the **Red (R)** and **Near-InfraRed (NIR)** bands of a specific, fairly small area (for example, $100\text{ m}^2$).

Please complete the first section (**Setup**) before the workshop starts.

## Setup

1. Ensure your project directory is properly organized.

        project-directory/
        │
        ├── Images/
        │
        ├── raw/
        │   ├── ndvi/
        │   ├── ndwi/
        │   └── time_series/
        │
        └── script.py

    We will use the `raw/` folder to store the input raster files (Sentinel TIFF images).


2. Ensure that the necessary libraries are installed and configured in your Python environment.

In [None]:
# Install required packages (uncomment to install if needed)
# !pip install rasterio numpy matplotlib
# Import libraries
import rasterio
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
import re

# Print versions of the libraries to ensure compatibility
print(f"Rasterio version: {rasterio.__version__}")
print(f"Numpy version: {np.__version__}")

These resources provide comprehensive information on usage, functions, and examples for each library:

- [**rasterio**](https://rasterio.readthedocs.io/): For reading and processing raster data
- [**numpy**](https://numpy.org/doc/): For numerical operations
- [**pathlib**](https://docs.python.org/3/library/pathlib.html): For handling file paths
- [**matplotlib**](https://matplotlib.org/stable/contents.html): For data visualization
- [**re (Regular Expressions)**](https://docs.python.org/3/library/re.html): For handling regular expressions
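The folder layout from the first setup item can also be created from Python. The snippet below is a minimal sketch that uses only `pathlib`; the helper name `create_project_layout` is hypothetical, and the demo runs inside a temporary directory so that it does not touch your real project.

```python
import tempfile
from pathlib import Path


def create_project_layout(root):
    """Create the workshop folders under `root` (hypothetical helper)."""
    root = Path(root)
    subfolders = ["Images", "raw/ndvi", "raw/ndwi", "raw/time_series"]
    for sub in subfolders:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        (root / sub).mkdir(parents=True, exist_ok=True)
    # Return the created folders, relative to the project root
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_layout(Path(tmp) / "project-directory")
    print(created)
```

To build the real layout, call the helper with the path of your own project directory instead.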

## Step-by-Step Guide for Geospatial Analysis

### Step 1: Accessing Sentinel Imagery

- **[SentiWiki](https://sentiwiki.copernicus.eu/web/sentiwiki)** will help you choose a satellite to get the data from.
- Once selected, visit the [Copernicus Data Space Ecosystem](https://dataspace.copernicus.eu/) to access the data.
- Which bands do you need for an NDVI analysis?
- Start by downloading a single image. Choose an Area of Interest (crop to AOI).

### Step 2: Download Bands

Identify and download the bands required to calculate the **NDVI**. When downloading:

- Use coordinates (latitude and longitude) to define the exact location.
- Select the **TIFF** format with **16-bit** depth.
- Choose a resolution suitable for your analysis, such as **MEDIUM** (682 x 413 px is set automatically).
- Use the `pathlib` library to handle the files, then load the data and characterize it.
- Discuss what px refers to.

**A Raster File:**

A raster file is an image file that contains data organized in a grid of cells, often representing geographic data. In satellite imagery, each cell (or pixel) in the grid has a specific value that can represent various physical characteristics, such as light reflectance in different bands (e.g., Red, NIR, Blue, Green, etc.).

The `.tiff` files (often GeoTIFF format) commonly used in remote sensing contain such raster data. They typically include metadata that describes the spatial reference, coordinate system, and the specific bands of data (like B4 for Red and B8 for NIR in Sentinel-2 imagery).

**How deep do you want to go?** 

- You can download the data from the GUI or develop a Python API.
- Load and analyze the content of the data.
- Examine various attributes of the dataset object to understand its structure and properties: `type(src).__name__`, `src.width`, `src.height`, `src.crs`, `src.count`, `src.meta`.
- In particular, understand `type(src.read()).__name__`. Check the outputs of `src.read().shape` and `src.read(1).shape`.
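To make the last point concrete without needing a file, the sketch below imitates the arrays that rasterio returns. `src.read()` gives a 3-D NumPy array ordered as (bands, rows, columns), while `src.read(1)` gives the 2-D array of the first band. The sizes are assumptions for illustration: two bands, with 682 taken as the width and 413 as the height.

```python
import numpy as np

# Synthetic stand-in for src.read(): 2 bands, 413 rows, 682 columns
count, height, width = 2, 413, 682
all_bands = np.zeros((count, height, width), dtype=np.uint16)

# Stand-in for src.read(1): rasterio band indexes start at 1,
# so band 1 corresponds to index 0 along the first axis
first_band = all_bands[0]

print(type(all_bands).__name__)  # ndarray
print(all_bands.shape)           # (2, 413, 682)
print(first_band.shape)          # (413, 682)
```

Note that the shape lists rows (height) before columns (width), the reverse of the usual "width x height" way of quoting an image size.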

In [None]:
# Get data from Sentinel-2 L2A
directory_path = Path('raw/ndvi')
# Get the TIFF file names
for file_path in directory_path.glob("*.tiff"):
    print(file_path)
print('-'*20)

for file_path in directory_path.glob("*B*.tiff"):
    print('Choose a file:')
    print(file_path)
# After the loop, file_path holds the last matching band file
print('*'*20)

# Type the solution here:
# ............................


### Step 3: Explore, Characterize and Visualize the Data

Begin by defining a function that reads a band from a .tiff file and returns both the band data and its metadata. You can use Rasterio again, the Python library designed for reading and writing geospatial raster data. Rasterio provides tools to work with GeoTIFF files efficiently and integrates well with numerical libraries such as NumPy. Convert the data to `float32`, as it will be required for further calculations. Refer to the function's docstring for guidance.

Try the function.

After implementing this function, complete the subsequent cell to understand the outputs of `.min()`, `.max()` and `.shape`. This will provide insight into the data you’re handling. 

Finally, visualize the bands to gain a comprehensive understanding of the dataset. You can use `.imshow()` with a suitable colormap, such as `cmap="RdYlGn"`, to display the images side by side. Activate the `axis` and explain what it shows. Interpret the bands (see below).
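The `float32` conversion requested above matters because 16-bit TIFF values are unsigned integers, and unsigned arithmetic wraps around instead of going negative. The sketch below shows the effect on made-up pixel values; it is an illustration of the problem, not part of the exercise solution.

```python
import numpy as np

# Made-up 16-bit pixel values; the second pixel has NIR < Red
nir = np.array([[3000, 500]], dtype=np.uint16)
red = np.array([[1000, 800]], dtype=np.uint16)

# Unsigned subtraction wraps around: 500 - 800 becomes 65236
wrapped = nir - red

# Converting first gives the expected signed result: -300.0
correct = nir.astype("float32") - red.astype("float32")

print(wrapped)
print(correct)
```

Any index built from a difference of bands, such as NDVI, would be silently wrong on the wrapped values.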

In [3]:
def read_band(file_path):
    """
    Reads the first band from a TIFF file.
    - Parameters:
        file_path (Path or str): Full path to the file.
    - Returns: tuple, (band_data, metadata)
        band_data (numpy.ndarray): NumPy array of the band values, converted to float32.
        metadata (dict): Metadata of the file.
    """

    # Type the solution here:
    #........................


In [4]:
# Try the function here:
#.......................

In [None]:
directory_path = Path('raw/ndvi')

for file_path in directory_path.glob("*.tiff"):
    print(file_path.name)

    if "B08" in file_path.name:  # Check if the file corresponds to B08
        B08, _ = read_band(file_path)
        
        # Complete the code below:
        #.........................
        print("- Band shape:", ##COMPLETE THE CODE HERE##)
        print('- B8 max', ##COMPLETE THE CODE HERE##)
        print('- B8 min', ##COMPLETE THE CODE HERE##) 
        print('')

    elif "B04" in file_path.name:  # Check if the file corresponds to B04
        B04, _ = read_band(file_path)
        
        # Complete the code below:
        #.........................
        print("- Band shape:", ##COMPLETE THE CODE HERE##)
        print('- B4 max', ##COMPLETE THE CODE HERE##)
        print('- B4 min', ##COMPLETE THE CODE HERE##)


In [None]:
# Visualize both bands side by side
# Type the solution here:
# .......................


#### What are these Red and NIR bands?

In remote sensing, Red and NIR are two specific regions of the light spectrum captured by sensors on satellites (or drones):

- **Red Band** measures the reflectance in the visible red light spectrum (*600–700* nm in wavelength). Vegetation absorbs most of this red light for photosynthesis (**lower values**), making it a key indicator of plant health. In other words, the amount of red light reflected is inversely related to the amount of chlorophyll in the plants. Bare soil or non-vegetated areas reflect more red light (**higher values**). Check sand and water.

- **Near-InfraRed Band** captures the region of light that is just beyond the visible spectrum (*700–1000* nm). Healthy vegetation reflects a large amount of NIR light, as it is not used in photosynthesis. This high reflectance (**higher values**) in the NIR band is a characteristic of healthy, dense vegetation.

![Light Spectrum](Images/Light_Spectrum.png)

Band values are often stored as scaled reflectance, ranging from 0 to a maximum that depends on the data type (for example, 65535 for unsigned 16-bit integers).

### Step 4: NDVI calculation

The **Normalized Difference Vegetation Index (NDVI)** is a widely used remote-sensing index in agriculture and land classification for analyzing vegetation health and land cover.

NDVI is calculated based on the reflectance of light in the **Red (R)** and **Near-InfraRed (NIR)** bands. Healthy vegetation absorbs most of the visible light (particularly red) and reflects a significant amount of NIR light, while unhealthy or sparse vegetation, and human-made surfaces, reflect more R and less NIR.

#### Formula

**NDVI always ranges from -1 to +1**, and it is calculated from the NIR and Red channels as follows:

$$\text{NDVI} = \frac{(\text{NIR} - \text{RED})}{(\text{NIR} + \text{RED})}\qquad\qquad(1)$$

where:
- **NIR** is the reflectance in the near-infrared band,
- **RED** is the reflectance in the red band.

The result is a value between -1 and 1:
- **Values closer to 1** indicate dense, healthy vegetation.
- **Values around 0** typically correspond to bare soil, rock, sand, or built-up areas, with low positive values indicating sparse vegetation.
- **Negative values** typically indicate water.
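As a worked example of formula (1), the sketch below computes NDVI for three pixels. The reflectance values are made up for illustration and are not taken from any Sentinel image.

```python
import numpy as np

# Illustrative reflectances (assumed values, not real measurements)
labels = ["dense vegetation", "bare soil", "water"]
nir = np.array([0.50, 0.30, 0.02], dtype="float32")
red = np.array([0.08, 0.25, 0.05], dtype="float32")

# Formula (1), applied element by element
ndvi = (nir - red) / (nir + red)

for label, value in zip(labels, ndvi):
    print(f"{label}: {value:+.2f}")
# dense vegetation: +0.72
# bare soil: +0.09
# water: -0.43
```

For the vegetation pixel, the arithmetic is $(0.50 - 0.08) / (0.50 + 0.08) = 0.42 / 0.58 \approx 0.72$.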

#### Applications
NDVI is used to:
1. **Monitor Vegetation Health**: Detecting stressed or unhealthy vegetation.
2. **Track Seasonal Changes**: Identifying trends in vegetation over time.
3. **Assess Land Use**: Evaluating changes in vegetation cover for land management.

#### Calculate the NDVI with the previous bands analyzed

- **Check for Division by Zero:** Ensure that  $\text{NIR} + \text{RED} \neq 0$  to prevent division by zero errors.
- **Compute the NDVI:** Calculate the NDVI using the formula (1). What is the expected size of the NDVI array?
- **Print NDVI Characteristics:** After computing the NDVI, print its shape, minimum, and maximum values to understand the data range and dimensions.
- **Visualize NDVI:** Create a visual representation of the NDVI array to interpret vegetation health.
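The checklist above can be sketched on a small synthetic array. This is one possible approach, not the only one: it assumes that pixels where NIR + RED is zero should become NaN, and it uses `np.divide` with its `out` and `where` arguments so that the division is skipped for those pixels.

```python
import numpy as np

# Synthetic 2x2 bands; the top-right pixel has NIR + RED == 0
nir = np.array([[0.5, 0.0], [0.3, 0.2]], dtype="float32")
red = np.array([[0.1, 0.0], [0.3, 0.6]], dtype="float32")

total = nir + red
zeros_present = bool(np.any(total == 0))

# Start from NaN and divide only where the denominator is non-zero
ndvi = np.full(total.shape, np.nan, dtype="float32")
np.divide(nir - red, total, out=ndvi, where=total != 0)

print("Zeros in NIR + RED:", zeros_present)
print("NDVI shape:", ndvi.shape)
# nanmin/nanmax ignore the NaN pixel
print("NDVI min/max:", np.nanmin(ndvi), np.nanmax(ndvi))
```

The NDVI array has the same shape as the input bands, because the formula is applied pixel by pixel.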

In [None]:
# Check if any values of nir + red are 0
sum_nir_red = B08 + B04  # Perform the addition

# Complete the code below:
#.........................
zeros_exist =  # Check if any value is 0

print(zeros_exist)

In [None]:
# Calculate & Plot NDVI
# Type the solution here:
# .......................


### Step 5: Additional Indices

Beyond NDVI, various other indices can be derived from processed satellite data to analyze different environmental parameters. One such index is the **Normalized Difference Water Index (NDWI)**, which is particularly useful for monitoring water bodies.

The NDWI is designed to enhance the presence of water features while suppressing the influence of vegetation and soil. It is calculated using the **Green (B03)** and **Near-Infrared (NIR, B08)** bands with the following formula:

$$\text{NDWI} = \frac{\text{Green} - \text{NIR}}{\text{Green} + \text{NIR}} \qquad\qquad(2)$$

In this formula, water has low reflectance in NIR (B08) but higher reflectance in Green (B03 - 559 nm).

Values of NDWI range from -1 to 1: positive values typically indicate the presence of water bodies, while negative values correspond to vegetation, soil, or built-up areas.
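Formulas (1) and (2) share the same structure, so a single helper can serve both indices. The sketch below is one way to write it; the name `normalized_difference` is hypothetical, and the reflectance values are made up for illustration.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where a + b is zero."""
    a = np.asarray(a, dtype="float32")
    b = np.asarray(b, dtype="float32")
    total = a + b
    result = np.full(total.shape, np.nan, dtype="float32")
    np.divide(a - b, total, out=result, where=total != 0)
    return result


# Illustrative pixels: water, vegetation, soil (assumed values)
green = np.array([0.06, 0.10, 0.08])
nir = np.array([0.02, 0.45, 0.12])

ndwi = normalized_difference(green, nir)  # formula (2)
print(ndwi)
```

Calling `normalized_difference(nir, red)` with a red band would give the NDVI of formula (1).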

- Get two images from a water reservoir, such as the Pantà de la Llosa del Cavall in Sant Llorenç de Morunys, Spain.

- Choose images where the water level difference between them is expected to be significant (winter/summer).

- Complete the function `store_images` to manipulate data into a dictionary (`dict`).

- Visualize the NDWI for both images and analyze the differences between the two dates.
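For the date field of `store_images`, a regular expression is one option. The sketch below assumes that the file name contains a date written as `YYYY-MM-DD`; the example file name and the helper `extract_date` are hypothetical, so adapt the pattern to the names of your own downloads.

```python
import re
from datetime import date
from pathlib import Path

# Assumed convention: the first YYYY-MM-DD in the name is the acquisition date
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_date(file_name):
    """Return the first YYYY-MM-DD found in file_name, or None."""
    match = DATE_PATTERN.search(file_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


# Hypothetical file name, for illustration only
example = Path("raw/ndwi/2023-08-14-00_00_2023-08-14-23_59_Sentinel-2_L2A_B03_(Raw).tiff")
print(extract_date(example.name))  # 2023-08-14
```

Returning a `date` object rather than a string makes it easy to sort the images chronologically later.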

In [9]:
def store_images(file_path, images):
    """
    Reads and stores image data and metadata into a dictionary.

    Parameters:
        file_path (Path): Full path to the image file.
        images (dict): Dictionary to store image data.

    Returns:
        None: Updates the 'images' dictionary in place.
    """

    images[file_path] = {
            "data": # Read data. 
            "metadata": # Read metadata. You have a function for doing both :)
            "date": # Extract the date from file_path.name
        }
    
    print(f"Loaded {file_path.name} with date: {images[file_path]['date']}")

In [None]:
# Path to the directory containing the images
directory_path = Path('raw/ndwi')

# Dictionary to store loaded images
images = {}

# Iterate over the files in the directory
for file_path in directory_path.glob("*.tiff"):
    # Complete the code below:
    #.........................


# Check contents of the dictionary



In [None]:
# Plot NDWI
# Type the solution here:
# .......................

### Step 6: Operations Over Images

#### Thresholding NDWI to Isolate Water Bodies

- Analyze the computed NDWI values to understand their range.
- Play with the code below and understand how masks work. If you have an array such as `[0.15, 0.05, 0.30, 0.20]`, you will obtain `[F, F, T, T]` if a `threshold = 0.15` is chosen.
- Implement a function that applies a threshold to classify pixels as water or non-water (binary water mask).
- Visualize the results.

- Key Questions:

    - What happens when you lower or raise the threshold value?
    - Are there any misclassifications (e.g., areas incorrectly identified as water)?
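The first key question can be explored on the small example array from above. The sketch below compares the array against three thresholds chosen for illustration and counts how many values pass each one.

```python
import numpy as np

ndwi_values = np.array([0.15, 0.05, 0.30, 0.20])

counts = {}
for threshold in (0.05, 0.15, 0.25):
    # Element-wise comparison returns a boolean array of the same shape
    mask = ndwi_values > threshold
    # True counts as 1, so the sum is the number of selected values
    counts[threshold] = int(mask.sum())
    print(threshold, mask, counts[threshold])
```

Raising the threshold selects fewer values (3, then 2, then 1 here). A value equal to the threshold is not selected, because the comparison is strict.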

In [None]:
ndvi_real = np.array([0.15, 0.05, 0.30, 0.20])


In [13]:
def apply_mask(data, threshold=0.0):
    """
    Applies a mask to the data based on a threshold.

    Parameters:
        data (numpy.ndarray): The NDWI data to mask.
        threshold (float): The value above which pixels are considered water.

    Returns:
        numpy.ndarray: Binary mask with True for values above the threshold.
    """


In [None]:
# Create a 2x2 grid for visualization


#### Image Subtraction for Change Detection

In remote sensing, comparing images from different time periods can help us monitor changes in environmental features, such as water levels in reservoirs. By subtracting one image from another, we can highlight changes over time and quantify the extent of these changes. Subtract the earlier NDWI image from the later NDWI image:


$$\text{Change} = \text{NDWI}_{\text{later}} - \text{NDWI}_{\text{earlier}}$$

Positive values indicate an increase in water levels, while negative values highlight a decrease in water levels.

The resulting image shows areas where water levels have changed, helping us understand the extent of drying or flooding in a reservoir.

**Can You Subtract Two Masked Images? What Would You Expect to See?**
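One way to explore this question is with two small synthetic masks. NumPy does not support the `-` operator on boolean arrays, so the sketch below converts the masks to integers first. The mask values are made up for illustration.

```python
import numpy as np

# Synthetic water masks for two dates (True = water)
water_earlier = np.array([[True, True], [False, False]])
water_later = np.array([[True, False], [True, False]])

# Subtracting boolean arrays directly raises a TypeError in NumPy
try:
    water_later - water_earlier
    bool_subtraction_works = True
except TypeError:
    bool_subtraction_works = False

# Converting to a signed integer type gives -1, 0 or +1 per pixel
change = water_later.astype("int8") - water_earlier.astype("int8")

print(bool_subtraction_works)
print(change)
```

In the result, +1 marks pixels that became water, -1 marks pixels that stopped being water, and 0 marks pixels that did not change class.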

In [None]:
# Type the code here:

### Step 7: NDVI Analysis for Multiple Files (time series)

1. Start with a single image and track vegetation over time:
    - Open and visualize the **NDVI** for one file.
    - Rescale the raw NDVI values to the interval [-1, 1], for example with a linear mapping of the stored value range.
    - Show both the raw and normalized NDVI images side by side for comparison.
    - Calculate vegetation metrics for each image:
        - **Mean NDVI:** Represents the average vegetation level for the area.
        - **Median NDVI:** Provides a robust measure, less sensitive to extreme values.
        - **High Vegetation Percentage:** Percentage of pixels with NDVI above a threshold (e.g., NDVI > 0.6).

2. Visualize NDVI Over Multiple Years:
    - Represent 9 images from 2016 to 2024 in a 3x3 grid. Use consistent colormap scaling (-1 to 1) for easy comparison.
    - Compute the different metrics for each image.
    - Compile the metrics for all years into a dataset.
    - Create a line graph to show trends across the time series:
        - X-axis: Years (2016–2024).
        - Y-axis: NDVI metrics (Mean, Median, and High Vegetation Percentage).
    - Analyze temporal trends in vegetation health.

3. Key Questions:
    - How has vegetation evolved from 2016 to 2024?
    - Are there clear trends (e.g., recovery, degradation, or seasonality)?
    - What do the metrics reveal about the overall health of the area?
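The rescaling and the metrics from item 1 can be sketched on a tiny synthetic image. This is a hedged illustration, not the reference solution: it assumes the raw values span the full unsigned 16-bit range (0 to 65535) and maps that range linearly to [-1, 1]. The helper names `rescale_to_ndvi` and `summarize_ndvi` are hypothetical.

```python
import numpy as np


def rescale_to_ndvi(raw, raw_min=0.0, raw_max=65535.0):
    """Map raw values linearly from [raw_min, raw_max] to [-1, 1]."""
    raw = np.asarray(raw, dtype="float32")
    return 2.0 * (raw - raw_min) / (raw_max - raw_min) - 1.0


def summarize_ndvi(ndvi, threshold=0.6):
    """Return the mean, the median and the percentage of pixels above threshold."""
    return {
        "mean": float(np.nanmean(ndvi)),
        "median": float(np.nanmedian(ndvi)),
        # The mean of a boolean array is the fraction of True values
        "high_veg_pct": float(np.mean(ndvi > threshold) * 100.0),
    }


# Synthetic raw image (assumed 16-bit scaling)
raw = np.array([[0, 65535], [32768, 58982]], dtype=np.uint16)
ndvi = rescale_to_ndvi(raw)
metrics = summarize_ndvi(ndvi)
print(ndvi)
print(metrics)
```

Running `summarize_ndvi` once per year and collecting the results in a list or dictionary gives the dataset needed for the line graph.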

In [16]:
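# Function to normalize NDVI
def normalize_ndvi(ndvi_raw):
    """Rescales raw NDVI values to the interval [-1, 1]."""
    # Type the solution here:
    # .......................

# Function to calculate NDVI metrics
def calculate_metrics(ndvi_normalized, thresholds=(0.5, 0.6, 0.7)):
    """Calculates the mean, the median and the high-vegetation percentage per threshold."""
    # Type the solution here:
    # .......................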

In [None]:
# Root directory
directory_path = Path('raw/time_series')


Mask to compute only the pixels above the threshold:

In [None]:
ndvi_real = np.array([[0.15, 0.05, 0.30, 0.20],
                      [0.10, 0.35, 0.20, 0.20]])
print('Array:\n', ndvi_real)
mask = 
print('Mask:\n', mask)
filtered_values = 
print('Filtered values:', filtered_values)
print('Filtered values mean:', filtered_values.mean())

In [None]:
# Variables to store NDVI mean values


In [None]:
# Plot NDVI averages vs. year
