# Basic Statistics for Raster Data

This notebook demonstrates how to compute basic statistics for raster data using `rasterio` and `numpy` in Python. Statistics such as mean, standard deviation, minimum, maximum, and histograms are essential for understanding remote sensing datasets.

## Prerequisites
- Install required libraries: `rasterio`, `numpy`, `matplotlib` (listed in `requirements.txt`).
- A sample GeoTIFF file (e.g., `sample.tif`). Replace the file path with your own raster file.

## Learning Objectives
- Compute global and per-band statistics for raster data.
- Generate histograms for raster values.
- Handle masked or no-data values in calculations.

In [None]:
# Import required libraries
import rasterio
import numpy as np
import matplotlib.pyplot as plt

## Step 1: Load the Raster File

Load the raster file and read its data. Passing `masked=True` to `read()` returns a NumPy masked array in which no-data pixels are masked, so later calculations can skip them.

In [None]:
# Define the path to the raster file
raster_path = 'sample.tif'

# Open the raster file
with rasterio.open(raster_path) as src:
    raster_data = src.read(masked=True)  # Read with mask for no-data values
    nodata = src.nodata

# Print basic information
print(f'Raster shape: {raster_data.shape}')
print(f'No-data value: {nodata}')
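
To make the masking step concrete, here is a minimal sketch that uses a small synthetic array in place of a file. The no-data value `-9999` and the pixel values are made up for illustration. `np.ma.masked_equal` builds a masked array comparable to what `src.read(masked=True)` returns when the file metadata defines a no-data value.

```python
import numpy as np

# Synthetic 1-band, 2x3 raster; -9999 is an assumed no-data value for illustration
demo_nodata = -9999
demo_raw = np.array([[[10, 20, demo_nodata],
                      [30, demo_nodata, 40]]], dtype=np.int32)

# Mask every pixel equal to the no-data value
demo_masked = np.ma.masked_equal(demo_raw, demo_nodata)

# Count valid and masked pixels
valid_count = int(demo_masked.count())
masked_count = int(np.ma.count_masked(demo_masked))
print(f'Valid pixels: {valid_count}, masked pixels: {masked_count}')
```

Checking the masked-pixel count right after loading is a quick way to confirm that the no-data value was actually picked up.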

## Step 2: Compute Global Statistics

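Calculate statistics across all bands by pooling every valid pixel into a single flattened array (no-data values are ignored). Pooled statistics are most meaningful when all bands share the same units and value range.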

In [None]:
# Flatten all bands into a 1-D array, dropping masked (no-data) pixels
flattened_data = raster_data.compressed()  # Returns only the unmasked values

# Compute statistics
global_min = np.min(flattened_data)
global_max = np.max(flattened_data)
global_mean = np.mean(flattened_data)
global_std = np.std(flattened_data)

# Print global statistics
print(f'Global Min: {global_min}')
print(f'Global Max: {global_max}')
print(f'Global Mean: {global_mean}')
print(f'Global Standard Deviation: {global_std}')
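
If every pixel is masked, `compressed()` returns an empty array and `np.min` / `np.max` raise a `ValueError`. The sketch below wraps the same four statistics in a hypothetical helper, `summarize` (not part of `rasterio` or `numpy`), that returns `None` in that case. It also asks NumPy to accumulate the mean and standard deviation in double precision, which can matter for large `float32` rasters.

```python
import numpy as np

def summarize(masked_array):
    """Return min, max, mean, and std of the unmasked values, or None if there are none."""
    values = np.ma.asarray(masked_array).compressed()
    if values.size == 0:
        return None  # np.min and np.max raise ValueError on empty input
    return {
        'min': values.min(),
        'max': values.max(),
        'mean': np.mean(values, dtype=np.float64),
        'std': np.std(values, dtype=np.float64),
    }

# Synthetic data: -1 stands in for the no-data value
demo = np.ma.masked_equal(np.array([1, 2, 3, -1]), -1)
demo_stats = summarize(demo)
empty_stats = summarize(np.ma.masked_all((2, 2)))
print(demo_stats)
print(empty_stats)
```

In the notebook, `summarize(raster_data)` would be the equivalent call on the loaded raster.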

## Step 3: Compute Per-Band Statistics

Calculate statistics for each band individually.

In [None]:
# Number of bands (src.read() returns an array shaped (bands, rows, cols))
num_bands = raster_data.shape[0]

# Compute statistics per band
for band in range(num_bands):
    band_data = raster_data[band].compressed()
    band_min = np.min(band_data)
    band_max = np.max(band_data)
    band_mean = np.mean(band_data)
    band_std = np.std(band_data)
    print(f'Band {band+1}: Min={band_min}, Max={band_max}, Mean={band_mean}, Std={band_std}')
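
The loop above can also be written without an explicit per-band `compressed()` call. This is a minimal sketch on synthetic data (values and the `-1` no-data marker are made up): it reshapes the masked array to `(bands, pixels)` and reduces along the pixel axis, relying on masked-array methods to skip masked entries.

```python
import numpy as np

# Synthetic 2-band, 2x2 raster with one no-data pixel (-1) per band
demo = np.ma.masked_equal(
    np.array([[[1, 2], [3, -1]],
              [[10, -1], [20, 30]]], dtype=np.float64),
    -1,
)

# Collapse rows and columns into one axis so each row holds one band's pixels
per_band = demo.reshape(demo.shape[0], -1)

# Masked-array reductions ignore masked pixels
band_min = per_band.min(axis=1)
band_max = per_band.max(axis=1)
band_mean = per_band.mean(axis=1)
band_std = per_band.std(axis=1)

for i in range(per_band.shape[0]):
    print(f'Band {i + 1}: Min={band_min[i]}, Max={band_max[i]}, '
          f'Mean={band_mean[i]:.3f}, Std={band_std[i]:.3f}')
```

One difference from the loop version: a band with no valid pixels should come back as a masked entry in the results rather than raising an error, so it is worth checking for masked results before using them.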

## Step 4: Generate Histogram

Create a histogram for a selected band to visualize the distribution of pixel values.

In [None]:
# Select the first band for histogram
band_data = raster_data[0].compressed()

# Plot histogram
plt.figure(figsize=(8, 6))
plt.hist(band_data, bins=50, color='blue', alpha=0.7)
plt.title('Histogram of Band 1')
plt.xlabel('Pixel Value')
plt.ylabel('Frequency')
plt.show()
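
`plt.hist` computes the bin counts internally. When the counts themselves are needed, for example to compare bands on shared bin edges, `np.histogram` returns them without plotting. A minimal sketch on synthetic data (the values and the `-1` no-data marker are assumptions for illustration):

```python
import numpy as np

# Synthetic single band with one no-data pixel (-1)
demo_band = np.ma.masked_equal(
    np.array([[0, 1, 2], [3, 4, -1]], dtype=np.float64), -1
)
values = demo_band.compressed()

# counts has one entry per bin; bin_edges has one more entry than counts
counts, bin_edges = np.histogram(values, bins=5)
print(f'Counts: {counts}')
print(f'Bin edges: {bin_edges}')
```

The counts sum to the number of valid pixels, which makes a handy sanity check that no-data values did not leak into the histogram.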

## Next Steps

- Replace `sample.tif` with your own raster file.
- Compute additional statistics (e.g., median, percentiles) using `np.median` or `np.percentile`.
- Generate histograms for other bands or the entire dataset.
- Proceed to the next notebook (`06_reproject_resample_raster.ipynb`) to learn raster reprojection and resampling.
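
As a starting point for the median and percentile suggestion, here is a minimal sketch on synthetic data (the values and the `-9999` no-data marker are made up). It calls `np.median` and `np.percentile` on the compressed values; `np.ma.median` is the mask-aware alternative when working on the masked array directly.

```python
import numpy as np

# Synthetic single band with one no-data pixel (-9999)
demo = np.ma.masked_equal(
    np.array([[1, 2, 3], [4, 5, -9999]], dtype=np.float64), -9999
)
values = demo.compressed()  # drop masked pixels first

median = np.median(values)
p5, p95 = np.percentile(values, [5, 95])  # linear interpolation by default
print(f'Median: {median}, 5th percentile: {p5}, 95th percentile: {p95}')
```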

## Notes
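- Using `masked=True` excludes no-data values from calculations only when the file defines a no-data value or carries a mask or alpha band. If the no-data value prints as `None` and the file has no mask, every pixel is treated as valid.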
- For large rasters, consider computing statistics in chunks to save memory.
- See `docs/installation.md` for troubleshooting library installation.
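
The chunked approach mentioned above can be sketched as follows. This is one possible scheme under stated assumptions, not a method prescribed by `rasterio`: the helper name `chunked_stats` is hypothetical, and per-chunk counts, means, and sums of squared deviations are merged with the standard pairwise-update formula so that only one chunk is in memory at a time. The demo splits a synthetic array into row blocks.

```python
import numpy as np

def chunked_stats(chunks):
    """Combine min, max, mean, and population std over an iterable of masked arrays."""
    count, mean, m2 = 0, 0.0, 0.0
    lo, hi = np.inf, -np.inf
    for chunk in chunks:
        values = np.ma.asarray(chunk).compressed().astype(np.float64)
        if values.size == 0:
            continue  # chunk holds only no-data pixels
        n_b = values.size
        mean_b = values.mean()
        m2_b = ((values - mean_b) ** 2).sum()
        # Merge this chunk's running totals into the global ones
        delta = mean_b - mean
        total = count + n_b
        mean += delta * n_b / total
        m2 += m2_b + delta ** 2 * count * n_b / total
        count = total
        lo = min(lo, values.min())
        hi = max(hi, values.max())
    if count == 0:
        return None  # no valid pixels in any chunk
    return {'min': lo, 'max': hi, 'mean': mean, 'std': np.sqrt(m2 / count)}

# Synthetic stand-in for a large raster: 4x4 with one no-data pixel (15),
# processed two rows at a time
demo = np.ma.masked_equal(np.arange(16, dtype=np.float64).reshape(4, 4), 15)
row_chunks = [demo[r:r + 2] for r in range(0, demo.shape[0], 2)]
stats = chunked_stats(row_chunks)
print(stats)
```

With a real file, the chunks could instead come from `src.read(1, window=window, masked=True)` for each `window` yielded by `src.block_windows(1)`, kept inside the `with rasterio.open(...)` block. The standard deviation uses the population form (`ddof=0`), matching the `np.std` default used earlier in the notebook.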