<a href="https://colab.research.google.com/github/ulfboge/temporal-landcover-vectorizer/blob/main/scripts/python/merge_ndvi_csv.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NDVI CSV Merger Tool

This notebook merges per-year NDVI CSV files into one consolidated file per area. Each pixel ends up as a single row with one NDVI column per year, which gives a time series that is ready for analysis.

## Features:
- Merges NDVI data from multiple years (2013-2023)
- Processes multiple areas simultaneously
- Maintains spatial coordinates and pixel IDs
- Creates organized output files by area

## Input Format:
Files should follow the naming convention:
- `NDVI_Annual_YYYY_Area_X_vectorized.csv`
  - YYYY: Year (e.g., 2013, 2015, etc.)
  - X: Area number
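
As a rough illustration of this naming convention, the sketch below pulls the year and area out of a filename with a regular expression. The helper name `parse_ndvi_filename` and the pattern are assumptions made for this example. The notebook itself splits the filename on underscores instead.

```python
import re

# Hypothetical pattern for NDVI_Annual_YYYY_Area_X_vectorized.csv (illustration only).
NDVI_NAME = re.compile(r"^NDVI_Annual_(\d{4})_Area_(\d+)_vectorized\.csv$")

def parse_ndvi_filename(filename):
    """Return (year, area) for a matching filename, or None if it does not match."""
    match = NDVI_NAME.match(filename)
    if match is None:
        return None
    # Year as an int, area kept as a string (the notebook also treats it as text).
    return int(match.group(1)), match.group(2)

print(parse_ndvi_filename("NDVI_Annual_2013_Area_1_vectorized.csv"))
print(parse_ndvi_filename("NDVI_Merged_Area_1.csv"))
```

A stricter pattern like this also rejects previously merged outputs or unrelated CSV files that happen to sit in the same folder.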

## Output Format:
- `NDVI_Merged_Area_X.csv`
  - Contains columns: pixel_id, x_coord, y_coord, y2013, y2015, y2017, y2019, y2021, y2023
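
To show how this column layout comes about, here is a minimal sketch of the outer merge on `pixel_id` that the notebook performs, using two small made-up tables. The values are invented for illustration and do not come from real data.

```python
import pandas as pd

# Toy per-year tables; all values are made up for illustration.
df_2013 = pd.DataFrame({
    "pixel_id": [1, 2],
    "x_coord": [10.0, 10.5],
    "y_coord": [60.0, 60.0],
    "y2013": [0.41, 0.55],
})
df_2015 = pd.DataFrame({
    "pixel_id": [2, 3],
    "y2015": [0.58, 0.33],
})

# An outer merge keeps every pixel seen in either year.
merged = df_2013.merge(df_2015, on="pixel_id", how="outer")
print(merged)
```

Because coordinates are taken only from the first file read for an area, a pixel that appears only in a later year (pixel 3 here) ends up with empty `x_coord` and `y_coord` values, as well as an empty value for the year it is missing from.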

## Setup
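First, mount Google Drive so the notebook can read the input CSV files. No extra packages need to be installed: the only third-party library used, pandas, is preinstalled on Colab.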

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Import Libraries and Set Up Directories

In [None]:
# Import required libraries
import os
import glob
import pandas as pd

# Define Base Directories
base_directory = "/content/drive/MyDrive/earthengine/conversion"
csv_folder = os.path.join(base_directory, "csv")
output_folder = os.path.join(csv_folder, "merged")

# Create the output directory if it does not already exist
os.makedirs(output_folder, exist_ok=True)
print(f"Output directory: {output_folder}")

## Find and List CSV Files

In [None]:
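# Get all CSV files, sorted by name so that years are processed in
# chronological order and the merged year columns come out in order
csv_files = sorted(glob.glob(os.path.join(csv_folder, '*.csv')))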

if not csv_files:
    raise FileNotFoundError(f"No CSV files found in '{csv_folder}'.")

print("Found the following CSV files:")
for file in csv_files:
    print(f"  - {os.path.basename(file)}")

## Process and Merge CSV Files

In [None]:
# Dictionary to store dataframes by area
area_data = {}

# Process each CSV file
for file in csv_files:
    filename = os.path.basename(file)
    parts = filename.split('_')
    
    # Skip files that don't match NDVI_Annual_YYYY_Area_X_vectorized.csv
    if len(parts) < 6 or not parts[2].isdigit() or parts[3] != 'Area':
        print(f"Skipping file with invalid format: {filename}")
        continue

    year = parts[2]  # Four-digit year, e.g. '2013'
    area = parts[4]  # Area number, kept as a string
    
    print(f"Processing: Year {year}, Area {area}")
    
    try:
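        # Read the CSV. If the NDVI column is named 'y2013' regardless of the
        # file's year, rename it to match; otherwise the rename is a no-op.
        df = pd.read_csv(file)
        df = df.rename(columns={'y2013': f'y{year}'})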
        
        # Initialize or merge with existing area data
        if area not in area_data:
            area_data[area] = df[['pixel_id', 'x_coord', 'y_coord', f'y{year}']]
        else:
            area_data[area] = area_data[area].merge(
                df[['pixel_id', f'y{year}']], 
                on='pixel_id', 
                how='outer'
            )
            
    except Exception as e:
        # Report the problem and move on to the next file
        print(f"Error processing {filename}: {e}")

## Save Merged Results

In [None]:
# Save merged data for each area
for area, df in area_data.items():
    try:
        output_file = os.path.join(output_folder, f'NDVI_Merged_Area_{area}.csv')
        df.to_csv(output_file, index=False)
        print(f'Successfully saved: {os.path.basename(output_file)}')
        print(f'  - Shape: {df.shape}')
        print(f'  - Columns: {", ".join(df.columns)}')
    except Exception as e:
        print(f"Error saving Area {area}: {str(e)}")

print("\nProcessing complete! Check the 'merged' folder for results.")

## Results

The notebook writes one merged CSV file per area to your Google Drive:
```
/content/drive/MyDrive/earthengine/conversion/csv/merged/
    ├── NDVI_Merged_Area_1.csv
    ├── NDVI_Merged_Area_2.csv
    └── ...
```

Each merged file contains:
- Spatial coordinates (x_coord, y_coord)
- Pixel identifiers (pixel_id)
- NDVI values for each available year (columns y2013 through y2023); a pixel that is absent from a given year's file has an empty value in that column

You can now use these merged files for temporal analysis of NDVI changes across different areas.
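
As one possible starting point for such an analysis, the sketch below computes the per-pixel NDVI change between the earliest and latest year columns. It uses a small in-memory table with made-up values in place of a real merged file, and the `ndvi_change` column name is an assumption for this example.

```python
import pandas as pd

# Stand-in for pd.read_csv(".../NDVI_Merged_Area_1.csv"); values are made up.
merged = pd.DataFrame({
    "pixel_id": [1, 2],
    "x_coord": [10.0, 10.5],
    "y_coord": [60.0, 60.0],
    "y2013": [0.40, 0.60],
    "y2023": [0.50, 0.45],
})

# Pick out year columns such as 'y2013' (this excludes 'y_coord').
year_cols = sorted(c for c in merged.columns if c.startswith("y") and c[1:].isdigit())

# Change from the first to the last available year; NaN where either is missing.
merged["ndvi_change"] = merged[year_cols[-1]] - merged[year_cols[0]]
print(merged[["pixel_id", "ndvi_change"]])
```

A positive `ndvi_change` suggests greening over the period and a negative value suggests vegetation loss, although a simple difference between two years ignores the years in between.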