## The following code can be used for any region of True Color images downloaded to a folder path in Google Drive

# RGB Extraction & Analysis from Sentinel-2 True Color PNG Images

This notebook processes True Color Sentinel-2 images of the St. Mary region to extract pixel-level RGB data over a specified time frame, then filters out cloud-covered pixels. The complete workflow is described below.

---

##  Mounting Google Drive

The first step mounts the user's Google Drive in Colab so the notebook can read and save files. The images are stored in a directory such as `/MyDrive/StMaryTCandNVDIdata/TrueColorData/`.


Each `.png` file in this folder is a True Color satellite image of the region at a given timestamp.
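
Once the Drive is mounted, a quick check that the images are visible can save debugging time later. The sketch below is one way to do that; the helper name `list_png_files` is hypothetical, and the folder path is the example path from this notebook and should be adjusted to your own Drive.

```python
import glob
import os

def list_png_files(directory):
    """Return the sorted paths of all .png files directly inside `directory`."""
    return sorted(glob.glob(os.path.join(directory, "*.png")))

# Assumed location (the example folder used in this notebook); adjust as needed.
png_dir = "/content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/"
png_files = list_png_files(png_dir)
print(f"Found {len(png_files)} PNG files")
```

If the count is zero, the Drive is probably not mounted or the path does not match the folder layout.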

---

##  Extract RGB Values from a Single Image

A single `.png` image is loaded with `PIL` (the Python Imaging Library) and converted into a NumPy array to access pixel data. The array is reshaped so that each row corresponds to one pixel and the three columns hold its Red, Green, and Blue values. These values are saved to a CSV file in the format:

#### Pixel Number | Red | Green | Blue


This step tests the code on one file before applying it to every image in the time frame.
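
To make the reshape concrete, here is a minimal sketch on a synthetic 2x2 image instead of a real PNG. The helper names `image_to_pixel_rows` and `pixel_position` are hypothetical and not part of the notebook. The sketch assumes pixels are numbered from 1 in row-major order (left to right, top to bottom), which is what `reshape((-1, 3))` produces for a height x width x 3 array.

```python
import numpy as np

def image_to_pixel_rows(img_np):
    """Flatten a (height, width, 3) array into rows of [pixel number, R, G, B]."""
    height, width, channels = img_np.shape
    if channels != 3:
        raise ValueError("expected an RGB image with exactly 3 channels")
    rgb = img_np.reshape(-1, 3)
    pixel_numbers = np.arange(1, height * width + 1).reshape(-1, 1)
    return np.hstack([pixel_numbers, rgb])

def pixel_position(pixel_number, width):
    """Map a 1-based pixel number back to its (row, column) in the image."""
    return divmod(pixel_number - 1, width)

# Synthetic 2x2 RGB image standing in for a real PNG
demo = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)
rows = image_to_pixel_rows(demo)
print(rows)
print(pixel_position(4, width=2))  # (1, 1): second row, second column
```

Being able to map a pixel number back to a row and column is useful later, when averaged or filtered values need to be placed back on the image grid.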

---

##  Batch Convert All Images to RGB CSVs

The notebook loops through all `.png` images in the directory and automates the RGB extraction:

- **Extracts the date** from each image’s filename, which is expected to start with a `YYYY-MM-DD` date.
- **Loads the image** and reshapes it into one row of RGB values per pixel.
- **Saves a new CSV file** for each image, named after that date (e.g., `20190802.csv`).

All CSVs are stored in: `/TrueColorcsvs/`
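
The date extraction in the steps above can be sketched on its own. The function name `date_from_filename` is hypothetical. The sketch assumes the filename contains a `YYYY-MM-DD` date, as in the example image used in this notebook.

```python
import re

DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def date_from_filename(filename):
    """Return the first YYYY-MM-DD date in `filename` as YYYYMMDD, or None."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        return None
    return "".join(match.groups())

print(date_from_filename("2019-08-02 05:25:24.363000+00:00.png"))  # 20190802
print(date_from_filename("overview.png"))                          # None
```

Because only the date is kept, two images taken on the same day map to the same CSV name, and the second one processed overwrites the first.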


---

##  Merge All CSVs & Calculate Average RGB Values

All CSVs are loaded into pandas (pd) dataframes and concatenated into one dataset. The script then groups the rows by pixel number and calculates the **mean Red, Green, and Blue values** for each pixel across all dates using `.groupby()` and `.agg('mean')`.

The result is saved to a new CSV file: `average_rgb_by_pixel.csv`


This provides an average color per pixel over time.
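
A small worked example may help show what the averaging does. The two dataframes below are made-up values for two dates, and the helper name `average_rgb_by_pixel` is hypothetical. The column names match the CSV format used in this notebook.

```python
import pandas as pd

def average_rgb_by_pixel(frames):
    """Stack per-date frames and average Red, Green, Blue for each pixel number."""
    merged = pd.concat(frames, ignore_index=True)
    return merged.groupby("Pixel Number")[["Red", "Green", "Blue"]].mean()

# Made-up values for the same two pixels on two different dates
day_1 = pd.DataFrame({"Pixel Number": [1, 2], "Red": [10, 200], "Green": [20, 100], "Blue": [30, 0]})
day_2 = pd.DataFrame({"Pixel Number": [1, 2], "Red": [30, 100], "Green": [40, 100], "Blue": [50, 50]})

averages = average_rgb_by_pixel([day_1, day_2])
print(averages)
```

Pixel 1 averages to (20, 30, 40) and pixel 2 to (150, 100, 25). The averages are floats even though the inputs are integers. This approach only gives meaningful results if every image has the same dimensions, so that a given pixel number refers to the same location on every date.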

---

## Filter Out Cloud-Covered Pixels

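This step assumes that clouds in satellite images typically appear as pure white (RGB = 255, 255, 255). Because the filter runs on the averaged values, it removes only pixels that were pure white on every date. Matching pixels are dropped from the averaged dataset: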

```python
filtered_df = df[~((df['Red'] == 255) & (df['Green'] == 255) & (df['Blue'] == 255))]
```

The cleaned data, excluding clouds, is saved as: `filtered_average_rgb.csv`
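
The behaviour of the filter on averaged data can be checked on a toy example. The values below are hypothetical: pixel 1 was pure white on both of two dates, pixel 2 was white on one date and (100, 100, 100) on the other, and pixel 3 was never white. The helper name `drop_pure_white` is an assumption for illustration.

```python
import pandas as pd

def drop_pure_white(df):
    """Remove rows whose Red, Green, and Blue values are all exactly 255."""
    is_white = (df[["Red", "Green", "Blue"]] == 255).all(axis=1)
    return df[~is_white]

# Hypothetical per-pixel averages over two dates
averaged = pd.DataFrame({
    "Pixel Number": [1, 2, 3],
    "Red": [255.0, 177.5, 40.0],
    "Green": [255.0, 177.5, 90.0],
    "Blue": [255.0, 177.5, 30.0],
})
filtered = drop_pure_white(averaged)
print(filtered["Pixel Number"].tolist())  # [2, 3]
```

Pixel 2 survives the filter even though it was white on one of the two dates, because its average is 177.5, not 255.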








In [None]:
#mounting and connecting drive to the workspace
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#testing the conversion of one image into RGB pixel values
from PIL import Image
import numpy as np
import csv
#specifying the image path, located in the google drive folder
image_path = "/content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/2019-08-02 05:25:24.363000+00:00.png"
#convert("RGB") guarantees three channels, even if the PNG carries an alpha channel
img = Image.open(image_path).convert("RGB")

img_np = np.array(img)

#one row per pixel, with columns Red, Green, Blue
rgb_values = img_np.reshape((-1, 3))
#looping over the pixels to collect the Red, Green and Blue values of each one
csv_data = []
for i, row in enumerate(rgb_values):
    csv_data.append([i+1, row[0], row[1], row[2]])  # Add pixel number, R, G, B

#saving the new data to a new file and path
csv_file_path = "/content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20190802.csv"
with open(csv_file_path, 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(['Pixel Number', 'Red', 'Green', 'Blue'])
    csv_writer.writerows(csv_data)

print(f"RGB data saved to {csv_file_path}")

RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20190802.csv


In [None]:
# converts each True Color PNG into a CSV containing the RGB values of each pixel (for all images)

from google.colab import drive
from PIL import Image
import numpy as np
import csv
import os
import re

drive.mount('/content/drive')

# Define the directory containing the PNG files
png_dir = "/content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/"
csv_dir = "/content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/"

# Create the CSV directory if it doesn't exist
os.makedirs(csv_dir, exist_ok=True)

# Iterate through all files in the directory
for filename in os.listdir(png_dir):
    if filename.endswith(".png"):
        image_path = os.path.join(png_dir, filename)
        # Extract date from filename in this specific format with year first
        match = re.search(r"(\d{4}-\d{2}-\d{2})", filename)
        if match:
            date_str = match.group(1).replace("-", "")
            csv_file_path = os.path.join(csv_dir, f"{date_str}.csv")

            try:
                # convert("RGB") guarantees three channels, even if the PNG carries an alpha channel
                img = Image.open(image_path).convert("RGB")
                img_np = np.array(img)
                rgb_values = img_np.reshape((-1, 3))
                # Collect the pixel number and RGB values of every pixel in the image
                csv_data = []
                for i, row in enumerate(rgb_values):
                    csv_data.append([i + 1, row[0], row[1], row[2]])
                # Save the values to a CSV in csv_dir, named after the image date
                with open(csv_file_path, 'w', newline='') as csvfile:
                    csv_writer = csv.writer(csvfile)
                    csv_writer.writerow(['Pixel Number', 'Red', 'Green', 'Blue'])
                    csv_writer.writerows(csv_data)

                print(f"RGB data saved to {csv_file_path}")
            except Exception as e:
                print(f"Error processing {filename}: {e}")
        else:
            print(f"Could not extract date from filename: {filename}")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230801.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230816.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230806.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230826.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230821.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230831.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20190807.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20190802.csv
RGB data saved to /content/drive/MyDrive/StMaryTCandNVD

In [None]:
#merges the CSVs in StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs by pixel to find the average RGB values of each pixel over time

import pandas as pd
import os
import glob

# Define the directory containing the CSV files
csv_dir = "/content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/"

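# Find the per-date CSV files (named YYYYMMDD.csv); this skips outputs such as
# average_rgb_by_pixel.csv, which are saved in the same directory
csv_files = [
    f for f in glob.glob(os.path.join(csv_dir, "*.csv"))
    if os.path.splitext(os.path.basename(f))[0].isdigit()
]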

# Collect each dataframe together with its date label (the file name without extension)
dfs = []
dates = []

# Iterate through CSV files and read them into pandas dataframes
for csv_file in csv_files:
    try:
        df = pd.read_csv(csv_file)
        dfs.append(df)
        # Record the label only for files that were read, so keys always match dfs
        dates.append(os.path.splitext(os.path.basename(csv_file))[0])
        print(f"Successfully read {csv_file}")
    except Exception as e:
        print(f"Error reading {csv_file}: {e}")


# Concatenate the data from all dates, labelling each block with its date
merged_df = pd.concat(dfs, keys=dates)

merged_df = merged_df.reset_index(level=0)

# Group by 'Pixel Number' and calculate the mean of RGB values
average_rgb_df = merged_df.groupby('Pixel Number').agg({'Red': 'mean', 'Green': 'mean', 'Blue': 'mean'})

# Save the result to a new CSV file
output_csv_path = os.path.join(csv_dir, "average_rgb_by_pixel.csv")
average_rgb_df.to_csv(output_csv_path)

print(f"Average RGB values by pixel saved to {output_csv_path}")


Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20190802.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230801.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230816.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230806.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230826.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230821.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20230831.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20190807.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/20190817.csv
Successfully read /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueCol

In [None]:
#filtering pure-white pixels (assumed to indicate cloud cover) out of average_rgb_by_pixel
import pandas as pd
import os

# Define the path to your average_rgb_by_pixel.csv file
csv_dir = "/content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/"
input_csv_path = os.path.join(csv_dir, "average_rgb_by_pixel.csv")
output_csv_path = os.path.join(csv_dir, "filtered_average_rgb.csv")

# Load the CSV file into a dataframe; stop the cell with a clear message if it cannot be read
try:
    df = pd.read_csv(input_csv_path)
except FileNotFoundError:
    raise SystemExit(f"Error: File not found at {input_csv_path}")
except pd.errors.EmptyDataError:
    raise SystemExit(f"Error: Empty CSV file at {input_csv_path}")

# Filter out rows where R, G, and B are all 255
filtered_df = df[~((df['Red'] == 255) & (df['Green'] == 255) & (df['Blue'] == 255))]

# Save the filtered data to a new CSV file
filtered_df.to_csv(output_csv_path, index=False)

print(f"Filtered data saved to {output_csv_path}")


Filtered data saved to /content/drive/MyDrive/StMaryTCandNVDIdata/TrueColorData/TrueColorcsvs/filtered_average_rgb.csv
