# GreenGuardian 🌎🛡️

GreenGuardian uses Amazon SageMaker and related AWS services to train, tune, and deploy a deep learning model that detects plastic objects in images. Detection runs are orchestrated with AWS Step Functions and triggered at scheduled intervals by Amazon EventBridge cron rules. This setup aims for cost-effective, scalable, and accurate plastic detection, in line with the goal of a greener planet!
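
The scheduling step described above could be wired up roughly as follows. This is a minimal sketch, not the project's actual deployment code: the helper names, rule name, cron expression, and both ARNs are hypothetical placeholders, and `events_client` is assumed to be a boto3 EventBridge client (`boto3.client("events")`), whose `put_rule` and `put_targets` calls are used here.

```python
# Sketch only: every name, ARN, and schedule passed in is a hypothetical placeholder.

def build_schedule_rule(rule_name, cron_expression):
    """Keyword arguments for EventBridge `put_rule` with a cron schedule."""
    return {
        "Name": rule_name,
        # EventBridge cron format:
        # cron(minutes hours day-of-month month day-of-week year)
        "ScheduleExpression": f"cron({cron_expression})",
        "State": "ENABLED",
    }


def build_state_machine_target(rule_name, state_machine_arn, role_arn):
    """Keyword arguments for `put_targets` pointing at a Step Functions state machine."""
    return {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "plastic-detection-state-machine",  # hypothetical target id
                "Arn": state_machine_arn,
                # Role that EventBridge assumes to start the execution.
                "RoleArn": role_arn,
            }
        ],
    }


def schedule_detection(events_client, rule_name, cron_expression,
                       state_machine_arn, role_arn):
    """Create (or update) the scheduled rule and attach the state machine to it."""
    events_client.put_rule(**build_schedule_rule(rule_name, cron_expression))
    events_client.put_targets(
        **build_state_machine_target(rule_name, state_machine_arn, role_arn)
    )
```

With real credentials, a call such as `schedule_detection(boto3.client("events"), "green-guardian-daily", "0 12 * * ? *", state_machine_arn, role_arn)` would start the state machine every day at 12:00 UTC. Keeping the request dictionaries in separate functions makes them easy to inspect without touching AWS.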

## DATA ANALYSIS 📊:

In [2]:
import pandas as pd
from IPython.display import display

# create a dataframe for the train and validation imgs:
print("VALIDATION IMGS RAW LABEL DATA:")
df_val = pd.read_csv('data/validation/labels/detections.csv')
display(df_val.head())

print("\nTRAIN IMGS RAW LABEL DATA:")
df_train = pd.read_csv('data/train/labels/detections.csv')
display(df_train.head())

VALIDATION IMGS RAW LABEL DATA:


| Unnamed: 0 | ImageID | Source | LabelName | Confidence | XMin | XMax | YMin | YMax | IsOccluded | IsTruncated | IsGroupOf | IsDepiction | IsInside |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001eeaf4aed83f9 | xclick | /m/0cmf2 | 1 | 0.022673 | 0.964201 | 0.071038 | 0.800546 | 0 | 0 | 0 | 0 | 0 |
| 1 | 000595fe6fee6369 | xclick | /m/02wbm | 1 | 0.0 | 1.0 | 0.0 | 1.0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 000595fe6fee6369 | xclick | /m/02xwb | 1 | 0.141384 | 0.179676 | 0.676275 | 0.731707 | 0 | 0 | 0 | 0 | 0 |
| 3 | 000595fe6fee6369 | xclick | /m/02xwb | 1 | 0.213549 | 0.253314 | 0.299335 | 0.354767 | 1 | 0 | 0 | 0 | 0 |
| 4 | 000595fe6fee6369 | xclick | /m/02xwb | 1 | 0.232695 | 0.28866 | 0.490022 | 0.545455 | 1 | 0 | 0 | 0 | 0 |



TRAIN IMGS RAW LABEL DATA:


| Unnamed: 0 | ImageID | Source | LabelName | Confidence | XMin | XMax | YMin | YMax | IsOccluded | IsTruncated | ... | IsDepiction | IsInside | XClick1X | XClick2X | XClick3X | XClick4X | XClick1Y | XClick2Y | XClick3Y | XClick4Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000002b66c9c498e | xclick | /m/01g317 | 1 | 0.0125 | 0.195312 | 0.148438 | 0.5875 | 0 | 1 | ... | 0 | 0 | 0.148438 | 0.0125 | 0.059375 | 0.195312 | 0.148438 | 0.357812 | 0.5875 | 0.325 |
| 1 | 000002b66c9c498e | xclick | /m/01g317 | 1 | 0.025 | 0.276563 | 0.714063 | 0.948438 | 0 | 1 | ... | 0 | 0 | 0.025 | 0.248438 | 0.276563 | 0.214062 | 0.914062 | 0.714063 | 0.782813 | 0.948438 |
| 2 | 000002b66c9c498e | xclick | /m/01g317 | 1 | 0.151562 | 0.310937 | 0.198437 | 0.590625 | 1 | 0 | ... | 0 | 0 | 0.24375 | 0.151562 | 0.310937 | 0.2625 | 0.198437 | 0.434375 | 0.507812 | 0.590625 |
| 3 | 000002b66c9c498e | xclick | /m/01g317 | 1 | 0.25625 | 0.429688 | 0.651563 | 0.925 | 1 | 0 | ... | 0 | 0 | 0.315625 | 0.429688 | 0.25625 | 0.423438 | 0.651563 | 0.921875 | 0.826562 | 0.925 |
| 4 | 000002b66c9c498e | xclick | /m/01g317 | 1 | 0.257812 | 0.346875 | 0.235938 | 0.385938 | 1 | 0 | ... | 0 | 0 | 0.317188 | 0.257812 | 0.346875 | 0.307812 | 0.235938 | 0.289062 | 0.348438 | 0.385938 |


## DATA ANALYSIS NOTE #1:

- The label CSVs loaded above for the training and validation sets contain information about **all** the classes in the data we acquired from Google's Open Images Dataset.

- I will restrict this data to the images that carry **only** the unique **LabelName** corresponding to plastic. Other types of images are irrelevant to this project: the model is meant to be trained solely on the bounding boxes drawn around the plastic objects in each image.
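
The restriction described above might look like the sketch below. It reads "only" strictly: an image is kept when every one of its annotation rows carries the target label. The helper name `keep_only_label` and the label value `/m/EXAMPLE` are hypothetical; the actual Open Images identifier for the plastic class has to be looked up in the dataset's class descriptions and is not assumed here. The toy frame only mimics the `ImageID` and `LabelName` columns shown earlier.

```python
import pandas as pd


def keep_only_label(df, label_name):
    """Return the rows of images whose every annotation carries `label_name`."""
    # True for each row whose label matches the target class.
    is_target = df["LabelName"] == label_name
    # An image qualifies only if all of its rows match.
    all_target = is_target.groupby(df["ImageID"]).transform("all")
    return df[all_target].reset_index(drop=True)


# Toy data: "/m/EXAMPLE" stands in for the real plastic label.
toy = pd.DataFrame(
    {
        "ImageID": ["a", "a", "b", "b", "c"],
        "LabelName": [
            "/m/EXAMPLE", "/m/EXAMPLE",  # image a: only the target label
            "/m/EXAMPLE", "/m/OTHER",    # image b: mixed labels
            "/m/OTHER",                  # image c: no target label
        ],
    }
)
plastic_only = keep_only_label(toy, "/m/EXAMPLE")
print(plastic_only["ImageID"].unique().tolist())
```

A looser alternative, if mixed-label images should stay in the training set, is the plain row filter `df[df["LabelName"] == label_name]`, which keeps every plastic bounding box regardless of what else appears in the image.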

In [16]:
import glob
import os

# glob patterns for the validation and training images.
img_paths = ["data/validation/data/*.jpg", "data/train/data/*.jpg"]

# lists to store the respective img ids.
train_img_ids = []
val_img_ids = []

# count the total number of imgs in each split.
def count_total_imgs():

    # img counters.
    total_val_imgs = 0
    total_train_imgs = 0

    # counter-loop (reuses the module-level img_paths).
    for path in img_paths:
        if 'validation' in path:
            total_val_imgs += len(glob.glob(path))
        if 'train' in path:
            total_train_imgs += len(glob.glob(path))

    print("THE TOTAL NUMBER OF VALIDATION IMGS IS: ", total_val_imgs)
    print("THE TOTAL NUMBER OF TRAIN IMGS IS: ", total_train_imgs)

# retrieve each img-id (the file name without its extension) from a path pattern.
def get_desired_img_ids(path):

    for img in glob.glob(path):
        # os.path handles both "/" and "\" separators, unlike split("/").
        img_id = os.path.splitext(os.path.basename(img))[0]
        if 'train' in path:
            train_img_ids.append(img_id)
        elif 'validation' in path:
            val_img_ids.append(img_id)

# func calls.
count_total_imgs()
for path in img_paths:
    get_desired_img_ids(path)

THE TOTAL NUMBER OF VALIDATION IMGS IS:  9
THE TOTAL NUMBER OF TRAIN IMGS IS:  517
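
One possible next step, sketched here under assumptions, is to keep only the label rows whose `ImageID` matches an image actually present on disk. `ids_from_paths` and `rows_for_images` are hypothetical helper names, and the path and label values below are made up for illustration.

```python
import os

import pandas as pd


def ids_from_paths(paths):
    """Image id = file name without its directory or extension."""
    return [os.path.splitext(os.path.basename(p))[0] for p in paths]


def rows_for_images(df, img_ids):
    """Keep only the label rows whose ImageID is in `img_ids`."""
    return df[df["ImageID"].isin(set(img_ids))].reset_index(drop=True)


# Made-up path and labels, shaped like the data above.
paths = ["data/train/data/000002b66c9c498e.jpg"]
labels = pd.DataFrame(
    {
        "ImageID": ["000002b66c9c498e", "ffffffffffffffff"],
        "LabelName": ["/m/EXAMPLE", "/m/EXAMPLE"],
    }
)
matched = rows_for_images(labels, ids_from_paths(paths))
print(matched)
```

In the notebook itself, the same filter would presumably be applied to `df_train` with `train_img_ids` and to `df_val` with `val_img_ids`.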
