<center><h1 style="color: orange"><b>Datathon - Zeroing Methane Emissions</b></h1></center>

This is the *starter notebook* for the [Zeroing Methane Emissions](https://www.speuntapped.com/) Datathon by SPE and Untapped Energy.

Here, we will unzip the images, load them into the notebook, and make some initial visualizations.

First, import some libraries:

In [None]:
# Data manipulation
import numpy as np
import pandas as pd

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Utilities
from zipfile import ZipFile                         # To unzip files
import os
import glob
from random import randint                          # Random index selection for plotting

# Computer vision
import cv2                                          # conda install opencv

# Display images using OpenCV (if using Google Colab)
# from google.colab.patches import cv2_imshow       # Shows OpenCV images inline in Colab


Next, unzip all the files to disk:

In [None]:
# Names of the zip files
file_names = ["ch4_plume_permian_2019_png Jeremy Zhao.zip", 
              "ch4_plume_july_2020_to_may_2022_png Jeremy Zhao.zip",
              "dummy_data Jeremy Zhao.zip",
              "dummy_data_permian Jeremy Zhao.zip"]

# Looping through all zip files:
for file_name in file_names:

    print(f"Extracting files from {file_name}...\n")

    # Opening the zip file in read mode
    # (named zip_file to avoid shadowing the built-in zip)
    with ZipFile(file_name, 'r') as zip_file:

        # Printing all the contents of the zip file
        # zip_file.printdir()

        # Extracting all the files to the current directory
        print('Extracting all the files now...')
        zip_file.extractall()
        print('Done!')
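
If one of the archives has not been downloaded yet, `ZipFile` raises `FileNotFoundError` and the loop above stops at that file. A minimal sketch of a more forgiving variant follows. The names `extract_archives` and `dest` are hypothetical, introduced only for illustration, and are not part of the datathon material.

```python
from pathlib import Path
from zipfile import ZipFile


def extract_archives(archive_names, dest="."):
    """Extract each existing archive into dest; return the names that were missing."""
    missing = []
    for name in archive_names:
        path = Path(name)
        if not path.is_file():
            # Skip archives that are not on disk instead of raising
            missing.append(name)
            continue
        with ZipFile(path, "r") as archive:
            archive.extractall(dest)
    return missing
```

Calling `extract_archives(file_names)` would extract whatever is available and return the list of archives that still need to be downloaded.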

Each zip file was unzipped to a different folder. We can read the images using **OpenCV**.

In [None]:
# CH4 Plume Permian 2019
# Only getting files with extension .png
plume_permian_2019_list = glob.glob("./permian_2019_png/*.png")
print(f"There are {len(plume_permian_2019_list)} png files.\n")

# CH4 RGB GeoTIFFs, July 2020 to May 2022 (png)
# Only getting files with extension .png
plume_2020_2022_list = glob.glob("./ch4_rgb_geotiffs_july_2020_to_may_2022_png/*.png")
print(f"There are {len(plume_2020_2022_list)} png files.\n")

# Dummy Data Permian
# Only getting files with extension .png
dummy_permian_list = glob.glob("./dummy_data_permian/*.png")
print(f"There are {len(dummy_permian_list)} png files.\n")

# Dummy Data
# Only getting files with extension .png
dummy_list = glob.glob("./dummy_data/*.png")
print(f"There are {len(dummy_list)} png files.\n")
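
`glob.glob` does not guarantee any particular order, so the index of a given image can differ between machines. The four listings above could be wrapped in a small helper that sorts the paths. This is only a sketch: `list_pngs` is a hypothetical name, and sorting is an added assumption rather than something the original cells do.

```python
import glob
import os


def list_pngs(folder):
    """Return the .png paths inside folder, sorted by path."""
    pattern = os.path.join(folder, "*.png")
    return sorted(glob.glob(pattern))
```

For example, `list_pngs("./dummy_data")` would return the same files as `dummy_list`, in a stable order.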

Loading the images to the notebook:

In [None]:
# Plume Permian
plume_permian_images = np.array([np.array(cv2.imread(p), dtype=np.uint8) for p in plume_permian_2019_list])
print("Plume Permian shape:", plume_permian_images.shape)

# Plume
plume_images = np.array([np.array(cv2.imread(p), dtype=np.uint8) for p in plume_2020_2022_list])
print("Plume shape:", plume_images.shape)

# Dummy Permian
dummy_permian_images = np.array([np.array(cv2.imread(p), dtype=np.uint8) for p in dummy_permian_list])
print("Dummy Permian shape:", dummy_permian_images.shape)

# Dummy
dummy_images = np.array([np.array(cv2.imread(p), dtype=np.uint8) for p in dummy_list])
print("Dummy shape:", dummy_images.shape)
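
`cv2.imread` returns `None` when a file cannot be read, and `np.array` only produces a clean 4-D array when every image has the same shape. A small guard along these lines may make such problems easier to spot. It is a sketch under those assumptions, and `stack_images` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def stack_images(images):
    """Stack same-shaped images into one uint8 array, dropping failed reads (None)."""
    valid = [img for img in images if img is not None]
    shapes = {img.shape for img in valid}
    if len(shapes) > 1:
        # Mixed shapes cannot form a regular (n, height, width, 3) array
        raise ValueError(f"Images have different shapes: {sorted(shapes)}")
    return np.stack(valid).astype(np.uint8)
```

It could be used as `stack_images([cv2.imread(p) for p in dummy_list])` in place of the list comprehension wrapped in `np.array`.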


All the images are 217x217 pixels with 3 color channels. Let's visualize some of them:

In [None]:
# Using OpenCV (the image opens in a separate window).
# With cv2_imshow in Google Colab, the image is shown inline instead.
cv2.imshow("image", plume_permian_images[0])
cv2.waitKey(0)              # Press any key to close the window. Do NOT close it manually!
cv2.destroyAllWindows()

In [None]:
# Using matplotlib
plt.imshow(plume_permian_images[0]);

With OpenCV, the plume is red, while with matplotlib it is blue. This happens because OpenCV reads images in BGR channel order, whereas matplotlib expects RGB, so we need to convert them:

In [None]:
# Converting the images from BGR to RGB using cvtColor function of OpenCV
for i in range(len(plume_permian_images)):
    plume_permian_images[i] = cv2.cvtColor(plume_permian_images[i], cv2.COLOR_BGR2RGB)

plt.imshow(plume_permian_images[0]);

Now it is fixed. Let's do this for all the other images:

In [None]:
# Converting the images from BGR to RGB using cvtColor function of OpenCV
for i in range(len(plume_images)):
    plume_images[i] = cv2.cvtColor(plume_images[i], cv2.COLOR_BGR2RGB)

for i in range(len(dummy_permian_images)):
    dummy_permian_images[i] = cv2.cvtColor(dummy_permian_images[i], cv2.COLOR_BGR2RGB)

for i in range(len(dummy_images)):
    dummy_images[i] = cv2.cvtColor(dummy_images[i], cv2.COLOR_BGR2RGB)
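
For 3-channel images, the BGR to RGB conversion only swaps the first and third channels, so the same result can be obtained for a whole batch at once by reversing the channel axis with NumPy. This is an alternative sketch, not the method used in the cells above, and `bgr_to_rgb` is a hypothetical name.

```python
import numpy as np


def bgr_to_rgb(images):
    """Reverse the last (channel) axis of an image or a batch of images."""
    # ascontiguousarray returns a regular in-memory copy instead of a reversed view
    return np.ascontiguousarray(images[..., ::-1])
```

It should only be applied once per array: reversing the channels a second time would turn the images back into BGR.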

Resizing the images, if desired:

In [None]:
def resizeImage(img, size = 128):
    """
    `resizeImage` resizes a batch of images to
    square images of shape size x size.

    Parameters:

        img -> array of images with shape (n, height, width, 3)

        size -> output height and width in pixels. Default = 128

    Returns:

        imgR -> array of resized images with shape (n, size, size, 3)
    """

    # Creating an empty array for the resized images.
    # uint8 keeps the values as integers in 0-255 (for plt.imshow)
    imgR = np.zeros((img.shape[0], size, size, 3), dtype=np.uint8)

    # Looping through all the images
    for i in range(img.shape[0]):

        # Resizing each image with bilinear interpolation
        imgR[i] = cv2.resize(img[i], (size, size), interpolation=cv2.INTER_LINEAR)

    return imgR

In [None]:
plume_permian_images = resizeImage(plume_permian_images)
print(plume_permian_images.shape)

plume_images = resizeImage(plume_images)
print(plume_images.shape)

dummy_permian_images = resizeImage(dummy_permian_images)
print(dummy_permian_images.shape)

dummy_images = resizeImage(dummy_images)
print(dummy_images.shape)

In [None]:
plume_images.max()

Creating a function to randomly plot several images at once:

In [None]:
def plotImages(data, k = 8, figsize = (13, 13)):
    """
    `plotImages` plots k x k randomly selected
    images from the provided data.

    Parameters:
    
        data -> numpy array containing the images.

        k -> number of rows and columns to subplot.
             Default = 8
        
        figsize -> size of the figure.
                   Default = (13, 13)
    
    Returns:

        None. The figure with the subplots is displayed.
    """

    # Creating subplots
    fig, ax = plt.subplots(nrows = k, ncols = k, figsize = figsize)
    for i in range(k):
        for j in range(k):

            # Randomly selecting one image
            ind = randint(0, data.shape[0] - 1)

            # Plotting the image to a subplot
            ax[i,j].imshow(data[ind])

            # Turning axis off
            ax[i,j].set_axis_off()
            
    fig.tight_layout()
    plt.show()
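
Because `plotImages` draws each index independently with `randint`, the same image can appear more than once in a grid, and the grid changes on every run. If distinct images or a reproducible grid are preferred, the indices could be drawn up front as in this sketch. `sample_indices` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random


def sample_indices(n_images, k, seed=None):
    """Pick up to k*k distinct image indices, reproducibly if seed is given."""
    rng = random.Random(seed)
    # Cannot draw more distinct indices than there are images
    count = min(k * k, n_images)
    return rng.sample(range(n_images), count)
```

The returned list could then be consumed inside the plotting loop instead of calling `randint` for every subplot.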


In [None]:
plotImages(plume_permian_images)

In [None]:
plotImages(plume_images)

In [None]:
plotImages(dummy_permian_images)

In [None]:
plotImages(dummy_images)