<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>

# Climate Change Satellite Image Classification Competition Model Submission Guide - PyTorch

---
**About the Original Data:**<br>
*Data and Description accessed from [TensorFlow](https://www.tensorflow.org/datasets/catalog/bigearthnet)* <br>
The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels, which has been simplified to single labels with 3 categories for the purposes of this competition.

To construct the BigEarthNet, 125 Sentinel-2 tiles acquired between June 2017 and May 2018 over the 10 countries (Austria, Belgium, Finland, Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia, Switzerland) of Europe were initially selected. All the tiles were atmospherically corrected by the Sentinel-2 Level 2A product generation and formatting tool (sen2cor). Then, they were divided into 590,326 non-overlapping image patches. Each image patch was annotated by the multiple land-cover classes (i.e., multi-labels) that were provided from the CORINE Land Cover database of the year 2018 (CLC 2018).

Bands and pixel resolution in meters:

    B01: Coastal aerosol; 60m
    B02: Blue; 10m
    B03: Green; 10m
    B04: Red; 10m
    B05: Vegetation red edge; 20m
    B06: Vegetation red edge; 20m
    B07: Vegetation red edge; 20m
    B08: NIR; 10m
    B09: Water vapor; 60m
    B11: SWIR; 20m
    B12: SWIR; 20m
    B8A: Narrow NIR; 20m
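
Because every patch covers 1.2 x 1.2 km on the ground, the pixel dimensions of each band follow directly from its resolution. The short sketch below works through that arithmetic. The names `BAND_RESOLUTION_M`, `PATCH_SIDE_M`, and `patch_size_px` are illustrative and not part of the dataset tooling.

```python
# Ground resolution in metres for each Sentinel-2 band listed above.
BAND_RESOLUTION_M = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10,
    "B05": 20, "B06": 20, "B07": 20, "B08": 10,
    "B09": 60, "B11": 20, "B12": 20, "B8A": 20,
}

PATCH_SIDE_M = 1200  # each patch covers 1.2 km x 1.2 km on the ground


def patch_size_px(band):
    """Return the side length in pixels of one patch for the given band."""
    return PATCH_SIDE_M // BAND_RESOLUTION_M[band]


for band in sorted(BAND_RESOLUTION_M):
    side = patch_size_px(band)
    print(f"{band}: {side} x {side} pixels")
```

The 10m bands therefore come out at 120 x 120 pixels, which matches the array shapes used later in this guide.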

License: Community Data License Agreement - Permissive, Version 1.0.

**Competition Data Specifics:**<br>
For the purpose of this competition, the original BigEarthNet dataset has been simplified to 20,000 images (15,000 training images and 5,000 test images) with 3 categories: "forest", "nonforest", and "snow_shadow_cloud", which contains images of snow and clouds. <br>
Each "image" is a folder with 12 satellite image layers, each of which picks up on different features. The example preprocessor uses just three layers: B02, B03, and B04 (blue, green, and red), which together provide the standard RGB channels used in ML models. However, you are free to use any combination of the satellite image layers. 
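
If you want to experiment with other layers, one option is to pick band files by their file name suffix. This is a minimal sketch, assuming each band file name ends in `<band>.tif`, as the `endswith` check in the example preprocessor suggests. `select_band_files` and the example file names are hypothetical.

```python
def select_band_files(filenames, bands):
    """Return the file name matching each requested band, in the order requested.

    Assumes every band file name ends with '<band>.tif' (for example '..._B08.tif').
    """
    selected = []
    for band in bands:
        matches = [name for name in filenames if name.endswith(band + ".tif")]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one file for {band}, found {len(matches)}")
        selected.append(matches[0])
    return selected


# Hypothetical file names for one image folder.
example_files = ["patch_B02.tif", "patch_B03.tif", "patch_B04.tif", "patch_B08.tif"]

# Red, green, blue plus near-infrared, in that channel order.
print(select_band_files(example_files, ["B04", "B03", "B02", "B08"]))
```

Bands with different resolutions have different pixel dimensions, so they would need to be resized to a common shape before they can be stacked into one array.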

**Data Source:**<br>
Sumbul, G, Charfuelan, M, Demir, B and Markl, V. (2019). BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding. *Computing Research Repository (CoRR), abs/1902.06148.* https://www.tensorflow.org/datasets/catalog/bigearthnet




# Overview
---

Let's share our models to a centralized leaderboard, so that we can collaborate and learn from the model experimentation process...

**Instructions:**
1. Load data and set up X_train / X_test / y_train
2. Preprocess data / Write and Save Preprocessor function
3. Fit model on preprocessed data and save preprocessor function and model
4. Generate predictions from X_test data and submit model to competition
5. Repeat submission process to improve place on leaderboard



## 1. Load Data

In [None]:
#install aimodelshare library
! pip install aimodelshare-nightly

[Original preprocess notebook](https://colab.research.google.com/drive/1K7Cpg4oFdDrEV09CFODfY9Qq1Ao1o0iW)

In [None]:
# Get competition data - may take a couple of minutes due to the size of the data set
from aimodelshare import download_data
# Uncomment the next line on first run to download the data
#download_data('public.ecr.aws/y2e2a1d6/climate_competition_data-repository:latest') 

In [None]:
# Unzip Data - may take a couple of minutes due to the size of the data set
import zipfile
# Uncomment the next two lines on first run to extract the data
#with zipfile.ZipFile('climate_competition_data/climate_competition_data.zip', 'r') as zip_ref:
    #zip_ref.extractall('competition_data')

## 2. Preprocess data / Write and Save Preprocessor function


### **Write a Preprocessor Function**


> ###   Preprocessor functions are used to preprocess data into the precise format your model requires to generate predictions.  

*  *Preprocessor functions should always be named "preprocessor".*
*  *You can use any Python library in a preprocessor function, but all libraries should be imported inside your preprocessor function.*  
*  *For image prediction models users should minimally include function inputs for an image filepath and values to reshape the image height and width.*  


In [None]:
# Set up for data preprocessing
import numpy as np
import os
import PIL
import PIL.Image
import tensorflow as tf
import tensorflow_datasets as tfds

In [None]:
# Here is a pre-designed preprocessor, but you could also build your own to prepare the data differently

def preprocessor(imageband_directory):
        """
        Reads the B02, B03, and B04 band images from a folder, stacks them into a
        single array, scales them by a fixed maximum value, and returns the float32
        values required by ONNX files.
        
        params:
            imageband_directory
                path to folder with 12 satellite image bands
                      
        returns:
            X
                numpy array of preprocessed image data, shape (1, height, width, 3)
                  
        """
           
        import PIL
        import os
        import numpy as np
        import tensorflow_datasets as tfds

        def _load_tif(data):
            """Loads TIF file and returns as float32 numpy array."""
            img=tfds.core.lazy_imports.PIL_Image.open(data)
            img = np.array(img.getdata()).reshape(img.size).astype(np.float32)
            return img

        image_list = []
        filelist1=sorted(os.listdir(imageband_directory))  # sort so bands are always stacked in the same order
        for fpath in filelist1:
          fullpath=imageband_directory+"/"+fpath
          if fullpath.endswith(('B02.tif','B03.tif','B04.tif')):
              imgarray=_load_tif(imageband_directory+"/"+fpath)
              image_list.append(imgarray)

        X = np.stack(image_list,axis=2)   # to get (height,width,3)

        X = np.expand_dims(X, axis=0) # Expand dims to add "1" to object shape [1, h, w, channels] for the model.
        X = np.array(X, dtype=np.float32) # Final shape for onnx runtime.
        X=X/18581 # min max transform to max value
        return X
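
To see what the stacking and scaling steps do without touching the data set, the sketch below applies them to three synthetic 120 x 120 bands. The random arrays stand in for real band data and `stack_and_scale` is an illustrative name. Only the constant 18581 comes from the preprocessor above.

```python
import numpy as np


def stack_and_scale(bands, max_value=18581):
    """Stack 2-D band arrays channel-last, add a batch axis, and scale by max_value."""
    X = np.stack(bands, axis=2)          # (height, width, n_bands)
    X = np.expand_dims(X, axis=0)        # (1, height, width, n_bands)
    return X.astype(np.float32) / max_value


# Synthetic stand-ins for the B02, B03 and B04 bands (values from 0 to 18581).
rng = np.random.default_rng(0)
fake_bands = [rng.integers(0, 18582, size=(120, 120)).astype(np.float32) for _ in range(3)]

X = stack_and_scale(fake_bands)
print(X.shape, X.dtype)
```

Each image becomes one `(1, 120, 120, 3)` array, which is why `np.vstack` over the list of preprocessed images later yields a single `(n_images, 120, 120, 3)` array.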

In [None]:
# Create complete list of file names
forestfilenames=["competition_data/trainingdata/forest/"+x for x in os.listdir("competition_data/trainingdata/forest")]
nonforestfilenames=["competition_data/trainingdata/nonforest/"+x for x in os.listdir("competition_data/trainingdata/nonforest")]
otherfilenames=["competition_data/trainingdata/other/"+x for x in os.listdir("competition_data/trainingdata/other")]

filenames=forestfilenames+nonforestfilenames+otherfilenames

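# Preprocess RGB images into (1, 120, 120, 3) numpy ndarrays
preprocessed_image_data=[]
for i in filenames:
  try:
    preprocessed_image_data.append(preprocessor(i))
  except Exception as e:
    # Skipped images shift X relative to the labels built below, so report them
    print(f"Skipping {i}: {e}")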
  

In [None]:
# Set up y data
from itertools import repeat
forest=repeat("forest",5000)
nonforest=repeat("nonforest",5000)
other=repeat("snow_shadow_cloud",5000)
ylist=list(forest)+list(nonforest)+list(other)
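
The label list above assumes that exactly 5,000 images per class were preprocessed successfully. Because the loading loop skips files that fail, a safer pattern is to record each label at the moment its image loads. This is a minimal sketch, with a hypothetical `load_with_labels` helper and a stand-in loader in place of `preprocessor`.

```python
def load_with_labels(paths_by_label, loader):
    """Load every path and keep its label only if loading succeeds.

    paths_by_label: dict mapping a label to a list of paths.
    loader: callable that returns the preprocessed data or raises on failure.
    """
    data, labels, failed = [], [], []
    for label, paths in paths_by_label.items():
        for path in paths:
            try:
                data.append(loader(path))
            except Exception:
                failed.append(path)   # remember the failure instead of hiding it
                continue
            labels.append(label)
    return data, labels, failed


def fake_loader(path):
    """Stand-in for preprocessor(): fails on paths containing 'bad'."""
    if "bad" in path:
        raise OSError(f"cannot read {path}")
    return path.upper()


data, labels, failed = load_with_labels(
    {"forest": ["f1", "bad_f2"], "nonforest": ["n1"]}, fake_loader
)
print(labels, failed)
```

With this pattern the data and label lists always have the same length, and the `failed` list shows which files need attention.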

In [None]:
# Shuffle X and y data
from sklearn.utils import shuffle
X, y = shuffle(preprocessed_image_data, ylist, random_state=0)

In [None]:
X =np.vstack(X) # convert X from list to array

In [None]:
X.shape

(15000, 120, 120, 3)

In [None]:
# get numerical representation of y labels
import pandas as pd
y_labels_num = pd.DataFrame(y)[0].map({'forest': 0, 'nonforest': 1, 'snow_shadow_cloud': 2}) 

y_labels_num = list(y_labels_num)
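
The same mapping can be kept in one place and inverted to turn predicted indices back into label strings later. A small sketch using plain dictionaries follows. The names `CLASS_NAMES`, `label_to_index`, and `index_to_label` are illustrative.

```python
CLASS_NAMES = ["forest", "nonforest", "snow_shadow_cloud"]

# Forward and inverse lookups derived from a single list, so they cannot drift apart.
label_to_index = {name: i for i, name in enumerate(CLASS_NAMES)}
index_to_label = {i: name for name, i in label_to_index.items()}

y_example = ["nonforest", "forest", "snow_shadow_cloud"]
encoded = [label_to_index[label] for label in y_example]
decoded = [index_to_label[i] for i in encoded]

print(encoded)
print(decoded)
```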

In [None]:
# Separate 20% of Data for validation
X_train = X[0:12000]
X_val = X[12000:15000]
y_train = y_labels_num[0:12000]
y_val = y_labels_num[12000:15000]

In [None]:
X_train.shape

(12000, 120, 120, 3)
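
An 80/20 split can also be computed from the data set size rather than from hard-coded indices, which helps avoid accidentally skipping or overlapping rows. This is a minimal sketch, where `split_bounds` is a hypothetical helper.

```python
def split_bounds(n_samples, val_fraction=0.2):
    """Return (train_slice, val_slice) covering all n_samples with no gap or overlap."""
    n_train = n_samples - int(round(n_samples * val_fraction))
    return slice(0, n_train), slice(n_train, n_samples)


train_idx, val_idx = split_bounds(15000)
print(train_idx, val_idx)

# Usage with any sequence, for example a list of labels.
labels = list(range(15000))
print(len(labels[train_idx]), len(labels[val_idx]))
```

The same slices can be applied to both the image array and the label list so that the two stay aligned.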

## 3. Fit model on preprocessed data and save preprocessor function and model 

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets
from torchvision.transforms import ToTensor

### **Prepare Data** for PyTorch

In [None]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


In [None]:
# prepare datasets for pytorch dataloader
tensor_X_train = torch.Tensor(X_train)
tensor_y_train = torch.tensor(y_train, dtype=torch.long) 
train_ds = TensorDataset(tensor_X_train, tensor_y_train) 

tensor_X_test = torch.Tensor(X_val) 
tensor_y_test = torch.tensor(y_val, dtype=torch.long) 
test_ds = TensorDataset(tensor_X_test, tensor_y_test)

In [None]:
# set up dataloaders
batch_size = 50
train_dataloader = DataLoader(train_ds, batch_size=batch_size, shuffle=False)
test_dataloader = DataLoader(test_ds, batch_size=batch_size, shuffle=False)

In [None]:
for X, y in test_dataloader:
    print(f"Shape of X [N, H, W, C]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, H, W, C]: torch.Size([50, 120, 120, 3])
Shape of y: torch.Size([50]) torch.int64
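
The arrays produced by the preprocessor are channels-last, `[N, H, W, C]`. The fully connected network in this guide flattens its input, so the layout does not matter here, but PyTorch convolution layers such as `nn.Conv2d` expect channels-first input, `[N, C, H, W]`. If you try a convolutional model, a permutation along these lines is one way to convert a batch. The random tensor stands in for real data.

```python
import torch

# Stand-in for one batch from the dataloader: 50 channels-last images.
batch_nhwc = torch.rand(50, 120, 120, 3)

# Move the channel axis to position 1: [N, H, W, C] -> [N, C, H, W].
batch_nchw = batch_nhwc.permute(0, 3, 1, 2).contiguous()

print(batch_nhwc.shape, "->", batch_nchw.shape)
```

Whichever layout you train on, the preprocessor output and the example input used for the ONNX export would need to use that same layout.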


### PyTorch **Neural Network**

In [None]:
# Define pytorch model

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(120*120*3, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 3)  # one output per class
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=43200, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=3, bias=True)
  )
)


In [None]:
# set up loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In [None]:
# define training function

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [None]:
# define testing function

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [None]:
epochs = 1
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

#### Save preprocessor function to "preprocessor.zip" file

In [None]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

can't pickle module objects
Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file


In [None]:
# Save pytorch model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

example_input = torch.randn(1, 120, 120, 3, requires_grad=True)  # same [N, H, W, C] layout as the preprocessor output

onnx_model = model_to_onnx(model, framework='pytorch',
                           model_input=example_input,
                          transfer_learning=False,
                          deep_learning=True)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition

In [None]:
# import and preprocess X_test images in correct order...
# ...for leaderboard prediction submissions
filenumbers=[str(x) for x in range(1, 5001)]
filenames=["competition_data/testdata/test/test"+x for x in filenumbers]

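# Preprocess RGB images into (1, 120, 120, 3) numpy ndarrays
preprocessed_image_data=[]
for i in filenames:
  try:
    preprocessed_image_data.append(preprocessor(i))
  except Exception as e:
    # A skipped image shifts every later prediction out of order, so report it
    print(f"Skipping {i}: {e}")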

In [None]:
X_test_submissiondata=np.vstack(preprocessed_image_data) 
tensor_X_test_submissiondata = torch.Tensor(X_test_submissiondata) 

In [None]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials

# Note -- This is the unique REST API that powers this climate change image classification Model Playground
# ... Update the apiurl if submitting to a new competition

apiurl="https://srdmat3yhf.execute-api.us-east-1.amazonaws.com/prod/m"
set_credentials(apiurl=apiurl)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [None]:
#Instantiate Competition

mycompetition= ai.Competition(apiurl)

In [None]:
#Submit Model 1: 
import pandas 

#-- Generate predicted y values (Model 1)
# Note: argmax over the class dimension returns the predicted class index for each image
model.eval()
with torch.no_grad():
    prediction_column_index=model(tensor_X_test_submissiondata.to(device)).argmax(dim=1).tolist()

# extract correct prediction labels 
prediction_labels = [['forest', 'nonforest', 'snow_shadow_cloud'][i] for i in prediction_column_index]

In [None]:
# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 7

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1535


In [None]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

| | accuracy | f1_score | precision | recall | ml_framework | deep_learning | model_type | depth | num_params | dropout_layers | dense_layers | flatten_layers | conv2d_layers | maxpooling2d_layers | softmax_act | relu_act | loss | optimizer | model_config | memory_size | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 54.28% | 46.12% | 44.40% | 48.11% | sklearn | | RandomForestClassifier | | | | | | | | | | | | {'bootstrap': True, 'ccp_alpha... | | AIModelShare | 2 |
| 1 | 56.88% | 45.37% | 42.31% | 49.11% | sklearn | | RandomForestClassifier | | | | | | | | | | | | {'bootstrap': True, 'ccp_alpha... | | AIModelShare | 3 |
| 2 | 45.60% | 43.39% | 46.70% | 40.53% | keras | True | Sequential | 8.0 | 1847811.0 | 2.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | 3.0 | str | RMSprop | {'name': 'sequential', 'layers... | 2114288.0 | AIModelShare | 5 |
| 3 | 40.08% | 43.37% | 44.80% | 47.78% | keras | True | Sequential | 8.0 | 1847811.0 | 2.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | 3.0 | str | RMSprop | {'name': 'sequential', 'layers... | 2233032.0 | AIModelShare | 1 |
| 4 | 34.88% | 32.55% | 34.93% | 35.20% | unknown | | unknown | | | | | | | | | | | | None... | | AIModelShare | 4 |


## 5. Repeat submission process to improve place on leaderboard

*Train and submit your own models using code modeled after what you see above.*

It may also be useful to examine the architecture of models that perform particularly well or poorly, or to compare models you've created with similar models submitted by others. Use the compare_models function in combination with the leaderboard to learn more about models that have been previously submitted and to help decide what to try next.
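
As one way to pick models worth comparing, the leaderboard can be ranked by a metric. The sketch below uses plain Python on a few values copied from the leaderboard output above, and assumes metric values are percent strings as displayed there. `percent_to_float` is an illustrative helper, not part of aimodelshare.

```python
def percent_to_float(text):
    """Convert a string such as '54.28%' to the float 54.28."""
    return float(text.rstrip("%"))


# Version, accuracy and f1_score taken from the leaderboard shown earlier.
rows = [
    {"version": 2, "accuracy": "54.28%", "f1_score": "46.12%"},
    {"version": 3, "accuracy": "56.88%", "f1_score": "45.37%"},
    {"version": 5, "accuracy": "45.60%", "f1_score": "43.39%"},
    {"version": 1, "accuracy": "40.08%", "f1_score": "43.37%"},
    {"version": 4, "accuracy": "34.88%", "f1_score": "32.55%"},
]

ranked = sorted(rows, key=lambda r: percent_to_float(r["accuracy"]), reverse=True)
best, worst = ranked[0]["version"], ranked[-1]["version"]
print(f"highest accuracy: version {best}, lowest accuracy: version {worst}")
```

The resulting version numbers could then be passed to `compare_models`, as in the next cell.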

In [None]:
# Compare two or more models
data=mycompetition.compare_models([1, 5], verbose=1)
mycompetition.stylize_compare(data)

| | Model_1_Layer | Model_1_Shape | Model_1_Params | Model_5_Layer | Model_5_Shape | Model_5_Params |
|---|---|---|---|---|---|---|
| 0 | Conv2D | [None, 120, 120, 32] | 416 | Conv2D | [None, 120, 120, 32] | 416 |
| 1 | Conv2D | [None, 120, 120, 32] | 4128 | Conv2D | [None, 120, 120, 32] | 4128 |
| 2 | MaxPooling2D | [None, 60, 60, 32] | 0 | MaxPooling2D | [None, 60, 60, 32] | 0 |
| 3 | Dropout | [None, 60, 60, 32] | 0 | Dropout | [None, 60, 60, 32] | 0 |
| 4 | Flatten | [None, 115200] | 0 | Flatten | [None, 115200] | 0 |
| 5 | Dense | [None, 16] | 1843216 | Dense | [None, 16] | 1843216 |
| 6 | Dropout | [None, 16] | 0 | Dropout | [None, 16] | 0 |
| 7 | Dense | [None, 3] | 51 | Dense | [None, 3] | 51 |
