<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---


# Climate Change Satellite Image Classification Competition Model Submission Guide - sklearn

---
**About the Original Data:**<br>
*Data and Description accessed from [Tensorflow](https://www.tensorflow.org/datasets/catalog/bigearthnet)* <br>
The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels, which has been simplified to single labels with 3 categories for the purposes of this competition.

To construct the BigEarthNet, 125 Sentinel-2 tiles acquired between June 2017 and May 2018 over the 10 countries (Austria, Belgium, Finland, Ireland, Kosovo, Lithuania, Luxembourg, Portugal, Serbia, Switzerland) of Europe were initially selected. All the tiles were atmospherically corrected by the Sentinel-2 Level 2A product generation and formatting tool (sen2cor). Then, they were divided into 590,326 non-overlapping image patches. Each image patch was annotated by the multiple land-cover classes (i.e., multi-labels) that were provided from the CORINE Land Cover database of the year 2018 (CLC 2018).

Bands and pixel resolution in meters:

    B01: Coastal aerosol; 60m
    B02: Blue; 10m
    B03: Green; 10m
    B04: Red; 10m
    B05: Vegetation red edge; 20m
    B06: Vegetation red edge; 20m
    B07: Vegetation red edge; 20m
    B08: NIR; 10m
    B09: Water vapor; 60m
    B11: SWIR; 20m
    B12: SWIR; 20m
    B8A: Narrow NIR; 20m

License: Community Data License Agreement - Permissive, Version 1.0.

**Competition Data Specifics:**<br>
For the purpose of this competition, the original BigEarthNet dataset has been simplified to 20,000 images (15,000 training images and 5,000 test images) with 3 categories: "forest", "nonforest", and "snow_shadow_cloud", which contains images of snow and clouds. <br>
Each "image" is a folder with 12 satellite image layers, each of which picks up on different features. The example preprocessor uses just three layers: B02, B03, and B04, which contain the standard RGB layers used in ML models. However, you are free to use any combination of the satellite image layers. 
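
If you want to try a different band combination, one option is to pick the band files by their file-name suffix. The sketch below is only an illustration: `select_band_files` is a hypothetical helper, and the file names in `listing` are made up (the real folder contents may be named differently). Keep in mind that bands with different resolutions have different pixel dimensions (a 1.2 km patch is 120 pixels wide at 10m, 60 at 20m and 20 at 60m), so mixed-resolution bands would need resizing before they can be stacked.

```python
# Hypothetical helper: choose the files for a given set of bands from one folder listing.
def select_band_files(filenames, bands=("B04", "B03", "B02")):
    """Return one file name per requested band, in the order the bands were requested."""
    selected = []
    for band in bands:
        matches = [name for name in filenames if name.endswith(band + ".tif")]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one file for band {band}, found {len(matches)}")
        selected.append(matches[0])
    return selected


# Made-up listing for illustration only.
listing = ["patch_B08.tif", "patch_B02.tif", "patch_B04.tif", "patch_B03.tif", "patch_B8A.tif"]

rgb_files = select_band_files(listing)                               # red, green, blue
nir_files = select_band_files(listing, bands=("B08", "B04", "B03"))  # NIR, red, green
print(rgb_files)
print(nir_files)
```

Requesting the bands explicitly also fixes the channel order, which does not depend on the order in which the operating system lists the files.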

**Data Source:**<br>
Sumbul, G, Charfuelan, M, Demir, B and Markl, V. (2019). BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding. *Computing Research Repository (CoRR), abs/1902.06148.* https://www.tensorflow.org/datasets/catalog/bigearthnet

# Overview
---

Let's share our models to a centralized leaderboard, so that we can collaborate and learn from the model experimentation process...

**Instructions:**
1. Get data in and set up X_train / X_test / y_train
2. Preprocess data / Write and Save Preprocessor function
3. Fit model on preprocessed data and save preprocessor function and model
4. Generate predictions from X_test data and submit model to competition
5. Repeat submission process to improve place on leaderboard



## 1. Load Data

In [1]:
#install aimodelshare library
! pip install aimodelshare-nightly



In [2]:
# Get competition data - May take a couple minutes due to size of data set
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/climate_competition_data-repository:latest') 


Data downloaded successfully.


In [3]:
# Unzip Data - May take a couple minutes due to size of data set
import zipfile
with zipfile.ZipFile('climate_competition_data/climate_competition_data.zip', 'r') as zip_ref:
    zip_ref.extractall('competition_data')

## 2. Preprocess data / Write and Save Preprocessor function


In [4]:
# Set up for data preprocessing
import numpy as np
import os
import PIL
import PIL.Image
import tensorflow as tf
import tensorflow_datasets as tfds

In [5]:
# Here is a pre-designed preprocessor, but you could also build your own to prepare the data differently

def preprocessor(imageband_directory):
        """
        This function reads in the B02, B03 and B04 band images from one image folder,
        stacks them, scales them by a fixed maximum value and flattens the result,
        converting feature values to the float32 numeric values required by onnx files.
        
        params:
            imageband_directory
                path to folder with 12 satellite image bands
                      
        returns:
            X
                numpy array of preprocessed image data
                  
        """
           
        import PIL
        import os
        import numpy as np
        import tensorflow_datasets as tfds

        def _load_tif(data):
            """Loads TIF file and returns as float32 numpy array."""
            img=tfds.core.lazy_imports.PIL_Image.open(data)
            img = np.array(img.getdata()).reshape(img.size).astype(np.float32)
            return img

        image_list = []
        # Sort the file names so the bands are stacked in the same order for every image
        filelist1=sorted(os.listdir(imageband_directory))
        for fpath in filelist1:
          fullpath=imageband_directory+"/"+fpath
          if fullpath.endswith(('B02.tif','B03.tif','B04.tif')):
              imgarray=_load_tif(fullpath)
              image_list.append(imgarray)

        X = np.stack(image_list,axis=2)   # to get (height,width,3)

        X = np.expand_dims(X, axis=0) # Add a leading batch dimension: [1, h, w, channels]
        X = np.array(X, dtype=np.float32) # float32 is the type expected by onnx runtime.
        X=X/18581 # Scale by the maximum value
        X = X.flatten() # Flatten to a single row of features for sklearn models
        return X
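
Because the preprocessor flattens each image, every row holds 120 x 120 x 3 = 43,200 values (the sizes stated in the comments of this guide). The following sketch uses a stand-in array rather than real image data to show how a flattened row maps back to the `(height, width, channels)` layout, which can be handy for plotting or for sanity-checking the feature count.

```python
import numpy as np

# Sizes taken from the comments in this guide; adjust them if you change the preprocessor.
height, width, channels = 120, 120, 3

# Stand-in for one preprocessed row (real rows come from preprocessor(...)).
flat = np.arange(height * width * channels, dtype=np.float32)

feature_count = flat.shape[0]
image = flat.reshape(height, width, channels)  # undoes X.flatten() for a single image

print(feature_count)  # 43200
print(image.shape)    # (120, 120, 3)
```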

In [6]:
# Create complete list of file names
forestfilenames=["competition_data/trainingdata/forest/"+x for x in os.listdir("competition_data/trainingdata/forest")]
nonforestfilenames=["competition_data/trainingdata/nonforest/"+x for x in os.listdir("competition_data/trainingdata/nonforest")]
otherfilenames=["competition_data/trainingdata/other/"+x for x in os.listdir("competition_data/trainingdata/other")]

filenames=forestfilenames+nonforestfilenames+otherfilenames

# Preprocess RGB images into a (120, 120, 3) numpy ndarray, then flatten into a single 1 x 43200 row for each set of RGB images
preprocessed_image_data=[]
for i in filenames:
  try:
    preprocessed_image_data.append(preprocessor(i))
  except Exception:
    # Note: a skipped image leaves X with fewer rows than the label list set up in the next cell
    pass
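
A loop that silently skips failures can leave the feature rows and the labels out of step. One possible way to guard against that is to record which paths succeeded and which failed. This is a minimal sketch: `preprocess_all` and `fake_preprocessor` are hypothetical names used for illustration, and the paths are invented.

```python
def preprocess_all(paths, preprocess_fn):
    """Apply preprocess_fn to every path, keeping track of successes and failures."""
    rows, kept, failed = [], [], []
    for path in paths:
        try:
            rows.append(preprocess_fn(path))
            kept.append(path)
        except Exception as err:
            failed.append((path, repr(err)))
    return rows, kept, failed


def fake_preprocessor(path):
    """Stand-in for the real preprocessor: fails on any path containing 'bad'."""
    if "bad" in path:
        raise OSError("cannot read " + path)
    return [0.0, 1.0]


rows, kept, failed = preprocess_all(
    ["data/forest/1", "data/forest/bad", "data/other/2"], fake_preprocessor
)
print(len(rows), "succeeded;", len(failed), "failed")
for path, error in failed:
    print("skipped", path, error)
```

With the list of kept paths in hand, labels can be built for exactly the images that were preprocessed.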
  

In [7]:
# Set up y data
from itertools import repeat
forest=repeat("forest",5000)
nonforest=repeat("nonforest",5000)
other=repeat("snow_shadow_cloud",5000)
ylist=list(forest)+list(nonforest)+list(other)
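
The label list above assumes exactly 5,000 successfully preprocessed images per class, in folder order. An alternative, sketched below under the assumption that each training path ends in `<class folder>/<image folder>`, is to derive the label from the path itself. `label_from_path` is a hypothetical helper and the example image folder names are invented; the folder-to-label mapping follows the folder names and labels used in this guide.

```python
import os

# The "other" folder holds the images labelled "snow_shadow_cloud" in this guide.
FOLDER_TO_LABEL = {
    "forest": "forest",
    "nonforest": "nonforest",
    "other": "snow_shadow_cloud",
}


def label_from_path(path):
    """Map .../trainingdata/<class folder>/<image folder> to its competition label."""
    class_folder = os.path.basename(os.path.dirname(path))
    return FOLDER_TO_LABEL[class_folder]


# Invented image folder names, for illustration only.
example_paths = [
    "competition_data/trainingdata/forest/example_a",
    "competition_data/trainingdata/nonforest/example_b",
    "competition_data/trainingdata/other/example_c",
]
labels = [label_from_path(p) for p in example_paths]
print(labels)
```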

In [8]:
# Shuffle X and y data
from sklearn.utils import shuffle
X_train, y_train = shuffle(preprocessed_image_data, ylist, random_state=0)

In [9]:
X_train=np.vstack(X_train) # convert X from list to array

In [10]:
X_train.shape

(15000, 43200)

In [11]:
# Preprocess X_test Data: 

# import and preprocess X_test images in correct order...
# ...for leaderboard prediction submissions
filenumbers=[str(x) for x in range(1, 5001)]
filenames=["competition_data/testdata/test/test"+x for x in filenumbers]

# Preprocess RGB images into a (120, 120, 3) numpy ndarray, then flatten into a single 1 x 43200 row for each set of RGB images
preprocessed_image_data=[]
for i in filenames:
  try:
    preprocessed_image_data.append(preprocessor(i))
  except Exception:
    # Note: a skipped image shifts every later row out of the expected test order
    pass

In [12]:
X_test=np.vstack(preprocessed_image_data) # convert X from list to array

## 3. Fit model on preprocessed data and save preprocessor function and model 


In [14]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

model = RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state=0)
model.fit(X_train, y_train) # Fitting to the training set.
prediction_labels = model.predict(X_train)

# Macro-averaged F1 on the training data itself, so this is an optimistic estimate
f1_score(y_train, prediction_labels, average='macro')

0.5587660574136145
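
For readers unfamiliar with `average='macro'`: the F1 score is computed separately for each class and the per-class scores are then averaged with equal weight. The sketch below reimplements that idea in plain Python on a tiny made-up example. It is meant as an illustration of the formula, not as a replacement for `sklearn.metrics.f1_score`.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, where F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Tiny made-up example: one forest image is misclassified as nonforest.
y_true = ["forest", "forest", "nonforest", "snow_shadow_cloud"]
y_pred = ["forest", "nonforest", "nonforest", "snow_shadow_cloud"]

score = macro_f1(y_true, y_pred)
print(round(score, 4))  # per-class F1 is 2/3, 2/3 and 1, so the mean is 7/9
```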

In [15]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.metrics import f1_score

model = DecisionTreeClassifier(max_depth = 5, random_state = 123,
                                       splitter = "best", criterion = "gini")
model.fit(X_train, y_train) # Fitting to the training set.
prediction_labels = model.predict(X_train)

# Macro-averaged F1 on the training data itself, so this is an optimistic estimate
f1_score(y_train, prediction_labels, average='macro')

0.5707251275962552
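
Both scores above are measured on the same data the models were fitted to. To get a rougher but more honest estimate before submitting, one common approach is to hold out part of the training data. The sketch below uses small random stand-in data (`X_demo`, `y_demo`) so that it runs on its own; with the real data you would split `X_train` and `y_train` instead. The 80/20 split is an arbitrary choice.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Random stand-in data; replace with X_train / y_train from the cells above.
rng = np.random.default_rng(0)
X_demo = rng.random((300, 20)).astype(np.float32)
class_names = np.array(["forest", "nonforest", "snow_shadow_cloud"])
y_demo = class_names[rng.integers(0, 3, size=300)]

# Hold out 20% of the rows, keeping the class proportions similar in both parts.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

demo_model = DecisionTreeClassifier(max_depth=5, random_state=123)
demo_model.fit(X_fit, y_fit)

train_f1 = f1_score(y_fit, demo_model.predict(X_fit), average="macro")
val_f1 = f1_score(y_val, demo_model.predict(X_val), average="macro")
print("train macro F1:", train_f1)
print("validation macro F1:", val_f1)
```

A large gap between the two numbers suggests overfitting, in which case the leaderboard score is likely to be closer to the validation figure.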

#### Save preprocessor function to local "preprocessor.zip" file

In [16]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file

In [17]:
# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

# Determine how many preprocessed input features there are
from skl2onnx.common.data_types import FloatTensorType

feature_count=X_train.shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  #Insert correct number of preprocessed features

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition


In [18]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials
    
# This is the unique REST API that powers this Climate Change Satellite Image Classification Playground
apiurl="https://srdmat3yhf.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=apiurl)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [19]:
#Instantiate Competition
import aimodelshare as ai
mycompetition= ai.Competition(apiurl)

In [20]:
#Submit Model: 

#-- Generate predicted values (a list of predicted image categories) 
predicted_values = model.predict(X_test)

# Submit model to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=predicted_values)

Insert search tags to help users find your model (optional): karthik9
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 429

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1535
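
Before submitting, it can be worth checking that the predictions have the expected length and contain only the three competition labels. The helper below is a hypothetical sketch, not part of the aimodelshare library. The default count of 5,000 comes from the test set size stated in this guide.

```python
ALLOWED_LABELS = {"forest", "nonforest", "snow_shadow_cloud"}


def check_predictions(predictions, expected_count=5000):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    predictions = list(predictions)
    problems = []
    if len(predictions) != expected_count:
        problems.append(f"expected {expected_count} predictions, got {len(predictions)}")
    unknown = sorted(set(predictions) - ALLOWED_LABELS)
    if unknown:
        problems.append(f"unknown labels: {unknown}")
    return problems


# Small made-up examples; with real data you would pass predicted_values.
good = check_predictions(["forest", "nonforest", "snow_shadow_cloud"], expected_count=3)
bad = check_predictions(["forest", "other"], expected_count=3)
print(good)
print(bad)
```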


In [21]:
# Get leaderboard to explore current best model architectures
# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,globalmaxpooling2d_layers,inputlayer_layers,minimum_layers,dropout_layers,separableconv2d_layers,globalaveragepooling2d_layers,averagepooling2d_layers,maximum_layers,randomcontrast_layers,dense_layers,randomflip_layers,multiply_layers,maxpool2d_layers,lambda_layers,flatten_layers,rescaling_layers,randomzoom_layers,add_layers,conv2d_layers,depthwiseconv2d_layers,adaptiveavgpool2d_layers,reshape_layers,normalization_layers,zeropadding2d_layers,concatenate_layers,randomtranslation_layers,layernormalization_layers,batchnormalization_layers,conv2dtranspose_layers,softmax_act,sigmoid_act,elu_act,leakyrelu_act,softplus_act,tanh_act,relu_act,swish_act,silu_act,loss,optimizer,memory_size,randombrightness_layers,Member1,Member2,team,Deep Learning,Optimizer,Transfer Learning,member1,member2,MEMBER1,MEMBER2,username,version
0,82.40%,82.22%,82.98%,86.22%,pytorch,,True,ResNet(),109.0,23543875.0,,,,,,,,,,1.0,,,1.0,,,,,,53.0,,1.0,,,,,,,53.0,,,,,,,,17.0,,,,,564128.0,,,,,,,,,,,,AustinZ,414
1,84.44%,80.76%,84.43%,82.33%,pytorch,,True,Baseline(),114.0,28889667.0,,,,1.0,,,,,,4.0,,,1.0,,,,,,53.0,,1.0,,,,,,,54.0,,,,,,,,18.0,,,,,626416.0,,,,,,,,,,,,SuperbTUM,188
2,83.12%,82.03%,82.40%,82.89%,keras,,True,Functional,6.0,65667.0,,1.0,,2.0,,1.0,,,,2.0,,,0.0,,,,,,,,,,,,,,,0.0,,1.0,,1.0,,,,,,,str,Adam,13498096.0,,,,,,,,,,,,hywang,287
3,83.16%,80.22%,82.94%,81.71%,pytorch,,True,Baseline(),115.0,28889667.0,,,,2.0,,,,,,4.0,,,1.0,,,,,,53.0,,1.0,,,,,,,54.0,,,,,,,,17.0,,,,,626416.0,,,,,,,,,,,,SuperbTUM,190
4,81.60%,80.42%,81.77%,82.31%,keras,True,True,Sequential,8.0,136387.0,,,,3.0,,,,,,4.0,,,0.0,,1.0,,,,,,,,,,,,,0.0,,1.0,,,,,,3.0,,,str,Adam,16622592.0,,,,,,,,,,,,prajwalseth,252
5,83.12%,78.61%,83.13%,79.73%,pytorch,,True,Xception(),124.0,22005035.0,,,,1.0,,,,,,4.0,,,4.0,,,,,,74.0,,,,,,,,,41.0,,,,,,,,45.0,,,,,784616.0,,,,,,,,,,,,SuperbTUM,283
6,83.68%,78.37%,83.60%,79.29%,pytorch,,True,Baseline(),114.0,24632259.0,,,,1.0,,,,,,4.0,,,1.0,,,,,,53.0,,1.0,,,,,,,54.0,,,,,,,,18.0,,,,,626352.0,,,,,,,,,,,,SuperbTUM,347
7,80.24%,79.55%,81.42%,81.91%,keras,,True,Functional,6.0,131331.0,,1.0,,2.0,,1.0,,,,2.0,,,0.0,,,,,,,,,,,,,,,0.0,,1.0,,1.0,,,,,,,str,Adam,13503144.0,,,,,,,,,,,,hywang,260
8,83.68%,80.33%,82.58%,78.53%,keras,,True,Functional,6.0,65667.0,,1.0,,2.0,,1.0,,,,2.0,,,0.0,,,,,,,,,,,,,,,0.0,,1.0,,,,,,,,,str,Adam,13490728.0,,,,,,,,,,,,hywang,264
9,79.04%,78.89%,80.53%,83.33%,pytorch,,True,ResNet(),109.0,23543875.0,,,,,,,,,,1.0,,,1.0,,,,,,53.0,,1.0,,,,,,,53.0,,,,,,,,17.0,,,,,564128.0,,,,,,,,,,,,AustinZ,418
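
The metric columns in the output above are formatted as percent strings, so sorting them as text can give misleading results. The sketch below copies three rows from that output into plain dictionaries and ranks them numerically. Whether the data frame returned by `get_leaderboard()` stores these columns as strings or numbers is an assumption here, and `percent_to_float` is a hypothetical helper.

```python
def percent_to_float(text):
    """Convert a string such as '82.40%' to the float 82.4."""
    return float(text.rstrip("%"))


# Three rows copied from the leaderboard output above.
rows = [
    {"username": "AustinZ", "version": 414, "accuracy": "82.40%", "f1_score": "82.22%"},
    {"username": "SuperbTUM", "version": 188, "accuracy": "84.44%", "f1_score": "80.76%"},
    {"username": "hywang", "version": 287, "accuracy": "83.12%", "f1_score": "82.03%"},
]

by_f1 = sorted(rows, key=lambda r: percent_to_float(r["f1_score"]), reverse=True)
by_accuracy = sorted(rows, key=lambda r: percent_to_float(r["accuracy"]), reverse=True)

print([r["username"] for r in by_f1])
print([r["username"] for r in by_accuracy])
```

The two rankings differ for these rows, which is a reminder to decide which metric you are optimizing before comparing models.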


## 5. Repeat submission process to improve place on leaderboard
*Train and submit your own models using code modeled after what you see above.*

In [22]:
# Here are several classic ML architectures you can consider choosing from to experiment with next:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import GradientBoostingClassifier

#Example code to fit model:
model = RandomForestClassifier(n_estimators = 300, max_depth = 2, random_state=0)
model.fit(X_train, y_train) # Fitting to the training set.
model.score(X_train, y_train)

# Save sklearn model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx
from skl2onnx.common.data_types import FloatTensorType

feature_count=X_test.shape[1] #Get count of preprocessed features
initial_type = [('float_input', FloatTensorType([None, feature_count]))]  # Insert correct number of preprocessed features

onnx_model = model_to_onnx(model, framework='sklearn',
                          initial_types=initial_type,
                          transfer_learning=False,
                          deep_learning=False)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

#-- Generate predicted values (a list of predicted image categories)
prediction_labels = model.predict(X_test)

# Submit model to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)


Insert search tags to help users find your model (optional): karthik9
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 430

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1535


It may also be useful to examine the architecture of models that perform particularly well or poorly, or to compare models you've created with similar models submitted by others. Use the compare_models function in combination with the leaderboard to learn more about models that have been previously submitted and potentially make decisions about what you should do next.

In [23]:
# Compare two or more models
data=mycompetition.compare_models([2, 3], verbose=1)
mycompetition.stylize_compare(data)

| | param_name | default_value | model_version_2 | model_version_3 |
|---|---|---|---|---|
| 0 | bootstrap | True | True | True |
| 1 | ccp_alpha | 0.000000 | 0.000000 | 0.000000 |
| 2 | class_weight | | | |
| 3 | criterion | gini | gini | gini |
| 4 | max_depth | | 3 | 2 |
| 5 | max_features | auto | auto | auto |
| 6 | max_leaf_nodes | | | |
| 7 | max_samples | | | |
| 8 | min_impurity_decrease | 0.000000 | 0.000000 | 0.000000 |
| 9 | min_impurity_split | | | |
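
When a comparison has many rows, it helps to list only the parameters that actually differ. The sketch below does this for two plain dictionaries whose values are copied from a few rows of the comparison output above. `differing_params` is a hypothetical helper and does not use the aimodelshare API.

```python
def differing_params(params_a, params_b):
    """Return {name: (value_a, value_b)} for every parameter whose values differ."""
    names = sorted(set(params_a) | set(params_b))
    return {
        name: (params_a.get(name), params_b.get(name))
        for name in names
        if params_a.get(name) != params_b.get(name)
    }


# Values copied from the comparison output above.
model_version_2 = {"bootstrap": True, "criterion": "gini", "max_depth": 3, "max_features": "auto"}
model_version_3 = {"bootstrap": True, "criterion": "gini", "max_depth": 2, "max_features": "auto"}

diff = differing_params(model_version_2, model_version_3)
print(diff)  # {'max_depth': (3, 2)}
```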





