# Level 1: Rice Crop Discovery Tool Benchmark Notebook

## Challenge Level 1 Overview

<p align="justify">Welcome to the EY Open Science Data Challenge 2023! This challenge consists of two levels – Level 1 and Level 2. This is the Level 1 challenge aimed at participants who are beginners or have intermediate skill sets in data science and programming. The goal of Level 1 is to predict the presence of rice crops at a given location using satellite data. By the time you complete this level, you will have developed a rice crop classification model, which can distinguish between rice and non-rice fields. 
</p>

<b>Challenge Aim: </b><p align="justify"> <p>

<p align="justify">In this notebook, we will demonstrate a basic model workflow that can serve as a starting point for the challenge. The basic model has been built to predict rice crops against non-rice crops (which might include forest, other vegetation and water bodies) using features from the Sentinel-1 Radiometrically Terrain Corrected (RTC)  dataset as predictor variables. In this demonstration, we have used two features from the Sentinel-1 dataset, namely VV (Vertical polarization – Vertical polarization) and VH (Vertical polarization – Horizontal polarization) and trained a logistic regression model with these features. We have extracted the VV band and VH band data from the Sentinel-1 dataset for one day (21st March 2020), with an assumption that VV and VH values for this day are representative of VV and VH values for the entire year (2020) for a given location.

Most of the functions presented in this notebook were adapted from the <a href="https://planetarycomputer.microsoft.com/dataset/sentinel-1-rtc#Example-Notebook">Sentinel-1-RTC notebook</a> found in the Planetary Computer portal.</p>
    
<p align="justify"> Please note that this notebook is just a starting point. We have made many assumptions in this notebook that you may think are not best for solving the challenge effectively. You are encouraged to modify these functions, rewrite them, or try an entirely new approach.</p>

## Load In Dependencies

To run this demonstration notebook, you will need to have the following packages imported below installed. This may take some time.  

#### Note: Environment setup
Running this notebook requires an API key.

To use your API key locally, set the environment variable <i><b>PC_SDK_SUBSCRIPTION_KEY</i></b> or use <i><b>planetary_computer.settings.set_subscription_key(<YOUR API Key>)</i></b><br>
See <a href="https://planetarycomputer.microsoft.com/docs/concepts/sas/#when-an-account-is-needed">when an account is needed for more </a>, and <a href="https://planetarycomputer.microsoft.com/account/request">request</a> an account if needed.

In [None]:
!pip install tensorflow
# Supress Warnings
import warnings
warnings.filterwarnings('ignore')

# Visualization
import ipyleaflet
import matplotlib.pyplot as plt
from IPython.display import Image
import seaborn as sns
import urllib.request

# Data Science
import numpy as np
import pandas as pd
from statistics import fmean
import sys

# Feature Engineering
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, StandardScaler
from sklearn.model_selection import train_test_split

# Machine Learning
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import f1_score, accuracy_score,classification_report,confusion_matrix
from sklearn.neural_network import MLPClassifier

## TENSOR FLOW
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence

# Planetary Computer Tools
import pystac
import pystac_client
import odc
from pystac_client import Client
from pystac.extensions.eo import EOExtension as eo
from odc.stac import stac_load
import planetary_computer as pc
pc.settings.set_subscription_key('28c0d5ff91ba4def81b44a6322afaad4')

# Others
import requests
import os
import rich.table
from itertools import cycle
from tqdm import tqdm
tqdm.pandas()

## Response Variable

Before building the model, we need to load in the rice crop presence data. We have curated for you data from a certain region in Vietnam for the year 2020. The data consists of  geo locations (Latitude and Longitude) with a tag specifying if the crop present in each geo location is rice or not.  

In [3]:
crop_presence_data = pd.read_csv("Crop_Location_Data_20221201.csv")
crop_presence_data.head()

Unnamed: 0,Latitude and Longitude,Class of Land
0,"(10.323727047081501, 105.2516346045924)",Rice
1,"(10.322364360592521, 105.27843410554115)",Rice
2,"(10.321455902933202, 105.25254306225168)",Rice
3,"(10.324181275911162, 105.25118037576274)",Rice
4,"(10.324635504740822, 105.27389181724476)",Rice


## Predictor Variables

<p align ="justify">Now that we have our crop location data, it is time to gather the predictor variables from the Sentinel-1 dataset. For a more in-depth look regarding the Sentinel-1 dataset and how to query it, see the Sentinel-1 <a href="https://challenge.ey.com/api/v1/storage/admin-files/6403146221623637-63ca8d537b1fe300146c79d0-Sentinel%201%20Phenology.ipynb/"> supplementary 
notebook</a>.
    

<p align = "justify">Sentinel-1 radar data penetrates through the clouds, thus helping us to get the band values with minimal atmospheric attenuation. Band values such as VV and VH help us in distinguishing between the rice and non rice crops. Hence we are choosing VV and VH as predictor variables for this experiment. 
        
<ul>
<li>VV - gamma naught values of signal transmitted with vertical polarization and received with vertical polarization with radiometric terrain correction applied.

<li>VH - gamma naught values of signal transmitted with vertical polarization and received with horizontal polarization with radiometric terrain correction applied.
</ul>

<h4 style="color:rgb(195, 52, 235)"><strong>Tip 1</strong></h4>
<p align="justify">Participants might explore other combinations of bands from the Sentinel-1 data. For example, you can use mathematical combinations of bands to generate various <a href="https://challenge.ey.com/api/v1/storage/admin-files/3868217534768359-63ca8dc8aea56e00146e3489-Comprehensive%20Guide%20-%20Satellite%20Data.docx">vegetation indices </a> which can then be used as features in your model.


### Accessing the Sentinel-1 Data

<p align = "Justify">To get the Sentinel-1 data, we write a function called <i><b>get_sentinel_data.</b></i> This function will fetch VV and VH band values for a particular location over the specified time window. In this example, we have extracted VV and VH values for a day (21st March 2020). </p>

<h4 style="color:rgb(195, 52, 235)"><strong>Tip 2</strong></h4>
<p align="justify"> Extract VV and VH band values for an entire year. Different land classes (e.g., agriculture, water, urban) will have different annual variability. This variability will be better than a single date for accurately identifying land classes. Please find below a demonstration of extracting data for a day (21st March 2020).

In [4]:
def get_sentinel_data(latlong,time_slice,assets):
    '''
    Returns VV and VH values for a given latitude and longitude 
    Attributes:
    latlong - A tuple with 2 elements - latitude and longitude
    time_slice - Timeframe for which the VV and VH values have to be extracted
    assets - A list of bands to be extracted
    '''

    latlong=latlong.replace('(','').replace(')','').replace(' ','').split(',')
    box_size_deg = 0.0004
    
    min_lon = float(latlong[1])-box_size_deg/2
    min_lat = float(latlong[0])-box_size_deg/2
    max_lon = float(latlong[1])+box_size_deg/2
    max_lat = float(latlong[0])+box_size_deg/2
    
    bbox_of_interest = (min_lon, min_lat, max_lon, max_lat)
    time_of_interest = time_slice

    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1"
    )
    search = catalog.search(
        collections=["sentinel-1-rtc"], bbox=bbox_of_interest, datetime=time_of_interest
    )
    items = list(search.get_all_items())
    
    resolution = 10  # meters per pixel 
    scale = resolution / 111320.0 # degrees per pixel for crs=4326 
    
    data = stac_load(items,bands = assets, patch_url=pc.sign, bbox=bbox_of_interest, crs="EPSG:4326", resolution=scale)
    mean = data.mean(dim=['latitude','longitude']).compute()
    dop = (mean.vv / (mean.vv + mean.vh))
    m = 1 - dop
    rvi = (np.sqrt(m))*((4*mean.vh)/(mean.vv + mean.vh))
    
    return rvi

<h4 style="color:rgb(195, 52, 235)"><strong>Tip 3 </strong></h4>

Explore the approach of building a bounding box (e.g., 5x5 pixels) around the given latitude and longitude positions and then extract the aggregated band values (e.g., average, median) to get normalized band values to build the model. Radar data has inherent variability at the pixel level due to variable scattering response from the target. This effect is called “speckle” and it is common to filter the data to smooth these variations. Try using a 3x3, 5x5 or 7x7 window around the specific latitude and longitude point to get improved results.

In [None]:
## Function call to extract VV,VH Values
## This takes 4 hours to run so I've saved the data as sentinel_1_data
time_slice = "2020-03-20/2021-03-20"
assests = ['vh','vv']
vh_vv = []
for coordinates in tqdm(crop_presence_data['Latitude and Longitude']):
    rvi = get_sentinel_data(coordinates, time_slice, assests)
    vh_vv.append(rvi)
vh_vv_data = pd.DataFrame(vh_vv)

## Joining the predictor variables and response variables
Now that we have extracted our predictor variables, we need to join them onto the response variable . We use the function <i><b>combine_two_datasets</b></i> to combine the predictor variables and response variables.The <i><b>concat</b></i> function from pandas comes in handy here.

In [27]:
vh_vv_data = vh_vv_data.loc[:,0:92]

def combine_two_datasets(dataset1,dataset2):
    '''
    Returns a  vertically concatenated dataset.
    Attributes:
    dataset1 - Dataset 1 to be combined 
    dataset2 - Dataset 2 to be combined
    '''
    data = pd.concat([dataset1,dataset2], axis=1)
    return data

In [37]:
crop_data = combine_two_datasets(crop_presence_data,vh_vv_data)
crop_data.tail()

Unnamed: 0,Latitude and Longitude,Class of Land,0,1,2,3,4,5,6,7,...,83,84,85,86,87,88,89,90,91,92
595,"(10.013942985253381, 105.67361318732796)",Non Rice,"<xarray.DataArray ()>\narray(0.19655974, dtype...","<xarray.DataArray ()>\narray(0.39938805, dtype...","<xarray.DataArray ()>\narray(0.3184799, dtype=...","<xarray.DataArray ()>\narray(0.38925806, dtype...","<xarray.DataArray ()>\narray(0.2739313, dtype=...","<xarray.DataArray ()>\narray(0.2770606, dtype=...","<xarray.DataArray ()>\narray(0.33158368, dtype...","<xarray.DataArray ()>\narray(0.23231599, dtype...",...,"<xarray.DataArray ()>\narray(0.48238558, dtype...","<xarray.DataArray ()>\narray(0.27081254, dtype...","<xarray.DataArray ()>\narray(0.28089663, dtype...","<xarray.DataArray ()>\narray(0.33328292, dtype...","<xarray.DataArray ()>\narray(0.2522075, dtype=...","<xarray.DataArray ()>\narray(0.39207846, dtype...","<xarray.DataArray ()>\narray(0.22683817, dtype...","<xarray.DataArray ()>\narray(0.29818112, dtype...","<xarray.DataArray ()>\narray(0.36950523, dtype...","<xarray.DataArray ()>\narray(0.3994072, dtype=..."
596,"(10.01348875642372, 105.67361318732796)",Non Rice,"<xarray.DataArray ()>\narray(0.3555902, dtype=...","<xarray.DataArray ()>\narray(0.2903077, dtype=...","<xarray.DataArray ()>\narray(0.16377805, dtype...","<xarray.DataArray ()>\narray(0.32119307, dtype...","<xarray.DataArray ()>\narray(0.3793813, dtype=...","<xarray.DataArray ()>\narray(0.24735147, dtype...","<xarray.DataArray ()>\narray(0.32905725, dtype...","<xarray.DataArray ()>\narray(0.24606143, dtype...",...,"<xarray.DataArray ()>\narray(0.59484184, dtype...","<xarray.DataArray ()>\narray(0.33831546, dtype...","<xarray.DataArray ()>\narray(0.25411814, dtype...","<xarray.DataArray ()>\narray(0.22466914, dtype...","<xarray.DataArray ()>\narray(0.27413058, dtype...","<xarray.DataArray ()>\narray(0.32964402, dtype...","<xarray.DataArray ()>\narray(0.34736705, dtype...","<xarray.DataArray ()>\narray(0.42407405, dtype...","<xarray.DataArray ()>\narray(0.55382156, dtype...","<xarray.DataArray ()>\narray(0.40999198, dtype..."
597,"(10.013034527594062, 105.67361318732796)",Non Rice,"<xarray.DataArray ()>\narray(0.20934895, dtype...","<xarray.DataArray ()>\narray(0.2577243, dtype=...","<xarray.DataArray ()>\narray(0.21198733, dtype...","<xarray.DataArray ()>\narray(0.33406776, dtype...","<xarray.DataArray ()>\narray(0.33187065, dtype...","<xarray.DataArray ()>\narray(0.24814975, dtype...","<xarray.DataArray ()>\narray(0.27146998, dtype...","<xarray.DataArray ()>\narray(0.45057943, dtype...",...,"<xarray.DataArray ()>\narray(0.41493595, dtype...","<xarray.DataArray ()>\narray(0.28537256, dtype...","<xarray.DataArray ()>\narray(0.50599813, dtype...","<xarray.DataArray ()>\narray(0.47668934, dtype...","<xarray.DataArray ()>\narray(0.29529318, dtype...","<xarray.DataArray ()>\narray(0.36437353, dtype...","<xarray.DataArray ()>\narray(0.21710692, dtype...","<xarray.DataArray ()>\narray(0.5114662, dtype=...","<xarray.DataArray ()>\narray(0.35419172, dtype...","<xarray.DataArray ()>\narray(0.26622713, dtype..."
598,"(10.012580298764401, 105.67361318732796)",Non Rice,"<xarray.DataArray ()>\narray(0.21546237, dtype...","<xarray.DataArray ()>\narray(0.36383352, dtype...","<xarray.DataArray ()>\narray(0.3322873, dtype=...","<xarray.DataArray ()>\narray(0.2127097, dtype=...","<xarray.DataArray ()>\narray(0.51291436, dtype...","<xarray.DataArray ()>\narray(0.14234579, dtype...","<xarray.DataArray ()>\narray(0.5336944, dtype=...","<xarray.DataArray ()>\narray(0.28254122, dtype...",...,"<xarray.DataArray ()>\narray(0.31938356, dtype...","<xarray.DataArray ()>\narray(0.3118895, dtype=...","<xarray.DataArray ()>\narray(0.6270707, dtype=...","<xarray.DataArray ()>\narray(0.34401327, dtype...","<xarray.DataArray ()>\narray(0.27901357, dtype...","<xarray.DataArray ()>\narray(0.550253, dtype=f...","<xarray.DataArray ()>\narray(0.27162036, dtype...","<xarray.DataArray ()>\narray(0.34852818, dtype...","<xarray.DataArray ()>\narray(0.28245082, dtype...","<xarray.DataArray ()>\narray(0.311675, dtype=f..."
599,"(10.012126069934741, 105.67361318732796)",Non Rice,"<xarray.DataArray ()>\narray(0.2558531, dtype=...","<xarray.DataArray ()>\narray(0.3539578, dtype=...","<xarray.DataArray ()>\narray(0.43271172, dtype...","<xarray.DataArray ()>\narray(0.24186663, dtype...","<xarray.DataArray ()>\narray(0.2541362, dtype=...","<xarray.DataArray ()>\narray(0.18631743, dtype...","<xarray.DataArray ()>\narray(0.33960024, dtype...","<xarray.DataArray ()>\narray(0.3975323, dtype=...",...,"<xarray.DataArray ()>\narray(0.4691493, dtype=...","<xarray.DataArray ()>\narray(0.28549674, dtype...","<xarray.DataArray ()>\narray(0.45891556, dtype...","<xarray.DataArray ()>\narray(0.24588229, dtype...","<xarray.DataArray ()>\narray(0.18654916, dtype...","<xarray.DataArray ()>\narray(0.3637562, dtype=...","<xarray.DataArray ()>\narray(0.31024554, dtype...","<xarray.DataArray ()>\narray(0.5495329, dtype=...","<xarray.DataArray ()>\narray(0.7351637, dtype=...","<xarray.DataArray ()>\narray(0.5552375, dtype=..."


In [44]:
##saving data
crop_data.to_pickle(os.path.abspath("") + "/pickle")

In [None]:
##loading pickled RVI data
unpickled_crop_data = pd.read_pickle(os.path.abspath("") + "/pickle")
unpickled_crop_data

## Model Building


<p align="justify"> Now let us select the columns required for our model building exercise. We will consider only VV and VH for our model. It does not make sense to use latitude and longitude as predictor variables as they do not have any impact on presence of rice crop.</p>

In [46]:
crop_data = crop_data.drop(columns=['Latitude and Longitude'])

### Train and Test Split 

<p align="justify">We will now split the data into 70% training data and 30% test data. Scikit-learn alias “sklearn” is a robust library for machine learning in Python. The scikit-learn library has a <i><b>model_selection</b></i> module in which there is a splitting function <i><b>train_test_split</b></i>. You can use the same.</p>

In [47]:
X = crop_data.drop(columns=['Class of Land']).values
X.reshape(600,93,1)
y = crop_data['Class of Land'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,stratify=y,random_state=40)

### Feature Scaling 

<p align="justify"> Before initiating the model training we may have to execute different data pre-processing steps. Here we are demonstrating the scaling of VV and VH variable by using Standard Scaler.</p>

<p align = "justify">Feature Scaling is a data preprocessing step for numerical features. Many machine learning algorithms like Gradient descent methods, KNN algorithm, linear and logistic regression, etc. require data scaling to produce good results. Scikit learn provides functions that can be used to apply data scaling. Here we are using Standard Scaler.</p>

<h4 style="color:rgb(195, 52, 235)"><strong>Tip 4 </strong></h4>
<p align="justify">Participants might explore other feature scaling techniques like Min Max Scaler, Max Absolute Scaling, Robust Scaling etc.</p>

In [48]:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [60]:
y_train_int = pd.get_dummies(y_train).values.argmax(1)
y_test_int = pd.get_dummies(y_test).values.argmax(1)

### Model Training

<p justify ="align">Now that we have the data in a format appropriate for machine learning, we can begin training a model. In this demonstration notebook, we have used a binary logistic regression model from the scikit-learn library. This library offers a wide range of other models, each with the capacity for extensive parameter tuning and customization capabilities.</p>

<p justify ="align">Scikit-learn models require separation of predictor variables and the response variable. You have to store the predictor variables in array X and the response variable in the array Y. You must make sure not to include the response variable in array X. It also doesn't make sense to use latitude and longitude as predictor variables in such a confined area, so we drop those too.</p>

In [61]:
model = Sequential()
model.add(LSTM(100))
model.add(Dense(50, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.build((600,93,1))
print(model.summary())
model.fit(X_train, y_train_int, validation_data=(X_test, y_test_int), epochs=3, batch_size=64)

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_3 (LSTM)               (600, 100)                40800     
                                                                 
 dense_12 (Dense)            (600, 50)                 5050      
                                                                 
 dense_13 (Dense)            (600, 25)                 1275      
                                                                 
 dense_14 (Dense)            (600, 10)                 260       
                                                                 
 dense_15 (Dense)            (600, 1)                  11        
                                                                 
Total params: 47,396
Trainable params: 47,396
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/3


2023-03-28 05:04:16.272837: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
	 [[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-28 05:04:16.274872: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
	 [[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-28 05:04:16.276233: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You mus



2023-03-28 05:04:19.606021: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32
	 [[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]
2023-03-28 05:04:19.607891: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_grad/concat/split/split_dim' with dtype int32
	 [[{{node gradients/split_grad/concat/split/split_dim}}]]
2023-03-28 05:04:19.609323: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You mus

Epoch 2/3
Epoch 3/3


NameError: name 'X_test_int' is not defined

In [62]:
# Final evaluation of the model
scores = model.evaluate(X_test, y_test_int, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))

Accuracy: 98.33%


## Submission

Once you are happy with your model, you can make a submission. To make a submission, you will need to use your model to make predictions about the presence of rice crops for a set of test coordinates we have provided in the <a href="https://challenge.ey.com/api/v1/storage/admin-files/6847912254281276-63ca8b5ab12e510013520e2b-challenge_1_submission_template.csv"><b>"challenge_1_submission_template.csv"</b></a> file and upload the file onto the challenge platform.

In [63]:
#Reading the coordinates for the submission
test_file = pd.read_csv('challenge_1_submission_template.csv')
test_file.head()

Unnamed: 0,id,target
0,"(10.18019073690894, 105.32022315786804)",
1,"(10.561107033461816, 105.12772097986661)",
2,"(10.623790611954897, 105.13771401411867)",
3,"(10.583364246115156, 105.23946127195805)",
4,"(10.20744446668854, 105.26844107128906)",


In [66]:
## Get Sentinel-1-RTC Data
time_slice = "2020-03-20/2021-03-20"
assests = ['vh','vv']
vh_vv = []
for coordinates in tqdm(test_file['id']):
    vh_vv.append(get_sentinel_data(coordinates,time_slice,assests))
submission_vh_vv_data = pd.DataFrame(vh_vv)

100%|██████████| 250/250 [1:17:54<00:00, 18.70s/it]


In [67]:
submission_vh_vv_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,83,84,85,86,87,88,89,90,91,92
0,"<xarray.DataArray ()>\narray(0.20498002, dtype...","<xarray.DataArray ()>\narray(0.09581719, dtype...","<xarray.DataArray ()>\narray(0.20718579, dtype...","<xarray.DataArray ()>\narray(0.2533498, dtype=...","<xarray.DataArray ()>\narray(0.11844371, dtype...","<xarray.DataArray ()>\narray(0.02767199, dtype...","<xarray.DataArray ()>\narray(0.10165091, dtype...","<xarray.DataArray ()>\narray(0.04258522, dtype...","<xarray.DataArray ()>\narray(0.4925321, dtype=...","<xarray.DataArray ()>\narray(0.6745176, dtype=...",...,"<xarray.DataArray ()>\narray(0.5674154, dtype=...","<xarray.DataArray ()>\narray(0.4159905, dtype=...","<xarray.DataArray ()>\narray(0.73235065, dtype...","<xarray.DataArray ()>\narray(0.800328, dtype=f...","<xarray.DataArray ()>\narray(0.61892587, dtype...","<xarray.DataArray ()>\narray(0.60382855, dtype...","<xarray.DataArray ()>\narray(0.7630911, dtype=...","<xarray.DataArray ()>\narray(0.72475797, dtype...","<xarray.DataArray ()>\narray(0.70411044, dtype...","<xarray.DataArray ()>\narray(1.1346291, dtype=..."
1,"<xarray.DataArray ()>\narray(0.11710291, dtype...","<xarray.DataArray ()>\narray(0.11907005, dtype...","<xarray.DataArray ()>\narray(0.27990967, dtype...","<xarray.DataArray ()>\narray(0.14115386, dtype...","<xarray.DataArray ()>\narray(0.19613186, dtype...","<xarray.DataArray ()>\narray(0.06624521, dtype...","<xarray.DataArray ()>\narray(0.17149457, dtype...","<xarray.DataArray ()>\narray(0.14702904, dtype...","<xarray.DataArray ()>\narray(0.22493415, dtype...","<xarray.DataArray ()>\narray(0.5882499, dtype=...",...,"<xarray.DataArray ()>\narray(0.6932167, dtype=...","<xarray.DataArray ()>\narray(0.3987493, dtype=...","<xarray.DataArray ()>\narray(0.4053366, dtype=...","<xarray.DataArray ()>\narray(0.9910362, dtype=...","<xarray.DataArray ()>\narray(0.71792424, dtype...","<xarray.DataArray ()>\narray(0.5605409, dtype=...","<xarray.DataArray ()>\narray(0.64207584, dtype...","<xarray.DataArray ()>\narray(0.8856257, dtype=...","<xarray.DataArray ()>\narray(0.67131335, dtype...","<xarray.DataArray ()>\narray(1.2807736, dtype=..."
2,"<xarray.DataArray ()>\narray(0.43898377, dtype...","<xarray.DataArray ()>\narray(0.69650763, dtype...","<xarray.DataArray ()>\narray(0.7175918, dtype=...","<xarray.DataArray ()>\narray(0.4817952, dtype=...","<xarray.DataArray ()>\narray(0.81247586, dtype...","<xarray.DataArray ()>\narray(0.5653384, dtype=...","<xarray.DataArray ()>\narray(0.33836052, dtype...","<xarray.DataArray ()>\narray(0.45593515, dtype...","<xarray.DataArray ()>\narray(0.39517048, dtype...","<xarray.DataArray ()>\narray(0.3908047, dtype=...",...,"<xarray.DataArray ()>\narray(0.2704521, dtype=...","<xarray.DataArray ()>\narray(0.49552223, dtype...","<xarray.DataArray ()>\narray(0.3681865, dtype=...","<xarray.DataArray ()>\narray(0.16039291, dtype...","<xarray.DataArray ()>\narray(0.75872433, dtype...","<xarray.DataArray ()>\narray(0.38063878, dtype...","<xarray.DataArray ()>\narray(0.40482783, dtype...","<xarray.DataArray ()>\narray(0.93083644, dtype...","<xarray.DataArray ()>\narray(0.65628135, dtype...","<xarray.DataArray ()>\narray(0.27782217, dtype..."
3,"<xarray.DataArray ()>\narray(0.8433081, dtype=...","<xarray.DataArray ()>\narray(0.3684077, dtype=...","<xarray.DataArray ()>\narray(0.28390878, dtype...","<xarray.DataArray ()>\narray(0.3935043, dtype=...","<xarray.DataArray ()>\narray(0.5877848, dtype=...","<xarray.DataArray ()>\narray(0.6460531, dtype=...","<xarray.DataArray ()>\narray(0.53515136, dtype...","<xarray.DataArray ()>\narray(0.38372862, dtype...","<xarray.DataArray ()>\narray(0.39201453, dtype...","<xarray.DataArray ()>\narray(0.33640766, dtype...",...,"<xarray.DataArray ()>\narray(0.45283407, dtype...","<xarray.DataArray ()>\narray(0.5277884, dtype=...","<xarray.DataArray ()>\narray(0.42061043, dtype...","<xarray.DataArray ()>\narray(0.695183, dtype=f...","<xarray.DataArray ()>\narray(0.3238123, dtype=...","<xarray.DataArray ()>\narray(0.51410043, dtype...","<xarray.DataArray ()>\narray(0.4949537, dtype=...","<xarray.DataArray ()>\narray(0.45904544, dtype...","<xarray.DataArray ()>\narray(0.5261167, dtype=...","<xarray.DataArray ()>\narray(0.88999724, dtype..."
4,"<xarray.DataArray ()>\narray(0.46209118, dtype...","<xarray.DataArray ()>\narray(0.2610963, dtype=...","<xarray.DataArray ()>\narray(0.6274914, dtype=...","<xarray.DataArray ()>\narray(0.2072094, dtype=...","<xarray.DataArray ()>\narray(0.05597639, dtype...","<xarray.DataArray ()>\narray(0.10425574, dtype...","<xarray.DataArray ()>\narray(0.15725571, dtype...","<xarray.DataArray ()>\narray(0.13038425, dtype...","<xarray.DataArray ()>\narray(0.07319376, dtype...","<xarray.DataArray ()>\narray(0.13584733, dtype...",...,"<xarray.DataArray ()>\narray(0.49297413, dtype...","<xarray.DataArray ()>\narray(0.4613879, dtype=...","<xarray.DataArray ()>\narray(0.90365267, dtype...","<xarray.DataArray ()>\narray(0.29636106, dtype...","<xarray.DataArray ()>\narray(0.5994225, dtype=...","<xarray.DataArray ()>\narray(0.9913828, dtype=...","<xarray.DataArray ()>\narray(0.7312665, dtype=...","<xarray.DataArray ()>\narray(0.5909075, dtype=...","<xarray.DataArray ()>\narray(0.43711868, dtype...","<xarray.DataArray ()>\narray(0.63138276, dtype..."


In [68]:
# Feature Scaling 
submission_vh_vv_data = submission_vh_vv_data.values
transformed_submission_data = sc.transform(submission_vh_vv_data)


In [None]:
#Making predictions
final_predictions = model.predict(transformed_submission_data)


In [79]:
y_test

array(['Non Rice', 'Rice', 'Non Rice', 'Non Rice', 'Rice', 'Rice', 'Rice',
       'Non Rice', 'Rice', 'Rice', 'Rice', 'Non Rice', 'Rice', 'Rice',
       'Non Rice', 'Non Rice', 'Non Rice', 'Rice', 'Rice', 'Non Rice',
       'Non Rice', 'Non Rice', 'Rice', 'Non Rice', 'Non Rice', 'Rice',
       'Rice', 'Rice', 'Non Rice', 'Rice', 'Non Rice', 'Non Rice', 'Rice',
       'Non Rice', 'Rice', 'Rice', 'Rice', 'Rice', 'Rice', 'Non Rice',
       'Rice', 'Non Rice', 'Rice', 'Rice', 'Rice', 'Non Rice', 'Non Rice',
       'Non Rice', 'Rice', 'Non Rice', 'Non Rice', 'Rice', 'Non Rice',
       'Rice', 'Non Rice', 'Non Rice', 'Rice', 'Non Rice', 'Rice',
       'Non Rice', 'Rice', 'Non Rice', 'Rice', 'Rice', 'Non Rice',
       'Non Rice', 'Non Rice', 'Non Rice', 'Rice', 'Non Rice', 'Rice',
       'Non Rice', 'Rice', 'Rice', 'Rice', 'Rice', 'Rice', 'Non Rice',
       'Rice', 'Rice', 'Rice', 'Rice', 'Rice', 'Non Rice', 'Non Rice',
       'Non Rice', 'Rice', 'Non Rice', 'Rice', 'Rice', 'Non Rice',
      

In [78]:
final_prediction_series = pd.Series(final_predictions.flatten())
final_prediction_series

0      0.998045
1      0.997816
2      0.446641
3      0.428837
4      0.997346
         ...   
245    0.419690
246    0.387671
247    0.426460
248    0.772415
249    0.471928
Length: 250, dtype: float64

In [82]:
for i in range(len(final_prediction_series)):
    if final_prediction_series[i] >= 0.50:
        final_prediction_series[i] = "Rice"
    else:
        final_prediction_series[i] = "Non Rice"


In [83]:
final_prediction_series

0          Rice
1          Rice
2      Non Rice
3      Non Rice
4          Rice
         ...   
245    Non Rice
246    Non Rice
247    Non Rice
248        Rice
249    Non Rice
Length: 250, dtype: object

In [85]:
#Combining the results into dataframe
submission_df = pd.DataFrame({'id':test_file['id'].values, 'target':final_prediction_series.values})

In [86]:
#Displaying the sample submission dataframe
display(submission_df)

Unnamed: 0,id,target
0,"(10.18019073690894, 105.32022315786804)",Rice
1,"(10.561107033461816, 105.12772097986661)",Rice
2,"(10.623790611954897, 105.13771401411867)",Non Rice
3,"(10.583364246115156, 105.23946127195805)",Non Rice
4,"(10.20744446668854, 105.26844107128906)",Rice
...,...,...
245,"(10.308283266873062, 105.50872812216863)",Non Rice
246,"(10.582910017285496, 105.23991550078767)",Non Rice
247,"(10.581547330796518, 105.23991550078767)",Non Rice
248,"(10.629241357910818, 105.15315779432643)",Rice


In [87]:
#Dumping the predictions into a csv file.
submission_df.to_csv("challenge_1_submission_rice_crop_prediction.csv",index = False)

## Conclusion

Now that you have learned a basic approach to model training, it’s time to try your own approach! Feel free to modify any of the functions presented in this notebook. We look forward to seeing your version of the model and the results. Best of luck with the challenge!