# Learn from vector and compare the influence of data standardization in SVM


We use vectorTools to extract the band values from a vector file and to select a sampling method for Cross-Validation.

In [1]:
from MuseoToolBox import  learnAndPredict
from MuseoToolBox import vectorTools
import numpy as np

## Select an algorithm from sklearn
Here we select SVC from sklearn.svm.
We define the param_grid for the Cross-Validation according to the [parameters of SVC](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

In [2]:
from sklearn.svm import SVC
param_grid = dict(gamma=2.0**np.arange(-4,1), C=10.0**np.arange(-2,2)) 
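
To make the search space explicit, the short sketch below rebuilds the same grid and lists the candidate values. Only NumPy is needed; the variable names `gammas`, `Cs` and `n_combinations` are introduced here for illustration.

```python
import numpy as np

# Same grid as in the cell above.
param_grid = dict(gamma=2.0**np.arange(-4, 1), C=10.0**np.arange(-2, 2))

gammas = param_grid['gamma']  # powers of 2, from 2^-4 to 2^0
Cs = param_grid['C']          # powers of 10, from 10^-2 to 10^1

# Every (gamma, C) pair is evaluated on each fold of the Cross-Validation.
n_combinations = len(gammas) * len(Cs)

print('gamma candidates :', gammas)
print('C candidates     :', Cs)
print('combinations per fold :', n_combinations)
```

Both best values reported further down (gamma of 1.0 or 0.0625, C of 0.1 or 10.0) belong to this grid, and some of them sit on its edge, so widening the ranges may be worth testing.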

## Define the variables

In [3]:
inRaster = '../data/map.tif'
inVector = '../data/train_withROI.gpkg'
inField = 'Class'
inStand = 'uniqueFID'

## Set up the Cross-Validation and extract values from vector

In [4]:
# Extract the labels (Y), the stand identifiers (S) and the band values (X) from the vector
Y,S,X = vectorTools.readValuesFromVector(inVector,inField,inStand,bandPrefix='band_')
# Randomly keep at most 5000 samples to speed up the test
downsampled = np.random.permutation(X.shape[0])[:5000]
Y = Y[downsampled]
S = S[downsampled]
X = X[downsampled,:]
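
The subsampling above is not seeded, so the selected samples change at every run. The following is a minimal sketch of a reproducible variant, using synthetic arrays as stand-ins for the real `X`, `Y` and `S`. The seed value, the array sizes and the `*_sub` names are assumptions made for this example.

```python
import numpy as np

# Hypothetical stand-ins for the arrays returned by readValuesFromVector.
n_samples, n_bands = 20000, 3
rng = np.random.default_rng(12)  # fixed seed (assumption) for reproducibility
X = rng.random((n_samples, n_bands))      # band values
Y = rng.integers(1, 6, size=n_samples)    # class labels
S = rng.integers(0, 200, size=n_samples)  # stand identifiers

# Draw distinct row indices once, then apply them to the three arrays
# so that labels, stands and band values stay aligned.
n_keep = min(5000, n_samples)
idx = rng.choice(n_samples, size=n_keep, replace=False)
X_sub, Y_sub, S_sub = X[idx, :], Y[idx], S[idx]

print(X_sub.shape, Y_sub.shape, S_sub.shape)
```

Using one index array for the three arrays is what keeps each pixel attached to its class and to its stand.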

## Set up learnAndPredict
Create the learnAndPredict object. The classifier (here SVC) and its parameters are passed later, when calling learnFromVector.

In [5]:
model = learnAndPredict()

In [6]:
standMethod = vectorTools.samplingMethods.standCV(S,SLOO=False,maxIter=5,seed=12)
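
The idea behind a stand-based split can be sketched in plain NumPy: all pixels of a given stand go either to training or to validation, never to both. This is not the MuseoToolBox implementation (which may, for instance, split the stands class by class). The helper `stand_split` and its arguments are hypothetical.

```python
import numpy as np

def stand_split(stands, train_fraction=0.5, seed=None):
    """Hypothetical helper: split sample indices so that a stand is never
    shared between the training set and the validation set."""
    rng = np.random.default_rng(seed)
    # Shuffle the stand identifiers, not the individual pixels.
    unique_stands = rng.permutation(np.unique(stands))
    n_train = max(1, int(round(train_fraction * len(unique_stands))))
    train_stands = unique_stands[:n_train]
    in_train = np.isin(stands, train_stands)
    return np.flatnonzero(in_train), np.flatnonzero(~in_train)

# Toy stand identifiers: 6 stands with 4 pixels each.
S_demo = np.repeat(np.arange(6), 4)
train_idx, valid_idx = stand_split(S_demo, seed=12)

print('training stands   :', np.unique(S_demo[train_idx]))
print('validation stands :', np.unique(S_demo[valid_idx]))
```

Splitting by stand avoids validating on pixels that are direct neighbours of the training pixels, which would tend to give optimistic scores.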


## Train and compare performance with and without standardization

In [7]:
# Run the same grid search twice: with and without standardization

for scale in [True, False]:
    if scale:
        print('Results with standardized data')
    else:
        print('Results with unstandardized data')

    # Generate the Cross-Validation from the already extracted labels (Y), that's why inField (2nd argument) is None.
    standCV = vectorTools.sampleSelection(Y, None, standMethod)

    model.learnFromVector(X, Y, classifier=SVC(), param_grid=param_grid,
                          scale=scale, cv=standCV.getCrossValidationForScikitLearn())
    matrix,kappa,OA=model.getStatsFromCV(kappa=True,OA=True)
    
    for idx,mtrx in enumerate(matrix):
        print('Kappa : '+str(kappa[idx])+' | OA : '+str(OA[idx]))
        #print(mtrx)
    
    meanKappa = round(np.mean(kappa)*100,2)
    stdKappa = round(np.std(kappa)*100,2)
    print('Mean kappa for 5 iter with scale as {} : {}% (+-{})'.format(scale,meanKappa,stdKappa))
    print(40*"=")


Results with standardized data
best gamma : 1.0
best C : 0.1
Kappa : 0.911354396688 | OA : 0.953235908142
Kappa : 0.842579665241 | OA : 0.902118644068
Kappa : 0.890309116058 | OA : 0.939660590823
Kappa : 0.846020109451 | OA : 0.906418070993
Kappa : 0.860531304038 | OA : 0.916880891174
Mean kappa for 5 iter with scale as True : 87.02% (+-2.66)
Results with unstandardized data
best gamma : 0.0625
best C : 10.0
Kappa : 0.68292357346 | OA : 0.84885177453
Kappa : 0.760907286231 | OA : 0.856779661017
Kappa : 0.679636782596 | OA : 0.835637963545
Kappa : 0.6828542434 | OA : 0.820007171029
Kappa : 0.796223492896 | OA : 0.881748071979
Mean kappa for 5 iter with scale as False : 72.05% (+-4.87)
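
For readers who want to check the two scores printed above, here is a minimal sketch of how overall accuracy (OA) and Cohen's kappa can be derived from a confusion matrix. The functions `overall_accuracy` and `cohen_kappa` and the toy matrix are written for this example. They are not the code used by getStatsFromCV.

```python
import numpy as np

def overall_accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    matrix = np.asarray(matrix, dtype=float)
    return np.trace(matrix) / matrix.sum()

def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    matrix = np.asarray(matrix, dtype=float)
    total = matrix.sum()
    observed = np.trace(matrix) / total
    # Chance agreement: product of row and column totals, class by class.
    expected = (matrix.sum(axis=0) * matrix.sum(axis=1)).sum() / total**2
    return (observed - expected) / (1.0 - expected)

# Toy 2-class confusion matrix (rows and columns are the two classes).
confusion = np.array([[45, 5],
                      [10, 40]])
oa = overall_accuracy(confusion)
kappa_demo = cohen_kappa(confusion)

print('OA : {:.2f} | Kappa : {:.2f}'.format(oa, kappa_demo))
```

Kappa is always lower than OA when the classifier is better than chance, which matches the pairs of values printed for each fold.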


**So don't forget to standardize your data when using SVM (SVC for classification): here the mean kappa drops from 87.02% to 72.05% without standardization.**
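
One likely reason for this gap is that SVC uses an RBF kernel by default, and that kernel depends on the squared Euclidean distance between samples: a band with a large numeric range then dominates the distance. The sketch below illustrates this on toy values. The data, the helper names and the z-score standardization are assumptions made for this example, and the `scale` option of learnFromVector may not be implemented exactly this way.

```python
import numpy as np

def standardize(X):
    """Z-score each band: zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def band_share(a, b, band):
    """Share of the squared Euclidean distance due to one band."""
    squared_diff = (a - b) ** 2
    return squared_diff[band] / squared_diff.sum()

# Toy data: band 0 spans thousands, band 1 stays between 0 and 1.
X_demo = np.array([[1000.0, 0.1],
                   [1010.0, 0.9],
                   [3000.0, 0.1],
                   [3010.0, 0.9]])
X_scaled = standardize(X_demo)

# Samples 0 and 1 are close in band 0 but far apart in band 1.
raw_share = band_share(X_demo[0], X_demo[1], band=0)
scaled_share = band_share(X_scaled[0], X_scaled[1], band=0)

print('share of band 0, raw data          :', round(raw_share, 4))
print('share of band 0, standardized data :', round(scaled_share, 6))
```

On the raw values, almost all of the distance between the two samples comes from band 0, although they differ far more in band 1 relative to each band's range. After standardization each band contributes according to its own spread.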