<h1><font color='blue'>Finding the Probability P(Y==1|X)</font></h1>

<h2><font color='Geen'>Implementing Decision Function of SVM RBF Kernel</font></h2>

<font face=' Comic Sans MS' size=3>After we train a kernel SVM model, we will be getting support vectors and their corresponsing coefficients $\alpha_{i}$

Check the documentation for better understanding of these attributes: 

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
<img src='https://i.imgur.com/K11msU4.png' width=500>

As a part of this assignment you will be implementing the ```decision_function()``` of kernel SVM, here decision_function() means based on the value return by ```decision_function()``` model will classify the data point either as positive or negative

Ex 1: In logistic regression After traning the models with the optimal weights $w$ we get, we will find the value $\frac{1}{1+\exp(-(wx+b))}$, if this value comes out to be < 0.5 we will mark it as negative class, else its positive class

Ex 2: In Linear SVM After traning the models with the optimal weights $w$ we get, we will find the value of $sign(wx+b)$, if this value comes out to be -ve we will mark it as negative class, else its positive class.

Similarly in Kernel SVM After traning the models with the coefficients $\alpha_{i}$ we get, we will find the value of 
$sign(\sum_{i=1}^{n}(y_{i}\alpha_{i}K(x_{i},x_{q})) + intercept)$, here $K(x_{i},x_{q})$ is the RBF kernel. If this value comes out to be -ve we will mark $x_{q}$ as negative class, else its positive class.

RBF kernel is defined as: $K(x_{i},x_{q})$ = $exp(-\gamma ||x_{i} - x_{q}||^2)$

For better understanding check this link: https://scikit-learn.org/stable/modules/svm.html#svm-mathematical-formulation
</font>

## Task E

> 1. Split the data into $X_{train}$(60), $X_{cv}$(20), $X_{test}$(20)

> 2. Train $SVC(gamma=0.001, C=100.)$ on the ($X_{train}$, $y_{train}$)

> 3. Get the decision boundry values $f_{cv}$ on the $X_{cv}$ data  i.e. ` `$f_{cv}$ ```= decision_function(```$X_{cv}$```)```  <font color='red'>you need to implement this decision_function()</font>

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC

In [2]:
X, y = make_classification(n_samples=5000, n_features=5, n_redundant=2,
                           n_classes=2, weights=[0.7], class_sep=0.7, random_state=15)

In [3]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,train_size=0.8)
X_train,X_cv,y_train,y_cv = train_test_split(X_train,y_train,train_size=0.8)
print(X_train.shape, y_train.shape)
print(X_cv.shape, y_cv.shape)
print(X_test.shape, y_test.shape)

(3200, 5) (3200,)
(800, 5) (800,)
(1000, 5) (1000,)


### Pseudo code

clf = SVC(gamma=0.001, C=100.)<br>
clf.fit(Xtrain, ytrain)

<font color='green'>def</font> <font color='blue'>decision_function</font>(Xcv, ...): #use appropriate parameters <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color='green'>for</font> a data point $x_q$ <font color='green'>in</font> Xcv: <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<font color='grey'>#write code to implement $(\sum_{i=1}^{\text{all the support vectors}}(y_{i}\alpha_{i}K(x_{i},x_{q})) + intercept)$, here the values $y_i$, $\alpha_{i}$, and $intercept$ can be obtained from the trained model</font><br>
   <font color='green'>return</font> <font color='grey'><i># the decision_function output for all the data points in the Xcv</i></font>
    
fcv = decision_function(Xcv, ...)  <i># based on your requirement you can pass any other parameters </i>

<b>Note</b>: Make sure the values you get as fcv, should be equal to outputs of clf.decision_function(Xcv)


In [4]:
clf = SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

SVC(C=100.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [5]:
def decision_function(X_cv): #use appropriate parameters
    fcv = []
    gamma=0.001
    for xq in X_cv:
        val = 0
        for alpha,xi in zip(clf.dual_coef_[0],clf.support_vectors_): #the dual_coef_[i] contains label[i]*alpha[i]
            val += alpha*np.exp(-gamma*np.linalg.norm(xi-xq)**2) 
        fcv.append(val+clf.intercept_.item())
        
    return(np.array(fcv))            

In [6]:
fcv = decision_function(X_cv)
print(fcv)

[-1.89594501e+00  1.93189141e+00  2.10998841e+00  2.07778749e+00
 -1.43861288e+00 -2.66692020e+00 -1.76219056e+00 -3.76675201e-01
 -6.35368133e-01 -2.11467137e+00  1.40464887e+00 -3.23037181e+00
 -2.04674522e+00 -3.33172075e+00  1.69733312e+00  1.19634065e-01
  5.33301563e-01 -9.58660620e-01 -3.73636215e+00 -3.61134245e+00
 -9.02624269e-01 -3.27084238e+00 -3.43000967e+00 -3.75347922e+00
 -4.21745811e+00 -2.69345909e+00  1.64002314e+00 -3.38860741e+00
 -2.90989577e+00 -2.32970694e+00 -5.80713460e-01  1.44337608e+00
 -2.36425293e+00  1.73281604e+00  3.85856961e-01 -3.77054171e+00
 -1.72326763e+00 -1.26323096e+00 -2.35379028e+00 -1.18457119e-01
 -4.03950901e+00 -5.03399406e+00 -8.27659157e-01  1.97613668e+00
 -1.12503454e+00  1.64464730e+00  2.00081711e+00  1.84643861e+00
 -8.51200350e-01 -3.31432503e+00 -1.74397475e+00 -1.82901480e+00
  1.79273458e+00 -2.84069906e+00 -5.07695051e+00 -5.95878136e-01
 -2.46332246e-01 -1.18851693e+00  1.52910285e+00  1.33096500e+00
 -4.20294411e+00 -1.99481

In [7]:
clf.decision_function(X_cv)

array([-1.89594501e+00,  1.93189141e+00,  2.10998841e+00,  2.07778749e+00,
       -1.43861288e+00, -2.66692020e+00, -1.76219056e+00, -3.76675201e-01,
       -6.35368133e-01, -2.11467137e+00,  1.40464887e+00, -3.23037181e+00,
       -2.04674522e+00, -3.33172075e+00,  1.69733312e+00,  1.19634065e-01,
        5.33301563e-01, -9.58660620e-01, -3.73636215e+00, -3.61134245e+00,
       -9.02624269e-01, -3.27084238e+00, -3.43000967e+00, -3.75347922e+00,
       -4.21745811e+00, -2.69345909e+00,  1.64002314e+00, -3.38860741e+00,
       -2.90989577e+00, -2.32970694e+00, -5.80713460e-01,  1.44337608e+00,
       -2.36425293e+00,  1.73281604e+00,  3.85856961e-01, -3.77054171e+00,
       -1.72326763e+00, -1.26323096e+00, -2.35379028e+00, -1.18457119e-01,
       -4.03950901e+00, -5.03399406e+00, -8.27659157e-01,  1.97613668e+00,
       -1.12503454e+00,  1.64464730e+00,  2.00081711e+00,  1.84643861e+00,
       -8.51200350e-01, -3.31432503e+00, -1.74397475e+00, -1.82901480e+00,
        1.79273458e+00, -

NOTE: The values of fcv is equal to outputs of clf.decision_function(X_cv)

<h2><font color='Geen'> 9F: Implementing Platt Scaling to find P(Y==1|X)</font></h2>

Check this <a href='https://drive.google.com/open?id=133odBinMOIVb_rh_GQxxsyMRyW-Zts7a'>PDF</a>
<img src='https://i.imgur.com/CAMnVnh.png'>


## TASK F


> 4. Apply SGD algorithm with ($f_{cv}$, $y_{cv}$) and find the weight $W$ intercept $b$ ```Note: here our data is of one dimensional so we will have a one dimensional weight vector i.e W.shape (1,)``` 

> Note1: Don't forget to change the values of $y_{cv}$ as mentioned in the above image. you will calculate y+, y- based on data points in train data

> Note2: the Sklearn's SGD algorithm doesn't support the real valued outputs, you need to use the code that was done in the `'Logistic Regression with SGD and L2'` Assignment after modifying loss function, and use same parameters that used in that assignment.
<img src='https://i.imgur.com/zKYE9Oc.png'>
if Y[i] is 1, it will be replaced with y+ value else it will replaced with y- value

> 5. For a given data point from $X_{test}$, $P(Y=1|X) = \frac{1}{1+exp(-(W*f_{test}+ b))}$ where ` `$f_{test}$ ```= decision_function(```$X_{test}$```)```, W and b will be learned as metioned in the above step

__Note: in the above algorithm, the steps 2, 4 might need hyper parameter tuning, To reduce the complexity of the assignment we are excluding the hyerparameter tuning part, but intrested students can try that__

In [8]:
# Apply SGD algorithm with ( 𝑓𝑐𝑣 ,  𝑦𝑐𝑣 ) and find the weight  𝑊  intercept  𝑏  Note: here our data is of one dimensional 
# so we will have a one dimensional weight vector i.e W.shape (1,)

#Counting number of 0's & 1's in y_cv
pos, neg = 0, 0

for i in range(len(y_cv)):
    if y_cv[i] == 1:
        pos += 1
    else:
        neg += 1


# Caculating y+ value for y_cv 
val_pos = (pos + 1)/(pos + 2)

# Caculating y- value for y_cv 
val_neg = 1/(neg + 2)

_y_cv = np.copy(y_cv)
x_pos, x_neg = [], [] # values corresponds to y+, y- label

_y_cv = [float(i) for i in _y_cv]
for i in range(len(_y_cv)):
    if _y_cv[i] == 1:
        _y_cv[i] = val_pos
        x_pos.append(fcv[i])        
    else:
        _y_cv[i] = val_neg
        x_neg.append(fcv[i])
          
x_pos = np.array(x_pos)
x_neg = np.array(x_neg)

w = np.array([1]) # initializing w of shape(1,)
b = 0
eta0  = 0.0001
alpha = 0.0001
n1 = len(fcv)

def sigmoid(w,x,b):
    return 1 / (1 + np.exp(-(np.dot(w.transpose(), x) + b)))

loss = 0

for i in range(len(x_pos)):
    w = ( (1 - ((alpha * eta0)/n1)) * w ) +\
          ( alpha * x_pos[i] * ( val_pos - sigmoid(w, x_pos[i], b) ) ) 
            
    b = b + ( alpha * (val_pos- sigmoid(w, x_pos[i], b)))
    
    loss += (val_pos * np.log10(sigmoid(w,x_pos[i],b))) + (1 - val_pos) * np.log10(1-sigmoid(w,x_pos[i],b))
        
for i in range(len(x_neg)):
    w = ( (1 - ((alpha * eta0)/n1)) * w ) +\
          ( alpha * x_neg[i] * ( val_neg - sigmoid(w, x_neg[i], b) ) ) 
            
    b = b + ( alpha * (val_neg- sigmoid(w, x_neg[i], b)))
    
    loss += (val_neg * np.log10(sigmoid(w,x_neg[i],b))) + (1 - val_neg) * np.log10(1-sigmoid(w,x_neg[i],b))     
    
print("w.shape: ", w.shape)
print("b: " , b)

w.shape:  (1,)
b:  [-0.00186871]


In [11]:
# For a given data point from  𝑋𝑡𝑒𝑠𝑡 ,  𝑃(𝑌=1|𝑋)=11+𝑒𝑥𝑝(−(𝑊∗𝑓𝑡𝑒𝑠𝑡+𝑏))  where  𝑓𝑡𝑒𝑠𝑡  = decision_function( 𝑋𝑡𝑒𝑠𝑡 ), 
# W and b will be learned as metioned in the above step.

def decision_function(X_test, _intercept): #use appropriate parameters
    fcv = []
    gamma=0.001
    for xq in X_test:
        val = 0
        for alpha,xi in zip(clf.dual_coef_[0], clf.support_vectors_): 
            val += alpha*np.exp(-gamma*np.linalg.norm(xi-xq)**2) 
        fcv.append(val+_intercept)
        
    return(np.array(fcv))            

_intercept = b
ftest = decision_function(X_test, _intercept)

In [13]:
# Finding Calibrated Probabilities
P = 1/(1 + np.exp(-((w * ftest) + b)))
P

array([[0.68655652],
       [0.5391208 ],
       [0.75763094],
       [0.09979582],
       [0.88330061],
       [0.19394424],
       [0.29650535],
       [0.13102461],
       [0.13326018],
       [0.39968704],
       [0.23818807],
       [0.13927706],
       [0.92403469],
       [0.0135782 ],
       [0.37357327],
       [0.19096898],
       [0.17403915],
       [0.09266651],
       [0.96165875],
       [0.21738164],
       [0.39786513],
       [0.67302415],
       [0.52568223],
       [0.95901682],
       [0.31884448],
       [0.05206948],
       [0.12282879],
       [0.20576168],
       [0.67072056],
       [0.92023002],
       [0.18037455],
       [0.31493632],
       [0.95497239],
       [0.57284782],
       [0.9430279 ],
       [0.75916746],
       [0.15289052],
       [0.1112763 ],
       [0.45052753],
       [0.08005358],
       [0.95497186],
       [0.27461864],
       [0.27454201],
       [0.10468367],
       [0.78509835],
       [0.87126434],
       [0.48245338],
       [0.648


If any one wants to try other calibration algorithm istonic regression also please check these tutorials

1. http://fa.bianp.net/blog/tag/scikit-learn.html#fn:1

2. https://drive.google.com/open?id=1MzmA7QaP58RDzocB0RBmRiWfl7Co_VJ7

3. https://drive.google.com/open?id=133odBinMOIVb_rh_GQxxsyMRyW-Zts7a

4. https://stat.fandom.com/wiki/Isotonic_regression#Pool_Adjacent_Violators_Algorithm
