# Instance Based


----------------------------------------------------------------------------------------------------------------------

#### K-Nearest Neighbour (KNN)

Pros: Simple to implement; Flexible in the choice of features and distance metric; Naturally handles multi-class cases; Can do well in practice given enough representative data; Larger k values reduce sensitivity to outliers; Makes no assumptions about the data distribution;

Cons: Computationally intensive at prediction time; Must store the entire training set; Requires a meaningful distance metric; Sensitive to local patterns (noise) in the data; Predictions depend entirely on the stored training data;

#### Learning Vector Quantization (LVQ)

Pros: Supports binary and multi-class prediction; Model complexity (number of prototype nodes) can be adjusted during training as needed; Non-parametric model, which can be more accurate when the data do not follow a simple parametric form;

Cons: Requires a meaningful distance metric; Computationally intensive;

#### Self-Organizing Map (SOM) 

Pros:

Cons:

#### Locally Weighted Learning (LWL)

Pros:

Cons:



## --------- K-Nearest Neighbour (KNN)

#### Wiki Definition:
K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k nearest neighbors. The algorithm assigns unlabeled data points to well-defined groups.
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.
Both for classification and regression, it can be useful to assign weight to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.[2]
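
The voting and averaging rules above can be sketched from scratch in a few lines. This is a minimal illustration, not a library implementation: the function name `knn_predict`, its parameters, and the toy data are assumptions made for the example, and Euclidean distance is assumed throughout.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3, task="classification", weighted=False):
    """Predict for one query point from its k nearest training points (hypothetical helper)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every stored training point
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    # Uniform weights, or 1/d weights (a small epsilon avoids division by zero)
    weights = 1.0 / (dists[nearest] + 1e-12) if weighted else np.ones(k)

    if task == "regression":
        # (Weighted) average of the neighbors' values
        return float(np.average(y_train[nearest].astype(float), weights=weights))

    # (Weighted) majority vote over the neighbors' labels
    votes = Counter()
    for label, w in zip(y_train[nearest], weights):
        votes[label] += w
    return votes.most_common(1)[0][0]


# Toy data: three points near the origin and two points far away
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["a", "a", "a", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 12.0]

label_near_origin = knn_predict(X_train, labels, [0.2, 0.1], k=3)
value_near_origin = knn_predict(X_train, values, [0.2, 0.1], k=3, task="regression")
label_far_weighted = knn_predict(X_train, labels, [5.2, 5.0], k=3, weighted=True)
```

Note that "training" here is nothing more than keeping `X_train` and `y_train` around; all the work happens at query time, which is what makes k-NN a lazy learner.
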
#### Input Data:
X(Numeric)/X(Categorical)

#### Initial Parameters:
K: number of nearest neighbors (controls the trade-off between a smooth, nearly linear decision boundary at large K and a closely fitted one at small K)
#### Cost Function:
No cost function is optimized during training; prediction relies on a distance measure, typically Euclidean distance
#### Process Flow:
The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.
In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance). In the context of gene expression microarray data, for example, k-NN has also been employed with correlation coefficients such as Pearson and Spearman.[3] Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood components analysis.
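
As a rough illustration of the metrics mentioned above, the sketch below implements Euclidean distance, the overlap (Hamming) metric, and a correlation-based distance. The function names are made up for this example, and turning a Pearson correlation r into a distance as 1 - r is one common convention assumed here, not something the text prescribes.

```python
import numpy as np


def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def hamming(a, b):
    """Overlap metric: number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def correlation_distance(a, b):
    """1 - Pearson correlation, so perfectly correlated vectors are at distance 0."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


d_euclid = euclidean([0, 0], [3, 4])                 # 3-4-5 triangle
d_hamming = hamming("karolin", "kathrin")            # differs in 3 positions
d_corr = correlation_distance([1, 2, 3], [2, 4, 6])  # perfectly correlated vectors
```
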
A drawback of the basic "majority voting" classification occurs when the class distribution is skewed. That is, examples of a more frequent class tend to dominate the prediction of the new example, because they tend to be common among the k nearest neighbors due to their large number.[4] One way to overcome this problem is to weight the classification, taking into account the distance from the test point to each of its k nearest neighbors. The class (or value, in regression problems) of each of the k nearest points is multiplied by a weight proportional to the inverse of the distance from that point to the test point. Another way to overcome skew is by abstraction in data representation. For example, in a self-organizing map (SOM), each node is a representative (a center) of a cluster of similar points, regardless of their density in the original training data. K-NN can then be applied to the SOM.
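
The effect of skew can be seen on a small made-up example. The sketch below uses scikit-learn's `KNeighborsClassifier` with `weights="uniform"` and `weights="distance"`; the one-dimensional data set is invented so that the query sits right next to the two minority-class points but still has three majority-class points among its five nearest neighbors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Six points of the frequent class 0 and two points of the rare class 1
X = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.1]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
query = np.array([[3.05]])  # lies between the two class-1 points

uniform = KNeighborsClassifier(n_neighbors=5, weights="uniform").fit(X, y)
weighted = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X, y)

pred_uniform = uniform.predict(query)[0]    # plain majority vote: 3 votes to 2
pred_weighted = weighted.predict(query)[0]  # votes weighted by inverse distance
print(pred_uniform, pred_weighted)
```

With uniform weights the three more distant class-0 neighbors outvote the two adjacent class-1 neighbors; with inverse-distance weights the two close neighbors dominate.
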
#### Tips:
The choice of k, the number of nearest neighbors, largely determines how well the model generalizes. A large k reduces the variance caused by noisy data, but introduces bias: the learner tends to ignore smaller patterns that may carry useful information. A small k does the opposite, following local detail closely at the cost of higher variance.
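
One common way to pick k in practice is cross-validation. The sketch below is an assumption-laden example rather than a recipe: it uses the iris data, 5-fold cross-validation, and an arbitrary list of candidate values, and it scales the features inside a pipeline so that the scaler is fitted only on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9, 11, 15, 21]  # arbitrary grid chosen for illustration
mean_accuracy = {}
for k in candidate_ks:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of the 5 folds
    mean_accuracy[k] = scores.mean()

best_k = max(mean_accuracy, key=mean_accuracy.get)
print(f"best k = {best_k}, mean accuracy = {mean_accuracy[best_k]:.3f}")
```
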

In [None]:
# ---------------------------- R

# Classification
# https://www.analyticsvidhya.com/blog/2015/08/learning-concept-knn-algorithms-programming/
prc <- read.csv("Prostate_Cancer.csv", stringsAsFactors = FALSE)  # import the data
prc <- prc[-1]  # drop the first column (id); diagnosis_result is now column 1
prc$diagnosis <- factor(prc$diagnosis_result, levels = c("B", "M"), labels = c("Benign", "Malignant"))
round(prop.table(table(prc$diagnosis)) * 100, digits = 1)  # class shares in percent, rounded to 1 decimal place
# Min-max scaling so that every feature lies in [0, 1]
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}
prc_n <- as.data.frame(lapply(prc[2:9], normalize))  # normalize columns 2 to 9 (the features)
prc_train <- prc_n[1:65, ]
prc_test <- prc_n[66:100, ]
# Column 1 of prc (diagnosis_result) supplies the class labels as vectors
prc_train_labels <- prc[1:65, 1]
prc_test_labels <- prc[66:100, 1]
install.packages("class")  # only needed once
library(class)
prc_test_pred <- knn(train = prc_train, test = prc_test, cl = prc_train_labels, k = 10)
install.packages("gmodels")  # only needed once
library(gmodels)
CrossTable(x = prc_test_labels, y = prc_test_pred, prop.chisq = FALSE)

# Regression
# https://artax.karlin.mff.cuni.cz/r-help/library/FNN/html/knn.reg.html
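library(FNN)           # provides knn.reg
require(chemometrics)  # provides the PAC data set
data(PAC)
pac.knn <- knn.reg(PAC$X, y = PAC$y, k = 3)  # no test set given, so predictions are leave-one-out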

In [None]:
# ---------------------------- Python

# Classification
# load libraries {numpy, sklearn}
from sklearn import datasets
import numpy as np
# Loading the dataset
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]  # petal length and petal width
y = iris.target
# Splitting the dataset
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Scaling the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()  # define the scaler object
sc.fit(X_train)  # learn the mean and standard deviation from the training data only
X_train_std = sc.transform(X_train)  # scale the training data
X_test_std = sc.transform(X_test)  # scale the test data with the same parameters
# KNN Classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')  # p = 1 {Manhattan distance} : p = 2 {Euclidean distance}
knn.fit(X_train_std, y_train)
print(knn.score(X_test_std, y_test))  # accuracy on the held-out test set

# Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn import neighbors

np.random.seed(0)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
T = np.linspace(0, 5, 500)[:, np.newaxis]
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 1 * (0.5 - np.random.rand(8))

n_neighbors = 5

for i, weights in enumerate(['uniform', 'distance']):
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
    y_ = knn.fit(X, y).predict(T)

    plt.subplot(2, 1, i + 1)
    plt.scatter(X, y, c='k', label='data')
    plt.plot(T, y_, c='g', label='prediction')
    plt.axis('tight')
    plt.legend()
    plt.title("KNeighborsRegressor (k = %i, weights = '%s')" % (n_neighbors,
                                                                weights))

plt.show()

## --------- Learning Vector Quantization (LVQ)

#### Wiki Definition:
https://www.researchgate.net/publication/259486415_A_review_of_learning_vector_quantization_classifiers
http://www.cs.rug.nl/~biehl/Preprints/wsom07lvq.pdf

In computer science, learning vector quantization (LVQ) is a prototype-based supervised classification algorithm. LVQ is the supervised counterpart of vector quantization systems.
#### Input Data:
X(Numeric)/X(Categorical)

#### Initial Parameters:
Initial weights; Number of output nodes; K (default = 1, competitive learning); Learning rate (size of each update step);
#### Cost Function:
D(X, J): the distance between a single training example X and output node J; only the closest node is updated (*competitive, winner-takes-all*)
#### Process Flow:
http://ccy.dd.ncu.edu.tw/~chen/course/neural/ch4/index.htm

1. Initialize the M weights (one per feature) of each of the J output nodes, and randomly assign the J output classes to the J nodes.
2. For each training example X with M feature values (like KNN with K = 1, every example is evaluated on its own), find the node J that minimizes the distance D(X, J), usually the Euclidean distance.
3. Update the M weights of that winning node: if the true class of X equals the node's class, move the node closer to X; otherwise move it away from X. The size of the step is set by the learning rate.
4. Repeat steps 2 and 3 iteratively, passing over the training set several times.
5. Stop when the stopping criterion is met or the maximum number of repetitions is reached.
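
The update loop described above can be sketched in NumPy as follows. This is a minimal LVQ1-style illustration under stated assumptions: the function names, the linearly decaying learning rate, the fixed number of epochs as the stopping rule, and the synthetic two-cluster data are all choices made for the example. Prototypes are initialized from one training example per class rather than from random weights, to keep the example deterministic.

```python
import numpy as np


def train_lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=20):
    """Pull the winning prototype toward same-class examples, push it away otherwise."""
    W = np.array(prototypes, dtype=float)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # decaying learning rate (an assumption)
        for x, label in zip(X, y):
            # Winner-takes-all: only the closest prototype is updated
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            direction = 1.0 if proto_labels[winner] == label else -1.0
            W[winner] += direction * rate * (x - W[winner])
    return W


def predict_lvq(X, W, proto_labels):
    """Assign each row of X the class of its nearest prototype."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.asarray(proto_labels)[np.argmin(dists, axis=1)]


# Synthetic data: two well-separated Gaussian clusters, one per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(3.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# One prototype per class, initialized from a training example of that class
W = train_lvq1(X, y, prototypes=[X[0], X[30]], proto_labels=[0, 1])
accuracy = float(np.mean(predict_lvq(X, W, [0, 1]) == y))
print(W, accuracy)
```

After training, prediction only needs the prototypes in `W`, not the full training set, which is the main practical difference from KNN.
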
#### Tips:


In [None]:
# ---------------------------- R

# Classification
# Find LVQ1,2,3 in https://cran.r-project.org/web/packages/class/class.pdf
# http://astrostatistics.psu.edu/su07/R/html/class/html/lvq3.html
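# LVQ3
library(class)  # provides lvqinit, olvq1, lvq3 and lvqtest
data(iris3)
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))  # setosa, versicolor, virginica
cd <- lvqinit(train, cl, 10)  # initial codebook of 10 prototype vectors
lvqtest(cd, train)            # classes predicted by the initial codebook
cd0 <- olvq1(train, cl, cd)   # refine the codebook with optimized LVQ1
lvqtest(cd0, train)
cd3 <- lvq3(train, cl, cd0)   # fine-tune the codebook with LVQ3
lvqtest(cd3, train)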


In [None]:
# ---------------------------- Python

# http://mnemstudio.org/ai/nn/lvq_python_ex1.txt
# Another – https://pythonhosted.org/neurolab/ex_newlvq.html
"""
Example of use LVQ network
==========================
"""
import numpy as np
import neurolab as nl

# Create train samples
input = np.array([[-3, 0], [-2, 1], [-2, -1], [0, 2], [0, 1], [0, -1], [0, -2], 
                                                        [2, 1], [2, -1], [3, 0]])
target = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1], 
                                                        [1, 0], [1, 0], [1, 0]])

# Create network with 2 layers: 4 neurons in the input layer (competitive)
# and 2 neurons in the output layer (linear);
# [.6, .4] is the share of competitive neurons assigned to each class
net = nl.net.newlvq(nl.tool.minmax(input), 4, [.6, .4])
# Train network
error = net.train(input, target, epochs=1000, goal=-1)

# Plot result
import pylab as pl
xx, yy = np.meshgrid(np.arange(-3, 3.4, 0.2), np.arange(-3, 3.4, 0.2))
xx.shape = xx.size, 1
yy.shape = yy.size, 1
i = np.concatenate((xx, yy), axis=1)
o = net.sim(i)
grid1 = i[o[:, 0]>0]
grid2 = i[o[:, 1]>0]

class1 = input[target[:, 0]>0]
class2 = input[target[:, 1]>0]

pl.plot(class1[:,0], class1[:,1], 'bo', class2[:,0], class2[:,1], 'go')
pl.plot(grid1[:,0], grid1[:,1], 'b.', grid2[:,0], grid2[:,1], 'gx')
pl.axis([-3.2, 3.2, -3, 3])
pl.xlabel('Input[:, 0]')
pl.ylabel('Input[:, 1]')
pl.legend(['class 1', 'class 2', 'detected class 1', 'detected class 2'])
pl.show()

## --------- Self-Organizing Map (SOM) 

#### Wiki Definition:

#### Input Data:
X(Numeric)/X(Categorical)

#### Initial Parameters:

#### Cost Function:

#### Process Flow:

#### Tips:


In [None]:
# ---------------------------- R

In [None]:
# ---------------------------- Python

## --------- Locally Weighted Learning (LWL)

#### Wiki Definition:

#### Input Data:
X(Numeric)/X(Categorical)

#### Initial Parameters:

#### Cost Function:

#### Process Flow:

#### Tips:


In [None]:
# ---------------------------- R

In [None]:
# ---------------------------- Python