# Support vector machine-based software reuse prediction

## Objective: To implement SVM from scratch and compare it with sklearn's SVM

Source of SVM: https://dzone.com/articles/classification-from-scratch-svm-78

In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. SVM presents one of the most robust prediction methods, based on the statistical learning framework. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on the side of the gap on which they fall.

In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.


### 1. For all ti in training set:
 ti.w + b <= -1   if yi = -1 

 ti.w + b >= +1 if yi = +1 

or

yi(ti.w+b) >= 1
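The combined constraint can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up 2-D dataset; the function name `satisfies_margin` and the values of `T`, `y`, `w`, and `b` are hypothetical and chosen only for illustration.

```python
import numpy as np

def satisfies_margin(T, y, w, b):
    """Return True if y_i * (t_i . w + b) >= 1 holds for every training point."""
    margins = y * (T @ w + b)
    return bool(np.all(margins >= 1))

# Hypothetical toy data: two points per class in 2-D.
T = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, 1, 1])
w = np.array([0.5, 0.5])
b = -3.0

ok = satisfies_margin(T, y, w, b)           # every point is on or outside the margin
too_close = satisfies_margin(T, y, w, -2.0) # shifting b breaks the constraint
```

Because both class-specific inequalities collapse into the single product form, one vectorized comparison covers the whole training set.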

### 2. For all support vectors (i.e., data points that define the margin)
  ti.w+b = -1    where ti is -ve support vector and yi is -1

  ti.w+b = +1    where ti is +ve support vector and yi is +1

### 3. For the decision boundary: yi(ti.w+b) = 0, where ti lies on the decision boundary
### 4. The goal is to maximize the margin width (W), which is equivalent to minimizing |w|

W = ((X+ - X-).w)/|w|
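If X+ and X- are taken to be a positive and a negative support vector, then X+.w + b = 1 and X-.w + b = -1, so (X+ - X-).w = 2 and the width reduces to 2/|w|. This is why maximizing W amounts to minimizing |w|. A small numeric sketch, assuming NumPy and hypothetical values for `w`, `b`, and the two support vectors:

```python
import numpy as np

def margin_width(x_pos, x_neg, w):
    """Project the difference of two support vectors onto the unit normal w/|w|."""
    return float(np.dot(x_pos - x_neg, w) / np.linalg.norm(w))

# Hypothetical values, chosen so that both points lie exactly on the margin lines.
w = np.array([0.5, 0.5])
b = -3.0
x_pos = np.array([4.0, 4.0])   # x_pos . w + b = +1
x_neg = np.array([2.0, 2.0])   # x_neg . w + b = -1

width = margin_width(x_pos, x_neg, w)
shortcut = 2.0 / float(np.linalg.norm(w))   # same value via 2/|w|
```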

### 5. After obtaining the tuned w and b we have

x.w+b = 1 is the line passing through the +ve support vectors

x.w+b = -1 is the line passing through the -ve support vectors

x.w+b = 0 is the decision boundary
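Once w and b are known, a new point can be classified by the sign of x.w+b. The sketch below assumes NumPy; the function name `predict` and the sample values are hypothetical, and assigning points that fall exactly on the boundary to the +1 class is an arbitrary choice made here.

```python
import numpy as np

def predict(X, w, b):
    """Label each row of X by the side of the boundary x.w + b = 0 it falls on."""
    scores = X @ w + b
    # Points exactly on the boundary (score == 0) are assigned to +1 by convention here.
    return np.where(scores >= 0, 1, -1)

# Hypothetical tuned parameters and two new points.
w = np.array([0.5, 0.5])
b = -3.0
X_new = np.array([[1.0, 1.0], [5.0, 5.0]])

labels = predict(X_new, w, b)
```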

### 6. In practice, the support vector lines will not always pass exactly through the support vectors

### 7. Thus, it is a convex optimization problem and will lead to a global minimum

### 8. This is a linear SVM, i.e., the kernel is linear

# Dataset: Reuse/predicting successful reuse

# Attribute Information:
1.  Project ID {A,B,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y}
2.  Software Staff {L,M,S}
3.  Overall Staff {L,X,M,S}
4.  Type of Software Production {product-family,isolated}
5.  Software and Product {product,alone,process,NA}
6.  SP maturity {high,middle,low}
7.  Application Domain {TLC,SE-Tools,Bank,Engine_Controller,FMS,ATC,TS,Space Manufacturing,Measurement,Finance,Book-Keeping}
8.  Type of Software {Technical,Business,Embedded-RT,Non-Embedded-RT}
9.  Size of Baseline {L,M,S,not_available}
10. Development Approach {OO,proc,not_available}
11. Staff Experience {high,middle,low,not_available}
12. Top Management Commitment {yes,no}
13. Key Reuse Roles Introduced {yes,no,NA}
14. Reuse Processes Introduced {yes,no,NA}
15. Non-Reuse Processes Modified {yes,no,NA}
16. Repository {yes,NA}
17. Human Factors {yes,no}
18. Reuse Approach {tight,loose,NA}
19. Work Products {D+C,C,R+D+C,NA}
20. Domain Analysis {yes,no,NA}
21. Origin {ex-novo,as-is,reeng,NA}
22. Independent Team {yes,no,NA}
23. When Assets Developed {before,justintime,NA}
24. Qualification {yes,no,NA}
25. Configuration Management {yes,no,NA}
26. Rewards Policy {no,yes}
27. Assets {51_to_100,21_to_50,100+,1_to_20,NA}

# Target classes
Success or Failure {success,failure}

# Source: http://promise.site.uottawa.ca/SERepository/datasets/reuse.arff

# Tasks:
1. Load the ARFF dataset
2. Apply pre-processing techniques
3. Divide data into training and testing sets.
4. Build SVM model from scratch
5. Test your own SVM model
6. Obtain precision and recall
7. Implement sklearn's model on processed data
8. Compare your SVM model with sklearn's model
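For the precision and recall step, the two metrics can be computed directly from the true and predicted labels. This is a minimal sketch, assuming NumPy and labels encoded as +1 (for example, success) and -1 (failure); the function name `precision_recall`, the label encoding, and the sample vectors are assumptions made for illustration.

```python
import numpy as np

def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the class given by `positive`."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))  # true positives
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))  # false positives
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))  # false negatives
    # Return 0.0 when a denominator is empty, to avoid dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, 1, 1]
precision, recall = precision_recall(y_true, y_pred)
```

The same function can be applied to both the training and the testing predictions so the two models are compared on equal terms.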

## Task 1: Implement linear SVM from scratch
# Algorithm of Linear SVM
1.  Initialize w with a large value, say (w0,w0); it will be decreased later
2.  Set the step size to w0*0.1
3.  Start b at a minimum value; it may be increased during the process

        i.  b ranges over -b0 < b < +b0 with step = step*b_multiple

        ii. This search is computationally expensive, so choose b0 wisely
4.  Check the points ti in the dataset:

        i.  Check all sign transformations of w: (w0,w0), (-w0,w0), (w0,-w0), (-w0,-w0)

        ii. If yi(ti.w+b)>=1 does not hold for all points, then break

        iii.  Otherwise evaluate |w| and store it in a dictionary with |w| as the key and (w,b) as the value
5.  If w<=0, the current step is complete; move to step 6

        Otherwise decrease w to (w0-step,w0-step) and move to step 3
6.  While step has not reached w0*0.001:

        i.  step = step*0.1

        ii. move to step 3

7.  Select the (w,b) with the minimum |w| from the dictionary
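The steps above can be sketched roughly as follows. This is only an illustrative sketch under several assumptions, not a reference solution: it handles two features only (matching the (w0,w0) form above), it uses NumPy, the function name `fit_linear_svm` and its defaults are hypothetical, w0 and b0 are derived from the largest absolute feature value, and each finer pass restarts just above the best value found so far.

```python
import numpy as np

def fit_linear_svm(X, y, b_range_multiple=5, b_multiple=5):
    """Grid-search sketch for 2-D data; returns the (w, b) with the smallest |w| found."""
    signs = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]])
    max_abs = float(np.max(np.abs(X)))
    w0 = 10 * max_abs                    # step 1: deliberately large starting value
    b0 = b_range_multiple * max_abs      # step 3: search -b0 <= b < b0 (assumed choice of b0)
    candidates = {}                      # maps |w| to (w, b)
    start = w0
    for factor in (0.1, 0.01, 0.001):    # steps 2 and 6: step = w0*0.1, then shrink by 10x
        step = w0 * factor
        a = start
        while True:
            for b in np.arange(-b0, b0, step * b_multiple):
                for s in signs:          # step 4.i: try every sign pattern of w
                    w = a * s
                    # steps 4.ii and 4.iii: keep (w, b) only if all points satisfy the constraint
                    if np.all(y * (X @ w + b) >= 1):
                        candidates[float(np.linalg.norm(w))] = (w, float(b))
            if a <= 0:                   # step 5: this pass is complete
                break
            a -= step
        if candidates:
            best_w, _ = candidates[min(candidates)]
            # Assumption of this sketch: restart slightly above the best value so far.
            start = abs(best_w[0]) + 2 * step
    if not candidates:
        raise ValueError("no feasible (w, b) found on this grid")
    return candidates[min(candidates)]   # step 7: smallest |w|

# Hypothetical, linearly separable toy data.
X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, 1, 1])
w, b = fit_linear_svm(X, y)
```

On the reuse dataset the number of encoded features is much larger than two, so the four sign patterns used here would have to be generalized or replaced by another optimizer; the sketch only shows the control flow of the search.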

## Task 2: Implement sklearn's SVM
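As a rough starting point, a linear SVM in scikit-learn can be trained and scored as in the sketch below. It assumes scikit-learn and NumPy are installed and uses a made-up toy dataset instead of the reuse data; the large value of `C` is an assumption intended to approximate the hard-margin formulation described earlier.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data standing in for the preprocessed dataset.
X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, 1, 1])

# A linear kernel with a large C approximates a hard-margin linear SVM.
clf = SVC(kernel="linear", C=1e3)
clf.fit(X, y)

y_pred = clf.predict(X)
precision = precision_score(y, y_pred, pos_label=1)
recall = recall_score(y, y_pred, pos_label=1)

# Learned hyperplane parameters, comparable to (w, b) from a from-scratch model.
w = clf.coef_[0]
b = clf.intercept_[0]
```

With the real data, `fit` would be called on the training split and the metrics computed separately on the training and testing splits.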

## Task 3: Compare your SVM with sklearn's SVM and give concluding remarks

# Helpful links:

https://pythonprogramming.net/svm-in-python-machine-learning-tutorial/

https://medium.com/deep-math-machine-learning-ai/chapter-3-1-svm-from-scratch-in-python-86f93f853dc

https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/

http://ecomunsing.com/build-your-own-support-vector-machine




## Task 1: Implement linear SVM from scratch 

In [None]:
# Load the libraries


In [None]:
# Load the ARFF dataset
# Shuffle the dataset



In [None]:
# Preprocessing
# Encoding categorical variables (if any)
# Feature Scaling
# Filling missing values (if any)



In [None]:
# Divide the dataset into training and testing sets



In [None]:
# Implement SVM from scratch 





In [None]:
# Train and test your SVM models



In [None]:
# Evaluate training and testing precision and recall



## Task 2: Implement sklearn's SVM


In [None]:
# Use the preprocessed dataset here



In [None]:
# Divide the dataset into training and testing sets

In [None]:
# Train SVM model using sklearn's SVM





In [None]:
# Evaluate training and testing precision and recall

In [None]:
# Play with the initial parameters/hyperparameters of the models (optional)




## Task 3: Compare your SVM with sklearn's SVM and give concluding remarks
