# Training and Testing Models

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('data.csv')
data.head(2)

Unnamed: 0,x1,x2,y
0,0.78051,-0.063669,0
1,0.28774,0.29139,0


In [3]:
X = data[['x1', 'x2']]
y = data['y']

# A. Training Models

### 1. Logistic Regression

In [4]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()

In [5]:
classifier.fit(X, y)

LogisticRegression()

![Logistic_Regression.png](attachment:Logistic_Regression.png)

### 2. Neural Networks

In [6]:
from sklearn.neural_network import MLPClassifier
classifier = MLPClassifier()

In [7]:
classifier.fit(X, y)



MLPClassifier()

### 3. Decision Trees

In [8]:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()

In [9]:
classifier.fit(X, y)

DecisionTreeClassifier()

![Decision_Tree.png](attachment:Decision_Tree.png)

### 4. Support Vector Machines

In [10]:
from sklearn.svm import SVC
classifier = SVC()

In [11]:
classifier.fit(X, y)

SVC()

![SVM.png](attachment:SVM.png)

# B. Tuning Parameters (Manually)

![Tuning_Parameters.PNG](attachment:Tuning_Parameters.PNG)

- Logistic Regression did not fit this data well, since it can only learn a linear decision boundary.
- The Decision Tree bounded the data well.
- The SVM also did well.
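The comparison above can be reproduced in rough form with code. The following is a minimal sketch, not the notebook's own experiment: it assumes a synthetic two-ring dataset from `make_circles` as a stand-in for `data.csv`, and the names `X_demo`, `y_demo`, `models`, and `train_scores` are illustrative. It fits each of the four classifiers and reports accuracy on the training data, which indicates how closely each model can fit the points rather than how well it generalizes.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data, not the notebook's data.csv.
# Two concentric rings cannot be separated by a straight line.
X_demo, y_demo = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(),
    "MLPClassifier": MLPClassifier(max_iter=2000, random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "SVC": SVC(),
}

# Fit each model and score it on the same data it was trained on.
train_scores = {
    name: model.fit(X_demo, y_demo).score(X_demo, y_demo)
    for name, model in models.items()
}

for name, score in train_scores.items():
    print(f"{name}: training accuracy = {score:.2f}")
```

On data like this, a linear model is expected to trail the nonlinear ones, which matches the observation about Logistic Regression above.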

### Let's try to fit a new dataset with an SVM classifier

In [12]:
data = pd.read_csv('data2.csv')
X = data[['x1', 'x2']]  # re-select features and labels from the new file
y = data['y']
data.head(2)

Unnamed: 0,x1,x2,y
0,0.24539,0.81725,0
1,0.21774,0.76462,0


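__Note:__ The main `SVC` hyperparameters are:
- kernel (string): The kernel type, such as 'linear', 'poly', or 'rbf' (the default).
- degree (integer): The degree of the polynomial kernel; only used when kernel='poly'.
- gamma (float): The kernel coefficient, used by the 'rbf' kernel (and also by 'poly'). Larger values make the decision boundary follow the training points more closely.
- C (float): The regularization parameter. Larger values penalize misclassified training points more heavily; smaller values allow a smoother boundary.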
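To make the parameters concrete, here is a small sketch showing how each one is passed to `SVC`. It assumes synthetic `make_moons` data in place of `data2.csv`, and the specific values (`degree=3`, `gamma=10`, `C=1.0`) are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Assumption: synthetic stand-in data, not the notebook's data2.csv.
X_demo, y_demo = make_moons(n_samples=200, noise=0.2, random_state=0)

# One classifier per kernel; degree only matters for 'poly',
# and gamma is set here only for 'rbf'.
candidates = [
    SVC(kernel="linear", C=1.0),
    SVC(kernel="poly", degree=3, C=1.0),
    SVC(kernel="rbf", gamma=10, C=1.0),
]

for clf in candidates:
    clf.fit(X_demo, y_demo)
    print(f"kernel={clf.kernel}: training accuracy = {clf.score(X_demo, y_demo):.2f}")
```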

In [13]:
classifier = SVC()
classifier.fit(X, y)

SVC()

![wo_manual_tuning.png](attachment:wo_manual_tuning.png)

In [14]:
classifier = SVC(kernel='rbf', gamma=200)
classifier.fit(X, y)

SVC(gamma=200)

![manual_tuning_rbf.png](attachment:manual_tuning_rbf.png)
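With the RBF kernel, a larger `gamma` narrows the region each training point influences, so the boundary wraps more tightly around individual points. The sketch below sweeps a few `gamma` values to make that visible in numbers. It assumes synthetic `make_moons` data rather than `data2.csv`, and the list of values and the name `gamma_results` are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Assumption: synthetic stand-in data, not the notebook's data2.csv.
X_demo, y_demo = make_moons(n_samples=200, noise=0.3, random_state=0)

gamma_results = {}
for gamma in [0.1, 1, 10, 200]:
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_demo, y_demo)
    gamma_results[gamma] = {
        # Accuracy on the training points themselves.
        "train_accuracy": clf.score(X_demo, y_demo),
        # Total number of support vectors across both classes.
        "n_support_vectors": int(clf.n_support_.sum()),
    }
    print(gamma, gamma_results[gamma])
```

A high training accuracy at a large `gamma` does not by itself mean the model is better; it may simply be memorizing the training points, which is why the next section evaluates on held-out data.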

# C. Testing Models

In [15]:
# Import packages
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

In [16]:
# Import the train test split
from sklearn.model_selection import train_test_split

In [17]:
# Read the data.
data = pd.read_csv('data.csv')
data.head(2)

Unnamed: 0,x1,x2,y
0,0.78051,-0.063669,0
1,0.28774,0.29139,0


In [18]:
X = data[['x1', 'x2']]
y = data['y']

In [19]:
# Use train_test_split to split the data into training and test sets
# Use a test size of 25% and a random state of 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [20]:
# Create the decision tree model and assign it to the variable model.
model = DecisionTreeClassifier()

In [21]:
# Fit the model to the training data.
model.fit(X_train, y_train)

DecisionTreeClassifier()

In [22]:
# Make predictions on the test data
y_pred = model.predict(X_test)

In [23]:
# Calculate the accuracy on the test data and assign it to the variable acc.
acc = accuracy_score(y_test, y_pred)
acc

0.6
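The test accuracy is informative only because the test rows were held out from fitting. As a minimal sketch of that point, the example below repeats the split, fit, and score steps and compares training accuracy with test accuracy. It assumes synthetic `make_moons` data instead of `data.csv`, and it sets `random_state` on the tree (an addition not present in the cells above) so repeated runs give the same result.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data, not the notebook's data.csv.
X_demo, y_demo = make_moons(n_samples=100, noise=0.3, random_state=0)

# Hold out 25% of the rows for testing, as in the cells above.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

# Fit only on the training rows.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Score separately on rows the model has seen and rows it has not.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

An unpruned decision tree typically fits its training rows perfectly, so the gap between the two numbers gives a rough sense of how much the model has overfit.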