# Machine Learning - Classification

## Did they buy?

This Jupyter notebook works through a binary classification example:
given which pages a user visited, we'll try to predict whether that user bought anything.

In [47]:
# Imports
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

In [48]:
# Data
data = pd.read_csv('../data/tracking.csv')
data.head()

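|   | home | how_it_works | contact | bought |
|---|------|--------------|---------|--------|
| 0 | 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 |
| 2 | 1 | 1 | 0 | 0 |
| 3 | 1 | 1 | 0 | 0 |
| 4 | 1 | 1 | 0 | 0 |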


### Features

As the data shows, we have the following features:
- Whether the user visited the 'Home' page
- Whether the user visited the 'How it works' page
- Whether the user visited the 'Contact' page

And we have the label we want to predict:
- Whether the user bought anything

For all the columns, we have a binary representation:
- 0 means 'no'
- 1 means 'yes'
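
The binary encoding described above can be checked directly before training. The following is a minimal sketch, assuming a small hypothetical DataFrame (`sample`) with the same column names as `tracking.csv`; its values are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical rows that mimic the structure of tracking.csv (not real data)
sample = pd.DataFrame({
    'home':         [1, 1, 0, 1, 0],
    'how_it_works': [1, 0, 1, 1, 0],
    'contact':      [0, 0, 1, 0, 1],
    'bought':       [0, 0, 1, 0, 1],
})

# True for each column whose values are all 0 or 1
is_binary = sample.isin([0, 1]).all()
print(is_binary)

# How many non-buyers (0) and buyers (1) the sample contains
label_counts = sample['bought'].value_counts()
print(label_counts)
```

Running the same two checks on `data` would confirm the encoding and show how balanced the `bought` column is.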

In [49]:
# Splitting x and y
x = data[['home', 'how_it_works', 'contact']]
y = data[['bought']]
y.head()

|   | bought |
|---|--------|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |


In [50]:
# Shape of our data
data.shape

(99, 4)

In [51]:
# We'll split our x and y into train and test dataframes

# Train data
train_x = x[:75]
train_y = y[:75]

# Test data
test_x = x[75:]
test_y = y[75:]

print("We'll be training with {} elements and testing with {} elements".format(len(train_x), len(test_x)))

We'll be training with 75 elements and testing with 24 elements
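
The positional slice above uses the first 75 rows for training and the last 24 for testing. If the file happens to be ordered (for example by date or by outcome), the two sets can end up with different proportions of buyers. One possible alternative, not the method this notebook uses, is a shuffled and stratified split with scikit-learn's `train_test_split`. The sketch below assumes a hypothetical stand-in DataFrame (`toy`) with 99 made-up rows, and the `strat_` variable names are chosen only for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for tracking.csv: 99 made-up rows, same column names
toy = pd.DataFrame({
    'home':         [1, 0, 1] * 33,
    'how_it_works': [1, 1, 0] * 33,
    'contact':      [0, 1, 0] * 33,
    'bought':       [0, 1, 0] * 33,
})
toy_x = toy[['home', 'how_it_works', 'contact']]
toy_y = toy['bought']

# Shuffle the rows, keep 24 of them for testing, and preserve the
# buyer / non-buyer ratio in both parts (stratify=toy_y).
strat_train_x, strat_test_x, strat_train_y, strat_test_y = train_test_split(
    toy_x, toy_y, test_size=24, random_state=42, stratify=toy_y
)

print(len(strat_train_x), len(strat_test_x))
print(strat_train_y.mean(), strat_test_y.mean())
```

Fixing `random_state` keeps the split reproducible between runs; the value 42 is arbitrary.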


### Model

In [52]:
model = LinearSVC()
model.fit(train_x, train_y.values.ravel())

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)

### Predictions

In [53]:
predictions = model.predict(test_x)
print(predictions)

[0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0]


### Results

In [54]:
accuracy = accuracy_score(test_y, predictions)
print("Accuracy score: %.2f%%" % (accuracy * 100))

Accuracy score: 95.83%
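
With 24 test rows, this score corresponds to 23 correct predictions (23 / 24 ≈ 0.9583). Accuracy is easier to interpret next to a simple baseline, such as always predicting the most common class. The sketch below shows both computations on hypothetical labels (`y_true`, `y_pred`); these arrays are made up for illustration and are not the notebook's `test_y` or `predictions`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical ground truth and predictions (not the notebook's data)
y_true = np.array([0, 0, 0, 1, 1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 1, 0, 0, 0, 1])

# accuracy_score is the fraction of positions where prediction == truth
acc = accuracy_score(y_true, y_pred)
manual = (y_true == y_pred).mean()

# Baseline: always predict the most frequent class in y_true
majority = np.bincount(y_true).argmax()
baseline = accuracy_score(y_true, np.full_like(y_true, majority))

print("Model accuracy:    %.2f%%" % (acc * 100))
print("Baseline accuracy: %.2f%%" % (baseline * 100))
```

A model is only adding value if its accuracy is clearly above the baseline; on heavily imbalanced data the baseline alone can look deceptively high.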
