# Simple Linear Regression

We want to know how to make our chocolate-bar customers happier. To do this, we need to know which chocolate bar _features_ predict customer happiness. For example, customers may be happier when chocolate bars are bigger, or when they contain more cocoa or more sugar. 

We have data on customer happiness when eating chocolate bars with different features. Let's look at the relationship between happiness and cocoa content.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Only the pieces used below; accuracy_score is a classification metric and does not apply to regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [None]:
# Load the tab-separated chocolate data; the first row holds the column names
train = pd.read_csv("chocolatedata.txt", index_col=False, sep="\t", header=0)
train.shape

In [None]:
train.head()

## Splitting the data into Train and Test

In [None]:
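# scikit-learn expects the input as a 2-D array, hence reshape(-1, 1).
# Note: as written, customer happiness is the input (x) and cocoa percent is the
# value being predicted (y); the plot axis labels further down follow the same convention.
x = np.array(train.customer_happiness).reshape(-1, 1)
y = np.array(train.cocoa_percent)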
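
The `reshape(-1, 1)` call matters because scikit-learn estimators expect the inputs as a 2-D array with one row per sample and one column per feature. A minimal sketch with made-up numbers (not the chocolate data; `values` and `column` are illustrative names) shows the change in shape:

```python
import numpy as np

# Made-up values standing in for a single DataFrame column
values = np.array([10.0, 20.0, 30.0])

# -1 lets NumPy infer the number of rows; 1 asks for a single column
column = values.reshape(-1, 1)

print(values.shape)  # (3,)
print(column.shape)  # (3, 1)
```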

In [None]:
# Hold out 30% of the rows for testing; random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=5)
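
To see what the split produces, here is a small sketch on made-up data (10 rows, not the chocolate dataset; the `_demo`, `_tr` and `_te` names are illustrative). It assumes the same `test_size` and `random_state` as above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows with a single feature column
x_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 2.0

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.30, random_state=5
)

# With 10 rows and test_size=0.30, 3 rows are held out and 7 remain for training
print(x_tr.shape, x_te.shape)

# Each x row stays paired with its y value after shuffling
print(np.array_equal(y_te, x_te.ravel() * 2.0))
```

Re-running the cell with the same `random_state` should give the same rows in each part, which keeps later results comparable between runs.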

## Train the model using the training dataset

In [None]:
regression = LinearRegression()
chocmodel = regression.fit(x_train, y_train)

In [None]:
# Model is trained
chocmodel
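
For a single feature, fitting amounts to choosing the slope and intercept of the least-squares line. As a sketch on synthetic data (the numbers and the names `x_demo`, `y_demo` and `demo_model` are made up for illustration), the fitted values should agree with the textbook closed-form formulas:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data scattered around the line y = 2x + 1
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=50)
y_demo = 2.0 * x_demo + 1.0 + rng.normal(0, 0.5, size=50)

demo_model = LinearRegression().fit(x_demo.reshape(-1, 1), y_demo)

# Closed-form least squares for one feature:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
x_mean, y_mean = x_demo.mean(), y_demo.mean()
slope = np.sum((x_demo - x_mean) * (y_demo - y_mean)) / np.sum((x_demo - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print("sklearn:    ", demo_model.coef_[0], demo_model.intercept_)
print("closed form:", slope, intercept)
```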

## Predicting on the test dataset

In [None]:
y_pred = chocmodel.predict(x_test)
print(y_pred)
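
For a fitted `LinearRegression` with one feature, `predict` evaluates `intercept_ + coef_[0] * x` for each input row. A minimal sketch with made-up numbers (the names `line`, `x_fit`, `y_fit` and `x_new` are illustrative, not part of the notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training points, roughly on a straight line
x_fit = np.array([[1.0], [2.0], [3.0], [4.0]])
y_fit = np.array([2.1, 3.9, 6.2, 7.8])
line = LinearRegression().fit(x_fit, y_fit)

# New inputs to predict for
x_new = np.array([[5.0], [6.0]])

# The same prediction computed by hand from the fitted parameters
by_hand = line.intercept_ + line.coef_[0] * x_new.ravel()

print(line.predict(x_new))
print(by_hand)
```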

In [None]:
# Actual test-set values, for comparison with y_pred above
y_test

## TO DO 1: Check if heavier chocolate bars (weight) make people happier

## TO DO 2: Check if more sugar (sugar_percent) makes people happier
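
One possible way to approach TO DO 1 and TO DO 2 is a small helper that repeats the split, fit and score steps for any one column. The sketch below is only a starting point: `fit_single_feature` is a hypothetical helper, and the DataFrame is made-up stand-in data that merely reuses the column names `weight`, `sugar_percent` and `customer_happiness`. It treats the feature as the input and `customer_happiness` as the target, which is the direction the TO DO questions ask about.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_single_feature(df, feature, target="customer_happiness"):
    """Hypothetical helper: fit a one-feature regression, return (model, test R^2)."""
    x = df[[feature]].to_numpy()  # double brackets keep a 2-D shape (n_rows, 1)
    y = df[target].to_numpy()
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.30, random_state=5)
    model = LinearRegression().fit(x_tr, y_tr)
    return model, model.score(x_te, y_te)


# Made-up stand-in data: happiness depends on weight but not on sugar_percent
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "weight": rng.uniform(50, 200, 100),
    "sugar_percent": rng.uniform(10, 60, 100),
})
demo["customer_happiness"] = 0.1 * demo["weight"] + rng.normal(0, 1, 100)

results = {}
for feature in ["weight", "sugar_percent"]:
    model, r2 = fit_single_feature(demo, feature)
    results[feature] = (model.coef_[0], r2)
    print(feature, "slope:", round(model.coef_[0], 3), "test R2:", round(r2, 3))
```

With the real data, calling `fit_single_feature(train, "weight")` would be the equivalent step; a slope near zero together with a low test R2 suggests the feature tells us little about happiness.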

## TO DO 3: Explore the code below and research what it does

## Visualizing the Training Results

In [None]:
plt.scatter(x_train, y_train, color='red')
plt.plot(x_train, chocmodel.predict(x_train), color='blue')
plt.title("Visuals for Training Dataset")
plt.xlabel("Customer Happiness")
plt.ylabel("Cocoa percent")
plt.show()

## Visualizing the Test Results 

In [None]:
plt.scatter(x_test, y_test, color='red')
plt.plot(x_test, chocmodel.predict(x_test), color='blue')
plt.title("Visuals for Test Dataset")
plt.xlabel("Customer Happiness")
plt.ylabel("Cocoa percent")
plt.show()

In [None]:
from sklearn import metrics
print("Performance of Linear regressor:")
print("Mean absolute error =", round(metrics.mean_absolute_error(y_test, y_pred), 2))
print("Mean squared error =", round(metrics.mean_squared_error(y_test, y_pred), 2))
print("Median absolute error =", round(metrics.median_absolute_error(y_test, y_pred), 2))
print("Explained variance score =", round(metrics.explained_variance_score(y_test, y_pred), 2))
print("R2 score =", round(metrics.r2_score(y_test, y_pred), 2))
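
To make these metrics less of a black box, here is a sketch that computes several of them by hand with NumPy and compares them with scikit-learn. The arrays `y_true` and `y_hat` are made-up numbers, not results from the chocolate model:

```python
import numpy as np
from sklearn import metrics

# Made-up actual values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 8.5])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))      # mean absolute error
mse = np.mean(errors ** 2)         # mean squared error
medae = np.median(np.abs(errors))  # median absolute error
# R2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print("MAE:", mae, "vs sklearn:", metrics.mean_absolute_error(y_true, y_hat))
print("MSE:", mse, "vs sklearn:", metrics.mean_squared_error(y_true, y_hat))
print("Median AE:", medae, "vs sklearn:", metrics.median_absolute_error(y_true, y_hat))
print("R2:", r2, "vs sklearn:", metrics.r2_score(y_true, y_hat))
```

Lower is better for the three error metrics, while an R2 closer to 1 means the line explains more of the variation in the target.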

In [None]:
chocmodel.predict(x_test)

In [None]:
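# A notebook cell only displays its last expression, so print each value explicitly
print("R2 on test data =", chocmodel.score(x_test, y_test))
print("Slope (coef_) =", chocmodel.coef_)
print("Intercept =", chocmodel.intercept_)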
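
For regressors in scikit-learn, `score` returns the R2 of the predictions, so it should match `r2_score` applied to the output of `predict`. A short sketch on synthetic data (the names `x_s`, `y_s` and `m` are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data scattered around the line y = 3x - 2
rng = np.random.default_rng(1)
x_s = rng.uniform(0, 10, size=(40, 1))
y_s = 3.0 * x_s.ravel() - 2.0 + rng.normal(0, 1.0, size=40)

m = LinearRegression().fit(x_s, y_s)

print("score:   ", m.score(x_s, y_s))
print("r2_score:", r2_score(y_s, m.predict(x_s)))

# coef_ holds one slope per feature; intercept_ is a single number here
print("coef_ shape:", m.coef_.shape)
print("intercept_:", m.intercept_)
```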

In [None]:
plt.scatter(x_test, y_test, color='red')
plt.plot(x_test, chocmodel.predict(x_test), color='blue')
plt.title("Visuals for Test Dataset")
plt.xlabel("Customer Happiness")
plt.ylabel("Cocoa percent")
plt.show()