# 11 - Evaluate.py
import numpy
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
print('Machine Learning')
print()
'''
What is Train/Test? Train/Test is a method for measuring the accuracy
of your model. It is called Train/Test because you split the data set
into two sets: a training set and a testing set. Here we use 80% for
training and 20% for testing.
'''
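# Start with a data set you want to test.
# Our data set illustrates 100 customers in a shop and their shopping habits:
# x is the number of minutes a customer stays in the shop, and y is the
# amount of money that customer spends.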
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
plt.scatter(x, y)
plt.show()
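# Split Into Train/Test - The training set should be a random selection
# of 80% of the original data. The testing set should be the remaining 20%.
# The data above was generated in random order, so taking the first 80
# points already gives a random selection.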
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
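# A minimal sketch of a shuffled 80/20 split, for data sets whose order is
# not already random. The sketch_* names are illustrative and not part of
# the original script; the data is regenerated with a local generator so
# the block runs on its own. sklearn.model_selection.train_test_split is a
# ready-made alternative.
import numpy

sketch_data_rng = numpy.random.RandomState(2)  # local generator; global seed untouched
sketch_x = sketch_data_rng.normal(3, 1, 100)
sketch_y = sketch_data_rng.normal(150, 40, 100) / sketch_x

# Shuffle the indices 0..99, then cut the shuffled order at the 80% mark.
sketch_order = numpy.random.RandomState(0).permutation(len(sketch_x))
sketch_n_train = len(sketch_x) * 80 // 100
sketch_train_idx = sketch_order[:sketch_n_train]
sketch_test_idx = sketch_order[sketch_n_train:]

sketch_train_x, sketch_train_y = sketch_x[sketch_train_idx], sketch_y[sketch_train_idx]
sketch_test_x, sketch_test_y = sketch_x[sketch_test_idx], sketch_y[sketch_test_idx]
print(len(sketch_train_x), len(sketch_test_x))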
# Display the same scatter plot with the training set:
plt.scatter(train_x, train_y)
plt.show() # It looks like the original data set, so it seems to
# be a fair selection.
# To make sure the testing set is not completely different,
# we will take a look at the testing set as well.
plt.scatter(test_x, test_y)
plt.show() # The testing set also looks like the original data set.
# Draw a polynomial regression line through the data points:
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
myline = numpy.linspace(0, 6, 100)
plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()
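# Remember R2, also known as R-squared? It measures how well the model's
# predictions match the actual y values. A value of 1 means a perfect fit,
# and 0 means the model does no better than always predicting the mean of y.
# r2_score can also return a negative value for a model that fits worse than that.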
r2 = r2_score(train_y, mymodel(train_x))
print(r2) # Note: The result 0.799 shows that there is an OK relationship.
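# For reference, a sketch of the standard R2 definition, 1 - SS_res / SS_tot.
# manual_r2 is a hypothetical helper written for illustration; it is not
# part of the original script.
import numpy

def manual_r2(actual, predicted):
    actual = numpy.asarray(actual, dtype=float)
    predicted = numpy.asarray(predicted, dtype=float)
    ss_res = numpy.sum((actual - predicted) ** 2)      # squared error of the model
    ss_tot = numpy.sum((actual - actual.mean()) ** 2)  # squared error of predicting the mean
    return 1 - ss_res / ss_tot

print(manual_r2([1, 2, 3], [1, 2, 3]))  # perfect predictions
print(manual_r2([1, 2, 3], [2, 2, 2]))  # always predicting the mean
print(manual_r2([1, 2, 3], [3, 2, 1]))  # worse than predicting the mean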
# Now we want to test the model with the testing data as well, to see if
# it gives us the same result. Let us find the R2 score when using testing data:
r2 = r2_score(test_y, mymodel(test_x))
print(r2) # The result 0.809 shows that the model fits the testing set about as
# well as the training set, so we can be reasonably confident in using the
# model to predict future values.
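# A sketch of using the held-out set to compare candidate models: fit several
# polynomial degrees on the training points and score each on both sets.
# The degrees tried and the cmp_* names are illustrative choices, not part
# of the original script; the data is regenerated locally so the block runs
# on its own.
import numpy

def cmp_r2(actual, predicted):
    # R2 computed as 1 - SS_res / SS_tot
    ss_res = numpy.sum((actual - predicted) ** 2)
    ss_tot = numpy.sum((actual - actual.mean()) ** 2)
    return 1 - ss_res / ss_tot

cmp_rng = numpy.random.RandomState(2)
cmp_x = cmp_rng.normal(3, 1, 100)
cmp_y = cmp_rng.normal(150, 40, 100) / cmp_x
cmp_train_x, cmp_train_y = cmp_x[:80], cmp_y[:80]
cmp_test_x, cmp_test_y = cmp_x[80:], cmp_y[80:]

cmp_scores = {}
for cmp_degree in (1, 2, 4, 6):
    cmp_model = numpy.poly1d(numpy.polyfit(cmp_train_x, cmp_train_y, cmp_degree))
    cmp_scores[cmp_degree] = (
        cmp_r2(cmp_train_y, cmp_model(cmp_train_x)),  # score on the training set
        cmp_r2(cmp_test_y, cmp_model(cmp_test_x)),    # score on the testing set
    )
    print(cmp_degree, cmp_scores[cmp_degree])
# A training score that keeps rising while the testing score falls would
# suggest the higher-degree model is overfitting.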
# Predict Values - Now that we have established that our model is OK,
# we can start predicting new values.
# How much money will a buying customer spend, if she or he stays in
# the shop for 5 minutes?
print(mymodel(5))
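# A sketch of predicting for several visit lengths at once, since poly1d
# models accept arrays. The pred_* names and the chosen durations are
# illustrative and not part of the original script. A polynomial fit is
# only trustworthy inside the range of x values it was trained on, so the
# sketch also flags inputs that fall outside that range.
import numpy

pred_rng = numpy.random.RandomState(2)
pred_x = pred_rng.normal(3, 1, 100)
pred_y = pred_rng.normal(150, 40, 100) / pred_x
pred_train_x, pred_train_y = pred_x[:80], pred_y[:80]
pred_model = numpy.poly1d(numpy.polyfit(pred_train_x, pred_train_y, 4))

pred_minutes = numpy.array([2.0, 3.5, 5.0])
pred_spend = pred_model(pred_minutes)
pred_in_range = (pred_minutes >= pred_train_x.min()) & (pred_minutes <= pred_train_x.max())
for pred_m, pred_s, pred_ok in zip(pred_minutes, pred_spend, pred_in_range):
    print(pred_m, pred_s, "within training range" if pred_ok else "outside training range")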