# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.14.5
#   kernelspec:
#     display_name: Python 3
#     name: python3
# ---
# %% [markdown]
# # 📝 Exercise M5.02
#
# The aim of this exercise is to find out whether a decision tree model is able
# to extrapolate.
#
# By extrapolation, we mean the predictions a model makes for feature values
# that lie outside the range seen during training.
#
# We first load the regression data.
# %%
import pandas as pd
penguins = pd.read_csv("../datasets/penguins_regression.csv")
feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"
data_train, target_train = penguins[[feature_name]], penguins[target_name]
# %% [markdown]
# ```{note}
# If you want a more detailed overview of this dataset, you can refer to the
# Appendix - Datasets description section at the end of this MOOC.
# ```
# %% [markdown]
# First, create two models, a linear regression model and a decision tree
# regression model, and fit them on the training data. Limit the depth of the
# decision tree to 3 levels.
# %%
# Write your code here.
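# %% [markdown]
# One possible way to approach this step is sketched below. To stay
# self-contained, the sketch does not read the CSV file: it builds a made-up
# stand-in dataset with the same column names (the values are synthetic, not
# real penguin measurements). The variable names `linear_regression` and `tree`
# are illustrative choices, not names imposed by the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"

# Made-up stand-in for the penguins data (assumption: a roughly linear trend).
rng = np.random.default_rng(0)
flipper = rng.integers(170, 231, size=200).astype(float)
data_train = pd.DataFrame({feature_name: flipper})
target_train = pd.Series(
    50 * flipper - 5800 + rng.normal(0, 300, size=200), name=target_name
)

# A linear model and a decision tree limited to 3 levels.
linear_regression = LinearRegression()
tree = DecisionTreeRegressor(max_depth=3)

linear_regression.fit(data_train, target_train)
tree.fit(data_train, target_train)
```

# With `max_depth=3`, the tree can have at most 8 leaves, so it can output at
# most 8 distinct predicted values.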
# %% [markdown]
# Create a synthetic dataset containing all possible flipper lengths from the
# minimum to the maximum of the training dataset. Get the predictions of each
# model using this dataset.
# %%
# Write your code here.
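# %% [markdown]
# A minimal sketch of this step, under the same assumptions as the rest of the
# exercise: the training data here is a made-up stand-in (not the real CSV
# file), and the grid uses a 1 mm step, which is an arbitrary choice. The names
# `data_test`, `target_predicted_linear` and `target_predicted_tree` are
# illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

feature_name = "Flipper Length (mm)"

# Made-up stand-in for the penguins data, not the real measurements.
rng = np.random.default_rng(0)
flipper = rng.integers(170, 231, size=200).astype(float)
data_train = pd.DataFrame({feature_name: flipper})
target_train = 50 * flipper - 5800 + rng.normal(0, 300, size=200)

linear_regression = LinearRegression().fit(data_train, target_train)
tree = DecisionTreeRegressor(max_depth=3).fit(data_train, target_train)

# One row per millimetre between the smallest and largest training value.
# Reusing the training column name keeps the feature names consistent.
data_test = pd.DataFrame(
    np.arange(
        data_train[feature_name].min(), data_train[feature_name].max() + 1
    ),
    columns=[feature_name],
)

target_predicted_linear = linear_regression.predict(data_test)
target_predicted_tree = tree.predict(data_test)
```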
# %% [markdown]
# Create a scatter plot containing the training samples and superimpose the
# predictions of both models on top.
# %%
# Write your code here.
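# %% [markdown]
# The plot could be drawn along the following lines. This is only a sketch: it
# uses plain matplotlib (an assumption, any plotting library would do) and a
# made-up stand-in dataset instead of the real CSV file.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"

# Made-up stand-in for the penguins data, not the real measurements.
rng = np.random.default_rng(0)
flipper = rng.integers(170, 231, size=200).astype(float)
data_train = pd.DataFrame({feature_name: flipper})
target_train = 50 * flipper - 5800 + rng.normal(0, 300, size=200)

linear_regression = LinearRegression().fit(data_train, target_train)
tree = DecisionTreeRegressor(max_depth=3).fit(data_train, target_train)

# Grid of flipper lengths restricted to the training range.
data_test = pd.DataFrame(
    np.arange(
        data_train[feature_name].min(), data_train[feature_name].max() + 1
    ),
    columns=[feature_name],
)

fig, ax = plt.subplots()
# Training samples as points, model predictions as lines on top.
ax.scatter(
    data_train[feature_name], target_train,
    color="black", alpha=0.5, label="Training samples",
)
ax.plot(
    data_test[feature_name], linear_regression.predict(data_test),
    label="Linear regression",
)
ax.plot(
    data_test[feature_name], tree.predict(data_test),
    label="Decision tree",
)
ax.set_xlabel(feature_name)
ax.set_ylabel(target_name)
ax.legend()
```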
# %% [markdown]
# Now, we check the extrapolation capabilities of each model. Create a dataset
# containing a broader range of values than your previous dataset, in other
# words, add values below and above the minimum and the maximum of the flipper
# length seen during training.
# %%
# Write your code here.
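# %% [markdown]
# A wider grid can be built by shifting both ends of the training range. In the
# sketch below, the 30 mm `offset` is an arbitrary margin chosen for
# illustration, and the five training values are made up so that the snippet
# runs on its own.

```python
import numpy as np
import pandas as pd

feature_name = "Flipper Length (mm)"

# Made-up training values standing in for the real flipper lengths.
data_train = pd.DataFrame({feature_name: [172.0, 181.0, 195.0, 210.0, 231.0]})

# Hypothetical margin (in mm) added on each side of the training range.
offset = 30
data_test = pd.DataFrame(
    np.arange(
        data_train[feature_name].min() - offset,
        data_train[feature_name].max() + offset,
    ),
    columns=[feature_name],
)
```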
# %% [markdown]
# Finally, make predictions with both models on this new interval of data.
# Repeat the plot from the previous step.
# %%
# Write your code here.
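# %% [markdown]
# The final step could look like the sketch below. As before, the data is a
# made-up stand-in for the real CSV file, the 30 mm `offset` is an arbitrary
# choice, and plain matplotlib is assumed for the plot. Comparing the two
# curves outside the training range is the point of the exercise.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"

# Made-up stand-in for the penguins data, not the real measurements.
rng = np.random.default_rng(0)
flipper = rng.integers(170, 231, size=200).astype(float)
data_train = pd.DataFrame({feature_name: flipper})
target_train = 50 * flipper - 5800 + rng.normal(0, 300, size=200)

linear_regression = LinearRegression().fit(data_train, target_train)
tree = DecisionTreeRegressor(max_depth=3).fit(data_train, target_train)

# Grid extending beyond both ends of the training range.
offset = 30  # hypothetical margin in mm
train_min = data_train[feature_name].min()
train_max = data_train[feature_name].max()
data_test = pd.DataFrame(
    np.arange(train_min - offset, train_max + offset), columns=[feature_name]
)

target_predicted_linear = linear_regression.predict(data_test)
target_predicted_tree = tree.predict(data_test)

fig, ax = plt.subplots()
ax.scatter(
    data_train[feature_name], target_train,
    color="black", alpha=0.5, label="Training samples",
)
ax.plot(data_test[feature_name], target_predicted_linear,
        label="Linear regression")
ax.plot(data_test[feature_name], target_predicted_tree,
        label="Decision tree")
ax.set_xlabel(feature_name)
ax.set_ylabel(target_name)
ax.legend()
```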