# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Lesson 3.03 | Feature Engineering

## Review of Linear Regression

- Linear regression is a way for us to relate some dependent variable $Y$ to independent variables $X_1, \ldots, X_p$.
- We might write this out in one of the following two forms:
$$
\begin{eqnarray}
Y &=& \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_pX_p +\varepsilon\\
\mathbf{Y} &=& \mathbf{X \beta + \varepsilon}
\end{eqnarray}
$$
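
As a rough illustration of the matrix form, the sketch below simulates data from $\mathbf{Y} = \mathbf{X \beta + \varepsilon}$ and recovers $\beta$ by least squares with NumPy. The coefficient values, sample size, and noise level are made up for the example; none of them come from the lesson's data.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 2
true_beta = np.array([1.0, 2.0, -3.0])        # hypothetical beta_0, beta_1, beta_2

features = rng.normal(size=(n, p))             # columns play the role of X_1 and X_2
X = np.column_stack([np.ones(n), features])    # leading column of 1s carries the intercept
eps = rng.normal(scale=0.1, size=n)            # error term
Y = X @ true_beta + eps

# Least-squares estimate of beta (minimizes the sum of squared residuals).
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat.round(2))
```

With this little noise, the estimates should land close to the hypothetical `true_beta`.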

<details><summary>
There are four assumptions to the simple linear regression model and five assumptions to the multiple linear regression model.
</summary>
1. **Linearity:** $Y$ is linearly related to $X_i$ for all $i$.
2. **Independence:** Each residual $\varepsilon_i$ is independent of $\varepsilon_j$ for all $i\neq j$.
3. **Normality:** The errors (residuals) follow a Normal distribution with mean 0.
4. **Equality of Variance:** The errors (residuals) should have a roughly consistent pattern, regardless of the value of $X_i$. (There should be no discernible relationship between $X_i$ and the residuals.)
5. **Independence Part II:** $X_i$ is independent of $X_j$ for all $i\neq j$.
</details>
- We can measure the performance of our model by using mean squared error (MSE).
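
For reference, MSE is the average of the squared differences between observed and predicted values. Here is a minimal sketch with NumPy; the helper name and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy values, made up for illustration.
observed = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
mse = mean_squared_error(observed, predicted)   # (1 + 0 + 4) / 3
print(mse)
```

scikit-learn ships an equivalent function, `sklearn.metrics.mean_squared_error`, if you would rather not write your own.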

## Feature Engineering

- If I use degrees Fahrenheit to predict how much a substance will expand or inches of rain to predict traffic accidents, people outside the United States may have a tougher time understanding my work.
- If I use straight line distance (as the crow flies) between two locations, my estimated time of arrival in a taxi or a Lyft is going to be pretty bad.
- If I put text into my model without some sort of preprocessing, my computer isn't going to understand how to handle it.

Suffice it to say: If your features (variables) aren't good, your predictions and inferences won't be good!

#### What is feature engineering?

"Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." - Andrew Ng

**Feature engineering** is the term broadly applied to the creation and manipulation of features (variables) used in machine learning algorithms.

Unless we're working with the same data over and over again, this isn't something we can automate. It will require creativity and a good, thorough understanding of our data.

#### The Process of Data Science
1. Data Gathering
2. Data Cleaning/Munging
3. EDA
4. Modeling
5. Reporting
    - Feature engineering will straddle all five of these steps, but mostly focus on steps 2 and 3.

#### [The Process of Feature Engineering](https://www.youtube.com/watch?v=drUToKxEAUA)
1. Brainstorming or testing features.
2. Deciding what features to create.
3. Creating features.
4. Checking how the features work with your model.
5. Improving features (if needed).
6. Return to step 1.
7. "Do data science!"

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("forestfires.csv")

In [3]:
df.head()

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
0,7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.0
1,7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.0
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.0
3,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0.0
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.0


[Documentation](https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.names)

1. Number of Instances: 517 

2. Number of Attributes: 12 + output attribute
  
   Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.

3. Attribute information:

   For more information, read [Cortez and Morais, 2007].

   - X - x-axis spatial coordinate within the Montesinho park map: 1 to 9
   - Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9
   - month - month of the year: "jan" to "dec" 
   - day - day of the week: "mon" to "sun"
   - FFMC - FFMC index from the FWI system: 18.7 to 96.20
   - DMC - DMC index from the FWI system: 1.1 to 291.3 
   - DC - DC index from the FWI system: 7.9 to 860.6 
   - ISI - ISI index from the FWI system: 0.0 to 56.10
   - temp - temperature in Celsius degrees: 2.2 to 33.30
   - RH - relative humidity in %: 15.0 to 100
   - wind - wind speed in km/h: 0.40 to 9.40 
   - rain - outside rain in mm/m2 : 0.0 to 6.4 
   - area - the burned area of the forest (in ha): 0.00 to 1090.84 (this output variable is very skewed towards 0.0, thus it may make sense to model with the logarithm transform). 
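
The documentation's suggestion to model the logarithm of `area` is itself a small piece of feature engineering. One possible sketch is below. It assumes `log(1 + area)` rather than a plain logarithm, because many fires have an area of exactly 0.0; the sample values between the documented minimum and maximum are made up.

```python
import numpy as np
import pandas as pd

# A few burned-area values in hectares; 0.0 and 1090.84 are the documented
# minimum and maximum, the values in between are made up for illustration.
area = pd.Series([0.0, 0.0, 0.5, 6.4, 1090.84], name="area")

# log1p computes log(1 + x), so zero stays zero instead of becoming -inf.
log_area = np.log1p(area)

# expm1 undoes the transform, which is needed to report predictions in hectares.
recovered = np.expm1(log_area)
print(log_area.round(3).tolist())
```

The transformed values span a much narrower range than the raw ones, which is the point of the transform for a target this skewed.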

In [4]:
import statsmodels.api as sm


What does each of the next six cells do? Chat in your local markets and we'll summarize together.

In [5]:
dep = df['area'] # Cell 1

In [6]:
indep = df.drop(['area','month','day'], axis = 'columns') # Cell 2

In [7]:
indep.head()

Unnamed: 0,X,Y,FFMC,DMC,DC,ISI,temp,RH,wind,rain
0,7,5,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0
1,7,4,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0
2,7,4,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0
3,8,6,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2
4,8,6,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0


In [8]:
indep = sm.add_constant(indep) # Cell 3

In [9]:
indep.head()

Unnamed: 0,const,X,Y,FFMC,DMC,DC,ISI,temp,RH,wind,rain
0,1.0,7,5,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0
1,1.0,7,4,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0
2,1.0,7,4,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0
3,1.0,8,6,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2
4,1.0,8,6,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0


In [10]:
model = sm.OLS(dep,indep) # Cell 4

In [11]:
results = model.fit() # Cell 5

In [13]:
results.summary() # Cell 6

0,1,2,3
Dep. Variable:,area,R-squared:,0.022
Model:,OLS,Adj. R-squared:,0.002
Method:,Least Squares,F-statistic:,1.119
Date:,"Tue, 30 Jan 2018",Prob (F-statistic):,0.345
Time:,10:45:34,Log-Likelihood:,-2874.8
No. Observations:,517,AIC:,5772.0
Df Residuals:,506,BIC:,5818.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-6.3693,63.019,-0.101,0.920,-130.181,117.443
X,1.9079,1.448,1.317,0.188,-0.938,4.753
Y,0.5692,2.736,0.208,0.835,-4.807,5.945
FFMC,-0.0392,0.661,-0.059,0.953,-1.337,1.259
DMC,0.0773,0.067,1.151,0.250,-0.055,0.209
DC,-0.0033,0.016,-0.200,0.841,-0.036,0.029
ISI,-0.7137,0.772,-0.925,0.355,-2.229,0.802
temp,0.8002,0.787,1.017,0.310,-0.746,2.347
RH,-0.2306,0.237,-0.972,0.332,-0.697,0.236

0,1,2,3
Omnibus:,975.065,Durbin-Watson:,1.647
Prob(Omnibus):,0.0,Jarque-Bera (JB):,781330.782
Skew:,12.57,Prob(JB):,0.0
Kurtosis:,191.782,Cond. No.,14000.0


In [14]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                   area   R-squared:                       0.022
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     1.119
Date:                Tue, 30 Jan 2018   Prob (F-statistic):              0.345
Time:                        10:45:40   Log-Likelihood:                -2874.8
No. Observations:                 517   AIC:                             5772.
Df Residuals:                     506   BIC:                             5818.
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -6.3693     63.019     -0.101      0.9

Now, let's walk through some feature engineering!

#### Conversion

First, let's say that we want to convert `temp` from Celsius to Fahrenheit. Let's do that now by engineering a new feature `Fahr`, then build a new model and check the summary.

In [15]:
indep['Fahr'] = 1.8 * indep['temp'] + 32

In [16]:
indep.head()

Unnamed: 0,const,X,Y,FFMC,DMC,DC,ISI,temp,RH,wind,rain,Fahr
0,1.0,7,5,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,46.76
1,1.0,7,4,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,64.4
2,1.0,7,4,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,58.28
3,1.0,8,6,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,46.94
4,1.0,8,6,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,52.52


In [17]:
model = sm.OLS(dep,indep)

In [18]:
results = model.fit()

In [20]:
results.summary()

0,1,2,3
Dep. Variable:,area,R-squared:,0.022
Model:,OLS,Adj. R-squared:,0.002
Method:,Least Squares,F-statistic:,1.119
Date:,"Tue, 30 Jan 2018",Prob (F-statistic):,0.345
Time:,10:52:22,Log-Likelihood:,-2874.8
No. Observations:,517,AIC:,5772.0
Df Residuals:,506,BIC:,5818.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.0711,0.276,-0.258,0.797,-0.613,0.471
X,1.9079,1.448,1.317,0.188,-0.938,4.753
Y,0.5692,2.736,0.208,0.835,-4.807,5.945
FFMC,-0.0392,0.661,-0.059,0.953,-1.337,1.259
DMC,0.0773,0.067,1.151,0.250,-0.055,0.209
DC,-0.0033,0.016,-0.200,0.841,-0.036,0.029
ISI,-0.7137,0.772,-0.925,0.355,-2.229,0.802
temp,1.1545,3.829,0.301,0.763,-6.369,8.678
RH,-0.2306,0.237,-0.972,0.332,-0.697,0.236

0,1,2,3
Omnibus:,975.065,Durbin-Watson:,1.647
Prob(Omnibus):,0.0,Jarque-Bera (JB):,781330.782
Skew:,12.57,Prob(JB):,0.0
Kurtosis:,191.782,Cond. No.,3.92e+18


In [21]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                   area   R-squared:                       0.022
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     1.119
Date:                Tue, 30 Jan 2018   Prob (F-statistic):              0.345
Time:                        10:52:29   Log-Likelihood:                -2874.8
No. Observations:                 517   AIC:                             5772.
Df Residuals:                     506   BIC:                             5818.
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0711      0.276     -0.258      0.7

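This is an example of a conversion. Our goal isn't to give the model any "new" information; we just want to change the units our model uses for temperature. Note that the model above still contains `temp` alongside `Fahr`. Because `Fahr` is an exact linear function of `temp`, the two columns are perfectly collinear, which is why the condition number jumps from 14000 to 3.92e+18 and the coefficient and standard error for `temp` change. In practice we would keep only one of the two columns.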
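
Here is a small numerical check of what the conversion does, using the first five `temp` values from the table. The response values `y` are hypothetical. The sketch shows two things: a design matrix holding a constant, `temp`, and `Fahr` is rank deficient, and swapping one unit for the other only rescales the slope without changing the fit.

```python
import numpy as np

temp_c = np.array([8.2, 18.0, 14.6, 8.3, 11.4])   # first five `temp` values shown above
temp_f = 1.8 * temp_c + 32                          # the same readings as `Fahr`

# Design matrix with a constant, temp, and Fahr: three columns, but only rank 2,
# because Fahr = 32 * const + 1.8 * temp exactly.
X_both = np.column_stack([np.ones_like(temp_c), temp_c, temp_f])
rank = np.linalg.matrix_rank(X_both)

# Swapping units instead: fit the same toy response on each scale separately.
y = np.array([0.5, 3.0, 2.0, 0.4, 1.2])             # hypothetical response values
slope_c, intercept_c = np.polyfit(temp_c, y, 1)
slope_f, intercept_f = np.polyfit(temp_f, y, 1)

fitted_c = intercept_c + slope_c * temp_c
fitted_f = intercept_f + slope_f * temp_f
print(rank, slope_c / slope_f)
```

The ratio of the two slopes should come out at 1.8 (a one-degree change in Celsius is a 1.8-degree change in Fahrenheit), and the fitted values agree on both scales.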

**Check:** How would I interpret the coefficient for `temp` here? (Remember that our `Y` variable is `area`.)

#### Dummy Variables
Second, let's say that we want to convert `day` into dummy variables (columns of `1`s and `0`s that let our model work with categorical data). There are a few ways of doing this. We'll go through one of the most common, `pd.get_dummies()`.

In [22]:
df.head()

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
0,7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.0
1,7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.0
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.0
3,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0.0
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.0


In [23]:
pd.get_dummies(df['day'])

Unnamed: 0,fri,mon,sat,sun,thu,tue,wed
0,1,0,0,0,0,0,0
1,0,0,0,0,0,1,0
2,0,0,1,0,0,0,0
3,1,0,0,0,0,0,0
4,0,0,0,1,0,0,0
5,0,0,0,1,0,0,0
6,0,1,0,0,0,0,0
7,0,1,0,0,0,0,0
8,0,0,0,0,0,1,0
9,0,0,1,0,0,0,0


In [24]:
df.head()

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
0,7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.0
1,7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.0
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.0
3,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0.0
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.0


In [25]:
pd.concat([df, pd.get_dummies(df['day'])], axis = 1)

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area,fri,mon,sat,sun,thu,tue,wed
0,7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.00,1,0,0,0,0,0,0
1,7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.00,0,0,0,0,0,1,0
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.00,0,0,1,0,0,0,0
3,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0.00,1,0,0,0,0,0,0
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.00,0,0,0,1,0,0,0
5,8,6,aug,sun,92.3,85.3,488.0,14.7,22.2,29,5.4,0.0,0.00,0,0,0,1,0,0,0
6,8,6,aug,mon,92.3,88.9,495.6,8.5,24.1,27,3.1,0.0,0.00,0,1,0,0,0,0,0
7,8,6,aug,mon,91.5,145.4,608.2,10.7,8.0,86,2.2,0.0,0.00,0,1,0,0,0,0,0
8,8,6,sep,tue,91.0,129.5,692.6,7.0,13.1,63,5.4,0.0,0.00,0,0,0,0,0,1,0
9,7,5,sep,sat,92.5,88.0,698.6,7.1,22.8,40,4.0,0.0,0.00,0,0,1,0,0,0,0


In [26]:
df = pd.concat([df, pd.get_dummies(df['day'])], axis = 1)

In [28]:
df.head()

Unnamed: 0,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area,fri,mon,sat,sun,thu,tue,wed
0,7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.0,1,0,0,0,0,0,0
1,7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.0,0,0,0,0,0,1,0
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.0,0,0,1,0,0,0,0
3,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0.0,1,0,0,0,0,0,0
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.0,0,0,0,1,0,0,0


In [30]:
dep = df['area']
indep = df.drop(['area','month','day'], axis = 'columns')
indep = sm.add_constant(indep)
model = sm.OLS(dep,indep)
results = model.fit()
results.summary()

0,1,2,3
Dep. Variable:,area,R-squared:,0.03
Model:,OLS,Adj. R-squared:,-0.001
Method:,Least Squares,F-statistic:,0.9576
Date:,"Tue, 30 Jan 2018",Prob (F-statistic):,0.503
Time:,11:13:14,Log-Likelihood:,-2872.6
No. Observations:,517,AIC:,5779.0
Df Residuals:,500,BIC:,5852.0
Df Model:,16,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-8.7901,55.322,-0.159,0.874,-117.482,99.902
X,1.8113,1.454,1.246,0.213,-1.045,4.668
Y,0.6610,2.750,0.240,0.810,-4.742,6.064
FFMC,-0.0184,0.665,-0.028,0.978,-1.325,1.288
DMC,0.0709,0.068,1.045,0.297,-0.062,0.204
DC,-0.0010,0.017,-0.059,0.953,-0.034,0.032
ISI,-0.6270,0.782,-0.802,0.423,-2.164,0.910
temp,0.7312,0.803,0.911,0.363,-0.846,2.308
RH,-0.2167,0.244,-0.890,0.374,-0.695,0.262

0,1,2,3
Omnibus:,968.536,Durbin-Watson:,1.638
Prob(Omnibus):,0.0,Jarque-Bera (JB):,750384.304
Skew:,12.4,Prob(JB):,0.0
Kurtosis:,187.984,Cond. No.,4.35e+18


In [31]:
dep = df['area']
indep = df.drop(['area','month','day','fri'], axis = 'columns')
indep = sm.add_constant(indep)
model = sm.OLS(dep,indep)
results = model.fit()
results.summary()

0,1,2,3
Dep. Variable:,area,R-squared:,0.03
Model:,OLS,Adj. R-squared:,-0.001
Method:,Least Squares,F-statistic:,0.9576
Date:,"Tue, 30 Jan 2018",Prob (F-statistic):,0.503
Time:,11:13:39,Log-Likelihood:,-2872.6
No. Observations:,517,AIC:,5779.0
Df Residuals:,500,BIC:,5852.0
Df Model:,16,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-16.4875,63.745,-0.259,0.796,-141.728,108.753
X,1.8113,1.454,1.246,0.213,-1.045,4.668
Y,0.6610,2.750,0.240,0.810,-4.742,6.064
FFMC,-0.0184,0.665,-0.028,0.978,-1.325,1.288
DMC,0.0709,0.068,1.045,0.297,-0.062,0.204
DC,-0.0010,0.017,-0.059,0.953,-0.034,0.032
ISI,-0.6270,0.782,-0.802,0.423,-2.164,0.910
temp,0.7312,0.803,0.911,0.363,-0.846,2.308
RH,-0.2167,0.244,-0.890,0.374,-0.695,0.262

0,1,2,3
Omnibus:,968.536,Durbin-Watson:,1.638
Prob(Omnibus):,0.0,Jarque-Bera (JB):,750384.304
Skew:,12.4,Prob(JB):,0.0
Kurtosis:,187.984,Cond. No.,14100.0


**Check:** How would I interpret the coefficient for `mon`?
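
The two models above differ only in whether `fri` is kept. With all seven day columns plus the constant, the condition number is 4.35e+18; dropping `fri` brings it back to 14100. Instead of dropping the column by hand, `pd.get_dummies()` can omit one category for us. This is a sketch on the first five `day` values, not a replacement for the cells above.

```python
import pandas as pd

days = pd.Series(["fri", "tue", "sat", "fri", "sun"], name="day")  # first five `day` values

# drop_first=True omits the first category in sorted order ("fri"), which
# becomes the baseline; dtype=int keeps the 1/0 encoding on newer pandas,
# where the default output is boolean.
dummies = pd.get_dummies(days, drop_first=True, dtype=int)
print(dummies)
```

Rows where `day` is `fri` are all zeros, so each remaining coefficient is read relative to Friday.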

#### Advanced Feature Engineering
There's no exhaustive list of ways we can engineer features. However, let's chat about a use case in the context of this data.

We have horizontal and vertical spatial coordinates, listed as `X` and `Y` in our data. Perhaps we believe that area (our dependent variable) is linearly related to `X` or `Y`, but that's not very likely.

What if we think that `X` and `Y` have some combined effect on area? Here, we can create an **[interaction term](http://statisticsbyjim.com/regression/interaction-effects/)**.
- An interaction term is a way for us to account for the case when two independent variables have some joint effect or interact in some way.

In [33]:
dep = df['area']
indep = df[['X','Y']] ## EDITED
indep = sm.add_constant(indep)
indep['XY'] = indep['X'] * indep['Y'] ## NEW LINE
model = sm.OLS(dep,indep)
results = model.fit()
results.summary()

0,1,2,3
Dep. Variable:,area,R-squared:,0.006
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,1.028
Date:,"Tue, 30 Jan 2018",Prob (F-statistic):,0.38
Time:,11:27:19,Log-Likelihood:,-2878.9
No. Observations:,517,AIC:,5766.0
Df Residuals:,513,BIC:,5783.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,18.7297,19.784,0.947,0.344,-20.137,57.596
X,-1.8621,3.794,-0.491,0.624,-9.316,5.592
Y,-3.3095,5.033,-0.658,0.511,-13.198,6.579
XY,0.7887,0.819,0.963,0.336,-0.820,2.397

0,1,2,3
Omnibus:,981.72,Durbin-Watson:,1.658
Prob(Omnibus):,0.0,Jarque-Bera (JB):,806473.069
Skew:,12.751,Prob(JB):,0.0
Kurtosis:,194.801,Cond. No.,198.0


If $X_1$ increases by 1, what is the effect on $Y$?
$$
\begin{eqnarray*}
Y_1 &=& \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_1X_2 \\
\Rightarrow Y_2 &=& \beta_0 + \beta_1(X_1 + 1) + \beta_2X_2 + \beta_3(X_1 + 1)X_2 \\
\Rightarrow Y_\Delta &=& Y_2 - Y_1 \\
&=& \left(\beta_0 + \beta_1(X_1 + 1) + \beta_2X_2 + \beta_3(X_1 + 1)X_2\right) - \left(\beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_1X_2\right) \\
&=& (\beta_0 - \beta_0) + (\beta_1X_1 + \beta_1 - \beta_1X_1) + (\beta_2X_2 - \beta_2X_2) + (\beta_3X_1X_2 + \beta_3X_2 - \beta_3X_1X_2) \\
&=& \beta_1 + \beta_3X_2
\end{eqnarray*}
$$
If $X_1$ increases by 1, $Y$ is expected to increase by $\beta_1 + \beta_3X_2$.

Interpreting interaction terms is much less straightforward than interpreting "marginal" terms.
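
To make the result $\beta_1 + \beta_3X_2$ concrete, the sketch below plugs in the fitted coefficients for `X` and `XY` from the summary above, with `Y` playing the role of $X_2$. The function name is an assumption for illustration. None of these coefficients is statistically distinguishable from zero in the summary, so treat this purely as an exercise in the arithmetic.

```python
# Coefficients copied from the interaction model summary above.
beta_x = -1.8621    # coefficient on X
beta_xy = 0.7887    # coefficient on XY

def effect_of_x(y_coord):
    """Expected change in area when X increases by 1, holding Y at y_coord."""
    return beta_x + beta_xy * y_coord

for y_coord in (2, 5, 9):
    print(y_coord, round(effect_of_x(y_coord), 4))
```

The effect of `X` is slightly negative at `Y = 2` (about -0.28) and clearly positive at `Y = 9` (about 5.24). That is exactly why a single number can't summarize the effect of `X` once an interaction term is in the model.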

---

The last feature we'll engineer has to do with distance.

We have geographic data across a grid. We can talk about how horizontal distance matters or how vertical distance matters. We can also talk about how horizontal and vertical interact, but this is still clunky and awkward for us to handle.

Instead, what if we could actually define **distance** from a certain point?

$$
dist(A,B) = \sqrt{(A_X - B_X) ^ 2 + (A_Y - B_Y) ^ 2}
$$

Let's say that there's a specific landmark of interest at the point $X = 6, Y = 5$, and we want to engineer a new feature that represents distance from this landmark.

In [34]:
indep.head()

Unnamed: 0,const,X,Y,XY
0,1.0,7,5,35
1,1.0,7,4,28
2,1.0,7,4,28
3,1.0,8,6,48
4,1.0,8,6,48


In [35]:
indep['Distance'] = ((indep['X'] - 6) ** 2 + (indep['Y'] - 5) ** 2) ** 0.5

In [36]:
indep.head(20)

Unnamed: 0,const,X,Y,XY,Distance
0,1.0,7,5,35,1.0
1,1.0,7,4,28,1.414214
2,1.0,7,4,28,1.414214
3,1.0,8,6,48,2.236068
4,1.0,8,6,48,2.236068
5,1.0,8,6,48,2.236068
6,1.0,8,6,48,2.236068
7,1.0,8,6,48,2.236068
8,1.0,8,6,48,2.236068
9,1.0,7,5,35,1.0
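
If we expect to try several landmarks, the same calculation can be wrapped in a small helper. The function name and signature here are assumptions for illustration; `np.hypot` computes $\sqrt{a^2 + b^2}$ elementwise.

```python
import numpy as np
import pandas as pd

def distance_from(frame, x0, y0, x_col="X", y_col="Y"):
    """Euclidean distance from each row's (X, Y) to the point (x0, y0)."""
    return np.hypot(frame[x_col] - x0, frame[y_col] - y0)

coords = pd.DataFrame({"X": [7, 7, 8], "Y": [5, 4, 6]})   # rows 0, 1, and 3 above
coords["Distance"] = distance_from(coords, x0=6, y0=5)
print(coords)
```

The resulting distances should match the `Distance` column shown above for the same rows.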


## Conclusion

#### How do we decide when to stop?
- There's never infinite time.
- We have to consider **both** a) how useful a new feature is likely to be and b) how much time we're willing to invest.
    - You will **always** be able to engineer more features.
    - [Diminishing marginal returns](https://www.investopedia.com/terms/l/lawofdiminishingmarginalreturn.asp) will absolutely play a role here.