# feature_engineering_homework

The goal of this notebook is to create two new features from our test dataset of daily stock prices.

In [2]:
import pandas as pd
import numpy as np

# Load the daily price data from CSV (replace with your project dataset)
df = pd.read_csv('data/HMC.csv')
df.head()

| Unnamed: 0 | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2020-01-02 | 28.6 | 28.65 | 28.459999 | 28.639999 | 26.480253 | 262800 |
| 1 | 2020-01-03 | 28.25 | 28.379999 | 28.08 | 28.129999 | 26.008713 | 663600 |
| 2 | 2020-01-06 | 27.719999 | 28.059999 | 27.719999 | 28.049999 | 25.934746 | 463000 |
| 3 | 2020-01-07 | 28.389999 | 28.389999 | 28.18 | 28.209999 | 26.082678 | 341800 |
| 4 | 2020-01-08 | 27.99 | 28.219999 | 27.99 | 28.129999 | 26.008713 | 264200 |


## Implementing 2 new features

In [3]:
# New feature 1: daily returns of the 'Close' price, (P_t - P_{t-1}) / P_{t-1}
df['daily_returns'] = df['Close'].diff() / df['Close'].shift(1)
# The first row has no previous close, so its return is NaN; set it to 0.
# Assign the result back: fillna(0, inplace=True) on df['daily_returns'] acts on
# a copy and raises a pandas warning that the behavior will change in pandas 3.0.
df['daily_returns'] = df['daily_returns'].fillna(0)
# Alternative: df['daily_returns'] = df['Close'].pct_change()

### Reason
For stock prices, the daily return measures the relative change in the closing price from one trading day to the next, stored here as a fraction (0.01 means 1%). Returns are often more informative than raw prices for modeling because they are scale-free, so days at different price levels can be compared directly, and their spread over time reflects volatility.
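
As a quick check of the definition, the sketch below compares the manual `diff()` / `shift(1)` calculation with `pct_change()` on a tiny series. The three prices are the first closes from the preview above; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative series: the first three 'Close' values from the preview
close = pd.Series([28.639999, 28.129999, 28.049999], name="Close")

# Manual definition: (P_t - P_{t-1}) / P_{t-1}
manual = close.diff() / close.shift(1)

# Built-in equivalent
builtin = close.pct_change()

# Both leave the first row as NaN because there is no previous close
first_row_is_nan = manual.isna().tolist()

# From the second row on, the two calculations should agree
same_values = np.allclose(manual.iloc[1:], builtin.iloc[1:])

print(first_row_is_nan)
print(same_values)
```

Filling the first row with 0 treats the first day as "no change", which is a convenience rather than a measured value; dropping that row is a reasonable alternative.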

In [4]:
# New feature 2: square the 'Close' price to capture non-linear effects
df['close_squared'] = df['Close'] ** 2

### Reason
Squaring the close price adds a quadratic term, which lets a model that is linear in its inputs capture a curved (non-linear) relationship between price and the target.
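
To illustrate why a squared term can help, here is a minimal sketch on made-up data, not on the HMC prices. It assumes a hypothetical target `y` that depends quadratically on price, then compares a least-squares fit with and without the squared column. The helper `sse` and all values are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: a target that depends on price quadratically
x = np.linspace(27.0, 29.0, 50)
y = 0.5 * (x - 28.0) ** 2

# Design matrices: intercept + x, and intercept + x + x**2
X_lin = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])

def sse(X, y):
    """Sum of squared residuals of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

sse_lin = sse(X_lin, y)    # a straight line cannot follow the curve
sse_quad = sse(X_quad, y)  # the squared column lets the fit follow it

print(sse_quad < sse_lin)
```

Whether `close_squared` helps on the real data depends on the target being modeled, so it is worth comparing model performance with and without the feature.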