
# Creating and Transforming Features

**Feature Creation**

  - What is Feature Creation?

    - Feature creation involves deriving new, meaningful features from existing ones to enhance a model's ability to capture important patterns in the data

  - What is Feature Transformation?

    - Feature Transformation modifies existing features to better suit the learning algorithm

  - Common Transformations

    - Logarithmic Transformation

      - Reduces skewness in highly right-skewed distributions; requires strictly positive values (use log(1 + x) when zeros are present)

    - Square Root Transformation

      - Moderately reduces skewness, often used for count data

    - Polynomial Transformation

      - Adds higher-order terms (x^2, x^3) to capture non-linear relationships
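As a rough illustration of the first two transformations, the sketch below compares skewness before and after applying them. The data is synthetic (a log-normal sample chosen only because it is strongly right-skewed), and the variable names are illustrative; none of it comes from the bike-sharing dataset used later.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic, strictly positive, right-skewed sample (log-normal).
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=5000))

# Square root: mild compression of large values.
sqrt_values = np.sqrt(values)

# Logarithm: stronger compression. np.log needs strictly positive input;
# np.log1p is the usual alternative when zeros are present.
log_values = np.log(values)

skew_raw = values.skew()
skew_sqrt = sqrt_values.skew()
skew_log = log_values.skew()

print(f"Skewness (raw):  {skew_raw:.2f}")
print(f"Skewness (sqrt): {skew_sqrt:.2f}")
print(f"Skewness (log):  {skew_log:.2f}")
```

On this kind of data the square root should lower the skewness somewhat and the logarithm should lower it further, which matches the "moderate" versus "strong" distinction in the list above.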

    

    

**Importance of Feature Transformation in Non-linear relationships**

- Transformations allow linear models to handle non-linear relationships

  - For example:

    - Polynomial transformations enable linear regression to model quadratic patterns

    - Logarithmic transformations stabilize variance and reduce skewness

- By transforming features, models become more robust and better able to capture complex patterns in data
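To make the polynomial case concrete, here is a minimal sketch on synthetic data. The quadratic relationship, noise level, and variable names are assumptions made for illustration, and the scores are computed in-sample to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Assumption: synthetic data where y depends on x quadratically, plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * x[:, 0] ** 2 + 1.0 + rng.normal(scale=0.5, size=200)

# A straight line cannot follow the U-shaped pattern.
linear_fit = LinearRegression().fit(x, y)
r2_linear = r2_score(y, linear_fit.predict(x))

# Adding x^2 as a feature lets the same linear model fit the curve.
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly_fit = LinearRegression().fit(x_poly, y)
r2_poly = r2_score(y, poly_fit.predict(x_poly))

print(f"R^2 with x only:    {r2_linear:.3f}")
print(f"R^2 with x and x^2: {r2_poly:.3f}")
```

The model is still linear in its coefficients; only the inputs have changed. That is why a plain `LinearRegression` can capture the curve once the squared term is supplied as a feature.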

**1. Create new features from a date column (e.g., day of the week, month, year)**

**2. Apply polynomial transformations to a dataset and compare model performance before and after transformation**

In [1]:
import pandas as pd
from google.colab import files
uploaded = files.upload()

Saving bike-sharing-daily.csv to bike-sharing-daily.csv


In [8]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load Bike Sharing Dataset
df = pd.read_csv("bike-sharing-daily.csv")

# Display dataset information
print(df.info())

# Preview the first few rows
print("Dataset Preview: \n")
print(df.head())

# Convert dteday to datetime
df["dteday"] = pd.to_datetime(df["dteday"])

# Create New Features
df["day_of_week"] = df["dteday"].dt.day_name()
df["month"] = df["dteday"].dt.month
df["year"] = df["dteday"].dt.year

# Display updated dataset
print("New features derived from Date column: \n")
print(df[["dteday", "day_of_week", "month", "year"]].head())

# Select feature and target
X = df[["temp"]]
y = df["cnt"]

# Apply Polynomial transformation
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Display the transformed features
print("Original and polynomial features: \n")
print(pd.DataFrame(X_poly, columns=(["temp", "temp^2"])))

# Split original and polynomial features in one call so the rows stay aligned
X_train, X_test, X_poly_train, X_poly_test, y_train, y_test = train_test_split(
    X, X_poly, y, test_size=0.2, random_state=42
)

# Train and Evaluate model with original features
model_original = LinearRegression()
model_original.fit(X_train, y_train)
y_pred_original = model_original.predict(X_test)
mse_original = mean_squared_error(y_test, y_pred_original)

# Train and Evaluate model with polynomial features
model_poly = LinearRegression()
model_poly.fit(X_poly_train, y_train)
y_pred_poly = model_poly.predict(X_poly_test)
mse_poly = mean_squared_error(y_test, y_pred_poly)

# Compare results
print(f"MSE with original features: {mse_original}")
print(f"MSE with polynomial features: {mse_poly}")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB
None
Dataset Preview: 

   instant      dteday  season  yr  mnth  holiday  weekday  workingda