# Feature Engineering

In this notebook, we focus on feature engineering for our recommendation system. Feature engineering is a crucial step in the machine learning pipeline because the inputs a model receives largely determine how well it can perform. Here we derive new features from the existing dataset that may improve recommendation accuracy.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np

# Load the processed data
processed_data_path = '../data/processed/processed_data.csv'
data = pd.read_csv(processed_data_path)

# Display the first few rows of the dataset
data.head()

## Creating New Features

Candidate features that could benefit our recommendation model include:
- User preferences based on historical data
- Item popularity metrics
- Time-based features (e.g., day of the week, month)
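The first candidate, user preferences based on historical data, is not implemented in this notebook. A minimal sketch, run on toy data rather than the processed dataset, summarizes each user's history as per-user aggregates: how many items they have rated and their mean rating. The `user_id` column name is an assumption (only `item_id`, `rating`, and `timestamp` are used elsewhere in this notebook), and the function and output column names are hypothetical.

```python
import pandas as pd


def add_user_preference_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-user aggregates summarizing each user's rating history.

    Assumes hypothetical columns 'user_id' and 'rating'; the new column
    names 'user_rating_count' and 'user_mean_rating' are illustrative.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    by_user = out.groupby('user_id')['rating']
    # Number of ratings the user has given (activity level).
    out['user_rating_count'] = by_user.transform('count')
    # Average rating the user gives (a rough proxy for how generous they are).
    out['user_mean_rating'] = by_user.transform('mean')
    return out


# Toy interactions standing in for the processed dataset.
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'item_id': [10, 11, 12, 10, 13],
    'rating': [5.0, 3.0, 4.0, 2.0, 4.0],
})
features = add_user_preference_features(toy)
print(features)
```

If `rating` is also the prediction target, these aggregates should be computed from training interactions only; otherwise each row's own rating leaks into `user_mean_rating`.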

The next code cell implements two of them: item popularity and time-based features.

In [2]:
# Item popularity: the number of ratings each item has received.
# transform('count') returns one value per row, so no merge is needed and
# re-running this cell overwrites the column instead of adding popularity_x/_y.
data['popularity'] = data.groupby('item_id')['rating'].transform('count')

# Time-based features.
# If 'timestamp' stores Unix epoch seconds (check the raw data), use
# pd.to_datetime(data['timestamp'], unit='s'); otherwise the integers are read
# as nanoseconds and every date falls in 1970.
data['timestamp'] = pd.to_datetime(data['timestamp'])
data['day_of_week'] = data['timestamp'].dt.dayofweek  # Monday=0 ... Sunday=6
data['month'] = data['timestamp'].dt.month  # 1-12

# Display the updated dataset with new features
data.head()
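Because the processed CSV is not reproduced here, a small self-contained check of the same popularity and time-based features on toy data can help confirm the logic. The sketch assumes timestamps are stored as Unix epoch seconds, which may or may not match the real dataset, and also shows what goes wrong when that unit is not declared.

```python
import pandas as pd

# Toy ratings standing in for the processed dataset. The 'timestamp' values
# are assumed to be Unix epoch seconds (2024-01-01 through 2024-03-02 UTC).
toy = pd.DataFrame({
    'item_id': [10, 10, 11, 10, 12],
    'rating': [4.0, 5.0, 3.0, 2.0, 4.0],
    'timestamp': [1704067200, 1704153600, 1706745600, 1709251200, 1709337600],
})

# Pitfall: without unit='s', integers are read as nanoseconds since the epoch,
# so all of these dates land in 1970.
naive = pd.to_datetime(toy['timestamp'])
print(naive.dt.year.unique())

# Item popularity: number of ratings per item, broadcast back to every row.
toy['popularity'] = toy.groupby('item_id')['rating'].transform('count')

# Time-based features with the epoch unit made explicit.
toy['timestamp'] = pd.to_datetime(toy['timestamp'], unit='s')
toy['day_of_week'] = toy['timestamp'].dt.dayofweek  # Monday=0
toy['month'] = toy['timestamp'].dt.month
print(toy)
```

One design consideration: the popularity count is computed over all interactions. If the modeling phase later splits the data into training and test sets, counting only training interactions keeps test-period ratings from leaking into the feature.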

## Saving the Engineered Features

After creating the new features, we will save the updated dataset for use in the modeling phase.

In [3]:
# Save the updated dataset with new features
data.to_csv('../data/processed/processed_data_with_features.csv', index=False)
print('Processed data with new features saved successfully!')
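CSV stores every value as text, so the datetime `timestamp` column written above comes back as text (an `object` or string dtype, depending on the pandas version) unless the reader asks pandas to parse it. A minimal sketch of the round trip, using an in-memory `io.StringIO` buffer in place of the real file path:

```python
import io

import pandas as pd

# A tiny frame with a datetime column, standing in for the engineered dataset.
df = pd.DataFrame({
    'item_id': [10, 11],
    'timestamp': pd.to_datetime(['2024-01-01 08:30:00', '2024-02-01 19:00:00']),
})

# Write to an in-memory buffer instead of the processed-data file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Plain read: the timestamp column comes back as text.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Read with parse_dates: the column is restored as a datetime dtype.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=['timestamp'])

print(plain.dtypes)
print(parsed.dtypes)
```

In the modeling notebook, the equivalent call would be `pd.read_csv('../data/processed/processed_data_with_features.csv', parse_dates=['timestamp'])`. The integer feature columns (`popularity`, `day_of_week`, `month`) survive the round trip without extra arguments.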

## Conclusion

In this notebook, we engineered features for the recommendation system: an item popularity feature (the number of ratings per item) and two time-based features (day of the week and month). The updated dataset has been saved to `../data/processed/processed_data_with_features.csv` for further modeling and evaluation.