<a href="https://colab.research.google.com/github/dquerales/jupyter-automation-github-actions/blob/main/notebooks/model_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model Training
Daniel Querales - d.querales@gmail.com

___

 **Changelog:**
- 2023-01-02: File created.

___

## Table of Contents
- [Importing libraries](#Importing-libraries)
- [Load data](#Load-data)
- [Data Preparation](#Data-Preparation)
- [Modelling](#Modelling)

---

## Importing libraries

In [1]:
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

In [2]:
import sys
# Add the parent directory to the import path so the `src` package can be found.
sys.path.append("..")

In [3]:
from src.logger import logger, log_data_status

____

## Load data

In [4]:
df = pd.read_csv('../datasets/raw/data_raw.csv')

In [5]:
df.head()

|   | date | price |
|---|---|---|
| 0 | 2022-11-05 00:00:00 | 21152.93 |
| 1 | 2022-11-06 00:00:00 | 21292.67 |
| 2 | 2022-11-07 00:00:00 | 20920.33 |
| 3 | 2022-11-08 00:00:00 | 20598.44 |
| 4 | 2022-11-09 00:00:00 | 18540.11 |


In [6]:
df.shape

(366, 2)

____

## Data Preparation


In [7]:
from src.prepare_data import add_date_features

In [8]:
logger('Load data')
log_data_status(df)

In [9]:
df = add_date_features(df, 'date')

In [10]:
logger('Data prepared')
log_data_status(df)

In [11]:
df.head()

| date | price | hour | dayofweek | quarter | month | year | dayofyear | dayofmonth |
|---|---|---|---|---|---|---|---|---|
| 2022-11-05 | 21152.93 | 0 | 5 | 4 | 11 | 2022 | 309 | 5 |
| 2022-11-06 | 21292.67 | 0 | 6 | 4 | 11 | 2022 | 310 | 6 |
| 2022-11-07 | 20920.33 | 0 | 0 | 4 | 11 | 2022 | 311 | 7 |
| 2022-11-08 | 20598.44 | 0 | 1 | 4 | 11 | 2022 | 312 | 8 |
| 2022-11-09 | 18540.11 | 0 | 2 | 4 | 11 | 2022 | 313 | 9 |
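
`add_date_features` is imported from `src.prepare_data` and its source is not shown in this notebook. Judging only from the output above (a `date` index plus seven calendar columns), a function of this kind might look roughly like the sketch below. The name `add_date_features_sketch` and its body are assumptions for illustration, not the repository's actual implementation.

```python
import pandas as pd


def add_date_features_sketch(df: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Hypothetical stand-in for src.prepare_data.add_date_features."""
    out = df.copy()
    # Parse the date column and use it as the index, as in the output above.
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.set_index(date_col)

    # Derive calendar features from the DatetimeIndex.
    idx = out.index
    out["hour"] = idx.hour
    out["dayofweek"] = idx.dayofweek  # Monday=0 ... Sunday=6
    out["quarter"] = idx.quarter
    out["month"] = idx.month
    out["year"] = idx.year
    out["dayofyear"] = idx.dayofyear
    out["dayofmonth"] = idx.day
    return out


# Two rows copied from the raw data preview.
sample = pd.DataFrame(
    {
        "date": ["2022-11-05 00:00:00", "2022-11-06 00:00:00"],
        "price": [21152.93, 21292.67],
    }
)
features = add_date_features_sketch(sample, "date")
print(features)
```

Note that `hour` is 0 for every row because the data is daily, so that column carries no information for the model.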


In [12]:
TARGET = 'price'

In [13]:
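# Separate the features from the target column.
X = df.drop(columns=[TARGET])
y = df[TARGET]
# shuffle=False keeps the rows in date order, so the last 30% of dates form the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)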
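
To illustrate what `shuffle=False` does for time-indexed data, here is a small sketch on a synthetic daily series with the same number of rows as the dataset. The `demo` frame and its values are made up; only the `train_test_split` call mirrors the cell above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic daily series (values are placeholders, not real prices).
dates = pd.date_range("2022-11-05", periods=366, freq="D")
demo = pd.DataFrame({"price": np.linspace(1.0, 2.0, 366)}, index=dates)

# Same split settings as in the notebook.
train, test = train_test_split(demo, test_size=0.3, shuffle=False)

# Without shuffling, the split is a single cut in time:
# every training date comes before every test date.
chronological = train.index.max() < test.index.min()
print(len(train), len(test), chronological)
```

Keeping the test set strictly later than the training set avoids training on dates that come after the ones being predicted.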

____

## Modelling

In [14]:
model = xgb.XGBRegressor(n_estimators=1000)
model.fit(X_train, y_train)
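
The held-out `X_test` and `y_test` are not used in the cells shown here. One possible way to score a fitted regressor on them is sketched below. The helper `evaluate_regressor` is a hypothetical name, and `LinearRegression` on toy data is only a lightweight stand-in so the example runs on its own; the same helper should accept any fitted model with a `predict` method, such as the `XGBRegressor` above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error


def evaluate_regressor(fitted_model, X_eval, y_eval) -> dict:
    """Hypothetical helper: return MAE and RMSE of a fitted regressor."""
    predictions = fitted_model.predict(X_eval)
    mae = mean_absolute_error(y_eval, predictions)
    rmse = float(np.sqrt(mean_squared_error(y_eval, predictions)))
    return {"mae": float(mae), "rmse": rmse}


# Toy data following y = 2x, split in order like the notebook's data.
X_fit = np.array([[0.0], [1.0], [2.0], [3.0]])
y_fit = np.array([0.0, 2.0, 4.0, 6.0])
X_holdout = np.array([[4.0], [5.0]])
y_holdout = np.array([8.0, 10.0])

stand_in = LinearRegression().fit(X_fit, y_fit)
metrics = evaluate_regressor(stand_in, X_holdout, y_holdout)
print(metrics)
```

In the notebook this would be called as `evaluate_regressor(model, X_test, y_test)`.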

In [15]:
model.save_model("../models/model.json")