# EazyML Modeling: Walmart Regression

## Define Imports

In [None]:
!pip install --upgrade eazyml-automl
!pip install gdown python-dotenv

In [None]:
import os
import pandas as pd
import eazyml as ez
import gdown
from dotenv import load_dotenv
load_dotenv()

## 1. Initialize EazyML

The `ez_init` function authenticates with the access key read from the `EAZYML_ACCESS_KEY` environment variable, which `load_dotenv()` above can populate from a `.env` file. If the variable is not set, EazyML falls back to a trial license.
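A minimal sketch of the key lookup described above, using only the standard library. The helper name `resolve_access_key` is hypothetical and not part of the EazyML API. It mirrors the `os.getenv('EAZYML_ACCESS_KEY')` call passed to `ez_init` and reports whether a key was found.

```python
import os
from typing import Mapping, Optional


def resolve_access_key(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the value of EAZYML_ACCESS_KEY, or None if it is not set.

    Hypothetical helper that mirrors os.getenv('EAZYML_ACCESS_KEY').
    """
    return environ.get("EAZYML_ACCESS_KEY")


if resolve_access_key() is None:
    print("EAZYML_ACCESS_KEY is not set; a trial license will be used.")
else:
    print("EAZYML_ACCESS_KEY is set.")
```

Passing a plain dictionary as `environ` makes the lookup easy to check without touching the real environment.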

In [None]:
ez.ez_init(os.getenv('EAZYML_ACCESS_KEY'))

## 2. Define Dataset Files, Outcome Variable and Train Model

### 2.1 Define Train Dataset and Other Model Parameters

In [None]:
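# Download the dataset folder from Google Drive; the cells below read from a local 'data' directory
gdown.download_folder(id='16LfwRMjchrPgdbsgPHr79AHvNCHsL5Is')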
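The cells below build paths with `os.path.join('data', ...)`, so they assume the download produces a local `data` directory. The following sketch (the helper `list_csv_files` is hypothetical) lists the CSV files in that directory and fails early with a clear message if it is missing.

```python
from pathlib import Path
from typing import List


def list_csv_files(folder: str = "data") -> List[str]:
    """Return the sorted names of the CSV files directly inside `folder`.

    Raises FileNotFoundError if `folder` is not an existing directory,
    e.g. when the download was saved under a different name.
    """
    path = Path(folder)
    if not path.is_dir():
        raise FileNotFoundError(f"Expected a local '{folder}' directory after the download")
    return sorted(p.name for p in path.glob("*.csv"))


# Only inspect the folder if it exists, so this cell is safe to run on its own
if Path("data").is_dir():
    print(list_csv_files("data"))
```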

In [None]:
reg_file_path = os.path.join('data', "walmart_train_data.csv")
reg_outcome = "Weekly_Sales"
df_reg = pd.read_csv(reg_file_path)
options = {'model_type': 'predictive'}
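Before calling `ez_build_model`, it can help to confirm that the outcome column exists and is numeric, as a regression target should be. The sketch below is a generic pandas check, not an EazyML requirement; the helper name `check_regression_frame` is hypothetical.

```python
import pandas as pd


def check_regression_frame(df: pd.DataFrame, outcome: str) -> dict:
    """Basic pre-training checks for a regression dataset (illustrative only).

    Raises if the outcome column is absent or non-numeric; otherwise returns
    the row count, the number of non-outcome columns, and the number of
    missing outcome values.
    """
    if outcome not in df.columns:
        raise KeyError(f"Outcome column '{outcome}' not found in the dataset")
    if not pd.api.types.is_numeric_dtype(df[outcome]):
        raise TypeError(f"Outcome column '{outcome}' must be numeric for regression")
    return {
        "n_rows": len(df),
        "n_features": df.shape[1] - 1,
        "missing_outcome": int(df[outcome].isna().sum()),
    }
```

In this notebook it would be called as `check_regression_frame(df_reg, reg_outcome)` after loading the training file.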

### 2.2 Train Model

In [None]:
resp_reg = ez.ez_build_model(df_reg, outcome=reg_outcome, options=options)

### 2.3 Show Model Performance

In [None]:
ez.ez_display_df(resp_reg['model_performance'])

## 3. Dataset Information

The dataset used in this notebook is the **Walmart Dataset**, which contains weekly sales figures for Walmart stores together with store-level and regional indicators such as a holiday flag, temperature, fuel price, consumer price index (CPI), and unemployment rate.

You can find more details and download the dataset from Kaggle using the following link:

[Kaggle Walmart Dataset](https://www.kaggle.com/datasets/yasserh/walmart-dataset)

### Columns in the Dataset:
- **Store**: The store number.
- **Weekly_Sales**: Weekly sales for the given store. This is the outcome variable in this notebook.
- **IsHoliday**: Whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week).
- **Temperature**: Temperature on the day of sale.
- **Fuel_Price**: Cost of fuel in the region.
- **CPI**: Prevailing consumer price index.
- **Unemployment**: Prevailing unemployment rate.
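The column list above can be turned into a quick schema check. This sketch uses the column names exactly as listed; the downloaded file may contain additional columns, which the check ignores. The helpers `missing_columns` and `holiday_as_int` are hypothetical, and the second one only normalizes the 1/0 holiday encoding described above.

```python
import pandas as pd

# Column names as listed in this section
EXPECTED_COLUMNS = (
    "Store", "Weekly_Sales", "IsHoliday", "Temperature",
    "Fuel_Price", "CPI", "Unemployment",
)


def missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from `df`."""
    return [c for c in expected if c not in df.columns]


def holiday_as_int(series: pd.Series) -> pd.Series:
    """Cast a holiday flag to int and require every value to be 0 or 1.

    True/False become 1/0; values that cannot be cast to int raise an error.
    """
    as_int = series.astype(int)
    if not as_int.isin([0, 1]).all():
        raise ValueError("IsHoliday should only contain 0/1 (or False/True)")
    return as_int
```

For example, `missing_columns(train)` should return an empty list if the file matches the description above.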

### 3.1 Display the Dataset

Below is a preview of the dataset:

In [None]:
# Load the dataset from the provided file
train = pd.read_csv(reg_file_path)

# Display the first few rows of the dataset
ez.ez_display_df(train.head())

## 4. Define Test Dataset and Predict on It

In [None]:
reg_model_info = resp_reg["model_info"]

### 4.1 Define Test Dataset

In [None]:
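# Note: this reuses the training file, so the predictions below are in-sample.
# Point reg_test_file_path at a held-out file to estimate performance on unseen data.
reg_test_file_path = os.path.join('data', "walmart_train_data.csv")
reg_test_data = pd.read_csv(reg_test_file_path)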

### 4.2 Predict on Test Dataset

In [None]:
options = {}  # use the default prediction options
reg_pred_df = ez.ez_predict(reg_test_data, model_info=reg_model_info, options=options)
pred_df = reg_pred_df['pred_df']  # predictions returned in the response

In [None]:
ez.ez_display_df(pred_df.head())
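Because the test file here still contains `Weekly_Sales`, predictions can be compared against actual values. The sketch below computes MAE, RMSE, and R² with NumPy. The hypothetical helper `regression_metrics` is not an EazyML function, and the name of the prediction column in `pred_df` is not shown in this notebook, so inspect `pred_df.columns` first.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute MAE, RMSE and R^2 for two aligned numeric sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # R^2 is undefined when the true values have zero variance
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```

A usage sketch would be `regression_metrics(reg_test_data[reg_outcome], pred_df[PRED_COL])`, where `PRED_COL` is a placeholder for the actual prediction column name. Since the test file is the training file, these scores are in-sample and likely optimistic.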