# Predicting Sales Revenue

This notebook uses Azure Machine Learning AutoML to predict the total sales revenue of a product at a store from the following features:
- Brand (The brand of the product)
- Quantity (Quantity of product purchased)
- Advert (Whether the product had an advertisement or not)
- Price (How much the product costs)

 ![Sales Forecasting](https://stretailprod.blob.core.windows.net/notebookimages/sales_revenue.jpg?sp=r&st=2022-07-28T22:34:27Z&se=2023-07-29T06:34:27Z&spr=https&sv=2021-06-08&sr=b&sig=fSAuuOQTB8c78YiqYZUl6XwJu%2FN%2FXHLadEQJ%2FdJxvyU%3D)


### Importing libraries

In [1]:
!pip freeze

absl-py==0.15.0
adal==1.2.7
adlfs==2022.4.0
aiohttp==3.8.1
aiohttp-cors==0.7.0
aiosignal==1.2.0
alembic==1.8.0
ansiwrap==0.8.4
antlr4-python3-runtime==4.9.3
anyio==3.6.1
applicationinsights==0.11.10
arch==4.14
argcomplete==2.0.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arviz @ file:///tmp/build/80754af9/arviz_1614019183254/work
asgiref==3.5.2
astroid==2.11.6
asttokens==2.0.5
astunparse==1.6.3
async-timeout==4.0.2
attrs==21.4.0
auto-tqdm==1.0.2
autokeras==1.0.16
autopep8==1.6.0
autovizwidget==0.20.0
azure-ai-ml==0.1.0b3
azure-appconfiguration==1.1.1
azure-batch==12.0.0
azure-cli==2.37.0
azure-cli-core==2.37.0
azure-cli-telemetry==1.0.6
azure-common==1.1.28
azure-core==1.22.1
azure-cosmos==3.2.0
azure-data-tables==12.4.0
azure-datalake-store==0.0.52
azure-graphrbac==0.61.1
azure-identity==1.7.0
azure-keyvault==1.1.0
azure-keyvault-administration==4.0.0b3
azure-keyvault-keys==4.5.1
azure-loganalytics==0.1.1
azure-mgmt-advisor==9.0.0
azur

In [16]:
# Azure ML SDK v1 (azureml-*): workspace, experiment, dataset, and AutoML classes
import azureml.core
from azureml.core import Experiment, Workspace, Dataset, Datastore
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.automl.core.forecasting_parameters import ForecastingParameters

# Azure ML SDK v2 (azure-ai-ml)
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml import automl, Input

In [17]:
# Standard library
import logging
from copy import deepcopy
from datetime import datetime
from io import BytesIO

# Third-party
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from azure.storage.blob import ContainerClient, BlobClient
from dateutil import parser
from pyspark.sql import SparkSession

# Project-level settings (for example gv.SALES_AUTOML_COMPUTE_NAME)
import GlobalVariables as gv

### Setting up workspace

In [18]:
# Load the Azure Machine Learning workspace from the local config file
ws = Workspace.from_config()

# Experiment under which the AutoML run is tracked
experiment_name = "salestransdata-automl"
experiment = Experiment(ws, experiment_name)

### Reading data from datastore

In [19]:
train_data = Dataset.get_by_name(ws, name='sales_dataset_train')
train_data.to_pandas_dataframe().head(5)

Unnamed: 0,Store,Brand,Quantity,Advert,Price,Revenue
0,177.0,BrandB,19243.0,1.0,2.1,40410.3
1,1136.0,BrandB,12646.0,1.0,2.0,25292.0
2,1154.0,BrandB,16719.0,1.0,1.91,31933.289999
3,1448.0,BrandB,14016.0,1.0,2.41,33778.56
4,1276.0,BrandA,17105.0,1.0,2.07,35407.35
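
In the rows displayed above, `Revenue` appears to equal `Quantity` multiplied by `Price`. The sketch below checks this for those five rows only, using plain Python; it says nothing about the rest of the dataset, and the helper name `revenue_matches` is made up for illustration.

```python
import math

# First five training rows as displayed above: (Quantity, Price, Revenue).
rows = [
    (19243.0, 2.10, 40410.3),
    (12646.0, 2.00, 25292.0),
    (16719.0, 1.91, 31933.289999),
    (14016.0, 2.41, 33778.56),
    (17105.0, 2.07, 35407.35),
]


def revenue_matches(quantity, price, revenue, rel_tol=1e-6):
    """Return True if revenue equals quantity * price within a relative tolerance."""
    return math.isclose(quantity * price, revenue, rel_tol=rel_tol)


# One boolean per displayed row.
checks = [revenue_matches(q, p, r) for q, p, r in rows]
print(checks)
```

If the relationship holds across the full dataset, a very low error from the trained model would be expected, since the label is then a simple function of two of the features.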


In [20]:
test_data = Dataset.get_by_name(ws, name='sales_dataset_test')
test_data.to_pandas_dataframe().head(5)

Unnamed: 0,Store,Brand,Quantity,Advert,Price,Revenue
0,1781.0,BrandA,15473.0,1.0,2.32,35897.36
1,1397.0,BrandA,11173.0,1.0,2.46,27485.579999
2,1121.0,BrandA,16566.0,1.0,2.62,43402.92
3,1632.0,BrandA,18340.0,1.0,2.31,42365.4
4,1113.0,BrandB,12874.0,1.0,1.99,25619.26


### Configuring and running experiment

In [21]:
automl_config = AutoMLConfig(
    task="regression",
    training_data=train_data,
    test_data=test_data,
    label_column_name="Revenue",  # column to predict
    primary_metric="normalized_root_mean_squared_error",
    experiment_timeout_hours=0.5,  # stop the experiment after 30 minutes
    max_concurrent_iterations=2,
    n_cross_validations=5,
    compute_target=gv.SALES_AUTOML_COMPUTE_NAME,
    featurization="auto",
)
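
The primary metric chosen above, `normalized_root_mean_squared_error`, can be illustrated with a minimal sketch. It assumes the metric is the RMSE divided by the range of the true label values, which is how the Azure documentation describes it; the function `normalized_rmse` and the predicted values are hypothetical and are not taken from the AutoML run.

```python
import math


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the true values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("the range of y_true is zero, so the metric is undefined")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / value_range


# True Revenue values from the first five test rows shown earlier.
y_true = [35897.36, 27485.579999, 43402.92, 42365.4, 25619.26]
# Hypothetical predictions, invented purely for illustration.
y_pred = [36000.0, 27000.0, 43000.0, 42500.0, 26000.0]

print(normalized_rmse(y_true, y_pred))
```

Dividing by the range makes the score unitless, so it can be compared across labels of different scale; lower is better.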

In [22]:
run = experiment.submit(automl_config)

Class SynapseCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
salestransdata-automl,AutoML_33242e5c-fcb3-40b4-aaaa-1d6198717e23,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


In [23]:
# Block until the remote AutoML run finishes
run.wait_for_completion()

# Retrieve the best child run and its fitted model
best_run, non_onnx_model = run.get_output()

Package:azureml-automl-runtime, training version:1.43.0, current version:1.42.0
Package:azureml-core, training version:1.43.0, current version:1.42.0
Package:azureml-dataset-runtime, training version:1.43.0, current version:1.42.0
Package:azureml-defaults, training version:1.43.0, current version:1.42.0
Package:azureml-interpret, training version:1.43.0, current version:1.42.0
Package:azureml-mlflow, training version:1.43.0.post1, current version:1.42.0
Package:azureml-pipeline-core, training version:1.43.0, current version:1.42.0
Package:azureml-responsibleai, training version:1.43.0, current version:1.42.0
Package:azureml-telemetry, training version:1.43.0, current version:1.42.0
Package:azureml-train-automl-client, training version:1.43.0, current version:1.42.0
Package:azureml-train-automl-runtime, training version:1.43.0.post1, current version:1.42.0
Package:azureml-train-core, training version:1.43.0, current version:1.42.0
Package:azureml-train-restclients-hyperdrive, training v

In [32]:
print(best_run, "\n>>", non_onnx_model)

Run(Experiment: salestransdata-automl,
Id: AutoML_33242e5c-fcb3-40b4-aaaa-1d6198717e23_32,
Type: azureml.scriptrun,
Status: Completed) 
>> RegressionPipeline(pipeline=Pipeline(memory=None,
                                     steps=[('datatransformer',
                                             DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='regression', working_dir='/mnt/batch/ta...
                                             PreFittedSoftVotingRegressor(estimators=[('31', Pipeline(memory=None, steps=[('standardscalerwrapper', StandardScalerWrapper(copy=True, with_mean=False, with_std=False)), ('xgboostregressor', XGBoostRegressor(booster='gbtree', colsample_bytree=0.6, eta=0.3, gamma=0, max_depth=6, max_leaves=0, n_estimators=100, n_jobs=1, objective='reg:linear', problem_info=ProblemIn
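
The best model printed above ends in a `PreFittedSoftVotingRegressor`, an ensemble over several already fitted pipelines. The sketch below shows the general idea of a voting regressor, assuming it combines its members with a weighted average of their predictions. It is not the Azure ML implementation; the function `soft_voting_predict`, the weights, and the prediction values are all hypothetical.

```python
def soft_voting_predict(predictions, weights):
    """Combine per-model predictions with a weighted average.

    predictions: list with one list of predicted values per model.
    weights: one non-negative weight per model.
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total_weight
        for i in range(n_samples)
    ]


# Two hypothetical models predicting revenue for two samples.
model_a = [10.0, 20.0]
model_b = [20.0, 40.0]

print(soft_voting_predict([model_a, model_b], [1.0, 1.0]))
print(soft_voting_predict([model_a, model_b], [3.0, 1.0]))
```

Weighting lets stronger members of the ensemble contribute more to the final prediction than weaker ones.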