<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Sales Forecasting :- SAS and Vantage Comparison</b>
</header>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>
This demo walks through how a typical SAS user would use sales data to build a simple sales forecasting model, and then shows how to achieve the same result using Vantage InDB Analytics.</p>

<p style = 'font-size:16px;font-family:Arial'>
Customers who are finding their analytical environments difficult to manage are looking for ways to make these environments more streamlined whilst adapting to more contemporary technologies. Our open source analytical ecosystem can be leveraged to simplify and apply more governance to the data flows in your analytical environment, enabling you to increase efficiency of computation, reduce cost of ownership and take advantage of any analytical tool of choice.</p>

<p style = 'font-size:16px;font-family:Arial'>This overview shows how to lay the foundation for an analytical model with ClearScape Analytics, using data from a variety of sources. Teradata Vantage™ enables enterprises to automate and post timely model outputs for use in downstream business processes.</p>

<p style = 'font-size:22px;font-family:Arial;color:#E37C4D'><b>1. Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press Enter, then use the down arrow to go to the next cell.</p>

In [None]:
%connect local, hidewarnings=true

<p style = 'font-size:16px;font-family:Arial'>Setup for execution of notebook. Begin running steps with Shift + Enter keys.</p>

In [None]:
Set query_band='DEMO=Sales_Forecasting_SAS_Vantage_SQL.ipynb;' update for session;

<b style = 'font-size:20px;font-family:Arial;color:#E37C4D'>Getting Data for This Demo</b>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage on your environment, or download the data to local storage, which may yield faster execution but requires available space. The following cell contains two statements, one of which is commented out. You may switch between the modes by changing the comment string.</p>

In [None]:
---call get_data('DEMO_SlsForecast_SAS_cloud');    -- takes about 20 seconds, estimated space: 0 MB
call get_data('DEMO_SlsForecast_SAS_local');     -- takes about 35 seconds, estimated space: 11 MB

<p style = 'font-size:16px;font-family:Arial'>Optional step – if you want to see status of databases/tables created and space used.</p>


In [None]:
call space_report();          -- Takes 5 seconds

<hr>
<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>2. Explore the dataset</b>
<p style = 'font-size:20px;font-family:Arial'><b>Transfer and explore data in SAS </b></p>
<p style = 'font-size:16px;font-family:Arial'>As a first step, we import all the data from Teradata into SAS. The first DATA step creates a new dataset named "sales_temp_data_1" in the "work" library by selecting all rows from the Teradata table "sales_ts_data_1_54M", which we reach through the "teralib" libname connection.</p>
<p style = 'font-size:16px;font-family:Arial'>We repeat the process in the second DATA step, fetching all rows from the table "sales_ts_data_2_54M" and storing them in "sales_temp_data_2" in the "work" library. After the second DATA step runs, store_id, day of sale, transaction_id, product sku id, transaction quantity, and transaction weight are available in the SAS library.</p>

<code>

<div class="alert alert-block alert-warning">  
<p style = 'font-size:18px;font-family:Arial;color:#000000'><b>Equivalent SAS Code</b>    
<p style = 'font-size:16px;color:#000000'> 
/* Setting up a libname for the connection with Teradata Database */
libname teralib teradata server=barbera user=tahaw pw=tahaw database=tahaw;
options SASTRACE=',,,ds' SASTRACELOC=SASLOG nostsuffix;
<p style = 'font-size:16px;color:#000000'>
/* The first Data step is to fetch all the rows from the Teradata table and create an SAS dataset in the work library */
%let start_time = %sysfunc(datetime());
data work.sales_temp_data_1;
set TERALIB.sales_ts_data_1_54M;
run;
<p style = 'font-size:16px;color:#000000'>
/* The second Data step is to fetch all the rows from the Teradata table for the remaining attributes */
data work.sales_temp_data_2;
set TERALIB.sales_ts_data_2_54M;
run;
    </p>
</div>
</code>

<p style = 'font-size:20px;font-family:Arial'><b>Explore data in Vantage </b></p>
<p style = 'font-size:16px;font-family:Arial'>As the data is already in Vantage, the data transfer step is <b>NOT</b> required, so we go straight to exploring the data in both tables.</p> 

In [None]:
Select TOP 5 * from DEMO_SlsForecast_SAS.Store_Sales_Qty;

<p style = 'font-size:16px;font-family:Arial'>This data set contains store_id, day of sale, transaction_id, product sku id, transaction quantity, and transaction weight.</p> 


In [None]:
Select TOP 5 * from DEMO_SlsForecast_SAS.Store_Sales_Amt;

<p style = 'font-size:16px;font-family:Arial'>This data set contains store_id, day of sale, transaction_id, product sku id, and transaction amount.</p> 
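<p style = 'font-size:16px;font-family:Arial'>Before aggregating, it can be useful to confirm the size and date range of each table. The queries below are a minimal sketch, assuming both tables expose the day_of_sale column that the later cells in this notebook rely on. The output aliases (row_count, first_day, last_day) are illustrative names, not part of the demo data.</p>

```sql
-- Row count and date range of the quantity table
SELECT COUNT(*)         AS row_count,   -- alias is illustrative
       MIN(day_of_sale) AS first_day,   -- alias is illustrative
       MAX(day_of_sale) AS last_day     -- alias is illustrative
FROM DEMO_SlsForecast_SAS.Store_Sales_Qty;

-- Same check for the amount table
SELECT COUNT(*)         AS row_count,
       MIN(day_of_sale) AS first_day,
       MAX(day_of_sale) AS last_day
FROM DEMO_SlsForecast_SAS.Store_Sales_Amt;
```

<p style = 'font-size:16px;font-family:Arial'>If the two tables report different row counts or date ranges, that is worth knowing before the merge in section 4.</p>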


<hr>
<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>3. Aggregating the Data</b>
<p style = 'font-size:20px;font-family:Arial'><b>Aggregate data in SAS</b></p>
<p style = 'font-size:16px;font-family:Arial'>Next, to prepare a single ADS for forecasting sales, we aggregate the total sales by store, product, and transaction id for each day.</p>

<p style = 'font-size:16px;font-family:Arial'>We use PROC SQL in SAS, which relies on SAS’s native SQL processing capabilities. Here we sum the transaction amount and group by store_id, day_of_sale, product_sku_id, and transaction_id. Running it creates the dataset sales_aggregated_data_1 in the work library. This new dataset has around 52.5 million records.</p>

<p style = 'font-size:16px;font-family:Arial'>We apply the same aggregation to our second dataset, sales_temp_data_2, but here we sum the transaction quantity and weight, again grouping by store_id, day_of_sale, product_sku_id, and transaction_id. Running it creates a new dataset named sales_aggregated_data_2 in the work library. This new dataset also has around 52.5 million records.</p>

<code>

<div class="alert alert-block alert-warning">  
<p style = 'font-size:18px;font-family:Arial;color:#000000'><b>Equivalent SAS Code</b>    
<p style = 'font-size:16px;color:#000000'> 
/* Aggregating the amount to calculate the total sales by each store and product on a particular day */
proc sql;
  create table work.sales_aggregated_data_1 as
  select 
    store_id, day_of_sale, product_sku_id, transaction_id,
    sum(transaction_amount) as total_sales
  from work.sales_temp_data_1
  group by store_id, day_of_sale, product_sku_id, transaction_id;
quit;
<p style = 'font-size:16px;color:#000000'> 
/* Aggregating the weight and quantity to calculate the total weight 
and total quantity by each store and product on a particular day */
proc sql;
  create table work.sales_aggregated_data_2 as
  select 
    store_id, day_of_sale, product_sku_id, transaction_id,
    sum(transaction_quantity) as total_quantity,
    sum(transaction_weight) as total_weight
  from work.sales_temp_data_2
  group by store_id, day_of_sale, product_sku_id, transaction_id;
quit;
    </p>
    </div>
</code>

<p style = 'font-size:20px;font-family:Arial'><b>Aggregate data in Vantage </b></p>
<p style = 'font-size:16px;font-family:Arial'>First we sum the amount using group by store_id, day_of_sale, product_sku_id, transaction_id.</p> 

In [None]:
create multiset table sales_aggregated_data_amt as
(select 
    store_id, day_of_sale, product_sku_id, transaction_id,
    sum(transaction_amount) as total_sales
from DEMO_SlsForecast_SAS.Store_Sales_Amt
group by store_id, day_of_sale, product_sku_id, transaction_id)with data Primary index(transaction_id);

<p style = 'font-size:16px;font-family:Arial'>Then we sum the quantity and weight using group by store_id, day_of_sale, product_sku_id, transaction_id.</p>

In [None]:
create multiset table sales_aggregated_data_qty as
(select 
    store_id, day_of_sale, product_sku_id, transaction_id,
    sum(transaction_quantity) as total_quantity,
    sum(transaction_weight) as total_weight
from DEMO_SlsForecast_SAS.Store_Sales_Qty
group by store_id, day_of_sale, product_sku_id, transaction_id)with data Primary index(transaction_id);

<hr>
<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>4. Merging the Data</b>
<p style = 'font-size:20px;font-family:Arial'><b>Merging data in SAS</b></p>
<p style = 'font-size:16px;font-family:Arial'>Now we’ll merge these two aggregated datasets into a single table that contains store_id, day_of_sale, product_sku_id, transaction_id, and the aggregated transaction amount, quantity, and weight. We use the MERGE statement in SAS, naming the two datasets and listing the BY variables store_id, day_of_sale, product_sku_id, and transaction_id. After the merge runs, all the required columns are in one dataset. This dataset contains around 52.5 million rows.</p>

<code>

<div class="alert alert-block alert-warning">  
<p style = 'font-size:18px;font-family:Arial;color:#000000'><b>Equivalent SAS Code</b>    
<p style = 'font-size:16px;color:#000000'>   
/* Match-merging two datasets by key and creating another dataset in work library with all the required attributes */
data work.merged_sales_data_c;
merge  work.sales_aggregated_data_1
       work.sales_aggregated_data_2;
       by store_id day_of_sale product_sku_id transaction_id;
       run;
<p style = 'font-size:16px;color:#000000'> 
/* Aggregating the amount to calculate total sales in a particular day */
proc sql;
  create table work.aggregated_data as
  select 
    day_of_sale,
    sum(total_sales) as total_sales
  from work.merged_sales_data_c
  group by day_of_sale;
    quit;</p>
    </div>
    </code>

<p style = 'font-size:20px;font-family:Arial'><b>Merging data in Vantage </b></p>
<p style = 'font-size:16px;font-family:Arial'>We merge the 2 datasets in Vantage.</p> 
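<p style = 'font-size:16px;font-family:Arial'>One difference from the SAS flow is worth checking first. A SAS MERGE with a BY statement keeps observations from both datasets, including keys found in only one of them, whereas the INNER JOIN used in Vantage keeps only keys present in both tables. The two approaches return the same rows only when every key appears in both aggregated tables. The query below is a sketch of how that could be verified; the alias unmatched_keys is an illustrative name.</p>

```sql
-- Count keys that appear in only one of the two aggregated tables.
-- A result of 0 suggests the INNER JOIN keeps the same rows a SAS MERGE would.
SELECT COUNT(*) AS unmatched_keys       -- alias is illustrative
FROM sales_aggregated_data_amt AS A
FULL OUTER JOIN sales_aggregated_data_qty AS B
  ON  A.store_id       = B.store_id
  AND A.day_of_sale    = B.day_of_sale
  AND A.product_sku_id = B.product_sku_id
  AND A.transaction_id = B.transaction_id
WHERE A.transaction_id IS NULL          -- key exists only in the qty table
   OR B.transaction_id IS NULL;         -- key exists only in the amt table
```

<p style = 'font-size:16px;font-family:Arial'>A non-zero result would mean the merged table in Vantage has fewer rows than the SAS dataset, and a FULL OUTER JOIN might be the closer equivalent.</p>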

In [None]:
create MULTISET table merged_sales_data (
      store_id SMALLINT,
      day_of_sale DATE,
      transaction_id INTEGER,
      product_sku_id DECIMAL(18,0),
      transaction_amount DECIMAL(9,2),
      transaction_quantity SMALLINT,
      transaction_weight DECIMAL(9,2))
PRIMARY INDEX(transaction_id);

In [None]:
INSERT INTO merged_sales_data
    SELECT A.store_id,
        A.day_of_sale,
        A.transaction_id,
        A.product_sku_id,
        A.total_sales,
        B.total_quantity,
        B.total_weight
        FROM (
        SELECT 
            store_id,
            day_of_sale,
            product_sku_id,
            transaction_id,
            total_sales
            FROM sales_aggregated_data_amt
            ) AS A
        INNER JOIN 
        (
        SELECT 
            store_id,
            day_of_sale,
            product_sku_id,
            transaction_id,
            total_quantity,
            total_weight
            FROM sales_aggregated_data_qty
            ) AS B
        ON A.day_of_sale=B.day_of_sale AND A.product_sku_id=B.product_sku_id AND A.store_id=B.store_id 
        AND A.transaction_id=B.transaction_id;

<p style = 'font-size:16px;font-family:Arial'>We do a final aggregation to get the total sales by day in Vantage.</p> 

In [None]:
create multiset table aggregated_sales_td as
  (select 
    rank() over(partition by day_of_sale order by day_of_sale) as SeriesId, ---Series Id created for using in ARIMA
    cast(day_of_sale as timestamp(0)) as day_of_sale,
    sum(transaction_amount) as total_sales
  from merged_sales_data 
  group by day_of_sale) with data;

In [None]:
sel TOP 5 * from aggregated_sales_td order by day_of_sale;

<p style = 'font-size:16px;font-family:Arial'>We plot the total sales by day of sales to check the series data.</p> 

<p style = 'font-size:16px;font-family:Arial'>The <b>TD_PLOT</b> function will return an image in the cell of the results showing the daily sales as a line plot.</p>
<i>* Please <b> right click on the cell under the IMAGE column </b> from the output and choose view image to see the plot generated. </i>

In [None]:
EXECUTE FUNCTION
TD_Plot
(
    SERIES_SPEC
    (
        TABLE_NAME(aggregated_sales_td),
        ROW_AXIS(TIMECODE("day_of_sale")),
        SERIES_ID(SeriesID),
        PAYLOAD (FIELDS("total_sales"),CONTENT(REAL))
    ),
    FUNC_PARAMS
    (
        PLOTS[(
            TYPE('line'),
            LEGEND('upper left'),
            TITLE('Daily Sales')
        )],
        IMAGE('png')
    )
);

<p style = 'font-size:16px;font-family:Arial'>If you followed the instructions above, you should see a graph like the following:</p>
<img src="images/Org_SalesData.png" alt="Daily Sales" width="400" />
<p style = 'font-size:16px;font-family:Arial'>In the plot we can see that sales vary from September 2019 until November 2019, then stay flat until November 2020. Sales vary again from November 2020 until January 2021. After that there is a steep drop, and sales remain below 1000 between January 2021 and March 2021.</p> 
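<p style = 'font-size:16px;font-family:Arial'>Before moving on to ARIMA, the series identifier can be sanity-checked. Because aggregated_sales_td is grouped by day_of_sale, each RANK() partition should contain a single row, so SeriesId is expected to be 1 for every day and the whole table forms one series. The query below is a sketch of that check; the aliases n_days, first_day, and last_day are illustrative names.</p>

```sql
-- Expect a single row with SeriesId = 1 covering the full date range
SELECT SeriesId,
       COUNT(*)         AS n_days,      -- alias is illustrative
       MIN(day_of_sale) AS first_day,   -- alias is illustrative
       MAX(day_of_sale) AS last_day     -- alias is illustrative
FROM aggregated_sales_td
GROUP BY SeriesId;
```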

<hr>
<p style = 'font-size:22px;font-family:Arial;color:#E37C4D'><b>5. Using ARIMA (AutoRegressive Integrated Moving Average) model to forecast Sales</b></p>

<p style = 'font-size:16px;font-family:Arial'>
ARIMA functions on Vantage run in the following order:
<br>
1. Run the <b>TD_ARIMAESTIMATE</b> function to get the coefficients for the ARIMA model.
<br>
2. <i>[Optional]</i> Run the <b>TD_ARIMAVALIDATE</b> function to validate the "goodness of fit" of the ARIMA model when
FIT_PERCENTAGE is not 100 in TD_ARIMAESTIMATE.
<br>
3. Run the <b>TD_ARIMAFORECAST</b> function with input from step 1 or step 2 to forecast the future periods
beyond the last observed period.
</p>

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#E37C4D'><b>5.1 Estimation step</b></p>
<p style = 'font-size:20px;font-family:Arial'><b>Estimation step in SAS</b></p>
<p style = 'font-size:16px;font-family:Arial'>The final step is to fit the ARIMA model. In the first PROC ARIMA block, the IDENTIFY statement takes the first difference of the total_sales variable (the (1) after the variable name) and computes autocorrelations up to a maximum lag of 30 (nlag=30). The ESTIMATE statement then fits a model with one moving-average term (q=1). The parameter estimates are saved in the arima_params dataset in the work library.</p>

<code>

<div class="alert alert-block alert-warning">  
<p style = 'font-size:18px;font-family:Arial;color:#000000'><b>Equivalent SAS Code</b>    
<p style = 'font-size:16px;color:#000000'> 
/* Fit ARIMA model and calculating its coefficients */
proc arima data=work.aggregated_data;
  identify var=total_sales(1) nlag=30;
  estimate q=1 outest=arima_params;
    run;</p>
    </div>
</code>

<p style = 'font-size:20px;font-family:Arial'><b>Estimation step in Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>The TD_ARIMAESTIMATE function estimates the coefficients corresponding to an ARIMA model and fits a series with an existing ARIMA model. The function can also provide the "goodness of fit" and the residuals of the fitting operation. The function generates a model layer used as input for the TD_ARIMAVALIDATE and TD_ARIMAFORECAST functions. This function is for univariate series.</p>

<br>

<p style = 'font-size:16px;font-family:Arial'>Here, the model orders, namely p (auto-regressive lags), d (differencing order), and q (moving-average lags), must be passed to the MODEL_ORDER parameter. This demo uses MODEL_ORDER(2, 1, 8).
<br>
<br>
The output of the analysis is stored in an ART (Analytical Result Table), which contains relevant information and results of the ARIMA modeling process.
<br>
<br>
Furthermore, FIT_PERCENTAGE is set to 100, which means the ARIMA model is trained using 100% of the available data.</p>
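<p style = 'font-size:16px;font-family:Arial'>For reference, a general ARIMA(p, d, q) model can be written in backshift notation as shown below, where B shifts the series back one period. With MODEL_ORDER(2, 1, 8), p = 2, d = 1, and q = 8. Sign conventions for the coefficients differ between textbooks and software, so this is a sketch of the model family and not necessarily the exact parameterization TD_ARIMAESTIMATE reports.</p>

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j},
\qquad
B\,y_t = y_{t-1}
```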

In [None]:
EXECUTE FUNCTION INTO ART(ART_ESTSales)
TD_ARIMAESTIMATE(
    SERIES_SPEC(
        TABLE_NAME(aggregated_sales_td),
        ROW_AXIS(TIMECODE("day_of_sale")),
        SERIES_ID(seriesID),
        PAYLOAD(
            FIELDS("total_sales"),
            CONTENT(REAL))),
     FUNC_PARAMS(
        NONSEASONAL(MODEL_ORDER(2, 1, 8)),
        CONSTANT(1), COEFF_STATS(1), FIT_METRICS(1),
        RESIDUALS(1), ALGORITHM(CSS_MLE),  FIT_PERCENTAGE(100)
    )
);

<hr>
<p style = 'font-size:18px;font-family:Arial'><b>Extract residuals</b></p>
<p style = 'font-size:16px;font-family:Arial'>The TD_EXTRACT_RESULTS function retrieves auxiliary result sets stored in an Analytical Result Table (ART). Here, we extract the residuals from the ART produced by the previous estimation step.</p>
<br>

In [None]:
CREATE TABLE AR_RESIDUALS_Sales AS (
    EXECUTE FUNCTION
    TD_EXTRACT_RESULTS(
        ART_SPEC(
            TABLE_NAME(ART_ESTSales),
            LAYER(ARTFITRESIDUALS)
        )
    )
) WITH DATA;

In [None]:
select TOP 5 * from AR_RESIDUALS_Sales;

<p style = 'font-size:16px;font-family:Arial'>The output displayed above provides insights into the ARIMA model's actual values, calculated values, and residuals. In this context, the actual value represents the actual sales, reflecting the real-world data.
<br>
<br>
The calculated value corresponds to the values generated by the ARIMA model during the estimation phase. These calculated values are based on the model's learned patterns, relationships, and parameters derived from the training data. The residual value represents the discrepancy or difference between the actual value and the calculated value. It quantifies the model's prediction error or the extent to which the model's estimates deviate from the actual observations.
<br>
<br>
In the following cell, we extract additional metrics from the estimate phase i.e. TD_ARIMAESTIMATE.
</p>

In [None]:
SELECT * FROM (
    EXECUTE FUNCTION
    TD_EXTRACT_RESULTS(
        ART_SPEC(
            TABLE_NAME(ART_ESTSales),
            LAYER(ARTFITMETADATA)
        )
    )
) AS T;

<p style = 'font-size:16px;font-family:Arial'>The displayed output provides performance metrics that offer insights into the effectiveness of the trained ARIMA model. One such metric is the R-Squared value, which measures how well the model fits the data. In this instance, the R-Squared value is noted as 0.92, indicating a strong fit between the model and the data.</p>
<br>
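<p style = 'font-size:16px;font-family:Arial'>R-Squared is unitless, so it can help to also express the fit error in the units of total_sales. The query below is a minimal sketch that computes the mean absolute error and root mean squared error from the residuals table. It uses the ACTUAL_VALUE, CALC_VALUE, and ROW_I columns that the plotting cells in this notebook also read, and the ROW_I > 1 filter mirrors those cells. The aliases n_obs, mean_abs_error, and rmse are illustrative names.</p>

```sql
-- Fit error in the original units of total_sales
SELECT COUNT(*) AS n_obs,                                   -- alias is illustrative
       AVG(ABS(ACTUAL_VALUE - CALC_VALUE)) AS mean_abs_error,
       SQRT(AVG((ACTUAL_VALUE - CALC_VALUE)
              * (ACTUAL_VALUE - CALC_VALUE))) AS rmse
FROM AR_RESIDUALS_Sales
WHERE ROW_I > 1;   -- same filter as the plotting cells in this notebook
```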


<hr>
<p style = 'font-size:18px;font-family:Arial'><b>Create table PLOT_ESTIMATE for plotting</b></p>
<p style = 'font-size:16px;font-family:Arial'>Here, we'll create a table which will be used to plot the actual and estimated time series.</p>

In [None]:
CREATE TABLE PLOT_ESTIMATE (DatasetID VARCHAR(10), ROW_I BIGINT, FIT_MAGNITUDE FLOAT);

In [None]:
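-- 'ActualSale' (10 characters) fits DatasetID VARCHAR(10) and matches the ID_SEQUENCE label passed to TD_Plot
INSERT INTO PLOT_ESTIMATE SELECT 'ActualSale', ROW_I, ACTUAL_VALUE FROM AR_RESIDUALS_sales WHERE ROW_I>1; 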
INSERT INTO PLOT_ESTIMATE SELECT 'ESTIMATED', ROW_I, CALC_VALUE FROM AR_RESIDUALS_sales WHERE ROW_I>1; 

In [None]:
SELECT TOP 5 * FROM PLOT_ESTIMATE ORDER BY ROW_I;

<p style = 'font-size:16px;font-family:Arial'>The <b>TD_PLOT</b> function will return an image in the cell of the results showing the Actual and Estimated values by the fitted ARIMA model.</p>
<i>* Please <b> right click on the cell under the IMAGE column </b> from the output and choose view image to see the plot generated. </i>

In [None]:
EXECUTE FUNCTION
TD_Plot
(
    SERIES_SPEC(
        TABLE_NAME(PLOT_ESTIMATE),
        ROW_AXIS(SEQUENCE(ROW_I)),
        SERIES_ID(DataSetID),
        ID_SEQUENCE('[{"DatasetID":"ActualSale"},{"DatasetID":"ESTIMATED"}]'),
        PAYLOAD(
            FIELDS(FIT_MAGNITUDE),
            CONTENT(REAL)
        )
    ),
    FUNC_PARAMS
    (
        WIDTH(1920),
        HEIGHT(1080),
        TITLE('ARIMA ESTIMATE'),
        PLOTS[
            (
                TITLE ('ORIGINAL and ESTIMATED SERIES'),
                GRID(FORMAT('-')),
                TYPE('line'),
                SERIES[
                       (
                        ID(1),
                        FORMAT('r--')
                       ),
                       (
                        ID(2),
                        FORMAT('b-')
                       )
                     ],
                MARKER('o'),
                LEGEND('best'),
                XLABEL('X SeqNo'),
                YLABEL('Y Magnitude')
            )
        ]
    )
);

<p style = 'font-size:16px;font-family:Arial'>If you followed the instructions above, you should see a graph like the following:</p>
<img src="images/ARIMA_EST.png" alt="ARIMA Estimate" width="400" />
<p style = 'font-size:16px;font-family:Arial'>The red line indicates the actual sales, and the blue line indicates the estimated sales. This graph shows how well the ARIMA model has learned on the training dataset.</p>

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#E37C4D'><b>5.2 Forecast step</b></p>
<p style = 'font-size:20px;font-family:Arial'><b>Forecast step in SAS</b></p>
<p style = 'font-size:16px;font-family:Arial'>The second PROC ARIMA block identifies and estimates the same ARIMA model as the first block, and the FORECAST statement additionally generates forecasts for the next 30 time periods (lead=30), storing the forecasted values in the forecasted_sales dataset in the work library. If we run this, the log indicates it took around 2 seconds to fit the ARIMA model on the aggregated data. Once the small aggregated dataset is available to the SAS procedure, it executes relatively fast.</p>

<code>

<div class="alert alert-block alert-warning">  
<p style = 'font-size:18px;font-family:Arial;color:#000000'><b>Equivalent SAS Code</b>    
<p style = 'font-size:16px;color:#000000'> 
/* Forecasting future 30 values */
proc arima data=work.aggregated_data;
  identify var=total_sales(1) nlag=30;
  estimate q=1 outest=arima_params;
  forecast lead=30 out=forecasted_sales;
    run;</p>
    </div>    
</code>

<p style = 'font-size:20px;font-family:Arial'><b>Forecast Step in Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>The TD_ARIMAFORECAST function is used to forecast a user-defined number of periods based on models fitted from the TD_ARIMAESTIMATE function.</p>
<p style = 'font-size:16px;font-family:Arial'>In the next cell, we use the estimated model to forecast the sales for the subsequent 30 periods, i.e. the next 30 days.</p>

In [None]:
EXECUTE FUNCTION INTO VOLATILE ART(ARIMA_SlsFORECAST)
TD_ARIMAFORECAST(
           ART_SPEC(TABLE_NAME(ART_ESTSales)),
           FUNC_PARAMS(FORECAST_PERIODS(30)));

In [None]:
SELECT TOP 5 * FROM ARIMA_SlsFORECAST;

<p style = 'font-size:16px;font-family:Arial'>The above output shows the forecasted values for the next 30 days. Observe that it also includes lower and upper bounds at the 80% and 95% confidence levels.</p>
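<p style = 'font-size:16px;font-family:Arial'>To see how forecast uncertainty changes across the horizon, the width of the 80% interval can be computed per period. This is a small sketch using the ROW_I, FORECAST_VALUE, LO_80, and HI_80 columns that the plotting cells in this notebook also read; the alias width_80 is an illustrative name. For a differenced model, the interval would usually be expected to widen as the horizon grows.</p>

```sql
-- Width of the 80% interval for each forecast period
SELECT ROW_I,
       FORECAST_VALUE,
       LO_80,
       HI_80,
       HI_80 - LO_80 AS width_80   -- alias is illustrative
FROM ARIMA_SlsFORECAST
ORDER BY ROW_I;
```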

<hr>
<p style = 'font-size:18px;font-family:Arial'><b>Create table PLOT_FORECAST for plotting</b></p>
<p style = 'font-size:16px;font-family:Arial'>Here, we'll create a table which will be used to plot the forecasted sales in the next 30 days.</p>

In [None]:
CREATE TABLE PLOT_FORECAST (DatasetID VARCHAR(16), ROW_I BIGINT, FORECAST_MAGNITUDE FLOAT);

In [None]:
INSERT INTO PLOT_FORECAST   SELECT 'FORECASTED', ROW_I, FORECAST_VALUE FROM ARIMA_SlsFORECAST; 
INSERT INTO PLOT_FORECAST   SELECT 'UPPER_BOUND', ROW_I, HI_80 FROM ARIMA_SlsFORECAST ; 
INSERT INTO PLOT_FORECAST   SELECT 'LOWER_BOUND', ROW_I, LO_80 FROM ARIMA_SlsFORECAST ; 

In [None]:
SELECT * FROM PLOT_FORECAST ORDER BY ROW_I;

<p style = 'font-size:16px;font-family:Arial'>The <b>TD_PLOT</b> function will return an image in the cell of the results showing the Forecasted values by ARIMA model.</p>
<i>* Please <b> right click on the cell under the IMAGE column </b> from the output and choose view image to see the plot generated. </i>

In [None]:
EXECUTE FUNCTION
TD_Plot
(
    SERIES_SPEC(
        TABLE_NAME(PLOT_FORECAST),
        ROW_AXIS(SEQUENCE(ROW_I)),
        SERIES_ID(DataSetID),
        ID_SEQUENCE('[{"DatasetID":"FORECASTED"},{"DatasetID":"UPPER_BOUND"},{"DatasetID":"LOWER_BOUND"}]'),
        PAYLOAD(
            FIELDS(FORECAST_MAGNITUDE),
            CONTENT(REAL)
        )
    ),
    FUNC_PARAMS
    (
        WIDTH(1920),
        HEIGHT(1080),
        TITLE('ARIMA FORECAST'),
        PLOTS[
            (
                TITLE ('Forecast'),
                GRID(FORMAT('-')),
                TYPE('line'),
                SERIES[
                       (
                        ID(1),
                        FORMAT('r--')
                       ),
                       (
                        ID(2),
                        FORMAT('b-')
                       ),
                        (
                        ID(3),
                        FORMAT('b-')
                       )
                     ],
                MARKER('o'),
                LEGEND('best'),
                XLABEL('X SeqNo'),
                YLABEL('Y Magnitude')
            )
        ]
    )
);

<p style = 'font-size:16px;font-family:Arial'>If you followed the instructions above, you should see a graph like the following:</p>
<img id="fig6" src="images/ARIMA_FORECAST.png" alt="ARIMA Forecast" width="400" />
<p style = 'font-size:16px;font-family:Arial'>The red line is the forecasted sales for the next 30 days, and the blue lines are the upper and lower bounds of the 80% confidence interval. As seen in the original sales graph, sales dropped below 1000 in the latest period. The forecast shows a similar level, varying around 1000.</p>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Conclusion:</b></p>
<p style = 'font-size:16px;font-family:Arial'>
After fitting the ARIMA model on the Sales dataset, we observe that the model's fitted values closely align with the actual data. This indicates that the model has captured the underlying patterns and relationships within the dataset.
<br>
<br>
Moving large amounts of data between Teradata and SAS is usually the main culprit behind slow-running jobs and complex analytics pipelines, and it amplifies governance issues from orphaned and exposed data in SAS environments. By executing the complete flow inside Vantage using ClearScape Analytics, we reduce complexity and achieve greater efficiency.  </p>

<hr>
<b style = 'font-size:22px;font-family:Arial;color:#E37C4D'>6. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>Clean up work tables to prevent errors next time. This section drops all the tables created during the demonstration.</p>

In [None]:
DROP TABLE sales_aggregated_data_amt;

In [None]:
DROP TABLE sales_aggregated_data_qty;

In [None]:
DROP TABLE merged_sales_data;

In [None]:
DROP TABLE aggregated_sales_td;

In [None]:
DROP TABLE ART_ESTSales;

In [None]:
DROP TABLE AR_RESIDUALS_Sales;

In [None]:
DROP TABLE PLOT_ESTIMATE;

In [None]:
DROP TABLE ARIMA_SlsFORECAST;

In [None]:
DROP TABLE PLOT_FORECAST;

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
call remove_data('DEMO_SlsForecast_SAS');        -- Takes 5 seconds

<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>UAF(Unbounded Array Framework) Documentation: <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Unbounded-Array-Framework-Time-Series-Reference-17.20/Unbounded-Array-Framework'>https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-VantageTM-Unbounded-Array-Framework-Time-Series-Reference-17.20/Unbounded-Array-Framework</a></li>
</ul>

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">Copyright © Teradata Corporation - 2023. All Rights Reserved.</footer>