# SOAM FLOW RUN QUICKSTART

This notebook is a quickstart that shows how to connect to a database, extract data, transform it, generate a forecast, plot it, and send a mail report, using soam modules and methods combined in a few simple steps inside a soam flow.

To see these modules and methods used separately and explained in more detail, check the [quickstart notebook](notebook/examples/quickstart.ipynb).

Soam flows are based on Prefect flows. For more information on these, see the [Prefect docs](https://docs.prefect.io/core/concepts/flows.html).

In [3]:
from soam.workflow.time_series_extractor import TimeSeriesExtractor
from muttlib.dbconn import get_client
import pandas as pd
from soam.workflow import Transformer
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from soam.workflow.forecaster import Forecaster
from soam.models import SkProphet
from soam.utilities.utils import add_future_dates
import matplotlib.pyplot as plt
from soam.reporting import mail_report
import datetime
from soam.core import SoamFlow
from prefect import task
from soam.plotting.forecast_plotter import ForecastPlotterTask
from soam.constants import PLOT_CONFIG
from copy import deepcopy
from pathlib import Path

## Extraction
DB Connection using `muttlib`. <br>
`SQL Query` arguments definition. <br>
`SOAM Extractor` object initialization.

In [4]:
pg_cfg = {
    "host": "localhost",
    "port": 5432,
    "db_type": "postgres",
    "username": "mutt",
    "password": "mutt",
    "database": "sqlalchemy"
}
pg_client = get_client(pg_cfg)[1]

In [5]:
build_query_kwargs={
    'columns': '*',
    'timestamp_col': 'date',
    'start_date': "2021-03-01",
    'end_date': "2021-03-20",
    'extra_where_conditions': ["symbol = 'AAPL'"],
    'order_by': ["date ASC"]
}

In [6]:
extractor = TimeSeriesExtractor(db=pg_client, table_name='stocks_valuation')
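
To make the role of `build_query_kwargs` more concrete, the following is a minimal sketch of how such arguments could be assembled into a SQL string. The helper `build_query_sketch` is hypothetical and is not soam's implementation; in particular, the inclusive `>=` / `<=` date comparisons are an assumption, and real code should prefer parameterized queries over string interpolation.

```python
from typing import Any, Dict


def build_query_sketch(table_name: str, kwargs: Dict[str, Any]) -> str:
    """Hypothetical helper: assemble a SELECT statement from extractor-style kwargs."""
    ts_col = kwargs["timestamp_col"]
    # Assumption: both date bounds are inclusive.
    conditions = [
        f"{ts_col} >= '{kwargs['start_date']}'",
        f"{ts_col} <= '{kwargs['end_date']}'",
    ]
    conditions.extend(kwargs.get("extra_where_conditions", []))

    query = f"SELECT {kwargs['columns']} FROM {table_name}"
    query += " WHERE " + " AND ".join(conditions)

    order_by = kwargs.get("order_by", [])
    if order_by:
        query += " ORDER BY " + ", ".join(order_by)
    return query


# Same values as the kwargs used in this notebook.
example_kwargs = {
    "columns": "*",
    "timestamp_col": "date",
    "start_date": "2021-03-01",
    "end_date": "2021-03-20",
    "extra_where_conditions": ["symbol = 'AAPL'"],
    "order_by": ["date ASC"],
}
sql = build_query_sketch("stocks_valuation", example_kwargs)
print(sql)
```

The printed statement is only meant to show which rows the extraction is expected to return: AAPL rows in the given date window, ordered by date.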

## Preprocessing
`SOAM Transformer` object initialization. <br>
Tasks built from functions that apply custom transformations.

In [7]:
scaler = MinMaxScaler()
ts = Transformer(transformer = scaler)

@task()
def transform_df_for_scaler(df: pd.DataFrame):
    # MinMaxScaler expects a 2-D array, so return avg_price as an (n, 1) column.
    return df[['avg_price']].to_numpy()

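@task()
def transform_df_format(df: pd.DataFrame):
    # Keep only the columns Prophet needs and rename them to its `ds`/`y` convention.
    # Renaming a new frame (instead of inplace on a slice) avoids SettingWithCopyWarning.
    df = df[['date', 'avg_price']].rename(columns={'date': 'ds', 'avg_price': 'y'})
    df['ds'] = pd.to_datetime(df['ds'])
    # Append 7 future daily dates to be forecast.
    df = add_future_dates(df, periods=7, frequency="d")
    return df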
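
To make the scaling step concrete, here is a minimal sketch in plain Python of the arithmetic that `MinMaxScaler` applies with its default `(0, 1)` feature range: each value becomes `(x - min) / (max - min)`. The helper names and the prices are made up for illustration; this is neither scikit-learn's nor soam's code.

```python
from typing import List


def to_column(values: List[float]) -> List[List[float]]:
    """Turn a flat list into an (n, 1) column, the 2-D layout a scaler expects."""
    return [[v] for v in values]


def min_max_scale(column: List[List[float]]) -> List[List[float]]:
    """Rescale a single column to the [0, 1] range."""
    lo = min(row[0] for row in column)
    hi = max(row[0] for row in column)
    span = hi - lo
    if span == 0:
        # A constant column carries no spread, so map every value to 0.0.
        return [[0.0] for _ in column]
    return [[(row[0] - lo) / span] for row in column]


prices = [120.0, 125.0, 130.0, 122.5]  # made-up average prices
scaled = min_max_scale(to_column(prices))
print(scaled)  # [[0.0], [0.5], [1.0], [0.25]]
```

The smallest price maps to 0, the largest to 1, and everything else falls proportionally in between.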

## Forecasting
Forecasting model selected: `FBProphet`. <br>
`SOAM Forecaster` object initialization.

In [8]:
my_model = SkProphet(weekly_seasonality=False, daily_seasonality=False)
forecaster = Forecaster(my_model, output_length=7)
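
The `output_length=7` above matches the 7 daily periods appended with `add_future_dates` in the preprocessing step. As a rough sketch of what that horizon looks like, the hypothetical helper below lists the days that follow a last observed date. That the future dates start on the day right after the last observation is an assumption, not something confirmed from soam's code, and the last observed date used here is taken from the plot `end_date` in this notebook.

```python
from datetime import date, timedelta
from typing import List


def future_dates_sketch(last_observed: date, periods: int) -> List[date]:
    """Hypothetical helper: the `periods` consecutive days after `last_observed`."""
    return [last_observed + timedelta(days=i) for i in range(1, periods + 1)]


horizon = future_dates_sketch(date(2021, 3, 19), periods=7)
for day in horizon:
    print(day.isoformat())
```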

## Postprocessing
Postprocessing tasks based on functions for custom transformations.

## Plotting and Reporting
Plotting task based on the `SOAM Forecast Plotter` object.

In [9]:
plot_config = deepcopy(PLOT_CONFIG)
forecast_plotter = ForecastPlotterTask(path = Path('img/applestockprice'), metric_name = 'Retail Sales', plot_config = plot_config)

`SOAM Mail Report` object initialization.

In [10]:
mr = mail_report.MailReportTask(
    # recipients mails separated by commas
    mail_recipients_list = ["scafatieugenio@gmail.com"],
    # the metric name will be in the title
    metric_name = "Stocks Forecast" 
)

# These are the start and end of the historical values. They are needed because they are part of the plot filename. Format: yyyymmddhh.
start_date='2021030100'
end_date='2021031900'
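
As a small sketch of the `yyyymmddhh` format, the strings above can be derived from `datetime` objects instead of being typed by hand. The helper `plot_filename_sketch` is hypothetical; the filename pattern is copied from the `plot_filename` passed to the mail report in this notebook, and whether the plotter always writes that exact name is an assumption.

```python
from datetime import datetime


def plot_filename_sketch(
    start: datetime, end: datetime, folder: str = "img/applestockprice"
) -> str:
    """Hypothetical helper: build the plot path from the historical date range."""
    fmt = "%Y%m%d%H"  # yyyymmddhh
    return f"{folder}/0_forecast_{start.strftime(fmt)}_{end.strftime(fmt)}_.png"


name = plot_filename_sketch(datetime(2021, 3, 1), datetime(2021, 3, 19))
print(name)  # img/applestockprice/0_forecast_2021030100_2021031900_.png
```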

# SoaMFlow

Putting it all together using `SoamFlow`.

In [11]:
with SoamFlow(name = "t") as t:
    # EXTRACTION
    df = extractor(build_query_kwargs)
    # PRE PROCESSING
    data = transform_df_for_scaler(df = df)
    df.avg_price = ts(data)[0]
    df = transform_df_format(df = df)
    # FORECASTING
    predictions, time_series, model = forecaster(time_series=df)
    # PLOTTING
    forecast_plotter(forecaster.time_series, forecaster.prediction)
    # REPORTING
    mr(
        current_date = "2021-04-12",
        plot_filename = f'img/applestockprice/0_forecast_{start_date}_{end_date}_.png'
    )

In [12]:
t.run()

[2021-04-12 12:55:46-0300] INFO - prefect.FlowRunner | Beginning Flow run for 't'
[2021-04-12 12:55:46-0300] INFO - prefect.TaskRunner | Task 'ForecastPlotterTask': Starting task run...
[2021-04-12 12:55:46-0300] ERROR - prefect.TaskRunner | Unexpected error: KeyError('ds')
Traceback (most recent call last):
  File "/home/scafati98/MUTT/soam/quickstart_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2889, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ds'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
