<a href="https://colab.research.google.com/github/daniel0076/EvaDBFinancialForecasting/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS6422 EvaDB Profiling on Financial Forecasting App

This project builds a profiling tool for EvaDB and demonstrates basic profiling results with a financial forecasting application built on S&P 500 stock data.

## Build a forecasting application with EvaDB

### Install dependencies

In [None]:
!apt-get install -y postgresql
!service postgresql start

### Create database and install EvaDB

In [None]:
!sudo -u postgres psql -c "CREATE USER eva WITH SUPERUSER PASSWORD 'password'"
!sudo -u postgres psql -c "CREATE DATABASE evadb"

CREATE ROLE
CREATE DATABASE


### Install EvaDB with the profiling module

This variant is based on EvaDB v0.3.7.

In [None]:
%pip install "evadb[postgres,forecasting] @ git+https://github.com/daniel0076/evadb.git@profiling"

Import EvaDB and create the database. The query below also prints the profiling results in its output. There are two segments: query parsing and query execution.

In [None]:
import evadb
cursor = evadb.connect().cursor()
params = {
    "user": "eva",
    "password": "password",
    "host": "localhost",
    "port": "5432",
    "database": "evadb",
}
query = f"CREATE DATABASE postgres_data WITH ENGINE = 'postgres', PARAMETERS = {params};"
cursor.query(query).df()

evadb.interfaces.relational.db.parse_query-start: 02:05:26.745491
evadb.interfaces.relational.db.parse_query-end:   02:05:26.756328, elapsed time(us): 0:00:00.010837
evadb.interfaces.relational.relation.execute-start: 02:05:26.757178
evadb.interfaces.relational.relation.execute-end:   02:05:26.847531, elapsed time(us): 0:00:00.090353


Unnamed: 0,0
0,The database postgres_data has been successful...
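
The profiling lines above can be post-processed to compare the two segments. The following is a minimal sketch, assuming the log format stays exactly as printed above; the value after `elapsed time(us):` appears to be an `H:MM:SS.ffffff` duration string, and `elapsed_by_stage` is a hypothetical helper, not part of EvaDB.

```python
import re
from datetime import timedelta

# Matches the "-end" lines printed by the profiling build, for example:
# evadb.interfaces.relational.db.parse_query-end:   02:05:26.756328, elapsed time(us): 0:00:00.010837
END_LINE = re.compile(
    r"^(?P<name>[\w.]+)-end:\s+\S+, elapsed time\(us\): "
    r"(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})(?:\.(?P<us>\d{6}))?$"
)

def elapsed_by_stage(log_text):
    """Return {stage: total elapsed timedelta} parsed from profiling output."""
    totals = {}
    for line in log_text.splitlines():
        match = END_LINE.match(line.strip())
        if match is None:
            continue  # skip "-start" lines, blank lines, and table output
        elapsed = timedelta(
            hours=int(match["h"]),
            minutes=int(match["m"]),
            seconds=int(match["s"]),
            microseconds=int(match["us"] or 0),
        )
        # Keep only the last dotted component: "parse_query" or "execute".
        stage = match["name"].rsplit(".", 1)[-1]
        totals[stage] = totals.get(stage, timedelta()) + elapsed
    return totals

# Profiling output copied from the cell above.
sample = """\
evadb.interfaces.relational.db.parse_query-start: 02:05:26.745491
evadb.interfaces.relational.db.parse_query-end:   02:05:26.756328, elapsed time(us): 0:00:00.010837
evadb.interfaces.relational.relation.execute-start: 02:05:26.757178
evadb.interfaces.relational.relation.execute-end:   02:05:26.847531, elapsed time(us): 0:00:00.090353
"""

totals = elapsed_by_stage(sample)
for stage, elapsed in totals.items():
    print(f"{stage}: {elapsed.total_seconds() * 1000:.3f} ms")
```

Summing per stage means that output containing several queries (as in later cells) yields one total for parsing and one for execution.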


### Download the dataset and perform data cleaning

We need to:

1. Remove all the null data
2. Remove duplicated rows

In [None]:
!mkdir -p data
!wget -qnc -O data/sp500.zip https://github.com/CNuge/kaggle-code/raw/master/stock_data/individual_stocks_5yr.zip
!wget -qnc -O data/merge.sh https://github.com/CNuge/kaggle-code/raw/master/stock_data/merge.sh
!cd data && unzip sp500.zip
!cd data && sh merge.sh

In [None]:
import pandas as pd
df = pd.read_csv("/content/data/all_stocks_5yr.csv")
df.dropna(inplace=True)  # Remove rows that contain null values
df.drop_duplicates(inplace=True)  # Remove duplicated rows
df.set_index('date', inplace=True)  # Use 'date' as the index so it is written as the first CSV column
df.to_csv("/content/data/stock_cleaned.csv")

### Create the table for the data in database

> Note: when using forecasting with EvaDB, which is based on [`statsforecast`](https://github.com/Nixtla/statsforecast), the `date` column needs to be `VARCHAR` rather than `DATE`; otherwise EvaDB reports an error that the `DATE` type is not supported.

Here we can also see the profiling results for query parsing and execution.

In [None]:
cursor.query("""
  USE postgres_data {
    CREATE TABLE sp500 (
      date VARCHAR(64) NOT NULL,
      open NUMERIC(10, 2) NOT NULL,
      high NUMERIC(10, 2) NOT NULL,
      low NUMERIC(10, 2) NOT NULL,
      close NUMERIC(10, 2) NOT NULL,
      volume INT NOT NULL,
      name VARCHAR(255) NOT NULL
    )
  }
""").df()

evadb.interfaces.relational.db.parse_query-start: 02:05:38.132445
evadb.interfaces.relational.db.parse_query-end:   02:05:38.136997, elapsed time(us): 0:00:00.004552
evadb.interfaces.relational.relation.execute-start: 02:05:38.137946
evadb.interfaces.relational.relation.execute-end:   02:05:38.183993, elapsed time(us): 0:00:00.046047


Unnamed: 0,status
0,success


### Load the cleaned data from the CSV into database and EvaDB

In [None]:
cursor.query("""
  USE postgres_data {
    COPY sp500(date, open, high, low, close, volume, name)
    FROM '/content/data/stock_cleaned.csv'
    DELIMITER ',' CSV HEADER
  }
""").df()

evadb.interfaces.relational.db.parse_query-start: 02:05:38.205572
evadb.interfaces.relational.db.parse_query-end:   02:05:38.206191, elapsed time(us): 0:00:00.000619
evadb.interfaces.relational.relation.execute-start: 02:05:38.206290
evadb.interfaces.relational.relation.execute-end:   02:05:40.274751, elapsed time(us): 0:00:02.068461


Unnamed: 0,status
0,success
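
Because PostgreSQL's `COPY ... CSV HEADER` skips the first line on input and assigns values to the listed columns by position, the column order in the file has to match the column list in the query. A quick check before loading could look like the sketch below; `header_matches` is a hypothetical helper, and the in-memory sample stands in for the real file (its data row is taken from the preview of `sp500`, while the header capitalization is an assumption).

```python
import csv
import io

# Column order used in the COPY statement above.
EXPECTED_COLUMNS = ["date", "open", "high", "low", "close", "volume", "name"]

def header_matches(csv_file, expected=EXPECTED_COLUMNS):
    """Return True if the CSV header lists the expected columns in order.

    The comparison ignores case and surrounding whitespace.
    """
    header = next(csv.reader(csv_file), [])
    return [col.strip().lower() for col in header] == list(expected)

# In-memory stand-in for the cleaned CSV (assumed header, one real data row).
sample = io.StringIO(
    "date,open,high,low,close,volume,Name\n"
    "2013-02-08,15.07,15.12,14.63,14.75,8407500,AAL\n"
)
ok = header_matches(sample)
print(ok)
```

In the notebook, the same check could be run on the real file with `header_matches(open("/content/data/stock_cleaned.csv", newline=""))`.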


> We can preview the data with SQL

In [None]:
cursor.query("SELECT * FROM postgres_data.sp500 LIMIT 3;").df()

evadb.interfaces.relational.db.parse_query-start: 02:05:40.301564
evadb.interfaces.relational.db.parse_query-end:   02:05:40.327032, elapsed time(us): 0:00:00.025468
evadb.interfaces.relational.relation.execute-start: 02:05:40.329191


evadb.interfaces.relational.relation.execute-end:   02:05:50.637330, elapsed time(us): 0:00:10.308139


Unnamed: 0,sp500.low,sp500.open,sp500.high,sp500.close,sp500.volume,sp500.date,sp500.name
0,14.63,15.07,15.12,14.75,8407500,2013-02-08,AAL
1,14.26,14.89,15.01,14.46,8882000,2013-02-11,AAL
2,14.1,14.45,14.51,14.27,8126000,2013-02-12,AAL


## Analyze data with EvaDB

Here we try to predict the closing price of each symbol. The forecasting function takes the following parameters:

+ `PREDICT`: The column to predict
+ `TIME`: The column for time series data
+ `ID`: The identifier to group data (for multiple time series)
+ `FREQUENCY`: The sampling frequency of the time series (`'D'` for daily)

In [None]:
cursor.query("DROP FUNCTION IF EXISTS stockForecast;").df()
cursor.query("""
  CREATE FUNCTION IF NOT EXISTS stockForecast FROM
    (
      SELECT name, date, close
      FROM postgres_data.sp500
    )
  TYPE Forecasting
  PREDICT 'close'
  TIME 'date'
  ID 'name'
  FREQUENCY 'D'
""").df()

evadb.interfaces.relational.db.parse_query-start: 02:05:50.661820
evadb.interfaces.relational.db.parse_query-end:   02:05:50.663823, elapsed time(us): 0:00:00.002003
evadb.interfaces.relational.relation.execute-start: 02:05:50.663974
evadb.interfaces.relational.relation.execute-end:   02:05:50.695408, elapsed time(us): 0:00:00.031434
evadb.interfaces.relational.db.parse_query-start: 02:05:50.695814
evadb.interfaces.relational.db.parse_query-end:   02:05:50.705069, elapsed time(us): 0:00:00.009255
evadb.interfaces.relational.relation.execute-start: 02:05:50.706575


evadb.interfaces.relational.relation.execute-end:   02:09:51.439331, elapsed time(us): 0:04:00.732756


Unnamed: 0,0
0,Function stockForecast added to the database.
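
To make the roles of `ID`, `TIME`, and `PREDICT` in the query above concrete, the sketch below splits a flat table into one time series per identifier. It is a conceptual illustration only and makes no claim about how EvaDB implements this internally. The `AAL` rows come from the preview of `sp500`; the `XYZ` rows are made up for illustration.

```python
from collections import defaultdict

# (name, date, close) tuples, i.e. the ID, TIME, and PREDICT columns.
rows = [
    ("AAL", "2013-02-11", 14.46),
    ("XYZ", "2013-02-08", 100.00),  # hypothetical symbol and values
    ("AAL", "2013-02-08", 14.75),
    ("XYZ", "2013-02-11", 101.50),  # hypothetical symbol and values
    ("AAL", "2013-02-12", 14.27),
]

def split_into_series(rows):
    """Group rows by the ID column and order each group by the TIME column."""
    series = defaultdict(list)
    for name, date, close in rows:
        series[name].append((date, close))
    # ISO-8601 date strings sort chronologically as plain strings.
    return {name: sorted(points) for name, points in series.items()}

series = split_into_series(rows)
for name, points in series.items():
    print(name, points)
```

Each resulting series is then a candidate for its own forecast, which is why a single function can answer queries for any symbol in the table.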


### Use the model to predict a symbol

For example, to forecast the closing price of `NVDA` for the next 5 days:

In [None]:
cursor.query("SELECT * FROM (SELECT stockForecast(5)) AS S WHERE name = 'NVDA' ORDER BY date ;").df()

evadb.interfaces.relational.db.parse_query-start: 02:09:51.465943
evadb.interfaces.relational.db.parse_query-end:   02:09:51.509344, elapsed time(us): 0:00:00.043401
evadb.interfaces.relational.relation.execute-start: 02:09:51.509597
evadb.interfaces.relational.relation.execute-end:   02:09:53.479998, elapsed time(us): 0:00:01.970401


Unnamed: 0,S.name,S.date,S.close
0,NVDA,2018-02-08,229.091492
1,NVDA,2018-02-09,229.518341
2,NVDA,2018-02-10,229.938919
3,NVDA,2018-02-11,230.359802
4,NVDA,2018-02-12,230.78067
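
The forecast dates above are consecutive calendar days, which suggests that with `FREQUENCY 'D'` the horizon is counted in calendar days rather than trading days. The sketch below flags the forecast dates that fall on a weekend; `weekend_dates` is a hypothetical helper, and whether such rows should be dropped depends on the application.

```python
from datetime import date

# Dates copied from the forecast output above.
forecast_dates = [
    "2018-02-08",
    "2018-02-09",
    "2018-02-10",
    "2018-02-11",
    "2018-02-12",
]

def weekend_dates(iso_dates):
    """Return the ISO dates that fall on a Saturday or Sunday."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return [d for d in iso_dates if date.fromisoformat(d).weekday() >= 5]

weekends = weekend_dates(forecast_dates)
print(weekends)  # → ['2018-02-10', '2018-02-11']
```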
