# TE_Connect Sales Forecasting

Author: Alexiy Buynitsky

### Configuration

Run through the following commands, then adjust the constants files as needed:
- `src/data_constants.py`
  - Controls data inputs, outputs, prediction, forecasting...
- `src/model_constants.py`
  - Controls model architecture, LSTM cells, hidden layers, dropout...
- `src/filepaths_constants.py`
  - Controls file system paths, verbose mode...

After that, you only have to rerun the last cell (`dvc repro`) and use the [MLflow UI](https://dagshub.com/Abuynits/TE_Connect.mlflow) to track your experiments!



## Imports

In [1]:
#!pip3 install -r requirements.txt &> /dev/null
!pip3 install pytorch_lightning &> /dev/null
# install all libraries
!pip install dvc &> /dev/null
!pip install dagshub &> /dev/null
!pip3 install mlflow &> /dev/null

In [3]:
import requests
import getpass
import datetime

In [4]:
#@title Set Environment Variables

#@markdown Enter the repository name for the project:
REPO_NAME= "TE_Connect" #@param {type:"string"}

#@markdown Enter the username of your DAGsHub account:
USER_NAME = "Abuynits" #@param {type:"string"}

#@markdown Enter the email for your DAGsHub account:
EMAIL = "abuynits@gmail.com" #@param {type:"string"}

#@markdown ---

In [5]:
# create a personal access token
r = requests.post('https://dagshub.com/api/v1/user/tokens', 
                  json={"name": f"colab-token-{datetime.datetime.now()}"}, 
                  auth=(USER_NAME, getpass.getpass('DAGsHub password:')))
r.raise_for_status()
TOKEN=r.json()['sha1']

DAGsHub password:··········


## Generating an SSH Key for GitHub

In [6]:
!ssh-keygen -t rsa -b 4096

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:RBBq6okish2EF0AjlkFy5WMXBs29sFvJcEVruoFGF5k root@07def59b38d8
The key's randomart image is:
+---[RSA 4096]----+
|**+.o+=+o=o      |
|++.. o=oE. .     |
|  . * o*ooo      |
| . = +.+=o       |
|. +   ooS        |
| = . ..  o       |
|= +     .        |
|+o .             |
|. .              |
+----[SHA256]-----+


Press Enter at every prompt to accept the default file location and an empty passphrase.

**NOTE**: the key is deleted at the end of the notebook session.
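
If answering the prompts by hand becomes tedious, the same key can likely be generated without interaction. The sketch below is an assumption, not part of the original notebook: `keygen_command` and `generate_key` are hypothetical helper names, and the `-N`, `-f`, and `-q` flags are the standard `ssh-keygen` options for passphrase, output file, and quiet mode.

```python
import shlex
import subprocess
from pathlib import Path


def keygen_command(key_path, bits=4096, passphrase=""):
    """Build a non-interactive ssh-keygen invocation as an argument list."""
    return [
        "ssh-keygen",
        "-t", "rsa",
        "-b", str(bits),
        "-N", passphrase,      # empty string means no passphrase
        "-f", str(key_path),   # output file, so no prompt for the location
        "-q",                  # suppress the fingerprint and randomart output
    ]


def generate_key(key_path):
    """Create the parent directory if needed, then run ssh-keygen."""
    key_path = Path(key_path)
    key_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    subprocess.run(keygen_command(key_path), check=True)


# Show the command that would run; call generate_key(...) to execute it.
print(shlex.join(keygen_command("/root/.ssh/id_rsa")))
# prints: ssh-keygen -t rsa -b 4096 -N '' -f /root/.ssh/id_rsa -q
```

Passing the arguments as a list avoids shell quoting problems with the empty passphrase.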

In [7]:
!ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts



# github.com:22 SSH-2.0-babeld-2ef5a5a9


Print the public key and add it to your GitHub SSH keys. The key is only valid for this notebook session.

In [8]:
!cat /root/.ssh/id_rsa.pub

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDjiWIAKUpx0X2ojjK7ApzTVBqmlSDXAbwHSXwwdzyuY15I8voEbH4NDNlqdCqGOpw4pOYXv4pi2F3sN55tOoTTijeO3Gc//8pvIrTM3rzF4uYYfiNtoFZsHiRwBeT4lagVcabOCJFh9iuI1EUi6HXwsVWSir/jEyvS4tm1GdkNNJu5FzbKSCXB+Q2pghBXeNgrmtPkXX+sq4noypNyZqtYT+u94tZA1LoP9rPyPUmDnn1RQlNQfCFzS+zgpCIeBcMrr6GCV64nfYS701p8bBAr03wPDd3K0E1/SePce6W9v/481F+0O3u/01ytAzD0pHb+x5a//CVnZfKaNjwhlnmJAZCfupjjuRS9iZoKt3DBHz3xUl0gL22Nv+JxsyzSOyTligSq/ohThNXvtgWATcSsLabbO4KHRJ9CbzIqTzCKVo1G5d7eb7a9q+otyQDIYxu6Olt3Rhjoyklaiqh/bLHanlWNWrK6dy4Ou499zrDlIGRwjMOlW8lGdn0zh0bRa+S9sTZg8ofOkWDKgO+ycJDPxZH0+o94sAKzH6iS9Qzq7jJaCv1q+ioTDOCqkxRsunAhgoB42FOWoufL4JGu5mOpLS1ed8aQ7wxBF9mfzf1FiWLPYhJ7EI9StTprCnMM/g7T+AiLYiLIc02EQeGzdmqJWxwktBcVsQQYKJT6R5YlXw== root@07def59b38d8
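
Before pasting the printed key into GitHub, it can help to confirm that the line was copied intact. The following is a minimal sketch, assuming the usual OpenSSH public key layout of `<type> <base64 blob> [comment]`, where the decoded blob begins with a 4-byte big-endian length followed by the key type name. `check_public_key` is a hypothetical helper, not something the repository provides.

```python
import base64
import struct


def check_public_key(line):
    """Return (key_type, comment) if the line looks like an OpenSSH public key.

    Raises ValueError when the line is truncated or otherwise malformed.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, blob_b64 = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""

    # validate=True rejects characters outside the base64 alphabet.
    blob = base64.b64decode(blob_b64, validate=True)
    if len(blob) < 4:
        raise ValueError("key blob is too short")

    # The blob starts with a length-prefixed copy of the key type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded != key_type:
        raise ValueError(f"type mismatch: {key_type!r} vs {embedded!r}")
    return key_type, comment
```

A typical use would be `check_public_key(open("/root/.ssh/id_rsa.pub").read())`, which raises if a partial copy dropped part of the base64 payload's padding or header.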


In [9]:
!ssh -T git@github.com # test the ssh id  

Hi Abuynits! You've successfully authenticated, but GitHub does not provide shell access.


In [10]:
# configure the git identity (email and username)
!git config --global user.email {EMAIL}
!git config --global user.name {USER_NAME}

In [11]:
!git clone git@github.com:Abuynits/TE_Connect.git

Cloning into 'TE_Connect'...
remote: Enumerating objects: 660, done.
remote: Counting objects: 100% (169/169), done.
remote: Compressing objects: 100% (125/125), done.
remote: Total 660 (delta 110), reused 97 (delta 43), pack-reused 491
Receiving objects: 100% (660/660), 63.54 MiB | 7.83 MiB/s, done.
Resolving deltas: 100% (347/347), done.


In [12]:
%cd TE_Connect

/content/TE_Connect


## Initialize DVC!

In [13]:
!dvc remote modify origin --local auth basic 
!dvc remote modify origin --local user {USER_NAME} 
!dvc remote modify origin --local password {TOKEN}

[0m[0m[0m

In [14]:
!dvc pull --remote origin

name: data/dicts/validation_dict.pkl, md5: a4f898c0c6449dc10167c14c77e09d89
name: data/model_io/test_y.pkl, md5: d25cac3447bfdead313dea1377ceb717
name: data/model_io/validation_x.pkl, md5: f98d83ac13933adac7eb68a2156d56e6
name: data/model_io/validation_y.pkl, md5: c2e645e1cbd1441ddecc1a20859850e1
name: data/model_io/train_x.pkl, md5: df34654782d9068b7a7161247be93f6d
name: data/dicts/train_dict.pkl, md5: 27011e6457f33460d8d3e90b14aa815c
name: data/model_io/test_x.pkl, md5: b6c0e7709bba19e0985423b586a9a080
name: data/dicts/test_dict.pkl, md5: 509e355771ac85cc42a1445fb07f6876
name: data/model_io/train_y.pkl, md5: c5c17ba4762d3460fe794f03a4dd4bda

## Testing

Run the following cells after altering any of the constants files. Track the progress live in the MLflow UI.
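
The MLflow credentials can also be set once from Python instead of being repeated inline on every `dvc repro` call. This is a sketch under assumptions: `mlflow_env` is a hypothetical helper, the URI pattern is inferred from the tracking URL used in this notebook, and it relies on `!` shell commands inheriting the kernel's `os.environ`, which is the usual behavior in Colab and Jupyter.

```python
import os


def mlflow_env(user_name, token, repo_name, repo_owner=None):
    """Build the environment variables MLflow reads for remote tracking."""
    owner = repo_owner or user_name
    return {
        "MLFLOW_TRACKING_URI": f"https://dagshub.com/{owner}/{repo_name}.mlflow",
        "MLFLOW_TRACKING_USERNAME": user_name,
        "MLFLOW_TRACKING_PASSWORD": token,
    }


# Placeholder values; in the notebook these would be USER_NAME, TOKEN, REPO_NAME.
env = mlflow_env("Abuynits", "<token>", "TE_Connect")
print(env["MLFLOW_TRACKING_URI"])
# prints: https://dagshub.com/Abuynits/TE_Connect.mlflow

# To apply for the rest of the session:
# os.environ.update(env)
```

After `os.environ.update(env)`, a plain `!dvc repro` should log to the same MLflow server without the inline variable assignments.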

In [15]:
!pip3 install pandas==1.4.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pandas==1.4.1
  Downloading pandas-1.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m11.7/11.7 MB[0m [31m23.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.3.5
    Uninstalling pandas-1.3.5:
      Successfully uninstalled pandas-1.3.5
Successfully installed pandas-1.4.1


In [16]:
!python3 src/data_prep.py

Traceback (most recent call last):
  File "src/data_prep.py", line 1, in <module>
    from visualization import *
  File "/content/TE_Connect/src/visualization.py", line 1, in <module>
    from model_constants import *
  File "/content/TE_Connect/src/model_constants.py", line 1, in <module>
    from data_constants import *
  File "/content/TE_Connect/src/data_constants.py", line 1, in <module>
    from filepaths_constants import *
  File "/content/TE_Connect/src/filepaths_constants.py", line 7, in <module>
    import torch
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 218, in <module>
    from torch._C import *  # noqa: F403
RuntimeError: KeyboardInterrupt: <EMPTY MESSAGE>


In [26]:
!dvc repro

If DVC froze, see `hardlink_lock` in <https://man.dvc.org/config#core>
'data/te_ai_cup_sales_forecasting_data.csv.dvc' didn't change, skipping
Stage 'data_prep' didn't change, skipping
Running stage 'train_model':
> python3 src/train.py
reading data from files..
creating datasets...
batches in train dl: 58
batches in valid dl: 17
batches in test dl: 6
 Input seq:
 tensor([[0.0575, 0.0418, 0.5470],
        [0.2281, 0.2498, 0.2887],
        [0.3546, 0.3473, 0.3743],
        [0.3596, 0.3664, 0.3429],
        [0.1309, 0.1490, 0.2641],
        [0.2177, 0.1982, 0.4298],
        [0.0442, 0.0828, 0.0405],
        [0.4069, 0.4995, 0.2079],
        [0.3057, 0.3958, 0.1751],
        [0.5903, 0.7651, 0.1708]], device='cuda:0')
 target seq:
 tensor([[5.9034e-01, 7.6510e-01, 1.7078e-01],
        [4.9964e-02, 5.5491e-02, 2.8533e-01],
        [1.5955e-02, 6.2382e-04, 8.4930e-01],
        [7.9892e-02, 8.7558e-02, 2.9076

In [27]:
!MLFLOW_TRACKING_URI=https://dagshub.com/Abuynits/TE_Connect.mlflow \
MLFLOW_TRACKING_USERNAME={USER_NAME} \
MLFLOW_TRACKING_PASSWORD={TOKEN}  \
dvc repro

If DVC froze, see `hardlink_lock` in <https://man.dvc.org/config#core>
'data/te_ai_cup_sales_forecasting_data.csv.dvc' didn't change, skipping
Stage 'data_prep' didn't change, skipping
Running stage 'train_model':
> python3 src/train.py
reading data from files..
creating datasets...
batches in train dl: 58
batches in valid dl: 17
batches in test dl: 6
 Input seq:
 tensor([[0.0912, 0.0529, 0.4103],
        [0.2691, 0.2661, 0.1836],
        [0.2525, 0.2508, 0.1836],
        [0.2691, 0.2661, 0.1836],
        [0.4191, 0.4038, 0.1836],
        [0.1162, 0.1454, 0.1298],
        [0.1363, 0.1655, 0.1298],
        [0.1363, 0.1655, 0.1298],
        [0.1463, 0.1755, 0.1298],
        [0.1787, 0.2675, 0.0325]], device='cuda:0')
 target seq:
 tensor([[0.1787, 0.2675, 0.0325],
        [0.1941, 0.2861, 0.0326],
        [0.1787, 0.2675, 0.0325],
        [0.1787, 0.2675, 0.0325],
        [0.2249, 0.3233, 0.0326]], device