# TE_Connect Sales Forecasting

Author: Alexiy Buynitsky

### Configuration

Run through the following commands, then edit:
- `src/data_constants.py`
  - Controls data inputs, outputs, prediction, forecasting...
- `src/model_constants.py`
  - Controls model architecture, LSTM cells, hidden layers, dropout...
- `src/filepath_constants.py`
  - Controls the file system layout, verbose mode...

You then only have to run the last command and use the [MLflow UI](https://dagshub.com/Abuynits/TE_Connect.mlflow) to track your experiments!



## Imports

In [1]:
# install all libraries (output is silenced)
#!pip3 install -r requirements.txt &> /dev/null
!pip3 install pytorch_lightning &> /dev/null
!pip install dvc &> /dev/null
!pip install dagshub &> /dev/null
!pip3 install mlflow &> /dev/null
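Every install above redirects its output to `/dev/null`, so a failed install goes unnoticed until a later import breaks. A minimal sketch of a check that could run right after; it assumes the importable module names are `pytorch_lightning`, `dvc`, `dagshub`, and `mlflow`, which is not guaranteed in general (pip names and import names can differ). `missing_packages` is a hypothetical helper.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be located in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed above.
REQUIRED = ["pytorch_lightning", "dvc", "dagshub", "mlflow"]

missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```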

In [2]:
import requests
import getpass
import datetime

In [3]:
#@title Set Environment Variables

#@markdown Enter the repository name for the project:
REPO_NAME= "TE_Connect" #@param {type:"string"}

#@markdown Enter the username of your DAGsHub account:
USER_NAME = "Abuynits" #@param {type:"string"}

#@markdown Enter the email for your DAGsHub account:
EMAIL = "abuynits@gmail.com" #@param {type:"string"}

#@markdown ---

In [4]:
# create a personal access token
r = requests.post('https://dagshub.com/api/v1/user/tokens', 
                  json={"name": f"colab-token-{datetime.datetime.now()}"}, 
                  auth=(USER_NAME, getpass.getpass('DAGsHub password:')))
r.raise_for_status()
TOKEN=r.json()['sha1']

DAGsHub password:··········


## Generating an SSH Key for GitHub

In [5]:
!ssh-keygen -t rsa -b 4096

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:YbpVKAByx5L3YD9RGsdAi3UJg1bof4FlED9H6lD+iIA root@30e8bc8fe902
The key's randomart image is:
+---[RSA 4096]----+
|. o+o=@B=..      |
| oooO+.@=+       |
|  E*o+==O o      |
|    ..=B.B       |
|     .o.S..      |
|      .o.        |
|      ..         |
|                 |
|                 |
+----[SHA256]-----+


Press Enter at every prompt to accept the default file location and an empty passphrase.

**NOTE**: the key is deleted when the notebook session ends.
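The prompts can also be skipped entirely. A sketch, assuming you want the same defaults that pressing Enter gives (RSA, 4096 bits, `~/.ssh/id_rsa`, empty passphrase); `ssh_keygen_command` is a hypothetical helper, while the flags are standard OpenSSH `ssh-keygen` options.

```python
import os
import shlex


def ssh_keygen_command(key_path="~/.ssh/id_rsa", bits=4096):
    """Build an ssh-keygen invocation that needs no interactive input (hypothetical helper)."""
    path = os.path.expanduser(key_path)
    # -t/-b: key type and size, -N "": empty passphrase, -f: output file, -q: quiet
    return ["ssh-keygen", "-t", "rsa", "-b", str(bits), "-N", "", "-f", path, "-q"]


print(shlex.join(ssh_keygen_command()))
```

In a cell this corresponds to `!ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa -q`; passing the list to `subprocess.run(..., check=True)` runs it from Python. If the key file already exists, `ssh-keygen` still asks before overwriting it.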

In [6]:
!ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts

# github.com:22 SSH-2.0-babeld-dd5cf4e7


Print the public key and add it to your GitHub SSH keys. The key is usable only during this notebook session.

In [7]:
!cat /root/.ssh/id_rsa.pub

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCdHb0sSdYfA4iM6N33buQTXS0LdZV+nfzThkansaQQAQhFMhGe1NDnNgOdUlSkJ7QBwG2z0Sdi+Jbh6efvKfk3ztyfLo6UluJa0vEBvNjbDwrVrN6yVroYZ2N2F/fAcqqWDEOM+6u2PHpmF/v5vf+oNvaolsRWDm+5qRX2buBtoObsc41IY3g6ExuVgJj2omp0zffFTpfzABfe6gc/P0CQp23k0yP2Z4PB470bPWmkvHUOcobWEfSWqMpNTI/4FTa1SjfMgA+tpbKFmJLNtMJbIwT8chQE04Gxr7eXQ7mn9gJ4BOFWgw9D8+WC5tcgTSkekOjkLeNFkdY9+qcTNFPJqYJ6LavlJI1q3opcdHpyB+7Ekm2vN05/FlFS5EgyM5Pjp8e5jWGwcUT12hRxs1wKMG2154ZfxQiYll/mKNTlccdEBGXtXOmtUO4TmhFmYZ9WWO3VMJ+In0GdSOVSHRHeqNIcsc+25PnT7T/CcJCxD7xVcsPOdqzxT78QeoPPV16WunLq2407TjhdU+hYUgYuRUZWM78Dn+X6xuPlI+Ty5Q0dQYct2oKTqbtFoOerE3kYwGCMRckxpUMG+bfiZkKdTrAxpmpFawYh0LNIi6DvnIOh+4VQP0iMKYy3Fi7NOKBSkF22oocZS9JmrJJEDiNpfuvlCwxgOMMp9sxjcondSw== root@30e8bc8fe902


In [8]:
!ssh -T git@github.com # test the ssh id  

Hi Abuynits! You've successfully authenticated, but GitHub does not provide shell access.


In [9]:
# configure the git identity used for commits
!git config --global user.email {EMAIL}
!git config --global user.name {USER_NAME}
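The same identity setup can be done from Python. A sketch with hypothetical helpers `git_identity_commands` and `configure_git`; passing arguments as a list rather than a shell string avoids quoting problems if a name contains spaces.

```python
import subprocess


def git_identity_commands(email, name):
    """Build the commands that set the global git author identity."""
    return [
        ["git", "config", "--global", "user.email", email],
        ["git", "config", "--global", "user.name", name],
    ]


def configure_git(email, name, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in git_identity_commands(email, name):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

`configure_git(EMAIL, USER_NAME, dry_run=False)` would apply the settings using the variables from the *Set Environment Variables* cell.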

In [10]:
!git clone git@github.com:Abuynits/TE_Connect.git

Cloning into 'TE_Connect'...
remote: Enumerating objects: 902, done.[K
remote: Counting objects: 100% (411/411), done.[K
remote: Compressing objects: 100% (287/287), done.[K
remote: Total 902 (delta 245), reused 269 (delta 106), pack-reused 491[K
Receiving objects: 100% (902/902), 70.46 MiB | 21.11 MiB/s, done.
Resolving deltas: 100% (482/482), done.


In [11]:
%cd TE_Connect

/content/TE_Connect


## Initialize DVC!

In [12]:
!dvc remote modify origin --local auth basic 
!dvc remote modify origin --local user {USER_NAME} 
!dvc remote modify origin --local password {TOKEN}

[0m[0m[0m

In [13]:
!dvc pull --remote origin

name: None, md5: aa99ef38edd4fb7d355faf4a997f4510.dir
name: data/model_io/train_y.pkl, md5: 9dcdfc9d75d0137c503338f12dd7be07
name: data/model_io/test_x.pkl, md5: 560b74e616b8e74c809f2c9ef5f38465
name: data/model_io/train_x.pkl, md5: 841614f8f2799b86bccb66578bf699f8
name: data/model/model.pkl, md5: 703bd0c6a2a883e5f70f39b855db3889
name: data/dicts/validation_dict.pkl, md5: 2c0ffdd4915078a1d937e181b91e5f8a
name: data/model_io/validation_y.pkl, md5: 7f5487103b279395e9969b0bfc1ab034
name: data/dicts/test_dict.pkl, md5: 6d8f1e486c373bcd8934ef4d4b24e3a2
name: data/model_io/validation_x.pkl, md5: 7f4b40f15e1c4bf95a096fe4e0f39c9b
name: data/dicts/train_dict.pkl, md5: 60c1bc509f5c7b43ab25235679caa051
name: data/model_io/validation_target.pkl, md5: def246e4c95aa49b1b57dd10e11e89eb
name: data/model_io/test_target.pkl, md5: cdc3646d3f3681cea6fb9fa8ea420960
name: data/model_io/train_target.pkl, md5: 248396a4e4b2e35663f8f666dea6763f
name: data/model_io/test_y.pkl, md5: 641162d9c415f9c44f1c8a11a4100d
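The three `dvc remote modify` calls in this section can be wrapped the same way. A sketch with hypothetical helper names; it relies on the fact that `--local` writes to `.dvc/config.local`, which DVC keeps out of git, so the token does not end up in a commit.

```python
import subprocess


def dvc_auth_commands(remote, user, token):
    """Build the `dvc remote modify --local` commands for basic auth (hypothetical helper)."""
    base = ["dvc", "remote", "modify", remote, "--local"]
    return [
        base + ["auth", "basic"],
        base + ["user", user],
        # --local keeps the token in .dvc/config.local, which is git-ignored
        base + ["password", token],
    ]


def configure_dvc_remote(remote, user, token):
    """Run each command, stopping on the first failure."""
    for cmd in dvc_auth_commands(remote, user, token):
        subprocess.run(cmd, check=True)
```

`configure_dvc_remote("origin", USER_NAME, TOKEN)` mirrors the original cell.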

## Testing

Run the following cells after altering any of the constants files, and track the progress live in the MLflow UI.

In [14]:
!pip3 install pandas==1.4.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pandas==1.4.1
  Downloading pandas-1.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m11.7/11.7 MB[0m [31m85.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.3.5
    Uninstalling pandas-1.3.5:
      Successfully uninstalled pandas-1.3.5
Successfully installed pandas-1.4.1


In [None]:
!python3 src/data_prep.py


(269, 100, 3)
[[0.09330867 0.30794779 0.01880223]
 [0.12934242 0.40937575 0.01880223]
 [0.12934242 0.40937575 0.01880223]
 [0.14135367 0.44318507 0.01880223]
 [0.12595828 0.32126241 0.06080032]
 [0.15007859 0.37674786 0.06080029]
 [0.13399837 0.33975756 0.06080029]
 [0.13399837 0.33975756 0.06080029]
 [0.12595828 0.32126241 0.06080032]
 [0.10004489 0.13856517 0.22845256]
 [0.10511884 0.14531359 0.22845259]
 [0.11526675 0.15881044 0.22845262]
 [0.0797491  0.11157148 0.22845259]
 [0.06116611 0.19359289 0.03884595]
 [0.18579016 0.5105555  0.03884592]
 [0.20045181 0.54784522 0.03884592]
 [0.18579016 0.5105555  0.03884592]
 [0.144821   0.43447522 0.02626755]
 [0.144821   0.43447522 0.02626755]
 [0.13664839 0.41235042 0.02626755]
 [0.12030315 0.36810084 0.02626753]
 [0.12847577 0.39022563 0.02626754]
 [0.10216671 0.16536023 0.17971749]
 [0.10216671 0.16536023 0.17971749]
 [0.12796085 0.20446136 0.17971754]
 [0.21179171 0.33154003 0.17971753]
 [0.10695874 0.217705   0.1162917 ]
 [0.14215943 

In [40]:
!dvc repro

If DVC froze, see `hardlink_lock` in <https://man.dvc.org/config#core>
'data/te_ai_cup_sales_forecasting_data.csv.dvc' didn't change, skipping
Running stage 'data_prep':
> python3 src/data_prep.py

(269, 100, 3)
[[0.09330867 0.30794779 0.01880223]
 [0.12934242 0.40937575 0.01880223]
 [0.12934242 0.40937575 0.01880223]
 [0.14135367 0.44318507 0.01880223]
 [0.12595828 0.32126241 0.06080032]
 [0.15007859 0.37674786 0.06080029]
 [0.13399837 0.33975756 0.06080029]
 [0.13399837 0.33975756 0.06080029]
 [0.12595828 0.32126241 0.06080032]
 [0.10004489 0.13856517 0.22845256]
 [0.10511884 0.14531359 0.22845259]
 [0.11526675 0.15881044 0.22845262]
 [0.0797491  0.11157148 0.22845259]
 [0.06116611 0.19359289 0.03884595]
 [0.18579016 0.5105555  0.03884592]
 [0.20045181 0.54784522 0.03884592]
 [0.18579016 0.5105555  0.03884592]
 [0.144821   0.43447522 0.02626755]
 [0.144821   0.43447522 0.02626755]
 [0.13664839 0.41235

In [15]:
!MLFLOW_TRACKING_URI=https://dagshub.com/Abuynits/TE_Connect.mlflow \
MLFLOW_TRACKING_USERNAME={USER_NAME} \
MLFLOW_TRACKING_PASSWORD={TOKEN}  \
dvc repro

'data/te_ai_cup_sales_forecasting_data.csv.dvc' didn't change, skipping
Running stage 'data_prep':
> python3 src/data_prep.py

(269, 100, 3)
[[0.09330867 0.30794779 0.01880223]
 [0.12934242 0.40937575 0.01880223]
 [0.12934242 0.40937575 0.01880223]
 [0.14135367 0.44318507 0.01880223]
 [0.12595828 0.32126241 0.06080032]
 [0.15007859 0.37674786 0.06080029]
 [0.13399837 0.33975756 0.06080029]
 [0.13399837 0.33975756 0.06080029]
 [0.12595828 0.32126241 0.06080032]
 [0.10004489 0.13856517 0.22845256]
 [0.10511884 0.14531359 0.22845259]
 [0.11526675 0.15881044 0.22845262]
 [0.0797491  0.11157148 0.22845259]
 [0.06116611 0.19359289 0.03884595]
 [0.18579016 0.5105555  0.03884592]
 [0.20045181 0.54784522 0.03884592]
 [0.18579016 0.5105555  0.03884592]
 [0.144821   0.43447522 0.02626755]
 [0.144821   0.43447522 0.02626755]
 [0.13664839 0.41235042 0.02626755]
 [0.12030315 0.36810084 0.02626753]
 [0.12847577 0.39022563 0.02626754]
 [0.10216671 0.16536023 0.17971749]
 [0.10216671 0.16536023 0.17971
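Prefixing `dvc repro` with the three `MLFLOW_TRACKING_*` variables sets them only for that single process, so a plain `!dvc repro` does not receive them. MLflow reads `MLFLOW_TRACKING_URI`, `MLFLOW_TRACKING_USERNAME`, and `MLFLOW_TRACKING_PASSWORD` from the environment. A sketch of the same call from Python, assuming the tracking URI shown in the cell; `mlflow_env` and `run_pipeline` are hypothetical helpers.

```python
import os
import subprocess

# Tracking URI copied from the cell above.
TRACKING_URI = "https://dagshub.com/Abuynits/TE_Connect.mlflow"


def mlflow_env(user, token, base_env=None):
    """Return a copy of the environment with MLflow's tracking variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        MLFLOW_TRACKING_URI=TRACKING_URI,
        MLFLOW_TRACKING_USERNAME=user,
        MLFLOW_TRACKING_PASSWORD=token,
    )
    return env


def run_pipeline(user, token):
    """Run `dvc repro` with the tracking variables scoped to that one process."""
    subprocess.run(["dvc", "repro"], env=mlflow_env(user, token), check=True)
```

`run_pipeline(USER_NAME, TOKEN)` is the Python counterpart of the shell cell; the notebook's own environment is left unchanged because a copy is passed to `subprocess.run`.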

In [16]:
!dvc push -r origin

Transferring:   0% 0/9 [00:00<?, ?file/s]
Transferring:  11% 1/9 [00:00<00:03,  2.64file/s]
  0% 0.00/98.9k [00:00<?, ?B/s]
  0% 0.00/81.4k [00:00<?, ?B/s]
  0% 0.00/82.9k [00:00<?, ?B/s]
  0% 0.00/74.9k [00:00<?, ?B/s]

In [41]:
!git add src/

In [42]:
!git commit -m 'my chair is at a 45 degree angle'

[master a948b92] my chair is at a 45 degree angle
 5 files changed, 169 insertions(+), 88 deletions(-)


In [43]:
!git push origin master

Enumerating objects: 15, done.
Counting objects:   6% (1/15)Counting objects:  13% (2/15)Counting objects:  20% (3/15)Counting objects:  26% (4/15)Counting objects:  33% (5/15)Counting objects:  40% (6/15)Counting objects:  46% (7/15)Counting objects:  53% (8/15)Counting objects:  60% (9/15)Counting objects:  66% (10/15)Counting objects:  73% (11/15)Counting objects:  80% (12/15)Counting objects:  86% (13/15)Counting objects:  93% (14/15)Counting objects: 100% (15/15)Counting objects: 100% (15/15), done.
Delta compression using up to 2 threads
Compressing objects:  12% (1/8)Compressing objects:  25% (2/8)Compressing objects:  37% (3/8)Compressing objects:  50% (4/8)Compressing objects:  62% (5/8)Compressing objects:  75% (6/8)Compressing objects:  87% (7/8)Compressing objects: 100% (8/8)Compressing objects: 100% (8/8), done.
Writing objects:  12% (1/8)Writing objects:  25% (2/8)Writing objects:  37% (3/8)Writing objects:  50% (4/8)Writing objects:  62% (5/8)
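The save steps at the end of the notebook (push data with DVC, then stage, commit, and push the code) can be chained into one call. A sketch with hypothetical helpers that defaults to a dry run; the remote (`origin`) and branch (`master`) names are taken from the cells above.

```python
import subprocess


def save_commands(message, remote="origin", branch="master"):
    """Build the push sequence used above: data to DVC, code to git."""
    return [
        ["dvc", "push", "-r", remote],
        ["git", "add", "src/"],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def save_experiment(message, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in save_commands(message):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
```

With `dry_run=False`, note that `git commit` exits with an error when there is nothing to commit, which stops the sequence before `git push`. `dvc repro` also rewrites `dvc.lock`, which the original cell does not stage.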