# ABOUT
- This notebook trains node2vec embeddings for every "card_id to ID_COL" pair.
- Background:
    - It may be useful to generate embeddings for the ID columns.
    - The dataset is too large to apply the original node2vec algorithm.
    - Instead, the graph structure is converted to CSR matrices, which are used to train the embeddings.
- Steps:
    - Convert each edgelist to a CSR matrix.
    - Use GGVec to generate the node2vec embeddings.

- GGVec notes:
    - If you are using GGVec, keep order at 1. Higher-order embeddings take quadratically more time.
    - Keep negative_ratio low (0.05-0.1) and learning_rate high (0.1).
    - Use aggressive early stopping values. GGVec generally needs only a few (fewer than 100) epochs to get most of the embedding quality you need.
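
The "aggressive early stopping" advice can be illustrated with a generic tolerance-based stopping rule. This is only a minimal sketch and not GGVec's implementation: the helper names `should_stop` and `train`, the parameters `tol` and `window`, and the rule itself (stop once the spread of the last `window` losses falls below `tol`) are all assumptions made for illustration.

```python
def should_stop(losses, tol=0.001, window=10):
    """Return True when the loss has been stable for `window` epochs.

    Hypothetical helper: "stable" means the spread (max - min) of the
    last `window` recorded losses is below `tol`.
    """
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tol


def train(loss_per_epoch, max_epoch=200, tol=0.001, window=10):
    """Consume per-epoch losses and return the epoch at which training stops."""
    history = []
    for epoch, loss in enumerate(loss_per_epoch, start=1):
        history.append(loss)
        if epoch >= max_epoch or should_stop(history, tol, window):
            return epoch
    return len(history)


# Toy loss curve (made up): decays quickly, then flattens out.
toy_losses = [0.02 / (1 + 0.5 * e) + 0.005 for e in range(200)]
stopped_at = train(toy_losses)
print(f"stopped after {stopped_at} epochs out of 200")
```

With a loss curve that flattens early, a rule like this ends training long before `max_epoch`, which matches the behaviour the note recommends.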

## load data
- a csr matrix representing a graph
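
To make "a CSR matrix representing a graph" concrete, here is a minimal pure-Python sketch of how an undirected edgelist maps onto the three CSR arrays. The function name `edgelist_to_csr` and the toy node labels are hypothetical, and this is not how csrgraph builds its internal representation; it only shows the layout.

```python
def edgelist_to_csr(edges):
    """Build CSR arrays for an undirected, unweighted graph.

    `edges` is a list of (src, dst) labels. Returns (indptr, indices, data)
    plus the label -> row index mapping.
    """
    # Assign a row index to each distinct label, in order of first appearance.
    node_ids = {}
    for src, dst in edges:
        for label in (src, dst):
            node_ids.setdefault(label, len(node_ids))

    # Undirected: store every edge in both directions.
    neighbours = [set() for _ in node_ids]
    for src, dst in edges:
        i, j = node_ids[src], node_ids[dst]
        neighbours[i].add(j)
        neighbours[j].add(i)

    # Row r's neighbours live in indices[indptr[r]:indptr[r + 1]].
    indptr, indices = [0], []
    for row in neighbours:
        indices.extend(sorted(row))
        indptr.append(len(indices))
    data = [1.0] * len(indices)
    return indptr, indices, data, node_ids


# Toy edgelist with made-up labels.
toy_edges = [("card_1", "city_A"), ("card_2", "city_A"), ("card_2", "city_B")]
indptr, indices, data, node_ids = edgelist_to_csr(toy_edges)

row = node_ids["card_2"]
print(indices[indptr[row]:indptr[row + 1]])  # row indices of card_2's neighbours
```

Only the non-zero entries are stored, which is why this layout stays manageable for graphs too large for a dense adjacency matrix.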

In [1]:
import csrgraph as cg
import nodevectors
import os
from tqdm import tqdm

In [2]:
def get_save_path(datapath, save_dir = r"C:\Users\tanch\Documents\NTU\NTU Year 4\Semester 1\CZ4041 - Machine Learning\Team Project\model"):
    # Strip the directory and extension, so "card_id_city_id.csv" becomes "card_id_city_id".
    stem = os.path.splitext(os.path.basename(datapath))[0]
    filename = f"node2vec_{stem}"
    return os.path.join(save_dir, filename)

In [3]:
# for every edgelist in edgelist_dir:
# 1. convert to csr matrix
# 2. train node2vec
# 3. save node2vec
edgelist_dir = r"C:\Users\tanch\Documents\NTU\NTU Year 4\Semester 1\CZ4041 - Machine Learning\Team Project\data\edgelist for node2vec"
os.chdir(edgelist_dir)
# Use an explicit list of edgelists rather than os.listdir(edgelist_dir),
# so any other files in the directory are ignored.
datapaths = ['card_id_city_id.csv', 'card_id_merchant_category_id.csv', 'card_id_merchant_id.csv', 'card_id_state_id.csv', 'card_id_subsector_id.csv']
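for datapath in tqdm(datapaths):
    # Read the comma-separated edgelist as an undirected graph.
    G = cg.read_edgelist(datapath, directed=False, sep=',')
    # Settings follow the GGVec notes: high learning rate, low negative ratio,
    # and early stopping (tol, tol_samples) ahead of max_epoch.
    ggvec_model = nodevectors.GGVec(learning_rate = 0.1,
                                    n_components = 16,
                                    negative_ratio = 0.1,
                                    max_epoch = 200,
                                    tol = "auto",
                                    tol_samples = 10,
                                    verbose = 10)
    ggvec_model.fit(G)
    # Save one model per edgelist.
    ggvec_model.save(get_save_path(datapath))
#     print(datapath, get_save_path(datapath))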

  0%|          | 0/5 [00:00<?, ?it/s]
Converged! Loss: 0.0049
 20%|██        | 1/5 [00:50<03:23, 50.77s/it]
Converged! Loss: 0.0023
 40%|████      | 2/5 [02:01<03:07, 62.39s/it]
Converged! Loss: 0.0367
 60%|██████    | 3/5 [04:45<03:37, 108.68s/it]
Converged! Loss: 0.0069
 80%|████████  | 4/5 [05:26<01:22, 82.15s/it]
Converged! Loss: 0.0016
100%|██████████| 5/5 [06:51<00:00, 82.21s/it]
