# Environment Sanity Check #

Click the _Runtime_ dropdown at the top of the page, then _Change Runtime Type_ and confirm the instance type is _GPU_.

Check the output of `!nvidia-smi` to make sure you've been allocated a Tesla T4, P4, P100, or V100. Other GPUs, such as the Tesla K80, are rejected by the RAPIDS environment check.

In [None]:
!nvidia-smi

Mon Apr 25 16:39:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
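
The same check can be scripted. The sketch below is a minimal illustration, not part of `rapidsai-csp-utils`: the helper names `is_rapids_compatible` and `query_gpu_name` are hypothetical, and the list of supported models is copied from the message that `env-check.py` prints.

```python
import re
import subprocess

# GPU models named as compatible in the env-check.py message.
SUPPORTED_MODELS = {"P4", "P100", "T4", "V100"}


def is_rapids_compatible(gpu_name: str) -> bool:
    """Return True if any token of the GPU name is a supported model."""
    tokens = re.split(r"[^A-Za-z0-9]+", gpu_name)
    return any(token in SUPPORTED_MODELS for token in tokens)


def query_gpu_name() -> str:
    """Ask nvidia-smi for the name of the first GPU (needs a GPU runtime)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()[0].strip()


print(is_rapids_compatible("Tesla K80"))  # False
print(is_rapids_compatible("Tesla T4"))   # True
```

On a GPU runtime, `is_rapids_compatible(query_gpu_name())` would combine the two helpers. Matching whole tokens rather than substrings keeps a name such as "Tesla P40" from being mistaken for a P4.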

# Setup #
The setup script:
1. Updates gcc in Colab.
1. Installs Conda.
1. Installs the current stable version of the RAPIDS libraries, plus some external libraries:
  1. cuDF
  1. cuML
  1. cuGraph
  1. cuSpatial
  1. cuSignal
  1. BlazingSQL
  1. xgboost
1. Copies the RAPIDS .so files into the current working directory, a necessary workaround for RAPIDS+Colab integration.


In [None]:
# This gets the RAPIDS-Colab install files and checks your GPU.  Run this and the next cell only.
# Please read the output of this cell.  If your Colab instance is not RAPIDS-compatible, it will warn you and give you remediation steps.
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/env-check.py

Cloning into 'rapidsai-csp-utils'...
remote: Enumerating objects: 300, done.
remote: Counting objects: 100% (129/129), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 300 (delta 74), reused 99 (delta 55), pack-reused 171
Receiving objects: 100% (300/300), 87.58 KiB | 10.95 MiB/s, done.
Resolving deltas: 100% (136/136), done.
Traceback (most recent call last):
  File "rapidsai-csp-utils/colab/env-check.py", line 24, in <module>
    Please use 'Runtime -> Factory Reset Runtimes...', which will allocate you a different GPU instance, to try again."""
Exception: 
                  Unfortunately Colab didn't give you a RAPIDS compatible GPU (P4, P100, T4, or V100), but gave you a Tesla K80.

                  Please use 'Runtime -> Factory Reset Runtimes...', which will allocate you a different GPU instance, to try again.


In [None]:
# This will update the Colab environment and restart the kernel.  Don't run the next cell until you see the session crash.
!bash rapidsai-csp-utils/colab/update_gcc.sh
import os
os._exit(00)

Updating your Colab environment.  This will restart your kernel.  Don't Panic!
Traceback (most recent call last):
  File "/usr/bin/add-apt-repository", line 12, in <module>
    from softwareproperties.SoftwareProperties import SoftwareProperties, shortcut_handler
  File "/usr/lib/python3/dist-packages/softwareproperties/SoftwareProperties.py", line 67, in <module>
    from gi.repository import Gio
  File "/usr/lib/python3/dist-packages/gi/repository/__init__.py", line 25, in <module>
    from ..importer import DynamicImporter
  File "/usr/lib/python3/dist-packages/gi/importer.py", line 33, in <module>
    from .module import get_introspection_module
  File "/usr/lib/python3/dist-packages/gi/module.py", line 57, in <module>
    from .types import \
  File "/usr/lib/python3/dist-packages/gi/types.py", line 43, in <module>
    from . import _propertyhelper as propertyhelper
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955

In [None]:
# This will install CondaColab.  This will restart your kernel one last time.  Run this cell by itself and only run the next cell once you see the session crash.
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:22
🔁 Restarting kernel...


In [None]:
# you can now run the rest of the cells as normal
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [None]:
# Installing RAPIDS is now 'python rapidsai-csp-utils/colab/install_rapids.py <release> <packages>'
# The <release> options are 'stable' and 'nightly'.  Leaving it blank or adding any other words will default to stable.
!python rapidsai-csp-utils/colab/install_rapids.py stable
import os
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ['CONDA_PREFIX'] = '/usr/local'

Found existing installation: cffi 1.14.5
Uninstalling cffi-1.14.5:
  Successfully uninstalled cffi-1.14.5
Found existing installation: cryptography 3.4.5
Uninstalling cryptography-3.4.5:
  Successfully uninstalled cryptography-3.4.5
Collecting cffi==1.15.0
  Downloading cffi-1.15.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (427 kB)
Installing collected packages: cffi
Successfully installed cffi-1.15.0
Installing RAPIDS Stable 21.12
Starting the RAPIDS install on Colab.  This will take about 15 minutes.
Collecting package metadata (current_repodata.json): ...working... done
failed with initial frozen solve. Retrying with flexible solve.
failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - cudatoolkit=11.2
    - dask-sql
    - gcsfs
    - llvmlite
    - openssl
    - python=3.7
    - 

# RAPIDS is now installed on Colab.  You can copy your code into the cells below.  Enjoy!

In [None]:
import cudf


In [None]:
!pip install kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download h-and-m-personalized-fashion-recommendations -f transactions_train.csv
!unzip transactions_train.csv.zip

Downloading transactions_train.csv.zip to /content
 99% 581M/584M [00:05<00:00, 119MB/s]
100% 584M/584M [00:05<00:00, 119MB/s]
Archive:  transactions_train.csv.zip
  inflating: transactions_train.csv  
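
The credential steps in the download cell (create `~/.kaggle`, copy `kaggle.json`, restrict its permissions) can also be done from Python. This is a minimal sketch under the assumption that `kaggle.json` has already been uploaded to the working directory; the function name `install_kaggle_token` is hypothetical.

```python
import os
import shutil
import stat


def install_kaggle_token(token_path: str, home_dir: str) -> str:
    """Copy kaggle.json into <home_dir>/.kaggle with owner-only permissions."""
    target_dir = os.path.join(home_dir, ".kaggle")
    os.makedirs(target_dir, exist_ok=True)          # like `mkdir -p`
    target = os.path.join(target_dir, "kaggle.json")
    shutil.copyfile(token_path, target)
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)   # same as `chmod 600`
    return target
```

A call such as `install_kaggle_token("kaggle.json", os.path.expanduser("~"))` mirrors the shell commands, and re-running it does not fail if the directory already exists.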


In [None]:
!kaggle competitions download h-and-m-personalized-fashion-recommendations -f sample_submission.csv
!unzip sample_submission.csv.zip

Downloading sample_submission.csv.zip to /content
 97% 49.0M/50.3M [00:00<00:00, 102MB/s] 
100% 50.3M/50.3M [00:00<00:00, 98.4MB/s]
Archive:  sample_submission.csv.zip
  inflating: sample_submission.csv   


In [None]:
!unzip predictions.csv.zip

unzip:  cannot find or open predictions.csv.zip, predictions.csv.zip.zip or predictions.csv.zip.ZIP.


In [None]:
!unzip pairs_cudf.npy.zip

Archive:  pairs_cudf.npy.zip
  inflating: pairs_cudf.npy          


In [None]:
import gc

In [None]:
import pandas as pd, numpy as np
# We use a precomputed mapping of frequently-bought-together product pairs to speed up this process.
# This mapping was generated by the Kaggle community.
pairs = np.load('pairs_cudf.npy',allow_pickle=True).item()

In [None]:
pairs

{706016001: 706016002,
 706016002: 706016001,
 372860001: 372860002,
 610776002: 610776001,
 759871002: 759871001,
 464297007: 372860001,
 372860002: 372860001,
 610776001: 610776002,
 399223001: 706016001,
 720125001: 706016001,
 706016003: 706016001,
 156231001: 228257001,
 562245046: 562245001,
 562245001: 562245046,
 351484002: 723529001,
 399256001: 636323001,
 673396002: 706016001,
 568601006: 568597006,
 673677002: 673677004,
 448509014: 706016001,
 608776002: 372860001,
 688537004: 684209004,
 751471001: 783346001,
 160442007: 160442010,
 573716012: 573716050,
 158340001: 111586001,
 590928001: 688537004,
 706016015: 706016001,
 579541001: 579541004,
 599580017: 684209004,
 484398001: 564786001,
 554450001: 706016001,
 507909001: 507909003,
 160442010: 160442007,
 562245018: 562245001,
 741356002: 706016001,
 684209004: 688537004,
 111586001: 158340001,
 111593001: 111586001,
 636323001: 399256001,
 572797001: 572797002,
 565379001: 565379002,
 783346001: 751471001,
 717490008:

In [None]:
# Load the transactions and convert the IDs to integers to speed up computation.
transactions = pd.read_csv('transactions_train.csv')
# transactions['customer_id'] = transactions['customer_id'].str[-16:].str.hex_to_int().astype('int64')
transactions['customer_id'] = transactions['customer_id'].str[-16:].apply(lambda x:int(x,base=16)).astype('int64')
transactions['article_id'] = transactions.article_id.astype('int32')
# transform date column to datetime type
transactions.t_dat = pd.to_datetime(transactions.t_dat)
transactions = transactions[['t_dat','customer_id','article_id']]
# transactions.to_parquet('transactions.pqt',index=False)
transactions

Unnamed: 0,t_dat,customer_id,article_id
0,2018-09-20,-6846340800584936,663713001
1,2018-09-20,-6846340800584936,541518023
2,2018-09-20,-8334631767138808638,505221004
3,2018-09-20,-8334631767138808638,685687003
4,2018-09-20,-8334631767138808638,685687004
...,...,...,...
31788319,2020-09-22,4685485978980270934,929511001
31788320,2020-09-22,4685485978980270934,891322004
31788321,2020-09-22,3959348689921271969,918325001
31788322,2020-09-22,-8639340045377511665,833459002
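
The customer ID conversion keeps only the last 16 hex digits of each ID and stores them as a signed 64-bit integer, which is why many IDs in the table above are negative. The sketch below spells out that wrap-around in plain Python; the function names are hypothetical. Because only part of the original ID is kept, distinct customers could in principle collide, which this sketch does not detect.

```python
def hex_tail_to_int64(customer_id: str) -> int:
    """Map the last 16 hex digits of an ID to a signed 64-bit integer."""
    value = int(customer_id[-16:], 16)   # unsigned, 0 .. 2**64 - 1
    if value >= 2**63:                   # upper half wraps to negatives
        value -= 2**64
    return value


def int64_to_hex_tail(value: int) -> str:
    """Inverse of hex_tail_to_int64: recover the 16 hex digits."""
    return format(value % 2**64, "016x")


print(hex_tail_to_int64("ffffffffffffffff"))  # -1
print(int64_to_hex_tail(-1))                  # ffffffffffffffff
```

The inverse function is handy for spot-checking a row against the original hexadecimal `customer_id`.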


In [None]:
most_common = transactions.article_id.map(pairs)
most_common

83          552370002
84          626263008
85          626263002
86          552346001
87          657165010
              ...    
31788319    868283004
31788320    915292001
31788321    809961007
31788322    918292004
31788323    868823007
Name: article_id, Length: 5446076, dtype: int32

In [None]:
inverted_pairs = {v:k for (k,v) in pairs.items()}
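
Inverting `pairs` with a dict comprehension keeps only one key per value: when several articles point to the same partner, later entries overwrite earlier ones. The sketch below shows this on three entries taken from the `pairs` output above, plus an alternative that keeps every source article (the names `pairs_sample`, `inverted`, and `inverted_all` are illustrative only).

```python
from collections import defaultdict

# Three articles from `pairs` that all point to 706016001.
pairs_sample = {706016002: 706016001, 399223001: 706016001, 720125001: 706016001}

# One-to-one inversion: the last key seen for each value wins.
inverted = {v: k for k, v in pairs_sample.items()}
print(inverted)  # {706016001: 720125001}

# Keeping all sources requires a list per value.
inverted_all = defaultdict(list)
for k, v in pairs_sample.items():
    inverted_all[v].append(k)
print(dict(inverted_all))  # {706016001: [706016002, 399223001, 720125001]}
```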

In [91]:
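bought_together = {}
n = len(pairs)

for idx, p in enumerate(pairs):
  if idx % 10000 == 0:
    print(f'{idx} of {n}')
  # Zero-padded string IDs collected for article p (submission format).
  st = {'0' + str(p)}
  # Integer IDs already visited, used to detect cycles.
  my_set = {p}
  # At most 12 chain steps in total, shared by both directions.
  steps = 0
  # Walk backwards: articles whose pair leads to p.
  item = inverted_pairs.get(p)
  while item in inverted_pairs and steps < 12:
    if item in my_set:
      # The chain has looped back on itself: use up the step budget,
      # which also skips the forward walk below.
      steps = 12
      break
    my_set.add(item)
    st.add('0' + str(item))
    item = inverted_pairs.get(item)
    steps += 1
  # Walk forwards: the pair of p, then the pair of that article, and so on.
  item = pairs.get(p)
  while item in pairs and steps < 12:
    if item in my_set:
      steps = 12
      break
    my_set.add(item)
    st.add('0' + str(item))
    item = pairs.get(item)
    steps += 1
  bought_together['0' + str(p)] = st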


0 of 96252
10000 of 96252
20000 of 96252
30000 of 96252
40000 of 96252
50000 of 96252
60000 of 96252
70000 of 96252
80000 of 96252
90000 of 96252
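
The chain walk in the cell above can be factored into a small helper and tried on toy data. This is a sketch of a variant, not a drop-in replacement: it stops a direction as soon as it revisits an article, and the names `follow_chain` and `related_articles` are hypothetical.

```python
def follow_chain(start, mapping, seen, budget):
    """Follow start -> mapping[start] -> ..., adding unseen items to `seen`.

    Stops when the next item is not a key of `mapping`, was already seen,
    or the step budget is used up. Returns the remaining budget.
    """
    item = mapping.get(start)
    while item in mapping and item not in seen and budget > 0:
        seen.add(item)
        item = mapping.get(item)
        budget -= 1
    return budget


def related_articles(article, pairs, inverted_pairs, max_steps=12):
    """Collect the article plus its backward and forward chain neighbours."""
    seen = {article}
    remaining = follow_chain(article, inverted_pairs, seen, max_steps)
    follow_chain(article, pairs, seen, remaining)
    return seen


toy_pairs = {1: 2, 2: 1, 3: 1, 4: 3}
toy_inverted = {v: k for k, v in toy_pairs.items()}  # {2: 1, 1: 3, 3: 4}
print(sorted(related_articles(4, toy_pairs, toy_inverted)))  # [1, 2, 3, 4]
print(sorted(related_articles(1, toy_pairs, toy_inverted)))  # [1, 2, 3]
```

The integer results can be turned into the zero-padded form used in the notebook with `'0' + str(article)`.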


In [92]:
bought_together

{'0706016001': {'0706016001', '0706016002'},
 '0706016002': {'0706016001', '0706016002'},
 '0372860001': {'0372860001', '0372860002'},
 '0610776002': {'0610776001', '0610776002'},
 '0759871002': {'0759871001', '0759871002'},
 '0464297007': {'0372860001', '0372860002', '0464297007'},
 '0372860002': {'0372860001', '0372860002'},
 '0610776001': {'0610776001', '0610776002'},
 '0399223001': {'0399223001', '0706016001', '0706016002'},
 '0720125001': {'0706016001', '0706016002', '0720125001'},
 '0706016003': {'0706016001', '0706016002', '0706016003'},
 '0156231001': {'0156231001', '0228257001'},
 '0562245046': {'0562245001', '0562245046'},
 '0562245001': {'0562245001', '0562245046'},
 '0351484002': {'0351484002', '0723529001'},
 '0399256001': {'0399256001', '0636323001'},
 '0673396002': {'0673396002', '0706016001', '0706016002'},
 '0568601006': {'0568597006', '0568601006'},
 '0673677002': {'0673677002', '0673677004'},
 '0448509014': {'0448509014', '0706016001', '0706016002'},
 '0608776002': {

In [93]:
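# Article IDs ordered from most to least purchased.
b_common = transactions.article_id.value_counts().index
m = len(bought_together)
for j, p in enumerate(bought_together):
  if j % 10000 == 0:
    print(f'{j} out of {m}')
  # Pad each set with the most popular articles until it holds 12 items.
  # Adding to a set ignores duplicates, so no membership check is needed.
  i = 0
  while len(bought_together[p]) < 12:
    bought_together[p].add('0' + str(b_common[i]))
    i += 1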
  



0 out of 96252
10000 out of 96252
20000 out of 96252
30000 out of 96252
40000 out of 96252
50000 out of 96252
60000 out of 96252
70000 out of 96252
80000 out of 96252
90000 out of 96252
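
The padding step can be written as a standalone function that keeps the related articles first and appends popular ones after them. This is a sketch under the assumption that an ordered list is preferred over a set; `pad_with_popular` is a hypothetical name.

```python
def pad_with_popular(items, popular, k=12):
    """Return `items` followed by popular articles not already present, cut to k."""
    result = list(dict.fromkeys(items))[:k]   # drop duplicates, keep order
    present = set(result)
    for candidate in popular:
        if len(result) >= k:
            break
        if candidate not in present:
            result.append(candidate)
            present.add(candidate)
    return result


print(pad_with_popular(["b", "a"], ["a", "c", "d", "e"], k=4))  # ['b', 'a', 'c', 'd']
```

Unlike an index-based `while` loop, this version simply returns a shorter list if the popular articles run out before `k` is reached.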


In [97]:
mydx = ['0'+ str(a) for a in b_common[:12].to_numpy()]
default = " ".join(mydx)

In [98]:
default

'0706016001 0706016002 0372860001 0610776002 0759871002 0464297007 0372860002 0610776001 0399223001 0706016003 0720125001 0156231001'

In [99]:
bought_together

{'0706016001': {'0156231001',
  '0372860001',
  '0372860002',
  '0399223001',
  '0464297007',
  '0610776001',
  '0610776002',
  '0706016001',
  '0706016002',
  '0706016003',
  '0720125001',
  '0759871002'},
 '0706016002': {'0156231001',
  '0372860001',
  '0372860002',
  '0399223001',
  '0464297007',
  '0610776001',
  '0610776002',
  '0706016001',
  '0706016002',
  '0706016003',
  '0720125001',
  '0759871002'},
 '0372860001': {'0156231001',
  '0372860001',
  '0372860002',
  '0399223001',
  '0464297007',
  '0610776001',
  '0610776002',
  '0706016001',
  '0706016002',
  '0706016003',
  '0720125001',
  '0759871002'},
 '0610776002': {'0156231001',
  '0372860001',
  '0372860002',
  '0399223001',
  '0464297007',
  '0610776001',
  '0610776002',
  '0706016001',
  '0706016002',
  '0706016003',
  '0720125001',
  '0759871002'},
 '0759871002': {'0372860001',
  '0372860002',
  '0399223001',
  '0464297007',
  '0610776001',
  '0610776002',
  '0706016001',
  '0706016002',
  '0706016003',
  '0720125001'

In [100]:
bought_transform = {k:" ".join(map(str,v)) for (k,v) in bought_together.items()}

In [101]:
bought_transform

{'0706016001': '0610776002 0720125001 0759871002 0372860002 0706016001 0610776001 0464297007 0706016003 0156231001 0399223001 0372860001 0706016002',
 '0706016002': '0610776002 0720125001 0759871002 0372860002 0706016001 0610776001 0464297007 0706016003 0156231001 0399223001 0372860001 0706016002',
 '0372860001': '0610776002 0720125001 0759871002 0372860002 0706016001 0610776001 0464297007 0706016003 0156231001 0399223001 0372860001 0706016002',
 '0610776002': '0610776002 0720125001 0759871002 0372860002 0706016001 0610776001 0464297007 0706016003 0156231001 0399223001 0372860001 0706016002',
 '0759871002': '0610776002 0720125001 0759871002 0372860002 0706016001 0610776001 0464297007 0759871001 0706016003 0399223001 0372860001 0706016002',
 '0464297007': '0610776002 0720125001 0759871002 0372860002 0706016001 0610776001 0464297007 0706016003 0156231001 0399223001 0372860001 0706016002',
 '0372860002': '0610776002 0720125001 0759871002 0372860002 0706016001 0610776001 0464297007 0706016
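
Joining a `set` produces the IDs in the set's iteration order, which for strings is unrelated to insertion order and can change between interpreter sessions. If the order of the twelve articles matters for scoring, an ordered container is safer. The following sketch formats integer article IDs in a fixed order; `to_prediction_string` is a hypothetical helper.

```python
def to_prediction_string(article_ids, k=12):
    """Format integer article IDs as zero-padded, space-separated text."""
    unique_in_order = list(dict.fromkeys(article_ids))[:k]
    return " ".join(f"{article_id:010d}" for article_id in unique_in_order)


print(to_prediction_string([706016001, 706016002, 706016001]))
# 0706016001 0706016002
```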

In [102]:
# aux = transactions_pandas[['customer_id', 'article_id']]
aux = transactions[['customer_id', 'article_id']]
aux = aux.groupby('customer_id').agg(lambda x:x.value_counts().index[0])
aux['customer_id'] = aux.index
aux.index.name = None
aux


Unnamed: 0,article_id,customer_id
-9223352921020755230,706016001,-9223352921020755230
-9223343869995384291,519583013,-9223343869995384291
-9223321797620987725,580600006,-9223321797620987725
-9223319430705797669,470985003,-9223319430705797669
-9223308614576639426,750423005,-9223308614576639426
...,...,...
9223319300843860958,640735005,9223319300843860958
9223333063893176977,607834005,9223333063893176977
9223345314868180224,552018006,9223345314868180224
9223357421094039679,565379022,9223357421094039679
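
The aggregation above picks, for each customer, the article they bought most often. The same idea is sketched below on toy data with `collections.Counter`, which makes the tie rule explicit: `most_common` returns the article counted first when counts are equal. The customer labels and the variable names are illustrative only.

```python
from collections import Counter, defaultdict

purchases = [
    ("c1", 706016001), ("c1", 519583013), ("c1", 706016001),
    ("c2", 580600006),
]

# One Counter of article purchases per customer.
counts = defaultdict(Counter)
for customer, article in purchases:
    counts[customer][article] += 1

top_article = {c: counter.most_common(1)[0][0] for c, counter in counts.items()}
print(top_article)  # {'c1': 706016001, 'c2': 580600006}
```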


In [105]:
# transactions_pandas['predicts'] = transactions_pandas.article_id.map(bought_transform)
# transactions['predicts'] = transactions.article_id.map(bought_transform)
# transaction_pandas.drop(columns='')
# transactions_pandas.head(5)
# transactions.head(5)
aux_bought_transform = {int(k):v for (k,v) in bought_transform.items()}
aux['predicts'] = aux.article_id.map(aux_bought_transform)
my_preds = aux[['customer_id', 'predicts']]
my_preds

Unnamed: 0,customer_id,predicts
-9223352921020755230,-9223352921020755230,0610776002 0720125001 0759871002 0372860002 07...
-9223343869995384291,-9223343869995384291,0610776002 0519583013 0759871002 0372860002 07...
-9223321797620987725,-9223321797620987725,0610776002 0658030011 0658030001 0759871002 03...
-9223319430705797669,-9223319430705797669,0610776002 0720125001 0759871002 0372860002 07...
-9223308614576639426,-9223308614576639426,0610776002 0750423001 0750423005 0759871002 03...
...,...,...
9223319300843860958,9223319300843860958,0610776002 0640735005 0640735003 0759871002 03...
9223333063893176977,9223333063893176977,0610776002 0720125001 0759871002 0372860002 07...
9223345314868180224,9223345314868180224,0610776002 0759871002 0372860002 0706016001 06...
9223357421094039679,9223357421094039679,0610776002 0759871002 0372860002 0706016001 06...


customer_id      0
predicts       910
dtype: int64
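
`Series.map` with a dictionary returns `NaN` for every value that is not a key, which is presumably where the 910 missing `predicts` entries above come from. A minimal sketch on toy data (the `lookup`, `articles`, and `fallback` names are illustrative):

```python
import pandas as pd

lookup = {706016001: "0706016001 0706016002"}
articles = pd.Series([706016001, 519583013], name="article_id")

# Articles without an entry in `lookup` become NaN.
predicts = articles.map(lookup)
print(int(predicts.isna().sum()))  # 1

# Missing entries can be replaced with a fallback string.
fallback = "0706016001 0706016002 0372860001"
filled = predicts.fillna(fallback)
print(filled.tolist())
```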

In [None]:
transactions_pandas.to_csv('predictions.csv')

In [None]:
print(hex(8144921020642171327))

0x71089283befd19bf


In [106]:
preds = pd.read_csv('sample_submission.csv')
preds


Unnamed: 0,customer_id,prediction
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,0706016001 0706016002 0372860001 0610776002 07...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,0706016001 0706016002 0372860001 0610776002 07...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,0706016001 0706016002 0372860001 0610776002 07...
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,0706016001 0706016002 0372860001 0610776002 07...
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,0706016001 0706016002 0372860001 0610776002 07...
...,...,...
1371975,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...,0706016001 0706016002 0372860001 0610776002 07...
1371976,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...,0706016001 0706016002 0372860001 0610776002 07...
1371977,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...,0706016001 0706016002 0372860001 0610776002 07...
1371978,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...,0706016001 0706016002 0372860001 0610776002 07...


In [None]:
my_preds = pd.read_csv('predictions.csv')
my_preds = my_preds[['customer_id', 'predicts']]


Unnamed: 0,customer_id,predicts
0,8144921020642171327,706016001 610776002 706016002 372860001 372860...
9,4608614416544282068,610776001 610776002 706016001 372860001 372860...
20,1979330188696623796,706016001 610776002 372860001 372860002 706016...
21,8273579154522179394,620208001 610776002 620208002 706016001 372860...
30,8829412084612526158,556560001 610776002 706016001 372860001 556560...
...,...,...
5446055,3721860431314479867,706016001 610776002 706016002 372860001 372860...
5446064,-8887256083993680967,706016001 610776002 372860001 372860002 706016...
5446067,8690546469448085062,372860001 610776002 706016001 372860002 706016...
5446068,-4919130901198739819,


In [107]:
preds['id2'] = preds['customer_id']
preds['customer_id'] = preds['customer_id'].str[-16:].apply(lambda x:int(x,base=16)).astype('int64')
preds

Unnamed: 0,customer_id,prediction,id2
0,6883939031699146327,0706016001 0706016002 0372860001 0610776002 07...,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...
1,-7200416642310594310,0706016001 0706016002 0372860001 0610776002 07...,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...
2,-6846340800584936,0706016001 0706016002 0372860001 0610776002 07...,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...
3,-94071612138601410,0706016001 0706016002 0372860001 0610776002 07...,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...
4,-283965518499174310,0706016001 0706016002 0372860001 0610776002 07...,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...
...,...,...,...
1371975,7551062398649767985,0706016001 0706016002 0372860001 0610776002 07...,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...
1371976,-9141402131989464905,0706016001 0706016002 0372860001 0610776002 07...,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...
1371977,-8286316756823862684,0706016001 0706016002 0372860001 0610776002 07...,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...
1371978,2551401172826382186,0706016001 0706016002 0372860001 0610776002 07...,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...


In [None]:
my_preds

Unnamed: 0,customer_id,predicts
-9223352921020755230,-9223352921020755230,706016001 610776002 706016002 372860001 372860...
-9223343869995384291,-9223343869995384291,519583008 706016001 706016002 372860001 610776...
-9223321797620987725,-9223321797620987725,706016001 706016002 372860001 610776002 372860...
-9223319430705797669,-9223319430705797669,706016001 610776002 706016002 372860001 372860...
-9223308614576639426,-9223308614576639426,706016001 706016002 372860001 610776002 372860...
...,...,...
9223319300843860958,9223319300843860958,706016001 706016002 372860001 610776002 372860...
9223333063893176977,9223333063893176977,706016001 610776002 706016002 372860001 372860...
9223345314868180224,9223345314868180224,706016001 706016002 372860001 610776002 372860...
9223357421094039679,9223357421094039679,706016001 706016002 372860001 610776002 372860...


In [108]:
preds.drop(columns=['prediction'], inplace=True)

In [109]:
final = preds.merge(my_preds, how='left', on='customer_id')

In [110]:
final.drop(columns=['customer_id'], inplace=True)
final.rename(columns={'id2':'customer_id', 'predicts':'prediction'}, inplace=True)
final

Unnamed: 0,customer_id,prediction
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,0610776002 0759871002 0372860002 0706016001 06...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,0610776002 0759871002 0372860002 0706016001 06...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,0610776002 0487722001 0759871002 0372860002 07...
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,0742079001 0610776002 0720125001 0759871002 03...
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,0610776002 0399061015 0720125001 0759871002 03...
...,...,...
1371975,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...,0610776002 0712924008 0759871002 0372860002 07...
1371976,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...,0610776002 0699623006 0759871002 0372860002 07...
1371977,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...,0610776002 0564786001 0484398001 0759871002 03...
1371978,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...,0610776002 0720125001 0759871002 0372860002 07...


In [112]:
final.fillna(default, inplace=True)
final.isna().sum()

customer_id    0
prediction     0
dtype: int64
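
The merge, fill, and rename steps can be checked end to end on a toy frame. The sketch below assumes the same column names as the notebook but uses made-up IDs. The `validate="many_to_one"` argument is an optional safeguard that raises if a customer appears more than once in the predictions.

```python
import pandas as pd

sample = pd.DataFrame({"id2": ["aaa", "bbb", "ccc"], "customer_id": [10, 20, 30]})
mine = pd.DataFrame({"customer_id": [30, 10],
                     "predicts": ["0000000003", "0000000001"]})
default_prediction = "0000000009"

# A left merge keeps every row of `sample`, in its original order.
final_toy = sample.merge(mine, how="left", on="customer_id", validate="many_to_one")
# Customers without a prediction fall back to the default string.
final_toy["predicts"] = final_toy["predicts"].fillna(default_prediction)
final_toy = (final_toy.drop(columns=["customer_id"])
             .rename(columns={"id2": "customer_id", "predicts": "prediction"}))
print(final_toy["prediction"].tolist())  # ['0000000001', '0000000009', '0000000003']
```

Filling only the `predicts` column, rather than the whole frame, avoids writing the default string into any other column that happens to contain missing values.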

In [118]:
final.to_csv('/content/drive/MyDrive/Columbia/submission.csv', index=False)

In [115]:
!kaggle competitions submit -c h-and-m-personalized-fashion-recommendations -f submission.csv -m "freq bought together2"

100% 258M/258M [00:02<00:00, 96.2MB/s]
Successfully submitted to H&M Personalized Fashion Recommendations

In [None]:
preds

Unnamed: 0,customer_id,prediction,id2
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,0706016001 0706016002 0372860001 0610776002 07...,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,0706016001 0706016002 0372860001 0610776002 07...,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,0706016001 0706016002 0372860001 0610776002 07...,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,0706016001 0706016002 0372860001 0610776002 07...,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,0706016001 0706016002 0372860001 0610776002 07...,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...
...,...,...,...
1371975,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...,0706016001 0706016002 0372860001 0610776002 07...,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...
1371976,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...,0706016001 0706016002 0372860001 0610776002 07...,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...
1371977,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...,0706016001 0706016002 0372860001 0610776002 07...,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...
1371978,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...,0706016001 0706016002 0372860001 0610776002 07...,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...


In [None]:
preds

Unnamed: 0,customer_id,prediction,id2
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,0706016001 0706016002 0372860001 0610776002 07...,6883939031699146327
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,0706016001 0706016002 0372860001 0610776002 07...,-7200416642310594310
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,0706016001 0706016002 0372860001 0610776002 07...,-6846340800584936
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,0706016001 0706016002 0372860001 0610776002 07...,-94071612138601410
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,0706016001 0706016002 0372860001 0610776002 07...,-283965518499174310
...,...,...,...
1371975,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...,0706016001 0706016002 0372860001 0610776002 07...,7551062398649767985
1371976,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...,0706016001 0706016002 0372860001 0610776002 07...,-9141402131989464905
1371977,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...,0706016001 0706016002 0372860001 0610776002 07...,-8286316756823862684
1371978,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...,0706016001 0706016002 0372860001 0610776002 07...,2551401172826382186


In [116]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
