# Cold Start Recommendation

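The cold start recommendation problem asks how we can recommend products before any user interaction data (such as query-to-product pairs) exists, using only the product catalog itself.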

## Dataset Download

In [19]:
import pandas as pd

original_product_catalog_file = "/Users/david/Documents/data/amazon_kaggle_coldstart/reformatted_trn_unsupervised.csv"

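def subsample_catalog(percentage=0.05):
    # Draw a reproducible random sample of the full catalog.
    df = pd.read_csv(original_product_catalog_file)
    df = df.sample(frac=percentage, random_state=42)
    # Renumber products 0..n-1 so the IDs can serve as integer class labels.
    df["PRODUCT_ID"] = range(len(df))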

    sampled_product_catalog_file = f"/Users/david/Documents/data/amazon_kaggle_coldstart/reformatted_sampled_{percentage}_trn_unsupervised.csv"
    df.to_csv(sampled_product_catalog_file, index=False)

    return sampled_product_catalog_file, df

sampled_product_catalog_file, dataframe = subsample_catalog()
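
Renumbering `PRODUCT_ID` matters because the model in the next section is created with `integer_target=True` and `n_target_classes=dataframe.shape[0]`. A minimal sketch of a sanity check, assuming the labels are expected to form the contiguous range `0..n-1` (an assumption about how integer targets are consumed, not something this notebook states). The helper name `ids_are_contiguous` is hypothetical, and the sample values are row labels that appear in the outputs further down.

```python
def ids_are_contiguous(ids):
    """Return True if ids are exactly 0, 1, ..., n-1 in some order."""
    ids = list(ids)
    return sorted(ids) == list(range(len(ids)))

# Row labels carried over from the full catalog are sparse and unordered.
carried_over_labels = [50023, 54974, 5038]
print(ids_are_contiguous(carried_over_labels))

# After renumbering, as subsample_catalog does, the labels are 0..n-1.
renumbered_ids = list(range(len(carried_over_labels)))
print(ids_are_contiguous(renumbered_ids))
```

On the real data the same check could be run as `ids_are_contiguous(dataframe["PRODUCT_ID"])`.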

## UDT Initialization

TODO: explain why the task is framed as query-to-product prediction, why `PRODUCT_ID` has a delimiter, and what contextual encoding is.

In [20]:
from thirdai import bolt

model = bolt.UniversalDeepTransformer(
    data_types={
        "PRODUCT_ID": bolt.types.categorical(delimiter=';'),
        "QUERY": bolt.types.text(contextual_encoding="local"),
    },
    target="PRODUCT_ID",
    n_target_classes=dataframe.shape[0],
    integer_target=True,
)


input_1 (Input): dim=100000
input_1 -> fc_1 (FullyConnected): dim=512, sparsity=1, act_func=ReLU
fc_1 -> fc_2 (FullyConnected): dim=5272, sparsity=0.02, act_func=Softmax
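
The summary above can be turned into rough size estimates. This is a back-of-the-envelope sketch, assuming both layers store dense weights plus one bias per neuron and that `sparsity=0.02` means about 2% of the output neurons are evaluated per example; neither assumption is confirmed by the summary itself.

```python
# Dimensions copied from the printed model summary.
input_dim, hidden_dim, output_dim = 100_000, 512, 5272
output_sparsity = 0.02

# Weights plus biases for the two fully connected layers (dense storage assumed).
fc_1_params = input_dim * hidden_dim + hidden_dim
fc_2_params = hidden_dim * output_dim + output_dim
dense_params = fc_1_params + fc_2_params

# Approximate number of output neurons active per example at 2% sparsity.
active_outputs = round(output_dim * output_sparsity)

print(f"fc_1 parameters: {fc_1_params:,}")
print(f"fc_2 parameters: {fc_2_params:,}")
print(f"total parameters: {dense_params:,}")
print(f"active output neurons per example: ~{active_outputs}")
```

Under these assumptions the input layer dominates the parameter count, while the sparse output layer keeps the per-example work in `fc_2` to roughly a hundred neurons.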



## Cold Start Pretraining
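
Cold start pretraining uses only the catalog: there are no real queries, so training samples have to be derived from the product text. The call in this section separates a strong column (`TITLE`) from weak columns (`DESCRIPTION`, `BULLET_POINTS`, `BRAND`). The sketch below is a hypothetical illustration of that general idea, not ThirdAI's implementation: `make_training_pairs`, `phrase_len`, and the sampling scheme are all invented for this example.

```python
import random

def make_training_pairs(product, strong_columns, weak_columns, phrase_len=3, seed=0):
    """Build (pseudo-query, PRODUCT_ID) pairs from one catalog row."""
    rng = random.Random(seed)
    pairs = []
    # Strong columns: treat the full text as a query for this product.
    for column in strong_columns:
        pairs.append((product[column], product["PRODUCT_ID"]))
    # Weak columns: sample one short phrase from each as a noisier query.
    for column in weak_columns:
        words = product[column].replace(";", " ").split()
        if not words:
            continue
        start = rng.randrange(max(1, len(words) - phrase_len + 1))
        phrase = " ".join(words[start:start + phrase_len])
        pairs.append((phrase, product["PRODUCT_ID"]))
    return pairs

# Catalog row for PRODUCT_ID 300, copied from the output shown in this notebook.
product = {
    "PRODUCT_ID": 300,
    "TITLE": "Akon Suvice India Pack of 1 Samsung Galaxy M10 5D Tempered; "
             "Full Tempered Glass for Samsung Galaxy M10 - Black",
    "DESCRIPTION": "Tempered glass edge to edge cover screen protector and "
                   "ultra safety for samsung galaxy m10 mobile",
    "BULLET_POINTS": "Brand - Suvice;Ultra Safety;Scratch Resistent;"
                     "Anti Fingerprint;Pack of 1",
    "BRAND": "AKON SUVICE INDIA",
}

pairs = make_training_pairs(
    product,
    strong_columns=["TITLE"],
    weak_columns=["DESCRIPTION", "BULLET_POINTS", "BRAND"],
)
for query, product_id in pairs:
    print(product_id, "<-", query)
```

Each pair maps a piece of product text to the product's ID, which is the same query-to-product shape the model was configured with.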

In [21]:
model.cold_start(
    filename=sampled_product_catalog_file,
    strong_column_names=["TITLE"],
    weak_column_names=["DESCRIPTION", "BULLET_POINTS", "BRAND"],
    learning_rate=0.001,
    epochs=5,
    metrics=["f_measure(0.95)"]
)

loading data | source 'cold_start_column_map'
loading data | source 'cold_start_column_map' | vectors 97413 | batches 48 | time 0s | complete

train | epoch 0 | train_steps 48 | {precision(t=0.95):0.0624147, recall(t=0.95):0.0624147, f-measure(t=0.95):0.0624147} | train_batches 48 | time 26s | complete

train | epoch 1 | train_steps 96 | {precision(t=0.95):0.334185, recall(t=0.95):0.334185, f-measure(t=0.95):0.334185} | train_batches 48 | time 28s | complete

train | epoch 2 | train_steps 144 | {precision(t=0.95):0.647696, recall(t=0.95):0.647696, f-measure(t=0.95):0.647696} | train_batches 48 | time 27s | complete

train | epoch 3 | train_steps 192 | {precision(t=0.95):0.743884, recall(t=0.95):0.743884, f-measure(t=0.95):0.743884} | train_batches 48 | time 30s | complete

train | epoch 4 | train_steps 240 | {precision(t=0.95):0.789874, recall(t=0.95):0.789874, f-measure(t=0.95):0.789874} | train_batches 48 | time 29s | complete
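
To compare runs or plot progress, the per-epoch metrics can be pulled out of the log text. This is a small sketch using only the standard library; the regular expression is written against the log format printed above and may need adjusting if that format changes. `parse_training_log` is a hypothetical helper name.

```python
import re

# Training log lines copied from the output above.
LOG = """\
train | epoch 0 | train_steps 48 | {precision(t=0.95):0.0624147, recall(t=0.95):0.0624147, f-measure(t=0.95):0.0624147} | train_batches 48 | time 26s | complete
train | epoch 1 | train_steps 96 | {precision(t=0.95):0.334185, recall(t=0.95):0.334185, f-measure(t=0.95):0.334185} | train_batches 48 | time 28s | complete
train | epoch 2 | train_steps 144 | {precision(t=0.95):0.647696, recall(t=0.95):0.647696, f-measure(t=0.95):0.647696} | train_batches 48 | time 27s | complete
train | epoch 3 | train_steps 192 | {precision(t=0.95):0.743884, recall(t=0.95):0.743884, f-measure(t=0.95):0.743884} | train_batches 48 | time 30s | complete
train | epoch 4 | train_steps 240 | {precision(t=0.95):0.789874, recall(t=0.95):0.789874, f-measure(t=0.95):0.789874} | train_batches 48 | time 29s | complete
"""

# Captures the epoch number, the f-measure value, and the epoch time in seconds.
PATTERN = re.compile(r"epoch (\d+) \|.*?f-measure\(t=[\d.]+\):([\d.]+).*?time (\d+)s")

def parse_training_log(text):
    """Return a list of (epoch, f_measure, seconds) tuples."""
    rows = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            rows.append((int(match.group(1)), float(match.group(2)), int(match.group(3))))
    return rows

history = parse_training_log(LOG)
for epoch, f_measure, seconds in history:
    print(f"epoch {epoch}: f-measure {f_measure:.3f} ({seconds}s)")
print("total training time:", sum(seconds for _, _, seconds in history), "s")
```

The parsed values show the f-measure rising every epoch, with the largest jumps in the first three epochs.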



In [22]:
pd.options.display.max_colwidth = 1000
dataframe[dataframe["PRODUCT_ID"] == 300]

| Unnamed: 0 | PRODUCT_ID | TITLE | DESCRIPTION | BULLET_POINTS | BRAND |
|---|---|---|---|---|---|
| 50023 | 300 | Akon Suvice India Pack of 1 Samsung Galaxy M10 5D Tempered; Full Tempered Glass for Samsung Galaxy M10 - Black | Tempered glass edge to edge cover screen protector and ultra safety for samsung galaxy m10 mobile | Brand - Suvice;Ultra Safety;Scratch Resistent;Anti Fingerprint;Pack of 1 | AKON SUVICE INDIA |
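
The lookup cell that follows ranks products with `result.argsort()[-k:][::-1]`. The sketch below reproduces that slicing pattern on a plain Python list, where negative slices behave the same way as on a one-dimensional NumPy array. The scores are made up, and `argsort` and `top_k_indices` are hypothetical helpers written for this illustration. One detail worth noticing is `k = 0`: the slice `[-0:]` is the same as `[0:]` and returns every index, so the sketch guards against it.

```python
def argsort(scores):
    """Indices that would sort scores in ascending order."""
    return sorted(range(len(scores)), key=scores.__getitem__)

def top_k_indices(scores, k):
    """Indices of the k highest scores, best first."""
    k = min(k, len(scores))
    if k <= 0:
        # Without this guard, [-0:] would return all indices.
        return []
    return argsort(scores)[-k:][::-1]

scores = [0.10, 0.70, 0.05, 0.15]  # made-up scores for four products
print(top_k_indices(scores, 2))
print(top_k_indices(scores, 10))   # k larger than the number of scores
print(argsort(scores)[-0:])        # the k = 0 pitfall: every index comes back
```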


In [25]:
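def top_k_products(query, k):
    # One score per product ID for this query.
    result = model.predict({"QUERY": query})

    # Clamp k to the number of scores (the original subtracted 1, which dropped a product).
    k = min(k, len(result))
    # Indices of the k highest scores, best first.
    sorted_product_ids = result.argsort()[-k:][::-1]

    products = []
    for p_id in sorted_product_ids:
        # Look up the catalog row whose PRODUCT_ID matches the predicted class.
        products.append(dict(dataframe[dataframe["PRODUCT_ID"] == p_id]))

    return products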

top_k_products("Birthday Party return Gift", k=5)

[{'PRODUCT_ID': 54974    2955
  Name: PRODUCT_ID, dtype: int64,
  'TITLE': 54974    GrabChoice Doms Wow 9pc Stationery Kit Set | Return Gift | Kids Birthday Party | Pack Of 20pcs
  Name: TITLE, dtype: object,
  'DESCRIPTION': 54974    You can purchase this kit for your kids or as a Return Gift for a Birthday Party. A perfect return gift for your kid's friends. A quality product from the house of Doms Inds.Doms Wow 9pc Stationery Kit Contains: ►Wax Crayons 6 Shades Pack 1N.►Half Size Color Pencil 6 Shades Pack 1N. ►Water Color Pens 6 Shades pack 1N.►Scale 15cm 1N. ►Neon Eraser 1N.►Extra Long Pencil Sharpener 1N.►Fusion Pencil 2N. ►Sketch Pen 1N.
  Name: DESCRIPTION, dtype: object,
  'BULLET_POINTS': 54974    20 Pieces;Each pack contains: 9pcs Read description  comes with poly bag with zip;Perfect return gift for kids birthday party
  Name: BULLET_POINTS, dtype: object,
  'BRAND': 54974    GRABCHOICE
  Name: BRAND, dtype: object},
 {'PRODUCT_ID': 5038    949
  Name: PRODUCT_ID, dtype: in