# H&M Personalized Fashion Recommendations
Provide product recommendations based on previous purchases

[H&M Group](https://www.hmgroup.com/) is a family of brands and businesses with 53 online markets and approximately 4,850 stores. Our online store offers shoppers an extensive selection of products to browse. But with too many choices, customers might not quickly find what interests them or what they are looking for, and ultimately might not make a purchase. Product recommendations are key to enhancing the shopping experience. More importantly, helping customers make the right choices also has positive implications for sustainability: it reduces returns and thereby minimizes emissions from transportation.

In this competition, H&M Group invites you to develop product recommendations based on data from previous transactions, as well as from customer and product metadata. The available metadata ranges from simple data, such as garment type and customer age, to text data from product descriptions, to image data from garment images.

There are no preconceptions about what information may be useful; that is for you to find out. Whether you want to investigate algorithms for categorical data or dive into NLP and image-processing deep learning is up to you.

Link: https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/overview

Help: https://www.kaggle.com/code/julian3833/h-m-implicit-als-model-0-014

In [1]:
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix
import implicit
from tqdm.notebook import tqdm

In [2]:
%load_ext nb_black

In [3]:
articles_df = pd.read_csv(
    "../../data/h-and-m-personalized-fashion-recommendations/articles.csv",
).set_index("article_id")
articles_df

article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,perceived_colour_value_id,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,4,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,3,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,1,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
110065001,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,9,Black,4,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."
110065002,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,10,White,3,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
953450001,953450,5pk regular Placement1,302,Socks,Socks & Tights,1010014,Placement print,9,Black,4,...,Socks Bin,F,Menswear,3,Menswear,26,Men Underwear,1021,Socks and Tights,Socks in a fine-knit cotton blend with a small...
953763001,953763,SPORT Malaga tank,253,Vest top,Garment Upper body,1010016,Solid,9,Black,4,...,Jersey,A,Ladieswear,1,Ladieswear,2,H&M+,1005,Jersey Fancy,Loose-fitting sports vest top in ribbed fast-d...
956217002,956217,Cartwheel dress,265,Dress,Garment Full body,1010016,Solid,9,Black,4,...,Jersey,A,Ladieswear,1,Ladieswear,18,Womens Trend,1005,Jersey Fancy,"Short, A-line dress in jersey with a round nec..."
957375001,957375,CLAIRE HAIR CLAW,72,Hair clip,Accessories,1010016,Solid,9,Black,4,...,Small Accessories,D,Divided,2,Divided,52,Divided Accessories,1019,Accessories,Large plastic hair claw.


In [4]:
articles_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 105542 entries, 108775015 to 959461001
Data columns (total 24 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   product_code                  105542 non-null  int64 
 1   prod_name                     105542 non-null  object
 2   product_type_no               105542 non-null  int64 
 3   product_type_name             105542 non-null  object
 4   product_group_name            105542 non-null  object
 5   graphical_appearance_no       105542 non-null  int64 
 6   graphical_appearance_name     105542 non-null  object
 7   colour_group_code             105542 non-null  int64 
 8   colour_group_name             105542 non-null  object
 9   perceived_colour_value_id     105542 non-null  int64 
 10  perceived_colour_value_name   105542 non-null  object
 11  perceived_colour_master_id    105542 non-null  int64 
 12  perceived_colour_master_name  105542 non-null  

In [5]:
customers_df = pd.read_csv(
    "../../data/h-and-m-personalized-fashion-recommendations/customers.csv",
).set_index("customer_id")
customers_df

customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
00000dbacae5abe5e23885899a1fa44253a17956c6d1c3d25f88aa139fdfc657,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...
0000423b00ade91418cceaf3b26c6af3dd342b51fd051eec9c12fb36984420fa,,,ACTIVE,NONE,25.0,2973abc54daa8a5f8ccfe9362140c63247c5eee03f1d93...
000058a12d5b43e67d225668fa1f8d618c13dc232df0cad8ffe7ad4a1091e318,,,ACTIVE,NONE,24.0,64f17e6a330a85798e4998f62d0930d14db8db1c054af6...
00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2c5feb1ca5dff07c43e,,,ACTIVE,NONE,54.0,5d36574f52495e81f019b680c843c443bd343d5ca5b1c2...
00006413d8573cd20ed7128e53b7b13819fe5cfc2d801fe7fc0f26dd8d65a85a,1.0,1.0,ACTIVE,Regularly,52.0,25fa5ddee9aac01b35208d01736e57942317d756b32ddd...
...,...,...,...,...,...,...
ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e4747568cac33e8c541831,,,ACTIVE,NONE,24.0,7aa399f7e669990daba2d92c577b52237380662f36480b...
ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab53481233731b5c4f8b7,,,ACTIVE,NONE,21.0,3f47f1279beb72215f4de557d950e0bfa73789d24acb5e...
ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1778d0116cffd259264,1.0,1.0,ACTIVE,Regularly,21.0,4563fc79215672cd6a863f2b4bf56b8f898f2d96ed590e...
ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38b2236865d949d4df6a,1.0,1.0,ACTIVE,Regularly,18.0,8892c18e9bc3dca6aa4000cb8094fc4b51ee8db2ed14d7...


In [6]:
transactions_train_df = pd.read_csv(
    "../../data/h-and-m-personalized-fashion-recommendations/transactions_train.csv",
    parse_dates=["t_dat"],
)
transactions_train_df["customer_id"] = transactions_train_df["customer_id"].astype(
    "category"
)
transactions_train_df["article_id"] = transactions_train_df["article_id"].astype(
    "category"
)
transactions_train_df

,t_dat,customer_id,article_id,price,sales_channel_id
0,2018-09-20,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,663713001,0.050831,2
1,2018-09-20,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,541518023,0.030492,2
2,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,505221004,0.015237,2
3,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,685687003,0.016932,2
4,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,685687004,0.016932,2
...,...,...,...,...,...
31788319,2020-09-22,fff2282977442e327b45d8c89afde25617d00124d0f999...,929511001,0.059305,2
31788320,2020-09-22,fff2282977442e327b45d8c89afde25617d00124d0f999...,891322004,0.042356,2
31788321,2020-09-22,fff380805474b287b05cb2a7507b9a013482f7dd0bce0e...,918325001,0.043203,1
31788322,2020-09-22,fff4d3a8b1f3b60af93e78c30a7cb4cf75edaf2590d3e5...,833459002,0.006763,1


In [7]:
sample_submission_df = pd.read_csv(
    "../../data/h-and-m-personalized-fashion-recommendations/sample_submission.csv",
)
sample_submission_df

,customer_id,prediction
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,0706016001 0706016002 0372860001 0610776002 07...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,0706016001 0706016002 0372860001 0610776002 07...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,0706016001 0706016002 0372860001 0610776002 07...
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,0706016001 0706016002 0372860001 0610776002 07...
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,0706016001 0706016002 0372860001 0610776002 07...
...,...,...
1371975,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...,0706016001 0706016002 0372860001 0610776002 07...
1371976,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...,0706016001 0706016002 0372860001 0610776002 07...
1371977,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...,0706016001 0706016002 0372860001 0610776002 07...
1371978,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...,0706016001 0706016002 0372860001 0610776002 07...


# Prepare

https://www.kaggle.com/code/julian3833/h-m-implicit-als-model-0-014?scriptVersionId=88178941&cellId=11   
https://www.benfrederickson.com/matrix-factorization/

In [8]:
customers_ids, article_ids = customers_df.index.unique(), articles_df.index.unique()
customers_ids.shape, article_ids.shape

((1371980,), (105542,))

In [9]:
# Build the sparse item x user matrix of interaction weights that model.fit() is given below.
# https://implicit.readthedocs.io/en/latest/models.html#implicit.recommender_base.RecommenderBase.fit

# Category codes serve as matrix indices: rows are articles, columns are customers.
row = transactions_train_df["article_id"].cat.codes
col = transactions_train_df["customer_id"].cat.codes

# Every transaction counts as one interaction of weight 1.
data = np.ones(transactions_train_df.shape[0])

coo_transactions = coo_matrix((data, (row, col)))
coo_transactions

<104547x1362281 sparse matrix of type '<class 'numpy.float64'>'
	with 31788324 stored elements in COOrdinate format>
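
The matrix above is 104547 x 1362281, smaller than the 105542 articles and 1371980 customers counted earlier. A minimal sketch of why, using hypothetical toy IDs (`toy_df`, `all_customers` and the values in them are made up for illustration): category codes are only assigned to IDs that actually occur in the transactions, so customers without purchases get no column.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical toy data: four known customers, but only three of them bought anything.
all_customers = ["c1", "c2", "c3", "c4"]
toy_df = pd.DataFrame(
    {"customer_id": ["c1", "c1", "c2", "c3"], "article_id": [111, 222, 111, 333]}
)
toy_df["customer_id"] = toy_df["customer_id"].astype("category")
toy_df["article_id"] = toy_df["article_id"].astype("category")

toy_row = toy_df["article_id"].cat.codes.to_numpy()  # item index (matrix rows)
toy_col = toy_df["customer_id"].cat.codes.to_numpy()  # user index (matrix columns)
toy_coo = coo_matrix((np.ones(len(toy_df)), (toy_row, toy_col)))

# Codes exist only for IDs present in the transactions,
# so "c4" has no column and the matrix is 3 x 3, not 3 x 4.
print(toy_coo.shape)
print(len(all_customers) - toy_coo.shape[1], "customer(s) without purchases")
```

Any customer missing from the matrix cannot be looked up in the model later and needs separate handling at prediction time.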

# Train

In [10]:
model = implicit.als.AlternatingLeastSquares(factors=50)
model.fit(coo_transactions)

  0%|          | 0/15 [00:00<?, ?it/s]
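
To give a feel for what the 15 training iterations alternate between, here is a heavily simplified sketch of alternating least squares on a tiny dense matrix. It is not what `implicit` runs: the library's model weights observations by confidence and works on sparse data, whereas this unweighted version only illustrates the alternation. The function name `als_sketch`, its parameters and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def als_sketch(interactions, factors=2, regularization=0.1, iterations=15, seed=0):
    """Unweighted ALS on a small dense user x item matrix (illustration only)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = interactions.shape
    user_f = rng.normal(scale=0.1, size=(n_users, factors))
    item_f = rng.normal(scale=0.1, size=(n_items, factors))
    reg = regularization * np.eye(factors)
    losses = []
    for _ in range(iterations):
        # Fix the item factors and solve a ridge regression for all users at once.
        user_f = np.linalg.solve(item_f.T @ item_f + reg, item_f.T @ interactions.T).T
        # Fix the user factors and solve the same problem for all items.
        item_f = np.linalg.solve(user_f.T @ user_f + reg, user_f.T @ interactions).T
        error = interactions - user_f @ item_f.T
        penalty = regularization * (np.sum(user_f**2) + np.sum(item_f**2))
        losses.append(float(np.sum(error**2) + penalty))
    return user_f, item_f, losses


# Hypothetical 3 users x 4 items purchase matrix.
toy = np.array([[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 1]], dtype=float)
user_f, item_f, losses = als_sketch(toy)
print(losses[0], losses[-1])  # the regularized loss never increases between iterations
```

Each half-step is an exact minimization with the other factor matrix held fixed, which is why the loss cannot go up from one iteration to the next.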

In [11]:
csr_transactions = coo_transactions.T.tocsr()
csr_transactions

<1362281x104547 sparse matrix of type '<class 'numpy.float64'>'
	with 27306439 stored elements in Compressed Sparse Row format>
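
The CSR matrix stores 27306439 elements while the COO matrix stored 31788324. A small sketch with made-up indices shows the likely reason: COO keeps one entry per transaction, and the conversion to CSR sums entries that share the same (row, column) pair, so repeat purchases collapse into one cell with a larger weight.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data: item 0 is bought twice by user 0.
item_idx = np.array([0, 0, 1, 2])
user_idx = np.array([0, 0, 0, 1])
toy_coo = coo_matrix((np.ones(4), (item_idx, user_idx)))
print(toy_coo.nnz)  # one stored element per transaction, duplicates included

# Transpose to user x item and convert; duplicate coordinates are summed.
toy_csr = toy_coo.T.tocsr()
print(toy_csr.nnz)
print(toy_csr.toarray())
```

In the dense view, the repeated purchase shows up as a weight of 2 rather than as two separate entries.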

In [12]:
max(col)

1362280

In [13]:
# Top-12 (article code, score) pairs for the customer with the highest category code.
recommendations = model.recommend(1362280, csr_transactions, N=12)
recommendations

[(53833, 0.0005609556),
 (42514, 0.0005475608),
 (9905, 0.0003685964),
 (956, 0.0003610294),
 (72626, 0.0003607236),
 (78816, 0.00029659836),
 (1467, 0.00029390445),
 (15980, 0.0002883087),
 (117, 0.0002871875),
 (22619, 0.0002796054),
 (115, 0.00027330845),
 (10633, 0.00026673847)]
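
As a rough illustration of where such (article code, score) pairs come from, the sketch below scores items by the dot product of a user vector with each item vector and skips items the user already bought. This assumes the model ranks by dot product and filters previous purchases; the factor values and the helper `recommend_sketch` are hypothetical.

```python
import numpy as np

# Hypothetical factors: one user and four items in a 2-dimensional latent space.
user_vector = np.array([1.0, 0.5])
item_factors = np.array([[0.9, 0.1], [0.2, 0.8], [1.0, 1.0], [-0.5, 0.0]])
already_bought = np.array([2])  # item codes the user has purchased before


def recommend_sketch(user_vector, item_factors, already_bought, n=2):
    """Return the n best (item code, score) pairs, excluding known purchases."""
    scores = item_factors @ user_vector
    scores[already_bought] = -np.inf  # never recommend what was already bought
    best = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in best]


rec = recommend_sketch(user_vector, item_factors, already_bought)
print(rec)
```

Item 2 has the highest raw score here but is excluded because it is listed in `already_bought`.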

In [14]:
max(row)

104546

In [15]:
model.similar_items(104546)  # similar article

[(104546, 1.0),
 (84788, 0.9075138),
 (103311, 0.90301347),
 (70947, 0.896247),
 (77010, 0.892503),
 (92562, 0.8847342),
 (84834, 0.88254946),
 (70948, 0.88209087),
 (104137, 0.88116115),
 (76005, 0.88045853)]
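
The scores above look like cosine similarities: the query item scores 1.0 against itself and everything else falls below that. A minimal sketch under that assumption, with made-up item vectors and a hypothetical helper `similar_items_sketch`:

```python
import numpy as np

# Hypothetical 2-dimensional latent vectors for four items.
toy_item_factors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])


def similar_items_sketch(item_factors, item_id, n=3):
    """Rank items by cosine similarity to the item with code item_id."""
    norms = np.linalg.norm(item_factors, axis=1)
    scores = item_factors @ item_factors[item_id] / (norms * norms[item_id])
    best = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in best]


similar = similar_items_sketch(toy_item_factors, 0)
print(similar)
```

As in the notebook output, the first result is the query item itself, followed by the items whose vectors point in the most similar direction.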

# Predict

In [16]:
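# Map each customer_id to its integer category code (the customer's index in the matrix).
# Iterating over the categories avoids zipping all ~31.8 million transaction rows.
customer_code_list = {
    customer_id: code
    for code, customer_id in enumerate(
        transactions_train_df["customer_id"].cat.categories
    )
}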
len(customer_code_list)

1362281

In [17]:
# Map each integer category code back to the original article_id.
article_code_list = dict(
    enumerate(transactions_train_df["article_id"].cat.categories)
)
len(article_code_list)

104547

In [18]:
data = []

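for customer_uid in tqdm(sample_submission_df["customer_id"].unique()):
    # Customers without any training transactions keep an empty prediction.
    prediction = []

    if customer_uid in customer_code_list:
        customer_code = customer_code_list[customer_uid]

        # Top-12 (article code, score) pairs for this customer.
        rec = model.recommend(customer_code, csr_transactions, N=12)
        article_code_ids = np.array(rec)[:, 0].astype(int)

        # NOTE: article_id was read as an integer, so the leading zero shown in
        # sample_submission.csv (e.g. 0706016001) is missing from these strings.
        prediction = [str(article_code_list[x]) for x in article_code_ids]

    data.append((customer_uid, " ".join(prediction)))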

submission_df = pd.DataFrame(data, columns=["customer_id", "prediction"])
submission_df

  0%|          | 0/1371980 [00:00<?, ?it/s]

,customer_id,prediction
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,568597006 568601007 568597007 448509014 507909...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,699080001 776237020 599580052 599580038 759871...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,699080001 699081001 609719001 458543001 838055...
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,484398001 720125001 564786001 730683001 470789...
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,720125001 599580038 562245046 717490008 599580...
...,...,...
1371975,ffffbbf78b6eaac697a8a5dfbfd2bfa8113ee5b403e474...,717490008 579302001 590928001 688537011 590928...
1371976,ffffcd5046a6143d29a04fb8c424ce494a76e5cdf4fab5...,562245001 695632002 706016003 695632001 759871...
1371977,ffffcf35913a0bee60e8741cb2b4e78b8a98ee5ff2e6a1...,759871001 564786001 783346001 706016006 751471...
1371978,ffffd7744cebcf3aca44ae7049d2a94b87074c3d4ffe38...,448509014 448509001 448509018 799365002 714790...
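
The predictions above contain 9-digit article IDs, while `sample_submission.csv` shows 10-character IDs with a leading zero (for example `0706016001`). Assuming the submission is expected to use the same 10-character form, a small helper can zero-pad the IDs when building the prediction string. The name `format_prediction` and its `width` parameter are hypothetical.

```python
def format_prediction(article_ids, width=10):
    """Join article IDs into one space-separated string, zero-padded to `width`."""
    return " ".join(str(article_id).zfill(width) for article_id in article_ids)


formatted = format_prediction([706016001, 568597006])
print(formatted)  # 0706016001 0568597006
```

An alternative would be to read `article_id` as a string in the first place, which keeps the leading zero without any padding step.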


In [19]:
submission_df.to_csv(
    "../../data/h-and-m-personalized-fashion-recommendations/submission.csv.gz",
    index=False,
    compression="gzip",
)
