# Content-based system using prebuilt article embeddings

## Idea in brief

Introduce an ML regression model that enhances the similarity sums.
  - Model features/inputs:
    1. Interaction (user-article) fields, averaged per article: timestamps, read times, scroll percentages
    2. Article-specific fields such as tags, total inviews, total pageviews
    3. User-specific fields such as age and gender, averaged per article over all users who have read the article
  - Model target value: `F = "fitness" of the article`, a number between 0 and 1 that best approximates `score = Y/X`, where `X = count of times the article was in view` and `Y = count of times the article was clicked`.
  - We can then train a second model that learns the weights `a` and `b` such that `a * F + b * cosine_similarity` is as close as possible to the average `score` for the article, where `cosine_similarity` is the average cosine similarity between the in-view article and an article from the user's profile.
  - Extend the user profile with the user-specific fields described above and define a similarity measure over those fields (the measure is still to be decided).
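
The second-stage fit of `a` and `b` can be sketched as an ordinary least-squares problem. This is only an illustration on made-up numbers: the helper `fit_blend_weights` and all array values are hypothetical, and least squares is one possible way to learn the weights, not necessarily the one this notebook will end up using.

```python
import numpy as np

def fit_blend_weights(fitness, cosine_similarity, score):
    """Least-squares estimate of a, b in: a * F + b * cosine_similarity ~ score."""
    design = np.column_stack([fitness, cosine_similarity])
    (a, b), *_ = np.linalg.lstsq(design, score, rcond=None)
    return a, b

# Hypothetical per-article values, for illustration only.
inview_count = np.array([200, 50, 400, 80])    # X: times the article was in view
clicked_count = np.array([20, 10, 20, 4])      # Y: times the article was clicked
score = clicked_count / inview_count           # score = Y / X

fitness = np.array([0.12, 0.18, 0.06, 0.05])            # F from the regression model
cosine_similarity = np.array([0.30, 0.55, 0.20, 0.25])  # average similarity to user profiles

a, b = fit_blend_weights(fitness, cosine_similarity, score)
blended = a * fitness + b * cosine_similarity  # approximation of the score
```

No intercept is fitted here because the formula above has none; adding a column of ones to `design` would introduce one.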

## Imports

In [102]:
import random
import json

import pandas as pd
import numpy as np
from tqdm import tqdm
from sklearn.model_selection import train_test_split, RepeatedKFold, cross_val_score
from sklearn.metrics import accuracy_score
from scipy.spatial.distance import cosine
from collections import Counter
import xgboost as xgb

## Constants

In [84]:
ROOT_PATH = '../data/small'
ARTICLES_PATH = f'{ROOT_PATH}/articles.parquet'
TRAIN_HISTORY_PATH = f'{ROOT_PATH}/train/history.parquet'
TRAIN_INTERACTIONS_PATH = f'{ROOT_PATH}/train/behaviors.parquet'
VALIDATION_HISTORY_PATH = f'{ROOT_PATH}/validation/history.parquet'
VALIDATION_INTERACTIONS_PATH = f'{ROOT_PATH}/validation/behaviors.parquet'

## Data loading

In [85]:
articles_df = pd.read_parquet(ARTICLES_PATH)
history_df = pd.read_parquet(TRAIN_HISTORY_PATH)
interactions_df = pd.read_parquet(TRAIN_INTERACTIONS_PATH)
validation_interactions_df = pd.read_parquet(VALIDATION_INTERACTIONS_PATH)

In [86]:
history_df

Unnamed: 0,user_id,impression_time_fixed,scroll_percentage_fixed,article_id_fixed,read_time_fixed
0,13538,"[2023-04-27T10:17:43.000000, 2023-04-27T10:18:...","[100.0, 35.0, 100.0, 24.0, 100.0, 23.0, 100.0,...","[9738663, 9738569, 9738663, 9738490, 9738663, ...","[17.0, 12.0, 4.0, 5.0, 4.0, 9.0, 5.0, 46.0, 11..."
1,14241,"[2023-04-27T09:40:18.000000, 2023-04-27T09:40:...","[100.0, 46.0, 100.0, 70.0, 100.0, 100.0, 100.0...","[9738557, 9738528, 9738533, 9738684, 9739035, ...","[8.0, 9.0, 28.0, 17.0, 91.0, 21.0, 14.0, 27.0,..."
2,20396,"[2023-04-27T12:30:44.000000, 2023-04-27T12:31:...","[100.0, 59.0, nan, nan, 100.0, 100.0, nan, nan...","[9738760, 9738355, 9738355, 9739864, 9741788, ...","[49.0, 34.0, 0.0, 60.0, 180.0, 49.0, 0.0, 0.0,..."
3,34912,"[2023-04-29T07:12:49.000000, 2023-04-29T13:01:...","[100.0, 35.0, 44.0, 31.0, 100.0, 100.0, 100.0,...","[9741802, 9741804, 9741803, 9740087, 9742039, ...","[153.0, 7.0, 5.0, 6.0, 44.0, 44.0, 108.0, 10.0..."
4,37953,"[2023-04-27T19:17:10.000000, 2023-04-27T19:17:...","[14.0, 28.0, 29.0, nan, 36.0, 33.0, 50.0, 100....","[9739205, 9739202, 9737084, 9739274, 9739358, ...","[4.0, 16.0, 4.0, 0.0, 5.0, 5.0, 25.0, 48.0, 6...."
...,...,...,...,...,...
15138,1479974,"[2023-05-18T06:03:16.000000, 2023-05-18T06:03:...","[58.0, 100.0, 21.0, 100.0, 100.0, 100.0, 6.0, ...","[9770989, 9769553, 9770882, 9770541, 9770867, ...","[8.0, 124.0, 9.0, 72.0, 52.0, 70.0, 2.0, 40.0,..."
15139,2405403,"[2023-04-30T19:48:27.000000, 2023-04-30T19:48:...","[40.0, 100.0, 100.0, 43.0, 100.0, 39.0]","[9743574, 9740618, 9742401, 9740156, 9742401, ...","[1.0, 64.0, 6.0, 5.0, 2.0, 120.0]"
15140,2454548,"[2023-05-17T09:35:06.000000, 2023-05-17T09:35:...","[12.0, 10.0, 30.0, 100.0, 33.0, 37.0, 87.0, 57...","[9768328, 9769328, 9769414, 9769380, 9769378, ...","[9.0, 6.0, 3.0, 32.0, 101.0, 10.0, 69.0, 10.0,..."
15141,581228,"[2023-05-18T05:24:32.000000, 2023-05-18T05:24:...","[28.0, 60.0, 100.0, 100.0, 49.0, 100.0]","[9770799, 9770726, 9747757, 9769404, 9769366, ...","[12.0, 15.0, 52.0, 700.0, 7.0, 43.0]"
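
The per-article averages listed as the first feature group in the idea section could be derived from the list-valued history columns roughly as follows. This is a sketch on a toy frame that mimics the `history_df` layout; the values and the output column names `avg_read_time` and `avg_scroll_percentage` are assumptions, and multi-column `explode` needs pandas 1.3 or newer.

```python
import pandas as pd

# Toy stand-in for history_df with the same list-valued columns (values are made up).
toy_history = pd.DataFrame({
    'user_id': [1, 2],
    'article_id_fixed': [[10, 11], [10, 12]],
    'read_time_fixed': [[20.0, 5.0], [40.0, 8.0]],
    'scroll_percentage_fixed': [[100.0, 30.0], [50.0, None]],
})

list_columns = ['article_id_fixed', 'read_time_fixed', 'scroll_percentage_fixed']

# One row per (user, article) read; exploded columns are object-typed, so cast the numeric ones.
exploded = (
    toy_history.explode(list_columns, ignore_index=True)
    .astype({'read_time_fixed': float, 'scroll_percentage_fixed': float})
)

per_article = (
    exploded.groupby('article_id_fixed')[['read_time_fixed', 'scroll_percentage_fixed']]
    .mean()  # missing scroll percentages are skipped by mean()
    .rename(columns={
        'read_time_fixed': 'avg_read_time',
        'scroll_percentage_fixed': 'avg_scroll_percentage',
    })
)
```

The resulting frame is indexed by article ID and could be merged onto `articles_df` before training.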


In [87]:
interactions_df

Unnamed: 0,impression_id,article_id,impression_time,read_time,scroll_percentage,device_type,article_ids_inview,article_ids_clicked,user_id,is_sso_user,gender,postcode,age,is_subscriber,session_id,next_read_time,next_scroll_percentage
0,149474,,2023-05-24 07:47:53,13.0,,2,"[9778623, 9778682, 9778669, 9778657, 9778736, ...",[9778657],139836,False,,,,False,759,7.0,22.0
1,150528,,2023-05-24 07:33:25,25.0,,2,"[9778718, 9778728, 9778745, 9778669, 9778657, ...",[9778623],143471,False,,,,False,1240,287.0,100.0
2,153068,9778682.0,2023-05-24 07:09:04,78.0,100.0,1,"[9778657, 9778669, 9772866, 9776259, 9756397, ...",[9778669],151570,False,,,,False,1976,45.0,100.0
3,153070,9777492.0,2023-05-24 07:13:14,26.0,100.0,1,"[9020783, 9778444, 9525589, 7213923, 9777397, ...",[9778628],151570,False,,,,False,1976,4.0,18.0
4,153071,9778623.0,2023-05-24 07:11:08,125.0,100.0,1,"[9777492, 9774568, 9565836, 9335113, 9771223, ...",[9777492],151570,False,,,,False,1976,26.0,100.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
232882,580099643,9769306.0,2023-05-18 10:01:05,121.0,100.0,3,"[9233208, 9771242, 9767697, 9514481, 9771065, ...",[9770886],2106715,False,,,,False,1416293,121.0,
232883,580099644,9770882.0,2023-05-18 10:05:07,176.0,100.0,3,"[9771065, 9767697, 9770886, 9758882, 9709817, ...",[9769306],2106715,False,,,,False,1416293,148.0,100.0
232884,580099645,9769306.0,2023-05-18 10:11:03,24.0,100.0,3,"[9771042, 9440508, 9486080, 9770997, 9120051, ...",[9771042],2106715,False,,,,False,1416293,4.0,
232885,580100695,9771242.0,2023-05-18 10:00:08,5.0,100.0,1,"[9440508, 9142581, 9769917, 9767697, 9514481, ...",[9767697],2110744,False,,,,False,747086,75.0,100.0


In [88]:
articles_df

Unnamed: 0,article_id,title,subtitle,last_modified_time,premium,body,published_time,image_ids,article_type,url,...,entity_groups,topics,category,subcategory,category_str,total_inviews,total_pageviews,total_read_time,sentiment_score,sentiment_label
0,3001353,Natascha var ikke den første,"Politiet frygter nu, at Nataschas bortfører ha...",2023-06-29 06:20:33,False,Sagen om den østriske Natascha og hendes bortf...,2006-08-31 08:06:45,[3150850],article_default,https://ekstrabladet.dk/krimi/article3001353.ece,...,[],"[Kriminalitet, Personfarlig kriminalitet]",140,[],krimi,,,,0.9955,Negative
1,3003065,Kun Star Wars tjente mere,Biografgængerne strømmer ind for at se 'Da Vin...,2023-06-29 06:20:35,False,Vatikanet har opfordret til at boykotte filmen...,2006-05-21 16:57:00,[3006712],article_default,https://ekstrabladet.dk/underholdning/filmogtv...,...,[],"[Underholdning, Film og tv, Økonomi]",414,"[433, 434]",underholdning,,,,0.8460,Positive
2,3012771,Morten Bruun fyret i SønderjyskE,FODBOLD: Morten Bruun fyret med øjeblikkelig v...,2023-06-29 06:20:39,False,Kemien mellem spillerne i Superligaklubben Søn...,2006-05-01 14:28:40,[3177953],article_default,https://ekstrabladet.dk/sport/fodbold/dansk_fo...,...,[],"[Erhverv, Kendt, Sport, Fodbold, Ansættelsesfo...",142,"[196, 199]",sport,,,,0.8241,Negative
3,3023463,Luderne flytter på landet,I landets tyndest befolkede områder skyder bor...,2023-06-29 06:20:43,False,Det frække erhverv rykker på landet. I den tyn...,2007-03-24 08:27:59,[3184029],article_default,https://ekstrabladet.dk/nyheder/samfund/articl...,...,[],"[Livsstil, Erotik]",118,[133],nyheder,,,,0.7053,Neutral
4,3032577,Cybersex: Hvornår er man utro?,En flirtende sms til den flotte fyr i regnskab...,2023-06-29 06:20:46,False,"De fleste af os mener, at et tungekys er utros...",2007-01-18 10:30:37,[3030463],article_default,https://ekstrabladet.dk/sex_og_samliv/article3...,...,[],"[Livsstil, Partnerskab]",565,[],sex_og_samliv,,,,0.9307,Neutral
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20733,9803492,Vilde billeder: Vulkan i udbrud i ferieparadis,Der er gang i vulkanen på Hawaiis største ø,2023-06-29 06:49:26,False,Det spyer med lava fra vulkanen Kilauea på Haw...,2023-06-08 05:49:20,"[9803493, 9803494, 9803495, 9803495, 9803494]",article_default,https://ekstrabladet.dk/nyheder/samfund/vilde-...,...,"[LOC, LOC, PER, ORG, ORG]","[Katastrofe, Vejr, Større katastrofe]",118,[133],nyheder,535989.0,100120.0,4112624.0,0.6095,Neutral
20734,9803505,Flyvende Antonsen knuser topspiller,"Verdens nummer syv, Chou Tien-Chen, fik ikke e...",2023-06-29 06:49:26,False,Anders Antonsen har holdt pause fra badmintonb...,2023-06-08 05:54:06,[9803516],article_default,https://ekstrabladet.dk/sport/anden_sport/badm...,...,"[PER, PROD, PER, PER, MISC, MISC, PER, PER, LO...","[Kendt, Begivenhed, Sport, Ketcher- og batspor...",142,"[327, 330]",sport,13320.0,959.0,55691.0,0.8884,Positive
20735,9803525,"Dansk skuespiller: - Jeg nægtede, at jeg var syg",Julie R. Ølgaard fik akut kejsersnit og fødte ...,2023-06-29 06:49:26,False,"Mens hun lå søvnløs, lød kakofonien fra baggår...",2023-06-08 06:45:46,"[9803518, 9803519, 9803520, 9803521, 9803522, ...",article_default,https://ekstrabladet.dk/underholdning/dkkendte...,...,"[PER, PROD, PER, PER, PER, PER, MISC]","[Kendt, Livsstil, Familieliv, Underholdning, F...",414,[425],underholdning,315391.0,50361.0,2550671.0,0.7737,Negative
20736,9803560,Så slemt er det: 14.000 huse er oversvømmet,Tusindvis af huse står under vand i Kherson-re...,2023-06-29 06:49:26,False,Et område på omkring 600 kvadratkilometer står...,2023-06-08 06:25:42,,article_default,https://ekstrabladet.dk/nyheder/saa-slemt-er-d...,...,"[LOC, LOC, LOC, PROD, PER, LOC, ORG, ORG, LOC]","[International politik, Katastrofe, Større kat...",118,[],nyheder,21318.0,1237.0,67514.0,0.9927,Negative


## Calculating the average score for each article

In [89]:

# Extract unique article IDs from article_ids_inview and article_ids_clicked
article_ids_inview = set(article_id for sublist in interactions_df['article_ids_inview'] for article_id in sublist)
article_ids_clicked = set(article_id for sublist in  interactions_df['article_ids_clicked'] for article_id in sublist)

# Combine all unique article IDs
all_article_ids = article_ids_inview.union(article_ids_clicked)

# Convert to list for splitting
all_article_ids = list(all_article_ids)

# Split the articles
train_article_ids, test_article_ids = train_test_split(all_article_ids, test_size=0.2, random_state=42)

# Sets give constant-time membership tests; the ID lists above stay unchanged
train_id_set = set(train_article_ids)
test_id_set = set(test_article_ids)

# Keep impressions that show at least one article from the respective split
train_interactions = interactions_df[interactions_df['article_ids_inview'].apply(lambda x: any(article in train_id_set for article in x))]
test_interactions = interactions_df[interactions_df['article_ids_inview'].apply(lambda x: any(article in test_id_set for article in x))]

# Ensure no overlap: drop training impressions that show or click a test article
train_interactions = train_interactions[
    ~train_interactions.apply(lambda x: any(article_id in test_id_set for article_id in x['article_ids_clicked']) or any(article_id in test_id_set for article_id in x['article_ids_inview']), axis=1)
]

# NOTE: the test-side filter is stored in `filtered_interactions`, so `test_interactions`
# itself still contains impressions that also show training articles
filtered_interactions = test_interactions[
    ~test_interactions.apply(lambda x: any(article_id in train_id_set for article_id in x['article_ids_clicked']) or any(article_id in train_id_set for article_id in x['article_ids_inview']), axis=1)
]



# Print the number of articles and interactions in each dataset
print(f'Number of total articles: {len(all_article_ids)}')
print(f'Number of articles in training set: {len(train_article_ids)}')
print(f'Number of articles in test set: {len(test_article_ids)}')
print(f'Number of total interactions: {len(interactions_df)}')
print(f'Number of interactions in training set: {len(train_interactions)}')
print(f'Number of interactions in test set: {len(test_interactions)}')


Number of total articles: 3995
Number of articles in training set: 3196
Number of articles in test set: 799
Number of total interactions: 232887
Number of interactions in training set: 43456
Number of interactions in test set: 189431
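
One way to reason about the split sizes above is to assign an impression to a split only when every article it shows or clicks belongs to that split. The sketch below does this on toy data; the helper `split_impressions` and the example impressions are hypothetical, and it assumes every article ID belongs to exactly one of the two ID sets.

```python
def split_impressions(impressions, train_ids, test_ids):
    """Assign an impression to a split only if all articles it touches belong to that split."""
    train_ids, test_ids = set(train_ids), set(test_ids)
    train, test, mixed = [], [], []
    for impression in impressions:
        touched = set(impression['article_ids_inview']) | set(impression['article_ids_clicked'])
        if touched <= train_ids:
            train.append(impression)
        elif touched <= test_ids:
            test.append(impression)
        else:
            # Shows articles from both splits; using it would leak across the split.
            mixed.append(impression)
    return train, test, mixed

# Toy impressions, for illustration only.
toy_impressions = [
    {'article_ids_inview': [1, 2], 'article_ids_clicked': [1]},
    {'article_ids_inview': [3, 4], 'article_ids_clicked': [4]},
    {'article_ids_inview': [1, 3], 'article_ids_clicked': [3]},
]
train, test, mixed = split_impressions(toy_impressions, train_ids=[1, 2], test_ids=[3, 4])
```

With news impressions that usually mix many articles, the `mixed` bucket can be large, which is worth checking before trusting either split.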


In [90]:
print(articles_df.shape[0], len(train_article_ids))

train_articles_df = articles_df[articles_df['article_id'].isin(train_article_ids)]
test_articles_df = articles_df[articles_df['article_id'].isin(test_article_ids)]

print(train_articles_df.shape[0])

20738 3196
3196


In [91]:
# 1. Train model on train behaviors -> get a result for some articles
# 2. Test model on validation behaviors -> predict values for articles in there

inview_flat = [item for sublist in train_interactions['article_ids_inview'] for item in sublist]
clicked_flat = [item for sublist in train_interactions['article_ids_clicked'] for item in sublist]
    
inview_counts = Counter(inview_flat)
clicked_counts = Counter(clicked_flat)

inview_df = pd.DataFrame(list(inview_counts.items()), columns=['article_id', 'inview_count'])
clicked_df = pd.DataFrame(list(clicked_counts.items()), columns=['article_id', 'clicked_count'])

# Merge the counts DataFrames
counts_df = pd.merge(inview_df, clicked_df, on='article_id', how='left').fillna(0)

# Calculate the ratio
counts_df['clicks_per_view'] = counts_df['clicked_count'] / counts_df['inview_count']

# Merge with the articles DataFrame
train_articles_df = pd.merge(train_articles_df, counts_df[['article_id', 'clicks_per_view']], on='article_id', how='left').fillna(0)

articles_with_clicks_count = train_articles_df.loc[train_articles_df['clicks_per_view'] > 0].shape[0]

print(f'Train articles with at least 1 click count: {articles_with_clicks_count}') 
print(f'Train articles with 0 clicks: {train_articles_df.shape[0] - articles_with_clicks_count}') 

Train articles with at least 1 click count: 1073
Train articles with 0 clicks: 2123
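
The same clicks-per-view ratio can also be computed without `Counter` by exploding the list columns. The following is a sketch on a toy frame with made-up impressions; it is meant to show the arithmetic, not to replace the cell above.

```python
import pandas as pd

# Toy impressions, for illustration only.
toy_interactions = pd.DataFrame({
    'article_ids_inview': [[1, 2, 3], [2, 3], [1, 3]],
    'article_ids_clicked': [[2], [3], [3]],
})

# One row per shown / clicked article, then count occurrences per article ID.
inview_count = toy_interactions['article_ids_inview'].explode().astype('int64').value_counts()
clicked_count = toy_interactions['article_ids_clicked'].explode().astype('int64').value_counts()

# Articles that were shown but never clicked get a click count of 0.
clicked_count = clicked_count.reindex(inview_count.index, fill_value=0)

# score = Y / X per article
clicks_per_view = clicked_count / inview_count
```

Here article 3 is shown three times and clicked twice, article 2 is shown twice and clicked once, and article 1 is never clicked.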


## Building features

### Article-based features

In [92]:
article_feature_names = ['premium','article_type', 'category_str', 'total_inviews', 'total_pageviews', 'total_read_time', 'sentiment_score']

In [93]:
train_article_features = train_articles_df[article_feature_names]
train_article_features = train_article_features.astype({"category_str": 'category', "article_type": 'category'})
train_labels = train_articles_df['clicks_per_view']
train_article_features.dtypes

premium                bool
article_type       category
category_str       category
total_inviews       float64
total_pageviews     float64
total_read_time     float32
sentiment_score     float32
dtype: object

In [97]:
test_article_features = test_articles_df[article_feature_names]
test_article_features = test_article_features.astype({"category_str": 'category', "article_type": 'category'})
test_article_features

Unnamed: 0,premium,article_type,category_str,total_inviews,total_pageviews,total_read_time,sentiment_score
1749,True,article_default,forbrug,,,,0.6123
2214,True,article_default,forbrug,,,,0.5245
2332,True,article_default,underholdning,,,,0.7488
2364,True,article_default,underholdning,,,,0.6074
2534,False,article_default,sex_og_samliv,,,,0.9049
...,...,...,...,...,...,...,...
19032,False,article_default,sport,386926.0,53304.0,4070420.0,0.9834
19041,False,article_default,underholdning,751298.0,169396.0,10188939.0,0.5267
19053,False,article_default,nyheder,231786.0,20536.0,2192031.0,0.9876
19059,False,article_default,nyheder,271255.0,61424.0,1578190.0,0.9334


In [95]:
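# Note: `param` is not passed to the XGBRegressor created in the next cell,
# which uses its default objective (reg:squarederror)
param = {'objective': 'reg:squarederror'}
param['nthread'] = 4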

In [98]:
model = xgb.XGBRegressor(enable_categorical=True)
# Optional: estimate the mean absolute error with repeated k-fold cross-validation
# cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
# scores = cross_val_score(model, train_article_features, train_labels, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
# scores = np.absolute(scores)  # the scorer returns negated errors

# Fit on the training articles
model.fit(train_article_features, train_labels)
# Predict a fitness score for each test article
fitness_scores = model.predict(test_article_features)
# assign() returns a new DataFrame, which avoids writing into a slice of articles_df
test_articles_df = test_articles_df.assign(fitness=fitness_scores)
test_articles_df


Unnamed: 0,article_id,title,subtitle,last_modified_time,premium,body,published_time,image_ids,article_type,url,...,topics,category,subcategory,category_str,total_inviews,total_pageviews,total_read_time,sentiment_score,sentiment_label,fitness
1749,4918926,Stjernekokkenes nemme trick til bedre smag,Nogle af Danmarks største madgenier giver dig ...,2023-06-29 07:14:22,True,Du behøver ikke at gå på restaurant for at få ...,2014-11-25 07:53:06,"[4668258, 4476036, 4476047, 4919602, 4476048, ...",article_default,https://ekstrabladet.dk/forbrug/stjernekokkene...,...,"[Livsstil, Mad og drikke]",457,[],forbrug,,,,0.6123,Neutral,0.171231
2214,5986757,Sådan taber du et kilo rent fedt om ugen,Træningsfysiolog Henrik Duer guider dig i kamp...,2023-06-29 07:24:27,True,Mange sukker efter at smide et par kilo eller ...,2016-03-09 16:44:39,[5986942],article_default,https://ekstrabladet.dk/forbrug/sundhed/saadan...,...,"[Livsstil, Krop og velvære, Mad og drikke]",457,[475],forbrug,,,,0.5245,Neutral,0.160748
2332,6141222,Her er de danske film med ÆGTE sexscener,Der bliver tit antydet sex på film – men fakti...,2023-06-29 07:26:05,True,I flere af filmene blev man nødt til at indkal...,2016-07-18 13:11:09,"[3923988, 6141357, 5550315, 3923988, 5456150, ...",article_default,https://ekstrabladet.dk/underholdning/her-er-d...,...,"[Kendt, Livsstil, Underholdning, Film og tv, E...",414,[],underholdning,,,,0.7488,Neutral,0.160867
2364,6211702,Sådan gik det Harry Potter-stjernerne: En var ...,"Mange fik en stor karriere, men nogle gik til ...",2023-06-29 07:26:43,True,"Hvis der er to fra Potter-universet, man virke...",2016-07-29 13:38:55,"[6212347, 6211720, 6211725, 6211724, 6211738, ...",article_default,https://ekstrabladet.dk/underholdning/saadan-g...,...,"[Kendt, Underholdning, Film og tv]",414,[],underholdning,,,,0.6074,Negative,0.111346
2534,6509487,Stort galleri: De bedste Side 9-piger fra 2016,Josephine Divine vandt konkurrencen om at bliv...,2023-06-29 07:29:27,False,Stemningen er noget ganske særligt. På en og s...,2017-01-28 06:37:52,"[6509237, 6508602, 6509530, 6509240, 6508603, ...",article_default,https://ekstrabladet.dk/sex_og_samliv/stort-ga...,...,"[Kendt, Livsstil, Begivenhed, Erotik, Underhol...",565,[],sex_og_samliv,,,,0.9049,Positive,0.106140
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19032,9779713,Hylder Clara efter ny opvisning: - Vi kan till...,Med to solide sejre i French Open-kvalifikatio...,2023-06-29 06:49:03,False,Hun ligger i øjeblikket på 126.-pladsen på WTA...,2023-05-25 05:51:21,"[9721475, 9779636]",article_default,https://ekstrabladet.dk/sport/anden_sport/tenn...,...,"[Kendt, Begivenhed, Sport, Ketcher- og batspor...",142,"[327, 349]",sport,386926.0,53304.0,4070420.0,0.9834,Positive,0.087041
19041,9779777,Gennemsigtig kjole på den røde løber: Derfor g...,Twerk Queen er ikke bange for at vise sig frem...,2023-06-29 06:49:03,False,Louise 'Twerk Queen' Kjølsen er blevet kendt f...,2023-05-24 20:48:35,"[9779785, 9779785]",article_default,https://ekstrabladet.dk/underholdning/dkkendte...,...,"[Kendt, Livsstil, Underholdning, Film og tv]",414,[425],underholdning,751298.0,169396.0,10188939.0,0.5267,Positive,0.195932
19053,9779956,Bagmænd bag angreb i Belgorod varsler mere af ...,Lederen af Ruslands Frivillige Korps advarer o...,2023-06-29 06:49:03,False,"Lederen af en russisk milits, der i denne uge ...",2023-05-25 03:22:28,[9779965],article_default,https://ekstrabladet.dk/nyheder/krigogkatastro...,...,"[Politik, International politik, Konflikt og k...",118,[127],nyheder,231786.0,20536.0,2192031.0,0.9876,Negative,0.091336
19059,9780039,Signalfejl gav forsinkelser og aflysninger mel...,En signalfejl var torsdag morgen skyld i forlæ...,2023-06-29 06:49:03,False,Den kollektive morgentrafik for DSB var torsda...,2023-05-25 03:16:23,[8702088],article_default,https://ekstrabladet.dk/nyheder/samfund/signal...,...,"[Transportmiddel, Katastrofe, Mindre ulykke, S...",118,[133],nyheder,271255.0,61424.0,1578190.0,0.9334,Negative,0.115895


## Predict clicked articles

In [111]:
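def get_best_rec(article_ids_inview):
    # Fitness of the in-view articles that are present in test_articles_df
    # (ordered as in test_articles_df, not as in article_ids_inview)
    fitness_scores = test_articles_df[test_articles_df['article_id'].isin(article_ids_inview)]['fitness']
    # NOTE: a non-zero fitness sum takes this branch, so the pick is uniformly random
    # and the fitness values do not influence it
    if sum(fitness_scores):
        return random.choice(article_ids_inview)
    # Reached only when the fitness sum is zero
    return random.choices(article_ids_inview, weights=fitness_scores, k=1)[0]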
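
For a pick that is actually driven by fitness, the weights need to line up one-to-one with the in-view list. The sketch below is one possible variant, not the function used for the results in this notebook: `pick_by_fitness` and the `fitness_by_id` mapping are hypothetical names, articles without a known fitness get weight 0, and negative predictions are clamped to 0.

```python
import random

def pick_by_fitness(article_ids_inview, fitness_by_id, rng=random):
    """Sample one in-view article with probability proportional to its fitness."""
    candidates = list(article_ids_inview)
    # Align weights with the candidate order; unknown or negative fitness counts as 0.
    weights = [max(fitness_by_id.get(article_id, 0.0), 0.0) for article_id in candidates]
    if sum(weights) == 0:
        # No usable fitness information: fall back to a uniform choice.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Toy fitness values, for illustration only.
toy_fitness = {101: 0.05, 102: 0.20, 103: 0.10}
picked = pick_by_fitness([101, 102, 103, 104], toy_fitness)
```

In this notebook the mapping could be built once with `dict(zip(test_articles_df['article_id'], test_articles_df['fitness']))`, which also avoids filtering the DataFrame for every impression.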

In [113]:
tqdm.pandas()

test_interactions = test_interactions.assign(predict=test_interactions['article_ids_inview'].progress_apply(get_best_rec))

100%|██████████| 189431/189431 [01:01<00:00, 3070.48it/s]


In [116]:
# Treat the first clicked article as the ground truth for each impression
test_interactions = test_interactions.assign(actual=test_interactions['article_ids_clicked'].apply(lambda ids: ids[0]))
accuracy = accuracy_score(test_interactions['actual'], test_interactions['predict'])
print(f'Accuracy: {accuracy}')

Accuracy: 0.1117874054405034
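
A useful reference point for this accuracy is the expected accuracy of picking one in-view article uniformly at random, which depends only on how many articles each impression shows. The helper `uniform_random_baseline` below is a hypothetical sketch on toy data; no baseline value for the real dataset is claimed here.

```python
def uniform_random_baseline(inview_lists, clicked_lists):
    """Expected accuracy of picking one in-view article uniformly at random."""
    probabilities = []
    for inview, clicked in zip(inview_lists, clicked_lists):
        target = clicked[0]  # same convention as the `actual` column
        candidates = list(inview)
        # An ID can appear more than once in the in-view list.
        probabilities.append(candidates.count(target) / len(candidates))
    return sum(probabilities) / len(probabilities)

# Toy impressions, for illustration only.
toy_inview = [[1, 2], [1, 2, 3, 4]]
toy_clicked = [[1], [4]]
baseline = uniform_random_baseline(toy_inview, toy_clicked)  # (1/2 + 1/4) / 2
```

Calling it with `test_interactions['article_ids_inview']` and `test_interactions['article_ids_clicked']` would give the figure to compare the model accuracy against.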
