# Collaborative Filtering

An experiment with collaborative filtering for generating custom [Bluesky](https://bsky.app) feeds.

In [47]:
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split

# load the exported post data
df = pd.read_csv("likes.csv")

df.head()

| | text | likes | reposts | createdAt | uri | user | hasLiked |
|---|---|---|---|---|---|---|---|
| 0 | @jeffjarvis.bsky.social @leolaporte.me This WP... | 1 | 0 | 2023-05-13T18:10:07.038Z | at://did:plc:o4zsmmahxrsquyfpdaixrqg6/app.bsky... | did:plc:6y2b4lqfw2j3oyycegx3ey7q | False |
| 1 | I’ll never forget sitting in the audience at a... | 2 | 1 | 2023-05-12T21:50:49.734Z | at://did:plc:fv256ijb3ftgb6hewcujshxw/app.bsky... | did:plc:6y2b4lqfw2j3oyycegx3ey7q | False |
| 2 | I once wrote a paper about protocols instead o... | 286 | 50 | 2023-05-01T07:02:17.157Z | at://did:plc:cak4klqoj3bqgk5rj6b4f5do/app.bsky... | did:plc:6y2b4lqfw2j3oyycegx3ey7q | False |
| 3 | My take…\n\nhttp://scripting.com/2023/05/12/12... | 1 | 1 | 2023-05-13T00:30:29.775Z | at://did:plc:oety7qbfx7x6exn2ytrwikmr/app.bsky... | did:plc:6y2b4lqfw2j3oyycegx3ey7q | False |
| 4 | Here's the article!! https://www.theatlantic.c... | 1 | 0 | 2023-05-12T19:21:11.735Z | at://did:plc:qaqh5r6sxs62ykbzki4tcyad/app.bsky... | did:plc:6y2b4lqfw2j3oyycegx3ey7q | False |


In [49]:
# df["createdAt"] = pd.to_datetime(df["createdAt"])

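# pivot into a post-by-user matrix: one row per post uri, one column per user
# note: each cell holds the post's total like count ("likes"), not a per-user flag
# df.pivot raises ValueError if any (uri, user) pair appears more than once
df_pivot = df.pivot(index="uri", columns="user", values="likes")

# flag every row as not yet consumed (assigning False already yields a bool column)
df["hasConsumed"] = False

# (uri, user) pairs absent from the data come out as NaN; treat them as 0
df_pivot = df_pivot.fillna(0)

# hold out 20% of the posts (rows); there is no separate target, so split the matrix only
X_train, X_test = train_test_split(df_pivot, test_size=0.2, random_state=42)

# learn a 2-component low-rank projection from the training rows
svd = TruncatedSVD(n_components=2, n_iter=7, random_state=42)
svd.fit(X_train)

# project the held-out posts onto the two components (a projection, not a ranking)
X_test_svd = svd.transform(X_test)

# label the two component coordinates and index them by post uri
recommendations = pd.DataFrame(X_test_svd, index=X_test.index, columns=["x", "y"])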

recommendations.head()

| uri | x | y |
|---|---|---|
| at://did:plc:buofnbcavecxm3kr6x5npusi/app.bsky.feed.post/3jvftj72za22y | 0.0 | 0.0 |
| at://did:plc:pdfj2bwwtcism3cenqik7g43/app.bsky.feed.post/3jvfgjcoibs2m | 2.828781e-15 | 5.0 |
| at://did:plc:eihm7jax4vt7oiichmxdyfjq/app.bsky.feed.post/3jvn52xw76y2j | 8.317701e-17 | -7.492957e-16 |
| at://did:plc:qj7kf4af24waqlwiesconf4n/app.bsky.feed.post/3jqggyua6vk2w | -1.064216e-16 | -1.042623e-15 |
| at://did:plc:mbwlzlpc5bxj62ezllqhm2sn/app.bsky.feed.post/3jvmbo2lw7u26 | 1.051251e-14 | -5.379939e-14 |
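The pivot above fills each (uri, user) cell with the post's total like count, so the matrix does not record which user liked which post. A minimal sketch of one alternative follows, assuming `hasLiked` marks whether `user` liked the post (the source does not confirm this). The `toy` frame, its rows, and the `liked` and `interactions` names are made up for illustration. `pivot_table` is used instead of `pivot` because it aggregates duplicate (uri, user) pairs rather than raising.

```python
import pandas as pd

# Toy stand-in for likes.csv: column names follow the frame shown earlier,
# the rows are invented for illustration only.
toy = pd.DataFrame(
    {
        "uri": ["post/a", "post/a", "post/b", "post/c", "post/c"],
        "user": ["alice", "bob", "alice", "bob", "bob"],  # last row repeats a pair
        "likes": [286, 286, 2, 1, 1],
        "hasLiked": [True, False, False, True, True],
    }
)

# Assumption: hasLiked is the per-user signal. Encode it as 0/1.
toy["liked"] = toy["hasLiked"].astype(int)

# aggfunc="max" collapses duplicate (uri, user) pairs; fill_value=0 marks
# pairs that never appear in the data as "not liked".
interactions = toy.pivot_table(
    index="uri",
    columns="user",
    values="liked",
    aggfunc="max",
    fill_value=0,
).astype(int)

print(interactions)
```

With a 0/1 matrix like this, a zero means "no recorded like", which is a weaker statement than "disliked"; whether that distinction matters depends on how the data was collected.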


In [55]:
# print the text of the first 10 held-out posts (the frame is not sorted by any score)
for uri in recommendations.head(10).index:
    print(df.loc[df["uri"] == uri, "text"].iloc[0])

We got Craig Newmark joining the @web0.bsky.social club forum! Come hang out with us.

We're going back to the future with Web0!!

Get in!

https://web0.discourse.group/invites/CkN932YeMb
Yeah! sorted out my domain on here
Ship of Theseus
Hey jack ;D
You “can’t” put sauce on bread? Please evict the cops from your mind!
Happy birthday! That FIT
Big woman rat czar vibes
😂
My Pilates class has had a sub the last 3weeks who gives the most severe “would never live in NYC” vibes so I’ve been v curious about her and yesterday I saw her, miles away from our gym, leaving a building with her Maltese and it felt like scratching an itch in the part of your back you can’t reach
I’ve arrived
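The cells above project held-out posts onto two SVD components but never rank them, so the printed posts are simply the first ten rows of the test split. One common way to turn a truncated SVD into recommendations is to reconstruct the interaction matrix from the low-rank factors and, for each user, rank the posts they have not liked by reconstructed score. The sketch below illustrates that idea under stated assumptions: the 0/1 matrix, the user and post names, and the `recommend` helper are all hypothetical, and rows are users here (the transpose of the pivot layout used in this notebook).

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Hypothetical 0/1 interaction matrix: rows are users, columns are posts.
interactions = pd.DataFrame(
    [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
    ],
    index=["alice", "bob", "carol", "dave"],
    columns=["post/a", "post/b", "post/c", "post/d"],
)

svd = TruncatedSVD(n_components=2, n_iter=7, random_state=42)
user_factors = svd.fit_transform(interactions.to_numpy())

# Low-rank reconstruction: each cell is a predicted affinity score.
scores = pd.DataFrame(
    svd.inverse_transform(user_factors),
    index=interactions.index,
    columns=interactions.columns,
)


def recommend(user, n=2):
    """Return up to n posts the user has not liked yet, best score first."""
    unseen = scores.loc[user][interactions.loc[user] == 0]
    return unseen.sort_values(ascending=False).head(n).index.tolist()


print(recommend("alice"))  # ['post/c', 'post/d']
```

In this toy data, alice's likes overlap with bob's, and bob also liked `post/c`, so `post/c` ranks ahead of `post/d` for her. Applying the same idea to the real data would need a per-user interaction signal rather than the total like count.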