# Recommender System

In [26]:
#!pip install -U pip setuptools wheel
#!pip install -U spacy
#!pip install -U scikit-learn
#!python -m spacy download en_core_web_lg
#!python -m spacy download de_core_news_lg

import pandas as pd
import spacy
import os
from sklearn.feature_extraction.text import TfidfVectorizer

nlp = spacy.load("en_core_web_lg")
nlp_germ = spacy.load("de_core_news_lg")


### Load prepared Dataset

In [102]:
filename = "mastodon.social_toots.csv"
path= "../scraper/datasets"
data = pd.read_csv(os.path.join(path, filename), sep=";")
data.head()
data.set_index("id")

0    たまに重くにゃって`S

### Select toots in English and German

In [86]:
mask_language = (data["language"] == "en") | (data["language"] == "de")
data = data[mask_language]
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6464 entries, 3 to 11998
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   id                      6464 non-null   int64  
 1   created_at              6464 non-null   object 
 2   in_reply_to_id          378 non-null    float64
 3   in_reply_to_account_id  380 non-null    float64
 4   sensitive               6464 non-null   bool   
 5   spoiler_text            179 non-null    object 
 6   visibility              6464 non-null   object 
 7   language                6464 non-null   object 
 8   uri                     6464 non-null   object 
 9   url                     6464 non-null   object 
 10  replies_count           6464 non-null   int64  
 11  reblogs_count           6464 non-null   int64  
 12  favourites_count        6464 non-null   int64  
 13  edited_at               170 non-null    object 
 14  content                 6464 non-null  

In [63]:
# Drop entries that share the same toot id; copy so new columns can be added safely later
test_toot_df = data.drop_duplicates(subset="id", keep="first").copy()
test_toot_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6464 entries, 3 to 11998
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   id                      6464 non-null   int64  
 1   created_at              6464 non-null   object 
 2   in_reply_to_id          378 non-null    float64
 3   in_reply_to_account_id  380 non-null    float64
 4   sensitive               6464 non-null   bool   
 5   spoiler_text            179 non-null    object 
 6   visibility              6464 non-null   object 
 7   language                6464 non-null   object 
 8   uri                     6464 non-null   object 
 9   url                     6464 non-null   object 
 10  replies_count           6464 non-null   int64  
 11  reblogs_count           6464 non-null   int64  
 12  favourites_count        6464 non-null   int64  
 13  edited_at               170 non-null    object 
 14  content                 6464 non-null  

### Recommender System for local timeline

- Step 1. Get relevant toots depending on content after selecting the interests (after registration) from people in local timeline
- Step 2. Get toots from people you follow 
- Step 3. Get persons with similar interests (who to follow)
- Step 4. Get toots by hashtags (filter hashtags by interests)
- Step 5. Mix data
- Step 6. Rank the toots in a ranking system and sort them descending
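
Steps 5 and 6 can be sketched roughly as follows. This is only a minimal illustration, not the notebook's implementation: the function `mix_and_rank` and the dictionary keys `id` and `score` are hypothetical names chosen for the example, and the toy scores are made up.

```python
def mix_and_rank(*candidate_lists):
    """Merge candidate toots from several sources and sort them by score (descending).

    Each candidate is a dict with the hypothetical keys "id" and "score".
    If a toot appears in more than one source, its highest score is kept.
    """
    best_by_id = {}
    for candidates in candidate_lists:
        for toot in candidates:
            current = best_by_id.get(toot["id"])
            if current is None or toot["score"] > current["score"]:
                best_by_id[toot["id"]] = toot
    return sorted(best_by_id.values(), key=lambda t: t["score"], reverse=True)


# Toy candidates standing in for the results of Steps 1, 2 and 4.
by_content = [{"id": 1, "score": 0.30}, {"id": 2, "score": 0.10}]
by_following = [{"id": 2, "score": 0.25}, {"id": 3, "score": 0.20}]
by_hashtag = [{"id": 4, "score": 0.05}]

timeline = mix_and_rank(by_content, by_following, by_hashtag)
print([t["id"] for t in timeline])  # [1, 2, 3, 4]
```

Keeping the highest score for a toot that shows up in several sources is one possible design choice; summing the scores would instead reward toots that match several criteria.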

##### Initial problems on setup: 
- missing toots in local timeline
- missing persons with similar interests
- missing toots from people you follow

##### Solutions:

- Create initial content in the local timeline (bot content)
- ....

#### Step 1: Get relevant toots depending on content after selecting the interests (after registration) from people in local timeline

In [64]:
interests = ["climbing", "gaming", "datascience", "politics", "math"] #create list of interests after login/registration

##### Similarity Check with spaCy

In [65]:
def lemmatize_text(text):
    """Function to lemmatize text data and remove the stopwords."""
    doc = nlp(text)
    
    # Lemmatization and removal of stop words
    processed_tokens = [token.lemma_ for token in doc if not token.is_stop]
    
    # Return the formatted text as a string
    processed_text = ' '.join(processed_tokens)
    
    return processed_text

In [66]:
# Create new column with lemmatized text
test_toot_df["content_lemma"] = test_toot_df["content"].apply(lemmatize_text)
test_toot_df.head()

Unnamed: 0,id,created_at,in_reply_to_id,in_reply_to_account_id,sensitive,spoiler_text,visibility,language,uri,url,...,account,media_attachments,mentions,tags,emojis,card,poll,application,instance,content_lemma
3,110349884422547899,2023-05-11 11:46:49.481000+00:00,,,False,,public,en,https://mastodon.social/users/lizardsskintattoos/statuses/110349884422547899,https://mastodon.social/@lizardsskintattoos/110349884422547899,...,"{'id': 107144665432235983, 'username': 'lizardsskintattoos', 'acct': 'lizardsskintattoos', 'display_name': ""Lizard's Skin Tattoos"", 'locked': False, 'bot': False, 'discoverable': False, 'group': False, 'created_at': datetime.datetime(2021, 10, 22, 0, 0, tzinfo=tzutc()), 'note': '<p>LIZARD&#39;S SKIN TATTOOS is anything but a conventional tattoo shop. Established by Niloy Das and Punam Barua Das, here the specialists offer a niche range of services. Whether customers want their obscure and unique thoughts to be inked or are looking for suggestions before getting their body art done, the experts at LIZARD&#39;S SKIN TATTOOS will provide the best-illustrated help to make the customers artistic vision a reality.</p><p>Visit: <a href=""https://www.lizardsskintattoos.com/"" target=""_blank"" rel=""nofollow noopener noreferrer""><span class=""invisible"">https://www.</span><span class="""">lizardsskintattoos.com/</span><span class=""invisible""></span></a></p>', 'url': 'https://mastodon.social/@lizardsskintattoos', 'avatar': 'https://files.mastodon.social/accounts/avatars/107/144/665/432/235/983/original/36f37d1e34804a61.png', 'avatar_static': 'https://files.mastodon.social/accounts/avatars/107/144/665/432/235/983/original/36f37d1e34804a61.png', 'header': 'https://files.mastodon.social/accounts/headers/107/144/665/432/235/983/original/0d98a73dffe50094.png', 'header_static': 'https://files.mastodon.social/accounts/headers/107/144/665/432/235/983/original/0d98a73dffe50094.png', 'followers_count': 2, 'following_count': 0, 'statuses_count': 174, 'last_status_at': datetime.datetime(2023, 5, 11, 0, 0), 'noindex': False, 'emojis': [], 'roles': [], 'fields': []}","[{'id': 110349883078216931, 'type': 'image', 'url': 
'https://files.mastodon.social/media_attachments/files/110/349/883/078/216/931/original/d614c8bf7006e63a.jpg', 'preview_url': 'https://files.mastodon.social/media_attachments/files/110/349/883/078/216/931/small/d614c8bf7006e63a.jpg', 'remote_url': None, 'preview_remote_url': None, 'text_url': None, 'meta': {'original': {'width': 1000, 'height': 500, 'size': '1000x500', 'aspect': 2.0}, 'small': {'width': 678, 'height': 339, 'size': '678x339', 'aspect': 2.0}, 'focus': {'x': 0.0, 'y': 0.0}}, 'description': 'tattoo removal in kolkata near me\n', 'blurhash': 'UqG+H?01-oSiogWBayogIqozt6RjWBj[ofWB'}]",[],"[{'name': 'best', 'url': 'https://mastodon.social/tags/best'}, {'name': 'tattoo', 'url': 'https://mastodon.social/tags/tattoo'}, {'name': 'removal', 'url': 'https://mastodon.social/tags/removal'}, {'name': 'clinic', 'url': 'https://mastodon.social/tags/clinic'}, {'name': 'kolkata', 'url': 'https://mastodon.social/tags/kolkata'}, {'name': 'lasertattooremovalinkolkata', 'url': 'https://mastodon.social/tags/lasertattooremovalinkolkata'}, {'name': 'tattooremovalinkolkata', 'url': 'https://mastodon.social/tags/tattooremovalinkolkata'}, {'name': 'bestlasertattooremovalclinicinkolkata', 'url': 'https://mastodon.social/tags/bestlasertattooremovalclinicinkolkata'}, {'name': 'besttattooremovalclinicinkolkata', 'url': 'https://mastodon.social/tags/besttattooremovalclinicinkolkata'}, {'name': 'tattooremovalinkolkatanearme', 'url': 'https://mastodon.social/tags/tattooremovalinkolkatanearme'}]",[],,,"{'name': 'Web', 'website': None}",mastodon.social,Laser Tattoo Removal Kolkata : safe effective Solution \n\n look [ # best](https://mastodon.social / tag / good ) \n [ # tattoo](https://mastodon.social / tag / tattoo ) \n [ # removal](https://mastodon.social / tag / removal ) \n [ # clinic](https://mastodon.social / tag / clinic ) \n [ # kolkata](https://mastodon.social / tag / Kolkata ) ? safe effective laser \n tattoo removal service help achieve ink - free skin . 
goodbye \n unwanted tattoo expert team professional . \n\n Click : [ https://bit.ly/3WXGgxp](https://bit.ly/3WXGgxp ) \n\n [ # lasertattooremovalinkolkata](https://mastodon.social / tag / lasertattooremovalinkolkata ) \n [ # tattooremovalinkolkata](https://mastodon.social / tag / tattooremovalinkolkata ) \n [ # bestlasertattooremovalclinicinkolkata](https://mastodon.social / tag / bestlasertattooremovalclinicinkolkata ) \n [ # besttattooremovalclinicinkolkata](https://mastodon.social / tag / besttattooremovalclinicinkolkata ) \n [ # tattooremovalinkolkatanearme](https://mastodon.social / tag / tattooremovalinkolkatanearme ) \n\n
6,110349884397327123,2023-05-11 11:46:47+00:00,,,False,,public,en,https://news.ongii.com/users/bbcworld/statuses/110349884288072547,https://news.ongii.com/@bbcworld/110349884288072547,...,"{'id': 109361065991072567, 'username': 'bbcworld', 'acct': 'bbcworld@news.ongii.com', 'display_name': 'BBC World', 'locked': False, 'bot': True, 'discoverable': False, 'group': False, 'created_at': datetime.datetime(2022, 11, 17, 0, 0, tzinfo=tzutc()), 'note': '', 'url': 'https://news.ongii.com/@bbcworld', 'avatar': 'https://files.mastodon.social/cache/accounts/avatars/109/361/065/991/072/567/original/d7b7803aa5f515d7.png', 'avatar_static': 'https://files.mastodon.social/cache/accounts/avatars/109/361/065/991/072/567/original/d7b7803aa5f515d7.png', 'header': 'https://mastodon.social/headers/original/missing.png', 'header_static': 'https://mastodon.social/headers/original/missing.png', 'followers_count': 437, 'following_count': 0, 'statuses_count': 9631, 'last_status_at': datetime.datetime(2023, 5, 11, 0, 0), 'emojis': [], 'fields': []}",[],[],[],[],,,,mastodon.social,"RT @BBCBreaking : UK confirm send long - range storm shadow missile \n Ukraine , Kyiv prepare counter - offensive Russia … \n\n"
8,110349884324145120,2023-05-11 11:46:47+00:00,,,False,,public,en,https://me.dm/users/atheistrevolution/statuses/110349884298113565,https://me.dm/@atheistrevolution/110349884298113565,...,"{'id': 110028947376075126, 'username': 'atheistrevolution', 'acct': 'atheistrevolution@me.dm', 'display_name': 'Jack Vance', 'locked': False, 'bot': False, 'discoverable': False, 'group': False, 'created_at': datetime.datetime(2023, 3, 15, 0, 0, tzinfo=tzutc()), 'note': '<p>Blogger @ Atheist Revolution (<a href=""https://www.atheistrev.com/"" rel=""nofollow noopener noreferrer"" target=""_blank""><span class=""invisible"">https://www.</span><span class="""">atheistrev.com/</span><span class=""invisible""></span></a>). I write about atheism, humanism, skepticism, freethought, and other topics of interest.</p>', 'url': 'https://me.dm/@atheistrevolution', 'avatar': 'https://files.mastodon.social/cache/accounts/avatars/110/028/947/376/075/126/original/6ab77024b3318464.jpeg', 'avatar_static': 'https://files.mastodon.social/cache/accounts/avatars/110/028/947/376/075/126/original/6ab77024b3318464.jpeg', 'header': 'https://files.mastodon.social/cache/accounts/headers/110/028/947/376/075/126/original/a6f44a94279ff2eb.jpg', 'header_static': 'https://files.mastodon.social/cache/accounts/headers/110/028/947/376/075/126/original/a6f44a94279ff2eb.jpg', 'followers_count': 15, 'following_count': 28, 'statuses_count': 32, 'last_status_at': datetime.datetime(2023, 5, 11, 0, 0), 'emojis': [], 'fields': [{'name': 'Medium', 'value': '<a href=""https://medium.com/@atheistrevolution"" rel=""nofollow noopener noreferrer"" target=""_blank""><span class=""invisible"">https://</span><span class="""">medium.com/@atheistrevolution</span><span class=""invisible""></span></a>', 'verified_at': '2023-03-19T14:28:54.717+00:00'}]}",[],[],"[{'name': 'retail', 'url': 'https://mastodon.social/tags/retail'}, {'name': 'localbusiness', 'url': 'https://mastodon.social/tags/localbusiness'}, {'name': 'mississippi', 'url': 
'https://mastodon.social/tags/mississippi'}, {'name': 'shopping', 'url': 'https://mastodon.social/tags/shopping'}, {'name': 'business', 'url': 'https://mastodon.social/tags/business'}]",[],,,,mastodon.social,exodus brick - - mortar retail decade . \n think product category choice \n increase . think choice decrease . \n\n [ https://medium.com/@atheistrevolution/are-we-nearing-the-end-of-brick-and- \n mortar - retail - e29c7bae55e8](https://medium.com/@atheistrevolution / - we- \n near - - end - - brick - - mortar - retail - e29c7bae55e8 ) \n\n [ # retail](https://me.dm / tag / retail ) \n [ # localbusiness](https://me.dm / tag / LocalBusiness ) \n [ # mississippi](https://me.dm / tag / mississippi ) \n [ # shopping](https://me.dm / tag / shopping ) \n [ # business](https://me.dm / tag / business ) \n\n
9,110349884319932861,2023-05-11 11:46:47.907000+00:00,1.103499e+17,1.093585e+17,False,,public,en,https://mastodon.social/users/Incognitim/statuses/110349884319932861,https://mastodon.social/@Incognitim/110349884319932861,...,"{'id': 109358544043904637, 'username': 'Incognitim', 'acct': 'Incognitim', 'display_name': 'Incognitim', 'locked': False, 'bot': False, 'discoverable': False, 'group': False, 'created_at': datetime.datetime(2022, 11, 17, 0, 0, tzinfo=tzutc()), 'note': '<p><a href=""https://www.wired.com/2008/07/thomas-jeffersons-all-american-incognitum/"" target=""_blank"" rel=""nofollow noopener noreferrer""><span class=""invisible"">https://www.</span><span class=""ellipsis"">wired.com/2008/07/thomas-jeffe</span><span class=""invisible"">rsons-all-american-incognitum/</span></a></p><p>Not good at talking about myself, but you can get to know me by looking through my toots.😅<br />If I followed you, I probably did the same!</p><p>I like almost <a href=""https://mastodon.social/tags/AllTheThings"" class=""mention hashtag"" rel=""tag"">#<span>AllTheThings</span></a><br />I favourite frequently &amp; follow freely, &#39;cause that&#39;s what makes the fediverse fun!💚</p><p>&quot;We can have democracy, or extreme wealth, but we can&#39;t have both.&quot;</p><p>Profile: COME &amp; TAKE IT yard sign (but w/ person reclining, reading a📖)<br />Header: sleepy black cat &amp; his🎃</p>', 'url': 'https://mastodon.social/@Incognitim', 'avatar': 'https://files.mastodon.social/accounts/avatars/109/358/544/043/904/637/original/851becc921f211cd.jpg', 'avatar_static': 'https://files.mastodon.social/accounts/avatars/109/358/544/043/904/637/original/851becc921f211cd.jpg', 'header': 'https://files.mastodon.social/accounts/headers/109/358/544/043/904/637/original/98d63c0d1e0a1e27.jpg', 'header_static': 'https://files.mastodon.social/accounts/headers/109/358/544/043/904/637/original/98d63c0d1e0a1e27.jpg', 'followers_count': 1189, 'following_count': 3408, 'statuses_count': 4646, 
'last_status_at': datetime.datetime(2023, 5, 11, 0, 0), 'noindex': True, 'emojis': [], 'roles': [], 'fields': [{'name': 'Arts', 'value': '📖🎶📷🎨📜⚖️🎭🤡🧙🦸🛸🕵️👻🎬\U0001faac', 'verified_at': None}, {'name': 'Science', 'value': '💻🧮🔭♻️🌍🧠🔬🧬📐🌱🩺🧪🧰🦠🤖👽', 'verified_at': None}, {'name': 'Critters', 'value': '🦣😸🐶🦝🦕🦭🐙🦏🐧🐿️🦋🐉🦍🐢🦥🐨🐐🐼🐳🐞🐝🕷️🦠', 'verified_at': None}, {'name': 'LeisurActiviHobbies', 'value': '📺📰📚🛏️🏀👨\u200d🍳🏋️📝📸🎤🕺☮️💟♋☯️🏓🖌️🥏🎾🕹️', 'verified_at': None}]}",[],[],[],[],,,"{'name': 'Web', 'website': None}",mastodon.social,Ken Jennings win lot money Equal Justice Initiative \n Celebrity Wheel Fortune ! 😅 \n\n
10,110349884293929776,2023-05-11 11:46:47.515000+00:00,,,False,,public,en,https://mastodon.social/users/webdev_discussions/statuses/110349884293929776,https://mastodon.social/@webdev_discussions/110349884293929776,...,"{'id': 108333375962034474, 'username': 'webdev_discussions', 'acct': 'webdev_discussions', 'display_name': 'Webdev Weekly', 'locked': False, 'bot': True, 'discoverable': True, 'group': False, 'created_at': datetime.datetime(2022, 5, 20, 0, 0, tzinfo=tzutc()), 'note': '<p>Articles, projects and tutorials about <a href=""https://mastodon.social/tags/JavaScript"" class=""mention hashtag"" rel=""tag"">#<span>JavaScript</span></a>, <a href=""https://mastodon.social/tags/CSS"" class=""mention hashtag"" rel=""tag"">#<span>CSS</span></a>, <a href=""https://mastodon.social/tags/Wasm"" class=""mention hashtag"" rel=""tag"">#<span>Wasm</span></a>, etc.</p><p>Weekly newsletter: <a href=""https://discu.eu/weekly/webdev"" target=""_blank"" rel=""nofollow noopener noreferrer""><span class=""invisible"">https://</span><span class="""">discu.eu/weekly/webdev</span><span class=""invisible""></span></a></p>', 'url': 'https://mastodon.social/@webdev_discussions', 'avatar': 'https://files.mastodon.social/accounts/avatars/108/333/375/962/034/474/original/91a23f5e9af93fde.png', 'avatar_static': 'https://files.mastodon.social/accounts/avatars/108/333/375/962/034/474/original/91a23f5e9af93fde.png', 'header': 'https://files.mastodon.social/accounts/headers/108/333/375/962/034/474/original/e1e3b25b971f44fa.jpeg', 'header_static': 'https://files.mastodon.social/accounts/headers/108/333/375/962/034/474/original/e1e3b25b971f44fa.jpeg', 'followers_count': 1761, 'following_count': 22, 'statuses_count': 2864, 'last_status_at': datetime.datetime(2023, 5, 11, 0, 0), 'noindex': False, 'emojis': [], 'roles': [], 'fields': [{'name': 'Newsletter', 'value': '<a href=""https://discu.eu/weekly/webdev"" target=""_blank"" rel=""nofollow noopener noreferrer me""><span 
class=""invisible"">https://</span><span class="""">discu.eu/weekly/webdev</span><span class=""invisible""></span></a>', 'verified_at': '2022-05-27T19:03:01.711+00:00'}, {'name': 'Other bots', 'value': '<a href=""https://discu.eu/social"" target=""_blank"" rel=""nofollow noopener noreferrer me""><span class=""invisible"">https://</span><span class="""">discu.eu/social</span><span class=""invisible""></span></a>', 'verified_at': '2022-05-27T19:03:01.770+00:00'}]}",[],[],"[{'name': 'webdev', 'url': 'https://mastodon.social/tags/webdev'}]",[],"{'url': 'https://os-clock.web.app/', 'title': 'Clock Web App', 'description': 'The best Open Source Clock Web App that features Alarms, a timer, a stopwatch and many other intuitive features', 'language': 'en', 'type': 'link', 'author_name': '', 'author_url': '', 'provider_name': '', 'provider_url': '', 'html': '', 'width': 0, 'height': 0, 'image': None, 'embed_url': '', 'blurhash': None}",,"{'name': 'discu.eu', 'website': 'https://discu.eu'}",mastodon.social,feature add open source Clock PWA \n\n [ https://os-clock.web.app/](https://os-clock.web.app/ ) \n\n discussion : [ https://discu.eu/q/https://os- \n clock.web.app/](https://discu.eu/q/https://os-clock.web.app/ ) \n\n [ # webdev](https://mastodon.social / tag / webdev ) \n\n


In [68]:
def calculate_content_similarity_score(interests, toot_dataframe, sort_dataframe_by_content_similarity=True):
    """Function to calculate the similarity score between the interests and the toot content."""
    
    # Parse each interest once instead of once per toot
    interest_docs = [nlp(interest) for interest in interests]
    
    # Collect one averaged similarity score per toot
    similarity_scores = []
    for toot_content in toot_dataframe['content_lemma']:
        toot_doc = nlp(toot_content)
        
        # Sum the similarity between the toot content and every interest
        similarity_scores_sum = 0
        for interest_doc in interest_docs:
            similarity_scores_sum += toot_doc.similarity(interest_doc)
        
        # Calculate the average similarity score
        similarity_scores.append(similarity_scores_sum / len(interests))
    
    # Create a new DataFrame with the additional column content_similarity_score
    result_dataframe = toot_dataframe.copy()
    result_dataframe['content_similarity_score'] = similarity_scores
    
    if sort_dataframe_by_content_similarity:
        # Sort the DataFrame by the column content_similarity_score (descending) and reset the index
        result_dataframe.sort_values('content_similarity_score', ascending=False, inplace=True)
        result_dataframe.reset_index(drop=True, inplace=True)
    
    return result_dataframe
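
To make the averaging step concrete, here is a minimal sketch in plain Python. It assumes that the similarity between two documents is the cosine similarity of their vectors, which is spaCy's default behaviour for `Doc.similarity`. The helper names `cosine` and `mean_interest_similarity` and the 2-dimensional toy vectors are hypothetical and only serve the illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def mean_interest_similarity(toot_vector, interest_vectors):
    """Average cosine similarity between one toot vector and all interest vectors."""
    total = sum(cosine(toot_vector, vector) for vector in interest_vectors)
    return total / len(interest_vectors)


# Toy vectors (hypothetical); real word vectors have hundreds of dimensions.
interest_vectors = [[1.0, 0.0], [0.0, 1.0]]
print(mean_interest_similarity([1.0, 0.0], interest_vectors))  # 0.5
print(mean_interest_similarity([0.0, 0.0], interest_vectors))  # 0.0
```

The second call shows the edge case of a toot without a usable vector, for example an empty text after stop-word removal. The sketch returns 0.0 for it by convention.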

In [69]:
interests

['climbing', 'gaming', 'datascience', 'politics', 'math']

In [87]:
pd.set_option('display.max_colwidth', None)
similar_toots = calculate_content_similarity_score(interests, test_toot_df)



In [88]:
similar_toots[:5]

Unnamed: 0,id,created_at,in_reply_to_id,in_reply_to_account_id,sensitive,spoiler_text,visibility,language,uri,url,...,media_attachments,mentions,tags,emojis,card,poll,application,instance,content_lemma,content_similarity_score
0,110349742245305990,2023-05-11 11:10:40.024000+00:00,,,False,,public,en,https://mastodon.social/users/archonet/statuses/110349742245305990,https://mastodon.social/@archonet/110349742245305990,...,[],[],[],[],,,"{'name': 'Mastodon for Android', 'website': 'https://app.joinmastodon.org/android'}",mastodon.social,"fun fact : people harp gun , \n exactly GOP want focus actual core \n issue . country go gun , \n nigh - impossibility multitude factor , ram head \n wall instead focus actual bipartisan - achievable goal \n like robust healthcare system social safety net ACTUALLY \n work , play right hand . \n\n",0.327257
1,110349749932623341,2023-05-11 11:12:36+00:00,,,False,,public,en,https://chaos.social/users/nightlynx/statuses/110349749901505104,https://chaos.social/@nightlynx/110349749901505104,...,[],[],[],[],,,,mastodon.social,"stop learn programming language day ! \n\n Mastery topic require year . pick programming language \n mature break change minor version . , prefer \n grammar syntax like . ignore . language will \n fix . programming drive library module . \n language provide feature actually solve problem . \n\n",0.322633
2,110349663174558333,2023-05-11 10:50:33+00:00,1.103497e+17,1.101396e+17,True,"Mentalhealth, panic attacks, anxiety",public,en,https://mastodon.gamedev.place/users/pepe/statuses/110349663162894233,https://mastodon.gamedev.place/@pepe/110349663162894233,...,[],[],[],[],,,,mastodon.social,"2020 start anxiety issue space lot people . \n guess result stress pandemic fresh \n father , leader gamedev team manager game dev collective . area \n hit hard pandemic . \n\n",0.321296
3,110349713107166261,2023-05-11 11:03:15.411000+00:00,,,False,,public,en,https://mastodon.social/users/robc/statuses/110349713107166261,https://mastodon.social/@robc/110349713107166261,...,[],[],[],[],,,"{'name': 'Web', 'website': None}",mastodon.social,"funny , day feel like experience strong sense choice \n paralysis game want sit video . \n\n juggle Big List head - stuff want talk \n , competent \n able . \n\n",0.318607
4,110349695285998636,2023-05-11 10:58:41+00:00,1.103492e+17,234043.0,False,,public,en,https://mastodon.gamedev.place/users/sinbad/statuses/110349695145174426,https://mastodon.gamedev.place/@sinbad/110349695145174426,...,"[{'id': 110349695191270934, 'type': 'image', 'url': 'https://files.mastodon.social/cache/media_attachments/files/110/349/695/191/270/934/original/84abbdb43a7adaca.png', 'preview_url': 'https://files.mastodon.social/cache/media_attachments/files/110/349/695/191/270/934/small/84abbdb43a7adaca.png', 'remote_url': 'https://cdn.masto.host/mastodongamedevplace/media_attachments/files/110/349/674/675/125/758/original/0e2e01e36841c036.png', 'preview_remote_url': None, 'text_url': None, 'meta': {'focus': {'x': 0.0, 'y': 0.0}, 'original': {'width': 474, 'height': 607, 'size': '474x607', 'aspect': 0.7808896210873146}, 'small': {'width': 424, 'height': 543, 'size': '424x543', 'aspect': 0.7808471454880295}}, 'description': ""Picture of an ICL mainframe room from maybe the late 70's / early 80s. There are reel-to-reel tapes in the background, terminals in front, all in a snazzy white and salmon colour scheme. 3 people are there, and there are both flares and perms"", 'blurhash': 'UFG[yi?cIU.89Z4TaKRjxvofozt7?vtRIUW;'}]",[],[],[],,,,mastodon.social,"work computer large human climb , \n smart expose button press \n shut entire system \n\n learn ? \n\n",0.318301


### Interaction Score
The following section computes an interaction score, defined as the sum of a toot's interactions (favourites_count, replies_count, reblogs_count). The score is then normalized to the range 0 to 1.

In [89]:
def calculate_interaction_score(toot_df, sort_by_interaction_score=False):
    """Function to calculate the interaction score of a toot."""
    
    # Calculate the interaction score
    toot_df['interaction_score'] = toot_df['favourites_count'] + toot_df['replies_count'] + toot_df['reblogs_count']
    
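    # Normalize the interaction score to the value range [0, 1];
    # skip the division if no toot has any interactions, which would otherwise produce NaN
    max_interaction_score = toot_df['interaction_score'].max()
    if max_interaction_score > 0:
        toot_df['interaction_score'] = toot_df['interaction_score'] / max_interaction_score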
    
    if sort_by_interaction_score:
        # Sort the DataFrame according to the interaction score (descending)
        toot_df.sort_values('interaction_score', ascending=False, inplace=True)
        toot_df.reset_index(drop=True, inplace=True)
    
    return toot_df
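
The same calculation on a handful of toy toots, as a minimal sketch without pandas. The function name `interaction_scores` and the counts are made up for illustration; only the three column names come from the dataset.

```python
def interaction_scores(toots):
    """Sum the three interaction counts per toot and divide by the largest sum."""
    raw = [
        t["favourites_count"] + t["replies_count"] + t["reblogs_count"]
        for t in toots
    ]
    highest = max(raw, default=0)
    if highest == 0:
        # No interactions at all: return zeros instead of dividing by zero.
        return [0.0 for _ in raw]
    return [value / highest for value in raw]


toots = [
    {"favourites_count": 6, "replies_count": 1, "reblogs_count": 3},  # sum 10
    {"favourites_count": 2, "replies_count": 0, "reblogs_count": 3},  # sum 5
    {"favourites_count": 0, "replies_count": 0, "reblogs_count": 0},  # sum 0
]
print(interaction_scores(toots))  # [1.0, 0.5, 0.0]
```

Dividing by the maximum means that a single very popular toot pushes all other scores towards 0. A logarithmic scale would be one alternative if that turns out to be a problem.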

In [90]:
similar_toots = calculate_interaction_score(similar_toots, True)
similar_toots.head()

Unnamed: 0,id,created_at,in_reply_to_id,in_reply_to_account_id,sensitive,spoiler_text,visibility,language,uri,url,...,mentions,tags,emojis,card,poll,application,instance,content_lemma,content_similarity_score,interaction_score
0,110349626055347387,2023-05-11 10:41:07.110000+00:00,,,False,,public,en,https://mastodon.social/users/dansup/statuses/110349626055347387,https://mastodon.social/@dansup/110349626055347387,...,"[{'id': 354718, 'username': 'pixelfed', 'url': 'https://mastodon.social/@pixelfed', 'acct': 'pixelfed'}]","[{'name': 'pixelfed', 'url': 'https://mastodon.social/tags/pixelfed'}]",[],"{'url': 'https://fedidb.org/', 'title': 'FediDB - Developer Tools for ActivityPub', 'description': 'Developer Tools for ActivityPub', 'language': 'en', 'type': 'link', 'author_name': '', 'author_url': '', 'provider_name': '', 'provider_url': '', 'html': '', 'width': 0, 'height': 0, 'image': None, 'embed_url': '', 'blurhash': None}",,"{'name': 'Web', 'website': None}",mastodon.social,"wear lot hat , balance : \n\n \- pixelfed support ( matrix / discord ) \n \- pixelfed backend / web dev \n \- pixelfed mobile app dev \n \- pixelfed.(art|social ) admin \n \- pixelfed marketing \n \- [ https://fedidb.org](https://fedidb.org ) dev \n \- [ https://fediverse.info](https://fediverse.info ) dev \n\n related project , initiative outreach project . \n\n wrong , love , like \n [ @pixelfed](https://mastodon.social/@pixelfed ) dev slow . \n\n try hard prepare app public release month 🤞 \n [ # pixelfed](https://mastodon.social / tag / pixelfed ) \n\n",0.110268,1.0
1,110349759434840213,2023-05-11 11:15:02.317000+00:00,,,False,,public,de,https://mastodon.social/users/derpostillon/statuses/110349759434840213,https://mastodon.social/@derpostillon/110349759434840213,...,[],[],[],"{'url': 'https://www.der-postillon.com/2023/05/waisenhaus-muttertag.html', 'title': 'Woke-Wahnsinn! Waisenhaus will dieses Jahr keine Muttertagsgeschenke basteln', 'description': ' München (dpo) - Wie weit soll der woke Irrsinn noch gehen? Das katholische Waisenhaus St. Bartholomä in München (Traunwörter Allee 312, Tel...', 'language': 'de', 'type': 'link', 'author_name': 'Der Postillon', 'author_url': '', 'provider_name': 'Blogger', 'provider_url': '', 'html': '', 'width': 1600, 'height': 917, 'image': 'https://files.mastodon.social/cache/preview_cards/images/061/674/452/original/d794d18796d649c1.jpg', 'embed_url': '', 'blurhash': 'UIJaWKot~V};4pw}IAOt%e569ZR4p0VX%1o~'}",,"{'name': 'Buffer', 'website': 'https://buffer.com'}",mastodon.social,Woke - Wahnsinn ! Waisenhaus diese Jahr keine Muttertagsgeschenke basteln \n [ https://www.der-postillon.com/2023/05/waisenhaus- \n muttertag.html](https://www.der - postillon.com/2023/05 / waisenhaus- \n muttertag.html ) \n\n,-0.030085,0.573034
2,110349632868334707,2023-05-11 10:42:40+00:00,1.103496e+17,787625.0,False,,public,en,https://aus.social/users/screenbeard/statuses/110349632148005908,https://aus.social/@screenbeard/110349632148005908,...,[],[],[],,,,mastodon.social,"Oof . \n\n fathom female president , male nail technician , female doctor , \n female nurse ( course ) , male babysitter , gendere banana . \n matter time ask , swear way know nurse \n refer "" "" overwhelmed . \n\n",0.220813,0.505618
3,110349650223520953,2023-05-11 10:46:54+00:00,,,False,,public,en,https://tldr.nettime.org/users/tante/statuses/110349648850614548,https://tldr.nettime.org/@tante/110349648850614548,...,[],[],[],,,,mastodon.social,"tired "" AI "" thing . space \n bland milquetoast . \n\n Web3 / crypto batshit insane . "" AI "" corporate fan fiction \n ( fun horny ) \n\n",0.178658,0.505618
4,110349744236670561,2023-05-11 11:11:09+00:00,,,False,,public,en,https://mas.to/users/carnage4life/statuses/110349744160087706,https://mas.to/@carnage4life/110349744160087706,...,[],[],[],,,,mastodon.social,Twitter dm encrypt way Tesla car self drive . \n\n,0.229289,0.47191


### TF-IDF and Cosine Similarity

In [124]:
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import linear_kernel

# Initialize an instance of tf-idf Vectorizer
tfidf_vectorizer = TfidfVectorizer()

# Generate the tf-idf vectors for the corpus
content = similar_toots['content'] 
tfidf_matrix = tfidf_vectorizer.fit_transform(content)

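# Compute the cosine similarity matrix. TfidfVectorizer L2-normalizes each row by default,
# so the plain dot product computed by linear_kernel equals the cosine similarity.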
cosine_sim_linear = linear_kernel(tfidf_matrix, tfidf_matrix)
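
Why a plain dot product is enough here: `TfidfVectorizer` normalizes each row to unit length by default (`norm='l2'`), and for unit-length vectors the dot product and the cosine similarity coincide. The following sketch checks this on two toy vectors. The helper names are hypothetical and no scikit-learn is needed.

```python
import math


def dot(u, v):
    """Plain dot product, which is what a linear kernel computes per pair of rows."""
    return sum(x * y for x, y in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    """Cosine similarity computed from the unnormalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


a = [3.0, 4.0]
b = [4.0, 3.0]
unit_a, unit_b = l2_normalize(a), l2_normalize(b)

# Both values agree up to floating-point rounding (24 / 25 = 0.96).
print(dot(unit_a, unit_b))
print(cosine(a, b))
```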


In [98]:
def calculate_cosine_similarity(index, cosine_sim):
    # Get the pairwise similarity scores for the toot at `index`
    sim_scores = list(enumerate(cosine_sim[index]))
    # Sort the toots by similarity score (descending)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Keep the 10 most similar toots; position 0 is skipped because it is normally the toot itself
    sim_scores = sim_scores[1:11]

    sim_indices = [i[0] for i in sim_scores]

    return sim_indices
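
The lookup can be tried on a small hand-written similarity matrix. The sketch below is a hypothetical variant, not the function above: `most_similar` excludes the query row by its index rather than by its position after sorting, which also behaves as expected when another toot has exactly the same text and therefore also a score of 1.0.

```python
def most_similar(index, similarity_matrix, top_n=10):
    """Return the indices of the top_n rows most similar to `index`, excluding `index` itself."""
    scored = [
        (i, score)
        for i, score in enumerate(similarity_matrix[index])
        if i != index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored[:top_n]]


# Toy symmetric similarity matrix for four documents (made-up values).
toy_matrix = [
    [1.0, 0.2, 0.9, 0.0],
    [0.2, 1.0, 0.1, 0.5],
    [0.9, 0.1, 1.0, 0.3],
    [0.0, 0.5, 0.3, 1.0],
]
print(most_similar(0, toy_matrix, top_n=2))  # [2, 1]
print(most_similar(3, toy_matrix, top_n=2))  # [1, 2]
```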

In [118]:
content[3]

'I am so so tired of the whole "AI" thing. Everything out of that space is so\nbland and milquetoast.\n\nWeb3/crypto was at least batshit insane. "AI" is just corporate fan fiction\n(and not the fun horny one)\n\n'

In [117]:

content.iloc[calculate_cosine_similarity(3, cosine_sim_linear)]


16    I am once again reiterating that the problem with "AI turns this bullet point\ninto a long email I can pretend I wrote.

### Preliminary Ranking Score
In the following section, a ranking score is computed as the weighted sum of the content similarity score and the interaction score. The DataFrame is then sorted by this ranking score in descending order.

In [24]:
def calculate_ranking_score(toot_df, similarity_weight, interaction_weight):
    """Calculate the ranking score of each toot and sort by it.

    Note: toot_df is modified in place (new column, sorted, index reset).
    """
    
    # Calculate the ranking score
    toot_df['ranking_score'] = (similarity_weight * toot_df['content_similarity_score']) + (interaction_weight * toot_df['interaction_score']) 
    
    # Sort the DataFrame according to the ranking score (descending)
    toot_df.sort_values('ranking_score', ascending=False, inplace=True)
    toot_df.reset_index(drop=True, inplace=True)
    
    return toot_df
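
Because the function above modifies the DataFrame it receives, a non-mutating variant may be easier to reuse. The following is a minimal sketch with a hypothetical name `rank_toots`; the toy values are the rounded scores of three toots from this notebook's output.

```python
import pandas as pd

def rank_toots(toot_df, similarity_weight, interaction_weight):
    """Return a ranked copy of toot_df; the input is left unchanged."""
    ranked = toot_df.assign(
        ranking_score=similarity_weight * toot_df["content_similarity_score"]
        + interaction_weight * toot_df["interaction_score"]
    )
    return ranked.sort_values("ranking_score", ascending=False).reset_index(drop=True)

# Rounded scores of three toots, used here as toy input
toy = pd.DataFrame({
    "content_similarity_score": [0.325887, 0.261034, 0.322838],
    "interaction_score": [0.006042, 1.0, 0.048338],
})
ranked = rank_toots(toy, similarity_weight=0.9, interaction_weight=0.1)
print(ranked)
```

With weights 0.9 and 0.1, the toot with interaction score 1.0 ranks first even though its content similarity is the lowest of the three: 0.9 * 0.261034 + 0.1 * 1.0 = 0.3349306.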

In [25]:
# Set the weights for Similarity score and Interaction score
similarity_weight = 0.9
interaction_weight = 0.1

# Calculate the ranking score and expand the DataFrame 
toot_df_with_ranking = calculate_ranking_score(similar_toots, similarity_weight, interaction_weight)
toot_df_with_ranking.head()

Unnamed: 0,toot_id,content,reblogs_count,favourites_count,replies_count,mentions,tags,language,created_at,edited_at,instance,content_lemma,content_similarity_score,interaction_score,ranking_score
0,110322106296283536,I’m proud to live in a country where we earn power in a more democratic way.\nBy calling Georgia officials and asking them to find a few thousand votes.\n\n,116,200,15,[],[],en,2023-05-06 14:02:28.907000+00:00,,mastodon.social,proud live country earn power democratic way . \n call Georgia official ask find thousand vote . \n\n,0.261034,1.0,0.334931
1,110322097183909229,Watching these idiots learn the value of artistic labor is more entertaining\nthan any show.\n\n,14,1,1,[],[],en,2023-05-06 14:00:09+00:00,,mastodon.social,watch idiot learn value artistic labor entertaining \n . \n\n,0.322838,0.048338,0.295388
2,110322098597109624,It's also hilarious that half of these people making these videos are too\nyoung to have used them when they were current tech so they are looking at\nthem as some sort of retro curiosity.\n\n,0,1,1,[],[],en,2023-05-06 14:00:23+00:00,,mastodon.social,hilarious half people make video \n young current tech look \n sort retro curiosity . \n\n,0.325887,0.006042,0.293902
3,110322107673623618,trivia quiz game show-style videos!: Using multimedia platforms like YouTube\nallows one the creative freedom to generate unique content related quizzes\nthey can then share online with others around the world wishing likewise\nquality entertainment within this category - all focused upon favorite themes\ncentral enjoyed commonly together between shared followers alike concerning\nany detail worthy enough disclosure overall among peers attention solely\nconcentrated\n\n,0,0,1,[],[],en,2023-05-06 14:02:49+00:00,,mastodon.social,trivium quiz game - style video ! : multimedia platform like YouTube \n allow creative freedom generate unique content relate quiz \n share online world wish likewise \n quality entertainment category - focus favorite theme \n central enjoy commonly share follower alike concern \n detail worthy disclosure overall peer attention solely \n concentrate \n\n,0.325447,0.003021,0.293204
4,110322062761903802,"I can see why people would play with the idea of an Everything App. By that I\nmean people who would try to architect such a thing, or people who would try\nto profit by such a thing.\n\nBut I'm not sure the user case is so strong. I, at least, have modes of\nbrowsing. It's convenient for me to have an rss reader, a mastodon client, a\nconventional news app, a ""long read"" app lined up in a row.\n\nI approach them differently. In particular I only open rss or long reads when\nI have time to clear them.\n\n",0,0,0,[],[],en,2023-05-06 13:51:23+00:00,,mastodon.social,"people play idea App . \n mean people try architect thing , people try \n profit thing . \n\n sure user case strong . , , mode \n browse . convenient rss reader , mastodon client , \n conventional news app , "" long read "" app line row . \n\n approach differently . particular open rss long read \n time clear . \n\n",0.325529,0.0,0.292976


In [181]:
for toot_content in toot_df_with_ranking[:10].content:
    print(toot_content)

I’m proud to live in a country where we earn power in a more democratic way.
By calling Georgia officials and asking them to find a few thousand votes.


Watching these idiots learn the value of artistic labor is more entertaining
than any show.


It's also hilarious that half of these people making these videos are too
young to have used them when they were current tech so they are looking at
them as some sort of retro curiosity.


trivia quiz game show-style videos!: Using multimedia platforms like YouTube
allows one the creative freedom to generate unique content related quizzes
they can then share online with others around the world wishing likewise
quality entertainment within this category - all focused upon favorite themes
central enjoyed commonly together between shared followers alike concerning
any detail worthy enough disclosure overall among peers attention solely
concentrated


I can see why people would play with the idea of an Everything App. By that I
mean people who wo

## Problems:
- Performance
    - The similarity check takes relatively long
    - Lemmatization takes relatively long
    -> This can lead to long loading times when the timeline is opened.
    
    Possible solutions: 
    - Categorize the toot content after publication, persist the categories in the DB, and match them against user interests with regex pattern matching
    - Limit the number of toots -> only load toots from the last hours/days
    - Preprocessing: perform lemmatization and stop word removal once after publication and store the result
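
Limiting the candidate set to recent toots could look roughly like the sketch below. The helper name `recent_toots` and the 24-hour window are assumptions for illustration; only the `created_at` column comes from the dataset. Timestamps are parsed one by one because the column mixes values with and without fractional seconds.

```python
import pandas as pd

def recent_toots(toot_df, now, max_age_hours=24):
    """Keep only toots created within the last `max_age_hours` before `now`."""
    # Parse each timestamp individually so mixed formats do not matter
    created = toot_df["created_at"].map(lambda s: pd.to_datetime(s, utc=True))
    cutoff = now - pd.Timedelta(hours=max_age_hours)
    return toot_df[(created >= cutoff) & (created <= now)]

# Toy data (made-up toot ids, timestamps in the dataset's format)
toy = pd.DataFrame({
    "toot_id": [1, 2, 3],
    "created_at": [
        "2023-05-06 14:02:28.907000+00:00",
        "2023-05-06 13:51:23+00:00",
        "2023-05-04 09:00:00+00:00",
    ],
})
now = pd.Timestamp("2023-05-06 15:00:00", tz="UTC")
recent_ids = recent_toots(toy, now, max_age_hours=24)["toot_id"].tolist()
print(recent_ids)
```

Running the similarity check only on this subset keeps the TF-IDF matrix small, at the cost of never recommending older toots.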

## Next Steps:
- Step 2. Get toots from people you follow 
- Step 3. Get persons with similar interests (who to follow)
- Step 4. Get toots by hashtags (filter hashtags by interests)
- Step 5. Mix data
- Step 6. Rank the toots in a ranking system that combines content similarity, hashtag similarity, interactions, recency (how new they are), and a weight for toots from people you follow or persons with similar interests, then sort them in descending order
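
One way Step 6 might be approached is to extend the weighted sum to more score columns. The sketch below is only an illustration under stated assumptions: the weights, the exponential recency decay with a 12-hour half-life, and the columns `hashtag_similarity_score`, `age_hours`, `recency_score` and `follow_score` are hypothetical and do not exist in the notebook yet.

```python
import pandas as pd

# Hypothetical weights; placeholders, not tuned values
WEIGHTS = {
    "content_similarity_score": 0.5,
    "hashtag_similarity_score": 0.2,
    "interaction_score": 0.1,
    "recency_score": 0.1,
    "follow_score": 0.1,
}

def recency_score(age_hours, half_life_hours=12.0):
    """Exponential decay: 1.0 for a brand-new toot, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)

def rank_mixed(toot_df, weights=WEIGHTS):
    """Weighted sum over all score columns, sorted descending (returns a copy)."""
    score = sum(w * toot_df[col] for col, w in weights.items())
    ranked = toot_df.assign(ranking_score=score)
    return ranked.sort_values("ranking_score", ascending=False).reset_index(drop=True)

# Toy data, invented for illustration
toy = pd.DataFrame({
    "toot_id": [1, 2, 3],
    "content_similarity_score": [0.8, 0.4, 0.6],
    "hashtag_similarity_score": [0.0, 1.0, 0.5],
    "interaction_score": [0.1, 0.9, 0.0],
    "age_hours": [12.0, 0.0, 48.0],
    "follow_score": [0.0, 1.0, 0.0],
})
toy["recency_score"] = recency_score(toy["age_hours"])
mixed = rank_mixed(toy)
print(mixed[["toot_id", "ranking_score"]])
```

Keeping every component in the range 0 to 1 and the weights summing to 1 makes the final score easy to interpret, but the relative weights would need to be tuned against real usage.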