# Design and construction of tripartite network from Reddit

Datasets of multipartite complex networks with three or more levels (tripartite, quadripartite, etc.) are very scarce, unlike those with only two levels, better known as bipartite graphs, which are quite common.

I designed and began to construct a tripartite network for my Ph.D. thesis, using the website [Reddit](https://www.reddit.com). According to its own description, "*Reddit is a network of communities where people can dive into their interests, hobbies and passions. There's a community for whatever you're interested in on Reddit*". In this context, I use the term *groups* instead of *communities* to avoid confusion with communities in the network-science sense, i.e. the clusters found by community detection.

The tripartite network I defined is composed of:
1. **Users** (usernames)
2. **Groups** (subreddits)
3. **Keywords** (words)
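As a minimal sketch of how these three levels fit together (assuming the nested-dictionary layout that the notebook builds later in `final_users_groups_keywords_dict`, here with toy values), users are the central level: each user links to groups and to keywords, so the tripartite network can be stored as two weighted bipartite layers. The function name `bipartite_layers` is hypothetical.

```python
# Toy input in the same shape as final_users_groups_keywords_dict:
# user -> {'groups': {group: count}, 'keywords': {keyword: count}}
raw = {
    'alice': {'groups': {'science': 2, 'aviation': 1}, 'keywords': {'plane': 3}},
    'bob': {'groups': {'science': 1}, 'keywords': {'plane': 1, 'ink': 2}},
}

def bipartite_layers(raw):
    """Split the tripartite data into its two weighted bipartite layers.

    Groups and keywords are never linked directly: they only meet through users.
    """
    user_group = {}    # (user, group) -> link weight
    user_keyword = {}  # (user, keyword) -> link weight
    for user, values in raw.items():
        for group, count in values['groups'].items():
            user_group[(user, group)] = count
        for keyword, count in values['keywords'].items():
            user_keyword[(user, keyword)] = count
    return user_group, user_keyword
```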

My main interest is analyzing the tripartite network for two important tasks:
* **Link prediction**. This can be used in recommendation systems, for example, to recommend to a user certain groups that they might find interesting based on our analysis.
* **Community detection**. Also called clustering in (slightly) different contexts, it can be used, for instance, to detect clusters of users based on the groups they frequent and the keywords they use.

I have already developed many algorithms for **link prediction** and **community detection** in multipartite networks, but I lacked datasets to test them.

In [1]:
from IPython.display import display, HTML
# widen the notebook cells to the full width of the browser window
display(HTML("<style>.container { width:100% !important; }</style>"))

In [2]:
import requests
from collections import Counter
import nltk
from textblob import TextBlob
import json
import numpy as np
import pandas as pd
import networkx as nx
import community  # python-louvain package, provides community.best_partition
from pyvis import network as net

In [3]:
# To use the Reddit API you first need a Reddit account; then register an OAuth
# client (app) at https://www.reddit.com/prefs/apps by clicking, at the bottom of
# the page, on "are you a developer? create an app..."
# https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c
import os

# account created only for this purpose; credentials are read from environment
# variables (names chosen here) instead of being hardcoded in the notebook
my_username = os.environ['REDDIT_USERNAME']
my_password = os.environ['REDDIT_PASSWORD']

personal_use_script = os.environ['REDDIT_CLIENT_ID']  # the app's "personal use script" string
secret = os.environ['REDDIT_SECRET']  # the app's "secret" string

In [4]:
def headers_connection_request():
    # note that 'personal use script' is the app's client ID and 'secret' its client secret
    auth = requests.auth.HTTPBasicAuth(personal_use_script, secret)

    # here we pass our login method (password), username, and password
    data = {'grant_type': 'password',
            'username': my_username,
            'password': my_password}

    # set up our header info, which gives Reddit a brief description of our app
    headers = {'User-Agent': 'MyBot/0.0.1'}

    # send our request for an OAuth token
    res = requests.post('https://www.reddit.com/api/v1/access_token',
                        auth=auth, data=data, headers=headers)

    # convert response to JSON and pull access_token value
    TOKEN = res.json()['access_token']

    # add authorization to our headers dictionary
    headers = {**headers, **{'Authorization': f"bearer {TOKEN}"}}

    return headers

In [5]:
import time

# the first call to headers_connection_request() sometimes fails, so we retry in a loop
def Headers():
    while True:
        try:  # retry until we connect and obtain an access token
            return headers_connection_request()
        except (requests.RequestException, KeyError, ValueError):
            # network error, non-JSON response, or no 'access_token' in the response
            time.sleep(1)  # short pause before trying again
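Every request in this notebook is wrapped in the same retry-forever pattern. As an illustrative alternative (not part of the original code), a small generic helper can retry with exponential backoff and give up after a fixed number of attempts. The names `retry_call`, `max_tries` and `base_delay` are hypothetical, and `sleep` is injectable only to make the sketch testable.

```python
import time

def retry_call(func, exceptions, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, waiting base_delay * 2**attempt between tries.

    Re-raises the last exception once max_tries attempts have failed.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except exceptions:
            if attempt == max_tries - 1:
                raise  # out of attempts: propagate the error
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
```

With the notebook's function it could be used as `my_headers = retry_call(headers_connection_request, (requests.RequestException, KeyError, ValueError))`. Bounding the number of attempts avoids an endless loop when the credentials are simply wrong.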

In [6]:
# obtain the authorization headers, including a fresh access token
my_headers = Headers()
my_headers

{'User-Agent': 'MyBot/0.0.1',
 'Authorization': 'bearer 1206362233968-5smKYQmr7gCB4580Lre3CO-d_8uS7A'}

## The starting point is any Reddit username: it is the only input we need.

In [7]:
username = 'zip759' #'urbannomadberlin' #'GovSchwarzenegger'
my_limit = 100  # items per listing request (100 is the maximum the Reddit API allows)
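`my_limit` only controls the size of one page of results. A sketch of fetching more than one page, assuming the standard Reddit listing layout (`data.children` plus an `after` cursor that is `None` on the last page). The names `fetch_all` and `max_items` are hypothetical, and the page-fetching function is passed in so the logic can be checked without network access.

```python
def fetch_all(fetch_page, max_items=1000):
    """Collect listing children page by page, following the 'after' cursor.

    fetch_page(after) must return the decoded JSON of one listing page.
    """
    children, after = [], None
    while len(children) < max_items:
        page = fetch_page(after)['data']
        children.extend(page['children'])
        after = page['after']
        if after is None:  # last page reached
            break
    return children[:max_items]
```

For example, `fetch_all(lambda after: requests.get(url, headers=my_headers, params={'limit': my_limit, 'after': after}).json())`, where `url` is one of the `/comments` or `/submitted` URLs used below; `requests` drops parameters whose value is `None`, so the first call needs no special case.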

## (A) We start by extracting the words used by our specific user and, simultaneously, the groups where they were posted

We describe every text that a certain **user** writes (publicly) as a *post*. Calling the Reddit API, we identify two main types of *posts*, one of them with subtypes:

1. `comment`
2. `submitted`
    * `title`
    * `selftext` (optional)

### (i) We extract the keywords from comments and the subreddits where they were posted.

We extract the **keywords** from every `comment` *post*, from the `title` of every `submitted` *post* and, if present, from the `selftext` of a `submitted` *post*. We then save all of them in a single string, `posts_full_text`.
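The extraction rule above can be condensed into one function over a listing. This is a sketch assuming each child carries its Reddit kind (`t1` for comments, `t3` for submissions); the function `text_and_groups` and the mock listing are hypothetical, while the notebook itself uses two separate loops.

```python
def text_and_groups(listing):
    """Concatenate the text of every post in a listing and list their subreddits.

    Comments contribute 'body'; submissions contribute 'title' and, when
    non-empty, 'selftext'.
    """
    texts, groups = [], []
    for child in listing['data']['children']:
        data = child['data']
        if child['kind'] == 't1':  # comment
            texts.append(data['body'])
        else:                      # submission ('t3')
            texts.append(data['title'])
            if data.get('selftext'):
                texts.append(data['selftext'])
        groups.append(data['subreddit'])
    return ' '.join(texts), groups
```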

In [8]:
posts_full_text = ""
groups_list = []

In [9]:
while True:
    try:
        res_comments = requests.get("https://oauth.reddit.com" + "/user" + "/" + username + "/comments",
                                    headers = my_headers,
                                    params = {'limit': my_limit})
        break
    except requests.ConnectionError:
        print("ConnectionError, trying again...")
        my_headers = Headers()#headers_connection_request()

In [10]:
for post in res_comments.json()['data']['children']:
    posts_full_text += " " + post['data']['body']
    groups_list.append(post['data']['subreddit'])

### (ii) We extract the keywords from submitted titles and, if any, from submitted selftexts, together with the subreddits where they were posted.

At the same time, we append the subreddits, i.e. the **groups** to which every *post* belongs, to the list `groups_list`.

In [11]:
while True:
    try:
        res_submitted = requests.get("https://oauth.reddit.com" + "/user" + "/" + username + "/submitted",
                                     headers = my_headers,
                                     params = {'limit': my_limit})
        break
    except requests.ConnectionError:
        print("ConnectionError, trying again...")
        my_headers = Headers()#headers_connection_request()

In [12]:
for post in res_submitted.json()['data']['children']:
    posts_full_text += " " + post['data']['title']
    groups_list.append(post['data']['subreddit'])
    if post['data']['selftext']:
        posts_full_text += " " + post['data']['selftext']

#### Having collected all the groups where a user posted, we make a very simple analysis of them.

We count how many times each **group** appears and save the counts in a Python dictionary, `groups_dict`. This will help us later to associate every **group** with its respective **user**: the associated value will be the link weight in the newly defined bipartite **users-groups** network.
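For instance, with a hypothetical `groups_list`, `Counter.most_common()` called without arguments returns every group sorted by decreasing count (ties keep the order in which they were first seen), so converting it to a dictionary gives the same result as the comprehension in the next cell:

```python
from collections import Counter

# hypothetical list of subreddits, one entry per post
groups_list = ['u_zip759', 'science', 'u_zip759', 'aviation', 'u_zip759']

# group -> number of posts, i.e. the user-group link weight
groups_dict = dict(Counter(groups_list).most_common())
```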

In [13]:
groups_dict = {group: count for group, count in Counter(groups_list).most_common()}
#groups_dict

#### After retrieving the keywords from all of the user's posts, we analyze them using the simplest approach: the [bag-of-words model](https://en.wikipedia.org/wiki/Bag-of-words_model).

The intention is to improve this analysis later with methods such as n-grams or more sophisticated ones from the field of natural language processing.
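The notebook tokenizes with TextBlob and filters NLTK plus custom stopwords. As a dependency-free sketch of the same bag-of-words idea, here is a swapped-in regular-expression tokenizer (an assumption, not TextBlob's behavior) that also drops punctuation-only tokens such as `'‘'`, which otherwise show up as keywords. Unlike TextBlob, it keeps contractions such as `canon's` as one token. The names `bag_of_words` and the tiny `STOPWORDS` set are hypothetical; keeping stopwords in a `set` makes each membership test constant time.

```python
import re
from collections import Counter

# hypothetical small stopword set; the notebook combines NLTK lists with a custom one
STOPWORDS = {'the', 'a', 'is', 'and', 'of', 'my'}

# runs of letters/digits, optionally joined by an apostrophe or hyphen (e.g. covid-19)
TOKEN = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")

def bag_of_words(text, stopwords=STOPWORDS):
    """Count lowercase word tokens of text, skipping stopwords."""
    tokens = TOKEN.findall(text.lower())
    return Counter(t for t in tokens if t not in stopwords)
```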

In [14]:
corpus_text = posts_full_text.lower()
#corpus_text

In [15]:
#nltk.download('stopwords') #download if necessary!
#nltk.download('punkt') #download if necessary!

stopwords_e = nltk.corpus.stopwords.words('english')
stopwords_g = nltk.corpus.stopwords.words('german')
stopwords_s = nltk.corpus.stopwords.words('spanish') #add languages if needed
stopwords = stopwords_e + stopwords_g + stopwords_s

mystopwords = ["a", "a's", "able", "about", "above", "according", "accordingly", "across", 
                "actually", "after", "afterwards", "again", "against", "ain't", "all", 
                "allow", "allows", "almost", "alone", "along", "already", "also", 
                "although", "always", "am", "among", "amongst", "an", "and", "another", 
                "any", "anybody", "anyhow", "anyone", "anything", "anyway", "anyways", 
                "anywhere", "apart", "appear", "appreciate", "appropriate", "are", 
                "aren't", "around", "as", "aside", "ask", "asking", "associated", "at", 
                "available", "away", "awfully", "b", "be", "became", "because", "become", 
                "becomes", "becoming", "been", "before", "beforehand", "behind", "being", 
                "believe", "below", "beside", "besides", "best", "better", "between", 
                "beyond", "both", "brief", "but", "by", "c", "c'mon", "c's", "came", "can", 
                "can't", "cannot", "cant", "cause", "causes", "certain", "certainly", 
                "changes", "clearly", "co", "com", "come", "comes", "concerning", 
                "consequently", "consider", "considering", "contain", "containing", 
                "contains", "corresponding", "could", "couldn't", "course", "currently", 
                "d", "definitely", "described", "despite", "did", "didn't", "different", 
                "do", "does", "doesn't", "doing", "don't", "done", "down", "downwards", 
                "during", "e", "each", "edu", "eg", "eight", "either", "else", "elsewhere", 
                "enough", "entirely", "especially", "et", "etc", "even", "ever", "every", 
                "everybody", "everyone", "everything", "everywhere", "ex", "exactly", 
                "example", "except", "f", "far", "few", "fifth", "first", "five", 
                "followed", "following", "follows", "for", "former", "formerly", "forth", 
                "four", "from", "further", "furthermore", "g", "get", "gets", "getting", 
                "given", "gives", "go", "goes", "going", "gone", "got", "gotten", 
                "greetings", "h", "had", "hadn't", "happens", "hardly", "has", "hasn't", 
                "have", "haven't", "having", "he", "he's", "hello", "help", "hence", "her", 
                "here", "here's", "hereafter", "hereby", "herein", "hereupon", "hers", 
                "herself", "hi", "him", "himself", "his", "hither", "hopefully", "how", 
                "howbeit", "however", "i", "i'd", "i'll", "i'm", "i've", "ie", "if", 
                "ignored", "immediate", "in", "inasmuch", "inc", "indeed", "indicate", 
                "indicated", "indicates", "inner", "insofar", "instead", "into", "inward", 
                "is", "isn't", "it", "it'd", "it'll", "it's", "its", "itself", "j", "just", 
                "k", "keep", "keeps", "kept", "know", "knows", "known", "l", "last", 
                "lately", "later", "latter", "latterly", "least", "less", "lest", "let", 
                "let's", "like", "liked", "likely", "little", "look", "looking", "looks", 
                "ltd", "m", "mainly", "many", "may", "maybe", "me", "mean", "meanwhile", 
                "merely", "might", "more", "moreover", "most", "mostly", "much", "must", 
                "my", "myself", "n", "name", "namely", "nd", "near", "nearly", "necessary", 
                "need", "needs", "neither", "never", "nevertheless", "new", "next", "nine", 
                "no", "nobody", "non", "none", "noone", "nor", "normally", "not", 
                "nothing", "novel", "now", "nowhere", "o", "obviously", "of", "off", 
                "often", "oh", "ok", "okay", "old", "on", "once", "one", "ones", "only", 
                "onto", "or", "other", "others", "otherwise", "ought", "our", "ours", 
                "ourselves", "out", "outside", "over", "overall", "own", "p", "particular", 
                "particularly", "per", "perhaps", "placed", "please", "plus", "possible", 
                "presumably", "probably", "provides", "q", "que", "quite", "qv", "r", 
                "rather", "rd", "re", "really", "reasonably", "regarding", "regardless", 
                "regards", "relatively", "respectively", "right", "s", "said", "same", 
                "saw", "say", "saying", "says", "second", "secondly", "see", "seeing", 
                "seem", "seemed", "seeming", "seems", "seen", "self", "selves", "sensible", 
                "sent", "serious", "seriously", "seven", "several", "shall", "she", 
                "should", "shouldn't", "since", "six", "so", "some", "somebody", "somehow", 
                "someone", "something", "sometime", "sometimes", "somewhat", "somewhere", 
                "soon", "sorry", "specified", "specify", "specifying", "still", "sub", 
                "such", "sup", "sure", "t", "t's", "take", "taken", "tell", "tends", "th", 
                "than", "thank", "thanks", "thanx", "that", "that's", "thats", "the", 
                "their", "theirs", "them", "themselves", "then", "thence", "there", 
                "there's", "thereafter", "thereby", "therefore", "therein", "theres", 
                "thereupon", "these", "they", "they'd", "they'll", "they're", "they've", 
                "think", "third", "this", "thorough", "thoroughly", "those", "though", 
                "three", "through", "throughout", "thru", "thus", "to", "together", "too", 
                "took", "toward", "towards", "tried", "tries", "truly", "try", "trying", 
                "twice", "two", "u", "un", "under", "unfortunately", "unless", "unlikely", 
                "until", "unto", "up", "upon", "us", "use", "used", "useful", "uses", 
                "using", "usually", "uucp", "v", "value", "various", "very", "via", "viz", 
                "vs", "w", "want", "wants", "was", "wasn't", "way", "we", "we'd", "we'll", 
                "we're", "we've", "welcome", "well", "went", "were", "weren't", "what", 
                "what's", "whatever", "when", "whence", "whenever", "where", "where's", 
                "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", 
                "whether", "which", "while", "whither", "who", "who's", "whoever", "whole", 
                "whom", "whose", "why", "will", "willing", "wish", "with", "within", 
                "without", "won't", "wonder", "would", "wouldn't", "x", "y", 
                "yes", "yet", "you", "you'd", "you'll", "you're", "you've", "your", 
                "yours", "yourself", "yourselves", "z", "zero"] #complete with words to exclude if necessary

stopwords += mystopwords

def common_words(text): # isalpha() optional for words made of only letters 
    return [word for word in TextBlob(text).words if word not in stopwords]# and word.isalpha()]

Saving the word counts as a Python dictionary, `keywords_dict`, will help us later to associate every **keyword** with its respective **user**: the associated value will be the link weight in the newly defined bipartite **users-keywords** network.

In [16]:
keywords_dict = {word: count for word, count in Counter(common_words(corpus_text)).most_common()}

## (B) We continue by extracting all the users associated with our specific input user.

In principle, this is not strictly necessary: since we already have the basic code to extract all the **groups** and **keywords** of any specific **user**, we could apply the same procedure to any arbitrary list of Reddit usernames. However, it makes much more sense to search for **users** who are somehow connected to our input **user**, and we find them with an approach similar to the one used to retrieve the input **user**'s information. Once we obtain all the **users** associated with our input **user**, we apply to them the full procedure described in **(A)** to obtain their respective **groups** and **keywords**, which gives us all the information needed to construct our tripartite network. Other Reddit usernames can also be added manually at any point to expand the network further.

### (i) From the submitted posts of the input user, we extract the users who wrote the direct replies (first-level children) to any of them.

We save all the associated **users** in the `associated_users` list.

In [17]:
associated_users = []

In [18]:
for post in res_submitted.json()['data']['children']:
    name = post['data']['name']
    while True:
        try:
            res_name = requests.get("https://oauth.reddit.com" + "/comments" + "/" + name[3:] + "/api"
                                    + "/morechildren",
                                    headers = my_headers)
            break
        except requests.ConnectionError:
            print("ConnectionError, trying again...")
            my_headers = Headers()#headers_connection_request()
    for comment in res_name.json()[1]['data']['children']:
        if 'author' in comment['data']:  # 'more' placeholders for unloaded replies have no 'author'
            associated_users.append(comment['data']['author'])

### (ii) From the comments of the same input user, we extract the users from the previous comment (the parent comment's author, or the link author when the parent is the main post).

We append all these new associated **users** to the `associated_users` list.
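The test `link != parent` relies on Reddit *fullnames*, which combine a type prefix with a base-36 id: `t1_` marks a comment and `t3_` a link (submission). A small sketch of both checks, with hypothetical helper names; `split_fullname(parent)[1]` is what `parent[3:]` extracts in the next cell.

```python
def split_fullname(fullname):
    """Split a Reddit fullname such as 't1_abc123' into ('t1', 'abc123')."""
    prefix, _, base36_id = fullname.partition('_')
    return prefix, base36_id

def parent_is_post(comment_data):
    """True when a comment replies directly to the submission, not to another comment."""
    return comment_data['parent_id'] == comment_data['link_id']
```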

In [19]:
for post in res_comments.json()['data']['children']:  # up to my_limit comments
    link = post['data']['link_id']
    parent = post['data']['parent_id']
    if link != parent:  # the parent is another comment, not the main post
        # permalink looks like /r/<sub>/comments/<link id>/<slug>/<comment id>/;
        # replace the comment id with the parent's id (fullname without its 't1_' prefix)
        thread_url = post['data']['permalink'].rstrip('/').rsplit('/', 1)[0]
        while True:
            try:
                res_parent = requests.get("https://oauth.reddit.com" + thread_url + "/" + parent[3:],
                                          headers = my_headers)
                break
            except requests.ConnectionError:
                print("ConnectionError, trying again...")
                my_headers = Headers()
        for comment in res_parent.json()[1]['data']['children']:
            if 'author' in comment['data']:  # 'more' placeholders for unloaded replies have no 'author'
                associated_users.append(comment['data']['author'])
    else:  # the parent is the main post
        associated_users.append(post['data']['link_author'])

### (iii) From the comments of the same input user, we extract the users from all the following comments (first-level children, i.e. the direct replies).

This is tricky given the structure of the retrieved information, so we define a recursive function that acts directly on the relevant part of the retrieved JSON and returns a list of **users**. We start by doing it for one comment only, then for all of them. We append all these new associated **users** to the `associated_users` list.

In [20]:
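def recursive_in_json(subjson, i=0, depth_limit=1, lst=None):  # depth_limit=1 keeps only direct children of a comment
    if lst is None:  # avoid sharing a mutable default list between calls
        lst = []
    for post in subjson['data']['children']:
        # real comments carry a 'replies' key ('' when empty); 'more' placeholders do not
        if i <= depth_limit and 'replies' in post['data']:
            lst.append(post['data']['author'])
            if post['data']['replies']:
                recursive_in_json(post['data']['replies'], i+1, depth_limit=depth_limit, lst=lst)
    return lst[1:]  # drop the first author: the root comment written by the input user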
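To see what the recursion walks over, here is a hypothetical, hand-written comment tree with the same layout as the JSON returned for a comment permalink (`replies` is `''` when a comment has no replies, and `kind == 'more'` marks unloaded replies). The variant `authors_by_depth` is a sketch, not the notebook's function: it groups authors by depth instead of dropping the first one.

```python
def authors_by_depth(listing, depth_limit=1, depth=0, out=None):
    """Map depth -> list of authors for the comments in a Reddit comment listing.

    Depth 0 is the top level of the listing; deeper levels are skipped.
    """
    if out is None:
        out = {}
    if depth > depth_limit:
        return out
    for child in listing['data']['children']:
        if child['kind'] != 't1':  # skip 'more' placeholders
            continue
        data = child['data']
        out.setdefault(depth, []).append(data['author'])
        if data['replies']:  # '' when there are no replies
            authors_by_depth(data['replies'], depth_limit, depth + 1, out)
    return out

# hypothetical tree: zip759's comment with two replies (one nested deeper) and a 'more' stub
tree = {'data': {'children': [
    {'kind': 't1', 'data': {'author': 'zip759', 'replies': {'data': {'children': [
        {'kind': 't1', 'data': {'author': 'BobVosh', 'replies': {'data': {'children': [
            {'kind': 't1', 'data': {'author': 'deep_user', 'replies': ''}}]}}}},
        {'kind': 't1', 'data': {'author': 'HLef', 'replies': ''}},
        {'kind': 'more', 'data': {'count': 3, 'children': ['x1', 'x2']}},
    ]}}}}]}}
```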

In [21]:
userstestlist = []
for i, post in enumerate(res_comments.json()['data']['children']): #up to 100 comments
    while True:
        try:
            res_test = requests.get("https://oauth.reddit.com" + post['data']['permalink'],
                                    headers = my_headers)
            break
        except requests.ConnectionError:
            print("ConnectionError, trying again...")
            my_headers = Headers()#headers_connection_request()
    utl = recursive_in_json(res_test.json()[1], lst=[])
    if utl:
        userstestlist.extend(utl)

In [22]:
associated_users.extend(userstestlist)

We clean this list by removing duplicate entries with a Python set and by deleting the input **user**, the `'[deleted]'` entries (profiles that no longer exist) and the `AutoModerator` bot account, saving the result in the list `users_list`. The list `users_full_list` additionally includes the input **user**.

In [23]:
users_list = list(set(associated_users))
if username in users_list:
    users_list.remove(username)
if '[deleted]' in users_list:
    users_list.remove('[deleted]')
if 'AutoModerator' in users_list:
    users_list.remove('AutoModerator')
    
users_full_list = users_list + [username]
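The same cleaning can be written as a set difference. As a sketch (the name `clean_users` is hypothetical), sorting the result also makes the order reproducible: the iteration order of a set of strings can change between Python runs because string hashing is randomized, so `users_list` above may print in a different order each time.

```python
EXCLUDED = {'[deleted]', 'AutoModerator'}

def clean_users(associated_users, seed_user, excluded=EXCLUDED):
    """Unique associated users, minus the seed user and non-person accounts, sorted."""
    return sorted(set(associated_users) - {seed_user} - excluded)
```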

#### Given the input user, we found the associated groups, keywords and users

In [24]:
#username

In [25]:
#groups_dict

In [26]:
#keywords_dict

In [27]:
#users_list

In [28]:
final_users_groups_keywords_dict = {}
final_users_groups_keywords_dict[username] = {}
final_users_groups_keywords_dict[username]['groups'] = groups_dict
final_users_groups_keywords_dict[username]['keywords'] = keywords_dict
final_users_groups_keywords_dict[username]['users'] = users_list

In [29]:
groups_full_set = set(groups_dict.keys())
keywords_full_set = set(keywords_dict.keys())

## Finding all groups and keywords for the associated users

We now automate the previous procedure to obtain the **groups** and **keywords** of every **user** in `users_list`, and save them in a Python dictionary of dictionaries.

In [30]:
def groups_keywords_dict(user, the_headers):
    """Return the (groups_dict, keywords_dict) of a Reddit user, as computed in (A)."""
    posts_full_text = ""
    groups_list = []

    while True:
        try:
            res_comments = requests.get("https://oauth.reddit.com" + "/user" + "/" + user + "/comments",
                                        headers = the_headers,
                                        params = {'limit': my_limit})
            break
        except requests.ConnectionError:
            print("ConnectionError, trying again...")
            the_headers = Headers()  # refresh the headers actually used by the request above
    try:
        for post in res_comments.json()['data']['children']:
            posts_full_text += " " + post['data']['body']
            groups_list.append(post['data']['subreddit'])
    except (KeyError, ValueError):  # non-JSON response, or error JSON without 'data' (e.g. suspended account)
        pass

    while True:
        try:
            res_submitted = requests.get("https://oauth.reddit.com" + "/user" + "/" + user + "/submitted",
                                         headers = the_headers,
                                         params = {'limit': my_limit})
            break
        except requests.ConnectionError:
            print("ConnectionError, trying again...")
            the_headers = Headers()
    try:
        for post in res_submitted.json()['data']['children']:
            posts_full_text += " " + post['data']['title']
            groups_list.append(post['data']['subreddit'])
            if post['data']['selftext']:
                posts_full_text += " " + post['data']['selftext']
    except (KeyError, ValueError):
        pass

    groups_dict = {group: count for group, count in Counter(groups_list).most_common()}

    corpus_text = posts_full_text.lower()
    keywords_dict = {word: count for word, count in Counter(common_words(corpus_text)).most_common()}

    return groups_dict, keywords_dict

In [31]:
#my_headers = headers_connection_request() #if needed

for i, user in enumerate(users_list):
    print(i, user)
    gkd = groups_keywords_dict(user, my_headers)
    final_users_groups_keywords_dict[user] = {}
    final_users_groups_keywords_dict[user]['groups'] = gkd[0]
    final_users_groups_keywords_dict[user]['keywords'] = gkd[1]
    
    groups_full_set = groups_full_set.union(gkd[0].keys())
    keywords_full_set = keywords_full_set.union(gkd[1].keys())

0 BobVosh
1 wsbfan1123
ConnectionError, trying again...
ConnectionError, trying again...
2 Jungaktien_Jannik
3 itsbotpixel
4 thisimpetus
5 billionai1
6 docsyzygy
7 General_Ad4617
8 secret759
9 Oculosdegrau
ConnectionError, trying again...
10 qhyirrstynne
11 sonia72quebec
12 HLef
13 Okzuo
14 laymanlinguist
15 Patty_Henry
16 cerebraldormancy
ConnectionError, trying again...
ConnectionError, trying again...
ConnectionError, trying again...
17 DemocraticRepublic
18 BrewCityChaser
19 flatfisher
20 ThatPortraitGuy
21 biggest_____chungus
22 Four4TheRoad
23 EpaFdx
ConnectionError, trying again...
ConnectionError, trying again...
24 SilentSamamander


In [32]:
final_users_groups_keywords_dict[username]

{'groups': {'u_zip759': 3,
  'Pharmadrug': 1,
  'france': 1,
  'technology': 1,
  'aviation': 1,
  'AskReddit': 1,
  'programming': 1,
  'science': 1},
 'keywords': {'canon': 2,
  'time': 2,
  'worry': 1,
  'nida-funded': 1,
  "'s": 1,
  'shame': 1,
  'uk': 1,
  'left': 1,
  'eu': 1,
  'reason': 1,
  'enabled': 1,
  '‘': 1,
  'auto': 1,
  'refill': 1,
  '’': 1,
  'printer': 1,
  'office': 1,
  'guy': 1,
  'stupid': 1,
  'write': 1,
  'instant': 1,
  'messages': 1,
  'bragging': 1,
  'misleading': 1,
  'faa': 1,
  'moving': 1,
  'fast': 1,
  '1-17': 1,
  'forever': 1,
  '18-20': 1,
  'feels': 1,
  'month': 1,
  '20-30': 1,
  'feel': 1,
  'week': 1,
  'phrx': 1,
  'partner': 1,
  'jhu': 1,
  'news': 1,
  'receiving': 1,
  'federal': 1,
  'funding': 1,
  'psychedelic': 1,
  'research': 1,
  'sued': 1,
  'disabling': 1,
  'scanner': 1,
  'printers': 1,
  'run': 1,
  'ink': 1,
  'software': 1,
  'developers': 1,
  'stopped': 1,
  'caring': 1,
  'reliability': 1,
  'covid-19': 1,
  'caused':

## Dumping the raw information of the tripartite network to a JSON file

In [33]:
# depending on the input user, this can create a JSON file of a couple of MB
with open('tripartite_raw.json', 'w') as f:
    json.dump(final_users_groups_keywords_dict, f)
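Reading the file back is the inverse operation. A minimal sketch with hypothetical helpers `save_raw` and `load_raw`: writing with `ensure_ascii=False` and an explicit UTF-8 encoding keeps non-ASCII keywords such as `'‘'` readable in the file. Since every key in this structure is already a string, the round trip through JSON is lossless.

```python
import json
import os
import tempfile

def save_raw(data, path):
    """Write the users -> groups/keywords dictionary as UTF-8 JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False)

def load_raw(path):
    """Read the dictionary back from a JSON file written by save_raw."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)
```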

## Indexing all elements

We need to assign every element (user, group or keyword) an integer index so that it corresponds to a row or a column of a NumPy array. We choose to assign the indices in alphabetical order.
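As a sketch of this indexing, with hypothetical helper names, toy data, and nested lists standing in for the `np.zeros` arrays used below: Python's `sorted` compares code points, so digits come before uppercase letters, which come before lowercase ones, exactly as in `full_groups`. With about 15,000 keywords most entries of the users-keywords matrix are zero, so a sparse format (for example from `scipy.sparse`) might be worth considering.

```python
def index_alphabetically(items):
    """Map each distinct item to its position in sorted (code point) order."""
    return {item: i for i, item in enumerate(sorted(set(items)))}

def incidence_matrix(edges, row_index, col_index):
    """Dense weighted matrix with m[row][col] = weight, as nested lists."""
    m = [[0] * len(col_index) for _ in row_index]
    for (row, col), weight in edges.items():
        m[row_index[row]][col_index[col]] = weight
    return m
```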

In [34]:
full_users = {us: i for i, us in enumerate(sorted(users_full_list))}
full_groups = {gr: i for i, gr in enumerate(sorted(groups_full_set))}
full_keywords = {ke: i for i, ke in enumerate(sorted(keywords_full_set))}

In [35]:
len(full_keywords)

14892

In [36]:
full_groups

{'195': 0,
 '196': 1,
 '2624': 2,
 '2healthbars': 3,
 '2meirl4meirl': 4,
 '4x4': 5,
 '52weeksofcooking': 6,
 'AMA': 7,
 'ANormalDayInRussia': 8,
 'AOC': 9,
 'AccidentalComedy': 10,
 'AcousticOriginals': 11,
 'Advice': 12,
 'AmITheAngel': 13,
 'AmItheAsshole': 14,
 'AnimalsBeingDerps': 15,
 'AntiJokes': 16,
 'ArchitecturalRevival': 17,
 'ArenaHS': 18,
 'AskALiberal': 19,
 'AskAnAmerican': 20,
 'AskBiology': 21,
 'AskEngineers': 22,
 'AskEurope': 23,
 'AskGameMasters': 24,
 'AskHistorians': 25,
 'AskHistory': 26,
 'AskMen': 27,
 'AskReddit': 28,
 'AskScienceFiction': 29,
 'Assistance': 30,
 'Atlanta': 31,
 'AutoChess': 32,
 'AutoTuga': 33,
 'Awww': 34,
 'Baking': 35,
 'BakingNoobs': 36,
 'Bigpharmagame': 37,
 'BirdsArentReal': 38,
 'BlackPeopleTwitter': 39,
 'BobsTavern': 40,
 'Borderlands2': 41,
 'Brewers': 42,
 'Buttcoin': 43,
 'COMPLETEANARCHY': 44,
 'Calgary': 45,
 'CalgaryClassifieds': 46,
 'CanadaPolitics': 47,
 'CasualConversation': 48,
 'CasualUK': 49,
 'CatastrophicFailure': 50,

In [37]:
#numpy bipartite arrays
biparr_gu = np.zeros((len(full_groups), len(full_users)))
biparr_uk = np.zeros((len(full_users), len(full_keywords)))

#pandas bipartite dataframes
df_gu = pd.DataFrame(columns=('group', 'user', 'repetitions'))
#df_uk = pd.DataFrame(columns=('user', 'keywords', 'repetitions'))

#biparr_gu.shape

In the same loop that populates the NumPy arrays, we build a pandas DataFrame of weighted edges (for now, only the groups-users one), which networkx reads and pyvis then uses to produce an interactive network visualization.

In [38]:
i = 0
#j = 0
for user, values in final_users_groups_keywords_dict.items():
    u_idx = full_users[user]
    for group, gvalue in values['groups'].items():
        g_idx = full_groups[group]
        biparr_gu[g_idx][u_idx] = gvalue
        df_gu.loc[i] = [group, user, gvalue]
        i += 1
    for keyword, kvalue in values['keywords'].items():
        k_idx = full_keywords[keyword]
        biparr_uk[u_idx][k_idx] = kvalue
        #df_uk.loc[j] = [user, keyword, kvalue]
        #j += 1
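Assigning rows one at a time with `df_gu.loc[i]` is generally slow in pandas. A sketch of an alternative, with the hypothetical helper `edge_rows`, flattens the nested dictionary into `(node, user, weight)` tuples that can be turned into a DataFrame in a single call; it also makes the commented-out users-keywords DataFrame cheap to build.

```python
def edge_rows(raw, layer):
    """Flatten raw[user][layer] = {node: weight} into (node, user, weight) tuples.

    layer is 'groups' or 'keywords'; other keys (such as 'users') are ignored.
    """
    return [(node, user, weight)
            for user, values in raw.items()
            for node, weight in values[layer].items()]
```

For example, `df_gu = pd.DataFrame(edge_rows(final_users_groups_keywords_dict, 'groups'), columns=['group', 'user', 'repetitions'])`.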

In [39]:
#df_gu

In [75]:
G_gu = nx.from_pandas_edgelist(df_gu, 'group', 'user', edge_attr='repetitions')
#partition_G_gu = community.best_partition(G_gu, weight='repetitions')
#for n, p in partition_G_gu.items():
#    G_gu.nodes[n]['group'] = p
    
'''G_uk = nx.from_pandas_edgelist(df_uk, 'user', 'keyword', edge_attr='repetitions')
partition_G_uk = community.best_partition(G_uk, weight='repetitions')
for n, p in partition_G_uk.items():
    G_uk.nodes[n]['group'] = p'''

#G_gu.edges.data()


In [68]:
G_gu.nodes['u_zip759']

{}

In [71]:
partition_G_gu = community.best_partition(G_gu, weight='repetitions')
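# store each node's Louvain community id in the 'group' node attribute, which pyvis uses to color nodes
for n, p in partition_G_gu.items():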
    G_gu.nodes[n]['group'] = p

partition_G_gu

{'u_zip759': 0,
 'zip759': 0,
 'Pharmadrug': 1,
 'france': 0,
 'technology': 2,
 'aviation': 3,
 'AskReddit': 4,
 'programming': 0,
 'science': 5,
 'BobVosh': 4,
 'CrusaderKings': 6,
 'GifRecipes': 4,
 'TikTokCringe': 4,
 'whatsthisbug': 4,
 'Showerthoughts': 5,
 'HPfanfiction': 4,
 'AskHistorians': 7,
 'slaythespire': 4,
 'KidsAreFuckingStupid': 4,
 'nextfuckinglevel': 5,
 'WormMemes': 4,
 'casualiama': 4,
 'truegaming': 4,
 'oddlysatisfying': 8,
 'HolUp': 9,
 'harrypotter': 4,
 'UniversityOfHouston': 4,
 'BakingNoobs': 4,
 '52weeksofcooking': 4,
 'Baking': 4,
 'buildapc': 10,
 'redditgetsdrawn': 4,
 'AskGameMasters': 4,
 'CollectiveGaming': 4,
 'JimSterling': 4,
 'HeadphoneAdvice': 4,
 'AskHistory': 4,
 'WritingPrompts': 5,
 'Cynicalbrit': 4,
 'CircleofTrust': 4,
 'MadokaMagica': 4,
 'Tarmack': 4,
 'Fitness': 4,
 'gaming': 11,
 'pcgaming': 12,
 'patientgamers': 6,
 'heroesofthestorm': 4,
 'Frisson': 4,
 'AskScienceFiction': 4,
 'guitars': 4,
 'giftcardexchange': 4,
 'DnD': 4,
 'SHIBA

In [72]:
G_gu.nodes['u_zip759']

{'group': 0}
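`community.best_partition` returns a `{node: community_id}` dictionary, as printed above. To inspect which groups and users ended up together, the mapping can be inverted; the helper name is hypothetical, and the toy input reuses a few entries from the printed partition. Note that the node attribute is called `'group'` only because pyvis passes it on to vis.js for coloring: it holds a Louvain community id, not a subreddit.

```python
from collections import defaultdict

def communities_from_partition(partition):
    """Invert a {node: community_id} mapping into {community_id: [nodes]}."""
    communities = defaultdict(list)
    for node, community_id in partition.items():
        communities[community_id].append(node)
    return dict(communities)
```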

In [74]:
ggu = net.Network(width=1000, height=1000, notebook=True, heading='Bipartite network of groups-users simple plot (unipartite Louvain communities)')
ggu.toggle_physics(True)#False)
ggu.from_nx(G_gu)
ggu.show("bipartite_ggu.html")

In [42]:
'''guk = net.Network(width=1000, height=1000, notebook=True, heading='Bipartite users-keywords')
guk.toggle_physics(False)
guk.from_nx(G_uk)
guk.show("test_guk.html")'''
