## Social Media
 





### Scenario
In this scenario, we simulate the collaboration between a social media platform (made up of subforums discussing various topics) and a company wishing to advertise.
### Privacy Concerns 
In certain instances, these subforums are of a sensitive nature, and posts are visible to approved members only. Specifically, the subforum we are targeting is "Health and Wellness", and the product being advertised is a sleep tracking app.
### Business Case
The social media company, which we will call ClickReadShare, and the advertiser, Snoo-Ze-Time, want to share data securely in a way that respects users' privacy while letting Snoo-Ze-Time identify users who are:
* A relevant fit for the product (e.g. they have discussed sleep issues/advice in the health and wellness subforum)
* An influential member of the community (so that if they do buy the product, and enjoy it, they are an ideal source for word of mouth)

### Solution
ClickReadShare has agreed to share an encrypted summary of each user's posts in the subforum, as well as an encrypted graph showing who follows whom. It is up to Snoo-Ze-Time to decide how to rank and classify the users. Snoo-Ze-Time has trained a transformer to classify the user summaries and has decided to use PageRank to rank the users. Once it has classified the users according to their interest in sleep health and ranked them according to their network influence, Snoo-Ze-Time sends the encrypted results back to ClickReadShare. ClickReadShare can then decrypt the results and identify which users would be a good fit for the ad, without disclosing any user information to the advertiser.

In [1]:
from transformers import DataCollatorForLanguageModeling, AutoTokenizer, DataCollatorWithPadding, BertTokenizerFast

import numpy as np
import pickle
from tqdm.notebook import tqdm
from venumML.venumpy import small_glwe as vp
import torch
import pandas as pd
# import math
from scipy.special import expit as sigmoid
import networkx as nx

from venumML.deep_learning.transformer.transformer import *
from venumML.venum_tools import *
from venumML.approx_functions import *
from venumML.graphs.venum_graph import *

from transformer_social_utils import *


## ClickReadShare instantiates encrypted context 
Let's start by setting up our context.

In [3]:
ctx = vp.SecretContext()
ctx.precision = 6

# ClickReadShare loads user data

In [4]:
# Import the transformer data folder
data_folder = "../use_cases/social_media_demo/data/"
with open(data_folder + "synthetic_forum_data.pkl", "rb") as f:
    forum_data = pickle.load(f)

# ClickReadShare creates the user graph

In [5]:

forum_data['user_following']
# Create a directed graph
G = nx.DiGraph()

# Add nodes and edges from the user_following data
for user, following in forum_data['user_following'].items():
    for followed_user in following:
        G.add_edge(user, followed_user)

# Display the graph information
# (nx.info was removed in NetworkX 3.0; printing the graph gives a similar summary)
# print(G)
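
The edge direction matters for the ranking step later on: each edge runs from a follower to the account they follow, so influence accumulates on the followed users. As a minimal plain-Python sketch, the snippet below repeats the same loop on made-up data and counts followers per user. The `toy_following` dictionary and its user names are hypothetical and are not taken from `synthetic_forum_data.pkl`.

```python
from collections import Counter

# Hypothetical toy data in the same shape as forum_data['user_following']:
# each key follows every user in its list.
toy_following = {
    "user_a": ["user_b", "user_c"],
    "user_b": ["user_c"],
    "user_c": [],
}

# Same traversal as the cell above, collecting (follower, followed) edges.
edges = [
    (user, followed_user)
    for user, following in toy_following.items()
    for followed_user in following
]

# In-degree = number of followers, a rough first proxy for influence.
follower_counts = Counter(followed for _, followed in edges)

print(edges)
print(follower_counts.most_common())
```

Follower counts are only a first approximation; PageRank, used later in this notebook, also weighs how influential each follower is.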

# ClickReadShare encrypts user graph

In [6]:
EG = encrypt_networkx(ctx, G, use_hashing=False)

# ClickReadShare generates user summaries

In [7]:
user_summary = forum_data['user_summary']

In [8]:
with open(data_folder + "embeddings.pkl", "rb") as f:
    embeddings_weights = pickle.load(f)

embeddings = Embeddings(embeddings_weights.numpy())


In [9]:
tokenizer = load_tokenizer()
max_seq_len = 20

In [10]:
encrypted_summaries = encrypt_user_summary(user_summary, embeddings, tokenizer, max_seq_len, ctx)

100%|██████████| 100/100 [00:00<00:00, 176.96it/s]


At this point, ClickReadShare has encrypted all the information needed by Snoo-Ze-Time to securely identify potential customers. Now it transfers the data to Snoo-Ze-Time, and the encrypted machine learning begins!

# Snoo-Ze-Time loads in the model and labels

In [11]:
model_path = "../use_cases/social_media_demo/model/"
state_dict = torch.load(model_path + 'social_model_weights.pth', map_location=torch.device('cpu'))

In [12]:
mappings_file_path = data_folder + "label_mappings.pkl"


In [13]:

with open(mappings_file_path, "rb") as f:
    mappings = pickle.load(f)
    label_mapping = mappings["label_mapping"]
    reversed_label_mapping = mappings["reversed_label_mapping"]

# Snoo-Ze-Time instantiates the transformer and encrypts the weights

In [14]:
tokenizer = load_tokenizer()

transformer = TransformerInference(
    model_weights_path=model_path + "social_model_weights.pth",  # model_path already ends with "/"
    tokenizer=tokenizer,
    encryption_context=ctx,
    max_seq_len=20,
    d_model=8,
    num_heads=2,
    d_ff=32,
    vocab_size=30522,  # Example vocab size
    class_size=len(label_mapping)
)



Encrypting weights: 100%|██████████| 21/21 [00:07<00:00,  2.78it/s]


# Snoo-Ze-Time classifies the user summaries   

In [15]:
encrypted_classifications = transformer.predict(encrypted_summaries)   

100%|██████████| 100/100 [04:15<00:00,  2.56s/it]


# Snoo-Ze-Time calculates the user PageRank

In [16]:
ranking = pagerank(ctx, EG, damping_factor=0.85, iters=20)
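
For intuition, here is a minimal plaintext sketch of textbook power-iteration PageRank using the same `damping_factor` and `iters` values. It is not venumML's implementation: how the library handles dangling nodes or normalisation under encryption is not shown in this notebook, and the function `pagerank_plain` and the toy graph are hypothetical.

```python
def pagerank_plain(following, damping_factor=0.85, iters=20):
    """Power-iteration PageRank on a dict of follower -> list of followed users."""
    nodes = set(following)
    for followed in following.values():
        nodes.update(followed)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iters):
        # Rank held by users who follow nobody is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not following.get(node))
        new_rank = {
            node: (1.0 - damping_factor) / n + damping_factor * dangling / n
            for node in nodes
        }
        # Each user passes an equal share of their rank to everyone they follow.
        for user, followed in following.items():
            for target in followed:
                new_rank[target] += damping_factor * rank[user] / len(followed)
        rank = new_rank
    return rank


# Hypothetical toy graph: user_a and user_b both follow user_c.
toy_following = {
    "user_a": ["user_c"],
    "user_b": ["user_c"],
    "user_c": [],
}
scores = pagerank_plain(toy_following)
print(max(scores, key=scores.get))
```

In this sketch the scores always sum to 1, and the user with the most (and most influential) followers ends up with the highest score.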

Now, with all the analysis done, the results can be sent back to ClickReadShare to identify which users to advertise to.


In [17]:
decrypted_candidates = {}
for user in encrypted_classifications.keys():

    decrypted_classification = np.argmax(softmax(decrypt_array(encrypted_classifications[user])))
    decrypted_candidates[user] = decrypted_classification*(ranking[user]).decrypt()
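
The cell above multiplies the predicted class index by the decrypted PageRank score. As a plain-Python sketch of that arithmetic, assume two classes where index 1 marks interest in sleep (which the later filter on `probabilities[0][1]` suggests). The helper `softmax_plain` and all numbers are hypothetical stand-ins, not the notebook's `softmax` or real decrypted values.

```python
import math


def softmax_plain(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [-1.2, 3.4]  # hypothetical decrypted scores for classes 0 and 1
probabilities = softmax_plain(toy_logits)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)

# A class-0 prediction zeroes the product, so only class-1 users keep their PageRank.
toy_pagerank = 0.02  # hypothetical decrypted PageRank score
candidate_score = predicted_class * toy_pagerank
print(predicted_class, candidate_score)
```

Because softmax preserves the ordering of its inputs, the arg max of the probabilities is the same as the arg max of the raw scores; the softmax only matters when the probabilities themselves are used, as in the threshold filter further down.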



# ClickReadShare identifies candidates

In [18]:
ranking_decrypted = decrypt_pagerank(ctx,ranking)   

In [19]:
def pagerank_percentile(pagerank_dict):
    """
    Computes the percentile rank for PageRank scores in a dictionary.

    Parameters:
        pagerank_dict (dict): Dictionary with user IDs as keys and PageRank scores as values.

    Returns:
        dict: Dictionary with user IDs as keys and percentile-scaled PageRank scores as values.
    """
    # Extract keys and values
    users = list(pagerank_dict.keys())
    pagerank_scores = np.array(list(pagerank_dict.values()))

    # Compute the ranks (percentile)
    percentile_ranks = 100 * np.argsort(np.argsort(pagerank_scores)) / (len(pagerank_scores) - 1)

    # Normalize to [0, 1] range
    scaled_pagerank = percentile_ranks / 100

    # Return as a dictionary
    return {user: scaled for user, scaled in zip(users, scaled_pagerank)}


In [20]:
scaled_pr = pagerank_percentile(ranking_decrypted)
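
The double `argsort` in `pagerank_percentile` converts each score into its rank position (0 for the lowest score) and then divides by the number of users minus one. A small standard-library sketch of the same mapping on made-up scores is shown below; `percentile_rank_plain` and `toy_scores` are hypothetical names used only for illustration.

```python
def percentile_rank_plain(scores):
    """Map each key to its rank position scaled to [0, 1] (0 = lowest score)."""
    ordered = sorted(scores, key=scores.get)  # keys from lowest to highest score
    last = len(ordered) - 1
    return {key: position / last for position, key in enumerate(ordered)}


toy_scores = {"user_a": 0.10, "user_b": 0.50, "user_c": 0.30}  # hypothetical PageRank values
print(percentile_rank_plain(toy_scores))
# {'user_a': 0.0, 'user_c': 0.5, 'user_b': 1.0}
```

Like the NumPy version, this sketch needs at least two users, since it divides by the number of users minus one. With 100 users, adjacent ranks differ by 1/99.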


Let's take a look at the top candidates. We can pick them out by selecting users to whom the transformer classifier assigns a high probability (above 0.99) and who are also in the top 20% of the PageRank results (`scaled_pr` above 0.8).

In [21]:
decrypted_candidates_score = {}
for user in encrypted_classifications.keys():

    probabilities, decrypted_classification = to_classes(decrypt_array(encrypted_classifications[user]))
    if scaled_pr[user] > 0.8 and probabilities[0][1] > 0.99:
        decrypted_candidates_score[user] = probabilities[0][1]*scaled_pr[user]
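
To make the two thresholds concrete, here is a small sketch of the same filter on made-up values. The dictionaries `toy_sleep_probability` and `toy_scaled_pr` are hypothetical and stand in for the decrypted classifier probabilities and `scaled_pr`.

```python
# Hypothetical decrypted values for three users (not from the demo data).
toy_sleep_probability = {"user_a": 0.999, "user_b": 0.995, "user_c": 0.60}
toy_scaled_pr = {"user_a": 0.95, "user_b": 0.40, "user_c": 0.99}

# Keep only users who clear both thresholds, scored by probability * percentile.
shortlist = {
    user: toy_sleep_probability[user] * toy_scaled_pr[user]
    for user in toy_sleep_probability
    if toy_scaled_pr[user] > 0.8 and toy_sleep_probability[user] > 0.99
}
print(shortlist)
```

Here only `user_a` survives: `user_b` is relevant but not influential enough, and `user_c` is influential but not confidently classified as interested in sleep.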
   

For fun, let's see what the comment summaries say about the top users. This step is not required; it only helps validate the results.

In [22]:

# Rank users by predicted class * PageRank and show the top five
top_users = sorted(decrypted_candidates, key=decrypted_candidates.get, reverse=True)[:5]
for user in top_users:
    print(user)
    print('user summary:', user_summary[user])
    # Users filtered out in the previous cell have no score; .get avoids a KeyError
    print('decrypted candidates score:', decrypted_candidates_score.get(user, 'below threshold'))
    print('~~~~')


user_99
user summary: Sleep: Mindfulness meditation, exercise, setting boundaries, and avoiding screens before bed can improve sleep quality. Financial Health: Building credit, monitoring credit scores, and disputing errors are key for financial stability. Self-improvement: Challenging oneself
decrypted candidates score: 0.9999999990177237
~~~~
user_98
user summary: Sleep: Create a bedtime routine to improve sleep quality. Sustainable weight loss: Focus on small diet and lifestyle changes. Retirement savings: Prioritize saving early and diversifying investments. Mental wellness: Practice self-care routines, mindfulness, and meditation. Supplements:
decrypted candidates score: 0.9898989898989413
~~~~
user_97
user summary: Improving sleep quality is crucial for overall well-being. Establish a calming bedtime routine and create a comfortable sleeping environment. Limit screen time and avoid caffeine close to bedtime. Consistency is key for better sleep. Prioritize self-care, mental wellne