# Workflow example:  To what extent are you a positive person? 

### Steps ###
0. **Formulate Goal**
1. **Determine which variable should be selected to reach goal**
2. **Select raw data source**
3. **List available information in raw data source**
4. **Create dictionary with sensitive info**
5. **Create function to search for sensitive info and replace it with pseudo data**
____________________________________________________________________________________________________________

In [155]:
import json
import pandas as pd
from pathlib import Path
import numpy as np
import emojis
import emoji
import regex
import matplotlib.pyplot as plt

### 0. Goal

**Develop a generic label search function for (Instagram) .json files**

### 1. Variable

* Automatically find sensitive user information in all .json files
* Automatically replace sensitive user information with a (pseudo-)anonymized key
* Automatically create a dictionary that maps each original value to its (pseudo-)anonymized key
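
The three points above can be sketched as one recursive pass over a parsed .json object. This is only a minimal sketch under stated assumptions, not the notebook's implementation: the function name `pseudonymize`, the `user_<n>` key format and the sample data are hypothetical, and it assumes that sensitive values occur as exact string matches.

```python
import json


def pseudonymize(obj, sensitive, mapping=None):
    """Return a copy of obj in which every string found in `sensitive` is replaced.

    `mapping` collects original -> pseudonym pairs, so the same original
    always receives the same key. The 'user_<n>' format is an assumption.
    """
    if mapping is None:
        mapping = {}

    def key_for(value):
        if value not in mapping:
            mapping[value] = f"user_{len(mapping) + 1}"
        return mapping[value]

    def walk(node):
        if isinstance(node, dict):
            # Dictionary keys may hold usernames too, so they are checked as well
            return {walk(k): walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, str) and node in sensitive:
            return key_for(node)
        return node

    return walk(obj), mapping


# Invented example data, not the real layout of an Instagram download
example = {"following": {"alice_example": "2020-01-01"},
           "messages": [{"sender": "alice_example", "text": "hi"},
                        {"sender": "bob_example", "text": "hello"}]}
anonymized, mapping = pseudonymize(example, {"alice_example", "bob_example"})
print(json.dumps(anonymized, indent=2))
print(mapping)
```

Exact matching does not catch names that appear inside free text (e.g., in a message body); that case needs a text search such as the one developed in step 4.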

### 2. Raw data source

**Instagram**

Instagram download contains the following folders:

* direct  > date folders (YYDDMM) > photos directly sent to other users via 'message' on that day
* photos  > date folders (YYDDMM) > photos posted on your 'page' that day
* videos  > date folders (YYDDMM) > videos posted on your 'page' that day
* stories > date folders (YYDDMM) > photos posted on your 'story' that day
* profile > date folders (YYDDMM) > photo used as profile picture on that day

Instagram download contains the following files (not in folders):
0. information_about_you: your primary location (home address)
1. searches: your search info on Instagram with corresponding timestamp
2. autofill: ? (*'You have no data in this section'*)
3. checkout: the email address of the payment account (N.B. Instagram is free)
4. connections: all your connections with corresponding timestamps (e.g., when did you start following them or vice versa)
5. devices: information about the devices used
6. likes: likes of media posts and comments of other users with corresponding timestamp
7. media: captions of photo posts, video posts, and stories with corresponding timestamp and path to corresponding media (within download)
8. seen_content: all content (posts, videos, ads, chains) you've seen on Instagram with corresponding timestamp and author (username of poster)
9. settings: account settings (allow comments from)
10. stories_activities: your activity on story polls of other users
11. account_history: info on logged-in devices (e.g., IP address) and registration info (e.g., name, email)
12. comments: your comments on other (unknown) users' posts with corresponding timestamp
13. messages: private messages between you and other users with corresponding timestamps, shared media, links, etc.
14. profile: all information about your profile (e.g., username, email, full name, start date, etc.)
15. saved: all saved media with corresponding timestamp and owner of media (username)
16. uploaded_contacts: ? (*'You have no data in this section'*)


In [156]:
project = Path('your path to insta data')
data = project /'datadownload'

In [157]:
# Passive files (generated by insta)
json_file_you = data / 'information_about_you.json'
json_file_autofill = data / 'autofill.json'
json_file_pay = data / 'checkout.json'
json_file_users = data / 'connections.json'
json_file_device = data / 'devices.json'
json_file_settings = data / 'settings.json'
json_file_account = data / 'account_history.json'
json_file_user = data / 'profile.json'
json_file_contact = data / 'uploaded_contacts.json'

# Interaction files (generated by users)
json_file_like = data / 'likes.json'
json_file_med = data / 'media.json'
json_file_seen = data / 'seen_content.json'
json_file_stories = data / 'stories_activities.json'
json_file_com = data / 'comments.json'
json_file_mes = data / 'messages.json'
json_file_saved = data / 'saved.json'
json_file_search = data / 'searches.json'
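
Instead of one variable per file, the .json files in the download folder could also be collected generically. The following is a minimal sketch, assuming all files sit directly in one folder; `find_json_files` and the `EXPECTED` list are hypothetical names, and a temporary folder stands in for a real download.

```python
from pathlib import Path
import tempfile

# Subset of the expected file names, for illustration only
EXPECTED = ["profile.json", "connections.json", "messages.json"]


def find_json_files(folder):
    """Map file stem -> path for every .json file directly inside folder."""
    return {p.stem: p for p in sorted(Path(folder).glob("*.json"))}


# Demonstration on a throwaway folder instead of a real download
with tempfile.TemporaryDirectory() as tmp:
    for name in ["profile.json", "messages.json"]:
        (Path(tmp) / name).write_text("{}", encoding="utf8")
    found = find_json_files(tmp)
    missing = [name for name in EXPECTED if Path(name).stem not in found]

print(sorted(found))
print(missing)
```

Reporting the missing files up front avoids a `FileNotFoundError` halfway through the later steps.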

### 3. List available information

#### What sensitive info is where?

Files containing 'usernames' 
* 1. searches.json --> username of other users (direct)
* 4. connections.json --> username of other users (direct) (your connections: people following you, or people you follow)
* 6. likes.json --> username of other users (direct)
* 7. media.json --> username of other users (indirect) (within your caption you can tag people with @username)
* 8. seen_content.json --> username of other (unknown) users (direct)
* 10. stories_activities.json --> username of other users (direct)
* 11. account_history.json --> registration_info list
* 12. comments.json --> username of other (unknown) users (direct + indirect) (within your comment you can tag people with @username)
* 13. messages.json --> username of other users (direct + indirect) (within your message you can tag people with @username, but the full names of users are also used frequently, e.g., 'hey Kees! how are you?')
* 14. profile.json --> username of your account (direct)
* 15. saved.json --> username of other users (direct)

Files containing other personal info
* 0. information_about_you.json --> primary location (home address)
* 3. checkout.json --> payment_account_emails
* 5. devices.json --> device_id
* 11. account_history.json --> login_history (e.g., IP address, device id) and registration_info (e.g., name, email)
* 14. profile.json --> all profile info (e.g., email, gender, name, link to profile picture, username, etc.)
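
The 'indirect' usernames listed above (tags of the form @username inside captions, comments and messages) can be pulled out with a regular expression. This is a minimal sketch: the character set allowed in a username is an assumption, `tagged_usernames` is a hypothetical helper, and the sample text is invented. Note that the pattern would also pick up the domain part of an email address.

```python
import re

# Assumption: usernames consist of letters, digits, underscores and periods
MENTION = re.compile(r"@([A-Za-z0-9_.]+)")


def tagged_usernames(text):
    """Return the usernames tagged with @ in text, without trailing periods."""
    names = (m.rstrip(".") for m in MENTION.findall(text))
    return [n for n in names if n]


print(tagged_usernames("@some.user thanks, and you too @other_user."))
print(tagged_usernames("no tags here"))
```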


### 4. Create dictionary with sensitive info

#### Find all 'explicit' usernames

In [158]:
def usernames():
    
    # Load profile.json to get username of user
    with open(json_file_user, encoding = "utf8") as json_user:
        user = json.load(json_user)
    
    user = pd.DataFrame.from_dict(user, 
        orient = 'index').T 
    
    # Load connections.json to get username of all connections
    with open(json_file_users, encoding = "utf8") as json_users:
        users = json.load(json_users)

    users = pd.DataFrame.from_dict(users, 
        orient = 'index').T 

    users = users.index.values.tolist()
    
    # Create dictionary with original username as key
    dictionary = dict.fromkeys(user['username'], 'NA')
    dictionary.update(dict.fromkeys(users, 'NA'))
    
    # look for usernames outside of connections 
    # Saved media
    with open(json_file_saved, encoding = "utf8") as json_saved:
        saved = json.load(json_saved)
    
    users = pd.DataFrame(saved['saved_media'])[1]
    
    # Likes
    with open(json_file_like, encoding = "utf8") as json_likes:
        likes = json.load(json_likes)
    
    # Series.append was removed in pandas 2.0; pd.concat is the replacement
    user_like = pd.concat([pd.DataFrame(likes['media_likes'])[1],
                           pd.DataFrame(likes['comment_likes'])[1]])
        
    # Seen content
    with open(json_file_seen, encoding = "utf8") as json_seen:
        seen = json.load(json_seen)
    
    # Series.append was removed in pandas 2.0; pd.concat is the replacement
    user_seen = pd.concat([
        pd.DataFrame(seen['chaining_seen'])['username'],
        pd.DataFrame(seen['ads_seen'])['author'],
        pd.DataFrame(seen['posts_seen'])['author'],
        pd.DataFrame(seen['videos_watched'])['author'],
    ])
    
    # Search media
    with open(json_file_search, encoding = "utf8") as json_search:
        search = json.load(json_search)

    user_search = pd.DataFrame(search)['search_click']
    
    # Media comments
    with open(json_file_com, encoding = "utf8") as json_comments:
        comments = json.load(json_comments)

    user_com = pd.DataFrame(comments['media_comments'])[2]
    
    # Merge all usernames
    users = set(pd.concat([users, user_seen, user_like, user_search, user_com]))
    
    # Add every username that is not yet in the dictionary
    for i in users:
        dictionary.setdefault(i, 'NA')

    return dictionary
    

In [159]:
usernames()

{'roosvoor': 'NA',
 'beberson': 'NA',
 'danielpolosetzky': 'NA',
 'sophie_soof': 'NA',
 'symonab': 'NA',
 'mana.fazel': 'NA',
 'evaendema': 'NA',
 'zack_from_earth': 'NA',
 'jboonstra73': 'NA',
 '_romyrachel': 'NA',
 'lauraderooij': 'NA',
 'veerlegewoon': 'NA',
 'sophiejacobs1993': 'NA',
 'momo_schaap': 'NA',
 'mitalipoovs': 'NA',
 'bonnievanderlee': 'NA',
 'agnesdesl': 'NA',
 'theycallmenita': 'NA',
 'die_ene_insta': 'NA',
 'hannadohle': 'NA',
 'jurrekuin': 'NA',
 'yaramiora': 'NA',
 'faraah.aulia': 'NA',
 'bluunie': 'NA',
 'tiarmaguvnor': 'NA',
 'ingevanooijen': 'NA',
 'dieuweertje': 'NA',
 'al.bert.0': 'NA',
 'annelotte2': 'NA',
 'hugomcgurran': 'NA',
 'anouckdh': 'NA',
 'mitmettoni': 'NA',
 'ellinaa': 'NA',
 'floor___': 'NA',
 'louisebc': 'NA',
 'laurameershoek': 'NA',
 'kimvuurboom': 'NA',
 'dieffiee': 'NA',
 'vera.anne.bakker': 'NA',
 'tamarabreugelmans': 'NA',
 'hannahvanderstok': 'NA',
 'tesselbossen': 'NA',
 'robinvanschuylenburch': 'NA',
 'rafickdemol': 'NA',
 'sboogje': 'NA'

#### Find all names

Extract name of user

In [160]:
def names():
    
    # Load account_history.json to get the registration info of the user
    with open(json_file_account, encoding = "utf8") as json_user:
        name = json.load(json_user)

    # Create dictionary
    name_dic = {name['registration_info']['registration_username']: 'NA'}

    return(name_dic)

names()

{'Roos': 'NA'}

Extract mail of user

In [161]:
def mail():
    
    # Load profile.json to get the email address of the user
    with open(json_file_user, encoding = "utf8") as json_user:
        user = json.load(json_user)

    mail_dic = {user['email']: 'NA'}
    
    return(mail_dic)

mail()

{'vladimirvladimirina@gmail.com': 'NA'}

Create functions to find names in text

In [162]:
import json
import re
import spacy
from pathlib import Path
import math

In [163]:
def read_json(path):
    """Read a .json file and return the parsed content."""
    with open(path, encoding = "utf8") as f:
        return json.load(f)

mydata = read_json(json_file_mes)

In [164]:
def get_pii(string):
    """List all proper nouns, email addresses and phone numbers in a given string."""

    # Raw strings keep the backslashes in the patterns intact
    email_regex = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
    mob_regex = r"(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4})"

    # `nlp` is the spaCy pipeline that is loaded in a later cell (In [170])
    processed = nlp(string)
    pii = []

    for token in processed:
        if token.pos_ == 'PROPN':
            pii.append(token.text)
        elif re.search(mob_regex, token.text):
            pii.append(token.text)
        elif re.search(email_regex, token.text):
            pii.append(token.text)

    return pii

In [165]:
def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if k == key:
                    arr.append(v)         
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results
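
A usage sketch on a small hand-made structure shows what the recursive search returns. The function is repeated here in compact generator form (`iter_values`, a hypothetical name) so that the block runs on its own; the sample data are invented and do not claim to match the real layout of messages.json.

```python
def iter_values(obj, key):
    """Yield every value stored under `key` anywhere in a nested dict/list structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from iter_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_values(item, key)


# Invented sample: one conversation with three messages
sample = [
    {"participants": ["user_a", "user_b"],
     "conversation": [{"sender": "user_a", "text": "hi"},
                      {"sender": "user_b", "text": "hello"},
                      {"sender": "user_a", "text": "bye"}]},
]

print(sorted(set(iter_values(sample, "sender"))))
```

Values are returned once per occurrence, which is why the result is passed through `set` before it is compared with the username dictionary.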
    

Extract senders (usernames) from messages

In [166]:
senders = extract_values(mydata,'sender')
set(senders)

{'adjoayo',
 'ann.eliess',
 'annexevita',
 'anouckdh',
 'appelpartje',
 'beberson',
 'beertjelohman',
 'chairaserrarens',
 'charlottehofstee',
 'danaesme',
 'dieffiee',
 'doris.daily',
 'evaendema',
 'farliaa',
 'guusje002',
 'hannadohle',
 'htullemans',
 'ingevanooijen',
 'irisgombert',
 'jaspervdzwaag',
 'jboonstra73',
 'jolivere',
 'juulhuitema',
 'kimnetsanav94',
 'kmdennard',
 'leonmarijn.s',
 'lissmits_',
 'lizzie_jmo',
 'louisebc',
 'mariannedhk',
 'marinbaelde',
 'mikevzwieten',
 'momo_schaap',
 'neuroseps',
 'noaduizend',
 'pirosssvl',
 'rafickdemol',
 'roh_ree',
 'roosvoor',
 'rrougoor',
 'saarhollander',
 'samsalasamba',
 'serinakragt',
 'sophie_soof',
 'suzannedezwaan',
 'tamarabreugelmans',
 'tesselbossen',
 'theycallmenita',
 'tiarmaguvnor',
 'vandevussevanrijn',
 'veerlegewoon',
 'yaylailksoy'}

In [167]:
# Check if there are unknown senders in the list
dictionary = usernames()

unknown_senders = {s for s in senders if s not in dictionary}

if unknown_senders:
    print("There are unknown senders!")
else:
    print("All senders are in dictionary")

All senders are in dictionary


Extract names from messages

In [168]:
# Load messages.json
with open(json_file_mes, encoding = "utf8") as json_messages:
    message = json.load(json_messages)

# Create one dataframe per conversation and stack them with pd.concat
# (DataFrame.append was removed in pandas 2.0).
# NB: the slice starts at message[1], so message[0] is not included.
messages = pd.concat([pd.DataFrame.from_dict(m, orient = 'index').T
                      for m in message[1:]])

# Generate dataframe from message conversation 
messages_conversation = pd.DataFrame(messages['conversation'].dropna().values.tolist())  

In [169]:
# Select columns with a list; indexing with a set raises a TypeError in pandas 2.0
mydata = messages_conversation[['sender', 'text']]
mydata

| | sender | text |
|---|---|---|
| 0 | roosvoor | Haha lekkaahh |
| 1 | beertjelohman | Ja sowieso! |
| 2 | roosvoor | Maar kan dus alleen maar beter wordn 💪🎉 |
| 3 | roosvoor | Lekker begin 😅 |
| 4 | roosvoor | Haha o nooooo |
| ... | ... | ... |
| 1671 | roosvoor | Haha ja als je niet in dat huis moet zitten we... |
| 1672 | jaspervdzwaag | Hahah dit is best grappig... |
| 1673 | doris.daily | |
| 1674 | roosvoor | Hahahaha |


In [170]:
# For dutch: spacy.load("nl_core_news_sm")
# For english: spacy.load("en_core_web_sm")

import nl_core_news_sm

nlp = nl_core_news_sm.load()
 
def get_pii(string):
    """List all proper nouns, email addresses and phone numbers in a given string."""

    # Raw strings keep the backslashes in the patterns intact
    email_regex = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
    mob_regex = r"(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4})"

    processed = nlp(string)
    pii = []

    for token in processed:
        if token.pos_ == 'PROPN':
            pii.append(token.text)
        elif re.search(mob_regex, token.text):
            pii.append(token.text)
        elif re.search(email_regex, token.text):
            pii.append(token.text)

    return pii
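
The two regular expressions can be checked in isolation, without loading a spaCy model. The sketch below applies them to single tokens in the same order as `get_pii` (phone number first, then email); `classify` is a hypothetical helper and the tokens are invented examples.

```python
import re

email_regex = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"
mob_regex = r"(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4})"


def classify(token):
    """Return 'phone', 'email' or None for a single token."""
    if re.search(mob_regex, token):
        return "phone"
    if re.search(email_regex, token):
        return "email"
    return None


for token in ["someone@example.com", "0612345678", "555-1234", "hallo", "20200306"]:
    print(token, classify(token))
```

Because `mob_regex` is used with `re.search`, any run of seven or more digits counts as a phone number, so a compact date such as `20200306` is flagged as well. Whether such false positives are acceptable depends on how strict the anonymization has to be.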


In [171]:
# Create list with all candidates returned by get_pii (hopefully including all names)
# and the corresponding senders
nouns = list()
sender = list()

for my_text, my_sender in zip(mydata['text'], mydata['sender']):
    if pd.isnull(my_text):
        continue  # skip messages without text
    nouns.append(get_pii(my_text))
    sender.append(my_sender)

In [172]:
# Create dataframe (named pii_df so that the `data` path defined earlier is not overwritten)
pii_df = pd.DataFrame({'sender': sender, 'nouns': nouns})

# Keep only the rows in which at least one candidate was found
clean = pii_df[pii_df['nouns'].map(len) > 0]
clean = clean.reset_index(drop = True)
clean

| | sender | nouns |
|---|---|---|
| 0 | roosvoor | [lekkaahh] |
| 1 | roosvoor | [🎉] |
| 2 | roosvoor | [nooooo] |
| 3 | beertjelohman | [ja, zooo] |
| 4 | roosvoor | [quality, brush] |
| ... | ... | ... |
| 253 | mariannedhk | [❤] |
| 254 | kmdennard | [birthday] |
| 255 | roosvoor | [nice] |
| 256 | lissmits_ | [😍] |
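
The output above shows that the proper-noun tag also fires on emoji and informal words. One possible post-filter, shown here only as a sketch, keeps tokens that are purely alphabetic and start with an uppercase letter. This rests on the assumption that names are capitalized, which informal chat text often violates, so it trades missed names for fewer false positives; `likely_names` is a hypothetical helper.

```python
def likely_names(tokens):
    """Keep tokens that are purely alphabetic and start with an uppercase letter."""
    return [t for t in tokens if t.isalpha() and t[0].isupper()]


print(likely_names(["lekkaahh", "🎉", "Kees", "nooooo"]))
```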


In [141]:
############################################# OLD ################################################################

# Load json files

## MESSAGES


### load Messages file

In [207]:
def messages():
    # Load messages.json
    with open(json_file_mes, encoding = "utf8") as json_messages:
        message = json.load(json_messages)

    # Create one dataframe per conversation and stack them with pd.concat
    # (DataFrame.append was removed in pandas 2.0).
    # NB: the slice starts at message[1], so message[0] is not included.
    messages = pd.concat([pd.DataFrame.from_dict(m, orient = 'index').T
                          for m in message[1:]])

    # Generate dataframe from message conversation 
    messages_conversation = pd.DataFrame(messages['conversation'].dropna().values.tolist())  
    
    return messages_conversation

In [218]:
messages_conversation = messages()
file = messages_conversation[['sender', 'text']]

### Content messages_conversation
 * sender
 * created_at
 * story_share
 * text
 * media_owner
 * media_share_caption
 * media_share_url
 * mentioned_username
 * media
 * video_call_action
 * likes
 * animated_media_images
 * is_random
 * user
 * action
 * story_share_type
 * heart
 * voice_media
 * link
 * profile_share_username
 * profile_share_name

#### Variables in messages_conversation with (user)names in them

In [288]:
# Columns with only user in it
sender = messages_conversation['sender'].dropna()
media_owner = messages_conversation['media_owner'].dropna()
mentioned_username = messages_conversation['mentioned_username'].dropna()
video_call = messages_conversation['video_call_action'].dropna()
likes = messages_conversation['likes'].dropna()
action = messages_conversation['action'].dropna()
profile_share_username = messages_conversation['profile_share_username'].dropna()
profile_share_name = messages_conversation['profile_share_name'].dropna()

# Columns with users in text ('s)
story_share = messages_conversation['story_share'].dropna()

# Columns with users tagged in text (@)
text_tagged = messages_conversation['text'].dropna()
media_caption_tagged = messages_conversation['media_share_caption'].dropna()

# Columns with users randomly mentioned (probably as 'actual' name, not as username)
text_mentioned = messages_conversation['text'].dropna()
media_caption_mentioned = messages_conversation['media_share_caption'].dropna()

#### Load Media file

In [20]:
def main_stories():
    # Load media.json
    with open(json_file_med, encoding = "utf8") as json_media:
        media = json.load(json_media)

    media = pd.DataFrame.from_dict(media, 
            orient = 'index').T 

    # Generate separate DataFrames for the different lists (i.e., stories, photos, videos) in media
    stories_media = pd.DataFrame(media['stories'].dropna().values.tolist())
    
    return(stories_media)

In [21]:
def main_photos():
    # Load media.json
    with open(json_file_med, encoding = "utf8") as json_media:
        media = json.load(json_media)

    media = pd.DataFrame.from_dict(media, 
            orient = 'index').T 

    # Generate separate DataFrames for the different lists (i.e., stories, photos, videos) in media
    photos_media  = pd.DataFrame(media['photos'].dropna().values.tolist())
    
    return(photos_media)

In [22]:
def main_profile():
    # Load media.json
    with open(json_file_med, encoding = "utf8") as json_media:
        media = json.load(json_media)

    media = pd.DataFrame.from_dict(media, 
            orient = 'index').T 

    # Generate separate DataFrames for the different lists (i.e., stories, photos, videos) in media
    profile_media = pd.DataFrame(media['profile'].dropna().values.tolist())
    
    return(profile_media)

In [23]:
def main_videos():
    # Load media.json
    with open(json_file_med, encoding = "utf8") as json_media:
        media = json.load(json_media)

    media = pd.DataFrame.from_dict(media, 
            orient = 'index').T 

    # Generate separate DataFrames for the different lists (i.e., stories, photos, videos) in media
    videos_media  = pd.DataFrame(media['videos'].dropna().values.tolist())
    
    return(videos_media)
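
The four loaders above differ only in the section they select, so they could be collapsed into one function that takes the section name as a parameter. The following is a minimal sketch, assuming the top level of media.json is a dictionary of lists; `load_media_section` is a hypothetical name and the demonstration file is invented.

```python
import json
import tempfile
from pathlib import Path


def load_media_section(path, section):
    """Return the list stored under `section` in a media.json-style file ([] if absent)."""
    with open(path, encoding="utf8") as f:
        media = json.load(f)
    return media.get(section, [])


# Demonstration with an invented file instead of a real download
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "media.json"
    demo.write_text(json.dumps({
        "photos": [{"caption": "example caption",
                    "taken_at": "2020-01-01T00:00:00+00:00"}],
        "videos": [],
    }), encoding="utf8")
    photos = load_media_section(demo, "photos")
    stories = load_media_section(demo, "stories")

print(len(photos), len(stories))
```

The returned list of records can be passed straight to `pd.DataFrame(...)` to obtain the same kind of table as the individual loaders.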


In [25]:
main_videos()
main_photos()

| | caption | taken_at | path | location |
|---|---|---|---|---|
| 0 | Dagje naar Artis met de fam 🐒\n#NIETWIEBELEN#j... | 2020-03-06T10:52:18+00:00 | photos/202003/2075921aba6fb776c498512841a02bbf... | |
| 1 | Dagje naar Artis met de fam 🐒\n#NIETWIEBELEN#j... | 2020-03-06T10:52:18+00:00 | photos/202003/9af411fcc7be8bb8694554a57ddf71ae... | |
| 2 | Dagje naar Artis met de fam 🐒\n#NIETWIEBELEN#j... | 2020-03-06T10:52:18+00:00 | photos/202003/f551aa1d550fd97483e8ba865202cb80... | |
| 3 | Dagje naar Artis met de fam 🐒\n#NIETWIEBELEN#j... | 2020-03-06T10:52:18+00:00 | photos/202003/735459e0859bbc9a2245eb1daf91f841... | |
| 4 | "Traveling is all about finding yourself" 🌍🧘‍♀... | 2020-01-21T13:40:28+00:00 | photos/202001/456345dbd19f6f856ad0c0fb7c33daba... | |
| ... | ... | ... | ... | ... |
| 151 | Groepje11 for the win!!#makingSTICSgreatagain | 2017-05-16T19:25:59+00:00 | photos/201705/ecbb6d3da131aa1062473723ffdf18ec... | |
| 152 | Groetjes uit Fryslân!! #cognitoweekend | 2017-05-14T15:29:56+00:00 | photos/201705/82493ccb5a7ae7965fec2d829d0352a3... | |
| 153 | We're on sceen! #CogniTalks#Sooooofficial | 2017-04-05T15:31:24+00:00 | photos/201704/005e9a149756120a204ddf85cdf1dab4... | Pakhuis De Zeijger Amsterdam |
| 154 | Stampot pie #fitgirl#healthy#grandparentsknowbest | 2017-04-02T16:58:50+00:00 | photos/201704/e741840c5ec33d0d3e12bbf40771c747... | |


In [38]:
def direct():
    # Load media.json
    with open(json_file_med, encoding = "utf8") as json_media:
        media = json.load(json_media)

    media = pd.DataFrame.from_dict(media, 
            orient = 'index').T 

    # Generate separate DataFrames for the different lists (i.e., stories, photos, videos) in media
    direct_media  = pd.DataFrame(media['direct'].dropna().values.tolist())
    
    return(direct_media)

In [42]:
if __name__ == '__main__':
    main_videos()

| | caption | taken_at | location | path |
|---|---|---|---|---|
| 0 | Good night 👋💤🌎🌞⬇️🌌🌜⬆️🌠💖 #timelapse#whereismyph... | 2019-10-16T23:55:52+00:00 | Lake Erie | videos/201910/e40cb11c791e46154182aff50eb7978e... |


**Load Comments file**

In [52]:
# Load comments.json
with open(json_file_com, encoding = "utf8") as json_comments:
    comments = json.load(json_comments)

comments = pd.DataFrame.from_dict(comments, 
                                      orient = 'index').T 

# Generate dataframe from media comments 
media_comments = pd.DataFrame(comments['media_comments'].dropna().values.tolist())
media_comments

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 2020-03-06T20:22:31+00:00 | @adjoayo inspiratie? ;) | the.pinklemonade |
| 1 | 2019-11-10T16:35:04+00:00 | You look so pretty!! And you too Liz 😉😋 | lizzie_jmo |
| 2 | 2019-10-17T02:22:22+00:00 | @lizzie_jmo Magic🙌 | roosvoor |
| 3 | 2019-10-07T22:11:10+00:00 | Hier!! | doris.daily |
| 4 | 2019-08-19T12:08:21+00:00 | Waar zijn de punani's? | doris.daily |
| 5 | 2019-07-22T17:08:59+00:00 | Wat ben je toch fotogeniek Guusie😍 | guusje002 |
| 6 | 2019-05-08T19:51:49+00:00 | @yaramiora geen spat veranderd toch? ;) | roosvoor |
| 7 | 2019-05-06T16:56:19+00:00 | @lieessmits 😬 | roosvoor |
| 8 | 2019-05-06T16:56:00+00:00 | @dieffiee dankje! 😊 | roosvoor |
| 9 | 2019-05-06T16:07:28+00:00 | @doris.daily 🙄🙄🙄 | roosvoor |


# Anonymize

In [226]:
# Transform all senders to anonymized userkeys
def sender():

    for i in messages_conversation.index:
        name = messages_conversation.loc[i, 'sender']
        # .loc writes to the dataframe itself instead of a temporary copy
        if name in syn_dic['user']:
            messages_conversation.loc[i, 'sender'] = syn_dic['user'][name]
        else:
            messages_conversation.loc[i, 'sender'] = syn_dic['connections'][name]

In [227]:
# Pseudonyms for usernames found in neither syn_dic['user'] nor syn_dic['connections'];
# a module-level dict, so that all functions below share it
unknown_dic = {}

def lookup_syn(name):
    """Return the pseudonym of name; an unknown name gets a new random key."""
    if name in syn_dic['user']:
        return syn_dic['user'][name]
    if name in syn_dic['connections']:
        return syn_dic['connections'][name]
    if name not in unknown_dic:
        unknown_dic[name] = randomword(10)
    return unknown_dic[name]

# Transform all users mentioned in 'story_share' to anonymized userkeys
def story_share():
    for i in messages_conversation.index:
        value = messages_conversation.loc[i, 'story_share']
        if isinstance(value, float):  # NaN: no story was shared
            continue
        # The username is the second word of the string, followed by 's
        name = value.split()[1].split("'s")[0]
        messages_conversation.loc[i, 'story_share'] = lookup_syn(name)


In [231]:
# Transform all users mentioned in 'media_owner' to anonymized userkeys
def media_owner():
    for i in messages_conversation.index:
        name = messages_conversation.loc[i, 'media_owner']
        if isinstance(name, float):  # NaN: no media owner
            continue
        messages_conversation.loc[i, 'media_owner'] = lookup_syn(name)


In [233]:
# Transform all users mentioned in 'mentioned_username' to anonymized userkeys
def mentioned_username():
    for i in messages_conversation.index:
        name = messages_conversation.loc[i, 'mentioned_username']
        if isinstance(name, float):  # NaN: no mentioned username
            continue
        messages_conversation.loc[i, 'mentioned_username'] = lookup_syn(name)

In [271]:
# Transform all users tagged (@username) in 'media_share_caption' to anonymized userkeys
def text():
    for i in messages_conversation.index:
        caption = messages_conversation.loc[i, 'media_share_caption']
        if isinstance(caption, float):  # NaN: no caption
            continue
        # Replace every word that starts with '@' and write the caption back
        caption = re.sub(r'(?<!\S)@(\S+)',
                         lambda m: '@' + lookup_syn(m.group(1)),
                         caption)
        messages_conversation.loc[i, 'media_share_caption'] = caption
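
The anonymization functions above call `randomword` and look up `syn_dic['user']` and `syn_dic['connections']`, none of which is defined in this notebook. The sketch below shows one way they could look. It is an assumption about the intended behaviour, not the original code: `build_syn_dic` is a hypothetical helper and the usernames are invented.

```python
import random
import string


def randomword(length):
    """Return a random string of lowercase letters of the given length."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def build_syn_dic(user, connections, length=10):
    """Build {'user': {...}, 'connections': {...}} with one unique pseudonym per username."""
    used = set()

    def fresh():
        word = randomword(length)
        while word in used:  # regenerate on the (unlikely) collision
            word = randomword(length)
        used.add(word)
        return word

    return {"user": {user: fresh()},
            "connections": {name: fresh() for name in connections}}


# Invented usernames, for illustration only
syn_dic = build_syn_dic("owner_example", ["friend_one", "friend_two"])
print(syn_dic)
```

Random keys make the mapping irreversible without the dictionary, so `syn_dic` has to be stored separately from the anonymized data if re-identification should remain possible.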

In [1]:
# Apply anonymization
media_owner()
story_share()
sender()
mentioned_username()


NameError: name 'media_owner' is not defined