# NYC Data Science Academy Cohort 12 (Winter 2018/Jan to Mar 2018)

The following were assigned to the BuiltByGirls group:
1. Gregory Brucchieri
2. Marissa Joy
3. William Kye
4. Lainey Liu
5. Zipporah Polinsky-Nagel
6. Ansel Andro Santos
7. Merle Strahlendorf

Pipeline Design - Ansel and Lainey <br>
Data Cleaning and Corpus Creation - Marissa and Merle <br>
Exploratory Data Analysis - Gregory and Zipporah <br>
Feature Importance using Logistic Regression - William <br>
K Nearest Neighbors Scoring and Allocation Algo - Ansel <br>
Logistic Regression Allocation Algo - William <br>
Flask Dashboard - Ansel and Lainey

# Prototype of the the recommender algo (Part of the Pipeline)

# Part 1: Loading the packages to be used

In [27]:
import pandas as pd
import numpy as np
import nltk
from sklearn.neighbors import NearestNeighbors

pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment = None  # default='warn'

# Part 2: The users file used here is the raw file provided by corie and ciara. this is raw data for the pipeline

Important:
1. The users file here is the one to be used in the creation of the Corpus
2. This means that more is better
3. The minimum requirement is to include the group of users you want to match between each other
4. You can add additional users which you will not match just for the words that they have in their user profile, to make the corpus more robust.  Ideally, users who updated their profile over the past two years would be a good addition because this adds jargons and technical terms into the corpus that was popular over the past two years.

In [28]:
users = pd.read_csv('inputs/users.csv')
drop_cols_users = ['lookingForMentor', 'parentalsignaturecompleted', 
                   'parentalsignaturesent', 'step1completed', 'step2completed', 'checkrcomplete', 
                   'created_timestamp', 'waveUserStatus', 'waveParticipant', 
                   'waveParticipantActive', 'referrer', 'is_test_data', 'has_scheduled', 'match_received', 
                   'viewed_match', 'current_session_number', 'gender', 'onboardingCompletedAt', 
                   'zipcode', 'waveBatch', 'note', 'updated_at', 'updated_by']
users = users.drop(drop_cols_users, axis=1)
users.head(2)

Unnamed: 0,date_joined,brand,experience,goal,obsessions,step3completed,userType,company,location,role,skill,superpower,title,app,industry1,industry2,industry3,techtype1,techtype2,techtype3,topic1,topic2,topic3,is_prepped,wave_number_joined,year_in_school,years_experience,is_vip,company_clean,nyc_id
0,2017/08/31 11:19:10,Red Bull and Everlane,associate,,"Garance Dore. Hand script, doodles, watercolor...",Y,professional,Uber,New York City,Marketing,,,Brand Marketing Designer,Teuxdeux and Evernote,Transport,,,Mobile App,,,UX/UI,Storytelling/Brand,Media/Content,N,1,,,Y,Uber,8843543
1,2017/08/13 2:37:55,,0,Learn about roles beside coding that are also ...,"Art history, the ACLU's Instagram page, Jasmin...",Y,student,,Somewhere else,,,,,Countable and Snapchat,Government & Politics,Travel,Music,Web,Mobile App,Video,Media/Content,Storytelling/Brand,Product Management,N,3,,,N,,7755085


# Getting the topics into a csv (for the dashboard)

In [29]:
topics = users[['topic1', 'topic2', 'topic3']]
topics_words = []
for col in topics:
    for topic in topics[col]:
        if pd.isna(topic): continue
        topics_words.append(topic)

In [30]:
topics_unique = list(set(topics_words))
topics = [[x, topics_words.count(x)] for x in topics_unique]

from operator import itemgetter
topics = sorted(topics, key=itemgetter(1), reverse=True)
topics = pd.DataFrame(topics)
topics.columns = ['topics', 'count']
topics.head()

Unnamed: 0,topics,count
0,Product Management,615
1,Business Operations,611
2,Business Model,604
3,Engineering - Back End,598
4,Data/Analytics,577


In [31]:
topics_df = topics
tot = sum(x for x in topics_df['count'])
top_topics = [[x,'{0:.2f}'.format(round((y/tot)*100,2))] for x,y in zip(topics_df['topics'],topics_df['count'])]
sum_top5 = sum([float(top_topics[x][1]) for x in range(5)])
top_topics = [top_topics[x] for x in range(5)] + [['Others', '{0:.2f}'.format(100-sum_top5)]]
top_topics

[['Product Management', '9.78'],
 ['Business Operations', '9.72'],
 ['Business Model', '9.61'],
 ['Engineering - Back End', '9.51'],
 ['Data/Analytics', '9.18'],
 ['Others', '52.20']]

In [32]:
topics.to_csv('outputs/topics.csv')

# industry tech and topics .csv (for the dashboard)

In [33]:
ind_tech_tops = users[['industry1','industry2','industry3','techtype1','techtype2','techtype3',
                    'topic1', 'topic2', 'topic3']]
ind_tech_tops_words = []
for col in ind_tech_tops:
    for ind_tech_top in ind_tech_tops[col]:
        if pd.isna(ind_tech_top): continue
        ind_tech_tops_words.append(ind_tech_top)

ind_tech_tops_unique = list(set(ind_tech_tops_words))
ind_tech_tops = [[x, ind_tech_tops_words.count(x)] for x in ind_tech_tops_unique]

from operator import itemgetter
ind_tech_tops= sorted(ind_tech_tops, key=itemgetter(1), reverse=True)
ind_tech_tops = pd.DataFrame(ind_tech_tops)
ind_tech_tops.columns = ['ind_tech_tops', 'count']
ind_tech_tops.head()

Unnamed: 0,ind_tech_tops,count
0,Software,904
1,Mobile App,881
2,Technology,858
3,Web,745
4,Product Management,615


In [34]:
ind_tech_tops.to_csv('outputs/ind_tech_tops.csv')

# Part 3-A: Select columns to use for the corpus (Part of the Pipeline)

In [35]:
#put the columns here you want to include in the creation of the corpus, including nyc_id
cols_to_use = ['nyc_id', 'date_joined', 'userType', 'location', 'wave_number_joined', 'brand', 'obsessions', 'company', 'role', 'app', 
                   'industry1', 'industry2', 'industry3', 'techtype1', 'techtype2', 'techtype3', 
               'topic1', 'topic2', 'topic3']
df_initial = users[cols_to_use]
df_initial.head(2)

Unnamed: 0,nyc_id,date_joined,userType,location,wave_number_joined,brand,obsessions,company,role,app,industry1,industry2,industry3,techtype1,techtype2,techtype3,topic1,topic2,topic3
0,8843543,2017/08/31 11:19:10,professional,New York City,1,Red Bull and Everlane,"Garance Dore. Hand script, doodles, watercolor...",Uber,Marketing,Teuxdeux and Evernote,Transport,,,Mobile App,,,UX/UI,Storytelling/Brand,Media/Content
1,7755085,2017/08/13 2:37:55,student,Somewhere else,3,,"Art history, the ACLU's Instagram page, Jasmin...",,,Countable and Snapchat,Government & Politics,Travel,Music,Web,Mobile App,Video,Media/Content,Storytelling/Brand,Product Management


# Part 3-B: Tokenize the words (Part of the Pipeline)

In [36]:
#put the columns here you want to include in the creation of the corpus
cols_no_id = ['brand', 'obsessions', 'company', 'role', 'app', 
                   'industry1', 'industry2', 'industry3', 'techtype1', 'techtype2', 'techtype3', 
              'topic1', 'topic2', 'topic3']
corpus = []
for col in cols_no_id:
    for row in df_initial[col]:
        if pd.isna(row): continue
        for s in nltk.word_tokenize(row):
            corpus.append(s.lower())

# Part 3-C: Remove emojis, stopwords and punctuation

In [37]:
#definition of emojis
import re

emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           "]+", flags=re.UNICODE)

In [38]:
from nltk.corpus import stopwords
stopWords = stopwords.words('english')
#listing additional stopwords
stopWords.append('also')
stopWords.append("'ll")
stopWords.append("'s")
stopWords.append("'m")
stopWords.append("n't")
stopWords.append("s'well")

stopWords  = set(stopWords)
print (stopWords)

{"s'well", 'me', "that'll", "couldn't", 'by', 'between', 'very', 'mightn', 'up', 'am', 'the', 'my', 'off', 'theirs', 'with', 'ma', "needn't", 'of', 'an', 'these', 'than', 'because', 'whom', "won't", 'you', 'him', 'again', "you've", 'm', 'is', 'into', 'yourself', 'were', 'above', 'so', 'below', 'few', 'y', "didn't", 'have', 'same', 'which', 'before', 'a', "don't", 'ours', 'wasn', "isn't", 'i', 'll', 'at', 'here', "aren't", 'for', 'aren', "mustn't", 'where', 'he', 'most', 'their', 'didn', 'doesn', 'against', 'not', 'or', 'under', 'to', 'did', 'during', 'both', 'we', "you'd", 's', 'but', 'over', 'its', 'what', 't', 'mustn', 'just', 'them', 'through', "she's", "it's", 'after', 'some', 'own', 'o', 'itself', 'myself', "hasn't", 'can', 'they', 'ain', 'each', 'wouldn', 'further', 'yours', 'had', 'as', 'yourselves', 'only', 'being', 'does', "you're", "you'll", 'about', 'couldn', 'd', 'be', 'more', 'needn', 'while', 'in', "'ll", 'other', 'how', 'shan', 'our', 'don', 'any', 'doing', 'down', 'then

In [39]:
punctuations = ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', 
                '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~', '--']

In [40]:
print ('original len: {}'.format(len(corpus)))
corpus = [emoji_pattern.sub(r'', x) for x in corpus]
print ('less emoji: {}'.format(len(corpus)))
corpus = [x for x in corpus if x not in stopWords]
print ('less stopwords: {}'.format(len(corpus)))
corpus = [x for x in corpus if x not in punctuations]
print ('words with punctuation marks: {}'.format(len(corpus)))

original len: 80999
less emoji: 80999
less stopwords: 66192
words with punctuation marks: 54409


# Part 3-D: Lemmatization and removal of words which were not filtered (part of the Pipeline)

In [41]:
from nltk.stem.wordnet import WordNetLemmatizer
lmtzr = WordNetLemmatizer()

corpus = [lmtzr.lemmatize(x) for x in corpus]

In [42]:
corpus_unique = list(set(corpus))

not_impt_word = ['taking', 'see', 'month', 'started', 'soon', 'open', 'advisor', 'coaching', 'build', 'within', 
                 'turn', 'already', 'guide', '10', 'finishing', '90', 'behind', 'buying', 'period', 
                 'whether', 'definitely', 'staying', 'bit', 'x', 'becoming', 'attempting', 'coming', 'far', 
                 'r', 'age', 'researching', 'probably', 'p', '6', 'follow', 'living', 'im', 'able', 'article', 
                 'starting', 'co', "'re", 'list', 'riding', "'d", 'oh', 'wait', 'include', 'seen', 'l', 
                 'latest', 'ago', 'nyt', 'none', 'issue', 'hour', 'watched', 'went', 'user', 'another', 
                 'without', 'become', 'enough', 'mostly', '7', 'note', 'side', 'run', 'top', 'test', 
                 'come', '5', 'moved', 'often', 'interesting', 'meeting', 'actually', 'yet', 'collecting', 
                 'ny', 'b', 'san', 'helping', 'seeing', 'h', '3', 'specifically', '2', 'would', 
                 'called', 'week', 'last', 'current', 'taking', 'u', 'etc', 'want', 'use', 'getting', 
                 'lot', 'next', 'go', 'two', 'every', '...', 'especially', 'way', 'everything', 
                 '’', "'ve", "''", '``', 'like', 'currently']

for s in not_impt_word:
    if s in corpus_unique:
        corpus_unique.remove(s)

corpus_count = [[x, corpus.count(x)] for x in corpus_unique]
corpus_count = [x for x in corpus_count if x[1]>5]

from operator import itemgetter
corpus_count = sorted(corpus_count, key=itemgetter(1), reverse=True)

In [43]:
print (len(corpus_count))
corpus_count

929


[['engineering', 2130],
 ['mobile', 1316],
 ['business', 1309],
 ['end', 1130],
 ['app', 975],
 ['technology', 968],
 ['software', 927],
 ['product', 848],
 ['management', 821],
 ['web', 807],
 ['operation', 735],
 ['data/analytics', 684],
 ['back', 626],
 ['model', 617],
 ['medium', 583],
 ['storytelling/brand', 575],
 ['front', 528],
 ['ux/ui', 456],
 ['entertainment', 443],
 ['social', 437],
 ['media/content', 420],
 ['network', 397],
 ['commerce', 396],
 ['uber', 374],
 ['reality', 369],
 ['music', 360],
 ['virtual', 355],
 ['video', 347],
 ['amazon', 332],
 ['instagram', 325],
 ['hardware', 317],
 ['thing', 309],
 ['marketing', 305],
 ['obsessed', 287],
 ['love', 286],
 ['new', 279],
 ['ecommerce/delivery', 276],
 ['vc', 273],
 ['robotics', 271],
 ['fashion', 269],
 ['snapchat', 248],
 ['beauty', 244],
 ['growth', 242],
 ['food', 237],
 ['lever', 236],
 ['internet', 203],
 ['transport', 176],
 ['spotify', 172],
 ['learning', 168],
 ['game', 160],
 ['gaming', 156],
 ['travel', 154]

# Part 3-E: Creation of the corpus unique list (Part of the Pipeline)

In [44]:
corpus_unique = [x[0] for x in corpus_count]
corpus_unique

['engineering',
 'mobile',
 'business',
 'end',
 'app',
 'technology',
 'software',
 'product',
 'management',
 'web',
 'operation',
 'data/analytics',
 'back',
 'model',
 'medium',
 'storytelling/brand',
 'front',
 'ux/ui',
 'entertainment',
 'social',
 'media/content',
 'network',
 'commerce',
 'uber',
 'reality',
 'music',
 'virtual',
 'video',
 'amazon',
 'instagram',
 'hardware',
 'thing',
 'marketing',
 'obsessed',
 'love',
 'new',
 'ecommerce/delivery',
 'vc',
 'robotics',
 'fashion',
 'snapchat',
 'beauty',
 'growth',
 'food',
 'lever',
 'internet',
 'transport',
 'spotify',
 'learning',
 'game',
 'gaming',
 'travel',
 'education',
 'finance',
 'health',
 'design',
 'qa',
 'school',
 'politics',
 'show',
 'nonprofit',
 'wellness',
 'book',
 'reading',
 'government',
 'time',
 'dog',
 'content',
 'watching',
 'one',
 'really',
 'always',
 'google',
 'high',
 'youtube',
 'twitter',
 'art',
 'traveling',
 'trying',
 'development',
 'ux/ui/design',
 'wearable',
 'cooking',
 'throne

In [45]:
np.savetxt("outputs/corpus.csv", corpus_count, delimiter=",", fmt='%s')

# Part 4: Creation of str_combined column for all rows in the users dataframe

# Create column which aggregates the words, then filter based on corpus_unique (part of the Pipeline)

In [46]:
str_combined = []
for index, row in df_initial.iterrows():
    tempstr = ' '
    for s in cols_no_id:
        if pd.isna(row[s]): continue
        else:
            tempstr += ' {}'.format(str(row[s]))
    str_combined.append(tempstr.lower())

df_initial.loc[:,'str_combined'] = str_combined

In [47]:
df_initial.loc[0,"str_combined"]

'  red bull and everlane garance dore. hand script, doodles, watercolor. spaghetti squash. whether i should buy a food processor or a nutribullet. spam (the food). uber marketing teuxdeux and evernote transport mobile app ux/ui storytelling/brand media/content'

In [48]:
df_initial.loc[:,'str_combined'] = [nltk.word_tokenize(x) for x in df_initial['str_combined']]
df_initial.loc[:,'str_combined'] = [[emoji_pattern.sub(r'', x) for x in y] for y in df_initial['str_combined']]
df_initial.loc[:,'str_combined'] = [[x for x in y if x not in stopWords] for y in df_initial['str_combined']]
df_initial.loc[:,'str_combined'] = [[x for x in y if x not in punctuations] for y in df_initial['str_combined']]
df_initial['str_combined'] = [list(set(x)) for x in df_initial['str_combined']]
df_initial['str_combined'] = [[x for x in y if x in corpus_unique] for y in df_initial['str_combined']]

In [49]:
df_initial.head(2)

Unnamed: 0,nyc_id,date_joined,userType,location,wave_number_joined,brand,obsessions,company,role,app,industry1,industry2,industry3,techtype1,techtype2,techtype3,topic1,topic2,topic3,str_combined
0,8843543,2017/08/31 11:19:10,professional,New York City,1,Red Bull and Everlane,"Garance Dore. Hand script, doodles, watercolor...",Uber,Marketing,Teuxdeux and Evernote,Transport,,,Mobile App,,,UX/UI,Storytelling/Brand,Media/Content,"[everlane, marketing, uber, storytelling/brand..."
1,7755085,2017/08/13 2:37:55,student,Somewhere else,3,,"Art history, the ACLU's Instagram page, Jasmin...",,,Countable and Snapchat,Government & Politics,Travel,Music,Web,Mobile App,Video,Media/Content,Storytelling/Brand,Product Management,"[government, politics, music, page, instagram,..."


In [50]:
df_initial.loc[0, 'str_combined']

['everlane',
 'marketing',
 'uber',
 'storytelling/brand',
 'app',
 'ux/ui',
 'evernote',
 'hand',
 'food',
 'watercolor',
 'transport',
 'media/content',
 'red',
 'buy',
 'mobile']

# Part 5-A: Defining the KNN function (one computation, not part of the pipeline. used for checking)

In [51]:
from sklearn.neighbors import NearestNeighbors

def create_df_distance(df, n_index):
    df_new = df.copy()
    cols = df_new.loc[n_index, 'str_combined']
    for s in cols:
        df_new[s] = 0
    
    df_new = df_new[cols]
    
    for x in range(0,len(df_new),1):
        for s in df.loc[n_index, 'str_combined']:
            if s in s in df.loc[x, 'str_combined']:
                df_new.loc[x,s] = 1
            else:
                df_new.loc[x,s] = 0
    
    X = df_new.as_matrix().copy()

    nbrs = NearestNeighbors(n_neighbors=len(X), algorithm='auto', metric='euclidean').fit(X)

    distances, indices = nbrs.kneighbors(X)
    xtest = df_new.iloc[n_index].as_matrix()
    xtest = xtest.reshape(1, -1)

    distances, indices = nbrs.kneighbors(xtest)
    
    df_new['knn_distance'] = 0
    max_dist = max(list(distances[0][:]))
    
    for x,y in zip(list(indices[0][:]),list(distances[0][:])):
        df_new.loc[x, 'knn_distance'] = y/max_dist
    
    df_id_dist = df_new.copy()
    df_id_dist['nyc_id'] = df['nyc_id']
    df_id_dist = df_id_dist[['nyc_id', 'knn_distance']]
    
    return df_new, df_id_dist

In [52]:
df_ind0, df_ind0_id_dist = create_df_distance(df_initial, 1)
df_ind0.sort_values(['knn_distance']).head(10)

Unnamed: 0,government,politics,music,page,instagram,making,storytelling/brand,nail,app,management,wine,polish,cover,art,video,snapchat,product,travel,media/content,web,history,mobile,knn_distance
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.0
50,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,1,0,1,1,0,1,0.768706
2179,0,0,1,0,0,0,1,0,1,1,0,0,0,0,1,0,1,1,1,0,0,1,0.768706
417,1,1,0,0,1,0,1,0,1,1,0,0,0,0,0,0,1,0,0,1,0,1,0.768706
355,1,1,0,0,1,0,1,0,1,0,0,0,0,0,1,0,0,0,1,1,0,1,0.768706
725,1,1,0,0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,1,0,0,1,0.768706
2387,1,1,0,0,0,0,1,0,1,1,0,0,0,0,1,0,1,0,0,1,0,1,0.768706
2020,1,1,1,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,1,0,1,0.768706
971,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,1,1,1,1,1,0,1,0.768706
864,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,1,0.768706


# Part 5-B: Defining the KNN function applied to all other users vs. one user (Output is a list, part of the pipeline)

In [53]:
from sklearn.neighbors import NearestNeighbors

def create_list_distance(df, n_index):
    df_new = df.copy()
    cols = df_new.loc[n_index, 'str_combined']
    for s in cols:
        df_new[s] = 0
    
    df_new = df_new[cols]
    
    for x in range(0,len(df_new),1):
        for s in df.loc[n_index, 'str_combined']:
            if s in s in df.loc[x, 'str_combined']:
                df_new.loc[x,s] = 1
            else:
                df_new.loc[x,s] = 0
    
    X = df_new.as_matrix().copy()

    nbrs = NearestNeighbors(n_neighbors=len(X), algorithm='auto', metric='euclidean').fit(X)

    distances, indices = nbrs.kneighbors(X)
    
    xtest = df_new.iloc[n_index].as_matrix()
    xtest = xtest.reshape(1, -1)

    distances, indices = nbrs.kneighbors(xtest)
    
    df_new['knn_distance'] = 0
    max_dist = max(list(distances[0][:]))
    
    for x,y in zip(list(indices[0][:]),list(distances[0][:])):
        df_new.loc[x, 'knn_distance'] = round(y/max_dist,4)
    
    return list(df_new['knn_distance'])

# Part 6: Creation of the base matrix where we also have the location, is_vip, past advisors of a student, etc. (part of the pipeline)

# Part 6-A: Filtering the universe of people you want to match with each other

Some important notes for filtering being done here:
1. These are the list of students and professionals (advisors) that you want to pair with each other for the current wave
2. Given the requirement in number 1, this means that the number of advisors must be greater than or equal to the number of students
3. This list of users must be part of the original users list used in the creation of the corpus

In [54]:
#users_url is the directory where the users_filtered is located
#users_df is the users_df that has the str_combined (the data frame we created above)
def create_base_matrix(users_filtered_file, past_matches_file, users_df):
    #users_filtered is a file that you (corie/ciara) need to create. 
    #this is the list of users (both student and professionals/advisors) you want to pair with each other
    users_filtered = pd.read_csv(users_filtered_file)
    users_filtered = [x for x in users_filtered['nyc_id']]

    df_distances = users_df.copy()
    for x in df_distances['nyc_id']:
        if x in users_filtered: continue
        else:
            tmpind = df_distances[df_distances['nyc_id']==x].index[0]
            df_distances = df_distances.drop(tmpind)

    #dummy values for past_advisors column
    df_past_matches = pd.read_csv(past_matches_file, header = 0, index_col=None)
    df_past_matches = df_past_matches.dropna()
    df_distances['past_advisors'] = [[int(x) for x,y in \
                                      zip(df_past_matches['advisor_nyc_id'], df_past_matches['advisee_nyc_id'])
                                      if int(y)==int(w)] for w in df_distances['nyc_id']]
    tmp_empty_list = []
    df_distances['past_advisors'] = [x if y=='student' else tmp_empty_list for x,y in zip(df_distances['past_advisors'], 
                                                                              df_distances['userType'])]
    df_distances['topics_all'] = [[x,y,z] for x,y,z in zip(df_distances['topic1'], df_distances['topic2'], df_distances['topic3'])]
    df_distances['topics_all'] = [[x for x in y if not pd.isna(x)] for y in df_distances['topics_all']]
    df_distances['str_combined'] = df_initial['str_combined'].copy()
    df_distances = df_distances[['nyc_id', 'userType', 'location', 'is_vip', 'past_advisors', 'topics_all', 'str_combined']]
    df_distances = df_distances.reset_index(drop=True)
    
    #create checks whether the usertype column is complete or has missing values (error if that happens)
    check_userType = [any([x=='student', x=='professional']) for x in df_distances['userType']]
    if not all(check_userType):
        raise ValueError("userType column has missing values")
    
    #check whether str_combined has an empty list
    check_str_combined = [len(x)>0 for x in df_distances['str_combined']]
    if not all(check_str_combined):
        raise ValueError("str_combined column has missing values")
    
    #check whether topics_all has an empty list
    check_topics_all = [len(x)>0 for x in df_distances['topics_all']]
    if not all(check_topics_all):
        raise ValueError("topics_all column has missing values")
    
    #check if number of students>number of professionals
    if sum(list(df_distances['userType']=='student'))>sum(list(df_distances['userType']=='professional')):
        raise ValueError("Number of students is greater than the number of professionals")
    
    for x in range(0,len(df_distances)):
        tmpname = str(df_distances.loc[x, 'nyc_id'])
        df_distances[tmpname] = 0
    
    df_keywords = df_distances[['nyc_id', 'str_combined']].copy()
    
    return df_keywords, df_distances

In [55]:
#df_keywords is the data frame which will be used as input for the computation of the distances
#df_distances is the matrix where the distances will be inputted
df_keywords, df_distances = create_base_matrix(users_filtered_file='inputs/users_filtered.csv', 
                                               past_matches_file='inputs/past_matches.csv', users_df=users)

In [59]:
#checking df_distances
df_distances.head(2)

Unnamed: 0,nyc_id,userType,location,is_vip,past_advisors,topics_all,str_combined,8843543,7755085,9202605,7666762,432418,3877105,1020980,7896731,6961087,6402058,2290089,1148025,4999309,4369680,382570,284637,8187296,8476629,7306029,5652660,204780,5677744,2593443,4601513,4878277,2620815,3288117,5708352,5474030,4046542,4340193,7474138,3712338,352403,2348870,8526469,7743463,6459524,9929858,9880082,1518524,4788367,8542359,997987,4629976,5064585,4755310,7209834,606005,5017981,5247078,8902800,9134615,7048387,6751671,1348913,6937862,8064126,8436578,8910889,3230467,3140161,3667774,4192795,8301349,8033043,5166603,7485700,9540732,4958919,8561114,3122679,8238975,2562280,7987549,6414426,4101601,1983482,878483,6159047,1557244,6113184,2149492,796123,6672031,4933667,625916,3179761,4914479,2956628,6234328,4558259,6966024,6742971,1497213,3569615,1187639,3758768,7681692,3167915,7792027,9462300,1453483,2598156,477453,3771144,813333,8841456,7679971,3116521,4601941,6306797,3195584,2627748,2169538,4999579,2235203,6437056,2308071,2368457,1390411,7694741,3521250,3206074,301431,2669767,7518538,2256585,94522,6955519,9959606,2037455,2526473,8359351,269209,2825496,3511047,1614397,2659801,4410236,515652,4818902,8219730,7918230,5108683,5238740,7413581,6286127,9113243,3805339,2179494,6059897,9998462,3679840,206547,6117465,9957593,6807435,1739284,9178381,7257843,2109474,8699149,8384707,2360585,3965641,9076155,9318559,6119200,7333768,8697016,7023001,757721,537278,7468074,8175409,8788183,6446789,1857735,7650470,6495154,2611714,710683,152075,2198911,6788313,9853788,9808460,6008056,5240141,8907470,9037441,8912858,6581843,9486722,1110483,4358079,9356485,2271359,7915205,5800707,2794321,2427696,6535431,5402293,8693468,4236687,427550,7888951,2826824,6928588,2527975,5493738,7575883,1597931,1575847,1781688,372231,2282239,4319835,4920915,1484378,2299050,4846292,4336956,5786211,7893434,3895624,2859992,7198024,3423783,6675599,9576956,6126279,5686548,1825469,9700647,2868050,4702672,1146067,4015930,4401457,9546947,94925,1190509,2932188,3266779,8907446,7569840,1746835,3382302,806560,9889124,8560400,6993819,1717171,1758129,3648059,654011,2654860,4409144,345140,7904109,656623,4360626,2373514,6037605,4915718,7945608,381348,9613380,1043372,2004698,8364846,8281303,1555515,3062857,6142470,6025522,2631274,8012883,9664092,7293490,4295994,8811878,4278733,6294925,4723827,8619826,2292042,3960526,5995636,3003753,6696486,6342552,1444202,4885804,4715334,8225142,8653095,1977774,5020397,6314809,6749269,2001782,2569801,4392229,6138397,4095341,2326281,2674601,6272321,7487443,8810505,1085840,8753052,7423155,8633724,6815031,710220,3874565,8985652,7158129,8733709,5064615,2031736,529104,3088834,184099,7680557,8061648,2326949,5205644,6512033,2981141,7601225,822710,2603797,8087478,9938106,3361150,4456308,7594522,9469995,6045830,746096,8259152,422067,7423610,9556430,4214457,3490399,6458334,8846614,6545230,2926356,311409,7324067,5769802,9258719,638772,5485758,6160406,1888196,742264,93981,7958106,8190906,1901532,8632938,2898324,6948891,3509690,7465231,7881557,2320522,1171955,5343640,6617302,2262694,3340724,4604046,2130677,7258728,5023302,7115544,629294,3906792,7841905,4316201,5353925,5978678,4605980,4715885,4923495,125254,2270136,3046384,1905636,1533613,1574686,9973649,436488,2513052,1372385,5791689,7936400,9391125,9773408,5810449,5311075,3092112,1106591,3684523,5182726,7926790,5175387,9389960,428794,3967508,1220914,4345907,5884571,6290417,27779,1579045,1446459,9413760,7110487,2663308,3658078,5880932,8294767,280351,5694419,9224675,7210977,2340455,2452387,2012792,3427395,4187508,4304558,9471033,7748650,385275,9498286,5624053,4686614,8079379,8195703,7828150,1550130,8969696,3032368,1325678,2588879,3703222,6849726,7432662,6778076,6618920,6004942,7496115,6545500,7517791,2799767,5682257,8236719,9658179,7031751,4891992,1963257,8813293,4255011,9835021,5997546,6025427,3398687,1041157,9684715,8769575,5346125,2258308,8923238,7967766,9249359,704895,2185414,9382336,3018819,916644,4119888,8835562,3008395,2212759,9969727,8949839,8265769,4404402,1443873,9990971,9593316,7950567,3828722,7601152,2991448,6512271,2903590,2659610,4077631,2493337,264160,8577110,5766040,2146133,6493111,7638660,7258247,7657509,9932727,7245777,692601,2856925,2352193,2815849,3047315,9642528,3861671,9838132,4075604,4075518,7591652,7908743,9012509,5757510,1005387,4768438,918358,7524608,9934813,6541993,1449114,606407,7931432,4735198,5695334,6550439,8445244,746687,8725761,7670905,701771,9875733,3379787,1842470,983794,8658583,9697676,1468575,9853865,1784635,760211,4667134,9116597,287256,1714194,6791590,2773262,8461802,3760451,4973934,8971747,6287556,6272170,1619149,1256384,7606837,3004297,4514606,2609078,6156693,5206440,4018457,8477772,5212929,5849482,655806,8183052,2880921,506870,3328854,8777873,5357257,1396492,5678088,3589593,7607153,4588539,2951891,4200010,4648718,2696094,4989097,2169312,7658012,5070048,792212,3557154,9648543,8347381,1785441,2330592,4282915,7280759,7414746,1771221,2555756,6935947,4521023,6514761,3355184,9660873,6088914,8233640,8026321,6931189,7830992,245037,2679141,3057273,5682172,5529070,2228699,6922565,7120601,2900118,8633469,5950948,6523262,8363241,3431960,7594141,7772712,8591435,1665700,1444831,4133681,2906883,8060776,5308660,5432977,1065233,8700849,7695512,5458104,7084778,3717562,3021601,6686909,1320940,4453336,1667655,3595011,483768,7050318,6100956,2970600,3929247,3571695,8217139,6075595,9546179,8420826,7184217,5151812,9785062,3465395,5338447,6707682,8387107,746643,1232766,9732127,2160805,646836,7258638,9079878,8199153,5175701,4128637,9455964,4989182,3945611,9332538,1495788,1246537,1788831,2749455,4306743,5137921,9707252,7076461,1359054,3977768,1008130
0,8843543,professional,New York City,Y,[],"[UX/UI, Storytelling/Brand, Media/Content]","[everlane, marketing, uber, storytelling/brand...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,7755085,student,Somewhere else,N,[],"[Media/Content, Storytelling/Brand, Product Ma...","[government, politics, music, page, instagram,...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [60]:
#checking df_keywords
df_keywords.head(2)

Unnamed: 0,nyc_id,str_combined
0,8843543,"[everlane, marketing, uber, storytelling/brand..."
1,7755085,"[government, politics, music, page, instagram,..."


In [61]:
#testing running x=7 for the matrix
x = 7
tmpid = df_distances.loc[x, 'nyc_id']
df_distances[tmpid] = create_list_distance(df_keywords, x)
df_distances.head(2)

Unnamed: 0,nyc_id,userType,location,is_vip,past_advisors,topics_all,str_combined,8843543,7755085,9202605,7666762,432418,3877105,1020980,7896731,6961087,6402058,2290089,1148025,4999309,4369680,382570,284637,8187296,8476629,7306029,5652660,204780,5677744,2593443,4601513,4878277,2620815,3288117,5708352,5474030,4046542,4340193,7474138,3712338,352403,2348870,8526469,7743463,6459524,9929858,9880082,1518524,4788367,8542359,997987,4629976,5064585,4755310,7209834,606005,5017981,5247078,8902800,9134615,7048387,6751671,1348913,6937862,8064126,8436578,8910889,3230467,3140161,3667774,4192795,8301349,8033043,5166603,7485700,9540732,4958919,8561114,3122679,8238975,2562280,7987549,6414426,4101601,1983482,878483,6159047,1557244,6113184,2149492,796123,6672031,4933667,625916,3179761,4914479,2956628,6234328,4558259,6966024,6742971,1497213,3569615,1187639,3758768,7681692,3167915,7792027,9462300,1453483,2598156,477453,3771144,813333,8841456,7679971,3116521,4601941,6306797,3195584,2627748,2169538,4999579,2235203,6437056,2308071,2368457,1390411,7694741,3521250,3206074,301431,2669767,7518538,2256585,94522,6955519,9959606,2037455,2526473,8359351,269209,2825496,3511047,1614397,2659801,4410236,515652,4818902,8219730,7918230,5108683,5238740,7413581,6286127,9113243,3805339,2179494,6059897,9998462,3679840,206547,6117465,9957593,6807435,1739284,9178381,7257843,2109474,8699149,8384707,2360585,3965641,9076155,9318559,6119200,7333768,8697016,7023001,757721,537278,7468074,8175409,8788183,6446789,1857735,7650470,6495154,2611714,710683,152075,2198911,6788313,9853788,9808460,6008056,5240141,8907470,9037441,8912858,6581843,9486722,1110483,4358079,9356485,2271359,7915205,5800707,2794321,2427696,6535431,5402293,8693468,4236687,427550,7888951,2826824,6928588,2527975,5493738,7575883,1597931,1575847,1781688,372231,2282239,4319835,4920915,1484378,2299050,4846292,4336956,5786211,7893434,3895624,2859992,7198024,3423783,6675599,9576956,6126279,5686548,1825469,9700647,2868050,4702672,1146067,4015930,4401457,9546947,94925,1190509,2932188,3266779,8907446,7569840,1746835,3382302,806560,9889124,8560400,6993819,1717171,1758129,3648059,654011,2654860,4409144,345140,7904109,656623,4360626,2373514,6037605,4915718,7945608,381348,9613380,1043372,2004698,8364846,8281303,1555515,3062857,6142470,6025522,2631274,8012883,9664092,7293490,4295994,8811878,4278733,6294925,4723827,8619826,2292042,3960526,5995636,3003753,6696486,6342552,1444202,4885804,4715334,8225142,8653095,1977774,5020397,6314809,6749269,2001782,2569801,4392229,6138397,4095341,2326281,2674601,6272321,7487443,8810505,1085840,8753052,7423155,8633724,6815031,710220,3874565,8985652,7158129,8733709,5064615,2031736,529104,3088834,184099,7680557,8061648,2326949,5205644,6512033,2981141,7601225,822710,2603797,8087478,9938106,3361150,4456308,7594522,9469995,6045830,746096,8259152,422067,7423610,9556430,4214457,3490399,6458334,8846614,6545230,2926356,311409,7324067,5769802,9258719,638772,5485758,6160406,1888196,742264,93981,7958106,8190906,1901532,8632938,2898324,6948891,3509690,7465231,7881557,2320522,1171955,5343640,6617302,2262694,3340724,4604046,2130677,7258728,5023302,7115544,629294,3906792,7841905,4316201,5353925,5978678,4605980,4715885,4923495,125254,2270136,3046384,1905636,1533613,1574686,9973649,436488,2513052,1372385,5791689,7936400,9391125,9773408,5810449,5311075,3092112,1106591,3684523,5182726,7926790,5175387,9389960,428794,3967508,1220914,4345907,5884571,6290417,27779,1579045,1446459,9413760,7110487,2663308,3658078,5880932,8294767,280351,5694419,9224675,7210977,2340455,2452387,2012792,3427395,4187508,4304558,9471033,7748650,385275,9498286,5624053,4686614,8079379,8195703,7828150,1550130,8969696,3032368,1325678,2588879,3703222,6849726,7432662,6778076,6618920,6004942,7496115,6545500,7517791,2799767,5682257,8236719,9658179,7031751,4891992,1963257,8813293,4255011,9835021,5997546,6025427,3398687,1041157,9684715,8769575,5346125,2258308,8923238,7967766,9249359,704895,2185414,9382336,3018819,916644,4119888,8835562,3008395,2212759,9969727,8949839,8265769,4404402,1443873,9990971,9593316,7950567,3828722,7601152,2991448,6512271,2903590,2659610,4077631,2493337,264160,8577110,5766040,2146133,6493111,7638660,7258247,7657509,9932727,7245777,692601,2856925,2352193,2815849,3047315,9642528,3861671,9838132,4075604,4075518,7591652,7908743,9012509,5757510,1005387,4768438,918358,7524608,9934813,6541993,1449114,606407,7931432,4735198,5695334,6550439,8445244,746687,8725761,7670905,701771,9875733,3379787,1842470,983794,8658583,9697676,1468575,9853865,1784635,760211,4667134,9116597,287256,1714194,6791590,2773262,8461802,3760451,4973934,8971747,6287556,6272170,1619149,1256384,7606837,3004297,4514606,2609078,6156693,5206440,4018457,8477772,5212929,5849482,655806,8183052,2880921,506870,3328854,8777873,5357257,1396492,5678088,3589593,7607153,4588539,2951891,4200010,4648718,2696094,4989097,2169312,7658012,5070048,792212,3557154,9648543,8347381,1785441,2330592,4282915,7280759,7414746,1771221,2555756,6935947,4521023,6514761,3355184,9660873,6088914,8233640,8026321,6931189,7830992,245037,2679141,3057273,5682172,5529070,2228699,6922565,7120601,2900118,8633469,5950948,6523262,8363241,3431960,7594141,7772712,8591435,1665700,1444831,4133681,2906883,8060776,5308660,5432977,1065233,8700849,7695512,5458104,7084778,3717562,3021601,6686909,1320940,4453336,1667655,3595011,483768,7050318,6100956,2970600,3929247,3571695,8217139,6075595,9546179,8420826,7184217,5151812,9785062,3465395,5338447,6707682,8387107,746643,1232766,9732127,2160805,646836,7258638,9079878,8199153,5175701,4128637,9455964,4989182,3945611,9332538,1495788,1246537,1788831,2749455,4306743,5137921,9707252,7076461,1359054,3977768,1008130,7896731.1
0,8843543,professional,New York City,Y,[],"[UX/UI, Storytelling/Brand, Media/Content]","[everlane, marketing, uber, storytelling/brand...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.866
1,7755085,student,Somewhere else,N,[],"[Media/Content, Storytelling/Brand, Product Ma...","[government, politics, music, page, instagram,...",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.7906


# Part 6-B: Populate the matrix of scores by running the KNN function

In [130]:
def create_matrix(df_distances, df_keywords, n_from, n_to, filename):
    print ('running from: {} to {}'.format(n_from, n_to))
    for x in range(n_from, n_to + 1):
        tmpid = df_distances.loc[x, 'nyc_id']
        if len(df_distances.loc[x,'str_combined'])>0:
            df_distances[tmpid] = create_list_distance(df_keywords, x)
            if (x%5==0):
                print ('Count {}: Column {} calculation done'.format(x, tmpid))
        else:
            df_distances[tmpid] = np.nan
            if (x%5==0):
                print ('Count {}: Column {} done, NaN distances for all'.format(x, tmpid))
        df_distances.to_csv(filename)

    print ('Job Done!!')

# wait for the run to finish

In [131]:
#the function needs to run until the end for it to calculate all the values
#IN SHORT - you need to wait for the run to finish
create_matrix(df_distances = df_distances, df_keywords = df_keywords, n_from = 0, 
              n_to = len(df_distances)-1, filename = 'outputs/users_distances.csv')

running from: 0 to 705
Count 0: Column 8843543 calculation done
Count 5: Column 3877105 calculation done
Count 10: Column 2290089 calculation done
Count 15: Column 284637 calculation done
Count 20: Column 204780 calculation done
Count 25: Column 2620815 calculation done
Count 30: Column 4340193 calculation done
Count 35: Column 8526469 calculation done
Count 40: Column 1518524 calculation done
Count 45: Column 5064585 calculation done
Count 50: Column 5247078 calculation done
Count 55: Column 1348913 calculation done
Count 60: Column 3230467 calculation done
Count 65: Column 8033043 calculation done
Count 70: Column 8561114 calculation done
Count 75: Column 6414426 calculation done
Count 80: Column 1557244 calculation done
Count 85: Column 4933667 calculation done
Count 90: Column 6234328 calculation done
Count 95: Column 3569615 calculation done
Count 100: Column 7792027 calculation done
Count 105: Column 3771144 calculation done
Count 110: Column 4601941 calculation done
Count 115: C

In [231]:
df_distances.head()

Unnamed: 0,nyc_id,userType,location,is_vip,past_advisors,topics_all,str_combined,8843543,7755085,9202605,7666762,432418,3877105,1020980,7896731,6961087,6402058,2290089,1148025,4999309,4369680,382570,284637,8187296,8476629,7306029,5652660,204780,5677744,2593443,4601513,4878277,2620815,3288117,5708352,5474030,4046542,4340193,7474138,3712338,352403,2348870,8526469,7743463,6459524,9929858,9880082,1518524,4788367,8542359,997987,4629976,5064585,4755310,7209834,606005,5017981,5247078,8902800,9134615,7048387,6751671,1348913,6937862,8064126,8436578,8910889,3230467,3140161,3667774,4192795,8301349,8033043,5166603,7485700,9540732,4958919,8561114,3122679,8238975,2562280,7987549,6414426,4101601,1983482,878483,6159047,1557244,6113184,2149492,796123,6672031,4933667,625916,3179761,4914479,2956628,6234328,4558259,6966024,6742971,1497213,3569615,1187639,3758768,7681692,3167915,7792027,9462300,1453483,2598156,477453,3771144,813333,8841456,7679971,3116521,4601941,6306797,3195584,2627748,2169538,4999579,2235203,6437056,2308071,2368457,1390411,7694741,3521250,3206074,301431,2669767,7518538,2256585,94522,6955519,9959606,2037455,2526473,8359351,269209,2825496,3511047,1614397,2659801,4410236,515652,4818902,8219730,7918230,5108683,5238740,7413581,6286127,9113243,3805339,2179494,6059897,9998462,3679840,206547,6117465,9957593,6807435,1739284,9178381,7257843,2109474,8699149,8384707,2360585,3965641,9076155,9318559,6119200,7333768,8697016,7023001,757721,537278,7468074,8175409,8788183,6446789,1857735,7650470,6495154,2611714,710683,152075,2198911,6788313,9853788,9808460,6008056,5240141,8907470,9037441,8912858,6581843,9486722,1110483,4358079,9356485,2271359,7915205,5800707,2794321,2427696,6535431,5402293,8693468,4236687,427550,7888951,2826824,6928588,2527975,5493738,7575883,1597931,1575847,1781688,372231,2282239,4319835,4920915,1484378,2299050,4846292,4336956,5786211,7893434,3895624,2859992,7198024,3423783,6675599,9576956,6126279,5686548,1825469,9700647,2868050,4702672,1146067,4015930,4401457,9546947,94925,1190509,2932188,3266779,8907446,7569840,1746835,3382302,806560,9889124,8560400,6993819,1717171,1758129,3648059,654011,2654860,4409144,345140,7904109,656623,4360626,2373514,6037605,4915718,7945608,381348,9613380,1043372,2004698,8364846,8281303,1555515,3062857,6142470,6025522,2631274,8012883,9664092,7293490,4295994,8811878,4278733,6294925,4723827,8619826,2292042,3960526,5995636,3003753,6696486,6342552,1444202,4885804,4715334,8225142,8653095,1977774,5020397,6314809,6749269,2001782,2569801,4392229,6138397,4095341,2326281,2674601,6272321,7487443,8810505,1085840,8753052,7423155,8633724,6815031,710220,3874565,8985652,7158129,8733709,5064615,2031736,529104,3088834,184099,7680557,8061648,2326949,5205644,6512033,2981141,7601225,822710,2603797,8087478,9938106,3361150,4456308,7594522,9469995,6045830,746096,8259152,422067,7423610,9556430,4214457,3490399,6458334,8846614,6545230,2926356,311409,7324067,5769802,9258719,638772,5485758,6160406,1888196,742264,93981,7958106,8190906,1901532,8632938,2898324,6948891,3509690,7465231,7881557,2320522,1171955,5343640,6617302,2262694,3340724,4604046,2130677,7258728,5023302,7115544,629294,3906792,7841905,4316201,5353925,5978678,4605980,4715885,4923495,125254,2270136,3046384,1905636,1533613,1574686,9973649,436488,2513052,1372385,5791689,7936400,9391125,9773408,5810449,5311075,3092112,1106591,3684523,5182726,7926790,5175387,9389960,428794,3967508,1220914,4345907,5884571,6290417,27779,1579045,1446459,9413760,7110487,2663308,3658078,5880932,8294767,280351,5694419,9224675,7210977,2340455,2452387,2012792,3427395,4187508,4304558,9471033,7748650,385275,9498286,5624053,4686614,8079379,8195703,7828150,1550130,8969696,3032368,1325678,2588879,3703222,6849726,7432662,6778076,6618920,6004942,7496115,6545500,7517791,2799767,5682257,8236719,9658179,7031751,4891992,1963257,8813293,4255011,9835021,5997546,6025427,3398687,1041157,9684715,8769575,5346125,2258308,8923238,7967766,9249359,704895,2185414,9382336,3018819,916644,4119888,8835562,3008395,2212759,9969727,8949839,8265769,4404402,1443873,9990971,9593316,7950567,3828722,7601152,2991448,6512271,2903590,2659610,4077631,2493337,264160,8577110,5766040,2146133,6493111,7638660,7258247,7657509,9932727,7245777,692601,2856925,2352193,2815849,3047315,9642528,3861671,9838132,4075604,4075518,7591652,7908743,9012509,5757510,1005387,4768438,918358,7524608,9934813,6541993,1449114,606407,7931432,4735198,5695334,6550439,8445244,746687,8725761,7670905,701771,9875733,3379787,1842470,983794,8658583,9697676,1468575,9853865,1784635,760211,4667134,9116597,287256,1714194,6791590,2773262,8461802,3760451,4973934,8971747,6287556,6272170,1619149,1256384,7606837,3004297,4514606,2609078,6156693,5206440,4018457,8477772,5212929,5849482,655806,8183052,2880921,506870,3328854,8777873,5357257,1396492,5678088,3589593,7607153,4588539,2951891,4200010,4648718,2696094,4989097,2169312,7658012,5070048,792212,3557154,9648543,8347381,1785441,2330592,4282915,7280759,7414746,1771221,2555756,6935947,4521023,6514761,3355184,9660873,6088914,8233640,8026321,6931189,7830992,245037,2679141,3057273,5682172,5529070,2228699,6922565,7120601,2900118,8633469,5950948,6523262,8363241,3431960,7594141,7772712,8591435,1665700,1444831,4133681,2906883,8060776,5308660,5432977,1065233,8700849,7695512,5458104,7084778,3717562,3021601,6686909,1320940,4453336,1667655,3595011,483768,7050318,6100956,2970600,3929247,3571695,8217139,6075595,9546179,8420826,7184217,5151812,9785062,3465395,5338447,6707682,8387107,746643,1232766,9732127,2160805,646836,7258638,9079878,8199153,5175701,4128637,9455964,4989182,3945611,9332538,1495788,1246537,1788831,2749455,4306743,5137921,9707252,7076461,1359054,3977768,1008130
0,8843543,professional,New York City,Y,[8843543],"[UX/UI, Storytelling/Brand, Media/Content]","['app', 'hand', 'storytelling/brand', 'mobile'...",0.0,0.9045,0.9393,0.9045,1.0,0.9623,0.9354,0.866,0.8944,0.8584,0.9608,0.9661,1.0,0.9258,0.9045,1.0,0.977,0.922,0.9014,0.9574,0.8885,0.9636,0.9199,0.9258,0.9045,0.9512,0.9428,0.8944,0.7071,1.0,0.9258,0.8944,1.0,0.9258,0.9354,0.9199,0.8864,0.9789,0.8944,0.8997,1.0,0.9636,0.9199,0.9789,1.0,0.9354,0.9789,0.9129,0.8819,0.9701,0.8498,0.9014,0.9258,0.9285,1.0,0.9592,0.9428,0.9428,1.0,0.8292,0.8528,0.8498,0.9733,0.9258,0.9045,0.9293,0.8864,0.9285,0.9608,0.9354,0.9428,0.9393,0.8452,0.978,0.9075,0.9682,0.9354,0.866,0.9747,0.9469,0.922,1.0,0.8292,1.0,0.9592,0.879,0.9636,0.9177,0.922,0.9535,1.0,1.0,0.9459,0.9459,0.9374,0.9014,0.8165,0.9354,0.9636,0.9487,0.9428,0.9832,0.8944,0.8771,0.9129,0.8944,0.9354,0.9672,0.9075,0.922,0.9075,0.8321,0.9045,0.9075,0.866,1.0,0.9177,1.0,0.9682,1.0,0.9045,0.978,0.9393,0.9608,0.923,0.9014,0.9075,0.9354,0.9759,0.9199,0.922,0.9258,1.0,0.9701,0.9428,0.9535,0.9428,0.8944,0.9636,0.9354,0.9199,0.8452,0.9258,0.9381,0.8771,0.866,0.9459,0.9487,0.9075,0.9258,0.7906,0.9354,0.9325,0.9354,1.0,1.0,0.8321,0.9428,0.9393,0.9459,0.9258,0.9608,0.9747,0.9405,0.9608,0.9293,0.9459,0.8018,0.9661,0.9129,1.0,0.978,1.0,1.0,1.0,0.9487,0.9393,0.8367,0.8452,0.9293,0.9574,0.9129,0.9535,0.9325,0.9574,0.8885,0.8819,1.0,0.9354,0.9165,0.978,0.8944,0.9487,0.9733,0.7845,0.8885,0.8864,1.0,1.0,0.866,1.0,0.8452,0.9856,0.9574,0.8367,0.9014,0.9045,0.9449,0.8885,0.9309,0.8944,0.8402,0.9129,0.9075,0.9014,0.9381,0.9608,0.9393,0.9636,0.922,0.8898,0.8528,0.9258,0.7559,1.0,0.9129,0.8819,0.9574,0.9428,0.8944,0.9487,0.9661,0.9199,0.9075,0.9428,0.9574,0.9661,1.0,0.7385,0.9309,0.9608,0.8885,0.9129,1.0,0.9258,0.9733,1.0,0.9309,0.9428,1.0,0.9682,0.9258,0.9177,0.9512,1.0,0.9325,0.9789,0.9636,0.8528,0.8819,0.9608,0.9535,0.9718,0.9309,0.9075,0.9258,0.7906,0.9258,0.7977,0.9177,0.9747,1.0,0.866,0.9535,0.9574,0.9649,0.8944,0.8367,0.922,0.8367,1.0,0.9701,0.9258,0.8864,0.9555,0.9258,0.9393,0.866,0.8452,0.9045,0.8847,0.8819,0.9608,0.8944,0.8944,0.922,0.9574,0.9354,0.9487,0.9075,0.9045,0.9608,0.7454,0.9293,1.0,0.9045,0.9535,1.0,0.8563,0.9574,0.9309,1.0,0.9309,1.0,0.8864,0.9309,0.7559,0.9354,0.9354,0.8864,0.9661,0.9718,0.9258,0.8165,0.8864,0.9555,0.9487,1.0,0.9574,0.9258,1.0,1.0,0.9733,1.0,0.9574,1.0,0.9574,0.866,0.9258,0.9701,0.9393,0.8819,0.6742,1.0,0.9682,1.0,1.0,0.8944,1.0,1.0,0.866,0.9258,0.9428,0.9309,0.9258,0.7746,0.7746,0.7559,0.9459,1.0,1.0,0.9789,0.8819,1.0,0.9636,0.8819,0.8367,0.9512,0.9428,1.0,1.0,0.9608,0.9574,1.0,1.0,0.9129,0.8944,1.0,0.8452,0.9045,0.9682,1.0,0.9393,0.8044,1.0,0.8367,0.8864,0.8944,1.0,0.7977,1.0,1.0,0.9649,0.8864,0.9199,0.9258,0.8819,0.9555,0.8864,0.9393,0.9129,0.9045,0.9309,0.9487,0.8528,0.9258,0.9045,0.9636,0.8292,0.9682,0.866,0.8819,1.0,0.9354,0.8452,0.9459,0.9258,0.7977,0.9045,0.8165,0.8944,0.866,1.0,0.9393,0.9129,1.0,1.0,0.9309,0.9354,0.9258,1.0,0.9177,0.9487,0.9512,0.9661,0.8528,0.978,0.9512,1.0,0.8944,0.9199,1.0,0.9733,1.0,0.9428,0.9459,0.8819,0.9487,0.9177,1.0,0.9014,0.9512,0.9535,0.9636,0.9393,0.9129,0.9393,1.0,0.9045,0.9405,1.0,1.0,0.9177,0.9177,0.9661,0.9608,0.9258,0.9045,0.9428,0.977,0.9574,1.0,0.9636,0.8819,0.9574,1.0,1.0,0.9045,0.9459,0.8864,0.9129,0.9014,0.9608,0.9309,0.8771,1.0,0.7638,0.9608,0.9199,0.8745,1.0,0.9393,0.9487,0.9555,0.9682,0.9555,1.0,0.9428,0.9199,1.0,0.9354,0.9682,0.9649,0.9089,0.8944,0.977,0.9636,0.9661,1.0,0.9393,0.9574,0.9177,0.9487,0.9649,1.0,1.0,0.7071,0.9535,0.9405,0.9512,0.922,0.9309,0.9592,0.9354,0.9806,0.9199,0.9555,1.0,0.8819,0.9682,0.9258,0.9309,0.8321,1.0,0.9535,1.0,0.9089,0.9405,1.0,0.9759,1.0,0.9428,0.9354,1.0,0.9487,0.9789,0.8987,1.0,0.9672,1.0,0.9075,1.0,0.9177,0.9393,0.9459,0.9535,0.9459,1.0,0.7906,1.0,0.9258,0.9293,0.8997,0.9258,1.0,0.9718,1.0,0.8771,0.9608,1.0,0.9129,0.866,1.0,0.9459,0.9014,0.8452,0.9309,0.9293,0.9623,0.8528,0.9806,1.0,0.8321,0.9718,0.9747,0.8745,0.9325,0.9165,1.0,0.9592,0.9701,0.9177,0.8944,1.0,0.8745,1.0,0.8745,0.866,0.9636,0.9535,0.9428,0.8997,1.0,0.9129,0.9718,1.0,0.9258,0.9487,1.0,0.8563,1.0,0.9682,0.8402,0.9608,0.8944,0.9309,0.9661,0.9806,0.9813,0.8165,0.9309,1.0,0.9045,0.977,0.8563,0.9535,0.9393,0.9487,0.9608,0.9733,0.9129,0.8745,0.7977,0.8165,1.0,1.0,0.9574,0.9636,0.9354,0.9459,0.8771,0.8528,0.9592,0.9354,0.9747,0.9535,1.0,0.9608,0.9555,0.9592,0.9487,0.9354,0.9258,0.9574,0.8997,0.9623,0.8864,0.9045,0.9014,0.9129,0.9428,0.9661,0.9354,0.9608,1.0,0.922,0.9789,1.0,0.982,0.9354,1.0,1.0,0.9747,0.9258,0.923,1.0,0.9045,0.9014,1.0,1.0,0.9747,0.9487,0.9535,0.9682,0.922,0.9535,0.9623,0.922,0.9555,0.9535,1.0,1.0,0.9354,1.0,1.0,0.8885,0.9258,0.9747,0.9701,0.9129,0.9129,0.9325,0.9535,0.9487,0.922,0.9449
1,7755085,student,Somewhere else,N,[8843543],"[Media/Content, Storytelling/Brand, Product Ma...","['management', 'music', 'product', 'app', 'pol...",0.8563,0.0,0.9393,0.8528,1.0,0.8607,0.8898,0.7906,0.8944,0.8885,0.9608,0.9309,0.8944,0.8997,0.7977,0.9333,0.9535,0.8944,0.9354,0.9574,0.8272,0.9258,0.8771,0.8864,0.8528,0.8997,0.8819,0.8944,0.866,1.0,0.9258,0.8367,0.9393,0.8864,0.9014,0.9199,0.7071,0.9789,0.7746,0.8452,0.9535,1.0,0.8771,0.9354,0.9293,0.9354,0.9129,0.7638,0.8819,0.9075,0.8498,0.9014,0.8864,0.8906,0.9075,0.9592,0.7454,0.923,0.8885,0.7906,0.8528,0.8498,0.9733,0.8452,0.7977,0.9293,0.7559,0.871,0.7845,0.866,0.7817,0.8745,0.5976,0.9555,0.8402,0.866,0.8165,0.7638,0.922,0.9097,0.8367,0.922,0.9014,1.0,0.8944,0.8257,0.9258,0.8584,0.8062,0.8528,0.9747,0.9682,0.8272,0.8584,0.8528,0.866,0.866,0.866,0.8864,0.866,0.7817,0.9832,0.7746,0.7845,0.9129,0.9309,0.9354,0.9158,0.8044,0.922,0.8745,0.8771,0.879,0.8044,0.7906,0.9393,0.8272,0.8771,0.9682,1.0,0.7385,0.9325,0.8745,0.9199,0.8607,0.866,0.8402,0.866,0.9258,0.8549,0.8944,0.9258,0.8729,0.9075,1.0,0.8528,0.9428,0.8718,0.9636,0.8416,0.9608,0.8018,0.8452,0.8718,0.7338,0.9574,0.8584,0.9487,0.8745,0.9512,0.7906,0.9354,0.8847,0.8292,0.9487,0.9045,0.8321,0.8607,0.9393,0.8885,0.8452,0.8321,0.9487,0.9608,0.9199,0.7977,0.9177,0.8864,0.9661,0.9129,0.9258,0.978,0.8321,0.9813,1.0,0.9487,0.8402,0.7071,0.8018,0.8528,0.9354,0.9574,0.9045,0.9325,1.0,0.8272,0.6667,1.0,0.9129,0.8944,0.9325,0.8944,0.8944,0.9733,0.8771,0.8885,0.8864,0.9428,0.9636,0.866,0.9129,0.7071,0.9258,0.8898,0.8944,0.9014,0.8528,0.9449,0.8885,0.6831,0.8944,0.7276,0.8819,0.9075,0.866,0.8718,0.9405,0.8044,0.8864,0.8062,0.866,0.8528,0.9449,0.7559,0.9661,0.8498,0.8819,0.9129,0.8819,0.8367,0.8367,0.9309,0.8771,0.767,0.8498,0.8416,0.9309,0.9574,0.7977,0.7303,0.9199,0.8885,0.8165,0.9045,0.9512,0.9177,1.0,0.8563,0.8498,0.9045,0.9682,0.8864,0.8584,0.9512,0.866,0.8847,0.9574,0.8864,0.8528,0.8819,0.9199,0.9535,0.9129,0.9309,0.8402,0.8452,0.9014,0.8864,0.8528,0.9459,0.8944,1.0,0.9574,0.8528,0.866,0.9469,0.8165,0.7746,0.8367,0.8944,0.9393,0.9393,0.8018,0.8452,0.9325,0.8018,0.8044,0.866,0.9258,0.7977,0.8847,0.8165,0.8771,0.8367,0.8165,0.8944,0.9574,0.866,0.7071,0.8402,0.7977,0.8321,0.8165,0.9045,0.7638,0.9045,0.9045,0.8528,0.7746,0.866,0.8944,0.9129,0.8563,0.8367,0.9636,0.8563,0.8452,0.866,0.9014,0.9636,0.8944,0.8498,0.8452,0.866,0.8864,0.8847,0.8944,1.0,0.9574,0.8452,0.9075,0.9574,0.9459,0.9661,0.9574,0.8452,0.8165,0.866,0.8452,0.9393,0.9075,0.8165,0.8528,1.0,0.9354,0.9608,0.8745,0.8367,1.0,1.0,0.7906,0.9258,1.0,0.8944,0.8864,0.7071,0.8367,0.8452,0.9459,0.9258,0.9199,0.9789,0.8819,1.0,0.8864,0.8165,0.7071,0.8729,0.9129,0.9608,0.8864,0.8771,1.0,0.9535,0.8864,0.9129,0.8062,1.0,0.8452,0.7977,0.9354,0.8864,0.9393,0.8044,0.8819,0.8367,0.8452,0.7071,0.9574,0.9045,0.8771,0.9661,0.9649,0.9258,0.8771,0.8452,0.8165,0.8341,0.7559,0.9393,0.9129,0.9293,0.9309,0.8367,0.8528,0.9258,0.9045,0.9258,0.9014,0.866,0.9129,0.8165,0.9608,0.9014,0.8018,0.9177,0.8018,0.7385,0.7977,0.7071,0.8944,0.8292,0.7559,0.9393,0.866,0.9682,0.9354,0.9309,0.9354,0.8452,0.9535,0.8272,0.9487,0.8729,0.8944,0.7977,0.978,0.8452,1.0,0.8563,0.8321,0.9798,0.8272,0.9555,0.8819,0.9177,0.8165,0.9487,0.9177,0.8367,0.866,0.9258,0.9045,0.9258,0.8745,0.866,0.8745,0.8864,0.9045,0.8086,1.0,1.0,0.8584,0.8272,0.9309,0.9608,0.8452,0.7977,0.7454,0.9535,0.9574,0.8367,0.8864,0.9428,1.0,0.9393,0.9535,0.9535,0.8885,0.8452,0.9129,0.75,1.0,1.0,0.8321,0.9574,0.8165,0.8771,0.9199,0.8044,1.0,0.9075,0.8062,0.9325,0.9014,0.9089,0.9759,0.9129,0.9199,0.9393,0.866,0.9186,0.851,0.8847,0.8165,0.9045,0.9449,0.8944,0.9806,0.7276,0.9574,0.7609,0.8944,0.9469,0.8528,0.866,0.8452,0.8528,0.8987,0.8165,0.922,0.8165,0.9165,0.8292,0.9405,0.8771,0.9089,0.9661,0.7817,0.8292,0.7559,0.8563,0.7845,0.8885,0.8528,0.9258,0.8597,0.9199,0.9177,0.9258,1.0,0.8819,0.8898,1.0,0.9129,0.9574,0.8321,0.9199,0.9672,0.8864,0.7276,0.9199,0.8584,0.8745,0.8584,0.8528,0.8885,0.8528,0.7906,0.8771,0.8864,0.7977,0.8997,0.8018,1.0,0.9129,0.9258,0.8771,0.9199,0.9354,0.9129,0.866,0.9636,0.8584,0.866,0.7559,0.8944,0.879,0.923,0.8528,0.9806,0.9258,0.7845,0.8819,0.8367,0.8745,0.8597,0.9592,0.8944,0.8944,0.9701,0.9459,0.8944,0.9487,0.9075,0.9258,0.8044,0.7906,0.9258,0.9293,0.8498,0.8997,0.8367,0.8165,0.9428,0.8987,0.8729,1.0,0.978,0.9309,1.0,0.75,0.9075,0.8321,0.8563,1.0,0.8944,0.8771,0.923,0.9428,0.8944,0.8944,0.9045,0.9293,0.7746,0.879,0.9075,0.8367,0.8987,0.8885,0.866,0.9393,0.9045,0.6455,0.922,0.9682,0.9574,0.9636,0.9354,0.8584,0.7338,0.8528,0.9381,0.8416,0.922,0.9535,0.9129,0.8771,0.9089,0.9165,0.866,0.7906,0.7559,0.9129,0.8729,0.923,0.8018,0.9045,0.9014,0.7817,0.8819,0.9309,0.9354,0.9608,0.9459,0.8062,0.9574,0.9535,0.9636,0.866,0.9014,0.9487,0.866,0.8729,0.8389,0.7906,0.8528,0.9354,0.866,0.9129,0.9487,0.8944,0.8528,0.9682,0.866,0.9535,0.9428,0.8944,0.8597,0.8528,0.9759,0.9733,0.9014,0.8321,0.9574,0.8584,0.8452,0.8944,0.8745,0.8165,0.9129,0.9555,0.9293,0.8563,0.922,0.9636
2,9202605,professional,New York City,N,[8843543],"[Data/Analytics, Storytelling/Brand, Media/Con...","['marathon', 'network', 'new', 'musical', 'dat...",0.9309,0.9535,0.0,0.9535,0.9258,0.9623,0.866,0.866,0.9747,0.8584,0.8771,0.9661,0.9487,0.9258,0.9535,0.9837,0.9045,0.8944,0.9682,0.866,0.8584,0.9258,1.0,1.0,1.0,0.9512,0.9428,0.9487,0.9354,0.9636,0.8452,0.7071,0.9701,1.0,0.9354,0.8771,1.0,1.0,1.0,0.8997,1.0,1.0,0.9608,0.9574,1.0,0.9354,1.0,0.9129,0.8819,0.9701,0.8819,0.9354,0.9636,0.9097,1.0,0.9798,0.9428,0.9623,0.9733,0.9014,0.8528,0.8819,0.9459,0.9258,0.9535,0.879,1.0,0.9285,0.9608,0.9682,0.9428,0.9393,0.9258,0.978,0.8745,0.9682,0.9789,0.8165,0.8944,0.9285,0.9747,1.0,0.9354,1.0,0.9592,0.9293,0.9636,0.9459,0.9747,0.9535,0.9747,0.9682,1.0,1.0,0.9535,1.0,0.9129,0.9014,0.9636,1.0,1.0,0.9661,0.8944,0.9199,0.9428,1.0,0.9354,0.9837,0.9393,0.8944,0.9075,0.9199,0.9045,0.9393,0.9354,0.9075,1.0,0.9199,0.9354,1.0,0.9045,1.0,0.8402,0.9405,0.923,0.866,0.9075,0.9354,0.9258,0.9806,1.0,1.0,1.0,0.9393,0.9428,1.0,0.9129,0.9165,0.9258,0.9574,0.9608,0.9636,0.9258,0.9381,0.9199,0.9574,0.8885,1.0,0.9393,1.0,0.9354,0.9789,0.9089,1.0,0.8944,1.0,0.9608,0.9428,0.9393,1.0,0.9636,1.0,0.9487,0.9405,0.9608,0.9535,1.0,0.9258,1.0,0.9574,0.8729,0.9555,0.9199,0.9428,0.9129,0.9747,1.0,0.8367,0.9636,0.977,0.9789,1.0,1.0,1.0,0.9574,0.9177,1.0,0.9428,1.0,0.8944,0.9325,0.9309,0.922,0.9459,0.9199,0.8885,0.9258,0.9428,1.0,1.0,0.9129,0.8864,0.971,0.9574,0.9487,0.9682,0.9293,0.982,0.9459,1.0,0.9487,0.9393,1.0,0.9393,0.9014,0.9592,1.0,1.0,0.9636,0.9747,0.9789,0.7385,0.982,0.7559,0.9309,0.9718,0.9428,0.9574,0.9428,0.9487,1.0,0.9661,0.9405,0.9075,0.9428,0.9354,0.9661,0.9574,0.9045,0.9661,0.9608,0.9733,0.9428,1.0,0.9258,0.9459,0.9487,0.9661,0.9718,0.9535,1.0,1.0,0.9459,0.9512,0.9574,0.9555,1.0,0.9636,0.8528,1.0,1.0,0.9535,0.9428,0.9309,0.9075,0.9258,0.9354,1.0,1.0,0.9733,0.9487,0.9309,1.0,0.9535,0.9574,0.9826,0.9309,0.9487,0.9487,1.0,0.9393,1.0,1.0,1.0,1.0,0.8864,0.9393,0.9682,0.8864,0.977,0.978,0.8819,1.0,0.9487,0.9309,0.922,1.0,0.9354,0.9487,0.9393,0.8528,0.9608,0.9428,0.9293,1.0,1.0,0.9045,0.9535,0.9661,0.9574,0.9309,0.9718,0.9309,0.9487,0.9636,1.0,0.8452,0.7906,0.9354,0.9258,0.9309,0.9718,0.9258,0.9129,0.9258,0.9555,0.9309,1.0,0.9574,0.9258,0.9701,0.9574,0.9177,0.9309,1.0,0.9636,0.9574,0.9014,0.9636,1.0,0.9701,0.9428,0.9535,0.8944,0.9682,1.0,0.9701,0.9487,0.9487,0.9661,0.7906,1.0,1.0,0.8563,0.8452,0.8367,0.7746,0.8452,1.0,0.9258,1.0,0.9574,0.8819,1.0,0.9449,1.0,0.9487,1.0,0.9718,0.9199,0.9636,1.0,1.0,0.9535,0.9258,0.9428,0.9487,0.9354,0.9258,0.7385,1.0,1.0,0.9701,0.8402,1.0,1.0,1.0,0.8944,0.9574,0.8528,0.8321,0.9832,0.9826,0.8864,0.9199,0.8864,1.0,0.9555,0.8864,0.8745,0.9574,0.9535,1.0,1.0,0.8528,0.9258,0.9535,1.0,0.9354,0.9682,0.9129,1.0,0.9608,0.9014,0.9636,0.9733,1.0,0.9045,0.9535,0.9574,0.8944,1.0,1.0,0.9393,0.866,0.9682,0.9354,1.0,0.9682,0.8452,0.9535,0.9459,0.9747,0.9512,0.9661,0.9535,0.978,0.9258,1.0,0.9661,0.8771,0.9798,1.0,1.0,0.9623,0.9733,0.8819,0.9487,0.9177,0.8944,1.0,0.9759,1.0,1.0,0.9701,0.9574,1.0,0.9636,0.9535,0.9806,1.0,0.9428,0.9733,1.0,1.0,1.0,1.0,0.9535,0.9428,1.0,0.9574,0.9487,0.8864,0.9428,0.9574,0.9393,0.9535,0.9293,0.9733,1.0,0.9574,0.866,0.9199,0.9309,0.8771,0.9574,0.7638,1.0,0.9806,0.8745,0.9661,0.9701,0.9487,0.9089,1.0,0.9555,0.9512,0.9129,0.9199,0.9393,0.9789,0.9682,0.9826,0.9325,1.0,0.9293,0.9636,0.9309,0.9806,0.9075,0.9129,0.9177,0.9747,1.0,0.9045,0.9574,0.8864,1.0,0.9806,0.9512,0.9747,0.9309,0.8944,0.9682,0.9806,0.9608,0.9555,0.8944,0.9718,0.9354,1.0,0.9309,0.9199,1.0,0.9535,0.9512,0.9325,0.9806,0.9733,0.9759,0.977,0.9718,0.9574,0.9258,0.9832,0.9574,0.9405,0.9199,0.9158,1.0,0.9701,0.9199,0.9733,1.0,1.0,0.9535,1.0,0.9535,0.866,0.8771,0.982,1.0,0.9258,1.0,0.8944,0.9718,1.0,0.9199,0.9199,1.0,0.9129,0.866,0.9258,0.9459,0.9354,0.8864,0.9661,1.0,0.9813,0.8528,0.9405,0.8452,0.7338,0.9129,0.9747,0.9075,0.9325,0.9381,0.9487,0.9592,1.0,0.9177,0.8944,0.8944,0.9393,0.9636,0.9075,0.7906,1.0,1.0,0.9718,0.8729,1.0,1.0,0.9129,1.0,0.9512,0.9487,0.9555,0.9309,0.9293,1.0,0.9701,0.9608,0.8563,0.9661,0.9661,0.9405,0.9813,1.0,0.9309,0.9487,1.0,0.977,0.8944,1.0,1.0,0.7746,0.9806,0.8885,0.9129,0.9701,1.0,0.866,0.9487,1.0,0.9574,0.8864,1.0,0.9459,0.8771,0.9535,0.9798,0.9354,0.866,1.0,0.9574,1.0,0.9555,0.9798,0.922,0.9682,0.9636,0.9354,0.8997,0.9813,1.0,0.9535,0.9014,0.9428,0.9428,1.0,0.7906,0.8771,0.9733,0.9487,0.9574,1.0,0.982,0.9129,0.8292,0.8944,1.0,0.9258,0.9623,1.0,0.977,0.9682,0.9354,1.0,0.9487,0.9487,0.977,1.0,0.922,0.9847,0.9623,0.922,1.0,0.9535,1.0,1.0,0.9354,1.0,0.9574,0.9177,0.9258,0.9747,1.0,0.9129,0.9574,1.0,0.9535,0.9661,0.922,0.9636
3,7666762,professional,New York City,Y,[8843543],"[Media/Content, Business Operations, Growth Le...","['game', 'digital', 'entertainment', 'business...",0.9309,0.9293,0.9701,0.0,0.9258,0.9428,0.9574,0.7071,0.922,0.9177,1.0,0.9309,0.9487,0.9512,0.977,0.9672,0.9535,0.9747,0.9354,1.0,0.9177,0.9636,0.9199,0.9636,0.9535,0.9512,1.0,0.9487,0.9354,1.0,0.9258,0.8944,0.9393,0.9258,0.9014,0.9608,0.9258,0.9789,0.9309,0.9759,0.977,0.9636,0.9608,0.9354,0.977,1.0,1.0,0.866,0.8165,0.9852,0.9428,0.9354,1.0,0.9649,0.9701,0.9798,0.8819,0.9623,0.9733,0.9354,0.9045,0.9129,0.9459,0.9512,0.9535,0.9535,0.9636,0.9649,0.8771,0.866,0.9129,0.9393,0.8018,0.9325,0.9701,0.9014,1.0,0.7638,0.922,0.9649,0.8944,0.9747,0.9354,0.8165,0.9798,0.9045,0.9258,0.9459,0.9747,0.9045,1.0,0.9682,0.9733,0.9733,0.9535,0.9354,0.7638,0.9354,0.9636,0.9747,0.9428,0.9832,0.7746,0.9199,0.9718,0.8944,0.866,0.9504,0.9075,0.922,0.8745,1.0,0.9535,0.9701,0.866,0.9393,1.0,0.9608,0.9014,0.9661,0.9535,0.978,0.9075,0.9608,0.923,0.866,0.8402,0.9354,0.9512,0.9405,0.922,1.0,0.9512,0.9393,1.0,0.9045,0.9129,0.9381,0.9636,0.9354,0.9199,1.0,0.9636,0.9592,0.9199,1.0,0.8885,0.8944,0.8044,0.9759,0.9354,0.9574,0.9555,0.9014,0.7746,1.0,1.0,0.9623,0.9701,0.9733,0.9449,0.8771,0.922,0.9806,0.9405,0.9293,0.9459,0.9636,1.0,1.0,0.8997,0.978,0.8771,0.9623,1.0,1.0,0.9701,0.9487,0.9258,0.977,0.9354,1.0,1.0,0.9555,0.866,0.9733,0.9428,0.9428,0.9789,0.9592,0.9555,0.9309,0.9747,0.9733,0.9199,0.9733,0.9258,0.9718,0.9636,1.0,0.9574,1.0,0.9856,1.0,0.8944,1.0,0.9293,0.9636,0.9733,0.8944,0.8944,0.9075,0.9718,0.767,0.9354,0.9798,0.9405,0.9701,1.0,0.8944,0.8898,0.9045,0.982,0.7559,0.9309,0.8819,0.8498,0.9129,0.9428,1.0,0.922,0.9661,0.9405,0.9701,0.9428,0.9574,0.9661,1.0,0.8528,0.9661,0.8321,0.8584,0.9718,1.0,0.9759,0.9733,1.0,0.9661,0.9129,0.8528,1.0,0.9636,0.9459,1.0,1.0,0.9089,0.9574,0.9636,0.7977,0.9428,0.9199,1.0,1.0,0.9309,0.9393,0.8452,0.9682,0.9258,1.0,0.9177,0.9487,0.9309,1.0,0.9045,0.9129,0.9649,0.9309,0.7746,0.9487,0.8367,0.9701,0.9393,1.0,1.0,0.978,0.8018,0.9701,0.9354,0.9636,0.9293,0.8847,0.9129,0.9608,0.7746,1.0,0.922,0.9574,1.0,0.7746,0.9075,0.7977,0.8771,0.9428,0.9293,0.866,0.9045,0.8528,0.9535,1.0,0.9129,0.9309,0.9718,0.8944,0.8367,1.0,0.9309,0.7559,0.866,0.9682,0.9258,0.8944,0.9129,0.8018,0.9129,0.8452,0.978,0.9309,0.9682,0.9129,0.8018,0.9075,0.9574,1.0,1.0,0.9129,0.8864,0.9574,0.9682,1.0,0.9393,1.0,1.0,0.9045,0.9487,0.9354,0.9199,0.9075,0.8944,1.0,1.0,0.9354,0.9258,0.8819,0.8944,0.8452,1.0,0.7746,0.9258,0.9177,0.8452,0.9199,0.9789,1.0,1.0,0.982,0.9428,0.8944,0.9512,0.9718,0.9608,0.9258,1.0,1.0,0.8528,1.0,0.8819,0.9747,0.9354,0.9258,0.9535,0.9682,1.0,0.9075,0.9393,0.9718,1.0,0.9636,0.7746,1.0,0.9045,0.9199,0.9832,1.0,0.9258,0.8321,0.8018,0.9428,0.9555,0.8018,0.9701,0.9574,0.9293,0.9661,0.9487,0.7977,0.9258,0.9535,0.9636,0.9354,0.9014,0.9574,0.8165,0.8771,0.9014,0.9258,0.9459,0.9636,0.8528,0.7977,0.9129,0.8944,1.0,0.7559,1.0,0.866,0.9682,0.7906,0.9661,0.9682,1.0,0.9045,0.9459,0.9487,0.9512,1.0,0.8528,0.978,0.9258,0.8944,0.9309,0.9608,0.9592,0.9733,1.0,0.923,0.9459,0.9428,0.9487,0.9733,0.9487,0.9354,0.9512,0.9535,0.9636,1.0,0.9574,1.0,1.0,0.9535,0.9405,1.0,0.8819,0.9459,0.9459,0.9661,0.9199,1.0,0.9535,0.9428,0.9535,0.866,0.8944,1.0,0.8819,0.9129,0.9075,0.9535,0.9535,0.9459,1.0,0.9574,0.9354,0.9608,1.0,0.7845,0.9129,0.8165,0.9199,0.9806,0.9393,0.9661,0.9393,0.8944,0.9555,0.866,0.9555,0.9258,0.8819,1.0,0.8745,0.9574,1.0,0.9469,0.978,0.8944,0.9293,0.9636,1.0,0.9806,0.9393,0.9129,0.9733,0.9747,0.9826,0.9535,0.9129,0.9636,0.9535,0.9806,0.9258,0.9747,0.9309,1.0,0.9014,0.9199,0.9199,0.9555,0.9309,0.9129,0.9014,0.9258,0.8563,0.8321,0.9459,0.8528,0.9759,0.9555,0.9608,0.9459,0.9759,0.977,0.9718,1.0,0.9759,0.9832,0.9789,0.8987,0.9608,1.0,0.9258,0.9075,0.9199,0.9733,0.9393,0.9459,0.9535,0.9459,0.9535,0.7071,0.9608,0.982,0.9535,0.9512,0.9258,0.9487,0.9718,0.8452,0.8771,0.8771,0.9682,0.9574,0.9354,1.0,0.9459,0.9682,0.8864,0.9661,0.9293,0.9623,0.9045,1.0,1.0,0.9608,0.8819,0.9487,0.9393,0.8847,1.0,1.0,0.9592,0.9075,0.9733,0.8367,0.8944,0.8745,1.0,0.9393,0.9354,0.9636,0.977,1.0,0.8997,0.9487,0.9129,0.9718,0.9608,0.9759,1.0,0.978,0.8944,1.0,0.9014,0.9075,0.9199,0.9309,1.0,0.9309,0.9405,1.0,1.0,0.9309,0.8944,0.9535,0.9535,0.8944,0.977,1.0,0.7746,0.9806,0.9459,0.866,1.0,1.0,0.9129,0.9747,0.9682,0.9129,0.9258,1.0,0.9733,0.8771,0.9535,0.9798,0.9129,0.922,0.9535,0.9129,0.9608,0.978,0.8944,0.922,0.9354,0.9636,0.9129,0.9759,0.9623,1.0,1.0,0.866,0.9718,0.9129,0.8944,0.866,0.9608,0.9733,0.9747,0.9789,0.9045,0.9449,0.9354,0.9014,0.8944,0.922,0.9759,0.9623,0.9014,0.9535,1.0,0.9682,0.9574,1.0,0.9747,0.9293,0.9682,0.9487,1.0,0.9813,0.9487,0.978,0.9535,0.9759,0.9733,0.9682,0.9199,0.9354,0.9733,1.0,0.9487,0.9393,0.9718,1.0,0.9325,0.879,0.9309,0.9487,0.9636
4,432418,professional,New York City,N,[8843543],"[Business Model, Data/Analytics, Business Oper...","['coffee', 'tennis', 'model', 'game', 'white',...",1.0,1.0,0.9393,0.9045,0.0,0.9813,0.9574,0.9354,1.0,0.9459,1.0,0.8944,0.6325,1.0,1.0,0.9333,0.9535,0.9747,0.9682,0.866,0.9459,0.8452,1.0,1.0,0.9045,0.9258,0.9428,0.8944,0.7906,0.9258,0.9258,0.9487,0.9075,0.9258,0.9682,0.9199,0.9258,0.9354,0.9661,0.9759,0.9535,1.0,1.0,0.9129,1.0,1.0,0.9789,0.9574,0.9428,0.9701,0.9428,0.866,1.0,0.9826,1.0,1.0,0.8819,0.9813,0.9733,1.0,0.9535,0.9129,0.9177,1.0,1.0,0.9045,0.9636,1.0,1.0,0.9354,0.9718,0.9701,0.9258,0.9555,0.9701,1.0,1.0,0.9129,0.922,0.9469,0.922,0.922,0.9682,0.8165,0.9592,0.9535,0.9636,1.0,0.9487,0.9293,0.922,0.9014,1.0,1.0,0.9692,0.9682,1.0,0.9354,0.9636,1.0,0.9718,1.0,0.8367,0.9608,0.9718,0.8944,0.866,0.9837,0.9701,0.9487,0.8745,1.0,0.9535,0.9393,0.7906,0.8745,1.0,0.9199,0.7906,0.8944,0.9045,0.978,1.0,0.9806,0.923,0.9354,0.9075,0.9574,0.8997,0.9806,1.0,1.0,0.9258,1.0,0.8819,1.0,0.9718,0.9798,0.9636,0.9789,0.9608,0.9636,0.9636,1.0,0.9608,1.0,0.9733,0.9487,0.9075,1.0,0.9682,0.9574,0.9555,0.9014,0.7746,0.9535,0.9199,0.9813,0.9701,0.9733,1.0,0.9199,0.9487,0.9608,0.9806,1.0,0.9733,0.9636,0.9309,0.9574,0.8997,0.978,0.8321,0.923,0.9354,0.9487,0.9075,1.0,0.9636,0.977,0.9129,0.9574,1.0,0.9555,0.7638,0.9733,1.0,0.7454,0.9789,0.9381,0.8597,1.0,0.9487,0.8584,0.8771,0.9733,0.9636,0.9718,0.9636,0.9574,0.9129,1.0,0.9562,0.9354,0.8367,0.9354,1.0,0.982,0.9733,0.9661,0.8944,1.0,0.9718,0.9075,0.9354,0.9798,0.9405,0.9393,1.0,0.9487,0.9354,0.8528,0.9636,0.8452,0.8944,0.9129,0.9428,0.9129,0.8819,0.8944,1.0,0.8944,0.9608,1.0,0.9428,0.9354,0.9661,0.9574,0.9045,1.0,0.8771,0.9459,1.0,1.0,0.8997,0.9177,0.8944,0.8563,1.0,0.7977,0.9682,1.0,1.0,0.9759,0.9129,0.9325,1.0,0.9258,0.9045,0.9428,1.0,0.9045,0.9428,0.8944,0.9393,0.9258,0.9014,0.8452,1.0,0.8885,0.9487,0.8944,1.0,0.9535,0.866,0.9649,0.8563,0.8944,1.0,0.7746,0.9075,1.0,0.9636,1.0,0.978,0.8864,1.0,0.866,0.8864,1.0,0.9555,1.0,0.9608,1.0,1.0,0.9487,0.866,0.866,0.9487,0.8745,0.9535,0.9199,1.0,0.9045,0.9129,0.8528,0.8528,0.8528,1.0,1.0,1.0,0.9428,1.0,0.9487,1.0,0.9309,0.9258,0.866,1.0,0.9258,0.8944,0.9428,0.9636,0.866,0.8864,1.0,0.9487,0.866,0.866,0.9636,0.8745,0.7638,0.9733,0.9661,1.0,0.9258,0.9574,1.0,0.9636,0.9701,0.9393,0.9428,0.9045,0.8367,0.9014,1.0,0.9075,0.7746,0.8944,0.9309,0.9354,1.0,0.8165,0.8944,0.8452,1.0,0.9487,0.9258,0.9459,0.7559,0.9199,0.9574,1.0,0.9701,0.9636,0.9718,1.0,0.9759,0.9129,0.9199,0.9636,0.9608,0.9574,0.8528,1.0,0.9718,1.0,0.866,0.9258,1.0,0.866,1.0,1.0,1.0,0.9718,0.8944,0.9636,0.9487,0.9574,0.9045,0.9608,0.9309,0.9826,0.9636,0.9199,0.9258,1.0,0.978,1.0,0.8402,0.9129,0.977,1.0,0.9487,0.9045,0.7559,0.9045,0.9258,0.866,0.9682,0.9129,0.8819,0.8771,0.9682,0.9636,0.9177,1.0,0.9045,1.0,1.0,0.8944,1.0,0.7559,1.0,0.8165,0.9014,0.6124,0.9309,0.9682,1.0,0.7385,1.0,0.922,0.8997,0.9661,1.0,0.978,0.9512,0.7746,1.0,1.0,0.9165,0.9733,0.978,0.9623,0.9733,0.9718,0.8367,0.9459,0.8944,0.9682,0.9759,0.9535,0.9258,1.0,0.9129,0.9701,0.9258,1.0,0.9405,1.0,0.8819,0.9733,1.0,0.9309,1.0,1.0,0.9045,0.8165,0.9535,0.7638,0.8367,0.8864,0.8819,0.866,0.9075,0.9045,1.0,0.9177,0.9636,0.9574,0.9014,0.8771,0.9661,0.8771,0.866,0.9129,1.0,1.0,1.0,0.9309,0.9075,0.9487,0.9325,0.9354,0.9555,0.9512,0.9428,0.9199,0.9075,0.9789,0.9843,1.0,1.0,0.9309,0.977,1.0,1.0,0.9405,0.9701,0.8165,0.9733,0.9747,0.9469,0.9045,0.9574,0.9258,0.9293,1.0,1.0,0.9487,0.9661,0.9798,1.0,0.9405,1.0,1.0,0.8563,0.9129,0.9682,1.0,0.8944,0.9608,0.9733,1.0,1.0,0.9555,0.9806,0.9177,0.9759,1.0,0.9428,0.9574,1.0,0.9661,0.9789,0.9608,0.9608,0.9504,0.8864,0.9701,0.9608,1.0,1.0,0.9177,0.9045,0.9459,0.9045,0.9354,1.0,0.982,1.0,0.9512,1.0,0.7746,0.9428,0.7559,0.9608,0.9608,0.9354,0.9129,0.7906,0.9636,1.0,1.0,1.0,0.9309,0.9293,0.9428,0.9045,0.9806,0.9636,0.9608,0.8819,0.9487,0.9393,0.978,0.9592,1.0,0.9592,0.9393,1.0,0.8944,0.7071,0.9701,0.9258,0.9701,0.866,0.9636,0.9293,0.9428,0.9512,0.8944,0.866,0.9129,0.8987,0.9759,0.9487,0.978,0.9661,0.977,0.9354,0.9393,0.8771,0.9309,0.9661,0.8944,0.9405,0.9813,1.0,0.9309,0.5477,0.977,0.9535,1.0,0.9535,0.9393,0.8367,0.9806,0.9459,0.8165,0.9393,1.0,1.0,0.8944,0.9682,0.9574,0.8452,1.0,0.9459,0.9608,0.8528,0.9165,0.9574,0.922,0.9293,0.9574,1.0,0.978,0.9798,0.922,0.9682,0.9636,0.9574,0.9759,0.9813,1.0,0.9535,0.9682,0.9718,0.9129,1.0,0.7906,0.8321,0.9733,1.0,0.9789,0.8528,0.9449,0.9574,0.9682,0.7746,0.9487,0.9258,0.9623,0.9354,0.977,0.9354,0.9682,0.9574,0.9747,0.9747,0.9535,1.0,1.0,1.0,0.9623,0.9747,0.978,0.8528,1.0,0.9733,0.866,0.9199,0.9789,1.0,1.0,0.9747,0.9701,0.9718,1.0,0.9325,0.9045,0.9661,0.9747,0.9636


In [232]:
df_distances.tail()

Unnamed: 0,nyc_id,userType,location,is_vip,past_advisors,topics_all,str_combined,8843543,7755085,9202605,7666762,432418,3877105,1020980,7896731,6961087,6402058,2290089,1148025,4999309,4369680,382570,284637,8187296,8476629,7306029,5652660,204780,5677744,2593443,4601513,4878277,2620815,3288117,5708352,5474030,4046542,4340193,7474138,3712338,352403,2348870,8526469,7743463,6459524,9929858,9880082,1518524,4788367,8542359,997987,4629976,5064585,4755310,7209834,606005,5017981,5247078,8902800,9134615,7048387,6751671,1348913,6937862,8064126,8436578,8910889,3230467,3140161,3667774,4192795,8301349,8033043,5166603,7485700,9540732,4958919,8561114,3122679,8238975,2562280,7987549,6414426,4101601,1983482,878483,6159047,1557244,6113184,2149492,796123,6672031,4933667,625916,3179761,4914479,2956628,6234328,4558259,6966024,6742971,1497213,3569615,1187639,3758768,7681692,3167915,7792027,9462300,1453483,2598156,477453,3771144,813333,8841456,7679971,3116521,4601941,6306797,3195584,2627748,2169538,4999579,2235203,6437056,2308071,2368457,1390411,7694741,3521250,3206074,301431,2669767,7518538,2256585,94522,6955519,9959606,2037455,2526473,8359351,269209,2825496,3511047,1614397,2659801,4410236,515652,4818902,8219730,7918230,5108683,5238740,7413581,6286127,9113243,3805339,2179494,6059897,9998462,3679840,206547,6117465,9957593,6807435,1739284,9178381,7257843,2109474,8699149,8384707,2360585,3965641,9076155,9318559,6119200,7333768,8697016,7023001,757721,537278,7468074,8175409,8788183,6446789,1857735,7650470,6495154,2611714,710683,152075,2198911,6788313,9853788,9808460,6008056,5240141,8907470,9037441,8912858,6581843,9486722,1110483,4358079,9356485,2271359,7915205,5800707,2794321,2427696,6535431,5402293,8693468,4236687,427550,7888951,2826824,6928588,2527975,5493738,7575883,1597931,1575847,1781688,372231,2282239,4319835,4920915,1484378,2299050,4846292,4336956,5786211,7893434,3895624,2859992,7198024,3423783,6675599,9576956,6126279,5686548,1825469,9700647,2868050,4702672,1146067,4015930,4401457,9546947,94925,1190509,2932188,3266779,8907446,7569840,1746835,3382302,806560,9889124,8560400,6993819,1717171,1758129,3648059,654011,2654860,4409144,345140,7904109,656623,4360626,2373514,6037605,4915718,7945608,381348,9613380,1043372,2004698,8364846,8281303,1555515,3062857,6142470,6025522,2631274,8012883,9664092,7293490,4295994,8811878,4278733,6294925,4723827,8619826,2292042,3960526,5995636,3003753,6696486,6342552,1444202,4885804,4715334,8225142,8653095,1977774,5020397,6314809,6749269,2001782,2569801,4392229,6138397,4095341,2326281,2674601,6272321,7487443,8810505,1085840,8753052,7423155,8633724,6815031,710220,3874565,8985652,7158129,8733709,5064615,2031736,529104,3088834,184099,7680557,8061648,2326949,5205644,6512033,2981141,7601225,822710,2603797,8087478,9938106,3361150,4456308,7594522,9469995,6045830,746096,8259152,422067,7423610,9556430,4214457,3490399,6458334,8846614,6545230,2926356,311409,7324067,5769802,9258719,638772,5485758,6160406,1888196,742264,93981,7958106,8190906,1901532,8632938,2898324,6948891,3509690,7465231,7881557,2320522,1171955,5343640,6617302,2262694,3340724,4604046,2130677,7258728,5023302,7115544,629294,3906792,7841905,4316201,5353925,5978678,4605980,4715885,4923495,125254,2270136,3046384,1905636,1533613,1574686,9973649,436488,2513052,1372385,5791689,7936400,9391125,9773408,5810449,5311075,3092112,1106591,3684523,5182726,7926790,5175387,9389960,428794,3967508,1220914,4345907,5884571,6290417,27779,1579045,1446459,9413760,7110487,2663308,3658078,5880932,8294767,280351,5694419,9224675,7210977,2340455,2452387,2012792,3427395,4187508,4304558,9471033,7748650,385275,9498286,5624053,4686614,8079379,8195703,7828150,1550130,8969696,3032368,1325678,2588879,3703222,6849726,7432662,6778076,6618920,6004942,7496115,6545500,7517791,2799767,5682257,8236719,9658179,7031751,4891992,1963257,8813293,4255011,9835021,5997546,6025427,3398687,1041157,9684715,8769575,5346125,2258308,8923238,7967766,9249359,704895,2185414,9382336,3018819,916644,4119888,8835562,3008395,2212759,9969727,8949839,8265769,4404402,1443873,9990971,9593316,7950567,3828722,7601152,2991448,6512271,2903590,2659610,4077631,2493337,264160,8577110,5766040,2146133,6493111,7638660,7258247,7657509,9932727,7245777,692601,2856925,2352193,2815849,3047315,9642528,3861671,9838132,4075604,4075518,7591652,7908743,9012509,5757510,1005387,4768438,918358,7524608,9934813,6541993,1449114,606407,7931432,4735198,5695334,6550439,8445244,746687,8725761,7670905,701771,9875733,3379787,1842470,983794,8658583,9697676,1468575,9853865,1784635,760211,4667134,9116597,287256,1714194,6791590,2773262,8461802,3760451,4973934,8971747,6287556,6272170,1619149,1256384,7606837,3004297,4514606,2609078,6156693,5206440,4018457,8477772,5212929,5849482,655806,8183052,2880921,506870,3328854,8777873,5357257,1396492,5678088,3589593,7607153,4588539,2951891,4200010,4648718,2696094,4989097,2169312,7658012,5070048,792212,3557154,9648543,8347381,1785441,2330592,4282915,7280759,7414746,1771221,2555756,6935947,4521023,6514761,3355184,9660873,6088914,8233640,8026321,6931189,7830992,245037,2679141,3057273,5682172,5529070,2228699,6922565,7120601,2900118,8633469,5950948,6523262,8363241,3431960,7594141,7772712,8591435,1665700,1444831,4133681,2906883,8060776,5308660,5432977,1065233,8700849,7695512,5458104,7084778,3717562,3021601,6686909,1320940,4453336,1667655,3595011,483768,7050318,6100956,2970600,3929247,3571695,8217139,6075595,9546179,8420826,7184217,5151812,9785062,3465395,5338447,6707682,8387107,746643,1232766,9732127,2160805,646836,7258638,9079878,8199153,5175701,4128637,9455964,4989182,3945611,9332538,1495788,1246537,1788831,2749455,4306743,5137921,9707252,7076461,1359054,3977768,1008130
701,9707252,student,New York City,N,[8843543],"[Business Model, Engineering - Learning the ba...","['engineering', 'boy', 'crime', 'app', 'watch'...",0.8944,0.9535,1.0,0.8528,0.8864,0.8165,0.8898,0.866,0.8367,0.8584,1.0,0.8563,0.8367,0.8165,0.9293,0.9333,0.9293,0.866,0.866,0.9574,0.8885,0.8864,0.8321,0.9636,0.7977,0.8452,1.0,0.8944,0.7071,0.8864,0.9258,1.0,0.8745,0.7559,0.9354,0.8321,0.8452,0.866,0.8165,0.9258,0.879,0.8864,0.9608,0.9129,0.9293,0.7906,0.9129,0.9574,0.8819,0.9235,0.8819,0.8292,0.8864,0.9649,0.8745,1.0,0.8819,0.8607,0.9177,0.9354,0.9045,0.8498,0.9733,0.8997,1.0,0.977,0.8452,0.9097,1.0,0.9014,0.8165,0.7276,0.8018,0.9555,0.8402,0.9682,0.9354,0.866,0.922,0.871,0.866,0.922,0.9014,0.9129,0.9165,0.9293,0.9258,0.8584,0.866,0.9535,0.8944,0.7906,0.8885,0.8885,0.9211,0.8292,0.9129,0.9682,0.9636,0.866,0.8819,0.9487,0.8367,0.9199,0.9718,0.7746,0.866,0.9672,0.8044,0.922,0.9075,0.9608,0.9293,0.8402,0.866,0.8044,0.8584,0.9199,0.8292,0.8165,0.9045,0.9089,1.0,0.9199,0.923,0.9354,0.9701,0.8898,0.8997,0.8987,0.866,0.8452,0.9512,0.9075,0.8165,0.8528,0.8165,0.8718,0.9258,0.866,0.8321,0.8864,0.8864,0.8944,0.9608,0.866,0.9177,0.8367,0.9075,0.8452,0.866,0.8416,1.0,0.75,0.8367,0.9045,0.8771,0.8607,0.9393,0.9177,0.9063,0.7338,0.9487,0.9199,0.8987,0.8528,0.8584,0.8864,0.8563,0.8165,0.9512,0.9555,0.9199,0.923,0.9354,0.8367,0.8745,0.9487,0.8864,0.8257,0.866,0.8165,0.9045,0.8076,0.8165,0.8885,0.8165,0.8165,0.8416,0.8944,0.9089,1.0,0.922,0.8272,0.7845,0.8885,0.9636,0.9129,0.8864,0.8165,0.9574,0.9258,0.8783,0.9354,0.7746,0.8292,0.9045,0.9063,0.8272,0.7746,0.8944,0.8745,0.8498,0.9075,0.7906,0.9165,0.8321,0.8402,0.9258,0.8367,0.8898,0.7977,0.9258,0.8452,0.8944,0.8819,0.8819,0.866,0.8165,0.7746,0.866,0.9309,0.9199,0.9075,0.9428,0.8165,0.8165,0.866,0.7977,0.9661,0.9199,0.7947,0.8819,0.8528,0.8997,0.7947,0.7746,0.8165,0.9718,0.7977,0.866,0.9258,0.8584,0.8997,0.8165,0.7802,0.9129,0.8018,0.8528,0.7454,0.9608,0.7977,0.9428,0.7746,0.9393,0.8864,0.8292,0.8018,0.7977,0.7947,0.8944,0.8944,0.866,0.8528,0.866,0.9469,0.8563,0.7071,0.922,0.7071,0.9393,0.8745,0.8018,0.7559,0.8847,0.9636,0.8745,0.7906,0.8452,0.8257,0.8076,0.8819,0.7845,0.8944,0.8944,0.8367,0.8165,0.866,0.8944,0.8044,0.9535,0.9199,0.8819,0.7977,0.9129,0.8528,0.7977,0.8528,0.9309,0.9574,0.8944,0.9129,0.9309,0.8944,0.9258,0.8563,0.7559,0.866,0.9354,0.8452,0.9309,0.9428,0.9258,0.8165,0.7559,0.9089,0.9129,0.866,0.9129,0.8864,0.9075,0.866,0.8885,0.8563,0.9129,0.9636,0.866,0.75,0.8018,0.8044,0.8402,0.8819,0.6742,0.9487,0.9014,0.9608,0.9075,0.7071,0.7746,0.8563,1.0,0.8864,0.7454,0.9661,0.9258,0.8944,0.7746,0.8452,0.8885,0.7559,0.8771,0.8416,0.8165,0.8745,0.9063,0.8165,0.8944,0.8452,0.8498,0.9199,0.9258,0.8321,0.866,0.8528,0.8864,0.9428,0.922,0.866,0.8997,1.0,0.9014,0.9636,0.8745,0.9393,0.8819,0.7746,0.8018,0.9487,0.866,0.9535,1.0,0.8944,0.8906,0.9636,0.8771,0.8864,0.8819,0.9089,0.9636,0.8745,0.866,0.9535,0.8944,0.8367,0.9535,0.5345,0.9045,0.8864,0.866,1.0,0.866,0.5774,0.6794,0.9354,0.9258,0.8885,0.8452,0.7385,0.9535,0.866,0.7746,0.866,0.7559,0.9393,0.8165,0.9354,0.7071,0.9661,0.9014,0.8864,0.9045,0.9459,0.866,0.8165,0.8944,0.9045,0.9089,0.8997,0.7746,0.8563,1.0,0.8718,0.8885,0.9089,0.9027,0.8885,0.8819,0.8944,0.9459,0.8944,0.8292,0.9759,0.9045,0.8452,0.9075,0.8165,0.8044,0.9258,0.9535,0.8987,0.9258,0.8819,0.9733,0.8885,0.8165,0.9199,0.8864,0.7977,0.8165,0.9293,0.7638,0.8367,0.9636,0.8165,0.9129,0.9075,0.8528,0.977,0.8584,0.8452,0.9129,0.9014,0.9199,0.9309,0.9199,0.9129,0.8165,0.8771,0.9199,0.8402,0.8563,0.8745,0.7746,0.8847,0.7906,0.9325,0.7868,0.9428,0.8771,0.8044,0.9129,0.9014,0.9285,0.8847,0.7303,0.879,0.9449,0.8944,0.9199,0.9701,0.7638,0.9177,0.8367,0.8906,0.8528,0.9574,0.9258,0.879,0.8987,0.8997,0.8062,0.8563,0.9592,0.866,0.8549,0.9199,0.8847,0.8944,0.8498,0.9354,0.8452,0.8944,0.7338,0.8584,0.9045,0.8729,0.9089,0.8771,0.8885,0.9512,0.9045,0.8819,0.8898,0.9258,0.8944,0.9129,0.9199,0.9199,0.8799,0.8452,0.8745,0.9199,0.8272,0.8044,0.8584,0.9045,0.8584,0.9045,0.7906,0.9608,0.866,0.8528,0.8729,0.8864,0.7746,0.9428,0.8452,0.8771,0.9199,0.866,0.9129,0.9354,0.8452,0.9733,0.866,0.8018,0.9309,0.8528,0.9428,0.7977,0.9199,0.9258,0.9199,0.9129,0.9487,0.8402,0.8597,0.8944,0.9487,0.8485,0.8745,0.8885,0.9487,0.8367,0.9701,0.9258,0.8044,0.9354,0.9258,0.8528,0.8165,0.9258,0.8367,0.7638,0.8498,0.9608,0.8452,0.7746,0.9555,0.9661,0.9045,0.8292,0.8402,0.8321,0.8165,0.9309,0.8563,0.9405,0.923,0.8165,0.8165,0.8367,0.9293,0.9535,0.8165,0.8257,0.9075,0.8944,0.8321,0.8885,0.866,0.8745,0.8528,0.9129,0.8944,0.8292,0.9129,0.8452,0.866,0.7947,0.7845,0.7385,0.9165,0.9129,0.8944,0.879,0.9129,0.7845,0.8076,0.8944,0.922,0.866,0.8864,0.9129,0.8729,0.9623,0.7559,0.9045,0.9014,0.8819,0.9129,0.9309,0.7906,0.8771,0.8584,0.8367,0.9129,0.8528,0.8864,0.866,0.866,0.7746,0.866,0.8729,0.923,0.75,0.8528,0.866,0.9014,0.9129,0.866,0.866,0.9293,0.9354,0.866,0.9211,0.8607,0.8367,0.8597,0.8528,0.8997,0.8885,0.9354,0.8321,0.9129,0.8885,0.8864,0.922,0.8745,0.9129,0.866,0.0,0.9293,0.9129,0.866,0.8864
702,7076461,student,San Francisco,N,[8843543],"[Media/Content, Business Operations, Business ...","['cheese', 'new', 'zara', 'business', 'media/c...",0.9309,0.9293,0.9393,0.7385,0.8452,0.9027,0.9789,0.7906,0.9487,0.8885,0.9199,0.8165,0.8367,0.9258,0.9293,0.8799,0.9293,0.9747,0.9354,0.9574,0.9177,0.9258,0.9199,0.9636,1.0,0.9258,1.0,0.9487,0.866,1.0,0.9258,0.8944,0.9701,0.8452,0.9014,0.9199,0.8864,0.9354,0.9661,0.8997,0.9535,1.0,0.9608,0.866,0.977,1.0,0.9354,0.866,0.9428,0.9701,0.8819,0.9014,1.0,0.9649,0.9701,1.0,0.7454,0.9813,0.9733,0.9354,0.9535,0.7817,0.8885,0.9258,0.9535,0.9045,0.9636,0.9649,0.9199,0.9014,0.8498,0.9701,0.8018,0.8847,0.9701,0.9354,0.9574,0.7638,0.922,0.851,0.866,0.8944,0.9682,0.8165,0.9165,0.9293,0.9258,0.9733,0.8367,0.7977,0.922,0.9014,0.9733,0.9733,0.9211,0.9014,0.866,0.9014,0.9636,0.9487,0.9428,0.9661,0.7746,0.9199,0.9718,0.8944,1.0,0.9158,0.9393,0.866,0.8402,1.0,0.8528,0.9701,0.7906,0.9075,0.9459,0.9608,0.7906,0.9309,0.9045,0.8847,0.9393,0.9405,0.8165,0.866,0.8402,0.866,0.8452,0.9608,0.9487,1.0,0.8452,0.9701,1.0,0.9045,0.9129,0.9381,0.9636,0.9574,0.9608,1.0,1.0,0.9381,0.8771,1.0,0.8272,0.9487,0.8402,0.9512,0.9354,0.9354,0.9325,0.7906,0.8367,1.0,1.0,0.9813,0.9393,0.9733,0.9258,0.8771,0.8944,0.9806,0.9199,0.9535,0.8885,0.9258,0.9661,1.0,0.8997,0.9089,0.7845,0.923,0.9789,0.9747,0.9701,0.9487,0.8452,0.977,0.8898,1.0,1.0,0.8847,0.9129,0.9733,1.0,0.8165,0.9354,0.9381,0.9089,0.9309,0.9747,0.9177,0.8771,0.9733,0.9258,1.0,0.9636,1.0,1.0,0.9258,0.9562,1.0,0.9487,0.9354,0.9045,0.9063,0.9733,0.9309,1.0,0.8745,0.9129,0.7276,0.9014,0.9381,0.8771,0.9075,0.8452,0.866,0.7906,0.8528,0.9063,0.7559,0.9661,0.7454,0.8165,0.9129,0.9428,1.0,0.8367,0.8944,0.8549,1.0,0.8498,0.9129,0.8944,1.0,0.9045,0.8944,0.8771,0.7255,0.9129,1.0,0.8729,0.9459,1.0,0.8563,0.9428,0.7977,0.866,0.9636,0.9733,0.8729,1.0,0.7802,0.9354,0.9636,0.7977,0.9428,0.9199,1.0,0.8819,0.8563,0.9393,0.8452,0.866,0.8452,1.0,0.9177,0.9487,0.9309,1.0,0.9535,0.9574,0.9649,0.8944,0.8367,0.9487,0.8944,0.9393,0.9701,0.9636,0.9636,1.0,0.8864,0.9701,0.7906,0.9636,0.9045,0.8597,0.8819,0.9608,0.8944,0.9309,0.8367,0.9129,1.0,0.8367,0.8745,0.8528,0.5547,1.0,0.879,0.8165,0.9045,0.8528,0.8528,1.0,0.9574,0.8944,0.9129,0.9309,0.8944,1.0,0.9309,0.7559,0.7906,0.9354,0.8864,0.8165,0.8498,0.8018,0.866,0.8018,0.978,0.9129,0.9014,0.7071,0.8452,0.9393,0.866,0.9459,0.9309,0.9574,0.8864,0.9574,0.9682,1.0,0.9075,0.9393,1.0,0.8528,0.8944,0.866,0.9608,0.8402,0.8367,1.0,1.0,0.9354,0.9636,0.8819,0.8563,0.8018,1.0,0.8367,0.9258,0.7255,0.7559,0.9199,0.8898,1.0,0.9701,0.8864,0.9428,0.9487,0.9258,0.9428,0.9608,0.9636,0.9199,1.0,0.8528,1.0,0.9129,0.922,0.9354,0.8729,0.9535,0.9014,0.9258,0.9393,0.9393,0.9428,1.0,0.9636,0.7746,1.0,0.8528,0.9608,0.9487,1.0,0.8452,0.7845,0.8452,0.9428,0.9089,0.8864,0.8044,1.0,0.879,0.9309,0.9487,0.7385,0.8452,1.0,0.9258,0.75,0.9014,0.9574,0.8165,0.8771,0.7906,0.9636,0.9177,0.8864,0.8528,0.8528,0.9574,0.8944,1.0,0.8452,1.0,0.7071,0.9682,0.7906,0.8563,0.9354,1.0,0.7977,0.8885,0.8062,0.8452,0.9309,0.9045,0.8847,0.8729,0.7746,0.9309,0.7845,0.8944,0.8885,1.0,0.9027,0.9733,0.8498,0.8367,0.9459,0.9487,0.866,0.9258,1.0,0.9636,0.9701,0.8165,0.9075,0.9636,0.9535,0.9199,1.0,1.0,0.9459,0.8584,0.9661,0.9608,0.9258,0.9045,0.8819,0.8257,0.7638,0.8944,0.8864,0.9428,0.8165,0.8044,0.9535,0.9535,0.9177,1.0,0.9129,0.8292,0.9199,1.0,0.6794,0.6455,0.866,0.9608,0.9199,0.9701,0.9309,0.9075,0.8944,0.9089,0.866,0.8847,0.8997,0.9428,1.0,0.8745,0.9574,0.952,0.9285,0.8847,0.8944,0.9045,0.9449,0.9309,0.8549,0.8402,0.8165,0.8885,0.9747,0.9285,0.9535,0.9574,0.9258,0.9045,0.9806,0.8997,0.9747,0.9309,0.9798,0.7906,0.8987,0.8771,0.978,0.8563,0.8498,0.9354,0.9636,0.8165,0.7845,0.9733,0.7977,0.9759,0.9555,0.9199,0.8584,0.9759,0.9535,0.9129,0.9354,0.9512,0.9487,0.9789,0.8321,0.9199,0.9672,0.8452,0.8402,0.7845,0.9459,0.9075,0.8272,0.9535,0.9177,0.9535,0.7906,0.9608,0.982,0.9293,0.9258,0.8018,0.8944,0.9129,0.7559,0.9199,0.8771,0.9354,1.0,0.866,1.0,0.9459,0.9014,0.9258,0.9309,0.9045,0.8819,0.9045,0.9608,1.0,0.9608,0.7817,0.9487,0.9075,0.8597,0.9381,1.0,0.9381,0.9393,0.9459,0.8367,0.7071,0.9075,1.0,0.9393,0.9354,0.9636,0.8528,0.9129,0.8729,0.8944,0.8165,0.9129,0.9199,0.9512,1.0,0.978,0.8165,0.9045,0.866,0.9075,0.8771,0.9661,1.0,0.8944,0.9199,0.923,1.0,0.7303,0.8367,0.977,0.977,0.8944,0.9293,0.9075,0.7746,0.9405,0.8885,0.7638,1.0,1.0,0.9129,0.922,0.9354,0.9129,0.8864,0.9354,0.9177,0.8771,0.8528,0.9165,0.8898,0.866,0.9045,0.9574,0.9199,0.9325,0.9165,0.8944,0.866,0.9636,0.8898,0.8997,0.9623,1.0,1.0,0.866,0.9718,0.8819,0.9309,0.866,0.9199,0.9733,0.9747,0.9789,0.7977,0.8864,0.9574,0.9354,0.8367,0.866,0.8997,0.9027,0.9354,0.9045,0.9682,0.9682,0.9129,0.9487,0.922,0.9045,0.9682,0.922,0.9847,0.923,0.9747,0.978,0.8528,0.9512,0.9733,0.866,0.8771,0.9354,0.9177,1.0,0.922,0.9393,0.9428,1.0,0.9325,0.0,0.9487,0.9747,0.9063
703,1359054,student,New York City,N,[8843543],"[Media/Content, Engineering - Front End, QA]","['engineering', 'music', 'direction', 'harry',...",0.8944,0.7977,0.9393,0.7977,0.9258,0.7935,0.866,0.7906,0.7746,0.8584,0.9608,0.9309,1.0,0.9258,0.8257,0.9837,0.8528,0.8367,0.8292,0.9574,0.8885,0.9258,0.6794,0.8864,0.7977,0.7868,1.0,0.7071,0.866,0.8452,0.9258,0.8944,0.8745,0.8452,0.9014,0.9199,0.8018,0.9574,0.7746,0.7868,0.9535,0.8018,0.7845,0.9789,0.9293,0.8292,0.9129,0.7638,0.8819,0.8911,0.8165,0.9014,0.7559,0.9285,0.8402,0.9798,0.8165,0.7454,0.9177,0.866,0.8528,0.8819,0.9733,0.8165,0.9045,0.9045,0.7559,0.7656,0.9199,0.866,0.8498,0.767,0.7559,0.9325,0.686,0.866,0.7638,0.866,0.922,0.871,0.7746,0.9747,0.9014,1.0,0.8485,0.8528,0.9258,0.6882,0.8367,0.9293,0.922,0.866,0.7609,0.6489,0.8348,0.75,0.9129,0.9014,1.0,0.7071,0.9129,0.8563,0.8367,0.8771,0.9129,0.9309,0.7906,0.898,0.6417,0.9747,0.9393,0.9608,0.8257,0.7276,0.9354,0.8402,0.7609,0.8771,0.9014,0.8563,0.8528,0.8597,0.9075,0.9405,0.923,0.9014,0.8745,0.866,0.7868,0.7596,0.7416,0.7559,0.9759,0.767,0.7454,0.7977,0.8819,0.8246,0.8452,0.7071,0.6794,0.9258,0.8018,0.7483,0.9199,0.7638,0.7255,0.7746,0.9075,0.8165,0.866,0.7906,0.9555,0.7906,0.6325,0.9045,0.8321,0.8165,0.8044,0.8584,0.8238,0.6794,0.922,0.9199,0.8987,0.7977,0.8272,0.7559,0.8165,0.7638,0.9258,0.8597,0.9199,0.8819,0.866,0.7746,0.8745,0.8367,0.9258,0.7071,0.9129,0.7638,0.7977,0.8341,0.9129,0.9459,0.7454,1.0,0.8165,0.8246,0.9325,0.9309,0.8944,0.8272,0.8771,0.7609,0.9258,0.7454,0.8452,0.7638,0.9574,0.8452,0.9103,0.9354,0.8944,0.75,0.7977,0.8452,0.7947,0.6831,0.8367,0.7276,0.7817,0.9393,0.866,0.8944,0.8321,0.8044,0.7071,0.8367,0.8898,0.9045,0.9258,0.9258,0.8165,0.8819,0.9428,0.9129,0.8165,0.7071,0.866,0.9661,0.8321,0.9075,0.9129,0.8165,0.7746,0.7638,0.8528,0.8944,0.9608,0.8584,0.8498,0.6742,0.9258,0.8272,0.8367,0.8944,0.9428,0.9535,0.866,0.8452,0.562,0.9258,0.7638,0.8076,0.7906,0.8864,0.9045,0.6667,0.8321,0.7385,0.9428,0.8563,0.8745,0.8864,0.9014,0.9258,0.7385,0.8272,0.8367,0.9661,0.7638,0.9535,0.866,0.8094,0.8944,0.7746,0.7746,0.8367,0.9393,0.767,0.8452,0.8018,0.8597,0.8452,0.767,0.9014,0.7559,0.7977,0.7518,0.7817,0.7845,0.7071,0.8563,0.8062,0.9129,0.9354,0.8944,0.9075,0.9045,0.8771,0.8819,0.9045,0.9129,0.9045,0.9535,0.9045,0.8944,0.8165,0.8563,0.7454,0.8944,0.7071,0.9636,0.8563,0.7559,0.9354,0.9014,1.0,0.9661,0.8819,0.8864,0.7638,0.9258,0.8847,0.8563,0.9682,0.9574,0.8864,0.9393,0.9574,0.9733,0.7746,0.7638,0.9636,0.8165,0.8292,0.8452,0.8044,0.686,1.0,0.9045,0.9487,0.9682,0.8771,0.9075,0.8944,0.7746,0.8165,0.866,0.7559,0.9428,0.8944,0.9258,0.8944,0.8367,0.8452,0.9177,0.9258,0.9608,0.8416,0.6667,0.8745,0.9063,0.8165,0.8367,0.8165,0.8498,0.8321,0.8452,0.9199,0.8165,0.9535,0.8452,0.9428,0.866,0.9354,0.8729,0.9045,0.9682,0.7559,0.8044,0.8402,0.8165,0.7746,0.7071,0.7746,0.7638,0.9045,0.9608,0.8756,0.871,0.8864,0.9199,0.8864,0.9428,0.8597,0.8452,0.9075,0.7071,0.9045,0.8563,0.8367,0.9045,0.9258,0.7977,0.7559,0.9014,0.9354,0.7638,0.7454,0.7845,0.9014,0.8452,1.0,0.8452,0.7385,0.7385,0.866,0.8944,0.8292,0.7559,0.8745,0.866,0.8292,0.7906,1.0,0.8292,0.8864,1.0,0.8584,0.922,0.8165,0.7303,0.9045,0.9325,0.8165,1.0,0.6831,0.8771,0.8,0.8272,0.8847,0.8607,0.8885,0.7817,1.0,0.8584,0.8944,0.866,0.8729,0.9535,0.9258,0.8044,0.866,0.8745,0.8452,0.7977,0.8771,0.8018,0.6667,0.9459,0.7255,0.7746,0.8321,0.7559,0.9045,0.9428,0.9535,0.9129,0.8367,0.9258,0.9428,0.9574,0.8745,0.7977,0.977,0.8885,0.8018,0.8165,0.866,1.0,0.8563,0.8771,1.0,0.8165,0.6794,0.8321,0.767,0.8165,0.9075,0.8062,0.8341,0.8292,0.8847,0.8452,0.8165,0.8321,0.8044,0.7906,0.8478,0.8305,0.7802,0.6831,0.8257,0.866,0.8563,0.8771,0.8402,0.8165,0.6882,0.8367,0.8305,0.8528,0.866,0.8452,0.879,0.8086,0.7868,0.866,0.7303,0.8944,0.6614,0.8771,0.7338,0.7518,0.9309,0.8165,0.866,0.5345,0.8944,0.7845,0.8272,0.7977,0.8452,0.8847,0.8771,0.7947,0.9512,0.8257,0.8165,0.866,0.9258,0.8944,0.8165,0.8987,1.0,0.8799,0.9258,0.8745,0.9199,0.8584,0.767,0.8885,1.0,0.8272,0.9535,0.866,0.8321,0.866,0.7385,0.8997,0.8018,0.9487,0.9129,1.0,0.8771,0.7338,0.866,0.8165,1.0,0.8018,0.9459,0.75,0.5976,0.9661,0.8528,0.9813,0.8528,0.8987,0.8452,0.8321,0.9129,0.9747,0.767,0.7518,0.8944,0.9487,0.8246,0.8745,0.8272,0.8944,0.8944,0.8402,0.8864,0.6417,0.866,0.7559,0.879,0.8165,0.8997,0.8944,0.5774,0.9129,0.9405,0.7237,0.7746,0.8597,0.9661,0.9045,0.9014,0.8745,0.9199,0.8944,0.8165,0.8563,0.9199,0.8819,0.6667,0.7746,0.9487,0.7687,0.9535,0.6831,0.879,0.9075,0.8944,0.8771,0.8584,0.9129,0.8745,0.7977,0.7638,0.9747,0.8292,0.7071,0.9636,0.9014,0.8272,0.6202,0.9045,0.9165,0.8898,0.7746,0.9535,0.7638,0.6202,0.8597,0.8944,0.922,0.7071,0.7071,0.866,0.8165,0.923,0.6547,0.8528,0.9354,0.8498,0.9428,0.7746,0.866,1.0,0.8272,0.7416,0.9354,0.9535,0.866,0.7638,0.7906,0.9487,0.8062,0.9258,0.8389,0.75,0.879,0.8292,0.7906,0.7071,0.8944,0.8944,0.7687,0.866,0.8062,0.9374,0.8389,0.7416,0.9089,0.9535,0.7868,0.8272,0.9014,0.8771,0.9129,0.7609,0.8018,0.8062,0.8044,0.8498,0.7638,0.8847,0.9293,0.0,0.8062,0.9258
704,3977768,student,New York City,N,[8843543],"[Engineering - Mobile, UX/UI, Data/Analytics]","['engineering', 'app', 'network', 'reading', '...",0.8944,0.9293,0.9075,0.9045,0.9636,0.8389,0.8416,0.9354,0.8062,0.8272,0.8321,0.9661,0.9487,0.8165,0.9045,0.9837,0.879,0.8062,0.866,0.9574,0.8885,0.9258,0.8321,0.8864,0.7977,0.8452,0.9428,0.8367,0.7906,0.8864,0.9636,0.8367,0.9393,0.8018,0.9682,0.8321,0.9258,0.9574,0.8165,0.8165,0.9535,0.8864,0.8771,0.9789,0.9535,0.75,0.9129,0.9129,0.8819,0.9075,0.9129,0.9354,0.8452,0.8906,0.9393,0.9592,1.0,0.8165,0.9459,0.866,0.8528,0.8819,0.9459,0.7559,0.9045,0.9293,0.8018,0.851,1.0,0.9014,0.8165,0.7276,0.8452,0.9555,0.767,0.9682,0.866,0.9574,0.9487,0.871,0.8367,0.9747,0.9014,1.0,0.8485,0.879,1.0,0.7947,0.922,0.977,0.9487,0.9014,0.8272,0.8272,0.8704,0.8292,0.9129,0.866,0.9258,0.8367,0.7817,0.8563,0.8944,0.7845,0.8498,0.9309,0.9354,0.9158,0.8402,0.9747,0.9701,0.8771,0.879,0.8044,0.9354,0.8402,0.8885,0.9199,0.9354,0.9309,0.8528,0.9089,0.9393,0.8987,0.9027,0.9354,0.9701,0.8898,0.9258,0.8086,0.7416,0.8018,0.9512,0.9075,0.8165,0.8528,0.8498,0.8485,0.8864,0.7638,0.7845,0.8864,0.7559,0.8246,0.9608,0.866,0.8272,0.8367,0.9701,0.8729,0.9014,0.866,0.8847,0.8292,0.8367,0.9045,0.8321,0.8607,0.8745,0.8885,0.8864,0.8771,0.9747,0.7845,0.8771,0.7385,0.8272,0.8018,0.8944,0.8165,0.8729,0.8597,0.9199,0.9623,0.8416,0.8367,0.8745,0.8944,0.9258,0.8528,0.9354,0.866,0.8528,0.8847,0.9129,0.9459,0.8165,0.9428,0.8898,0.9592,0.9325,0.9661,0.922,0.8272,0.9199,0.7255,0.9636,0.8165,0.9636,0.866,0.9129,0.8018,0.971,0.9129,0.8367,0.7906,0.8257,0.8864,0.7609,0.8165,0.8944,0.8745,0.8498,0.9701,0.6614,0.8485,0.8771,0.9075,0.8452,0.8944,0.8898,0.9045,0.9063,0.8452,0.8563,0.8819,0.9718,0.9574,0.9428,0.7746,0.866,0.8165,0.8549,0.8745,0.9718,0.8416,0.8563,0.8165,0.9045,0.9309,0.9199,0.7947,0.8498,0.9045,0.9512,0.8584,0.8367,0.9309,0.9129,0.8528,0.9014,0.8864,0.7609,0.8452,0.8165,0.9089,0.9129,0.8864,1.0,0.7454,0.8771,0.8528,0.9129,0.7746,0.8745,0.9636,0.9354,0.9258,0.7977,0.8584,0.922,0.9661,0.866,0.9535,0.9574,0.9097,0.8563,0.8367,0.8062,0.8944,0.9393,0.9075,0.8018,0.8018,0.8341,0.8864,0.8044,0.9354,0.8452,0.8528,0.8341,0.8165,0.9199,0.7746,0.7746,0.8062,0.9129,0.7906,0.8944,0.8402,0.9535,0.9608,0.8165,0.8257,0.9129,0.9535,0.9045,0.9045,0.8944,0.9129,0.7303,0.8819,0.9309,0.7746,0.9258,0.9309,0.8452,0.866,0.9354,0.9636,0.8944,0.9718,0.8864,0.7638,0.8864,0.7802,0.7958,0.9682,1.0,0.9258,0.9075,0.9129,0.8885,0.8563,0.866,0.9258,0.9129,0.7906,0.8452,0.8745,0.8044,0.9428,0.9045,0.9487,1.0,0.8771,0.9393,0.7746,0.8367,0.8944,0.9354,0.8452,0.9428,0.8944,0.8864,0.7071,0.7746,0.8452,0.9177,0.7559,0.9199,0.866,0.8165,0.9075,0.8864,0.7817,0.8367,0.8452,0.8819,0.9199,0.8864,0.8771,0.866,0.8528,0.8452,0.9718,0.922,0.866,0.8997,0.9045,0.866,0.9636,0.8745,0.8402,0.9129,0.7746,0.8452,0.8367,0.9129,1.0,0.8771,0.9309,0.8305,0.9258,0.9199,0.9258,0.8165,0.8341,0.8452,0.8402,0.866,0.9535,0.8944,0.9487,0.9535,0.9258,0.8528,0.8864,0.866,0.9014,0.7071,0.6667,0.8321,0.9014,0.8452,1.0,0.8864,0.8528,0.8528,0.7638,0.9487,0.8292,0.9258,0.8745,0.9129,0.9354,0.866,0.9661,0.9354,0.8018,0.9045,0.8272,0.866,0.7868,0.8165,0.9045,0.8341,0.8452,1.0,0.8165,0.8771,0.9381,0.9459,0.9089,0.8819,0.8885,0.7454,1.0,0.9177,0.7746,0.75,0.9258,0.8528,0.8864,0.9075,0.8165,0.8044,0.8018,0.8528,0.9405,0.8864,0.8165,0.9733,0.8885,0.8165,0.9608,0.8864,0.9045,1.0,0.8528,0.866,0.8944,0.9258,0.9428,0.9574,0.9075,0.7977,0.9535,0.8885,0.8452,0.9129,0.8292,1.0,0.8944,0.9608,0.9574,0.8165,0.8771,0.8987,0.767,0.8563,0.8745,0.7416,0.8847,0.7906,0.8597,0.8165,0.8819,0.8321,0.767,0.8898,0.8839,0.8906,0.8076,0.8563,0.8528,0.8864,0.9309,0.9405,0.9701,0.9129,0.8584,0.8944,0.9097,0.9045,0.9129,0.8864,0.9293,0.8771,0.7868,0.8367,0.7303,0.8944,0.8292,0.9199,0.9199,0.8076,0.8563,0.9129,0.9354,0.8018,0.9661,0.8771,0.9177,0.8528,0.8452,0.8847,0.8987,0.8885,0.8997,0.9045,0.8165,0.866,0.8452,0.9129,0.866,0.9199,0.9199,0.8614,0.9636,0.9075,0.8321,0.8584,0.8402,0.9459,1.0,0.8885,0.9535,0.866,0.8771,0.9063,0.8528,0.9258,0.8452,0.8944,0.9129,1.0,0.9608,0.8321,0.9354,0.866,0.9354,0.8018,0.9459,0.7071,0.5976,0.9309,0.9293,0.9428,0.9045,0.8771,0.8018,0.7845,0.9428,0.9747,0.767,0.8076,0.8,0.9487,0.8718,0.9075,0.7255,0.9487,0.9487,0.9701,0.8864,0.8044,0.9354,0.9258,0.8528,0.8819,0.8729,0.9487,0.8165,0.8165,0.9608,0.7559,0.8367,0.9089,1.0,0.879,0.866,0.9075,0.8771,0.8165,0.9309,0.8944,0.9806,0.9623,0.8165,0.8165,0.9487,0.8257,0.9293,0.6325,0.879,0.9393,0.8944,0.8987,0.8885,0.9129,0.8402,0.7977,0.7638,0.9487,0.9014,0.866,0.9636,0.866,0.8272,0.7338,0.9045,0.9381,0.9129,0.8367,1.0,0.866,0.8321,0.8341,0.9592,0.9747,0.866,0.8452,0.8416,0.8452,0.923,0.8018,0.8528,0.9354,0.7454,0.9428,0.8563,0.866,0.9199,0.8584,0.8062,0.8165,1.0,0.9258,0.8416,0.866,0.7746,0.866,0.8165,0.9428,0.9014,0.9293,0.866,0.866,0.9574,0.866,0.866,0.879,0.866,0.7746,0.9211,0.8165,0.8062,0.9325,0.9535,0.9258,0.9459,0.9014,0.9608,0.866,0.7947,0.7071,0.922,0.9393,0.7454,0.7906,0.8847,0.977,0.8756,0.0,0.8864
705,1008130,student,New York City,N,[8843543],"[Engineering - Mobile, VC, Business Operations]","['engineering', 'creative', 'app', 'network', ...",0.8944,0.9535,0.9393,0.9045,0.9258,0.8607,0.8416,0.9354,0.8367,0.8272,0.7338,0.8563,0.8944,0.7559,0.879,0.9333,0.9293,0.8367,0.9014,0.9574,0.8272,0.9636,0.8771,0.9636,0.9045,0.8997,0.9428,0.8944,0.7906,0.9636,0.9258,0.8944,0.9393,0.7071,0.9682,0.7338,0.8864,0.8898,0.8165,0.8165,0.9293,0.9636,0.9608,0.9574,0.977,0.8292,0.8898,1.0,0.9428,0.9393,0.8819,0.9014,0.8864,0.9097,0.9075,0.9592,0.8819,0.8389,0.9733,0.9354,0.9535,0.6236,0.9459,0.8729,1.0,0.977,0.8452,0.8906,1.0,0.9354,0.8165,0.767,0.8452,0.8847,0.8402,1.0,0.866,0.9129,0.922,0.8094,0.8367,0.8944,0.9014,0.7071,0.8,0.9535,0.9258,0.8584,0.8062,0.9293,0.866,0.866,0.8584,0.9177,0.9045,0.866,0.9574,0.9682,0.8864,0.866,0.8165,0.8563,0.9487,0.7338,0.9718,0.8563,1.0,0.9333,0.9075,0.922,0.9701,1.0,0.879,0.8402,0.9354,0.8745,0.8584,0.9199,0.866,0.9309,1.0,0.8597,0.9393,0.8549,0.8819,0.866,0.9393,0.866,0.8452,0.8549,0.922,0.8864,0.8452,0.9701,0.9428,0.9045,0.8165,0.8246,0.9258,0.866,0.9199,0.9258,0.8018,0.8718,0.9199,0.9129,0.8885,0.9487,0.9075,0.8997,0.9014,0.9129,0.9325,0.7906,0.8944,1.0,0.8771,0.9027,0.9393,0.9459,0.9258,0.8771,0.9487,0.8549,0.8086,0.8528,0.8584,0.8864,0.9309,0.866,0.8997,0.7802,0.9608,0.9428,0.8416,0.922,0.9075,0.8944,0.8018,0.9045,0.9129,0.9129,0.9535,0.7802,0.9574,0.9177,0.8165,0.8819,0.8416,0.9381,0.978,1.0,0.922,0.8584,0.8771,0.8272,0.9636,0.9129,0.9636,0.866,1.0,0.8452,0.9258,0.9354,0.8367,0.7906,0.8257,0.866,0.7255,0.8165,1.0,0.9075,0.8819,0.9075,0.75,0.7746,0.8771,0.767,0.8452,0.866,0.8165,0.8528,0.8452,0.9258,0.9661,0.7817,0.9428,0.9574,0.9428,0.8367,0.8944,0.8563,0.8549,0.8044,0.9129,0.866,0.9309,0.9574,0.8528,1.0,0.9608,0.7609,0.8165,0.9535,0.8997,0.9177,0.9487,0.8944,1.0,0.9535,0.9014,0.9636,0.8584,0.8452,0.9574,0.8597,0.9354,0.8864,0.9045,0.8819,0.9199,0.9045,0.9129,0.6325,0.9393,0.9636,0.9014,0.8864,0.8528,0.7947,0.9487,0.9309,0.9129,0.9535,0.9574,0.9469,0.8944,0.8367,0.866,0.8367,0.9701,0.9393,0.8864,0.8864,0.8847,1.0,0.8402,0.8292,0.8864,0.8528,0.8076,0.8498,0.9199,0.9487,0.7303,0.8367,0.9574,1.0,0.9487,0.7276,1.0,0.8771,0.8819,0.7977,0.9574,0.9535,0.9535,0.9535,0.8944,0.9574,0.6831,0.8498,1.0,0.9487,0.9258,0.8944,0.8452,0.7906,1.0,0.9258,0.9661,0.9718,0.9636,0.8165,0.9258,0.8341,0.8367,0.9354,0.9129,0.9636,0.9701,0.9574,0.9177,0.8165,0.9574,0.9636,0.9129,0.7906,0.9258,0.9075,0.8402,1.0,0.7977,0.8944,0.9682,0.9199,0.9393,0.8367,0.9487,0.9661,1.0,0.9258,0.8165,1.0,0.8864,0.7746,0.6325,1.0,0.8885,0.8452,0.9608,0.866,0.8165,0.9701,0.8452,0.8165,0.9487,0.7868,0.8819,0.9608,0.9636,0.8771,0.9574,0.9535,0.8864,0.9718,0.8944,0.866,0.9258,0.8528,0.9014,0.8864,0.9701,0.8402,0.9129,0.8367,0.8452,1.0,0.9574,0.9535,0.9199,0.8563,0.9285,0.9258,0.8771,0.9636,0.9428,0.8847,0.9636,0.8044,0.866,0.9045,0.8944,0.9487,0.9045,0.9258,0.9535,0.8864,0.9014,1.0,0.7638,0.7454,0.9199,0.75,0.9258,0.9459,0.8864,0.8528,1.0,0.9129,0.9487,0.9014,0.9258,0.8745,0.9574,0.9682,0.866,0.9309,0.9014,0.8452,0.9045,0.8885,0.8944,0.7868,0.8563,0.9535,0.8597,0.8452,0.7746,0.8944,0.7845,0.9381,0.8885,0.9325,0.9428,0.9459,0.7817,0.8367,1.0,0.8944,0.866,0.8452,1.0,0.9258,0.9075,0.7638,0.8044,0.8452,0.9535,0.9405,0.9636,0.9428,0.9459,0.8584,0.8944,0.9199,0.8452,0.8528,0.9428,0.9045,0.7071,0.9487,0.9636,0.9428,0.9574,0.9393,0.9535,0.9535,0.8885,0.8864,0.9574,0.75,0.9199,0.9661,0.8771,0.8165,0.7071,0.9199,0.8771,0.767,0.8944,0.9393,0.7746,0.8597,0.9014,0.8847,0.8729,0.9718,0.8771,0.8745,0.8898,0.8839,0.9097,0.7802,0.7746,0.8528,0.9063,0.8944,0.8987,0.9393,0.9574,0.8272,0.8944,0.851,0.8528,1.0,0.9258,0.7687,0.8771,0.8729,0.866,0.8165,0.9165,0.866,0.8987,0.9608,0.8341,0.9309,0.8819,1.0,0.9258,0.9309,0.8321,0.9177,0.7977,0.9258,0.8847,0.9199,0.8885,0.9759,0.9045,0.8498,0.866,0.9258,0.9129,0.8898,0.8987,0.8321,0.8799,0.8864,0.8745,0.8321,0.8885,0.9075,0.7947,0.9535,0.7609,1.0,0.9354,0.8771,0.866,0.9293,0.9258,0.8864,0.9487,1.0,0.8452,0.9608,0.9608,0.9354,0.9574,0.866,0.8864,1.0,0.75,0.8018,0.8165,0.8257,0.923,0.9045,0.8321,0.8864,0.8321,0.9718,0.9487,0.8044,0.8597,0.8485,0.9487,0.9381,0.8745,0.7255,0.9487,0.9487,0.9075,0.9636,0.9075,1.0,0.9636,0.8528,0.8498,0.8729,0.8944,0.7071,0.7454,0.9405,0.7868,0.9487,0.978,0.8944,0.9293,0.9014,0.8745,0.9199,0.8563,0.9661,0.8563,0.9608,0.923,0.8819,0.6831,0.8944,0.879,0.977,0.8165,0.8528,0.8745,0.9487,0.9199,0.7947,0.9574,0.9075,0.9045,0.8165,0.8367,0.9014,0.9574,0.9636,0.866,0.7609,0.7338,0.8528,0.8718,0.9354,0.8944,0.9045,0.9574,0.9199,0.8076,0.9381,0.922,0.7906,0.8452,0.8416,0.8165,0.9623,0.8864,0.9535,0.9682,0.7817,0.9718,0.9661,0.9354,0.9608,0.9733,0.8062,0.866,0.7385,0.9449,0.8416,0.866,0.9487,0.866,0.8165,0.9027,0.9354,0.9045,0.9014,0.9014,0.9574,0.8062,0.8944,0.9293,0.9354,0.7746,0.9045,0.8389,0.866,0.9089,0.8528,0.8729,0.9459,0.8292,0.8321,0.8416,0.7947,0.7559,0.9487,0.9393,0.7454,0.8898,0.8597,0.879,0.9309,0.8367,0.0


In [39]:
df_distances.to_csv('outputs/users_distances.csv')

# Part 7: Allocation Algo

# Part 7-A: Reload Packages and user_distances/score matrix

I am assuming that we can run this separately, that's why i'm loading the packages again

In [1]:
import pandas as pd
import numpy as np
import random
import ast

pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment = None  # default='warn'

In [110]:
df_distances = pd.read_csv('outputs/users_distances.csv', header = 0, index_col=0)
df_distances['topics_all'] = [ast.literal_eval(x) for x in df_distances['topics_all']]
df_distances['str_combined'] = [ast.literal_eval(x) for x in df_distances['str_combined']]
df_distances['past_advisors'] = [ast.literal_eval(x) for x in df_distances['past_advisors']]
df_distances.head(2)

Unnamed: 0,nyc_id,userType,location,is_vip,past_advisors,topics_all,str_combined,8843543,7755085,9202605,7666762,432418,3877105,1020980,7896731,6961087,6402058,2290089,1148025,4999309,4369680,382570,284637,8187296,8476629,7306029,5652660,204780,5677744,2593443,4601513,4878277,2620815,3288117,5708352,5474030,4046542,4340193,7474138,3712338,352403,2348870,8526469,7743463,6459524,9929858,9880082,1518524,4788367,8542359,997987,4629976,5064585,4755310,7209834,606005,5017981,5247078,8902800,9134615,7048387,6751671,1348913,6937862,8064126,8436578,8910889,3230467,3140161,3667774,4192795,8301349,8033043,5166603,7485700,9540732,4958919,8561114,3122679,8238975,2562280,7987549,6414426,4101601,1983482,878483,6159047,1557244,6113184,2149492,796123,6672031,4933667,625916,3179761,4914479,2956628,6234328,4558259,6966024,6742971,1497213,3569615,1187639,3758768,7681692,3167915,7792027,9462300,1453483,2598156,477453,3771144,813333,8841456,7679971,3116521,4601941,6306797,3195584,2627748,2169538,4999579,2235203,6437056,2308071,2368457,1390411,7694741,3521250,3206074,301431,2669767,7518538,2256585,94522,6955519,9959606,2037455,2526473,8359351,269209,2825496,3511047,1614397,2659801,4410236,515652,4818902,8219730,7918230,5108683,5238740,7413581,6286127,9113243,3805339,2179494,6059897,9998462,3679840,206547,6117465,9957593,6807435,1739284,9178381,7257843,2109474,8699149,8384707,2360585,3965641,9076155,9318559,6119200,7333768,8697016,7023001,757721,537278,7468074,8175409,8788183,6446789,1857735,7650470,6495154,2611714,710683,152075,2198911,6788313,9853788,9808460,6008056,5240141,8907470,9037441,8912858,6581843,9486722,1110483,4358079,9356485,2271359,7915205,5800707,2794321,2427696,6535431,5402293,8693468,4236687,427550,7888951,2826824,6928588,2527975,5493738,7575883,1597931,1575847,1781688,372231,2282239,4319835,4920915,1484378,2299050,4846292,4336956,5786211,7893434,3895624,2859992,7198024,3423783,6675599,9576956,6126279,5686548,1825469,9700647,2868050,4702672,1146067,4015930,4401457,9546947,94925,1190509,2932188,3266779,8907446,7569840,1746835,3382302,806560,9889124,8560400,6993819,1717171,1758129,3648059,654011,2654860,4409144,345140,7904109,656623,4360626,2373514,6037605,4915718,7945608,381348,9613380,1043372,2004698,8364846,8281303,1555515,3062857,6142470,6025522,2631274,8012883,9664092,7293490,4295994,8811878,4278733,6294925,4723827,8619826,2292042,3960526,5995636,3003753,6696486,6342552,1444202,4885804,4715334,8225142,8653095,1977774,5020397,6314809,6749269,2001782,2569801,4392229,6138397,4095341,2326281,2674601,6272321,7487443,8810505,1085840,8753052,7423155,8633724,6815031,710220,3874565,8985652,7158129,8733709,5064615,2031736,529104,3088834,184099,7680557,8061648,2326949,5205644,6512033,2981141,7601225,822710,2603797,8087478,9938106,3361150,4456308,7594522,9469995,6045830,746096,8259152,422067,7423610,9556430,4214457,3490399,6458334,8846614,6545230,2926356,311409,7324067,5769802,9258719,638772,5485758,6160406,1888196,742264,93981,7958106,8190906,1901532,8632938,2898324,6948891,3509690,7465231,7881557,2320522,1171955,5343640,6617302,2262694,3340724,4604046,2130677,7258728,5023302,7115544,629294,3906792,7841905,4316201,5353925,5978678,4605980,4715885,4923495,125254,2270136,3046384,1905636,1533613,1574686,9973649,436488,2513052,1372385,5791689,7936400,9391125,9773408,5810449,5311075,3092112,1106591,3684523,5182726,7926790,5175387,9389960,428794,3967508,1220914,4345907,5884571,6290417,27779,1579045,1446459,9413760,7110487,2663308,3658078,5880932,8294767,280351,5694419,9224675,7210977,2340455,2452387,2012792,3427395,4187508,4304558,9471033,7748650,385275,9498286,5624053,4686614,8079379,8195703,7828150,1550130,8969696,3032368,1325678,2588879,3703222,6849726,7432662,6778076,6618920,6004942,7496115,6545500,7517791,2799767,5682257,8236719,9658179,7031751,4891992,1963257,8813293,4255011,9835021,5997546,6025427,3398687,1041157,9684715,8769575,5346125,2258308,8923238,7967766,9249359,704895,2185414,9382336,3018819,916644,4119888,8835562,3008395,2212759,9969727,8949839,8265769,4404402,1443873,9990971,9593316,7950567,3828722,7601152,2991448,6512271,2903590,2659610,4077631,2493337,264160,8577110,5766040,2146133,6493111,7638660,7258247,7657509,9932727,7245777,692601,2856925,2352193,2815849,3047315,9642528,3861671,9838132,4075604,4075518,7591652,7908743,9012509,5757510,1005387,4768438,918358,7524608,9934813,6541993,1449114,606407,7931432,4735198,5695334,6550439,8445244,746687,8725761,7670905,701771,9875733,3379787,1842470,983794,8658583,9697676,1468575,9853865,1784635,760211,4667134,9116597,287256,1714194,6791590,2773262,8461802,3760451,4973934,8971747,6287556,6272170,1619149,1256384,7606837,3004297,4514606,2609078,6156693,5206440,4018457,8477772,5212929,5849482,655806,8183052,2880921,506870,3328854,8777873,5357257,1396492,5678088,3589593,7607153,4588539,2951891,4200010,4648718,2696094,4989097,2169312,7658012,5070048,792212,3557154,9648543,8347381,1785441,2330592,4282915,7280759,7414746,1771221,2555756,6935947,4521023,6514761,3355184,9660873,6088914,8233640,8026321,6931189,7830992,245037,2679141,3057273,5682172,5529070,2228699,6922565,7120601,2900118,8633469,5950948,6523262,8363241,3431960,7594141,7772712,8591435,1665700,1444831,4133681,2906883,8060776,5308660,5432977,1065233,8700849,7695512,5458104,7084778,3717562,3021601,6686909,1320940,4453336,1667655,3595011,483768,7050318,6100956,2970600,3929247,3571695,8217139,6075595,9546179,8420826,7184217,5151812,9785062,3465395,5338447,6707682,8387107,746643,1232766,9732127,2160805,646836,7258638,9079878,8199153,5175701,4128637,9455964,4989182,3945611,9332538,1495788,1246537,1788831,2749455,4306743,5137921,9707252,7076461,1359054,3977768,1008130
0,8843543,professional,New York City,Y,[],"[UX/UI, Storytelling/Brand, Media/Content]","[app, hand, storytelling/brand, mobile, red, w...",0.0,0.9045,0.9393,0.9045,1.0,0.9623,0.9354,0.866,0.8944,0.8584,0.9608,0.9661,1.0,0.9258,0.9045,1.0,0.977,0.922,0.9014,0.9574,0.8885,0.9636,0.9199,0.9258,0.9045,0.9512,0.9428,0.8944,0.7071,1.0,0.9258,0.8944,1.0,0.9258,0.9354,0.9199,0.8864,0.9789,0.8944,0.8997,1.0,0.9636,0.9199,0.9789,1.0,0.9354,0.9789,0.9129,0.8819,0.9701,0.8498,0.9014,0.9258,0.9285,1.0,0.9592,0.9428,0.9428,1.0,0.8292,0.8528,0.8498,0.9733,0.9258,0.9045,0.9293,0.8864,0.9285,0.9608,0.9354,0.9428,0.9393,0.8452,0.978,0.9075,0.9682,0.9354,0.866,0.9747,0.9469,0.922,1.0,0.8292,1.0,0.9592,0.879,0.9636,0.9177,0.922,0.9535,1.0,1.0,0.9459,0.9459,0.9374,0.9014,0.8165,0.9354,0.9636,0.9487,0.9428,0.9832,0.8944,0.8771,0.9129,0.8944,0.9354,0.9672,0.9075,0.922,0.9075,0.8321,0.9045,0.9075,0.866,1.0,0.9177,1.0,0.9682,1.0,0.9045,0.978,0.9393,0.9608,0.923,0.9014,0.9075,0.9354,0.9759,0.9199,0.922,0.9258,1.0,0.9701,0.9428,0.9535,0.9428,0.8944,0.9636,0.9354,0.9199,0.8452,0.9258,0.9381,0.8771,0.866,0.9459,0.9487,0.9075,0.9258,0.7906,0.9354,0.9325,0.9354,1.0,1.0,0.8321,0.9428,0.9393,0.9459,0.9258,0.9608,0.9747,0.9405,0.9608,0.9293,0.9459,0.8018,0.9661,0.9129,1.0,0.978,1.0,1.0,1.0,0.9487,0.9393,0.8367,0.8452,0.9293,0.9574,0.9129,0.9535,0.9325,0.9574,0.8885,0.8819,1.0,0.9354,0.9165,0.978,0.8944,0.9487,0.9733,0.7845,0.8885,0.8864,1.0,1.0,0.866,1.0,0.8452,0.9856,0.9574,0.8367,0.9014,0.9045,0.9449,0.8885,0.9309,0.8944,0.8402,0.9129,0.9075,0.9014,0.9381,0.9608,0.9393,0.9636,0.922,0.8898,0.8528,0.9258,0.7559,1.0,0.9129,0.8819,0.9574,0.9428,0.8944,0.9487,0.9661,0.9199,0.9075,0.9428,0.9574,0.9661,1.0,0.7385,0.9309,0.9608,0.8885,0.9129,1.0,0.9258,0.9733,1.0,0.9309,0.9428,1.0,0.9682,0.9258,0.9177,0.9512,1.0,0.9325,0.9789,0.9636,0.8528,0.8819,0.9608,0.9535,0.9718,0.9309,0.9075,0.9258,0.7906,0.9258,0.7977,0.9177,0.9747,1.0,0.866,0.9535,0.9574,0.9649,0.8944,0.8367,0.922,0.8367,1.0,0.9701,0.9258,0.8864,0.9555,0.9258,0.9393,0.866,0.8452,0.9045,0.8847,0.8819,0.9608,0.8944,0.8944,0.922,0.9574,0.9354,0.9487,0.9075,0.9045,0.9608,0.7454,0.9293,1.0,0.9045,0.9535,1.0,0.8563,0.9574,0.9309,1.0,0.9309,1.0,0.8864,0.9309,0.7559,0.9354,0.9354,0.8864,0.9661,0.9718,0.9258,0.8165,0.8864,0.9555,0.9487,1.0,0.9574,0.9258,1.0,1.0,0.9733,1.0,0.9574,1.0,0.9574,0.866,0.9258,0.9701,0.9393,0.8819,0.6742,1.0,0.9682,1.0,1.0,0.8944,1.0,1.0,0.866,0.9258,0.9428,0.9309,0.9258,0.7746,0.7746,0.7559,0.9459,1.0,1.0,0.9789,0.8819,1.0,0.9636,0.8819,0.8367,0.9512,0.9428,1.0,1.0,0.9608,0.9574,1.0,1.0,0.9129,0.8944,1.0,0.8452,0.9045,0.9682,1.0,0.9393,0.8044,1.0,0.8367,0.8864,0.8944,1.0,0.7977,1.0,1.0,0.9649,0.8864,0.9199,0.9258,0.8819,0.9555,0.8864,0.9393,0.9129,0.9045,0.9309,0.9487,0.8528,0.9258,0.9045,0.9636,0.8292,0.9682,0.866,0.8819,1.0,0.9354,0.8452,0.9459,0.9258,0.7977,0.9045,0.8165,0.8944,0.866,1.0,0.9393,0.9129,1.0,1.0,0.9309,0.9354,0.9258,1.0,0.9177,0.9487,0.9512,0.9661,0.8528,0.978,0.9512,1.0,0.8944,0.9199,1.0,0.9733,1.0,0.9428,0.9459,0.8819,0.9487,0.9177,1.0,0.9014,0.9512,0.9535,0.9636,0.9393,0.9129,0.9393,1.0,0.9045,0.9405,1.0,1.0,0.9177,0.9177,0.9661,0.9608,0.9258,0.9045,0.9428,0.977,0.9574,1.0,0.9636,0.8819,0.9574,1.0,1.0,0.9045,0.9459,0.8864,0.9129,0.9014,0.9608,0.9309,0.8771,1.0,0.7638,0.9608,0.9199,0.8745,1.0,0.9393,0.9487,0.9555,0.9682,0.9555,1.0,0.9428,0.9199,1.0,0.9354,0.9682,0.9649,0.9089,0.8944,0.977,0.9636,0.9661,1.0,0.9393,0.9574,0.9177,0.9487,0.9649,1.0,1.0,0.7071,0.9535,0.9405,0.9512,0.922,0.9309,0.9592,0.9354,0.9806,0.9199,0.9555,1.0,0.8819,0.9682,0.9258,0.9309,0.8321,1.0,0.9535,1.0,0.9089,0.9405,1.0,0.9759,1.0,0.9428,0.9354,1.0,0.9487,0.9789,0.8987,1.0,0.9672,1.0,0.9075,1.0,0.9177,0.9393,0.9459,0.9535,0.9459,1.0,0.7906,1.0,0.9258,0.9293,0.8997,0.9258,1.0,0.9718,1.0,0.8771,0.9608,1.0,0.9129,0.866,1.0,0.9459,0.9014,0.8452,0.9309,0.9293,0.9623,0.8528,0.9806,1.0,0.8321,0.9718,0.9747,0.8745,0.9325,0.9165,1.0,0.9592,0.9701,0.9177,0.8944,1.0,0.8745,1.0,0.8745,0.866,0.9636,0.9535,0.9428,0.8997,1.0,0.9129,0.9718,1.0,0.9258,0.9487,1.0,0.8563,1.0,0.9682,0.8402,0.9608,0.8944,0.9309,0.9661,0.9806,0.9813,0.8165,0.9309,1.0,0.9045,0.977,0.8563,0.9535,0.9393,0.9487,0.9608,0.9733,0.9129,0.8745,0.7977,0.8165,1.0,1.0,0.9574,0.9636,0.9354,0.9459,0.8771,0.8528,0.9592,0.9354,0.9747,0.9535,1.0,0.9608,0.9555,0.9592,0.9487,0.9354,0.9258,0.9574,0.8997,0.9623,0.8864,0.9045,0.9014,0.9129,0.9428,0.9661,0.9354,0.9608,1.0,0.922,0.9789,1.0,0.982,0.9354,1.0,1.0,0.9747,0.9258,0.923,1.0,0.9045,0.9014,1.0,1.0,0.9747,0.9487,0.9535,0.9682,0.922,0.9535,0.9623,0.922,0.9555,0.9535,1.0,1.0,0.9354,1.0,1.0,0.8885,0.9258,0.9747,0.9701,0.9129,0.9129,0.9325,0.9535,0.9487,0.922,0.9449
1,7755085,student,Somewhere else,N,[],"[Media/Content, Storytelling/Brand, Product Ma...","[management, music, product, app, polish, stor...",0.8563,0.0,0.9393,0.8528,1.0,0.8607,0.8898,0.7906,0.8944,0.8885,0.9608,0.9309,0.8944,0.8997,0.7977,0.9333,0.9535,0.8944,0.9354,0.9574,0.8272,0.9258,0.8771,0.8864,0.8528,0.8997,0.8819,0.8944,0.866,1.0,0.9258,0.8367,0.9393,0.8864,0.9014,0.9199,0.7071,0.9789,0.7746,0.8452,0.9535,1.0,0.8771,0.9354,0.9293,0.9354,0.9129,0.7638,0.8819,0.9075,0.8498,0.9014,0.8864,0.8906,0.9075,0.9592,0.7454,0.923,0.8885,0.7906,0.8528,0.8498,0.9733,0.8452,0.7977,0.9293,0.7559,0.871,0.7845,0.866,0.7817,0.8745,0.5976,0.9555,0.8402,0.866,0.8165,0.7638,0.922,0.9097,0.8367,0.922,0.9014,1.0,0.8944,0.8257,0.9258,0.8584,0.8062,0.8528,0.9747,0.9682,0.8272,0.8584,0.8528,0.866,0.866,0.866,0.8864,0.866,0.7817,0.9832,0.7746,0.7845,0.9129,0.9309,0.9354,0.9158,0.8044,0.922,0.8745,0.8771,0.879,0.8044,0.7906,0.9393,0.8272,0.8771,0.9682,1.0,0.7385,0.9325,0.8745,0.9199,0.8607,0.866,0.8402,0.866,0.9258,0.8549,0.8944,0.9258,0.8729,0.9075,1.0,0.8528,0.9428,0.8718,0.9636,0.8416,0.9608,0.8018,0.8452,0.8718,0.7338,0.9574,0.8584,0.9487,0.8745,0.9512,0.7906,0.9354,0.8847,0.8292,0.9487,0.9045,0.8321,0.8607,0.9393,0.8885,0.8452,0.8321,0.9487,0.9608,0.9199,0.7977,0.9177,0.8864,0.9661,0.9129,0.9258,0.978,0.8321,0.9813,1.0,0.9487,0.8402,0.7071,0.8018,0.8528,0.9354,0.9574,0.9045,0.9325,1.0,0.8272,0.6667,1.0,0.9129,0.8944,0.9325,0.8944,0.8944,0.9733,0.8771,0.8885,0.8864,0.9428,0.9636,0.866,0.9129,0.7071,0.9258,0.8898,0.8944,0.9014,0.8528,0.9449,0.8885,0.6831,0.8944,0.7276,0.8819,0.9075,0.866,0.8718,0.9405,0.8044,0.8864,0.8062,0.866,0.8528,0.9449,0.7559,0.9661,0.8498,0.8819,0.9129,0.8819,0.8367,0.8367,0.9309,0.8771,0.767,0.8498,0.8416,0.9309,0.9574,0.7977,0.7303,0.9199,0.8885,0.8165,0.9045,0.9512,0.9177,1.0,0.8563,0.8498,0.9045,0.9682,0.8864,0.8584,0.9512,0.866,0.8847,0.9574,0.8864,0.8528,0.8819,0.9199,0.9535,0.9129,0.9309,0.8402,0.8452,0.9014,0.8864,0.8528,0.9459,0.8944,1.0,0.9574,0.8528,0.866,0.9469,0.8165,0.7746,0.8367,0.8944,0.9393,0.9393,0.8018,0.8452,0.9325,0.8018,0.8044,0.866,0.9258,0.7977,0.8847,0.8165,0.8771,0.8367,0.8165,0.8944,0.9574,0.866,0.7071,0.8402,0.7977,0.8321,0.8165,0.9045,0.7638,0.9045,0.9045,0.8528,0.7746,0.866,0.8944,0.9129,0.8563,0.8367,0.9636,0.8563,0.8452,0.866,0.9014,0.9636,0.8944,0.8498,0.8452,0.866,0.8864,0.8847,0.8944,1.0,0.9574,0.8452,0.9075,0.9574,0.9459,0.9661,0.9574,0.8452,0.8165,0.866,0.8452,0.9393,0.9075,0.8165,0.8528,1.0,0.9354,0.9608,0.8745,0.8367,1.0,1.0,0.7906,0.9258,1.0,0.8944,0.8864,0.7071,0.8367,0.8452,0.9459,0.9258,0.9199,0.9789,0.8819,1.0,0.8864,0.8165,0.7071,0.8729,0.9129,0.9608,0.8864,0.8771,1.0,0.9535,0.8864,0.9129,0.8062,1.0,0.8452,0.7977,0.9354,0.8864,0.9393,0.8044,0.8819,0.8367,0.8452,0.7071,0.9574,0.9045,0.8771,0.9661,0.9649,0.9258,0.8771,0.8452,0.8165,0.8341,0.7559,0.9393,0.9129,0.9293,0.9309,0.8367,0.8528,0.9258,0.9045,0.9258,0.9014,0.866,0.9129,0.8165,0.9608,0.9014,0.8018,0.9177,0.8018,0.7385,0.7977,0.7071,0.8944,0.8292,0.7559,0.9393,0.866,0.9682,0.9354,0.9309,0.9354,0.8452,0.9535,0.8272,0.9487,0.8729,0.8944,0.7977,0.978,0.8452,1.0,0.8563,0.8321,0.9798,0.8272,0.9555,0.8819,0.9177,0.8165,0.9487,0.9177,0.8367,0.866,0.9258,0.9045,0.9258,0.8745,0.866,0.8745,0.8864,0.9045,0.8086,1.0,1.0,0.8584,0.8272,0.9309,0.9608,0.8452,0.7977,0.7454,0.9535,0.9574,0.8367,0.8864,0.9428,1.0,0.9393,0.9535,0.9535,0.8885,0.8452,0.9129,0.75,1.0,1.0,0.8321,0.9574,0.8165,0.8771,0.9199,0.8044,1.0,0.9075,0.8062,0.9325,0.9014,0.9089,0.9759,0.9129,0.9199,0.9393,0.866,0.9186,0.851,0.8847,0.8165,0.9045,0.9449,0.8944,0.9806,0.7276,0.9574,0.7609,0.8944,0.9469,0.8528,0.866,0.8452,0.8528,0.8987,0.8165,0.922,0.8165,0.9165,0.8292,0.9405,0.8771,0.9089,0.9661,0.7817,0.8292,0.7559,0.8563,0.7845,0.8885,0.8528,0.9258,0.8597,0.9199,0.9177,0.9258,1.0,0.8819,0.8898,1.0,0.9129,0.9574,0.8321,0.9199,0.9672,0.8864,0.7276,0.9199,0.8584,0.8745,0.8584,0.8528,0.8885,0.8528,0.7906,0.8771,0.8864,0.7977,0.8997,0.8018,1.0,0.9129,0.9258,0.8771,0.9199,0.9354,0.9129,0.866,0.9636,0.8584,0.866,0.7559,0.8944,0.879,0.923,0.8528,0.9806,0.9258,0.7845,0.8819,0.8367,0.8745,0.8597,0.9592,0.8944,0.8944,0.9701,0.9459,0.8944,0.9487,0.9075,0.9258,0.8044,0.7906,0.9258,0.9293,0.8498,0.8997,0.8367,0.8165,0.9428,0.8987,0.8729,1.0,0.978,0.9309,1.0,0.75,0.9075,0.8321,0.8563,1.0,0.8944,0.8771,0.923,0.9428,0.8944,0.8944,0.9045,0.9293,0.7746,0.879,0.9075,0.8367,0.8987,0.8885,0.866,0.9393,0.9045,0.6455,0.922,0.9682,0.9574,0.9636,0.9354,0.8584,0.7338,0.8528,0.9381,0.8416,0.922,0.9535,0.9129,0.8771,0.9089,0.9165,0.866,0.7906,0.7559,0.9129,0.8729,0.923,0.8018,0.9045,0.9014,0.7817,0.8819,0.9309,0.9354,0.9608,0.9459,0.8062,0.9574,0.9535,0.9636,0.866,0.9014,0.9487,0.866,0.8729,0.8389,0.7906,0.8528,0.9354,0.866,0.9129,0.9487,0.8944,0.8528,0.9682,0.866,0.9535,0.9428,0.8944,0.8597,0.8528,0.9759,0.9733,0.9014,0.8321,0.9574,0.8584,0.8452,0.8944,0.8745,0.8165,0.9129,0.9555,0.9293,0.8563,0.922,0.9636


# Part 7-B: Create the dataframes to be used for the Allocation algo

In [3]:
#create the 3 dataframes which will be used for the allocation algo
def create_dfs_matching_algo(df_distances, wave):
    df_distances_stud = df_distances[df_distances['userType'] == 'student'].copy()
    df_distances_prof = df_distances[df_distances['userType'] == 'professional'].copy()
    
    df_matches = pd.DataFrame({'stud_id':[x for x in df_distances_stud['nyc_id']], 'stud_vip':'string',
                               'stud_past_advisors':'string', 'prof_id':'string', 'prof_vip':'string',
                               'location':'string', 'wave':str(wave), 'match_score':0.0001, 'stud_loc':'string', 
                               'prof_loc':'string', 'stud_topics':'string', 'prof_topics':'string',
                               'matched_topics':'string', 'stud_keywords':'string', 'prof_keywords':'string', 
                               'matched_words':'string'})
    
    df_matches = df_matches[['stud_id', 'stud_vip', 'stud_past_advisors', 'prof_id', 'prof_vip', 'location', 'wave', 
                             'match_score', 'stud_loc', 'prof_loc', 'stud_topics', 'prof_topics', 'matched_topics',
                             'stud_keywords', 'prof_keywords', 'matched_words']]
    
    return df_distances_stud, df_distances_prof, df_matches

# This is where you set the wave name/number in string format

In [4]:
df_distances_stud, df_distances_prof, df_matches = create_dfs_matching_algo(df_distances=df_distances, 
                                                                            wave='test_wave')

In [5]:
#check student dataframe
df_distances_stud.head(2)

Unnamed: 0,nyc_id,userType,location,is_vip,past_advisors,topics_all,str_combined,8843543,7755085,9202605,7666762,432418,3877105,1020980,7896731,6961087,6402058,2290089,1148025,4999309,4369680,382570,284637,8187296,8476629,7306029,5652660,204780,5677744,2593443,4601513,4878277,2620815,3288117,5708352,5474030,4046542,4340193,7474138,3712338,352403,2348870,8526469,7743463,6459524,9929858,9880082,1518524,4788367,8542359,997987,4629976,5064585,4755310,7209834,606005,5017981,5247078,8902800,9134615,7048387,6751671,1348913,6937862,8064126,8436578,8910889,3230467,3140161,3667774,4192795,8301349,8033043,5166603,7485700,9540732,4958919,8561114,3122679,8238975,2562280,7987549,6414426,4101601,1983482,878483,6159047,1557244,6113184,2149492,796123,6672031,4933667,625916,3179761,4914479,2956628,6234328,4558259,6966024,6742971,1497213,3569615,1187639,3758768,7681692,3167915,7792027,9462300,1453483,2598156,477453,3771144,813333,8841456,7679971,3116521,4601941,6306797,3195584,2627748,2169538,4999579,2235203,6437056,2308071,2368457,1390411,7694741,3521250,3206074,301431,2669767,7518538,2256585,94522,6955519,9959606,2037455,2526473,8359351,269209,2825496,3511047,1614397,2659801,4410236,515652,4818902,8219730,7918230,5108683,5238740,7413581,6286127,9113243,3805339,2179494,6059897,9998462,3679840,206547,6117465,9957593,6807435,1739284,9178381,7257843,2109474,8699149,8384707,2360585,3965641,9076155,9318559,6119200,7333768,8697016,7023001,757721,537278,7468074,8175409,8788183,6446789,1857735,7650470,6495154,2611714,710683,152075,2198911,6788313,9853788,9808460,6008056,5240141,8907470,9037441,8912858,6581843,9486722,1110483,4358079,9356485,2271359,7915205,5800707,2794321,2427696,6535431,5402293,8693468,4236687,427550,7888951,2826824,6928588,2527975,5493738,7575883,1597931,1575847,1781688,372231,2282239,4319835,4920915,1484378,2299050,4846292,4336956,5786211,7893434,3895624,2859992,7198024,3423783,6675599,9576956,6126279,5686548,1825469,9700647,2868050,4702672,1146067,4015930,4401457,9546947,94925,1190509,2932188,3266779,8907446,7569840,1746835,3382302,806560,9889124,8560400,6993819,1717171,1758129,3648059,654011,2654860,4409144,345140,7904109,656623,4360626,2373514,6037605,4915718,7945608,381348,9613380,1043372,2004698,8364846,8281303,1555515,3062857,6142470,6025522,2631274,8012883,9664092,7293490,4295994,8811878,4278733,6294925,4723827,8619826,2292042,3960526,5995636,3003753,6696486,6342552,1444202,4885804,4715334,8225142,8653095,1977774,5020397,6314809,6749269,2001782,2569801,4392229,6138397,4095341,2326281,2674601,6272321,7487443,8810505,1085840,8753052,7423155,8633724,6815031,710220,3874565,8985652,7158129,8733709,5064615,2031736,529104,3088834,184099,7680557,8061648,2326949,5205644,6512033,2981141,7601225,822710,2603797,8087478,9938106,3361150,4456308,7594522,9469995,6045830,746096,8259152,422067,7423610,9556430,4214457,3490399,6458334,8846614,6545230,2926356,311409,7324067,5769802,9258719,638772,5485758,6160406,1888196,742264,93981,7958106,8190906,1901532,8632938,2898324,6948891,3509690,7465231,7881557,2320522,1171955,5343640,6617302,2262694,3340724,4604046,2130677,7258728,5023302,7115544,629294,3906792,7841905,4316201,5353925,5978678,4605980,4715885,4923495,125254,2270136,3046384,1905636,1533613,1574686,9973649,436488,2513052,1372385,5791689,7936400,9391125,9773408,5810449,5311075,3092112,1106591,3684523,5182726,7926790,5175387,9389960,428794,3967508,1220914,4345907,5884571,6290417,27779,1579045,1446459,9413760,7110487,2663308,3658078,5880932,8294767,280351,5694419,9224675,7210977,2340455,2452387,2012792,3427395,4187508,4304558,9471033,7748650,385275,9498286,5624053,4686614,8079379,8195703,7828150,1550130,8969696,3032368,1325678,2588879,3703222,6849726,7432662,6778076,6618920,6004942,7496115,6545500,7517791,2799767,5682257,8236719,9658179,7031751,4891992,1963257,8813293,4255011,9835021,5997546,6025427,3398687,1041157,9684715,8769575,5346125,2258308,8923238,7967766,9249359,704895,2185414,9382336,3018819,916644,4119888,8835562,3008395,2212759,9969727,8949839,8265769,4404402,1443873,9990971,9593316,7950567,3828722,7601152,2991448,6512271,2903590,2659610,4077631,2493337,264160,8577110,5766040,2146133,6493111,7638660,7258247,7657509,9932727,7245777,692601,2856925,2352193,2815849,3047315,9642528,3861671,9838132,4075604,4075518,7591652,7908743,9012509,5757510,1005387,4768438,918358,7524608,9934813,6541993,1449114,606407,7931432,4735198,5695334,6550439,8445244,746687,8725761,7670905,701771,9875733,3379787,1842470,983794,8658583,9697676,1468575,9853865,1784635,760211,4667134,9116597,287256,1714194,6791590,2773262,8461802,3760451,4973934,8971747,6287556,6272170,1619149,1256384,7606837,3004297,4514606,2609078,6156693,5206440,4018457,8477772,5212929,5849482,655806,8183052,2880921,506870,3328854,8777873,5357257,1396492,5678088,3589593,7607153,4588539,2951891,4200010,4648718,2696094,4989097,2169312,7658012,5070048,792212,3557154,9648543,8347381,1785441,2330592,4282915,7280759,7414746,1771221,2555756,6935947,4521023,6514761,3355184,9660873,6088914,8233640,8026321,6931189,7830992,245037,2679141,3057273,5682172,5529070,2228699,6922565,7120601,2900118,8633469,5950948,6523262,8363241,3431960,7594141,7772712,8591435,1665700,1444831,4133681,2906883,8060776,5308660,5432977,1065233,8700849,7695512,5458104,7084778,3717562,3021601,6686909,1320940,4453336,1667655,3595011,483768,7050318,6100956,2970600,3929247,3571695,8217139,6075595,9546179,8420826,7184217,5151812,9785062,3465395,5338447,6707682,8387107,746643,1232766,9732127,2160805,646836,7258638,9079878,8199153,5175701,4128637,9455964,4989182,3945611,9332538,1495788,1246537,1788831,2749455,4306743,5137921,9707252,7076461,1359054,3977768,1008130
1,7755085,student,Somewhere else,N,[],"[Media/Content, Storytelling/Brand, Product Ma...","[management, music, product, app, polish, stor...",0.8563,0.0,0.9393,0.8528,1.0,0.8607,0.8898,0.7906,0.8944,0.8885,0.9608,0.9309,0.8944,0.8997,0.7977,0.9333,0.9535,0.8944,0.9354,0.9574,0.8272,0.9258,0.8771,0.8864,0.8528,0.8997,0.8819,0.8944,0.866,1.0,0.9258,0.8367,0.9393,0.8864,0.9014,0.9199,0.7071,0.9789,0.7746,0.8452,0.9535,1.0,0.8771,0.9354,0.9293,0.9354,0.9129,0.7638,0.8819,0.9075,0.8498,0.9014,0.8864,0.8906,0.9075,0.9592,0.7454,0.923,0.8885,0.7906,0.8528,0.8498,0.9733,0.8452,0.7977,0.9293,0.7559,0.871,0.7845,0.866,0.7817,0.8745,0.5976,0.9555,0.8402,0.866,0.8165,0.7638,0.922,0.9097,0.8367,0.922,0.9014,1.0,0.8944,0.8257,0.9258,0.8584,0.8062,0.8528,0.9747,0.9682,0.8272,0.8584,0.8528,0.866,0.866,0.866,0.8864,0.866,0.7817,0.9832,0.7746,0.7845,0.9129,0.9309,0.9354,0.9158,0.8044,0.922,0.8745,0.8771,0.879,0.8044,0.7906,0.9393,0.8272,0.8771,0.9682,1.0,0.7385,0.9325,0.8745,0.9199,0.8607,0.866,0.8402,0.866,0.9258,0.8549,0.8944,0.9258,0.8729,0.9075,1.0,0.8528,0.9428,0.8718,0.9636,0.8416,0.9608,0.8018,0.8452,0.8718,0.7338,0.9574,0.8584,0.9487,0.8745,0.9512,0.7906,0.9354,0.8847,0.8292,0.9487,0.9045,0.8321,0.8607,0.9393,0.8885,0.8452,0.8321,0.9487,0.9608,0.9199,0.7977,0.9177,0.8864,0.9661,0.9129,0.9258,0.978,0.8321,0.9813,1.0,0.9487,0.8402,0.7071,0.8018,0.8528,0.9354,0.9574,0.9045,0.9325,1.0,0.8272,0.6667,1.0,0.9129,0.8944,0.9325,0.8944,0.8944,0.9733,0.8771,0.8885,0.8864,0.9428,0.9636,0.866,0.9129,0.7071,0.9258,0.8898,0.8944,0.9014,0.8528,0.9449,0.8885,0.6831,0.8944,0.7276,0.8819,0.9075,0.866,0.8718,0.9405,0.8044,0.8864,0.8062,0.866,0.8528,0.9449,0.7559,0.9661,0.8498,0.8819,0.9129,0.8819,0.8367,0.8367,0.9309,0.8771,0.767,0.8498,0.8416,0.9309,0.9574,0.7977,0.7303,0.9199,0.8885,0.8165,0.9045,0.9512,0.9177,1.0,0.8563,0.8498,0.9045,0.9682,0.8864,0.8584,0.9512,0.866,0.8847,0.9574,0.8864,0.8528,0.8819,0.9199,0.9535,0.9129,0.9309,0.8402,0.8452,0.9014,0.8864,0.8528,0.9459,0.8944,1.0,0.9574,0.8528,0.866,0.9469,0.8165,0.7746,0.8367,0.8944,0.9393,0.9393,0.8018,0.8452,0.9325,0.8018,0.8044,0.866,0.9258,0.7977,0.8847,0.8165,0.8771,0.8367,0.8165,0.8944,0.9574,0.866,0.7071,0.8402,0.7977,0.8321,0.8165,0.9045,0.7638,0.9045,0.9045,0.8528,0.7746,0.866,0.8944,0.9129,0.8563,0.8367,0.9636,0.8563,0.8452,0.866,0.9014,0.9636,0.8944,0.8498,0.8452,0.866,0.8864,0.8847,0.8944,1.0,0.9574,0.8452,0.9075,0.9574,0.9459,0.9661,0.9574,0.8452,0.8165,0.866,0.8452,0.9393,0.9075,0.8165,0.8528,1.0,0.9354,0.9608,0.8745,0.8367,1.0,1.0,0.7906,0.9258,1.0,0.8944,0.8864,0.7071,0.8367,0.8452,0.9459,0.9258,0.9199,0.9789,0.8819,1.0,0.8864,0.8165,0.7071,0.8729,0.9129,0.9608,0.8864,0.8771,1.0,0.9535,0.8864,0.9129,0.8062,1.0,0.8452,0.7977,0.9354,0.8864,0.9393,0.8044,0.8819,0.8367,0.8452,0.7071,0.9574,0.9045,0.8771,0.9661,0.9649,0.9258,0.8771,0.8452,0.8165,0.8341,0.7559,0.9393,0.9129,0.9293,0.9309,0.8367,0.8528,0.9258,0.9045,0.9258,0.9014,0.866,0.9129,0.8165,0.9608,0.9014,0.8018,0.9177,0.8018,0.7385,0.7977,0.7071,0.8944,0.8292,0.7559,0.9393,0.866,0.9682,0.9354,0.9309,0.9354,0.8452,0.9535,0.8272,0.9487,0.8729,0.8944,0.7977,0.978,0.8452,1.0,0.8563,0.8321,0.9798,0.8272,0.9555,0.8819,0.9177,0.8165,0.9487,0.9177,0.8367,0.866,0.9258,0.9045,0.9258,0.8745,0.866,0.8745,0.8864,0.9045,0.8086,1.0,1.0,0.8584,0.8272,0.9309,0.9608,0.8452,0.7977,0.7454,0.9535,0.9574,0.8367,0.8864,0.9428,1.0,0.9393,0.9535,0.9535,0.8885,0.8452,0.9129,0.75,1.0,1.0,0.8321,0.9574,0.8165,0.8771,0.9199,0.8044,1.0,0.9075,0.8062,0.9325,0.9014,0.9089,0.9759,0.9129,0.9199,0.9393,0.866,0.9186,0.851,0.8847,0.8165,0.9045,0.9449,0.8944,0.9806,0.7276,0.9574,0.7609,0.8944,0.9469,0.8528,0.866,0.8452,0.8528,0.8987,0.8165,0.922,0.8165,0.9165,0.8292,0.9405,0.8771,0.9089,0.9661,0.7817,0.8292,0.7559,0.8563,0.7845,0.8885,0.8528,0.9258,0.8597,0.9199,0.9177,0.9258,1.0,0.8819,0.8898,1.0,0.9129,0.9574,0.8321,0.9199,0.9672,0.8864,0.7276,0.9199,0.8584,0.8745,0.8584,0.8528,0.8885,0.8528,0.7906,0.8771,0.8864,0.7977,0.8997,0.8018,1.0,0.9129,0.9258,0.8771,0.9199,0.9354,0.9129,0.866,0.9636,0.8584,0.866,0.7559,0.8944,0.879,0.923,0.8528,0.9806,0.9258,0.7845,0.8819,0.8367,0.8745,0.8597,0.9592,0.8944,0.8944,0.9701,0.9459,0.8944,0.9487,0.9075,0.9258,0.8044,0.7906,0.9258,0.9293,0.8498,0.8997,0.8367,0.8165,0.9428,0.8987,0.8729,1.0,0.978,0.9309,1.0,0.75,0.9075,0.8321,0.8563,1.0,0.8944,0.8771,0.923,0.9428,0.8944,0.8944,0.9045,0.9293,0.7746,0.879,0.9075,0.8367,0.8987,0.8885,0.866,0.9393,0.9045,0.6455,0.922,0.9682,0.9574,0.9636,0.9354,0.8584,0.7338,0.8528,0.9381,0.8416,0.922,0.9535,0.9129,0.8771,0.9089,0.9165,0.866,0.7906,0.7559,0.9129,0.8729,0.923,0.8018,0.9045,0.9014,0.7817,0.8819,0.9309,0.9354,0.9608,0.9459,0.8062,0.9574,0.9535,0.9636,0.866,0.9014,0.9487,0.866,0.8729,0.8389,0.7906,0.8528,0.9354,0.866,0.9129,0.9487,0.8944,0.8528,0.9682,0.866,0.9535,0.9428,0.8944,0.8597,0.8528,0.9759,0.9733,0.9014,0.8321,0.9574,0.8584,0.8452,0.8944,0.8745,0.8165,0.9129,0.9555,0.9293,0.8563,0.922,0.9636
5,3877105,student,New York City,N,[],"[Business Operations, Engineering - Mobile, En...","[engineering, app, network, reading, anatomy, ...",0.9309,0.8257,0.9393,0.8528,0.9636,0.0,0.7906,0.866,0.7746,0.8584,0.8771,0.8944,0.9487,0.8165,0.7977,0.9672,0.7687,0.8062,0.866,1.0,0.8584,0.9258,0.7338,0.8452,0.8528,0.7559,1.0,0.7071,0.7906,0.8452,0.9258,0.8367,0.8745,0.7559,0.9682,0.7845,0.7559,0.9354,0.8165,0.7237,0.9535,0.8864,0.8321,0.9354,0.9293,0.75,0.9354,0.9129,1.0,0.9075,0.9428,0.866,0.7559,0.9097,0.8044,0.9798,0.8819,0.7698,0.8885,0.9014,0.9535,0.8165,0.8885,0.6901,0.9535,0.977,0.7559,0.8305,0.9608,0.9014,0.6667,0.7276,0.8018,0.978,0.686,0.9354,0.8165,0.866,0.8944,0.7878,0.7746,0.9747,0.9014,0.9129,0.7746,0.9293,0.9636,0.6882,0.8062,0.8528,0.9487,0.75,0.7609,0.7609,0.8876,0.7071,0.9574,0.9354,0.9636,0.7746,0.8498,0.8944,0.8944,0.7845,0.9428,0.8944,0.9354,0.9672,0.7276,0.9487,0.9075,0.9608,0.8528,0.6417,0.866,0.686,0.7609,0.8771,0.9014,0.8165,0.7977,0.8341,0.8745,0.8549,0.8165,0.7906,0.9075,0.7906,0.7559,0.8549,0.7746,0.7559,0.9512,0.8745,0.7454,0.7977,0.8498,0.8,0.8018,0.736,0.7338,0.9258,0.8018,0.7746,0.9608,0.7638,0.7255,0.7071,0.9075,0.8452,0.9354,0.8416,0.9089,0.7906,0.7746,0.9535,0.8321,0.8165,0.8745,0.8584,0.866,0.7845,0.922,0.8771,0.8549,0.7687,0.8885,0.7559,0.8563,0.7638,0.8452,0.8847,0.8771,0.9027,0.8898,0.8367,0.8745,0.8944,0.8452,0.7385,0.9354,0.7638,0.7977,0.7802,0.9129,0.9459,0.8165,0.9428,0.9129,0.8944,0.9325,0.9661,0.9487,0.8584,0.8771,0.7947,0.9636,0.8165,0.8864,0.7638,0.9574,0.7071,0.8944,0.9574,0.8367,0.75,0.7687,0.8452,0.7255,0.6831,0.9487,0.767,0.7071,0.9393,0.7071,0.8485,0.7596,0.8402,0.7071,0.8367,0.7906,0.8528,0.8452,0.9258,0.8165,0.7817,0.9129,0.866,0.8165,0.7071,0.8062,0.9309,0.7338,0.8402,0.9428,0.7638,0.7303,0.866,0.9045,0.8563,0.8321,0.7255,0.8165,0.7385,0.8729,0.8584,0.8367,0.8944,0.9129,0.7977,0.7906,0.8018,0.6882,0.8165,0.7638,0.7802,0.8165,0.8452,0.9045,0.6667,0.7845,0.7385,0.9428,0.7303,0.9075,0.9258,0.9014,0.8864,0.7385,0.7609,0.8944,0.9661,0.7638,0.9535,0.8165,0.8906,0.8563,0.8367,0.7071,0.8367,0.9393,0.8402,0.8452,0.8018,0.8847,0.9258,0.7276,0.7906,0.8452,0.7687,0.8847,0.7071,0.7845,0.7071,0.7746,0.5916,0.866,0.9354,0.8367,0.8044,0.9045,0.7845,0.8819,0.8257,0.866,0.8528,0.8528,0.7977,0.9309,0.9574,0.8165,0.8498,0.9661,0.7071,0.9636,0.8165,0.8452,0.7071,0.9682,0.9636,0.8563,0.9428,0.8452,0.7071,0.8018,0.8076,0.8756,0.9354,0.866,0.8864,0.9075,0.9129,0.9459,0.6831,0.8165,0.8452,0.866,0.7071,0.8864,0.8745,0.767,1.0,0.8528,0.9487,0.9354,0.8321,0.8745,0.7746,0.7746,0.8165,1.0,0.8018,0.8819,0.9661,0.8018,0.7746,0.7071,0.9258,0.7947,0.7559,0.9199,0.866,0.6667,0.8745,0.8018,0.7817,0.8944,0.8452,0.8165,0.8321,0.8864,0.9199,0.8165,0.8528,0.7559,0.9718,0.922,0.7906,0.8997,0.9045,0.9014,0.8452,0.8402,0.8402,0.8498,0.7746,0.7559,0.8944,0.8165,0.9535,0.8771,0.8563,0.871,1.0,0.8321,0.8864,0.8819,0.7518,0.8452,0.767,0.8165,0.9293,0.8165,0.8367,0.9535,0.8452,0.8528,0.7559,0.866,0.9354,0.7071,0.6667,0.7845,0.8292,0.8864,0.9733,0.7559,0.7977,0.8528,0.866,0.8367,0.8292,0.9258,0.8044,0.866,0.866,0.7906,0.8944,0.866,0.8018,0.9045,0.7255,0.8944,0.6901,0.7303,0.9535,0.8847,0.7237,0.8944,0.7303,0.7338,0.8,0.8272,0.9089,0.8607,0.9733,0.7071,0.9487,0.9733,0.7746,0.75,0.8997,0.9045,0.8864,0.8402,0.866,0.8402,0.8452,0.7977,0.8771,0.8018,0.8165,0.9459,0.7255,0.8563,0.7845,0.8018,0.8528,0.9428,0.8528,0.866,0.8944,0.9636,0.8819,0.866,0.7276,0.7385,0.977,0.9177,0.8018,0.8165,0.7906,0.9608,0.8563,0.8321,0.7638,0.7638,0.7338,0.8771,0.8402,0.8563,0.9075,0.7071,0.8597,0.8292,0.8847,0.8165,0.9129,0.8321,0.686,0.866,0.8478,0.8305,0.7223,0.6325,0.7977,0.8452,0.9661,0.9199,0.8745,0.9129,0.6882,0.866,0.8906,0.7385,0.9574,0.9258,0.879,0.8771,0.6547,0.866,0.5774,0.9165,0.6614,0.8086,0.7338,0.7223,0.8563,0.8165,0.866,0.8018,0.8944,0.7338,0.8885,0.9045,0.8165,0.8847,0.8086,0.7255,0.9258,0.9045,0.8498,0.9129,0.8729,0.8756,0.8165,0.8321,0.9199,0.8614,0.9258,0.7276,0.7338,0.8584,0.7276,0.8272,0.9535,0.8584,0.9535,0.866,0.7338,0.9063,0.7385,0.8997,0.6547,0.8944,0.9129,0.9258,0.9608,0.7845,0.8292,0.866,0.9354,0.7071,0.9459,0.6614,0.5345,0.9309,0.8257,0.8819,0.8528,0.8321,0.8018,0.8321,0.9129,0.9487,0.686,0.6255,0.8246,0.9487,0.8718,0.8745,0.7947,0.8944,0.8944,0.9393,0.8864,0.8745,0.9354,0.8452,0.8528,0.8819,0.8729,0.8367,0.6455,0.7817,0.9405,0.7559,0.7746,0.9325,0.9309,0.7977,0.8292,0.8745,0.8321,0.8563,0.8563,0.8563,0.9608,0.8607,0.6667,0.5164,0.9487,0.8257,0.977,0.6831,0.7977,0.8402,0.8367,0.8549,0.8584,0.866,0.8745,0.7977,0.7071,0.922,0.9014,0.8165,0.9258,0.9014,0.6882,0.5547,0.8528,0.9381,0.8898,0.8367,0.9535,0.8165,0.6202,0.7223,0.9165,0.922,0.75,0.7071,0.8165,0.7868,0.9428,0.8018,0.8528,0.9014,0.7817,0.9428,0.8165,0.866,0.9199,0.8584,0.6708,0.866,0.9535,0.8452,0.736,0.7906,0.7746,0.7416,0.7868,0.8607,0.8292,0.879,0.8292,0.7071,0.7638,0.8062,0.7746,0.7687,0.8292,0.7746,0.9211,0.7698,0.7416,0.8847,0.9045,0.8997,0.8584,0.866,0.8771,0.8416,0.607,0.7559,0.866,0.8044,0.8165,0.7638,0.7802,0.879,0.8165,0.7746,0.866


In [6]:
#check professional dataframe
df_distances_prof.head(2)

Unnamed: 0,nyc_id,userType,location,is_vip,past_advisors,topics_all,str_combined,8843543,7755085,9202605,7666762,432418,3877105,1020980,7896731,6961087,6402058,2290089,1148025,4999309,4369680,382570,284637,8187296,8476629,7306029,5652660,204780,5677744,2593443,4601513,4878277,2620815,3288117,5708352,5474030,4046542,4340193,7474138,3712338,352403,2348870,8526469,7743463,6459524,9929858,9880082,1518524,4788367,8542359,997987,4629976,5064585,4755310,7209834,606005,5017981,5247078,8902800,9134615,7048387,6751671,1348913,6937862,8064126,8436578,8910889,3230467,3140161,3667774,4192795,8301349,8033043,5166603,7485700,9540732,4958919,8561114,3122679,8238975,2562280,7987549,6414426,4101601,1983482,878483,6159047,1557244,6113184,2149492,796123,6672031,4933667,625916,3179761,4914479,2956628,6234328,4558259,6966024,6742971,1497213,3569615,1187639,3758768,7681692,3167915,7792027,9462300,1453483,2598156,477453,3771144,813333,8841456,7679971,3116521,4601941,6306797,3195584,2627748,2169538,4999579,2235203,6437056,2308071,2368457,1390411,7694741,3521250,3206074,301431,2669767,7518538,2256585,94522,6955519,9959606,2037455,2526473,8359351,269209,2825496,3511047,1614397,2659801,4410236,515652,4818902,8219730,7918230,5108683,5238740,7413581,6286127,9113243,3805339,2179494,6059897,9998462,3679840,206547,6117465,9957593,6807435,1739284,9178381,7257843,2109474,8699149,8384707,2360585,3965641,9076155,9318559,6119200,7333768,8697016,7023001,757721,537278,7468074,8175409,8788183,6446789,1857735,7650470,6495154,2611714,710683,152075,2198911,6788313,9853788,9808460,6008056,5240141,8907470,9037441,8912858,6581843,9486722,1110483,4358079,9356485,2271359,7915205,5800707,2794321,2427696,6535431,5402293,8693468,4236687,427550,7888951,2826824,6928588,2527975,5493738,7575883,1597931,1575847,1781688,372231,2282239,4319835,4920915,1484378,2299050,4846292,4336956,5786211,7893434,3895624,2859992,7198024,3423783,6675599,9576956,6126279,5686548,1825469,9700647,2868050,4702672,1146067,4015930,4401457,9546947,94925,1190509,2932188,3266779,8907446,7569840,1746835,3382302,806560,9889124,8560400,6993819,1717171,1758129,3648059,654011,2654860,4409144,345140,7904109,656623,4360626,2373514,6037605,4915718,7945608,381348,9613380,1043372,2004698,8364846,8281303,1555515,3062857,6142470,6025522,2631274,8012883,9664092,7293490,4295994,8811878,4278733,6294925,4723827,8619826,2292042,3960526,5995636,3003753,6696486,6342552,1444202,4885804,4715334,8225142,8653095,1977774,5020397,6314809,6749269,2001782,2569801,4392229,6138397,4095341,2326281,2674601,6272321,7487443,8810505,1085840,8753052,7423155,8633724,6815031,710220,3874565,8985652,7158129,8733709,5064615,2031736,529104,3088834,184099,7680557,8061648,2326949,5205644,6512033,2981141,7601225,822710,2603797,8087478,9938106,3361150,4456308,7594522,9469995,6045830,746096,8259152,422067,7423610,9556430,4214457,3490399,6458334,8846614,6545230,2926356,311409,7324067,5769802,9258719,638772,5485758,6160406,1888196,742264,93981,7958106,8190906,1901532,8632938,2898324,6948891,3509690,7465231,7881557,2320522,1171955,5343640,6617302,2262694,3340724,4604046,2130677,7258728,5023302,7115544,629294,3906792,7841905,4316201,5353925,5978678,4605980,4715885,4923495,125254,2270136,3046384,1905636,1533613,1574686,9973649,436488,2513052,1372385,5791689,7936400,9391125,9773408,5810449,5311075,3092112,1106591,3684523,5182726,7926790,5175387,9389960,428794,3967508,1220914,4345907,5884571,6290417,27779,1579045,1446459,9413760,7110487,2663308,3658078,5880932,8294767,280351,5694419,9224675,7210977,2340455,2452387,2012792,3427395,4187508,4304558,9471033,7748650,385275,9498286,5624053,4686614,8079379,8195703,7828150,1550130,8969696,3032368,1325678,2588879,3703222,6849726,7432662,6778076,6618920,6004942,7496115,6545500,7517791,2799767,5682257,8236719,9658179,7031751,4891992,1963257,8813293,4255011,9835021,5997546,6025427,3398687,1041157,9684715,8769575,5346125,2258308,8923238,7967766,9249359,704895,2185414,9382336,3018819,916644,4119888,8835562,3008395,2212759,9969727,8949839,8265769,4404402,1443873,9990971,9593316,7950567,3828722,7601152,2991448,6512271,2903590,2659610,4077631,2493337,264160,8577110,5766040,2146133,6493111,7638660,7258247,7657509,9932727,7245777,692601,2856925,2352193,2815849,3047315,9642528,3861671,9838132,4075604,4075518,7591652,7908743,9012509,5757510,1005387,4768438,918358,7524608,9934813,6541993,1449114,606407,7931432,4735198,5695334,6550439,8445244,746687,8725761,7670905,701771,9875733,3379787,1842470,983794,8658583,9697676,1468575,9853865,1784635,760211,4667134,9116597,287256,1714194,6791590,2773262,8461802,3760451,4973934,8971747,6287556,6272170,1619149,1256384,7606837,3004297,4514606,2609078,6156693,5206440,4018457,8477772,5212929,5849482,655806,8183052,2880921,506870,3328854,8777873,5357257,1396492,5678088,3589593,7607153,4588539,2951891,4200010,4648718,2696094,4989097,2169312,7658012,5070048,792212,3557154,9648543,8347381,1785441,2330592,4282915,7280759,7414746,1771221,2555756,6935947,4521023,6514761,3355184,9660873,6088914,8233640,8026321,6931189,7830992,245037,2679141,3057273,5682172,5529070,2228699,6922565,7120601,2900118,8633469,5950948,6523262,8363241,3431960,7594141,7772712,8591435,1665700,1444831,4133681,2906883,8060776,5308660,5432977,1065233,8700849,7695512,5458104,7084778,3717562,3021601,6686909,1320940,4453336,1667655,3595011,483768,7050318,6100956,2970600,3929247,3571695,8217139,6075595,9546179,8420826,7184217,5151812,9785062,3465395,5338447,6707682,8387107,746643,1232766,9732127,2160805,646836,7258638,9079878,8199153,5175701,4128637,9455964,4989182,3945611,9332538,1495788,1246537,1788831,2749455,4306743,5137921,9707252,7076461,1359054,3977768,1008130
0,8843543,professional,New York City,Y,[],"[UX/UI, Storytelling/Brand, Media/Content]","[app, hand, storytelling/brand, mobile, red, w...",0.0,0.9045,0.9393,0.9045,1.0,0.9623,0.9354,0.866,0.8944,0.8584,0.9608,0.9661,1.0,0.9258,0.9045,1.0,0.977,0.922,0.9014,0.9574,0.8885,0.9636,0.9199,0.9258,0.9045,0.9512,0.9428,0.8944,0.7071,1.0,0.9258,0.8944,1.0,0.9258,0.9354,0.9199,0.8864,0.9789,0.8944,0.8997,1.0,0.9636,0.9199,0.9789,1.0,0.9354,0.9789,0.9129,0.8819,0.9701,0.8498,0.9014,0.9258,0.9285,1.0,0.9592,0.9428,0.9428,1.0,0.8292,0.8528,0.8498,0.9733,0.9258,0.9045,0.9293,0.8864,0.9285,0.9608,0.9354,0.9428,0.9393,0.8452,0.978,0.9075,0.9682,0.9354,0.866,0.9747,0.9469,0.922,1.0,0.8292,1.0,0.9592,0.879,0.9636,0.9177,0.922,0.9535,1.0,1.0,0.9459,0.9459,0.9374,0.9014,0.8165,0.9354,0.9636,0.9487,0.9428,0.9832,0.8944,0.8771,0.9129,0.8944,0.9354,0.9672,0.9075,0.922,0.9075,0.8321,0.9045,0.9075,0.866,1.0,0.9177,1.0,0.9682,1.0,0.9045,0.978,0.9393,0.9608,0.923,0.9014,0.9075,0.9354,0.9759,0.9199,0.922,0.9258,1.0,0.9701,0.9428,0.9535,0.9428,0.8944,0.9636,0.9354,0.9199,0.8452,0.9258,0.9381,0.8771,0.866,0.9459,0.9487,0.9075,0.9258,0.7906,0.9354,0.9325,0.9354,1.0,1.0,0.8321,0.9428,0.9393,0.9459,0.9258,0.9608,0.9747,0.9405,0.9608,0.9293,0.9459,0.8018,0.9661,0.9129,1.0,0.978,1.0,1.0,1.0,0.9487,0.9393,0.8367,0.8452,0.9293,0.9574,0.9129,0.9535,0.9325,0.9574,0.8885,0.8819,1.0,0.9354,0.9165,0.978,0.8944,0.9487,0.9733,0.7845,0.8885,0.8864,1.0,1.0,0.866,1.0,0.8452,0.9856,0.9574,0.8367,0.9014,0.9045,0.9449,0.8885,0.9309,0.8944,0.8402,0.9129,0.9075,0.9014,0.9381,0.9608,0.9393,0.9636,0.922,0.8898,0.8528,0.9258,0.7559,1.0,0.9129,0.8819,0.9574,0.9428,0.8944,0.9487,0.9661,0.9199,0.9075,0.9428,0.9574,0.9661,1.0,0.7385,0.9309,0.9608,0.8885,0.9129,1.0,0.9258,0.9733,1.0,0.9309,0.9428,1.0,0.9682,0.9258,0.9177,0.9512,1.0,0.9325,0.9789,0.9636,0.8528,0.8819,0.9608,0.9535,0.9718,0.9309,0.9075,0.9258,0.7906,0.9258,0.7977,0.9177,0.9747,1.0,0.866,0.9535,0.9574,0.9649,0.8944,0.8367,0.922,0.8367,1.0,0.9701,0.9258,0.8864,0.9555,0.9258,0.9393,0.866,0.8452,0.9045,0.8847,0.8819,0.9608,0.8944,0.8944,0.922,0.9574,0.9354,0.9487,0.9075,0.9045,0.9608,0.7454,0.9293,1.0,0.9045,0.9535,1.0,0.8563,0.9574,0.9309,1.0,0.9309,1.0,0.8864,0.9309,0.7559,0.9354,0.9354,0.8864,0.9661,0.9718,0.9258,0.8165,0.8864,0.9555,0.9487,1.0,0.9574,0.9258,1.0,1.0,0.9733,1.0,0.9574,1.0,0.9574,0.866,0.9258,0.9701,0.9393,0.8819,0.6742,1.0,0.9682,1.0,1.0,0.8944,1.0,1.0,0.866,0.9258,0.9428,0.9309,0.9258,0.7746,0.7746,0.7559,0.9459,1.0,1.0,0.9789,0.8819,1.0,0.9636,0.8819,0.8367,0.9512,0.9428,1.0,1.0,0.9608,0.9574,1.0,1.0,0.9129,0.8944,1.0,0.8452,0.9045,0.9682,1.0,0.9393,0.8044,1.0,0.8367,0.8864,0.8944,1.0,0.7977,1.0,1.0,0.9649,0.8864,0.9199,0.9258,0.8819,0.9555,0.8864,0.9393,0.9129,0.9045,0.9309,0.9487,0.8528,0.9258,0.9045,0.9636,0.8292,0.9682,0.866,0.8819,1.0,0.9354,0.8452,0.9459,0.9258,0.7977,0.9045,0.8165,0.8944,0.866,1.0,0.9393,0.9129,1.0,1.0,0.9309,0.9354,0.9258,1.0,0.9177,0.9487,0.9512,0.9661,0.8528,0.978,0.9512,1.0,0.8944,0.9199,1.0,0.9733,1.0,0.9428,0.9459,0.8819,0.9487,0.9177,1.0,0.9014,0.9512,0.9535,0.9636,0.9393,0.9129,0.9393,1.0,0.9045,0.9405,1.0,1.0,0.9177,0.9177,0.9661,0.9608,0.9258,0.9045,0.9428,0.977,0.9574,1.0,0.9636,0.8819,0.9574,1.0,1.0,0.9045,0.9459,0.8864,0.9129,0.9014,0.9608,0.9309,0.8771,1.0,0.7638,0.9608,0.9199,0.8745,1.0,0.9393,0.9487,0.9555,0.9682,0.9555,1.0,0.9428,0.9199,1.0,0.9354,0.9682,0.9649,0.9089,0.8944,0.977,0.9636,0.9661,1.0,0.9393,0.9574,0.9177,0.9487,0.9649,1.0,1.0,0.7071,0.9535,0.9405,0.9512,0.922,0.9309,0.9592,0.9354,0.9806,0.9199,0.9555,1.0,0.8819,0.9682,0.9258,0.9309,0.8321,1.0,0.9535,1.0,0.9089,0.9405,1.0,0.9759,1.0,0.9428,0.9354,1.0,0.9487,0.9789,0.8987,1.0,0.9672,1.0,0.9075,1.0,0.9177,0.9393,0.9459,0.9535,0.9459,1.0,0.7906,1.0,0.9258,0.9293,0.8997,0.9258,1.0,0.9718,1.0,0.8771,0.9608,1.0,0.9129,0.866,1.0,0.9459,0.9014,0.8452,0.9309,0.9293,0.9623,0.8528,0.9806,1.0,0.8321,0.9718,0.9747,0.8745,0.9325,0.9165,1.0,0.9592,0.9701,0.9177,0.8944,1.0,0.8745,1.0,0.8745,0.866,0.9636,0.9535,0.9428,0.8997,1.0,0.9129,0.9718,1.0,0.9258,0.9487,1.0,0.8563,1.0,0.9682,0.8402,0.9608,0.8944,0.9309,0.9661,0.9806,0.9813,0.8165,0.9309,1.0,0.9045,0.977,0.8563,0.9535,0.9393,0.9487,0.9608,0.9733,0.9129,0.8745,0.7977,0.8165,1.0,1.0,0.9574,0.9636,0.9354,0.9459,0.8771,0.8528,0.9592,0.9354,0.9747,0.9535,1.0,0.9608,0.9555,0.9592,0.9487,0.9354,0.9258,0.9574,0.8997,0.9623,0.8864,0.9045,0.9014,0.9129,0.9428,0.9661,0.9354,0.9608,1.0,0.922,0.9789,1.0,0.982,0.9354,1.0,1.0,0.9747,0.9258,0.923,1.0,0.9045,0.9014,1.0,1.0,0.9747,0.9487,0.9535,0.9682,0.922,0.9535,0.9623,0.922,0.9555,0.9535,1.0,1.0,0.9354,1.0,1.0,0.8885,0.9258,0.9747,0.9701,0.9129,0.9129,0.9325,0.9535,0.9487,0.922,0.9449
2,9202605,professional,New York City,N,[],"[Data/Analytics, Storytelling/Brand, Media/Con...","[marathon, network, new, musical, data/analyti...",0.9309,0.9535,0.0,0.9535,0.9258,0.9623,0.866,0.866,0.9747,0.8584,0.8771,0.9661,0.9487,0.9258,0.9535,0.9837,0.9045,0.8944,0.9682,0.866,0.8584,0.9258,1.0,1.0,1.0,0.9512,0.9428,0.9487,0.9354,0.9636,0.8452,0.7071,0.9701,1.0,0.9354,0.8771,1.0,1.0,1.0,0.8997,1.0,1.0,0.9608,0.9574,1.0,0.9354,1.0,0.9129,0.8819,0.9701,0.8819,0.9354,0.9636,0.9097,1.0,0.9798,0.9428,0.9623,0.9733,0.9014,0.8528,0.8819,0.9459,0.9258,0.9535,0.879,1.0,0.9285,0.9608,0.9682,0.9428,0.9393,0.9258,0.978,0.8745,0.9682,0.9789,0.8165,0.8944,0.9285,0.9747,1.0,0.9354,1.0,0.9592,0.9293,0.9636,0.9459,0.9747,0.9535,0.9747,0.9682,1.0,1.0,0.9535,1.0,0.9129,0.9014,0.9636,1.0,1.0,0.9661,0.8944,0.9199,0.9428,1.0,0.9354,0.9837,0.9393,0.8944,0.9075,0.9199,0.9045,0.9393,0.9354,0.9075,1.0,0.9199,0.9354,1.0,0.9045,1.0,0.8402,0.9405,0.923,0.866,0.9075,0.9354,0.9258,0.9806,1.0,1.0,1.0,0.9393,0.9428,1.0,0.9129,0.9165,0.9258,0.9574,0.9608,0.9636,0.9258,0.9381,0.9199,0.9574,0.8885,1.0,0.9393,1.0,0.9354,0.9789,0.9089,1.0,0.8944,1.0,0.9608,0.9428,0.9393,1.0,0.9636,1.0,0.9487,0.9405,0.9608,0.9535,1.0,0.9258,1.0,0.9574,0.8729,0.9555,0.9199,0.9428,0.9129,0.9747,1.0,0.8367,0.9636,0.977,0.9789,1.0,1.0,1.0,0.9574,0.9177,1.0,0.9428,1.0,0.8944,0.9325,0.9309,0.922,0.9459,0.9199,0.8885,0.9258,0.9428,1.0,1.0,0.9129,0.8864,0.971,0.9574,0.9487,0.9682,0.9293,0.982,0.9459,1.0,0.9487,0.9393,1.0,0.9393,0.9014,0.9592,1.0,1.0,0.9636,0.9747,0.9789,0.7385,0.982,0.7559,0.9309,0.9718,0.9428,0.9574,0.9428,0.9487,1.0,0.9661,0.9405,0.9075,0.9428,0.9354,0.9661,0.9574,0.9045,0.9661,0.9608,0.9733,0.9428,1.0,0.9258,0.9459,0.9487,0.9661,0.9718,0.9535,1.0,1.0,0.9459,0.9512,0.9574,0.9555,1.0,0.9636,0.8528,1.0,1.0,0.9535,0.9428,0.9309,0.9075,0.9258,0.9354,1.0,1.0,0.9733,0.9487,0.9309,1.0,0.9535,0.9574,0.9826,0.9309,0.9487,0.9487,1.0,0.9393,1.0,1.0,1.0,1.0,0.8864,0.9393,0.9682,0.8864,0.977,0.978,0.8819,1.0,0.9487,0.9309,0.922,1.0,0.9354,0.9487,0.9393,0.8528,0.9608,0.9428,0.9293,1.0,1.0,0.9045,0.9535,0.9661,0.9574,0.9309,0.9718,0.9309,0.9487,0.9636,1.0,0.8452,0.7906,0.9354,0.9258,0.9309,0.9718,0.9258,0.9129,0.9258,0.9555,0.9309,1.0,0.9574,0.9258,0.9701,0.9574,0.9177,0.9309,1.0,0.9636,0.9574,0.9014,0.9636,1.0,0.9701,0.9428,0.9535,0.8944,0.9682,1.0,0.9701,0.9487,0.9487,0.9661,0.7906,1.0,1.0,0.8563,0.8452,0.8367,0.7746,0.8452,1.0,0.9258,1.0,0.9574,0.8819,1.0,0.9449,1.0,0.9487,1.0,0.9718,0.9199,0.9636,1.0,1.0,0.9535,0.9258,0.9428,0.9487,0.9354,0.9258,0.7385,1.0,1.0,0.9701,0.8402,1.0,1.0,1.0,0.8944,0.9574,0.8528,0.8321,0.9832,0.9826,0.8864,0.9199,0.8864,1.0,0.9555,0.8864,0.8745,0.9574,0.9535,1.0,1.0,0.8528,0.9258,0.9535,1.0,0.9354,0.9682,0.9129,1.0,0.9608,0.9014,0.9636,0.9733,1.0,0.9045,0.9535,0.9574,0.8944,1.0,1.0,0.9393,0.866,0.9682,0.9354,1.0,0.9682,0.8452,0.9535,0.9459,0.9747,0.9512,0.9661,0.9535,0.978,0.9258,1.0,0.9661,0.8771,0.9798,1.0,1.0,0.9623,0.9733,0.8819,0.9487,0.9177,0.8944,1.0,0.9759,1.0,1.0,0.9701,0.9574,1.0,0.9636,0.9535,0.9806,1.0,0.9428,0.9733,1.0,1.0,1.0,1.0,0.9535,0.9428,1.0,0.9574,0.9487,0.8864,0.9428,0.9574,0.9393,0.9535,0.9293,0.9733,1.0,0.9574,0.866,0.9199,0.9309,0.8771,0.9574,0.7638,1.0,0.9806,0.8745,0.9661,0.9701,0.9487,0.9089,1.0,0.9555,0.9512,0.9129,0.9199,0.9393,0.9789,0.9682,0.9826,0.9325,1.0,0.9293,0.9636,0.9309,0.9806,0.9075,0.9129,0.9177,0.9747,1.0,0.9045,0.9574,0.8864,1.0,0.9806,0.9512,0.9747,0.9309,0.8944,0.9682,0.9806,0.9608,0.9555,0.8944,0.9718,0.9354,1.0,0.9309,0.9199,1.0,0.9535,0.9512,0.9325,0.9806,0.9733,0.9759,0.977,0.9718,0.9574,0.9258,0.9832,0.9574,0.9405,0.9199,0.9158,1.0,0.9701,0.9199,0.9733,1.0,1.0,0.9535,1.0,0.9535,0.866,0.8771,0.982,1.0,0.9258,1.0,0.8944,0.9718,1.0,0.9199,0.9199,1.0,0.9129,0.866,0.9258,0.9459,0.9354,0.8864,0.9661,1.0,0.9813,0.8528,0.9405,0.8452,0.7338,0.9129,0.9747,0.9075,0.9325,0.9381,0.9487,0.9592,1.0,0.9177,0.8944,0.8944,0.9393,0.9636,0.9075,0.7906,1.0,1.0,0.9718,0.8729,1.0,1.0,0.9129,1.0,0.9512,0.9487,0.9555,0.9309,0.9293,1.0,0.9701,0.9608,0.8563,0.9661,0.9661,0.9405,0.9813,1.0,0.9309,0.9487,1.0,0.977,0.8944,1.0,1.0,0.7746,0.9806,0.8885,0.9129,0.9701,1.0,0.866,0.9487,1.0,0.9574,0.8864,1.0,0.9459,0.8771,0.9535,0.9798,0.9354,0.866,1.0,0.9574,1.0,0.9555,0.9798,0.922,0.9682,0.9636,0.9354,0.8997,0.9813,1.0,0.9535,0.9014,0.9428,0.9428,1.0,0.7906,0.8771,0.9733,0.9487,0.9574,1.0,0.982,0.9129,0.8292,0.8944,1.0,0.9258,0.9623,1.0,0.977,0.9682,0.9354,1.0,0.9487,0.9487,0.977,1.0,0.922,0.9847,0.9623,0.922,1.0,0.9535,1.0,1.0,0.9354,1.0,0.9574,0.9177,0.9258,0.9747,1.0,0.9129,0.9574,1.0,0.9535,0.9661,0.922,0.9636


In [7]:
#check matches dummy dataframe
df_matches.head(2)

Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
0,7755085,string,string,string,string,string,test_wave,0.0001,string,string,string,string,string,string,string,string
1,3877105,string,string,string,string,string,test_wave,0.0001,string,string,string,string,string,string,string,string


# Part 7-C: Creating the allocation code

Notes:
1. Have not found a VIP student yet, which is logical. which means that the idea is to pair VIP professionals with the students which have the lowest score with them (Reverse is happening here: it is not the student that needs to be paired, but the professional)
2. Broad filters can be applied before creating the MIN ID LIST (which should actually be changed to bottom X list of IDs for the student, let's see which one will work better)
3. The location filter is important, but adds a lot of complication because of having a small number of people from other locations aside from NYC and SF
4. if a professional cannot be found for a student given the filters, it will do the normal process of pairing

In [62]:
import random

def make_matches(matches_df, stud_df, prof_df, seed=0, buffer=0, print_specs=True):
    random.seed(seed)
    df_output = matches_df.copy()
    df_student = stud_df.copy()
    df_professional = prof_df.copy()
    
    stud_id_list = list(matches_df['stud_id'])
    random.shuffle(stud_id_list)
    
    for x in stud_id_list:
        tmpbool1 = False
        tmpbool2 = False
        #get the index of the nyc_id in the output
        tmp_stud_index = df_output[df_output['stud_id']==x].index[0]
        tmp_stud_index2 = df_student[df_student['nyc_id']==x].index[0]

        #populate stud/student fields
        df_output.at[tmp_stud_index, 'stud_past_advisors'] = df_student.loc[tmp_stud_index2, 'past_advisors']
        df_output.at[tmp_stud_index, 'stud_vip'] = df_student.loc[tmp_stud_index2, 'is_vip']
        df_output.at[tmp_stud_index, 'stud_loc'] = df_student.loc[tmp_stud_index2, 'location']
        df_output.at[tmp_stud_index, 'stud_topics'] = df_student.loc[tmp_stud_index2, 'topics_all']
        df_output.at[tmp_stud_index, 'stud_keywords'] = df_student.loc[tmp_stud_index2, 'str_combined']
        
        #find the prof/professional ids which have the lowest distance
        tmp_min_dist = min(df_professional[str(x)])
        tmp_min_dist = tmp_min_dist + buffer
        tmp_id_list = list(df_professional[df_professional[str(x)]<=tmp_min_dist]['nyc_id'])
        
        #random sample from the list of professionals that qualify for the student
        tmp_matched_professional_id = random.sample(tmp_id_list, 1)
        tmp_prof_index = df_professional[df_professional['nyc_id']==tmp_matched_professional_id].index[0]

        #populate prof/professional fields
        df_output.at[tmp_stud_index, 'prof_vip'] = df_professional.loc[tmp_prof_index, 'is_vip']
        df_output.at[tmp_stud_index, 'prof_id'] = df_professional.loc[tmp_prof_index, 'nyc_id']
        df_output.at[tmp_stud_index, 'prof_loc'] = df_professional.loc[tmp_prof_index, 'location']
        df_output.at[tmp_stud_index, 'prof_topics'] = df_professional.loc[tmp_prof_index, 'topics_all']
        df_output.at[tmp_stud_index, 'prof_keywords'] = df_professional.loc[tmp_prof_index, 'str_combined']

        #populate match score
        df_output.at[tmp_stud_index, 'match_score'] = df_professional.loc[tmp_prof_index, str(x)]
        
        #populate matched words
        key_stud = df_output.loc[tmp_stud_index, 'stud_keywords']
        key_prof = df_output.loc[tmp_stud_index, 'prof_keywords']
        inters_keywords = [s for s in key_stud if s in key_prof]
        
        df_output.at[tmp_stud_index, 'matched_words'] = inters_keywords
        
        #populate matched topics
        topic_list_stud = df_output.loc[tmp_stud_index, 'stud_topics']
        topic_list_prof = df_output.loc[tmp_stud_index, 'prof_topics']
        inters_topics = [s for s in topic_list_stud if s in topic_list_prof]
        
        df_output.at[tmp_stud_index, 'matched_topics'] = inters_topics

        #populate location
        if (df_output.loc[tmp_stud_index, 'stud_loc'] == df_output.loc[tmp_stud_index, 'prof_loc']):
            df_output.at[tmp_stud_index, 'location'] = df_output.loc[tmp_stud_index, 'stud_loc']
        else:
            df_output.at[tmp_stud_index, 'location'] = 'Remote'

        #drop prof/professional selected from the prof/professional df
        df_professional = df_professional[df_professional['nyc_id']!=tmp_matched_professional_id]
        
        tmp_min_dist = "{0:.4f}".format(tmp_min_dist)
        
        if print_specs:
            print ('stud: {}, teach: {}, distance: {}, matched words: {}'.format(x, \
            tmp_matched_professional_id[0], tmp_min_dist, '|'.join(inters_keywords)))

        sum_distances = sum([x for x in df_output['match_score']])
        
    print ('seed: {}, sum_distances: {}'.format(seed, sum_distances))
    
    return df_output, sum_distances

# Create the allocation function which has the filters applied

In [132]:
import random

def return_id_list(id_value, min_value, df_professional, buffer = 0, filter_col = '', 
                   filter_val = '', past_advisors = []):
    
    tmp_min_dist = min_value + buffer
    tmp_df_id = df_professional.copy()
    
    #filter rows which have a distance value of less than or equal to tmp_min_dist
    tmp_df_id = tmp_df_id[tmp_df_id[str(id_value)]<=tmp_min_dist]
    
    if (filter_col!='') & (filter_val!=''):
        tmp_df_id = tmp_df_id[tmp_df_id[filter_col]==filter_val]
    
    #applying past_advisors filter
    for z in past_advisors:
        tmp_df_id = tmp_df_id[tmp_df_id['nyc_id']!=z]
        
    tmp_id_list = list(tmp_df_id['nyc_id'])
    
    return tmp_id_list, tmp_df_id

def make_matches_filter(matches_df, stud_df, prof_df, seed=0, buffer=0, print_specs=True):
    random.seed(seed)
    df_output = matches_df.copy()
    df_student = stud_df.copy()
    df_professional = prof_df.copy()
    
    stud_id_list = list(matches_df['stud_id'])
    random.shuffle(stud_id_list)
    
    for x in stud_id_list:
        tmpbool1 = False
        tmpbool2 = False
        #get the index of the nyc_id in the output
        tmp_stud_index = df_output[df_output['stud_id']==x].index[0]
        tmp_stud_index2 = df_student[df_student['nyc_id']==x].index[0]

        #populate stud/student fields
        #NOTE: there's no VIP student but we're still keeping information whether they are vip just in case
        df_output.at[tmp_stud_index, 'stud_past_advisors'] = df_student.loc[tmp_stud_index2, 'past_advisors']
        df_output.at[tmp_stud_index, 'stud_vip'] = df_student.loc[tmp_stud_index2, 'is_vip']
        df_output.at[tmp_stud_index, 'stud_loc'] = df_student.loc[tmp_stud_index2, 'location']
        df_output.at[tmp_stud_index, 'stud_topics'] = df_student.loc[tmp_stud_index2, 'topics_all']
        df_output.at[tmp_stud_index, 'stud_keywords'] = df_student.loc[tmp_stud_index2, 'str_combined']
        
        #find the prof/professional ids which have the lowest distance
        #THIS IS THE IMPORTANT PART. this is where you limit the professional IDs which go into the list
        #if you limit the ids that go into the list where we get the potential matches, 
        #that's effectively already applying filters!!!
        
        ###DECLARING VARIABLES FOR FILTERS###
        tmp_min_dist = min(df_professional[str(x)])
        tmp_loc = df_student.loc[tmp_stud_index2, 'location']
        tmp_stud_topics = df_student.loc[tmp_stud_index2, 'topics_all']
        
        ###FIRST ROUND FILTERING - IS_VIP, LOCATION and TOPICS###
        tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                             df_professional = df_professional, 
                                             buffer = buffer, filter_col='is_vip', filter_val = 'Y', 
                                             past_advisors=df_student.loc[tmp_stud_index2, 'past_advisors'])
        tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                             df_professional = tmp_df_id, buffer = buffer, 
                                             filter_col='location', filter_val = tmp_loc,
                                             past_advisors=df_student.loc[tmp_stud_index2, 'past_advisors'])
        #STILL PART OF FIRST ROUND - CHECKING FOR COMMON TOPICS
        if len(tmp_df_id)>0:
            for m in list(tmp_df_id['nyc_id'].index):
                tmp_prof_topics = []
                tmp_prof_topics = tmp_df_id.loc[m, 'topics_all']
                tmp_prof_bool = any([j in tmp_stud_topics for j in tmp_prof_topics])
                if not tmp_prof_bool:
                    tmp_id_remove_topics = tmp_df_id.loc[m, 'nyc_id']
                    tmp_df_id = tmp_df_id[tmp_df_id['nyc_id']!=tmp_id_remove_topics]
            
        
        ###SECOND ROUND FILTERING - IS_VIP and LOCATION###
        if len(tmp_id_list)==0:
            tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                             df_professional = df_professional, 
                                             buffer = buffer, filter_col='is_vip', filter_val = 'Y', 
                                             past_advisors=df_student.loc[tmp_stud_index2, 'past_advisors'])
            tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                                 df_professional = tmp_df_id, buffer = buffer, 
                                                 filter_col='location', filter_val = tmp_loc,
                                                 past_advisors=df_student.loc[tmp_stud_index2, 'past_advisors'])

        ###THIRD ROUND FILTERING - LOCATION ONLY###
        if len(tmp_id_list)==0:
            tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                                 df_professional = df_professional, 
                                                 buffer = buffer, filter_col='location', filter_val = tmp_loc,
                                                 past_advisors=df_student.loc[tmp_stud_index2, 'past_advisors'])

        ###FOURTH ROUND FILTERING - LOCATION, but choosing all available advisors, setting buffer to 1.00###
        if len(tmp_id_list)==0:
            tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                                 df_professional = df_professional, 
                                                 buffer = 1.00, filter_col='location', filter_val = tmp_loc,
                                                 past_advisors=df_student.loc[tmp_stud_index2, 'past_advisors'])

        ###FIFTH ROUND FILTERING - JUST TOPICS FILTER###
        if len(tmp_id_list)==0:
            #past advisors filter is removed for this to prevent having errors
            tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                             df_professional = df_professional, 
                                             buffer = buffer)
            #CHECKING FOR COMMON TOPICS
            if len(tmp_df_id)>0:
                for m in list(tmp_df_id['nyc_id'].index):
                    tmp_prof_topics = []
                    tmp_prof_topics = tmp_df_id.loc[m, 'topics_all']
                    tmp_prof_bool = any([j in tmp_stud_topics for j in tmp_prof_topics])
                    if not tmp_prof_bool:
                        tmp_id_remove_topics = tmp_df_id.loc[m, 'nyc_id']
                        tmp_df_id = tmp_df_id[tmp_df_id['nyc_id']!=tmp_id_remove_topics]
            
        ###FINAL ROUND FILTERING - JUST THE BASIC MINIMUM FILTER###
        if len(tmp_id_list)==0:
            #past advisors filter is removed for this to prevent having errors
            tmp_id_list, tmp_df_id = return_id_list(id_value = str(x), min_value = tmp_min_dist, 
                                             df_professional = df_professional, 
                                             buffer = buffer)

        #random sample from the list of professionals that qualify for the student
        tmp_matched_professional_id = random.sample(tmp_id_list, 1)
        tmp_prof_index = df_professional[df_professional['nyc_id']==tmp_matched_professional_id].index[0]

        #populate prof/professional fields
        df_output.at[tmp_stud_index, 'prof_vip'] = df_professional.loc[tmp_prof_index, 'is_vip']
        df_output.at[tmp_stud_index, 'prof_id'] = df_professional.loc[tmp_prof_index, 'nyc_id']
        df_output.at[tmp_stud_index, 'prof_loc'] = df_professional.loc[tmp_prof_index, 'location']
        df_output.at[tmp_stud_index, 'prof_topics'] = df_professional.loc[tmp_prof_index, 'topics_all']
        df_output.at[tmp_stud_index, 'prof_keywords'] = df_professional.loc[tmp_prof_index, 'str_combined']

        #populate match score
        df_output.at[tmp_stud_index, 'match_score'] = df_professional.loc[tmp_prof_index, str(x)]
        
        #populate matched words
        key_stud = df_output.loc[tmp_stud_index, 'stud_keywords']
        key_prof = df_output.loc[tmp_stud_index, 'prof_keywords']
        inters_keywords = [s for s in key_stud if s in key_prof]
        df_output.at[tmp_stud_index, 'matched_words'] = inters_keywords
        
        #populate matched topics
        topic_list_stud = df_output.loc[tmp_stud_index, 'stud_topics']
        topic_list_prof = df_output.loc[tmp_stud_index, 'prof_topics']
        inters_topics = [s for s in topic_list_stud if s in topic_list_prof]
        df_output.at[tmp_stud_index, 'matched_topics'] = inters_topics

        #populate location
        if (df_output.loc[tmp_stud_index, 'stud_loc'] == df_output.loc[tmp_stud_index, 'prof_loc']):
            df_output.at[tmp_stud_index, 'location'] = df_output.loc[tmp_stud_index, 'stud_loc']
        else:
            df_output.at[tmp_stud_index, 'location'] = 'Remote'

        #drop prof/professional selected from the prof/professional df
        df_professional = df_professional[df_professional['nyc_id']!=tmp_matched_professional_id]
        
        tmp_min_dist = "{0:.4f}".format(tmp_min_dist)
        
        if print_specs:
            print ('stud: {}, teach: {}, distance: {}, matched words: {}'.format(x, \
            tmp_matched_professional_id[0], tmp_min_dist, '|'.join(inters_keywords)))

        sum_distances = sum([x for x in df_output['match_score']])
        
    print ('seed: {}, sum_distances: {}'.format(seed, sum_distances))
    
    return df_output, sum_distances

# Part 7-D: Running the allocation code, with monte carlo simulation (NO FILTERS APPLIED)

The Monte Carlo simulation is done by setting seed to a value for each run.  For this run, there were no filters applied

In [164]:
sum_dist1 = 0.0
sum_dist2 = 0.0
x_final = 0
for x in list(range(0,400,1)):
    df1, sum_dist1 = make_matches(matches_df=df_matches, stud_df=df_distances_stud, 
                                        prof_df=df_distances_prof, seed=x, buffer=0.0, 
                                        print_specs=False)
    if x==0:
        sum_dist2=sum_dist1
        df_final = df1
        x_final = x
    elif sum_dist1<sum_dist2:
        sum_dist2 = sum_dist1
        df_final = df1
        x_final = x

print ('best seed: {}, best distance: {}'.format(x_final, sum_dist2))

seed: 0, sum_distances: 299.93689999999975
seed: 1, sum_distances: 299.9607999999998
seed: 2, sum_distances: 300.01010000000014
seed: 3, sum_distances: 300.5882000000001
seed: 4, sum_distances: 299.7261999999996
seed: 5, sum_distances: 299.85820000000007
seed: 6, sum_distances: 300.15960000000007
seed: 7, sum_distances: 300.1813
seed: 8, sum_distances: 299.5728999999998
seed: 9, sum_distances: 299.8316
seed: 10, sum_distances: 299.0971999999999
seed: 11, sum_distances: 299.8626999999998
seed: 12, sum_distances: 299.6845000000001
seed: 13, sum_distances: 299.7377000000001
seed: 14, sum_distances: 300.5555999999999
seed: 15, sum_distances: 299.9624
seed: 16, sum_distances: 299.9257999999999
seed: 17, sum_distances: 300.9038000000001
seed: 18, sum_distances: 300.55999999999995
seed: 19, sum_distances: 300.00629999999995
seed: 20, sum_distances: 300.4059999999999
seed: 21, sum_distances: 299.5758999999999
seed: 22, sum_distances: 300.1213
seed: 23, sum_distances: 300.2071
seed: 24, sum_dis

seed: 194, sum_distances: 299.8184999999999
seed: 195, sum_distances: 300.43469999999985
seed: 196, sum_distances: 300.06769999999983
seed: 197, sum_distances: 299.8644999999998
seed: 198, sum_distances: 300.37450000000007
seed: 199, sum_distances: 299.69449999999995
seed: 200, sum_distances: 300.3865
seed: 201, sum_distances: 300.54649999999975
seed: 202, sum_distances: 299.20559999999995
seed: 203, sum_distances: 299.86660000000023
seed: 204, sum_distances: 301.1044999999999
seed: 205, sum_distances: 299.4828
seed: 206, sum_distances: 299.91650000000016
seed: 207, sum_distances: 299.7573000000002
seed: 208, sum_distances: 299.43159999999995
seed: 209, sum_distances: 300.19320000000033
seed: 210, sum_distances: 299.43059999999997
seed: 211, sum_distances: 300.1413999999998
seed: 212, sum_distances: 299.8382000000001
seed: 213, sum_distances: 300.2408999999999
seed: 214, sum_distances: 300.0226999999999
seed: 215, sum_distances: 300.21500000000015
seed: 216, sum_distances: 299.88889999

seed: 385, sum_distances: 299.72919999999976
seed: 386, sum_distances: 300.09999999999997
seed: 387, sum_distances: 299.5553
seed: 388, sum_distances: 300.5169999999999
seed: 389, sum_distances: 299.5394000000001
seed: 390, sum_distances: 300.3703
seed: 391, sum_distances: 300.2544000000001
seed: 392, sum_distances: 299.77909999999997
seed: 393, sum_distances: 300.1856000000001
seed: 394, sum_distances: 299.7947000000003
seed: 395, sum_distances: 300.4413999999998
seed: 396, sum_distances: 300.45290000000006
seed: 397, sum_distances: 300.6036999999999
seed: 398, sum_distances: 299.3195999999999
seed: 399, sum_distances: 300.11780000000005
best seed: 237, best distance: 299.019


In [165]:
df_final, sum_dist = make_matches(matches_df=df_matches, stud_df=df_distances_stud, 
                                  prof_df=df_distances_prof, seed=x_final, print_specs=True)

stud: 2299050, teach: 1190509, distance: 0.8660, matched words: app|vc|beauty|fashion|food|commerce|business
stud: 7293490, teach: 2799767, distance: 0.7071, matched words: engineering|management|mobile|product|chocolate|app|end
stud: 7120601, teach: 3003753, distance: 0.7906, matched words: engineering|front|google|learning|technology|end
stud: 2906883, teach: 6342552, distance: 0.7802, matched words: engineering|app|network|mobile|gaming|social|snapchat|reality|virtual
stud: 9889124, teach: 2593443, distance: 0.7609, matched words: engineering|music|front|mobile|ux/ui|end|entertainment|web
stud: 2427696, teach: 3004297, distance: 0.8944, matched words: management|product|business|model|life|people|food
stud: 264160, teach: 8733709, distance: 0.8062, matched words: engineering|app|new|mobile|end|puppy|back
stud: 4369680, teach: 6126279, distance: 0.8452, matched words: app|network|storytelling/brand|mobile|growth|social
stud: 9938106, teach: 4453336, distance: 0.8044, matched words: e

stud: 385275, teach: 6088914, distance: 0.8584, matched words: best|ux/ui|fiction|entertainment|science
stud: 8561114, teach: 7772712, distance: 0.7817, matched words: management|product|web|app|mobile|instagram|business
stud: 9808460, teach: 4973934, distance: 0.7454, matched words: ecommerce/delivery|commerce|data/analytics|business
stud: 6008056, teach: 6788313, distance: 0.8660, matched words: product|app|mobile|technology|content|marketing
stud: 6523262, teach: 2926356, distance: 0.8272, matched words: management|product|app|network|mobile|social
stud: 9076155, teach: 3771144, distance: 0.8584, matched words: app|mobile|technology|young|spotify
stud: 4735198, teach: 2611714, distance: 0.8165, matched words: engineering|front|mobile|technology|software|end|metal|back
stud: 8217139, teach: 9202605, distance: 0.8292, matched words: hamilton|network|musical|soundtrack|social
stud: 7888951, teach: 8788183, distance: 0.8272, matched words: engineering|app|mobile|technology|qa|japanese
s

stud: 4295994, teach: 5529070, distance: 0.8018, matched words: engineering|mobile|ux/ui|software|politics
stud: 1857735, teach: 2179494, distance: 0.7559, matched words: storytelling/brand|management|mobile|app|product|marketing
stud: 5769802, teach: 6138397, distance: 0.8272, matched words: reading|snapchat|model|software|chocolate|business
stud: 5212929, teach: 6286127, distance: 0.7670, matched words: engineering|front|technology|software|end|instagram|back
stud: 1110483, teach: 760211, distance: 0.8584, matched words: app|summer|everyday|mobile|media/content
stud: 9613380, teach: 8949839, distance: 0.8660, matched words: new|storytelling/brand|one|content|entertainment
stud: 152075, teach: 4588539, distance: 0.8597, matched words: engineering|music|guitar|front|technology|end
stud: 4686614, teach: 1390411, distance: 0.8885, matched words: new|storytelling/brand|ux/ui|software
stud: 6931189, teach: 3667774, distance: 0.8745, matched words: beauty|coffee|commerce|fashion
stud: 18254

stud: 7031751, teach: 5357257, distance: 0.7638, matched words: model|business|web|data/analytics|technology
stud: 8841456, teach: 6414426, distance: 0.9158, matched words: management|web|product|media/content|entertainment
stud: 8591435, teach: 2146133, distance: 0.8944, matched words: data/analytics|qa|entertainment|web
stud: 7496115, teach: 4255011, distance: 0.9177, matched words: beauty|food|fashion
stud: 2588879, teach: 6306797, distance: 0.8745, matched words: management|product|software|education
stud: 8199153, teach: 8183052, distance: 0.9089, matched words: management|product|vc|software
stud: 3945611, teach: 8813293, distance: 0.8771, matched words: business|growth|technology
stud: 4304558, teach: 5308660, distance: 0.8885, matched words: management|product|college|web
stud: 2659610, teach: 7607153, distance: 0.8745, matched words: storytelling/brand|data/analytics|travel|media/content
stud: 3423783, teach: 7666762, distance: 0.9220, matched words: entertainment|instagram|we

In [166]:
df_final.head(2)

Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
0,7755085,N,[],8477772,N,Remote,test_wave,0.8528,Somewhere else,San Francisco,"[Media/Content, Storytelling/Brand, Product Ma...","[Business Operations, Product Management, Stor...","[Storytelling/Brand, Product Management]","[management, music, product, app, polish, stor...","[management, music, product, little, vintage, ...","[management, music, product, storytelling/bran..."
1,3877105,N,[],9835021,N,New York City,test_wave,0.8389,New York City,New York City,"[Business Operations, Engineering - Mobile, En...","[Engineering - Back End, Data/Analytics, Busin...",[Business Operations],"[engineering, app, network, reading, anatomy, ...","[engineering, new, data/analytics, business, w...","[engineering, business, watching, beauty, fash..."


In [167]:
df_final.to_csv('outputs/matches_test_wave_no_filter.csv')

# Part 7-E: Running the allocation code, with monte carlo simulation (FILTERS APPLIED)

The Monte Carlo simulation is done by setting seed to a value for each run.  For this run, filters were applied

In [133]:
sum_dist1 = 0.0
sum_dist2 = 0.0
x_final = 0
for x in list(range(0,200,1)):
    df1, sum_dist1 = make_matches_filter(matches_df=df_matches, stud_df=df_distances_stud, 
                                        prof_df=df_distances_prof, seed=x, buffer=0.06, 
                                        print_specs=False)
    if x==0:
        sum_dist2=sum_dist1
        df_final = df1
        x_final = x
    elif sum_dist1<sum_dist2:
        sum_dist2 = sum_dist1
        df_final = df1
        x_final = x

print ('best seed: {}, best distance: {}'.format(x_final, sum_dist2))

seed: 0, sum_distances: 310.2959000000001
seed: 1, sum_distances: 309.7365000000002
seed: 2, sum_distances: 308.2468000000001
seed: 3, sum_distances: 309.3718000000004
seed: 4, sum_distances: 309.58220000000006
seed: 5, sum_distances: 308.20330000000024
seed: 6, sum_distances: 309.2398000000004
seed: 7, sum_distances: 309.1320999999999
seed: 8, sum_distances: 309.60760000000033
seed: 9, sum_distances: 309.95900000000006
seed: 10, sum_distances: 308.68450000000007
seed: 11, sum_distances: 310.18310000000014
seed: 12, sum_distances: 309.2792000000002
seed: 13, sum_distances: 310.22159999999997
seed: 14, sum_distances: 309.5155000000001
seed: 15, sum_distances: 309.10000000000025
seed: 16, sum_distances: 309.0518999999998
seed: 17, sum_distances: 309.7251000000003
seed: 18, sum_distances: 309.70270000000016
seed: 19, sum_distances: 309.58929999999987
seed: 20, sum_distances: 309.65770000000026
seed: 21, sum_distances: 308.4475000000002
seed: 22, sum_distances: 310.0625000000004
seed: 23, 

seed: 189, sum_distances: 309.48590000000024
seed: 190, sum_distances: 309.10120000000023
seed: 191, sum_distances: 309.5914000000003
seed: 192, sum_distances: 310.2021000000004
seed: 193, sum_distances: 309.17699999999985
seed: 194, sum_distances: 308.8115000000001
seed: 195, sum_distances: 309.8194000000001
seed: 196, sum_distances: 309.8025000000002
seed: 197, sum_distances: 309.5374000000001
seed: 198, sum_distances: 309.9887000000003
seed: 199, sum_distances: 309.88750000000016
best seed: 156, best distance: 307.7634000000001


In [134]:
df_final2, sum_dist2 = make_matches_filter(matches_df=df_matches, stud_df=df_distances_stud, 
                                  prof_df=df_distances_prof, seed=x_final, print_specs=True)

stud: 7918230, teach: 7048387, distance: 0.8718, matched words: app|network|storytelling/brand|mobile|great|social
stud: 2012792, teach: 9134615, distance: 0.7746, matched words: engineering|front|mobile|app|education|end
stud: 5686548, teach: 6617302, distance: 0.8498, matched words: storytelling/brand|model|food|instagram|business
stud: 7023001, teach: 7575883, distance: 0.8597, matched words: marketing|commerce
stud: 4914479, teach: 9389960, distance: 0.8367, matched words: app|storytelling/brand|mobile|snapchat|model|business
stud: 3703222, teach: 7198024, distance: 0.8018, matched words: snapchat|engineering|back|data/analytics|end
stud: 6495154, teach: 3805339, distance: 0.8416, matched words: engineering|world|obsessed|end|marketing|hardware|back
stud: 7594141, teach: 1901532, distance: 0.8718, matched words: app|data/analytics|business|mobile|model|spotify
stud: 1171955, teach: 6342552, distance: 0.8062, matched words: management|product|app|mobile|gaming|reality|virtual
stud: 

stud: 8700849, teach: 6545230, distance: 0.8660, matched words: network|storytelling/brand|business|career|social|instagram
stud: 1148025, teach: 6937862, distance: 0.8563, matched words: model|reality|travel|business
stud: 7950567, teach: 8910889, distance: 0.8597, matched words: music|app|mobile|media/content|ux/ui|nike
stud: 4304558, teach: 2513052, distance: 0.7609, matched words: engineering|management|product|technology|end|food|back|web
stud: 1444202, teach: 1396492, distance: 0.8062, matched words: ux/ui|web
stud: 7755085, teach: 4214457, distance: 0.8257, matched words: storytelling/brand|video|media/content
stud: 9382336, teach: 9249359, distance: 0.8044, matched words: app|network|storytelling/brand|mobile|social|media/content
stud: 7792027, teach: 3062857, distance: 0.8165, matched words: management|product|app|mobile|instagram|business
stud: 9012509, teach: 7666762, distance: 0.8528, matched words: entertainment|marketing|web
stud: 3569615, teach: 6707682, distance: 0.7906

stud: 746643, teach: 4818902, distance: 0.8944, matched words: app|mobile|technology
stud: 2598156, teach: 4958919, distance: 0.7338, matched words: management|product|app|growth|web
stud: 9660873, teach: 6160406, distance: 0.8528, matched words: engineering|app|front|mobile|end|back
stud: 5757510, teach: 7324067, distance: 0.8165, matched words: software
stud: 4306743, teach: 4095341, distance: 0.8165, matched words: management|music|product|app|mobile|ux/ui
stud: 5769802, teach: 1041157, distance: 0.8272, matched words: ux/ui
stud: 7076461, teach: 7967766, distance: 0.8257, matched words: business|beauty|model|reality|commerce|instagram|fashion
stud: 2627748, teach: 7881557, distance: 0.7276, matched words: engineering|management|product|network|front|social|software|end
stud: 7306029, teach: 2868050, distance: 0.7906, matched words: engineering|obsessed|spotify|software|end|back
stud: 5849482, teach: 9116597, distance: 0.8597, matched words: business|software
stud: 2906883, teach: 4

stud: 4128637, teach: 2271359, distance: 0.8452, matched words: engineering|end|people|qa|hardware|back
stud: 4119888, teach: 422067, distance: 0.8367, matched words: app|leadership|business|mobile|model|instagram
stud: 5020397, teach: 8236719, distance: 0.7845, matched words: storytelling/brand|model|business
stud: 7931432, teach: 654011, distance: 0.9129, matched words: web|home|technology|diy|spotify
stud: 7184217, teach: 9224675, distance: 0.9027, matched words: management|web|product|storytelling/brand|spotify
stud: 6581843, teach: 8697016, distance: 0.8272, matched words: data/analytics|business|model
stud: 1020980, teach: 9838132, distance: 0.9129, matched words: data/analytics|watching|snapchat|web
stud: 2588879, teach: 8301349, distance: 0.8044, matched words: management|product
stud: 2526473, teach: 4601941, distance: 0.8165, matched words: model|instagram|business
stud: 4920915, teach: 7893434, distance: 0.8898, matched words: storytelling/brand|business|marketing|instagram|

In [135]:
df_final2.head(2)

Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
0,7755085,N,[],4214457,N,Somewhere else,test_wave,0.9293,Somewhere else,Somewhere else,"[Media/Content, Storytelling/Brand, Product Ma...","[Data/Analytics, Storytelling/Brand, Media/Con...","[Media/Content, Storytelling/Brand]","[management, music, product, app, polish, stor...","[storytelling/brand, tea, media/content, nonpr...","[storytelling/brand, video, media/content]"
1,3877105,N,[],2569801,N,New York City,test_wave,0.9428,New York City,New York City,"[Business Operations, Engineering - Mobile, En...","[Business Model, Business Operations, Growth L...",[Business Operations],"[engineering, app, network, reading, anatomy, ...","[snapchat, model, creating, transport, art, bo...","[business, snapchat, software]"


In [137]:
df_final2.to_csv('outputs/matches_test_wave_with_filter.csv')

# Part 7-F: Double checking the output if they are correct

In [19]:
df_final = pd.read_csv('outputs/matches_test_wave_no_filter.csv', header = 0, index_col=0)
col_convert = ['stud_past_advisors', 'stud_topics', 'prof_topics', 'matched_topics', 
               'stud_keywords', 'prof_keywords', 'matched_words']
for y in col_convert:
    df_final[y] = [ast.literal_eval(x) for x in df_final[y]]
df_final.head(2)

Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
0,7755085,N,[],1390411,N,Remote,test_wave,0.879,Somewhere else,New York City,"[Media/Content, Storytelling/Brand, Product Ma...","[Product Management, UX/UI, Storytelling/Brand]","[Storytelling/Brand, Product Management]","[management, music, product, app, polish, stor...","[management, storytelling/brand, product, ux/u...","[management, product, storytelling/brand, inst..."
1,3877105,N,[],8733709,N,Remote,test_wave,0.8819,New York City,San Francisco,"[Business Operations, Engineering - Mobile, En...","[Data/Analytics, Engineering - Back End, Busin...",[Business Operations],"[engineering, app, network, reading, anatomy, ...","[engineering, puppy, mobile, transport, app, n...","[engineering, app, business, mobile, end, inst..."


In [22]:
df_ind0, df_ind0_id_dist = create_df_distance(df_distances, 1)
tmp_stud = df_final.loc[0, 'stud_id']
tmp_prof = df_final.loc[0, 'prof_id']
print (tmp_stud)
print (tmp_prof)
print (df_final.loc[0, 'match_score'])
print (df_final.loc[0, 'stud_topics'])
print (df_final.loc[0, 'prof_topics'])
print (df_final.loc[0, 'matched_topics'])
print (df_final.loc[0, 'stud_keywords'])
print (df_final.loc[0, 'prof_keywords'])
print (df_final.loc[0, 'matched_words'])

7755085
1390411
0.879
['Media/Content', 'Storytelling/Brand', 'Product Management']
['Product Management', 'UX/UI', 'Storytelling/Brand']
['Storytelling/Brand', 'Product Management']
['management', 'music', 'product', 'app', 'polish', 'storytelling/brand', 'mobile', 'video', 'history', 'wine', 'media/content', 'snapchat', 'cover', 'government', 'nail', 'making', 'art', 'page', 'politics', 'instagram', 'travel', 'web']
['management', 'storytelling/brand', 'product', 'ux/ui', 'new', 'yext', 'instagram', 'software', 'travel', 'finding', 'technology']
['management', 'product', 'storytelling/brand', 'instagram', 'travel']


In [23]:
ind_stud = df_distances[df_distances['nyc_id']==tmp_stud].index[0]
ind_prof = df_distances[df_distances['nyc_id']==tmp_prof].index[0]
print(df_distances[df_distances['nyc_id']==tmp_stud].index[0])
print(df_distances[df_distances['nyc_id']==tmp_prof].index[0])
df_ind0.sort_values(['knn_distance']).loc[[ind_stud, ind_prof]]

1
120


Unnamed: 0,management,music,product,app,polish,storytelling/brand,mobile,video,history,wine,media/content,snapchat,cover,government,nail,making,art,page,politics,instagram,travel,web,knn_distance
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.0
120,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0.879049


In [24]:
#first 10 rows. if the index is not in the first 10 rows, it means that we find it in further down
df_ind0.sort_values(['knn_distance']).head(10)

Unnamed: 0,management,music,product,app,polish,storytelling/brand,mobile,video,history,wine,media/content,snapchat,cover,government,nail,making,art,page,politics,instagram,travel,web,knn_distance
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.0
459,1,1,1,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0.768706
94,1,0,1,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,1,0,1,0.768706
72,0,0,0,1,0,1,1,1,0,0,1,0,0,1,0,0,0,0,1,1,0,1,0.768706
165,1,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0.797724
289,1,1,1,1,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0.797724
211,0,1,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0.797724
209,1,0,1,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,0.797724
160,0,1,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0.797724
14,0,0,0,1,0,1,1,0,0,0,0,1,0,1,0,0,0,0,1,1,1,0,0.797724


# Part 7-G: Comparing VIPs how they were allocated without filters vs with filters

In [200]:
#check results for VIPs (original, do not overwrite)
tmp = df_final[df_final['prof_vip']=='Y']
print (len(tmp))
tmp.head(len(tmp))

33


Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
7,382570,N,[8265769],9134615,Y,New York City,test_wave,0.8528,New York City,New York City,"[Engineering - Mobile, Storytelling/Brand, Eng...","[Engineering - Mobile, Engineering - Front End...","[Engineering - Mobile, Engineering - Front End]","[engineering, app, storytelling/brand, home, f...","[engineering, app, editor, front, mobile, podc...","[engineering, app, front, mobile, end, politics]"
15,7743463,N,[],6937862,Y,New York City,test_wave,0.8018,New York City,New York City,"[Media - Content, Business Model, Product Mana...","[Product Management, Business Model, Media/Con...","[Business Model, Product Management]","[management, product, web, app, mobile, snapch...","[management, model, reality, business, media/c...","[management, product, model, travel, business]"
16,6459524,N,[],9469995,Y,New York City,test_wave,0.9354,New York City,New York City,"[Business Model, Business Operations, Engineer...","[Business Operations, QA, Data/Analytics]",[Business Operations],"[engineering, energy, finance, american, codin...","[home, ecommerce/delivery, patagonia, google, ...","[business, google, commerce]"
50,7792027,N,[],7681692,Y,New York City,test_wave,0.7817,New York City,New York City,"[Business Operations, Engineering - Mobile, Pr...","[Business Operations, Storytelling/Brand, Prod...","[Business Operations, Product Management]","[engineering, management, product, app, high, ...","[management, product, hr, soccer, amazing, sto...","[management, product, high, school, internet, ..."
74,6059897,N,[1574686],4846292,Y,New York City,test_wave,0.9129,New York City,New York City,"[Engineering - Mobile, Engineering - Front End...","[Storytelling/Brand, Media/Content, Data/Analy...",[],"[engineering, think, tech, app, watch, new, fr...","[storytelling/brand, media/content, internet, ...","[technology, internet, marketing, instagram]"
94,6008056,N,[5649587],6788313,Y,New York City,test_wave,0.866,New York City,New York City,"[Engineering - Mobile, Media - Content, Produc...","[Media - Content, Storytelling/Brand, Product ...",[Media - Content],"[engineering, ui, music, product, app, academy...","[possible, management, product, app, much, cou...","[product, app, mobile, technology, content, ma..."
103,7888951,N,"[68273, 5182726]",8788183,Y,New York City,test_wave,0.8272,New York City,New York City,"[Engineering - Mobile, UX/UI, QA]","[QA, Engineering - Mobile, Product Management]","[Engineering - Mobile, QA]","[engineering, app, network, korean, high, scho...","[engineering, management, tea, product, app, g...","[engineering, app, mobile, technology, qa, jap..."
105,2527975,N,[],8910889,Y,New York City,test_wave,0.8044,New York City,New York City,"[Media/Content, Storytelling/Brand, UX/UI]","[UX/UI, Media/Content, Storytelling/Brand]","[Media/Content, Storytelling/Brand, UX/UI]","[music, app, storytelling/brand, video, mobile...","[eat, time, music, app, take, storytelling/bra...","[music, app, storytelling/brand, mobile, media..."
112,4319835,N,[],4891992,Y,New York City,test_wave,0.8062,New York City,New York City,"[Media - Content, Business Model, Product Mana...","[Product Management, Data/Analytics, Business ...","[Business Model, Product Management]","[management, music, web, product, app, video, ...","[management, model, entertainment, video, prod...","[management, product, video, model, photograph..."
116,3423783,N,[7487443],7666762,Y,Remote,test_wave,0.922,Somewhere else,New York City,"[Mobile App, Software, Web]","[Media/Content, Business Operations, Growth Le...",[],"[app, data, rapper, mobile, learning, photosho...","[game, digital, entertainment, business, web, ...","[entertainment, instagram, web]"


In [201]:
#check results for VIPs (for test runs)
tmp = df_final2[df_final2['prof_vip']=='Y']
print (len(tmp))
tmp.head(len(tmp))

33


Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
5,1148025,N,[],6937862,Y,New York City,test_wave,0.8563,New York City,New York City,"[Business Model, Business Operations, Data/Ana...","[Product Management, Business Model, Media/Con...",[Business Model],"[data/analytics, give, video, variety, model, ...","[management, model, reality, business, media/c...","[model, reality, travel, business]"
15,7743463,N,[],5458104,Y,New York City,test_wave,0.7559,New York City,New York City,"[Media - Content, Business Model, Product Mana...","[Product Management, Growth Levers, Media - Co...","[Media - Content, Product Management]","[management, product, web, app, mobile, snapch...","[try, management, general, product, app, worko...","[management, product, web, app, content, model]"
35,4101601,N,[],1348913,Y,New York City,test_wave,0.9129,New York City,New York City,"[Engineering - Front End, Product Management, ...","[Growth Levers, Storytelling/Brand, UX/UI]",[Storytelling/Brand],"[engineering, management, product, app, anatom...","[female, management, estate, community, name, ...","[management, storytelling/brand, design, love]"
37,1557244,N,[],8902800,Y,New York City,test_wave,0.866,New York City,New York City,"[Business Operations, Media/Content, Business ...","[Business Model, Business Operations, Media/Co...","[Business Operations, Media/Content, Business ...","[app, building, reading, finance, school, busi...","[app, course, creating, mobile, development, m...","[app, business, mobile, media/content, model]"
55,3195584,N,[9648543],3667774,Y,New York City,test_wave,0.879,New York City,New York City,"[Engineering - Mobile, Media/Content, Storytel...","[Business Operations, Growth Levers, Storytell...",[Storytelling/Brand],"[engineering, vsco, app, new, storytelling/bra...","[thought, headspace, reading, storytelling/bra...","[storytelling/brand, obsessed, beauty, commerc..."
71,7918230,N,[9202605],7048387,Y,New York City,test_wave,0.8718,New York City,New York City,"[Engineering - Mobile, Storytelling/Brand, QA]","[Product Management, Storytelling/Brand, UX/UI]",[Storytelling/Brand],"[engineering, creative, app, academy, network,...","[management, kindle, product, think, app, litt...","[app, network, storytelling/brand, mobile, gre..."
75,9998462,N,"[9144188, 6126279]",4846292,Y,New York City,test_wave,0.9089,New York City,New York City,"[UX/UI, Storytelling/Brand, Data/Analytics]","[Storytelling/Brand, Media/Content, Data/Analy...","[Storytelling/Brand, Data/Analytics]","[general, music, muji, network, data/analytics...","[storytelling/brand, media/content, internet, ...","[data/analytics, storytelling/brand, internet,..."
86,7023001,N,[],7575883,Y,New York City,test_wave,0.9555,New York City,New York City,"[Engineering - Front End, Storytelling / Brand...","[Storytelling/Brand, Media/Content, Growth Lev...",[],"[engineering, music, storytelling, network, br...","[digital, ecommerce/delivery, watch, cheese, s...","[marketing, commerce]"
87,7468074,N,[],8912858,Y,New York City,test_wave,0.8416,New York City,New York City,"[Data/Analytics, Engineering - Back End, VC]","[Product Management, Media/Content, Data/Analy...",[Data/Analytics],"[engineering, network, vc, data/analytics, cou...","[save, management, world, product, app, reform...","[data/analytics, course, finance, podcasts, re..."
93,9808460,N,[],9469995,Y,New York City,test_wave,0.7454,New York City,New York City,"[Business Model, Business Operations, Data/Ana...","[Business Operations, QA, Data/Analytics]","[Business Operations, Data/Analytics]","[hardware, model, diy, ecommerce/delivery, dat...","[home, ecommerce/delivery, patagonia, google, ...","[ecommerce/delivery, commerce, data/analytics,..."


# Part 8: Doing an analysis of the matches made using the allocation code

Load the data again

In [104]:
col_convert = ['stud_past_advisors', 'stud_topics', 'prof_topics', 'matched_topics', 
               'stud_keywords', 'prof_keywords', 'matched_words']

df_final = pd.read_csv('outputs/matches_test_wave_no_filter.csv', header = 0, index_col=0)
for y in col_convert:
    df_final[y] = [ast.literal_eval(x) for x in df_final[y]]
df_final.head(2)

Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
0,7755085,N,[],1390411,N,Remote,test_wave,0.879,Somewhere else,New York City,"[Media/Content, Storytelling/Brand, Product Ma...","[Product Management, UX/UI, Storytelling/Brand]","[Storytelling/Brand, Product Management]","[management, music, product, app, polish, stor...","[management, storytelling/brand, product, ux/u...","[management, product, storytelling/brand, inst..."
1,3877105,N,[],8733709,N,Remote,test_wave,0.8819,New York City,San Francisco,"[Business Operations, Engineering - Mobile, En...","[Data/Analytics, Engineering - Back End, Busin...",[Business Operations],"[engineering, app, network, reading, anatomy, ...","[engineering, puppy, mobile, transport, app, n...","[engineering, app, business, mobile, end, inst..."


In [130]:
df_final2 = pd.read_csv('outputs/matches_test_wave_with_filter.csv', header = 0, index_col=0)
for y in col_convert:
    df_final2[y] = [ast.literal_eval(x) for x in df_final2[y]]
df_final2.head(2)

Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
0,7755085,N,[],4095341,N,Remote,test_wave,0.8528,Somewhere else,New York City,"[Media/Content, Storytelling/Brand, Product Ma...","[UX/UI, Storytelling/Brand, Product Management]","[Storytelling/Brand, Product Management]","[management, music, product, app, polish, stor...","[storytelling/brand, shoe, management, music, ...","[management, music, product, app, storytelling..."
1,3877105,N,[],9556430,N,New York City,test_wave,0.9027,New York City,New York City,"[Business Operations, Engineering - Mobile, En...","[Engineering - Back End, Data/Analytics, QA]",[],"[engineering, app, network, reading, anatomy, ...","[engineering, world, around, back, apple, netf...","[engineering, apple, software, end, netflix]"


In [206]:
df_w3_actual = pd.read_csv('outputs/matches_wave3_actual.csv', header = 0, index_col=0)
for y in col_convert:
    df_w3_actual[y] = [ast.literal_eval(x) for x in df_w3_actual[y]]
df_w3_actual.head(2)

Unnamed: 0,stud_id,stud_vip,stud_past_advisors,prof_id,prof_vip,location,wave,match_score,stud_loc,prof_loc,stud_topics,prof_topics,matched_topics,stud_keywords,prof_keywords,matched_words
0,1256384,N,[9469995],8843543,Y,New York City,wave3,0.9309,New York City,New York City,"[UX/UI, Storytelling/Brand, Business Operations]","[UX/UI, Storytelling/Brand, Media/Content]","[UX/UI, Storytelling/Brand]","[lush, vsco, look, finance, color, storytellin...","[app, hand, storytelling/brand, mobile, red, w...","[storytelling/brand, ux/ui]"
1,1825469,N,[7594522],6126279,N,New York City,wave3,0.8416,New York City,New York City,"[Engineering - Mobile, Product Management, Bus...","[Product Management, Growth Levers, Storytelli...",[Product Management],"[engineering, management, music, product, harr...","[management, creative, music, product, app, an...","[management, music, product, app, network, mob..."


In [196]:
from operator import itemgetter

def match_results_stats(df):
    #all data points
    word_matches = []
    for x in df['matched_words']:
        for s in x:
            word_matches.append(s)
    word_count = len(word_matches)
    word_matches = [[x, word_matches.count(x)] for x in list(set(word_matches))]
    word_matches = sorted(word_matches, key=itemgetter(1), reverse=True)

    topic_matches = []
    for x in df['matched_topics']:
        for s in x:
            topic_matches.append(s)
    topic_count = len(topic_matches)
    topic_matches = [[x, topic_matches.count(x)] for x in list(set(topic_matches))]
    topic_matches = sorted(topic_matches, key=itemgetter(1), reverse=True)

    location = [x for x in df['location']]
    location = [[x, location.count(x)] for x in list(set(location))]

    scores = [x for x in df['match_score']]
    mean = np.mean(scores)
    median = np.median(scores)
    stdev = np.std(scores)
    total = np.sum(scores)
    m, n = np.histogram(scores, bins=12, range=(0.4,1))
    scores = [['{0:.4f}'.format(b),a] for a,b in zip(m,n)]

    times_matched = [len(x) for x in df['matched_words']]
    times_matched = [[x, times_matched.count(x)] for x in list(set(times_matched))]
    times_matched = sorted(times_matched, key=itemgetter(0), reverse=False)

    print ('No. of word matches: {}'.format(word_count))
    print ('word_matches preview: {}'.format(word_matches[20:27]))
    print ('\n')
    print ('No. of topic matches: {}'.format(topic_count))
    print ('topic_matches top 3: {}'.format(topic_matches[0:3]))
    print ('\n')
    print ('location: {}'.format(location))
    print ('\n')
    print ('scores total: {0:.4f}'.format(total))
    print ('scores mean: {0:.4f}'.format(mean))
    print ('scores median: {0:.4f}'.format(median))
    print ('scores std dev: {0:.4f}'.format(stdev))
    #print ('scores histogram: {}'.format(scores))
    #print ('\n')
    #print ('No. of times matched: {}'.format(times_matched))
    print ('\n')

In [197]:
#df_final stats
df = df_final.copy()
df_vip = df[df['prof_vip']=='Y'].copy()

print ('MATCHES NO FILTER:')
print ('\n')
match_results_stats(df=df)
print ('MATCHES NO FILTER VIP:')
print ('\n')
match_results_stats(df=df_vip)

MATCHES NO FILTER:


No. of word matches: 1878
word_matches preview: [['network', 29], ['new', 20], ['marketing', 20], ['music', 18], ['snapchat', 17], ['commerce', 16], ['spotify', 16]]


No. of topic matches: 515
topic_matches top 3: [['Product Management', 78], ['Storytelling/Brand', 64], ['Engineering - Front End', 60]]


location: [['San Francisco', 12], ['New York City', 262], ['Remote', 79]]


scores total: 299.0190
scores mean: 0.8471
scores median: 0.8528
scores std dev: 0.0574


MATCHES NO FILTER VIP:


No. of word matches: 165
word_matches preview: [['music', 2], ['content', 2], ['food', 2], ['video', 2], ['back', 2], ['snapchat', 2], ['beauty', 2]]


No. of topic matches: 40
topic_matches top 3: [['Product Management', 9], ['UX/UI', 7], ['Media/Content', 5]]


location: [['New York City', 28], ['Remote', 5]]


scores total: 27.9541
scores mean: 0.8471
scores median: 0.8498
scores std dev: 0.0518




In [198]:
#df_final stats
df = df_final2.copy()
df_vip = df[df['prof_vip']=='Y'].copy()

print ('MATCHES WITH FILTER:')
print ('\n')
match_results_stats(df=df)
print ('MATCHES WITH FILTER VIP:')
print ('\n')
match_results_stats(df=df_vip)

MATCHES WITH FILTER:


No. of word matches: 1594
word_matches preview: [['network', 25], ['music', 17], ['new', 15], ['snapchat', 14], ['spotify', 13], ['internet', 13], ['marketing', 13]]


No. of topic matches: 436
topic_matches top 3: [['Product Management', 59], ['Storytelling/Brand', 57], ['Engineering - Front End', 54]]


location: [['San Francisco', 43], ['Somewhere else', 5], ['New York City', 296], ['Remote', 9]]


scores total: 307.4646
scores mean: 0.8710
scores median: 0.8718
scores std dev: 0.0685


MATCHES WITH FILTER VIP:


No. of word matches: 155
word_matches preview: [['social', 2], ['content', 2], ['course', 2], ['really', 2], ['software', 2], ['video', 2], ['snapchat', 2]]


No. of topic matches: 44
topic_matches top 3: [['Storytelling/Brand', 7], ['Product Management', 6], ['Business Model', 5]]


location: [['New York City', 33]]


scores total: 28.2984
scores mean: 0.8575
scores median: 0.8660
scores std dev: 0.0598




In [207]:
#df_final stats
df = df_w3_actual.copy()
df_vip = df[df['prof_vip']=='Y'].copy()

print ('MATCHES WAVE3 ACTUAL:')
print ('\n')
match_results_stats(df=df)
print ('MATCHES WAVE3 ACTUAL VIP:')
print ('\n')
match_results_stats(df=df_vip)

MATCHES WAVE3 ACTUAL:


No. of word matches: 1436
word_matches preview: [['commerce', 19], ['network', 19], ['internet', 15], ['marketing', 11], ['music', 10], ['vc', 9], ['snapchat', 9]]


No. of topic matches: 477
topic_matches top 3: [['Engineering - Back End', 62], ['Product Management', 60], ['Storytelling/Brand', 59]]


location: [['San Francisco', 43], ['Somewhere else', 2], ['New York City', 293], ['Remote', 15]]


scores total: 312.6327
scores mean: 0.8856
scores median: 0.8906
scores std dev: 0.0560


MATCHES WAVE3 ACTUAL VIP:


No. of word matches: 126
word_matches preview: [['snapchat', 2], ['school', 1], ['data/analytics', 1], ['reality', 1], ['new', 1], ['instagram', 1], ['making', 1]]


No. of topic matches: 39
topic_matches top 3: [['Product Management', 13], ['Media/Content', 6], ['Business Operations', 4]]


location: [['New York City', 33]]


scores total: 29.7785
scores mean: 0.9024
scores median: 0.9075
scores std dev: 0.0431




# End of the Code!