This notebook was put together by Balachandar Arumugam. Source and license info is on GitHub.

# Twitter: Discovering Social Circles in Ego Networks

Social networking sites such as Facebook, Twitter and LinkedIn offer many ways to access real-time information from around the world. As our network grows, the information gets more and more cluttered, making it difficult to separate signal from noise. While these sites offer ways to categorize friends into groups, the process is repetitive and ignores the social structure already present in the network.

In this notebook, we look at the first-degree social network of a given user and explore the social structure within that ego-network, i.e. we look at the given user's friends and study the connections among those friends. The given user is called the 'ego' and the friends are the 'alters'; the objective is to identify communities/clusters in the given user's ego-network.

This notebook covers the following: (1) Connect to the Twitter API and build the ego-network (the network of connections between the given user's friends). (2) Identify top influencers in the ego-network by running the PageRank algorithm on a Markov transition matrix. (3) Explore the network structure with the help of clustering algorithms to identify clusters of interest. (4) Recommend who-to-follow on the basis of the meaningful clusters from the previous step. (5) Visualization: visualize the clusters in 2 dimensions and look for actionable surprises (using Principal Component Analysis, the multi-dimensional data is reduced to 2 dimensions).

To interactively explore the automatically discovered clusters: https://balaca.shinyapps.io/Twitter_Graph

# Collecting Data |  Twitter API

While the Twitter API page (https://dev.twitter.com/overview/api/twitter-libraries) lists many wrappers, I recommend python-twitter. Use this command to install the package:

$ pip install python-twitter

Twitter REST APIs are rate-limited (more details here: https://dev.twitter.com/overview/documentation), so I have added a timer in the code to stay within the rate limits. While building the ego-network, there might be cases where someone follows more than 5000 Twitter users. I advise modifying the wrapper code to skip to the next node after mining 5000 Ids (a small approximation, to reduce running time).

Hint: inspect 'twitter.__path__' in the notebook to find the wrapper's installation path, then edit the wrapper method in the source code.
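
The timer idea described above can be tried offline before spending real API calls. This is a minimal sketch, not the wrapper's own mechanism: 'fetch_with_pauses', 'fake_fetch' and the recording 'sleep' are hypothetical stand-ins, and the 14-calls-per-15-minutes window is taken from the comment in the fetching cell further down.

```python
import time

def fetch_with_pauses(ids, fetch, per_window=14, window_seconds=15 * 60, sleep=time.sleep):
    """Call fetch(user_id) for every id, sleeping after each full window of calls."""
    results = {}
    for n, user_id in enumerate(ids, start=1):
        results[user_id] = fetch(user_id)
        if n % per_window == 0 and n < len(ids):   # pause only when more calls remain
            sleep(window_seconds)
    return results

# Offline stand-ins (hypothetical): a fake API call and a 'sleep' that only records the pause.
pauses = []
fake_fetch = lambda user_id: [user_id + "_friend"]
demo = fetch_with_pauses(["u%d" % i for i in range(30)], fake_fetch, sleep=pauses.append)
print("%d users fetched, pauses: %s" % (len(demo), pauses))
```

Passing the sleep function as a parameter keeps the pacing logic testable without actually waiting 15 minutes.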

In [1]:
from collections import Counter, defaultdict
from sklearn.decomposition import PCA

import csv
import matplotlib.pyplot as plt
import numpy as np
import os.path
import pandas as pd
import random
import seaborn as sns;  sns.set()
import time
import twitter

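%matplotlib inline
random.seed(1000)
np.random.seed(1000)   # np.random.choice (used later) draws from NumPy's generator, which random.seed does not touch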

In [2]:
twitter.__path__  # python-wrapper source code

['C:\\Anaconda\\lib\\site-packages\\twitter']

In [3]:
# initialize file_names 

self_screen_name = 'bala_io'                               # User_of_interest. 'ego' node

fof_filename = self_screen_name + "_friends.csv"           # 'alters' and their follow-structure [FOF]
binaryMap_filename  =  self_screen_name + "_binaryMap.csv" # list of adjacencies as 0 | 1

cache_filename = self_screen_name + "_cache.csv"           # local cache of screen-names, names
cluster_filename = self_screen_name + "_clusters.csv"      # clusters identified

In [4]:
# Twitter API auth | https://dev.twitter.com/oauth/overview/application-owner-access-tokens
# Copy auth details into 'credentials.txt' in the same folder as this notebook, one 'key,value' pair per line

with open("credentials.txt", "r") as f:
    reader = csv.reader(f )
    login_dict = {line[0]: line[1]                       
                    for line in reader}        

api = twitter.Api(consumer_key=login_dict.get('consumer_key') ,
                  consumer_secret=login_dict.get('consumer_secret'),
                  access_token_key=login_dict.get('access_token_key'),
                  access_token_secret=login_dict.get('access_token_secret'))
api

<twitter.api.Api at 0x1ab77b00>

In [5]:
# 'ego' and his 'alters'

self_user =  api.GetUser(screen_name = self_screen_name)    # details of 'ego' as a Twitter object
self_user_id = unicode(self_user.id)                        # twitter-id of 'ego'
friends_of_self = api.GetFriendIDs(user_id = self_user_id, screen_name = self_screen_name , stringify_ids = True)
index = [self_user_id] + friends_of_self                    # index for ego-network

In [6]:
# takes fof_filename and to_fetch_list (list of Twitter Ids) as argument, 
# queries Twitter API for 'Following' Ids and writes to local-file, returns None

def update_FoF_File(fileName, to_fetch_list):
    
    with open(fileName, 'a') as f:
        apiRequestCounter = 0
        
        for x in to_fetch_list:
            friends_of_x = api.GetFriendIDs(user_id = x,  stringify_ids = True)  #Twitter API call
            row =  ','.join( [x] +  friends_of_x ) + "\n"            
            f.write(row) 
            
            apiRequestCounter += 1
            if (apiRequestCounter % 14 == 0): time.sleep(15*60)  # rate limit is 15 req / 15 min, so pause after every 14 calls

In [7]:
# parses friends_of_friends file and returns list of people, for whom friends' Ids have already been mined

def getFinishList(fileName):
        
    if not os.path.isfile(fileName):
        return [] 
    with open(fileName, 'r') as f:
        return [ x.strip().split(',')[0] for x in f ]  # for every row, 1st entry is a user, followed by his/her friends   

In [8]:
# find the diff, and if any node in 'alters' misses graph-details, fetch that

fof_finish_list = getFinishList( fof_filename )        
fof_to_fetch_list = list ( set(friends_of_self) - set(fof_finish_list) )  # list of nodes, for which fof details are to be fetched
fof_to_fetch_list

[]

In [9]:
update_FoF_File(fof_filename, fof_to_fetch_list) # populate their details in fof_file

In [10]:
# Ego-network as adjacency-matrix
# parses fof_file in order of index, create list of adjacencies as 0 | 1
# returns adjacency-matrix in Row_follows_Column format

def updateBinaryMapFile(fof_filename, binaryMap_filename, index):
    
    with open(fof_filename, "r") as f:
        reader = csv.reader(f)
        fof_dict = {line[0]: line[1:]                        # dict of node:friend_ids (who the node follows); 'ego' is added below
                    for line in reader if line[0] in index}
        fof_dict[self_user_id] = index                       # add ego's details at end
    
    bool_list = []
    
    for user in index:
        user_friends = set( fof_dict[user] )  
        bool_row = [item in user_friends for item in index]  # for every node, populate row with T/F 
        bool_list.append(bool_row)
    
    int_nparray = np.array(bool_list) + 0                    # Numpy arithmetic to change boolean to int

    binaryMap_rfc = pd.DataFrame(data = int_nparray, columns= index, index = index)
    binaryMap_rfc.to_csv(binaryMap_filename)
    return binaryMap_rfc                                     # binary map in row_follows_column (rfc) format

In [11]:
binaryMap_rfc = updateBinaryMapFile(fof_filename, binaryMap_filename, index)
binaryMap_rfc.shape

(403, 403)
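
To make the row-follows-column layout concrete, here is a small sketch on made-up data. The ids and the 'follows' dictionary are hypothetical; plain lists stand in for the DataFrame so the example runs without the notebook's data files.

```python
# Toy follow lists: each key follows the ids in its list (hypothetical ids).
follows = {
    "ego": ["a", "b", "c"],
    "a":   ["b", "x"],        # "x" is outside the ego-network, so it is ignored
    "b":   ["a", "c"],
    "c":   [],
}
index = ["ego", "a", "b", "c"]

# Row-follows-column: matrix[i][j] == 1 when index[i] follows index[j].
matrix = [[int(col in set(follows[row])) for col in index] for row in index]

outlinks = [sum(row) for row in matrix]          # horizontal sums: how many each node follows
inlinks = [sum(col) for col in zip(*matrix)]     # vertical sums: how many follow each node

for name, row in zip(index, matrix):
    print("%-3s %s" % (name, row))
print("outlinks: %s" % outlinks)
print("inlinks:  %s" % inlinks)
```

Node 'c' follows nobody inside the network, which is exactly the kind of dead end the next cell guards against.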

In [12]:
# Edge-case: Nodes with few outlinks might lead to dead-ends
# To prevent such PageRank leaks, add random outlinks. refer: http://infolab.stanford.edu/~ullman/mmds/ch5.pdf

outlinks_count = binaryMap_rfc.sum(axis = 1)   # horizontal-sum to count outlinks
inlinks_count = binaryMap_rfc.sum(axis = 0)   # vertical-sum to count inlinks

edge_case_nodes =  outlinks_count < 5
edge_case_nodes = edge_case_nodes[edge_case_nodes == True].index

for node in np.array(edge_case_nodes):
    # add about 5 random outlinks: each column is switched on with probability 5/len(index)
    random_outlinks             = np.random.choice(2, len(index), p=[1-5.0/len(index), 5.0/len(index)]) 
    binaryMap_rfc.loc[node, ] = binaryMap_rfc.loc[node, ] |  random_outlinks
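
The cell above switches on each column independently, so the number of added outlinks is 5 on average rather than exactly 5. If an exact count is preferred, one alternative (a sketch, not the method used in this notebook) is to sample from the empty slots of the row. The helper name 'add_random_outlinks' and the toy row are hypothetical.

```python
import random

def add_random_outlinks(row, k, rng=random):
    """Return a copy of a 0/1 adjacency row with k extra outlinks in currently empty slots."""
    empty = [j for j, bit in enumerate(row) if bit == 0]
    chosen = set(rng.sample(empty, min(k, len(empty))))
    return [1 if j in chosen else bit for j, bit in enumerate(row)]

rng = random.Random(1000)                 # seeded so reruns pick the same links
sparse_row = [0, 1, 0, 0, 0, 0, 0, 0]     # a node with a single outlink
patched = add_random_outlinks(sparse_row, 3, rng)
print("%d outlinks before, %d after" % (sum(sparse_row), sum(patched)))
```

Existing links are never removed, so the patched row always keeps the real follow relationships.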

In [13]:
binaryMap_cfr = binaryMap_rfc.transpose()                                # transpose to get in column-follows-row format
colStochMatrix = np.matrix( binaryMap_cfr / binaryMap_cfr.sum(axis = 0)) # column-stochastic-matrix

pageRankVector = np.matrix([1.0/len(index)] *  len(index))               # initialize page-rank-vector 
pageRankVector = pageRankVector.transpose()                              # transpose to column-vector

In [14]:
# PageRank algo: Power Iteration to solve Markov transition matrix 
# refer this     : http://setosa.io/blog/2014/07/26/markov-chains/index.html

beta = 0.8
epsilon = 999
while epsilon > (1.0/(10**15)):
    pageRankVectorUpdating = colStochMatrix * pageRankVector * beta # Random teleport happens with 20% prob
    
    # re-insert leaked page-ranks
    S = np.array(pageRankVectorUpdating).sum()                      
    pageRankVectorUpdated = pageRankVectorUpdating + (1 - S) * (1.0/len(index)) * np.ones((len(index), 1))  # spread leaked rank evenly over all nodes
    
    # compute the Euclidean norm of the change and check for convergence
    error = np.array(pageRankVectorUpdated - pageRankVector)
    epsilon = np.sqrt((error *  error).sum())        
        
    pageRankVector = pageRankVectorUpdated    
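
The power iteration above can be tried on a graph small enough to check by hand. This is a minimal sketch in plain Python (no NumPy), assuming a hypothetical four-node graph 'toy'; the function name 'pagerank' and its parameters are illustrative. Like the cell above, it scales each step by beta and re-inserts the leaked rank uniformly.

```python
def pagerank(out_links, beta=0.8, tol=1e-12, max_iter=1000):
    """out_links maps each node to the list of nodes it links to."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: 0.0 for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:                        # a dead end passes nothing on
                new[dst] += beta * rank[src] / len(targets)
        leaked = 1.0 - sum(new.values())               # teleport share plus any dead-end leak
        new = {v: r + leaked / n for v, r in new.items()}
        change = sum((new[v] - rank[v]) ** 2 for v in nodes) ** 0.5
        rank = new
        if change < tol:
            break
    return rank

# Hypothetical graph: 'c' is linked to by everyone else, 'd' is linked to by nobody.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print("%s %.3f" % (node, ranks[node]))
```

Node 'd' has no inlinks, so its rank comes only from teleportation: (1 - beta) / 4 = 0.05. The ranks always sum to 1 because the leaked mass is put back at every step.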

In [15]:
# fetches Twitter User_objects for given list of ids, update local_cache to avoid repeat API calls
# returns list of screen_names and names, in same order

def lookup_in_cache(friendsIdsList):

    old_cache, new_cache = pd.DataFrame(), pd.DataFrame()
    screenNamesList, namesList = [], []
    
    if os.path.isfile(cache_filename):
        old_cache = pd.read_csv(cache_filename, dtype=unicode)    # retrieve info if present in local-copy
        to_fetch_list = list ( set (friendsIdsList) - set(old_cache['Ids']) )
    else :        
        to_fetch_list = friendsIdsList   
    
    i, idsList = 0, []
    while (i < len(to_fetch_list) * 1./100):        
        low, high = i * 100, min( len(to_fetch_list), (i+1)*100 )  # UsersLookup api: limit is 100 Ids per request
        twitterObjectsList = api.UsersLookup(user_id = to_fetch_list[low:high])        
        idsList += [unicode(tempObject.id) for tempObject in twitterObjectsList]  # Ids as returned: order may differ, users may be missing
        screenNamesList += [unicode(tempObject.screen_name) for tempObject in twitterObjectsList]
        namesList += [tempObject.name.encode('utf-8').strip() for tempObject in twitterObjectsList]
        i = i + 1       
    
    new_cache['screenNames'], new_cache['names'], new_cache['Ids'] = screenNamesList, namesList, idsList
    new_cache = old_cache.append(new_cache).set_index('Ids', drop = True )
    new_cache.to_csv(cache_filename)    
    
    return list(new_cache.loc[friendsIdsList]['screenNames']), list (new_cache.loc[friendsIdsList]['names'])
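
The batching in the function above is easier to see in isolation. The sketch below splits ids into groups of at most 100 and matches results back by id rather than by position, on the assumption that a bulk lookup may return users in a different order or leave some out. 'in_batches', 'lookup_names' and 'fake_lookup' are hypothetical names, and no real API call is made.

```python
def in_batches(items, size=100):
    """Yield consecutive slices of at most 'size' items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def lookup_names(ids, lookup):
    """lookup(batch) returns (id, screen_name) pairs, possibly reordered or incomplete."""
    found = {}
    for batch in in_batches(ids):
        for user_id, screen_name in lookup(batch):
            found[user_id] = screen_name
    return [found.get(user_id) for user_id in ids]   # None marks ids that were not returned

# Hypothetical stand-in for the API: reverses each batch and drops id "7".
fake_lookup = lambda batch: [(i, "user_" + i) for i in reversed(batch) if i != "7"]
ids = [str(i) for i in range(250)]
names = lookup_names(ids, fake_lookup)
print("%d batches, first: %s, missing: %s" % (len(list(in_batches(ids))), names[0], names[7]))
```

Keying on the returned id keeps names aligned with the requested order even when the response is shuffled.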

In [16]:
# THE data-frame to store output of all computations

clusters_df = pd.DataFrame()
clusters_df['Ids'], clusters_df['PageRanks'] = index, pageRankVector
clusters_df['screenNames'], clusters_df['names'] = lookup_in_cache( list(clusters_df['Ids']) )

clusters_df['Inlinks'], clusters_df['Outlinks'] = list(inlinks_count), list(outlinks_count)
clusters_df = clusters_df.set_index('Ids', drop = True )

n_clusters = min( int( round(np.sqrt(len(index)/2.0)) ), 10 ) # rule of thumb k = sqrt(n/2), capped at 10 clusters
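
As a quick check of the cluster-count heuristic in the last line of the cell above (square root of half the node count, capped at 10), here is the arithmetic for a few network sizes. The helper name 'pick_n_clusters' is hypothetical; 403 is the size of the ego-network in this notebook.

```python
import math

def pick_n_clusters(n_nodes, cap=10):
    """Square root of half the node count, rounded, and never more than 'cap'."""
    return min(int(round(math.sqrt(n_nodes / 2.0))), cap)

for n in (20, 100, 403):
    print("%d nodes -> %d clusters" % (n, pick_n_clusters(n)))
```

For 403 nodes the square root is about 14.2, so the cap of 10 is what determines the value used here.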

In [17]:
# Use K-Means algorithm to cluster nodes into different-clusters

from sklearn.cluster import KMeans
est = KMeans(max_iter = 100000, n_clusters = n_clusters, n_init = 250)  
clusters_df['kmeans'] = est.fit_predict(binaryMap_cfr)

In [18]:
# Use Spectral algorithm to cluster nodes into different-clusters

from sklearn import cluster
spectral = cluster.SpectralClustering(n_clusters=n_clusters,            # same cluster count as KMeans, for a fair comparison
                                          eigen_solver='arpack',
                                          affinity="nearest_neighbors")

spectral.fit(binaryMap_cfr)
clusters_df['spectral'] = spectral.labels_.astype(int)



In [32]:
# Use AffinityPropagation algorithm to cluster nodes into different-clusters

from sklearn.cluster import AffinityPropagation
af = AffinityPropagation(preference=-50).fit(binaryMap_cfr)
clusters_df['affinity'] = af.labels_

cluster_centers_indices = af.cluster_centers_indices_
n_clusters_affinity = len(cluster_centers_indices)
print n_clusters_affinity

163


In [19]:
# Use Principal Component Analysis to reduce multiple-dimensions to simple 2 dimensions

pca = PCA(n_components=2)
Xproj = pca.fit_transform(binaryMap_cfr)

clusters_df['dim1'] = Xproj[:,0]
clusters_df['dim2'] = Xproj[:,1]

clusters_df = clusters_df.sort('PageRanks', ascending =False)
clusters_df.to_csv(cluster_filename)

print pca.explained_variance_

[ 10.16994762   2.76709056]


In [20]:
# Overall Top 100 Influencers in Social-Graph 
dummy_df = pd.DataFrame()
for i in range(10):
    dummy_df[i] = list (clusters_df [10*i : 10* i + 10]['names'])

print "*** Top 100 Influencers in my Social Graph *** "
dummy_df

*** Top 100 Influencers in my Social Graph *** 


,0,1,2,3,4,5,6,7,8,9
0,Marc Andreessen,dick costolo,Max Levchin,Biz Stone,Dave McClure,Brad Stone,Jeremy Stoppelman,Pierre Omidyar,Jonah Peretti,David Lee
1,Elon Musk,Paul Graham,Vinod Khosla,Nick Bilton,Satya Nadella,Tony Fadell,Bret Taylor,Ron Conway,Adam D'Angelo,Paul Buchheit
2,Tim O'Reilly,Chris Dixon,WIRED,Tim Cook,Josh Kopelman,Gabe Rivera,Jeff Weiner,John Markoff,Mark Suster,Kevin Weil
3,Om Malik,Ben Horowitz,Benedict Evans,Walt Mossberg,Bradley Horowitz,Liz Gannes,Paul Krugman,John Carmack,danah boyd,steve blank
4,Bill Gates,marissamayer,Steven Levy,Hunter Walk,Steve Case,Naval Ravikant,Paul Kedrosky,Brian Chesky,David Kirkpatrick,Startup L. Jackson
5,Fred Wilson,Chris Sacca,Dave Morin,John Maeda,Josh Elman,Patrick Collison,David Sacks,Steven Sinofsky,brian pokorny,Tim Berners-Lee
6,Reid Hoffman,TechCrunch,Sam Altman,Nate Silver,Hilary Mason,jason,Horace Dediu,Steven Pinker,danprimack,Clayton Christensen
7,Kara Swisher,Bill Gurley,Drew Houston,Joi Ito,Keith Rabois,Sarah Lacy,Peter Fenton,Philip Kaplan,Emily Chang,Kevin Kelly
8,Ev Williams,John Doerr,Chris Anderson,Kevin Rose,Shervin Pishevar,Y Combinator,Jessica Lessin,Re/code,Jeff Clavier,Chamath Palihapitiya
9,Aaron Levie,Eric Schmidt,Mitch Kapor,Bill Gross,mark pincus,Matt Mullenweg,dj patil,Jessica Verrilli,a16z,Jimmy Wales


In [21]:
# KMeans clustering | Top-influencers in Social-Circles
dummy_df = pd.DataFrame()

for i in range(n_clusters):
    
    nodes_in_cluster = list( clusters_df [clusters_df['kmeans'] == i ]['names'] )     
    if len(nodes_in_cluster) >= 10:            # keep only clusters which are big enough, i.e. size >= 10
        col_name           = str(i) + " : " + str(len(nodes_in_cluster)) + " Ids"
        dummy_df[col_name] = nodes_in_cluster[:10]
        
print "*** KMeans Clustering | Cluster-wise Top-influencers [Cluster-number : # of users in the cluster] *** "
dummy_df

*** KMeans Clustering | Cluster-wise Top-influencers [Cluster-number : # of users in the cluster] *** 


,0 : 106 Ids,1 : 26 Ids,2 : 19 Ids,3 : 32 Ids,4 : 35 Ids,5 : 26 Ids,6 : 26 Ids,7 : 30 Ids,8 : 70 Ids,9 : 33 Ids
0,Microsoft Research,TechCrunch,Marc Andreessen,WIRED,Wes McKinney,Horace Dediu,John Doerr,Patrick Collison,John Carmack,Hilary Mason
1,Edward Tufte,Vinod Khosla,Elon Musk,John Maeda,Sean J. Taylor,Steven Sinofsky,Benedict Evans,Matt Mullenweg,Tom Hulme,dj patil
2,Steven Strogatz,Steven Levy,Tim O'Reilly,Paul Krugman,chris wiggins,Jessica Verrilli,Dave Morin,Jeremy Stoppelman,Luke Wroblewski,Jeff Hammerbacher
3,Stephen Wolfram,Chris Anderson,Om Malik,John Markoff,Olivier Grisel,Jonah Peretti,Sam Altman,Pierre Omidyar,Craig Mod,Monica Rogati
4,O'Reilly Media,Mitch Kapor,Bill Gates,Steven Pinker,John D. Cook,Kevin Weil,Drew Houston,Philip Kaplan,Irene Au,Drew Conway
5,umair,Biz Stone,Fred Wilson,danah boyd,David Smith,sundarpichai,Tim Cook,Adam D'Angelo,Ray Ozzie,Peter Skomoroch
6,Michael Bernstein,Nick Bilton,Reid Hoffman,steve blank,Yann LeCun,megan quinn,Hunter Walk,David Kirkpatrick,Daniel Burka,Google Research
7,John Resig,Walt Mossberg,Kara Swisher,Tim Berners-Lee,Ryan Rosario,Megan Smith,Kevin Rose,brian pokorny,Scott Kupor,Nathan Yau
8,Stanford Business,Nate Silver,Ev Williams,Clayton Christensen,Lynn Cherny,Bryce Roberts,Satya Nadella,Emily Chang,Vivek Wadhwa,Mike Bostock
9,Bob Sutton,Joi Ito,Aaron Levie,Kevin Kelly,John Foreman,michael abbott,Josh Elman,Paul Buchheit,Larry Gadea,Hadley Wickham


In [31]:
# Spectral clustering | Top-influencers in Social-Circles

dummy_df = pd.DataFrame()

for i in range(n_clusters):
    
    nodes_in_cluster = list( clusters_df [clusters_df['spectral'] == i ]['names'] )     
    if len(nodes_in_cluster) >= 10:            # keep only clusters which are big enough, i.e. size >= 10
        col_name           = str(i) + " : " + str(len(nodes_in_cluster)) + " Ids"
        dummy_df[col_name] = nodes_in_cluster[:10]
        
print "*** Spectral Clustering | Cluster-wise Top-influencers [Cluster-number : # of users in the cluster] *** "
dummy_df

*** Spectral Clustering | Cluster-wise Top-influencers [Cluster-number : # of users in the cluster] *** 


,0 : 13 Ids,1 : 28 Ids,2 : 19 Ids,4 : 22 Ids,5 : 57 Ids,6 : 212 Ids,7 : 15 Ids,8 : 12 Ids,9 : 18 Ids
0,Luke Wroblewski,Tim O'Reilly,Sam Altman,Wes McKinney,Hilary Mason,Vinod Khosla,WIRED,Marc Andreessen,Hunter Walk
1,Daniel Burka,Om Malik,Drew Houston,Sean J. Taylor,dj patil,Benedict Evans,John Maeda,Elon Musk,Dave McClure
2,Julie Zhuo,Bill Gates,Patrick Collison,Olivier Grisel,Jeff Hammerbacher,Tim Cook,Clayton Christensen,Fred Wilson,Josh Kopelman
3,Braden Kowitz,Kara Swisher,Bret Taylor,John Myles White,Monica Rogati,Nate Silver,Tom Hulme,Reid Hoffman,Josh Elman
4,Twitter Design,Ev Williams,Ron Conway,John D. Cook,Drew Conway,Satya Nadella,MIT Media Lab,Aaron Levie,Keith Rabois
5,Josh Brewer,marissamayer,Brian Chesky,Ryan Rosario,Peter Skomoroch,Tony Fadell,IDEO,dick costolo,Shervin Pishevar
6,GV Design,TechCrunch,Adam D'Angelo,Lynn Cherny,Google Research,Sarah Lacy,Stanford d.school,Paul Graham,Gabe Rivera
7,Jon Wiley,John Doerr,brian pokorny,Fernando Perez,Nathan Yau,Y Combinator,Co.Design,Chris Dixon,Liz Gannes
8,brynn evans,Eric Schmidt,Paul Buchheit,Guido van Rossum,Mike Bostock,Matt Mullenweg,Diego Rodriguez,Ben Horowitz,Naval Ravikant
9,John Zeratsky,Steven Levy,Andrew Mason,Scientific Python,Hadley Wickham,Jeremy Stoppelman,Tim Brown,Chris Sacca,jason


In [30]:
# AffinityPropagation | Top-influencers in Social-Circles

dummy_df = pd.DataFrame()

for i in range(n_clusters_affinity):
    
    nodes_in_cluster = list( clusters_df [clusters_df['affinity'] == i ]['names'] )     
    if len(nodes_in_cluster) >= 10:            # keep only clusters which are big enough, i.e. size >= 10
        col_name           = str(i) + " : " + str(len(nodes_in_cluster)) + " Ids"
        dummy_df[col_name] = nodes_in_cluster[:10]
        
print "*** Affinity Propagation | Cluster-wise Top-influencers [Cluster-number : # of users in the cluster] *** "
dummy_df

*** Affinity Propagation | Cluster-wise Top-influencers [Cluster-number : # of users in the cluster] *** 


,5 : 14 Ids,7 : 13 Ids,9 : 12 Ids,39 : 14 Ids,46 : 55 Ids,67 : 51 Ids,131 : 10 Ids,146 : 37 Ids
0,Chris Anderson,Elon Musk,WIRED,Kara Swisher,Tim O'Reilly,Fred Wilson,Drew Houston,Marc Andreessen
1,Nick Bilton,Ben Horowitz,Dave McClure,Walt Mossberg,Bill Gates,Aaron Levie,Hacker News,Max Levchin
2,Megan Smith,Nate Silver,Naval Ravikant,dj patil,Ev Williams,Paul Graham,Tom Hulme,Mitch Kapor
3,Aziz Ansari,Gabe Rivera,Bret Taylor,michael abbott,dick costolo,Chris Dixon,Andrew Ng,John Maeda
4,Josh Wills,Liz Gannes,Jessica Verrilli,Charlie Cheever,marissamayer,TechCrunch,Edd Dumbill,Kevin Rose
5,Sean Ellis,Re/code,Jonah Peretti,Werner Vogels,Chris Sacca,Bill Gurley,Xavier Amatriain,Hilary Mason
6,Twitter Design,Jeff Jordan,Mark Suster,Jeremy Howard,John Doerr,Tim Cook,Alice Zheng,jason
7,Jason Goldberg,Garry Tan,Charlie Rose,Seth Godin,Steven Levy,Joi Ito,RStudio,Matt Mullenweg
8,Trey Causey,Edward Tufte,Semil,DHH,Sam Altman,Satya Nadella,kate matsudaira,John Carmack
9,Matias Duarte,Jay Kreps,Ben Thompson,Anthony Goldbloom,Biz Stone,Bradley Horowitz,Yahoo Labs,Brian Chesky


Above is the output from the KMeans, Spectral and AffinityPropagation clustering algorithms, run on the input data of nodes and their followers within the ego-network. Affinity Propagation identified 163 clusters, only 8 of which have 10 or more members, so its clusters are not as meaningful for this problem, whereas the KMeans and Spectral algorithms automatically discovered meaningful clusters. Further, the clusters from Spectral clustering appear to be more discriminative with respect to each other (in terms of similarities within clusters and differences across clusters) than the ones from KMeans clustering.

Clustering Algorithms in detail             :  http://scikit-learn.org/stable/modules/clustering.html

Interactive/visual exploration of Clusters  : https://balaca.shinyapps.io/Twitter_Graph
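
The comparison above is made by eye. One possible way to put a number on "similar within, different across" is to compare the average Jaccard similarity of follower sets for pairs inside the same cluster against pairs in different clusters. This is a sketch under that assumption and not something computed in this notebook; the function names and the toy data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / float(len(union)) if union else 0.0

def cohesion(followers, labels):
    """Mean Jaccard similarity of follower sets for same-cluster and cross-cluster pairs."""
    within, across = [], []
    for u, v in combinations(sorted(followers), 2):
        bucket = within if labels[u] == labels[v] else across
        bucket.append(jaccard(followers[u], followers[v]))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(within), mean(across)

# Hypothetical data: who follows each node, and a candidate cluster label per node.
followers = {"n1": {"a", "b", "c"}, "n2": {"a", "b"}, "n3": {"x", "y"}, "n4": {"x", "y", "z"}}
labels = {"n1": 0, "n2": 0, "n3": 1, "n4": 1}
within, across = cohesion(followers, labels)
print("within: %.3f, across: %.3f" % (within, across))
```

A larger gap between the two averages would suggest better-separated clusters; running the same check on the 'kmeans' and 'spectral' label columns would give one rough basis for comparing them.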

In [23]:
# takes ids_of_interest, aggregates + counts their friends, and returns who_to_follow list as per requested length

def discover_Friends_toFollow(ids_of_interest, prune_factor = 1, count = 50):    
    
    ids_of_interest  = ids_of_interest[:int(len(ids_of_interest) * prune_factor)]
    print "'who-to-follow' list after looking at:%3d friends' records" %(len(ids_of_interest))
    
    with open(fof_filename) as f:
        reader = csv.reader(f)
        fof_dict = {row[0]:row[1:] for row in reader}  # dict of node:her_friends (Ids the node follows)

    friendsToFollow = []
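    for user_id in ids_of_interest:                    # 'user_id' avoids shadowing the built-in id()
        friendsToFollow += list (set(fof_dict.get(str(user_id), []))  - set(index) )  # skip nodes missing from the file, e.g. 'ego'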

    friendsToFollow = Counter(friendsToFollow).most_common(count)    
    topFriendsIds = [i for i,j in friendsToFollow]
    topFriendsFreq = [j for i,j in friendsToFollow]
           
    topFriendsToFollowDF = pd.DataFrame()
    topFriendsToFollowDF['Ids'], topFriendsToFollowDF['Freq'] = topFriendsIds, topFriendsFreq
    topFriendsToFollowDF['screenNames'], topFriendsToFollowDF['names'] = lookup_in_cache(topFriendsIds)
    
    topFriendsToFollowDF = topFriendsToFollowDF.set_index('Ids', drop = True)          
    return topFriendsToFollowDF    
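
The counting step at the heart of the function above can be shown on toy data: pool the friends of the chosen nodes, drop anyone already in the ego-network, and rank the rest by how often they appear. The names 'who_to_follow', 'friends_of' and the ids below are hypothetical.

```python
from collections import Counter

def who_to_follow(friends_of, ids_of_interest, already_following, count=3):
    """Rank accounts outside the ego-network by how many nodes of interest follow them."""
    known = set(already_following)
    tally = Counter()
    for user_id in ids_of_interest:
        tally.update(set(friends_of.get(user_id, [])) - known)
    return tally.most_common(count)

# Hypothetical follow lists for three alters; 'p', 'q' and 'r' are outside the ego-network.
friends_of = {"a": ["b", "p", "q"], "b": ["a", "p"], "c": ["p", "q", "r"]}
suggestions = who_to_follow(friends_of, ["a", "b", "c"], already_following=["ego", "a", "b", "c"])
print(suggestions)
```

Account 'p' is followed by all three alters, so it comes first, which mirrors the 'Freq' column in the tables below.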

In [24]:
# 'who_to_follow' on the basis of meaningful-circles discovered
#  e.g. 'hmason' for 'data-science'

clusterNo = clusters_df [ clusters_df['screenNames'] == "hmason" ]['spectral']
clusterNo = list(clusterNo)[0]

favorite_cluster_df = clusters_df [clusters_df['spectral'] == clusterNo ]
favorite_cluster_list = list(favorite_cluster_df.index)

print "*** Explore the specific cluster (Spectral-Clustering) of interest ***"
discover_Friends_toFollow(favorite_cluster_list,1,20)  # recommend 20 Users to follow (who_to_follow)!

*** Explore the specific cluster (Spectral-Clustering) of interest ***
'who-to-follow' list after looking at: 57 friends' records


Ids,Freq,screenNames,names
18367054,31,medriscoll,Michael E. Driscoll
2981431,23,acroll,Alistair Croll
14595061,23,ChrisDiehl,Chris Diehl
14642896,22,petewarden,Pete Warden
20167623,22,kdnuggets,Gregory Piatetsky
3030922321,21,DJ44,DJ Patil
214272214,20,sinanaral,Sinan Aral
823957466,20,hannawallach,Hanna Wallach
43186378,20,CMastication,JD Long
18204430,20,sgourley,Sean Gourley


In [25]:
print "*** Explore the complete Ego-Network of given-user***"
discover_Friends_toFollow(clusters_df.index , 0.25, 20) # discover 20 new Users (who_to_follow) after looking at Top 25% influencers in ego-network!

*** Explore the complete Ego-Network of given-user***
'who-to-follow' list after looking at:100 friends' records


Ids,Freq,screenNames,names
12,71,jack,Jack
37570179,56,arrington,Michael Arrington
14600116,56,johnbattelle,John Battelle
9729502,55,travisk,travis kalanick
5017,55,joshu,joshua schachter
5702,55,Caterina,Caterina Fake
14202711,53,mattcohler,Matt Cohler
652193,52,mgsiegler,M.G. Siegler
378223565,51,sparker,Sean Parker
5699,51,stewart,Stewart Butterfield
