# Visualizing shared followers at Brainhack Warsaw

In [46]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
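# Import only the two helpers this notebook uses instead of a wildcard import
from preprocess_data import load_brainhack_warsaw_data, compute_sparse_matrix_of_followers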

## Data loading
Our data comes as a dictionary: each key is the username of a follower of the Brainhack Warsaw account, and its value is a list of that user's own followers.
As the simplest basis for a similarity measure between Twitter users, we compute a matrix of binary indicators: each entry indicates whether a user (given by the row index) is followed by another user (the _followers_ are given by the column index).

In [47]:
data_dict = load_brainhack_warsaw_data()
sparse_mat, vocabulary = compute_sparse_matrix_of_followers(data_dict)
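To make the indicator matrix concrete, here is a minimal sketch on toy data. The function `build_follower_matrix` and the usernames are hypothetical and only illustrate the idea; the actual `compute_sparse_matrix_of_followers` helper may differ, for instance by returning a SciPy sparse matrix instead of a dense array.

```python
import numpy as np


def build_follower_matrix(followers_by_user):
    """Hypothetical helper: binary user-by-follower indicator matrix."""
    users = sorted(followers_by_user)
    # Vocabulary: every distinct follower name, mapped to a column index.
    all_followers = sorted({f for fs in followers_by_user.values() for f in fs})
    vocabulary = {name: col for col, name in enumerate(all_followers)}
    matrix = np.zeros((len(users), len(vocabulary)), dtype=np.int64)
    for row, user in enumerate(users):
        for follower in followers_by_user[user]:
            # 1 means: the user in this row is followed by the user in this column.
            matrix[row, vocabulary[follower]] = 1
    return matrix, users, vocabulary


# Toy data (made up): each key is a user, each value the list of its followers.
toy_data = {
    "alice": ["carol", "dave"],
    "bob": ["dave", "erin"],
}
toy_matrix, toy_users, toy_vocabulary = build_follower_matrix(toy_data)
print(toy_matrix)
```

Each row corresponds to one key of the dictionary, and each column to one distinct follower name.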

## The simplest similarity
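We now compute how many followers any two users share by multiplying the indicator matrix with its transpose: entry (i, j) of the product counts the followers shared by users i and j, and the diagonal holds each user's own follower count.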

In [38]:
# `toarray()` yields a plain ndarray rather than the legacy `np.matrix` returned by `todense()`
shared_followers = sparse_mat.dot(sparse_mat.T).toarray()
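The following small example, on a made-up 3x3 indicator matrix, illustrates why the product with the transpose counts shared followers. The variable names are chosen for illustration only.

```python
import numpy as np

# Rows are users, columns are potential followers (toy values).
indicator = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
])

# The dot product of two binary rows counts the columns where both are 1,
# i.e. the followers the two users have in common.
shared = indicator @ indicator.T
print(shared)

# The diagonal is each user's dot product with itself: its own follower count.
print(np.diag(shared))
```

For binary rows the diagonal equals the row sums of the indicator matrix, which is why it can serve as a per-user follower count.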

## Embedding users in a two-dimensional space
We now embed all followers of Brainhack Warsaw in a two-dimensional space.
For this we use dimensionality reduction to project the users into a space in which users who share many followers lie closer together than users who share few.

In [45]:
sparse_mat

<85x85 sparse matrix of type '<class 'numpy.int64'>'
	with 7225 stored elements in Compressed Sparse Row format>

In [32]:
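import umap
# NOTE: metric='precomputed' interprets the input as pairwise distances,
# whereas shared_followers holds similarity counts.
shared_embedding = umap.UMAP(n_components=2, metric='precomputed').fit_transform(shared_followers)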

  "n_neighbors is larger than the dataset size; truncating to "


TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
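Because `metric='precomputed'` expects distances, one possible way to prepare the input is to turn the shared-follower counts into a distance matrix first. The sketch below uses the Jaccard distance as an assumed conversion; it is not the notebook's original method, and the function name `shared_counts_to_jaccard_distance` is hypothetical.

```python
import numpy as np


def shared_counts_to_jaccard_distance(shared_counts):
    """Hypothetical helper: convert shared-follower counts to Jaccard distances."""
    shared = np.asarray(shared_counts, dtype=float)
    own = np.diag(shared)  # each user's own follower count
    # |A union B| = |A| + |B| - |A intersect B|
    union = own[:, None] + own[None, :] - shared
    # Jaccard similarity; pairs with an empty union get similarity 0.
    similarity = np.divide(shared, union, out=np.zeros_like(shared), where=union > 0)
    distance = 1.0 - similarity
    np.fill_diagonal(distance, 0.0)  # a user is at distance 0 from itself
    return distance


# Toy shared-follower matrix (made-up values).
toy_shared = np.array([
    [2, 1, 2],
    [1, 2, 2],
    [2, 2, 3],
])
toy_distance = shared_counts_to_jaccard_distance(toy_shared)
print(toy_distance)
```

The resulting matrix could then be passed to `fit_transform` in place of the raw counts. For very small datasets, passing `init='random'` to `umap.UMAP` is a commonly suggested way to sidestep the spectral initialization, which appears to be the source of the `TypeError` above; whether that applies here depends on the installed UMAP version.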

## Visualizing the space
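We now plot the two-dimensional embedding as a scatter plot, scaling each marker by the user's own follower count (the diagonal of the shared-follower matrix).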

In [None]:
# matplotlib's function is `plt.scatter`; marker area `s` is set to each user's follower count
plt.scatter(shared_embedding[:, 0], shared_embedding[:, 1], s=np.diag(shared_followers))