<span style='color:LightSeaGreen'>Description</span>

In this project we build a social network modeled on Instagram. The network is a directed graph, since a user can follow someone who does not follow them back. Connections are also of two kinds, best friends and regular connections, so the graph is both directed and weighted.  
The goal is to implement a few functions on the graph and the social network:  
- Show a user's number of followers
- Show the number of people a user follows
- Order the Stories list: best friends first, then regular connections, each group in alphabetical order -> [best friends in alphabetical order, friends in alphabetical order]
- Find the top k influencers, that is, the k people with the most followers in the network
- Find a path between two people in the network ✔
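
As a rough illustration of the structure described above, a directed, weighted graph can be kept as a dictionary of dictionaries. The usernames below are made up, and the weight convention (2 for a best friend, 1 for a regular connection) is an assumption of this sketch.

```python
# Toy directed, weighted graph: follower -> {followed user: weight}
# Assumption for this sketch: weight 2 = best friend, weight 1 = regular connection.
toy_graph = {
    'ana': {'bia': 2, 'caio': 1},
    'bia': {'ana': 1},
    'caio': {},
}

# Directed: 'ana' follows 'caio', but 'caio' does not follow 'ana' back
ana_follows_caio = 'caio' in toy_graph['ana']
caio_follows_ana = 'ana' in toy_graph['caio']
print(ana_follows_caio, caio_follows_ana)
```

Storing the weight on the edge is what lets a single structure answer both "who does this user follow?" and "which of them are best friends?".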

# Table of contents
1. [Environment imports](#imports)
2. [Reading Files](#read)
3. [Graph object](#graphobj)
    - [Testing](#testing)
4. [Creating the connection graph](#datagraph)
5. [Show number of followers](#followers)
6. [Show number of following users](#following)
7. [Order Stories](#stories)
8. [Show top K influencers](#topk)
9. [Testing](#methods)
    - [Test 1](#test1)
    - [Test 2](#test2)
    - [Test 3](#test3)
    - [Test 4](#test4)
    - [Test 5](#test5)

## <span style='color:LightSeaGreen'>Environment imports</span> <a id='imports'></a>

In [388]:
import math
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


## <span style='color:LightSeaGreen'>Reading files</span> <a id='read'></a>

In [389]:
# show files in data directory
!dir data


 Volume in drive C is BOOTCAMP
 Volume Serial Number is B076-BC51

 Directory of c:\Users\joaob\OneDrive\Documents\GitHub\LetsCode\Modulo 2 - Estrutura de Dados\Projetos\data

23/09/2021  14:24    <DIR>          .
23/09/2021  14:24    <DIR>          ..
21/09/2021  19:43            36.538 conexoes.csv
21/09/2021  19:30             1.893 usuarios.csv
               2 File(s)         38.431 bytes
               2 Dir(s)  48.182.673.408 bytes free


In [390]:
# read csv files in a pandas data frame
connections = pd.read_csv('data/conexoes.csv', header=None)
users = pd.read_csv('data/usuarios.csv', header=None)
# rename data frame titles
connections.columns = ['follower','following','weight']
users.columns = ['Name','username']

### <span style='color:LightGreen'>User dataframe info</span>  

In [391]:
print(users.info())
users.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Name      100 non-null    object
 1   username  100 non-null    object
dtypes: object(2)
memory usage: 1.7+ KB
None


|   | Name | username |
|---|------|----------|
| 0 | Helena | helena42 |
| 1 | Alice | alice43 |
| 2 | Laura | laura29 |
| 3 | Manuela | manuela19 |
| 4 | Valentina | valentina26 |


### <span style='color:LightGreen'>Connections dataframe info</span>  

In [392]:
print(connections.info())
connections.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1627 entries, 0 to 1626
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   follower   1627 non-null   object
 1   following  1627 non-null   object
 2   weight     1627 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 38.3+ KB
None


|   | follower | following | weight |
|---|----------|-----------|--------|
| 0 | helena42 | alice43 | 1 |
| 1 | helena42 | gustavo16 | 1 |
| 2 | helena42 | ana_clara30 | 1 |
| 3 | helena42 | mariana5 | 1 |
| 4 | helena42 | caua11 | 1 |


In [393]:
# sanity checks: usernames must be unique, and weights must only take the expected values
n_unique = users['username'].nunique()
print(f'Unique username values: {n_unique} {n_unique == len(users)}')
print(f'Connections weight values: {connections.weight.unique()}')
# plotting
#fig, ax = plt.subplots(figsize=(12, 6))
#ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right')
#sns.barplot(x = 'username', y = 'following', data = following_count, ax=ax)
#plt.show()

Unique username values: 100 True
Connections weight values: [1 2]


## <span style='color:LightSeaGreen'>Creating Graph object</span> <a id='graphobj'></a>

In [394]:
class Graph():

    def __init__(self) -> None:
        self.adjMatrix = {}

    def addNode(self, name):
        self.adjMatrix[name] = {}

    def connectNode(self, origin, destiny, weight=1):
        self.adjMatrix[origin][destiny] = weight

    # BFS algorithm to find path
    def showPath(self, origin, destiny):
        queue = [origin]
        visited = []
        predecessor = {origin: None}
        
        while len(queue) > 0:
            currentNode = queue.pop(0)
            visited.append(currentNode)
            for adjacent in self.adjMatrix[currentNode].keys():

                if adjacent not in queue + visited:
                    predecessor[adjacent] = currentNode
                    queue.append(adjacent)

                if adjacent == destiny:
                    path = [destiny]
                    while currentNode is not None:
                        path.append(currentNode)
                        currentNode = predecessor[currentNode]
                    path.reverse()
                    result = ['->'] * (len(path) * 2 - 1)
                    result[0::2] = path
                    fullPath = ' '.join([str(elem) for elem in result])
                    originName = users.loc[users['username']==origin].Name.item()
                    destinyName = users.loc[users['username']==destiny].Name.item()
                    return f'Path from {originName} to {destinyName}: {fullPath}'

        return False

    def djikstra(self, origin):
        # Dijkstra's algorithm: shortest weighted distance from origin to every node
        # start every node at infinite distance, except the origin
        distance = {node: math.inf for node in self.adjMatrix}
        distance[origin] = 0

        # predecessor of each node on its shortest path (None = no predecessor found)
        previous = {node: None for node in self.adjMatrix}

        know = []                   # nodes already visited
        not_know = distance.copy()  # nodes still to visit, with their current distance

        while len(not_know) > 0:
            # pick the pending node with the smallest known distance
            k = min(not_know, key=not_know.get)
            # remove it from the pending nodes and mark it as visited
            del not_know[k]
            know.append(k)

            for adj in self.adjMatrix[k]:
                # relax only edges that lead to nodes not visited yet
                if adj not in know:
                    newDist = distance[k] + self.adjMatrix[k][adj]
                    if newDist < distance[adj]:
                        distance[adj] = newDist
                        not_know[adj] = newDist
                        previous[adj] = k

        return distance, previous
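
A predecessor dictionary like the one Dijkstra's algorithm returns can be walked backwards to recover a readable path. The sketch below is only an illustration: `rebuild_path` is a hypothetical helper, not part of the `Graph` class, and it assumes each node maps to the node before it, with `None` for nodes that have no predecessor.

```python
def rebuild_path(previous, origin, destiny):
    """Walk the predecessor map backwards from destiny until origin is reached."""
    path = [destiny]
    node = destiny
    while node != origin:
        node = previous.get(node)
        if node is None:  # ran out of predecessors: destiny is unreachable
            return None
        path.append(node)
    return ' -> '.join(reversed(path))

# hypothetical predecessor map, shaped like the result of a run starting at 'a'
toy_previous = {'a': None, 'b': 'a', 'c': 'b', 'd': None}
print(rebuild_path(toy_previous, 'a', 'c'))  # a -> b -> c
print(rebuild_path(toy_previous, 'a', 'd'))  # None
```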


## <span style='color:LightSeaGreen'>Creating the connection graph</span> <a id='datagraph'></a>

In [395]:
# create graph object instance
network = Graph()

# add unique users by username ID
for user in users['username']:
    network.addNode(user)

# connect the users
for row in connections.itertuples(index=False):
    # each row is a namedtuple with the fields follower, following and weight
    network.connectNode(origin=row.follower, destiny=row.following, weight=row.weight)

#print('Graph Matrix:')
#print(network.adjMatrix)

## <span style='color:LightSeaGreen'>Show number of followers</span> <a id='followers'></a>

In [396]:
def showFollowersNumber(username):
    name = users.loc[users['username']==username].Name.item()
    # count rows where the user is being followed; gives 0 (not a KeyError) for users without followers
    followers = (connections['following']==username).sum()
    return f'{name} has {followers} followers.'

## <span style='color:LightSeaGreen'>Show number of following users</span> <a id='following'></a>

In [397]:
def showFollowingNumber(username):
    name = users.loc[users['username']==username].Name.item()
    # count rows where the user is the follower; gives 0 (not a KeyError) for users who follow nobody
    following = (connections['follower']==username).sum()
    return f'{name} is following {following} users.'
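
Both counts can also be read off the graph itself: the number of users someone follows is the node's out-degree, and the number of followers is its in-degree. The sketch below uses a made-up adjacency dictionary; `count_following` and `count_followers` are hypothetical helper names, not functions from this notebook.

```python
# toy adjacency dictionary: follower -> {followed user: weight}
toy_adj = {
    'ana': {'bia': 2, 'caio': 1},
    'bia': {'ana': 1},
    'caio': {'ana': 1},
}

def count_following(adj, username):
    # out-degree: number of outgoing edges of the node
    return len(adj[username])

def count_followers(adj, username):
    # in-degree: number of nodes whose outgoing edges include this user
    return sum(1 for followed in adj.values() if username in followed)

print(count_following(toy_adj, 'ana'), count_followers(toy_adj, 'ana'))  # 2 2
```

The in-degree needs a pass over every node, which is one reason counting on the connections table can be more convenient.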

## <span style='color:LightSeaGreen'>Order Stories</span> <a id='stories'></a>

In [None]:
connections.loc[connections['follower']=='helena42'].sort_values(by=['weight','following'], ascending=[False,True])['following']
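
The same ordering can be sketched on an adjacency dictionary instead of the data frame. This is a minimal illustration with made-up usernames: the `stories` signature is an assumption, as is the convention that a higher weight means a best friend.

```python
# toy adjacency dictionary: follower -> {followed user: weight}
toy_adj = {
    'ana': {'duda': 1, 'caio': 2, 'bia': 1},
    'bia': {'ana': 1},
}

def stories(adj, username):
    """Followed users ordered by weight (highest first), then alphabetically."""
    followed = adj[username]
    # negating the weight sorts it in descending order while names stay ascending
    return sorted(followed, key=lambda name: (-followed[name], name))

print(stories(toy_adj, 'ana'))  # ['caio', 'bia', 'duda']
```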


## <span style='color:LightSeaGreen'>Function for top K influencers</span> <a id='topk'></a>

In [398]:
def findInfluencers(number):
    following = connections.following.sort_values()
    following_count = following.value_counts().to_frame().reset_index()
    following_count.columns = ['username','followers']
    print(f'Top {number} Influencers:')
    print(following_count[:number])
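
For comparison, a version that returns a dictionary instead of printing can be sketched with `collections.Counter` from the standard library. The edge list is made up, and `top_influencers` here is a hypothetical helper that takes the edges as an argument.

```python
from collections import Counter

# toy edge list: (follower, followed user)
toy_edges = [('ana', 'bia'), ('caio', 'bia'), ('duda', 'bia'),
             ('ana', 'caio'), ('bia', 'caio'), ('bia', 'ana')]

def top_influencers(edges, k):
    """Count followers per user and return the k most followed as a dict."""
    follower_counts = Counter(followed for _, followed in edges)
    return dict(follower_counts.most_common(k))

print(top_influencers(toy_edges, 2))  # {'bia': 3, 'caio': 2}
```

Users with the same number of followers come out in the order they were first counted, so the order among ties may differ from what a pandas-based count shows.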

### Drafts <a id='testing'></a>

## <span style='color:LightSeaGreen'>Testing</span> <a id='methods'></a>

### <span style='color:LightGreen'>1) Show number of followers</span> <a id='test1'></a>

quantidade_seguidores('helena42') --> Helena's followers: 18

In [399]:
showFollowersNumber('helena42')

'Helena has 18 followers.'

### <span style='color:LightGreen'>2) Show the number of people the user follows</span> <a id='test2'></a>

quantidade_seguindo('helena42') --> People Helena follows: 16

In [400]:
showFollowingNumber('helena42')

'Helena is following 16 users.'

### <span style='color:LightGreen'>3) Order the Stories list</span> <a id='test3'></a>

stories('helena42') --> Helena's stories order:  
 ['ana_julia22', 'pietro33', 'alice43', 'ana_clara30', 'calebe49', 'caua11', 'davi48', 'gustavo16', 'heloisa37', 'lavinia36','mariana5', 'matheus6', 'melissa42', 'nicolas4', 'rafael38', 'sophia31']

In [1]:
#stories('helena42')
connections.loc[connections['follower']=='helena42'].sort_values(by=['weight','following'], ascending=[False,True])['following']

NameError: name 'connections' is not defined

### <span style='color:LightGreen'>4) Find the top k influencers</span> <a id='test4'></a>

top_influencers(5) --> Top influences: {'maria_alice19': 24, 'henrique12': 22, 'miguel1': 22, 'isis3': 22, 'alice43': 22}


In [401]:
findInfluencers(5)

Top 5 Influencers:
        username  followers
0  maria_alice19         24
1        alice43         22
2        miguel1         22
3          isis3         22
4     henrique12         22


### <span style='color:LightGreen'>5) Find a path between two people in the network</span> <a id='test5'></a>

In [402]:
network.showPath('helena42', 'isadora45')

'Path from Helena to Isadora: helena42 -> ana_clara30 -> isadora45'