# Introduction

This notebook provides a step-by-step guide to conducting Organizational Network Analysis (ONA) using Python. ONA is a valuable tool for understanding the relationships and communication patterns within an organization, allowing leaders to optimize collaboration, streamline communication, and enhance organizational effectiveness.

**Contents:**
* Load Libraries: Import necessary Python libraries for data manipulation, network analysis, and visualization.

* Create Dummy Data: Generate synthetic data representing employees and their interactions within the organization. This includes creating lists of first names, last names, teams, and two tests with their scores, randomly combining names to create pairs of interacting employees, and assigning interaction statuses.

* Format Dummy Data: Filter the dummy data to include only interactions with a status of "TRUE", and add overall profile scores for both ego and alter.

* Ego and Alter: Preparation steps for creating an ego network.

* Visualising the network: Display the network.

* Add additional layers of insight into the network: Identify the employee with the highest centrality value, representing the individual with the most influence in the organization.

* Recommendation Analysis: Dig deeper into the analysis by applying a few more transformations and calculations to make communication recommendations.

By following this notebook, users can gain valuable insights into the structure of their organization's social network and identify key players who drive communication and collaboration. These insights can inform strategic decisions related to leadership development, team dynamics, and organizational communication strategies.







# Preparation

## Load libraries


In [4]:
import igraph
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import random
import itertools
from statistics import mean

## Create dummy data

In [5]:
#Write sample data
firstnames = ["Maria", "Nushi", "Mohammed", "Jose", "Muhammad", "Mohamed", "Wei", "Mohammad", "Ahmed", "Yan", "Ali", "John", "David", "Li", "Abdul", "Ana", "Ying", "Michael", "Juan", "Anna", "Mary", "Jean", "Robert", "Daniel", "Luis", "Carlos", "James", "Antonio", "Joseph", "Hui", "Elena", "Francisco", "Hong", "Marie", "Min", "Lei", "Yu", "Ibrahim", "Peter", "Fatima", "Aleksandr", "Richard", "Xin", "Bin", "Paul", "Ping", "Lin", "Olga", "Sri", "Pedro", "William", "Rosa", "Thomas", "Jorge", "Yong", "Elizabeth", "Sergey", "Ram", "Patricia", "Hassan", "Anita", "Manuel", "Victor", "Sandra", "Ming", "Siti", "Miguel", "Emmanuel", "Samuel", "Ling", "Charles", "Sarah", "Mario", "Joao", "Tatyana", "Mark", "Rita", "Martin", "Svetlana", "Patrick", "Natalya", "Qing", "Ahmad", "Martha", "Andrey", "Sunita", "Andrea", "Christine", "Irina", "Laura", "Linda", "Marina", "Carmen", "Ghulam", "Vladimir", "Barbara", "Angela", "George", "Roberto", "Peng", "Ivan", "Alexander", "Ekaterina", "Qiang", "Yun", "Jesus", "Susan", "Sara", "Noor", "Mariam", "Dmitriy", "Eric", "Zahra", "Fatma", "Fernando", "Esther", "Jin", "Diana", "Mahmoud", "Chao", "Rong", "Santosh", "Nancy", "Musa", "Anh", "Omar", "Jennifer", "Gang", "Yue", "Claudia", "Maryam", "Gloria", "Ruth", "Teresa", "Sanjay", "Na", "Nur", "Kyaw", "Francis", "Amina", "Denis", "Stephen", "Sunil", "Gabriel", "Andrew", "Eduardo", "Abdullah", "Grace", "Anastasiya", "Mei", "Rafael", "Ricardo", "Christian", "Aleksey", "Steven", "Gita", "Frank", "Jianhua", "Mo", "Karen", "Masmaat", "Brian", "Christopher", "Xiaoyan", "Rajesh", "Mustafa", "Eva", "Bibi", "Monica", "Oscar", "Andre", "Catherine", "Kai", "Ramesh", "Liping", "Sonia", "Anthony", "Mina", "Manoj", "Ashok", "Rose", "Alberto", "Ning", "Rekha", "Chen", "Lan", "Aung", "Alex", "Suresh", "Anil", "Fatemeh", "Julio", "Zhen", "Simon", "Paulo", "Juana", "Irene", "Adam", "Kevin", "Lori"]

lastnames = ["Smith", "Ali", "Kim", "Khan", "Lee", "Tan", "Ahmed", "Lopez", "Awad", "Singh", "Perez", "Reyes", "Li", "Nkosi", "Soto", "Saleh", "Saidi", "Zhang", "Diaz", "Saad", "Kamwi", "Mbeki", "Finai", "Izaia", "Oubi", "Ashoo", "Sirmi", "Saeh", "Giam", "Voe", "Qoe", "Gihon,", "Shora", "Bhut", "Denil", "Rezo", "Cirss", "Mohid", "Zebu", "Vynek", "Zebul", "Haci", "Nelso", "Wuade", "Rodriguez", "Garcia", "Gonzalez", "Hassan", "Mohamed", "Castillo", "DeReygosa", "Macandie", "Yakobovitz", "Carolay", "Galvee", "Tydd,", "Tyreman", "Nawton", "OFallon", "Knobell", "Roisen", "Schwieso", "Keirl", "Vyel", "Kinnock", "Odempsey", "Wharmby", "Riddoch", "Lowres", "Venny", "Semens", "Leland", "Garratty", "OHearn", "Pilipyak", "Mityushin", "Wolledge", "Loughton", "Klausewitz", "Reymers", "Tucknutt", "Littleproud", "Huxster", "Mccrachen", "Jacquet-francillon", "ElArrasi", "Bs", "Klemt", "Mulliss", "Mackleden", "Moretonas", "Haoxiang", "Kokemohr", "Sterricker", "Minchenton", "Tuffrey", "Truder", "Harichane", "Pagden", "Mallows"]

teams = ["Product", "Marketing", "Data Science", "Psychology", "Content", "Finance", "Sales", "Customer Success"]

#Randomly generate a list of 300 full names from the first names and last names
#(built directly as a list; strip(",") removes the stray commas in entries such as "Gihon,")
names3 = [random.choice(firstnames) + " " + random.choice(lastnames).strip(",") for _ in range(300)]

#Create an edge list containing every possible pair of people from the list
edge_list = list(itertools.combinations(names3, 2))
interactions = pd.DataFrame(edge_list, columns = ['person_a', 'person_b'])

#Randomly assign whether these two people interact or not
interactions['interaction_random'] = np.random.randint(0, 2, interactions.shape[0])

#Create dataframe of individuals and teams based on names list and teams
res = {names3[i]: random.choice(teams) for i in range(len(names3))}
names_teams = pd.DataFrame.from_dict(res, orient='index', columns=['Team'])
names_teams['Names'] = names_teams.index

#Create test scores (one row of random answers per person in names3)
profile_scores = pd.DataFrame(names3, columns = ['ego'])
testB_scores_array = np.random.default_rng().uniform(low=-2.5,high=2.5, size=[len(names3),4])
testA_scores_array = np.random.default_rng().uniform(low=-4,high=4, size=[len(names3),6])
testB_scores = pd.DataFrame(testB_scores_array, columns = ['question_B1', 'question_B2', 'question_B3', 'question_B4'])
testA_scores = pd.DataFrame(testA_scores_array, columns = ['question_A1', 'question_A2', 'question_A3', 'question_A4', 'question_A5', 'question_A6'])
#Join both tests onto the names by row position
profile_scores = pd.merge(profile_scores, testB_scores, left_index=True, right_index=True)
profile_scores = pd.merge(profile_scores, testA_scores, left_index=True, right_index=True)
profile_scores

## Format dummy data

In [None]:
#Calculate a total profile score: sum the question scores, shift by 10, then rescale
col_list = ['question_B1', 'question_B2', 'question_B3', 'question_B4','question_A1', 'question_A2', 'question_A3', 'question_A4', 'question_A5', 'question_A6']
profile_scores['Score'] = (profile_scores[col_list].sum(axis=1))+10
profile_scores['Score'] = (profile_scores['Score'])/25.942696*5 #25.942696 is a hard-coded scaling constant
profile_scores = pd.DataFrame(profile_scores)

#Filter where interactions = TRUE(1); copy so the columns can be edited without a SettingWithCopyWarning
interactions_filtered = interactions[interactions['interaction_random'] == 1].copy()
#Drop interaction_random
del interactions_filtered["interaction_random"]
#Rename interactions_filtered columns
interactions_filtered.columns = ['ego','alter']

#Join the ego score and the alter score onto each interaction
profile_scores_filtered = profile_scores[['ego','Score']]
interactions_filtered = interactions_filtered.merge(profile_scores_filtered,on='ego')
interactions_filtered = interactions_filtered.merge(profile_scores_filtered,left_on='alter',right_on='ego')

#Format dataframe
del interactions_filtered['ego_y']
interactions_filtered.columns = ['ego','alter','ego_score','alter_score']

#Drop NA rows (blank or whitespace-only strings are converted to NaN first)
interactions_filtered = interactions_filtered.replace(r'^\s*$', float('NaN'), regex = True)
interactions_filtered = interactions_filtered.dropna()

interactions_filtered.head()

# Organisational Network Analysis

NetworkX is a Python package for studying the structure, dynamics, and functions of complex networks.
The employee influence network is a directed graph, i.e. an arrow from employee e1 to e2 means that e1 has some "influence" on e2.
There are many ways to compute centrality scores. Here eigenvector centrality is used, which measures node **influence** in the graph.

Note: the 'Score' values from the sample data are used to compute a weighted adjacency matrix, as opposed to simple 1's and 0's.
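
As a rough illustration of weighted eigenvector centrality on a directed graph, the sketch below uses a tiny hand-made network. The names, edges, and weights are invented for this example and are not taken from the notebook's data; `nx.eigenvector_centrality` with `weight='weight'` is one way to do this, not necessarily the exact calculation intended above.

```python
import networkx as nx

# Hypothetical toy network: an edge (u, v, w) means u influences v with weight w
toy_graph = nx.DiGraph()
toy_graph.add_weighted_edges_from([
    ("ana", "ben", 1.0),
    ("ben", "cara", 2.0),
    ("cara", "ana", 1.0),
    ("dev", "cara", 1.0),
    ("ana", "cara", 1.0),
    ("cara", "dev", 1.0),
])

# For directed graphs networkx scores a node by the centrality of the nodes pointing at it
centrality = nx.eigenvector_centrality(toy_graph, max_iter=1000, weight="weight")

# The person with the highest score is the most "influential" under this measure
most_influential = max(centrality, key=centrality.get)
print(most_influential, centrality)
```

In this toy network `cara` receives the most (and the heaviest) incoming edges, so she ends up with the highest score.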

## Ego and Alter
Preparation steps for creating an ego network.
In an ego network:
* *Ego* is a single entity or node, which is associated with an *Alter*
* *Alter* is the entity or node connected to that *Ego*
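
To make the ego/alter vocabulary concrete, here is a minimal sketch that extracts one person's ego network with `nx.ego_graph`. The people and edges are hypothetical and only serve the illustration.

```python
import networkx as nx

# Hypothetical directed interactions: (ego, alter)
toy_graph = nx.DiGraph()
toy_graph.add_edges_from([
    ("ana", "ben"),
    ("ana", "cara"),
    ("ben", "dev"),
    ("eli", "ana"),
])

# Ego network of "ana": ana plus everyone she points to directly (radius 1)
ana_network = nx.ego_graph(toy_graph, "ana", radius=1)
alters_out = sorted(n for n in ana_network if n != "ana")

# Ignoring edge direction also picks up people who point to ana
ana_network_any = nx.ego_graph(toy_graph, "ana", radius=1, undirected=True)
alters_any = sorted(n for n in ana_network_any if n != "ana")

print(alters_out)
print(alters_any)
```

Whether incoming edges should count as part of the ego network depends on the question being asked, which is why both variants are shown.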

In [None]:
#interactions_filtered already uses the ego and alter column names; copy it so the original is left unchanged
network_data = interactions_filtered.copy()

#Sort by ego to make things clearer
network_data.sort_values(by='ego',inplace=True)

display(network_data)

## Joining datasets

Add the names_teams data to network_data, for both ego and alter, to use later on.

In [None]:
##EGO
ego_names_teams = names_teams.copy()

#Rename columns to match the network data
ego_names_teams.rename(columns={'Team':'ego_team',
                                'Names':'ego'},
                       inplace=True)

#Join ego_names_teams dataframe to network_data
network_data = pd.merge(network_data,ego_names_teams,on='ego',how='left')

##ALTER
alter_names_teams = names_teams.copy()

#Rename columns to match the network data
alter_names_teams.rename(columns={'Team':'alter_team',
                                'Names':'alter'},
                       inplace=True)

#Join alter_names_teams dataframe to network_data
network_data = pd.merge(network_data,alter_names_teams,on='alter',how='left')

display(network_data)

## Visualising the network

In [None]:
#Use the networkx library to visualise the network
network_graph=nx.from_pandas_edgelist(network_data,              #This tells networkx that the dataframe can be read as a list of connections
                                      source='ego', 
                                      target='alter',
                                      create_using=nx.DiGraph()) #This tells networkx that connections are directed (from ego to alter)

#Look up each edge's ego_score in the order networkx stores the edges,
#so the colours line up with the drawn edges even if the dataframe has duplicate rows
score_by_edge = dict(zip(zip(network_data['ego'], network_data['alter']), network_data['ego_score']))

#Also use matplotlib to display the graph
plt.figure(figsize=(20,20)) #Change the default plot size
#limits=plt.axis('off')      #Get rid of the axis
nx.draw_networkx(network_graph,
                 arrows=True,
                 node_color='b',
                 edge_color = [score_by_edge[edge] for edge in network_graph.edges()],
                 edge_cmap=plt.cm.Blues)

Create a more insightful visualisation by:

* Sizing each node according to how often an individual is sought out by others, also known as *degree centrality*.
* Focusing specifically on *in-degree centrality*, which is based on the number of incoming connections.
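
The difference between a raw in-degree count and the normalised in-degree centrality can be seen on a small hypothetical graph. The sketch below only illustrates the two quantities; the names and edges are made up.

```python
import networkx as nx

# Hypothetical edges: (seeker, person being sought out)
toy_graph = nx.DiGraph()
toy_graph.add_edges_from([
    ("ben", "ana"),
    ("cara", "ana"),
    ("dev", "ana"),
    ("ana", "ben"),
])

# Raw count of incoming connections per node
in_degree = dict(toy_graph.in_degree())

# Same count divided by (number of nodes - 1), so values are comparable across graphs
in_centrality = nx.in_degree_centrality(toy_graph)

# One possible node-size rule: grow the marker with the in-degree
node_sizes = [100 * (1 + in_degree[node]) for node in toy_graph.nodes()]

print(in_degree)
print(in_centrality)
```

Either quantity can drive node size; the raw count is easier to read, while the normalised value is easier to compare between networks of different sizes.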

In [None]:
d = dict(network_graph.in_degree()) #This is a new variable that will be used to tell matplotlib how to size nodes

#Re-draw the graph
plt.figure(figsize=(20,20))
nx.draw_networkx(network_graph,
                 arrows=True,
                 node_color='b',
                 node_size= [v**2.5  for v in d.values()]) #This is a list comprehension
                                                           #It states for every value v in our new variable d, raise that value to the power of 2.5

Create a more insightful visualisation by:
* Adding in department/team data

In [None]:
#Make a *colour key* for networkx to work with
#Assign a numeric code for each team which will then be used to assign a colour to each node
hris = names_teams.set_index('Names')
hris = hris.reindex(network_graph.nodes())
hris['Team'] = pd.Categorical(hris['Team'])
hris['Team'].cat.codes

In [None]:
#Look up each edge's ego_score in the order networkx stores the edges, so colours line up with the drawn edges
score_by_edge = dict(zip(zip(network_data['ego'], network_data['alter']), network_data['ego_score']))

#Team names in the same order as their numeric codes
team_names = list(hris['Team'].cat.categories)
max_code = max(len(team_names) - 1, 1)

plt.figure(figsize=(20,20))
nx.draw_networkx(network_graph,
                 arrows=True,
                 node_color=hris['Team'].cat.codes,
                 cmap=plt.cm.viridis, vmin=0, vmax=max_code, #Fix the colour scale so the legend below matches the nodes
                 node_size= [v**2.5 for v in d.values()],
                 edge_color = [score_by_edge[edge] for edge in network_graph.edges()],
                 edge_cmap=plt.cm.ocean) 

#Use Patch to create a legend of teams, using the same colormap as the nodes
from matplotlib.patches import Patch

legend_elements = [Patch(facecolor=plt.cm.viridis(code / max_code), label=team)
                   for code, team in enumerate(team_names)]

plt.legend(handles=legend_elements,prop={'size': 10})

# Recommendation Analysis

Dig deeper into the analysis by applying a few more transformations and calculations to gain insight and make communication recommendations.

### Inbound Connections

Used as a measure of influence.

Gain insight into how many people seek out an individual employee.
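
When counting connections from an edge list, it matters which column is grouped on. In the hypothetical sketch below (invented names), grouping by `alter` gives the incoming count, while grouping by `ego` gives the outgoing count; the networkx degrees are computed only as a cross-check.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list: each row means "ego reaches out to alter"
edges = pd.DataFrame({
    "ego":   ["ben", "cara", "dev", "ana"],
    "alter": ["ana", "ana", "ana", "ben"],
})

outbound = edges.groupby("ego").size()    # rows where the person is the source
inbound = edges.groupby("alter").size()   # rows where the person is the target

# Cross-check against the degrees of the equivalent directed graph
graph = nx.from_pandas_edgelist(edges, source="ego", target="alter",
                                create_using=nx.DiGraph())
in_degree = dict(graph.in_degree())
out_degree = dict(graph.out_degree())

print(inbound.sort_values(ascending=False))
```

People who never appear in the grouped column (here `cara` and `dev` as alters) are simply absent from the result, so a later merge on that count drops them unless a left join with a fill value is used.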

In [None]:
#Use Pandas to count the number of times someone is connected to another
##Ego
ego_inbound_connections=pd.DataFrame(network_data.groupby('ego').size()) #Groupby will combine rows with the same Name (participant), and size counts them

ego_inbound_connections.reset_index(inplace=True)
ego_inbound_connections.columns=['ego','Inbound Connections'] #Rename the count to something more intuitive 
ego_inbound_connections.sort_values('Inbound Connections',ascending=False,inplace=True) #Sort to see who is most influential on top

##Alter
alter_inbound_connections=pd.DataFrame(network_data.groupby('alter').size()) #Groupby will combine rows with the same Name (participant), and size counts them

alter_inbound_connections.reset_index(inplace=True)
alter_inbound_connections.columns=['alter','Inbound Connections'] #Rename the count to something more intuitive 
alter_inbound_connections.sort_values('Inbound Connections',ascending=False,inplace=True) #Sort to see who is most influential on top

### Cross-Team Collaboration

View cross-team collaboration using a *block density chart*. 

This represents the number of connections to other groups as a percentage of all a group's outgoing connections
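
A small worked example may help with reading the percentages. The team labels below are reused from the notebook, but the six connections are invented for illustration.

```python
import pandas as pd

# Hypothetical connections, one row per ego -> alter link, labelled by team
edges = pd.DataFrame({
    "ego_team":   ["Sales", "Sales", "Sales", "Sales", "Finance", "Finance"],
    "alter_team": ["Sales", "Finance", "Finance", "Finance", "Sales", "Finance"],
})

# normalize='index' divides each row by its total, so every row sums to 100%
density = pd.crosstab(index=edges["ego_team"],
                      columns=edges["alter_team"],
                      normalize="index").round(4) * 100

print(density)
```

Here three of the four connections starting in Sales go to Finance, so the Sales row reads 75% Finance and 25% Sales.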

In [None]:
#Use the crosstab method to create a frequency table
cross_team = pd.crosstab(index=network_data['ego_team'],
                         columns=network_data['alter_team'],
                         normalize='index').round(4)*100

display(cross_team)

##Flatten cross_team
cross_team_filtered = cross_team.stack().reset_index()
cross_team_filtered.columns = ['ego_team','alter_team','collaboration']
##Only include cross team collaboration > 30
#cross_team_filtered = cross_team_filtered[cross_team_filtered['collaboration'] > 30]
display(cross_team_filtered)

### Communication Recommendation

Recommending who I should be talking to based on:
 * Someone who my team collaborates often with.
 * Someone who collaborates often with my team.
 * Someone who has similar behaviour and personality scores to me.
 * Someone who has similar influence (inbound connections) to me.
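
One possible way to combine criteria like these is to rank candidates on each criterion and average the ranks. The sketch below is an assumption made for illustration, not the scoring used in this notebook, and all names and values in it are invented.

```python
import pandas as pd

# Hypothetical candidate alters for a single ego; values are invented
pairs = pd.DataFrame({
    "ego": ["Ana", "Ana", "Ana"],
    "alter": ["Ben", "Cara", "Dev"],
    "ego_score": [3.0, 3.0, 3.0],
    "alter_score": [2.8, 1.0, 3.1],
    "ego_influence": [10, 10, 10],
    "alter_influence": [9, 2, 20],
    "collaboration": [40.0, 10.0, 35.0],  # % of ego team's links going to alter's team
})

# "Similar" means a small absolute gap, so the gaps act as penalties
pairs["score_gap"] = (pairs["ego_score"] - pairs["alter_score"]).abs()
pairs["influence_gap"] = (pairs["ego_influence"] - pairs["alter_influence"]).abs()

# Rank every criterion (1 = best) and average the ranks into a single ordering
ranks = pd.DataFrame({
    "score": pairs["score_gap"].rank(),
    "influence": pairs["influence_gap"].rank(),
    "collaboration": pairs["collaboration"].rank(ascending=False),
})
pairs["mean_rank"] = ranks.mean(axis=1)

best_alter = pairs.sort_values("mean_rank").iloc[0]["alter"]
print(pairs[["alter", "mean_rank"]])
```

Ranking avoids mixing units (scores, counts, and percentages), at the cost of ignoring how large the gaps actually are.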

In [None]:
#Instantiate a new data frame (copied so that network_data is not modified)
communications = network_data.copy()

#Difference between ego_score and alter_score
communications['score_difference'] = (communications['ego_score'] - communications['alter_score'])


#Binary Value for Cross Team Collaboration
#Merge
communications = pd.merge(communications,cross_team_filtered,on='ego_team', how='outer')
communications = communications[communications['alter_team_x'] == communications['alter_team_y']]
#Add binary
communications.loc[communications['collaboration'] > 30, 'CTC'] = 1
communications.loc[communications['ego_team'] == communications['alter_team_x'], 'CTC'] = 1
communications.loc[communications['collaboration'] < 30, 'CTC'] = 0


#Influence/Inbound Connections of ego and alter
communications = pd.merge(communications,ego_inbound_connections,on='ego')
communications = pd.merge(communications,alter_inbound_connections,on='alter')


#Clean Up
communications = communications.drop('alter_team_y', axis=1)
communications.columns = ['ego','alter','ego_score','alter_score','ego_team','alter_team','score_difference','collaboration','CTC','ego_influence','alter_influence']
communications = communications.reindex(columns=['ego','alter','ego_score','alter_score','ego_team','alter_team','score_difference','ego_influence','alter_influence','collaboration','CTC'])

#Total score of recommended communications: the sum of alter_score, score_difference, and collaboration
communications['total_score'] = (communications['alter_score'] + communications['score_difference'] + communications['collaboration'])

#Should ego talk to alter Y/N
mean_ts = mean(communications['total_score'])

communications.loc[communications['total_score'] >= mean_ts, 'recommend'] = 'Yes'
communications.loc[communications['total_score'] < mean_ts, 'recommend'] = 'No'

display(communications)

In [None]:
#Instantiate a new data frame
recommend_network_data = pd.DataFrame(communications)

#Filter for only recommended communications
recommend_network_data = recommend_network_data[recommend_network_data['recommend'] == 'Yes']

#Clean up to create an edge list
recommend_network_data = recommend_network_data.drop(['score_difference','ego_influence','alter_influence','collaboration','CTC','total_score','recommend'], axis=1)


#Use the networkx library to visualise the network
recommend_network_graph=nx.from_pandas_edgelist(recommend_network_data,              #This tells networkx that the dataframe can be read as a list of connections
                                      source='ego', 
                                      target='alter',
                                      create_using=nx.DiGraph()) #This tells networkx that connections are directed (from ego to alter)

d = dict(recommend_network_graph.in_degree()) #This is a new variable that will be used to tell matplotlib how to size nodes

#Make a *colour key* for networkx to work with
#Assign a numeric code for each team which will then be used to assign a colour to each node
hris = names_teams.set_index('Names')
hris = hris.reindex(recommend_network_graph.nodes())
hris['Team'] = pd.Categorical(hris['Team'])
team_names = list(hris['Team'].cat.categories) #Team names in the same order as their numeric codes
max_code = max(len(team_names) - 1, 1)

#Look up each edge's ego_score in the order networkx stores the edges, so colours line up with the drawn edges
score_by_edge = dict(zip(zip(recommend_network_data['ego'], recommend_network_data['alter']), recommend_network_data['ego_score']))

plt.figure(figsize=(20,20))
nx.draw_networkx(recommend_network_graph,
                 arrows=True,
                 node_color=hris['Team'].cat.codes,
                 cmap=plt.cm.viridis, vmin=0, vmax=max_code, #Fix the colour scale so the legend below matches the nodes
                 node_size= [v**2.5 for v in d.values()],
                 edge_color = [score_by_edge[edge] for edge in recommend_network_graph.edges()],
                 edge_cmap=plt.cm.ocean) 

#Use Patch to create a legend of teams, using the same colormap as the nodes
from matplotlib.patches import Patch

legend_elements = [Patch(facecolor=plt.cm.viridis(code / max_code), label=team)
                   for code, team in enumerate(team_names)]

plt.legend(handles=legend_elements,prop={'size': 10})
#display(recommend_network_data)