# Facebook Social Network Analysis

This notebook analyzes Facebook social network data by building a directed follower graph from an adjacency matrix and calculating degree and eigenvector centrality for each user. It then compares these centrality measures between two user types, real and fake accounts, using two-sample t-tests.

## Step 1: Load Data

We load the Facebook data, stored as a user-by-user adjacency matrix in an Excel file, with pandas.
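The later steps treat the spreadsheet as a square 0/1 adjacency matrix whose row labels match its column labels. A minimal sketch of a sanity check for those assumptions, shown on a small made-up matrix; the helper name `check_adjacency` and the toy users are hypothetical:

```python
import pandas as pd


def check_adjacency(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in an adjacency-matrix DataFrame (empty if none)."""
    problems = []
    # The matrix should be square: one row and one column per user
    if df.shape[0] != df.shape[1]:
        problems.append(f"matrix is not square: shape={df.shape}")
    # Row labels (followers) and column labels (followees) should name the same users
    if set(df.index) != set(df.columns):
        problems.append("row labels and column labels differ")
    # Duplicate names would silently merge distinct users into one graph node
    if df.index.duplicated().any() or df.columns.duplicated().any():
        problems.append("duplicate user names in labels")
    # Only 0/1 entries are expected; Step 2 adds an edge only where the value == 1
    if not df.isin([0, 1]).all().all():
        problems.append("matrix contains values other than 0 and 1")
    return problems


# Toy 3-user matrix (made-up data): A follows B, B follows C
toy = pd.DataFrame(
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
print(check_adjacency(toy))  # prints []
```

Once the data is loaded, calling `check_adjacency(facebook_data)` and getting an empty list would indicate that these assumptions hold.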

In [3]:
import pandas as pd
import networkx as nx
import scipy.stats as stats

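# Path to the Excel file; adjust it to wherever the file is stored on your machine
file_path = r'C:\Users\bobbt\Downloads\Facebook_Data.xlsx'
# The first column holds the row labels (user names), so use it as the index
facebook_data = pd.read_excel(file_path, index_col=0)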

# Display the first few rows of the dataset to understand its structure
facebook_data.head()

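| Unnamed: 0 | Meredith Stransky | Brittney Mazzella | Yi Cook | Porter Devries | Suzanne Syverson | Ladawn Creason | Mikel Lamberson | Lakendra Lasiter | Kate Shiver | Sharika Aiken | ... | Tehmina | Happy Bacha | Younus | AVA | Alfred | Danish | Matloob | Sameed Shahzad | Frazer | Mubashir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Meredith Stransky | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Brittney Mazzella | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 |
| Yi Cook | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Porter Devries | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Suzanne Syverson | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |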


## Step 2: Construct the Network

We create a directed graph in which nodes are users and an edge from u to v means that user u follows user v. Each row of the adjacency matrix is a follower and each column a followee, so a 1 in cell (u, v) adds the edge u to v.
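When the matrix is square and its row and column labels match, NetworkX can build the same graph without an explicit double loop. A minimal sketch on a made-up matrix, comparing `nx.from_pandas_adjacency` with a loop like the one in the next cell. One caveat: `from_pandas_adjacency` creates an edge for every nonzero entry (storing the value as a `weight` attribute), whereas the loop keeps only entries equal to 1, so the two agree only for 0/1 matrices.

```python
import networkx as nx
import pandas as pd

# Toy 0/1 adjacency matrix (made-up data): row user follows column user when the cell is 1
toy = pd.DataFrame(
    [[0, 1, 1], [0, 0, 1], [1, 0, 0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

# Vectorized construction: edge row -> column for every nonzero cell
G_fast = nx.from_pandas_adjacency(toy, create_using=nx.DiGraph)

# Explicit loop: edge follower -> followee wherever the cell == 1
G_loop = nx.DiGraph()
G_loop.add_nodes_from(toy.columns)
for follower in toy.index:
    for followee in toy.columns:
        if toy.at[follower, followee] == 1:
            G_loop.add_edge(follower, followee)

print(sorted(G_fast.edges()))  # [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')]
print(set(G_fast.edges()) == set(G_loop.edges()))  # True
```

`from_pandas_adjacency` raises an error if some row labels are missing from the columns, which also acts as a built-in check on the matrix shape.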


In [4]:
# Create a directed graph: an edge u -> v means user u follows user v
G = nx.DiGraph()

# Add one node per user (column label); 'type' is a placeholder that Step 4 overwrites
for user in facebook_data.columns:
    G.add_node(user, type='real')

# Add an edge follower -> followee wherever the adjacency matrix has a 1
for follower in facebook_data.index:
    for followee in facebook_data.columns:
        if facebook_data.at[follower, followee] == 1:
            G.add_edge(follower, followee)


## Step 3: Calculate Centrality Measures

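We calculate degree centrality and eigenvector centrality for each node. In a directed graph, `nx.degree_centrality` counts both incoming and outgoing edges and divides by n - 1 (n = number of nodes), so values can exceed 1. `nx.eigenvector_centrality` scores a node by the centrality of the nodes that point to it, i.e. its followers.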
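NetworkX also provides `nx.in_degree_centrality` (followers only) and `nx.out_degree_centrality` (followees only), which can be easier to interpret in a follower network. A small check on a made-up three-user chain in which A follows B and B follows C, so n - 1 = 2:

```python
import networkx as nx

# Toy follower chain (made-up data): A follows B, B follows C; n = 3, so n - 1 = 2
chain = nx.DiGraph([("A", "B"), ("B", "C")])

total = nx.degree_centrality(chain)         # (in-degree + out-degree) / (n - 1)
incoming = nx.in_degree_centrality(chain)   # followers / (n - 1)
outgoing = nx.out_degree_centrality(chain)  # followees / (n - 1)

print(total)     # {'A': 0.5, 'B': 1.0, 'C': 0.5}
print(incoming)  # {'A': 0.0, 'B': 0.5, 'C': 0.5}
print(outgoing)  # {'A': 0.5, 'B': 0.5, 'C': 0.0}
```

In a directed graph the total degree centrality is the sum of the in- and out-degree versions, which is why B reaches 1.0.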


In [5]:
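# Degree centrality: (in-degree + out-degree) / (n - 1) for each node
degree_centrality = nx.degree_centrality(G)
# Eigenvector centrality, computed by power iteration over in-edges (followers)
eigenvector_centrality = nx.eigenvector_centrality(G)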
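
According to the NetworkX documentation, `nx.eigenvector_centrality` on a directed graph is "left" eigenvector centrality, based on in-edges: a user scores highly when followed by other high-scoring users, and a user with no followers scores close to zero. Reversing the graph with `.reverse()` gives the out-edge version. A sketch on a made-up graph in which A, B and C follow each other in a cycle and D only follows A:

```python
import networkx as nx

# Toy graph (made-up data): A -> B -> C -> A is a follow cycle; D follows A but has no followers
toy_graph = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")])

# Default: centrality flows along in-edges (who follows you)
in_based = nx.eigenvector_centrality(toy_graph)
# Reversed graph: centrality flows along out-edges (whom you follow)
out_based = nx.eigenvector_centrality(toy_graph.reverse())

print({v: round(x, 3) for v, x in in_based.items()})   # A, B, C about 0.577 (1/sqrt(3)); D about 0
print({v: round(x, 3) for v, x in out_based.items()})  # all four 0.5
```

In the follower network, this means accounts that follow many users but have few followers will sit near zero on the eigenvector measure even if their degree centrality is high.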


## Step 4: Assign Node Types

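We assign each node a placeholder type, real or fake, using a simple heuristic: a name that contains a space (a first and last name, such as "Meredith Stransky") is labeled real, and a single-word name (such as "Tehmina") is labeled fake. These labels are a proxy rather than verified account status and should be replaced with actual labels if such data becomes available.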
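Applying the same rule to a few names from the Step 1 preview makes its behavior, and its blind spots, concrete. `heuristic_type` is a hypothetical helper that reproduces the one-line condition used in the next cell:

```python
def heuristic_type(name: str) -> str:
    """Same rule as Step 4: names containing a space are 'real', others 'fake'."""
    return "real" if " " in name else "fake"


# Names taken from the data preview in Step 1
for name in ["Meredith Stransky", "Sameed Shahzad", "Tehmina", "AVA"]:
    print(f"{name!r}: {heuristic_type(name)}")

# Blind spot: stray whitespace alone can flip the label
print(heuristic_type("Tehmina "))  # prints real, because of the trailing space
```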


In [6]:
# Heuristic node types: names containing a space are 'real', single-word names are 'fake'
# (replace with actual account labels if available)
for node in G.nodes():
    G.nodes[node]['type'] = 'real' if ' ' in node else 'fake'
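
If verified account labels become available, they could replace the heuristic. A minimal sketch, assuming a hypothetical label table with `user` and `type` columns (built in memory here; in practice it might be read from a file), that writes the labels onto a toy graph with `nx.set_node_attributes`:

```python
import networkx as nx
import pandas as pd

# Toy graph standing in for G (made-up users)
graph = nx.DiGraph([("Ann Lee", "Bo"), ("Bo", "Cy Park")])

# Hypothetical label table: one row per user with a verified 'real'/'fake' type
labels = pd.DataFrame({"user": ["Ann Lee", "Bo", "Cy Park"], "type": ["real", "real", "fake"]})

# Map user -> type, keeping only users that are actually nodes in the graph
type_map = {u: t for u, t in zip(labels["user"], labels["type"]) if u in graph}
nx.set_node_attributes(graph, type_map, name="type")

# Users missing from the label table keep whatever 'type' they had before (none here)
missing = [n for n in graph if n not in type_map]
print(dict(graph.nodes(data="type")))  # {'Ann Lee': 'real', 'Bo': 'real', 'Cy Park': 'fake'}
print(missing)  # []
```

Note that "Bo" is labeled real here even though the name-based heuristic would call it fake; that kind of disagreement is exactly what verified labels would correct.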


## Step 5: Group Nodes by Type and Perform Comparative Analysis

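We group the nodes by type and compare each centrality measure between real and fake accounts using independent two-sample t-tests (`scipy.stats.ttest_ind`, which by default assumes the two groups have equal variances).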
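Centrality scores are often strongly skewed, so the equal-variance assumption may not hold. A minimal sketch of two common robustness checks, Welch's t-test (`equal_var=False`) and the Mann-Whitney U test, on made-up numbers; in this notebook they could be applied to `degree_real`/`degree_fake` and `eigen_real`/`eigen_fake` from the next cell:

```python
import scipy.stats as stats

# Made-up centrality scores for two groups (illustration only, not results from this data)
group_a = [0.10, 0.12, 0.15, 0.20, 0.55]
group_b = [0.08, 0.09, 0.11, 0.13, 0.14]

# Welch's t-test: compares means without assuming equal variances
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Mann-Whitney U test: rank-based, so a few extreme values have less influence
mwu = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Welch t={welch.statistic:.3f}, p={welch.pvalue:.3f}")
print(f"Mann-Whitney U={mwu.statistic:.1f}, p={mwu.pvalue:.3f}")
```

Here U counts the pairs in which a value from `group_a` exceeds one from `group_b` (20 of the 25 pairs). If Welch and Mann-Whitney agree with the Student's t-test, the conclusion is less sensitive to the distributional assumptions.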


In [7]:

# Group by type
real_users = [node for node in G.nodes() if G.nodes[node].get('type') == 'real']
fake_users = [node for node in G.nodes() if G.nodes[node].get('type') == 'fake']

degree_real = [degree_centrality[node] for node in real_users]
degree_fake = [degree_centrality[node] for node in fake_users]
eigen_real = [eigenvector_centrality[node] for node in real_users]
eigen_fake = [eigenvector_centrality[node] for node in fake_users]

# Student's two-sample t-test for each centrality measure (equal variances assumed)
t_stat_degree, p_value_degree = stats.ttest_ind(degree_real, degree_fake)
t_stat_eigen, p_value_eigen = stats.ttest_ind(eigen_real, eigen_fake)

print(f"Degree Centrality t-test: t={t_stat_degree}, p={p_value_degree}")
print(f"Eigenvector Centrality t-test: t={t_stat_eigen}, p={p_value_eigen}")


Degree Centrality t-test: t=0.8642063434738131, p=0.387682206164193
Eigenvector Centrality t-test: t=0.7076984064181457, p=0.47929779869032296


## Conclusion

We constructed a directed graph from the Facebook data and calculated degree and eigenvector centrality for each user. The t-tests comparing real and fake accounts gave the following results:

- **Degree Centrality t-test:**
  - t-statistic: 0.864
  - p-value: 0.388

- **Eigenvector Centrality t-test:**
  - t-statistic: 0.708
  - p-value: 0.479

## Interpretation

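Both p-values (0.388 and 0.479) are well above the conventional 0.05 threshold, so neither test finds a statistically significant difference in degree or eigenvector centrality between real and fake accounts. The data therefore provide no evidence that the two groups differ in influence as measured by these centralities, although a non-significant result does not prove that their influence is the same. Because the real/fake labels come from the name-based heuristic in Step 4 rather than verified account status, the comparison is, strictly, between users with multi-word names and users with single-word names.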
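One way to look at how large any difference actually is, rather than only whether it is significant, is to report group medians alongside a standardized effect size such as Cohen's d. A minimal sketch using only the standard library; `cohens_d` is a hypothetical helper that could be applied to `degree_real`/`degree_fake` and `eigen_real`/`eigen_fake`:

```python
import math
import statistics


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(a), len(b)
    # Pooled variance from the two sample variances (each uses n - 1 in the denominator)
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Made-up values: means 2 and 3, both sample standard deviations 1
print(cohens_d([1, 2, 3], [2, 3, 4]))  # -1.0
print(statistics.median([1, 2, 3]), statistics.median([2, 3, 4]))  # 2 3
```

A d close to zero, together with similar medians, would support the reading that the groups are similar; a large d with a non-significant p-value would instead point to too little data to tell.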
