# Network Data Generation

### Purpose
This notebook generates the edge data for two networks of brawlers: pairs that tend to win together and pairs that tend to lose together.

### Data Used
This notebook uses game data recording how often each pair of brawlers won or lost together. Both team members collected this data.

### Approach
This project will feature the following networks:
- a network of brawlers who are more likely to win with each other
- a network of brawlers who are more likely to lose with each other

#### Why Not a Network with Weighted Edges?
A network with weighted edges would be extremely sparse. Instead, this project draws an unweighted edge between two brawlers only when a statistical test indicates they are likely to win (or lose) together.

## Importing Libraries

In [1]:
import pandas as pd
import networkx
from scipy.stats import binomtest

## Writing Constants for this Notebook

In [2]:
# Significance level used by the binomial tests below
ALPHA = 0.05

## Loading Edge Data

### Getting the Number of Times Brawler Pairs Won

In [3]:
df_wins_aldous = pd.read_csv(
    "../outputs/victory-brawler-edges-from-aldous.csv"
)
df_wins_nash = pd.read_csv("../outputs/victory-brawler-edges-from-username.csv")

In [4]:
# Combine the DataFrames
df_wins_concatted = pd.concat([df_wins_aldous, df_wins_nash])

# Group by Brawler_1 and Brawler_2, summing the Weight column
df_wins = (
    df_wins_concatted.groupby(
        ['Brawler_1', 'Brawler_2'], 
        as_index=False
    )['Weight'].sum()
)

# Renaming weight column
df_wins = df_wins.rename(
    columns={'Weight': 'Wins'}
)

df_wins.head()

,Brawler_1,Brawler_2,Wins
0,8-BIT,8-BIT,14
1,8-BIT,ALLI,40
2,8-BIT,AMBER,23
3,8-BIT,ANGELO,21
4,8-BIT,ASH,11
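
A small sketch of what the concatenate-and-sum step does, using made-up counts rather than the collected data (the `toy_*` names and all numbers here are hypothetical). A pair that appears in both collectors' files should end up as a single row whose count is the sum of the two:

```python
import pandas as pd

# Hypothetical toy inputs standing in for the two collectors' CSV files
toy_a = pd.DataFrame({
    "Brawler_1": ["8-BIT", "8-BIT"],
    "Brawler_2": ["ALLI", "AMBER"],
    "Weight": [3, 2],
})
toy_b = pd.DataFrame({
    "Brawler_1": ["8-BIT"],
    "Brawler_2": ["ALLI"],
    "Weight": [4],
})

# Stack both sources, then sum the counts for each (Brawler_1, Brawler_2) key
toy_wins = (
    pd.concat([toy_a, toy_b])
    .groupby(["Brawler_1", "Brawler_2"], as_index=False)["Weight"]
    .sum()
    .rename(columns={"Weight": "Wins"})
)
print(toy_wins)
```

Note that `groupby` treats (A, B) and (B, A) as different keys, so this step relies on both input files listing each pair in the same order.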


### Getting the Number of Times Brawler Pairs Lost

In [5]:
df_loses_aldous = pd.read_csv(
    "../outputs/defeat-brawler-edges-from-aldous.csv"
)
df_loses_nash = pd.read_csv(
    "../outputs/defeat-brawler-edges-from-username.csv"
)

In [6]:
# Combine the DataFrames
df_loses_concatted = pd.concat([df_loses_aldous, df_loses_nash])

# Group by Brawler_1 and Brawler_2, summing the Weight column
df_loses = (
    df_loses_concatted.groupby(
        ['Brawler_1', 'Brawler_2'], 
        as_index=False
    )['Weight'].sum()
)

# Renaming weight column
df_loses = df_loses.rename(
    columns={'Weight': 'Loses'}
)

df_loses.head()

,Brawler_1,Brawler_2,Loses
0,8-BIT,8-BIT,1
1,8-BIT,ALLI,44
2,8-BIT,AMBER,38
3,8-BIT,ANGELO,15
4,8-BIT,ASH,17


### Data Engineering

#### Combining Win and Loss Data

In [7]:
# Merge df_wins and df_loses on Brawler_1 and Brawler_2
df_brawler_pairs = pd.merge(
    df_wins,
    df_loses,
    on=['Brawler_1', 'Brawler_2'],
    how='outer'
).fillna(0)  # Fill NaN with 0 for pairs that only appear in one DataFrame

# Ensure the Wins and Loses columns are integers (fillna leaves them as floats)
df_brawler_pairs['Wins'] = df_brawler_pairs['Wins'].astype(int)
df_brawler_pairs['Loses'] = df_brawler_pairs['Loses'].astype(int)

# Total number of games observed for each brawler pair
df_brawler_pairs["Observations"] = (
    df_brawler_pairs['Wins'] + df_brawler_pairs['Loses']
)

# Display the first few rows
df_brawler_pairs.head()

,Brawler_1,Brawler_2,Wins,Loses,Observations
0,8-BIT,8-BIT,14,1,15
1,8-BIT,ALLI,40,44,84
2,8-BIT,AMBER,23,38,61
3,8-BIT,ANGELO,21,15,36
4,8-BIT,ASH,11,17,28
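
To illustrate why the merge uses `how='outer'` followed by `fillna(0)`, here is a minimal sketch with hypothetical toy counts (not the collected data). A pair that only ever lost together is missing from the wins table, so the outer merge gives it `NaN` wins, which is then replaced with 0:

```python
import pandas as pd

# Hypothetical toy tables: 8-BIT/ASH appears only in the losses table
toy_wins = pd.DataFrame({
    "Brawler_1": ["8-BIT"],
    "Brawler_2": ["ALLI"],
    "Wins": [5],
})
toy_losses = pd.DataFrame({
    "Brawler_1": ["8-BIT", "8-BIT"],
    "Brawler_2": ["ALLI", "ASH"],
    "Loses": [2, 3],
})

toy_pairs = (
    pd.merge(toy_wins, toy_losses, on=["Brawler_1", "Brawler_2"], how="outer")
    .fillna({"Wins": 0, "Loses": 0})          # missing side means zero games
    .astype({"Wins": int, "Loses": int})      # NaN forced floats; restore ints
)
toy_pairs["Observations"] = toy_pairs["Wins"] + toy_pairs["Loses"]
print(toy_pairs)
```

With an inner merge, the 8-BIT/ASH row would have been dropped entirely, even though it carries information about a pair that loses together.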


In [18]:
len(df_brawler_pairs) / 8359

0.5214738605096303

### Setting Data for Gephi

#### Connection Type

In [8]:
df_brawler_pairs["Type"] = "Undirected"

#### Weight

In [9]:
df_brawler_pairs["Weight"] = 1

### Determining Connections with Statistical Tests

#### Finding Winning Brawler Pairs

In [10]:
def run_win_hypothesis_test(x):
    # One-sided binomial test: does this pair win more than half of its games?
    p_val = binomtest(
        x["Wins"], x["Observations"], p=0.5, alternative="greater"
    ).pvalue
    return p_val < ALPHA

df_brawler_pairs["Winning_Pair"] = df_brawler_pairs.apply(
    run_win_hypothesis_test, axis = "columns"
)
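
As a sanity check on what the test computes, the one-sided p-value is the probability of seeing at least the observed number of wins if each game were a fair coin flip. The sketch below works this out by hand for the 8-BIT/8-BIT sample row (14 wins in 15 games) and compares it with `binomtest`. The helper name `upper_tail` is hypothetical and only used for illustration:

```python
from math import comb
from scipy.stats import binomtest

def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5), summed term by term."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

wins, observations = 14, 15  # the 8-BIT / 8-BIT sample row

# C(15,14) + C(15,15) = 16 favourable outcomes out of 2**15 = 32768
manual = upper_tail(wins, observations)
scipy_p = binomtest(wins, observations, p=0.5, alternative="greater").pvalue

print(manual, scipy_p)
```

Both values are 16/32768, roughly 0.0005, which is well below `ALPHA = 0.05`, so this pair is flagged as a winning pair.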

#### Finding Losing Brawler Pairs

In [11]:
def run_loss_hypothesis_test(x):
    # One-sided binomial test: does this pair lose more than half of its games?
    p_val = binomtest(
        x["Loses"], x["Observations"], p=0.5, alternative="greater"
    ).pvalue
    return p_val < ALPHA

df_brawler_pairs["Losing_Pair"] = df_brawler_pairs.apply(
    run_loss_hypothesis_test, axis = "columns"
)
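
Row-wise `apply` calls `binomtest` once per pair. A possible alternative, sketched here as an assumption rather than as what this notebook does, is to compute the same one-sided p-values for all rows at once with `scipy.stats.binom.sf`, using the identity P(X >= k) = sf(k - 1). The sample rows are copied from the combined table shown earlier:

```python
import pandas as pd
from scipy.stats import binom, binomtest

ALPHA = 0.05

# Sample rows copied from the combined table shown earlier in this notebook
sample = pd.DataFrame({
    "Brawler_1": ["8-BIT"] * 5,
    "Brawler_2": ["8-BIT", "ALLI", "AMBER", "ANGELO", "ASH"],
    "Wins": [14, 40, 23, 21, 11],
    "Loses": [1, 44, 38, 15, 17],
})
sample["Observations"] = sample["Wins"] + sample["Loses"]

# P(X >= k) equals the survival function evaluated at k - 1
win_p = binom.sf(sample["Wins"] - 1, sample["Observations"], 0.5)
loss_p = binom.sf(sample["Loses"] - 1, sample["Observations"], 0.5)

sample["Winning_Pair"] = win_p < ALPHA
sample["Losing_Pair"] = loss_p < ALPHA

# Cross-check the vectorized p-values against the row-wise binomtest
row_wise = [
    binomtest(int(k), int(n), p=0.5, alternative="greater").pvalue
    for k, n in zip(sample["Wins"], sample["Observations"])
]
print(sample[["Brawler_2", "Winning_Pair", "Losing_Pair"]])
```

Because `ALPHA` is below 0.5, no pair can be flagged as both winning and losing: a significant win count must exceed half the observations, which leaves the loss count below half.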

#### Evaluating Results

In [12]:
print(f"Winning pair edges: {df_brawler_pairs['Winning_Pair'].sum()}")
print(f"Losing pair edges: {df_brawler_pairs['Losing_Pair'].sum()}")

Winning pair edges: 551
Losing pair edges: 455


### Saving Edge Data

#### Winning Edges

In [13]:
# Getting only winning edges
df_winning_edges = df_brawler_pairs[
    df_brawler_pairs["Winning_Pair"]
]

# Renaming columns
df_winning_edges = df_winning_edges.rename(
    columns={
        'Brawler_1': 'Source', 
        'Brawler_2': 'Target'
    }
)

# Getting relevant edge information
df_winning_edges = df_winning_edges[[
    "Source", "Target", "Wins", "Loses", "Observations"
]]

In [14]:
df_winning_edges.to_csv(
    "../outputs/graph-data/winning-edges.csv", 
    index = False
)

#### Losing Edges

In [15]:
# Getting only losing edges
df_losing_edges = df_brawler_pairs[
    df_brawler_pairs["Losing_Pair"]
]

# Renaming columns
df_losing_edges = df_losing_edges.rename(
    columns={
        'Brawler_1': 'Source', 
        'Brawler_2': 'Target'
    }
)

# Getting relevant edge information
df_losing_edges = df_losing_edges[[
    "Source", "Target", "Wins", "Loses", "Observations"
]]

In [16]:
df_losing_edges.to_csv(
    "../outputs/graph-data/losing-edges.csv", 
    index = False
)
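
The notebook imports `networkx` but does not use it. As a hedged sketch of a possible next step, an edge table with the same column layout as the saved files can be loaded into an undirected graph with `nx.from_pandas_edgelist`. The brawler names and counts below are hypothetical placeholders, not rows from the real output:

```python
import networkx as nx
import pandas as pd

# Hypothetical edge rows in the same column layout as winning-edges.csv
toy_edges = pd.DataFrame({
    "Source": ["BRAWLER_A", "BRAWLER_A"],
    "Target": ["BRAWLER_B", "BRAWLER_C"],
    "Wins": [30, 25],
    "Loses": [10, 9],
    "Observations": [40, 34],
})

# Build an undirected graph, keeping the counts as edge attributes
graph = nx.from_pandas_edgelist(
    toy_edges,
    source="Source",
    target="Target",
    edge_attr=["Wins", "Loses", "Observations"],
    create_using=nx.Graph,
)

print(graph.number_of_nodes(), graph.number_of_edges())
print(graph["BRAWLER_A"]["BRAWLER_B"])
```

To use the real data, `toy_edges` could be replaced with `pd.read_csv("../outputs/graph-data/winning-edges.csv")`.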