### The homophily and social contagion of cheating
#### **Author**: Cora Fagan

---
This project studies homophily and social contagion of cheating in the massively multiplayer online game PlayerUnknown's Battlegrounds (PUBG). Cheating in this context means adopting unapproved software that gives the player an unfair advantage in the game (e.g., being able to see through walls).

We hypothesize that cheaters tend to associate with other cheaters and that players who interact with cheaters become more likely to adopt cheating themselves. To provide preliminary evidence for these hypotheses, we will:

1. Test whether cheaters team up with other cheaters more often than expected by chance.
2. Test whether players who are killed by cheaters go on to cheat more often than expected by chance.
3. Test whether players who observe cheaters go on to cheat more often than expected by chance.

To test the "more than chance" part, we simulate alternative universes in which the players played the same matches but joined a different team or were killed by someone else at a different time. We then compare what we observe in the actual data with what we would expect in such a "randomized" world.

### Data

The data originally used in this project is not publicly accessible. To run the code below, please format your data as follows:

* `cheaters.txt` – contains cheaters who played between March 1 and March 10, 2019
    1. player account id
    2. estimated date when the player started cheating
    3. date when the player's account was banned due to cheating


* `kills.txt` – contains the kills recorded in 6,000 randomly selected matches played between March 1 and March 10, 2019
    1. match id 
    2. account id of the killer
    3. account id of the player who got killed
    4. time when the kill happened
 
 
* `team_ids.txt` – contains the team ids for players in 5,419 team-play matches in the same period. If a match from `kills.txt` does not appear in this file, we assume that it was played in single-player mode.
    1. match id 
    2. player account id
    3. team id in match
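
As a rough illustration of the layout above, here is a minimal sketch of reading a `cheaters.txt`-style file. It assumes tab-separated columns and ISO-formatted dates (`YYYY-MM-DD`); neither is stated in this write-up, and the helper names `read_rows` and `parse_cheaters` are hypothetical rather than functions of the project's `get_file_data` module.

```python
import csv
import io
from datetime import date


def read_rows(handle, delimiter="\t"):
    """Yield each non-empty row of a delimited text file as a list of strings."""
    for row in csv.reader(handle, delimiter=delimiter):
        if row:
            yield row


def parse_cheaters(handle):
    """Return {account_id: (start_date, ban_date)} from cheaters.txt-style rows.

    Assumption: three tab-separated columns with ISO dates.
    """
    return {
        account_id: (date.fromisoformat(start), date.fromisoformat(banned))
        for account_id, start, banned in read_rows(handle)
    }


# Made-up sample rows standing in for a real file.
sample = io.StringIO(
    "acc-1\t2019-03-02\t2019-03-05\n"
    "acc-2\t2019-03-04\t2019-03-09\n"
)
cheaters_by_id = parse_cheaters(sample)
```

With a real file, the same function could be called on an open file handle, for example `parse_cheaters(open(path, newline=""))`.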

In [1]:
# Import modules here
import get_file_data
import cheaters
import shuffle
import summarize

### 1. Do cheaters team up?

Use the files `cheaters.txt` and `team_ids.txt` to estimate how often cheaters (regardless of when exactly they started cheating) end up on the same team.

Now, randomly shuffle the team ids among the players in a match. Repeat this 20 times and estimate the expected counts as before. 
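
The shuffling step can be sketched as follows. This is a simplified illustration, not the project's `shuffle` module: it assumes each match is a list of `(player_id, team_id)` pairs, permutes the team ids among the players of one match so that team sizes stay fixed, and averages the per-team cheater counts over 20 shuffles. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def shuffle_team_ids(roster, rng):
    """Permute team ids among the players of one match (team sizes are preserved)."""
    players = [player for player, _ in roster]
    team_ids = [team for _, team in roster]
    rng.shuffle(team_ids)
    return list(zip(players, team_ids))


def cheaters_per_team_histogram(matches, cheater_ids):
    """Return Counter {number of cheaters on a team: number of such teams}."""
    histogram = Counter()
    for roster in matches.values():
        team_counts = {}
        for player, team in roster:
            team_counts[team] = team_counts.get(team, 0) + (player in cheater_ids)
        histogram.update(team_counts.values())
    return histogram


# Toy data: one match, two teams of two, both cheaters on team t1.
matches = {"m1": [("a", "t1"), ("b", "t1"), ("c", "t2"), ("d", "t2")]}
cheater_ids = {"a", "b"}
observed = cheaters_per_team_histogram(matches, cheater_ids)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
runs = [
    cheaters_per_team_histogram(
        {m: shuffle_team_ids(roster, rng) for m, roster in matches.items()},
        cheater_ids,
    )
    for _ in range(20)
]
# Expected number of teams with n cheaters, averaged over the shuffles.
expected = {n: sum(run[n] for run in runs) / len(runs) for n in range(5)}
```

Because only the team labels move, every shuffled match keeps the same players and the same number of teams, so the observed and expected histograms are directly comparable.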

In [2]:
# Output answers here

team_data = get_file_data.get_team_data('../data/team_ids.txt')
cheaters_data = get_file_data.get_cheaters_data('../data/cheaters.txt')

# Create a dictionary keyed by (match id, team id) to see how frequently cheaters are on the same team
match_teams = cheaters.organize_teams_by_match(team_data)

# Build a set of unique cheater account ids from cheaters_data
cheaters_set = {player_acc_id for player_acc_id, cheating_start_date, banned_date in cheaters_data}

cheaters_per_team = cheaters.count_cheaters_per_team(match_teams, cheaters_set)

# Summarize the results from the previous dictionary
cheaters.summarize_cheaters_per_team(cheaters_per_team)

# Randomize teams
randomized_teams = shuffle.randomize_teams(match_teams)

# Shuffle 20 times and count cheaters after randomization
randomized_results = shuffle.count_cheaters_after_randomization(match_teams, cheaters_set, num_iterations=20)

mean_cheaters = summarize.calculate_mean_cheaters(randomized_results)

print("\nMean number of teams with n number of cheaters after randomization:\n ")

for cheater_count, mean in mean_cheaters.items():
    print(f"Mean number of teams with {cheater_count} cheater(s): {mean}")

confidence_intervals = summarize.calculate_confidence_intervals(randomized_results)

print("\n95% Confidence Intervals for number of teams with n number of cheaters after randomization:\n ")

# Print the confidence intervals for each number of cheaters (0, 1, 2, 3, 4)
for cheater_count, (lower_bound, upper_bound) in confidence_intervals.items():
    print(f"95% Confidence Intervals for {cheater_count} cheaters: ({lower_bound}, {upper_bound})")

Number of teams with 0 cheater(s): 170783
Number of teams with 1 cheater(s): 3199
Number of teams with 2 cheater(s): 181
Number of teams with 3 cheater(s): 9
Number of teams with 4 cheater(s): 2

Mean number of teams with n number of cheaters after randomization:
 
Mean number of teams with 0 cheater(s): 170613.0
Mean number of teams with 1 cheater(s): 3526.35
Mean number of teams with 2 cheater(s): 34.3
Mean number of teams with 3 cheater(s): 0.35
Mean number of teams with 4 cheater(s): 0.0

95% Confidence Intervals for number of teams with n number of cheaters after randomization:
 
95% Confidence Intervals for 0 cheaters: (95838.55589434637, 245387.44410565362)
95% Confidence Intervals for 1 cheaters: (1980.8589707585488, 5071.841029241451)
95% Confidence Intervals for 2 cheaters: (19.26736220086441, 49.33263779913558)
95% Confidence Intervals for 3 cheaters: (0.19660573674351442, 0.5033942632564855)
95% Confidence Intervals for 4 cheaters: (0.0, 0.0)


### 2. Do victims of cheating start cheating?

Use the files `cheaters.txt` and `kills.txt` to count how many players got killed by an active cheater on at least one occasion and then started cheating. Specifically, we are interested in situations where:

1. Player B has started cheating but player A is not cheating.
2. Player B kills player A.
3. At some point afterwards, player A starts cheating.

Output the count in the data. 
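
The three conditions above can be sketched as a single pass over the kills. This is an illustration under stated assumptions, not the project's `cheaters.cheaters_after_killed`: kills are `(match_id, killer_id, victim_id, kill_time)` tuples with `datetime` timestamps, cheating start dates are `date` objects, a killer counts as active from their start date onward, and a victim counts only if their own start date is strictly after the day of the kill. The function name and toy data are hypothetical.

```python
from datetime import date, datetime


def count_victims_who_start_cheating(kills, cheating_start):
    """Count distinct players killed by an active cheater who later start cheating.

    kills: iterable of (match_id, killer_id, victim_id, kill_time)
    cheating_start: {account_id: date the player started cheating}
    """
    victims = set()
    for _match_id, killer, victim, kill_time in kills:
        killer_start = cheating_start.get(killer)
        victim_start = cheating_start.get(victim)
        if killer_start is None or victim_start is None:
            continue  # killer never cheats, or victim never becomes a cheater
        # Killer already cheating on the kill day; victim starts only afterwards.
        if killer_start <= kill_time.date() < victim_start:
            victims.add(victim)
    return len(victims)


# Toy data: A is killed twice by cheater B before A starts cheating.
cheating_start = {
    "A": date(2019, 3, 6),
    "B": date(2019, 3, 2),
    "C": date(2019, 3, 1),
}
kills = [
    ("m1", "B", "A", datetime(2019, 3, 3, 12, 0)),
    ("m2", "B", "A", datetime(2019, 3, 4, 9, 0)),
    ("m3", "B", "C", datetime(2019, 3, 3, 13, 0)),  # C was already cheating
    ("m4", "A", "D", datetime(2019, 3, 7, 8, 0)),   # D never cheats
]
victim_count = count_victims_who_start_cheating(kills, cheating_start)
```

Collecting victims in a set means a player killed by cheaters on several occasions is counted once, which matches the "at least one occasion" wording.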

Then, simulate alternative worlds in which everything is the same but the events occurred in a somewhat different sequence. To do so, randomize within a game, keeping the timing and structure of interactions but shuffling the player ids. Generate 20 randomizations like this and estimate the expected count of victims of cheating who start cheating as before.
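
One way to randomize within a game is to relabel the players of a match with a random permutation, leaving every kill's timestamp and position in the sequence untouched. The sketch below assumes kills are `(match_id, killer_id, victim_id, kill_time)` tuples and permutes all players of a match freely; whether the project's `shuffle.generate_simulated_worlds` restricts the permutation in any way is not stated here, and the function name is hypothetical.

```python
import random


def shuffle_players_in_match(match_kills, rng):
    """Relabel players within one match using a random permutation.

    The kill times and the who-kills-whom structure are kept; only the
    identities attached to each role are swapped.
    """
    players = sorted(
        {p for _, killer, victim, _ in match_kills for p in (killer, victim)}
    )
    shuffled = players[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(players, shuffled))
    return [
        (match_id, relabel[killer], relabel[victim], kill_time)
        for match_id, killer, victim, kill_time in match_kills
    ]


# Toy match with three kills; times are plain integers for brevity.
match_kills = [("m1", "a", "b", 10), ("m1", "a", "c", 20), ("m1", "d", "a", 30)]
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
simulated_worlds = [shuffle_players_in_match(match_kills, rng) for _ in range(20)]
```

Because the relabelling is a one-to-one mapping, a player never ends up killing themselves, and each simulated match contains exactly the same set of players as the original.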

In [3]:
# Output answers here

# Read the kills data
kills_data = get_file_data.get_kills_data('../data/kills.txt')

# Filters kills data to include only rows where the killer is in the cheaters set
cheater_kills = cheaters.filter_kills_by_cheaters(kills_data, cheaters_set)

print("Number of players who were killed by a cheater and then began cheating:", cheaters.cheaters_after_killed(cheater_kills, cheaters_data))

# Generate simulated worlds
all_simulated_worlds = shuffle.generate_simulated_worlds(kills_data, num_simulations=20)

# Pre-flatten the simulations
flattened_simulations = [shuffle.flatten_kills(simulation) for simulation in all_simulated_worlds]

# Now call summarize_cheaters_after_killed with flattened data
simulation_results_killed = shuffle.summarize_cheaters_after_killed(flattened_simulations, cheaters_data)

print("Mean for 'cheating after being killed':", summarize.calculate_mean_observers(simulation_results_killed))

print(f"95% Confidence Interval for the number of players who started cheating after being killed by a cheater in the simulated worlds\n: {summarize.calculate_observer_confidence_intervals(simulation_results_killed)}")


Number of players who were killed by a cheater and then began cheating: 47
Mean for 'cheating after being killed': 12.65
95% Confidence Interval for the number of players who started cheating after being killed by a cheater in the simulated worlds
: (11.214542511949588, 14.085457488050412)


### 3. Do observers of cheating start cheating?

Use the files `cheaters.txt` and `kills.txt` to count how many players observed an active cheater on at least one occasion and then started cheating. Cheating players can be recognized because they exhibit abnormal killing patterns. We will assume that player A realizes that player B cheats if:

1. Player B has started cheating but player A is not cheating.
2. Player B kills at least 3 other players before player A gets killed in the game.
3. At some point afterwards, player A starts cheating.
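
A literal reading of these conditions can be sketched as follows. The assumptions are mine rather than the project's: kills are `(match_id, killer_id, victim_id, kill_time)` tuples with `datetime` timestamps, a player only "observes" if they are killed (by anyone) after some active cheater has already made three kills in that match, and start dates are compared at day granularity. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def count_observers_who_start_cheating(kills, cheating_start, min_kills=3):
    """Count distinct players who observed an active cheater and later cheat."""
    by_match = defaultdict(list)
    for match_id, killer, victim, kill_time in kills:
        by_match[match_id].append((kill_time, killer, victim))

    observers = set()
    for events in by_match.values():
        events.sort(key=lambda event: event[0])  # chronological order
        kills_by_active_cheater = defaultdict(int)
        cheater_visible = False  # True once a cheater has made min_kills kills
        for kill_time, killer, victim in events:
            day = kill_time.date()
            victim_start = cheating_start.get(victim)
            # Check the victim first, so the cheater's own third victim
            # does not count as having seen three earlier kills.
            if cheater_visible and victim_start is not None and day < victim_start:
                observers.add(victim)
            killer_start = cheating_start.get(killer)
            if killer_start is not None and killer_start <= day:
                kills_by_active_cheater[killer] += 1
                if kills_by_active_cheater[killer] >= min_kills:
                    cheater_visible = True
    return len(observers)


cheating_start = {"B": date(2019, 3, 1), "A": date(2019, 3, 6), "E": date(2019, 3, 6)}
kills = [
    ("m1", "B", "X", datetime(2019, 3, 3, 10, 0)),
    ("m1", "B", "Y", datetime(2019, 3, 3, 10, 1)),
    ("m1", "B", "Z", datetime(2019, 3, 3, 10, 2)),
    ("m1", "W", "A", datetime(2019, 3, 3, 10, 3)),  # A dies after B's third kill
    ("m2", "B", "P", datetime(2019, 3, 4, 10, 0)),
    ("m2", "B", "Q", datetime(2019, 3, 4, 10, 1)),
    ("m2", "B", "E", datetime(2019, 3, 4, 10, 2)),  # E is B's third kill
]
observer_count = count_observers_who_start_cheating(kills, cheating_start)
```

In the toy data only A qualifies: E is the cheater's third victim, so B had killed just two other players before E died.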

Then, use the 20 randomizations from Part 2 to estimate the expected count of observers of cheating who start cheating. Output the mean and the 95% confidence interval for the expected count in these randomized worlds.
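
For reference, a mean and a 95% confidence interval over a list of simulated counts could be computed as below. This is a sketch using a normal approximation (mean ± z × standard error); the project's `summarize` module may compute its interval differently, and the counts shown are made-up illustration values, not results from the data.

```python
import math
import statistics


def mean_and_confidence_interval(counts, confidence=0.95):
    """Return (mean, (lower, upper)) using a normal approximation."""
    mean = statistics.mean(counts)
    standard_error = statistics.stdev(counts) / math.sqrt(len(counts))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean, (mean - margin, mean + margin)


# Made-up counts standing in for the per-simulation results.
simulated_counts = [10, 12, 14, 12, 12]
mean_count, (lower, upper) = mean_and_confidence_interval(simulated_counts)
```

With only 20 simulations, a Student's t quantile would give a slightly wider interval than the normal quantile used here.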

In [4]:
# Filter kills.txt to find the first instance of abnormal killing patterns in each match
filtered_kills_by_cheating_time = cheaters.filter_kills_by_cheating_time(kills_data, cheaters_data)

# Now find out who observed the cheating by filtering for victims and killers after the previously obtained "observed_time" per match
observers = cheaters.find_observers(filtered_kills_by_cheating_time, cheaters_data)

# Now find out who began cheating after they were an observer 
filtered_cheaters = cheaters.filter_cheaters(observers, filtered_kills_by_cheating_time, cheaters_data)

print("Number of players who started cheating after observing a cheater:", filtered_cheaters)

# Find summary statistics for simulations
simulation_results_observing = shuffle.summarize_simulation_results(flattened_simulations, cheaters_data, observers, filtered_kills_by_cheating_time)

print("Mean number of players who started cheating after observing a cheater in simulations:", summarize.calculate_mean_observers(simulation_results_observing))
print("95% Confidence Interval for simulations after observing a cheater", summarize.calculate_observer_confidence_intervals(simulation_results_observing))

Number of players who started cheating after observing a cheater: 105
Mean number of players who started cheating after observing a cheater in simulations: 31.75
95% Confidence Interval for simulations after observing a cheater (29.255808547845614, 34.24419145215438)
