# Gaming Integrity Analytics

## Author: Ella Vacic

---

### Overview

This project studies player behavior in a massively multiplayer online game, focusing on the dynamics of cheating: how cheaters associate with each other, and whether interactions with cheaters lead other players to start cheating. The goal is to implement a Python program that tests these hypotheses by comparing observed player behavior against randomized alternative scenarios.

The program uses only basic Python data types and core libraries, to emphasize the application of fundamental programming concepts. It was designed with future adaptability in mind.

### Data

The data used to design and develop the code and modules is not provided. New data should be split into three datasets that follow the format outlined below to ensure compatibility with the provided modules:

1. **Cheater Data**:
   - This dataset should contain information about players who have been identified as cheaters.
   - **Required fields**:
     - Unique identifier for each player
     - Date when the player began engaging in cheating behavior
     - Date when the player was banned or penalized due to cheating

2. **Interaction Data**:
   - This dataset should track interactions between players during game sessions.
   - **Required fields**:
     - Unique identifier for each interaction 
     - Identifier for the player initiating the interaction 
     - Identifier for the player on the receiving end of the interaction
     - Timestamp indicating when the interaction occurred 

3. **Team Data**:
   - This dataset should contain information about the teams players were assigned to in multiplayer game sessions.
   - **Required fields**:
     - Identifier for each game session 
     - Identifier for each player in the session 
     - Identifier for the team the player was part of during the session

All three datasets must be saved in the same directory so that the modules can import them.
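
The format description above does not specify a file type, so the following is only a sketch of what compatible files might look like. It assumes tab-separated text with one record per line, fields in the order listed above, and ISO dates (`YYYY-MM-DD`). The function names `parse_cheaters` and `parse_teams` and the sample rows are hypothetical; this is not the `data_import` module's actual implementation.

```python
from datetime import date


def parse_cheaters(lines, sep="\t"):
    """Map each player ID to a (cheating_start, ban_date) tuple.

    Assumes three fields per line: player ID, start date, ban date.
    """
    cheaters = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        player_id, start, banned = line.rstrip("\n").split(sep)
        cheaters[player_id] = (
            date.fromisoformat(start.strip()),
            date.fromisoformat(banned.strip()),
        )
    return cheaters


def parse_teams(lines, sep="\t"):
    """Return a list of (match_id, player_id, team_id) tuples."""
    return [tuple(line.rstrip("\n").split(sep)) for line in lines if line.strip()]


# Hypothetical sample rows, invented purely for illustration
sample_cheaters = ["p1\t2019-03-01\t2019-03-03\n", "p2\t2019-03-02\t2019-03-04\n"]
sample_teams = ["m1\tp1\tt1\n", "m1\tp2\tt1\n", "m1\tp3\tt2\n"]

cheaters = parse_cheaters(sample_cheaters)
teams = parse_teams(sample_teams)
print(cheaters["p1"])
print(teams[0])
```

Parsing dates into `date` objects up front keeps later comparisons (for example, "did the kill happen before the victim started cheating?") simple and unambiguous.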


### Importing code


In [1]:
# Importing modules
import data_import
import count
import simulate

### Cheater collaboration

Using the Cheater Data and Team Data (replace the placeholder file names below with the actual file names), the code below counts how often cheaters are grouped together on the same team. The output groups teams by the number of cheaters on them, from 0 to 4. To estimate the distribution expected by chance, team assignments are then randomly shuffled among players. The shuffle is repeated 20 times, and the results report the mean and 95% confidence interval of the expected count for each group.



In [2]:
# Load data
cheaters_data = data_import.load_cheaters('Cheaters data file name here')
teams_data = data_import.load_teams('Teams data file name here') 

# Count the number of cheaters in each team
team_cheater_counts = count.count_cheaters(cheaters_data, teams_data)

# Print results
for cheater_count, team_count in team_cheater_counts.items():
    print(f"Teams with {cheater_count} cheaters: {team_count}")

# Randomly shuffle team assignments among players (20 randomizations)
shuffled_teams = simulate.shuffle_teams(teams_data, 20)
team_simulation_counts = simulate.process_shuffled_teams(cheaters_data, shuffled_teams)

# Calculate the mean and confidence intervals for the output of the simulations
team_simulation_outputs = simulate.calculate_statistics(team_simulation_counts, 20)

# Print the results
for cheater_count, (mean, ci_lower, ci_upper) in team_simulation_outputs.items():
    print(f"Simulated teams with {cheater_count} cheaters: {mean:.2f} ({ci_lower:.2f}, {ci_upper:.2f})")

Teams with 0 cheaters: 170782
Teams with 1 cheaters: 3199
Teams with 2 cheaters: 182
Teams with 3 cheaters: 9
Teams with 4 cheaters: 2
Simulated teams with 0 cheaters: 170609.70 (170607.54, 170611.86)
Simulated teams with 1 cheaters: 3530.70 (3526.36, 3535.04)
Simulated teams with 2 cheaters: 33.50 (31.30, 35.70)
Simulated teams with 3 cheaters: 0.10 (-0.03, 0.23)
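
To make the counting and shuffling steps more concrete, here is a minimal sketch using only the standard library. It is not the code in the `count` or `simulate` modules: the function names, the tuple layout `(match_id, player_id, team_id)`, and the choice to shuffle team IDs within each match (so that team sizes are preserved) are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def count_cheaters_per_team(cheater_ids, teams):
    """Count teams by the number of cheaters on them.

    teams: list of (match_id, player_id, team_id) tuples.
    Returns a Counter mapping cheaters-per-team -> number of teams.
    """
    per_team = defaultdict(int)
    for match_id, player_id, team_id in teams:
        # Adding 0 still creates the key, so cheater-free teams are counted
        per_team[(match_id, team_id)] += 1 if player_id in cheater_ids else 0
    return Counter(per_team.values())


def shuffle_teams_within_matches(teams, rng):
    """Reassign team IDs at random among the players of each match.

    Match membership and team sizes are preserved; only the pairing of
    players with teams changes. (Assumed scheme, see lead-in.)
    """
    by_match = defaultdict(list)
    for match_id, player_id, team_id in teams:
        by_match[match_id].append((player_id, team_id))

    shuffled = []
    for match_id, rows in by_match.items():
        team_ids = [team_id for _, team_id in rows]
        rng.shuffle(team_ids)
        shuffled.extend(
            (match_id, player_id, team_id)
            for (player_id, _), team_id in zip(rows, team_ids)
        )
    return shuffled


# Hypothetical toy data: one match, two teams of two, two cheaters
cheater_ids = {"a", "b"}
teams = [("m1", "a", "t1"), ("m1", "b", "t1"), ("m1", "c", "t2"), ("m1", "d", "t2")]

observed = count_cheaters_per_team(cheater_ids, teams)
shuffled = shuffle_teams_within_matches(teams, random.Random(0))
print(dict(observed))
print(dict(count_cheaters_per_team(cheater_ids, shuffled)))
```

Running the shuffle repeatedly and counting each result gives one simulated distribution per repetition, which is the kind of input a summary step such as `simulate.calculate_statistics` would need.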


### Cheater interactions

The following code uses the Interaction Data and the Cheater Data to count players who were killed by an active cheater and later began cheating themselves. Alternative scenarios are simulated by shuffling the player IDs while keeping the structure and timing of the interactions unchanged. Twenty such randomizations are generated to estimate the expected count of players who start cheating after being killed by a cheater, along with a 95% confidence interval.

In [3]:
# Load data
kills_data = data_import.load_kills('Insert Interaction Data file name here')

# Count the number of cheaters influenced by other cheaters
influenced_victims_count = count.count_influenced_victims(cheaters_data, kills_data)

# Print the results
print(f"Count of victims of cheating influenced to become cheaters: {influenced_victims_count}")

# Shuffle player IDs in the interaction data (20 randomizations)
shuffled_players = simulate.shuffle_players(kills_data, 20)
victim_simulation_counts = simulate.process_shuffled_victims(cheaters_data, shuffled_players)

# Calculate the mean and confidence interval for the output of the simulations
victim_simulation_outputs = simulate.calculate_statistics(victim_simulation_counts, 20)

# Print the results
print(f"Simulated count of victims of cheating influenced to become cheaters: {victim_simulation_outputs[0]:.2f} ({victim_simulation_outputs[1]:.2f}, {victim_simulation_outputs[2]:.2f})")

Count of victims of cheating influenced to become cheaters: 47
Simulated count of victims of cheating influenced to become cheaters: 14.60 (13.71, 15.49)
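
The victim analysis can be sketched as follows. This is an illustrative reading of the description above, not the `count` or `simulate` implementation. It assumes kills are `(killer_id, victim_id, kill_date)` tuples (the interaction identifier is omitted), that a cheater is "active" from their start date through their ban date inclusive, and that a victim counts once no matter how many cheaters killed them. All names are hypothetical.

```python
import random
from datetime import date


def count_victims_turned_cheaters(cheaters, kills):
    """Count distinct victims killed by an active cheater who later cheated.

    cheaters: dict mapping player_id -> (cheating_start, ban_date).
    kills: list of (killer_id, victim_id, kill_date) tuples.
    """
    influenced = set()
    for killer, victim, when in kills:
        if killer not in cheaters or victim not in cheaters:
            continue
        killer_start, killer_ban = cheaters[killer]
        victim_start, _ = cheaters[victim]
        killer_active = killer_start <= when <= killer_ban
        if killer_active and victim_start > when:
            influenced.add(victim)
    return len(influenced)


def shuffle_player_ids(kills, rng):
    """Apply a random one-to-one relabeling of player IDs.

    Who-killed-whom structure and kill dates are untouched; only the
    identities attached to each position in the data change.
    """
    players = sorted({p for killer, victim, _ in kills for p in (killer, victim)})
    permuted = players[:]
    rng.shuffle(permuted)
    mapping = dict(zip(players, permuted))
    return [(mapping[k], mapping[v], when) for k, v, when in kills]


# Hypothetical toy data
cheaters = {
    "a": (date(2019, 3, 1), date(2019, 3, 5)),
    "b": (date(2019, 3, 4), date(2019, 3, 6)),
}
kills = [
    ("a", "b", date(2019, 3, 2)),  # active cheater kills a future cheater
    ("a", "c", date(2019, 3, 2)),  # victim never cheats
    ("c", "b", date(2019, 3, 3)),  # killer is not a cheater
]

observed = count_victims_turned_cheaters(cheaters, kills)
shuffled = shuffle_player_ids(kills, random.Random(1))
print(observed)
print(count_victims_turned_cheaters(cheaters, shuffled))
```

Because the relabeling is a permutation, each player keeps appearing the same number of times overall, but the link between a player's cheating dates and the kills they took part in is broken, which is what makes the shuffled data a baseline.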


Using the Interaction Data and Cheater Data, the code below identifies cases where a player observes another player whose in-game behavior suggests cheating and later begins cheating themselves. For this analysis, a cheater's behavior is considered obvious enough for other players to notice once the cheater has made at least three kills. The observed count of such cases is compared against a baseline from the same randomized simulations, reported as the expected count with a confidence interval.

In [4]:
# Count the number of observers of cheaters who were influenced to become cheaters
influenced_observers_count = count.count_influenced_observers(cheaters_data, kills_data)

# Print the results
print(f"Count of observers of cheating influenced to become cheaters: {influenced_observers_count}")

# Process the shuffled data to get counts
observer_simulation_counts = simulate.process_shuffled_observers(cheaters_data, shuffled_players)

# Calculate the mean and confidence interval for the output of the simulations
observer_simulation_outputs = simulate.calculate_statistics(observer_simulation_counts, 20)

# Print the results
print(f"Simulated count of observers of cheating influenced to become cheaters: {observer_simulation_outputs[0]:.2f} ({observer_simulation_outputs[1]:.2f}, {observer_simulation_outputs[2]:.2f})")

Count of observers of cheating influenced to become cheaters: 25
Simulated count of observers of cheating influenced to become cheaters: 2.25 (1.62, 2.88)
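
Each simulation above is summarized as a mean with a 95% confidence interval. How `simulate.calculate_statistics` computes this is not shown here, so the sketch below uses one common choice as an assumption: a normal-approximation interval, mean ± 1.96 × (sample standard deviation / √n). The function name `mean_with_ci` and the sample counts are hypothetical.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation interval.

    z=1.96 corresponds to a two-sided 95% interval under the normal
    approximation. Requires at least two values.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical counts from five simulated runs
simulated_counts = [12, 15, 14, 17, 12]
mean, lower, upper = mean_with_ci(simulated_counts)
print(f"{mean:.2f} ({lower:.2f}, {upper:.2f})")
```

A symmetric interval of this kind can extend below zero when the mean is close to zero, which would be consistent with the negative lower bound reported earlier for simulated teams with 3 cheaters. With only 20 randomizations, a t-based multiplier would give a slightly wider interval than 1.96.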
