## The homophily and social contagion of cheating 

---

**NOTE: You are only allowed to use fundamental Python data types (lists, tuples, dictionaries, numpy.ndarray, etc.) to complete this assignment.**

Although this assignment is quite streamlined, imagine that the tasks here are part of a larger project. 


### Output

The tasks ask you to output actual counts and expected counts (mean with 95% confidence interval). To estimate the 95% confidence intervals, ignore the small sample size and the fact that we are dealing with count data, and simply use the approximation: 95% CI $= \mu \pm 1.96 \frac{\sigma}{\sqrt{n}}$, where $\mu$ is the mean and $\sigma$ the standard deviation of the counts in the $n=20$ randomizations.
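As a minimal sketch of this approximation, the hypothetical helper `mean_ci95` below takes the list of counts from the randomizations and returns the mean and the interval bounds. It assumes $\sigma$ is the sample standard deviation (`statistics.stdev`); the assignment does not say which version to use, and `statistics.pstdev` would give the population one. A randomization in which a category never occurs should still contribute a count of 0: dropping it would inflate the mean (in the example below, to 2 instead of 1.5).

```python
import math
import statistics


def mean_ci95(counts):
    """Return (mean, (lower, upper)) with bounds mu +/- 1.96 * sigma / sqrt(n).

    Hypothetical helper. Assumption: sigma is the sample standard deviation
    (statistics.stdev); needs at least two counts.
    """
    n = len(counts)
    mu = float(statistics.mean(counts))
    sigma = statistics.stdev(counts)
    half_width = 1.96 * sigma / math.sqrt(n)
    return mu, (mu - half_width, mu + half_width)


# Hypothetical counts from four randomizations; the third one saw no such
# team at all and therefore contributes a 0 instead of being skipped.
mu, (lo, hi) = mean_ci95([3, 1, 0, 2])
print(round(mu, 2), round(lo, 2), round(hi, 2))  # 1.5 0.23 2.77
```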


## Import and run code

In [1]:
# Import modules
import copy
from operator import itemgetter
from counting import *
from presentation import *
from randomisations import *
from data_manipulation import *
from dictionary_methods import *

In [2]:
# Read in cheaters data, converting the values at indices 1 and 2 to date objects
cheaters_txt = open_txt('../assignment-final-data/cheaters.txt', date_idx=1, date_idx_2=2)

In [3]:
# Create a dictionary with the first element of cheaters_txt as the key
# and the remaining items as the value to that key.
cheaters_dic = {x[0]: x[1:] for x in cheaters_txt}

In [4]:
# Read in teams data, converting the value at index 2 to an integer
teams_txt = open_txt('../assignment-final-data/team_ids.txt', intg=2)

In [5]:
# Read in kills data, converting the value at index 3 to a datetime object
kills_txt = open_txt('../assignment-final-data/kills.txt', date_idx=3, dt=True)

In [6]:
# Collect the match IDs (index 0) of all kills_txt records
all_matches_kills = index_set(kills_txt, 0)

### 1. Do cheaters team up?

Use the files `cheaters.txt` and `team_ids.txt` to estimate how often cheaters (regardless of when exactly they started cheating) end up on the same team. Your output should say how many teams have 0, 1, 2, 3, or 4 cheaters.
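A minimal sketch of this count using only dictionaries, assuming each row of `team_ids.txt` has been read as a `(match_id, player_id, team_id)` tuple (the real column order may differ) and that `cheaters` holds the IDs of everyone who ever cheated. The helper name `teams_by_cheater_count` is hypothetical; it is not one of the notebook's imported functions. Note that teams without any cheater must still be recorded so that the 0 category is counted.

```python
def teams_by_cheater_count(team_rows, cheaters):
    """Map k -> number of teams with exactly k cheaters (hypothetical helper).

    Assumes team_rows holds (match_id, player_id, team_id) tuples and that
    cheaters is a set or dict keyed by the IDs of everyone who ever cheated.
    """
    # Cheaters per (match, team); teams without cheaters still get a 0 entry
    per_team = {}
    for match_id, player_id, team_id in team_rows:
        key = (match_id, team_id)
        per_team[key] = per_team.get(key, 0) + int(player_id in cheaters)

    # Tally how many teams have each number of cheaters
    tally = {}
    for k in per_team.values():
        tally[k] = tally.get(k, 0) + 1
    return tally


rows = [("m1", "a", 1), ("m1", "b", 1), ("m1", "c", 2), ("m1", "d", 2),
        ("m2", "a", 1), ("m2", "e", 2)]
print(teams_by_cheater_count(rows, {"a", "b"}))  # {2: 1, 0: 2, 1: 1}
```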

Now, randomly shuffle the team ids among the players in a match. Repeat this 20 times and estimate the expected counts as before. Output the mean and the 95% confidence intervals for the expected counts. 
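One way to implement this shuffle, sketched under an assumed `(match_id, player_id, team_id)` row layout: permute the team IDs among the players of each match, which keeps every team's size unchanged while reassigning who plays on which team. `shuffle_teams_within_match` is a hypothetical name; passing a seeded `random.Random` makes a run reproducible.

```python
import random


def shuffle_teams_within_match(team_rows, rng=random):
    """Return new rows with team IDs permuted among the players of each match.

    Hypothetical helper; assumes (match_id, player_id, team_id) tuples.
    Team sizes within a match are preserved and the input is not modified.
    """
    # Group row positions by match
    positions = {}
    for i, (match_id, _, _) in enumerate(team_rows):
        positions.setdefault(match_id, []).append(i)

    shuffled = list(team_rows)
    for idx in positions.values():
        # Shuffle this match's team IDs and hand them back out in row order
        team_ids = [team_rows[i][2] for i in idx]
        rng.shuffle(team_ids)
        for i, team_id in zip(idx, team_ids):
            match_id, player_id, _ = team_rows[i]
            shuffled[i] = (match_id, player_id, team_id)
    return shuffled


rows = [("m1", "a", 1), ("m1", "b", 1), ("m1", "c", 2), ("m1", "d", 2),
        ("m2", "e", 1), ("m2", "f", 2)]
print(shuffle_teams_within_match(rows, random.Random(0)))
```

Repeating the shuffle 20 times and recounting after each one gives the 20 expected counts per category.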

### Preparing data

In [7]:
match_teams_cheaters, match_teams, match_players = slicing_teams(teams_txt, cheaters_dic)

### Determining numbers of cheaters on each team in *original* data

In [8]:
# Calculate the number of cheaters on each team per match
cheater_count, teams_count = cheaters_per_team(match_teams, match_teams_cheaters)

In [9]:
# Record team cheater values in a counter dictionary
cheaters_teams_actual = team_counter(cheater_count, teams_count)

### Determining expected number of cheaters in teams in *randomised* data

In [10]:
# Work on a copy so that shuffling does not modify the original teams data
t_shfl = copy.deepcopy(teams_txt)

In [12]:
cheater_count_list = [[] for x in range(len(cheaters_teams_actual))]

for _ in range(20):

    # Shuffle team IDs among the players within each match
    t_shfl = team_players_shfl(t_shfl, match_players)

    # Prepare shuffled data
    match_teams_cheaters_shfl, match_teams_shfl, match_players = slicing_teams(t_shfl, cheaters_dic)

    # Calculate cheaters per team with shuffled data
    cheater_count, teams_count = cheaters_per_team(match_teams_shfl, match_teams_cheaters_shfl)

    # Tally teams by number of cheaters
    cheaters_teams = team_counter(cheater_count, teams_count)

    # Append each tally to the list for its number of cheaters
    for num_cheaters, num_teams in cheaters_teams.most_common():
        cheater_count_list[num_cheaters].append(num_teams)

### Output

In [13]:
print_counter(cheaters_teams_actual.most_common())

Teams with 0 cheaters
Actual: 170782

Teams with 1 cheater
Actual: 3199

Teams with 2 cheaters
Actual: 182

Teams with 3 cheaters
Actual: 9

Teams with 4 cheaters
Actual: 2



In [14]:
print_counter_expected(cheater_count_list)

Teams with 0 cheaters
Expected (mean) (rounded): 170609.1
95% Confidence Interval (rounded): (170606.8, 170611.4)

Teams with 1 cheater
Expected (mean) (rounded): 3531.9
95% Confidence Interval (rounded): (3527.4, 3536.4)

Teams with 2 cheaters
Expected (mean) (rounded): 32.9
95% Confidence Interval (rounded): (30.6, 35.2)

Teams with 3 cheaters
Expected (mean) (rounded): 1.0
95% Confidence Interval (rounded): (1.0, 1.0)

There are no teams with 4 cheaters.


### 2. Do victims of cheating start cheating?

Use the files `cheaters.txt` and `kills.txt` to count how many players got killed by an active cheater on at least one occasion and then started cheating. Specifically, we are interested in situations where:

1. Player B has started cheating but player A is not cheating.
2. Player B kills player A.
3. At some point afterwards, player A starts cheating.

Output the count in the data. 
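Conditions 1 to 3 reduce to date comparisons once the match start date is known. A minimal sketch, assuming kill events are `(killer, victim, time)` tuples grouped by match and that cheating start dates sit in a dict keyed by player ID; all names here are hypothetical. Because the victim must not be cheating yet but must start later, the single check `cheat_start[victim] > day` covers both condition 1 (for the victim) and condition 3. Victims are collected in a set so that each player is counted once.

```python
from datetime import date


def count_victims_who_start_cheating(match_events, match_dates, cheat_start):
    """Count distinct players killed by an active cheater while not cheating
    themselves, who later started cheating (hypothetical helper).

    match_events: dict match_id -> list of (killer, victim, time) tuples (assumed layout)
    match_dates:  dict match_id -> date on which the match started
    cheat_start:  dict player_id -> date on which the player started cheating
    """
    victims = set()
    for match_id, events in match_events.items():
        day = match_dates[match_id]
        for killer, victim, _ in events:
            # Killer cheats in matches starting on or after their start date
            killer_active = killer in cheat_start and cheat_start[killer] <= day
            # Victim is not cheating yet but starts strictly after this match date
            victim_later = victim in cheat_start and cheat_start[victim] > day
            if killer_active and victim_later:
                victims.add(victim)
    return len(victims)


cheat_start = {"b": date(2019, 3, 1), "a": date(2019, 3, 5), "c": date(2019, 2, 1)}
match_dates = {"m1": date(2019, 3, 2), "m2": date(2019, 3, 2)}
match_events = {"m1": [("b", "a", 1)], "m2": [("b", "c", 5), ("d", "a", 6)]}
print(count_victims_who_start_cheating(match_events, match_dates, cheat_start))  # 1
```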

Then, simulate alternative worlds in which everything is the same but the events took a somewhat different sequence. To do so, randomize within a game, keeping the timing and structure of interactions but shuffling the player ids. Generate 20 randomizations like this and estimate the expected count of victims of cheating who start cheating as before. Output the mean and the 95% confidence interval for the expected count in these randomized worlds.
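A sketch of the within-game randomization, assuming kill events are `(killer, victim, time)` tuples: draw a random permutation of the match's players and relabel every event with it, so the times and the pattern of who killed whom stay fixed while the identities move. The function name `shuffle_player_ids` is hypothetical; the full player list is passed in so that players who appear in no kill event can also swap in.

```python
import random


def shuffle_player_ids(events, players, rng=random):
    """Relabel one match's kill events with a random permutation of its players.

    Hypothetical helper.
    events:  list of (killer, victim, time) tuples (assumed layout)
    players: list of all distinct player IDs in the match, including any
             who appear in no kill event
    Times and the kill structure are unchanged; only the IDs move.
    """
    permuted = list(players)
    rng.shuffle(permuted)
    relabel = dict(zip(players, permuted))
    return [(relabel[killer], relabel[victim], t) for killer, victim, t in events]


events = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]
print(shuffle_player_ids(events, ["a", "b", "c", "d"], random.Random(1)))
```

Applying this to every match once gives one alternative world; the Part 2 count is then recomputed on it, 20 times in total.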

#### Hint

The start of cheating is recorded only as a date, so assume that a player cheats in every match that they started playing on or after that date. Use the match's starting date: if a match started before midnight of the cheating date but ended after midnight, assume that the player was not yet cheating in that match.
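The rule in this hint fits in one comparison. A minimal sketch with a hypothetical `is_cheating` helper, assuming the match start is a `datetime` and cheating start dates are `date` objects stored in a dict keyed by player ID. Only the calendar date of the match start is compared, so a match beginning at 23:50 the day before the cheating date counts as clean even if it ends after midnight.

```python
from datetime import date, datetime


def is_cheating(player_id, match_start, cheat_start):
    """True if the player cheats in a match that started at match_start.

    Hypothetical helper. cheat_start maps player IDs to the date on which
    they started cheating; players missing from it never cheat.
    """
    start = cheat_start.get(player_id)
    return start is not None and match_start.date() >= start


cs = {"x": date(2019, 3, 5)}
print(is_cheating("x", datetime(2019, 3, 4, 23, 50), cs))  # False
print(is_cheating("x", datetime(2019, 3, 5, 0, 10), cs))   # True
```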


### Preparing data

In [15]:
# Build the per-match components: cheaters, kill events, start date, victims and players
match_cheaters, match_events, match_date, match_victims, match_players = create_directory(kills_txt, cheaters_dic)

In [16]:
# Combine the per-match components into a single library keyed by match ID,
# with each match's kill events sorted by their timestamp (index 2)
match_library = {x:[match_cheaters[x],
                    sorted(match_events[x], key=itemgetter(2)),
                    match_date[x], 
                    match_victims[x]] for x in all_matches_kills}

In [17]:
# For each match, pair the list of players with a shuffled copy of it
swap_players = create_shuffle_swap(match_players)

### Determining actual number of victims who start cheating in *original* data

In [18]:
victim_actual = victims_start_cheating(match_library, cheaters_dic)

### Determining expected number of victims who start cheating in *randomised* data

In [19]:
# Work on a copy so that shuffling does not modify the original match events
match_events_shfl = copy.deepcopy(match_events)

In [20]:
# Twenty randomisations
victim_expected = []

for i in range(20):
    
    # Shuffling events data
    match_library_shfl = indexing_shuffle(match_library, match_events_shfl, swap_players)

    # Determine which victims begin cheating after being killed
    victim_cheating_shfl = victims_start_cheating(match_library_shfl, cheaters_dic)

    # Record the count for this randomization
    victim_expected.append(victim_cheating_shfl)

    # Reshuffle data before next iteration
    swap_players = shuffle_swap(swap_players)

### Output

In [21]:
print_actual_expected(victim_actual, victim_expected)

Actual:  47
Expected:  10.5
95% Confidence Interval (rounded):  (8.44, 12.56)


### 3. Do observers of cheating start cheating?

Use the files `cheaters.txt` and `kills.txt` to count how many players observed an active cheater on at least one occasion and then started cheating. Cheating players can be recognized because they exhibit abnormal killing patterns. We will assume that player A realizes that player B cheats if:

1. Player B has started cheating but player A is not cheating.
2. Player B kills at least 3 other players before player A gets killed in the game.
3. At some point afterwards, player A starts cheating.

Output the count in the data.
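A minimal sketch of the observer count, assuming kill events are `(killer, victim, time)` tuples already sorted by time and grouped by match, and that cheating start dates sit in a dict keyed by player ID; all names are hypothetical. It walks each match in order, keeps a running kill count for every active cheater, and checks each victim against the kills made *before* that victim's death, so the kill of player A itself never counts toward B's three.

```python
from datetime import date


def count_observers_who_start_cheating(match_events, match_dates, cheat_start, min_kills=3):
    """Count distinct players who, while not cheating, saw an active cheater
    kill at least min_kills other players before being killed themselves,
    and who later started cheating (hypothetical helper).

    match_events: dict match_id -> time-sorted list of (killer, victim, time) (assumed layout)
    match_dates:  dict match_id -> date on which the match started
    cheat_start:  dict player_id -> date on which the player started cheating
    """
    observers = set()
    for match_id, events in match_events.items():
        day = match_dates[match_id]
        kills_so_far = {}  # active cheater -> kills made so far in this match
        for killer, victim, _ in events:
            # Check the victim against kills made strictly before this one
            victim_later = victim in cheat_start and cheat_start[victim] > day
            if victim_later and any(n >= min_kills for n in kills_so_far.values()):
                observers.add(victim)
            # Only then credit this kill to the killer, if they are cheating
            if killer in cheat_start and cheat_start[killer] <= day:
                kills_so_far[killer] = kills_so_far.get(killer, 0) + 1
    return len(observers)


cheat_start = {"b": date(2019, 3, 1), "a": date(2019, 3, 5)}
match_dates = {"m1": date(2019, 3, 2)}
match_events = {"m1": [("b", "x", 1), ("b", "y", 2), ("b", "z", 3), ("q", "a", 4)]}
print(count_observers_who_start_cheating(match_events, match_dates, cheat_start))  # 1
```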

Then, use the 20 randomizations from Part 2 to estimate the expected count of observers of cheating who start cheating. Output the mean and the 95% confidence interval for the expected count in these randomized worlds.

### Preparing data

In [22]:
# Reset match library to use original match events
# and not the shuffled match events.

for key, entry in match_library.items():
    entry[1] = sorted(match_events[key], key=itemgetter(2))

### Determining actual number of observers who start cheating in *original* data

In [23]:
# Find observers list and keep the match date
all_observers = find_killed_observers(match_library)

In [24]:
# Count the observers who started cheating afterwards
observers_actual = count_active_victims(all_observers, cheaters_dic)

### Determining expected number of observers who start cheating in *randomised* data

In [25]:
# Twenty randomisations
observers_expected = []

for i in range(20):

    # Shuffling events data and resetting victims list
    match_library_shfl = indexing_shuffle(match_library, match_events_shfl, swap_players)
    match_library_shfl = reset_cheaters_observers(match_library_shfl, cheaters_dic)

    # Count the observers who start cheating after watching an active cheater
    all_observers_shfl = find_killed_observers(match_library_shfl)
    num_observers = count_active_victims(all_observers_shfl, cheaters_dic)

    # Record the count for this randomization
    observers_expected.append(num_observers)

    # Reshuffle data before next iteration
    swap_players = shuffle_swap(swap_players)

### Output

In [26]:
print_actual_expected(observers_actual, observers_expected)

Actual:  213
Expected:  47.2
95% Confidence Interval (rounded):  (43.72, 50.68)
