# Diseases notebook

The goal of this notebook is to estimate, from SDF logs, the odds of a disease appearing for each possible cause.


First, let's import the libraries and load the logs:

In [1]:
from tqdm import tqdm

import numpy as np
import pandas as pd
import utils

In [2]:
logs = utils.load_player_logs()

In [3]:
logs.head()

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
0,1,Janice,2.4,NEW_CREW_MEMBER,*Janice* s'est éveillée de son si long sommeil.,2.0,4.0
1,1,Janice,2.4,CHARACTER_LEFT,*Janice* est sortie.,2.0,4.0
2,1,Janice,2.4,CHARACTER_ENTERED,*Janice* est entrée.,2.0,4.0
3,1,Janice,2.4,CUDDLE_OTHER,"*Janice* réconforte *Finola*, ça ira mieux de...",2.0,4.0
4,1,Janice,2.4,CHARACTER_LEFT,*Janice* est sortie.,2.0,4.0


## Trauma
Players can catch diseases and disorders by witnessing a death. We will call this event a **Trauma**.
We want to estimate:
- the odds of getting a trauma
- the probability mass function of the diseases/disorders (if the trauma event is drawn, what are the odds of getting a specific disease?)

### Trauma probability

First, let's extract all the `FIST_KILLED` logs (a subset of the death logs):

In [4]:
utils.find_all_events_by_name(logs, "FIST_KILLED")

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
265,1,Janice,6.5,FIST_KILLED,*Janice* s'acharne sur *Eleesha*...qui rend s...,6.0,5.0
2177,2,Chao,4.5,FIST_KILLED,*Chao* s'acharne sur *Paola*...qui rend son d...,4.0,5.0
2209,2,Chao,4.7,FIST_KILLED,*Chao* s'acharne sur *Janice*...qui rend son ...,4.0,7.0
2810,2,Frieda,4.1,FIST_KILLED,*Frieda* s'acharne sur *Eleesha*...qui rend s...,4.0,1.0
3496,2,Kuan_Ti,6.6,FIST_KILLED,*Kuan Ti* s'acharne sur *Raluca*...qui rend s...,6.0,6.0
...,...,...,...,...,...,...,...
7723279,1707,Paola,1.8,FIST_KILLED,*Paola* s'acharne sur *Eleesha*...qui rend so...,1.0,8.0
7723404,1707,Ian,1.8,FIST_KILLED,*Ian* s'acharne sur *Chun*...qui rend son der...,1.0,8.0
7724967,1709,Kuan_Ti,3.5,FIST_KILLED,*Kuan Ti* s'acharne sur *Paola*...qui rend so...,3.0,5.0
7725273,1709,Janice,4.5,FIST_KILLED,*Janice* s'acharne sur *Terrence*...qui rend ...,4.0,5.0


Let's see if we can find the `Trauma` logs before or after the kill logs:

In [8]:
logs.loc[2808:2812]

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
2808,2,Frieda,4.1,TRAUMA_DISEASE,Ce que vous venez de voir vous a choqué. Vous...,4.0,1.0
2809,2,Frieda,4.1,DISEASED_PSY,Vous ne vous sentez pas très très bien... Vot...,4.0,1.0
2810,2,Frieda,4.1,FIST_KILLED,*Frieda* s'acharne sur *Eleesha*...qui rend s...,4.0,1.0
2811,2,Frieda,4.1,CHARACTER_LEFT,*Frieda* est sortie.,4.0,1.0
2812,2,Frieda,4.1,CHARACTER_ENTERED,*Frieda* est entrée.,4.0,1.0


In this first example, the `TRAUMA_DISEASE` log sits two rows before the kill log, and the disease log (here `DISEASED_PSY`) sits just before it.

In [10]:
logs.loc[263:267]

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
263,1,Janice,6.5,FIST_WOUNDED,*Janice* met une terrible baigne à *Eleesha*...,6.0,5.0
264,1,Janice,6.5,TRAUMA_DISEASE,Ce que vous venez de voir vous a choqué. Vous...,6.0,5.0
265,1,Janice,6.5,FIST_KILLED,*Janice* s'acharne sur *Eleesha*...qui rend s...,6.0,5.0
266,1,Janice,6.5,TRIUMPH_EARNED,Vous avez gagné *3 Triomphe*.,6.0,5.0
267,1,Janice,6.6,DMG_DEALT,Vous perdez 3 hp.,6.0,6.0


This second example is slightly different: there is a trauma log but no disease log.

The likely reason is that Janice is a Mush: she should have caught the disease but is immune.

In [11]:
logs.loc[2175:2179]

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
2175,2,Chao,4.5,FIST_WOUNDED,*Chao* mets un magistral uppercut à *Paola*...,4.0,5.0
2176,2,Chao,4.5,MORAL_DOWN,Vous avez perdu 1 moral.,4.0,5.0
2177,2,Chao,4.5,FIST_KILLED,*Chao* s'acharne sur *Paola*...qui rend son d...,4.0,5.0
2178,2,Chao,4.5,DIRTED,C'est dégoûtant... Vous vous êtes sali.,4.0,5.0
2179,2,Chao,4.6,LOG_ACCESS,*Chao* a accédé au *Centre de Communication*.,4.0,6.0


The third example shows that when a human player doesn't catch a disease after a kill, the log just before the kill is a `MORAL_DOWN` event.

In [4]:
logs.loc[7723402:7723406]

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
7723402,1707,Ian,1.8,FIST_MISSED,*Ian* rate son coup sur *Chun*.,1.0,8.0
7723403,1707,Ian,1.8,FIST_WOUNDED,*Ian* met une terrible baigne à *Chun*...,1.0,8.0
7723404,1707,Ian,1.8,FIST_KILLED,*Ian* s'acharne sur *Chun*...qui rend son der...,1.0,8.0
7723405,1707,Ian,1.8,TRIUMPH_EARNED,Vous avez gagné *3 Triomphe*.,1.0,8.0
7723406,1707,Ian,1.8,DMG_DEALT,Vous perdez 3 hp.,1.0,8.0


For a Mush who doesn't trigger a `TRAUMA_DISEASE` event, the log just before the kill is a `FIST_WOUNDED` log.

In [5]:
logs.loc[300728:300732]

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
300728,69,Jin_Su,4.8,FIST_WOUNDED,*Jin Su* mets un magistral uppercut à *Janice...,4.0,8.0
300729,69,Jin_Su,4.8,MORAL_DOWN,Vous avez perdu 1 moral.,4.0,8.0
300730,69,Jin_Su,4.8,SKILL_ADD_PA,Votre compétence *Sang-froid* a porté ses fru...,4.0,8.0
300731,69,Jin_Su,4.8,FIST_KILLED,*Jin Su* s'acharne sur *Janice*...qui rend so...,4.0,8.0
300732,69,Jin_Su,4.8,TRIUMPH_EARNED,Vous avez gagné *3 Triomphe*.,4.0,8.0


Another edge case to keep in mind is the Cold-blooded skill (*Sang-froid*), which prints a `SKILL_ADD_PA` log just before the kill log.

Let's recap by taking the log just before each kill log:

In [8]:
potential_traumas = logs.loc[utils.find_all_events_by_name(logs, "FIST_KILLED").index - 1]
potential_traumas

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
264,1,Janice,6.5,TRAUMA_DISEASE,Ce que vous venez de voir vous a choqué. Vous...,6.0,5.0
2176,2,Chao,4.5,MORAL_DOWN,Vous avez perdu 1 moral.,4.0,5.0
2208,2,Chao,4.7,MORAL_DOWN,Vous avez perdu 1 moral.,4.0,7.0
2809,2,Frieda,4.1,DISEASED_PSY,Vous ne vous sentez pas très très bien... Vot...,4.0,1.0
3495,2,Kuan_Ti,6.6,MORAL_DOWN,Vous avez perdu 1 moral.,6.0,6.0
...,...,...,...,...,...,...,...
7723278,1707,Paola,1.8,DISEASED,Vous ne vous sentez pas très très bien... Cha...,1.0,8.0
7723403,1707,Ian,1.8,FIST_WOUNDED,*Ian* met une terrible baigne à *Chun*...,1.0,8.0
7724966,1709,Kuan_Ti,3.5,MORAL_DOWN,Vous avez perdu 1 moral.,3.0,5.0
7725272,1709,Janice,4.5,MORAL_DOWN,Vous avez perdu 1 moral.,4.0,5.0
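
The "row just before each kill" lookup can be sketched with plain pandas on a toy table. This is only an illustration: `find_events` is a hypothetical stand-in that assumes `utils.find_all_events_by_name` simply filters on the `Action` column, and the toy rows are made up.

```python
import pandas as pd

# Toy log table (made-up rows); only the columns needed for the illustration.
toy_logs = pd.DataFrame(
    {
        "Character": ["Chao", "Chao", "Chao", "Ian", "Ian"],
        "Action": [
            "FIST_WOUNDED",
            "MORAL_DOWN",
            "FIST_KILLED",
            "FIST_WOUNDED",
            "FIST_KILLED",
        ],
    }
)


def find_events(logs: pd.DataFrame, name: str) -> pd.DataFrame:
    """Hypothetical stand-in for utils.find_all_events_by_name."""
    return logs[logs["Action"] == name]


kills = find_events(toy_logs, "FIST_KILLED")

# `index - 1` relies on the index being a contiguous integer range,
# and would fail for a kill sitting on the very first row.
previous = toy_logs.loc[kills.index - 1]

# Sanity check: the previous row should belong to the same player.
same_player = previous["Character"].to_numpy() == kills["Character"].to_numpy()

counts = previous["Action"].value_counts()
print(counts)
```

The `same_player` check is an extra safeguard, not something the notebook does; on the real logs one would probably compare `Ship` as well.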


In [13]:
events_repartition = potential_traumas.value_counts("Action")
events_repartition

Action
MORAL_DOWN        2336
FIST_WOUNDED       805
DISEASED           628
DISEASED_PSY       604
TRAUMA_DISEASE     268
SKILL_ADD_PA        61
dtype: int64

These six actions are exactly the cases covered in the examples above.

As said above, Mush players get either the `TRAUMA_DISEASE` log (trauma event triggered) or the `FIST_WOUNDED` log before the kill log.

Human players get the `DISEASED` or `DISEASED_PSY` log if they triggered the trauma event, and the `MORAL_DOWN` log otherwise.

So, keeping only the human logs, an estimate of the `Trauma` event probability is $\frac{\#\text{diseased} + \#\text{diseased psy}}{\#\text{diseased} + \#\text{diseased psy} + \#\text{moral down}}$.

In [29]:
diseased_logs = events_repartition["DISEASED"] + events_repartition["DISEASED_PSY"]
moral_down_logs = events_repartition["MORAL_DOWN"]

trauma_probability_1 = diseased_logs / (diseased_logs + moral_down_logs)

In [30]:
print("First estimation of Trauma event probability : {:.2f}%".format(trauma_probability_1 * 100))

First estimation of Trauma event probability : 34.53%


With the same reasoning, we can estimate this probability from kills made with a knife:

In [24]:
knife_potential_traumas = logs.loc[utils.find_all_events_by_name(logs, "KNIFE_KILLED").index - 1]
knife_event_repartition = knife_potential_traumas.value_counts("Action")

In [26]:
knife_event_repartition

Action
MORAL_DOWN        731
KNIFE_WOUNDED     535
DISEASED          232
DISEASED_PSY      176
TRAUMA_DISEASE    113
SKILL_ADD_PA       33
dtype: int64

In [32]:
diseased_logs = knife_event_repartition["DISEASED"] + knife_event_repartition["DISEASED_PSY"]
moral_down_logs = knife_event_repartition["MORAL_DOWN"]

trauma_probability_2 = diseased_logs / (diseased_logs + moral_down_logs)

In [33]:
print("2nd estimation of Trauma event probability : {:.2f}%".format(trauma_probability_2 * 100))

2nd estimation of Trauma event probability : 35.82%


It seems you have roughly a **1/3** chance (about 35% in both estimates) of catching a disease by witnessing a death.

(Note: strictly speaking, this is the chance of catching a disease when you kill a player yourself; for a mere witness it might be lower, but that is harder to estimate.)
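
To get a feel for the uncertainty behind these two percentages, here is a minimal sketch that computes a 95% Wilson score interval from the counts shown in the two tables above. The choice of the Wilson interval is an assumption of this sketch, not something the notebook uses, and pooling the fist and knife counts presumes both kinds of kill share the same trauma probability.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


# (diseased + diseased psy, diseased + diseased psy + moral down), humans only.
fist = (628 + 604, 628 + 604 + 2336)
knife = (232 + 176, 232 + 176 + 731)
# Assumption: fist and knife kills share the same trauma probability.
pooled = (fist[0] + knife[0], fist[1] + knife[1])

for label, (k, n) in {"fist": fist, "knife": knife, "pooled": pooled}.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k / n:.2%} (95% interval {low:.2%} to {high:.2%})")
```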

### Mass function of diseases

We just need to look at the disease log that follows each `TRAUMA_DISEASE` event:

In [21]:
traumas = utils.find_all_events_by_name(logs, "TRAUMA_DISEASE")
diseases = logs.loc[traumas.index + 1]

is_disease_event = (diseases["Action"] == "DISEASED")
is_psy_disease_event = (diseases["Action"] == "DISEASED_PSY")

physical_diseases = diseases[is_disease_event]  # remove logs that are not disease events
physical_diseases = physical_diseases["Log"].apply(lambda x: x.split(":")[-1].split(".")[0])  # keep only the disease name

psy_diseases = diseases[is_psy_disease_event]  # remove logs that are not disorder events
psy_diseases = psy_diseases["Log"].apply(lambda x: x.split(":")[-1].split(".")[0])  # keep only the disorder name

diseases = pd.concat([physical_diseases, psy_diseases])

In [17]:
physical_diseases.value_counts(normalize=True) * 100

 Migraine          57.027183
 GastroEntérite    42.972817
Name: Log, dtype: float64

In [18]:
psy_diseases.value_counts(normalize=True) * 100

 Migraine chronique       12.798742
 Episodes psychotiques    12.672956
 Phobie des armes         12.327044
 Crise Paranoïaque        12.295597
 Crabisme                 12.138365
 Coprolalie               12.075472
 Dépression               11.635220
 Agoraphobie               6.037736
 Vertige chronique         5.503145
 Spleen                    2.515723
Name: Log, dtype: float64

In [22]:
diseases.value_counts(normalize=True) * 100

 Migraine                 29.707743
 GastroEntérite           22.386261
 Migraine chronique        6.131365
 Episodes psychotiques     6.071106
 Phobie des armes          5.905393
 Crise Paranoïaque         5.890328
 Crabisme                  5.815005
 Coprolalie                5.784875
 Dépression                5.573968
 Agoraphobie               2.892437
 Vertige chronique         2.636336
 Spleen                    1.205182
Name: Log, dtype: float64

## Sex

Players can catch their partner's diseases during sex.

- Which diseases can be transmitted?
- What are the odds of transmission?

First, let's extract the `EV_DONE_IT` (sex) and `DISEASE_BY_SEX` logs:

In [4]:
sex_diseases_logs = utils.find_all_events_by_name(logs, "DISEASE_BY_SEX").reset_index(drop=True)
sex_diseases_logs

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
0,74,Janice,6.1,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,6.0,1.0
1,85,Terrence,9.4,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,9.0,4.0
2,85,Raluca,9.4,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,9.0,4.0
3,276,Jin_Su,13.5,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,13.0,5.0
4,429,Hua,2.8,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,2.0,8.0
5,677,Stephen,8.4,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,8.0,4.0
6,696,Raluca,15.8,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,15.0,8.0
7,788,Terrence,11.8,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,11.0,8.0
8,788,Terrence,12.3,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,12.0,3.0
9,788,Raluca,12.3,DISEASE_BY_SEX,Vous vous sentez bizarre...l'émotion...ou bie...,12.0,3.0


In [5]:
sex_logs = utils.find_all_events_by_name(logs, "EV_DONE_IT").reset_index(drop=True)
sex_logs

Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle
0,10,Finola,4.4,EV_DONE_IT,*Finola* et *Ian* éteignent la lumière et des...,4.0,4.0
1,11,Roland,7.1,EV_DONE_IT,*Roland* et *Eleesha* éteignent la lumière et...,7.0,1.0
2,11,Gioele,9.3,EV_DONE_IT,*Gioele* et *Frieda* éteignent la lumière et ...,9.0,3.0
3,12,Chao,4.5,EV_DONE_IT,*Chao* et *Janice* éteignent la lumière et de...,4.0,5.0
4,15,Chun,16.8,EV_DONE_IT,*Chun* et *Chao* éteignent la lumière et des ...,16.0,8.0
...,...,...,...,...,...,...,...
1680,1699,Gioele,7.2,EV_DONE_IT,*Gioele* et *Eleesha* éteignent la lumière et...,7.0,2.0
1681,1701,Chun,4.4,EV_DONE_IT,*Chun* et *Stephen* éteignent la lumière et d...,4.0,4.0
1682,1705,Derek,8.5,EV_DONE_IT,*Derek* et *Janice* éteignent la lumière et d...,8.0,5.0
1683,1705,Derek,11.7,EV_DONE_IT,*Derek* et *Janice* éteignent la lumière et d...,11.0,7.0


Let's add the two partners of each sex event to this dataframe (it will help later):

In [6]:
sex_logs["Partner_A"] = sex_logs["Log"].apply(lambda x: x.split("*")[1])
sex_logs["Partner_B"] = sex_logs["Log"].apply(lambda x: x.split("*")[3])
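
One thing to watch in the tables above: the `Character` column writes `Jin_Su` and `Kuan_Ti`, while the log text writes `*Jin Su*` and `*Kuan Ti*`. If the partner names are later compared with `Character`, they may need the same spelling. The sketch below shows one possible normalization on a toy frame; the second log line is a made-up example, and replacing spaces with underscores is an assumption about how `Character` is built.

```python
import pandas as pd

toy_sex_logs = pd.DataFrame(
    {
        "Log": [
            "*Finola* et *Ian* éteignent la lumière",
            # Hypothetical log line, only here to exercise two-word names.
            "*Jin Su* et *Kuan Ti* éteignent la lumière",
        ]
    }
)

# Splitting on "*" yields: "", partner A, " et ", partner B, rest of the text.
parts = toy_sex_logs["Log"].str.split("*", expand=True)

# Assumption: `Character` uses underscores where the log text uses spaces.
toy_sex_logs["Partner_A"] = parts[1].str.replace(" ", "_", regex=False)
toy_sex_logs["Partner_B"] = parts[3].str.replace(" ", "_", regex=False)

print(toy_sex_logs[["Partner_A", "Partner_B"]])
```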

### Transmitted diseases

We just need to extract the disease names from the `DISEASE_BY_SEX` logs:

In [7]:
STDs = np.unique(sex_diseases_logs["Log"].apply(lambda x: x.split("(")[-1].split(")")[0]))
STDs

array(['Eruption cutanée', 'GastroEntérite', 'Grippe'], dtype=object)

### Probability of transmission

To estimate this probability, the idea is the following:
1) Extract all the disease events and the disease_end events
2) For each sex event:
   1) Keep the disease events which happened on the same ship
   2) Keep the disease events which happened no later than the sex event cycle
   3) Keep the disease events which concern one of the partners
   4) Keep the disease events which are in the STDs list
   
   If this list isn't empty, for each disease:
   
   1) Keep the disease_end events which happened between the disease onset cycle and the sex event cycle
   2) Keep the disease_end events which concern one of the partners
   3) Keep the disease_end events which concern the current disease
   
If the disease_end events list is empty, one of the partners still had an STD during sex, so we keep that sex event in the sample used for the estimation.

An estimate of the probability of transmitting an STD is then $\frac{\#\text{sexDiseases}}{\#\text{sexWithSTDs}}$, where $\#\text{sexDiseases}$ is the number of `DISEASE_BY_SEX` logs and $\#\text{sexWithSTDs}$ is the number of sex events kept in the sample.

#### 1) Extraction of disease and disease_end events

In [8]:
all_diseases = utils.find_all_events_by_name(logs, "DISEASED").copy()  # copy, so adding a column doesn't write to a slice of `logs`
all_diseases["Disease"] = all_diseases["Log"].apply(lambda x: x.split(":")[-1].split(".")[0].strip())
all_diseases


Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle,Disease
2267,2,Jin_Su,3.2,DISEASED,Vous ne vous sentez pas très très bien... Cha...,3.0,2.0,Migraine
2963,2,Frieda,6.1,DISEASED,Vous ne vous sentez pas très très bien... Cha...,6.0,1.0,Reflux Gastriques
3038,2,Frieda,6.7,DISEASED,Vous ne vous sentez pas très très bien... Cha...,6.0,7.0,GastroEntérite
3150,2,Eleesha,3.7,DISEASED,Vous ne vous sentez pas très très bien... Cha...,3.0,7.0,GastroEntérite
3753,3,Terrence,2.2,DISEASED,Vous ne vous sentez pas très très bien... Cha...,2.0,2.0,GastroEntérite
...,...,...,...,...,...,...,...,...
7722011,1707,Frieda,4.4,DISEASED,Vous ne vous sentez pas très très bien... Cha...,4.0,4.0,Migraine
7722043,1707,Terrence,1.8,DISEASED,Vous ne vous sentez pas très très bien... Cha...,1.0,8.0,Migraine
7722079,1707,Gioele,1.8,DISEASED,Vous ne vous sentez pas très très bien... Cha...,1.0,8.0,Migraine
7722890,1707,Derek,4.6,DISEASED,Vous ne vous sentez pas très très bien... Cha...,4.0,6.0,GastroEntérite


In [10]:
all_disease_ends = utils.find_all_events_by_name(logs, "DISEASE_END").copy()  # copy, so adding a column doesn't write to a slice of `logs`
all_disease_ends["Disease"] = all_disease_ends["Log"].apply(lambda x: x.split(' votre ')[-1].split(" a ")[0])
all_disease_ends


Unnamed: 0,Ship,Character,Day.Cycle,Action,Log,Day,Cycle,Disease
2273,2,Jin_Su,3.3,DISEASE_END,"Vous vous sentez mieux, votre Migraine a fini...",3.0,3.0,Migraine
5059,4,Jin_Su,5.7,DISEASE_END,"Vous vous sentez mieux, votre Migraine a fini...",5.0,7.0,Migraine
5255,4,Raluca,4.2,DISEASE_END,"Vous vous sentez mieux, votre Migraine a fini...",4.0,2.0,Migraine
6545,5,Chun,3.6,DISEASE_END,"Vous vous sentez mieux, votre GastroEntérite ...",3.0,6.0,GastroEntérite
8779,6,Stephen,4.1,DISEASE_END,"Vous vous sentez mieux, votre Migraine a fini...",4.0,1.0,Migraine
...,...,...,...,...,...,...,...,...
7720540,1706,Gioele,3.1,DISEASE_END,"Vous vous sentez mieux, votre Grippe a fini p...",3.0,1.0,Grippe
7722012,1707,Frieda,4.5,DISEASE_END,"Vous vous sentez mieux, votre GastroEntérite ...",4.0,5.0,GastroEntérite
7722013,1707,Frieda,4.5,DISEASE_END,"Vous vous sentez mieux, votre Migraine a fini...",4.0,5.0,Migraine
7722049,1707,Terrence,2.2,DISEASE_END,"Vous vous sentez mieux, votre Migraine a fini...",2.0,2.0,Migraine
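
As a small illustration of how the disease name is pulled out of a `DISEASE_END` log, the sketch below applies the same split to the visible start of two log lines from the table above, next to a vectorized regex variant. The regex is an alternative suggested here, not what the notebook uses.

```python
import pandas as pd

# Visible beginnings of two DISEASE_END log lines (the full text is truncated above).
end_logs = pd.Series(
    [
        "Vous vous sentez mieux, votre Migraine a fini",
        "Vous vous sentez mieux, votre Grippe a fini",
    ]
)

# Same approach as the notebook: text between " votre " and " a ".
split_names = end_logs.apply(lambda x: x.split(" votre ")[-1].split(" a ")[0])

# Vectorized alternative: capture the shortest text between "votre " and " a ".
regex_names = end_logs.str.extract(r"votre (.+?) a ", expand=False)

print(split_names.tolist())
print(regex_names.tolist())
```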


#### 2) Main loop

In [11]:
nb_sex_with_STDs = 0
for _, row in tqdm(sex_logs.iterrows()):
    ship = row["Ship"]
    partner_a = row["Partner_A"]
    partner_b = row["Partner_B"]
    sex_day_cycle = row["Day.Cycle"]
    
    # STD onsets on this ship, for either partner, no later than the sex event
    diseases = utils.get_ship_logs(all_diseases, ship)
    diseases = diseases[diseases["Day.Cycle"] <= sex_day_cycle]
    diseases = diseases[diseases["Character"].isin([partner_a, partner_b])]
    diseases = diseases[diseases["Disease"].isin(STDs)]

    if diseases.empty:
        continue
    
    # For each STD, look for a cure between its onset and the sex event
    disease_ends = utils.get_ship_logs(all_disease_ends, ship)
    for _, disease_row in diseases.iterrows():
        disease = disease_row["Disease"]
        disease_apparition_cycle = disease_row["Day.Cycle"]
        # NB: disease_ends is reassigned here, so each iteration filters the
        # result of the previous one rather than the full ship-level table
        disease_ends = utils.get_logs_after_cycle_of_day(disease_ends, disease_apparition_cycle)
        disease_ends = utils.get_logs_before_cycle_of_day(disease_ends, sex_day_cycle)
        disease_ends = disease_ends[disease_ends["Character"].isin([partner_a, partner_b])]
        disease_ends = disease_ends[disease_ends["Disease"] == disease]
        if disease_ends.empty:
            # no cure found: incremented once per uncured disease row
            nb_sex_with_STDs += 1

1685it [00:04, 416.49it/s]
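
The loop above narrows `disease_ends` cumulatively and increments the counter once per uncured disease. A possible variant, shown here on made-up toy data, filters the full table for every disease, matches the cure to the character who was infected, and counts each sex event at most once. This is a sketch under those assumptions, not the notebook's method; `count_sex_with_active_std` is a hypothetical helper, and comparing `Day.Cycle` as a float assumes cycles stay single-digit, as in the tables shown.

```python
import pandas as pd

STD_NAMES = ["Eruption cutanée", "GastroEntérite", "Grippe"]


def count_sex_with_active_std(sex_logs, diseases, disease_ends, stds):
    """Count sex events where at least one partner carries an uncured STD."""
    count = 0
    for _, sex in sex_logs.iterrows():
        partners = [sex["Partner_A"], sex["Partner_B"]]
        onsets = diseases[
            (diseases["Ship"] == sex["Ship"])
            & (diseases["Day.Cycle"] <= sex["Day.Cycle"])
            & diseases["Character"].isin(partners)
            & diseases["Disease"].isin(stds)
        ]
        for _, onset in onsets.iterrows():
            # Always filter the full table, never a previously narrowed one.
            cured = disease_ends[
                (disease_ends["Ship"] == sex["Ship"])
                & (disease_ends["Character"] == onset["Character"])
                & (disease_ends["Disease"] == onset["Disease"])
                & (disease_ends["Day.Cycle"] >= onset["Day.Cycle"])
                & (disease_ends["Day.Cycle"] <= sex["Day.Cycle"])
            ]
            if cured.empty:
                count += 1
                break  # one uncured STD is enough: count the event once
    return count


# Made-up toy data with the same column names as the notebook's frames.
toy_sex = pd.DataFrame(
    {
        "Ship": [1, 1, 2],
        "Partner_A": ["Chao", "Chao", "Ian"],
        "Partner_B": ["Janice", "Janice", "Hua"],
        "Day.Cycle": [4.5, 6.2, 3.1],
    }
)
toy_diseases = pd.DataFrame(
    {
        "Ship": [1, 1, 2],
        "Character": ["Chao", "Janice", "Ian"],
        "Disease": ["Grippe", "GastroEntérite", "Migraine"],
        "Day.Cycle": [3.2, 4.1, 2.1],
    }
)
toy_ends = pd.DataFrame(
    {
        "Ship": [1, 1],
        "Character": ["Chao", "Janice"],
        "Disease": ["Grippe", "GastroEntérite"],
        "Day.Cycle": [4.1, 5.3],
    }
)

n_active = count_sex_with_active_std(toy_sex, toy_diseases, toy_ends, STD_NAMES)
print(n_active)
```

In the toy data, only the first sex event is kept: Janice's GastroEntérite starts before it and ends after it, whereas both STDs are cured before the second event and the third involves no STD.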


In [12]:
print("Estimation of STDs transmission probability: {:.2f}%".format(len(sex_diseases_logs) / nb_sex_with_STDs * 100))

Estimation of STDs transmission probability: 6.90%
