# Mutation Analysis

In this Jupyter Notebook we analyze the **stat_mutation_best.out** files from different conditions to check whether the *mutation up* condition actually increases the number of mutations.

First we will read in the data.

In [3]:
import pandas as pd
import matplotlib.pyplot as plt

INPUT_ROOT_DIR = "C://ThesisData//" # laptop
OUTPUT_ROOT_DIR = INPUT_ROOT_DIR + "Graphics//Mutations//"

# We're going to be importing files with the same name from multiple directories
CONTROL_INPUT_FILE = "//control//stats//stat_mutation_best.out"
MUTATION_UP_INPUT_FILE = "//mut_up//stats//stat_mutation_best.out"

# Column header names for the columns in stat_mutation_best.out
mutation_best_names = ['generation','number_local_mutations','number_chromosomic_rearrangements','number_switches','number_indels','number_duplications','number_deletions','number_translocations','inversions']

print("Reading in the data...")


# Each file starts with 14 lines of preamble, followed by its own header row,
# which header=0 together with names= replaces with our column names.
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas).

# seed01
df_seed01_control = pd.read_csv(INPUT_ROOT_DIR + "seed01" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed01_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed01" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed02
df_seed02_control = pd.read_csv(INPUT_ROOT_DIR + "seed02" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed02_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed02" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed03
df_seed03_control = pd.read_csv(INPUT_ROOT_DIR + "seed03" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed03_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed03" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed04
df_seed04_control = pd.read_csv(INPUT_ROOT_DIR + "seed04" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed04_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed04" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed05
df_seed05_control = pd.read_csv(INPUT_ROOT_DIR + "seed05" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed05_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed05" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

print("...done")

Reading in the data...
...done
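
The ten near-identical `read_csv` calls above could also be written as a loop. The following is a minimal sketch, assuming the directory layout `<root>/<seed>/<condition>/stats/stat_mutation_best.out` implied by the paths in the cell above. The function name `load_mutation_stats` and the dictionary keyed by `(seed, condition)` are illustrative choices, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Same column names as mutation_best_names in the notebook.
MUTATION_COLUMNS = [
    "generation",
    "number_local_mutations",
    "number_chromosomic_rearrangements",
    "number_switches",
    "number_indels",
    "number_duplications",
    "number_deletions",
    "number_translocations",
    "inversions",
]


def load_mutation_stats(root_dir, seeds, conditions):
    """Return {(seed, condition): DataFrame} for every seed/condition pair.

    Hypothetical helper: assumes each file lives at
    <root_dir>/<seed>/<condition>/stats/stat_mutation_best.out.
    """
    frames = {}
    for seed in seeds:
        for condition in conditions:
            path = Path(root_dir) / seed / condition / "stats" / "stat_mutation_best.out"
            # Same parsing options as the notebook: skip 14 preamble lines,
            # then replace the file's own header row with MUTATION_COLUMNS.
            frames[(seed, condition)] = pd.read_csv(
                path,
                skiprows=14,
                sep=r"\s+",
                header=0,
                names=MUTATION_COLUMNS,
            )
    return frames
```

With this helper, a call such as `load_mutation_stats("C:/ThesisData", ["seed01", "seed02", "seed03", "seed04", "seed05"], ["control", "mut_up"])` would return all ten DataFrames, and `frames[("seed01", "control")]` would play the role of `df_seed01_control`.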


Now that we have imported the mutation data, we will investigate whether the *mutation up* condition shows an increased number of mutations over the *control* condition.

To do this, for each seed we will compare the *control* vs. the *mutation up* condition for each of the 8 mutation-count columns (every column except `generation`). We have 5 seeds and a *control* and *mutation up* condition for each seed.

We can simply sum each column over all generations:

In [4]:
seed01_control_sums = df_seed01_control.sum(axis=0)
seed01_mut_up_sums = df_seed01_mut_up.sum(axis=0)

seed02_control_sums = df_seed02_control.sum(axis=0)
seed02_mut_up_sums = df_seed02_mut_up.sum(axis=0)

seed03_control_sums = df_seed03_control.sum(axis=0)
seed03_mut_up_sums = df_seed03_mut_up.sum(axis=0)

seed04_control_sums = df_seed04_control.sum(axis=0)
seed04_mut_up_sums = df_seed04_mut_up.sum(axis=0)

seed05_control_sums = df_seed05_control.sum(axis=0)
seed05_mut_up_sums = df_seed05_mut_up.sum(axis=0)
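
Because `generation` is an index rather than a count, its column sum is not a mutation total. One possible way to put the two conditions side by side for a single seed is sketched below. The helper name `compare_conditions` and the `ratio` column are assumptions made for illustration and do not appear in the original analysis.

```python
import pandas as pd


def compare_conditions(control_df, mut_up_df, skip=("generation",)):
    """Return per-column totals for both conditions plus their ratio.

    Hypothetical helper: drops the columns named in `skip`, sums the
    remaining columns over all rows, and places the totals side by side.
    """
    control = control_df.drop(columns=list(skip)).sum(axis=0)
    mut_up = mut_up_df.drop(columns=list(skip)).sum(axis=0)
    # Passing a dict to concat with axis=1 uses the keys as column names.
    table = pd.concat({"control": control, "mut_up": mut_up}, axis=1)
    # ratio > 1 means the mutation up run accumulated more events of that type.
    table["ratio"] = table["mut_up"] / table["control"]
    return table
```

Calling `compare_conditions(df_seed01_control, df_seed01_mut_up)` would give one row per mutation type, which makes it easier to see which types went up and which went down than reading two separate printouts.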

In [27]:
print("seed01 control\n")
print(seed01_control_sums)
seed01_control_sums.name = 'seed01_control'

print("\nseed01 mutation up\n")
print(seed01_mut_up_sums)
seed01_mut_up_sums.name = 'seed01_mut_up'

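# Align the two Series on their shared index; align() returns a (control, mut_up) tuple
aligned_sums = seed01_control_sums.align(seed01_mut_up_sums, join='outer', axis=0)

print("aligned sums (control, mutation up)\n", aligned_sums)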


seed01 control

generation                           125000250000
number_local_mutations                       5954
number_chromosomic_rearrangements           18395
number_switches                              2861
number_indels                                3093
number_duplications                            76
number_deletions                               27
number_translocations                         928
inversions                                  17364
Name: seed01_control, dtype: int64

seed01 mutation up

generation                           125000250000
number_local_mutations                      16112
number_chromosomic_rearrangements           12096
number_switches                              7526
number_indels                                8586
number_duplications                           105
number_deletions                               26
number_translocations                         850
inversions                                  11115
Name: seed01_mut_up, dtype: 