# Mutation Analysis

In this Jupyter Notebook we analyze the **stat_mutation_best.out** files for different conditions to check whether the *mutation up* condition actually affects the number of mutations.

First we will read in the data.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

#INPUT_ROOT_DIR = "C://ThesisData//" # laptop
INPUT_ROOT_DIR = "C://Users//Brian Davis//Dropbox//Freiburg Masters Semesters//Thesis//Results//" # Desktop
OUTPUT_ROOT_DIR = INPUT_ROOT_DIR + "Graphics//Mutations//"

# We're going to be importing files with the same name from multiple directories
CONTROL_INPUT_FILE = "//control//stats//stat_mutation_best.out"
MUTATION_UP_INPUT_FILE = "//mut_up//stats//stat_mutation_best.out"

# Column header names for the columns in stat_mutation_best.out
mutation_best_names = ['generation','number_local_mutations','number_chromosomic_rearrangements','number_switches','number_indels','number_duplications','number_deletions','number_translocations','inversions']

print("Reading in the data...")


# seed01
# sep=r"\s+" splits on runs of whitespace; it replaces delim_whitespace=True, which is deprecated since pandas 2.2
df_seed01_control = pd.read_csv(INPUT_ROOT_DIR + "seed01" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed01_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed01" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed02
df_seed02_control = pd.read_csv(INPUT_ROOT_DIR + "seed02" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed02_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed02" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed03
df_seed03_control = pd.read_csv(INPUT_ROOT_DIR + "seed03" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed03_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed03" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed04
df_seed04_control = pd.read_csv(INPUT_ROOT_DIR + "seed04" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed04_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed04" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

# seed05
df_seed05_control = pd.read_csv(INPUT_ROOT_DIR + "seed05" + CONTROL_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)
df_seed05_mut_up = pd.read_csv(INPUT_ROOT_DIR + "seed05" + MUTATION_UP_INPUT_FILE, skiprows=14, sep=r"\s+", header=0, names=mutation_best_names)

print("...done")

Reading in the data...
...done
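The ten `read_csv` calls above differ only in the seed directory and the condition, so they could also be generated in a loop. Below is a minimal sketch. It assumes the same `<root>/seedNN/<condition>/stats/stat_mutation_best.out` layout and the 14-line preamble used above. The helper name `load_mutation_stats` and the `(seed, condition)` dictionary keys are hypothetical and not part of the original notebook.

```python
import os
import pandas as pd

# Column headers for stat_mutation_best.out (same list as mutation_best_names above)
MUTATION_BEST_NAMES = ['generation', 'number_local_mutations',
                       'number_chromosomic_rearrangements', 'number_switches',
                       'number_indels', 'number_duplications', 'number_deletions',
                       'number_translocations', 'inversions']

def load_mutation_stats(root_dir, seeds, conditions=("control", "mut_up"), skiprows=14):
    """Hypothetical helper: read stat_mutation_best.out for every seed/condition pair.

    Returns a dict keyed by (seed, condition), e.g. ("seed01", "control").
    Assumes each file has `skiprows` preamble lines followed by one header line.
    """
    frames = {}
    for seed in seeds:
        for condition in conditions:
            path = os.path.join(root_dir, seed, condition, "stats", "stat_mutation_best.out")
            # sep=r"\s+" splits on any run of whitespace
            frames[(seed, condition)] = pd.read_csv(path, skiprows=skiprows, sep=r"\s+",
                                                    header=0, names=MUTATION_BEST_NAMES)
    return frames
```

You would call it as `frames = load_mutation_stats(INPUT_ROOT_DIR, [f"seed{i:02d}" for i in range(1, 6)])`. After that, `frames[("seed01", "control")]` takes the place of `df_seed01_control`. `os.path.join` is used instead of the hand-written `//` separators.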


Now that we have imported the mutation data, we will investigate whether the *mutation up* condition shows an increased number of mutations over the *control* condition.

To do this, we compare the *control* and *mutation up* conditions for each of the 5 seeds, column by column. The comparison covers the 8 mutation-count columns, which is every column except `generation`.
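One simple way to express this comparison is the ratio of the *mutation up* total to the *control* total for each mutation type. A ratio above 1 means more mutations under *mutation up*. The sketch below hard-codes the seed01 column sums from the `df_results` table printed at the end of this section purely as input. The function name `mutation_ratio` is hypothetical.

```python
import pandas as pd

def mutation_ratio(control, mut_up):
    """Per-mutation-type ratio of mutation-up totals to control totals (>1 means more under mutation up)."""
    return (mut_up / control).rename("mut_up / control")

# seed01 column sums, copied from the printed df_results table
rows = ['number_local_mutations', 'number_chromosomic_rearrangements', 'number_switches',
        'number_indels', 'number_duplications', 'number_deletions',
        'number_translocations', 'inversions']
seed01_control = pd.Series([5954, 18395, 2861, 3093, 76, 27, 928, 17364], index=rows)
seed01_mut_up = pd.Series([16112, 12096, 7526, 8586, 105, 26, 850, 11115], index=rows)

ratios = mutation_ratio(seed01_control, seed01_mut_up)
print(ratios.round(2))
```

For seed01 alone, this gives ratios of roughly 2.6 to 2.8 for local mutations, switches and indels. Chromosomic rearrangements and inversions come out below 1. The same function can be applied to any seed's pair of columns in `df_results`.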

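For each run we sum every column over all generations (`sum(axis=0)`), which gives one total per mutation type. We also give each resulting Series a name and the column names as its index so that the results can be concatenated later: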

In [6]:
# SEED01

# Control
seed01_control_sums = df_seed01_control.sum(axis=0)
seed01_control_sums.name = 'seed01_control'
seed01_control_sums.index = mutation_best_names

# Mutation up
seed01_mut_up_sums = df_seed01_mut_up.sum(axis=0)
seed01_mut_up_sums.name = 'seed01_mut_up'
seed01_mut_up_sums.index = mutation_best_names

# SEED02

# Control
seed02_control_sums = df_seed02_control.sum(axis=0)
seed02_control_sums.name = 'seed02_control'
seed02_control_sums.index = mutation_best_names

# Mutation up
seed02_mut_up_sums = df_seed02_mut_up.sum(axis=0)
seed02_mut_up_sums.name = 'seed02_mut_up'
seed02_mut_up_sums.index = mutation_best_names

# SEED03

# Control
seed03_control_sums = df_seed03_control.sum(axis=0)
seed03_control_sums.name = 'seed03_control'
seed03_control_sums.index = mutation_best_names

# Mutation up
seed03_mut_up_sums = df_seed03_mut_up.sum(axis=0)
seed03_mut_up_sums.name = 'seed03_mut_up'
seed03_mut_up_sums.index = mutation_best_names

# SEED04

# Control
seed04_control_sums = df_seed04_control.sum(axis=0)
seed04_control_sums.name = 'seed04_control'
seed04_control_sums.index = mutation_best_names

# Mutation up
seed04_mut_up_sums = df_seed04_mut_up.sum(axis=0)
seed04_mut_up_sums.name = 'seed04_mut_up'
seed04_mut_up_sums.index = mutation_best_names

# SEED05

# Control
seed05_control_sums = df_seed05_control.sum(axis=0)
seed05_control_sums.name = 'seed05_control'
seed05_control_sums.index = mutation_best_names

# Mutation up
seed05_mut_up_sums = df_seed05_mut_up.sum(axis=0)
seed05_mut_up_sums.name = 'seed05_mut_up'
seed05_mut_up_sums.index = mutation_best_names
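The ten blocks above follow one pattern: sum the columns and name the result. That pattern could be collapsed into a single helper. A minimal sketch follows; `condition_sums` is a hypothetical name.

```python
import pandas as pd

def condition_sums(df, name):
    """Sum every column over all generations (rows) and label the resulting Series.

    sum(axis=0) collapses the rows, giving one total per column. The Series index
    is already the column names, because they were set via names= in read_csv.
    """
    sums = df.sum(axis=0)
    sums.name = name
    return sums
```

With it, each block becomes a single line such as `seed01_control_sums = condition_sums(df_seed01_control, 'seed01_control')`. The explicit `.index = mutation_best_names` assignment is not needed, because the summed Series is already indexed by those column names.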

Now concatenate the results into one DataFrame for easy comparison:

In [51]:
# Combine all seeds and conditions into one DataFrame; each Series' name becomes a column label
df_results = pd.concat([seed01_control_sums, seed01_mut_up_sums,
                        seed02_control_sums, seed02_mut_up_sums,
                        seed03_control_sums, seed03_mut_up_sums,
                        seed04_control_sums, seed04_mut_up_sums,
                        seed05_control_sums, seed05_mut_up_sums], axis=1)

#print("df_results has", df_results.size, "total entries and its dimensions (rows,columns) is", df_results.shape)
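Labelling each column with a (seed, condition) pair, rather than a single string, makes it straightforward to average the totals over seeds for each condition. The following is a minimal sketch. It assumes each input Series holds the column totals for one seed/condition pair, and the function names `combine_by_seed_and_condition` and `mean_over_seeds` are hypothetical.

```python
import pandas as pd

def combine_by_seed_and_condition(sums):
    """Stack per-seed, per-condition column totals into a DataFrame with two-level columns.

    `sums` maps (seed, condition) tuples to Series of column totals.
    """
    keys = list(sums.keys())
    combined = pd.concat([sums[k] for k in keys], axis=1)
    # Replace the flat column labels with a (seed, condition) MultiIndex
    combined.columns = pd.MultiIndex.from_tuples(keys, names=["seed", "condition"])
    return combined

def mean_over_seeds(combined):
    """Average the totals over seeds, giving one column per condition."""
    return combined.T.groupby(level="condition").mean().T
```

For example, `mean_over_seeds(combine_by_seed_and_condition({("seed01", "control"): seed01_control_sums, ("seed01", "mut_up"): seed01_mut_up_sums, ...}).drop("generation"))` would give one averaged *control* column and one averaged *mutation up* column per mutation type.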

Print the final table for comparison:

In [52]:
# Drop the 'generation' row: the sum of the generation numbers is not a meaningful value
df_results = df_results.drop('generation')
print(df_results)

                                   seed01_control  seed01_mut_up  \
number_local_mutations                       5954          16112   
number_chromosomic_rearrangements           18395          12096   
number_switches                              2861           7526   
number_indels                                3093           8586   
number_duplications                            76            105   
number_deletions                               27             26   
number_translocations                         928            850   
inversions                                  17364          11115   

                                   seed02_control  seed02_mut_up  \
number_local_mutations                       5410          15344   
number_chromosomic_rearrangements           19247          12977   
number_switches                              2656           6938   
number_indels                                2754           8406   
number_duplications                            