Information:

I have created 9 different interactive graphs that if you open in your browser you will be able to toggle certain data points on and off. You can toggle the data points by clicking on the colored boxes next to each label on the sides. I tried to sort them by certain attributes so that grouping together points by certain variables like age, gender and monthsFromScreening (time from diagnosis to first visit) was easy.

The nine graphs are as follows:

1 -- maleData_control.sick_age.svg : allows for looking at male data in the context of control vs sick as well as looking at age bands

2 -- femaleData_control.sick_age.svg : allows for looking at female data in the context of control vs sick as well as looking at age bands

3 -- youngData_control.sick_age.svg : allows for looking at all individuals under the age of 50 in the context of control vs sick as well as male vs female

4 -- middle_1_Data_control.sick_age.svg : allows for looking at all individuals between the ages of 50 and 59 in the context of control vs sick as well as male vs female

5 -- middle_2_Data_control.sick_age.svg : allows for looking at all individuals between the ages of 60 and 64 in the context of control vs sick as well as male vs female

6 -- middle_3_Data_control.sick_age.svg : allows for looking at all individuals between the ages of 65 and 69 in the context of control vs sick as well as male vs female

7 -- old_age_Data_control.sick_age.svg : allows for looking at all individuals older than 69 in the context of control vs sick as well as male vs female

8 -- maleData_monthsFromScreening.svg : allows for looking at male sick patients and how they're protien levels vary due to the time from diagnosis to first visit

9 -- femaleData_monthsFromScreening.svg : allows for looking at female sick patients and how they're protien levels vary due to the time from diagnosis to first visit

I decided to divide up the ages in that ways stated above, as it seemed to split the data decently evenly. If you think another split would be better let me know.

In my pursuit of trying to find patterns in the data I was looking at rolling averages and differences between recorded values, but I haven't included them yet with these new plots... it is something I am looking to do soon.

I will say that while the classifiers seemed to pick up on something within the gender groups (i.e. classify control males from sick males), I really can't see many differences between age, gender or sick/control. If you can take a look at the data and see if you see anything that would be great.

I should also come up and set up this environment for you, so that you can run this notebook and be able to keep randomly selecting samples in order to see more diversity in the graphs... but I think there's enough below to get started with

In [225]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import math
import usefulFunctions
from usefulFunctions import checkForNull
from random import randrange
import os 
import webbrowser

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [226]:
# loading in the data and splitting it up into control vs sick, male vs female as well as by age
control = {}
sick = {}
male = {}
female = {}
young_age = {}    # younger than 50
middle_1_age = {} # between 50 and 60
middle_2_age = {} # between 60 and 65
middle_3_age = {} # between 65 and 70
old_age = {}      # older than 69


age = []
months = []

fileName = "Master_PL.xlsx - Master_PL.csv"
table = pd.read_csv(fileName, header=0, 
                    names=["id", "type", "gender", "age", "monthsFromScreening", "protein_BL", "protein_V4",
                          "protein_V6", "protein_V8", "protein_V10", "protein_V12"])

totalRows = len(table.values)

# splitting data into female vs male, control vs sick, plus storing the age and months data
# into arrays so we can look at their distributions later (to understand good age bands)
for row in table.to_dict(orient='records'):
    if not checkForNull(row['type']):
        if row['type'] == 1:
            control[int(row['id'])] = row
        else:
            sick[int(row['id'])] = row
    
    if not checkForNull(row['gender']):
        if row['gender'] == 1:
            female[int(row['id'])] = row
        else:
            male[int(row['id'])] = row
    
    if not checkForNull(row['age']):
        if row['age']< 50:
            young_age[int(row['id'])] = row
        elif row['age'] < 60:
            middle_1_age[int(row['id'])] = row
        elif row['age'] < 65:
            middle_2_age[int(row['id'])] = row
        elif row['age'] < 70:
            middle_3_age[int(row['id'])] = row
        else:
            old_age[int(row['id'])] = row

    
    if row['age'] is not None:
        age.append(row['age'])
    
    if row['monthsFromScreening'] is not None:
        months.append(row['monthsFromScreening'])

print "Number of data points that are males : " + str(len(male))
print "Number of data points that are females : " + str(len(female))
print "Number of data points that are control : " + str(len(control))
print "Number of data points that are sick : " + str(len(sick))
print "Number of young : " + str(len(young_age))
print "Number of middle_1 : " + str(len(middle_1_age))
print "Number of middle_2 : " + str(len(middle_2_age))
print "Number of middle_3 : " + str(len(middle_3_age))
print "Number of old : " + str(len(old_age))

Number of data points that are males : 403
Number of data points that are females : 216
Number of data points that are control : 196
Number of data points that are sick : 423
Number of young : 78
Number of middle_1 : 168
Number of middle_2 : 118
Number of middle_3 : 119
Number of old : 136


In [227]:
# splitting the data up to look at males who are sick vs control, and females who are sick vs control
tup = usefulFunctions.combine(male, female, control, sick)
maleSplit = tup[0]
femaleSplit = tup[1]
print "Number of Males that are in the Control : " + str(len(maleSplit[0]))
print "Number of Males that are Sick : " + str(len(maleSplit[1]))
print "Number of Females that are in the Control : " + str(len(femaleSplit[0]))
print "Number of Females that are Sick : " + str(len(femaleSplit[1]))

Number of Males that are in the Control : 126
Number of Males that are Sick : 277
Number of Females that are in the Control : 70
Number of Females that are Sick : 146


In [228]:
# splitting the data up to look at young (x<50) people who are sick vs control
# splitting the data up to look at middle_1 (50<=x<60) people who are sick vs control
# splitting the data up to look at middle_2 (60<=x<65) people who are sick vs control
# splitting the data up to look at middle_3 (65<=x<70) people who are sick vs control
# splitting the data up to look at old (x>=70) people who are sick vs control
young_split, middle_1_split = usefulFunctions.combine(young_age, middle_1_age, control, sick)
middle_2_split, middle_3_split = usefulFunctions.combine(middle_2_age, middle_3_age, control, sick)
old_age_split, blah = usefulFunctions.combine(old_age, young_age, control, sick)
print "Number of control young : " + str(len(young_split[0]))
print "Number of control middle_1 : " + str(len(middle_1_split[0]))
print "Number of control middle_2 : " + str(len(middle_2_split[0]))
print "Number of control middle_3 : " + str(len(middle_3_split[0]))
print "Number of control old : " + str(len(old_age_split[0]))
print "\n**************\n"
print "Number of sick young : " + str(len(young_split[1]))
print "Number of sick middle_1 : " + str(len(middle_1_split[1]))
print "Number of sick middle_2 : " + str(len(middle_2_split[1]))
print "Number of sick middle_3 : " + str(len(middle_3_split[1]))
print "Number of sick old : " + str(len(old_age_split[1]))

Number of control young : 29
Number of control middle_1 : 54
Number of control middle_2 : 37
Number of control middle_3 : 34
Number of control old : 42

**************

Number of sick young : 49
Number of sick middle_1 : 114
Number of sick middle_2 : 81
Number of sick middle_3 : 85
Number of sick old : 94


In [294]:
# getting entries without strange outliers and where there are at least k readings for the protein levels
proteinKeys = ['protein_BL', 'protein_V4', 'protein_V6', 'protein_V8', 'protein_V10', 'protein_V12']

def getKFullValuesOnly(dic, filterOutliers=True, k=5):
    fullEntities = {}
    for key in dic:
        entity = dic[key]
        count = 6
        valid = True
        for key2 in proteinKeys:
            if not usefulFunctions.checkForValid(entity[key2]):
                count+=-1
                dic[key][key2] = None
            elif filterOutliers:
                if entity[key2] >= 100 or entity[key2] <= 10:
                    valid = False
        if count>=k and valid:
            temp = []
            for key2 in proteinKeys:
                temp.append(entity[key2])
        
            fullEntities[key] = {"age" : entity["age"], 
                                 "monthsFromScreening" : entity["monthsFromScreening"], 
                                 "vector" : temp } 
        
    return fullEntities

controlMales = getKFullValuesOnly(maleSplit[0])
sickMales = getKFullValuesOnly(maleSplit[1])
controlFemales = getKFullValuesOnly(femaleSplit[0])
sickFemales = getKFullValuesOnly(femaleSplit[1])

In [230]:
len(controlMales), len(controlFemales)

(42, 30)

In [231]:
len(sickMales), len(sickFemales)

(92, 37)

In [232]:
# some functions and variables for creating and displaying the graphs
xLabels = ["protein_BL", "protein_V4","protein_V6", "protein_V8", "protein_V10", "protein_V12"]

# MacOS
chrome_path = 'open -a /Applications/Google\ Chrome.app %s'

# Windows
# chrome_path = 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe %s'

# Linux
# chrome_path = '/usr/bin/google-chrome %s'


def createLabels(classType, params):
    label = classType
    for param in params:
        label = label + "_" + str(param[0]) + "=" + str(param[1])
    return label

def selectRandomFromContainer(container,typeOf, n):
    if typeOf == "dict":
        random = {}
        keys = container.keys()
        size = len(keys)
        for i in range(n):
            index = randrange(0, size)
            random[keys[index]] = container[keys[index]]
            keys[index] = keys[size-1]
            size +=-1
        return random
    else:
        random = []
        size = len(container)
        for i in range(n):
            index = randrange(0, size)
            random.append(container[index])
            temp = container[index]
            container[index] = container[size-1]
            container[size-1] = temp
            size+=-1
        return random

def getValidMonthsFromScreeningEntries(dic):
    valid = {}
    for key in dic:
        if usefulFunctions.checkForValid(dic[key]['monthsFromScreening']):
            valid[key] = dic[key]
    return valid

Creating a plot to look at examples of control and sick individuals who are males, as well as trying to see if age bands exist within the male data

In [290]:
labels = []
allFullyFilledMales = []
colors=[]
types = []

keyToAge = {}
for key in selectRandomFromContainer(controlMales.keys(), 'array', 30):
    keyToAge[key] = controlMales[key]["age"]

for key in sorted(keyToAge, key=keyToAge.get):
    types.append(0)
    labels.append(createLabels("control",[("age",int(controlMales[key]['age']))]))
    colors.append('#98FB98')
    allFullyFilledMales.append(controlMales[key]['vector'])

keyToAge = {}
for key in selectRandomFromContainer(sickMales.keys(), 'array', 30):
    keyToAge[key] = sickMales[key]["age"]

for key in sorted(keyToAge, key=keyToAge.get):
    types.append(1)
    labels.append(createLabels("sick",[("age",int(sickMales[key]['age']))]))
    colors.append('#FF6347')
    allFullyFilledMales.append(sickMales[key]['vector'])

title = "Random Sample of Male Data Split By Control/Sick"
file_name = "maleData_control.sick_age.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledMales, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [291]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Creating a plot to look at examples of sick individuals who are males to see if there is any groupings due to the time from diagnosis to first visit... not sure why only 26 show

In [288]:
labels = []
allFullyFilledMales = []
colors=[]
types = []

keyToAge = {}
for key in selectRandomFromContainer(sickMales.keys(), 'array', 30):
    keyToAge[key] = sickMales[key]["monthsFromScreening"]
for key in sorted(keyToAge, key=keyToAge.get):
    types.append(0)
    labels.append(createLabels(str(key),[("months",int(sickMales[key]['monthsFromScreening']))]))
    colors.append('#FF6347')
    allFullyFilledMales.append(sickMales[key]['vector'])
title = "Random Sample of Sick Male Data"
file_name = "maleData_monthsFromScreening.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledMales, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [289]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Creating a plot to look at examples of control and sick individuals who are females, as well as trying to see if age bands exist within the female data

In [250]:
labels = []
allFullyFilledFemales = []
colors=[]
types = []

keyToAge = {}
for key in selectRandomFromContainer(controlFemales.keys(), 'array', 30):
    keyToAge[key] = controlFemales[key]["age"]

for key in sorted(keyToAge, key=keyToAge.get):
    types.append(0)
    labels.append(createLabels("control",[("age",int(controlFemales[key]['age']))]))
    colors.append('#3CB371')
    allFullyFilledFemales.append(controlFemales[key]['vector'])

keyToAge = {}
for key in selectRandomFromContainer(sickFemales.keys(), 'array', 30):
    keyToAge[key] = sickFemales[key]["age"]

for key in sorted(keyToAge, key=keyToAge.get):
    types.append(1)
    labels.append(createLabels("sick",[("age",int(sickFemales[key]['age']))]))
    colors.append('#FF6347')
    allFullyFilledFemales.append(sickFemales[key]['vector'])

title = "Random Sample of Female Data Split By Control/Sick"
file_name = "femaleData_control.sick_age.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledFemales, labels, xLabels, types, colors, file_name, title, None, 90)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [236]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Creating a plot to look at examples of sick individuals who are females to see if there is any groupings due to the time from diagnosis to first visit... not sure why only 26 show

In [292]:
labels = []
allFullyFilledFemales = []
colors=[]
types = []

keyToAge = {}
for key in selectRandomFromContainer(sickFemales.keys(), 'array', 30):
    keyToAge[key] = sickFemales[key]["monthsFromScreening"]
for key in sorted(keyToAge, key=keyToAge.get):
    types.append(0)
    labels.append(createLabels(str(key),[("months",int(sickFemales[key]['monthsFromScreening']))]))
    colors.append('#FF6347')
    allFullyFilledFemales.append(sickFemales[key]['vector'])
title = "Random Sample of Sick Females Data"
file_name = "femalesData_monthsFromScreening.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledFemales, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [293]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Splitting the age based groupings into sick vs control

In [251]:
youngControl = getKFullValuesOnly(young_split[0])
youngSick= getKFullValuesOnly(young_split[1])
middle_1_Control = getKFullValuesOnly(middle_1_split[0])
middle_1_Sick = getKFullValuesOnly(middle_1_split[1])
middle_2_Control = getKFullValuesOnly(middle_2_split[0])
middle_2_Sick = getKFullValuesOnly(middle_2_split[1])
middle_3_Control = getKFullValuesOnly(middle_3_split[0])
middle_3_Sick = getKFullValuesOnly(middle_3_split[1])
old_age_Control = getKFullValuesOnly(old_age_split[0])
old_age_Sick = getKFullValuesOnly(old_age_split[1],4)

In [252]:
len(youngControl), len(youngSick)

(13, 16)

In [253]:
len(middle_1_Control), len(middle_1_Sick)

(23, 40)

In [254]:
len(middle_2_Control), len(middle_2_Sick)

(15, 21)

In [255]:
len(middle_3_Control), len(middle_3_Sick)

(9, 32)

In [256]:
len(old_age_Control), len(old_age_Sick)

(12, 20)

Creating graph to look at all individuals under 50 and whether there are differences in the protein levels between sick and control, as well as male and female

In [257]:
labels = []
allFullyFilledYoung = []
colors= []
types = []

keyToType = {}
for key in youngControl:
    if key in controlMales:
        keyToType[key] = "male"
    elif key in controlFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(0)
    if key in controlMales:
        params = [("age", int(controlMales[key]['age']))]
        labels.append(createLabels("c_male", params))
    elif key in controlFemales:
        params = [("age", int(controlFemales[key]['age']))]
        labels.append(createLabels("c_female", params))
    colors.append('#3CB371')
    allFullyFilledYoung.append(youngControl[key]['vector'])

keyToType = {}
for key in selectRandomFromContainer(youngSick.keys(), 'array', len(youngControl)):
    if key in sickMales:
        keyToType[key] = "male"
    elif key in sickFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(1)
    if key in sickMales:
        params = [("age", int(sickMales[key]['age']))]
        labels.append(createLabels("s_male", params))
    elif key in sickFemales:
        params = [("age", int(sickFemales[key]['age']))]
        labels.append(createLabels("s_female", params))
    colors.append('#FF6347')
    allFullyFilledYoung.append(youngSick[key]['vector'])

title = "Random Sample of Young Data, x<50, Split By Control-Green and Sick-Red"
file_name = "youngData_control.sick_age.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledYoung, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [258]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Creating graph to look at all individuals between 50 and 59 and whether there are differences in the protein levels between sick and control, as well as male and female

In [259]:
labels = []
allFullyFilledMiddle_1 = []
colors= []
types = []

keyToType = {}
for key in middle_1_Control:
    if key in controlMales:
        keyToType[key] = "male"
    elif key in controlFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(0)
    if key in controlMales:
        params = [("age", int(controlMales[key]['age']))]
        labels.append(createLabels("c_male", params))
    elif key in controlFemales:
        params = [("age", int(controlFemales[key]['age']))]
        labels.append(createLabels("c_female", params))
    colors.append('#3CB371')
    allFullyFilledMiddle_1.append(middle_1_Control[key]['vector'])

keyToType = {}
for key in selectRandomFromContainer(middle_1_Sick.keys(), 'array', len(middle_1_Control)):
    if key in sickMales:
        keyToType[key] = "male"
    elif key in sickFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(1)
    if key in sickMales:
        params = [("age", int(sickMales[key]['age']))]
        labels.append(createLabels("s_male", params))
    elif key in sickFemales:
        params = [("age", int(sickFemales[key]['age']))]
        labels.append(createLabels("s_female", params))
    colors.append('#FF6347')
    allFullyFilledMiddle_1.append(middle_1_Sick[key]['vector'])

title = "Random Sample of Middle_1 Data, 50<=x<60, Split By Control-Green and Sick-Red"
file_name = "middle_1_Data_control.sick_age.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledMiddle_1, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [260]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Creating graph to look at all individuals between 60 and 64 and whether there are differences in the protein levels between sick and control, as well as male and female

In [261]:
labels = []
allFullyFilledMiddle_2 = []
colors= []
types = []

keyToType = {}
for key in middle_2_Control:
    if key in controlMales:
        keyToType[key] = "male"
    elif key in controlFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(0)
    if key in controlMales:
        params = [("age", int(controlMales[key]['age']))]
        labels.append(createLabels("c_male", params))
    elif key in controlFemales:
        params = [("age", int(controlFemales[key]['age']))]
        labels.append(createLabels("c_female", params))
    colors.append('#3CB371')
    allFullyFilledMiddle_2.append(middle_2_Control[key]['vector'])

keyToType = {}
for key in selectRandomFromContainer(middle_2_Sick.keys(), 'array', len(middle_2_Control)):
    if key in sickMales:
        keyToType[key] = "male"
    elif key in sickFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(1)
    if key in sickMales:
        params = [("age", int(sickMales[key]['age']))]
        labels.append(createLabels("s_male", params))
    elif key in sickFemales:
        params = [("age", int(sickFemales[key]['age']))]
        labels.append(createLabels("s_female", params))
    colors.append('#FF6347')
    allFullyFilledMiddle_2.append(middle_2_Sick[key]['vector'])

title = "Random Sample of Middle_2 Data, 60<=x<65, Split By Control-Green and Sick-Red"
file_name = "middle_2_Data_control.sick_age.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledMiddle_2, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [262]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Creating graph to look at all individuals between 65 and 69 and whether there are differences in the protein levels between sick and control, as well as male and female

In [263]:
labels = []
allFullyFilledMiddle_3 = []
colors= []
types = []

keyToType = {}
for key in middle_3_Control:
    if key in controlMales:
        keyToType[key] = "male"
    elif key in controlFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(0)
    if key in controlMales:
        params = [("age", int(controlMales[key]['age']))]
        labels.append(createLabels("c_male", params))
    elif key in controlFemales:
        params = [("age", int(controlFemales[key]['age']))]
        labels.append(createLabels("c_female", params))
    colors.append('#3CB371')
    allFullyFilledMiddle_3.append(middle_3_Control[key]['vector'])

keyToType = {}
for key in selectRandomFromContainer(middle_3_Sick.keys(), 'array', len(middle_3_Control)):
    if key in sickMales:
        keyToType[key] = "male"
    elif key in sickFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(1)
    if key in sickMales:
        params = [("age", int(sickMales[key]['age']))]
        labels.append(createLabels("s_male", params))
    elif key in sickFemales:
        params = [("age", int(sickFemales[key]['age']))]
        labels.append(createLabels("s_female", params))
    colors.append('#FF6347')
    allFullyFilledMiddle_3.append(middle_3_Sick[key]['vector'])

title = "Random Sample of Middle_3 Data, 65<=x<70, Split By Control-Green and Sick-Red"
file_name = "middle_3_Data_control.sick_age.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledMiddle_3, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [264]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True

Creating graph to look at all individuals older than 69 and whether there are differences in the protein levels between sick and control, as well as male and female

In [267]:
labels = []
allFullyFilledOld = []
colors= []
types = []

keyToType = {}
for key in old_age_Control:
    if key in controlMales:
        keyToType[key] = "male"
    elif key in controlFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(0)
    if key in controlMales:
        params = [("age", int(controlMales[key]['age']))]
        labels.append(createLabels("c_male", params))
    elif key in controlFemales:
        params = [("age", int(controlFemales[key]['age']))]
        labels.append(createLabels("c_female", params))
    colors.append('#3CB371')
    allFullyFilledOld.append(old_age_Control[key]['vector'])

keyToType = {}
for key in selectRandomFromContainer(old_age_Sick.keys(), 'array', len(old_age_Control)):
    if key in sickMales:
        keyToType[key] = "male"
    elif key in sickFemales:
        keyToType[key] = "female"
    else:
        print str(key)

for key in sorted(keyToType, key=keyToType.get):
    types.append(1)
    if key in sickMales:
        params = [("age", int(sickMales[key]['age']))]
        labels.append(createLabels("s_male", params))
    elif key in sickFemales:
        params = [("age", int(sickFemales[key]['age']))]
        labels.append(createLabels("s_female", params))
    colors.append('#FF6347')
    allFullyFilledOld.append(old_age_Sick[key]['vector'])

title = "Random Sample of Old Data, x>=70, Split By Control-Green and Sick-Red"
file_name = "old_age_Data_control.sick_age.svg"
usefulFunctions.createPyGalLinePlots(allFullyFilledOld, labels, xLabels, types, colors, file_name, title)
print "open above file in browser for intractiveness or run below cell"

open above file in browser for intractiveness or run below cell


In [268]:
webbrowser.get(chrome_path).open('file://' + os.path.realpath(file_name))

True