# Patient data

    Talitha has been collecting participant data in an Excel spreadsheet. A Python Participant class might make this information more useful.

---

    We can use the class keyword to create a class called Participant (with a capital 'P').

In [17]:
class Participant():
    '''
    Collects patient data.
    '''
    def __init__(self, id_number):
        self.id_number=id_number

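    The 'self' notation is a little hard to explain in full. In short, it refers to the particular participant object being created, so self.id_number is that participant's own id number.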
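    For a rough picture of what 'self' does, the sketch below creates two participants and calls a method on each. The describe method is hypothetical and is added only for illustration; it is not part of the class used in the rest of this notebook.

```python
class Participant():
    '''
    Collects patient data.
    '''
    def __init__(self, id_number):
        # self is the new object, so this stores id_number on that object only
        self.id_number = id_number

    def describe(self):
        # hypothetical helper: self is whichever participant the method was called on
        return "participant {}".format(self.id_number)


first = Participant(1)
second = Participant(2)

# calling a method on an instance passes that instance in as self
print(first.describe())

# the same kind of call written out explicitly, with the instance passed by hand
print(Participant.describe(second))
```

    Both calls go through the same method; the only thing that changes is which object arrives as self.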

In [14]:
participant_1 = Participant(1)

In [16]:
participant_1.id_number

1

    We can create as many participants as we like. They all contain their own information.

In [19]:
participant_2 = Participant(2)

In [20]:
participant_2.id_number

2

    Let's add a bit more functionality to our class and put in some checks to keep our data clean.

In [300]:
class Participant():
    '''
    Collects patient data.
    '''
    def __init__(self, id_number, sex, age):

        # check the raw inputs first so that bad values give a clear message
        assert type(id_number) is int, "id_number - '{}' is not an integer".format(id_number)
        assert type(age) is int, "age - '{}' is not an integer".format(age)
        assert age < 100 and age > 0, "age - {} is not < 100 and > 0 ".format(age)
        assert type(sex) is str, "sex value - '{}' is not a string".format(sex)

        # lowercase only after we know sex is a string, so 'Female' and 'female' match
        sex = sex.lower()
        assert sex == 'male' or sex == 'female', "sex - '{}' is not either 'male' or 'female'".format(sex)

        self.id_number = id_number
        self.sex = sex
        self.age = age

In [302]:
# if the age of the participant is not between 1 and 99, an error occurs
participant_3 = Participant(3,'Female',-3)

AssertionError: age - -3 is not < 100 and > 0 

In [298]:
#if the id_number is not an integer, an error occurs
participant_3 = Participant('two','Female',27)

AssertionError: id_number - 'two' is not an integer

In [303]:
#If sex is not stated as a full word, an error occurs
participant_3 = Participant(3,'F',27)

AssertionError: sex - 'f' is not either 'male' or 'female'
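    One caveat with assert is that Python skips assert statements when it is run with the -O flag, so the checks can silently disappear. A possible alternative, sketched here under the assumption that we want the same rules as above, is to raise exceptions explicitly. The function name check_participant_fields is hypothetical and not part of the class in this notebook.

```python
def check_participant_fields(id_number, sex, age):
    '''
    Hypothetical helper: validate the three fields and return them cleaned.
    Raises TypeError for a wrong type and ValueError for a bad value.
    '''
    if type(id_number) is not int:
        raise TypeError("id_number - '{}' is not an integer".format(id_number))
    if type(age) is not int:
        raise TypeError("age - '{}' is not an integer".format(age))
    if not 0 < age < 100:
        raise ValueError("age - {} is not < 100 and > 0".format(age))
    if type(sex) is not str:
        raise TypeError("sex value - '{}' is not a string".format(sex))

    # lowercase only after the type check, so non-strings get a clear message
    sex = sex.lower()
    if sex not in ('male', 'female'):
        raise ValueError("sex - '{}' is not either 'male' or 'female'".format(sex))

    return id_number, sex, age


# a valid participant comes back with sex in lowercase
print(check_participant_fields(3, 'Female', 27))

# an abbreviated sex is rejected, as in the assert version
try:
    check_participant_fields(3, 'F', 27)
except ValueError as error:
    print(error)
```

    Such a function could be called from __init__ in place of the assert lines; whether that is worth the extra code depends on how the class ends up being used.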

---

    For a simple example of how our class can have functionality as well, let's calculate summary data.

In [314]:
# import at module level: a name imported inside the class body is not visible inside its methods
import numpy as np

class Participant():
    '''
    Collects patient data.
    '''

    def __init__(self, id_number, sex, age, test_scores):

        # check the raw inputs first so that bad values give a clear message
        assert type(id_number) is int, "id_number - '{}' is not an integer".format(id_number)
        assert type(age) is int, "age - '{}' is not an integer".format(age)
        assert age < 100 and age > 0, "age - {} is not < 100 and > 0 ".format(age)
        assert type(sex) is str, "sex value - '{}' is not a string".format(sex)

        # lowercase only after we know sex is a string, so 'Female' and 'female' match
        sex = sex.lower()
        assert sex == 'male' or sex == 'female', "sex - '{}' is not either 'male' or 'female'".format(sex)

        assert type(test_scores) is list, "test_scores - '{}' is not a list".format(test_scores)

        self.id_number = id_number
        self.sex = sex
        self.age = age
        self.test_scores = test_scores
        
    
    def average_score(self):
        return np.average(self.test_scores)


In [315]:
#so now individual test scores can be added
participant_4 = Participant(4,'Male',19,[67,44,38,94,99,30])

In [316]:
print("Participant {}'s individual test scores were {} for an average score of {}".format(
    participant_4.id_number,
    participant_4.test_scores,
    participant_4.average_score()))


Participant 4's individual test scores were [67, 44, 38, 94, 99, 30] for an average score of 62.0


    Using the real data

In [317]:
import pandas as pd

In [283]:
#using the ID column of the data as the unique id
df = pd.read_excel('Phase2_ParticipantInfo.xlsx',skiprows= 2,index_col = 0)
#remove the rows with NaN's
df = df.drop(['Low','High'],axis=0)

In [206]:
#Not a thorough fix but a quick tidy up of column names
df.columns = [x.replace(' ','_') for x in df.columns]

In [284]:
df

ID,Age,Gender,Years Educ,Education Comp,Study disapline,Occupation,Hearing,Medication,Unnamed: 9,General Fam history?,Unnamed: 11,1st degree Family history,Diagnosis,Personal History,Diagnosis.1,Medicated?,Smoker?,Frequency,Menstruation,Contraceptive
8.0,26.0,Female,6.0,Undergraduate,Nutrition & Dietetics,Dietician/Teacher,No.,No.,,Yes.,Grandfather had Seasonal Affective Depression,No,,No,,No,No,,-,
31.0,26.0,F,3.5,Secondary/TAFE,International studies,Electrician,no,no,,No,,yes,Paternal cyclothymia,No,,,No,,,
56.0,27.0,Male,13.0,Post Graduate,Medical Research,Research Fellow,No.,yes,zertex,No,,No.,,,,,No,,,
140.0,19.0,F,2.0,Secondary,Psych and Forensic,Student,No,No,,,,No,,No,,,,,,
148.0,19.0,M,2.0,Undergraduate,psychology,student,No.,no,,Yes,Aunty Anxiety,No,,No,,,CONTACT,,,
185.0,18.0,F,3.0,Secondary,Psychology,Butcher,No.,No,,,,No,,NA?,,,,,,
176.0,18.0,Female,3.0,Secondary,Business (advertising)/Psychology,Student,No.,Yes.,"The Pill, Ventolin as needed, Seritide",Yes,"Schizophrenia, Depression",No.,,No,,,No,,21 days,"Yes, Pill (Jaz)"
227.0,19.0,F,3.0,Secondary,Forensic Psychology,Student & part time watress,No.,Yes,"Pill(Levlin), xyzal",No,,No,,No,,,CONTACT,,,"Yes,Pill(Levlin)"
233.0,18.0,F,3.0,Secondary,Psychology,unemployed,No.,No,,Yes,"Depression, bipolar, ASD",Yes.,Maternal and Paternal Depression,Yes,Depression,No,CONTACT,,,"yes, pill"
279.0,33.0,Male,20.0,Post Graduate,hearing research,audiologist,No.,No.,,No.,,No,,No,,,No,,,


In [323]:
#finally remove the NaN row
df = df.dropna(axis = 0,how = 'all')
#and change the column names to something more useable
df.columns = [x.replace(' ','_') for x in df.columns]
print(df)

     Age  Gender  Years_Educ          Education_Comp  \
ID                                                     
8     26  female         6.0           Undergraduate   
31    26  female         3.5          Secondary/TAFE   
56    27    male        13.0           Post Graduate   
140   19  female         2.0               Secondary   
148   19    male         2.0           Undergraduate   
185   18  female         3.0               Secondary   
176   18  female         3.0               Secondary   
227   19  female         3.0               Secondary   
233   18  female         3.0               Secondary   
279   33    male        20.0           Post Graduate   
318   20  female         3.0               Secondary   
378   19  female         3.0               Secondary   
505   27  female         7.0           Undergraduate   
506   25  female        20.0           Undergraduate   
570   26  female        17.0                Graduate   
619   30  female         7.0           Undergrad

    The list comprehension probably deserves an explanation.
    For every entry in the Gender column:
    - the value is converted to lowercase
    - only the first character (index 0) is kept
    - 'm' is changed to 'male'
    - 'f' is changed to 'female'

In [293]:
df.Gender = [x.lower()[0].replace('m','male').replace('f','female') for x in df.Gender]
print(df.Gender)

ID
8      female
31     female
56       male
140    female
148      male
185    female
176    female
227    female
233    female
279      male
318    female
378    female
505    female
506    female
570    female
619    female
35     female
55       male
106    female
139      male
141    female
166    female
274      male
286    female
394    female
518    female
580    female
Name: Gender, dtype: object


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value
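    The warning above suggests assigning through .loc. A minimal sketch of one way to do that follows, assuming the warning arises because df was derived from an earlier frame; the rows here are made up and stand in for the spreadsheet. It takes an explicit copy after filtering and uses the pandas string accessor instead of a list comprehension.

```python
import pandas as pd

# made-up rows standing in for the spreadsheet, including one empty row
raw = pd.DataFrame({'Gender': ['Female', 'F', 'Male', 'M', None]},
                   index=pd.Index([8, 31, 56, 148, 999], name='ID'))

# take an explicit copy after filtering so later assignments clearly target df
df = raw.dropna(axis=0, how='all').copy()

# lowercase each value and keep only its first character
first_letter = df['Gender'].str.lower().str[0]

# map the single letters to full words and assign the column through .loc
df.loc[:, 'Gender'] = first_letter.map({'m': 'male', 'f': 'female'})
print(df.Gender)
```

    Note that a letter missing from the dictionary becomes NaN with map, whereas the chained replace calls would leave it as the single letter.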


In [318]:
df.Hearing = [x.lower()[0].replace('n','no').replace('y','yes') for x in df.Hearing]
print(df.Hearing)

ID
8      no
31     no
56     no
140    no
148    no
185    no
176    no
227    no
233    no
279    no
318    no
378    no
505    no
506    no
570    no
619    no
35     no
55     no
106    no
139    no
141    no
166    no
274    no
286    no
394    no
518    no
580    no
Name: Hearing, dtype: object


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value


In [320]:
df.Medication = [x.lower()[0].replace('n','no').replace('y','yes') for x in df.Medication]
print(df.Medication)

ID
8       no
31      no
56     yes
140     no
148     no
185     no
176    yes
227    yes
233     no
279     no
318     no
378    yes
505     no
506    yes
570     no
619     no
35      no
55      no
106    yes
139     no
141     no
166     no
274     no
286     no
394     no
518     no
580     no
Name: Medication, dtype: object


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value


This column was a bit more challenging because of the NaNs. The list comprehension can't cope with them (a NaN is a float, so it has no .lower() method), and rather than write out a whole loop with clauses to handle it, I decided to run the list comprehension over only the non-NaN values, selected with __df['General_Fam_history?'].dropna()__, and to assign the results back to only the non-NaN positions in the column, selected with the mask __pd.notnull(df['General_Fam_history?'])__.

---


In [344]:
df['General_Fam_history?'][pd.notnull(df['General_Fam_history?'])] = [x.lower()[0].replace('n','no').replace('y','yes') for x in df['General_Fam_history?'].dropna()]

A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
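    The chained indexing df[column][mask] = ... is what triggers the warning here. A sketch of the same idea written with a single .loc call is shown below; the small frame is made up for illustration, and the variable names column and has_value are my own.

```python
import numpy as np
import pandas as pd

# made-up values standing in for the real column, including NaNs
df = pd.DataFrame({'General_Fam_history?': ['Yes.', 'No', np.nan, 'yes', np.nan]},
                  index=pd.Index([8, 31, 140, 148, 185], name='ID'))

column = 'General_Fam_history?'

# boolean mask that is True wherever the column holds a real value
has_value = df[column].notna()

# one .loc call selects the rows and the column together,
# so the assignment is made on df itself and the NaN rows are left alone
df.loc[has_value, column] = [x.lower()[0].replace('n', 'no').replace('y', 'yes')
                             for x in df.loc[has_value, column]]
print(df)
```

    The list comprehension is unchanged; only the selection on each side of the assignment differs from the cell above.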


In [342]:
df

ID,Age,Gender,Years_Educ,Education_Comp,Study_disapline,Occupation,Hearing,Medication,Unnamed:_9,General_Fam_history?,Unnamed:_11,1st_degree_Family_history,Diagnosis,Personal_History,Diagnosis.1,Medicated?,Smoker?,Frequency,Menstruation,Contraceptive
8,26,female,6.0,Undergraduate,Nutrition & Dietetics,Dietician/Teacher,no,no,,yes,Grandfather had Seasonal Affective Depression,No,,No,,No,No,,-,
31,26,female,3.5,Secondary/TAFE,International studies,Electrician,no,no,,no,,yes,Paternal cyclothymia,No,,,No,,,
56,27,male,13.0,Post Graduate,Medical Research,Research Fellow,no,yes,zertex,no,,No.,,,,,No,,,
140,19,female,2.0,Secondary,Psych and Forensic,Student,no,no,,,,No,,No,,,,,,
148,19,male,2.0,Undergraduate,psychology,student,no,no,,yes,Aunty Anxiety,No,,No,,,CONTACT,,,
185,18,female,3.0,Secondary,Psychology,Butcher,no,no,,,,No,,NA?,,,,,,
176,18,female,3.0,Secondary,Business (advertising)/Psychology,Student,no,yes,"The Pill, Ventolin as needed, Seritide",yes,"Schizophrenia, Depression",No.,,No,,,No,,21 days,"Yes, Pill (Jaz)"
227,19,female,3.0,Secondary,Forensic Psychology,Student & part time watress,no,yes,"Pill(Levlin), xyzal",no,,No,,No,,,CONTACT,,,"Yes,Pill(Levlin)"
233,18,female,3.0,Secondary,Psychology,unemployed,no,no,,yes,"Depression, bipolar, ASD",Yes.,Maternal and Paternal Depression,Yes,Depression,No,CONTACT,,,"yes, pill"
279,33,male,20.0,Post Graduate,hearing research,audiologist,no,no,,no,,No,,No,,,No,,,


    
    As I'm writing all these list comprehensions, it's occurring to me that I should make a function that does it.
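    One possible shape for such a function is sketched below. The name expand_first_letter and its arguments are hypothetical. It differs slightly from the list comprehensions: values that are not strings (such as NaN), empty strings, and values whose first letter is not in the mapping are returned unchanged.

```python
def expand_first_letter(values, mapping):
    '''
    Hypothetical helper: replace each string by the word its first letter maps to.
    Anything that cannot be mapped is returned unchanged.
    '''
    cleaned = []
    for value in values:
        # only non-empty strings whose lowercase first letter is a known key are replaced
        if isinstance(value, str) and value and value[0].lower() in mapping:
            cleaned.append(mapping[value[0].lower()])
        else:
            cleaned.append(value)
    return cleaned


# the same mappings used in the cells above, applied to made-up values
print(expand_first_letter(['Female', 'F', 'M', 'male'], {'m': 'male', 'f': 'female'}))
print(expand_first_letter(['No.', 'yes', None, 'Yes.'], {'n': 'no', 'y': 'yes'}))
```

    Because it accepts any iterable, a DataFrame column can be passed in directly, for example expand_first_letter(df.Hearing, {'n': 'no', 'y': 'yes'}).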

In [353]:
[x.lower()[0].replace('m','male').replace('f','female').replace('n','False').replace('y','True') for x in df.Hearing]

['False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False',
 'False']

In [348]:
str(True)

'True'

In [358]:
[str(x).lower() in ['yes','y','1','true'] for x in df.Medication]

[False,
 False,
 True,
 False,
 False,
 False,
 True,
 True,
 False,
 False,
 False,
 True,
 False,
 True,
 False,
 False,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False]