The overarching goal of this paper is to assess the risk factors that may lead to a major bleeding event in CLL patients treated with ibrutinib.

The dependent variable is therefore the bleeding outcome (major, minor, or none), and the relevant independent variables are:
- age at diagnosis, gender, platelet count, anticoagulation, antiplatelet therapy, PMHx (past medical history) bleeding risk, molecular/cytogenetics, anemia, prior lines of therapy

In this notebook, I compute the relevant frequencies from the data so that a categorical analysis (contingency tables) can be performed later. Note: some independent variables are not included here; I determined their frequencies later by sorting and filtering the Excel spreadsheet.

Importing the necessary libraries. 

In [1]:
import pandas as pd

print("complete")

complete


Load the dataset into Python and convert the gender column to uppercase

In [2]:
df = pd.read_csv("/Users/anthonyquint/Desktop/LHSC_Work_Folder/Mina/Bleeding_study/Ibrutinib Data Set, June 10,2021, de- identified data.csv")
df['gender'] = df['gender'].str.upper()
#df.head() Not displaying the data for confidentiality reasons

Assessing the frequency of patients based on their platelet count and the type of their bleeding event

In [3]:
# Boolean masks for the risk factor and for each bleeding outcome.
# In 'Major Bleed (Y/N)', 'Y' marks a major bleed, 'N' a minor bleed,
# and any other value (e.g. a blank cell) means no bleeding event.
factor = df['Platelets < 50 (Y/N)']
bleed = df['Major Bleed (Y/N)']
major = bleed == 'Y'
minor = bleed == 'N'
no_bleed = ~bleed.isin(['Y', 'N'])

# Summing a boolean mask counts the rows where it is True.
numOfRows_1 = ((factor == 'Y') & major).sum()
numOfRows_2 = ((factor == 'N') & major).sum()
numOfRows_3 = ((factor == 'Y') & minor).sum()
numOfRows_4 = ((factor == 'N') & minor).sum()
numOfRows_5 = ((factor == 'Y') & no_bleed).sum()
numOfRows_6 = ((factor == 'N') & no_bleed).sum()

print("Number of people with platelets < 50 and that had a major bleeding event:",numOfRows_1)
print("Number of people with platelets >= 50 and that had a major bleeding event:",numOfRows_2)
print("Number of people with platelets < 50 and that had a minor bleeding event:",numOfRows_3)
print("Number of people with platelets >= 50 and that had a minor bleeding event:",numOfRows_4)
print("Number of people with platelets < 50 and that didn't have a bleeding event:",numOfRows_5)
print("Number of people with platelets >= 50 and that didn't have a bleeding event:",numOfRows_6)

Number of people with platelets < 50 and that had a major bleeding event: 6
Number of people with platelets >= 50 and that had a major bleeding event: 11
Number of people with platelets < 50 and that had a minor bleeding event: 5
Number of people with platelets >= 50 and that had a minor bleeding event: 20
Number of people with platelets < 50 and that didn't have a bleeding event: 28
Number of people with platelets >= 50 and that didn't have a bleeding event: 100
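The same six counts can also be produced in one step with a cross-tabulation. The following is a minimal sketch rather than the method used in this notebook: the helper name frequency_table, the label "No bleed", and the toy rows are all assumptions made for illustration, and the outcome coding ('Y' = major, 'N' = minor, anything else = no bleed) follows the labels printed above.

```python
import pandas as pd

def frequency_table(data, factor_col, bleed_col="Major Bleed (Y/N)"):
    """Cross-tabulate one risk factor against a three-level bleeding outcome.

    Hypothetical helper: 'Y' is read as a major bleed, 'N' as a minor bleed,
    and any other entry (including a blank cell) as no bleeding event.
    """
    outcome = pd.Series("No bleed", index=data.index, name="Bleeding outcome")
    outcome.loc[data[bleed_col] == "Y"] = "Major"
    outcome.loc[data[bleed_col] == "N"] = "Minor"
    return pd.crosstab(data[factor_col], outcome)

# Toy rows for illustration only; these are not patient records.
toy = pd.DataFrame({
    "Platelets < 50 (Y/N)": ["Y", "N", "N", "Y", "N", "N"],
    "Major Bleed (Y/N)": ["Y", "N", None, None, "Y", None],
})

table = frequency_table(toy, "Platelets < 50 (Y/N)")
print(table)
```

Calling frequency_table(df, 'anticoagulation (Y/N)') on the real data frame would, under the same coding assumption, give the counts for another risk factor without repeating the six comparisons.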


Assessing the frequency of patients based on whether they're on anticoagulants and the type of their bleeding event

In [4]:
# Boolean masks: 'Y' = major bleed, 'N' = minor bleed, anything else = no bleed.
factor = df['anticoagulation (Y/N)']
bleed = df['Major Bleed (Y/N)']
major = bleed == 'Y'
minor = bleed == 'N'
no_bleed = ~bleed.isin(['Y', 'N'])

# Summing a boolean mask counts the rows where it is True.
numOfRows_1 = ((factor == 'Y') & major).sum()
numOfRows_2 = ((factor == 'N') & major).sum()
numOfRows_3 = ((factor == 'Y') & minor).sum()
numOfRows_4 = ((factor == 'N') & minor).sum()
numOfRows_5 = ((factor == 'Y') & no_bleed).sum()
numOfRows_6 = ((factor == 'N') & no_bleed).sum()

print("Number of people with anticoagulation and that had a major bleeding event:",numOfRows_1)
print("Number of people without anticoagulation and that had a major bleeding event:",numOfRows_2)
print("Number of people with anticoagulation and that had a minor bleeding event:",numOfRows_3)
print("Number of people without anticoagulation and that had a minor bleeding event:",numOfRows_4)
print("Number of people with anticoagulation and that didn't have a bleeding event:",numOfRows_5)
print("Number of people without anticoagulation and that didn't have a bleeding event:",numOfRows_6)

Number of people with anticoagulation and that had a major bleeding event: 6
Number of people without anticoagulation and that had a major bleeding event: 11
Number of people with anticoagulation and that had a minor bleeding event: 11
Number of people without anticoagulation and that had a minor bleeding event: 14
Number of people with anticoagulation and that didn't have a bleeding event: 15
Number of people without anticoagulation and that didn't have a bleeding event: 113


Assessing the frequency of patients based on whether they're on antiplatelets and the type of their bleeding event

In [5]:
# Boolean masks: 'Y' = major bleed, 'N' = minor bleed, anything else = no bleed.
factor = df['anti platelet (Y/N)']
bleed = df['Major Bleed (Y/N)']
major = bleed == 'Y'
minor = bleed == 'N'
no_bleed = ~bleed.isin(['Y', 'N'])

# Summing a boolean mask counts the rows where it is True.
numOfRows_1 = ((factor == 'Y') & major).sum()
numOfRows_2 = ((factor == 'N') & major).sum()
numOfRows_3 = ((factor == 'Y') & minor).sum()
numOfRows_4 = ((factor == 'N') & minor).sum()
numOfRows_5 = ((factor == 'Y') & no_bleed).sum()
numOfRows_6 = ((factor == 'N') & no_bleed).sum()

print("Number of people with anti platelets and that had a major bleeding event:",numOfRows_1)
print("Number of people without anti platelets and that had a major bleeding event:",numOfRows_2)
print("Number of people with anti platelets and that had a minor bleeding event:",numOfRows_3)
print("Number of people without anti platelets and that had a minor bleeding event:",numOfRows_4)
print("Number of people with anti platelets and that didn't have a bleeding event:",numOfRows_5)
print("Number of people without anti platelets and that didn't have a bleeding event:",numOfRows_6)

Number of people with anti platelets and that had a major bleeding event: 4
Number of people without anti platelets and that had a major bleeding event: 13
Number of people with anti platelets and that had a minor bleeding event: 6
Number of people without anti platelets and that had a minor bleeding event: 19
Number of people with anti platelets and that didn't have a bleeding event: 22
Number of people without anti platelets and that didn't have a bleeding event: 106


Assessing the frequency of patients based on their PMHx bleeding risk and the type of their bleeding event

In [6]:
# Boolean masks: 'Y' = major bleed, 'N' = minor bleed, anything else = no bleed.
factor = df['PMHx bleeding risk (Y/N)']
bleed = df['Major Bleed (Y/N)']
major = bleed == 'Y'
minor = bleed == 'N'
no_bleed = ~bleed.isin(['Y', 'N'])

# Summing a boolean mask counts the rows where it is True.
numOfRows_1 = ((factor == 'Y') & major).sum()
numOfRows_2 = ((factor == 'N') & major).sum()
numOfRows_3 = ((factor == 'Y') & minor).sum()
numOfRows_4 = ((factor == 'N') & minor).sum()
numOfRows_5 = ((factor == 'Y') & no_bleed).sum()
numOfRows_6 = ((factor == 'N') & no_bleed).sum()

print("Number of people with PMHx bleeding risk and that had a major bleeding event:",numOfRows_1)
print("Number of people without PMHx bleeding risk and that had a major bleeding event:",numOfRows_2)
print("Number of people with PMHx bleeding risk and that had a minor bleeding event:",numOfRows_3)
print("Number of people without PMHx bleeding risk and that had a minor bleeding event:",numOfRows_4)
print("Number of people with PMHx bleeding risk and that didn't have a bleeding event:",numOfRows_5)
print("Number of people without PMHx bleeding risk and that didn't have a bleeding event:",numOfRows_6)

Number of people with PMHx bleeding risk and that had a major bleeding event: 4
Number of people without PMHx bleeding risk and that had a major bleeding event: 13
Number of people with PMHx bleeding risk and that had a minor bleeding event: 3
Number of people without PMHx bleeding risk and that had a minor bleeding event: 22
Number of people with PMHx bleeding risk and that didn't have a bleeding event: 16
Number of people without PMHx bleeding risk and that didn't have a bleeding event: 112


Assessing the frequency of patients based on their gender and the type of their bleeding event

In [7]:
# Boolean masks: 'Y' = major bleed, 'N' = minor bleed, anything else = no bleed.
gender = df['gender']
bleed = df['Major Bleed (Y/N)']
major = bleed == 'Y'
minor = bleed == 'N'
no_bleed = ~bleed.isin(['Y', 'N'])

# Summing a boolean mask counts the rows where it is True.
numOfRows_1 = ((gender == 'M') & major).sum()
numOfRows_2 = ((gender == 'F') & major).sum()
numOfRows_3 = ((gender == 'M') & minor).sum()
numOfRows_4 = ((gender == 'F') & minor).sum()
numOfRows_5 = ((gender == 'M') & no_bleed).sum()
numOfRows_6 = ((gender == 'F') & no_bleed).sum()

print("Number of Males that had a major bleeding event:",numOfRows_1)
print("Number of Females that had a major bleeding event:",numOfRows_2)
print("Number of Males that had a minor bleeding event:",numOfRows_3)
print("Number of Females that had a minor bleeding event:",numOfRows_4)
print("Number of Males that didn't have a bleeding event:",numOfRows_5)
print("Number of Females that didn't have a bleeding event:",numOfRows_6)

Number of Males that had a major bleeding event: 12
Number of Females that had a major bleeding event: 5
Number of Males that had a minor bleeding event: 18
Number of Females that had a minor bleeding event: 7
Number of Males that didn't have a bleeding event: 75
Number of Females that didn't have a bleeding event: 53


Assessing the frequency of patients based on their cytogenetic risk factors and the type of their bleeding event

In [8]:
# Boolean masks: 'Y' = major bleed, 'N' = minor bleed, anything else = no bleed.
# Rows whose cytogenetics entry is neither 'Y' nor 'N' match none of the six counts.
factor = df['HR Molecular/Cytogenetics (Y/N)']
bleed = df['Major Bleed (Y/N)']
major = bleed == 'Y'
minor = bleed == 'N'
no_bleed = ~bleed.isin(['Y', 'N'])

# Summing a boolean mask counts the rows where it is True.
numOfRows_1 = ((factor == 'Y') & major).sum()
numOfRows_2 = ((factor == 'N') & major).sum()
numOfRows_3 = ((factor == 'Y') & minor).sum()
numOfRows_4 = ((factor == 'N') & minor).sum()
numOfRows_5 = ((factor == 'Y') & no_bleed).sum()
numOfRows_6 = ((factor == 'N') & no_bleed).sum()

print("Number of people with cytogenetic factors that had a major bleeding event:",numOfRows_1)
print("Number of people without cytogenetic factors that had a major bleeding event:",numOfRows_2)
print("Number of people with cytogenetic factors that had a minor bleeding event:",numOfRows_3)
print("Number of people without cytogenetic factors that had a minor bleeding event:",numOfRows_4)
print("Number of people with cytogenetic factors that didn't have a bleeding event:",numOfRows_5)
print("Number of people without cytogenetic factors that didn't have a bleeding event:",numOfRows_6)

print("Note: this does not sum to the total of 170 patients because patients with 'unknown' cytogenetic risk factors were excluded from the calculation")

Number of people with cytogenetic factors that had a major bleeding event: 5
Number of people without cytogenetic factors that had a major bleeding event: 7
Number of people with cytogenetic factors that had a minor bleeding event: 8
Number of people without cytogenetic factors that had a minor bleeding event: 16
Number of people with cytogenetic factors that didn't have a bleeding event: 58
Number of people without cytogenetic factors that didn't have a bleeding event: 55
Note: this does not sum to the total of 170 patients because patients with 'unknown' cytogenetic risk factors were excluded from the calculation
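The six counts above sum to 149, so 21 of the 170 patients fall outside the Y/N comparison. One way to make that exclusion explicit is to count every distinct entry in the column before filtering. This is a sketch on toy rows only; the spelling 'Unknown' is an assumption about how the spreadsheet labels these patients.

```python
import pandas as pd

# Toy rows for illustration only; 'Unknown' is an assumed spelling of the
# label used in the real spreadsheet, and None stands for a blank cell.
toy = pd.DataFrame({
    "HR Molecular/Cytogenetics (Y/N)": ["Y", "N", "Unknown", "N", None, "Y"],
})
col = toy["HR Molecular/Cytogenetics (Y/N)"]

# Count every distinct entry, including blank cells, so that nothing is
# dropped silently.
all_counts = col.value_counts(dropna=False)
print(all_counts)

# Rows that comparisons against 'Y' and 'N' would skip.
excluded = ~col.isin(["Y", "N"])
n_excluded = int(excluded.sum())
print("Excluded from the Y/N counts:", n_excluded)
```

Run on the real column, n_excluded would be expected to equal the difference between the cohort size and the sum of the six printed counts.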


Assessing the frequency of patients based on whether they're anemic and the type of their bleeding event

In [9]:
# Boolean masks: 'Y' = major bleed, 'N' = minor bleed, anything else = no bleed.
factor = df['Anemia (hb < 110) (Y/N)']
bleed = df['Major Bleed (Y/N)']
major = bleed == 'Y'
minor = bleed == 'N'
no_bleed = ~bleed.isin(['Y', 'N'])

# Summing a boolean mask counts the rows where it is True.
numOfRows_1 = ((factor == 'Y') & major).sum()
numOfRows_2 = ((factor == 'N') & major).sum()
numOfRows_3 = ((factor == 'Y') & minor).sum()
numOfRows_4 = ((factor == 'N') & minor).sum()
numOfRows_5 = ((factor == 'Y') & no_bleed).sum()
numOfRows_6 = ((factor == 'N') & no_bleed).sum()

print("Number of people with anemia that had a major bleeding event:",numOfRows_1)
print("Number of people without anemia that had a major bleeding event:",numOfRows_2)
print("Number of people with anemia that had a minor bleeding event:",numOfRows_3)
print("Number of people without anemia that had a minor bleeding event:",numOfRows_4)
print("Number of people with anemia that didn't have a bleeding event:",numOfRows_5)
print("Number of people without anemia that didn't have a bleeding event:",numOfRows_6)


Number of people with anemia that had a major bleeding event: 14
Number of people without anemia that had a major bleeding event: 3
Number of people with anemia that had a minor bleeding event: 17
Number of people without anemia that had a minor bleeding event: 8
Number of people with anemia that didn't have a bleeding event: 74
Number of people without anemia that didn't have a bleeding event: 54
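As a step toward the contingency-table analysis, the printed counts can be arranged as a 2 x 3 table and converted to within-group proportions. The sketch below copies the anemia counts from the output above; the labels "Anemia", "No anemia" and "No bleed" are illustrative names, and the result is descriptive only, not a significance test.

```python
import pandas as pd

# Counts copied from the printed output above (anemia vs. bleeding outcome).
anemia_table = pd.DataFrame(
    {"Major": [14, 3], "Minor": [17, 8], "No bleed": [74, 54]},
    index=pd.Index(["Anemia", "No anemia"], name="Group"),
)

# Number of patients in each group; the two totals should add up to the cohort size.
row_totals = anemia_table.sum(axis=1)

# Share of each group that falls into each outcome category.
row_share = anemia_table.div(row_totals, axis=0)

print(row_totals)
print(row_share.round(3))
```

The row totals are 105 and 65, which together account for all 170 patients, so no one is excluded from this particular table.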
