## Analysis of Sysmex pilot data (BloodCounts Project)

**Background**: Pilot data come from a cohort of COVID patients at the VUMC location, some of whom had pulmonary embolisms. The data were obtained from blood samples and analysed using Sysmex software. Besides the regular full blood count, additional information, including flow cytometry data, becomes available after decryption by Sysmex. The cohort data were retrieved with permission of the lead investigators and extracted by SampleId via the Research Data Platform overseen by the Business Intelligence Department (Jira ticket: VUBI-8429). Sample data were subsequently extracted by the TraiL as .116 files before decryption by Sysmex.

**Analysis**: Perform exploratory data analysis of the blood count data, including the FCS files, and investigate their predictive value for pulmonary embolism.

**Date**:        23-01-2024

**Author**:      Stephan van der Zwaard

In [2]:
# Load required libraries
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import flowkit as fk

# Set options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [4]:
# Check that the CSV file can be found relative to the working directory
os.path.exists(os.path.join(os.getcwd(), 'testfiles_AMS', 'XN_SAMPLE.csv'))

True

In [5]:
# Read the CSV data into a dataframe
data = pd.read_csv(os.path.join(os.getcwd(), 'testfiles_AMS', 'XN_SAMPLE.csv'))
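Because samples are linked by identifier, it may help to read `Sample No.` as text rather than as an integer, and to combine the separate `Date` and `Time` columns into one timestamp. The following is a minimal sketch on a two-row stand-in for the export, not the loading code used in this notebook; the `Measured At` column name is an assumption introduced for illustration.

```python
import io

import pandas as pd

# Two-row stand-in for XN_SAMPLE.csv, limited to a few of its columns
csv_text = """Nickname,Analyzer ID,Date,Time,Sample No.,Measurement Mode
,XN-10^24746,2020/10/02,15:54:12,205276124803,WB
,XN-10^24746,2020/10/03,04:47:33,205277003202,WB
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Sample No.": "string"},  # keep identifiers as text, not integers
)

# Combine the separate Date and Time columns into a single timestamp
# ("Measured At" is a hypothetical column name, not part of the export)
sample["Measured At"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%Y/%m/%d %H:%M:%S"
)
print(sample[["Sample No.", "Measured At"]])
```

With the real file, the `io.StringIO(...)` argument would be replaced by the path to `XN_SAMPLE.csv`.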

In [6]:
# Inspect dataframe (221 records by 492 cols)
data.info()
data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221 entries, 0 to 220
Columns: 492 entries, Nickname to Unnamed: 491
dtypes: float64(279), int64(39), object(174)
memory usage: 849.6+ KB


Unnamed: 0,Nickname,Analyzer ID,Date,Time,Rack,Position,Sample No.,Sample Inf.,Order Type,Reception Date,Measurement Mode,Discrete,Patient ID,Analysis Info.,Judgment,Positive(Diff.),Positive(Morph.),Positive(Count),Error(Func.),Error(Result),Order Info.,WBC Abnormal,WBC Suspect,RBC Abnormal,RBC Suspect,PLT Abnormal,PLT Suspect,WBC Info.,PLT Info.,Rule Result,Validate,Validator,Action Message (Check),Action Message (Review),Action Message (Retest),Sample Comment,Patient Name,Birth,Sex,Patient Comment,Ward Name,Doctor Name,Output,Sequence No.,IP ABN(WBC)WBC Abn Scattergram,IP ABN(WBC)Neutropenia,IP ABN(WBC)Neutrophilia,IP ABN(WBC)Lymphopenia,IP ABN(WBC)Lymphocytosis,IP ABN(WBC)Monocytosis,IP ABN(WBC)Eosinophilia,IP ABN(WBC)Basophilia,IP ABN(WBC)Leukocytopenia,IP ABN(WBC)Leukocytosis,IP ABN(WBC)NRBC Present,IP ABN(WBC)IG Present,IP ABN(RBC)RBC Abn Distribution,IP ABN(RBC)Dimorphic Population,IP ABN(RBC)Anisocytosis,IP ABN(RBC)Microcytosis,IP ABN(RBC)Macrocytosis,IP ABN(RBC)Hypochromia,IP ABN(RBC)Anemia,IP ABN(RBC)Erythrocytosis,IP ABN(RBC)RET Abn Scattergram,IP ABN(RBC)Reticulocytosis,IP ABN(PLT)PLT Abn Distribution,IP ABN(PLT)Thrombocytopenia,IP ABN(PLT)Thrombocytosis,IP ABN(PLT)PLT Abn Scattergram,IP SUS(WBC)Blasts/Abn Lympho?,IP SUS(WBC)Blasts?,IP SUS(WBC)Abn Lympho?,IP SUS(WBC)Left Shift?,IP SUS(WBC)Atypical Lympho?,IP SUS(RBC)RBC Agglutination?,IP SUS(RBC)Turbidity/HGB Interf?,IP SUS(RBC)Iron Deficiency?,IP SUS(RBC)HGB Defect?,IP SUS(RBC)Fragments?,IP SUS(PLT)PLT Clumps?,Q-Flag(Blasts/Abn Lympho?),Q-Flag(Blasts?),Q-Flag(Abn Lympho?),Q-Flag(Left Shift?),Q-Flag(Atypical Lympho?),Q-Flag(RBC Agglutination?),Q-Flag(Turbidity/HGB Interf?),Q-Flag(Iron Deficiency?),Q-Flag(HGB Defect?),Q-Flag(Fragments?),Q-Flag(PLT 
Clumps?),WBC(10^3/uL),WBC/M,RBC(10^6/uL),RBC/M,HGB(g/dL),HGB/M,HCT(%),HCT/M,MCV(fL),MCV/M,MCH(pg),MCH/M,MCHC(g/dL),MCHC/M,PLT(10^3/uL),PLT/M,RDW-SD(fL),RDW-SD/M,RDW-CV(%),RDW-CV/M,PDW(fL),PDW/M,MPV(fL),MPV/M,P-LCR(%),P-LCR/M,PCT(%),PCT/M,NRBC#(10^3/uL),NRBC#/M,NRBC%(%),NRBC%/M,NEUT#(10^3/uL),NEUT#/M,LYMPH#(10^3/uL),LYMPH#/M,MONO#(10^3/uL),MONO#/M,EO#(10^3/uL),EO#/M,BASO#(10^3/uL),BASO#/M,NEUT%(%),NEUT%/M,LYMPH%(%),LYMPH%/M,MONO%(%),MONO%/M,EO%(%),EO%/M,BASO%(%),BASO%/M,IG#(10^3/uL),IG#/M,IG%(%),IG%/M,RET%(%),RET%/M,RET#(10^6/uL),RET#/M,IRF(%),IRF/M,LFR(%),LFR/M,MFR(%),MFR/M,HFR(%),HFR/M,RET-He(pg),RET-He/M,IPF(%),IPF/M,[PLT-I(10^3/uL)],[PLT-I/M],MicroR(%),MicroR/M,MacroR(%),MacroR/M,[TNC(10^3/uL)],[TNC/M],[WBC-N(10^3/uL)],[WBC-N/M],[TNC-N(10^3/uL)],[TNC-N/M],[BA-N#(10^3/uL)],[BA-N#/M],[BA-N%(%)],[BA-N%/M],[WBC-D(10^3/uL)],[WBC-D/M],[TNC-D(10^3/uL)],[TNC-D/M],[NEUT#&(10^3/uL)],[NEUT#&/M],[NEUT%&(%)],[NEUT%&/M],[LYMP#&(10^3/uL)],[LYMP#&/M],[LYMP%&(%)],[LYMP%&/M],[HFLC#(10^3/uL)],[HFLC#/M],[HFLC%(%)],[HFLC%/M],[BA-D#(10^3/uL)],[BA-D#/M],[BA-D%(%)],[BA-D%/M],[NE-SSC(ch)],[NE-SSC/M],[NE-SFL(ch)],[NE-SFL/M],[NE-FSC(ch)],[NE-FSC/M],[LY-X(ch)],[LY-X/M],[LY-Y(ch)],[LY-Y/M],[LY-Z(ch)],[LY-Z/M],[MO-X(ch)],[MO-X/M],[MO-Y(ch)],[MO-Y/M],[MO-Z(ch)],[MO-Z/M],[NE-WX],[NE-WX/M],[NE-WY],[NE-WY/M],[NE-WZ],[NE-WZ/M],[LY-WX],[LY-WX/M],[LY-WY],[LY-WY/M],[LY-WZ],[LY-WZ/M],[MO-WX],[MO-WX/M],[MO-WY],[MO-WY/M],[MO-WZ],[MO-WZ/M],[WBC-P(10^3/uL)],[WBC-P/M],[TNC-P(10^3/uL)],[TNC-P/M],[RBC-O(10^6/uL)],[RBC-O/M],[PLT-O(10^3/uL)],[PLT-O/M],RBC-He(pg),RBC-He/M,Delta-He(pg),Delta-He/M,[RET-Y(ch)],[RET-Y/M],[RET-RBC-Y(ch)],[RET-RBC-Y/M],[IRF-Y(ch)],[IRF-Y/M],[FRC#(10^6/uL)],[FRC#/M],[FRC%(%)],[FRC%/M],HYPO-He(%),HYPO-He/M,HYPER-He(%),HYPER-He/M,[RPI],[RPI/M],[RET-UPP],[RET-UPP/M],[RET-TNC],[RET-TNC/M],[PLT-F(10^3/uL)],[PLT-F/M],[H-IPF(%)],[H-IPF/M],IPF#(10^3/uL),IPF#/M,WBC-BF(10^3/uL),WBC-BF/M,RBC-BF(10^6/uL),RBC-BF/M,MN#(10^3/uL),MN#/M,PMN#(10^3/uL),PMN#/M,MN%(%),MN%/M,PMN%(%),PMN%/M,TC-BF#(10^3/uL),
TC-BF#/M,[HF-BF#(10^3/uL)],[HF-BF#/M],[HF-BF%(/100WBC)],[HF-BF%/M],[NE-BF#(10^3/uL)],[NE-BF#/M],[NE-BF%(%)],[NE-BF%/M],[LY-BF#(10^3/uL)],[LY-BF#/M],[LY-BF%(%)],[LY-BF%/M],[MO-BF#(10^3/uL)],[MO-BF#/M],[MO-BF%(%)],[MO-BF%/M],[EO-BF#(10^3/uL)],[EO-BF#/M],[EO-BF%(%)],[EO-BF%/M],[RBC-BF2(10^6/uL)],[RBC-BF2/M],[HGB-BLANK],[HGB-SAMPLE],[R-MFV(fL)],[S-RBC(10^6/uL)],[S-MCV(fL)],[L-RBC(10^6/uL)],[L-MCV(fL)],[P-MFV(fL)],[WNR-X(ch)],[WNR-Y(ch)],[WNR-Z(ch)],[WNR-WX],[WNR-WY],[WDF-X(ch)],[WDF-Y(ch)],[WDF-Z(ch)],[WDF-WX],[WDF-WY],[WBC-FX(ch)],[DLT-WBCD],[WPC-X(ch)],[WPC-Y(ch)],[WPC-Z(ch)],[DLT-WBCP],[WPC-AREA1#],[WPC-AREA2#],[WPC-AREA3#],[RET-RBC-X(ch)],[RET-X(ch)],[RET-RBC-Z(ch)],[RET-RBC-WX],[RET-RBC-WY],[DLT-RBC],[DLT-PLTO],[Unclassified],[PLT-F-AREA1#],[PLT-F-X(ch)],[PLT-F-Y(ch)],[PLT-F-Z(ch)],[PLT-F-RBC-X(ch)],[PLT-F-RBC-Y(ch)],[PLT-F-RBC-Z(ch)],[PLT-F-RBC-WX],[PLT-F-RBC-WY],[DLT-PLT-F],Unnamed: 355,[WBC-N2(10^3/uL)],[TNC-N2(10^3/uL)],[WBC-D2(10^3/uL)],[TNC-D2(10^3/uL)],[WBC-P2(10^3/uL)],[TNC-P2(10^3/uL)],[HGB_NONSI(g/dL)],[HGB_SI(mmol/L)],[HGB_SI2(mmol/L)],[WNR_TOTAL_COUNT],[WDF_TOTAL_COUNT],[WDF_PLOT_COUNT],[WPC_TOTAL_COUNT],[WPC_PLOT_COUNT],[RET_TOTAL_COUNT],[PLT-F_SIGNAL_COUNT_A],[PLT-F_DATA_COUNT_A],[PLT-F_PLOT_COUNT_A],[PLT-F_PLOT_COUNT_B],[AREA-F#],[NRBC-X(ch)],[NRBC-Y(ch)],HPC#(10^3/uL),HPC#/M,[HGB_NONSI2(g/dL)],[HGB-O(g/dL)],[HGB-O/M],[PLT-F2(10^3/uL)],[PLT-F2/M],IP 
SUS(RBC)iRBC?,Q-Flag(iRBC?),[iRBC-WNR#],[iRBC-WDF#],[Delta-HGB(g/dL)],[Delta-HGB/M],[MCHC-O(g/dL)],[MCHC-O/M],[WBC(10^3/uL)],[WBC/M],[RBC(10^6/uL)],[RBC/M],[RBC-I(10^6/uL)],[RBC-I/M],[RBC-O(10^6/uL)].1,[RBC-O/M].1,[NEUT#(10^3/uL)],[NEUT#/M],[LYMPH#(10^3/uL)],[LYMPH#/M],[MONO#(10^3/uL)],[MONO#/M],[EO#(10^3/uL)],[EO#/M],[NEUT%(%)],[NEUT%/M],[LYMPH%(%)],[LYMPH%/M],[MONO%(%)],[MONO%/M],[EO%(%)],[EO%/M],[MN#(10^3/uL)],[MN#/M],[PMN#(10^3/uL)],[PMN#/M],[HF#(10^3/uL)],[HF#/M],[MN%(%)],[MN%/M],[PMN%(%)],[PMN%/M],[HF%(/100WBC)],[HF%/M],[TC#(10^3/uL)],[TC#/M],HPC%(%),HPC%/M,AS-LYMP#(10^3/uL),AS-LYMP#/M,AS-LYMP%(%),AS-LYMP%/M,RE-LYMP#(10^3/uL),RE-LYMP#/M,RE-LYMP%(%),RE-LYMP%/M,NEUT-RI(FI),NEUT-RI/M,NEUT-GI(SI),NEUT-GI/M,Unnamed: 445,Unnamed: 446,IP SUS(RBC)iRBC?(R),Q-Flag(iRBC?(R)),IP SUS(PLT)Giant Platelet?,Q-Flag(Giant Platelet?),[AS-LYMP%L(%)],[AS-LYMP%L/M],[RE-LYMP%L(%)],[RE-LYMP%L/M],WBC(10^3/uL).1,WBC/M.1,RBC(RBC Pack)(10^6/uL),RBC(RBC Pack)/M,[RBC(PLT Pack)(10^6/uL)],[RBC(PLT Pack)/M],HGB(RBC Pack)(g/dL),HGB(RBC Pack)/M,HCT(%).1,HCT/M.1,PLT(10^3/uL).1,PLT/M.1,[RBC-I(10^6/uL)].1,[RBC-I/M].1,[HGB(PLT Pack)(g/dL)],[HGB(PLT Pack)/M],[MCV(fL)],[MCV/M],[MCH(pg)],[MCH/M],[MCHC(g/dL)],[MCHC/M],[RDW-SD(fL)],[RDW-SD/M],[RDW-CV(%)],[RDW-CV/M],[PDW(fL)],[PDW/M],[MPV(fL)],[MPV/M],Unnamed: 485,Unnamed: 486,[IPF(%)],[IPF/M],[IPF#(10^3/uL)],[IPF#/M],Unnamed: 491
0,,XN-10^24746,2020/10/02,15:54:12,24,3,205276124803,B,Initial,2020/10/02 15:51:46,WB,CBC+DIFF,,Normal,Negative,,,,,,1,,,,,,,WBC-N,PLT-I,,,,,,,,,,,,,,,289,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20,DISCRETE,DISCRETE,10,30,70,90,80,80,0,0,7.7,,4.16,,12.1,,36.5,,87.7,,29.1,,33.2,,305,,41.8,,13.0,,9.8,,9.4,,19.0,,0.29,,0.0,,0.0,,4.95,,2.2,,0.51,,0.03,,0.01,,64.3,,28.6,,6.6,,0.4,,0.1,,0.05,,0.6,,0.0,,0.0,,0.0,,100.0,,0.0,,0.0,,5.6,,0.0,,305,,1.5,,3.2,,7.7,,7.7,,7.7,,0.01,,0.1,,7.69,,7.69,,4.9,,63.7,,2.09,,27.2,,0.11,,1.4,,0.03,,0.4,,155.2,,50.2,,82.3,,79.8,,71.1,,58.7,,125.8,,128.8,,72.6,,303,,698,,766,,476,,745,,647,,238,,683,,619,,0.0,,0.0,,0.0,,0,,5.6,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0,,0,,0,,0.0,,0.0,,7.691,,0.0,,2.701,,4.99,,35.1,,64.9,,0.0,,----,,----,,4.956,,64.5,,2.195,,28.5,,0.506,,6.6,,0.034,,0.4,,0.0,,5151,6359,88.0,0.0,0.0,0.0,0.0,7.9,151.1,96.2,81.3,216,754,155.2,50.2,82.3,393,1122,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,,7.695,7.695,7.691,7.691,0.0,0.0,12.1,7.5,7.49,8616,8306,7084,0,0,0,0,0,0,0,0,0.0,0.0,0.0,,12.08,0.0,,0.0,,,0,0,0,12.1,,0.0,,0.0,,4.1633,,4.1633,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.11,,1.4,,0.16,,2.1,,50.2,,155.2,,,,,DISCRETE,,0,5.0,,7.3,,0.0,,0.0,,0.0,,12.1,,0.0,,0,,0.0,,12.1,,0.0,,29.1,,33.2,,0.0,,0.0,,0.0,,0.0,,,,0.0,,0.0,,
1,,XN-10^24746,2020/10/03,04:47:33,15,1,205277003202,B,Initial,2020/10/03 04:46:14,WB,CBC+DIFF,,Normal,Positive,,,Count,,,1,,,,,,1.0,WBC-N,PLT-I,,,,,,,,,,,,,,,355,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,0,DISCRETE,DISCRETE,0,70,50,80,90,90,0,110,4.12,,5.36,,12.7,,40.4,,75.4,-,23.7,-,31.4,,136,*,39.9,,14.7,,14.2,*,10.6,*,31.7,*,0.14,*,0.0,,0.0,,2.86,,0.8,-,0.43,,0.02,,0.01,,69.5,,19.4,-,10.4,,0.5,,0.2,,0.01,,0.2,,0.0,,0.0,,0.0,,100.0,,0.0,,0.0,,5.6,,0.0,,136,*,13.7,,4.5,,4.12,,4.12,,4.12,,0.01,,0.2,,4.19,,4.19,,2.85,,69.3,,0.77,,18.7,,0.03,,0.7,,0.01,,0.2,,152.1,,48.4,,93.5,,83.9,,83.6,,61.0,,121.1,,121.8,,71.5,,309,,620,,685,,596,,861,,574,,231,,714,,755,,0.0,,0.0,,0.0,,0,,5.6,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0,,0,,0,,0.0,,0.0,,4.192,,0.0,,1.254,,2.938,,29.9,,70.1,,0.0,,----,,----,,2.923,,69.7,,0.818,,19.5,,0.436,,10.4,,0.015,,0.4,,0.0,,5085,6354,74.6,0.0,0.0,0.0,0.0,8.6,150.9,91.6,78.3,282,868,152.1,48.4,93.5,486,1694,0.0,1.02,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,,4.12,4.122,4.191,4.192,0.0,0.0,12.7,7.9,7.88,4899,5702,4053,0,0,0,0,0,0,0,0,105.0,155.0,0.0,,12.69,0.0,,0.0,,,0,0,0,12.7,,0.0,,0.0,,5.356,,5.356,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.03,,0.7,,0.18,,4.4,,48.4,,152.1,,,,,DISCRETE,,0,3.8,,22.5,,0.0,,0.0,,0.0,,12.7,,0.0,,0,,0.0,,12.7,,0.0,,23.7,,31.4,,0.0,,0.0,,0.0,,0.0,,,,0.0,,0.0,,
2,,XN-10^24746,2020/10/03,13:27:08,24,1,205277032303,B,Initial,2020/10/03 13:25:50,WB,CBC+DIFF,,Normal,Positive,Diff.,,,,,1,1.0,,,,,,WBC-N,PLT-I,,,,,,,,,,,,,,,436,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,DISCRETE,DISCRETE,10,0,70,90,70,80,0,0,9.51,,4.29,,13.5,,39.9,,93.0,,31.5,,33.8,,188,,40.3,,11.9,,12.1,,10.4,,28.4,,0.2,,0.0,,0.0,,8.47,+,0.4,-,0.62,,0.0,,0.02,,89.1,+,4.2,-,6.5,,0.0,,0.2,,0.06,,0.6,,0.0,,0.0,,0.0,,100.0,,0.0,,0.0,,5.6,,0.0,,188,,0.6,,3.2,,9.51,,9.51,,9.51,,0.02,,0.2,,9.54,,9.54,,8.41,,88.5,,0.4,,4.2,,0.0,,0.0,,0.02,,0.2,,156.7,,50.5,,87.6,,78.5,,78.1,,56.7,,123.4,,130.5,,72.1,,306,,653,,742,,599,,730,,794,,276,,728,,638,,0.0,,0.0,,0.0,,0,,5.6,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0,,0,,0,,0.0,,0.0,,9.54,,0.0,,1.023,,8.517,,10.7,,89.3,,0.0,,----,,----,,8.516,,89.3,,0.404,,4.2,,0.619,,6.5,,0.001,,0.0,,0.0,,5106,6457,93.9,0.0,0.0,0.0,0.0,8.3,150.0,92.2,78.8,218,863,156.7,50.5,87.6,428,672,0.0,1.0,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,,9.515,9.515,9.54,9.54,0.0,0.0,13.5,8.4,8.39,11041,11108,8908,0,0,0,0,0,0,0,0,0.0,0.0,0.0,,13.51,0.0,,0.0,,,0,0,1,13.5,,0.0,,0.0,,4.2944,,4.2944,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,*,0.0,*,0.02,,0.2,,50.5,,156.7,,,,,DISCRETE,,0,0.0,*,5.0,,0.0,,0.0,,0.0,,13.5,,0.0,,0,,0.0,,13.5,,0.0,,31.5,,33.8,,0.0,,0.0,,0.0,,0.0,,,,0.0,,0.0,,
3,,XN-10^24746,2020/10/06,16:43:20,3,4,205280165503,B,Initial,2020/10/06 16:40:18,WB,CBC+DIFF,,Normal,Positive,Diff.,,,,,1,1.0,,,,,,WBC-N,PLT-I,,,,,,,,,,,,,,,1250,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,DISCRETE,DISCRETE,0,30,70,90,70,80,0,0,5.05,,4.74,,14.9,,43.1,,90.9,,31.4,,34.6,,410,+,39.2,,11.8,,8.7,-,8.6,-,13.8,,0.35,,0.0,,0.0,,3.73,,0.67,-,0.57,,0.07,,0.01,,73.8,+,13.3,-,11.3,,1.4,,0.2,,0.02,,0.4,,0.0,,0.0,,0.0,,100.0,,0.0,,0.0,,5.6,,0.0,,410,,0.7,,3.4,,5.05,,5.05,,5.05,,0.01,,0.2,,5.11,,5.11,,3.71,,73.4,,0.62,,12.3,,0.05,,1.0,,0.02,,0.4,,152.3,,48.6,,86.0,,78.8,,71.7,,59.0,,119.9,,124.4,,72.5,,322,,638,,768,,622,,809,,745,,225,,691,,565,,0.0,,0.0,,0.0,,0,,5.6,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0,,0,,0,,0.0,,0.0,,5.115,,0.0,,1.254,,3.861,,24.6,,75.4,,0.0,,----,,----,,3.792,,74.1,,0.678,,13.3,,0.576,,11.3,,0.069,,1.3,,0.0,,5125,6615,91.6,0.0,0.0,0.0,0.0,7.1,148.1,95.1,81.6,262,890,152.3,48.6,86.0,453,1050,0.0,1.01,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,,5.05,5.052,5.112,5.115,0.0,0.0,14.9,9.2,9.25,5697,5724,4811,0,0,0,0,0,0,0,0,77.0,125.0,0.0,,14.9,0.0,,0.0,,,0,0,0,14.9,,0.0,,0.0,,4.7382,,4.7382,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.05,,1.0,,0.1,,2.0,,48.6,,152.3,,,,,DISCRETE,,0,7.5,,14.9,,0.0,,0.0,,0.0,,14.9,,0.0,,0,,0.0,,14.9,,0.0,,31.4,,34.6,,0.0,,0.0,,0.0,,0.0,,,,0.0,,0.0,,
4,,XN-10^24746,2020/10/07,05:18:07,33,1,205281008504,B,Initial,2020/10/07 05:16:49,WB,CBC+DIFF,,Normal,Positive,Diff.,Morph.,Count,,,1,1.0,,,,,,WBC-N,PLT-I,,,,,,,,,,,,,,,1327,,,,1.0,,,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,0,DISCRETE,DISCRETE,10,10,60,90,90,90,0,40,10.51,,4.98,,13.4,,39.7,,79.7,-,26.9,,33.8,,205,,42.4,,14.6,,11.7,,10.6,,29.2,,0.22,,0.0,,0.0,,9.72,+,0.47,-,0.31,,0.0,,0.01,,92.5,+,4.5,-,2.9,,0.0,,0.1,,0.11,,1.0,,0.0,,0.0,,0.0,,100.0,,0.0,,0.0,,5.6,,0.0,,205,,7.8,,4.3,,10.51,,10.51,,10.51,,0.01,,0.1,,10.58,,10.58,,9.61,,91.5,,0.44,,4.2,,0.03,,0.3,,0.03,,0.3,,152.8,,47.2,,81.7,,80.2,,69.5,,57.3,,122.7,,134.9,,69.3,,327,,742,,771,,474,,1007,,733,,285,,615,,707,,0.0,,0.0,,0.0,,0,,5.6,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0,,0,,0,,0.0,,0.0,,10.58,,0.0,,0.782,,9.798,,7.4,,92.6,,0.0,,----,,----,,9.796,,92.6,,0.468,,4.4,,0.314,,3.0,,0.002,,0.0,,0.0,,5082,6422,79.5,0.0,0.0,0.0,0.0,8.4,144.3,91.9,76.8,232,856,152.8,47.2,81.7,360,804,0.0,1.01,0.0,0.0,0.0,0.0,0,0,0,0.0,0.0,0.0,0,0,0.0,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0.0,,10.512,10.512,10.58,10.58,0.0,0.0,13.4,8.3,8.32,12599,10765,9728,0,0,0,0,0,0,0,0,0.0,0.0,0.0,,13.4,0.0,,0.0,,,0,0,0,13.4,,0.0,,0.0,,4.9843,,4.9843,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.0,,0.03,,0.3,,0.07,,0.7,,47.2,,152.8,,,,,DISCRETE,,0,6.4,,14.9,,0.0,,0.0,,0.0,,13.4,,0.0,,0,,0.0,,13.4,,0.0,,26.9,,33.8,,0.0,,0.0,,0.0,,0.0,,,,0.0,,0.0,,
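The column list contains several placeholder columns named `Unnamed: N`, which pandas generates when a header cell is empty. One possible clean-up step, shown here on a small stand-in frame rather than on `data` itself, is to drop those placeholders before further analysis. Whether these columns are safe to discard in the real export is an assumption that should be checked first.

```python
import numpy as np
import pandas as pd

# Small stand-in frame using a few column names from the export
demo = pd.DataFrame({
    "WBC(10^3/uL)": [7.7, 4.12, 9.51],
    "Unnamed: 355": [np.nan, np.nan, np.nan],
    "PLT(10^3/uL)": [305, 136, 188],
    "Unnamed: 491": [np.nan, np.nan, np.nan],
})

# Columns that pandas auto-named because the header cell was empty
unnamed = [col for col in demo.columns if col.startswith("Unnamed:")]

# Only drop placeholders that are also completely empty, to avoid losing data
empty_unnamed = [col for col in unnamed if demo[col].isna().all()]
cleaned = demo.drop(columns=empty_unnamed)
print(cleaned.columns.tolist())
```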


In [7]:
# List all column names (492 columns)
print(data.columns.values)

['Nickname' 'Analyzer ID' 'Date' 'Time' 'Rack' 'Position' 'Sample No.'
 'Sample Inf.' 'Order Type' 'Reception Date' 'Measurement Mode' 'Discrete'
 'Patient ID' 'Analysis Info.' 'Judgment' 'Positive(Diff.)'
 'Positive(Morph.)' 'Positive(Count)' 'Error(Func.)' 'Error(Result)'
 'Order Info.' 'WBC Abnormal' 'WBC Suspect' 'RBC Abnormal' 'RBC Suspect'
 'PLT Abnormal' 'PLT Suspect' 'WBC Info.' 'PLT Info.' 'Rule Result'
 'Validate' 'Validator' 'Action Message (Check)' 'Action Message (Review)'
 'Action Message (Retest)' 'Sample Comment' 'Patient Name' 'Birth' 'Sex'
 'Patient Comment' 'Ward Name' 'Doctor Name' 'Output' 'Sequence No.'
 'IP ABN(WBC)WBC Abn Scattergram' 'IP ABN(WBC)Neutropenia'
 'IP ABN(WBC)Neutrophilia' 'IP ABN(WBC)Lymphopenia'
 'IP ABN(WBC)Lymphocytosis' 'IP ABN(WBC)Monocytosis'
 'IP ABN(WBC)Eosinophilia' 'IP ABN(WBC)Basophilia'
 'IP ABN(WBC)Leukocytopenia' 'IP ABN(WBC)Leukocytosis'
 'IP ABN(WBC)NRBC Present' 'IP ABN(WBC)IG Present'
 'IP ABN(RBC)RBC Abn Distribution' 'IP ABN(RBC

In [9]:
# Check different measurement methods and corresponding counts
data['Measurement Mode'].unique()
data.groupby(by = ['Measurement Mode'])['Measurement Mode'].count()

Measurement Mode
LW      1
WB    220
Name: Measurement Mode, dtype: int64

In [10]:
# Check the different test profiles ('Discrete') and corresponding counts
data['Discrete'].unique()
data.groupby(by = ['Discrete'])['Discrete'].count()

Discrete
CBC                21
CBC+DIFF          169
CBC+DIFF+PLT-F      2
CBC+DIFF+WPC       24
FREE SELECT         5
Name: Discrete, dtype: int64
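The two counts above can also be combined into one cross-tabulation. Note that `groupby(...).count()` silently skips missing values, whereas `value_counts(dropna=False)` keeps them visible. The sketch below uses a small made-up frame, not the real `data`, and the `"missing"` label is an assumption chosen for illustration.

```python
import pandas as pd

# Made-up stand-in with the two categorical columns inspected above
demo = pd.DataFrame({
    "Measurement Mode": ["WB", "WB", "WB", "LW", "WB"],
    "Discrete": ["CBC+DIFF", "CBC", "CBC+DIFF", "CBC", None],
})

# Counts per test profile, keeping missing values in the tally
counts = demo["Discrete"].value_counts(dropna=False)

# Measurement mode versus test profile in a single table
table = pd.crosstab(
    demo["Measurement Mode"],
    demo["Discrete"].fillna("missing"),  # hypothetical label for missing profiles
)
print(counts)
print(table)
```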

In [11]:
plt.figure()


<Figure size 640x480 with 0 Axes>
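`plt.figure()` on its own only creates an empty canvas, which is why the output reports 0 axes. As an illustration of a possible first plot, the sketch below draws a histogram of white blood cell counts for the first five samples shown earlier. The choice of variable, number of bins, and labels are assumptions, not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# WBC values of the first five samples in the export
demo = pd.DataFrame({"WBC(10^3/uL)": [7.7, 4.12, 9.51, 5.05, 10.51]})

# Create the figure and its axes together, then draw on the axes
fig, ax = plt.subplots(figsize=(6.4, 4.8))
ax.hist(demo["WBC(10^3/uL)"].dropna(), bins=5)
ax.set_xlabel("WBC (10^3/uL)")
ax.set_ylabel("Number of samples")
ax.set_title("Distribution of white blood cell counts")
```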

In [12]:
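# Count missing values per column, most incomplete first
# (dividing by len(data) gives the fraction missing, e.g. to flag columns above 0.9)
data.isnull().sum().sort_values(ascending=False)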

Nickname                            221
HYPER-He/M                          221
[WBC-P/M]                           221
[TNC-P/M]                           221
[RBC-O/M]                           221
[PLT-O/M]                           221
RBC-He/M                            221
Delta-He/M                          221
[RET-Y/M]                           221
[RET-RBC-Y/M]                       221
[IRF-Y/M]                           221
[FRC#/M]                            221
[FRC%/M]                            221
HYPO-He/M                           221
[RPI/M]                             221
NRBC#/M                             221
[RET-UPP/M]                         221
[RET-TNC/M]                         221
[PLT-F/M]                           221
WBC-BF/M                            221
RBC-BF/M                            221
MN#/M                               221
PMN#/M                              221
MN%/M                               221
PMN%/M                              221


In [13]:
# Check missing data: count and fraction ('perc', between 0 and 1) of missing values per column
missing    = data.isnull().sum().sort_values(ascending=False)
df_missing = pd.DataFrame({'missing': missing})
df_missing['perc']  = df_missing['missing']/len(data)
df_missing['rowid'] = df_missing.reset_index().index
df_missing

Unnamed: 0,missing,perc,rowid
Nickname,221,1.0,0
HYPER-He/M,221,1.0,1
[WBC-P/M],221,1.0,2
[TNC-P/M],221,1.0,3
[RBC-O/M],221,1.0,4
[PLT-O/M],221,1.0,5
RBC-He/M,221,1.0,6
Delta-He/M,221,1.0,7
[RET-Y/M],221,1.0,8
[RET-RBC-Y/M],221,1.0,9
