# HW2 - Pandas and ICD-codes

### Get the data

For this assignment, we'll need to get some data! We will be using the Diabetes Dataset that is located here:

https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008#

Afterwards, unzip the folder and place the contents in a folder called `/data/` at this directory.

Make sure to add a file called `.gitignore` at the root of your directory and add the line 

```
    data/
```
to it so that Git ignores any files that you place in the `data` folder.

In [1]:
import pandas as pd
import numpy as np
import string
import re

In [2]:
# Load the data and coerce uninformative fields to NaN
diabetes_df = pd.read_csv('../data/diabetic_data.csv',na_values=['?','Unknown/Invalid'])
display(diabetes_df.head())



Unnamed: 0,encounter_id,patient_nbr,race,gender,age,weight,admission_type_id,discharge_disposition_id,admission_source_id,time_in_hospital,...,citoglipton,insulin,glyburide-metformin,glipizide-metformin,glimepiride-pioglitazone,metformin-rosiglitazone,metformin-pioglitazone,change,diabetesMed,readmitted
0,2278392,8222157,Caucasian,Female,[0-10),,6,25,1,1,...,No,No,No,No,No,No,No,No,No,NO
1,149190,55629189,Caucasian,Female,[10-20),,1,1,7,3,...,No,Up,No,No,No,No,No,Ch,Yes,>30
2,64410,86047875,AfricanAmerican,Female,[20-30),,1,1,7,2,...,No,No,No,No,No,No,No,No,Yes,NO
3,500364,82442376,Caucasian,Male,[30-40),,1,1,7,2,...,No,Up,No,No,No,No,No,Ch,Yes,NO
4,16680,42519267,Caucasian,Male,[40-50),,1,1,7,1,...,No,Steady,No,No,No,No,No,Ch,Yes,NO


![](data/datadictionary.png)

## Data Source [5 pts]

When we begin working with data, it is important to understand the data that we've been given. The context can often tell us a lot about the data. In fact, understanding what is *not* in the data is often just as critical as understanding the data itself. From the above link, the information about how the data was collected can be found in this [paper](https://www.hindawi.com/journals/bmri/2014/781670/).

List the 5 inclusion criteria to be in the dataset: 

1. It is an inpatient encounter (a hospital admission).

2. It is a “diabetic” encounter, that is, one during which any kind of diabetes was entered to the system as a diagnosis.

3. The length of stay was at least 1 day and at most 14 days.

4. Laboratory tests were performed during the encounter.

5. Medications were administered during the encounter.


## Explore the Data [55 pts]
Using the data, answer the following questions:

#### 1. How many rows does the `diabetes_df` have? How many columns? [2.5 pts]

In [3]:
nRows, nCols = diabetes_df.shape
print(nRows,"rows")
print(nCols,"columns")

101766 rows
50 columns


#### 2. How many unique encounters are there? How many unique patients? [2.5 pts]

In [4]:
print(len(diabetes_df['encounter_id'].unique()),'unique encounters')
print(len(diabetes_df['patient_nbr'].unique()),'unique patients')

101766 unique encounters
71518 unique patients
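
As a side note, `Series.nunique()` gives the same counts more directly. The sketch below uses a small made-up frame (`toy_df` is a hypothetical stand-in for `diabetes_df`). One difference worth knowing: `nunique()` skips NaN by default, whereas `len(series.unique())` counts NaN as a value.

```python
import pandas as pd

# Toy stand-in for diabetes_df: three encounters belonging to two patients
toy_df = pd.DataFrame({
    "encounter_id": [101, 102, 103],
    "patient_nbr": [1, 1, 2],
})

# nunique() counts distinct non-null values in a column
n_encounters = toy_df["encounter_id"].nunique()
n_patients = toy_df["patient_nbr"].nunique()

print(n_encounters, "unique encounters")
print(n_patients, "unique patients")
```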


#### 3. What is the largest number of encounters that a single patient has in the dataset? [5 pts]

In [5]:
# METHOD 1
print(diabetes_df
     .groupby(['patient_nbr']) # group by patient
     .size() # get the number of rows per patient
     .sort_values(ascending=False) # sort the resulting pd.Series, descending order
     .iloc[0] # Grab the top result, alternatively, just grab .max() after .size()
,'encounters (method 1)')

# OR

# METHOD 2
print(diabetes_df
    .groupby(['patient_nbr'])
    .size()
    .max() # Or just use max
,'encounters (method 2)')

40 encounters (method 1)
40 encounters (method 2)
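
A third option, sketched here on made-up data (`toy_df` is hypothetical), is `value_counts()`, which counts rows per patient in one call. It also makes it easy to see *which* patient has the most encounters via `idxmax()`.

```python
import pandas as pd

# Toy stand-in: patient 3 appears three times
toy_df = pd.DataFrame({"patient_nbr": [1, 1, 2, 3, 3, 3]})

# value_counts() returns the number of rows per distinct patient
counts = toy_df["patient_nbr"].value_counts()

max_encounters = counts.max()      # largest number of encounters
busiest_patient = counts.idxmax()  # the patient_nbr that has that many

print(max_encounters, "encounters for patient", busiest_patient)
```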


#### 4. Show the proportion of non-missing values in the dataset for each column. [10 pts]
> Make sure you check to see if there are missing values that aren't coded as missing, but should be

In [6]:
# The function sum() counts the number of "True" values in boolean array
# The "~" operator inverts a boolean array, ie. means "not"
{column:sum(~diabetes_df[column].isnull())/len(diabetes_df) for column in diabetes_df.columns}

# Note, this works because we set na_values when loading the dataset
# Alternatively, you can do the following to coerce certain fields to NaN:
# diabetes_df = diabetes_df.replace({'?':np.nan, 'Unknown/Invalid':np.nan})


{'encounter_id': 1.0,
 'patient_nbr': 1.0,
 'race': 0.9776644458856593,
 'gender': 0.9999705206060964,
 'age': 1.0,
 'weight': 0.03141520743666844,
 'admission_type_id': 1.0,
 'discharge_disposition_id': 1.0,
 'admission_source_id': 1.0,
 'time_in_hospital': 1.0,
 'payer_code': 0.6044258396714031,
 'medical_specialty': 0.5091779179686732,
 'num_lab_procedures': 1.0,
 'num_procedures': 1.0,
 'num_medications': 1.0,
 'number_outpatient': 1.0,
 'number_emergency': 1.0,
 'number_inpatient': 1.0,
 'diag_1': 0.9997936442426744,
 'diag_2': 0.9964821256608297,
 'diag_3': 0.98601694082503,
 'number_diagnoses': 1.0,
 'max_glu_serum': 1.0,
 'A1Cresult': 1.0,
 'metformin': 1.0,
 'repaglinide': 1.0,
 'nateglinide': 1.0,
 'chlorpropamide': 1.0,
 'glimepiride': 1.0,
 'acetohexamide': 1.0,
 'glipizide': 1.0,
 'glyburide': 1.0,
 'tolbutamide': 1.0,
 'pioglitazone': 1.0,
 'rosiglitazone': 1.0,
 'acarbose': 1.0,
 'miglitol': 1.0,
 'troglitazone': 1.0,
 'tolazamide': 1.0,
 'examide': 1.0,
 'citoglipton': 
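
The same proportions can be computed without a Python-level loop, because the mean of a boolean column is the fraction of `True` values. This is a minimal sketch on made-up data (`toy_df` and its values are hypothetical), using the `replace` approach mentioned in the comment above to turn `'?'` into NaN.

```python
import numpy as np
import pandas as pd

# Toy stand-in where '?' marks a missing value, as in the raw CSV
toy_df = pd.DataFrame({
    "race": ["Caucasian", "?", "Asian", "Other"],
    "weight": ["?", "?", "?", "[50-75)"],
})

# Coerce the placeholder to NaN so pandas recognizes it as missing
toy_df = toy_df.replace({"?": np.nan})

# notnull() gives a boolean frame; mean() of booleans is the share of True
proportion_present = toy_df.notnull().mean()
print(proportion_present)
```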

#### 5. For all numeric columns, show summary statistics (mean, median, max, min, etc) [2.5 pts]

In [7]:
# By default, the pandas function df.describe() only summarizes numeric columns
diabetes_df.describe()

# You can also select numeric columns using pd.DataFrame.select_dtypes()

Unnamed: 0,encounter_id,patient_nbr,admission_type_id,discharge_disposition_id,admission_source_id,time_in_hospital,num_lab_procedures,num_procedures,num_medications,number_outpatient,number_emergency,number_inpatient,number_diagnoses
count,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0,101766.0
mean,165201600.0,54330400.0,2.024006,3.715642,5.754437,4.395987,43.095641,1.33973,16.021844,0.369357,0.197836,0.635566,7.422607
std,102640300.0,38696360.0,1.445403,5.280166,4.064081,2.985108,19.674362,1.705807,8.127566,1.267265,0.930472,1.262863,1.9336
min,12522.0,135.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0
25%,84961190.0,23413220.0,1.0,1.0,1.0,2.0,31.0,0.0,10.0,0.0,0.0,0.0,6.0
50%,152389000.0,45505140.0,1.0,1.0,7.0,4.0,44.0,1.0,15.0,0.0,0.0,0.0,8.0
75%,230270900.0,87545950.0,3.0,4.0,7.0,6.0,57.0,2.0,20.0,0.0,0.0,1.0,9.0
max,443867200.0,189502600.0,8.0,28.0,25.0,14.0,132.0,6.0,81.0,42.0,76.0,21.0,16.0
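
To make the `select_dtypes()` route concrete, here is a small sketch on made-up data (`toy_df` is hypothetical). Note that in the real dataset the ID columns (`encounter_id`, `patient_nbr`, and the `*_id` codes) are stored as numbers, so they show up in the summary even though their mean or standard deviation is not meaningful.

```python
import pandas as pd

# Toy stand-in with one text column and two numeric columns
toy_df = pd.DataFrame({
    "age": ["[0-10)", "[10-20)", "[20-30)"],
    "time_in_hospital": [1, 3, 2],
    "num_medications": [1, 18, 13],
})

# Keep only numeric columns, then summarize them
numeric_summary = toy_df.select_dtypes(include="number").describe()
print(numeric_summary)
```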


#### 6. For all columns with a `dtype` of object, show the count of all of the values in that column [5 pts]
> You may want to reference the `pd.DataFrame.select_dtypes()` function 

In [8]:
for column in diabetes_df.select_dtypes('object'):
    display(diabetes_df[column].value_counts())

# Also, if you pass include='object' to describe(), it will summarize the object columns
# (np.object has been removed from recent NumPy releases, so use the string 'object')
# diabetes_df.describe(include='object')

Caucasian          76099
AfricanAmerican    19210
Hispanic            2037
Other               1506
Asian                641
Name: race, dtype: int64

Female    54708
Male      47055
Name: gender, dtype: int64

[70-80)     26068
[60-70)     22483
[50-60)     17256
[80-90)     17197
[40-50)      9685
[30-40)      3775
[90-100)     2793
[20-30)      1657
[10-20)       691
[0-10)        161
Name: age, dtype: int64

[75-100)     1336
[50-75)       897
[100-125)     625
[125-150)     145
[25-50)        97
[0-25)         48
[150-175)      35
[175-200)      11
>200            3
Name: weight, dtype: int64

MC    32439
HM     6274
SP     5007
BC     4655
MD     3532
CP     2533
UN     2448
CM     1937
OG     1033
PO      592
DM      549
CH      146
WC      135
OT       95
MP       79
SI       55
FR        1
Name: payer_code, dtype: int64

InternalMedicine                     14635
Emergency/Trauma                      7565
Family/GeneralPractice                7440
Cardiology                            5352
Surgery-General                       3099
Nephrology                            1613
Orthopedics                           1400
Orthopedics-Reconstructive            1233
Radiologist                           1140
Pulmonology                            871
Psychiatry                             854
Urology                                685
ObstetricsandGynecology                671
Surgery-Cardiovascular/Thoracic        652
Gastroenterology                       564
Surgery-Vascular                       533
Surgery-Neuro                          468
PhysicalMedicineandRehabilitation      391
Oncology                               348
Pediatrics                             254
Hematology/Oncology                    207
Neurology                              203
Pediatrics-Endocrinology               159
Otolaryngol

428       6862
414       6581
786       4016
410       3614
486       3508
427       2766
491       2275
715       2151
682       2042
434       2028
780       2019
996       1967
276       1889
38        1688
250.8     1680
599       1595
584       1520
V57       1207
250.6     1183
518       1115
820       1082
577       1057
493       1056
435       1016
562        989
574        965
296        896
560        876
250.7      871
250.13     851
          ... 
V70          1
906          1
895          1
389          1
E909         1
704          1
97           1
V25          1
834          1
365          1
57           1
471          1
827          1
391          1
347          1
885          1
842          1
219          1
832          1
10           1
314          1
955          1
684          1
957          1
160          1
698          1
640          1
833          1
375          1
373          1
Name: diag_1, Length: 716, dtype: int64

276       6752
428       6662
250       6071
427       5036
401       3736
496       3305
599       3288
403       2823
414       2650
411       2566
250.02    2074
707       1999
585       1871
584       1649
491       1545
250.01    1523
285       1520
780       1491
425       1434
682       1433
486       1379
518       1355
424       1071
413       1042
250.6      895
493        881
305        702
786        644
280        606
998        571
          ... 
E826         1
942          1
270          1
E854         1
E890         1
703          1
977          1
V25          1
894          1
E968         1
871          1
734          1
V69          1
800          1
268          1
506          1
99           1
E868         1
66           1
460          1
5            1
256          1
316          1
163          1
364          1
195          1
E965         1
529          1
917          1
235          1
Name: diag_2, Length: 748, dtype: int64

250       11555
401        8289
276        5175
428        4577
427        3955
414        3664
496        2605
403        2357
585        1992
272        1969
599        1941
V45        1389
250.02     1369
707        1360
780        1334
285        1200
425        1136
250.6      1080
424        1063
584         963
305         924
250.01      915
682         887
518         854
41          727
493         694
278         680
530         625
786         584
491         574
          ...  
755           1
E861          1
E949          1
E854          1
E826          1
308           1
942           1
315           1
E900          1
17            1
111           1
215           1
834           1
542           1
877           1
E864          1
744           1
930           1
841           1
226           1
47            1
871           1
684           1
265           1
484           1
370           1
992           1
391           1
66            1
971           1
Name: diag_3, Length: 78

None    96420
Norm     2597
>200     1485
>300     1264
Name: max_glu_serum, dtype: int64

None    84748
>8       8216
Norm     4990
>7       3812
Name: A1Cresult, dtype: int64

No        81778
Steady    18346
Up         1067
Down        575
Name: metformin, dtype: int64

No        100227
Steady      1384
Up           110
Down          45
Name: repaglinide, dtype: int64

No        101063
Steady       668
Up            24
Down          11
Name: nateglinide, dtype: int64

No        101680
Steady        79
Up             6
Down           1
Name: chlorpropamide, dtype: int64

No        96575
Steady     4670
Up          327
Down        194
Name: glimepiride, dtype: int64

No        101765
Steady         1
Name: acetohexamide, dtype: int64

No        89080
Steady    11356
Up          770
Down        560
Name: glipizide, dtype: int64

No        91116
Steady     9274
Up          812
Down        564
Name: glyburide, dtype: int64

No        101743
Steady        23
Name: tolbutamide, dtype: int64

No        94438
Steady     6976
Up          234
Down        118
Name: pioglitazone, dtype: int64

No        95401
Steady     6100
Up          178
Down         87
Name: rosiglitazone, dtype: int64

No        101458
Steady       295
Up            10
Down           3
Name: acarbose, dtype: int64

No        101728
Steady        31
Down           5
Up             2
Name: miglitol, dtype: int64

No        101763
Steady         3
Name: troglitazone, dtype: int64

No        101727
Steady        38
Up             1
Name: tolazamide, dtype: int64

No    101766
Name: examide, dtype: int64

No    101766
Name: citoglipton, dtype: int64

No        47383
Steady    30849
Down      12218
Up        11316
Name: insulin, dtype: int64

No        101060
Steady       692
Up             8
Down           6
Name: glyburide-metformin, dtype: int64

No        101753
Steady        13
Name: glipizide-metformin, dtype: int64

No        101765
Steady         1
Name: glimepiride-pioglitazone, dtype: int64

No        101764
Steady         2
Name: metformin-rosiglitazone, dtype: int64

No        101765
Steady         1
Name: metformin-pioglitazone, dtype: int64

No    54755
Ch    47011
Name: change, dtype: int64

Yes    78363
No     23403
Name: diabetesMed, dtype: int64

NO     54864
>30    35545
<30    11357
Name: readmitted, dtype: int64

#### 7. What is the average number of labs administered by age category [2.5 pts]

In [9]:
diabetes_df.groupby(['age'])['num_lab_procedures'].mean()


age
[0-10)      41.012422
[10-20)     43.096961
[20-30)     43.066385
[30-40)     43.033642
[40-50)     42.785958
[50-60)     42.611961
[60-70)     42.600632
[70-80)     43.157396
[80-90)     44.085015
[90-100)    44.695310
Name: num_lab_procedures, dtype: float64

#### 8. Does the number of diagnoses equal the number of non-NA entries in the diag_* columns? [2.5 pts]

In [10]:
# You can use pd.DataFrame.notnull().sum(1) to count non-null values along the rows axis
diabetes_df['is_equal'] = diabetes_df[[u'diag_1', u'diag_2', u'diag_3']].notnull().sum(1) == diabetes_df['number_diagnoses']                                         
diabetes_df[[u'diag_1', u'diag_2', u'diag_3','number_diagnoses','is_equal']].head(10)
 


Unnamed: 0,diag_1,diag_2,diag_3,number_diagnoses,is_equal
0,250.83,,,1,True
1,276.0,250.01,255,9,False
2,648.0,250.0,V27,6,False
3,8.0,250.43,403,7,False
4,197.0,157.0,250,5,False
5,414.0,411.0,250,9,False
6,414.0,411.0,V45,7,False
7,428.0,492.0,250,8,False
8,398.0,427.0,38,8,False
9,434.0,198.0,486,8,False
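
The row-wise comparison can be reduced to a single number by taking the mean of the boolean result. The sketch below uses made-up rows (`toy_df` is hypothetical) and spells out `axis=1` for readability. In the real data `number_diagnoses` goes up to 16 while only three `diag_` columns are kept, so the two can only agree when at most three diagnoses were recorded.

```python
import numpy as np
import pandas as pd

# Toy stand-in: the second row reports 9 diagnoses but only 3 are stored
toy_df = pd.DataFrame({
    "diag_1": ["250.83", "276", "648"],
    "diag_2": [np.nan, "250.01", "250"],
    "diag_3": [np.nan, "255", np.nan],
    "number_diagnoses": [1, 9, 2],
})

diag_cols = ["diag_1", "diag_2", "diag_3"]

# Count the non-null diagnosis codes in each row
n_recorded = toy_df[diag_cols].notnull().sum(axis=1)

# Compare with the reported count, then take the share of matching rows
is_equal = n_recorded == toy_df["number_diagnoses"]
share_equal = is_equal.mean()
print(share_equal)
```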



#### 9. Create a new column that has the value of 1 if the medical specialty in that row contains the word Surgery and 0 otherwise. [10 pts]

In [11]:
# Use a list comprehension
diabetes_df['is_surgery'] = [1 if 'Surgery' in str(x) else 0 for x in diabetes_df['medical_specialty'].values]
diabetes_df[['medical_specialty', 'is_surgery']].head()


Unnamed: 0,medical_specialty,is_surgery
0,Pediatrics-Endocrinology,0
1,,0
2,,0
3,,0
4,,0
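
A vectorized alternative to the list comprehension is `Series.str.contains()`. This is a sketch on made-up values (the `specialty` Series is hypothetical); `na=False` makes missing specialties count as "not surgery" rather than propagating NaN.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the medical_specialty column, including a missing value
specialty = pd.Series(
    ["Surgery-General", np.nan, "Cardiology", "Surgery-Neuro"],
    name="medical_specialty",
)

# True where the text contains 'Surgery'; missing values become False
is_surgery = specialty.str.contains("Surgery", na=False).astype(int)
print(is_surgery.tolist())
```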


#### 10. How many encounters where the patient was between the ages of 0 and 20 took place with Pediatric providers? [5 pts]

In [12]:
diabetes_df['is_pediatric'] = [1 if 'Pediatric' in str(x) else 0 for x in diabetes_df['medical_specialty']]
count = len(diabetes_df.loc[(diabetes_df['is_pediatric'] == 1) & (diabetes_df['age'].isin(['[0-10)', '[10-20)']))])

print(count,'encounters')

429 encounters


#### 12. Find the counts of each of the available A1Cresult categories, broken down by whether or not the patient was readmitted (regardless of whether it was less than or greater than 30 days) [2.5 pts]

In [13]:
print("Not readmitted:")
diabetes_df['A1Cresult'].loc[diabetes_df['readmitted'] == 'NO'].value_counts()


Not readmitted:


None    45322
>8       4504
Norm     2909
>7       2129
Name: A1Cresult, dtype: int64

In [14]:
print("Readmitted:")
diabetes_df['A1Cresult'].loc[diabetes_df['readmitted'] != 'NO'].value_counts()


Readmitted:


None    39426
>8       3712
Norm     2081
>7       1683
Name: A1Cresult, dtype: int64
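
Both breakdowns can also be produced in a single table with `pd.crosstab()`. The following is a minimal sketch on made-up rows (`toy_df` and the flag labels are hypothetical); it first collapses `<30` and `>30` into one "readmitted" label.

```python
import pandas as pd

# Toy stand-in with the same category labels as the real columns
toy_df = pd.DataFrame({
    "A1Cresult": ["None", ">8", "Norm", ">8", "None", ">7"],
    "readmitted": ["NO", "<30", ">30", "NO", ">30", "NO"],
})

# Collapse '<30' and '>30' into a single "readmitted" label
readmit_flag = toy_df["readmitted"].ne("NO").map(
    {True: "readmitted", False: "not readmitted"}
)

# Rows: A1Cresult category; columns: readmission flag; cells: counts
table = pd.crosstab(toy_df["A1Cresult"], readmit_flag)
print(table)
```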


## Single-level CCS categories [40 pts]

The columns `diag_1`, `diag_2`,  and `diag_3` contain ICD-9-CM codes for the encounters that took place in this dataset. However, if we count up the number of unique values between the 3 columns, we can see that the data is very sparse. 

As we discussed in class, the single-level CCS categories can be used instead to group similar ICD codes together. Download the latest version of the ICD-9-CM single-level CCS here: [https://www.hcup-us.ahrq.gov/toolssoftware/ccs/Single_Level_CCS_2015.zip](https://www.hcup-us.ahrq.gov/toolssoftware/ccs/Single_Level_CCS_2015.zip)

Unzip this and put the contents in the `data` folder along with the Diabetes dataset, then read the file called `$dxref 2015.csv` into a variable. **NOTE** You must skip the first row of this `csv` file when reading it in because it contains a note that is unrelated to the contents of the data. Look up how to do this using the `pd.read_csv` function.

Examine the contents of the data. Please bear in mind that Jupyter notebooks do not render whitespace or quotes very well sometimes, so watch out for that. Make sure you examine column names with `.columns` instead of just calling `.head()` and visually inspecting, for example.

As you may notice, the ICD codes are not very well formatted in either the Diabetes dataset (for example, the code `8` should really be `008.0`) or the Single-level CCS crosswalk (where values are wrapped in `'` characters and padded with whitespace).

This is quite typical of healthcare data, unfortunately. Many of the publicly available files are not suited for reading into modern programming languages. Often, they are only distributed as datasets for SAS, a proprietary statistical software suite that is ubiquitous in health care.

In order to use the CCS groupings, we'll have to clean both the groupings *and* the diabetes data. Here is the general procedure that we will follow (although the resulting mapping is still imperfect).

**Remove all quotes and extra whitespace from the codes and the column names in the CCS crosswalk [10 pts]**

In [15]:
# Load the dataset, skipping the first row
single_level_ccs = pd.read_csv('../data/$dxref 2015.csv',skiprows=1)

# Strip quotes and whitespace from column names
single_level_ccs.columns = [x.strip("' ") for x in single_level_ccs.columns]

# Strip quotes and whitespace from codes
single_level_ccs = single_level_ccs.applymap(lambda x: x.strip("' "))

single_level_ccs.head()


Unnamed: 0,ICD-9-CM CODE,CCS CATEGORY,CCS CATEGORY DESCRIPTION,ICD-9-CM CODE DESCRIPTION,OPTIONAL CCS CATEGORY,OPTIONAL CCS CATEGORY DESCRIPTION
0,,0,No DX,INVALID CODES IN USER DATA,,
1,1000.0,1,Tuberculosis,PRIM TB COMPLEX-UNSPEC,,
2,1001.0,1,Tuberculosis,PRIM TB COMPLEX-NO EXAM,,
3,1002.0,1,Tuberculosis,PRIM TB COMPLEX-EXM UNKN,,
4,1003.0,1,Tuberculosis,PRIM TB COMPLEX-MICRO DX,,
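
To see what the loading and cleaning steps do without needing the real file, here is a self-contained sketch on an in-memory miniature (the text in `raw_text` is made up to imitate the file's layout). It assumes every cell is text, which `dtype=str` enforces; column-wise `.str.strip()` is used here as one alternative to `applymap`.

```python
import io
import pandas as pd

# Made-up miniature: a free-text note on line 1, then quoted, padded values
raw_text = (
    "Note: this line is not part of the table\n"
    "'ICD-9-CM CODE','CCS CATEGORY'\n"
    "'01000','1    '\n"
    "'25013','50   '\n"
)

# skiprows=1 drops the note; dtype=str keeps leading zeros in the codes
crosswalk = pd.read_csv(io.StringIO(raw_text), skiprows=1, dtype=str)

# Strip single quotes and spaces from the column names
crosswalk.columns = [name.strip("' ") for name in crosswalk.columns]

# Strip single quotes and spaces from every value, one column at a time
crosswalk = crosswalk.apply(lambda col: col.str.strip("' "))

print(list(crosswalk.columns))
print(crosswalk)
```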


Next, we're going to write a function that cleans up the ICD codes found in the Diabetes dataset. If you'll notice, there are no decimal points in the single-level CCS crosswalk. Therefore, we must make sure that our data matches that as well. Implement the function below, and we will use it in an `.apply()` call to modify the `diag_` columns.

#### Implement this function [15 pts]

In [16]:
def clean_diabetes_code(icd_code):
    """
    Format a code from the Diabetes dataset to match the CCS crosswalk.

    Rules:
        1. If the code contains a decimal point, remove it.
        2. If the result has fewer than 3 characters, prepend '0's until it has 3.
        3. If the result has exactly 3 characters (before or after step 2), append a '0'.

    Examples:
        250.13 -> 25013
        32 -> 0320
        315 -> 3150

    Args:
        icd_code: An ICD-9-CM code from a `diag_` column (a string, or NaN if missing).

    Returns:
        The formatted code as a string, or NaN if the input was missing.
    """
    code_copy = str(icd_code) # Convert to string due to weird .apply behavior in Series
    if code_copy == 'nan':
        return np.nan
    
    ### Your Code here:
    if '.' in code_copy:
        code_copy = code_copy.replace('.', '')
        
    if len(code_copy) <= 3:
        while len(code_copy) < 3:
            code_copy = '0' + code_copy
        code_copy = code_copy + '0'
    
    return code_copy
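
A quick way to check the rules is to run them against the docstring examples. The sketch below defines a compact variant (`clean_code_sketch` is a hypothetical name, not part of the assignment) that uses `str.zfill` for the zero-padding step and is intended to behave the same way as the function above. The `V57` and `E909` inputs are extra illustrative cases: a three-character V code gets a trailing `0`, while a four-character E code is left unchanged.

```python
import numpy as np
import pandas as pd


def clean_code_sketch(icd_code):
    """Hypothetical compact variant of clean_diabetes_code."""
    if pd.isnull(icd_code):
        return np.nan
    # Rule 1: drop the decimal point
    code = str(icd_code).replace(".", "")
    # Rules 2 and 3: pad to 3 characters, then append a trailing '0'
    if len(code) <= 3:
        code = code.zfill(3) + "0"
    return code


# Expected results follow the rules stated in the docstring
examples = {
    "250.13": "25013",
    "32": "0320",
    "315": "3150",
    "V57": "V570",
    "E909": "E909",
}
cleaned = {raw: clean_code_sketch(raw) for raw in examples}
print(cleaned)
```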


Now, replace all three `diag_` columns by calling .apply with this function. For example:

`diabetes_df['diag_1'] = diabetes_df['diag_1'].apply(clean_diabetes_code)`

In [17]:
for diag in ['diag_1','diag_2','diag_3']:
    diabetes_df[diag] = diabetes_df[diag].apply(clean_diabetes_code)

diabetes_df[['diag_1', 'diag_2', 'diag_3']].head(10)

Unnamed: 0,diag_1,diag_2,diag_3
0,25083,,
1,2760,25001.0,2550
2,6480,2500.0,V270
3,80,25043.0,4030
4,1970,1570.0,2500
5,4140,4110.0,2500
6,4140,4110.0,V450
7,4280,4920.0,2500
8,3980,4270.0,0380
9,4340,1980.0,4860


#### Join in the single-level CCS crosswalk and answer the following question: [15 pts]

List the top 10 condition categories (`CCS CATEGORY DESCRIPTION`) when you add up all instances over all 3 columns

In [18]:
for col in ['diag_1','diag_2','diag_3']:
    diabetes_df = diabetes_df.merge(single_level_ccs, how='left', left_on=col, right_on='ICD-9-CM CODE')

diabetes_df.head()

Unnamed: 0,encounter_id,patient_nbr,race,gender,age,weight,admission_type_id,discharge_disposition_id,admission_source_id,time_in_hospital,...,CCS CATEGORY DESCRIPTION_y,ICD-9-CM CODE DESCRIPTION_y,OPTIONAL CCS CATEGORY_y,OPTIONAL CCS CATEGORY DESCRIPTION_y,ICD-9-CM CODE,CCS CATEGORY,CCS CATEGORY DESCRIPTION,ICD-9-CM CODE DESCRIPTION,OPTIONAL CCS CATEGORY,OPTIONAL CCS CATEGORY DESCRIPTION
0,2278392,8222157,Caucasian,Female,[0-10),,6,25,1,1,...,,,,,,,,,,
1,149190,55629189,Caucasian,Female,[10-20),,1,1,7,3,...,DiabMel no c,DIABETES UNCOMPL TYPE I,,,2550,51.0,Ot endo dsor,CUSHING-s SYNDROME,,
2,64410,86047875,AfricanAmerican,Female,[20-30),,1,1,7,2,...,,,,,V270,196.0,Other pregnancy and delivery including normal,DELIVER-SINGLE LIVEBORN,,
3,500364,82442376,Caucasian,Male,[30-40),,1,1,7,2,...,DiabMel w/cm,DIAB RENAL MANIF TYPE I DM UNCONT (Begin 1993),,,4030,99.0,Htn complicn,MAL HYPERTENS RENAL DIS (Begin 1980 End 1989),,
4,16680,42519267,Caucasian,Male,[40-50),,1,1,7,1,...,Pancreas can,MAL NEO PANCREAS HEAD,,,,,,,,
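
Merging the full crosswalk three times adds eighteen columns, most of which are not needed for the question. A lighter alternative, sketched here on made-up data (`toy_df`, `crosswalk`, and the `_ccs` column names are hypothetical), is to build a code-to-category lookup and `map()` each `diag_` column through it. This assumes each code appears only once in the crosswalk. It also makes it easy to flag codes that failed to match.

```python
import numpy as np
import pandas as pd

# Toy stand-ins: '9999' is a code that is absent from the crosswalk
toy_df = pd.DataFrame({
    "diag_1": ["25013", "4280", "9999"],
    "diag_2": ["4280", np.nan, "25013"],
})
crosswalk = pd.DataFrame({
    "ICD-9-CM CODE": ["25013", "4280"],
    "CCS CATEGORY DESCRIPTION": ["DiabMel w/cm", "chf;nonhp"],
})

# Lookup Series: index is the code, value is the category description
code_to_category = crosswalk.set_index("ICD-9-CM CODE")["CCS CATEGORY DESCRIPTION"]

# Map each diagnosis column; unmatched or missing codes become NaN
for col in ["diag_1", "diag_2"]:
    toy_df[col + "_ccs"] = toy_df[col].map(code_to_category)

# Codes that were present but found no category in the crosswalk
unmatched = toy_df["diag_1"].notnull() & toy_df["diag_1_ccs"].isnull()
print(toy_df)
```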


In [19]:
# Note: merge() adds the suffixes _x and _y when both frames share a column name.
# Here '_x' refers to diag_1, '_y' refers to diag_2, and no suffix refers to diag_3.

(diabetes_df['CCS CATEGORY DESCRIPTION_x'].value_counts() + 
 diabetes_df['CCS CATEGORY DESCRIPTION'].value_counts() +
 diabetes_df['CCS CATEGORY DESCRIPTION_y'].value_counts()
).sort_values(ascending = False).head(10)

Htn complicn    18653.0
chf;nonhp       18101.0
Coron athero    17602.0
Fluid/elc dx    13816.0
Dysrhythmia     12762.0
DiabMel w/cm    10266.0
UTI              7039.0
Anemia           5068.0
COPD             4867.0
Coma/brn dmg     4844.0
dtype: float64
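
One caveat about adding `value_counts()` results with `+`: pandas aligns on the index, so a category that is missing from any one of the columns becomes NaN in the sum instead of keeping its partial count. That is also why the result above is `float64`. The sketch below shows the effect on made-up data (`toy_df` and its column names are hypothetical) and one way around it, which is to concatenate the columns before counting. Whether this changes the top 10 on the real data has not been checked here.

```python
import numpy as np
import pandas as pd

# Toy stand-in: 'UTI' only appears in the first column, 'Anemia' only in the second
toy_df = pd.DataFrame({
    "ccs_1": ["chf;nonhp", "UTI", "chf;nonhp"],
    "ccs_2": ["Anemia", "chf;nonhp", np.nan],
})

# Adding with '+' yields NaN for categories absent from either column
plus_total = toy_df["ccs_1"].value_counts() + toy_df["ccs_2"].value_counts()

# Concatenating first counts every occurrence, wherever it appears
combined_total = pd.concat([toy_df["ccs_1"], toy_df["ccs_2"]]).value_counts()

print(plus_total)
print(combined_total)
```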