### 2015 Measure List

[source: data.cms.gov](https://data.cms.gov/dataset/2015-Measure-List/bj2d-5cr5)

In [5]:
import pandas
measures_data = pandas.read_json('../qpp-measures-data/measures/measures-data.json')
quality_measures = measures_data[measures_data['category'] == 'quality']
quality_measure_id_set = set(quality_measures['measureId'])
print('Number of known 2017 quality measures: {0}'.format(len(quality_measure_id_set)))

Number of known 2017 quality measures: 271


In [57]:
pqrs_measures = pandas.read_csv('quality_data/2015_Measure_List.csv')
# Drop rows with no measure number (only 1 row)
pqrs_measures = pqrs_measures.dropna(subset=['PQRS Measure Number'])
# The column reads in as float, so cast to int and then to str
pqrs_measures['PQRS Measure Number'] = pqrs_measures['PQRS Measure Number'].astype(int).astype(str)
# Create a set
pqrs_measure_id_set = set(pqrs_measures['PQRS Measure Number'])
print('Number of 2015 PQRS measures: {0}'.format(len(pqrs_measure_id_set)))

Number of 2015 PQRS measures: 254
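
The cast matters because pandas reads an integer column containing a blank cell as float, so `19` becomes `19.0` and would never equal the string ID `'19'`. A minimal sketch with made-up rows (the CSV text and the `toy` names are hypothetical, not the CMS file):

```python
import io
import pandas

# Hypothetical three-row CSV; the blank cell forces the column to float64.
csv_text = "PQRS Measure Number,Measure Title\n19,A\n,B\n137,C\n"
toy = pandas.read_csv(io.StringIO(csv_text))
toy_dtype = str(toy['PQRS Measure Number'].dtype)
print(toy_dtype)

# Drop the blank row, then cast float -> int -> str to get clean string IDs.
toy = toy.dropna(subset=['PQRS Measure Number'])
toy_id_list = toy['PQRS Measure Number'].astype(int).astype(str).tolist()
print(toy_id_list)
```

Casting straight from float to str would instead give values such as `'19.0'`, which is why the intermediate `int` step is there.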


Measures that have a CMS Measure Number may still exist in the 2017 measure set, but under a different version, so an exact ID comparison can miss them.
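
One way to test this hunch would be to compare CMS IDs with the version suffix stripped. The sketch below assumes IDs follow a `CMS<number>v<version>` pattern. The helper `split_cms_id` and the sample IDs are hypothetical and not taken from either data set.

```python
import re

# Assumed ID shape: 'CMS' + measure number + 'v' + version number.
CMS_ID_PATTERN = re.compile(r'^(CMS\d+)v(\d+)$')

def split_cms_id(cms_id):
    """Return (base_id, version) for an ID like 'CMS132v5', or None if it does not fit."""
    match = CMS_ID_PATTERN.match(cms_id)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

# Hypothetical IDs: the same measure under two versions, plus a non-CMS ID.
old_ids = ['CMS132v3', 'CMS50v4']
new_ids = ['CMS132v5', '271']

# Keep only the base part of IDs that fit the assumed pattern.
old_bases = {split_cms_id(i)[0] for i in old_ids if split_cms_id(i)}
new_bases = {split_cms_id(i)[0] for i in new_ids if split_cms_id(i)}
shared_bases = old_bases & new_bases
print(shared_bases)
```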

#### First, intersect 2015 'PQRS Measure Number' with 2017 'measureId'

In [58]:
print(list(quality_measure_id_set)[0:5])
print(list(pqrs_measure_id_set)[0:5])
matching_measure_ids = pqrs_measure_id_set.intersection(quality_measure_id_set)
matched_on_pqrs_measure_number = len(matching_measure_ids)
matched_on_pqrs_measure_number

['CMS132v50564192', '0711411', '271', '259', '0659185']
['271', '259', '357', '19', '137']


85

In [59]:
# the remaining 2015 IDs look like they also match on qualityId,
# so filter out the quality measures already matched on measureId
not_yet_matched_by_measure_id_measures = quality_measures[~quality_measures['measureId'].isin(matching_measure_ids)]
# sanity check
not_yet_matched_plus_matched = not_yet_matched_by_measure_id_measures.shape[0] + matched_on_pqrs_measure_number
print('(Sanity check) Not yet matched plus matched by PQRS Measure Number: {0}'.format(not_yet_matched_plus_matched))

(Sanity check) Not yet matched plus matched by PQRS Measure Number: 271


#### Match 2015 'PQRS Measure Number' with 2017 'qualityId'

In [60]:
not_yet_matched_quality_id_set = set(not_yet_matched_by_measure_id_measures['qualityId'].astype(int).astype(str))

# ignore pqrs measure ids already matched
# difference -> new set with elements in pqrs_measure_id_set but not in matching_measure_ids
remaining_pqrs_measure_id_set = pqrs_measure_id_set.difference(matching_measure_ids)
# sanity check: remaining plus matched must equal the full PQRS set (the one NA row was dropped earlier)
assert len(remaining_pqrs_measure_id_set) + matched_on_pqrs_measure_number == len(pqrs_measure_id_set)

matching_measure_ids_by_quality_id = remaining_pqrs_measure_id_set.intersection(not_yet_matched_quality_id_set)
matched_on_quality_id = len(matching_measure_ids_by_quality_id)
print('Number of measures matched on qualityId: {0}'.format(matched_on_quality_id))
print('Total matched so far: {0}'.format(matched_on_quality_id + matched_on_pqrs_measure_number))

Number of measures matched on qualityId: 130
Total matched so far: 215
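
The two-step match can be sketched on toy data to show that every 2015 ID ends up in exactly one bucket: matched on `measureId`, matched on `qualityId`, or unmatched. All IDs below are made up, and the dict `measures_2017` is a hypothetical stand-in for the `measureId` and `qualityId` columns.

```python
# Hypothetical 2017 measures: measureId -> qualityId.
measures_2017 = {'19': '19', 'CMS50v5': '1', 'CMS2v6': '137', 'ACRad1': '500'}
# Hypothetical 2015 PQRS measure numbers.
pqrs_ids = {'1', '19', '137', '999'}

# Step 1: direct match on measureId.
matched_on_measure_id = pqrs_ids & set(measures_2017)

# Step 2: match what is left on qualityId, skipping measures already matched.
remaining_pqrs = pqrs_ids - matched_on_measure_id
remaining_quality_ids = {
    quality_id
    for measure_id, quality_id in measures_2017.items()
    if measure_id not in matched_on_measure_id
}
matched_on_quality = remaining_pqrs & remaining_quality_ids

# Whatever is left has no counterpart under either key.
unmatched = remaining_pqrs - matched_on_quality
print(sorted(matched_on_measure_id), sorted(matched_on_quality), sorted(unmatched))
```

Because each step removes its matches before the next one runs, the three bucket sizes always sum to the size of the starting set.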


#### Sanity check

In [76]:
quality_measure_key = 'measureId'
pqrs_measure_key = 'PQRS Measure Number'

joined_on_measureId = pandas.merge(
    quality_measures,
    pqrs_measures, how='inner',
    left_on=quality_measure_key,
    right_on=pqrs_measure_key
)
joined_on_measureId.shape

(85, 52)

In [78]:
quality_measure_key = 'qualityId'
# Work on an explicit copy so the cast does not write into a slice of measures_data
# (assigning to the filtered frame directly raises SettingWithCopyWarning)
quality_measures = quality_measures.copy()
quality_measures[quality_measure_key] = quality_measures[quality_measure_key].astype(int).astype(str)

joined_on_qualityId = pandas.merge(
    quality_measures,
    pqrs_measures, how='inner',
    left_on=quality_measure_key,
    right_on=pqrs_measure_key
)
joined_on_qualityId.shape


(215, 52)
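
An inner join only reports the rows that matched. To see which measures on either side did not match, one option is an outer join with `indicator=True`, which adds a `_merge` column labelling each row. The two toy frames below are hypothetical and only borrow the column names used above.

```python
import pandas

# Hypothetical stand-ins for quality_measures and pqrs_measures.
quality_toy = pandas.DataFrame({'qualityId': ['1', '19', '500'], 'title': ['a', 'b', 'c']})
pqrs_toy = pandas.DataFrame({'PQRS Measure Number': ['1', '19', '999']})

# Outer join keeps unmatched rows from both sides; indicator=True labels each row
# in '_merge' as 'both', 'left_only' or 'right_only'.
joined_toy = pandas.merge(
    quality_toy,
    pqrs_toy, how='outer',
    left_on='qualityId',
    right_on='PQRS Measure Number',
    indicator=True
)
merge_counts = joined_toy['_merge'].value_counts().to_dict()
print(merge_counts)
```

Filtering on `_merge == 'right_only'` would then list the 2015 measures with no 2017 counterpart under that key.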

#### So qualityId is the key to use: one join on it yields the same 215 matches