## Code Sets Exercise 3: Compare Grouping with CCS - Solution

### Dataset Source 
The portal [https://www.hcup-us.ahrq.gov/toolssoftware/ccs10/ccs10.jsp](https://www.hcup-us.ahrq.gov/toolssoftware/ccs10/ccs10.jsp) presents an overview, description, technical guidance, and downloading information for the Clinical Classifications Software (CCS) for ICD-10-PCS (beta version). The CCS is developed as part of the Healthcare Cost and Utilization Project (HCUP). We recommend reading each section of the portal to understand the data better.


### Data Description
We have already downloaded the `ccs_pr_icd10pcs_2020_1.zip` file, available under the section "Downloading Information for the CCS for ICD-10-PCS Tool" in the above portal. We tweaked the `.csv` file contained in the `.zip` file, and the updated `.csv` file is provided in the current workspace as `clean_ccs_pr_icd10pcs.csv`.


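The `clean_ccs_pr_icd10pcs.csv` file contains the following eight comma-separated fields. In the cleaned file the column names use underscores (for example, `ICD10_PCS_CODE` and `CCS_CATEGORY`), as the `head()` output in the Solution shows:
1. Procedure Code as 'ICD-10-PCS CODE'
1. Single-level CCS Category as 'CCS CATEGORY'
1. Procedure Code Description as 'ICD-10-PCS CODE DESCRIPTION'
1. Single-level CCS Category Description as 'CCS CATEGORY DESCRIPTION'
1. Multi-level 1 Category as 'MULTI CCS LVL 1'
1. Multi-level 1 Category Description as 'MULTI CCS LVL 1 LABEL'
1. Multi-level 2 Category as 'MULTI CCS LVL 2'
1. Multi-level 2 Category Description as 'MULTI CCS LVL 2 LABEL'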
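
Before any analysis, it can help to confirm that the loaded file really exposes these eight fields. The sketch below assumes the underscored column names seen in the cleaned file's `head()` output. The one-row inline sample stands in for the real file, and `EXPECTED_COLUMNS` and `missing_columns` are illustrative names that are not part of the exercise.

```python
import io
import pandas as pd

# Column names as they appear in the cleaned file (underscored variants).
EXPECTED_COLUMNS = [
    "ICD10_PCS_CODE", "CCS_CATEGORY", "ICD10_PCS_CODE_DESCRIPTION",
    "CCS_CATEGORY_DESCRIPTION", "MULTI_CCS_LVL_1", "MULTI_CCS_LVL_1_LABEL",
    "MULTI_CCS_LVL_2", "MULTI_CCS_LVL_2_LABEL",
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df."""
    return [name for name in expected if name not in df.columns]

# Tiny inline sample standing in for clean_ccs_pr_icd10pcs.csv (assumption).
sample_csv = io.StringIO(
    "ICD10_PCS_CODE,CCS_CATEGORY,ICD10_PCS_CODE_DESCRIPTION,"
    "CCS_CATEGORY_DESCRIPTION,MULTI_CCS_LVL_1,MULTI_CCS_LVL_1_LABEL,"
    "MULTI_CCS_LVL_2,MULTI_CCS_LVL_2_LABEL\n"
    '00800ZZ,1,"Division of Brain, Open Approach",Incision and excision of CNS,'
    "1,Operations on the nervous system,1.1,Incision and excision of CNS [1.]\n"
)

# Reading the code column as text is a precaution: it keeps numeric-looking
# codes (e.g. 0270346) from losing their leading zero.
sample_df = pd.read_csv(sample_csv, dtype={"ICD10_PCS_CODE": str})

print(missing_columns(sample_df))
```

An empty list means every expected field is present. Any name returned points to a column that was renamed or dropped.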


### Exercise
1. If you search `CCS_CATEGORY_DESCRIPTION` for "coronary", which two single-level categories do you find? What is/are the label(s) for the multi-level 1 categories?


2. Given CCS single-level category 45, what do you notice about the ICD-10-PCS codes? Do they all have a similar character pattern?


### Solution

In [1]:
import pandas as pd
import numpy as np

In [2]:
ccs_pcs_file_path = "./clean_ccs_pr_icd10pcs.csv"

In [3]:
ccs_pcs_df = pd.read_csv(ccs_pcs_file_path)

In [4]:
# Inspect the dataset schema and look at some example rows
ccs_pcs_df.head()

| Unnamed: 0 | ICD10_PCS_CODE | CCS_CATEGORY | ICD10_PCS_CODE_DESCRIPTION | CCS_CATEGORY_DESCRIPTION | MULTI_CCS_LVL_1 | MULTI_CCS_LVL_1_LABEL | MULTI_CCS_LVL_2 | MULTI_CCS_LVL_2_LABEL |
|---|---|---|---|---|---|---|---|---|
| 0 | 00800ZZ | 1 | Division of Brain, Open Approach | Incision and excision of CNS | 1 | Operations on the nervous system | 1.1 | Incision and excision of CNS [1.] |
| 1 | 00803ZZ | 1 | Division of Brain, Percutaneous Approach | Incision and excision of CNS | 1 | Operations on the nervous system | 1.1 | Incision and excision of CNS [1.] |
| 2 | 00804ZZ | 1 | Division of Brain, Percutaneous Endoscopic App... | Incision and excision of CNS | 1 | Operations on the nervous system | 1.1 | Incision and excision of CNS [1.] |
| 3 | 00870ZZ | 1 | Division of Cerebral Hemisphere, Open Approach | Incision and excision of CNS | 1 | Operations on the nervous system | 1.1 | Incision and excision of CNS [1.] |
| 4 | 00873ZZ | 1 | Division of Cerebral Hemisphere, Percutaneous ... | Incision and excision of CNS | 1 | Operations on the nervous system | 1.1 | Incision and excision of CNS [1.] |
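
The `Unnamed: 0` header in the output above is what pandas typically produces when a CSV was written with its row index and then read back without `index_col`. The sketch below shows that round trip on a small illustrative frame, assuming this is how the column arose here.

```python
import io
import pandas as pd

# Small stand-in frame (values taken from the rows shown above).
original = pd.DataFrame(
    {"ICD10_PCS_CODE": ["00800ZZ", "00803ZZ"], "CCS_CATEGORY": [1, 1]}
)

# Writing with the default index=True stores the row index as an unnamed column.
csv_text = original.to_csv()

# Default read: the stored index comes back as a column named "Unnamed: 0".
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use that first column as the index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```

If the same holds for `clean_ccs_pr_icd10pcs.csv`, passing `index_col=0` to `pd.read_csv` would keep the extra column out of the analysis. The rest of this solution works either way.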


**1.a. If you search `CCS_CATEGORY_DESCRIPTION` for "coronary", what are the two categories that you find?**

In [5]:
coronary_ccs_df = ccs_pcs_df[ccs_pcs_df['CCS_CATEGORY_DESCRIPTION'].str.contains('coronary')]

In [6]:
coronary_ccs_df.CCS_CATEGORY.unique()

array([45, 47])

In [7]:
coronary_ccs_df.CCS_CATEGORY_DESCRIPTION.unique()

array(['Percutaneous transluminal coronary angioplasty (PTCA) with or without stent placement',
       'Diagnostic cardiac catheterization; coronary arteriography'],
      dtype=object)
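
Note that `str.contains` is case-sensitive by default, so the search above only matches descriptions containing lowercase "coronary". The sketch below uses a small made-up frame to show how `case=False` widens the match and how `na=False` keeps missing descriptions from breaking the boolean mask. The last two rows are hypothetical and only illustrate the behavior; they are not claims about the file's contents.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "CCS_CATEGORY": [45, 47, 900, 901],
    "CCS_CATEGORY_DESCRIPTION": [
        "Percutaneous transluminal coronary angioplasty (PTCA) with or without stent placement",
        "Diagnostic cardiac catheterization; coronary arteriography",
        "Coronary example procedure",  # hypothetical row, capitalized term
        None,                          # hypothetical row, missing description
    ],
})

descriptions = toy_df["CCS_CATEGORY_DESCRIPTION"]

# Case-sensitive (pandas default); na=False treats missing values as non-matches.
sensitive = toy_df[descriptions.str.contains("coronary", na=False)]

# Case-insensitive variant also picks up "Coronary ...".
insensitive = toy_df[descriptions.str.contains("coronary", case=False, na=False)]

print(sensitive["CCS_CATEGORY"].tolist())
print(insensitive["CCS_CATEGORY"].tolist())
```

The default, case-sensitive form is the one that yields the two categories the exercise asks about. The case-insensitive form is mainly useful when exploring the data more broadly.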

**1.b. What is/are the label(s) for the multi-level 1 categories?**

In [8]:
coronary_ccs_df.MULTI_CCS_LVL_1_LABEL.unique()

array(['Operations on the cardiovascular system'], dtype=object)
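
To see the single-level categories and their multi-level 1 label side by side, one option is to deduplicate on just those two columns. This is a sketch on a toy frame whose category numbers and label are copied from the outputs above. `toy_coronary_df` is an illustrative stand-in for `coronary_ccs_df`, and its last procedure code is a placeholder.

```python
import pandas as pd

toy_coronary_df = pd.DataFrame({
    "ICD10_PCS_CODE": ["0270346", "027034Z", "EXAMPLE"],  # last code is a placeholder
    "CCS_CATEGORY": [45, 45, 47],
    "MULTI_CCS_LVL_1_LABEL": ["Operations on the cardiovascular system"] * 3,
})

# Keep one row per (single-level category, multi-level 1 label) pair.
category_to_label = (
    toy_coronary_df[["CCS_CATEGORY", "MULTI_CCS_LVL_1_LABEL"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

print(category_to_label)
```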

**2. Given CCS single-level category 45, what do you notice about the ICD-10-PCS codes? Do they all have a similar character pattern?**

In [9]:
cat_45_df = ccs_pcs_df[ccs_pcs_df['CCS_CATEGORY']==45]

In [10]:
cat_45_df.head()

| Unnamed: 0 | ICD10_PCS_CODE | CCS_CATEGORY | ICD10_PCS_CODE_DESCRIPTION | CCS_CATEGORY_DESCRIPTION | MULTI_CCS_LVL_1 | MULTI_CCS_LVL_1_LABEL | MULTI_CCS_LVL_2 | MULTI_CCS_LVL_2_LABEL |
|---|---|---|---|---|---|---|---|---|
| 12507 | 0270346 | 45 | Dilate 1 Cor Art, Bifurc, w Drug-elut Intra, Perc | Percutaneous transluminal coronary angioplasty... | 7 | Operations on the cardiovascular system | 7.3 | Percutaneous transluminal coronary angioplasty... |
| 12508 | 027034Z | 45 | Dilation of 1 Cor Art with Drug-elut Intra, Pe... | Percutaneous transluminal coronary angioplasty... | 7 | Operations on the cardiovascular system | 7.3 | Percutaneous transluminal coronary angioplasty... |
| 12509 | 0270356 | 45 | Dilate of 1 Cor Art, Bifurc, with 2 Drug-elut,... | Percutaneous transluminal coronary angioplasty... | 7 | Operations on the cardiovascular system | 7.3 | Percutaneous transluminal coronary angioplasty... |
| 12510 | 027035Z | 45 | Dilation of 1 Cor Art with 2 Drug-elut, Perc A... | Percutaneous transluminal coronary angioplasty... | 7 | Operations on the cardiovascular system | 7.3 | Percutaneous transluminal coronary angioplasty... |
| 12511 | 0270366 | 45 | Dilate of 1 Cor Art, Bifurc, with 3 Drug-elut,... | Percutaneous transluminal coronary angioplasty... | 7 | Operations on the cardiovascular system | 7.3 | Percutaneous transluminal coronary angioplasty... |
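
ICD-10-PCS codes are seven characters long, and each position carries its own meaning, so splitting a code into one column per position can make patterns easier to spot. The sketch below uses the five codes shown above. The `pos_1` to `pos_7` column names are illustrative.

```python
import pandas as pd

codes = pd.Series(
    ["0270346", "027034Z", "0270356", "027035Z", "0270366"],
    name="ICD10_PCS_CODE",
)

# One column per character position (pos_1 ... pos_7).
positions = pd.DataFrame({f"pos_{i + 1}": codes.str[i] for i in range(7)})

# Number of distinct characters observed at each position.
distinct_per_position = positions.nunique()

print(positions)
print(distinct_per_position)
```

Positions with a single distinct character are shared by every code in the sample, while positions with more than one are where the codes diverge.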


In [11]:
ccs_first_character_set = set(cat_45_df['ICD10_PCS_CODE'].str[0:1])
ccs_first_character_set

{'0'}

In [12]:
ccs_two_character_set = set(cat_45_df['ICD10_PCS_CODE'].str[0:2])
ccs_two_character_set

{'02'}

In [13]:
ccs_three_code_set = set(cat_45_df['ICD10_PCS_CODE'].str[0:3])
ccs_three_code_set

{'027', '02C'}

In [14]:
ccs_four_code_set = set(cat_45_df['ICD10_PCS_CODE'].str[0:4])
ccs_four_code_set

{'0270', '0271', '0272', '0273', '02C0', '02C1', '02C2', '02C3'}
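
Taken together, the four cells show that every category 45 code shares its first two characters (`02`), while the third character takes two values (`7` and `C`) and the fourth takes four (`0` to `3`). In ICD-10-PCS, these positions are the section, body system, root operation, and body part. All the codes therefore sit in one section (`0`, Medical and Surgical) and one body system (`2`, Heart and Great Vessels). They first differ at the root operation (`7` Dilation or `C` Extirpation) and then at the body part, which here encodes how many coronary arteries are involved.

The four slicing cells can also be folded into one helper. This is a minimal sketch, with `prefix_sets` as a hypothetical helper name and a toy series that mixes codes from the preview above with placeholder codes.

```python
import pandas as pd

def prefix_sets(codes, max_len=4):
    """Map each prefix length 1..max_len to the sorted distinct prefixes."""
    return {n: sorted(set(codes.str[:n])) for n in range(1, max_len + 1)}

toy_codes = pd.Series([
    "0270346", "027034Z", "0270356",  # codes shown in the category 45 preview
    "0272ZZZ", "02C1ZZZ",             # prefixes from the output above, placeholder suffixes
])

for length, prefixes in prefix_sets(toy_codes).items():
    print(length, prefixes)
```

With the real data, `prefix_sets(cat_45_df['ICD10_PCS_CODE'])` should reproduce the four sets above in a single call.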