# Introduction to Medical Coding Standards

In this module, you will take a closer look at EHR data.

In the last module, we learned that Electronic Health Record (EHR) data is vast and is collected from many clinicians. Clinicians in different clinical divisions may use different terms for the same thing, which makes it difficult to unify the concepts or values of identical devices, observations, or measurements across a health system. To address this challenge, we need universal standards that represent these concepts and values consistently.

Below are some widely used coding standards. 

## 1. LOINC 

Logical Observation Identifiers Names and Codes (LOINC) is a widely adopted coding system that enables healthcare providers to share test results and other clinical information. LOINC codes identify medical tests and observations, results of examinations and procedures, and medical histories and other medical information. Because each code is a unique, standardized identifier, laboratory information systems can state exactly which test was performed and which result is being reported, so the same test performed in different laboratories can be compared. This gives healthcare providers a common language for tests and results and helps link the components of a patient's medical record, allowing for more efficient data management.

Each laboratory test is assigned a unique code of the form nnnnn-n: a numeric part, a hyphen, and a single check digit. Current codes are 3 to 7 characters long.
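The digit after the hyphen in a LOINC code is a check digit. The sketch below validates the shape of a code and recomputes that digit. It assumes the check digit follows the standard Mod 10 (Luhn) scheme, which agrees with the three example codes used in this module but should be confirmed against the LOINC documentation before you rely on it. The names `luhn_check_digit` and `is_valid_loinc` are illustrative, not part of any LOINC tooling.

```python
import re

# Lenient shape check: a numeric part, a hyphen, and one check digit.
LOINC_PATTERN = re.compile(r"^(\d{1,7})-(\d)$")


def luhn_check_digit(number: str) -> int:
    """Compute a Mod 10 (Luhn) check digit for a string of digits."""
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        # Double the rightmost digit and every second digit to its left.
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def is_valid_loinc(code: str) -> bool:
    """Return True if the code has the nnnnn-n shape and a matching check digit."""
    match = LOINC_PATTERN.match(code.strip())
    if match is None:
        return False
    number, check = match.groups()
    return luhn_check_digit(number) == int(check)


for code in ["2160-0", "26464-8", "6690-2", "2160-1"]:
    print(code, is_valid_loinc(code))
```

A check like this can catch typing or truncation errors in a code column, but it cannot tell you whether a well-formed code actually exists in the LOINC database.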

LOINC also provides a clinician-friendly text label (name) for each concept. Here are some examples of LOINC codes along with their LOINC names.

* 2160-0 Creatinine [Mass/volume] in Serum or Plasma
* 26464-8 Leukocytes [#/volume] in Blood
* 6690-2 Leukocytes [#/volume] in Blood by Automated count

Let's look at some examples with structured EHR datasets for laboratory tests. 

In [1]:
import pandas as pd
labs = pd.read_csv('./SampleDatasets/labs_sample.csv')
labs.head(5) #display top five records

Unnamed: 0,patient_deiden_id,measurement_datetime,result_datetime,lab_name,loinc_code,loinc_description,measurement_value,measurement_unit
0,7491,6/21/2016 16:11,6/23/2016 11:37,CREATININE,2160-0,Creatinine [Mass/volume] in Serum or Plasma 21...,1.08,mg/dL
1,61137,6/27/2017 10:28,6/28/2017 9:21,WHITE BLOOD CELL COUNT,6690-2,Leukocytes [#/volume] in Blood by Automated co...,5.0,x10E3/uL
2,23573,12/18/2015 8:36,12/18/2015 9:04,WBC,6690-2,Leukocytes [#/volume] in Blood by Automated co...,8.5,thou/mm3
3,699,8/28/2013 10:57,8/28/2013 12:09,WHITE BLOOD CELL COUNT,26464-8,Leukocytes [#/volume] in Blood 26464-8,7.0,thou/cu mm
4,699,9/17/2014 9:47,9/17/2014 12:49,CREATININE,2160-0,Creatinine [Mass/volume] in Serum or Plasma 21...,0.83,mg/dL


The code reads records from a CSV file into a DataFrame and shows the first five records.

Here is the field description of the example table.

| Field |  Description |
| --- | --- |
|  patient_deiden_id |  Patient identifier |
|  measurement_datetime |  Measurement datetime of the laboratory test |
|  result_datetime |  Result datetime of the laboratory test |
|  lab_name |  Name of the laboratory test |
|  loinc_code |  LOINC code of the laboratory test |
|  loinc_description |  LOINC name of the laboratory test |
|  measurement_value | Result value of the laboratory test  |
|  measurement_unit |  Result unit of the laboratory test |

Records 0 and 4 are serum creatinine results; records 1, 2, and 3 are white blood cell count results. Tests with the same 'lab_name' may differ in how the observation is made: records 1 and 3 share a name but have different LOINC codes. Conversely, tests with the same 'loinc_code' can have different lab names, as in records 1 and 2. These observations show why we should rely on LOINC codes, rather than local lab names, to identify tests and move data between systems.
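The mismatch between local lab names and LOINC codes can be checked directly by grouping in both directions. This is a small sketch using only the five rows displayed above, retyped into an in-memory DataFrame named `labs_demo` (an illustrative name) so that it runs without the sample file.

```python
import pandas as pd

# Rows copied from the labs.head(5) output shown above.
labs_demo = pd.DataFrame({
    "lab_name": [
        "CREATININE",
        "WHITE BLOOD CELL COUNT",
        "WBC",
        "WHITE BLOOD CELL COUNT",
        "CREATININE",
    ],
    "loinc_code": ["2160-0", "6690-2", "6690-2", "26464-8", "2160-0"],
})

# Which local names are used for each LOINC code?
names_per_code = labs_demo.groupby("loinc_code")["lab_name"].unique()

# Which LOINC codes hide behind each local name?
codes_per_name = labs_demo.groupby("lab_name")["loinc_code"].unique()

print(names_per_code)
print(codes_per_name)
```

On the full dataset, the same two `groupby` calls give a quick inventory of how consistently local names map to codes.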

### Finding specific laboratory tests 
Suppose we want to find all records of serum creatinine. From the LOINC database, we know that the LOINC code for serum creatinine is '2160-0'. We can use the pandas method 'pandas.Series.isin(values)' (whether each element in the Series is contained in values) to find the records; 'values' can be a list containing all LOINC codes for the laboratory test of interest.

For example:

* serum creatinine: ['2160-0']
* white blood cell count: ['26464-8', '6690-2']

In [2]:
creatinine = labs[labs['loinc_code'].isin(['2160-0'])]
creatinine.head(5)

Unnamed: 0,patient_deiden_id,measurement_datetime,result_datetime,lab_name,loinc_code,loinc_description,measurement_value,measurement_unit
0,7491,6/21/2016 16:11,6/23/2016 11:37,CREATININE,2160-0,Creatinine [Mass/volume] in Serum or Plasma 21...,1.08,mg/dL
4,699,9/17/2014 9:47,9/17/2014 12:49,CREATININE,2160-0,Creatinine [Mass/volume] in Serum or Plasma 21...,0.83,mg/dL
6,7491,5/23/2013 19:21,5/23/2013 20:10,CREATININE,2160-0,Creatinine [Mass/volume] in Serum or Plasma 21...,1.01,mg/dL
7,699,5/19/2019 6:57,5/19/2019 7:46,CREATININE,2160-0,Creatinine [Mass/volume] in Serum or Plasma 21...,0.85,mg/dL
13,7491,9/30/2015 9:19,9/30/2015 11:27,CREATININE,2160-0,Creatinine [Mass/volume] in Serum or Plasma 21...,1.33,mg/dL


## 2. RxNorm 

RxNorm, produced by the National Library of Medicine (NLM), is a naming system for prescription drugs and a tool for linking drug names to the many vocabularies used in pharmacies and in drug interaction software. Because healthcare centers and pharmacies use a variety of drug-naming conventions, RxNorm provides a way to normalize the names and identifiers of drugs.

The RxNorm vocabulary creates standard names and codes (RxNorm Concept Unique Identifier (RxCUI)) for the combinations of ingredients, strength, and dose forms (such as Aspirin 325 MG Oral Tablet) that exist in drugs. Here are some examples of drugs with names and codes.

* Celebrex 200 MG Oral Capsule - 205323
* Cephalexin 500 MG Oral Capsule - 309114
* Metformin hydrochloride 500 MG Oral Tablet - 861007

Let's look at some examples with structured EHR datasets for medications.

In [3]:
import pandas as pd
medications = pd.read_csv('./SampleDatasets/meds_sample.csv')
medications.head(5)

Unnamed: 0,patient_deiden_id,med_order_display_name,med_order_route,taken_datetime,total_dose_character,med_dose_unit_desc,med_infusion_rate,med_infusion_rate_unit_desc,rxnorm
0,123101,perflutren lipid microspheres (DEFINITY) injec...,Intravenous,6/13/2019 14:38,1.7,ml,,,283753
1,123101,perflutren lipid microspheres (DEFINITY) injec...,Intravenous,5/23/2019 15:55,2.0,ml,,,283753
2,123101,perflutren lipid microspheres (DEFINITY) injec...,Intravenous,5/21/2019 7:35,10.0,ml,,,283753
3,123101,iodixanol (VISIPAQUE) 320 MG/ML injection 100 mL,Intravenous,8/4/2019 19:30,100.0,ml,,,27729
4,123101,multivitamin tablet 1 tablet,Oral,8/21/2019 8:11,1.0,tablet,,,89905


Here is the field description of the example table.

| Field |  Description |
| --- | --- |
|  patient_deiden_id |  Patient identifier |
|  med_order_display_name |  Medication name |
|  med_order_route |  Route by which the drug enters the body |
|  taken_datetime |  Datetime at which the medication was taken |
|  total_dose_character |  Dose of the drug taken |
|  med_dose_unit_desc |  Medication dose unit |
|  med_infusion_rate | Infusion rate |
|  med_infusion_rate_unit_desc |  Unit of the infusion rate |
|  rxnorm |  RxNorm code |

Let's check the most common RxNorm codes in this sample dataset.

In [4]:
medications['rxnorm'].value_counts().head()

706943    113
150985    111
308508     78
307675     71
307719     70
Name: rxnorm, dtype: int64

The most common rxnorm code is '706943' representing 'sodium chloride 70 MG/ML Inhalation Solution'.

### Finding patients taking a specific drug

Suppose we want to find all patients taking 'sodium chloride 70 MG/ML Inhalation Solution'. From an RxNorm lookup table, we find that the RxNorm code for this drug is '706943'. We can use the 'rxnorm' column to find all records of this drug.

In [5]:
patients_sodium_chloride = medications[medications.rxnorm.isin([706943])]
patients_sodium_chloride.head(5)

Unnamed: 0,patient_deiden_id,med_order_display_name,med_order_route,taken_datetime,total_dose_character,med_dose_unit_desc,med_infusion_rate,med_infusion_rate_unit_desc,rxnorm
500,123101,sodium chloride 7 % nebulizer solution 4 mL,Nebulization,8/27/2019 21:00,4.0,ml,,,706943
503,123101,sodium chloride 7 % nebulizer solution 4 mL,Nebulization,9/3/2019 11:25,4.0,ml,,,706943
504,123101,sodium chloride 7 % nebulizer solution 4 mL,Nebulization,9/4/2019 12:28,4.0,ml,,,706943
506,123101,sodium chloride 7 % nebulizer solution 4 mL,Nebulization,9/5/2019 8:42,4.0,ml,,,706943
507,123101,sodium chloride 7 % nebulizer solution 4 mL,Nebulization,9/8/2019 4:07,4.0,ml,,,706943
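The filter above returns one row per administration, not one row per patient. A short sketch of reducing those rows to distinct patients, and counting administrations per patient, is shown below. The demo rows are retyped from the outputs above into an illustrative DataFrame `meds_demo`; the `rxnorm` values are integers here because the earlier `value_counts()` output reports `dtype: int64`.

```python
import pandas as pd

# Rows copied from the medication outputs shown above (subset of columns).
meds_demo = pd.DataFrame({
    "patient_deiden_id": [123101, 123101, 123101, 123101],
    "taken_datetime": [
        "8/27/2019 21:00",
        "9/3/2019 11:25",
        "8/4/2019 19:30",
        "8/21/2019 8:11",
    ],
    "rxnorm": [706943, 706943, 27729, 89905],
})

# Records for the drug of interest.
target = meds_demo[meds_demo["rxnorm"].isin([706943])]

# Distinct patients who received it.
patient_ids = target["patient_deiden_id"].unique()

# Number of recorded administrations per patient.
doses_per_patient = target.groupby("patient_deiden_id").size()

print(patient_ids)
print(doses_per_patient)
```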


## 3. ICD-9 and ICD-10 codes

The International Classification of Diseases (ICD), produced by the World Health Organization (WHO), is an international standard set of codes used by healthcare systems to refer to health conditions and diseases. ICD codes are assigned to each diagnosis or procedure, allowing medical data to be tracked accurately across countries and languages.

### ICD versions

Before October 2015, Medicare claims were reported using ICD-9-CM for all applicable procedure and diagnosis codes. Starting in October 2015, however, the Department of Health and Human Services (HHS) required Medicare claims to be reported using ICD-10-CM. While the ICD-9-CM system had approximately 13,000 codes and was at risk of running out of numbers, the newer ICD-10-CM system has room for approximately 68,000 codes. The extra room is necessary because ICD-10-CM is a more detailed system than its predecessor. Many ICD-10-CM codes are more specific and granular in their references to anatomy, etiology, comorbidity, and complications, thus allowing for a more specific definition of illness severity. The newest revision is ICD-11, published by the WHO, which came into effect on January 1, 2022.

### ICD-9-CM code structure

Here is the format for ICD-9-CM code entries:

XXX.XX

Characters 1-3 indicate the category, and characters 4-5 indicate the etiology. Character 1 can be either alphabetic or numeric, but characters 2-5 must be numeric. ICD-9-CM codes require a minimum of three characters and allow a maximum of five. The decimal point is placed after the first three characters.

Here are some examples of ICD-9-CM codes:

* 001.9 - Cholera, unspecified
* 250.00 - Diabetes mellitus without mention of complication
* 401.1 - Benign essential hypertension
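The structural rule described above can be written as a regular expression. The sketch below encodes only that rule (one letter or digit, two digits, then an optional decimal point with one or two digits). It is a shape check, not a lookup against the ICD-9-CM code set, and the names `ICD9_CM_PATTERN` and `looks_like_icd9_cm` are illustrative.

```python
import re

# Character 1: letter or digit; characters 2-3: digits;
# optional decimal point followed by one or two digits.
ICD9_CM_PATTERN = re.compile(r"^[A-Z0-9]\d{2}(\.\d{1,2})?$")


def looks_like_icd9_cm(code: str) -> bool:
    """Return True if the string has the ICD-9-CM shape described above."""
    return ICD9_CM_PATTERN.match(code.strip().upper()) is not None


for code in ["001.9", "250.00", "401.1", "135", "250.000", "25.0"]:
    print(code, looks_like_icd9_cm(code))
```

A well-formed string is not necessarily a real code, so a check like this is best used to flag obviously malformed values before joining against a reference table.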

### ICD-10-CM code structure

Here is the format for ICD-10-CM code entries:

XXX.XXXX

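Characters 1-3 indicate the category, characters 4-6 indicate the etiology, and character 7 indicates the extension. Character 1 must be alphabetic, character 2 must be numeric, and characters 3-7 can be alphabetic or numeric. The letter U was originally excluded from character 1 but is now used for a small number of codes, such as U07.1 (COVID-19). ICD-10-CM codes require a minimum of three characters and allow a maximum of seven. The decimal point is placed after the first three characters.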

Here are some examples of ICD-10-CM codes:

* A00.9 - Cholera, unspecified
* E11.9 - Type 2 diabetes mellitus without complications
* I10 - Essential (primary) hypertension
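The ICD-10-CM layout can be expressed as a regular expression in the same way. This is a sketch of a shape check only. It accepts any leading letter, including U (for example U07.1, COVID-19), which is an assumption you may want to tighten if your data follows an older code set. The names `ICD10_CM_PATTERN` and `looks_like_icd10_cm` are illustrative.

```python
import re

# Character 1: letter; character 2: digit; character 3: letter or digit;
# optional decimal point followed by one to four letters or digits.
ICD10_CM_PATTERN = re.compile(r"^[A-Z]\d[A-Z0-9](\.[A-Z0-9]{1,4})?$")


def looks_like_icd10_cm(code: str) -> bool:
    """Return True if the string has the ICD-10-CM shape described above."""
    return ICD10_CM_PATTERN.match(code.strip().upper()) is not None


for code in ["A00.9", "E11.9", "I10", "S52.521A", "401.1", "E11.90000"]:
    print(code, looks_like_icd10_cm(code))
```

Shape alone cannot always tell the two ICD versions apart: a three-character string such as V10 fits both the ICD-9-CM and the ICD-10-CM layout, so datasets usually need a separate field that records which version a code belongs to.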

Let's look at some examples with structured EHR datasets for diagnosis codes.

In [6]:
import pandas as pd
diagnosis = pd.read_csv('./SampleDatasets/diagnoses_sample.csv')
diagnosis.head(5)

Unnamed: 0,patient_deiden_id,start_date,diag_code,diag_icd_type,diag_hierarchy,poa
0,125715,7/15/2014,135,ICD9,Sarcoidosis,1
1,136154,2/29/2012,153.9,ICD9,"Malignant neoplasm of colon, unspecified site ...",0
2,51200,12/19/2014,307.81,ICD9,Tension headache,0
3,51200,4/6/2015,307.81,ICD9,Tension headache,0
4,144561,10/1/2015,C61,ICD10,Malignant neoplasm of prostate (CMS-HCC: 12),1


Here is the field description of the example table.

| Field |  Description |
| --- | --- |
|  patient_deiden_id |  Patient identifier |
|  start_date |  Date of the diagnosis code |
|  diag_code |  Diagnosis code |
|  diag_icd_type |  ICD version (ICD9 or ICD10) |
|  diag_hierarchy |  Description of the diagnosis code |
|  poa |  Indicator of whether the diagnosis was present on admission |


Let's focus on ICD-10 diagnosis codes only, using the column 'diag_icd_type'.

In [7]:
icd10_diagnosis_code = diagnosis[diagnosis['diag_icd_type'] == 'ICD10']
icd10_diagnosis_code.head(5)

Unnamed: 0,patient_deiden_id,start_date,diag_code,diag_icd_type,diag_hierarchy,poa
4,144561,10/1/2015,C61,ICD10,Malignant neoplasm of prostate (CMS-HCC: 12),1
5,144561,10/16/2015,C61,ICD10,Malignant neoplasm of prostate (CMS-HCC: 12),0
6,144561,1/21/2016,C61,ICD10,Malignant neoplasm of prostate (CMS-HCC: 12),1
7,144561,2/1/2016,C61,ICD10,Malignant neoplasm of prostate (CMS-HCC: 12),1
8,144561,5/13/2016,C61,ICD10,Malignant neoplasm of prostate (CMS-HCC: 12),0


Let's check the most common diagnoses in this sample dataset.

In [8]:
diagnosis['diag_hierarchy'].value_counts()

Essential hypertension, benign            163
Atrial fibrillation (CMS-HCC: 96)         147
End stage renal disease (CMS-HCC: 136)    114
Supervision of other normal pregnancy      92
Cough                                      86
                                         ... 
Pityriasis rosea                            1
Acquired acanthosis nigricans               1
Inflamed seborrheic keratosis               1
Other specified disease of nail             1
Unspecified tinnitus                        1
Name: diag_hierarchy, Length: 426, dtype: int64

In our sample dataset, the most common diagnosis is 'Essential hypertension, benign', which appears in 163 records.

### Finding patients with a specific disease

Suppose we want to find all patients with end stage renal disease. From an ICD code lookup table, we find that the ICD-9 diagnosis code is '585.6' and the ICD-10 diagnosis code is 'N18.6'. We can use 'diag_code' together with 'diag_icd_type' to find all patients with end stage renal disease.

In [9]:
esrd_icd9_codes = ['585.6']
esrd_icd10_codes = ['N18.6']
esrd_patients = diagnosis[((diagnosis['diag_code'].isin(esrd_icd9_codes)) & (diagnosis['diag_icd_type'] == 'ICD9')) | 
                          ((diagnosis['diag_code'].isin(esrd_icd10_codes)) & (diagnosis['diag_icd_type'] == 'ICD10'))]
esrd_patients.head(5)

Unnamed: 0,patient_deiden_id,start_date,diag_code,diag_icd_type,diag_hierarchy,poa
2802,1316,6/6/2016,N18.6,ICD10,End stage renal disease (CMS-HCC: 136),1
2803,3205,8/9/2016,N18.6,ICD10,End stage renal disease (CMS-HCC: 136),1
2804,3205,11/1/2016,N18.6,ICD10,End stage renal disease (CMS-HCC: 136),1
2805,3205,2/17/2017,N18.6,ICD10,End stage renal disease (CMS-HCC: 136),0
2824,1316,12/18/2017,N18.6,ICD10,End stage renal disease (CMS-HCC: 136),0
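The version-aware filter above can be generalized so that the same logic works for any condition. The sketch below assumes a hypothetical helper `select_diagnosis` that takes a mapping from ICD version to code list, and then reduces the matching rows to distinct patient identifiers. The demo rows are retyped from the outputs shown in this section.

```python
import pandas as pd


def select_diagnosis(diagnosis: pd.DataFrame, codes_by_version: dict) -> pd.DataFrame:
    """Return rows whose code matches the list given for that row's ICD version."""
    mask = pd.Series(False, index=diagnosis.index)
    for version, codes in codes_by_version.items():
        mask |= (diagnosis["diag_icd_type"] == version) & diagnosis["diag_code"].isin(codes)
    return diagnosis[mask]


# Rows copied from the diagnosis outputs shown above (subset of columns).
diagnosis_demo = pd.DataFrame({
    "patient_deiden_id": [1316, 3205, 3205, 144561, 51200],
    "diag_code": ["N18.6", "N18.6", "N18.6", "C61", "307.81"],
    "diag_icd_type": ["ICD10", "ICD10", "ICD10", "ICD10", "ICD9"],
})

esrd = select_diagnosis(diagnosis_demo, {"ICD9": ["585.6"], "ICD10": ["N18.6"]})

# One entry per patient rather than one row per coded encounter.
esrd_patient_ids = sorted(esrd["patient_deiden_id"].unique().tolist())
print(esrd_patient_ids)
```

Pairing each code list with its version avoids accidental matches when a string is a valid shape in both ICD-9 and ICD-10.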


## 4. CPT codes

Current Procedural Terminology (CPT) is a standard system for medical procedure codes. CPT codes help streamline reporting and administrative tasks such as claims processing. CPT Category I codes are organized into six sections: Evaluation and Management (E/M); Anesthesia; Surgery; Radiology; Pathology and Laboratory; and Medicine.

Most CPT codes are five-digit numeric codes, but some CPT codes have four numeric characters and one alphabetic character. CPT codes do not have a decimal point.

Here are some examples of CPT codes.

* 99397 - Established Patient Preventive Medicine Services
* 11400 - Excision-Benign Lesions Procedures on the Skin
* 01860 - Anesthesia for Procedures on the Forearm, Wrist, and Hand

Let's look at some examples with structured EHR datasets for CPT codes.

In [10]:
import pandas as pd
procedure = pd.read_csv('./SampleDatasets/procedures_sample.csv')
procedure.head(5)

Unnamed: 0,patient_deiden_id,proc_code,proc_code_type,proc_desc,proc_date
0,36230,99233,CPT,"PR SUBSEQUENT HOSPITAL CARE,LEVL III",9/26/2017
1,151522,70450,CPT,"CHG CT SCAN,HEAD/BRAIN,W/O CONTRAST MATL",2/16/2013
2,122518,GZB2ZZZ,ICD10,"Electroconvulsive Therapy, Bilateral-Single Se...",2/18/2019
3,48484,76.31,ICD9,Partial mandibulectomy,3/20/2013
4,5334,31624,CPT,PR BRNCHSC W/BRNCL ALVEOLAR LAVAGE,11/17/2015


Here is the field description of the example table.

| Field |  Description |
| --- | --- |
|  patient_deiden_id |  Patient identifier |
|  proc_code |  Procedure code |
|  proc_code_type |  Procedure code type (ICD9, ICD10, or CPT) |
|  proc_desc |  Description of the procedure code |
|  proc_date |  Date of the procedure |

As mentioned earlier, ICD codes are assigned for both diagnoses and procedures. A procedure table therefore often contains three kinds of codes: ICD-9-CM procedure codes, ICD-10 procedure codes (ICD-10-PCS, the seven-character procedure counterpart of ICD-10-CM), and CPT codes. CPT and ICD-10-PCS codes both describe what was done to the patient, whereas ICD-10-CM codes describe the patient's diagnoses.

In [11]:
procedure['proc_code_type'].unique()

array(['CPT', 'ICD10', 'ICD9'], dtype=object)

Let's check the most common CPT codes in our sample dataset.

In [12]:
procedure.loc[procedure['proc_code_type'] == 'CPT', ['proc_code', 'proc_desc']].value_counts()

proc_code  proc_desc                           
99291      PR CRITICAL CARE, E/M 30-74 MINUTES     61
31624      PR BRNCHSC W/BRNCL ALVEOLAR LAVAGE      37
99214      PR OFFICE/OUTPT VISIT,EST,LEVL IV       32
99233      PR SUBSEQUENT HOSPITAL CARE,LEVL III    32
88305      CHG SURG PATH,LEVEL IV                  25
                                                   ..
70548      CHG MR ANGIO, NECK W/CONTRAST            1
70544      CHG MR ANGIO, HEAD                       1
70491      CHG CT NECK TISSUE CONTRAST              1
70487      CHG CT SCAN, FACE/JAW CONTRAST           1
840        PR ANESTH,SURG LOWER ABDOMEN             1
Length: 115, dtype: int64

The most common CPT code in our sample dataset is 99291 representing 'PR CRITICAL CARE, E/M 30-74 MINUTES'. 
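The last row of the output above shows the code 840, although CPT codes have five characters and anesthesia codes begin with 0 (as in the 01860 example earlier). This suggests that leading zeros may have been lost somewhere upstream. The following sketch restores them, assuming that any all-digit CPT code shorter than five characters was originally zero-padded; that assumption should be checked against the source system. The DataFrame `procedure_demo` is an illustrative stand-in for the sample file.

```python
import pandas as pd

# Illustrative rows using codes that appear in the outputs above.
procedure_demo = pd.DataFrame({
    "proc_code": ["99291", "840", "GZB2ZZZ", "76.31"],
    "proc_code_type": ["CPT", "CPT", "ICD10", "ICD9"],
})

# Only touch CPT rows whose code is purely numeric and too short.
is_short_numeric_cpt = (
    (procedure_demo["proc_code_type"] == "CPT")
    & procedure_demo["proc_code"].str.fullmatch(r"\d{1,4}")
)

# Left-pad those codes with zeros to five characters.
procedure_demo.loc[is_short_numeric_cpt, "proc_code"] = (
    procedure_demo.loc[is_short_numeric_cpt, "proc_code"].str.zfill(5)
)

print(procedure_demo)
```

When loading your own files, passing `dtype={'proc_code': str}` to `pd.read_csv` keeps pandas from converting an all-numeric code column to integers, which is one common way leading zeros disappear.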

### Finding patients who have undergone specific procedures or surgeries

Suppose we want to find patients who have undergone a dialysis procedure. From a procedure code lookup table, we find that the CPT codes for dialysis procedures, including hemodialysis, are '90935', '90937', '90945', '90947', and '90999'. We can use the 'proc_code' column to find the patients.

In [13]:
dialysis_cpt_codes = ['90935','90937','90945','90947','90999']
dialysis_patients = procedure[procedure['proc_code'].isin(dialysis_cpt_codes)]
dialysis_patients.head(5)

Unnamed: 0,patient_deiden_id,proc_code,proc_code_type,proc_desc,proc_date
95,36230,90935,CPT,PR HEMODIALYSIS PROCEDURE W/ PHYS/QHP EVALUATION,5/15/2015
879,703,90935,CPT,PR HEMODIALYSIS PROCEDURE W/ PHYS/QHP EVALUATION,7/20/2013
955,36230,90935,CPT,PR HEMODIALYSIS PROCEDURE W/ PHYS/QHP EVALUATION,5/15/2015


## 5. SNOMED CT

Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is a multilingual system of standardized clinical terms that includes codes, terms, synonyms, and definitions. SNOMED CT is used in electronic health records and can describe clinical diagnoses, symptoms, medical procedures, organisms and specimens, pharmaceuticals, and medical devices.

The four main components of SNOMED CT are the following:

- Concepts—numeric codes that indicate a specific clinical meaning
- Descriptions—text description of the clinical meaning represented by the Concept Code
- Relationships—connections between two Concept Codes with a related meaning
- Reference Sets—a classification that sorts Concept Codes and Descriptions into groups

![](./Images/SNOMED.png)
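To make the four components concrete, here is a toy sketch of how they relate to one another. It is not the SNOMED CT release format: the identifiers "C1", "C2", and "IS_A" are placeholders rather than real SNOMED CT concept codes, and the class names and the `parents` helper are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str  # real SNOMED CT concept codes are numeric


@dataclass(frozen=True)
class Description:
    concept_id: str  # the concept this text describes
    term: str


@dataclass(frozen=True)
class Relationship:
    source_id: str
    type_id: str  # the kind of link, itself identified by a concept
    destination_id: str


# Placeholder identifiers, not real SNOMED CT codes.
concepts = [Concept("C1"), Concept("C2"), Concept("IS_A")]
descriptions = [
    Description("C1", "Bacterial pneumonia"),
    Description("C2", "Pneumonia"),
]
relationships = [Relationship("C1", "IS_A", "C2")]

# A reference set groups concepts for a particular purpose.
reference_sets = {"respiratory_conditions_demo": {"C1", "C2"}}


def parents(concept_id: str) -> list:
    """Return the concepts that the given concept is linked to by an 'is a' relationship."""
    return [
        r.destination_id
        for r in relationships
        if r.source_id == concept_id and r.type_id == "IS_A"
    ]


print(parents("C1"))
```

Following 'is a' links in this way is what lets a query for a general concept also retrieve records coded with its more specific children.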

> Credit: the SNOMED CT material is adapted from the [SNOMED CT Starter Guide](https://confluence.ihtsdotools.org/display/DOCSTART/SNOMED+CT+Starter+Guide).