# Multi-source data import 

Sources:
- MIMIC-III. Covers the years 2001-2012. Has free-text notes.  
- MIMIC-IV. Covers the years 2008-2019. Has physician order entry data, reference ranges for lab values, and some other changes. Doesn't have free-text notes as of this writing.
- UMLS. Provides a common set of concepts that form a central connection point for many other sources such as RxNorm and MeSH.
- RxNorm. Has drug-drug and drug-disease interactions, indications, contraindications, etc.  
- MeSH. Has broader-narrower relationships among hierarchically-related terms.
- PubMed. Has the majority of the world's medical literature in free text, with abstracts freely available and accessible through an API.

## Information about each source

### MIMIC-III
Schema of MIMIC-III: https://mit-lcp.github.io/mimic-schema-spy/index.html

### MIMIC-IV
Documentation for MIMIC-IV (no schema on schema spy as of this writing): 

### RxNorm 
Connect various forms/dosages/routes of a clinical drug to the underlying pharmacologic substance  
![](images/RxNorm_relationships_among_RXCUIs.png)  
Note the "TTY" field from the graph above corresponds to the heading of each box below.  
![](images/RxNorm_CUIs_related_to_coumadin.png)

Relate each pharmacologic substance to other drugs with interaction info  
![](images/RxNorm_drug_interactions_warfarin.png)  

Connect clinically relevant properties of drugs   
![](images/RxNorm_clinical_properties_relationships.png)  

RxNorm main landing page: https://www.nlm.nih.gov/research/umls/rxnorm/index.html  
AMIA article describing RxNorm: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3128404/  
Data downloads: https://www.nlm.nih.gov/research/umls/rxnorm/docs/rxnormfiles.html  
Web-based browser: https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchTerm=coumadin  
Technical docs: https://www.nlm.nih.gov/research/umls/rxnorm/docs/index.html  


The full download of RxNorm files contains a directory called "rrf" with the following contents:

|File|Size (bytes)|
|---|---|
|RXNCONSO.RRF|121,180,353|
|RXNDOC.RRF|218,467|
|RXNREL.RRF|503,188,245|
|RXNSAB.RRF|10,698|
|RXNSAT.RRF|502,793,103|
|RXNSTY.RRF|17,996,450|

Archival files for tracking RxNorm historical content:

|File|Size (bytes)|
|---|---|
|RXNATOMARCHIVE.RRF|74,069,962|
|RXNCUICHANGES.RRF|39,589|
|RXNCUI.RRF|1,716,694|

In [2]:
import pandas as pd

In [3]:
# Load RXNREL.RRF into a dataframe
rxnrel = pd.read_csv('/home/tim/Documents/GrApH_AI/Data/RxNorm_full_06072021/rrf/RXNREL.RRF', sep='|', header=None, encoding='utf-8')
rxnrel[:5]


| |0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0| |5.0|AUI|SY| |6.0|AUI|permuted_term_of|155592245.0| |MSH| | | | | | |
|1| |5.0|SDUI|SIB| |104746.0|SDUI| |154524204.0| |MSH| | | | | | |
|2| |5.0|SDUI|RN| |609702.0|SDUI|mapped_to|154691227.0| |MSH| |1.0| | | | |
|3| |5.0|AUI|SY| |2666961.0|AUI|sort_version_of|155371534.0| |MSH| | | | | | |
|4| |5.0|AUI|SY| |2681015.0|AUI|entry_version_of|155054914.0| |MSH| | | | | | |
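
The columns above are unnamed because RRF files carry no header row. A minimal sketch of one way to load an RRF file with named columns follows. The column list is taken from the RXNREL.RRF section of the RxNorm technical documentation, while `read_rrf` and the `_trailing` placeholder column are hypothetical names introduced here. Reading every field as a string is an assumption that keeps identifiers such as `155592245` from being converted to floats.

```python
import csv
import io

import pandas as pd

# Column names for RXNREL.RRF as listed in the RxNorm technical documentation.
RXNREL_COLUMNS = [
    'RXCUI1', 'RXAUI1', 'STYPE1', 'REL', 'RXCUI2', 'RXAUI2', 'STYPE2', 'RELA',
    'RUI', 'SRUI', 'SAB', 'SL', 'DIR', 'RG', 'SUPPRESS', 'CVF',
]


def read_rrf(path_or_buffer, columns):
    """Load a pipe-delimited RRF file with named columns (hypothetical helper)."""
    df = pd.read_csv(
        path_or_buffer,
        sep='|',
        header=None,
        # Each line ends with '|', so pandas sees one extra, empty field.
        names=columns + ['_trailing'],
        dtype=str,               # keep identifiers as text instead of floats
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        encoding='utf-8',
    )
    return df.drop(columns='_trailing')


# Usage with the first row shown above, rebuilt as a pipe-delimited line.
fields = ['', '5', 'AUI', 'SY', '', '6', 'AUI', 'permuted_term_of',
          '155592245', '', 'MSH', '', '', '', '', '']
sample = io.StringIO('|'.join(fields) + '|\n')
rxnrel_sample = read_rrf(sample, RXNREL_COLUMNS)
print(rxnrel_sample[['RXAUI1', 'REL', 'RXAUI2', 'RELA', 'SAB']])
```

With named columns, the cell that follows could be written as `rxnrel['RELA'].value_counts()` instead of relying on the position `7`.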


In [4]:
rxnrel.iloc[:,7].value_counts()

inactive_ingredient_of             1532663
has_inactive_ingredient            1532663
active_ingredient_of                357487
has_active_ingredient               357487
has_active_moiety                   337459
active_moiety_of                    337459
has_ingredient                      323817
ingredient_of                       323817
inverse_isa                         242390
isa                                 242390
dose_form_of                        124076
has_dose_form                       124076
constitutes                         107708
consists_of                         107708
tradename_of                         98860
has_tradename                        98860
doseformgroup_of                     34806
has_doseformgroup                    34806
has_print_name                       27671
print_name_of                        27671
ingredients_of                       11308
has_ingredients                      11308
has_precise_ingredient               10992
precise_ing
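
Every relationship name in the counts above sits next to an inverse with an identical count (for example `has_ingredient` and `ingredient_of`), so importing both directions would likely store each connection twice. The sketch below keeps one chosen relationship name and restricts it to rows whose two endpoints are concepts (`STYPE1` and `STYPE2` equal to `CUI`). It assumes the dataframe uses the RxNorm column names `RXCUI1`, `STYPE1`, `RXCUI2`, `STYPE2`, `RELA`, and `SAB` rather than the positional names used above. The helper `concept_edges` and the toy rows are hypothetical.

```python
import pandas as pd


def concept_edges(rel, rela, sab='RXNORM'):
    """Return concept-to-concept rows for one relationship name (hypothetical helper)."""
    mask = (
        (rel['RELA'] == rela)
        & (rel['SAB'] == sab)
        & (rel['STYPE1'] == 'CUI')
        & (rel['STYPE2'] == 'CUI')
    )
    return rel.loc[mask, ['RXCUI1', 'RELA', 'RXCUI2']].reset_index(drop=True)


# Toy rows with made-up identifiers, only to show the filter.
toy = pd.DataFrame({
    'RXCUI1': ['100', '200', None],
    'STYPE1': ['CUI', 'CUI', 'AUI'],
    'RXCUI2': ['200', '100', None],
    'STYPE2': ['CUI', 'CUI', 'AUI'],
    'RELA': ['has_ingredient', 'ingredient_of', 'permuted_term_of'],
    'SAB': ['RXNORM', 'RXNORM', 'MSH'],
})
edges = concept_edges(toy, 'has_ingredient')
print(edges)
```

The sketch deliberately says nothing about which of `RXCUI1` and `RXCUI2` is the subject of the relationship; that should be checked against the RxNorm technical documentation before edges are given a direction.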

In [19]:
# Load RXNSAT.RRF (Simple Concept and Atom Attributes) into a dataframe
columns = ['RXCUI', 'LUI', 'SUI', 'RXAUI', 'STYPE', 'CODE', 'ATUI', 'SATUI', 'ATN', 'SAB', 'ATV', 'SUPPRESS', 'CVF'] # Column headers and descriptions at https://www.nlm.nih.gov/research/umls/rxnorm/docs/techdoc.html#sat
rxnsat = pd.read_csv('/home/tim/Documents/GrApH_AI/Data/RxNorm_full_06072021/rrf/RXNSAT.RRF', sep='|', header=None, encoding='utf-8')
rxnsat = rxnsat.iloc[:,:13] # Drop the empty trailing column (index 13) created by the final '|' on each line
rxnsat.columns = columns
rxnsat[:5]

| |RXCUI|LUI|SUI|RXAUI|STYPE|CODE|ATUI|SATUI|ATN|SAB|ATV|SUPPRESS|CVF|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|38| | |829|AUI|38| | |RXN_BN_CARDINALITY|RXNORM|single|N|4096.0|
|1|38| | |8056626|AUI|D001971|AT212333259| |TERMUI|MSH|T005606|N| |
|2|38| | |8056626|AUI|D001971|AT212365433| |LT|MSH|TRD|N| |
|3|38| | |8056626|AUI|D001971|AT212543507| |TH|MSH|UNK (19XX)|N| |
|4|38| | |8056626|SCUI|D001971|AT60770509| |RN|MSH|0|N| |


RXNSAT.RRF table info

|Column|Description|
|---|---|
|RXCUI|Unique identifier for concept (concept id)|  
|LUI|Unique identifier for term (no value provided)|  
|SUI|Unique identifier for string (no value provided)|  
|RXAUI|RxNorm atom identifier (RXAUI) or RxNorm relationship identifier (RUI).|  
|STYPE|The name of the column in RXNCONSO.RRF or RXNREL.RRF that contains the identifier to which the attribute is attached, e.g., CUI, AUI.|  
|CODE|"Most useful" source asserted identifier (if the source vocabulary has more than one identifier), or a RxNorm-generated source entry identifier (if the source vocabulary has none.)|  
|ATUI|Unique identifier for attribute|  
|SATUI|Source asserted attribute identifier (optional - present if it exists)|  
|ATN|Attribute name (e.g. NDC). Possible values appear in RXNDOC.RRF and are described on the UMLS Attribute Names page|  
|SAB|Abbreviation of the source of the attribute. Possible values appear in RXNSAB.RRF and are listed on the UMLS Source Vocabularies page|  
|ATV|Attribute value described under specific attribute name on the UMLS Attribute Names page (e.g. 000023082503 where ATN = 'NDC'). A few attribute values exceed 1,000 characters. Many of the abbreviations used in attribute values are explained in RXNDOC.RRF and included on the UMLS Abbreviations Used in Data Elements page|  
|SUPPRESS|Suppressible flag. Values = O, Y, or N. Reflects the suppressible status of the attribute. N - Attribute is not suppressed. O - Attribute is suppressed at source level. Y - Attribute is suppressed by RxNorm editors.|  
|CVF|Content view flag. RxNorm includes one value, '4096', to denote inclusion in the Current Prescribable Content subset. All rows with CVF='4096' can be found in the subset.| 
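
As one use of the `ATN` and `ATV` columns described above, the following sketch collects the NDC codes attached to each RXCUI. It assumes a dataframe with the same column names as `rxnsat`. The function name `ndc_by_rxcui`, the choice to keep only unsuppressed rows (`SUPPRESS == 'N'`), and the toy rows are assumptions for illustration; only the NDC value `000023082503` is copied from the table above.

```python
import pandas as pd


def ndc_by_rxcui(sat):
    """Map each RXCUI to its list of unsuppressed NDC codes (hypothetical helper)."""
    ndc_rows = sat[(sat['ATN'] == 'NDC') & (sat['SUPPRESS'] == 'N')]
    return ndc_rows.groupby('RXCUI')['ATV'].apply(list).to_dict()


# Toy rows; RXCUIs, labeler, and all but the first NDC are made up.
toy = pd.DataFrame({
    'RXCUI': [111, 111, 111, 222],
    'ATN': ['NDC', 'NDC', 'LABELER', 'NDC'],
    'ATV': ['000023082503', '000000000001', 'Example Labs', '000000000002'],
    'SUPPRESS': ['N', 'N', 'N', 'O'],
})
ndc = ndc_by_rxcui(toy)
print(ndc)
```

Keeping `ATV` as a string matters here because NDC codes can begin with zeros.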

In [27]:
pd.set_option("display.max_rows", 120)
rxnsat['ATN'].value_counts() #Table listing attribute names and descriptions: https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/attribute_names.html

NDC                                                               1820557
SPL_SET_ID                                                        1684527
LABELER                                                            360647
DM_SPL_ID                                                          193547
LABEL_TYPE                                                         186919
MARKETING_EFFECTIVE_TIME_LOW                                       184220
MARKETING_CATEGORY                                                 183116
MARKETING_STATUS                                                   183051
DDF                                                                148523
DCSA                                                               141358
DRT                                                                113332
DST                                                                103673
COLORTEXT                                                           78851
COLOR                                 

### MED-RT
Connect medications with other concept types such as diseases, phenotypes, etc.

How MED-RT connects multiple source vocabularies:  
![image.png](images/MED_RT_content_model.png)  
Figure source: https://evs.nci.nih.gov/ftp1/MED-RT/MED-RT%20Documentation.pdf  

Sample of some relationships specified in MED-RT:  
![image.png](images/MED_RT_relationships.png)  
Screenshot source: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MED-RT/metarepresentation.html#relationships 

MEDRT_MoA_NUIs file is an index of mechanisms of action.  
Sample line from the file:  
Acetylcholine Release Inhibitors [MoA]	N0000175770	MED-RT  
Possible ways to store the data:
- Each line becomes a node with the label "Mechanism_of_Action"
- Each line becomes a property of a drug node

MEDRT_PE_NUIs file is an index of physiologic effects.  
Sample line from the file:  
Acetylcholine Activity Alteration [PE]	N0000008290	MED-RT  
Possible ways to store the data:
- Each line becomes a node with the label "Physiologic_Effect"
- Each line becomes a property of an existing UMLS concept node
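
The first storage option for both files (one node per line) can be sketched as follows. This assumes each line holds three tab-separated fields (name with a bracketed kind tag, NUI, source), as in the two sample lines above. The function `parse_nui_line`, the `NODE_LABELS` mapping, and the dictionary layout are hypothetical and not tied to any particular graph database.

```python
import re

# Matches a trailing kind tag such as "[MoA]" or "[PE]" at the end of a name.
KIND_TAG = re.compile(r'\s*\[(?P<kind>[^\]]+)\]$')

# Hypothetical mapping from kind tag to the node labels proposed above.
NODE_LABELS = {'MoA': 'Mechanism_of_Action', 'PE': 'Physiologic_Effect'}


def parse_nui_line(line):
    """Turn one line of a MED-RT NUI index file into a node dictionary."""
    name, nui, source = line.strip().split('\t')
    match = KIND_TAG.search(name)
    kind = match.group('kind') if match else None
    return {
        'label': NODE_LABELS.get(kind, 'Unknown'),
        'name': KIND_TAG.sub('', name),  # name without the bracketed tag
        'nui': nui,
        'source': source,
    }


samples = [
    'Acetylcholine Release Inhibitors [MoA]\tN0000175770\tMED-RT',
    'Acetylcholine Activity Alteration [PE]\tN0000008290\tMED-RT',
]
nodes = [parse_nui_line(line) for line in samples]
for node in nodes:
    print(node)
```

For the second option, the same dictionaries could instead be attached as properties to an existing drug or UMLS concept node, keyed by the NUI.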

### Excerpt from MED-RT_Schema_v1.xsd

AssociationDef - definition of Association
	<xs:complexType name="AssociationDef">
		<xs:annotation>
			<xs:documentation> This element includes all types of Associations: Synonyms, Term Associations and Concept Associations.
			</xs:documentation>
		</xs:annotation>
		<xs:sequence>
			<xs:element name="namespace" type="xs:token"/>
			<xs:element name="name" type="xs:token"/>
			<!-- name of AssociationType -->
			<xs:group ref="FromElement"/>
			<xs:group ref="ToElement"/>
			<xs:element name="qualifier" type="QualifierDef" minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
	<xs:group name="ToElement">
		<xs:annotation>
			<xs:documentation> A reference from the local Concept/Term to another Concept/Term (in any Namespace).
			</xs:documentation>
		</xs:annotation>
		<xs:sequence>
			<xs:element name="to_namespace" type="xs:token"/>
			<xs:element name="to_name" type="xs:token">
				<xs:annotation>
					<xs:documentation>MED-RT: Concept Name
MeSH: Preferred Term
RxNorm: Preferred Term
SNOMED CT: FSN Synonym</xs:documentation>
				</xs:annotation>
			</xs:element>
			<!-- name of target Concept/Term -->
			<xs:element name="to_code" type="xs:token" minOccurs="0">
				<xs:annotation>
					<xs:documentation>MED-RT: NUI
MeSH: Code in Source
RxNorm: Code in Source
SNOMED CT: Code in Source</xs:documentation>
				</xs:annotation>
			</xs:element>
			<!-- code of target Term -->
		</xs:sequence>
	</xs:group>
	<xs:group name="FromElement">
		<xs:annotation>
			<xs:documentation> A reference to the local Concept/Term from another Concept/Term (in a different Namespace).
			</xs:documentation>
		</xs:annotation>
		<xs:sequence>
			<xs:element name="from_namespace" type="xs:token"/>
			<xs:element name="from_name" type="xs:token">
				<xs:annotation>
					<xs:documentation>MED-RT: Concept Name
MeSH: Preferred Term
RxNorm: Preferred Term
SNOMED CT: FSN Synonym</xs:documentation>
				</xs:annotation>
			</xs:element>
			<!-- name of source Concept/Term -->
			<xs:element name="from_code" type="xs:token">
				<xs:annotation>
					<xs:documentation>MED-RT: NUI
MeSH: Code in Source
RxNorm: Code in Source
SNOMED CT: Code in Source</xs:documentation>
				</xs:annotation>
			</xs:element>
			<!-- code of source Term -->
		</xs:sequence>
	</xs:group>
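
To show how the fields defined in the excerpt fit together, the sketch below reads one association instance with Python's standard `xml.etree.ElementTree`. The wrapping `<association>` element name and every value in the sample are assumptions made for illustration, not content from a MED-RT release; only the child element names come from the schema excerpt. The function `association_to_edge` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up instance that follows the element order of AssociationDef.
SAMPLE = """
<association>
  <namespace>MED-RT</namespace>
  <name>example_association_type</name>
  <from_namespace>RxNorm</from_namespace>
  <from_name>Example Drug</from_name>
  <from_code>0000</from_code>
  <to_namespace>MeSH</to_namespace>
  <to_name>Example Disease</to_name>
  <to_code>D000000</to_code>
</association>
"""


def association_to_edge(element):
    """Reduce one association element to an edge dictionary (hypothetical helper)."""
    def text(tag):
        child = element.find(tag)
        # to_code has minOccurs="0" in the schema, so a tag may be absent.
        return child.text if child is not None else None

    return {
        'type': text('name'),
        'from': (text('from_namespace'), text('from_code')),
        'to': (text('to_namespace'), text('to_code')),
    }


edge = association_to_edge(ET.fromstring(SAMPLE.strip()))
print(edge)
```

Pairing each code with its namespace keeps identifiers from different vocabularies (MED-RT, MeSH, RxNorm, SNOMED CT) from colliding.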

### FDA's Structured Product Labels
"The Structured Product Labeling (SPL) is a document markup standard approved by Health Level Seven (HL7) and adopted by FDA as a mechanism for exchanging product and facility information." - U.S. FDA  
SPL Resources: https://www.fda.gov/industry/fda-resources-data-standards/structured-product-labeling-resources  
Download data: https://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm

### MeSH
Connect hierarchically-related terms with broader-narrower relationships  
![Broader-narrower relationships among MeSH concepts](images/MeSH_relationships.png)  
MeSH contributes broader-narrower connections as displayed in the UMLS browser:  
![](images/MeSH_broader_narrower_in_UMLSbrowser.png)

RDF format for MeSH: https://id.nlm.nih.gov/mesh/, https://hhs.github.io/meshrdf/  
Concept structure of MeSH: https://www.nlm.nih.gov/mesh/concept_structure.html


### PubMed
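
The API mentioned in the source list is presumably NCBI's E-utilities; that is an assumption, since this document does not name it. The sketch below only builds an `esearch` request URL for PubMed and does not send it. The function name `pubmed_search_url` and the search term are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'


def pubmed_search_url(term, retmax=20):
    """Build an esearch URL that would return PubMed IDs matching a query."""
    params = {'db': 'pubmed', 'term': term, 'retmax': retmax, 'retmode': 'json'}
    return f'{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}'


url = pubmed_search_url('warfarin drug interactions', retmax=5)
print(url)
```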

## Data model to connect the various data sources

The `loinc_code` column of the MIMIC-IV `d_labitems` table connects to UMLS by LOINC code.
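
A minimal sketch of that connection as a pandas join follows. It assumes the UMLS side is loaded from MRCONSO.RRF with its standard `CUI`, `SAB`, and `CODE` columns, where LOINC rows carry `SAB == 'LNC'` and the LOINC number in `CODE`. Both dataframes are toy stand-ins: the item IDs, labels, and CUIs are made up.

```python
import pandas as pd

# Toy stand-in for MIMIC-IV d_labitems (item IDs and labels are made up).
d_labitems = pd.DataFrame({
    'itemid': [1, 2],
    'label': ['Glucose', 'Unmapped test'],
    'loinc_code': ['2345-7', None],
})

# Toy stand-in for UMLS MRCONSO.RRF rows; the CUIs are placeholders.
mrconso = pd.DataFrame({
    'CUI': ['C0000001', 'C0000002'],
    'SAB': ['LNC', 'MSH'],
    'CODE': ['2345-7', 'D000001'],
})

# Keep one (CUI, LOINC code) pair per concept, then attach CUIs to lab items.
loinc_concepts = (
    mrconso.loc[mrconso['SAB'] == 'LNC', ['CUI', 'CODE']].drop_duplicates()
)
lab_to_cui = d_labitems.merge(
    loinc_concepts, how='left', left_on='loinc_code', right_on='CODE'
).drop(columns='CODE')
print(lab_to_cui)
```

The left join keeps lab items that have no LOINC code, so unmapped items remain visible rather than silently dropping out.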