# SNOMED

SNOMED CT is a standardised clinical terminology consisting of more than 350,000 unique concepts. It is owned, maintained and distributed by SNOMED International.

Please visit and explore https://www.snomed.org/ to find out further information about the various SNOMED CT products and services which they offer.

--------

## What is SNOMED CT?

SNOMED CT is a clinical terminology containing concepts with unique meanings and formal logic-based definitions, organised into hierarchies. For further information please see: https://confluence.ihtsdotools.org/display/DOCSTART/4.+SNOMED+CT+Basics

## SNOMED CT Design
SNOMED CT content is represented by 3 main types of components:

- Concepts, which represent clinical meanings and are organised into hierarchies
- Descriptions, which link appropriate human-readable terms to concepts
- Relationships, which link each concept to other related concepts

It also contains mappings to classification systems such as:
- ICD (International Classification of Diseases)
- OPCS (Office of Population Censuses and Surveys) (SNOMED UK extension only)
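
The three component types listed above can be pictured as simple records. The sketch below is purely illustrative: the class names, field names and the `PARENT_SCTID` placeholder are assumptions made for this example and do not mirror the actual RF2 file columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    sctid: str  # unique identifier of the clinical meaning


@dataclass(frozen=True)
class Description:
    sctid: str    # the concept this term describes
    term: str     # human-readable text
    is_fsn: bool  # True for the fully specified name


@dataclass(frozen=True)
class Relationship:
    source: str       # concept the relationship starts from
    type_id: str      # SCTID of the relationship type, e.g. "116680003" (Is a)
    destination: str  # concept the relationship points to


def terms_for(concept: Concept, descriptions: List[Description]) -> List[str]:
    """Return every term attached to the given concept."""
    return [d.term for d in descriptions if d.sctid == concept.sctid]


concept = Concept("101009")
descriptions = [Description("101009", "Quilonia ethiopica (organism)", True)]
# PARENT_SCTID is a placeholder, not a real identifier.
relationship = Relationship("101009", "116680003", "PARENT_SCTID")

print(terms_for(concept, descriptions))
```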

---------

<a name="access_to_snomed_ct"></a>
## Access to SNOMED CT release files

You may download SNOMED CT from your Member country’s designated website. The use of SNOMED CT in Member countries is free. Follow this [link](https://www.snomed.org/our-stakeholders/members) to find out whether your country is a Member and to find directions to where your national SNOMED CT distribution is held.

For example:
* UK -> [NHS TRUD](https://isd.digital.nhs.uk/trud3/user/guest/group/0/home)
* US -> [NIH National Library of Medicine](https://www.nlm.nih.gov/healthit/snomedct/international.html). Alternative clinical terminologies such as UMLS can also be found here.


The following steps use the service provided by SNOMED International for organizations and individuals in non-Member countries to request access to, and use of, the International Release of SNOMED CT.

__To access SNOMED CT files from non-Member countries:__

1.   Please visit the SNOMED [Member Licensing and Distribution Service](https://mlds.ihtsdotools.org/#/landing) and read the terms and conditions for use.

2.   Login or Register for an account and wait to be granted access.

3.   Once you have been granted access, log in, visit the ["Release Packages"](https://mlds.ihtsdotools.org/#/viewReleases) tab and retrieve the release of SNOMED CT that you would like to have. Alternatively, for the international SNOMED release, simply visit the [International releases](https://mlds.ihtsdotools.org/#/viewReleases/viewRelease/167).

----------

# Part 1: Preprocessing SNOMED CT for MedCAT

Once you have downloaded a SNOMED release of interest, store the zipped folder containing the release in the current colab working directory.

The file name should look like: `SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z.zip`


### Install and import required packages

In [None]:
# Install medcat
!pip install --upgrade medcat

In [None]:
from medcat.utils.preprocess_snomed import Snomed


### Load the data
Please see the section: [Access to SNOMED CT release files](#access_to_snomed_ct) for how to retrieve the zipped SNOMED CT release.

In [None]:
# Assign a path to the zipped SNOMED CT release download. (skip this step if the folder is not zipped)
snomed_path = "SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z.zip"  # Enter your zipped Snomed folder here


In [None]:
# Curly braces make IPython substitute the Python variable into the shell command
!unzip {snomed_path}
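
If the `unzip` command is not available in your environment, the same step can be done with Python's standard library. This is a minimal sketch; the helper name `extract_release` is made up for this example.

```python
import zipfile
from typing import List


def extract_release(zip_path: str, target_dir: str = ".") -> List[str]:
    """Extract a zipped release into target_dir.

    Returns the sorted top-level names found in the archive, which is
    normally the single release folder.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted({name.split("/")[0] for name in archive.namelist()})


# Example call, using the variable defined in the cell above:
# extract_release(snomed_path)
```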

### Preprocess the release for MedCAT

In [None]:
# Initialise
snomed_filename = "SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z"  # The unzipped SNOMED CT folder
snomed = Snomed(snomed_filename)

In [None]:
### Skip this step if your version of snomed is not the UK extension released >2021.
### Note: this step will only work with MedCAT v1.2.7+

# snomed.uk_ext = True

#### Create a SNOMED DataFrame

We first preprocess SNOMED to fit the following format:


|cui|name|ontologies|name_status|description_type_ids|type_ids|
|--|--|--|:--:|:--:|--|
|101009|Quilonia ethiopica (organism)|SNOMED|P|organism|81102976|
|...|...|...|...|...|...|

`cui` - The concept unique identifier, this is simply the `SCTID`.

`name` - The name of the concept. The status of the name is given in `name_status`.

`ontologies` - Always SNOMED. Alternatively you can change it to your specific edition.

`name_status` - The Fully Specified Name (FSN) is denoted with a `P` (Primary Name). Each concept must have exactly one Primary Name, and these should be unique across all SCTIDs/CUIs to avoid confusion. A synonym or any other description type is denoted with an `A` (Alternative Name). The alternative names can be enriched with all possible names and abbreviations for a concept of interest.

`description_type_ids` - These are processed to be the Semantic Tags of the concept.

`type_ids` - A short numeric hash of the semantic tag (`81102976` for `organism` in the row above).
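
To make the last two columns more concrete, here is a rough sketch of how a semantic tag could be pulled out of an FSN and turned into a numeric hash. The regular expression, the choice of SHA-256 and the number of digits are assumptions made for illustration, so the values produced here are not guaranteed to match the ones MedCAT generates.

```python
import hashlib
import re


def semantic_tag(fsn: str) -> str:
    """Return the text inside the trailing parentheses of an FSN, or ''."""
    match = re.search(r"\(([^()]*)\)\s*$", fsn)
    return match.group(1) if match else ""


def tag_hash(tag: str, digits: int = 8) -> int:
    """Hash a semantic tag down to at most `digits` decimal digits.

    Assumption: SHA-256 reduced modulo 10**digits; the real scheme may differ.
    """
    digest = hashlib.sha256(tag.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10 ** digits


tag = semantic_tag("Quilonia ethiopica (organism)")
print(tag, tag_hash(tag))
```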




In [None]:
# Create SNOMED DataFrame 
df = snomed.to_concept_df()

In [None]:
df.head()

In [None]:
# inspect
df[df['cui'] == '101009']

In [None]:
# Optional - Create a SCTID to FSN dictionary
primary_names_only = df[df["name_status"] == 'P']
sctid2name = dict(zip(primary_names_only['cui'], primary_names_only['name']))
del primary_names_only

In [None]:
sctid2name['101009']
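
The same idea can be used to collect the alternative names of each concept. The sketch below works on plain row dictionaries, such as those returned by `df.to_dict("records")`; the rows shown are hand-written toy data and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_sctid2synonyms(rows: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Group all names marked 'A' (Alternative Name) by their cui."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row["name_status"] == "A":
            grouped[row["cui"]].append(row["name"])
    return dict(grouped)


# Toy rows in the same layout as the DataFrame above (values are illustrative).
toy_rows = [
    {"cui": "101009", "name": "Quilonia ethiopica (organism)", "name_status": "P"},
    {"cui": "101009", "name": "Quilonia ethiopica", "name_status": "A"},
    {"cui": "EXAMPLE", "name": "Example concept (finding)", "name_status": "P"},
]

sctid2synonyms = build_sctid2synonyms(toy_rows)
print(sctid2synonyms)  # {'101009': ['Quilonia ethiopica']}
```

With the real data, `build_sctid2synonyms(df.to_dict("records"))` would be the equivalent call.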

#### SNOMED Relationships

In [None]:
all_snomed_relationships = snomed.list_all_relationships()

In [None]:
# List of the SCTID of all snomed relationships
all_snomed_relationships

In [None]:
# Use the SCTID-to-name dictionary to look up the FSN (fully specified name) of each relationship type
for sctid in all_snomed_relationships:
    print(sctid2name[sctid])

In [None]:
# save a specific relationship to json
# In this example we save the "Is a (attribute)" hierarchical relationship.
snomed.relationship2json("116680003", "ISA_relationship.json")
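
Once a hierarchy has been saved, it can be traversed to find everything beneath a concept. The sketch below assumes the JSON file maps each parent SCTID to a list of its child SCTIDs; please check this against your own file before relying on it. The identifiers in `toy_isa` are placeholders.

```python
import json
from collections import deque
from typing import Dict, List, Set


def descendants(parent2children: Dict[str, List[str]], root: str) -> Set[str]:
    """Breadth-first walk returning every concept below `root`."""
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in parent2children.get(node, []):
            if child not in seen:  # a concept can have several parents
                seen.add(child)
                queue.append(child)
    return seen


# With the real file (assumed structure: parent -> list of children):
# with open("ISA_relationship.json") as fh:
#     parent2children = json.load(fh)

toy_isa = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(sorted(descendants(toy_isa, "A")))  # ['B', 'C', 'D']
```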

#### Mappings to inbuilt external terminologies 

Create dictionary maps that can be used to add extra information to the MedCAT concept database.

##### ICD-10
For the correct official SNOMED to ICD-10 mapping methodology, read more on Map Blocks, Map Groups and Map Priorities.

In [None]:
# ICD-10
icd_dict = snomed.map_snomed2icd10()

In [None]:
top10keys = list(icd_dict)[:10]
for k in top10keys:
    print(k, ":", icd_dict[k])

In [None]:
# NOTE: The method below may be present in later versions of medcat
def get_direct_refset_mapping(in_dict: dict) -> dict:
    ret_dict = dict()
    for k, vals in in_dict.items():
        # sort such that highest priority values are first
        svals = sorted(vals, key=lambda el: el['mapPriority'], reverse=True)
        # only keep the code / CUI
        ret_dict[k] = [v['code'] for v in svals]
    return ret_dict
sctid2icd10 = get_direct_refset_mapping(icd_dict)

In [None]:
# To view the SNOMED to ICD-10 Map structure
sctid2icd10
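
It is sometimes handy to look things up in the other direction, from a target code back to the SNOMED concepts that map to it. The following is a small sketch of such an inversion; the function name is hypothetical and the keys and codes in `toy_map` are placeholders rather than real identifiers.

```python
from collections import defaultdict
from typing import Dict, List


def invert_mapping(sctid2codes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn {sctid: [codes]} into {code: [sctids]} without duplicates."""
    code2sctids: Dict[str, List[str]] = defaultdict(list)
    for sctid, codes in sctid2codes.items():
        for code in codes:
            if sctid not in code2sctids[code]:
                code2sctids[code].append(sctid)
    return dict(code2sctids)


toy_map = {"SCTID_1": ["CODE_A", "CODE_B"], "SCTID_2": ["CODE_A"]}
print(invert_mapping(toy_map))
# {'CODE_A': ['SCTID_1', 'SCTID_2'], 'CODE_B': ['SCTID_1']}
```

With the real data, `invert_mapping(sctid2icd10)` would give an ICD-10 to SCTID lookup.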

##### OPCS
Office of Population Censuses and Surveys


__Note:__ only the SNOMED UK extension edition contains this information. Skip this step if your version is not a UK extension.

In [None]:
opcs_dict = snomed.map_snomed2opcs4()

In [None]:
top10keys_opcs = list(opcs_dict)[:10]
for k in top10keys_opcs:
    print(k, ":", opcs_dict[k])

### Save for MedCAT

In [None]:
# Save to CSV for medcat CDB creation
df.to_csv("preprocessed_snomed.csv", index=False)
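
Before building a CDB it can be worth checking that every concept in the CSV has exactly one Primary Name, as described in the column notes above. This is a minimal sketch using only the standard library; the helper names are made up for this example and the rows are toy data.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read the preprocessed CSV into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def cuis_without_single_primary(rows: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the cuis that do not have exactly one 'P' row."""
    primary_counts: Counter = Counter()
    all_cuis = set()
    for row in rows:
        all_cuis.add(row["cui"])
        if row["name_status"] == "P":
            primary_counts[row["cui"]] += 1
    return sorted(cui for cui in all_cuis if primary_counts[cui] != 1)


# With the real file: cuis_without_single_primary(load_rows("preprocessed_snomed.csv"))
toy_rows = [
    {"cui": "1", "name_status": "P"},
    {"cui": "1", "name_status": "A"},
    {"cui": "2", "name_status": "A"},  # no primary name
    {"cui": "3", "name_status": "P"},
    {"cui": "3", "name_status": "P"},  # two primary names
]
print(cuis_without_single_primary(toy_rows))  # ['2', '3']
```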

--------

# Part 2: Create a MedCAT CDB using SNOMED CT release files


These steps are also covered in the tutorial: [Part 3.1 Building a Concept Database and Vocabulary](https://colab.research.google.com/drive/1s1QXJ2E76sZLm5P0Lremw8-kWWXxX_w2#scrollTo=KByaUPYNk7gk)

In [None]:
# Get the scispacy model.
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_md-0.4.0.tar.gz

**Restart the runtime if on colab, sometimes necessary after installing models**

In [None]:
# Import required packages
from medcat.cdb import CDB
from medcat.config import Config
from medcat.cdb_maker import CDBMaker

#### Create concept database (cdb)

In [None]:
# First initialise the default configuration
config = Config()
config.general['spacy_model'] = 'en_core_sci_md'
maker = CDBMaker(config)

In [None]:
# Create a list containing the CSV files that will be used to build our CDB
csv_path = ['preprocessed_snomed.csv']

# Create your CDB
cdb = maker.prepare_csvs(csv_path, full_build=True)

### Inspect your cdb

In [None]:
print(cdb.name2cuis['epilepsy'])

In [None]:
print(cdb.cui2preferred_name['84757009'])

In [None]:
print(cdb.cui2names['84757009'])

#### Enrich with extra information and mapping

Mapping was created in [Mappings to inbuilt external terminologies](https://colab.research.google.com/drive/1yesqjMQwQH20Kl9w7siRGVaSWU0uI84W#scrollTo=Mappings_to_inbuilt_external_terminologies).
Here we use [ICD-10](https://colab.research.google.com/drive/1yesqjMQwQH20Kl9w7siRGVaSWU0uI84W#scrollTo=ICD_10) as an example.

In [None]:
cdb.addl_info['cui2icd10'] = sctid2icd10

### Save your new SNOMED cdb

__Tip:__ it is good practice to include the SNOMED release edition in the file name.

In [None]:
cdb.save("SNOMED_cdb.dat")
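
One possible way to follow the tip is to derive the file name from the release folder name. The sketch below assumes the release name contains a timestamp of the form `YYYYMMDDTHHMMSSZ`, as in the example used throughout this notebook; the helper name and the fallback suffix are hypothetical.

```python
import re


def cdb_filename(release_name: str, prefix: str = "SNOMED_cdb") -> str:
    """Build a CDB file name that carries the release date."""
    match = re.search(r"(\d{8})T\d{6}Z", release_name)
    suffix = match.group(1) if match else "unknown_release"
    return f"{prefix}_{suffix}.dat"


print(cdb_filename("SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z"))
# SNOMED_cdb_20210131.dat
```

The result could then be passed to `cdb.save(...)` in place of the fixed name.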