# SNOMED

SNOMED CT is a standardised clinical terminology consisting of more than 350,000 unique concepts. It is owned, maintained and distributed by SNOMED International.

Please visit and explore https://www.snomed.org/ to find out further information about the various SNOMED CT products and services which they offer.

-------

UK Edition files can be found via [NHS TRUD](https://isd.digital.nhs.uk/)

Download files via API coming soon...


--------

All raw files from SNOMED should be placed in the local directory [here](data/snomed)



# Part 1: Preprocessing SNOMED CT for MedCAT

Once you have downloaded a SNOMED release of interest, store the zipped folder containing that release in the current Colab working directory.

The folder name should look like: `SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z.zip`


### Import required packages

In [None]:
import zipfile
import json
from medcat.utils.preprocess_snomed import Snomed

### Load the data
Please see the section: [Access to SNOMED CT release files](#access_to_snomed_ct) for how to retrieve the zipped SNOMED CT release.

In [None]:
# Assign a path to the zipped SNOMED CT release download. (skip this step if the folder is not zipped)
snomed_path = "SnomedCT_InternationalRF2_PRODUCTION_20230131T120000Z.zip"  # Enter your zipped Snomed folder here
snomed_folder = snomed_path[:-4]  # The unzipped SNOMED CT folder path

In [None]:
with zipfile.ZipFile(snomed_path, 'r') as zip_ref:
    zip_ref.extractall(snomed_folder)
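
If the release may already be unzipped, the extraction step can be guarded so that it is safe to re-run. The following is a minimal sketch using only the standard library; `extract_release` is a hypothetical helper name and is not part of MedCAT.

```python
import os
import zipfile


def extract_release(zip_path: str) -> str:
    """Unzip a release next to the archive and return the folder path.

    Hypothetical helper for illustration; not part of MedCAT.
    """
    # Mirror the notebook's convention: folder name = archive name minus ".zip"
    folder = zip_path[:-4] if zip_path.lower().endswith(".zip") else zip_path
    if os.path.isdir(folder):
        # Already extracted, or the path pointed at a folder to begin with
        return folder
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a readable zip archive")
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(folder)
    return folder
```

With this in place, `snomed_folder = extract_release(snomed_path)` would cover both the zipped and the already-unzipped case.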

### Preprocess the release for MedCAT

In [None]:
# Initialise
snomed = Snomed(snomed_folder)

In [None]:
### Set this to True only if your release is the UK extension released after 2021; otherwise leave it as False.
### Note: this step will only work with MedCAT v1.2.7+

snomed.uk_ext = False

#### Create a SNOMED DataFrame

We first preprocess SNOMED to fit the following format:


|cui|name|ontologies|name_status|description_type_ids|type_ids|
|--|--|--|:--:|:--:|--|
|101009|Quilonia ethiopica (organism)|SNOMED|P|organism|81102976|
.
.
.

`cui` - The concept unique identifier, this is simply the `SCTID`.

`name` - The name of the concept. The status of the name is given in `name_status`.

`ontologies` - Always SNOMED. Alternatively you can change it to your specific edition.

`name_status` - The Fully Specified Name (FSN) is denoted with a `P` - Primary Name. Each concept must be assigned exactly one Primary Name, and Primary Names should be unique across all SCTIDs/cuis to avoid confusion. A synonym or other description type is represented as an `A` - Alternative Name. This can be enriched with all possible names and abbreviations for a concept of interest.

`description_type_ids` - These are processed to be the Semantic Tags of the concept.

`type_ids` - A short numeric hash of the Semantic Tag (8 digits in the example above).
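
To illustrate how a Semantic Tag relates to the FSN, the sketch below pulls the trailing parenthesised tag out of a name and derives a short numeric identifier from it. Both `semantic_tag` and `tag_hash` are hypothetical helpers, and the hashing scheme (SHA-256 reduced modulo 10^8) is an assumption made for illustration; it is not guaranteed to reproduce the `type_ids` that MedCAT generates.

```python
import hashlib
import re

# Matches the last "(...)" group at the very end of a name
_TAG_PATTERN = re.compile(r"\(([^()]+)\)\s*$")


def semantic_tag(fsn: str) -> str:
    """Return the trailing parenthesised tag of an FSN, or '' if there is none."""
    match = _TAG_PATTERN.search(fsn)
    return match.group(1) if match else ""


def tag_hash(tag: str, digits: int = 8) -> str:
    """Derive a fixed-width numeric ID from a tag (assumed scheme, see lead-in)."""
    value = int(hashlib.sha256(tag.encode("utf-8")).hexdigest(), 16) % 10**digits
    return str(value).zfill(digits)


print(semantic_tag("Quilonia ethiopica (organism)"))  # organism
```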




In [None]:
# Create SNOMED DataFrame 
df = snomed.to_concept_df()

In [None]:
df.head()

In [None]:
# inspect
df[df['cui'] == '101009']

In [None]:
# Optional - Create a SCTID to FSN dictionary
primary_names_only = df[df["name_status"] == 'P']
sctid2name = dict(zip(primary_names_only['cui'], primary_names_only['name']))
del primary_names_only

In [None]:
# Test with example SCTID
sctid2name['101009']
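
Since each concept should carry exactly one Primary Name, it can be worth checking this before relying on the dictionary. The sketch below works on plain `(cui, name, name_status)` tuples as a stand-in for the DataFrame rows; `cuis_without_single_primary` is a hypothetical helper, and the second concept in the example is made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cuis_without_single_primary(rows: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return the cuis that do not have exactly one 'P' (Primary Name) row."""
    rows = list(rows)
    primary_counts = Counter(cui for cui, _name, status in rows if status == "P")
    all_cuis = {cui for cui, _name, _status in rows}
    return sorted(cui for cui in all_cuis if primary_counts[cui] != 1)


example_rows = [
    ("101009", "Quilonia ethiopica (organism)", "P"),
    ("101009", "Quilonia ethiopica", "A"),
    ("999", "Made-up concept", "A"),  # hypothetical row with no Primary Name
]
print(cuis_without_single_primary(example_rows))  # ['999']
```

With the real DataFrame, the rows could be supplied as `df[['cui', 'name', 'name_status']].itertuples(index=False)`.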

#### SNOMED Relationships

In [None]:
all_snomed_relationships = snomed.list_all_relationships()

In [None]:
# List of the SCTID of all snomed relationships
all_snomed_relationships

In [None]:
# Use the SCTID-to-name dictionary to inspect the FSNs (fully specified names) of the relationships:
for sctid in all_snomed_relationships:
    print(sctid2name[sctid])

#### Classification maps to inbuilt external terminologies 

The UK maps provide a one directional link from SNOMED CT to OPCS-4 and ICD-10. The international edition will only link to ICD-10.

They are compiled to reflect the national clinical coding standards and aid the application of the three dimensions of coding accuracy:

- Individual codes
- Totality of codes; and
- Sequencing of codes

Four different types of map are provided to accommodate the different circumstances that may influence ICD-10/OPCS-4 code assignment. 


|Map Type 1|Map Type 2|Map Type 3|Map Type 4|
|---|---|---|---|
|Links a single SNOMED CT concept to a single classification code to represent the clinical meaning of the concept. |Links a single SNOMED CT concept to a combination of classification codes which collectively represents the meaning of the SNOMED CT concept. <br/><br/> Map Type 1 and 2 may be generated automatically within systems, allowing the coding expert to devote time to the validation of more complex maps.|Links a single SNOMED CT concept to a choice of classification codes (default and alternative targets). Validation involves a coding expert using the additional detail found within the medical record, applying the rules, conventions and standards of the classifications, and manually selecting the final classification code or codes from a list of alternative targets.|Links a single SNOMED CT concept to a choice of classifications maps. Each choice of map may contain a single, combination or choice of target codes. Final selection will be informed by additional detail within the medical record and application of classification expertise by the coder.|


##### Map Blocks, Map Groups and Map Priorities

Each classification map will contain at least one map block, one map group and one map priority. Map Blocks, Map Groups and Map Priorities are numbered sequentially, starting at 1.

- A __Map Block__ signifies a code or string of codes that represent the SNOMED CT concept’s fully specified name (FSN). Multiple Map Blocks will be included within the map if it is necessary to represent the concept in multiple ways (e.g. sequencing of dagger and asterisk codes).
- A __Map Group__ signifies each individual target code within a Map Block. Each individual code within a Map Block will be allocated to its own Map Group unless it is an Alternative code. Where multiple codes are required, the Map Groups build in any required classification sequencing rules.
- A __Map Priority__ signifies the priority of the code within the group, based on the order in which the codes are presented within mapping tables, to enable the information to be read by computer software systems. In a complex map, where alternative targets are provided within a block or a group, an ALTERNATIVE target code is always listed before the TRUE target code.
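
As an illustration of how these three numbers order a map, the sketch below sorts a list of map rows by block, then group, then priority, and collects the target codes per block. The rows and codes are made up for the example, and `codes_by_block` is a hypothetical helper; it does not apply any map rules or advice.

```python
from collections import defaultdict
from typing import Dict, List


def codes_by_block(rows: List[dict]) -> Dict[int, List[str]]:
    """Group target codes per Map Block, ordered by Map Group then Map Priority."""
    ordered = sorted(
        rows,
        # Rows without a mapBlock are treated as belonging to block 1
        key=lambda r: (int(r.get("mapBlock", 1)), int(r["mapGroup"]), int(r["mapPriority"])),
    )
    blocks = defaultdict(list)
    for row in ordered:
        blocks[int(row.get("mapBlock", 1))].append(row["code"])
    return dict(blocks)


# Made-up rows, deliberately out of order
rows = [
    {"code": "CODE_B", "mapBlock": 1, "mapGroup": 2, "mapPriority": 1},
    {"code": "CODE_A", "mapBlock": 1, "mapGroup": 1, "mapPriority": 1},
    {"code": "CODE_C", "mapBlock": 2, "mapGroup": 1, "mapPriority": 1},
]
print(codes_by_block(rows))  # {1: ['CODE_A', 'CODE_B'], 2: ['CODE_C']}
```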



Let's inspect the SNOMED to ICD-10 map and build a dictionary from it to add to the additional information section of the MedCAT concept database (cdb).

##### ICD-10
For the official SNOMED to ICD-10 mapping methodology, see the section on Map Blocks, Map Groups and Map Priorities above.

In [None]:
# ICD-10
icd_df = snomed.map_snomed2icd10()

In [None]:
icd_df.head()

In [None]:
# drop codes with no mapping
icd_df = icd_df[icd_df['mapTarget']!='']

In [None]:
# Build a dictionary from each SCTID to the list of its ICD-10 map entries
sctid2icd10 = icd_df.groupby('referencedComponentId').apply(
    lambda group: [{'code': row['mapTarget'],
                    'mapGroup': row['mapGroup'],
                    'mapPriority': row['mapPriority'],
                    'mapRule': row['mapRule'],
                    'mapAdvice': row['mapAdvice']} for _, row in group.iterrows()]
).to_dict()

In [None]:
# To view the SNOMED to ICD-10 Map structure.
# Each SCTID maps to a list of dicts with the keys 'code', 'mapGroup', 'mapPriority', 'mapRule' and 'mapAdvice'.
sctid2icd10['44054006']
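
For quick lookups where only the target codes matter, the nested structure can be flattened to one ordered list of unique codes per SCTID. This is a minimal sketch with made-up identifiers and codes; `flatten_codes` is a hypothetical helper, and it deliberately ignores `mapRule` and `mapAdvice`, so it is not a substitute for the official mapping methodology.

```python
from typing import Dict, List


def flatten_codes(mapping: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Reduce each SCTID's map entries to a de-duplicated, ordered list of codes."""
    flat = {}
    for sctid, entries in mapping.items():
        ordered = sorted(entries, key=lambda e: (int(e["mapGroup"]), int(e["mapPriority"])))
        # dict.fromkeys keeps first-seen order while dropping duplicates
        flat[sctid] = list(dict.fromkeys(e["code"] for e in ordered))
    return flat


# Made-up example entries
toy_map = {
    "0000001": [
        {"code": "X2", "mapGroup": "1", "mapPriority": "2"},
        {"code": "X1", "mapGroup": "1", "mapPriority": "1"},
        {"code": "X1", "mapGroup": "2", "mapPriority": "1"},
    ]
}
print(flatten_codes(toy_map))  # {'0000001': ['X1', 'X2']}
```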

##### OPCS
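OPCS-4 is the OPCS Classification of Interventions and Procedures; OPCS stands for the Office of Population Censuses and Surveys.


__Note:__ Only the SNOMED UK extension edition contains this information. Skip this section if your release is not a UK extension.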

In [None]:
opcs_df = snomed.map_snomed2opcs4()

In [None]:
opcs_df.head()

In [None]:
opcs_df['refsetId'].unique()  # Notice that there are two refset IDs
# SCTID '999002271000000101' represents ICD-10 codes and SCTID '1126441000000105' represents OPCS-4
# Filtering by '999002271000000101' will also show more ICD-10 codes. Explore why; this is a quirk of the UK extension.


In [None]:
# Filter for just OPCS4
opcs_df = opcs_df[opcs_df['refsetId']=='1126441000000105']

In [None]:
# Build a dictionary from each SCTID to the list of its OPCS-4 map entries
sctid2opcs4 = opcs_df.groupby('referencedComponentId').apply(
    lambda group: [{'code': row['mapTarget'],
                    'mapGroup': row['mapGroup'],
                    'mapPriority': row['mapPriority'],
                    'mapBlock': row['mapBlock'],
                    'mapAdvice': row['mapAdvice']} for _, row in group.iterrows()]
).to_dict()

### Save for MedCAT

In [None]:
# Save to CSV for medcat CDB creation
outfile = snomed_folder.replace('.', '_')+'.csv'
df.to_csv(outfile, index=False)
print(f'File saved for CDB creation at {outfile}')
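
Before building the CDB, a quick check that the saved CSV carries the columns described in the table earlier can catch an incomplete export. This is a sketch only: the column list is taken from that table, `missing_columns` is a hypothetical helper, and whether `CDBMaker` strictly requires every column is not verified here.

```python
import csv
from typing import List

# Column names as listed in the SNOMED DataFrame format table
EXPECTED_COLUMNS = ["cui", "name", "ontologies", "name_status",
                    "description_type_ids", "type_ids"]


def missing_columns(csv_file: str) -> List[str]:
    """Return the expected columns that are absent from the CSV header."""
    with open(csv_file, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return [col for col in EXPECTED_COLUMNS if col not in header]
```

For example, `missing_columns(outfile)` returning an empty list would indicate that all six columns are present.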

In [None]:
# Save a specific relationship to json
# In the example we save the "IS a (attribute)" hierarchical relationship.
snomed.relationship2json("116680003", "ISA_relationship.json")

In [None]:
# Save mappings
with open("sctid2icd10.json", "w") as f:
    json.dump(sctid2icd10, f)
# sctid2opcs4 only exists if the UK-only OPCS step above was run
with open("sctid2opcs4.json", "w") as f:
    json.dump(sctid2opcs4, f)
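
The saved mappings can be reloaded later without re-running the preprocessing. The sketch below shows a round trip through JSON on a made-up entry written to a temporary directory; `save_mapping` and `load_mapping` are hypothetical helper names. Note that JSON object keys are always strings, which matches the string SCTIDs used as keys here.

```python
import json
import os
import tempfile


def save_mapping(mapping: dict, path: str) -> None:
    """Write a mapping dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle)


def load_mapping(path: str) -> dict:
    """Read a mapping dictionary back from a JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Made-up entry, written to a temporary directory for the demonstration
example = {"0000001": [{"code": "CODE_A", "mapGroup": "1", "mapPriority": "1"}]}
example_path = os.path.join(tempfile.mkdtemp(), "example_map.json")
save_mapping(example, example_path)
reloaded = load_mapping(example_path)
```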

--------

# Part 2: Create a MedCAT CDB using SNOMED CT release files


These steps are also available in [create_cdb.py](../../medcat/1_create_model/create_cdb/create_cdb.py).

In [None]:
# Import required packages
from medcat.cdb import CDB
from medcat.config import Config
from medcat.cdb_maker import CDBMaker

#### Create concept database (cdb)

In [None]:
# First initialise the default configuration
config = Config()
config.general['spacy_model'] = 'en_core_web_md'
maker = CDBMaker(config)

In [None]:
# Create an array containing CSV files that will be used to build our CDB
csv_path = [outfile]

# Create your CDB
## This step can take up to an hour
cdb = maker.prepare_csvs(csv_path, full_build=True)

### Inspect your cdb

In [None]:
print(cdb.name2cuis['epilepsy'])

In [None]:
print(cdb.cui2preferred_name['84757009'])

In [None]:
print(cdb.cui2names['84757009'])

#### Enrich with extra information and mapping

The mappings were created in [Mappings to inbuilt external terminologies](https://colab.research.google.com/drive/1yesqjMQwQH20Kl9w7siRGVaSWU0uI84W#scrollTo=Mappings_to_inbuilt_external_terminologies).
Here we use [ICD-10](https://colab.research.google.com/drive/1yesqjMQwQH20Kl9w7siRGVaSWU0uI84W#scrollTo=ICD_10) as an example.

In [None]:
cdb.addl_info['cui2icd10'] = sctid2icd10
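
Once the mapping is stored, ICD-10 codes can be looked up per concept. The sketch below uses a plain dictionary as a stand-in for `cdb.addl_info`, with a made-up entry; `icd10_for_cui` is a hypothetical helper, not a MedCAT function.

```python
from typing import Dict, List


def icd10_for_cui(addl_info: Dict[str, dict], cui: str) -> List[str]:
    """Return the ICD-10 target codes stored for a cui, or [] if none are stored."""
    entries = addl_info.get("cui2icd10", {}).get(cui, [])
    return [entry["code"] for entry in entries]


# Stand-in for cdb.addl_info with a made-up entry
addl_info = {"cui2icd10": {"0000001": [{"code": "CODE_A"}]}}
print(icd10_for_cui(addl_info, "0000001"))  # ['CODE_A']
print(icd10_for_cui(addl_info, "unknown"))  # []
```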

### Save your new SNOMED cdb

__Tip:__ It is good practice to include the SNOMED release edition in the file name.

In [None]:
model_path = '../../models/cdb/'
cdb.save(model_path+f'{outfile[:-4]}.dat')
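
If the target directory might not exist yet, the output path can be built with `pathlib` first. This is a minimal sketch; `cdb_output_path` is a hypothetical helper that creates the directory and derives the `.dat` file name from the CSV name. Unlike the cell above, it drops any directory components from the CSV path.

```python
from pathlib import Path


def cdb_output_path(model_dir: str, csv_name: str) -> Path:
    """Create model_dir if needed and return '<csv stem>.dat' inside it."""
    directory = Path(model_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / (Path(csv_name).stem + ".dat")
```

The result could then be passed on as `cdb.save(str(cdb_output_path(model_path, outfile)))`.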