## NHS Master Data Management (in England)

It is quite common to load master data files from [ODS Data Search and Export](https://digital.nhs.uk/services/organisation-data-service/data-search-and-export/csv-downloads) into a computer system. The identifiers used for GPs and organisations help identify these entities across different systems and health providers.

NHS England provides several APIs for doing this:

- [Organisation Data Terminology - FHIR API](https://digital.nhs.uk/developer/api-catalogue/organisation-data-terminology) which allows you to search for organisations
- [Spine Directory Service - LDAP API](https://digital.nhs.uk/developer/api-catalogue/spine-directory-service-ldap) which allows search on a wide set of MDM entities and includes most of the entities from ODS.

The structure of these entities in FHIR, ODS and SDS is very similar. This diagram is from the [HL7 FHIR Administration Module](https://hl7.org/fhir/R4/administration-module.html).

![Alt text](https://hl7.org/fhir/R4/administration-module-prov-dir.png)

### Care Directory Service

In this guide we are aiming to produce a FHIR API following [IHE Mobile Care Services Discovery (mCSD)](https://profiles.ihe.net/ITI/mCSD/index.html). We won't get to a complete implementation, as the health services are available in a variety of `directory of services` APIs, such as:

- [Directory of Healthcare Services (Service Search) API](https://digital.nhs.uk/developer/api-catalogue/directory-of-healthcare-services)
- [Electronic Transmission of Prescriptions Web Services - SOAP API](https://digital.nhs.uk/developer/api-catalogue/electronic-transmission-of-prescriptions-web-services-soap)




### Load GP Practitioners (egpcur)

The general idea behind this is that we want to be able to do some basic queries on ODS data. For example, we may want a list of GPs who work at a particular practice.

In [128]:
import requests
from zipfile import ZipFile
from io import BytesIO
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (X11; Windows; Windows x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36'}

url = 'https://files.digital.nhs.uk/assets/ods/current/egpcur.zip'
response = requests.get(url, headers=headers, timeout=120)
response.raise_for_status()  # Raise an exception for bad status codes

myzip = ZipFile(BytesIO(response.content))
myzip.namelist()
myzip.extractall('ZIP')

egpcur = pd.read_csv('ZIP/egpcur.csv', header=None, index_col=False, names=["GMP","Practitioner_Name",3,4,"AddressLine_1","AddressLine_2","AddressLine_3","AddressLine_4","AddressLine_5","PostCode",10,11,12,13,"ODS",15,16,"PhoneNumber",18,19,20,21,22,23,24,25,26], dtype={'AddressLine_5': 'S20'})

egpcur

Unnamed: 0,GMP,Practitioner_Name,3,4,AddressLine_1,AddressLine_2,AddressLine_3,AddressLine_4,AddressLine_5,PostCode,...,PhoneNumber,18,19,20,21,22,23,24,25,26
0,G0102005,ALLEN EB,Y11,QAL,"FIRCROFT, LONDON ROAD",ENGLEFIELD GREEN,EGHAM,SURREY,b'',TW20 0BS,...,,,,,1,,,,,
1,G0102926,ANDERSON MG,Y61,QUE,LENSFIELD MEDICAL PRAC.,48 LENSFIELD ROAD,CAMBRIDGE,CAMBRIDGESHIRE,b'',CB2 1EH,...,01223 651020,,,,1,,06H,,,
2,G0105912,ADLER S,Y56,QMJ,682 FINCHLEY ROAD,GOLDERS GREEN,LONDON,,b'',NW11 7NP,...,020 84559994,,,,1,,93C,,,
3,G0107031,ATTWOOD DC,Y62,QOP,GREAT LEVER HEALTH CENTRE,"RUPERT STREET,GREAT LEVER",BOLTON,LANCASHIRE,b'',BL3 6RN,...,01204 462141,,,,1,,00T,,,
4,G0107725,ALEXANDER PJ,Y01,QDF,10 WEST END,SWANLAND,HUMBERSIDE,,b'',HU14 3PE,...,0482 633570,,,,1,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
123768,G9996043,UNIDENTIFIED GPS,W00,Q99,NORTH WALES HA,PRESWYLFA,HENDY ROAD,MOLD FLINTSHIRE,b'',CH7 1PZ,...,,,,,1,,,,,
123769,G9996050,UNIDENTIFIED GPS,W00,Q99,MORGANNWG HA,41 HIGH STREET,SWANSEA,WEST GLAMORGAN,b'',SA1 1LT,...,,,,,1,,,,,
123770,G9996067,COMMITTEES LOCUM,W00,QW3,DEPUTISING SERVICES,POWYS,,,b'',,...,,,,,1,,,,,
123771,G9996074,COMMITTEES LOCUM,W00,QW2,DEPUTISING SERVICES,SOUTH-GLAMORGAN,,,b'',,...,,,,,1,,,,,
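
If writing the extracted file to disk is not wanted, the archive can also be read entirely in memory. The following is a minimal sketch using only the standard library rather than pandas. `read_ods_rows` and `gps_at_practice` are hypothetical helper names, the column positions are taken from the `names` list used above (0 for the GMP code, 14 for the parent practice code), and the UTF-8 encoding is an assumption about the file.

```python
import csv
import io
from zipfile import ZipFile

GMP_COL = 0   # practitioner code ("GMP" in the names list above)
ODS_COL = 14  # parent practice code ("ODS" in the names list above)


def read_ods_rows(zip_bytes, member, encoding="utf-8"):
    """Return every row of a headerless CSV inside a zip archive.

    Each row is a list of strings, so codes and phone numbers keep
    their leading zeros. The encoding is an assumption.
    """
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))


def gps_at_practice(rows, ods_code):
    """Keep only the practitioner rows whose parent practice matches."""
    return [
        row for row in rows
        if len(row) > ODS_COL and row[ODS_COL] == ods_code
    ]
```

With the download above, this could be called as `rows = read_ods_rows(response.content, 'egpcur.csv')` followed by `gps_at_practice(rows, 'F83004')`.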


### Load GP Practices (epraccur)

In [129]:
url = 'https://files.digital.nhs.uk/assets/ods/current/epraccur.zip'
response = requests.get(url, headers=headers, timeout=120)
response.raise_for_status()  # Raise an exception for bad status codes

myzip = ZipFile(BytesIO(response.content))
#myzip.namelist()
myzip.extractall('ZIP')

epraccur = pd.read_csv('ZIP/epraccur.csv', header=None, index_col=False, names=["ODS","Organisation_Name","NationalGrouping",4,"AddressLine_1","AddressLine_2","AddressLine_3","AddressLine_4","AddressLine_5","PostCode",11,12,13,14,"PRAC_ODS",16,17,18,19,20,21,22,23,24,25,26])

epraccur = epraccur.set_index(['ODS'])

epraccur

Unnamed: 0_level_0,Organisation_Name,NationalGrouping,4,AddressLine_1,AddressLine_2,AddressLine_3,AddressLine_4,AddressLine_5,PostCode,11,...,17,18,19,20,21,22,23,24,25,26
ODS,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
A81001,THE DENSHAM SURGERY,Y63,QHM,THE HEALTH CENTRE,LAWSON STREET,STOCKTON ON TEES,CLEVELAND,,TS18 1HU,19740401,...,,01642 672351,,,,0,,16C,,4
A81002,QUEENS PARK MEDICAL CENTRE,Y63,QHM,QUEENS PARK MEDICAL CTR,FARRER STREET,STOCKTON ON TEES,CLEVELAND,,TS18 2AW,19740401,...,,01642 618170,,,,0,,16C,,4
A81003,VICTORIA MEDICAL PRACTICE,Y54,Q74,THE HEALTH CENTRE,VICTORIA ROAD,HARTLEPOOL,CLEVELAND,,TS26 8DB,19740401,...,20171031.0,01429 272945,,,,0,,00K,,4
A81004,ACKLAM MEDICAL CENTRE,Y63,QHM,TRIMDON AVENUE,ACKLAM,MIDDLESBROUGH,CLEVELAND,,TS5 8SB,19740401,...,,01642 827697,,,,0,,16C,,4
A81005,SPRINGWOOD SURGERY,Y63,QHM,SPRINGWOOD SURGERY,RECTORY LANE,GUISBOROUGH,,,TS14 7DJ,19740401,...,,01287 619611,,,,0,,16C,,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Y08757,COMMUNITY HOSPITAL ALCOHOL TEAM,Y60,QNC,EDWARD MYERS UNIT,HARPLANDS HOSPITAL,STOKE-ON-TRENT,STAFFORDSHIRE,,ST4 6TH,20250501,...,,01782 441715,,,,1,,RLY,,10
Y08758,LARC SERVICE,Y60,QJM,COUNTY OFFICES,NEWLAND,LINCOLN,LINCOLNSHIRE,,LN1 1YL,20250401,...,,01522 554980,,,,1,,503,,8
Y08759,WELL LIFE CLINIC,Y59,QXU,THE HOUSE PARTNERSHIP,99 STATION ROAD,REDHILL,SURREY,,RH1 1EB,20250601,...,,01737 761201,,,,1,,92A,,0
Y08760,OSPREY UNIT - PODIATRY,Y58,QOX,GREAT WESTERN HOSPITAL,MARLBOROUGH ROAD,SWINDON,WILTSHIRE,,SN3 6BB,20250422,...,,01793 604300,,,,1,,92G,,9


This next section of code:
- Adds the practice name to the GP data frame
- Splits the name into surname and initials

In [130]:

egpcur = pd.merge(egpcur, epraccur['Organisation_Name'], left_on='ODS', right_on='ODS')

egpcur['Practitioner_Surname'] = egpcur['Practitioner_Name'].str.split(' ', expand=True)[0]
egpcur['Practitioner_Initials'] = egpcur['Practitioner_Name'].str.split(' ', expand=True)[1]
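
The split above takes the first two space-separated tokens, so a surname containing a space would end up partly in the initials column. Whether such names occur in egpcur has not been checked here. As a hedged alternative, the sketch below splits on the last space instead, assuming the final token is always the initials. `split_practitioner_name` is a hypothetical helper name.

```python
def split_practitioner_name(name):
    """Split 'SURNAME INITIALS' on the last space.

    Returns a (surname, initials) tuple. Assumes the final token
    holds the initials, which is an assumption about the file.
    """
    surname, _, initials = name.strip().rpartition(" ")
    if not surname:
        # No space at all: treat the whole value as the surname.
        return initials, ""
    return surname, initials
```

The pandas equivalent would be `egpcur['Practitioner_Name'].str.rsplit(' ', n=1, expand=True)`, which returns the surname in column 0 and the initials in column 1.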

Updated GP data frame

In [131]:
egpcur

Unnamed: 0,GMP,Practitioner_Name,3,4,AddressLine_1,AddressLine_2,AddressLine_3,AddressLine_4,AddressLine_5,PostCode,...,20,21,22,23,24,25,26,Organisation_Name,Practitioner_Surname,Practitioner_Initials
0,G0102926,ANDERSON MG,Y61,QUE,LENSFIELD MEDICAL PRAC.,48 LENSFIELD ROAD,CAMBRIDGE,CAMBRIDGESHIRE,b'',CB2 1EH,...,,1,,06H,,,,LENSFIELD MEDICAL PRACTICE,ANDERSON,MG
1,G0105912,ADLER S,Y56,QMJ,682 FINCHLEY ROAD,GOLDERS GREEN,LONDON,,b'',NW11 7NP,...,,1,,93C,,,,ADLER JS-THE SURGERY,ADLER,S
2,G0107031,ATTWOOD DC,Y62,QOP,GREAT LEVER HEALTH CENTRE,"RUPERT STREET,GREAT LEVER",BOLTON,LANCASHIRE,b'',BL3 6RN,...,,1,,00T,,,,LEVER CHAMBERS 2,ATTWOOD,DC
3,G0108018,ALLDRIDGE DGE,Y59,QXU,OAKFIELD,158 STATION ROAD,REDHILL,SURREY,b'',RH1 1HF,...,,1,,,,,,MOAT HOUSE SURGERY,ALLDRIDGE,DGE
4,G0108324,ANDERSON CF,Y63,QHM,THE HEALTH CENTRE,LAWSON STREET,STOCKTON ON TEES,CLEVELAND,b'',TS18 1HU,...,,1,,16C,,,,THE DENSHAM SURGERY,ANDERSON,CF
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119854,G9996012,UNIDENTIFIED GPS,W00,Q99,GWENT HA,MAMHILAD HOUSE,MAMHHILAD PARK ESTATE,PONTYPOOL GWENT,b'',NP4 0YP,...,,1,,,,,,UNIDENTIFIED GPS,UNIDENTIFIED,GPS
119855,G9996029,UNIDENTIFIED GPS,W00,Q99,BRO TAF HA,CHURCHILL HOUSE,CHURCHILL WAY,CARDIFF,b'',CF10 2TW,...,,1,,,,,,UNIDENTIFIED GPS,UNIDENTIFIED,GPS
119856,G9996036,UNIDENTIFIED GPS,W00,Q99,DYFED POWYS HA,ST. DAVID'S HOSPITAL,CARMARTHEN,DYFED,b'',SA31 3HB,...,,1,,,,,,UNIDENTIFIED GPS,UNIDENTIFIED,GPS
119857,G9996043,UNIDENTIFIED GPS,W00,Q99,NORTH WALES HA,PRESWYLFA,HENDY ROAD,MOLD FLINTSHIRE,b'',CH7 1PZ,...,,1,,,,,,UNIDENTIFIED GPS,UNIDENTIFIED,GPS
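
Note that the merged frame is shorter than the original: the last index drops from 123771 to 119857. `pd.merge` performs an inner join by default, so GP rows whose practice code is not present in epraccur are dropped. The sketch below is one way to list those codes before merging. `unmatched_practice_codes` is a hypothetical helper name, and non-string values (such as the NaN pandas uses for empty cells) are skipped.

```python
def unmatched_practice_codes(gp_practice_codes, known_practice_codes):
    """Return the sorted practice codes referenced by GPs but not in the practice file."""
    known = set(known_practice_codes)
    missing = {
        code for code in gp_practice_codes
        # Skip empty cells, which pandas represents as float NaN.
        if isinstance(code, str) and code not in known
    }
    return sorted(missing)
```

It needs to run on the frame as it was before the merge, for example `unmatched_practice_codes(egpcur['ODS'], epraccur.index)`. If the unmatched GPs should be kept, `pd.merge` also accepts `how='left'`.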


In [132]:
practitionerDF = egpcur.loc[(egpcur['Practitioner_Surname'] == "KOYA") & (egpcur['Practitioner_Initials'] == "MR")]

row = practitionerDF.iloc[0]

### Practitioner

In [133]:
from fhir.resources.R4B.practitioner import Practitioner
import json

active = True

practitionerJSON = {
    "resourceType": "Practitioner",
    "identifier": [
        {
            "system": "https://fhir.hl7.org.uk/Id/gmp-number",
            "value": row['GMP']
        }
    ],
    "active": active,
    "name": [
        {
            "family": row['Practitioner_Surname'],
            "given": [
                row["Practitioner_Initials"]
            ],
            "prefix": [
                "Dr"
            ]
        }
    ],
    "telecom": [
        {
            "system": "phone",
            "value": row['PhoneNumber'],
            "use": "work"
        }
    ],
    "address": [
        {
            "use": "work",
            "postalCode": row['PostCode']
        }
    ]
}

practitioner = Practitioner(**practitionerJSON)

print(json.dumps(practitionerJSON, indent=2, ensure_ascii=False))

{
  "resourceType": "Practitioner",
  "identifier": [
    {
      "system": "https://fhir.hl7.org.uk/Id/gmp-number",
      "value": "G3298457"
    }
  ],
  "active": true,
  "name": [
    {
      "family": "KOYA",
      "given": [
        "MR"
      ],
      "prefix": [
        "Dr"
      ]
    }
  ],
  "telecom": [
    {
      "system": "phone",
      "value": "020 72720111",
      "use": "work"
    }
  ],
  "address": [
    {
      "use": "work",
      "postalCode": "N19 3NX"
    }
  ]
}
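
Several rows in egpcur have no phone number, and pandas represents an empty cell as NaN. Building the resource directly from such a row would put NaN into `telecom` rather than leaving the element out. The sketch below is one possible way to handle this, wrapping the same structure in a function that omits empty elements. `is_missing` and `build_practitioner` are hypothetical helper names, and `row` can be a pandas row or a plain dict with the same keys.

```python
import math

GMP_SYSTEM = "https://fhir.hl7.org.uk/Id/gmp-number"


def is_missing(value):
    """True for None, blank strings and float NaN (pandas' marker for empty cells)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def build_practitioner(row):
    """Build a Practitioner dict, leaving out telecom and address when empty."""
    resource = {
        "resourceType": "Practitioner",
        "identifier": [{"system": GMP_SYSTEM, "value": row["GMP"]}],
        "active": True,
        "name": [
            {
                "family": row["Practitioner_Surname"],
                "given": [row["Practitioner_Initials"]],
                "prefix": ["Dr"],
            }
        ],
    }
    if not is_missing(row["PhoneNumber"]):
        resource["telecom"] = [
            {"system": "phone", "value": row["PhoneNumber"], "use": "work"}
        ]
    if not is_missing(row["PostCode"]):
        resource["address"] = [{"use": "work", "postalCode": row["PostCode"]}]
    return resource
```

The result can be passed to `Practitioner(**build_practitioner(row))` for the same schema check as above.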


### PractitionerRole

A practitioner can work at multiple organisations, so we need a link entity (table).

The `code` and `specialty` elements are optional, but we can improve our search capabilities by adding data we can infer from the source file (egpcur): the practitioner is a GP and works in general practice.

Note how we have incorporated identifiers and display names. These provide some common data elements in this resource, so the user does not need to perform another search to retrieve these details; we can clearly see this role is for Dr Koya at Archway Medical Centre.

In [134]:
from fhir.resources.R4B.practitionerrole import PractitionerRole

practitionerRoleJSON = {
    "resourceType": "PractitionerRole",
    "active": True,
    "practitioner": {
        "identifier": {
            "system": "https://fhir.hl7.org.uk/Id/gmp-number",
            "value": row['GMP']
        },
        "display": row['Practitioner_Name']
    },
    "organization": {
        "identifier": {
            "system": "https://fhir.nhs.uk/Id/ods-organization-code",
            "value": row['ODS']
        },
        "display": row['Organisation_Name']
    },
    "code": [
        {
            "coding": [
                {
                    "system": "http://snomed.info/sct",
                    "code": "62247001",
                    "display": "General practitioner"
                }
            ]
        }
    ],
    "specialty": [
        {
            "coding": [
                {
                    "system": "http://snomed.info/sct",
                    "code": "394814009",
                    "display": "General practice (specialty) (qualifier value)"
                }
            ]
        }
    ]
}

practitionerRole = PractitionerRole(**practitionerRoleJSON)

print(json.dumps(practitionerRoleJSON, indent=2, ensure_ascii=False))

{
  "resourceType": "PractitionerRole",
  "active": true,
  "practitioner": {
    "identifier": {
      "system": "https://fhir.hl7.org.uk/Id/gmp-number",
      "value": "G3298457"
    },
    "display": "KOYA MR"
  },
  "organization": {
    "identifier": {
      "system": "https://fhir.nhs.uk/Id/ods-organization-code",
      "value": "F83004"
    },
    "display": "ARCHWAY MEDICAL CENTRE"
  },
  "code": [
    {
      "coding": [
        {
          "system": "http://snomed.info/sct",
          "code": "62247001",
          "display": "General practitioner"
        }
      ]
    }
  ],
  "specialty": [
    {
      "coding": [
        {
          "system": "http://snomed.info/sct",
          "code": "394814009",
          "display": "General practice (specialty) (qualifier value)"
        }
      ]
    }
  ]
}


### Organisation



In [135]:
id = epraccur.index.get_loc("F83004")
epraccur.iloc[id,16].replace("'","")

'020 72720111'

In [136]:
from fhir.resources.R4B.organization import Organization

organisationJSON = {
    "resourceType": "Organization",
    "identifier": [
        {
            "system": "https://fhir.nhs.uk/Id/ods-organization-code",
            "value": "F83004"
        }
    ],
    "active": True,
    "type": [
        {
            "coding": [
                {
                    "system": "https://fhir.nhs.uk/CodeSystem/organisation-role",
                    "code": "76",
                    "display": "GP PRACTICE"
                }
            ]
        }
    ],
    "name": epraccur.iloc[id,0],
    "telecom": [
        {
            "system": "phone",
            "value": epraccur.iloc[id,16],
            "use": "work"
        }
    ],
    "address": [
        {
            "use": "work",
            "postalCode": epraccur.iloc[id,8]
        }
    ],
    "partOf": {
        "identifier": {
            "system": "https://fhir.nhs.uk/Id/ods-organization-code",
            "value": epraccur.iloc[id,1]
        }
    }
}

organisation = Organization(**organisationJSON)

print(json.dumps(organisationJSON, indent=2, ensure_ascii=False))

{
  "resourceType": "Organization",
  "identifier": [
    {
      "system": "https://fhir.nhs.uk/Id/ods-organization-code",
      "value": "F83004"
    }
  ],
  "active": true,
  "type": [
    {
      "coding": [
        {
          "system": "https://fhir.nhs.uk/CodeSystem/organisation-role",
          "code": "76",
          "display": "GP PRACTICE"
        }
      ]
    }
  ],
  "name": "ARCHWAY MEDICAL CENTRE",
  "telecom": [
    {
      "system": "phone",
      "value": "020 72720111",
      "use": "work"
    }
  ],
  "address": [
    {
      "use": "work",
      "postalCode": "N19 3NU"
    }
  ],
  "partOf": {
    "identifier": {
      "system": "https://fhir.nhs.uk/Id/ods-organization-code",
      "value": "Y56"
    }
  }
}

## Testing FHIR (Validation)

So far we have just created FHIR resources as JSON. We have performed basic schema validation using [fhir.resources](https://github.com/nazrulworld/fhir.resources). Note that we import this package's R4B models although we are working with R4. This may look confusing, but the resources used here (Organization, Practitioner and PractitionerRole) did not change between R4 and R4B, so this is fine.

You can also validate FHIR using command line tools such as the [FHIR CLI Validator](https://confluence.hl7.org/spaces/HAFWG/pages/248876078/Using+the+FHIR+Validator+Locally+Quick+Guide) or online applications such as [validator.fhir.org](https://validator.fhir.org/).

Note these tools will generate warnings around England-specific content; you can reduce these warnings by using the [NHS England UK Core](https://digital.nhs.uk/services/fhir-uk-core) package. We use our own package, [Virtually Healthcare Testing](https://virtually-healthcare.github.io/R4/testing.html), which incorporates UK Core and extra NHS England data requirements. Documentation on the Virtually Healthcare data requirements, which are called FHIR profiles, can be found below:

- [Organization](https://virtually-healthcare.github.io/R4/StructureDefinition-Organization.html)
- [Practitioner](https://virtually-healthcare.github.io/R4/StructureDefinition-Practitioner.html)
- [PractitionerRole](https://virtually-healthcare.github.io/R4/StructureDefinition-PractitionerRole.html)

The profiles are stricter than UK Core because they need to be followed in several products. They generally conform to wider NHS England data requirements (not just FHIR).

### Working with a FHIR Test Server

Instructions for loading the resources we built earlier into a FHIR server are widely available online, so we won't repeat them here.

If you wish to experiment with this, I would suggest using the [HAPI FHIR Test Server](https://hapi.fhir.org/), e.g.

`POST http://hapi.fhir.org/baseR4/Organization`

`POST http://hapi.fhir.org/baseR4/Practitioner`

`POST http://hapi.fhir.org/baseR4/PractitionerRole`
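
As a rough illustration, a create request for any of the resource dicts built earlier could be assembled with the standard library as follows. This is a minimal sketch: `build_create_request` is a hypothetical helper name, and the function only builds the request so it can be inspected before anything is sent.

```python
import json
from urllib.request import Request


def build_create_request(base_url, resource):
    """Build (but do not send) a FHIR create request for a resource dict."""
    # The resource type decides the endpoint, e.g. .../baseR4/Organization
    url = f"{base_url.rstrip('/')}/{resource['resourceType']}"
    body = json.dumps(resource).encode("utf-8")
    headers = {
        "Content-Type": "application/fhir+json",
        "Accept": "application/fhir+json",
    }
    return Request(url, data=body, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen`, for example `urlopen(build_create_request("http://hapi.fhir.org/baseR4", organisationJSON))`, would send it. Bear in mind that the HAPI test server is public, so only test data should be posted there.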

Once you have added the resources to HAPI FHIR, you should be able to search for them, e.g.

`GET http://hapi.fhir.org/baseR4/Organization?identifier=https://fhir.nhs.uk/Id/ods-organization-code|F83004`

`GET http://hapi.fhir.org/baseR4/Practitioner?identifier=https://fhir.hl7.org.uk/Id/gmp-number|G3298457`
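
The identifier searches above combine a system URI and a value with `|`. When building these URLs in code, it is safer to percent-encode the parameter value. A small sketch, where `build_identifier_search_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def build_identifier_search_url(base_url, resource_type, system, value):
    """Build a FHIR search URL for an identifier token search (system|value)."""
    # urlencode percent-encodes ':', '/' and '|' in the parameter value.
    query = urlencode({"identifier": f"{system}|{value}"})
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"
```

For example, `build_identifier_search_url("http://hapi.fhir.org/baseR4", "Organization", "https://fhir.nhs.uk/Id/ods-organization-code", "F83004")` produces the encoded form of the first search above.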


## Practical Implementation

So far we have a relatively simple model for our GPs and practices. Both are strongly identified using national identifiers, but in practice we will have several other identifiers. Existing use of these national identifiers may not be robust and may have data issues. This can occur in all EPR systems, including secondary care.

The main issue is that, although GMP is defined as the [GENERAL MEDICAL PRACTITIONER PPD CODE](https://www.datadictionary.nhs.uk/attributes/general_medical_practitioner_ppd_code.html), this and the other practitioner identifiers are quite frequently mixed up.

How to handle this is beyond the scope of this walkthrough. A list of all the different practitioner identifiers can be found on [NHS North West GMSA](https://nw-gmsa.github.io/R4/StructureDefinition-EnglandPractitionerIdentifier.html).

Many systems will have their own strong identifier. For example, EMIS uses UUIDs to identify practitioners across all its APIs. Our use case is master data management, so it makes sense for us to have a record of that in our MDM solution. As suppliers are supporting operational delivery of care and ODS is only updated quarterly (or monthly), it's likely that our Practitioner record may have more details than ODS or be more up to date.
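
Because `identifier` is a list in FHIR, a supplier identifier can sit alongside the GMP code on the same Practitioner. The sketch below shows one way to add an identifier without creating duplicates. `add_identifier` is a hypothetical helper name, and the supplier system URI and UUID in the usage note are made-up placeholders, not real EMIS values.

```python
def add_identifier(resource, system, value):
    """Add an identifier to a resource dict unless the same pair is already present."""
    identifiers = resource.setdefault("identifier", [])
    for existing in identifiers:
        if existing.get("system") == system and existing.get("value") == value:
            return resource  # Already recorded, nothing to do.
    identifiers.append({"system": system, "value": value})
    return resource
```

For example, `add_identifier(practitionerJSON, "https://example.org/Id/supplier-practitioner-id", "00000000-0000-0000-0000-000000000000")` would leave the GMP identifier in place and append the placeholder supplier identifier after it.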

This means we need to cope with existing data: our data load needs to be repeatable (so we can schedule quarterly or monthly runs) and able to merge with existing data.
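
One possible way to make a load repeatable is FHIR's conditional update, where a `PUT` addressed by identifier updates the matching resource or creates it when there is no match. This is not necessarily the approach this walkthrough goes on to use. The sketch below wraps resources in a transaction Bundle along those lines. `conditional_update_entry` and `build_transaction` are hypothetical helper names, and each resource is assumed to carry at least one identifier.

```python
from urllib.parse import quote


def conditional_update_entry(resource):
    """Wrap a resource in a transaction entry keyed on its first identifier."""
    identifier = resource["identifier"][0]
    # Percent-encode the system|value token so it is safe inside the URL.
    token = quote(f"{identifier['system']}|{identifier['value']}", safe="")
    return {
        "resource": resource,
        "request": {
            "method": "PUT",
            "url": f"{resource['resourceType']}?identifier={token}",
        },
    }


def build_transaction(resources):
    """Build a transaction Bundle of conditional updates."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [conditional_update_entry(r) for r in resources],
    }
```

The resulting Bundle would be posted to the server's base URL. The PractitionerRole built earlier has no identifier of its own, so it would need one (or a different matching rule) before it could be loaded this way.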

### Demonstration FHIR Server and Database