Map BNF code changes to current codes #380

Merged
29 commits merged on Apr 3, 2017
+102 −11

Documentation

sebbacon committed Mar 29, 2017
commit 7f968027263c62318403584e4ad5e0b96869bb7e
@@ -1,3 +1,88 @@
"""This command deals with the fact that the NHS mutates its
prescribing identifiers periodically, making tracking changes through
time very difficult.

As of 2017 (but this is expected to change within the next year), NHS
England uses a derivative of the BNF (British National Formulary)
codes to identify each presentation dispensed, called the NHS Pseudo
Classification.

Unfortunately, both the BNF and the NHS make changes to codes
periodically. Sometimes a chemical gets a new code, or sometimes it
moves to a new section. Because the BNF code includes the section in
the first few characters of the code, simply reclassifying a drug
changes its unique identifier. This makes tracking that drug through
time impossible.

The situation is further complicated by the fact that the BNF no
longer maintains its old classification, so the Pseudo codes now used
by the NHS no longer necessarily correspond with official BNF codes at
all.

The situation is expected to improve with the introduction of ePACT2
and the move of prescribing data to SNOMED codes, as per dm+d.

For the time being, this command aims to normalise all codes in our
dataset so that prescribing is always indexed by the most recent
version of the Pseudo BNF code.

We achieve this by applying a mapping of old codes to new codes, which
NHSBSA has applied annually to create their Pseudo code list. This
mapping has been supplied to us in private correspondence with NHSBSA,
and is reproduced (with some corrections to obvious typos, etc.) in
the files accompanying this module.

The normalisation process is as follows:

For each old code -> new code mapping, in reverse order of date
(i.e. starting with the most recent mappings):

* If the code is at the section, paragraph, chemical or product level,
  mark our corresponding internal model for that classification as no
  longer current.

* Find every presentation matching the new code (or classification),
  and ensure a presentation exists matching the old code. Create a
  reference to the new presentation code from the old one.

* Create a table of mappings from old codes to the most recent current
  code (taking into account multiple code changes).

* Create a view in BigQuery that joins with this table to produce a
  version of the prescribing data with only the most current BNF
  codes; this is used to generate our local version of the
  prescribing data, our measures, and so on, henceforward.

* Replace all the codes that have new normalised versions in all local
  versions of the prescribing data. (This method will be removed once
  run, as it's only ever needed for an initial migration; look in git
  history if interested.)

* Iterate over all known BNF codes, sections, paragraphs etc., looking
  for codes which have never been prescribed, and mark these as not
  current. This is necessary because sometimes our mappings involve a
  chemical-level change without making this explicit (i.e. a
  15-character BNF code mapping has been supplied, but in fact it's
  the Chemical part of the code that has changed). In these cases, we
  can't tell if the Chemical is now redundant without checking to see
  if there is any other prescribing under that code. This process
  also has the useful side effect of removing the (many thousands of)
  codes that have never actually been prescribed, and are therefore
  unhelpful noise in our user interface.

  * The problem with this approach is that recently added codes may
    not yet have prescribing, but may do so at some point in the
    future. Therefore, there is a `refresh_class_currency` method that
    is always called as part of the `import_hscic_prescribing`
    management command, which iterates over all sections, paragraphs,
    chemicals and products currently listed as not current, and checks
    to see if there has been any prescribing this month.

This command should in theory only have to be run once a year, as
we've been told mappings are only issued annually. And in theory,
2017 is the last year of using BNF codes.
"""
import csv
import glob
import logging
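The docstring's step of mapping old codes to "the most recent current code (taking into account multiple code changes)" amounts to following each chain of replacements to its end. A minimal sketch, assuming the mappings fit in a plain dict keyed by full code; `resolve_latest_codes` is a hypothetical helper, not part of this PR, which records the chain through `Presentation.replaced_by` instead:

```python
from typing import Dict


def resolve_latest_codes(mapping: Dict[str, str]) -> Dict[str, str]:
    """Collapse chains such as A -> B -> C so that A and B both map to C.

    `mapping` holds one old_code -> new_code entry per supplied change
    (a hypothetical input shape). Raises ValueError if following the
    mappings ever revisits a code.
    """
    latest: Dict[str, str] = {}
    for start in mapping:
        seen = {start}
        code = mapping[start]
        # Keep following replacements until we reach a code nobody remaps.
        while code in mapping:
            if code in seen:
                raise ValueError(f"cycle in code mappings at {code!r}")
            seen.add(code)
            code = mapping[code]
        latest[start] = code
    return latest
```

For example, with `{"A": "B", "B": "C"}` both `A` and `B` resolve to `C`. Raising on a cycle, rather than looping forever, is a deliberate choice: a cycle would mean the supplied mappings contradict each other.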
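The BigQuery view described in the docstring is not shown in full in this diff, so the following is only a plain-Python model of the join semantics it implies. It assumes, for illustration, that the view keeps the original code whenever no mapping exists (the equivalent of a `LEFT JOIN` plus `COALESCE`); the row shape and both function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical, simplified prescribing row: (month, bnf_code, items).
Row = Tuple[str, str, int]


def normalise_rows(rows: Iterable[Row], latest: Dict[str, str]) -> List[Row]:
    # Models LEFT JOIN ... COALESCE(new_code, bnf_code): remapped codes are
    # replaced by their latest code, all other codes pass through unchanged.
    return [(month, latest.get(code, code), items) for month, code, items in rows]


def series_by_code(rows: Iterable[Row]) -> Dict[str, Dict[str, int]]:
    # Sum items per code and month, giving one time series per code.
    series: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for month, code, items in rows:
        series[code][month] += items
    return {code: dict(months) for code, months in series.items()}
```

Rows recorded under an old code in earlier months and under its replacement in later months end up in a single series, which is what lets prescribing be tracked through time.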
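The last two bullets of the docstring, marking never-prescribed classifications as not current and later restoring them once prescribing appears, both reduce to asking whether any prescribed presentation code starts with a given classification code. A sketch of that check using an in-memory dict in place of the Django models; `prescribed_prefixes`, `mark_unprescribed` and `refresh_currency` are hypothetical names, and the real refresh step is the `refresh_class_currency` method, whose body is not shown here:

```python
from typing import Dict, Iterable, Set


def prescribed_prefixes(presentation_codes: Iterable[str]) -> Set[str]:
    # Every leading substring of every prescribed presentation code, so a
    # section, paragraph, chemical or product code needs only one set lookup.
    return {code[:i] for code in presentation_codes for i in range(1, len(code) + 1)}


def mark_unprescribed(is_current: Dict[str, bool], prefixes: Set[str]) -> None:
    # Flag every classification with no prescribing under it as not current.
    for class_code in is_current:
        if class_code not in prefixes:
            is_current[class_code] = False


def refresh_currency(is_current: Dict[str, bool], prefixes_this_month: Set[str]) -> None:
    # Re-check only classifications flagged as not current, and restore any
    # that have prescribing this month.
    for class_code, current in is_current.items():
        if not current and class_code in prefixes_this_month:
            is_current[class_code] = True
```

Precomputing every prefix trades memory for a single set lookup per classification, instead of scanning all presentation codes each time.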
@@ -48,6 +133,9 @@ def create_code_mapping(filenames):
        prev_code, next_code = line.split("\t")
        prev_code = prev_code.strip()
        next_code = next_code.strip()
        if not re.match(r'^[0-9A-Z]+$', next_code):
            # Skip 'withdrawn' &c
            continue
        if len(prev_code) <= 7:  # section, subsection, paragraph
            Section.objects.filter(
@@ -59,17 +147,19 @@ def create_code_mapping(filenames):
        elif len(prev_code) == 11:
            Product.objects.filter(
                bnf_code=prev_code).update(is_current=False)
        if re.match(r'^[0-9A-Z]+$', next_code):  # Skip 'withdrawn' &c
            matches = Presentation.objects.filter(
                bnf_code__startswith=next_code)
            for row in matches:
                replaced_by_id = row.pk
                row.pk = None  # allows us to clone
                row.replaced_by_id = replaced_by_id
                row.bnf_code = (
                    prev_code + replaced_by_id[len(prev_code):])
                row.save()
        matches = Presentation.objects.filter(
            bnf_code__startswith=next_code)
        for row in matches:
            replaced_by_id = row.pk
            old_bnf_code = prev_code + replaced_by_id[len(prev_code):]
            try:
                old_version = Presentation.objects.get(pk=old_bnf_code)
            except Presentation.DoesNotExist:
                old_version = row
                old_version.pk = None  # allows us to clone
            old_version.bnf_code = old_bnf_code
            old_version.replaced_by_id = replaced_by_id
            old_version.save()


def create_bigquery_table():
@@ -199,6 +289,7 @@ def update_existing_prescribing():
def create_bigquery_view():
    # XXX this seems to create a legacy view, which we don't want
    sql = """
SELECT
prescribing.*,
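The hunks in `create_code_mapping` skip non-code replacements such as 'withdrawn' and choose which model to update from the length of the old code. A standalone sketch of that parsing, assuming one tab-separated `old<TAB>new` pair per line, as the `line.split("\t")` call implies; `parse_mappings` and `code_level` are hypothetical names, and the 9-character chemical level is taken from the standard BNF code structure because that branch is not visible in this diff:

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Same pattern as the diff: replacement codes are upper-case alphanumerics.
CODE_RE = re.compile(r"^[0-9A-Z]+$")


def code_level(code: str) -> Optional[str]:
    # The length of a BNF code tells us which level of the hierarchy it names.
    length = len(code)
    if length <= 7:
        return "section"  # section, subsection or paragraph
    if length == 9:
        return "chemical"
    if length == 11:
        return "product"
    if length == 15:
        return "presentation"
    return None


def parse_mappings(lines: Iterable[str]) -> Iterator[Tuple[str, str, Optional[str]]]:
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        prev_code, next_code = line.split("\t")
        prev_code, next_code = prev_code.strip(), next_code.strip()
        if not CODE_RE.match(next_code):
            continue  # e.g. 'withdrawn' rather than a replacement code
        yield prev_code, next_code, code_level(prev_code)
```

Moving the 'withdrawn' check to the top of the loop, as the new version of the code does, means no classification is marked as not current for a code that has no replacement to map to.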
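For each mapping, the new version of the loop in `create_code_mapping` builds the old presentation code by swapping the `prev_code` prefix onto each matching current code, then reuses the old `Presentation` if it already exists and clones the current one otherwise. The sketch below models the presentation table as a dict from code to its `replaced_by` code (or `None`); `record_replacements` is a hypothetical name, and the length check is an added assumption that makes explicit what the `[len(prev_code):]` slice relies on:

```python
from typing import Dict, Optional


def record_replacements(prev_code: str, next_code: str,
                        replaced_by: Dict[str, Optional[str]]) -> None:
    """Point each old presentation code at the current code that replaced it.

    `replaced_by` stands in for the presentation table: code -> replacing
    code, or None for a presentation that has not been replaced.
    """
    if len(prev_code) != len(next_code):
        raise ValueError("old and new codes must be at the same BNF level")
    # Snapshot the matches first, since new keys are added inside the loop.
    current = [code for code in replaced_by if code.startswith(next_code)]
    for code in current:
        old_code = prev_code + code[len(prev_code):]
        # Updates an existing old presentation or creates it if missing,
        # mirroring the get-or-clone logic in the diff.
        replaced_by[old_code] = code
```

Assigning through the dict covers both branches of the `try`/`except`: an existing old code is updated in place, and a missing one is created.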