Data issues, errors, and discrepancies

David Hale edited this page Aug 10, 2013 · 8 revisions

Overview

Many developers new to working with medication-related data are unaware of potential errors and discrepancies within the data. This document provides an overview of many issues present in the Pillbox data set. It is intended as an educational resource to 1) help developers better understand the data, and 2) illustrate the scope and limitations of working with this data, which is critical when creating applications that identify unknown medication or provide information to clinicians, patients, and others.

It is also intended to be a starting point for discussion and further exploration of this data with the goal of improving downstream utilization by a growing community of innovative individuals and groups seeking to solve challenges related to not only medication identification and reference, but across health care.

This report is not intended to definitively confirm the presence of an error in Structured Product Labeling data supplied by a submitting firm. Rather, it highlights potential errors, discrepancies, and data of interest based on analysis of data outlying expected parameters. It is not intended to be comprehensive.

Pillbox’s data is derived from Structured Product Labeling (SPL), obtained via NLM’s DailyMed, and from NLM’s RxNorm, a normalized naming system for generic and branded drugs.

Data changes and transparency

Because changes are now being made to the data based on comparison of pill images to the physical characteristics data, every effort is being made to be as transparent as possible about this process. This includes making a table of all changes made to data available for download, web pages which list every change made along with the image used for comparison, and this document which defines the scope and methodology used to make these changes.

The scope of these changes is not intended to be complete. In situations where ambiguity exists the original data is not modified. For records where there is no image available, it is not possible to verify the physical characteristics data.

Data changes in Pillbox do not affect the master data derived from the drug labels, available via DailyMed.

Identification of data errors and discrepancies

Errors or discrepancies in Pillbox’s data can be identified either visually (comparing pill images to data) or algorithmically (querying data based on logical assumptions about the data).
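As a rough illustration of the algorithmic approach, the sketch below flags records that break simple assumptions drawn from this document: an imprint should not end in a semi-colon, should not contain descriptive text such as "logo", and a color should not be listed twice. The record layout, the field names `imprint` and `color`, and the `/` color separator are assumptions made for the example, not the actual Pillbox schema.

```python
def find_issues(record: dict) -> list:
    """Return a list of possible problems in one (hypothetical) pill record."""
    issues = []
    imprint = record.get("imprint", "")
    # Assumed layout: multiple colors are separated by "/".
    colors = [c.strip().casefold() for c in record.get("color", "").split("/") if c.strip()]

    if imprint.rstrip().endswith(";"):
        issues.append("imprint has trailing semicolon")
    if "logo" in imprint.casefold():
        issues.append("imprint contains descriptive text")
    if len(colors) != len(set(colors)):
        issues.append("color listed more than once")
    return issues


flagged = find_issues({"imprint": "A;10;company logo", "color": "WHITE/WHITE"})
```

A flagged record is only a candidate for review; as noted above, a check like this cannot confirm that the source label is wrong.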

To the best of the Pillbox team’s knowledge, no federal agency reviews the physical characteristics data for pills. It is our hope that Pillbox (and the pill images produced through the project) will be a catalyst for the development of manual or automated validation systems for these data and improvement in the overall accuracy of these data.

Three unique problems

Re-labeling and re-distribution

A unique type of discrepancy can occur when a pill is marketed by more than one company. Company A may manufacture and market a drug. They may also allow other companies to distribute and market that same drug. Each company must submit a separate drug label, and that same pill will have a different National Drug Code (NDC) in each label. Some distribution chains may be quite large. For example, the small, brown, 200 mg ibuprofen pill with an imprint of “I2” is distributed under almost 80 different NDCs and labels.

An upcoming data release of Pillbox will group pills by physical characteristics, ingredients and strength, and other criteria so that each unique pill has only one record. In practice this may not always be possible, as some label authors have changed one or more of the physical characteristics or other data from the source label.
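One way to picture this grouping is a minimal sketch that builds a key from each record's characteristics and collects the NDCs sharing that key. The field names, the placeholder NDC strings, and the choice of key fields are all hypothetical; they are not the method or schema used by Pillbox.

```python
from collections import defaultdict

# Hypothetical records: the same pill under three labels. The third label
# lists a different color, so it will not group with the other two.
records = [
    {"ndc": "NDC-A", "imprint": "I2", "color": "BROWN", "shape": "ROUND",
     "ingredient": "ibuprofen", "strength": "200 mg"},
    {"ndc": "NDC-B", "imprint": "I2", "color": "brown", "shape": "ROUND",
     "ingredient": "Ibuprofen", "strength": "200 mg"},
    {"ndc": "NDC-C", "imprint": "I2", "color": "ORANGE", "shape": "ROUND",
     "ingredient": "ibuprofen", "strength": "200 mg"},
]


def pill_key(record):
    """Build a case-insensitive grouping key from the assumed fields."""
    fields = ("imprint", "color", "shape", "ingredient", "strength")
    return tuple(record[name].strip().casefold() for name in fields)


def group_by_pill(records):
    """Map each distinct key to the list of NDCs that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[pill_key(record)].append(record["ndc"])
    return dict(groups)


groups = group_by_pill(records)
```

Here the first two records fall into one group while the third forms its own, which illustrates the caveat above: a single altered characteristic splits what is physically the same pill.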

The benefits of organizing pills by original manufactured products extend beyond simplifying identification and improving user experience. If an issue, such as contamination, should ever occur with a medication, identifying all distribution points for that medication is critical for public safety. Developers should have easy access to that information.

Manufacturers changing the appearance of a pill

FDA guidance requires (citation needed) that if the physical characteristics of a pill change, then that product requires a new NDC. The most common change made to a pill is the imprint. When a manufacturer changes the physical characteristics of a pill and does not apply for a new NDC, it creates a conflict in presenting the data, especially if there is an image for the pill.

For example, Company A has a pill with an imprint “123 10”. They then change the imprint to “A 10”. For a certain time, both pills will be available. If the imprint in the image differs from the data, it may be difficult to determine if the imprint data is 1) incorrect or 2) the imprint has changed but the original NDC was kept. Also, if both pills are present in the market, there should be two separate records as users could be trying to identify either pill.

Identifying pills that are no longer marketed

Pillbox was not designed to be an archival resource. It was intended to reflect the current information available via its sources. The data process which creates Pillbox takes current data from DailyMed and RxNorm and parses individual products (pills). Cases exist, however, where a user is trying to identify an older medication that has been stored for years in a medicine cabinet. In disaster response situations, medications which are past the expiration date may be used if certain criteria are met and tests show the medications have retained their potency.

This issue will also be addressed by the upcoming data release, in which pills will be grouped by physical characteristics. It has yet to be determined how far back in time to go when looking for unique pills. Also, without images it will be difficult to verify the accuracy of the physical characteristics used to group the pills. This will result in a greater number of groups, with some groups being created based on inaccurate data. Groupings based on accurate data will be unaffected.

Visual

FDA publishes guidance for coding the physical characteristics (imprint, color, shape, size, score) of pills. Based on a review of the 2,159 images available via Pillbox as of July 2013, changes were made to the physical characteristics data of approximately 17% (359) of records for which there was an image.

As Pillbox increases the number of standardized, high quality images available, those images will be compared to data for each product to ensure that the physical characteristics data (imprint, color, shape, size, score) match the images. While some of these errors are easily identified (a round pill listed as square), other criteria are more subjective or nuanced.

Imprint

Before continuing, you should read the FDA form and submission requirements for imprint.

Imprint is perhaps the single best identifier for a pill and presents challenges when developing search logic. While relatively few errors of commission (typographic errors) have been found in the data based on a comparison to available pill images, a number of other factors are present.

  • Company or drug names are sometimes omitted from the imprint data.
  • Descriptive text is included in the imprint value (ex: “A;10;company logo”)
  • Some manufacturers change the imprint on the pill without getting a new NDC for that product. Usually this is the addition or removal of a sequence of letters or numbers or the company’s name. See “Manufacturers changing the appearance of a pill” for more information.
  • Some imprints are formatted inconsistently in a way that may affect search results

Additional rules

The goals are to improve search results without introducing ambiguity and to accurately represent the text that appears on a pill.

  • Trailing semi-colons in the data are removed (ex: “A;10;” changed to “A;10”)
  • Dashes, if present, are included in the imprint and not replaced with spaces or semi-colons
  • Decimal points are entered as part of the imprint value
  • Capitalization is accurately represented in the data (search is not case sensitive)
  • Stylized single letters (which often appear on pills) are entered as part of the imprint value
  • Text that appears on a separate line or is separated by a score line is separated in the imprint value by semi-colons
  • If text is repeated on a pill (ex: a scored pill with the number 10 on both sides or a pill with the letter A appearing multiple times around the edge of a pill) it is entered as separate text. This increases the likelihood of an exact match without interfering with search strings that include only one iteration of the repeated text, and errs on the side of accurately representing the look of the pill.
  • Text that crosses itself (ex: BAYER written vertically and horizontally, crossing at the Y) is entered as separate values, separated by a semi-colon.
  • If a space between characters or numbers is larger than the prevalent spacing elsewhere in the pill’s imprint, the text is separated by a semi-colon.
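Some of the rules above can be sketched in code. The example below removes trailing semi-colons, leaves dashes, decimal points, and capitalization untouched, and compares a query against the stored value without regard to case, either as a whole or against any single semi-colon-separated segment. The function names and the segment-matching behavior are assumptions made for illustration, not Pillbox's actual search logic.

```python
def clean_imprint(value: str) -> str:
    """Trim whitespace and trailing semi-colons; keep dashes, decimals, and case."""
    return value.strip().rstrip("; ")


def imprint_matches(query: str, stored: str) -> bool:
    """Case-insensitive match on the full imprint or on any one segment."""
    q = clean_imprint(query).casefold()
    s = clean_imprint(stored).casefold()
    return q == s or q in s.split(";")


cleaned = clean_imprint("A;10;")
repeated_text_hit = imprint_matches("10", "10;10")
```

Matching on individual segments is what lets a query containing only one iteration of repeated text (such as "10") still find a pill recorded as "10;10".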

One additional area of concern related to imprint values is characters which look similar. Text on a pill is often small, and the various imprinting processes may render text that is difficult to read. Users may not be able to accurately identify a character in situations like these.

  • lower-case L vs the number one (1)
  • Upper-case O vs the number zero (0)
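A search tool could account for these look-alike pairs by comparing imprints on a simplified key. The sketch below maps lower-case L to the number one and upper-case O to the number zero before comparing; the function names are hypothetical, and this is only one possible approach rather than something Pillbox is known to do.

```python
# Map each ambiguous character to a single representative.
LOOKALIKES = str.maketrans({"l": "1", "O": "0"})


def lookalike_key(imprint: str) -> str:
    """Replace lower-case L with 1 and upper-case O with 0."""
    return imprint.translate(LOOKALIKES)


def may_be_same_imprint(a: str, b: str) -> bool:
    """True if two imprints differ only by the look-alike pairs above."""
    return lookalike_key(a) == lookalike_key(b)


possible_match = may_be_same_imprint("A 10", "A lO")
```

Widening a match this way trades precision for recall, so results found only through the look-alike key are best presented to the user as possibilities rather than exact matches.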

Color

Before continuing, you should read the FDA form and submission requirements for color.

Color is the most subjective of the physical characteristics; however, it is also one of the most likely to be used by an individual describing a pill. Existing guidance specifies RGB values for each of the 12 colors. Color perception varies greatly among individuals and is subject to ambient lighting conditions (indoor fluorescent and incandescent light sources, sunlight, reflected light, etc.). As such, similarly colored pills may be listed as a variety of similar colors, such as red/orange/brown or blue/turquoise/green.

Issues encountered:

  • Pills listed as a color that is obviously different from the predominant color present in the image. In these situations the color value was changed to that of the predominant color. The new color is subject to the subjectivity described previously.

Additional rules:

  • Though the guidance specifies that there should be only one value present for color, it is common to see two colors listed. This practice is upheld in Pillbox’s data.
  • For pills that have more than one distinct color (a capsule with a pink cap and white base), the secondary color is added.
  • If the labeler lists a single color and the pill could also be described by a second color, that color may be added.
  • When more than one color is listed, if there is a predominant color (such as a yellow pill with a small white section in the middle) the predominant color is listed first. This provides the potential to enhance search and more accurately describe the pill without negatively affecting search results.
  • Double color listings (ex: white/white) were changed to a single value of that color.
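The last two rules lend themselves to a short sketch: collapse repeated colors into one value while keeping the original order, so that a predominant color listed first stays first. The `/` separator follows the "white/white" example above; whether the underlying data uses that separator is an assumption, as is the function name.

```python
def normalize_colors(value: str, sep: str = "/") -> str:
    """Drop repeated colors (case-insensitive) while preserving listing order."""
    kept = []
    seen = set()
    for part in value.split(sep):
        color = part.strip()
        if color and color.casefold() not in seen:
            seen.add(color.casefold())
            kept.append(color)
    return sep.join(kept)


single = normalize_colors("white/white")
ordered = normalize_colors("yellow/white")
```

Note that code like this cannot decide which color is predominant; that judgment still comes from comparing the data to the pill image.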

The NLM Pillbox SPLIMAGE pill image specification creates images under standardized lighting conditions. It is hoped that these images will lead to development of an automated system to accurately define the predominant color of a pill and create a palette of colors that is representative of the colors present.

Shape

Before continuing, you should read the FDA form and submission requirements for shape.

Interpreting the guidance for criteria such as shape involves specific nuances that may not be obvious.

Size

Before continuing, you should read the FDA form and submission requirements for size.

Score

Before continuing, you should read the FDA form and submission requirements for score.