# Pivot Tables for Devices and Defects

List the devices from 2020 and 2021 along with their defects and the number of times each defect was reported.

- Use the `GENERIC_NAME` column for the device.
- Use the `DEVICE_PROBLEM_TEXT` column for the defect text.

- Use the 'complete' data from the working directories for each year:
    - `./04-Process-2020-Data/2020_data_complete.csv`
    - `./04-Process-2021-Data/2021_data_complete.csv`
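
The counting described above comes down to grouping on the device and defect columns and taking the size of each group. Here is a minimal sketch on a made-up frame. The rows are invented for illustration and the column names follow the data previews further down; nothing here is read from the actual files.

```python
import pandas as pd

# Toy rows invented for illustration; the real data comes from the CSV files listed above
toy = pd.DataFrame(
    {
        "GENERIC_NAME": [
            "PUMP, INFUSION",
            "PUMP, INFUSION",
            "PUMP, INFUSION",
            "DEFIBRILLATION LEAD",
        ],
        "DEVICE_PROBLEM_TEXT": ["Break", "Break", "Crack", "Break"],
    }
)

# One row per (device, defect) pair, with the number of times the pair was reported
toy_counts = (
    toy.groupby(["GENERIC_NAME", "DEVICE_PROBLEM_TEXT"])
    .size()
    .to_frame(name="COUNT")
    .reset_index()
)
print(toy_counts)
```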

In [1]:
import os

# Identify the working directory for this notebook's output
working_directory = "./05-Pivot-tables-for-devices-and-defects"

# Create the working directory if needed
try:
    os.makedirs(working_directory, exist_ok=True)
except OSError as error:
    print(f"Error creating {working_directory}: {error}")

## Create table for 2020 Data
- Read the data into a pandas dataframe
- Create a dataframe for the devices using the `GENERIC_NAME` column of the dataframe, removing duplicates as needed
- Create a dataframe for the defects using the `DEVICE_PROBLEM_TEXT` column of the dataframe, removing duplicates as needed

In [2]:
import pandas as pd

data_file_2020 = "./04-Process-2020-Data/2020_data_complete.csv"

# Read the data into a pandas dataframe
data_2020 = pd.read_csv(
    data_file_2020,  # The data file being read, from the variable assignment above
    on_bad_lines="warn",  # Warn on malformed lines instead of raising an error
    dtype="str",  # Read every column as a string, including numeric-looking values
)

# Remove unwanted columns
unwanted_columns = [
    "MDR_REPORT_KEY",
    "MDR_TEXT_KEY",
    "TEXT_TYPE_CODE",
    "PATIENT_SEQUENCE_NUMBER",
    "DATE_REPORT",
    "FOI_TEXT",
    "DEVICE_SEQUENCE_NO",
    "BRAND_NAME",
    "MANUFACTURER_D_NAME",
    "MODEL_NUMBER",
    "DEVICE_AVAILABILITY",
    "DEVICE_REPORT_PRODUCT_CODE",
    "REPORT_NUMBER",
    "REPORT_SOURCE_CODE",
    "NUMBER_DEVICES_IN_EVENT",
    "DATE_RECEIVED",
    "INITIAL_REPORT_TO_FDA",
    "MANUFACTURER_G1_NAME",
    "REMEDIAL_ACTION",
    "EVENT_TYPE",
    "MANUFACTURER_NAME",
    "TYPE_OF_REPORT",
    "SUMMARY_REPORT",
    "NOE_SUMMARIZED",
    "UDI-DI",
    "UDI-PUBLIC",
]

data_2020.drop(unwanted_columns, axis=1, inplace=True)

In [3]:
print(f"data_2020 creation complete: {data_2020.shape}")

data_2020 creation complete: (3856740, 8)


In [4]:
# Preview the data
data_2020.head()

Unnamed: 0,DEVICE_PROBLEM_CODE,DEVICE_PROBLEM_TEXT,GENERIC_NAME,DATE_OF_EVENT,REPORTER_OCCUPATION_CODE,REPORT_DATE,EVENT_LOCATION,SOURCE_TYPE
0,2993,Adverse Event Without Identified Device or Use...,DEFIBRILLATION LEAD,12/12/2019,1,,I,"COMPANY REPRESENTATIVE,HEALTH"
1,2993,Adverse Event Without Identified Device or Use...,DEFIBRILLATION LEAD,12/12/2019,1,,I,"COMPANY REPRESENTATIVE,HEALTH"
2,1332,Failure to Interrogate,IMPLANTABLE CARDIOVERTER DEFIBRILLATOR,12/12/2019,0,,I,"COMPANY REPRESENTATIVE,HEALTH"
3,1332,Failure to Interrogate,IMPLANTABLE CARDIOVERTER DEFIBRILLATOR,12/12/2019,0,,I,"COMPANY REPRESENTATIVE,HEALTH"
4,1332,Failure to Interrogate,IMPLANTABLE CARDIOVERTER DEFIBRILLATOR,12/12/2019,0,,I,"COMPANY REPRESENTATIVE,HEALTH"


In [5]:
# Create a dataframe for the devices using the `GENERIC_NAME` column of the dataframe, removing duplicates as needed
generic_names_2020 = pd.DataFrame(data_2020["GENERIC_NAME"].value_counts())

# Create a dataframe for the defects using the `DEVICE_PROBLEM_TEXT` column of the dataframe, removing duplicates as needed
defects_2020 = pd.DataFrame(data_2020["DEVICE_PROBLEM_TEXT"].value_counts())

In [6]:
# Preview the data
generic_names_2020.head()

Unnamed: 0,GENERIC_NAME
CONTINUOUS GLUCOSE MONITOR,496771
ENDOSSEOUS DENTAL IMPLANT,315749
"PUMP, INFUSION",312885
"ARTIFICIAL PANCREAS DEVICE SYSTEM, SINGLE HORMONAL CONTROL",269489
"PUMP, INFUSION, INSULIN, TO BE USED WITH INVASIVE GLUCOSE SENSOR",205467


In [7]:
generic_names_2020.shape

(15203, 1)

In [8]:
generic_names_2020.sum()

GENERIC_NAME    3843862
dtype: int64
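
This total (3,843,862) is lower than the row count printed earlier (3,856,740). One plausible explanation, not verified against the file, is that some rows have no `GENERIC_NAME`: `value_counts()` skips missing values unless `dropna=False` is passed. A small sketch with invented values shows the effect:

```python
import pandas as pd

# Invented values; None stands in for a row with no device name
names = pd.Series(["PUMP, INFUSION", None, "PUMP, INFUSION", "DEFIBRILLATION LEAD"])

counted = names.value_counts()  # Missing values are skipped by default
counted_all = names.value_counts(dropna=False)  # Missing values get their own entry

print(counted.sum(), counted_all.sum(), names.isna().sum())
```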

In [9]:
# Preview the data
defects_2020.head()

Unnamed: 0,DEVICE_PROBLEM_TEXT
Adverse Event Without Identified Device or Use Problem,331253
Failure to Osseointegrate,231019
Patient Device Interaction Problem,192077
Wireless Communication Problem,168761
No Device Output,163702


In [10]:
defects_2020.shape

(474, 1)

## Create table for 2021 Data
- Read the data into a pandas dataframe
- Create a dataframe for the devices using the `GENERIC_NAME` column of the dataframe, removing duplicates as needed
- Create a dataframe for the defects using the `DEVICE_PROBLEM_TEXT` column of the dataframe, removing duplicates as needed
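
Because these steps repeat the 2020 ones, they could be wrapped in a small helper. The sketch below is one possible shape, not what this notebook does: `load_device_defects` and `KEEP_COLUMNS` are hypothetical names, and the helper uses `usecols` to read only the two columns needed for the counts rather than dropping unwanted columns afterwards. An in-memory CSV stands in for a real data file.

```python
import io

import pandas as pd

# Hypothetical constant: the two columns the device/defect counts rely on
KEEP_COLUMNS = ["GENERIC_NAME", "DEVICE_PROBLEM_TEXT"]


def load_device_defects(source):
    """Hypothetical helper: read only the device and defect columns as strings."""
    return pd.read_csv(
        source,
        usecols=KEEP_COLUMNS,  # Skip every other column at read time
        dtype="str",  # Read every value as a string
        on_bad_lines="warn",  # Warn on malformed lines instead of raising an error
    )


# Tiny in-memory CSV invented for illustration
sample_csv = io.StringIO(
    "MDR_REPORT_KEY,GENERIC_NAME,DEVICE_PROBLEM_TEXT\n"
    '1,"PUMP, INFUSION",Break\n'
    '2,"PUMP, INFUSION",Crack\n'
)
sample = load_device_defects(sample_csv)
print(sample.shape)
```

With real files, the same call would take a path such as `data_file_2021` in place of the in-memory buffer.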

In [11]:
import pandas as pd

data_file_2021 = "./04-Process-2021-Data/2021_data_complete.csv"

# Read the data into a pandas dataframe
data_2021 = pd.read_csv(
    data_file_2021,  # The data file being read, from the variable assignment above
    on_bad_lines="warn",  # Warn on malformed lines instead of raising an error
    dtype="str",  # Read every column as a string, including numeric-looking values
)

# Remove unwanted columns
unwanted_columns = [
    "MDR_REPORT_KEY",
    "MDR_TEXT_KEY",
    "TEXT_TYPE_CODE",
    "PATIENT_SEQUENCE_NUMBER",
    "DATE_REPORT",
    "FOI_TEXT",
    "DEVICE_SEQUENCE_NO",
    "BRAND_NAME",
    "MANUFACTURER_D_NAME",
    "MODEL_NUMBER",
    "DEVICE_AVAILABILITY",
    "DEVICE_REPORT_PRODUCT_CODE",
    "REPORT_NUMBER",
    "REPORT_SOURCE_CODE",
    "NUMBER_DEVICES_IN_EVENT",
    "DATE_RECEIVED",
    "INITIAL_REPORT_TO_FDA",
    "MANUFACTURER_G1_NAME",
    "REMEDIAL_ACTION",
    "EVENT_TYPE",
    "MANUFACTURER_NAME",
    "TYPE_OF_REPORT",
    "SUMMARY_REPORT",
    "NOE_SUMMARIZED",
    "UDI-DI",
    "UDI-PUBLIC",
]

data_2021.drop(unwanted_columns, axis=1, inplace=True)

In [12]:
print(f"data_2021 creation complete: {data_2021.shape}")

data_2021 creation complete: (4454884, 8)


In [13]:
# Preview the data
data_2021.head()

Unnamed: 0,DEVICE_PROBLEM_CODE,DEVICE_PROBLEM_TEXT,GENERIC_NAME,DATE_OF_EVENT,REPORTER_OCCUPATION_CODE,REPORT_DATE,EVENT_LOCATION,SOURCE_TYPE
0,1535,"Incorrect, Inadequate or Imprecise Resultor Re...",CORONAVIRUS ANTIGEN DETECTION SYSTEM,11/03/2020,100,,,0006
1,1535,"Incorrect, Inadequate or Imprecise Resultor Re...",CORONAVIRUS ANTIGEN DETECTION SYSTEM,11/03/2020,100,,,0006
2,1069,Break,"PUMP, INFUSION",,3,,I,OTHER
3,1135,Crack,"PUMP, INFUSION",,3,,I,OTHER
4,1153,Degraded,"PUMP, INFUSION",,3,,I,OTHER


In [14]:
# Create a dataframe for the devices using the `GENERIC_NAME` column of the dataframe, removing duplicates as needed
generic_names_2021 = pd.DataFrame(data_2021["GENERIC_NAME"].value_counts())

# Create a dataframe for the defects using the `DEVICE_PROBLEM_TEXT` column of the dataframe, removing duplicates as needed
defects_2021 = pd.DataFrame(data_2021["DEVICE_PROBLEM_TEXT"].value_counts())

In [15]:
# Preview the data
generic_names_2021.head(1000)

Unnamed: 0,GENERIC_NAME
CONTINUOUS GLUCOSE MONITOR,534114
"PUMP, INFUSION",526177
ENDOSSEOUS DENTAL IMPLANT,461672
"ARTIFICIAL PANCREAS DEVICE SYSTEM, SINGLE HORMONAL CONTROL",200411
FLASH GLUCOSE MONITORING SYSTEM,188914
...,...
VIDEO DUODENOSCOPE,150
INSULIN DELIVERY DEVICE,149
TEMPORARY NONROLLER TYPE LEFT HEART SUPPORT BLOOD PUMP,148
PRIMACONNEX TC TAPERED RD 4.1X10,148


In [16]:
generic_names_2021.shape

(14161, 1)

In [17]:
generic_names_2021.sum()

GENERIC_NAME    4439679
dtype: int64

In [18]:
# Preview the data
defects_2021.head()

Unnamed: 0,DEVICE_PROBLEM_TEXT
Failure to Osseointegrate,336298
Adverse Event Without Identified Device or Use Problem,314336
Break,252603
Wireless Communication Problem,228677
"Incorrect, Inadequate or Imprecise Resultor Readings",195978


In [19]:
defects_2021.shape

(476, 1)

In [20]:
report_2020_series = data_2020.groupby(["GENERIC_NAME", "DEVICE_PROBLEM_TEXT"]).size()
report_2021_series = data_2021.groupby(["GENERIC_NAME", "DEVICE_PROBLEM_TEXT"]).size()

In [21]:
report_2020_df = report_2020_series.to_frame(name="COUNT").reset_index()
report_2021_df = report_2021_series.to_frame(name="COUNT").reset_index()
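# working_directory already starts with "./", so join the paths instead of prefixing "./" again
report_2020_df.to_csv(os.path.join(working_directory, "2020_device_problem_pivot.csv"))
report_2021_df.to_csv(os.path.join(working_directory, "2021_device_problem_pivot.csv"))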
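
By default, `to_csv` also writes the frame's index as an unnamed first column. If that column is not wanted in the output files, passing `index=False` omits it. A minimal sketch, using an invented one-row frame and a temporary directory instead of the working directory:

```python
import os
import tempfile

import pandas as pd

# Invented one-row report for illustration
report = pd.DataFrame(
    {
        "GENERIC_NAME": ["PUMP, INFUSION"],
        "DEVICE_PROBLEM_TEXT": ["Break"],
        "COUNT": [2],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "example_pivot.csv")
    report.to_csv(out_path, index=False)  # Leave the index out of the file
    round_trip = pd.read_csv(out_path)  # Read it back to check the columns

print(list(round_trip.columns))
```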

In [22]:
report_2020_df

Unnamed: 0,GENERIC_NAME,DEVICE_PROBLEM_TEXT,COUNT
0,!M1 MODELONE,Battery Problem,2
1,!M1 MODEL ONE,Product Quality Problem,1
2,"""IMAGE GUIDED SURGERY SYSTEM/INSTRUMENT, STERE...",Use of Device Problem,2
3,(01)07640149388879(21)MB240047,"Application Program Freezes, Becomes Nonfuncti...",2
4,(01)07640149388879(21)MB240047,Display or Visual Feedback Problem,2
...,...,...,...
64836,¿COMPOSITE SERIES¿,Use of Device Problem,2
64837,"¿PUMP, INFUSION",Break,2
64838,"¿PUMP, INFUSION",Corroded,2
64839,"¿PUMP, INFUSION",Crack,2


In [23]:
report_2021_df

Unnamed: 0,GENERIC_NAME,DEVICE_PROBLEM_TEXT,COUNT
0,!!! POWERFLEXX,Positioning Problem,1
1,"""ISOLATOR SYNERGY"""" ENCOMPASS CLAMP AND GUIDE """,Adverse Event Without Identified Device or Use...,2
2,"""NEUROVASCULAR EMBOLIZATION DEVICE / VASCULAR ...",Adverse Event Without Identified Device or Use...,8
3,"""NEUROVASCULAR EMBOLIZATION DEVICE / VASCULAR ...",Detachment of Device or Device Component,2
4,"""NEUROVASCULAR EMBOLIZATION DEVICE / VASCULAR ...",Difficult or Delayed Separation,2
...,...,...,...
64940,¿COMPOSITE SERIES¿,Mechanics Altered,10
64941,¿COMPOSITE SERIES¿,Physical Resistance/Sticking,6
64942,¿COMPOSITE SERIES¿,Unintended Movement,66
64943,¿COMPOSITE SERIES¿,Unstable,4
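
The reports above are in long format, with one row per device and defect pair. If a wide layout is preferred, with devices as rows and defects as columns, `pivot_table` can reshape a long frame. The sketch below uses invented rows. On the full data the wide table would have thousands of device rows and several hundred defect columns, most of them zero, so the long format is likely the more practical one to store.

```python
import pandas as pd

# Invented long-format rows with the same column names as the reports above
long_df = pd.DataFrame(
    {
        "GENERIC_NAME": ["PUMP, INFUSION", "PUMP, INFUSION", "DEFIBRILLATION LEAD"],
        "DEVICE_PROBLEM_TEXT": ["Break", "Crack", "Break"],
        "COUNT": [2, 1, 1],
    }
)

# Devices become rows, defects become columns, and missing pairs are filled with 0
wide_df = long_df.pivot_table(
    index="GENERIC_NAME",
    columns="DEVICE_PROBLEM_TEXT",
    values="COUNT",
    aggfunc="sum",
    fill_value=0,
)
print(wide_df)
```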


In [24]:
from datetime import datetime
import pytz

# datetime object containing current date and time for the US/Pacific time zone
now = datetime.now(pytz.timezone("US/Pacific"))

# Format date and time like 2022-10-31 05:49 PM (%I is the zero-padded 12-hour clock)
date_time_string = now.strftime("%Y-%m-%d %I:%M %p")

print(f"{date_time_string} Notebook has completed.")

2023-03-30 01:14 PM Notebook has completed.
