# Pivot Tables for Devices and Defects

List the devices from 2020 and 2021 along with their defects and the number of times each defect was reported.

- Use the `GENERIC_NAME` column for the device.
- Use the `DEVICE_PROBLEM_TEXT` column for the defect text.
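Because the task pairs each device with its defects, the counts have to be taken over both columns together rather than one column at a time. A minimal sketch, assuming a loaded DataFrame named `data` that holds both columns. The five rows below mirror the first rows of the 2020 preview and are for illustration only:

```python
import pandas as pd

# Illustrative rows only: they mirror the first five rows of the 2020 preview.
data = pd.DataFrame({
    'GENERIC_NAME': ['DEFIBRILLATION LEAD'] * 2
                    + ['IMPLANTABLE CARDIOVERTER DEFIBRILLATOR'] * 3,
    'DEVICE_PROBLEM_TEXT': ['Adverse Event Without Identified Device or Use Problem'] * 2
                           + ['Failure to Interrogate'] * 3,
})

# DataFrame.value_counts counts each distinct (device, defect) row,
# most frequent first; reset_index turns the result back into a table.
pair_counts = (
    data[['GENERIC_NAME', 'DEVICE_PROBLEM_TEXT']]
    .value_counts()
    .reset_index(name='REPORT_COUNT')
)
print(pair_counts)
```

`DataFrame.value_counts` needs pandas 1.1 or later, and by default it skips rows with a missing value in either column.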

- Use the 'complete' data file from each year's working directory:
    - `./2020_reprocessed/2020_data_complete.csv`
    - `./2021_reprocessed/2021_data_complete.csv`
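Since both years feed the same listing, one way to keep them distinguishable is to load each file with identical options and tag its rows with the year before concatenating. A sketch under assumptions: `load_year` is a hypothetical helper, and the in-memory CSVs are made-up stand-ins for the two files listed above (in the notebook you would pass the file paths instead):

```python
import io

import pandas as pd


def load_year(source, year):
    """Read one year's 'complete' CSV and tag each row with its year.

    `source` may be a file path or any file-like object pd.read_csv accepts.
    Only the device and defect columns are kept.
    """
    df = pd.read_csv(source,
                     usecols=['GENERIC_NAME', 'DEVICE_PROBLEM_TEXT'],
                     on_bad_lines='warn',   # warn about and skip malformed lines
                     dtype='str')           # read every column as text
    df['YEAR'] = year
    return df


# Made-up in-memory CSVs standing in for the two real files.
csv_2020 = io.StringIO(
    "GENERIC_NAME,DEVICE_PROBLEM_TEXT,OTHER\n"
    "DEFIBRILLATION LEAD,Failure to Interrogate,x\n"
    "CONTINUOUS GLUCOSE MONITOR,No Device Output,y\n"
)
csv_2021 = io.StringIO(
    "GENERIC_NAME,DEVICE_PROBLEM_TEXT,OTHER\n"
    "CONTINUOUS GLUCOSE MONITOR,No Device Output,z\n"
)

# Stack both years into one table with a fresh 0..n-1 index.
combined = pd.concat([load_year(csv_2020, 2020), load_year(csv_2021, 2021)],
                     ignore_index=True)
print(combined)
```

Grouping `combined` by `YEAR` as well as device and defect then keeps the 2020 and 2021 counts apart.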

## Create Tables for the 2020 Data
- Read the data into a pandas DataFrame.
- Create a DataFrame of the devices from the `GENERIC_NAME` column, removing duplicates as needed.
- Create a DataFrame of the defects from the `DEVICE_PROBLEM_TEXT` column, removing duplicates as needed.
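Taken literally, "removing duplicates" yields one row per distinct device or defect, with no counts attached. A minimal sketch using `drop_duplicates` on toy rows modeled on the 2020 preview (the notebook cells that follow use `value_counts`, which also collapses duplicates but additionally records how often each value occurs):

```python
import pandas as pd

# Toy rows modeled on the 2020 preview; not real report data.
data = pd.DataFrame({
    'GENERIC_NAME': ['DEFIBRILLATION LEAD', 'DEFIBRILLATION LEAD',
                     'IMPLANTABLE CARDIOVERTER DEFIBRILLATOR'],
    'DEVICE_PROBLEM_TEXT': ['Adverse Event Without Identified Device or Use Problem',
                            'Adverse Event Without Identified Device or Use Problem',
                            'Failure to Interrogate'],
})

# Double brackets select a one-column DataFrame rather than a Series;
# drop_duplicates keeps the first occurrence of each value.
devices = data[['GENERIC_NAME']].drop_duplicates().reset_index(drop=True)
defects = data[['DEVICE_PROBLEM_TEXT']].drop_duplicates().reset_index(drop=True)
print(devices)
print(defects)
```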

In [7]:
import pandas as pd

data_file = './2020_reprocessed/2020_data_complete.csv'

# Read the data into a pandas dataframe
data = pd.read_csv(data_file,           # Path to the 2020 CSV, assigned above
                   on_bad_lines='warn', # Warn about and skip malformed lines instead of raising an error
                   dtype='str')         # Read every column as text instead of inferring numeric types

# Remove unwanted columns
unwanted_columns = [
    'MDR_REPORT_KEY',
    'MDR_TEXT_KEY',
    'TEXT_TYPE_CODE',
    'PATIENT_SEQUENCE_NUMBER',
    'DATE_REPORT',
    'FOI_TEXT',
    'DEVICE_SEQUENCE_NO',
    'BRAND_NAME',
    'MANUFACTURER_D_NAME',
    'MODEL_NUMBER',
    'DEVICE_AVAILABILITY',
    'DEVICE_REPORT_PRODUCT_CODE',
    'REPORT_NUMBER',
    'REPORT_SOURCE_CODE',
    'NUMBER_DEVICES_IN_EVENT',
    'DATE_RECEIVED',
    'INITIAL_REPORT_TO_FDA',
    'MANUFACTURER_G1_NAME',
    'REMEDIAL_ACTION',
    'EVENT_TYPE',
    'MANUFACTURER_NAME',
    'TYPE_OF_REPORT',
    'SUMMARY_REPORT',
    'NOE_SUMMARIZED',
    'UDI-DI',
    'UDI-PUBLIC',
]

data.drop(unwanted_columns, axis=1, inplace=True)
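A drop-list has two weak spots: `drop` raises a `KeyError` by default if any listed column is absent from a file, and any column the export gains later is silently kept. An alternative sketch keeps a short list of wanted columns and passes it to `read_csv` through `usecols`, so the other columns are never loaded at all. The three column names match the preview that follows; the tiny CSV is made up for illustration:

```python
import io

import pandas as pd

# Keep-list instead of a drop-list: only these columns are parsed.
wanted_columns = ['DEVICE_PROBLEM_CODE', 'DEVICE_PROBLEM_TEXT', 'GENERIC_NAME']

# Made-up two-row CSV standing in for 2020_data_complete.csv.
sample_csv = io.StringIO(
    "MDR_REPORT_KEY,DEVICE_PROBLEM_CODE,DEVICE_PROBLEM_TEXT,GENERIC_NAME,BRAND_NAME\n"
    "1,2993,Adverse Event Without Identified Device or Use Problem,DEFIBRILLATION LEAD,A\n"
    "2,1332,Failure to Interrogate,IMPLANTABLE CARDIOVERTER DEFIBRILLATOR,B\n"
)

data = pd.read_csv(sample_csv,
                   usecols=wanted_columns,
                   on_bad_lines='warn',
                   dtype='str')
print(data)
```

`usecols` still raises an error if a wanted column is missing, but the list to maintain is three names instead of twenty-six.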


In [8]:
# Preview the data
data.head()

|   | DEVICE_PROBLEM_CODE | DEVICE_PROBLEM_TEXT | GENERIC_NAME |
|---|---|---|---|
| 0 | 2993 | Adverse Event Without Identified Device or Use... | DEFIBRILLATION LEAD |
| 1 | 2993 | Adverse Event Without Identified Device or Use... | DEFIBRILLATION LEAD |
| 2 | 1332 | Failure to Interrogate | IMPLANTABLE CARDIOVERTER DEFIBRILLATOR |
| 3 | 1332 | Failure to Interrogate | IMPLANTABLE CARDIOVERTER DEFIBRILLATOR |
| 4 | 1332 | Failure to Interrogate | IMPLANTABLE CARDIOVERTER DEFIBRILLATOR |


In [22]:
# Devices: value_counts() lists each unique GENERIC_NAME once with its number of reports (missing values excluded)
generic_names = pd.DataFrame(data['GENERIC_NAME'].value_counts())

# Defects: value_counts() lists each unique DEVICE_PROBLEM_TEXT once with its number of reports (missing values excluded)
defects = pd.DataFrame(data['DEVICE_PROBLEM_TEXT'].value_counts())
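The two `value_counts` tables count devices and defects independently, so they do not show which defects were reported against which device. A cross-tabulation gives the pivot table the task asks for: one row per device, one column per defect, and the number of reports in each cell. A minimal sketch with `pd.crosstab`, using the five preview rows as toy data:

```python
import pandas as pd

# Toy rows mirroring the first five rows of the 2020 preview.
data = pd.DataFrame({
    'GENERIC_NAME': ['DEFIBRILLATION LEAD'] * 2
                    + ['IMPLANTABLE CARDIOVERTER DEFIBRILLATOR'] * 3,
    'DEVICE_PROBLEM_TEXT': ['Adverse Event Without Identified Device or Use Problem'] * 2
                           + ['Failure to Interrogate'] * 3,
})

# Rows = devices, columns = defects, cells = number of reports;
# device/defect combinations that never occur are filled with 0.
pivot = pd.crosstab(data['GENERIC_NAME'], data['DEVICE_PROBLEM_TEXT'])
print(pivot)
```

On the full data this table has one column per distinct defect and is likely to be mostly zeros, so a long (device, defect, count) layout may be easier to read.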


In [26]:
# Preview the data
generic_names.head()

|   | GENERIC_NAME |
|---|---:|
| CONTINUOUS GLUCOSE MONITOR | 496771 |
| ENDOSSEOUS DENTAL IMPLANT | 315749 |
| PUMP, INFUSION | 312885 |
| ARTIFICIAL PANCREAS DEVICE SYSTEM, SINGLE HORMONAL CONTROL | 269489 |
| PUMP, INFUSION, INSULIN, TO BE USED WITH INVASIVE GLUCOSE SENSOR | 205467 |


In [27]:
# Preview the data
defects.head()

|   | DEVICE_PROBLEM_TEXT |
|---|---:|
| Adverse Event Without Identified Device or Use Problem | 331253 |
| Failure to Osseointegrate | 231019 |
| Patient Device Interaction Problem | 192077 |
| Wireless Communication Problem | 168761 |
| No Device Output | 163702 |
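The previews above rank devices and defects separately. To list each device alongside its most-reported defects, one option is to count (device, defect) pairs, sort within each device, and keep the first few rows per device. A sketch under assumptions: `TOP_N` is a hypothetical cutoff, and the device/defect pairings and counts below are made up for illustration (only the names come from the previews):

```python
import pandas as pd

# Made-up pairings and counts; only the names come from the previews.
rows = ([('CONTINUOUS GLUCOSE MONITOR', 'No Device Output')] * 3
        + [('CONTINUOUS GLUCOSE MONITOR', 'Wireless Communication Problem')] * 2
        + [('CONTINUOUS GLUCOSE MONITOR', 'Patient Device Interaction Problem')]
        + [('ENDOSSEOUS DENTAL IMPLANT', 'Failure to Osseointegrate')] * 2)
data = pd.DataFrame(rows, columns=['GENERIC_NAME', 'DEVICE_PROBLEM_TEXT'])

# One row per (device, defect) pair with its number of reports.
pair_counts = (data.groupby(['GENERIC_NAME', 'DEVICE_PROBLEM_TEXT'])
                   .size()
                   .reset_index(name='REPORT_COUNT'))

TOP_N = 2  # hypothetical cutoff: defects to keep per device

# Most-reported defect first within each device, then keep TOP_N rows per device.
top_defects = (pair_counts
               .sort_values(['GENERIC_NAME', 'REPORT_COUNT'], ascending=[True, False])
               .groupby('GENERIC_NAME')
               .head(TOP_N)
               .reset_index(drop=True))
print(top_defects)
```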
