# PBAC RAG Dataset

## California Independent Medical Review Dataset
https://www.kaggle.com/datasets/prasad22/ca-independent-medical-review?resource=download

### Analysis

In [67]:
import pandas as pd

df = pd.read_csv('dataset_0.csv')
df

Unnamed: 0,Reference ID,Report Year,Diagnosis Category,Diagnosis Sub Category,Treatment Category,Treatment Sub Category,Determination,Type,Age Range,Patient Gender,Findings
0,MN16-22639,2016,Infectious,Hepatitis,Pharmacy/Prescription Drugs,Anti-virals,Overturned Decision of Health Plan,Medical Necessity,41-50,Male,Nature of Statutory Criteria/Case Summary: An ...
1,MN16-22638,2016,Mental,Eating Disorder,Mental Health Treatment,Residential Treatment Center - Admission,Upheld Decision of Health Plan,Medical Necessity,21-30,Female,Nature of Statutory Criteria/Case Summary: An...
2,MN16-22637,2016,Autism Spectrum,Autism-PDD-NOS,Autism Related Treatment,Speech Therapy,Upheld Decision of Health Plan,Medical Necessity,0-10,Female,Nature of Statutory Criteria/Case Summary: Th...
3,EI16-22636,2016,Prevention/Good Health,,"Diagnostic Imaging, Screening and Testing",Mammography,Overturned Decision of Health Plan,Experimental/Investigational,65+,Female,Nature of Statutory Criteria/Case Summary: An ...
4,EI06-5319,2006,Cardiac/Circulatory,,Cardio Vascular,,Upheld Decision of Health Plan,Experimental/Investigational,51-64,Male,Physician 1: The patient is a 62-year-old male...
...,...,...,...,...,...,...,...,...,...,...,...
19240,MN01-7,2001,Trauma/Injuries,Gunshot Wound,Neurosugery,Cranioplasty,Overturned Decision of Health Plan,Medical Necessity,,,The parents of a 17-year-old male requested a ...
19241,MN01-6,2001,Infectious,Onychomycosis/ Nail Fungus,Pharmacy/Prescription Drugs,Anti-Fungal,Upheld Decision of Health Plan,Medical Necessity,,,A 46-year-old male requested Penlac lacquer fo...
19242,MN01-5,2001,Orthopedic/ Musculoskeletal,Other,Orthopedic,Arthroscopy,Upheld Decision of Health Plan,Medical Necessity,,,A 46-year-old female requested an orthoscopic ...
19243,MN01-4,2001,Orthopedic/ Musculoskeletal,Back Pain,Reconstructive/Plastic Surgery,Breast Reduction,Overturned Decision of Health Plan,Medical Necessity,,,A 24-year-old female requested a bilateral bre...


In [68]:
# Review cols and their values 
for col in df.columns:
    print(col)
    print(df[col].value_counts(), '\nDescription:\n', df[col].describe())
    print('\n\n')

Reference ID
Reference ID
MN16-22639    1
UR08-7936     1
MN08-7954     1
EI08-7955     1
MN08-7956     1
             ..
EI12-14475    1
MN12-14476    1
MN12-14477    1
EI12-14478    1
EI01-3        1
Name: count, Length: 19245, dtype: int64 
Description:
 count          19245
unique         19245
top       MN16-22639
freq               1
Name: Reference ID, dtype: object



Report Year
Report Year
2015    2082
2008    1519
2011    1445
2010    1442
2009    1441
2014    1432
2007    1343
2016    1307
2013    1222
2012    1206
2006    1082
2005     965
2004     778
2003     739
2002     691
2001     551
Name: count, dtype: int64 
Description:
 count    19245.000000
mean      2009.662977
std          4.272045
min       2001.000000
25%       2007.000000
50%       2010.000000
75%       2014.000000
max       2016.000000
Name: Report Year, dtype: float64



Diagnosis Category
Diagnosis Category
Orthopedic/ Musculoskeletal              3469
Mental                                   2512
Cance

In [69]:
# Review findings 
for row in range(0,5):
    print(df['Findings'][row], '\n')

Nature of Statutory Criteria/Case Summary: An enrollee has requested Harvoni for treatment of his hepatitis C virus genotype 1a.  Findings:  The physician reviewer found that according to the most recent joint guidelines issued by the American Association for the Study of Liver Diseases (AASLD) and the Infectious Diseases Society of America (IDSA), all patients with chronic hepatitis C should be treated except those with limited life expectancy due to non-liver-related conditions. This applies regardless of fibrosis stage or viral load, and advanced fibrosis is not required for treatment. Per guidelines, treatment-naïve genotype 1 patients should be treated with Harvoni. Patients with viral load of less than 6,000,000 IU/mL can be treated for eight weeks. These guideline recommendations are based on multiple randomized clinical trials (Kowdley, et al; Afdhal, et al). Moreover, the AASLD and IDSA guidelines cite data showing reduced survival with delayed treatment (Jezequel, et al). For

### Modifications

In [70]:
# Remove rows whose treatment or diagnosis sub-category is 'Other'
df = df[df['Treatment Sub Category'] != 'Other']
df = df[df['Diagnosis Sub Category'] != 'Other']

In [71]:
# Drop unused cols
df = df.drop(columns=['Determination', 'Type', 'Findings', 'Reference ID'])

In [72]:
# Assign 'Other' where gender is neither 'Male' nor 'Female'
df.loc[~df['Patient Gender'].isin(['Male', 'Female']), 'Patient Gender'] = 'Other'
print(df['Patient Gender'].unique())

['Male' 'Female' 'Other']


In [73]:
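import numpy as np

age_groups = ['41-50', '21-30', '0-10', '65+', '51-64', '31-40', '11_20']

# Assign a random age group to each row where 'Age Range' is unknown
# (a single random.choice(age_groups) would give every such row the same group)
unknown_age = ~df['Age Range'].isin(age_groups)
df.loc[unknown_age, 'Age Range'] = np.random.choice(age_groups, size=unknown_age.sum())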

# Rename '11_20'
df.loc[df['Age Range'].isin(['11_20']), 'Age Range'] = '10-20'

print(df['Age Range'].unique())

['41-50' '21-30' '0-10' '65+' '51-64' '31-40' '10-20']


In [74]:
import numpy as np
import random
import string

# Function to generate random patient IDs
def generate_patient_id():
    # Format: PAB123456
    # Generate two random capital letters
    letters = random.choices(string.ascii_uppercase, k=2)
    # Generate six random digits
    numbers = ''.join(random.choices(string.digits, k=6))
    patient_id = ''.join(letters) + numbers
    return 'P' + patient_id

# Generate random patient IDs within each age and gender group (uniqueness is likely but not enforced)
def generate_patient_ids(group):
    num_patients = int(len(group) / 3)  # One candidate ID per three rows; each ID is repeated 1-10 times below
    patient_ids = [generate_patient_id() for _ in range(num_patients)]
    np.random.shuffle(patient_ids)
    group['Patient ID'] = np.repeat(patient_ids, np.random.randint(1, 11, size=num_patients))[:len(group)]
    return group

# Group by Age Range and Patient Gender, and apply function to generate patient IDs
df = df.groupby(['Age Range', 'Patient Gender']).apply(generate_patient_ids)

# Reset index
df.reset_index(drop=True, inplace=True)

# Check the final DataFrame with patient IDs assigned
df


Unnamed: 0,Report Year,Diagnosis Category,Diagnosis Sub Category,Treatment Category,Treatment Sub Category,Age Range,Patient Gender,Patient ID
0,2016,Autism Spectrum,Autism-PDD-NOS,Autism Related Treatment,Speech Therapy,0-10,Female,PMP898219
1,2016,Autism Spectrum,Autism-PDD-NOS,Autism Related Treatment,ABA-Applied Behavioral Analysis,0-10,Female,PMP898219
2,2016,Pediatrics,Developmental Delays/ Cognitive,Durable Medical Equipment,Wheelchair,0-10,Female,PIQ716007
3,2005,Endocrine/ Metabolic,Growth Hormone Deficiency,Pharmacy/Prescription Drugs,Hormones,0-10,Female,PIQ716007
4,2016,Endocrine/ Metabolic,Growth Hormone Deficiency,Pharmacy/Prescription Drugs,Hormones,0-10,Female,PIQ716007
...,...,...,...,...,...,...,...,...
11684,2004,Genitourinary/ Kidney,Erectile Dysfunction,Pharmacy/Prescription Drugs,Treatment for Impotence,65+,Male,PNN624507
11685,2004,Cancer,Lung Cancer,Pharmacy/Prescription Drugs,Chemotherapy/ Cancer Medications,65+,Male,PNN624507
11686,2003,Cancer,Blood Cancer,Pharmacy/Prescription Drugs,Chemotherapy/ Cancer Medications,65+,Male,PNN624507
11687,2003,Central Nervous System/ Neuromuscular,Headache/Migraine,Pharmacy/Prescription Drugs,Treatment for Migraine,65+,Male,PNN624507
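
The IDs above are drawn independently at random, so nothing in the code prevents two patients from receiving the same ID, although a collision is unlikely at this scale. If uniqueness has to be guaranteed, one option is to collect IDs in a set until enough distinct values exist. The sketch below is a stand-alone illustration under that assumption; `generate_unique_ids` and its `seed` parameter are hypothetical names and not part of the notebook.

```python
import random
import string

def generate_unique_ids(n, prefix='P', seed=None):
    """Return n distinct IDs shaped like PAB123456 (hypothetical helper)."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    ids = set()
    while len(ids) < n:
        letters = ''.join(rng.choices(string.ascii_uppercase, k=2))
        digits = ''.join(rng.choices(string.digits, k=6))
        ids.add(prefix + letters + digits)  # duplicates are silently absorbed by the set
    return sorted(ids)

ids = generate_unique_ids(1000, seed=0)
print(len(ids), len(set(ids)))
```

Passing a seed makes the ID list reproducible between runs, which may help when the generated dataset needs to be regenerated later.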


## Enhance Dataset

In [13]:
from faker import Faker
import random
import string

import pandas as pd

# Instantiate Faker
fake = Faker()
fake_german = Faker('de_DE')

def fake_german_phone_number():
    # Choose the third digit randomly from 5, 6, or 7
    third_digit = random.choice(['5', '6', '7'])
    # Generate the rest of the phone number
    phone_number = '01' + third_digit + fake.numerify(text='#########')

    return phone_number  # already a str; str has no .astype()

def fake_blood_type():
    # List of real human blood types
    real_blood_types = ['A+', 'A-', 'B+', 'B-', 'O+', 'O-', 'AB+', 'AB-']
    
    # Randomly select and return a blood type
    return random.choice(real_blood_types)

def fake_insurance_number():
    # Roughly one patient in ten is marked as privately insured
    if random.randint(1, 10) > 9:
        return 'Private'
    else:
        letter = fake.random_uppercase_letter()
        nine_digit_number = fake.random_number(digits=9, fix_len=True)
        insurance_number = f'{letter}{nine_digit_number}'
        
        return insurance_number
    
def fake_emergency_contact():
    return f'{fake.name()}, {fake_german_phone_number()}.'

def fake_date():
    start_date = pd.to_datetime('2017-01-01')
    end_date = pd.to_datetime('2018-12-31')
    return fake.date_between(start_date=start_date, end_date=end_date)

def fake_consulting_physicians():
    consulting_physicians = ['Dr. Alexandria Gaines', 'Dr. Eddie Young', 'Dr. James Barber', 'Dr. Jerry Daniels', 'Dr. Michelle Lamb', 'Dr. Shelly Hunt']
    return random.choices(consulting_physicians, k=len(df))

def fake_visit_id():
    # Format: VBX123456
    letters = random.choices(string.ascii_uppercase, k=2)
    numbers = ''.join(random.choices(string.digits, k=6))
    visit_id = ''.join(letters) + numbers
    return 'V' + visit_id
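
The phone helper above builds numbers of the form `015`, `016` or `017` followed by nine digits, twelve characters in total. A minimal sketch of checking that shape with a regular expression, using only the standard library; `MOBILE_PATTERN` and `make_number` are hypothetical stand-ins for illustration and do not call Faker.

```python
import random
import re

# '01', then one of 5/6/7, then exactly nine digits
MOBILE_PATTERN = re.compile(r'01[567]\d{9}')

def make_number(rng):
    """Build a number with the same shape as fake_german_phone_number (stand-in)."""
    third_digit = rng.choice('567')
    rest = ''.join(rng.choice(string_digits) for _ in range(9))
    return '01' + third_digit + rest

string_digits = '0123456789'
rng = random.Random(42)
samples = [make_number(rng) for _ in range(5)]
all_valid = all(MOBILE_PATTERN.fullmatch(s) is not None for s in samples)
print(all_valid)
```

A check like this could be run over the generated `Patient Phone` column to catch values that lost their leading zero or were otherwise altered.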

In [76]:
# Add fake values on patient level
df['Patient Blood Type'] = df.groupby('Patient ID')['Patient ID'].transform(lambda _: fake_blood_type())
df['Patient Phone'] = df.groupby('Patient ID')['Patient ID'].transform(lambda _: fake_german_phone_number())
df['Patient Insurance Number'] = df.groupby('Patient ID')['Patient ID'].transform(lambda _: fake_insurance_number())
df['Patient Emergency Contact'] = df.groupby('Patient ID')['Patient ID'].transform(lambda _: fake_emergency_contact())
df['Patient Name'] = df.groupby('Patient ID')['Patient ID'].transform(lambda _: fake.name())
df['Patient Address'] = df.groupby('Patient ID')['Patient ID'].transform(lambda _: fake_german.address())
df['Patient Occupation'] = df.groupby('Patient ID')['Patient ID'].transform(lambda _: fake.job())

df.rename(columns={'Report Year': 'Visit Date'}, inplace=True)
df['Visit Date'] = df.apply(lambda _: fake_date(), axis=1)

df.sort_values(by=["Patient ID", "Visit Date"], inplace=True)
df['New Patient'] = df.groupby('Patient ID')['Visit Date'].transform('first') == df['Visit Date']

df['Consulting Physician'] = fake_consulting_physicians()
df['Visit ID'] = df.apply(lambda _: fake_visit_id(), axis=1)
df

Unnamed: 0,Visit Date,Diagnosis Category,Diagnosis Sub Category,Treatment Category,Treatment Sub Category,Age Range,Patient Gender,Patient ID,Patient Blood Type,Patient Phone,Patient Insurance Number,Patient Emergency Contact,Patient Name,Patient Address,Patient Occupation,New Patient,Consulting Physician,Visit ID
5956,2017-04-08,Cancer,Breast Cancer,Cancer Treatment,Radiation Oncology,41-50,Female,PAA152588,A-,016953842686,W427744329,"Angela Hoffman, 016514128513.",Matthew Hill,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,True,Dr. Eddie Young,VWA804129
5954,2017-04-14,Orthopedic/ Musculoskeletal,TMJ - Temporal Mandibular Joint,Pharmacy/Prescription Drugs,Botox Injection,41-50,Female,PAA152588,A-,016953842686,W427744329,"Angela Hoffman, 016514128513.",Matthew Hill,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,False,Dr. Shelly Hunt,VSV329651
5958,2017-09-23,Immunologic,Rheumatoid Arthritis,Preventive Health Screening,Adult Immunization,41-50,Female,PAA152588,A-,016953842686,W427744329,"Angela Hoffman, 016514128513.",Matthew Hill,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,False,Dr. Jerry Daniels,VJE335961
5955,2018-02-02,Cancer,Skin Cancer,Cancer Treatment,Chemotherapy,41-50,Female,PAA152588,A-,016953842686,W427744329,"Angela Hoffman, 016514128513.",Matthew Hill,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,False,Dr. Jerry Daniels,VJP866760
5953,2018-09-06,Mental,Bipolar Disorder,Mental Health Treatment,Acute Psychiatric Facility Admission,41-50,Female,PAA152588,A-,016953842686,W427744329,"Angela Hoffman, 016514128513.",Matthew Hill,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,False,Dr. Jerry Daniels,VXC886518
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5938,2017-03-18,Central Nervous System/ Neuromuscular,,Pharmacy/Prescription Drugs,Botox Injection,41-50,Female,PZZ942558,B-,016558230604,Y759513074,"Lance Anderson, 017609675353.",Martin Mccarthy,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,True,Dr. Jerry Daniels,VEW621744
5934,2017-04-21,Orthopedic/ Musculoskeletal,,Chiropractic,Pain Management,41-50,Female,PZZ942558,B-,016558230604,Y759513074,"Lance Anderson, 017609675353.",Martin Mccarthy,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,False,Dr. Jerry Daniels,VFZ983656
5936,2018-03-28,Central Nervous System/ Neuromuscular,Myasthenia Gravis,Pharmacy/Prescription Drugs,IVIG Therapy,41-50,Female,PZZ942558,B-,016558230604,Y759513074,"Lance Anderson, 017609675353.",Martin Mccarthy,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,False,Dr. Alexandria Gaines,VEC374437
5935,2018-09-14,Orthopedic/ Musculoskeletal,Vertebral Disc Problem,Orthopedic,,41-50,Female,PZZ942558,B-,016558230604,Y759513074,"Lance Anderson, 017609675353.",Martin Mccarthy,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,False,Dr. Alexandria Gaines,VGW264756


In [11]:
import pandas as pd
import numpy as np
from datetime import datetime

df['Visit Date'] = pd.to_datetime(df['Visit Date'])

# Function to pick a random age within the range, excluding the last two years
def pick_random_age(age_range):
    if age_range == '65+':
        lower, upper = 65, 100
    elif age_range == '0-10':
        lower, upper = 1, 9
    else:
        lower, upper = map(int, age_range.split('-'))
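    # Leave headroom below the upper bound so the patient stays inside the
    # range as they age across visits; range() excludes its end, so the
    # largest initial age is upper - 3
    upper_excluded = upper - 2
    return np.random.choice(range(lower, upper_excluded))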

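# Apply the function to each age range to get an initial random age.
# 'Age Range' is only renamed to 'Patient Age Range' in a later cell, so accept either name.
age_col = 'Patient Age Range' if 'Patient Age Range' in df.columns else 'Age Range'
df['Initial Age'] = df[age_col].apply(pick_random_age)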

# Sort by 'Patient ID' and 'Visit Date' to ensure chronological order
df.sort_values(by=['Patient ID', 'Visit Date'], inplace=True)

# Initialize an empty 'Patient Age' column
df['Patient Age'] = np.nan

# Function to update age for each visit considering the time elapsed since the first visit
def update_age(group):
    initial_age = group.iloc[0]['Initial Age']
    group['Patient Age'] = np.floor(initial_age + (group['Visit Date'] - group.iloc[0]['Visit Date']).dt.days / 365.25).astype(int)
    return group

# Apply the function to each group of visits by the same patient
df = df.groupby('Patient ID').apply(update_age)

# Drop the 'Initial Age' column as it's no longer needed
df.drop(columns=['Initial Age'], inplace=True)
df


Patient ID,,Visit ID,Visit Date,Patient ID,Diagnosis Category,Diagnosis Sub Category,Treatment Category,Treatment Sub Category,New Patient,Consulting Physician,Patient Name,Patient Gender,Patient Age,Patient Age Range,Patient Blood Type,Patient Insurance Number,Patient Phone,Patient Address,Patient Occupation,Patient Emergency Contact
PAA152588,1538,VWA804129,2017-04-08,PAA152588,Cancer,Breast Cancer,Cancer Treatment,Radiation Oncology,True,Dr. Eddie Young,Matthew Hill,Female,43,41-50,A-,W427744329,16953842686,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,"Angela Hoffman, 016514128513."
PAA152588,1630,VSV329651,2017-04-14,PAA152588,Orthopedic/ Musculoskeletal,TMJ - Temporal Mandibular Joint,Pharmacy/Prescription Drugs,Botox Injection,False,Dr. Shelly Hunt,Matthew Hill,Female,43,41-50,A-,W427744329,16953842686,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,"Angela Hoffman, 016514128513."
PAA152588,4169,VJE335961,2017-09-23,PAA152588,Immunologic,Rheumatoid Arthritis,Preventive Health Screening,Adult Immunization,False,Dr. Jerry Daniels,Matthew Hill,Female,43,41-50,A-,W427744329,16953842686,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,"Angela Hoffman, 016514128513."
PAA152588,6231,VJP866760,2018-02-02,PAA152588,Cancer,Skin Cancer,Cancer Treatment,Chemotherapy,False,Dr. Jerry Daniels,Matthew Hill,Female,43,41-50,A-,W427744329,16953842686,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,"Angela Hoffman, 016514128513."
PAA152588,9788,VXC886518,2018-09-06,PAA152588,Mental,Bipolar Disorder,Mental Health Treatment,Acute Psychiatric Facility Admission,False,Dr. Jerry Daniels,Matthew Hill,Female,44,41-50,A-,W427744329,16953842686,Violetta-Jungfer-Platz 34\n75141 Staßfurt,Insurance underwriter,"Angela Hoffman, 016514128513."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
PZZ942558,1217,VEW621744,2017-03-18,PZZ942558,Central Nervous System/ Neuromuscular,,Pharmacy/Prescription Drugs,Botox Injection,True,Dr. Jerry Daniels,Martin Mccarthy,Female,43,41-50,B-,Y759513074,16558230604,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,"Lance Anderson, 017609675353."
PZZ942558,1743,VFZ983656,2017-04-21,PZZ942558,Orthopedic/ Musculoskeletal,,Chiropractic,Pain Management,False,Dr. Jerry Daniels,Martin Mccarthy,Female,43,41-50,B-,Y759513074,16558230604,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,"Lance Anderson, 017609675353."
PZZ942558,7088,VEC374437,2018-03-28,PZZ942558,Central Nervous System/ Neuromuscular,Myasthenia Gravis,Pharmacy/Prescription Drugs,IVIG Therapy,False,Dr. Alexandria Gaines,Martin Mccarthy,Female,44,41-50,B-,Y759513074,16558230604,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,"Lance Anderson, 017609675353."
PZZ942558,9909,VGW264756,2018-09-14,PZZ942558,Orthopedic/ Musculoskeletal,Vertebral Disc Problem,Orthopedic,,False,Dr. Alexandria Gaines,Martin Mccarthy,Female,44,41-50,B-,Y759513074,16558230604,Osman-Pruschke-Weg 0\n95858 Luckau,Tour manager,"Lance Anderson, 017609675353."
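
The age update above adds the time elapsed since a patient's first visit, measured in years of 365.25 days, to the initial age and rounds down. A minimal stand-alone sketch of that arithmetic with the standard library; `age_at_visit` is a hypothetical helper, and the initial age of 43 and the visit dates are taken from patient PAA152588 in the output above.

```python
import math
from datetime import date

def age_at_visit(initial_age, first_visit, visit):
    """Age at a visit: initial age plus elapsed years, rounded down."""
    elapsed_years = (visit - first_visit).days / 365.25
    return math.floor(initial_age + elapsed_years)

first_visit = date(2017, 4, 8)
visits = [date(2017, 4, 8), date(2018, 2, 2), date(2018, 9, 6)]
ages = [age_at_visit(43, first_visit, v) for v in visits]
print(ages)  # [43, 43, 44]
```

Because the age only advances once a full 365.25-day year has passed since the first visit, the first visit date effectively acts as the birthday in this scheme.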


In [21]:
df.reset_index(drop=True, inplace=True)
df.rename(columns={'Age Range': 'Patient Age Range'}, inplace=True)
re_order = [
    'Visit ID', 'Visit Date', 'Patient ID', 'Diagnosis Category', 
    'Diagnosis Sub Category', 'Treatment Category', 'Treatment Sub Category', 
    'New Patient', 'Consulting Physician', 'Patient Name', 'Patient Gender', 'Patient Age',
    'Patient Age Range', 'Patient Blood Type', 'Patient Insurance Number', 
    'Patient Phone', 'Patient Address', 'Patient Occupation', 'Patient Emergency Contact'
]
df = df[re_order]
df = df.sort_values(by='Visit Date').reset_index(drop=True)
df

Unnamed: 0,Visit ID,Visit Date,Patient ID,Diagnosis Category,Diagnosis Sub Category,Treatment Category,Treatment Sub Category,New Patient,Consulting Physician,Patient Name,Patient Gender,Patient Age,Patient Age Range,Patient Blood Type,Patient Insurance Number,Patient Phone,Patient Address,Patient Occupation,Patient Emergency Contact
0,VQC513203,2017-01-01,PZM508653,Chronic Pain,Vertebral Disc Problem,Pharmacy/Prescription Drugs,Non-FDA Approved Use,True,Dr. Jerry Daniels,Michelle Fisher,Female,43,41-50,O-,G264037622,017588215469,Pärtzeltweg 2\n22301 Neunburg vorm Wald,"Surveyor, quantity","Jennifer Bailey, 015680180768."
1,VVC435406,2017-01-01,PSN036517,Endocrine/ Metabolic,Hormone Deficiency,Pharmacy/Prescription Drugs,Hormones,True,Dr. Michelle Lamb,Brooke Davis,Female,36,31-40,O+,Y133547589,015087781378,Kira-Gorlitz-Allee 8\n67100 Rosenheim,Copy,"Dennis Carlson, 015182104709."
2,VKT437745,2017-01-01,PPD253419,Pediatrics,Delayed Speech,Rehabilitation Services - Outpatient,Speech Therapy,True,Dr. Michelle Lamb,Andrew Graves,Male,5,0-10,O+,F115599209,016368315207,Thiesstr. 3/5\n68745 Bremen,Claims inspector/assessor,"Veronica Harris, 015034485673."
3,VJG208744,2017-01-01,PBN488954,Central Nervous System/ Neuromuscular,,Pharmacy/Prescription Drugs,Non-FDA Approved Use,True,Dr. James Barber,Cody Carpenter,Male,52,51-64,AB+,W720918648,016217695326,Schmidtkeallee 53\n54913 Siegen,Tourism officer,"Michelle Graham, 017438105819."
4,VAF235393,2017-01-01,PJG173047,Cancer,Breast Cancer,Cancer Treatment,Surgery,True,Dr. James Barber,Christopher White,Female,36,31-40,A+,Q198981012,017645195473,Ida-Fliegner-Ring 7/6\n53118 Wolgast,Engineering geologist,"Peter Stout, 015017527431."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11684,VTL998110,2018-12-30,PND058072,Mental,Depression,Electrical/ Thermal/ Radiofreq. Interventions,Transcranial Magnetic Stimulation,False,Dr. Alexandria Gaines,Jean Rios,Male,51,51-64,A-,Private,016976645290,Wolfram-auch Schlauchin-Ring 3/3\n94816 Neuruppin,"Research officer, political party","Emily Garrett, 016593887796."
11685,VLH032189,2018-12-30,PPC895292,Cancer,Liver Cancer,Special Procedure,,False,Dr. Eddie Young,Hailey Terry,Male,62,51-64,B-,Q100759966,016394851011,Sölzerring 05\n21768 Altötting,"Engineer, petroleum","Linda Flores, 016973980632."
11686,VQZ422704,2018-12-30,PFS771409,Genetic,Chromosomal Anomalies,Pharmacy/Prescription Drugs,Hormones,False,Dr. James Barber,Pamela Mcgee MD,Female,7,0-10,A+,L605331052,016453400773,Mielcarekplatz 1/9\n83357 Melle,Scientific laboratory technician,"Michael Weiss, 017716313028."
11687,VUT814433,2018-12-30,PVV930449,Mental,,,,True,Dr. Eddie Young,Monique Stephens,Female,16,10-20,A+,G215159968,017155908480,Dussen vangasse 501\n61560 Aachen,Data processing manager,"Tammy Whitney, 017700151177."


## Export

In [22]:
df.to_csv('dataset_1.csv', index=False)
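
One thing to watch when the exported file is read back: the `Patient Phone` values in the `In [11]` output above appear without their leading zero, which is what pandas' default type inference would produce if that frame had been reloaded from CSV. A small sketch of the effect and one way around it, assuming pandas is installed; the in-memory `csv_text` is a stand-in for `dataset_1.csv`.

```python
import io

import pandas as pd

# Stand-in for the exported file, reduced to two columns
csv_text = "Patient ID,Patient Phone\nPAA152588,016953842686\n"

# Default inference parses the phone number as an integer and drops the leading zero
inferred = pd.read_csv(io.StringIO(csv_text))

# Declaring the column as text keeps the value exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'Patient Phone': str})

print(inferred['Patient Phone'][0])
print(as_text['Patient Phone'][0])
```

The same `dtype` mapping could be extended to any other identifier-like column that should be treated as text rather than as a number.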