**Notes:**
* This notebook takes the Phase2_Phase4 filtered data created in the previous step and adds more features from the AACT database
* Additional features come from these AACT tables: calculated values, eligibilities, officials, and countries
* Only clinical trials in the relevant years, from 1987 through 2024, are considered
* Features are also added from SLM/LLM models: 1. Pregnant Women inclusion, 2. Criteria Robustness, and 3. Human Important Rating
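
The year restriction mentioned in the notes can be sketched as a filter on the parsed dates. This is a minimal illustration on a toy frame: the helper name `filter_year_window`, the inclusive bounds, and the choice to apply the window to both the start and the primary completion date are assumptions, not code taken from this notebook.

```python
import pandas as pd

def filter_year_window(df, start_col="start_date", end_col="primary_completion_date",
                       first_year=1987, last_year=2024):
    """Keep rows whose start and completion years both fall in [first_year, last_year]."""
    start_year = pd.to_datetime(df[start_col], errors="coerce").dt.year
    end_year = pd.to_datetime(df[end_col], errors="coerce").dt.year
    # Rows with an unparseable date get NaN as the year, and between() treats NaN as False.
    mask = start_year.between(first_year, last_year) & end_year.between(first_year, last_year)
    return df[mask]

# Toy data with hypothetical trial ids
toy = pd.DataFrame({
    "nct_id": ["A", "B", "C"],
    "start_date": ["1985-06-30", "1999-01-31", "2010-05-01"],
    "primary_completion_date": ["1990-01-01", "2003-07-15", "not a date"],
})
kept = filter_year_window(toy)
```

Rows that start before the window or carry an unparseable date are dropped, so only trial `B` survives in the toy example.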

In [104]:
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)

In [105]:
# load the conditions to disease type to therapy area mapping
df_conditions_ta = pd.read_csv('nov_24/nov_23/df_conditions_ta.txt', sep='|')
df_conditions_ta = df_conditions_ta.drop(columns=['Unnamed: 0'])
df_conditions_ta.head()

Unnamed: 0,conditions,disease_type,new_therapy_area
0,rhinitis,Allergic Rhinitis,Respiratory
1,allergic rhinitis,Allergic Rhinitis,Respiratory
2,seasonal allergic rhinitis,Allergic Rhinitis,Respiratory
3,acute rhinosinusitis,Allergic Rhinitis,Respiratory
4,allergic rhino-conjunctivitis,Allergic Rhinitis,Respiratory


In [106]:
# load the dataset with phase 2 and phase 4 trials filtered
df_merged_1021_p2_p4 = pd.read_csv('nov_24/merged_df_1021_p2_p4filter.csv', sep='|')
df_merged_1021_p2_p4.head()


Unnamed: 0.2,Unnamed: 0.1,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Unnamed: 0,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome
0,0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,4966,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success
1,1,NCT04165031,DRUG,Administered orally,erlotinib,non-small cell lung cancer,4966,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success
2,2,NCT02574078,DRUG,,erlotinib,non-small cell lung cancer,4966,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2015-10-09,2021-03-15,,2021-04-12,2015-10-09,2015-10-12,ESTIMATED,2021-04-12,2021-05-06,ACTUAL,,,,2021-04-12,2021-05-06,ACTUAL,2015-11-23,ACTUAL,2015-11-23,2021-04,2021-04-30,2020-04-15,ACTUAL,2020-04-15,2020-04-15,ACTUAL,2020-04-15,,INTERVENTIONAL,CheckMate370,,A Study of Nivolumab in Advanced Non-Small Cel...,A Master Protocol of Phase 1/2 Studies of Nivo...,COMPLETED,,PHASE1/PHASE2,341.0,ACTUAL,Bristol-Myers Squibb,,10.0,,,f,,,,t,,,,,,,,,,,,,2024-08-04 07:48:14.913206,2024-08-04 07:48:14.913206,INDUSTRY,,,,,,,success
3,3,NCT00840125,DRUG,150mg tablet daily,erlotinib,non-small cell lung cancer,4966,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2009-02-09,,,2012-10-29,2009-02-09,2009-02-10,ESTIMATED,,,,,,,2012-10-29,2012-10-30,ESTIMATED,2009-02,,2009-02-28,2012-10,2012-10-31,2011-08,ACTUAL,2011-08-31,2010-12,ACTUAL,2010-12-31,,INTERVENTIONAL,,,Study of Erlotinib With Docetaxel in Selected ...,A Phase II Open-Label Study Designed to Evalua...,COMPLETED,,PHASE2,4.0,ACTUAL,Meir Medical Center,,1.0,,,f,,,,t,,,,,,,,,,,,,2024-08-09 19:59:10.21038,2024-08-09 19:59:10.21038,OTHER,,,,,,,success
4,4,NCT00563784,DRUG,150 mg by mouth daily for 7 Weeks,erlotinib,non-small cell lung cancer,4966,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2007-11-21,2019-09-03,,2019-11-18,2007-11-23,2007-11-26,ESTIMATED,2019-11-18,2019-11-20,ACTUAL,,,,2019-11-18,2019-11-20,ACTUAL,2007-11,ACTUAL,2007-11-30,2019-11,2019-11-30,2018-05-30,ACTUAL,2018-05-30,2018-05-30,ACTUAL,2018-05-30,,INTERVENTIONAL,,48 patients registered and treated under the p...,TARCEVA (Erlotinib) in Combination With Chemor...,A Phase II Study of TARCEVA (Erlotinib) in Com...,COMPLETED,,PHASE2,68.0,ACTUAL,M.D. Anderson Cancer Center,,1.0,,,f,,,,f,t,f,,,,,,,,,,,2024-08-04 12:54:21.848967,2024-08-04 12:54:21.848967,OTHER,,,,,,,success


In [118]:
# merge p2_p4 filtered data with conditions-to-disease_type-to-therapyarea mapping
merged_df_1123 = pd.merge(df_merged_1021_p2_p4, df_conditions_ta, on="conditions", how="inner")
merged_df_1123 = merged_df_1123.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'])
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology
1,NCT04165031,DRUG,Administered orally,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology
2,NCT02574078,DRUG,,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2015-10-09,2021-03-15,,2021-04-12,2015-10-09,2015-10-12,ESTIMATED,2021-04-12,2021-05-06,ACTUAL,,,,2021-04-12,2021-05-06,ACTUAL,2015-11-23,ACTUAL,2015-11-23,2021-04,2021-04-30,2020-04-15,ACTUAL,2020-04-15,2020-04-15,ACTUAL,2020-04-15,,INTERVENTIONAL,CheckMate370,,A Study of Nivolumab in Advanced Non-Small Cel...,A Master Protocol of Phase 1/2 Studies of Nivo...,COMPLETED,,PHASE1/PHASE2,341.0,ACTUAL,Bristol-Myers Squibb,,10.0,,,f,,,,t,,,,,,,,,,,,,2024-08-04 07:48:14.913206,2024-08-04 07:48:14.913206,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology
3,NCT00840125,DRUG,150mg tablet daily,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2009-02-09,,,2012-10-29,2009-02-09,2009-02-10,ESTIMATED,,,,,,,2012-10-29,2012-10-30,ESTIMATED,2009-02,,2009-02-28,2012-10,2012-10-31,2011-08,ACTUAL,2011-08-31,2010-12,ACTUAL,2010-12-31,,INTERVENTIONAL,,,Study of Erlotinib With Docetaxel in Selected ...,A Phase II Open-Label Study Designed to Evalua...,COMPLETED,,PHASE2,4.0,ACTUAL,Meir Medical Center,,1.0,,,f,,,,t,,,,,,,,,,,,,2024-08-09 19:59:10.21038,2024-08-09 19:59:10.21038,OTHER,,,,,,,success,"Lung, Non-Small Cell",Oncology
4,NCT00563784,DRUG,150 mg by mouth daily for 7 Weeks,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2007-11-21,2019-09-03,,2019-11-18,2007-11-23,2007-11-26,ESTIMATED,2019-11-18,2019-11-20,ACTUAL,,,,2019-11-18,2019-11-20,ACTUAL,2007-11,ACTUAL,2007-11-30,2019-11,2019-11-30,2018-05-30,ACTUAL,2018-05-30,2018-05-30,ACTUAL,2018-05-30,,INTERVENTIONAL,,48 patients registered and treated under the p...,TARCEVA (Erlotinib) in Combination With Chemor...,A Phase II Study of TARCEVA (Erlotinib) in Com...,COMPLETED,,PHASE2,68.0,ACTUAL,M.D. Anderson Cancer Center,,1.0,,,f,,,,f,t,f,,,,,,,,,,,2024-08-04 12:54:21.848967,2024-08-04 12:54:21.848967,OTHER,,,,,,,success,"Lung, Non-Small Cell",Oncology
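
An inner join on `conditions` silently drops any trial whose condition string has no entry in the mapping table. One way to see what would be lost is to run a left join with `indicator=True` first. The sketch below uses toy frames with hypothetical trial ids; only the column names are taken from the notebook.

```python
import pandas as pd

trials = pd.DataFrame({
    "nct_id": ["T1", "T2", "T3"],  # hypothetical ids
    "conditions": ["rhinitis", "allergic rhinitis", "unmapped condition"],
})
mapping = pd.DataFrame({
    "conditions": ["rhinitis", "allergic rhinitis"],
    "disease_type": ["Allergic Rhinitis", "Allergic Rhinitis"],
    "new_therapy_area": ["Respiratory", "Respiratory"],
})

# A left join with indicator=True adds a "_merge" column;
# rows that found no mapping are marked "left_only".
audit = trials.merge(mapping, on="conditions", how="left", indicator=True)
unmatched = audit.loc[audit["_merge"] == "left_only", "conditions"].unique().tolist()

# Rows marked "both" are the ones an inner join would keep.
matched = audit[audit["_merge"] == "both"].drop(columns="_merge")
```

Inspecting `unmatched` before the inner join shows whether the dropped conditions are noise or gaps in the mapping file.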


In [119]:
merged_df_1123.columns.to_list()

['nct_id',
 'intervention_type',
 'description',
 'trial_drug_cleaned',
 'conditions',
 'Drug Name',
 'Highest Status',
 'Other Drug Names',
 'Originator Company',
 'Originator Company HQ',
 'Active Companies',
 'Active Companies HQ',
 'Therapy Area',
 'Active Indications',
 'Action',
 'Technologies',
 'Regulatory Designations',
 'Inactive Indications',
 'Inactive Companies',
 'Has Deals',
 'Last Change Date',
 'Added Date',
 'First Launched Date',
 'Extract',
 'Drug Id',
 'cortellis_cleaned_drug',
 'nlm_download_date_description',
 'study_first_submitted_date',
 'results_first_submitted_date',
 'disposition_first_submitted_date',
 'last_update_submitted_date',
 'study_first_submitted_qc_date',
 'study_first_posted_date',
 'study_first_posted_date_type',
 'results_first_submitted_qc_date',
 'results_first_posted_date',
 'results_first_posted_date_type',
 'disposition_first_submitted_qc_date',
 'disposition_first_posted_date',
 'disposition_first_posted_date_type',
 'last_update_submitt

In [120]:
# change the data type of start_date and primary_completion_date to datetime from string
merged_df_1123['start_date'] = pd.to_datetime(merged_df_1123['start_date'], errors='coerce')
merged_df_1123['primary_completion_date'] = pd.to_datetime(merged_df_1123['primary_completion_date'], errors='coerce')

In [121]:
# Drop rows where either date is NaT (missing or unparseable)
merged_df_1123 = merged_df_1123.dropna(subset=['start_date', 'primary_completion_date'])

In [122]:
# filter on primary completion date type as "Actual"
merged_df_1123 = merged_df_1123[merged_df_1123['primary_completion_date_type'] == 'ACTUAL']

In [123]:
merged_df_1123.shape

(10616, 99)
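
The three cleaning steps above (coerce to datetime, drop `NaT`, keep only `ACTUAL` completion dates) each remove rows, but only the final shape is printed. A small audit like the following, shown on a toy frame with hypothetical ids, records how many rows each step removes. It is a sketch, not part of the original pipeline.

```python
import pandas as pd

toy = pd.DataFrame({
    "nct_id": ["T1", "T2", "T3", "T4"],  # hypothetical ids
    "start_date": ["2008-01-31", "bad value", "2015-11-23", "2009-02-28"],
    "primary_completion_date": ["2010-06-24", "2011-01-01", None, "2010-12-31"],
    "primary_completion_date_type": ["ACTUAL", "ACTUAL", "ACTUAL", "ESTIMATED"],
})

date_cols = ["start_date", "primary_completion_date"]
for col in date_cols:
    toy[col] = pd.to_datetime(toy[col], errors="coerce")

# Values that are NaT after parsing were either missing or unparseable.
nat_counts = toy[date_cols].isna().sum().to_dict()

with_dates = toy.dropna(subset=date_cols)
actual_only = with_dates[with_dates["primary_completion_date_type"] == "ACTUAL"]
row_counts = {
    "before": len(toy),
    "with_dates": len(with_dates),
    "actual_only": len(actual_only),
}
```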

In [124]:
# check unique start years
pd.DataFrame(merged_df_1123['start_date'].dt.year.unique()).sort_values(by=0)

Unnamed: 0,0
32,1987
36,1988
37,1989
34,1990
29,1991
33,1992
35,1993
27,1994
26,1995
28,1996


In [125]:
# check unique primary completion years
pd.DataFrame(merged_df_1123['primary_completion_date'].dt.year.unique()).sort_values(by=0)

Unnamed: 0,0
29,1994
27,1996
26,1997
28,1998
25,1999
24,2000
22,2001
20,2002
21,2003
19,2004
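
The unique-year listings show which years occur but not how many trials fall in each. If that is of interest, `value_counts()` followed by `sort_index()` gives a compact per-year count. The sketch below uses made-up dates purely for illustration.

```python
import pandas as pd

# Made-up dates standing in for a parsed date column
dates = pd.Series(pd.to_datetime(["1994-05-01", "1996-03-15", "1996-11-30", "2001-07-04"]))

# Count rows per calendar year, ordered by year rather than by frequency
year_counts = dates.dt.year.value_counts().sort_index()
```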


In [126]:
# load calculated values table from AACT
df_calculated_values = pd.read_csv('b3t0zvq5n6oyhvv9h6v8qsqg4j57/calculated_values.txt', sep='|')
df_calculated_values.head()

Unnamed: 0,id,nct_id,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure
0,29892566,NCT02198898,1.0,,,2013,,30.0,f,,f,t,20.0,80.0,Years,Years,1.0,,
1,29990558,NCT01492660,1.0,,,2011,,12.0,f,,f,t,18.0,80.0,Years,Years,1.0,5.0,
2,30039632,NCT00800774,1.0,,,2008,,129.0,f,,f,t,8.0,15.0,Years,Years,,,
3,29739658,NCT02415101,1.0,,,2015,,50.0,f,,f,t,18.0,80.0,Years,Years,1.0,12.0,
4,29867616,NCT01097122,1.0,,,2010,,3.0,f,,f,t,20.0,50.0,Years,Years,1.0,,


In [127]:
# merge calculated values table with p2_p4 table
merged_df_1123 = pd.merge(merged_df_1123, df_calculated_values, on="nct_id", how="inner")
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,
1,NCT04165031,DRUG,Administered orally,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,
2,NCT04165031,DRUG,Administered orally,ly3499446,non-small cell lung cancer,LY-3499446,Discontinued,"KRAS G12C inhibitor (cancer), Eli Lilly; LY-34...",Eli Lilly & Co,Eli Lilly & Co (US),,,Cancer,,Anticancer; K-Ras GTPase inhibitor,Oral formulation; Small molecule therapeutic,,Colorectal tumor; Non-small-cell lung cancer; ...,Eli Lilly & Co,No,2023-10-04,2019-11-22,,"Eli Lilly was developing LY-3499446, a selecti...",120133,ly3499446,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,failure,"Lung, Non-Small Cell",Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,
3,NCT04165031,DRUG,Administered IV,cetuximab,colorectal cancer,cetuximab,Launched,BMS-564717; C225; EMD-271786; EMR-62202; Erbit...,Imclone Systems Inc,Imclone Systems Inc (US),Imclone Systems Inc; Merck KGaA; Merck Serono SA,Imclone Systems Inc (US); Merck KGaA (Germany)...,Cancer; Neurology/Psychiatric,Anal tumor; Cutaneous squamous cell carcinoma;...,Analgesic; Anticancer; Anticancer monoclonal a...,Biological therapeutic; Chimeric monoclonal an...,Accelerated Approval; Fast Track; Orphan Drug;...,Biliary cancer; Bladder cancer; Breast tumor; ...,Bristol-Myers Squibb Co; Inven2,Yes,2024-07-03,1996-02-16,2003-12-01,Cetuximab (Erbitux) is a human-murine chimeric...,10388,cetuximab,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success,Colorectal (Oncology),Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,
4,NCT04165031,DRUG,Administered orally,ly3499446,colorectal cancer,LY-3499446,Discontinued,"KRAS G12C inhibitor (cancer), Eli Lilly; LY-34...",Eli Lilly & Co,Eli Lilly & Co (US),,,Cancer,,Anticancer; K-Ras GTPase inhibitor,Oral formulation; Small molecule therapeutic,,Colorectal tumor; Non-small-cell lung cancer; ...,Eli Lilly & Co,No,2023-10-04,2019-11-22,,"Eli Lilly was developing LY-3499446, a selecti...",120133,ly3499446,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,failure,Colorectal (Oncology),Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,


In [128]:
merged_df_1123.shape

(10616, 117)
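
The row count is unchanged after the merge, which is what one would expect if the calculated values table holds exactly one row per `nct_id` and every trial has a match. pandas can enforce the first half of that expectation through the `validate` argument of `merge`. The sketch below uses toy frames with hypothetical ids and is not code from the original notebook.

```python
import pandas as pd

# Left side: several drug/condition rows may share one trial id
trial_rows = pd.DataFrame({
    "nct_id": ["T1", "T2", "T2"],  # hypothetical ids
    "trial_drug_cleaned": ["drug_a", "drug_a", "drug_b"],
})
# Right side: assumed to hold one row per trial
per_trial = pd.DataFrame({
    "nct_id": ["T1", "T2"],
    "number_of_facilities": [47.0, 6.0],
})

# validate="many_to_one" raises MergeError if nct_id is duplicated on the right side.
merged = trial_rows.merge(per_trial, on="nct_id", how="inner", validate="many_to_one")
rows_preserved = len(merged) == len(trial_rows)

# Demonstrate the failure mode with a duplicated right-hand key.
duplicated = pd.concat([per_trial, per_trial.iloc[[0]]], ignore_index=True)
try:
    trial_rows.merge(duplicated, on="nct_id", how="inner", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
```

With the check in place, an accidental duplicate in a lookup table fails loudly instead of quietly multiplying rows.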

In [129]:
# load eligibilities table from AACT
df_eligibilities = pd.read_csv('b3t0zvq5n6oyhvv9h6v8qsqg4j57/eligibilities.txt', sep='|')
df_eligibilities.head()

Unnamed: 0,id,nct_id,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult
0,7469903,NCT04882709,,ALL,18 Years,99 Years,f,,Inclusion Criteria:~* Adults (18-99 years) wit...,,,t,f,t
1,7469904,NCT02714556,NON_PROBABILITY_SAMPLE,FEMALE,18 Years,,f,Pregnant women scheduled for elective caesarea...,Inclusion Criteria:~* Written informed consent...,,,t,f,t
2,7469905,NCT04478656,,ALL,18 Years,65 Years,t,,Inclusion Criteria:~1. Normal healthy male and...,,,t,f,t
3,7469906,NCT02448056,NON_PROBABILITY_SAMPLE,ALL,20 Years,,f,"HCC receive surgery, RFA, TACE or Sorafenib tx.","Inclusion Criteria:~* HCC, diagnosed by AASLD ...",,,t,f,t
4,7469907,NCT03516409,,ALL,6 Months,35 Months,t,,Inclusion Criteria:~* 1. Parents / legal guard...,,,f,t,f


In [130]:
# merge the eligibilities table into the p2_p4 table on nct_id
merged_df_1123 = pd.merge(merged_df_1123, df_eligibilities, on="nct_id", how="inner")
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id_x,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure,id_y,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t
1,NCT04165031,DRUG,Administered orally,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t
2,NCT04165031,DRUG,Administered orally,ly3499446,non-small cell lung cancer,LY-3499446,Discontinued,"KRAS G12C inhibitor (cancer), Eli Lilly; LY-34...",Eli Lilly & Co,Eli Lilly & Co (US),,,Cancer,,Anticancer; K-Ras GTPase inhibitor,Oral formulation; Small molecule therapeutic,,Colorectal tumor; Non-small-cell lung cancer; ...,Eli Lilly & Co,No,2023-10-04,2019-11-22,,"Eli Lilly was developing LY-3499446, a selecti...",120133,ly3499446,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,failure,"Lung, Non-Small Cell",Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t
3,NCT04165031,DRUG,Administered IV,cetuximab,colorectal cancer,cetuximab,Launched,BMS-564717; C225; EMD-271786; EMR-62202; Erbit...,Imclone Systems Inc,Imclone Systems Inc (US),Imclone Systems Inc; Merck KGaA; Merck Serono SA,Imclone Systems Inc (US); Merck KGaA (Germany)...,Cancer; Neurology/Psychiatric,Anal tumor; Cutaneous squamous cell carcinoma;...,Analgesic; Anticancer; Anticancer monoclonal a...,Biological therapeutic; Chimeric monoclonal an...,Accelerated Approval; Fast Track; Orphan Drug;...,Biliary cancer; Bladder cancer; Breast tumor; ...,Bristol-Myers Squibb Co; Inven2,Yes,2024-07-03,1996-02-16,2003-12-01,Cetuximab (Erbitux) is a human-murine chimeric...,10388,cetuximab,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success,Colorectal (Oncology),Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t
4,NCT04165031,DRUG,Administered orally,ly3499446,colorectal cancer,LY-3499446,Discontinued,"KRAS G12C inhibitor (cancer), Eli Lilly; LY-34...",Eli Lilly & Co,Eli Lilly & Co (US),,,Cancer,,Anticancer; K-Ras GTPase inhibitor,Oral formulation; Small molecule therapeutic,,Colorectal tumor; Non-small-cell lung cancer; ...,Eli Lilly & Co,No,2023-10-04,2019-11-22,,"Eli Lilly was developing LY-3499446, a selecti...",120133,ly3499446,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,failure,Colorectal (Oncology),Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t


In [131]:
merged_df_1123.shape

(10616, 130)
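
The row count stays at 10616 after this merge, which is consistent with the eligibilities table holding one row per `nct_id`. That property can be asserted instead of inferred from the shape. The following is a minimal sketch on toy data (the frames and values are hypothetical stand-ins, not the AACT files) that uses the `validate` argument of `pd.merge` to fail loudly if the right-hand table ever contains duplicate keys.

```python
import pandas as pd

# Toy stand-ins for the trial table and the eligibilities table (hypothetical values).
trials = pd.DataFrame({
    "nct_id": ["NCT001", "NCT001", "NCT002"],
    "trial_drug_cleaned": ["drug_a", "drug_b", "drug_c"],
})
eligibilities = pd.DataFrame({
    "nct_id": ["NCT001", "NCT002"],
    "gender": ["ALL", "FEMALE"],
})

# validate="many_to_one" raises pandas.errors.MergeError if the right table
# has more than one row for any nct_id, so the merge cannot silently add rows.
merged = pd.merge(trials, eligibilities, on="nct_id", how="inner", validate="many_to_one")

rows_before = len(trials)
rows_after = len(merged)
print(rows_before, rows_after)
```

With `how="inner"` the row count can still shrink if a trial has no eligibilities row, so comparing `rows_before` and `rows_after` remains a useful second check.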

In [132]:
# load officials table from AACT
df_overall_officials = pd.read_csv('b3t0zvq5n6oyhvv9h6v8qsqg4j57/overall_officials.txt', sep='|')
df_overall_officials

Unnamed: 0,id,nct_id,role,name,affiliation
0,6906313,NCT01905826,PRINCIPAL_INVESTIGATOR,"Steven M Holland, M.D.",National Institute of Allergy and Infectious D...
1,6906314,NCT06456398,PRINCIPAL_INVESTIGATOR,"Xu-Heng Chiang, MD",National Taiwan University Hospital
2,6906315,NCT06324253,PRINCIPAL_INVESTIGATOR,"AMAL G SAFAN, MD",Menoufia University
3,6906316,NCT06059846,STUDY_DIRECTOR,"Kamal Hamed, MD",Spero Therapeutics
4,6906317,NCT04571242,STUDY_CHAIR,"Ricardo Vallejo, MD",SGX Medical
...,...,...,...,...,...
477338,6811038,NCT01608191,PRINCIPAL_INVESTIGATOR,Jan Karlsson,"Universitetssjukvårdens forskningscentrum, UFC..."
477339,6811039,NCT00039299,STUDY_CHAIR,"Allan Pantuck, MD",Jonsson Comprehensive Cancer Center
477340,6811040,NCT03258606,PRINCIPAL_INVESTIGATOR,"Marion Danis, M.D.",National Institutes of Health Clinical Center ...
477341,6811041,NCT02938156,PRINCIPAL_INVESTIGATOR,Pratima Chowdary,Royal Free Hospitals NHS Foundation Trust


In [133]:
# merge the officials table into the p2_p4 table on nct_id
merged_df_1123 = pd.merge(merged_df_1123, df_overall_officials, on="nct_id", how="inner")
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id_x,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure,id_y,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult,id,role,name,affiliation
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,6937223,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc"
1,NCT04165031,DRUG,Administered orally,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t,7228546,STUDY_DIRECTOR,Call 1-877-CTLILLY (1-877-285-4559) or 1-317-6...,Eli Lilly and Company
2,NCT04165031,DRUG,Administered orally,ly3499446,non-small cell lung cancer,LY-3499446,Discontinued,"KRAS G12C inhibitor (cancer), Eli Lilly; LY-34...",Eli Lilly & Co,Eli Lilly & Co (US),,,Cancer,,Anticancer; K-Ras GTPase inhibitor,Oral formulation; Small molecule therapeutic,,Colorectal tumor; Non-small-cell lung cancer; ...,Eli Lilly & Co,No,2023-10-04,2019-11-22,,"Eli Lilly was developing LY-3499446, a selecti...",120133,ly3499446,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,failure,"Lung, Non-Small Cell",Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t,7228546,STUDY_DIRECTOR,Call 1-877-CTLILLY (1-877-285-4559) or 1-317-6...,Eli Lilly and Company
3,NCT04165031,DRUG,Administered IV,cetuximab,colorectal cancer,cetuximab,Launched,BMS-564717; C225; EMD-271786; EMR-62202; Erbit...,Imclone Systems Inc,Imclone Systems Inc (US),Imclone Systems Inc; Merck KGaA; Merck Serono SA,Imclone Systems Inc (US); Merck KGaA (Germany)...,Cancer; Neurology/Psychiatric,Anal tumor; Cutaneous squamous cell carcinoma;...,Analgesic; Anticancer; Anticancer monoclonal a...,Biological therapeutic; Chimeric monoclonal an...,Accelerated Approval; Fast Track; Orphan Drug;...,Biliary cancer; Bladder cancer; Breast tumor; ...,Bristol-Myers Squibb Co; Inven2,Yes,2024-07-03,1996-02-16,2003-12-01,Cetuximab (Erbitux) is a human-murine chimeric...,10388,cetuximab,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,success,Colorectal (Oncology),Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t,7228546,STUDY_DIRECTOR,Call 1-877-CTLILLY (1-877-285-4559) or 1-317-6...,Eli Lilly and Company
4,NCT04165031,DRUG,Administered orally,ly3499446,colorectal cancer,LY-3499446,Discontinued,"KRAS G12C inhibitor (cancer), Eli Lilly; LY-34...",Eli Lilly & Co,Eli Lilly & Co (US),,,Cancer,,Anticancer; K-Ras GTPase inhibitor,Oral formulation; Small molecule therapeutic,,Colorectal tumor; Non-small-cell lung cancer; ...,Eli Lilly & Co,No,2023-10-04,2019-11-22,,"Eli Lilly was developing LY-3499446, a selecti...",120133,ly3499446,,2019-11-14,2021-10-27,,2021-10-27,2019-11-14,2019-11-15,ACTUAL,2021-10-27,2021-11-24,ACTUAL,,,,2021-10-27,2021-11-24,ACTUAL,2019-11-28,ACTUAL,2019-11-28,2021-10,2021-10-31,2020-10-30,ACTUAL,2020-10-30,2020-10-30,ACTUAL,2020-10-30,,INTERVENTIONAL,,All participants who received at least one dos...,A Study of LY3499446 in Participants With Adva...,A Phase 1/2 Study of LY3499446 Administered to...,TERMINATED,,PHASE1/PHASE2,5.0,ACTUAL,Eli Lilly and Company,The study was terminated due to an unexpected ...,6.0,,The study was terminated due to an unexpected ...,f,,,,f,t,f,,,,,,Data are available 6 months after the primary ...,A research proposal must be approved by an ind...,https://vivli.org/,YES,Anonymized individual patient level data will ...,2024-08-10 03:04:24.451964,2024-08-10 03:04:24.451964,INDUSTRY,,,,,,,failure,Colorectal (Oncology),Oncology,29905748,6.0,33.0,5.0,2019,,,t,12.0,t,f,18.0,,Years,,3.0,8.0,,7625501,,ALL,18 Years,,f,,Inclusion Criteria:~* Participants must have d...,,,t,f,t,7228546,STUDY_DIRECTOR,Call 1-877-CTLILLY (1-877-285-4559) or 1-317-6...,Eli Lilly and Company


In [134]:
merged_df_1123.shape

(11196, 134)
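
The row count rises from 10616 to 11196 here. With an inner merge that can only happen when some trials match more than one officials row, and the same inner merge also drops any trial that has no official at all. If one row per trial-drug pair is wanted downstream, one option is to reduce the officials table to a single row per trial before merging. The sketch below uses toy data, and keeping the first listed official is an arbitrary illustrative rule, not something the notebook prescribes.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for the trial table and the officials table.
trials = pd.DataFrame({"nct_id": ["NCT001", "NCT002", "NCT003"]})
officials = pd.DataFrame({
    "nct_id": ["NCT001", "NCT001", "NCT002"],
    "role": ["PRINCIPAL_INVESTIGATOR", "STUDY_DIRECTOR", "STUDY_CHAIR"],
    "name": ["A", "B", "C"],
})

# Officials per trial: any count above 1 duplicates trial rows in a plain merge.
officials_per_trial = officials.groupby("nct_id").size()

# Keep the first listed official per trial (an assumption made for illustration).
first_official = officials.drop_duplicates(subset="nct_id", keep="first")

# A left merge keeps trials without an official (NCT003) instead of dropping them.
merged = trials.merge(first_official, on="nct_id", how="left", validate="one_to_one")
print(merged)
```

In the real table `nct_id` repeats across drug and condition rows, so `validate="many_to_one"` would be the matching check there.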

In [135]:
# rename the officials columns and drop the officials table's id column
merged_df_1123 = merged_df_1123.rename(columns={
    'name': 'official_name',
    'affiliation': 'official_affiliation',
    'role': 'official_role',
})
merged_df_1123 = merged_df_1123.drop(columns=['id'])

In [136]:
# load countries table from AACT
df_countries = pd.read_csv('b3t0zvq5n6oyhvv9h6v8qsqg4j57/countries.txt', sep='|')
df_countries

Unnamed: 0,id,nct_id,name,removed
0,10471681,NCT00397605,United Kingdom,t
1,10471682,NCT03739710,Australia,t
2,10471683,NCT03739710,Denmark,t
3,10471684,NCT02936102,Germany,t
4,10471685,NCT03712202,Canada,t
...,...,...,...,...
696288,10054057,NCT00229307,Israel,f
696289,10054058,NCT02575209,Spain,f
696290,10054059,NCT04520009,United States,f
696291,10054060,NCT01974609,United States,f


In [137]:
# merge the countries table into the p2_p4 table on nct_id
merged_df_1123 = pd.merge(merged_df_1123, df_countries, on="nct_id", how="inner")
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id_x,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure,id_y,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult,official_role,official_name,official_affiliation,id,name,removed
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138299,Czech Republic,t
1,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138896,United States,f
2,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138897,Argentina,f
3,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138898,Brazil,f
4,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138899,Czechia,f


In [138]:
merged_df_1123.shape

(24959, 136)

In [139]:
# rename the country columns ('name' -> 'trial_country', 'removed' -> 'country_removed')
merged_df_1123 = merged_df_1123.rename(columns={'name': 'trial_country', 'removed': 'country_removed'})

In [140]:
merged_df_1123.columns.to_list()

['nct_id',
 'intervention_type',
 'description',
 'trial_drug_cleaned',
 'conditions',
 'Drug Name',
 'Highest Status',
 'Other Drug Names',
 'Originator Company',
 'Originator Company HQ',
 'Active Companies',
 'Active Companies HQ',
 'Therapy Area',
 'Active Indications',
 'Action',
 'Technologies',
 'Regulatory Designations',
 'Inactive Indications',
 'Inactive Companies',
 'Has Deals',
 'Last Change Date',
 'Added Date',
 'First Launched Date',
 'Extract',
 'Drug Id',
 'cortellis_cleaned_drug',
 'nlm_download_date_description',
 'study_first_submitted_date',
 'results_first_submitted_date',
 'disposition_first_submitted_date',
 'last_update_submitted_date',
 'study_first_submitted_qc_date',
 'study_first_posted_date',
 'study_first_posted_date_type',
 'results_first_submitted_qc_date',
 'results_first_posted_date',
 'results_first_posted_date_type',
 'disposition_first_submitted_qc_date',
 'disposition_first_posted_date',
 'disposition_first_posted_date_type',
 'last_update_submitt

In [148]:
# load the features generated by the SLM/LLM models
df_llm_features = pd.read_csv('nov_24/nov_23/Clincal_data_Language_Model_MAPPINGS.csv', sep=',')
df_llm_features.head()

Unnamed: 0,nct_id,id,Drug Name,conditions,Therapy Area,Model_domenicrosati_ratings,Spacy Pregnant Women Excluded,LLama3_2_Criteria_Robustness,LLM_GBT_4o_Human_Importance_Ratings,description,criteria
0,NCT00000187,24095864,ritanserin,cocaine-related disorders,Toxicity/Intoxication; Neurology/Psychiatric,0.625245,No,0,0,The purpose of this study is to assess ritanse...,Please contact site for information.
1,NCT00000200,23965890,methadone,cocaine-related disorders,Neurology/Psychiatric,0.66299,No,0,1,The purpose of this study is to compare the ef...,Please contact site for information.
2,NCT00000395,24006623,methotrexate,rheumatoid arthritis,Immune; Dermatologic; Gastrointestinal; Ocular,0.564219,No,0,2,This study looks at how the arthritis drug met...,Inclusion Criteria:~* Individuals starting met...
3,NCT00000451,23799489,sertraline,alcoholism,Neurology/Psychiatric; Genitourinary/Sexual Fu...,0.575893,Yes,2,1,This study will assess the ability of naltrexo...,Inclusion Criteria:~* Alaska Native having bio...
4,NCT00001723,24120128,orlistat,obesity,Other/Miscellaneous,0.663715,Yes,2,2,Obesity is a condition affecting one-third off...,* INCLUSION CRITERIA:~Good general health. Ind...


In [160]:
# encode pregnant women exclusion
df_llm_features['Spacy Pregnant Women Excluded'] = df_llm_features['Spacy Pregnant Women Excluded'].replace({'Yes': 1, 'No': 0})
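
The encoding above can also be sketched with `Series.map`, which turns any value other than 'Yes' or 'No' into a missing value instead of passing it through unchanged. This is only an illustration: the toy frame and the `encode_yes_no` helper are hypothetical and not part of the notebook.

```python
import pandas as pd


def encode_yes_no(series: pd.Series) -> pd.Series:
    """Map 'Yes' -> 1 and 'No' -> 0; any other value becomes <NA>."""
    mapping = {"Yes": 1, "No": 0}
    # map() yields NaN for values missing from the mapping;
    # the nullable Int64 dtype keeps the result integer-typed despite gaps.
    return series.map(mapping).astype("Int64")


# Hypothetical toy data standing in for df_llm_features
toy = pd.DataFrame({"Spacy Pregnant Women Excluded": ["No", "Yes", "Yes", None]})
toy["encoded"] = encode_yes_no(toy["Spacy Pregnant Women Excluded"])
print(toy)
```

Counting `toy["encoded"].isna()` afterwards gives a quick check for labels that were neither 'Yes' nor 'No'.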

In [167]:
# take the max criteria-robustness score per (Drug Name, nct_id, conditions) group
LLama3_2_Criteria_Robustness = df_llm_features.groupby(['Drug Name','nct_id','conditions'])['LLama3_2_Criteria_Robustness'].max().reset_index(name='LLama3_2_Criteria_Robustness')
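
The notebook aggregates each LLM feature in its own groupby. As a sketch of an alternative, and assuming all features share the same grouping keys and the same `max` aggregation, they could be collapsed in a single pass. The toy data below is hypothetical.

```python
import pandas as pd

KEYS = ["Drug Name", "nct_id", "conditions"]
FEATURES = ["LLama3_2_Criteria_Robustness", "LLM_GBT_4o_Human_Importance_Ratings"]

# Hypothetical toy data standing in for df_llm_features
toy_llm = pd.DataFrame({
    "Drug Name": ["a", "a", "b"],
    "nct_id": ["NCT1", "NCT1", "NCT2"],
    "conditions": ["x", "x", "y"],
    "LLama3_2_Criteria_Robustness": [0, 2, 1],
    "LLM_GBT_4o_Human_Importance_Ratings": [1, 0, 2],
})

# One row per key combination, keeping the maximum of each feature
agg = toy_llm.groupby(KEYS, as_index=False)[FEATURES].max()
print(agg)
```

The resulting frame has one row per key combination, so it could be joined to the main frame in one merge instead of one merge per feature.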

In [168]:
# merge the aggregated LLama3_2_Criteria_Robustness scores with main df
merged_df_1123 = pd.merge(merged_df_1123, LLama3_2_Criteria_Robustness, on=['Drug Name','conditions','nct_id'], how="left")
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id_x,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure,id_y,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult,official_role,official_name,official_affiliation,id,trial_country,country_removed,LLM_GBT_4o_Human_Importance_Ratings,LLama3_2_Criteria_Robustness
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138299,Czech Republic,t,1.0,2.0
1,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138896,United States,f,1.0,2.0
2,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138897,Argentina,f,1.0,2.0
3,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138898,Brazil,f,1.0,2.0
4,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138899,Czechia,f,1.0,2.0
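
Because the main frame holds several rows per trial (one per country, as the output above shows), a left merge should leave its row count unchanged as long as the feature table is unique on the join keys. A minimal sketch of that check, using hypothetical toy frames and the `validate` and `indicator` options of `pd.merge`:

```python
import pandas as pd

KEYS = ["Drug Name", "conditions", "nct_id"]

# Hypothetical main frame: one row per (trial, country)
toy_main = pd.DataFrame({
    "Drug Name": ["a", "a", "c"],
    "conditions": ["x", "x", "z"],
    "nct_id": ["NCT1", "NCT1", "NCT3"],
    "trial_country": ["Brazil", "Argentina", "United States"],
})

# Hypothetical feature table: at most one row per key combination
toy_feat = pd.DataFrame({
    "Drug Name": ["a"],
    "conditions": ["x"],
    "nct_id": ["NCT1"],
    "LLama3_2_Criteria_Robustness": [2],
})

# validate="many_to_one" raises MergeError if the right side has duplicate keys;
# indicator=True adds a _merge column recording where each row found a match.
merged = pd.merge(toy_main, toy_feat, on=KEYS, how="left",
                  validate="many_to_one", indicator=True)

assert len(merged) == len(toy_main)  # a left merge on unique keys adds no rows
unmatched = int((merged["_merge"] == "left_only").sum())
print(f"{unmatched} row(s) had no LLM feature")
```

Rows flagged `left_only` are the ones that end up with NaN in the feature column and are later filled with 0.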


In [169]:
# replace NaN values in LLama3_2_Criteria_Robustness with 0
merged_df_1123['LLama3_2_Criteria_Robustness'] = merged_df_1123['LLama3_2_Criteria_Robustness'].fillna(0)
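
Filling with 0 makes a trial with no LLM score indistinguishable from one that was scored 0. If that distinction matters downstream, one option is to record a missingness flag before filling. This is a sketch on hypothetical toy data; the `_missing` column name is an assumption, not something the notebook defines.

```python
import numpy as np
import pandas as pd

col = "LLama3_2_Criteria_Robustness"

# Hypothetical toy data: one scored trial, one unscored, one scored as 0
toy = pd.DataFrame({col: [2.0, np.nan, 0.0]})

# Record which rows had no score, then fill as the notebook does
toy[col + "_missing"] = toy[col].isna().astype(int)
toy[col] = toy[col].fillna(0)
print(toy)
```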

In [162]:
# take the max human-importance rating per (Drug Name, nct_id, conditions) group
LLM_GBT_4o_Human_Importance_Ratings = df_llm_features.groupby(['Drug Name','nct_id','conditions'])['LLM_GBT_4o_Human_Importance_Ratings'].max().reset_index(name='LLM_GBT_4o_Human_Importance_Ratings')

In [163]:
# merge the aggregated LLM_GBT_4o_Human_Importance_Ratings with main df
merged_df_1123 = pd.merge(merged_df_1123, LLM_GBT_4o_Human_Importance_Ratings, on=['Drug Name','conditions','nct_id'], how="left")
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id_x,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure,id_y,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult,official_role,official_name,official_affiliation,id,trial_country,country_removed,LLM_GBT_4o_Human_Importance_Ratings
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138299,Czech Republic,t,1.0
1,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138896,United States,f,1.0
2,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138897,Argentina,f,1.0
3,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138898,Brazil,f,1.0
4,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138899,Czechia,f,1.0


In [164]:
# replace NaN values in LLM_GBT_4o_Human_Importance_Ratings with 0
merged_df_1123['LLM_GBT_4o_Human_Importance_Ratings'] = merged_df_1123['LLM_GBT_4o_Human_Importance_Ratings'].fillna(0)

In [172]:
# take the max pregnant-women-exclusion flag per (Drug Name, nct_id, conditions) group
Spacy_Pregnant_Women_Excluded = df_llm_features.groupby(['Drug Name','nct_id','conditions'])['Spacy Pregnant Women Excluded'].max().reset_index(name='Spacy_Pregnant_Women_Excluded')

In [173]:
# merge Spacy_Pregnant_Women_Excluded with main df
merged_df_1123 = pd.merge(merged_df_1123, Spacy_Pregnant_Women_Excluded, on=['Drug Name','conditions','nct_id'], how="left")
merged_df_1123.head()

Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id_x,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure,id_y,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult,official_role,official_name,official_affiliation,id,trial_country,country_removed,LLM_GBT_4o_Human_Importance_Ratings,LLama3_2_Criteria_Robustness,Spacy_Pregnant_Women_Excluded
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138299,Czech Republic,t,1.0,2.0,1.0
1,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138896,United States,f,1.0,2.0,1.0
2,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138897,Argentina,f,1.0,2.0,1.0
3,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138898,Brazil,f,1.0,2.0,1.0
4,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138899,Czechia,f,1.0,2.0,1.0


In [174]:
# replace NaN values in Spacy_Pregnant_Women_Excluded with 0
merged_df_1123['Spacy_Pregnant_Women_Excluded'] = merged_df_1123['Spacy_Pregnant_Women_Excluded'].fillna(0)

In [175]:
merged_df_1123_v2.shape

(24959, 137)

In [176]:
merged_df_1123.shape

(24959, 139)

In [141]:
# load outcome analyses table from AACT
df_outcome_analyses = pd.read_csv('b3t0zvq5n6oyhvv9h6v8qsqg4j57/outcome_analyses.txt', sep='|')
df_outcome_analyses.head()



Unnamed: 0,id,nct_id,outcome_id,non_inferiority_type,non_inferiority_description,param_type,param_value,dispersion_type,dispersion_value,p_value_modifier,p_value,ci_n_sides,ci_percent,ci_lower_limit,ci_upper_limit,ci_upper_limit_na_comment,p_value_description,method,method_description,estimate_description,groups_description,other_analysis_description,ci_upper_limit_raw,ci_lower_limit_raw,p_value_raw
0,3926408,NCT00191152,7567777,SUPERIORITY_OR_OTHER,,,,,,,0.145,,95.0,,,,,Log Rank,,,,,,,0.145
1,3926409,NCT00191152,7567778,SUPERIORITY_OR_OTHER,,,,,,,0.361,,95.0,,,,,Log Rank,,,,,,,0.361
2,3926410,NCT00191152,7567779,SUPERIORITY_OR_OTHER,,,,,,,0.145,,95.0,,,,,Log Rank,,,,,,,0.145
3,3926411,NCT00191152,7567780,SUPERIORITY_OR_OTHER,,,,,,,0.385,,95.0,,,,,Log Rank,,,,,,,0.385
4,3926412,NCT00191152,7567781,SUPERIORITY_OR_OTHER,,,,,,,0.377,,95.0,,,,,Log Rank,,,,,,,0.377


In [142]:
df_outcome_analyses.shape

(274108, 25)

In [143]:
# count the distinct param_value entries recorded for each ci_lower_limit
# alternative: df_outcome_analyses[['param_value', 'param_type']].drop_duplicates()
df_outcome_analyses.groupby('ci_lower_limit')['param_value'].nunique()

ci_lower_limit
-1395694.00    1
-855009.00     1
-525369.24     1
-439692.00     1
-426928.00     1
              ..
 168364.92     1
 179628.07     1
 181313.34     1
 194650.78     1
 199264.32     1
Name: param_value, Length: 25285, dtype: int64

In [144]:
df_outcome_analyses[df_outcome_analyses['nct_id'] == 'NCT00606502']

Unnamed: 0,id,nct_id,outcome_id,non_inferiority_type,non_inferiority_description,param_type,param_value,dispersion_type,dispersion_value,p_value_modifier,p_value,ci_n_sides,ci_percent,ci_lower_limit,ci_upper_limit,ci_upper_limit_na_comment,p_value_description,method,method_description,estimate_description,groups_description,other_analysis_description,ci_upper_limit_raw,ci_lower_limit_raw,p_value_raw
61352,3989781,NCT00606502,7687855,SUPERIORITY_OR_OTHER_LEGACY,,Cox Proportional Hazard,0.84,,,,,TWO_SIDED,95.0,0.61,1.14,,,,,,,,1.14,0.61,
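
The outcome analyses table holds one row per analysis, so a trial can appear many times. If a single row per trial is wanted before joining, one option is to aggregate first. This is a minimal sketch on made-up data: the frame `analyses`, its values, and the feature names `n_analyses` and `min_p_value` are illustrative assumptions, not columns from AACT or from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for df_outcome_analyses (values are invented)
analyses = pd.DataFrame({
    "nct_id": ["NCT1", "NCT1", "NCT2"],
    "p_value": [0.04, 0.20, None],
})

# Collapse to one row per trial: number of analyses and smallest reported p-value
per_trial = (
    analyses.groupby("nct_id")
    .agg(n_analyses=("p_value", "size"), min_p_value=("p_value", "min"))
    .reset_index()
)

print(per_trial)
```

Merging `per_trial` instead of the raw table would keep the row count of the trial table unchanged, at the cost of discarding per-analysis detail.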


In [145]:
# merge outcome analyses table with p2_p4 table on the shared nct_id key
merged_df_1123_v2 = pd.merge(merged_df_1123, df_outcome_analyses, on="nct_id", how="inner")
merged_df_1123_v2.head()



Unnamed: 0,nct_id,intervention_type,description,trial_drug_cleaned,conditions,Drug Name,Highest Status,Other Drug Names,Originator Company,Originator Company HQ,Active Companies,Active Companies HQ,Therapy Area,Active Indications,Action,Technologies,Regulatory Designations,Inactive Indications,Inactive Companies,Has Deals,Last Change Date,Added Date,First Launched Date,Extract,Drug Id,cortellis_cleaned_drug,nlm_download_date_description,study_first_submitted_date,results_first_submitted_date,disposition_first_submitted_date,last_update_submitted_date,study_first_submitted_qc_date,study_first_posted_date,study_first_posted_date_type,results_first_submitted_qc_date,results_first_posted_date,results_first_posted_date_type,disposition_first_submitted_qc_date,disposition_first_posted_date,disposition_first_posted_date_type,last_update_submitted_qc_date,last_update_posted_date,last_update_posted_date_type,start_month_year,start_date_type,start_date,verification_month_year,verification_date,completion_month_year,completion_date_type,completion_date,primary_completion_month_year,primary_completion_date_type,primary_completion_date,target_duration,study_type,acronym,baseline_population,brief_title,official_title,overall_status,last_known_status,phase,enrollment,enrollment_type,source,limitations_and_caveats,number_of_arms,number_of_groups,why_stopped,has_expanded_access,expanded_access_type_individual,expanded_access_type_intermediate,expanded_access_type_treatment,has_dmc,is_fda_regulated_drug,is_fda_regulated_device,is_unapproved_device,is_ppsd,is_us_export,biospec_retention,biospec_description,ipd_time_frame,ipd_access_criteria,ipd_url,plan_to_share_ipd,plan_to_share_ipd_description,created_at,updated_at,source_class,delayed_posting,expanded_access_nctid,expanded_access_status_for_nctid,fdaaa801_violation,baseline_type_units_analyzed,patient_registry,drug_outcome,disease_type,new_therapy_area,id_x,number_of_facilities,number_of_nsae_subjects,number_of_sae_subjects,registered_in_calendar_year,nlm_download_date,actual_duration,were_results_reported,months_to_report_results,has_us_facility,has_single_facility,minimum_age_num,maximum_age_num,minimum_age_unit,maximum_age_unit,number_of_primary_outcomes_to_measure,number_of_secondary_outcomes_to_measure,number_of_other_outcomes_to_measure,id_y,sampling_method,gender,minimum_age,maximum_age,healthy_volunteers,population,criteria,gender_description,gender_based,adult,child,older_adult,official_role,official_name,official_affiliation,id_x.1,trial_country,country_removed,id_y.1,outcome_id,non_inferiority_type,non_inferiority_description,param_type,param_value,dispersion_type,dispersion_value,p_value_modifier,p_value,ci_n_sides,ci_percent,ci_lower_limit,ci_upper_limit,ci_upper_limit_na_comment,p_value_description,method,method_description,estimate_description,groups_description,other_analysis_description,ci_upper_limit_raw,ci_lower_limit_raw,p_value_raw
0,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138299,Czech Republic,t,3989781,7687855,SUPERIORITY_OR_OTHER_LEGACY,,Cox Proportional Hazard,0.84,,,,,TWO_SIDED,95.0,0.61,1.14,,,,,,,,1.14,0.61,
1,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138896,United States,f,3989781,7687855,SUPERIORITY_OR_OTHER_LEGACY,,Cox Proportional Hazard,0.84,,,,,TWO_SIDED,95.0,0.61,1.14,,,,,,,,1.14,0.61,
2,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138897,Argentina,f,3989781,7687855,SUPERIORITY_OR_OTHER_LEGACY,,Cox Proportional Hazard,0.84,,,,,TWO_SIDED,95.0,0.61,1.14,,,,,,,,1.14,0.61,
3,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138898,Brazil,f,3989781,7687855,SUPERIORITY_OR_OTHER_LEGACY,,Cox Proportional Hazard,0.84,,,,,TWO_SIDED,95.0,0.61,1.14,,,,,,,,1.14,0.61,
4,NCT00606502,DRUG,150 mg orally in tablet form~Administered dail...,erlotinib,non-small cell lung cancer,erlotinib,Launched,CP-358774; CP-358774-01; NSC-718781; OSI-420; ...,OSI Pharmaceuticals Inc,OSI Pharmaceuticals Inc (US),Astellas Pharma Inc; Baheal Pharmaceutical gro...,Astellas Pharma Inc (Japan); Baheal Pharmaceut...,Cancer; Dermatologic,Acute myelogenous leukemia; Breast tumor; Cent...,Anticancer protein kinase inhibitor; EGFR fami...,Film coating; Oral formulation; Small molecule...,Fast Track; Orphan Drug,Cancer; Colorectal tumor; Ependymoma; Esophagu...,Nippon Roche KK; Pfizer Inc,Yes,2024-07-04,1996-03-28,2004-11-24,Erlotinib (Tarceva; OSI-744; CP-358774; NSC-71...,11961,erlotinib,,2008-01-22,2010-12-22,,2021-02-08,2008-02-01,2008-02-04,ESTIMATED,2010-12-22,2011-01-20,ESTIMATED,,,,2021-02-08,2021-03-05,ACTUAL,2008-01,,2008-01-31,2021-02,2021-02-28,2010-06-24,ACTUAL,2010-06-24,2010-06-24,ACTUAL,2010-06-24,,INTERVENTIONAL,,,Study of Pralatrexate vs. Erlotinib for Non-Sm...,"A Randomized, Phase 2b, Multi-center Study of ...",COMPLETED,,PHASE2,201.0,ACTUAL,"Spectrum Pharmaceuticals, Inc",The date of the CRF database cut-off for patie...,2.0,,,f,,,,,,,,,,,,,,,,,2024-08-04 13:24:13.009685,2024-08-04 13:24:13.009685,INDUSTRY,,,,,,,success,"Lung, Non-Small Cell",Oncology,29671336,47.0,540.0,124.0,2008,,29.0,t,6.0,t,f,18.0,,Years,,1.0,3.0,,7316876,,ALL,18 Years,,f,,Inclusion Criteria:~* Confirmed Stage IIIB/ IV...,,,t,f,t,STUDY_DIRECTOR,"Garry Weems, PharmD","Spectrum Pharmaceuticals, Inc",10138899,Czechia,f,3989781,7687855,SUPERIORITY_OR_OTHER_LEGACY,,Cox Proportional Hazard,0.84,,,,,TWO_SIDED,95.0,0.61,1.14,,,,,,,,1.14,0.61,


In [146]:
merged_df_1123_v2.shape

(203074, 160)
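
The inner join grows the table from 24,959 to 203,074 rows because `nct_id` repeats on both sides: each trial row is paired with every outcome analysis of the same trial, and trials without any analysis are dropped. The sketch below shows how that fan-out could be predicted and checked on toy data. The frames `trials` and `analyses` and all their values are invented for illustration and only stand in for `merged_df_1123` and `df_outcome_analyses`.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for merged_df_1123 and df_outcome_analyses
trials = pd.DataFrame({
    "nct_id": ["NCT1", "NCT1", "NCT2", "NCT3"],
    "trial_country": ["US", "BR", "US", "AR"],
})
analyses = pd.DataFrame({
    "nct_id": ["NCT1", "NCT1", "NCT1", "NCT2"],
    "p_value": [0.04, 0.20, 0.51, 0.01],
})

# Rows per key on each side; the per-key product is what an inner join emits
left_counts = trials["nct_id"].value_counts()
right_counts = analyses["nct_id"].value_counts()
expected_rows = int((left_counts * right_counts).dropna().sum())

merged = pd.merge(trials, analyses, on="nct_id", how="inner")

# Trials that vanish because they have no outcome analysis
dropped_trials = set(trials["nct_id"]) - set(merged["nct_id"])

print(expected_rows, len(merged), dropped_trials)
```

Running the same counts on the real frames before merging would show whether the jump in row count is expected or a sign of unintended duplication.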

In [177]:
merged_df_1123.to_csv('nov_24/nov_23/merged_df_1123.txt', sep='|', index=True)
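
Writing with `index=True` stores the row index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back. That is the likely source of the `Unnamed: 0` columns seen in the loaded frames earlier in this notebook. A small round trip on a made-up frame (the data and buffer names are illustrative only) shows the difference:

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for merged_df_1123
df = pd.DataFrame({"nct_id": ["NCT1", "NCT2"], "phase": ["PHASE2", "PHASE4"]})

# index=True: the index is written as an extra, unnamed column
buf_with_index = io.StringIO()
df.to_csv(buf_with_index, sep="|", index=True)
buf_with_index.seek(0)
reloaded_with_index = pd.read_csv(buf_with_index, sep="|")

# index=False: only the data columns are written
buf_no_index = io.StringIO()
df.to_csv(buf_no_index, sep="|", index=False)
buf_no_index.seek(0)
reloaded_no_index = pd.read_csv(buf_no_index, sep="|")

print(list(reloaded_with_index.columns))
print(list(reloaded_no_index.columns))
```

If the index carries no information, saving with `index=False` (or reading with `index_col=0`) avoids having to drop the extra column after each load.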