# Generating `has_result_manual` and `has_protocol_manual`

This notebook creates the `has_result_manual` and `has_protocol_manual` variables for the final analysis.

<small>**NOTE:** Final analysis depends on these outcome variables.</small>

First we will import the needed packages:

In [64]:
import pandas as pd

Next we will load the required data:

In [65]:
imposed_documents_PR = pd.read_excel('../../../data/ema_rwd/rmp1&2_documents_manual_PR.xlsx').set_index('eu_pas_register_number')

other_documents_CP = pd.read_excel('../../../data/ema_rwd/rmpother_documents_manual_CP.xlsx').set_index('eu_pas_register_number')[[
    'path', 'url', 'name', 'uploaded_document_type', 'manual_document_type'
]]
other_documents_PR = pd.read_excel('../../../data/ema_rwd/rmpother_documents_manual_PR.xlsx').set_index('eu_pas_register_number')

na_values = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", 
    "1.#IND", "1.#QNAN", "<NA>", "NULL", "NaN", "None", "nan", "null"
    # "N/A",
    # "NA",
    # "n/a",
]

def python_name_converter(x):
    return '_'.join([word.lower() for word in x.split(' ')]) if x[0] != '$' else x

raw = pd.read_excel(
    '../../../data/ema_rwd/ema_rwd_p_m_gpt.xlsx', 
    index_col=0, 
    keep_default_na=False,
    na_values=na_values,
    na_filter=True
).rename(
    columns=python_name_converter
).set_index(
    'eu_pas_register_number'
).assign(
    has_protocol=lambda x: x.filter(like='protocol').notna().any(axis='columns'),
    has_result = lambda x: x.filter(like='result').notna().any(axis='columns')
)
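
As an illustration of the renaming and flagging steps in the cell above, here is a minimal sketch on a toy frame. The column names `Protocol Document` and `Result Document` are hypothetical stand-ins for the real spreadsheet headers, and the sketch assumes a flag should be `True` whenever any matching column is filled.

```python
import pandas as pd

def python_name_converter(x):
    # snake_case a header unless it starts with '$' (those are kept as-is)
    return '_'.join(word.lower() for word in x.split(' ')) if x[0] != '$' else x

# Toy data; 'Protocol Document' and 'Result Document' are hypothetical headers.
toy = pd.DataFrame({
    'Eu Pas Register Number': [1, 2, 3],
    '$UPDATED_state': ['Finalised', 'Ongoing', 'Planned'],
    'Protocol Document': ['p1.pdf', None, None],
    'Result Document': [None, None, 'r3.pdf'],
}).rename(columns=python_name_converter).set_index('eu_pas_register_number')

toy = toy.assign(
    # True if at least one column containing the keyword is not missing
    has_protocol=lambda x: x.filter(like='protocol').notna().any(axis='columns'),
    has_result=lambda x: x.filter(like='result').notna().any(axis='columns'),
)
print(toy[['has_protocol', 'has_result']])
```

Headers starting with `$` pass through unchanged, so `$UPDATED_state` can still be selected by its original name later on.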



After discussing the differences in the classification of the imposed documents (see `compare_imposed_documents_classification.ipynb`) with CP, we will:

1. merge the interim and progress reports

1. set the single remaining differently classified document to `unclear` (it seems to be the abstract of an interim report)

In [66]:
document_type_map = {
    'interim study report': 'interim/progress study report', 
    'progress study report': 'interim/progress study report',
}

imposed_documents_PR.loc[
    (imposed_documents_PR.index == 7708) & 
    (imposed_documents_PR['uploaded_document_type'] == 'result_document'), 
    'manual_document_type'
] = 'unclear'

imposed_documents_harmonised = imposed_documents_PR.assign(
    manual_document_type = lambda x : x['manual_document_type'].apply(lambda y : document_type_map.get(y, y)),
    has_abstract_only_manual = lambda x : x['manual_document_type'].isin([
        'abstract of final study report',
        'result publication'
    ]),
    has_abstract_manual = lambda x : x['manual_document_type'].isin([
        'abstract of final study report',
        'final study report with abstract',
        'result publication'
    ]),
    has_final_study_report_manual = lambda x : x['manual_document_type'].isin([
        'final study report with abstract',
        'final study report without abstract'
    ]),
    has_intermediate_result_manual = lambda x : x['manual_document_type'].isin([
        'interim/progress study report',
        'result tables only'
    ]),
    has_additional_protocol_manual = lambda x : x['manual_document_type'].eq('protocol')
)

display(imposed_documents_harmonised['manual_document_type'].value_counts(dropna=False))
# display(imposed_documents_harmonised['manual_document_type'].value_counts(dropna=False).plot.bar(figsize=(5,5)))

manual_document_type
abstract of final study report         65
final study report with abstract       25
interim/progress study report          23
other                                  17
result publication                      4
protocol                                3
result tables only                      2
final study report without abstract     2
unclear                                 1
Name: count, dtype: int64

In [67]:
imposed_documents_harmonised_grouped = imposed_documents_harmonised.groupby('eu_pas_register_number').agg(
    aggregated_document_type = ('manual_document_type', lambda x: '; '.join(sorted(x.unique()))),
    has_abstract_only_manual = ('has_abstract_only_manual', 'any'), # Step 1
    has_abstract_manual = ('has_abstract_manual', 'any'),
    has_final_study_report_manual = ('has_final_study_report_manual', 'any'),
    has_intermediate_result_manual = ('has_intermediate_result_manual', 'any'),
    has_additional_protocol_manual = ('has_additional_protocol_manual', 'any')
).assign(
    has_abstract_only_manual = lambda x : x['has_abstract_only_manual'] & ~x['has_final_study_report_manual'], # Step 2
    has_result_manual = lambda x : x['has_abstract_manual'] | x['has_final_study_report_manual']
)

imposed_documents_harmonised_grouped

eu_pas_register_number,aggregated_document_type,has_abstract_only_manual,has_abstract_manual,has_final_study_report_manual,has_intermediate_result_manual,has_additional_protocol_manual,has_result_manual
2165,abstract of final study report,True,True,False,False,False,True
2196,final study report with abstract,False,True,True,False,False,True
2857,final study report with abstract,False,True,True,False,False,True
3142,abstract of final study report,True,True,False,False,False,True
3583,final study report with abstract,False,True,True,False,False,True
...,...,...,...,...,...,...,...
36536,interim/progress study report,False,False,False,True,False,False
41735,abstract of final study report,True,True,False,False,False,True
42543,protocol,False,False,False,False,True,False
43115,abstract of final study report,True,True,False,False,False,True
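
The two steps marked in the aggregation (`# Step 1` and `# Step 2`) can be illustrated on a toy frame. This is a minimal sketch with made-up register numbers: a study counts as "abstract only" when at least one of its documents is an abstract-type document (step 1) and none of its documents is a final study report (step 2).

```python
import pandas as pd

# Toy documents; the register numbers 1-3 are made up for illustration.
docs = pd.DataFrame({
    'eu_pas_register_number': [1, 1, 2, 3],
    'manual_document_type': [
        'abstract of final study report',
        'final study report without abstract',
        'abstract of final study report',
        'protocol',
    ],
})
abstract_only_types = ['abstract of final study report', 'result publication']
final_report_types = ['final study report with abstract', 'final study report without abstract']

per_study = docs.assign(
    has_abstract_only_manual=lambda x: x['manual_document_type'].isin(abstract_only_types),
    has_final_study_report_manual=lambda x: x['manual_document_type'].isin(final_report_types),
).groupby('eu_pas_register_number').agg(
    # Step 1: does the study have any abstract-type document?
    has_abstract_only_manual=('has_abstract_only_manual', 'any'),
    has_final_study_report_manual=('has_final_study_report_manual', 'any'),
).assign(
    # Step 2: drop the flag again if a final study report exists as well
    has_abstract_only_manual=lambda x: x['has_abstract_only_manual'] & ~x['has_final_study_report_manual'],
)
print(per_study)
```

Study 1 has both an abstract and a final report, so only study 2 keeps `has_abstract_only_manual`.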


In [68]:
# print('\n'.join(sorted(other_documents_CP.manual_document_type.dropna().unique())))

In [69]:
other_documents_CP = other_documents_CP[
    other_documents_CP['manual_document_type'].notna()
].assign(
    has_abstract_only_manual = lambda x : x['manual_document_type'].isin([
        'abstract of final study report',
        'abstract of final study report & link to result publication',
        'letter to the editor with results',
        'objectives and results',
        'result publication',
        'result publications',
        'results only'
    ]),
    has_abstract_manual = lambda x : x['manual_document_type'].isin([
        'abstract of final study report',
        'abstract of final study report & link to final report',
        'abstract of final study report & link to result publication',
        'final study report with abstract',
        'final study report with abstract (split documtent)',
        'final study report with abstract, protocol included',
        'letter to the editor with results',
        'link to final study report with abstract',
        'objectives and results',
        'result publication',
        'result publications',
        'results only'
    ]),
    has_final_study_report_manual = lambda x : x['manual_document_type'].isin([
        'abstract of final study report & link to final report',
        'final study report with abstract',
        'final study report with abstract (split documtent)',
        'final study report with abstract, protocol included',
        'final study report without abstract',
        'link to final study report with abstract'
    ]),
    has_intermediate_result_manual = lambda x : x['manual_document_type'].isin([
        'abstract without results',
        'annex - tables and figures',
        'annual report',
        'appendix',
        'appendix - tables',
        'appendix to final study report',
        'draft report',
        'interim study report',
        'letter to the editor with results',
        'meaning and implications of the study results',
        'poster of preliminary results',
        'poster of results',
        'powerpoint presentation of results',
        'preliminary data analysis',
        'preliminary study plan',
        'progress study report',
        'result manuscript',
        'result tables only',
        'study report, partial',
        'study summary without results'
    ]),
    has_additional_protocol_manual = lambda x : x['manual_document_type'].eq('protocol')
)

other_documents_CP_grouped = other_documents_CP.groupby('eu_pas_register_number').agg(
    aggregated_document_type=('manual_document_type', lambda x: '; '.join(sorted(x.unique()))),
    has_abstract_only_manual = ('has_abstract_only_manual', 'any'), # Step 1
    has_abstract_manual = ('has_abstract_manual', 'any'),
    has_final_study_report_manual = ('has_final_study_report_manual', 'any'),
    has_intermediate_result_manual = ('has_intermediate_result_manual', 'any'),
    has_additional_protocol_manual = ('has_additional_protocol_manual', 'any'),
).assign(
    has_abstract_only_manual = lambda x : x['has_abstract_only_manual'] & ~x['has_final_study_report_manual'], # Step 2
    has_result_manual = lambda x : x['has_abstract_manual'] | x['has_final_study_report_manual'],
)

other_documents_CP_grouped

eu_pas_register_number,aggregated_document_type,has_abstract_only_manual,has_abstract_manual,has_final_study_report_manual,has_intermediate_result_manual,has_additional_protocol_manual,has_result_manual
18923,abstract of final study report,True,True,False,False,False,True
18936,list - principal investigators,False,False,False,False,False,False
18970,final study report without abstract,False,False,True,False,False,True
19066,abstract of final study report,True,True,False,False,False,True
19094,final study report with abstract,False,True,True,False,False,True
...,...,...,...,...,...,...,...
106882,final study report without abstract,False,False,True,False,False,True
107454,final study report with abstract,False,True,True,False,False,True
108167,letter - ethics committee,False,False,False,False,False,False
108254,letter - ethics committee; results only,True,True,False,False,False,True


In [70]:
# print('\n'.join(sorted(other_documents_PR.manual_document_type.dropna().unique())))

In [71]:
other_documents_PR = other_documents_PR[
    other_documents_PR['manual_document_type'].notna()
].assign(
    has_abstract_only_manual = lambda x : x['manual_document_type'].isin([
        'abstract of final study report',
        'abstract of original research article / published abstract',
        'abstract subsection with results',
        'original research article',
        'systematic review / meta analysis article'
    ]),
    has_abstract_manual = lambda x : x['manual_document_type'].isin([
        'abstract of final study report',
        'abstract of original research article / published abstract',
        'abstract subsection with results',
        'final study report with abstract',
        'original research article',
        'systematic review / meta analysis article',
        'unclear report with result with abstract'
    ]),
    has_final_study_report_manual = lambda x : x['manual_document_type'].isin([
        'final study report with abstract', 
        'final study report without abstract',
        'unclear report with result with abstract',
        'unclear report with result without abstract'
    ]),
    has_intermediate_result_manual = lambda x : x['manual_document_type'].isin([
        'abstract of other report / document', 
        'abstract subsection without results',
        'draft study report',
        'figures, tables and appendices',
        'interim study report',
        'poster',
        'presentation',
        'progress study report'
    ]),
    has_additional_protocol_manual = lambda x : x['manual_document_type'].eq('protocol')
)

other_documents_PR_grouped = other_documents_PR.groupby('eu_pas_register_number').agg(
    aggregated_document_type=('manual_document_type', lambda x: '; '.join(sorted(x.unique()))),
    has_abstract_only_manual = ('has_abstract_only_manual', 'any'), # Step 1
    has_abstract_manual = ('has_abstract_manual', 'any'),
    has_final_study_report_manual = ('has_final_study_report_manual', 'any'),
    has_intermediate_result_manual = ('has_intermediate_result_manual', 'any'),
    has_additional_protocol_manual = ('has_additional_protocol_manual', 'any'),
).assign(
    has_abstract_only_manual = lambda x : x['has_abstract_only_manual'] & ~x['has_final_study_report_manual'], # Step 2
    has_result_manual = lambda x : x['has_abstract_manual'] | x['has_final_study_report_manual'],
)

other_documents_PR_grouped

eu_pas_register_number,aggregated_document_type,has_abstract_only_manual,has_abstract_manual,has_final_study_report_manual,has_intermediate_result_manual,has_additional_protocol_manual,has_result_manual
1591,unclear report with result with abstract,False,True,True,False,False,True
1597,final study report with abstract,False,True,True,False,False,True
1613,final study report with abstract,False,True,True,False,False,True
1705,final study report without abstract,False,False,True,False,False,True
1777,abstract of final study report; other,True,True,False,False,False,True
...,...,...,...,...,...,...,...
18739,abstract of final study report,True,True,False,False,False,True
18751,final study report with abstract,False,True,True,False,False,True
18825,abstract of final study report,True,True,False,False,False,True
18909,abstract of final study report,True,True,False,False,False,True


In [72]:
imposed_documents_merged = raw.loc[raw['risk_management_plan'].isin([
    'EU RMP category 1 (imposed as condition of marketing authorisation)',
    'EU RMP category 2 (specific obligation of marketing authorisation)'
]), ['$UPDATED_state', '$CANCELLED_MANUAL', 'risk_management_plan', 'has_protocol', 'has_result']].merge(
    imposed_documents_harmonised_grouped, left_index=True, right_index=True, how='left'
)

other_documents_merged = raw.loc[~raw['risk_management_plan'].isin([
    'EU RMP category 1 (imposed as condition of marketing authorisation)',
    'EU RMP category 2 (specific obligation of marketing authorisation)'
]), ['$UPDATED_state', '$CANCELLED_MANUAL', 'risk_management_plan', 'has_protocol', 'has_result']].merge(
    pd.concat([
        other_documents_CP_grouped,
        other_documents_PR_grouped
    ]), left_index=True, right_index=True, how='left'
)

documents_merged = pd.concat([
    imposed_documents_merged,
    other_documents_merged
])

documents_merged.loc[:, 'has_result_manual'] = documents_merged['has_result_manual'].fillna(False)
documents_merged.loc[:, 'has_intermediate_result_manual'] = documents_merged['has_intermediate_result_manual'].fillna(False)

documents_merged = documents_merged.assign(
    has_protocol_automatic = lambda x : x['has_protocol'],
    has_protocol_manual = lambda x : x['has_protocol_automatic'] | x['has_additional_protocol_manual'],
    protocol_manual_eq_automatic = lambda x : x['has_protocol_automatic'].eq(x['has_protocol_manual']),
    has_result_automatic = lambda x : x['has_result'],
    result_manual_eq_automatic = lambda x : x['has_result_automatic'].eq(x['has_result_manual']),
).drop(['has_protocol', 'has_result'], axis='columns')

documents_merged

eu_pas_register_number,$UPDATED_state,$CANCELLED_MANUAL,risk_management_plan,aggregated_document_type,has_abstract_only_manual,has_abstract_manual,has_final_study_report_manual,has_intermediate_result_manual,has_additional_protocol_manual,has_result_manual,has_protocol_automatic,has_protocol_manual,protocol_manual_eq_automatic,has_result_automatic,result_manual_eq_automatic
2165,Finalised,0.0,EU RMP category 1 (imposed as condition of mar...,abstract of final study report,True,True,False,False,False,True,True,True,True,True,True
2181,Finalised,0.0,EU RMP category 1 (imposed as condition of mar...,,,,,False,,False,False,False,True,False,True
2196,Finalised,0.0,EU RMP category 1 (imposed as condition of mar...,final study report with abstract,False,True,True,False,False,True,True,True,True,True,True
2857,Finalised,0.0,EU RMP category 2 (specific obligation of mark...,final study report with abstract,False,True,True,False,False,True,True,True,True,True,True
3142,Finalised,0.0,EU RMP category 1 (imposed as condition of mar...,abstract of final study report,True,True,False,False,False,True,True,True,True,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
108481,Ongoing,0.0,Not applicable,,,,,False,,False,True,True,True,False,True
108728,Planned,0.0,EU RMP category 3 (required),,,,,False,,False,True,True,True,False,True
108847,Planned,0.0,EU RMP category 3 (required),,,,,False,,False,False,False,True,False,True
108904,Planned,0.0,EU RMP category 3 (required),,,,,False,,False,False,False,True,False,True
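
The empty cells in the table come from the left merge: studies without any manually classified document have no row in the grouped frames, so their manual columns are missing until they are filled. A minimal sketch on toy data (the register numbers are made up):

```python
import pandas as pd

# Toy register with three studies; only study 1 has classified documents.
raw_toy = pd.DataFrame(
    {'has_result': [True, False, False]},
    index=pd.Index([1, 2, 3], name='eu_pas_register_number'),
)
grouped_toy = pd.DataFrame(
    {'has_result_manual': [True], 'has_additional_protocol_manual': [False]},
    index=pd.Index([1], name='eu_pas_register_number'),
)

# how='left' keeps every study and leaves NaN where no documents were classified
merged_toy = raw_toy.merge(grouped_toy, left_index=True, right_index=True, how='left')

# Only the filled column becomes a clean boolean; the other one keeps its NaN
merged_toy['has_result_manual'] = merged_toy['has_result_manual'].fillna(False).astype(bool)
print(merged_toy)
```

Columns that are not filled this way stay missing for those studies, which is why the later statistics cells call `.fillna(False).astype(bool)` before counting.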


In [73]:
documents_merged.to_excel('outcomes_manual.xlsx')

We can also export the results for each individual document:

In [74]:
pd.concat([
    imposed_documents_harmonised,
    other_documents_CP,
    other_documents_PR
]).to_excel('outcomes_manual_individual.xlsx')

Now we will append the outcome columns to the data:

In [None]:
pd.read_excel('../../../data/ema_rwd/ema_rwd_p_m_gpt.xlsx', index_col=0).set_index('Eu Pas Register Number').merge(
    documents_merged[['has_result_manual', 'has_protocol_manual']].rename(columns=lambda x : x.removesuffix('_manual')), left_index=True, right_index=True, how='left'
).reset_index(names='Eu Pas Register Number').to_excel('../../../data/ema_rwd/ema_rwd_p_m_gpt_o.xlsx', sheet_name='PAS')

We will now use `has_result_manual` and `has_intermediate_result_manual` to find studies whose registered status is incorrect: studies marked as planned or ongoing that already have final results, and studies marked as planned that already have intermediate results. The status of these studies needs to be fixed.

Some dates will also be adjusted. The changes will be merged with the input data.

**NOTE**: This was done later than the step above and therefore changes another file, which was built on top of `ema_rwd_p_m_gpt_o.xlsx`.

In [75]:
all_included_studies_documents = documents_merged[
    ~documents_merged['$CANCELLED_MANUAL'].fillna(False).astype(bool)
]

In [76]:
pd.merge(
    all_included_studies_documents[
        all_included_studies_documents['has_result_manual'] & 
        all_included_studies_documents['$UPDATED_state'].isin(['Planned', 'Ongoing'])
    ][[
        '$UPDATED_state', 'risk_management_plan', 'aggregated_document_type'
    ]],
    raw[['data_collection_date_actual', 'data_collection_date_planed', 'final_report_date_actual', 'final_report_date_planed']],
    left_index=True,
    right_index=True
).assign(**{
    '$UPDATED_state_override': pd.NA,
    'data_collection_date_actual_override': pd.NA,
    'final_report_date_actual_override': pd.NA
}).sort_values(['$UPDATED_state', 'eu_pas_register_number'], ascending=(False, True)).to_excel('actually_finalised.xlsx')

In [77]:
pd.merge(
    all_included_studies_documents[
        ~all_included_studies_documents['has_result_manual'] & 
        all_included_studies_documents['has_intermediate_result_manual'] & 
        all_included_studies_documents['$UPDATED_state'].eq('Planned')
    ][[
        '$UPDATED_state', 'risk_management_plan', 'aggregated_document_type'
    ]],
    raw[['data_collection_date_actual', 'data_collection_date_planed']],
    left_index=True,
    right_index=True
).assign(**{
    '$UPDATED_state_override': pd.NA,
    'data_collection_date_actual_override': pd.NA
}).sort_index().to_excel('actually_ongoing.xlsx')
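
The two selections above can be reproduced on a toy frame to see which studies each one flags. This is a sketch with made-up register numbers, not the real data:

```python
import pandas as pd

# Toy studies; register numbers 101-105 are made up for illustration.
studies = pd.DataFrame({
    '$UPDATED_state': ['Planned', 'Ongoing', 'Finalised', 'Planned', 'Ongoing'],
    'has_result_manual': [True, True, True, False, False],
    'has_intermediate_result_manual': [False, False, False, True, True],
}, index=pd.Index([101, 102, 103, 104, 105], name='eu_pas_register_number'))

# Final results exist although the study is not marked as finalised
actually_finalised = studies[
    studies['has_result_manual']
    & studies['$UPDATED_state'].isin(['Planned', 'Ongoing'])
]

# Only intermediate results exist although the study is still marked as planned
actually_ongoing = studies[
    ~studies['has_result_manual']
    & studies['has_intermediate_result_manual']
    & studies['$UPDATED_state'].eq('Planned')
]

print(actually_finalised.index.tolist())
print(actually_ongoing.index.tolist())
```

Study 105 is not flagged: an ongoing study with intermediate results already has a plausible status.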

After filling in `$UPDATED_state_override`, `data_collection_date_actual_override` and `final_report_date_actual_override` by hand (saved as `actually_finalised_manual.xlsx` and `actually_ongoing_manual.xlsx`), we can update the input data:

In [78]:
actual_finalised = pd.read_excel('actually_finalised_manual.xlsx', index_col=0).filter(like='override')
actual_ongoing = pd.read_excel('actually_ongoing_manual.xlsx', index_col=0).filter(like='override')

In [79]:
actual_state_with_fixed_dates = pd.concat([
    actual_finalised,
    actual_ongoing
])

actual_state_with_fixed_dates

eu_pas_register_number,$UPDATED_state_override,data_collection_date_actual_override,final_report_date_actual_override
23753,Finalised,2017-09-13,2018-07-19
35766,Ongoing,2019-03-01,NaT
48735,Finalised,2023-04-07,2023-09-15
104156,Finalised,2022-11-01,2023-04-17
8571,,NaT,NaT
14525,Finalised,NaT,2023-10-19
18108,Finalised,NaT,2023-08-29
19769,,NaT,NaT
25151,,NaT,NaT
30560,Finalised,NaT,2024-01-16


In [80]:
actual_state_with_fixed_dates.index.has_duplicates

False

In [81]:
pd.read_excel('../../../data/ema_rwd/ema_rwd_p_m_gpt_o_s_v2.xlsx', index_col=0).set_index('Eu Pas Register Number').merge(
    actual_state_with_fixed_dates, left_index=True, right_index=True, how='left'
).reset_index(names='Eu Pas Register Number').to_excel('../../../data/ema_rwd/ema_rwd_p_m_gpt_o_s_f.xlsx', sheet_name='PAS')
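
The merge above only attaches the override columns; this notebook does not show how they are consumed later. One possible way to resolve them, sketched on toy data, is to prefer the override and fall back to the original value. The column name `$UPDATED_state_effective` is hypothetical and does not appear in the project files.

```python
import pandas as pd

# Toy data; '$UPDATED_state_effective' is a hypothetical column name.
merged = pd.DataFrame({
    '$UPDATED_state': ['Planned', 'Ongoing', 'Finalised'],
    '$UPDATED_state_override': ['Finalised', pd.NA, pd.NA],
}, index=pd.Index([1, 2, 3], name='eu_pas_register_number'))

# Use the override where one was filled in, otherwise keep the original state
merged['$UPDATED_state_effective'] = (
    merged['$UPDATED_state_override'].fillna(merged['$UPDATED_state'])
)
print(merged)
```

The same pattern would apply to the two date override columns.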

## Statistics

The following cells require the updated statistics file (`ema_rwd_final_statistics_variables.xlsx`), which must already include the manual outcomes, the fixed statuses and the due populations.

We can now take a look at the differences between manual and automatic classification for all studies and for studies due.

In [50]:
all_included_studies_documents[['has_result_manual', 'has_result_automatic']].value_counts()

has_result_manual  has_result_automatic
False              False                   1622
True               True                    1008
False              True                      59
True               False                     11
Name: count, dtype: int64
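
The same comparison can also be read as a 2x2 table plus an overall agreement rate. A minimal sketch on toy flags, using `pd.crosstab` instead of `value_counts`:

```python
import pandas as pd

# Toy flags, not the real study data
flags = pd.DataFrame({
    'has_result_manual':    [True, True, False, False, False],
    'has_result_automatic': [True, False, True, False, False],
})

# Rows: manual classification, columns: automatic classification
table = pd.crosstab(flags['has_result_manual'], flags['has_result_automatic'])

# Share of studies where both classifications agree
agreement = flags['has_result_manual'].eq(flags['has_result_automatic']).mean()

print(table)
print(f'agreement: {agreement:.1%}')
```

The off-diagonal cells are the studies where the manual and the automatic classification disagree.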

In [51]:
all_included_studies_documents[['has_protocol_manual', 'has_protocol_automatic']].value_counts()

has_protocol_manual  has_protocol_automatic
True                 True                      1501
False                False                     1198
True                 False                        1
Name: count, dtype: int64

In [52]:
variables_due_result, variables_due_protocol = pd.read_excel(
    '../../../output/ema_rwd/ema_rwd_final_statistics_variables.xlsx', 
    sheet_name=['due_result', 'due_protocol'], 
    index_col=0
).values()

all_included_studies_documents = all_included_studies_documents.assign(
    due_protocol = lambda x: x.index.isin(variables_due_protocol.index),
    due_result = lambda x: x.index.isin(variables_due_result.index)
)

due_protocol_studies_documents = documents_merged.loc[variables_due_protocol.index, :].loc[
    ~documents_merged['$CANCELLED_MANUAL'].fillna(False).astype(bool), :
]

due_result_studies_documents = documents_merged.loc[variables_due_result.index, :].loc[
    ~documents_merged['$CANCELLED_MANUAL'].fillna(False).astype(bool), :
]

In [53]:
due_protocol_studies_documents[['has_protocol_manual', 'has_protocol_automatic']].value_counts()

has_protocol_manual  has_protocol_automatic
True                 True                      1369
False                False                      930
True                 False                        1
Name: count, dtype: int64

In [54]:
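def abs_plus_rel(amount, total):
    # Format a count as 'amount / total (percentage%)'; `total` avoids shadowing the built-in `max`
    return f'{amount} / {total} ({round(amount / total * 100, 1)}%)'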

maximum = len(due_protocol_studies_documents)
display(
    abs_plus_rel(len(due_protocol_studies_documents[due_protocol_studies_documents['has_protocol_manual'].fillna(False).astype(bool)]), maximum),
    abs_plus_rel(len(due_protocol_studies_documents[due_protocol_studies_documents['has_protocol_automatic'].fillna(False).astype(bool)]), maximum)
)

'1370 / 2300 (59.6%)'

'1369 / 2300 (59.5%)'
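
For boolean flags, counting the `True` values with `.sum()` gives the same number as filtering the frame and taking its length. A small sketch with a toy flag column; the helper is repeated here so the block runs on its own:

```python
import pandas as pd

def abs_plus_rel(amount, total):
    # Format a count as 'amount / total (percentage%)'
    return f'{amount} / {total} ({round(amount / total * 100, 1)}%)'

# Toy flag column with one missing value
flags = pd.Series([True, False, True, None], dtype='object')

# Missing values count as False, then True values are summed
n_true = int(flags.fillna(False).astype(bool).sum())

print(abs_plus_rel(n_true, len(flags)))
print(abs_plus_rel(1370, 2300))
```

The second call reproduces the first figure reported above.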

In [55]:
due_result_studies_documents[['has_result_manual', 'has_result_automatic']].value_counts()

has_result_manual  has_result_automatic
True               True                    1004
False              False                    432
                   True                      36
True               False                     10
Name: count, dtype: int64

In [56]:
maximum = len(due_result_studies_documents)
display(
    abs_plus_rel(len(due_result_studies_documents[due_result_studies_documents['has_result_manual'].fillna(False).astype(bool)]), maximum),
    abs_plus_rel(len(due_result_studies_documents[due_result_studies_documents['has_abstract_only_manual'].fillna(False).astype(bool)]), maximum),
    abs_plus_rel(len(due_result_studies_documents[due_result_studies_documents['has_abstract_manual'].fillna(False).astype(bool)]), maximum),
    abs_plus_rel(len(due_result_studies_documents[due_result_studies_documents['has_final_study_report_manual'].fillna(False).astype(bool)]), maximum),
    pd.concat([
        due_result_studies_documents[['has_abstract_manual', 'has_abstract_only_manual', 'has_final_study_report_manual']].value_counts(dropna=True),
        (due_result_studies_documents[['has_abstract_manual', 'has_abstract_only_manual', 'has_final_study_report_manual']].value_counts(dropna=True, normalize=True) * 100).round(1),
    ], axis='columns')
)

'1014 / 1482 (68.4%)'

'513 / 1482 (34.6%)'

'891 / 1482 (60.1%)'

'501 / 1482 (33.8%)'

has_abstract_manual,has_abstract_only_manual,has_final_study_report_manual,count,proportion
True,True,False,513,48.2
True,False,True,378,35.5
False,False,True,123,11.5
False,False,False,51,4.8


## Experiments

In [90]:
csv_data = all_included_studies_documents.assign(
    count = 1,
    due_result_count = lambda x : x['due_result'].fillna(0).astype(int),
    due_protocol_count = lambda x : x['due_protocol'].fillna(0).astype(int),
    has_abstract_only_manual_count = lambda x : x['has_abstract_only_manual'].fillna(0).astype(int),
    has_abstract_manual_count = lambda x : x['has_abstract_manual'].fillna(0).astype(int),
    has_final_study_report_manual_count = lambda x : x['has_final_study_report_manual'].fillna(0).astype(int),
    has_result_manual_count = lambda x : x['has_result_manual'].fillna(0).astype(int),
    has_result_automatic_count = lambda x : x['has_result_automatic'].fillna(0).astype(int),
    has_additional_protocol_manual_count = lambda x : x['has_additional_protocol_manual'].fillna(0).astype(int),
    has_protocol_manual_count = lambda x : x['has_protocol_manual'].fillna(0).astype(int),
    has_protocol_automatic_count = lambda x : x['has_protocol_automatic'].fillna(0).astype(int)
)

count_fields = csv_data.filter(like='count').columns

csv_data = pd.concat([
    csv_data.assign(
        risk_management_plan = 'All',
        grouped_risk_management_plan = 'All',
    ),
    csv_data.assign(
        grouped_risk_management_plan = csv_data['risk_management_plan'].replace({
            'EU RMP category 1 (imposed as condition of marketing authorisation)': 'Imposed PAS',
            'EU RMP category 2 (specific obligation of marketing authorisation)': 'Imposed PAS',
            'EU RMP category 3 (required)': 'Other PAS',
            'Non-EU RMP only': 'Other PAS',
            'Not applicable': 'Other PAS',
            pd.NA: 'Other PAS'
        })
    )
])

display(csv_data['risk_management_plan'].value_counts(dropna=False))
display(csv_data['grouped_risk_management_plan'].value_counts(dropna=False))

csv_data[[
    'risk_management_plan', 'grouped_risk_management_plan', *count_fields
]].to_csv('outcomes_manual.csv')

risk_management_plan
All                                                                    2700
Not applicable                                                         1550
EU RMP category 3 (required)                                            711
Non-EU RMP only                                                         149
NaN                                                                     132
EU RMP category 1 (imposed as condition of marketing authorisation)     119
EU RMP category 2 (specific obligation of marketing authorisation)       39
Name: count, dtype: int64

grouped_risk_management_plan
All            2700
Other PAS      2542
Imposed PAS     158
Name: count, dtype: int64