# Module 1 Practice 1 - Working with trial structure in the SDTM
In this practice exercise, you will read SDTM datasets and join them together. You will need to consult the documentation to find the proper keys for doing this, which will give you some practice in navigating the SDTM documentation. Section 3 of the SDTM documentation specifies the Trial Design Model, the standardized way to report a study's trial design.

In [1]:
import sys
!{sys.executable} -m pip install --upgrade "pandas>=1.1"
!{sys.executable} -m pip install xmltodict

import pandas as pd
import numpy as np


Requirement already up-to-date: pandas>=1.1 in /opt/conda/lib/python3.7/site-packages (1.3.5)




## Find information about the trial arms in the study

We would like to know how many trial arms there are in this study.  Find and print the unique trial arms.
Hint: Reference the [SDTM Implementation Guide](../resources/SDTMIG_v3.3_FINAL.pdf), section 3.

In [2]:
with open('../resources/SDTM_sample/ta.xpt', 'rb') as f:
    ta = pd.read_sas(f, format='xport', encoding='utf-8')
    
display(ta)
display(ta['ARM'].unique())

Unnamed: 0,STUDYID,DOMAIN,ARMCD,ARM,TAETORD,ETCD,ELEMENT,TABRANCH,TATRANS,EPOCH
0,CDISCPILOT01,TA,Pbo,Placebo,1.0,SCRN,Screen,Randomized to Placebo,,SCREENING
1,CDISCPILOT01,TA,Pbo,Placebo,2.0,PBO,Placebo,,,TREATMENT
2,CDISCPILOT01,TA,Pbo,Placebo,3.0,FOLO,Follow_up,,,FOLLOW-UP
3,CDISCPILOT01,TA,Xan_Hi,Xanomeline High Dose,1.0,SCRN,Screen,Randomized to High Dose,,SCREENING
4,CDISCPILOT01,TA,Xan_Hi,Xanomeline High Dose,2.0,HIS,High_Start,,,TREATMENT
5,CDISCPILOT01,TA,Xan_Hi,Xanomeline High Dose,3.0,HIM,High_Middle,,,TREATMENT
6,CDISCPILOT01,TA,Xan_Hi,Xanomeline High Dose,4.0,HIE,High_End,,,TREATMENT
7,CDISCPILOT01,TA,Xan_Hi,Xanomeline High Dose,5.0,FOLO,Follow_up,,,FOLLOW-UP
8,CDISCPILOT01,TA,Xan_Lo,Xanomeline Low Dose,1.0,SCRN,Screen,Randomized to Low Dose,,SCREENING
9,CDISCPILOT01,TA,Xan_Lo,Xanomeline Low Dose,2.0,LO,Low,,,TREATMENT


array(['Placebo', 'Xanomeline High Dose', 'Xanomeline Low Dose'],
      dtype=object)
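Since `ARM` is a descriptive label and `ARMCD` is its short code, it can be worth confirming that the two map one-to-one before counting arms. The following is a minimal sketch of that check; `ta_demo` is a hypothetical hand-built stand-in for a few rows of the `ta` dataset, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the TA dataset shown above.
ta_demo = pd.DataFrame({
    "ARMCD": ["Pbo", "Pbo", "Xan_Hi", "Xan_Hi", "Xan_Lo"],
    "ARM": ["Placebo", "Placebo", "Xanomeline High Dose",
            "Xanomeline High Dose", "Xanomeline Low Dose"],
    "ETCD": ["SCRN", "PBO", "SCRN", "HIS", "SCRN"],
})

# One row per distinct (code, label) pair.
arms = ta_demo[["ARMCD", "ARM"]].drop_duplicates().reset_index(drop=True)

# Number of arms, counted by code.
n_arms = ta_demo["ARMCD"].nunique()

# True when each code has exactly one label and each label exactly one code.
one_to_one = len(arms) == n_arms == ta_demo["ARM"].nunique()

print(arms)
print(n_arms, one_to_one)
```

The same three lines should apply unchanged to the real `ta` frame, assuming it has the `ARMCD` and `ARM` columns shown in the output above.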

## How long was each trial arm?
We would like to know how long each trial arm was planned to last. This is the sum of the planned durations of the elements in that arm. Note that the arms contain different elements: the high dose arm has more elements than the placebo and low dose arms. Each arm must therefore be summed individually, because the arms may have been planned with different lengths.

We have to look up the duration of each element in another table. Examine the [SDTM Implementation Guide](../resources/SDTMIG_v3.3_FINAL.pdf) to find which table holds information on the elements. For example, you can search for the table that contains the ETCD key.

### Open the proper data set
Using the dataset you identified in the SDTMIG, read the corresponding data file into a pandas DataFrame.

In [3]:
with open('../resources/SDTM_sample/te.xpt', 'rb') as f:
    te = pd.read_sas(f, format='xport', encoding='utf-8')
    
display(te)

Unnamed: 0,STUDYID,DOMAIN,ETCD,ELEMENT,TESTRL,TEENRL,TEDUR
0,CDISCPILOT01,TE,FOLO,Follow_up,End of last scheduled visit on study (includin...,Completion of all specified followup activitie...,
1,CDISCPILOT01,TE,HIE,High_End,Administration of first dose (from patches sup...,,P2W
2,CDISCPILOT01,TE,HIM,High_Middle,Administration of first dose (from patches sup...,,P22W
3,CDISCPILOT01,TE,HIS,High_Start,Administration of first dose,,P2W
4,CDISCPILOT01,TE,LO,Low,Administration of first dose,,P26W
5,CDISCPILOT01,TE,PBO,Placebo,Administration of first dose,,P26W
6,CDISCPILOT01,TE,SCRN,Screen,Informed consent,Completion of all screening activities and no ...,


### Join the datasets together on the proper key
The trial arm data set has a key that will link to the trial elements table.

In [4]:
# Left-join TA to TE on ETCD so every arm element keeps its row and gains its planned duration
trial_arm_elements = ta.merge(te, how='left', on='ETCD', suffixes=('_TA', '_TE'))
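A join can silently duplicate or drop rows when the key is not what you expect, so it may help to have pandas check it for you. The sketch below uses two hypothetical miniature frames (`ta_demo` and `te_demo`, not the real datasets) and relies on the `validate` and `indicator` parameters of `DataFrame.merge`. TA is placed on the left so that every arm element keeps its row.

```python
import pandas as pd

# Hypothetical miniature versions of the TA and TE datasets.
ta_demo = pd.DataFrame({
    "ARM": ["Placebo", "Placebo", "Xanomeline Low Dose", "Xanomeline Low Dose"],
    "ETCD": ["SCRN", "PBO", "SCRN", "LO"],
})
te_demo = pd.DataFrame({
    "ETCD": ["SCRN", "PBO", "LO"],
    "TEDUR": ["", "P26W", "P26W"],
})

# validate="many_to_one" raises MergeError if ETCD is duplicated in te_demo;
# indicator=True adds a _merge column showing where each row came from.
checked = ta_demo.merge(te_demo, on="ETCD", how="left",
                        validate="many_to_one", indicator=True)

# Rows flagged "left_only" are arm elements with no matching TE record.
unmatched = checked[checked["_merge"] == "left_only"]

print(checked)
print(len(unmatched))
```

An empty `unmatched` frame suggests that every element code used in an arm is defined in the elements table.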

### Sum the duration for each arm
The durations are in ISO 8601 format, which you can find described online. ISO 8601 defines a format for representing durations (for example, `P26W` means 26 weeks), which you will need to convert to an integer value to allow for summation.

In [5]:
# Keep only the digits; this assumes every duration is expressed in weeks (e.g. 'P26W' -> '26')
trial_arm_elements['TEDUR'] = trial_arm_elements['TEDUR'].str.replace(r"[^0-9]", '', regex=True)
# Elements with no planned duration (blank TEDUR) count as zero weeks
trial_arm_elements['TEDUR'] = trial_arm_elements['TEDUR'].replace('', '0', regex=False)

# Convert to int so the values can be summed
trial_arm_elements = trial_arm_elements.astype({'TEDUR': int})

# Finally, group by arm and sum the planned duration in weeks
trial_arm_elements.groupby(by=['ARM'])['TEDUR'].sum()

ARM
Placebo                 26
Xanomeline High Dose    26
Xanomeline Low Dose     26
Name: TEDUR, dtype: int64
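Stripping the non-numeric characters works here because every duration in this study happens to be given in weeks; a value such as `P14D` would be read as 14 weeks. As a hedged alternative, the sketch below parses the unit as well. The helper `iso_duration_to_timedelta` and the frame `demo` are hypothetical names introduced for illustration. The parser is deliberately lenient and partial: it accepts weeks, days, hours, minutes, and seconds, and rejects years and months because their length in days is ambiguous.

```python
import re
import pandas as pd

# Matches PnW and PnDTnHnMnS with any subset of parts present.
# Years and months are not supported: their length in days is ambiguous.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso_duration_to_timedelta(value):
    """Convert an ISO 8601 duration string to a pandas Timedelta.

    Blank or missing values return a zero-length Timedelta, mirroring the
    blank-to-zero rule used above. Unsupported formats raise ValueError.
    """
    if value is None or pd.isna(value) or str(value).strip() == "":
        return pd.Timedelta(0)
    text = str(value).strip()
    match = _DURATION.match(text)
    parts = {}
    if match is not None:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"Unsupported ISO 8601 duration: {value!r}")
    return pd.Timedelta(**parts)


# Hypothetical joined data: element durations as listed in the TE output above.
demo = pd.DataFrame({
    "ARM": ["Xanomeline High Dose"] * 3 + ["Placebo"],
    "TEDUR": ["P2W", "P22W", "P2W", "P26W"],
})

# Convert each duration to whole days, then sum per arm.
demo["TEDUR_DAYS"] = demo["TEDUR"].map(lambda v: iso_duration_to_timedelta(v).days)
total_days = demo.groupby("ARM")["TEDUR_DAYS"].sum()
print(total_days)
```

Summing in days rather than weeks keeps the totals comparable if a study mixes units across elements.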