# Module 1 Lab 2 - Reading an export from REDCap

[REDCap](https://www.project-redcap.org/) is a platform for collecting data electronically.  It can be used to administer surveys, or as an Electronic Data Capture (EDC) system for research.  There are of course many other platforms for collecting data online, such as Qualtrics and SurveyMonkey.  REDCap is free software, but not open source, and requires non-profit organization status to receive a license.  It is one of the most popular free software packages in use for research-based EDC, and can be made compliant with HIPAA and GDPR, among other privacy standards.

In this lab, we will use REDCap to explore common needs surrounding the analysis of data collected via such an online EDC system.


In [1]:
import pandas as pd
import numpy as np



## Metadata
When analyzing data from an EDC system, it is imperative that you have access to the metadata which describes the data.  In REDCap you can get the data dictionary for any form, and it will contain the necessary pieces of information to help you make sense of the data.

In addition to describing the data, the data dictionary also describes relevant information about the forms, validation, and branching logic, such that a data dictionary can later be uploaded to recreate the EDC system.  This is one way in which you may share data with other researchers (provided the data is de-identified or the other researchers have the proper IRB approvals).

Below we will take a look at the contents of a REDCap data dictionary download.  This data dictionary describes a survey sent out to assess the prevalence and perceived usefulness of medical calculators.


In [2]:
data_dict = pd.read_csv('../resources/REDCap/REDCap_Sample_DataDictionary.csv')

display(data_dict)

Unnamed: 0,Variable / Field Name,Form Name,Section Header,Field Type,Field Label,"Choices, Calculations, OR Slider Labels",Field Note,Text Validation Type OR Show Slider Number,Text Validation Min,Text Validation Max,Identifier?,Branching Logic (Show field only if...),Required Field?,Custom Alignment,Question Number (surveys only),Matrix Group Name,Matrix Ranking?
0,pre_participant_id,sample_survey,,text,Participant ID,,,,,,,,,,,,
1,pre_gender,sample_survey,,radio,Gender,"0, Female | 1, Male",,,,,,,,,,,
2,pre_role,sample_survey,,radio,Role,"1, Attending | 2, Resident | 3, Fellow | 4, Other",,,,,,,y,,,,
3,pre_role_other,sample_survey,,text,Other:,,,,,,,[pre_role] = '4',y,,,,
4,pre_yrs_experience,sample_survey,,radio,Years of Medical Experience,"1, 1 | 2, 2 | 3, 3 | 4, 4 | 5, 5 | 6, 6 | 7, 7...",,,,,,,y,,,,
5,pre_calculator_use,sample_survey,,yesno,Do you use medical calculators such as Anion G...,,,,,,,,y,,,,
6,pre_why_no_use,sample_survey,,checkbox,For what reasons do you choose not to use medi...,"1, Too hard to use | 2, Too time consuming | 3...",,,,,,[pre_calculator_use] = '0',,,,,
7,pre_why_no_use_other,sample_survey,,notes,Other:,,,,,,,[pre_calculator_use] = '0' and [pre_why_no_use...,,,,,
8,pre_mode,sample_survey,,checkbox,How do you access the calculators you use?,"1, Website | 2, Smartphone app | 3, Manual cha...",,,,,,[pre_calculator_use] = '1',y,,,,
9,pre_mode_other,sample_survey,,text,Other:,,,,,,,[pre_mode(5)] = '1' and [pre_calculator_use] =...,y,RH,,,


## REDCap Project

REDCap is organized by projects.  Each project has its own separate collection of forms and variables.  A REDCap project can be treated as a unique EDC system, within which the data are all related in some way.  It's important to note that REDCap is not a full-fledged relational database management system, although it is often used that way.

## Variables and Forms

The data dictionary contains information about variables and forms.  These two pieces of data combined form a key of sorts.  Together they tell you information about that unique piece of data.  The first variable of the first form in the data dictionary is special, because it's used by REDCap as the key identifier to link records across different forms in your REDCap project.

In [3]:
print(data_dict['Form Name'].unique())

['sample_survey']


Here we can quickly see that this project has only one data collection instrument: `sample_survey`

This is the simplest case, because all data in the project is collected on one form.  More complex projects can have many forms.  REDCap data export will take care of linking the data from all forms together during the data export.
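As a small illustration of the points above, the sketch below pulls the key identifier (the first variable listed in the data dictionary) and the list of forms. The three-row `toy_dict` frame is a stand-in built from the first rows of the listing above, and `record_id_field` is a hypothetical helper name, not part of REDCap or pandas.

```python
import pandas as pd

# Stand-in for a REDCap data dictionary, using the real column names.
toy_dict = pd.DataFrame({
    'Variable / Field Name': ['pre_participant_id', 'pre_gender', 'pre_role'],
    'Form Name': ['sample_survey', 'sample_survey', 'sample_survey'],
})

def record_id_field(dd):
    """Return the first variable listed in the data dictionary (hypothetical helper)."""
    return dd['Variable / Field Name'].iloc[0]

# The record identifier and the forms it links together.
print(record_id_field(toy_dict))
print(toy_dict['Form Name'].unique().tolist())
```

Applied to the real `data_dict`, the same helper should name the column to use when matching records across forms.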

## Field Types

The following field types are available:
1. text - capture free text data.  optional choices can be listed in the `Choices, Calculations, OR Slider Labels` column

1. notes - for capturing free text that contain carriage return/new line characters.

1. calc - calculate the value of a variable using other data already entered.  The `Choices, Calculations, OR Slider Labels` column contains the calculation to apply

1. dropdown - allow the user to select only one of one or more options listed in the `Choices, Calculations, OR Slider Labels` column

1. radio - like dropdown, except all choices are displayed in the form.  only one can be selected.

1. checkbox – multi-selectable checkboxes. it appears in the form similar to radio, but allows more than one selection.

1. yesno – radio buttons with yes and no options coded as No = 0 and Yes = 1.

1. truefalse - radio buttons with true and false options coded as False = 0 and True = 1.

1. slider – visual analogue scale. Slider labels can be specified in `Choices, Calculations, OR Slider Labels`

1. file - upload  a  document.  The document itself is not downloaded as part of the csv, but can be downloaded separately as a zip file containing all uploaded documents keyed by the project identifier and the variable name.

1. descriptivetext - This variable contains text for display only.

1. section - a section header that appears on the form
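A quick way to see which field types a project uses is to count them in the data dictionary. The sketch below uses a small stand-in frame whose rows are taken from the listing earlier in this lab; `coded_types` is an assumed grouping of the types whose answers are stored as coded values.

```python
import pandas as pd

# Stand-in rows copied from the data dictionary listing above.
toy_dict = pd.DataFrame({
    'Variable / Field Name': ['pre_participant_id', 'pre_gender', 'pre_role',
                              'pre_calculator_use', 'pre_mode'],
    'Field Type': ['text', 'radio', 'radio', 'yesno', 'checkbox'],
})

# Count how many variables use each field type.
type_counts = toy_dict['Field Type'].value_counts()
print(type_counts)

# Field types whose answers are coded values that need a label lookup.
coded_types = ['dropdown', 'radio', 'checkbox', 'yesno', 'truefalse']
coded_fields = toy_dict.loc[toy_dict['Field Type'].isin(coded_types),
                            'Variable / Field Name'].tolist()
print(coded_fields)
```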


## Field validation
Validation can be specified in the data dictionary in the `Text Validation Type OR Show Slider Number` field for text box fields.  There are a number of predefined validations that can be used, for example email, phone number, or date formats.

| validation | example |
| --- | --- |
| email | name@domain.com |
| integer | 1234 |
| number | 123.456 |
| phone | 555-123-4567 |
| time | 15:30:01 |
| zipcode | 65201 |
| date_dmy | 16-02-2011 |
| date_mdy | 02-16-2011 |
| date_my | 02-2011 |
| date_ymd | 2011-02-16 |
| datetime_dmy | 16-02-2011 17:45 |
| datetime_mdy | 02-16-2011 17:45 |
| datetime_ymd | 2011-02-16 17:45 |
| datetime_seconds_dmy | 16-02-2011 17:45:23 |
| datetime_seconds_mdy | 02-16-2011 17:45:23 |
| datetime_seconds_ymd | 2011-02-16 17:45:23 |
| time_hh:mm | 17:45 |
| time_mm:ss | 4:30 |
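The date and datetime examples in the table can be checked with Python's standard `datetime.strptime`. This is only a sketch: the `ASSUMED_FORMATS` mapping is inferred from the example column of the table and covers a few validation types, and it does not reproduce REDCap's own validation rules.

```python
from datetime import datetime

# Assumed strptime patterns, inferred from the examples in the table above.
ASSUMED_FORMATS = {
    'date_dmy': '%d-%m-%Y',
    'date_mdy': '%m-%d-%Y',
    'date_ymd': '%Y-%m-%d',
    'datetime_ymd': '%Y-%m-%d %H:%M',
    'datetime_seconds_ymd': '%Y-%m-%d %H:%M:%S',
}

def matches_validation(value, validation):
    """Return True if value parses with the pattern assumed for this validation type."""
    try:
        datetime.strptime(value, ASSUMED_FORMATS[validation])
        return True
    except ValueError:
        return False

print(matches_validation('16-02-2011', 'date_dmy'))  # day-month-year parses
print(matches_validation('16-02-2011', 'date_mdy'))  # 16 is not a valid month
```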



## Branching Logic
During data entry, the user of a REDCap form can be subject to computer adaptive data entry.  Some fields may be skipped based on the input of prior fields.  For example, if the user selects no to a question of "Do you smoke", then you can skip asking for the number of cigarettes per day.  This applies even if the variable is identified as required.
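Branching logic matters at analysis time because a blank answer may mean "never asked" instead of "not answered". The sketch below handles only the simplest expression form, `[field] = 'value'`, as seen for `pre_role_other` in the data dictionary. The response values are made up, and `parse_simple_logic` is a hypothetical helper; real branching logic can combine conditions with `and`/`or`, which this does not attempt to parse.

```python
import re
import pandas as pd

# Made-up responses for two fields from the data dictionary above.
toy = pd.DataFrame({
    'pre_role': [1, 4, 2, 4],
    'pre_role_other': [None, 'Nurse practitioner', None, None],
})

def parse_simple_logic(expr):
    """Parse only the simplest form, "[field] = 'value'", into (field, value)."""
    match = re.fullmatch(r"\s*\[(\w+)\]\s*=\s*'([^']*)'\s*", expr)
    if match is None:
        raise ValueError('unsupported branching logic: ' + expr)
    return match.group(1), match.group(2)

field, value = parse_simple_logic("[pre_role] = '4'")

# The dependent field was shown only where the controlling field equals the value.
# (Columns loaded as floats, e.g. 4.0, would need normalizing before this comparison.)
shown = toy[field].astype(str) == value

# An answer counts as missing only if the field was shown and left blank.
truly_missing = shown & toy['pre_role_other'].isna()
print(truly_missing.tolist())
```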

## Controlling access to PHI
The column `Identifier?` can be used to mark variables that contain one of the 18 types of values that are considered as protected in the United States by the HIPAA regulations.  Within REDCap, users can be restricted from downloading or viewing these data.

1. Name
1. Fax number
1. Phone number
1. E-mail address
1. Account numbers
1. Social Security number
1. Medical Record number
1. Health Plan number
1. Certificate/license numbers
1. URL
1. IP address
1. Vehicle identifiers
1. Device ID
1. Biometric ID
1. Full face/identifying photo
1. Other unique identifying number, characteristic, or code
1. Postal address (geographic subdivisions smaller than state)
1. Date precision beyond year
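If an export does include flagged variables, the `Identifier?` column can be used to drop them before analysis. The sketch below assumes the flag is stored as `y`, the same convention the `Required Field?` column uses in the listing above; the flag on `pre_participant_id` and the data values are hypothetical. Dropping flagged columns is only one step and does not by itself make a dataset de-identified.

```python
import pandas as pd

# Stand-in data dictionary and export; the 'y' flag here is hypothetical.
toy_dict = pd.DataFrame({
    'Variable / Field Name': ['pre_participant_id', 'pre_gender', 'pre_role'],
    'Identifier?': ['y', None, None],
})
toy_data = pd.DataFrame({
    'pre_participant_id': [1, 2],
    'pre_gender': [1.0, 0.0],
    'pre_role': [1.0, 4.0],
})

# Variables flagged as identifiers in the data dictionary.
identifier_fields = toy_dict.loc[toy_dict['Identifier?'] == 'y',
                                 'Variable / Field Name'].tolist()

# Drop the flagged columns that are present in the export.
to_drop = [col for col in identifier_fields if col in toy_data.columns]
deidentified = toy_data.drop(columns=to_drop)
print(deidentified.columns.tolist())
```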

## Choices, Calculations, OR Slider Labels
Of all of the fields in the data dictionary, this one is probably the most important when it comes to reading and understanding a data extraction.  For drop down, radio, check box, true/false, and yes/no fields, data can be represented either by the descriptive label or by a number associated with the choice, depending on how the data are exported from REDCap.  The raw data export is the easiest to use programmatically: it contains columns of field names and numeric values for responses.  The labels export type uses the labels for each option as field names, and text values to represent the checked or unchecked status of options.

The numbers and labels used in these formats appear in the `Choices, Calculations, OR Slider Labels` field in the data dictionary.  



## Format for Choices
For variables that are of the type drop down, radio, check box, true/false, or yes/no, the `Choices, Calculations, OR Slider Labels` field contains a pipe (`|`) delimited list of value/label pairs.  Within each pair, the value and label are separated by a comma.

In [4]:
display(data_dict[~data_dict['Choices, Calculations, OR Slider Labels'].isna()][['Choices, Calculations, OR Slider Labels']])

Unnamed: 0,"Choices, Calculations, OR Slider Labels"
1,"0, Female | 1, Male"
2,"1, Attending | 2, Resident | 3, Fellow | 4, Other"
4,"1, 1 | 2, 2 | 3, 3 | 4, 4 | 5, 5 | 6, 6 | 7, 7..."
6,"1, Too hard to use | 2, Too time consuming | 3..."
8,"1, Website | 2, Smartphone app | 3, Manual cha..."
13,"1, Never | 2, Rarely | 3, Occasionally | 4, Re..."
16,"1, equations based on discrete values from the..."
17,"1, Never | 2, Rarely | 3, Occasionally | 4, Re..."
18,"1, Before I see the patient | 2, While I am ta..."
19,"1, Never | 2, Rarely | 3, Occasionally | 4, Re..."


## Raw Data exports
We will now look at the raw data export for the above data dictionary.  In this course we will concern ourselves only with the raw data export, and not the labels export.


In [5]:
data = pd.read_csv('../resources/REDCap/REDCap_Sample_DATA.csv')
display(data)

Unnamed: 0,pre_participant_id,redcap_survey_identifier,prerollout_survey_timestamp,pre_gender,pre_role,pre_yrs_experience,pre_calculator_use,pre_why_no_use___1,pre_why_no_use___2,pre_why_no_use___3,...,pre_likely_to_use_newer,pre_wait_time_to_use,pre_who_determines,prerollout_survey_complete,pre_barriers,barriers_coded_1,barriers_coded_2,pre_lacking_features,lack_features_coded_1,lack_features_coded_2
0,1,,11/19/14 02:21 PM,1.0,1.0,8.0,0.0,0,0,0,...,,,,2,Not having it at the point of care And not ge...,integration,integration,Prognosis,specific calculator feature,specific calculator feature
1,2,,11/19/14 02:31 PM,1.0,1.0,8.0,1.0,0,0,0,...,1.0,5.0,1.0,2,,,,,,
2,3,,11/19/14 02:24 PM,1.0,1.0,7.0,1.0,0,0,0,...,3.0,4.0,1.0,2,,,,,,
3,4,,11/19/14 02:25 PM,0.0,4.0,9.0,0.0,0,0,0,...,,,,2,I find that I do not need them in my practice,necessity,necessity,none,none,none
4,5,,11/19/14 02:36 PM,1.0,3.0,5.0,1.0,0,0,0,...,3.0,3.0,1.0,2,Phone battery,technical,technical,,none,none
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
117,118,791.0,02/17/15 08:04 PM,1.0,2.0,3.0,1.0,0,0,0,...,3.0,3.0,20.0,2,,,,,,
118,119,820.0,02/17/15 08:19 PM,0.0,2.0,1.0,1.0,0,0,0,...,3.0,2.0,1.0,2,,,,,,
119,120,747.0,02/18/15 09:36 AM,1.0,3.0,5.0,1.0,0,0,0,...,3.0,1.0,1.0,2,,,,,,
120,121,834.0,02/18/15 11:00 AM,1.0,2.0,1.0,1.0,0,0,0,...,4.0,1.0,4.0,2,,,,,,


## Form completion
If a survey or form was only partially filled out, then data for that record may be suspect because it is incomplete.  The survey timestamp and survey complete fields (`prerollout_survey_timestamp` and `prerollout_survey_complete` in this export) will tell you if the form was complete.  Lack of a valid date/time stamp and a survey complete value other than 2 indicate the form is incomplete.  It is up to the data scientist to make a determination as to whether the incomplete data can or should be used.

In [6]:
display(data[data['prerollout_survey_complete'] != 2])

Unnamed: 0,pre_participant_id,redcap_survey_identifier,prerollout_survey_timestamp,pre_gender,pre_role,pre_yrs_experience,pre_calculator_use,pre_why_no_use___1,pre_why_no_use___2,pre_why_no_use___3,...,pre_likely_to_use_newer,pre_wait_time_to_use,pre_who_determines,prerollout_survey_complete,pre_barriers,barriers_coded_1,barriers_coded_2,pre_lacking_features,lack_features_coded_1,lack_features_coded_2
5,6,,[not completed],1.0,1.0,9.0,1.0,0,0,0,...,,,,0,Use of the EMR is very user unfriendly. Too b...,UI,UI,One click,UI,UI
7,8,,[not completed],,,,,0,0,0,...,,,,0,Many conditions I treat to not have them,necessity,necessity,,none,none
14,15,,[not completed],1.0,1.0,7.0,1.0,0,0,0,...,,,,0,,,,,,
27,28,,[not completed],1.0,1.0,9.0,,0,0,0,...,,,,0,"Not a major barrier, but more time consuming i...",integration,integration,Some could be configured with most common valu...,specific calculator feature,specific calculator feature
34,35,,[not completed],0.0,4.0,9.0,1.0,0,0,0,...,,,,0,little clinical need,necessity,necessity,,none,none
35,36,,[not completed],0.0,1.0,7.0,,0,0,0,...,,,,0,,,,I do not have sufficient experience with them ...,none,none
36,37,,[not completed],0.0,1.0,7.0,,0,0,0,...,,,,0,Not integrated into the workflow. Can't put t...,integration,integration,Have to link out to a separate web site .,UI,integration
38,39,,[not completed],0.0,1.0,8.0,1.0,0,0,0,...,,,,0,,,,,,
53,54,,[not completed],1.0,4.0,9.0,1.0,0,0,0,...,,,,0,,,,,,
83,84,376.0,[not completed],0.0,4.0,8.0,0.0,0,0,1,...,,,,0,,,,,,
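Both checks described above can be combined: parse the timestamp so that `[not completed]` becomes a missing value, then keep rows that are marked complete and carry a valid stamp. This is a sketch on three stand-in rows; the `format` string is an assumption based on values such as `11/19/14 02:21 PM` in the export.

```python
import pandas as pd

# Stand-in rows mimicking the export above.
toy = pd.DataFrame({
    'pre_participant_id': [1, 6, 118],
    'prerollout_survey_timestamp': ['11/19/14 02:21 PM', '[not completed]',
                                    '02/17/15 08:04 PM'],
    'prerollout_survey_complete': [2, 0, 2],
})

# Unparseable stamps such as '[not completed]' become NaT instead of raising.
toy['timestamp'] = pd.to_datetime(toy['prerollout_survey_timestamp'],
                                  format='%m/%d/%y %I:%M %p', errors='coerce')

# Keep only records that are marked complete and have a valid timestamp.
complete = toy[(toy['prerollout_survey_complete'] == 2) & toy['timestamp'].notna()]
print(complete['pre_participant_id'].tolist())
```

Whether to discard the remaining rows or keep them for a partial analysis is still a judgment call.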


## Data file format

We can see that we have some columns in our data file that are a one to one mapping to the data dictionary.  These fields can be treated as straightforward variables.  In other cases, we see fields that have been appended with a triple underscore followed by a number.  These fields represent the multiple choice data type `checkbox`, and each possible value is represented as a separate field.  For example, for the variable `pre_mode`, there are 5 corresponding columns, one for each value/label pair in the data dictionary for `pre_mode`:

`1, Website | 2, Smartphone app | 3, Manual chart/nomogram | 4, Integrated Calculators component in PowerChart | 5, Other`


Using the data dictionary, we can compute the possible field names.

In [7]:
def get_checkbox_name_choices(data_dict, variable_name):
    # Look up the choices string for this variable in the data dictionary.
    choices = data_dict[data_dict['Variable / Field Name'] == variable_name]['Choices, Calculations, OR Slider Labels']
    # Each choice is "value, label"; the export column is <variable>___<value>.
    return [variable_name + '___' + choice.strip().split(',')[0]
            for choice in choices.values[0].split('|')]

In [8]:
print(get_checkbox_name_choices(data_dict, 'pre_mode'))
display(data[get_checkbox_name_choices(data_dict, 'pre_mode')].head())

['pre_mode___1', 'pre_mode___2', 'pre_mode___3', 'pre_mode___4', 'pre_mode___5']


Unnamed: 0,pre_mode___1,pre_mode___2,pre_mode___3,pre_mode___4,pre_mode___5
0,0,0,0,0,0
1,1,0,0,0,0
2,1,1,0,0,0
3,0,0,0,0,0
4,1,1,0,0,0


## Using this data in machine learning
You might take note that the representation REDCap makes for checkbox types appears to be one-hot encoded.  However, this is not the case.  One-hot encoded variables by definition can only have the value of 1 in exactly one out of the set of variables.  This is not the case for REDCap variables that can take on multiple values.  Be warned that treating checkbox types as one-hot encoded will result in invalid regressions.  The proper way to represent checkbox data in regression is to treat each possible value as dichotomous (i.e. binary).
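The difference is easy to verify by summing across the checkbox columns: a one-hot encoding would give exactly 1 in every row. The sketch below re-enters the first five rows of the `pre_mode` columns shown earlier so that it runs on its own.

```python
import pandas as pd

# First five rows of the pre_mode checkbox columns shown earlier in this lab.
checks = pd.DataFrame({
    'pre_mode___1': [0, 1, 1, 0, 1],
    'pre_mode___2': [0, 0, 1, 0, 1],
    'pre_mode___3': [0, 0, 0, 0, 0],
    'pre_mode___4': [0, 0, 0, 0, 0],
    'pre_mode___5': [0, 0, 0, 0, 0],
})

# A one-hot encoding would have exactly one 1 per row; checkboxes need not.
selected_per_row = checks.sum(axis=1)
print(selected_per_row.tolist())  # [0, 1, 2, 0, 2]

is_one_hot = bool((selected_per_row == 1).all())
print(is_one_hot)  # False
```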

Yes/no and true/false types are dichotomous and are already in an appropriate form for use in regression and other machine learning models that require one-hot encoded categoricals.  The same holds for a radio field with only two choices coded 0 and 1, such as `pre_gender` below.

In [9]:
display(data[['pre_gender']].head())

Unnamed: 0,pre_gender
0,1.0
1,1.0
2,1.0
3,0.0
4,1.0


Other types with more than two possible values, such as drop downs and radio, require the use of preprocessing techniques to prepare them in the one-hot encoded form suitable for machine learning.  One-hot encoding is appropriate for these types because they can take only one value.

In [10]:
data = pd.concat([data,pd.get_dummies(data['pre_role'], prefix='pre_role', drop_first=True)],axis=1)
data.drop(['pre_role'],axis=1, inplace=True)
display(data[['pre_role_2.0', 'pre_role_3.0', 'pre_role_4.0']].head())

Unnamed: 0,pre_role_2.0,pre_role_3.0,pre_role_4.0
0,0,0,0
1,0,0,0
2,0,0,0
3,0,0,1
4,0,1,0


## Categorical data
You can choose to convert a column in your pandas data frame to a Categorical type.  This will have some advantages, such as associating a text meaning to the numeric category.  This is an appropriate step to take for radio, true/false, yes/no, and drop downs.  For checkbox types, the data in the REDCap export are broken into multiple fields, so the metadata cannot be applied in this way to those field types.

Below is a function that will convert the REDCap format found in the `Choices, Calculations, OR Slider Labels` column to a python dict object, and another that will convert a column to `pd.Categorical` using this mapping dict.

In [11]:
def choice_to_dict(data_dict, variable_name):
    field_type = data_dict[data_dict['Variable / Field Name'] == variable_name]['Field Type']

    if field_type.values[0] == 'yesno':
        choices = '1, yes | 0, no'
    elif field_type.values[0] == 'truefalse':
        choices = '1, true | 0, false'
    else:
        choices = data_dict[data_dict['Variable / Field Name'] == variable_name]['Choices, Calculations, OR Slider Labels'].values[0]
    mapping = {}
    for choice in choices.split('|'):
        # Split on the first comma only, since a label may itself contain commas.
        value_pair = choice.strip().split(',', 1)
        mapping[int(value_pair[0])] = value_pair[1].strip()
    return mapping
            
def categorize(df, col_name, mapping=None):
    if mapping:
        df[col_name] = pd.Categorical(df[col_name].map(mapping))
    else:
        df[col_name] = pd.Categorical(df[col_name])

In [12]:
variable_name = 'pre_gender'

mapping = choice_to_dict(data_dict, 'pre_gender')

print(mapping)

categorize(data, variable_name, mapping)

display(data.head())

{0: 'Female', 1: 'Male'}


Unnamed: 0,pre_participant_id,redcap_survey_identifier,prerollout_survey_timestamp,pre_gender,pre_yrs_experience,pre_calculator_use,pre_why_no_use___1,pre_why_no_use___2,pre_why_no_use___3,pre_why_no_use___4,...,prerollout_survey_complete,pre_barriers,barriers_coded_1,barriers_coded_2,pre_lacking_features,lack_features_coded_1,lack_features_coded_2,pre_role_2.0,pre_role_3.0,pre_role_4.0
0,1,,11/19/14 02:21 PM,Male,8.0,0.0,0,0,0,0,...,2,Not having it at the point of care And not ge...,integration,integration,Prognosis,specific calculator feature,specific calculator feature,0,0,0
1,2,,11/19/14 02:31 PM,Male,8.0,1.0,0,0,0,0,...,2,,,,,,,0,0,0
2,3,,11/19/14 02:24 PM,Male,7.0,1.0,0,0,0,0,...,2,,,,,,,0,0,0
3,4,,11/19/14 02:25 PM,Female,9.0,0.0,0,0,0,0,...,2,I find that I do not need them in my practice,necessity,necessity,none,none,none,0,0,1
4,5,,11/19/14 02:36 PM,Male,5.0,1.0,0,0,0,0,...,2,Phone battery,technical,technical,,none,none,0,1,0


## Numeric data
Numeric data in REDCap is captured using text fields.  Additionally, validation should be used to ensure only numeric data is captured.  You can identify numeric data by looking at the `Text Validation Type OR Show Slider Number` field in the data dictionary.  `integer` or `number` in this field will give you variables that can be treated as number data.  Pandas typically will do a good job of loading these data into a compatible type but it's a good idea to check.

In [13]:
for i, row in data_dict[(data_dict['Text Validation Type OR Show Slider Number'] == 'integer') | 
                  (data_dict['Text Validation Type OR Show Slider Number'] == 'number')].iterrows():
    print('field:', row['Variable / Field Name'], '| validation:', row['Text Validation Type OR Show Slider Number'], '| pandas type:', data[row['Variable / Field Name']].dtype)

field: pre_num_used | validation: integer | pandas type: float64
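The field is validated as `integer` but loads as `float64`. The most likely reason is missing responses: pandas represents missing numbers as `NaN`, which is a float. If integer semantics matter, one option is the nullable `Int64` dtype, sketched below on made-up values.

```python
import pandas as pd
import numpy as np

# Made-up whole-number counts with one missing response.
toy = pd.DataFrame({'pre_num_used': [3.0, np.nan, 5.0]})
print(toy['pre_num_used'].dtype)  # float64

# Cast to the nullable integer dtype so values stay integers and gaps stay missing.
toy['pre_num_used'] = toy['pre_num_used'].astype('Int64')
print(toy['pre_num_used'].dtype)  # Int64
```

The cast raises an error if any value has a fractional part, which doubles as a check that the validation was actually enforced.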


## Free text data
Free text data (called an "open question") should be avoided if possible in surveys or data collection instruments because it is hard to process.  However, it is a more natural way for respondents to input data or respond to questions, and can help you to capture data that you may not have considered.  "Closed questions", which can only take specific values, are easier to process and analyze, but may lead to less information during survey collection.

Some common ways to process free text data are sentiment analysis, natural language processing to extract predetermined concepts, and "coding" responses into categories that are determined by an analysis of the free text data.

The coding method is usually done to break free text responses down into broad categories, and can be a one to one or one to many mapping.  Coding is a manual and labor intensive process for large datasets.  Coding should be done by parsing the free text first to identify candidate categories, then creating a rubric for categorization, and finally applying the rubric to each free text response.  It is usually best if either the first and last step are performed by separate individuals, or if multiple individuals are involved in all of the steps.  In the second case, the inter-rater agreement can be calculated, which measures the amount of agreement between coders in applying the rubric.

Cohen's Kappa can also be used for _intra_-rater agreement, although this is less common.  Intra-rater agreement is used if just one person performs the coding.  That individual would perform the coding twice and compare the results using Cohen's Kappa.  There is implicit bias in doing this of course, which can be mitigated somewhat with the passage of time.  Do not code your free text responses twice in one sitting, as you are likely to recall your coded assignments instead of using the rubric to do the coding.

## Cohen's Kappa for inter-rater agreement
The Cohen's Kappa statistic is a good measure for coded responses from survey free text.  It can range from -1 to 1, where negative values indicate less agreement than would be expected by chance; in practice, values usually fall between 0 and 1.  It can be interpreted as:

  * 0 = agreement equivalent to chance.
  * 0.01 – 0.20 = slight agreement.
  * 0.21 – 0.40 = fair agreement.
  * 0.41 – 0.60 = moderate agreement.
  * 0.61 – 0.80 = substantial agreement.
  * 0.81 – 0.99 = near perfect agreement
  * 1 = perfect agreement.
  
This metric should be reported when discussing the use of coded free text responses in survey analyses.  It gives the reader an idea as to how well the rubric or instructions were followed for performing the coding, and how consistently the rubric was applied.  
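To make the statistic less of a black box, Cohen's Kappa can be computed by hand as `(p_o - p_e) / (1 - p_e)`, where `p_o` is the observed share of items both raters labeled the same and `p_e` is the agreement expected by chance from each rater's label frequencies. The sketch below is a plain-Python illustration on made-up codes; `cohen_kappa_by_hand` is a hypothetical helper, not a library function.

```python
from collections import Counter

def cohen_kappa_by_hand(rater_a, rater_b):
    """Compute Cohen's Kappa as (p_o - p_e) / (1 - p_e) for two equal-length label lists."""
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, the product of the two raters' marginal rates.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    # Undefined (division by zero) if both raters used a single identical label throughout.
    return (p_o - p_e) / (1 - p_e)

# Made-up codes assigned by two raters to six free text responses.
rater_1 = ['UI', 'UI', 'none', 'none', 'integration', 'integration']
rater_2 = ['UI', 'none', 'none', 'none', 'integration', 'integration']
print(round(cohen_kappa_by_hand(rater_1, rater_2), 3))  # 0.75
```

Here the raters agree on 5 of 6 items (`p_o` = 0.833) and chance agreement is 12/36 (`p_e` = 0.333), giving 0.5 / 0.667 = 0.75.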

In our example data, survey respondents were asked to identify features that medical calculators lacked.  Those were collected as free text in the field `pre_lacking_features`, and subsequently coded by two people in `lack_features_coded_1` and `lack_features_coded_2`.  When assessing the inter-rater agreement, we ignore responses that are not coded (i.e. na, or null free text responses).

Below are some functions that can be used to prepare data for reporting this metric, and calculating Cohen's Kappa.

In [14]:
from sklearn.metrics import cohen_kappa_score

def add_pct(df, pct_col_name, cnt_col_name, decimal_places):
    # Percentage of the total count represented by each row.
    total = df[cnt_col_name].sum()
    df[pct_col_name] = (df[cnt_col_name] / total * 100).round(decimal_places)
    return df

def group(df, group_col_name, cnt_col_name):
    final_group = pd.DataFrame(df.groupby(group_col_name).size())
    final_group.columns = [cnt_col_name]
    return final_group

def cohen_kappa(df, c1, c2):
    cnt_col_name = 'cnt'
    c1group = group(df, c1, cnt_col_name)
    c2group = group(df, c2, cnt_col_name)
    
    print()
    print(c1, 'total: {}'.format(c1group.sum()))
    display(add_pct(c1group, 'pct', cnt_col_name, 1))
    print()
    print(c2, 'total: {}'.format(c2group.sum()))
    display(add_pct(c2group, 'pct', cnt_col_name, 1))
    print()
    print("Cohen's Kappa for {c1} and {c2} is {ck:.3f}".format(c1=c1, c2=c2, ck=cohen_kappa_score(df[c1].cat.codes, df[c2].cat.codes)))

categorize(data, 'lack_features_coded_1')
categorize(data, 'lack_features_coded_2')
    
cohen_kappa(data, 'lack_features_coded_1', 'lack_features_coded_2')


lack_features_coded_1 total: cnt    45
dtype: int64


Unnamed: 0_level_0,cnt,pct
lack_features_coded_1,Unnamed: 1_level_1,Unnamed: 2_level_1
UI,3,6.7
integration,19,42.2
none,16,35.6
other,2,4.4
specific calculator feature,5,11.1



lack_features_coded_2 total: cnt    45
dtype: int64


Unnamed: 0_level_0,cnt,pct
lack_features_coded_2,Unnamed: 1_level_1,Unnamed: 2_level_1
UI,2,4.4
integration,18,40.0
none,17,37.8
other,2,4.4
specific calculator feature,6,13.3



Cohen's Kappa for lack_features_coded_1 and lack_features_coded_2 is 0.941
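One caveat when scoring with category codes: `cat.codes` encodes missing values as -1, so passing whole columns to `cohen_kappa_score` counts every row that neither rater coded as an agreement, which tends to raise the score. Codes from two separately categorized columns also line up only if both columns have the same set of categories. The sketch below, on made-up codes with hypothetical column names, restricts the comparison to rows that both raters coded and compares the labels directly.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Made-up codes from two raters; None marks responses that were never coded.
toy = pd.DataFrame({
    'coded_1': ['UI', 'UI', 'none', 'none', None, None, None, None],
    'coded_2': ['UI', 'none', 'none', 'none', None, None, None, None],
})

# Keep only the rows that both raters coded, then compare the labels directly.
both_coded = toy.dropna(subset=['coded_1', 'coded_2'])
kappa_coded_only = cohen_kappa_score(both_coded['coded_1'], both_coded['coded_2'])

# For contrast: category codes turn every missing value into -1, so the four
# uncoded rows are scored as four agreements.
kappa_with_missing = cohen_kappa_score(toy['coded_1'].astype('category').cat.codes,
                                       toy['coded_2'].astype('category').cat.codes)
print(round(kappa_coded_only, 3), round(kappa_with_missing, 3))  # 0.5 0.8
```

On this toy data the score rises from 0.5 to 0.8 when the uncoded rows are included, so it is worth checking which rows enter the calculation before reporting the metric.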


## Interpretation
Our Kappa value was 0.941, which indicates near perfect agreement.  To continue working with the coded responses, you may either:

1. Make another pass with the coders examining the differences and agreeing on which coded response to use
1. Use the responses from one or the other.

If you have a lower level of agreement, then option 1 is the best choice.  If you have near perfect agreement, then you could choose either option.  For a small number of disagreements, option 2 is feasible.  If there are a very large number of free text responses and therefore a significant number of disagreements, then reconciling every disagreement (option 1) may not be worth the effort when agreement is already near perfect.