### 2. Survey Programming and Instrument Design

#### Best Programming Practices
Here we highlight just a few of the best practices you should keep in mind when designing your survey instrument:

-	Use a consistent naming convention for variables that is also easy to read. Rather than naming variables with question numbers (q1, q2, q3, etc.), use a 1-2 letter prefix that indicates the module, followed by an underscore and then a short, descriptive name that refers specifically to the question. For example, a question in the food security module about the number of meals consumed per day might be named “fs_meals_per_day”.
-	Keep variable names comfortably within the 32-character limit that they will face once imported into Stata. Variables within a repeat group become longer when reshaped wide, so it is safest to keep variable names to 25 characters or fewer. Otherwise, Stata will automatically truncate the variable names upon import, which could corrupt the data structure and cause a great deal of confusion. For a quick check of variable name lengths in your Excel form, add a column that calculates the character length of each variable name (using the LEN formula), then apply a conditional formatting rule that highlights any cell in that column containing a value greater than 25.
-	Use notes and hints liberally to give both enumerators and respondents additional instructions on how to answer questions. For example, you may want to instruct the enumerator not to read the choice options for a question aloud, or you may want to remind the respondent that they are free to state that they refuse to answer or don’t know the answer.
-	Use a consistent set of values for standardized options such as “Other (specify)”, “Not applicable”, “Refuse to answer”, and “Don’t know”. Also use a consistent set of values for common response sets, such as 0 – “No” and 1 – “Yes”.
-	Make sure all numeric questions (field types integer, decimal, and date) have constraints that restrict answers to a reasonable range. For example, a respondent should not be allowed to declare their age as 2019. It is also a good idea to include soft constraints, so that if a suspicious value is entered, the enumerator is prompted to confirm it is correct before proceeding. For example, if a respondent declares their age as 130, a message should display saying “Enumerator: the value for age you have entered is unusually high. Are you sure this is the correct response?”
-	Include confirmation questions to double check the entry of important information, such as dates, IDs, or sensitive variables.
-	Make sure a unique ID exists for observations at each unit level. For instance, if a survey asks some questions at the household level and some at the individual level, there should be separate ways of uniquely identifying all household observations (e.g., a household ID) and all individual observations (e.g., an individual ID). If conducting multiple rounds of a single survey, it is critical to preserve these IDs across rounds so that observations can be accurately matched.
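
The naming and length checks described above can also be scripted instead of done by hand in Excel. The following is a minimal Python sketch, not an official tool: the function name `check_variable_names`, the regular expression for the prefix convention, and the sample names are all assumptions made for illustration. It flags names longer than 25 characters and names that do not follow the "prefix, underscore, descriptive name" pattern.

```python
import re

# Safety margin below Stata's 32-character limit, as recommended above.
MAX_NAME_LENGTH = 25

# Assumed convention: 1-2 lowercase letters, an underscore, then a
# descriptive name made of lowercase letters, digits, and underscores.
NAME_PATTERN = re.compile(r"^[a-z]{1,2}_[a-z][a-z0-9_]*$")


def check_variable_names(names):
    """Return a dict mapping each problematic name to a list of issues."""
    problems = {}
    for name in names:
        issues = []
        if len(name) > MAX_NAME_LENGTH:
            issues.append(f"too long ({len(name)} > {MAX_NAME_LENGTH})")
        if not NAME_PATTERN.match(name):
            issues.append("does not match prefix_name convention")
        if issues:
            problems[name] = issues
    return problems


# Hypothetical variable names copied from the "name" column of a form.
names = ["fs_meals_per_day", "q1", "hh_number_of_household_members_total"]
for name, issues in check_variable_names(names).items():
    print(name, "->", "; ".join(issues))
```

In practice, the list of names would come from the form definition itself; how you extract it depends on your CAI software and is left out of this sketch.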

For a full overview of survey programming best practices, you should read the checklist referenced above.

#### Instrument Design

Designing a survey means not only programming the questionnaire but also thinking about the best way to structure the form(s). In some cases, it may make sense to create a single form; this makes it quite easy to pull data across various survey sections. However, a single form that is very long and complex has drawbacks: it may increase the risk of data loss or fabrication, and it could make it harder to sift through the submitted data. Having multiple forms addresses these concerns but poses challenges of its own, for example when streaming data between forms, since this requires a reliable network connection.

It is also worth mentioning the concept of preloading. To preload data means to pull existing data from a previous source into the current survey form, relying on a consistent unique ID to identify a particular observation. This can be useful if, for example, you are conducting an endline survey that asks household members for their names, genders, and dates of birth. If this information was also collected during the baseline survey, it may be worthwhile to preload it into the endline survey and use it as a “check” to ensure that you are correctly capturing information for the same person across rounds. You could then give respondents the option to confirm and/or modify this preloaded data, as opposed to requiring them to enter it all anew.
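
To make the preloading logic concrete, here is a minimal Python sketch of the idea. It does not show how any particular CAI software implements preloading; the ID format, field names, and the helpers `preload` and `find_discrepancies` are hypothetical. It looks up a baseline record by unique ID and lists the fields where the endline answer differs from the preloaded value.

```python
# Hypothetical baseline records, keyed by a unique individual ID.
baseline = {
    "HH001-01": {"name": "Amina", "gender": "F", "dob": "1985-03-12"},
    "HH001-02": {"name": "Joseph", "gender": "M", "dob": "2010-07-30"},
}


def preload(individual_id, baseline_records):
    """Return a copy of the baseline record for this ID, or None if no match."""
    record = baseline_records.get(individual_id)
    return dict(record) if record is not None else None


def find_discrepancies(preloaded, endline_answers):
    """List fields where the endline answer differs from the preloaded value."""
    return [
        field
        for field, value in preloaded.items()
        if field in endline_answers and endline_answers[field] != value
    ]


preloaded = preload("HH001-01", baseline)
endline = {"name": "Amina", "gender": "F", "dob": "1985-03-21"}
print(find_discrepancies(preloaded, endline))  # ['dob']
```

A mismatch like the one flagged here does not by itself say which round is wrong; it only signals that the enumerator should confirm the value with the respondent.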

#### Bench Testing

Once the survey instrument has been designed, team members on a project that uses CAI must bench test the form thoroughly before it is rolled out for actual data collection. Bench testing means checking the final programming for 1) consistency with the questionnaire and 2) data quality assurance. This includes making sure that all questions are properly labeled, there are constraints where relevant, skip patterns ensure correct logic flow, preloaded data appears when applicable, etc. It is best to have this testing done by a third party (when possible) or by individuals who are unfamiliar with the actual survey structure.

Although bench testing is a time-intensive process, it is key. Investing up front in a well-programmed survey has immense pay-offs. Not only can it prevent disasters in the field (e.g., forms can crash and lose all data because of programming errors), but it can also improve the quality of collected data and cut down significantly on time spent cleaning and preparing the dataset for analysis.

If you are using SurveyCTO as your CAI software, you can bench test a survey directly on the SurveyCTO web platform or in the SurveyCTO Collect App on any Android device.

During bench testing, be sure to:

-	Examine the range of values allowed for each response. You may suggest limiting the range to a certain set to avoid entry errors that create outliers. Check that each response only allows the right sort of values; for example, a numeric question should only accept numbers.
-	Inspect the final dataset from a few test surveys. Are all of the variables there? Do they take on the values you expect? Are missing values (don’t know, refusal, etc.) properly coded? 
-	Test that all skip patterns are working properly and that the logic flow is correct.
-	Check that critical questions are required.
-	Spend extra time on particularly sophisticated code. For example, if a question has a constraint that depends on the values of multiple other questions, try different answers that push the range for each of those questions.
-	Watch out for any obvious opportunities for data entry error. For example, when you swipe to the next screen or scroll down on some questions, is it easy to accidentally change or clear your response?

-	**Resource:** [Bench Testing Form - Basic](https://northwestern.app.box.com/file/311246786680) (Box)
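
The dataset inspection step in the checklist above can be partly automated. The following Python sketch is illustrative only: the variable names, the valid ranges, and the missing-value codes (-99 for "don't know", -98 for "refuse to answer") are assumptions, and your project should substitute its own. It checks that each test submission contains the expected variables and that non-missing values fall within the expected range.

```python
# Hypothetical expectations for a small test dataset.
EXPECTED_VARIABLES = {"hh_id", "fs_meals_per_day", "dm_age"}
MISSING_CODES = {-99, -98}  # assumed codes: don't know, refuse to answer
VALID_RANGES = {"fs_meals_per_day": (0, 10), "dm_age": (0, 120)}


def inspect_rows(rows):
    """Return a list of human-readable findings for a list of row dicts."""
    findings = []
    for i, row in enumerate(rows, start=1):
        # Are all of the expected variables there?
        absent = EXPECTED_VARIABLES - set(row)
        if absent:
            findings.append(f"row {i}: missing variables {sorted(absent)}")
        # Do the values fall in the range we expect?
        for var, (low, high) in VALID_RANGES.items():
            value = row.get(var)
            if value is None or value in MISSING_CODES:
                continue  # absent or properly coded missing value
            if not low <= value <= high:
                findings.append(f"row {i}: {var}={value} outside [{low}, {high}]")
    return findings


# Three hypothetical test submissions.
rows = [
    {"hh_id": "HH001", "fs_meals_per_day": 3, "dm_age": 34},
    {"hh_id": "HH002", "fs_meals_per_day": -99, "dm_age": 2019},
    {"hh_id": "HH003", "dm_age": 51},
]
for finding in inspect_rows(rows):
    print(finding)
```

A script like this complements manual bench testing rather than replacing it: it can confirm that variables and codes look right in the exported data, but it cannot tell you whether a skip pattern felt wrong on the device.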
