A script to generate a mock survey overview CSV file based on the following:

1. Survey overview file

Columns: 

Question identifier (“Question_ID”)

Question text (“Question_Text”)

Response option text (“Response_Option”)

Response option numerical mapping (“Response_Value”)


Which ADICO component this question identifies (Attribute, Aim, Condition, Deontic) (“ADICO_Category”)

Rows: a row for each response option. E.g. if a question can be answered with any integer between one and five then that question takes up five rows.
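
As a minimal sketch of the row layout described above, the snippet below expands a single one-to-five question into five rows. The question ID `QX`, its text, and its category are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical question used only to illustrate the row layout.
question = {
    "id": "QX",
    "text": "How satisfied are you with local recycling services?",
    "category": "Condition",
}

# One row per response option: the integers 1..5 act as both label and value.
rows = [
    {
        "Question_ID": question["id"],
        "Question_Text": question["text"],
        "Response_Option": str(value),
        "Response_Value": value,
        "ADICO_Category": question["category"],
    }
    for value in range(1, 6)
]

example_df = pd.DataFrame(rows)
print(example_df.to_string(index=False))
```

The question text and category repeat on every row; only the response option and its numerical mapping change.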



Structure the survey to cover necessary ADICO elements:

Attributes: set of questions to distinguish relevant demographics

Aims: Questions about actions they have performed or intend to perform, changes they have made, and beliefs they hold. The choice depends on the institutional analyst designing the survey and on the behaviour they are analysing; norms and shared strategies are all about what social and external factors drive behaviour.

Conditions: Questions about relevant factors that impact respondents' behaviour. The institutional analyst can follow aim questions up with condition questions asking whether particular factors influenced their behaviour, e.g. “Did you perform this action due to this factor?” or “Would you perform this action if this factor applied?”. Questions can also be open and ask the respondent to identify which factors influenced their behaviour regarding this aim. The more tangible the factor, the more interesting it is, e.g. receiving a subsidy for it, their neighbour did it, their wage is above a threshold, they saw an advertisement for it, etc.

Deontic: Each aim question could be paired with a deontic follow-up question such as: “Do you feel obligated to perform this action?” (must), “Do your peers expect you to perform this action?” (must), “Are you not allowed to perform this action?” (may not), or “Why did you not perform this action?” (may not/must not). These questions identify the social aspect of the actions.
 

Usually, the goal of an institutional analyst is to explore human behaviour, identify the factors that impact it, and inform policy decisions. E.g. if a condition clearly influences behaviour, that condition can be used to inform policy.


Let's say 2 attribute questions, 3 aim questions, 2-5 condition questions per aim question, and 1-2 deontic questions per aim question. 


Thanks!

In [6]:
import pandas as pd

# Creating the data structure for the DataFrame
data = {
    "Question_ID": [],
    "Question_Text": [],
    "Response_Option": [],
    "Response_Value": [],
    "ADICO_Category": []
}

# Attributes questions
attributes_questions = [
    {"id": "Q1", "text": "What is your age?", "responses": ["Under 18", "18-35", "36-50", "51-65", "Over 65"], "category": "Attribute"},
    {"id": "Q2", "text": "What is your gender?", "responses": ["Male", "Female", "Other"], "category": "Attribute"}
]

# Aims questions
aims_questions = [
    {"id": "Q3", "text": "Have you recycled in the past month?", "responses": ["Yes", "No"], "category": "Aim"},
    {"id": "Q4", "text": "Do you use public transportation frequently?", "responses": ["Yes", "No"], "category": "Aim"},
    {"id": "Q5", "text": "Did you vote in the last election?", "responses": ["Yes", "No"], "category": "Aim"}
]

# Conditions questions
conditions_questions = [
    {"id": "Q6", "text": "Do you think recycling is easy?", "responses": ["Yes", "No"], "category": "Condition"},
    {"id": "Q7", "text": "Do you consider pulic transport to be affordable?", "responses": ["Very affordable", "Affordable", "Neutral", "Expensive", "Very expensive"], "category": "Condition"},
    {"id": "Q8", "text": "Do you follow political news on Social media?", "responses": ["Yes", "No"], "category": "Condition"}
]

# Deontic questions
deontic_questions = [
    {"id": "Q9", "text": "Do you feel obligated to recycle?", "responses": ["Yes", "No"], "category": "Deontic"},
    {"id": "Q10", "text": "Do your peers expect you to use public transportation?", "responses": ["Yes", "No"], "category": "Deontic"}
]

# Function to add questions to the DataFrame
def add_questions(questions):
    for q in questions:
        for idx, resp in enumerate(q["responses"], 1):
            data["Question_ID"].append(q["id"])
            data["Question_Text"].append(q["text"])
            data["Response_Option"].append(resp)
            data["Response_Value"].append(idx)
            data["ADICO_Category"].append(q["category"])

# Adding each category of questions to the data structure
add_questions(attributes_questions)
add_questions(aims_questions)
add_questions(conditions_questions)
add_questions(deontic_questions)

# Creating the DataFrame
survey_df = pd.DataFrame(data)

# Display the DataFrame
display(survey_df)

survey_df.to_csv("Mock_Survey_Overview.csv", index=False)


,Question_ID,Question_Text,Response_Option,Response_Value,ADICO_Category
0,Q1,What is your age?,Under 18,1,Attribute
1,Q1,What is your age?,18-35,2,Attribute
2,Q1,What is your age?,36-50,3,Attribute
3,Q1,What is your age?,51-65,4,Attribute
4,Q1,What is your age?,Over 65,5,Attribute
5,Q2,What is your gender?,Male,1,Attribute
6,Q2,What is your gender?,Female,2,Attribute
7,Q2,What is your gender?,Other,3,Attribute
8,Q3,Have you recycled in the past month?,Yes,1,Aim
9,Q3,Have you recycled in the past month?,No,2,Aim
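
A quick sanity check can confirm that an overview table follows the one-row-per-response-option layout. The sketch below is one possible way to do that; `check_overview` is a hypothetical helper, not part of the script above, and the sample rows are copied from the Q2 and Q3 entries shown in the output.

```python
import pandas as pd

EXPECTED_COLUMNS = [
    "Question_ID",
    "Question_Text",
    "Response_Option",
    "Response_Value",
    "ADICO_Category",
]


def check_overview(df):
    """Return a list of problems found in a survey overview table (hypothetical helper)."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    for qid, group in df.groupby("Question_ID", sort=False):
        # Response values are expected to run 1..n in row order.
        values = group["Response_Value"].tolist()
        if values != list(range(1, len(values) + 1)):
            problems.append(f"{qid}: Response_Value does not run from 1 to {len(values)}")
        # Text and category must be identical on every row of a question.
        for col in ("Question_Text", "ADICO_Category"):
            if group[col].nunique() != 1:
                problems.append(f"{qid}: inconsistent {col}")
    return problems


good = pd.DataFrame({
    "Question_ID": ["Q2", "Q2", "Q2", "Q3", "Q3"],
    "Question_Text": ["What is your gender?"] * 3
    + ["Have you recycled in the past month?"] * 2,
    "Response_Option": ["Male", "Female", "Other", "Yes", "No"],
    "Response_Value": [1, 2, 3, 1, 2],
    "ADICO_Category": ["Attribute"] * 3 + ["Aim"] * 2,
})

# Deliberately break Q3 so that its values become 1, 3.
bad = good.copy()
bad.loc[4, "Response_Value"] = 3

print(check_overview(good))
print(check_overview(bad))
```

The same function could be applied to the saved file by passing it `pd.read_csv("Mock_Survey_Overview.csv")`.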


2. Survey Responses file with the following content

	Columns: the questions identified by their matching "Question_ID"

	Rows: the responses identified by a "Response_ID"  
    

In [7]:
import pandas as pd
import numpy as np

# Re-using the 'survey_df' DataFrame created in previous steps
# Generate a dictionary to map question IDs to response options
question_response_mapping = {}
for idx in survey_df['Question_ID'].unique():
    options = survey_df[survey_df['Question_ID'] == idx]['Response_Value'].tolist()
    question_response_mapping[idx] = options

# Number of respondents to simulate
num_respondents = 100  # You can adjust this number

# Creating the responses
responses_data = {"Response_ID": range(1, num_respondents + 1)}
for question_id in question_response_mapping.keys():
    # Assign random responses based on the available response values
    responses_data[question_id] = [np.random.choice(question_response_mapping[question_id]) for _ in range(num_respondents)]

# Creating the DataFrame
responses_df = pd.DataFrame(responses_data)

# Use Response_ID as the index (no transpose is needed):
# columns are questions, rows are responses, each identified by Response_ID
responses_df = responses_df.set_index("Response_ID")

# Display the DataFrame
display(responses_df.head(10))  # Display the first 10 responses to check

# Optionally, save to CSV
responses_df.to_csv("mock_survey_responses.csv")


Response_ID,Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10
1,5,3,2,2,2,1,2,1,2,1
2,3,1,1,1,1,1,4,2,1,1
3,1,3,1,2,1,1,1,2,1,1
4,1,1,2,2,1,2,4,1,1,2
5,4,1,2,2,2,1,1,2,2,1
6,2,1,1,1,1,1,2,1,2,2
7,3,3,2,1,1,2,1,1,1,2
8,5,1,2,1,2,1,3,1,1,2
9,1,2,2,1,2,2,3,1,1,2
10,5,2,1,2,1,1,1,1,2,1
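
Because the responses file stores only numerical values, the overview file is needed to turn them back into readable answers. The following is a rough sketch of that lookup, assuming the column names used above; the small `overview` and `responses` frames are cut-down copies of the Q2 and Q3 data shown in the outputs, and `value_to_label` and `decoded` are illustrative names.

```python
import pandas as pd

# Cut-down overview: only the columns needed for decoding.
overview = pd.DataFrame({
    "Question_ID": ["Q2", "Q2", "Q2", "Q3", "Q3"],
    "Response_Option": ["Male", "Female", "Other", "Yes", "No"],
    "Response_Value": [1, 2, 3, 1, 2],
})

# First three respondents from the mock output, Q2 and Q3 only.
responses = pd.DataFrame(
    {"Q2": [3, 1, 3], "Q3": [2, 1, 1]},
    index=pd.Index([1, 2, 3], name="Response_ID"),
)

# Build {question_id: {response_value: response_option}}.
value_to_label = {
    qid: dict(zip(group["Response_Value"], group["Response_Option"]))
    for qid, group in overview.groupby("Question_ID")
}

# Replace each numeric answer with its text label, column by column.
decoded = responses.copy()
for qid, mapping in value_to_label.items():
    decoded[qid] = responses[qid].map(mapping)

print(decoded)
```

Keeping the numeric file as the stored format and decoding on demand means the two files stay small and the mapping lives in one place.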
