# Technical Audit of Sentience Institute Study

### Purpose
* Evaluate the accuracy of the figures in this study
* Evaluate the methodology for reasonableness
* Evaluate whether the conclusions of this study match the data
* Consider whether this dataset is useful for further research purposes

### Data Description
* Population: US Adults
* Sample: 1,094 people, census-balanced based on age, sex, region, ethnicity, and income.
    * *Note: which census? How does Ipsos do census-balancing?*
    * Excludes individuals who failed the awareness check - 46.3% of people passed.
        * I wonder whether selecting based on attention biases the results?
    
### Main Findings
1. **54%** of US adults say they are “currently trying to **consume fewer animal-based foods** (meat, dairy, and/or eggs) **and more plant-based foods** (fruits, grains, beans, and/or vegetables).”
2. **97%** of US adults agree “Whether to eat animals or be vegetarian is a **personal choice**, and nobody has the right to tell me which one they think I should do.”
3. **49%** of US adults support a **ban on factory farming**
4. **47%** support a **ban on slaughterhouses**
5. **33%** support a **ban on animal farming**
6. **58%** of US adults think “most farmed animals are **treated well**”
7. **75%** of US adults say they usually buy animal products “**from animals that are treated humanely**”

In [1]:
import numpy as np
import pandas as pd
import seaborn as sb

from openpyxl import load_workbook
from openpyxl.utils import get_column_interval

In [2]:
import os
import re
from collections import Counter

In [3]:
data_dir = "data"
input_data = os.path.join(data_dir, "Animal Farming Attitudes Survey Data.xlsx")

def get_excel_range(filename, sheet_name, sheet_range):
    """
    FN suggested here: https://stackoverflow.com/questions/43327975/how-can-i-
    read-a-rangea5b10-and-place-these-values-into-a-dataframe-using-o
    """
    wb = load_workbook(filename=filename, read_only=True)
    ws = wb[sheet_name]
    
    sheet_range = re.sub("\s", "", sheet_range)
    ws_range = ws[sheet_range]
    header = [cell.value for cell in ws_range[0]]
    rows = [ [cell.value for cell in row] for row in ws_range[1:] ]
    
    return pd.DataFrame(rows, columns=header)

#### 1. Clean Raw Dataset

In [4]:
raw = get_excel_range(input_data, "Raw", "A1:AH1095")
raw.head()

Unnamed: 0,Run,Program Version,Time Started,Minutes Spent,Points,numberOfQuestions,increment,progress,Position,otx_id_1,...,I support a ban on slaughterhouses. (a20sbox),I support a ban on animal farming. (8mjexdf),"Suppose you were given $10 and allowed to donate any amount of it to an effective non-profit organization that works to help farmed animals, keeping the rest for yourself. How much of this $10 would you donate? (sbzmhrb)","Suppose a public demonstration against the problems of factory farming occurred near where you live and your friend asked you to come demonstrate with her. If this demonstration fit into your schedule, how likely would you be to join and help demonstrate? (yrd9ug1)","When these foods are the same price as animal-based foods, people should eat more of these foods and fewer animal-based foods. (5yfiuvk)","When these foods are the same price as animal-based foods, I would prefer to eat more of these foods and fewer animal-based foods. (g9j5zwx)","When these foods are the same price as animal-based foods, people should eat more of these foods and fewer animal-based foods. (nawdq1n)","When these foods are the same price as animal-based foods, I would prefer to eat more of these foods and fewer animal-based foods. (88inyp8)","Most farmed animals are treated well. For example, the animals are given enough space and kept in good health. (1uaomy1)","The animal-based foods I purchase (meat, dairy, and/or eggs) usually come from animals that are treated humanely. For example, the animals are given enough space and kept in good health. (drd18km)"
0,2891666.0,8.0,2017-10-06 23:53:40,3.97,0.0,18.0,0.055556,1.0,arutyo6,501854929.0,...,Disagree,Disagree,10,Somewhat likely,Somewhat disagree,Somewhat disagree,Strongly disagree,Strongly disagree,Somewhat agree,No opinion
1,2891675.0,8.0,2017-10-07 00:01:52,7.02,0.0,18.0,0.055556,1.0,arutyo6,501854985.0,...,Strongly disagree,Strongly disagree,5,Very unlikely,Agree,Strongly agree,Strongly agree,Strongly agree,Agree,No opinion
2,2891670.0,8.0,2017-10-06 23:59:43,1.52,0.0,18.0,0.055556,1.0,arutyo6,501855036.0,...,Somewhat agree,Somewhat disagree,3,Very likely,Somewhat agree,Somewhat agree,Somewhat disagree,Disagree,Strongly disagree,Somewhat disagree
3,2891681.0,8.0,2017-10-07 00:07:28,3.62,0.0,18.0,0.055556,1.0,arutyo6,501855307.0,...,No opinion,Strongly disagree,10,Somewhat likely,Somewhat agree,Somewhat agree,Somewhat agree,Somewhat agree,Somewhat disagree,Somewhat disagree
4,2891716.0,8.0,2017-10-07 00:19:02,4.5,0.0,18.0,0.055556,1.0,arutyo6,501855695.0,...,Disagree,Disagree,0,Very unlikely,Disagree,Strongly disagree,Disagree,Strongly disagree,Agree,Agree


In [5]:
# rename column names for readability
raw.columns

Index(['Run', 'Program Version', 'Time Started', 'Minutes Spent', 'Points',
       'numberOfQuestions', 'increment', 'progress', 'Position', 'otx_id_1',
       'otx_id_2', 'Age', 'Gender', 'Region', 'Ethnicity', 'Education',
       'Income',
       'People should consume fewer animal-based foods (meat, dairy, and/or eggs) and more plant-based foods (fruits, grains, beans, and/or vegetables). (jaocxhv)',
       'I am currently trying to consume fewer animal-based foods (meat, dairy, and/or eggs) and more plant-based foods (fruits, grains, beans, and/or vegetables). (8xrlozv)',
       'I have some discomfort with the way animals are used in the food industry. (ev1wy8h)',
       'Farmed animals have roughly the same ability to feel pain and discomfort as humans. (hiia3w0)',
       'Whether to eat animals or be vegetarian is a personal choice, and nobody has the right to tell me which one they think I should do. (saaqnbp)',
       'The factory farming of animals is one of the most importan

In [6]:
# 1. rename question vars for ease of use. Save the text of the questions in a dictionary somewhere.
questions = [
    'People should consume fewer animal-based foods (meat, dairy, and/or eggs) and more plant-based foods (fruits, grains, beans, and/or vegetables). (jaocxhv)',
    'I am currently trying to consume fewer animal-based foods (meat, dairy, and/or eggs) and more plant-based foods (fruits, grains, beans, and/or vegetables). (8xrlozv)',
    'I have some discomfort with the way animals are used in the food industry. (ev1wy8h)',
    'Farmed animals have roughly the same ability to feel pain and discomfort as humans. (hiia3w0)',
    'Whether to eat animals or be vegetarian is a personal choice, and nobody has the right to tell me which one they think I should do. (saaqnbp)',
    'The factory farming of animals is one of the most important social issues in the world today. /(A factory farm is a large industrialized farm, especially one on which a large number of livestock are raised indoors in conditions intended to maximize production at minimal cost.)/ (r1z9y5o)',
    'I support a ban on the factory farming of animals. (zc3ae6)',
    'I support a ban on slaughterhouses. (a20sbox)',
    'I support a ban on animal farming. (8mjexdf)',
    'Suppose you were given $10 and allowed to donate any amount of it to an effective non-profit organization that works to help farmed animals, keeping the rest for yourself. How much of this $10 would you donate? (sbzmhrb)',
    'Suppose a public demonstration against the problems of factory farming occurred near where you live and your friend asked you to come demonstrate with her. If this demonstration fit into your schedule, how likely would you be to join and help demonstrate? (yrd9ug1)',
    'When these foods are the same price as animal-based foods, people should eat more of these foods and fewer animal-based foods. (5yfiuvk)',
    'When these foods are the same price as animal-based foods, I would prefer to eat more of these foods and fewer animal-based foods. (g9j5zwx)',
    'When these foods are the same price as animal-based foods, people should eat more of these foods and fewer animal-based foods. (nawdq1n)',
    'When these foods are the same price as animal-based foods, I would prefer to eat more of these foods and fewer animal-based foods. (88inyp8)',
    'Most farmed animals are treated well. For example, the animals are given enough space and kept in good health. (1uaomy1)',
    'The animal-based foods I purchase (meat, dairy, and/or eggs) usually come from animals that are treated humanely. For example, the animals are given enough space and kept in good health. (drd18km)'
]

format_dict = { text: "q{}".format(i + 1) for i, text in enumerate(questions) }
labels = { "q{}".format(i + 1): text for i, text in enumerate(questions) }

raw = raw.rename(index=str, columns=format_dict)
raw.columns

Index(['Run', 'Program Version', 'Time Started', 'Minutes Spent', 'Points',
       'numberOfQuestions', 'increment', 'progress', 'Position', 'otx_id_1',
       'otx_id_2', 'Age', 'Gender', 'Region', 'Ethnicity', 'Education',
       'Income', 'q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8', 'q9', 'q10',
       'q11', 'q12', 'q13', 'q14', 'q15', 'q16', 'q17'],
      dtype='object')

In [7]:
# 2. Convert all vars to snake case
def make_snakecase(text):
    remove_spaces = re.sub("\s+", "_", text)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', remove_spaces).lower()

raw.columns = map(make_snakecase, raw.columns)
raw.columns

Index(['run', 'program_version', 'time_started', 'minutes_spent', 'points',
       'number_of_questions', 'increment', 'progress', 'position', 'otx_id_1',
       'otx_id_2', 'age', 'gender', 'region', 'ethnicity', 'education',
       'income', 'q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7', 'q8', 'q9', 'q10',
       'q11', 'q12', 'q13', 'q14', 'q15', 'q16', 'q17'],
      dtype='object')

In [8]:
key = get_excel_range(input_data, "Key", "A1:E54")
key.head()

Unnamed: 0,Question Name,Question Text,Data Type,Answer Name,Answer Text
0,AGE,what is your age?,Numeric Text,,
1,GENDER,what is your gender?,Single Punch,A1,male
2,GENDER,what is your gender?,Single Punch,A2,female
3,REGION,dummy question for zip-region4,Single Punch,A1,northeast
4,REGION,dummy question for zip-region4,Single Punch,A2,midwest


In [9]:
questions = list(map(make_snakecase, key["Question Name"]))
values = key["Answer Name"]
text = key["Answer Text"]
unique_key = zip(questions, values)
key_dict = dict(zip(unique_key, text))
list(key_dict.items())[:10]

[(('age', ''), ''),
 (('gender', 'A1'), 'male'),
 (('gender', 'A2'), 'female'),
 (('region', 'A1'), 'northeast'),
 (('region', 'A2'), 'midwest'),
 (('region', 'A3'), 'south'),
 (('region', 'A4'), 'west'),
 (('ethnicity', 'A1'), 'white or caucasian (not hispanic or latino)'),
 (('ethnicity', 'A2'), 'black or african-american (not hispanic or latino)'),
 (('ethnicity', 'A3'), 'asian/pacific islander')]

In [10]:
raw.head()

Unnamed: 0,run,program_version,time_started,minutes_spent,points,number_of_questions,increment,progress,position,otx_id_1,...,q8,q9,q10,q11,q12,q13,q14,q15,q16,q17
0,2891666.0,8.0,2017-10-06 23:53:40,3.97,0.0,18.0,0.055556,1.0,arutyo6,501854929.0,...,Disagree,Disagree,10,Somewhat likely,Somewhat disagree,Somewhat disagree,Strongly disagree,Strongly disagree,Somewhat agree,No opinion
1,2891675.0,8.0,2017-10-07 00:01:52,7.02,0.0,18.0,0.055556,1.0,arutyo6,501854985.0,...,Strongly disagree,Strongly disagree,5,Very unlikely,Agree,Strongly agree,Strongly agree,Strongly agree,Agree,No opinion
2,2891670.0,8.0,2017-10-06 23:59:43,1.52,0.0,18.0,0.055556,1.0,arutyo6,501855036.0,...,Somewhat agree,Somewhat disagree,3,Very likely,Somewhat agree,Somewhat agree,Somewhat disagree,Disagree,Strongly disagree,Somewhat disagree
3,2891681.0,8.0,2017-10-07 00:07:28,3.62,0.0,18.0,0.055556,1.0,arutyo6,501855307.0,...,No opinion,Strongly disagree,10,Somewhat likely,Somewhat agree,Somewhat agree,Somewhat agree,Somewhat agree,Somewhat disagree,Somewhat disagree
4,2891716.0,8.0,2017-10-07 00:19:02,4.5,0.0,18.0,0.055556,1.0,arutyo6,501855695.0,...,Disagree,Disagree,0,Very unlikely,Disagree,Strongly disagree,Disagree,Strongly disagree,Agree,Agree


#### 2. Weight cleaned dataset

In [11]:
raw.education.head()

0    A8
1    A5
2    A5
3    A8
4    A6
Name: education, dtype: object

In [15]:
def get_freq(df=None, colname=None, series_obj=None):
    series_obj = series_obj if series_obj is not None else df[colname]
    return Counter(series_obj).most_common()

In [19]:
educ_mappings = {
    i: key_dict[("education", "A{}".format(i))] for i in range(1, 14)
}
educ_mappings

{1: 'grade 4 or less',
 2: 'grade 5 to 8',
 3: 'grade 9 to 11',
 4: 'grade 12 (no diploma)high school graduate',
 5: 'regular high school diploma',
 6: 'ged or alternative credentialcollege or some college',
 7: 'some college credit, but less than 1 year of college credit',
 8: '1 or more years of college credit, no degree',
 9: "associate's degree (for example:  aa, as)",
 10: "bachelor's degree (for example:  ba, bs) after bachelor's degree",
 11: "master's degree (for example:  ma, ms, meng, med, mba)",
 12: "professional degree beyond bachelor's degree (for example:  md, dds, dvm, llb, jd)",
 13: 'doctorate degree (for example:  phd, edd)'}

In [12]:
def convert_educ(val):
    test_val = int(val[1])
    
    if val in [1, 2, 3, 4]:
        return "less_than_high"
    elif val in [5, 6]:
        return "high"
    elif val in [7, 8]:
        return "some_college"
    elif val == 9:
        return "associates"
    elif val == 10:
        return "college"
    elif val in [11, 12, 13]:
        return "post_grad"

raw.educ_flags = raw.education.apply(convert_educ)
raw.educ_flags.head()

0         1 or more years of college credit, no degree
1                          regular high school diploma
2                          regular high school diploma
3         1 or more years of college credit, no degree
4    ged or alternative credentialcollege or some c...
Name: education, dtype: object

In [14]:
get_freq(series_obj=raw.education)

[("bachelor's degree (for example:  ba, bs) after bachelor's degree", 296),
 ('1 or more years of college credit, no degree', 194),
 ('regular high school diploma', 179),
 ("master's degree (for example:  ma, ms, meng, med, mba)", 118),
 ("associate's degree (for example:  aa, as)", 106),
 ('some college credit, but less than 1 year of college credit', 97),
 ('ged or alternative credentialcollege or some college', 34),
 ("professional degree beyond bachelor's degree (for example:  md, dds, dvm, llb, jd)",
  22),
 ('doctorate degree (for example:  phd, edd)', 20),
 ('grade 9 to 11', 19),
 ('grade 12 (no diploma)high school graduate', 5),
 ('grade 5 to 8', 3),
 ('grade 4 or less', 1)]

In [None]:
educ_weights = [0.112, 0.291, 0.1889, 0.096, 0.202, 0.111)]

In [None]:


get_freq(df=raw, colname="points")

In [None]:
get_freq(df=raw, colname="number_of_questions")

In [None]:
get_freq(df=raw, colname="progress")

In [None]:
raw.columns

In [None]:
get_freq(df=raw, colname="run")

In [None]:
sorted(get_freq(series_obj=raw.minutes_spent.apply(lambda x: round(x, 0))))