# Additional survey analysis

This notebook contains analyses of the raw survey data, to answer questions that go beyond the basic cross-tabulations already available.

In [1]:
import pandas as pd
import weightedcalcs

In [2]:
# The variable `Weightvar_MI` contains the survey weights Ipsos has assigned each response.
# We use them below for our calculations.
wc = weightedcalcs.Calculator("Weightvar_MI")
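
As a rough illustration of what the survey weights do, the sketch below computes a weight-normalized average by hand. It assumes the weighted mean is the usual `sum(w * x) / sum(w)`; the `weighted_mean` helper and the numbers are hypothetical and are not part of the dataset or of `weightedcalcs`.

```python
def weighted_mean(values, weights):
    """Weight-normalized average: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical scores and survey weights, for illustration only.
scores = [1.0, 0.5, 0.0]
weights = [0.5, 1.0, 1.5]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)

# The low-scoring respondent carries the largest weight,
# so the weighted mean falls below the unweighted one.
print(unweighted, weighted)
```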

In [3]:
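# Convert proportions (0-1) to percentages, rounded to `rounding` decimal places.
# The `_sign` variant returns strings with a trailing "%".
prop_to_pct = lambda x, rounding=1: (x * 100).round(rounding)
prop_to_pct_sign = lambda x, rounding=1: (x * 100).round(rounding).astype(str) + "%"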

## Load online survey responses

In [4]:
online = pd.read_csv("../data/Transgender Issues_SPSS v2.csv")

In [5]:
online.head()

Unnamed: 0,Serial,ID,wave,month,year,ReturnCode,IDType,avg_time_sec,avg_time,qcountry,...,MI6_5_3,MI6_5_4,MI6_5_5,MI6_6_1,MI6_6_2,MI6_6_3,MI6_6_4,MI6_6_5,MI_D12,Weightvar_MI
0,3,202040676,88,2016-11-15,2016,9,1,1443,4,31,...,0,0,0,0,0,0,1,0,2.0,0.420021
1,4,138003181673,88,2016-11-15,2016,9,1,1451,4,2,...,0,0,0,0,0,0,0,0,2.0,0.473525
2,5,203087192,88,2016-11-15,2016,9,1,2436,5,31,...,0,1,1,0,0,0,1,1,2.0,0.437239
3,9,203414983,88,2016-11-15,2016,9,1,1445,4,31,...,0,1,0,0,0,0,1,0,2.0,0.417648
4,10,204361036,88,2016-11-15,2016,9,1,2292,5,31,...,0,0,0,0,0,0,0,0,1.0,0.437239


Convert country IDs to country names:

In [6]:
COUNTRIES = { 1: "Argentina", 2: "Australia", 3: "Belgium", 4: "Brazil", 5: "Canada", 6: "China", 7: "France", 8: "Germany", 9: "Greece", 10: "Guatemala", 11: "Hong Kong", 12: "Hungary", 13: "India (online)", 14: "Indonesia", 15: "Ireland", 16: "Italy", 17: "Japan", 18: "Mexico", 19: "Netherlands", 20: "Philippines", 21: "Poland", 22: "Russia", 23: "Saudi Arabia", 24: "Singapore", 25: "South Africa", 26: "South Korea", 27: "Spain", 28: "Sweden", 29: "Thailand", 30: "Turkey", 31: "United Kingdom", 32: "US", 33: "Portugal", 34: "Vietnam", 35: "Malaysia", 36: "Finland", 37: "Egypt", 38: "Norway", 39: "Switzerland", 40: "Czech Republic", 41: "United Arab Emirates", 42: "Denmark", 43: "New Zealand", 44: "Colombia", 45: "Taiwan", 46: "Venezuela", 47: "Costa Rica", 48: "Romania", 49: "Chile", 50: "Nigeria", 51: "Israel", 52: "Peru", 53: "Serbia", 54: "Malta", 55: "Kenya", }

In [7]:
online["country"] = online["qcountry"].apply(COUNTRIES.get)
assert(online["country"].isnull().sum() == 0)

## Load India face-to-face survey responses

In [8]:
f2f = pd.read_csv("../data/GDTI_16-042614-01_GDTI_31Aug_v1.csv")

In [9]:
f2f.head()

Unnamed: 0,InstanceID,ProtoSurveyID,SHELL_START_DATE,SHELL_START_TIME,SHELL_COUNTRY,SHELL_LANGUAGE,SHELL_RECORDING_CONFIRMATION,SHELL_GENDER,SHELL_AGE,SHELL_AGE.SHELL_AGE_Age,...,Q8Loop@_2.Q8,Q8Loop@_3.Q8,Q8Loop@_4.Q8,Q8Loop@_5.Q8,Q8Loop@_6.Q8,Q8Loop@_7.Q8,Q8Loop@_8.Q8,Q9,D12,weight
0,38989,429,13691030400,13692040463,1,5,1,2,1,90,...,1,1,1,5,4,3,3,2,1,1.672149
1,39052,429,13691116800,13692024425,1,5,2,1,1,25,...,3,2,2,2,3,2,1,2,2,0.716635
2,39125,429,13691116800,13692027666,1,5,2,1,1,28,...,2,1,2,1,1,2,2,2,2,0.912081
3,39172,429,13691116800,13692029247,1,5,2,1,1,24,...,2,3,1,2,1,2,2,2,2,0.716635
4,39230,429,13691116800,13692031736,1,5,2,1,1,25,...,3,2,3,2,1,2,2,2,2,0.716635


The dataset from the face-to-face survey uses slightly different variable names. Here, we rename them to match the variable names from the online survey:

In [10]:
F2F_CONVERSIONS = {
    "weight": "Weightvar_MI",
    "Q1._1": "MI1_5",
    "Q1._2": "MI1_4",
    "Q1._3": "MI1_3",
    "Q1._4": "MI1_2",
    "Q1._5": "MI1_1",
    "Q1._6": "MI1_6",
    "Q3._1": "MI3_1",
    "Q3._2": "MI3_2",
    "Q3._3": "MI3_3",
    "Q3._4": "MI3_4",
    "Q3._5": "MI3_5",
    "Q3._6": "MI3_6",
    "Q3._7": "MI3_7",
    "Q4Loop@_1.Q4": "MI4_1",
    "Q4Loop@_2.Q4": "MI4_2",
    "Q4Loop@_3.Q4": "MI4_3",
    "Q4Loop@_4.Q4": "MI4_4",
    "Q4Loop@_5.Q4": "MI4_5",
    "Q4Loop@_6.Q4": "MI4_6",
    "DEM2": "a2"
}

In [11]:
f2f_comparable = f2f.rename(columns=F2F_CONVERSIONS)[list(F2F_CONVERSIONS.values())]
f2f_comparable["country"] = "India (F2F)"

In [12]:
# Q4 coding is reversed in India F2F
Q4_REVERSED_CODES = { 1: 4, 4: 1, 2: 3, 3: 2, 5: 5 }
for col in ["MI4_1", "MI4_2", "MI4_3", "MI4_4", "MI4_5", "MI4_6"]:
    f2f_comparable[col] = f2f_comparable[col].apply(Q4_REVERSED_CODES.get)

## Combine online and face-to-face responses

In [13]:
survey = pd.concat([online[online["country"] != "India (online)"], f2f_comparable])
len(survey)

17209

## Calculate overall supportiveness scores

To calculate an overall supportiveness score, we assign a numerical value to every response to the six questions in loop #4:

- "They should be allowed to have surgery so their body matches their identity"
- "They should be allowed to use the restroom of the sex they identify with"
- "They should be allowed to marry a person of their birth sex"
- "They should be allowed to conceive or give birth to children (if biologically capable of doing so)"
- "They should be allowed to adopt children"
- "They should be protected from discrimination by the Government"

The scoring system:

- "Strongly agree": 1
- "Somewhat agree": 0.75
- "Don't know": 0.5
- "Somewhat disagree": 0.25
- "Strongly disagree": 0

Then, we average those values across the six questions for each respondent, and calculate the weighted average of the resulting per-respondent scores for each country (using Ipsos's respondent-level survey weights).
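
The scoring can be sketched on made-up answers. The respondents, their answers, and their weights below are hypothetical; the example uses answer labels rather than the dataset's numeric codes, and only illustrates the arithmetic.

```python
# Score assigned to each answer label, as listed above.
SCORES = {
    "Strongly agree": 1.0,
    "Somewhat agree": 0.75,
    "Don't know": 0.5,
    "Somewhat disagree": 0.25,
    "Strongly disagree": 0.0,
}

# Two hypothetical respondents: six answers each, plus a survey weight.
respondents = [
    {"answers": ["Strongly agree"] * 4 + ["Somewhat agree", "Don't know"],
     "weight": 0.8},
    {"answers": ["Somewhat disagree"] * 3 + ["Strongly disagree"] * 3,
     "weight": 1.2},
]


def supportiveness(answers):
    """Mean score across one respondent's six answers."""
    return sum(SCORES[a] for a in answers) / len(answers)


per_respondent = [supportiveness(r["answers"]) for r in respondents]

# Country-level score: weighted average of the per-respondent scores.
total_weight = sum(r["weight"] for r in respondents)
country_score = sum(
    r["weight"] * s for r, s in zip(respondents, per_respondent)
) / total_weight

print(per_respondent, round(country_score * 100))
```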

In [14]:
Q4_COLS = ["MI4_1", "MI4_2", "MI4_3", "MI4_4", "MI4_5", "MI4_6"]
Q4_ORD_COLS = [ col + "_ord" for col in Q4_COLS ]
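# Map response codes to scores. Given the scoring system above, this mapping implies
# the codes 1 = strongly disagree, 2 = somewhat disagree, 3 = somewhat agree,
# 4 = strongly agree, 5 = don't know.
survey[Q4_ORD_COLS] = survey[Q4_COLS].applymap({ 1: 0, 2: 0.25, 3: 0.75, 4: 1, 5: 0.5 }.get)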

In [15]:
survey["Q4_ord_mean"] = survey[Q4_ORD_COLS].mean(axis=1)

In [16]:
wc.mean(survey.groupby("country"), "Q4_ord_mean")\
    .sort_values(ascending=False).pipe(prop_to_pct, 0).round().astype(int)

country
Spain             81
Sweden            77
Argentina         76
Canada            76
Germany           74
United Kingdom    73
Belgium           70
India (F2F)       68
Australia         68
US                66
Mexico            64
France            64
Italy             63
Brazil            63
South Africa      63
Japan             62
China             60
Turkey            59
Peru              55
Poland            53
South Korea       53
Hungary           53
Russia            44
dtype: int64

## Do you know someone who is transgender?

Below, we calculate the proportions of each country's respondents who said they knew someone who is transgender. More specifically, we classified the respondents into four groups:

- __`proximal`__: People who said they were or knew — as family, friends, or acquaintances — someone who is transgender.

- __`non-proximal`__: People who responded, "I have seen people like this but do not know them personally" or "I rarely or never encounter people like this".

- __`inconsistent`__: People who provided inconsistent responses, e.g., people who said "I have seen people like this but do not know them personally" *and* "I have personal friends/family like this".

- __`dont-know`__: People who responded, "Don't know".

In [17]:
def classify_proximity(response):
    rarely = response["MI1_1"]
    have_seen = response["MI1_2"]
    have_acquaintances = response["MI1_3"]
    have_friends = response["MI1_4"]
    myself = response["MI1_5"]
    dont_know = response["MI1_6"]
    if (rarely or have_seen) and (have_acquaintances or have_friends):
        return "inconsistent"
    if (have_acquaintances or have_friends or myself):
        return "proximal"
    if rarely or have_seen:
        return "non-proximal"
    if dont_know:
        return "dont-know"
    raise ValueError("Can't compute")

In [18]:
survey["proximity"] = survey.apply(classify_proximity, axis=1)
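
Row-wise `apply` is easy to read but can be slow on large frames. The same precedence could also be expressed in a vectorized way with `numpy.select`, which picks the first matching condition for each row. This is an alternative sketch on made-up 0/1 responses, not the method used in this notebook. Unlike `classify_proximity`, it labels rows that match no rule as `unclassified` (a hypothetical label) rather than raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 responses, one row per respondent.
toy = pd.DataFrame({
    "MI1_1": [0, 0, 1, 0],  # rarely or never encounter
    "MI1_2": [1, 0, 0, 0],  # have seen, but do not know personally
    "MI1_3": [1, 0, 0, 0],  # have acquaintances
    "MI1_4": [0, 1, 0, 0],  # have friends/family
    "MI1_5": [0, 0, 0, 0],  # myself
    "MI1_6": [0, 0, 0, 1],  # don't know
})

flags = toy.astype(bool)
distant = flags["MI1_1"] | flags["MI1_2"]
close = flags["MI1_3"] | flags["MI1_4"]

# Conditions are checked in order; the first match wins,
# mirroring the order of the `if` statements in classify_proximity.
conditions = [
    distant & close,
    close | flags["MI1_5"],
    distant,
    flags["MI1_6"],
]
labels = ["inconsistent", "proximal", "non-proximal", "dont-know"]

toy["proximity"] = np.select(conditions, labels, default="unclassified")
print(toy["proximity"].tolist())
```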

In [19]:
PROXIMITY_COLS = [ "MI1_1", "MI1_2", "MI1_3", "MI1_4", "MI1_5", "MI1_6", ]
PROXIMITY_ORDER = ["proximal", "non-proximal", "inconsistent", "dont-know"]

The overall (unweighted) number of responses that fit into each category:

In [20]:
survey["proximity"].value_counts()

non-proximal    11869
proximal         3847
dont-know        1032
inconsistent      461
Name: proximity, dtype: int64

The weighted distribution of responses:

In [21]:
wc.distribution(survey.groupby("country"), "proximity")\
    .sort_values("proximal", ascending=False)\
    .pipe(prop_to_pct_sign)[PROXIMITY_ORDER]

country,proximal,non-proximal,inconsistent,dont-know
Brazil,49.4%,36.5%,6.5%,7.6%
Mexico,44.5%,45.0%,7.4%,3.1%
Peru,43.1%,47.2%,7.9%,1.9%
Argentina,33.9%,59.4%,3.2%,3.5%
South Africa,32.1%,57.6%,7.2%,3.1%
Italy,30.4%,63.5%,1.6%,4.5%
Canada,27.6%,64.7%,2.5%,5.2%
US,25.6%,65.0%,6.5%,2.9%
Spain,23.3%,69.8%,1.6%,5.3%
Sweden,22.2%,76.2%,0.4%,1.2%
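
A weighted distribution of this kind can be read as each category's share of the total survey weight within a group. The sketch below assumes that interpretation; the `weighted_distribution` helper, the categories, and the weights are hypothetical and only illustrate the arithmetic for a single group.

```python
from collections import defaultdict


def weighted_distribution(categories, weights):
    """Share of total weight that falls in each category."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


# Hypothetical respondents from one country.
categories = ["proximal", "non-proximal", "non-proximal", "dont-know"]
weights = [0.5, 1.0, 2.0, 0.5]

dist = weighted_distribution(categories, weights)
print(dist)
```

By raw counts, "non-proximal" would be half of these respondents; by weight, its share is larger because those two respondents carry most of the weight.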


## What's the relationship between "proximity" and overall supportiveness?

Here, we compare the overall supportiveness scores in each country for `proximal` vs. `non-proximal` respondents. In nearly every country, respondents who know someone who is transgender scored substantially higher on supportiveness than those who don't:

In [22]:
wc.mean(survey.groupby(["country", "proximity"]), "Q4_ord_mean")\
    [["non-proximal", "proximal"]]\
    .pipe(prop_to_pct)\
    .assign(
        diff=lambda x: x["proximal"] - x["non-proximal"],
        ratio=lambda x: x["proximal"] / x["non-proximal"],
    ).round(2)\
    .sort_values("ratio", ascending=False)

country,non-proximal,proximal,diff,ratio
Turkey,55.5,72.7,17.2,1.31
US,60.3,78.0,17.7,1.29
Sweden,73.1,92.2,19.1,1.26
Russia,42.8,52.5,9.7,1.23
Peru,49.5,60.1,10.6,1.21
Australia,65.9,79.2,13.3,1.2
Italy,60.4,72.0,11.6,1.19
France,63.2,75.3,12.1,1.19
Argentina,71.7,83.6,11.9,1.17
Poland,52.9,62.0,9.1,1.17


## Classifying responses to the question, "Do you think people like this should be allowed to legally change their sex on identity documents, such as government ID cards or driving licenses?"

Respondents could choose multiple answers to this question, so we grouped the responses into five categories:

- __`no-restrictions`__: People who answered, "Yes, they should be allowed to legally change their sex on these documents if they want with no additional restrictions."
- __`surgery-required`__: People who answered, "Yes, they should be able to legally change their sex but only after they have surgery so their body matches their identity" (including people who wanted additional requirements beyond surgery).
- __`other-restrictions`__: People who answered, "Yes, they should be able to legally change their sex on these documents but only with approval from a government official, such as a judge," or "Yes, they should be able to legally change their sex on these documents but only with a doctor's approval," but did not say surgery should be required.
- __`illegal`__: People who answered, "No, they should not be allowed to legally change their sex on these documents no matter what."
- __`no-answer`__: People who answered, "Don't know" or "Do not wish to answer"

In [23]:
def classify_documentation_policy(response):
    if response.sum() == 0: 
        # Note: This question was not included in China
        return "question-not-posed"
    if response["MI3_6"] | response["MI3_7"]:
        return "no-answer"
    if response["MI3_1"]:
        return "no-restrictions"
    if response["MI3_3"]:
        return "surgery-required"
    if response["MI3_2"] | response["MI3_4"]:
        return "other-restrictions"
    if response["MI3_5"]:
        return "illegal"
    raise ValueError("Can't compute")

In [24]:
survey["documentation_policy"] = survey[[
    "MI3_1", "MI3_2", "MI3_3", "MI3_4", "MI3_5", "MI3_6", "MI3_7"
]].fillna(0).astype(int).apply(classify_documentation_policy, axis=1)
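
Because respondents could choose several answers, the order of the checks decides the category. The table-driven sketch below restates that precedence on plain dictionaries; `RULES`, `classify`, and the sample responses are hypothetical names and data for illustration, not part of the notebook's pipeline.

```python
# Ordered rules: the first rule with any selected column wins.
RULES = [
    (("MI3_6", "MI3_7"), "no-answer"),
    (("MI3_1",), "no-restrictions"),
    (("MI3_3",), "surgery-required"),
    (("MI3_2", "MI3_4"), "other-restrictions"),
    (("MI3_5",), "illegal"),
]


def classify(response):
    """Classify one respondent's 0/1 answers using the ordered rules."""
    if not any(response.values()):
        return "question-not-posed"
    for columns, label in RULES:
        if any(response.get(col, 0) for col in columns):
            return label
    raise ValueError("Can't compute")


blank = {f"MI3_{i}": 0 for i in range(1, 8)}

# Surgery plus another requirement still counts as "surgery-required".
surgery_plus = {**blank, "MI3_3": 1, "MI3_4": 1}
# Another requirement without surgery falls into "other-restrictions".
other_only = {**blank, "MI3_4": 1}

print(classify(surgery_plus), classify(other_only), classify(blank))
```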

In [25]:
wc.distribution(survey.groupby("country"), "documentation_policy")\
    .sort_values("no-restrictions", ascending=False)\
    .pipe(prop_to_pct)[["no-restrictions", "surgery-required", "other-restrictions", "illegal", "no-answer" ]]

country,no-restrictions,surgery-required,other-restrictions,illegal,no-answer
Spain,51.5,15.0,16.9,4.3,12.2
Argentina,47.8,14.6,15.2,11.4,10.9
Mexico,35.4,15.8,16.5,18.3,14.0
Brazil,32.2,13.0,13.8,16.7,24.4
Germany,31.4,21.8,18.2,7.9,20.6
Italy,29.2,27.4,17.0,8.5,17.9
Turkey,28.4,20.2,22.3,15.8,13.4
Sweden,28.3,21.8,22.6,7.1,20.2
Peru,27.5,20.6,19.8,20.3,11.8
United Kingdom,26.7,23.6,19.5,8.2,22.1

