In [1]:
# Load prepped_raw_data.xlsx into a dataframe. First row is header, second row should be skipped.
import numpy as np
import pandas as pd

from worldview import preprocessor

# Show full dataframes when printing
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", lambda x: "%.5f" % x)
np.set_printoptions(threshold=100, suppress=True)

# Load and prepare the data
df = preprocessor.create_prepped_data()

Number of rows with a tie for the maximum worldview score: 16


Look at the demographics and categorical vars and do any cleaning as needed. 

NOTE: For questions where "prefer not to answer" was an option, I mapped those to null/NaN, as that is easier for the statistical analysis. 
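
For reference, here is a minimal sketch of that kind of mapping on toy data. The exact answer label ("Prefer not to answer") and the toy frame are assumptions for illustration; this is not necessarily how `preprocessor.create_prepped_data()` does it.

```python
import numpy as np
import pandas as pd

# Toy responses; the opt-out label below is an assumed spelling, not taken from the survey
raw = pd.DataFrame({"gender": ["male", "female", "Prefer not to answer", "nonbinary"]})

# Replace the opt-out label with NaN so pandas treats it as missing
cleaned = raw.replace({"Prefer not to answer": np.nan})

# dropna=False keeps the missing row visible in the counts
print(cleaned["gender"].value_counts(dropna=False))
```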

In [2]:
# Age
print(df["age_group"].value_counts(dropna=False))

age_group
25-34    134
35-44    117
45-54     71
55-64     54
NaN       23
65-74      7
Name: count, dtype: int64


We didn't have a group for 18-24. You will need to report that in any papers and drop respondents with a missing age from any analysis on age. I wouldn't recommend inferring that the missing responses come from that group, since we don't know whether they skipped the question for other reasons. 

You have very few in the 65-74 group, but I don't think that will be an issue if we are treating age as ordinal. We can revisit this as we get into the analysis. 
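
A rough sketch of what "treating age as ordinal" could look like in pandas, using a toy column in place of `df["age_group"]`. The `age_rank` column name is made up for illustration.

```python
import numpy as np
import pandas as pd

# Age bands as they appear in the value counts, in ascending order
age_order = ["25-34", "35-44", "45-54", "55-64", "65-74"]

# Toy column standing in for df["age_group"]
toy = pd.DataFrame({"age_group": ["35-44", np.nan, "25-34", "65-74", "45-54"]})

# Drop respondents with no age, then encode the bands as an ordered categorical
with_age = toy.dropna(subset=["age_group"]).copy()
with_age["age_group"] = pd.Categorical(
    with_age["age_group"], categories=age_order, ordered=True
)

# Integer codes (0 = youngest band) are what a rank-based method would work with
with_age["age_rank"] = with_age["age_group"].cat.codes
print(with_age)
```

With an ordered encoding, the small 65-74 group is just the top rank rather than a separate category that needs its own comparison.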

In [3]:
# Gender
print(df["gender"].value_counts(dropna=False))

gender
male         208
female       189
nonbinary      8
NaN            1
Name: count, dtype: int64


We only have 8 nonbinary respondents. We can either drop them from the analysis and use t-tests when looking at gender differences (the simplest approach), or we can do a one-way or Welch's ANOVA, depending on whether we meet the assumptions for those tests. 

FOLLOWUP: Let me know if you have a preference. ANOVAs do take a bit more time and more write-up in papers. It's unlikely you will get significant results with only 8, but we can spend the time if you like. 
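
To make the two options concrete, here is a small sketch on made-up data using SciPy. The `score` column and its values are hypothetical. I've used the Welch (unequal variance) form of the t-test as one reasonable choice, and the ANOVA shown is the classic one-way version; Welch's ANOVA is not shown.

```python
import pandas as pd
from scipy import stats

# Toy data; "score" is a hypothetical outcome column
toy = pd.DataFrame({
    "gender": ["male", "female", "male", "female", "nonbinary",
               "female", "male", "female", "nonbinary"],
    "score": [3.0, 4.5, 2.5, 4.0, 3.5, 5.0, 3.5, 4.5, 4.0],
})

# Option 1: keep only the two large groups and compare means with a t-test
two_groups = toy[toy["gender"].isin(["male", "female"])]
male = two_groups.loc[two_groups["gender"] == "male", "score"]
female = two_groups.loc[two_groups["gender"] == "female", "score"]
t_result = stats.ttest_ind(male, female, equal_var=False)  # Welch's t-test
print("t-test p-value:", t_result.pvalue)

# Option 2: keep all three groups and run a classic one-way ANOVA
groups = [g["score"].to_numpy() for _, g in toy.groupby("gender")]
f_result = stats.f_oneway(*groups)
print("ANOVA p-value:", f_result.pvalue)
```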

In [4]:
# transsexual
print(df["transsexual"].value_counts(dropna=False))

transsexual
NaN    247
no     157
yes      2
Name: count, dtype: int64


Only 2 people responded yes (NaN means null/no answer). Not enough to do stats on. 

In [5]:
# ethnicity
print(df["ethnicity"].value_counts(dropna=False))

ethnicity
White                                        266
Black or African American                     86
Asian or Asian American                       27
Hispanic or Latino                            21
Middle Eastern or North African                3
NaN                                            2
Native Hawaiian or other Pacific Islander      1
Name: count, dtype: int64


In [6]:
# ethnicity - specify
print(df["ethnicity_specify"].value_counts(dropna=False))

ethnicity_specify
NaN                          396
biracial/black/white           1
White-passing latinx           1
white/hispanic                 1
Black and Middle Eastern       1
Black/White                    1
Black White                    1
White, Puerto Rican            1
Australian                     1
black and white                1
Black caribbean and white      1
Name: count, dtype: int64


It looks like we could have benefited from offering a mixed race option, but the group would have been small. A single category also wouldn't distinguish between different mixed backgrounds (e.g., Black/White vs. Black/Middle Eastern). 

FOLLOWUP: Let me know if you want me to manually make a mixed race category and do statistics for it. We would override whatever value they put in the main ethnicity question with "mixed". 
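
If we go that route, the override could look roughly like this. The list of write-ins that count as mixed is a manual judgment call, and `ethnicity_recoded` is a made-up column name so the original answers stay intact.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real ethnicity columns
toy = pd.DataFrame({
    "ethnicity": ["White", "Black or African American", "White", "White"],
    "ethnicity_specify": [np.nan, "Black/White", "Australian", "white/hispanic"],
})

# Hand-picked write-ins to treat as mixed race (an assumption; to be agreed on)
mixed_write_ins = {"Black/White", "white/hispanic"}

# Override the main answer only where the write-in is on the list
is_mixed = toy["ethnicity_specify"].isin(mixed_write_ins)
toy["ethnicity_recoded"] = toy["ethnicity"].mask(is_mixed, "mixed")
print(toy)
```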

In [7]:
# education
print(df["education"].value_counts(dropna=False))

education
Graduated with Bachelors                                            168
1-2 years college/associate’s degree/trade school/certifications     86
Graduated with master’s degree                                       79
Highschool gradate or proficiency                                    47
Graduated with PhD                                                   23
NaN                                                                   3
Name: count, dtype: int64


In [8]:
# education - other
print(df["education_other"].value_counts(dropna=False))

education_other
NaN                                                        402
Masters, two.    and the student loans for both :((((((      1
not a high school graduate                                   1
Graduated JD                                                 1
MD                                                           1
Name: count, dtype: int64


I cleaned up the main education question to make sure that those with a JD/MD were in the "Graduated with PhD" group, but in the future I recommend the option be "Graduated with doctoral degree" to reflect that not all doctorates are PhDs. 

FOLLOWUP: For our analysis, I recommend combining "Attended trade school/certifications" and "1-2 years college/associate’s degree", because the first group is so small. I don't know what the current research standards are on this, but another option would be to combine "Some graduate school" with "Graduated with Bachelors", the reasoning being that "Some graduate school" indicates you started but didn't get a degree. If we framed it as highest level of education completed, that would clean up the groups for analysis a bit. Let me know what you would like to do. 

In [9]:
# religious/spiritual orientation
print(df["religious_spiritual_orientation"].value_counts(dropna=False))

religious_spiritual_orientation
Christian               182
Agnostic                 82
Atheist                  59
Spiritually eclectic     43
NaN                      25
Judaism                   8
Buddhist                  5
Muslim                    2
Name: count, dtype: int64


In [10]:
# religious/spiritual orientation - other
print(df["religious_spiritual_orientation_other"].value_counts(dropna=False))

religious_spiritual_orientation_other
NaN                                                                                                                                         381
Muslim                                                                                                                                        2
none                                                                                                                                          2
Wiccan                                                                                                                                        2
Roman Catholic                                                                                                                                1
spiritual                                                                                                                                     1
Metaphysical                                                                                      

FOLLOWUPs: I already fixed the main question so that the 2 people that indicated "Muslim" have that value. Based on the small group sizes, I recommend combining Judaism/Buddhist/Muslim into an "other" group.  
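
Collapsing the small groups is a one-liner in pandas. A sketch on a toy series follows; the "other" label is just the name suggested above.

```python
import numpy as np
import pandas as pd

# Toy series standing in for df["religious_spiritual_orientation"]
toy = pd.Series(
    ["Christian", "Judaism", "Agnostic", "Muslim", np.nan, "Buddhist"],
    name="religious_spiritual_orientation",
)

# Groups too small to analyze on their own
small_groups = ["Judaism", "Buddhist", "Muslim"]

# Map each small group to "other"; missing answers stay missing
collapsed = toy.replace({group: "other" for group in small_groups})
print(collapsed.value_counts(dropna=False))
```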

We have a lot of people (23) that didn't answer the main question but did answer the "other" question. It would take me about 30 mins to manually try to map those. Let me know if you want me to do that, and if so, what should map to what. 

In [11]:
# Consider themselves to be open/inclusive
print(df["consider_open_inclusive"].value_counts(dropna=False))

consider_open_inclusive
yes    390
no      16
Name: count, dtype: int64


FOLLOWUP: You won't be able to do much with this given how few people answered no. I'd recommend not spending time on analysis for this question, but let me know if you want me to move forward. 

In [12]:
# Had experiences to make them open/inclusive
print(df["experience_open_inclusive"].value_counts(dropna=False))

experience_open_inclusive
yes    343
no      63
Name: count, dtype: int64


FOLLOWUP: You do have more that said no to this question. I'll assume you do want me to do analysis on this one, but let me know if not. 

In [13]:
# Feels their experiences have changed views of themselves/others
print(df["feel_experience_changed"].value_counts(dropna=False))

feel_experience_changed
yes    350
no      52
NaN      4
Name: count, dtype: int64


FOLLOWUP: Same with this one. I'll move forward with analysis. 

# Dominant Worldview
Adding in the counts for dominant worldview. Note that we did have some participants whose top scores tied. The paper had no instructions on what to do in that case. The simplest thing would be to drop those participants (and report that in papers). Otherwise we have to do the more complex factor analysis, which will be time consuming and require a more complex write-up. 
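
For context, this is roughly how ties for the top score can be flagged and then dropped. The score column names and values are hypothetical, and this is a sketch rather than the actual logic inside the preprocessor.

```python
import pandas as pd

# Hypothetical per-worldview scores, one row per participant
scores = pd.DataFrame({
    "traditional": [10, 7, 5],
    "modern": [8, 7, 5],
    "postmodern": [6, 3, 9],
    "integrative": [4, 2, 2],
})

# Count how many worldviews share each participant's maximum score
row_max = scores.max(axis=1)
n_at_max = scores.eq(row_max, axis=0).sum(axis=1)

# Label the top-scoring worldview, or flag the row when the top score is shared
dominant = scores.idxmax(axis=1).mask(n_at_max > 1, "multiple_ties")
print(dominant.value_counts())

# Simplest handling: drop the tied participants before analysis
untied = dominant[dominant != "multiple_ties"]
```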

In [14]:
# Dominant worldview
print(df["dominant_worldview"].value_counts(dropna=False))

dominant_worldview
traditional      244
modern            81
postmodern        36
integrative       29
multiple_ties     16
Name: count, dtype: int64
