# Code of Conduct Data Preprocessing for the Self-Organizing Map Analysis

This notebook presents the data preprocessing steps: cleaning the data, renaming the variables, recoding the answers, and adding variable and value labels.

In [1]:
#if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, readxl, tidyr, dplyr, expss, sjlabelled, labelled, fastDummies) # load R packages

In [2]:
#set directory
path <- c("~/Documents/R/CoC")
setwd(path)

The data were downloaded from the Otakantaa.fi platform, where the survey took place. Because the data from the Code of Conduct project are sensitive, they cannot currently be published in their original form. The data set contains 27 variables and 85 observations. The code below reads the data, renames the variables, recodes the answers to numeric values, and adds variable and value labels.

First, we read the data, which were downloaded in Excel format.

In [3]:
#read data
data <- read_excel("data/otakantaa_2022-07-29.xlsx")


## Renaming variables and value labels

Then we rename all variables from the full question text to short names that are easier to use in further analysis.

In [4]:
#rename variables
data <- select(data, -c(1:5)) %>%
  dplyr::rename(
    Relationship_to_the_faculty = "Relationship to the faculty:",
    Fluency_in_Finnish = "How fluent do you consider yourself in Finnish? (1-6 from fluent to no skills at all)",
    Discipline = "Which discipline or sector of the faculty do you most identify with?",
    feel_positive = "In general, I feel very positive about the Faculty",
    treated_fairly = "I am treated fairly at the Faculty",
    safe  = "I feel safe at the Faculty",
    connected_comm = "I feel connected to some community at the Faculty",
    positive_role = "In general, I feel very positive about my role at the Faculty",
    failure = "At times, I feel like I am a failure in my work or studies",
    close_people = "I feel close to people at the Faculty",
    support_each_other = "People at the Faculty support each other",
    support_me = "People at the Faculty support me",
    respect_each_other = "People at the Faculty treat each other with respect",
    respect_me = "People at the Faculty treat me with respect",
    appr_each_other = "People at the Faculty appreciate each other",
    appr_me = "People at the Faculty appreciate me",
    role = "How are you feeling in your role(s) related to the Faculty?",
    language = "Have you experienced difficulties related to language policies at the Faculty and not feeling equal?",
    situation = "In which situations have you observed or experienced incidents that you consider inappropriate or unfair behavior?",
    experience  = "Do you have any experiences of raising concerns about discrimination, oppression or inappropriate behavior, and what was the response?",
    barrier = "What kinds of barriers do you face when raising issues of unfair or inappropriate behavior?",
    injustice = "What kinds of structural injustice do you experience, and does it have an impact on you?",
    consequence = "What consequences should come from not following the code of conduct?",
    change = "What kinds of changes would you like the faculty to make in response to structures of discrimination or oppression?",
    intervention = "What kinds of interventions would you like the faculty to undertake in response to inappropriate behavior?",
    action = "What kinds of concrete actions could the faculty undertake to enhance social interactions and the sense of belonging to the community?",
    els = "Is there something else that you would like to share?"
)
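
The renaming step can also be driven by a lookup vector, which keeps the mapping from question text to short name in one place. The following is a minimal sketch on a hypothetical two-column toy data set (the values are invented for illustration), not the notebook's actual data.

```r
library(dplyr)

# Toy data standing in for the survey export (hypothetical values)
toy <- tibble::tibble(
  "I feel safe at the Faculty" = c("Fully agree", "Somewhat agree"),
  "People at the Faculty support me" = c("Somewhat disagree", "Fully agree")
)

# Lookup vector: names are the new short names, values are the question texts
lookup <- c(
  safe = "I feel safe at the Faculty",
  support_me = "People at the Faculty support me"
)

# all_of() raises an error if a question text is missing from the data,
# which catches typos or changed survey wording early
toy_renamed <- rename(toy, all_of(lookup))
names(toy_renamed)
```

Because `all_of()` is strict, a mismatch between the lookup and the downloaded file fails immediately instead of leaving a column silently unrenamed.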

Now we recode the answer categories of the background variables to numeric codes.

In [5]:
#recode the answer categories to numeric codes
#select(-4) drops the fourth column (presumably Discipline, given the rename order above)
data <- data %>% select(-4) %>%
    mutate(
        Gender = dplyr::recode(Gender,  "Male" = 3, 
                                        "Female" = 2, 
                                        "Other" = 1,
                                        "Don’t want to answer" = 0, #kept as its own category because it is an important group
                                        .default = NA_real_), 
        Fluency_in_Finnish = dplyr::recode(Fluency_in_Finnish, 
                                            "Native" = 5, 
                                            "Advanced" = 4, 
                                            "Intermediate" = 3, 
                                            "Elementary" = 2, 
                                            "Beginner" = 1, 
                                            "No skills" = 0,
                                            .default = NA_real_),
        Relationship_to_the_faculty = dplyr::recode(Relationship_to_the_faculty, 
                                        "Staff: Professor-level" = 5, 
                                        "Staff: Other teaching staff" = 4, 
                                        "Staff: Other" = 3,  
                                        "Alumni, collaborator, other" = 2,          
                                        "Phd student" = 1, 
                                        "Phd student , Staff: Other teaching staff" = 1,
                                        "Phd student , Staff: Other" = 1,
                                        "Bachelor or Master student" = 0,                
                                        "Exchange student or other" = 0,               
                                        "Bachelor or Master student , Staff: Other" = 0,    
                                        .default = NA_real_)
)

#the "Don't want to answer" option is recoded to NA (via .default) for every background question except Gender
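
The effect of `.default = NA_real_` is easy to miss, so here is a small sketch on a hypothetical answer vector (not the real survey data). Any category that is not listed in the `recode()` call becomes `NA`, and the last line counts how many non-missing answers were converted that way.

```r
library(dplyr)

# Hypothetical answers; "Don't want to answer" is not listed in the recode call
fluency <- c("Native", "Beginner", "No skills", "Don't want to answer", NA)

fluency_num <- recode(fluency,
                      "Native" = 5,
                      "Advanced" = 4,
                      "Intermediate" = 3,
                      "Elementary" = 2,
                      "Beginner" = 1,
                      "No skills" = 0,
                      .default = NA_real_)
fluency_num

# Number of non-missing answers that were turned into NA by .default
n_defaulted <- sum(is.na(fluency_num) & !is.na(fluency))
n_defaulted
```

Running such a count on the real columns would show whether unexpected spellings of a category were dropped together with the intended ones.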

Then we recode the answers to the Likert scale questions to numeric codes.

In [6]:
#recode the answers to the Likert scale questions to numeric codes
data[4:16] <- apply(data[4:16], 2,
                    function(x) dplyr::recode(x, "Fully agree" = 5, 
                                                  "Somewhat agree" = 4, 
                                                  "Neither agree nor disagree" = 3, 
                                                  "Somewhat disagree" = 2,
                                                   "Fully disagree" = 1)) 

#set value labels for closed questions
labelled::val_labels(data[4:16]) <- c( "Fully agree" = 5, 
                                            "Somewhat agree" = 4, 
                                            "Neither agree nor disagree" = 3, 
                                            "Somewhat disagree" = 2,
                                            "Fully disagree" = 1)
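
As an alternative to `apply()`, which first converts the selected columns to a matrix, the same recoding can be written column by column with `across()`. This is only a sketch on a hypothetical toy data set, and it swaps `recode()` for a plain named-vector lookup; `likert_map` and `toy` are illustrative names, not objects from the notebook.

```r
library(dplyr)

# Mapping from answer text to numeric code (same codes as above)
likert_map <- c("Fully agree" = 5,
                "Somewhat agree" = 4,
                "Neither agree nor disagree" = 3,
                "Somewhat disagree" = 2,
                "Fully disagree" = 1)

# Toy data with two Likert items (hypothetical values)
toy <- tibble::tibble(
  safe = c("Fully agree", "Somewhat disagree", NA),
  support_me = c("Neither agree nor disagree", "Fully disagree", "Somewhat agree")
)

# Look up each answer in the named vector; unmatched or missing answers give NA
toy_num <- toy %>%
  mutate(across(c(safe, support_me), ~ unname(likert_map[.x])))
toy_num
```

Keeping the mapping in one named vector also means the same object could be reused when setting the value labels.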

## Creating variable and value labels

Next, we create the variable labels.

In [7]:
#create variables' labels
data <- data %>% 
  select(1:16) %>%
  apply_labels(
    Relationship_to_the_faculty = "Relationship to the Faculty",
    Gender = "Gender",
    Fluency_in_Finnish = "How fluent do you consider yourself in Finnish?",
    feel_positive = "In general, I feel very positive about the Faculty",
    treated_fairly = "I am treated fairly at the Faculty",
    safe  = "I feel safe at the Faculty",
    connected_comm = "I feel connected to some community at the Faculty",
    positive_role = "In general, I feel very positive about my role at the Faculty",
    failure = "At times, I feel like I am a failure in my work or studies",
    close_people = "I feel close to people at the Faculty",
    support_each_other = "People at the Faculty support each other",
    support_me = "People at the Faculty support me",
    respect_each_other = "People at the Faculty treat each other with respect",
    respect_me = "People at the Faculty treat me with respect",
    appr_each_other = "People at the Faculty appreciate each other",
    appr_me = "People at the Faculty appreciate me")

We also add value labels to the background questions.

In [8]:
#add value labels
data <- data %>% apply_labels(
  Relationship_to_the_faculty = c(  "Staff: Professor-level" = 5,  
                                    "Staff: Other teaching staff" = 4,
                                    "Staff: Other" = 3,
                                    "Alumni, collaborator, other" = 2,
                                    "Phd student" = 1,
                                    "Bachelor, Master, Exchange student or other" = 0),
                       Gender = c("Male" = 3, 
                                  "Female" = 2, 
                                  "Other" = 1, 
                                  "Don’t want to answer" = 0),
           Fluency_in_Finnish = c( "Native" = 5,
                                   "Advanced" = 4, 
                                    "Intermediate" = 3, 
                                    "Elementary" = 2,
                                    "Beginner" = 1,
                                    "No skills" = 0)
)
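
For comparison, the `labelled` package, which is loaded at the top of the notebook, offers `var_label()` and `val_labels()` for the same purpose. The sketch below uses a hypothetical one-column toy data frame and is not the approach used in this notebook, which relies on `apply_labels()`.

```r
library(labelled)

# Toy data with numeric gender codes (hypothetical values)
toy <- data.frame(Gender = c(3, 2, 2, 0, 1))

# Attach value labels first, then the variable label
val_labels(toy$Gender) <- c("Male" = 3,
                            "Female" = 2,
                            "Other" = 1,
                            "Don't want to answer" = 0)
var_label(toy$Gender) <- "Gender"

# Inspect the labels and convert the codes back to readable categories
var_label(toy$Gender)
val_labels(toy$Gender)
gender_factor <- to_factor(toy$Gender)
gender_factor
```

Converting back with `to_factor()` is a quick way to confirm that each numeric code carries the intended category.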

We can check the result.

In [9]:
#check the labels
get_labels(data)
#or
str(data)

tibble [85 × 16] (S3: tbl_df/tbl/data.frame)
 $ Relationship_to_the_faculty:Class 'labelled' num [1:85] 3 1 3 1 0 4 1 4 1 0 ...
   .. .. LABEL: Relationship to the Faculty 
   .. .. VALUE LABELS [1:6]: 0=Bachelor, Master, Exchange student or other, 1=Phd student, 2=Alumni, collaborator, other, 3=Staff: Other, 4=Staff: Other teaching staff, 5=Staff: Professor-level 
 $ Gender                     :Class 'labelled' num [1:85] 3 2 2 2 2 3 2 2 2 3 ...
   .. .. LABEL: Gender 
   .. .. VALUE LABELS [1:4]: 0=Don’t want to answer, 1=Other, 2=Female, 3=Male 
 $ Fluency_in_Finnish         :Class 'labelled' num [1:85] 5 5 5 5 5 5 5 5 5 5 ...
   .. .. LABEL: How fluent do you consider yourself in Finnish? 
   .. .. VALUE LABELS [1:6]: 0=No skills, 1=Beginner, 2=Elementary, 3=Intermediate, 4=Advanced, 5=Native 
 $ feel_positive              :Class 'labelled' num [1:85] 1 4 2 4 4 5 4 5 2 4 ...
   .. .. LABEL: In general, I feel very positive about the Faculty 
   .. .. VALUE LABELS [1:5]: 5=Fully agr

## Creating binary variables for SOM analysis

Finally, we create binary (dummy) variables from the variable "Gender".

In [10]:
# convert the categorical variable to dummy variables
data <- dummy_cols(data, select_columns = c('Gender'))

#remove unnecessary columns
data <- data[, !(colnames(data) %in% c("Gender","Gender_NA"))]

#create variable labels for new dummy variables
data <- data %>%
  apply_labels(
    Gender_Female = "Female gender",
    Gender_Male = "Male gender",
    Gender_Other = "Other gender",
    "Gender_Don’t want to answer" = "Gender: Don't want to answer")
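
The dummy coding step can be tried out on a small example. The sketch below uses a hypothetical toy data frame and two optional `dummy_cols()` arguments, `ignore_na` and `remove_selected_columns`, which should have the same effect as dropping `Gender` and `Gender_NA` by hand; the notebook itself does not use them.

```r
library(fastDummies)

# Toy data with one missing gender answer (hypothetical values)
toy <- data.frame(Gender = c("Male", "Female", "Other", NA, "Female"),
                  stringsAsFactors = FALSE)

# ignore_na = TRUE skips the Gender_NA column;
# remove_selected_columns = TRUE drops the original Gender column
toy_dummies <- dummy_cols(toy,
                          select_columns = "Gender",
                          ignore_na = TRUE,
                          remove_selected_columns = TRUE)
names(toy_dummies)
toy_dummies
```

Each remaining column is an indicator for one category, which is the form needed when a categorical variable enters a distance-based method such as a self-organizing map.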

Let us check what the data set looks like now.

In [11]:
#check the data
head(data)

Relationship_to_the_faculty,Fluency_in_Finnish,feel_positive,treated_fairly,safe,connected_comm,positive_role,failure,close_people,support_each_other,support_me,respect_each_other,respect_me,appr_each_other,appr_me,Gender_Don’t want to answer,Gender_Female,Gender_Male,Gender_Other
<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>,<labelled>
3,5,1,1,5,1,1,2,1,2,2,2,1,2,2,0,0,1,0
1,5,4,2,5,4,2,4,4,4,4,2,3,2,3,0,1,0,0
3,5,2,4,5,3,3,1,2,2,3,3,3,3,3,0,1,0,0
1,5,4,5,5,5,5,4,4,5,5,5,5,5,5,0,1,0,0
0,5,4,4,4,3,4,5,2,4,4,4,4,3,4,0,1,0,0
4,5,5,5,5,5,5,1,5,5,5,5,5,5,5,0,0,1,0


In [12]:
#save processed data
saveRDS(data, file = "data/CoC_processed.rds")
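
The RDS format stores R objects together with their attributes, so labels survive the save, which would not be the case with a plain CSV export. A minimal round-trip sketch, using a temporary file and a hypothetical toy data frame rather than the real data:

```r
# Toy data with a variable label stored as an attribute (hypothetical values)
toy <- data.frame(safe = c(5, 4, 2))
attr(toy$safe, "label") <- "I feel safe at the Faculty"

# Write to a temporary file and read it back
tmp <- tempfile(fileext = ".rds")
saveRDS(toy, file = tmp)
restored <- readRDS(tmp)

# The restored object is identical, including the label attribute
identical(restored, toy)
attr(restored$safe, "label")
```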

We can now proceed to the notebook containing the self-organizing map analysis.