In [1]:
library(dplyr)
library(readxl)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




In [129]:
path <- "./data/results.xlsx"
df <- read_excel(path)

# Preprocessing

Remove the first row, as it contained an invalid response 🧌

In [130]:
df <- df[df$ID != 1,]

Regenerate the row index

In [131]:
df$ID <- seq_len(nrow(df))  # safe even if the data frame has zero rows

Let's look at the columns

In [132]:
for (n in colnames(df)) {
    print(n)
}

[1] "ID"
[1] "Start time"
[1] "Completion time"
[1] "Email"
[1] "Name"
[1] "Wiek (w latach)"
[1] "Płeć"
[1] "Miejsce zamieszkania"
[1] "Wykształcenie"
[1] "Ile godzin dziennie spędzasz na Facebooku? (z wyłączeniem Messengera)"
[1] "Ile godzin dziennie spędzasz na Instagramie?"
[1] "Ile godzin dziennie spędzasz na TikToku?"
[1] "Czy w ostatnim miesiącu opublikował*ś post na którymś z powyższych social mediów?"
[1] "Uważam, że jestem osobą wartościową przynajmniej w takim samym stopniu, co inni."
[1] "Uważam, że posiadam wiele pozytywnych cech."
[1] "Ogólnie biorąc jestem skłonn* sądzić, że nie wiedzie mi się."
[1] "Potrafię robić różne rzeczy tak dobrze, jak większość innych ludzi."
[1] "Uważam, że nie mam wielu powodów, aby być z siebie dumn*."
[1] "Lubię siebie."
[1] "Ogólnie rzecz biorąc, jestem z siebie zadowolon*."
[1] "Chciał*bym mieć więcej szacunku dla samego siebie."
[1] "Czasami czuję się bezużyteczn*."
[1] "Niekiedy uważam, ż

Rename the columns to something simpler

In [133]:
df <- df %>%
  rename(
    age = "Wiek (w latach)",
    sex = "Płeć",
    education = "Wykształcenie",
    city = "Miejsce zamieszkania",
    facebookTime = "Ile godzin dziennie spędzasz na Facebooku? (z wyłączeniem Messengera)",
    instagramTime = "Ile godzin dziennie spędzasz na Instagramie?",
    tiktokTime = "Ile godzin dziennie spędzasz na TikToku?",
    publisher = "Czy w ostatnim miesiącu opublikował*ś post na którymś z powyższych social mediów?",
    ses1 = "Uważam, że jestem osobą wartościową przynajmniej w takim samym stopniu, co inni.",
    ses2 = "Uważam, że posiadam wiele pozytywnych cech.",
    ses3 = "Ogólnie biorąc jestem skłonn* sądzić, że nie wiedzie mi się.",
    ses4 = "Potrafię robić różne rzeczy tak dobrze, jak większość innych ludzi.",
    ses5 = "Uważam, że nie mam wielu powodów, aby być z siebie dumn*.",
    ses6 = "Lubię siebie.",
    ses7 = "Ogólnie rzecz biorąc, jestem z siebie zadowolon*.",
    ses8 = "Chciał*bym mieć więcej szacunku dla samego siebie.",
    ses9 = "Czasami czuję się bezużyteczn*.",
    ses10 = "Niekiedy uważam, że jestem do niczego."
  ) %>% 
  select(
    -c("Start time",
       "Completion time",
       "Name", 
       "Email"))

## Calculate SES score

In [134]:
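# Map a Likert answer to its numeric score (1-4); reverse = TRUE flips the scale.
toScore <- function(text, reverse = FALSE) {
    rawScore <- case_when(
        text == "zdecydowanie nie zgadzam się" ~ 1,  # strongly disagree
        text == "nie zgadzam się" ~ 2,               # disagree
        text == "zgadzam się" ~ 3,                   # agree
        text == "zdecydowanie zgadzam się" ~ 4,      # strongly agree
        TRUE ~ NA_real_                              # anything else counts as missing
    )

    if (reverse) 5 - rawScore
    else rawScore
}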
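
The mapping above can also be written as a named lookup vector in base R. The sketch below is only an illustration: `likert`, `to_score_lookup`, and the sample answers (including the unmatched "brak odpowiedzi") are hypothetical names and values, not part of the notebook.

```r
# Hypothetical lookup-table variant of the scoring step (base R only).
likert <- c(
  "zdecydowanie nie zgadzam się" = 1,
  "nie zgadzam się" = 2,
  "zgadzam się" = 3,
  "zdecydowanie zgadzam się" = 4
)

to_score_lookup <- function(text, reverse = FALSE) {
  raw <- unname(likert[text])      # answers not in the table become NA
  if (reverse) 5 - raw else raw    # reverse-keyed items: 1 <-> 4, 2 <-> 3
}

# Made-up answers: two valid ones and one that is not on the scale.
answers <- c("zgadzam się", "zdecydowanie nie zgadzam się", "brak odpowiedzi")
normal_scores  <- to_score_lookup(answers)
flipped_scores <- to_score_lookup(answers, reverse = TRUE)
print(normal_scores)
print(flipped_scores)
```

Reversing with `5 - raw` keeps the score on the same 1 to 4 range, which is why missing answers stay `NA` in both directions.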

In [135]:
df <- df %>% 
 mutate(
   ses = toScore(ses1)
     + toScore(ses2)
     + toScore(ses3, TRUE)
     + toScore(ses4)
     + toScore(ses5, TRUE)
     + toScore(ses6)
     + toScore(ses7)
     + toScore(ses8, TRUE)
     + toScore(ses9, TRUE)
     + toScore(ses10, TRUE)
 ) 
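
With ten items scored 1 to 4, the summed score should fall between 10 and 40. A minimal sketch of that arithmetic, using a made-up data frame `items` of already-numeric scores for two hypothetical respondents:

```r
# Toy item scores (already mapped to 1..4) for two hypothetical respondents.
items <- data.frame(
  ses1 = c(4, 1), ses2 = c(4, 1), ses3 = c(1, 4), ses4 = c(4, 1), ses5 = c(1, 4),
  ses6 = c(4, 1), ses7 = c(4, 1), ses8 = c(1, 4), ses9 = c(1, 4), ses10 = c(1, 4)
)

# Same reverse-keyed items as in the notebook: 3, 5, 8, 9 and 10.
reversed <- c("ses3", "ses5", "ses8", "ses9", "ses10")
items[reversed] <- 5 - items[reversed]

ses_total <- unname(rowSums(items))

# Sanity check: every total must lie within the theoretical range.
stopifnot(all(ses_total >= 10 & ses_total <= 40))
print(ses_total)
```

The first respondent gives the most favourable answer to every item and the second the least favourable, so the two totals hit the upper and lower bounds.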

We don't need the raw data anymore

In [136]:
df <- df %>% select(-ses1, -ses2, -ses3, -ses4, -ses5, -ses6, -ses7, -ses8, -ses9, -ses10)

## Convert the SM time to minutes

In [137]:
df <- df %>%  
  mutate(
      facebookTime = floor(as.numeric(facebookTime) * 60),
      instagramTime = floor(as.numeric(instagramTime) * 60),
      tiktokTime = floor(as.numeric(tiktokTime) * 60),
      socialMediaTime = facebookTime + instagramTime + tiktokTime
  ) %>%
  select(ID, age, sex, facebookTime, instagramTime, tiktokTime, socialMediaTime, publisher, ses)
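
A small sketch of the hours-to-minutes conversion on made-up input. It assumes the answers arrive as text; the comma-decimal value `"1,5"` is a hypothetical example of an entry that `as.numeric()` cannot parse and therefore turns into `NA`.

```r
# Made-up answers in hours, stored as text as they might come from the spreadsheet.
hours <- c("0", "1", "1.5", "0.5")
minutes <- floor(as.numeric(hours) * 60)
print(minutes)

# A comma as the decimal separator is not parsed; as.numeric() warns and returns NA.
bad_minutes <- suppressWarnings(floor(as.numeric("1,5") * 60))
print(bad_minutes)
```

Checking `sum(is.na(...))` on the converted columns is one way to spot such entries before they propagate into `socialMediaTime`.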

## Divide participants into groups by SM percentiles

First, we'll divide participants into 3 groups based on their total time spent on social media:
1. Participants at or below the 33rd percentile - low SM usage
2. Participants above the 33rd and at or below the 66th percentile - moderate SM usage
3. Participants above the 66th percentile - high SM usage

In [138]:
quantile(df$socialMediaTime, c(0.33, 0.66))

In [149]:
twoGroups <- function (quantiles) {
    function (time) {
        if (time <= quantiles[1]) "low"
        else "high"
    }
}

threeGroups <- function (quantiles) {
    function (time) {
        if (time <= quantiles[1]) "low"
        else if (time <= quantiles[2]) "moderate"
        else "high"
    }
}
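
The closures above work on one value at a time, which is why they are applied with `sapply`. As an alternative sketch (not the notebook's method), the same "at or below" boundaries can be applied to a whole vector with base R's `findInterval`. The name `three_groups_vec` and the toy data are hypothetical.

```r
# Vectorized variant: returns a factor with levels ordered low < moderate < high.
three_groups_vec <- function(time, probs = c(0.33, 0.66)) {
  breaks <- quantile(time, probs, na.rm = TRUE)
  # left.open = TRUE gives intervals (a, b], so a value equal to a break
  # falls in the lower group, matching the `<=` comparisons in the closures.
  idx <- findInterval(time, breaks, left.open = TRUE)
  labels <- c("low", "moderate", "high")
  factor(labels[idx + 1], levels = labels)
}

# Made-up minutes per day for six hypothetical participants.
toy_time <- c(0, 60, 120, 180, 240, 300)
toy_groups <- three_groups_vec(toy_time)
print(table(toy_groups))
```

Unlike `cut()`, `findInterval` tolerates duplicated break points, which can occur when many participants report zero minutes. Setting the levels explicitly also keeps tables in low, moderate, high order instead of the alphabetical order produced by `as.factor`.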

In [150]:
q <- c(0.33, 0.66)

df <- df %>% 
   mutate(
       facebookUsage = as.factor(sapply(facebookTime, threeGroups(quantile(df$facebookTime, q)))), 
       instagramUsage = as.factor(sapply(instagramTime, threeGroups(quantile(df$instagramTime, q)))), 
       tiktokUsage = as.factor(sapply(tiktokTime, threeGroups(quantile(df$tiktokTime, q)))),
       socialMediaUsage = as.factor(sapply(socialMediaTime, threeGroups(quantile(df$socialMediaTime, q)))),
       socialMediaUsageTwo = as.factor(sapply(socialMediaTime, twoGroups(quantile(df$socialMediaTime, 0.5)))),
   )

## Final DataFrame

In [151]:
glimpse(df)

Rows: 57
Columns: 14
$ ID                  <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
$ age                 <chr> "22", "21", "22", "21", "23", "22", "21", "19", "2…
$ sex                 <chr> "Mężczyzna", "Mężczyzna", "Kobieta", "Kobieta", "K…
$ facebookTime        <dbl> 0, 60, 60, 0, 0, 0, 180, 60, 0, 6, 60, 60, 60, 21,…
$ instagramTime       <dbl> 120, 120, 120, 60, 180, 60, 0, 60, 60, 60, 60, 120…
$ tiktokTime          <dbl> 0, 0, 0, 0, 0, 120, 0, 240, 0, 60, 0, 0, 0, 70, 60…
$ socialMediaTime     <dbl> 120, 180, 180, 60, 180, 180, 180, 360, 60, 126, 12…
$ publisher           <chr> "Nie", "Tak", "Nie", "Nie", "Nie", "Nie", "Nie", "…
$ ses                 <dbl> 31, 33, 33, 31, 24, 22, 29, 24, 18, 27, 20, 32, 29…
$ facebookUsage       <fct> low, moderate, mode

### Sizes of the individual groups

In [156]:
table(df$socialMediaUsage)


    high      low moderate 
      16       30       11 

In [157]:
table(df[,c("socialMediaUsageTwo", "publisher")])

                   publisher
socialMediaUsageTwo Nie Tak
               high  15  12
               low   27   3
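
Because the two usage groups differ in size, row proportions can be easier to compare than raw counts. The sketch below re-enters the counts printed above by hand into a matrix named `counts` (a hypothetical name) and uses base R's `prop.table`.

```r
# Counts copied from the table above: rows are usage groups, columns are publisher answers.
counts <- matrix(
  c(15, 12,
    27,  3),
  nrow = 2, byrow = TRUE,
  dimnames = list(socialMediaUsageTwo = c("high", "low"),
                  publisher = c("Nie", "Tak"))
)

# margin = 1 normalizes each row, giving the share of "Nie"/"Tak" within each usage group.
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 2))
```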

In [158]:
write.csv(df, "./output/data.csv", row.names = FALSE)

In [159]:
file.copy("./output/data.csv", "~/shared-vm/data.csv", overwrite = TRUE)
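
`write.csv` fails if the target directory does not exist, and `file.copy` signals failure only through its return value. A minimal sketch of a more defensive export follows; the temporary directory and the `toy` data frame are stand-ins for `./output` and `df`, chosen so the example runs anywhere.

```r
# Stand-in for ./output: a folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical miniature of the final data frame.
toy <- data.frame(ID = 1:2, ses = c(31, 33))
out_file <- file.path(out_dir, "data.csv")
write.csv(toy, out_file, row.names = FALSE)

# file.copy returns TRUE/FALSE, so check it instead of assuming success.
copy_path <- file.path(tempdir(), "data-copy.csv")
copied <- file.copy(out_file, copy_path, overwrite = TRUE)
if (!copied) stop("Copy failed: ", copy_path)

round_trip <- read.csv(copy_path)
print(round_trip)
```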