# Title

## Background


## Variables
**Subject ID:** dyad_id

**Demographic Measures:**
- **Age:** (Age.Years_Child) Child age in years at the time of study participation, calculated by experimenter using DOB as reported by parent
- **Sex:** (Sex_Child) Categorical male (0) or female (1), as reported by parent

**Survey Measures:**
- **Child Activities Inventory (CAI):** (CAI_Child) A continuous measure of gendered behavior in children (Golombok & Rust, 1993; Golombok et al., 2008). Parents report the frequency of masculine and feminine behaviors, choosing from "Never," "Hardly ever," "Sometimes," "Often," and "Very Often." The “male” and “female” items are summed separately, the feminine score is subtracted from the masculine score, and the difference is converted into a “pseudo-T” scale by multiplying by 1.1 and adding 48.25. A higher score indicates more masculine behavior (mean=61.66 for boys) and a lower score indicates more feminine behavior (mean=38.72 for girls).
- **Spatial Toys and Activities Checklist (STAC):** (STAC_Child) Checklist developed by Nora Newcombe & Elizabeth Gunderson (not yet published), based on the children's book title checklist from Sénéchal et al., 1996. The checklist consists of 40 real games and 20 foils, and parents are instructed to choose the items that they know to be names of children’s games and toys.
- **Spatial Home Learning Environment Questionnaire (Spatial HLE):** (HLE_Child) A questionnaire on the frequency of spatial games and activities, also developed by Nora Newcombe & Elizabeth Gunderson (not yet published) and adapted from Zippert & Rittle-Johnson, 2018. Parents are asked "How frequently does your child engage in the following activities either alone or with others when they are at home?" and choose from "Never," "Once a month," "2-3 times a month," "1-2 times a week," "3-4 times a week," "5-6 times a week," and "Daily."

| Survey Measure | Example Items |
| --- | --- |
| CAI | Jewelry, Tool set, Swords |
| STAC | Minecraft, Magna-Tiles, Rinx |
| Spatial HLE | Do mazes, Play with puzzles |
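
The CAI scoring rule described above can be written as a one-line function. This is a minimal sketch, not the scoring script used in the study: `cai_pseudo_t`, `male_sum`, and `female_sum` are hypothetical names, and the input sums are made up.

```r
# Hypothetical helper: converts summed item scores into the pseudo-T scale.
# `male_sum` and `female_sum` are assumed names for the summed "male" and
# "female" item scores; they are not columns in the study data.
cai_pseudo_t <- function(male_sum, female_sum) {
  48.25 + 1.1 * (male_sum - female_sum)
}

# Made-up sums, for illustration only
cai_pseudo_t(male_sum = 30, female_sum = 20)  # 48.25 + 1.1 * 10
cai_pseudo_t(male_sum = 20, female_sum = 30)  # 48.25 - 1.1 * 10
```

Equal masculine and feminine sums give 48.25, so scores above that value lean masculine and scores below it lean feminine.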

**Cognitive Measures:**
- **Corsi Block Task:** (CorsiBlock_Child and CorsiBlock_Parent) Visuo-spatial working memory test. Score is the maximum number of squares for which the participant can correctly remember the order in which they lit up on the computer screen.
- **KBIT-2 Verbal Knowledge Raw Score:** (KBIT2.VerbalKnowl_Child and KBIT2.VerbalKnowl_Parent) Participants are asked to point to the image that matches the word given by the researcher. Score is total number of questions correctly answered.
- **KBIT-2 Matrices Raw Score:** (KBIT2.Matrices_Child and KBIT2.Matrices_Parent) Participants are asked to choose which answer (out of 5 options) would correctly complete a matrix. Score is total number of questions correctly answered.
- **Number Line Estimation:** (NumLine_Child and NumLine_Parent) Participants are told to guess where a number is located on a number line. Score is the average absolute difference between guess and correct answer.
- **WJIII Calculation Subtest Score:** (WJCalc_Child and WJCalc_Parent) Test of formal math abilities. Score is the total number of math problems correctly answered.
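
The number line score defined above is a mean absolute error, which can be sketched as follows. `numline_score`, `guess`, and `target` are hypothetical names, and the trial values are made up; the scale of the actual task is not specified here.

```r
# Hypothetical helper; `guess` and `target` are assumed vectors holding a
# participant's estimates and the corresponding correct positions.
numline_score <- function(guess, target) {
  mean(abs(guess - target))
}

# Made-up trials, for illustration only
numline_score(guess = c(12, 48, 90), target = c(10, 50, 80))
```

Because the score is an error measure, lower values mean more accurate estimates. This is the opposite direction from the other cognitive measures, where higher scores mean better performance.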



## Hypotheses
- **Hypothesis 1:** Children’s math performance (as measured by a number line estimation task and a calculation task) is predicted by their gender (categorical and CAI) and prior experience with spatial activities (as measured by Spatial HLE Questionnaire and STAC) when controlling for spatial working memory (Corsi block) and IQ measures (KBIT-2 verbal knowledge and matrices).

    - $Y_{NumLine} = β_{0} + β_{1}X_{HLE} + β_{2}X_{STAC} + β_{3}X_{sex} + β_{4}X_{CAI} + β_{5}X_{corsi} + β_{6}X_{KBITVerbal} + β_{7}X_{KBITMatrices}$
    - $Y_{WJCalc} = β_{0} + β_{1}X_{HLE} + β_{2}X_{STAC} + β_{3}X_{sex} + β_{4}X_{CAI} + β_{5}X_{corsi} + β_{6}X_{KBITVerbal} + β_{7}X_{KBITMatrices}$
    - For both: $β_{1} \neq 0, β_{2} \neq 0, β_{3} \neq 0, β_{4} \neq 0$
    
    
- **Hypothesis 2:** Parents’ math and spatial performance (as measured by a number line estimation task and a calculation task) is predictive of their children’s math and spatial performance and potentially mediated by the child’s spatial experience (Spatial HLE).

    - $Y_{NumLineChild} = β_{0} + β_{1}X_{NumLineParent}$
    - $Y_{WJCalcChild} =  β_{0} + β_{1}X_{WJCalcParent}$
    - For both: $β_{1} \neq 0$
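
The two Hypothesis 1 equations above map onto R model formulas as sketched below. This is one possible specification, not the analysis run in this notebook. The column names come from the Variables section, but `hyp1_demo` is a placeholder data frame filled with random noise (the real table is `Hyp1_Data.csv`), so the fitted coefficients here are meaningless.

```r
set.seed(1)
n <- 30
# Placeholder data with the study's column names; values are random noise,
# NOT study data.
hyp1_demo <- data.frame(
  NumLine_Child = rnorm(n), WJCalc_Child = rnorm(n),
  HLE_Child = rnorm(n), STAC_Child = rnorm(n),
  Sex_Child = rep(c(0, 1), length.out = n), CAI_Child = rnorm(n),
  CorsiBlock_Child = rnorm(n),
  KBIT2.VerbalKnowl_Child = rnorm(n), KBIT2.Matrices_Child = rnorm(n)
)

# Predictors in the same order as beta_1 ... beta_7 in the equations
predictors <- c("HLE_Child", "STAC_Child", "Sex_Child", "CAI_Child",
                "CorsiBlock_Child", "KBIT2.VerbalKnowl_Child",
                "KBIT2.Matrices_Child")

# reformulate() builds `response ~ predictor1 + predictor2 + ...`
numline_h1 <- lm(reformulate(predictors, response = "NumLine_Child"),
                 data = hyp1_demo)
calc_h1 <- lm(reformulate(predictors, response = "WJCalc_Child"),
              data = hyp1_demo)

# Rows 2-5 of the coefficient table correspond to beta_1 ... beta_4,
# the coefficients the hypothesis is about
coef(summary(numline_h1))[2:5, ]
```

With seven predictors and a few dozen dyads, each coefficient is estimated from limited data, which is worth keeping in mind when reading the standard errors.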


## Data Organization
**Data Architecture**

Data is organized in a dyadic structure, with each dyad consisting of a parent and their child. Importantly, if a parent brought two children to participate in the study, the parent's data was duplicated so that it appears in each child's dyad.
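
The duplication described above is what a one-to-many join produces. The sketch below illustrates it with made-up values; `parent_id` and `child_id` are hypothetical columns that do not appear in the study data.

```r
# Made-up IDs and scores, for illustration only
parents <- data.frame(parent_id = c("P1", "P2"),
                      WJCalc_Parent = c(30, 25))
children <- data.frame(dyad_id = 1:3,
                       child_id = c("C1", "C2", "C3"),
                       parent_id = c("P1", "P1", "P2"),
                       WJCalc_Child = c(12, 9, 15))

# One row per child; parent P1's score is repeated in dyads 1 and 2
dyads <- merge(children, parents, by = "parent_id")
dyads[order(dyads$dyad_id), ]
```

One consequence of this layout is that dyads sharing a parent are not independent observations, since the parent columns are identical across those rows.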

**Data Cleansing & Tidying**
1. Export the original Excel file to CSV format
2. Remove rows containing pilot data
3. Convert Sex variable from Male and Female to 0 and 1, respectively
4. Only keep measures that are relevant to my hypotheses
    - dyad_id
    - Sex_Child
    - CAI_Child
    - HLE_Child
    - STAC_Child
    - CorsiBlock_Child
    - KBIT2.VerbalKnowl_Child
    - KBIT2.Matrices_Child
    - WJCalc_Child
    - WJCalc_Parent
    - NumLine_Child
    - NumLine_Parent
    
    
5. Create 2 separate tables, 1 for each hypothesis. This is done in order to avoid removing dyads that may be missing data relevant to one hypothesis but not the other. In other words, I want to make sure I have as many observations as possible for each hypothesis.


6. Remove dyads with missing data from each table
    - Hypothesis 1 Table: dyads 1, 2, and 21
    - Hypothesis 2 Table: dyads 1, 18, 19, and 21
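
Steps 3 through 6 above could look roughly like the following in base R. This is a sketch under stated assumptions, not the script used to produce the final tables: `raw` is a toy data frame with made-up values standing in for the exported CSV, and the column lists are abbreviated.

```r
# Toy rows standing in for the exported CSV; values are made up.
raw <- data.frame(
  dyad_id = 1:4,
  Sex_Child = c("Male", "Female", "Female", "Male"),
  HLE_Child = c(3.1, NA, 2.4, 4.0),
  NumLine_Child = c(8.2, 6.5, 7.1, 5.9),
  NumLine_Parent = c(2.0, 1.5, NA, 1.8),
  Age.Years_Child = c(6, 7, 5, 8)
)

# Step 3: recode Sex to 0 (male) / 1 (female)
raw$Sex_Child <- ifelse(raw$Sex_Child == "Female", 1, 0)

# Steps 4-5: keep only the columns each hypothesis needs (abbreviated lists)
hyp1_cols <- c("dyad_id", "Sex_Child", "HLE_Child", "NumLine_Child")
hyp2_cols <- c("dyad_id", "HLE_Child", "NumLine_Child", "NumLine_Parent")
hyp1_tab <- raw[, hyp1_cols]
hyp2_tab <- raw[, hyp2_cols]

# Step 6: drop incomplete dyads separately in each table
hyp1_tab <- hyp1_tab[complete.cases(hyp1_tab), ]
hyp2_tab <- hyp2_tab[complete.cases(hyp2_tab), ]

hyp1_tab$dyad_id  # dyad 2 dropped (missing HLE_Child)
hyp2_tab$dyad_id  # dyads 2 and 3 dropped (missing HLE_Child, NumLine_Parent)
```

In the toy data, dyad 3 is missing only a parent score, so it survives in the Hypothesis 1 table but not in the Hypothesis 2 table. This is the reason for filtering the two tables separately.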


### Final Data Tables

**Hypothesis 1 Data Table:**

In [1]:
hyp1_dat = read.csv(file="Hyp1_Data.csv")
head(hyp1_dat)


**Hypothesis 2 Data Table:**

In [None]:
hyp2_dat = read.csv(file="Hyp2_Data.csv")
head(hyp2_dat)

## Analysis

### Hypothesis 1

### Hypothesis 2: Parents’ math and spatial performance is predictive of their children’s math and spatial performance and potentially mediated by the child’s spatial experience.

1. Check if response variables (Y) are normally distributed.

In [None]:
library(tidyverse)

ggplot(hyp2_dat, aes(x=NumLine_Child)) +
    geom_histogram()
    
ggplot(hyp2_dat, aes(x=WJCalc_Child)) +
    geom_histogram()

2. Linear Models

In [None]:
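library(boot) # provides cv.glm()

# glm() with the default gaussian family fits the same model as lm();
# data must be passed by name because glm()'s second positional argument is `family`
numline_lm <- glm(NumLine_Child ~ NumLine_Parent, data=hyp2_dat)
numline_cv <- cv.glm(hyp2_dat, numline_lm, K=4) # 4-fold cross-validation
numline_cv$delta # raw and bias-adjusted estimates of prediction error

calc_lm <- glm(WJCalc_Child ~ WJCalc_Parent, data=hyp2_dat)
calc_cv <- cv.glm(hyp2_dat, calc_lm, K=4)
calc_cv$delta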

3. Bootstrapping

In [None]:
library(boot)

numline_boot <- function(data, index){  
    return(coef(lm(NumLine_Child ~ NumLine_Parent, data=data, subset=index)))}

calc_boot <- function(data, index){  
    return(coef(lm(WJCalc_Child ~ WJCalc_Parent, data=data, subset=index)))}

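#sanity check: using every row once should reproduce the plain lm() coefficients
print(numline_boot(hyp2_dat, seq_len(nrow(hyp2_dat))))
print(calc_boot(hyp2_dat, seq_len(nrow(hyp2_dat))))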

#bootstrap for number line model
numline_boot_obj = boot(hyp2_dat, numline_boot, R=1000)
print(numline_boot_obj)
attributes(numline_boot_obj)

#plot the stored replicates (column 2 = slope) rather than re-running the bootstrap
hist(numline_boot_obj$t[,2], xlab="Parent numberline coefficient")

#bootstrap for calculation model
calc_boot_obj = boot(hyp2_dat, calc_boot, R=1000)
print(calc_boot_obj)
attributes(calc_boot_obj)

#plot the stored replicates (column 2 = slope) rather than re-running the bootstrap
hist(calc_boot_obj$t[,2], xlab="Parent calculation coefficient")
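
Since Hypothesis 2 states that the slope differs from zero, one way to summarize the bootstrap replicates is a confidence interval for that slope. The sketch below uses `boot.ci()` from the boot package with a percentile interval, which is an assumption here and not a choice made in this notebook. `demo` is placeholder data made of random values (34 rows, to mirror the sanity check above), not study data, and `slope_boot` is a hypothetical name.

```r
library(boot)

set.seed(42)
# Placeholder data: random values, NOT study data.
demo <- data.frame(NumLine_Parent = rnorm(34))
demo$NumLine_Child <- 0.5 * demo$NumLine_Parent + rnorm(34)

# Statistic: refit the model on the resampled rows, return both coefficients
slope_boot <- function(data, index) {
  coef(lm(NumLine_Child ~ NumLine_Parent, data = data[index, ]))
}

demo_boot <- boot(demo, slope_boot, R = 1000)

# index = 2 selects the slope; type = "perc" requests the percentile interval
slope_ci <- boot.ci(demo_boot, conf = 0.95, type = "perc", index = 2)
slope_ci$percent[4:5]  # lower and upper bounds of the interval
```

If the interval excludes zero, the resampled slopes are consistently on one side of zero. The same call should work on `numline_boot_obj` and `calc_boot_obj`.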

4. Exploratory Mediation Analysis

Plot relationships between (1) child spatial experience, child numberline score, and parent numberline score; and (2) child spatial experience, child calculation score, and parent calculation score

In [None]:
#NumLine plot
ggplot(hyp2_dat, aes(x=NumLine_Parent, y=NumLine_Child, color = HLE_Child)) +
  geom_point(size = 2.5) +
  geom_smooth()

#Calc plot
ggplot(hyp2_dat, aes(x=WJCalc_Parent, y=WJCalc_Child, color = HLE_Child)) +
  geom_point(size = 2.5) +
  geom_smooth()

Mediation Analyses

In [None]:
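#install the package only if it is not already available
if (!requireNamespace("mediation", quietly=TRUE)) install.packages("mediation")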
library(mediation)

#NumLine analysis
fitM <- lm(HLE_Child ~ NumLine_Parent, data=hyp2_dat) #Step 1: IV on M, parent numberline score predicting child spatial experience
fitY <- lm(NumLine_Child ~ NumLine_Parent + HLE_Child, data=hyp2_dat) #Step 2: IV and M on DV, parent numberline and child spatial experience predicting child numberline score
summary(fitM)
summary(fitY)
fitMed <- mediate(fitM, fitY, treat="NumLine_Parent", mediator="HLE_Child")
summary(fitMed)

#Calc analysis
fitM <- lm(HLE_Child ~ WJCalc_Parent, data=hyp2_dat) #Step 1: IV on M, parent calculation predicting child spatial experience
fitY <- lm(WJCalc_Child ~ WJCalc_Parent + HLE_Child, data=hyp2_dat) #Step 2: IV and M on DV, parent calculation score and child spatial experience predicting child calculation score
summary(fitM)
summary(fitY)
fitMed <- mediate(fitM, fitY, treat="WJCalc_Parent", mediator="HLE_Child")
summary(fitMed)
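
For reference, the two fitted models above correspond to the standard linear mediation setup sketched below. The symbols $a$, $b$, $c$, and $c'$ are labels introduced here for illustration and are not used elsewhere in this notebook; $X$ is the parent score, $M$ is HLE_Child, and $Y$ is the child score.

- $M = a_{0} + aX$ (the `fitM` model)
- $Y = c'_{0} + c'X + bM$ (the `fitY` model)
- Indirect (mediated) effect: $ab$; direct effect: $c'$; total effect: $c = c' + ab$

Under these linear models, the ACME and ADE rows of `summary(fitMed)` should correspond to the indirect and direct effects, respectively.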

## Conclusions