# Are we going to last?
## Predicting divorce rates based on self-reported levels of negative communication patterns

**Introduction** - Every newlywed couple has asked themselves at least once: are we going to last? We use the Divorce Prediction Dataset to predict whether a couple will get divorced. The dataset contains responses from 170 couples from Turkey to 54 questions about their relationship. The responses are on a 5-point scale (0=Never, 1=Seldom, 2=Averagely, 3=Frequently, 4=Always).

John Gottman is a renowned psychologist who is widely recognized for his work on marital relationships. His research highlights four major predictors of divorce, which he refers to as the "Four Horsemen of the Apocalypse": criticism, contempt, defensiveness and stonewalling.

Our team has selected questions from the Divorce Prediction Dataset that match one of these negative communication patterns. Our goal is to train a classification model on the dataset and then predict whether a couple will get divorced based on their scores across these four negative communication patterns.

**Method** - The variables used in our analysis are as follows:
1. Divorce - A dummy variable that takes the value 1 for divorced and 0 for married.
2. Criticism - The couple's average score (0-4) across questions that show criticism. John Gottman describes criticism as attacking your partner’s character instead of voicing a complaint.
3. Contempt - The couple's average score (0-4) across questions that show contempt. Contempt is described as assuming a position of moral superiority while criticising.
4. Defensiveness - The couple's average score (0-4) across questions that show defensiveness. Defensiveness is described as not taking your partner’s concerns seriously and not taking responsibility for your mistakes.
5. Stonewalling - The couple's average score (0-4) across questions that show stonewalling. Stonewalling is described as withdrawing from the interaction, shutting down, and not responding to your partner.




**Preliminary Exploratory Data Analysis:**

**Step 1** - Downloading the data into R 

In [1]:
library(dplyr)
library(tidyverse)
library(repr)
library(tidymodels)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ stringr 1.4.0
✔ tidyr   1.2.0     ✔ forcats 0.5.1
✔ readr   2.1.2     

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──

✔ broom        1.0.0     ✔ rsample      1.0.0
✔ 

In [2]:
my_url <- "https://raw.githubusercontent.com/apurva-b/dsci100-project-58/main/divorce_data.csv"
data <- read_delim(my_url, delim = ";")
head(data)

Rows: 170 Columns: 55
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
dbl (55): Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,⋯,Q46,Q47,Q48,Q49,Q50,Q51,Q52,Q53,Q54,Divorce
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
2,2,4,1,0,0,0,0,0,0,⋯,2,1,3,3,3,2,3,2,1,1
4,4,4,4,4,0,0,4,4,4,⋯,2,2,3,4,4,4,4,2,2,1
2,2,2,2,1,3,2,1,1,2,⋯,3,2,3,1,1,1,2,2,2,1
3,2,3,2,3,3,3,3,3,3,⋯,2,2,3,3,3,3,2,2,2,1
2,2,1,1,1,1,0,0,0,0,⋯,2,1,2,3,2,2,2,1,0,1
0,0,1,0,0,2,0,0,0,1,⋯,2,2,1,2,1,1,1,2,0,1
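
Since every answer is supposed to lie on the 0-4 scale, a quick range and missing-value check can be worthwhile before any averaging. The following is a minimal base R sketch on made-up data; the helper `check_likert` and the `toy` data frame are hypothetical and not part of the original analysis.

```r
# Toy stand-in for the survey data: three questions, four couples.
# The values are made up for illustration and are not from the dataset.
toy <- data.frame(
  Q1 = c(2, 4, 0, 3),
  Q2 = c(1, 4, 0, 5),   # 5 is deliberately out of range
  Q3 = c(0, NA, 2, 4),  # NA is a deliberately missing answer
  Divorce = c(1, 1, 0, 0)
)

# Hypothetical helper: for each question column, count missing answers and
# answers that fall outside the allowed scale.
check_likert <- function(df, cols, lo = 0, hi = 4) {
  data.frame(
    question = cols,
    n_missing = vapply(cols, function(q) sum(is.na(df[[q]])), integer(1)),
    n_out_of_range = vapply(
      cols,
      function(q) sum(df[[q]] < lo | df[[q]] > hi, na.rm = TRUE),
      integer(1)
    ),
    row.names = NULL
  )
}

report <- check_likert(toy, c("Q1", "Q2", "Q3"))
print(report)
```

Applied to the real data, the same helper could be called with `paste0("Q", 1:54)` as the column names; a report with zeros in both count columns would suggest the answers are complete and within range.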


**Step 2** - Selecting the columns that are relevant to our research question and converting the Divorce column to a factor, so that it is treated as a category rather than a number


In [3]:
data <- data |> 
    mutate(Divorce = as_factor(Divorce))

selected_data <- select(data, Divorce, Q32:Q37, Q52, Q31, Q38, Q41, Q48, Q53, Q54, Q49, Q50, Q51, Q42:Q47 )

In [4]:
head(selected_data)

Divorce,Q32,Q33,Q34,Q35,Q36,Q37,Q52,Q31,Q38,⋯,Q54,Q49,Q50,Q51,Q42,Q43,Q44,Q45,Q46,Q47
<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,2,1,2,0,1,2,3,1,1,⋯,1,3,3,2,1,1,2,3,2,1
1,4,2,3,0,2,3,4,0,4,⋯,2,4,4,4,2,3,4,2,2,2
1,3,1,1,1,1,2,2,3,1,⋯,2,1,1,1,3,2,3,2,3,2
1,3,2,2,1,1,3,2,2,3,⋯,2,3,3,3,2,3,2,3,2,2
1,1,1,1,0,0,0,2,1,0,⋯,0,3,2,2,2,3,0,2,2,1
1,1,1,1,1,1,1,1,4,2,⋯,0,2,1,1,1,2,3,0,2,2
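
The reason for the factor conversion can be seen on a small made-up vector: as a number the outcome behaves like a measurement, while as a factor it is a label with a fixed set of levels, which is the form classification models in tidymodels generally expect for the outcome. This sketch uses base R `factor()` rather than `as_factor()`, and the values are illustrative only.

```r
# Made-up outcome values, coded as in the dataset: 1 = divorced, 0 = married.
divorce_raw <- c(1, 0, 1, 1, 0)

# As a number, the column is summarized like a measurement.
mean(divorce_raw)

# As a factor, it is a label with a fixed set of levels.
divorce_fct <- factor(divorce_raw, levels = c(0, 1))
levels(divorce_fct)

# Counting cases per class is the natural summary for a factor.
class_counts <- table(divorce_fct)
print(class_counts)
```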


**Step 3** - Creating a new column for each negative communication style by averaging the scores of the questions relevant to it.

For example, the Contempt column holds the average score of all the questions that correspond to contempt in a relationship.

In [5]:
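# Average each couple's answers within each communication pattern.
# Column positions refer to the order used in select() above.
mutated_data <- selected_data |>
    mutate(
        Criticism = rowMeans(across(2:8)),        # Q32-Q37, Q52
        Contempt = rowMeans(across(9:14)),        # Q31, Q38, Q41, Q48, Q53, Q54
        Defensiveness = rowMeans(across(15:17)),  # Q49-Q51
        Stonewalling = rowMeans(across(18:23)))   # Q42-Q47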

In [6]:
head(mutated_data)

Divorce,Q32,Q33,Q34,Q35,Q36,Q37,Q52,Q31,Q38,⋯,Q42,Q43,Q44,Q45,Q46,Q47,Criticism,Contempt,Defensiveness,Stonewalling
<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,2,1,2,0,1,2,3,1,1,⋯,1,1,2,3,2,1,1.5714286,1.6666667,2.666667,1.666667
1,4,2,3,0,2,3,4,0,4,⋯,2,3,4,2,2,2,2.5714286,2.1666667,4.0,2.5
1,3,1,1,1,1,2,2,3,1,⋯,3,2,3,2,3,2,1.5714286,2.3333333,1.0,2.5
1,3,2,2,1,1,3,2,2,3,⋯,2,3,2,3,2,2,2.0,2.3333333,3.0,2.333333
1,1,1,1,0,0,0,2,1,0,⋯,2,3,0,2,2,1,0.7142857,0.6666667,2.333333,1.666667
1,1,1,1,1,1,1,1,4,2,⋯,1,2,3,0,2,2,1.0,1.8333333,1.333333,1.666667
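
The averages can be checked by hand for the first couple. The sketch below is a base R illustration, not the notebook's own code: it groups the questions by name instead of by column position and recomputes three of the four scores from the answers displayed above. Contempt is left out because one of its questions (Q41) is hidden in the truncated table output. The names `first_couple` and `question_groups` are assumptions introduced for this example.

```r
# First couple's answers as displayed in the output of head(selected_data).
first_couple <- data.frame(
  Q32 = 2, Q33 = 1, Q34 = 2, Q35 = 0, Q36 = 1, Q37 = 2, Q52 = 3,  # criticism
  Q49 = 3, Q50 = 3, Q51 = 2,                                      # defensiveness
  Q42 = 1, Q43 = 1, Q44 = 2, Q45 = 3, Q46 = 2, Q47 = 1            # stonewalling
)

# Question groups by name, following the column order used in select().
question_groups <- list(
  Criticism     = c("Q32", "Q33", "Q34", "Q35", "Q36", "Q37", "Q52"),
  Defensiveness = c("Q49", "Q50", "Q51"),
  Stonewalling  = c("Q42", "Q43", "Q44", "Q45", "Q46", "Q47")
)

# Average each group's answers for every row (here, a single row).
scores <- as.data.frame(
  lapply(question_groups, function(qs) rowMeans(first_couple[qs]))
)
print(round(scores, 4))
```

Criticism works out to 11/7, Defensiveness to 8/3 and Stonewalling to 10/6, which agrees with the first row of the table above. Grouping by name has the advantage that the scores do not silently change if the column order in `select()` is edited later.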


**Step 4** - Selecting the columns relevant for our analysis

In [7]:
final_dataset <- select(mutated_data, Divorce, Criticism, Contempt, Defensiveness, Stonewalling)

In [8]:
head(final_dataset)

Divorce,Criticism,Contempt,Defensiveness,Stonewalling
<fct>,<dbl>,<dbl>,<dbl>,<dbl>
1,1.5714286,1.6666667,2.666667,1.666667
1,2.5714286,2.1666667,4.0,2.5
1,1.5714286,2.3333333,1.0,2.5
1,2.0,2.3333333,3.0,2.333333
1,0.7142857,0.6666667,2.333333,1.666667
1,1.0,1.8333333,1.333333,1.666667


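The dataset is split into 75% training data and 25% testing data, stratified by Divorce so that both sets keep a similar share of divorced and married couples. This split leaves enough data to train the classifier while keeping the testing set large enough to evaluate it.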

In [9]:
# Split the data into training and testing sets

set.seed(2023)

divorce_split <- initial_split(final_dataset, prop = 0.75, strata = Divorce)
divorce_train <- training(divorce_split)
divorce_test <- testing(divorce_split)
divorce_train
divorce_test

Divorce,Criticism,Contempt,Defensiveness,Stonewalling
<fct>,<dbl>,<dbl>,<dbl>,<dbl>
0,0.1428571,0.6666667,0.0000000,0.8333333
0,0.4285714,0.8333333,0.3333333,0.5000000
0,0.0000000,0.6666667,0.0000000,0.0000000
0,0.7142857,1.0000000,1.0000000,0.1666667
0,0.2857143,0.1666667,0.0000000,1.1666667
0,0.1428571,0.3333333,1.0000000,1.0000000
0,0.4285714,0.6666667,1.0000000,1.0000000
0,0.8571429,0.8333333,2.0000000,0.0000000
0,0.2857143,1.0000000,2.0000000,1.0000000
0,0.2857143,1.5000000,0.0000000,1.5000000


Divorce,Criticism,Contempt,Defensiveness,Stonewalling
<fct>,<dbl>,<dbl>,<dbl>,<dbl>
1,1.5714286,1.6666667,2.6666667,1.6666667
1,1.5714286,2.0,1.6666667,1.6666667
1,4.0,4.0,4.0,4.0
1,4.0,4.0,4.0,4.0
1,4.0,4.0,4.0,4.0
1,4.0,4.0,4.0,4.0
1,4.0,4.0,4.0,3.8333333
1,4.0,4.0,4.0,4.0
1,4.0,4.0,4.0,4.0
1,3.5714286,3.5,3.6666667,3.5
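
The two previews above each show only a single class, so it can help to confirm that both classes appear in both sets. The sketch below illustrates the idea behind `strata = Divorce` in base R on a made-up outcome column; it is not how `initial_split()` is implemented, and the 85/85 class balance is a hypothetical choice for the example.

```r
set.seed(2023)

# Toy outcome column: 170 couples with a hypothetical 50/50 class balance.
toy <- data.frame(
  id = 1:170,
  Divorce = factor(rep(c(0, 1), each = 85))
)

# Within each class, draw 75% of the row ids for training.
train_ids <- unlist(lapply(
  split(toy$id, toy$Divorce),
  function(ids) sample(ids, size = floor(0.75 * length(ids)))
))

toy_train <- toy[toy$id %in% train_ids, ]
toy_test  <- toy[!toy$id %in% train_ids, ]

# Class proportions should match in both parts.
train_props <- prop.table(table(toy_train$Divorce))
test_props  <- prop.table(table(toy_test$Divorce))
print(train_props)
print(test_props)
```

For the notebook's own objects, the analogous check would be `prop.table(table(divorce_train$Divorce))` and the same call on `divorce_test`.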
