# Are we going to last?
## Predicting divorce rates based on self-reported levels of negative communication patterns

**Introduction** - Every newlywed couple has asked themselves at least once: are we going to last? We use the Divorce Prediction Dataset to predict whether a couple will get divorced. The dataset contains responses from 170 couples from Turkey to 54 questions about their relationship. Responses are on a 5-point scale (0=Never, 1=Seldom, 2=Averagely, 3=Frequently, 4=Always).

John Gottman is a renowned psychologist who is widely recognized for his work on marital relationships. His research highlights four major predictors of divorce, which he refers to as the "Four Horsemen of the Apocalypse": criticism, contempt, defensiveness and stonewalling.

Our team has selected questions from the Divorce Prediction Dataset that correspond to one of these negative communication patterns. Our goal is to train a classifier on the dataset and then predict whether a couple will get divorced based on their scores across these four negative communication patterns.

**Method** - 

**Preliminary Exploratory Data Analysis:**

**Step 1** - Downloading the data into R 

In [16]:
library(dplyr)

In [13]:
library(tidyverse)
my_url <- "https://raw.githubusercontent.com/apurva-b/dsci100-project-58/main/divorce_data.csv"
data <- read_delim(my_url, delim = ";")
data

Rows: 170 Columns: 55
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
dbl (55): Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,⋯,Q46,Q47,Q48,Q49,Q50,Q51,Q52,Q53,Q54,Divorce
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
2,2,4,1,0,0,0,0,0,0,⋯,2,1,3,3,3,2,3,2,1,1
4,4,4,4,4,0,0,4,4,4,⋯,2,2,3,4,4,4,4,2,2,1
2,2,2,2,1,3,2,1,1,2,⋯,3,2,3,1,1,1,2,2,2,1
3,2,3,2,3,3,3,3,3,3,⋯,2,2,3,3,3,3,2,2,2,1
2,2,1,1,1,1,0,0,0,0,⋯,2,1,2,3,2,2,2,1,0,1
0,0,1,0,0,2,0,0,0,1,⋯,2,2,1,2,1,1,1,2,0,1
3,3,3,2,1,3,4,3,2,2,⋯,3,2,3,2,3,3,2,2,2,1
2,1,2,2,2,1,0,3,3,2,⋯,0,1,2,2,2,1,1,1,0,1
2,2,1,0,0,4,1,3,3,3,⋯,1,1,1,1,1,1,1,1,1,1
1,1,1,1,1,2,0,2,2,2,⋯,2,0,2,2,2,2,4,3,3,1


**Step 2** - Selecting the columns that are relevant to our research question

In [19]:
selected_data <- select(data, Divorce, Q32:Q37, Q52, Q31, Q38, Q41, Q48, Q53, Q54, Q49, Q50, Q51, Q42:Q47 )

In [20]:
selected_data

Divorce,Q32,Q33,Q34,Q35,Q36,Q37,Q52,Q31,Q38,⋯,Q54,Q49,Q50,Q51,Q42,Q43,Q44,Q45,Q46,Q47
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,2,1,2,0,1,2,3,1,1,⋯,1,3,3,2,1,1,2,3,2,1
1,4,2,3,0,2,3,4,0,4,⋯,2,4,4,4,2,3,4,2,2,2
1,3,1,1,1,1,2,2,3,1,⋯,2,1,1,1,3,2,3,2,3,2
1,3,2,2,1,1,3,2,2,3,⋯,2,3,3,3,2,3,2,3,2,2
1,1,1,1,0,0,0,2,1,0,⋯,0,3,2,2,2,3,0,2,2,1
1,1,1,1,1,1,1,1,4,2,⋯,0,2,1,1,1,2,3,0,2,2
1,2,2,1,1,2,3,2,1,2,⋯,2,2,3,3,3,3,4,3,3,2
1,1,0,2,2,1,4,1,1,4,⋯,0,2,2,1,4,3,2,0,0,1
1,1,1,1,1,1,1,1,1,2,⋯,1,1,1,1,2,2,2,2,1,1
1,1,0,1,0,0,1,4,1,1,⋯,3,2,2,2,2,3,2,2,2,0
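
The grouping behind this column order is not spelled out in the notebook, but it can be read off from the order in `select()` together with the columns averaged for each pattern in Step 3. The following is a minimal sketch that records the grouping explicitly, assuming that reading is right. The names `horsemen_questions` and `toy` are hypothetical and introduced only for illustration; `toy` is an all-zero placeholder with the same column names as the real data, not actual responses.

```r
library(dplyr)

# Question groups implied by the select() order and the Step 3 averages.
# (Hypothetical object, not part of the original notebook.)
horsemen_questions <- list(
  Criticism     = c(paste0("Q", 32:37), "Q52"),
  Contempt      = c("Q31", "Q38", "Q41", "Q48", "Q53", "Q54"),
  Defensiveness = paste0("Q", 49:51),
  Stonewalling  = paste0("Q", 42:47)
)

# Placeholder data frame with the real column names (all zeros, illustration only).
col_names <- c(paste0("Q", 1:54), "Divorce")
toy <- as.data.frame(
  matrix(0, nrow = 2, ncol = 55, dimnames = list(NULL, col_names))
)

# Select Divorce plus every question in the four groups, in group order.
selected_toy <- select(
  toy, Divorce, all_of(unlist(horsemen_questions, use.names = FALSE))
)
names(selected_toy)
```

Keeping the mapping in one named list means the selection step and the later averaging step can both refer to the same definition of each pattern.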


**Step 3** - Creating a new column for each negative communication pattern by averaging the scores of the questions relevant to it.

For example, the Contempt column holds the average score of all the questions that correspond to contempt in a relationship.

In [33]:
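# Average the questions belonging to each pattern.
# Columns are picked by name rather than by position.
mutated_data <- selected_data %>%
  mutate(
    Criticism     = rowMeans(across(c(Q32:Q37, Q52))),
    Contempt      = rowMeans(across(c(Q31, Q38, Q41, Q48, Q53, Q54))),
    Defensiveness = rowMeans(across(Q49:Q51)),
    Stonewalling  = rowMeans(across(Q42:Q47)))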

In [34]:
head(mutated_data)

Divorce,Q32,Q33,Q34,Q35,Q36,Q37,Q52,Q31,Q38,⋯,Q42,Q43,Q44,Q45,Q46,Q47,Criticism,Contempt,Defensiveness,Stonewalling
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,2,1,2,0,1,2,3,1,1,⋯,1,1,2,3,2,1,1.5714286,1.6666667,2.666667,1.666667
1,4,2,3,0,2,3,4,0,4,⋯,2,3,4,2,2,2,2.5714286,2.1666667,4.0,2.5
1,3,1,1,1,1,2,2,3,1,⋯,3,2,3,2,3,2,1.5714286,2.3333333,1.0,2.5
1,3,2,2,1,1,3,2,2,3,⋯,2,3,2,3,2,2,2.0,2.3333333,3.0,2.333333
1,1,1,1,0,0,0,2,1,0,⋯,2,3,0,2,2,1,0.7142857,0.6666667,2.333333,1.666667
1,1,1,1,1,1,1,1,4,2,⋯,1,2,3,0,2,2,1.0,1.8333333,1.333333,1.666667
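
As a quick check on the averaging, the first couple's scores can be worked out by hand from the Step 2 output. Q32 to Q37 and Q52 feed Criticism, and Q49 to Q51 feed Defensiveness:

$$
\text{Criticism}_1 = \frac{2 + 1 + 2 + 0 + 1 + 2 + 3}{7} = \frac{11}{7} \approx 1.571
$$

$$
\text{Defensiveness}_1 = \frac{3 + 3 + 2}{3} = \frac{8}{3} \approx 2.667
$$

Both values agree with the first row of the output above.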


**Step 4** - Selecting the columns relevant for our analysis

In [35]:
final_dataset <- select(mutated_data, Divorce, Criticism, Contempt, Defensiveness, Stonewalling)

In [37]:
final_dataset

Divorce,Criticism,Contempt,Defensiveness,Stonewalling
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,1.5714286,1.6666667,2.666667,1.666667
1,2.5714286,2.1666667,4.000000,2.500000
1,1.5714286,2.3333333,1.000000,2.500000
1,2.0000000,2.3333333,3.000000,2.333333
1,0.7142857,0.6666667,2.333333,1.666667
1,1.0000000,1.8333333,1.333333,1.666667
1,1.8571429,2.1666667,2.666667,3.000000
1,1.5714286,2.0000000,1.666667,1.666667
1,1.0000000,1.3333333,1.000000,1.666667
1,1.0000000,1.8333333,2.000000,1.833333
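
A natural follow-up for exploratory analysis, not shown in the notebook, is to count the couples in each class and compare the average scores between classes. The sketch below is one possible way to do that. The helper `summarise_by_outcome` is a hypothetical name, and `preview` holds only the first three rows of the output above so that the block runs on its own. Those rows all have `Divorce = 1`, so the result here has a single group; on the full `final_dataset` both classes would presumably appear.

```r
library(dplyr)

# First three rows of final_dataset, copied from the output above.
preview <- data.frame(
  Divorce       = c(1, 1, 1),
  Criticism     = c(1.5714286, 2.5714286, 1.5714286),
  Contempt      = c(1.6666667, 2.1666667, 2.3333333),
  Defensiveness = c(2.666667, 4.000000, 1.000000),
  Stonewalling  = c(1.666667, 2.500000, 2.500000)
)

# Hypothetical helper (not part of the original notebook): treat Divorce as a
# categorical label, then average each score and count rows within each class.
summarise_by_outcome <- function(df) {
  df %>%
    mutate(Divorce = as.factor(Divorce)) %>%
    group_by(Divorce) %>%
    summarise(across(Criticism:Stonewalling, mean), n = n(), .groups = "drop")
}

summary_tbl <- summarise_by_outcome(preview)
summary_tbl
```

Converting `Divorce` to a factor first also reflects that it is a class label rather than a quantity, which classification tools in R generally expect.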
