# CMPS 320 Final Project - Divorce Dataset Analysis

## Importing packages

In [119]:
# imports
import warnings
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sn
import statsmodels.formula.api as smf

from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, RidgeCV, Lasso, LassoCV
from sklearn.metrics import mean_squared_error

import plotly.offline as py
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.express as px


from sklearn import decomposition
from sklearn import preprocessing
from sklearn import metrics

%matplotlib inline
plt.style.use('seaborn-white')

warnings.filterwarnings('ignore')


## Loading, cleaning, and exploring dataset

In [120]:
divorces = pd.read_csv('divorce.csv', delimiter=';')
divorces = divorces.sample(frac=1, random_state=42) # we want a more randomized spread of our data for when we visualize it

divorces.head()

Unnamed: 0,Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,...,Q46,Q47,Q48,Q49,Q50,Q51,Q52,Q53,Q54,Divorce
139,3,1,1,0,0,0,0,0,0,0,...,3,2,2,0,2,2,0,0,4,0
30,3,4,3,2,3,0,1,4,3,2,...,4,4,4,4,4,4,4,4,4,1
119,0,1,1,0,0,2,0,0,0,0,...,2,2,2,0,2,1,1,1,0,0
29,4,3,3,2,4,1,0,3,3,2,...,4,4,4,4,4,4,4,4,4,1
144,0,0,2,4,0,0,0,0,0,2,...,2,0,2,4,0,0,1,0,0,0


Our report focuses on the `Divorce Predictors` dataset. The dataset consists of responses from 170 Turkish couples, 84 divorced and 86 currently married, who completed an assessment aimed at getting a basic understanding of each couple and their relationship. The data has 170 instances and 54 predictors, each in the form of a written statement. Each predictor is answered on a 0-4 scale (inclusive), with:


+ 0 = Never
+ 1 = Seldom
+ 2 = Averagely
+ 3 = Frequently
+ 4 = Always
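
For tables and plots it can help to turn the numeric codes back into their labels. A minimal sketch of that mapping follows; the names `ANSWER_LABELS` and `label_answers` are our own and not part of the dataset.

```python
# Hypothetical helper: map the 0-4 answer codes to the labels listed above.
ANSWER_LABELS = {
    0: 'Never',
    1: 'Seldom',
    2: 'Averagely',
    3: 'Frequently',
    4: 'Always',
}


def label_answers(codes):
    """Return the text label for each numeric answer code."""
    return [ANSWER_LABELS[code] for code in codes]


print(label_answers([3, 1, 0, 4]))
```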

The following questions are predictors in our dataframe that we will be using to assess ...

In [121]:
questionSet= [
    '1 If one of us apologizes when our discussion deteriorates, the discussion ends.',
    '2 I know we can ignore our differences, even if things get hard sometimes.',
    '3 When we need it, we can take our discussions with my spouse from the beginning and correct it.',
    '4 When I discuss with my spouse, to contact him will eventually work.',
    '5 The time I spent with my wife is special for us.',
    '6 We don\'t have time at home as partners.',
    '7 We are like two strangers who share the same environment at home rather than family.',
    '8 I enjoy our holidays with my wife.',
    '9 I enjoy traveling with my wife.',
    '10 Most of our goals are common to my spouse.',
    '11 I think that one day in the future, when I look back, I see that my spouse and I have been in harmony with each other.',
    '12 My spouse and I have similar values in terms of personal freedom.',
    '13 My spouse and I have similar sense of entertainment.',
    '14 Most of our goals for people (children, friends, etc) are the same.',
    '15 Our dreams with my spouse are similar and harmonious.',
    '16 We\'re compatible with my spouse about what love should be.',
    '17 We share the same views about being happy in our life with my spouse.',
    '18 My spouse and I have similar ideas about how marriage should be.',
    '19 My spouse and I have similar ideas about how roles should be in marriage.',
    '20 My spouse and I have similar values in trust.',
    '21 I know exactly what my wife likes.',
    '22 I know how my spouse wants to be taken care of when she/he sick.',
    '23 I know my spouse\'s favorite food.',
    '24 I can tell you what kind of stress my spouse is facing in her/his life.',
    '25 I have knowledge of my spouse\'s inner world.',
    '26 I know my spouse\'s basic anxieties.',
    '27 I know what my spouse\'s current sources of stress are.',
    '28 I know my spouse\'s hopes and wishes.',
    '29 I know my spouse very well.',
    '30 I know my spouse\'s friends and their social relationships.',
    '31 I feel aggressive when I argue with my spouse.',
    "32 When discussing with my spouse, I usually use expressions such as 'you always' or 'you never'.",
    '33 I can use negative statements about my spouse\'s personality during our discussions.',
    '34 I can use offensive expressions during our discussions.',
    '35 I can insult my spouse during our discussions.',
    '36 I can be humiliating when we discussions.',
    '37 My discussion with my spouse is not calm.',
    '38 I hate my spouse\'s way of open a subject.',
    '39 Our discussions often occur suddenly.',
    '40 We\'re just starting a discussion before I know what\'s going on.',
    '41 When I talk to my spouse about something, my calm suddenly breaks.',
    '42 When I argue with my spouse, I only go out and I don\'t say a word.',
    '43 I mostly stay silent to calm the environment a little bit.',
    '44 Sometimes I think it\'s good for me to leave home for a while.',
    '45 I\'d rather stay silent than discuss with my spouse.',
    '46 Even if I\'m right in the discussion, I stay silent to hurt my spouse.',
    '47 When I discuss with my spouse, I stay silent because I am afraid of not being able to control my anger.',
    '48 I feel right in our discussions.',
    '49 I have nothing to do with what I\'ve been accused of.',
    '50 I\'m not actually the one who\'s guilty about what I\'m accused of.',
    '51 I\'m not the one who\'s wrong about problems at home.',
    '52 I wouldn\'t hesitate to tell my spouse about her/his inadequacy.',
    '53 When I discuss, I remind my spouse of her/his inadequacy.',
    '54 I\'m not afraid to tell my spouse about her/his incompetence.'
]

In [122]:
questions = divorces.drop(['Divorce'], axis=1).columns
perc = {}
for col in questions:
    perc[col] = round(divorces[col].value_counts(normalize=True)*100, 2)
result = pd.DataFrame(perc).T
result.index = questionSet

result.style.highlight_max(color='blue', axis=1).highlight_min(color='green', axis=1).format('{:.0f}%')


Unnamed: 0,0,1,2,3,4
"1 If one of us apologizes when our discussion deteriorates, the discussion ends.",41%,5%,8%,28%,18%
"2 I know we can ignore our differences, even if things get hard sometimes.",35%,14%,16%,22%,13%
"3 When we need it, we can take our discussions with my spouse from the beginning and correct it.",30%,14%,15%,31%,10%
"4 When I discuss with my spouse, to contact him will eventually work.",44%,7%,18%,19%,12%
5 The time I spent with my wife is special for us.,48%,6%,5%,26%,15%
6 We don't have time at home as partners.,51%,29%,17%,2%,1%
7 We are like two strangers who share the same environment at home rather than family.,67%,25%,3%,2%,3%
8 I enjoy our holidays with my wife.,48%,6%,12%,22%,12%
9 I enjoy traveling with my wife.,49%,4%,8%,29%,10%
10 Most of our goals are common to my spouse.,36%,11%,22%,20%,11%


In [123]:
divorces['Divorce'].value_counts() # 86 couples are married (0) and 84 are divorced (1), so the two classes are roughly balanced

0    86
1    84
Name: Divorce, dtype: int64

In [124]:
divorces.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 170 entries, 139 to 102
Data columns (total 55 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Q1       170 non-null    int64
 1   Q2       170 non-null    int64
 2   Q3       170 non-null    int64
 3   Q4       170 non-null    int64
 4   Q5       170 non-null    int64
 5   Q6       170 non-null    int64
 6   Q7       170 non-null    int64
 7   Q8       170 non-null    int64
 8   Q9       170 non-null    int64
 9   Q10      170 non-null    int64
 10  Q11      170 non-null    int64
 11  Q12      170 non-null    int64
 12  Q13      170 non-null    int64
 13  Q14      170 non-null    int64
 14  Q15      170 non-null    int64
 15  Q16      170 non-null    int64
 16  Q17      170 non-null    int64
 17  Q18      170 non-null    int64
 18  Q19      170 non-null    int64
 19  Q20      170 non-null    int64
 20  Q21      170 non-null    int64
 21  Q22      170 non-null    int64
 22  Q23      170 non-null   

The output shows no missing data, and every column we expect to see is accounted for, so we can move forward. To make each couple's answers easier to interpret at a glance, we create an `avg_score` column that holds the mean of all of their answers across the assessment.
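
The claim about missing data can also be checked directly rather than read off the `info()` listing. The sketch below runs on a small toy frame (`toy`, our own stand-in with made-up values) because it has to be self-contained; on the real data the same two expressions would be applied to `divorces`.

```python
import pandas as pd

# Toy stand-in for `divorces`; the values are invented for illustration.
toy = pd.DataFrame({'Q1': [0, 3, 4], 'Q2': [1, 2, 0], 'Divorce': [0, 1, 1]})

# Total number of missing cells across the whole frame.
missing_total = int(toy.isna().sum().sum())

# True only if every column holds integers, as the 0-4 coding implies.
all_integer = all(pd.api.types.is_integer_dtype(dtype) for dtype in toy.dtypes)

print(missing_total, all_integer)
```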

In [125]:
y = divorces.Divorce
# scale only the predictors; the response must stay out of the feature matrix
X = preprocessing.scale(divorces.drop(columns='Divorce'))
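
`preprocessing.scale` standardizes each column to zero mean and unit variance. The response has to be left out of the scaled matrix, otherwise the label leaks into the features. Here is a minimal sketch on a toy frame; `toy`, `X_toy`, and `y_toy` are our own names and the values are made up.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Toy stand-in for `divorces`; the values are invented for illustration.
toy = pd.DataFrame({
    'Q1': [0, 1, 3, 4],
    'Q2': [0, 0, 4, 4],
    'Divorce': [0, 0, 1, 1],
})

y_toy = toy['Divorce']
# Drop the response before scaling so only the predictors are standardized.
X_toy = preprocessing.scale(toy.drop(columns='Divorce'))

# Each predictor column should now have mean 0 and standard deviation 1.
print(np.round(X_toy.mean(axis=0), 6), np.round(X_toy.std(axis=0), 6))
```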

In [None]:
corr = divorces.iloc[:,:-1].corr()
cmap = sn.diverging_palette(5, 250, as_cmap=True)
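# Styler.set_precision was deprecated in pandas 1.3 and removed in 2.0; format(precision=...) replaces it
corr.style.background_gradient(cmap=cmap, axis=1).set_properties(**{'max-width': '80px', 'font-size': '10pt'}).format(precision=2)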

In [None]:
# average each couple's answers over the 54 questions, leaving the response out of the mean
divorces['avg_score'] = divorces.drop(columns='Divorce').mean(axis=1)
divorces.head()  # take another look now that the average score column has been added

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 7))

ax1.hist(divorces.avg_score)
ax1.set_title('Histogram')
ax1.set_xlabel('Score')
ax1.set_ylabel('Frequency')

ax2.scatter(range(1, len(divorces) + 1), divorces.avg_score)  # x is just the row position after shuffling
ax2.set_title('Scatter Plot')
ax2.set_ylabel('Average Score')


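Judging by this distribution, the average score probably separates the two groups. In the five sample rows shown earlier, the divorced couples (`Divorce` = 1) gave mostly high answers and the married couples mostly low ones, which suggests that the high scores belong to divorced couples; but we can't be sure from the plots alone. We take a closer look by counting the couples at each end of the range.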

In [None]:
len(divorces.loc[divorces.avg_score <= 0.7])

In [None]:
len(divorces.loc[divorces.avg_score >= 2.5])
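
Counting rows at each extreme shows how many couples sit in each cluster, but not which class they belong to. Grouping the average score by `Divorce` answers that directly. The sketch below uses a toy frame with invented values (`toy` and `group_means` are our own names), so its numbers say nothing about the real data; on the notebook's data the same `groupby` would be applied to `divorces`.

```python
import pandas as pd

# Toy stand-in for `divorces`; the values are invented for illustration.
toy = pd.DataFrame({
    'Q1': [0, 1, 4, 3],
    'Q2': [1, 0, 4, 4],
    'Divorce': [0, 0, 1, 1],
})

# Mean answer per row, computed over the question columns only.
toy['avg_score'] = toy.drop(columns='Divorce').mean(axis=1)

# Mean of avg_score within each class (0 = married, 1 = divorced).
group_means = toy.groupby('Divorce')['avg_score'].mean()
print(group_means)
```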


## Decision Trees

In [None]:
from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_clf.fit(X, y)
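
Fitting on all 170 rows leaves nothing to judge the tree against. One common option is to hold out part of the data with `train_test_split`, which is already imported at the top of the notebook. The sketch below is an illustration under stated assumptions, not the analysis itself: `X_demo` and `y_demo` are synthetic stand-ins with a made-up label rule, and the 25% test size is our own choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in: 170 rows of 54 answers on the 0-4 scale.
X_demo = rng.integers(0, 5, size=(170, 54))
# Made-up label rule, for illustration only: high average answer -> class 1.
y_demo = (X_demo.mean(axis=1) > 2).astype(int)

# Hold out 25% of the rows, keeping the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

demo_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
demo_clf.fit(X_train, y_train)

# Accuracy on rows the tree never saw during fitting.
test_accuracy = demo_clf.score(X_test, y_test)
print(f'held-out accuracy: {test_accuracy:.2f}')
```

With the notebook's data, `X` and `y` would take the place of `X_demo` and `y_demo`.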