## Introduction

This workbook covers the data wrangling for the Matchmaker's Dilemma project. The data were imported and prepared for use in machine learning algorithms. For this reason, most columns were either kept in or converted to integer format.

### Table of Contents

1. [Imports](#Imports)
2. [Data uploading and high-level view](#Uploading)
3. [Recoding survey data to Numeric](#SurvToNum)
4. [Translating Categorical data from Numeric](#NumToCat)
5. [Convert all types that are not Object to Integer](#IntConv)
6. [Take a look and export!](#export)

### 1. Imports <a class="anchor" id="Imports"></a>

In [1]:
import pandas as pd
import pandas_profiling
import seaborn as sns

### 2. Data uploading and high-level view <a class="anchor" id="Uploading"></a>

In [2]:
# Load the data set and the column names
data = pd.read_excel('marital_satisfaction_data.xlsx')
col_names = pd.read_csv('col_names.csv')
# Drop the two extra rows (index 0 and 1)
data = data.drop(index=[0, 1])
# Apply the column names (list() on a DataFrame returns its column labels)
data.columns = list(col_names)

data.head()

| | country | sex | age | marriage_duration_years | num_children_total | num_children_inhome | edu_level | material_situation | religion | religiosity | ... | spouse_satisfaction | relationship_satisfaction | natl_pride_in_parents | natl_pride_in_children | natl_aging_parents_live_with_children | natl_children_live_at_home_marraige | indv_pride_in_parents | indv_pride_in_children | indv_aging_parents_live_with_children | indv_children_live_at_home_marraige |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Brazil | 1.0 | 21.0 | 2.0 | 0.0 | 0.0 | 5.0 | 3.0 | 1.0 | 4.0 | ... | 7.0 | 7.0 | 1 | 1.0 | 1.0 | 1.0 | 1 | 1.0 | 1.0 | 1.0 |
| 3 | Brazil | 1.0 | 29.0 | 3.0 | 1.0 | 0.0 | 5.0 | 3.0 | 1.0 | 6.0 | ... | 6.0 | 6.0 | 2 | 1.0 | 1.0 | 1.0 | 1 | 1.0 | 1.0 | 1.0 |
| 4 | Brazil | 1.0 | 30.0 | 7.0 | 0.0 | 0.0 | 5.0 | 3.0 | 1.0 | 4.0 | ... | 7.0 | 7.0 | 2 | 1.0 | 2.0 | 1.0 | 1 | 1.0 | 1.0 | 1.0 |
| 5 | Brazil | 1.0 | 30.0 | 7.0 | 1.0 | 1.0 | 5.0 | 3.0 | 1.0 | 6.0 | ... | 6.0 | 6.0 | 3 | 1.0 | 1.0 | 2.0 | 1 | 1.0 | 1.0 | 1.0 |
| 6 | Brazil | 1.0 | 28.0 | 9.0 | 0.0 | 0.0 | 4.0 | 2.0 | 1.0 | 5.0 | ... | 6.0 | 7.0 | 3 | 2.0 | 3.0 | 2.0 | 1 | 1.0 | 1.0 | 1.0 |


In [3]:
data = data.reset_index(drop=True)

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7178 entries, 0 to 7177
Data columns (total 31 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   country                                7178 non-null   object 
 1   sex                                    7178 non-null   float64
 2   age                                    7178 non-null   float64
 3   marriage_duration_years                7178 non-null   float64
 4   num_children_total                     7178 non-null   float64
 5   num_children_inhome                    7178 non-null   float64
 6   edu_level                              7178 non-null   float64
 7   material_situation                     7178 non-null   float64
 8   religion                               7092 non-null   float64
 9   religiosity                            7178 non-null   float64
 10  pension                                7178 non-null   float64
 11  enjo

In [5]:
# Inspect null values
data.isnull().sum()

country                                   0
sex                                       0
age                                       0
marriage_duration_years                   0
num_children_total                        0
num_children_inhome                       0
edu_level                                 0
material_situation                        0
religion                                 86
religiosity                               0
pension                                   0
enjoy_spouse_company                      0
happiness                                 0
spouse_attraction                         0
spouse_enjoy_doing_things_together        0
spouse_enjoy_cuddling                     0
spouse_respect                            0
spouse_pride                              0
spouse_romance                            0
spouse_love                               0
marital_satisfaction                      0
spouse_satisfaction                       0
relationship_satisfaction       

### 3. Recoding survey data to Numeric <a class="anchor" id="SurvToNum"></a>

In [6]:
# Recode Globe Survey

globe_survey_dict = {1:3,2:2,3:1,4:0,5:-1,6:-2,7:-3}
globe_survey_cols = list(data.columns[23:])
data[globe_survey_cols] = data[globe_survey_cols].replace(globe_survey_dict)

# Recode Marriage & Relationships Questionnaire

mrq_dict = {1:2,2:1,3:0,4:-1,5:-2,6:-2,7:-2}
mrq_cols = list(data.columns[11:20])
data[mrq_cols] = data[mrq_cols].replace(mrq_dict)

# Recode Marriage Satisfaction Survey

marsat_dict = {1:-3,2:-2,3:-1,4:0,5:1,6:2,7:3}
marsat_cols = list(data.columns[20:23])
data[marsat_cols] = data[marsat_cols].replace(marsat_dict)
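
As a quick check of how these dictionary recodes behave, the sketch below applies `globe_survey_dict` (copied from the cell above) to a small made-up Series. With a dict, `replace` looks up each original value once, so a 1 that becomes 3 is not then mapped back to 1. The `toy` Series and its values are illustrative only and are not taken from the survey data.

```python
import pandas as pd

# Same mapping as in the cell above: the 1..7 scale reversed and centred on 0
globe_survey_dict = {1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}

# Hypothetical responses, stored as floats like the imported data
toy = pd.Series([1.0, 3.0, 4.0, 7.0])

# All substitutions are based on the original values, so 1 and 3 swap cleanly
recoded = toy.replace(globe_survey_dict)
print(recoded.tolist())
```

The same check can be run against `mrq_dict` and `marsat_dict` before applying them to the full data set.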

### 4. Translating Categorical data from Numeric <a class="anchor" id="NumToCat"></a>

In [7]:
# Recode Religion into categories; replace nulls with "Did not answer"

data['religion'] = data['religion'].fillna(value="Did not answer")
religion_dict = {
    1: "Protestant", 2: "Catholic", 3: "Jewish", 4: "Muslim",
    5: "Buddhist", 6: "None", 7: "Jehovah", 8: "Evangelic",
    9: "Spiritualism", 10: "Other", 11: "Orthodox", 12: "Hindu",
}
data['religion'] = data['religion'].replace(religion_dict)

# Recode Pension using Globe Survey scale, as the scale is the same

data['pension'] = data['pension'].replace(globe_survey_dict)

# Recode Material Situation as 0 = Neutral

data['material_situation'] = data['material_situation'].replace({1:2,2:1,3:0,4:-1,5:-2,6:-2})
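
For comparison, one possible alternative to the `fillna` plus `replace` pattern used for Religion is `Series.map` followed by `fillna`. This is a sketch on made-up values, not the notebook's method. The name `religion_labels` and the `toy` Series are hypothetical, and only three of the twelve codes are included. One difference to be aware of: `map` turns any code that is missing from the dictionary into a null, so such codes would also end up as "Did not answer", whereas `replace` leaves them unchanged.

```python
import numpy as np
import pandas as pd

# Subset of the religion codes from the cell above, for illustration only
religion_labels = {1: "Protestant", 2: "Catholic", 6: "None"}

# Hypothetical column: float codes with one missing answer
toy = pd.Series([1.0, 6.0, np.nan, 2.0])

# map() converts codes to labels; values without a label (here, NaN) stay null
# and are then filled with the placeholder text
labelled = toy.map(religion_labels).fillna("Did not answer")
print(labelled.tolist())
```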


### 5. Convert all types that are not Object to Integer <a class="anchor" id="IntConv"></a>

In [8]:
col_list = list(data.columns)

for c in col_list:
    # Leave object (string) columns alone; cast every other column to integer
    if data.dtypes[c] != 'object':
        data[c] = data[c].astype(int)
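
The same conversion can also be written without an explicit loop. The sketch below is one way to do it, assuming every numeric column holds whole numbers and has no nulls (casting a null to an integer raises an error). The `toy` frame is hypothetical. It selects numeric columns with `select_dtypes` rather than testing for `object`, and it asks for `int64` explicitly so the resulting width does not depend on the platform default.

```python
import pandas as pd

# Hypothetical frame with one string column and two float columns
toy = pd.DataFrame({
    "country": ["Brazil", "Brazil"],
    "age": [21.0, 29.0],
    "religiosity": [4.0, 6.0],
})

# Find the numeric columns and cast them all in a single astype call
num_cols = toy.select_dtypes(include="number").columns
toy = toy.astype({c: "int64" for c in num_cols})
print(toy.dtypes)
```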
        

### 6. Take a look and export! <a class="anchor" id="export"></a>

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7178 entries, 0 to 7177
Data columns (total 31 columns):
 #   Column                                 Non-Null Count  Dtype 
---  ------                                 --------------  ----- 
 0   country                                7178 non-null   object
 1   sex                                    7178 non-null   int32 
 2   age                                    7178 non-null   int32 
 3   marriage_duration_years                7178 non-null   int32 
 4   num_children_total                     7178 non-null   int32 
 5   num_children_inhome                    7178 non-null   int32 
 6   edu_level                              7178 non-null   int32 
 7   material_situation                     7178 non-null   int32 
 8   religion                               7178 non-null   object
 9   religiosity                            7178 non-null   int32 
 10  pension                                7178 non-null   int32 
 11  enjoy_spouse_comp

In [10]:
data.to_csv("marital_satisfaction_data_wrangled_final.csv")
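
By default, `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. Whether that column is wanted is a choice for the project. The sketch below uses a hypothetical two-row frame and an in-memory buffer instead of a file to show the difference `index=False` makes.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the wrangled data
toy = pd.DataFrame({"country": ["Brazil", "Brazil"], "age": [21, 29]})

# Default behaviour: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

# index=False: only the data columns are written
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```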