# Exploration and Analysis of Mental Health Data from Shamiri

### by F Njakai

### Table of contents
* [Introduction](#introduction)
* [Preliminary Wrangling](#preliminary-wrangling)
* [Exploration](#exploration)
* [Summary of Findings](#summary-of-findings)
* [Conclusions](#conclusions)

<div id="introduction"></div>

## Introduction

<div id="preliminary-wrangling"></div>

## Preliminary Wrangling

In [4]:
#install packages (add `--upgrade` to the command to also upgrade them)

!pip install pandas seaborn pandas-profiling

Defaulting to user installation because normal site-packages is not writeable


In [20]:
#imports

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.filterwarnings(action='ignore')
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import pandas_profiling
from random import randint
from os import path
from contextlib import suppress
from pandas_profiling import ProfileReport

%load_ext autoreload
%autoreload 2
%reload_ext autoreload


%matplotlib inline

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Default settings for plots

Automate, as much as possible, the process of creating visualisations.

##### Why?
* It is efficient
* Visualisations are consistent

##### How?
* Create templates

In [2]:
#template no. 1

#default blue
default_blue = sns.color_palette('tab10')[0]

#default orange
default_orange = sns.color_palette('tab10')[1]

#default palette
default_palette = sns.color_palette('tab10')

In [3]:
#template no. 2

def create_fig(x_lab: str, y_lab: str, title: str) -> None:
    """
    Create a matplotlib `Figure` with an x label, a y label and a title.

    "Father Figure", if you like :)

    3 params, all type `str`:
    @x_lab: x label
    @y_lab: y label
    @title: title

    return: None
    """
    try:
        plt.figure(figsize=(10, 6.18), dpi=216, frameon=False, clear=True)
        plt.xlabel(x_lab)
        plt.ylabel(y_lab)
        plt.title(title)
    except NameError:
        #`plt` is undefined if matplotlib.pyplot has not been imported
        print('Please `import matplotlib.pyplot as plt` and try again')
    except Exception:
        print('Failed to create template')
        raise

In [4]:
#template no. 3

def create_sub(n_row: int=1, n_col: int=1):
    """
    Create a matplotlib `Figure` with a grid of sub-plots.

    "Father Figure", for sub-plots :)

    2 params, type `int`; number of sub-plots:
    @n_row: #rows
    @n_col: #cols

    return: fig and ax objects
    """
    try:
        fig, ax = plt.subplots(n_row, n_col, figsize=(10, 6.18), dpi=216)
        fig.tight_layout(pad=10.0)
        return fig, ax
    except NameError:
        #`plt` is undefined if matplotlib.pyplot has not been imported
        print('Please `import matplotlib.pyplot as plt` and try again')
    except Exception:
        print('Failed to create template')
        raise
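
The two figure templates above repeat the same size and resolution. As a rough sketch of an alternative, the shared defaults could live in one dictionary and be applied through matplotlib's `rc_context`. The names `PLOT_DEFAULTS` and `new_default_axes` are hypothetical and are not used elsewhere in this notebook.

```python
import matplotlib.pyplot as plt

# Hypothetical shared defaults, mirroring the values used in the templates
PLOT_DEFAULTS = {
    "figure.figsize": (10, 6.18),
    "figure.dpi": 216,
}


def new_default_axes(x_lab: str, y_lab: str, title: str):
    """Create a labelled figure and axes using the shared defaults."""
    # rc_context applies the settings only while the figure is created
    with plt.rc_context(PLOT_DEFAULTS):
        fig, ax = plt.subplots()
    ax.set_xlabel(x_lab)
    ax.set_ylabel(y_lab)
    ax.set_title(title)
    return fig, ax
```

Returning `fig` and `ax` lets later cells draw on the axes directly, for example by passing `ax=ax` to a seaborn plotting function.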

In [5]:
#confirm that a df exists

def confirm_df_exists(df):
    """
    Confirm that `df` is a non-empty DataFrame.

    1 param: the variable holding the df

    Pass the variable itself, not its name as a string.

    return: None
    """
    if isinstance(df, pd.DataFrame) and not df.empty:
        print('This dataframe exists')
        return
    print('This dataframe does not exist or is empty')

In [6]:
#confirm that a file exists and/or has been
# created in current dir

def confirm_file_exists(file_name: str) -> None:
    """
    Confirm that a file exists and/or has been
    created in the current dir.

    1 param, type `str`:
    @file_name: name of file

    return: None
    """
    if path.exists(file_name):
        print('File exists')
    else:
        print('Something went wrong. Investigate')
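
The helper above only prints a message, so later cells cannot act on the result. A minimal sketch of a variant that returns a boolean is shown here, using `pathlib` from the standard library. The name `file_exists` is hypothetical and is not part of this notebook.

```python
from pathlib import Path


def file_exists(file_name: str) -> bool:
    """Return True if `file_name` is an existing regular file."""
    # is_file() is False for directories and for paths that do not exist
    return Path(file_name).is_file()
```

A later cell could then branch on the result, for example by calling `file_exists('report.html')` after the profile report has been written.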

### Load the data

In [16]:
df = pd.read_csv('shamiri_imputed_dataset.csv', sep=',')
confirm_df_exists(df)

This dataframe exists


In [23]:
profile = ProfileReport(df)
profile.to_file(output_file='report.html')

Summarize dataset: 100%|██████████| 211/211 [01:01<00:00,  3.43it/s, Completed]                       
Generate report structure: 100%|██████████| 1/1 [00:18<00:00, 18.39s/it]
Render HTML: 100%|██████████| 1/1 [00:09<00:00,  9.95s/it]
Export report to file: 100%|██████████| 1/1 [00:00<00:00, 34.52it/s]


In [25]:
df.duplicated().value_counts()

False    658
dtype: int64
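
The output above shows that none of the 658 rows is a complete duplicate of another. `DataFrame.duplicated` compares whole rows by default, so a repeated `ParticipantID` with different answers would not be flagged. The sketch below illustrates the difference on a small made-up frame; `toy` and its values are hypothetical and do not come from the Shamiri dataset.

```python
import pandas as pd

# Hypothetical toy data: participant "b" appears twice with different answers
toy = pd.DataFrame({
    "ParticipantID": ["a", "b", "b"],
    "PHQ1": [0, 1, 2],
})

# Whole-row comparison: no two rows are identical
full_dupes = int(toy.duplicated().sum())

# Comparison on the ID column only: the second "b" is flagged
id_dupes = int(toy.duplicated(subset="ParticipantID").sum())

print(full_dupes, id_dupes)
```

On the real data, the same check would be `df.duplicated(subset='ParticipantID').sum()`, assuming each participant is meant to appear once.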

In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 658 entries, 0 to 657
Data columns (total 33 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   ParticipantID     658 non-null    object 
 1   PHQ1              658 non-null    int64  
 2   PHQ2              658 non-null    int64  
 3   PHQ3              658 non-null    int64  
 4   PHQ4              658 non-null    int64  
 5   PHQ5              658 non-null    int64  
 6   PHQ6              658 non-null    int64  
 7   PHQ7              658 non-null    int64  
 8   PHQ8              658 non-null    int64  
 9   GAD1              658 non-null    int64  
 10  GAD2              658 non-null    int64  
 11  GAD3              658 non-null    int64  
 12  GAD4              658 non-null    int64  
 13  GAD5              658 non-null    int64  
 14  GAD6              658 non-null    int64  
 15  GAD7              658 non-null    int64  
 16  MSSS1             658 non-null    int64  
 1

<div id="exploration"></div>

## Exploration

<div id="summary-of-findings"></div>

## Summary of Findings

<div id="conclusions"></div>

## Conclusions