# MAT5014 - Economic research - Suicide rates
*authors: BRAYE Valérien, CHHAY Ly An, ROLLAND Obed* <br>
*language: Python 3* <br>
*dataset: [Suicide Rates Overview 1985 to 2016](https://www.kaggle.com/datasets/russellyates88/suicide-rates-overview-1985-to-2016)*

## Imports and preliminary code

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

The following cell defines useful global variables to configure this notebook.
- `DATAPATH`: path to the directory where the datasets are stored
- `SAVEFIGS`: whether to save the figures to disk. Saved figures are exported without titles so that they can be included in the report
- `FIGPATH`: path where the figures will be saved if `SAVEFIGS` is set to `True`.

In [2]:
DATAPATH = "./data/"
SAVEFIGS = False
FIGPATH = "./report/figures/"
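How `SAVEFIGS` and `FIGPATH` are consumed is not shown in this part of the notebook. As a minimal sketch, assuming a hypothetical helper named `finalize_figure` and PDF output (both are illustrative choices, not taken from the notebook), the switch could work as follows:

```python
from pathlib import Path

# Same defaults as the configuration cell above.
SAVEFIGS = False
FIGPATH = "./report/figures/"


def finalize_figure(fig, name, title, save=SAVEFIGS, figpath=FIGPATH):
    """Hypothetical helper: save an untitled figure, or add a title for display.

    `fig` is expected to behave like a matplotlib Figure
    (it must provide `savefig` and `suptitle`).
    """
    if save:
        out_dir = Path(figpath)
        out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
        target = out_dir / f"{name}.pdf"            # file format is an assumption
        fig.savefig(target, bbox_inches="tight")    # no title: the report adds a caption
        return target
    fig.suptitle(title)  # title only for on-screen display
    return None
```

With a figure created through `fig, ax = plt.subplots()`, calling `finalize_figure(fig, "some_name", "Some title")` would either set the title for display in the notebook or write an untitled file under `FIGPATH`, depending on `SAVEFIGS`.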

In [3]:
"""
Publication ready pyplot theme
Source: https://github.com/matplotlib/matplotlib/issues/19028
"""

plot_settings = {'ytick.labelsize': 16,
                 'xtick.labelsize': 16,
                 'font.size': 22,
                 'figure.figsize': (10, 5),
                 'axes.titlesize': 22,
                 'axes.labelsize': 18,
                 'lines.linewidth': 2,
                 'lines.markersize': 3,
                 'legend.fontsize': 11,
                 'mathtext.fontset': 'stix',
                 'font.family': 'STIXGeneral'}
plt.style.use(plot_settings)

In [4]:
import random


def print_list(l: list, title: str = None, bullet: str = "•", max_items: int = None, randomize: bool = False, indent_level: int = 0) -> None:
    """
    Prints a list nicely. Handles sublists. Tuples are displayed as single items: ('e1', 'e2', ...).
    Parameters:
        • l: list (or other sequence, e.g. a pandas Index) to print
        • title: title of the list
        • bullet: bullet point to use
        • max_items: maximum number of items to print
        • randomize: pick random elements or not
        • indent_level: indentation level for sub-lists
    """
    if title:
        print(f"{indent_level * '  '}{title}:")
        indent_level += 1
    # Work on a plain list copy so that shuffling never mutates the input
    # and immutable sequences are supported.
    final_list_to_print = list(l)
    if randomize:
        random.shuffle(final_list_to_print)
    if max_items is not None:
        # Truncate the (possibly shuffled) copy, not the original list.
        final_list_to_print = final_list_to_print[:max_items]
    for item in final_list_to_print:
        if isinstance(item, list):
            print_list(item, bullet=bullet, indent_level=indent_level+1)
        else:
            # Tuples and scalars are printed on a single line.
            print(f"{indent_level * '  '}{bullet} {item}")
    if len(final_list_to_print) != len(l):
        print(f"{indent_level * '  '}...")

## Exploratory Data Analysis

In [5]:
suicides = pd.read_csv(DATAPATH + "master.csv")

In [6]:
print(f"Dataset dimensions: {suicides.shape[0]} × {suicides.shape[1]}")

Dataset dimensions: 27820 × 12


In [7]:
print_list(suicides.columns, "Variables in dataset")

Variables in dataset:
  • country
  • year
  • sex
  • age
  • suicides_no
  • population
  • suicides/100k pop
  • country-year
  • HDI for year
  •  gdp_for_year ($) 
  • gdp_per_capita ($)
  • generation
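
The listing shows that the column names mix spaces, slashes, hyphens and currency markers, and that ` gdp_for_year ($) ` carries leading and trailing whitespace. A minimal sketch of one way to normalize them, assuming a hypothetical helper `clean_column_name` and a `_usd` suffix convention that are not part of the original notebook:

```python
import re

# Column names exactly as printed above (note the padded " gdp_for_year ($) ").
raw_columns = [
    "country", "year", "sex", "age", "suicides_no", "population",
    "suicides/100k pop", "country-year", "HDI for year",
    " gdp_for_year ($) ", "gdp_per_capita ($)", "generation",
]


def clean_column_name(name: str) -> str:
    """Hypothetical helper: turn a raw column name into snake_case."""
    name = name.strip().lower()            # drop padding, normalize case
    name = name.replace("($)", "usd")      # assumed naming convention for the currency
    name = re.sub(r"[^0-9a-z]+", "_", name)  # any other separator becomes "_"
    return name.strip("_")


# Mapping from raw to cleaned names, for inspection before renaming.
renamed = {col: clean_column_name(col) for col in raw_columns}
for old, new in renamed.items():
    print(f"{old!r} -> {new!r}")
```

On the DataFrame itself, the same function could be applied with `suicides = suicides.rename(columns=clean_column_name)`.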
