<a id = 'contents'></a>
# Setup

## Table of contents

- [Sys library](#sys)
- [Scikit-Learn library](#sk)
- [NumPy library](#np)
- [Os library](#os)
- [Pandas library](#pd)
- [Matplotlib library](#mt)
- [Plotly library](#pl)
- [Path definitions](#path)
- [User defined functions](#user)

In [2]:
# Uncomment the lines below to install any missing packages
# !pip install scikit-learn
# !pip install numpy 
# !pip install pandas 
# !pip install matplotlib 
# !pip install plotly
# !pip install -U kaleido
# !pip install tqdm



In [1]:
# Suppress future warnings for visualization
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
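
The effect of this filter can be checked in isolation. The sketch below is only an illustration, with made-up warning messages: it records warnings inside a `catch_warnings` block and shows that the `FutureWarning` is dropped while other categories still get through.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    # Start from a known state: report every warning...
    warnings.simplefilter("always")
    # ...then apply the same filter used in the notebook
    warnings.simplefilter(action="ignore", category=FutureWarning)

    warnings.warn("this API will change", FutureWarning)  # illustrative message
    warnings.warn("something else", UserWarning)          # illustrative message

# Only the UserWarning is recorded
caught_categories = [w.category for w in caught]
```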

<a id = 'sys'></a>
# Sys library 

`sys` provides functions and variables for inspecting and manipulating the Python runtime environment.

I use `sys` to ensure that the user's Python version is 3.5 or greater.

In [2]:
import sys
assert sys.version_info >= (3, 5)
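
`sys.version_info` behaves like a tuple, so the comparison is made element by element. A minimal sketch of the same check with a readable error message follows; the helper name `require_python` is hypothetical and not part of the notebook.

```python
import sys

def require_python(minimum):
    """Raise RuntimeError if the interpreter is older than `minimum` (hypothetical helper)."""
    if sys.version_info < minimum:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in minimum)
        raise RuntimeError(f"Python {wanted}+ required, found {found}")
    return True

python_ok = require_python((3, 5))
```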

[Return to contents](#contents)

<a id = 'sk'></a>
# Scikit-Learn

The `Scikit-Learn` library provides implementations of most common clustering algorithms and machine learning techniques.

`Scikit-Learn` interoperates with `NumPy`.

Here too we check that the required version is installed.

In [3]:
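import sklearn
# Compare the version numerically: as strings, "0.9" would rank above "0.20"
assert tuple(int(p) for p in sklearn.__version__.split(".")[:2]) >= (0, 20)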
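
Version strings compare character by character, so a numeric comparison is the safer way to check a minimum version. A small sketch of the difference follows; `version_tuple` is a hypothetical helper and assumes the leading version components are plain integers.

```python
def version_tuple(version, parts=2):
    """Turn '0.20.3' into (0, 20); assumes numeric leading components (hypothetical helper)."""
    return tuple(int(piece) for piece in version.split(".")[:parts])

# Lexicographic string comparison ranks "0.9" above "0.20"...
string_check = "0.9" >= "0.20"

# ...while the numeric comparison ranks it below
tuple_check = version_tuple("0.9") >= version_tuple("0.20")
```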

[Return to contents](#contents)

<a id = 'np'></a>
# NumPy library

`NumPy` is a numerical and scientific library that greatly simplifies linear algebra computations.

It provides computationally efficient foundations for computing with arrays to a large number of other popular libraries, including `Pandas`, `Matplotlib` and `Scikit-Learn`.

In [4]:
import numpy as np

[Return to contents](#contents)

<a id = 'os'></a>
# Os library

The `os` library is used to work with files, directories and paths.

In [5]:
import os

[Return to contents](#contents)

<a id = 'pd'></a>
# Pandas

`Pandas` is a library for data manipulation and analysis.

The main classes in `Pandas` are `Series` and `DataFrame`.

A `Series` is a sequence with a user-defined index, while a `DataFrame` is analogous to a relational table.

We can think of a `DataFrame` as a dictionary of `Series`.

In [6]:
import pandas as pd
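
The "dictionary of `Series`" view can be sketched as follows. The names and values are purely illustrative and do not come from the project data.

```python
import pandas as pd

# Two Series with user-defined indexes (illustrative data only)
age = pd.Series([31, 45], index=["anna", "luca"])
city = pd.Series(["Rome", "Milan"], index=["anna", "luca"])

# A DataFrame built from a dictionary of Series: the keys become column names
people = pd.DataFrame({"age": age, "city": city})

# Selecting a column gives back a Series
age_column = people["age"]
```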

[Return to contents](#contents)

<a id = 'mt'></a>
# Matplotlib

The `Matplotlib` library lets us draw two-dimensional graphical representations called plots, such as:
- Scatter plots
- Line plots

Plots can be exported to various file formats, including PDF, PNG and JPEG.

`%matplotlib inline` shows the resulting plots below the code cell in the notebook.

`matplotlib.pyplot` is a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, and so on.

With `matplotlib.rc` we set our default styles for every plot element that we create.

In [7]:
%matplotlib inline 
import matplotlib as mpl 
import matplotlib.pyplot as plt

# matplotlib.rc
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

[Return to contents](#contents)

<a id = 'pl'></a>
# Plotly library 

`Plotly` is an interactive, open-source plotting library.

The `plotly.express` module contains functions that can create entire figures at once.

In [8]:
import plotly.express as px

[Return to contents](#contents)

<a id = 'path'></a>
# Path definitions

Here I define the paths used for storing data.

`PROJECT_ROOT_DIR = "../"` points one level up from the current path, i.e. from the script folder to the general project folder.

With `os.path.join` I create the paths for storing configurations, data, results and images by joining the project folder path `PROJECT_ROOT_DIR` with the corresponding folder name.

`os.path.join` inserts the path separator of the operating system (`"/"` on Linux and macOS) between the joined strings where needed, so the result respects the path syntax.

`os.makedirs` creates directories recursively. If the target directory already exists, the `exist_ok=True` parameter leaves it unaltered instead of raising an error.

In [9]:
PROJECT_ROOT_DIR = "../"

DATA_PATH = os.path.join(PROJECT_ROOT_DIR, "data")
os.makedirs(DATA_PATH, exist_ok=True)
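
The same pattern can be tried without touching the project folder. In the sketch below a temporary directory stands in for the project root, and the `results` folder name is an assumption used only for illustration.

```python
import os
import tempfile

# A temporary directory stands in for the project root (assumption for this sketch)
with tempfile.TemporaryDirectory() as project_root:
    results_path = os.path.join(project_root, "results")

    # Calling makedirs twice is safe because of exist_ok=True
    os.makedirs(results_path, exist_ok=True)
    os.makedirs(results_path, exist_ok=True)

    created = os.path.isdir(results_path)

# os.path.join adds a separator only where one is missing
joined = os.path.join("../", "data")
```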

[Return to contents](#contents)

<a id = 'user'></a>
# User defined functions

## Display_side_by_side

Displays two or more dataframes side by side.

`IPython.display` is the module that provides display tools in IPython.

`display_html` displays the HTML representation of an object.

`itertools` is a library of functions that create iterators for looping.

`chain` joins iterators, e.g. _chain('ABC', 'DEF') --> A B C D E F_

`cycle` generates an infinite iterator, e.g. _cycle('ABCD') --> A B C D A B C D ..._

Parameters:

- **args**: the desired dataframes.

- **titles**: the desired `list` of `string` titles.

`html_str` defines the style, position and format of the titles and dataframes: `<br>` produces a line break, `<th style="text-align:center">` sets the horizontal alignment of the text, `<td style="vertical-align:top">` aligns the content to the top (these two are used for positioning), `<h2>` renders the text as a heading (used for titles), and `df.to_html()` renders a DataFrame as an HTML table.

In [13]:
from IPython.display import display_html
from itertools import chain, cycle

def display_side_by_side(*args, titles=cycle([''])):
    html_str = ''
    # Pair each dataframe with its title; once the titles run out, pad with '</br>'
    for df, title in zip(args, chain(titles, cycle(['</br>']))):
        html_str += '<th style="text-align:center"><td style="vertical-align:top">'
        html_str += f'<h2>{title}</h2>'
        # Render the table inline so consecutive dataframes sit next to each other
        html_str += df.to_html().replace('table', 'table style="display:inline"')
        html_str += '</td></th>'
    display_html(html_str, raw=True)
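
The way titles are paired with dataframes can be checked without IPython. In this sketch plain strings stand in for the dataframes (an assumption made to keep the example self-contained), and one title fewer than the number of frames is passed on purpose.

```python
from itertools import chain, cycle

frames = ["df_a", "df_b", "df_c"]  # placeholders standing in for DataFrames
titles = ["First", "Second"]       # one title fewer than frames

# chain() yields the given titles first, then cycle() supplies the filler forever;
# zip() stops as soon as the finite input (frames) is exhausted
pairs = list(zip(frames, chain(titles, cycle(["</br>"]))))
```

The third frame receives the `'</br>'` filler in place of a title, which is how the function copes with a `titles` list shorter than the number of dataframes.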

## Interpreter

Since it doesn't make sense to create a JSON dictionary file just to convert one binary variable into strings for interpreting the results, I implement a short function instead.

In [14]:
def interpreter(b):
    # Convert binary labels to names; values other than 0 or 1 are skipped
    obj = []
    for a in b:
        if a == 0:
            obj.append('Worker')
        elif a == 1:
            obj.append('Smartworker')
    return obj
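
The same conversion can also be written with an in-memory dictionary, which avoids the external JSON file while keeping the mapping in one place. This is only a sketch of an alternative; the name `interpret_labels` is hypothetical, and it assumes the labels are hashable values such as integers.

```python
def interpret_labels(values):
    """Map 0 -> 'Worker' and 1 -> 'Smartworker', skipping any other value."""
    names = {0: "Worker", 1: "Smartworker"}
    return [names[v] for v in values if v in names]

# The value 2 has no label, so it is left out of the result
decoded = interpret_labels([0, 1, 1, 2, 0])
```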

[Return to contents](#contents)