**0.0 / IMPORTS & INIT**

In [1]:
import pandas as pd
import time
import os
import dash
from jupyter_plotly_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
import matplotlib.pyplot as plt
import plotly.express as px

**1.1 / CSV IMPORTING**

We import the four working databases from CSV files in the "Base" subdirectory of the current working directory with pandas, and store each one as a DataFrame.
We also print progress information, since the import can take a long time.

Caution: when all four bases are imported at once, they are all held in RAM and can take up to 20 GB. If your machine can't handle that, comment out the imports you don't need and work on one base at a time.
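If memory is the bottleneck, pandas can also stream a file in fixed-size chunks through the `chunksize` argument of `read_csv`, so only one chunk sits in RAM at a time. The sketch below is only an illustration under assumptions: the helper `sum_by_category_chunked`, the file `demo_avantage.csv` and its three rows are made up, and summing `avant_montant_ttc` per `categorie` is just an example aggregation. On the real data you would point it at the file in `Base/`.

```python
import os
import tempfile

import pandas as pd


def sum_by_category_chunked(path, chunksize=100_000):
    """Sum avant_montant_ttc per categorie without loading the whole file at once (hypothetical helper)."""
    partials = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    for chunk in pd.read_csv(path, sep=";", usecols=["categorie", "avant_montant_ttc"], chunksize=chunksize):
        partials.append(chunk.groupby("categorie")["avant_montant_ttc"].sum())
    # Combine the per-chunk partial sums into one total per category
    return pd.concat(partials).groupby(level=0).sum()


# Hypothetical sample file standing in for declaration_avantage_*.csv
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_avantage.csv")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("ligne_identifiant;categorie;avant_montant_ttc\n")
    f.write("1;Professionnel;100\n")
    f.write("2;Etudiant;20\n")
    f.write("3;Professionnel;50\n")

# chunksize=2 forces two chunks on this tiny file
totals = sum_by_category_chunked(demo_path, chunksize=2)
print(totals)
```

The trade-off is that each aggregation has to be expressible as a combination of per-chunk results (sums and counts combine easily; medians do not).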

For each DataFrame, we define the columns we are going to work with.

In [2]:
Col_avantage = ['ligne_identifiant', 'denomination_sociale', 'categorie', 'qualite', 'benef_codepostal', 'benef_ville', 'pays', 'benef_titre_libelle', 'benef_speicalite_libelle', 'benef_etablissement_codepostal', 'ligne_type', 'avant_date_signature', 'avant_montant_ttc', 'avant_nature']
Col_convention = ['ligne_identifiant', 'denomination_sociale', 'categorie', 'qualite', 'benef_codepostal', 'benef_ville', 'pays', 'benef_titre_libelle', 'benef_speicalite_libelle', 'benef_etablissement_codepostal', 'ligne_type', 'conv_date_signature']
Col_remuneration = ['entreprise_identifiant', 'denomination_sociale', 'benef_categorie_code', 'qualite', 'benef_codepostal', 'pays', 'benef_titre_libelle', 'benef_speicalite_libelle', 'benef_etablissement_codepostal', 'remu_date', 'remu_montant_ttc']
Col_entreprise = ['identifiant', 'pays', 'secteur', 'code_postal', 'ville']
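Because `read_csv` raises a `ValueError` when a name listed in `usecols` is missing from the file, it can be worth checking each list against the CSV header before launching a long import. A minimal sketch, assuming a hypothetical helper `missing_columns`: it parses only the header line thanks to `nrows=0`, and the in-memory sample stands in for the first line of the real file.

```python
import io

import pandas as pd


def missing_columns(source, wanted, sep=";"):
    """Return the names in `wanted` that are absent from the CSV header of `source` (hypothetical helper)."""
    # nrows=0 parses the header only, so this stays cheap even on multi-GB files
    header = pd.read_csv(source, sep=sep, nrows=0).columns
    return [col for col in wanted if col not in header]


# Stand-in for the header of entreprise_*.csv (comma-separated)
sample = io.StringIO("identifiant,pays,secteur,code_postal,ville\n")
print(missing_columns(sample, ["identifiant", "pays", "secteur", "code_postal", "ville", "adresse"], sep=","))
```

On the real files, something like `missing_columns("Base/entreprise_2020_02_19_04_00.csv", Col_entreprise, sep=",")` should return an empty list when every name is spelled as in the header.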

In [3]:
# Imports all bases and displays progress info
# Note: the last base (Entreprise) uses a comma as separator

start = time.perf_counter() # getting starting timestamp
print('Starting import...\n\n')
print('Importing D_avantage...')
D_avantage = pd.read_csv("Base/declaration_avantage_2020_02_19_04_00.csv", sep = ";", usecols = Col_avantage)
print('D_avantage successfully imported. 3 more to go.')
print('Importing D_Convention...')
D_Convention = pd.read_csv("Base/declaration_convention_2020_02_19_04_00.csv", sep = ";", usecols = Col_convention)
print('D_Convention successfully imported. 2 more to go.')
print('Importing D_Remuneration...')
D_Remuneration = pd.read_csv("Base/declaration_remuneration_2020_02_19_04_00.csv", sep = ";", usecols = Col_remuneration)
print('D_Remuneration successfully imported. 1 more to go.')
print('Importing Entreprise...')
Entreprise = pd.read_csv("Base/entreprise_2020_02_19_04_00.csv", sep = ",", usecols = Col_entreprise)
print('Entreprise successfully imported.')
success = time.perf_counter() # getting ending timestamp
import_time = int(success - start) 
print('All csv successfully imported in %s seconds.'%(import_time))
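The four `read_csv` calls above differ only in file name, separator and column list, so the same import can be driven from a single table. This is a sketch, not the notebook's own code: the `IMPORTS` dict and the `import_all` helper are hypothetical names, and the demo writes two tiny stand-in files to a temporary folder instead of reading `Base/`.

```python
import os
import tempfile
import time

import pandas as pd


def import_all(specs, base_dir):
    """Read each CSV described in `specs` and report progress and total time (hypothetical helper)."""
    frames = {}
    start = time.perf_counter()
    for n, (name, (filename, sep, cols)) in enumerate(specs.items(), start=1):
        print(f"Importing {name}...")
        frames[name] = pd.read_csv(os.path.join(base_dir, filename), sep=sep, usecols=cols)
        print(f"{name} imported ({n}/{len(specs)}).")
    print(f"All csv imported in {time.perf_counter() - start:.1f} seconds.")
    return frames


# Hypothetical stand-in files; on the real data, base_dir would be "Base"
# and IMPORTS would list the four files with Col_avantage, Col_convention, etc.
base_dir = tempfile.mkdtemp()
with open(os.path.join(base_dir, "avantage.csv"), "w") as f:
    f.write("ligne_identifiant;categorie;avant_montant_ttc;autre\n1;A;10;x\n")
with open(os.path.join(base_dir, "entreprise.csv"), "w") as f:
    f.write("identifiant,pays,ville\nABC,FRANCE,PARIS\n")

IMPORTS = {
    "D_avantage": ("avantage.csv", ";", ["ligne_identifiant", "categorie", "avant_montant_ttc"]),
    "Entreprise": ("entreprise.csv", ",", ["identifiant", "pays"]),
}
frames = import_all(IMPORTS, base_dir)
```

Commenting out one entry of the dict is then enough to skip a base, which fits the sequential workflow suggested for machines with less RAM.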


**1.2 / DATAFRAME CLEANING**

We lump together, as a single category, all values that account for less than 0.2% of a dataframe's rows.

In [None]:
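def aggregator3000(df, c, c2):
    # Work on a copy of the two columns so the caller's dataframe is left unchanged
    sub = df[[c, c2]].copy()
    # Normalise text labels in column c (lower-case, no surrounding spaces);
    # non-string values such as NaN are kept as they are
    sub[c] = sub[c].map(lambda v: v.lower().strip() if isinstance(v, str) else v)
    # Mean of column c2 for each distinct label of c
    res = sub.groupby(c).mean()

    return res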
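`aggregator3000` groups by `c` and averages `c2`, but does not by itself lump rare values together as the paragraph above describes. One way to do that step is with `value_counts(normalize=True)`. A minimal sketch, in which the helper name `lump_rare` and the replacement label `"Autre"` are assumptions; the 0.2% threshold is written as `0.002`.

```python
import pandas as pd


def lump_rare(series, threshold=0.002, other="Autre"):
    """Replace every value whose share of rows is below `threshold` by `other` (hypothetical helper)."""
    # Share of rows taken by each distinct value (NaN is excluded by default)
    shares = series.value_counts(normalize=True)
    rare = shares[shares < threshold].index
    # where() keeps the values that are not rare and substitutes `other` for the rest
    return series.where(~series.isin(rare), other)


# Toy column: "C" is 1 row out of 1000, i.e. 0.1% < 0.2%
s = pd.Series(["A"] * 600 + ["B"] * 399 + ["C"])
lumped = lump_rare(s)
print(lumped.value_counts())
```

Applied to a column such as `benef_speicalite_libelle` before calling `aggregator3000`, this would keep the grouped output short enough to plot.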





**DASH**


In [None]:
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div(children=[
    html.H1(children='Transparence Santé'),

    html.Div(children='''
        Visualisation de données à partir de la base de données publique Transparence - Santé
    ''' ),

    dcc.Graph(
        id='example-graph',
        figure={
            'data': [
                {'x': [1, 2, 3], 'y': [1, 1, 2], 'type': 'bar', 'name': ':hap:'},
                {'x': [1, 2, 3], 'y': [2, 4, 5], 'type': 'bar', 'name': ':noel:'},
            ],
            'layout': {
                'title': 'Dash Data Visualization'
            }

        }
    )
])




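if __name__ == '__main__':
    # use_reloader=False: the debug-mode reloader restarts the Python process,
    # which does not work from inside a notebook kernel
    app.run_server(debug=True, use_reloader=False)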
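The bars in the layout above are hard-coded placeholders. Since `dcc.Graph` also accepts a figure given as a plain dict with `data` and `layout` keys, the bars could instead be built from one of the imported DataFrames. A hedged sketch: the helper `bar_figure` is hypothetical, the toy frame stands in for `D_avantage`, and summing `avant_montant_ttc` per `categorie` is only one example of what to plot.

```python
import pandas as pd


def bar_figure(df, x_col, y_col, title):
    """Build a Dash/Plotly figure dict with the sum of `y_col` for each `x_col` (hypothetical helper)."""
    # Total per label, largest first
    totals = df.groupby(x_col)[y_col].sum().sort_values(ascending=False)
    return {
        "data": [{"x": totals.index.tolist(), "y": totals.tolist(), "type": "bar", "name": y_col}],
        "layout": {"title": title},
    }


# Toy stand-in for D_avantage
demo = pd.DataFrame({
    "categorie": ["Professionnel", "Etudiant", "Professionnel"],
    "avant_montant_ttc": [100.0, 20.0, 50.0],
})
fig = bar_figure(demo, "categorie", "avant_montant_ttc", "Total amount per category")
print(fig["data"][0]["x"], fig["data"][0]["y"])
```

In the app, the result would be passed as `figure=bar_figure(D_avantage, 'categorie', 'avant_montant_ttc', ...)` in place of the literal dict.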