Data on employees with disabilities and on the overall workforce are stored in two separate CSV files. The first step is to read both files into pandas dataframes.

In [1]:
import pandas as pd
file_path_1= 'data/raw/bilan-social-d-edf-sa-effectifs-et-repartition-par-age-statut-et-sexe.csv'
file_path_2= 'data/raw/bilan-social-d-edf-sa-salaries-en-situation-de-handicap.csv'
df_age_statut_sexe = pd.read_csv(file_path_1, delimiter=';')
df_handicap=pd.read_csv(file_path_2, delimiter=';')
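
As a minimal sketch of what the semicolon delimiter does, the cell below parses a small in-memory CSV instead of the real files. The rows are invented for illustration; only the column names 'Année', 'Indicateur' and 'Valeur' come from the notebook.

```python
import io
import pandas as pd

# Toy stand-in for one of the raw files; the values are invented.
raw = io.StringIO(
    "Année;Indicateur;Valeur\n"
    "2017;Effectif;100\n"
    "2018;Effectif;120\n"
)

# sep=';' splits on semicolons; delimiter=';' is an accepted alias.
df_demo = pd.read_csv(raw, sep=';')

# Quick sanity checks after loading: shape and column names.
print(df_demo.shape)
print(list(df_demo.columns))
```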

Drop all English-language columns from both dataframes.

In [2]:
df_age_statut_sexe_fr=df_age_statut_sexe.drop(['Spatial perimeter','Indicator', 'Type of contract', 
                         'Employee category', 'Employee subcategory', 'Gender','M3E classification', 
                         'Nationality', 'Seniority', 'Age bracket', 'Unit'], axis=1)

df_handicap_fr=df_handicap.drop(['Spatial perimeter', 'Indicator',
       'Type of contract', 'Employee category', 'Gender', 'Unit'], axis=1)

We keep only the columns of interest in both dataframes.

In [3]:
colonnes_a_conserver=['Année', 'Indicateur', 'Valeur']
df1=df_age_statut_sexe_fr[colonnes_a_conserver]
df2=df_handicap_fr[colonnes_a_conserver]

In the workforce dataframe we keep only the 'Effectif' category of the 'Indicateur' column, and in the handicap dataframe only the 'Salariés en situation de handicap' category of the same column.

In [4]:
df3=df1[df1['Indicateur']=='Effectif']
df4=df2[df2['Indicateur']=='Salariés en situation de handicap']
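
The filtering step can be sketched on a toy dataframe as follows. The values and the 'Autre indicateur' label are hypothetical; the `.copy()` call is an optional addition, not part of the original cells, that makes the result independent of the source dataframe.

```python
import pandas as pd

# Toy data; 'Autre indicateur' is a made-up category used only for the demo.
df_demo = pd.DataFrame({
    'Année': [2017, 2017, 2018],
    'Indicateur': ['Effectif', 'Autre indicateur', 'Effectif'],
    'Valeur': [100, 5, 120],
})

# Boolean mask: True where the row belongs to the category of interest.
mask = df_demo['Indicateur'] == 'Effectif'

# Keep only the matching rows; .copy() avoids chained-assignment warnings
# if the filtered dataframe is modified later.
df_effectif = df_demo[mask].copy()
print(df_effectif)
```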

We now group by year and sum the 'Valeur' column to get the total number of employees per year.

In [7]:
df_grouped = df3.groupby('Année', as_index=False)['Valeur'].sum()
df_grouped.rename(columns={'Valeur': 'Effectif'}, inplace=True)

We do the same for the handicap dataframe.

In [9]:
df_grouped_handicap = df4.groupby('Année', as_index=False)['Valeur'].sum()
df_grouped_handicap.rename(columns={'Valeur': 'Effectif_handicap'}, inplace=True)
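
To illustrate the aggregation, here is a small sketch with invented numbers: several rows per year are collapsed into one total per year, and the summed column is then renamed. It uses the non-inplace form of `rename`, which is equivalent here.

```python
import pandas as pd

# Toy data: two rows for 2017 and one for 2018 (values are invented).
df_demo = pd.DataFrame({
    'Année': [2017, 2017, 2018],
    'Valeur': [60, 40, 120],
})

# as_index=False keeps 'Année' as a regular column instead of the index.
totals = df_demo.groupby('Année', as_index=False)['Valeur'].sum()

# Give the summed column a descriptive name.
totals = totals.rename(columns={'Valeur': 'Effectif'})
print(totals)
```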

We merge the two dataframes on the year and compute the percentage of employees with disabilities relative to the total number of employees.

In [11]:
merged_df = pd.merge(df_grouped, df_grouped_handicap, on='Année', how='outer')
merged_df['Pourcentage']=merged_df['Effectif_handicap']/merged_df['Effectif']*100
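
One consequence of `how='outer'` is worth checking: a year present in only one of the two dataframes is kept, with NaN in the missing column and therefore NaN in the percentage. The sketch below shows this on invented values.

```python
import pandas as pd

# Toy yearly totals; the years deliberately overlap only in 2018.
effectif = pd.DataFrame({'Année': [2017, 2018], 'Effectif': [100, 200]})
handicap = pd.DataFrame({'Année': [2018, 2019], 'Effectif_handicap': [5, 3]})

# The outer join keeps every year found in either dataframe.
merged = pd.merge(effectif, handicap, on='Année', how='outer')
merged['Pourcentage'] = merged['Effectif_handicap'] / merged['Effectif'] * 100

# Only 2018 has both values, so only that row gets a percentage.
print(merged)
print(merged['Pourcentage'].isna().sum(), "year(s) without a percentage")
```

If only years covered by both files should reach the visualization, `how='inner'` or a `dropna` on 'Pourcentage' would be possible alternatives.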

We export the cleaned dataframe to a CSV file, ready to be used in Tableau for visualization.

In [12]:
merged_df.to_csv('Fichier1.csv', index=False)

We now load the web-scraped data for other companies, saved in a CSV file, into a dataframe.

In [13]:
file_path_3= 'data/all/all_entreprises_data_effectif_et_handicap.csv'
df_all = pd.read_csv(file_path_3, delimiter=',')

We group by year, company name ('Perimètre juridique') and the 'Indicateur' column to calculate the total number of employees and of employees with disabilities.

In [None]:
df_all.groupby(['Année', 'Perimètre juridique', 'Indicateur'], as_index=False)['Valeur'].sum()

Unnamed: 0,Année,Perimètre juridique,Indicateur,Valeur
0,2017,EDF,Effectif,328738
1,2017,EDF,Salariés en situation de handicap,2209
2,2018,EDF,Effectif,321659
3,2018,EDF,Salariés en situation de handicap,2277
4,2019,EDF,Effectif,314735
5,2019,EDF,Salariés en situation de handicap,2272
6,2019,ENGIE,Effectif,4045
7,2019,ENGIE,Salariés en situation de handicap,189
8,2019,Orange,Effectif,79774
9,2019,Orange,Salariés en situation de handicap,5247


We pivot the dataframe so that each year and company has one row, with one column per indicator.

In [15]:
df_pivot = df_all.pivot_table(index=['Année', 'Perimètre juridique'], columns='Indicateur', values='Valeur', aggfunc='sum')
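# pivot_table sorts the indicator columns alphabetically: 'Effectif' first,
# then 'Salariés en situation de handicap', so the new names match by position.
df_pivot.columns = ['Effectif', 'Effectif_handicap']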
df_pivot = df_pivot.reset_index()
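
The reshaping can be sketched on the 2019 rows shown in the output above. As a variation on the cell above, this sketch renames the handicap column by name rather than by position, so it does not depend on the column order produced by `pivot_table`.

```python
import pandas as pd

# The 2019 rows from the grouped output above, in long format.
df_long = pd.DataFrame({
    'Année': [2019] * 6,
    'Perimètre juridique': ['EDF', 'EDF', 'ENGIE', 'ENGIE', 'Orange', 'Orange'],
    'Indicateur': ['Effectif', 'Salariés en situation de handicap'] * 3,
    'Valeur': [314735, 2272, 4045, 189, 79774, 5247],
})

# One row per (year, company), one column per indicator.
df_wide = df_long.pivot_table(
    index=['Année', 'Perimètre juridique'],
    columns='Indicateur',
    values='Valeur',
    aggfunc='sum',
)

# Rename by name instead of assigning a positional list of column names.
df_wide = df_wide.rename(
    columns={'Salariés en situation de handicap': 'Effectif_handicap'}
).reset_index()
print(df_wide)
```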

We calculate the percentage of employees with disabilities relative to the total number of employees.

In [17]:
df_pivot['Pourcentage']=df_pivot['Effectif_handicap']/df_pivot['Effectif']*100

We export the dataframe to a CSV file, ready to be used for visualization in Tableau.

In [None]:
df_pivot.to_csv('Fichier2.csv', index=False)