Turnover (employee attrition) is a major problem for companies. Whenever an employee leaves a job, the company loses money and time on new interviews and on training the replacement, not to mention the lost productivity in the affected department. Many factors can lead an employee to leave, including better opportunities, a poor organizational climate, bad managers, and poor work-life balance.

To understand which characteristics lead an employee to stay at or leave a technology company, the company's HR department recorded information on 1470 employees who left or remained at the company over the past year. The survey produced 19 candidate factors that may explain turnover behavior; they are available in the file Base_RH.xlsx. For a description of each factor, see the metadata table on the Metadados sheet.

Based on this, HR commissioned a study from the area's data analyst to answer the following question:

**Which company policies/factors should change in order to minimize turnover?**

As a good data analyst, you know that answering this question requires a thorough exploratory data analysis and an assessment of the association between turnover and each of the factors.
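One simple way to start looking at the association between turnover and a categorical factor is to compare the share of leavers within each category. The sketch below is illustrative only: `turnover_rate_by` is a hypothetical helper, and the toy data is made up (it is not taken from Base_RH.xlsx). The column names follow the cleaned names used later in this notebook.

```python
import pandas as pd

def turnover_rate_by(df, factor, target="funcionario_deixou_a_empresa", positive="Sim"):
    """Return the percentage of leavers within each category of `factor`."""
    left = df[target].eq(positive)            # True where the employee left
    rates = left.groupby(df[factor]).mean()   # share of leavers per category
    return rates.mul(100).round(2).sort_values(ascending=False)

# Made-up toy data, for illustration only
toy = pd.DataFrame({
    "faz_hora_extras?": ["Sim", "Sim", "Sim", "Sim", "Não", "Não", "Não", "Não"],
    "funcionario_deixou_a_empresa": ["Sim", "Sim", "Não", "Não", "Sim", "Não", "Não", "Não"],
})

rates = turnover_rate_by(toy, "faz_hora_extras?")
print(rates)
```

A large gap between categories suggests a factor worth a closer look; it does not by itself establish that the factor causes turnover.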

In [1]:
# Importing libraries
import pandas as pd
from unicodedata import normalize
import plotly.express as px
import plotly.graph_objs as go
import plotly.figure_factory as ff
import numpy as np

In [2]:
colors = ['#19d4b5', '#c6dcd6', '#2c5c3c', '#78e4d0', '#14ccab', '#77797a']

In [3]:
# Loading the dataset
base_rh = pd.read_excel("/content/drive/MyDrive/Portfolio/Base_RH.xlsx", sheet_name="Base", skiprows=6)

In [4]:
# Variable information
base_rh.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 21 columns):
 #   Column                        Non-Null Count  Dtype 
---  ------                        --------------  ----- 
 0   ID                            1470 non-null   int64 
 1   Funcionário_deixou_a_empresa  1470 non-null   object
 2   Idade                         1470 non-null   int64 
 3   Frequência de Viagens         1470 non-null   object
 4   Distância_do_trabalho         1470 non-null   int64 
 5   Formação                      1470 non-null   object
 6   E-Sat                         1470 non-null   object
 7   Gênero                        1470 non-null   object
 8   Estado_Civil                  1470 non-null   object
 9   Salário                       1470 non-null   int64 
 10  Qte_Empresas_Trabalhadas      1470 non-null   int64 
 11  Faz_hora_extras?              1470 non-null   object
 12  Perc_de_aumento               1470 non-null   int64 
 13  Qte_ações_da_empre

In [5]:
# Dropping the ID column
base_rh.drop(['ID'], axis=1, inplace=True)

In [6]:
# Converting column names to lowercase
colunas = base_rh.columns.str.lower().values

In [7]:
# Removing accents and replacing spaces with underscores
for i, c in enumerate(colunas):
  colunas[i] = normalize('NFKD', c).encode('ASCII','ignore').decode('ASCII')
  colunas[i] = colunas[i].replace(' ', '_')

In [8]:
base_rh.columns = colunas
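
The renaming steps in the cells above can also be wrapped in one small function. The sketch below is an illustration, not part of the original notebook: `slugify_column` is a hypothetical name, and it relies only on the standard library.

```python
from unicodedata import normalize

def slugify_column(name):
    """Lowercase a column name, strip accents, and replace spaces with underscores."""
    lowered = name.lower()
    # NFKD splits an accented letter into a base letter plus a combining mark;
    # encoding to ASCII with errors="ignore" then drops the combining marks.
    ascii_only = normalize("NFKD", lowered).encode("ASCII", "ignore").decode("ASCII")
    return ascii_only.replace(" ", "_")

original = ["Frequência de Viagens", "Qte_ações_da_empresa", "E-Sat", "Faz_hora_extras?"]
renamed = [slugify_column(c) for c in original]
print(renamed)
```

Characters such as `-` and `?` are left untouched, which is why the notebook refers to columns named `e-sat` and `faz_hora_extras?`. On a DataFrame, the same function could be applied as `base_rh.columns = [slugify_column(c) for c in base_rh.columns]`.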

In [9]:
# Summary statistics for the numeric variables
base_rh.describe()

Unnamed: 0,idade,distancia_do_trabalho,salario,qte_empresas_trabalhadas,perc_de_aumento,qte_acoes_da_empresa,tempo_de_carreira,horas_de_treinamento,tempo_de_empresa,anos_no_mesmo_cargo,anos_desde_a_ultima_promocao,anos_com_o_mesmo_chefe
count,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0,1470.0
mean,36.92381,9.192517,6502.931293,2.693197,15.209524,0.793878,11.279592,2.79932,7.008163,4.229252,2.187755,4.123129
std,9.135373,8.106864,4707.956783,2.498009,3.659938,0.852077,7.780782,1.289271,6.126525,3.623137,3.22243,3.568136
min,18.0,1.0,1009.0,0.0,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,30.0,2.0,2911.0,1.0,12.0,0.0,6.0,2.0,3.0,2.0,0.0,2.0
50%,36.0,7.0,4919.0,2.0,14.0,1.0,10.0,3.0,5.0,3.0,1.0,3.0
75%,43.0,14.0,8379.0,4.0,18.0,1.0,15.0,3.0,9.0,7.0,3.0,7.0
max,60.0,29.0,19999.0,9.0,25.0,3.0,40.0,6.0,40.0,18.0,15.0,17.0


In [10]:
freq_abs = base_rh['funcionario_deixou_a_empresa'].value_counts().reset_index()
freq_abs.columns = ['funcionario_deixou_a_empresa', 'Qtde']
freq_rel = (base_rh['funcionario_deixou_a_empresa'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['funcionario_deixou_a_empresa', '%']
funcionario_deixou_a_empresa = pd.merge(freq_abs, freq_rel, how='left')
funcionario_deixou_a_empresa

Unnamed: 0,funcionario_deixou_a_empresa,Qtde,%
0,Não,1233,83.88
1,Sim,237,16.12


16.1% of the sample left the company.
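
The absolute/relative frequency pattern from the cell above is repeated for most categorical columns in this notebook. A minimal sketch of a reusable helper follows; `freq_table` is a hypothetical name, and the toy data is made up for illustration. It builds the table directly from `value_counts()`, which avoids depending on the column names that `reset_index()` produces.

```python
import pandas as pd

def freq_table(df, column):
    """Build a table with absolute (Qtde) and relative (%) frequencies for one column."""
    counts = df[column].value_counts()  # sorted from most to least frequent, NaN excluded
    return pd.DataFrame({
        column: counts.index,
        "Qtde": counts.to_numpy(),
        "%": (counts / counts.sum() * 100).round(2).to_numpy(),
    })

# Made-up toy data, for illustration only
toy = pd.DataFrame({"funcionario_deixou_a_empresa": ["Não"] * 5 + ["Sim"]})
table = freq_table(toy, "funcionario_deixou_a_empresa")
print(table)
```

With the real data, a call such as `freq_table(base_rh, 'genero')` would stand in for the four-line pattern used per column.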

In [11]:
fig = px.pie(funcionario_deixou_a_empresa, values='%', names='funcionario_deixou_a_empresa', color_discrete_sequence =colors)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Funcionário deixou a empresa",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ),
    legend=dict(
     orientation="h",
    yanchor="bottom",
    y=-0.1,
    xanchor="center",
    x=0.50
))
fig.show()
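
The layout arguments used in the chart above recur in nearly every chart in this notebook. One way to cut the repetition, sketched here under the assumption that the same styling is wanted everywhere, is a function that returns the layout as a plain dictionary. `base_layout` and its `horizontal_legend` parameter are hypothetical names, and the sketch groups the font settings into nested dictionaries instead of the separate `font_family`/`font_color` arguments.

```python
def base_layout(title, horizontal_legend=False):
    """Return the shared layout settings as a dict for fig.update_layout(**...)."""
    layout = dict(
        separators=",.",                      # decimal comma, thousands dot
        paper_bgcolor="rgba(0,0,0,0)",        # transparent background
        plot_bgcolor="rgba(0,0,0,0)",
        font=dict(family="Arial", size=20, color="black"),
        title=dict(
            text=title, y=1, x=0.5, xanchor="center", yanchor="top",
            font=dict(family="Arial", color="black"),
        ),
        legend_title_font_color="white",
        hoverlabel=dict(bgcolor="white", font_size=16, font_family="Arial"),
    )
    if horizontal_legend:
        # Legend centered below the chart, as used for the pie charts
        layout["legend"] = dict(orientation="h", yanchor="bottom", y=-0.1,
                                xanchor="center", x=0.5)
    return layout

pie_layout = base_layout("Gênero", horizontal_legend=True)
bar_layout = base_layout("Formação")
# Usage with any Plotly figure `fig`:
# fig.update_layout(**pie_layout)
```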

In [12]:
freq_abs = base_rh['frequencia_de_viagens'].value_counts().reset_index()
freq_abs.columns = ['frequencia_de_viagens', 'Qtde']
freq_rel = (base_rh['frequencia_de_viagens'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['frequencia_de_viagens', '%']
frequencia_de_viagens = pd.merge(freq_abs, freq_rel, how='left')

70.95% of employees rarely travel, and 10.2% do not travel at all.

In [13]:
fig = px.bar(frequencia_de_viagens, x='%', y='frequencia_de_viagens', color_discrete_sequence =colors, text='%')
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Frequência de viagens",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_traces(textposition='outside')
fig.update_yaxes(title='')
fig.show()

In [14]:
freq_abs = base_rh['formacao'].value_counts().reset_index()
freq_abs.columns = ['formacao', 'Qtde']
freq_rel = (base_rh['formacao'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['formacao', '%']
formacao = pd.merge(freq_abs, freq_rel, how='left')

More than 60% of the company's employees have a higher-education degree, a master's degree, or a doctorate.

In [15]:
fig = px.bar(formacao, x='%', y='formacao', color_discrete_sequence =colors, text = '%')
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Formação",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.update_traces(textposition='outside')
fig.show()

In [16]:
freq_abs = base_rh['e-sat'].value_counts().reset_index()
freq_abs.columns = ['e-sat', 'Qtde']
freq_rel = (base_rh['e-sat'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['e-sat', '%']
e_sat = pd.merge(freq_abs, freq_rel, how='left')


61% of employees report a high or very high satisfaction level.

In [17]:
fig = px.bar(e_sat, x='%', y='e-sat', color_discrete_sequence =colors, text='%')
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "e-sat",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.update_traces(textposition='outside')
fig.show()

In [18]:
freq_abs = base_rh['genero'].value_counts().reset_index()
freq_abs.columns = ['genero', 'Qtde']
freq_rel = (base_rh['genero'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['genero', '%']
genero = pd.merge(freq_abs, freq_rel, how='left')


60% of employees are men.

In [19]:
fig = px.pie(genero, values='%', names='genero', color_discrete_sequence =colors)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Gênero",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ),
    legend=dict(
     orientation="h",
    yanchor="bottom",
    y=-0.1,
    xanchor="center",
    x=0.50
))
fig.show()

In [20]:
freq_abs = base_rh['estado_civil'].value_counts().reset_index()
freq_abs.columns = ['estado_civil', 'Qtde']
freq_rel = (base_rh['estado_civil'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['estado_civil', '%']
estado_civil = pd.merge(freq_abs, freq_rel, how='left')

45.8% of employees are married.

In [21]:
fig = px.pie(estado_civil, values='%', names='estado_civil', color_discrete_sequence =colors)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Estado civil",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ),
    legend=dict(
     orientation="h",
    yanchor="bottom",
    y=-0.1,
    xanchor="center",
    x=0.50
))
fig.show()

In [22]:
freq_abs = base_rh['faz_hora_extras?'].value_counts().reset_index()
freq_abs.columns = ['faz_hora_extras?', 'Qtde']
freq_rel = (base_rh['faz_hora_extras?'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faz_hora_extras?', '%']
hora_extras = pd.merge(freq_abs, freq_rel, how='left')

More than 70% of employees do not work overtime.

In [23]:
fig = px.pie(hora_extras, values='%', names='faz_hora_extras?', color_discrete_sequence =colors)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Faz horas extras?",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ),
    legend=dict(
     orientation="h",
    yanchor="bottom",
    y=-0.1,
    xanchor="center",
    x=0.50
))
fig.show()

In [24]:
freq_abs = base_rh['equilibrio_de_vida'].value_counts().reset_index()
freq_abs.columns = ['equilibrio_de_vida', 'Qtde']
freq_rel = (base_rh['equilibrio_de_vida'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['equilibrio_de_vida', '%']
equilibrio_de_vida = pd.merge(freq_abs, freq_rel, how='left')

84% rate their work-life balance as good or very good.

In [25]:
fig = px.bar(equilibrio_de_vida, x='%', y='equilibrio_de_vida', color_discrete_sequence =colors, text = "%")
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Equilíbrio de vida",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.update_traces(textposition='outside')
fig.show()

Most employees are between 25 and 44 years old.

In [26]:
fig = px.box(base_rh, x="idade", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Idade",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["idade"].quantile([0,0.25,0.5,0.75,1]).values):
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_yaxes(title='')
fig.show()
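
The five values annotated on the box plots (minimum, quartiles, and maximum) can be computed for a single named column with a small helper. This is a sketch only: `five_number_summary` is a hypothetical name, and the toy ages are made up for illustration.

```python
import pandas as pd

def five_number_summary(series):
    """Return [(label, value), ...] for min, Q1, median, Q3, and max of a numeric Series."""
    labels = ["min", "q1", "med", "q3", "max"]
    values = series.quantile([0, 0.25, 0.5, 0.75, 1]).tolist()
    return list(zip(labels, values))

# Made-up toy data, for illustration only
toy_ages = pd.Series([20, 30, 40, 50, 60], name="idade")
summary = five_number_summary(toy_ages)
print(summary)
```

Each `(label, value)` pair can then be passed to `fig.add_annotation` in the same way as in the loops used throughout this notebook.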





In [27]:
fig = px.histogram(base_rh, x="idade", color_discrete_sequence = [colors[0]], nbins=10, text_auto=True)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Idade",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [28]:
# Creating age bands
base_rh['faixa_etaria'] = pd.cut(base_rh.idade, bins=10)

In [29]:
base_rh["faixa_etaria"] = base_rh["faixa_etaria"].astype(object)

In [30]:
freq_abs = base_rh['faixa_etaria'].value_counts().reset_index()
freq_abs.columns = ['faixa_etaria', 'Qtde']
freq_rel = (base_rh['faixa_etaria'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faixa_etaria', '%']
faixa_etaria = pd.merge(freq_abs, freq_rel, how='left')
faixa_etaria

Unnamed: 0,faixa_etaria,Qtde,%
0,"(34.8, 39.0]",297,20.2
1,"(30.6, 34.8]",265,18.03
2,"(26.4, 30.6]",224,15.24
3,"(39.0, 43.2]",175,11.9
4,"(43.2, 47.4]",131,8.91
5,"(22.2, 26.4]",105,7.14
6,"(47.4, 51.6]",92,6.26
7,"(51.6, 55.8]",77,5.24
8,"(17.958, 22.2]",57,3.88
9,"(55.8, 60.0]",47,3.2
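
Because `bins=10` splits the age range into equal-width intervals, the resulting bands have fractional edges such as (34.8, 39.0]. An alternative is to pass explicit edges and labels to `pd.cut`. The edges and labels below are assumptions chosen for readability, not values from the original analysis, and the ages are made up.

```python
import pandas as pd

# Made-up toy data, for illustration only
ages = pd.Series([18, 25, 34, 35, 60], name="idade")

# Hypothetical band edges; intervals are right-closed by default: (17, 24], (24, 34], ...
edges = [17, 24, 34, 44, 54, 60]
labels = ["18-24", "25-34", "35-44", "45-54", "55-60"]

bands = pd.cut(ages, bins=edges, labels=labels)

# sort_index() orders the counts by band rather than by frequency
counts = bands.value_counts().sort_index()
print(counts)
```

Sorting by band keeps the frequency table in age order, which tends to be easier to read than the frequency-sorted table above.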


In [31]:
freq_abs = base_rh['distancia_do_trabalho'].value_counts().reset_index()
freq_abs.columns = ['distancia_do_trabalho', 'Qtde']
freq_rel = (base_rh['distancia_do_trabalho'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['distancia_do_trabalho', '%']
distancia_do_trabalho = pd.merge(freq_abs, freq_rel, how='left')
distancia_do_trabalho.sort_values(by=['distancia_do_trabalho'])

Unnamed: 0,distancia_do_trabalho,Qtde,%
1,1,208,14.15
0,2,211,14.35
4,3,84,5.71
8,4,64,4.35
7,5,65,4.42
9,6,59,4.01
5,7,84,5.71
6,8,80,5.44
3,9,85,5.78
2,10,86,5.85


Most employees live within 10 km of work.

In [32]:
fig = px.histogram(base_rh, x="distancia_do_trabalho", color_discrete_sequence = [colors[0]], nbins=10, text_auto=True)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Distância do trabalho",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [33]:
fig = px.box(base_rh, x="distancia_do_trabalho", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Distância do trabalho",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
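# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["distancia_do_trabalho"].quantile([0,0.25,0.5,0.75,1]).values):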
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False, 
        textangle=-45
        )
fig.update_xaxes(title='')
fig.show()





In [34]:
freq_abs = base_rh['salario'].value_counts().reset_index()
freq_abs.columns = ['salario', 'Qtde']
freq_rel = (base_rh['salario'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['salario', '%']
salario = pd.merge(freq_abs, freq_rel, how='left')

50% of employees earn up to R$ 4,919 (the median salary).



In [35]:
fig = px.box(base_rh, x="salario", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Salário",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["salario"].quantile([0,0.25,0.5,0.75,1]).values):
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_xaxes(title='')
fig.show()





In [36]:
fig = px.histogram(base_rh, x="salario", color_discrete_sequence = [colors[0]], nbins=20, text_auto=True)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Salário",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [37]:
# Creating salary bands
base_rh['faixas_salario'] = pd.cut(base_rh.salario, bins=10)

In [38]:
base_rh["faixas_salario"] = base_rh["faixas_salario"].astype(object)

In [39]:
freq_abs = base_rh['faixas_salario'].value_counts().reset_index()
freq_abs.columns = ['faixas_salario', 'Qtde']
freq_rel = (base_rh['faixas_salario'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faixas_salario', '%']
faixas_salario = pd.merge(freq_abs, freq_rel, how='left')
faixas_salario

Unnamed: 0,faixas_salario,Qtde,%
0,"(990.01, 2908.0]",365,24.83
1,"(2908.0, 4807.0]",349,23.74
2,"(4807.0, 6706.0]",290,19.73
3,"(8605.0, 10504.0]",110,7.48
4,"(6706.0, 8605.0]",109,7.41
5,"(18100.0, 19999.0]",67,4.56
6,"(10504.0, 12403.0]",56,3.81
7,"(16201.0, 18100.0]",54,3.67
8,"(12403.0, 14302.0]",52,3.54
9,"(14302.0, 16201.0]",18,1.22


In [40]:
freq_abs = base_rh['qte_empresas_trabalhadas'].value_counts().reset_index()
freq_abs.columns = ['qte_empresas_trabalhadas', 'Qtde']
freq_rel = (base_rh['qte_empresas_trabalhadas'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['qte_empresas_trabalhadas', '%']
qte_empresas_trabalhadas = pd.merge(freq_abs, freq_rel, how='left')
qte_empresas_trabalhadas

Unnamed: 0,qte_empresas_trabalhadas,Qtde,%
0,1,521,35.44
1,0,197,13.4
2,3,159,10.82
3,2,146,9.93
4,4,139,9.46
5,7,74,5.03
6,6,70,4.76
7,5,63,4.29
8,9,52,3.54
9,8,49,3.33


In [41]:
fig = px.box(base_rh, x="qte_empresas_trabalhadas", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Quantidade de empresas trabalhadas",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
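# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["qte_empresas_trabalhadas"].quantile([0,0.25,0.5,0.75,1]).values):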
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_xaxes(title='')
fig.show()





In [42]:
fig = px.histogram(base_rh, x="qte_empresas_trabalhadas", color_discrete_sequence = [colors[0]], nbins=10, text_auto=True)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Quantidade de empresas trabalhadas",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [43]:
fig = px.box(base_rh, x="perc_de_aumento", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Percentual de aumento",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["perc_de_aumento"].quantile([0,0.25,0.5,0.75,1]).values):
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_xaxes(title='')
fig.show()





In [44]:
fig = px.histogram(base_rh, x="perc_de_aumento", color_discrete_sequence = [colors[0]], nbins=10, text_auto=True)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Percentual de aumento",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [45]:
fig = px.histogram(base_rh, x="qte_acoes_da_empresa", color_discrete_sequence = [colors[0]], text_auto=True)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Quantidade de ações da empresa",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [46]:
fig = px.box(base_rh, x="tempo_de_carreira", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Tempo de carreira",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
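# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["tempo_de_carreira"].quantile([0,0.25,0.5,0.75,1]).values):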
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_xaxes(title='')
fig.show()





In [47]:
fig = px.histogram(base_rh, x="tempo_de_carreira", color_discrete_sequence = [colors[0]], text_auto=True, nbins = 10)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Tempo de carreira",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [48]:
# Creating career-length bands
base_rh['faixa_tempo_de_carreira'] = pd.cut(base_rh.tempo_de_carreira, bins=10)

In [49]:
base_rh["faixa_tempo_de_carreira"] = base_rh["faixa_tempo_de_carreira"].astype(object)

In [50]:
freq_abs = base_rh['faixa_tempo_de_carreira'].value_counts().reset_index()
freq_abs.columns = ['faixa_tempo_de_carreira', 'Qtde']
freq_rel = (base_rh['faixa_tempo_de_carreira'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faixa_tempo_de_carreira', '%']
faixa_tempo_de_carreira = pd.merge(freq_abs, freq_rel, how='left')
faixa_tempo_de_carreira

Unnamed: 0,faixa_tempo_de_carreira,Qtde,%
0,"(4.0, 8.0]",397,27.01
1,"(8.0, 12.0]",382,25.99
2,"(-0.04, 4.0]",228,15.51
3,"(12.0, 16.0]",144,9.8
4,"(16.0, 20.0]",112,7.62
5,"(20.0, 24.0]",95,6.46
6,"(24.0, 28.0]",49,3.33
7,"(28.0, 32.0]",35,2.38
8,"(32.0, 36.0]",21,1.43
9,"(36.0, 40.0]",7,0.48


In [51]:
fig = px.histogram(base_rh, x="horas_de_treinamento", color_discrete_sequence = [colors[0]], text_auto=True, nbins = 10)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Horas de treinamento",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [52]:
fig = px.box(base_rh, x="tempo_de_empresa", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Tempo de empresa",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
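# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["tempo_de_empresa"].quantile([0,0.25,0.5,0.75,1]).values):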
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_xaxes(title='')
fig.show()





In [53]:
fig = px.histogram(base_rh, x="tempo_de_empresa", color_discrete_sequence = [colors[0]], text_auto=True, nbins = 10)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Tempo de empresa",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [54]:
# Creating bands for time at the company
base_rh['faixa_tempo_de_empresa'] = pd.cut(base_rh.tempo_de_empresa, bins=10)

In [55]:
base_rh["faixa_tempo_de_empresa"] = base_rh["faixa_tempo_de_empresa"].astype(object)

In [56]:
freq_abs = base_rh['faixa_tempo_de_empresa'].value_counts().reset_index()
freq_abs.columns = ['faixa_tempo_de_empresa', 'Qtde']
freq_rel = (base_rh['faixa_tempo_de_empresa'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faixa_tempo_de_empresa', '%']
tempo_de_empresa = pd.merge(freq_abs, freq_rel, how='left')
tempo_de_empresa

Unnamed: 0,faixa_tempo_de_empresa,Qtde,%
0,"(-0.04, 4.0]",580,39.46
1,"(4.0, 8.0]",442,30.07
2,"(8.0, 12.0]",248,16.87
3,"(12.0, 16.0]",74,5.03
4,"(16.0, 20.0]",60,4.08
5,"(20.0, 24.0]",37,2.52
6,"(24.0, 28.0]",10,0.68
7,"(28.0, 32.0]",9,0.61
8,"(32.0, 36.0]",8,0.54
9,"(36.0, 40.0]",2,0.14


In [57]:
fig = px.box(base_rh, x="anos_no_mesmo_cargo", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Anos no mesmo cargo",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
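# Select the column by name; a DataFrame-wide quantile() can fail on non-numeric columns
for x in zip(["min","q1","med","q3","max"], base_rh["anos_no_mesmo_cargo"].quantile([0,0.25,0.5,0.75,1]).values):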
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_xaxes(title='')
fig.show()





In [58]:
fig = px.histogram(base_rh, x="anos_no_mesmo_cargo", color_discrete_sequence = [colors[0]], text_auto=True, nbins = 10)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Anos no mesmo cargo",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()
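
The `update_layout` call above repeats the same styling arguments in every chart cell. One way to cut that repetition, sketched here, is a small helper that returns those keyword arguments as a dictionary. `layout_padrao` is a hypothetical name introduced for illustration; it is not part of the original notebook or of Plotly.

```python
def layout_padrao(titulo):
    """Return the layout keyword arguments shared by the charts in this notebook."""
    return dict(
        separators=",.",
        paper_bgcolor="rgba(0,0,0,0)",
        plot_bgcolor="rgba(0,0,0,0)",
        font_family="Arial",
        font_color="black",
        title_font_family="Arial",
        font=dict(size=20, color="black"),
        title={"text": titulo, "y": 1, "x": 0.5,
               "xanchor": "center", "yanchor": "top"},
        title_font_color="black",
        legend_title_font_color="white",
        hoverlabel=dict(bgcolor="white", font_size=16, font_family="Arial"),
    )

# Intended use with any of the figures above:
#   fig.update_layout(**layout_padrao("Anos no mesmo cargo"))
kwargs = layout_padrao("Anos no mesmo cargo")
```

Only the title text varies between cells, so it is the single parameter; everything else is copied from the existing calls.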

In [59]:
# creating faixa_anos_no_mesmo_cargo (years in the same role, split into 6 equal-width bins)
base_rh['faixa_anos_no_mesmo_cargo'] = pd.cut(base_rh.anos_no_mesmo_cargo, bins=6)
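
When `bins` is an integer, `pd.cut` splits the observed range into equal-width intervals and, per the pandas documentation, extends the range by 0.1% so that the minimum falls inside the first interval. That is why the first bin in the tables of this notebook starts slightly below zero. The sketch below shows this on toy values (not the real column) and, as an alternative, explicit edges with `include_lowest=True`.

```python
import pandas as pd

# toy values spanning 0..18, chosen only for illustration
anos = pd.Series([0, 1, 2, 7, 10, 18])

# integer bins: 6 equal-width intervals over the observed range
faixas_auto = pd.cut(anos, bins=6)
primeira = faixas_auto.cat.categories[0]  # first interval; its left edge is below 0

# explicit edges: the analyst controls the cut points;
# include_lowest=True keeps the value 0 inside the first interval
faixas_explicitas = pd.cut(anos, bins=[0, 3, 6, 9, 12, 15, 18], include_lowest=True)

print(primeira)
print(faixas_explicitas.value_counts().sort_index())
```

Explicit edges make the bins easier to read in a report, at the cost of having to choose them by hand.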

In [60]:
base_rh["faixa_anos_no_mesmo_cargo"] = base_rh["faixa_anos_no_mesmo_cargo"].astype(object)

In [61]:
freq_abs = base_rh['faixa_anos_no_mesmo_cargo'].value_counts().reset_index()
freq_abs.columns = ['faixa_anos_no_mesmo_cargo', 'Qtde']
freq_rel = (base_rh['faixa_anos_no_mesmo_cargo'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faixa_anos_no_mesmo_cargo', '%']
anos_no_mesmo_cargo = pd.merge(freq_abs, freq_rel, how='left')
anos_no_mesmo_cargo

Unnamed: 0,faixa_anos_no_mesmo_cargo,Qtde,%
0,"(-0.018, 3.0]",808,54.97
1,"(6.0, 9.0]",378,25.71
2,"(3.0, 6.0]",177,12.04
3,"(9.0, 12.0]",61,4.15
4,"(12.0, 15.0]",33,2.24
5,"(15.0, 18.0]",13,0.88
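
The absolute/relative frequency table is rebuilt with the same five lines for every binned variable. A minimal sketch of a reusable helper follows; `tabela_frequencia` is a hypothetical name, and the toy DataFrame only stands in for `base_rh`.

```python
import pandas as pd

def tabela_frequencia(df, coluna):
    """Absolute (Qtde) and relative (%) frequencies of one column, most frequent first."""
    qtde = df[coluna].value_counts()
    pct = (df[coluna].value_counts(normalize=True) * 100).round(2)
    return (
        pd.DataFrame({"Qtde": qtde, "%": pct})  # aligned on the category index
        .sort_values("Qtde", ascending=False)
        .rename_axis(coluna)
        .reset_index()
    )

# toy data, for illustration only
exemplo = pd.DataFrame({"faixa": ["a", "a", "b"]})
tabela = tabela_frequencia(exemplo, "faixa")
print(tabela)
```

Building both columns from the same index avoids the `pd.merge` step and the need to rename columns twice.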


In [62]:
fig = px.histogram(base_rh, x="anos_desde_a_ultima_promocao", color_discrete_sequence = [colors[0]], text_auto=True)
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Anos desde a última promoção",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
fig.update_yaxes(title='')
fig.show()

In [63]:
# creating faixa_anos_desde_a_ultima_promocao (years since the last promotion, split into 6 equal-width bins)
base_rh['faixa_anos_desde_a_ultima_promocao'] = pd.cut(base_rh.anos_desde_a_ultima_promocao, bins=6)

In [64]:
base_rh["faixa_anos_desde_a_ultima_promocao"] = base_rh["faixa_anos_desde_a_ultima_promocao"].astype(object)

In [65]:
freq_abs = base_rh['faixa_anos_desde_a_ultima_promocao'].value_counts().reset_index()
freq_abs.columns = ['faixa_anos_desde_a_ultima_promocao', 'Qtde']
freq_rel = (base_rh['faixa_anos_desde_a_ultima_promocao'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faixa_anos_desde_a_ultima_promocao', '%']
anos_promocao = pd.merge(freq_abs, freq_rel, how='left')
anos_promocao

Unnamed: 0,faixa_anos_desde_a_ultima_promocao,Qtde,%
0,"(-0.015, 2.5]",1097,74.63
1,"(2.5, 5.0]",158,10.75
2,"(5.0, 7.5]",108,7.35
3,"(7.5, 10.0]",41,2.79
4,"(10.0, 12.5]",34,2.31
5,"(12.5, 15.0]",32,2.18


In [66]:
fig = px.box(base_rh, x="anos_com_o_mesmo_chefe", color_discrete_sequence = [colors[0]])
fig.update_layout(separators = ',.', paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(0,0,0,0)", font_family="Arial",
    font_color="black",
    title_font_family="Arial",
    font=dict(size=20, color='black'),
    title={
        'text': "Anos com o mesmo chefe",
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    title_font_color="black",
    legend_title_font_color="white",
    hoverlabel=dict(
        bgcolor="white",
        font_size=16,
        font_family="Arial"
    ))
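# annotate the five-number summary, selecting the plotted column by name instead of by position
for x in zip(["min","q1","med","q3","max"], base_rh["anos_com_o_mesmo_chefe"].quantile([0,0.25,0.5,0.75,1]).values):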
    fig.add_annotation(
        x=x[1],
        y=0.3,
        text=x[0] + ":" + str(x[1]),
        showarrow=False
        )
fig.update_xaxes(title='')
fig.show()





In [67]:
# creating faixa_anos_com_o_mesmo_chefe (years with the same manager, split into 5 equal-width bins)
base_rh['faixa_anos_com_o_mesmo_chefe'] = pd.cut(base_rh.anos_com_o_mesmo_chefe, bins=5)

In [68]:
base_rh["faixa_anos_com_o_mesmo_chefe"] = base_rh["faixa_anos_com_o_mesmo_chefe"].astype(object)

In [69]:
freq_abs = base_rh['faixa_anos_com_o_mesmo_chefe'].value_counts().reset_index()
freq_abs.columns = ['faixa_anos_com_o_mesmo_chefe', 'Qtde']
freq_rel = (base_rh['faixa_anos_com_o_mesmo_chefe'].value_counts(normalize=True)*100).round(2).reset_index()
freq_rel.columns = ['faixa_anos_com_o_mesmo_chefe', '%']
# use a dedicated name so the promotion table built earlier is not overwritten
anos_mesmo_chefe = pd.merge(freq_abs, freq_rel, how='left')
anos_mesmo_chefe

Unnamed: 0,faixa_anos_com_o_mesmo_chefe,Qtde,%
0,"(-0.017, 3.4]",825,56.12
1,"(6.8, 10.2]",414,28.16
2,"(3.4, 6.8]",158,10.75
3,"(10.2, 13.6]",54,3.67
4,"(13.6, 17.0]",19,1.29


In [70]:
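# restrict to numeric columns; recent pandas versions raise on the object-typed faixa_* columns otherwise
corr = base_rh.corr(numeric_only=True)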
mask = np.triu(np.ones_like(corr, dtype=bool))
df_mask = corr.mask(mask)

fig = ff.create_annotated_heatmap(z=df_mask.to_numpy().round(2), 
                                  x=df_mask.columns.tolist(),
                                  y=df_mask.columns.tolist(),
                                  colorscale=px.colors.diverging.RdYlGn,
                                  hoverinfo="none", # disable hover so the masked (NaN) cells show no tooltip
                                  showscale=True, ygap=1, xgap=1
                                 )

fig.update_xaxes(side="bottom")

fig.update_layout(
    title_text='Correlação entre variáveis', 
    title_x=0.5, 
    width=1000, 
    height=1000,
    xaxis_showgrid=False,
    yaxis_showgrid=False,
    xaxis_zeroline=False,
    yaxis_zeroline=False,
    yaxis_autorange='reversed',
    template='plotly_white'
)

# NaN values are not handled automatically and are displayed in the figure
# So we need to get rid of the text manually
for i in range(len(fig.layout.annotations)):
    if fig.layout.annotations[i].text == 'nan':
        fig.layout.annotations[i].text = ""

fig.show()
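
A heatmap of this size can be hard to scan. As a complement, the same lower-triangle mask can be used to list variable pairs ordered by absolute correlation. The sketch below uses a toy DataFrame with illustrative column names; with the notebook's data, `corr` would be the matrix computed above.

```python
import numpy as np
import pandas as pd

# toy numeric data, for illustration only
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()

# keep only the strictly lower triangle so each pair appears once
lower = np.tril(np.ones(corr.shape, dtype=bool), k=-1)
pares = corr.where(lower).stack().dropna().rename("correlacao").reset_index()
pares.columns = ["variavel_1", "variavel_2", "correlacao"]

# strongest relationships first, regardless of sign
pares = pares.sort_values("correlacao", key=lambda s: s.abs(), ascending=False)
print(pares)
```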






In [71]:
categoricas = base_rh.select_dtypes(include="object").columns

In [72]:
iv = {}
tabelas_iv = {}

In [73]:

# For each categorical column (skipping the first one), build a frequency table split by the
# target funcionario_deixou_a_empresa and derive the column's Information Value (IV).
for c in categoricas[1:]:
  # overall counts and percentages per category
  freq_abs = pd.DataFrame(base_rh[c].value_counts())
  freq_rel = pd.DataFrame(base_rh[c].value_counts(normalize=True)*100).round(2)
  freq_categoricas = freq_abs.merge(freq_rel, how='left', left_index=True, right_index=True).reset_index()
  freq_categoricas.columns = [c, 'Qtde', '%']
  # counts per category among leavers ("Sim") and stayers ("Não")
  freq_abs_sim = base_rh[base_rh['funcionario_deixou_a_empresa']=="Sim"][c].value_counts(dropna=False).reset_index()
  freq_abs_sim.columns = [c, 'Sim']
  freq_abs_nao = base_rh[base_rh['funcionario_deixou_a_empresa']=="Não"][c].value_counts(dropna=False).reset_index()
  freq_abs_nao.columns = [c, 'Não']
  # share of all leavers / all stayers that falls in each category
  # (despite the column names, these are proportions in [0, 1] rounded to 2 decimals, not percentages)
  freq_rel_sim = base_rh[base_rh['funcionario_deixou_a_empresa']=="Sim"][c].value_counts(normalize=True, dropna=False).reset_index().round(2)
  freq_rel_sim.columns = [c, '% Sim']
  freq_rel_nao = base_rh[base_rh['funcionario_deixou_a_empresa']=="Não"][c].value_counts(normalize=True, dropna=False).reset_index().round(2)
  freq_rel_nao.columns = [c, '% Não']
  freq_IV_categoricas = freq_categoricas.merge(freq_abs_sim.merge(freq_abs_nao.merge(freq_rel_sim.merge(freq_rel_nao, how='outer'), how='outer'), how='outer'), how='outer')
  freq_IV_categoricas.fillna(0, inplace=True)
  freq_IV_categoricas['Sim'] = freq_IV_categoricas['Sim'].astype(int)
  # turnover rate inside the category
  freq_IV_categoricas['% taxa de Sim'] = (freq_IV_categoricas['Sim']/freq_IV_categoricas['Qtde'] * 100).round(2)
  # Odds = share of leavers / share of stayers; LN(Odds) is the weight of evidence.
  # A category whose rounded share is 0 gives log(0) = -inf or a division by zero, hence the NumPy warnings.
  freq_IV_categoricas['Odds'] = (freq_IV_categoricas['% Sim']/freq_IV_categoricas['% Não']).round(2)
  freq_IV_categoricas['LN(Odds)'] = np.log(freq_IV_categoricas['Odds']).round(2)
  freq_IV_categoricas['IV'] = ((freq_IV_categoricas['% Sim']-freq_IV_categoricas['% Não'])*freq_IV_categoricas['LN(Odds)']).round(2)
  # infinite or undefined values are set to 0, so those categories contribute nothing to the IV
  freq_IV_categoricas.replace([np.inf, -np.inf], 0, inplace=True)
  freq_IV_categoricas.fillna(0, inplace=True)
  freq_IV_categoricas = freq_IV_categoricas.sort_values("% taxa de Sim", ascending = False)
  tabelas_iv[c] = freq_IV_categoricas
  # the column's IV is the sum of the per-category contributions
  iv[c] = [freq_IV_categoricas['IV'].sum().round(2)]


divide by zero encountered in log
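
This warning comes from categories where one of the shares is zero. The loop replaces the resulting infinities by 0 afterwards; the sketch below reaches the same zero-contribution result without triggering the warning, by asking NumPy to compute the ratio and the logarithm only where both shares are positive. The three values are taken from rounded rows of the salary table for illustration, and the variable names are assumptions.

```python
import numpy as np

# share of leavers / stayers per category (third category has no leavers)
pct_sim = np.array([0.45, 0.09, 0.0])
pct_nao = np.array([0.21, 0.07, 0.04])

# positions where the odds and their logarithm are well defined
valid = (pct_sim > 0) & (pct_nao > 0)

# masked positions keep the initial value of the output array (0.0)
odds = np.divide(pct_sim, pct_nao, out=np.zeros_like(pct_sim), where=valid)
ln_odds = np.zeros_like(pct_sim)
np.log(odds, out=ln_odds, where=valid)

iv_parts = (pct_sim - pct_nao) * ln_odds
print(ln_odds, iv_parts)
```

Treating such a category as contributing 0 is a convention of this notebook; adding a small constant to the counts before taking the ratio is another common choice and would give a non-zero contribution.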





In [74]:
lista_iv = pd.DataFrame(iv).T.reset_index()

In [75]:
lista_iv.columns = ["variável", "IV"]

In [76]:
# classify each IV with the usual rule-of-thumb thresholds; the first matching condition wins
lista_iv["interpretação"] = np.select(
    [
        lista_iv["IV"] < 0.02,
        lista_iv["IV"] < 0.1,
        lista_iv["IV"] < 0.3,
        lista_iv["IV"] < 0.5,
    ],
    ["muito fraca", "fraca", "média", "forte"],
    default="suspeita/verificar",
)

In [77]:
lista_iv.sort_values(by="IV", ascending=False)

Unnamed: 0,variável,IV,interpretação
5,faz_hora_extras?,0.42,forte
7,faixa_etaria,0.35,forte
9,faixa_tempo_de_carreira,0.34,forte
8,faixas_salario,0.28,média
4,estado_civil,0.23,média
10,faixa_tempo_de_empresa,0.23,média
11,faixa_anos_no_mesmo_cargo,0.17,média
0,frequencia_de_viagens,0.11,média
2,e-sat,0.09,fraca
6,equilibrio_de_vida,0.08,fraca


In [78]:
funcionario_deixou_a_empresa[funcionario_deixou_a_empresa["funcionario_deixou_a_empresa"]=="Sim"]["%"]

1    16.12
Name: %, dtype: float64

In [79]:
tabelas_iv["faz_hora_extras?"]

Unnamed: 0,faz_hora_extras?,Qtde,%,Sim,Não,% Sim,% Não,% taxa de Sim,Odds,LN(Odds),IV
1,Sim,416,28.3,127,289,0.54,0.23,30.53,2.35,0.85,0.26
0,Não,1054,71.7,110,944,0.46,0.77,10.44,0.6,-0.51,0.16
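
To make the IV arithmetic concrete, the sketch below recomputes it for this variable directly from the counts in the table above, without rounding the intermediate proportions. It is a plain-Python illustration rather than the notebook's own routine; the variable names are assumptions.

```python
import math

# (leavers, stayers) per category, copied from the faz_hora_extras? table
contagens = {"Sim": (127, 289), "Não": (110, 944)}

total_sim = sum(sim for sim, _ in contagens.values())  # all leavers
total_nao = sum(nao for _, nao in contagens.values())  # all stayers

iv_total = 0.0
for categoria, (sim, nao) in contagens.items():
    p_sim = sim / total_sim          # share of leavers in this category
    p_nao = nao / total_nao          # share of stayers in this category
    woe = math.log(p_sim / p_nao)    # weight of evidence, LN(Odds) in the table
    iv_total += (p_sim - p_nao) * woe

print(round(iv_total, 4))
```

With unrounded proportions the total comes out close to 0.40, slightly below the 0.42 obtained from the two-decimal columns; the variable stays in the "forte" band either way.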


In [80]:
tabelas_iv["estado_civil"]

Unnamed: 0,estado_civil,Qtde,%,Sim,Não,% Sim,% Não,% taxa de Sim,Odds,LN(Odds),IV
1,Solteiro,470,31.97,120,350,0.51,0.28,25.53,1.82,0.6,0.14
0,Casado,673,45.78,84,589,0.35,0.48,12.48,0.73,-0.31,0.04
2,Divorciado,327,22.24,33,294,0.14,0.24,10.09,0.58,-0.54,0.05


In [81]:
tabelas_iv["frequencia_de_viagens"]

Unnamed: 0,frequencia_de_viagens,Qtde,%,Sim,Não,% Sim,% Não,% taxa de Sim,Odds,LN(Odds),IV
1,Viaja frequentemente,277,18.84,69,208,0.29,0.17,24.91,1.71,0.54,0.06
0,Viaja raramente,1043,70.95,156,887,0.66,0.72,14.96,0.92,-0.08,0.0
2,Não viaja,150,10.2,12,138,0.05,0.11,8.0,0.45,-0.8,0.05


In [82]:
tabelas_iv["faixas_salario"]

Unnamed: 0,faixas_salario,Qtde,%,Sim,Não,% Sim,% Não,% taxa de Sim,Odds,LN(Odds),IV
0,"(990.01, 2908.0]",365,24.83,107,258,0.45,0.21,29.32,2.14,0.76,0.18
3,"(8605.0, 10504.0]",110,7.48,21,89,0.09,0.07,19.09,1.29,0.25,0.0
1,"(2908.0, 4807.0]",349,23.74,50,299,0.21,0.24,14.33,0.88,-0.13,0.0
4,"(6706.0, 8605.0]",109,7.41,14,95,0.06,0.08,12.84,0.75,-0.29,0.01
2,"(4807.0, 6706.0]",290,19.73,30,260,0.13,0.21,10.34,0.62,-0.48,0.04
8,"(12403.0, 14302.0]",52,3.54,5,47,0.02,0.04,9.62,0.5,-0.69,0.01
6,"(10504.0, 12403.0]",56,3.81,5,51,0.02,0.04,8.93,0.5,-0.69,0.01
5,"(18100.0, 19999.0]",67,4.56,5,62,0.02,0.05,7.46,0.4,-0.92,0.03
7,"(16201.0, 18100.0]",54,3.67,0,54,0.0,0.04,0.0,0.0,0.0,0.0
9,"(14302.0, 16201.0]",18,1.22,0,18,0.0,0.01,0.0,0.0,0.0,0.0


In [83]:
tabelas_iv["faixa_etaria"]

Unnamed: 0,faixa_etaria,Qtde,%,Sim,Não,% Sim,% Não,% taxa de Sim,Odds,LN(Odds),IV
8,"(17.958, 22.2]",57,3.88,27,30,0.11,0.02,47.37,5.5,1.7,0.15
5,"(22.2, 26.4]",105,7.14,29,76,0.12,0.06,27.62,2.0,0.69,0.04
2,"(26.4, 30.6]",224,15.24,44,180,0.19,0.15,19.64,1.27,0.24,0.01
1,"(30.6, 34.8]",265,18.03,50,215,0.21,0.17,18.87,1.24,0.22,0.01
9,"(55.8, 60.0]",47,3.2,8,39,0.03,0.03,17.02,1.0,0.0,0.0
6,"(47.4, 51.6]",92,6.26,11,81,0.05,0.07,11.96,0.71,-0.34,0.01
4,"(43.2, 47.4]",131,8.91,15,116,0.06,0.09,11.45,0.67,-0.4,0.01
7,"(51.6, 55.8]",77,5.24,8,69,0.03,0.06,10.39,0.5,-0.69,0.02
0,"(34.8, 39.0]",297,20.2,30,267,0.13,0.22,10.1,0.59,-0.53,0.05
3,"(39.0, 43.2]",175,11.9,15,160,0.06,0.13,8.57,0.46,-0.78,0.05


In [84]:
tabelas_iv["faixa_tempo_de_carreira"]

Unnamed: 0,faixa_tempo_de_carreira,Qtde,%,Sim,Não,% Sim,% Não,% taxa de Sim,Odds,LN(Odds),IV
2,"(-0.04, 4.0]",228,15.51,75,153,0.32,0.12,32.89,2.67,0.98,0.2
9,"(36.0, 40.0]",7,0.48,2,5,0.01,0.0,28.57,0.0,0.0,0.0
0,"(4.0, 8.0]",397,27.01,72,325,0.3,0.26,18.14,1.15,0.14,0.01
1,"(8.0, 12.0]",382,25.99,47,335,0.2,0.27,12.3,0.74,-0.3,0.02
4,"(16.0, 20.0]",112,7.62,12,100,0.05,0.08,10.71,0.62,-0.48,0.01
3,"(12.0, 16.0]",144,9.8,15,129,0.06,0.1,10.42,0.6,-0.51,0.02
8,"(32.0, 36.0]",21,1.43,2,19,0.01,0.02,9.52,0.5,-0.69,0.01
5,"(20.0, 24.0]",95,6.46,8,87,0.03,0.07,8.42,0.43,-0.84,0.03
6,"(24.0, 28.0]",49,3.33,3,46,0.01,0.04,6.12,0.25,-1.39,0.04
7,"(28.0, 32.0]",35,2.38,1,34,0.0,0.03,2.86,0.0,0.0,0.0
