### Projeto: Análise da área de Dados


#### Contexto

- Fonte dos dados: https://www.kaggle.com/datasets/datahackers/state-of-data-brazil-20242025
- Os dados são fornecem um panorma sobre o mercado de trabalho brasileiro na área de Dados.

### Objetivos da Análise

Entender quais são os fatores relacionados à remuneração dos profissionais que estão atuando como funcionários CLT.

#### Sobre os dados da Pesquisa
O questionário foi dividido em 8 partes, e dentro de cada uma das partes temos as perguntas e opções de escolha.

- Parte 1 - Dados demográficos
- Parte 2 - Dados sobre carreira
- Parte 3 - Desafios dos gestores de times de dados
- Parte 4 - Conhecimentos na área de dados
- Parte 5 - Objetivos na área de dados
- Parte 6 - Conhecimentos em Engenharia de Dados/DE
- Parte 7 - Conhecimentos em Análise de Dados/DA
- Parte 8 - Conhecimentos em Ciências de Dados/DS

Cada pergunta é dividida em Parte, Letra da Pergunta, Número da Opção escolhida
Exemplo: P3a_1 = Parte 3, pergunta (a), opção (1)

In [1]:
import pandas as pd
import re as re
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='white', palette='Set2', context='notebook')
pd.set_option('display.max_columns', None)

from src.config import DADOS_ORIGINAIS, PASTA_DADOS, DADOS_TRATADO
from src.graficos import composicao_histograma_boxplot

In [2]:
base_total = pd.read_csv(DADOS_ORIGINAIS)

base_total.head(5)

Unnamed: 0,0.a_token,0.d_data/hora_envio,1.a_idade,1.a.1_faixa_idade,1.b_genero,1.c_cor/raca/etnia,1.d_pcd,1.e_experiencia_profissional_prejudicada,1.e.1_Não acredito que minha experiência profissional seja afetada,"1.e.2_Sim, devido a minha Cor/Raça/Etnia","1.e.3_Sim, devido a minha identidade de gênero","1.e.4_Sim, devido ao fato de ser PCD",1.i.1_uf_onde_mora,1.f.1_Quantidade de oportunidades de emprego/vagas recebidas,1.f.2_Senioridade das vagas recebidas em relação à sua experiência,1.f.3_Aprovação em processos seletivos/entrevistas,1.f.4_Oportunidades de progressão de carreira,1.f.5_Velocidade de progressão de carreira,1.f.6_Nível de cobrança no trabalho/Stress no trabalho,1.f.7_Atenção dada pelas pessoas diante das minhas opiniões e ideias,"1.f.8_Relação com outras pessoas da empresa, em momentos de trabalho","1.f.9_Relação com outras pessoas da empresa, em momentos de integração e outros momentos fora do trabalho",1.i.2_regiao_onde_mora,1.f_aspectos_prejudicados,1.k.1_uf_de_origem,1.k.2_regiao_de_origem,1.g_vive_no_brasil,1.h_pais_onde_mora,1.i_estado_onde_mora,1.j_vive_no_estado_de_formacao,1.k_estado_de_origem,1.l_nivel_de_ensino,1.m_área_de_formação,2.a_situação_de_trabalho,2.b_setor,2.c_numero_de_funcionarios,2.d_atua_como_gestor,2.e_cargo_como_gestor,2.f_cargo_atual,2.g_nivel,2.h_faixa_salarial,2.i_tempo_de_experiencia_em_dados,2.j_tempo_de_experiencia_em_ti,2.k_satisfeito_atualmente,2.l.1_Remuneração/Salário,2.l.2_Benefícios,2.l.3_Propósito do trabalho e da empresa,2.l.4_Flexibilidade de trabalho remoto,2.l.5_Ambiente e clima de trabalho,2.l.6_Oportunidade de aprendizado e trabalhar com referências,2.l.7_Oportunidades de crescimento,2.l.8_Maturidade da empresa em termos de tecnologia e dados,2.l.9_Relação com os gestores e líderes,2.l.10_Reputação que a empresa tem no mercado,2.l.11_Gostaria de trabalhar em outra área,2.l_motivo_insatisfacao,2.m_participou_de_entrevistas_ultimos_6m,2.n_planos_de_mudar_de_emprego_6m,2.o_criterios_para_escolha_de_emprego,2.o.1_Remuneração/Salário,2.o.2_Benefícios,2.o.3_Propósito do trabalho e da empresa,2.o.4_Flexibilidade de trabalho remoto,2.o.5_Ambiente e clima de trabalho,2.o.6_Oportunidade de aprendizado e trabalhar com referências,2.o.7_Plano de carreira e oportunidades de crescimento,2.o.8_Maturidade da empresa em termos de tecnologia e dados,2.o.9_Qualidade dos gestores e líderes,2.o.10_Reputação que a empresa tem no mercado,2.q_empresa_passou_por_layoff_em_2024,2.r_modelo_de_trabalho_atual,2.s_modelo_de_trabalho_ideal,2.t_atitude_em_caso_de_retorno_presencial,3.a_numero_de_pessoas_em_dados,3.b_cargos_no_time_de_dados_da_empresa,3.b.1_Analytics Engineer,3.b.2_Engenharia de Dados/Data Engineer,3.b.3_Analista de Dados/Data Analyst,3.b.4_Cientista de Dados/Data Scientist,3.b.5_Database Administrator/DBA,3.b.6_Analista de Business Intelligence/BI,3.b.7_Arquiteto de Dados/Data Architect,3.b.8_Data Product Manager/DPM,3.b.9_Business Analyst,3.b.10_ML Engineer/AI Engineer,3.c_responsabilidades_como_gestor,3.c.1_Pensar na visão de longo prazo de dados,3.c.2_Organização de treinamentos e iniciativas,"3.c.3_Atração, seleção e contratação",3.c.4_Decisão sobre contratação de ferramentas,3.c.5_gestor da equipe de engenharia de dados,"3.c.6_gestor da equipe de estudos, relatórios",3.c.7_gestor da equipe de Inteligência Artificial e Machine Learning,3.c.8_Apesar de ser gestor ainda atuo na parte técnica,3.c.9_Gestão de projetos de dados,3.c.10_Gestão de produtos de dados,3.c.11_Gestão de pessoas,3.d_desafios_como_gestor,3.d.1_Contratar talentos,3.d.2_Reter talentos,3.d.3_Convencer a empresa a aumentar investimentos,3.d.4_Gestão de equipes no ambiente remoto,3.d.5_Gestão de projetos envolvendo áreas multidisciplinares,3.d.6_Organizar as informações com qualidade e confiabilidade,3.d.7_Processar e armazenar um alto volume de dados,3.d.8_Gerar valor para as áreas de negócios,3.d.9_Desenvolver e manter modelos Machine Learning em produção,3.d.10_Gerenciar a expectativa das áreas,3.d.11_Garantir a manutenção dos projetos e modelos em produção,3.d.12_Conseguir levar inovação para a empresa,3.d.13_Garantir (ROI) em projetos de dados,3.d.14_Dividir o tempo entre entregas técnicas e gestão,3.e_ai_generativa_e_llm_é_uma_prioridade?,3.f_tipo_de_uso_de_ai_generativa_e_llm_na_empresa,3.f.1 Colaboradores usando AI generativa de forma independente e descentralizada,3.f.2 Direcionamento centralizado do uso de AI generativa,3.f.3 Desenvolvedores utilizando Copilots,3.f.4 AI Generativa e LLMs para melhorar produtos externos para os clientes finais,3.f.5 AI Generativa e LLMs para melhorar produtos internos para os colaboradores,3.f.6 IA Generativa e LLMs como principal frente do negócio,3.f.7 IA Generativa e LLMs não é prioridade,3.f.8 Não sei opinar sobre o uso de IA Generativa e LLMs na empresa,3.g_motivos_para_não_usar_ai_generativa_e_llm,3.g.1 Falta de compreensão dos casos de uso,3.g.2 Falta de confiabilidade das saídas (alucinação dos modelos),3.g.3 Incerteza em relação a regulamentação,3.g.4 Preocupações com segurança e privacidade de dados,3.g.5 Retorno sobre investimento (ROI) não comprovado de IA Generativa,3.g.6 Dados da empresa não estão prontos para uso de IA Generativa,3.g.7 Falta de expertise ou falta de recursos,3.g.8 Alta direção da empresa não vê valor ou não vê como prioridade,3.g.9 Preocupações com propriedade intelectual,4.a_funcao_de_atuacao,4.a.1_atuacao_em_dados,4.b_fontes_de_dados_(dia_a_dia),4.b.1_Dados relacionais (estruturados em bancos SQL),4.b.2_Dados armazenados em bancos NoSQL,4.b.3_Imagens,4.b.4_Textos/Documentos,4.b.5_Vídeos,4.b.6_Áudios,4.b.7_Planilhas,4.b.8_Dados georeferenciados,4.c_fonte_de_dado_mais_usada,4.c.1_Dados relacionais (estruturados em bancos SQL),4.c.2_Dados armazenados em bancos NoSQL,4.c.3_Imagens,4.c.4_Textos/Documentos,4.c.5_Vídeos,4.c.6_Áudios,4.c.7_Planilhas,4.c.8_Dados georeferenciados,4.d_linguagem_de_programacao_(dia_a_dia),4.d.1_SQL,4.d.2_R,4.d.3_Python,4.d.4_C/C++/C#,4.d.5_.NET,4.d.6_Java,4.d.7_Julia,4.d.8_SAS/Stata,4.d.9_Visual Basic/VBA,4.d.10_Scala,4.d.11_Matlab,4.d.12_Rust,4.d.13_PHP,4.d.14_JavaScript,4.d.15_Não utilizo nenhuma das linguagens listadas,4.e_linguagem_mais_usada,4.f_linguagem_preferida,4.g_banco_de_dados_(dia_a_dia),4.g.1_MySQL,4.g.2_Oracle,4.g.3_SQL SERVER,4.g.4_Amazon Aurora ou RDS,4.g.5_DynamoDB,4.g.6_CoachDB,4.g.7_Cassandra,4.g.8_MongoDB,4.g.9_MariaDB,4.g.10_Datomic,4.g.11_S3,4.g.12_PostgreSQL,4.g.13_ElasticSearch,4.g.14_DB2,4.g.15_Microsoft Access,4.g.16_SQLite,4.g.17_Sybase,4.g.18_Firebase,4.g.19_Vertica,4.g.20_Redis,4.g.21_Neo4J,4.g.22_Google BigQuery,4.g.23_Google Firestore,4.g.24_Amazon Redshift,4.g.25_Amazon Athena,4.g.26_Snowflake,4.g.27_Databricks,4.g.28_HBase,4.g.29_Presto,4.g.30_Splunk,4.g.31_SAP HANA,4.g.32_Hive,4.g.33_Firebird,4.h_cloud_(dia_a_dia),4.h.1_Amazon Web Services (AWS),4.h.2_Google Cloud (GCP),4.h.3_Azure (Microsoft),4.h.4_Oracle Cloud,4.h.5_IBM,4.h.6_Servidores On Premise/Não utilizamos Cloud,4.h.7_Cloud Própria,4.i_cloud_preferida,4.j_ferramenta_de_bi_(dia_a_dia),4.j.1_Microsoft PowerBI,4.j.2_Qlik View/Qlik Sense,4.j.3_Tableau,4.j.4_Metabase,4.j.5_Superset,4.j.6_Redash,4.j.7_Looker,4.j.8_Looker Studio(Google Data Studio),4.j.9_Amazon Quicksight,4.j.10_Alteryx,4.j.11_SAP Business Objects/SAP Analytics,4.j.12_Oracle Business Intelligence,4.j.13_Salesforce/Einstein Analytics,4.j.14_SAS Visual Analytics,4.j.15_Grafana,4.j.16_Pentaho,4.j.17_Fazemos todas as análises utilizando apenas Excel ou planilhas do google,4.j.18_Não utilizo nenhuma ferramenta de BI no trabalho,4.k_ferramenta_de_bi_preferida,4.l_tipo_de_uso_de_ai_generativa_e_llm_na_empresa,4.l.1 Colaboradores usando AI generativa de forma independente e descentralizada,4.l.2 Direcionamento centralizado do uso de AI generativa,4.l.3 Desenvolvedores utilizando Copilots,4.l.4 AI Generativa e LLMs para melhorar produtos externos para os clientes finais,4.l.5 AI Generativa e LLMs para melhorar produtos internos para os colaboradores,4.l.6 IA Generativa e LLMs como principal frente do negócio,4.l.7 IA Generativa e LLMs não é prioridade,4.l.8 Não sei opinar sobre o uso de IA Generativa e LLMs na empresa,4.m_usa_chatgpt_ou_copilot_no_trabalho?,4.m.1 Não uso soluções de AI Generativa com foco em produtividade,4.m.2 Uso soluções gratuitas de AI Generativa com foco em produtividade,4.m.3 Uso e pago pelas soluções de AI Generativa com foco em produtividade,4.m.4 A empresa que trabalho paga pelas soluções de AI Generativa com foco em produtividade,4.m.5 Uso soluções do tipo Copilot,5.a_objetivo_na_area_de_dados,5.b_oportunidade_buscada,5.c_tempo_em_busca_de_oportunidade,5.d_experiencia_em_processos_seletivos,6.a_rotina_como_de,"6.a.1_Desenvolvo pipelines de dados utilizando linguagens de programação como Python, Scala, Java etc.","6.a.2_Realizo construções de ETL's em ferramentas como Pentaho, Talend, Dataflow etc.",6.a.3_Crio consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,"6.a.4_Atuo na integração de diferentes fontes de dados através de plataformas proprietárias como Stitch Data, Fivetran etc.","6.a.5_Modelo soluções de arquitetura de dados, criando componentes de ingestão de dados, transformação e recuperação da informação.",6.a.6_Desenvolvo/cuido da manutenção de repositórios de dados baseados em streaming de eventos como Data Lakes e Data Lakehouses.,"6.a.7_Atuo na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts, Datasets etc.","6.a.8_Cuido da qualidade dos dados, metadados e dicionário de dados.",6.a.9_Nenhuma das opções listadas refletem meu dia a dia.,6.b_ferramentas_etl_de,6.b.1_Scripts Python,6.b.2_SQL & Stored Procedures,6.b.3_Apache Airflow,6.b.4_Apache NiFi,6.b.5_Luigi,6.b.6_AWS Glue,6.b.7_Talend,6.b.8_Pentaho,6.b.9_Alteryx,6.b.10_Stitch,6.b.11_Fivetran,6.b.12_Google Dataflow,6.b.13_Oracle Data Integrator,6.b.14_IBM DataStage,6.b.15_SAP BW ETL,6.b.16_SQL Server Integration Services (SSIS),6.b.17_SAS Data Integration,6.b.18_Qlik Sense,6.b.19_Knime,6.b.20_Databricks,6.b.21_Não utilizo ferramentas de ETL,6.c_possui_data_lake,6.d_tecnologia_data_lake,6.e_possui_data_warehouse,6.f_tecnologia_data_warehouse,6.g_ferramentas_de_qualidade_de_dados_(dia_a_dia),6.h_maior_tempo_gasto_como_de,"6.h.1_Desenvolvendo pipelines de dados utilizando linguagens de programação como Python, Scala, Java etc.","6.h.2_Realizando construções de ETL\s em ferramentas como Pentaho, Talend, Dataflow etc.",6.h.3_Criando consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,"6.h.4_Atuando na integração de diferentes fontes de dados através de plataformas proprietárias como Stitch Data, Fivetran etc.","6.h.5_Modelando soluções de arquitetura de dados, criando componentes de ingestão de dados, transformação e recuperação da informação.",6.h.6_Desenvolvendo/cuidando da manutenção de repositórios de dados baseados em streaming de eventos como Data Lakes e Data Lakehouses.,"6.h.7_Atuando na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts, Datasets etc.","6.h.8_Cuidando da qualidade dos dados, metadados e dicionário de dados.",6.h.9_Nenhuma das opções listadas refletem meu dia a dia.,7.a_rotina_como_da,"7.a.1_Processo e analiso dados utilizando linguagens de programação como Python, R etc.","7.a.2_Realizo construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik etc.",7.a.3_Crio consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,7.a.4_Utilizo API\s para extrair dados e complementar minhas análises.,"7.a.5_Realizo experimentos e estudos utilizando metodologias estatísticas como teste de hipótese, modelos de regressão etc.","7.a.6_Desenvolvo/cuido da manutenção de ETL\s utilizando tecnologias como Talend, Pentaho, Airflow, Dataflow etc.","7.a.7_Atuo na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts etc.",7.a.8_Desenvolvo/cuido da manutenção de planilhas para atender as áreas de negócio.,"7.a.9_Utilizo ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises de dados.",7.a.10_Nenhuma das opções listadas refletem meu dia a dia.,7.b_ferramentas_etl_da,7.b.1_Scripts Python,7.b.2_SQL & Stored Procedures,7.b.3_Apache Airflow,7.b.4_Apache NiFi,7.b.5_Luigi,7.b.6_AWS Glue,7.b.7_Talend,7.b.8_Pentaho,7.b.9_Alteryx,7.b.10_Stitch,7.b.11_Fivetran,7.b.12_Google Dataflow,7.b.13_Oracle Data Integrator,7.b.14_IBM DataStage,7.b.15_SAP BW ETL,7.b.16_SQL Server Integration Services (SSIS),7.b.17_SAS Data Integration,7.b.18_Qlik Sense,7.b.19_Knime,7.b.20_Databricks,7.b.21_Não utilizo ferramentas de ETL,7.c_ferramentas_autonomia_area_de_negocios,"7.c.1_Ferramentas de AutoML como H2O.ai, Data Robot, BigML etc.","7.c.2_""Point and Click"" Analytics como Alteryx, Knime, Rapidminer etc.","7.c.3_Product metricts & Insights como Mixpanel, Amplitude, Adobe Analytics.",7.c.4_Ferramentas de análise dentro de ferramentas de CRM como Salesforce Einstein Anaytics ou Zendesk dashboards.,7.c.5_Minha empresa não utiliza essas ferramentas.,7.c.6_Não sei informar.,7.d_maior_tempo_gasto_como_da,"7.d.1_Processando e analisando dados utilizando linguagens de programação como Python, R etc.","7.d.2_Realizando construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik etc.",7.d.3_Criando consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,7.d.4_Utilizando API's para extrair dados e complementar minhas análises.,"7.d.5_Realizando experimentos e estudos utilizando metodologias estatísticas como teste de hipótese, modelos de regressão etc.","7.d.6_Desenvolvendo/cuidando da manutenção de ETL's utilizando tecnologias como Talend, Pentaho, Airflow, Dataflow etc.","7.d.7_Atuando na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts, Datasets etc.",7.d.8_Desenvolvendo/cuidando da manutenção de planilhas para atender as áreas de negócio.,"7.d.9_Utilizando ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises de dados.",7.d.10_Nenhuma das opções listadas refletem meu dia a dia.,8.a_rotina_como_ds,"8.a.1_Estudos Ad-hoc com o objetivo de confirmar hipóteses, realizar modelos preditivos, forecasts, análise de cluster para resolver problemas pontuais e responder perguntas das áreas de negócio.",8.a.2_Sou responsável pela coleta e limpeza dos dados que uso para análise e modelagem.,"8.a.3_Sou responsável por entrar em contato com os times de negócio para definição do problema, identificar a solução e apresentação de resultados.",8.a.4_Desenvolvo modelos de Machine Learning com o objetivo de colocar em produção em sistemas (produtos de dados).,"8.a.5_Sou responsável por colocar modelos em produção, criar os pipelines de dados, APIs de consumo e monitoramento.","8.a.6_Cuido da manutenção de modelos de Machine Learning já em produção, atuando no monitoramento, ajustes e refatoração quando necessário.","8.a.7_Realizo construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik, etc","8.a.8_Utilizo ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises.","8.a.9_Crio e dou manutenção em ETLs, DAGs e automações de pipelines de dados.",8.a.10_Crio e gerencio soluções de Feature Store e cultura de MLOps.,"8.a.11_Sou responsável por criar e manter a infra que meus modelos e soluções rodam (clusters, servidores, API, containers, etc.)",8.a.12_Treino e aplico LLM's para solucionar problemas de negócio.,8.b_tecnicas_e_metodos_ds,"8.b.1_Utilizo modelos de regressão (linear, logística, GLM).",8.b.2_Utilizo redes neurais ou modelos baseados em árvore para criar modelos de classificação.,8.b.3_Desenvolvo sistemas de recomendação (RecSys).,8.b.4_Utilizo métodos estatísticos Bayesianos para analisar dados.,8.b.5_Utilizo técnicas de NLP (Natural Language Processing) para análisar dados não-estruturados.,"8.b.6_Utilizo métodos estatísticos clássicos (Testes de hipótese, análise multivariada, sobrevivência, dados longitudinais, inferência estatistica) para analisar dados.",8.b.7_Utilizo cadeias de Markov ou HMM\s para realizar análises de dados.,"8.b.8_Desenvolvo técnicas de Clusterização (K-means, Spectral, DBScan etc).",8.b.9_Realizo previsões através de modelos de Séries Temporais (Time Series).,8.b.10_Utilizo modelos de Reinforcement Learning (aprendizado por reforço).,8.b.11_Utilizo modelos de Machine Learning para detecção de fraude.,8.b.12_Utilizo métodos de Visão Computacional.,8.b.13_Utilizo modelos de Detecção de Churn.,8.b.14_Utilizo LLM's para solucionar problemas de negócio.,8.c_tecnologias_ds,"8.c.1_Ferramentas de BI (PowerBI, Looker, Tableau, Qlik etc).","8.c.2_Planilhas (Excel, Google Sheets etc).","8.c.3_Ambientes de desenvolvimento local (R-studio, JupyterLab, Anaconda).","8.c.4_Ambientes de desenvolvimento na nuvem (Google Colab, AWS Sagemaker, Kaggle Notebooks etc).","8.c.5_Ferramentas de AutoML (Datarobot, H2O, Auto-Keras etc).","8.c.6_Ferramentas de ETL (Apache Airflow, NiFi, Stitch, Fivetran, Pentaho etc).","8.c.7_Plataformas de Machine Learning (TensorFlow, Azure Machine Learning, Kubeflow etc).","8.c.8_Feature Store (Feast, Hopsworks, AWS Feature Store, Databricks Feature Store etc).","8.c.9_Sistemas de controle de versão (Github, DVC, Neptune, Gitlab etc).","8.c.10_Plataformas de Data Apps (Streamlit, Shiny, Plotly Dash etc).","8.c.11_Ferramentas de estatística avançada como SPSS, SAS, etc.",8.d_maior_tempo_gasto_como_ds,"8.d.1_Estudos Ad-hoc com o objetivo de confirmar hipóteses, realizar modelos preditivos, forecasts, análise de cluster para resolver problemas pontuais e responder perguntas das áreas de negócio.",8.d.2_Coletando e limpando dos dados que uso para análise e modelagem.,"8.d.3_Entrando em contato com os times de negócio para definição do problema, identificar a solução e apresentação de resultados.",8.d.4_Desenvolvendo modelos de Machine Learning com o objetivo de colocar em produção em sistemas (produtos de dados).,"8.d.5_Colocando modelos em produção, criando os pipelines de dados, APIs de consumo e monitoramento.","8.d.6_Cuidando da manutenção de modelos de Machine Learning já em produção, atuando no monitoramento, ajustes e refatoração quando necessário.","8.d.7_Realizando construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik, etc.","8.d.8_Utilizando ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises.","8.d.9_Criando e dando manutenção em ETLs, DAGs e automações de pipelines de dados.",8.d.10_Criando e gerenciando soluções de Feature Store e cultura de MLOps.,"8.d.11_Criando e mantendo a infra que meus modelos e soluções rodam (clusters, servidores, API, containers, etc.)",8.d.12_Treinando e aplicando LLM's para solucionar problemas de negócio.
0,reb94rv0msth7q4nreb94riaq80iz3yi,16/10/2024 11:19:17,18,17-21,Masculino,Branca,Não,,,,,,RS,,,,,,,,,,Sul,,,,True,,Rio Grande do Sul (RS),True,,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Marketing,de 101 a 500,False,,Analista de Dados/Data Analyst,Júnior,de R$ 1.001/mês a R$ 2.000/mês,de 1 a 2 anos,de 1 a 2 anos,True,,,,,,,,,,,,,"Sim, fui aprovado e mudei de emprego","Não estou buscando, mas me considero aberto a ...","Remuneração/Salário, Reputação que a empresa t...",1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,"Sim, ocorreram layoffs/demissões em massa na e...",Modelo 100% remoto,Modelo híbrido flexível (o funcionário tem lib...,Vou procurar outra oportunidade no modelo híbr...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*Análise de Dados/BI:* Extrai e cruza dados un...,Análise de Dados,"Planilhas, Dados relacionais (estruturados em ...",1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,"Planilhas, Textos/Documentos",0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,"Python, JavaScript, SQL",1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Python,Python,Google BigQuery,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Servidores On Premise/Não utilizamos Cloud,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Amazon Web Services (AWS),Fazemos todas as análises utilizando apenas Ex...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Microsoft PowerBI,Colaboradores utilizando soluções baseadas em ...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Utilizo apenas soluções gratuitas (como por ex...,0.0,1.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Realizo experimentos e estudos utilizando meto...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,Scripts Python,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Minha empresa não utiliza essas ferramentas.,0.0,0.0,0.0,0.0,1.0,0.0,Desenvolvendo/cuidando da manutenção de planil...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,1zc66g69jjt49y32l1zc66g8wqj79m4e,16/10/2024 20:45:31,18,17-21,Masculino,Branca,Não,,,,,,SC,,,,,,,,,,Sul,,PR,Sul,True,,Santa Catarina (SC),False,Paraná (PR),Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Finanças ou Bancos,Acima de 3.000,False,,Analista de BI/BI Analyst,Júnior,Menos de R$ 1.000/mês,Menos de 1 ano,Menos de 1 ano,True,,,,,,,,,,,,,Não participei de entrevistas de emprego/proce...,Não estou buscando e não pretendo mudar de emp...,Oportunidade de aprendizado e trabalhar com re...,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,Não ocorreram layoffs/demissões em massa na em...,Modelo 100% presencial,Modelo 100% presencial,Vou aceitar e retornar ao modelo 100% presencial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*Análise de Dados/BI:* Extrai e cruza dados un...,Análise de Dados,Dados relacionais (estruturados em bancos SQL)...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,Dados relacionais (estruturados em bancos SQL),1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Python,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Python,Python,Databricks,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,Azure (Microsoft),0.0,0.0,1.0,0.0,0.0,0.0,0.0,Azure (Microsoft),Microsoft PowerBI,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Microsoft PowerBI,Colaboradores utilizando soluções baseadas em ...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,Utilizo apenas soluções gratuitas (como por ex...,0.0,1.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Realizo construções de dashboards em ferrament...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Scripts Python,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Não sei informar.,0.0,0.0,0.0,0.0,0.0,1.0,Realizando construções de dashboards em ferram...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,uu99wmam4n5kc2uu99wmydf0rk7l58f7,17/10/2024 18:10:59,18,17-21,Masculino,Parda,Não,Não acredito que minha experiência profissiona...,1.0,0.0,0.0,0.0,SP,,,,,,,,,,Sudeste,,,,True,,São Paulo (SP),True,,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Empregado (CLT),Indústria,de 501 a 1.000,False,,Outra Opção,Júnior,de R$ 1.001/mês a R$ 2.000/mês,Não tenho experiência na área de dados,Não tive experiência na área de TI/Engenharia ...,False,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Remuneração/Salário não corresponde a realidad...,Não participei de entrevistas de emprego/proce...,Estou em busca de oportunidades dentro ou fora...,"Remuneração/Salário, Propósito do trabalho e d...",1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,Não ocorreram layoffs/demissões em massa na em...,Modelo 100% presencial,Modelo híbrido com dias fixos de trabalho pres...,Vou aceitar e retornar ao modelo 100% presencial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"Atuo na área de dados, mas não atuo em nenhuma...",Outra atuação,"Dados armazenados em bancos NoSQL, Dados relac...",1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,Planilhas,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Não utilizo linguagem de programação no trabalho,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,"MySQL, CoachDB",1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Google Cloud (GCP),0.0,1.0,0.0,0.0,0.0,0.0,0.0,Amazon Web Services (AWS),Redash,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Metabase,Colaboradores utilizando soluções baseadas em ...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Utilizo apenas soluções gratuitas (como por ex...,0.0,1.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,3ynsw7z0hl5hhpbfvaeqk73ynsw7z04l,22/10/2024 18:03:00,18,17-21,Masculino,Branca,Não,,,,,,SP,,,,,,,,,,Sudeste,,,,True,,São Paulo (SP),True,,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Tecnologia/Fábrica de Software,de 501 a 1.000,False,,Analista de Dados/Data Analyst,Júnior,de R$ 1.001/mês a R$ 2.000/mês,Menos de 1 ano,Não tive experiência na área de TI/Engenharia ...,True,,,,,,,,,,,,,Não participei de entrevistas de emprego/proce...,Não estou buscando e não pretendo mudar de emp...,Oportunidade de aprendizado e trabalhar com re...,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,Não ocorreram layoffs/demissões em massa na em...,Modelo híbrido flexível (o funcionário tem lib...,Modelo híbrido flexível (o funcionário tem lib...,Vou aceitar e retornar ao modelo 100% presencial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,*Análise de Dados/BI:* Extrai e cruza dados un...,Análise de Dados,Dados relacionais (estruturados em bancos SQL),1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Dados relacionais (estruturados em bancos SQL)...,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,SQL,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,SQL,SQL,"Snowflake, PostgreSQL",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Amazon Web Services (AWS),1.0,0.0,0.0,0.0,0.0,0.0,0.0,Amazon Web Services (AWS),"Metabase, Microsoft PowerBI, Grafana",1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,Não tenho preferência / Não sei opinar,Não sei opinar.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,Utilizo apenas soluções gratuitas (como por ex...,0.0,1.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Nenhuma das opções listadas refletem meu dia a...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,Não utilizo ferramentas de ETL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,Minha empresa não utiliza essas ferramentas.,0.0,0.0,0.0,0.0,1.0,0.0,Nenhuma das opções listadas refletem meu dia a...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,v6bji2ct5xckpl1uouv6bjiotkrf3b4f,23/10/2024 01:33:23,18,17-21,Masculino,Branca,Não,,,,,,SP,,,,,,,,,,Sudeste,,,,True,,São Paulo (SP),True,,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Tecnologia/Fábrica de Software,de 1.001 a 3.000,False,,Desenvolvedor/ Engenheiro de Software/ Analist...,Júnior,de R$ 1.001/mês a R$ 2.000/mês,Menos de 1 ano,Menos de 1 ano,True,,,,,,,,,,,,,"Sim, fui aprovado mas decidi não mudar de emprego",Não estou buscando e não pretendo mudar de emp...,Oportunidade de aprendizado e trabalhar com re...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,Não ocorreram layoffs/demissões em massa na em...,Modelo 100% remoto,Modelo híbrido flexível (o funcionário tem lib...,Vou aceitar e retornar ao modelo 100% presencial,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Outra atuação,Dados relacionais (estruturados em bancos SQL)...,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,Dados relacionais (estruturados em bancos SQL)...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,"SQL, JavaScript",1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,JavaScript,Python,MySQL,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Azure (Microsoft),0.0,0.0,1.0,0.0,0.0,0.0,0.0,Amazon Web Services (AWS),Não utilizo nenhuma ferramenta de BI no trabalho,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,,Colaboradores utilizando soluções baseadas em ...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Utilizo apenas soluções gratuitas (como por ex...,0.0,1.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [3]:
base_total.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5217 entries, 0 to 5216
Columns: 403 entries, 0.a_token to 8.d.12_Treinando e aplicando LLM's para solucionar problemas de negócio.
dtypes: bool(1), float64(323), int64(1), object(78)
memory usage: 16.0+ MB


In [4]:
base_total.describe()

Unnamed: 0,1.a_idade,1.e.1_Não acredito que minha experiência profissional seja afetada,"1.e.2_Sim, devido a minha Cor/Raça/Etnia","1.e.3_Sim, devido a minha identidade de gênero","1.e.4_Sim, devido ao fato de ser PCD",1.f.1_Quantidade de oportunidades de emprego/vagas recebidas,1.f.2_Senioridade das vagas recebidas em relação à sua experiência,1.f.3_Aprovação em processos seletivos/entrevistas,1.f.4_Oportunidades de progressão de carreira,1.f.5_Velocidade de progressão de carreira,1.f.6_Nível de cobrança no trabalho/Stress no trabalho,1.f.7_Atenção dada pelas pessoas diante das minhas opiniões e ideias,"1.f.8_Relação com outras pessoas da empresa, em momentos de trabalho","1.f.9_Relação com outras pessoas da empresa, em momentos de integração e outros momentos fora do trabalho",2.l.1_Remuneração/Salário,2.l.2_Benefícios,2.l.3_Propósito do trabalho e da empresa,2.l.4_Flexibilidade de trabalho remoto,2.l.5_Ambiente e clima de trabalho,2.l.6_Oportunidade de aprendizado e trabalhar com referências,2.l.7_Oportunidades de crescimento,2.l.8_Maturidade da empresa em termos de tecnologia e dados,2.l.9_Relação com os gestores e líderes,2.l.10_Reputação que a empresa tem no mercado,2.l.11_Gostaria de trabalhar em outra área,2.o.1_Remuneração/Salário,2.o.2_Benefícios,2.o.3_Propósito do trabalho e da empresa,2.o.4_Flexibilidade de trabalho remoto,2.o.5_Ambiente e clima de trabalho,2.o.6_Oportunidade de aprendizado e trabalhar com referências,2.o.7_Plano de carreira e oportunidades de crescimento,2.o.8_Maturidade da empresa em termos de tecnologia e dados,2.o.9_Qualidade dos gestores e líderes,2.o.10_Reputação que a empresa tem no mercado,3.b.1_Analytics Engineer,3.b.2_Engenharia de Dados/Data Engineer,3.b.3_Analista de Dados/Data Analyst,3.b.4_Cientista de Dados/Data Scientist,3.b.5_Database Administrator/DBA,3.b.6_Analista de Business Intelligence/BI,3.b.7_Arquiteto de Dados/Data Architect,3.b.8_Data Product Manager/DPM,3.b.9_Business Analyst,3.b.10_ML Engineer/AI Engineer,3.c.1_Pensar na visão de longo prazo de dados,3.c.2_Organização de treinamentos e iniciativas,"3.c.3_Atração, seleção e contratação",3.c.4_Decisão sobre contratação de ferramentas,3.c.5_gestor da equipe de engenharia de dados,"3.c.6_gestor da equipe de estudos, relatórios",3.c.7_gestor da equipe de Inteligência Artificial e Machine Learning,3.c.8_Apesar de ser gestor ainda atuo na parte técnica,3.c.9_Gestão de projetos de dados,3.c.10_Gestão de produtos de dados,3.c.11_Gestão de pessoas,3.d.1_Contratar talentos,3.d.2_Reter talentos,3.d.3_Convencer a empresa a aumentar investimentos,3.d.4_Gestão de equipes no ambiente remoto,3.d.5_Gestão de projetos envolvendo áreas multidisciplinares,3.d.6_Organizar as informações com qualidade e confiabilidade,3.d.7_Processar e armazenar um alto volume de dados,3.d.8_Gerar valor para as áreas de negócios,3.d.9_Desenvolver e manter modelos Machine Learning em produção,3.d.10_Gerenciar a expectativa das áreas,3.d.11_Garantir a manutenção dos projetos e modelos em produção,3.d.12_Conseguir levar inovação para a empresa,3.d.13_Garantir (ROI) em projetos de dados,3.d.14_Dividir o tempo entre entregas técnicas e gestão,3.f.1 Colaboradores usando AI generativa de forma independente e descentralizada,3.f.2 Direcionamento centralizado do uso de AI generativa,3.f.3 Desenvolvedores utilizando Copilots,3.f.4 AI Generativa e LLMs para melhorar produtos externos para os clientes finais,3.f.5 AI Generativa e LLMs para melhorar produtos internos para os colaboradores,3.f.6 IA Generativa e LLMs como principal frente do negócio,3.f.7 IA Generativa e LLMs não é prioridade,3.f.8 Não sei opinar sobre o uso de IA Generativa e LLMs na empresa,3.g.1 Falta de compreensão dos casos de uso,3.g.2 Falta de confiabilidade das saídas (alucinação dos modelos),3.g.3 Incerteza em relação a regulamentação,3.g.4 Preocupações com segurança e privacidade de dados,3.g.5 Retorno sobre investimento (ROI) não comprovado de IA Generativa,3.g.6 Dados da empresa não estão prontos para uso de IA Generativa,3.g.7 Falta de expertise ou falta de recursos,3.g.8 Alta direção da empresa não vê valor ou não vê como prioridade,3.g.9 Preocupações com propriedade intelectual,4.b.1_Dados relacionais (estruturados em bancos SQL),4.b.2_Dados armazenados em bancos NoSQL,4.b.3_Imagens,4.b.4_Textos/Documentos,4.b.5_Vídeos,4.b.6_Áudios,4.b.7_Planilhas,4.b.8_Dados georeferenciados,4.c.1_Dados relacionais (estruturados em bancos SQL),4.c.2_Dados armazenados em bancos NoSQL,4.c.3_Imagens,4.c.4_Textos/Documentos,4.c.5_Vídeos,4.c.6_Áudios,4.c.7_Planilhas,4.c.8_Dados georeferenciados,4.d.1_SQL,4.d.2_R,4.d.3_Python,4.d.4_C/C++/C#,4.d.5_.NET,4.d.6_Java,4.d.7_Julia,4.d.8_SAS/Stata,4.d.9_Visual Basic/VBA,4.d.10_Scala,4.d.11_Matlab,4.d.12_Rust,4.d.13_PHP,4.d.14_JavaScript,4.d.15_Não utilizo nenhuma das linguagens listadas,4.g.1_MySQL,4.g.2_Oracle,4.g.3_SQL SERVER,4.g.4_Amazon Aurora ou RDS,4.g.5_DynamoDB,4.g.6_CoachDB,4.g.7_Cassandra,4.g.8_MongoDB,4.g.9_MariaDB,4.g.10_Datomic,4.g.11_S3,4.g.12_PostgreSQL,4.g.13_ElasticSearch,4.g.14_DB2,4.g.15_Microsoft Access,4.g.16_SQLite,4.g.17_Sybase,4.g.18_Firebase,4.g.19_Vertica,4.g.20_Redis,4.g.21_Neo4J,4.g.22_Google BigQuery,4.g.23_Google Firestore,4.g.24_Amazon Redshift,4.g.25_Amazon Athena,4.g.26_Snowflake,4.g.27_Databricks,4.g.28_HBase,4.g.29_Presto,4.g.30_Splunk,4.g.31_SAP HANA,4.g.32_Hive,4.g.33_Firebird,4.h.1_Amazon Web Services (AWS),4.h.2_Google Cloud (GCP),4.h.3_Azure (Microsoft),4.h.4_Oracle Cloud,4.h.5_IBM,4.h.6_Servidores On Premise/Não utilizamos Cloud,4.h.7_Cloud Própria,4.j.1_Microsoft PowerBI,4.j.2_Qlik View/Qlik Sense,4.j.3_Tableau,4.j.4_Metabase,4.j.5_Superset,4.j.6_Redash,4.j.7_Looker,4.j.8_Looker Studio(Google Data Studio),4.j.9_Amazon Quicksight,4.j.10_Alteryx,4.j.11_SAP Business Objects/SAP Analytics,4.j.12_Oracle Business Intelligence,4.j.13_Salesforce/Einstein Analytics,4.j.14_SAS Visual Analytics,4.j.15_Grafana,4.j.16_Pentaho,4.j.17_Fazemos todas as análises utilizando apenas Excel ou planilhas do google,4.j.18_Não utilizo nenhuma ferramenta de BI no trabalho,4.l.1 Colaboradores usando AI generativa de forma independente e descentralizada,4.l.2 Direcionamento centralizado do uso de AI generativa,4.l.3 Desenvolvedores utilizando Copilots,4.l.4 AI Generativa e LLMs para melhorar produtos externos para os clientes finais,4.l.5 AI Generativa e LLMs para melhorar produtos internos para os colaboradores,4.l.6 IA Generativa e LLMs como principal frente do negócio,4.l.7 IA Generativa e LLMs não é prioridade,4.l.8 Não sei opinar sobre o uso de IA Generativa e LLMs na empresa,4.m.1 Não uso soluções de AI Generativa com foco em produtividade,4.m.2 Uso soluções gratuitas de AI Generativa com foco em produtividade,4.m.3 Uso e pago pelas soluções de AI Generativa com foco em produtividade,4.m.4 A empresa que trabalho paga pelas soluções de AI Generativa com foco em produtividade,4.m.5 Uso soluções do tipo Copilot,"6.a.1_Desenvolvo pipelines de dados utilizando linguagens de programação como Python, Scala, Java etc.","6.a.2_Realizo construções de ETL's em ferramentas como Pentaho, Talend, Dataflow etc.",6.a.3_Crio consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,"6.a.4_Atuo na integração de diferentes fontes de dados através de plataformas proprietárias como Stitch Data, Fivetran etc.","6.a.5_Modelo soluções de arquitetura de dados, criando componentes de ingestão de dados, transformação e recuperação da informação.",6.a.6_Desenvolvo/cuido da manutenção de repositórios de dados baseados em streaming de eventos como Data Lakes e Data Lakehouses.,"6.a.7_Atuo na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts, Datasets etc.","6.a.8_Cuido da qualidade dos dados, metadados e dicionário de dados.",6.a.9_Nenhuma das opções listadas refletem meu dia a dia.,6.b.1_Scripts Python,6.b.2_SQL & Stored Procedures,6.b.3_Apache Airflow,6.b.4_Apache NiFi,6.b.5_Luigi,6.b.6_AWS Glue,6.b.7_Talend,6.b.8_Pentaho,6.b.9_Alteryx,6.b.10_Stitch,6.b.11_Fivetran,6.b.12_Google Dataflow,6.b.13_Oracle Data Integrator,6.b.14_IBM DataStage,6.b.15_SAP BW ETL,6.b.16_SQL Server Integration Services (SSIS),6.b.17_SAS Data Integration,6.b.18_Qlik Sense,6.b.19_Knime,6.b.20_Databricks,6.b.21_Não utilizo ferramentas de ETL,"6.h.1_Desenvolvendo pipelines de dados utilizando linguagens de programação como Python, Scala, Java etc.","6.h.2_Realizando construções de ETL\s em ferramentas como Pentaho, Talend, Dataflow etc.",6.h.3_Criando consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,"6.h.4_Atuando na integração de diferentes fontes de dados através de plataformas proprietárias como Stitch Data, Fivetran etc.","6.h.5_Modelando soluções de arquitetura de dados, criando componentes de ingestão de dados, transformação e recuperação da informação.",6.h.6_Desenvolvendo/cuidando da manutenção de repositórios de dados baseados em streaming de eventos como Data Lakes e Data Lakehouses.,"6.h.7_Atuando na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts, Datasets etc.","6.h.8_Cuidando da qualidade dos dados, metadados e dicionário de dados.",6.h.9_Nenhuma das opções listadas refletem meu dia a dia.,"7.a.1_Processo e analiso dados utilizando linguagens de programação como Python, R etc.","7.a.2_Realizo construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik etc.",7.a.3_Crio consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,7.a.4_Utilizo API\s para extrair dados e complementar minhas análises.,"7.a.5_Realizo experimentos e estudos utilizando metodologias estatísticas como teste de hipótese, modelos de regressão etc.","7.a.6_Desenvolvo/cuido da manutenção de ETL\s utilizando tecnologias como Talend, Pentaho, Airflow, Dataflow etc.","7.a.7_Atuo na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts etc.",7.a.8_Desenvolvo/cuido da manutenção de planilhas para atender as áreas de negócio.,"7.a.9_Utilizo ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises de dados.",7.a.10_Nenhuma das opções listadas refletem meu dia a dia.,7.b.1_Scripts Python,7.b.2_SQL & Stored Procedures,7.b.3_Apache Airflow,7.b.4_Apache NiFi,7.b.5_Luigi,7.b.6_AWS Glue,7.b.7_Talend,7.b.8_Pentaho,7.b.9_Alteryx,7.b.10_Stitch,7.b.11_Fivetran,7.b.12_Google Dataflow,7.b.13_Oracle Data Integrator,7.b.14_IBM DataStage,7.b.15_SAP BW ETL,7.b.16_SQL Server Integration Services (SSIS),7.b.17_SAS Data Integration,7.b.18_Qlik Sense,7.b.19_Knime,7.b.20_Databricks,7.b.21_Não utilizo ferramentas de ETL,"7.c.1_Ferramentas de AutoML como H2O.ai, Data Robot, BigML etc.","7.c.2_""Point and Click"" Analytics como Alteryx, Knime, Rapidminer etc.","7.c.3_Product metricts & Insights como Mixpanel, Amplitude, Adobe Analytics.",7.c.4_Ferramentas de análise dentro de ferramentas de CRM como Salesforce Einstein Anaytics ou Zendesk dashboards.,7.c.5_Minha empresa não utiliza essas ferramentas.,7.c.6_Não sei informar.,"7.d.1_Processando e analisando dados utilizando linguagens de programação como Python, R etc.","7.d.2_Realizando construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik etc.",7.d.3_Criando consultas através da linguagem SQL para exportar informações e compartilhar com as áreas de negócio.,7.d.4_Utilizando API's para extrair dados e complementar minhas análises.,"7.d.5_Realizando experimentos e estudos utilizando metodologias estatísticas como teste de hipótese, modelos de regressão etc.","7.d.6_Desenvolvendo/cuidando da manutenção de ETL's utilizando tecnologias como Talend, Pentaho, Airflow, Dataflow etc.","7.d.7_Atuando na modelagem dos dados, com o objetivo de criar conjuntos de dados como Data Warehouses, Data Marts, Datasets etc.",7.d.8_Desenvolvendo/cuidando da manutenção de planilhas para atender as áreas de negócio.,"7.d.9_Utilizando ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises de dados.",7.d.10_Nenhuma das opções listadas refletem meu dia a dia.,"8.a.1_Estudos Ad-hoc com o objetivo de confirmar hipóteses, realizar modelos preditivos, forecasts, análise de cluster para resolver problemas pontuais e responder perguntas das áreas de negócio.",8.a.2_Sou responsável pela coleta e limpeza dos dados que uso para análise e modelagem.,"8.a.3_Sou responsável por entrar em contato com os times de negócio para definição do problema, identificar a solução e apresentação de resultados.",8.a.4_Desenvolvo modelos de Machine Learning com o objetivo de colocar em produção em sistemas (produtos de dados).,"8.a.5_Sou responsável por colocar modelos em produção, criar os pipelines de dados, APIs de consumo e monitoramento.","8.a.6_Cuido da manutenção de modelos de Machine Learning já em produção, atuando no monitoramento, ajustes e refatoração quando necessário.","8.a.7_Realizo construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik, etc","8.a.8_Utilizo ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises.","8.a.9_Crio e dou manutenção em ETLs, DAGs e automações de pipelines de dados.",8.a.10_Crio e gerencio soluções de Feature Store e cultura de MLOps.,"8.a.11_Sou responsável por criar e manter a infra que meus modelos e soluções rodam (clusters, servidores, API, containers, etc.)",8.a.12_Treino e aplico LLM's para solucionar problemas de negócio.,"8.b.1_Utilizo modelos de regressão (linear, logística, GLM).",8.b.2_Utilizo redes neurais ou modelos baseados em árvore para criar modelos de classificação.,8.b.3_Desenvolvo sistemas de recomendação (RecSys).,8.b.4_Utilizo métodos estatísticos Bayesianos para analisar dados.,8.b.5_Utilizo técnicas de NLP (Natural Language Processing) para análisar dados não-estruturados.,"8.b.6_Utilizo métodos estatísticos clássicos (Testes de hipótese, análise multivariada, sobrevivência, dados longitudinais, inferência estatistica) para analisar dados.",8.b.7_Utilizo cadeias de Markov ou HMM\s para realizar análises de dados.,"8.b.8_Desenvolvo técnicas de Clusterização (K-means, Spectral, DBScan etc).",8.b.9_Realizo previsões através de modelos de Séries Temporais (Time Series).,8.b.10_Utilizo modelos de Reinforcement Learning (aprendizado por reforço).,8.b.11_Utilizo modelos de Machine Learning para detecção de fraude.,8.b.12_Utilizo métodos de Visão Computacional.,8.b.13_Utilizo modelos de Detecção de Churn.,8.b.14_Utilizo LLM's para solucionar problemas de negócio.,"8.c.1_Ferramentas de BI (PowerBI, Looker, Tableau, Qlik etc).","8.c.2_Planilhas (Excel, Google Sheets etc).","8.c.3_Ambientes de desenvolvimento local (R-studio, JupyterLab, Anaconda).","8.c.4_Ambientes de desenvolvimento na nuvem (Google Colab, AWS Sagemaker, Kaggle Notebooks etc).","8.c.5_Ferramentas de AutoML (Datarobot, H2O, Auto-Keras etc).","8.c.6_Ferramentas de ETL (Apache Airflow, NiFi, Stitch, Fivetran, Pentaho etc).","8.c.7_Plataformas de Machine Learning (TensorFlow, Azure Machine Learning, Kubeflow etc).","8.c.8_Feature Store (Feast, Hopsworks, AWS Feature Store, Databricks Feature Store etc).","8.c.9_Sistemas de controle de versão (Github, DVC, Neptune, Gitlab etc).","8.c.10_Plataformas de Data Apps (Streamlit, Shiny, Plotly Dash etc).","8.c.11_Ferramentas de estatística avançada como SPSS, SAS, etc.","8.d.1_Estudos Ad-hoc com o objetivo de confirmar hipóteses, realizar modelos preditivos, forecasts, análise de cluster para resolver problemas pontuais e responder perguntas das áreas de negócio.",8.d.2_Coletando e limpando dos dados que uso para análise e modelagem.,"8.d.3_Entrando em contato com os times de negócio para definição do problema, identificar a solução e apresentação de resultados.",8.d.4_Desenvolvendo modelos de Machine Learning com o objetivo de colocar em produção em sistemas (produtos de dados).,"8.d.5_Colocando modelos em produção, criando os pipelines de dados, APIs de consumo e monitoramento.","8.d.6_Cuidando da manutenção de modelos de Machine Learning já em produção, atuando no monitoramento, ajustes e refatoração quando necessário.","8.d.7_Realizando construções de dashboards em ferramentas de BI como PowerBI, Tableau, Looker, Qlik, etc.","8.d.8_Utilizando ferramentas avançadas de estatística como SAS, SPSS, Stata etc, para realizar análises.","8.d.9_Criando e dando manutenção em ETLs, DAGs e automações de pipelines de dados.",8.d.10_Criando e gerenciando soluções de Feature Store e cultura de MLOps.,"8.d.11_Criando e mantendo a infra que meus modelos e soluções rodam (clusters, servidores, API, containers, etc.)",8.d.12_Treinando e aplicando LLM's para solucionar problemas de negócio.
count,5217.0,2641.0,2641.0,2641.0,2641.0,1289.0,1289.0,1289.0,1289.0,1289.0,1289.0,1289.0,1289.0,1289.0,1527.0,1527.0,1527.0,1527.0,1527.0,1527.0,1527.0,1527.0,1527.0,1527.0,1527.0,4863.0,4863.0,4863.0,4863.0,4863.0,4863.0,4863.0,4863.0,4863.0,4863.0,1015.0,1015.0,1015.0,1015.0,1015.0,1015.0,1015.0,1015.0,1015.0,1015.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1045.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,974.0,974.0,974.0,974.0,974.0,974.0,974.0,974.0,974.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3589.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,3619.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,929.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,1669.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0,773.0
mean,32.358827,0.511927,0.221128,0.319576,0.033321,0.392552,0.359969,0.373157,0.521334,0.556245,0.276959,0.556245,0.293251,0.247479,0.445972,0.170923,0.128356,0.165684,0.138179,0.329404,0.340537,0.303209,0.056974,0.013098,0.170923,0.810405,0.254164,0.164302,0.576599,0.211803,0.242443,0.317294,0.12955,0.077319,0.056755,0.375369,0.698522,0.698522,0.658128,0.223645,0.539901,0.381281,0.26601,0.394089,0.409852,0.633493,0.377033,0.482297,0.368421,0.316746,0.520574,0.335885,0.380861,0.451675,0.377033,0.627751,0.164593,0.157895,0.190431,0.096651,0.208612,0.270813,0.057416,0.27177,0.052632,0.351196,0.149282,0.232536,0.250718,0.305263,0.395,0.23,0.359,0.402,0.396,0.11,0.12,0.082,0.342916,0.177618,0.12731,0.322382,0.255647,0.296715,0.336756,0.090349,0.126283,0.932302,0.380492,0.175463,0.585521,0.046974,0.056369,0.909644,0.315004,0.84341,0.139872,0.013931,0.127055,0.004179,0.001115,0.533018,0.021454,0.876567,0.098077,0.817777,0.012817,0.008638,0.081081,0.001672,0.036222,0.059905,0.035107,0.005573,0.003344,0.009473,0.062134,0.0,0.232377,0.179159,0.293396,0.066314,0.064921,0.00195,0.011424,0.114517,0.035943,0.004737,0.236556,0.314015,0.042909,0.021733,0.033157,0.062692,0.005573,0.02034,0.002508,0.032042,0.014767,0.262469,0.015603,0.1017,0.140429,0.062413,0.31095,0.006408,0.041794,0.006687,0.053497,0.073837,0.010867,0.441349,0.292561,0.308721,0.039844,0.016718,0.129005,0.053218,0.551534,0.049185,0.189831,0.08787,0.026803,0.015198,0.229621,0.166621,0.046145,0.019895,0.015198,0.009947,0.025421,0.020448,0.080133,0.031224,0.106383,0.097541,0.500414,0.211661,0.261951,0.280741,0.268859,0.068527,0.113844,0.070185,0.065488,0.537165,0.155292,0.192595,0.237082,0.82239,0.334769,0.778256,0.170075,0.559742,0.404736,0.630786,0.587729,0.016146,0.81916,0.689989,0.438105,0.044133,0.001076,0.194833,0.009688,0.051668,0.01507,0.007535,0.021529,0.079656,0.03014,0.010764,0.008611,0.082885,0.01507,0.01507,0.010764,0.37352,0.01507,0.595264,0.113025,0.34338,0.027987,0.235737,0.110872,0.190527,0.094726,0.033369,0.539844,0.84302,0.745357,0.273817,0.22169,0.206111,0.240863,0.437987,0.029359,0.023966,0.506291,0.479928,0.090473,0.004793,0.000599,0.040743,0.005992,0.044937,0.02157,0.001797,0.005392,0.024566,0.010186,0.006591,0.005392,0.048532,0.016177,0.031156,0.017376,0.227681,0.159976,0.025165,0.05033,0.054524,0.1432,0.424805,0.366687,0.289395,0.546435,0.442181,0.022169,0.04254,0.055123,0.061714,0.151588,0.004194,0.04973,0.666235,0.580854,0.56533,0.746442,0.390686,0.454075,0.272962,0.062096,0.260026,0.141009,0.135834,0.307891,0.707633,0.59379,0.119017,0.218629,0.341527,0.569211,0.051746,0.529107,0.42044,0.063389,0.221216,0.139715,0.223803,0.390686,0.428202,0.681759,0.711514,0.617076,0.100906,0.25097,0.412678,0.174644,0.664942,0.271669,0.064683,0.375162,0.433376,0.22251,0.298836,0.086675,0.09185,0.041397,0.003881,0.043984,0.009056,0.011643,0.133247
std,7.419433,0.499952,0.415085,0.466401,0.179507,0.488508,0.480177,0.483831,0.499739,0.497019,0.44767,0.497019,0.455429,0.431715,0.497235,0.376565,0.334596,0.371919,0.345201,0.470151,0.474045,0.459795,0.23187,0.11373,0.376565,0.392021,0.435435,0.370587,0.494149,0.408628,0.428605,0.465471,0.335842,0.267124,0.231398,0.484457,0.459126,0.459126,0.474571,0.416892,0.498651,0.485941,0.442087,0.488895,0.492049,0.482081,0.484875,0.499926,0.482607,0.46543,0.499816,0.472525,0.485831,0.497897,0.484875,0.483636,0.370991,0.364817,0.392829,0.295623,0.406511,0.444593,0.232748,0.445085,0.223404,0.477573,0.356537,0.422651,0.433634,0.460739,0.489095,0.421043,0.479947,0.490547,0.489309,0.313046,0.325124,0.274502,0.474928,0.382387,0.333491,0.467628,0.436448,0.457044,0.472843,0.286828,0.332339,0.251262,0.485575,0.380415,0.4927,0.211613,0.230665,0.286731,0.464582,0.363464,0.346902,0.117223,0.333081,0.064522,0.03337,0.498978,0.144914,0.328979,0.297461,0.386082,0.1125,0.092549,0.272998,0.040859,0.186868,0.237344,0.184077,0.074452,0.057735,0.096883,0.241433,0.0,0.422407,0.383538,0.455382,0.248864,0.24642,0.044126,0.106285,0.318482,0.186174,0.06867,0.425027,0.464187,0.20268,0.145831,0.179071,0.242441,0.074452,0.14118,0.050021,0.176137,0.120637,0.440037,0.123952,0.302295,0.34748,0.241938,0.462947,0.079807,0.200147,0.081512,0.225053,0.261541,0.103689,0.496617,0.455002,0.46203,0.19562,0.12823,0.335253,0.2245,0.497406,0.216284,0.392222,0.283144,0.16153,0.122355,0.420648,0.372688,0.209829,0.139659,0.122355,0.099254,0.157423,0.141545,0.271536,0.173947,0.30837,0.296734,0.500069,0.408542,0.439757,0.449423,0.443428,0.252683,0.317665,0.255494,0.247419,0.498686,0.362232,0.394392,0.425352,0.38239,0.472164,0.415643,0.375901,0.496685,0.491105,0.482852,0.492509,0.126106,0.385093,0.462747,0.496422,0.205502,0.032809,0.396286,0.098002,0.221476,0.121897,0.086523,0.145216,0.270905,0.171064,0.103247,0.092447,0.275857,0.121897,0.121897,0.103247,0.483999,0.121897,0.491105,0.316794,0.475093,0.165025,0.424687,0.314143,0.392929,0.292993,0.179695,0.498559,0.363891,0.435791,0.44605,0.415508,0.404633,0.427735,0.496288,0.168861,0.15299,0.50011,0.499747,0.286945,0.069088,0.024478,0.197753,0.077196,0.207228,0.145318,0.042371,0.073257,0.154843,0.100439,0.08094,0.073257,0.214952,0.126195,0.173792,0.130706,0.419461,0.366694,0.156672,0.21869,0.227116,0.350381,0.494462,0.482044,0.453617,0.497988,0.496795,0.147277,0.201879,0.228288,0.240707,0.358728,0.064646,0.217453,0.471862,0.493739,0.496035,0.435329,0.48822,0.498209,0.44577,0.241486,0.438932,0.348256,0.342834,0.46192,0.455145,0.491443,0.324018,0.413584,0.474529,0.495507,0.221658,0.499475,0.493949,0.24382,0.415334,0.346916,0.417062,0.48822,0.495139,0.466095,0.453352,0.486415,0.301399,0.433852,0.492635,0.379908,0.472316,0.445108,0.246125,0.484478,0.495862,0.416201,0.458044,0.281541,0.289001,0.199336,0.062217,0.205193,0.094791,0.107342,0.340062
min,18.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,27.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,31.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,36.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,68.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [5]:
base_total.describe(exclude="number")

Unnamed: 0,0.a_token,0.d_data/hora_envio,1.a.1_faixa_idade,1.b_genero,1.c_cor/raca/etnia,1.d_pcd,1.e_experiencia_profissional_prejudicada,1.i.1_uf_onde_mora,1.i.2_regiao_onde_mora,1.f_aspectos_prejudicados,1.k.1_uf_de_origem,1.k.2_regiao_de_origem,1.g_vive_no_brasil,1.h_pais_onde_mora,1.i_estado_onde_mora,1.j_vive_no_estado_de_formacao,1.k_estado_de_origem,1.l_nivel_de_ensino,1.m_área_de_formação,2.a_situação_de_trabalho,2.b_setor,2.c_numero_de_funcionarios,2.d_atua_como_gestor,2.e_cargo_como_gestor,2.f_cargo_atual,2.g_nivel,2.h_faixa_salarial,2.i_tempo_de_experiencia_em_dados,2.j_tempo_de_experiencia_em_ti,2.k_satisfeito_atualmente,2.l_motivo_insatisfacao,2.m_participou_de_entrevistas_ultimos_6m,2.n_planos_de_mudar_de_emprego_6m,2.o_criterios_para_escolha_de_emprego,2.q_empresa_passou_por_layoff_em_2024,2.r_modelo_de_trabalho_atual,2.s_modelo_de_trabalho_ideal,2.t_atitude_em_caso_de_retorno_presencial,3.a_numero_de_pessoas_em_dados,3.b_cargos_no_time_de_dados_da_empresa,3.c_responsabilidades_como_gestor,3.d_desafios_como_gestor,3.e_ai_generativa_e_llm_é_uma_prioridade?,3.f_tipo_de_uso_de_ai_generativa_e_llm_na_empresa,3.g_motivos_para_não_usar_ai_generativa_e_llm,4.a_funcao_de_atuacao,4.a.1_atuacao_em_dados,4.b_fontes_de_dados_(dia_a_dia),4.c_fonte_de_dado_mais_usada,4.d_linguagem_de_programacao_(dia_a_dia),4.e_linguagem_mais_usada,4.f_linguagem_preferida,4.g_banco_de_dados_(dia_a_dia),4.h_cloud_(dia_a_dia),4.i_cloud_preferida,4.j_ferramenta_de_bi_(dia_a_dia),4.k_ferramenta_de_bi_preferida,4.l_tipo_de_uso_de_ai_generativa_e_llm_na_empresa,4.m_usa_chatgpt_ou_copilot_no_trabalho?,5.a_objetivo_na_area_de_dados,5.b_oportunidade_buscada,5.c_tempo_em_busca_de_oportunidade,5.d_experiencia_em_processos_seletivos,6.a_rotina_como_de,6.b_ferramentas_etl_de,6.c_possui_data_lake,6.d_tecnologia_data_lake,6.e_possui_data_warehouse,6.f_tecnologia_data_warehouse,6.g_ferramentas_de_qualidade_de_dados_(dia_a_dia),6.h_maior_tempo_gasto_como_de,7.a_rotina_como_da,7.b_ferramentas_etl_da,7.c_ferramentas_autonomia_area_de_negocios,7.d_maior_tempo_gasto_como_da,8.a_rotina_como_ds,8.b_tecnicas_e_metodos_ds,8.c_tecnologias_ds,8.d_maior_tempo_gasto_como_ds
count,5217,5217,5217,5217,5217,5217,2641,5075,5075,1289,953,953,5217,139,5078,5078,977,5217,5123,5217,4863,4863,4863,1045,3818,3818,4863,4863,4863,4863,1527,4863,4863,4862,4863,4863,4863,4863,1045,1015,1045,1045,1045,1000,974,3799,5217,3619,3589,3589,3424,3424,3589,3589,3589,3619,3266,3619,3619,541,286,290,289,929,929,923,735,917,726,560,929,1669,1669,1669,1669,773,773,773,773
unique,5215,5188,9,4,7,3,20,25,5,573,25,5,2,46,27,2,27,7,8,13,21,8,2,5,15,3,13,7,7,2,439,5,4,504,3,4,4,3,8,806,750,711,5,228,264,5,6,482,76,208,14,33,1644,221,13,655,63,288,24,46,9,4,9,321,333,2,7,2,10,16,62,370,320,28,64,478,490,437,84
top,lb7gt5hdqqxuguv2lb7gto44mpk3ejha,14/10/2024 17:37:10,30-34,Masculino,Branca,Não,Não acredito que minha experiência profissiona...,SP,Sudeste,Atenção dada pelas pessoas diante das minhas o...,MG,Sudeste,True,Portugal,São Paulo (SP),True,Minas Gerais (MG),Pós-graduação,Computação / Engenharia de Software / Sistemas...,Empregado (CLT),Finanças ou Bancos,Acima de 3.000,False,Gerente/Head,Analista de Dados/Data Analyst,Sênior,de R$ 8.001/mês a R$ 12.000/mês,de 3 a 4 anos,Não tive experiência na área de TI/Engenharia ...,True,Remuneração/Salário não corresponde a realidad...,Não participei de entrevistas de emprego/proce...,"Não estou buscando, mas me considero aberto a ...","Remuneração/Salário, Benefícios, Flexibilidade...",Não ocorreram layoffs/demissões em massa na em...,Modelo 100% remoto,Modelo 100% remoto,Vou procurar outra oportunidade no modelo híbr...,Acima de 300 pessoas,Analista de Dados/Data Analyst,Pensar na visão de longo prazo de dados da emp...,Conseguir gerar valor para as áreas de negócio...,"Sim, está entre nossas principais prioridades ...",Colaboradores utilizando soluções baseadas em ...,Falta de compreensão dos casos de uso.,*Análise de Dados/BI:* Extrai e cruza dados un...,Análise de Dados,Dados relacionais (estruturados em bancos SQL)...,Dados relacionais (estruturados em bancos SQL)...,"SQL, Python",SQL,Python,Google BigQuery,Amazon Web Services (AWS),Não sei opinar / Não tenho preferência,Microsoft PowerBI,Microsoft PowerBI,Colaboradores utilizando soluções baseadas em ...,Utilizo apenas soluções gratuitas (como por ex...,Migração de carreira: Trabalho em outra área e...,Cientista de Dados/Data Scientist,0 - 6 meses,Ainda não me candidatei a nenhuma vaga na área,Desenvolvo pipelines de dados utilizando lingu...,"Scripts Python, SQL & Stored Procedures",True,Amazon S3 + Redshift + Athena,True,Google BigQuery,dbt,Desenvolvendo pipelines de dados utilizando li...,Realizo construções de dashboards em ferrament...,Não utilizo ferramentas de ETL,Minha empresa não utiliza essas ferramentas.,Realizando construções de dashboards em ferram...,Estudos Ad-hoc com o objetivo de confirmar hip...,"Utilizo modelos de regressão (linear, logístic...","Ferramentas de BI (PowerBI, Looker, Tableau, Q...",Estudos Ad-hoc com o objetivo de confirmar hip...
freq,2,3,1503,3968,3478,5037,1321,2064,3132,45,147,421,5078,39,2064,4101,147,1943,2076,3783,1035,2335,3818,400,957,1573,1080,1386,2634,3336,96,2432,1897,461,3470,2222,2238,2150,249,27,22,8,325,91,58,1669,1669,546,1358,1624,1760,3041,344,962,1165,938,1704,895,1629,182,104,187,116,62,77,763,208,743,190,190,124,118,259,688,270,22,15,15,64


In [6]:
colunas_lista = base_total.columns.to_list()
colunas_lista

['0.a_token',
 '0.d_data/hora_envio',
 '1.a_idade',
 '1.a.1_faixa_idade',
 '1.b_genero',
 '1.c_cor/raca/etnia',
 '1.d_pcd',
 '1.e_experiencia_profissional_prejudicada',
 '1.e.1_Não acredito que minha experiência profissional seja afetada',
 '1.e.2_Sim, devido a minha Cor/Raça/Etnia',
 '1.e.3_Sim, devido a minha identidade de gênero',
 '1.e.4_Sim, devido ao fato de ser PCD',
 '1.i.1_uf_onde_mora',
 '1.f.1_Quantidade de oportunidades de emprego/vagas recebidas',
 '1.f.2_Senioridade das vagas recebidas em relação à sua experiência',
 '1.f.3_Aprovação em processos seletivos/entrevistas',
 '1.f.4_Oportunidades de progressão de carreira',
 '1.f.5_Velocidade de progressão de carreira',
 '1.f.6_Nível de cobrança no trabalho/Stress no trabalho',
 '1.f.7_Atenção dada pelas pessoas diante das minhas opiniões e ideias',
 '1.f.8_Relação com outras pessoas da empresa, em momentos de trabalho',
 '1.f.9_Relação com outras pessoas da empresa, em momentos de integração e outros momentos fora do trabalho

In [7]:
base_total['4.e_linguagem_mais_usada'].value_counts()

4.e_linguagem_mais_usada
SQL                                                        1760
Python                                                     1430
R                                                            90
Visual Basic/VBA                                             36
Não utilizo nenhuma das linguagens listadas no trabalho      25
Scala                                                        22
JavaScript                                                   19
SAS/Stata                                                    15
Java                                                         10
C/C++/C#                                                      7
.NET                                                          4
PHP                                                           4
Julia                                                         1
Matlab                                                        1
Name: count, dtype: int64

Devido a grande quantidade de colunas (324) iremos selecionar alguma colunas hipotéticamente que podem afetar a faixa de salários para analisá-las.

In [8]:
coluna_alvo = []
colunas_analise = []
coluna_alvo = ['2.h_faixa_salarial']

colunas_analise = [
'1.i.2_regiao_onde_mora',
 '1.g_vive_no_brasil',
'1.l_nivel_de_ensino',
 '1.m_área_de_formação',
 '2.a_situação_de_trabalho',
'2.b_setor',
 '2.d_atua_como_gestor',
'2.f_cargo_atual',
'2.g_nivel',
'2.i_tempo_de_experiencia_em_dados',
 '2.r_modelo_de_trabalho_atual',
'4.a.1_atuacao_em_dados',
  '4.d.2_R', #'4.d_linguagem_de_programacao_(dia_a_dia)',
    '4.e_linguagem_mais_usada',
'4.m.2 Uso soluções gratuitas de AI Generativa com foco em produtividade',
 '4.m.3 Uso e pago pelas soluções de AI Generativa com foco em produtividade',
]

colunas = colunas_analise + coluna_alvo

In [9]:
base = base_total[colunas]


In [10]:
base.head(5)

Unnamed: 0,1.i.2_regiao_onde_mora,1.g_vive_no_brasil,1.l_nivel_de_ensino,1.m_área_de_formação,2.a_situação_de_trabalho,2.b_setor,2.d_atua_como_gestor,2.f_cargo_atual,2.g_nivel,2.i_tempo_de_experiencia_em_dados,2.r_modelo_de_trabalho_atual,4.a.1_atuacao_em_dados,4.d.2_R,4.e_linguagem_mais_usada,4.m.2 Uso soluções gratuitas de AI Generativa com foco em produtividade,4.m.3 Uso e pago pelas soluções de AI Generativa com foco em produtividade,2.h_faixa_salarial
0,Sul,True,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Marketing,False,Analista de Dados/Data Analyst,Júnior,de 1 a 2 anos,Modelo 100% remoto,Análise de Dados,0.0,Python,1.0,0.0,de R$ 1.001/mês a R$ 2.000/mês
1,Sul,True,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Finanças ou Bancos,False,Analista de BI/BI Analyst,Júnior,Menos de 1 ano,Modelo 100% presencial,Análise de Dados,0.0,Python,1.0,0.0,Menos de R$ 1.000/mês
2,Sudeste,True,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Empregado (CLT),Indústria,False,Outra Opção,Júnior,Não tenho experiência na área de dados,Modelo 100% presencial,Outra atuação,0.0,,1.0,0.0,de R$ 1.001/mês a R$ 2.000/mês
3,Sudeste,True,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Tecnologia/Fábrica de Software,False,Analista de Dados/Data Analyst,Júnior,Menos de 1 ano,Modelo híbrido flexível (o funcionário tem lib...,Análise de Dados,0.0,SQL,1.0,0.0,de R$ 1.001/mês a R$ 2.000/mês
4,Sudeste,True,Estudante de Graduação,Computação / Engenharia de Software / Sistemas...,Estagiário,Tecnologia/Fábrica de Software,False,Desenvolvedor/ Engenheiro de Software/ Analist...,Júnior,Menos de 1 ano,Modelo 100% remoto,Outra atuação,0.0,JavaScript,1.0,0.0,de R$ 1.001/mês a R$ 2.000/mês


Renomear Colunas

In [11]:
colunas_renomeadas = [
    'regiao_onde_mora',
    'vive_no_brasil',
    'nivel_de_ensino',
    'área_de_formação',
    'situação_de_trabalho',
    'setor_da_empresa',
    'atua_como_gestor',
    'cargo_atual',
    'nivel_cargo',
    'tempo_de_experiencia_em_dados',
    'modelo_de_trabalho_atual',
    'atuacao_em_dados',
    'linguagem_R_no_trabalho',
    'linguagem_mais_usada',
'Uso soluções gratuitas de AI Generativa com foco em produtividade',
 'Uso e pago pelas soluções de AI Generativa com foco em produtividade',
    'faixa_salarial'
]

In [12]:
base.columns = colunas_renomeadas

In [13]:
base.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5217 entries, 0 to 5216
Data columns (total 17 columns):
 #   Column                                                                Non-Null Count  Dtype  
---  ------                                                                --------------  -----  
 0   regiao_onde_mora                                                      5075 non-null   object 
 1   vive_no_brasil                                                        5217 non-null   bool   
 2   nivel_de_ensino                                                       5217 non-null   object 
 3   área_de_formação                                                      5123 non-null   object 
 4   situação_de_trabalho                                                  5217 non-null   object 
 5   setor_da_empresa                                                      4863 non-null   object 
 6   atua_como_gestor                                                      4863 non-null   object 
 7

In [14]:
base.isnull().sum()

regiao_onde_mora                                                         142
vive_no_brasil                                                             0
nivel_de_ensino                                                            0
área_de_formação                                                          94
situação_de_trabalho                                                       0
setor_da_empresa                                                         354
atua_como_gestor                                                         354
cargo_atual                                                             1399
nivel_cargo                                                             1399
tempo_de_experiencia_em_dados                                            354
modelo_de_trabalho_atual                                                 354
atuacao_em_dados                                                           0
linguagem_R_no_trabalho                                                 1628

In [15]:
pd.crosstab(index=base['situação_de_trabalho'], columns=base['faixa_salarial'], values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=False, normalize=False)

faixa_salarial,Acima de R$ 40.001/mês,Menos de R$ 1.000/mês,de R$ 1.001/mês a R$ 2.000/mês,de R$ 12.001/mês a R$ 16.000/mês,de R$ 16.001/mês a R$ 20.000/mês,de R$ 2.001/mês a R$ 3.000/mês,de R$ 20.001/mês a R$ 25.000/mês,de R$ 25.001/mês a R$ 30.000/mês,de R$ 3.001/mês a R$ 4.000/mês,de R$ 30.001/mês a R$ 40.000/mês,de R$ 4.001/mês a R$ 6.000/mês,de R$ 6.001/mês a R$ 8.000/mês,de R$ 8.001/mês a R$ 12.000/mês,NaN
situação_de_trabalho,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Desempregado e não estou buscando recolocação,0,0,0,4,2,0,0,0,1,1,1,0,2,0
"Desempregado, buscando recolocação",0,0,0,0,0,0,0,0,0,0,0,0,0,205
Empreendedor ou Empregado (CNPJ),18,2,7,73,56,15,35,26,34,18,62,59,89,0
Empregado (CLT),51,2,43,599,354,155,168,100,211,85,506,567,942,0
Estagiário,1,21,96,1,0,52,0,0,12,0,1,2,0,0
Freelancer,1,4,4,3,3,4,1,0,9,2,3,5,2,0
Prefiro não informar,0,2,3,1,2,1,0,0,0,0,1,2,4,0
Servidor Público,2,1,2,18,14,9,9,7,2,8,13,13,26,0
Somente Estudante (graduação),0,0,0,0,0,0,0,0,0,0,0,0,0,68
Somente Estudante (pós-graduação),0,0,0,0,0,0,0,0,0,0,0,0,0,25


Excluir as linhas com Faixa Salarial nulas:

    - Essas linhas são de respostas de pessoas que não trabalham com remuneração ainda, por isso não faz sentido manter esse caso na análise da Faixa Salarial.

In [16]:
base = base.dropna(axis=0,subset='faixa_salarial')

In [17]:
base.isnull().sum()

regiao_onde_mora                                                         130
vive_no_brasil                                                             0
nivel_de_ensino                                                            0
área_de_formação                                                          81
situação_de_trabalho                                                       0
setor_da_empresa                                                           0
atua_como_gestor                                                           0
cargo_atual                                                             1045
nivel_cargo                                                             1045
tempo_de_experiencia_em_dados                                              0
modelo_de_trabalho_atual                                                   0
atuacao_em_dados                                                           0
linguagem_R_no_trabalho                                                 1274

In [18]:
base.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4863 entries, 0 to 5215
Data columns (total 17 columns):
 #   Column                                                                Non-Null Count  Dtype  
---  ------                                                                --------------  -----  
 0   regiao_onde_mora                                                      4733 non-null   object 
 1   vive_no_brasil                                                        4863 non-null   bool   
 2   nivel_de_ensino                                                       4863 non-null   object 
 3   área_de_formação                                                      4782 non-null   object 
 4   situação_de_trabalho                                                  4863 non-null   object 
 5   setor_da_empresa                                                      4863 non-null   object 
 6   atua_como_gestor                                                      4863 non-null   object 
 7   ca

In [19]:
base['situação_de_trabalho'].value_counts()

situação_de_trabalho
Empregado (CLT)                                                    3783
Empreendedor ou Empregado (CNPJ)                                    494
Estagiário                                                          186
Vivo no Brasil e trabalho remoto para empresa de fora do Brasil     131
Servidor Público                                                    124
Vivo fora do Brasil e trabalho para empresa de fora do Brasil        77
Freelancer                                                           41
Prefiro não informar                                                 16
Desempregado e não estou buscando recolocação                        11
Name: count, dtype: int64

Nosso objetivo é focado em empregados do tipo CLT.

Vamos manter apenas as respostas com essa situacao de trabalho.

In [20]:
base = base[base['situação_de_trabalho'] == "Empregado (CLT)"]

In [21]:
#Simplificar a coluna sobre uso de IA para considerar tanto as pagas quanto gratuitas
condicao = (base['Uso soluções gratuitas de AI Generativa com foco em produtividade'] == 1) | (base['Uso e pago pelas soluções de AI Generativa com foco em produtividade'] == 0)
base['Uso AI Generativa com foco em produtividade'] = condicao.map({True: 1, False: 0})

base = base.drop(columns=['Uso soluções gratuitas de AI Generativa com foco em produtividade', 'Uso e pago pelas soluções de AI Generativa com foco em produtividade']).copy()


In [22]:
base.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3783 entries, 2 to 5210
Data columns (total 16 columns):
 #   Column                                       Non-Null Count  Dtype  
---  ------                                       --------------  -----  
 0   regiao_onde_mora                             3749 non-null   object 
 1   vive_no_brasil                               3783 non-null   bool   
 2   nivel_de_ensino                              3783 non-null   object 
 3   área_de_formação                             3732 non-null   object 
 4   situação_de_trabalho                         3783 non-null   object 
 5   setor_da_empresa                             3783 non-null   object 
 6   atua_como_gestor                             3783 non-null   object 
 7   cargo_atual                                  3048 non-null   object 
 8   nivel_cargo                                  3048 non-null   object 
 9   tempo_de_experiencia_em_dados                3783 non-null   object 
 10  model

In [23]:
base.describe(exclude="number")

Unnamed: 0,regiao_onde_mora,vive_no_brasil,nivel_de_ensino,área_de_formação,situação_de_trabalho,setor_da_empresa,atua_como_gestor,cargo_atual,nivel_cargo,tempo_de_experiencia_em_dados,modelo_de_trabalho_atual,atuacao_em_dados,linguagem_mais_usada,faixa_salarial
count,3749,3783,3783,3732,3783,3783,3783,3048,3048,3783,3783,3783,2759,3783
unique,5,2,7,8,1,21,2,15,3,7,4,6,13,13
top,Sudeste,True,Pós-graduação,Computação / Engenharia de Software / Sistemas...,Empregado (CLT),Finanças ou Bancos,False,Analista de Dados/Data Analyst,Sênior,de 3 a 4 anos,Modelo 100% remoto,Análise de Dados,SQL,de R$ 8.001/mês a R$ 12.000/mês
freq,2387,3750,1541,1496,3783,937,3048,797,1330,1160,1708,1389,1471,942


In [24]:
base.nunique()

regiao_onde_mora                                5
vive_no_brasil                                  2
nivel_de_ensino                                 7
área_de_formação                                8
situação_de_trabalho                            1
setor_da_empresa                               21
atua_como_gestor                                2
cargo_atual                                    15
nivel_cargo                                     3
tempo_de_experiencia_em_dados                   7
modelo_de_trabalho_atual                        4
atuacao_em_dados                                6
linguagem_R_no_trabalho                         2
linguagem_mais_usada                           13
faixa_salarial                                 13
Uso AI Generativa com foco em produtividade     2
dtype: int64

In [25]:
colunas_binarias = base.nunique()[base.nunique() == 2].index.tolist()

colunas_binarias

['vive_no_brasil',
 'atua_como_gestor',
 'linguagem_R_no_trabalho',
 'Uso AI Generativa com foco em produtividade']

Essas colunas binárias são todas categóricas de "Sim" ou "Não", onde 1=Sim e 0=Não.

Formatar colunas para o tipo categórico e renomear para "Sim" ou "Não"

In [26]:
for coluna in colunas_binarias:
    base[coluna] = pd.Categorical(base[coluna]).rename_categories(["Não","Sim"])

In [27]:
#Tranformar coluna "nivel_de_ensino" em categorica ordenada

In [28]:
base['faixa_salarial'].value_counts()

faixa_salarial
de R$ 8.001/mês a R$ 12.000/mês     942
de R$ 12.001/mês a R$ 16.000/mês    599
de R$ 6.001/mês a R$ 8.000/mês      567
de R$ 4.001/mês a R$ 6.000/mês      506
de R$ 16.001/mês a R$ 20.000/mês    354
de R$ 3.001/mês a R$ 4.000/mês      211
de R$ 20.001/mês a R$ 25.000/mês    168
de R$ 2.001/mês a R$ 3.000/mês      155
de R$ 25.001/mês a R$ 30.000/mês    100
de R$ 30.001/mês a R$ 40.000/mês     85
Acima de R$ 40.001/mês               51
de R$ 1.001/mês a R$ 2.000/mês       43
Menos de R$ 1.000/mês                 2
Name: count, dtype: int64

In [29]:
ordem_salario = [
'Menos de R$ 1.000/mês',
'de R$ 1.001/mês a R$ 2.000/mês',
'de R$ 2.001/mês a R$ 3.000/mês',       
'de R$ 3.001/mês a R$ 4.000/mês',
'de R$ 4.001/mês a R$ 6.000/mês', 
'de R$ 6.001/mês a R$ 8.000/mês',
'de R$ 8.001/mês a R$ 12.000/mês',
'de R$ 12.001/mês a R$ 16.000/mês',
'de R$ 16.001/mês a R$ 20.000/mês',
'de R$ 20.001/mês a R$ 25.000/mês',
'de R$ 25.001/mês a R$ 30.000/mês',
'de R$ 30.001/mês a R$ 40.000/mês',       
'Acima de R$ 40.001/mês'
]

In [30]:
ordem_salario

['Menos de R$ 1.000/mês',
 'de R$ 1.001/mês a R$ 2.000/mês',
 'de R$ 2.001/mês a R$ 3.000/mês',
 'de R$ 3.001/mês a R$ 4.000/mês',
 'de R$ 4.001/mês a R$ 6.000/mês',
 'de R$ 6.001/mês a R$ 8.000/mês',
 'de R$ 8.001/mês a R$ 12.000/mês',
 'de R$ 12.001/mês a R$ 16.000/mês',
 'de R$ 16.001/mês a R$ 20.000/mês',
 'de R$ 20.001/mês a R$ 25.000/mês',
 'de R$ 25.001/mês a R$ 30.000/mês',
 'de R$ 30.001/mês a R$ 40.000/mês',
 'Acima de R$ 40.001/mês']

In [31]:
base["faixa_salarial"] = pd.Categorical(
    base["faixa_salarial"], 
    categories=ordem_salario, 
    ordered=True
).rename_categories(
    [
        "0-1k",
        "1k-2k",
        "2k-3k",
        "3k-4k",
        "4k-6k",
        "6k-8k",
        "8k-12k",
        "12k-16k",
        "16k-20k",
        "20k-25k",
        "25k-30k",
        "30k-40k",
        "40k+",
    ]
)

base["faixa_salarial"].head()

2     1k-2k
6     6k-8k
12     0-1k
13    3k-4k
14    4k-6k
Name: faixa_salarial, dtype: category
Categories (13, object): ['0-1k' < '1k-2k' < '2k-3k' < '3k-4k' ... '20k-25k' < '25k-30k' < '30k-40k' < '40k+']

Salários menores que 1k só tem 2 itens e podemos considerar esse range com Outlier. Para fazer a análise precisamos remover esses 2 outliers

In [32]:
base = base[base["faixa_salarial"] != "0-1k"].copy()

In [33]:
base.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3781 entries, 2 to 5210
Data columns (total 16 columns):
 #   Column                                       Non-Null Count  Dtype   
---  ------                                       --------------  -----   
 0   regiao_onde_mora                             3747 non-null   object  
 1   vive_no_brasil                               3781 non-null   category
 2   nivel_de_ensino                              3781 non-null   object  
 3   área_de_formação                             3730 non-null   object  
 4   situação_de_trabalho                         3781 non-null   object  
 5   setor_da_empresa                             3781 non-null   object  
 6   atua_como_gestor                             3781 non-null   category
 7   cargo_atual                                  3046 non-null   object  
 8   nivel_cargo                                  3046 non-null   object  
 9   tempo_de_experiencia_em_dados                3781 non-null   object 

#### Ajustando as demais colunas categoricas

In [34]:
base['regiao_onde_mora'] = pd.Categorical(
    base['regiao_onde_mora'],
    ordered=False
)
base['regiao_onde_mora'].head(5)

2          Sudeste
6          Sudeste
13             Sul
14         Sudeste
20    Centro-oeste
Name: regiao_onde_mora, dtype: category
Categories (5, object): ['Centro-oeste', 'Nordeste', 'Norte', 'Sudeste', 'Sul']

In [35]:
base['regiao_onde_mora'] = pd.Categorical(
    base['regiao_onde_mora'],
    ordered=False
)
base['nivel_de_ensino'].head(5)

2     Estudante de Graduação
6      Graduação/Bacharelado
13    Estudante de Graduação
14    Estudante de Graduação
20    Estudante de Graduação
Name: nivel_de_ensino, dtype: object

In [36]:
ordem_ensino = [
    'Não tenho graduação formal',
    'Estudante de Graduação',
    'Graduação/Bacharelado',
    "Pós-graduação",
    "Mestrado",
    "Doutorado ou Phd"
]

# 2. Limpeza: Remover o "Prefiro não informar" para não enviesar a ordem
base = base[base["nivel_de_ensino"] != "Prefiro não informar"].copy()

base['nivel_de_ensino'] = pd.Categorical(
    base['nivel_de_ensino'],
    categories = ordem_ensino,
    ordered=True
)

base['nivel_de_ensino'].value_counts().sort_index()

nivel_de_ensino
Não tenho graduação formal      50
Estudante de Graduação         291
Graduação/Bacharelado         1279
Pós-graduação                 1540
Mestrado                       477
Doutorado ou Phd               143
Name: count, dtype: int64

In [37]:
base['área_de_formação'] = pd.Categorical(
    base['área_de_formação'],
)

base['área_de_formação'].value_counts().sort_index()

área_de_formação
Ciências Biológicas/ Farmácia/ Medicina/ Área da Saúde                      83
Computação / Engenharia de Software / Sistemas de Informação/ TI          1494
Economia/ Administração / Contabilidade / Finanças/ Negócios               582
Estatística/ Matemática / Matemática Computacional/ Ciências Atuariais     347
Marketing / Publicidade / Comunicação / Jornalismo / Ciências Sociais      132
Outra opção                                                                186
Outras Engenharias (não incluir engenharia de software ou TI)              817
Química / Física                                                            89
Name: count, dtype: int64

In [38]:
# 1. Crie um dicionário com De -> Para
mapeamento_nomes = {
    'Ciências Biológicas/ Farmácia/ Medicina/ Área da Saúde': 'Saúde',
 'Computação / Engenharia de Software / Sistemas de Informação/ TI':'Tecnologia' ,
 'Economia/ Administração / Contabilidade / Finanças/ Negócios':'Finanças',
 'Estatística/ Matemática / Matemática Computacional/ Ciências Atuariais': 'Matemática' ,
 'Marketing / Publicidade / Comunicação / Jornalismo / Ciências Sociais':'Marketing',
 'Outras Engenharias (não incluir engenharia de software ou TI)':'Engenharias'
}

# 2. Renomeie as categorias
base['área_de_formação'] = base['área_de_formação'].cat.rename_categories(mapeamento_nomes)

# 3. Verifique o resultado ordenado
print(base['área_de_formação'].value_counts().sort_index())

área_de_formação
Saúde                 83
Tecnologia          1494
Finanças             582
Matemática           347
Marketing            132
Outra opção          186
Engenharias          817
Química / Física      89
Name: count, dtype: int64


In [39]:
base['situação_de_trabalho'].value_counts()

situação_de_trabalho
Empregado (CLT)    3780
Name: count, dtype: int64

Como filtramos apenas as pessoas que estão trabalhando no formato CLT, a coluna de situação de trabalho se torna irrelevante pois só traz essa unica informação. Melhor excluir essa coluna para facilitar nossa análise.

In [40]:
base = base.drop('situação_de_trabalho', axis=1)

In [41]:
base['setor_da_empresa'].info()                       

<class 'pandas.core.series.Series'>
Index: 3780 entries, 2 to 5210
Series name: setor_da_empresa
Non-Null Count  Dtype 
--------------  ----- 
3780 non-null   object
dtypes: object(1)
memory usage: 59.1+ KB


In [42]:
base['setor_da_empresa'] = pd.Categorical(
    base['setor_da_empresa'],
)

base['setor_da_empresa'].value_counts().sort_index()

setor_da_empresa
Agronegócios                            59
Educação                               127
Entretenimento ou Esportes              27
Filantropia/ONG's                        9
Finanças ou Bancos                     937
Indústria                              291
Internet/Ecommerce                     130
Marketing                               59
Outra Opção                            253
Seguros ou Previdência                  54
Setor Alimentício                      122
Setor Automotivo                        41
Setor Farmaceutico                      22
Setor Imobiliário/ Construção Civil     54
Setor Público                           41
Setor de Energia                        63
Tecnologia/Fábrica de Software         681
Telecomunicação                         83
Varejo                                 351
Área da Saúde                          142
Área de Consultoria                    234
Name: count, dtype: int64

In [43]:
base['cargo_atual'] = pd.Categorical(
    base['cargo_atual'],
)

base['cargo_atual'].value_counts().sort_index()

cargo_atual
Analista de BI/BI Analyst                                      324
Analista de Dados/Data Analyst                                 797
Analista de Negócios/Business Analyst                          160
Analista de Suporte/Analista Técnico                            66
Analytics Engineer                                             188
Arquiteto de Dados/Data Architect                               43
Cientista de Dados/Data Scientist                              562
Data Product Manager/ Product Manager (PM/APM/DPM/GPM/PO)       73
Desenvolvedor/ Engenheiro de Software/ Analista de Sistemas     80
Engenheiro de Dados/Data Engineer/Data Architect               466
Engenheiro de Machine Learning/ML Engineer/AI Engineer          75
Estatístico                                                     10
Outra Opção                                                    170
Outras Engenharias (não inclui dev)                             21
Professor/Pesquisador                             

In [44]:
base['cargo_atual'].unique().tolist()

['Outra Opção',
 'Analista de BI/BI Analyst',
 'Engenheiro de Dados/Data Engineer/Data Architect',
 'Analytics Engineer',
 'Analista de Negócios/Business Analyst',
 nan,
 'Analista de Dados/Data Analyst',
 'Analista de Suporte/Analista Técnico',
 'Engenheiro de Machine Learning/ML Engineer/AI Engineer',
 'Cientista de Dados/Data Scientist',
 'Desenvolvedor/ Engenheiro de Software/ Analista de Sistemas',
 'Arquiteto de Dados/Data Architect',
 'Data Product Manager/ Product Manager (PM/APM/DPM/GPM/PO)',
 'Estatístico',
 'Outras Engenharias (não inclui dev)',
 'Professor/Pesquisador']

In [45]:
# 1. Crie um dicionário com De -> Para
mapeamento_nomes = {
    'Outra Opção': 'Outra Opção',
 'Analista de BI/BI Analyst':'Analista de BI' ,
 'Engenheiro de Dados/Data Engineer/Data Architect':'Engenheiro de Dados',
 'Analista de Negócios/Business Analyst': 'Analista de Negócios' ,
 'Analista de Dados/Data Analyst':'Analista de Dados',
 'Analista de Suporte/Analista Técnico':'Analista de Suporte',
 'Engenheiro de Machine Learning/ML Engineer/AI Engineer':'ML Engineer',
 'Cientista de Dados/Data Scientist': 'Cientista de Dados',
 'Desenvolvedor/ Engenheiro de Software/ Analista de Sistemas': 'Desenvolvedor',
 'Arquiteto de Dados/Data Architect':'Arquiteto de Dados' ,
 'Data Product Manager/ Product Manager (PM/APM/DPM/GPM/PO)':'Product Manager',
 'Outras Engenharias (não inclui dev)':'Outras Engenharias' 
}

# 2. Renomeie as categorias
base['cargo_atual'] = base['cargo_atual'].cat.rename_categories(mapeamento_nomes)

# 3. Verifique o resultado ordenado
print(base['cargo_atual'].value_counts().sort_index())

cargo_atual
Analista de BI           324
Analista de Dados        797
Analista de Negócios     160
Analista de Suporte       66
Analytics Engineer       188
Arquiteto de Dados        43
Cientista de Dados       562
Product Manager           73
Desenvolvedor             80
Engenheiro de Dados      466
ML Engineer               75
Estatístico               10
Outra Opção              170
Outras Engenharias        21
Professor/Pesquisador     10
Name: count, dtype: int64


In [46]:
ordem_cargo = [
    'Júnior',
    'Pleno',
    'Sênior'
]

base['nivel_cargo'] = pd.Categorical(
    base['nivel_cargo'],
    categories = ordem_cargo,
    ordered=True
)

base['nivel_cargo'].value_counts().sort_index()



nivel_cargo
Júnior     563
Pleno     1154
Sênior    1328
Name: count, dtype: int64

In [47]:
base['nivel_cargo'].head(5)

2     Júnior
6     Júnior
13    Júnior
14    Júnior
20     Pleno
Name: nivel_cargo, dtype: category
Categories (3, object): ['Júnior' < 'Pleno' < 'Sênior']

In [48]:
ordem_experiencia = [
    'Não tenho experiência na área de dados',
    'Menos de 1 ano',
    'de 1 a 2 anos',
    'de 3 a 4 anos',
    'de 5 a 6 anos',
    'de 7 a 10 anos',
    'Mais de 10 anos'
]

base['tempo_de_experiencia_em_dados'] = pd.Categorical(
    base['tempo_de_experiencia_em_dados'],
    categories = ordem_experiencia,
    ordered=True
)

base['tempo_de_experiencia_em_dados'].value_counts().sort_index()

tempo_de_experiencia_em_dados
Não tenho experiência na área de dados     134
Menos de 1 ano                             218
de 1 a 2 anos                              715
de 3 a 4 anos                             1160
de 5 a 6 anos                              666
de 7 a 10 anos                             438
Mais de 10 anos                            449
Name: count, dtype: int64

In [49]:
ordem_modelo = [
    'Modelo 100% presencial',
    'Modelo híbrido com dias fixos de trabalho presencial',
    'Modelo híbrido flexível (o funcionário tem liberdade para escolher quando estar no escritório presencialmente)',
    'Modelo 100% remoto'
]

base['modelo_de_trabalho_atual'] = pd.Categorical(
    base['modelo_de_trabalho_atual'],
    categories = ordem_modelo,
    ordered=True
)

base['modelo_de_trabalho_atual'].value_counts().sort_index()

modelo_de_trabalho_atual
Modelo 100% presencial                                                                                             584
Modelo híbrido com dias fixos de trabalho presencial                                                               701
Modelo híbrido flexível (o funcionário tem liberdade para escolher quando estar no escritório presencialmente)     788
Modelo 100% remoto                                                                                                1707
Name: count, dtype: int64

In [50]:
# 1. Crie um dicionário com De -> Para
mapeamento_nomes = {
    'Modelo 100% presencial': 'Presencial',
    'Modelo híbrido com dias fixos de trabalho presencial': 'Híbrido Fixo',
    'Modelo híbrido flexível (o funcionário tem liberdade para escolher quando estar no escritório presencialmente)': 'Híbrido Flexível',
    'Modelo 100% remoto': 'Remoto'
}

# 2. Renomeie as categorias
base['modelo_de_trabalho_atual'] = base['modelo_de_trabalho_atual'].cat.rename_categories(mapeamento_nomes)

# 3. Verifique o resultado ordenado
print(base['modelo_de_trabalho_atual'].value_counts().sort_index())

modelo_de_trabalho_atual
Presencial           584
Híbrido Fixo         701
Híbrido Flexível     788
Remoto              1707
Name: count, dtype: int64


In [51]:

base['atuacao_em_dados'] = pd.Categorical(
    base['atuacao_em_dados'],
    ordered=False
)



base['atuacao_em_dados'].value_counts().sort_index()

atuacao_em_dados
Análise de Dados                          1389
Buscando oportunidade na área de dados      95
Ciência de Dados                           621
Engenharia de Dados                        719
Gestor                                     735
Outra atuação                              221
Name: count, dtype: int64

In [52]:
base['atuacao_em_dados'] = base['atuacao_em_dados'].cat.rename_categories({
    'Buscando oportunidade na área de dados': 'Fora de Dados', 
})

In [53]:
base['linguagem_mais_usada'] = pd.Categorical(
    base['linguagem_mais_usada'],
    ordered=False
)

base['linguagem_mais_usada'] = base['linguagem_mais_usada'].cat.rename_categories({
    'Não utilizo nenhuma das linguagens listadas no trabalho': 'Nenhuma listada', 
})

base['linguagem_mais_usada'].value_counts().sort_index()



linguagem_mais_usada
.NET                   4
C/C++/C#               3
Java                   5
JavaScript            13
Matlab                 1
Nenhuma listada       18
PHP                    2
Python              1124
R                     52
SAS/Stata             13
SQL                 1470
Scala                 22
Visual Basic/VBA      30
Name: count, dtype: int64

In [54]:
base['Uso AI Generativa com foco em produtividade'].value_counts()

Uso AI Generativa com foco em produtividade
Sim    2522
Não    1258
Name: count, dtype: int64

In [55]:
base.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3780 entries, 2 to 5210
Data columns (total 15 columns):
 #   Column                                       Non-Null Count  Dtype   
---  ------                                       --------------  -----   
 0   regiao_onde_mora                             3746 non-null   category
 1   vive_no_brasil                               3780 non-null   category
 2   nivel_de_ensino                              3780 non-null   category
 3   área_de_formação                             3730 non-null   category
 4   setor_da_empresa                             3780 non-null   category
 5   atua_como_gestor                             3780 non-null   category
 6   cargo_atual                                  3045 non-null   category
 7   nivel_cargo                                  3045 non-null   category
 8   tempo_de_experiencia_em_dados                3780 non-null   category
 9   modelo_de_trabalho_atual                     3780 non-null   categor

Prencher as colunas nulas em linguagem_mais_usada com a categoria "Não utilizo nenhuma das linguagens listadas no trabalho"

In [56]:
# 2. Preencher os valores nulos (NaN) com a frase desejada
base['linguagem_mais_usada'] = base['linguagem_mais_usada'].fillna("Nenhuma listada")

# 3. Verificar o resultado
print(base['linguagem_mais_usada'].value_counts())

linguagem_mais_usada
SQL                 1470
Python              1124
Nenhuma listada     1041
R                     52
Visual Basic/VBA      30
Scala                 22
JavaScript            13
SAS/Stata             13
Java                   5
.NET                   4
C/C++/C#               3
PHP                    2
Matlab                 1
Name: count, dtype: int64


In [57]:
base.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3780 entries, 2 to 5210
Data columns (total 15 columns):
 #   Column                                       Non-Null Count  Dtype   
---  ------                                       --------------  -----   
 0   regiao_onde_mora                             3746 non-null   category
 1   vive_no_brasil                               3780 non-null   category
 2   nivel_de_ensino                              3780 non-null   category
 3   área_de_formação                             3730 non-null   category
 4   setor_da_empresa                             3780 non-null   category
 5   atua_como_gestor                             3780 non-null   category
 6   cargo_atual                                  3045 non-null   category
 7   nivel_cargo                                  3045 non-null   category
 8   tempo_de_experiencia_em_dados                3780 non-null   category
 9   modelo_de_trabalho_atual                     3780 non-null   categor

In [58]:
print(base['linguagem_R_no_trabalho'].value_counts())

linguagem_R_no_trabalho
Não    2633
Sim     255
Name: count, dtype: int64


Prencher as colunas nulas em **"linguagem_R_no_trabalho"** com a categoria **"Não"**

In [59]:
# 2. Preencher os valores nulos (NaN) com a frase desejada
base['linguagem_R_no_trabalho'] = base['linguagem_R_no_trabalho'].fillna("Não")

# 3. Verificar o resultado
print(base['linguagem_R_no_trabalho'].value_counts())

linguagem_R_no_trabalho
Não    3525
Sim     255
Name: count, dtype: int64


#### Tratamento de colunas Nulas em "cargo_atual" e "nivel_cargo"

In [60]:
(base['atua_como_gestor'] == 'Sim').sum()

np.int64(735)

In [61]:
print(base['cargo_atual'].isna().sum())

735


In [62]:
print(base['nivel_cargo'].isna().sum())

735


Podemos ver acima que a linhas nulas nessas colunas é porque a pessoa selecionou que é gestor e por isso essa questão não foi respondida já que é voltada para não gestores. Vamos prencher as linhas nulas dessa coluna com a categoria "Gestor"

In [63]:
# 1. Adicionar a nova categoria à lista de categorias permitidas
# Se não fizeres isto, o fillna dará erro porque "Gestor" não é Sim nem Não
base['cargo_atual'] = base['cargo_atual'].cat.add_categories(["Gestor"])
base['nivel_cargo'] = base['nivel_cargo'].cat.add_categories(["Gestor"])


# 2. Preencher os campos nulos (NaN) com a nova categoria
base['cargo_atual'] = base['cargo_atual'].fillna("Gestor")
base['nivel_cargo'] = base['nivel_cargo'].fillna("Gestor")

In [64]:
base.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3780 entries, 2 to 5210
Data columns (total 15 columns):
 #   Column                                       Non-Null Count  Dtype   
---  ------                                       --------------  -----   
 0   regiao_onde_mora                             3746 non-null   category
 1   vive_no_brasil                               3780 non-null   category
 2   nivel_de_ensino                              3780 non-null   category
 3   área_de_formação                             3730 non-null   category
 4   setor_da_empresa                             3780 non-null   category
 5   atua_como_gestor                             3780 non-null   category
 6   cargo_atual                                  3780 non-null   category
 7   nivel_cargo                                  3780 non-null   category
 8   tempo_de_experiencia_em_dados                3780 non-null   category
 9   modelo_de_trabalho_atual                     3780 non-null   categor

In [65]:
(base['regiao_onde_mora'].isna().sum())

np.int64(34)

In [66]:
3782-3748

34

34 linhas da coluna **regiao_onde_mora** estão vazias porque são caso de pessoas que moram fora do Brasil. Vamos preencher esses campos vazios com a sigla **"EX"** que nesse caso significa pessoas que moram no **Exterior**.

In [67]:
# 1. Adicionar a nova categoria à lista de categorias permitidas
base['regiao_onde_mora'] = base['regiao_onde_mora'].cat.add_categories(["EX"])

# 2. Preencher os campos nulos (NaN) com a nova categoria
base['regiao_onde_mora'] = base['regiao_onde_mora'].fillna("EX")

base['regiao_onde_mora'].value_counts()

regiao_onde_mora
Sudeste         2386
Sul              805
Nordeste         329
Centro-oeste     191
Norte             35
EX                34
Name: count, dtype: int64

In [68]:
base.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3780 entries, 2 to 5210
Data columns (total 15 columns):
 #   Column                                       Non-Null Count  Dtype   
---  ------                                       --------------  -----   
 0   regiao_onde_mora                             3780 non-null   category
 1   vive_no_brasil                               3780 non-null   category
 2   nivel_de_ensino                              3780 non-null   category
 3   área_de_formação                             3730 non-null   category
 4   setor_da_empresa                             3780 non-null   category
 5   atua_como_gestor                             3780 non-null   category
 6   cargo_atual                                  3780 non-null   category
 7   nivel_cargo                                  3780 non-null   category
 8   tempo_de_experiencia_em_dados                3780 non-null   category
 9   modelo_de_trabalho_atual                     3780 non-null   categor

In [69]:
base.describe()

Unnamed: 0,regiao_onde_mora,vive_no_brasil,nivel_de_ensino,área_de_formação,setor_da_empresa,atua_como_gestor,cargo_atual,nivel_cargo,tempo_de_experiencia_em_dados,modelo_de_trabalho_atual,atuacao_em_dados,linguagem_R_no_trabalho,linguagem_mais_usada,faixa_salarial,Uso AI Generativa com foco em produtividade
count,3780,3780,3780,3730,3780,3780,3780,3780,3780,3780,3780,3780,3780,3780,3780
unique,6,2,6,8,21,2,16,4,7,4,6,2,13,12,2
top,Sudeste,Sim,Pós-graduação,Tecnologia,Finanças ou Bancos,Não,Analista de Dados,Sênior,de 3 a 4 anos,Remoto,Análise de Dados,Não,SQL,8k-12k,Sim
freq,2386,3747,1540,1494,937,3045,797,1328,1160,1707,1389,3525,1470,942,2522


Agora temos todas as colunas com 3782 linhas preenchidas, sem nulos e com os tipos adequados para começar a análise estatística.

Vamos exportar essa tabela em formato parquet e salva na nossa pasta de dados.

In [70]:
base.to_parquet(PASTA_DADOS / "State_of_Data_2024_tratado.parquet", index=False)

In [71]:
base_tratado = pd.read_parquet(DADOS_TRATADO)

base_tratado.head()


Unnamed: 0,regiao_onde_mora,vive_no_brasil,nivel_de_ensino,área_de_formação,setor_da_empresa,atua_como_gestor,cargo_atual,nivel_cargo,tempo_de_experiencia_em_dados,modelo_de_trabalho_atual,atuacao_em_dados,linguagem_R_no_trabalho,linguagem_mais_usada,faixa_salarial,Uso AI Generativa com foco em produtividade
0,Sudeste,Sim,Estudante de Graduação,Tecnologia,Indústria,Não,Outra Opção,Júnior,Não tenho experiência na área de dados,Presencial,Outra atuação,Não,Nenhuma listada,1k-2k,Sim
1,Sudeste,Sim,Graduação/Bacharelado,Finanças,Telecomunicação,Não,Analista de BI,Júnior,de 5 a 6 anos,Presencial,Engenharia de Dados,Não,C/C++/C#,6k-8k,Sim
2,Sul,Sim,Estudante de Graduação,Tecnologia,Área de Consultoria,Não,Engenheiro de Dados,Júnior,de 1 a 2 anos,Remoto,Engenharia de Dados,Não,Python,3k-4k,Não
3,Sudeste,Sim,Estudante de Graduação,Tecnologia,Varejo,Não,Analytics Engineer,Júnior,Menos de 1 ano,Remoto,Análise de Dados,Não,Python,4k-6k,Sim
4,Centro-oeste,Sim,Estudante de Graduação,Matemática,Finanças ou Bancos,Não,Analista de Negócios,Pleno,de 1 a 2 anos,Presencial,Outra atuação,Não,Visual Basic/VBA,2k-3k,Sim
