<h1>Economic analysis of college majors</h1>

<img src='https://voupassar.club/wp-content/uploads/2016/10/Licenciatura-plena.jpg' width='auto'></img>
<p>Millions of college students face a grim reality: a college degree is no guarantee of economic success. Through their choice of major, however, they can take at least some steps to improve their odds.</p>
<h2>Objective:</h2>
<p>The dataset analyzed here is a guide to the earnings of the listed majors. For each major it reports the median salary, along with the 25th and 75th percentiles.</p>
<p>Source: <a href="https://www.kaggle.com/datasets/williecosta/economic-guide-to-college-majors">https://www.kaggle.com/datasets/williecosta/economic-guide-to-college-majors</a></p>


<h3>Libraries used:</h3>

In [19]:
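import pandas as pd
import numpy as np
import seaborn as sns
from IPython.display import display  # built into Jupyter; needed only when running outside a notebook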

<h3>Loading the dataset:</h3>

In [20]:
dataset = pd.read_csv('./college_majors.csv')
display(dataset)

,Rank,Major_code,Major,Total,Men,Women,Major_category,ShareWomen,Sample_size,Employed,...,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,2339.0,2057.0,282.0,Engineering,0.1206,36,1976,...,270,1207,37,0.0184,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,756.0,679.0,77.0,Engineering,0.1019,7,640,...,170,388,85,0.1172,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,856.0,725.0,131.0,Engineering,0.1530,3,648,...,133,340,16,0.0241,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,1258.0,1123.0,135.0,Engineering,0.1073,16,758,...,150,692,40,0.0501,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,32260.0,21239.0,11021.0,Engineering,0.3416,289,25694,...,5180,16697,1672,0.0611,65000,50000,75000,18314,4440,972
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
168,169,3609,ZOOLOGY,8409.0,3050.0,5359.0,Biology & Life Science,0.6373,47,6259,...,2190,3602,304,0.0463,26000,20000,39000,2771,2947,743
169,170,5201,EDUCATIONAL PSYCHOLOGY,2854.0,522.0,2332.0,Psychology & Social Work,0.8171,7,2125,...,572,1211,148,0.0651,25000,24000,34000,1488,615,82
170,171,5202,CLINICAL PSYCHOLOGY,2838.0,568.0,2270.0,Psychology & Social Work,0.7999,13,2101,...,648,1293,368,0.1490,25000,25000,40000,986,870,622
171,172,5203,COUNSELING PSYCHOLOGY,4626.0,931.0,3695.0,Psychology & Social Work,0.7987,21,3777,...,965,2738,214,0.0536,23400,19200,26000,2403,1245,308


<h3>Column descriptions:</h3>
<ul>
    <li><b>Rank:</b> Rank by median earnings.</li>
    <li><b>Major_code:</b> FOD1P code of each major.</li>
    <li><b>Major:</b> Name of the major.</li>
    <li><b>Total:</b> Total number of graduates in the major.</li>
    <li><b>Men:</b> Number of male graduates.</li>
    <li><b>Women:</b> Number of female graduates.</li>
    <li><b>Major_category:</b> Broad category of the major.</li>
    <li><b>ShareWomen:</b> Share of women: number of female graduates divided by the total number of graduates (Women / Total).</li>
    <li><b>Sample_size:</b> Unweighted sample size of graduates employed full-time, year-round ONLY.</li>
    <li><b>Employed:</b> Number of employed graduates.</li>
    <li><b>Full_time:</b> Employed 35 hours or more per week.</li>
    <li><b>Part_time:</b> Employed less than 35 hours per week.</li>
    <li><b>Full_time_year_round:</b> Employed at least 50 weeks.</li>
    <li><b>Unemployed:</b> Number of unemployed graduates.</li>
    <li><b>Unemployment_rate:</b> Unemployment rate: Unemployed / (Unemployed + Employed).</li>
    <li><b>Median:</b> Median earnings of full-time, year-round workers.</li>
    <li><b>P25th:</b> 25th percentile of earnings.</li>
    <li><b>P75th:</b> 75th percentile of earnings.</li>
    <li><b>College_jobs:</b> Number with a job that requires a college degree.</li>
    <li><b>Non_college_jobs:</b> Number with a job that does not require a college degree.</li>
    <li><b>Low_wage_jobs:</b> Number in low-wage service jobs.</li>
</ul>
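<p>Two of the columns are derived from others: ShareWomen is Women divided by Total, and Unemployment_rate is Unemployed / (Unemployed + Employed). Below is a minimal sketch of checking those definitions against the published values. The helper derived_columns_match is hypothetical, the 5e-5 tolerance assumes the published ratios are rounded to four decimals, and the three rows are copied from the output above.</p>

```python
import pandas as pd

def derived_columns_match(df, tol=5e-5):
    """Recompute ShareWomen and Unemployment_rate and compare with the published values.

    Assumption: the published ratios are rounded to 4 decimal places, so a difference
    of up to half a unit in the 4th decimal (5e-5) still counts as a match.
    """
    share_women = df['Women'] / df['Total']
    unemployment_rate = df['Unemployed'] / (df['Unemployed'] + df['Employed'])
    return {
        'ShareWomen': bool((share_women - df['ShareWomen']).abs().le(tol).all()),
        'Unemployment_rate': bool((unemployment_rate - df['Unemployment_rate']).abs().le(tol).all()),
    }

# First three rows of the dataset output above (only the columns needed here).
sample = pd.DataFrame({
    'Major': ['PETROLEUM ENGINEERING', 'MINING AND MINERAL ENGINEERING', 'METALLURGICAL ENGINEERING'],
    'Total': [2339.0, 756.0, 856.0],
    'Women': [282.0, 77.0, 131.0],
    'ShareWomen': [0.1206, 0.1019, 0.1530],
    'Employed': [1976, 640, 648],
    'Unemployed': [37, 85, 16],
    'Unemployment_rate': [0.0184, 0.1172, 0.0241],
})

print(derived_columns_match(sample))  # {'ShareWomen': True, 'Unemployment_rate': True}
```

On the full data, the same call could be made as derived_columns_match(dataset) once the missing values have been handled.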

<h3>Inspecting the structure of the <i>dataset</i>:</h3>

In [21]:
dataset.dtypes

Rank                      int64
Major_code                int64
Major                    object
Total                   float64
Men                     float64
Women                   float64
Major_category           object
ShareWomen              float64
Sample_size               int64
Employed                  int64
Full_time                 int64
Part_time                 int64
Full_time_year_round      int64
Unemployed                int64
Unemployment_rate       float64
Median                    int64
P25th                     int64
P75th                     int64
College_jobs              int64
Non_college_jobs          int64
Low_wage_jobs             int64
dtype: object
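<p>Total, Men and Women hold head counts but are stored as float64, while the other count columns are int64. In pandas this usually means the column contains missing values, because NaN cannot be stored in a plain int64 column. Below is a small sketch of spotting such columns, using a hypothetical helper whole_number_float_columns and a made-up toy frame. The nullable 'Int64' dtype keeps integer counts while marking the gap as &lt;NA&gt;.</p>

```python
import numpy as np
import pandas as pd

def whole_number_float_columns(df):
    """Return float64 columns whose non-null values are all whole numbers.

    In a table of counts, such columns usually became float64 only because they
    contain NaN, which pandas cannot store in an int64 column.
    """
    found = []
    for col in df.select_dtypes(include=['float64']).columns:
        values = df[col].dropna()
        if (values == np.floor(values)).all():
            found.append(col)
    return found

# Toy frame: Total has one gap, Employed has none, ShareWomen is a genuine ratio.
toy = pd.DataFrame({
    'Total': [2339, np.nan, 856],
    'Employed': [1976, 640, 648],
    'ShareWomen': [0.1206, 0.1019, 0.1530],
})
print(toy.dtypes)                       # Total float64, Employed int64, ShareWomen float64
print(whole_number_float_columns(toy))  # ['Total']

# The nullable 'Int64' dtype keeps whole-number counts and shows the gap as <NA>.
print(toy['Total'].astype('Int64'))
```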

In [22]:
dataset.columns

Index(['Rank', 'Major_code', 'Major', 'Total', 'Men', 'Women',
       'Major_category', 'ShareWomen', 'Sample_size', 'Employed', 'Full_time',
       'Part_time', 'Full_time_year_round', 'Unemployed', 'Unemployment_rate',
       'Median', 'P25th', 'P75th', 'College_jobs', 'Non_college_jobs',
       'Low_wage_jobs'],
      dtype='object')

In [23]:
dataset.shape

(173, 21)
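<p>Rank is described as the ranking by median earnings, and the shape reports 173 rows. On the full data, therefore, Rank should run from 1 to 173 and Median should never increase as Rank grows. Ties are allowed, such as the two 25000 medians at ranks 170 and 171. Below is a minimal sketch of that check. The helper rank_is_consistent is hypothetical, and the toy frame uses the first five rows shown above.</p>

```python
import pandas as pd

def rank_is_consistent(df):
    """Return True if Rank runs 1..n and Median never increases as Rank grows."""
    ordered = df.sort_values('Rank')
    ranks_ok = ordered['Rank'].tolist() == list(range(1, len(df) + 1))
    # is_monotonic_decreasing is non-strict, so equal medians (ties) are allowed.
    medians_ok = bool(ordered['Median'].is_monotonic_decreasing)
    return ranks_ok and medians_ok

# First five rows from the output above (Rank and Median only).
top5 = pd.DataFrame({
    'Rank': [1, 2, 3, 4, 5],
    'Median': [110000, 75000, 73000, 70000, 65000],
})
print(rank_is_consistent(top5))  # True
```

Calling rank_is_consistent(dataset) would apply the same test to every row.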

<h3>Checking for duplicates:</h3>

In [24]:
def check_duplicates(data, column):
    # data is a Series (e.g. dataset.Major); column is only used in the message
    if data.duplicated().any():
        print(f'There are duplicate values in column {column}')
    else:
        print('There are no duplicate values')

In [25]:
# keep='first' keeps one copy of each repeated major (keep=False would drop every copy)
dataset = dataset.drop_duplicates(subset='Major', keep='first').reset_index(drop=True)
dataset

,Rank,Major_code,Major,Total,Men,Women,Major_category,ShareWomen,Sample_size,Employed,...,Part_time,Full_time_year_round,Unemployed,Unemployment_rate,Median,P25th,P75th,College_jobs,Non_college_jobs,Low_wage_jobs
0,1,2419,PETROLEUM ENGINEERING,2339.0,2057.0,282.0,Engineering,0.1206,36,1976,...,270,1207,37,0.0184,110000,95000,125000,1534,364,193
1,2,2416,MINING AND MINERAL ENGINEERING,756.0,679.0,77.0,Engineering,0.1019,7,640,...,170,388,85,0.1172,75000,55000,90000,350,257,50
2,3,2415,METALLURGICAL ENGINEERING,856.0,725.0,131.0,Engineering,0.1530,3,648,...,133,340,16,0.0241,73000,50000,105000,456,176,0
3,4,2417,NAVAL ARCHITECTURE AND MARINE ENGINEERING,1258.0,1123.0,135.0,Engineering,0.1073,16,758,...,150,692,40,0.0501,70000,43000,80000,529,102,0
4,5,2405,CHEMICAL ENGINEERING,32260.0,21239.0,11021.0,Engineering,0.3416,289,25694,...,5180,16697,1672,0.0611,65000,50000,75000,18314,4440,972
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
168,169,3609,ZOOLOGY,8409.0,3050.0,5359.0,Biology & Life Science,0.6373,47,6259,...,2190,3602,304,0.0463,26000,20000,39000,2771,2947,743
169,170,5201,EDUCATIONAL PSYCHOLOGY,2854.0,522.0,2332.0,Psychology & Social Work,0.8171,7,2125,...,572,1211,148,0.0651,25000,24000,34000,1488,615,82
170,171,5202,CLINICAL PSYCHOLOGY,2838.0,568.0,2270.0,Psychology & Social Work,0.7999,13,2101,...,648,1293,368,0.1490,25000,25000,40000,986,870,622
171,172,5203,COUNSELING PSYCHOLOGY,4626.0,931.0,3695.0,Psychology & Social Work,0.7987,21,3777,...,965,2738,214,0.0536,23400,19200,26000,2403,1245,308


In [26]:
check_duplicates(dataset.Major, 'Major')

There are no duplicate values
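<p>The keep argument of duplicated and drop_duplicates controls which copies count as duplicates. Below is a sketch on a made-up frame. The repeated ZOOLOGY row is invented for illustration, since the check above reports no duplicates in the real data. With keep=False every copy of a repeated major is flagged and therefore dropped, while keep='first' keeps one copy.</p>

```python
import pandas as pd

# Made-up data: ZOOLOGY appears twice on purpose.
frame = pd.DataFrame({
    'Major': ['ZOOLOGY', 'CLINICAL PSYCHOLOGY', 'ZOOLOGY'],
    'Median': [26000, 25000, 26000],
})
majors = frame['Major']

# Default keep='first': only the later copies are flagged.
print(majors.duplicated().tolist())            # [False, False, True]
# keep=False: every copy is flagged, which is handy for listing all duplicated rows.
print(majors.duplicated(keep=False).tolist())  # [True, False, True]
print(frame[majors.duplicated(keep=False)])

# drop_duplicates follows the same rule.
kept_first = frame.drop_duplicates(subset='Major', keep='first')['Major'].tolist()
dropped_all = frame.drop_duplicates(subset='Major', keep=False)['Major'].tolist()
print(kept_first)   # ['ZOOLOGY', 'CLINICAL PSYCHOLOGY']
print(dropped_all)  # ['CLINICAL PSYCHOLOGY']
```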


<h3>Checking for null values:</h3>

In [27]:
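# Forward fill: each missing value is replaced by the value in the previous row
# (DataFrame.ffill() replaces fillna(method='ffill'), deprecated since pandas 2.1)
dataset = dataset.ffill()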
dataset.isnull().sum()

Rank                    0
Major_code              0
Major                   0
Total                   0
Men                     0
Women                   0
Major_category          0
ShareWomen              0
Sample_size             0
Employed                0
Full_time               0
Part_time               0
Full_time_year_round    0
Unemployed              0
Unemployment_rate       0
Median                  0
P25th                   0
P75th                   0
College_jobs            0
Non_college_jobs        0
Low_wage_jobs           0
dtype: int64
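<p>The fill runs before the count, so the output above can only confirm that no gaps remain. It cannot show which rows had them. The sketch below uses a hypothetical three-row frame with made-up major names A, B and C. It counts the gaps before filling and then shows what forward fill does: the missing Total is copied from the previous row, which belongs to a different major.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing Total.
toy = pd.DataFrame({
    'Major': ['A', 'B', 'C'],
    'Total': [2339.0, np.nan, 856.0],
})

# Count missing values per column *before* filling, and list the affected rows.
missing_per_column = toy.isnull().sum()
rows_with_gaps = toy[toy.isnull().any(axis=1)]
print(missing_per_column[missing_per_column > 0])
print(rows_with_gaps)

# ffill copies the previous row's value, i.e. the value of a different major.
filled = toy.ffill()
print(filled['Total'].tolist())  # [2339.0, 2339.0, 856.0]
```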