"""PayScale Inc. did a year-long survey of 1.2 million Americans with only a bachelor's degree"""

The data set is available at https://www.kaggle.com/datasets/wsj/college-salaries

# Questions we want to answer using pandas:

1 Which degrees have the highest starting salaries?

2 Which majors have the lowest earnings after college?

3 Which degrees have the highest earning potential?

4 What are the lowest risk college majors from an earnings standpoint?

5 Do business, STEM (Science, Technology, Engineering, Mathematics) or HASS (Humanities, Arts, Social Science) degrees earn more on average?

In [1]:
import pandas as pd

In [2]:
salaries = pd.read_csv('salaries_by_college_major.csv')

In [3]:
salaries.head()

Unnamed: 0,Undergraduate Major,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
0,Accounting,46000.0,77100.0,42200.0,152000.0,Business
1,Aerospace Engineering,57700.0,101000.0,64300.0,161000.0,STEM
2,Agriculture,42600.0,71900.0,36300.0,150000.0,Business
3,Anthropology,36800.0,61500.0,33800.0,138000.0,HASS
4,Architecture,41600.0,76800.0,50600.0,136000.0,Business


In [247]:
salaries.shape


(51, 6)

In [5]:
salaries.columns


Index(['Undergraduate Major', 'Starting Median Salary',
       'Mid-Career Median Salary', 'Mid-Career 10th Percentile Salary',
       'Mid-Career 90th Percentile Salary', 'Group'],
      dtype='object')

In [6]:
salaries.isna()

Unnamed: 0,Undergraduate Major,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
0,False,False,False,False,False,False
1,False,False,False,False,False,False
2,False,False,False,False,False,False
3,False,False,False,False,False,False
4,False,False,False,False,False,False
5,False,False,False,False,False,False
6,False,False,False,False,False,False
7,False,False,False,False,False,False
8,False,False,False,False,False,False
9,False,False,False,False,False,False


In [7]:
salaries = salaries.dropna(how='any')
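
The boolean table from isna() is hard to scan by eye, so here is a minimal sketch of summarizing it before dropping rows. The first three rows are copied from the head() output; the fourth row, with its missing values, is hypothetical and exists only to give isna() something to find.

```python
import numpy as np
import pandas as pd

# Three real rows from head(), plus one made-up row with NaN values (hypothetical).
sample = pd.DataFrame({
    "Undergraduate Major": ["Accounting", "Aerospace Engineering", "Agriculture", "Hypothetical row"],
    "Starting Median Salary": [46000.0, 57700.0, 42600.0, np.nan],
    "Group": ["Business", "STEM", "Business", np.nan],
})

missing_per_column = sample.isna().sum()                 # number of NaN cells in each column
rows_with_missing = sample[sample.isna().any(axis=1)]    # rows that contain at least one NaN
cleaned = sample.dropna(how="any")                       # same call as in the cell above

print(missing_per_column)
print(rows_with_missing)
print(cleaned.shape)
```

Looking at rows_with_missing before calling dropna() shows exactly which rows are about to be discarded.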

In [251]:
#1 Find the college major with the highest starting salary

salaries[salaries['Starting Median Salary'] == salaries['Starting Median Salary'].max()]  # boolean mask: works, but repeating the column name is a little messy

Unnamed: 0,Undergraduate Major,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
43,Physician Assistant,74300.0,91700.0,66400.0,124000.0,STEM


In [252]:
#1 Find the college major with the highest starting salary (cleaner version using idxmax)

highest_starting_index = salaries['Starting Median Salary'].idxmax()  # index label of the row that holds the maximum
salaries.loc[highest_starting_index]  # the full row as a Series (not displayed, because it is not the last expression in the cell)
salaries.loc[highest_starting_index, 'Undergraduate Major']  # just the name of the major with the highest starting median salary

'Physician Assistant'
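
Question 1 asks about degrees in the plural, so a ranking can be more useful than a single maximum. A minimal sketch using nlargest() follows; the frame is a six-row subset of values shown in this notebook, not the full table, so the ranking below holds only for this subset.

```python
import pandas as pd

# Subset of rows displayed in this notebook (not the full data set).
sample = pd.DataFrame({
    "Undergraduate Major": ["Accounting", "Aerospace Engineering", "Anthropology",
                            "Physician Assistant", "Economics", "Nursing"],
    "Starting Median Salary": [46000.0, 57700.0, 36800.0, 74300.0, 50100.0, 54200.0],
})

# Top three rows by starting median salary, highest first.
top3 = sample.nlargest(3, "Starting Median Salary")
print(top3)
```

nsmallest() works the same way for the lowest values.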

In [253]:
salaries.head()

Unnamed: 0,Undergraduate Major,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
0,Accounting,46000.0,77100.0,42200.0,152000.0,Business
1,Aerospace Engineering,57700.0,101000.0,64300.0,161000.0,STEM
2,Agriculture,42600.0,71900.0,36300.0,150000.0,Business
3,Anthropology,36800.0,61500.0,33800.0,138000.0,HASS
4,Architecture,41600.0,76800.0,50600.0,136000.0,Business


In [254]:
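#2 Which majors have the lowest earnings after college? (measured here by the starting median salary)

lowest_starting_index = salaries['Starting Median Salary'].idxmin()  # index label of the row that holds the minimum
salaries.loc[lowest_starting_index]  # the full row as a Series (not displayed)
salaries.loc[lowest_starting_index, 'Undergraduate Major']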


'Spanish'

In [255]:
# Which college major has the highest mid-career median salary?

highest_mid_salary = salaries['Mid-Career Median Salary'].idxmax()
salaries.loc[highest_mid_salary]
salaries.loc[highest_mid_salary, 'Undergraduate Major']

'Chemical Engineering'

In [256]:
# Which college major has the lowest mid-career salary

lowest_mid_salary = salaries['Mid-Career Median Salary'].idxmin()
salaries['Undergraduate Major'][lowest_mid_salary]

'Education'
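
The idxmax()/idxmin() lookups above take two steps: find the row label, then read the major. One possible shortcut, sketched here, is to make the major the index so that idxmax() and idxmin() return the name directly. The frame is a five-row subset of values shown in this notebook, so its answers apply to the subset only and differ from the full-table results above.

```python
import pandas as pd

# Subset of rows displayed in this notebook (not the full data set).
sample = pd.DataFrame({
    "Undergraduate Major": ["Accounting", "Aerospace Engineering", "Anthropology",
                            "Economics", "Spanish"],
    "Mid-Career Median Salary": [77100.0, 101000.0, 61500.0, 98600.0, 53100.0],
})

# With the major as the index, idxmax()/idxmin() return the major name itself.
by_major = sample.set_index("Undergraduate Major")
highest_in_subset = by_major["Mid-Career Median Salary"].idxmax()
lowest_in_subset = by_major["Mid-Career Median Salary"].idxmin()

print(highest_in_subset, lowest_in_subset)
```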

In [257]:
#3 Which degrees have the highest earning potential? (measured by the mid-career 90th percentile salary)

highest_potential_index = salaries['Mid-Career 90th Percentile Salary'].idxmax()
salaries.loc[highest_potential_index]

Undergraduate Major                  Economics
Starting Median Salary                 50100.0
Mid-Career Median Salary               98600.0
Mid-Career 10th Percentile Salary      50600.0
Mid-Career 90th Percentile Salary     210000.0
Group                                 Business
Name: 17, dtype: object

In [258]:
# To rank all degrees by earning potential, rather than find only the top one, we can use sort_values()

high_potential_salaries = salaries.sort_values(by='Mid-Career 90th Percentile Salary',ascending=False)
high_potential_salaries[['Undergraduate Major', 'Mid-Career 90th Percentile Salary']].head()


Unnamed: 0,Undergraduate Major,Mid-Career 90th Percentile Salary
17,Economics,210000.0
22,Finance,195000.0
8,Chemical Engineering,194000.0
37,Math,183000.0
44,Physics,178000.0


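10th Percentile Salary: This is a low-end wage benchmark. If a salary is at the 10th percentile, only 10% of the people in the group being measured make that amount or less, while the large majority (90%) make more. In general wage statistics this level is often taken as representative of entry-level pay. In this data set the group is mid-career graduates of a given major, so the Mid-Career 10th Percentile Salary marks the low end of mid-career pay for that major and the Mid-Career 90th Percentile Salary marks the high end.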
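
The percentile definition can be checked on numbers. This is a small sketch with hypothetical salaries (100 evenly spaced values, not taken from the data set), using Series.quantile() to find the 10th and 90th percentiles and then measuring what share of values falls at or below each.

```python
import pandas as pd

# Hypothetical salaries: 100 evenly spaced values from 30,000 to 129,000.
hypothetical = pd.Series(range(30_000, 130_000, 1_000), dtype="float64")

p10 = hypothetical.quantile(0.10)   # 10th percentile (linear interpolation by default)
p90 = hypothetical.quantile(0.90)   # 90th percentile

# Fraction of salaries at or below each benchmark.
share_at_or_below_p10 = (hypothetical <= p10).mean()
share_at_or_below_p90 = (hypothetical <= p90).mean()

print(p10, share_at_or_below_p10)
print(p90, share_at_or_below_p90)
```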

In [259]:
#4 What are the lowest risk college majors from an earnings standpoint?
# Risk is measured here as the difference between the 90th and the 10th percentile mid-career salary:
# a small spread means low risk, a large spread means high risk.

# Equivalent to: salaries['Mid-Career 90th Percentile Salary'] - salaries['Mid-Career 10th Percentile Salary']
difference_column = salaries['Mid-Career 90th Percentile Salary'].subtract(salaries['Mid-Career 10th Percentile Salary'])

In [260]:
difference_column.head()

0    109800.0
1     96700.0
2    113700.0
3    104200.0
4     85400.0
dtype: float64

In [261]:
salaries.insert(1, 'Risk', difference_column)
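
One thing to watch with insert(): it modifies the DataFrame in place and raises a ValueError if the column already exists, so running that cell a second time fails. A possible alternative, sketched below, is assign(), which returns a new frame and can be rerun safely. The helper name with_risk is made up for this illustration, and the four rows are a subset of values shown in this notebook.

```python
import pandas as pd

# Subset of rows displayed in this notebook (not the full data set).
sample = pd.DataFrame({
    "Undergraduate Major": ["Nursing", "Physician Assistant", "Economics", "Finance"],
    "Mid-Career 10th Percentile Salary": [47600.0, 66400.0, 50600.0, 47200.0],
    "Mid-Career 90th Percentile Salary": [98300.0, 124000.0, 210000.0, 195000.0],
})

def with_risk(df):
    """Return a copy of df with a 'Risk' column (90th minus 10th percentile).

    Hypothetical helper for this sketch; not part of the original notebook.
    """
    spread = df["Mid-Career 90th Percentile Salary"] - df["Mid-Career 10th Percentile Salary"]
    return df.assign(Risk=spread)

# Calling the helper twice is harmless: assign() overwrites an existing column.
ranked = with_risk(with_risk(sample)).sort_values("Risk")
print(ranked[["Undergraduate Major", "Risk"]])
```

The trade-off is column position: assign() appends Risk at the end, while insert(1, ...) places it right after the major.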

In [262]:
salaries.sort_values(by='Risk').head()

Unnamed: 0,Undergraduate Major,Risk,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
40,Nursing,50700.0,54200.0,67000.0,47600.0,98300.0,Business
43,Physician Assistant,57600.0,74300.0,91700.0,66400.0,124000.0,STEM
41,Nutrition,65300.0,39900.0,55300.0,33900.0,99200.0,HASS
49,Spanish,65400.0,34000.0,53100.0,31000.0,96400.0,HASS
27,Health Care Administration,66400.0,38800.0,60600.0,34600.0,101000.0,Business


In [263]:
salaries.loc[salaries['Risk'].idxmin(), 'Undergraduate Major']  # the lowest-risk major from an earnings standpoint

'Nursing'

In [272]:
# Which degrees have the highest earning potential?

highest_earning_potential = salaries.sort_values(by='Mid-Career 90th Percentile Salary', ascending=False)
highest_earning_potential[['Undergraduate Major', 'Mid-Career 90th Percentile Salary']].head()




Unnamed: 0,Undergraduate Major,Mid-Career 90th Percentile Salary
17,Economics,210000.0
22,Finance,195000.0
8,Chemical Engineering,194000.0
37,Math,183000.0
44,Physics,178000.0


In [277]:
# Just for fun, let's check the highest-risk undergraduate majors

highest_risk_majors = salaries.sort_values(by="Risk", ascending=False)
highest_risk_majors.head()

# Economics, Finance and Math are among the majors with the highest earning potential, but they are also high risk:
# the wide spread means some people with these fancy degrees still earn far less than the top earners.

Unnamed: 0,Undergraduate Major,Risk,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
17,Economics,159400.0,50100.0,98600.0,50600.0,210000.0,Business
22,Finance,147800.0,47900.0,88300.0,47200.0,195000.0,Business
37,Math,137800.0,45400.0,92400.0,45200.0,183000.0,STEM
36,Marketing,132900.0,40800.0,79600.0,42100.0,175000.0,Business
42,Philosophy,132500.0,39900.0,81200.0,35500.0,168000.0,HASS


In [279]:
salaries.groupby('Group').count()

Group,Undergraduate Major,Risk,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary
Business,12,12,12,12,12,12
HASS,22,22,22,22,22,22
STEM,16,16,16,16,16,16


In [284]:
#5 Do business, STEM (Science, Technology, Engineering, Mathematics) or HASS (Humanities, Arts, Social Science) degrees earn more on average?
pd.options.display.float_format = '{:,.2f}'.format   # a neat trick I found online: show floats with thousands separators and two decimals so long numbers don't bloat our table :)
salaries.groupby('Group').mean(numeric_only=True)


Group,Risk,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary
Business,103958.33,44633.33,75083.33,43566.67,147525.0
HASS,95218.18,37186.36,62968.18,34145.45,129363.64
STEM,101600.0,53862.5,90812.5,56025.0,157625.0
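
To make the group comparison easier to read, one option is to average only the columns of interest and sort the result. This is a sketch on a six-row subset of values shown in this notebook (two majors per group), so the averages below are for the subset and do not match the full-table means above.

```python
import pandas as pd

# Subset of rows displayed in this notebook: two majors per group.
sample = pd.DataFrame({
    "Undergraduate Major": ["Accounting", "Agriculture", "Aerospace Engineering",
                            "Math", "Anthropology", "Spanish"],
    "Starting Median Salary": [46000.0, 42600.0, 57700.0, 45400.0, 36800.0, 34000.0],
    "Mid-Career Median Salary": [77100.0, 71900.0, 101000.0, 92400.0, 61500.0, 53100.0],
    "Group": ["Business", "Business", "STEM", "STEM", "HASS", "HASS"],
})

salary_columns = ["Starting Median Salary", "Mid-Career Median Salary"]

# Mean of the selected columns per group, highest mid-career salary first.
summary = (
    sample.groupby("Group")[salary_columns]
    .mean()
    .sort_values("Mid-Career Median Salary", ascending=False)
)
print(summary)
```

Selecting the columns before calling mean() also removes the need for numeric_only=True, because no text columns are left to average.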


In [None]:
# HASS has the lowest risk, but also the lowest starting salaries and the lowest earning potential compared with Business and STEM.

# STEM has medium risk and the highest starting, mid-career and 90th percentile salaries, so STEM degrees earn the most on average.

# Business has the highest risk and sits between HASS and STEM on every salary column.