# College Major vs. Salary - Data Analysis

### Questions
**1) Which degrees have the highest starting salaries?**

**2) Which majors have the lowest earnings after college?**

**3) Which degrees have the highest earning potential?**

**4) What are the lowest risk college majors from an earnings standpoint?**

**5) Do business, STEM (Science, Technology, Engineering, Mathematics) or HASS (Humanities, Arts, Social Science) degrees earn more on average?**

In [1]:
import pandas as pd

pd.options.display.float_format = "{:,.2f}".format

In [2]:
df = pd.read_csv("salaries_by_college_major.csv")
df.head()

Unnamed: 0,Undergraduate Major,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
0,Accounting,46000.0,77100.0,42200.0,152000.0,Business
1,Aerospace Engineering,57700.0,101000.0,64300.0,161000.0,STEM
2,Agriculture,42600.0,71900.0,36300.0,150000.0,Business
3,Anthropology,36800.0,61500.0,33800.0,138000.0,HASS
4,Architecture,41600.0,76800.0,50600.0,136000.0,Business


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 6 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Undergraduate Major                51 non-null     object 
 1   Starting Median Salary             50 non-null     float64
 2   Mid-Career Median Salary           50 non-null     float64
 3   Mid-Career 10th Percentile Salary  50 non-null     float64
 4   Mid-Career 90th Percentile Salary  50 non-null     float64
 5   Group                              50 non-null     object 
dtypes: float64(4), object(2)
memory usage: 2.5+ KB


In [4]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Starting Median Salary,50.0,44310.0,9360.87,34000.0,37050.0,40850.0,49875.0,74300.0
Mid-Career Median Salary,50.0,74786.0,16088.4,52000.0,60825.0,72000.0,88750.0,107000.0
Mid-Career 10th Percentile Salary,50.0,43408.0,12000.78,26700.0,34825.0,39400.0,49850.0,71900.0
Mid-Career 90th Percentile Salary,50.0,142766.0,27851.25,96400.0,124250.0,145500.0,161750.0,210000.0


In [5]:
df[df.isnull().any(axis=1)]  # select rows that contain at least one NaN

Unnamed: 0,Undergraduate Major,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
50,Source: PayScale Inc.,,,,,


In [6]:
df.dropna(axis=0, inplace=True)
df[df.isnull().any(axis=1)]

Unnamed: 0,Undergraduate Major,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
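The cleanup step above can also be done without `inplace=True`. The following is a minimal sketch, not the notebook's own method: it rebuilds a tiny CSV in memory from two rows of the `df.head()` output plus the attribution footer (the names `csv_text`, `raw`, and `clean` are illustrative), counts missing values per column, and keeps the original frame untouched.

```python
import io

import pandas as pd

# Two data rows copied from the df.head() output, plus the attribution footer.
# This in-memory CSV is an assumption standing in for the real file.
csv_text = (
    "Undergraduate Major,Starting Median Salary,Group\n"
    "Accounting,46000.0,Business\n"
    "Aerospace Engineering,57700.0,STEM\n"
    "Source: PayScale Inc.,,\n"
)
raw = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column before deciding what to drop
missing_per_column = raw.isna().sum()

# dropna() returns a new frame and leaves `raw` unchanged
clean = raw.dropna()

print(missing_per_column)
print(clean.shape)
```

Returning a new frame makes it easy to compare the data before and after the drop, at the cost of holding both copies in memory.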


In [7]:
df.shape

(50, 6)

In [8]:
df.columns

Index(['Undergraduate Major', 'Starting Median Salary',
       'Mid-Career Median Salary', 'Mid-Career 10th Percentile Salary',
       'Mid-Career 90th Percentile Salary', 'Group'],
      dtype='object')

In [9]:
df.loc[df["Mid-Career Median Salary"].idxmax()]  # idxmax() returns an index label, so use .loc

Undergraduate Major                  Chemical Engineering
Starting Median Salary                          63,200.00
Mid-Career Median Salary                       107,000.00
Mid-Career 10th Percentile Salary               71,900.00
Mid-Career 90th Percentile Salary              194,000.00
Group                                                STEM
Name: 8, dtype: object

In [10]:
df.loc[df["Mid-Career Median Salary"].idxmin()]  # idxmin() returns an index label, so use .loc

Undergraduate Major                  Education
Starting Median Salary               34,900.00
Mid-Career Median Salary             52,000.00
Mid-Career 10th Percentile Salary    29,300.00
Mid-Career 90th Percentile Salary   102,000.00
Group                                     HASS
Name: 18, dtype: object
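
A note on the two lookups above: `idxmax()` and `idxmin()` return index labels rather than positions, so `.loc` is the accessor that matches them. Labels and positions coincide in this notebook only because the frame still has its default `RangeIndex` and the dropped row was the last one. The sketch below uses five rows from the `df.head()` output with a hypothetical non-default index (10 to 14) to show the label-based lookup.

```python
import pandas as pd

# Rows copied from the df.head() output; the index values 10..14 are
# hypothetical and chosen only to make labels differ from positions.
sample = pd.DataFrame(
    {
        "Undergraduate Major": [
            "Accounting",
            "Aerospace Engineering",
            "Agriculture",
            "Anthropology",
            "Architecture",
        ],
        "Mid-Career Median Salary": [77100.0, 101000.0, 71900.0, 61500.0, 76800.0],
    },
    index=[10, 11, 12, 13, 14],
)

# idxmax() gives the label of the row with the largest value
top_label = sample["Mid-Career Median Salary"].idxmax()

# .loc looks the row up by that label; .iloc[top_label] would raise
# IndexError here because position 11 does not exist in a 5-row frame
top_row = sample.loc[top_label]
print(top_label, top_row["Undergraduate Major"])
```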

In [11]:
spread_col = df["Mid-Career 90th Percentile Salary"].subtract(
    df["Mid-Career 10th Percentile Salary"]
)  # alternatively: df[col_A] - df[col_B]
df.insert(1, "Spread", spread_col)
df.head()

Unnamed: 0,Undergraduate Major,Spread,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary,Group
0,Accounting,109800.0,46000.0,77100.0,42200.0,152000.0,Business
1,Aerospace Engineering,96700.0,57700.0,101000.0,64300.0,161000.0,STEM
2,Agriculture,113700.0,42600.0,71900.0,36300.0,150000.0,Business
3,Anthropology,104200.0,36800.0,61500.0,33800.0,138000.0,HASS
4,Architecture,85400.0,41600.0,76800.0,50600.0,136000.0,Business
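
As the comment in the cell above hints, the spread can also be written with the `-` operator. A minimal sketch, assuming two rows taken from the `df.head()` output: `DataFrame.assign` is used here instead of `insert` so that the original frame is not modified (the names `sample` and `with_spread` are illustrative).

```python
import pandas as pd

# Two rows copied from the df.head() output above
sample = pd.DataFrame(
    {
        "Undergraduate Major": ["Accounting", "Aerospace Engineering"],
        "Mid-Career 10th Percentile Salary": [42200.0, 64300.0],
        "Mid-Career 90th Percentile Salary": [152000.0, 161000.0],
    }
)

low = "Mid-Career 10th Percentile Salary"
high = "Mid-Career 90th Percentile Salary"

# assign() returns a copy with the new column appended at the end
with_spread = sample.assign(Spread=sample[high] - sample[low])
print(with_spread[["Undergraduate Major", "Spread"]])
```

Unlike `insert`, `assign` always places the new column last, so reorder the columns afterwards if position matters.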


In [12]:
# Low Risk
df.sort_values(
    by="Spread", ascending=True
)[["Undergraduate Major", "Spread"]].head()

Unnamed: 0,Undergraduate Major,Spread
40,Nursing,50700.0
43,Physician Assistant,57600.0
41,Nutrition,65300.0
49,Spanish,65400.0
27,Health Care Administration,66400.0


In [13]:
# Highest Potential
mid_90_col = "Mid-Career 90th Percentile Salary"
df.sort_values(
    by=mid_90_col, ascending=False
)[["Undergraduate Major", mid_90_col]].head()

Unnamed: 0,Undergraduate Major,Mid-Career 90th Percentile Salary
17,Economics,210000.0
22,Finance,195000.0
8,Chemical Engineering,194000.0
37,Math,183000.0
44,Physics,178000.0


In [14]:
# Highest Spread
df.sort_values(
    by="Spread", ascending=False
)[["Undergraduate Major", "Spread"]].head()

Unnamed: 0,Undergraduate Major,Spread
17,Economics,159400.0
22,Finance,147800.0
37,Math,137800.0
36,Marketing,132900.0
42,Philosophy,132500.0
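
The `sort_values(...).head()` pattern used for the low-risk and high-spread rankings can be shortened with `DataFrame.nsmallest` and `DataFrame.nlargest`. This is a sketch on the five rows shown in the earlier `df.head()` output only, so its rankings are not the full-dataset rankings above; `sample` is an illustrative name.

```python
import pandas as pd

# Majors and Spread values copied from the df.head() output after the
# Spread column was added; this is NOT the full 50-row dataset.
sample = pd.DataFrame(
    {
        "Undergraduate Major": [
            "Accounting",
            "Aerospace Engineering",
            "Agriculture",
            "Anthropology",
            "Architecture",
        ],
        "Spread": [109800.0, 96700.0, 113700.0, 104200.0, 85400.0],
    }
)

# Two narrowest spreads (lowest risk within this sample)
lowest_risk = sample.nsmallest(2, "Spread")

# Two widest spreads within this sample
widest = sample.nlargest(2, "Spread")

print(lowest_risk)
print(widest)
```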


In [15]:
df.groupby("Group").count()

Group,Undergraduate Major,Spread,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary
Business,12,12,12,12,12,12
HASS,22,22,22,22,22,22
STEM,16,16,16,16,16,16


In [16]:
df.groupby("Group").mean(numeric_only=True)

Group,Spread,Starting Median Salary,Mid-Career Median Salary,Mid-Career 10th Percentile Salary,Mid-Career 90th Percentile Salary
Business,103958.33,44633.33,75083.33,43566.67,147525.0
HASS,95218.18,37186.36,62968.18,34145.45,129363.64
STEM,101600.0,53862.5,90812.5,56025.0,157625.0
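
The count and mean per group can also be produced in a single table with `agg`. A minimal sketch on the five rows from the `df.head()` output, so the numbers it prints are not the full-dataset averages above; `sample` and `summary` are illustrative names. Note that a group mean computed this way weights every major equally, regardless of how many graduates each major has.

```python
import pandas as pd

# Rows copied from the df.head() output; NOT the full 50-row dataset
sample = pd.DataFrame(
    {
        "Undergraduate Major": [
            "Accounting",
            "Aerospace Engineering",
            "Agriculture",
            "Anthropology",
            "Architecture",
        ],
        "Starting Median Salary": [46000.0, 57700.0, 42600.0, 36800.0, 41600.0],
        "Group": ["Business", "STEM", "Business", "HASS", "Business"],
    }
)

# One row per group, with the number of majors and their average salary
summary = sample.groupby("Group")["Starting Median Salary"].agg(["count", "mean"])
print(summary)
```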
