# Introduction

## Trigger Warning: career-related anxiety

This notebook discusses the potentially negative employment outcomes of certain engineering major choices in very blunt terms. If you are an engineering student and prone to anxiety, reading this notebook may be a significant trigger. Proceed with caution.

## Goal

The goal of this project is to make some educated, data-driven guesses as to how likely a new engineering graduate is to find an entry-level engineering job in their discipline within a given year in the United States. As it currently stands, ***engineering majors are not created equal!*** Some are in high demand while others have extremely saturated job markets and, unfortunately, this is not common knowledge.

## Objectives

The objectives behind this goal are:
* For high-school and incoming college students: inform their educational decisions
* For current engineering students: help them assess whether their current academic program matches their career expectations
  * For some students, this may mean switching majors outright
  * For others, this may mean complementing their education with work experience, relevant extracurricular activities and/or a stronger networking effort with industry professionals

At the very least, this effort should inform them of what to expect as they graduate, something that simply is not done enough.

## Justification

Likelihood of finding a job is not the **ONLY** factor that should be taken into account when making career choices, but it is currently underweighted by large swaths of the higher-education student body.

Getting a bachelor's degree is a large investment of time, money and effort. The financial investment is split between education costs (tuition, fees, housing near campus, living expenses, commuting costs, etc.) and forgone full-time wages while in school (these are usually much larger than tuition and fees, and you do not escape them even in countries with fully subsidized higher education). Therefore, not being able to find a full-time job relevant to one's degree soon after graduating is a real, bona fide tragedy for most graduates.

## Note from the author

This effort has been done in good faith. I realize this may upset or discourage some readers (especially students in the final stages of one of the engineering programs described as having saturated job markets); that is not the intention. Nevertheless, I believe that finding out early, at the cost of some anxiety, is a much lesser tragedy than investing four or so years only to graduate into less-than-favorable employment circumstances, along with the same anxiety (or worse).

Please feel free to run this Jupyter Notebook, check the results, judge the assumptions made and, if appropriate, contest the conclusions. Feedback (as filed issues or pull requests) is not only appreciated but encouraged.

In [1]:
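import os
from typing import Tuple, Union
from urllib import request, parse
from pathlib import PurePath

import pandas as pd

# FrameOrSeries used to live in the private pandas._typing module, which newer pandas
# releases no longer provide; this public alias keeps the type hints below working
FrameOrSeries = Union[pd.DataFrame, pd.Series]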

# Get the data

First, a small helper function to download files from the net.

In [2]:
def retrieve_file_from_net(url : str) -> str:
    """ Download a file from a URL and save it to the local directory """
    parse_result : Tuple = parse.urlparse(url)
    filename : str = PurePath(parse_result.path).name
    local_path : str = os.path.join(".", filename)
    local_path, http_message = request.urlretrieve(url, local_path)
    print(http_message)
    return local_path

## Engineering graduation rates

The graduation information used in this notebook comes from the 2018 "Engineering by the Numbers" report published by the American Society for Engineering Education (http://www.asee.org); the full report can be found [here](https://ira.asee.org/wp-content/uploads/2019/07/2018-Engineering-by-Numbers-Engineering-Statistics-UPDATED-15-July-2019.pdf).

Specifically, the spreadsheets downloaded by this next block contain the number of 2018 engineering graduates per engineering discipline.

In [3]:
GRADUATION_DATA_1_URL : str = "http://ira.asee.org/wp-content/uploads/2019/07/Table1a.bachelors-Degrees-Awarded-by-Engineering-Discipline-over-5k.xlsx"
GRADUATION_DATA_2_URL : str = "http://ira.asee.org/wp-content/uploads/2019/07/Table1b.bachelors-Degrees-Awarded-by-Engineering-Discipline-under-5k.xlsx"

grad1_path : str = retrieve_file_from_net(GRADUATION_DATA_1_URL)
print(grad1_path)

grad2_path : str = retrieve_file_from_net(GRADUATION_DATA_2_URL)
print(grad2_path)

Date: Wed, 19 Aug 2020 20:07:30 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.2.23
Last-Modified: Mon, 15 Jul 2019 18:55:28 GMT
ETag: "28a0-58dbccdb92d63"
Accept-Ranges: bytes
Content-Length: 10400
Connection: close
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet


.\Table1a.bachelors-Degrees-Awarded-by-Engineering-Discipline-over-5k.xlsx
Date: Wed, 19 Aug 2020 20:07:30 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.2.23
Last-Modified: Mon, 15 Jul 2019 18:55:28 GMT
ETag: "242a-58dbccdbda201"
Accept-Ranges: bytes
Content-Length: 9258
Connection: close
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet


.\Table1b.bachelors-Degrees-Awarded-by-Engineering-Discipline-under-5k.xlsx


## Employment projections data

This data from the US Bureau of Labor Statistics (http://www.bls.gov) contains employment projections per occupation in the United States between the years 2018 and 2028.

In [4]:
EMPLOYMENT_DATA_URL : str = "https://www.bls.gov/emp/ind-occ-matrix/occupation.XLSX"

emp_path : str = retrieve_file_from_net(EMPLOYMENT_DATA_URL)
print(emp_path)

Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Last-Modified: Thu, 16 Apr 2020 17:00:42 GMT
Accept-Ranges: bytes
ETag: "8b17f98f1014d61:0"
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
P3P: CP="NOI DSP COR NID CURaADMa OUR STP"
Content-Length: 531966
Strict-Transport-Security: max-age=31536000; includeSubDomains
Pool-Info: WP2
Date: Wed, 19 Aug 2020 20:07:32 GMT
Connection: close


.\occupation.XLSX


# Prepare the data

## Engineering graduation rates

### Some helper functions

In [5]:
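def get_grad_count_by_major(df : FrameOrSeries, major : str) -> float:
    """ Returns the graduate count for a given major """
    # .item() extracts the single matching value (calling float() on a one-element Series is deprecated)
    return float(df.loc[df["major"] == major, "num_grads"].item())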

def set_grad_count_by_major(df : FrameOrSeries, major : str, grad_count : float) -> None:
    """ Sets the graduate count for a given major to a given value """
    df.loc[df["major"] == major, "num_grads"] = grad_count

def rename_major(df : FrameOrSeries, major : str, new_major : str) -> None:
    """ Renames a given major """
    df.loc[df["major"] == major, "major"] = new_major

def drop_major(df : FrameOrSeries, major : str) -> None:
    """ Drops a major (entire row) from data frame
    
    explanation: https://thispointer.com/python-pandas-how-to-drop-rows-in-dataframe-by-conditions-on-column-values/
    """
    df.drop(df[df["major"] == major].index, inplace=True)
    df.reset_index(drop=True, inplace=True) # Regenerating the row labels

### Loading data into Pandas

In [6]:
grad1_df : FrameOrSeries = pd.read_excel(io=grad1_path, sheet_name="By Disciplines", skiprows=[-1])
grad1_df

Unnamed: 0,Discipline,Degree
0,Mechanical,31936.0
1,Computer Science (Inside Engr.),19082.0
2,Electrical,13767.0
3,Civil,12221.0
4,Chemical,11586.0
5,Computer Science (Outside Engr.),10398.0
6,Computer,7906.0
7,Biomedical,7130.0
8,Industrial/Manuf./Systems,6690.0
9,Other,5629.0


In [7]:
grad2_df : FrameOrSeries = pd.read_excel(io=grad2_path, sheet_name="By Disciplines", skiprows=[-1])
grad2_df

Unnamed: 0,Discipline,Degree
0,Aerospace,4148.0
1,Electrical/Computer,3344.0
2,Petroleum,2118.0
3,Engineering (General),2062.0
4,Metallurgical/Materials,1907.0
5,Biological/Agricultural,1388.0
6,Environmental,1288.0
7,Civil and Environmental,1156.0
8,Engr. Science and Engr. Physics,861.0
9,Architectural,642.0


### Culling unnecessary rows

In [8]:
grad1_df = grad1_df[0:10]
grad1_df

Unnamed: 0,Discipline,Degree
0,Mechanical,31936.0
1,Computer Science (Inside Engr.),19082.0
2,Electrical,13767.0
3,Civil,12221.0
4,Chemical,11586.0
5,Computer Science (Outside Engr.),10398.0
6,Computer,7906.0
7,Biomedical,7130.0
8,Industrial/Manuf./Systems,6690.0
9,Other,5629.0


In [9]:
grad2_df = grad2_df[0:13]
grad2_df

Unnamed: 0,Discipline,Degree
0,Aerospace,4148.0
1,Electrical/Computer,3344.0
2,Petroleum,2118.0
3,Engineering (General),2062.0
4,Metallurgical/Materials,1907.0
5,Biological/Agricultural,1388.0
6,Environmental,1288.0
7,Civil and Environmental,1156.0
8,Engr. Science and Engr. Physics,861.0
9,Architectural,642.0


### Merging both tables into one contiguous one

In [10]:
grad_df : FrameOrSeries = pd.concat([grad1_df, grad2_df], ignore_index=True)
grad_df.columns = ["major", "num_grads"]
grad_df

Unnamed: 0,major,num_grads
0,Mechanical,31936.0
1,Computer Science (Inside Engr.),19082.0
2,Electrical,13767.0
3,Civil,12221.0
4,Chemical,11586.0
5,Computer Science (Outside Engr.),10398.0
6,Computer,7906.0
7,Biomedical,7130.0
8,Industrial/Manuf./Systems,6690.0
9,Other,5629.0


### Merging Computer Science

Merging the two rows for Computer Science (it does not matter whether the CS program at a school was in the school of engineering or another school within the university).

In [11]:
cs_in_grads : float = get_grad_count_by_major(grad_df, "Computer Science (Inside Engr.)")
cs_out_grads : float = get_grad_count_by_major(grad_df, "Computer Science (Outside Engr.)")

set_grad_count_by_major(grad_df, "Computer Science (Inside Engr.)", (cs_in_grads + cs_out_grads))

rename_major(grad_df, "Computer Science (Inside Engr.)", "Computer Science")

drop_major(grad_df, "Computer Science (Outside Engr.)")

grad_df

Unnamed: 0,major,num_grads
0,Mechanical,31936.0
1,Computer Science,29480.0
2,Electrical,13767.0
3,Civil,12221.0
4,Chemical,11586.0
5,Computer,7906.0
6,Biomedical,7130.0
7,Industrial/Manuf./Systems,6690.0
8,Other,5629.0
9,Aerospace,4148.0


### Splitting EE/CompE combined entry and merging

Many universities do not have a separate computer engineering degree from their electrical engineering one. Yet, the data later used in this notebook for employment projections does specifically separate the two disciplines. So, for purposes of splitting the data from the combined degree, we will use the ratio of the two separate majors. Namely:

Let EE be the number of electrical engineering graduates, CompE be the number of computer engineering graduates, and ECE the number of graduates from the combined degree.

Update EE and CompE accordingly:

$$EE := EE + \frac{EE}{EE + CompE}ECE,\;\;\;CompE := CompE + \frac{CompE}{EE + CompE}ECE$$
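As a quick sanity check of the formula above, here is a minimal sketch of the same proportional split on plain Python numbers. `split_proportionally` is a hypothetical helper that the notebook does not use; the counts are copied from the tables above, and each share is rounded independently with `round()`, as in the next cell.

```python
def split_proportionally(combined: int, targets: dict[str, int]) -> dict[str, int]:
    """Add a combined-degree count to each target major in proportion to its size.

    Each share is rounded independently with round(), mirroring the notebook cell.
    """
    total = sum(targets.values())
    return {
        major: count + round(count * combined / total)
        for major, count in targets.items()
    }


# Electrical, Computer and Electrical/Computer counts copied from the tables above
updated = split_proportionally(3344, {"Electrical": 13767, "Computer": 7906})
print(updated)  # {'Electrical': 15891, 'Computer': 9126}
```

These are the same values the next cell writes back into `grad_df`.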


In [12]:
ee_grads : float = get_grad_count_by_major(grad_df, "Electrical")
compe_grads : float = get_grad_count_by_major(grad_df, "Computer")
ece_grads : float = get_grad_count_by_major(grad_df, "Electrical/Computer")

ee_grads_new : float = ee_grads + round(ee_grads * ece_grads / (ee_grads + compe_grads))
compe_grads_new : float = compe_grads + round(compe_grads * ece_grads / (ee_grads + compe_grads))

set_grad_count_by_major(grad_df, "Electrical", ee_grads_new)
set_grad_count_by_major(grad_df, "Computer", compe_grads_new)

drop_major(grad_df, "Electrical/Computer")

grad_df

Unnamed: 0,major,num_grads
0,Mechanical,31936.0
1,Computer Science,29480.0
2,Electrical,15891.0
3,Civil,12221.0
4,Chemical,11586.0
5,Computer,9126.0
6,Biomedical,7130.0
7,Industrial/Manuf./Systems,6690.0
8,Other,5629.0
9,Aerospace,4148.0


### Splitting Civil/Environmental combined entry and merging

Similarly to EE and CompE, many universities do not have a separate environmental engineering degree from their civil engineering one. Yet, the data later used in this notebook for employment projections does specifically separate the two disciplines. So, for purposes of splitting the data from the combined degree, we will use the ratio of the two separate majors. Namely:

Let CE be the number of civil engineering graduates, EnvE be the number of environmental engineering graduates, and CEnvE the number of graduates from the combined degree.

Update CE and EnvE accordingly:

$$CE := CE + \frac{CE}{CE + EnvE}CEnvE,\;\;\;EnvE := EnvE + \frac{EnvE}{CE + EnvE}CEnvE$$
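One caveat with this approach: rounding each share independently does not guarantee that the shares add up to the combined count (e.g. splitting 2 graduates evenly across three majors rounds each 0.67 share up to 1, for 3 in total). If exact totals matter, one alternative is the largest-remainder (Hamilton) method, sketched below under that assumption. `largest_remainder_split` is a hypothetical helper, not something the notebook uses; with the Civil/Environmental counts from the tables above it happens to give the same result as independent rounding.

```python
import math


def largest_remainder_split(combined: int, targets: dict[str, int]) -> dict[str, int]:
    """Split `combined` across targets proportionally so the shares sum exactly to `combined`.

    Each target first gets the floor of its exact quota; the leftover units go, one each,
    to the targets with the largest fractional remainders (Hamilton's method).
    """
    total = sum(targets.values())
    quotas = {name: count * combined / total for name, count in targets.items()}
    shares = {name: math.floor(quota) for name, quota in quotas.items()}
    leftover = combined - sum(shares.values())
    # Hand out the remaining units by descending fractional part (ties keep input order)
    ranked = sorted(quotas, key=lambda name: quotas[name] - shares[name], reverse=True)
    for name in ranked[:leftover]:
        shares[name] += 1
    return {name: targets[name] + shares[name] for name in targets}


# Civil, Environmental and Civil and Environmental counts copied from the tables above
updated = largest_remainder_split(1156, {"Civil": 12221, "Environmental": 1288})
print(updated)  # {'Civil': 13267, 'Environmental': 1398}
```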


In [13]:
ce_grads : float = get_grad_count_by_major(grad_df, "Civil")
enve_grads : float = get_grad_count_by_major(grad_df, "Environmental")
cenve_grads : float = get_grad_count_by_major(grad_df, "Civil and Environmental")

ce_grads_new : float = ce_grads + round(ce_grads * cenve_grads / (ce_grads + enve_grads))
enve_grads_new : float = enve_grads + round(enve_grads * cenve_grads / (ce_grads + enve_grads))

set_grad_count_by_major(grad_df, "Civil", ce_grads_new)
set_grad_count_by_major(grad_df, "Environmental", enve_grads_new)

drop_major(grad_df, "Civil and Environmental")

grad_df

Unnamed: 0,major,num_grads
0,Mechanical,31936.0
1,Computer Science,29480.0
2,Electrical,15891.0
3,Civil,13267.0
4,Chemical,11586.0
5,Computer,9126.0
6,Biomedical,7130.0
7,Industrial/Manuf./Systems,6690.0
8,Other,5629.0
9,Aerospace,4148.0


### Splitting Architectural Engineering

Architectural engineering (or building engineering) is a comparatively new major focused on building design as a whole, drawing on elements of civil, mechanical and electrical engineering. Unfortunately, the data from the US Bureau of Labor Statistics does not have employment data specific to this career, so we split architectural engineering graduates across those three majors in proportion to their respective graduate counts. Namely:

Let CE be the number of civil engineering graduates, ME be the number of mechanical engineering graduates, EE the number of electrical engineering graduates and ArchE the number of architectural engineering graduates.

Update CE, ME and EE accordingly:

$$CE := CE + \frac{CE}{CE + ME + EE}ArchE,\;\;\;ME := ME + \frac{ME}{CE + ME + EE}ArchE,\;\;\;EE := EE + \frac{EE}{CE + ME + EE}ArchE$$
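Plugging in the counts from the tables above (civil and electrical already include their shares of the combined degrees) reproduces the values shown in the output of the next cell; the three increments add up to the 642 architectural engineering graduates:

$$
\begin{aligned}
CE + ME + EE &= 13267 + 31936 + 15891 = 61094\\
CE &:= 13267 + \operatorname{round}\!\left(\tfrac{13267}{61094}\cdot 642\right) = 13267 + 139 = 13406\\
ME &:= 31936 + \operatorname{round}\!\left(\tfrac{31936}{61094}\cdot 642\right) = 31936 + 336 = 32272\\
EE &:= 15891 + \operatorname{round}\!\left(\tfrac{15891}{61094}\cdot 642\right) = 15891 + 167 = 16058
\end{aligned}
$$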


In [14]:
ce_grads : float = get_grad_count_by_major(grad_df, "Civil")
me_grads : float = get_grad_count_by_major(grad_df, "Mechanical")
ee_grads : float = get_grad_count_by_major(grad_df, "Electrical")
arche_grads : float = get_grad_count_by_major(grad_df, "Architectural")

ce_grads_new : float = ce_grads + round(ce_grads * arche_grads / (ce_grads + me_grads + ee_grads))
me_grads_new : float = me_grads + round(me_grads * arche_grads / (ce_grads + me_grads + ee_grads))
ee_grads_new : float = ee_grads + round(ee_grads * arche_grads / (ce_grads + me_grads + ee_grads))

set_grad_count_by_major(grad_df, "Civil", ce_grads_new)
set_grad_count_by_major(grad_df, "Mechanical", me_grads_new)
set_grad_count_by_major(grad_df, "Electrical", ee_grads_new)

drop_major(grad_df, "Architectural")

grad_df

Unnamed: 0,major,num_grads
0,Mechanical,32272.0
1,Computer Science,29480.0
2,Electrical,16058.0
3,Civil,13406.0
4,Chemical,11586.0
5,Computer,9126.0
6,Biomedical,7130.0
7,Industrial/Manuf./Systems,6690.0
8,Other,5629.0
9,Aerospace,4148.0


### Splitting Engineering (General) and Other

A bachelor of engineering (BEng) degree in the United States usually implies a degree that (i) is highly flexible and (ii) is not accredited by ABET. Given this flexibility, the most reasonable course of action is probably to distribute these graduates across the other majors so that they better match their respective occupations.

The "Other" entry likely refers to uncommon engineering degrees that end up leading into occupations related to other, more established majors. It should be handled like the BEng degree.

For purposes of this notebook, computer science is excluded from this redistribution.

Following the same conventions as above:

* Let ME be the number of mechanical engineering graduates
* Let EE be the number of electrical engineering graduates
* Let CE be the number of civil engineering graduates
* Let ChemE be the number of chemical engineering graduates
* Let CompE be the number of computer engineering graduates
* Let BioME be the number of biomedical engineering graduates
* Let IndE be the number of industrial, manufacturing and systems engineering graduates
* Let AE be the number of aerospace engineering graduates
* Let PE be the number of petroleum engineering graduates
* Let MatE be the number of metallurgical and materials engineering graduates
* Let AgrE be the number of biological and agricultural engineering graduates
* Let EnvE be the number of environmental engineering graduates
* Let NE be the number of nuclear engineering graduates
* Let MinE be the number of mining engineering graduates
* Let GE be the number of engineering (general) graduates
* Let OE be the number of graduates of other engineering programs

For each engineering discipline E except the last two (GE and OE), update as follows:

$$E := E + \frac{E}{ME + EE + CE + \cdots + MinE}(GE + OE)$$
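The cell below spells out every discipline by hand. A minimal sketch of a more compact alternative, assuming the same `major`/`num_grads` columns as `grad_df`: `redistribute_majors` is a hypothetical helper that pools the graduates of the source rows, spreads them over the target rows in proportion to their counts, rounds each share to the nearest integer and drops the source rows.

```python
import pandas as pd


def redistribute_majors(df: pd.DataFrame, sources: list[str], targets: list[str]) -> pd.DataFrame:
    """Spread the graduates of `sources` over `targets` in proportion to each target's count.

    Returns a new frame with the source rows removed; the input frame is left untouched.
    """
    out = df.copy()
    is_source = out["major"].isin(sources)
    is_target = out["major"].isin(targets)
    pool = out.loc[is_source, "num_grads"].sum()          # e.g. GE + OE
    denominator = out.loc[is_target, "num_grads"].sum()   # ME + EE + ... + MinE
    shares = (out.loc[is_target, "num_grads"] * pool / denominator).round()
    out.loc[is_target, "num_grads"] = out.loc[is_target, "num_grads"] + shares
    return out[~is_source].reset_index(drop=True)


# Toy data (hypothetical): "G" is redistributed over "A" and "B"; "CS" is left alone
toy = pd.DataFrame({"major": ["A", "B", "CS", "G"], "num_grads": [300.0, 100.0, 500.0, 40.0]})
print(redistribute_majors(toy, ["G"], ["A", "B"]))
```

With this notebook's data, the call would pass `["Engineering (General)", "Other"]` as sources and the fourteen majors listed above as targets. One design difference from the cell below: a target name missing from the frame is silently skipped instead of raising an error, so it is worth checking the resulting rows.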

In [15]:
me_grads : float = get_grad_count_by_major(grad_df, "Mechanical")
ee_grads : float = get_grad_count_by_major(grad_df, "Electrical")
ce_grads : float = get_grad_count_by_major(grad_df, "Civil")
cheme_grads : float = get_grad_count_by_major(grad_df, "Chemical")
compe_grads : float = get_grad_count_by_major(grad_df, "Computer")
biome_grads : float = get_grad_count_by_major(grad_df, "Biomedical")
inde_grads : float = get_grad_count_by_major(grad_df, "Industrial/Manuf./Systems")
ae_grads : float = get_grad_count_by_major(grad_df, "Aerospace")
pe_grads : float = get_grad_count_by_major(grad_df, "Petroleum")
mate_grads : float = get_grad_count_by_major(grad_df, "Metallurgical/Materials")
agre_grads : float = get_grad_count_by_major(grad_df, "Biological/Agricultural")
enve_grads : float = get_grad_count_by_major(grad_df, "Environmental")
ne_grads : float = get_grad_count_by_major(grad_df, "Nuclear")
mine_grads : float = get_grad_count_by_major(grad_df, "Mining")
ge_grads : float = get_grad_count_by_major(grad_df, "Engineering (General)")
oe_grads : float = get_grad_count_by_major(grad_df, "Other")

denominator : float = me_grads + ee_grads + ce_grads + cheme_grads + compe_grads + biome_grads + inde_grads + ae_grads + pe_grads + mate_grads + agre_grads + enve_grads + ne_grads + mine_grads

me_grads_new : float = me_grads + round(me_grads * (ge_grads + oe_grads) / denominator)
ee_grads_new : float = ee_grads + round(ee_grads * (ge_grads + oe_grads) / denominator)
ce_grads_new : float = ce_grads + round(ce_grads * (ge_grads + oe_grads) / denominator)
cheme_grads_new : float = cheme_grads + round(cheme_grads * (ge_grads + oe_grads) / denominator)
compe_grads_new : float = compe_grads + round(compe_grads * (ge_grads + oe_grads) / denominator)
biome_grads_new : float = biome_grads + round(biome_grads * (ge_grads + oe_grads) / denominator)
inde_grads_new : float = inde_grads + round(inde_grads * (ge_grads + oe_grads) / denominator)
ae_grads_new : float = ae_grads + round(ae_grads * (ge_grads + oe_grads) / denominator)
pe_grads_new : float = pe_grads + round(pe_grads * (ge_grads + oe_grads) / denominator)
mate_grads_new : float = mate_grads + round(mate_grads * (ge_grads + oe_grads) / denominator)
agre_grads_new : float = agre_grads + round(agre_grads * (ge_grads + oe_grads) / denominator)
enve_grads_new : float = enve_grads + round(enve_grads * (ge_grads + oe_grads) / denominator)
ne_grads_new : float = ne_grads + round(ne_grads * (ge_grads + oe_grads) / denominator)
mine_grads_new : float = mine_grads + round(mine_grads * (ge_grads + oe_grads) / denominator)

set_grad_count_by_major(grad_df, "Mechanical", me_grads_new)
set_grad_count_by_major(grad_df, "Electrical", ee_grads_new)
set_grad_count_by_major(grad_df, "Civil", ce_grads_new)
set_grad_count_by_major(grad_df, "Chemical", cheme_grads_new)
set_grad_count_by_major(grad_df, "Computer", compe_grads_new)
set_grad_count_by_major(grad_df, "Biomedical", biome_grads_new)
set_grad_count_by_major(grad_df, "Industrial/Manuf./Systems", inde_grads_new)
set_grad_count_by_major(grad_df, "Aerospace", ae_grads_new)
set_grad_count_by_major(grad_df, "Petroleum", pe_grads_new)
set_grad_count_by_major(grad_df, "Metallurgical/Materials", mate_grads_new)
set_grad_count_by_major(grad_df, "Biological/Agricultural", agre_grads_new)
set_grad_count_by_major(grad_df, "Environmental", enve_grads_new)
set_grad_count_by_major(grad_df, "Nuclear", ne_grads_new)
set_grad_count_by_major(grad_df, "Mining", mine_grads_new)

drop_major(grad_df, "Engineering (General)")
drop_major(grad_df, "Other")

grad_df

Unnamed: 0,major,num_grads
0,Mechanical,34570.0
1,Computer Science,29480.0
2,Electrical,17201.0
3,Civil,14360.0
4,Chemical,12411.0
5,Computer,9776.0
6,Biomedical,7638.0
7,Industrial/Manuf./Systems,7166.0
8,Aerospace,4443.0
9,Petroleum,2269.0


### Specific considerations for certain majors

#### Computer Science

There is debate, both in CS and in more traditional engineering circles, over whether computer science should be considered an engineering major.

Arguments in favor:

* In the United States, computer science programs are more often than not part of a university's school of engineering (source: "Engineering by the Numbers" 2018 report referenced above).
* The American Society for Engineering Education (the source of the graduation data used in this notebook) tracks computer science regardless of whether it is hosted within a school of engineering.
* The most typical occupation for computer science graduates is commonly titled "software engineer".
* It is said that many software development jobs do not really apply principles of science and mathematics. However, the occupation data from the US Bureau of Labor Statistics already accounts for this difference by having separate entries for "Computer programmer" (signaling less complex jobs with more mundane, non-mathematical tasks, usually under the purview of a software developer or engineer) and "Web developer" (for technical yet simpler web-specific tasks such as front-end development). This notebook intentionally leaves both of those occupations (which sometimes siphon off some computer science graduates) out of the occupations counted for computer science.
* In some countries and several American states, the "engineer" title is protected and usually gated by a professional licensure exam. Nonetheless, tens of thousands of engineers in traditional engineering disciplines never take such an exam, yet they still practice engineering. Therefore, using the Professional Engineer (PE) exam to "keep computer science out of engineering" is inconsistent.

Decision: keep major

#### Engineering Science and Engineering Physics

These majors support and often participate in engineering work as individual contributors, so I would normally include them as engineering majors. Nevertheless, the US Bureau of Labor Statistics does not keep separate occupational data for careers that directly match these majors.

Decision: drop majors

#### Engineering Management

Engineering managers usually come from two sources: senior engineers moving into management roles, or people initially trained in management who serve in supporting, non-technical roles until they gain enough seniority to take on leadership (e.g. a newly graduated video game producer who eventually becomes a game development manager).

Neither of these roles is immediately available to new engineering graduates, so they are not counted in this report.

Decision: drop major

In [16]:
drop_major(grad_df, "Engr. Science and Engr. Physics")
drop_major(grad_df, "Engineering Management")

grad_df

Unnamed: 0,major,num_grads
0,Mechanical,34570.0
1,Computer Science,29480.0
2,Electrical,17201.0
3,Civil,14360.0
4,Chemical,12411.0
5,Computer,9776.0
6,Biomedical,7638.0
7,Industrial/Manuf./Systems,7166.0
8,Aerospace,4443.0
9,Petroleum,2269.0


## Employment projections data

### Helper functions

In [17]:
def get_job_count_by_occupation(df : FrameOrSeries, occupation : str) -> float:
    """ Returns the average annual number of job openings for a given occupation """
    return float(df.loc[df["occupation"] == occupation, "num_jobs"].item())

### Loading data into Pandas

In [18]:
emp_df : FrameOrSeries = pd.read_excel(io=emp_path, sheet_name="Table 1.2", skiprows=[0])
emp_df

Unnamed: 0,2018 National Employment Matrix title and code,Unnamed: 1,Occupation type,Employment,Unnamed: 4,Unnamed: 5,Unnamed: 6,"Change, 2018-28",Unnamed: 8,"Occupational openings, 2018-28 annual average"
0,,,,Number,,Percent distribution,,,,
1,,,,2018,2028.0,2018,2028.0,Number,Percent,
2,"Total, all occupations",00-0000,Summary,161038,169435.9,100,100.0,8398.1,5.2,19694.0
3,Management occupations,11-0000,Summary,10193.3,10900.2,6.3,6.4,706.9,6.9,955.1
4,Top executives,11-1000,Summary,2691.5,2844.8,1.7,1.7,153.4,5.7,251.1
...,...,...,...,...,...,...,...,...,...,...
1074,Refuse and recyclable material collectors,53-7081,Line item,133,143.9,0.1,0.1,10.9,8.2,20.2
1075,Mine shuttle car operators,53-7111,Line item,1.7,1.3,0,0.0,-0.4,-25.3,0.1
1076,"Tank car, truck, and ship loaders",53-7121,Line item,9.1,9.2,0,0.0,0.1,1.5,1.2
1077,"Material moving workers, all other",53-7199,Line item,27.6,28.9,0,0.0,1.3,4.7,3.7


### Culling unnecessary rows and columns

In [19]:
# culling unnecessary rows at beginning and sources at the end
# (.copy() gives an independent frame, so the in-place drops below don't raise SettingWithCopyWarning)
emp_df = emp_df[2:1078].copy()

# dropping columns with unneeded stats
emp_df.drop(emp_df.columns[3:9], axis=1, inplace=True)

# dropping column with occupation code
emp_df.drop(emp_df.columns[1], axis=1, inplace=True)

# filtering by "engineers" and "Software developers" in occupation name
title_col : str = "2018 National Employment Matrix title and code"
emp_df = emp_df[emp_df[title_col].str.contains("engineers") | emp_df[title_col].str.contains("Software developers")]

# filtering out occupations traditionally named "engineer" that do not require an engineering degree (e.g. train engineer);
# this assumes every engineering occupation appears before row label 200 in the BLS table
emp_df = emp_df[emp_df.index < 200]

# removing rows for categories (as opposed to individual occupations); .copy() again before the in-place drop
emp_df = emp_df[emp_df["Occupation type"].str.contains("Line item")].copy()

# dropping column for categories (not needed anymore)
emp_df.drop(emp_df.columns[1], axis=1, inplace=True)

# renaming columns (original names are long and unwieldy)
emp_df.columns = ["occupation", "num_jobs"]

# Multiplying number of openings by 1000 (unit is thousands of jobs)
emp_df["num_jobs"] = emp_df["num_jobs"] * 1000

emp_df

Unnamed: 0,occupation,num_jobs
90,"Software developers, applications",99200.0
91,"Software developers, systems software",35400.0
116,Aerospace engineers,4500.0
117,Agricultural engineers,200.0
118,Biomedical engineers,1500.0
119,Chemical engineers,2400.0
120,Civil engineers,28300.0
121,Computer hardware engineers,5200.0
123,Electrical engineers,13900.0
124,"Electronics engineers, except computer",9000.0
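The `emp_df.index < 200` step above depends on row positions in the BLS file. A minimal sketch of an alternative filter keyed on the occupation code instead, assuming the codes follow the SOC pattern visible in the raw output (e.g. "53-7081") and that engineering occupations sit under the SOC 17-2000 "Engineers" group. In the raw table the code is in the `Unnamed: 1` column (per the output of `In [18]`), so this filter would have to run before that column is dropped. The rows below are hypothetical toy data, not the real table.

```python
import pandas as pd

# Toy rows shaped like the BLS table before the code column is dropped (hypothetical values)
occ = pd.DataFrame({
    "title": ["Civil engineers", "Engineers", "Locomotive engineers", "Software developers, applications"],
    "code": ["17-2051", "17-2000", "53-4011", "15-1132"],
    "type": ["Line item", "Summary", "Line item", "Line item"],
})

is_engineer = occ["code"].str.startswith("17-2")             # SOC "Engineers" group
is_software = occ["title"].str.startswith("Software developers")
is_line_item = occ["type"] == "Line item"                    # drop summary/category rows

selected = occ[(is_engineer | is_software) & is_line_item]
print(selected["title"].tolist())  # ['Civil engineers', 'Software developers, applications']
```

Locomotive ("train") engineers fall outside the 17-2 prefix, so they are excluded without relying on where they appear in the file.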


# Merging the data

The next step is to match one or more specific occupations to their corresponding major.

## Discussion: match vs mismatch

It is well understood that many graduates go on to work in fields other than the one in which they received their formal education. The objective of this occupation-to-major matching is **not to imply a one-to-one correspondence** between occupation and major but, rather, to specify the most natural match between a university program and its intended career paths. This is somewhat easier to do in engineering (e.g. mechanical engineers naturally map to jobs where a degree in mechanical engineering makes the most sense; likewise, computer science majors map most directly to software engineer/developer jobs).

Why is there a distinction between a "direct match" and more casual matches between graduates and jobs (e.g. a mechanical engineer getting a job as a software developer)? It is the author's belief that one is better served studying the college major that most closely matches one's desired career path. Following this assertion, an instance of a mechanical engineer going to work as a software engineer or a data scientist would be a mismatch; the mechanical engineer graduate would have been better served by graduating in computer science (for software development), or statistics or applied math (for data science).

An underlying (and subjectively reasonable) assumption throughout this notebook is that one goes to university to study something they can make a career out of and, failing that, then one may look for other options as a less-than-optimal outcome. Therefore, we are only counting direct matches in our final report and using it as an optimizing metric for any proposed solutions.

## Discussion: ratio of entry-level job openings to total job openings

The average annual job openings per occupation data from the US Bureau of Labor Statistics does not distinguish between entry-level jobs and jobs that require more seniority. This means that, for a given occupation, not all of the counted job openings are available to new graduates.

A college education often provides only a foundation of knowledge to enter a field, and does not provide significant hands-on experience (at least not to the level expected of a mid-level worker). Employers bear a significant cost in training new graduates until they become fully contributing participants at work (and, even then, professional development should continue throughout one's career).

Therefore, how can we estimate how many of the job openings for a given occupation target new graduates? It is hard to know. This notebook makes two significant assumptions:

* The number of job openings available to new graduates is between **1/3 and 1/4** of total job openings.
* The ratio of new graduate jobs to total jobs is **fairly similar across engineering disciplines**.

Both are bold claims and are recognized as a significant weakness in this report. Nevertheless, if the second assumption is correct, the resulting information is still useful as a method of comparison across engineering disciplines: the absolute numbers may be off by a common factor, but the relative ordering of disciplines is preserved.

The evidence for the first claim is purely anecdotal: the author searched http://www.indeed.com for entry-level jobs in technology and found that the ratio of entry-level jobs to total jobs was around 1/3 to 1/4. There is no concrete source for the second, more significant assumption.

For the rest of this notebook we will use the more optimistic value of 1/3 as the ratio of entry-level jobs to total jobs and see what types of conclusions we can draw from it.
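Under the formula used by `compute_stats_for_major` below (graduates ÷ annual openings ÷ r, where r is the entry-level share of openings), a single r shared by all majors rescales every ratio by the same factor. The sketch below uses three majors' graduate and job-opening counts copied from the tables above to show that moving r from 1/3 to 1/4 multiplies each ratio by 4/3 while leaving the ranking unchanged. This only holds under the second assumption (r is similar across disciplines).

```python
# Annual graduates and annual job openings copied from the tables above
grads = {"Civil": 14360.0, "Chemical": 12411.0, "Computer Science": 29480.0}
jobs = {"Civil": 28300.0, "Chemical": 2400.0, "Computer Science": 99200.0 + 35400.0}


def grads_per_entry_level_job(r: float) -> dict[str, float]:
    """Graduates per estimated entry-level opening, for an entry-level share r of all openings."""
    return {major: grads[major] / jobs[major] / r for major in grads}


def rank(ratios: dict[str, float]) -> list[str]:
    """Majors ordered from fewest to most graduates per entry-level opening."""
    return sorted(ratios, key=ratios.get)


optimistic = grads_per_entry_level_job(1 / 3)
pessimistic = grads_per_entry_level_job(1 / 4)

# Every ratio grows by the same factor (4/3), so the ordering of majors does not change
print(rank(optimistic))   # ['Computer Science', 'Civil', 'Chemical']
print(rank(pessimistic))  # ['Computer Science', 'Civil', 'Chemical']
```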

## Helper functions

In [20]:
def get_value_by_major(df : FrameOrSeries, major : str, col_label : str) -> float:
    """ Gets a specific column value for a given major """
    return float(df.loc[df["major"] == major, col_label].item())

def set_value_by_major(df : FrameOrSeries, major : str, col_label : str, value : float) -> None:
    """ Sets a specific column value for a given major """
    df.loc[df["major"] == major, col_label] = value

def compute_stats_for_major(df : FrameOrSeries, major : str) -> None:
    """ Computes the ratio of new graduates to entry-level jobs
    
    Based on the updated average annual number of job openings for a specific major, compute the 
    ratio of new graduates to estimated average annual number of entry-level job openings.
    """
    num_grads : float = get_value_by_major(df, major, "num_grads")
    num_jobs : float = get_value_by_major(df, major, "num_jobs")
    entry_level_ratio : float = get_value_by_major(df, major, "ratio_entry_level_total_jobs")
    # graduates per estimated entry-level opening: num_grads / (num_jobs * entry_level_ratio)
    final_ratio : float = num_grads / num_jobs / entry_level_ratio
    set_value_by_major(df, major, "ratio_grads_entry_level_jobs", final_ratio)

In [21]:
RATIO_ENTRY_LEVEL_TO_TOTAL_JOBS : float = 1.0/3.0

In [22]:
final_df : FrameOrSeries = grad_df.copy(deep=True)
final_df.insert(2, "num_jobs", 0.0)
final_df.insert(3, "ratio_entry_level_total_jobs", RATIO_ENTRY_LEVEL_TO_TOTAL_JOBS)
final_df.insert(4, "ratio_grads_entry_level_jobs", -1.0)
final_df

| | major | num_grads | num_jobs | ratio_entry_level_total_jobs | ratio_grads_entry_level_jobs |
|---|---|---|---|---|---|
| 0 | Mechanical | 34570.0 | 0.0 | 0.333333 | -1.0 |
| 1 | Computer Science | 29480.0 | 0.0 | 0.333333 | -1.0 |
| 2 | Electrical | 17201.0 | 0.0 | 0.333333 | -1.0 |
| 3 | Civil | 14360.0 | 0.0 | 0.333333 | -1.0 |
| 4 | Chemical | 12411.0 | 0.0 | 0.333333 | -1.0 |
| 5 | Computer | 9776.0 | 0.0 | 0.333333 | -1.0 |
| 6 | Biomedical | 7638.0 | 0.0 | 0.333333 | -1.0 |
| 7 | Industrial/Manuf./Systems | 7166.0 | 0.0 | 0.333333 | -1.0 |
| 8 | Aerospace | 4443.0 | 0.0 | 0.333333 | -1.0 |
| 9 | Petroleum | 2269.0 | 0.0 | 0.333333 | -1.0 |


In [23]:
from typing import Dict, List

# Map each major to the occupation(s) whose job openings are attributed to it
MAJOR_TO_OCCUPATIONS : Dict[str, List[str]] = {
    "Mechanical": ["Mechanical engineers", "Marine engineers and naval architects"],
    "Computer Science": ["Software developers, applications", "Software developers, systems software"],
    "Electrical": ["Electrical engineers", "Electronics engineers, except computer"],
    "Civil": ["Civil engineers"],
    "Chemical": ["Chemical engineers"],
    "Computer": ["Computer hardware engineers"],
    "Biomedical": ["Biomedical engineers"],
    "Industrial/Manuf./Systems": ["Industrial engineers"],
    "Aerospace": ["Aerospace engineers"],
    "Petroleum": ["Petroleum engineers"],
    "Metallurgical/Materials": ["Materials engineers"],
    "Biological/Agricultural": ["Agricultural engineers"],
    "Environmental": ["Environmental engineers"],
    "Nuclear": ["Nuclear engineers"],
    "Mining": ["Mining and geological engineers, including mining safety engineers"],
}

for major, occupations in MAJOR_TO_OCCUPATIONS.items():
    # Sum the openings of every occupation matched to this major
    num_jobs : float = sum(get_job_count_by_occupation(emp_df, occupation) for occupation in occupations)
    set_value_by_major(final_df, major, "num_jobs", num_jobs)
    compute_stats_for_major(final_df, major)

final_df

| | major | num_grads | num_jobs | ratio_entry_level_total_jobs | ratio_grads_entry_level_jobs |
|---|---|---|---|---|---|
| 0 | Mechanical | 34570.0 | 23700.0 | 0.333333 | 4.375949 |
| 1 | Computer Science | 29480.0 | 134600.0 | 0.333333 | 0.657058 |
| 2 | Electrical | 17201.0 | 22900.0 | 0.333333 | 2.253406 |
| 3 | Civil | 14360.0 | 28300.0 | 0.333333 | 1.522261 |
| 4 | Chemical | 12411.0 | 2400.0 | 0.333333 | 15.51375 |
| 5 | Computer | 9776.0 | 5200.0 | 0.333333 | 5.64 |
| 6 | Biomedical | 7638.0 | 1500.0 | 0.333333 | 15.276 |
| 7 | Industrial/Manuf./Systems | 7166.0 | 22600.0 | 0.333333 | 0.951239 |
| 8 | Aerospace | 4443.0 | 4500.0 | 0.333333 | 2.962 |
| 9 | Petroleum | 2269.0 | 2500.0 | 0.333333 | 2.7228 |


# Conclusions

If we consider the assumptions made so far to be true, then we can make the following conclusions:
* Some engineering majors experience almost no job market saturation: computer science, industrial/manufacturing/systems engineering and environmental engineering. According to these ratios (below 1, i.e. fewer new graduates than estimated entry-level openings), practically every new graduate willing to relocate should be able to find a job.
    * One caveat about computer science: its ratio is extremely good (about 2/3), but the effective ratio is very likely higher than 1. While the number of jobs is huge, competition also comes from non-traditional paths:
        * A significant number of graduates from more saturated engineering disciplines also pursue software jobs.
        * Some companies rely heavily on outsourcing.
        * Bootcamp graduates and self-taught developers also compete for a share of the software development/engineering job market.
* Some majors have *extremely high* job market saturation. Agricultural engineering seems to be the worst off, but it is especially worrisome that very popular majors such as chemical and biomedical engineering also face this level of saturation, with roughly 15 new graduates per estimated entry-level opening.
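As a rough robustness check on these conclusions, the sketch below recomputes the ratios from the tabulated values at both ends of the 1/3 to 1/4 range estimated earlier and lists the majors whose ratio stays below 1. The names `TABLE`, `grads_per_entry_level_job` and `unsaturated_majors` are hypothetical and introduced here for illustration; only the ten majors shown in the final table are included, so environmental and agricultural engineering are left out.

```python
# (num_grads, num_jobs) pairs copied from the final table above
TABLE = {
    "Mechanical": (34570.0, 23700.0),
    "Computer Science": (29480.0, 134600.0),
    "Electrical": (17201.0, 22900.0),
    "Civil": (14360.0, 28300.0),
    "Chemical": (12411.0, 2400.0),
    "Computer": (9776.0, 5200.0),
    "Biomedical": (7638.0, 1500.0),
    "Industrial/Manuf./Systems": (7166.0, 22600.0),
    "Aerospace": (4443.0, 4500.0),
    "Petroleum": (2269.0, 2500.0),
}


def grads_per_entry_level_job(num_grads: float, num_jobs: float, entry_share: float) -> float:
    """New graduates per estimated entry-level opening."""
    return num_grads / (num_jobs * entry_share)


def unsaturated_majors(entry_share: float) -> list:
    """Majors with fewer new graduates than estimated entry-level openings (ratio < 1)."""
    return [
        major
        for major, (grads, jobs) in TABLE.items()
        if grads_per_entry_level_job(grads, jobs, entry_share) < 1.0
    ]


for share in (1.0 / 3.0, 1.0 / 4.0):
    print(f"entry-level share {share:.3f}: {unsaturated_majors(share)}")
```

Under these assumptions, computer science stays below 1 at both values, while industrial/manufacturing/systems engineering rises to about 1.27 at 1/4, so its "unsaturated" label is sensitive to the entry-level assumption.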


# Discussion

## What next?

What is next for engineering students already in one of the majors with saturated job markets? One possible course of action is to change to a major with a better job market. This is easier said than done: while it may be relatively simple for an incoming first-year student, it is a much more difficult and costly endeavor for a third- or fourth-year student.

There are many things any student can do to increase their chances of getting an entry-level engineering job upon graduation. While everyone should be encouraged to do these, people facing saturated job markets have all the more reason to do so. Some of these include:

### Work experience

This is fairly obvious: having internships, co-ops or part-time engineering jobs while in school increases a new graduate's chances of finding a job upon graduation. In general, employers prefer new graduates with experience because it reduces their initial training costs.

But this becomes a chicken-and-egg problem: needing work experience to get work experience. To help break this cycle, consider the next two suggestions.

### Networking with professionals in industry

Instead of solely relying on career fairs and online applications, students should try to interact with people in industry. The best methods are introduction through faculty connections, industry recruiting events, alumni network events, mentoring programs sponsored by the university, and even cold contacts through LinkedIn.

The author of this notebook has witnessed great levels of success through this method.

### Major-relevant extracurricular activities

Relevant extracurricular activities can also make a new graduate's application stand out. Examples include interesting personal projects, meaningful participation in university engineering clubs and undergraduate research experience.

The most important consideration is having an effective way of **showcasing** such extracurricular activities through a portfolio. This can take the form of a website, an online code repository, videos hosted on a streaming service, or even a slideshow in PDF format.

## A word about computer engineering

I believe computer engineering is a very good major in spite of having such a high ratio of new graduates to entry-level jobs. This is because it is also a good (although IMHO less optimal; see below) entry point into the software engineer/developer occupation.

Computer engineering is often thought of as a mix between electrical engineering and computer science. Unfortunately, I believe this view diminishes its main strength: its focus on digital computer systems (e.g. semiconductor design and implementation, computer architecture). Much of its software coursework is often taught by a computer science department, but it tends to focus on computer systems rather than more typical computer science subjects (e.g. data structures and algorithms, applications). This is the main reason this notebook does not attempt to split software developer jobs between computer science and computer engineering, even though it does not consider computer engineering graduates in software roles an outright mismatch.

I do concede one point: if one's goal is a career focused *exclusively* on software, then a computer science degree is likely not only a better fit but also a more accessible path. In my experience, electrical/computer engineering programs are hard because of typical engineering teaching practices and idiosyncrasies: tests are fairly difficult, there are many conceptually hard subjects (namely, lots of physics), etc. Computer science, on the other hand, is hard because of the time commitment: programming assignments are very time-consuming, and there is often little room for partial credit (either the software does its job or it does not). I personally think that, for a time-conscious, well-organized student who plans ahead, a computer science program will likely be easier. This results in better grades and, especially, in more time and opportunities to work on one's software portfolio (an immensely important tool when job hunting for software roles).

Further, IMHO the argument for studying computer engineering solely to gain access to both the software developer and computer hardware engineer job markets is not very convincing. The difference in scale between their respective entry-level job markets is too large (about 44.9k vs 1.7k estimated annual entry-level openings) for it to be a real consideration.
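The 44.9k and 1.7k figures follow from the final table and the 1/3 entry-level assumption. The short sketch below reproduces that arithmetic; the variable names are illustrative, and the job counts are the `num_jobs` values for the "Computer Science" (both software developer occupations) and "Computer" (computer hardware engineers) rows.

```python
RATIO_ENTRY_LEVEL_TO_TOTAL_JOBS = 1.0 / 3.0

# Average annual openings taken from the final table above
software_dev_jobs = 134600.0  # software developers, applications + systems software
hardware_eng_jobs = 5200.0    # computer hardware engineers

# Estimated annual entry-level openings under the 1/3 assumption
sw_entry = software_dev_jobs * RATIO_ENTRY_LEVEL_TO_TOTAL_JOBS
hw_entry = hardware_eng_jobs * RATIO_ENTRY_LEVEL_TO_TOTAL_JOBS

print(f"software: {sw_entry / 1000:.1f}k, hardware: {hw_entry / 1000:.1f}k, "
      f"ratio: {sw_entry / hw_entry:.1f}x")
# prints "software: 44.9k, hardware: 1.7k, ratio: 25.9x"
```

Note that the roughly 26x gap does not depend on the entry-level assumption, since the same factor scales both markets.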


# TODO

* Revisit assumptions about the ratio of annual average entry-level job openings to annual average total job openings in more depth (very likely the most significant assumption in this notebook).
* Evaluate whether (and how) to include job market growth statistics in this report.
* Check assumptions about split of software developer job openings among computer science, computer engineering and computer information systems.
* Check assumptions about the impact of non-traditional career paths accessing software development jobs (i.e. spillover of graduates from other engineering disciplines, bootcamp graduates, self-taught developers, outsourcing and H1B consultants).
* Get feedback on method (i.e. linear interpolation) of splitting combined majors and non-descriptive engineering disciplines (e.g. BEng and other).
* Get domain-specific feedback on assumptions about splitting of architectural engineering graduates.
* Get domain-specific feedback on merging of naval and marine engineering jobs into mechanical engineering jobs.
* Get domain-specific feedback on merging of electronics engineering jobs into electrical engineering jobs.
* Get feedback on assumptions about engineering management and decision to drop it.
* Assess the impact of engineering students taking engineering technician jobs (which have their own employment projections data from the US Bureau of Labor Statistics), and of engineering technology students taking engineering jobs.
* Consider how to add sales engineering jobs to the final data frame.