# Task 1 - Data Modeling

In this notebook, I explore the data tables provided in the `data` folder of this repo and create a SQLite database, which I then query to answer specific business questions about employers in the jobs dataset.

## Data relationships
The .csv files are best represented by a SNOWFLAKE schema in which `postings.csv` acts as the fact table and all other files act as dimension tables. A SNOWFLAKE schema fits the data as a whole because not every table is directly linked to the fact table: some are two joins away. For the basic business questions defined by the client, however, a simpler STAR schema is the most practical choice.

## Final Schema
The database uses a simple STAR schema, which is enough to answer the basic business questions defined by the client.
Specifically, `postings.csv` becomes the fact table `fact_job_postings` and `company_industries.csv` becomes the dimension table `dim_company`. The two tables are related via the `company_id` column, which exists in both. With only one dimension table, this is closer to a shard of a STAR schema than a full one.


## Insights gained
The following business questions were asked about the dataset. The answers are summarized here; see the respective sections for the SQL queries and more detailed results.

1. How many companies have more than one job posting?: `601`
2. How many job postings are there for each job industry?: `They range from 1010 in Hospitals and Health Care to 1 in Government Relations Services`
3. What is the average normalized salary by company industry?: `They range from 250,000 in Information Services to NONE in sectors where there was insufficient data to state an average.`
4. Name the top 5 companies with the highest average normalized salary for their job postings.:

| Company Name                                      | Salary       |
|--------------------------------------------------|--------------|
| Woodside Staffing Solutions & Consulting         | 337,500.00   |
| Calm                                             | 337,500.00   |
| Health eCareers                                  | 337,246.41   |
| Buck Institute for Research on Aging             | 300,000.00   |
| Spire Orthopedic Partners                        | 284,124.00   |

These averages are exorbitantly high, but that is because these companies often have only one posting with a salary, so the average is a single data point, perhaps the CEO position?



### Industries with insufficient salary data

For quick reference, here is the list of industries where there isn't enough information to give an average normalized salary:

| Industry Category                                               | Value |
|----------------------------------------------------------------|--------|
| Writing and Editing                                            | None   |
| Recreational Facilities                                        | None   |
| Public Safety                                                  | None   |
| Printing Services                                              | None   |
| Performing Arts                                                | None   |
| Outsourcing and Offshoring Consulting                          | None   |
| Machinery Manufacturing                                        | None   |
| Libraries                                                      | None   |
| Government Relations Services                                  | None   |
| Civic and Social Organizations                                 | None   |
| Armed Forces                                                   | None   |
| Appliances, Electrical, and Electronics Manufacturing          | None   |
| Animation and Post-production                                  | None   |


In [1]:
# Importing standard data analysis packages
import pandas as pd
import sqlite3
import prettytable
from matplotlib import pyplot as plt
import seaborn as sns
import dash
import plotly
from IPython.display import display, HTML

# 1.1 Explore the source data
The available data has the following folder structure, shown below for convenience. Let's see which variables the tables have in common, so I can identify the fact and dimension tables.

In [2]:
'''
├── companies
│   ├── companies.csv
│   ├── company_industries.csv
│   ├── company_specialities.csv
│   └── employee_counts.csv
├── jobs
│   ├── benefits.csv
│   ├── job_industries.csv
│   ├── job_skills.csv
│   └── salaries.csv
├── mappings
│   ├── industries.csv
│   └── skills.csv
└── postings.csv
'''


# Postings data (Fact table)
Particularly interesting here are `job_id` and `company_id`, since these identifiers could also exist in other lookup tables (dimension tables).

In [3]:
# Exploring Postings data
postings_df = pd.read_csv("../data/postings.csv")
cols_posting = sorted(list(postings_df.columns))
print('n columns: ',len(cols_posting))
print(cols_posting)

n columns:  31
['application_type', 'application_url', 'applies', 'closed_time', 'company_id', 'company_name', 'compensation_type', 'currency', 'description', 'expiry', 'fips', 'formatted_experience_level', 'formatted_work_type', 'job_id', 'job_posting_url', 'listed_time', 'location', 'max_salary', 'med_salary', 'min_salary', 'normalized_salary', 'original_listed_time', 'pay_period', 'posting_domain', 'remote_allowed', 'skills_desc', 'sponsored', 'title', 'views', 'work_type', 'zip_code']


In [4]:
postings_df.head()

Unnamed: 0,job_id,company_name,title,description,max_salary,pay_period,location,company_id,views,med_salary,...,skills_desc,listed_time,posting_domain,sponsored,work_type,currency,compensation_type,normalized_salary,zip_code,fips
0,91700727,Downtown Raleigh Alliance,Economic Development and Planning Intern,Job summary:The Economic Development & Plannin...,20.0,HOURLY,"Raleigh, NC",1481176.0,9.0,,...,,1713456000000.0,,0,INTERNSHIP,USD,BASE_SALARY,35360.0,27601.0,37183.0
1,2264355,Bay West Church,Worship Leader,It is an exciting time to be a part of our chu...,,MONTHLY,"Palm Bay, FL",28631247.0,5.0,350.0,...,"Knowledge, Skills and Abilities: 1. Proficient...",1712456000000.0,,0,PART_TIME,USD,BASE_SALARY,4200.0,32905.0,12009.0
2,229924287,REquipment Durable Medical Equipment and Assis...,Administrative Assistant,The Administrative Assistant will organize and...,,HOURLY,"Woburn, MA",14773918.0,3.0,23.0,...,,1713550000000.0,,0,PART_TIME,USD,BASE_SALARY,47840.0,1801.0,25017.0
3,358267047,ADEPT HRM Solutions,Production Planner (Food Technologist),Job Summary: We are seeking a skilled Producti...,,,"Concord, NC",348976.0,6.0,,...,,1712351000000.0,,0,FULL_TIME,,,,28025.0,37025.0
4,445337908,Food Bank of Alaska,Chief Operating Officer,The Chief Operations Officer (COO) position is...,110000.0,YEARLY,"Anchorage, AK",8849197.0,7.0,,...,,1713554000000.0,,0,FULL_TIME,USD,BASE_SALARY,100000.0,99501.0,2020.0


# Companies data
The `company_id` column is particularly interesting here, since it is shared with the postings data.

In [5]:
# Exploring Companies data
companies_df = pd.read_csv("../data/companies/companies.csv")
industries_df = pd.read_csv("../data/companies/company_industries.csv")
specialties_df = pd.read_csv("../data/companies/company_specialities.csv")
employee_df = pd.read_csv("../data/companies/employee_counts.csv")

# Creating a list to show all available columns
companies = list(companies_df.columns)
industries = list(industries_df.columns)
specialties = list(specialties_df.columns)
employees = list(employee_df.columns)
print("companies:  ", companies)
print("industries: ", industries)
print("specialties:", specialties)
print("employees:  ", employees)

companies:   ['Unnamed: 0', 'company_id', 'name', 'description', 'company_size', 'state', 'country', 'city', 'zip_code', 'address', 'url']
industries:  ['Unnamed: 0', 'company_id', 'industry']
specialties: ['Unnamed: 0', 'company_id', 'speciality']
employees:   ['Unnamed: 0', 'company_id', 'employee_count', 'follower_count', 'time_recorded']


In [6]:
companies_df.head()

Unnamed: 0.1,Unnamed: 0,company_id,name,description,company_size,state,country,city,zip_code,address,url
0,18,1088,NXP Semiconductors,NXP Semiconductors N.V. (NASDAQ: NXPI) enables...,7.0,Noord-Brabant,NL,Eindhoven,5656 AG,High Tech Campus 60,https://www.linkedin.com/company/nxp-semicondu...
1,27,1207,Johnson & Johnson,"At Johnson & Johnson, we believe health is eve...",7.0,NJ,US,New Brunswick,08903,0,https://www.linkedin.com/company/johnson-&-joh...
2,29,1224,US Army Corps of Engineers,U.S. Army Corps of Engineers Mission: \nProvid...,7.0,DC,US,Washington,20314,441 G Street NW,https://www.linkedin.com/company/us-army-corps...
3,44,1292,The Walt Disney Company,From classic animated features and exhilaratin...,7.0,CA,US,Burbank,91521,The Walt Disney Company,https://www.linkedin.com/company/the-walt-disn...
4,52,1360,National Computer Systems,WHY CHOOSE NCS ?\nTop 5 reasons why clients ch...,3.0,0,0,0,0,0,https://www.linkedin.com/company/national-comp...


In [7]:
print(companies_df["Unnamed: 0"].min(),companies_df["Unnamed: 0"].max())

18 24471


In [8]:
industries_df.head()

Unnamed: 0.1,Unnamed: 0,company_id,industry
0,18,33218,Staffing and Recruiting
1,36,7790573,Business Consulting and Services
2,49,24803,Staffing and Recruiting
3,50,13345578,IT Services and IT Consulting
4,57,54077952,Motor Vehicle Manufacturing


In [9]:
print(industries_df["Unnamed: 0"].min(), industries_df["Unnamed: 0"].max())

18 24266


In [10]:
industries_df.describe()

Unnamed: 0.1,Unnamed: 0,company_id
count,1432.0,1432.0
mean,12275.868017,20646890.0
std,7000.207157,31787570.0
min,18.0,1088.0
25%,6200.0,166187.8
50%,12396.0,2860462.0
75%,18530.25,27024450.0
max,24266.0,103468900.0


In [11]:
specialties_df.head()

Unnamed: 0.1,Unnamed: 0,company_id,speciality
0,149,33218,CSS Tec
1,150,33218,CSS ProSearch
2,151,33218,CSS Professional Staffing
3,152,33218,CSS Accounting & Finance
4,153,33218,Peergenics


In [12]:
employee_df.head()

Unnamed: 0.1,Unnamed: 0,company_id,employee_count,follower_count,time_recorded
0,18,33218,191,36335,1712346173
1,36,7790573,16,233,1712346248
2,49,24803,130,60572,1712346323
3,50,13345578,279,85916,1712346323
4,57,54077952,74,686,1712346397


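I am not sure what the `Unnamed: 0` columns are. They look like a row index left over from when the files were exported (pandas gives this name to a header-less first column), and the gaps in the values suggest the rows were sampled from a larger table. Some files share a common range of values, others don't.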
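
One plausible origin of such a column can be reproduced with a toy frame. The sketch below assumes the files were written with `DataFrame.to_csv` without `index=False`; that is a guess about how the data was produced, not something the files confirm. The values are copied from the `company_industries` preview above.

```python
import io
import pandas as pd

# Toy stand-in for a sampled source table (values from the preview above;
# the index labels mimic row numbers of a larger table).
source = pd.DataFrame(
    {
        "company_id": [33218, 7790573, 24803],
        "industry": [
            "Staffing and Recruiting",
            "Business Consulting and Services",
            "Staffing and Recruiting",
        ],
    },
    index=[18, 36, 49],
)

# Writing without index=False stores the index as a header-less first column.
buffer = io.StringIO()
source.to_csv(buffer)

# Reading the file back naively turns that column into "Unnamed: 0".
buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))

# Declaring the first column as the index keeps it out of the data columns.
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)
print(list(clean.columns))
```

If the column really is such a leftover index, it carries no business meaning and can be ignored when choosing join keys.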

# Jobs data
The job_id column here is shared with the postings.csv data

In [13]:
# Exploring Jobs data
benefits_df       = pd.read_csv("../data/jobs/benefits.csv")
job_industries_df = pd.read_csv("../data/jobs/job_industries.csv")
job_skills_df     = pd.read_csv("../data/jobs/job_skills.csv")
salaries_df       = pd.read_csv("../data/jobs/salaries.csv")

# Creating a list to show all available columns
salaries = list(salaries_df.columns)
benefits = list(benefits_df.columns)
industries = list(job_industries_df.columns)
skills = list(job_skills_df.columns)
print("salaries:  ", salaries)
print("benefits:  ", benefits)
print("industries:", industries)
print("skills:    ", skills)

salaries:   ['salary_id', 'job_id', 'max_salary', 'med_salary', 'min_salary', 'pay_period', 'currency', 'compensation_type']
benefits:   ['job_id', 'inferred', 'type']
industries: ['job_id', 'industry_id']
skills:     ['job_id', 'skill_abr']


In [14]:
benefits_df.head()

Unnamed: 0,job_id,inferred,type
0,3887474156,0,Medical insurance
1,3887474156,0,Vision insurance
2,3887474156,0,Dental insurance
3,3884436043,0,Medical insurance
4,3884436043,0,Vision insurance


In [15]:
job_industries_df.head()

Unnamed: 0,job_id,industry_id
0,3887466990,10
1,3887473087,11
2,3887467990,96
3,3887467990,14
4,3884435035,84


In [16]:
print(job_industries_df["industry_id"].min(), job_industries_df["industry_id"].max())

1 3252


In [17]:
job_skills_df.head()

Unnamed: 0,job_id,skill_abr
0,3887466990,LGL
1,3887466990,ADM
2,3887473087,MRKT
3,3887473087,SALE
4,3887467990,CNSL


In [18]:
salaries_df.head()

Unnamed: 0,salary_id,job_id,max_salary,med_salary,min_salary,pay_period,currency,compensation_type
0,13,3887473087,80000.0,,75000.0,YEARLY,USD,BASE_SALARY
1,18,3887467990,80.0,,60.0,HOURLY,USD,BASE_SALARY
2,65,3884433143,,53000.0,,YEARLY,USD,BASE_SALARY
3,70,3884428699,300000.0,,90000.0,YEARLY,USD,BASE_SALARY
4,96,3887474156,80000.0,,70000.0,YEARLY,USD,BASE_SALARY


In [19]:
salaries_df.describe()

Unnamed: 0,salary_id,job_id,max_salary,med_salary,min_salary
count,2088.0,2088.0,1662.0,426.0,1662.0
mean,20072.255268,3889088000.0,96273.57,36351.924624,66036.549212
std,11559.985785,178640000.0,92329.96,71459.274156,59313.422769
min,13.0,2264355.0,1.0,0.0,1.0
25%,10139.75,3894573000.0,65.0,19.8125,50.0
50%,19874.5,3901800000.0,90000.0,30.0,66300.0
75%,29606.25,3904398000.0,150000.0,53810.0,100000.0
max,40780.0,3906266000.0,1000001.0,500000.0,400000.0


# Mapping data
The industries.csv dataset looks like it has the "industry_id" column in common with job_industries.csv

And the skills.csv dataset looks like it has the "skill_abr" column in common with the job_skills.csv

In [20]:
# Exploring Mappings data
industries_df = pd.read_csv("../data/mappings/industries.csv")
skills_df = pd.read_csv("../data/mappings/skills.csv")

# Creating a list to show all available columns
print("Industries: ", list(industries_df.columns))
print("Skills:     ", list(skills_df.columns))

Industries:  ['industry_id', 'industry_name']
Skills:      ['skill_abr', 'skill_name']


In [21]:
industries_df.head()

Unnamed: 0,industry_id,industry_name
0,1,Defense and Space Manufacturing
1,3,Computer Hardware Manufacturing
2,4,Software Development
3,5,Computer Networking Products
4,6,"Technology, Information and Internet"


In [22]:
skills_df.head()

Unnamed: 0,skill_abr,skill_name
0,ART,Art/Creative
1,DSGN,Design
2,ADVR,Advertising
3,PRDM,Product Management
4,DIST,Distribution


# 1.2 Design a database schema


Based on the column mappings shown in the diagram below, `postings.csv` is clearly the fact table, linked to the dimension tables via the variables 'company_id' and 'job_id'. The data tables are best arranged in a SNOWFLAKE schema, with `postings.csv` at the center as the fact table. It is a SNOWFLAKE rather than a STAR schema because `job_industries.csv` and `job_skills.csv` are themselves linked to further tables, which puts those tables more than one join away from `postings.csv`.

```
postings.csv is related to companies.csv, company_industries.csv, company_specialities.csv and employee_counts.csv via variable 'company_id'

postings.csv is related to benefits.csv, job_industries.csv, job_skills.csv and salaries.csv via variable 'job_id'

job_industries.csv is related to industries.csv via variable 'industry_id'

job_skills.csv is related to skills.csv via variable 'skill_abr'
```
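
The relationships listed above can also be derived mechanically by intersecting the column names of each pair of tables. The following is a minimal sketch in plain Python; the column sets are copied from the outputs in section 1.1, with `postings` abbreviated to a handful of columns and `Unnamed: 0` left out on the assumption that it is not a real key.

```python
from itertools import combinations

# Column names per table, taken from the notebook outputs above.
# "postings" is abbreviated; "Unnamed: 0" is deliberately excluded.
columns = {
    "postings": {"job_id", "company_id", "company_name", "title", "normalized_salary"},
    "company_industries": {"company_id", "industry"},
    "job_industries": {"job_id", "industry_id"},
    "job_skills": {"job_id", "skill_abr"},
    "industries": {"industry_id", "industry_name"},
    "skills": {"skill_abr", "skill_name"},
}

# Collect every pair of tables that shares at least one column name.
shared = {}
for (left, left_cols), (right, right_cols) in combinations(columns.items(), 2):
    common = left_cols & right_cols
    if common:
        shared[(left, right)] = common

for (left, right), common in shared.items():
    print(f"{left} <-> {right}: {sorted(common)}")
```

A shared column name only suggests a possible join key; whether the values actually match still has to be checked against the data.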

The most practical database schema is a STAR schema in which `postings.csv` acts as the fact table and `company_industries.csv` acts as the dimension table.

In [23]:
# Folder structure and the columns in each data table are shown below
'''
├── companies
│   ├── companies.csv
|   |     companies:   ['Unnamed: 0', 'company_id', 'name', 'description', 'company_size', 'state', 'country', 'city', 'zip_code', 'address', 'url']
│   ├── company_industries.csv
|   |     industries:  ['Unnamed: 0', 'company_id', 'industry']
│   ├── company_specialities.csv
|   |     specialties: ['Unnamed: 0', 'company_id', 'speciality']
│   └── employee_counts.csv
│         employees:   ['Unnamed: 0', 'company_id', 'employee_count', 'follower_count', 'time_recorded']
│
│
├── jobs
│   ├── benefits.csv
|   |     benefits:   ['job_id', 'inferred', 'type']
│   ├── job_industries.csv
|   |     industries: ['job_id', 'industry_id']
│   ├── job_skills.csv
|   |     skills:     ['job_id', 'skill_abr']
│   └── salaries.csv
│         salaries:   ['salary_id', 'job_id', 'max_salary', 'med_salary', 'min_salary', 'pay_period', 'currency', 'compensation_type']
│
│
│
├── mappings
│   ├── industries.csv
|   |     Industries:  ['industry_id', 'industry_name']
│   └── skills.csv
|         Skills:      ['skill_abr', 'skill_name']
|
|
└── postings.csv
        Common variables with other tables: 'company_id', 'job_id'
        postings: ['application_type', 'application_url', 'applies', 'closed_time', 'company_id', 'company_name', 'compensation_type', 
                   'currency', 'description', 'expiry', 'fips', 'formatted_experience_level', 'formatted_work_type', 'job_id', 'job_posting_url', 
                   'listed_time', 'location', 'max_salary', 'med_salary', 'min_salary', 'normalized_salary', 'original_listed_time', 'pay_period', 
                   'posting_domain', 'remote_allowed', 'skills_desc', 'sponsored', 'title', 'views', 'work_type', 'zip_code']
'''

# 1.3 Create and load a local database
Two tables are loaded into a sqlite database called `job_postings.db`

`postings.csv` as `fact_job_postings` and `company_industries.csv` as `dim_company`
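
For comparison, the same two tables could be declared explicitly before loading, so that keys and column types are fixed up front instead of being inferred. This is only a sketch on an in-memory database: the reduced column list and the composite primary key on `dim_company` (which assumes a company may be listed under several industries) are assumptions, not properties verified in this notebook.

```python
import sqlite3

# In-memory database so the sketch does not touch job_postings.db.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE dim_company (
        company_id INTEGER NOT NULL,
        industry   TEXT    NOT NULL,
        PRIMARY KEY (company_id, industry)  -- assumption: several industries per company
    );
    CREATE TABLE fact_job_postings (
        job_id            INTEGER PRIMARY KEY,
        company_id        INTEGER,           -- join key to dim_company
        company_name      TEXT,
        title             TEXT,
        normalized_salary REAL
    );
    -- Index on the join key of the fact table.
    CREATE INDEX idx_fact_company ON fact_job_postings (company_id);
    """
)

# Inspect the declared schema, as done with PRAGMA table_info further below.
info = demo.execute("PRAGMA table_info('fact_job_postings')").fetchall()
for cid, name, col_type, notnull, default, pk in info:
    print(name, col_type, pk)
```

With tables declared this way, the dataframes could be loaded with `to_sql(..., if_exists='append', index=False)` after selecting the matching columns, which keeps the declared schema instead of replacing it.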

In [24]:
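# Compatibility workaround so the %sql magic can display query results as tables
# (presumably needed because of the installed prettytable version)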
prettytable.DEFAULT = 'DEFAULT'

In [25]:
# Connecting to an existing database, or creating it if it does not exist yet
conn = sqlite3.connect("job_postings.db")

# Cursor for running SQL statements directly through sqlite3
cursor = conn.cursor()

# Loading the SQL extension that provides the %sql and %%sql magics
%load_ext sql

# Pointing the %sql magic at the job_postings.db database
%sql sqlite:///job_postings.db

In [26]:
# Reading the fact and dim table into memory using pandas
fact_job_postings_df = pd.read_csv("../data/postings.csv")
dim_company_df = pd.read_csv("../data/companies/company_industries.csv")    

In [27]:
# Converting the dataframes to sql tables, linking them to job_postings.db
fact_job_postings_df.to_sql("fact_job_postings", conn, if_exists='replace', index=False, method="multi")
dim_company_df.to_sql("dim_company", conn, if_exists='replace', index=False, method="multi")

1432

In [28]:
# What info is in the fact table again?
%sql PRAGMA table_info("fact_job_postings")

 * sqlite:///job_postings.db
Done.


cid,name,type,notnull,dflt_value,pk
0,job_id,INTEGER,0,,0
1,company_name,TEXT,0,,0
2,title,TEXT,0,,0
3,description,TEXT,0,,0
4,max_salary,REAL,0,,0
5,pay_period,TEXT,0,,0
6,location,TEXT,0,,0
7,company_id,REAL,0,,0
8,views,REAL,0,,0
9,med_salary,REAL,0,,0


In [29]:
%sql PRAGMA table_info("dim_company")

 * sqlite:///job_postings.db
Done.


cid,name,type,notnull,dflt_value,pk
0,Unnamed: 0,INTEGER,0,,0
1,company_id,INTEGER,0,,0
2,industry,TEXT,0,,0
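
The two `PRAGMA` outputs show `company_id` stored as `REAL` in the fact table but as `INTEGER` in the dimension table, presumably because pandas read the postings column as floats due to missing values. SQLite compares integers and reals numerically, so a join on this column should still match. The toy check below illustrates this with the first `company_id` from the postings preview; the industry label is a placeholder.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE fact_demo (job_id INTEGER, company_id REAL);
    CREATE TABLE dim_demo  (company_id INTEGER, industry TEXT);
    -- One posting with a float key, one posting without a company_id.
    INSERT INTO fact_demo VALUES (91700727, 1481176.0), (358267047, NULL);
    -- 'Example industry' is a placeholder, not a value from the dataset.
    INSERT INTO dim_demo  VALUES (1481176, 'Example industry');
    """
)

matches = demo.execute(
    """
    SELECT f.job_id, d.industry
    FROM fact_demo AS f
    INNER JOIN dim_demo AS d ON f.company_id = d.company_id
    """
).fetchall()
print(matches)
```

The posting without a `company_id` drops out of the inner join, which is worth keeping in mind when reading the per-industry counts.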


# 1.4 Use your database to answer some questions

## How many companies have more than 1 job posting?

In [30]:
%%sql
SELECT COUNT(count) as `Companies with > 1 job postings` FROM (SELECT company_name, COUNT(job_id) AS count FROM fact_job_postings GROUP BY company_name)
WHERE count > 1 ;

 * sqlite:///job_postings.db
Done.


Companies with > 1 job postings
601
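
The same count can be expressed with `HAVING`, which filters the groups inside the subquery instead of filtering its result afterwards. The sketch below runs on made-up rows in an in-memory table. The `WHERE company_name IS NOT NULL` filter is an addition that the query above does not have; it keeps postings without a company name from being counted as one company.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE fact_job_postings (job_id INTEGER, company_name TEXT)")
# Made-up postings: A has 2, B has 1, C has 3, and one posting has no company name.
demo.executemany(
    "INSERT INTO fact_job_postings VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "C"), (6, "C"), (7, None)],
)

query = """
SELECT COUNT(*) AS n_companies
FROM (
    SELECT company_name
    FROM fact_job_postings
    WHERE company_name IS NOT NULL      -- addition: ignore unnamed companies
    GROUP BY company_name
    HAVING COUNT(job_id) > 1            -- keep companies with more than one posting
)
"""
n_companies = demo.execute(query).fetchone()[0]
print(n_companies)
```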


In [31]:
%%sql
SELECT comp AS Company, count AS `Num Job Postings` FROM (SELECT company_name as comp, COUNT(job_id) AS count FROM fact_job_postings GROUP BY company_name)
WHERE count > 1 
ORDER BY count DESC
LIMIT 10
;

 * sqlite:///job_postings.db
Done.


Company,Num Job Postings
Family Dollar,288
Talentify.io,276
Rent-A-Center,136
National Staffing Solutions,134
AutoZone,131
Claire's,130
Sutter Health,120
Johnson & Johnson,108
Revature,103
"LanceSoft, Inc.",95


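## How many job postings are there for each job industry?
This question requires joining the fact table to the dimension table, since the industry is only available in `dim_company`. Note that the query uses the industry of the posting company as the job industry.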

In [40]:
%%sql
SELECT industry AS Industry, COUNT(job_id) AS `Num Postings` FROM (fact_job_postings INNER JOIN dim_company ON fact_job_postings.company_id = dim_company.company_id)
GROUP BY industry
ORDER BY COUNT(job_id) DESC;

 * sqlite:///job_postings.db
Done.


Industry,Num Postings
Hospitals and Health Care,1010
Retail,913
Staffing and Recruiting,803
IT Services and IT Consulting,762
Software Development,489
Entertainment Providers,211
Insurance,156
Higher Education,143
Construction,126
Hospitality,106


## What is the average normalized salary by company industry?

In [33]:
%%sql
SELECT industry AS Industry, AVG(normalized_salary) AS `Avg. Norm. Salary` FROM (fact_job_postings INNER JOIN dim_company ON fact_job_postings.company_id = dim_company.company_id)
GROUP BY industry
ORDER BY `Avg. Norm. Salary` DESC;

 * sqlite:///job_postings.db
Done.


Industry,Avg. Norm. Salary
Information Services,250000.0
Investment Management,225000.0
Automation Machinery Manufacturing,195900.0
Semiconductor Manufacturing,180000.0
Biotechnology Research,164804.125
Online Audio and Video Media,159500.0
Entertainment Providers,153425.15569620254
Venture Capital and Private Equity Principals,149366.66666666666
Personal Care Product Manufacturing,138401.95789473684
Defense and Space Manufacturing,136776.82222222222


## Name the top 5 companies with the highest average normalized salary for their job postings

In [34]:
%%sql
SELECT company_name AS Company, AVG(normalized_salary) AS `Avg. Norm Salary` FROM fact_job_postings
GROUP BY company_name
ORDER BY `Avg. Norm Salary` DESC
LIMIT 5;

 * sqlite:///job_postings.db
Done.


Company,Avg. Norm Salary
Woodside Staffing Solutions & Consulting,337500.0
Calm,337500.0
Health eCareers,337246.4090909091
Buck Institute for Research on Aging,300000.0
Spire Orthopedic Partners,284124.0


## Verifying the averages, which seem extremely high
It seems that many of these companies have only one posting with a salary, so the average is simply the posted value, which seems reasonable.

In [35]:
%%sql
SELECT company_name AS Company, normalized_salary FROM fact_job_postings
WHERE company_name='Woodside Staffing Solutions & Consulting'
ORDER BY company_name;


 * sqlite:///job_postings.db
Done.


Company,normalized_salary
Woodside Staffing Solutions & Consulting,337500.0


In [36]:
%%sql
SELECT company_name AS Company, normalized_salary FROM fact_job_postings
WHERE company_name='Calm'
ORDER BY company_name;


 * sqlite:///job_postings.db
Done.


Company,normalized_salary
Calm,337500.0


In [37]:
%%sql
SELECT company_name AS Company, AVG(normalized_salary) FROM fact_job_postings
WHERE company_name='Health eCareers'
ORDER BY company_name;


 * sqlite:///job_postings.db
Done.


Company,AVG(normalized_salary)
Health eCareers,337246.4090909091


In [38]:
%%sql
SELECT company_name AS Company, normalized_salary FROM fact_job_postings
WHERE company_name='Buck Institute for Research on Aging'
ORDER BY company_name;

 * sqlite:///job_postings.db
Done.


Company,normalized_salary
Buck Institute for Research on Aging,300000.0


In [39]:
%%sql
SELECT company_name AS Company, normalized_salary FROM fact_job_postings
WHERE company_name='Spire Orthopedic Partners'
ORDER BY company_name;


 * sqlite:///job_postings.db
Done.


Company,normalized_salary
Spire Orthopedic Partners,450000.0
Spire Orthopedic Partners,118248.0
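
The per-company checks above can be folded into a single query by reporting the number of salaried postings next to each average. The sketch below runs on an in-memory table that holds only the three salary rows shown above for Calm and Spire Orthopedic Partners.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE fact_job_postings (company_name TEXT, normalized_salary REAL)"
)
# Rows copied from the verification outputs above.
demo.executemany(
    "INSERT INTO fact_job_postings VALUES (?, ?)",
    [
        ("Calm", 337500.0),
        ("Spire Orthopedic Partners", 450000.0),
        ("Spire Orthopedic Partners", 118248.0),
    ],
)

rows = demo.execute(
    """
    SELECT company_name,
           AVG(normalized_salary)   AS avg_salary,
           COUNT(normalized_salary) AS n_salaries  -- postings behind the average
    FROM fact_job_postings
    GROUP BY company_name
    ORDER BY avg_salary DESC
    """
).fetchall()
for row in rows:
    print(row)
```

Adding a clause such as `HAVING COUNT(normalized_salary) >= 2` would restrict the ranking to companies with more than one salaried posting; the threshold itself is a judgment call and not part of the original question.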
