# Tech Job Posting

In [1]:
# Import libraries
import pandas as pd
import sqlite3

In [2]:
# Import datasets
salaries = pd.read_csv('ds_salaries_2023.csv')
posting = pd.read_csv('postings.csv')
companies = pd.read_csv('companies.csv')
skills = pd.read_csv('job_skills.csv')
benefits = pd.read_csv('benefits.csv')
industry = pd.read_csv('company_industries.csv')

In [3]:
# Set up the SQLite database
# Open a connection (creates tech_jobmarket.db if it does not already exist)
conn = sqlite3.connect('tech_jobmarket.db')

# Write each DataFrame to its own table, replacing any existing table
salaries.to_sql('salaries', conn, index=False, if_exists='replace')
posting.to_sql('posting', conn, index=False, if_exists='replace')
companies.to_sql('companies', conn, index=False, if_exists='replace')
skills.to_sql('skills', conn, index=False, if_exists='replace')
benefits.to_sql('benefits', conn, index=False, if_exists='replace')
industry.to_sql('industry', conn, index=False, if_exists='replace')

# Activate SQL extension
%load_ext sql

# Connect SQL magic (to run SQL commands) to our database
%sql sqlite:///tech_jobmarket.db

## **Business Questions**

### 1. Salary Trends and Insights
- What is the average salary for data science jobs across different industries?
- How do company sizes impact the salaries offered for data science roles?
- Which job titles offer the highest average salary for data professionals?

SQL Skills:
- JOINs (INNER JOIN between job_postings, company_industries, and companies to connect salary, industry, and company data).
- Aggregations (AVG to calculate averages and trends over time).

- What is the average salary for data science jobs across different industries?

In [None]:
%%sql
SELECT
    industry,
    pay_period,
    ROUND(AVG(max_salary),2) as top_salary,
    ROUND(AVG(med_salary),2) as avg_salary,
    ROUND(AVG(min_salary),2) as min_salary
FROM posting
JOIN industry ON posting.company_id = industry.company_id
WHERE posting.title LIKE '%data scientist%' AND max_salary IS NOT NULL AND min_salary IS NOT NULL
GROUP BY industry
ORDER BY min_salary DESC

 * sqlite:///tech_jobmarket.db
Done.


industry,pay_period,top_salary,avg_salary,min_salary
"Technology, Information and Internet",YEARLY,239000.0,,177000.0
Information Services,YEARLY,200000.0,,160000.0
Biotechnology Research,YEARLY,195500.0,,153000.0
Industrial Machinery Manufacturing,YEARLY,189200.0,,142800.0
Truck Transportation,YEARLY,160000.0,,140000.0
Defense and Space Manufacturing,YEARLY,209500.0,,139700.0
Software Development,YEARLY,219922.5,,137499.0
Advertising Services,YEARLY,188050.17,,135205.0
Government Administration,YEARLY,171305.2,,120055.0
Mental Health Care,YEARLY,160000.0,,120000.0


- How do company sizes impact the salaries offered for data science roles?

In [9]:
%%sql
SELECT 
    company_size, 
    pay_period,
    ROUND(AVG(max_salary),2) as top_salary,
    ROUND(AVG(med_salary),2) as avg_salary,
    ROUND(AVG(min_salary),2) as min_salary
FROM companies
JOIN posting ON companies.company_id = posting.company_id
WHERE posting.title LIKE '%data scientist%' AND max_salary IS NOT NULL AND company_size IS NOT NULL
GROUP BY company_size
ORDER BY top_salary DESC

 * sqlite:///tech_jobmarket.db
Done.


company_size,pay_period,top_salary,avg_salary,min_salary
7.0,YEARLY,172107.78,,104480.87
5.0,YEARLY,169496.67,,120739.44
2.0,HOURLY,168592.0,,126018.57
6.0,YEARLY,161235.83,,122897.5
1.0,YEARLY,145025.0,,106735.71
4.0,YEARLY,141780.75,,96030.63
3.0,YEARLY,105648.13,,64892.5
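
The `2.0` row above is labelled `HOURLY` even though its averages sit on a yearly scale. `pay_period` is not part of the `GROUP BY`, so SQLite reports the value from an arbitrary row in each group, and any hourly rows are averaged together with yearly ones. A minimal sketch of one way around this, using an in-memory database with made-up rows (the values are illustrative, not taken from the dataset), is to group by `pay_period` as well:

```python
import sqlite3

# Toy data (hypothetical values) that mimics the columns used in the notebook
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER, company_size REAL);
CREATE TABLE posting (company_id INTEGER, title TEXT, pay_period TEXT,
                      min_salary REAL, max_salary REAL);
INSERT INTO companies VALUES (1, 2.0), (2, 2.0);
INSERT INTO posting VALUES
    (1, 'Data Scientist', 'YEARLY', 100000, 150000),
    (1, 'Senior Data Scientist', 'YEARLY', 120000, 170000),
    (2, 'Data Scientist', 'HOURLY', 50, 70);
""")

# Grouping by pay_period keeps hourly and yearly figures in separate rows
rows = conn.execute("""
SELECT c.company_size,
       p.pay_period,
       COUNT(*) AS postings,
       ROUND(AVG(p.max_salary), 2) AS top_salary,
       ROUND(AVG(p.min_salary), 2) AS low_salary
FROM companies c
JOIN posting p ON c.company_id = p.company_id
WHERE p.title LIKE '%data scientist%'
  AND p.max_salary IS NOT NULL
GROUP BY c.company_size, p.pay_period
ORDER BY p.pay_period, top_salary DESC
""").fetchall()

for row in rows:
    print(row)
```

Filtering with `WHERE pay_period = 'YEARLY'` would be an alternative if only annual salaries are of interest.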


- Which job titles offer the highest average salary for data professionals?

In [15]:
%%sql
SELECT
    title,
    ROUND(AVG(max_salary),2) as top_salary,
    ROUND(AVG(med_salary),2) as avg_salary,
    ROUND(AVG(min_salary),2) as min_salary,
    ROUND(AVG(max_salary) - AVG(min_salary),2) as delta_salary
FROM posting
WHERE title LIKE '%data%' AND max_salary IS NOT NULL
GROUP BY title
ORDER BY top_salary DESC
LIMIT 5;

 * sqlite:///tech_jobmarket.db
Done.


title,top_salary,avg_salary,min_salary,delta_salary
NEO4J- Senior Database administrator,1300000.0,,100000.0,1200000.0
Vice President of Data Engineering,600000.0,,400000.0,200000.0
Database Manager,600000.0,,600000.0,0.0
"Software Engineer, Data Infrastructure at D. E. Shaw Research",550000.0,,230000.0,320000.0
Planner/ Data Sales Analyst,462435.0,,70000.0,392435.0
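
Because the query groups by the exact `title`, a title that appears in a single posting can top the ranking on the strength of one unusual salary, such as the 1,300,000 maximum above. A possible refinement, sketched here on made-up rows, is to report the number of postings behind each average and require a minimum with `HAVING`. The threshold of 2 is an arbitrary assumption.

```python
import sqlite3

# Toy data (hypothetical values): one title has a single, extreme posting
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (title TEXT, min_salary REAL, max_salary REAL);
INSERT INTO posting VALUES
    ('Data Engineer', 100000, 150000),
    ('Data Engineer', 110000, 170000),
    ('Data Analyst', 60000, 90000),
    ('Data Analyst', 70000, 110000),
    ('Senior Database Administrator', 100000, 1300000);
""")

# HAVING filters groups after aggregation, so single-posting titles drop out
rows = conn.execute("""
SELECT title,
       COUNT(*) AS n_posts,
       ROUND(AVG(max_salary), 2) AS top_salary,
       ROUND(AVG(max_salary) - AVG(min_salary), 2) AS delta_salary
FROM posting
WHERE title LIKE '%data%' AND max_salary IS NOT NULL
GROUP BY title
HAVING COUNT(*) >= 2
ORDER BY top_salary DESC
LIMIT 5
""").fetchall()

for row in rows:
    print(row)
```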


### 2. Skill Demand and Growth
- Which skills are most in demand for data science roles globally?
- How do industry types influence the demand for specific data science skills?
- What skills are associated with the highest paying data science roles?

SQL Skills:
- CTEs (to simplify queries that identify and track specific skill sets over time).
- JOINs (to link job_skills, job_postings, and company_industries to analyze skill trends).
- Aggregations (COUNT, GROUP BY to identify the frequency of specific skills in job postings).

- Which skills are most in demand for data science roles globally?


In [32]:
%%sql
WITH title_industry AS (
    SELECT title, industry, job_id
    FROM posting
    JOIN industry ON posting.company_id = industry.company_id
    WHERE title LIKE '%data scientist%'
)
SELECT title, skill_abr as skill, COUNT(skill_abr) as skill_count
FROM title_industry
JOIN skills ON title_industry.job_id = skills.job_id
GROUP BY  title, skill
ORDER BY skill_count DESC
LIMIT 10

 * sqlite:///tech_jobmarket.db
Done.


title,skill,skill_count
Data Scientist,IT,44
Data Scientist,ENG,38
Senior Data Scientist,IT,15
Senior Data Scientist,ENG,13
Lead Data Scientist,ENG,8
Data Scientist,ANLS,7
Lead Data Scientist,IT,7
Data Scientist Lead – Telematics (Remote),ENG,6
Data Scientist Lead – Telematics (Remote),IT,6
Data Scientist,OTHR,5


- How do industry types influence the demand for specific data science skills?


In [31]:
%%sql
WITH title_industry AS (
    SELECT title, industry, job_id
    FROM posting
    JOIN industry ON posting.company_id = industry.company_id
    WHERE title LIKE '%data scientist%'
)
SELECT industry, skill_abr as skill, COUNT(skill_abr) as skill_count
FROM title_industry
JOIN skills ON title_industry.job_id = skills.job_id
GROUP BY  industry, skill
ORDER BY skill_count DESC
LIMIT 10

 * sqlite:///tech_jobmarket.db
Done.


industry,skill,skill_count
IT Services and IT Consulting,IT,35
Financial Services,ENG,34
Financial Services,IT,34
Software Development,IT,33
Software Development,ENG,32
IT Services and IT Consulting,ENG,22
Staffing and Recruiting,IT,20
Staffing and Recruiting,ENG,11
Advertising Services,ENG,6
Biotechnology Research,IT,6


- What skills are associated with the highest paying data science roles?


In [37]:
%%sql
SELECT
    title,
    ROUND(AVG(max_salary),2) as top_salary,
    ROUND(AVG(med_salary),2) as avg_salary,
    ROUND(AVG(min_salary),2) as min_salary,
    skill_abr
FROM posting
JOIN skills ON posting.job_id = skills.job_id
WHERE title LIKE '%data scientist%'
GROUP BY title
ORDER BY top_salary DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


title,top_salary,avg_salary,min_salary,skill_abr
Principal AI/ML Data Scientist,322100.0,,173400.0,ENG
TikTok Shop - Data Scientist - User Growth,321099.5,,171946.0,ANLS
Data Scientist Lead – Telematics (Remote),286130.0,,158960.0,ENG
Data Scientist - AI Investment,250000.0,,200000.0,IT
"Senior Data Scientist, LLM",249000.0,,216000.0,ENG

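In the query above, `skill_abr` is not part of the `GROUP BY`, so each title shows one arbitrarily chosen skill rather than every skill attached to its postings. One way to answer the question per skill instead is to group by `skill_abr` and average the salary of the postings that carry it. The sketch below uses made-up rows to show the idea.

```python
import sqlite3

# Toy data (hypothetical values): each posting can carry several skills
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (job_id INTEGER, title TEXT, max_salary REAL);
CREATE TABLE skills (job_id INTEGER, skill_abr TEXT);
INSERT INTO posting VALUES
    (1, 'Data Scientist', 200000),
    (2, 'Senior Data Scientist', 300000),
    (3, 'Data Scientist', 100000);
INSERT INTO skills VALUES
    (1, 'IT'), (1, 'ENG'), (2, 'ENG'), (3, 'IT'), (3, 'ANLS');
""")

# One row per skill: how many postings list it and what they pay on average
rows = conn.execute("""
SELECT s.skill_abr AS skill,
       COUNT(*) AS n_posts,
       ROUND(AVG(p.max_salary), 2) AS top_salary
FROM posting p
JOIN skills s ON p.job_id = s.job_id
WHERE p.title LIKE '%data scientist%'
  AND p.max_salary IS NOT NULL
GROUP BY s.skill_abr
ORDER BY top_salary DESC, skill
""").fetchall()

for row in rows:
    print(row)
```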

### 3. Company Overview and Job Postings Analysis
- Which companies post the most data science jobs?
- How does the size of a company correlate with the number of job postings?
- What are the top industries for data science job postings?

SQL Skills:
- JOINs (INNER JOIN between job_postings, companies, and company_industries to get company job postings).
- Aggregations (COUNT for job postings per company, AVG for size and job posting correlation).

- Which companies post the most data science jobs?


In [39]:
%%sql
SELECT name, COUNT(title) count_ds
FROM companies c
JOIN posting p ON c.company_id = p.company_id
WHERE p.title LIKE '%data scientist%'
GROUP BY name
ORDER BY count_ds DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


name,count_ds
Capital One,11
USAA,10
SynergisticIT,7
Verizon,6
Navy Federal Credit Union,5


- How does the size of a company correlate with the number of job postings?


In [42]:
%%sql
SELECT company_size, COUNT(*) as count_posts
FROM posting p 
JOIN companies c ON p.company_id = c.company_id
WHERE p.title LIKE '%data%'
GROUP BY company_size
ORDER BY count_posts DESC
LIMIT 5 

 * sqlite:///tech_jobmarket.db
Done.


company_size,count_posts
7.0,764
5.0,506
2.0,352
3.0,256
1.0,252


- What are the top industries for data science job postings?

In [44]:
%%sql
SELECT industry, COUNT(*) as count_posts, ROUND(AVG(max_salary),2) as max_salary
FROM posting p
JOIN industry i ON p.company_id = i.company_id
WHERE p.title LIKE '%data scientist%'
GROUP BY industry
ORDER BY count_posts DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


industry,count_posts,max_salary
IT Services and IT Consulting,49,147532.18
Financial Services,38,200763.0
Software Development,36,219922.5
Staffing and Recruiting,29,93570.53
Business Consulting and Services,8,48401.67


### 4. Benefits Analysis for Data Science Jobs
- What benefits are most commonly offered in data science job postings?
- Is there a correlation between benefits offered and salary for data science roles?
- Which companies offer the most comprehensive benefits packages?

SQL Skills:
- JOINs (LEFT JOIN between benefits and job_postings to gather benefit information per job posting).
- Aggregations (COUNT, GROUP BY to calculate the frequency of each benefit).
- Subqueries (to compare salary with the number of benefits offered).

- What benefits are most commonly offered in data science job postings?


In [50]:
%%sql
SELECT title, type, COUNT(type) as count_benefits
FROM posting p
LEFT JOIN benefits b ON p.job_id = b.job_id
WHERE p.title LIKE 'data scientist'
GROUP BY title, type
ORDER BY count_benefits DESC

 * sqlite:///tech_jobmarket.db
Done.


title,type,count_benefits
Data Scientist,401(k),7
Data Scientist,Vision insurance,6
Data Scientist,Dental insurance,3
Data Scientist,Disability insurance,3
Data Scientist,Medical insurance,3
Data Scientist,Child care support,1
Data Scientist,Paid maternity leave,1
Data Scientist,Paid paternity leave,1
Data Scientist,Tuition assistance,1
Data Scientist,,0
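
The pattern in the query above contains no `%` wildcards, so `LIKE` only matches titles that equal `data scientist` (ignoring ASCII case). That would explain why every row above has the title `Data Scientist`. The sketch below, on made-up rows, compares that pattern with the `%data scientist%` pattern used in the other queries. It also uses an inner join, which is one way to drop postings that list no benefits.

```python
import sqlite3

# Toy data (hypothetical values)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (job_id INTEGER, title TEXT);
CREATE TABLE benefits (job_id INTEGER, type TEXT);
INSERT INTO posting VALUES
    (1, 'Data Scientist'),
    (2, 'Senior Data Scientist'),
    (3, 'Data Scientist II'),
    (4, 'Data Analyst');
INSERT INTO benefits VALUES
    (1, '401(k)'), (2, '401(k)'), (2, 'Medical insurance'),
    (3, 'Medical insurance'), (4, '401(k)');
""")


def benefit_counts(pattern):
    """Count benefit types across postings whose title matches the LIKE pattern."""
    return conn.execute("""
        SELECT b.type, COUNT(*) AS count_benefits
        FROM posting p
        JOIN benefits b ON p.job_id = b.job_id
        WHERE p.title LIKE ?
        GROUP BY b.type
        ORDER BY count_benefits DESC, b.type
    """, (pattern,)).fetchall()


exact = benefit_counts('data scientist')       # whole-title match only
contains = benefit_counts('%data scientist%')  # phrase anywhere in the title

print(exact)
print(contains)
```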


- Is there a correlation between benefits offered and salary for data science roles?


In [61]:
%%sql
SELECT
    title,
    ROUND(AVG(max_salary),2) as top_salary,
    ROUND(AVG(med_salary),2) as avg_salary,
    ROUND(AVG(min_salary),2) as min_salary,
    COUNT(type) as count_ben
FROM posting p
LEFT JOIN benefits b ON p.job_id = b.job_id
WHERE p.title LIKE '%data scientist%'
GROUP BY title
ORDER BY top_salary DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


title,top_salary,avg_salary,min_salary,count_ben
Principal AI/ML Data Scientist,322100.0,,173400.0,0
TikTok Shop - Data Scientist - User Growth,321099.5,,171946.0,6
Data Scientist Lead – Telematics (Remote),286130.0,,158960.0,2
Data Scientist - AI Investment,250000.0,,200000.0,0
"Senior Data Scientist, LLM",249000.0,,216000.0,1

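Joining `benefits` directly to `posting` repeats each posting once per benefit, so postings with many benefits weigh more heavily in `AVG`, and grouping by `title` does not line salary up against the number of benefits. A sketch of one alternative, on made-up rows: count benefits per posting in a CTE first, then average salary for each benefit count.

```python
import sqlite3

# Toy data (hypothetical values): job 4 lists no benefits at all
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (job_id INTEGER, title TEXT, max_salary REAL);
CREATE TABLE benefits (job_id INTEGER, type TEXT);
INSERT INTO posting VALUES
    (1, 'Data Scientist', 100000),
    (2, 'Data Scientist', 120000),
    (3, 'Senior Data Scientist', 200000),
    (4, 'Lead Data Scientist', 180000);
INSERT INTO benefits VALUES
    (1, '401(k)'), (2, '401(k)'), (3, '401(k)'), (3, 'Medical insurance');
""")

rows = conn.execute("""
WITH benefit_count AS (
    -- One row per posting; COUNT(b.type) is 0 when the LEFT JOIN finds no match
    SELECT p.job_id, p.max_salary, COUNT(b.type) AS n_benefits
    FROM posting p
    LEFT JOIN benefits b ON p.job_id = b.job_id
    WHERE p.title LIKE '%data scientist%'
      AND p.max_salary IS NOT NULL
    GROUP BY p.job_id, p.max_salary
)
SELECT n_benefits,
       COUNT(*) AS n_posts,
       ROUND(AVG(max_salary), 2) AS top_salary
FROM benefit_count
GROUP BY n_benefits
ORDER BY n_benefits
""").fetchall()

for row in rows:
    print(row)
```

With real data, the per-posting rows from the CTE could also be loaded into pandas to compute a correlation coefficient.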

- Which companies offer the most comprehensive benefits packages?


In [66]:
%%sql
WITH compensation AS (
    SELECT
        company_id,
        ROUND(AVG(max_salary),2) as top_salary,
        ROUND(AVG(med_salary),2) as avg_salary,
        ROUND(AVG(min_salary),2) as min_salary,
        type as ben,
        COUNT(type) count_ben
    FROM posting p
    LEFT JOIN benefits b ON p.job_id = b.job_id
    WHERE p.title LIKE '%data scientist%'
)
SELECT 
    name, 
    top_salary,
    avg_salary,
    min_salary,
    ben,
    count_ben
FROM companies c
LEFT JOIN compensation ON c.company_id = compensation.company_id
WHERE top_salary IS NOT NULL
GROUP BY name
ORDER BY count_ben DESC

 * sqlite:///tech_jobmarket.db
Done.


name,top_salary,avg_salary,min_salary,ben,count_ben
Armstrong World Industries,161275.03,60.0,106937.62,,104
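
The `compensation` CTE above aggregates without a `GROUP BY`, so it collapses every matching posting into a single row. That would explain why only one company appears, with a benefit count of 104. A minimal sketch of a per-company version, on made-up rows with hypothetical company names, groups the CTE by `company_id` and counts distinct benefit types:

```python
import sqlite3

# Toy data (hypothetical companies and values)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER, name TEXT);
CREATE TABLE posting (job_id INTEGER, company_id INTEGER, title TEXT);
CREATE TABLE benefits (job_id INTEGER, type TEXT);
INSERT INTO companies VALUES (1, 'Acme Analytics'), (2, 'Globex Data'), (3, 'Initech');
INSERT INTO posting VALUES
    (10, 1, 'Data Scientist'),
    (11, 1, 'Senior Data Scientist'),
    (20, 2, 'Data Scientist'),
    (30, 3, 'Data Analyst');
INSERT INTO benefits VALUES
    (10, '401(k)'), (10, 'Medical insurance'),
    (11, '401(k)'), (11, 'Vision insurance'),
    (20, '401(k)'), (30, '401(k)');
""")

rows = conn.execute("""
WITH compensation AS (
    -- GROUP BY yields one row per company instead of one row overall
    SELECT p.company_id, COUNT(DISTINCT b.type) AS count_ben
    FROM posting p
    LEFT JOIN benefits b ON p.job_id = b.job_id
    WHERE p.title LIKE '%data scientist%'
    GROUP BY p.company_id
)
SELECT c.name, comp.count_ben
FROM companies c
JOIN compensation comp ON c.company_id = comp.company_id
ORDER BY comp.count_ben DESC, c.name
""").fetchall()

for row in rows:
    print(row)
```

Salary averages are left out of this sketch; computing them in a separate posting-level CTE would avoid weighting each posting by its number of benefits.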


### 5. Job Market Distribution and Location Insights
- Where are the highest paying data science jobs located globally?
- How does the number of job postings vary across different cities or countries?
- Are there any geographical patterns in the industries offering data science jobs?

SQL Skills:
- JOINs (INNER JOIN between job_postings, companies, and company_industries to analyze job distribution).
- Window Functions (for ranking job locations based on salary).
- Aggregations (COUNT, SUM to track job posting counts by location).

- Where are the highest paying data science jobs located globally?


In [12]:
%%sql
SELECT
    title,
    ROUND(AVG(max_salary),2) as max_salary,
    ROUND(AVG(min_salary),2) as min_salary,
    location
FROM posting p
WHERE max_salary IS NOT NULL AND p.title LIKE '%data scientist%'
GROUP BY location
ORDER BY max_salary DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


title,max_salary,min_salary,location
Data Scientist Lead – Telematics (Remote),286130.0,158960.0,"Irving, TX"
Data Scientist Lead – Telematics (Remote),286130.0,158960.0,"Houston, TX"
Principal AI/ML Data Scientist,285550.0,194700.0,"San Francisco, CA"
Data Scientist,250000.0,110000.0,"Melbourne, FL"
"Staff Data Scientist, SSP.",240901.0,168630.0,"San Mateo, CA"

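The query above groups by `location` only, so the `title` shown is taken from an arbitrary posting in each location. The skills list for this section also mentions window functions, which the query does not use. The sketch below, on made-up rows, aggregates per location and then ranks locations with `RANK()`. It assumes SQLite 3.25 or later, the first release with window function support.

```python
import sqlite3

# Toy data (hypothetical values)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (title TEXT, location TEXT, max_salary REAL);
INSERT INTO posting VALUES
    ('Data Scientist', 'Irving, TX', 280000),
    ('Data Scientist', 'Irving, TX', 200000),
    ('Senior Data Scientist', 'San Francisco, CA', 285000),
    ('Data Scientist', 'Melbourne, FL', 250000),
    ('Data Analyst', 'Irving, TX', 90000);
""")

rows = conn.execute("""
WITH by_location AS (
    -- Aggregate first: one row per location
    SELECT location, COUNT(*) AS n_posts, AVG(max_salary) AS avg_max
    FROM posting
    WHERE title LIKE '%data scientist%' AND max_salary IS NOT NULL
    GROUP BY location
)
SELECT location,
       n_posts,
       ROUND(avg_max, 2) AS avg_max_salary,
       RANK() OVER (ORDER BY avg_max DESC) AS salary_rank
FROM by_location
ORDER BY salary_rank
""").fetchall()

for row in rows:
    print(row)
```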

- How does the number of job postings vary across different cities or countries?


In [26]:
%%sql
SELECT
    location,
    COUNT(*) as number_posts,
    country
FROM posting p 
INNER JOIN companies c ON p.company_id = c.company_id
WHERE country IS NOT '0'
GROUP BY country
ORDER BY number_posts DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


location,number_posts,country
"Albany, NY",109995,US
"Atlanta, GA",3000,GB
"Ann Arbor, MI",1399,CA
"Appleton, WI",792,IN
"Austin, TX",581,CH


- Are there any geographical patterns in the industries offering data science jobs?


In [31]:
%%sql
SELECT
    location,
    COUNT(*) as DS_posts,
    country,
    ROUND(AVG(max_salary),2) as max_salary,
    ROUND(AVG(min_salary),2) as min_salary
FROM posting p 
INNER JOIN companies c ON p.company_id = c.company_id
WHERE 
    country IS NOT '0' 
    AND p.title LIKE '%data scientist%'
    AND max_salary IS NOT NULL
GROUP BY country
ORDER BY DS_posts DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


location,DS_posts,country,max_salary,min_salary
"Richland, WA",76,US,156424.65,101949.85
United States,7,GB,230000.0,174285.71
"San Mateo, CA",1,SG,240901.0,168630.0
"Boston, MA",1,JP,170500.0,108500.0


### 6. Industry Growth and Forecasting
- Which industries are experiencing the fastest growth in data science job postings?
- Which industries feature the widest salary ranges for data science roles, and what factors influence these variations?
- What industries offer the most stable opportunities for data professionals?

SQL Skills:
- Window Functions (for ranking industries based on growth rate of job postings).
- CTEs (to handle and simplify complex time-series analysis).
- Aggregations (SUM, COUNT to track industry-specific trends).

- Which industries are experiencing the fastest growth in data science job postings?


In [35]:
%%sql
SELECT
    industry,
    title,
    COUNT(*) as job_posting
FROM posting p
JOIN industry i ON p.company_id = i.company_id
WHERE title LIKE '%data scientist%'
GROUP BY industry, title
ORDER BY job_posting DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
Done.


industry,title,job_posting
IT Services and IT Consulting,Data Scientist,12
Software Development,Data Scientist,8
Staffing and Recruiting,Data Scientist,8
Financial Services,Data Scientist Lead – Telematics (Remote),6
Financial Services,Data Scientist,5


- Which industries feature the widest salary ranges for data science roles, and what factors influence these variations?


In [12]:
%%sql
WITH company_location AS(
    SELECT
            i.industry,
            c.company_size,
            c.country,
            c.company_id
    FROM companies c
    JOIN industry i ON c.company_id = i.company_id
    WHERE country IS NOT NULL
)
SELECT
    industry,
    company_size,
    country,
    title,
    ROUND(AVG(max_salary) - AVG(min_salary),2) as salary_range,
    location,
    CASE
        WHEN description LIKE '%remote%' THEN 'remote'
        ELSE 'onsite'
    END as remote
FROM posting p
JOIN company_location ON p.company_id = company_location.company_id
WHERE title LIKE '%data scientist%'
GROUP BY industry, remote
ORDER BY salary_range DESC
LIMIT 10
    

 * sqlite:///tech_jobmarket.db
Done.


industry,company_size,country,title,salary_range,location,remote
Financial Services,7.0,US,Data Scientist Lead – Telematics (Remote),127170.0,"Houston, TX",remote
Entertainment Providers,7.0,US,Sr Data Scientist,99435.67,"Lake Buena Vista, FL",onsite
Software Development,7.0,US,"Data Scientist, PeopleInsight",85558.33,"Seattle, WA",onsite
Research Services,6.0,US,Senior Data Scientist 3 - Nonproliferation,85000.0,"Richland, WA",onsite
Advertising Services,4.0,SG,"Staff Data Scientist, SSP.",72271.0,"San Mateo, CA",remote
IT Services and IT Consulting,2.0,US,Junior Data Scientist - Python /Modeling,72066.67,"Atlanta, GA",remote
Defense and Space Manufacturing,1.0,US,Data Scientist - Clearance Required with Security Clearance,69800.0,"Quantico, VA",onsite
Research Services,5.0,US,Data Scientist,69000.0,"Chicago, IL",remote
Financial Services,7.0,US,Marketing Analytics - Data Scientist Senior Associate,68307.14,"Columbus, OH",onsite
Spectator Sports,1.0,US,NHL Data Scientist,68000.0,United States,onsite


- What industries offer the most stable opportunities for data professionals?


In [21]:
%%sql
WITH company_industry AS(   
    SELECT
        i.industry,
        c.company_id
    FROM companies c
    JOIN industry i ON c.company_id = i.company_id
),
post_benefits AS (
    SELECT
        posting.title,
        posting.job_id,
        posting.description,
        posting.company_id,
        benefits.type
    FROM posting
    JOIN benefits ON posting.job_id = benefits.job_id
    WHERE posting.title LIKE '%data%'
)
SELECT
    post_benefits.title,
    company_industry.industry,
    COUNT(type) as count_benefits,
    post_benefits.type as benefits,
    CASE
        WHEN post_benefits.description LIKE '%remote' THEN 'remote'
        ELSE 'onsite'
    END as remote
FROM post_benefits
JOIN company_industry ON post_benefits.company_id = company_industry.company_id
GROUP BY  title, benefits
ORDER BY count_benefits DESC
LIMIT 15

 * sqlite:///tech_jobmarket.db
Done.


title,industry,count_benefits,benefits,remote
Data Analyst,Paper and Forest Product Manufacturing,11,401(k),onsite
Data Engineer,Wholesale Building Materials,10,401(k),onsite
Data Scientist,Staffing and Recruiting,7,401(k),onsite
Data Analyst,Paper and Forest Product Manufacturing,6,Disability insurance,onsite
Data Analyst,Staffing and Recruiting,6,Medical insurance,onsite
Data Engineer,Oil and Gas,6,Disability insurance,onsite
Data Engineer,Wholesale Building Materials,6,Medical insurance,onsite
Data Scientist,Staffing and Recruiting,6,Vision insurance,onsite
"Manager, Database Administration- Washington DC",Telecommunications,6,401(k),onsite
"Manager, Database Administration- Washington DC",Telecommunications,6,Disability insurance,onsite

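Two details may explain why every row above is `onsite`. The pattern `'%remote'` has no trailing wildcard, so it only matches descriptions that end with "remote" (the salary-range query earlier uses `'%remote%'`). The `CASE` expression is also evaluated on one arbitrary row per group. A sketch of conditional aggregation on made-up rows counts matching postings per industry and shows both patterns side by side. Matching the word "remote" in free text is itself only a rough heuristic.

```python
import sqlite3

# Toy data (hypothetical values)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE industry (company_id INTEGER, industry TEXT);
CREATE TABLE posting (job_id INTEGER, company_id INTEGER, title TEXT, description TEXT);
INSERT INTO industry VALUES (1, 'Oil and Gas'), (2, 'Staffing and Recruiting');
INSERT INTO posting VALUES
    (1, 1, 'Data Analyst', 'Fully remote role with flexible hours'),
    (2, 1, 'Data Engineer', 'On-site in Austin'),
    (3, 2, 'Data Scientist', 'Hybrid or remote'),
    (4, 2, 'Data Scientist', 'Remote-first team');
""")

# SUM(CASE ...) counts the rows in each group that satisfy the condition
rows = conn.execute("""
SELECT i.industry,
       COUNT(*) AS n_posts,
       SUM(CASE WHEN p.description LIKE '%remote%' THEN 1 ELSE 0 END) AS remote_posts,
       SUM(CASE WHEN p.description LIKE '%remote' THEN 1 ELSE 0 END) AS ends_with_remote
FROM posting p
JOIN industry i ON p.company_id = i.company_id
WHERE p.title LIKE '%data%'
GROUP BY i.industry
ORDER BY i.industry
""").fetchall()

for row in rows:
    print(row)
```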

## Advanced SQL Project: Data Science Job Market Analysis with Multiple JOINS

1. LEFT JOIN: Analyzing Missing Data in Job Postings

Business Question: What industries have job postings that don't require any skills?

In [28]:
%%sql
SELECT
    industry,
    title,
    COUNT(title) as post_count,
    skill_abr
FROM posting p
LEFT JOIN industry i ON p.company_id = i.company_id
LEFT JOIN skills s ON p.job_id = s.job_id
WHERE 
    skill_abr IS NULL
    AND industry IS NOT NULL
GROUP BY industry
ORDER BY post_count DESC
LIMIT 10

 * sqlite:///tech_jobmarket.db
Done.


industry,title,post_count,skill_abr
IT Services and IT Consulting,"Managed File Transfer Specialist (connect direct , NDM) | REMOTE ROLE",363,
Software Development,Customer Success Manager,116,
Hospitals and Health Care,Events & Communications Assistant,64,
Non-profit Organizations,Marketing & Communications – Content Writer Internship,56,
Financial Services,Client Service Associate / Practice Manager,51,
Advertising Services,NPE 2024 Exhibition Event Worker,50,
Higher Education,THEATRE Instructor for Speech Communication & Theatre Department,41,
Business Consulting and Services,🌟🚀🌊 Make Waves with Your Sales Skills! Earn $2500-$3500/Week! 🌊🚀🌟,35,
Construction,Fire Sprinkler Designer,34,
Retail,Customer Service Specialist,32,


2. FULL OUTER JOIN: Combining Salary and Skill Insights

Business Question: How do salaries relate to job postings with missing skills?

In [None]:
%%sql
SELECT
    title,
    ROUND(AVG(max_salary),2) as max_salary,
    ROUND(AVG(min_salary),2) as min_salary,
    COUNT(skill_abr) as skill
FROM posting p
FULL OUTER JOIN skills s ON p.job_id = s.job_id
WHERE s.skill_abr IS NULL
GROUP BY title
ORDER BY max_salary DESC
LIMIT 5

 * sqlite:///tech_jobmarket.db
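
The cell above shows no `Done.` line, which suggests the query did not complete. SQLite only supports `RIGHT` and `FULL OUTER JOIN` from version 3.39.0, so an older bundled SQLite is one possible cause. Because the query keeps only rows where `skill_abr IS NULL`, any `skills` row without a matching posting is filtered out anyway (unless its `skill_abr` is itself NULL), and a `LEFT JOIN` should return the same rows. The sketch below checks the SQLite version and runs the `LEFT JOIN` form on made-up data.

```python
import sqlite3

# Toy data (hypothetical values); job_id 99 in skills has no matching posting
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (job_id INTEGER, title TEXT, max_salary REAL, min_salary REAL);
CREATE TABLE skills (job_id INTEGER, skill_abr TEXT);
INSERT INTO posting VALUES
    (1, 'Data Scientist', 150000, 100000),
    (2, 'Office Manager', 70000, 50000),
    (3, 'Office Manager', 90000, 60000);
INSERT INTO skills VALUES (1, 'IT'), (99, 'ENG');
""")

# RIGHT and FULL OUTER JOIN were added in SQLite 3.39.0
supports_full_join = sqlite3.sqlite_version_info >= (3, 39, 0)
print("SQLite", sqlite3.sqlite_version, "supports FULL OUTER JOIN:", supports_full_join)

QUERY = """
SELECT p.title,
       COUNT(*) AS n_posts,
       ROUND(AVG(p.max_salary), 2) AS avg_max_salary,
       ROUND(AVG(p.min_salary), 2) AS avg_min_salary
FROM posting p
{join} skills s ON p.job_id = s.job_id
WHERE s.skill_abr IS NULL
GROUP BY p.title
ORDER BY avg_max_salary DESC
"""

# Postings with no skill rows, found with a LEFT JOIN
rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
print(rows)

# Where FULL OUTER JOIN is available, confirm it gives the same rows here
if supports_full_join:
    full_rows = conn.execute(QUERY.format(join="FULL OUTER JOIN")).fetchall()
    print("Same result as LEFT JOIN:", full_rows == rows)
```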


3. LEFT JOIN: Tracking Data Science Roles Without Benefits

Business Question: Which data science job postings do not mention any benefits?
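
No query is given for this question. One possible approach is an anti-join: `LEFT JOIN` the benefits table and keep the postings for which no benefit row was found. The sketch below uses made-up rows and the same table and column names as the earlier queries.

```python
import sqlite3

# Toy data (hypothetical values): job 2 has no rows in benefits
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posting (job_id INTEGER, title TEXT);
CREATE TABLE benefits (job_id INTEGER, type TEXT);
INSERT INTO posting VALUES
    (1, 'Data Scientist'),
    (2, 'Senior Data Scientist'),
    (3, 'Data Analyst');
INSERT INTO benefits VALUES (1, '401(k)'), (3, '401(k)');
""")

# b.job_id is NULL exactly when the LEFT JOIN found no benefit for the posting
rows = conn.execute("""
SELECT p.job_id, p.title
FROM posting p
LEFT JOIN benefits b ON p.job_id = b.job_id
WHERE p.title LIKE '%data scientist%'
  AND b.job_id IS NULL
ORDER BY p.job_id
""").fetchall()

print(rows)
```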

4. RIGHT JOIN: Companies Offering No Benefits for Data Roles

Business Question: What companies offer data science jobs with no benefits?
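
No query is given here either. `RIGHT JOIN` needs SQLite 3.39.0 or later, but `benefits RIGHT JOIN posting` returns the same rows as `posting LEFT JOIN benefits`, so the question can be sketched with the tables swapped. The rows and company names below are hypothetical.

```python
import sqlite3

# Toy data (hypothetical companies and values)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER, name TEXT);
CREATE TABLE posting (job_id INTEGER, company_id INTEGER, title TEXT);
CREATE TABLE benefits (job_id INTEGER, type TEXT);
INSERT INTO companies VALUES (1, 'Acme Analytics'), (2, 'Globex Data');
INSERT INTO posting VALUES
    (10, 1, 'Data Scientist'),
    (11, 1, 'Data Scientist'),
    (20, 2, 'Data Scientist'),
    (21, 2, 'Lead Data Scientist');
INSERT INTO benefits VALUES (10, '401(k)');
""")

# posting LEFT JOIN benefits stands in for benefits RIGHT JOIN posting
rows = conn.execute("""
SELECT c.name, COUNT(*) AS posts_without_benefits
FROM posting p
LEFT JOIN benefits b ON p.job_id = b.job_id
JOIN companies c ON p.company_id = c.company_id
WHERE p.title LIKE '%data scientist%'
  AND b.job_id IS NULL
GROUP BY c.name
ORDER BY posts_without_benefits DESC, c.name
""").fetchall()

print(rows)
```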

5. FULL OUTER JOIN: Combining Data on Salary, Benefits, and Skills

Business Question: How do benefits and skills affect salaries in data science roles?