## Introduction

This dataset contains current job listing for the NYC government jobs. It includes job listing from the nyc.gov website and other job posting website like indeed or LinkedIn. Update: Weekly. It has all the revelant information a job seeker would want such as location, job description, number of available positions , and salary range.

[LINK: NYC JOBS LISTINGS](https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t)

### Import Necessary Packages

In [22]:
import pandas as pd
import re

### Load Data

In [3]:
# API endpoint
url = "https://data.cityofnewyork.us/resource/kpav-sd4t.csv"
params = {
    # increases the limit of rows recieved from API because without the API's default limit was 1000 rows
    "$limit": 10000
}
# Make the API request and parse the response into a dataframe
response = requests.get(url, params=params)
df = pd.read_csv(response.url)

In [4]:
df

Unnamed: 0,job_id,agency,posting_type,number_of_positions,business_title,civil_service_title,title_classification,title_code_no,level,job_category,...,additional_information,to_apply,hours_shift,work_location_1,recruitment_contact,residency_requirement,posting_date,post_until,posting_updated,process_date
0,576947,DEPARTMENT OF BUILDINGS,Internal,3,Plan Examiner,PLAN EXAMINER (BLDGS),Competitive-1,22410,00,"Engineering, Architecture, & Planning",...,You must be able to understand and be understo...,For Non-City/External Candidates: Visit the Ex...,,CITYWIDE Unit assignment and work location are...,,New York City Residency is not required for th...,2023-03-01T00:00:00.000,31-MAR-2023,2023-03-01T00:00:00.000,2023-03-07T00:00:00.000
1,564623,OFFICE OF LABOR RELATIONS,Internal,13,Data Processor,COMMUNITY ASSISTANT,Non-Competitive-5,56056,00,Administration & Human Resources,...,PLEASE NOTE: THIS IS A TEMPORARY POSITION UNT...,TO APPLY PLEASE SUBMIT YOUR COVER LETTER AND R...,,,,New York City residency is generally required ...,2022-12-29T00:00:00.000,,2023-03-06T00:00:00.000,2023-03-07T00:00:00.000
2,533544,ADMIN FOR CHILDREN'S SVCS,External,1,"Security Consultant, Horizon",COMMUNITY COORDINATOR,Non-Competitive-5,56058,00,"Public Safety, Inspections, & Enforcement",...,Section 424-A of the New York Social Services ...,Click on the Apply button now.,,,,New York City residency is generally required ...,2022-08-19T00:00:00.000,,2022-08-19T00:00:00.000,2023-03-07T00:00:00.000
3,545994,DEPARTMENT OF TRANSPORTATION,Internal,1,Civil Engineer Level -3,CIVIL ENGINEER,Competitive-1,20215,03,"Engineering, Architecture, & Planning",...,The City of New York is an inclusive equal opp...,Resumes may be submitted electronically using ...,35 Hours,55 Water St Ny Ny,,New York City Residency is not required for th...,2022-09-27T00:00:00.000,,2022-10-04T00:00:00.000,2023-03-07T00:00:00.000
4,538672,HRA/DEPT OF SOCIAL SERVICES,External,1,"DIRECTOR, SNT PROGRAM",EXECUTIVE AGENCY COUNSEL,Non-Competitive-5,95005,M2,Administration & Human Resources Constituent S...,...,**LOAN FORGIVENESS The federal government pro...,Click Apply Now Button,Monday to Friday: 9 - 5.,,,New York City residency is generally required ...,2022-07-15T00:00:00.000,,2022-12-12T00:00:00.000,2023-03-07T00:00:00.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6298,553066,HRA/DEPT OF SOCIAL SERVICES,External,1,IT MOBILE SUPPORT TECHNICIAN - EHVP,TELECOMMUNICATIONS ASSOCIATE (,Competitive-1,20247,02,"Technology, Data & Innovation",...,**LOAN FORGIVENESS The federal government pro...,Click APPLY NOW Button.,Monday â Friday 9am â 5pm or Tuesday - Sat...,"12 W 14Th St., N.Y.",,New York City residency is generally required ...,2022-10-19T00:00:00.000,,2023-01-11T00:00:00.000,2023-03-07T00:00:00.000
6299,574319,DEPARTMENT OF FINANCE,Internal,1,Java Web Developer,COMPUTER SPECIALIST (SOFTWARE),Competitive-1,13632,04,Communications & Intergovernmental Affairs Tec...,...,The City of New York is an inclusive equal opp...,Click the Apply Now button. While we apprecia...,"Unless otherwise indicated, all positions requ...",59 Maiden Lane (Current location but could be ...,,New York City Residency is not required for th...,2023-02-23T00:00:00.000,09-MAR-2023,2023-02-22T00:00:00.000,2023-03-07T00:00:00.000
6300,576201,LAW DEPARTMENT,Internal,2,Help Desk Specialist - IT Division,COMPUTER AIDE,Competitive-1,13620,02,"Technology, Data & Innovation",...,âAs a current or prospective employee of the...,Please click on the Apply Now button.,,,,New York City residency is generally required ...,2023-03-01T00:00:00.000,,2023-03-01T00:00:00.000,2023-03-07T00:00:00.000
6301,559445,POLICE DEPARTMENT,Internal,1,Senior Police Administrative Aide,SENIOR POLICE ADMINISTRATIVE A,Competitive-1,10147,00,Administration & Human Resources,...,This lateral opportunity is open to current Se...,Click the Apply Now button.,"MondayxFriday 0700x1500, 0800x1600, 0900x1700","516 East Tremont Ave., Bronx, NY 10457",,New York City residency is generally required ...,2023-02-21T00:00:00.000,,2023-02-22T00:00:00.000,2023-03-07T00:00:00.000


### Cleaning the Data

1. Here, we are renaming some of the columns name to better indicate the purpose of each column.

In [5]:
df = df.rename(columns = {"full_time_part_time_indicator" : "employment_type",
                          "minimum_qual_requirements" : "minimum_qualifications",
                          "work_location_1" : "work_location_2"})

Created a function called clean_columns takes in a dataframe and list of columns names where it removes any non alpha numeric characters each data cell.

In [6]:
def clean_columns(df, columns) :
    for column in columns :
        df[column] = df[column].str.replace('[^a-zA-Z0-9\s]|\t', '', regex = True)
    return df

In [7]:
# replaces non alpha numeric characters with an empty string
clean_columns(df,['job_description', 'minimum_qualifications', 'preferred_skills', 'additional_information'])

Unnamed: 0,job_id,agency,posting_type,number_of_positions,business_title,civil_service_title,title_classification,title_code_no,level,job_category,...,additional_information,to_apply,hours_shift,work_location_2,recruitment_contact,residency_requirement,posting_date,post_until,posting_updated,process_date
0,576947,DEPARTMENT OF BUILDINGS,Internal,3,Plan Examiner,PLAN EXAMINER (BLDGS),Competitive-1,22410,00,"Engineering, Architecture, & Planning",...,You must be able to understand and be understo...,For Non-City/External Candidates: Visit the Ex...,,CITYWIDE Unit assignment and work location are...,,New York City Residency is not required for th...,2023-03-01T00:00:00.000,31-MAR-2023,2023-03-01T00:00:00.000,2023-03-07T00:00:00.000
1,564623,OFFICE OF LABOR RELATIONS,Internal,13,Data Processor,COMMUNITY ASSISTANT,Non-Competitive-5,56056,00,Administration & Human Resources,...,PLEASE NOTE THIS IS A TEMPORARY POSITION UNTI...,TO APPLY PLEASE SUBMIT YOUR COVER LETTER AND R...,,,,New York City residency is generally required ...,2022-12-29T00:00:00.000,,2023-03-06T00:00:00.000,2023-03-07T00:00:00.000
2,533544,ADMIN FOR CHILDREN'S SVCS,External,1,"Security Consultant, Horizon",COMMUNITY COORDINATOR,Non-Competitive-5,56058,00,"Public Safety, Inspections, & Enforcement",...,Section 424A of the New York Social Services L...,Click on the Apply button now.,,,,New York City residency is generally required ...,2022-08-19T00:00:00.000,,2022-08-19T00:00:00.000,2023-03-07T00:00:00.000
3,545994,DEPARTMENT OF TRANSPORTATION,Internal,1,Civil Engineer Level -3,CIVIL ENGINEER,Competitive-1,20215,03,"Engineering, Architecture, & Planning",...,The City of New York is an inclusive equal opp...,Resumes may be submitted electronically using ...,35 Hours,55 Water St Ny Ny,,New York City Residency is not required for th...,2022-09-27T00:00:00.000,,2022-10-04T00:00:00.000,2023-03-07T00:00:00.000
4,538672,HRA/DEPT OF SOCIAL SERVICES,External,1,"DIRECTOR, SNT PROGRAM",EXECUTIVE AGENCY COUNSEL,Non-Competitive-5,95005,M2,Administration & Human Resources Constituent S...,...,LOAN FORGIVENESS The federal government provi...,Click Apply Now Button,Monday to Friday: 9 - 5.,,,New York City residency is generally required ...,2022-07-15T00:00:00.000,,2022-12-12T00:00:00.000,2023-03-07T00:00:00.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6298,553066,HRA/DEPT OF SOCIAL SERVICES,External,1,IT MOBILE SUPPORT TECHNICIAN - EHVP,TELECOMMUNICATIONS ASSOCIATE (,Competitive-1,20247,02,"Technology, Data & Innovation",...,LOAN FORGIVENESS The federal government provi...,Click APPLY NOW Button.,Monday â Friday 9am â 5pm or Tuesday - Sat...,"12 W 14Th St., N.Y.",,New York City residency is generally required ...,2022-10-19T00:00:00.000,,2023-01-11T00:00:00.000,2023-03-07T00:00:00.000
6299,574319,DEPARTMENT OF FINANCE,Internal,1,Java Web Developer,COMPUTER SPECIALIST (SOFTWARE),Competitive-1,13632,04,Communications & Intergovernmental Affairs Tec...,...,The City of New York is an inclusive equal opp...,Click the Apply Now button. While we apprecia...,"Unless otherwise indicated, all positions requ...",59 Maiden Lane (Current location but could be ...,,New York City Residency is not required for th...,2023-02-23T00:00:00.000,09-MAR-2023,2023-02-22T00:00:00.000,2023-03-07T00:00:00.000
6300,576201,LAW DEPARTMENT,Internal,2,Help Desk Specialist - IT Division,COMPUTER AIDE,Competitive-1,13620,02,"Technology, Data & Innovation",...,As a current or prospective employee of the Ci...,Please click on the Apply Now button.,,,,New York City residency is generally required ...,2023-03-01T00:00:00.000,,2023-03-01T00:00:00.000,2023-03-07T00:00:00.000
6301,559445,POLICE DEPARTMENT,Internal,1,Senior Police Administrative Aide,SENIOR POLICE ADMINISTRATIVE A,Competitive-1,10147,00,Administration & Human Resources,...,This lateral opportunity is open to current Se...,Click the Apply Now button.,"MondayxFriday 0700x1500, 0800x1600, 0900x1700","516 East Tremont Ave., Bronx, NY 10457",,New York City residency is generally required ...,2023-02-21T00:00:00.000,,2023-02-22T00:00:00.000,2023-03-07T00:00:00.000


In [8]:
df[['job_description', 'minimum_qualifications','preferred_skills', 'additional_information']].head()

Unnamed: 0,job_description,minimum_qualifications,preferred_skills,additional_information
0,The Department of Buildings promotes the safet...,1 License or Registration Requirement A valid ...,Knowledge of the NYC Construction Code and Zon...,You must be able to understand and be understo...
1,Under supervision with some latitude for indep...,1 There are no formal education or experience ...,Preference will be given to those candidates w...,PLEASE NOTE THIS IS A TEMPORARY POSITION UNTI...
2,THIS POSITION IS ONLY AVAILABLE TO CANDIDATES ...,1 A baccalaureate degree from an accredited co...,The ideal candidate will demonstrate knowledge...,Section 424A of the New York Social Services L...
3,Serves as Civil Engineer Level 3 in the Bureau...,1 Four 4 years of fulltime satisfactory experi...,Ability to communicate effectively in verbal a...,The City of New York is an inclusive equal opp...
4,The DSS Accountability Office DSS AO is respon...,Admission to the New York State Bar and four y...,Knowledge of Medicaid Supplemental Needs Trus...,LOAN FORGIVENESS The federal government provi...


2. Taking a look at the data type of the columns to make sure they are the right ones for each column
Looks like every columns is the data type is supposed to be.

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6303 entries, 0 to 6302
Data columns (total 30 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   job_id                  6303 non-null   int64  
 1   agency                  6303 non-null   object 
 2   posting_type            6303 non-null   object 
 3   number_of_positions     6303 non-null   int64  
 4   business_title          6303 non-null   object 
 5   civil_service_title     6303 non-null   object 
 6   title_classification    6303 non-null   object 
 7   title_code_no           6303 non-null   object 
 8   level                   6303 non-null   object 
 9   job_category            6302 non-null   object 
 10  employment_type         6070 non-null   object 
 11  career_level            6302 non-null   object 
 12  salary_range_from       6303 non-null   float64
 13  salary_range_to         6303 non-null   float64
 14  salary_frequency        6303 non-null   


3. Fixing missing and invalid values in the data
Next, we would like to know how many NaN for each column. Looking at the columns with nan values are not need for the analysis later.

In [10]:
na_count = df.isna().sum(axis = 0).sort_values(ascending = False)

In [11]:
na_count

recruitment_contact       6303
post_until                4047
hours_shift               3948
work_location_2           3348
additional_information    1409
preferred_skills           896
employment_type            233
minimum_qualifications      44
job_category                 1
career_level                 1
job_id                       0
job_description              0
posting_updated              0
posting_date                 0
residency_requirement        0
to_apply                     0
work_location                0
division_work_unit           0
agency                       0
salary_frequency             0
salary_range_to              0
salary_range_from            0
level                        0
title_code_no                0
title_classification         0
civil_service_title          0
business_title               0
number_of_positions          0
posting_type                 0
process_date                 0
dtype: int64

In [12]:
print(f'There are', sum(na_count), 'NaN values in the dataframe.')

There are 20230 NaN values in the dataframe.


In [13]:
df.describe()

Unnamed: 0,job_id,number_of_positions,salary_range_from,salary_range_to,recruitment_contact
count,6303.0,6303.0,6303.0,6303.0,0.0
mean,550696.390449,2.89735,59001.394266,81073.720371,
std,41038.412639,11.441326,31941.560906,46261.097782,
min,97899.0,1.0,0.0,15.0,
25%,544177.0,1.0,44083.0,58918.0,
50%,561710.0,1.0,58700.0,78074.0,
75%,571537.0,1.0,75121.0,105138.0,
max,577722.0,250.0,231796.0,252165.0,


4. Creating new column mean to calculate the mean salary for each job positiion.

In [34]:
df_data_jobs['mean_salary'] = (df['salary_range_to'] + df['salary_range_from'] / 2)
df_data_jobs[['business_title', 'career_level', 'mean_salary']]

Unnamed: 0,business_title,career_level,mean_salary
331,2023-BWSO-009-GIS Data Maintenance Intern,Student,22.5
1465,2023-BWSO-009-GIS Data Maintenance Intern,Student,22.5
3332,Data Collection Intern,Student,25.0
4725,Data Collection Intern,Student,25.0
2879,"2023-BWS-015-Aquatics Vegetation Analysis, Kin...",Student,25.5
...,...,...,...
4916,"Data Analyst - Housing Unit, Bureau of Hepatit...",Entry-Level,96210.0
747,Analyst Pension Analysis,Entry-Level,116201.0
5909,Analyst Pension Analysis,Entry-Level,116201.0
753,Database Developer,Entry-Level,167500.0


5. Dropping columns not needed for analysis
6. Dropping row that does not contain any of the keyword

In [18]:
data_jobs_keywords = ['data', 'intelligence', 'analysis', 'analytics', 'database', 'machine learning', 'database', 'power BI']

In [23]:
df_data_jobs = df[df['business_title'].str.contains('|'.join(data_jobs_keywords), flags = re.IGNORECASE)]

In [24]:
df_data_jobs

Unnamed: 0,job_id,agency,posting_type,number_of_positions,business_title,civil_service_title,title_classification,title_code_no,level,job_category,...,additional_information,to_apply,hours_shift,work_location_2,recruitment_contact,residency_requirement,posting_date,post_until,posting_updated,process_date
1,564623,OFFICE OF LABOR RELATIONS,Internal,13,Data Processor,COMMUNITY ASSISTANT,Non-Competitive-5,56056,00,Administration & Human Resources,...,PLEASE NOTE THIS IS A TEMPORARY POSITION UNTI...,TO APPLY PLEASE SUBMIT YOUR COVER LETTER AND R...,,,,New York City residency is generally required ...,2022-12-29T00:00:00.000,,2023-03-06T00:00:00.000,2023-03-07T00:00:00.000
10,564623,OFFICE OF LABOR RELATIONS,External,13,Data Processor,COMMUNITY ASSISTANT,Non-Competitive-5,56056,00,Administration & Human Resources,...,PLEASE NOTE THIS IS A TEMPORARY POSITION UNTI...,TO APPLY PLEASE SUBMIT YOUR COVER LETTER AND R...,,,,New York City residency is generally required ...,2022-12-29T00:00:00.000,,2023-03-06T00:00:00.000,2023-03-07T00:00:00.000
30,575854,HOUSING PRESERVATION & DVLPMNT,External,1,"Data & Analytics Manager, Division of Strategi...",CITY RESEARCH SCIENTIST,Non-Competitive-5,21744,02,"Policy, Research & Analysis",...,This position is also open to qualified person...,Apply online,,100 Gold Street,,New York City residency is generally required ...,2023-02-21T00:00:00.000,23-MAR-2023,2023-03-01T00:00:00.000,2023-03-07T00:00:00.000
58,559206,DEPT OF HEALTH/MENTAL HYGIENE,Internal,1,".NET Developer, Bureau of Application Developm...",CYBER SECURITY ANALYST,Competitive-1,13633,02,"Health Technology, Data & Innovation",...,IMPORTANT NOTES TO ALL CANDIDATES Please note...,Apply online with a cover letter to https://a1...,,,,New York City Residency is not required for th...,2023-02-14T00:00:00.000,14-JUN-2023,2023-02-14T00:00:00.000,2023-03-07T00:00:00.000
60,576307,TEACHERS RETIREMENT SYSTEM,Internal,1,Data Analyst,STAFF ANALYST,Competitive-1,12626,01,"Finance, Accounting, & Procurement Technology,...",...,What We Can Offer You TRSNYC offers a compreh...,Apply via ESS (Employee Self-Service). In or...,,,,New York City residency is generally required ...,2023-02-24T00:00:00.000,14-MAR-2023,2023-03-06T00:00:00.000,2023-03-07T00:00:00.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6149,572615,DEPARTMENT FOR THE AGING,Internal,1,Senior Research & Data Analyst,CITY RESEARCH SCIENTIST,Non-Competitive-5,21744,03,Constituent Services & Community Programs Comm...,...,,Please be sure to submit a resume & cover lett...,,,,New York City residency is generally required ...,2023-02-23T00:00:00.000,22-AUG-2023,2023-02-23T00:00:00.000,2023-03-07T00:00:00.000
6166,574383,DEPARTMENT OF CITY PLANNING,Internal,1,Data Engineer,COMPUTER SPECIALIST (SOFTWARE),Competitive-1,13632,02,"Technology, Data & Innovation",...,NOTE ONLY CANDIDATES WHO HAVE A PERMANENT COMP...,Only applicants under consideration will be co...,,,,New York City Residency is not required for th...,2023-02-13T00:00:00.000,,2023-02-13T00:00:00.000,2023-03-07T00:00:00.000
6179,572795,DEPT OF PARKS & RECREATION,External,1,Data Scientist,CITY RESEARCH SCIENTIST,Non-Competitive-5,21744,03,"Technology, Data & Innovation Policy, Research...",...,NOTE References will be required upon request ...,Parks Employees: 1) From a Parks computer: Acc...,,"Arsenal, Manhattan",,"Residency in New York City, Nassau, Orange, Ro...",2023-02-03T00:00:00.000,,2023-02-02T00:00:00.000,2023-03-07T00:00:00.000
6180,540619,FINANCIAL INFO SVCS AGENCY,Internal,1,DATA CENTER ASSOCIATE â SHIFT D,TELECOMMUNICATIONS ASSOCIATE (,Competitive-1,20247,01,"Technology, Data & Innovation",...,P140,External applicants please visit https://a127-...,The 12-hour/day work schedule will be Mon-Wed ...,,,New York City residency is generally required ...,2022-07-15T00:00:00.000,,2022-08-08T00:00:00.000,2023-03-07T00:00:00.000


In [28]:
df_data_jobs = df_data_jobs[['business_title','career_level', 'salary_range_from', 'salary_range_to']]
df_data_jobs

Unnamed: 0,business_title,career_level,salary_range_from,salary_range_to
1,Data Processor,Entry-Level,32520.0,42191.0
10,Data Processor,Entry-Level,32520.0,42191.0
30,"Data & Analytics Manager, Division of Strategi...",Experienced (non-manager),80000.0,86830.0
58,".NET Developer, Bureau of Application Developm...",Experienced (non-manager),78795.0,90625.0
60,Data Analyst,Experienced (non-manager),53797.0,70000.0
...,...,...,...,...
6149,Senior Research & Data Analyst,Experienced (non-manager),84468.0,97138.0
6166,Data Engineer,Experienced (non-manager),85371.0,100000.0
6179,Data Scientist,Experienced (non-manager),95000.0,105000.0
6180,DATA CENTER ASSOCIATE â SHIFT D,Experienced (non-manager),43392.0,58918.0


7. Sorting the new dataframe with career level and bottom salary

In [32]:
df_data_jobs.sort_values(by = ['career_level', 'salary_range_from'], ascending = [False, True])

Unnamed: 0,business_title,career_level,salary_range_from,salary_range_to
331,2023-BWSO-009-GIS Data Maintenance Intern,Student,15.0,15.0
1465,2023-BWSO-009-GIS Data Maintenance Intern,Student,15.0,15.0
3332,Data Collection Intern,Student,15.0,17.5
4725,Data Collection Intern,Student,15.0,17.5
2879,"2023-BWS-015-Aquatics Vegetation Analysis, Kin...",Student,17.0,17.0
...,...,...,...,...
4916,"Data Analyst - Housing Unit, Bureau of Hepatit...",Entry-Level,64140.0,64140.0
747,Analyst Pension Analysis,Entry-Level,65604.0,83399.0
5909,Analyst Pension Analysis,Entry-Level,65604.0,83399.0
753,Database Developer,Entry-Level,75000.0,130000.0


From the table above, we can observe that students/intern make the least followed by entry level then experience non-managers then managerial position and finally executives.

8. Filter the data for only entry level job positions

In [38]:
df_data_jobs[df_data_jobs['career_level'] == 'Entry-Level'].sort_values('mean_salary', ascending = False)

Unnamed: 0,business_title,career_level,salary_range_from,salary_range_to,mean_salary
5871,Database Developer,Entry-Level,75000.0,130000.0,167500.0
753,Database Developer,Entry-Level,75000.0,130000.0,167500.0
5909,Analyst Pension Analysis,Entry-Level,65604.0,83399.0,116201.0
747,Analyst Pension Analysis,Entry-Level,65604.0,83399.0,116201.0
2781,"Data Analyst Open Data, Data Analytics, and Re...",Entry-Level,51550.0,73806.0,99581.0
6043,Analyst Labor Contracts Analysis,Entry-Level,51550.0,73806.0,99581.0
5206,"Data Analyst Open Data, Data Analytics, and Re...",Entry-Level,51550.0,73806.0,99581.0
388,Analyst Labor Contracts Analysis,Entry-Level,51550.0,73806.0,99581.0
446,"Data Analyst - Housing Unit, Bureau of Hepatit...",Entry-Level,64140.0,64140.0,96210.0
4916,"Data Analyst - Housing Unit, Bureau of Hepatit...",Entry-Level,64140.0,64140.0,96210.0


9. Converting all string values in a column to Upper cases

In [41]:
df_data_jobs['business_title'].str.upper()

331             2023-BWSO-009-GIS DATA MAINTENANCE INTERN
1465            2023-BWSO-009-GIS DATA MAINTENANCE INTERN
3332                               DATA COLLECTION INTERN
4725                               DATA COLLECTION INTERN
2879    2023-BWS-015-AQUATICS VEGETATION ANALYSIS, KIN...
                              ...                        
4916    DATA ANALYST - HOUSING UNIT, BUREAU OF HEPATIT...
747                             ANALYST  PENSION ANALYSIS
5909                            ANALYST  PENSION ANALYSIS
753                                    DATABASE DEVELOPER
5871                                   DATABASE DEVELOPER
Name: business_title, Length: 210, dtype: object

10. Checking which columns are numeric

In [64]:
def numeric_columns(df):
    numeric_cols = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            numeric_cols.append(col)
    return numeric_cols

In [67]:
numeric_columns(df_data_jobs)

['salary_range_from', 'salary_range_to', 'mean_salary']

In [68]:
numeric_columns(df)

['job_id',
 'number_of_positions',
 'salary_range_from',
 'salary_range_to',
 'recruitment_contact']

11. Grouping the dataset by one column to calculate summary statistics

In [71]:
import numpy as np
df_data_jobs.groupby('business_title').agg({'mean_salary' : [np.median, np.min, np.max]})

Unnamed: 0_level_0,mean_salary,mean_salary,mean_salary
Unnamed: 0_level_1,median,amin,amax
business_title,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
".NET Developer, Bureau of Application Development and Database Administration",130022.50,130022.50,130022.50
"2023-BWS-015-Aquatics Vegetation Analysis, Kingston",25.50,25.50,25.50
2023-BWSO-009-GIS Data Maintenance Intern,22.50,22.50,22.50
AUDIT & QUALITY ASSURANCE DATABASE ASSOCIATE,118316.50,118316.50,118316.50
Analyst Labor Contracts Analysis,99581.00,99581.00,99581.00
...,...,...,...
Senior Data and Legal Research Analyst,107500.00,107500.00,107500.00
Senior Research & Data Analyst,139372.00,139372.00,139372.00
Supervising Analyst / Unit Head Labor Contract Analysis,145006.50,145006.50,145006.50
TELECOMMUNICATIONS ASSOCIATE (DATA),149751.00,149751.00,149751.00


12. Grouping the data set by two columns

In [87]:
df_median = df_data_jobs.groupby(['business_title', 'career_level']).agg({'mean_salary' : [np.median, np.min, np.max]})
df_median

Unnamed: 0_level_0,Unnamed: 1_level_0,mean_salary,mean_salary,mean_salary
Unnamed: 0_level_1,Unnamed: 1_level_1,median,amin,amax
business_title,career_level,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
".NET Developer, Bureau of Application Development and Database Administration",Experienced (non-manager),130022.50,130022.50,130022.50
"2023-BWS-015-Aquatics Vegetation Analysis, Kingston",Student,25.50,25.50,25.50
2023-BWSO-009-GIS Data Maintenance Intern,Student,22.50,22.50,22.50
AUDIT & QUALITY ASSURANCE DATABASE ASSOCIATE,Experienced (non-manager),118316.50,118316.50,118316.50
Analyst Labor Contracts Analysis,Entry-Level,99581.00,99581.00,99581.00
...,...,...,...,...
Senior Data and Legal Research Analyst,Experienced (non-manager),107500.00,107500.00,107500.00
Senior Research & Data Analyst,Experienced (non-manager),139372.00,139372.00,139372.00
Supervising Analyst / Unit Head Labor Contract Analysis,Manager,145006.50,145006.50,145006.50
TELECOMMUNICATIONS ASSOCIATE (DATA),Experienced (non-manager),149751.00,149751.00,149751.00


In [95]:
df_entry_level = df_data_jobs[df_data_jobs['career_level'] == "Entry-Level"]
df_entry_level['mean_salary'].mean()

82337.17876315789

## Conclusion

There 89 distinct entry-level data-related job listing for NYC government jobs with annual salary mean of $82,000. We notice that data science is not used in only department but rather across different department in different industries which illustrated the versatility and the applicatibilty of data science. Thus, further supports the evidence that data science is a good investment of time and money to learn. Data Science is not locked in any specific industry or field but is in demand in almost every industry.