Data Dictionary for initial dataframe (data2.csv)

* **Job Title**: contains the title of the job position that is being advertised
* **Job Info**: contains information about the job, such as whether it is full-time or part-time, and whether it is a job or an internship
* **Company Name**: contains the name of the company that is offering the job
* **Company Location**: contains the location of the company, including the city and state
* **Employees**: contains information about the size of the company, such as the number of employees
* **Industry**: contains the industry in which the company operates
* **Headquarters**: contains the location of the company's headquarters
* **Application deadline**: contains the date and time by which job applications must be submitted
* **Posted date**: contains the date on which the job was posted
* **Location type**: contains information about the type of location where the job is located, such as whether it is on-site or remote
* **US work authorization**: contains information about whether the company requires candidates to have US work authorization
* **Estimated pay**: contains information about the estimated pay for the job position
* **Seasonal role**: contains information about the working period for seasonal roles
* **Company division**: contains information about the division of the company that is offering the job
* **Work study**: contains information about whether the job is a work study program



Import libraries

In [204]:
import pandas as pd
import numpy as np
import math

pd.set_option('display.max_columns', None)

Read data from the CSV file and create a DataFrame for future analysis

In [205]:
df = pd.read_csv("csv/data.csv")
df_modify = df.copy()

Explore initial dataframe

In [206]:
df_modify.head(10)

Unnamed: 0,Job Title,Job Info,Company Name,Company Location,Employees,Industry,Headquarters,Application deadline,Posted date,Location type,US work authorization,Estimated pay,Seasonal role,Company division,Work study
0,USDA-ARS Postdoctoral Associate Fellowship on ...,Full-Time ∙ Fellowship,USDA Agricultural Research Service (ARS),"Charleston, SC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",3/31/2023 1:00,21-Mar-23,On-site,Not required,,,,
1,USDA-ARS Internship in Prion Diseases,Full-Time ∙ Internship,USDA Agricultural Research Service (ARS),"Ames, IA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",3/31/2023 1:00,21-Mar-23,On-site,Required,,,,
2,USDA-ARS Summer Agricultural Engineer Fellowship,Full-Time ∙ Fellowship,USDA Agricultural Research Service (ARS),"Miami, FL","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",3/31/2023 1:00,21-Mar-23,On-site,Not required,,,,
3,USDA-ARS Research Opportunity on Provenance In...,Full-Time ∙ Internship,USDA Agricultural Research Service (ARS),"Washington, DC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",3/31/2023 1:00,21-Mar-23,On-site,Required,,,,
4,Interventionist-Bethke Elementary School,Part-Time ∙ Job,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",4/4/2023 1:55,21-Mar-23,On-site,Will sponsor a work visa and accepts OPT/CPT,"$48,000 per year",,,
5,USDA-ARS Chemist Research Associate Postdoctor...,Part-Time ∙ Fellowship,USDA Agricultural Research Service (ARS),"Beltsville, MD","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",3/31/2023 1:00,21-Mar-23,On-site,Required,,,,
6,USDA-ARS Postdoctoral Research Opportunity in ...,Full-Time ∙ Fellowship,USDA Agricultural Research Service (ARS),"Athens, GA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",3/27/2023 1:00,20-Mar-23,On-site,Not required,,,,
7,Client Relations Representative,Full-Time ∙ Job,Oak Rising,"Charlotte, NC\nJacksonville, FL",Oct-50,"Advertising, PR & Marketing","Raleigh, NC",4/21/2023 0:55,20-Mar-23,On-site,Accepts OPT/CPT,"$45,000-60,000 per year",,,
8,Teacher Elementary- Music- Timnath Elementary ...,Full-Time ∙ Job,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",4/1/2023 1:55,20-Mar-23,On-site,Will sponsor a work visa and accepts OPT/CPT,"$48,000 per year",,,
9,Basketball Coach,Full-Time ∙ Job,Iroquois Springs,"Rock Hill, NY",100 - 250,Summer Camps/Outdoor Recreation,"Rock Hill, NY",4/20/2023 17:30,20-Mar-23,On-site,Accepts OPT/CPT,"$2,700 per year",(6/12/23 - 8/4/23),,


In [207]:
df_modify.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25906 entries, 0 to 25905
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Job Title              25906 non-null  object
 1   Job Info               25906 non-null  object
 2   Company Name           25906 non-null  object
 3   Company Location       25906 non-null  object
 4   Employees              25906 non-null  object
 5   Industry               25906 non-null  object
 6   Headquarters           25900 non-null  object
 7   Application deadline   25892 non-null  object
 8   Posted date            25892 non-null  object
 9   Location type          25892 non-null  object
 10  US work authorization  25668 non-null  object
 11  Estimated pay          15418 non-null  object
 12  Seasonal role          5982 non-null   object
 13  Company division       4447 non-null   object
 14  Work study             3 non-null      object
dtypes: object(15)
memor

In [208]:
df_modify.shape

(25906, 15)

In [209]:
df_modify.columns

Index(['Job Title', 'Job Info', 'Company Name', 'Company Location',
       'Employees', 'Industry', 'Headquarters', 'Application deadline',
       'Posted date', 'Location type', 'US work authorization',
       'Estimated pay', 'Seasonal role', 'Company division', 'Work study'],
      dtype='object')

***STEP 1: DATA MODIFICATION***

Application deadline / Posted date

In [210]:
# Convert 'Application deadline' column to datetime format
df_modify['Application deadline'] = pd.to_datetime(df_modify['Application deadline'])

# Convert 'Posted date' column to datetime format
df_modify['Posted date'] = pd.to_datetime(df_modify['Posted date'])

# Extract the date and time components from 'Application deadline' column and create new columns
df_modify['Application deadline (date)'] = df_modify['Application deadline'].dt.date
df_modify['Application deadline (time)'] = df_modify['Application deadline'].dt.time

# Calculate the time between 'Posted date' and 'Application deadline (date)' and store it in 'Application Window (weeks)'
df_modify['Application Window (weeks)'] = pd.to_datetime(df_modify['Application deadline (date)']) - df_modify['Posted date']
# Convert the timedelta to whole weeks (floor division of the day count by 7)
df_modify['Application Window (weeks)'] = df_modify['Application Window (weeks)'].dt.days // 7

# Convert 'Application deadline (date)' column to datetime format
df_modify['Application deadline (date)'] = pd.to_datetime(df_modify['Application deadline (date)'])

# Drop 'Application deadline' column since we don't need it anymore
df_modify.drop('Application deadline', axis=1, inplace=True)
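The window calculation above can be checked on a few rows. The following is a minimal sketch, not the notebook's own code: it assumes the two raw formats visible in the preview (`3/31/2023 1:00` and `21-Mar-23`) hold for every row, and the names `sample`, `deadline`, `posted` and `window_days` are illustrative. Passing an explicit `format` with `errors="coerce"` turns unparseable values into `NaT` instead of raising.

```python
import pandas as pd

# Toy rows mirroring the two date formats seen in the raw preview (values are illustrative)
sample = pd.DataFrame({
    "Application deadline": ["3/31/2023 1:00", "4/4/2023 1:55", None],
    "Posted date": ["21-Mar-23", "21-Mar-23", "20-Mar-23"],
})

# Parse each column with an explicit format; values that do not match become NaT
deadline = pd.to_datetime(sample["Application deadline"],
                          format="%m/%d/%Y %H:%M", errors="coerce")
posted = pd.to_datetime(sample["Posted date"], format="%d-%b-%y", errors="coerce")

# normalize() resets the time to midnight, so the difference counts whole calendar days
window_days = (deadline.dt.normalize() - posted).dt.days

# Floor division by 7 gives completed weeks; missing deadlines stay NaN
sample["Application Window (weeks)"] = window_days // 7
print(sample)
```

Because of the floor division, a 10-day window is reported as 1 week, which matches the first rows of the output in this notebook.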

In [211]:
df_modify

Unnamed: 0,Job Title,Job Info,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Estimated pay,Seasonal role,Company division,Work study,Application deadline (date),Application deadline (time),Application Window (weeks)
0,USDA-ARS Postdoctoral Associate Fellowship on ...,Full-Time ∙ Fellowship,USDA Agricultural Research Service (ARS),"Charleston, SC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,,,,,2023-03-31,01:00:00,1.0
1,USDA-ARS Internship in Prion Diseases,Full-Time ∙ Internship,USDA Agricultural Research Service (ARS),"Ames, IA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0
2,USDA-ARS Summer Agricultural Engineer Fellowship,Full-Time ∙ Fellowship,USDA Agricultural Research Service (ARS),"Miami, FL","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,,,,,2023-03-31,01:00:00,1.0
3,USDA-ARS Research Opportunity on Provenance In...,Full-Time ∙ Internship,USDA Agricultural Research Service (ARS),"Washington, DC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0
4,Interventionist-Bethke Elementary School,Part-Time ∙ Job,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-21,On-site,Will sponsor a work visa and accepts OPT/CPT,"$48,000 per year",,,,2023-04-04,01:55:00,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25901,Coaching Assistant - Summer camp,Full-Time ∙ Internship,Camp Skylemar,"Naples, ME",100 - 250,Sports & Leisure,"Naples, ME",2023-03-12,On-site,Accepts OPT/CPT,"$3,000 per year",(6/11/23 - 8/6/23),,,2023-04-22,00:00:00,5.0
25902,Videographer/Content Creator,Full-Time ∙ Internship,Camp Skylemar,"Naples, ME",100 - 250,Sports & Leisure,"Naples, ME",2023-03-12,On-site,Accepts OPT/CPT,"$3,500 per year",(6/13/23 - 8/7/23),,,2023-04-29,00:00:00,6.0
25903,Talent Acquisition Intern - Remote,Full-Time ∙ Internship,CONMED,"Tampa, FL","1,000 - 5,000",Medical Devices,"Largo, FL",2023-03-12,Remote,Required,$15.00-20.00 per hour,,,,2023-03-31,00:00:00,2.0
25904,Camp Counselor - Summer 2023,Full-Time ∙ Internship,Camp Danbee,"Peru, MA","250 - 1,000",Summer Camps/Outdoor Recreation,"101 W Main Rd, Peru, Massachusetts 01235, Unit...",2023-03-12,On-site,Will sponsor a work visa and accepts OPT/CPT,"$1,000-2,000 per month",(6/15/23 - 8/11/23),,,2023-04-30,12:00:00,7.0


In [212]:
# Split the 'Job Info' column into three columns on the " ∙ " separator; the third part (payment status) is present only for some rows
df_modify[['Employment Type', 'Job Type', "Payment Status"]] = df_modify['Job Info'].str.split(' ∙ ', expand=True)

# Drop the original 'Job Info' column
df_modify.drop('Job Info', axis=1, inplace=True)

df_modify.head(10)

Unnamed: 0,Job Title,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Estimated pay,Seasonal role,Company division,Work study,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Payment Status
0,USDA-ARS Postdoctoral Associate Fellowship on ...,USDA Agricultural Research Service (ARS),"Charleston, SC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,
1,USDA-ARS Internship in Prion Diseases,USDA Agricultural Research Service (ARS),"Ames, IA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Internship,
2,USDA-ARS Summer Agricultural Engineer Fellowship,USDA Agricultural Research Service (ARS),"Miami, FL","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,
3,USDA-ARS Research Opportunity on Provenance In...,USDA Agricultural Research Service (ARS),"Washington, DC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Internship,
4,Interventionist-Bethke Elementary School,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-21,On-site,Will sponsor a work visa and accepts OPT/CPT,"$48,000 per year",,,,2023-04-04,01:55:00,2.0,Part-Time,Job,
5,USDA-ARS Chemist Research Associate Postdoctor...,USDA Agricultural Research Service (ARS),"Beltsville, MD","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0,Part-Time,Fellowship,
6,USDA-ARS Postdoctoral Research Opportunity in ...,USDA Agricultural Research Service (ARS),"Athens, GA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-20,On-site,Not required,,,,,2023-03-27,01:00:00,1.0,Full-Time,Fellowship,
7,Client Relations Representative,Oak Rising,"Charlotte, NC\nJacksonville, FL",Oct-50,"Advertising, PR & Marketing","Raleigh, NC",2023-03-20,On-site,Accepts OPT/CPT,"$45,000-60,000 per year",,,,2023-04-21,00:55:00,4.0,Full-Time,Job,
8,Teacher Elementary- Music- Timnath Elementary ...,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-20,On-site,Will sponsor a work visa and accepts OPT/CPT,"$48,000 per year",,,,2023-04-01,01:55:00,1.0,Full-Time,Job,
9,Basketball Coach,Iroquois Springs,"Rock Hill, NY",100 - 250,Summer Camps/Outdoor Recreation,"Rock Hill, NY",2023-03-20,On-site,Accepts OPT/CPT,"$2,700 per year",(6/12/23 - 8/4/23),,,2023-04-20,17:30:00,4.0,Full-Time,Job,
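The three-column assignment in the split above only works when at least one row has a third part and no row has a fourth. A small sketch of that behaviour follows; the three-part value `Part-Time ∙ Internship ∙ Unpaid` is an assumed example of the raw format (the preview rows only show two parts), and `job_info` and `parts` are illustrative names. Passing `n=2` caps the number of splits, so a stray extra separator cannot produce a fourth column.

```python
import pandas as pd

# Two-part and (assumed) three-part 'Job Info' values
job_info = pd.Series([
    "Full-Time ∙ Fellowship",
    "Part-Time ∙ Internship ∙ Unpaid",
])

# n=2 allows at most two splits, i.e. at most three resulting columns
parts = job_info.str.split(" ∙ ", n=2, expand=True)
parts.columns = ["Employment Type", "Job Type", "Payment Status"]

# Rows without a third part get a missing 'Payment Status'
print(parts)
```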


Job Type


In [213]:
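# No payment status and no pay estimate: mark the status as unknown ('-')
df_modify.loc[df_modify['Payment Status'].isnull() & df_modify['Estimated pay'].isnull(), 'Payment Status'] = '-'
# No payment status but a pay estimate is given: treat the position as paid
df_modify.loc[df_modify['Payment Status'].isnull() & df_modify['Estimated pay'].notnull(), 'Payment Status'] = 'Paid'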

df_modify.head(10)

Unnamed: 0,Job Title,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Estimated pay,Seasonal role,Company division,Work study,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Payment Status
0,USDA-ARS Postdoctoral Associate Fellowship on ...,USDA Agricultural Research Service (ARS),"Charleston, SC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-
1,USDA-ARS Internship in Prion Diseases,USDA Agricultural Research Service (ARS),"Ames, IA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-
2,USDA-ARS Summer Agricultural Engineer Fellowship,USDA Agricultural Research Service (ARS),"Miami, FL","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-
3,USDA-ARS Research Opportunity on Provenance In...,USDA Agricultural Research Service (ARS),"Washington, DC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-
4,Interventionist-Bethke Elementary School,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-21,On-site,Will sponsor a work visa and accepts OPT/CPT,"$48,000 per year",,,,2023-04-04,01:55:00,2.0,Part-Time,Job,Paid
5,USDA-ARS Chemist Research Associate Postdoctor...,USDA Agricultural Research Service (ARS),"Beltsville, MD","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,,,,,2023-03-31,01:00:00,1.0,Part-Time,Fellowship,-
6,USDA-ARS Postdoctoral Research Opportunity in ...,USDA Agricultural Research Service (ARS),"Athens, GA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-20,On-site,Not required,,,,,2023-03-27,01:00:00,1.0,Full-Time,Fellowship,-
7,Client Relations Representative,Oak Rising,"Charlotte, NC\nJacksonville, FL",Oct-50,"Advertising, PR & Marketing","Raleigh, NC",2023-03-20,On-site,Accepts OPT/CPT,"$45,000-60,000 per year",,,,2023-04-21,00:55:00,4.0,Full-Time,Job,Paid
8,Teacher Elementary- Music- Timnath Elementary ...,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-20,On-site,Will sponsor a work visa and accepts OPT/CPT,"$48,000 per year",,,,2023-04-01,01:55:00,1.0,Full-Time,Job,Paid
9,Basketball Coach,Iroquois Springs,"Rock Hill, NY",100 - 250,Summer Camps/Outdoor Recreation,"Rock Hill, NY",2023-03-20,On-site,Accepts OPT/CPT,"$2,700 per year",(6/12/23 - 8/4/23),,,2023-04-20,17:30:00,4.0,Full-Time,Job,Paid


Location

In [214]:
# Calculate the number of company locations: locations are separated by "\n", so a row without a separator counts as 1
df_modify['Number of Location'] = df_modify['Company Location'].apply(lambda x: x.count('\n') + 1 if '\n' in x else 1)
# # Replace '\n' with ' ' if 'Company Location' row contains one, else keep row without changes
# df_modify['Company Location'] = df_modify['Company Location'].apply(lambda x: x.replace('\n', ' ') if '\n' in x else x)
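The same count could also be obtained without a Python-level lambda. This is a hedged alternative, assuming every `Company Location` value is a string (the `info()` output shows no nulls in that column); `locations` and `number_of_locations` are illustrative names.

```python
import pandas as pd

# One single-location row and one multi-location row, as in the preview
locations = pd.Series(["Fort Collins, CO", "Charlotte, NC\nJacksonville, FL"])

# Number of locations = number of newline separators + 1
number_of_locations = locations.str.count("\n") + 1

# If the individual locations are needed later, split them into lists instead
location_lists = locations.str.split("\n")
print(number_of_locations.tolist(), location_lists.tolist())
```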

Seasonal Role

In [215]:
# Create 'Role start date' and 'Role end date' columns from 'Seasonal role'
df_modify[['Role start date', 'Role end date']] = df_modify['Seasonal role'].str.split(' - ', expand=True)

# Remove the brackets from 'Role start date' and 'Role end date'
# regex=False treats '(' and ')' as literal characters and avoids the pandas FutureWarning
df_modify['Role start date'] = df_modify['Role start date'].str.replace('(', '', regex=False)
df_modify['Role end date'] = df_modify['Role end date'].str.replace(')', '', regex=False)

# Convert 'Role start date' and 'Role end date' to datetime format
df_modify['Role start date'] = pd.to_datetime(df_modify['Role start date'])
df_modify['Role end date'] = pd.to_datetime(df_modify['Role end date'])

df_modify['Role Duration'] = (df_modify['Role end date'] - df_modify['Role start date']).dt.days
# Convert Role Duration from days to weeks
df_modify['Role Duration (weeks)'] = df_modify['Role Duration'].apply(lambda x: math.ceil(x/7) if pd.notna(x) else None)

# Convert 'Seasonal role' to a boolean flag: True when a date range was given
df_modify['Seasonal role'] = df_modify['Seasonal role'].notna()

df_modify.drop('Role Duration', axis=1, inplace=True)
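The split, bracket removal and date parsing above could also be done in one pass with a regular expression. This is only a sketch: it assumes every non-missing value looks like `(6/12/23 - 8/4/23)`, and `seasonal`, `dates` and `duration_weeks` are illustrative names.

```python
import numpy as np
import pandas as pd

# One seasonal role and one non-seasonal (missing) row
seasonal = pd.Series(["(6/12/23 - 8/4/23)", None])

# Capture the text on either side of " - " inside the brackets
dates = seasonal.str.extract(r"\((?P<start>[^)]+?) - (?P<end>[^)]+)\)")

# Parse both parts as month/day/two-digit-year; missing rows become NaT
start = pd.to_datetime(dates["start"], format="%m/%d/%y")
end = pd.to_datetime(dates["end"], format="%m/%d/%y")

# Round partial weeks up, as the notebook does with math.ceil
duration_weeks = np.ceil((end - start).dt.days / 7)
print(duration_weeks.tolist())
```

June 12 to August 4 is 53 days, so the role is reported as 8 weeks; the missing row stays `NaN`.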




In [216]:
df_modify.tail(15)

Unnamed: 0,Job Title,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Estimated pay,Seasonal role,Company division,Work study,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Payment Status,Number of Location,Role start date,Role end date,Role Duration (weeks)
25891,Resource Team Intern Summer 2023 (Spanish),InReach (formerly AsylumConnect),"Miami, FL\nAtlanta, GA\nAsheville, NC\nNashvil...",10-Jan,Non-Profit - Other,"228 Park Ave S Suite # 90945 New York, NY 1000...",2023-03-11,Remote,Required,,True,,,2023-05-08,17:00:00,8.0,Part-Time,Internship,Unpaid,20,2023-05-22,2023-08-11,12.0
25892,Respite Counselor - Milford - Full Time,"Riverside Community Care, Inc.","Milford, MA","1,000 - 5,000",Healthcare,"270 Bridge Street, Suite 301, Dedham, Massachu...",2023-03-11,On-site,Required,,False,,,2023-08-31,00:00:00,24.0,Full-Time,Job,-,1,NaT,NaT,
25893,Respite Counselor - Milford - Relief,"Riverside Community Care, Inc.","Milford, MA","1,000 - 5,000",Healthcare,"270 Bridge Street, Suite 301, Dedham, Massachu...",2023-03-11,On-site,Required,,False,,,2023-08-31,00:00:00,24.0,Part-Time,Job,-,1,NaT,NaT,
25894,Respite Counselor - Norwood,"Riverside Community Care, Inc.","Norwood, MA","1,000 - 5,000",Healthcare,"270 Bridge Street, Suite 301, Dedham, Massachu...",2023-03-11,On-site,Required,,False,,,2023-08-31,00:00:00,24.0,Full-Time,Job,-,1,NaT,NaT,
25895,Respite Counselor - Norwood - Relief,"Riverside Community Care, Inc.","Norwood, MA","1,000 - 5,000",Healthcare,"270 Bridge Street, Suite 301, Dedham, Massachu...",2023-03-11,On-site,Required,,False,,,2023-08-31,00:00:00,24.0,Part-Time,Job,-,1,NaT,NaT,
25896,Social Media Intern Summer 2023,InReach (formerly AsylumConnect),"Miami, FL\nNew Orleans, LA\nWashington, DC\nIn...",10-Jan,Non-Profit - Other,"228 Park Ave S Suite # 90945 New York, NY 1000...",2023-03-11,Remote,Required,,True,,,2023-05-08,17:00:00,8.0,Part-Time,Internship,Unpaid,12,2023-05-22,2023-08-11,12.0
25897,Taxpayer Advocate Service (TAS) - Case Advocat...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,Remote,Required,"$30,000-40,000 per year",False,,,2023-10-11,02:00:00,30.0,Full-Time,Job,Paid,1,NaT,NaT,
25898,Taxpayer Advocate Service (TAS) - Case Advocat...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,On-site,Required,"$30,000-40,000 per year",False,,,2023-09-27,02:00:00,28.0,Full-Time,Job,Paid,1,NaT,NaT,
25899,Taxpayer Advocate Service (TAS) - Intake Advoc...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,Remote,Required,"$30,000-40,000 per year",False,,,2023-09-26,02:00:00,28.0,Full-Time,Job,Paid,1,NaT,NaT,
25900,Taxpayer Advocate Service (TAS) - Intake Advoc...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,Remote,Required,"$30,000-40,000 per year",False,,,2023-09-20,02:00:00,27.0,Full-Time,Job,Paid,1,NaT,NaT,


Estimated Salary

In [217]:
# Split the 'Estimated pay' column into two separate columns 'Pay rate' and 'Payment Period' based on the separator 'per'
df_modify[['Pay rate','Payment Period']] = df_modify['Estimated pay'].str.split('per', expand=True)

# Strip any leading or trailing white spaces in the 'Payment Period' column
df_modify['Payment Period'] = df_modify['Payment Period'].str.strip()


# Remove the dollar sign '$' from the 'Pay rate' column
# regex=False treats '$' as a literal character and avoids the pandas FutureWarning
df_modify['Pay rate'] = df_modify['Pay rate'].str.replace('$', '', regex=False)

# Extract the minimum and maximum salary from 'Pay rate', stripping whitespace and thousands separators
# A single value (no '-') yields a minimum only; its maximum is filled in below
df_modify['Min salary'] = df_modify['Pay rate'].apply(lambda x: x.split('-')[0].strip().replace(',', '') if isinstance(x, str) else None)
df_modify['Max salary'] = df_modify['Pay rate'].apply(lambda x: x.split('-')[1].strip().replace(',', '') if isinstance(x, str) and '-' in x else None)

# Convert the 'Min salary' and 'Max salary' columns to numeric data types
df_modify['Min salary'] = pd.to_numeric(df_modify['Min salary'])
df_modify['Max salary'] = pd.to_numeric(df_modify['Max salary'])

# Fill missing 'Max salary' values with the corresponding 'Min salary' (a single pay value has no range)
df_modify['Max salary'] = df_modify['Max salary'].fillna(df_modify['Min salary'])

# Drop the original 'Estimated pay' column
df_modify.drop('Estimated pay', axis=1, inplace=True)
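The pay parsing above can also be expressed as a single regular expression. This is a sketch under the assumption that every value follows the pattern `$<min>[-<max>] per <period>`, as the preview rows do; `pay`, `pattern` and `parsed` are illustrative names, and rows that do not match simply come back as missing.

```python
import pandas as pd

# Single value, yearly range, hourly range and a missing value
pay = pd.Series([
    "$48,000 per year",
    "$45,000-60,000 per year",
    "$15.00-20.00 per hour",
    None,
])

# Named groups: minimum, optional maximum, and the payment period
pattern = r"\$(?P<min>[\d,.]+)(?:-(?P<max>[\d,.]+))? per (?P<period>\w+)"
parsed = pay.str.extract(pattern)

# Drop thousands separators and convert to numbers
for col in ["min", "max"]:
    parsed[col] = pd.to_numeric(parsed[col].str.replace(",", "", regex=False))

# A single pay value has no range, so its maximum equals its minimum
parsed["max"] = parsed["max"].fillna(parsed["min"])
print(parsed)
```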



Work study

In [218]:
# Drop the 'Work study' column: only 3 of the 25,906 rows have a value
df_modify.drop('Work study', axis=1, inplace=True)

***EXTRA FEATURES***

In [219]:
df_modify.columns

Index(['Job Title', 'Company Name', 'Company Location', 'Employees',
       'Industry', 'Headquarters', 'Posted date', 'Location type',
       'US work authorization', 'Seasonal role', 'Company division',
       'Application deadline (date)', 'Application deadline (time)',
       'Application Window (weeks)', 'Employment Type', 'Job Type',
       'Payment Status', 'Number of Location', 'Role start date',
       'Role end date', 'Role Duration (weeks)', 'Pay rate', 'Payment Period',
       'Min salary', 'Max salary'],
      dtype='object')

In [220]:
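# Function to convert monthly/hourly pay to yearly pay
# Hourly pay is multiplied by 2,080 hours per year for full-time and 1,040 for part-time roles; monthly pay by 12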
def convert_salary(df):
    for index, row in df.iterrows():
        if row['Employment Type'] == 'Full-Time':
            if row['Payment Period'] == 'hour':
                df.at[index, 'Min salary'] *= 2080
                df.at[index, 'Max salary'] *= 2080
                df.at[index, 'Payment Period'] = 'year'
            elif row['Payment Period'] == 'month':
                df.at[index, 'Min salary'] *= 12
                df.at[index, 'Max salary'] *= 12
                df.at[index, 'Payment Period'] = 'year'
        
        elif row['Employment Type'] == 'Part-Time':
            if row['Payment Period'] == 'hour':
                df.at[index, 'Min salary'] *= 1040
                df.at[index, 'Max salary'] *= 1040
                df.at[index, 'Payment Period'] = 'year'
            elif row['Payment Period'] == 'month':
                df.at[index, 'Min salary'] *= 12
                df.at[index, 'Max salary'] *= 12
                df.at[index, 'Payment Period'] = 'year'
    return df
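Row-by-row iteration with `iterrows()` is slow on roughly 26,000 rows. A vectorized version of the same rules might look like the sketch below. It is not the notebook's function: `convert_salary_vectorized` and `toy` are hypothetical names, and unlike `convert_salary` it returns a modified copy instead of changing the input in place.

```python
import pandas as pd

def convert_salary_vectorized(df):
    """Annualize hourly and monthly pay using boolean masks instead of a row loop."""
    out = df.copy()

    # Same multipliers as convert_salary: 2,080 hours full-time, 1,040 part-time
    hours_per_year = out["Employment Type"].map({"Full-Time": 2080, "Part-Time": 1040})
    known_type = hours_per_year.notna()

    hourly = known_type & (out["Payment Period"] == "hour")
    monthly = known_type & (out["Payment Period"] == "month")

    for col in ["Min salary", "Max salary"]:
        out.loc[hourly, col] = out.loc[hourly, col] * hours_per_year[hourly]
        out.loc[monthly, col] = out.loc[monthly, col] * 12

    # Converted rows are now expressed per year
    out.loc[hourly | monthly, "Payment Period"] = "year"
    return out

# Illustrative rows: full-time hourly, part-time hourly, full-time monthly, already yearly
toy = pd.DataFrame({
    "Employment Type": ["Full-Time", "Part-Time", "Full-Time", "Full-Time"],
    "Payment Period": ["hour", "hour", "month", "year"],
    "Min salary": [15.0, 15.0, 1000.0, 48000.0],
    "Max salary": [20.0, 20.0, 2000.0, 48000.0],
})
annual = convert_salary_vectorized(toy)
print(annual)
```

The first and third toy rows reproduce values that appear in the notebook output (15.00-20.00 per hour becomes 31,200-41,600; 1,000-2,000 per month becomes 12,000-24,000).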

In [221]:
# Apply convert_salary()
df_modify = convert_salary(df_modify)

In [222]:
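# Replace missing salaries with 0 (assignment form avoids in-place modification of a column selection)
df_modify['Min salary'] = df_modify['Min salary'].fillna(0)
df_modify['Max salary'] = df_modify['Max salary'].fillna(0)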

In [223]:
# Drop rows whose annualized minimum or maximum salary exceeds 1,000,000
df_modify.drop(df_modify[df_modify['Min salary'] > 1000000].index, inplace=True)
df_modify.drop(df_modify[df_modify['Max salary'] > 1000000].index, inplace=True)

In [224]:
df_modify

Unnamed: 0,Job Title,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Seasonal role,Company division,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Payment Status,Number of Location,Role start date,Role end date,Role Duration (weeks),Pay rate,Payment Period,Min salary,Max salary
0,USDA-ARS Postdoctoral Associate Fellowship on ...,USDA Agricultural Research Service (ARS),"Charleston, SC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,False,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-,1,NaT,NaT,,,,0.0,0.0
1,USDA-ARS Internship in Prion Diseases,USDA Agricultural Research Service (ARS),"Ames, IA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,False,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-,1,NaT,NaT,,,,0.0,0.0
2,USDA-ARS Summer Agricultural Engineer Fellowship,USDA Agricultural Research Service (ARS),"Miami, FL","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,False,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-,1,NaT,NaT,,,,0.0,0.0
3,USDA-ARS Research Opportunity on Provenance In...,USDA Agricultural Research Service (ARS),"Washington, DC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,False,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-,1,NaT,NaT,,,,0.0,0.0
4,Interventionist-Bethke Elementary School,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-21,On-site,Will sponsor a work visa and accepts OPT/CPT,False,,2023-04-04,01:55:00,2.0,Part-Time,Job,Paid,1,NaT,NaT,,48000,year,48000.0,48000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25901,Coaching Assistant - Summer camp,Camp Skylemar,"Naples, ME",100 - 250,Sports & Leisure,"Naples, ME",2023-03-12,On-site,Accepts OPT/CPT,True,,2023-04-22,00:00:00,5.0,Full-Time,Internship,Paid,1,2023-06-11,2023-08-06,8.0,3000,year,3000.0,3000.0
25902,Videographer/Content Creator,Camp Skylemar,"Naples, ME",100 - 250,Sports & Leisure,"Naples, ME",2023-03-12,On-site,Accepts OPT/CPT,True,,2023-04-29,00:00:00,6.0,Full-Time,Internship,Paid,1,2023-06-13,2023-08-07,8.0,3500,year,3500.0,3500.0
25903,Talent Acquisition Intern - Remote,CONMED,"Tampa, FL","1,000 - 5,000",Medical Devices,"Largo, FL",2023-03-12,Remote,Required,False,,2023-03-31,00:00:00,2.0,Full-Time,Internship,Paid,1,NaT,NaT,,15.00-20.00,year,31200.0,41600.0
25904,Camp Counselor - Summer 2023,Camp Danbee,"Peru, MA","250 - 1,000",Summer Camps/Outdoor Recreation,"101 W Main Rd, Peru, Massachusetts 01235, Unit...",2023-03-12,On-site,Will sponsor a work visa and accepts OPT/CPT,True,,2023-04-30,12:00:00,7.0,Full-Time,Internship,Paid,1,2023-06-15,2023-08-11,9.0,"1,000-2,000",year,12000.0,24000.0


In [225]:
df_modify['Average salary'] = (df_modify['Min salary'] + df_modify['Max salary'])/2

In [226]:
df_modify

Unnamed: 0,Job Title,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Seasonal role,Company division,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Payment Status,Number of Location,Role start date,Role end date,Role Duration (weeks),Pay rate,Payment Period,Min salary,Max salary,Average salary
0,USDA-ARS Postdoctoral Associate Fellowship on ...,USDA Agricultural Research Service (ARS),"Charleston, SC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,False,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-,1,NaT,NaT,,,,0.0,0.0,0.0
1,USDA-ARS Internship in Prion Diseases,USDA Agricultural Research Service (ARS),"Ames, IA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,False,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-,1,NaT,NaT,,,,0.0,0.0,0.0
2,USDA-ARS Summer Agricultural Engineer Fellowship,USDA Agricultural Research Service (ARS),"Miami, FL","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,False,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-,1,NaT,NaT,,,,0.0,0.0,0.0
3,USDA-ARS Research Opportunity on Provenance In...,USDA Agricultural Research Service (ARS),"Washington, DC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,False,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-,1,NaT,NaT,,,,0.0,0.0,0.0
4,Interventionist-Bethke Elementary School,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-21,On-site,Will sponsor a work visa and accepts OPT/CPT,False,,2023-04-04,01:55:00,2.0,Part-Time,Job,Paid,1,NaT,NaT,,48000,year,48000.0,48000.0,48000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25901,Coaching Assistant - Summer camp,Camp Skylemar,"Naples, ME",100 - 250,Sports & Leisure,"Naples, ME",2023-03-12,On-site,Accepts OPT/CPT,True,,2023-04-22,00:00:00,5.0,Full-Time,Internship,Paid,1,2023-06-11,2023-08-06,8.0,3000,year,3000.0,3000.0,3000.0
25902,Videographer/Content Creator,Camp Skylemar,"Naples, ME",100 - 250,Sports & Leisure,"Naples, ME",2023-03-12,On-site,Accepts OPT/CPT,True,,2023-04-29,00:00:00,6.0,Full-Time,Internship,Paid,1,2023-06-13,2023-08-07,8.0,3500,year,3500.0,3500.0,3500.0
25903,Talent Acquisition Intern - Remote,CONMED,"Tampa, FL","1,000 - 5,000",Medical Devices,"Largo, FL",2023-03-12,Remote,Required,False,,2023-03-31,00:00:00,2.0,Full-Time,Internship,Paid,1,NaT,NaT,,15.00-20.00,year,31200.0,41600.0,36400.0
25904,Camp Counselor - Summer 2023,Camp Danbee,"Peru, MA","250 - 1,000",Summer Camps/Outdoor Recreation,"101 W Main Rd, Peru, Massachusetts 01235, Unit...",2023-03-12,On-site,Will sponsor a work visa and accepts OPT/CPT,True,,2023-04-30,12:00:00,7.0,Full-Time,Internship,Paid,1,2023-06-15,2023-08-11,9.0,"1,000-2,000",year,12000.0,24000.0,18000.0


***STEP 2: DATA CLEANING***

Explore modified dataframe

In [227]:
df_modify.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 25795 entries, 0 to 25905
Data columns (total 26 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   Job Title                    25795 non-null  object        
 1   Company Name                 25795 non-null  object        
 2   Company Location             25795 non-null  object        
 3   Employees                    25795 non-null  object        
 4   Industry                     25795 non-null  object        
 5   Headquarters                 25789 non-null  object        
 6   Posted date                  25781 non-null  datetime64[ns]
 7   Location type                25781 non-null  object        
 8   US work authorization        25558 non-null  object        
 9   Seasonal role                25795 non-null  bool          
 10  Company division             4429 non-null   object        
 11  Application deadline (date)  25781 non-nu

In [228]:
df_modify.describe()

Unnamed: 0,Application Window (weeks),Number of Location,Role Duration (weeks),Min salary,Max salary,Average salary
count,25781.0,25795.0,5964.0,25795.0,25795.0,25795.0
mean,35.102556,2.025741,20.498491,26804.306521,32811.601695,29807.954108
std,85.910883,6.915402,45.152773,31228.467262,39336.940188,34758.642733
min,-671.0,1.0,-1394.0,0.0,0.0,0.0
25%,6.0,1.0,10.0,0.0,0.0,0.0
50%,20.0,1.0,12.0,18000.0,20800.0,18720.0
75%,41.0,1.0,18.0,45673.0,58021.2,52000.0
max,4173.0,322.0,1601.0,780000.0,832000.0,780000.0


In [229]:
df_modify.shape

(25795, 26)

In [230]:
df_modify.nunique()

Job Title                      19304
Company Name                    5485
Company Location                4959
Employees                         10
Industry                          74
Headquarters                    2907
Posted date                      966
Location type                      2
US work authorization              5
Seasonal role                      2
Company division                1203
Application deadline (date)      708
Application deadline (time)      312
Application Window (weeks)       382
Employment Type                    3
Job Type                           8
Payment Status                     3
Number of Location                80
Role start date                  446
Role end date                    473
Role Duration (weeks)            140
Pay rate                        3129
Payment Period                     2
Min salary                      2054
Max salary                      2192
Average salary                  2465
dtype: int64

Find duplicates

In [231]:
# Drop duplicate rows (keeps the first occurrence of each)
df_modify.drop_duplicates(inplace=True)

In [232]:
# Confirm that no duplicate rows remain
df_modify.duplicated().value_counts()

False    22734
dtype: int64
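
Dropping duplicates in place does not report how many rows were removed. A minimal sketch on a toy frame (the `toy` data and the variable names are illustrative, not taken from the dataset) that counts the duplicates before dropping them:

```python
import pandas as pd

# Toy frame standing in for df_modify (illustrative data only)
toy = pd.DataFrame({
    "Job Title": ["Intern", "Intern", "Analyst"],
    "Company Name": ["A", "A", "B"],
})

# duplicated() marks every repeat of an earlier row as True
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default
deduped = toy.drop_duplicates()
rows_removed = len(toy) - len(deduped)

print(n_duplicates, rows_removed)
```

Comparing the row count before and after gives the same number as `duplicated().sum()`, which makes the cleaning step easier to audit.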

Employment type

In [233]:
df_modify['Employment Type'].unique()

array(['Full-Time', 'Part-Time', 'Seasonal'], dtype=object)

In [234]:
df_modify[df_modify['Employment Type']=='Seasonal']

Unnamed: 0,Job Title,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Seasonal role,Company division,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Payment Status,Number of Location,Role start date,Role end date,Role Duration (weeks),Pay rate,Payment Period,Min salary,Max salary,Average salary
5638,Summer Camp Counselor,Camp Weequahic,"Lakewood, PA",100 - 250,Summer Camps/Outdoor Recreation,"Lakewood, PA",2022-09-14,On-site,Will sponsor a work visa and accepts OPT/CPT,True,,2023-06-01,12:00:00,37.0,Seasonal,Job,Paid,1,2023-06-14,2023-08-07,8.0,2000,year,2000.0,2000.0,2000.0


In [235]:
df_modify.drop(df_modify[df_modify['Employment Type']=='Seasonal'].index, inplace=True)

In [236]:
df_modify['Employment Type'].unique()

array(['Full-Time', 'Part-Time'], dtype=object)

Payment Period

In [237]:
df_modify['Payment Period'].unique()

array([nan, 'year', 'year or more'], dtype=object)

In [238]:
# Set "Max salary" to NaN where "Payment Period" is "year or more" (the upper bound is open-ended)
df_modify.loc[df_modify['Payment Period'] == 'year or more', 'Max salary'] = np.nan
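
The assignment above only touches `Max salary`. A small sketch on made-up rows (the values are illustrative) shows which cells change; whether `Average salary` should also be revisited for these rows is left open here, since the notebook does not say:

```python
import numpy as np
import pandas as pd

# Toy rows mimicking the salary columns (values are invented)
toy = pd.DataFrame({
    "Payment Period": ["year", "year or more", np.nan],
    "Min salary": [30000.0, 50000.0, 0.0],
    "Max salary": [40000.0, 50000.0, 0.0],
    "Average salary": [35000.0, 50000.0, 0.0],
})

# Boolean mask; rows with a missing Payment Period compare as False
open_ended = toy["Payment Period"] == "year or more"

# Only the masked rows of "Max salary" are overwritten
toy.loc[open_ended, "Max salary"] = np.nan

print(toy)
```

After this step, the "year or more" row keeps its `Min salary` and `Average salary` but has no upper bound.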

Role Duration (weeks)

In [239]:
df_modify.drop(df_modify[df_modify['Role Duration (weeks)'] < 0].index, inplace=True)

Application Window (weeks)

In [240]:
df_modify.drop(df_modify[df_modify['Application Window (weeks)'] < 0].index, inplace=True)
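
Both filters above remove only rows with negative values. Because a comparison with NaN evaluates to False, rows with a missing duration or window are kept. A minimal sketch on toy data (illustrative values) that expresses the same filter as a boolean mask:

```python
import numpy as np
import pandas as pd

# Toy column with a valid value, a negative value, and a missing value
toy = pd.DataFrame({"Role Duration (weeks)": [8.0, -3.0, np.nan, 12.0]})

# NaN < 0 evaluates to False, so missing durations are not flagged
negative = toy["Role Duration (weeks)"] < 0

# Keep everything that is not negative, including NaN rows
cleaned = toy[~negative]

print(cleaned)
```

This matters for this dataset because most postings are not seasonal and therefore have no role duration at all.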

***STEP 3: SAVE NEW (COMPLETE) DATAFRAME TO .csv FILE***

In [241]:
df_modify['ID'] = df_modify.index + 1
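
Because rows were dropped earlier, the index has gaps, so an ID built from `index + 1` is unique but not consecutive. The sketch below (toy data; the `renumbered` frame is a hypothetical alternative, not what the notebook does) contrasts the two options:

```python
import pandas as pd

toy = pd.DataFrame({"Job Title": ["a", "b", "c", "d"]})
toy = toy.drop(index=[1])  # leaves index labels 0, 2, 3

# Approach used in the notebook: IDs inherit the gaps in the index
ids_from_index = (toy.index + 1).tolist()

# Alternative: reset the index first to obtain consecutive IDs
renumbered = toy.reset_index(drop=True)
renumbered["ID"] = renumbered.index + 1

print(ids_from_index, renumbered["ID"].tolist())
```

Keeping the gapped IDs has one advantage: each ID still points back to the row position in the original file.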

In [242]:
df_modify

Unnamed: 0,Job Title,Company Name,Company Location,Employees,Industry,Headquarters,Posted date,Location type,US work authorization,Seasonal role,Company division,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Payment Status,Number of Location,Role start date,Role end date,Role Duration (weeks),Pay rate,Payment Period,Min salary,Max salary,Average salary,ID
0,USDA-ARS Postdoctoral Associate Fellowship on ...,USDA Agricultural Research Service (ARS),"Charleston, SC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,False,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-,1,NaT,NaT,,,,0.0,0.0,0.0,1
1,USDA-ARS Internship in Prion Diseases,USDA Agricultural Research Service (ARS),"Ames, IA","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,False,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-,1,NaT,NaT,,,,0.0,0.0,0.0,2
2,USDA-ARS Summer Agricultural Engineer Fellowship,USDA Agricultural Research Service (ARS),"Miami, FL","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Not required,False,,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,-,1,NaT,NaT,,,,0.0,0.0,0.0,3
3,USDA-ARS Research Opportunity on Provenance In...,USDA Agricultural Research Service (ARS),"Washington, DC","5,000 - 10,000","Government - Local, State & Federal","5601 Sunnyside Avenue Beltsville, Maryland, 20...",2023-03-21,On-site,Required,False,,2023-03-31,01:00:00,1.0,Full-Time,Internship,-,1,NaT,NaT,,,,0.0,0.0,0.0,4
4,Interventionist-Bethke Elementary School,Poudre School District,"Fort Collins, CO","1,000 - 5,000",K-12 Education,"Fort Collins, CO",2023-03-21,On-site,Will sponsor a work visa and accepts OPT/CPT,False,,2023-04-04,01:55:00,2.0,Part-Time,Job,Paid,1,NaT,NaT,,48000,year,48000.0,48000.0,48000.0,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25897,Taxpayer Advocate Service (TAS) - Case Advocat...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,Remote,Required,False,,2023-10-11,02:00:00,30.0,Full-Time,Job,Paid,1,NaT,NaT,,"30,000-40,000",year,30000.0,40000.0,35000.0,25898
25898,Taxpayer Advocate Service (TAS) - Case Advocat...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,On-site,Required,False,,2023-09-27,02:00:00,28.0,Full-Time,Job,Paid,1,NaT,NaT,,"30,000-40,000",year,30000.0,40000.0,35000.0,25899
25899,Taxpayer Advocate Service (TAS) - Intake Advoc...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,Remote,Required,False,,2023-09-26,02:00:00,28.0,Full-Time,Job,Paid,1,NaT,NaT,,"30,000-40,000",year,30000.0,40000.0,35000.0,25900
25900,Taxpayer Advocate Service (TAS) - Intake Advoc...,Taxpayer Advocate Service - Internal Revenue S...,United States,"25,000+","Government - Local, State & Federal","Laguna Niguel, CA",2023-03-11,Remote,Required,False,,2023-09-20,02:00:00,27.0,Full-Time,Job,Paid,1,NaT,NaT,,"30,000-40,000",year,30000.0,40000.0,35000.0,25901


In [243]:
print(df.columns)
print("_________________________")
print(df_modify.columns)
print("_________________________")
print(df_modify.shape)

Index(['Job Title', 'Job Info', 'Company Name', 'Company Location',
       'Employees', 'Industry', 'Headquarters', 'Application deadline',
       'Posted date', 'Location type', 'US work authorization',
       'Estimated pay', 'Seasonal role', 'Company division', 'Work study'],
      dtype='object')
_________________________
Index(['Job Title', 'Company Name', 'Company Location', 'Employees',
       'Industry', 'Headquarters', 'Posted date', 'Location type',
       'US work authorization', 'Seasonal role', 'Company division',
       'Application deadline (date)', 'Application deadline (time)',
       'Application Window (weeks)', 'Employment Type', 'Job Type',
       'Payment Status', 'Number of Location', 'Role start date',
       'Role end date', 'Role Duration (weeks)', 'Pay rate', 'Payment Period',
       'Min salary', 'Max salary', 'Average salary', 'ID'],
      dtype='object')
_________________________
(22725, 27)


In [244]:
df_modify = df_modify[['ID', 
                       
                       'Job Title',
                       'Company Name',
                       'Industry',
                       'Company division',
                       
                       'Posted date',
                       'Application deadline (date)', 
                       'Application deadline (time)',
                       'Application Window (weeks)',

                       'Employment Type',
                       'Job Type',
                       'Location type', 

                       'US work authorization',

                       'Seasonal role',
                       'Role start date', 
                       'Role end date',
                       'Role Duration (weeks)',

                       'Payment Status',
                       'Payment Period',
                       'Pay rate', 
                       'Min salary', 
                       'Max salary',
                       'Average salary',

                       'Headquarters',
                       'Company Location',
                       'Number of Location',

                       'Employees'
                       ]]
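
Reordering with an explicit column list silently drops any column left out of the list. A short sketch (toy frame and the `new_order` name are illustrative) that checks the list against the existing columns before reordering:

```python
import pandas as pd

# Toy frame with three of the columns (values are invented)
toy = pd.DataFrame({"Job Title": ["a"], "Employees": ["10 - 50"], "ID": [1]})
new_order = ["ID", "Job Title", "Employees"]

# Columns that would be lost, and names that do not exist in the frame
missing = set(toy.columns) - set(new_order)
unknown = set(new_order) - set(toy.columns)
assert not missing and not unknown, (missing, unknown)

toy = toy[new_order]
print(list(toy.columns))
```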

Data Dictionary for modified table (modified_data.csv)

* **ID**: A unique identifier for each posting, computed as the original row index plus one.
* **Job Title**: The title of the job position being advertised.
* **Company Name**: The name of the company offering the job.
* **Industry**: The industry in which the company operates.
* **Company division**: The division of the company that is offering the job.
* **Posted date**: The date on which the job was posted.
* **Application deadline (date)**: The date by which job applications must be submitted.
* **Application deadline (time)**: The time of day by which job applications must be submitted.
* **Application Window (weeks)**: The number of weeks between the posting date and the application deadline.
* **Employment Type**: Whether the job is full-time or part-time.
* **Job Type**: The kind of position, such as a job, an internship, or a fellowship.
* **Location type**: Whether the job is on-site or remote.
* **US work authorization**: Whether the company requires candidates to have US work authorization.
* **Seasonal role**: Whether the job is a seasonal role.
* **Role start date**: The date on which the seasonal role starts.
* **Role end date**: The date on which the seasonal role ends.
* **Role Duration (weeks)**: The number of weeks of work, for seasonal roles.
* **Payment Status**: Whether the job is paid or unpaid.
* **Payment Period**: Describes the period for payment in the job posting (year/year or more).
* **Pay rate**: The rate at which the job pays.
* **Min salary**: The minimum salary for the job.
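* **Max salary**: The maximum salary for the job (missing when the payment period is "year or more").
* **Average salary**: The average of the minimum and maximum salary.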
* **Headquarters**: The location of the company's headquarters.
* **Company Location**: The location of the company offering the job.
* **Number of Location**: The number of locations listed in Company Location for the posting.
* **Employees**: The size of the company, given as a range for the number of employees.

In [245]:
df_modify.head(10)

Unnamed: 0,ID,Job Title,Company Name,Industry,Company division,Posted date,Application deadline (date),Application deadline (time),Application Window (weeks),Employment Type,Job Type,Location type,US work authorization,Seasonal role,Role start date,Role end date,Role Duration (weeks),Payment Status,Payment Period,Pay rate,Min salary,Max salary,Average salary,Headquarters,Company Location,Number of Location,Employees
0,1,USDA-ARS Postdoctoral Associate Fellowship on ...,USDA Agricultural Research Service (ARS),"Government - Local, State & Federal",,2023-03-21,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,On-site,Not required,False,NaT,NaT,,-,,,0.0,0.0,0.0,"5601 Sunnyside Avenue Beltsville, Maryland, 20...","Charleston, SC",1,"5,000 - 10,000"
1,2,USDA-ARS Internship in Prion Diseases,USDA Agricultural Research Service (ARS),"Government - Local, State & Federal",,2023-03-21,2023-03-31,01:00:00,1.0,Full-Time,Internship,On-site,Required,False,NaT,NaT,,-,,,0.0,0.0,0.0,"5601 Sunnyside Avenue Beltsville, Maryland, 20...","Ames, IA",1,"5,000 - 10,000"
2,3,USDA-ARS Summer Agricultural Engineer Fellowship,USDA Agricultural Research Service (ARS),"Government - Local, State & Federal",,2023-03-21,2023-03-31,01:00:00,1.0,Full-Time,Fellowship,On-site,Not required,False,NaT,NaT,,-,,,0.0,0.0,0.0,"5601 Sunnyside Avenue Beltsville, Maryland, 20...","Miami, FL",1,"5,000 - 10,000"
3,4,USDA-ARS Research Opportunity on Provenance In...,USDA Agricultural Research Service (ARS),"Government - Local, State & Federal",,2023-03-21,2023-03-31,01:00:00,1.0,Full-Time,Internship,On-site,Required,False,NaT,NaT,,-,,,0.0,0.0,0.0,"5601 Sunnyside Avenue Beltsville, Maryland, 20...","Washington, DC",1,"5,000 - 10,000"
4,5,Interventionist-Bethke Elementary School,Poudre School District,K-12 Education,,2023-03-21,2023-04-04,01:55:00,2.0,Part-Time,Job,On-site,Will sponsor a work visa and accepts OPT/CPT,False,NaT,NaT,,Paid,year,48000,48000.0,48000.0,48000.0,"Fort Collins, CO","Fort Collins, CO",1,"1,000 - 5,000"
5,6,USDA-ARS Chemist Research Associate Postdoctor...,USDA Agricultural Research Service (ARS),"Government - Local, State & Federal",,2023-03-21,2023-03-31,01:00:00,1.0,Part-Time,Fellowship,On-site,Required,False,NaT,NaT,,-,,,0.0,0.0,0.0,"5601 Sunnyside Avenue Beltsville, Maryland, 20...","Beltsville, MD",1,"5,000 - 10,000"
6,7,USDA-ARS Postdoctoral Research Opportunity in ...,USDA Agricultural Research Service (ARS),"Government - Local, State & Federal",,2023-03-20,2023-03-27,01:00:00,1.0,Full-Time,Fellowship,On-site,Not required,False,NaT,NaT,,-,,,0.0,0.0,0.0,"5601 Sunnyside Avenue Beltsville, Maryland, 20...","Athens, GA",1,"5,000 - 10,000"
7,8,Client Relations Representative,Oak Rising,"Advertising, PR & Marketing",,2023-03-20,2023-04-21,00:55:00,4.0,Full-Time,Job,On-site,Accepts OPT/CPT,False,NaT,NaT,,Paid,year,"45,000-60,000",45000.0,60000.0,52500.0,"Raleigh, NC","Charlotte, NC\nJacksonville, FL",2,Oct-50
8,9,Teacher Elementary- Music- Timnath Elementary ...,Poudre School District,K-12 Education,,2023-03-20,2023-04-01,01:55:00,1.0,Full-Time,Job,On-site,Will sponsor a work visa and accepts OPT/CPT,False,NaT,NaT,,Paid,year,48000,48000.0,48000.0,48000.0,"Fort Collins, CO","Fort Collins, CO",1,"1,000 - 5,000"
9,10,Basketball Coach,Iroquois Springs,Summer Camps/Outdoor Recreation,,2023-03-20,2023-04-20,17:30:00,4.0,Full-Time,Job,On-site,Accepts OPT/CPT,True,2023-06-12,2023-08-04,8.0,Paid,year,2700,2700.0,2700.0,2700.0,"Rock Hill, NY","Rock Hill, NY",1,100 - 250


In [246]:
df_modify.to_csv('csv/modified_data.csv', index=False)
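
CSV files do not store dtypes, so the datetime columns come back as plain strings unless they are parsed on load. A minimal round-trip sketch, using an in-memory buffer instead of a file on disk (the toy data is illustrative):

```python
import io

import pandas as pd

# Toy frame with one datetime column (illustrative values)
toy = pd.DataFrame({
    "ID": [1, 2],
    "Posted date": pd.to_datetime(["2023-03-21", "2023-03-20"]),
})

# Write without the index, as in the cell above
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype when reading back
restored = pd.read_csv(buffer, parse_dates=["Posted date"])
print(restored.dtypes)
```

When reloading `modified_data.csv`, the same `parse_dates` argument would presumably list `Posted date`, `Application deadline (date)`, `Role start date`, and `Role end date`.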