In [139]:
import pandas as pd
import numpy as np

In [140]:
df = pd.read_csv('monster_com-job_sample.csv')

### cleaning and transformation steps

- Replace empty strings with NaN.
- The `salary` column holds free-text descriptions rather than clean numeric values.
- In the `job_type` column, handle "Employee" and the other strings that follow "Part Time" and "Full Time".
- The `location` and `organization` columns contain a lot of dirty data (values that have no meaning), but there is no clear pattern among those values.
- In the `sector` column, values containing '/' are legitimate sectors; the rest are dirty data.
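
The clean `location` values seen in this notebook appear to follow a `City, ST` shape with an optional five-digit ZIP code, even though the dirty values share no pattern. One possible approach, sketched here under that assumption, is to whitelist the clean shape and treat everything else as missing. The helper name `split_location` and the exact regular expression are illustrative and not part of the original analysis.

```python
import re

# Assumed clean shape: "City, ST" optionally followed by a 5-digit ZIP code.
_LOCATION = re.compile(
    r"(?P<city>[A-Za-z .'\-]+),\s*(?P<state>[A-Z]{2})(?:\s+(?P<zip>\d{5}))?"
)

def split_location(value):
    """Return (city, state, zip_code) for a clean value, otherwise None."""
    if not isinstance(value, str):
        return None  # NaN and other non-string values
    match = _LOCATION.fullmatch(value.strip())
    if match is None:
        return None  # does not look like "City, ST [ZIP]"
    return match.group('city').strip(), match.group('state'), match.group('zip')

print(split_location('Madison, WI 53702'))
print(split_location('Sr. Process Engineer, Manufacturing'))
```

It could be applied with `df['location'].map(split_location)`. Values such as `Telecommute` come back as `None` under this rule and would need a separate decision.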

In [141]:
df.describe()

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id
count,22000,22000,122,22000,22000,22000,22000,20372,22000,15133,22000,3446,16806,22000
unique,1,1,78,1,1,18744,18759,39,8423,738,22000,1737,163,22000
top,United States of America,US,9/22/2016,No,jobs.monster.com,12N Horizontal Construction Engineers Job Desc...,Monster,Full Time,"Dallas, TX",Healthcare Services,http://jobview.monster.com/it-support-technici...,"40,000.00 - 100,000.00 $ /year",Experienced (Non-Manager),11d599f229a80023d2f40e7c52cd941e
freq,22000,22000,6,22000,22000,104,318,6757,646,1919,1,50,4594,1


In [142]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22000 entries, 0 to 21999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   country          22000 non-null  object
 1   country_code     22000 non-null  object
 2   date_added       122 non-null    object
 3   has_expired      22000 non-null  object
 4   job_board        22000 non-null  object
 5   job_description  22000 non-null  object
 6   job_title        22000 non-null  object
 7   job_type         20372 non-null  object
 8   location         22000 non-null  object
 9   organization     15133 non-null  object
 10  page_url         22000 non-null  object
 11  salary           3446 non-null   object
 12  sector           16806 non-null  object
 13  uniq_id          22000 non-null  object
dtypes: object(14)
memory usage: 2.3+ MB


In [143]:
# replace blank or whitespace-only strings with NaN
# (replace returns a new DataFrame, so the result is assigned back)
df = df.replace(r'^\s*$', np.nan, regex=True)
df

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id
0,United States of America,US,,No,jobs.monster.com,TeamSoft is seeing an IT Support Specialist to...,IT Support Technician Job in Madison,Full Time Employee,"Madison, WI 53702",,http://jobview.monster.com/it-support-technici...,,IT/Software Development,11d599f229a80023d2f40e7c52cd941e
1,United States of America,US,,No,jobs.monster.com,The Wisconsin State Journal is seeking a flexi...,Business Reporter/Editor Job in Madison,Full Time,"Madison, WI 53708",Printing and Publishing,http://jobview.monster.com/business-reporter-e...,,,e4cbb126dabf22159aff90223243ff2a
2,United States of America,US,,No,jobs.monster.com,Report this job About the Job DePuy Synthes Co...,Johnson & Johnson Family of Companies Job Appl...,"Full Time, Employee",DePuy Synthes Companies is a member of Johnson...,Personal and Household Services,http://jobview.monster.com/senior-training-lea...,,,839106b353877fa3d896ffb9c1fe01c0
3,United States of America,US,,No,jobs.monster.com,Why Join Altec? If you’re considering a career...,Engineer - Quality Job in Dixon,Full Time,"Dixon, CA",Altec Industries,http://jobview.monster.com/engineer-quality-jo...,,Experienced (Non-Manager),58435fcab804439efdcaa7ecca0fd783
4,United States of America,US,,No,jobs.monster.com,Position ID# 76162 # Positions 1 State CT C...,Shift Supervisor - Part-Time Job in Camphill,Full Time Employee,"Camphill, PA",Retail,http://jobview.monster.com/shift-supervisor-pa...,,Project/Program Management,64d0272dc8496abfd9523a8df63c184c
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21995,United States of America,US,,No,jobs.monster.com,This is a major premier Cincinnati based finan...,Assistant Vice President - Controller Job in C...,Full Time,"Cincinnati, OH",,http://jobview.monster.com/Assistant-Vice-Pres...,"120,000.00 - 160,000.00 $ /yearbonus",,a80bc8cc3a90c17eef418963803bc640
21996,United States of America,US,,No,jobs.monster.com,Luxury homebuilder in Cincinnati seeking multi...,Accountant Job in Cincinnati,Full Time,"Cincinnati, OH 45236",Construction - Residential & Commercial/Office,http://jobview.monster.com/Accountant-Job-Cinc...,"45,000.00 - 60,000.00 $ /year",Manager (Manager/Supervisor of Staff),419a3714be2b30a10f628de207d041de
21997,United States of America,US,,No,jobs.monster.com,RE: Adobe AEM- Client - Loca...,AEM/CQ developer Job in Chicago,Full Time,"Chicago, IL 60602",,http://jobview.monster.com/AEM-CQ5-developer-J...,,,5a590350b73b2cec46b05750a208e345
21998,United States of America,US,,No,jobs.monster.com,Jernberg Industries was established in 1937 an...,Electrician - Experienced Forging Electrician ...,Full Time Employee,"Chicago, IL 60609","Jernberg Industries, Inc.",http://jobview.monster.com/Electrician-Experie...,25.00 - 28.00 $ /hour,Installation/Maintenance/Repair,40161cf61c283af9dc2b0a62947a5f1b


In [144]:
df['salary'].to_frame()

Unnamed: 0,salary
0,
1,
2,
3,
4,
...,...
21995,"120,000.00 - 160,000.00 $ /yearbonus"
21996,"45,000.00 - 60,000.00 $ /year"
21997,
21998,25.00 - 28.00 $ /hour


In [145]:
df['salary'].unique()

array([nan, '9.00 - 13.00 $ /hour', '80,000.00 - 95,000.00 $ /year', ...,
       '$80,000.00+ /year', '120,000.00 - 160,000.00 $ /yearbonus',
       '40,000.00 - 46,000.00 $ /year+ annual bonus (up to 15% of salary)'],
      dtype=object)

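- The salary strings come in several formats (ranges, open-ended amounts such as `$80,000.00+ /year`, and trailing bonus notes), so a way to extract the numeric values still needs to be identified.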
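
One way this extraction might be approached is sketched below with the standard `re` module. It assumes every usable salary string names its pay period after a slash (`/hour`, `/year`); the `month`, `week` and `day` units are guesses, since only hourly and yearly values are visible in the output. The function name `parse_salary` is hypothetical.

```python
import re

_AMOUNT = re.compile(r'\d[\d,]*(?:\.\d+)?')
# hour/year appear in the data; month/week/day are assumed for completeness.
_UNIT = re.compile(r'/\s*(hour|year|month|week|day)', re.IGNORECASE)

def parse_salary(text):
    """Return (low, high, unit) or None when the text cannot be parsed."""
    if not isinstance(text, str):
        return None  # NaN or other non-string value
    unit_match = _UNIT.search(text)
    if unit_match is None:
        return None
    # Only read amounts before the unit, so notes such as
    # "(up to 15% of salary)" are not mistaken for a salary figure.
    head = text[:unit_match.start()]
    amounts = [float(a.replace(',', '')) for a in _AMOUNT.findall(head)]
    if not amounts:
        return None
    low = amounts[0]
    high = amounts[1] if len(amounts) > 1 else None  # open-ended, e.g. "$80,000.00+"
    return low, high, unit_match.group(1).lower()

print(parse_salary('40,000.00 - 46,000.00 $ /year+ annual bonus (up to 15% of salary)'))
```

Applied with `df['salary'].map(parse_salary)`, this would leave hourly and yearly figures on different scales, so a further conversion step would still be needed before comparing them.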

In [146]:
df.job_type.to_frame()

Unnamed: 0,job_type
0,Full Time Employee
1,Full Time
2,"Full Time, Employee"
3,Full Time
4,Full Time Employee
...,...
21995,Full Time
21996,Full Time
21997,Full Time
21998,Full Time Employee


In [147]:
df.job_type.unique()

array(['Full Time Employee', 'Full Time', 'Full Time, Employee',
       'Part Time Employee', nan, 'Full Time Temporary/Contract/Project',
       'Full Time , Employee', 'Full Time, Temporary/Contract/Project',
       'Employee', 'Part Time', 'Part Time, Employee', 'Full Time Intern',
       'Temporary/Contract/Project', 'Full Time / Employee',
       'Full Time , Temporary/Contract/Project',
       'Part Time, Temporary/Contract/Project', 'Full Time/ Employee',
       'Per Diem, Employee', 'Job Type Full Time Employee', 'Per Diem',
       'Full Time\xa0', 'Part Time Intern', 'Per Diem Employee',
       'Part Time/ Temporary/Contract/Project',
       'Part Time Temporary/Contract/Project', 'Exempt',
       'Part Time , Temporary/Contract/Project', 'Full Time\xa0 Employee',
       'Part Time Seasonal', 'Part Time , Employee', 'Job Type Employee',
       'Job Type Full Time Temporary/Contract/Project',
       'Full Time / > Employee', 'Part Time\xa0',
       'Per Diem, Temporary/Contract

In [148]:
df.sector.to_frame()

Unnamed: 0,sector
0,IT/Software Development
1,
2,
3,Experienced (Non-Manager)
4,Project/Program Management
...,...
21995,
21996,Manager (Manager/Supervisor of Staff)
21997,
21998,Installation/Maintenance/Repair


In [149]:
df.sector.unique()

array(['IT/Software Development', nan, 'Experienced (Non-Manager)',
       'Project/Program Management', 'Customer Support/Client Care',
       'Entry Level', 'Building Construction/Skilled Trades',
       'Civil & Structural EngineeringGeneral/Other: Engineering',
       'Installation/Maintenance/Repair', 'Business/Strategic Management',
       'Accounting/Finance/Insurance', 'General/Other: Engineering',
       'Engineering', 'Editorial/Writing', 'Medical/Health',
       'Marketing/Product', 'Manager (Manager/Supervisor of Staff)',
       'Administrative/Clerical', 'Student (Undergraduate/Graduate)',
       'Biotech/R&D/Science', 'Logistics/Transportation',
       'General/Other: Customer Support/Client Care',
       'Sales/Retail/Business Development', 'Education/Training', 'Other',
       'General/Other: Installation/Maintenance/RepairVehicle Repair and Maintenance',
       'General/Other: IT/Software Development',
       'Brand/Product MarketingGeneral/Other: Marketing/ProductProd

In [150]:
df.location.unique()

array(['Madison, WI 53702', 'Madison, WI 53708',
       "DePuy Synthes Companies is a member of Johnson & Johnson's Family of Companies, and is recruiting for a Senior Training Leader located in Raynham, MA.DePuy Synthes Companies of Johnson & Johnson is the largest, most innovative and comprehensive orthopedic and neurological business in the world. DePuy Synthes Companies offer an unparalleled breadth and depth of products, services and programs in the areas of joint reconstruction, trauma, spine, sports medicine, neurological, craniomaxillofacial, power tools and biomaterials. Building on the legacy and strengths of two great companies, more agile and better equipped to meet the needs of today’s evolving health care environment. With a focus on activating insights to develop innovative, comprehensive solutions, we are inspired to advance patient care in greater ways than either company could accomplish on its own.Position Overview• The Training Leader leads the site training functio

In [151]:
df.organization.unique()

array([nan, 'Printing and Publishing', 'Personal and Household Services',
       'Altec Industries', 'Retail', 'Computer/IT Services',
       'Computer Software',
       'Hotels and Lodging Personal and Household Services', 'Insurance',
       'Business Services - Other', 'Education',
       'Construction - Industrial Facilities and InfrastructureConstruction - Residential & Commercial/Office',
       'Accounting and Auditing Services', 'Legal Services',
       'Construction - Residential & Commercial/Office',
       'Engineering Services', 'AllComputer SoftwareComputer/IT Services',
       'Healthcare Services', 'Chicago, IL', 'Manufacturing - Other',
       'Oklahoma City, OK', 'Aerospace and Defense', 'San Francisco, CA',
       'Advertising and PR ServicesManagement Consulting ServicesBusiness Services - Other',
       'Other/Not Classified',
       'RetailAdvertising and PR ServicesBusiness Services - Other',
       'All', 'Electronics, Components, and Semiconductor Mfg',
       '

In [152]:
df.organization.value_counts().to_frame()

Unnamed: 0,organization
Healthcare Services,1919
All,1158
Retail,1081
Other/Not Classified,1048
Manufacturing - Other,885
...,...
Printing and Publishing Architectural and Design ServicesConstruction - Residential & Commercial/Office,1
Construction - Industrial Facilities and InfrastructureManufacturing - Other,1
Automotive and Parts Mfg; Manufacturing - Other; Retail,1
"Tifco Industries, Inc.",1


In [153]:
df.location.value_counts().to_frame()

Unnamed: 0,location
"Dallas, TX",646
"Cincinnati, OH",384
"Columbus, OH",345
"Camphill, PA",333
"Dallas, TX 75201",304
...,...
"Audubon, PA 19403",1
"Goshen, IN",1
"Cheektowaga, NY 14225",1
"Northport, AL 35476",1


In [154]:
df.sector.value_counts().to_frame()

Unnamed: 0,sector
Experienced (Non-Manager),4594
Medical/Health,1254
Entry Level,1172
Sales/Retail/Business Development,938
Manager (Manager/Supervisor of Staff),900
...,...
"Strong work ethic with a drive to exceed expectations Excellent people person: Work well with others in a fast paced, commission sales environment Open to learning and growing independently and from feedback Work well under high pressure with a positive attitude and contagious enthusiasm Detail oriented and highly organized Sense of Design: Able to distinguish and put together various styles, colors, and textures Associates Degree or higher, preferred not required Basic mathematical and computer skills Ability to read, write, and speak in English (a secondary language is a plus) Previous experience in retail or a related field preferred (home improvement, furniture, electronics, customer service, home furnishings, hospitality, flooring, sales, retail, etc.) Benefits Great Pay and Exceptional Training Individual Career Growth Opportunities Holiday and Vacation Pay Medical, Dental, and Vision Insurance HSA Employer Contributions 401(k) Plan with employer matching Company Paid Basic Life Insurance and Accidental Death & Dismemberment Company Paid Long Term Disability The Tile Shop is an Equal Opportunity Employer. *CB",1
General/Other: Sales/Business DevelopmentRetail/Counter Sales and CashierStore/Branch Management,1
Systems Analysis - ITWeb/UI/UX Design,1
Optical,1


In [155]:
df.job_type.value_counts().to_frame()

Unnamed: 0,job_type
Full Time,6757
Full Time Employee,6617
"Full Time, Employee",3360
Full Time Temporary/Contract/Project,1062
"Full Time, Temporary/Contract/Project",533
"Full Time , Employee",406
Part Time Employee,382
Part Time,329
"Part Time, Employee",196
Temporary/Contract/Project,193


In [156]:
df[df['job_title'].str.contains(r"Part Time", na=False)]

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id
302,United States of America,US,,No,jobs.monster.com,DescriptionSurprise. Delight. Engage. Amaze. A...,Macy's **Seasonal Holiday Retail Sales** Part ...,Full Time Employee,"Dallas, TX",,http://jobview.monster.com/macy's-seasonal-hol...,,Sales/Retail/Business Development,73b3e1df114ed0c299b598b2cc5300a8
346,United States of America,US,,No,jobs.monster.com,Summary RETAIL RESET MERCHANDISER KROGER PART ...,Retail Reset Merchandiser Kroger Part Time Job...,Part Time Employee,"Las Vegas, NV",Retail,http://jobview.monster.com/retail-reset-mercha...,,Sales/Retail/Business Development,bb9b4a486a27e61549d6ed5717fd049f
360,United States of America,US,,No,jobs.monster.com,Summary Event Specialist Part Time Sales Are...,Event Specialist Part Time Sales Job in Henderson,Part Time Employee,"Henderson, NV",Retail,http://jobview.monster.com/event-specialist-pa...,,Sales/Retail/Business Development,bf8f633eba8f4074ced84bbff62689f7
396,United States of America,US,,No,jobs.monster.com,Summary Event Specialist Part Time Sales Are...,Event Specialist Part Time Sales Job in Maryville,Part Time Employee,"Maryville, TN",Retail,http://jobview.monster.com/event-specialist-pa...,,Sales/Retail/Business Development,ef68e541ac6ff587ae93f56a21761650
402,United States of America,US,,No,jobs.monster.com,DescriptionOverview: The Seasonal Line Cook’s ...,**Seasonal Holiday Lakeshore Grill Restaurant ...,Full Time Employee,"Minnetonka, MN",,http://jobview.monster.com/seasonal-holiday-la...,,Food Services/Hospitality,a0925a5e49c8e5c8586ebd9f862ed3ed
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21888,United States of America,US,,No,jobs.monster.com,Summary Retail PROJECT Merchandiser PART Time ...,Retail Project Merchandiser Part Time Job in C...,Part Time Employee,"Cincinnati, OH",Retail,http://jobview.monster.com/Retail-Project-Merc...,,Sales/Retail/Business Development,449e22b55a3e07d9113f80d4b051cb18
21908,United States of America,US,,No,jobs.monster.com,Summary Brand Ambassador Sales part Time We ...,Brand Ambassador Sales Part Time Job in Cincin...,Part Time Employee,"Cincinnati, OH",Retail,http://jobview.monster.com/Brand-Ambassador-Sa...,,Sales/Retail/Business Development,2ddaad3fd7125f48553893bc63327537
21935,United States of America,US,,No,jobs.monster.com,Summary RETAIL RESET MERCHANDISER KROGER PART ...,Retail Reset Merchandiser Kroger Part Time Job...,Part Time Employee,"Cincinnati, OH",Retail,http://jobview.monster.com/Retail-Reset-Mercha...,,Sales/Retail/Business Development,87f4da8e44941f49b1b1a74075534f01
21945,United States of America,US,,No,jobs.monster.com,Summary Brand Ambassador Sales part Time We ...,Brand Ambassador Sales Part Time Job in Bellevue,Part Time Employee,"Bellevue, KY",Retail,http://jobview.monster.com/Brand-Ambassador-Sa...,,Sales/Retail/Business Development,a924773a37f74d04ff5c2760de29ae3b


### `job_type` column cleaning

In [157]:
# rows whose job_type contains "Part Time" (933 rows returned)
df[df['job_type'].str.contains(r'Part\sTime', na=False)]

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id
8,United States of America,US,,No,jobs.monster.com,"Part-Time, 4:30 pm - 9:30 pm, Mon - Fri Brookd...",Housekeeper Job in Austin,Part Time Employee,"Austin, TX 78746",Hotels and Lodging Personal and Household Serv...,http://jobview.monster.com/housekeeper-job-aus...,,Customer Support/Client Care,a6a2b5e825b8ce1c3b517adb2497c5ed
59,United States of America,US,,No,jobs.monster.com,Maden Technologies is accepting applications f...,Technical Writer - Documentation Job in Ogden,Part Time Employee,"Ogden, UT",,http://jobview.monster.com/Technical-Writer-Do...,,Editorial/Writing,93b4ec0f424b532fb0beebba0c51bef9
69,United States of America,US,,No,jobs.monster.com,Do you have a passion for the beauty industry?...,Salon Coordinator Job in Fargo,Part Time Employee,"Fargo, ND",Other/Not Classified,http://jobview.monster.com/Salon-Coordinator-J...,,Customer Support/Client Care,2f9e2894fbff2bddd6864444b9a81e75
102,United States of America,US,,No,jobs.monster.com,The Customer Service Representative (CSR) is r...,Customer Service Representative (Part-Time) / ...,Part Time Employee,"Lansing, MI 48911","Travel, Transportation and Tourism",http://jobview.monster.com/Customer-Service-Re...,,Customer Support/Client Care,ed3a0136b8c7b78144c18124d9d3263f
136,United States of America,US,,No,jobs.monster.com,Mechanical or Vessel Engineer will perform det...,Contract Mechanical/Vessel Engineer Job in Tel...,Part Time,Telecommute,Marine Mfg & Services,http://jobview.monster.com/Contract-Mechanical...,45.00 - 50.00 $ /hour,Experienced (Non-Manager),c8d4793a041484771cca9c30491ac025
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21945,United States of America,US,,No,jobs.monster.com,Summary Brand Ambassador Sales part Time We ...,Brand Ambassador Sales Part Time Job in Bellevue,Part Time Employee,"Bellevue, KY",Retail,http://jobview.monster.com/Brand-Ambassador-Sa...,,Sales/Retail/Business Development,a924773a37f74d04ff5c2760de29ae3b
21961,United States of America,US,,No,jobs.monster.com,"LIFEGUARD in Cincinnati, Ohio*TriHealth Fitnes...",Lifeguard Job in Cincinnati,Part Time Employee,"Cincinnati, OH 45242",Healthcare Services,http://jobview.monster.com/Lifeguard-Job-Cinci...,,General/Other: Medical/Health,494cf4540d4cc36805a215c13829c761
21978,United States of America,US,,No,jobs.monster.com,We're excited to be hosting our Saturday hirin...,Cincinnati Job Fair Job in Cincinnati,Part Time,"Cincinnati, OH 45226",Other/Not Classified,http://jobview.monster.com/Cincinnati-Job-Fair...,,,b75903b5d271fc41fcd3af45e1de82c4
21984,United States of America,US,,No,jobs.monster.com,Now Hiring Data Entry Reps and Mail Clerks- ca...,Now Hiring Data Entry Reps and Mail Clerks- ca...,Part Time,"Cincinnati, OH 45214",Other/Not Classified,http://jobview.monster.com/Now-Hiring-Data-Ent...,,,ddaa5f770c7634cd0c897af3d78e50f5


In [158]:
df[df['job_type'].str.contains(r'Part[\w,\s]*Time', na=False)]

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id
8,United States of America,US,,No,jobs.monster.com,"Part-Time, 4:30 pm - 9:30 pm, Mon - Fri Brookd...",Housekeeper Job in Austin,Part Time Employee,"Austin, TX 78746",Hotels and Lodging Personal and Household Serv...,http://jobview.monster.com/housekeeper-job-aus...,,Customer Support/Client Care,a6a2b5e825b8ce1c3b517adb2497c5ed
59,United States of America,US,,No,jobs.monster.com,Maden Technologies is accepting applications f...,Technical Writer - Documentation Job in Ogden,Part Time Employee,"Ogden, UT",,http://jobview.monster.com/Technical-Writer-Do...,,Editorial/Writing,93b4ec0f424b532fb0beebba0c51bef9
69,United States of America,US,,No,jobs.monster.com,Do you have a passion for the beauty industry?...,Salon Coordinator Job in Fargo,Part Time Employee,"Fargo, ND",Other/Not Classified,http://jobview.monster.com/Salon-Coordinator-J...,,Customer Support/Client Care,2f9e2894fbff2bddd6864444b9a81e75
102,United States of America,US,,No,jobs.monster.com,The Customer Service Representative (CSR) is r...,Customer Service Representative (Part-Time) / ...,Part Time Employee,"Lansing, MI 48911","Travel, Transportation and Tourism",http://jobview.monster.com/Customer-Service-Re...,,Customer Support/Client Care,ed3a0136b8c7b78144c18124d9d3263f
136,United States of America,US,,No,jobs.monster.com,Mechanical or Vessel Engineer will perform det...,Contract Mechanical/Vessel Engineer Job in Tel...,Part Time,Telecommute,Marine Mfg & Services,http://jobview.monster.com/Contract-Mechanical...,45.00 - 50.00 $ /hour,Experienced (Non-Manager),c8d4793a041484771cca9c30491ac025
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21945,United States of America,US,,No,jobs.monster.com,Summary Brand Ambassador Sales part Time We ...,Brand Ambassador Sales Part Time Job in Bellevue,Part Time Employee,"Bellevue, KY",Retail,http://jobview.monster.com/Brand-Ambassador-Sa...,,Sales/Retail/Business Development,a924773a37f74d04ff5c2760de29ae3b
21961,United States of America,US,,No,jobs.monster.com,"LIFEGUARD in Cincinnati, Ohio*TriHealth Fitnes...",Lifeguard Job in Cincinnati,Part Time Employee,"Cincinnati, OH 45242",Healthcare Services,http://jobview.monster.com/Lifeguard-Job-Cinci...,,General/Other: Medical/Health,494cf4540d4cc36805a215c13829c761
21978,United States of America,US,,No,jobs.monster.com,We're excited to be hosting our Saturday hirin...,Cincinnati Job Fair Job in Cincinnati,Part Time,"Cincinnati, OH 45226",Other/Not Classified,http://jobview.monster.com/Cincinnati-Job-Fair...,,,b75903b5d271fc41fcd3af45e1de82c4
21984,United States of America,US,,No,jobs.monster.com,Now Hiring Data Entry Reps and Mail Clerks- ca...,Now Hiring Data Entry Reps and Mail Clerks- ca...,Part Time,"Cincinnati, OH 45214",Other/Not Classified,http://jobview.monster.com/Now-Hiring-Data-Ent...,,,ddaa5f770c7634cd0c897af3d78e50f5


In [159]:
# match the phrase "Full Time"; square brackets would make this a character class
df[df['job_type'].str.contains(r'Full\sTime', na=False)]

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id
0,United States of America,US,,No,jobs.monster.com,TeamSoft is seeing an IT Support Specialist to...,IT Support Technician Job in Madison,Full Time Employee,"Madison, WI 53702",,http://jobview.monster.com/it-support-technici...,,IT/Software Development,11d599f229a80023d2f40e7c52cd941e
1,United States of America,US,,No,jobs.monster.com,The Wisconsin State Journal is seeking a flexi...,Business Reporter/Editor Job in Madison,Full Time,"Madison, WI 53708",Printing and Publishing,http://jobview.monster.com/business-reporter-e...,,,e4cbb126dabf22159aff90223243ff2a
2,United States of America,US,,No,jobs.monster.com,Report this job About the Job DePuy Synthes Co...,Johnson & Johnson Family of Companies Job Appl...,"Full Time, Employee",DePuy Synthes Companies is a member of Johnson...,Personal and Household Services,http://jobview.monster.com/senior-training-lea...,,,839106b353877fa3d896ffb9c1fe01c0
3,United States of America,US,,No,jobs.monster.com,Why Join Altec? If you’re considering a career...,Engineer - Quality Job in Dixon,Full Time,"Dixon, CA",Altec Industries,http://jobview.monster.com/engineer-quality-jo...,,Experienced (Non-Manager),58435fcab804439efdcaa7ecca0fd783
4,United States of America,US,,No,jobs.monster.com,Position ID# 76162 # Positions 1 State CT C...,Shift Supervisor - Part-Time Job in Camphill,Full Time Employee,"Camphill, PA",Retail,http://jobview.monster.com/shift-supervisor-pa...,,Project/Program Management,64d0272dc8496abfd9523a8df63c184c
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21995,United States of America,US,,No,jobs.monster.com,This is a major premier Cincinnati based finan...,Assistant Vice President - Controller Job in C...,Full Time,"Cincinnati, OH",,http://jobview.monster.com/Assistant-Vice-Pres...,"120,000.00 - 160,000.00 $ /yearbonus",,a80bc8cc3a90c17eef418963803bc640
21996,United States of America,US,,No,jobs.monster.com,Luxury homebuilder in Cincinnati seeking multi...,Accountant Job in Cincinnati,Full Time,"Cincinnati, OH 45236",Construction - Residential & Commercial/Office,http://jobview.monster.com/Accountant-Job-Cinc...,"45,000.00 - 60,000.00 $ /year",Manager (Manager/Supervisor of Staff),419a3714be2b30a10f628de207d041de
21997,United States of America,US,,No,jobs.monster.com,RE: Adobe AEM- Client - Loca...,AEM/CQ developer Job in Chicago,Full Time,"Chicago, IL 60602",,http://jobview.monster.com/AEM-CQ5-developer-J...,,,5a590350b73b2cec46b05750a208e345
21998,United States of America,US,,No,jobs.monster.com,Jernberg Industries was established in 1937 an...,Electrician - Experienced Forging Electrician ...,Full Time Employee,"Chicago, IL 60609","Jernberg Industries, Inc.",http://jobview.monster.com/Electrician-Experie...,25.00 - 28.00 $ /hour,Installation/Maintenance/Repair,40161cf61c283af9dc2b0a62947a5f1b
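
Square brackets in a regular expression define a character class, which matches any single listed character rather than the phrase as a whole. The plain-`re` sketch below shows how `[Full\sTime]` differs from `Full\sTime`; the sample values are taken from the `job_type` output earlier in the notebook and the variable names are illustrative. `Series.str.contains` uses the same search semantics when `regex=True`.

```python
import re

values = ['Full Time Employee', 'Part Time', 'Per Diem', 'Employee', 'Exempt']

# Character class: matches any ONE of F, u, l, whitespace, T, i, m, e.
char_class = re.compile(r'[Full\sTime]')
# Literal phrase: "Full", one whitespace character, "Time".
literal = re.compile(r'Full\sTime')

char_class_hits = [v for v in values if char_class.search(v)]
literal_hits = [v for v in values if literal.search(v)]

print(char_class_hits)  # every value contains at least one listed character
print(literal_hits)     # only the value that contains the phrase
```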


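### Working dataframe for processing

In [160]:
# NOTE: this binds df2 to the same object as df; it does not copy the data.
# Columns added to df2 below therefore also appear in df.
# Use df.copy() if an independent copy is wanted.
df2 = df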
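
Plain assignment gives a DataFrame a second name, whereas `DataFrame.copy()` creates an independent object. The toy frame and variable names below are made up purely to illustrate the difference.

```python
import pandas as pd

original = pd.DataFrame({'job_type': ['Full Time', 'Part Time']})

alias = original               # second name for the same object
independent = original.copy()  # separate object with its own data

alias['flag'] = 1              # also visible through `original`
independent['other'] = 2       # not visible through `original`

print(list(original.columns))
print(alias is original, independent is original)
```

This is why the new `job_type(time)` and `job_type(Period)` columns show up when `df` is displayed later in the notebook.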

### Column: `job_type`
   - Full Time: 18970
   - Part Time: 993
   - Per Diem: 67
   - NaN: 1628

In [161]:
df.job_type.value_counts().to_frame()

Unnamed: 0,job_type
Full Time,6757
Full Time Employee,6617
"Full Time, Employee",3360
Full Time Temporary/Contract/Project,1062
"Full Time, Temporary/Contract/Project",533
"Full Time , Employee",406
Part Time Employee,382
Part Time,329
"Part Time, Employee",196
Temporary/Contract/Project,193


In [162]:
df2['job_type'].str.contains(r'Part[\w,\s]*Time', na=False).value_counts()

False    21007
True       993
Name: job_type, dtype: int64

In [163]:
df2['job_type'].str.contains(r'Full[\w,\s]*Time', na=False).value_counts()

True     18970
False     3030
Name: job_type, dtype: int64

In [164]:
df2['job_type'].isna().sum()

1628

### `job_type(time)` : contains Full Time, Part Time, Per Diem
   

In [165]:
df2['job_type(time)'] = df2['job_type'].str.extract(r'(Full[\w,\s]*Time|Part[\w,\s]*Time|Per[\w,\s]*Diem)')

In [166]:
df2['job_type'].str.extract(r'(Employee|Intern|Temporary)')

Unnamed: 0,0
0,Employee
1,
2,Employee
3,
4,Employee
...,...
21995,
21996,
21997,
21998,Employee


### `job_type(Period)` : contains Employee, Intern, Temporary


In [167]:
df2['job_type(Period)'] = df2['job_type'].str.extract(r'(Employee|Intern|Temporary)')
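
The raw `job_type` values contain non-breaking spaces (`\xa0`) and inconsistent separators, so a possible variant of the two extractions is to normalise whitespace first and then pull out both parts in one helper. This is a sketch with a hypothetical function name, `split_job_type`, and slightly stricter patterns than the ones used in the cells here.

```python
import re

_TIME = re.compile(r'(?:Full|Part) Time|Per Diem')
_PERIOD = re.compile(r'Employee|Intern|Temporary')

def split_job_type(value):
    """Return (time, period); either part is None when it is absent."""
    if not isinstance(value, str):
        return None, None  # NaN stays missing in both parts
    # Collapse runs of whitespace, including non-breaking spaces, to one space.
    cleaned = ' '.join(value.replace('\xa0', ' ').split())
    time_match = _TIME.search(cleaned)
    period_match = _PERIOD.search(cleaned)
    return (
        time_match.group(0) if time_match else None,
        period_match.group(0) if period_match else None,
    )

print(split_job_type('Full Time\xa0 Employee'))
print(split_job_type('Job Type Full Time Temporary/Contract/Project'))
```

Values such as `Exempt` or `Part Time Seasonal` are only partly covered by these patterns, so the list of alternatives may need extending after checking `value_counts()`.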

In [168]:
df2[df2['job_type(Period)'] == 'Intern']

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id,job_type(time),job_type(Period)
178,United States of America,US,,No,jobs.monster.com,Springer Nature is a major new force in sc...,Editorial Internship,Full Time Intern,"New York, NY 10013",Printing and Publishing,http://jobview.monster.com/Editorial-Internshi...,,Editorial/Writing,0405aaa87456bb11407f780bd5688132,Full Time,Intern
2005,United States of America,US,,No,jobs.monster.com,Summer Intern RAMJob Function: SalesPrimary Lo...,Summer Marketing Internship Job in Columbus,Part Time Intern,"Columbus, OH",,http://jobview.monster.com/summer-marketing-in...,0.00 - 15.00 $ /hour,Marketing/Product,c83fe0d1f4cf4fa947910e7bd3de3563,Part Time,Intern
2422,United States of America,US,,No,jobs.monster.com,Position Summary The HR Co-Op will be a part o...,HR Internship/Co-Op Job in Algona,Full Time Intern,"Algona, WA 98001",Aerospace and Defense,http://jobview.monster.com/hr-internship-co-op...,,Human Resources,d99d552308bec9a14c50e8b1e9777941,Full Time,Intern
11347,United States of America,US,,No,jobs.monster.com,Essential Functions: Monitors the IT Suppor...,Information Systems Intern Job in Columbus,"Part Time, Intern","Columbus, OH",Energy and Utilities,http://jobview.monster.com/information-systems...,,IT/Software Development,ad9799eee47b0d7a95d806216860c8a5,Part Time,Intern
17079,United States of America,US,,No,jobs.monster.com,"Our Mission: “At BBRG, we strive to be the Be...",Digital Marketing Intern Job in Columbus,Part Time Intern,"Columbus, OH 43085",Restaurant/Food Services,http://jobview.monster.com/Digital-Marketing-I...,,Entry Level,25eb1c21490394f643a041500ddbd754,Part Time,Intern
17959,United States of America,US,2/29/2016,No,jobs.monster.com,Lehigh Hanson is seeking an Intern – Informati...,Front Office IT Intern Job in Irving,Full Time Intern,"Irving, TX 75062",Metals and MineralsConstruction - Industrial F...,http://jobview.monster.com/Front-Office-IT-Int...,,IT/Software Development,de1ab6c78ef1b828239b79200d308801,Full Time,Intern
19916,United States of America,US,,No,jobs.monster.com,"Overview: Aesculap, Inc., a B. Braun company, ...",Marketing Internship Job in Center Valley,Full Time Intern,"Center Valley, PA",Medical Devices and Supplies,http://jobview.monster.com/Marketing-Internshi...,,Medical/Health,8418b7d1135191bc41bd50db104f4697,Full Time,Intern
20137,United States of America,US,,No,jobs.monster.com,Brand Ambassador Internship As a Brand Ambassa...,Brand Ambassador Internship Job in Wauwatosa,Full Time Intern,"Wauwatosa, WI 53222",Advertising and PR Services,http://jobview.monster.com/brand-ambassador-in...,,Marketing/Product,889d830fcba9c1392090c2cf79b821cd,Full Time,Intern


In [169]:
df2.head() 

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id,job_type(time),job_type(Period)
0,United States of America,US,,No,jobs.monster.com,TeamSoft is seeing an IT Support Specialist to...,IT Support Technician Job in Madison,Full Time Employee,"Madison, WI 53702",,http://jobview.monster.com/it-support-technici...,,IT/Software Development,11d599f229a80023d2f40e7c52cd941e,Full Time,Employee
1,United States of America,US,,No,jobs.monster.com,The Wisconsin State Journal is seeking a flexi...,Business Reporter/Editor Job in Madison,Full Time,"Madison, WI 53708",Printing and Publishing,http://jobview.monster.com/business-reporter-e...,,,e4cbb126dabf22159aff90223243ff2a,Full Time,
2,United States of America,US,,No,jobs.monster.com,Report this job About the Job DePuy Synthes Co...,Johnson & Johnson Family of Companies Job Appl...,"Full Time, Employee",DePuy Synthes Companies is a member of Johnson...,Personal and Household Services,http://jobview.monster.com/senior-training-lea...,,,839106b353877fa3d896ffb9c1fe01c0,Full Time,Employee
3,United States of America,US,,No,jobs.monster.com,Why Join Altec? If you’re considering a career...,Engineer - Quality Job in Dixon,Full Time,"Dixon, CA",Altec Industries,http://jobview.monster.com/engineer-quality-jo...,,Experienced (Non-Manager),58435fcab804439efdcaa7ecca0fd783,Full Time,
4,United States of America,US,,No,jobs.monster.com,Position ID# 76162 # Positions 1 State CT C...,Shift Supervisor - Part-Time Job in Camphill,Full Time Employee,"Camphill, PA",Retail,http://jobview.monster.com/shift-supervisor-pa...,,Project/Program Management,64d0272dc8496abfd9523a8df63c184c,Full Time,Employee


### `job_title` cleaning

In [170]:
df['job_title'].value_counts().to_frame()

Unnamed: 0,job_title
Monster,318
Shift Supervisor Job in Camphill,256
RN,70
Shift Supervisor - Part-Time Job in Camphill,56
Manager,50
...,...
Sales Job in Knoxville,1
Merchandiser Job in Madison,1
Retail Project Merchandiser Part Time Job in Madison,1
Senior Windows Systems Administrator* Job in Madison,1


In [171]:
df['job_title'].unique()

array(['IT Support Technician Job in Madison',
       'Business Reporter/Editor Job in Madison',
       'Johnson & Johnson Family of Companies Job Application for Senior Training Leader | Monster.com var MONS_LOG_VARS = {"JobID":',
       ..., 'AEM/CQ developer Job in Chicago',
       'Electrician - Experienced Forging Electrician Job in Chicago',
       'Contract Administrator Job in Cincinnati'], dtype=object)

In [172]:
# in a few places '-' is used as a separator

In [173]:
df[~df['job_title'].str.contains('job', case=False)].head(20)

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id,job_type(time),job_type(Period)
28,United States of America,US,,No,jobs.monster.com,Resident Care Specialist DescriptionSummaryPro...,Resident Care Specialist,Full Time Employee,"Houston, TX",Healthcare Services,http://jobview.monster.com/Resident-Care-Speci...,,,bc88dfe50b64e2925866e4af6330fc55,Full Time,Employee
29,United States of America,US,,No,jobs.monster.com,Experis is working with a Pharmaceutical start...,Sr. Process Engineer,Full Time Employee,"Sr. Process Engineer, Manufacturing","Chicago, IL",http://jobview.monster.com/Sr-Process-Engineer...,"70,000.00 - 100,000.00 $ /year",Engineering,779bb4c9bf038b7fb775134736d36fd4,Full Time,Employee
36,United States of America,US,,No,jobs.monster.com,"POSITION TITLE: RF System Technician, Field Se...",RF System Technician,Full Time Temporary/Contract/Project,"RF System Technician, Field Service","Oklahoma City, OK",http://jobview.monster.com/RF-System-Technicia...,"68,000.00 - 72,000.00 $ /year",Engineering,ceb44cca7cd280adcb0c84c20f3c6c21,Full Time,Temporary
50,United States of America,US,,No,jobs.monster.com,Palgrave Macmillan is a global academic publis...,Commissioning Editor,Full Time Employee,"New York, NY 10013",Printing and Publishing,http://jobview.monster.com/Commissioning-Edito...,,Editorial/Writing,b15ef853a3ffac7425c48460a32bd0d3,Full Time,Employee
57,United States of America,US,,No,jobs.monster.com,Enterprise Products Partners L.P. is one of ...,Specialist,Full Time,"Houston, TX 77001",All,http://jobview.monster.com/Specialist-Technica...,,Experienced (Non-Manager),54b3bb1da783d46a5b4db82995ff357f,Full Time,
79,United States of America,US,,No,jobs.monster.com,Commercial Construction Superintendent: Denver...,Commercial Construction Superintendent: Denver,Full Time,"Denver, CO 80002",Construction - Residential & Commercial/Office,http://jobview.monster.com/Commercial-Construc...,,Manager (Manager/Supervisor of Staff),a309736fa99da07b91f267531a24f924,Full Time,
81,United States of America,US,,No,jobs.monster.com,City: Houston State:Texas Postal/Zip Code: 770...,Asphalt Quality Control Manager -Houston,Full Time Employee,"Houston, TX",,http://jobview.monster.com/Asphalt-Quality-Con...,,,932b0176fba2f2878d1d73fef2574e44,Full Time,Employee
96,United States of America,US,,No,jobs.monster.com,Financial Advisor Northwestern MutualOur finan...,Financial Advisor / Financial Sales Representa...,Full Time Employee,"Houston, TX",Banking,http://jobview.monster.com/Financial-Advisor-F...,,Sales/Retail/Business Development,e52d5e30ecdf97dd64a44f45bfc2032b,Full Time,Employee
98,United States of America,US,,No,jobs.monster.com,"*****THIS POSITION IS IN Decatur, IL. PLEASE A...",Mechanical Engineer Relocate to Decatur,Full Time Employee,"Bloomington, IL",,http://jobview.monster.com/Mechanical-Engineer...,,General/Other: Engineering,1670bab124bbe35985e761939c4cbb6b,Full Time,Employee
119,United States of America,US,,No,jobs.monster.com,"Engineer, Certification (Certifier, Gas Contro...",Engineer,,"Cleveland, OH",Energy and Utilities,http://jobview.monster.com/Engineer-Certificat...,,Experienced (Non-Manager),0ea5e804d194e41db779ee119c70e497,,


In [174]:
list(df['page_url'])

['http://jobview.monster.com/it-support-technician-job-madison-wi-us-167855963.aspx?mescoid=1500134001001&jobPosition=20',
 'http://jobview.monster.com/business-reporter-editor-job-madison-wi-us-167830105.aspx?mescoid=2700437001001&jobPosition=7',
 'http://jobview.monster.com/senior-training-leader-job-raynham-ma-us-177958678.aspx?mescoid=1100055001001&jobPosition=4',
 'http://jobview.monster.com/engineer-quality-job-dixon-ca-us-178572650.aspx?mescoid=1700192001001&jobPosition=18',
 'http://jobview.monster.com/shift-supervisor-part-time-job-camphill-pa-us-179185125.aspx?mescoid=5100910001001&jobPosition=16',
 'http://jobview.monster.com/construction-pm-charlottesville-job-charlottesville-va-us-179234939.aspx?mescoid=1100034001001&jobPosition=8',
 'http://jobview.monster.com/principal-qa-engineer-java-selenium-job-san-francisco-ca-us-179110452.aspx?mescoid=1700192001001&jobPosition=3',
 'http://jobview.monster.com/mailroom-clerk-job-austin-tx-us-179169192.aspx?mescoid=4300757001001&jobP

In [175]:
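# df['page_url'].apply(lambda x: re.findall(r'(.com\/)(.*)-(Job?)',x)).to_frame()
# The inline flag (?i) has to come first: Python 3.11+ rejects a global flag in the middle of a pattern.
# Column 1 of the extract result is the second capture group, i.e. the title slug.
df2['job_title_clean'] = df['page_url'].str.extract(r'(?i)(\.com\/)(.*)-(Job?)')[1]
# match.group(2)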
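
The extraction above can be tried on a few URLs in isolation. The sketch below is only an illustration: the sample URLs are hypothetical (modelled on the `page_url` values listed earlier), and the simplified pattern with a single capture group is an assumption, not the pattern used in this notebook.

```python
import pandas as pd

# Hypothetical sample URLs, modelled on the page_url values shown above.
urls = pd.Series([
    'http://jobview.monster.com/it-support-technician-job-madison-wi-us-167855963.aspx?mescoid=1500134001001&jobPosition=20',
    'http://jobview.monster.com/Store-Managers-Job-Cincinnati-OH-US-000000000.aspx',
    'http://job-openings.monster.com/monster/abc123',
])

# (?i) at the start makes the whole pattern case-insensitive.
# The single capture group holds everything between ".com/" and the last "-job".
slug = urls.str.extract(r'(?i)\.com/(.*)-job', expand=False)

# Turn the slug into a readable title; URLs without "-job" stay NaN.
title = slug.str.replace('-', ' ', regex=False).str.title()
```

URLs served from `job-openings.monster.com` carry no `-job` marker, which is consistent with the rows that end up with a null `job_title_clean`.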


In [176]:
df2.isna().sum()

country                 0
country_code            0
date_added          21878
has_expired             0
job_board               0
job_description         0
job_title               0
job_type             1628
location                0
organization         6867
page_url                0
salary              18554
sector               5194
uniq_id                 0
job_type(time)       1970
job_type(Period)     8771
job_title_clean        41
dtype: int64

In [177]:
df2[df2['job_title_clean'].isnull()]

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id,job_type(time),job_type(Period),job_title_clean
303,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Shaffer Trucking, IncGretna, LA",,http://job-openings.monster.com/monster/d687c9...,,,8b76720fe70edd9e7f48ff2742def465,,,
305,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Boyd Bros.KENNER, LA",,http://job-openings.monster.com/monster/459c91...,,,a43b5a31136e96afc06c3899e20043e4,,,
307,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Rooms To GoMetairie, LA",,http://job-openings.monster.com/monster/8284d2...,,,c6a5a61d9cd05fca25b93705b9e66335,,,
357,United States of America,US,,No,jobs.monster.com,Position Description: Southwest Region – Distr...,Job Details,,Location(s):,,http://job-openings.monster.com/monster/8abedf...,,,8aff3609fc8a21867325672efa45140a,,,
392,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Mondelez InternationalKnoxville, TN",,http://job-openings.monster.com/monster/1584bf...,,,97e230c230ab4487572b29971a217130,,,
4208,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Wellspan HealthYork, PA",,http://job-openings.monster.com/monster/aa27a2...,,,e7a4aedac9dcade8c42579b70768d399,,,
4459,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Premier TransportationChattanooga, TN",,http://job-openings.monster.com/monster/7e8794...,,,36ac85b751e19f6debca4e94f34f5539,,,
4461,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Ozark Motor LinesChattanooga, TN",,http://job-openings.monster.com/monster/e77c51...,,,85a9d37cc9bef1560215d43ddb885cc8,,,
4462,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"Summitt Trucking, LLCChattanooga, TN",,http://job-openings.monster.com/monster/dfd62a...,,,d24043783a4fcf91b9b7518aa1495a98,,,
4463,United States of America,US,,No,jobs.monster.com,Please apply only if you are qualified.,Please apply only if you are qualified.,,"SchneiderChattanooga, TN",,http://job-openings.monster.com/monster/3e6ee0...,,,7a236d8d1a98651b09c2ea3b1a11a3d5,,,


In [178]:
# df2['page_url'].loc[303]

In [179]:
# df2['job_title_clean'].loc[10403]

In [180]:
# re.search(r'(.com\/)(.*)-(Job?)', )\
# df['page_url'].apply(lambda x : re.search(r'(.com\/)(.*)-(Job?)', x).group(0))

- ### Cleaning `sector` column

In [181]:
df2.columns

Index(['country', 'country_code', 'date_added', 'has_expired', 'job_board',
       'job_description', 'job_title', 'job_type', 'location', 'organization',
       'page_url', 'salary', 'sector', 'uniq_id', 'job_type(time)',
       'job_type(Period)', 'job_title_clean'],
      dtype='object')

In [182]:
df2.sector

0                      IT/Software Development
1                                          NaN
2                                          NaN
3                    Experienced (Non-Manager)
4                   Project/Program Management
                         ...                  
21995                                      NaN
21996    Manager (Manager/Supervisor of Staff)
21997                                      NaN
21998          Installation/Maintenance/Repair
21999                Experienced (Non-Manager)
Name: sector, Length: 22000, dtype: object

In [191]:
df2.sector.value_counts().to_frame().tail(30)

Unnamed: 0,sector
Shipping and Receiving/Warehousing,1
Account Management (Commissioned)Insurance Agent/BrokerFinancial Products Sales/Brokerage,1
"Car, Van and Bus DrivingGeneral/Other: Logistics/TransportationTruck Driving",1
Military Combat,1
Administrative Support,1
Field SalesStore/Branch Management,1
Real Estate Agent/BrokerRetail/Counter Sales and CashierStore/Branch Management,1
General/Other: Logistics/TransportationPurchasing,1
General/Other: Legal Paralegal & Legal Secretary,1
Supplier Management/Vendor Management,1


In [206]:
# regex=True is passed explicitly: newer pandas versions treat the pattern as a literal string by default
df2['sector'].str.replace(r'.{150,}', 'other', regex=True)


0                      IT/Software Development
1                                          NaN
2                                          NaN
3                    Experienced (Non-Manager)
4                   Project/Program Management
                         ...                  
21995                                      NaN
21996    Manager (Manager/Supervisor of Staff)
21997                                      NaN
21998          Installation/Maintenance/Repair
21999                Experienced (Non-Manager)
Name: sector, Length: 22000, dtype: object

In [207]:
df3 = df2[df2['sector'].str.contains(r'.{150,}', na=False)]['sector'].to_frame()
df3['length'] = df3['sector'].str.len()

In [208]:
df3

Unnamed: 0,sector,length
807,"To perform this job successfully, an individua...",2217
854,High school diploma or general education degre...,579
1027,"To perform this job successfully, an individua...",1740
2558,Passionate coders with 5+ years of application...,3487
4092,Strong work ethic with a drive to exceed expec...,1139
11233,"Bachelor's degree, preferrably in Marketing or...",788
20750,High level of responsiveness and exceptional c...,3300
21931,Demonstrate the highest level of leadership an...,1158


#### Cleaning step: replaced long, messy `sector` values (150 or more characters) with 'other'

In [219]:
# regex=True is passed explicitly: newer pandas versions treat the pattern as a literal string by default
df2['sector'] = df2['sector'].str.replace(r'.{150,}', 'other', regex=True)


In [223]:
df2[df2['sector']=='other']

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id,job_type(time),job_type(Period),job_title_clean
807,United States of America,US,,No,jobs.monster.com,Overview We are currently seeking a Bilingual ...,Bilingual Field Nurse Case Manager (RN) Job in...,Full Time Employee,"Dallas, TX",Healthcare Services,http://jobview.monster.com/bilingual-field-nur...,,other,b8d37f16771a6a26347c9a6a20b51ea5,Full Time,Employee,bilingual-field-nurse-case-manager-rn
854,United States of America,US,,No,jobs.monster.com,Overview We are currently seeking an Intake/Re...,Intake Coordinator Job in Dallas,Full Time Employee,"Dallas, TX",Healthcare Services,http://jobview.monster.com/intake-coordinator-...,,other,67bcf8463d03704fd3de4700c7ace7b0,Full Time,Employee,intake-coordinator
1027,United States of America,US,,No,jobs.monster.com,Overview We are currently seeking a Telephonic...,Remote Telephonic Nurse Case Manager (RN) Job ...,Full Time Employee,"Dallas, TX",Healthcare Services,http://jobview.monster.com/remote-telephonic-n...,,other,bb076a4ba1987629f8dd4da16a3c376c,Full Time,Employee,remote-telephonic-nurse-case-manager-rn
2558,United States of America,US,,No,jobs.monster.com,Overview: At Perficient you’ll deliver missi...,PHP/Magento Lead Job in Milwaukee,,"Milwaukee, WI",,http://jobview.monster.com/php-magento-lead-jo...,,other,1768c857402371ba505465714aa4050d,,,php-magento-lead
4092,United States of America,US,,No,jobs.monster.com,Overview: If you have a love for design and en...,Interior Design Consultant / Retail Sales Job ...,Full Time Employee,"Brown Deer, WI",Business Services - Other,http://jobview.monster.com/interior-design-con...,,other,931467534e1bde8225ff3cacc1491d19,Full Time,Employee,interior-design-consultant-retail-sales
11233,United States of America,US,,No,jobs.monster.com,Sally Beauty Holdings (NYSE: SBH) is the world...,Advertising and Promotion Specialist Job in De...,"Full Time, Employee","Denton, TX",Retail,http://jobview.monster.com/advertising-promoti...,,other,001c008403da40a8c3424be526e8c1bb,Full Time,Employee,advertising-promotion-specialist
20750,United States of America,US,,No,jobs.monster.com,Overview: At Perficient you’ll deliver missi...,Forecast Support Specialist Job in St. Louis,,"St. Louis, MO",,http://jobview.monster.com/forecast-support-sp...,,other,aa260b1b13dda41069358815c2805033,,,forecast-support-specialist
21931,United States of America,US,,No,jobs.monster.com,Sally Beauty is the world’s largest wholesale ...,Store Managers Job in Cincinnati,"Full Time, Employee","Cincinnati, OH",Retail,http://jobview.monster.com/Store-Managers-Job-...,,other,67b2a95a1d306aecaffbb9dfe73c35df,Full Time,Employee,Store-Managers
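
As an alternative to the regex, the same cleaning step could be expressed with a length mask. This is a minimal sketch on made-up values, assuming the 150-character threshold used above. It is not the code that produced the output in this notebook.

```python
import pandas as pd

# Made-up sector values: one normal, one missing, two overly long.
sector = pd.Series([
    'IT/Software Development',
    None,
    'x' * 200,
    'Line one\n' + 'y' * 300,
])

# Missing values have length NaN, which compares as False here.
too_long = sector.str.len() >= 150

# Replace the whole value wherever the mask is True.
cleaned = sector.mask(too_long, 'other')
```

One reason to consider the mask: in a regex `.` does not match a newline, so `.{150,}` only replaces the matching stretch of a multi-line value, while the mask always replaces the entire cell.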


### `organization` column cleaning

In [224]:
df.organization.unique()

array([nan, 'Printing and Publishing', 'Personal and Household Services',
       'Altec Industries', 'Retail', 'Computer/IT Services',
       'Computer Software',
       'Hotels and Lodging Personal and Household Services', 'Insurance',
       'Business Services - Other', 'Education',
       'Construction - Industrial Facilities and InfrastructureConstruction - Residential & Commercial/Office',
       'Accounting and Auditing Services', 'Legal Services',
       'Construction - Residential & Commercial/Office',
       'Engineering Services', 'AllComputer SoftwareComputer/IT Services',
       'Healthcare Services', 'Chicago, IL', 'Manufacturing - Other',
       'Oklahoma City, OK', 'Aerospace and Defense', 'San Francisco, CA',
       'Advertising and PR ServicesManagement Consulting ServicesBusiness Services - Other',
       'Other/Not Classified',
       'RetailAdvertising and PR ServicesBusiness Services - Other',
       'All', 'Electronics, Components, and Semiconductor Mfg',
       '

In [225]:
df2.organization.value_counts().to_frame().head(60)

Unnamed: 0,organization
Healthcare Services,1919
All,1158
Retail,1081
Other/Not Classified,1048
Manufacturing - Other,885
Computer/IT Services,822
Legal Services,466
Business Services - Other,410
Restaurant/Food Services,384
Transport and Storage - Materials,342


In [226]:
# The statement below returns all rows whose organization column holds a city value
temp = df2[df2['organization'].str.contains(r'[\w\d\s]*,\s+\w{2}\b[\w\d\s]*', na=False)]

In [227]:
temp.organization.head(50)

29                     Chicago, IL
36               Oklahoma City, OK
38               San Francisco, CA
73                      Durham, NC
152                    Houston, TX
249              Chicago, IL 60661
425            Princeton, NJ 08543
624                     Dallas, TX
652                     Dallas, TX
674              Belfast, ME 04915
734                     Dallas, TX
766                     Dallas, TX
776                     Dallas, TX
817                      Plano, TX
924                     Dallas, TX
1012                   Coppell, TX
1019                    Dallas, TX
1045                    Dallas, TX
1049                    Dallas, TX
1063                     Plano, TX
1070                    Denver, CO
1095          Richardson, TX 75082
1106                    Dallas, TX
1129          Washington, DC 20001
1139           Prairie Village, KS
1146                  Savannah, GA
1175           Hoffman Estates, IL
1227           El Dorado Hills, CA
1239                

In [228]:
# replace the city values in the organization column with 'other'
# regex=True is passed explicitly: newer pandas versions treat the pattern as a literal string by default
df2['organization'] = df2['organization'].str.replace(r'[\w\d\s]*,\s+\w{2}\b[\w\d\s]*', 'other', regex=True)


In [229]:
df2[df2['organization'] == 'other'] 

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id,job_type(time),job_type(Period),job_title_clean
29,United States of America,US,,No,jobs.monster.com,Experis is working with a Pharmaceutical start...,Sr. Process Engineer,Full Time Employee,"Sr. Process Engineer, Manufacturing",other,http://jobview.monster.com/Sr-Process-Engineer...,"70,000.00 - 100,000.00 $ /year",Engineering,779bb4c9bf038b7fb775134736d36fd4,Full Time,Employee,Sr-Process-Engineer-Manufacturing
36,United States of America,US,,No,jobs.monster.com,"POSITION TITLE: RF System Technician, Field Se...",RF System Technician,Full Time Temporary/Contract/Project,"RF System Technician, Field Service",other,http://jobview.monster.com/RF-System-Technicia...,"68,000.00 - 72,000.00 $ /year",Engineering,ceb44cca7cd280adcb0c84c20f3c6c21,Full Time,Temporary,RF-System-Technician-Field-Service
38,United States of America,US,,No,jobs.monster.com,**MUST be able to work as a W2 employee for AN...,Bi-Lingual Editorial Strategist Job in San Fra...,Full Time Temporary/Contract/Project,Bi-Lingual Editorial Strategist,other,http://jobview.monster.com/Bi-Lingual-Editoria...,,Editorial/Writing,4e195c4c6d72e738a1e13aea51005398,Full Time,Temporary,Bi-Lingual-Editorial-Strategist
73,United States of America,US,,No,jobs.monster.com,Experis Engineering has an immediate opening f...,Quality Engineer Job in Durham,Full Time Employee,Quality Engineer,other,http://jobview.monster.com/Quality-Engineer-Jo...,,Engineering,f6427362e1064e70ad9a223942a588a6,Full Time,Employee,Quality-Engineer
152,United States of America,US,,No,jobs.monster.com,Support program to implement global regulatory...,Business Analyst Job in Houston,Full Time Temporary/Contract/Project,Business Analyst,other,http://jobview.monster.com/Business-Analyst-Jo...,,IT/Software Development,48b975d642803ab6c201b1b59adb5dbf,Full Time,Temporary,Business-Analyst
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21401,United States of America,US,,No,jobs.monster.com,"Through our reach and resources, #Experis brin...",Performance Engineer Job in Cincinnati,Full Time Temporary/Contract/Project,Performance Engineer,other,http://jobview.monster.com/Performance-Enginee...,,IT/Software Development,8dedbef9816dc564aa3e38813a47d334,Full Time,Temporary,Performance-Engineer
21456,United States of America,US,,No,jobs.monster.com,"Through our reach and resources, #Experis brin...",Supply Chain Business Analyst Job in Cincinnati,Full Time Temporary/Contract/Project,Supply Chain Business Analyst,other,http://jobview.monster.com/Supply-Chain-Busine...,,IT/Software Development,efaf45dd67298931a6a48afca32f7f76,Full Time,Temporary,Supply-Chain-Business-Analyst
21507,United States of America,US,,No,jobs.monster.com,"Through our reach and resources, #Experis brin...",Project Coordinator Job in Cincinnati,Full Time Temporary/Contract/Project,Project Coordinator,other,http://jobview.monster.com/Project-Coordinator...,,IT/Software Development,80829a0db8344f7fd6fe3b19ab62a022,Full Time,Temporary,Project-Coordinator
21629,United States of America,US,,No,jobs.monster.com,"Through our reach and resources, #Experis brin...",Help Desk / Desktop Support Technician Job in ...,Full Time Temporary/Contract/Project,Help Desk / Desktop Support Technician,other,http://jobview.monster.com/Help-Desk-Desktop-S...,,IT/Software Development,7239bce7ac14d1fed0cb374d4f425de6,Full Time,Temporary,Help-Desk-Desktop-Support-Technician
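
A stricter variant of this step is sketched below. It assumes the misplaced values always have the shape "City, ST" with an optional five-digit ZIP code, and it requires the whole value to match. The pattern and the sample values are illustrative assumptions, not taken from the notebook's code.

```python
import pandas as pd

# Sample organization values, some of which are really locations.
org = pd.Series([
    'Chicago, IL',
    'Washington, DC 20001',
    'Healthcare Services',
    'Jernberg Industries, Inc.',
    None,
])

# City name, comma, two-letter uppercase state, optional 5-digit ZIP.
city_state = r"[A-Za-z .'-]+,\s+[A-Z]{2}(?:\s+\d{5})?"

# fullmatch only flags values that are a location from start to end.
is_location = org.str.fullmatch(city_state, na=False)
cleaned = org.mask(is_location, 'other')
```

Requiring a full match keeps company names that merely contain a comma, such as 'Jernberg Industries, Inc.', untouched.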


### `salary` column cleaning

In [239]:
df2.salary.isna().sum()

18554

In [233]:
list(df2['salary'].unique())

[nan,
 '9.00 - 13.00 $ /hour',
 '80,000.00 - 95,000.00 $ /year',
 '60,000.00 - 72,000.00 $ /year',
 'Excellent Pay and Incentives',
 '70,000.00 - 100,000.00 $ /year',
 '62.00 - 81.00 $ /hour',
 '75,000.00 - 100,000.00 $ /year',
 '68,000.00 - 72,000.00 $ /year',
 '58,000.00 - 65,000.00 $ /year',
 'Up to $32000.00',
 '15.00 - 16.00 $ /hour',
 'Salary, plus commission',
 '45,000.00 - 100,000.00 $ /yearBonus, Benefits, 401k',
 '40,000.00 - 50,000.00 $ /yearsalary',
 '13.75 - 16.75 $ /hourYear End Bonus',
 'To be discussed.',
 '40.00 - 50.00 $ /hour',
 '80,000.00 - 90,000.00 $ /year',
 '35,000.00 - 45,000.00 $ /year',
 '80,000.00 - 100,000.00 $ /year',
 'bonus, 401K matching, medical, vacation',
 '31,000.00 - 33,000.00 $ /year',
 '50.00 - 65.00 $ /hour',
 '100,000.00 - 120,000.00 $ /year',
 '17.00 - 22.00 $ /hour',
 'DOE',
 '$50,000.00+ /year',
 '56,000.00 - 64,000.00 $ /yearHighly Competitive Base Salary Plus Lucrative Bonus Plan, Benefit Package, in a Highly Diverse, Very Successful Compa

In [236]:
df2.salary.value_counts().head(50)

40,000.00 - 100,000.00 $ /year     50
Commensurate with Experience       38
50,000.00 - 60,000.00 $ /year      32
75,000.00 - 85,000.00 $ /year      27
40,000.00 - 50,000.00 $ /year      26
65,000.00 - 75,000.00 $ /year      25
60,000.00 - 70,000.00 $ /year      24
60,000.00 - 80,000.00 $ /year      22
80,000.00 - 100,000.00 $ /year     20
80,000.00 - 90,000.00 $ /year      20
75,000.00 - 90,000.00 $ /year      19
70,000.00 - 80,000.00 $ /year      18
20.00 - 25.00 $ /hour              18
14.00 - 14.00 $ /hour              17
Up to $15.00                       17
55,000.00 - 60,000.00 $ /year      16
14.00 - 16.00 $ /hour              16
12.00 - 14.00 $ /hour              15
12.00 - 12.00 $ /hour              14
70,000.00 - 85,000.00 $ /year      14
65,000.00 - 70,000.00 $ /year      14
85,000.00 - 95,000.00 $ /year      14
90,000.00 - 110,000.00 $ /year     14
12.00 - 13.00 $ /hour              14
50,000.00 - 65,000.00 $ /year      14
55,000.00 - 65,000.00 $ /year      14
50,000.00 - 

In [242]:
# fill null salary with 'information not available'
df2['salary'] = df2['salary'].fillna('information not available')

In [243]:
df2.salary.isna().sum()

0
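
Many of the salary strings listed above follow a "low - high $ /unit" layout, so a numeric range could be pulled out of them. The sketch below is one possible approach under that assumption. The column names `low`, `high` and `unit` are hypothetical, and values such as 'Up to $32000.00' or 'DOE' are deliberately left unparsed.

```python
import pandas as pd

# A few salary strings copied from the unique values shown above.
salary = pd.Series([
    '9.00 - 13.00 $ /hour',
    '80,000.00 - 95,000.00 $ /year',
    '45,000.00 - 100,000.00 $ /yearBonus, Benefits, 401k',
    'Up to $32000.00',
    'information not available',
])

# Named groups become the column names of the extracted frame.
pattern = (r'(?P<low>[\d,]+\.\d{2})\s*-\s*(?P<high>[\d,]+\.\d{2})'
           r'\s*\$\s*/(?P<unit>hour|year)')
parsed = salary.str.extract(pattern)

# Drop thousands separators and convert to numbers; unmatched rows stay NaN.
for col in ('low', 'high'):
    parsed[col] = pd.to_numeric(parsed[col].str.replace(',', '', regex=False))
```

Keeping `unit` alongside the bounds matters because hourly and yearly figures are not directly comparable.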

In [244]:
df2

Unnamed: 0,country,country_code,date_added,has_expired,job_board,job_description,job_title,job_type,location,organization,page_url,salary,sector,uniq_id,job_type(time),job_type(Period),job_title_clean
0,United States of America,US,,No,jobs.monster.com,TeamSoft is seeing an IT Support Specialist to...,IT Support Technician Job in Madison,Full Time Employee,"Madison, WI 53702",,http://jobview.monster.com/it-support-technici...,information not available,IT/Software Development,11d599f229a80023d2f40e7c52cd941e,Full Time,Employee,it-support-technician
1,United States of America,US,,No,jobs.monster.com,The Wisconsin State Journal is seeking a flexi...,Business Reporter/Editor Job in Madison,Full Time,"Madison, WI 53708",Printing and Publishing,http://jobview.monster.com/business-reporter-e...,information not available,,e4cbb126dabf22159aff90223243ff2a,Full Time,,business-reporter-editor
2,United States of America,US,,No,jobs.monster.com,Report this job About the Job DePuy Synthes Co...,Johnson & Johnson Family of Companies Job Appl...,"Full Time, Employee",DePuy Synthes Companies is a member of Johnson...,Personal and Household Services,http://jobview.monster.com/senior-training-lea...,information not available,,839106b353877fa3d896ffb9c1fe01c0,Full Time,Employee,senior-training-leader
3,United States of America,US,,No,jobs.monster.com,Why Join Altec? If you’re considering a career...,Engineer - Quality Job in Dixon,Full Time,"Dixon, CA",Altec Industries,http://jobview.monster.com/engineer-quality-jo...,information not available,Experienced (Non-Manager),58435fcab804439efdcaa7ecca0fd783,Full Time,,engineer-quality
4,United States of America,US,,No,jobs.monster.com,Position ID# 76162 # Positions 1 State CT C...,Shift Supervisor - Part-Time Job in Camphill,Full Time Employee,"Camphill, PA",Retail,http://jobview.monster.com/shift-supervisor-pa...,information not available,Project/Program Management,64d0272dc8496abfd9523a8df63c184c,Full Time,Employee,shift-supervisor-part-time
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21995,United States of America,US,,No,jobs.monster.com,This is a major premier Cincinnati based finan...,Assistant Vice President - Controller Job in C...,Full Time,"Cincinnati, OH",,http://jobview.monster.com/Assistant-Vice-Pres...,"120,000.00 - 160,000.00 $ /yearbonus",,a80bc8cc3a90c17eef418963803bc640,Full Time,,Assistant-Vice-President-Controller
21996,United States of America,US,,No,jobs.monster.com,Luxury homebuilder in Cincinnati seeking multi...,Accountant Job in Cincinnati,Full Time,"Cincinnati, OH 45236",Construction - Residential & Commercial/Office,http://jobview.monster.com/Accountant-Job-Cinc...,"45,000.00 - 60,000.00 $ /year",Manager (Manager/Supervisor of Staff),419a3714be2b30a10f628de207d041de,Full Time,,Accountant
21997,United States of America,US,,No,jobs.monster.com,RE: Adobe AEM- Client - Loca...,AEM/CQ developer Job in Chicago,Full Time,"Chicago, IL 60602",,http://jobview.monster.com/AEM-CQ5-developer-J...,information not available,,5a590350b73b2cec46b05750a208e345,Full Time,,AEM-CQ5-developer
21998,United States of America,US,,No,jobs.monster.com,Jernberg Industries was established in 1937 an...,Electrician - Experienced Forging Electrician ...,Full Time Employee,"Chicago, IL 60609","Jernberg Industries, Inc.",http://jobview.monster.com/Electrician-Experie...,25.00 - 28.00 $ /hour,Installation/Maintenance/Repair,40161cf61c283af9dc2b0a62947a5f1b,Full Time,Employee,Electrician-Experienced-Forging-Electrician


### Cleaning summary

- Replaced blank/empty strings with NaN
<br></br>

- `job_type`:
    - Split the main `job_type` column into two new columns:
        - job_type(time) : contains Full Time, Part Time, Per Diem
        - job_type(Period) : contains Employee, Intern, Temporary
<br></br>

- `job_title`:
    - Extracted the job title from the `page_url` column with a regex and stored it in `job_title_clean`.
<br></br>    
- `sector`:
    - Replaced long, messy sector values (150 or more characters) with 'other'
<br></br>   
- `organization`:
     - Replaced the messy city/location values in the `organization` column with 'other'
<br></br>    
- `salary`:
    - Filled null salary values with 'information not available'
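
The first two items of the summary can be sketched as follows. This is an illustration on a tiny made-up frame, not the exact code used earlier in the notebook. The keyword lists come from the summary above, and the regex approach is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up job_type values in the formats seen in the data.
raw = pd.DataFrame({'job_type': [
    'Full Time Employee',
    'Full Time, Employee',
    '  ',
    'Part Time',
    'Full Time Temporary/Contract/Project',
]})

# Whitespace-only or empty strings become NaN.
clean = raw.replace(r'^\s*$', np.nan, regex=True)

# Pull the first matching keyword into each derived column.
clean['job_type(time)'] = clean['job_type'].str.extract(
    r'(Full Time|Part Time|Per Diem)', expand=False)
clean['job_type(Period)'] = clean['job_type'].str.extract(
    r'(Employee|Intern|Temporary)', expand=False)
```

Rows where no keyword is found get NaN in the derived column, which is why `job_type(Period)` has more missing values than `job_type` itself.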

