# Job Posting Data Analysis
In this notebook, the group will be working with the [Job Posting in Singapore](https://www.kaggle.com/datasets/techsalerator/job-posting-data-in-singapore) dataset. This dataset will be used for processing, analyzing, and visualizing data.

This project is carried out by the group **DS NERDS**, under Section **S19**, which consists of the following members:
- Colobong, Franz Andrick
- Chu, Andre Benedict M. 
- Pineda, Mark Gabriel A.
- Rocha, Angelo H. 
  
The output fulfills a part of our requirements for the course Statistical Modeling and Simulation (CSMODEL). 


# Import Libraries

In [76]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Dataset Description and Collection Process

This dataset offers a comprehensive overview of job openings across various sectors in Singapore. It provides an essential resource for businesses, job seekers, and labor market analysts, and it can also be a valuable tool for people who would like to be informed about job openings and employment trends in Singapore.

The data was collected by **Techsalerator**, a global data provider, which consolidates and categorizes job-related information from diverse sources, including company websites, job boards, and recruitment agencies.

Now, let us load the CSV file into our workspace using the **'latin1'** encoding, because the file contains special characters (e.g., é, ñ, ’) that caused a `UnicodeDecodeError` under the default **'utf-8'** encoding.

In [77]:
job_posting_df = pd.read_csv('Job Posting.csv', encoding='latin1')
job_posting_df.head()

Unnamed: 0,Website Domain,Ticker,Job Opening Title,Job Opening URL,First Seen At,Last Seen At,Location,Location Data,Category,Seniority,...,Description,Salary,Salary Data,Contract Types,Job Status,Job Language,Job Last Processed At,O*NET Code,O*NET Family,O*NET Occupation Name
0,bosch.com,,IN_RBAI_Assistant Manager_Dispensing Process E...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-05-29T19:59:45Z,2024-07-31T14:35:44Z,"Indiana, United States","[{""city"":null,""state"":""Indiana"",""zip_code"":nul...","engineering, management, support",manager,...,**IN\_RBAI\_Assistant Manager\_Dispensing Proc...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-08-02T14:47:55Z,43-1011.00,Office and Administrative Support,First-Line Supervisors of Office and Administr...
1,bosch.com,,Professional Internship: Hardware Development ...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-05-04T01:00:12Z,2024-07-29T17:46:16Z,"Delaware, United States","[{""city"":null,""state"":""Delaware"",""zip_code"":nu...",internship,non_manager,...,**Professional Internship: Hardware Developmen...,,"{""salary_low"":null,""salary_high"":null,""salary_...","full time, internship, m/f",closed,en,2024-07-31T17:50:07Z,17-2061.00,Architecture and Engineering,Computer Hardware Engineers
2,zf.com,,Process Expert BMS Production,https://jobs.zf.com/job/Shenyang-Process-Exper...,2024-04-19T06:47:24Z,2024-05-16T02:25:08Z,China,"[{""city"":null,""state"":null,""zip_code"":null,""co...",engineering,non_manager,...,ZF is a global technology company supplying sy...,,"{""salary_low"":null,""salary_high"":null,""salary_...",,closed,en,2024-05-18T02:32:04Z,51-9141.00,Production,Semiconductor Processing Technicians
3,bosch.com,,DevOps Developer with Python for ADAS Computin...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-08-16T10:20:37Z,2024-08-22T11:14:49Z,Romania,"[{""city"":null,""state"":null,""zip_code"":null,""co...","information_technology, software_development",non_manager,...,**DevOps Developer with Python for ADAS Comput...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-08-23T00:33:30Z,15-1252.00,Computer and Mathematical,Software Developers
4,bosch.com,,Senior Engineer Sales - Video Systems and Solu...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-07-01T17:31:20Z,2024-08-01T05:11:33Z,India,"[{""city"":null,""state"":null,""zip_code"":null,""co...","engineering, sales",non_manager,...,**Senior Engineer Sales - Video Systems and So...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-08-02T19:03:16Z,41-9031.00,Sales and Related,Sales Engineers
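
The location values printed later in this notebook contain sequences such as `\x9f`, which suggests that some bytes are being read as control characters under `'latin1'`. The sketch below uses a hypothetical byte string to show how the same bytes decode under three encodings. That the file is actually encoded as Mac Roman is an assumption made only for illustration; the notebook itself keeps `'latin1'`.

```python
# Hypothetical sample: raw bytes for one location name (not read from the CSV).
raw = b"Osnabr\x9fck"

# latin1 maps every byte to a code point, so decoding never fails,
# but byte 0x9F becomes a control character instead of a letter.
as_latin1 = raw.decode("latin1")

# mac_roman is one single-byte encoding in which 0x9F is the letter 'ü'.
as_mac_roman = raw.decode("mac_roman")

# utf-8 rejects the byte, which is the kind of error described above.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1))
print(as_mac_roman)
print(utf8_ok)
```

If a different encoding turns out to fit the file better, it can be passed to `pd.read_csv` through the same `encoding` argument.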


In [78]:
job_posting_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9919 entries, 0 to 9918
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Website Domain         9919 non-null   object 
 1   Ticker                 0 non-null      float64
 2   Job Opening Title      9919 non-null   object 
 3   Job Opening URL        9919 non-null   object 
 4   First Seen At          9919 non-null   object 
 5   Last Seen At           9919 non-null   object 
 6   Location               9508 non-null   object 
 7   Location Data          9919 non-null   object 
 8   Category               8250 non-null   object 
 9   Seniority              9919 non-null   object 
 10  Keywords               7646 non-null   object 
 11  Description            9807 non-null   object 
 12  Salary                 576 non-null    object 
 13  Salary Data            9919 non-null   object 
 14  Contract Types         8004 non-null   object 
 15  Job 

In [79]:
# Remove Duplicates
job_posting_df = job_posting_df.drop_duplicates()
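
As a minimal sketch, the number of rows that `drop_duplicates()` would remove can be counted beforehand with `duplicated()`. The toy frame and its values are made up and only stand in for `job_posting_df`.

```python
import pandas as pd

# Toy frame standing in for job_posting_df (values are invented).
toy = pd.DataFrame({
    "Job Opening Title": ["Engineer", "Engineer", "Analyst"],
    "Location": ["Singapore", "Singapore", "India"],
})

# duplicated() flags rows that repeat an earlier row exactly.
n_duplicates = int(toy.duplicated().sum())
deduped = toy.drop_duplicates()

print(f"{n_duplicates} duplicate row(s) removed, {len(deduped)} rows remain")
```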


In [80]:
# Handle rows with a missing location or contract type
# Drop every row that lacks a location or a contract type, since such rows cannot be used in the analysis


print(job_posting_df[[ 'Location', 'Contract Types']].isnull().sum())
print(f"Entries with both missing a Location and a Contract Type: {job_posting_df[['Location', 'Contract Types']].isnull().all(axis=1).sum()}\n")

job_posting_df = job_posting_df.dropna(subset=['Location', 'Contract Types'], how='any')

print(job_posting_df[[ 'Location', 'Contract Types']].isnull().sum())
print(f"Entries with both missing a Location and a Contract Type: {job_posting_df[['Location', 'Contract Types']].isnull().all(axis=1).sum()}")



Location           411
Contract Types    1915
dtype: int64
Entries with both missing a Location and a Contract Type: 30

Location          0
Contract Types    0
dtype: int64
Entries with both missing a Location and a Contract Type: 0
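
The difference between `how='any'` (used above) and `how='all'` can be sketched on a toy frame with invented values. The last lines apply inclusion-exclusion to the counts printed above: 411 + 1915 - 30 = 2296 rows have at least one gap, and 9919 - 2296 = 7623, which agrees with the row count reported by the next `info()` call.

```python
import pandas as pd

# Toy frame with invented values: one complete row and three rows with gaps.
toy = pd.DataFrame({
    "Location": ["Singapore", None, "India", None],
    "Contract Types": ["full time", "full time", None, None],
})
cols = ["Location", "Contract Types"]

either_missing = int(toy[cols].isnull().any(axis=1).sum())  # at least one gap
both_missing = int(toy[cols].isnull().all(axis=1).sum())    # both columns empty

kept_any = toy.dropna(subset=cols, how="any")  # keeps only the complete row
kept_all = toy.dropna(subset=cols, how="all")  # drops only the row missing both

# Inclusion-exclusion on the notebook's printed counts:
# missing Location + missing Contract Types - missing both.
rows_with_a_gap = 411 + 1915 - 30
rows_remaining = 9919 - rows_with_a_gap

print(either_missing, both_missing, len(kept_any), len(kept_all))
print(rows_with_a_gap, rows_remaining)
```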


In [81]:
# Inspect the DataFrame to confirm that the cleaning was successful
job_posting_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7623 entries, 0 to 9918
Data columns (total 21 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Website Domain         7623 non-null   object 
 1   Ticker                 0 non-null      float64
 2   Job Opening Title      7623 non-null   object 
 3   Job Opening URL        7623 non-null   object 
 4   First Seen At          7623 non-null   object 
 5   Last Seen At           7623 non-null   object 
 6   Location               7623 non-null   object 
 7   Location Data          7623 non-null   object 
 8   Category               6396 non-null   object 
 9   Seniority              7623 non-null   object 
 10  Keywords               6030 non-null   object 
 11  Description            7571 non-null   object 
 12  Salary                 474 non-null    object 
 13  Salary Data            7623 non-null   object 
 14  Contract Types         7623 non-null   object 
 15  Job Statu

In [None]:
# Fixing text fields

# Strip leading and trailing whitespace and convert the text to lowercase
job_posting_df['O*NET Family'] = job_posting_df['O*NET Family'].str.strip().str.lower()
job_posting_df['Keywords'] = job_posting_df['Keywords'].str.strip().str.lower()
job_posting_df['Category'] = job_posting_df['Category'].str.strip().str.lower()
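
One detail worth keeping in mind is that the `.str` methods leave missing values missing; they do not turn them into the text `'nan'`. A minimal sketch with an invented Series:

```python
import pandas as pd

# Invented values: padded text, upper-case text, and one missing entry.
toy = pd.Series(["  Management ", "LEGAL", None], name="O*NET Family")

cleaned = toy.str.strip().str.lower()

print(cleaned)
print("missing values:", int(cleaned.isna().sum()))
```

Any later step that should treat missing values as their own category therefore has to handle them explicitly, for example with `fillna`.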



### Categorizing Data by Seniority, Job Category, Location, and Skills 
Seniority was grouped into broader position levels based on the values in the `Seniority` column. Job category was derived from the `O*NET Family` column, which groups occupations that share related skills, education, or training. Location was derived from the `Location` column by mapping place names to countries. Skills were derived from the `Keywords` column by identifying the keywords that appear often in the dataset. In each case, the `unique()` function was used first to list the distinct values, so that each value only has to be categorized once.

In [None]:
# Check all unique values
unique_values_seniority = job_posting_df['Seniority'].unique()
print(unique_values_seniority)

['manager' 'non_manager' 'director' 'head' 'vice_president' 'c_level'
 'partner' 'president']


In [84]:
# Seniority Categorization
seniority_mapping = {
    'non_manager': 'Non-Managerial Position',
    'manager': 'Managerial Position',
    'director': 'Managerial Position',
    'head': 'Managerial Position',
    'vice_president': 'Executive Position',
    'c_level': 'Executive Position',
    'partner': 'Executive Position',
    'president': 'Executive Position',
}

# Map the values and count categories
seniority_categories = job_posting_df['Seniority'].map(seniority_mapping)
seniority_category_counts = seniority_categories.value_counts().sort_index()

print("Seniority Category Counts:")
print(seniority_category_counts)

Seniority Category Counts:
Seniority
Executive Position           17
Managerial Position        1433
Non-Managerial Position    6173
Name: count, dtype: int64
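
To read the counts above as proportions, a short sketch is shown below. The three counts are copied from the output; the variable names are illustrative only. On the original column, `value_counts(normalize=True)` would give the same shares directly.

```python
import pandas as pd

# Counts copied from the output above.
counts = pd.Series(
    {
        "Executive Position": 17,
        "Managerial Position": 1433,
        "Non-Managerial Position": 6173,
    },
    name="count",
)

total = int(counts.sum())
shares = (counts / total * 100).round(2)  # percentage of all postings

print(total)
print(shares)
```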


In [85]:
unique_job_fields = job_posting_df['O*NET Family'].unique()
print(unique_job_fields)

['office and administrative support' 'architecture and engineering'
 'computer and mathematical' 'sales and related'
 'installation, maintenance, and repair'
 'business and financial operations' 'production'
 'life, physical, and social science' 'management'
 'community and social service' 'transportation and material moving'
 'healthcare practitioners and technical' 'personal care and service'
 'educational instruction and library' 'construction and extraction'
 'arts, design, entertainment, sports, and media'
 'food preparation and serving related' 'protective service'
 'military specific' 'legal' 'healthcare support'
 'farming, fishing, and forestry'
 'building and grounds cleaning and maintenance' nan]


In [86]:
job_fields_mapping = {
    'office and administrative support' : 'Business and Administration',
    'architecture and engineering': 'Engineering and Construction',
    'computer and mathematical' : 'Technology',
    'sales and related': 'Business and Administration',    
    'installation, maintenance, and repair': 'Facilities Management and Services',
    'business and financial operations' : 'Business and Administration',
    'production': 'Manufacturing',
    'life, physical, and social science' : 'Science and Research',
    'management': 'Business and Administration',
    'community and social service' : 'Public Service',
    'transportation and material moving': 'Transportation and Logistics',
    'healthcare practitioners and technical' : 'Healthcare',
    'personal care and service': 'Healthcare',
    'educational instruction and library' : 'Education',
    'construction and extraction': 'Engineering and Construction',
    'arts, design, entertainment, sports, and media': 'Multimedia and Sports',
    'food preparation and serving related' : 'Facilities Management and Services',
    'protective service': 'Government and Public Safety',
    'military specific' : 'Government and Public Safety',
    'legal' : 'Legal Services',
    'healthcare support': 'Healthcare',
    'farming, fishing, and forestry': 'Agriculture and Natural Resources',
    'building and grounds cleaning and maintenance': 'Facilities Management and Services', 
    'nan' : 'Others'
}

job_fields_categories = job_posting_df['O*NET Family'].map(job_fields_mapping)
job_fields_category_counts = job_fields_categories.value_counts().sort_index()
print(job_fields_category_counts)


O*NET Family
Agriculture and Natural Resources       12
Business and Administration           2861
Education                              247
Engineering and Construction          1202
Facilities Management and Services     285
Government and Public Safety            53
Healthcare                             238
Legal Services                          18
Manufacturing                          669
Multimedia and Sports                   84
Public Service                          33
Science and Research                   269
Technology                            1342
Transportation and Logistics           308
Name: count, dtype: int64
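
The counts above sum to 7621, two fewer than the 7623 rows in the DataFrame, and no `Others` row appears. This is consistent with the `'nan'` key in the mapping never matching: it is the text `'nan'`, while a missing `O*NET Family` is a real missing value. A minimal sketch of one way to route missing values to `Others`, using an invented Series and a shortened mapping:

```python
import pandas as pd

# Invented values; the mapping is a shortened stand-in for job_fields_mapping.
toy = pd.Series(["legal", "production", None], name="O*NET Family")
mapping = {"legal": "Legal Services", "production": "Manufacturing", "nan": "Others"}

# The string key 'nan' is never looked up for a missing value,
# so the missing entry stays missing after map().
mapped = toy.map(mapping)

# Filling the unmapped entries afterwards puts them in the 'Others' group.
with_others = toy.map(mapping).fillna("Others")

print(mapped.value_counts(dropna=False))
print(with_others.value_counts())
```

Note that `fillna` also catches values that are present but absent from the mapping, which may or may not be the intended behavior.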


In [88]:
# Split the locations first
split_locations = job_posting_df['Location'].str.split(',').explode()

# Then find the unique values; this greatly reduces redundancy
unique_locations = split_locations.str.strip().unique()

print(unique_locations)



['Indiana' 'United States' 'Delaware' 'Romania' 'India' 'Yokohama' 'Japan'
 'Lincolnton' 'North Carolina' '28092' 'Hanau' 'Germany' 'Campinas'
 'Brazil' 'Charleston' 'South Carolina' '29418' 'Tennessee'
 'Fort Lauderdale' 'Florida' '33309' 'Singapore' 'United Kingdom' 'France'
 'Albion' '46701' 'Madrid' 'Spain' 'Bangalore' 'Vietnam' 'Malaysia'
 'Anderson' '29621' 'Debrecen' 'Hungary' 'Pamplona' 'Bursa' 'Turkey'
 'Salzburg' 'Austria' 'Slovenia' 'Vernon Hills' 'Illinois' '60061'
 'London' 'Tokyo' 'Australia' 'Massachusetts' 'Northville' 'Michigan'
 'Portugal' 'Osnabr\x9fck' 'Berlin' 'Budapest' 'Beijing' 'China'
 '_esk\x8e Bud_jovice' 'Czechia' 'Chandler' 'Arizona' '85226'
 'Auburn Hills' '48326' 'Hartland' 'Wisconsin' 'Vilnius' 'Lithuania'
 'Mexico' 'Mississippi' 'Shibuya' 'Dublin' 'Ireland' 'Shanghai' 'Vigo'
 'Houston' 'Texas' '77061' 'Nebraska' 'Barcelona' 'Cz_stochowa' 'Poland'
 'Mount Prospect' '60056' 'Sydney' 'Owatonna' 'Minnesota' '55060'
 'Belgium' 'Aveiro' 'Leverkusen' 'Virginia

In [89]:
# Locations mapped into countries; this includes streets, cities, zip codes
locations_mapping = {
    'Indiana': 'United States',
    'United States': 'United States',
    'Delaware': 'United States',
    'Romania': 'Romania',
    'India': 'India',
    'Yokohama': 'Japan',
    'Japan': 'Japan',
    'Lincolnton': 'United States',
    'North Carolina': 'United States',
    '28092': 'United States',
    'Hanau': 'Germany',
    'Germany': 'Germany',
    'Campinas': 'Brazil',
    'Brazil': 'Brazil',
    'Charleston': 'United States',
    'South Carolina': 'United States',
    '29418': 'United States',
    'Tennessee': 'United States',
    'Fort Lauderdale': 'United States',
    'Florida': 'United States',
    '33309': 'United States',
    'Singapore': 'Singapore',
    'United Kingdom': 'United Kingdom',
    'France': 'France',
    'Albion': 'United States',
    '46701': 'United States',
    'Madrid': 'Spain',
    'Spain': 'Spain',
    'Bangalore': 'India',
    'Vietnam': 'Vietnam',
    'Malaysia': 'Malaysia',
    'Anderson': 'United States',
    '29621': 'United States',
    'Debrecen': 'Hungary',
    'Hungary': 'Hungary',
    'Pamplona': 'Spain',
    'Bursa': 'Turkey',
    'Turkey': 'Turkey',
    'Salzburg': 'Austria',
    'Austria': 'Austria',
    'Slovenia': 'Slovenia',
    'Vernon Hills': 'United States',
    'Illinois': 'United States',
    '60061': 'United States',
    'London': 'United Kingdom',
    'Tokyo': 'Japan',
    'Australia': 'Australia',
    'Massachusetts': 'United States',
    'Northville': 'United States',
    'Michigan': 'United States',
    'Portugal': 'Portugal',
    'Osnabrück': 'Germany',
    'Berlin': 'Germany',
    'Budapest': 'Hungary',
    'Beijing': 'China',
    'China': 'China',
    'České Budějovice': 'Czechia',
    'Czechia': 'Czechia',
    'Chandler': 'United States',
    'Arizona': 'United States',
    '85226': 'United States',
    'Auburn Hills': 'United States',
    '48326': 'United States',
    'Hartland': 'United States',
    'Wisconsin': 'United States',
    'Vilnius': 'Lithuania',
    'Lithuania': 'Lithuania',
    'Mexico': 'Mexico',
    'Mississippi': 'United States',
    'Shibuya': 'Japan',
    'Dublin': 'Ireland',
    'Ireland': 'Ireland',
    'Shanghai': 'China',
    'Vigo': 'Spain',
    'Houston': 'United States',
    'Texas': 'United States',
    '77061': 'United States',
    'Nebraska': 'United States',
    'Barcelona': 'Spain',
    'Częstochowa': 'Poland',
    'Poland': 'Poland',
    'Mount Prospect': 'United States',
    '60056': 'United States',
    'Sydney': 'Australia',
    'Owatonna': 'United States',
    'Minnesota': 'United States',
    '55060': 'United States',
    'Belgium': 'Belgium',
    'Aveiro': 'Portugal',
    'Leverkusen': 'Germany',
    'Virginia': 'United States',
    'Passau': 'Germany',
    'Bamberg': 'Germany',
    'Bentonville': 'United States',
    'Arkansas': 'United States',
    'South America': 'Colombia',
    'Bogotá': 'Colombia',
    'Colombia': 'Colombia',
    'San Francisco': 'United States',
    'California': 'United States',
    'Georgia': 'United States',
    'Rolla': 'United States',
    'Missouri': 'United States',
    '65401': 'United States',
    'Atlanta': 'United States',
    '30336': 'United States',
    'Charlotte': 'United States',
    '28273': 'United States',
    'Mankato': 'United States',
    '56003': 'United States',
    'Monterrey': 'Mexico',
    'Pune': 'India',
    'Belo Horizonte': 'Brazil',
    'Chile': 'Chile',
    'Bekasi': 'Indonesia',
    'Indonesia': 'Indonesia',
    'Eger': 'Hungary',
    'Mesa': 'United States',
    '85212': 'United States',
    'Burnsville': 'United States',
    'Timișoara': 'Romania',
    'Sunnyvale': 'United States',
    '94085': 'United States',
    'Pančevo': 'Serbia',
    'Serbia': 'Serbia',
    'Plymouth': 'United States',
    '48170': 'United States',
    'Tuscaloosa': 'United States',
    'Alabama': 'United States',
    'Friedrichshafen': 'Germany',
    'Brandenburg': 'Germany',
    'Bethlehem': 'United States',
    'Pennsylvania': 'United States',
    '18017': 'United States',
    'Farmington Hills': 'United States',
    '48331': 'United States',
    'Bucharest': 'Romania',
    'Chennai': 'India',
    'Lafayette': 'United States',
    'Hannover': 'Germany',
    'Villa Park': 'United States',
    '47904': 'United States',
    'Maine': 'United States',
    'Thailand': 'Thailand',
    '60181': 'United States',
    'Niagara': 'United States',
    'New York': 'United States',
    'Fairport': 'United States',
    '14450': 'United States',
    'Denver': 'United States',
    'Colorado': 'United States',
    'Utrecht': 'Netherlands',
    'Netherlands': 'Netherlands',
    'Nederland': 'Netherlands',
    'Chihuahua': 'Mexico',
    'Armenia': 'Armenia',
    'August': 'United States',
    'São Paulo': 'Brazil',
    'Nevada': 'United States',
    'Wartburg': 'United States',
    '37887': 'United States',
    'Ponte de Lima': 'Portugal',
    'Washington': 'United States',
    '48094': 'United States',
    'Buford': 'United States',
    '30518': 'United States',
    'Minneapolis': 'United States',
    'Gothenburg': 'Sweden',
    'Sweden': 'Sweden',
    'Rayong': 'Thailand',
    'Stuttgart': 'Germany',
    'Egypt': 'Egypt',
    'Greece': 'Greece',
    'Koblenz': 'Germany',
    'Ghana': 'Ghana',
    'Hanoi': 'Vietnam',
    'Northern America': 'United States',
    'Europe': 'Germany',
    'Plzeň': 'Czechia',
    'Paris': 'France',
    'Dortmund': 'Germany',
    '55337': 'United States',
    'Chengdu': 'China',
    'Bitterfeld': 'Germany',
    'Neuwied': 'Germany',
    'Canada': 'Canada',
    'Morocco': 'Morocco',
    'Boston': 'United States',
    'Pleasanton': 'United States',
    '94566': 'United States',
    'Guangzhou': 'China',
    'Suzhou': 'China',
    'Marysville': 'United States',
    'Anderlecht': 'Belgium',
    'Stamford': 'United States',
    'Connecticut': 'United States',
    'Saarbrücken': 'Germany',
    'Denmark': 'Denmark',
    'Celaya': 'Mexico',
    'Florence': 'United States',
    'Kentucky': 'United States',
    '41042': 'United States',
    'Istanbul': 'Turkey',
    'Hoffman Estates': 'United States',
    '60192': 'United States',
    'Changsha': 'China',
    'Linz': 'Austria',
    'New Zealand': 'New Zealand',
    'Manisa': 'Turkey',
    'Almaty': 'Kazakhstan',
    'Kazakhstan': 'Kazakhstan',
    'Fountain Inn': 'United States',
    '29644': 'United States',
    'Oak Brook': 'United States',
    'Walnut Ridge': 'United States',
    '72476': 'United States',
    'Louisiana': 'United States',
    'Mexico City': 'Mexico',
    'Pforzheim': 'Germany',
    'Querétaro': 'Mexico',
    'Juárez': 'Mexico',
    'Idaho': 'United States',
    'Livonia': 'United States',
    '48150': 'United States',
    'Ansbach': 'Germany',
    'Watertown': 'United States',
    '02472': 'United States',
    'Eisenach': 'Germany',
    'İzmir': 'Turkey',
    'Braunschweig': 'Germany',
    'Italy': 'Italy',
    'Changzhou': 'China',
    'Atlantic': 'United States',
    'Iowa': 'United States',
    'Wrocław': 'Poland',
    'South Dakota': 'United States',
    'Auckland': 'New Zealand',
    'Grove City': 'United States',
    'Ohio': 'United States',
    '43123': 'United States',
    'Trnava': 'Slovakia',
    'Slovakia': 'Slovakia',
    'Caen': 'France',
    'San Luis Potosí': 'Mexico',
    'South Africa': 'South Africa',
    'Southern Africa': 'South Africa',
    'Ho Chi Minh City': 'Vietnam',
    'Switzerland': 'Switzerland',
    'Garrett': 'United States',
    'Warsaw': 'Poland',
    'Chicago': 'United States',
    'Dresden': 'Germany',
    'Santa Fe': 'United States',
    '': 'United States',
    'Argentina': 'Argentina',
    'Bhāvnagar': 'India',
    'Allentown': 'United States',
    'Saint Joseph': 'United States',
    '49085': 'United States',
    '53029': 'United States',  # ZIP code for Hartland, Wisconsin
    'Tulsa': 'United States',
    'Oklahoma': 'United States',
    'Mechelen': 'Belgium',
    'Amsterdam': 'Netherlands',
    'Düsseldorf': 'Germany',
    'Saitama': 'Japan',
    'Gainesville': 'United States',
    'Eindhoven': 'Netherlands',
    'Petaling Jaya': 'Malaysia',
    'Nottingham': 'United Kingdom',
    'Roseville': 'United States',
    '95747': 'United States',
    'Memphis': 'United States',
    'Shelby': 'United States',
    '28152': 'United States',
    'Manila': 'Philippines',
    'Philippines': 'Philippines',
    'North Charleston': 'United States',
    'Coimbatore': 'India',
    'Ingolstadt': 'Germany',
    'Greenville': 'United States',
    '29645': 'United States',
    'Munster': 'Germany',
    'Braga': 'Portugal',
    'Cartago': 'Costa Rica',
    'Costa Rica': 'Costa Rica',
    'Flowery Branch': 'United States',
    '30542': 'United States',
    'Corpus Christi': 'United States',
    '78406': 'United States',
    'Dayton': 'United States',
    'Aachen': 'Germany',
    'Tampa': 'United States',
    '33607': 'United States',
    'Laredo': 'United States',
    '78045': 'United States',
    '30324': 'United States',
    'Tsuchiura': 'Japan',
    'Braselton': 'United States',
    'Bonn': 'Germany',
    'Lincolnshire': 'United States',
    '60069': 'United States',
    'Austin': 'United States',
    'Philadelphia': 'United States',
    'Rio Bravo': 'Mexico',
    'Brits': 'South Africa',
    'Castelo Branco': 'Portugal',
    'Anna': 'United States',
    'Port Huron': 'United States',
    'Wuhan': 'China',
    'Cali': 'Colombia',
    'Raleigh': 'United States',
    'Heilbronn': 'Germany',
    'Conshohocken': 'United States',
    '19428': 'United States',
    'Jacksonville': 'United States',
    'Kecskemét': 'Hungary',
    'Straubing': 'Germany',
    'Summerville': 'United States',
    '29483': 'United States',
    'Gómez Palacio': 'Mexico',
    'Reutlingen': 'Germany',
    'Toluca': 'Mexico',
    'Nasushiobara': 'Japan',
    'District of Columbia': 'United States',
    'Reynosa': 'Mexico',
    'Klagenfurt': 'Austria',
    'Jiaxing': 'China',
    'Brussels': 'Belgium',
    'Vernon': 'United States',
    'Guadalajara': 'Mexico',
    'Ankara': 'Turkey',
    'Hyderābād': 'India',
    'Jersey City': 'United States',
    'New Jersey': 'United States',
    'Bochum': 'Germany',
    'Bayreuth': 'Germany',
    'Antwerp': 'Belgium',
    'Sofia': 'Bulgaria',
    'Bulgaria': 'Bulgaria',
    'Phoenix': 'United States',
    'Hangzhou': 'China',
    'Socorro': 'United States',
    '79927': 'United States',
    'New Hampshire': 'United States',
    'Pittsburgh': 'United States',
    '15222': 'United States',
    'Decatur': 'United States',
    '62441': 'United States',
    'Sri Lanka': 'Sri Lanka',
    'Montana': 'United States',
    'Detroit': 'United States',
    'Windsor': 'Canada',
    'Novi Sad': 'Serbia',
    'Miami': 'United States',
    'Jamshedpur': 'India',
    'Solihull': 'United Kingdom',
    'Mechanicsville': 'United States',
    '23111': 'United States',
    'Bengaluru': 'India',
    'El Paso': 'United States',
    'Auburn': 'United States',
    'Frankfurt': 'Germany',
    'Campo Grande': 'Brazil',
    '02215': 'United States',
    'Oeiras': 'Portugal',
    'Greer': 'United States',
    '29651': 'United States',
    'Calama': 'Chile',
    'Puteaux': 'France',
    'Gelsenkirchen': 'Germany',
    'Kocaeli': 'Turkey',
    '48040': 'United States',
    'Schweinfurt': 'Germany',
    'Sakarya': 'Turkey',
    'Innsbruck': 'Austria',
    'Sumaré': 'Brazil',
    'Seattle': 'United States',
    'Cluj-Napoca': 'Romania',
    'Qingdao': 'China',
    '48073': 'United States',
    'Asia': 'China',
    'Panama': 'Panama',
    'Grand Rapids': 'United States',
    '49512': 'United States',
    'Bourges': 'France',
    'Brno': 'Czechia',
    'Czech Republic': 'Czechia',
    'Saint Paul': 'United States',
    '55127': 'United States',
    'Zürich': 'Switzerland',
    'Duncan': 'United States',
    'Baltimore': 'United States',
    'Maryland': 'United States',
    'CA': 'United States',
    'Craiova': 'Romania',
    'Burlington': 'United States',
    '95678': 'United States',
    'High Point': 'United States',
    '27263': 'United States',
    'Welcome': 'United States',
    'Setúbal': 'Portugal',
    'Wetzlar': 'Germany',
    'Bern': 'Switzerland',
    'Dornbirn': 'Austria',
    'Salamanca': 'Spain',
    'Vienna': 'Austria',
    'Faro': 'Portugal',
    'Ambato': 'Ecuador',
    'Ecuador': 'Ecuador',
    'Lancaster': 'United States',
    'Chongqing': 'China',
    'Ramos Arizpe': 'Mexico',
    'Santa Barbara': 'United States',
    'Munich': 'Germany',
    'Bologna': 'Italy',
    'Finland': 'Finland',
    'Zwolle': 'Netherlands',
    'Milan': 'Italy',
    'Oregon': 'United States',
    'Tlaquepaque': 'Mexico',
    'Magdeburg': 'Germany',
    'Immenstadt': 'Germany',
    'Hebron': 'United States',
    '41048': 'United States',
    'Chad': 'Chad',
    'Czech': 'Czechia',
    'Lisbon': 'Portugal',
    'Hawaii': 'United States',
    'Gliwice': 'Poland',
    'Ravensburg': 'Germany',
    'Augsburg': 'Germany',
    'Sibiu': 'Romania',
    'Dubai': 'United Arab Emirates',
    'United Arab Emirates': 'United Arab Emirates',
    '11433': 'United States',
    'Salt Lake City': 'United States',
    'Utah': 'United States',
    'Allen Park': 'United States',
    '48101': 'United States',
    'San Antonio': 'United States',
    'Santa Fe Springs': 'United States',
    'Athens': 'Greece',
    '90670': 'United States',
    'Fridley': 'United States',
    '55421': 'United States',
    'San': 'United States',
    'Mali': 'Mali',
    'Kalamazoo': 'United States',
    '49009': 'United States',
    'Amiens': 'France',
    'Antofagasta': 'Chile',
    'Edmonton': 'Canada',
    'Alberta': 'Canada',
    'Bandung': 'Indonesia',
    'Kranj': 'Slovenia',
    'Morrisville': 'United States',
    '19067': 'United States',
    'Gaithersburg': 'United States',
    '20878': 'United States',
    'Weihai': 'China',
    'Jaipur': 'India',
    '30501': 'United States',
    'Brasília': 'Brazil',
    '17601': 'United States',
    'Los Angeles': 'United States',
    'San Diego': 'United States',
    'Rājkot': 'India',
    'Jamaica': 'Jamaica',
    '11430': 'United States',
    'Lincoln': 'United States',
    '68507': 'United States',
    'LA': 'United States',
    'Dallas': 'United States',
    'Vermont': 'United States',
    'Peru': 'Peru',
    '46970': 'United States',
    'Prague': 'Czechia',
    'Fenton': 'United States',
    '48430': 'United States',
    'Porto': 'Portugal',
    'Londonderry': 'United States',
    '03053': 'United States',
    'Hamamatsu': 'Japan',
    'Miramar': 'United States',
    'Bāli': 'India',
    'Mooresville': 'United States',
    '28117': 'United States',
    '18020': 'United States',
    'Vernier': 'Switzerland',
    'Turin': 'Italy',
    'Wilmington': 'United States',
    '28405': 'United States',
    'Busan': 'South Korea',
    'South Korea': 'South Korea',
    'Graz': 'Austria',
    '35401': 'United States',
    'Aschaffenburg': 'Germany',
    'Zaria': 'Nigeria',
    'Nigeria': 'Nigeria',
    'São Bernardo do Campo': 'Brazil',
    'Shah Alam': 'Malaysia',
    'Trujillo': 'Peru',
    'Leipzig': 'Germany',
    'Unity': 'United States',
    'Belgrade': 'Serbia',
    'Portland': 'United States',
    'Hagen': 'Germany',
    'Alicante': 'Spain',
    'Viseu': 'Portugal',
    'Toyota': 'Japan',
    'Bielefeld': 'Germany',
    'Wuxi': 'China',
    'Alexandria': 'Egypt',
    'Wilde': 'Argentina',
    'Orlando': 'United States',
    'Lille': 'France',
    'Sacramento': 'United States',
    'Arras': 'France',
    'Mut': 'Turkey',
    '48836': 'United States',
    'Vila Real': 'Portugal',
    'Accra': 'Ghana',
    'Southampton': 'United States',
    '18966': 'United States',
    'Kaunas': 'Lithuania',
    'Bordeaux': 'France',
    'Gießen': 'Germany',
    'Bratislava': 'Slovakia',
    'North America': 'United States',
    'Tochigi': 'Japan'
}

location_categories = job_posting_df['Location'].map(locations_mapping)
location_category_counts = location_categories.value_counts().sort_index()

print("Location by Continent - Countries Involved Count:")
print(location_category_counts)



Location by Continent - Countries Involved Count:
Location
Armenia             1
Australia          29
Austria            19
Belgium            25
Brazil             74
Bulgaria            1
Canada              5
Chad                2
Chile               1
China               9
Colombia            1
Costa Rica          2
Czechia            53
Denmark            56
Egypt              13
Finland             4
France             11
Germany           256
Ghana               2
Greece              3
Hungary            16
India             317
Indonesia           2
Ireland            15
Italy              20
Japan              93
Malaysia           98
Mexico            130
Morocco            20
Netherlands        22
New Zealand         4
Nigeria             1
Panama              2
Poland             74
Portugal          139
Romania           109
Serbia             60
Slovakia           13
Slovenia           50
Spain              21
Sri Lanka           1
Switzerland         9
Thailand         
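
In the cell above, `map` is applied to the full `Location` string, whereas the keys of `locations_mapping` are the individual comma-separated parts. A value such as `'Indiana, United States'` would therefore find no key and drop out of the counts. This is an inference from the code, since the truncated output cannot confirm it. The sketch below, with invented data and a shortened mapping, shows one possible alternative: map each part and keep the first country found per row.

```python
import pandas as pd

# Invented values; the mapping is a shortened stand-in for locations_mapping.
toy = pd.Series(
    ["Indiana, United States", "Yokohama, Japan", "Singapore", "Atlantis"],
    name="Location",
)
mapping = {
    "Indiana": "United States",
    "United States": "United States",
    "Yokohama": "Japan",
    "Japan": "Japan",
    "Singapore": "Singapore",
}

# Mapping the whole string only matches single-part locations.
whole_string = toy.map(mapping)

# Split into parts, map each part, then take the first non-missing
# country per original row (explode keeps the original row index).
parts = toy.str.split(",").explode().str.strip()
per_row = parts.map(mapping).groupby(level=0).first()

print(whole_string.value_counts())
print(per_row.value_counts())
```

Taking the first match is a design choice; rows whose parts map to different countries would need a stricter rule, such as preferring the last part.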

In [90]:
# Split the keywords first
split_keywords = job_posting_df['Keywords'].str.split(',').explode()

# Then find the unique values; this greatly reduces redundancy
unique_keywords = split_keywords.str.strip().unique()

print(unique_keywords)

[nan 'scrum' 'github' 'jenkins' 'growth' 'c++' 'linux' 'python'
 'microsoft azure' 'docker' 'business development' 'internship'
 'ecommerce' 'sap successfactors' 'e-commerce' 'servicenow' 'microsoft'
 'sap' 'cognex' 'omron' 'call center' 'hris' 'salesforce' 'social media'
 'customer success' 'contentful' 'gainsight' 'facebook' 'linkedin'
 'agorapulse' 'teamtailor' '.net' 'c#' 'angular' 'android' 'java' 'gerrit'
 'kotlin' 'power bi' 'keyence' 'bmc remedy' 'databricks'
 'azure databricks' 'microsoft excel' 'microsoft teams' 'simulink' 'novi'
 'kanban' 'real estate' 'microsoft word' 'sap s/4hana' 'informatica'
 'atlassian' 'atlassian jira' 'splunk' 'matlab' 'selenium' 'gradle'
 'postman' 'javascript' 'successfactors' 'qualtrics' 'microsoft 365'
 'contractor' 'branding' 'outbound' 'glassdoor' 'websocket' 'sigfox'
 'json' 'django' 'ansible' 'kubernetes' 'marketing campaigns' 'front-end'
 'back-end' 'angularjs' 'node.js' 'php' 'ruby' 'gatsby' 'graphql' 'remix'
 'informa' 'hubspot' 'microsoft
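
Since the goal is to find the skills that appear often, the same split-and-explode step can feed `value_counts()` instead of `unique()`. A minimal sketch on an invented Series:

```python
import pandas as pd

# Invented values standing in for the Keywords column.
toy = pd.Series(["python, docker", "python , sap", None], name="Keywords")

keyword_counts = (
    toy.dropna()           # ignore postings without keywords
       .str.split(",")     # one list of keywords per posting
       .explode()          # one row per keyword
       .str.strip()        # remove padding left over from the split
       .value_counts()     # frequency of each keyword, most common first
)

print(keyword_counts.head(10))
```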

In [91]:
keywords_skills_mapping = {

    # Programming Languages
    'c++': 'Programming Languages',
    'python': 'Programming Languages',
    'c#': 'Programming Languages',
    'java': 'Programming Languages',
    'kotlin': 'Programming Languages',
    'javascript': 'Programming Languages',
    'php': 'Programming Languages',
    'ruby': 'Programming Languages',
    'abap': 'Programming Languages',
    'go': 'Programming Languages',
    'typescript': 'Programming Languages',
    'perl': 'Programming Languages',
    'dart': 'Programming Languages',
    'c': 'Programming Languages',
    'cobol': 'Programming Languages',
    'objective-c': 'Programming Languages',
    'scala': 'Programming Languages',
    'solidity': 'Programming Languages',
    'visual basic .net': 'Programming Languages',  
    
    # Frameworks & Libraries
    '.net': 'Frameworks & Libraries',
    'angular': 'Frameworks & Libraries',
    'django': 'Frameworks & Libraries',
    'node.js': 'Frameworks & Libraries',
    'angularjs': 'Frameworks & Libraries',
    'react': 'Frameworks & Libraries',
    'spring framework': 'Frameworks & Libraries',
    'spring boot': 'Frameworks & Libraries',
    'flask': 'Frameworks & Libraries',
    'jquery': 'Frameworks & Libraries',
    'vue.js': 'Frameworks & Libraries',
    'laravel': 'Frameworks & Libraries',
    'gatsby': 'Frameworks & Libraries',
    'graphql': 'Frameworks & Libraries',
    'remix': 'Frameworks & Libraries',
    'primefaces': 'Frameworks & Libraries',
    'blazor': 'Frameworks & Libraries',
    'rxjs': 'Frameworks & Libraries',
    'svelte': 'Frameworks & Libraries',
    'drupal': 'Frameworks & Libraries',
    'jekyll': 'Frameworks & Libraries',
    'bootstrap': 'Frameworks & Libraries',
    'ext js': 'Frameworks & Libraries',
    'next.js': 'Frameworks & Libraries',
    'mui': 'Frameworks & Libraries',
    'primeng': 'Frameworks & Libraries',
    'scikit-learn': 'Frameworks & Libraries',
    'pytorch': 'Frameworks & Libraries',
    'tensorflow': 'Frameworks & Libraries',
    'ionic': 'Frameworks & Libraries',
    'j2ee': 'Frameworks & Libraries',
    'junit': 'Frameworks & Libraries',
    'nunit': 'Frameworks & Libraries',
    'redux': 'Frameworks & Libraries',
    'nette framework': 'Frameworks & Libraries',
    
    # Tools & Platforms
    'github': 'Tools & Platforms',
    'gitlab': 'Tools & Platforms',
    'jenkins': 'Tools & Platforms',
    'docker': 'Tools & Platforms',
    'kubernetes': 'Tools & Platforms',
    'ansible': 'Tools & Platforms',
    'postman': 'Tools & Platforms',
    'selenium': 'Tools & Platforms',
    'appium': 'Tools & Platforms',
    'gradle': 'Tools & Platforms',
    'gerrit': 'Tools & Platforms',
    'ranorex': 'Tools & Platforms',
    'splunk': 'Tools & Platforms',
    'matlab': 'Tools & Platforms',
    'minitab': 'Tools & Platforms',
    'autocad': 'Tools & Platforms',
    'arduino': 'Tools & Platforms',
    'stata': 'Tools & Platforms',
    'labview': 'Tools & Platforms',
    'simulink': 'Tools & Platforms',
    'ansys': 'Tools & Platforms',
    'catia': 'Tools & Platforms',
    'ptc creo': 'Tools & Platforms',
    'android studio': 'Tools & Platforms',
    'visual studio code': 'Tools & Platforms',
    'eclipse ide': 'Tools & Platforms',
    'microsoft visual studio': 'Tools & Platforms',
    'autodesk eagle': 'Tools & Platforms',
    'autodesk revit': 'Tools & Platforms',
    'corel': 'Tools & Platforms',
    'dreamweaver': 'Tools & Platforms',
    'paintshop pro': 'Tools & Platforms',
    'sourcetree': 'Tools & Platforms',
    'webstorm': 'Tools & Platforms',
    'robot framework': 'Tools & Platforms',
    
    # Cloud Services
    'microsoft azure': 'Cloud Services',
    'azure devops': 'Cloud Services',
    'google cloud': 'Cloud Services',
    'amazon web services': 'Cloud Services',
    'google cloud platform': 'Cloud Services',
    'oracle cloud': 'Cloud Services',
    'azure stream analytics': 'Cloud Services',
    'azure kubernetes service': 'Cloud Services',
    'azure pipelines': 'Cloud Services',
    'azure synapse analytics': 'Cloud Services',
    'azure sql': 'Cloud Services',
    'azure artifacts': 'Cloud Services',
    'azure boards': 'Cloud Services',
    'azure monitor': 'Cloud Services',
    'google kubernetes engine': 'Cloud Services',
    'aws iot': 'Cloud Services',
    'aws lambda': 'Cloud Services',
    'amazon ecs': 'Cloud Services',
    'amazon rds': 'Cloud Services',
    'google workspace': 'Cloud Services',
    
    # Databases
    'mysql': 'Databases',
    'mongodb': 'Databases',
    'postgresql': 'Databases',
    'oracle database': 'Databases',
    'microsoft sql server': 'Databases',
    'nosql': 'Databases',
    'sqlite': 'Databases',
    'couchbase': 'Databases',
    'redis': 'Databases',
    'netezza': 'Databases',
    'teradata analytics platform': 'Databases',
    'sybase': 'Databases',
    'mariadb': 'Databases',
    
    # Operating Systems
    'linux': 'Operating Systems',
    'windows server': 'Operating Systems',
    'macos': 'Operating Systems',
    'ubuntu': 'Operating Systems',
    'ibm aix': 'Operating Systems',
    'debian': 'Operating Systems',
    'unix': 'Operating Systems',
    'windows 11': 'Operating Systems',
    'suse': 'Operating Systems',
    'vmware esxi': 'Operating Systems',
    
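    # DevOps & CI/CD
    # Note: jenkins, docker, kubernetes, ansible, and gradle repeat keys from
    # "Tools & Platforms" above; the later entry wins, so they resolve to this category.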
    'continuous deployment': 'DevOps & CI/CD',
    'git': 'DevOps & CI/CD',
    'gitlab ci': 'DevOps & CI/CD',
    'circleci': 'DevOps & CI/CD',
    'jenkins': 'DevOps & CI/CD',  
    'docker': 'DevOps & CI/CD',  
    'kubernetes': 'DevOps & CI/CD',  
    'ansible': 'DevOps & CI/CD',  
    'gradle': 'DevOps & CI/CD',  
    
    # ERP & Business Software
    'sap': 'ERP & Business Software',
    'sap successfactors': 'ERP & Business Software',
    'sap s/4hana': 'ERP & Business Software',
    'sap hana': 'ERP & Business Software',
    'sap erp': 'ERP & Business Software',
    'sap abap': 'ERP & Business Software',
    'sap financial accounting': 'ERP & Business Software',
    'sap sales': 'ERP & Business Software',
    'sap controlling': 'ERP & Business Software',
    'sap fiori': 'ERP & Business Software',
    'sap crm': 'ERP & Business Software',
    'sap analytics cloud': 'ERP & Business Software',
    'sap bw/4hana': 'ERP & Business Software',
    'sap netweaver': 'ERP & Business Software',
    'sap strategic enterprise management': 'ERP & Business Software',
    'sap materials management': 'ERP & Business Software',
    'sap treasury and risk management': 'ERP & Business Software',
    'sap quality management': 'ERP & Business Software',
    'sap solution manager': 'ERP & Business Software',
    'sap lumira': 'ERP & Business Software',
    'sap data migration': 'ERP & Business Software',
    'sap master data governance': 'ERP & Business Software',
    'sap cloud platform': 'ERP & Business Software',
    'sap grc': 'ERP & Business Software',
    'sap enable now': 'ERP & Business Software',
    'sap business warehouse': 'ERP & Business Software',
    'sap warehouse management': 'ERP & Business Software',
    'sap sales and distribution': 'ERP & Business Software',
    'sap ariba': 'ERP & Business Software',
    'sap hybris': 'ERP & Business Software',
    'oracle': 'ERP & Business Software',
    'netsuite': 'ERP & Business Software',
    'microsoft dynamics': 'ERP & Business Software',
    'microsoft dynamics 365': 'ERP & Business Software',
    'microsoft dynamics crm': 'ERP & Business Software',
    'workday payroll': 'ERP & Business Software',
    'adp': 'ERP & Business Software',
    'ukg': 'ERP & Business Software',
    'kronos': 'ERP & Business Software',
    'coupa': 'ERP & Business Software',
    'celonis': 'ERP & Business Software',
    'alteryx': 'ERP & Business Software',
    'talend': 'ERP & Business Software',
    'informatica': 'ERP & Business Software',
    
    # CRM
    'salesforce': 'CRM',
    'salesforce commerce cloud': 'CRM',
    'salesforce marketing cloud': 'CRM',
    'hubspot': 'CRM',
    'gainsight': 'CRM',
    
    # CMS & Web Platforms
    'wordpress': 'CMS & Web Platforms',
    'drupal': 'CMS & Web Platforms',  # duplicate key: overrides the 'Frameworks & Libraries' entry above
    'sitecore': 'CMS & Web Platforms',
    'contentful': 'CMS & Web Platforms',
    'contentstack': 'CMS & Web Platforms',
    'shopify': 'CMS & Web Platforms',
    'webflow': 'CMS & Web Platforms',
    
    # Marketing & Social Media Tools
    'google analytics': 'Marketing & Social Media Tools',
    'google ads': 'Marketing & Social Media Tools',
    'google adwords': 'Marketing & Social Media Tools',
    'facebook ads': 'Marketing & Social Media Tools',
    'linkedin': 'Marketing & Social Media Tools',
    'agorapulse': 'Marketing & Social Media Tools',
    'hootsuite': 'Marketing & Social Media Tools',
    'semrush': 'Marketing & Social Media Tools',
    'ahrefs': 'Marketing & Social Media Tools',
    'marketo': 'Marketing & Social Media Tools',
    'hubspot': 'Marketing & Social Media Tools',  # duplicate key: overrides the 'CRM' entry above
    'pinterest': 'Marketing & Social Media Tools',
    'youtube': 'Marketing & Social Media Tools',
    'optimizely': 'Marketing & Social Media Tools',
    'google optimize': 'Marketing & Social Media Tools',
    'google my business': 'Marketing & Social Media Tools',
    'linkedin sales navigator': 'Marketing & Social Media Tools',
    'zoominfo': 'Marketing & Social Media Tools',
    'outreach.io': 'Marketing & Social Media Tools',
    'demandbase': 'Marketing & Social Media Tools',
    'sendoso': 'Marketing & Social Media Tools',
    'account based marketing': 'Marketing & Social Media Tools',
    'demand generation': 'Marketing & Social Media Tools',
    'content marketing': 'Marketing & Social Media Tools',
    'seo': 'Marketing & Social Media Tools',
    
    # Design Tools
    'figma': 'Design Tools',
    'adobe creative suite': 'Design Tools',
    'adobe photoshop': 'Design Tools',
    'adobe indesign': 'Design Tools',
    'adobe illustrator': 'Design Tools',
    'adobe xd': 'Design Tools',
    'adobe after effects': 'Design Tools',
    'adobe acrobat dc': 'Design Tools',
    'adobe premiere pro': 'Design Tools',
    'adobe captivate': 'Design Tools',
    'adobe framemaker': 'Design Tools',
    'canva': 'Design Tools',
    
    # Project Management
    'jira': 'Project Management',
    'trello': 'Project Management',
    'asana': 'Project Management',
    'smartsheet': 'Project Management',
    'microsoft teams': 'Project Management',
    'conceptboard': 'Project Management',
    'cutover': 'Project Management',
    'kanban': 'Project Management',
    'scrum': 'Project Management',
    
    # Networking & Security
    'cisco': 'Networking & Security',
    'cisco catalyst': 'Networking & Security',
    'cisco nexus': 'Networking & Security',
    'fortinet': 'Networking & Security',
    'vmware': 'Networking & Security',
    'kerberos': 'Networking & Security',
    'openstack': 'Networking & Security',
    'cloudflare': 'Networking & Security',
    'okta': 'Networking & Security',
    'cyberark': 'Networking & Security',
    'microsoft defender': 'Networking & Security',
    'microsoft intune': 'Networking & Security',
    'hashicorp consul': 'Networking & Security',
    'infoblox': 'Networking & Security',
    'ns1': 'Networking & Security',
    'openssl': 'Networking & Security',
    'owasp': 'Networking & Security',
    'zeromq': 'Networking & Security',
    
    # Analytics & BI
    'power bi': 'Analytics & BI',
    'tableau prep': 'Analytics & BI',  
    'qlikview': 'Analytics & BI',
    'looker': 'Analytics & BI',
    'microstrategy': 'Analytics & BI',
    'sas': 'Analytics & BI',
    'numpy': 'Analytics & BI',
    'plotly': 'Analytics & BI',
    'dataiku': 'Analytics & BI',
    'apache spark': 'Analytics & BI',
    'apache flink': 'Analytics & BI',
    'apache kafka': 'Analytics & BI',
    'grafana': 'Analytics & BI',
    'prometheus': 'Analytics & BI',
    'qualtrics': 'Analytics & BI',
    
    # Hardware
    'cognex': 'Hardware',
    'omron': 'Hardware',
    'keyence': 'Hardware',
    'arduino': 'Hardware',  # duplicate key: overrides the 'Tools & Platforms' entry above
    
    # Methodologies
    'scrum': 'Methodologies',  # duplicate key: overrides the 'Project Management' entry above
    'growth': 'Methodologies',
    'metrics driven': 'Methodologies',
    
    # Other
    'call center': 'Other',
    'hris': 'Other',
    'social media': 'Other',
    'customer success': 'Other',
    'internship': 'Other',
    'ecommerce': 'Other',
    'e-commerce': 'Other',
    'business development': 'Other',
    'real estate': 'Other',
    'contractor': 'Other',
    'branding': 'Other',
    'outbound': 'Other',
    'marketing campaigns': 'Other',
    'front-end': 'Other',
    'back-end': 'Other',
    'ui/ux': 'Other',
    'bem': 'Other',
    'prospecting': 'Other',
    'sales prospecting': 'Other',
    'copywriter': 'Other',
    'martech': 'Other',
    'blockchain': 'Other',
    'pfs': 'Other',
    'uml': 'Other',
    'json': 'Other',
    'websocket': 'Other',
    'sigfox': 'Other',
    'rss': 'Other',
    'dx': 'Other',
    'full time job': 'Other',
    'cropping': 'Other',
    'post-production': 'Other',
    'timesheets': 'Other',
    'metadata': 'Other',
    'swc': 'Other',
    'vite': 'Other',
    'xslt': 'Other',
    'cuda': 'Other',
    'webrtc': 'Other',
    'openid connect': 'Other',
    'grpc': 'Other',
    'rabbitmq': 'Other',
    'apache tomcat': 'Other',
    'apache jmeter': 'Other',
    'apache airflow': 'Other',
    'apache hadoop': 'Other',
    'lucene': 'Other',
    'solr': 'Other',
    'elasticsearch': 'Other',
    'opensearch': 'Other',
    'mlflow': 'Other',
    'dbt': 'Other',
    'walkme': 'Other',
    'jupyter': 'Other',
    'cloudera': 'Other',
    'databricks': 'Other',
    'azure databricks': 'Other',
    'snowflake': 'Other',
    'hadoop': 'Other',
    'atlassian': 'Other',
    'atlassian jira': 'Other',
    'atlassian bitbucket': 'Other',
    'concur': 'Other',
    'servicenow': 'Other',
    'bmc remedy': 'Other',
    'microsoft excel': 'Other',
    'microsoft word': 'Other',
    'microsoft powerpoint': 'Other',
    'microsoft outlook': 'Other',
    'microsoft 365': 'Other',
    'office 365': 'Other',
    'microsoft power bi': 'Other',
    'microsoft power apps': 'Other',
    'microsoft power automate': 'Other',
    'microsoft sharepoint': 'Other',
    'microsoft sharepoint online': 'Other',
    'microsoft active directory': 'Other',
    'microsoft system center': 'Other',
    'microsoft advertising': 'Other',
    'microsoft signalr': 'Other',
    'microsoft.net': 'Other',
    'google sheets': 'Other',
    'google drive': 'Other',
    'zoom': 'Other',
    'skype': 'Other',
    'whatsapp': 'Other',
    'glassdoor': 'Other',
    'cvs': 'Other',
    'windchill': 'Other',
    'centro': 'Other',
    'meister': 'Other',
    'novi': 'Other',
    'ativo': 'Other',
    'como': 'Other',
    'venda': 'Other',
    'ams': 'Other',
    'brt': 'Other',
    'aos': 'Other',
    'rms': 'Other',
    'iis': 'Other',
    'nginx': 'Other',
    'jboss': 'Other',
    'webmethods': 'Other',
    'outsystems': 'Other',
    'axure': 'Other',
    'unity3d': 'Other',
    'pwa': 'Other',
    'jda': 'Other',
    'blue yonder': 'Other',
    'kinaxis': 'Other',
    'personio': 'Other',
    'swimlane': 'Other',
    'tricentis': 'Other',
    'cognigy': 'Other',
    'genesys': 'Other',
    'dhl': 'Other',
    'fedex': 'Other',
    'ups': 'Other',
    'vodafone': 'Other',
    'bt': 'Other',
    'juniper networks': 'Other',
    'thingworx': 'Other',
    'oracle enterprise manager': 'Other',
    'glpi': 'Other',
    'goanywhere': 'Other',
    'carta': 'Other',
    'mobx': 'Other',
    'archimate': 'Other',
    'stripe': 'Other',
    'recurly': 'Other',
    'chargebee': 'Other',
    'netcore': 'Other',
    'quickbooks': 'Other',
    'hyperion': 'Other',
    'recognize': 'Other',
    'engagez': 'Other',
    'borlabs cookie': 'Other',
    'leadfeeder': 'Other',
    'bizzabo': 'Other',
    'hopin': 'Other',
    'provenexpert': 'Other',
    'salsify': 'Other',
    'boundary': 'Other',
    'kahoot': 'Other',
    'jonas': 'Other',
    'dynatrace': 'Other',
    'conveyor': 'Other',
    'demandware': 'Other',
    'vue storefront': 'Other',
    'northstar': 'Other',
    'blackline': 'Other',
    'churnzero': 'Other',
    'planhat': 'Other',
    'salto': 'Other',
    'nextgen': 'Other',
    'artifactory': 'Other',
    'sonarqube': 'Other',
    'icinga': 'Other',
    'new relic': 'Other',
    'appdynamics': 'Other',
    'smartrecruiters': 'Other',
    'workstream': 'Other',
    'edenred': 'Other',
    'articulate 360': 'Other',
    'covisint': 'Other',
    'hydrogen': 'Other',
    'jetpack': 'Other',
    'outreach': 'Other',
    'fivetran': 'Other',  
    'sieben': 'Other',
    'buildout': 'Other',
    'wonderlic': 'Other',
    'methode': 'Other',
    'tandem': 'Other',
    'datasphere': 'Other',
    'weights & biases': 'Other',
    'zuora': 'Other',
    'cogent communications': 'Other',
    'mobileiron': 'Other',
    '1password': 'Other',
    'insightsquared': 'Other',
    'avid pro tools': 'Other',
    'boomi': 'Other',
    'mulesoft anypoint platform': 'Other',
    'dataweave': 'Other',
    'oracle exadata': 'Other',
    'centricity': 'Other',
    'support hero': 'Other',
    'keyless': 'Other',
    'automic': 'Other',
    'ethereum': 'Other',
    'ethers': 'Other',
    'citrix': 'Other',
    'planisware': 'Other',
    'commercetools': 'Other',
    'atento': 'Other',
    'agosto': 'Other',
    'informa': 'Other',
}


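# Map each row's full `Keywords` string to a skill category.
# The column is not split here, so only rows whose value is exactly one
# keyword from the mapping get a category; every other row becomes NaN.
keywordsSkills_categories = job_posting_df['Keywords'].map(keywords_skills_mapping)

# Count postings per category (NaN rows are excluded), sorted by category name
keywords_skills_counts = keywordsSkills_categories.value_counts().sort_index()
print("Keywords often mentioned in job posts - Sorted through skills:")
print(keywords_skills_counts)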


Keywords often mentioned in job posts - Sorted through skills:
Keywords
Analytics & BI                      70
CRM                                  4
Cloud Services                       6
Databases                            4
Design Tools                         2
DevOps & CI/CD                      11
ERP & Business Software           1092
Frameworks & Libraries               1
Hardware                            19
Marketing & Social Media Tools       1
Methodologies                      176
Networking & Security               13
Operating Systems                    6
Other                              517
Programming Languages               67
Project Management                   5
Tools & Platforms                   79
Name: count, dtype: int64
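
Mapping the unsplit `Keywords` column only matches postings that list a single keyword. One possible alternative, sketched here on hypothetical data (`sample_df` and the shortened `sample_mapping` are illustrative and not taken from the dataset), is to explode the keywords first and then map each one, so multi-keyword postings also contribute to the counts.

```python
import pandas as pd

# Hypothetical data and a shortened mapping, for illustration only
sample_df = pd.DataFrame({
    'Keywords': ['python, docker', 'sap', None, 'java, sap, unknown tool'],
})
sample_mapping = {
    'python': 'Programming Languages',
    'java': 'Programming Languages',
    'docker': 'DevOps & CI/CD',
    'sap': 'ERP & Business Software',
}

# Split into one keyword per row before mapping
keywords_long = (
    sample_df['Keywords'].str.split(',').explode().str.strip().dropna()
)

# Keywords missing from the mapping get an explicit label instead of NaN
categories = keywords_long.map(sample_mapping).fillna('Unmapped')

# Count keyword mentions per category, sorted by category name
category_counts = categories.value_counts().sort_index()
print(category_counts)
```

These counts are keyword mentions, not postings: a posting that lists two programming languages adds two to that category.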


## Potential Implications of the Data

## Structure of the Data

## Key Data Fields 

This section provides a brief description of the key attributes present in the dataset:


- **Job Posting Date**: Captures the date a job is listed. This is crucial for job seekers and HR professionals to stay updated on the latest opportunities and trends.

- **Job Title**: Specifies the position being advertised. This helps in categorizing and filtering job openings based on industry roles and career interests.

- **Company Name**: Lists the hiring company. This information assists job seekers in targeting their applications and helps businesses track competitors and market trends.

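- **Job Location**: Provides the job's geographic location. Although the dataset is titled for Singapore, the `Location` values seen earlier in this notebook span many countries. Job seekers use this field to find opportunities in specific areas, while employers analyze regional talent and market conditions.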

- **Job Description**: Includes details about responsibilities, required qualifications, and other relevant aspects. This is vital for candidates to determine if they meet the requirements and for recruiters to communicate expectations clearly.

## General Research Question 

In [None]:
# Checking for Multiple Representations of the Same Categorical Values
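
A minimal sketch of this check, assuming a categorical column such as `Seniority`. The sample values, including `non_manager`, are hypothetical and only illustrate case and whitespace variants.

```python
import pandas as pd

# Hypothetical values showing case and whitespace variants of one category
sample_df = pd.DataFrame({
    'Seniority': ['manager', 'Manager ', 'non_manager', 'MANAGER', None],
})

# Raw distinct values expose the spelling variants
print(sample_df['Seniority'].unique())

# Normalize by trimming whitespace and lowercasing, then recount
normalized = sample_df['Seniority'].str.strip().str.lower()
print(normalized.value_counts(dropna=False))
```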

In [None]:
# Checking for Incorrect Datatypes
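
A possible starting point for this check is to inspect `dtypes` and convert the timestamp columns, which load as plain strings. The sketch uses a hypothetical two-row frame; the `'not a date'` value is invented to show how unparseable entries are handled.

```python
import pandas as pd

# Hypothetical rows; timestamps follow the ISO format seen in the dataset preview
sample_df = pd.DataFrame({
    'First Seen At': ['2024-05-29T19:59:45Z', '2024-06-01T08:00:00Z'],
    'Last Seen At': ['2024-07-31T14:35:44Z', 'not a date'],
})

# Before conversion, both columns hold strings
print(sample_df.dtypes)

# Convert to timezone-aware datetimes; unparseable values become NaT
for col in ['First Seen At', 'Last Seen At']:
    sample_df[col] = pd.to_datetime(sample_df[col], utc=True, errors='coerce')

print(sample_df.dtypes)
```

Using `errors='coerce'` keeps the conversion from failing on bad rows, and those rows then show up in the missing-value check.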

In [None]:
# Checking for Missing/Null Values
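
One way to summarize missing values is a per-column count and percentage. The frame below is hypothetical; only the column names echo the dataset preview.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with different amounts of missing data per column
sample_df = pd.DataFrame({
    'Ticker': [np.nan, 'ABC', np.nan],
    'Salary': [np.nan, np.nan, np.nan],
    'Job Status': ['closed', 'open', 'closed'],
})

# Count and percentage of missing values per column, worst columns first
missing_summary = pd.DataFrame({
    'missing_count': sample_df.isna().sum(),
    'missing_pct': (sample_df.isna().mean() * 100).round(1),
}).sort_values('missing_count', ascending=False)
print(missing_summary)
```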

In [None]:
# Checking for Duplicate Data
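
A sketch of a duplicate check, assuming `Job Opening URL` can act as an identifier for a posting. That assumption is not confirmed by the dataset documentation, and the rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the first two are exact duplicates
sample_df = pd.DataFrame({
    'Job Opening Title': ['Engineer', 'Engineer', 'Analyst'],
    'Job Opening URL': [
        'https://example.com/1',
        'https://example.com/1',
        'https://example.com/2',
    ],
})

# Rows identical across every column
full_duplicates = sample_df.duplicated().sum()

# Rows sharing the same URL, regardless of the other columns
url_duplicates = sample_df.duplicated(subset=['Job Opening URL']).sum()

# Keep the first occurrence of each URL
deduplicated = sample_df.drop_duplicates(subset=['Job Opening URL'], keep='first')
print(full_duplicates, url_duplicates, len(deduplicated))
```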

In [None]:
# Checking for Inconsistent Data

In [None]:
# Checking for Outliers
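
For numeric fields, the 1.5 × IQR rule is one common way to flag outliers. The series below, `days_listed`, is a hypothetical derived measure (for example, days between first and last sighting) with invented values.

```python
import pandas as pd

# Hypothetical numeric measure; values are illustrative only
days_listed = pd.Series([10, 12, 14, 15, 16, 18, 20, 400], name='days_listed')

# Interquartile range and the conventional 1.5 * IQR fences
q1 = days_listed.quantile(0.25)
q3 = days_listed.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = days_listed[(days_listed < lower) | (days_listed > upper)]
print(outliers)
```

Flagged values are candidates for review, not automatic removals; a long-running posting may be legitimate.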

## Matplotlib Chart Visualizations

### EDA Question 1

Both the formulation and the answer go in the same cell.

### EDA Question 2

### EDA Question 3