# Job Posting Data Analysis
In this notebook, the group will be working with the [Job Posting in Singapore](https://www.kaggle.com/datasets/techsalerator/job-posting-data-in-singapore) dataset. This dataset will be used for processing, analyzing, and visualizing data.

This project is carried out by the group **DS NERDS**, under Section **S19**, which consists of the following members:
- Colobong, Franz Andrick
- Chu, Andre Benedict M. 
- Pineda, Mark Gabriel A.
- Rocha, Angelo H. 
  
The output fulfills part of the requirements for the course Statistical Modeling and Simulation (CSMODEL). 


# Import Libraries

TO-DO:
Put a brief description for each module used and how it was used in the notebook.


In [599]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import json

## Dataset Description and Collection Process

This dataset offers a comprehensive overview of job openings across various sectors in Singapore. It provides an essential resource for businesses, job seekers, and labor market analysts, and it can also be a valuable tool for people who would like to be informed about job openings and employment trends in Singapore.

The data was collected by a global data provider called **Techsalerator**, by consolidating and categorizing job-related information from diverse sources, including company websites, job boards, and recruitment agencies. 

Now, let us load the CSV file into our workspace with **'latin1'** encoding as it contains special characters (e.g., é, ñ, ’) that caused a UnicodeDecodeError with the default **'utf-8'** encoding.

In [600]:
job_posting_df = pd.read_csv('Job Posting.csv', encoding='latin1')
job_posting_df.head(200)

Unnamed: 0,Website Domain,Ticker,Job Opening Title,Job Opening URL,First Seen At,Last Seen At,Location,Location Data,Category,Seniority,...,Description,Salary,Salary Data,Contract Types,Job Status,Job Language,Job Last Processed At,O*NET Code,O*NET Family,O*NET Occupation Name
0,bosch.com,,IN_RBAI_Assistant Manager_Dispensing Process E...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-05-29T19:59:45Z,2024-07-31T14:35:44Z,"Indiana, United States","[{""city"":null,""state"":""Indiana"",""zip_code"":nul...","engineering, management, support",manager,...,**IN\_RBAI\_Assistant Manager\_Dispensing Proc...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-08-02T14:47:55Z,43-1011.00,Office and Administrative Support,First-Line Supervisors of Office and Administr...
1,bosch.com,,Professional Internship: Hardware Development ...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-05-04T01:00:12Z,2024-07-29T17:46:16Z,"Delaware, United States","[{""city"":null,""state"":""Delaware"",""zip_code"":nu...",internship,non_manager,...,**Professional Internship: Hardware Developmen...,,"{""salary_low"":null,""salary_high"":null,""salary_...","full time, internship, m/f",closed,en,2024-07-31T17:50:07Z,17-2061.00,Architecture and Engineering,Computer Hardware Engineers
2,zf.com,,Process Expert BMS Production,https://jobs.zf.com/job/Shenyang-Process-Exper...,2024-04-19T06:47:24Z,2024-05-16T02:25:08Z,China,"[{""city"":null,""state"":null,""zip_code"":null,""co...",engineering,non_manager,...,ZF is a global technology company supplying sy...,,"{""salary_low"":null,""salary_high"":null,""salary_...",,closed,en,2024-05-18T02:32:04Z,51-9141.00,Production,Semiconductor Processing Technicians
3,bosch.com,,DevOps Developer with Python for ADAS Computin...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-08-16T10:20:37Z,2024-08-22T11:14:49Z,Romania,"[{""city"":null,""state"":null,""zip_code"":null,""co...","information_technology, software_development",non_manager,...,**DevOps Developer with Python for ADAS Comput...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-08-23T00:33:30Z,15-1252.00,Computer and Mathematical,Software Developers
4,bosch.com,,Senior Engineer Sales - Video Systems and Solu...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-07-01T17:31:20Z,2024-08-01T05:11:33Z,India,"[{""city"":null,""state"":null,""zip_code"":null,""co...","engineering, sales",non_manager,...,**Senior Engineer Sales - Video Systems and So...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-08-02T19:03:16Z,41-9031.00,Sales and Related,Sales Engineers
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,heraeus.com,,Werkstudent / Praktikant / Abschlussarbeit Sup...,https://jobs.heraeus.com/job/Kleinostheim-Werk...,2024-04-30T17:12:07Z,2024-06-13T12:37:09Z,"Delaware, United States","[{""city"":null,""state"":""Delaware"",""zip_code"":nu...",internship,manager,...,**Werkstudent / Praktikant / Abschlussarbeit S...,,"{""salary_low"":null,""salary_high"":null,""salary_...",vollzeit,closed,de,2024-06-14T20:08:13Z,11-3071.04,Management,Supply Chain Managers
196,bosch.com,,Prctica Tcnico Prevencin de Riesgos,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-05-06T20:15:31Z,2024-08-26T15:35:45Z,Chile,"[{""city"":null,""state"":null,""zip_code"":null,""co...","engineering, healthcare_services",non_manager,...,**Prctica Tcnico Prevencin de Riesgos**\n\n...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-08-28T15:45:45Z,19-4042.00,"Life, Physical, and Social Science",Environmental Science and Protection Technicia...
197,bosch.com,,Bosch - Szchenyi Jobfair Gy_r (2024 Spring) T...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-04-08T21:26:27Z,2024-04-23T01:06:41Z,"Budapest, Hungary","[{""city"":""Budapest"",""state"":null,""zip_code"":nu...",,non_manager,...,**Bosch - Szchenyi Jobfair Gy_r (2024 Spring)...,,"{""salary_low"":null,""salary_high"":null,""salary_...","contract, full time, internship",closed,en,2024-04-25T01:12:23Z,49-3023.00,"Installation, Maintenance, and Repair",Automotive Service Technicians and Mechanics
198,bosch.com,,Foreign Trade Specialist,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-06-12T13:18:01Z,2024-07-04T14:36:29Z,"Budapest, Hungary","[{""city"":""Budapest"",""state"":null,""zip_code"":nu...",,non_manager,...,**Foreign Trade Specialist**\n\n\n* Full-time\...,,"{""salary_low"":null,""salary_high"":null,""salary_...",full time,closed,en,2024-07-06T14:41:44Z,13-1199.00,Business and Financial Operations,"Business Operations Specialists, All Other"
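
The encoding choice described above can be made explicit with a small fallback helper. The following is a minimal sketch, not part of the original notebook: `read_csv_with_fallback` and the sample bytes are hypothetical names and data used only for illustration. One caveat is that `latin1` maps every possible byte to a character, so it never raises a decode error; it can therefore load a file written in another encoding and display some characters incorrectly without any warning.

```python
import io
import pandas as pd

def read_csv_with_fallback(raw_bytes, encodings=("utf-8", "latin1")):
    """Decode raw CSV bytes with the first candidate encoding that works."""
    for encoding in encodings:
        try:
            text = raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return pd.read_csv(io.StringIO(text)), encoding
    raise ValueError("none of the candidate encodings could decode the data")

# Hypothetical sample: one accented job title stored as latin1 bytes
sample_bytes = "Job Opening Title,Location\nPráctica Técnico,Chile\n".encode("latin1")
sample_df, used_encoding = read_csv_with_fallback(sample_bytes)
print(used_encoding)
```

Returning the encoding that succeeded makes it easier to record in the notebook which decoding was actually applied.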


## Potential Implications of the Data

## Structure of the Data

## Key Data Fields 

This section provides a brief description of the key attributes present in the dataset:


- **Job Posting Date**: Captures the date a job is listed. This is crucial for job seekers and HR professionals to stay updated on the latest opportunities and trends.

- **Job Title**: Specifies the position being advertised. This helps in categorizing and filtering job openings based on industry roles and career interests.

- **Company Name**: Lists the hiring company. This information assists job seekers in targeting their applications and helps businesses track competitors and market trends.

- **Job Location**: Provides the job's geographic location within Singapore. Job seekers use this to find opportunities in specific areas, while employers analyze regional talent and market conditions.

- **Job Description**: Includes details about responsibilities, required qualifications, and other relevant aspects. This is vital for candidates to determine if they meet the requirements and for recruiters to communicate expectations clearly.

In [601]:
key_data_fields = job_posting_df[['First Seen At', 'Job Opening Title', 'Job Opening URL', 'Location', 'Description']]
key_data_fields.head()

Unnamed: 0,First Seen At,Job Opening Title,Job Opening URL,Location,Description
0,2024-05-29T19:59:45Z,IN_RBAI_Assistant Manager_Dispensing Process E...,https://jobs.smartrecruiters.com/BoschGroup/74...,"Indiana, United States",**IN\_RBAI\_Assistant Manager\_Dispensing Proc...
1,2024-05-04T01:00:12Z,Professional Internship: Hardware Development ...,https://jobs.smartrecruiters.com/BoschGroup/74...,"Delaware, United States",**Professional Internship: Hardware Developmen...
2,2024-04-19T06:47:24Z,Process Expert BMS Production,https://jobs.zf.com/job/Shenyang-Process-Exper...,China,ZF is a global technology company supplying sy...
3,2024-08-16T10:20:37Z,DevOps Developer with Python for ADAS Computin...,https://jobs.smartrecruiters.com/BoschGroup/74...,Romania,**DevOps Developer with Python for ADAS Comput...
4,2024-07-01T17:31:20Z,Senior Engineer Sales - Video Systems and Solu...,https://jobs.smartrecruiters.com/BoschGroup/74...,India,**Senior Engineer Sales - Video Systems and So...


Now, we will check for missing or null values in the dataset. Upon inspection, we can see that the **`Ticker`** column—referring to the stock ticker symbol of the company that posted the job—contains only null values.

Since this column provides no usable information for analysis or modeling, we can safely drop it from the dataset.


In [602]:
# Check the Ticker column
null_count = job_posting_df['Ticker'].isna().sum()
print("Unique Values:", job_posting_df['Ticker'].unique())
print(f"Number of null values: {null_count}")

# Drop the column
job_posting_df = job_posting_df.drop(columns=['Ticker'])

Unique Values: [nan]
Number of null values: 9919
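
The same check can be generalized to every column at once. The sketch below uses a small hypothetical frame (`toy_df`) in place of `job_posting_df` and lists the columns whose values are all null, which is the condition that justified dropping `Ticker`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for job_posting_df
toy_df = pd.DataFrame({
    "Ticker": [np.nan, np.nan, np.nan],
    "Salary": [np.nan, "$19.50", np.nan],
    "Job Status": ["closed", "closed", None],
})

# A column is a drop candidate when every one of its values is null
all_null_columns = toy_df.columns[toy_df.isna().all()].tolist()
print(all_null_columns)

# Drop only the columns that carry no information at all
cleaned_df = toy_df.drop(columns=all_null_columns)
```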


Now, we will check the **`Salary Data`** column to understand the details included in the salary information for each job posting. To avoid modifying the original `job_posting_df`, we will create a copy and store it in `salary_df`. This allows us to safely transform and clean the salary-related data without affecting the source DataFrame.

In [603]:
salary_df = job_posting_df.copy()
salary_df['Salary Data']


0       {"salary_low":null,"salary_high":null,"salary_...
1       {"salary_low":null,"salary_high":null,"salary_...
2       {"salary_low":null,"salary_high":null,"salary_...
3       {"salary_low":null,"salary_high":null,"salary_...
4       {"salary_low":null,"salary_high":null,"salary_...
                              ...                        
9914    {"salary_low":null,"salary_high":null,"salary_...
9915    {"salary_low":null,"salary_high":null,"salary_...
9916    {"salary_low":null,"salary_high":null,"salary_...
9917    {"salary_low":null,"salary_high":null,"salary_...
9918    {"salary_low":null,"salary_high":null,"salary_...
Name: Salary Data, Length: 9919, dtype: object

Upon inspection, we notice that each salary entry is a **JSON object** that is currently stored as a **JSON string**.

To make this data usable, we will:

1. **Parse** each string into a Python dictionary.
2. **Normalize** the dictionary so that each key becomes its own separate column in the DataFrame.

This will give us a clearer structure, allowing us to inspect and clean salary values more effectively.


In [604]:
salary_df = job_posting_df.copy()

# Parse each JSON string into a Python dictionary
salary_df['Salary Data'] = salary_df['Salary Data'].apply(
    lambda x: json.loads(x) if isinstance(x, str) else x
)

# Normalize Salary Data so that each key becomes its own column
salary_df = pd.json_normalize(salary_df['Salary Data'])
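
To see the two steps on a small scale, the sketch below applies them to a hypothetical two-row sample (`raw_salary`) that mimics the structure of the `Salary Data` column. The row labels `176` and `177` are arbitrary and chosen only to show that the original index can be carried over after normalization; in the notebook this step is not needed because the DataFrame already uses the default 0-based index.

```python
import json
import pandas as pd

# Hypothetical sample mimicking the structure of the Salary Data column
raw_salary = pd.Series(
    [
        '{"salary_low": null, "salary_high": null, "salary_currency": null}',
        '{"salary_low": 19.5, "salary_high": 19.5, "salary_currency": "USD"}',
    ],
    index=[176, 177],
)

# Step 1: parse each JSON string into a Python dictionary
parsed = raw_salary.apply(json.loads)

# Step 2: expand every key into its own column
normalized = pd.json_normalize(parsed.tolist())
normalized.index = raw_salary.index  # keep the original row labels

print(normalized)
```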

By running `salary_df.info()`, we can observe that out of 9,919 job postings, only **434** entries contain salary-related information. 

Since salary is a critical detail when analyzing job data, we want to ensure our next steps focus only on entries where salary is provided. To simplify our cleaning process, we will **temporarily drop rows with null values** for salary-related fields.


In [605]:
salary_df.info() 

# Drop rows with any null values
salary_df.dropna(inplace=True)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9919 entries, 0 to 9918
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   salary_low        434 non-null    float64
 1   salary_high       434 non-null    float64
 2   salary_currency   434 non-null    object 
 3   salary_low_usd    434 non-null    float64
 4   salary_high_usd   434 non-null    float64
 5   salary_time_unit  434 non-null    object 
dtypes: float64(4), object(2)
memory usage: 465.1+ KB


Now that we've removed rows with null values, we can inspect the unique values present in each field. 

In particular, the **`salary_currency`** column contains two distinct values: **USD** and **EUR**.


In [606]:
salary_df['salary_currency'].value_counts()

salary_currency
USD    291
EUR    143
Name: count, dtype: int64

After checking the `salary_currency` field, we observe that most job salaries are already in **USD**. 

To ensure consistency in our analysis, we will normalize the data by converting all **EUR** salaries to **USD** using the exchange rate as of **June 17, 2025**:

- **1 EUR = 1.15 USD**

This conversion allows us to compare salaries more accurately and ensures uniformity across the dataset.


In [607]:
# Define conversion rate from EUR to USD
conversion_rate = 1.15

# Convert EUR to USD
for index, row in salary_df.iterrows():
    if row['salary_currency'] == 'EUR':
        salary_df.loc[index, 'salary_low'] = row['salary_low'] * conversion_rate
        salary_df.loc[index, 'salary_high'] = row['salary_high'] * conversion_rate
        salary_df.loc[index, 'salary_currency'] = 'USD'

# Drop the redundant pre-converted USD salary columns
salary_df.drop(columns=['salary_low_usd', 'salary_high_usd'], inplace=True, errors='ignore')

salary_df

Unnamed: 0,salary_low,salary_high,salary_currency,salary_time_unit
31,49.45,49.45,USD,hour
33,34437.90,34437.90,USD,year
166,171000.00,190000.00,USD,year
177,19.50,19.50,USD,hour
213,234062.00,245000.00,USD,year
...,...,...,...,...
9624,1087.90,1087.90,USD,month
9652,43.00,66.00,USD,hour
9758,16.50,16.50,USD,hour
9772,42262.50,42262.50,USD,year
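
The row-by-row loop above can also be written with a boolean mask, which lets pandas update all EUR rows in one step. This is a sketch on hypothetical data (`toy_salary`), using the same fixed rate of 1.15 stated in the text; it is an alternative formulation, not the notebook's own code.

```python
import pandas as pd

EUR_TO_USD = 1.15  # fixed rate quoted in the text above

# Hypothetical rows with the same columns as salary_df
toy_salary = pd.DataFrame(
    {
        "salary_low": [43.0, 171000.0],
        "salary_high": [43.0, 190000.0],
        "salary_currency": ["EUR", "USD"],
    },
    index=[31, 166],
)

# Boolean mask selecting the rows quoted in euros
is_eur = toy_salary["salary_currency"] == "EUR"

# Scale both bounds for those rows only, then relabel the currency
toy_salary.loc[is_eur, ["salary_low", "salary_high"]] *= EUR_TO_USD
toy_salary.loc[is_eur, "salary_currency"] = "USD"

print(toy_salary)
```

Rows already in USD are untouched because the mask excludes them.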


With all salaries expressed in USD, we next inspect the `salary_time_unit` column, which records the period each salary is quoted for.

In [608]:
salary_df['salary_time_unit'].value_counts()

salary_time_unit
year     245
hour     132
month     57
Name: count, dtype: int64

The `salary_time_unit` column takes three values: **hour**, **month**, and **year**. These indicate the period over which each salary is quoted.

We notice that most salaries are already given on an **annual basis**. To maintain consistency and enable easier comparisons, we will convert all salaries to **annual salary**.

#### Conversion Formulas:
- **Monthly to Annual**:
  - `annual_salary = monthly_salary * 12`

- **Hourly to Annual** (assuming a full-time schedule of 40 hours per week, 52 weeks per year):
  - `hours_per_week = 40`
  - `weeks_per_year = 52`
  - `hourly_to_annual = 40 * 52 = 2080`

In [609]:
# Conversion factors
monthly_to_annual = 12
hours_per_week = 40
weeks_per_year = 52
hourly_to_annual = hours_per_week * weeks_per_year  # 40 * 52 = 2080

for index, row in salary_df.iterrows():
    # Convert hourly salaries to annual
    if (row['salary_time_unit'] == 'hour'):
        salary_df.loc[index, 'salary_low'] = row['salary_low'] * hourly_to_annual
        salary_df.loc[index, 'salary_high'] = row['salary_high'] * hourly_to_annual
        salary_df.loc[index, 'salary_time_unit'] = 'year'
    
    # Convert monthly salaries to annual
    elif (row['salary_time_unit'] == 'month'):
        salary_df.loc[index, 'salary_low'] = row['salary_low'] * monthly_to_annual
        salary_df.loc[index, 'salary_high'] = row['salary_high'] * monthly_to_annual
        salary_df.loc[index, 'salary_time_unit'] = 'year'

    # Annual salaries are already in the target unit, so no conversion is needed

    
salary_df


Unnamed: 0,salary_low,salary_high,salary_currency,salary_time_unit
31,102856.0,102856.0,USD,year
33,34437.9,34437.9,USD,year
166,171000.0,190000.0,USD,year
177,40560.0,40560.0,USD,year
213,234062.0,245000.0,USD,year
...,...,...,...,...
9624,13054.8,13054.8,USD,year
9652,89440.0,137280.0,USD,year
9758,34320.0,34320.0,USD,year
9772,42262.5,42262.5,USD,year
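
The conversion formulas can also be expressed as a lookup table of multipliers, which avoids iterating over rows. The sketch below runs on hypothetical values (`toy_salary`); `ANNUAL_FACTOR` is an illustrative name. A time unit that is missing from the dictionary would map to `NaN`, which makes unexpected categories easy to spot.

```python
import pandas as pd

# Multipliers taken from the conversion formulas above
ANNUAL_FACTOR = {"hour": 40 * 52, "month": 12, "year": 1}

# Hypothetical rows with the same columns as salary_df
toy_salary = pd.DataFrame(
    {
        "salary_low": [19.5, 1000.0, 171000.0],
        "salary_high": [19.5, 1200.0, 190000.0],
        "salary_time_unit": ["hour", "month", "year"],
    }
)

# Look up the multiplier for each row, then scale both salary bounds
factor = toy_salary["salary_time_unit"].map(ANNUAL_FACTOR)
toy_salary["salary_low"] = toy_salary["salary_low"] * factor
toy_salary["salary_high"] = toy_salary["salary_high"] * factor
toy_salary["salary_time_unit"] = "year"

print(toy_salary)
```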


Now that all salaries are in the same currency (**USD**) and time unit (**annual**), we can focus on the `salary_low` and `salary_high` fields.

These two fields represent the **lower and upper bounds** of the offered salary range. To simplify the analysis and create a single representative salary value, we will take the **mean** of these two values.

This gives us a new column, `annual_salary`, which reflects the average offered salary for each job.

In [610]:
salary_df['annual_salary'] = (
    salary_df[['salary_low', 'salary_high']].mean(axis=1)
)

salary_df


Unnamed: 0,salary_low,salary_high,salary_currency,salary_time_unit,annual_salary
31,102856.0,102856.0,USD,year,102856.0
33,34437.9,34437.9,USD,year,34437.9
166,171000.0,190000.0,USD,year,180500.0
177,40560.0,40560.0,USD,year,40560.0
213,234062.0,245000.0,USD,year,239531.0
...,...,...,...,...,...
9624,13054.8,13054.8,USD,year,13054.8
9652,89440.0,137280.0,USD,year,113360.0
9758,34320.0,34320.0,USD,year,34320.0
9772,42262.5,42262.5,USD,year,42262.5


Now that we've created the `annual_salary` column, the original fields—`salary_low`, `salary_high`, `salary_currency`, and `salary_time_unit`—are no longer needed for further analysis.

To clean up the DataFrame and simplify its structure, we will drop these columns.


In [611]:
salary_df.drop(columns=['salary_low', 'salary_high', 'salary_currency', 'salary_time_unit'], inplace=True)
salary_df

Unnamed: 0,annual_salary
31,102856.0
33,34437.9
166,180500.0
177,40560.0
213,239531.0
...,...
9624,13054.8
9652,113360.0
9758,34320.0
9772,42262.5


Now that we've cleaned and normalized the salary information into a single `annual_salary` column, we can integrate it back into the original `job_posting_df`.

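We will assign this as a new column called `Annual_Salary`, allowing us to analyze job postings alongside their corresponding annual salaries. Because `salary_df` still carries the original row labels, pandas aligns the values by index, and postings without salary information receive `NaN`.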

In [618]:
# Add the annual salary to the original job_posting_df
job_posting_df['Annual_Salary'] = salary_df['annual_salary']
job_posting_df[job_posting_df['Annual_Salary'].notnull()]

Unnamed: 0,Website Domain,Job Opening Title,Job Opening URL,First Seen At,Last Seen At,Location,Location Data,Category,Seniority,Keywords,...,Salary,Salary Data,Contract Types,Job Status,Job Language,Job Last Processed At,O*NET Code,O*NET Family,O*NET Occupation Name,Annual_Salary
31,zf.com,Test Driver (m/f/d),https://jobs.zf.com/job/Mutliva-Baja-Test-Driv...,2024-04-22T21:04:16Z,2024-05-07T01:52:44Z,"Pamplona, Spain","[{""city"":""Pamplona"",""state"":null,""zip_code"":nu...","quality_assurance, manual_work",non_manager,,...,ZF reported sales of Û43,"{""salary_low"":43.0,""salary_high"":43.0,""salary_...",m/f,closed,en,2024-05-09T01:57:38Z,53-3032.00,Transportation and Material Moving,Heavy and Tractor-Trailer Truck Drivers,102856.0
33,bosch.com,Praktikum Projektmanagement im technischen Ein...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-03-29T10:39:29Z,2024-07-18T03:17:52Z,"Salzburg, Austria","[{""city"":""Salzburg"",""state"":null,""zip_code"":nu...","internship, management, purchasing",manager,SAP,...,"Das Bruttogehalt betrgt 29.946,00 EUR p","{""salary_low"":29946.0,""salary_high"":29946.0,""s...",vollzeit,closed,de,2024-07-20T03:21:54Z,13-1082.00,Business and Financial Operations,Project Management Specialists,34437.9
166,contentful.com,"Senior Analyst, People Technology - Workday Pa...",https://www.contentful.com/careers/job/6191264,2024-08-19T19:12:56Z,2024-09-04T07:40:14Z,"San Francisco, California, United States","[{""city"":""San Francisco"",""state"":""California"",...","data_analysis, engineering, human_resources",non_manager,"Growth, Social Media, Kanban, HRIS, Contentful...",...,"$171,000 - $190,000","{""salary_low"":171000.0,""salary_high"":190000.0,...","remote, hybrid, full time",,en,2024-09-02T20:53:39Z,43-3051.00,Office and Administrative Support,Payroll and Timekeeping Clerks,180500.0
177,bosch.com,Warehouse Associate (AT1 - 1st shift),https://jobs.smartrecruiters.com/BoschGroup/74...,2024-09-04T02:37:43Z,2024-09-04T05:08:32Z,"Atlanta, Georgia, 30336, United States","[{""city"":""Atlanta"",""state"":""Georgia"",""zip_code...",manual_work,non_manager,"Microsoft, SAP",...,$19.50,"{""salary_low"":19.5,""salary_high"":19.5,""salary_...",full time,,en,2024-09-04T02:41:43Z,53-7065.00,Transportation and Material Moving,Stockers and Order Fillers,40560.0
213,bosch.com,Senior Software Engineer (REF229535I),https://jobs.smartrecruiters.com/BoschGroup/74...,2024-05-29T19:45:03Z,2024-08-09T15:44:15Z,"Sunnyvale, California, 94085, United States","[{""city"":""Sunnyvale"",""state"":""California"",""zip...","engineering, software_development",non_manager,"C++, Linux, Python",...,"$234,062-$245,000/yr","{""salary_low"":234062.0,""salary_high"":245000.0,...",full time,closed,en,2024-08-11T15:49:20Z,15-1252.00,Computer and Mathematical,Software Developers,239531.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9624,bosch.com,Praktikum Logistik w/m/div. (ab September 2024...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-04-04T12:05:58Z,2024-09-02T12:46:53Z,"Salzburg, Austria","[{""city"":""Salzburg"",""state"":null,""zip_code"":nu...","internship, operations",non_manager,SAP,...,"946,00 EUR","{""salary_low"":946.0,""salary_high"":946.0,""salar...",vollzeit,,de,2024-09-02T12:46:53Z,43-4181.00,Office and Administrative Support,Reservation and Transportation Ticket Agents a...,13054.8
9652,bosch.com,Multi-Modal Foundation Models for Embodied Age...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-04-26T15:46:58Z,2024-06-19T22:38:31Z,"Pittsburgh, Pennsylvania, United States","[{""city"":""Pittsburgh"",""state"":""Pennsylvania"",""...",,non_manager,"Python, Internship, Google Cloud, Google, Micr...",...,$43.00-$66.00,"{""salary_low"":43.0,""salary_high"":66.0,""salary_...",intern,closed,en,2024-06-21T22:43:17Z,13-1111.00,Business and Financial Operations,Management Analysts,113360.0
9758,zf.com,Packer 2nd shift,https://jobs.zf.com/job/Garrett-Packer-2nd-shi...,2024-08-07T19:31:10Z,2024-08-12T08:36:53Z,"Garrett, Indiana, United States","[{""city"":""Garrett"",""state"":""Indiana"",""zip_code...",manual_work,non_manager,SAP,...,$16.50hour,"{""salary_low"":16.5,""salary_high"":16.5,""salary_...",m/f,closed,en,2024-08-13T20:08:39Z,53-7064.00,Transportation and Material Moving,"Packers and Packagers, Hand",34320.0
9772,bosch.com,Praktikum Technisches Benchmarking / Data Anal...,https://jobs.smartrecruiters.com/BoschGroup/74...,2024-03-15T07:22:38Z,2024-03-22T00:05:05Z,,[],"data_analysis, information_technology",non_manager,Power BI,...,mindestensÊ36.750 EUR p,"{""salary_low"":36750.0,""salary_high"":36750.0,""s...",m/w,closed,de,2024-03-24T00:19:37Z,15-1253.00,Computer and Mathematical,Software Quality Assurance Analysts and Testers,42262.5
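
The assignment above relies on pandas matching rows by index label rather than by position. The following sketch shows that behavior on two small hypothetical frames (`postings` and `salaries`); only the labels present in `salaries` receive a value.

```python
import pandas as pd

# Hypothetical postings with the default index 0..3
postings = pd.DataFrame({"Job Opening Title": ["A", "B", "C", "D"]})

# Hypothetical salaries that survived cleaning, keeping their original labels
salaries = pd.DataFrame({"annual_salary": [40560.0, 180500.0]}, index=[1, 3])

# Assignment aligns on index labels; labels 0 and 2 have no match and become NaN
postings["Annual_Salary"] = salaries["annual_salary"]

print(postings)
```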


In [613]:
# Checking for Incorrect Datatypes

In [614]:
# Checking for Duplicate Data
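
One possible way to approach the duplicate check is sketched below on hypothetical data (`toy_postings`). Treating `Job Opening URL` as the identifying field is an assumption made for illustration; the notebook does not state which columns define a duplicate posting.

```python
import pandas as pd

# Hypothetical postings; the first and third rows share a URL
toy_postings = pd.DataFrame({
    "Job Opening URL": [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/a",
    ],
    "Job Opening Title": ["Packer", "Driver", "Packer 2nd shift"],
})

# Rows that repeat an earlier row in every column
full_duplicates = int(toy_postings.duplicated().sum())

# Rows that repeat an earlier row's URL (assumed identifying field)
url_duplicates = int(toy_postings.duplicated(subset=["Job Opening URL"]).sum())

print(full_duplicates, url_duplicates)
```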

To ensure that all date-related fields (**`First Seen At`**, **`Last Seen At`**, and **`Job Last Processed At`**) are correctly formatted, we check for inconsistencies by attempting to parse them into datetime objects using `pd.to_datetime()`. Any values that fail to convert (e.g., due to an incorrect format or an invalid date) are set to `NaT`, so counting the `NaT` values per column tells us how many entries are invalid. Note that this count also includes any values that were missing to begin with.

In [615]:
# Checking for Inconsistent Date Formatting
date_fields = ['First Seen At', 'Last Seen At', 'Job Last Processed At']
date_df = job_posting_df[date_fields].copy()

for col in date_fields:
    date_df[col] = pd.to_datetime(date_df[col], errors='coerce')
    invalid_formats = date_df[col].isna().sum()
    print(f"{invalid_formats} invalid date(s) found in '{col}'")


0 invalid date(s) found in 'First Seen At'
0 invalid date(s) found in 'Last Seen At'
0 invalid date(s) found in 'Job Last Processed At'
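
Since `errors='coerce'` turns both unparseable strings and already-missing values into `NaT`, the two cases can be separated by counting the missing values before parsing. The sketch below uses a hypothetical three-value series (`raw_dates`); the `utc=True` argument is an addition for this example and is not used in the notebook.

```python
import pandas as pd

# Hypothetical values: one valid timestamp, one malformed string, one missing
raw_dates = pd.Series(["2024-05-29T19:59:45Z", "not a date", None])

# Unparseable entries become NaT instead of raising an error
parsed = pd.to_datetime(raw_dates, errors="coerce", utc=True)

# Values that were missing before parsing
missing = int(raw_dates.isna().sum())

# Values that were present but could not be parsed
unparseable = int(parsed.isna().sum()) - missing

print(missing, unparseable)
```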


Since all the fields involving date and time have been verified to follow **consistent formats** (i.e., no invalid entries), we can proceed to the next step of the pre-processing pipeline.

In [616]:
# Checking for Outliers
# TODO: Check for outliers in the Salary Data Field
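
As one common option for this check, the sketch below applies the 1.5 × IQR rule to a hypothetical set of annual salaries. The values and the choice of rule are assumptions for illustration only; the notebook has not yet decided on an outlier method.

```python
import pandas as pd

# Hypothetical annual salaries in USD
annual_salary = pd.Series([30000.0, 40000.0, 42000.0, 45000.0, 50000.0, 240000.0])

# Interquartile range from the first and third quartiles
q1 = annual_salary.quantile(0.25)
q3 = annual_salary.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = annual_salary[(annual_salary < lower_bound) | (annual_salary > upper_bound)]

print(outliers)
```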

## Matplotlib Charts Visualization

## General Research Question 

### EDA Question 1

Both formulation and answer in the same cell

### EDA Question 2

### EDA Question 3