# Notebook 1: Job Postings Data Exploration

### Objective
The primary goal of this notebook is to load the selected dataset (`Google Job Skills`) and perform an initial exploratory data analysis (EDA). This step is needed to understand the data's structure, identify the information relevant to our project, and prepare it for processing by our AI agents.

### Key Steps:
1.  **Load Data**: Import the necessary libraries and load the CSV file into a pandas DataFrame.
2.  **Initial Inspection**: Examine the DataFrame's basic properties (shape, columns, data types) to get a first overview.
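3.  **Identify Key Columns**: Pinpoint the columns containing the job responsibilities and qualifications, which will be the primary input for our `Market_Analyst` agent.
4.  **Build a Unified Description**: Combine those columns into a single `full_description` text column.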

## 1.1. Import Libraries and Load Data

In [1]:
import pandas as pd

In [2]:
data_path = '../data/raw/job_skills.csv'

df = pd.read_csv(data_path)

df.head()

|   | Company | Title | Category | Location | Responsibilities | Minimum Qualifications | Preferred Qualifications |
|---|---------|-------|----------|----------|------------------|------------------------|--------------------------|
| 0 | Google | Google Cloud Program Manager | Program Management | Singapore | Shape, shepherd, ship, and show technical prog... | BA/BS degree or equivalent practical experienc... | Experience in the business technology market a... |
| 1 | Google | Supplier Development Engineer (SDE), Cable/Con... | Manufacturing & Supply Chain | Shanghai, China | Drive cross-functional activities in the suppl... | BS degree in an Engineering discipline or equi... | BSEE, BSME or BSIE degree.\nExperience of usin... |
| 2 | Google | Data Analyst, Product and Tools Operations, Go... | Technical Solutions | New York, NY, United States | Collect and analyze data to draw insight and i... | Bachelor’s degree in Business, Economics, Stat... | Experience partnering or consulting cross-func... |
| 3 | Google | Developer Advocate, Partner Engineering | Developer Relations | Mountain View, CA, United States | Work one-on-one with the top Android, iOS, and... | BA/BS degree in Computer Science or equivalent... | Experience as a software developer, architect,... |
| 4 | Google | Program Manager, Audio Visual (AV) Deployments | Program Management | Sunnyvale, CA, United States | Plan requirements with internal customers.\nPr... | BA/BS degree or equivalent practical experienc... | CTS Certification.\nExperience in the construc... |
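Since later steps depend on these exact column names, it can help to check them right after loading. The following is a minimal sketch, assuming the column names shown in the `df.head()` output; the helper `load_job_postings`, the constant `EXPECTED_COLUMNS`, and the two-row in-memory CSV are hypothetical stand-ins, not part of the original notebook.

```python
import io

import pandas as pd

# Column names as they appear in the df.head() output above.
EXPECTED_COLUMNS = [
    "Company", "Title", "Category", "Location",
    "Responsibilities", "Minimum Qualifications", "Preferred Qualifications",
]


def load_job_postings(source) -> pd.DataFrame:
    """Hypothetical helper: read the CSV and fail early if a column is missing."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Made-up two-row sample standing in for the real job_skills.csv file.
# Empty fields (e.g. the trailing comma on row 1) are read as NaN.
sample_csv = io.StringIO(
    "Company,Title,Category,Location,Responsibilities,"
    "Minimum Qualifications,Preferred Qualifications\n"
    "Google,Program Manager,Program Management,Singapore,Plan work.,BA/BS degree,\n"
    "Google,Data Analyst,Technical Solutions,\"New York, NY\",Analyze data.,,SQL\n"
)
sample_df = load_job_postings(sample_csv)
print(sample_df.shape)
```

`pd.read_csv` accepts either a file path or a file-like object, which is why the sketch can use `io.StringIO`; on the real data, `load_job_postings(data_path)` would take the place of the plain `pd.read_csv(data_path)` call.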


## 1.2. DataFrame Structure Inspection

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1250 entries, 0 to 1249
Data columns (total 7 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Company                   1250 non-null   object
 1   Title                     1250 non-null   object
 2   Category                  1250 non-null   object
 3   Location                  1250 non-null   object
 4   Responsibilities          1235 non-null   object
 5   Minimum Qualifications    1236 non-null   object
 6   Preferred Qualifications  1236 non-null   object
dtypes: object(7)
memory usage: 68.5+ KB


In [5]:
print(df.shape)

(1250, 7)
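The `Non-Null Count` column from `df.info()` only shows missing values indirectly. A minimal sketch of counting them directly with `isna()`, run here on a small made-up frame (`demo` is a hypothetical stand-in for `df`):

```python
import pandas as pd

# Made-up frame with the same text columns; None marks a missing entry.
demo = pd.DataFrame({
    "Responsibilities": ["Plan work.", None, "Analyze data."],
    "Minimum Qualifications": ["BA/BS degree", "BS degree", None],
    "Preferred Qualifications": ["Cloud experience", "SQL", None],
})

# Missing values per column (the complement of df.info()'s Non-Null Count).
missing_counts = demo.isna().sum()
print(missing_counts)

# Share of rows with at least one missing text field.
rows_with_gaps = demo.isna().any(axis=1).mean()
print(f"{rows_with_gaps:.0%} of rows have at least one missing field")
```

Applied to the real `df`, `df.isna().sum()` should report 15 missing `Responsibilities` values and 14 in each qualifications column, matching the `df.info()` output (1250 - 1235 and 1250 - 1236).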


## 1.3. Feature Engineering: Creating a Unified Job Description

The inspection above shows that the key job details are spread across three columns: `Responsibilities`, `Minimum Qualifications`, and `Preferred Qualifications`. To give our AI agent the most complete context, we combine these into a single text column named `full_description`.

Before concatenating, we must fill the missing values (`NaN`) with an empty string. According to `df.info()`, `Responsibilities` has 15 missing entries and each qualifications column has 14; left as `NaN` (a float), they would make `str.join` raise a `TypeError`.
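To see why the fill step matters, and one side effect of filling with empty strings, here is a small sketch on made-up rows (`demo` and `naive_description` are hypothetical names). `str.join` only accepts strings, so a missing value raises `TypeError`. After filling, the join succeeds, but a row with an empty field gains a stray `'\n\n'` separator. The variant that skips empty parts is an alternative we are suggesting, not what the notebook's cell does.

```python
import pandas as pd

text_columns = ["Responsibilities", "Minimum Qualifications", "Preferred Qualifications"]

# Made-up rows; the second one is missing its Responsibilities text.
demo = pd.DataFrame({
    "Responsibilities": ["Plan work.", None],
    "Minimum Qualifications": ["BA/BS degree", "BS degree"],
    "Preferred Qualifications": ["Cloud experience", "SQL"],
})

# 1) Joining before filling fails, because str.join only accepts str items.
try:
    "\n\n".join(demo.loc[1, text_columns])
except TypeError as exc:
    print("TypeError:", exc)

# 2) After fillna(''), the join works, but the empty field leaves a leading separator.
filled = demo[text_columns].fillna("")
naive_description = filled.apply(lambda row: "\n\n".join(row), axis=1)
print(repr(naive_description[1]))  # '\n\nBS degree\n\nSQL'

# 3) Alternative: join only the non-empty parts, so no stray blank blocks remain.
demo["full_description"] = filled.apply(
    lambda row: "\n\n".join(part for part in row if part.strip()), axis=1
)
print(repr(demo.loc[1, "full_description"]))  # 'BS degree\n\nSQL'
```

Whether the leading separators matter depends on how the `Market_Analyst` agent consumes the text; skipping empty parts simply keeps each description free of blank blocks.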

In [6]:
text_columns = ['Responsibilities', 'Minimum Qualifications', 'Preferred Qualifications']

# Step 1: Fill NaN values in the target columns with an empty string
for col in text_columns:
    df[col] = df[col].fillna('')

# Step 2: Create the 'full_description' column by joining the text from the specified columns.
# We use a double newline character ('\n\n') as a separator for better readability.
df['full_description'] = df[text_columns].apply(lambda x: '\n\n'.join(x), axis=1)


print("Verification: First 5 entries of the 'full_description' column:")
print(df['full_description'].head())


print("\n==================== Example of one full description (index 0) ====================\n")
print(df['full_description'].iloc[0])

Verification: First 5 entries of the 'full_description' column:
0    Shape, shepherd, ship, and show technical prog...
1    Drive cross-functional activities in the suppl...
2    Collect and analyze data to draw insight and i...
3    Work one-on-one with the top Android, iOS, and...
4    Plan requirements with internal customers.\nPr...
Name: full_description, dtype: object


Shape, shepherd, ship, and show technical programs designed to support the work of Cloud Customer Engineers and Solutions Architects.
Measure and report on key metrics tied to those programs to identify any need to change course, cancel, or scale the programs from a regional to global platform.
Communicate status and identify any obstacles and paths for resolution to stakeholders, including those in senior roles, in a transparent, regular, professional and timely manner.
Establish expectations and rationale on deliverables for stakeholders and program contributors.
Provide program performance feedback to teams in 