# 05. Export for Power BI

## Goal
Prepare and export the final dataset for import into Power BI. This involves a final data quality check and giving every column an intuitive name for dashboard users.

## Steps
1. Load the fully processed dataset.
2. **Final Selection**: Select only columns relevant for visualization.
3. **Renaming**: Ensure column names are business-friendly (e.g., 'salary_in_usd' -> 'Salary (USD)').
4. Export to `data/cleaned/power_bi_data.csv`.

In [1]:
import pandas as pd

### 1. Load Processed Data

In [2]:
file_path = "../data/cleaned/processed_jobs_data.csv"
df = pd.read_csv(file_path)
df.head(3)

,job_title,experience_level,employment_type,work_models,work_year,employee_residence,salary,salary_currency,salary_in_usd,company_location,company_size,job_category,salary_tier,job_location_type
0,Data Engineer,Mid-level,Full-time,Remote,2024,United States,148100,USD,148100,United States,Medium,Data Engineer,Medium,Global
1,Data Engineer,Mid-level,Full-time,Remote,2024,United States,98700,USD,98700,United States,Medium,Data Engineer,Low,Global
2,Data Scientist,Senior-level,Full-time,Remote,2024,United States,140032,USD,140032,United States,Medium,Data Scientist,Medium,Global
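
Before selecting columns, it can help to fail early if the processed file lacks a column the later cells rely on. The following is a minimal sketch, not part of the original notebook: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the one-row `sample` frame stands in for `df` so the snippet runs without the CSV.

```python
import pandas as pd

# Columns the selection step relies on (names taken from the preview above).
REQUIRED_COLUMNS = [
    "work_year", "job_category", "job_title", "salary_in_usd",
    "company_location", "job_location_type", "experience_level",
    "employment_type",
]


def missing_columns(df: pd.DataFrame, required: list) -> list:
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


# Stand-in for the processed dataset, mirroring the first preview row.
sample = pd.DataFrame([{
    "work_year": 2024, "job_category": "Data Engineer", "job_title": "Data Engineer",
    "salary_in_usd": 148100, "company_location": "United States",
    "job_location_type": "Global", "experience_level": "Mid-level",
    "employment_type": "Full-time", "work_models": "Remote",
}])

missing = missing_columns(sample, REQUIRED_COLUMNS)
if missing:
    raise KeyError(f"Processed dataset is missing columns: {missing}")
print("All required columns present")
```

In the notebook itself, `sample` would be replaced by the loaded `df`.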


### 2. Select & Rename Columns
Keep only the columns needed for visualization, then give them business-friendly names.

In [3]:
# Older dataset versions expose 'remote_ratio' instead of 'work_models'
work_col = 'work_models' if 'work_models' in df.columns else 'remote_ratio'

final_df = df[[
    'work_year',
    'job_category',
    'job_title',
    'salary_in_usd',
    'company_location',
    'job_location_type',
    'experience_level',
    'employment_type',
    work_col,
]].copy()

# Rename for clarity; keys that are not present in final_df are ignored
final_df = final_df.rename(columns={
    'work_year': 'Year',
    'job_category': 'Job Category',
    'job_title': 'Job Title',
    'salary_in_usd': 'Salary (USD)',
    'company_location': 'Country Code',  # Note: values in this dataset are full country names
    'job_location_type': 'Location Group',
    'experience_level': 'Experience Level',
    'employment_type': 'Employment Type',
    'work_models': 'Work Model',
    'remote_ratio': 'Remote Ratio',  # Only applies to the older dataset version
})

final_df.head()

,Year,Job Category,Job Title,Salary (USD),Country Code,Location Group,Experience Level,Employment Type,Work Model
0,2024,Data Engineer,Data Engineer,148100,United States,Global,Mid-level,Full-time,Remote
1,2024,Data Engineer,Data Engineer,98700,United States,Global,Mid-level,Full-time,Remote
2,2024,Data Scientist,Data Scientist,140032,United States,Global,Senior-level,Full-time,Remote
3,2024,Data Scientist,Data Scientist,100022,United States,Global,Senior-level,Full-time,Remote
4,2024,Other,BI Developer,120000,United States,Global,Mid-level,Full-time,On-site
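
The goal mentions a final data quality check, which the cells here do not spell out. One possible shape for it is sketched below, assuming that nulls, exact duplicate rows, and non-positive salaries are the issues worth flagging. `quality_report` is a hypothetical helper, and `sample` is a three-row stand-in for `final_df`.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, salary_col: str = "Salary (USD)") -> dict:
    """Summarize basic data issues (hypothetical helper, not from the original notebook)."""
    return {
        "rows": len(df),
        "null_cells": int(df.isna().sum().sum()),           # missing values across all columns
        "duplicate_rows": int(df.duplicated().sum()),       # exact repeats of an earlier row
        "non_positive_salaries": int((df[salary_col] <= 0).sum()),
    }


# Stand-in for final_df, using values from the preview above.
sample = pd.DataFrame({
    "Year": [2024, 2024, 2024],
    "Job Title": ["Data Engineer", "Data Engineer", "Data Scientist"],
    "Salary (USD)": [148100, 98700, 140032],
})

report = quality_report(sample)
print(report)
```

Duplicate rows are not necessarily errors in salary data, since two people can share a title and a salary, so that count is better read as a prompt to investigate than as a reason to drop rows.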


### 3. Final Export
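This file is the direct source for the Power BI dashboard. Passing `index=False` keeps the pandas index out of the CSV, so Power BI does not pick up an extra unnamed column.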

In [4]:
export_path = "../data/cleaned/power_bi_data.csv"
final_df.to_csv(export_path, index=False)
print(f"✅ Final Power BI Ready Dataset saved to: {export_path}")

✅ Final Power BI Ready Dataset saved to: ../data/cleaned/power_bi_data.csv
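
As an optional sanity check after exporting, the file can be read back and compared with the DataFrame that produced it. The sketch below writes to a temporary directory rather than `../data/cleaned/`, and `sample` is a stand-in for `final_df`; both are assumptions made so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for final_df, using values from the preview above.
sample = pd.DataFrame({
    "Year": [2024, 2024],
    "Job Title": ["Data Engineer", "Data Scientist"],
    "Salary (USD)": [148100, 140032],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    export_path = Path(tmp_dir) / "power_bi_data.csv"
    sample.to_csv(export_path, index=False)  # same call as the export cell
    reloaded = pd.read_csv(export_path)      # read back while the file still exists

# The round trip should preserve column names, order, and row count.
same_columns = list(reloaded.columns) == list(sample.columns)
same_shape = reloaded.shape == sample.shape
print(f"Columns preserved: {same_columns}, shape preserved: {same_shape}")
```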
