# 📦 Import Libraries and Load Data

In [20]:
%load_ext autoreload
%autoreload 2
import sys
import os
project_root = os.path.abspath("..")  # Adjust ".." if your notebook is deeper
if project_root not in sys.path:
    sys.path.append(project_root)

import pandas as pd 
from features.helpers import filter_jobs
from features.wrangling import remove_outliers
from features.plots import DataJobsViz

CLEAN_PATH = os.path.join("..", "data", "clean", "clean_data.csv")  # if you run this from Analysis/

df_clean = pd.read_csv(CLEAN_PATH)
df = df_clean.copy()
viz = DataJobsViz()

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# 🌍 Data Jobs Around the World

Welcome to this analysis! In this notebook, we will explore the global landscape of **data-related jobs**, uncover trends, and gain insights into the skills and locations that are most in demand.

## 🔍 What We Will Cover

- **Countries:** Identify which countries have the highest demand for data professionals.
- **Job Types:** Explore different roles in the field of data analysis.
- **Skills:** Find out the most sought-after technical skills.
- **Trends & Patterns:** Understand the emerging trends in the global data job market.

---

> 💡 **Goal:** Provide a clear, visual, and insightful view of the global data job market for analysts, data scientists, and recruiters alike.



### 📊 Average Salary by Profession

In this first visualization, we can see the countries with the **highest average salary** for each profession.


In [21]:
viz.plot_salary_by_country_and_job(df)


### 💹 Salary Distribution by Country

Some averages in the previous chart looked implausible, so this visualization examines the **salary distribution** and the **frequency/density of job listings** in each country.

- **United States and remote jobs (listed as `Anywhere`):** Both show a clear distribution with enough listings for the data to be reliable.  
- **Portugal:** The distribution is uneven and supported by a lower frequency compared to other countries, suggesting that the average salaries shown in the previous chart may not fully reflect reality.  

> ⚠️ Note: Out of over 700,000 job listings, only about 20,000 include salary information. Averages for countries with few salaried listings may therefore not reflect real pay levels.


In [22]:
top_countries = ['Anywhere', 'Canada', 'Germany', 'India', 'Portugal', 'United Kingdom', 'United States']
viz.plot_distribution_interactive(df, country=top_countries, title='Salary Distribution by Country')
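
The coverage caveat above can be checked directly. The sketch below runs on made-up rows and is not one of the project's own helpers; the column names `job_country` and `salary_year_avg` are assumptions about the dataset's schema, so adjust them to match `clean_data.csv`.

```python
import pandas as pd


def salary_coverage(df, group_col="job_country", salary_col="salary_year_avg"):
    """Count listings and salaried listings per group (hypothetical helper)."""
    grouped = df.groupby(group_col)[salary_col]
    out = pd.DataFrame({
        "listings": grouped.size(),      # all rows in the group, NaN included
        "with_salary": grouped.count(),  # rows with a non-null salary only
    })
    out["coverage_pct"] = (out["with_salary"] / out["listings"] * 100).round(1)
    return out.sort_values("with_salary", ascending=False)


# Invented rows for illustration only; these are not values from the dataset.
sample = pd.DataFrame({
    "job_country": ["United States", "United States", "United States",
                    "Portugal", "Portugal"],
    "salary_year_avg": [90000.0, None, 120000.0, None, 111000.0],
})

coverage = salary_coverage(sample)
print(coverage)
```

Countries with a low `with_salary` count are the ones whose averages deserve the most caution.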

### 💼 Salary Distribution by Profession

This chart shows the **salary distribution by profession**, giving us a better idea not only of the expected salary range but also of the **frequency of job listings** for each job type.

- **United States:** Data Analysts are among the most sought-after positions.


In [23]:
viz.plot_salary_distribution(df, country="United States") 

### 📦 Salary Dispersion by Country and Job Type

Like the previous charts, this visualization shows **salaries by country and job type**.  
The **box plot** makes the **spread and dispersion** of salaries easier to read: each box marks the median and the interquartile range for one job type.


In [24]:
viz.plot_salary_by_job_title_interactive(df, country=top_countries)
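
To put numbers on what a box plot displays, the quartiles and interquartile range (IQR) can be computed per job title. This is a sketch on invented values; `salary_year_avg` is an assumed column name, and `salary_spread` is a hypothetical function rather than part of the `features` package.

```python
import pandas as pd


def salary_spread(df, group_col="job_title_short", salary_col="salary_year_avg"):
    """Return q1, median, q3 and IQR of salary per group (hypothetical helper)."""
    stats = (
        df.groupby(group_col)[salary_col]
        .quantile([0.25, 0.5, 0.75])  # NaN salaries are skipped
        .unstack()                    # one column per quantile
    )
    stats.columns = ["q1", "median", "q3"]
    stats["iqr"] = stats["q3"] - stats["q1"]
    return stats


# Invented rows for illustration only; these are not values from the dataset.
sample = pd.DataFrame({
    "job_title_short": ["Data Analyst"] * 3 + ["Data Engineer"] * 4,
    "salary_year_avg": [60000.0, 80000.0, 100000.0,
                        100000.0, 120000.0, 140000.0, 160000.0],
})

spread = salary_spread(sample)
print(spread)
```

A wide IQR relative to the median suggests that a single average hides a lot of variation for that role.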

### 🥧 Job Type Proportions

This pie chart shows the **proportion of job types** across the whole dataset.  
As in the United States, the most in-demand positions worldwide are **Data Analyst**, **Data Engineer**, and **Data Scientist**.


In [25]:
viz.pie_charts_side_by_side(df, columns=['job_title_short'], maintitle="Share of Job Titles", textinfo='label+percent')
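
The shares behind a pie chart like this one can also be reproduced as a table with `value_counts(normalize=True)`. The rows below are invented for illustration, so the printed percentages are not results from the dataset.

```python
import pandas as pd

# Invented rows for illustration only; these are not values from the dataset.
sample = pd.DataFrame({
    "job_title_short": (
        ["Data Analyst"] * 4
        + ["Data Engineer"] * 3
        + ["Data Scientist"] * 2
        + ["Cloud Engineer"] * 1
    )
})

# Percentage of listings per job title, largest first.
shares = sample["job_title_short"].value_counts(normalize=True).mul(100).round(1)

# Combined share of the three most common titles.
top3_share = shares.head(3).sum()

print(shares)
print(f"Top 3 titles combined: {top3_share:.1f}%")
```

Running the same two lines on `df` gives the exact figure for the three leading titles instead of an estimate read off the chart.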



This chart highlights that **90% of listings offer, at least to some extent, the option to work remotely**, while **69% do not mention any degree requirement**.


In [26]:
viz.pie_charts_side_by_side(
    df,
    columns=['job_work_from_home', 'job_no_degree_mention'],
    titles=["Work From Home Option", "No Degree Mention in the Job Position"],
    textinfo='percent',
)
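
Because these pie charts are drawn with `textinfo='percent'`, the slices carry no labels, so it is worth confirming which percentage belongs to `True`. The sketch below assumes both columns are boolean flags; the sample rows are invented and `true_share` is a hypothetical helper.

```python
import pandas as pd


def true_share(df, columns):
    """Percentage of rows where each boolean column is True (hypothetical helper)."""
    # The mean of a boolean column is the fraction of True values.
    return pd.Series({col: df[col].mean() * 100 for col in columns}).round(1)


# Invented rows for illustration only; these are not values from the dataset.
sample = pd.DataFrame({
    "job_work_from_home": [True, False, False, False, False],
    "job_no_degree_mention": [True, True, False, False, False],
})

flags = true_share(sample, ["job_work_from_home", "job_no_degree_mention"])
print(flags)
```

Comparing this output for `df` with the chart shows directly whether the larger slice corresponds to `True` or to `False`.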

### ☁️ Skills Word Cloud

This word cloud shows the **most in-demand skills** in the job listings.  
We can see that **Python**, **SQL**, and **Excel** appear most frequently across the positions.


In [27]:
import ast
# Convert strings that look like lists into real lists
df['job_skills'] = df['job_skills'].dropna().apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

viz.plot_skills_wordcloud(df)                    
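
A word cloud conveys relative frequency only roughly. Assuming, as the cell above does, that `job_skills` holds stringified lists, a plain count gives an exact ranking. The sample rows are invented, and `parse_skills` is a hypothetical helper.

```python
import ast
from collections import Counter

import pandas as pd


def parse_skills(value):
    """Turn a stringified list into a real list; pass other values through."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value


# Invented rows for illustration only; these are not values from the dataset.
sample = pd.DataFrame({
    "job_skills": [
        "['python', 'sql']",
        "['sql', 'excel']",
        None,  # listings without skills are dropped before counting
        "['python', 'sql', 'tableau']",
    ]
})

skills = sample["job_skills"].dropna().apply(parse_skills)

# Flatten the per-listing lists and count each skill once per listing.
skill_counts = Counter(skill for row in skills for skill in row)
top_skills = skill_counts.most_common(3)

print(top_skills)
```

`most_common(n)` on the full dataset gives the ranking to compare against the largest words in the cloud.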

## 📌 Conclusion

After analyzing the global data job market, we can draw several key insights:

- **Top-paying countries:** United States, United Kingdom, Canada, and Germany consistently offer the highest average salaries for data-related positions.  

- **Most in-demand positions:** Data Analyst, Data Engineer, and Data Scientist make up roughly **60% of the job listings**, highlighting their importance across industries worldwide.  

- **Remote work opportunities:** A large majority of positions, approximately 90%, provide some level of remote work flexibility, reflecting the shift towards more flexible working arrangements.  

- **Education requirements:** A significant portion of listings (~69%) do not specify formal educational requirements, indicating that skills and experience may be valued more than formal degrees in many cases.  

- **Top skills in demand:** **Python** leads the list, followed closely by **SQL**, **Excel**, and visualization tools such as **Tableau** and **Power BI**, showing a clear preference for analytical and technical skills in the field.


> ⚠️ Note: Only about 20,000 of over 700,000 listings (**roughly 2-3%**) **include salary information**, so while these trends are informative, they may not capture the complete global picture.  

This project is just a starting point. I encourage you to **dig deeper into the data**, explore additional patterns, and **compare these insights with your own country's market**. Understanding global trends can help professionals make better career decisions, identify skill gaps, and stay competitive in the evolving data job landscape.
