# Project Title - Edit Me


## Data set selection

> In this section, you will need to provide the following information about the selected data set:
>
> - Source with a link: https://www.kaggle.com/datasets/emirhanakku/ai-workforce-and-automation-dataset-20152025
> - Fields: Country, Year, AI Investment (% of GDP), Automation Rate (% of jobs), Employment Rate (% of population), Average Annual Wage, Jobs Created, Jobs Displaced, Reskilling Investment, AI Readiness Index, Productivity Change

> - License: CC BY-NC 4.0. The dataset may be used, remixed, and adapted for academic, educational, and non-commercial projects with proper attribution. Commercial use is not allowed.

### Data set selection rationale

> Why did you select this data set? I chose this dataset because it describes how artificial intelligence affects labor markets across countries. It covers both economic and social effects, from rising productivity to workforce displacement, which makes it well suited to statistical analysis and visual storytelling about the balance between automation and human employment.

### Questions to be answered

> Using statistical analysis and visualization, what questions would you like to be able to answer about this dataset?
>
> - How does increased AI investment correlate with automation rates across countries and years?
> - Does a higher automation rate correspond to lower employment rates?
> - Is there evidence that higher reskilling investments lead to more job creation and less displacement?
> - Which countries are leading in AI readiness, and how does that relate to productivity and employment stability?
> - How have automation rates and employment levels evolved globally over the decade?
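
The first two questions are correlation questions, so a small helper can make the planned comparison concrete. The sketch below is illustrative only: the column names (`AI_Investment`, `Automation_Rate`) are assumptions that should be matched against `df.columns`, and the toy values are made up rather than taken from the dataset. It computes the pooled Pearson correlation alongside one value per country, since a pooled figure can hide countries that move in opposite directions.

```python
import pandas as pd

# Toy data for illustration only: values are invented, not from the dataset.
# Column names are assumptions; check them against df.columns before reuse.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Year": [2015, 2016, 2017, 2015, 2016, 2017],
    "AI_Investment": [0.1, 0.2, 0.3, 0.2, 0.4, 0.6],
    "Automation_Rate": [5.0, 7.0, 9.0, 12.0, 10.0, 8.0],
})


def correlation_summary(frame, x, y, group="Country"):
    """Return the pooled Pearson r and a Series of per-group r values."""
    overall = frame[x].corr(frame[y])
    per_group = pd.Series(
        {key: part[x].corr(part[y]) for key, part in frame.groupby(group)}
    )
    return overall, per_group


overall_r, per_country_r = correlation_summary(
    toy, "AI_Investment", "Automation_Rate"
)
print("Pooled r:", round(overall_r, 3))
print(per_country_r.round(3))
```

In the toy data, country A rises perfectly with investment while country B falls, so the per-country values have opposite signs even though the pooled value is a single number. Correlation alone does not establish that one variable causes the other.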


### Visualization ideas

> Provide a few examples of what you plan to visualize to answer the questions you posed in the previous section. In this project, you will be producing 6-8 visualizations. You will also be producing an interactive chart using Plotly.
> Think about what those visualizations could be: what variables are used in the charts? What insights do you hope to gain from them?

| Visualization Type | Description | Insight |
| ------------------ | ----------- | ------- |
| **Line Chart** | AI Investment vs. Year per country | Shows how AI spending is increasing and which nations lead. |
| **Scatter Plot** | AI Investment vs. Automation Rate | Reveals the strength of correlation between spending and automation. |
| **Bar Chart** | Job Creation vs. Job Displacement per Country | Highlights which economies are adapting vs. losing jobs. |
| **Heatmap** | Correlation among all numeric variables | Quickly shows which metrics are most related (e.g., AI investment ↔ productivity). |
| **Geographical Map** | Automation Rate by Country | Provides spatial visualization of automation impact worldwide. |
| **Dual Axis Plot** | Employment Rate vs. Automation Rate | Illustrates how automation may inversely affect employment trends. |
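
The bar chart row needs one pair of totals per country before anything can be plotted. Here is a minimal sketch of that aggregation step, assuming hypothetical column names `Jobs_Created` and `Jobs_Displaced` (the real CSV headers may differ) and using invented toy values.

```python
import pandas as pd

# Toy data for illustration only: values are invented, not from the dataset.
# "Jobs_Created" and "Jobs_Displaced" are assumed column names.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2015, 2016, 2015, 2016],
    "Jobs_Created": [100, 150, 80, 60],
    "Jobs_Displaced": [90, 100, 120, 70],
})


def net_jobs_by_country(frame, created="Jobs_Created", displaced="Jobs_Displaced"):
    """Sum created and displaced jobs per country and add the net difference."""
    totals = frame.groupby("Country")[[created, displaced]].sum()
    totals["Net_Jobs"] = totals[created] - totals[displaced]
    # Countries with the largest net gain come first.
    return totals.sort_values("Net_Jobs", ascending=False)


summary = net_jobs_by_country(toy)
print(summary)
```

Calling `summary[["Jobs_Created", "Jobs_Displaced"]].plot(kind="bar")` on the result would draw the grouped bars with pandas' built-in plotting. Sorting by net jobs is a design choice that puts adapting economies on the left of the chart.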



In [None]:
# 🚀 Importing some libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the dataset
df = pd.read_csv('data/global_ai_workforce_automation_2015_2025.csv')

# Display the first few rows
df.head()


In [None]:
# Display basic information about the dataset
print("Dataset Shape:", df.shape)
print("\nColumn Names and Data Types:")
print(df.dtypes)
print("\nBasic Statistics:")
df.describe()


In [None]:
# Check for missing values
print("Missing Values per Column:")
print(df.isnull().sum())
print("\nTotal Missing Values:", df.isnull().sum().sum())
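
Raw counts are harder to judge than shares of the data, so one possible follow-up is a compact report listing only the columns that have gaps, with the percentage of rows affected. This is a sketch on invented toy data; `missing_report` is a hypothetical helper, not part of pandas, and it could be called as `missing_report(df)` on the loaded dataset.

```python
import numpy as np
import pandas as pd


def missing_report(frame):
    """Return missing counts and percentages for columns that have any gaps."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only: values and column names are invented.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Automation_Rate": [5.0, np.nan, 7.5, np.nan],
    "Average_Wage": [30000.0, 32000.0, np.nan, 31000.0],
})

report = missing_report(toy)
print(report)
```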
