# Global AI Content Impact Dataset Analysis

## Introduction

This notebook provides a comprehensive analysis of the Global AI Content Impact Dataset, examining how artificial intelligence adoption affects various industries and countries. The dataset contains information about AI adoption rates, economic impacts, social effects, and regulatory environments across different regions and sectors.

## Dataset Overview

The dataset contains the following columns:
- **Country**: The country where the data was collected
- **Year**: The year of data collection
- **Industry**: The industry sector
- **AI Adoption Rate (%)**: Percentage of AI adoption in the sector
- **AI-Generated Content Volume (TBs per year)**: Volume of AI-generated content
- **Job Loss Due to AI (%)**: Percentage of jobs lost due to AI
- **Revenue Increase Due to AI (%)**: Percentage increase in revenue due to AI
- **Human-AI Collaboration Rate (%)**: Rate of collaboration between humans and AI
- **Top AI Tools Used**: Most commonly used AI tools in the sector
- **Regulation Status**: Regulatory environment (Strict/Moderate/Lenient)
- **Consumer Trust in AI (%)**: Level of consumer trust in AI
- **Market Share of AI Companies (%)**: Market share held by AI companies
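
Before running the analysis, it can help to confirm that a loaded frame matches the column list above. The following is a minimal sketch rather than part of the original workflow: `validate_schema` is a hypothetical helper, the sample values are made up, and the 0 to 100 bound on every `(%)` column is an assumption (a revenue increase could legitimately exceed 100%).

```python
import pandas as pd

EXPECTED_COLUMNS = [
    'Country', 'Year', 'Industry',
    'AI Adoption Rate (%)',
    'AI-Generated Content Volume (TBs per year)',
    'Job Loss Due to AI (%)',
    'Revenue Increase Due to AI (%)',
    'Human-AI Collaboration Rate (%)',
    'Top AI Tools Used',
    'Regulation Status',
    'Consumer Trust in AI (%)',
    'Market Share of AI Companies (%)',
]

def validate_schema(frame):
    """Hypothetical helper: return (missing columns, '(%)' columns outside 0-100)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    pct_cols = [c for c in frame.columns if c.endswith('(%)')]
    out_of_range = [
        c for c in pct_cols
        if not pd.to_numeric(frame[c], errors='coerce').dropna().between(0, 100).all()
    ]
    return missing, out_of_range

# Tiny synthetic frame (made-up values) to show the helper in action
sample = pd.DataFrame({
    'Country': ['A', 'B'],
    'Year': [2020, 2021],
    'AI Adoption Rate (%)': [45.0, 130.0],  # 130 is deliberately out of range
})
missing, out_of_range = validate_schema(sample)
print("Missing columns:", missing)
print("Percentage columns outside 0-100:", out_of_range)
```

On the real data, the same call would be `validate_schema(df)` once `df` has been loaded.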

## Analysis Objectives

1. Understand the global landscape of AI adoption
2. Examine the economic impact of AI adoption
3. Assess the social implications of AI implementation
4. Identify trends in AI tool usage
5. Evaluate regulatory approaches to AI across countries
6. Discover correlations between different AI impact metrics

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
%matplotlib inline

# Set style for better-looking plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

## Data Loading and Initial Inspection

Let's load the dataset and perform initial inspection to understand its structure and content.

In [None]:
# Load the dataset
df = pd.read_csv('Global_AI_Content_Impact_Dataset.csv')

# Display basic information about the dataset
print("Dataset Shape:", df.shape)
print("\nColumn Names:")
print(df.columns.tolist())
print("\nDataset Info:")
df.info()  # info() prints its report directly and returns None

In [None]:
# Display first few rows
df.head(10)

In [None]:
# Summary statistics
df.describe()

In [None]:
# Check for missing values
missing_values = df.isnull().sum()
print("Missing Values:")
print(missing_values)
print(f"\nTotal missing values: {missing_values.sum()}")
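
Raw counts are easier to judge next to the share of rows they represent, and duplicated rows are worth checking at the same stage. Here is a small sketch on a synthetic stand-in for `df` (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df, with gaps and one fully duplicated row
demo = pd.DataFrame({
    'Country': ['A', 'B', 'B', 'C'],
    'AI Adoption Rate (%)': [40.0, np.nan, np.nan, 55.0],
})

# Share of missing values per column, as a percentage of all rows
missing_pct = demo.isnull().mean().mul(100).round(1)
print(missing_pct)

# Fully duplicated rows (pandas treats NaN in the same position as equal here)
n_duplicates = int(demo.duplicated().sum())
print(f"Duplicate rows: {n_duplicates}")
```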

## Distribution Analysis

Let's examine the distribution of records across different categorical variables to understand the composition of our dataset.

In [None]:
# Distribution of countries
plt.figure(figsize=(12, 6))
country_counts = df['Country'].value_counts()
sns.barplot(x=country_counts.index, y=country_counts.values)
plt.title('Distribution of Records by Country', fontsize=16, fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Number of Records')
plt.xlabel('Country')
plt.tight_layout()
plt.show()

print(f"Number of unique countries: {df['Country'].nunique()}")
print("Top 5 countries by record count:")
print(country_counts.head())

In [None]:
# Distribution of years
plt.figure(figsize=(10, 6))
year_counts = df['Year'].value_counts().sort_index()
sns.barplot(x=year_counts.index, y=year_counts.values)
plt.title('Distribution of Records by Year', fontsize=16, fontweight='bold')
plt.xlabel('Year')
plt.ylabel('Number of Records')
plt.show()

print(f"Year range: {int(df['Year'].min())} - {int(df['Year'].max())}")
print(f"Number of unique years: {df['Year'].nunique()}")

In [None]:
# Distribution of industries
plt.figure(figsize=(14, 6))
industry_counts = df['Industry'].value_counts()
sns.barplot(x=industry_counts.index, y=industry_counts.values)
plt.title('Distribution of Records by Industry', fontsize=16, fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Number of Records')
plt.xlabel('Industry')
plt.tight_layout()
plt.show()

print(f"Number of unique industries: {df['Industry'].nunique()}")
print("Top 5 industries by record count:")
print(industry_counts.head())

## AI Adoption Rate Analysis

AI adoption rate is a key metric in our dataset. Let's analyze how it varies across different dimensions.

In [None]:
# AI Adoption Rate by Country
plt.figure(figsize=(14, 7))
sns.boxplot(data=df, x='Country', y='AI Adoption Rate (%)')
plt.title('AI Adoption Rate Distribution by Country', fontsize=16, fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('AI Adoption Rate (%)')
plt.xlabel('Country')
plt.tight_layout()
plt.show()

# Calculate mean AI adoption rate by country
mean_adoption_by_country = df.groupby('Country')['AI Adoption Rate (%)'].mean().sort_values(ascending=False)
print("Mean AI Adoption Rate by Country (Top 10):")
print(mean_adoption_by_country.head(10))

In [None]:
# AI Adoption Rate by Industry
plt.figure(figsize=(14, 7))
sns.boxplot(data=df, x='Industry', y='AI Adoption Rate (%)')
plt.title('AI Adoption Rate Distribution by Industry', fontsize=16, fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('AI Adoption Rate (%)')
plt.xlabel('Industry')
plt.tight_layout()
plt.show()

# Calculate mean AI adoption rate by industry
mean_adoption_by_industry = df.groupby('Industry')['AI Adoption Rate (%)'].mean().sort_values(ascending=False)
print("Mean AI Adoption Rate by Industry (Top 10):")
print(mean_adoption_by_industry.head(10))

## Correlation Analysis

Let's examine the relationships between different numerical variables in our dataset to identify potential correlations.

In [None]:
# Define numerical columns for correlation analysis
numerical_cols = [
    'AI Adoption Rate (%)', 
    'AI-Generated Content Volume (TBs per year)', 
    'Job Loss Due to AI (%)', 
    'Revenue Increase Due to AI (%)', 
    'Human-AI Collaboration Rate (%)',
    'Consumer Trust in AI (%)',
    'Market Share of AI Companies (%)'
]

# Create correlation matrix
correlation_matrix = df[numerical_cols].corr()

# Plot correlation matrix
plt.figure(figsize=(12, 10))
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
sns.heatmap(correlation_matrix, annot=True, cmap='RdBu_r', center=0, 
            square=True, fmt='.2f', cbar_kws={"shrink": .8}, mask=mask)
plt.title('Correlation Matrix of Numerical Variables', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("Strong correlations (>0.5 or <-0.5):")
strong_corr = []
for i in range(len(correlation_matrix.columns)):
    for j in range(i+1, len(correlation_matrix.columns)):
        corr_val = correlation_matrix.iloc[i, j]
        if abs(corr_val) > 0.5:
            strong_corr.append((correlation_matrix.columns[i], 
                               correlation_matrix.columns[j], 
                               round(corr_val, 3)))

if not strong_corr:
    print("None found at this threshold.")
for var1, var2, corr in strong_corr:
    print(f"{var1} - {var2}: {corr}")
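
The nested loop above can also be written without explicit Python loops. The sketch below is one possible alternative, shown on synthetic data (the column names `a`, `b`, `c` and the values are made up): it picks the strict upper triangle of the correlation matrix with `np.triu_indices`, so each pair appears once and the diagonal is excluded.

```python
import numpy as np
import pandas as pd

# Synthetic data: 'b' tracks 'a' closely, 'c' is unrelated noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    'a': a,
    'b': 2 * a + rng.normal(scale=0.1, size=200),
    'c': rng.normal(size=200),
})
corr = demo.corr()

# Row/column positions of the strict upper triangle (k=1 skips the diagonal)
rows, cols = np.triu_indices(len(corr), k=1)
pairs = pd.Series(
    corr.to_numpy()[rows, cols],
    index=pd.MultiIndex.from_arrays([corr.index[rows], corr.columns[cols]]),
)

# Same threshold as the loop version
strong_pairs = pairs[pairs.abs() > 0.5]
print(strong_pairs.round(3))
```

With the real data, replacing `corr` with `correlation_matrix` should yield the same pairs as the loop.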

## Economic Impact Analysis

Let's examine the economic implications of AI adoption, focusing on revenue increases and job displacement.

In [None]:
# Relationship between AI Adoption Rate and Revenue Increase
plt.figure(figsize=(12, 7))
sns.scatterplot(data=df, x='AI Adoption Rate (%)', y='Revenue Increase Due to AI (%)', 
                hue='Country', alpha=0.7, s=100)
plt.title('AI Adoption Rate vs Revenue Increase by Country', fontsize=16, fontweight='bold')
plt.xlabel('AI Adoption Rate (%)')
plt.ylabel('Revenue Increase Due to AI (%)')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Calculate correlation between AI adoption and revenue increase
adoption_revenue_corr = df['AI Adoption Rate (%)'].corr(df['Revenue Increase Due to AI (%)'])
print(f"Correlation between AI Adoption Rate and Revenue Increase: {adoption_revenue_corr:.3f}")
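
A single pooled correlation can hide differences between countries. As a hedged illustration, the sketch below uses made-up numbers for two hypothetical countries, `X` and `Y`, in which revenue rises with adoption inside each country while the pooled correlation is negative. It computes the coefficient both ways so the two can be compared.

```python
import pandas as pd

adoption = 'AI Adoption Rate (%)'
revenue = 'Revenue Increase Due to AI (%)'

# Synthetic data for two hypothetical countries (values are made up)
demo = pd.DataFrame({
    'Country': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    adoption: [10.0, 20.0, 30.0, 60.0, 70.0, 80.0],
    revenue: [50.0, 55.0, 60.0, 10.0, 15.0, 20.0],
})

# Correlation over all rows, ignoring country
pooled_r = demo[adoption].corr(demo[revenue])

# Correlation computed separately within each country
per_country_r = pd.Series({
    country: group[adoption].corr(group[revenue])
    for country, group in demo.groupby('Country')
})

print(f"Pooled correlation: {pooled_r:.3f}")
print(per_country_r.round(3))
```

Whether the real dataset shows any such divergence is not known in advance; running the same grouping on `df` is one way to check.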

In [None]:
# Relationship between Job Loss and AI Adoption Rate
plt.figure(figsize=(12, 7))
sns.scatterplot(data=df, x='AI Adoption Rate (%)', y='Job Loss Due to AI (%)', 
                hue='Country', alpha=0.7, s=100)
plt.title('AI Adoption Rate vs Job Loss by Country', fontsize=16, fontweight='bold')
plt.xlabel('AI Adoption Rate (%)')
plt.ylabel('Job Loss Due to AI (%)')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Calculate correlation between AI adoption and job loss
adoption_jobloss_corr = df['AI Adoption Rate (%)'].corr(df['Job Loss Due to AI (%)'])
print(f"Correlation between AI Adoption Rate and Job Loss: {adoption_jobloss_corr:.3f}")

## AI Tools and Technology Analysis

Let's analyze the most popular AI tools used across different sectors and regions.

In [None]:
# Top AI tools used
plt.figure(figsize=(12, 7))
tool_counts = df['Top AI Tools Used'].value_counts()
sns.barplot(x=tool_counts.values, y=tool_counts.index)
plt.title('Most Popular AI Tools', fontsize=16, fontweight='bold')
plt.xlabel('Number of Records')
plt.ylabel('AI Tool')
plt.show()

print("Popularity of AI Tools:")
for tool, count in tool_counts.items():
    percentage = (count / len(df)) * 100
    print(f"{tool}: {count} records ({percentage:.1f}%)")
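
The counts above are global. To see how tool usage differs by sector, a cross-tabulation is one option. The sketch below uses a synthetic frame with placeholder tool names (`Tool A`, `Tool B`) and assumes each record lists exactly one tool, which may not hold for the real column.

```python
import pandas as pd

# Synthetic records; tool names are placeholders, not values from the dataset
demo = pd.DataFrame({
    'Industry': ['Media', 'Media', 'Gaming', 'Gaming', 'Gaming'],
    'Top AI Tools Used': ['Tool A', 'Tool B', 'Tool A', 'Tool A', 'Tool B'],
})

# Share of each tool within each industry, as row percentages
tool_share = (
    pd.crosstab(demo['Industry'], demo['Top AI Tools Used'], normalize='index')
    .mul(100)
    .round(1)
)
print(tool_share)
```

Swapping `'Industry'` for `'Country'` gives the regional view.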

## Regulatory Environment Analysis

Let's examine the distribution of regulation statuses and how they relate to other metrics.

In [None]:
# Regulation Status distribution
plt.figure(figsize=(10, 6))
regulation_counts = df['Regulation Status'].value_counts()
colors = ['#ff9999', '#66b3ff', '#99ff99']
plt.pie(regulation_counts.values, labels=regulation_counts.index, autopct='%1.1f%%', colors=colors)
plt.title('Distribution of Regulation Status', fontsize=16, fontweight='bold')
plt.show()

print("Regulation Status Distribution:")
for status, count in regulation_counts.items():
    percentage = (count / len(df)) * 100
    print(f"{status}: {count} records ({percentage:.1f}%)")
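
To relate regulation status to the other metrics, one simple approach is to average them per status. The following sketch uses made-up values and only two metric columns; the same pattern should extend to any numeric column in `df`.

```python
import pandas as pd

# Synthetic records (values are made up for illustration)
demo = pd.DataFrame({
    'Regulation Status': ['Strict', 'Strict', 'Moderate', 'Lenient', 'Lenient'],
    'AI Adoption Rate (%)': [30.0, 40.0, 50.0, 60.0, 80.0],
    'Consumer Trust in AI (%)': [70.0, 60.0, 55.0, 40.0, 50.0],
})

cols = ['AI Adoption Rate (%)', 'Consumer Trust in AI (%)']

# Mean of each metric per regulation status, plus the group size for context
summary = demo.groupby('Regulation Status')[cols].mean().round(2)
summary['Records'] = demo.groupby('Regulation Status').size()
print(summary)
```

Keeping the record count next to the means makes it easier to spot averages that rest on very few rows.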

## Social Impact Analysis

Let's examine consumer trust in AI and human-AI collaboration rates.

In [None]:
# Consumer Trust by Country
plt.figure(figsize=(14, 7))
sns.boxplot(data=df, x='Country', y='Consumer Trust in AI (%)')
plt.title('Consumer Trust in AI Distribution by Country', fontsize=16, fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Consumer Trust in AI (%)')
plt.xlabel('Country')
plt.tight_layout()
plt.show()

# Calculate mean consumer trust by country
mean_trust_by_country = df.groupby('Country')['Consumer Trust in AI (%)'].mean().sort_values(ascending=False)
print("Mean Consumer Trust in AI by Country (Top 10):")
print(mean_trust_by_country.head(10))

In [None]:
# Human-AI Collaboration Rate by Industry
plt.figure(figsize=(14, 7))
sns.boxplot(data=df, x='Industry', y='Human-AI Collaboration Rate (%)')
plt.title('Human-AI Collaboration Rate Distribution by Industry', fontsize=16, fontweight='bold')
plt.xticks(rotation=45)
plt.ylabel('Human-AI Collaboration Rate (%)')
plt.xlabel('Industry')
plt.tight_layout()
plt.show()

# Calculate mean collaboration rate by industry
mean_collaboration_by_industry = df.groupby('Industry')['Human-AI Collaboration Rate (%)'].mean().sort_values(ascending=False)
print("Mean Human-AI Collaboration Rate by Industry (Top 10):")
print(mean_collaboration_by_industry.head(10))

## Cross-Dimensional Analysis

Let's examine how different metrics vary across multiple dimensions simultaneously.

In [None]:
# Average metrics by country
country_metrics = df.groupby('Country').agg({
    'AI Adoption Rate (%)': 'mean',
    'Revenue Increase Due to AI (%)': 'mean',
    'Job Loss Due to AI (%)': 'mean',
    'Consumer Trust in AI (%)': 'mean',
    'Market Share of AI Companies (%)': 'mean',
    'Human-AI Collaboration Rate (%)': 'mean'
}).round(2)

print("Average Metrics by Country (Sorted by AI Adoption Rate):")
print(country_metrics.sort_values(by='AI Adoption Rate (%)', ascending=False))

In [None]:
# Average metrics by industry
industry_metrics = df.groupby('Industry').agg({
    'AI Adoption Rate (%)': 'mean',
    'Revenue Increase Due to AI (%)': 'mean',
    'Job Loss Due to AI (%)': 'mean',
    'Consumer Trust in AI (%)': 'mean',
    'Market Share of AI Companies (%)': 'mean',
    'Human-AI Collaboration Rate (%)': 'mean'
}).round(2)

print("Average Metrics by Industry (Sorted by AI Adoption Rate):")
print(industry_metrics.sort_values(by='AI Adoption Rate (%)', ascending=False))

## Temporal Analysis

Let's examine trends over time for key metrics.

In [None]:
# Trends over time for key metrics
time_trends = df.groupby('Year').agg({
    'AI Adoption Rate (%)': 'mean',
    'Revenue Increase Due to AI (%)': 'mean',
    'Job Loss Due to AI (%)': 'mean',
    'Consumer Trust in AI (%)': 'mean',
    'Human-AI Collaboration Rate (%)': 'mean',
    'AI-Generated Content Volume (TBs per year)': 'mean'
}).round(2)

fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('Trends Over Time for Key Metrics', fontsize=16, fontweight='bold')

metrics = time_trends.columns
axes = axes.flatten()

for i, metric in enumerate(metrics):
    axes[i].plot(time_trends.index, time_trends[metric], marker='o', linewidth=2, markersize=8)
    axes[i].set_title(metric)
    axes[i].set_xlabel('Year')
    axes[i].set_ylabel('Value')
    axes[i].grid(True, linestyle='--', alpha=0.6)

# Hide the last subplot if we have fewer than 6 metrics
if len(metrics) < 6:
    axes[len(metrics)].axis('off')

plt.tight_layout()
plt.show()

print("Trend Analysis (first year vs. last year):")
for col in time_trends.columns:
    first, last = time_trends[col].iloc[0], time_trends[col].iloc[-1]
    # Compare endpoints only; equal endpoints are reported as flat rather than decreasing
    trend = "increasing" if last > first else "decreasing" if last < first else "flat"
    print(f"{col}: {trend} from {first:.2f} to {last:.2f}")
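
Comparing only the first and last year ignores everything in between. A least-squares slope uses all years and is one possible alternative. The sketch below uses made-up yearly means (the years and values are assumptions) in which the endpoints rise slightly while the fitted slope is negative.

```python
import numpy as np
import pandas as pd

# Synthetic yearly means (years and values are made up)
trend_demo = pd.DataFrame(
    {'AI Adoption Rate (%)': [40.0, 60.0, 55.0, 50.0, 42.0]},
    index=pd.Index([2020, 2021, 2022, 2023, 2024], name='Year'),
)

# Center the years so the fit is numerically well behaved; the slope is unchanged
years = trend_demo.index.to_numpy(dtype=float)
centered_years = years - years.mean()

slopes = {}
for col in trend_demo.columns:
    # Degree-1 polynomial fit: returns (slope, intercept)
    slope, _intercept = np.polyfit(centered_years, trend_demo[col].to_numpy(), deg=1)
    slopes[col] = slope
    endpoint_change = trend_demo[col].iloc[-1] - trend_demo[col].iloc[0]
    print(f"{col}: slope {slope:.2f} per year, endpoint change {endpoint_change:.2f}")
```

On the real data, `trend_demo` would be replaced by `time_trends`.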

## Advanced Analysis: AI Impact Patterns

Let's identify patterns in how AI impacts vary across different combinations of country and industry.

In [None]:
# Create a pivot table for AI adoption by country and industry
pivot_adoption = df.pivot_table(
    values='AI Adoption Rate (%)', 
    index='Country', 
    columns='Industry', 
    aggfunc='mean'
)

# Plot heatmap
plt.figure(figsize=(14, 10))
sns.heatmap(pivot_adoption, annot=True, fmt='.1f', cmap='viridis', cbar_kws={'label': 'AI Adoption Rate (%)'})
plt.title('AI Adoption Rate by Country and Industry', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("Highest AI Adoption Combinations:")
high_adoption = []
for country in pivot_adoption.index:
    for industry in pivot_adoption.columns:
        if not pd.isna(pivot_adoption.loc[country, industry]):
            high_adoption.append((country, industry, pivot_adoption.loc[country, industry]))

high_adoption.sort(key=lambda x: x[2], reverse=True)
for country, industry, rate in high_adoption[:10]:
    print(f"{country} - {industry}: {rate:.1f}%")
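
The ranking loop above can also be expressed by reshaping the pivot table to long form. This is a sketch on a synthetic two-by-two pivot with placeholder country names and made-up values; `melt` turns each cell into a row, and `nlargest` picks the top combinations.

```python
import numpy as np
import pandas as pd

# Synthetic pivot table: one missing country/industry combination
pivot_demo = pd.DataFrame(
    {'Gaming': [70.0, 40.0], 'Media': [np.nan, 65.0]},
    index=pd.Index(['Country A', 'Country B'], name='Country'),
)

# Long form: one row per (Country, Industry) pair, dropping empty cells
long_form = (
    pivot_demo.reset_index()
    .melt(id_vars='Country', var_name='Industry', value_name='AI Adoption Rate (%)')
    .dropna(subset=['AI Adoption Rate (%)'])
)

top_combos = long_form.nlargest(2, 'AI Adoption Rate (%)')
print(top_combos.to_string(index=False))
```

With the real data, `pivot_demo` would be `pivot_adoption` and the `2` would become `10`.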

## Summary of Key Findings

Based on our comprehensive analysis, here are the key findings:

In [None]:
# Summary of findings
print("\nSUMMARY OF KEY FINDINGS:")
print(f"1. Dataset contains {df.shape[0]} records across {df['Country'].nunique()} countries, "
      f"{df['Industry'].nunique()} industries, and {df['Year'].nunique()} years ({int(df['Year'].min())}-{int(df['Year'].max())})")
print(f"2. Most popular AI tools: {', '.join(df['Top AI Tools Used'].value_counts().head(3).index)}")
print(f"3. Average AI adoption rate: {df['AI Adoption Rate (%)'].mean():.2f}%")
print(f"4. Average revenue increase due to AI: {df['Revenue Increase Due to AI (%)'].mean():.2f}%")
print(f"5. Average job loss due to AI: {df['Job Loss Due to AI (%)'].mean():.2f}%")
print(f"6. Average consumer trust in AI: {df['Consumer Trust in AI (%)'].mean():.2f}%")
print(f"7. Countries with highest average AI adoption: {', '.join(country_metrics.nlargest(3, 'AI Adoption Rate (%)').index)}")
print(f"8. Industries with highest average revenue increase from AI: {', '.join(industry_metrics.nlargest(3, 'Revenue Increase Due to AI (%)').index)}")
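# Strongest correlation by absolute value, excluding each variable's correlation with itself
pair_corrs = [
    (abs(correlation_matrix.iloc[i, j]), correlation_matrix.columns[i], correlation_matrix.columns[j])
    for i in range(len(correlation_matrix.columns))
    for j in range(i + 1, len(correlation_matrix.columns))
]
_, strongest_a, strongest_b = max(pair_corrs)
print(f"9. Strongest correlation (by absolute value): {strongest_a} and {strongest_b} "
      f"(r = {correlation_matrix.loc[strongest_a, strongest_b]:.3f})")
print(f"10. Most common regulation status: {regulation_counts.idxmax()} ({regulation_counts.max()} records)")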

print("\nIMPLICATIONS:")
print(f"- Correlation between AI adoption and revenue increase in this dataset: {adoption_revenue_corr:.3f}")
print("- Different countries and industries show varying levels of AI adoption and impact")
print("- Consumer trust in AI varies significantly across regions")
print("- Regulatory approaches differ widely across countries")
print("- Certain industries appear to be leading in AI adoption and implementation")

## Conclusion

This analysis provides insights into the global landscape of AI adoption and its multifaceted impacts. The data reveals significant variations across countries, industries, and time periods. Key observations include:

1. **Geographic Variations**: Different countries show varying levels of AI adoption and impact, suggesting cultural, economic, or policy-related factors.

2. **Industry Differences**: Certain industries like Gaming, Manufacturing, and Media appear to be early adopters of AI technologies.

3. **Economic Impact**: There is evidence of a positive correlation between AI adoption and revenue increases, though job displacement remains a concern.

4. **Social Implications**: Consumer trust in AI varies significantly, which could impact the success of AI implementations.

5. **Regulatory Landscape**: Countries have adopted different approaches to regulating AI, which may influence adoption rates and impacts.

These insights can help organizations, policymakers, and researchers understand the current state of AI adoption and its implications for strategic planning and policy development.