# Exploratory Data Analysis on **Country, Regional and World GDP**

<h2 style="font-family: 'poppins'; font-weight: bold;">👨‍💻Author: Muhammad Hassan Saboor</h2>

[![GitHub](https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github)](https://github.com/MuhammadHassanSaboor) 
[![Kaggle](https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle)](https://www.kaggle.com/mhassansaboor) 
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/muhammad-hassan-saboor/)  
[![Facebook](https://img.shields.io/badge/Facebook-Profile-blue?style=for-the-badge&logo=facebook)](https://www.facebook.com/profile.php?id=61555194218257) 
[![Twitter/X](https://img.shields.io/badge/Twitter-Profile-blue?style=for-the-badge&logo=twitter)](https://twitter.com/MUHAMMA84929767) 
[![Instagram](https://img.shields.io/badge/Instagram-Profile-blue?style=for-the-badge&logo=instagram)](https://www.instagram.com/m_hassan_saboor/) 

# 🌍 Understanding GDP (Gross Domestic Product)

## 📖 What is GDP?
Gross Domestic Product (GDP) represents the total monetary value of all final goods and services produced within a country's borders during a specific time period. It is a key indicator of a country's economic health and performance.

## 💡 Importance of GDP:
- **Economic Growth**: Rising GDP indicates a growing economy, while a declining GDP may signal a recession.
- **Living Standards**: Higher GDP often correlates with better living standards.
- **Policy Decisions**: Policymakers use GDP to assess economic conditions and guide decisions.

## 🔢 Key Components of GDP:
1. **Consumption (C)**: Spending by households on goods and services.
2. **Investment (I)**: Expenditures on capital goods used for future production.
3. **Government Spending (G)**: Public sector expenditures on goods and services.
4. **Net Exports (NX)**: The difference between exports ($X$) and imports ($M$).

### Formula for GDP:
$$
\text{GDP} = C + I + G + (X - M)
$$

Where:
- $C$: Consumption
- $I$: Investment
- $G$: Government Spending
- $X$: Exports
- $M$: Imports
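
As a quick worked example of the expenditure formula above, the sketch below plugs in made-up figures. The function name `gdp_expenditure` and all numbers are illustrative assumptions, not values from the dataset.

```python
def gdp_expenditure(consumption, investment, government, exports, imports):
    """Return GDP via the expenditure approach: C + I + G + (X - M)."""
    return consumption + investment + government + (exports - imports)


# Hypothetical figures in billions of US$ (not real data).
example_gdp = gdp_expenditure(
    consumption=700.0,
    investment=200.0,
    government=250.0,
    exports=150.0,
    imports=180.0,
)
print(example_gdp)  # 1120.0
```

Imports exceed exports here, so net exports are negative and reduce the total.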

## ⚖️ Types of GDP:
1. **Nominal GDP**: The economic output measured at current market prices, not adjusted for inflation.
2. **Real GDP**: Adjusted for inflation, reflecting true economic growth over time.
3. **Per Capita GDP**: GDP divided by the total population, measuring the average output per person.

### Formula for Real GDP:
$$
\text{Real GDP} = \frac{\text{Nominal GDP}}{\text{GDP Deflator}} \times 100
$$

### Formula for Per Capita GDP:
$$
\text{Per Capita GDP} = \frac{\text{GDP}}{\text{Population}}
$$
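
The real and per capita formulas above can be chained, as in this minimal sketch. The helper names and the input figures are hypothetical, and the deflator is assumed to be an index equal to 100 in the base year.

```python
def real_gdp(nominal_gdp, deflator):
    """Real GDP from nominal GDP and a deflator indexed to 100 in the base year."""
    return nominal_gdp / deflator * 100


def gdp_per_capita(gdp, population):
    """Average output per person."""
    return gdp / population


# Hypothetical inputs (not taken from the dataset).
nominal = 1_200e9     # nominal GDP in US$
deflator = 120.0      # prices are 20% above the base year
population = 50e6     # number of residents

real = real_gdp(nominal, deflator)
per_capita = gdp_per_capita(real, population)
print(real, per_capita)
```

Note that the `Value` column in this notebook is nominal GDP, so neither adjustment is applied in the analysis below.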

## 🌟 Insights from GDP Analysis:
- High GDP growth rates indicate strong economic performance.
- Per Capita GDP helps compare living standards across countries.
- Sudden changes in GDP can indicate economic shocks (e.g., wars, recessions, or pandemics).

## 🌐 Factors Influencing GDP:
- Population growth and workforce participation.
- Technological advancements.
- Trade policies and global economic conditions.
- Access to natural resources.

## 🛠️ Using GDP in Analysis:
GDP analysis can:
- Highlight economic trends over time.
- Compare economic performance between countries or regions.
- Detect anomalies linked to significant events (e.g., financial crises or pandemics).


# 📊 Meta Data

## 📌 Title:
**Global GDP Analysis and Visualization**

## 👤 Notebook Author:
**Muhammad Hassan Saboor**

## 📅 Date:
**November 20, 2024**

## 📝 Project Description:
This notebook is an exploratory data analysis (EDA) of global GDP data from 1960 to 2016. It looks at trends over time, the distribution of GDP values, year-on-year changes, and outliers, using interactive Plotly charts and one engineered feature: the year-on-year percentage change in GDP.

## 🎯 Objectives:
1. 🔍 Perform detailed analysis of GDP data over time for different countries and regions.
2. 📈 Visualize relationships between variables and identify anomalies or trends.
3. 🖼️ Develop advanced visualizations to communicate insights effectively.
4. ⚠️ Detect economic shocks and GDP volatility for individual countries and regions.
5. 💡 Provide actionable insights through feature engineering and interactive plots.

## 🌐 Data Source:
- The data comes from the Kaggle dataset `country-regional-and-world-gdp` (file `gdp_csv.csv`); the underlying figures are attributed to the **World Bank**.
- Features include:
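  - 🌍 **Country Name**: The name of the country or of an aggregate grouping (for example, `IDA only`).
  - 🏳️ **Country Code**: Three-letter code for the country or aggregate (mostly ISO 3166-1 alpha-3 for countries).
  - 🕒 **Year**: Year of the recorded GDP value.
  - 💵 **Value**: Nominal GDP in current US$.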

## 🛠️ Tools and Libraries:
- 🐍 **Python Libraries**: Pandas, NumPy, Plotly (Express, Graph Objects, Figure Factory). The KDE plot built with `create_distplot` also requires SciPy.
- 📊 **Visualization Frameworks**: Interactive visualizations with Plotly.
- 🔧 **Data Manipulation**: Feature engineering for insights.

## 🌟 Notable Features:
1. 📊 **Distribution Analysis**: KDE and histogram analysis of GDP values.
2. 📉 **Year-on-Year Changes**: Analysis of annual GDP changes across countries.
3. 🔗 **Relationship Analysis**: Scatter plots and correlation heatmaps for variable relationships.
4. 🗺️ **Choropleth Maps**: Interactive world maps to represent GDP values.
5. 🎥 **GDP Trends Animation**: Animation showing GDP evolution over time.
6. 🚨 **Outlier Detection**: Identifying economic anomalies and shocks.

## 🔍 Insights:
- 📈 Economic stability trends and volatility analysis for individual regions.
- 🌟 Identification of high-growth and low-growth countries over time.
- 🛑 Detection of anomalies linked to historical events (e.g., wars, recessions).

## 🙏 Acknowledgments:
Special thanks to the **World Bank** for providing high-quality data and the open-source Python community for robust libraries like Pandas and Plotly.

## 📂 Usage:
This notebook can be used by data scientists, economists, policymakers, or researchers interested in global economic trends. All visualizations are interactive and designed for presentation and further analysis.


## 📚 Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import warnings

## ⚙️ Basic Settings

In [2]:
warnings.filterwarnings("ignore")

## 📥 Loading the Dataset

In [3]:
df = pd.read_csv("/kaggle/input/country-regional-and-world-gdp/gdp_csv.csv")

## 📊 Exploring the Dataset

In [4]:
df.sample(10)

| | Country Name | Country Code | Year | Value |
|---|---|---|---|---|
| 6776 | Korea, Rep. | KOR | 1970 | 8999227000.0 |
| 8815 | Paraguay | PRY | 1981 | 5219517000.0 |
| 3988 | Channel Islands | CHI | 2007 | 11514610000.0 |
| 10231 | Swaziland | SWZ | 2009 | 3580417000.0 |
| 918 | IDA only | IDX | 2008 | 633846400000.0 |
| 11271 | Vietnam | VNM | 1993 | 13180950000.0 |
| 3675 | Cabo Verde | CPV | 1982 | 140630800.0 |
| 9472 | Seychelles | SYC | 1970 | 18432030.0 |
| 7833 | Micronesia, Fed. Sts. | FSM | 1996 | 218845700.0 |
| 4362 | Costa Rica | CRI | 2012 | 46473130000.0 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11507 entries, 0 to 11506
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Country Name  11507 non-null  object 
 1   Country Code  11507 non-null  object 
 2   Year          11507 non-null  int64  
 3   Value         11507 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 359.7+ KB


In [6]:
start_year = df["Year"].min()
end_year = df["Year"].max()

In [7]:
print(f"The dataset covers GDP values from {start_year} to {end_year}")

The dataset covers GDP values from 1960 to 2016
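
The overall year range does not say whether every country has a value for every year. A minimal sketch of a per-country coverage check follows; it uses a toy frame with hypothetical codes and values, and the same pattern could be applied to `df`.

```python
import pandas as pd

# Toy data with hypothetical codes; "BBB" starts late and skips 1991.
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Year": [1960, 1961, 1962, 1990, 1992],
    "Value": [1.0, 1.1, 1.2, 5.0, 5.5],
})

# First year, last year, and number of observations per code.
coverage = toy.groupby("Country Code")["Year"].agg(
    first_year="min", last_year="max", n_years="count"
)

# Years missing between the first and last observation.
coverage["missing_years"] = (
    coverage["last_year"] - coverage["first_year"] + 1 - coverage["n_years"]
)
print(coverage)
```

Countries with a late `first_year` or a nonzero `missing_years` are worth keeping in mind when reading the time-series and year-on-year charts.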


## 🔍 Exploratory Data Analysis (EDA)

In [8]:
df = df.sort_values(by='Year')

## ⏳ Time-Series Analysis

In [9]:
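# Use the dataset's own "World" aggregate (code 'WLD', assumed to be present)
# instead of summing every row: a sum would also add regional and income-group
# aggregates on top of the individual countries and overstate the total.
global_gdp = df.loc[df['Country Code'] == 'WLD', ['Year', 'Value']]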
fig_global = px.line(
    global_gdp, 
    x='Year', 
    y='Value',  
    title="Global GDP Trend Over Years",
    labels={'Value': 'Global GDP (US$)', 'Year': 'Year'}
)
fig_global.update_traces(line_color='blue')
fig_global.update_layout(template='plotly_dark')
fig_global.show()
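
Because the dataset mixes individual countries with aggregates such as `IDA only`, a per-year sum over all rows counts the same output more than once. The toy frame below contrasts such a sum with reading a world aggregate row directly. The names and numbers are hypothetical, and the `WLD` code for the world aggregate is an assumption about the real data.

```python
import pandas as pd

# Toy data: two countries plus a "World" row that already equals their sum.
toy = pd.DataFrame({
    "Country Name": ["Alpha", "Beta", "World", "Alpha", "Beta", "World"],
    "Country Code": ["AAA", "BBB", "WLD", "AAA", "BBB", "WLD"],
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Value": [10.0, 20.0, 30.0, 12.0, 24.0, 36.0],
})

# Summing every row adds the aggregate on top of its own members.
naive_total = toy.groupby("Year")["Value"].sum()

# Reading the aggregate row avoids the double count.
world_total = toy.loc[toy["Country Code"] == "WLD"].set_index("Year")["Value"]

print(naive_total / world_total)  # ratio of 2.0 in both years for this toy data
```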

In [10]:
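# Note: the data also contains aggregate rows (e.g. 'IDA only'), so the largest
# cumulative totals may belong to aggregates rather than to individual countries.
top_countries = df.groupby('Country Name')['Value'].sum().nlargest(3).index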
df_top_countries = df[df['Country Name'].isin(top_countries)]
fig_countries = px.line(
    df_top_countries, 
    x='Year', 
    y='Value', 
    color='Country Name', 
    title="Country-Specific GDP Trends",
    labels={'Value': 'GDP (US$)', 'Year': 'Year', 'Country Name': 'Country'}
)
fig_countries.update_layout(template='plotly_dark')
fig_countries.show()
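
One way to rank only individual countries is to drop aggregate codes before grouping. The sketch below is a minimal illustration under assumptions: `AGGREGATE_CODES` is a hypothetical and deliberately incomplete set (only `IDX` appears in the sample output above; the others are assumed World Bank aggregate codes), and `top_countries_by_total` is a helper name invented here.

```python
import pandas as pd

# Hypothetical, incomplete set of aggregate codes; extend it after
# inspecting df['Country Name'].unique() in the real data.
AGGREGATE_CODES = frozenset({"WLD", "HIC", "OED", "IDX"})


def top_countries_by_total(frame, n=3, aggregate_codes=AGGREGATE_CODES):
    """Return the n country names with the largest summed Value, skipping aggregates."""
    countries_only = frame[~frame["Country Code"].isin(aggregate_codes)]
    totals = countries_only.groupby("Country Name")["Value"].sum()
    return totals.nlargest(n).index.tolist()


# Toy data with made-up names and values.
toy = pd.DataFrame({
    "Country Name": ["Alpha", "Beta", "Gamma", "World", "High income"],
    "Country Code": ["AAA", "BBB", "CCC", "WLD", "HIC"],
    "Value": [5.0, 9.0, 2.0, 20.0, 14.0],
})

print(top_countries_by_total(toy, n=2))  # ['Beta', 'Alpha']
```

Without the filter, the two largest entries in this toy frame would be `World` and `High income`.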

## 🌍 Geographic Analysis

In [11]:
fig_animated = px.choropleth(
    df,
    locations='Country Code',
    color='Value',
    hover_name='Country Name',
    animation_frame='Year',
    title="GDP Distribution Over Years",
    color_continuous_scale=px.colors.sequential.Plasma,
    labels={'Value': 'GDP (US$)'}
)
fig_animated.update_layout(template='plotly_dark')
fig_animated.show()
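
GDP values span several orders of magnitude, so a linear color scale tends to show most of the map in a single shade. A common workaround, sketched here as an assumption rather than as part of the original notebook, is to color by the base-10 logarithm. The two values are taken from the sample output above; the column name `log10_gdp` is hypothetical.

```python
import numpy as np
import pandas as pd

# Two rows from the sample output shown earlier (different years, illustration only).
toy = pd.DataFrame({
    "Country Code": ["SYC", "CRI"],
    "Value": [18432030.0, 46473130000.0],
})

# On a linear scale the larger value is thousands of times the smaller one.
linear_ratio = toy["Value"].max() / toy["Value"].min()

# On a log10 scale the two values differ by only a few units.
toy["log10_gdp"] = np.log10(toy["Value"])
log_gap = toy["log10_gdp"].max() - toy["log10_gdp"].min()

print(round(linear_ratio), round(log_gap, 2))
```

Passing `color="log10_gdp"` to `px.choropleth` would then spread the colors more evenly, at the cost of a color bar labeled in exponents rather than dollars.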

## ⚖️ Comparative Analysis

In [12]:
year_to_plot = 2000
# Aggregate rows are not filtered out here, so some "countries" may be groupings.
df_top_n = df[df['Year'] == year_to_plot].sort_values(by='Value', ascending=False).head(5)

fig_top_n = px.bar(
    df_top_n,
    x='Country Name',
    y='Value',
    title=f"Top 5 Countries by GDP in {year_to_plot}",
    labels={'Value': 'GDP (US$)', 'Country Name': 'Country'},
    text='Value',
    color='Country Name',
    color_discrete_sequence=px.colors.sequential.Plasma
)
fig_top_n.update_layout(template='plotly_dark', showlegend=False)
fig_top_n.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig_top_n.show()

In [13]:
selected_countries = ['China','India']
df_selected = df[df['Country Name'].isin(selected_countries)]

fig_trends = px.line(
    df_selected,
    x='Year',
    y='Value',
    color='Country Name',
    title="GDP Trends for Selected Countries",
    labels={'Value': 'GDP (US$)', 'Year': 'Year', 'Country Name': 'Country'},
    markers=True
)
fig_trends.update_layout(template='plotly_dark')
fig_trends.show()
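
When two economies differ a lot in size, absolute GDP lines mostly show the level gap. Rebasing each series to 100 in a common year makes relative growth easier to compare. The following is a minimal sketch with made-up names and numbers; `base_year` and the column name `Index (base=100)` are assumptions.

```python
import pandas as pd

# Toy data: Alpha is ten times larger, Beta grows faster.
toy = pd.DataFrame({
    "Country Name": ["Alpha"] * 3 + ["Beta"] * 3,
    "Year": [2000, 2001, 2002] * 2,
    "Value": [100.0, 110.0, 121.0, 10.0, 12.0, 15.0],
})

base_year = 2000

# GDP of each country in the base year, indexed by country name.
base_values = toy.loc[toy["Year"] == base_year].set_index("Country Name")["Value"]

# Express every observation relative to that country's base-year value.
toy["Index (base=100)"] = toy["Value"] / toy["Country Name"].map(base_values) * 100
print(toy)
```

Plotting the index column instead of `Value` puts both lines at 100 in the base year, so their slopes can be compared directly.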

In [14]:
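regions = ['North America', 'Asia', 'Europe', 'Africa', 'South America', 'Oceania']
# PLACEHOLDER: the dataset has no region column, so each row gets a random label.
# The chart below only demonstrates the plot type; swap in a real
# country-to-region mapping before drawing any regional conclusions.
np.random.seed(42)  # make the placeholder assignment reproducible
df['Region'] = np.random.choice(regions, size=len(df))

# Box plot of GDP by (placeholder) region
fig_regions = px.box(
    df,
    x='Region',
    y='Value',
    color='Region',
    title="GDP Distribution by Region (placeholder labels)",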
    labels={'Value': 'GDP (US$)', 'Region': 'Region'},
    color_discrete_sequence=px.colors.qualitative.Set3
)
fig_regions.update_layout(template='plotly_dark', showlegend=False)
fig_regions.show()
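
The region labels in the cell above are drawn at random, so the box plot carries no real regional information. A meaningful version needs an explicit lookup from country code to region. The sketch below shows the pattern with a small hand-written dictionary; `REGION_BY_CODE` is a hypothetical, incomplete mapping, and the values in the toy frame are made up.

```python
import pandas as pd

# Hypothetical, incomplete lookup; a full analysis would need every code.
REGION_BY_CODE = {
    "USA": "North America",
    "CHN": "Asia",
    "IND": "Asia",
    "DEU": "Europe",
    "BRA": "South America",
    "NGA": "Africa",
    "AUS": "Oceania",
}

# Toy rows, including one aggregate code that has no region.
toy = pd.DataFrame({
    "Country Code": ["USA", "CHN", "IND", "IDX"],
    "Value": [10.0, 8.0, 2.0, 1.0],
})

# Codes missing from the lookup become NaN instead of a random label.
toy["Region"] = toy["Country Code"].map(REGION_BY_CODE)

unmapped = toy.loc[toy["Region"].isna(), "Country Code"].tolist()
with_region = toy.dropna(subset=["Region"])
print(unmapped)
print(with_region)
```

Listing the unmapped codes makes it clear which rows are dropped from a regional chart, and it removes aggregates as a side effect.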

## 📊 Distribution Analysis

In [15]:
fig_hist = px.histogram(
    df,
    x='Value',
    nbins=50,
    title='Distribution of GDP Values',
    labels={'Value': 'GDP (US$)'},
    color_discrete_sequence=['#636EFA']
)
fig_hist.update_layout(template='plotly_dark')
fig_hist.show()

In [16]:
fig_kde = ff.create_distplot(
    [df['Value']],
    group_labels=['GDP (US$)'],
    show_hist=False,
    colors=['#00CC96']
)
fig_kde.update_layout(
    title='KDE Plot for GDP Values',
    template='plotly_dark',
    xaxis_title='GDP (US$)',
    yaxis_title='Density'
)
fig_kde.show()

In [17]:
df_sorted = df.sort_values(by='Value')
fig_cdf = px.line(
    df_sorted,
    x='Value',
    y=np.arange(1, len(df_sorted) + 1) / len(df_sorted),
    title='Cumulative Distribution of GDP Values',
    labels={'Value': 'GDP (US$)', 'y': 'Cumulative Probability'},
    color_discrete_sequence=['#EF553B']
)
fig_cdf.update_layout(template='plotly_dark')
fig_cdf.show()
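
The y-values in the cumulative plot above are the empirical CDF: after sorting, the i-th smallest of n values is assigned probability i/n. A small NumPy sketch of the same computation follows, using made-up numbers.

```python
import numpy as np

# Hypothetical observations (not GDP values).
values = np.array([3.0, 1.0, 4.0, 2.0])

# Sort, then assign cumulative probability i/n to the i-th smallest value.
sorted_values = np.sort(values)
ecdf = np.arange(1, len(sorted_values) + 1) / len(sorted_values)

# Fraction of observations less than or equal to a chosen threshold.
threshold = 2.0
fraction_at_or_below = (
    np.searchsorted(sorted_values, threshold, side="right") / len(sorted_values)
)

print(sorted_values, ecdf, fraction_at_or_below)
```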

In [18]:
fig_box = px.box(
    df,
    x='Year',
    y='Value',
    title='Year-wise GDP Distribution',
    labels={'Value': 'GDP (US$)', 'Year': 'Year'},
    color_discrete_sequence=['#AB63FA']
)
fig_box.update_layout(template='plotly_dark')
fig_box.show()

## 📅 Year-on-Year Changes

In [19]:
df['YoY Change (%)'] = df.groupby('Country Name')['Value'].pct_change() * 100
fig_yoy_all = px.line(
    df,
    x='Year',
    y='YoY Change (%)',
    color='Country Name',
    title="Year-on-Year GDP Change for All Countries",
    labels={'YoY Change (%)': 'GDP Change (%)', 'Year': 'Year', 'Country Name': 'Country'},
    # markers=True
)
fig_yoy_all.update_layout(template='plotly_dark',width=1200,height=500)
fig_yoy_all.show()
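
`pct_change` compares each row with the previous row of the same group, so the result is a true year-on-year change only when rows are sorted by year and no year is missing. The sketch below, on toy data with hypothetical names and values, sorts explicitly and blanks out changes that span a gap.

```python
import pandas as pd

# Toy data: rows are unsorted and Beta has no 2001 observation.
toy = pd.DataFrame({
    "Country Name": ["Alpha", "Alpha", "Alpha", "Beta", "Beta"],
    "Year": [2002, 2000, 2001, 2000, 2002],
    "Value": [121.0, 100.0, 110.0, 50.0, 60.0],
})

# Sort within each country so "previous row" means "previous observation".
toy = toy.sort_values(["Country Name", "Year"]).reset_index(drop=True)

# Percentage change relative to the previous observation of the same country.
toy["YoY Change (%)"] = toy.groupby("Country Name")["Value"].pct_change() * 100

# Keep the change only when the previous observation is exactly one year earlier.
is_consecutive = toy.groupby("Country Name")["Year"].diff() == 1
toy.loc[~is_consecutive, "YoY Change (%)"] = float("nan")
print(toy)
```

In this toy frame, Beta's jump from 2000 to 2002 is a two-year change, so it is set to NaN instead of being reported as a 20% annual change.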

In [20]:
selected_country = 'Pakistan'  # Replace with any country you want to visualize

df_selected = df[df['Country Name'] == selected_country]

fig_yoy_selected = px.bar(
    df_selected,
    x='Year',
    y='YoY Change (%)',
    title=f"Year-on-Year GDP Change for {selected_country}",
    labels={'YoY Change (%)': 'GDP Change (%)', 'Year': 'Year'},
)
fig_yoy_selected.update_layout(template='plotly_dark')
fig_yoy_selected.show()

## 🔗 Relationship Between Variables

In [21]:
fig_gdp_year = px.scatter(
    df,
    x='Year',
    y='Value',
    title="GDP vs Year for All Countries",
    labels={'Value': 'GDP (US$)', 'Year': 'Year'},
    color='Country Name',
    hover_name='Country Name',
    template='plotly_dark'
)

# Adjust the layout for better visualization
fig_gdp_year.update_layout(
    width=1200,  # Adjust width
    height=600   # Adjust height
)

fig_gdp_year.show()

In [22]:
df_numeric = df.select_dtypes(include=['float64', 'int64'])
correlation_matrix = df_numeric.corr()

# Create heatmap for correlation
fig_corr = px.imshow(
    correlation_matrix,
    title="Correlation Heatmap of Variables",
    labels=dict(x="Variables", y="Variables", color="Correlation Coefficient"),
    template='plotly_dark',
    color_continuous_scale='Viridis'
)

fig_corr.update_layout(
    width=800,  # Adjust width
    height=600  # Adjust height
)

fig_corr.show()

# 📊 Advanced Visualization

In [23]:
fig_bubble = px.scatter(
    df,
    x='Year',
    y='Value',
    size='Value',  # Bubble size represents GDP value
    color='Country Name',  # Color by Country
    hover_name='Country Name',  # Hover text to show Country Name
    title="GDP vs Year with Bubble Size Representing GDP Value",
    labels={'Value': 'GDP Value (in USD)', 'Year': 'Year', 'Country Name': 'Country'},
    template='plotly_dark'
)

fig_bubble.update_layout(
    width=1200,  # Adjust width
    height=600  # Adjust height
)

fig_bubble.show()

In [24]:
fig_gdp_animation = px.choropleth(
    df,
    locations="Country Name",
    locationmode="country names",
    color="Value",
    hover_name="Country Name",
    color_continuous_scale='Viridis',
    animation_frame="Year",  # Animation based on the 'Year' column
    title="GDP Evolution Over Time",
    labels={'Value': 'GDP Value (in USD)'}
)

fig_gdp_animation.update_layout(
    geo=dict(
        showcoastlines=True, 
        coastlinecolor="Black", 
        projection_type="natural earth", 
        bgcolor='black'  # Set map background to black
    ),
    width=1200,  # Adjust width
    height=600,  # Adjust height
    plot_bgcolor='black',  # Set plot background color to black
    paper_bgcolor='black',  # Set the overall background color to black
    font=dict(color='white')  # Set font color to white for better visibility
)

fig_gdp_animation.show()


## 🚨 Anomalies and Outliers

In [25]:
fig_outliers = px.box(
    df,
    x="Country Name",
    y="Value",
    title="GDP Outliers for Countries",
    labels={"Country Name": "Country", "Value": "GDP Value (in USD)"},
    template="plotly_dark"
)

fig_outliers.update_layout(
    width=1200,  # Adjust width
    height=600,  # Adjust height
    xaxis_tickangle=-45,  # Rotate x-axis labels for better readability
    showlegend=False
)

fig_outliers.show()
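
The points drawn beyond the whiskers in a box plot follow the usual 1.5 × IQR rule. The sketch below applies that rule per country with pandas on made-up data so the flagged rows can be listed rather than read off the chart. The column name `is_outlier` is an assumption, and Plotly's quartile method may differ slightly from pandas' default interpolation, so borderline points could be classified differently.

```python
import pandas as pd

# Toy data: Alpha has one extreme value, Beta has none.
toy = pd.DataFrame({
    "Country Name": ["Alpha"] * 6 + ["Beta"] * 4,
    "Value": [10.0, 11.0, 12.0, 13.0, 14.0, 100.0, 1.0, 2.0, 3.0, 4.0],
})

grouped = toy.groupby("Country Name")["Value"]

# Per-country quartiles, broadcast back to every row of that country.
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Flag values more than 1.5 * IQR outside the quartiles.
toy["is_outlier"] = (toy["Value"] < q1 - 1.5 * iqr) | (toy["Value"] > q3 + 1.5 * iqr)
print(toy[toy["is_outlier"]])
```

For a steadily growing GDP series, the most recent years are often flagged simply because they are the largest, which is worth remembering before reading such points as anomalies.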

In [26]:
outliers_gdp_change = df[df['YoY Change (%)'].abs() > 50]  # Example: YoY change > 50% or <-50%
fig_economic_shocks = px.scatter(
    outliers_gdp_change,
    x="Year",
    y="YoY Change (%)",
    color="Country Name",
    title="Economic Shocks: Drastic GDP Changes",
    labels={'YoY Change (%)': 'GDP Change (%)', 'Year': 'Year'},
    template="plotly_dark"
)

fig_economic_shocks.update_layout(
    width=1200,  # Adjust width
    height=600,  # Adjust height
)

fig_economic_shocks.show()
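
A fixed 50% threshold picks out individual shocks; a complementary view is how volatile each country's growth is overall. One simple measure, assumed here rather than taken from the notebook, is the standard deviation of the year-on-year change per country. The toy values below are hypothetical.

```python
import pandas as pd

# Toy year-on-year changes; the first year of each country has no change.
toy = pd.DataFrame({
    "Country Name": ["Alpha"] * 4 + ["Beta"] * 4,
    "YoY Change (%)": [float("nan"), 2.0, 3.0, 2.5, float("nan"), -40.0, 60.0, 5.0],
})

# Sample standard deviation per country (NaN values are skipped), most volatile first.
volatility = (
    toy.groupby("Country Name")["YoY Change (%)"]
    .std()
    .sort_values(ascending=False)
)
print(volatility)
```

Applied to `df`, the same grouping would give a ranking that could be shown as a bar chart next to the shock scatter plot.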

# 💬 Thank You for Exploring!

I hope this notebook provided valuable insights into global GDP trends through interactive visualizations and analysis. Your journey here reflects a shared passion for uncovering stories hidden within data.

If you found this work helpful or have suggestions for improvement, feel free to leave feedback. Together, we can make data exploration even more impactful. 🌟

Happy Analyzing! 🚀

### Muhammad Hassan Saboor