# GDP per Capita Analysis with Pandas

This notebook analyses GDP per capita across different countries and UN regions.  
We clean the data, calculate averages, rank countries, and create simple visualisations.

## 1. Load the dataset

We load `gdp_per_capita.csv` into a DataFrame.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load GDP per capita dataset
df = pd.read_csv('../data/gdp_per_capita.csv')

df.head(10)

## 2. Basic inspection

We check the structure and summary statistics to understand the shape of the dataset.

In [None]:
# Show column names, data types, and non-null counts
df.info()

In [None]:
# Summary statistics for numeric and non-numeric columns
df.describe(include='all')

## 3. Select relevant columns

This dataset includes multiple fields, but we focus on:
- `Country/Territory`
- `UN_Region`
- `GDP_per_capita`

In [None]:
cols_to_keep = ['Country/Territory', 'UN_Region', 'GDP_per_capita']
# Copy the selection so it is an independent DataFrame rather than a view
df = df[cols_to_keep].copy()

df.head()
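Selecting columns by name raises a `KeyError` if a name is misspelled or missing from the file. A minimal sketch of a check we could run first is shown here on a made-up toy DataFrame; the `Population` column and the `toy` and `missing` names are assumptions for illustration, not part of the real dataset.

```python
import pandas as pd

expected_cols = ['Country/Territory', 'UN_Region', 'GDP_per_capita']

# Toy frame that deliberately lacks one expected column (illustrative only)
toy = pd.DataFrame({
    'Country/Territory': ['A'],
    'UN_Region': ['Europe'],
    'Population': [1000],
})

# List every expected column that is absent before trying to select it
missing = [col for col in expected_cols if col not in toy.columns]
print('Missing columns:', missing)
```

An empty list means the selection step is safe to run.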

## 4. Handle missing values

We drop any row with a missing value in one of the three key columns, since a row without a country, region, or GDP figure cannot be grouped or ranked.

In [None]:
# Check how many missing values are in each column
df.isna().sum()

In [None]:
# Remove rows with missing critical values
df_clean = df.dropna(subset=['Country/Territory', 'UN_Region', 'GDP_per_capita'])

df_clean.head()
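It is worth knowing how many rows the cleaning step removes, because a large loss could bias the regional averages. The sketch below shows one way to count them. It uses a small made-up DataFrame (`toy`) rather than the real CSV, so the numbers are illustrative only.

```python
import pandas as pd

# Toy data standing in for the real CSV (values are made up)
toy = pd.DataFrame({
    'Country/Territory': ['A', 'B', 'C', None],
    'UN_Region': ['Europe', None, 'Asia', 'Africa'],
    'GDP_per_capita': [50000.0, 12000.0, None, 900.0],
})

key_cols = ['Country/Territory', 'UN_Region', 'GDP_per_capita']
toy_clean = toy.dropna(subset=key_cols)

# Compare row counts before and after dropping incomplete rows
rows_dropped = len(toy) - len(toy_clean)
print(f'Dropped {rows_dropped} of {len(toy)} rows')
```

The same comparison works on the real data by replacing `toy` and `toy_clean` with `df` and `df_clean`.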

## 5. GDP per capita by UN region

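We calculate the mean GDP per capita for each UN region and sort the regions from highest to lowest. This is an unweighted mean across countries, so a small country counts as much as a large one.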

In [None]:
region_gdp = (
    df_clean
    .groupby('UN_Region')['GDP_per_capita']
    .mean()
    .sort_values(ascending=False)
)

region_gdp
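A mean can be pulled upwards by a few very high values, so it may help to view the median and the number of countries alongside it. This is a minimal sketch on made-up toy data, not a result from the real dataset; the `region_summary` name is an assumption for illustration.

```python
import pandas as pd

# Toy data: one very high value in Europe pulls its mean above its median
toy = pd.DataFrame({
    'UN_Region': ['Europe', 'Europe', 'Europe', 'Asia', 'Asia'],
    'GDP_per_capita': [100000.0, 20000.0, 30000.0, 5000.0, 15000.0],
})

# Compute several statistics per region in a single pass
region_summary = (
    toy
    .groupby('UN_Region')['GDP_per_capita']
    .agg(['mean', 'median', 'count'])
    .sort_values('mean', ascending=False)
)
print(region_summary)
```

A large gap between `mean` and `median` suggests the regional average is driven by a few outliers.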

### 5.1 Visualise regional averages

A bar chart makes it easy to compare average GDP per capita across regions.

In [None]:
plt.figure(figsize=(10, 6))
region_gdp.plot(kind='bar')
plt.ylabel('Average GDP per capita')
plt.title('Average GDP per Capita by UN Region')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

## 6. Top and bottom countries by GDP per capita

We identify the 10 countries and territories with the highest GDP per capita and the 10 with the lowest.

In [None]:
# Top 10 countries
top_10 = df_clean.sort_values(by='GDP_per_capita', ascending=False).head(10)
top_10

In [None]:
# Bottom 10 countries
bottom_10 = df_clean.sort_values(by='GDP_per_capita', ascending=True).head(10)
bottom_10
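As an alternative to sorting the whole table, pandas provides `nlargest` and `nsmallest`, which return the top or bottom rows directly. The sketch below uses a made-up toy DataFrame and picks two rows instead of ten to keep the example short.

```python
import pandas as pd

# Toy data standing in for the cleaned dataset (values are made up)
toy = pd.DataFrame({
    'Country/Territory': ['A', 'B', 'C', 'D', 'E'],
    'GDP_per_capita': [900.0, 50000.0, 12000.0, 300.0, 75000.0],
})

# Rows come back already ordered: highest first for nlargest,
# lowest first for nsmallest
top_2 = toy.nlargest(2, 'GDP_per_capita')
bottom_2 = toy.nsmallest(2, 'GDP_per_capita')

print(top_2)
print(bottom_2)
```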

### 6.1 Visualise top 10 countries

A horizontal bar chart makes it easy to compare GDP per capita across top-performing countries.

In [None]:
plt.figure(figsize=(10, 6))
plt.barh(top_10['Country/Territory'], top_10['GDP_per_capita'])
plt.xlabel('GDP per capita')
plt.title('Top 10 Countries by GDP per Capita')
plt.gca().invert_yaxis()  # highest at the top
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

## 7. Distribution of GDP per capita

This histogram shows how GDP per capita values are spread across all countries.

In [None]:
plt.figure(figsize=(8, 5))
df_clean['GDP_per_capita'].plot(kind='hist', bins=30)
plt.xlabel('GDP per capita')
plt.ylabel('Frequency')
plt.title('Distribution of GDP per Capita')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
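Income data is often right-skewed, with many low values and a long tail of high ones, which can squash most of a histogram into the first few bins. Assuming that holds for this dataset, a log transform is one common way to spread the values out. The sketch below checks skewness before and after on a made-up series; the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy series with one very large value (made up for illustration)
gdp = pd.Series([500.0, 1000.0, 2000.0, 4000.0, 80000.0], name='GDP_per_capita')

# Positive skewness indicates a long right tail
skew_raw = gdp.skew()

# log10 compresses large values; it requires strictly positive inputs
log_gdp = np.log10(gdp)
skew_log = log_gdp.skew()

print(f'Skewness (raw): {skew_raw:.2f}')
print(f'Skewness (log10): {skew_log:.2f}')
```

If the real data shows a similar pattern, the transformed series can be passed to the same histogram call, with the x-axis labelled as log10 of GDP per capita.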

## 8. Conclusions

From this analysis we can observe:

- Which UN regions have the highest and lowest average GDP per capita  
- Clear differences between the top and bottom countries  
- The overall distribution of global GDP per capita  

This is a simple example of exploring and visualising real-world economic data with pandas and Matplotlib.