# Country Area and Population Density Analysis

This notebook analyzes the relationship between country areas and population densities using the world population dataset.

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display  # built in inside Jupyter; imported explicitly for clarity

# Set style for better visualizations
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 8)

# Load the data
df = pd.read_csv('world_population.csv')

# Display basic information about the dataset
print(f"Total countries/territories in dataset: {len(df)}")
print("\nFirst few rows of the dataset:")
display(df.head())
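
Before computing statistics, it can help to confirm that the two columns used throughout are numeric, complete, and strictly positive, because the later log-scale plots cannot show zero or negative values. Below is a minimal sketch on a small toy frame. The values are hypothetical, not taken from `world_population.csv`, and the sketch assumes the column names used in this notebook.

```python
import pandas as pd

# Toy stand-in for the real dataset; all values are made up for illustration
toy = pd.DataFrame({
    'Country/Territory': ['A', 'B', 'C', 'D'],
    'Area (km²)': [1000.0, 0.0, 250.5, None],
    'Density (per km²)': [50.0, 120.0, None, 8.0],
})

cols = ['Area (km²)', 'Density (per km²)']

# Number of missing values per column
missing = toy[cols].isna().sum()

# Rows with a zero or negative value, which a log axis cannot display
# (NaN compares as False, so missing values are not counted here)
non_positive = toy[(toy[cols] <= 0).any(axis=1)]

print(toy[cols].dtypes)
print(missing)
print(non_positive)
```

In the notebook itself, the same two checks would be run against `df` instead of the toy frame.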

## 1. Basic Statistics
Let's look at the basic statistics for area and population density.

In [None]:
# Display statistics for area and density
print("\n--- Area (km²) Statistics ---")
print(df['Area (km²)'].describe().to_string())

print("\n--- Population Density (per km²) Statistics ---")
print(df['Density (per km²)'].describe().to_string())
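
One quick way to quantify how lopsided these distributions are is to compare the mean with the median and compute the sample skewness with `Series.skew()`. A mean far above the median, together with a large positive skew, indicates a long right tail, and that is the usual reason for switching to log scales. The sketch below uses hypothetical toy areas.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed areas: many small values and one very large one
area = pd.Series([2.0, 10.0, 50.0, 300.0, 1_000.0, 17_000_000.0], name='Area (km²)')

summary = pd.Series({
    'mean': area.mean(),
    'median': area.median(),
    'skew': area.skew(),                   # sample skewness of the raw values
    'skew_log10': np.log10(area).skew(),   # skewness after a log10 transform
})
print(summary.round(2))
```

For this toy series, the log transform pulls the skewness down noticeably. How much it helps on the real columns depends on the data.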

## 2. Top and Bottom Countries
Let's identify the countries with the largest and smallest areas, as well as the highest and lowest population densities.

In [None]:
# Find top and bottom countries by area and density
top_largest = df.nlargest(10, 'Area (km²)')[['Country/Territory', 'Area (km²)']]
top_smallest = df.nsmallest(10, 'Area (km²)')[['Country/Territory', 'Area (km²)']]
top_dense = df.nlargest(10, 'Density (per km²)')[['Country/Territory', 'Density (per km²)']]
least_dense = df.nsmallest(10, 'Density (per km²)')[['Country/Territory', 'Density (per km²)']]

# Display the results
print("\n--- 10 Largest Countries by Area ---")
display(top_largest)

print("\n--- 10 Smallest Countries by Area ---")
display(top_smallest)

print("\n--- 10 Most Densely Populated Countries ---")
display(top_dense)

print("\n--- 10 Least Densely Populated Countries ---")
display(least_dense)
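
By default, `nlargest` and `nsmallest` return exactly `n` rows (`keep='first'`). If several territories share the value at the cutoff, some of them are therefore dropped without warning. Passing `keep='all'` keeps every tied row, which can return more than `n` rows. A sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy data with a tie at the smallest area
toy = pd.DataFrame({
    'Country/Territory': ['A', 'B', 'C', 'D', 'E'],
    'Area (km²)': [1.0, 1.0, 2.0, 5.0, 9.0],
})

# Default keep='first': exactly n rows, ties at the cutoff broken by row order
first_only = toy.nsmallest(1, 'Area (km²)')

# keep='all': every row tied with the last selected value is kept
with_ties = toy.nsmallest(1, 'Area (km²)', keep='all')

print(first_only)
print(with_ties)
```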

## 3. Visualizing the Data
Let's create visualizations to better understand the distribution and relationship between area and population density.

In [None]:
# Create a figure with multiple subplots
plt.figure(figsize=(18, 12))

# Plot 1: Area distribution (log scale)
plt.subplot(2, 2, 1)
# log_scale=True computes the bin edges (and the KDE) in log space, not just the axis
sns.histplot(data=df, x='Area (km²)', bins=50, kde=True, log_scale=True)
plt.title('Distribution of Country Areas (log scale)')
plt.xlabel('Area (km², log scale)')

# Plot 2: Density distribution (log scale)
plt.subplot(2, 2, 2)
# log_scale=True computes the bin edges (and the KDE) in log space, not just the axis
sns.histplot(data=df, x='Density (per km²)', bins=50, kde=True, log_scale=True)
plt.title('Distribution of Population Density (log scale)')
plt.xlabel('Density (per km², log scale)')

# Plot 3: Area vs Density (log-log scale)
plt.subplot(2, 1, 2)
sns.scatterplot(data=df, x='Area (km²)', y='Density (per km²)', alpha=0.6)
plt.xscale('log')
plt.yscale('log')
plt.title('Area vs Population Density (log-log scale)')
plt.xlabel('Area (km², log scale)')
plt.ylabel('Density (per km², log scale)')

# Adjust layout and display
plt.tight_layout()
plt.show()
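
Why the histogram bins matter on a log axis: equal-width linear bins only change the axis, so nearly every country lands in the first bin. Bins of equal width in log10 space spread the values out. The sketch below bins hypothetical values both ways with NumPy. Binning `np.log10(values)` directly is equivalent to using log-spaced edges, and it keeps the data's minimum and maximum exactly on the outer edges.

```python
import numpy as np

# Hypothetical positive values spanning several orders of magnitude
values = np.array([0.5, 3.0, 40.0, 700.0, 9_000.0, 120_000.0, 17_000_000.0])

# Equal-width linear bins: almost everything falls into the first bin
linear_counts, linear_edges = np.histogram(values, bins=5)

# Equal-width bins in log10 space; 10**edges gives the edges in original units
log_counts, log10_edges = np.histogram(np.log10(values), bins=5)
log_edges = 10 ** log10_edges

print(linear_counts)
print(log_counts)
print(log_edges)
```

seaborn's `histplot` can also bin in log space when given `log_scale=True`.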

## 4. Correlation Analysis
Let's examine the correlation between area and population density.

In [None]:
# Calculate the Pearson correlation (pandas' default method) on the raw values
correlation = df[['Area (km²)', 'Density (per km²)']].corr().iloc[0, 1]
print(f"Correlation between area and population density: {correlation:.4f}")

# Create a correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(df[['Area (km²)', 'Density (per km²)']].corr(), 
            annot=True, 
            cmap='coolwarm', 
            center=0,
            vmin=-1, vmax=1,  # fix the color scale to the full correlation range
            square=True)
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.show()
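
Pearson's coefficient on raw values is sensitive to a handful of extreme points, such as very large countries or very dense microstates. Because of this, a strong monotone relationship can look weak. Two common alternatives are the rank-based Spearman correlation and Pearson on log-transformed values. The toy numbers below are hypothetical and chosen only to illustrate the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data: density falls steadily as area grows, with extremes at both ends
toy = pd.DataFrame({
    'Area (km²)': [2.0, 30.0, 400.0, 5_000.0, 60_000.0, 700_000.0, 17_000_000.0],
    'Density (per km²)': [18_000.0, 900.0, 300.0, 150.0, 80.0, 20.0, 9.0],
})
area, density = toy['Area (km²)'], toy['Density (per km²)']

pearson_raw = area.corr(density)                          # default method='pearson'
spearman = area.corr(density, method='spearman')          # correlation of ranks
pearson_log = np.log10(area).corr(np.log10(density))      # Pearson on log10 values

print(f"Pearson (raw):   {pearson_raw:.3f}")
print(f"Spearman:        {spearman:.3f}")
print(f"Pearson (log10): {pearson_log:.3f}")
```

In this toy case, raw Pearson is weak while the other two are strongly negative. In the notebook, the same calls would take `df['Area (km²)']` and `df['Density (per km²)']`. Whether the result differs much from the Pearson value above depends on the actual data.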

## 5. Outlier Analysis
Let's identify and examine outliers in area and population density using the 1.5 × IQR rule.

In [None]:
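# Flag values outside the 1.5 × IQR fences (Tukey's rule).
# For strongly right-skewed data the lower fence can fall below zero,
# in which case only unusually large values can be flagged.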
def find_outliers(series):
    Q1 = series.quantile(0.25)
    Q3 = series.quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return (series < lower_bound) | (series > upper_bound)

# Find outliers in area and density
area_outliers = df[find_outliers(df['Area (km²)'])]
density_outliers = df[find_outliers(df['Density (per km²)'])]

print("\n--- Countries with Unusually Large/Small Areas ---")
display(area_outliers[['Country/Territory', 'Area (km²)']].sort_values('Area (km²)', ascending=False))

print("\n--- Countries with Unusually High/Low Population Density ---")
display(density_outliers[['Country/Territory', 'Density (per km²)']].sort_values('Density (per km²)', ascending=False))
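
Both columns are strongly right-skewed, so the lower IQR fence on the raw values (Q1 - 1.5·IQR) can be negative. When that happens, `find_outliers` cannot flag unusually small values at all. One alternative, sketched here as an assumption rather than the notebook's method, is to apply the same fences to log10 values so that both tails are treated alike. `find_outliers_log` is a hypothetical helper, and the densities are toy values.

```python
import numpy as np
import pandas as pd

def find_outliers_log(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: flag values outside IQR fences computed on log10(series).

    Assumes every value is strictly positive.
    """
    logged = np.log10(series)
    q1, q3 = logged.quantile(0.25), logged.quantile(0.75)
    iqr = q3 - q1
    return (logged < q1 - k * iqr) | (logged > q3 + k * iqr)

# Hypothetical densities: one near-empty territory and one very dense city-state
density = pd.Series([0.1, 30.0, 60.0, 90.0, 150.0, 200.0, 300.0, 20_000.0])

# Fences on log10 values
flags = find_outliers_log(density)

# Same fences on the raw values, as in find_outliers above
q1_raw, q3_raw = density.quantile(0.25), density.quantile(0.75)
iqr_raw = q3_raw - q1_raw
raw_flags = (density < q1_raw - 1.5 * iqr_raw) | (density > q3_raw + 1.5 * iqr_raw)

print("log-scale fences flag:", density[flags].tolist())
print("raw fences flag:      ", density[raw_flags].tolist())
```

Here the raw fences catch only the dense city-state, because the raw lower fence is negative. The log-scale fences also catch the near-empty territory.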

## 6. Continent-wise Analysis
Let's analyze how area and population density vary by continent.

In [None]:
# Group by continent and calculate statistics
continent_stats = df.groupby('Continent').agg({
    'Area (km²)': ['count', 'mean', 'median', 'std'],
    'Density (per km²)': ['mean', 'median', 'std']
}).round(2)

print("\n--- Continent-wise Statistics ---")
display(continent_stats)

# Create boxplots for area and density by continent
plt.figure(figsize=(16, 12))

plt.subplot(2, 1, 1)
sns.boxplot(data=df, x='Continent', y='Area (km²)')
plt.yscale('log')
plt.title('Distribution of Area by Continent (log scale)')
plt.xticks(rotation=45)

plt.subplot(2, 1, 2)
sns.boxplot(data=df, x='Continent', y='Density (per km²)')
plt.yscale('log')
plt.title('Distribution of Population Density by Continent (log scale)')
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()
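
The continent "mean density" above is an unweighted average of country densities, so a single dense microstate can dominate it. The aggregate density of a continent (total population divided by total area) answers a different question and can rank continents differently. A minimal sketch on hypothetical toy data follows. `'Population'` is an assumed column name; the real dataset's population column may be named differently.

```python
import pandas as pd

# Hypothetical toy data; 'Population' is an assumed column name
toy = pd.DataFrame({
    'Continent': ['X', 'X', 'X', 'Y', 'Y'],
    'Population': [600_000.0, 10_000_000.0, 5_000_000.0, 2_000_000.0, 8_000_000.0],
    'Area (km²)': [30.0, 1_000_000.0, 500_000.0, 20_000.0, 80_000.0],
})
toy['Density (per km²)'] = toy['Population'] / toy['Area (km²)']

by_continent = toy.groupby('Continent').agg(
    mean_density=('Density (per km²)', 'mean'),   # unweighted mean of country densities
    total_population=('Population', 'sum'),
    total_area=('Area (km²)', 'sum'),
)
# Aggregate density: people per km² of the continent's combined area
by_continent['aggregate_density'] = by_continent['total_population'] / by_continent['total_area']
print(by_continent.round(2))
```

In this toy case, continent X has the higher mean density because of one tiny, dense territory. Continent Y, however, has the higher aggregate density.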

## 7. Conclusion
Based on the analysis, we can draw the following conclusions:
1. **Area Distribution**: Most countries have relatively small areas, while a few very large countries (such as Russia, Canada, China, the United States, and Brazil) are outliers.
2. **Population Density**: There is a wide range of population densities, from very sparsely populated countries (like Greenland and Mongolia) to extremely dense city-states (like Macau and Monaco).
3. **Correlation**: There is a weak negative correlation between a country's area and its population density, meaning that larger countries tend to be less densely populated, but there are many exceptions.
4. **Continent-wise Differences**: 
   - Asia has the highest average population density, while Oceania has the lowest.
   - The Americas show the greatest variation in country sizes.
   - Europe has relatively consistent country sizes compared to other continents.
5. **Outliers**: The analysis identified several interesting outliers, including very small but densely populated city-states and large, sparsely populated countries like Russia, Canada, and Australia.