## Table Extraction

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

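# Fetching the webpage (fail fast on network errors or a non-200 response)
url = 'https://www.worldometers.info/world-population/population-by-country/'
response = requests.get(url, timeout=30)
response.raise_for_status()

# Parsing the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Locating the first table whose class list includes 'table'
table = soup.find('table', {'class': 'table'})
if table is None:
    raise ValueError("No <table class='table'> found; the page layout may have changed.")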
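The `soup.find` call above depends on BeautifulSoup and a live request. To test the extraction logic offline, the same row-by-row idea can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not what the notebook itself uses. The class name `TableRowParser` and the sample HTML are hypothetical. Unlike `soup.find('table', {'class': 'table'})`, this parser does not select one particular table, so it should be fed only the table's HTML.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is still captured while a cell is open
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample shaped like a fragment of the population table
sample_html = (
    '<table class="table"><tr><th>#</th><th>Country</th></tr>'
    '<tr><td>1</td><td><a href="#">India</a></td></tr></table>'
)
parser = TableRowParser()
parser.feed(sample_html)
print(parser.rows)  # [['#', 'Country'], ['1', 'India']]
```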



In [None]:
# Extracting the table headers (all column names)
headers = [header.text.strip() for header in table.find_all('th')]

# Extracting the rows (country and population data)
rows = []
for row in table.find_all('tr')[1:]:  # Skip header row
    columns = row.find_all('td')
    if len(columns) >= 9:  # Need at least 9 cells, because index 8 (Fertility Rate) is read below
        country = columns[1].text.strip()
        population = columns[2].text.strip().replace(',', '')
        fertility_rate = columns[8].text.strip()
        land_area = columns[6].text.strip().replace(',', '')
        rows.append([country, population, fertility_rate, land_area])
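The loop above relies on fixed cell positions. A minimal sketch that names those positions and derives the length check from them, assuming the same indices the code uses (1 for Country, 2 for Population, 6 for Land Area, 8 for Fertility Rate). The helper name `row_to_record` is hypothetical.

```python
# Cell positions taken from the extraction loop above (assumed Worldometer layout)
COUNTRY, POPULATION, LAND_AREA, FERTILITY = 1, 2, 6, 8


def row_to_record(cells):
    """Map one row's cell strings to [country, population, fertility_rate, land_area].

    Returns None when the row is too short to contain every required cell.
    """
    if len(cells) <= max(COUNTRY, POPULATION, LAND_AREA, FERTILITY):
        return None
    return [
        cells[COUNTRY].strip(),
        cells[POPULATION].strip().replace(",", ""),
        cells[FERTILITY].strip(),
        cells[LAND_AREA].strip().replace(",", ""),
    ]


# Hypothetical row: only the cells at the indices above are filled in
example = ["1", "India", "1,450,935,791", "", "", "", "2,973,190", "", "2.0"]
print(row_to_record(example))  # ['India', '1450935791', '2.0', '2973190']
```

Deriving the guard from the largest index means a later change to the column mapping cannot quietly reintroduce an out-of-range lookup.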


In [None]:
# Create a DataFrame
df = pd.DataFrame(rows, columns=['Country', 'Population', 'Fertility Rate', 'Land Area'])

# Convert numeric columns to appropriate data types
df['Population'] = pd.to_numeric(df['Population'], errors='coerce')
df['Fertility Rate'] = pd.to_numeric(df['Fertility Rate'], errors='coerce')
df['Land Area'] = pd.to_numeric(df['Land Area'], errors='coerce')

# Display the DataFrame
print("\nExtracted Country Data:")
print(df.head())  # Display the first few rows


Extracted Country Data:
         Country  Population  Fertility Rate  Land Area
0          India  1450935791             2.0    2973190
1          China  1419321278             1.0    9388211
2  United States   345426571             1.6    9147420
3      Indonesia   283487931             2.1    1811570
4       Pakistan   251269164             3.5     770880
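`pd.to_numeric(..., errors='coerce')` turns any value it cannot parse into `NaN` instead of raising an error, which is why the missing-value check in the EDA section is worth running. A plain-Python sketch of the same idea, assuming cells may contain placeholders such as `'N.A.'` (an illustrative value, not one confirmed in this data). `coerce_number` is a hypothetical helper and, unlike pandas, it always returns a float.

```python
import math


def coerce_number(text):
    """Parse a table cell as a float; return NaN for anything unparseable."""
    try:
        # Thousands separators are removed first, as the extraction loop does
        return float(text.replace(",", "").strip())
    except ValueError:
        return math.nan


print(coerce_number("1,450,935,791"))  # 1450935791.0
print(coerce_number("N.A."))           # nan
```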


This dataset contains population, fertility rate, and land area figures for countries and dependencies worldwide, scraped from Worldometer. It shows the relative sizes of national populations and highlights the most populous countries. With the world population now exceeding 8 billion, these figures help in understanding how people are distributed across countries and what that distribution may imply for global development.

## Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is a crucial step in understanding the dataset's structure, identifying patterns, and uncovering insights that might not be immediately apparent. In this analysis, we focus on global population data, examining the distribution and size of populations across different countries. By using EDA techniques, we aim to gain a deeper understanding of the global demographic landscape and its potential implications. The insights derived from this analysis will provide valuable context for exploring trends, challenges, and opportunities at both the country and continental levels.



In [None]:
# Checking the shape of the dataset
print(f"Dataset Shape: {df.shape}")


Dataset Shape: (234, 4)


In [None]:
# Checking for missing values
print(df.isnull().sum())


Country           0
Population        0
Fertility Rate    0
Land Area         0
dtype: int64


In [None]:
# Data types of each column
print(df.dtypes)

Country            object
Population          int64
Fertility Rate    float64
Land Area           int64
dtype: object


In [None]:
# Displaying the first few rows to get a quick overview
print(df.head())

         Country  Population  Fertility Rate  Land Area
0          India  1450935791             2.0    2973190
1          China  1419321278             1.0    9388211
2  United States   345426571             1.6    9147420
3      Indonesia   283487931             2.1    1811570
4       Pakistan   251269164             3.5     770880


### Objective 1: Analyze Population Distribution Across Countries
Understanding how the population is distributed globally helps identify regions with high population density and areas with sparse populations. This analysis can guide decisions in resource allocation, urban planning, and development priorities. It also aids in spotting demographic trends that impact economic and environmental policies.

In [None]:

import plotly.express as px

# Sort data by population in descending order and keep the 10 most populous countries
df['Population'] = df['Population'].astype(int)  # Ensure Population is an integer column
df_sorted = df.sort_values(by='Population', ascending=False).head(10)
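Sorting the whole frame and taking `head(10)` gives the same rows as pandas' `df.nlargest(10, 'Population')`. The top-N selection itself can be sketched in plain Python with `heapq.nlargest`, applied here to the five rows shown in the `df.head()` output above. The helper name `top_n_by_population` is hypothetical.

```python
import heapq

# The five (country, population) pairs from the df.head() output above
sample_rows = [
    ("India", 1450935791),
    ("China", 1419321278),
    ("United States", 345426571),
    ("Indonesia", 283487931),
    ("Pakistan", 251269164),
]


def top_n_by_population(records, n):
    """Return the n records with the largest population, largest first."""
    return heapq.nlargest(n, records, key=lambda rec: rec[1])


print(top_n_by_population(sample_rows, 3))
# [('India', 1450935791), ('China', 1419321278), ('United States', 345426571)]
```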



In [None]:
# Plot bar chart
fig = px.bar(df_sorted,
             x='Country',
             y='Population',
             title='Top 10 Most Populous Countries',
             labels={'Population': 'Population'},
             color='Population',
             text='Population')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(title_font_size=18, xaxis_title="Country", yaxis_title="Population")
fig.show()




India and China are the only countries whose populations exceed 1 billion, while Nigeria and Ethiopia are the only African countries in the top 10.
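The chart's `'%{text:.2s}'` template uses d3-format SI notation, so a value such as 1,450,935,791 is labeled with a `G` (giga) suffix rather than in billions. A small sketch, using the five countries from the `df.head()` output, that checks the 1-billion threshold directly and formats counts explicitly in billions. `format_billions` and the `B` suffix are illustrative choices, not part of the original notebook.

```python
BILLION = 1_000_000_000

# The five (country, population) pairs from the df.head() output above
sample_rows = [
    ("India", 1450935791),
    ("China", 1419321278),
    ("United States", 345426571),
    ("Indonesia", 283487931),
    ("Pakistan", 251269164),
]


def format_billions(population):
    """Format a head count in billions with two decimals, e.g. '1.45 B'."""
    return f"{population / BILLION:.2f} B"


above_one_billion = [country for country, pop in sample_rows if pop > BILLION]
print(above_one_billion)            # ['India', 'China']
print(format_billions(1450935791))  # 1.45 B
```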

### Objective 2: Visualize the Relationship Between Fertility Rate, Population, and Land Area Using a Bubble Chart
The objective is to understand the interplay between a country’s fertility rate, population size, and land area. These three factors are critical in demographic studies, economic planning, and understanding global population dynamics.

In [None]:
import plotly.express as px

# Assuming 'df' is your DataFrame with 'Country', 'Fertility Rate', 'Population', and 'Land Area'
# Creating the Bubble chart with Fertility Rate, Population, and Land Area as the size and axes

fig = px.scatter(df,
                 x='Population',            # X-axis: Population
                 y='Fertility Rate',        # Y-axis: Fertility Rate
                 size='Land Area',          # Bubble size: Land Area
                 color='Fertility Rate',    # Color based on fertility rate
                 hover_name='Country',      # Hover to show Country name
                 title="Bubble Chart: Fertility Rate vs Population with Land Area Size",
                 labels={"Fertility Rate": "Fertility Rate", "Population": "Population", "Land Area": "Land Area (km²)"},
                 color_continuous_scale='Viridis', # Color scale
                 template="plotly_dark")   # Dark theme for aesthetics

# Show the plot
fig.show()


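Countries with higher fertility rates do not necessarily have larger populations. This pattern could also be consistent with fertility rates having declined in countries that already have large populations, although a single snapshot of the data cannot show how fertility rates change over time.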
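A scatter plot alone makes this kind of claim hard to judge. One way to put a number on it is a rank (Spearman) correlation, which in pandas could be computed with `df[['Population', 'Fertility Rate']].corr(method='spearman')`. The sketch below computes it by hand on the five rows from the `df.head()` output. With only five countries the result is illustrative, not evidence about the full dataset, and the helper names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# India, China, United States, Indonesia, Pakistan (from the df.head() output)
populations = [1450935791, 1419321278, 345426571, 283487931, 251269164]
fertility_rates = [2.0, 1.0, 1.6, 2.1, 3.5]
print(round(spearman(populations, fertility_rates), 2))  # -0.7
```

A value near -1 or 1 indicates a strong monotonic relationship and a value near 0 indicates none; the calculation would need to run on every row of `df` before drawing conclusions.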

### Objective 3: Evaluate the Influence of Land Area on Population Density
Land area significantly impacts population density, affecting infrastructure development, housing, and environmental sustainability. This objective aims to assess whether larger countries necessarily support higher populations or if population density depends more on other factors, such as economic opportunities or geographic constraints.

In [None]:
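# Calculate population density in people per km² (Land Area is in km²).
# A zero land area is treated as missing so it cannot produce an infinite density.
df['Population Density'] = df['Population'] / df['Land Area'].where(df['Land Area'] > 0)

# Interactive density map; rows without a valid density are left out.
# Note: with locationmode="country names", names Plotly does not recognise may not be plotted.
fig = px.scatter_geo(df.dropna(subset=['Population Density']),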
                     locations="Country",
                     locationmode="country names",
                     size="Population Density",
                     color="Population Density",
                     title="Global Population Density",
                     projection="natural earth",
                     hover_name="Country")
fig.update_layout(title_font_size=18)
fig.show()


Countries like Monaco, Singapore, and Bangladesh have some of the highest population densities.
Russia, Canada, and Australia have extremely low population densities due to their vast land areas.
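Population density here is simply population divided by land area in km². A sketch of the per-country calculation, assuming a zero or missing land area should yield `NaN` rather than an infinite density. `population_density` is a hypothetical helper, and the figures are India's and China's values from the `df.head()` output.

```python
import math


def population_density(population, land_area_km2):
    """People per km²; NaN when land area is missing (None/NaN) or zero."""
    if not land_area_km2 or math.isnan(land_area_km2):
        return math.nan
    return population / land_area_km2


print(round(population_density(1450935791, 2973190)))  # 488  (India)
print(round(population_density(1419321278, 9388211)))  # 151  (China)
```

In pandas, the same zero guard can be written as `df['Land Area'].where(df['Land Area'] > 0)`, which leaves `NaN` wherever the condition is false.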

## Recommendations

1. Focus on Sustainable Development in High-Population Countries:
India and China, with populations exceeding 1 billion, should prioritize sustainable urban planning, efficient resource management, and policies to address overpopulation challenges such as housing, healthcare, and education.

2. Strengthen Infrastructure in Rapidly Growing African Nations:
Nigeria and Ethiopia, as the only African countries in the top 10, should focus on expanding infrastructure, particularly in urban areas, to support their rapidly growing populations. Investments in transportation, energy, and water resources are critical.

3. Promote Regional Collaboration for Population Management:
Encourage countries with significant populations to collaborate regionally on shared challenges such as food security, migration, and climate adaptation to leverage collective resources and expertise.

4. Invest in Family Planning and Education in High-Fertility, Low-Population Nations:
Stakeholders in countries with high fertility rates but low overall population should focus on improving access to family planning services and educational programs. These efforts can empower individuals to make informed reproductive choices, contributing to sustainable population growth.

5. Strengthen Healthcare Systems to Address Demographic Transitions:
In nations with decreasing fertility rates and high populations, stakeholders should prioritize investments in healthcare systems, focusing on maternal and child health, to sustain population stability while supporting demographic transitions.

6. Encourage Policy Adjustments Based on Population Dynamics:
Governments and development organizations should tailor policies to the unique population trends in each country. For instance, incentivizing higher fertility in aging populations or moderating growth in regions where resources are under strain can help maintain economic balance and resource sustainability.

7. Optimize Urban Planning in High-Density Countries:
Countries like Monaco, Singapore, and Bangladesh should focus on innovative urban planning strategies, such as vertical housing, efficient public transportation systems, and green infrastructure, to manage space constraints while maintaining quality of life.

8. Promote Regional Development in Low-Density Nations:
In countries with vast land areas and low population densities, like Russia, Canada, and Australia, stakeholders should consider policies to incentivize settlement and economic activities in underutilized regions. This could include investments in transportation, communication networks, and resource management.

9. Strengthen Environmental Conservation Efforts:
Both high- and low-density countries should emphasize environmental conservation. High-density nations can promote sustainable development to mitigate the impacts of overcrowding, while low-density countries can leverage their vast natural resources responsibly to preserve ecosystems.