# Project: Global Demographics Data Scraper
## Overview
This project automates the extraction of geographic and demographic data for 250 countries. Instead of scraping an HTML table, the script parses repeated HTML div containers (one "card" per country) and builds a dataset of each country's capital, population, and land area.

Source URL: https://www.scrapethissite.com/pages/simple/

## Methodology
• HTML Parsing: Downloads the page with requests and parses the DOM with BeautifulSoup.

• Container Logic: Locates the independent country "cards" (div elements) rather than table rows.

• Data Cleaning: Extracts the text of each card's fields, strips surrounding whitespace, and records the country name with its Capital, Population, and Area.

• Pipeline: Collects the rows in a Python list and builds the Pandas DataFrame once, after the loop, instead of growing it row by row; the DataFrame is then exported to CSV.
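
The pipeline choice can be checked with a small sketch. It assumes two rows shaped like the ones the scraping loop produces (values copied from the output further down, still as strings) and compares building the DataFrame once with growing it one row at a time via `pd.concat`; the variable names are illustrative only.

```python
import pandas as pd

columns = ['Country', 'Capital', 'Population', 'Area (Km^2)']
# Two rows as the scraping loop would produce them: every value is still a string.
rows = [
    ['Andorra', 'Andorra la Vella', '84000', '468.0'],
    ['Anguilla', 'The Valley', '13254', '102.0'],
]

# Approach used in this project: collect plain lists, then build the DataFrame once.
df_once = pd.DataFrame(rows, columns=columns)

# Alternative for comparison: grow a DataFrame row by row. Each pd.concat call
# copies everything accumulated so far, so total work grows roughly with n**2.
df_grown = pd.DataFrame([rows[0]], columns=columns)
for row in rows[1:]:
    df_grown = pd.concat([df_grown, pd.DataFrame([row], columns=columns)],
                         ignore_index=True)

print(df_once.equals(df_grown))  # True: same contents, different cost
```

Both routes give the same table; the list-first route simply avoids the repeated copying, which matters more as the row count grows toward the full 250 countries.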

## Libraries Used

• requests

• bs4 (BeautifulSoup)

• pandas

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
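url = 'https://www.scrapethissite.com/pages/simple/'
page = requests.get(url, timeout=10)  # give up rather than hang on a slow server
page.raise_for_status()  # stop on a 4xx/5xx response instead of parsing an error page
soup = BeautifulSoup(page.text, 'html.parser')  # Python's built-in HTML parser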
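
For repeated runs, the download step could be wrapped in a reusable helper. The sketch below is an assumption, not part of the original notebook: it adds a `requests.Session` with automatic retries (via urllib3's `Retry` and requests' `HTTPAdapter`) for transient server errors. The helper names `make_session` and `fetch_soup`, the retry count, and the backoff value are all hypothetical choices.

```python
import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 3) -> requests.Session:
    """Hypothetical helper: a Session that retries transient failures."""
    retry = Retry(
        total=retries,
        backoff_factor=0.5,                          # wait longer between successive attempts
        status_forcelist=(429, 500, 502, 503, 504),  # status codes worth retrying
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session


def fetch_soup(session: requests.Session, url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Hypothetical helper: download url and parse it, raising on HTTP errors."""
    response = session.get(url, timeout=timeout)  # never wait indefinitely
    response.raise_for_status()                   # surface 4xx/5xx as requests.HTTPError
    return BeautifulSoup(response.text, 'html.parser')
```

Usage would be `soup = fetch_soup(make_session(), 'https://www.scrapethissite.com/pages/simple/')`. Keeping the session separate from the fetch lets the same connection pool and retry policy be reused if more pages are scraped later.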

In [3]:
headers = ['Country', 'Capital', 'Population', 'Area (Km^2)']
table = soup.find_all('div', class_='col-md-4 country')  # one div card per country, not table rows
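
The cell above matches the class attribute as the exact string `'col-md-4 country'`, which works only while every card carries exactly those classes in that order. The sketch below uses made-up markup (an assumption, not the live page) to compare that exact-string match with matching the single class `country`, and with the equivalent CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second card uses a different grid class,
# and the third div is not a country card at all.
demo_html = """
<div class="col-md-4 country"><h3>Andorra</h3></div>
<div class="col-md-6 country"><h3>Anguilla</h3></div>
<div class="col-md-4"><h3>Not a country</h3></div>
"""
demo_soup = BeautifulSoup(demo_html, 'html.parser')

# Exact string: matches only when the whole class attribute is identical.
exact = demo_soup.find_all('div', class_='col-md-4 country')
# Single class name: matches any div whose class list contains "country".
by_class = demo_soup.find_all('div', class_='country')
# CSS selector with the same meaning as the single-class match.
by_css = demo_soup.select('div.country')

print(len(exact), len(by_class), len(by_css))  # 1 2 2
```

Matching on `country` alone is the more tolerant choice if the page's layout classes ever change.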

In [4]:
data = []
for card in table:
    # The country name is the card's h3 heading
    country = card.find('h3').text.strip()
    # Capital, population and area are the card's first three span elements, in that order
    spans = card.find_all('span')
    capital = spans[0].text.strip()
    population = spans[1].text.strip()
    area = spans[2].text.strip()
    data.append([country, capital, population, area])
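
The loop relies on every card having an h3 and at least three spans; a card missing a field would stop it with an `AttributeError` or `IndexError`. A more defensive per-card parser is sketched below. The helper name `parse_card` and the sample markup are hypothetical, and recording `None` for missing fields is one design choice among several (skipping or logging the card would also be reasonable).

```python
from bs4 import BeautifulSoup


def parse_card(card):
    """Hypothetical helper: [country, capital, population, area] for one card.

    Missing fields become None instead of raising an exception.
    """
    heading = card.find('h3')
    spans = card.find_all('span')  # search the card once and reuse the result
    values = [s.get_text(strip=True) for s in spans[:3]]
    values += [None] * (3 - len(values))  # pad short cards to three values
    country = heading.get_text(strip=True) if heading else None
    return [country, *values]


# Hypothetical markup: one complete card and one card missing two fields.
demo_html = """
<div class="col-md-4 country"><h3> Andorra </h3>
  <span>Andorra la Vella</span><span>84000</span><span>468.0</span></div>
<div class="col-md-4 country"><h3>Incomplete</h3><span>Somewhere</span></div>
"""
demo_soup = BeautifulSoup(demo_html, 'html.parser')
demo_rows = [parse_card(c) for c in demo_soup.find_all('div', class_='country')]
print(demo_rows)
```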

In [5]:
df = pd.DataFrame(data, columns=headers)
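
Because every value is scraped with `.text`, the Population and Area columns are stored as strings, so sorting or arithmetic on them would not behave numerically. A possible follow-up step, sketched here on two rows copied from the output below, converts them with `pd.to_numeric`; `errors='coerce'` turns any unparsable text into NaN rather than raising. The sample names are illustrative, chosen so they do not overwrite the notebook's `df`.

```python
import pandas as pd

sample_headers = ['Country', 'Capital', 'Population', 'Area (Km^2)']
# Scraped values arrive as strings; these two rows mirror the output below.
sample = [
    ['Andorra', 'Andorra la Vella', '84000', '468.0'],
    ['Afghanistan', 'Kabul', '29121286', '647500.0'],
]
df_sample = pd.DataFrame(sample, columns=sample_headers)

# Convert the numeric columns; unparsable text would become NaN instead of raising.
df_sample['Population'] = pd.to_numeric(df_sample['Population'], errors='coerce')
df_sample['Area (Km^2)'] = pd.to_numeric(df_sample['Area (Km^2)'], errors='coerce')

print(df_sample.dtypes)
print(df_sample['Population'].sum())  # 29205286
```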

In [6]:
df

|     | Country | Capital | Population | Area (Km^2) |
|-----|---------|---------|------------|-------------|
| 0 | Andorra | Andorra la Vella | 84000 | 468.0 |
| 1 | United Arab Emirates | Abu Dhabi | 4975593 | 82880.0 |
| 2 | Afghanistan | Kabul | 29121286 | 647500.0 |
| 3 | Antigua and Barbuda | St. John's | 86754 | 443.0 |
| 4 | Anguilla | The Valley | 13254 | 102.0 |
| ... | ... | ... | ... | ... |
| 245 | Yemen | Sanaa | 23495361 | 527970.0 |
| 246 | Mayotte | Mamoudzou | 159042 | 374.0 |
| 247 | South Africa | Pretoria | 49000000 | 1219912.0 |
| 248 | Zambia | Lusaka | 13460305 | 752614.0 |


In [7]:
df.to_csv('Countries of the World.csv', index=False)
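
The `index=False` argument matters when the file is read back. The sketch below, which uses an in-memory buffer and a one-row sample frame instead of the real file, shows that with the default `index=True` the row index is written as an unnamed first column, which `pd.read_csv` then reports as `Unnamed: 0`.

```python
import io

import pandas as pd

df_sample = pd.DataFrame(
    [['Andorra', 'Andorra la Vella', 84000, 468.0]],
    columns=['Country', 'Capital', 'Population', 'Area (Km^2)'],
)

# Default index=True: the index is written as a first column with an empty header.
with_index = pd.read_csv(io.StringIO(df_sample.to_csv()))
# index=False, as in the export cell above: only the four data columns are written.
without_index = pd.read_csv(io.StringIO(df_sample.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'Country', 'Capital', 'Population', 'Area (Km^2)']
print(list(without_index.columns))  # ['Country', 'Capital', 'Population', 'Area (Km^2)']
```

If a file was already saved with its index, passing `index_col=0` to `pd.read_csv` restores that column as the index instead of a data column.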