# Country Codes & Continents: A Dataset with ISO 3166-1 Alpha-2

This notebook builds a dataset of countries and territories, their ISO 3166-1 alpha-2 codes, and the continents they belong to.

**Key Features:**

* Utilizes the ISO 3166-1 Alpha-2 standard for country codes.
* Covers the sovereign states and dependent territories listed on Wikipedia, together with their continents.
* Provides a clean and organized dataset for various data analysis and mapping projects.

**Potential Use Cases:**

* Geocoding and mapping applications.
* Data analysis and visualization projects.
* Internationalization and localization tasks.
* Building applications that require country-specific information.

This notebook fetches two Wikipedia pages through the MediaWiki API (`action=parse`), extracts their tables with BeautifulSoup, and organizes the results as Python dictionaries.
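Both data cells send the same kind of request to the MediaWiki `action=parse` endpoint, differing only in the page title. The sketch below factors that request into a helper. It is an illustration, not the notebook's own code: it uses the standard library's `urllib` instead of `requests` so it has no third-party dependency, and the function names and the `User-Agent` contact string are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "country-codes-notebook/0.1 (contact: you@example.com)"


def build_parse_url(page: str) -> str:
    """Return the action=parse request URL for a Wikipedia page title."""
    params = {"action": "parse", "page": page, "format": "json", "prop": "text"}
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_page_html(page: str, timeout: float = 30) -> str:
    """Fetch the rendered HTML of a page (performs a network request)."""
    request = urllib.request.Request(
        build_parse_url(page), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    # The API returns an "error" object instead of "parse" when the request fails
    if "parse" not in data:
        raise RuntimeError(f"Parse request failed: {data.get('error', data)}")
    return data["parse"]["text"]["*"]


print(build_parse_url("ISO_3166-1_alpha-2"))
```

Calling `fetch_page_html("ISO_3166-1_alpha-2")` would return the HTML string that can then be handed to `BeautifulSoup(..., "lxml")`.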

**Note:** 

* Both datasets are scraped from Wikipedia, whose content and page layout change over time, so verify the results independently before relying on them.
<!-- * This is a basic example, and you can further enhance it by adding more details such as country names, currencies, or time zones. -->
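One inexpensive check on the scraped code table: every key should be two uppercase ASCII letters, and no code should map to an empty name (the code-table cell stores an empty name when a row's second cell has no link). The helper below is a hypothetical sketch of such a check, shown on made-up sample data; it only flags suspect entries and does not prove the data correct.

```python
import re

# Two uppercase ASCII letters, e.g. "FR"
ALPHA_2 = re.compile(r"[A-Z]{2}")


def find_suspect_entries(iso_codes: dict[str, str]) -> list[str]:
    """Return codes that are malformed or map to an empty country name."""
    return [
        code
        for code, name in iso_codes.items()
        if not ALPHA_2.fullmatch(code) or not name.strip()
    ]


# Hypothetical sample with one malformed code and one empty name
codes_sample = {"FR": "France", "jp": "Japan", "DE": ""}
print(find_suspect_entries(codes_sample))  # ['jp', 'DE']
```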

```python
import requests
from bs4 import BeautifulSoup  # the "lxml" parser used below also requires the lxml package
```

# [Territories by Continent](https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_by_continent)

```python
url = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "parse",
    "page": "List_of_sovereign_states_and_dependent_territories_by_continent",
    "format": "json",
    "prop": "text",
}
# Wikimedia asks API clients to send a descriptive User-Agent; the contact is a placeholder
headers = {"User-Agent": "country-codes-notebook/0.1 (contact: you@example.com)"}

response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()
data = response.json()

# Stop here instead of silently parsing an empty page
if "parse" not in data:
    raise RuntimeError(f"Failed to fetch page: {data.get('error', data)}")
html_content = data["parse"]["text"]["*"]  # rendered HTML of the page
soup = BeautifulSoup(html_content, "lxml")

# Extract continent names from the level-2 section headings, stripping the "[edit]" link text.
# Assumes the first heading is not a continent and the next seven are.
elements = soup.find_all("div", class_="mw-heading mw-heading2")
continents = [element.text.split("[")[0].strip() for element in elements[1:8]]

# One sortable wikitable per continent, assumed to appear in the same order as the headings
tables = soup.find_all("table", class_="sortable wikitable")

# Maps continent name -> list of country/territory names
continent_data = {}

for continent, table in zip(continents, tables):
    continent_data[continent] = []
    rows = table.find_all("tr")[3:]  # skip the header rows (assumes three)

    for row in rows:
        cols = row.find_all("td")
        if not cols:
            continue
        country_tag = cols[0].find("a")  # the first link in the first cell holds the name
        if not country_tag:
            continue
        country_name = country_tag.text.strip()

        # Filter out cross-reference and summary rows
        if "See" in country_name or "Dependent" in country_name or "UN member states" in country_name:
            continue

        continent_data[continent].append(country_name)

# Preview the cleaned data
# for continent, countries in continent_data.items():
#     print(f"\n{continent}:")
#     print(", ".join(countries))
```
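`continent_data` maps each continent to a list of names, but joining it with the code table later needs the opposite direction. A minimal sketch of that inversion follows, using a hypothetical sample dictionary in the same shape. It assumes a name may appear under more than one continent (if the page lists a transcontinental country twice, for instance) and keeps every continent rather than picking one.

```python
from collections import defaultdict


def countries_to_continents(continent_data: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert continent -> countries into country -> continents (first-seen order)."""
    lookup: dict[str, list[str]] = defaultdict(list)
    for continent, countries in continent_data.items():
        for country in countries:
            # Avoid duplicates if a name is repeated within one continent
            if continent not in lookup[country]:
                lookup[country].append(continent)
    return dict(lookup)


# Hypothetical sample in the same shape as continent_data
continent_sample = {
    "Asia": ["Japan", "Russia"],
    "Europe": ["France", "Russia"],
}
print(countries_to_continents(continent_sample))
# {'Japan': ['Asia'], 'Russia': ['Asia', 'Europe'], 'France': ['Europe']}
```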

# [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#References)

```python
url = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "parse",
    "page": "ISO_3166-1_alpha-2",
    "format": "json",
    "prop": "text",
}
# Wikimedia asks API clients to send a descriptive User-Agent; the contact is a placeholder
headers = {"User-Agent": "country-codes-notebook/0.1 (contact: you@example.com)"}

response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()
data = response.json()

# Stop here instead of silently parsing an empty page
if "parse" not in data:
    raise RuntimeError(f"Failed to fetch page: {data.get('error', data)}")
page_content = data["parse"]["text"]["*"]  # rendered HTML of the page
soup = BeautifulSoup(page_content, "lxml")

# Maps alpha-2 code -> country name
ISO_3166_1_Alpha_2 = {}

# Locate the "Officially assigned code elements" heading, then the first table after it
section_heading = soup.find("h3", id="Officially_assigned_code_elements")
if not section_heading:
    raise ValueError("Section with id 'Officially_assigned_code_elements' not found.")

table = section_heading.find_next("table")
if not table:
    raise ValueError("Table not found after the header.")

for row in table.find_all("tr"):
    cells = row.find_all("td")
    # Skip rows without at least a code cell and a name cell (e.g. the header row)
    if len(cells) < 2:
        continue

    # The code sits in a monospaced <span> in the first cell
    code_span = cells[0].find("span", class_="monospaced")
    key = code_span.text.strip() if code_span else ""

    # The country name is the first link in the second cell
    link = cells[1].find("a")
    value = link.text.strip() if link else ""

    # Keep only rows where a code was found
    if key:
        ISO_3166_1_Alpha_2[key] = value

print(len(ISO_3166_1_Alpha_2))
```

```
249
```
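To produce the code-to-continent dataset the notebook describes, the two dictionaries still have to be joined. The sketch below matches on exact country names and reports codes that found no match. It is a hypothetical helper with made-up sample data: the two Wikipedia pages may spell some names differently, so real output would likely need a manual alias table for the unmatched codes.

```python
def join_codes_with_continents(
    iso_codes: dict[str, str],
    country_continents: dict[str, list[str]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Map each alpha-2 code to its continents by exact country-name match.

    Returns (code -> continents, codes whose name had no match).
    """
    joined: dict[str, list[str]] = {}
    unmatched: list[str] = []
    for code, name in iso_codes.items():
        if name in country_continents:
            joined[code] = country_continents[name]
        else:
            unmatched.append(code)
    return joined, unmatched


# Hypothetical inputs; "Example Republic" stands in for a name spelled differently on the two pages
iso_sample = {"FR": "France", "JP": "Japan", "XX": "Example Republic"}
continents_sample = {"France": ["Europe"], "Japan": ["Asia"]}
joined, unmatched = join_codes_with_continents(iso_sample, continents_sample)
print(joined)     # {'FR': ['Europe'], 'JP': ['Asia']}
print(unmatched)  # ['XX']
```

Exact matching keeps the join predictable; listing unmatched codes makes the gaps visible instead of hiding them behind fuzzy matching.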
