<a href="https://colab.research.google.com/github/Hugo-Mn/Data-Acquisition-project/blob/main/Colab_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data-Acquisition-project

## Table of Contents

- [Dataset Overview](#dataset)
  - [1. Main Dataset](#1-main-dataset)
  - [2. Web Data](#2-web-data)
  - [3. Combined Dataset](#3-combined-dataset-structure)
- [Visualization](#4-visualization)
- [ChatGPT Prompts and Responses](#chatgpt-prompts-and-responses)
- [Execution Instructions](#execution-instructions)
- [Setup Guide](./SETUP.md)

## Dataset

This project aims to merge two datasets: one from a CSV file and another from web scraping.

## 1. Main Dataset
*[world_population_data.csv](https://www.kaggle.com/datasets/sazidthe1/world-population-data)*

This dataset contains demographic information for 234 countries from 1970 to 2023.

### Main Dataset Structure (CSV)

| Column           | Description                              |
|-----------------|------------------------------------------|
| rank            | Country ranking by population            |
| cca3            | Three-letter country code               |
| country         | Country name                            |
| continent       | Continent name                          |
| 2023 population | Population in 2023                      |
| 2022 population | Population in 2022                      |
| 2020 population | Population in 2020                      |
| 2015 population | Population in 2015                      |
| 2010 population | Population in 2010                      |
| 2000 population | Population in 2000                      |
| 1990 population | Population in 1990                      |
| 1980 population | Population in 1980                      |
| 1970 population | Population in 1970                      |
| area (km²)      | Country area in square kilometers       |
| density (km²)   | Population density (people per km²)     |
| growth rate     | Population growth rate                  |
| world percentage| Percentage of world population          |

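As a rough illustration of how the per-year population columns can be read, the sketch below parses a tiny CSV with the same header style as the table above. The two rows and their numbers are invented placeholders, and `population_by_year` is a hypothetical helper that is not part of the project code.

```python
import csv
import io

# Placeholder rows: names and values are invented for illustration only.
SAMPLE_CSV = """country,2023 population,2022 population,area (km²)
Exampleland,1200000,1150000,50000
Samplestan,800000,790000,20000
"""


def population_by_year(row):
    """Return {year: population} from columns named '<year> population'."""
    result = {}
    for column, value in row.items():
        if column.endswith(" population"):
            year = column.replace(" population", "")
            result[year] = int(value)
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    populations = population_by_year(row)
    # Density is population divided by area, in people per km².
    density = populations["2023"] / float(row["area (km²)"])
    print(row["country"], populations, round(density, 1))
```

The same column-name convention (`<year> population`) is what the merge step later relies on to recover the list of years.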

## 2. Web Data

Source: [countryeconomy.com](https://countryeconomy.com)

The website provides comprehensive data about countries, including CO2 emissions, demographic information, and energy consumption metrics.

Data categories and their paths:

| Category | Path | Description |
|----------|------|-------------|
| CO2 emissions (total and per capita) | `energy-and-environment/co2-emissions/` | Total and per capita CO2 emissions data by country |
| Birth and fertility rates | `demography/fertility/` | Birth rates and fertility statistics |
| Electricity metrics | `energy-and-environment/electricity-consumption/` | Generation and consumption of electricity |

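Note: Data is not collected for every year between 1970 and 2023. The scraper targets the years 1970, 1980, 1990, 2000, 2010, 2020, 2022 and 2023, which all have a population column in the main dataset.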
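
The category paths above are combined with a country name to form one page address per category. The sketch below shows one way this could be done, assuming the URL template `https://countryeconomy.com/{info}/{country}` used later in the notebook. `to_slug` and `build_urls` are hypothetical helper names, and stripping the trailing slash from each path is a choice made here to avoid a doubled `/`.

```python
BASE_URL = "https://countryeconomy.com/{info}/{country}"
CATEGORY_PATHS = [
    "energy-and-environment/co2-emissions/",
    "demography/fertility/",
    "energy-and-environment/electricity-consumption/",
]


def to_slug(country_name):
    """Lowercase the name and replace spaces with hyphens."""
    return country_name.strip().lower().replace(" ", "-")


def build_urls(country_name):
    """Return one URL per data category for the given country."""
    slug = to_slug(country_name)
    return [
        # Strip the trailing slash so the template does not produce '//'.
        BASE_URL.format(info=path.strip("/"), country=slug)
        for path in CATEGORY_PATHS
    ]


for url in build_urls("United States"):
    print(url)
```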

### Web Data Structure

| Column            | Description                                  |
|-------------------|----------------------------------------------|
| country           | Country name                                 |
| year              | Data collection year                         |
| co2_total         | Total CO2 emissions for the country          |
| co2_per_capita    | CO2 emissions per person                     |
| birth_rate        | Number of births per 1000 population         |
| generation_GW     | Total electricity generation in gigawatts    |
| consumption_GW    | Total electricity consumption in gigawatts   |

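Scraped table cells arrive as text, with thousands separators and, for rates, a per-mille sign. A minimal sketch of the cleaning step follows. `parse_cell` is a hypothetical name, and returning `None` for non-numeric text is an assumption made here, not necessarily what the project code does.

```python
def parse_cell(text):
    """Convert a scraped table cell to float, or None when it holds no number."""
    cleaned = text.strip().replace(",", "").replace("‰", "")
    if cleaned == "":
        return None
    try:
        return float(cleaned)
    except ValueError:
        # Assumption: treat non-numeric cells as missing values.
        return None


for raw in ["1,234.5", " 9.8‰ ", "", "n/a"]:
    print(repr(raw), "->", parse_cell(raw))
```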
## 3. Combined Dataset Structure

This dataset merges key information from both sources to analyze the relationship between population growth, CO2 emissions, and energy consumption. It enables the study of potential correlations between demographic changes and environmental impacts across different countries and years.

| country | year | population | co2_total | co2_per_capita | birth_rate | generation_GW | consumption_GW |
|---------|------|------------|-----------|----------------|------------|---------------|----------------|
| france  | 1970 | xxx        | xxx       | xxx           | xxx        | xxx           | xxx           |
| germany | 1970 | xxx        | xxx       | xxx           | xxx        | xxx           | xxx           |
| ...     |      |            |           |               |            |               |               |

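In the notebook code, the merged values are keyed by `(metric, year)` pairs for each country, while the table above shows one row per country and year. The sketch below converts between the two layouts with plain dictionaries. The record is an invented placeholder and `to_long_rows` is a hypothetical helper.

```python
# Placeholder record keyed by (metric, year); the values are invented.
wide = {
    "country-a": {
        ("population", "1970"): 100,
        ("co2_total", "1970"): 1.5,
        ("population", "2023"): 150,
        ("co2_total", "2023"): 1.2,
    },
}


def to_long_rows(wide_data):
    """Turn {country: {(metric, year): value}} into one dict per country and year."""
    rows = {}
    for country, values in wide_data.items():
        for (metric, year), value in values.items():
            key = (country, year)
            row = rows.setdefault(key, {"country": country, "year": int(year)})
            row[metric] = value
    # Sort by country, then year, for a stable order.
    return [rows[key] for key in sorted(rows)]


long_rows = to_long_rows(wide)
for row in long_rows:
    print(row)
```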
### Available Metrics

- **Population**: Total number of inhabitants in the country
- **CO2 Total**: Total CO2 emissions for the country (in metric tons)
- **CO2 per Capita**: CO2 emissions per person (in metric tons)
- **Birth Rate**: Number of births per 1000 population
- **Electricity Generation**: Total electricity production in gigawatts (GW)
- **Electricity Consumption**: Total electricity consumption in gigawatts (GW)

## 4. Visualization

### Growing Population and CO2 Emissions

Plotting a few countries side by side suggests that population growth alone is not a strong driver of CO2 emissions. A good example is a plot of *Germany*, *Bulgaria* and *Zambia*:

For *Bulgaria*, population and CO2 emissions follow each other closely over time, so it would be tempting to conclude that population is the main influence on emissions. The plots for *Germany* and *Zambia* contradict this: *Germany's* emissions keep decreasing while its population grows, and *Zambia* is even more striking, because its emissions stay roughly flat while its population grows over time.

The same comparison can be repeated for other countries. It suggests that the main factor in CO2 emissions is not population size, but possibly legislation and how actively governments work to reduce emissions.
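
One simple way to put numbers on this kind of comparison is to look at the relative change of each metric between the first and last available year. The sketch below does that for two invented placeholder series, which are not values from the dataset. `relative_change` is a hypothetical helper.

```python
def relative_change(first, last):
    """Return the fractional change from first to last (0.25 means +25%)."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (last - first) / first


# Invented placeholder series (first year, last year), NOT real measurements.
series = {
    "country-a": {"population": (10_000_000, 20_000_000), "co2_total": (5.0, 5.0)},
    "country-b": {"population": (80_000_000, 84_000_000), "co2_total": (900.0, 600.0)},
}

for country, metrics in series.items():
    pop = relative_change(*metrics["population"])
    co2 = relative_change(*metrics["co2_total"])
    print(f"{country}: population {pop:+.0%}, CO2 {co2:+.0%}")
```

A country whose population change is large while its CO2 change is near zero or negative would be a case where population does not explain emissions on its own.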

### Generation and Consumption Electricity

For the electricity data, we examine both generation and consumption across countries and how they relate to population and economic development. Some countries, such as *France*, show an interesting pattern: electricity generation does not increase even as population and consumption rise. This suggests that consumption tracks population growth, while generation does not necessarily follow population.
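
A claim about correlation can be checked with a Pearson coefficient. The sketch below is a minimal pure-Python version, applied to invented placeholder series rather than real data. `pearson` is a hypothetical helper, and returning `None` for a constant series is a choice made here because the coefficient is undefined in that case.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation of two equal-length series, or None if undefined."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no variance, so the correlation is undefined.
        return None
    return cov / sqrt(var_x * var_y)


# Invented placeholder values, for illustration only.
population = [50, 55, 60]
consumption = [400, 440, 480]
generation = [500, 500, 500]

print("population vs consumption:", pearson(population, consumption))
print("population vs generation:", pearson(population, generation))
```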

## ChatGPT Prompts and Responses

### Prompt 1: Grammar Correction

**Question:** Fix grammar in the text while keeping the same meaning

**Response:**
The text was corrected while maintaining its original meaning and analysis of the relationship between population and CO2 emissions.

### Prompt 2: Dataset Structure

**Question:** Create a table structure for the main dataset columns

**Response:**

| Column           | Description                              |
|-----------------|------------------------------------------|
| rank            | Country ranking by population            |
| cca3            | Three-letter country code               |
| country         | Country name                            |
| continent       | Continent name                          |
| 2023 population | Population in 2023                      |
| 2022 population | Population in 2022                      |
| 2020 population | Population in 2020                      |
| 2015 population | Population in 2015                      |
| 2010 population | Population in 2010                      |
| 2000 population | Population in 2000                      |
| 1990 population | Population in 1990                      |
| 1980 population | Population in 1980                      |
| 1970 population | Population in 1970                      |
| area (km²)      | Country area in square kilometers       |
| density (km²)   | Population density (people per km²)     |
| growth rate     | Population growth rate                  |
| world percentage| Percentage of world population          |

### Prompt 3: Axis Formatting

**Question:** How to format axis values in millions instead of using scientific notation?

**Response:**

Function to format axis values in millions:

```python
@staticmethod
def millions_formatter(x, pos):
    return f'{x/1e6:.1f}M'
```

Implementation in the plot:

```python
sns.lineplot(data=dataset, x='Year', y='Population', hue='Country', ax=ax1, marker='o')
ax1.yaxis.set_major_formatter(ticker.FuncFormatter(self.millions_formatter))
ax1.set_ylabel('Population (Millions)')
```

### Prompt 4: Check Readme

**Question:** Check if the readme is Okay and fix syntax if you need

**Response:**

I have reviewed and improved the README file. Here are the changes made:

1. Document Structure:
   - Added proper section spacing
   - Fixed table formatting and alignment
   - Improved overall document organization

2. Content Improvements:
   - Fixed grammar and spelling in all sections
   - Enhanced the "Generation and Consumption Electricity" section
   - Added detailed descriptions for data relationships

3. Formatting Enhancements:
   - Standardized Markdown syntax
   - Improved heading hierarchy
   - Added consistent code block formatting
   - Fixed list indentation and spacing

4. Documentation Quality:
   - Ensured clear and professional language
   - Maintained consistent style throughout
   - Added proper technical terminology

The document now follows best practices for technical documentation and provides a clear overview of the project structure and findings.

### Prompt 5: Create Setup Guide

**Question:** Create a separate setup guide for Python and package installation instructions

**Response:**

I have created a new `SETUP.md` file with comprehensive installation instructions:

1. Python Installation Guide:
   - Download instructions
   - PATH configuration
   - Installation verification steps

2. Package Management:
   - Required packages list
   - Installation commands
   - Verification steps

3. Troubleshooting Section:
   - Common issues and solutions
   - Python 2 vs 3 considerations
   - Alternative installation methods

4. Documentation Quality:
   - Clear step-by-step instructions
   - Copy-paste ready commands
   - Platform-specific notes

The setup guide is now separated from the main README for better organization and clarity.

## Execution Instructions

1. Clone the repository:

   ```bash
   git clone https://github.com/Hugo-Mn/Data-Acquisition-project.git
   ```

2. Navigate to the project root:

   ```bash
   cd Data-Acquisition-project
   ```

3. Run the main script:

   ```bash
   python ./main.py
   ```

For Python installation and required packages setup, please refer to [SETUP.md](./SETUP.md).

In [1]:
!pip install pandas matplotlib seaborn requests pandastable beautifulsoup4 tqdm


In [2]:
import sys
import urllib.request
import pandas as pd
from bs4 import BeautifulSoup
from tqdm import tqdm




class WebSiteFormater:

    def __init__(self, url, countries):
        self.url = url
        self.countries = countries
        self.years = sorted([str(i) for i in range(1970,2021, 10)] + [str(i) for i in range(2022,2024)], reverse=True)
        self.colNames = ["co2_total", "co2_per_capita", "birth_rate", "fertility_rate", "generation_GW", "consumption_GW"]
        self.lst_Info = ["energy-and-environment/co2-emissions/", "demography/fertility/", "energy-and-environment/electricity-consumption/"]
        self.lst_Info_Value = {}
        self.webDataSet = None
        self.keyweb = None

    def format_url(self, info, country):
        formatted_url = self.url
        formatted_url = formatted_url.replace("{info}", info)
        formatted_url = formatted_url.replace("{country}", country)
        return formatted_url

    def openUrl(self, url):
        try:
            self.keyweb = urllib.request.urlopen(url)
            webContent = self.keyweb.read()
            soup = BeautifulSoup(webContent, 'html.parser')
            return soup
        except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
            return None

    def foundInformation(self, id, website):
        infos = []
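        # find_all returns an empty list (never None) when no <tbody> exists.
        tbodies = website.find_all("tbody")
        if not tbodies:
            return None
        allInformation = tbodies[0].find_all("tr")
        for i in allInformation:
            line = i.find_all("td")
            # Skip rows without cells (for example header rows).
            if line and line[0].text in self.years: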
                take = self.takeInfomation(line, id)
                infos.append(take)
        return infos

    def takeInfomation(self, lstTd, id):
        value = []
        lstTd = lstTd[1:]
        for i in range(len(lstTd)):
            if (id == 0 and i == 1) or (id != 0 and i == 2):
                continue
            if len(lstTd) > 3 and (i == 0 or i == 1 or i == 2):
                value.append(self.parseFloat(i , lstTd, False))
            elif len(lstTd) <= 3:
                value.append(self.parseFloat(i, lstTd, True))
            else:
                continue
        return value

    def parseFloat(self, id, lstTd, typeParse):
        # When typeParse is False, the cell is read counting from the end of the row.
        if not typeParse:
            id = -(id + 1)
        text = lstTd[id].text.strip()
        if text == '':
            return None
        return float(text.replace(",", "").replace("‰", ""))

    def showError(self, error):
        for key, value in error.items():
            print (f"Error for {key}:")
            for v in value:
                print(f" - {v}")

    def getAllInformation(self):
        error = {}
        progress_bar = tqdm(self.countries, desc="Scraping Website")
        for country in progress_bar:
            progress_bar.set_description(f"Scraping Website: Processing {country}")
            country_info = []
            for i in range(len(self.lst_Info)):
                url = self.format_url(self.lst_Info[i], country)
                website = self.openUrl(url)
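                if website is None:
                    error[self.lst_Info[i]] = error.get(self.lst_Info[i], []) + [country]
                    country_info.append([])  # placeholder keeps category positions aligned
                    continue
                info = self.foundInformation(i, website)
                if info is None:
                    error["url"] = error.get("url", []) + [url]
                    country_info.append([])  # placeholder keeps category positions aligned
                    continue
                # Pad missing years so every category has one entry per year.
                if len(info) != len(self.years):
                    for _ in range(len(self.years) - len(info)):
                        info.append([None, None])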
                country_info.append(info)
            self.lst_Info_Value[country] = country_info
        self.showError(error)
        return 0

    def TransformToDataFrame(self):
        allInfo = self.getAllInformation()
        if allInfo != 0:
            return None

        categories = ['co2_total', 'co2_per_capita', 'birth_rate', 'fertility_rate', 'generation_GW', 'consumption_GW']

        data = []
        for country, country_data in self.lst_Info_Value.items():
            row = {'country': country}

            for id_category, category in enumerate(country_data):
                if category:
                    start_categories = id_category * 2
                    category_categories = categories[start_categories:start_categories + 2]
                    for id_year, year in enumerate(self.years):
                        if id_year < len(category):
                            year_values = category[id_year]
                            for id_subcategory, subcategory in enumerate(category_categories):
                                if id_subcategory < len(year_values):
                                    col_name = f"{subcategory}_{year}"
                                    row[col_name] = year_values[id_subcategory]

            data.append(row)
        df = pd.DataFrame(data)

        columns = ['country']
        for category in categories:
            for year in self.years:
                columns.append(f"{category}_{year}")

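        # reindex fills columns that were never scraped with NaN instead of raising KeyError
        df = df.reindex(columns=columns)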

        self.webDataSet = df

    def getWebDataset(self):
        if self.webDataSet is None:
            self.TransformToDataFrame()
            # keyweb stays None when every request failed
            if self.keyweb is not None:
                self.keyweb.close()
        return self.webDataSet
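
The scraping run takes several seconds per country, so a request that hangs can stall the whole job. The sketch below shows one possible way to fetch a page with a timeout and with the response closed automatically. `fetch_html` is a hypothetical helper, and the `User-Agent` header is an assumption (some sites reject the default Python agent; whether this site does has not been verified).

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        # Assumption: a browser-like User-Agent; adjust or remove as needed.
        request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        # The with-block closes the connection even if read() raises.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError) as error:
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs.
        print(f"Could not fetch {url}: {error}")
        return None


print(fetch_html("not-a-valid-url"))
```

The returned bytes could then be passed to `BeautifulSoup(content, 'html.parser')` as in the class above.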

In [3]:
import pandas as pd
import os
import tkinter as tk
from tkinter import ttk
from pandastable import Table

class DatasetManager():
    def __init__(self, localPath="", websiteUrl= ""):
        self.datasets = {}
        self.website_url = websiteUrl
        self.local_path = localPath


    def set_local_path(self, path):
        self.local_path = path
        print(f"set local path to : {path}")

    def set_website_url(self, url):
        self.website_url = url
        print(f"set website url to : {url}")

    def initialize_local_dataset(self, delimiter=','):
        if self.local_path == "" or self.local_path is None:
            self.local_path = input(f"Current folder: {os.getcwd()}. Enter the path to your dataset: ")
        try:
            # Pass the delimiter through so non-comma CSV files can be loaded too.
            dataset = pd.read_csv(self.local_path, sep=delimiter)
        except Exception as e:
            print(f"Reading dataset failed: {e}")
            return 84
        self.datasets["local"] = dataset
        return 0

    def getAllCountries(self):
        if "local" not in self.datasets:
            print("Please init your dataset before")
            return None

        list_countries = [country.lower().replace(" ", "-") for country in self.datasets["local"]['country'].unique()]

        return list_countries

    def initWebDataset(self):
        if self.website_url == "" or self.website_url is None:
            self.website_url = input("insert the url to your dataset")

        webFormatter = WebSiteFormater(self.website_url, self.getAllCountries())
        webDataset = webFormatter.getWebDataset()
        if webDataset is None:
            print("adding into dict failed")
            return 1
        self.datasets["web"] = webDataset
        return 0

    def setStartupParameters(self):
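        # Use the same slug form as the web dataset (lowercase, hyphens) so multi-word names match.
        self.datasets['local']['country'] = self.datasets['local']['country'].str.lower().str.replace(" ", "-")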
        self.datasets['web']['country'] = self.datasets['web']['country'].str.lower()
        years = sorted([col.replace(" population", "") for col in self.datasets['local'].columns
                        if "population" in col])
        metrics = ['population', 'co2_total', 'co2_per_capita', 'birth_rate',
                    'fertility_rate', 'generation_GW', 'consumption_GW']
        countries = set(self.datasets['local']['country']) & set(self.datasets['web']['country'])
        return years, metrics, countries

    def createMergedDataFrame(self, merged_df):
        country_col = merged_df.pop('country')  # remove the country name column
        merged_df.index = country_col  # use the country names as the index
        cols = [col for col in merged_df.columns if isinstance(col, tuple)]
        merged_df.columns = pd.MultiIndex.from_tuples(cols, names=['metric', 'year'])
        merged_df = merged_df.reindex(sorted(merged_df.columns), axis=1)
        self.datasets["merged"] = merged_df
        return 0

    def mergeDatasets(self):
        if "local" not in self.datasets or "web" not in self.datasets:
            print("Please init both datasets before merging")
            return 1
        data = []
        years, metrics, countries = self.setStartupParameters()
        for country in countries:
            local_data = self.datasets['local'][self.datasets['local']['country'] == country]  # boolean mask: keep only this country's row
            web_data = self.datasets['web'][self.datasets['web']['country'] == country]
            if not local_data.empty and not web_data.empty:
                row_data = {'country': country}
                if 'area (km²)' in local_data:
                    # Assign inside the check so a missing column cannot raise NameError.
                    row_data[('area (km²)', '')] = local_data['area (km²)'].iloc[0]
                for year in years:
                    pop_col = f"{year} population"
                    if pop_col in local_data.columns:
                        row_data[('population', year)] = local_data[pop_col].iloc[0]

                    for metric in metrics[1:]:
                        col_name = f"{metric}_{year}"
                        if col_name in web_data.columns:
                            row_data[(metric, year)] = web_data[col_name].iloc[0]
                data.append(row_data)

        merged_df = pd.DataFrame(data)
        if merged_df.empty:
            print("\nERROR: Merge resulted in empty dataset!")
            return 1
        self.createMergedDataFrame(merged_df)
        return 0
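
The merge keeps only countries whose names appear in both datasets, so both sides need the same normalisation. The sketch below illustrates why: the web side uses hyphenated slugs, and a plain lowercase comparison loses multi-word names. `to_slug` is a hypothetical helper and the lists are small illustrative samples.

```python
def to_slug(name):
    """Lowercase the name and replace spaces with hyphens."""
    return name.strip().lower().replace(" ", "-")


local_names = ["France", "United States", "New Zealand"]
web_slugs = ["france", "united-states", "germany"]

# Lowercasing alone leaves the space in 'united states', so it does not match.
naive_match = {name.lower() for name in local_names} & set(web_slugs)

# Applying the same slug rule to both sides recovers the multi-word name.
slug_match = {to_slug(name) for name in local_names} & set(web_slugs)

print(sorted(naive_match))
print(sorted(slug_match))
```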


In [4]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker


class PlotManager:
    def __init__(self):
        sns.set_theme(style="darkgrid")  # More visible grid
        sns.set_context("notebook", font_scale=1.2)  # Larger fonts
        sns.set_palette("deep")  # Deep color

        plt.rcParams['figure.figsize'] = [12, 8]
        plt.rcParams['axes.titlesize'] = 14
        plt.rcParams['axes.labelsize'] = 12
        plt.rcParams['lines.linewidth'] = 2.5
        plt.rcParams['lines.markersize'] = 8

        self.years = sorted([str(i) for i in range(1970,2021, 10)] + [str(i) for i in range(2022,2024)])


    def takeMainInformation(self, dataFiltered, dataset, show=True):
        all_data = []
        for country in dataFiltered.index:
            try:
                for year in self.years:
                    data_point = {
                        'Year': int(year),
                        'Country': country
                    }
                    if ('population', year) in dataFiltered.columns:
                        population = dataFiltered.loc[country, ('population', year)]
                        if pd.notna(population):
                            data_point['Population'] = population
                    if ('co2_total', year) in dataFiltered.columns:
                        co2 = dataFiltered.loc[country, ('co2_total', year)]
                        if pd.notna(co2):
                            data_point['CO2 Total'] = co2
                    if ('co2_per_capita', year) in dataFiltered.columns:
                        co2_per_capita = dataFiltered.loc[country, ('co2_per_capita', year)]
                        if pd.notna(co2_per_capita):
                            data_point['CO2 Per Capita'] = co2_per_capita
                    if ('birth_rate', year) in dataFiltered.columns:
                        birth_rate = dataFiltered.loc[country, ('birth_rate', year)]
                        if pd.notna(birth_rate):
                            data_point['Birth Rate'] = birth_rate

                    if len(data_point) > 2:  # More than just Year and Country
                        all_data.append(data_point)

            except Exception as e:
                print(f"Warning: Error processing {country}: {e}")
                continue
        return all_data

    def takeadditionalInformation(self, dataFiltered, dataset, show=True):
        all_data = []
        for country in dataFiltered.index:
            try:
                for year in self.years:
                    data_point = {
                        'Year': int(year),
                        'Country': country
                    }
                    if ('generation_GW', year) in dataFiltered.columns:
                        generation = dataFiltered.loc[country, ('generation_GW', year)]
                        if pd.notna(generation):
                            data_point['Electricity Generation (GW)'] = generation
                    if ('consumption_GW', year) in dataFiltered.columns:
                        consumption = dataFiltered.loc[country, ('consumption_GW', year)]
                        if pd.notna(consumption):
                            data_point['Electricity Consumption (GW)'] = consumption
                    if len(data_point) > 2:  # More than just Year and Country
                        all_data.append(data_point)

            except Exception as e:
                print(f"Warning: Error processing {country}: {e}")
                continue
        return all_data

    def plotlocal(self, dataset, countries, show=True):
        if not isinstance(countries, list):
            print("Error: countries must be a list")
            return None

        if len(countries) < 1:
            print("Error: Please provide at least one country")
            return None

        dataset = dataset.copy()
        # Country names are the index of the merged dataset; keep only the requested ones that exist.
        countries = [country.lower() for country in countries if country.lower() in dataset.index]
        df_filtered = dataset.loc[countries]
        if df_filtered.empty:
            print("Error: None of the specified countries were found in the dataset")
            print("Available countries:", sorted(dataset.index.unique()))
            return None

        all_data = self.takeMainInformation(df_filtered, dataset, show)
        additional_data = self.takeadditionalInformation(df_filtered, dataset, show)
        all_data.extend(additional_data)

        df = pd.DataFrame(all_data)
        if df.empty:
            print("Error: No data available to plot")
            return None
        self.plotCO2AndPopulation(df, countries, show)
        self.plotElectricityData(df, countries, show)
        plt.show()

    @staticmethod
    def millions_formatter(x, pos):
        return f'{x/1e6:.1f}M'

    def plotCO2AndPopulation(self, dataset, countries, show=True):
        sns.set_style("whitegrid", {'grid.linestyle': '--'})
        n_colors = len(countries)
        sns.set_palette("husl", n_colors)  # set_palette applies the palette; color_palette only returns it
        fig1, ax1 = plt.subplots(figsize=(12, 6))

        sns.lineplot(data=dataset, x='Year', y='Population', hue='Country',
            ax=ax1, marker='o')

        ax1.yaxis.set_major_formatter(ticker.FuncFormatter(self.millions_formatter))
        ax1.set_ylabel('Population')

        ax2 = ax1.twinx()
        sns.lineplot(data=dataset, x='Year', y='CO2 Total', hue='Country',
                               ax=ax2, linestyle='--', marker='s')
        ax2.set_ylabel('CO2 Total')

        plt.title('Population and Total CO2 by Country')
        legend_labels = []
        handles = []
        pop_handles = ax1.lines
        co2_handles = ax2.lines
        for i, country in enumerate(countries):
            legend_labels.extend([f'Population - {country}', f'CO2 Total - {country}'])
            handles.extend([pop_handles[i], co2_handles[i]])

        ax1.legend(handles, legend_labels, title='Metrics by Country',
                  bbox_to_anchor=(1.15, 1))
        ax2.get_legend().remove()
        fig1.tight_layout()

    def plotElectricityData(self, dataset, countries, show=True):
        fig2, ax3 = plt.subplots(figsize=(12, 6))

        sns.lineplot(data=dataset, x='Year', y='Electricity Generation (GW)',
                    hue='Country', ax=ax3, marker='o')
        sns.lineplot(data=dataset, x='Year', y='Electricity Consumption (GW)',
                    hue='Country', ax=ax3, linestyle='--', marker='s')

        plt.title('Electricity Generation and Consumption by Country')
        ax3.set_ylabel('GW')
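        # Skip empty proxy lines that some seaborn versions add to the axes for legends.
        drawn = [line for line in ax3.lines if len(line.get_xdata()) > 0]
        n = len(countries)
        legend_labels = []
        handles = []
        for i, country in enumerate(countries):
            if n + i < len(drawn):
                # Generation lines are drawn first, then consumption lines.
                legend_labels.extend([f'Generation - {country}', f'Consumption - {country}'])
                handles.extend([drawn[i], drawn[n + i]])

        ax3.legend(handles, legend_labels, title='Metrics by Country',
                  bbox_to_anchor=(1.15, 1))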
        fig2.tight_layout()
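
The `millions_formatter` used for the population axis can be checked on its own, without drawing a figure. The sketch below repeats the function as a plain module-level function and prints the labels it would produce for a few sample tick values. The tick values are arbitrary examples.

```python
def millions_formatter(x, pos=None):
    """Format a raw tick value as millions with one decimal place."""
    return f"{x / 1e6:.1f}M"


# Arbitrary sample tick values, chosen only to show the label format.
for tick in (0, 2_500_000, 83_200_000):
    print(tick, "->", millions_formatter(tick))
```

In the plotting code it is attached with `ax1.yaxis.set_major_formatter(ticker.FuncFormatter(...))`, which calls the function with the tick value and its position.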

In [5]:
class Interface:
    def __init__(self):
        self.local_path = "./Dataset/world_population_data.csv"
        self.website_url = "https://countryeconomy.com/{info}/{country}"
        self.DatasetManager = DatasetManager(localPath=self.local_path, websiteUrl=self.website_url)
        self.PlotManager = PlotManager()

    def print_available_countries(self, dataset):
        print("\nCountries list available:")
        available_countries = sorted(dataset.index.unique())

        max_len = max(len(f"{i}. {country.capitalize()}") for i, country in enumerate(available_countries, 1))
        col_width = max_len + 4

        n_countries = len(available_countries)
        n_rows = (n_countries + 4) // 5  # ceiling division for 5 columns

        for row in range(n_rows):
            line = ""
            for col in range(5):
                idx = row + col * n_rows
                if idx < n_countries:
                    country = available_countries[idx]
                    item = f"{idx + 1}. {country.capitalize()}"
                    line += item.ljust(col_width)
            print(line.rstrip())
        return available_countries

    def select_countries(self, dataset):
        available_countries = self.print_available_countries(dataset)
        selected_countries = []

        while len(selected_countries) < 5:
            print(f"\nSelect a country (1-{len(available_countries)}) or type 'done' to finish:")
            choice = input("> ").strip().lower()

            if choice == 'done':
                if len(selected_countries) == 0:
                    print("Please select at least one country.")
                    continue
                break

            try:
                index = int(choice) - 1
                if 0 <= index < len(available_countries):
                    country = available_countries[index]
                    if country in selected_countries:
                        print(f"{country.capitalize()} is already selected.")
                    else:
                        selected_countries.append(country)
                        print(f"{country.capitalize()} added. {len(selected_countries)}/5 countries selected.")
                        if len(selected_countries) == 5:
                            print("Maximum number of countries reached (5).")
                            break
                else:
                    print("Invalid choice. Please select a number from the list.")
            except ValueError:
                print("Invalid input. Please enter a number or 'done' to end.")

        print("\nSelected countries:")
        for country in selected_countries:
            print(f"- {country.capitalize()}")

        return selected_countries

    def main_loop(self, datasetManager, plotManager):
        exit = False
        while not exit:
            print("\nMenu :")
            print("1. (s)elect countries to plot")
            print("2. (e)xit")
            choice = input("> ").strip()
            if choice == '1' or choice.lower().startswith('s'):
                selected_countries = self.select_countries(datasetManager.datasets["merged"])
                if selected_countries:
                    plotManager.plotlocal(datasetManager.datasets["merged"], selected_countries)
                else:
                    print("No countries selected.")
            elif choice == '2' or choice.lower().startswith('e'):
                print("Exiting...")
                exit = True
            else:
                print("Invalid choice. Please select 1 or 2.")

    def init_Managers(self):
        print("Initializing Dataset Manager...")
        print("Loading local dataset...")
        if self.DatasetManager.initialize_local_dataset() != 0:
            print("Local dataset initialization failed.")
            return None
        else:
            print("Local dataset initialized successfully.")

        if self.DatasetManager.initWebDataset() != 0:
            print("Web dataset initialization failed.")
            return None
        else:
            print("Web dataset initialized successfully.")
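        print("Merging datasets...")
        if self.DatasetManager.mergeDatasets() != 0:
            print("Dataset merge failed.")
            return None
        print("Datasets merged successfully.")
        return 0

    def run_interface(self):
        # Only start the menu once both datasets were loaded and merged.
        if self.init_Managers() != 0:
            return 1
        self.main_loop(self.DatasetManager, self.PlotManager)
        return 0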
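
The country list is printed in columns that are filled top to bottom, which requires rounding the number of rows up. The sketch below isolates that layout logic so it can be checked without user input. `layout_columns` is a hypothetical helper and the item list is an arbitrary example.

```python
def layout_columns(items, n_cols=5):
    """Arrange items column by column and return a list of rows of numbered labels."""
    # Ceiling division: enough rows to hold every item in n_cols columns.
    n_rows = -(-len(items) // n_cols)
    rows = []
    for row in range(n_rows):
        cells = []
        for col in range(n_cols):
            idx = row + col * n_rows
            if idx < len(items):
                cells.append(f"{idx + 1}. {items[idx]}")
        rows.append(cells)
    return rows


for cells in layout_columns(list("abcdefg")):
    print("    ".join(cells))
```

With seven items and five columns, two rows are needed. Rounding the row count down instead would leave some items unprinted.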

### Program entry point


In [None]:
#!/usr/bin/env python3
import sys
def main():
    interface = Interface()
    interface.run_interface()
    return 0


if __name__ == "__main__":
    sys.exit(main())

Initializing Dataset Manager...
Loading local dataset...
Local dataset initialized successfully.


Scraping Website: Processing jamaica:  59%|█████▉    | 138/234 [16:43<07:40,  4.80s/it]