In [1]:
import pandas as pd
import matplotlib.pyplot as plt

We import pandas for data manipulation and matplotlib.pyplot for creating visualizations.

In [None]:

def import_data(file_path):
    data = pd.read_csv(file_path)
    return data

We define a simple function import_data that takes a file path as input and reads a CSV file into a pandas DataFrame.

In [None]:
gdp_data = import_data('GDP.csv')
pop_data = import_data('POP.csv')

We use the import_data function to load two datasets: one for GDP (GDP.csv) and one for Population (POP.csv).

In [None]:
print("GDP data column names:")
print(gdp_data.columns)

print("\nPopulation data column names:")
print(pop_data.columns)

We print the column names of both datasets to check the structure and identify the relevant columns for analysis.

In [None]:
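gdp_data['Gross domestic product 2022'] = pd.to_numeric(gdp_data['Gross domestic product 2022'], errors='coerce')
pop_data['Population 2022'] = pd.to_numeric(pop_data['Population 2022'], errors='coerce')
gdp_data = gdp_data.dropna(subset=['Gross domestic product 2022'])
pop_data = pop_data.dropna(subset=['Population 2022'])

We convert the GDP and Population columns to numeric values, turning anything that cannot be parsed into NaN, and then drop those rows. Converting the columns themselves, rather than only filtering on them, ensures that the later sorting and statistics operate on numbers and not on strings.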
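The difference between filtering on a numeric check and actually converting the column can be seen in a small sketch. The toy DataFrame below is hypothetical (the real contents of GDP.csv are not shown here); it only illustrates that a column left as strings sorts lexicographically.

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV contents.
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D"],
    "GDP_2022": ["100", "..", None, "9"],
})

# Filtering only: invalid rows are removed, but the values stay strings.
mask = pd.to_numeric(toy["GDP_2022"], errors="coerce").notna()
filtered = toy[mask]

# Converting the column: the remaining values become real numbers.
converted = filtered.assign(GDP_2022=pd.to_numeric(filtered["GDP_2022"]))

string_order = filtered.sort_values("GDP_2022", ascending=False)["Country Name"].tolist()
numeric_order = converted.sort_values("GDP_2022", ascending=False)["Country Name"].tolist()

print(string_order)   # strings compare character by character, so "9" > "100"
print(numeric_order)  # numbers compare by value, so 100 > 9
```

On this toy data the two orderings disagree, which is why a later "top 10" ranking is only trustworthy once the column holds numbers.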

In [None]:
gdp_selected = gdp_data[['Unnamed: 0', 'Gross domestic product 2022']]
pop_selected = pop_data[['Unnamed: 0', 'Population 2022']]

We keep only the country-name column and the 2022 GDP or Population column. The country names sit in 'Unnamed: 0', which is the name pandas assigns to a column whose header cell is empty.
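As a brief illustration of where a name like 'Unnamed: 0' comes from, the sketch below reads a tiny in-memory CSV whose first header cell is blank. The CSV text is made up for the example and is not taken from GDP.csv or POP.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text: the first header cell is empty.
csv_text = ",Population 2022\nA,100\nB,200\n"

df = pd.read_csv(io.StringIO(csv_text))
column_names = df.columns.tolist()

# pandas fills in a placeholder name for the blank header.
print(column_names)
```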

In [None]:
gdp_selected = gdp_selected.rename(columns={'Unnamed: 0': 'Country Name', 'Gross domestic product 2022': 'GDP_2022'})
pop_selected = pop_selected.rename(columns={'Unnamed: 0': 'Country Name', 'Population 2022': 'Population_2022'})

We rename the columns to make them more understandable and easier to reference in the next steps.

In [None]:
merged_data = pd.merge(gdp_selected, pop_selected, on='Country Name')

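We merge the two datasets on the Country Name column to create a combined dataset containing both GDP and Population information for each country. pd.merge performs an inner join by default, so only countries that appear in both datasets are kept.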
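Because a merge on country names silently drops names that do not match exactly, it can be worth checking what was left out. The sketch below uses two hypothetical toy frames and an outer join with indicator=True to list the unmatched rows; it is one possible check, not part of the original workflow.

```python
import pandas as pd

# Hypothetical toy frames; "C" and "D" each appear in only one of them.
gdp_toy = pd.DataFrame({"Country Name": ["A", "B", "C"], "GDP_2022": [3.0, 2.0, 1.0]})
pop_toy = pd.DataFrame({"Country Name": ["A", "B", "D"], "Population_2022": [30, 20, 40]})

# Default inner join: only names present in both frames survive.
inner = pd.merge(gdp_toy, pop_toy, on="Country Name")

# Outer join with an indicator column records where each row came from.
audit = pd.merge(gdp_toy, pop_toy, on="Country Name", how="outer", indicator=True)
unmatched = audit.loc[audit["_merge"] != "both", ["Country Name", "_merge"]]

print(len(inner))
print(unmatched)
```

Running the same audit on gdp_selected and pop_selected would show whether any countries are lost because of spelling differences between the two files.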

In [None]:
print("\nPreview of merged data:")
print(merged_data.head())

We print the first few rows of the merged dataset to verify that the merge was successful.

In [None]:
print("\nDescriptive statistics:")
print(merged_data.describe())

We use describe() to get basic descriptive statistics (count, mean, std, min, max, etc.) for the merged dataset.

In [None]:
top10_gdp = merged_data.sort_values(by='GDP_2022', ascending=False).head(10)

We sort the dataset by GDP_2022 in descending order and select the top 10 countries with the highest GDP values.
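An equivalent way to pick the largest rows is DataFrame.nlargest, shown here on hypothetical toy data as an alternative sketch. It assumes the GDP column is numeric.

```python
import pandas as pd

# Hypothetical toy data with an already numeric GDP column.
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D", "E"],
    "GDP_2022": [5.0, 1.0, 4.0, 3.0, 2.0],
})

# Sort-then-head, as in the cell above, and the nlargest shortcut.
by_sort = toy.sort_values(by="GDP_2022", ascending=False).head(3)
by_nlargest = toy.nlargest(3, "GDP_2022")

print(by_nlargest)
```

Both approaches select the same rows here; nlargest simply states the intent more directly.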

In [None]:
plt.figure(figsize=(10,6))
plt.bar(top10_gdp['Country Name'], top10_gdp['GDP_2022'])
plt.xticks(rotation=45, ha='right', fontsize=10)
plt.title('Top 10 Countries by GDP in 2022')
plt.xlabel('Country')
plt.ylabel('GDP (Current US$)')
plt.tight_layout()
plt.show()

We create a bar chart to visualize the top 10 countries by GDP in 2022. The x-axis shows the country names and the y-axis shows the GDP values.
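Large GDP values tend to be drawn with scientific-notation tick labels, which are hard to read. The sketch below shows one way to relabel the y-axis in trillions using matplotlib's FuncFormatter. The toy values, the helper name trillions, and the assumption that the values are in plain US dollars (rather than, say, millions of dollars) are all illustrative and should be checked against the actual data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical toy values, assumed to be in plain US dollars.
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C"],
    "GDP_2022": [2.5e13, 1.8e13, 4.2e12],
})


def trillions(value, _pos):
    """Format a tick value as trillions with one decimal, e.g. 25.0T."""
    return f"{value / 1e12:.1f}T"


fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(toy["Country Name"], toy["GDP_2022"])
ax.yaxis.set_major_formatter(FuncFormatter(trillions))
ax.set_xlabel("Country")
ax.set_ylabel("GDP (current US$, trillions)")
fig.tight_layout()
```

In a notebook, calling plt.show() after this (and omitting the matplotlib.use line) displays the chart inline.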