<a href="https://colab.research.google.com/github/MehrNoushR/Mehrnoush/blob/main/GDP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


**Data Loading**

In [None]:
# Paths to the two datasets
inequality_path = 'inequality.csv'
gdp_path = 'gdp-per-capita-penn-world-table.csv'

In [None]:
# Reading the CSV files into pandas dataframes
inequality_df = pd.read_csv(inequality_path)
gdp_df = pd.read_csv(gdp_path)

In [None]:
# Let's inspect the first few rows of each dataset to understand their structure
inequality_head = inequality_df.head()
gdp_head = gdp_df.head()

In [None]:
inequality_head

The inequality.csv dataset contains several columns, including:

Country, Year, Gini coefficient (before tax), income shares of various percentiles (richest 10%, 1%, 0.1%, and poorest 50%),
and more.


In [None]:
gdp_head

The gdp-per-capita-penn-world-table.csv dataset is simpler, with the following columns:

Entity (Country)
Code (Country code)
Year
GDP per capita (output, multiple price benchmarks)




Let's identify the range of years covered and the number of countries in both datasets to understand how much alignment is necessary.

In [None]:
inequality_years_range = (inequality_df['Year'].min(), inequality_df['Year'].max())
gdp_years_range = (gdp_df['Year'].min(), gdp_df['Year'].max())

inequality_countries_count = inequality_df['Country'].nunique()
gdp_countries_count = gdp_df['Entity'].nunique()

In [None]:
(inequality_years_range, gdp_years_range, inequality_countries_count, gdp_countries_count)

Here's the data we have:

• The inequality.csv dataset spans from 1807 to 2021 and includes data for 215 unique countries.

• The gdp-per-capita-penn-world-table.csv dataset covers the years 1950 to 2019 and has data for 182 unique countries.

Filter both datasets to the overlapping years, 1950-2019

In [None]:
inequality_filtered = inequality_df[(inequality_df['Year'] >= 1950) & (inequality_df['Year'] <= 2019)]
gdp_filtered = gdp_df[(gdp_df['Year'] >= 1950) & (gdp_df['Year'] <= 2019)]
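
The 1950-2019 window above is typed in by hand from the ranges we printed. As a rough sketch, the same window could be derived from the data itself. The toy frames and the helper name `overlapping_years` below are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for inequality_df and gdp_df (hypothetical values, not the real data)
inequality_toy = pd.DataFrame({"Country": ["A", "A", "B"], "Year": [1940, 1960, 2021]})
gdp_toy = pd.DataFrame({"Entity": ["A", "B", "B"], "Year": [1950, 1960, 2019]})


def overlapping_years(left, right, year_col="Year"):
    """Return the (start, end) years covered by both frames."""
    start = max(left[year_col].min(), right[year_col].min())
    end = min(left[year_col].max(), right[year_col].max())
    return int(start), int(end)


start_year, end_year = overlapping_years(inequality_toy, gdp_toy)

# Series.between is inclusive on both ends by default
inequality_toy_filtered = inequality_toy[inequality_toy["Year"].between(start_year, end_year)]
print(start_year, end_year)
```

Deriving the bounds this way means the filter would not need editing if either CSV were updated with more years.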

Align the country names and filter out countries that do not appear in both datasets

In [None]:
# We'll create a set of common countries present in both datasets
common_countries = set(inequality_filtered['Country']).intersection(set(gdp_filtered['Entity']))

Filter the datasets to include only the common countries

In [None]:
inequality_aligned = inequality_filtered[inequality_filtered['Country'].isin(common_countries)]
gdp_aligned = gdp_filtered[gdp_filtered['Entity'].isin(common_countries)]
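
A plain set intersection silently drops any country whose name is spelled differently in the two files. The sketch below shows one way to list what gets dropped and to patch spellings by hand. The country names and the `name_fixes` mapping are invented examples; whether the real datasets disagree in this way is something we have not checked here.

```python
import pandas as pd

# Hypothetical name columns standing in for inequality_filtered['Country'] and gdp_filtered['Entity']
inequality_names = pd.Series(["France", "Czechia", "Viet Nam"])
gdp_names = pd.Series(["France", "Czech Republic", "Vietnam"])

common = set(inequality_names) & set(gdp_names)

# Names that would be lost by keeping only the intersection
only_in_inequality = sorted(set(inequality_names) - common)
only_in_gdp = sorted(set(gdp_names) - common)
print(only_in_inequality, only_in_gdp)

# Hypothetical manual mapping to harmonise spellings before intersecting again
name_fixes = {"Czech Republic": "Czechia", "Vietnam": "Viet Nam"}
gdp_names_fixed = gdp_names.replace(name_fixes)
common_after_fix = set(inequality_names) & set(gdp_names_fixed)
```

Printing the two "only in" lists before filtering makes it easier to tell a genuine gap in coverage from a naming mismatch.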

Resample the data if necessary to ensure one value per year per country

In [None]:
# Check if there is more than one entry per country per year in each dataset
inequality_duplicates = inequality_aligned.duplicated(subset=['Country', 'Year'], keep=False)
gdp_duplicates = gdp_aligned.duplicated(subset=['Entity', 'Year'], keep=False)

# Summarize the number of duplicate entries (if any)
inequality_duplicates_sum = inequality_duplicates.sum()
gdp_duplicates_sum = gdp_duplicates.sum()

(inequality_duplicates_sum, gdp_duplicates_sum)
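
If the duplicate counts had come back non-zero, one option would be to collapse each country-year to a single row. Here is a minimal sketch on toy data that averages the duplicates; taking the mean is an assumption made for illustration, and keeping the first or latest entry could be equally reasonable.

```python
import pandas as pd

# Toy frame with a duplicated (Country, Year) pair (hypothetical values)
toy = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Year": [2000, 2000, 2000],
    "Gini": [0.40, 0.50, 0.30],
})

# Same check as in the cell above: flag every row that shares its key with another row
dup_mask = toy.duplicated(subset=["Country", "Year"], keep=False)
print(dup_mask.sum())

# Collapse to one value per country per year by averaging
collapsed = toy.groupby(["Country", "Year"], as_index=False)["Gini"].mean()
print(collapsed)
```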

Merge these two datasets on the common columns 'Country' and 'Year', so that each row has the GDP per capita and the Gini coefficient for a given country and year. After that, we can calculate the correlation coefficient for the aligned data.

In [None]:
# Rename the columns to facilitate the merge
gdp_aligned = gdp_aligned.rename(columns={"Entity": "Country",
                                          "GDP per capita (output, multiple price benchmarks)": "GDP per capita"})

# Merge the dataset on 'Country' and 'Year'
merged_data = pd.merge(inequality_aligned, gdp_aligned, on=['Country', 'Year'])

# Focus on the relevant columns 'Gini coefficient (before tax)' and 'GDP per capita'
# We will also drop any rows that have NaN values in these columns as they cannot be used in correlation analysis
merged_data_relevant = merged_data[['Country', 'Year', 'Gini coefficient (before tax) (World Inequality Database)', 'GDP per capita']].dropna()


# Calculate the correlation coefficient for the aligned data
correlation = merged_data_relevant[['Gini coefficient (before tax) (World Inequality Database)', 'GDP per capita']].corr()

correlation
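
To make clear what `.corr()` returns, here is a small sketch that computes Pearson's r by hand and compares it with pandas. The five data points and the helper `pearson_r` are invented for illustration; they are not values from the merged dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): inequality falls as income rises, with one exception
toy = pd.DataFrame({
    "gini": [0.60, 0.55, 0.50, 0.45, 0.48],
    "gdp": [2000.0, 5000.0, 12000.0, 30000.0, 45000.0],
})


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_manual = pearson_r(toy["gini"], toy["gdp"])
r_pandas = toy["gini"].corr(toy["gdp"])  # pandas uses Pearson by default
print(r_manual, r_pandas)
```

The two numbers should agree up to floating-point error, which is a quick sanity check that the off-diagonal entry of the correlation matrix is the quantity we think it is.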

Visualization


In [None]:
# Set the aesthetic style of the plots
sns.set_style("whitegrid")

# Plotting the relationship between GDP per capita and Gini coefficient
plt.figure(figsize=(10, 6))
sns.scatterplot(x='GDP per capita', y='Gini coefficient (before tax) (World Inequality Database)',
                data=merged_data_relevant, edgecolor='none', alpha=0.7)

plt.title('GDP per Capita vs Gini Coefficient')
plt.xlabel('GDP per Capita (in international $)')
plt.ylabel('Gini Coefficient (before tax)')
plt.show()

Questions

1. Is there a relation between a country's Gross Domestic Product (GDP) and its income inequality?


Based on the data we have analyzed, there appears to be a negative correlation between a country's GDP per capita and its income inequality as measured by the Gini coefficient. The Pearson correlation coefficient is approximately -0.329, indicating that, on average, higher GDP per capita is associated with lower income inequality.



2. Difference between correlation and causation


It is important to note that the observed correlation does not establish causation. The relationship suggests a pattern but does not confirm that higher GDP per capita directly causes lower income inequality or vice versa. There may be other factors (an unknown C) influencing both GDP per capita and income inequality, such as educational attainment, tax policies, social welfare programs, or economic structure.
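
The "unknown C" idea can be illustrated with a small simulation. In the sketch below a hidden variable `z` drives both `x` and `y`, which never influence each other, yet they end up strongly correlated. All numbers are synthetic and the setup is an assumption chosen for illustration; it says nothing about what actually drives GDP or inequality.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

z = rng.normal(size=n)                    # hidden common cause (the "unknown C")
x = z + rng.normal(scale=0.5, size=n)     # depends on z only
y = -z + rng.normal(scale=0.5, size=n)    # depends on z only, with the opposite sign

# Raw correlation between x and y, even though neither causes the other
r_raw = np.corrcoef(x, y)[0, 1]


def residuals(v, control):
    """Remove the straight-line effect of `control` from `v`."""
    slope, intercept = np.polyfit(control, v, 1)
    return v - (slope * control + intercept)


# Correlation that remains once z is accounted for (a partial correlation)
r_partial = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(round(r_raw, 3), round(r_partial, 3))
```

The raw correlation is strongly negative, while the correlation left after controlling for `z` is close to zero, which is the pattern a confounder would produce.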



3. Gini coefficient as a measure of income inequality


The Gini coefficient is a widely used measure of income inequality within a country. The closer the Gini coefficient is to 1, the higher the inequality; a coefficient closer to 0 suggests more equality. The dataset inequality.csv provides Gini coefficients for various countries, which we used to measure income inequality.
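
For intuition about the 0-to-1 scale, here is a minimal sketch that computes a Gini coefficient from a list of individual incomes using a standard rank-based formula. The function name `gini` and the incomes are hypothetical; the notebook itself takes the Gini values ready-made from inequality.csv and does not compute them.

```python
import numpy as np


def gini(incomes):
    """Gini coefficient of non-negative incomes via the sorted-rank formula."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending
    return float(2.0 * (ranks * x).sum() / (n * x.sum()) - (n + 1) / n)


equal = gini([1, 1, 1, 1])       # everyone earns the same
extreme = gini([0, 0, 0, 1])     # one person earns everything
print(equal, extreme)
```

With a finite group of n people the maximum is (n - 1) / n rather than exactly 1, so the four-person extreme case gives 0.75.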



4. Historical GDP data


The dataset gdp-per-capita-penn-world-table.csv contains historical GDP per capita data. This measure serves as an indicator of the economic output per person and is often used as a proxy for the average standard of living or economic well-being within a country.


5. Correlation coefficient calculation


To calculate the correlation coefficient, we first aligned the datasets by filtering for common countries and years. We then merged the datasets and focused on the relevant columns for GDP per capita and the Gini coefficient. No resampling was necessary, as there was only one entry per country per year. The calculated Pearson correlation coefficient is a statistical measure of the linear relationship between the two variables.
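
The steps just described can be condensed into one helper, sketched here on toy data. The frames, the shortened column name `Gini` and the function `align_and_merge` are hypothetical stand-ins. The sketch leans on two pandas behaviours: an inner merge on both keys keeps only country-years present in both frames, and `validate="one_to_one"` raises an error if either side has more than one row per country per year.

```python
import pandas as pd

# Toy stand-ins for the two datasets (hypothetical values)
ineq = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C"],
    "Year": [1940, 2000, 2000, 2010, 2000],
    "Gini": [0.50, 0.45, 0.60, None, 0.40],
})
gdp = pd.DataFrame({
    "Entity": ["A", "B", "B", "D"],
    "Year": [2000, 2000, 2010, 2000],
    "GDP per capita": [30000.0, 8000.0, 9000.0, 500.0],
})


def align_and_merge(ineq, gdp):
    """Keep country-years found in both frames, then drop incomplete rows."""
    gdp = gdp.rename(columns={"Entity": "Country"})
    merged = pd.merge(ineq, gdp, on=["Country", "Year"],
                      how="inner", validate="one_to_one")
    return merged.dropna(subset=["Gini", "GDP per capita"])


pairs = align_and_merge(ineq, gdp)
r = pairs["Gini"].corr(pairs["GDP per capita"])
print(pairs)
print(r)
```

With only two complete pairs left, the toy correlation comes out at -1, which shows why the coefficient is only informative when many country-years survive the alignment.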
