In [None]:
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import warnings
from mpl_toolkits.axes_grid1 import make_axes_locatable
warnings.simplefilter('ignore')

## Read cleaned data from file

In [None]:
cleaned_df = pd.read_csv('data/data_processed.csv')
cleaned_df.head(10)

## Question 1: What is the difference between self-made and inherited billionaires?

### Preprocessing

I select only the attributes needed to answer this question.

In [None]:
df = cleaned_df[['rank', 'finalWorth', 'selfMade', 'country']]
df

### Analyze data to answer the question

#### Comparison of inherited vs self-made billionaires

Calculate the number of inherited and self-made billionaires.

In [None]:
number_inherited = sum(df['selfMade'] == 0)
number_self_made = sum(df['selfMade'] == 1)

print(f"Total Number of inherited billionaires: {number_inherited}")
print(f"Total Number of self-made billionaires: {number_self_made}")

In [None]:
data = [number_inherited, number_self_made]
labels = ['Inherited Billionaires', 'Self-Made Billionaires']

In [None]:
plt.figure(figsize=(12, 8))
bars = plt.bar(labels, data, color=['blue', 'orange'])

for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval + 10, yval, ha='center', va='bottom')

plt.title('Comparison of inherited vs self-made billionaires')
plt.ylabel('Number of billionaires')
plt.grid(axis='y', linestyle='-', alpha=0.7)

plt.show()

In [None]:
plt.pie(data, labels=labels, autopct="%1.1f%%")
plt.title("Ratio of inherited vs self-made billionaires")
plt.legend()
plt.show()

The two groups differ considerably in size: self-made billionaires clearly outnumber inherited ones.

#### FinalWorth average of inherited and self-made billionaires

- Because the two groups differ so much in size, I compare average worth rather than totals.
- Because the data set is large, I only use individuals with finalWorth > 10000.

In [None]:
df = df[df['finalWorth'] > 10000]
df

In [None]:
number_inherited = sum(df['selfMade'] == 0)
number_self_made = sum(df['selfMade'] == 1)

sum_inherited_worth = sum(df[df['selfMade'] == 0]['finalWorth'])
sum_self_made = sum(df[df['selfMade'] == 1]['finalWorth'])

average_inherited = sum_inherited_worth / number_inherited
average_self_made = sum_self_made / number_self_made

print(f"Average worth of inherited billionaires: {average_inherited}")
print(f"Average worth self-made billionaires: {average_self_made}")

In [None]:
data = [sum_inherited_worth, sum_self_made]
labels = ['Inherited Billionaires', 'Self-Made Billionaires']

In [None]:
plt.pie(data, labels=labels, autopct="%1.1f%%")
plt.title("Share of total worth: inherited vs self-made billionaires")
plt.legend()
plt.show()

In [None]:
data = [average_inherited, average_self_made]

plt.figure(figsize=(12, 8))
bars = plt.bar(labels, data, color=['blue', 'orange'])

for bar in bars:
    yval = bar.get_height()
    # Format the label so the average is not printed with a long decimal tail
    plt.text(bar.get_x() + bar.get_width()/2, yval + 10, f'{yval:,.0f}', ha='center', va='bottom')

plt.title('Average worth of inherited vs self-made billionaires')
plt.ylabel('Average finalWorth')
plt.grid(axis='y', linestyle='-', alpha=0.7)

plt.show()

After keeping only billionaires with finalWorth > 10000, we can see that the ratio between inherited and self-made billionaires remains unchanged. However, the average worth of inherited billionaires is slightly greater than that of self-made billionaires.
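
The counts, totals, and averages above can also be computed in one pass with `groupby`. The sketch below is a minimal illustration on a small hypothetical frame (the values are made up, not taken from the data set). It assumes the same `selfMade` and `finalWorth` column names and also reports the median, which is less sensitive to a few very large fortunes than the mean.

```python
import pandas as pd

# Hypothetical sample rows; values are illustrative only, not from the data set.
sample = pd.DataFrame({
    'selfMade': [0, 0, 1, 1, 1],
    'finalWorth': [12000, 30000, 11000, 15000, 100000],
})

# One row per group: number of people, total worth, mean and median worth.
summary = sample.groupby('selfMade')['finalWorth'].agg(['count', 'sum', 'mean', 'median'])
summary.index = summary.index.map({0: 'Inherited', 1: 'Self-made'})
print(summary)
```

If the mean and median of a group differ widely, the mean is probably being pulled up by a handful of individuals.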

#### Average finalWorth of countries for billionaires with finalWorth > 10000

In [None]:
avg_wealth_country = df[df['selfMade'] == 0].groupby('country')['finalWorth'].mean().sort_values(ascending=False)
plt.figure(figsize=(12, 8))
avg_wealth_country.plot(kind='bar')
plt.title('Average worth of inherited billionaires by country')
plt.xlabel('Country')
plt.ylabel('Average finalWorth (in millions of USD)')  # values above 10000 imply millions, not billions
plt.show()

In [None]:
avg_wealth_country = df[df['selfMade'] == 1].groupby('country')['finalWorth'].mean().sort_values(ascending=False)
plt.figure(figsize=(12, 8))
avg_wealth_country.plot(kind='bar')
plt.title('Average worth of self-made billionaires by country')
plt.xlabel('Country')
plt.ylabel('Average finalWorth (in millions of USD)')  # values above 10000 imply millions, not billions
plt.show()

**Conclusion**:
- The number of self-made billionaires is greater than the number of inherited billionaires.
- Even so, comparing average worth shows that inherited billionaires are not less wealthy, and in some cases hold more.
- Self-made and inherited billionaires are distributed differently across countries.
- The top 4 countries by average worth, among billionaires with finalWorth > 10,000:
    - Inherited billionaires: France, Canada, Belgium, Australia. This may reflect long-established wealthy families whose fortunes have been passed down from earlier eras.
    - Self-made billionaires: Mexico, Spain, France, United States. Notably, 6 of the top 7 billionaires in this data set are from the United States, but because the data set contains many US billionaires, the country's average drops to fourth place.
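
A ranking by country average can be driven by countries that have only one or two billionaires above the threshold. As a hedged sketch on made-up rows, the snippet below computes the mean together with the group size and keeps only countries that reach a minimum count. The `MIN_BILLIONAIRES` threshold is an assumption chosen purely for illustration.

```python
import pandas as pd

# Hypothetical rows; country names and values are illustrative only.
sample = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B', 'C', 'C'],
    'finalWorth': [11000, 12000, 13000, 90000, 20000, 40000],
})

MIN_BILLIONAIRES = 2  # assumed threshold, chosen only for this illustration

# Mean worth and number of billionaires per country.
per_country = sample.groupby('country')['finalWorth'].agg(['mean', 'count'])
# Keep only countries whose average rests on enough people.
reliable = per_country[per_country['count'] >= MIN_BILLIONAIRES]
ranking = reliable.sort_values('mean', ascending=False)
print(ranking)
```

Country `B` has the highest average in this toy data but is dropped because that average comes from a single person.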

## Question 2: Which age and gender account for the most billionaires?

### Age distribution of billionaires

In [None]:
plt.hist(cleaned_df['age'], bins=10, color='skyblue', edgecolor='black')

plt.title('Histogram of Age')
plt.xlabel('Age')
plt.ylabel('Number of billionaires')

plt.show()

- The largest number of billionaires is between 50 and 80 years old.
- The average age of billionaires tends to vary depending on factors such as industry and region. In many cases, billionaires are older individuals who have had time to accumulate wealth through successful careers, businesses, or investments. However, there are also younger billionaires, particularly in industries like technology and finance, where rapid wealth accumulation is possible.

### Gender distribution of billionaires

In [None]:
gender_counts = cleaned_df['gender'].value_counts()

plt.barh(gender_counts.index, gender_counts.values, color=['blue', 'pink'])

plt.title('Number of men and women who are billionaires')
# barh draws horizontal bars, so counts are on the x-axis and gender on the y-axis
plt.xlabel('Number of billionaires')
plt.ylabel('Gender')

plt.show()

- There are significantly more male billionaires than female billionaires.
- Factors contributing to the gender gap in billionaire status include systemic biases in various industries and sectors, as well as cultural and societal factors that have historically disadvantaged women in terms of access to education, capital, and opportunities for advancement.

### Distribution of the number of billionaires by age and gender

In [None]:
age_bins = [0, 30, 40, 50, 60, 70, float('inf')]  # bin edges for age
# With right=False each bin is [left, right), so the labels name left-closed ranges
age_labels = ['<30', '30-39', '40-49', '50-59', '60-69', '70+']
cleaned_df['Age_Group'] = pd.cut(cleaned_df['age'], bins=age_bins, labels=age_labels, right=False)


grouped_data = cleaned_df.groupby(['Age_Group', 'gender']).size().unstack(fill_value=0)
grouped_data.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.xlabel('Age')
plt.ylabel('Number of billionaires')
plt.title('Number of billionaires by age and gender')
plt.legend(title='Gender')
plt.xticks(rotation=0)
plt.show()

- The majority of billionaires tend to be older males. This demographic trend is a result of various factors, including historical inequalities, access to opportunities, and societal structures that have favored certain demographics.
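
How `pd.cut` treats an age that falls exactly on a bin edge depends on the `right` argument. A minimal sketch, using a few illustrative ages and the same bin edges, shows the two conventions side by side with labels that match each one.

```python
import pandas as pd

ages = pd.Series([29, 30, 40, 70])  # illustrative ages on or near bin edges
age_bins = [0, 30, 40, 50, 60, 70, float('inf')]

# right=False: each bin includes its left edge and excludes its right edge.
left_closed = pd.cut(ages, bins=age_bins, right=False,
                     labels=['<30', '30-39', '40-49', '50-59', '60-69', '70+'])
# Default right=True: each bin excludes its left edge and includes its right edge.
right_closed = pd.cut(ages, bins=age_bins,
                      labels=['0-30', '31-40', '41-50', '51-60', '61-70', '71+'])

print(pd.DataFrame({'age': ages, 'right=False': left_closed, 'right=True': right_closed}))
```

An age of exactly 30 falls in the second bin with `right=False` and in the first bin with the default, so the labels should be chosen to match whichever convention is used.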

### Top 10 countries with the highest number of billionaires by gender

In [None]:
# Count billionaires per country of citizenship and gender
grouped = cleaned_df.groupby(['countryOfCitizenship', 'gender']).size().unstack(fill_value=0)
# Total per country across genders, then the 10 largest
total_by_country = grouped.sum(axis=1)
top_10_countries = total_by_country.nlargest(10).index
# Selecting by index avoids modifying a slice in place and keeps the bars sorted by total
top_10_df = grouped.loc[top_10_countries]
top_10_df.plot(kind='bar', stacked=True)
plt.title('Top 10 countries with the highest number of billionaires by gender')
plt.xlabel('Country')
plt.ylabel('Number of billionaires')
plt.show()

- There is often a significant difference in the number of male and female billionaires in most countries, with the proportion of male billionaires usually higher.
- This gap may reflect disparities in rights and opportunities between men and women in fields such as business, technology, and finance. The influence of cultural, social, and political factors also plays a significant role.

### Conclusion
- Gender Disparity: There is a noticeable gender disparity among billionaires, with men typically outnumbering women. This gap may reflect systemic biases and disparities in opportunities between genders in various industries and regions.
- Intersectionality: The intersection of age and gender further complicates the analysis. For example, there may be differences in the age distribution of male and female billionaires, with men tending to be older on average when they achieve billionaire status compared to women.
- Cultural and Societal Factors: Cultural and societal factors also play a significant role in shaping the number of billionaires by age and gender. These factors influence access to education, capital, networks, and opportunities for wealth accumulation.

## Question 3: What is the geographical dispersion of billionaires worldwide?

**Count number of billionaires of each country**

In [None]:
country_df = cleaned_df.groupby(['country']).size().reset_index(name='count')
country_df

In [None]:
country_df.sort_values(by='count', ascending=False)

**Process the data and merge it to draw the world map**
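
Country names that do not match the map's naming exactly are silently left at zero in the left merge used for the map. One way to spot them, sketched here on small made-up frames (the names and counts are illustrative, not taken from the data), is an outer merge with `indicator=True`. Any `right_only` row is a candidate for `country_mapping`.

```python
import pandas as pd

# Hypothetical stand-ins for the map's country names and the billionaire counts.
map_names = pd.DataFrame({'name': ['United States of America', 'France', 'Czechia']})
counts = pd.DataFrame({
    'standard_country': ['United States of America', 'France', 'Czech Republic'],
    'count': [3, 2, 1],
})

# indicator=True adds a '_merge' column saying which side each row came from.
check = map_names.merge(counts, how='outer', left_on='name',
                        right_on='standard_country', indicator=True)
# Rows found only in the counts frame have no matching country on the map.
unmatched = check.loc[check['_merge'] == 'right_only', 'standard_country'].tolist()
print(unmatched)
```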

In [None]:
country_mapping = {
    'United States': 'United States of America',
}
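# Map dataset names to the names used by the world map; keep unmapped names as they are.
# Assigning the result avoids calling fillna(inplace=True) on a column selection.
country_df['standard_country'] = country_df['country'].map(country_mapping).fillna(country_df['country'])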

In [None]:
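# Note: gpd.datasets was removed in geopandas 1.0, so this call needs geopandas < 1.0
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world.merge(country_df, how='left', left_on='name', right_on='standard_country')
# Countries with no billionaires in the data get a count of 0
world['count'] = world['count'].fillna(0)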
world

In [None]:
fig, ax = plt.subplots(figsize=(15, 10))

divider = make_axes_locatable(ax)
plt.grid(which='both', color='gray', linewidth=0.1)
cax = divider.append_axes("bottom", size="5%", pad=0.4)

world.plot(column='count', cmap='viridis', legend=True, ax=ax, cax=cax,
                 legend_kwds={"label": "Number of billionaires", "orientation": "horizontal"})

plt.tight_layout()
plt.show()

- The world map shows that most billionaires are from the US, China, and India.
- What is the GDP of the countries that have many billionaires?
- Let's look at GDP for some insight.

In [None]:
gdp_countries_df = cleaned_df.groupby(['country'])['gdp_country'].mean()
top_10_gdp_countries = gdp_countries_df.nlargest(10).reset_index()  # Selecting top 10 countries by mean GDP

plt.figure(figsize=(10, 6))
sns.barplot(data=top_10_gdp_countries, x='country', y='gdp_country', palette='viridis')
plt.xlabel('Country')
plt.ylabel('Mean GDP')
plt.title('Top 10 Countries by Mean GDP')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to prevent cutting off labels
plt.show()


- The US and China also have the highest GDP.
- Several European countries are in the top 10 by GDP.

In [None]:
country_df = cleaned_df.groupby('country').agg(count=('gdp_country', 'size'), gdp=('gdp_country', 'mean')).reset_index()

print(country_df)

**Correlation between number of billionaires and GDP of each country**

In [None]:
correlation_matrix = country_df.drop(columns='country').corr()
correlation_matrix.loc['count', 'gdp']

- A correlation of 0.968 (out of a maximum of 1) is very high.
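
Pearson correlation can be dominated by a single extreme point, such as one country with far more billionaires than any other. As a hedged illustration on made-up numbers (not the data set), the sketch below compares Pearson with a rank-based (Spearman) correlation, computed here as the Pearson correlation of the ranks. Running the same comparison on `country_df` would show how much of the high value depends on the largest countries.

```python
import pandas as pd

# Hypothetical per-country figures; the last row mimics one very large outlier.
toy = pd.DataFrame({
    'count': [1, 2, 3, 5, 700],
    'gdp': [1.0, 0.5, 0.4, 0.2, 21.0],
})

# Pearson works on the raw values, so the outlier carries most of the weight.
pearson = toy['count'].corr(toy['gdp'])
# Spearman is the Pearson correlation of the ranks, so every country counts equally.
spearman = toy['count'].rank().corr(toy['gdp'].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

In this toy data the four small countries are ordered in opposite directions on the two variables, yet Pearson stays close to 1 because of the single outlier, while the rank correlation does not.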

- The United States has the largest number of billionaires. Which industry has the most billionaires there?

In [None]:
american_billionaires = cleaned_df[cleaned_df['country'] == 'United States']

industry_counts = american_billionaires['industries'].value_counts()

plt.figure(figsize=(10, 8))
plt.pie(industry_counts, labels=industry_counts.index, autopct='%1.1f%%', startangle=140)
plt.title('Distribution of Industries Among American Billionaires')
plt.axis('equal')  
plt.show()

- The finance & investments industry stands out as the leading sector, with the highest number of billionaires. This observation underscores the significant role of financial services, investment management, and wealth accumulation activities in the American economy.
- Technology emerges as the second most prominent industry, with a substantial number of billionaires. This reflects the impact of technological innovation, entrepreneurship, and the success of tech companies in generating wealth for individuals.
- While finance & investments and technology sectors lead in terms of billionaire count, there is a diverse range of industries represented among American billionaires. Industries such as food & beverage, fashion & retail, real estate, and media & entertainment also have notable billionaire presence, highlighting the diverse economic landscape of the United States.
- Industries such as energy, sports, and gambling & casinos have a notable presence among American billionaires, indicating opportunities for wealth creation in these sectors. The inclusion of emerging industries reflects the dynamic nature of the American economy and its ability to adapt to changing market trends and consumer preferences.

**Conclusion**

- The strong positive correlation suggests that countries with a higher GDP tend to have more billionaires. This indicates that economic activity and wealth generation in a country contribute significantly to the number of billionaires it produces.
- A high number of billionaires in a country can be seen as a reflection of its economic growth and entrepreneurial environment. Countries with robust economies and favorable business conditions are more likely to foster the creation of wealth and attract entrepreneurs.
- While a high GDP may lead to the creation of more billionaires, it's essential to consider the distribution of wealth within a country. A high concentration of billionaires may indicate disparities in wealth distribution, which can have social and economic implications.
- Both the finance & investments and technology sectors play crucial roles in job creation and economic development. The presence of billionaires in these sectors signifies their contribution to employment generation, technological innovation, and overall economic prosperity in the United States.