# The Billionaire’s Data-Driven Will (Beginner Visualization Project)

## Problem Statement

A billionaire with a global business empire wants his legacy to be guided by evidence, not sentiment. As he prepares his will, he refuses to rely on family politics or gut instinct. For years he has tracked detailed data on his nuclear and extended family, as well as on his most loyal, long-serving employees, capturing factors such as financial responsibility, past contributions, loyalty, and leadership potential.

The challenge is to analyze this information and build a transparent, defensible framework for distributing his wealth and appointing key stewards of his assets. The analysis must highlight patterns, rank potential heirs and managers, and provide clear visual evidence that supports each recommendation. The goal is to move from an emotional, subjective process to a quantitative decision, ensuring his fortune is managed responsibly and his legacy preserved worldwide.


## Datasets & Column Meanings

### `people.csv` — Observations about potential heirs and trusted employees
| Column | Meaning |
|---|---|
| **person_id** | Unique identifier for each person. |
| **name** | First and last name. |
| **group** | Relationship to the billionaire: `nuclear_family`, `extended_family`, or `employee`. |
| **age** | Age in years. |
| **country** | Country of residence. |
| **financial_responsibility** | Score (0–100) for how reliably they handle money. |
| **loyalty_score** | Score (0–100) for dedication and trustworthiness. |
| **leadership_potential** | Score (0–100) estimating ability to run companies or major assets. |
| **philanthropy_alignment** | Score (0–100) for how well their values match the billionaire’s giving goals. |
| **legal_flags** | 0 = clean record, 1 = some legal issue present. |
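
Before analysis, it may help to confirm that the data actually respects the ranges and labels described in the table above. The following is a minimal sketch, not part of the original notebook: the helper name `find_schema_violations` and the sample rows are hypothetical and made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration (not taken from people.csv).
sample_people = pd.DataFrame({
    "person_id": [1, 2, 3],
    "name": ["Person A", "Person B", "Person C"],
    "group": ["nuclear_family", "employee", "cousin"],  # "cousin" is not an allowed label
    "age": [45, 38, 52],
    "country": ["Country X", "Country Y", "Country Z"],
    "financial_responsibility": [82, 105, 64],           # 105 is outside 0-100
    "loyalty_score": [90, 71, 55],
    "leadership_potential": [77, 68, 49],
    "philanthropy_alignment": [60, 88, 73],
    "legal_flags": [0, 1, 2],                             # 2 is not a valid flag
})

SCORE_COLUMNS = [
    "financial_responsibility",
    "loyalty_score",
    "leadership_potential",
    "philanthropy_alignment",
]
ALLOWED_GROUPS = {"nuclear_family", "extended_family", "employee"}


def find_schema_violations(df):
    """Return a dict mapping each broken rule to the person_ids that break it."""
    violations = {}
    for col in SCORE_COLUMNS:
        # between() is inclusive on both ends; missing values also count as violations.
        bad_ids = df.loc[~df[col].between(0, 100), "person_id"].tolist()
        if bad_ids:
            violations[f"{col} outside 0-100"] = bad_ids
    bad_group = df.loc[~df["group"].isin(ALLOWED_GROUPS), "person_id"].tolist()
    if bad_group:
        violations["unknown group"] = bad_group
    bad_flag = df.loc[~df["legal_flags"].isin([0, 1]), "person_id"].tolist()
    if bad_flag:
        violations["legal_flags not 0/1"] = bad_flag
    duplicate_ids = df.loc[df["person_id"].duplicated(), "person_id"].tolist()
    if duplicate_ids:
        violations["duplicate person_id"] = duplicate_ids
    return violations


violations = find_schema_violations(sample_people)
print(violations)
```

On the real data, the same helper could be called as `find_schema_violations(people_df)` once the file is loaded; an empty dict would mean no rule was broken.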

### `assets_updated.csv` — The billionaire’s portfolio of major assets
| Column | Meaning |
|---|---|
| **asset_id** | Unique identifier for each asset. |
| **asset_name** | Descriptive name of the asset. |
| **asset_type** | Category of the asset: `cash`, `stocks`, `real_estate`, `business_unit`, or `art_collection`. |
| **region** | Global region of the asset: `North America`, `Europe`, `Africa`, `Asia`, or `Latin America`. |
| **current_value_million_usd** | Estimated value **in millions of US dollars**. |
| **risk_level** | Overall investment risk: `Low`, `Medium`, or `High`. |
| **liquidity** | How easily the asset can be converted to cash: `High`, `Medium`, or `Low`. |
| **mgmt_expertise_required** | 1 if the asset needs active management; 0 if it can be passively held. |
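
Because `risk_level` and `liquidity` have a natural order (`Low` < `Medium` < `High`), one option is to store them as ordered categoricals so that sorting and comparisons follow that order rather than the alphabet. This is a small sketch under that assumption; the sample rows are hypothetical and not taken from `assets_updated.csv`.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
sample_assets = pd.DataFrame({
    "asset_name": ["Asset A", "Asset B", "Asset C", "Asset D"],
    "risk_level": ["High", "Low", "Medium", "Low"],
    "liquidity": ["Low", "High", "Medium", "Medium"],
})

# One shared ordered dtype for both columns.
level_dtype = pd.CategoricalDtype(categories=["Low", "Medium", "High"], ordered=True)
for col in ["risk_level", "liquidity"]:
    sample_assets[col] = sample_assets[col].astype(level_dtype)

# Sorting now follows Low < Medium < High (a stable sort keeps ties in input order).
sorted_assets = sample_assets.sort_values("risk_level", kind="stable")

# Ordered categoricals also support comparisons against a category label.
at_least_medium_risk = sample_assets[sample_assets["risk_level"] >= "Medium"]

print(sorted_assets)
print(at_least_medium_risk)
```

Without the ordered dtype, a plain string sort would place `High` before `Low`, which is rarely what a risk ranking needs.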

## Load Libraries and Data Set

In [None]:
import pandas as pd # data manipulation
import numpy as np # mathematical calculations
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization

In [None]:
people_df = pd.read_csv('people.csv')
asset_df = pd.read_csv('assets_updated.csv')

In [None]:
print()
display(people_df.head())
print()
display(asset_df.head())

### Quickly explore the data

In [None]:
people_df.shape

In [None]:
asset_df.shape

In [None]:
people_df.isnull().sum()

In [None]:
asset_df.isnull().sum()

In [None]:
asset_df.isna().sum() # isna() is an alias of isnull(), so this matches the previous cell

In [None]:
people_df.info()

In [None]:
asset_df.info()

In [None]:
people_df.duplicated().sum()

In [None]:
asset_df.duplicated().sum()

To display more information about a function in Jupyter, place the cursor inside the call and press `Shift` + `Tab`.

In [None]:
people_df.select_dtypes(exclude='number') 

In [None]:
people_df.describe()

In [None]:
people_df.describe().T

In [None]:
asset_df.describe()

In [None]:
asset_df.describe().T
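
`describe()` summarizes everyone at once. Since the problem compares nuclear family, extended family, and employees, a per-group summary may be more informative. The sketch below shows one way to do that with `groupby` and `agg`; the sample rows are hypothetical, and only two of the score columns are included to keep it short.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
sample_people = pd.DataFrame({
    "group": ["nuclear_family", "nuclear_family", "extended_family", "employee", "employee"],
    "financial_responsibility": [80, 60, 70, 90, 70],
    "loyalty_score": [90, 70, 50, 85, 95],
})

score_columns = ["financial_responsibility", "loyalty_score"]

# Mean, minimum, and maximum of each score within each group.
group_summary = sample_people.groupby("group")[score_columns].agg(["mean", "min", "max"])
print(group_summary)
```

The result has two-level column labels, so a single cell is read as `group_summary.loc["employee", ("loyalty_score", "mean")]`.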

## Let's Explore More and Visualize

### People Data

In [None]:
people_df['age']

In [None]:
plt.figure(figsize=(7, 5))
plt.hist(people_df['age'], bins=7, edgecolor='black')
plt.title("Age Distribution")
plt.ylabel("Count")
plt.xlabel("Age")
plt.show()
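
The histogram picks its bin edges automatically. If named age bands are easier to discuss, `pd.cut` can assign each person to a band explicitly. This is only a sketch: the band edges and labels are assumptions chosen for illustration, and the ages are hypothetical.

```python
import pandas as pd

# Hypothetical ages, invented for illustration only.
ages = pd.Series([23, 35, 41, 47, 58, 62, 71], name="age")

# Assumed band edges; right=False makes each band include its left edge,
# e.g. [30, 45) holds ages 30 through 44.
bin_edges = [0, 30, 45, 60, 120]
band_labels = ["under 30", "30-44", "45-59", "60+"]

age_band = pd.cut(ages, bins=bin_edges, labels=band_labels, right=False)

# sort_index() keeps the bands in their defined order instead of by count.
band_counts = age_band.value_counts().sort_index()
print(band_counts)
```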

### People By Group

In [None]:
people_df['group'].value_counts()

In [None]:
sns.countplot(x='group', data=people_df)
plt.show()

In [None]:
people_df['country'].value_counts().index

In [None]:
order = people_df['country'].value_counts().index

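sns.countplot(x='country', data=people_df, order=order) # bars ordered from most to least common
plt.xticks(rotation=45) # tilt the country names so they do not overlap
plt.show()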

In [None]:
people_df.columns

In [None]:
people_df[['financial_responsibility', 'loyalty_score', 'leadership_potential','philanthropy_alignment']]

In [None]:
corr_columns = ['financial_responsibility', 'loyalty_score', 'leadership_potential','philanthropy_alignment']
corr_df = people_df[corr_columns].corr()
corr_df

In [None]:
sns.heatmap(corr_df, annot=True) # annot=True prints each coefficient inside its cell
plt.show()
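
A heatmap shows every pair twice, plus the diagonal of ones. To read off the strongest relationships directly, the unique pairs can be listed and sorted by absolute correlation. The following is a rough sketch with hypothetical scores, not results from the real data.

```python
from itertools import combinations

import pandas as pd

# Hypothetical scores, invented for illustration only.
sample_scores = pd.DataFrame({
    "financial_responsibility": [50, 60, 70, 80, 90],
    "loyalty_score": [55, 65, 75, 85, 95],          # moves exactly with the first column
    "leadership_potential": [90, 80, 70, 60, 50],   # moves exactly against it
    "philanthropy_alignment": [60, 80, 55, 90, 70],
})

corr = sample_scores.corr()

# combinations() yields each unordered pair of columns exactly once.
pair_rows = [
    {"column_a": a, "column_b": b, "correlation": corr.loc[a, b]}
    for a, b in combinations(corr.columns, 2)
]
pair_df = pd.DataFrame(pair_rows)

# Rank by strength of relationship, regardless of sign.
ranked_pairs = pair_df.sort_values(
    "correlation", key=lambda s: s.abs(), ascending=False
).reset_index(drop=True)
print(ranked_pairs)
```

With four score columns there are six unique pairs, so the table stays short enough to read at a glance.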

In [None]:
people_df.head()

In [None]:
plt.scatter(x='age', y='financial_responsibility', data=people_df)
plt.xlabel("Age")
plt.ylabel("Financial Responsibility")
plt.title("Age vs. Financial Responsibility")
plt.show()

### Assets Data

In [None]:
asset_df.head()

In [None]:
asset_type = asset_df.groupby(['asset_type'])['current_value_million_usd'].sum().sort_values(ascending=False)

In [None]:
asset_type

In [None]:
asset_type.index

In [None]:
asset_type.values

In [None]:
sns.barplot(x=asset_type.index, y=asset_type.values)
plt.show()
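
Totals in millions of USD can be hard to compare at a glance, so it may help to express each asset type as a share of the whole portfolio. This sketch repeats the same `groupby` and `sum` pattern on hypothetical values, then converts the totals to percentages.

```python
import pandas as pd

# Hypothetical values, invented for illustration only.
sample_assets = pd.DataFrame({
    "asset_type": ["stocks", "real_estate", "stocks", "cash", "business_unit"],
    "current_value_million_usd": [300.0, 200.0, 100.0, 50.0, 350.0],
})

# Total value per asset type, largest first.
value_by_type = (
    sample_assets.groupby("asset_type")["current_value_million_usd"]
    .sum()
    .sort_values(ascending=False)
)

# Each type's share of the total portfolio value, in percent.
share_pct = (value_by_type / value_by_type.sum() * 100).round(1)
print(share_pct)
```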

In [None]:
people_df[people_df['group'] == 'nuclear_family'][['name', 'group']]

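Candidate screening criteria to consider. Note that the filter at the end of this section uses looser cutoffs (age >= 40 and every score >= 60) and also requires `legal_flags == 0`:

- age >= 60
- financial_responsibility >= 80
- loyalty_score > 70
- leadership_potential >= 70
- philanthropy_alignment (no cutoff specified)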
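
One way to keep such cutoffs easy to adjust is to store them as data and build the filter in a loop. The sketch below applies the four cutoffs listed above that have a threshold; `philanthropy_alignment` is left out because no cutoff is given for it. The sample rows and the `criteria` structure are hypothetical.

```python
import operator

import pandas as pd

# Hypothetical rows, invented for illustration only.
sample_people = pd.DataFrame({
    "name": ["Person A", "Person B", "Person C"],
    "age": [64, 61, 59],
    "financial_responsibility": [85, 80, 95],
    "loyalty_score": [75, 70, 90],
    "leadership_potential": [70, 88, 92],
})

# Each entry: (column, comparison function, threshold).
criteria = [
    ("age", operator.ge, 60),
    ("financial_responsibility", operator.ge, 80),
    ("loyalty_score", operator.gt, 70),   # strictly greater than 70
    ("leadership_potential", operator.ge, 70),
]

# Start with everyone included, then narrow down one criterion at a time.
mask = pd.Series(True, index=sample_people.index)
for column, compare, threshold in criteria:
    mask &= compare(sample_people[column], threshold)

shortlist = sample_people[mask]
print(shortlist)
```

Here Person B is dropped because a loyalty score of exactly 70 does not satisfy `> 70`, and Person C is dropped on age. Changing a cutoff means editing one tuple rather than rewriting the boolean expression.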

In [None]:
people_df

In [None]:
people_df[
    (people_df['age'] >= 40) &
    (people_df['financial_responsibility'] >= 60) &
    (people_df['loyalty_score'] >= 60) &
    (people_df['leadership_potential'] >= 60) &
    (people_df['philanthropy_alignment'] >= 60) &
    (people_df['legal_flags'] == 0)
]
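
A filter only says who passes; the problem statement also asks for a ranking. One simple option is a weighted average of the four scores. The sketch below is an assumption rather than the notebook's method: the equal weights are placeholders, the column name `composite_score` is hypothetical, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
sample_people = pd.DataFrame({
    "name": ["Person A", "Person B", "Person C", "Person D"],
    "financial_responsibility": [80, 90, 60, 70],
    "loyalty_score": [70, 80, 90, 60],
    "leadership_potential": [90, 70, 60, 80],
    "philanthropy_alignment": [60, 80, 70, 90],
    "legal_flags": [0, 0, 0, 1],
})

# Assumed equal weights; they sum to 1 so the result stays on the 0-100 scale.
weights = {
    "financial_responsibility": 0.25,
    "loyalty_score": 0.25,
    "leadership_potential": 0.25,
    "philanthropy_alignment": 0.25,
}

# Exclude anyone with a legal flag, mirroring the filter above.
eligible = sample_people[sample_people["legal_flags"] == 0].copy()

# Weighted sum of the score columns.
eligible["composite_score"] = sum(eligible[col] * w for col, w in weights.items())

ranking = eligible.sort_values("composite_score", ascending=False)[["name", "composite_score"]]
print(ranking)
```

Different weights would express different priorities, for example weighting `leadership_potential` more heavily when choosing who should run a `business_unit`.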