# Data Analysis with Python Using the Forbes 2022 Dataset

Hi everyone, welcome to [Tirendaz Academy](https://youtube.com/c/tirendazacademy) 😀
In this notebook, I'll walk through a data analysis of the Forbes 2022 billionaires list using Python and pandas.
Happy learning 🐱‍🏍 

## Loading Data

In [24]:
import pandas as pd

In [25]:
df = pd.read_csv("DataSets/forbes_2022_billionaires.csv")

In [26]:
df.head()

Unnamed: 0,rank,personName,age,finalWorth,year,month,category,source,country,state,...,organization,selfMade,gender,birthDate,title,philanthropyScore,residenceMsa,numberOfSiblings,bio,about
0,1,Elon Musk,50.0,219000.0,2022,4,Automotive,"Tesla, SpaceX",United States,Texas,...,Tesla,True,M,1971-06-28,CEO,1.0,,,Elon Musk is working to revolutionize transpor...,Musk was accepted to a graduate program at Sta...
1,2,Jeff Bezos,58.0,171000.0,2022,4,Technology,Amazon,United States,Washington,...,Amazon,True,M,1964-01-12,Entrepreneur,1.0,"Seattle-Tacoma-Bellevue, WA",,Jeff Bezos founded e-commerce giant Amazon in ...,"Growing up, Jeff Bezos worked summers on his g..."
2,3,Bernard Arnault & family,73.0,158000.0,2022,4,Fashion & Retail,LVMH,France,,...,LVMH Moët Hennessy Louis Vuitton,False,M,1949-03-05,Chairman and CEO,,,,Bernard Arnault oversees the LVMH empire of so...,"Arnault apparently wooed his wife, Helene Merc..."
3,4,Bill Gates,66.0,129000.0,2022,4,Technology,Microsoft,United States,Washington,...,Bill & Melinda Gates Foundation,True,M,1955-10-28,Cofounder,4.0,"Seattle-Tacoma-Bellevue, WA",,Bill Gates turned his fortune from software fi...,"When Gates was a kid, he spent so much time re..."
4,5,Warren Buffett,91.0,118000.0,2022,4,Finance & Investments,Berkshire Hathaway,United States,Nebraska,...,Berkshire Hathaway,True,M,1930-08-30,CEO,5.0,"Omaha, NE",,"Known as the ""Oracle of Omaha,"" Warren Buffett...","Buffett still lives in the same Omaha, Nebrask..."


In [27]:
df.shape

(2668, 22)

In [28]:
df.columns

Index(['rank', 'personName', 'age', 'finalWorth', 'year', 'month', 'category',
       'source', 'country', 'state', 'city', 'countryOfCitizenship',
       'organization', 'selfMade', 'gender', 'birthDate', 'title',
       'philanthropyScore', 'residenceMsa', 'numberOfSiblings', 'bio',
       'about'],
      dtype='object')

## Data Preprocessing

In [29]:
df = df.loc[:,["rank","personName","age","finalWorth","category","country","gender"]] #selecting columns

In [30]:
df.head()

Unnamed: 0,rank,personName,age,finalWorth,category,country,gender
0,1,Elon Musk,50.0,219000.0,Automotive,United States,M
1,2,Jeff Bezos,58.0,171000.0,Technology,United States,M
2,3,Bernard Arnault & family,73.0,158000.0,Fashion & Retail,France,M
3,4,Bill Gates,66.0,129000.0,Technology,United States,M
4,5,Warren Buffett,91.0,118000.0,Finance & Investments,United States,M


In [31]:
df=df.rename(columns={"rank":"Sıra","personName":"İsim","age":"Yaş",
                      "finalWorth":"Servet","category":"Kategori",
                       "country":"Ülke", "gender":"Cinsiyet"}) #changing column names

In [32]:
df.head()

Unnamed: 0,Sıra,İsim,Yaş,Servet,Kategori,Ülke,Cinsiyet
0,1,Elon Musk,50.0,219000.0,Automotive,United States,M
1,2,Jeff Bezos,58.0,171000.0,Technology,United States,M
2,3,Bernard Arnault & family,73.0,158000.0,Fashion & Retail,France,M
3,4,Bill Gates,66.0,129000.0,Technology,United States,M
4,5,Warren Buffett,91.0,118000.0,Finance & Investments,United States,M


In [33]:
df = df.set_index("Sıra") #setting index

In [34]:
df.head()

Sıra,İsim,Yaş,Servet,Kategori,Ülke,Cinsiyet
1,Elon Musk,50.0,219000.0,Automotive,United States,M
2,Jeff Bezos,58.0,171000.0,Technology,United States,M
3,Bernard Arnault & family,73.0,158000.0,Fashion & Retail,France,M
4,Bill Gates,66.0,129000.0,Technology,United States,M
5,Warren Buffett,91.0,118000.0,Finance & Investments,United States,M


In [35]:
df.dtypes

İsim         object
Yaş         float64
Servet      float64
Kategori     object
Ülke         object
Cinsiyet     object
dtype: object

In [36]:
df.isnull().sum() #checking for missing values

İsim         0
Yaş         86
Servet       0
Kategori     0
Ülke        13
Cinsiyet    16
dtype: int64

In [37]:
df.dropna(inplace=True) #dropping missing values

In [38]:
df.shape

(2568, 6)
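
The per-column counts above add up to 115 missing values, yet only 100 rows were dropped (2668 to 2568). That happens because one row can be missing several values at once. Here is a minimal sketch on made-up data (the `toy` frame and its values are hypothetical, only the column names come from this notebook) that shows the difference between counting missing cells and counting rows that `dropna()` removes:

```python
import pandas as pd

# Hypothetical toy data: row 1 is missing two values at once.
toy = pd.DataFrame({
    "Yaş": [50.0, None, 73.0, None],
    "Ülke": ["United States", None, "France", "India"],
    "Cinsiyet": ["M", "M", None, "F"],
})

# Total number of missing cells, summed over all columns.
missing_cells = int(toy.isnull().sum().sum())

# Number of rows with at least one missing value (what dropna() removes).
rows_with_missing = int(toy.isnull().any(axis=1).sum())

cleaned = toy.dropna()
print(missing_cells, rows_with_missing, len(cleaned))
```

If you only care about some columns, `dropna(subset=[...])` limits the check to those columns.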

## Information about the gender of the world's richest

In [39]:
df["Cinsiyet"].value_counts() # number of billionaires per gender

Cinsiyet
M    2282
F     286
Name: count, dtype: int64

In [40]:
df["Cinsiyet"].value_counts(normalize=True) # share of each gender

Cinsiyet
M    0.888629
F    0.111371
Name: proportion, dtype: float64

In [41]:
df[df["Ülke"]=="Turkey"].Cinsiyet.value_counts(normalize=True)

Cinsiyet
M    0.826087
F    0.173913
Name: proportion, dtype: float64

In [42]:
df["Ülke"].unique()

array(['United States', 'France', 'India', 'Mexico', 'China', 'Singapore',
       'Spain', 'Canada', 'Germany', 'Switzerland', 'Belgium',
       'Hong Kong', 'United Kingdom', 'Australia', 'Austria', 'Italy',
       'Japan', 'Bahamas', 'Indonesia', 'Chile', 'Russia', 'Sweden',
       'Czechia', 'Monaco', 'United Arab Emirates', 'Nigeria', 'Denmark',
       'Thailand', 'Malaysia', 'Brazil', 'Colombia', 'New Zealand',
       'South Korea', 'South Africa', 'Philippines', 'Egypt', 'Taiwan',
       'Israel', 'Vietnam', 'Poland', 'Norway', 'Cayman Islands',
       'Netherlands', 'Eswatini (Swaziland)', 'Peru', 'Algeria',
       'Kazakhstan', 'Georgia', 'Portugal', 'British Virgin Islands',
       'Turkey', 'Finland', 'Ukraine', 'Ireland', 'Bermuda', 'Lebanon',
       'Argentina', 'Cambodia', 'Oman', 'Guernsey', 'Liechtenstein',
       'Turks and Caicos Islands', 'Qatar', 'Morocco', 'Uruguay',
       'Slovakia', 'Romania', 'Nepal', 'Tanzania', 'Bahrain', 'Greece',
       'Hungary', 'Andorra']

In [43]:
df[df["Ülke"]=="Canada"].Cinsiyet.value_counts(normalize=True)

Cinsiyet
M    0.952381
F    0.047619
Name: proportion, dtype: float64
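
Filtering one country at a time works, but you can also get the gender shares for every country in one table. This is just a sketch with `pd.crosstab` on hypothetical sample rows (the `toy` frame is made up, the column names match this notebook):

```python
import pandas as pd

# Hypothetical sample rows using the notebook's column names.
toy = pd.DataFrame({
    "Ülke": ["Turkey", "Turkey", "Turkey", "Turkey", "Canada", "Canada"],
    "Cinsiyet": ["M", "M", "M", "F", "M", "M"],
})

# normalize="index" makes each row (country) sum to 1,
# so every cell is the share of that gender within the country.
gender_share = pd.crosstab(toy["Ülke"], toy["Cinsiyet"], normalize="index")
print(gender_share)
```

On the real data you would pass `df["Ülke"]` and `df["Cinsiyet"]` instead of the toy columns.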

In [44]:
df_cinsiyet = df.groupby(["Cinsiyet"])

In [45]:
df_cinsiyet["Yaş"].mean()

Cinsiyet
F    62.937063
M    64.409290
Name: Yaş, dtype: float64
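
The mean alone can hide a lot, so it may help to look at a few statistics side by side. A small sketch, assuming the same column names and using made-up ages (`toy` and `yas_ozet` are hypothetical names):

```python
import pandas as pd

# Hypothetical ages, only to illustrate the call.
toy = pd.DataFrame({
    "Cinsiyet": ["F", "F", "M", "M", "M"],
    "Yaş": [60.0, 70.0, 50.0, 65.0, 80.0],
})

# agg() with a list of function names returns one column per statistic.
yas_ozet = toy.groupby("Cinsiyet")["Yaş"].agg(["mean", "median", "count"])
print(yas_ozet)
```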

In [46]:
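import warnings

import seaborn as sns  # requires seaborn to be installed (pip install seaborn)

sns.set_theme(rc={"figure.dpi": 300})  # default theme with high-resolution figures
warnings.filterwarnings("ignore")  # hide library warnings in the notebook output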

ModuleNotFoundError: No module named 'seaborn'
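
The error above means seaborn is not installed in the environment that runs this notebook, which is also why the plotting cells below have no output. One way to check before importing is sketched here; `is_installed` is a hypothetical helper name, not part of any library:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("seaborn"):
    # In a Jupyter cell, `%pip install seaborn` installs into the running kernel.
    print("seaborn is missing; install it with: pip install seaborn")
```

After installing, rerun the import cell and then the plotting cells.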

In [None]:
df_cinsiyet.size().plot(kind = "bar")

## Who are the top 10 richest in the world?

In [None]:
sns.barplot(y = df["İsim"][:10], x = df["Servet"][:10])

## Which country has the highest number of billionaires?

In [None]:
len(df["Ülke"].unique())

In [28]:
df_ulke = df.groupby("Ülke")

In [29]:
df_ulke_sayi = df_ulke.size().sort_values(ascending=False).to_frame(name="Sayı") # billionaires per country, largest first

In [None]:
df_ulke_sayi.head()

In [None]:
sns.barplot(x = df_ulke_sayi["Sayı"][:10] , y = df_ulke_sayi.index[:10])
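
As a side note, the group, count, and sort steps can also be done in one call with `value_counts()`. A minimal sketch on hypothetical rows (the `toy` frame is made up, and `ulke_sayi` is just an illustrative name):

```python
import pandas as pd

# Hypothetical country values, only to illustrate the call.
toy = pd.DataFrame({
    "Ülke": ["United States", "China", "United States",
             "India", "China", "United States"],
})

# value_counts() counts each country and sorts in descending order;
# to_frame(name=...) turns the result into a one-column DataFrame.
ulke_sayi = toy["Ülke"].value_counts().to_frame(name="Sayı")
print(ulke_sayi.head())
```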

## Who are the top 10 richest in Turkey?

In [35]:
df_turkiye = df[df["Ülke"]=="Turkey"]

In [None]:
df_turkiye["İsim"].count()

In [None]:
df_turkiye.head(10)

In [None]:
sns.barplot(y=df_turkiye["İsim"][:10], x = df_turkiye["Servet"][:10])

## Which industry has the most billionaires?

In [None]:
df["Kategori"].unique()

In [40]:
df["Kategori"] = df["Kategori"].str.replace(" ", "", regex=False).\
    str.replace("&", "_", regex=False) # e.g. "Fashion & Retail" -> "Fashion_Retail"

In [None]:
df["Kategori"].unique()

In [42]:
df_kategori = df.groupby("Kategori").size()

In [None]:
df_kategori.head()

In [44]:
df_kategori = df_kategori.to_frame()

In [None]:
df_kategori.head()

In [46]:
df_kategori = df_kategori.rename(columns={0:"Sayi"}).sort_values(by="Sayi",
                                                                ascending=False)

In [None]:
df_kategori.head()

In [None]:
sns.barplot( x = df_kategori["Sayi"][:10], y = df_kategori.index[:10])

## Is there a relationship between money and age?

In [None]:
sns.scatterplot(x=df["Yaş"], y=df["Servet"]) # recent seaborn versions require x and y as keyword arguments
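
A scatter plot shows the pattern visually; a correlation coefficient puts a number on it. Below is a rough sketch with made-up values (the `toy` frame is hypothetical). Pearson measures linear association, while the rank-based version (Spearman, computed here as Pearson on the ranks) is less sensitive to a few extremely large fortunes:

```python
import pandas as pd

# Hypothetical ages and net worths, only to illustrate the calls.
toy = pd.DataFrame({
    "Yaş": [30.0, 45.0, 60.0, 75.0, 90.0],
    "Servet": [5000.0, 1200.0, 90000.0, 3000.0, 40000.0],
})

# Pearson correlation: strength of the linear relationship.
pearson = toy["Yaş"].corr(toy["Servet"])

# Spearman correlation: Pearson correlation applied to the ranks.
spearman = toy["Yaş"].rank().corr(toy["Servet"].rank())

print(round(pearson, 3), round(spearman, 3))
```

Values near 0 suggest little relationship, values near 1 or -1 suggest a strong one.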

## The distribution of age

In [None]:
sns.histplot(df["Yaş"])

## The youngest billionaires

In [None]:
df_yas = df.sort_values(by="Yaş")
df_yas

In [None]:
sns.barplot(y=df_yas["İsim"][:10], x = df_yas["Yaş"][:10])
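
If you only need the youngest few rows, `nsmallest` avoids sorting the whole table first. A small sketch on hypothetical data (names and ages are made up):

```python
import pandas as pd

# Hypothetical names and ages, only to illustrate the call.
toy = pd.DataFrame({
    "İsim": ["A", "B", "C", "D"],
    "Yaş": [64.0, 19.0, 25.0, 41.0],
})

# nsmallest(n, column) returns the n rows with the lowest values, in ascending order.
youngest = toy.nsmallest(2, "Yaş")
print(youngest)
```

The same idea works for the richest lists with `nlargest(10, "Servet")`.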

Don't forget to follow us on [YouTube](http://youtube.com/tirendazacademy) | [Medium](http://tirendazacademy.medium.com) | [Twitter](http://twitter.com/tirendazacademy) | [GitHub](http://github.com/tirendazacademy) | [Linkedin](https://www.linkedin.com/in/tirendaz-academy) | [Kaggle](https://www.kaggle.com/tirendazacademy) 😎