## 0) Setup

Import pandas **with the correct name** and set `matplotlib` to always display graphics in the notebook.

In [None]:
import pandas as pd
%matplotlib inline

## 1) Reading in an Excel file

Use pandas to read in the `richpeople.xlsx` Excel file, saving it as a variable with the "correct" name. You will use `read_excel` instead of `read_csv`, *but you'll also need to install a new library*.
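
The text above does not name the extra library. For `.xlsx` files in recent pandas versions it is most likely `openpyxl`; that is an assumption, not something the assignment states. A minimal sketch for checking whether it is available before calling `read_excel`:

```python
import importlib.util

# read_excel hands .xlsx parsing to a separate engine package.
# openpyxl is assumed here; it is the usual engine in recent pandas.
has_openpyxl = importlib.util.find_spec("openpyxl") is not None

if has_openpyxl:
    print("openpyxl is installed; pd.read_excel can open .xlsx files")
else:
    # In a notebook cell the install would typically be: %pip install openpyxl
    print("openpyxl is missing; install it before calling pd.read_excel")
```

If the check reports the package as missing, install it and restart the kernel before re-running the read cell.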

In [None]:
df = pd.read_excel("xxxxx/data/richpeople.xlsx")

## 2) Checking your data

Display the number of rows and columns in your data. Also display the names and data types of each column.

In [None]:
df.shape

In [None]:
df.dtypes

In [None]:
df.columns

In [None]:
df.typeofwealth

In [None]:
df.sourceofwealth

## 3) Who are the top 10 richest billionaires? Use the `networthusbillion` column.

In [None]:
# Sort descending so the largest net worths come first
df.sort_values(by='networthusbillion', ascending=False)[:10]

## 4) How many male billionaires are there compared to the number of female billionaires? Do they have a different average wealth?

> **TIP:** The second part uses `groupby`, but the first part does not.
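
To make the difference in the tip concrete, here is a small sketch on a made-up four-row table (the `toy` frame and its values are hypothetical, not taken from `richpeople.xlsx`). Counting rows per category needs only `value_counts`, while averaging a second column per category needs `groupby`:

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    "gender": ["male", "male", "female", "male"],
    "networthusbillion": [1.0, 3.0, 2.0, 5.0],
})

# How many rows fall in each category: no groupby needed
counts = toy["gender"].value_counts()

# Average of another column within each category: this is where groupby helps
means = toy.groupby("gender")["networthusbillion"].mean()

print(counts)
print(means)
```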

In [None]:
df['gender'].value_counts()

In [None]:
df.groupby('gender')['networthusbillion'].mean()

In [None]:
male = df[df['gender'] == 'male']
female = df[df['gender'] == 'female']

In [None]:
male['networthusbillion'].mean() - female['networthusbillion'].mean()

## 5) Who is the poorest billionaire? Who are the top 10 poorest billionaires?

In [None]:
df.sort_values(by='networthusbillion')[:1]

In [None]:
df.sort_values(by='networthusbillion')[:10]

## 6) What is the most common source of wealth? Is it different between males and females?

> **TIP:** You know how to `groupby` and you know how to count how many times a value is in a column. Can you put them together???
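
One way to combine the two ideas from the tip is to group by one column and call `value_counts` on another. The sketch below uses a hypothetical `toy` frame with placeholder category labels, so the numbers mean nothing for the real data:

```python
import pandas as pd

# Hypothetical toy data; the category labels are placeholders
toy = pd.DataFrame({
    "gender": ["male", "male", "male", "female", "female", "female"],
    "typeofwealth": ["inherited", "founder", "founder",
                     "inherited", "inherited", "founder"],
})

# Count each value of typeofwealth separately within each gender
counts = toy.groupby("gender")["typeofwealth"].value_counts()

# Reduce each group to its single most frequent value
top = toy.groupby("gender")["typeofwealth"].agg(
    lambda s: s.value_counts().idxmax()
)

print(counts)
print(top)
```

Grouping by `gender` first keeps each group's counts together, which makes the per-gender comparison easier to read than one long sorted list.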

In [None]:
df['typeofwealth'].value_counts()[:1]

In [None]:
# Group by gender first so each gender's most common type is listed at the top of its group
df.groupby('gender')['typeofwealth'].value_counts()

## 7) What companies have the most billionaires? Graph the top 5 as a horizontal bar graph.

> **TIP:** First find the answer to the question, then just try to throw `.plot()` on the end
>
> **TIP:** You can use `.head()` on *anything*, not just your basic `df`
>
> **TIP:** You might feel like you should use `groupby`, but don't! There's an easier way to count.
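
The tips above can be sketched on a hypothetical `toy` frame: count with `value_counts`, trim with `.head()`, then add `.plot()` at the end. The `matplotlib.use("Agg")` line is an assumption that lets the sketch run outside a notebook; with `%matplotlib inline` it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({"company": ["A", "A", "A", "B", "B", "C"]})

# value_counts already returns counts sorted from largest to smallest
top = toy["company"].value_counts().head(2)

# Sort ascending before plotting so the largest bar ends up at the top
ax = top.sort_values().plot(kind="barh")
```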

In [None]:
# value_counts already sorts from most to least common
df['company'].value_counts().head(5)

In [None]:
# Sort ascending so the largest bar is drawn at the top of the horizontal chart
df['company'].value_counts().head(5).sort_values().plot(kind='barh')

## 8) How much money do these billionaires have in total?

In [None]:
# Companies from question 7
companies = ["Cargill", "Walmart", "S. C. Johnson & Son", "Oetker-Gruppe", "Hyatt"]
# Keep only the billionaires at those companies, then add up their net worth
total_bill = df[df['company'].isin(companies)]['networthusbillion'].sum()
print(total_bill)

## 9) What are the top 3 countries with the most money held by billionaires?

I am **not** asking which country has the most billionaires - this is total amount of money per country.

> **TIP:** Think about it in steps - "I want them organized by country," "I want their net worth," and "I want to add it all up." Just chain them all together.
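
The three steps in the tip map onto one chained expression. A minimal sketch on a hypothetical `toy` frame (country codes and amounts are made up):

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    "citizenship": ["X", "X", "Y", "Z", "Z", "Z"],
    "networthusbillion": [1.0, 2.0, 10.0, 1.0, 1.0, 1.5],
})

top2 = (
    toy.groupby("citizenship")        # "organized by country"
    ["networthusbillion"]             # "I want their net worth"
    .sum()                            # "add it all up"
    .sort_values(ascending=False)     # largest totals first
    .head(2)
)

print(top2)
```

Note that `Z` has the most rows but `Y` has the largest total, which is exactly the distinction the question draws.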

In [None]:
# The question asks for the top 3 countries
df.groupby('citizenship')['networthusbillion'].sum().sort_values(ascending=False).head(3)

## 10) How old is the average billionaire? How old are self-made billionaires compared with non-self-made ones?

In [None]:
round(df['age'].mean())

In [None]:
[df[df['selfmade'] == 'self-made']['age'].mean() , df[df['selfmade'] != 'self-made']['age'].mean()]

## 11) Who are the youngest billionaires? Who are the oldest? Make a graph of the distribution of ages.

In [None]:
df.sort_values('age',ascending = True)['name'][:10]

In [None]:
df.sort_values('age',ascending = False)['name'][:10]

In [None]:
df['age'].plot(kind='hist')

## 12) Make a scatterplot of their net worth compared to their age

In [None]:
df.plot(kind='scatter', x='networthusbillion', y='age')

## 13) Make a bar graph of the top 10 richest billionaires

In [None]:
df.sort_values(by='networthusbillion', ascending = False)[:10].plot(kind='bar', y='networthusbillion', x='name')