# Billionaires

## David Montero Loaiza

`conda install sqlite sqlalchemy`

Source: https://corgis-edu.github.io/corgis/csv/billionaires/

SQLAlchemy: https://docs.sqlalchemy.org/en/13/core/connections.html

In [0]:
import numpy as np
import pandas as pd

In [0]:
bill = pd.read_csv("https://corgis-edu.github.io/corgis/datasets/csv/billionaires/billionaires.csv")

## All rows where company is Microsoft or Zara

In [23]:
bill[bill['company.name'].isin(['Microsoft','Zara'])]

Unnamed: 0,name,rank,year,company.founded,company.name,company.relationship,company.sector,company.type,demographics.age,demographics.gender,location.citizenship,location.country code,location.gdp,location.region,wealth.type,wealth.worth in billions,wealth.how.category,wealth.how.from emerging,wealth.how.industry,wealth.how.inherited,wealth.how.was founder,wealth.how.was political
0,Bill Gates,1,1996,1975,Microsoft,founder,Software,new,40,male,United States,USA,8100000000000.0,North America,founder non-finance,18.5,New Sectors,True,Technology-Computer,not inherited,True,True
1,Bill Gates,1,2001,1975,Microsoft,founder,Software,new,45,male,United States,USA,10600000000000.0,North America,founder non-finance,58.7,New Sectors,True,Technology-Computer,not inherited,True,True
2,Bill Gates,1,2014,1975,Microsoft,founder,Software,new,58,male,United States,USA,0.0,North America,founder non-finance,76.0,New Sectors,True,Technology-Computer,not inherited,True,True
7,Paul Allen,3,2001,1975,Microsoft,founder,technology,new,48,male,United States,USA,10600000000000.0,North America,founder non-finance,30.4,New Sectors,True,Technology-Computer,not inherited,True,True
8,Amancio Ortega,3,2014,1975,Zara,founder,Fashion,new,77,male,Spain,ESP,0.0,Europe,founder non-finance,64.0,Non-Traded Sectors,True,"Retail, Restaurant",not inherited,True,True
36,Paul Allen,13,1996,1975,Microsoft,founder,technology,new,43,male,United States,USA,8100000000000.0,North America,founder non-finance,7.5,New Sectors,True,Technology-Computer,not inherited,True,True
37,Steven Ballmer,13,2001,1975,Microsoft,CEO,technology,new,44,male,United States,USA,10600000000000.0,North America,executive,16.6,New Sectors,True,Technology-Computer,not inherited,True,True
107,Steve Ballmer,36,2014,1975,Microsoft,CEO,technology,new,57,male,United States,USA,0.0,North America,executive,19.3,New Sectors,True,Technology-Computer,not inherited,True,True
129,Amancio Ortega,43,2001,1975,Zara,founder,Fashion,new,65,male,Spain,ESP,626000000000.0,Europe,founder non-finance,6.6,Traded Sectors,True,Consumer,not inherited,True,True
167,Paul Allen,56,2014,1975,Microsoft,founder,technology,new,61,male,United States,USA,0.0,North America,founder non-finance,15.9,New Sectors,True,Technology-Computer,not inherited,True,True


## Number of times each person appears in the database in descending order

In [24]:
bill['name'].value_counts()

Alice Walton                     3
Jacqueline Mars                  3
Micky Arison                     3
Kumar Birla                      3
Thaksin Shinawatra               3
                                ..
Harold Hamm                      1
Senapathy Gopalakrishnan         1
Seydoux/Schlumberger families    1
Rafaela Aponte                   1
Bob Parsons                      1
Name: name, Length: 2077, dtype: int64

## Top ten sectors (by frequency appereance in the dataset)

In [25]:
bill['company.sector'].value_counts().head(10)

real estate        177
retail             120
media              117
construction        96
banking             93
pharmaceuticals     76
oil                 74
software            67
hedge funds         50
technology          36
Name: company.sector, dtype: int64

## SQL IO

Create a database called 'billionaries' using sqlite and save the data from the dataframe using the appropiate tables. Save the information of people, companies and the relationship between people and companies (ie. company.relationship). Use the functions viewed in class.

In [0]:
from sqlalchemy import create_engine

engine = create_engine('sqlite:///billionaries.db')

Extract necessary information:

In [0]:
peopleColumns = ['name']
companiesColumns = ['company.founded', 'company.name','company.sector', 'company.type']

people = bill.loc[:,peopleColumns].drop_duplicates().reset_index()
companies = bill.loc[:,companiesColumns].drop_duplicates().reset_index()

In [0]:
relationship = pd.merge(people,bill,on = ['name'])
relationship = pd.merge(companies,relationship,on = ['company.name'])
relationship = relationship.loc[:,['index_x','index_y','company.relationship']].drop_duplicates()

Rename columns:

In [0]:
relationship.columns = ['idx_people','idx_company','company.relationship']

Save to the database:

In [0]:
people.to_sql('people',con = engine,if_exists = 'replace')
companies.to_sql('companies',con = engine,if_exists = 'replace')
relationship.to_sql('relationship',con = engine,if_exists = 'replace')

Test query

```
sqlite> select count(*) from people inner join positions on people."index" = positions.person_id inner join companies on positions.company_id=companies."index";
count(*)
2102
```

In [66]:
query = '''select * from people
        inner join relationship on relationship.idx_people = people."index"
        inner join companies on relationship.idx_company = companies."index"
        '''

engine.execute(query).fetchall()

[(0, 0, 'Bill Gates', 0, 0, 0, 'founder', 0, 0, 1975, 'Microsoft', ' Software', 'new'),
 (0, 0, 'Bill Gates', 3, 0, 7, 'founder', 4, 7, 1975, 'Microsoft', 'technology', 'new'),
 (0, 0, 'Bill Gates', 9, 0, 1421, "head of Microsoft's application software group", 835, 1421, 1975, 'Microsoft', 'software', 'new'),
 (1, 3, 'Warren Buffett', 33, 3, 3, 'founder', 1, 3, 1962, 'Berkshire Hathaway', ' Finance', 'new'),
 (1, 3, 'Warren Buffett', 41, 3, 2497, 'investor', 1576, 2497, 1962, 'Berkshire Hathaway', 'finance ', 'new'),
 (2, 5, 'Carlos Slim Helu', 60, 5, 5, 'founder', 2, 5, 1990, 'Telmex', ' Communications', 'privatization'),
 (3, 6, 'Oeri Hoffman and Sacher', 63, 6, 6, None, 3, 6, 1896, 'F. Hoffmann-La Roche', 'pharmaceuticals', 'new'),
 (4, 7, 'Paul Allen', 11, 7, 0, 'founder', 0, 0, 1975, 'Microsoft', ' Software', 'new'),
 (4, 7, 'Paul Allen', 14, 7, 7, 'founder', 4, 7, 1975, 'Microsoft', 'technology', 'new'),
 (4, 7, 'Paul Allen', 20, 7, 1421, "head of Microsoft's application software