### Cars

Run the cell below to create a DataFrame with information on cars in different countries, including cars per capita and whether people drive on the right or the left.

This data set is from DataCamp, so it may be familiar to some.

In [1]:
import pandas as pd
import numpy as np

cars = pd.DataFrame([
    {'cars_per_cap': 809, 'country': 'United States', 'drives_right': True},
    {'cars_per_cap': 731, 'country': 'Australia', 'drives_right': False},
    {'cars_per_cap': 588, 'country': 'Japan', 'drives_right': False},
    {'cars_per_cap': 18, 'country': 'India', 'drives_right': False},
    {'cars_per_cap': 200, 'country': 'Russia', 'drives_right': True},
    {'cars_per_cap': 70, 'country': 'Morocco', 'drives_right': True},
    {'cars_per_cap': 45, 'country': 'Egypt', 'drives_right': True}
], index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'])

cars

,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
IN,18,India,False
RU,200,Russia,True
MOR,70,Morocco,True
EG,45,Egypt,True


Print out the first 4 observations (rows) using `iloc`.

In [2]:
cars.iloc[0:4]

,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
IN,18,India,False


Print out the 6th and 7th observations.

In [3]:
cars.iloc[-2:]

,cars_per_cap,country,drives_right
MOR,70,Morocco,True
EG,45,Egypt,True
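
The negative slice above works because the table has exactly seven rows. As a minimal sketch (using a trimmed-down, hypothetical `toy` frame rather than the full `cars` data), the same rows can be selected by their explicit zero-based positions, which keeps working if more rows are appended later:

```python
import pandas as pd

# Toy frame with the same seven index labels as `cars` (one column kept for brevity).
toy = pd.DataFrame(
    {'cars_per_cap': [809, 731, 588, 18, 200, 70, 45]},
    index=['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG'],
)

# Positions are zero-based, so the 6th and 7th rows sit at positions 5 and 6.
by_position = toy.iloc[5:7]

# Negative slicing counts from the end; it matches here only because there are 7 rows.
from_end = toy.iloc[-2:]

print(by_position.equals(from_end))
```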


Print out the observations from Japan and Morocco.

In [4]:
cars.loc[['JAP', 'MOR']]

,cars_per_cap,country,drives_right
JAP,588,Japan,False
MOR,70,Morocco,True


Print out just the columns `"cars_per_cap"` and `"drives_right"`.

In [5]:
cars[['cars_per_cap', 'drives_right']]

,cars_per_cap,drives_right
US,809,True
AUS,731,False
JAP,588,False
IN,18,False
RU,200,True
MOR,70,True
EG,45,True


Print out just the rows for countries where they drive on the right.

In [6]:
# `drives_right` is already boolean, so it can be used directly as a mask
cars[cars['drives_right']]

,cars_per_cap,country,drives_right
US,809,United States,True
RU,200,Russia,True
MOR,70,Morocco,True
EG,45,Egypt,True


### Tech Firms

Run the cell below to load some data into a DataFrame.

In [7]:
tech_firms = pd.read_json('data/tech_firms.json')
# Display the DataFrame
tech_firms

,Company,FY,Headquarters,Market cap ($B),Revenue ($B)
0,Apple Inc.,2016,"Cupertino, CA, US",815.39,215.6
1,Amazon.com,2016,"Seattle, WA, US",478.0,135.9
2,Samsung Electronics,2016,"Suwon, South Korea",311.0,173.9
3,Foxconn,2016,"New Taipei City, Taiwan",66.0,135.1
4,Alphabet Inc.,2016,"Mountain View, CA, US",676.0,90.2
5,Microsoft,2016,"Redmond, WA, US",561.0,85.3
6,Hitachi,2016,"Tokyo, Japan",32.0,84.5
7,IBM,2016,"Armonk, NY, US",145.0,79.9
8,Huawei,2016,"Shenzhen, China",,78.5
9,Sony,2016,"Tokyo, Japan",51.0,70.1


How many companies are in this DataFrame?

In [8]:
len(tech_firms)

14

Set the index of the DataFrame to be the `"Company"` column.

In [9]:
tech_firms = tech_firms.set_index('Company')

The financial year (`"FY"`) is the same for every company and isn't really necessary, so drop this column from the table.

In [10]:
tech_firms = tech_firms.drop('FY', axis=1)
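
As a small aside, `drop` also accepts a `columns=` keyword, which some readers find easier to read than `axis=1`. The sketch below uses a hypothetical two-row `toy` frame (not the real `tech_firms` data) to show that both spellings give the same result:

```python
import pandas as pd

# Hypothetical two-row stand-in for `tech_firms`.
toy = pd.DataFrame(
    {'FY': [2016, 2016], 'Revenue ($B)': [215.6, 135.9]},
    index=pd.Index(['Apple Inc.', 'Amazon.com'], name='Company'),
)

# Drop by label along the column axis ...
by_axis = toy.drop('FY', axis=1)

# ... or name the columns explicitly.
by_keyword = toy.drop(columns='FY')

print(by_axis.equals(by_keyword))
```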

Huawei's market cap is officially a secret, but it is estimated to be around $7.1B. Fill in the market capitalisation for `Huawei` with this value:

In [11]:
tech_firms.loc['Huawei', 'Market cap ($B)'] = 7.1

The employee figures are given below in the variable `employees`.

Add them to your DataFrame as a new column named `"Employees"`.

In [12]:
employees = [116000, 341400, 325000, 726772, 72053, 114000, 303887,
             414400, 180000, 128400, 257533, 138000, 106000, 195000]

tech_firms['Employees'] = employees

tech_firms.head()

Company,Headquarters,Market cap ($B),Revenue ($B),Employees
Apple Inc.,"Cupertino, CA, US",815.39,215.6,116000
Amazon.com,"Seattle, WA, US",478.0,135.9,341400
Samsung Electronics,"Suwon, South Korea",311.0,173.9,325000
Foxconn,"New Taipei City, Taiwan",66.0,135.1,726772
Alphabet Inc.,"Mountain View, CA, US",676.0,90.2,72053
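
Assigning a plain list relies on the list being in exactly the same order as the rows. One way to guard against ordering mistakes, sketched here on a hypothetical three-row `toy` frame with an assumed `headcount` Series, is to key the values by company name so that pandas aligns them on the index:

```python
import pandas as pd

# Hypothetical three-row stand-in for `tech_firms`.
toy = pd.DataFrame(
    {'Revenue ($B)': [215.6, 135.9, 85.3]},
    index=pd.Index(['Apple Inc.', 'Amazon.com', 'Microsoft'], name='Company'),
)

# Series keyed by company, deliberately in a different order from `toy`.
headcount = pd.Series(
    {'Microsoft': 114000, 'Apple Inc.': 116000, 'Amazon.com': 341400}
)

# Column assignment from a Series aligns on index labels, not on position.
toy['Employees'] = headcount

print(toy)
```

A company missing from `headcount` would get `NaN` instead of silently receiving another company's figure.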


What is Microsoft's revenue per employee?

In [13]:
billion = 10 ** 9
tech_firms.loc['Microsoft', 'Revenue ($B)'] * billion / tech_firms.loc['Microsoft', 'Employees']

748245.61403508775
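
The same arithmetic can be applied to whole columns at once. The following sketch, on a hypothetical two-row `toy` frame, computes revenue per employee for every company in one vectorised expression; `revenue_per_employee` is an illustrative name, not part of the exercise:

```python
import pandas as pd

# Hypothetical two-row stand-in for `tech_firms`.
toy = pd.DataFrame(
    {'Revenue ($B)': [215.6, 85.3], 'Employees': [116000, 114000]},
    index=pd.Index(['Apple Inc.', 'Microsoft'], name='Company'),
)

billion = 10 ** 9

# Element-wise arithmetic on two columns yields one value per company.
revenue_per_employee = toy['Revenue ($B)'] * billion / toy['Employees']

print(revenue_per_employee.round(2))
```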

Select all the companies with a `"Headquarters"` in either `"Tokyo, Japan"` or `"Osaka, Japan"`. (Don't set this to a variable.)

In [14]:
tech_firms[(tech_firms['Headquarters'] == "Tokyo, Japan") | (tech_firms['Headquarters'] == "Osaka, Japan")]

Company,Headquarters,Market cap ($B),Revenue ($B),Employees
Hitachi,"Tokyo, Japan",32.0,84.5,303887
Sony,"Tokyo, Japan",51.0,70.1,128400
Panasonic,"Osaka, Japan",33.0,67.7,257533
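
When matching a column against several allowed values, `Series.isin` is a compact alternative to chaining `==` comparisons with `|`. A minimal sketch on a hypothetical three-row `toy` frame:

```python
import pandas as pd

# Hypothetical three-row stand-in for `tech_firms`.
toy = pd.DataFrame(
    {'Headquarters': ['Tokyo, Japan', 'Osaka, Japan', 'Redmond, WA, US']},
    index=pd.Index(['Sony', 'Panasonic', 'Microsoft'], name='Company'),
)

japanese_hqs = ['Tokyo, Japan', 'Osaka, Japan']

# isin returns True for each row whose value appears in the list.
selected = toy[toy['Headquarters'].isin(japanese_hqs)]

print(selected)
```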


Select all the companies with more than 200,000 employees that have a market capitalisation of over $100 billion. (Don't set this to a variable.)

In [15]:
tech_firms[(tech_firms['Employees'] > 200000) & (tech_firms['Market cap ($B)'] > 100)]

Company,Headquarters,Market cap ($B),Revenue ($B),Employees
Amazon.com,"Seattle, WA, US",478.0,135.9,341400
Samsung Electronics,"Suwon, South Korea",311.0,173.9,325000
IBM,"Armonk, NY, US",145.0,79.9,414400
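
The parentheses around each condition matter because `&` binds more tightly than comparison operators such as `>` in Python. One way to make the logic easier to read, sketched below with a hypothetical `toy` frame and illustrative mask names, is to build each condition separately before combining them:

```python
import pandas as pd

# Hypothetical three-row stand-in for `tech_firms`.
toy = pd.DataFrame(
    {
        'Employees': [341400, 116000, 303887],
        'Market cap ($B)': [478.0, 815.39, 32.0],
    },
    index=pd.Index(['Amazon.com', 'Apple Inc.', 'Hitachi'], name='Company'),
)

# Each comparison produces a boolean Series with one entry per row.
large_workforce = toy['Employees'] > 200000
large_cap = toy['Market cap ($B)'] > 100

# `&` combines the two masks element-wise (logical AND).
both = toy[large_workforce & large_cap]

print(both)
```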


Sort the DataFrame alphabetically by company name.

In [16]:
tech_firms = tech_firms.sort_index()

tech_firms

Company,Headquarters,Market cap ($B),Revenue ($B),Employees
Alphabet Inc.,"Mountain View, CA, US",676.0,90.2,72053
Amazon.com,"Seattle, WA, US",478.0,135.9,341400
Apple Inc.,"Cupertino, CA, US",815.39,215.6,116000
Dell Technologies,"Austin, TX, US",14.0,64.8,138000
Foxconn,"New Taipei City, Taiwan",66.0,135.1,726772
Hewlett Packard Enterprise,"Palo Alto, CA, US",30.0,50.1,195000
Hitachi,"Tokyo, Japan",32.0,84.5,303887
Huawei,"Shenzhen, China",7.1,78.5,180000
IBM,"Armonk, NY, US",145.0,79.9,414400
Intel,"Santa Clara, CA, US",163.0,59.3,106000


What is the minimum market capitalisation of the companies with revenue of more than $75 billion?

In [17]:
np.min(tech_firms[tech_firms['Revenue ($B)'] > 75]['Market cap ($B)']) * billion

7100000000.0

What is the mean revenue of all the companies?

In [18]:
np.mean(tech_firms['Revenue ($B)']) * billion

99350000000.000015

What is the total market capitalisation of all the companies?

In [19]:
np.sum(tech_firms['Market cap ($B)']) * billion

3382490000000.0
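
The three summaries above use NumPy functions, but a pandas Series also has its own `min`, `mean` and `sum` methods, which skip missing values by default. The sketch below uses a hypothetical three-value `caps` Series (with one deliberately missing entry) and also shows `agg` computing several statistics in one call:

```python
import numpy as np
import pandas as pd

# Hypothetical market caps in $B; the NaN stands in for an unknown value.
caps = pd.Series([815.39, np.nan, 478.0], name='Market cap ($B)')

# Series methods ignore NaN unless skipna=False is passed.
smallest = caps.min()
total = caps.sum()

# agg returns one labelled result per requested statistic.
summary = caps.agg(['min', 'mean', 'sum'])

print(smallest, total)
print(summary)
```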