# Introduction to Pandas

Similar to the import convention for NumPy (import numpy as np), the import convention for pandas is import pandas as pd.

We'll be working with a data set from Fortune magazine's 2017 Global 500 list, which ranks the top 500 corporations worldwide by revenue.

## Pandas and NumPy

### Import data

In [19]:
import numpy as np
import pandas as pd
f500 = pd.read_csv("f500.csv")

In [20]:
# What is the type of f500?
type(f500)

pandas.core.frame.DataFrame

In [10]:
# What is the shape of f500?
f500.shape

(500, 16)

### Introducing dataframe

In [21]:
f500

,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210
5,Volkswagen,6,240264,1.5,5937.3,432116,,Matthias Muller,Motor Vehicles and Parts,Motor Vehicles & Parts,7,Germany,"Wolfsburg, Germany",http://www.volkswagen.com,23,626715,97753
6,Royal Dutch Shell,7,240033,-11.8,4575.0,411275,135.9,Ben van Beurden,Petroleum Refining,Energy,5,Netherlands,"The Hague, Netherlands",http://www.shell.com,23,89000,186646
7,Berkshire Hathaway,8,223604,6.1,24074.0,620854,,Warren E. Buffett,Insurance: Property and Casualty (Stock),Financials,11,USA,"Omaha, NE",http://www.berkshirehathaway.com,21,367700,283001
8,Apple,9,215639,-7.7,45687.0,321686,-14.4,Timothy D. Cook,"Computers, Office Equipment",Technology,9,USA,"Cupertino, CA",http://www.apple.com,15,116000,128249
9,Exxon Mobil,10,205004,-16.7,7840.0,330314,-51.5,Darren W. Woods,Petroleum Refining,Energy,6,USA,"Irving, TX",http://www.exxonmobil.com,23,72700,167325


Dataframes are two-dimensional pandas objects, the pandas equivalent of a NumPy 2D ndarray. Unlike NumPy, pandas uses different types for 1D and 2D data.
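
As a rough illustration of that difference, the sketch below wraps a small NumPy array in a dataframe. The two-row array is typed by hand for illustration (it is not read from f500.csv), and the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

# Hand-typed toy data (an assumption for illustration, not loaded from f500.csv).
arr = np.array([[1, 485873], [2, 315199]])

# A 2D ndarray is addressed by integer position only.
revenue_by_position = arr[0, 1]

# A dataframe wraps the same 2D data with row and column labels.
df = pd.DataFrame(arr, index=["Walmart", "State Grid"], columns=["rank", "revenues"])
revenue_by_label = df.loc["Walmart", "revenues"]

# A single dataframe column is a Series (a separate type),
# while a single NumPy column is still an ndarray.
print(type(df).__name__, type(df["revenues"]).__name__, type(arr[:, 1]).__name__)
# DataFrame Series ndarray
```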

In [13]:
# Get information about the types of each column
f500.dtypes

rank                          int64
revenues                      int64
revenue_change              float64
profits                     float64
assets                        int64
profit_change               float64
ceo                          object
industry                     object
sector                       object
previous_rank                 int64
country                      object
hq_location                  object
website                      object
years_on_global_500_list      int64
employees                     int64
total_stockholder_equity      int64
dtype: object

In [14]:
# Get the first few rows
f500.head()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [15]:
# Get the last few rows
f500.tail()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Teva Pharmaceutical Industries,496,21903,11.5,329.0,92890,-79.3,Yitzhak Peterburg,Pharmaceuticals,Health Care,0,Israel,"Petach Tikva, Israel",http://www.tevapharm.com,1,56960,33337
New China Life Insurance,497,21796,-13.3,743.9,100609,-45.6,Wan Feng,"Insurance: Life, Health (stock)",Financials,427,China,"Beijing, China",http://www.newchinalife.com,2,54378,8507
Wm. Morrison Supermarkets,498,21741,-11.3,406.4,11630,20.4,David T. Potts,Food and Drug Stores,Food & Drug Stores,437,Britain,"Bradford, Britain",http://www.morrisons.com,13,77210,5111
TUI,499,21655,-5.5,1151.7,16247,195.5,Friedrich Joussen,Travel Services,Business Services,467,Germany,"Hanover, Germany",http://www.tuigroup.com,23,66779,3006
AutoNation,500,21609,3.6,430.5,10060,-2.7,Michael J. Jackson,Specialty Retailers,Retailing,0,USA,"Fort Lauderdale, FL",http://www.autonation.com,12,26000,2310


In [16]:
# Overview 
f500.info()

<class 'pandas.core.frame.DataFrame'>
Index: 500 entries, Walmart to AutoNation
Data columns (total 16 columns):
rank                        500 non-null int64
revenues                    500 non-null int64
revenue_change              498 non-null float64
profits                     499 non-null float64
assets                      500 non-null int64
profit_change               436 non-null float64
ceo                         500 non-null object
industry                    500 non-null object
sector                      500 non-null object
previous_rank               500 non-null int64
country                     500 non-null object
hq_location                 500 non-null object
website                     500 non-null object
years_on_global_500_list    500 non-null int64
employees                   500 non-null int64
total_stockholder_equity    500 non-null int64
dtypes: float64(3), int64(7), object(6)
memory usage: 66.4+ KB


### Selecting columns from a dataframe by label

Because our axes in pandas have labels, we can select data using those labels, unlike in NumPy where we needed to know the exact index location. To do this, we use the DataFrame.loc[] indexer, whose general form is df.loc[row_label, column_label].
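
The three forms of column selection can be sketched on a small hand-built frame. The frame below is a toy stand-in (its name df and its three rows are assumptions for illustration), so that the result of each form is easy to inspect.

```python
import pandas as pd

# Toy frame typed by hand; it only mimics the layout of f500.
df = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenues": [485873, 315199, 267518],
        "country": ["USA", "China", "China"],
    },
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# The ":" before the comma means "all rows".
one_col = df.loc[:, "rank"]                 # single label: returns a Series
two_cols = df.loc[:, ["country", "rank"]]   # list of labels: DataFrame, in list order
col_slice = df.loc[:, "rank":"revenues"]    # label slice: the end label is included

print(type(one_col).__name__)      # Series
print(list(two_cols.columns))      # ['country', 'rank']
print(list(col_slice.columns))     # ['rank', 'revenues']
```

Note that a label slice includes its end label, unlike positional slicing in NumPy or plain Python.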

In [17]:
# Select a single column by specifying a single label
f500.loc[:, 'rank']

company
Walmart                                           1
State Grid                                        2
Sinopec Group                                     3
China National Petroleum                          4
Toyota Motor                                      5
Volkswagen                                        6
Royal Dutch Shell                                 7
Berkshire Hathaway                                8
Apple                                             9
Exxon Mobil                                      10
McKesson                                         11
BP                                               12
UnitedHealth Group                               13
CVS Health                                       14
Samsung Electronics                              15
Glencore                                         16
Daimler                                          17
General Motors                                   18
AT&T                                             19
EXOR

We see that selecting a single column returns a pandas series. We'll talk about pandas series later in this notebook, but for now the important thing is to note that the new series has the same index axis labels as the original dataframe.

In [18]:
# Select multiple columns using a list of labels
f500.loc[:, ['rank', 'country']]

company,rank,country
Walmart,1,USA
State Grid,2,China
Sinopec Group,3,China
China National Petroleum,4,China
Toyota Motor,5,Japan
Volkswagen,6,Germany
Royal Dutch Shell,7,Netherlands
Berkshire Hathaway,8,USA
Apple,9,USA
Exxon Mobil,10,USA


When we use a list of labels, a dataframe is returned with only the columns specified in our list, in the order specified in our list. Just like when we used a single column label, the new dataframe has the same index axis labels as the original.

In [19]:
# Select multiple columns using slicing
f500.loc[:, 'rank':'industry']

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts
Volkswagen,6,240264,1.5,5937.3,432116,,Matthias Muller,Motor Vehicles and Parts
Royal Dutch Shell,7,240033,-11.8,4575.0,411275,135.9,Ben van Beurden,Petroleum Refining
Berkshire Hathaway,8,223604,6.1,24074.0,620854,,Warren E. Buffett,Insurance: Property and Casualty (Stock)
Apple,9,215639,-7.7,45687.0,321686,-14.4,Timothy D. Cook,"Computers, Office Equipment"
Exxon Mobil,10,205004,-16.7,7840.0,330314,-51.5,Darren W. Woods,Petroleum Refining


### Two simpler ways to select columns

In [20]:
# 1. Single bracket
f500['country']

company
Walmart                                                 USA
State Grid                                            China
Sinopec Group                                         China
China National Petroleum                              China
Toyota Motor                                          Japan
Volkswagen                                          Germany
Royal Dutch Shell                               Netherlands
Berkshire Hathaway                                      USA
Apple                                                   USA
Exxon Mobil                                             USA
McKesson                                                USA
BP                                                  Britain
UnitedHealth Group                                      USA
CVS Health                                              USA
Samsung Electronics                             South Korea
Glencore                                        Switzerland
Daimler                         

In [23]:
# A list of labels inside the brackets selects several columns
f500[['rank','country']]

company,rank,country
Walmart,1,USA
State Grid,2,China
Sinopec Group,3,China
China National Petroleum,4,China
Toyota Motor,5,Japan
Volkswagen,6,Germany
Royal Dutch Shell,7,Netherlands
Berkshire Hathaway,8,USA
Apple,9,USA
Exxon Mobil,10,USA


In [24]:
# 2. Dot Accessor
f500.ceo

company
Walmart                                                       C. Douglas McMillon
State Grid                                                                Kou Wei
Sinopec Group                                                           Wang Yupu
China National Petroleum                                            Zhang Jianhua
Toyota Motor                                                          Akio Toyoda
Volkswagen                                                        Matthias Muller
Royal Dutch Shell                                                 Ben van Beurden
Berkshire Hathaway                                              Warren E. Buffett
Apple                                                             Timothy D. Cook
Exxon Mobil                                                       Darren W. Woods
McKesson                                                       John H. Hammergren
BP                                                               Robert W. Dudley
UnitedHe

### Selecting Items from a Series by Label

Series is the pandas type for one-dimensional objects. Anytime you see a 1D pandas object, it will be a series, and anytime you see a 2D pandas object, it will be a dataframe.

In [27]:
ceos = f500['ceo']

In [28]:
type(ceos)

pandas.core.series.Series

Just as with dataframes, we can use Series.loc[] to select items from a series using a single label, a list of labels, or a slice. We can also omit loc[] and use the bracket shortcuts for all three.
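
A minimal sketch of that equivalence, using a three-item series typed by hand (the values are copied from rows printed earlier, but the series itself is a toy, not the full ceo column):

```python
import pandas as pd

# Toy series built by hand for illustration.
ceos = pd.Series(
    ["C. Douglas McMillon", "Kou Wei", "Wang Yupu"],
    index=["Walmart", "State Grid", "Sinopec Group"],
    name="ceo",
)

# Single label: .loc and plain brackets return the same value.
single_matches = ceos.loc["Walmart"] == ceos["Walmart"]

# List of labels: both return the same two-item series.
wanted = ["Walmart", "Sinopec Group"]
list_matches = ceos.loc[wanted].equals(ceos[wanted])

# Label slice: both include the end label.
slice_matches = ceos.loc["Walmart":"State Grid"].equals(ceos["Walmart":"State Grid"])

print(single_matches, list_matches, slice_matches)  # True True True
print(len(ceos["Walmart":"State Grid"]))            # 2
```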

In [40]:
# Using .loc with a single label
ceos.loc['Facebook']

'Mark Zuckerberg'

In [41]:
# Using .loc with a list
ceos.loc[['Berkshire Hathaway', 'Amazon.com']]

company
Berkshire Hathaway    Warren E. Buffett
Amazon.com             Jeffrey P. Bezos
Name: ceo, dtype: object

In [35]:
# Using .loc with a slice
ceos.loc['Walmart':'Apple']

company
Walmart                     C. Douglas McMillon
State Grid                              Kou Wei
Sinopec Group                         Wang Yupu
China National Petroleum          Zhang Jianhua
Toyota Motor                        Akio Toyoda
Volkswagen                      Matthias Muller
Royal Dutch Shell               Ben van Beurden
Berkshire Hathaway            Warren E. Buffett
Apple                           Timothy D. Cook
Name: ceo, dtype: object

### Selecting Rows From a Dataframe by Label

In [44]:
# Show the country of the first 100 companies
f500.country.head(100)

company
Walmart                                          USA
State Grid                                     China
Sinopec Group                                  China
China National Petroleum                       China
Toyota Motor                                   Japan
Volkswagen                                   Germany
Royal Dutch Shell                        Netherlands
Berkshire Hathaway                               USA
Apple                                            USA
Exxon Mobil                                      USA
McKesson                                         USA
BP                                           Britain
UnitedHealth Group                               USA
CVS Health                                       USA
Samsung Electronics                      South Korea
Glencore                                 Switzerland
Daimler                                      Germany
General Motors                                   USA
AT&T                                  

In [52]:
# Select a list of rows
f500.loc[['Alphabet', 'Toyota Motor']]

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Alphabet,65,90272,20.4,19478.0,167497,19.1,Larry Page,Internet Services and Retailing,Technology,94,USA,"Mountain View, CA",http://www.abc.xyz,9,72053,139036
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [68]:
# Select multiple rows using slicing
f500.loc['Samsung Electronics':'AT&T']

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Samsung Electronics,15,173957,-2.0,19316.5,217104,16.8,Oh-Hyun Kwon,"Electronics, Electrical Equip.",Technology,13,South Korea,"Suwon, South Korea",http://www.samsung.com,23,325000,154376
Glencore,16,173883,2.0,1379.0,124600,,Ivan Glasenberg,"Mining, Crude-Oil Production",Energy,14,Switzerland,"Baar, Switzerland",http://www.glencore.com,7,93123,44243
Daimler,17,169483,2.2,9428.4,256262,0.9,Dieter Zetsche,Motor Vehicles and Parts,Motor Vehicles & Parts,16,Germany,"Stuttgart, Germany",http://www.daimler.com,23,282488,61116
General Motors,18,166380,9.2,9427.0,221690,-2.7,Mary T. Barra,Motor Vehicles and Parts,Motor Vehicles & Parts,20,USA,"Detroit, MI",http://www.gm.com,23,225000,43836
AT&T,19,163786,11.6,12976.0,403821,-2.8,Randall L. Stephenson,Telecommunications,Telecommunications,23,USA,"Dallas, TX",http://www.att.com,23,268540,123135
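
Selecting one row by its label alone works the same way. The sketch below uses a two-row toy frame (typed by hand, an assumption for illustration) to show what comes back for a single label versus a one-item list.

```python
import pandas as pd

# Toy frame typed by hand; it only mimics the layout of f500.
df = pd.DataFrame(
    {"rank": [1, 5], "country": ["USA", "Japan"]},
    index=["Walmart", "Toyota Motor"],
)

# A single row label returns a Series whose index is the column labels.
row = df.loc["Toyota Motor"]
print(type(row).__name__)    # Series
print(list(row.index))       # ['rank', 'country']
print(row["country"])        # Japan

# A list holding one label returns a one-row DataFrame instead.
one_row_frame = df.loc[["Toyota Motor"]]
print(one_row_frame.shape)   # (1, 2)
```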


### Series and Dataframe Describe Methods

In [75]:
# Describe the dataframe, show only numeric columns
f500.describe()

,rank,revenues,revenue_change,profits,assets,profit_change,previous_rank,years_on_global_500_list,employees,total_stockholder_equity
count,500.0,500.0,498.0,499.0,500.0,436.0,500.0,500.0,500.0,500.0
mean,250.5,55416.358,4.538353,3055.203206,243632.3,24.152752,222.134,15.036,133998.3,30628.076
std,144.481833,45725.478963,28.549067,5171.981071,485193.7,437.509566,146.941961,7.932752,170087.8,43642.576833
min,1.0,21609.0,-67.3,-13038.0,3717.0,-793.7,0.0,1.0,328.0,-59909.0
25%,125.75,29003.0,-5.9,556.95,36588.5,-22.775,92.75,7.0,42932.5,7553.75
50%,250.5,40236.0,0.55,1761.6,73261.5,-0.35,219.5,17.0,92910.5,15809.5
75%,375.25,63926.75,6.975,3954.0,180564.0,17.7,347.25,23.0,168917.2,37828.5
max,500.0,485873.0,442.3,45687.0,3473238.0,8909.5,500.0,23.0,2300000.0,301893.0


In [77]:
# Describe the dataframe, show only non-numeric columns
f500.describe(include=['O'])

,ceo,industry,sector,country,hq_location,website
count,500,500,500,500,500,500
unique,500,58,21,34,235,500
top,William J. DeLaney III,Banks: Commercial and Savings,Financials,USA,"Beijing, China",http://www.sinochem.com
freq,1,51,118,132,56,1


In [73]:
# Describe a numeric Series
f500['profits'].describe()

count      499.000000
mean      3055.203206
std       5171.981071
min     -13038.000000
25%        556.950000
50%       1761.600000
75%       3954.000000
max      45687.000000
Name: profits, dtype: float64

In [74]:
# Describe a string Series
f500['country'].describe()

count     500
unique     34
top       USA
freq      132
Name: country, dtype: object

### Some Aggregate Functions

In [81]:
# Top 10 countries
f500['country'].value_counts().head(10)

USA            132
China          109
Japan           51
Germany         29
France          29
Britain         24
South Korea     15
Switzerland     14
Netherlands     14
Canada          11
Name: country, dtype: int64

In [106]:
# Maximum profit
profits = f500['profits']
profits.max()

45687.0

In [110]:
# Which company has the maximum profit
profits.idxmax()

'Apple'

In [113]:
# Sum of all revenues
f500['revenues'].sum()

27708179

### Assignment with pandas

When we used NumPy, we learned that the same techniques we use to select data can also be used for assignment. The same is true in pandas.
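
As a small sketch of the idea, the selection expression simply moves to the left of the equals sign. The frame and the 'Placeholder' value below are made up for illustration.

```python
import pandas as pd

# Toy frame typed by hand; it only mimics the layout of f500.
df = pd.DataFrame(
    {"revenues": [485873, 315199], "ceo": ["C. Douglas McMillon", "Kou Wei"]},
    index=["Walmart", "State Grid"],
)

# The same .loc[row, column] expression used for selection also assigns one cell.
df.loc["State Grid", "ceo"] = "Placeholder"

# Assigning to a column label that does not exist yet creates a new column.
df["revenues_b"] = df["revenues"] / 1000

print(df.loc["State Grid", "ceo"])   # Placeholder
print(list(df.columns))              # ['revenues', 'ceo', 'revenues_b']
```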

In [158]:
# Setting a value
f500.loc['Facebook', 'ceo'] = 'Pandas'

In [160]:
f500.ceo.Facebook

'Pandas'

In [162]:
# Add a new column
f500['revenues_b'] = f500['revenues'] / 1000

In [163]:
f500.head()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,revenues_b
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798,485.873
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456,315.199
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523,267.518
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893,262.573
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210,254.694


### Using Boolean Indexing with pandas Objects

Just like NumPy, pandas allows us to use boolean indexing to select items based on their value.
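
A minimal sketch of the mechanism on a four-row toy frame (typed by hand for illustration): the comparison produces a boolean series with the same index as the frame, and that series then picks out the rows where it is True.

```python
import pandas as pd

# Toy frame typed by hand; it only mimics the layout of f500.
df = pd.DataFrame(
    {
        "country": ["USA", "South Korea", "China", "South Korea"],
        "rank": [1, 15, 2, 78],
    },
    index=["Walmart", "Samsung Electronics", "State Grid", "Hyundai Motor"],
)

# The comparison is applied element by element and keeps the index labels.
in_korea = df["country"] == "South Korea"

# Rows where the mask is True are kept.
korean_rows = df[in_korea]

# With .loc, the mask can be combined with a column label.
korean_ranks = df.loc[in_korea, "rank"]

print(in_korea.sum())             # 2 (each True counts as 1)
print(list(korean_rows.index))    # ['Samsung Electronics', 'Hyundai Motor']
print(korean_ranks.tolist())      # [15, 78]
```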

In [188]:
# Get a boolean Series: is each company based in South Korea?
in_korea = (f500['country'] == 'South Korea')

In [189]:
in_korea

company
Walmart                                         False
State Grid                                      False
Sinopec Group                                   False
China National Petroleum                        False
Toyota Motor                                    False
Volkswagen                                      False
Royal Dutch Shell                               False
Berkshire Hathaway                              False
Apple                                           False
Exxon Mobil                                     False
McKesson                                        False
BP                                              False
UnitedHealth Group                              False
CVS Health                                      False
Samsung Electronics                              True
Glencore                                        False
Daimler                                         False
General Motors                                  False
AT&T                

In [230]:
# Get the companies based in South Korea
f500[in_korea].head()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity,revenues_b
Samsung Electronics,15.0,173957.0,-2.0,19316.5,217104.0,16.8,Oh-Hyun Kwon,"Electronics, Electrical Equip.",Technology,13.0,South Korea,"Suwon, South Korea",http://www.samsung.com,23.0,325000.0,154376.0,173.957
Hyundai Motor,78.0,80701.0,-0.8,4659.0,148092.0,-17.9,Mong-Koo Chung,Motor Vehicles and Parts,Motor Vehicles & Parts,84.0,South Korea,"Seoul, South Korea",http://worldwide.hyundai.com,22.0,129315.0,55639.0,80.701
SK Holdings,95.0,72579.0,107.4,659.7,85332.0,-86.0,Tae Won Chey,Petroleum Refining,Energy,294.0,South Korea,"Seoul, South Korea",http://www.sk.co.kr,2.0,84000.0,10858.0,72.579
Korea Electric Power,177.0,51500.0,-0.6,6074.1,147265.0,-48.3,Hwan-Eik Cho,Utilities,Energy,172.0,South Korea,"Jeollanam-do, South Korea",http://www.kepco.co.kr,23.0,43688.0,59394.0,51.5
LG Electronics,201.0,47712.0,-4.6,66.2,31348.0,-39.8,Seong-Jin Jo,"Electronics, Electrical Equip.",Technology,180.0,South Korea,"Seoul, South Korea",http://www.lg.com,17.0,75000.0,9926.0,47.712


### Using Boolean Arrays to Assign Values

In [200]:
f500[['rank','previous_rank']].head()

company,rank,previous_rank
Walmart,1,1
State Grid,2,2
Sinopec Group,3,4
China National Petroleum,4,3
Toyota Motor,5,8


In [203]:
f500['previous_rank'].value_counts().head()

0      33
159     1
147     1
148     1
149     1
Name: previous_rank, dtype: int64

In [204]:
f500['previous_rank'] == 0

company
Walmart                                         False
State Grid                                      False
Sinopec Group                                   False
China National Petroleum                        False
Toyota Motor                                    False
Volkswagen                                      False
Royal Dutch Shell                               False
Berkshire Hathaway                              False
Apple                                           False
Exxon Mobil                                     False
McKesson                                        False
BP                                              False
UnitedHealth Group                              False
CVS Health                                      False
Samsung Electronics                             False
Glencore                                        False
Daimler                                         False
General Motors                                  False
AT&T                

In [207]:
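# Set only the previous_rank values that equal 0 to NaN, leaving the other columns untouched
f500.loc[f500['previous_rank'] == 0, 'previous_rank'] = np.nan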

In [208]:
f500['previous_rank'].value_counts().head()

471.0    1
204.0    1
190.0    1
125.0    1
166.0    1
Name: previous_rank, dtype: int64
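
When assigning through a boolean mask, it matters whether a column label is given along with the mask. The sketch below compares the two forms on a toy frame; the data is made up, and the columns are floats from the start so that storing NaN does not require a dtype change.

```python
import numpy as np
import pandas as pd

# Toy frame typed by hand; float columns so NaN can be stored without upcasting.
df = pd.DataFrame(
    {"rank": [1.0, 2.0, 3.0], "previous_rank": [1.0, 0.0, 0.0]},
    index=["A", "B", "C"],
)
mask = df["previous_rank"] == 0

# Mask only: every column of the matching rows is overwritten.
whole_rows = df.copy()
whole_rows[mask] = np.nan

# Mask plus column label: only that column changes in the matching rows.
one_column = df.copy()
one_column.loc[mask, "previous_rank"] = np.nan

print(whole_rows["rank"].isna().sum())            # 2
print(one_column["rank"].isna().sum())            # 0
print(one_column["previous_rank"].isna().sum())   # 2
```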

### Examples

#### Top 5 headquarters locations in the US


In [220]:
f500.loc[f500['country'] == 'USA', 'hq_location'].value_counts().head()

New York, NY      15
Houston, TX        5
Chicago, IL        4
Atlanta, GA        4
Cincinnati, OH     3
Name: hq_location, dtype: int64

#### Top 5 sectors in China

In [225]:
f500.loc[f500['country'] == 'China', 'sector'].value_counts().head()

Financials                    22
Energy                        22
Engineering & Construction     8
Motor Vehicles & Parts         7
Materials                      7
Name: sector, dtype: int64

#### Average number of employees for companies headquartered in Japan

In [229]:
f500.loc[f500['country'] == 'Japan', 'employees'].mean()

104564.45098039215