# 1. Introduction

In this mission, we'll continue working with the 2017 Fortune Global 500 dataset as we learn more advanced selection and exploration techniques.

In [1]:
# import pandas module
import pandas as pd

# read data file
f500=pd.read_csv('f500.csv')


In [2]:
f500.head()

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
3,China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
4,Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


In [3]:
f500['previous_rank'].value_counts(dropna=False).head()

0      33
159     1
147     1
148     1
149     1
Name: previous_rank, dtype: int64

In [4]:
# replace 0 values in previous_rank column
import numpy as np 
f500.loc[f500['previous_rank']==0,'previous_rank']=np.nan

In [5]:
f500['previous_rank'].value_counts(dropna=False).head()

NaN      33
471.0     1
234.0     1
125.0     1
166.0     1
Name: previous_rank, dtype: int64

## TODO
* Select the rank, revenues, and revenue_change columns in f500. Then, use the DataFrame.head() method to select the first five rows. Assign the result to f500_selection.
* Use the variable inspector to view f500_selection. Compare the results to the first few lines of our CSV file above.

In [6]:
f500_selection=f500[['rank','revenues','revenue_change']].head()

In [7]:
f500_selection

Unnamed: 0,rank,revenues,revenue_change
0,1,485873,0.8
1,2,315199,-4.4
2,3,267518,-9.1
3,4,262573,-12.3
4,5,254694,7.7


# 2. Reading CSV files with pandas

In [8]:
f500 = pd.read_csv("f500.csv", index_col=0)
#f500.index.name = None

In [9]:
f500.head()

company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


**The `index_col` parameter is optional and specifies which column to use as the row labels for the dataframe. Passing a value of 0 tells pandas to use the first column as the row labels.**
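The effect of `index_col` can be sketched on a small in-memory CSV. The three rows below are a hypothetical stand-in for `f500.csv` (values copied from the output above), and the names `csv_text`, `default_df`, and `labeled_df` are illustrative only.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for f500.csv (assumption: same column layout).
csv_text = """company,rank,revenues
Walmart,1,485873
State Grid,2,315199
Sinopec Group,3,267518
"""

# Default: pandas builds an integer index and keeps company as a regular column.
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column (company) becomes the row labels.
labeled_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))  # company is a regular column
print(list(labeled_df.columns))  # company is no longer among the columns
print(labeled_df.index.name)     # the index axis takes the column's name
```

With the labeled version, rows can be selected by company name, for example `labeled_df.loc["State Grid", "rank"]`.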

In [10]:
f500 = pd.read_csv("f500.csv", index_col=0)
f500.index.name = None

In [11]:
f500.head()

Unnamed: 0,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523
China National Petroleum,4,262573,-12.3,1867.5,585619,-73.7,Zhang Jianhua,Petroleum Refining,Energy,3,China,"Beijing, China",http://www.cnpc.com.cn,17,1512048,301893
Toyota Motor,5,254694,7.7,16899.3,437575,-12.3,Akio Toyoda,Motor Vehicles and Parts,Motor Vehicles & Parts,8,Japan,"Toyota, Japan",http://www.toyota-global.com,23,364445,157210


* Notice that in the output of In [9], above the index labels is the text company, the name of the first column in the CSV. Pandas used this value as the name of the index axis.

* **Both the column and index axes can have names assigned to them.**
* In In [10], we set the name of the index axis to None with `f500.index.name = None`, so the dataframe shown in In [11] has no name for the index axis.
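A minimal sketch of axis names, using a hypothetical two-row dataframe rather than the full dataset. The names `demo`, `name_before`, and the `"metric"` label are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical two-row frame whose index axis is named "company".
demo = pd.DataFrame(
    {"rank": [1, 2], "revenues": [485873, 315199]},
    index=pd.Index(["Walmart", "State Grid"], name="company"),
)

name_before = demo.index.name   # the index axis starts out named "company"

# Remove the index axis name, as done for f500.
demo.index.name = None

# The column axis can carry a name too.
demo.columns.name = "metric"

print(name_before, demo.index.name, demo.columns.name)
```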

# 3. Using iloc to select by integer position

In [12]:
f500=pd.read_csv('f500.csv')

In [13]:
f500.head(3)

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523


### There are two differences with this approach:

* **The company column is now included as a regular column, instead of being used for the index.**
* **The index labels are now integers starting from 0.**

* This is the more conventional way to read in a dataframe.

* In some scenarios, using labels to make selections makes things easier; in others, it makes things harder.

* Just like in NumPy, we can also use integer positions to select data using `DataFrame.iloc[]` and `Series.iloc[]`. It's easy to get loc[] and iloc[] confused at first, but the easiest way to tell them apart is to remember the first letter of each method:

* loc: label based selection
* iloc: integer position based selection

**Using `iloc[]` is almost identical to indexing with NumPy, with integer positions starting at 0 like ndarrays and Python lists.**

`df.iloc[row_index, column_index]`
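To make the difference between labels and positions visible, the sketch below uses a hypothetical frame with string row labels (`a`, `b`, `c`), so the two never coincide. All variable names are illustrative.

```python
import pandas as pd

# Hypothetical frame: string row labels make labels and positions distinct.
demo = pd.DataFrame(
    {
        "company": ["Walmart", "State Grid", "Sinopec Group"],
        "rank": [1, 2, 3],
    },
    index=["a", "b", "c"],
)

# iloc: second row, first column, both given as integer positions.
by_position = demo.iloc[1, 0]

# loc: the same cell, addressed by row label and column label.
by_label = demo.loc["b", "company"]

# A single integer passed to iloc returns the whole row as a Series.
second_row = demo.iloc[1]

print(by_position, by_label)
print(second_row)
```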

## TODO
* Select just the fifth row of the f500 dataframe. Assign the result to fifth_row.
* Select the value in the first row of the company column. Assign the result to company_value.

In [14]:
fifth_row=f500.iloc[4]
company_value=f500.iloc[0,0]

In [15]:
fifth_row

company                                     Toyota Motor
rank                                                   5
revenues                                          254694
revenue_change                                       7.7
profits                                          16899.3
assets                                            437575
profit_change                                      -12.3
ceo                                          Akio Toyoda
industry                        Motor Vehicles and Parts
sector                            Motor Vehicles & Parts
previous_rank                                          8
country                                            Japan
hq_location                                Toyota, Japan
website                     http://www.toyota-global.com
years_on_global_500_list                              23
employees                                         364445
total_stockholder_equity                          157210
Name: 4, dtype: object

# 4. Using iloc to select by integer position continued

* With loc[], the end of the slice is included.
* With iloc[], the end of the slice is not included.

| Select by integer position | Explicit syntax | Shorthand convention |
|---|---|---|
| Single column from dataframe | `df.iloc[:,3]` | |
| List of columns from dataframe | `df.iloc[:,[3,5,6]]` | |
| Slice of columns from dataframe | `df.iloc[:,3:7]` | |
| Single row from dataframe | `df.iloc[20]` | |
| List of rows from dataframe | `df.iloc[[0,3,8]]` | |
| Slice of rows from dataframe | `df.iloc[3:5]` | `df[3:5]` |
| Single item from series | `s.iloc[8]` | `s[8]` |
| List of items from series | `s.iloc[[2,8,1]]` | `s[[2,8,1]]` |
| Slice of items from series | `s.iloc[5:10]` | `s[5:10]` |
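The inclusive and exclusive slice endpoints can be checked on a small example. The frame below is hypothetical, with string labels so that the label slice and the position slice are easy to compare.

```python
import pandas as pd

# Hypothetical five-row frame with string row labels.
demo = pd.DataFrame(
    {"rank": [1, 2, 3, 4, 5], "revenues": [10, 20, 30, 40, 50]},
    index=["a", "b", "c", "d", "e"],
)

# loc: the end label "d" is included, so this returns rows b, c, d.
label_slice = demo.loc["b":"d"]

# iloc: the end position 3 is excluded, so this returns positions 1 and 2.
position_slice = demo.iloc[1:3]

# A list of row positions combined with a slice of column positions.
rows_and_cols = demo.iloc[[0, 4], :1]

print(list(label_slice.index))
print(list(position_slice.index))
print(rows_and_cols)
```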

## TODO
* Select the first three rows of the f500 dataframe. Assign the result to first_three_rows.
* Select the first and seventh rows and the first five columns of the f500 dataframe. Assign the result to first_seventh_row_slice.

In [16]:
first_three_rows=f500.iloc[0:3]
first_seventh_row_slice=f500.iloc[[0,6],:5]

In [17]:
first_three_rows

Unnamed: 0,company,rank,revenues,revenue_change,profits,assets,profit_change,ceo,industry,sector,previous_rank,country,hq_location,website,years_on_global_500_list,employees,total_stockholder_equity
0,Walmart,1,485873,0.8,13643.0,198825,-7.2,C. Douglas McMillon,General Merchandisers,Retailing,1,USA,"Bentonville, AR",http://www.walmart.com,23,2300000,77798
1,State Grid,2,315199,-4.4,9571.3,489838,-6.2,Kou Wei,Utilities,Energy,2,China,"Beijing, China",http://www.sgcc.com.cn,17,926067,209456
2,Sinopec Group,3,267518,-9.1,1257.9,310726,-65.0,Wang Yupu,Petroleum Refining,Energy,4,China,"Beijing, China",http://www.sinopec.com,19,713288,106523


In [18]:
first_seventh_row_slice

Unnamed: 0,company,rank,revenues,revenue_change,profits
0,Walmart,1,485873,0.8,13643.0
6,Royal Dutch Shell,7,240033,-11.8,4575.0


# 5. Using pandas methods to create boolean masks

* Previously, we used Python comparison operators like >, <, and == to create boolean masks that select subsets of data. There are also a number of pandas methods that return boolean masks useful for exploring data.

* Two examples are the `Series.isnull()` method and `Series.notnull()` method. These can be used to select either rows that contain null (or NaN) values or rows that do not contain null values for a certain column.
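A minimal sketch of both methods on a hypothetical three-row frame. It assumes Uniper's `previous_rank` has already been converted from 0 to NaN, as was done for `f500` in section 1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; NaN marks a company with no previous rank.
demo = pd.DataFrame(
    {
        "company": ["Walmart", "Uniper", "Toyota Motor"],
        "previous_rank": [1.0, np.nan, 8.0],
    }
)

# isnull() returns a boolean Series: True where the value is missing.
null_mask = demo["previous_rank"].isnull()

# Use the mask to keep only the rows with a missing previous_rank.
missing = demo[null_mask]

# notnull() is the complement: rows that do have a previous_rank.
present = demo[demo["previous_rank"].notnull()]

print(missing)
print(present)
```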

## TODO
* Use the Series.isnull() method to select all rows from f500 that have a null value for the previous_rank column. Select only the company, rank, and previous_rank columns. Assign the result to null_previous_rank.

In [33]:
null_previous_rank = f500[f500["previous_rank"].isnull()][["company","rank", "previous_rank"]]

In [34]:
null_previous_rank
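# The result is empty because f500 was re-read from the CSV in In [12],
# which discarded the 0 -> NaN replacement made in In [4].
# previous_rank holds 0 again, not NaN, so isnull() matches no rows.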

Unnamed: 0,company,rank,previous_rank


In [35]:
zero_previous_rank = f500[f500["previous_rank"]==0][["company","rank", "previous_rank"]]

In [36]:
zero_previous_rank

Unnamed: 0,company,rank,previous_rank
48,Legal & General Group,49,0
90,Uniper,91,0
123,Dell Technologies,124,0
138,Anbang Insurance Group,139,0
140,Albertsons Cos.,141,0
180,Hewlett Packard Enterprise,181,0
267,Hengli Group,268,0
271,Johnson Controls International,272,0
341,Chubb,342,0
375,Charter Communications,376,0


# 6. Working with Integer Labels