Reference links:
    
http://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/#comment-463

In [None]:
import pandas as pd
import random

In [42]:
# read the data from the downloaded CSV file.
data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')
# set a numeric id for use as an index for examples.
data['id'] = [random.randint(0,1000) for x in range(data.shape[0])]
data.head(7)

Unnamed: 0,first_name,last_name,company_name,address,city,county,postal,phone1,phone2,email,web,id
0,Aleshia,Tomkiewicz,Alan D Rosenburg Cpa Pc,14 Taylor St,St. Stephens Ward,Kent,CT2 7PP,01835-703597,01944-369967,atomkiewicz@hotmail.com,http://www.alandrosenburgcpapc.co.uk,810
1,Evan,Zigomalas,Cap Gemini America,5 Binney St,Abbey Ward,Buckinghamshire,HP11 2AX,01937-864715,01714-737668,evan.zigomalas@gmail.com,http://www.capgeminiamerica.co.uk,97
2,France,Andrade,"Elliott, John W Esq",8 Moor Place,East Southbourne and Tuckton W,Bournemouth,BH6 3BE,01347-368222,01935-821636,france.andrade@hotmail.com,http://www.elliottjohnwesq.co.uk,846
3,Ulysses,Mcwalters,"Mcmahan, Ben L",505 Exeter Rd,Hawerby cum Beesby,Lincolnshire,DN36 5RP,01912-771311,01302-601380,ulysses@hotmail.com,http://www.mcmahanbenl.co.uk,445
4,Tyisha,Veness,Champagne Room,5396 Forth Street,Greets Green and Lyng Ward,West Midlands,B70 9DT,01547-429341,01290-367248,tyisha.veness@hotmail.com,http://www.champagneroom.co.uk,846
5,Eric,Rampy,"Thompson, Michael C Esq",9472 Lind St,Desborough,Northamptonshire,NN14 2GH,01969-886290,01545-817375,erampy@rampy.co.uk,http://www.thompsonmichaelcesq.co.uk,984
6,Marg,Grasmick,Wrangle Hill Auto Auct & Slvg,7457 Cowl St #70,Bargate Ward,Southampton,SO14 3TY,01865-582516,01362-620532,marg@hotmail.com,http://www.wranglehillautoauctslvg.co.uk,931


Selecting pandas data using “iloc”

The iloc indexer for Pandas Dataframe is used for integer-location based indexing / selection by position.

The iloc indexer syntax is data.iloc[<row selection>, <column selection>], which is sure to be a source of confusion for R users. “iloc” in pandas selects rows and columns by number, in the order that they appear in the data frame. You can imagine that each row has a row number from 0 to the total number of rows minus one (data.shape[0] - 1), and iloc[] allows selections based on these numbers. The same applies to columns (ranging from 0 to data.shape[1] - 1).

There are two “arguments” to iloc – a row selector, and a column selector. 
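As a minimal sketch of the positional numbering described above, the following uses a small made-up frame (the names df, n_rows and n_cols are illustrative, not part of the uk-500 data) whose index labels are strings, so positions and labels cannot be confused:

```python
import pandas as pd

# Toy frame (hypothetical data) with string row labels, so that
# positions and labels are clearly different things.
df = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1.5, 2.5, 3.5, 4.5], "c": list("wxyz")},
    index=["r0", "r1", "r2", "r3"],
)

n_rows, n_cols = df.shape          # 4 rows, 3 columns

first_row = df.iloc[0, :]          # row at position 0
last_row = df.iloc[n_rows - 1, :]  # same row as df.iloc[-1, :]
second_col = df.iloc[:, 1]         # column at position 1, i.e. "b"
cell = df.iloc[2, 0]               # row position 2, column position 0

print(cell)

# Valid row positions stop at n_rows - 1; position n_rows is out of bounds.
try:
    df.iloc[n_rows]
except IndexError:
    print("position", n_rows, "is out of bounds")
```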


In [None]:
# first row of data frame
print(data.iloc[0,:])
# last row of data frame (Mi Richan)
print(data.iloc[-1,:])
# second column of data frame (last_name)
print(data.iloc[:,1])
#first 5 rows and 5th, 6th, 7th columns of data frame
print(data.iloc[0:5,4:7])
# 1st, 4th, 7th, 25th row + 1st 6th 7th columns.
print(data.iloc[[0,3,6,24],[0,5,6]])

There are two gotchas to remember when using iloc in this manner:

1) Note that .iloc returns a Pandas Series when a single row or a single column is selected with an integer (e.g. data.iloc[0] or data.iloc[:,1]), and a Pandas DataFrame when multiple rows or columns are selected with a list or a slice. To counter this, pass a single-valued list if you require DataFrame output, e.g. data.iloc[[0]].
2) When selecting multiple columns or multiple rows with a slice, e.g. [1:5], the rows/columns selected run from the first number to one less than the second number: [1:5] gives 1,2,3,4, and in general [x:y] goes from x to y-1.
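Both gotchas can be checked on a small made-up frame. This is only a sketch; df and the variable names are illustrative and not part of the uk-500 data:

```python
import pandas as pd

# Toy frame (hypothetical data) with the default 0..5 index.
df = pd.DataFrame({"a": [10, 20, 30, 40, 50, 60], "b": list("uvwxyz")})

# Gotcha 1: an integer selector returns a Series,
# a single-valued list returns a DataFrame.
one_row = df.iloc[1]           # Series
one_row_df = df.iloc[[1]]      # DataFrame with one row
one_col = df.iloc[:, 0]        # Series
one_col_df = df.iloc[:, [0]]   # DataFrame with one column

print(type(one_row).__name__, type(one_row_df).__name__)

# Gotcha 2: slices are half-open, so 1:5 covers positions 1, 2, 3, 4.
chunk = df.iloc[1:5]
print(list(chunk["a"]))
```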

Selecting pandas data using “loc”

The Pandas loc indexer can be used with DataFrames for two different use cases:

    a.) Selecting rows by label/index
    b.) Selecting rows with a boolean / conditional lookup
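The two use cases can be sketched on a small made-up frame whose integer labels deliberately differ from the row positions (df and its values are illustrative only). It also shows that label slices in .loc include both endpoints, unlike positional slices in .iloc:

```python
import pandas as pd

# Toy frame (hypothetical data): labels 10..40 are NOT the positions 0..3.
df = pd.DataFrame(
    {"city": ["Kent", "Leeds", "York", "Bath"], "score": [3, 8, 5, 9]},
    index=[10, 20, 30, 40],
)

# a.) Selection by label
by_label = df.loc[20]          # the row whose label is 20
by_position = df.iloc[1]       # the row at position 1 (the same row here)

label_slice = df.loc[20:30]    # labels 20 and 30: both endpoints included
position_slice = df.iloc[1:2]  # position 1 only: end is excluded

# b.) Boolean / conditional lookup, keeping one column as a DataFrame
high = df.loc[df["score"] > 4, ["city"]]
print(high)
```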

In [None]:
#Label-based / Index-based indexing using .loc
data.set_index("last_name", inplace=True)
data.head()

In [None]:
# returns a Series (assuming the label 'Andrade' appears only once in the index)
print(data.loc['Andrade'])
# passing a single-valued list returns a DataFrame instead
print(data.loc[['Andrade']])
# select multiple rows and columns
print(data.loc[['Andrade','Veness'],['city','county','postal']])
# Select rows with index values 'Andrade' to 'Veness', with all columns between 'city' and 'email'
# (label slices include both endpoints)
print(data.loc['Andrade':'Veness','city':'email'])
#resetting the index
data.reset_index(inplace=True)
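# Change the index to be based on the 'id' column
data1 = data.set_index('id')
# .loc looks up the label 63. The ids are random, so there may be no row with
# id 63 (KeyError) or several rows (a DataFrame is returned), hence the check.
if 63 in data1.index:
    print(data1.loc[63])
# Note the difference with iloc, which is positional: this returns a Series
# for the row at position 63 (the 64th row), whatever its id is.
print(data1.iloc[63])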

In [None]:
#Boolean / Logical indexing using .loc
#select company_name,email and phone for first name Erasmo
print(data.loc[data['first_name']=='Erasmo',['first_name','company_name','email','phone1']])
# here again, if a single column is required and we want it as a DataFrame rather than a Series, pass it as a single-valued list
print(data.loc[data['first_name']=='Erasmo',['email']])
# Select rows with first name Antonio, and all columns between 'city' and 'email'
print(data.loc[data['first_name'] == 'Antonio', 'city':'email'])
# Select rows where the email column ends with 'hotmail.com', and all columns between 'city' and 'email'
print(data.loc[data['email'].str.endswith("hotmail.com"),'city':'email'])
# Select rows with first_name equal to one of several values, all columns
print(data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])])
# Select rows with first name Antonio AND gmail email addresses
print(data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')])
# select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
print(data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']])
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
print(data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)]) 
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified
# (the column is called 'company_name'; 'company' would raise a KeyError):
print(data.loc[idx, ['email', 'first_name', 'company_name']])
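One caveat with the lambda above is that x.split fails if a company name is missing (NaN). As an alternative sketch, not the approach used in the cells above, the same mask can be built with the vectorised .str accessor, which propagates missing values instead of raising. The frame below is made-up toy data:

```python
import pandas as pd

# Toy frame (hypothetical data) with one missing company name.
df = pd.DataFrame(
    {
        "first_name": ["Ann", "Bob", "Cy"],
        "company_name": ["Alan D Rosenburg Cpa", "Champagne Room", None],
    }
)

# Vectorised word count: a missing name yields NaN, and NaN == 4 is False,
# so rows without a company name are simply not selected.
idx = df["company_name"].str.split(" ").str.len() == 4

print(df.loc[idx, ["first_name", "company_name"]])
```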

Selecting pandas data using “ix”

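The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just as the .loc indexer. However, .ix also supports integer-position selections (as in .iloc) when it is passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc. Note that .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so the cell below runs only on older versions; on current versions use .loc and .iloc instead.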
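On pandas versions without .ix, the same mixed label/position selections can be written by translating one side and then using .loc or .iloc. The sketch below uses a three-row toy frame built from values in the head() output above; the names col_label and col_pos are illustrative:

```python
import pandas as pd

# Toy frame: a few values copied from the first three uk-500 rows,
# indexed by first_name.
df = pd.DataFrame(
    {
        "last_name": ["Tomkiewicz", "Zigomalas", "Andrade"],
        "city": ["St. Stephens Ward", "Abbey Ward", "East Southbourne and Tuckton W"],
        "postal": ["CT2 7PP", "HP11 2AX", "BH6 3BE"],
    },
    index=pd.Index(["Aleshia", "Evan", "France"], name="first_name"),
)

# Row by label, column by position (what df.ix['France', 2] used to do):
col_label = df.columns[2]                  # translate position -> label
print(df.loc["France", col_label])

# Row by position, column by label (what df.ix[2, 'last_name'] used to do):
col_pos = df.columns.get_loc("last_name")  # translate label -> position
print(df.iloc[2, col_pos])
```

Translating explicitly makes it clear at each call site whether a number means a label or a position, which is the ambiguity that .ix carried.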

In [47]:
# Select the third cell (column position 2) in the row labelled 'Marg',
# so first set first_name as the index.
data.reset_index(inplace=True)
data.set_index('first_name',inplace = True)
print(data.ix['Marg',2])
# Select the 3rd cell (row position 2) in the column last_name
print(data.ix[2,'last_name'])

7457 Cowl St #70
Andrade
