## Indexing the columns in a DataFrame
A column in a DataFrame can be accessed as a Series either by dict-like notation or by attribute.

In [1]:
import pandas as pd

In [2]:
data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]
    }

frame = pd.DataFrame(data)

In [3]:
frame


| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2001 | 2.4 |
| 4 | Nevada | 2002 | 2.9 |
| 5 | Nevada | 2003 | 3.2 |


### Indexing a single column from a dataframe
Select a single column with dict-like notation:<br>
<i>frame['column_name']</i><br><br>
Indexing a single column returns a Series object.<br>
You can verify this with the built-in type() function,<br>
which reports "pandas.core.series.Series".

In [4]:
#Indexing the dataframe column with a dict-like notation
frame['state']

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object

In [5]:
print(frame['year'])
type(frame['year'])

0    2000
1    2001
2    2002
3    2001
4    2002
5    2003
Name: year, dtype: int64


pandas.core.series.Series
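
The introduction mentions attribute access, which the cells above do not show. The following is a minimal sketch on a shortened copy of the same data (the variable names `same` and `pop_attr_is_method` are only for illustration). It also shows a caveat: a column named `pop` shares its name with the `DataFrame.pop` method, so attribute access does not reach that column.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Nevada'],
    'year': [2000, 2001, 2001],
    'pop': [1.5, 1.7, 2.4],
})

# Attribute access gives the same Series as dict-like access
same = frame.year.equals(frame['year'])
print(same)

# 'pop' collides with the DataFrame.pop method, so frame.pop is the
# bound method, not the column; frame['pop'] still returns the column
pop_attr_is_method = callable(frame.pop)
print(pop_attr_is_method)
print(type(frame['pop']).__name__)
```

For this reason, dict-like notation is the safer choice whenever a column name might clash with a DataFrame attribute or is not a valid Python identifier.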

### Indexing multiple columns 
We can index multiple columns by passing a list of column names: <br>

frame[['$\lt$column1$\gt$', '$\lt$column2$\gt$']] <br>

<i>Note: Unlike single-column indexing, indexing with a list of columns returns a DataFrame.</i>

In [10]:
frame[['state' ,'pop']]

| | state | pop |
|---|---|---|
| 0 | Ohio | 1.5 |
| 1 | Ohio | 1.7 |
| 2 | Ohio | 3.6 |
| 3 | Nevada | 2.4 |
| 4 | Nevada | 2.9 |
| 5 | Nevada | 3.2 |


In [11]:
type(frame[['state', 'pop']])

pandas.core.frame.DataFrame
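
What decides the return type is whether the key is a list, not how many columns it names. As a small sketch on a shortened frame (the names `as_series` and `as_frame` are illustrative), a one-element list still produces a DataFrame:

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Nevada'],
    'pop': [1.5, 2.4],
})

as_series = frame['state']    # string key -> Series
as_frame = frame[['state']]   # one-element list -> DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```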

## Indexing the rows in a DataFrame

To index the rows of a DataFrame, we can use the <b>loc and iloc</b> indexers.


## The loc and iloc indexers

The loc and iloc indexers primarily select rows, but they can also select columns in the same call. <br>

The loc indexer selects rows by label, while iloc selects them by integer position.<br>

<i><strong> Both indexers take the row selection first and, optionally, the column selection second. </strong></i>

Syntax:<br>
df.iloc[row_index, column_index] <br>
If no column index is specified, all columns are returned,<br>
which is equivalent to passing ":" as the column index.
<br><br>


To select multiple rows and multiple columns, pass a list for each:<br>
df.iloc[[r1, r5], [c1, c3, c5]]


<b>NOTE:</b> <br>
Instead of a list, either position can also take a slice:<br>
df.iloc[start:stop, start:stop]
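
The syntax rules above can be sketched on a small frame (the frame and variable names here are illustrative). The example checks that omitting the column index matches passing ":", and it shows one difference worth knowing when slicing: iloc excludes the stop position, as Python lists do, while loc includes the stop label.

```python
import pandas as pd

df = pd.DataFrame(
    {'first': ['Corey', 'Jane', 'John'],
     'last': ['Schafer', 'Doe', 'Randall']},
    index=['a', 'b', 'c'],
)

# Omitting the column index is equivalent to passing ':'
row_default = df.iloc[0]
row_explicit = df.iloc[0, :]
print(row_default.equals(row_explicit))

# iloc slices exclude the stop position: rows 0-1, column 0 only
by_position = df.iloc[0:2, 0:1]
# loc slices include the stop label: rows 'a'-'b', columns 'first'-'last'
by_label = df.loc['a':'b', 'first':'last']

print(by_position.shape)
print(by_label.shape)
```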





In [17]:
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Randall"],
    "email": ["corey@gmail.com", "jane@gmail.com", "john@gmail.com"]
}

my_df = pd.DataFrame(people)
my_df

| | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | corey@gmail.com |
| 1 | Jane | Doe | jane@gmail.com |
| 2 | John | Randall | john@gmail.com |


## Using the iloc indexer
The iloc indexer selects rows by their implicit index,<br>
that is, their integer position in range(n).

<b> Important Note: </b><br>
Just like for columns, accessing a single row from the DataFrame returns a Series object. <br>
Accessing multiple rows returns a DataFrame.

### Indexing a single row 

In [19]:
my_df.iloc[0]

first              Corey
last             Schafer
email    corey@gmail.com
Name: 0, dtype: object

In [20]:
type(my_df.iloc[0])

pandas.core.series.Series

### Indexing multiple rows

Just like for columns, pass a list of integer positions (for iloc) or row labels (for loc).

In [24]:
my_df.iloc[[0, 1]]  # returns the rows at positions 0 and 1 with all their columns

| | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | corey@gmail.com |
| 1 | Jane | Doe | jane@gmail.com |


In [26]:
#Specifying which columns to show
my_df.iloc[[0, 1], 2]  # returns the rows at positions 0 and 1 for the column at position 2, i.e. email

0    corey@gmail.com
1     jane@gmail.com
Name: email, dtype: object
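
The result above is a Series because the column position was given as a single integer. As a brief sketch (variable names are illustrative), wrapping that position in a list keeps the result a DataFrame:

```python
import pandas as pd

my_df = pd.DataFrame({
    'first': ['Corey', 'Jane', 'John'],
    'last': ['Schafer', 'Doe', 'Randall'],
    'email': ['corey@gmail.com', 'jane@gmail.com', 'john@gmail.com'],
})

one_col_series = my_df.iloc[[0, 1], 2]    # scalar column position -> Series
one_col_frame = my_df.iloc[[0, 1], [2]]   # list of positions -> DataFrame

print(type(one_col_series).__name__, type(one_col_frame).__name__)
print(list(one_col_frame.columns))
```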

In [32]:
my_df.iloc[:2,:]

| | first | last | email |
|---|---|---|---|
| 0 | Corey | Schafer | corey@gmail.com |
| 1 | Jane | Doe | jane@gmail.com |


## Using the loc indexer
The loc indexer selects rows and columns by their explicit labels.
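
The difference between labels and positions is easiest to see when the row labels are integers that do not match the row order. The following sketch uses a hypothetical one-column frame with a shuffled integer index:

```python
import pandas as pd

df = pd.DataFrame({'first': ['Corey', 'Jane', 'John']}, index=[2, 0, 1])

by_label = df.loc[0, 'first']    # the row whose label is 0
by_position = df.iloc[0, 0]      # the first row, first column

print(by_label)
print(by_position)
```

Here `loc` finds the row labelled 0 (Jane), while `iloc` takes whichever row comes first (Corey).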


In [33]:
my_df2 = pd.DataFrame(people, index = ['a','b','c'])
my_df2

| | first | last | email |
|---|---|---|---|
| a | Corey | Schafer | corey@gmail.com |
| b | Jane | Doe | jane@gmail.com |
| c | John | Randall | john@gmail.com |


In [35]:
#Indexing a single row
my_df2.loc['a']

first              Corey
last             Schafer
email    corey@gmail.com
Name: a, dtype: object

In [36]:
#Indexing multiple rows
#pass a list of row indices (the explicit ones)
my_df2.loc[['a','b']]

| | first | last | email |
|---|---|---|---|
| a | Corey | Schafer | corey@gmail.com |
| b | Jane | Doe | jane@gmail.com |


In [37]:
#Filtering the columns
#pass the column label(s) as the second argument

my_df2.loc[['a','b'],'email']

a    corey@gmail.com
b     jane@gmail.com
Name: email, dtype: object

In [41]:
#Specifying multiple columns
#Pass a list of column labels
my_df2.loc[:, ['email','last']]

| | email | last |
|---|---|---|
| a | corey@gmail.com | Schafer |
| b | jane@gmail.com | Doe |
| c | john@gmail.com | Randall |
