# Research - Not Sure...
I need to practice the data frame indexing, a lot! Is the following true?

Single brackets:

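* Single brackets with a name = column name - this returns a series, not a data frame
* Single brackets with a slice (e.g. `0:1`) = row positions - this returns a data frame, not a series. A bare integer such as `df[0]` is looked up as a column name, so it raises a KeyError when the columns are named with strings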

Double brackets:

* Double brackets (a list inside the brackets) - always produces a data frame.
* The list is always read as column names - `df[[0]]` looks for a column called `0`, not row 0. For rows, use a slice, or loc / iloc

Loc and iloc:

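* loc - refers to locations by label (index and column names); label slices include the end point
* iloc - refers to locations by integer position (hence **i**loc); slices exclude the end point, as in plain Python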
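
One way to check rules like these is to ask pandas for the type of each result. The sketch below builds a small frame of its own; the names `demo`, `kinds` and `bare_int` are illustrative and not part of the notebook.

```python
import pandas as pd

# Small illustrative frame with string column names and a default integer index
demo = pd.DataFrame({"ints": [0, 1, 2, 3], "letters": ["a", "b", "c", "d"]})

# Record the type name returned by each indexing form
kinds = {
    "demo['ints']": type(demo["ints"]).__name__,            # column label
    "demo[['ints']]": type(demo[["ints"]]).__name__,        # list of column labels
    "demo[0:1]": type(demo[0:1]).__name__,                  # slice selects rows
    "demo.iloc[:, 0]": type(demo.iloc[:, 0]).__name__,      # single column by position
    "demo.iloc[:, [0]]": type(demo.iloc[:, [0]]).__name__,  # list of positions
}

# A bare integer in single brackets is treated as a column label
try:
    demo[0]
    bare_int = "ok"
except KeyError:
    bare_int = "KeyError"

for expression, kind in kinds.items():
    print(f"{expression:20} -> {kind}")
print("demo[0]              ->", bare_int)
```

Only the list forms and the row slice give back a data frame; everything that names a single column gives back a series.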


In [2]:
import pandas as pd

# Prepare our columns as lists
ints = list(range(4))
letters = ["a", "b", "c", "d"]

# Add them to a dict:
# each key is a proposed column name,
# each value is the list of values for that column
d = {'ints': ints,
     'letters': letters}

df = pd.DataFrame(d)

In [4]:
df

   ints letters
0     0       a
1     1       b
2     2       c
3     3       d


In [26]:
# If you select a single column, it returns a series
df['ints']

0    0
1    1
2    2
3    3
Name: ints, dtype: int64

In [28]:
# You can select more than one column, by placing them in a list
df[['ints', 'letters']]

   ints letters
0     0       a
1     1       b
2     2       c
3     3       d


In [36]:
# You can also 'trick' it into returning a single column as a data frame
# Use a list as the key, but with only a single column name in it
df[['ints']]

   ints
0     0
1     1
2     2
3     3
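
Besides the single-item list, a series can also be promoted with `Series.to_frame()`. A minimal sketch, using an illustrative frame called `demo` rather than the notebook's `df`:

```python
import pandas as pd

demo = pd.DataFrame({"ints": [0, 1, 2, 3], "letters": ["a", "b", "c", "d"]})

as_series = demo["ints"]                 # single label -> Series
via_list = demo[["ints"]]                # one-item list -> DataFrame
via_to_frame = demo["ints"].to_frame()   # Series promoted to a DataFrame

# Both routes should give the same one-column frame
same = via_list.equals(via_to_frame)
print(type(as_series).__name__, type(via_list).__name__, type(via_to_frame).__name__)
print("identical frames:", same)
```

The list form is shorter when starting from the frame; `to_frame()` is handy when a series is all you have left.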


In [37]:
# You can select a single column, and turn it into a list
df['ints'].tolist()

[0, 1, 2, 3]

In [38]:
# This doesn't work for dataframes, even single column data frames
df[['ints']].tolist()

AttributeError: 'DataFrame' object has no attribute 'tolist'
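
If a list is still wanted from a data frame, there are a few routes that do work. The sketch below is illustrative only (`demo`, `flat`, `nested` and `by_column` are made-up names), and which form is appropriate depends on the shape of list you are after.

```python
import pandas as pd

demo = pd.DataFrame({"ints": [0, 1, 2, 3], "letters": ["a", "b", "c", "d"]})
one_col = demo[["ints"]]  # a single-column DataFrame

# Option 1: drop back to a Series first, then convert
flat = one_col["ints"].tolist()

# Option 2: go through NumPy; this gives one inner list per row
nested = one_col.to_numpy().tolist()

# Option 3: a dict mapping each column name to a list of its values
by_column = demo.to_dict("list")

print(flat)
print(nested)
print(by_column)
```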

In [35]:
# A row slice always returns a data frame, even for a single row.
# Rows are selected with a slice - if you want a single row,
# slice accordingly
df[0:1]

# NOTE: each column can have a different type, so a row can't be
# assumed to hold values of a single type; only a data frame can
# act as a 'matrix' whose columns have different types.

   ints letters
0     0       a
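
The point about mixed column types can be seen by comparing the dtype of a column with the dtype of a row pulled out as a series. A small sketch with an illustrative frame `demo`; the exact dtype reported for the string column may vary between pandas versions, so it is not checked here.

```python
import pandas as pd

demo = pd.DataFrame({"ints": [0, 1, 2, 3], "letters": ["a", "b", "c", "d"]})

# Each column keeps its own dtype
column_dtype = str(demo["ints"].dtype)

# A single row as a Series has to hold an int and a string together,
# so pandas falls back to the generic object dtype
row = demo.iloc[0]
row_dtype = str(row.dtype)

# The one-row slice stays a DataFrame, so the per-column dtypes survive
sliced_dtype = str(demo[0:1]["ints"].dtype)

print(column_dtype, row_dtype, sliced_dtype)
```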


In [39]:
# Select all rows
df[:]

   ints letters
0     0       a
1     1       b
2     2       c
3     3       d


In [46]:
# You can chain the indexing strategies, like indexing a list of lists,
# to subset rows and then columns
df[0:1]['ints']

0    0
Name: ints, dtype: int64

In [43]:
# Multiple columns, or a single column in a list,
# also return a data frame
df[0:2][['ints']]

   ints
0     0
1     1


In [47]:
# The order of the chained indexes isn't important.
# A name (or list of names) is read as columns, and
# an integer slice as rows.
df[['ints']][0:2]

   ints
0     0
1     1
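
The two chained forms can be compared directly, along with single-call equivalents using iloc and loc. This is a sketch on an illustrative frame `demo`. Note the slice end points: the loc slice is by label and includes its end, so `0:1` there covers the same two rows as `0:2` elsewhere.

```python
import pandas as pd

demo = pd.DataFrame({"ints": [0, 1, 2, 3], "letters": ["a", "b", "c", "d"]})

chained_a = demo[0:2][["ints"]]      # rows first, then columns
chained_b = demo[["ints"]][0:2]      # columns first, then rows

by_position = demo.iloc[0:2, [0]]    # positional slice, end excluded
by_label = demo.loc[0:1, ["ints"]]   # label slice, end included

print(chained_a.equals(chained_b))
print(chained_a.equals(by_position))
print(chained_a.equals(by_label))
```

The single-call forms avoid building an intermediate frame, which also matters if you later want to assign into the selection.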


In [None]:
# You can use loc and iloc to provide comma based i and j subsetting, like R.

In [53]:
# If you want row, col - you must use loc or iloc.
# Plain brackets with a comma, df[[1,2], ['ints']], raise an error;
# loc accepts the same pair
df.loc[[1,2], ['ints']]

   ints
1     1
2     2


In [69]:
# Let's change rownames to prove a point about using integer
# row references in loc
rows = ["one", "two", "three", "four"]
df['rows'] = rows
df = df.set_index('rows')
df

       ints letters
rows
one       0       a
two       1       b
three     2       c
four      3       d


In [70]:
# This no longer works with loc - our index is not default integers
df.loc[[1,2], ['ints']]

KeyError: 'None of [[1, 2]] are in the [index]'
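
With a string index, the same two rows are still reachable: by label through loc, or by position through iloc. A minimal sketch on an illustrative frame `demo` (its index is left unnamed here, unlike the notebook's `rows` index):

```python
import pandas as pd

demo = pd.DataFrame(
    {"ints": [0, 1, 2, 3], "letters": ["a", "b", "c", "d"]},
    index=["one", "two", "three", "four"],
)

by_label = demo.loc[["two", "three"], ["ints"]]   # labels that exist in the index
by_position = demo.iloc[[1, 2], [0]]              # positions, independent of labels

# Integers are not labels in this index, so loc cannot find them
try:
    demo.loc[[1, 2], ["ints"]]
    outcome = "ok"
except KeyError:
    outcome = "KeyError"

print(by_label.equals(by_position), outcome)
```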

In [71]:
# Same rules apply - if we select a single row with a bare label,
# it will return a series unless we coerce both i and j
# to lists
df.loc['one', ['ints']]

ints    0
Name: one, dtype: object

In [73]:
# Produce a data frame - even though it's a single value!
df.loc[['two'], ['ints']]

      ints
rows
two      1


In [78]:
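# Let's try iloc now. Normal Python index and slice rules apply:
# the end is excluded, and a slice that runs past the last column
# (0:3 here, with only two columns) is simply cut short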
df.iloc[0:2, 0:3]

      ints letters
rows
one      0       a
two      1       b


In [79]:
# If we select a single column with a bare integer (even a column
# holding just one value!) we get a series back
df.iloc[:,0]

rows
one      0
two      1
three    2
four     3
Name: ints, dtype: int64

In [81]:
# You can force a data frame to be returned with a list for the columns,
# even if it contains a single column
df.iloc[:,[0]]

       ints
rows
one       0
two       1
three     2
four      3


In [None]:
# So iloc is useful with groups of columns
# It is also useful if the row index is named
# But the sweet spot for me is named columns and an integer index
# (i.e. don't declare an existing column as an index)
# Then you can name the columns explicitly, and use numeric
# grouping on the rows, e.g. to perform split-map-reduce problems.
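
A rough sketch of what that split-map-reduce pattern could look like with named columns and a default integer index. The frame `demo` and the `chunk_size` of 2 are assumptions chosen for illustration, and summing is just a stand-in for whatever the map step really does.

```python
import pandas as pd

demo = pd.DataFrame({"ints": [0, 1, 2, 3], "letters": ["a", "b", "c", "d"]})
chunk_size = 2  # hypothetical block size

# Split: consecutive blocks of rows, selected by integer position
chunks = [
    demo.iloc[start:start + chunk_size]
    for start in range(0, len(demo), chunk_size)
]

# Map: one partial result per chunk, naming the column explicitly
partial_sums = [int(chunk["ints"].sum()) for chunk in chunks]

# Reduce: combine the partial results
total = sum(partial_sums)

# The same split expressed with groupby on the integer index
grouped_sums = demo.groupby(demo.index // chunk_size)["ints"].sum().tolist()

print(partial_sums, total, grouped_sums)
```

The `demo.index // chunk_size` trick only works because the index holds plain integers, which is the argument for not turning an existing column into the index.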
