# Indexing and selecting data

This section focuses on selecting subsets of Series and DataFrame objects. Python's standard indexing tools ([] and attribute access) work and are convenient interactively, but they are not optimized for the job. For production code, pandas recommends its own optimized data access methods.

# Different choices for indexing

Pandas supports the following kinds of multi-axis indexing:
    .loc[] is primarily label based, but it also accepts a boolean array. It raises a KeyError when a requested item is not found. Valid inputs include:

            a single label (e.g. 'avocado', or 3, which is interpreted as an index label and not as a position)

            a list or array of labels (e.g. ['avocado', 'banana'])

            a slice object with labels (e.g. 'avocado':'banana'); unlike normal Python slicing, both the start and the stop are included

            a boolean array

            a callable function with a single argument that returns a valid indexing output from the above list

    .iloc[] is primarily integer-position based (from 0 to length - 1 of the axis), but it also accepts a boolean array. It raises an IndexError when an indexer is out of bounds, except for slice indexers, which allow out-of-bounds values. Valid inputs include:

            an integer (e.g. 9)

            a list or array of integers (e.g. [3, 6, 2])

            a slice object with ints (e.g. 0:3)

            a boolean array

            a callable function with a single argument that returns a valid indexing output from the above list

    .loc[], .iloc[], and [] all accept callable functions as indexers
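
A minimal sketch of the label/position distinction, using a hypothetical Series (not part of the notebook) whose integer labels differ from its positions. .loc reads 3 as a label and includes the stop of a slice; .iloc reads integers as positions and excludes the stop.

```python
import pandas as pd

# hypothetical Series whose integer labels (0, 3, 6, 9) differ from positions (0, 1, 2, 3)
s = pd.Series([10, 20, 30, 40], index=[0, 3, 6, 9])

by_label = s.loc[3]           # label 3 -> 20
by_position = s.iloc[3]       # position 3 -> 40

label_slice = s.loc[3:6]      # labels 3 through 6, stop included -> [20, 30]
position_slice = s.iloc[1:3]  # positions 1 and 2, stop excluded -> [20, 30]

# both accessors also accept a callable that receives s and returns a valid indexer
mask_pick = s.loc[lambda x: x > 15]           # boolean mask -> [20, 30, 40]
pos_pick = s.iloc[lambda x: [0, len(x) - 1]]  # list of positions -> [10, 40]

print(by_label, by_position)
print(label_slice.tolist(), position_slice.tolist())
print(mask_pick.tolist(), pos_pick.tolist())
```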

When working with multiple axes, the following notation applies. A null slice (':') can be used for any axis, and trailing null slices can be left out (e.g. df.loc['b'] is equivalent to df.loc['b', :]).

For a Series object the format is s.loc[indexer].
For a DataFrame object the format is df.loc[row_indexer, column_indexer].
For a Panel object the format is p.loc[item_indexer, major_indexer, minor_indexer] (Panel was deprecated in pandas 0.20 and removed in 0.25).
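
A short sketch of the notation with a hypothetical two-row DataFrame: leaving out the trailing indexer behaves like passing a null slice. Panel is not shown, since current pandas no longer provides it.

```python
import pandas as pd

# hypothetical two-row frame used only for this illustration
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['a', 'b'])

row_short = df.loc['b']      # row indexer only
row_full = df.loc['b', :]    # row indexer plus an explicit null slice for the columns
column = df.loc[:, 'y']      # null slice for the rows, label for the column

s = df['x']
value = s.loc['a']           # a Series takes a single indexer

print(row_short.equals(row_full))  # True
print(column.tolist(), value)      # [3, 4] 1
```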

# Basics

The primary function of indexing with [] is to select lower-dimensional slices:
    for a series, series[label] returns a scalar value
    
    for a dataframe, df[colname] returns a series matching the colname
    
    for a panel, panel[itemname], returns a dataframe matching the itemname
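
The same lower-dimensional behavior can be seen without a Panel. This sketch uses a small hypothetical DataFrame, so it also runs on pandas versions that no longer include Panel:

```python
import pandas as pd

# hypothetical data for this illustration
df = pd.DataFrame({'happy': [0.5, -1.2], 'sad': [0.1, 0.9]}, index=['a', 'b'])

col = df['happy']            # DataFrame[colname] -> Series
value = col['b']             # Series[label] -> scalar
both = df[['sad', 'happy']]  # a list of column names keeps a DataFrame

print(type(col).__name__, value, type(both).__name__)  # Series -1.2 DataFrame
```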

In [1]:
#importing modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
#constructing a simple dataframe to work with
index = list('abcde')
df = pd.DataFrame(np.random.randn(5, 3), index = index, columns = ['happy', 'sad', 'angry'])
df

      happy       sad     angry
a  0.922165  0.456259 -1.444831
b -1.411568 -0.834025 -0.840879
c -0.395592  0.918014  0.154386
d  0.391592 -0.315628 -0.547479
e  1.159844 -0.414914 -0.379612


In [3]:
#constructing a panel (pd.Panel was removed in pandas 0.25, so this cell needs an older version)
panel = pd.Panel({'alpha': df, 'beta' : df - df['angry'].mean()})
panel

<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 5 (major_axis) x 3 (minor_axis)
Items axis: alpha to beta
Major_axis axis: a to e
Minor_axis axis: happy to angry

In [4]:
#selecting a dataframe from the panel, a series from that dataframe, then slicing the series by position
df1 = panel['beta']
s = df1['happy']
s[2:4]

c    0.216091
d    1.003275
Name: happy, dtype: float64

In [5]:
#we can also select multiple columns by passing a list inside [] (hence the double brackets)
s = df[['angry', 'sad']]
s[0:3]

      angry       sad
a -1.444831  0.456259
b -0.840879 -0.834025
c  0.154386  0.918014


This same process can also be used to set multiple columns.

In [6]:
df[['sad', 'angry']] = df1[['sad', 'angry']]
df

      happy       sad     angry
a  0.922165  1.067942 -0.833148
b -1.411568 -0.222342 -0.229196
c -0.395592  1.529697  0.766069
d  0.391592  0.296055  0.064205
e  1.159844  0.196769  0.232071


This can be useful for applying in-place transformations to a subset of columns. However, pandas aligns all axes when setting Series and DataFrame objects through .loc and .iloc.

The following attempt to swap the two columns fails to modify df, because the columns of the right-hand side are aligned to the target columns before the values are assigned.

In [7]:
#incorrect method
df.loc[:, ['sad', 'angry']] = df[['angry', 'sad']]
df

      happy       sad     angry
a  0.922165  1.067942 -0.833148
b -1.411568 -0.222342 -0.229196
c -0.395592  1.529697  0.766069
d  0.391592  0.296055  0.064205
e  1.159844  0.196769  0.232071


The correct approach assigns the raw values, which skips alignment:

In [8]:
df.loc[:, ['sad', 'angry']] = df[['angry', 'sad']].values
df

      happy       sad     angry
a  0.922165 -0.833148  1.067942
b -1.411568 -0.229196 -0.222342
c -0.395592  0.766069  1.529697
d  0.391592  0.064205  0.296055
e  1.159844  0.232071  0.196769


# Attribute Access

Series index labels, DataFrame columns, and Panel items can be accessed directly as attributes. In IPython you can also use tab completion to discover these attributes.

In [9]:
#accessing a series element as an attribute
s = df['happy']
s.b

-1.4115677265347357

In [10]:
#on a dataframe
df.sad

a   -0.833148
b   -0.229196
c    0.766069
d    0.064205
e    0.232071
Name: sad, dtype: float64

In [11]:
#on a panel
panel.beta

      happy       sad     angry
a  1.533848  1.067942 -0.833148
b -0.799885 -0.222342 -0.229196
c  0.216091  1.529697  0.766069
d  1.003275  0.296055  0.064205
e  1.771527  0.196769  0.232071


We can also use attribute access to modify existing objects.

In [12]:
#modifying a series value
s.b = 2
s.b

2.0

In [13]:
#modifying column values
df.happy = list(range(len(df.index)))
df

   happy       sad     angry
a      0 -0.833148  1.067942
b      1 -0.229196 -0.222342
c      2  0.766069  1.529697
d      3  0.064205  0.296055
e      4  0.232071  0.196769


In [14]:
#to create a new column, use [] notation; attribute assignment (df.glad = ...) would not create a column
df['glad'] = df.happy - df.sad
df

   happy       sad     angry      glad
a      0 -0.833148  1.067942  0.833148
b      1 -0.229196 -0.222342  1.229196
c      2  0.766069  1.529697  1.233931
d      3  0.064205  0.296055  2.935795
e      4  0.232071  0.196769  3.767929


Some caveats:
    Attribute access only works when the index element is a valid Python identifier (e.g. s.1 is not allowed).
    The attribute is not available if it conflicts with an existing method name, such as min or max.
    It is also unavailable if it conflicts with any of the following names: *index, major_axis, minor_axis, items, labels.*
    Attribute assignment cannot create a new column; it sets a plain Python attribute instead, so use [] for new columns.

When these cases occur, standard indexing still works.
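
A sketch of these caveats, using hypothetical column names chosen to trigger them ('max' clashes with a method and 'two words' is not a valid identifier):

```python
import warnings
import pandas as pd

# hypothetical columns: 'max' clashes with a method, 'two words' is not a valid identifier
df = pd.DataFrame({'max': [1, 2], 'two words': [3, 4]})

attr_is_method = callable(df.max)    # True: df.max is the DataFrame.max method
via_brackets = df['max'].tolist()    # [1, 2]: [] still reaches the column
spaced = df['two words'].tolist()    # [3, 4]

# attribute assignment sets a plain Python attribute, not a column
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about this pattern
    df.new_col = [5, 6]
created_by_attr = 'new_col' in df.columns      # False

df['new_col'] = [5, 6]                         # [] creates the column
created_by_brackets = 'new_col' in df.columns  # True
```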

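Assigning a plain dict to a row needs care. In the pandas version used here, the dict's keys (the column names) were written into row d instead of its values, and every column became object dtype as a side effect, which shows up in the outputs further down. The exact behavior varies between pandas versions: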

In [15]:
df.iloc[3] = {'happy':12, 'sad':13, 'angry':14, 'glad':6}
df

   happy        sad      angry      glad
a      0  -0.833148    1.06794  0.833148
b      1  -0.229196  -0.222342    1.2292
c      2   0.766069     1.5297   1.23393
d  happy        sad      angry      glad
e      4   0.232071   0.196769   3.76793
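
If the goal is to write the dict's values into the row, one option is to wrap the dict in a pd.Series and assign it through .loc, which aligns the values on the column labels. This is a sketch on a hypothetical two-row frame, not a rerun of the cell above:

```python
import pandas as pd

# hypothetical frame reusing the column names from the example above
df = pd.DataFrame({'happy': [0.0, 1.0], 'sad': [0.5, 0.6],
                   'angry': [0.7, 0.8], 'glad': [0.9, 1.0]}, index=['c', 'd'])

# dict deliberately given in a different order than the columns
row = {'glad': 6, 'angry': 14, 'sad': 13, 'happy': 12}

# wrapping the dict in a Series makes .loc align the values on the column labels
df.loc['d'] = pd.Series(row)

print(df.loc['d'].tolist())  # [12.0, 13.0, 14.0, 6.0]
```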


# Slicing ranges

This section focuses on the [] operator.

With a Series the [] operator uses the same syntax as when working with an ndarray.

In [16]:
# slicing out a series from df
s = df['angry']
#slicing a range
s[:3]

a     1.06794
b   -0.222342
c      1.5297
Name: angry, dtype: object

In [17]:
s[::3]

a    1.06794
d      angry
Name: angry, dtype: object

In [18]:
s[::-2]

e    0.196769
c      1.5297
a     1.06794
Name: angry, dtype: object

Setting works the same way:

In [19]:
s2 = s.copy()
s2[3] = 6  # an integer key on a label index falls back to position; s2.iloc[3] = 6 is the explicit form
s2

a     1.06794
b   -0.222342
c      1.5297
d           6
e    0.196769
Name: angry, dtype: object

On a DataFrame, slicing with the [] operator selects rows.

In [20]:
#slicing rows in a dataframe
df[2:4]

   happy        sad      angry      glad
c      2   0.766069     1.5297   1.23393
d  happy        sad      angry      glad
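
To contrast the three uses of [] on a DataFrame, here is a sketch with a hypothetical frame: a single label selects a column, an integer slice selects rows by position, and a label slice selects rows by label with the stop included.

```python
import pandas as pd

# hypothetical frame for this illustration
df = pd.DataFrame({'x': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

column = df['x']        # a single label selects a column (a Series)
rows_pos = df[1:3]      # an integer slice selects rows by position, stop excluded
rows_lab = df['b':'c']  # a label slice selects rows by label, stop included

print(rows_pos.index.tolist())  # ['b', 'c']
print(rows_lab.index.tolist())  # ['b', 'c']
```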


# Selection by label

This section concerns the .loc accessor and other purely label based methods.

A few notes:
    chained assignment should be avoided
    slicers must be compatible with (or convertible to) the index type, or a TypeError is raised (e.g. trying to slice a DatetimeIndex with integers raises this error)

To reiterate a few points: purely label-based indexing in pandas is a strict inclusion protocol. A slice includes both the start bound and the stop bound when they are present in the index. Integers, in this case, refer to labels and not positions.
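
A sketch of these rules on a hypothetical Series with a DatetimeIndex: a label slice includes both bounds, a missing label raises KeyError, and an integer slice raises TypeError because integers are not compatible with the index type.

```python
import pandas as pd

# hypothetical Series with a DatetimeIndex
s = pd.Series([1, 2, 3], index=pd.date_range('2020-01-01', periods=3))

# a label slice includes both bounds
inclusive = s.loc['2020-01-01':'2020-01-02'].tolist()  # [1, 2]

# a label that is not in the index raises KeyError
try:
    s.loc[pd.Timestamp('2021-01-01')]
    missing_raised = False
except KeyError:
    missing_raised = True

# integers are not compatible with a DatetimeIndex, so slicing with them raises TypeError
try:
    s.loc[1:2]
    type_raised = False
except TypeError:
    type_raised = True

print(inclusive, missing_raised, type_raised)  # [1, 2] True True
```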

The .loc accessor is the primary tool for this. Valid inputs include the following:
    a single label
    
    a list or array of labels
    
    a slice object with labels 'start':'finish'
    
    a boolean array
    
    a callable function

In [21]:
# generating a new dataframe to work with
df = pd.DataFrame(np.random.randn(6, 6), index = list('abcdef'), columns = [1, 2, 3, 4, 5, 6])
df

          1         2         3         4         5         6
a -0.242416 -0.951107  2.116526 -0.925814 -0.496736  1.416686
b  0.283976  0.306144  1.076659 -0.693901 -1.041501  0.608209
c  1.014191 -0.555344  0.146529 -1.925252  0.599487 -0.722274
d -0.352002 -0.347169  2.112210  0.003346 -0.390103 -0.051325
e -0.683089  0.528470 -2.110392  0.379916  0.555074  2.059263
f  0.979241 -1.230551  0.487951  0.695517 -1.131307 -1.803358


In [22]:
#selecting column 1 as a series, then slicing its rows by label
df[1].loc['a':'c']

a   -0.242416
b    0.283976
c    1.014191
Name: 1, dtype: float64

In [23]:
#setting a single value by row and column label in one .loc call
#(the chained form df[1].loc['a'] = np.nan is chained assignment and may not modify df)
df.loc['a', 1] = np.nan
df

          1         2         3         4         5         6
a       NaN -0.951107  2.116526 -0.925814 -0.496736  1.416686
b  0.283976  0.306144  1.076659 -0.693901 -1.041501  0.608209
c  1.014191 -0.555344  0.146529 -1.925252  0.599487 -0.722274
d -0.352002 -0.347169  2.112210  0.003346 -0.390103 -0.051325
e -0.683089  0.528470 -2.110392  0.379916  0.555074  2.059263
f  0.979241 -1.230551  0.487951  0.695517 -1.131307 -1.803358


In [24]:
#a list of row labels with a label slice of columns (both ends included)
df.loc[['b', 'd', 'f'], 1:3]

          1         2         3
b  0.283976  0.306144  1.076659
d -0.352002 -0.347169  2.112210
f  0.979241 -1.230551  0.487951


In [25]:
#using label slices
df.loc['c':'f', 3:6]

          3         4         5         6
c  0.146529 -1.925252  0.599487 -0.722274
d  2.112210  0.003346 -0.390103 -0.051325
e -2.110392  0.379916  0.555074  2.059263
f  0.487951  0.695517 -1.131307 -1.803358


In [26]:
#cross section with a label
df.loc['e']

1   -0.683089
2    0.528470
3   -2.110392
4    0.379916
5    0.555074
6    2.059263
Name: e, dtype: float64

In [27]:
#comparing a selection returns a boolean DataFrame (this does not index with a boolean array)
df.loc['a':'c', 1:4] <1

       1      2      3      4
a  False   True  False   True
b   True   True  False   True
c  False   True   True   True
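
The cell above builds a boolean DataFrame from a comparison rather than indexing with a boolean array. To actually select with one, pass a boolean Series to .loc, as in this sketch with a hypothetical frame:

```python
import pandas as pd

# hypothetical frame with integer column labels, like the one above
df = pd.DataFrame({1: [0.5, -2.0, 3.0], 2: [1.0, 0.2, -0.1]}, index=['a', 'b', 'c'])

mask = df[1] < 1          # boolean Series aligned on the row labels
rows = df.loc[mask]       # keep the rows where mask is True
values = df.loc[mask, 2]  # boolean rows combined with a column label

print(rows.index.tolist())  # ['a', 'b']
print(values.tolist())      # [1.0, 0.2]
```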


In [28]:
#grabbing a single value explicitly; equivalent to df.at['b', 3]
df.loc['b', 3]

1.076658515386979

# Slicing with labels

Slicing with the .loc accessor returns the elements between the start and stop labels, including both bounds, when both labels are present in the index.

In [29]:
s = df[1]
s

a         NaN
b    0.283976
c    1.014191
d   -0.352002
e   -0.683089
f    0.979241
Name: 1, dtype: float64

In [30]:
#slicing a series
s.loc['b':'e']

b    0.283976
c    1.014191
d   -0.352002
e   -0.683089
Name: 1, dtype: float64

When one of the two bounds is missing from the index but the index is sorted, slicing still works: it selects the labels that sort between the two bounds.

In [31]:
s.sort_index().loc['d':'g']

d   -0.352002
e   -0.683089
f    0.979241
Name: 1, dtype: float64

If the index is not sorted in this situation, an error is raised instead, so sort the index before slicing with a bound that may be missing.
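
A sketch contrasting the two cases on a hypothetical Series with unsorted labels; catching KeyError reflects what current pandas raises here:

```python
import pandas as pd

# hypothetical Series with unsorted labels
s = pd.Series([1, 2, 3], index=['c', 'a', 'b'])

# sorted index: 'z' is missing, but the slice resolves by sort order
sorted_part = s.sort_index().loc['b':'z']  # labels 'b' and 'c'

# unsorted index with a missing bound: pandas raises instead of guessing
try:
    s.loc['a':'z']
    unsorted_raised = False
except KeyError:
    unsorted_raised = True

print(sorted_part.index.tolist(), unsorted_raised)  # ['b', 'c'] True
```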

# Selecting by Position

Chained assignment should be avoided.

Purely integer-based indexing is available in pandas through a number of methods. All of them are 0-based: the start bound is included and the stop bound is excluded. Using a non-integer key, even a valid label, raises an error.

.iloc is the primary method, with the following valid inputs:
    an integer
    a list or array of integers
    a slice object with ints

In [32]:
# creating a new series to work with
s = pd.Series(np.random.randn(6), index = list(range(0, 18, 3)))
s

0     0.847218
3    -0.951058
6    -0.131249
9     0.263152
12   -1.730107
15    0.579728
dtype: float64

In [33]:
#grabbing the third value (integer position 2 in this case)
s.iloc[2]

-0.13124864421401022

In [34]:
#grabbing a central slice
s.iloc[2:4]

6   -0.131249
9    0.263152
dtype: float64

In [35]:
#setting the value of integer position 2
s.iloc[2] = 3
s

0     0.847218
3    -0.951058
6     3.000000
9     0.263152
12   -1.730107
15    0.579728
dtype: float64

In [36]:
#generating a new dataframe to work with
df = pd.DataFrame(np.random.randn(10, 5), index = list(range(0, 40, 4)), columns = list(range(0, 10, 2)))
df

           0         2         4         6         8
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868
4  -1.330317  1.345429  0.327931 -0.644835  0.972289
8  -0.928789 -0.806594 -1.065950  0.549011  0.739513
12 -0.756749  0.817782 -0.090105 -0.493505 -1.037237
16 -0.317896  0.724582  0.499436 -0.545581 -1.317301
20 -0.481550  0.650169  0.764207  0.514775 -0.313022
24  0.927957  1.068104 -0.210802 -1.382112  0.689783
28  0.535460  0.688742  0.512451 -0.780696 -0.602517
32 -0.523694  0.241828 -0.314429 -1.432156 -1.625160
36 -0.300745 -0.111574  0.021927 -0.216563 -1.803607


In [37]:
#using integer slicing, specifying rows
df.iloc[:4]

           0         2         4         6         8
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868
4  -1.330317  1.345429  0.327931 -0.644835  0.972289
8  -0.928789 -0.806594 -1.065950  0.549011  0.739513
12 -0.756749  0.817782 -0.090105 -0.493505 -1.037237


In [38]:
#using integer slicing specifying rows and columns
df.iloc[3:5, 3:5]

           6         8
12 -0.493505 -1.037237
16 -0.545581 -1.317301


In [39]:
#using a list of integers
df.iloc[[3, 5, 6], [3, 4]]

           6         8
12 -0.493505 -1.037237
20  0.514775 -0.313022
24 -1.382112  0.689783


The basic form for DataFrames is:
df.iloc[row_indexer, column_indexer]

When a cross section is desired:

In [40]:
df.iloc[3]

0   -0.756749
2    0.817782
4   -0.090105
6   -0.493505
8   -1.037237
Name: 12, dtype: float64

In [41]:
#out-of-bounds slice indexes are truncated rather than raising an error
df.iloc[7:15, 3:10]

           6         8
28 -0.780696 -0.602517
32 -1.432156 -1.625160
36 -0.216563 -1.803607


A slice that goes out of bounds is truncated to the available positions; a slice lying entirely out of bounds returns an empty DataFrame.

A single indexer that is out of bounds raises an IndexError. Similarly, a list of indexers raises an IndexError if any single element is out of bounds.
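
The out-of-bounds rules can be checked on a hypothetical three-row frame; raises_index_error is a helper defined only for this sketch:

```python
import pandas as pd

# hypothetical three-row frame
df = pd.DataFrame({'x': [1, 2, 3]})

partial = df.iloc[1:100]  # partly out of bounds: truncated to rows 1 and 2
empty = df.iloc[10:20]    # entirely out of bounds: an empty frame


def raises_index_error(indexer):
    """Return True if df.iloc[indexer] raises IndexError (hypothetical helper)."""
    try:
        df.iloc[indexer]
    except IndexError:
        return True
    return False


single_oob = raises_index_error(10)     # a single integer out of bounds
list_oob = raises_index_error([0, 10])  # one bad element in a list

print(len(partial), empty.empty, single_oob, list_oob)  # 2 True True True
```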

# Selection by a callable

The primary indexers (.loc, .iloc, and []) all accept a callable as an indexer. The callable must be a function with one argument (the Series or DataFrame being indexed) that returns a valid output for indexing.

In [42]:
df.loc[lambda df: df[0] > -1]

           0         2         4         6         8
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868
8  -0.928789 -0.806594 -1.065950  0.549011  0.739513
12 -0.756749  0.817782 -0.090105 -0.493505 -1.037237
16 -0.317896  0.724582  0.499436 -0.545581 -1.317301
20 -0.481550  0.650169  0.764207  0.514775 -0.313022
24  0.927957  1.068104 -0.210802 -1.382112  0.689783
28  0.535460  0.688742  0.512451 -0.780696 -0.602517
32 -0.523694  0.241828 -0.314429 -1.432156 -1.625160
36 -0.300745 -0.111574  0.021927 -0.216563 -1.803607


In [43]:
df.iloc[:, lambda df: [2, 3] ]

           4         6
0  -2.390400 -0.797841
4   0.327931 -0.644835
8  -1.065950  0.549011
12 -0.090105 -0.493505
16  0.499436 -0.545581
20  0.764207  0.514775
24 -0.210802 -1.382112
28  0.512451 -0.780696
32 -0.314429 -1.432156
36  0.021927 -0.216563


In [44]:
df[lambda df: df.columns[:3]]

           0         2         4
0   0.210580 -0.906602 -2.390400
4  -1.330317  1.345429  0.327931
8  -0.928789 -0.806594 -1.065950
12 -0.756749  0.817782 -0.090105
16 -0.317896  0.724582  0.499436
20 -0.481550  0.650169  0.764207
24  0.927957  1.068104 -0.210802
28  0.535460  0.688742  0.512451
32 -0.523694  0.241828 -0.314429
36 -0.300745 -0.111574  0.021927


In [45]:
#callable indexing can also be used in a series
s.loc[lambda s: s>1]

6    3.0
dtype: float64

Callable indexers let you chain data selection operations without storing intermediate results in a temporary variable.
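
A sketch of such a chain on hypothetical order data: each callable receives the intermediate result of the previous step, so nothing has to be stored in between. DataFrame.assign is used here to add the derived column.

```python
import pandas as pd

# hypothetical order data
df = pd.DataFrame({'price': [10, 25, 40], 'qty': [3, 1, 2]}, index=['a', 'b', 'c'])

# each callable receives the intermediate result, so no temporary variable is needed
result = (
    df.assign(total=lambda d: d['price'] * d['qty'])  # derived column: 30, 25, 80
      .loc[lambda d: d['total'] > 40]                 # filter on the new column
      .loc[:, lambda d: ['total']]                    # keep only that column
)
print(result)
```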


The following are deprecated:
    the .ix indexer, in favor of .loc and .iloc (removed entirely in pandas 1.0)
    using .loc or [] with a list containing one or more missing labels, in favor of .reindex() (since pandas 1.0 this raises a KeyError)
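
A sketch of the missing-label case on pandas 1.0 or later, with a hypothetical Series: .loc with a list that includes a missing label raises KeyError, while .reindex() fills the gap with NaN.

```python
import pandas as pd

# hypothetical Series for this illustration
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# a list passed to .loc must not contain missing labels
try:
    s.loc[['a', 'z']]
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex inserts NaN for the missing label instead
filled = s.reindex(['a', 'z'])
print(loc_raised)  # True
print(filled)
```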

# Reindexing

.reindex() is the idiomatic way to select labels that might not be present in the index.

In [46]:
s.reindex([0, 3, 8])

0    0.847218
3   -0.951058
8         NaN
dtype: float64

Another option, which returns only the valid keys and preserves the dtype, is the following:

In [47]:
labels = [0, 3, 6]
s.loc[s.index.intersection(labels)]

0    0.847218
3   -0.951058
6    3.000000
dtype: float64

A duplicated index makes .reindex() raise an error.

In [48]:
#generating a new series; its index has no duplicates, so reindex succeeds (a duplicated index would raise a ValueError)
s = pd.Series(np.arange(5), index = ['a', 'b', 'c', 'd', 'e'])
labels = ['a', 'b']
s.reindex(labels)

a    0
b    1
dtype: int32

The duplication error can be circumvented by first intersecting the desired labels with the index, selecting that intersection with .loc, and then reindexing. However, this still raises an error if the resulting selection has a duplicated index.
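
A sketch of that workaround, using a hypothetical Series whose index repeats 'a':

```python
import pandas as pd

# hypothetical Series whose index contains 'a' twice
s = pd.Series([0, 1, 2, 3], index=['a', 'a', 'b', 'c'])
labels = ['c', 'd']

# reindex refuses to work on an axis with duplicate labels
try:
    s.reindex(labels)
    reindex_raised = False
except ValueError:
    reindex_raised = True

# select only the labels that exist, then reindex the (now duplicate-free) result
safe = s.loc[s.index.intersection(labels)].reindex(labels)
print(safe)
```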

# Selecting random samples

This is done with the sample() method on a Series, DataFrame, or Panel. By default it samples rows, and it can return either a specific number of rows/columns or a fraction of them.

In [49]:
#generating a new series to work with
s = pd.Series(np.arange(15), index = list('abcdefghijklmno'))
print(s)

a     0
b     1
c     2
d     3
e     4
f     5
g     6
h     7
i     8
j     9
k    10
l    11
m    12
n    13
o    14
dtype: int32


In [50]:
#without passing an argument only one row is returned
s.sample()

e    4
dtype: int32

In [51]:
#specifying a number of rows
s.sample(n=5)

f     5
k    10
m    12
n    13
e     4
dtype: int32

In [52]:
#sampling a fraction of rows
s.sample(frac = 0.66)

n    13
o    14
a     0
f     5
g     6
m    12
j     9
d     3
c     2
k    10
dtype: int32

Use the replace option to sample with replacement; otherwise (replace=False, the default) sample() returns each row at most once.

In [53]:
#sampling without replacement
s.sample(n = 5, replace=False)

i     8
n    13
d     3
c     2
b     1
dtype: int32

In [54]:
#with replacement
s.sample(n=6, replace = True)

g     6
m    12
a     0
e     4
l    11
l    11
dtype: int32

By default, each row has an equal probability of being selected by sample(). To change this, pass the weights argument. The weights can be a list, a NumPy array, or a Series, as long as they are the same length as the object being sampled. Missing values are treated as a weight of 0, and infinite values are not allowed. If the weights do not sum to 1, they are normalized by dividing each weight by the sum of all the weights.
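
A sketch of the weighting rules with a hypothetical Series: the weights sum to 4, so they are normalized, and the two zero-weight rows can never be drawn. With only two non-zero weights, a two-row sample without replacement is forced to pick exactly those rows.

```python
import numpy as np
import pandas as pd

# hypothetical four-item Series and weights that sum to 4, not 1
s = pd.Series(list('abcd'), index=list('abcd'))
weights = np.array([0, 0, 1, 3])

# the normalization described above: divide each weight by the total
normalized = weights / weights.sum()  # [0.0, 0.0, 0.25, 0.75]

# zero-weight rows are never drawn, so a 2-row sample without replacement must be c and d
picked = s.sample(n=2, weights=weights, random_state=0)
print(sorted(picked.index))           # ['c', 'd']
```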

In [55]:
#creating a Series of weights 0 to 14 (sum 105); label 'a' gets weight 0, so it is never drawn
weights = pd.Series(np.arange(15))
#sampling with weights; they are re-normalized to sum to 1
s.sample(n= 5, weights = weights.values)

m    12
h     7
o    14
i     8
n    13
dtype: int32

For DataFrames, you can use a column of the DataFrame as sampling weights by passing its name as a string (only when sampling rows, not columns).

In [56]:
#modifying our existing dataframe
df['weights'] = df[8]
del df[8]

In [57]:
#keeping only rows with non-negative weights, since sample() rejects negative weights
df = df[df['weights'] >=0]

In [58]:
#sampling
df.sample(n = 4, weights = 'weights')

           0         2         4         6   weights
24  0.927957  1.068104 -0.210802 -1.382112  0.689783
4  -1.330317  1.345429  0.327931 -0.644835  0.972289
8  -0.928789 -0.806594 -1.065950  0.549011  0.739513
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868


In [59]:
#we can also sample columns
df.sample(n = 2, axis = 1)

           4         0
0  -2.390400  0.210580
4   0.327931 -1.330317
8  -1.065950 -0.928789
24 -0.210802  0.927957


As a final note, we can seed sample()'s random number generator with the random_state argument, which accepts either an int or a NumPy RandomState object.
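
A sketch of reproducible sampling on a hypothetical frame: repeating the same integer seed, or passing identically seeded RandomState objects, returns the same rows.

```python
import numpy as np
import pandas as pd

# hypothetical ten-row frame
df = pd.DataFrame({'x': range(10)})

# the same integer seed gives the same rows on every call
first = df.sample(n=3, random_state=5)
second = df.sample(n=3, random_state=5)

# identically seeded RandomState objects also give the same rows
rs_a = df.sample(n=3, random_state=np.random.RandomState(5))
rs_b = df.sample(n=3, random_state=np.random.RandomState(5))

print(first.index.equals(second.index), rs_a.index.equals(rs_b.index))  # True True
```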

In [60]:
#the same integer seed always draws the same rows
df.sample(n=3, random_state= 5)

           0         2         4         6   weights
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868
4  -1.330317  1.345429  0.327931 -0.644835  0.972289
8  -0.928789 -0.806594 -1.065950  0.549011  0.739513


# Setting with Enlargement
Enlargement happens when you use .loc or [] to set a value for a key that does not yet exist on that axis.

For a Series, this is effectively an append operation.

In [61]:
# 'p' is not in the index yet, so this appends it
s['p'] = 15
s

a     0
b     1
c     2
d     3
e     4
f     5
g     6
h     7
i     8
j     9
k    10
l    11
m    12
n    13
o    14
p    15
dtype: int64

For a DataFrame, either axis can be enlarged using .loc.

In [62]:
#creating a new column via enlargement using the .loc accessor
df.loc[:, 'alpha'] = 15
df
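#the SettingWithCopyWarning below is raised because df came from boolean filtering in In [57];
#writing df = df[df['weights'] >= 0].copy() there would avoid it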

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


           0         2         4         6   weights  alpha
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868     15
4  -1.330317  1.345429  0.327931 -0.644835  0.972289     15
8  -0.928789 -0.806594 -1.065950  0.549011  0.739513     15
24  0.927957  1.068104 -0.210802 -1.382112  0.689783     15


In [63]:
#row label 8 already exists, so this overwrites that row; an unused label would append a new row
df.loc[8, :] = 7
df

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


           0         2         4         6   weights  alpha
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868     15
4  -1.330317  1.345429  0.327931 -0.644835  0.972289     15
8   7.000000  7.000000  7.000000  7.000000  7.000000      7
24  0.927957  1.068104 -0.210802 -1.382112  0.689783     15
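
Since row 8 already existed, the previous cell overwrote it. A sketch of true enlargement on a hypothetical frame, with an existing label for comparison:

```python
import pandas as pd

# hypothetical frame whose row labels are 0 and 4
df = pd.DataFrame({'a': [1.0, 2.0]}, index=[0, 4])

df.loc[4, 'a'] = 5.0  # label 4 exists, so this overwrites in place
df.loc[40] = 9.0      # label 40 is new, so this appends a row
df.loc[:, 'b'] = 0.0  # column 'b' is new, so this adds a column

print(df)
```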


# Fast scalar value getting and setting
Indexing with [] handles many use cases, so it has some overhead while it works out what you are asking for. When you only need a single scalar value, use the faster .at accessor (label-based) or .iat accessor (integer-based).

In [65]:
#using .iat[]
s.iat[3]

3

In [67]:
#using .at on a dataframe
df.at[8, 'alpha']

7

In [68]:
#using .iat to access the same value
df.iat[2, 5]

7

In [70]:
#it is also possible to set scalar values with these indexers
df.at[8, 'alpha'] = 14
df

           0         2         4         6   weights  alpha
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868     15
4  -1.330317  1.345429  0.327931 -0.644835  0.972289     15
8   7.000000  7.000000  7.000000  7.000000  7.000000     14
24  0.927957  1.068104 -0.210802 -1.382112  0.689783     15


In [71]:
#these can also enlarge the object in place when the label is missing (here a new column 'echo' is created)
df.at[0, 'echo'] = 6
df

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


           0         2         4         6   weights  alpha  echo
0   0.210580 -0.906602 -2.390400 -0.797841  0.273868     15   6.0
4  -1.330317  1.345429  0.327931 -0.644835  0.972289     15   NaN
8   7.000000  7.000000  7.000000  7.000000  7.000000     14   NaN
24  0.927957  1.068104 -0.210802 -1.382112  0.689783     15   NaN


# Boolean indexing

Boolean vectors are a common way to filter data. The operators are | for or, & for and, and ~ for not. Each comparison must be wrapped in parentheses, because Python's & and | bind more tightly than comparison operators such as > and <.

Selecting from a Series with a boolean vector works just as it does with a NumPy ndarray.

In [72]:
s

a     0
b     1
c     2
d     3
e     4
f     5
g     6
h     7
i     8
j     9
k    10
l    11
m    12
n    13
o    14
p    15
dtype: int64
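
A sketch of combining conditions on a Series shaped like s above (a hypothetical rebuild, so it runs on its own): each comparison is parenthesized before being joined with &, |, or ~.

```python
import pandas as pd

# hypothetical rebuild of s: values 0..15 under labels a..p
s = pd.Series(range(16), index=list('abcdefghijklmnop'))

# each comparison sits in parentheses because & and | bind tighter than > and <
between = s[(s > 3) & (s < 7)]   # 4, 5, 6
outside = s[(s < 2) | (s > 13)]  # 0, 1, 14, 15
odd = s[~(s % 2 == 0)]           # ~ negates the mask, keeping the odd values

print(between.tolist(), outside.tolist(), odd.tolist())
```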