# <h1>Indexing and selecting data</h1>

<p>The axis labeling information in pandas objects serves many purposes:</p>

<ul>
<li><p>Identifies data (i.e. provides <em>metadata</em>) using known indicators,
important for analysis, visualization, and interactive console display.</p></li>
<li><p>Enables automatic and explicit data alignment.</p></li>
<li><p>Allows intuitive getting and setting of subsets of the data set.</p></li>
</ul>

<p>In this section, we will focus on the final point: namely, how to slice, dice, and generally get and set subsets of pandas objects.
The primary focus will be on Series and DataFrame as they have received more development attention in this area.</p>

<p>Note</p>

<p>The Python and NumPy indexing operators <code>[]</code> and attribute operator <code>.</code> provide quick and easy access to pandas data structures across a wide range of use cases.
This makes interactive work intuitive, as there’s little new to learn if you already know how to deal with Python dictionaries and NumPy arrays.
However, since the type of the data to be accessed isn’t known in advance, directly using standard operators has some optimization limits.
For production code, we recommend that you take advantage of the optimized
pandas data access methods exposed in this chapter.</p>

<p>Warning</p>

<p>Whether a copy or a reference is returned for a setting operation may
depend on the context. This is sometimes called <em>chained assignment</em> and
should be avoided. See <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-view-versus-copy">Returning a View versus Copy</a>.</p>
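
<p>As a minimal sketch of what this warning refers to, the following contrasts a chained assignment with a single <code>.loc</code> call. The frame and its column names are made up for illustration and are not part of the examples in this chapter.</p>

```python
import pandas as pd

# Hypothetical data, used only to illustrate the pattern.
frame = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained form (avoid): frame[frame["a"] > 1]["b"] = 0
# The first [] may return a copy, so the assignment can be lost.

# Single-step form: one .loc call selects the rows and the column together.
frame.loc[frame["a"] > 1, "b"] = 0
```

<p>Doing the selection and the assignment in one indexing call leaves no intermediate object whose copy-or-view status matters.</p>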

<p>See the <a href="https://pandas.pydata.org/docs/user_guide/advanced.html#advanced">MultiIndex / Advanced Indexing</a> for <code>MultiIndex</code> and more advanced indexing documentation.</p>

<p>See the <a href="https://pandas.pydata.org/docs/user_guide/cookbook.html#cookbook-selection">cookbook</a> for some advanced strategies.</p>

## <h2>Different choices for indexing</h2>

<p>Object selection has had a number of user-requested additions in order to
support more explicit location-based indexing. pandas now supports three types
of multi-axis indexing.</p>

<ul>
<li><p><code>.loc</code> is primarily label based, but may also be used with a boolean array. <code>.loc</code> will raise <code>KeyError</code> when the items are not found. Allowed inputs are:</p>

<blockquote>
<div><ul>

<li><p>A single label, e.g. <code>5</code> or <code>'a'</code> (Note that <code>5</code> is interpreted as a <em>label</em> of the index. This use is <strong>not</strong> an integer position along the index.).</p></li>

<li><p>A list or array of labels <code>['a', 'b', 'c']</code>.</p></li>

<li><p>A slice object with labels <code>'a':'f'</code> (Note that contrary to usual Python slices, <strong>both</strong> the start and the stop are included, when present in the index! See <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-slicing-with-labels">Slicing with labels</a>
and <a href="https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-endpoints-are-inclusive">Endpoints are inclusive</a>.)</p></li>

<li><p>A boolean array (any <code>NA</code> values will be treated as <code>False</code>).</p></li>

<li><p>A <code>callable</code> function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).</p></li>

<li><p>A tuple of row (and column) indices whose elements are one of the
above inputs.</p></li>

</ul>
</div></blockquote>

<p>See more at <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-label">Selection by Label</a>.</p>
</li>

<li><p><code>.iloc</code> is primarily integer position based (from <code>0</code> to <code>length-1</code> of the axis), but may also be used with a boolean
array.
<code>.iloc</code> will raise <code>IndexError</code> if a requested
indexer is out-of-bounds, except <em>slice</em> indexers, which allow
out-of-bounds indexing (this conforms with Python/NumPy <em>slice</em>
semantics). Allowed inputs are:</p>

<blockquote>
<div><ul>

<li><p>An integer e.g. <code>5</code>.</p></li>

<li><p>A list or array of integers <code>[4, 3, 0]</code>.</p></li>

<li><p>A slice object with ints <code>1:7</code>.</p></li>

<li><p>A boolean array (any <code>NA</code> values will be treated as <code>False</code>).</p></li>

<li><p>A <code>callable</code> function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).</p></li>

<li><p>A tuple of row (and column) indices whose elements are one of the
above inputs.</p></li>

</ul>
</div></blockquote>

<p>See more at <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-integer">Selection by Position</a>, <a href="https://pandas.pydata.org/docs/user_guide/advanced.html#advanced">Advanced Indexing</a> and <a href="https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-advanced-hierarchical">Advanced Hierarchical</a>.</p>
</li>

<li><p><code>.loc</code>, <code>.iloc</code>, and also <code>[]</code> indexing can accept a <code>callable</code> as indexer. See more at <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-callable">Selection By Callable</a>.</p>

<div>
<p>Note</p>

<p>Destructuring tuple keys into row (and column) indexes occurs
<em>before</em> callables are applied, so you cannot return a tuple from
a callable to index both rows and columns.</p>
</div>
</li>
</ul>
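
<p>The difference between label-based and position-based access is easiest to see when integer labels do not match positions. The following is a small sketch with a made-up Series (not one of the objects used elsewhere in this chapter); it also shows the two error types named above.</p>

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from their positions.
ser = pd.Series(["x", "y", "z"], index=[2, 0, 1])

by_label = ser.loc[0]      # label 0 sits at position 1, so this is "y"
by_position = ser.iloc[0]  # position 0 holds "x"

missing_label = False
try:
    ser.loc[5]             # no label 5 in the index
except KeyError:
    missing_label = True

out_of_bounds = False
try:
    ser.iloc[5]            # only positions 0..2 exist
except IndexError:
    out_of_bounds = True
```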

<p>Getting values from an object with multi-axes selection uses the following
notation (using <code>.loc</code> as an example, but the following applies to <code>.iloc</code> as well).
Any of the axes accessors may be the null slice <code>:</code>.
Axes left out of the specification are assumed to be <code>:</code>, e.g. <code>p.loc['a']</code> is equivalent to <code>p.loc['a', :]</code>.</p>

In [140]:
import pandas as pd
import numpy as np

In [141]:
ser = pd.Series(range(5), index=list("abcde"))

In [142]:
ser.loc[["a", "c", "e"]]

Unnamed: 0,0
a,0
c,2
e,4


In [143]:
df = pd.DataFrame(np.arange(25).reshape(5, 5), index=list("abcde"), columns=list("abcde"))

In [144]:
df.loc[["a", "c", "e"], ["b", "d"]]

Unnamed: 0,b,d
a,1,3
c,11,13
e,21,23
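
<p>The statement that omitted axes are treated as <code>:</code> can be checked directly. This sketch builds a small frame named <code>p</code> (a hypothetical stand-in for the <code>p</code> in the text) and compares both spellings.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for ``p`` in the paragraph above.
p = pd.DataFrame(np.arange(6).reshape(2, 3), index=["a", "b"], columns=["x", "y", "z"])

implicit = p.loc["a"]      # column axis left out
explicit = p.loc["a", :]   # column axis given as the null slice

same = implicit.equals(explicit)
```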


## <h2>Basics</h2>

<p>As mentioned when introducing the data structures in the last section, the primary function of indexing with <code>[]</code> (a.k.a. <code>__getitem__</code> for those familiar with implementing class behavior in Python) is selecting out lower-dimensional slices.
The following table shows return type values when indexing pandas objects with <code>[]</code>:</p>

<table>
<colgroup>
<col style="width: 25.0%">
<col style="width: 25.0%">
<col style="width: 50.0%">
</colgroup>
<thead>
<tr><th><p>Object Type</p></th>
<th><p>Selection</p></th>
<th><p>Return Value Type</p></th>
</tr>
</thead>
<tbody>
<tr><td><p>Series</p></td>
<td><p><code>series[label]</code></p></td>
<td><p>scalar value</p></td>
</tr>
<tr><td><p>DataFrame</p></td>
<td><p><code>frame[colname]</code></p></td>
<td><p><code>Series</code> corresponding to colname</p></td>
</tr>
</tbody>
</table>
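
<p>The return types in the table can be confirmed with a short check. The frame below is illustrative only.</p>

```python
import numpy as np
import pandas as pd

# Illustrative frame; the names are not taken from the chapter's examples.
frame = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}, index=["x", "y"])

column = frame["A"]   # DataFrame[colname] returns a Series
value = column["x"]   # Series[label] returns a scalar

column_is_series = isinstance(column, pd.Series)
value_is_scalar = np.isscalar(value)
```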

<p>Here we construct a simple time series data set to use for illustrating the
indexing functionality:</p>

In [145]:
dates = pd.date_range('1/1/2000', periods=8)

In [146]:
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

In [147]:
df

Unnamed: 0,A,B,C,D
2000-01-01,-0.309973,0.898827,2.180561,1.955116
2000-01-02,-0.082973,0.068839,-0.50981,-1.634049
2000-01-03,-0.105154,0.530347,1.299516,-0.585902
2000-01-04,1.367474,0.185921,0.642659,1.242647
2000-01-05,-0.985414,0.291152,-1.777454,-0.201539
2000-01-06,-0.298244,-0.519239,-1.576231,-1.212981
2000-01-07,0.001335,-0.052138,0.517689,-1.415114
2000-01-08,-1.264752,1.878147,0.294467,1.558781


<p>Note</p>

<p>None of the indexing functionality is time series specific unless
specifically stated.</p>


<p>Thus, as per above, we have the most basic indexing using <code>[]</code>:</p>

In [148]:
s = df['A']

In [149]:
s[dates[5]]

np.float64(-0.298244177753831)

<p>You can pass a list of columns to <code>[]</code> to select columns in that order.
If a column is not contained in the DataFrame, an exception will be
raised. Multiple columns can also be set in this manner:</p>

In [150]:
df

Unnamed: 0,A,B,C,D
2000-01-01,-0.309973,0.898827,2.180561,1.955116
2000-01-02,-0.082973,0.068839,-0.50981,-1.634049
2000-01-03,-0.105154,0.530347,1.299516,-0.585902
2000-01-04,1.367474,0.185921,0.642659,1.242647
2000-01-05,-0.985414,0.291152,-1.777454,-0.201539
2000-01-06,-0.298244,-0.519239,-1.576231,-1.212981
2000-01-07,0.001335,-0.052138,0.517689,-1.415114
2000-01-08,-1.264752,1.878147,0.294467,1.558781


In [151]:
df[['B', 'A']] = df[['A', 'B']]

In [152]:
df

Unnamed: 0,A,B,C,D
2000-01-01,0.898827,-0.309973,2.180561,1.955116
2000-01-02,0.068839,-0.082973,-0.50981,-1.634049
2000-01-03,0.530347,-0.105154,1.299516,-0.585902
2000-01-04,0.185921,1.367474,0.642659,1.242647
2000-01-05,0.291152,-0.985414,-1.777454,-0.201539
2000-01-06,-0.519239,-0.298244,-1.576231,-1.212981
2000-01-07,-0.052138,0.001335,0.517689,-1.415114
2000-01-08,1.878147,-1.264752,0.294467,1.558781


<p>You may find this useful for applying a transform (in-place) to a subset of the columns.</p>
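
<p>The exception raised for a column that is not in the DataFrame is a <code>KeyError</code>. A minimal sketch with a made-up frame:</p>

```python
import pandas as pd

# Made-up frame for illustration.
frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

reordered = frame[["B", "A"]]  # columns come back in the requested order

error = None
try:
    frame[["A", "Z"]]          # "Z" is not a column of ``frame``
except KeyError as exc:
    error = exc
```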

<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>pandas aligns all AXES when setting <code class="docutils literal notranslate"><span class="pre">Series</span></code> and <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> from <code class="docutils literal notranslate"><span class="pre">.loc</span></code>.</p>
<p>This will <strong>not</strong> modify <code class="docutils literal notranslate"><span class="pre">df</span></code> because the columns are aligned before the values are assigned.</p>

In [153]:
df[['A', 'B']]

Unnamed: 0,A,B
2000-01-01,0.898827,-0.309973
2000-01-02,0.068839,-0.082973
2000-01-03,0.530347,-0.105154
2000-01-04,0.185921,1.367474
2000-01-05,0.291152,-0.985414
2000-01-06,-0.519239,-0.298244
2000-01-07,-0.052138,0.001335
2000-01-08,1.878147,-1.264752


In [154]:
df.loc[:, ['B', 'A']] = df[['A', 'B']]

In [155]:
df[['A', 'B']]

Unnamed: 0,A,B
2000-01-01,0.898827,-0.309973
2000-01-02,0.068839,-0.082973
2000-01-03,0.530347,-0.105154
2000-01-04,0.185921,1.367474
2000-01-05,0.291152,-0.985414
2000-01-06,-0.519239,-0.298244
2000-01-07,-0.052138,0.001335
2000-01-08,1.878147,-1.264752


<p>The correct way to swap column values is by using raw values:</p>

In [156]:
df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()

In [157]:
df[['A', 'B']]

Unnamed: 0,A,B
2000-01-01,-0.309973,0.898827
2000-01-02,-0.082973,0.068839
2000-01-03,-0.105154,0.530347
2000-01-04,1.367474,0.185921
2000-01-05,-0.985414,0.291152
2000-01-06,-0.298244,-0.519239
2000-01-07,0.001335,-0.052138
2000-01-08,-1.264752,1.878147


<p>However, pandas does not align AXES when setting <code class="docutils literal notranslate"><span class="pre">Series</span></code> and <code class="docutils literal notranslate"><span class="pre">DataFrame</span></code> from <code class="docutils literal notranslate"><span class="pre">.iloc</span></code>
because <code class="docutils literal notranslate"><span class="pre">.iloc</span></code> operates by position.</p>
<p>This will modify <code class="docutils literal notranslate"><span class="pre">df</span></code> because the column alignment is not done before value assignment.</p>

In [158]:
df[['A', 'B']]

Unnamed: 0,A,B
2000-01-01,-0.309973,0.898827
2000-01-02,-0.082973,0.068839
2000-01-03,-0.105154,0.530347
2000-01-04,1.367474,0.185921
2000-01-05,-0.985414,0.291152
2000-01-06,-0.298244,-0.519239
2000-01-07,0.001335,-0.052138
2000-01-08,-1.264752,1.878147


In [159]:
df.iloc[:, [1, 0]] = df[['A', 'B']]

In [160]:

df[['A','B']]

Unnamed: 0,A,B
2000-01-01,0.898827,-0.309973
2000-01-02,0.068839,-0.082973
2000-01-03,0.530347,-0.105154
2000-01-04,0.185921,1.367474
2000-01-05,0.291152,-0.985414
2000-01-06,-0.519239,-0.298244
2000-01-07,-0.052138,0.001335
2000-01-08,1.878147,-1.264752

</div>


## <h2>Attribute access</h2>

<p>You may access an index on a <code>Series</code> or a column on a <code>DataFrame</code> directly as an attribute:</p>

In [161]:
sa = pd.Series([1, 2, 3], index=list('abc'))

In [162]:
dfa = df.copy()

In [163]:
sa.b

np.int64(2)

In [164]:
dfa.A

Unnamed: 0,A
2000-01-01,0.898827
2000-01-02,0.068839
2000-01-03,0.530347
2000-01-04,0.185921
2000-01-05,0.291152
2000-01-06,-0.519239
2000-01-07,-0.052138
2000-01-08,1.878147


In [165]:
sa.a = 5

In [166]:
sa

Unnamed: 0,0
a,5
b,2
c,3


In [167]:
dfa['A'] = list(range(len(dfa.index)))

In [168]:
dfa

Unnamed: 0,A,B,C,D
2000-01-01,0,-0.309973,2.180561,1.955116
2000-01-02,1,-0.082973,-0.50981,-1.634049
2000-01-03,2,-0.105154,1.299516,-0.585902
2000-01-04,3,1.367474,0.642659,1.242647
2000-01-05,4,-0.985414,-1.777454,-0.201539
2000-01-06,5,-0.298244,-1.576231,-1.212981
2000-01-07,6,0.001335,0.517689,-1.415114
2000-01-08,7,-1.264752,0.294467,1.558781


<div class="alert alert-warning">

<p>Warning</p>

<ul>

<li><p>You can use this access only if the index element is a valid Python identifier, e.g. <code>s.1</code> is not allowed.
See <a href="https://docs.python.org/3/reference/lexical_analysis.html#identifiers">here for an explanation of valid identifiers</a>.</p></li>

<li><p>The attribute will not be available if it conflicts with an existing method name, e.g. <code>s.min</code> is not allowed, but <code>s['min']</code> is possible.</p></li>

<li><p>Similarly, the attribute will not be available if it conflicts with any of the following list: <code>index</code>,
<code>major_axis</code>, <code>minor_axis</code>, <code>items</code>.</p></li>

<li><p>In any of these cases, standard indexing will still work, e.g. <code>s['1']</code>, <code>s['min']</code>, and <code>s['index']</code> will
access the corresponding element or column.</p></li>

</ul>

</div>
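
<p>The cases in the warning above can be sketched with a Series whose labels are chosen to trigger them. The labels are hypothetical and picked only for this illustration.</p>

```python
import pandas as pd

# "min" collides with the Series.min method, "1" is not a valid identifier,
# and "total" is a valid identifier with no conflict.
s = pd.Series([1, 2, 3], index=["min", "total", "1"])

method = s.min            # attribute access finds the method, not the element
element = s["min"]        # [] finds the element labelled "min"
via_attribute = s.total   # works: valid identifier, no conflicting name
via_getitem = s["1"]      # ``s.1`` would be a syntax error, so use []
```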

<p>If you are using the IPython environment, you may also use tab-completion to see these accessible attributes.</p>

<p>You can also assign a <code>dict</code> to a row of a <code>DataFrame</code>:</p>

In [169]:
x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})

In [170]:
x.iloc[1] = {'x': 9, 'y': 99}

In [171]:
x

Unnamed: 0,x,y
0,1,3
1,9,99
2,3,5


<p>You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful:
if you try to use attribute access to create a new column, it creates a new attribute rather than a
new column, and this will raise a <code class="docutils literal notranslate"><span class="pre">UserWarning</span></code>:</p>

In [172]:
df_new = pd.DataFrame({'one': [1., 2., 3.]})

In [173]:
df_new.two = [4, 5, 6]


In [174]:
df_new

Unnamed: 0,one
0,1.0
1,2.0
2,3.0
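
<p>To actually add the column, use item assignment instead. The sketch below records the warning rather than printing it, then creates the column with <code>[]</code> on a second frame; the name <code>df_fixed</code> is introduced here only for illustration.</p>

```python
import warnings

import pandas as pd

df_new = pd.DataFrame({"one": [1.0, 2.0, 3.0]})

# Attribute assignment: capture the warning instead of letting it print.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df_new.two = [4, 5, 6]

warned = any(issubclass(w.category, UserWarning) for w in caught)
attr_made_column = "two" in df_new.columns

# Item assignment on a fresh frame creates a real column.
df_fixed = pd.DataFrame({"one": [1.0, 2.0, 3.0]})
df_fixed["two"] = [4, 5, 6]
item_made_column = "two" in df_fixed.columns
```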


## <h2>Slicing ranges</h2>

<p>The most robust and consistent way of slicing ranges along arbitrary axes is described in the <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-integer">Selection by Position</a> section
detailing the <code>.iloc</code> method. For now, we explain the semantics of slicing using the <code>[]</code> operator.</p>

<p>With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels:</p>

In [175]:
s[:5]

Unnamed: 0,A
2000-01-01,-0.309973
2000-01-02,-0.082973
2000-01-03,-0.105154
2000-01-04,1.367474
2000-01-05,-0.985414


In [176]:
s[::2]

Unnamed: 0,A
2000-01-01,-0.309973
2000-01-03,-0.105154
2000-01-05,-0.985414
2000-01-07,0.001335


In [177]:
s[::-1]

Unnamed: 0,A
2000-01-08,-1.264752
2000-01-07,0.001335
2000-01-06,-0.298244
2000-01-05,-0.985414
2000-01-04,1.367474
2000-01-03,-0.105154
2000-01-02,-0.082973
2000-01-01,-0.309973


<p>Note that setting works as well:</p>

In [178]:
s2 = s.copy()

In [179]:
s2[:5] = 0

In [180]:
s2

Unnamed: 0,A
2000-01-01,0.0
2000-01-02,0.0
2000-01-03,0.0
2000-01-04,0.0
2000-01-05,0.0
2000-01-06,-0.298244
2000-01-07,0.001335
2000-01-08,-1.264752


<p>With DataFrame, slicing inside of <code>[]</code> <strong>slices the rows</strong>.
This is provided largely as a convenience since it is such a common operation.</p>

In [181]:
df[:3]

Unnamed: 0,A,B,C,D
2000-01-01,0.898827,-0.309973,2.180561,1.955116
2000-01-02,0.068839,-0.082973,-0.50981,-1.634049
2000-01-03,0.530347,-0.105154,1.299516,-0.585902


In [182]:
df[::-1]

Unnamed: 0,A,B,C,D
2000-01-08,1.878147,-1.264752,0.294467,1.558781
2000-01-07,-0.052138,0.001335,0.517689,-1.415114
2000-01-06,-0.519239,-0.298244,-1.576231,-1.212981
2000-01-05,0.291152,-0.985414,-1.777454,-0.201539
2000-01-04,0.185921,1.367474,0.642659,1.242647
2000-01-03,0.530347,-0.105154,1.299516,-0.585902
2000-01-02,0.068839,-0.082973,-0.50981,-1.634049
2000-01-01,0.898827,-0.309973,2.180561,1.955116
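
<p>As a small check of the row-slicing convenience, an integer slice inside <code>[]</code> gives the same rows as the explicit positional form with <code>.iloc</code>. The frame below is a made-up stand-in with a date index.</p>

```python
import numpy as np
import pandas as pd

# Made-up frame with a DatetimeIndex, similar in shape to the one above.
dates = pd.date_range("2000-01-01", periods=4)
frame = pd.DataFrame(np.arange(8).reshape(4, 2), index=dates, columns=["A", "B"])

rows_via_getitem = frame[:2]    # a slice inside [] selects rows
rows_via_iloc = frame.iloc[:2]  # explicit positional spelling

same = rows_via_getitem.equals(rows_via_iloc)
```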


## <h2>Selection by label</h2>

<p>Warning</p>

<p>Whether a copy or a reference is returned for a setting operation may depend on the context.
This is sometimes called <code>chained assignment</code> and should be avoided.
See <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-view-versus-copy">Returning a View versus Copy</a>.</p>

<p>Warning</p>

<blockquote>
<div><p><code>.loc</code> is strict when you present slicers that are not compatible (or convertible) with the index type, for example
using integers in a <code>DatetimeIndex</code>.
These will raise a <code>TypeError</code>.</p>

In [137]:
dfl = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'), index=pd.date_range('20130101', periods=5))

In [183]:
dfl.loc[2:3]

TypeError: cannot do slice indexing on DatetimeIndex with these indexers [2] of type int

<p>String-like values in a slice <em>can</em> be converted to the type of the index, which leads to natural slicing.</p>

In [184]:
dfl.loc['20130102':'20130104']

Unnamed: 0,A,B,C,D
2013-01-02,0.788955,-0.324846,1.448673,1.819037
2013-01-03,1.305148,0.77543,-0.807382,-1.897423
2013-01-04,0.750455,-0.033563,1.290676,1.234096




<p>pandas provides a suite of methods in order to have <strong>purely label based indexing</strong>. This is a strict inclusion based protocol.
Every label asked for must be in the index, or a <code>KeyError</code> will be raised.
When slicing, both the start bound <strong>AND</strong> the stop bound are <em>included</em>, if present in the index.
Integers are valid labels, but they refer to the label <strong>and not the position</strong>.</p>
<p>The <code>.loc</code> attribute is the primary access method.
The following are valid inputs:</p>

<ul>

<li><p>A single label, e.g. <code>5</code> or <code>'a'</code> (Note that <code>5</code> is interpreted as a <em>label</em> of the index.
This use is <strong>not</strong> an integer position along the index.).</p></li>

<li><p>A list or array of labels <code>['a', 'b', 'c']</code>.</p></li>

<li><p>A slice object with labels <code>'a':'f'</code> (Note that contrary to usual Python slices, <strong>both</strong> the start and the stop are included, when present in the index! See <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-slicing-with-labels">Slicing with labels</a>.)</p></li>

<li><p>A boolean array.</p></li>

<li><p>A <code>callable</code>, see <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-callable">Selection By Callable</a>.</p></li>
</ul>

In [185]:
s1 = pd.Series(np.random.randn(6), index=list('abcdef'))

In [186]:
s1

Unnamed: 0,0
a,0.57644
b,-1.806302
c,0.265821
d,-1.031924
e,0.606264
f,-1.112678


In [187]:
s1.loc['c':]

Unnamed: 0,0
c,0.265821
d,-1.031924
e,0.606264
f,-1.112678


In [188]:
s1.loc['b']

np.float64(-1.8063022950848593)

<p>Note that setting works as well:</p>

In [189]:
s1.loc['c'] = 0

In [190]:
s1

Unnamed: 0,0
a,0.57644
b,-1.806302
c,0.0
d,-1.031924
e,0.606264
f,-1.112678


<p>With a DataFrame:</p>

In [191]:
df1 = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))

In [192]:
df1

Unnamed: 0,A,B,C,D
a,0.059498,-1.086628,-0.212981,-1.346683
b,-0.934758,-0.032437,-0.927853,-0.882451
c,0.996432,-0.640101,-0.330783,0.35691
d,-1.268974,-1.596143,-1.321868,-2.37385
e,-0.131802,-1.183533,-0.256669,-0.743377
f,-0.163448,-0.21441,-0.022001,1.753169


In [193]:
df1.loc[['a', 'b', 'd'], :]

Unnamed: 0,A,B,C,D
a,0.059498,-1.086628,-0.212981,-1.346683
b,-0.934758,-0.032437,-0.927853,-0.882451
d,-1.268974,-1.596143,-1.321868,-2.37385


<p>Accessing via label slices:</p>

In [194]:
df1.loc['d':, 'A':'C']

Unnamed: 0,A,B,C
d,-1.268974,-1.596143,-1.321868
e,-0.131802,-1.183533,-0.256669
f,-0.163448,-0.21441,-0.022001


<p>For getting a cross section using a label (equivalent to <code>df.xs('a')</code>):</p>

In [195]:
df1.loc['a']

Unnamed: 0,a
A,0.059498
B,-1.086628
C,-0.212981
D,-1.346683


<p>For getting values with a boolean array:</p>

In [196]:
df1.loc['a'] > 0

Unnamed: 0,a
A,True
B,False
C,False
D,False


In [197]:
df1.loc[:, df1.loc['a'] > 0]

Unnamed: 0,A
a,0.059498
b,-0.934758
c,0.996432
d,-1.268974
e,-0.131802
f,-0.163448


<p>NA values in a boolean array propagate as <code>False</code>:</p>

In [198]:
mask = pd.array([True, False, True, False, pd.NA, False], dtype="boolean")

In [199]:
mask

<BooleanArray>
[True, False, True, False, <NA>, False]
Length: 6, dtype: boolean

In [200]:
df1[mask]

Unnamed: 0,A,B,C,D
a,0.059498,-1.086628,-0.212981,-1.346683
c,0.996432,-0.640101,-0.330783,0.35691


<p>For getting a value explicitly:</p>

In [201]:
# this is also equivalent to ``df1.at['a','A']``
df1.loc['a', 'A']

np.float64(0.05949817130128715)
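
<p>The strictness described at the start of this section (every requested label must exist) can be sketched as follows. The workaround with <code>Index.intersection</code> is one possible approach and an assumption about what the caller wants, not the only option.</p>

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=list("abc"))

raised = False
try:
    s.loc[["a", "z"]]   # "z" is not in the index
except KeyError:
    raised = True

# Possible workaround: keep only the requested labels that exist.
wanted = ["a", "z"]
present = s.index.intersection(wanted)
subset = s.loc[present]
```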

### <h3>Slicing with labels</h3>

<p>When using <code>.loc</code> with slices, if both the start and the stop labels are present in the index, then elements <em>located</em> between the two (including them) are returned:</p>

In [202]:
s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])

In [203]:
s.loc[3:5]

Unnamed: 0,0
3,b
2,c
5,d


<p>If at least one of the two is absent, but the index is sorted, and can be
compared against start and stop labels, then slicing will still work as
expected, by selecting labels which <em>rank</em> between the two:</p>

In [204]:
s.sort_index()

Unnamed: 0,0
0,a
2,c
3,b
4,e
5,d


In [205]:
s.sort_index().loc[1:6]

Unnamed: 0,0
2,c
3,b
4,e
5,d


<p>However, if at least one of the two is absent <em>and</em> the index is not sorted, an error will be raised (since doing otherwise would be computationally expensive, as well as potentially ambiguous for mixed type indexes). For instance, in the above example, <code>s.loc[1:6]</code> would raise <code>KeyError</code>.</p>

<p>For the rationale behind this behavior, see <a href="https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-endpoints-are-inclusive">Endpoints are inclusive</a>.</p>
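
<p>The unsorted case described above can be reproduced with the same Series. This sketch catches the <code>KeyError</code> and then repeats the slice on the sorted index.</p>

```python
import pandas as pd

s = pd.Series(list("abcde"), index=[0, 3, 2, 5, 4])

raised = False
try:
    s.loc[1:6]   # 1 and 6 are absent and the index is not sorted
except KeyError:
    raised = True

# After sorting, the same slice selects the labels ranked between 1 and 6.
ranked = s.sort_index().loc[1:6]
```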

In [206]:
s = pd.Series(list('abcdef'), index=[0, 3, 2, 5, 4, 2])

In [207]:
s.loc[3:5]

Unnamed: 0,0
3,b
2,c
5,d


<p>Also, if the index has duplicate labels <em>and</em> either the start or the stop label is duplicated, an error will be raised. For instance, in the above example, <code>s.loc[2:5]</code> would raise a <code>KeyError</code>.</p>

<p>For more information about duplicate labels, see <a class="reference internal" href="https://pandas.pydata.org/docs/user_guide/duplicates.html#duplicates">Duplicate Labels</a>.</p>
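
<p>A short sketch of the duplicate-label case, using the same Series as above: slicing between unique labels works, while a duplicated bound raises.</p>

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[0, 3, 2, 5, 4, 2])

unique_bounds = s.loc[3:5]   # labels 3 and 5 each appear once

raised = False
try:
    s.loc[2:5]               # label 2 appears twice, so the bound is ambiguous
except KeyError:
    raised = True
```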

## <h2>Selection by position</h2>

<p>Warning</p>

<p>Whether a copy or a reference is returned for a setting operation may depend on the context.
This is sometimes called <code>chained assignment</code> and should be avoided.
See <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-view-versus-copy">Returning a View versus Copy</a>.</p>

<p>pandas provides a suite of methods in order to get <strong>purely integer based indexing</strong>. The semantics closely follow Python and NumPy slicing.
Positions are <code>0-based</code>.
When slicing, the start bound is <em>included</em>, while the upper bound is <em>excluded</em>.
Trying to use a non-integer, even a <strong>valid</strong> label, will raise an <code>IndexError</code>.</p>

<p>The <code>.iloc</code> attribute is the primary access method. The following are valid inputs:</p>

<ul>

<li><p>An integer e.g. <code>5</code>.</p></li>

<li><p>A list or array of integers <code>[4, 3, 0]</code>.</p></li>

<li><p>A slice object with ints <code>1:7</code>.</p></li>

<li><p>A boolean array.</p></li>

<li><p>A <code>callable</code>, see <a href="https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-callable">Selection By Callable</a>.</p></li>

<li><p>A tuple of row (and column) indexes, whose elements are one of the
above types.</p></li>

</ul>

In [208]:
s1 = pd.Series(np.random.randn(5), index=list(range(0, 10, 2)))

In [209]:
s1

Unnamed: 0,0
0,-0.583735
2,-0.563738
4,2.457244
6,1.534862
8,-0.842231


In [210]:
s1.iloc[:3]

Unnamed: 0,0
0,-0.583735
2,-0.563738
4,2.457244


<p>Note that setting works as well:</p>

In [211]:
s1.iloc[:3] = 0

In [212]:
s1

Unnamed: 0,0
0,0.0
2,0.0
4,0.0
6,1.534862
8,-0.842231
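
<p>Because the labels of this Series are themselves integers, it is a convenient place to contrast the excluded upper bound of <code>.iloc</code> with the included stop label of <code>.loc</code>. The values below are made up; only the index matches the example above.</p>

```python
import pandas as pd

# Same index as above (0, 2, 4, 6, 8), with made-up values.
s1 = pd.Series([10, 20, 30, 40, 50], index=list(range(0, 10, 2)))

by_position = s1.iloc[:3]  # positions 0, 1, 2, which carry labels 0, 2, 4
exclusive = s1.iloc[1:3]   # positions 1 and 2; position 3 is excluded
inclusive = s1.loc[2:6]    # labels 2, 4 and 6; the stop label is included
```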


<p>With a DataFrame:</p>

In [213]:
df1 = pd.DataFrame(np.random.randn(6, 4), index=list(range(0, 12, 2)), columns=list(range(0, 8, 2)))

In [214]:
df1

Unnamed: 0,0,2,4,6
0,-0.690765,0.192697,0.242357,-0.057781
2,-0.030983,-0.048207,-0.517384,-0.17009
4,0.879947,1.53923,1.592937,1.666912
6,-1.21738,-0.299179,-0.284478,0.389121
8,0.883598,1.53097,-1.296463,-0.738495
10,-0.406917,-1.089195,-0.288121,0.356508


<p>Select via integer slicing:</p>

In [215]:
df1.iloc[:3]

Unnamed: 0,0,2,4,6
0,-0.690765,0.192697,0.242357,-0.057781
2,-0.030983,-0.048207,-0.517384,-0.17009
4,0.879947,1.53923,1.592937,1.666912


In [216]:
df1.iloc[1:5, 2:4]

Unnamed: 0,4,6
2,-0.517384,-0.17009
4,1.592937,1.666912
6,-0.284478,0.389121
8,-1.296463,-0.738495


<p>Select via integer list:</p>

In [217]:
df1.iloc[[1, 3, 5], [1, 3]]

Unnamed: 0,2,6
2,-0.048207,-0.17009
6,-0.299179,0.389121
10,-1.089195,0.356508


In [218]:
df1.iloc[1:3, :]

Unnamed: 0,0,2,4,6
2,-0.030983,-0.048207,-0.517384,-0.17009
4,0.879947,1.53923,1.592937,1.666912


In [220]:
# this is also equivalent to ``df1.iat[1,1]``
df1.iloc[1, 1]

np.float64(-0.04820704913730757)

<p>For getting a cross section using an integer position (equivalent to <code class="docutils literal notranslate"><span class="pre">df.xs(1)</span></code>):</p>

In [221]:
df1.iloc[1]

Unnamed: 0,2
0,-0.030983
2,-0.048207
4,-0.517384
6,-0.17009


<p>Out of range slice indexes are handled gracefully just as in Python/NumPy.</p>

In [222]:
# these are allowed in Python/NumPy.
x = list('abcdef')

In [223]:
x

['a', 'b', 'c', 'd', 'e', 'f']

In [224]:
x[4:10]

['e', 'f']

In [225]:
x[8:10]

[]

In [226]:
s = pd.Series(x)

In [227]:
s

Unnamed: 0,0
0,a
1,b
2,c
3,d
4,e
5,f


In [228]:
s.iloc[4:10]

Unnamed: 0,0
4,e
5,f


In [229]:
s.iloc[8:10]

Unnamed: 0,0


<p>Note that using slices that go out of bounds can result in
an empty axis (e.g. an empty DataFrame being returned).</p>

In [230]:
dfl = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [231]:
dfl

Unnamed: 0,A,B
0,-1.538907,0.883003
1,-1.162989,-0.613923
2,0.750208,0.963149
3,-0.526458,-0.507008
4,2.151518,-1.292975


In [232]:
dfl.iloc[:, 2:3]

0
1
2
3
4


In [233]:
dfl.iloc[:, 1:3]

Unnamed: 0,B
0,0.883003
1,-0.613923
2,0.963149
3,-0.507008
4,-1.292975


In [234]:
dfl.iloc[4:6]

Unnamed: 0,A,B
4,2.151518,-1.292975


<p>A single indexer that is out of bounds will raise an <code class="docutils literal notranslate"><span class="pre">IndexError</span></code>.
A list of indexers where any element is out of bounds will raise an
<code class="docutils literal notranslate"><span class="pre">IndexError</span></code>.</p>

In [236]:
dfl.iloc[[4, 5, 6]]

IndexError: positional indexers are out-of-bounds

In [237]:
dfl.iloc[:, 4]

IndexError: single positional indexer is out-of-bounds

## <h2>Selection by callable</h2>

<p><code>.loc</code>, <code>.iloc</code>, and also <code>[]</code> indexing can accept a <code>callable</code> as indexer.
The <code>callable</code> must be a function with one argument (the calling Series or DataFrame) that returns valid output for indexing.</p>

<p>Note</p>

<p>For <code>.iloc</code> indexing, returning a tuple from the callable is
not supported, since tuple destructuring for row and column indexes
occurs <em>before</em> applying callables.</p>

In [238]:
df1 = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))

In [240]:
df1

Unnamed: 0,A,B,C,D
a,1.342893,-1.356947,-0.56153,0.394292
b,-1.231639,-0.591642,-1.147701,0.833988
c,-1.723899,1.547955,-1.65627,0.395876
d,0.969737,-0.442685,0.849941,-0.203218
e,0.601705,0.0536,-1.527806,-0.596246
f,-0.666624,-0.882096,1.357987,0.212302


In [241]:
df1.loc[lambda df: df['A'] > 0, :]

Unnamed: 0,A,B,C,D
a,1.342893,-1.356947,-0.56153,0.394292
d,0.969737,-0.442685,0.849941,-0.203218
e,0.601705,0.0536,-1.527806,-0.596246


In [242]:
df1.loc[:, lambda df: ['A', 'B']]

Unnamed: 0,A,B
a,1.342893,-1.356947
b,-1.231639,-0.591642
c,-1.723899,1.547955
d,0.969737,-0.442685
e,0.601705,0.0536
f,-0.666624,-0.882096


In [243]:
df1.iloc[:, lambda df: [0, 1]]

Unnamed: 0,A,B
a,1.342893,-1.356947
b,-1.231639,-0.591642
c,-1.723899,1.547955
d,0.969737,-0.442685
e,0.601705,0.0536
f,-0.666624,-0.882096


In [244]:
df1[lambda df: df.columns[0]]

Unnamed: 0,A
a,1.342893
b,-1.231639
c,-1.723899
d,0.969737
e,0.601705
f,-0.666624
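
<p>Since a callable cannot return a tuple to index both axes, one way to select rows and columns together is to pass one callable per axis. This is a sketch on a small made-up frame; each callable receives the whole DataFrame.</p>

```python
import pandas as pd

# Small made-up frame for illustration.
df1 = pd.DataFrame(
    {"A": [1.0, -1.0, 2.0], "B": [0.5, 0.5, -0.5], "C": [9, 8, 7]},
    index=list("abc"),
)

# One callable for the rows, one for the columns.
result = df1.loc[lambda d: d["A"] > 0, lambda d: ["A", "B"]]
```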


<p>You can use callable indexing in <code class="docutils literal notranslate"><span class="pre">Series</span></code>.</p>

In [245]:
df1['A'].loc[lambda s: s > 0]

Unnamed: 0,A
a,1.342893
d,0.969737
e,0.601705


<p>Using these methods / indexers, you can chain data selection operations
without using a temporary variable.</p>

In [246]:
bb = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/refs/heads/main/doc/data/baseball.csv', index_col='id')

In [247]:
(bb.groupby(['year', 'team']).sum(numeric_only=True)
    .loc[lambda df: df['r'] > 100])

year,team,stint,g,ab,r,h,X2b,X3b,hr,rbi,sb,cs,bb,so,ibb,hbp,sh,sf,gidp
2007,CIN,6,379,745,101,203,35,2,36,125.0,10.0,1.0,105,127.0,14.0,1.0,1.0,15.0,18.0
2007,DET,5,301,1062,162,283,54,4,37,144.0,24.0,7.0,97,176.0,3.0,10.0,4.0,8.0,28.0
2007,HOU,4,311,926,109,218,47,6,14,77.0,10.0,4.0,60,212.0,3.0,9.0,16.0,6.0,17.0
2007,LAN,11,413,1021,153,293,61,3,36,154.0,7.0,5.0,114,141.0,8.0,9.0,3.0,8.0,29.0
2007,NYN,13,622,1854,240,509,101,3,61,243.0,22.0,4.0,174,310.0,24.0,23.0,18.0,15.0,48.0
2007,SFN,5,482,1305,198,337,67,6,40,171.0,26.0,7.0,235,188.0,51.0,8.0,16.0,6.0,41.0
2007,TEX,2,198,729,115,200,40,4,28,115.0,21.0,4.0,73,140.0,4.0,5.0,2.0,8.0,16.0
2007,TOR,4,459,1408,187,378,96,2,58,223.0,4.0,2.0,190,265.0,16.0,12.0,4.0,16.0,38.0
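
<p>The same chaining pattern can be tried without downloading the baseball data. The sketch below uses a few made-up records with hypothetical team codes; only the column names <code>year</code>, <code>team</code> and <code>r</code> mirror the example above.</p>

```python
import pandas as pd

# Made-up records standing in for the baseball data.
records = pd.DataFrame(
    {
        "year": [2007, 2007, 2007, 2007],
        "team": ["AAA", "AAA", "BBB", "BBB"],
        "r": [60, 70, 30, 40],
    }
)

# Aggregate, then filter the aggregated frame in the same expression.
high_scoring = (
    records.groupby(["year", "team"])
    .sum(numeric_only=True)
    .loc[lambda df: df["r"] > 100]
)
```

<p>The callable receives the grouped and summed frame, so the filter applies to the totals rather than to the original rows.</p>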
