# 2.4 Select both rows and columns of a `DataFrame`

In [None]:
import numpy as np
import pandas as pd

In [None]:
# From a dict of lists
df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
})
df

The most common situation is boolean (logical) indexing on the rows combined
with label indexing on the columns, both through `loc`.

In [None]:
# One boolean per row: keep every row except the third (MSFT)
idx = [True, True, False, True, True]
df.loc[idx, ['date', 'open']]
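In practice the boolean mask is usually computed from a condition rather than typed out by hand. Here is a minimal sketch that reuses the sample data above. The threshold of 100 is an arbitrary choice for illustration:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
})

# A boolean Series with one entry per row: True where the open price exceeds 100
mask = df['open'] > 100

# Rows chosen by the mask, columns chosen by label
expensive = df.loc[mask, ['ticker', 'open']]
print(expensive)
```

Because `mask` is a `Series`, `loc` aligns it with the rows by index label rather than by position.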

In [None]:
# `:` selects every row; the list selects columns by label
df.loc[:, ['date', 'open']]

### Add an index 

Once the rows have meaningful labels, we can select by _label_ on both rows and columns.

We haven't set an index on `df`, so it has the default integer index.
Let's set one now.
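Before setting one, it helps to see what the default looks like. The following short sketch rebuilds the same `df`. It shows that the default index is a `RangeIndex` and that `loc` treats its integers as labels, not positions:

```python
import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
})

# The default index is a RangeIndex: 0, 1, 2, 3, 4
print(df.index)

# loc is label-based; here the row labels happen to be integers
print(df.loc[2, 'ticker'])  # MSFT

# After filtering, labels and positions no longer line up
filtered = df[df['open'] > 100]    # keeps the rows labelled 0, 1, 3
print(filtered.loc[3, 'ticker'])   # IBM (label 3)
print(filtered.iloc[2]['ticker'])  # IBM (position 2)
```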

In [None]:
df

In [None]:
# `set_index` returns a new DataFrame, so `df` itself is left unchanged
df1 = df.set_index('ticker')
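To confirm that `set_index` leaves `df` untouched, compare the columns of both frames. Here is a small check that rebuilds `df` from the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
})

# set_index returns a new DataFrame; df keeps its 'ticker' column and RangeIndex
df1 = df.set_index('ticker')

print(df.columns.tolist())   # ['ticker', 'date', 'open']
print(df1.columns.tolist())  # ['date', 'open']
print(df1.index.name)        # ticker
```

Passing `inplace=True` would modify `df` instead, but reassigning the result to a new name is the more common idiom.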

### Select by row index

In [None]:
df1

The tickers are no longer among the values of the `DataFrame`; they now form its index.

Consequently, we can use them for index lookups.
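Because the tickers now live in the index, membership tests and uniqueness checks run against them. The following sketch rebuilds `df1` as above. `'GOOG'` is just an example of a label that is absent:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
}).set_index('ticker')

# `in` on an Index checks the labels
print('MSFT' in df1.index)  # True
print('GOOG' in df1.index)  # False (example of a missing label)

# 'AAPL' occurs twice, so the index is not unique
print(df1.index.is_unique)               # False
print(df1.index.value_counts()['AAPL'])  # 2
```

The duplicated `'AAPL'` label matters for the row selections that follow.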

### Safely select rows

In [None]:
# A list of row labels always returns a DataFrame; `:` explicitly selects all columns
df1.loc[['MSFT'], :]

### Unsafely select rows

In [None]:
# Select by a single row label (returns a Series here, because 'MSFT' is unique)
df1.loc['MSFT', :]

Omitting the column selector defaults to all columns, but I prefer explicit selection:
it makes it easier to see what the code is doing.

In [None]:
df1.loc['MSFT']  # same result, but the column selection is implicit
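To check that the two spellings really agree, compare them directly. Here is a small sketch that rebuilds `df1`:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
}).set_index('ticker')

explicit = df1.loc['MSFT', :]  # columns selected explicitly
implicit = df1.loc['MSFT']     # columns left implicit

print(type(explicit).__name__)    # Series
print(explicit.equals(implicit))  # True
print(explicit['open'])           # 42.3
```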

### Always return rows as a `DataFrame`

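With a single label, `loc` returns a `Series` when the label occurs once in the index and a `DataFrame` when it occurs more than once. `'AAPL'` appears twice in `df1`, so this returns a `DataFrame`: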

In [None]:
df1.loc['AAPL', :]
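The return type therefore depends on the data, not just on the code. The following sketch makes this visible by checking both tickers and rebuilding `df1` as before:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
}).set_index('ticker')

msft = df1.loc['MSFT', :]  # 'MSFT' occurs once  -> Series
aapl = df1.loc['AAPL', :]  # 'AAPL' occurs twice -> DataFrame

print(type(msft).__name__, type(aapl).__name__)  # Series DataFrame
print(aapl.shape)                                # (2, 2)
```

Code written against one of these shapes can break when a ticker gains or loses a duplicate row. That is why the single-label form is labelled unsafe above.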

Passing a list of labels, even a one-element list, always returns a `DataFrame`:

In [None]:
df1.loc[['AAPL'], :]
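The list form also extends naturally to several tickers and a subset of columns at once, which is the row-and-column selection this section is about. Here is a sketch that rebuilds `df1`:

```python
import pandas as pd

df1 = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'MSFT', 'IBM', 'YHOO'],
    'date': ['2015-12-30', '2015-12-31', '2015-12-30', '2015-12-30', '2015-12-30'],
    'open': [426.23, 427.81, 42.3, 101.65, 35.53]
}).set_index('ticker')

one = df1.loc[['MSFT'], :]                 # unique label     -> DataFrame, 1 row
dup = df1.loc[['AAPL'], :]                 # duplicated label -> DataFrame, 2 rows
both = df1.loc[['MSFT', 'IBM'], ['open']]  # rows and columns by label

print(one.shape, dup.shape, both.shape)  # (1, 2) (2, 2) (2, 1)
print(both['open'].tolist())             # [42.3, 101.65]
```

The rows come back in the order the labels are listed, so `both` has MSFT first and IBM second.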