## Selecting multiple DataFrame columns

- Selecting a single column is accomplished by passing the desired column name as a string to the indexing operator of a DataFrame.
- It is often necessary to focus on a subset of the current working dataset, which is accomplished by selecting multiple columns.

In this recipe, all the `actor` and `director` columns will be selected from the `movie` dataset.

In [2]:
import pandas as pd
import numpy as np
pd.options.display.max_columns = 40

- Read in the movie dataset
- Pass in a list of the desired columns to the indexing operator

In [5]:
movie = pd.read_csv('data/movie.csv')
movie_actor_director = movie[['actor_1_name', 'actor_2_name',
                              'actor_3_name', 'director_name']]
movie_actor_director.head()

,actor_1_name,actor_2_name,actor_3_name,director_name
0,CCH Pounder,Joel David Moore,Wes Studi,James Cameron
1,Johnny Depp,Orlando Bloom,Jack Davenport,Gore Verbinski
2,Christoph Waltz,Rory Kinnear,Stephanie Sigman,Sam Mendes
3,Tom Hardy,Christian Bale,Joseph Gordon-Levitt,Christopher Nolan
4,Doug Walker,Rob Walker,,Doug Walker


- Sometimes a single column needs to be selected as a DataFrame rather than as a Series
- This is done by passing a single-element list to the indexing operator

In [6]:
movie[['director_name']].head()

,director_name
0,James Cameron
1,Gore Verbinski
2,Sam Mendes
3,Christopher Nolan
4,Doug Walker


## How it works...

- The DataFrame indexing operator is very flexible and capable of accepting a number of different objects
    - If a string is passed, it will return a single-dimensional Series
    - If a list is passed to the indexing operator, it returns a DataFrame of all the columns in the list in the specified order
- Most commonly, a single column is selected with a string, resulting in a Series.
    - When a DataFrame is the desired output, simply put the column name in a single-element list
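The difference between the two return types can be checked directly. The following is a minimal sketch that assumes a tiny stand-in DataFrame, `movie_demo` (a hypothetical name, built from a few values shown in the output above), so that it runs without `data/movie.csv`:

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset, using values from the output above
movie_demo = pd.DataFrame({
    'director_name': ['James Cameron', 'Gore Verbinski'],
    'actor_1_name': ['CCH Pounder', 'Johnny Depp'],
    'actor_2_name': ['Joel David Moore', 'Orlando Bloom'],
})

# A string selects one column and returns a one-dimensional Series
as_series = movie_demo['director_name']

# A single-element list selects the same column but returns a DataFrame
as_frame = movie_demo[['director_name']]

# A longer list returns the columns in the order given in the list
reordered = movie_demo[['actor_2_name', 'actor_1_name']]

print(type(as_series).__name__, as_series.shape)  # Series (2,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (2, 1)
print(list(reordered.columns))                    # ['actor_2_name', 'actor_1_name']
```

The shapes make the distinction visible: the Series has a single dimension, while the single-column DataFrame keeps two.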

## There's more...

- Passing a long list inside the indexing operator might cause readability issues
- You may save all your column names to a list variable first

In [7]:
cols = ['actor_1_name', 'actor_2_name',
        'actor_3_name', 'director_name']
movie_actor_director = movie[cols]

- One of the most common exceptions raised when working with pandas is `KeyError`
- This error is mainly due to mistyping a column or index name
- The same error is also raised whenever a multiple-column selection is attempted without the use of a list:

In [8]:
# no mistyping of col name but multiple column selection is attempted without the use of a list
movie['actor_1_name', 'actor_2_name',
      'actor_3_name', 'director_name']

KeyError: ('actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name')
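Both causes of `KeyError` described above can be reproduced on a small example. This is only a sketch: `movie_demo`, `wanted`, and `missing` are hypothetical names introduced for illustration, and the pre-check at the end is one possible way to spot a mistyped name, not a built-in pandas feature.

```python
import pandas as pd

# Hypothetical one-row stand-in for the movie dataset
movie_demo = pd.DataFrame({
    'actor_1_name': ['CCH Pounder'],
    'director_name': ['James Cameron'],
})

# Cause 1: names separated by commas without a list form a single tuple key,
# and no column carries that tuple as its label
try:
    movie_demo['actor_1_name', 'director_name']
    tuple_raised = False
except KeyError as err:
    tuple_raised = True
    print('KeyError:', err)

# Cause 2: a proper list, but one name is mistyped ('acotr' instead of 'actor')
try:
    movie_demo[['acotr_1_name', 'director_name']]
    typo_raised = False
except KeyError:
    typo_raised = True

# Possible pre-check: list the requested names that are not actual columns
wanted = ['acotr_1_name', 'director_name']
missing = [name for name in wanted if name not in movie_demo.columns]
print(missing)

# The correct form: every name spelled correctly, wrapped in a list
selected = movie_demo[['actor_1_name', 'director_name']]
```

Comparing the requested names against `movie_demo.columns` before selecting can make a typo easier to find than reading the traceback.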