# Extracting subsets of data frames

In this notebook, we will learn how to manipulate pandas DataFrame objects, starting with extracting subsets.

In [None]:
# import pandas

# load the gapminder dataset and save as `gapminder`

# take a look at the head of gapminder


### Extracting multiple columns


In [None]:
# try to extract two columns: country and gdpPercap from gapminder using the `df[]` notation with two column names


The `df[]` syntax expects only one object inside the square brackets, so two comma-separated column names do not work.

Fortunately, you can provide multiple column names as a single **list** object.

In [None]:
# create a list containing the names of the columns we want to extract: 'country' and 'gdpPercap' 


You can extract both the `country` and `gdpPercap` columns by providing this *list* inside the indexing square brackets.

In [None]:
# provide the list of names inside the `df[]` notation to extract the two columns from gapminder
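
As a rough illustration of the two attempts above, here is a minimal sketch. It uses a small hand-made frame called `toy` (a hypothetical stand-in with made-up values, not the real gapminder data), so it runs without the dataset file.

```python
import pandas as pd

# Hypothetical stand-in for gapminder: made-up values, not the real dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2002, 2007, 2002, 2007],
    "gdpPercap": [100.0, 110.0, 200.0, 220.0],
})

# Two comma-separated names are read as one tuple key. No column has
# that name, so pandas raises a KeyError.
try:
    toy["country", "gdpPercap"]
    raised = False
except KeyError:
    raised = True

# Wrapping the names in a list selects both columns as a new DataFrame.
cols = ["country", "gdpPercap"]
subset = toy[cols]
print(subset)
```

Writing `toy[["country", "gdpPercap"]]` directly is equivalent: the outer brackets do the indexing and the inner brackets build the list.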


### The `.loc` indexer

An alternative (and ultimately more flexible) approach to subsetting a pandas DataFrame is to use the `.loc` indexer.

With `.loc`, the square brackets expect *two* values: one for the *row* index and one for the *column* index. 

The general syntax is `df.loc[rows, cols]`.

In [None]:
# Use `df.loc[,]` to extract the entry with row index 3 from the 'gdpPercap' column
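
A minimal sketch of the `df.loc[rows, cols]` pattern, again on a hypothetical `toy` frame with made-up values rather than the real gapminder data. Note that `.loc` looks rows up by index *label*; with the default integer index, label 3 happens to be the fourth row.

```python
import pandas as pd

# Hypothetical stand-in for gapminder: made-up values, default index 0..3.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2002, 2007, 2002, 2007],
    "gdpPercap": [100.0, 110.0, 200.0, 220.0],
})

# Row label 3, column 'gdpPercap': a single scalar value comes back.
value = toy.loc[3, "gdpPercap"]
print(value)
```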


### Using `:` with `.loc` to select all rows/columns

If you want to extract all rows (or columns), you can replace the corresponding index entry with `:`. So the following code will extract all rows for the `gdpPercap` column:

In [None]:
# Use `df.loc[,]` to extract all rows from the 'gdpPercap' column

# what are two other ways that you could do this same thing?
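
One possible answer, sketched on a hypothetical `toy` frame with made-up values. The three forms below should return the same Series; attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute.

```python
import pandas as pd

# Hypothetical stand-in for gapminder: made-up values.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "gdpPercap": [100.0, 110.0, 200.0, 220.0],
})

# `:` in the row position means "all rows".
via_loc = toy.loc[:, "gdpPercap"]

# Plain bracket indexing with a single column name.
via_brackets = toy["gdpPercap"]

# Attribute access (column name must be a valid identifier).
via_attribute = toy.gdpPercap
```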


If you want to extract multiple columns (or rows), you still need to provide all of the index values that you want to extract in a list.

In [None]:
# use `df.loc[,]` to extract all rows for the 'country' and 'gdpPercap' columns

# what is another way to do the same thing?
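
A short sketch of the multi-column case, assuming a hypothetical `toy` frame with made-up values. The list of names goes in the column position of `.loc`, and `:` keeps every row.

```python
import pandas as pd

# Hypothetical stand-in for gapminder: made-up values.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2002, 2007, 2002, 2007],
    "gdpPercap": [100.0, 110.0, 200.0, 220.0],
})

# All rows, two columns, using .loc with a list in the column position.
via_loc = toy.loc[:, ["country", "gdpPercap"]]

# The same selection with plain bracket indexing.
via_brackets = toy[["country", "gdpPercap"]]
```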

In [None]:
# extract the rows with index 4, 5, 6, 7, and 8 for the country and gdpPercap columns


If the index values you want form a sequence of consecutive integers, you can provide a `range` object instead of writing out the list:

In [None]:
# use the `range()` function to simplify the code in the previous cell
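
The row selections from the last two cells could be sketched as follows, on a hypothetical ten-row `toy` frame with made-up values. A label slice is shown as a third option, which is an addition to the text above; unlike ordinary Python slicing, `.loc` slices include both endpoints.

```python
import pandas as pd

# Hypothetical ten-row frame with the default integer index 0..9.
# All values are made up for illustration.
toy = pd.DataFrame({
    "country": [f"C{i}" for i in range(10)],
    "year": [2007] * 10,
    "gdpPercap": [100.0 * i for i in range(10)],
})

cols = ["country", "gdpPercap"]

# Explicit list of row labels.
rows_list = toy.loc[[4, 5, 6, 7, 8], cols]

# range(4, 9) yields 4, 5, 6, 7, 8 (the stop value 9 is excluded).
rows_range = toy.loc[range(4, 9), cols]

# Label slice: with .loc, both 4 and 8 are included.
rows_slice = toy.loc[4:8, cols]
```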


### Using `.loc` with non-numeric indexes

Let's create `gapminder_country`, whose row index corresponds to the country variable:

In [None]:
# define gapminder_country as a new dataframe with the country column as the row index

# look at gapminder_country


In [None]:
# use the `df.loc[,]` notation to extract the rows for Germany for the gdpPercap column
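
A minimal sketch of both steps, assuming `set_index` is the intended way to build `gapminder_country`. The frame below is a hypothetical stand-in named `toy`; the country names are real but the numbers are made up.

```python
import pandas as pd

# Hypothetical stand-in for gapminder: made-up values.
toy = pd.DataFrame({
    "country": ["Germany", "Germany", "France"],
    "year": [2002, 2007, 2007],
    "gdpPercap": [1.0, 2.0, 3.0],
})

# set_index returns a new frame; by default 'country' moves out of the
# columns and becomes the row index.
toy_country = toy.set_index("country")

# Row labels are now strings, so .loc takes the country name.
# Several rows share the label 'Germany', so the result is a Series.
germany_gdp = toy_country.loc["Germany", "gdpPercap"]
print(germany_gdp)
```

Because the index labels repeat, one label can match many rows; on the full dataset, a single country would be expected to return one value per recorded year.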


### Exercise

1. Extract the population and year columns for Australia using `gapminder_country`.

2. Extract the 'country' and 'lifeExp' columns for the first, second, and third rows of `gapminder_country`.