## DataFrame indexing

In this section we will learn about the column and row indexes of a DataFrame. These are essentially the DataFrame's column names and row labels.

We will keep working with the gapminder DataFrame.

Let's start by importing the libraries that we need:

In [None]:
# import the pandas library
import pandas as pd

Let's load the gapminder dataset:

In [None]:
# read the data from the CSV file at the URL below and save it as a pandas DataFrame called gapminder
gapminder = pd.read_csv('https://raw.githubusercontent.com/UofUDELPHI/2024-02-08-python/main/content/complete/data/gapminder.csv')

To view the dataset we have just loaded, we can type the name of the variable that we saved it in:

In [None]:
# look at the gapminder object
gapminder

### The column index

To extract the column index, which corresponds to the column names of the DataFrame, we access the `columns` attribute of the DataFrame object.

In [None]:
# extract the column index (the column names) from the gapminder DataFrame
gapminder.columns

Notice that the output of the cell above is an "Index" object. If we want to just extract the column values themselves from the index object, we can use the `list()` function to convert the index object to a simpler type of object called a "list" (which is just a collection of values):

In [None]:
# use the list() function to create a "list" of the column names
list(gapminder.columns)
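
As a minimal sketch of the same idea that runs without downloading anything, the cell below uses a small made-up DataFrame called `toy` (a hypothetical stand-in for `gapminder`, with invented values) to show the difference between the `Index` object and the plain list of names:

```python
import pandas as pd

# a small, made-up DataFrame standing in for gapminder (names and values are illustrative only)
toy = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'year': [2000, 2000, 2000],
    'pop': [1.0, 2.0, 3.0],
})

# the columns attribute is a pandas Index object
column_index = toy.columns
print(type(column_index))

# list() converts the Index to a plain Python list of column names
column_names = list(column_index)
print(column_names)

# the Index object's own tolist() method gives the same result
print(column_index.tolist() == column_names)
```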

### The row index

The row index can be extracted using the `index` attribute (there is no `rows` attribute):

In [None]:
# extract the row index from gapminder
gapminder.index

This time, the output is a `RangeIndex` object, which represents a sequence of integers defined by a start value, a stop value, and a step size. Since the start is 0, the stop is 1704 (note that the stop is *not* inclusive), and the step size is 1, this RangeIndex corresponds to the integer values 0, 1, 2, 3, ..., 1703.

To extract the actual integer values from the RangeIndex object, we can convert it to a list using the `list()` function:

In [None]:
# use the list() function to create a "list" of the row index entries
list(gapminder.index)
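
The start, stop, and step values described above can also be read directly from the `RangeIndex` object. The sketch below assumes a made-up four-row DataFrame called `toy` (hypothetical, not part of the gapminder data) so that the whole index is easy to see:

```python
import pandas as pd

# a made-up DataFrame with four rows and the default integer row index
toy = pd.DataFrame({'value': [10, 20, 30, 40]})

# the default row index is a RangeIndex
row_index = toy.index
print(row_index)

# the start, stop, and step are available as attributes
print(row_index.start, row_index.stop, row_index.step)

# the stop value is not included, so four rows give the labels 0 to 3
row_labels = list(row_index)
print(row_labels)
```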

### Changing the index

You can change the index using the `set_index()` method and providing, for example, a column name as a string.

Let's set the `country` column to be the row index:

In [None]:
# use the set_index() method to set the 'country' column as the index of gapminder
gapminder.set_index('country')

Notice that the row index name (`country`) appears on its own line in the output, and that the original integer row index has disappeared.

However, this did not actually modify the `gapminder` object itself, because by default `set_index()` returns a new DataFrame. When we print `gapminder` below, notice that it is unchanged:

In [None]:
# show that the gapminder DataFrame still has its original index
gapminder
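
To make this behavior concrete, here is a minimal sketch using a made-up DataFrame called `toy` (a hypothetical stand-in for `gapminder`). It assumes the default behavior of `set_index()`, which returns a new DataFrame rather than modifying the original:

```python
import pandas as pd

# a made-up DataFrame standing in for gapminder (values are illustrative only)
toy = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'pop': [1, 2, 3],
})

# set_index() returns a new DataFrame; here we capture it in a variable called result
result = toy.set_index('country')

# the original still has 'country' as a column and its integer row index
print(list(toy.columns))
print(toy.index)

# in the result, 'country' has moved from the columns to the row index
print(list(result.columns))
print(result.index.name)
```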

If we want to create a version of the `gapminder` DataFrame with the `country` column as the index, we need to save it as a new variable. (You can instead overwrite the `gapminder` variable with this new version, but this is not recommended, because we want to keep an unmodified version of the original dataset accessible in our environment.)

Below, we create a *new* DataFrame corresponding to the version of `gapminder` with the `'country'` column as the row index:

In [None]:
# create a new DataFrame called gapminder_country where the index is the 'country' column
gapminder_country = gapminder.set_index('country')

Notice that the original gapminder dataset is unchanged:

In [None]:
# print gapminder to show that gapminder DataFrame still has its original index
gapminder

The `gapminder_country` DataFrame, however, has the `country` column as its index:

In [None]:
# print gapminder_country to show that it has the 'country' column as the index 
gapminder_country