## <center><b>Python for Data Science</b></center>
## <center><b>Lesson 27</b></center>
## <center><b>Pandas Basics -- Part Three</b></center>
## <center><b>Pandas Index (Notes)</b></center>

![7.jpg](attachment:7.jpg)

<font size="6"><center>[Link: Pandas Documentation](https://pandas.pydata.org/docs/)</center></font>

##  <span style="color:red">TABLE OF CONTENTS</span>

1. [What Is a Pandas Index?](#1)<br>
2. [What Does the Pandas Index Do?](#2)<br>
a. [Index for Identification](#2a)<br>
b. [Index for Selection -- loc method](#2b)<br>
c. [Index for Selection -- iloc method](#2c)<br>
d. [Index for Alignment](#2d)<br>

In [12]:
# Set up the notebook to display multiple outputs in one cell

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

print('The notebook is set up to display multiple output in one cell.')

The notebook is set up to display multiple output in one cell.


In [5]:
# Import libraries

import pandas as pd
import numpy as np

# The Data Structures / Objects Provided by Pandas

1. Pandas DataFrame (2-Dimensional)
2. Pandas Series (1-Dimensional)
3. <code style="background:yellow;color:black">Pandas Index</code>

<a class="anchor" id="1"></a>
# <span style="color:blue"><b>1. What Is a Pandas Index?</b></span>

<b>So what exactly is an Index?</b>

It is one of the 3 data structures in Pandas, the other 2 being the Pandas Series and the Pandas DataFrame.

You can think of the Pandas Index as an address used to identify specific rows or columns.

Note that the index is not counted among the DataFrame’s dimensions: the shape of a DataFrame covers only its rows and its data columns.
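To make the point about dimensions concrete, here is a minimal sketch. It assumes a small three-row DataFrame built from a few of the Big Ten enrollment figures used later in this lesson; the variable names are illustrative only.

```python
import pandas as pd

# Small DataFrame with the default RangeIndex (0, 1, 2)
df = pd.DataFrame({"School": ["Iowa", "Purdue", "Rutgers"],
                   "Enrollment": [30448, 45869, 50411]})

# shape counts rows and data columns only; the index is not counted as a column
shape_default = df.shape

# Moving 'School' into the index removes it from the column count
df_by_school = df.set_index("School")
shape_indexed = df_by_school.shape

print(shape_default)   # (3, 2)
print(shape_indexed)   # (3, 1)
```

The row count stays the same in both cases; only the number of data columns changes when a column becomes the index.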

<b>For a Pandas DataFrame ...</b>

The indices are the row names and column names. 

Typically, the row names are referred to simply as the <b>Index</b>, while the column names are still referred to as <b>column names</b>.

Below are some code snippets to print the Index and Column names.

In [10]:
# Create a Pandas DataFrame

big_ten_df = pd.DataFrame({
                  'School': ['Illinois', 'Indiana', 'Iowa', 'Maryland', 'Michigan', 'Michigan State', 
                             'Minnesota', 'Nebraska', 'Northwestern', 'Ohio_State', 'Penn State', 'Purdue', 
                             'Rutgers', 'Wisconsin'], 
                  'City': ['Urbana-Champaign', 'Bloomington', 'Iowa City', 'College Park', 'Ann Arbor', 
                           'East Lansing', 'Minneapolis-St. Paul', 'Lincoln', 'Evanston', 'Columbus', 
                           'University Park', 'West Lafayette', 'New_Brunswick-Piscataway', 'Madison'], 
                  'State': ['Illinois', 'Indiana', 'Iowa', 'Maryland', 'Michigan', 'Michigan', 'Minnesota', 
                            'Nebraska', 'Illinois', 'Ohio', 'Pennsylvania', 'Indiana', 'New_Jersey', 'Wisconsin'], 
                  'Division': ['West', 'East', 'West', 'East', 'East', 'East', 'West', 'West', 'West', 'East', 
                               'East', 'West','East', 'West'], 
                  'Enrollment': [52331, 42552, 30448, 40709, 47907, 49695, 52017, 25057, 22316, 61369, 45901, 
                                 45869, 50411, 45540], 
                  'Type': ['Public', 'Public', 'Public', 'Public', 'Public', 'Public', 'Public', 'Public', 
                           'Private', 'Public', 'Public', 'Public', 'Public', 'Public'], 
                  'Nickname': ['Fighting_Illini', 'Hoosiers', 'Hawkeyes', 'Terrapins', 'Wolverines', 'Spartans', 
                               'Gophers', 'Cornhuskers', 'Wildcats', 'Buckeyes', 'Nittany_Lions', 'Boilermakers', 
                               'Scarlet_Knights', 'Badgers']})

big_ten_df

# To print the Index (row names) of a DataFrame
big_ten_df.index

# To print the column names of a DataFrame
big_ten_df.columns

Unnamed: 0,School,City,State,Division,Enrollment,Type,Nickname
0,Illinois,Urbana-Champaign,Illinois,West,52331,Public,Fighting_Illini
1,Indiana,Bloomington,Indiana,East,42552,Public,Hoosiers
2,Iowa,Iowa City,Iowa,West,30448,Public,Hawkeyes
3,Maryland,College Park,Maryland,East,40709,Public,Terrapins
4,Michigan,Ann Arbor,Michigan,East,47907,Public,Wolverines
5,Michigan State,East Lansing,Michigan,East,49695,Public,Spartans
6,Minnesota,Minneapolis-St. Paul,Minnesota,West,52017,Public,Gophers
7,Nebraska,Lincoln,Nebraska,West,25057,Public,Cornhuskers
8,Northwestern,Evanston,Illinois,West,22316,Private,Wildcats
9,Ohio_State,Columbus,Ohio,East,61369,Public,Buckeyes


RangeIndex(start=0, stop=14, step=1)

Index(['School', 'City', 'State', 'Division', 'Enrollment', 'Type',
       'Nickname'],
      dtype='object')

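<b>For a Pandas Series ...</b>

- A Series also has a row Index, available through its <b>index</b> attribute. In addition, a Series carries a <b>name</b>, a single label for the Series as a whole. The Series name can be set initially when calling the constructor.

- Below is a code snippet to print the Series name: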

![5.jpg](attachment:5.jpg)

In [16]:
# Create a Pandas Series
ivy_league_series = pd.Series(['Brown', 'Columbia', 'Cornell', 'Dartmouth', 'Harvard', 'Penn', 'Princeton', 'Yale'], 
                              name = 'Ivy_League_Schools') 
ivy_league_series

# To print the Series name
ivy_league_series.name

0        Brown
1     Columbia
2      Cornell
3    Dartmouth
4      Harvard
5         Penn
6    Princeton
7         Yale
Name: Ivy_League_Schools, dtype: object

'Ivy_League_Schools'
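The difference between a Series' index and its name can be sketched as follows. This is a minimal illustration that assumes a shortened version of the Series above; `ivy` and `frame` are names chosen only for this example.

```python
import pandas as pd

# Shortened version of the Series above, with a name set in the constructor
ivy = pd.Series(["Brown", "Columbia", "Cornell"], name="Ivy_League_Schools")

# The index holds the row labels (a default RangeIndex here)
row_labels = list(ivy.index)

# The name labels the Series as a whole
series_name = ivy.name

# When the Series is converted to a DataFrame, the name becomes the column name
frame = ivy.to_frame()

print(row_labels)            # [0, 1, 2]
print(series_name)           # Ivy_League_Schools
print(list(frame.columns))   # ['Ivy_League_Schools']
```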

<a class="anchor" id="2"></a>
# <span style="color:blue"><b>2. What Does the Pandas Index Do?</b></span>

Video #1: [What do I need to know about the pandas index? (Part 1)](https://www.youtube.com/watch?v=OYZNk7Z9s6I)

Video #2: [What do I need to know about the pandas index? (Part 2)](https://www.youtube.com/watch?v=15q-is8P_H4)

<b>Essentially, the Pandas index has 3 roles:</b>

1. Identification
2. Selection
3. Alignment

<a class="anchor" id="2a"></a>
## <span style="color:red"><b><i>a. Index for Identification</i></b></span>

- In the screenshot shown below, we can see that each row is identified by a unique index number. 

- Notice that the numbers do not form a simple running sequence (e.g. 0, 1, 2, 3, 4, etc.) because the DataFrame has been filtered to show only the rows where the sex column equals ‘male’. 

- In particular, the filtered result keeps the rows with index numbers 0, 4, 6, 8, 9, …, 323 and 325.

![image.png](attachment:image.png)
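The same behavior can be reproduced on a small scale. The sketch below assumes a made-up five-row DataFrame standing in for the Penguins data (the values are illustrative, not taken from the real dataset).

```python
import pandas as pd

# Toy stand-in for the Penguins data (illustrative values only)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "sex": ["male", "female", "male", "female", "male"],
})

# Filtering keeps each row's original index label
males = df[df["sex"] == "male"]
kept_labels = list(males.index)

# reset_index(drop=True) renumbers the rows from 0 if a running sequence is wanted
males_renumbered = males.reset_index(drop=True)
new_labels = list(males_renumbered.index)

print(kept_labels)   # [0, 2, 4]
print(new_labels)    # [0, 1, 2]
```

Because the labels survive the filter, each row in the result can still be traced back to its position in the original DataFrame.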

<a class="anchor" id="2b"></a>
## <span style="color:red"><b><i>b. Index for Selection -- loc method</i></b></span>

![6.jpg](attachment:6.jpg)

![7.jpg](attachment:7.jpg)

- The <b>loc</b> method is built for working with <b>labels</b>: the row labels held in the Index and the column names. 

- Typically, column names are descriptive text labels, such as <b>species</b> in this Penguins dataset.

- Thus, we can select columns by using their names:

![8.jpg](attachment:8.jpg)
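As a rough text version of label-based selection, the sketch below assumes a made-up four-row DataFrame with Penguins-style column names (the values are illustrative, not the real dataset).

```python
import pandas as pd

# Toy stand-in for the Penguins data (illustrative values only)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Biscoe", "Dream"],
    "body_mass_g": [3750, 3800, 5000, 3700],
})

# Select one column by its label, keeping all rows
species = df.loc[:, "species"]

# Select several columns by label for the rows labeled 0 through 2.
# A loc slice works on labels and includes both endpoints.
subset = df.loc[0:2, ["species", "body_mass_g"]]

print(species.tolist())
print(subset)
```

Note that `df.loc[0:2]` returns three rows here, because the slice ends at the label 2 rather than stopping before it.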

<a class="anchor" id="2c"></a>
## <span style="color:red"><b><i>c. Index for Selection -- iloc method</i></b></span>

![9.jpg](attachment:9.jpg)

<b>iloc method</b>

- There is also the <b>iloc</b> method, which selects rows and columns by their integer positions rather than by their labels. 

- More specifically, instead of referring to the first column as ‘species’, we can refer to it as 0, since the first column sits at integer position 0.

- In the screenshot below, specific rows and columns are selected by their numerical positions (e.g. positions 0, 1, 2 refer to the first three rows, and likewise to the first three columns).

![11.jpg](attachment:11.jpg)
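Position-based selection can be sketched in the same way. The example assumes a made-up four-row DataFrame with Penguins-style column names (illustrative values, not the real dataset) and also shows how positions and labels can diverge after filtering.

```python
import pandas as pd

# Toy stand-in for the Penguins data (illustrative values only)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Biscoe", "Dream"],
    "body_mass_g": [3750, 3800, 5000, 3700],
})

# Rows at positions 0-2 and columns at positions 0-1.
# An iloc slice excludes its end position, like a regular Python slice.
first_block = df.iloc[0:3, 0:2]

# The column at position 0 is 'species'
first_column = df.iloc[:, 0]

# After filtering, labels and positions no longer coincide
biscoe = df[df["island"] == "Biscoe"]   # keeps the rows labeled 1 and 2
by_position = biscoe.iloc[0]            # first remaining row
by_label = biscoe.loc[1]                # the same row, selected by its label

print(first_block)
print(by_position.equals(by_label))
```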

<a class="anchor" id="2d"></a>
## <span style="color:red"><b><i>d. Index for Alignment</i></b></span>

- The index number serves as a unique identifier of the rows and columns.

- We can see in the example below that multiplying <b>df.body_mass_g</b> by <b>s1</b> gives rise to <b>NaN</b> values wherever the <b>s1 Series</b> has no value at the matching index label to multiply with <b>df.body_mass_g</b>.

- Thus, the index values are aligned, and only those present in both Pandas objects produce a valid value (i.e. the <b>s1 Series</b> has only 5 index values, and these produce valid results), while those missing from either object produce <b>NaN</b> values.

![image.png](attachment:image.png)
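Alignment can be reproduced with two small Series. The sketch below is only an approximation of the screenshot: the body mass values are made up, and the `s1` used here is a hypothetical three-value Series rather than the five-value one shown above.

```python
import pandas as pd

# Made-up body mass values with the default index 0..5
body_mass_g = pd.Series([3750.0, 3800.0, 3250.0, 3450.0, 3650.0, 3625.0],
                        name="body_mass_g")

# Hypothetical multiplier Series that only has values for labels 0, 2 and 4
s1 = pd.Series([1, 2, 3], index=[0, 2, 4])

# Values are matched by index label, not by position.
# Labels missing from either Series yield NaN in the result.
product = body_mass_g * s1

print(product)
```

Only the labels 0, 2 and 4 appear in both Series, so only those rows hold a number in `product`; the rows labeled 1, 3 and 5 are NaN.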

### Related Articles

[How to Use loc and iloc for Selecting Data in Pandas](https://towardsdatascience.com/how-to-use-loc-and-iloc-for-selecting-data-in-pandas-bd09cb4c3d79)

[loc vs. iloc in pandas](https://towardsdatascience.com/loc-vs-iloc-in-pandas-92fc125ed8eb)