---
# Index Alignment
---

## Examining the Index object
All Index objects, except for the MultiIndex, are single-dimensional data structures that combine the functionality of Python sets and NumPy ndarrays.
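As a quick illustration of that dual nature, here is a minimal sketch on a hand-built Index. The labels are a few of the college columns typed in by hand, not read from the file.

```python
import pandas as pd

# Hand-built Index standing in for a few of the college column labels
idx = pd.Index(['INSTNM', 'CITY', 'STABBR', 'HBCU'])

# ndarray-like behavior: positional access, slicing, vectorized comparison
print(idx[1])                   # CITY
print(idx[1:3].tolist())        # ['CITY', 'STABBR']
print((idx > 'G').tolist())     # [True, False, True, True]

# set-like behavior: hash-based membership tests and a uniqueness check
print('CITY' in idx)            # True
print('city' in idx)            # False (labels are case-sensitive)
print(idx.is_unique)            # True
```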

Examine the column index of the college dataset and explore much of its functionality.
Read in the college dataset and create a variable, `cols`, that holds the column index:

In [1]:
import numpy as np
import pandas as pd

In [3]:
college = pd.read_csv('./college.csv')
cols = college.columns
cols

Index(['INSTNM', 'CITY', 'STABBR', 'HBCU', 'MENONLY', 'WOMENONLY', 'RELAFFIL',
       'SATVRMID', 'SATMTMID', 'DISTANCEONLY', 'UGDS', 'UGDS_WHITE',
       'UGDS_BLACK', 'UGDS_HISP', 'UGDS_ASIAN', 'UGDS_AIAN', 'UGDS_NHPI',
       'UGDS_2MOR', 'UGDS_NRA', 'UGDS_UNKN', 'PPTUG_EF', 'CURROPER', 'PCTPELL',
       'PCTFLOAN', 'UG25ABV', 'MD_EARN_WNE_P10', 'GRAD_DEBT_MDN_SUPP'],
      dtype='object')

Use the `.values` attribute to access the underlying NumPy array:

In [6]:
cols.values

array(['INSTNM', 'CITY', 'STABBR', 'HBCU', 'MENONLY', 'WOMENONLY',
       'RELAFFIL', 'SATVRMID', 'SATMTMID', 'DISTANCEONLY', 'UGDS',
       'UGDS_WHITE', 'UGDS_BLACK', 'UGDS_HISP', 'UGDS_ASIAN', 'UGDS_AIAN',
       'UGDS_NHPI', 'UGDS_2MOR', 'UGDS_NRA', 'UGDS_UNKN', 'PPTUG_EF',
       'CURROPER', 'PCTPELL', 'PCTFLOAN', 'UG25ABV', 'MD_EARN_WNE_P10',
       'GRAD_DEBT_MDN_SUPP'], dtype=object)
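Newer pandas documentation recommends `.to_numpy()` over `.values` when you specifically want a NumPy array. The reason is that `.values` can return a pandas extension array for some dtypes. A minimal sketch using a small hand-built Index plus a categorical Index to show the difference:

```python
import numpy as np
import pandas as pd

cols = pd.Index(['INSTNM', 'CITY', 'STABBR'])
arr = cols.to_numpy()                  # always a NumPy ndarray
print(isinstance(arr, np.ndarray))     # True
print(arr.tolist())                    # ['INSTNM', 'CITY', 'STABBR']

# For a categorical Index, .values is a pandas Categorical, not an ndarray
cat = pd.Index(['a', 'b', 'a'], dtype='category')
print(type(cat.values).__name__)       # Categorical
print(type(cat.to_numpy()).__name__)   # ndarray
```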

Select items from the index by position with a scalar, list, or slice. A scalar returns a single label, while a list or slice returns a new Index:

In [7]:
cols[5]

'WOMENONLY'

In [9]:
cols[[1, 8, -1]]

Index(['CITY', 'SATMTMID', 'GRAD_DEBT_MDN_SUPP'], dtype='object')

In [10]:
cols[2:6]

Index(['STABBR', 'HBCU', 'MENONLY', 'WOMENONLY'], dtype='object')

Indexes share many of the same methods as Series and DataFrames:

In [12]:
cols.min(), cols.max(), cols.isnull().sum(), cols.value_counts().sum()

('CITY', 'WOMENONLY', 0, 27)
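To check that the method names really are shared, here is a minimal sketch that calls the same methods on a hand-built Index and on a Series made from it. The labels, including the deliberate duplicate, are illustrative.

```python
import pandas as pd

cols = pd.Index(['INSTNM', 'CITY', 'STABBR', 'CITY'])  # 'CITY' repeated on purpose
ser = pd.Series(cols)

for obj in (cols, ser):
    # min/max compare strings lexicographically; nunique counts 'CITY' once
    print(type(obj).__name__, obj.min(), obj.max(), obj.nunique(), int(obj.isnull().sum()))
# Index CITY STABBR 3 0
# Series CITY STABBR 3 0
```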

Basic arithmetic and comparison operators work element-wise on Index objects:

In [13]:
cols + '_A'

Index(['INSTNM_A', 'CITY_A', 'STABBR_A', 'HBCU_A', 'MENONLY_A', 'WOMENONLY_A',
       'RELAFFIL_A', 'SATVRMID_A', 'SATMTMID_A', 'DISTANCEONLY_A', 'UGDS_A',
       'UGDS_WHITE_A', 'UGDS_BLACK_A', 'UGDS_HISP_A', 'UGDS_ASIAN_A',
       'UGDS_AIAN_A', 'UGDS_NHPI_A', 'UGDS_2MOR_A', 'UGDS_NRA_A',
       'UGDS_UNKN_A', 'PPTUG_EF_A', 'CURROPER_A', 'PCTPELL_A', 'PCTFLOAN_A',
       'UG25ABV_A', 'MD_EARN_WNE_P10_A', 'GRAD_DEBT_MDN_SUPP_A'],
      dtype='object')

In [14]:
cols > 'G'

array([ True, False,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False,  True,  True,  True,  True,  True])

In [20]:
uniq, cnt = np.unique((cols > 'G'), return_counts=True)  
dict(zip(uniq, cnt))

{False: 3, True: 24}
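String comparisons are lexicographic (character by character), which is why `'GRAD_DEBT_MDN_SUPP' > 'G'` is True. Because True counts as 1, summing the boolean mask is a shorter way to count matches than `np.unique`. The same mask can also select labels from the Index. A sketch on a few hand-picked labels:

```python
import pandas as pd

cols = pd.Index(['INSTNM', 'CITY', 'DISTANCEONLY', 'CURROPER', 'GRAD_DEBT_MDN_SUPP'])
mask = cols > 'G'              # element-wise, lexicographic string comparison

n_true = int(mask.sum())       # True counts as 1
n_false = int((~mask).sum())
print({False: n_false, True: n_true})   # {False: 3, True: 2}

# Boolean selection keeps only the labels where the mask is True
print(cols[mask].tolist())     # ['INSTNM', 'GRAD_DEBT_MDN_SUPP']
```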

Indexes are immutable: trying to change an Index value after its creation raises a `TypeError`, so the assignment below is left commented out:

In [23]:
# cols[1] = 'city'  # raises TypeError: Index does not support mutable operations
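A minimal sketch of the failure, plus one way to get a modified set of labels: build a new Index, which leaves the original untouched. The three labels are typed in by hand for illustration. On a DataFrame, `rename(columns={'CITY': 'city'})` returns a copy with a relabeled column index.

```python
import pandas as pd

cols = pd.Index(['INSTNM', 'CITY', 'STABBR'])

raised = False
try:
    cols[1] = 'city'           # item assignment is not supported on an Index
except TypeError as err:
    raised = True
    print('TypeError:', err)

# To "change" a label, build a new Index; the original object is left as-is
new_cols = pd.Index(['city' if c == 'CITY' else c for c in cols])
print(new_cols.tolist())       # ['INSTNM', 'city', 'STABBR']
print(cols.tolist())           # ['INSTNM', 'CITY', 'STABBR']
```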

Indexes support the set operations union, intersection, difference, and symmetric difference:

In [25]:
c1 = cols[:4]
c1

Index(['INSTNM', 'CITY', 'STABBR', 'HBCU'], dtype='object')

In [26]:
c2 = cols[2:6]
c2

Index(['STABBR', 'HBCU', 'MENONLY', 'WOMENONLY'], dtype='object')

In [27]:
c1.union(c2)

Index(['CITY', 'HBCU', 'INSTNM', 'MENONLY', 'STABBR', 'WOMENONLY'], dtype='object')

In [28]:
# Older pandas only: in pandas 2.0+, | is element-wise, so prefer c1.union(c2)
c1 | c2

Index(['CITY', 'HBCU', 'INSTNM', 'MENONLY', 'STABBR', 'WOMENONLY'], dtype='object')

In [29]:
c1.symmetric_difference(c2)

Index(['CITY', 'INSTNM', 'MENONLY', 'WOMENONLY'], dtype='object')

In [30]:
# Older pandas only: in pandas 2.0+, ^ is element-wise, so prefer c1.symmetric_difference(c2)
c1 ^ c2

Index(['CITY', 'INSTNM', 'MENONLY', 'WOMENONLY'], dtype='object')
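Intersection and difference follow the same method-call pattern. Here is a minimal sketch that rebuilds `c1` and `c2` by hand with the same labels so it runs on its own. The results are wrapped in `sorted` because whether pandas sorts the output depends on the method and its `sort` argument.

```python
import pandas as pd

c1 = pd.Index(['INSTNM', 'CITY', 'STABBR', 'HBCU'])
c2 = pd.Index(['STABBR', 'HBCU', 'MENONLY', 'WOMENONLY'])

both = c1.intersection(c2)     # labels in c1 AND c2
only_c1 = c1.difference(c2)    # labels in c1 but NOT in c2
only_c2 = c2.difference(c1)    # labels in c2 but NOT in c1

print(sorted(both))            # ['HBCU', 'STABBR']
print(sorted(only_c1))         # ['CITY', 'INSTNM']
print(sorted(only_c2))         # ['MENONLY', 'WOMENONLY']

# The symmetric difference is the union of the two one-sided differences
print(sorted(c1.symmetric_difference(c2)) == sorted(only_c1.union(only_c2)))  # True
```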

## Producing Cartesian products
Construct two Series whose indexes are different but share some labels:

In [31]:
s1 = pd.Series(data=list(range(4)), index=list('aaab'))
s1

a    0
a    1
a    2
b    3
dtype: int64

In [32]:
s2 = pd.Series(data=list(range(6)), index=list('cababb'))
s2

c    0
a    1
b    2
a    3
b    4
b    5
dtype: int64

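Add the two Series together to produce a Cartesian product. For each `a` label in s1, pandas adds every `a` in s2, so three `a` values times two `a` values yields six `a` rows. Likewise, the single `b` in s1 pairs with each of the three `b` values in s2. The label `c` exists only in s2, so its sum is NaN: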

In [33]:
s1 + s2

a    1.0
a    3.0
a    2.0
a    4.0
a    3.0
a    5.0
b    5.0
b    7.0
b    8.0
c    NaN
dtype: float64
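The length of the result follows from the label counts. Each shared label contributes (occurrences in s1) × (occurrences in s2) rows, and a label found in only one Series keeps one NaN row per occurrence. A sketch of that bookkeeping, checked against the actual addition:

```python
import pandas as pd

s1 = pd.Series(range(4), index=list('aaab'))
s2 = pd.Series(range(6), index=list('cababb'))

n1 = s1.index.value_counts()   # a: 3, b: 1
n2 = s2.index.value_counts()   # b: 3, a: 2, c: 1

shared = n1.index.intersection(n2.index)
paired_rows = int((n1[shared] * n2[shared]).sum())                   # 3*2 + 1*3 = 9
unmatched_rows = int(n1.drop(shared).sum() + n2.drop(shared).sum())  # only 'c' -> 1

result = s1 + s2
print(paired_rows + unmatched_rows, len(result))   # 10 10
```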

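Because NaN is a floating-point value, the whole result is upcast from integers to floats. Checking the type of every value confirms it:

In [36]: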
(s1+s2).apply(type).unique()

array([<class 'float'>], dtype=object)
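NumPy's `int64` has no way to represent a missing value, which is why pandas switches to `float64` here. The sketch below locates the missing row and converts back to integers once that row is dropped. Using `dropna` plus `astype` is one option among several, not the only approach.

```python
import pandas as pd

s1 = pd.Series(range(4), index=list('aaab'))
s2 = pd.Series(range(6), index=list('cababb'))

result = s1 + s2
print(result.dtype)                          # float64
print(result[result.isna()].index.tolist())  # ['c']

# With the unmatched label gone, the remaining values fit in an integer dtype
as_int = result.dropna().astype('int64')
print(as_int.dtype, len(as_int))             # int64 9
```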

## Exploding indexes
Add two larger Series whose indexes have only a few unique values, but in different orders. Because every occurrence of a label pairs with every matching occurrence in the other Series, the number of values in the resulting index explodes.
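A minimal sketch of the explosion using synthetic data rather than the employee dataset. It uses three labels, each repeated four times, in two different orders. Every label pairs 4 × 4 ways, so two 12-row Series produce 48 rows. When the two indexes are exactly equal (same labels in the same order), pandas skips alignment and adds position by position, so no explosion occurs.

```python
import pandas as pd

# Synthetic data: labels a, b, c each appear 4 times
left = pd.Series(1, index=list('abc') * 4)
right = pd.Series(1, index=list('cba') * 4)   # same labels, different order

exploded = left + right
print(len(left), len(right), len(exploded))   # 12 12 48

# Identical indexes (same order) are added element by element instead
same_order = left + pd.Series(1, index=list('abc') * 4)
print(len(same_order))                        # 12
```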

Read in the employee data and set the index to the RACE column: