### Indexes

If you have not already done so, you will need to install Pandas in your virtual environment:

```
pip install pandas
```

API reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html

We'll start by looking at Pandas `Index` objects.

By themselves they are not that useful, but they are tightly integrated with two other fundamental types in Pandas: `Series` and `DataFrame`.

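To make that integration concrete, here is a minimal sketch showing where `Index` objects appear in a `Series` and a `DataFrame`. The labels and values (`'a'`, `'r1'`, `'x'`, and so on) are made up purely for illustration.

```python
import pandas as pd

# A Series stores its row labels in an Index.
s = pd.Series([100, 200, 300], index=pd.Index(['a', 'b', 'c']))
print(type(s.index).__name__)

# A DataFrame uses Index objects for both its row labels and its column labels.
df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=pd.Index(['r1', 'r2']))
print(list(df.index))
print(list(df.columns))
```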
In [1]:
import pandas as pd
import numpy as np

In [2]:
idx = pd.Index([10, 20, 30])

In [3]:
idx

Index([10, 20, 30], dtype='int64')

As you can see, an index has a specific data type, just like a NumPy array.

If we mix data types, Pandas will find a suitable data type broad enough for our elements:

In [4]:
idx = pd.Index([1, 3.14])
idx

Index([1.0, 3.14], dtype='float64')

We can also specify strings for index elements:

In [5]:
idx = pd.Index(['element 1', 'element 2'])
idx

Index(['element 1', 'element 2'], dtype='object')

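The same type inference applies when numbers and strings are mixed, and we can also request a data type explicitly instead of relying on inference. The sketch below uses hypothetical variable names (`mixed`, `as_float`) and illustrative values only.

```python
import pandas as pd

# Integers and strings share no common numeric type,
# so Pandas falls back to the generic object dtype.
mixed = pd.Index([1, 'a'])
print(mixed.dtype)

# Passing dtype explicitly overrides the inferred type.
as_float = pd.Index([10, 20, 30], dtype='float64')
print(as_float.dtype)
```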
What's interesting about indexes is that they can behave both like arrays and like sets.

Let's look at the array-like behaviors first:

In [6]:
idx = pd.Index([2, 4, 6, 8, 10])
idx

Index([2, 4, 6, 8, 10], dtype='int64')

In [7]:
idx[0]

np.int64(2)

In [8]:
idx[1:4]

Index([4, 6, 8], dtype='int64')

In [9]:
idx[::-1]

Index([10, 8, 6, 4, 2], dtype='int64')

We can use fancy indexing (since Index arrays are based on NumPy arrays):

In [10]:
idx[[1, 3, 4]]

Index([4, 8, 10], dtype='int64')

And we can also use boolean masking, just like with normal NumPy arrays:

In [11]:
idx = pd.Index(['London', 'Paris', 'New York', 'Tokyo'])
idx[idx != 'Tokyo']

Index(['London', 'Paris', 'New York'], dtype='object')

Notice how slicing or fancy indexing returns a new `Index` object, not just a Python list or NumPy array.

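If a plain Python list or NumPy array is what we actually need, the result can be converted explicitly. This is a small sketch using the `to_list` and `to_numpy` methods; the variable names are chosen here only for illustration.

```python
import pandas as pd

numbers = pd.Index([2, 4, 6, 8, 10])
subset = numbers[1:4]            # slicing gives back an Index

as_list = subset.to_list()       # plain Python list
as_array = subset.to_numpy()     # NumPy array

print(type(subset).__name__)
print(as_list)
print(type(as_array).__name__)
```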
However, an index is **not** mutable:

In [12]:
try:
    idx[0] = 100
except TypeError as ex:
    print('TypeError:', ex)

TypeError: Index does not support mutable operations

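Because an index cannot be changed in place, the usual approach is to build a new one. As a minimal sketch, the `insert` and `delete` methods each return a new `Index` and leave the original untouched. The city `'Berlin'` and the variable names are illustrative assumptions.

```python
import pandas as pd

cities = pd.Index(['London', 'Paris', 'New York', 'Tokyo'])

# Each call returns a new Index; `cities` itself is not modified.
with_berlin = cities.insert(0, 'Berlin')
without_first = cities.delete(0)

print(list(with_berlin))
print(list(without_first))
print(list(cities))
```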

Note that the values we have been using inside the square brackets (`[]`) are the **positional indices** of the elements inside the `Index` objects.

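The distinction between a position and a value is easy to blur when the values are themselves integers. The sketch below contrasts the two: square brackets go from a position to a value, while the `get_loc` method goes from a value back to its position. The index contents are illustrative.

```python
import pandas as pd

values = pd.Index([10, 20, 30])

# Square brackets take a position: position 0 holds the value 10.
first = values[0]

# get_loc goes the other way: from a value to its position.
position = values.get_loc(20)

print(first, position)
```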
`Index` objects also behave like sets in the sense that we can find the union and intersection of them:

In [13]:
idx_1 = pd.Index(['a', 'b', 'c'])
idx_2 = pd.Index(['c', 'd', 'e'])

In [15]:
idx_1.intersection(idx_2)

Index(['c'], dtype='object')

In [17]:
idx_1.union(idx_2)

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

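Union and intersection are not the only set operations available. As a brief sketch, `difference` and `symmetric_difference` follow the same pattern; the variable names `left` and `right` are used here so the indexes defined above are not overwritten.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['c', 'd', 'e'])

# Elements of `left` that are not in `right`.
only_left = left.difference(right)

# Elements that are in exactly one of the two indexes.
either_not_both = left.symmetric_difference(right)

print(list(only_left))
print(list(either_not_both))
```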
If we have different data types, Pandas will use the broadest data type it needs for the resulting `Index` object:

In [19]:
pd.Index([1, 2, 3]).union(pd.Index([0.1, 0.2]))

Index([0.1, 0.2, 1.0, 2.0, 3.0], dtype='float64')

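Pandas also implements many more specialized index types, such as
- `RangeIndex`
- `Int64Index` and `Float64Index` (Pandas versions before 2.0 only)
- and many others...

We've already seen integer and floating-point indexes. Pandas versions before 2.0 displayed them as `Int64Index` and `Float64Index`; those classes were removed in Pandas 2.0, so the output shown here is a plain `Index` with a numeric `dtype`: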

In [20]:
pd.Index([1, 2, 3])

Index([1, 2, 3], dtype='int64')

In [21]:
pd.Index([0.1, 0.2])

Index([0.1, 0.2], dtype='float64')

A range index can be created in two ways:

Via a Python `range` object:

In [22]:
pd.Index(range(2, 10, 2))

RangeIndex(start=2, stop=10, step=2)

Or directly in Pandas:

In [23]:
pd.RangeIndex(2, 10, 2)

RangeIndex(start=2, stop=10, step=2)

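We often get a range index without asking for one: when a `Series` is created without explicit labels, Pandas gives it a default `RangeIndex` starting at 0. A minimal sketch, with made-up values:

```python
import pandas as pd

# No index argument, so Pandas supplies a default RangeIndex.
unlabeled = pd.Series(['x', 'y', 'z'])

print(unlabeled.index)
print(isinstance(unlabeled.index, pd.RangeIndex))
```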
We can still reference elements within that Index like we saw before:

In [24]:
idx = pd.RangeIndex(2, 10, 2)
idx

RangeIndex(start=2, stop=10, step=2)

In [25]:
idx[0]

2

In [26]:
idx[1:4]

RangeIndex(start=4, stop=10, step=2)

In [27]:
idx[::-1]

RangeIndex(start=8, stop=0, step=-2)

You'll notice that for slices, Pandas is smart about the result: it does not generate an entire list of values, it just creates another range index. This means that a `RangeIndex` can be a lot more efficient in terms of storage, and sometimes computation speed, than a regular index with the same explicit values.

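One rough way to see the storage difference is to compare the `memory_usage` of a range index with that of a regular index holding the same values. This is only a sketch: the size of one million elements is an arbitrary choice, and the exact byte counts depend on the platform and Pandas version, so none are shown here.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# Stores only start, stop and step.
as_range = pd.RangeIndex(n)

# Stores every value explicitly in a NumPy array.
as_explicit = pd.Index(np.arange(n))

range_bytes = as_range.memory_usage()
explicit_bytes = as_explicit.memory_usage()

print(range_bytes, explicit_bytes)
print(range_bytes < explicit_bytes)
```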
It can even handle unions and intersections in the same way:

In [28]:
idx_1 = pd.RangeIndex(0, 5)
list(idx_1)

[0, 1, 2, 3, 4]

In [29]:
idx_2 = pd.RangeIndex(4, 8)
list(idx_2)

[4, 5, 6, 7]

In [31]:
idx_1.intersection(idx_2)

RangeIndex(start=4, stop=5, step=1)

In [34]:
list(idx_1.intersection(idx_2))

[4]

In [35]:
idx_1.union(idx_2)

RangeIndex(start=0, stop=8, step=1)

In [36]:
list(idx_1.union(idx_2))

[0, 1, 2, 3, 4, 5, 6, 7]

Not all unions and intersections of ranges can be expressed as a new range, so sometimes we end up with a regular index, not a range index:

In [37]:
pd.RangeIndex(1, 10, 2).union(pd.RangeIndex(1, 10, 3))

Index([1, 3, 4, 5, 7, 9], dtype='int64')

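If code depends on the result being a range index, we can check the type of the result rather than assume it. The sketch below reuses the two unions shown above; the variable names are illustrative.

```python
import pandas as pd

# Overlapping ranges with the same step combine into a single range.
contiguous = pd.RangeIndex(0, 5).union(pd.RangeIndex(4, 8))

# Steps of 2 and 3 produce values with irregular spacing.
irregular = pd.RangeIndex(1, 10, 2).union(pd.RangeIndex(1, 10, 3))

print(isinstance(contiguous, pd.RangeIndex))
print(isinstance(irregular, pd.RangeIndex))
```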
We can use `in` for containment testing for indexes in general:

In [38]:
idx_1 = pd.Index(['a', 'b', 'c'])
idx_2 = pd.RangeIndex(0, 10, 2)

In [39]:
'b' in idx_1

True

In [40]:
6 in idx_2

True

In [41]:
'x' in idx_1

False

In [42]:
1 in idx_2

False

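The `in` operator tests one value at a time. For the reverse question, namely which elements of the index appear in a given collection, the `isin` method returns a boolean array that can be used directly as a mask. A small sketch with illustrative values:

```python
import pandas as pd

letters = pd.Index(['a', 'b', 'c'])

# One boolean per index element: is it among the given values?
mask = letters.isin(['b', 'x'])

print(list(mask))
print(list(letters[mask]))
```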
One last thing to note is that index values do not have to be unique:

In [43]:
idx = pd.Index([1, 1, 2, 2, 3, 3])
idx

Index([1, 1, 2, 2, 3, 3], dtype='int64')
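
When duplicates matter, an index can report on them. The following sketch uses `is_unique`, `duplicated` and `unique` on the same values as above; the variable name `repeated` is chosen only for this example.

```python
import pandas as pd

repeated = pd.Index([1, 1, 2, 2, 3, 3])

# False, because some values occur more than once.
print(repeated.is_unique)

# Marks every occurrence after the first one of each value.
print(list(repeated.duplicated()))

# A new Index with the duplicates dropped.
print(list(repeated.unique()))
```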