#### Part 16: Advanced Indexing and Selection in Pandas

In this notebook, we'll explore:
- Random sampling with seeds
- Setting with enlargement
- Fast scalar value getting and setting
- Dictionary-like get() method
- The lookup() method
- Index objects and metadata

##### Setup
First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np

##### 1. Random Sampling with Seeds

You can seed the random number generator used by `sample()` through the `random_state` argument, which accepts either an integer seed or a NumPy `RandomState` object (recent pandas versions also accept a `numpy.random.Generator`).

In [None]:
df4 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})

# With a given seed, the sample will always draw the same rows
df4.sample(n=2, random_state=2)

In [None]:
# Running it again with the same seed gives the same result
df4.sample(n=2, random_state=2)
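Instead of checking reproducibility by eye, you can compare two draws with `DataFrame.equals`. The sketch below also passes a freshly seeded `np.random.RandomState`. Current pandas versions turn an integer seed into a `RandomState` internally, so the two draws should match. Reusing a single `RandomState` object across calls advances its internal state: successive draws from it form a reproducible sequence, but they are not guaranteed to match each other.

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})

# The same integer seed reproduces the same rows
first = df4.sample(n=2, random_state=2)
second = df4.sample(n=2, random_state=2)
print(first.equals(second))  # True

# A freshly seeded RandomState with the same seed draws the same rows
from_rs = df4.sample(n=2, random_state=np.random.RandomState(2))
print(first.equals(from_rs))

# Reusing one RandomState object advances its state between calls,
# so draw_a and draw_b are reproducible as a pair but may differ
rs = np.random.RandomState(2)
draw_a = df4.sample(n=2, random_state=rs)
draw_b = df4.sample(n=2, random_state=rs)
```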

##### 2. Setting with Enlargement

The `.loc` and `[]` operations can perform enlargement when you set a value for a key that does not yet exist on that axis. For a Series, this is effectively an append operation.

In [None]:
se = pd.Series([1, 2, 3])
print(se)

In [None]:
# Setting a value at a label that is not yet in the index appends it
se[5] = 5.
print(se)
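Enlargement can also change the dtype. In the example above, appending the float `5.` to an integer Series upcasts it to `float64`. The short sketch below makes the new index labels and the resulting dtype explicit, and shows that `.loc` enlarges a Series the same way.

```python
import pandas as pd

se = pd.Series([1, 2, 3])

# Setting a float at a missing label appends it and upcasts the Series
se[5] = 5.
print(se.index.tolist())  # [0, 1, 2, 5]
print(se.dtype)           # float64

# .loc enlarges a Series in the same way
se.loc[7] = 7.0
print(se.index.tolist())  # [0, 1, 2, 5, 7]
```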

A DataFrame can be enlarged on either axis via `.loc`.

In [None]:
dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
                  columns=['A', 'B'])
dfi

In [None]:
# Adding a new column
dfi.loc[:, 'C'] = dfi.loc[:, 'A']
dfi

In [None]:
# Adding a new row
dfi.loc[3] = 5
dfi
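A new row can be set from a scalar, which is broadcast to every column, or from a list with one value per column. Enlargement is a label-based feature: `.iloc` works with positions that must already exist, so it raises an `IndexError` rather than growing the frame. A sketch of both cases:

```python
import numpy as np
import pandas as pd

dfi = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B'])
dfi.loc[:, 'C'] = dfi.loc[:, 'A']  # new column copied from A
dfi.loc[3] = 5                     # new row, scalar broadcast to every column
dfi.loc[4] = [10, 11, 12]          # new row from a list, one value per column
print(dfi.shape)  # (5, 3)

# .iloc cannot enlarge: positional indexers must already exist
try:
    dfi.iloc[10] = 0
    iloc_enlarged = True
except IndexError as exc:
    iloc_enlarged = False
    print("IndexError:", exc)
```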

##### 3. Fast Scalar Value Getting and Setting

Because indexing with `[]` must handle many cases (single-label access, slicing, boolean indexing, etc.), it incurs some overhead while working out what you are asking for. If you only need a single scalar value, the fastest option is the `at` and `iat` accessors, which are available on both Series and DataFrame.

- `at` provides label-based scalar lookups
- `iat` provides integer position-based scalar lookups
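For a single cell, `at` and `iat` return the same value as `loc` and `iloc`. The speed difference can be measured with `timeit`, as in the rough sketch below. The absolute timings depend on your machine and pandas version, so no particular numbers are assumed here.

```python
import timeit

import numpy as np
import pandas as pd

dates = pd.date_range('20000101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
label = dates[5]

# at/iat return the same scalar as loc/iloc for a single cell
same_label_value = df.at[label, 'A'] == df.loc[label, 'A']
same_position_value = df.iat[3, 0] == df.iloc[3, 0]

# Rough timing comparison over many repeated scalar lookups
t_loc = timeit.timeit(lambda: df.loc[label, 'A'], number=10_000)
t_at = timeit.timeit(lambda: df.at[label, 'A'], number=10_000)
print(f"loc: {t_loc:.4f}s  at: {t_at:.4f}s")
```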

In [None]:
# Create a Series and DataFrame for demonstration
s = pd.Series([0, 1, 2, 3, 4, 5])

dates = pd.date_range('20000101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# Display the DataFrame
df

In [None]:
# Using iat for integer-based lookup
s.iat[5]

In [None]:
# Using at for label-based lookup
df.at[dates[5], 'A']

In [None]:
# Using iat for integer-based lookup in DataFrame
df.iat[3, 0]

You can also set values using these same indexers:

In [None]:
# Setting values using at
df.at[dates[5], 'E'] = 7

# Setting values using iat
df.iat[3, 0] = 7

df
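In the cell above, `'E'` is not an existing column, so `df.at[dates[5], 'E'] = 7` adds that column. The new column is filled with `NaN` everywhere except the row that was set. By contrast, `iat` only overwrites a cell that already exists at that position. The following sketch checks both effects:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20000101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# 'E' does not exist yet, so at adds it; every other row gets NaN
df.at[dates[5], 'E'] = 7
print(df['E'].isna().sum())  # 7
print(df.at[dates[5], 'E'])  # 7.0

# iat overwrites an existing cell by position
df.iat[3, 0] = 7
print(df.iloc[3, 0])         # 7.0
```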

`at` enlarges the object in place when a label on either axis is missing:

In [None]:
# Adding a new row with at (0 is not an existing column label, so a new column 0 is added too)
df.at[dates[-1] + pd.Timedelta('1 day'), 0] = 7
df
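The sketch below is a smaller version of the cell above. It uses a tiny DataFrame in which both the row label and the column label are missing, so a single `at` assignment adds one row and one column. It also shows that `Series.at` enlarges the same way.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0]})

# Row label 2 and column 'B' are both missing, so at adds a row and a column
df.at[2, 'B'] = 3.0
print(df.shape)             # (3, 2)
print(df.columns.tolist())  # ['A', 'B']

# The same works on a Series
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s.at['d'] = 4
print(s.index.tolist())     # ['a', 'b', 'c', 'd']
```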

##### 4. Dictionary-like get() Method

Both Series and DataFrame have a dictionary-like `get` method that returns a default value when the key is missing. On a DataFrame, the key is a column label.

In [None]:
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Equivalent to s['a']
s.get('a')

In [None]:
# Getting a non-existent key with a default value
s.get('x', default=-1)
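If you omit `default`, a missing key returns `None` instead of raising a `KeyError`. On a DataFrame, `get` looks up columns, not rows. A short sketch of both behaviors:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s.get('x'))  # None

# On a DataFrame, get() looks up a column label
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
col = df.get('A')  # the 'A' column as a Series
missing = df.get('Z', default='no such column')
print(missing)     # no such column
```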

##### 5. The lookup() Method

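Sometimes you want to extract a set of values given a sequence of row labels and column labels. Older versions of pandas provided `DataFrame.lookup` for this, which returned a NumPy array. It was deprecated in pandas 1.2.0 and removed in pandas 2.0. You can get the same result by converting the labels to integer positions with `Index.get_indexer` and then applying NumPy fancy indexing to `to_numpy()`.

In [None]:
dflookup = pd.DataFrame(np.random.rand(20, 4), columns=['A', 'B', 'C', 'D'])
dflookup.head()

In [None]:
# Extract one value per (row label, column label) pair
rows = list(range(0, 10, 2))
cols = ['B', 'C', 'A', 'B', 'D']
dflookup.to_numpy()[dflookup.index.get_indexer(rows), dflookup.columns.get_indexer(cols)]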
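The `get_indexer` approach has a pitfall. A missing label comes back as `-1`, and NumPy treats `-1` as "the last element", so a typo in a label silently returns the wrong value. The sketch below wraps the approach in a hypothetical helper, `lookup_values` (not a pandas API), which rejects missing labels and mismatched lengths. It assumes unique row and column labels, because `get_indexer` raises on a non-unique index.

```python
import numpy as np
import pandas as pd


def lookup_values(frame, row_labels, col_labels):
    """Hypothetical helper mimicking the removed DataFrame.lookup.

    Returns one value per (row label, column label) pair.
    Assumes the frame's index and columns are unique.
    """
    if len(row_labels) != len(col_labels):
        raise ValueError("row_labels and col_labels must have the same length")
    row_pos = frame.index.get_indexer(row_labels)
    col_pos = frame.columns.get_indexer(col_labels)
    # get_indexer marks missing labels with -1, which NumPy would read as
    # "last element", so reject them explicitly
    if (row_pos == -1).any() or (col_pos == -1).any():
        raise KeyError("one or more labels not found")
    return frame.to_numpy()[row_pos, col_pos]


dflookup = pd.DataFrame(np.random.rand(20, 4), columns=['A', 'B', 'C', 'D'])
rows = list(range(0, 10, 2))
cols = ['B', 'C', 'A', 'B', 'D']
values = lookup_values(dflookup, rows, cols)

# Cross-check against one at-lookup per (row, column) pair
expected = np.array([dflookup.at[r, c] for r, c in zip(rows, cols)])
print(np.array_equal(values, expected))  # True
```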

##### 6. Index Objects

The pandas `Index` class and its subclasses can be viewed as implementing an ordered multiset. Duplicates are allowed.

In [None]:
# Creating an Index directly
index = pd.Index(['e', 'd', 'a', 'b'])
index

In [None]:
# Testing membership
'd' in index
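To see the "ordered multiset" behavior directly, build an Index with a repeated label. The order is preserved, the duplicate is kept, and `is_unique` and `duplicated()` report it. Some operations, such as reindexing or `get_indexer`, require unique labels, so it can be worth checking `is_unique` first.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'a', 'c'])

# Order is preserved and duplicates are kept
print(idx.tolist())               # ['a', 'b', 'a', 'c']
print(idx.is_unique)              # False
print(idx.duplicated().tolist())  # [False, False, True, False]
print(idx.value_counts()['a'])    # 2
```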

###### 6.1 Setting Metadata

You can also pass a name to be stored in the index:

In [None]:
# Creating an index with a name
index = pd.Index(['e', 'd', 'a', 'b'], name='something')
index.name

In [None]:
# The name will be shown in the console display
index = pd.Index(list(range(5)), name='rows')
columns = pd.Index(['A', 'B', 'C'], name='cols')

df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns)
df

In [None]:
# Selecting a column shows the index name
df['A']
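Index names are ordinary attributes, so you can read them programmatically as well as see them in the display. The sketch below rebuilds the named DataFrame from the cells above. It shows that a selected column keeps the row-index name and takes its own name from the column label, and that `reset_index` turns the named index into a regular column called `rows`.

```python
import numpy as np
import pandas as pd

index = pd.Index(list(range(5)), name='rows')
columns = pd.Index(['A', 'B', 'C'], name='cols')
df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns)

print(df.index.name, df.columns.name)  # rows cols
print(df['A'].index.name)              # rows
print(df['A'].name)                    # A

# reset_index moves the named index into a column called 'rows'
flat = df.reset_index()
print(flat.columns.tolist())           # ['rows', 'A', 'B', 'C']
```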

Indexes are "mostly immutable": their values cannot be changed in place (for example, `ind[0] = 10` raises a `TypeError`), but their metadata, such as the index name, can be set and changed:

In [None]:
ind = pd.Index([1, 2, 3])

# Create a new index with a different name
ind.rename("apple")

In [None]:
# Original index is unchanged
ind

In [None]:
# Change the name in-place
ind.set_names(["apple"], inplace=True)
ind

In [None]:
# Another way to change the name
ind.name = "bob"
ind
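The sketch below makes the "mostly immutable" distinction concrete. Item assignment on an Index raises a `TypeError`. Methods such as `rename` and `insert` return a new Index and leave the original untouched.

```python
import pandas as pd

ind = pd.Index([1, 2, 3])

# The values themselves cannot be changed in place
try:
    ind[0] = 10
    mutated = True
except TypeError as exc:
    mutated = False
    print("TypeError:", exc)

# rename() returns a new Index; the original keeps its (empty) name
renamed = ind.rename("apple")
print(renamed.name, ind.name)  # apple None

# insert() likewise returns a new Index with the extra value
extended = ind.insert(0, 0)
print(extended.tolist())       # [0, 1, 2, 3]
print(ind.tolist())            # [1, 2, 3]
```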