#### Part 18: Sorting MultiIndex and Concatenation in Pandas

In this notebook, we'll explore:
- Sorting MultiIndex objects
- Concatenating DataFrames
- Different join types in concatenation
- Using the append method
- Ignoring indexes during concatenation

##### Setup
First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np

##### 1. Sorting MultiIndex Objects

MultiIndex-ed objects should be sorted so that they can be indexed and sliced efficiently: on an unsorted index, lookups are slower and some range slices raise an error. As with any index, you can use `sort_index()`.

In [None]:
# Create a Series with MultiIndex
tuples = [('foo', 'one'), ('foo', 'two'), ('bar', 'one'), ('bar', 'two'), ('qux', 'one'), ('qux', 'two')]
s = pd.Series(np.random.randn(6), index=pd.MultiIndex.from_tuples(tuples))
s

In [None]:
# Sort the index
s.sort_index()

In [None]:
# Sort by level 0
s.sort_index(level=0)

In [None]:
# Sort by level 1
s.sort_index(level=1)
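When you sort by a level, pandas also sorts the remaining levels by default (`sort_remaining=True`). That is why sorting by level 0 gives the same order as a plain `sort_index()` here. The sketch below uses fixed values (the series and the level names `first`/`second` are hypothetical) to contrast that default with `sort_remaining=False`:

```python
import pandas as pd

# Hypothetical series with fixed values so the ordering is easy to check
idx = pd.MultiIndex.from_tuples(
    [('foo', 'two'), ('bar', 'one'), ('foo', 'one'), ('bar', 'two')],
    names=['first', 'second'],
)
s_demo = pd.Series([1, 2, 3, 4], index=idx)

# Default sort_remaining=True: sort by 'second', then break ties by 'first'
by_second = s_demo.sort_index(level='second')

# sort_remaining=False: sort only by 'second'; tied rows keep their original order
by_second_only = s_demo.sort_index(level='second', sort_remaining=False)

print(by_second)
print(by_second_only)
```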

You may also pass a level name to `sort_index` if the MultiIndex levels are named.

In [None]:
# Set names for the levels
s.index.set_names(['L1', 'L2'], inplace=True)
s

In [None]:
# Sort by level name 'L1'
s.sort_index(level='L1')

In [None]:
# Sort by level name 'L2'
s.sort_index(level='L2')
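`sort_index` also accepts a list of levels, and `ascending` can then be a list of booleans that sets the direction for each level. Here is a minimal sketch on a small hypothetical series that reuses the level names `L1` and `L2`:

```python
import pandas as pd

# Hypothetical series with named levels and fixed values
idx = pd.MultiIndex.from_tuples(
    [('foo', 'one'), ('foo', 'two'), ('bar', 'one'), ('bar', 'two')],
    names=['L1', 'L2'],
)
s_named = pd.Series([10, 20, 30, 40], index=idx)

# Sort by L2 ascending first, then by L1 descending within each L2 value
result = s_named.sort_index(level=['L2', 'L1'], ascending=[True, False])
print(result)
```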

On higher-dimensional objects such as DataFrames, you can sort any other axis by level when it has a MultiIndex. Transposing `df` moves its MultiIndex onto the columns (axis 1):

In [None]:
# Create a DataFrame with MultiIndex
arrays = [['one', 'one', 'zero', 'zero'], ['y', 'x', 'y', 'x']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples)
df = pd.DataFrame(np.random.randn(4, 2), index=index)
df

In [None]:
# Sort the transposed DataFrame by level 1 on axis 1
df.T.sort_index(level=1, axis=1)
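You do not need a transpose when the columns already carry a MultiIndex. Pass `axis=1` directly. A sketch with a hypothetical `wide` frame:

```python
import pandas as pd

# Hypothetical frame whose columns (not rows) carry a MultiIndex
cols = pd.MultiIndex.from_tuples(
    [('one', 'y'), ('one', 'x'), ('zero', 'y'), ('zero', 'x')]
)
wide = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# axis=1 sorts the columns; level=1 orders them by the inner level first
by_inner = wide.sort_index(axis=1, level=1)
print(by_inner)
```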

Indexing will work even if the data are not sorted, but will be rather inefficient (and show a PerformanceWarning). It will also return a copy of the data rather than a view.
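Range slicing is stricter than single-label lookup. If the index is not sorted deeply enough, pandas raises `UnsortedIndexError` (from `pandas.errors`) instead of returning a result. The sketch below rebuilds a frame shaped like `dfm` with fixed values (the name `frame` is hypothetical) and shows that the same slice succeeds once the index is sorted:

```python
import pandas as pd
from pandas.errors import UnsortedIndexError

# Same shape as the dfm example below, with fixed values instead of random ones
frame = pd.DataFrame(
    {'jim': [0, 0, 1, 1], 'joe': ['x', 'x', 'z', 'y'], 'jolie': [0.1, 0.2, 0.3, 0.4]}
).set_index(['jim', 'joe'])

# Slicing a range of full (jim, joe) keys needs the index sorted on both levels
try:
    frame.loc[(0, 'y'):(1, 'z')]
except UnsortedIndexError as exc:
    print('unsorted:', exc)

# After sort_index() the same slice works
sliced = frame.sort_index().loc[(0, 'y'):(1, 'z')]
print(sliced)
```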

In [None]:
# Create an unsorted MultiIndex DataFrame
dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
                    'joe': ['x', 'x', 'z', 'y'],
                    'jolie': np.random.rand(4)})
dfm = dfm.set_index(['jim', 'joe'])
dfm

In [None]:
# Check whether the index is sorted
# (MultiIndex.is_lexsorted() was removed in pandas 2.0)
dfm.index.is_monotonic_increasing

In [None]:
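# Looking up a full key past the sorted depth still works, but may emit a PerformanceWarning
# (MultiIndex.lexsort_depth was removed in pandas 2.0)
dfm.loc[(1, 'z')]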

In [None]:
# Sort the index
dfm = dfm.sort_index()
dfm

In [None]:
# Check whether the index is now sorted
dfm.index.is_monotonic_increasing

In [None]:
# With the index sorted, slicing a range of full keys works
dfm.loc[(0, 'y'):(1, 'z')]

##### 2. Concatenating DataFrames

pandas provides several ways to combine Series and DataFrame objects. Concatenation applies set logic (union or intersection) to the indexes, and join/merge-type operations provide relational-algebra functionality.
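Concatenation works on Series as well as DataFrames. In the small sketch below, the names `frame` and `extra` are hypothetical. It shows that when a Series is concatenated with a DataFrame along `axis=1`, the Series becomes a new column labelled by its `name`:

```python
import pandas as pd

frame = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
extra = pd.Series(['X0', 'X1'], name='X')

# Along axis=1 a named Series is added as a column called by its name
combined = pd.concat([frame, extra], axis=1)
print(combined)
```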

In [None]:
# Create sample DataFrames for concatenation
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

df3 = pd.DataFrame({
    'A': ['A8', 'A9', 'A10', 'A11'],
    'B': ['B8', 'B9', 'B10', 'B11'],
    'C': ['C8', 'C9', 'C10', 'C11'],
    'D': ['D8', 'D9', 'D10', 'D11']
}, index=[8, 9, 10, 11])

df4 = pd.DataFrame({
    'B': ['B2', 'B3', 'B6', 'B7'],
    'D': ['D2', 'D3', 'D6', 'D7'],
    'F': ['F2', 'F3', 'F6', 'F7']
}, index=[2, 3, 6, 7])

# Display df1 and df2
print("DataFrame 1:")
display(df1)
print("\nDataFrame 2:")
display(df2)

###### 2.1 Concatenation with `pd.concat`

The `concat()` function does the heavy lifting. It concatenates objects along one axis and can apply set logic (union or intersection) to the indexes on the other axes.
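`concat()` also takes a `keys` argument that labels each input piece. The labels become the outer level of a MultiIndex on the concatenation axis, which ties concatenation back to the MultiIndex tools above. A sketch with hypothetical frames `top` and `bottom`:

```python
import pandas as pd

top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
bottom = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# keys= adds an outer index level recording which input each row came from
stacked = pd.concat([top, bottom], keys=['top', 'bottom'])
print(stacked)

# Selecting on the outer level recovers one original piece
print(stacked.loc['bottom'])
```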

In [None]:
# Basic concatenation along axis=0 (rows)
result = pd.concat([df1, df2])
result

###### 2.2 Set Logic on the Other Axes

When gluing together multiple DataFrames, you choose how to handle the axes other than the one being concatenated:
- Take the union of them all, `join='outer'`. This is the default option as it results in zero information loss.
- Take the intersection, `join='inner'`.
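The same choice applies when stacking rows (`axis=0`). In that case the columns are the "other" axis. A minimal sketch with two hypothetical one-row frames that share only columns `B` and `D`:

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0'], 'B': ['B0'], 'D': ['D0']})
right = pd.DataFrame({'B': ['B1'], 'D': ['D1'], 'F': ['F1']})

# Stacking rows: the columns are aligned using the chosen set logic
outer = pd.concat([left, right])                # union of columns; gaps become NaN
inner = pd.concat([left, right], join='inner')  # only the shared columns B and D
print(outer)
print(inner)
```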

In [None]:
# Outer join (default)
result = pd.concat([df1, df4], axis=1, sort=False)
result

In [None]:
# Inner join
result = pd.concat([df1, df4], axis=1, join='inner')
result

To keep exactly the index of the original DataFrame (`df1`), either reindex the concatenated result or reindex `df4` before concatenating:

In [None]:
# Reindex after concatenation
result = pd.concat([df1, df4], axis=1).reindex(df1.index)
result

In [None]:
# Reindex before concatenation
result = pd.concat([df1, df4.reindex(df1.index)], axis=1)
result

###### 2.3 Concatenating Using `append`

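The `append()` instance method on Series and DataFrame used to be a shortcut to `concat()`. It predated `concat` and concatenated along axis=0, namely the index. `append()` was deprecated in pandas 1.4.0 and removed in pandas 2.0.0, so current code should call `pd.concat` directly. Each `append` call has a direct `pd.concat` equivalent.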

In [None]:
# Equivalent of df1.append(df2) (append was removed in pandas 2.0)
result = pd.concat([df1, df2])
result

In [None]:
# Equivalent of df1.append(df4, sort=False)
result = pd.concat([df1, df4], sort=False)
result

In [None]:
# Equivalent of df1.append([df2, df3])
result = pd.concat([df1, df2, df3])
result
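A common use of `append` was adding a single row, often given as a dict. One `concat`-based replacement is sketched below. It assumes the row arrives as a dict keyed by column name, and the names `frame` and `new_row` are hypothetical. The dict is wrapped in a one-row DataFrame, and the two frames are then concatenated:

```python
import pandas as pd

frame = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
new_row = {'A': 'A2', 'B': 'B2'}  # hypothetical row to add

# Wrap the dict in a one-row DataFrame and concatenate;
# ignore_index=True renumbers the result 0..n-1 instead of repeating label 0
frame = pd.concat([frame, pd.DataFrame([new_row])], ignore_index=True)
print(frame)
```

When adding many rows, collecting them in a list and calling `pd.concat` once is generally cheaper than concatenating inside a loop, because each call copies the data.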

###### 2.4 Ignoring Indexes on the Concatenation Axis

For DataFrame objects that don't have a meaningful index, you may want to stack them and ignore any overlapping index labels. To do this, pass `ignore_index=True`, which discards the original labels and numbers the result from 0 to n-1.

In [None]:
# Concatenate with ignore_index=True
result = pd.concat([df1, df4], ignore_index=True, sort=False)
result
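The sketch below contrasts three outcomes on two hypothetical frames whose indexes overlap at label 1. Keeping the labels produces duplicates. `verify_integrity=True` raises a `ValueError` instead. `ignore_index=True` renumbers the rows:

```python
import pandas as pd

first = pd.DataFrame({'B': ['B0', 'B1']}, index=[0, 1])
second = pd.DataFrame({'B': ['B2', 'B3']}, index=[1, 2])

# Keeping the original labels: label 1 now appears twice
dup = pd.concat([first, second])
print(dup.index.has_duplicates)

# verify_integrity=True refuses to create duplicate labels
try:
    pd.concat([first, second], verify_integrity=True)
except ValueError as exc:
    print('overlap:', exc)

# ignore_index=True discards the old labels and renumbers 0..n-1
fresh = pd.concat([first, second], ignore_index=True)
print(list(fresh.index))
```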