#### Part 19: Advanced Concatenation and Merging in Pandas

In this notebook, we'll explore:
- Concatenating with mixed dimensions
- Using group keys in concatenation
- Joining multiple DataFrames
- Merging values within Series or DataFrame columns
- Timeseries friendly merging

##### Setup
First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np

##### 1. Concatenating with Mixed Dimensions

You can concatenate a mix of Series and DataFrame objects. Each Series is converted to a single-column DataFrame whose column name is the name of the Series.

In [None]:
# Create a sample DataFrame
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])

# Create a Series
s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')

# Concatenate DataFrame and Series
result = pd.concat([df1, s1], axis=1)
result

If unnamed Series are passed, they are numbered consecutively.

In [None]:
# Create an unnamed Series
s2 = pd.Series(['_0', '_1', '_2', '_3'])

# Concatenate DataFrame and multiple unnamed Series
result = pd.concat([df1, s2, s2, s2], axis=1)
result

Passing `ignore_index=True` discards all existing labels along the concatenation axis (here, the column names) and renumbers them 0, 1, 2, and so on.

In [None]:
# Concatenate with ignore_index=True
result = pd.concat([df1, s1], axis=1, ignore_index=True)
result
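As a quick check of the behaviour described above, the sketch below compares the column labels with and without `ignore_index=True`. The small objects `demo_df` and `demo_s` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row inputs, just large enough to show the labels
demo_df = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
demo_s = pd.Series(['X0', 'X1'], name='X')

# The Series name becomes the new column label
named = pd.concat([demo_df, demo_s], axis=1)
print(list(named.columns))       # ['A', 'B', 'X']

# With ignore_index=True the labels are replaced by 0..n-1
renumbered = pd.concat([demo_df, demo_s], axis=1, ignore_index=True)
print(list(renumbered.columns))  # [0, 1, 2]
```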

##### 2. More Concatenating with Group Keys

A fairly common use of the `keys` argument is to override the column names when creating a new DataFrame from existing Series.

In [None]:
# Create Series with and without names
s3 = pd.Series([0, 1, 2, 3], name='foo')
s4 = pd.Series([0, 1, 2, 3])
s5 = pd.Series([0, 1, 4, 5])

# Default behavior inherits Series names
pd.concat([s3, s4, s5], axis=1)

In [None]:
# Override column names with keys
pd.concat([s3, s4, s5], axis=1, keys=['red', 'blue', 'yellow'])

Let's create some sample DataFrames to demonstrate keys with DataFrames:

In [None]:
# Create additional sample DataFrames
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

df3 = pd.DataFrame({
    'A': ['A8', 'A9', 'A10', 'A11'],
    'B': ['B8', 'B9', 'B10', 'B11'],
    'C': ['C8', 'C9', 'C10', 'C11'],
    'D': ['D8', 'D9', 'D10', 'D11']
}, index=[8, 9, 10, 11])

frames = [df1, df2, df3]

# Concatenate with keys
result = pd.concat(frames, keys=['x', 'y', 'z'])
result
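The outer index level created by `keys` makes it easy to pull one of the original pieces back out with `.loc`. A minimal sketch, using two small hypothetical frames `part_a` and `part_b`:

```python
import pandas as pd

# Hypothetical pieces with non-overlapping row labels
part_a = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
part_b = pd.DataFrame({'A': ['A2', 'A3']}, index=[2, 3])

# keys adds an outer index level identifying the source of each row
keyed = pd.concat([part_a, part_b], keys=['x', 'y'])
print(keyed.index.nlevels)  # 2

# Selecting on the outer level returns the rows that came from part_b
chunk = keyed.loc['y']
print(chunk)
```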

You can also pass a dict to `concat()`, in which case the dict keys are used for the `keys` argument (unless other keys are specified):

In [None]:
# Create a dictionary of DataFrames
pieces = {'x': df1, 'y': df2, 'z': df3}

# Concatenate using dict keys
result = pd.concat(pieces)
result

In [None]:
# Pass keys explicitly to select and order the pieces ('x' is left out here)
result = pd.concat(pieces, keys=['z', 'y'])
result

The MultiIndex created has levels that are constructed from the passed keys and the index of the DataFrame pieces:

In [None]:
# Examine the index levels
result.index.levels

If you wish to specify other levels, you can do so using the `levels` argument:

In [None]:
# Specify custom levels
result = pd.concat(pieces, keys=['x', 'y', 'z'],
                  levels=[['z', 'y', 'x', 'w']],
                  names=['group_key'])
result

In [None]:
# Examine the custom levels
result.index.levels

##### 3. Joining Multiple DataFrames

A list or tuple of DataFrames can also be passed to join() to join them together on their indexes.

In [None]:
# Create DataFrames with index
left = pd.DataFrame({'v': [1, 2, 3]}, index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'v': [4, 5, 6]}, index=['K0', 'K0', 'K3'])
right2 = pd.DataFrame({'v': [7, 8, 9]}, index=['K1', 'K1', 'K2'])

# Join multiple DataFrames
result = left.join([right, right2])
result
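When joining a list of frames, `join()` does not accept `lsuffix`/`rsuffix`, so it can be easier to read the result if every frame has distinct column names. The sketch below assumes three hypothetical frames (`base`, `other1`, `other2`) with unique index labels and non-overlapping columns:

```python
import pandas as pd

# Hypothetical frames: unique index labels, distinct column names
base = pd.DataFrame({'a': [1, 2, 3]}, index=['K0', 'K1', 'K2'])
other1 = pd.DataFrame({'b': [4, 5]}, index=['K0', 'K2'])
other2 = pd.DataFrame({'c': [6, 7]}, index=['K1', 'K3'])

# Default how='left': keep the rows of `base`, align the others on the index
joined = base.join([other1, other2])
print(joined)
```

Only the index labels of `base` survive, so `K3` from `other2` is dropped and unmatched cells are filled with `NaN`.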

##### 4. Merging Together Values within Series or DataFrame Columns

Another fairly common situation is to have two like-indexed (or similarly indexed) Series or DataFrame objects and to want to "patch" values in one object with values for matching indices in the other.

In [None]:
# Create DataFrames with NaN values
df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, np.nan, np.nan],
                   [np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],
                   index=[1, 2])

print("DataFrame 1:")
display(df1)
print("\nDataFrame 2:")
display(df2)

For this, use the `combine_first()` method:

In [None]:
# Combine values, taking values from df2 only when missing in df1
result = df1.combine_first(df2)
result

Note that `combine_first()` takes values from the other DataFrame only where they are missing in the calling DataFrame. A related method, `update()`, overwrites values in the calling DataFrame in place wherever the other DataFrame has non-NA values:

In [None]:
# Create a copy of df1 to demonstrate update
df1_copy = df1.copy()
print("Before update:")
display(df1_copy)

# Overwrite values in df1_copy (in place) wherever df2 has non-NA values
df1_copy.update(df2)

print("\nAfter update:")
display(df1_copy)
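The difference between the two methods is easiest to see side by side on a small Series. This is only a sketch; `base_s` and `patch_s` are made-up inputs.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs sharing the same index
base_s = pd.Series([1.0, np.nan, 3.0])
patch_s = pd.Series([10.0, 20.0, np.nan])

# combine_first: keep base_s values, fill only its gaps from patch_s
filled = base_s.combine_first(patch_s)
print(filled.tolist())   # [1.0, 20.0, 3.0]

# update: overwrite in place wherever patch_s is non-NA
updated = base_s.copy()
updated.update(patch_s)
print(updated.tolist())  # [10.0, 20.0, 3.0]
```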

##### 5. Timeseries Friendly Merging

###### 5.1 Merging Ordered Data

The `merge_ordered()` function combines time series and other ordered data. In particular, it has an optional `fill_method` keyword (`'ffill'`) to forward-fill the gaps that the merge introduces:

In [None]:
# Create sample DataFrames for ordered merge
left = pd.DataFrame({'k': ['K0', 'K1', 'K1', 'K2'],
                     'lv': [1, 2, 3, 4],
                     's': ['a', 'b', 'c', 'd']})

right = pd.DataFrame({'k': ['K1', 'K2', 'K4'],
                      'rv': [1, 2, 3]})

# Ordered merge with forward fill, performed separately for each group of left['s']
pd.merge_ordered(left, right, fill_method='ffill', left_by='s')
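To see what `fill_method` does on its own, without grouping, here is a minimal sketch on two hypothetical frames (`readings_a`, `readings_b`) keyed by an integer column `t`:

```python
import pandas as pd

# Hypothetical ordered data with partially overlapping keys
readings_a = pd.DataFrame({'t': [1, 3, 5], 'x': [10, 30, 50]})
readings_b = pd.DataFrame({'t': [2, 3, 6], 'y': [200, 300, 600]})

# Default: ordered outer join, gaps are left as NaN
plain = pd.merge_ordered(readings_a, readings_b, on='t')
print(plain)

# fill_method='ffill' carries the last seen value forward into the gaps
ffilled = pd.merge_ordered(readings_a, readings_b, on='t', fill_method='ffill')
print(ffilled)
```

The first row of `y` stays `NaN` even with forward fill, because there is no earlier value to carry forward.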

###### 5.2 Merging Asof

`merge_asof()` is similar to an ordered left join, except that it matches on the nearest key rather than on equal keys. By default (a backward search), for each row in the left DataFrame it selects the last row in the right DataFrame whose `on` key is less than or equal to the left's key. Both DataFrames must be sorted by the key.

In [None]:
# Create sample DataFrames for asof merge (trades and quotes)
trades = pd.DataFrame({
    'time': pd.to_datetime(['20160525 13:30:00.023',
                           '20160525 13:30:00.038',
                           '20160525 13:30:00.048',
                           '20160525 13:30:00.048',
                           '20160525 13:30:00.048']),
    'ticker': ['MSFT', 'MSFT', 'GOOG', 'GOOG', 'AAPL'],
    'price': [51.95, 51.95, 720.77, 720.92, 98.00],
    'quantity': [75, 155, 100, 100, 100]
})

quotes = pd.DataFrame({
    'time': pd.to_datetime(['20160525 13:30:00.023',
                           '20160525 13:30:00.023',
                           '20160525 13:30:00.030',
                           '20160525 13:30:00.041',
                           '20160525 13:30:00.048',
                           '20160525 13:30:00.049',
                           '20160525 13:30:00.072',
                           '20160525 13:30:00.075']),
    'ticker': ['GOOG', 'MSFT', 'MSFT', 'MSFT', 'GOOG', 'AAPL', 'GOOG', 'MSFT'],
    'bid': [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
    'ask': [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})

# Display the DataFrames
print("Trades:")
display(trades)
print("\nQuotes:")
display(quotes)

In [None]:
# Asof merge: for each trade, attach the most recent quote for the same ticker
pd.merge_asof(trades, quotes, on='time', by='ticker')
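By default a right-hand row with exactly the same key counts as a match. Passing `allow_exact_matches=False` restricts the search to strictly earlier keys. A small sketch with hypothetical integer-keyed frames `events` and `marks`:

```python
import pandas as pd

# Hypothetical frames, both sorted by the key column 't'
events = pd.DataFrame({'t': [1, 5, 10], 'label': ['a', 'b', 'c']})
marks = pd.DataFrame({'t': [1, 2, 5, 7], 'r': [1, 2, 5, 7]})

# Default: last right row with key <= left key
inclusive = pd.merge_asof(events, marks, on='t')
print(inclusive['r'].tolist())  # [1, 5, 7]

# Strict: last right row with key < left key
strict = pd.merge_asof(events, marks, on='t', allow_exact_matches=False)
print(strict)
```

In the strict case the first event has no earlier mark, so its `r` value is `NaN`, and the event at `t=5` falls back to the mark at `t=2`.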

We can also pass a `tolerance`, so that a quote is matched only if it lies within a given time difference of the trade:

In [None]:
# Asof merge with tolerance
pd.merge_asof(trades, quotes, on='time', by='ticker', tolerance=pd.Timedelta('2ms'))
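The same idea can be sketched with integer keys, where `tolerance` is given as an integer rather than a `Timedelta`. The frames `samples` and `refs` below are hypothetical.

```python
import pandas as pd

# Hypothetical frames, both sorted by the integer key 't'
samples = pd.DataFrame({'t': [1, 5, 10]})
refs = pd.DataFrame({'t': [1, 4, 7], 'r': [1, 4, 7]})

# Without tolerance every sample finds an earlier-or-equal reference
loose = pd.merge_asof(samples, refs, on='t')
print(loose['r'].tolist())  # [1, 4, 7]

# With tolerance=2 the match for t=10 (3 away from 7) is rejected
tight = pd.merge_asof(samples, refs, on='t', tolerance=2)
print(tight)
```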

We can also use the `direction` parameter to control whether the merge searches `'backward'` (the default), `'forward'`, or for the `'nearest'` key:

In [None]:
# Asof merge with direction='forward'
pd.merge_asof(trades, quotes, on='time', by='ticker', direction='forward')
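To compare the three directions on the same data, the sketch below uses small hypothetical frames (`probe`, `grid`) with integer keys chosen so that no ties occur:

```python
import pandas as pd

# Hypothetical frames, both sorted by 't'
probe = pd.DataFrame({'t': [1, 5, 10]})
grid = pd.DataFrame({'t': [2, 4, 9, 12], 'r': [2, 4, 9, 12]})

# Run the same merge once per direction and keep the matched column
matches = {
    d: pd.merge_asof(probe, grid, on='t', direction=d)['r']
    for d in ('backward', 'forward', 'nearest')
}

print(matches['forward'].tolist())  # [2, 9, 12]
print(matches['nearest'].tolist())  # [2, 4, 9]
print(matches['backward'])          # first value is NaN: nothing at or before t=1
```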