Data alignment in pandas means that operations between Series and DataFrames pair values by their labels (the index and, for DataFrames, the column labels) rather than by position. This is particularly useful when combining data from different sources whose labels only partially overlap.

Automatic Data Alignment: When you perform arithmetic between pandas objects, values are matched by label. Labels that appear in only one operand produce missing values (NaN) in the result.

Aligning Data Manually: The align() method returns a pair of objects reindexed to a common set of labels, chosen by a join type such as 'outer' or 'inner'.

Combining Data: concat(), merge(), and join() combine data from multiple sources. concat() aligns the objects on the axis that is not being concatenated, while merge() and join() match rows on key columns or on the index.

# Automatic Data Alignment

```python
import pandas as pd

# Creating two Series with different indexes
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])

# Adding the Series pairs values by label; 'a' and 'd' exist in only one Series
result = s1 + s2
print("Automatic Data Alignment:")
print(result)
```

```
Automatic Data Alignment:
a    NaN
b    6.0
c    8.0
d    NaN
dtype: float64
```

Both inputs are int64, but the result is float64 because the introduced NaN values cannot be stored in an integer Series.
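As a further sketch (the example values are redefined so the cell runs on its own; `filled` and `summed` are illustrative names), the method form `s1.add()` accepts a `fill_value` that stands in for a label missing from one side, and arithmetic between DataFrames aligns both the row index and the column labels. According to the pandas documentation, a position missing from both operands still comes out as NaN even with `fill_value`.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])

# Method form of `+`: a label missing from one Series is treated as 0
# instead of producing NaN ('a' -> 1 + 0, 'd' -> 0 + 6)
filled = s1.add(s2, fill_value=0)
print(filled)

# DataFrames align on both axes: rows a-d and columns A, B, C (union of both)
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'A': [7, 8, 9], 'C': [10, 11, 12]}, index=['b', 'c', 'd'])
summed = df1 + df2
# Only column 'A' at rows 'b' and 'c' exists in both frames, so every
# other cell of `summed` is NaN
print(summed)
```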


# Aligning Data Manually with align()

```python
# Aligning s1 and s2 from above; join='outer' keeps the union of both indexes
s1_aligned, s2_aligned = s1.align(s2, join='outer')
print("Aligned s1:")
print(s1_aligned)
print("Aligned s2:")
print(s2_aligned)
```

```
Aligned s1:
a    1.0
b    2.0
c    3.0
d    NaN
dtype: float64
Aligned s2:
a    NaN
b    4.0
c    5.0
d    6.0
dtype: float64
```
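The join type controls which labels survive. The sketch below (names such as `left_inner` are illustrative) contrasts `join='inner'`, which keeps only the shared labels and therefore introduces no NaN, with `join='outer'` plus the `fill_value` parameter of `align()`, which fills the newly introduced positions instead of leaving them missing.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])

# Inner join: only 'b' and 'c' are present in both Series
left_inner, right_inner = s1.align(s2, join='inner')
print(left_inner)
print(right_inner)

# Outer join, but labels missing from one side are filled with 0 instead of NaN
left_outer, right_outer = s1.align(s2, join='outer', fill_value=0)
print(left_outer)
print(right_outer)
```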


# Combining Data with concat()

```python
# Creating two DataFrames with partially overlapping row labels and columns
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}, index=['a', 'b', 'c'])

df2 = pd.DataFrame({
    'A': [7, 8, 9],
    'C': [10, 11, 12]
}, index=['b', 'c', 'd'])

# With axis=1 the frames are placed side by side and rows are aligned on the
# index (an outer join by default); the duplicate column name 'A' is kept as-is
df_concat = pd.concat([df1, df2], axis=1)
print("Concatenated DataFrame:")
print(df_concat)
```

```
Concatenated DataFrame:
     A    B    A     C
a  1.0  4.0  NaN   NaN
b  2.0  5.0  7.0  10.0
c  3.0  6.0  8.0  11.0
d  NaN  NaN  9.0  12.0
```
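The duplicate 'A' column above can be ambiguous to select. A minimal sketch of two `concat()` options (the key names 'df1' and 'df2' are arbitrary choices for illustration): `join='inner'` keeps only the row labels present in every frame, and `keys=` labels each source, producing MultiIndex columns so the two 'A' columns stay distinguishable.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'A': [7, 8, 9], 'C': [10, 11, 12]}, index=['b', 'c', 'd'])

# Inner join on the row index ('b' and 'c' only); keys become the outer column level
df_inner = pd.concat([df1, df2], axis=1, join='inner', keys=['df1', 'df2'])
print(df_inner)

# Each source's 'A' column is now addressable unambiguously
print(df_inner[('df2', 'A')])
```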


# Combining Data with merge()

```python
# Merging on the index of both frames; how='outer' keeps every row label.
# The overlapping column 'A' receives the default suffixes '_x' and '_y'.
df_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
print("Merged DataFrame:")
print(df_merge)
```

```
Merged DataFrame:
   A_x    B  A_y     C
a  1.0  4.0  NaN   NaN
b  2.0  5.0  7.0  10.0
c  3.0  6.0  8.0  11.0
d  NaN  NaN  9.0  12.0
```
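join(), listed earlier alongside concat() and merge(), aligns on the index by default, so the same outer combination can be sketched with `DataFrame.join()`. Unlike merge(), it applies no default suffixes: when column names overlap, `lsuffix`/`rsuffix` must be given or a ValueError is raised. The suffix strings '_left' and '_right' here are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'A': [7, 8, 9], 'C': [10, 11, 12]}, index=['b', 'c', 'd'])

# Without suffixes, the overlapping column 'A' makes join() raise ValueError
raised = False
try:
    df1.join(df2)
except ValueError as err:
    raised = True
    print("join() without suffixes failed:", err)

# Outer join on the index, disambiguating the overlapping column with suffixes
df_join = df1.join(df2, how='outer', lsuffix='_left', rsuffix='_right')
print(df_join)
```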
