# Concatenating and Reshaping Data in Pandas

### What Are Concatenation and Reshaping?

In real-world data pipelines, we often need to **combine datasets vertically or horizontally** (concatenation), or **restructure them** from wide-to-long or long-to-wide formats (reshaping). These operations are essential when:

- You split data for processing and want to recombine later
- Data is stored across multiple files or tables
- You need to turn grouped or per-entity records into a fixed-shape feature table for ML

Pandas provides powerful tools like:

- `pd.concat()` — for **combining** DataFrames along rows or columns
- `.melt()` and `.pivot()` — for **reshaping** between long and wide formats
- `.stack()` and `.unstack()` — for **hierarchical reshaping**

We'll use the Titanic dataset to explore these operations.

### 1. Concatenation Using `pd.concat()`

Combine Two DataFrames Vertically (Row-wise)

In [1]:
import pandas as pd

# Load the Titanic training data once and reuse it below
df = pd.read_csv("data/train.csv")
df1 = df.iloc[:3]    # rows 0-2
df2 = df.iloc[3:6]   # rows 3-5

# axis=0 stacks the two frames on top of each other
df_row_concat = pd.concat([df1, df2], axis=0)
print(df_row_concat[['PassengerId', 'Name']])

   PassengerId                                               Name
0            1                            Braund, Mr. Owen Harris
1            2  Cumings, Mrs. John Bradley (Florence Briggs Th...
2            3                             Heikkinen, Miss. Laina
3            4       Futrelle, Mrs. Jacques Heath (Lily May Peel)
4            5                           Allen, Mr. William Henry
5            6                                   Moran, Mr. James


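- `axis=0` (the default) stacks the DataFrames vertically
- The original index labels are kept as they are; here they run 0 to 5 only because the two slices were adjacent. Pass `ignore_index=True` or call `.reset_index(drop=True)` to renumber the rows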
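
To make the index behaviour concrete, here is a minimal sketch using two small made-up frames (`first` and `second` are hypothetical stand-ins, not part of the Titanic data). It compares the default result with `ignore_index=True` and with `keys=`, which records where each row came from.

```python
import pandas as pd

# Hypothetical toy frames standing in for two chunks of a larger table
first = pd.DataFrame({"PassengerId": [1, 2], "Age": [22.0, 38.0]})
second = pd.DataFrame({"PassengerId": [3, 4], "Age": [26.0, 35.0]})

# Default: each frame keeps its own labels, so 0 and 1 appear twice
kept = pd.concat([first, second], axis=0)
print(kept.index.tolist())

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([first, second], axis=0, ignore_index=True)
print(renumbered.index.tolist())

# keys= adds an outer index level recording which frame each row came from
labelled = pd.concat([first, second], keys=["first", "second"])
print(labelled.loc["second"])
```

Duplicate labels are easy to miss and can make later `.loc` lookups return several rows, so renumbering or labelling at concat time is usually the safer habit.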

Combine Two DataFrames Horizontally (Column-wise)


In [2]:
# Reuse the DataFrame loaded earlier instead of re-reading the CSV
df1 = df[['PassengerId', 'Name']]
df2 = df[['Sex', 'Age']]

df_col_concat = pd.concat([df1, df2], axis=1)
print(df_col_concat.head())

   PassengerId                                               Name     Sex  \
0            1                            Braund, Mr. Owen Harris    male   
1            2  Cumings, Mrs. John Bradley (Florence Briggs Th...  female   
2            3                             Heikkinen, Miss. Laina  female   
3            4       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female   
4            5                           Allen, Mr. William Henry    male   

    Age  
0  22.0  
1  38.0  
2  26.0  
3  35.0  
4  35.0  
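
Column-wise concatenation matches rows by index label, not by position. The example above works cleanly because both frames share the same index. The sketch below uses two made-up frames (`names` and `ages`, hypothetical) whose labels only partly overlap, to show what happens otherwise and how `join="inner"` changes the result.

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap
names = pd.DataFrame({"Name": ["Braund", "Cumings", "Heikkinen"]}, index=[0, 1, 2])
ages = pd.DataFrame({"Age": [38.0, 26.0, 35.0]}, index=[1, 2, 3])

# Default join="outer": union of labels, NaN where a frame has no such row
outer = pd.concat([names, ages], axis=1)
print(outer)

# join="inner": keep only the labels present in every frame
inner = pd.concat([names, ages], axis=1, join="inner")
print(inner)
```

If the rows are meant to be paired by position, resetting the index on each frame before concatenating avoids accidental misalignment.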


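Resetting Index After Concat

Note that `df1` and `df2` still hold the column subsets from the previous cell. Stacking them vertically aligns on column names, so each row gets `NaN` in the columns its source frame lacks, and `PassengerId` is upcast to float to hold those `NaN` values. `reset_index(drop=True)` then replaces the repeated labels (0 to 890, twice) with a single 0..n-1 range.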

In [3]:
df_concat_reset = pd.concat([df1, df2], axis=0).reset_index(drop=True)
print(df_concat_reset.head())

   PassengerId                                               Name  Sex  Age
0          1.0                            Braund, Mr. Owen Harris  NaN  NaN
1          2.0  Cumings, Mrs. John Bradley (Florence Briggs Th...  NaN  NaN
2          3.0                             Heikkinen, Miss. Laina  NaN  NaN
3          4.0       Futrelle, Mrs. Jacques Heath (Lily May Peel)  NaN  NaN
4          5.0                           Allen, Mr. William Henry  NaN  NaN


### 2. Reshaping with `melt()`: Wide → Long

`melt()` turns columns into rows: each value column becomes a set of (variable, value) pairs. This is useful for time series or other stacked observations.

In [4]:
df_sample = df[['PassengerId', 'SibSp', 'Parch']].head()

df_melted = pd.melt(df_sample, id_vars='PassengerId', var_name='Relation', value_name='Count')
print(df_melted.head())

   PassengerId Relation  Count
0            1    SibSp      1
1            2    SibSp      1
2            3    SibSp      0
3            4    SibSp      1
4            5    SibSp      0


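- `id_vars`: columns to keep fixed as identifiers
- All other columns are unpivoted into two columns: `Relation` holds the original column name and `Count` holds its value
- The result has 10 rows (5 passengers × 2 columns); `head()` shows only the `SibSp` rows, and the `Parch` rows follow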
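
When a frame has columns that should not be unpivoted, `value_vars` limits the melt to the ones you name. The following is a minimal sketch on a made-up frame (`wide` and its values are hypothetical); the `Fare` column is included only to show that it is left out of the result.

```python
import pandas as pd

# Hypothetical toy frame with one extra column we do not want to unpivot
wide = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 2],
    "Fare": [7.25, 71.28, 7.92],
})

# value_vars restricts which columns are unpivoted; Fare is dropped from the result
long = wide.melt(
    id_vars="PassengerId",
    value_vars=["SibSp", "Parch"],
    var_name="Relation",
    value_name="Count",
)

# One output row per (passenger, unpivoted column): 3 passengers x 2 columns
print(long.shape)
print(long)
```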

### 3. Reshaping with `pivot()`: Long → Wide

`pivot()` is the inverse of `melt()`: it spreads the values of one column out into separate columns, turning a long table back into a wide one.

In [5]:
pivoted = df_melted.pivot(index='PassengerId', columns='Relation', values='Count')
print(pivoted.head())

Relation     Parch  SibSp
PassengerId              
1                0      1
2                0      1
3                0      0
4                0      1
5                0      0
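
`pivot()` only works when each (index, columns) pair occurs exactly once. The sketch below uses a made-up long table (hypothetical values) in which one passenger has two `SibSp` records, to show the resulting error and one possible way around it: `pivot_table()` with an aggregation function. Summing is just an assumption for this example; the right aggregation depends on the data.

```python
import pandas as pd

# Hypothetical long table in which passenger 1 has two SibSp records
long = pd.DataFrame({
    "PassengerId": [1, 1, 1, 2, 2],
    "Relation": ["SibSp", "SibSp", "Parch", "SibSp", "Parch"],
    "Count": [1, 3, 0, 1, 2],
})

# pivot() cannot decide which of the duplicate values to place in the cell
try:
    long.pivot(index="PassengerId", columns="Relation", values="Count")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table() aggregates duplicates instead; here we sum them
wide = long.pivot_table(
    index="PassengerId", columns="Relation", values="Count", aggfunc="sum"
)
print(wide)
```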


### 4. Stack & Unstack for MultiIndex Reshaping

Stack: Columns → Index

In [6]:
stacked = pivoted.stack()
print(stacked.head())

PassengerId  Relation
1            Parch       0
             SibSp       1
2            Parch       0
             SibSp       1
3            Parch       0
dtype: int64


Unstack: Index → Columns

In [7]:
unstacked = stacked.unstack()
print(unstacked.head())

Relation     Parch  SibSp
PassengerId              
1                0      1
2                0      1
3                0      0
4                0      1
5                0      0
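
By default `unstack()` moves the innermost index level back into the columns, but the `level` argument lets you choose a different one. Here is a minimal sketch on a made-up frame shaped like the pivoted table above (`wide` and its values are hypothetical).

```python
import pandas as pd

# Hypothetical wide table, shaped like the pivoted frame above
wide = pd.DataFrame(
    {"Parch": [0, 0, 2], "SibSp": [1, 1, 0]},
    index=pd.Index([1, 2, 3], name="PassengerId"),
)
wide.columns.name = "Relation"

# stack() moves the column labels into an inner index level, giving a Series
stacked = wide.stack()
print(list(stacked.index.names))

# unstack() pivots the innermost level back out by default
roundtrip = stacked.unstack()

# level= selects a different level; this yields the transposed layout
by_passenger = stacked.unstack(level="PassengerId")
print(by_passenger)
```

Unstacking `PassengerId` instead of `Relation` puts one passenger in each column, which is the same layout as `wide.T`.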


### AI/ML Use Case: Tidy Data for Modeling

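Most machine learning libraries expect **clean, rectangular input**: usually a **flat wide table** with one row per sample and one column per feature. **Tidy long format** is often the more convenient shape earlier in the pipeline, for grouping, aggregating, and plotting.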

Examples:

- Melt data to stack features for **sequence models**
- Pivot group stats (e.g., survival rate by class and gender)
- Stack/unstack time-series user data for RNNs or LSTMs

These reshaping tools help us prepare **meaningful features** and **consolidate large datasets**.
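
As an illustration of the "pivot group stats" idea, the sketch below computes a survival rate by class and sex with `pivot_table()` and then flattens the result into plain feature columns. The `passengers` frame and its values are made up for the example, and the `survival_rate_` prefix is an arbitrary naming choice.

```python
import pandas as pd

# Hypothetical passenger records; the values are made up for illustration
passengers = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Sex": ["female", "male", "male", "female", "female", "male"],
    "Survived": [1, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 column is the survival rate within each group
rates = passengers.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)
print(rates)

# Flatten to one row per class with plain column names, ready to join as features
features = rates.add_prefix("survival_rate_").reset_index()
features.columns.name = None
print(features.columns.tolist())
```

The flattened `features` table can then be merged back onto the passenger-level data on `Pclass`.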

### Exercises

Q1. Concatenate the first 5 and last 5 rows of the Titanic dataset vertically

In [8]:
df_top = df.head(5)
df_bottom = df.tail(5)

df_concat = pd.concat([df_top, df_bottom])
print(df_concat[['PassengerId', 'Name']])

     PassengerId                                               Name
0              1                            Braund, Mr. Owen Harris
1              2  Cumings, Mrs. John Bradley (Florence Briggs Th...
2              3                             Heikkinen, Miss. Laina
3              4       Futrelle, Mrs. Jacques Heath (Lily May Peel)
4              5                           Allen, Mr. William Henry
886          887                              Montvila, Rev. Juozas
887          888                       Graham, Miss. Margaret Edith
888          889           Johnston, Miss. Catherine Helen "Carrie"
889          890                              Behr, Mr. Karl Howell
890          891                                Dooley, Mr. Patrick


Q2. Horizontally concatenate 'Name', 'Sex', 'Age' into one DataFrame

In [9]:
df_name = df[['Name']]
df_sex = df[['Sex']]
df_age = df[['Age']]

df_combined = pd.concat([df_name, df_sex, df_age], axis=1)
print(df_combined.head())

                                                Name     Sex   Age
0                            Braund, Mr. Owen Harris    male  22.0
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0
2                             Heikkinen, Miss. Laina  female  26.0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0
4                           Allen, Mr. William Henry    male  35.0


Q3. Use `.melt()` on 'SibSp' and 'Parch' to convert to long format

In [10]:
df_melt = pd.melt(df[['PassengerId', 'SibSp', 'Parch']], id_vars='PassengerId')
print(df_melt.head())

   PassengerId variable  value
0            1    SibSp      1
1            2    SibSp      1
2            3    SibSp      0
3            4    SibSp      1
4            5    SibSp      0


Q4. Pivot the melted DataFrame back to wide format

In [11]:
df_pivot = df_melt.pivot(index='PassengerId', columns='variable', values='value')
print(df_pivot.head())

variable     Parch  SibSp
PassengerId              
1                0      1
2                0      1
3                0      0
4                0      1
5                0      0


Q5. Stack and unstack the pivoted DataFrame

In [12]:
stacked = df_pivot.stack()
print(stacked.head())

unstacked = stacked.unstack()
print(unstacked.head())

PassengerId  variable
1            Parch       0
             SibSp       1
2            Parch       0
             SibSp       1
3            Parch       0
dtype: int64
variable     Parch  SibSp
PassengerId              
1                0      1
2                0      1
3                0      0
4                0      1
5                0      0


### Summary

Concatenating and reshaping data in Pandas are powerful operations for data alignment, transformation, and formatting. Whether you’re working with split files, group aggregates, time-series, or sequence data, being able to reshape between **wide and long formats** is essential.

- **`pd.concat()`** helps combine DataFrames vertically or horizontally.
- **`.melt()`** converts wide → long (unpivoting), useful for tidying or stacking.
- **`.pivot()`** converts long → wide (pivoting), producing the one-column-per-feature layout that most models expect.
- **`.stack()` / `.unstack()`** support deeper reshaping in MultiIndex structures.

Mastering these tools lets you build clean, model-ready datasets and apply advanced feature engineering, especially for sequence-based or grouped modeling scenarios in machine learning.