# Combine columns with the same header split across a template

## Description

- Use when a data template repeats the same headers across 2+ column blocks for readability in Excel
- Split the data into 2 dataframes and recombine them with `pd.concat`; handling each piece separately gives more control over formatting
- Useful when the data template is designed to fit on a single printout in Excel
- Can be incorporated into a cleaning function when iterating through files in a directory
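As a rough illustration of the last point, the split-and-stack steps could be wrapped in a function and applied to every CSV in a folder. This is only a sketch: `combine_split_columns` and `clean_directory` are hypothetical names, the function assumes exactly two side-by-side blocks with the same header order, and the demo writes its own throwaway file rather than using real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_split_columns(df):
    """Stack the right half of the columns under the left half.

    Assumes exactly two side-by-side blocks with the same header order.
    """
    half = df.shape[1] // 2
    left = df.iloc[:, :half].copy()
    right = df.iloc[:, half:].copy()
    right.columns = left.columns  # positional rename
    return pd.concat([left, right], ignore_index=True)


def clean_directory(input_dir):
    """Apply combine_split_columns to every CSV file in input_dir."""
    cleaned = {}
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        cleaned[csv_path.name] = combine_split_columns(pd.read_csv(csv_path))
    return cleaned


# Demo on a temporary directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.csv").write_text(
        "number,letter,number,letter\n1,a,3,c\n2,b,4,d\n"
    )
    results = clean_directory(tmp)

print(results["example.csv"])
```

Returning a dictionary keyed by file name keeps each cleaned dataframe traceable to its source file; writing the results back out is left to the caller.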

## Import libraries

In [1]:
import pandas as pd
import numpy as np
import os

## Import data

In [2]:
dfa1 = pd.read_csv('data_split_columns_raw.csv')

In [3]:
dfa1

|   | number | letter | color | number.1 | letter.1 | color.1 |
|---|---|---|---|---|---|---|
| 0 | 1 | a | red | 4 | d | green |
| 1 | 2 | b | orange | 5 | e | blue |
| 2 | 3 | c | yellow | 6 | f | purple |
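The `.1` suffixes are added by pandas: when `read_csv` finds a repeated header name, it keeps the first occurrence and renames the repeats. The sketch below reproduces this with an in-memory CSV. The raw layout is an assumption, since the contents of `data_split_columns_raw.csv` are not shown here.

```python
import io

import pandas as pd

# Assumed raw layout: the same three headers repeated side by side
raw_csv = io.StringIO(
    "number,letter,color,number,letter,color\n"
    "1,a,red,4,d,green\n"
    "2,b,orange,5,e,blue\n"
    "3,c,yellow,6,f,purple\n"
)

demo = pd.read_csv(raw_csv)

# The repeated headers come back with a numeric suffix appended
print(list(demo.columns))
```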


## Split the dataframe

In [4]:
# Copy the left 3 columns of dfa1 into their own dataframe
dfa2 = dfa1[['number', 'letter', 'color']].copy()

# Copy the right 3 columns of dfa1 into a second dataframe
# (explicit copies, so later edits never touch dfa1)
dfb1 = dfa1[['number.1', 'letter.1', 'color.1']].copy()

In [5]:
dfa2

|   | number | letter | color |
|---|---|---|---|
| 0 | 1 | a | red |
| 1 | 2 | b | orange |
| 2 | 3 | c | yellow |


In [6]:
dfb1

|   | number.1 | letter.1 | color.1 |
|---|---|---|---|
| 0 | 4 | d | green |
| 1 | 5 | e | blue |
| 2 | 6 | f | purple |


## Prepare the split dataframes for merging

In [7]:
# Rename the columns in dfb1 to match the dfa2 column headers.
# This assignment is positional, so only do it if the column order
# already matches; reorder the columns first if needed.
dfb1.columns = dfa2.columns
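If the column order might not match, one alternative is to rename by name instead of by position: strip the numeric suffix, then reorder. The sketch below uses a stand-in dataframe, `right`, with deliberately shuffled columns; both `right` and `target_columns` are illustrative names. Note that the regex would also strip a genuine header that happens to end in `.1`, so it only suits templates without such names.

```python
import re

import pandas as pd

# Stand-in for the right-hand block, with columns deliberately out of order
right = pd.DataFrame({
    "color.1": ["green", "blue", "purple"],
    "number.1": [4, 5, 6],
    "letter.1": ["d", "e", "f"],
})
target_columns = ["number", "letter", "color"]

# Strip the trailing ".<digits>" suffix that pandas adds to repeated headers
right = right.rename(columns=lambda name: re.sub(r"\.\d+$", "", name))

# Reorder by name so the layout matches the left-hand block
right = right[target_columns]

print(right)
```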

In [8]:
dfb1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   number  3 non-null      int64 
 1   letter  3 non-null      object
 2   color   3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes


In [9]:
dfa2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   number  3 non-null      int64 
 1   letter  3 non-null      object
 2   color   3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes


## Concatenate the split dataframes into a new dataframe

In [10]:
# Stack dfb1 under dfa2; ignore_index=True renumbers the rows 0 to 5
dfa3 = pd.concat([dfa2, dfb1], ignore_index=True)

In [11]:
dfa3

|   | number | letter | color |
|---|---|---|---|
| 0 | 1 | a | red |
| 1 | 2 | b | orange |
| 2 | 3 | c | yellow |
| 3 | 4 | d | green |
| 4 | 5 | e | blue |
| 5 | 6 | f | purple |


In [12]:
dfa3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   number  6 non-null      int64 
 1   letter  6 non-null      object
 2   color   6 non-null      object
dtypes: int64(1), object(2)
memory usage: 276.0+ bytes
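The same idea extends to templates with more than two repeated blocks. The following is a minimal sketch, not part of the original notebook: `stack_repeated_blocks` is a hypothetical helper that assumes every block has the same number of columns in the same order, and the three-block `wide` dataframe is made up for the demonstration.

```python
import pandas as pd


def stack_repeated_blocks(df, n_headers):
    """Stack side-by-side blocks of n_headers columns into one long dataframe."""
    if df.shape[1] % n_headers != 0:
        raise ValueError("column count is not a multiple of n_headers")
    base_columns = df.columns[:n_headers]
    blocks = []
    for start in range(0, df.shape[1], n_headers):
        block = df.iloc[:, start:start + n_headers].copy()
        block.columns = base_columns  # positional rename to the first block's headers
        blocks.append(block)
    return pd.concat(blocks, ignore_index=True)


# Made-up example with three blocks of two columns each
wide = pd.DataFrame({
    "number": [1, 2], "letter": ["a", "b"],
    "number.1": [3, 4], "letter.1": ["c", "d"],
    "number.2": [5, 6], "letter.2": ["e", "f"],
})

stacked = stack_repeated_blocks(wide, n_headers=2)
print(stacked)
```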


## Export data

In [13]:
dfa3.to_csv('cleaned_data_split_columns.csv', encoding='utf-8', index=False, header=True)
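Passing `index=False` matters if the file will be read back with pandas: by default `to_csv` also writes the row index, and `read_csv` then loads it as an extra column named `Unnamed: 0`. The sketch below shows the difference using in-memory buffers and a small made-up dataframe instead of files on disk.

```python
import io

import pandas as pd

df = pd.DataFrame({"number": [1, 2], "letter": ["a", "b"]})

# Default behaviour: the row index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```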