# Parsing Delimited Data to Multiple Rows

A common issue I've come across in datasets is a single row that stores multiple delimited values in one column. Each value needs to be parsed out into its own row so that the data can be filtered, grouped, and joined correctly. Luckily, there is an easy fix, which we will explore below.

First, we will create a dataframe that reproduces the problem. We will then add a new column, 'i', which is a copy of the index. We will use it as a key to reattach the other columns after the split.

In [1]:
import pandas as pd
from pandas import DataFrame

sales = DataFrame([{'account': 'Jones LLC , WF', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95 }])
sales['i'] = sales.index
sales

| | Feb | Jan | Mar | account | i |
|---|---|---|---|---|---|
| 0 | 200 | 150 | 140 | Jones LLC , WF | 0 |
| 1 | 210 | 200 | 215 | Alpha Co | 1 |
| 2 | 90 | 50 | 95 | Blue Inc | 2 |


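We now parse the column that holds multiple values: split() breaks each string on the delimiter, tolist() turns the pieces into a list of lists for a new DataFrame, and stack() reshapes the resulting columns into one row per value.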

In [2]:
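# Split on the delimiter, spread the pieces into columns, then stack them into rows
parse = DataFrame(sales.account.str.split(',').tolist(), index=sales.i).stack()
parse = parse.str.strip()  # remove the whitespace left around each piece
parse = parse.reset_index()[[0, 'i']]  # the stacked values land in a column labeled 0
parse.columns = ['account', 'i']  # rename column 0 to 'account'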
parse

| | account | i |
|---|---|---|
| 0 | Jones LLC | 0 |
| 1 | WF | 0 |
| 2 | Alpha Co | 1 |
| 3 | Blue Inc | 2 |
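
To make the chained call easier to follow, here is a minimal sketch that runs the same split, tolist, and stack steps one at a time on a cut-down copy of the data. The intermediate names `pieces`, `wide`, and `long` are illustrative and not part of the original notebook, and the explicit `dropna()` is an added safeguard in case `stack()` keeps the padding values in your pandas version.

```python
import pandas as pd

# Cut-down copy of the example data (illustrative, one value column only)
sales = pd.DataFrame({
    'account': ['Jones LLC , WF', 'Alpha Co', 'Blue Inc'],
    'Jan': [150, 200, 50],
})

# Step 1: split each string on the delimiter, giving one list per row
pieces = sales['account'].str.split(',')

# Step 2: build a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values
wide = pd.DataFrame(pieces.tolist(), index=sales.index)

# Step 3: stack the columns into rows, drop the padding,
# and trim the whitespace left around each piece
long = wide.stack().dropna().str.strip()

print(wide)
print(long)
```

The first level of the index on `long` is the original row label, which is what lets the key column match each piece back to its source row.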


Lastly, we use the key (the 'i' column) to merge the remaining columns back onto the parsed rows with a left join. Once merged, the 'i' column is no longer needed and can be dropped.

In [3]:
result = pd.merge(parse, sales.drop('account', axis = 1), how='left', on=['i'])
result.drop('i', axis=1, inplace=True)
result

| | account | Feb | Jan | Mar |
|---|---|---|---|---|
| 0 | Jones LLC | 200 | 150 | 140 |
| 1 | WF | 200 | 150 | 140 |
| 2 | Alpha Co | 210 | 200 | 215 |
| 3 | Blue Inc | 90 | 50 | 95 |
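
As an alternative to the split, stack, and merge approach above, the same reshaping can be sketched with `DataFrame.explode`, assuming pandas 0.25 or later. This is a different technique from the one in this notebook, shown only for comparison. It needs no key column, because `explode` repeats the other columns for each list element.

```python
import pandas as pd

sales = pd.DataFrame([
    {'account': 'Jones LLC , WF', 'Jan': 150, 'Feb': 200, 'Mar': 140},
    {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215},
    {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95},
])

result = (
    sales.assign(account=sales['account'].str.split(','))  # strings -> lists
         .explode('account')                               # one row per list element
         .reset_index(drop=True)                           # renumber the rows 0..n-1
)

# Trim the whitespace left around each piece by the split
result['account'] = result['account'].str.strip()

print(result)
```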
