# Flatten list of lists and dict of lists in Pandas DataFrame

---

In this notebook, I will show you how to flatten a list of lists or a dict of lists stored in a DataFrame column. By flatten, I mean unpacking the nested lists or dictionaries so that they look tabular.

In [1]:
import pandas as pd

from collections import defaultdict

In [2]:
data = {
    "company_name": ['A', 'B', 'C', 'D'],
    # One list of [tag, value] pairs per company; company B lists two contacts
    "info": [
        [['Name', 'David Jones'], ['Title', 'CEO'],
         ['Phone', '207-685-1626'], ['Email', 'djones@example.org']],

        [['Name', 'Kate Brown'], ['Title', 'Senior Lawyer'],
         ['Phone', '316-978-7791'], ['Email', 'Kate.Brown@example.edu'],
         ['Name', 'Darin White'], ['Title', 'Associate Vice President'],
         ['Phone', '316-978-3887'], ['Email', 'Darin.White@example.edu']],

        [['Name', 'Carl Clark'], ['Title', 'Chief Operating Officer'],
         ['Phone', '413-534-2745'], ['Email', 'Clark_Carl@example.com']],

        [['Name', 'Taylor Garcia'], ['Title', 'Board Member'],
         ['Phone', '307-733-2164'], ['Email', 'Garcia@example.org']],
    ],
}

In [3]:
df = pd.DataFrame(data)

df

|   | company_name | info |
|---|---|---|
| 0 | A | [[Name, David Jones], [Title, CEO], [Phone, 20... |
| 1 | B | [[Name, Kate Brown], [Title, Senior Lawyer], [... |
| 2 | C | [[Name, Carl Clark], [Title, Chief Operating O... |
| 3 | D | [[Name, Taylor Garcia], [Title, Board Member],... |


In [4]:
df['info'].iloc[2]

[['Name', 'Carl Clark'],
 ['Title', 'Chief Operating Officer'],
 ['Phone', '413-534-2745'],
 ['Email', 'Clark_Carl@example.com']]

As we can see, the ```info``` column holds a list of lists, where each inner list is a ```[tag, value]``` pair. We aim to present this data in tabular format; in other words, we have to build a new DataFrame from these lists, with one column per tag.
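For a single company whose tags each appear once, a list of ```[tag, value]``` pairs is already close to a row, because Python's `dict()` accepts any iterable of two-element pairs. A minimal sketch on the record for company C (copied from the output above). Note that this shortcut keeps only the last value when a tag repeats, as it does for company B, which is why a more general approach is needed for the whole column.

```python
import pandas as pd

# The info cell for company C, as shown above
pairs = [['Name', 'Carl Clark'],
         ['Title', 'Chief Operating Officer'],
         ['Phone', '413-534-2745'],
         ['Email', 'Clark_Carl@example.com']]

# dict() turns [key, value] pairs into a mapping; wrapping it in a list gives one row
row = pd.DataFrame([dict(pairs)])
print(row)

# With repeated tags (as in company B), dict() silently keeps only the last value per key
pairs_b = [['Name', 'Kate Brown'], ['Name', 'Darin White']]
print(dict(pairs_b))  # {'Name': 'Darin White'}
```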

In [5]:
df_exploded = df.explode('info')

df_exploded.head()

|   | company_name | info |
|---|---|---|
| 0 | A | [Name, David Jones] |
| 0 | A | [Title, CEO] |
| 0 | A | [Phone, 207-685-1626] |
| 0 | A | [Email, djones@example.org] |
| 1 | B | [Name, Kate Brown] |


The ```.explode()``` method turns each element of the outer list into a separate row, so every row now holds a single inner list, and the original index label is repeated for each of them. Notice that each inner list has exactly two elements, a tag and a value, so we can index into it to separate them. To do so, we add two helper columns.
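To see what `.explode()` and positional indexing do in isolation, here is a small sketch on a made-up two-row Series (hypothetical data, not the notebook's). The exploded result repeats the original index label for every element, and the `.str[0]` / `.str[1]` accessor is one pandas way to take the first and second element of each inner list, equivalent to `map(lambda x: x[0])`.

```python
import pandas as pd

# Hypothetical toy data: two rows, each holding a list of [tag, value] pairs
s = pd.Series([[['Name', 'Ann'], ['Title', 'CTO']],
               [['Name', 'Bob']]])

exploded = s.explode()            # one row per inner pair; index labels repeat
print(exploded.index.tolist())    # [0, 0, 1]

tags = exploded.str[0]            # first element of each pair
values = exploded.str[1]          # second element of each pair
print(tags.tolist())              # ['Name', 'Title', 'Name']
print(values.tolist())            # ['Ann', 'CTO', 'Bob']
```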

In [6]:
# Split each [tag, value] pair into two helper columns
df_exploded['tag'] = df_exploded['info'].map(lambda x: x[0])     # first element: the tag
df_exploded['result'] = df_exploded['info'].map(lambda x: x[1])  # second element: the value

df_exploded.head()

|   | company_name | info | tag | result |
|---|---|---|---|---|
| 0 | A | [Name, David Jones] | Name | David Jones |
| 0 | A | [Title, CEO] | Title | CEO |
| 0 | A | [Phone, 207-685-1626] | Phone | 207-685-1626 |
| 0 | A | [Email, djones@example.org] | Email | djones@example.org |
| 1 | B | [Name, Kate Brown] | Name | Kate Brown |


In [7]:
# Reshape the long (tag, result) rows into one row per contact

(df_exploded.groupby(['company_name', 'tag'])['result']  # group values by company and tag
            .apply(lambda x: pd.Series(x.values))    # renumber each group's values 0, 1, ... so the n-th Name pairs with the n-th Title
            .unstack(1)                              # move the "tag" level into columns
            .reset_index()                           # turn "company_name" and the positional level back into columns
            .drop(['level_1'], axis=1)               # drop the positional counter ("level_1")
)

| tag | company_name | Email | Name | Phone | Title |
|---|---|---|---|---|---|
| 0 | A | djones@example.org | David Jones | 207-685-1626 | CEO |
| 1 | B | Kate.Brown@example.edu | Kate Brown | 316-978-7791 | Senior Lawyer |
| 2 | B | Darin.White@example.edu | Darin White | 316-978-3887 | Associate Vice President |
| 3 | C | Clark_Carl@example.com | Carl Clark | 413-534-2745 | Chief Operating Officer |
| 4 | D | Garcia@example.org | Taylor Garcia | 307-733-2164 | Board Member |
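The `pd.Series(x.values)` step is what lets company B produce two rows: it gives each group's values a fresh positional index 0, 1, ..., so after unstacking the n-th Name lines up with the n-th Title. The same idea can be sketched with `groupby().cumcount()` followed by `unstack`; this is a swapped-in technique, not the one used above. The snippet rebuilds a small slice of the exploded frame by hand (the helper column `n` is a hypothetical name) so it runs on its own.

```python
import pandas as pd

# A hand-built slice of the exploded frame: company_name, tag, result
long_df = pd.DataFrame({
    'company_name': ['A', 'A', 'B', 'B', 'B', 'B'],
    'tag': ['Name', 'Title', 'Name', 'Title', 'Name', 'Title'],
    'result': ['David Jones', 'CEO', 'Kate Brown', 'Senior Lawyer',
               'Darin White', 'Associate Vice President'],
})

# Number repeated tags within each company: 0 for the first Name, 1 for the second, ...
long_df['n'] = long_df.groupby(['company_name', 'tag']).cumcount()

wide = (long_df.set_index(['company_name', 'n', 'tag'])['result']
               .unstack('tag')       # one column per tag
               .reset_index()        # company_name and n back to columns
               .drop(columns='n'))   # the counter is no longer needed
print(wide)
```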


We have successfully transformed the list of lists into a tabular format. Now, let's convert the initial list of lists into a dict of lists, where each key is a tag (a value of the "tag" column above) and each value is the list of corresponding entries (the values of the "result" column).
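The conversion relies on `collections.defaultdict(list)`: appending to a missing key first creates an empty list, so repeated tags accumulate instead of overwriting each other. A minimal sketch on a shortened version of company B's record:

```python
from collections import defaultdict

# Shortened [tag, value] pairs for company B (two contacts)
pairs = [['Name', 'Kate Brown'], ['Title', 'Senior Lawyer'],
         ['Name', 'Darin White'], ['Title', 'Associate Vice President']]

groups = defaultdict(list)
for tag, value in pairs:
    groups[tag].append(value)   # a missing key starts as []

print(dict(groups))
# {'Name': ['Kate Brown', 'Darin White'], 'Title': ['Senior Lawyer', 'Associate Vice President']}
```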

In [8]:
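# Convert each list of [tag, value] pairs into a dict of lists
out = []
for pairs in df['info'].tolist():
    groups = defaultdict(list)        # a missing tag starts as an empty list
    for tag, value in pairs:
        groups[tag].append(value)     # repeated tags (company B) accumulate
    out.append(dict(groups))          # store a plain dict for cleaner display

df['new_info'] = out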

In [9]:
df.head()

|   | company_name | info | new_info |
|---|---|---|---|
| 0 | A | [[Name, David Jones], [Title, CEO], [Phone, 20... | {'Name': ['David Jones'], 'Title': ['CEO'], 'P... |
| 1 | B | [[Name, Kate Brown], [Title, Senior Lawyer], [... | {'Name': ['Kate Brown', 'Darin White'], 'Title... |
| 2 | C | [[Name, Carl Clark], [Title, Chief Operating O... | {'Name': ['Carl Clark'], 'Title': ['Chief Oper... |
| 3 | D | [[Name, Taylor Garcia], [Title, Board Member],... | {'Name': ['Taylor Garcia'], 'Title': ['Board M... |


We have successfully converted the list of lists into a dict of lists. Take a look at the second record of the "new_info" column: because company B has two contacts, every value in its dict is a two-element list.

In [10]:
df['new_info'].iloc[1]

{'Name': ['Kate Brown', 'Darin White'],
 'Title': ['Senior Lawyer', 'Associate Vice President'],
 'Phone': ['316-978-7791', '316-978-3887'],
 'Email': ['Kate.Brown@example.edu', 'Darin.White@example.edu']}

There are two ways to deal with this sort of data. I found the second way much faster than the first, but let's review both of them.
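Both approaches assume that the lists inside one dict have the same length, since each position becomes one output row. A quick sketch with `pd.DataFrame` shows the expected case and what happens if one contact were missing a field (the shortened and the broken records here are hypothetical):

```python
import pandas as pd

# Shortened version of company B's dict of lists
record = {'Name': ['Kate Brown', 'Darin White'],
          'Title': ['Senior Lawyer', 'Associate Vice President']}
ok = pd.DataFrame(record)   # two rows, one per contact
print(ok)

# Hypothetical broken record: one Title is missing
broken = {'Name': ['Kate Brown', 'Darin White'], 'Title': ['Senior Lawyer']}
error_message = None
try:
    pd.DataFrame(broken)
except ValueError as err:   # pandas rejects lists of different lengths
    error_message = str(err)
print(error_message)
```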

In [11]:
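# The first way to flatten a dict of lists: build one DataFrame per row
new_data = []
for i in df['new_info']:
    new_data.append(pd.DataFrame(i))   # a dict of equal-length lists becomes a small DataFrame

# Flattening
(pd.concat(new_data, axis=0, sort=False)  # stack the per-company DataFrames
    .drop_duplicates()                    # drop duplicate rows
    .reset_index(drop=True)               # replace the repeated 0, 1 labels with a fresh index
)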

|   | Name | Title | Phone | Email |
|---|---|---|---|---|
| 0 | David Jones | CEO | 207-685-1626 | djones@example.org |
| 1 | Kate Brown | Senior Lawyer | 316-978-7791 | Kate.Brown@example.edu |
| 2 | Darin White | Associate Vice President | 316-978-3887 | Darin.White@example.edu |
| 3 | Carl Clark | Chief Operating Officer | 413-534-2745 | Clark_Carl@example.com |
| 4 | Taylor Garcia | Board Member | 307-733-2164 | Garcia@example.org |


With millions of rows, this loop, which builds a separate DataFrame for every row before concatenating them, becomes slow. For that reason, I came up with a second solution, which chains built-in ```Pandas``` methods instead of building DataFrames in an explicit loop.

In [12]:
# The second way to flatten a dict of lists

(df['new_info'].apply(pd.Series)               # expand each dict into columns Name, Title, Phone, Email (cells are lists)
                .apply(lambda x: x.explode())  # explode every column; a row with two contacts becomes two rows
                .reset_index(drop=True)        # replace the repeated labels with a fresh index
)

|   | Name | Title | Phone | Email |
|---|---|---|---|---|
| 0 | David Jones | CEO | 207-685-1626 | djones@example.org |
| 1 | Kate Brown | Senior Lawyer | 316-978-7791 | Kate.Brown@example.edu |
| 2 | Darin White | Associate Vice President | 316-978-3887 | Darin.White@example.edu |
| 3 | Carl Clark | Chief Operating Officer | 413-534-2745 | Clark_Carl@example.com |
| 4 | Taylor Garcia | Board Member | 307-733-2164 | Garcia@example.org |
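If you are on pandas 1.3 or newer, `DataFrame.explode` also accepts a list of columns and explodes them in lockstep, which requires the lists in each row to have matching lengths. A sketch of the same flattening with that option, rebuilding a two-company `new_info` column so it runs on its own (this is an alternative to the methods above, not something the notebook measured):

```python
import pandas as pd

# Two records in the dict-of-lists shape built earlier
new_info = pd.Series([
    {'Name': ['David Jones'], 'Title': ['CEO']},
    {'Name': ['Kate Brown', 'Darin White'],
     'Title': ['Senior Lawyer', 'Associate Vice President']},
])

wide = pd.DataFrame(new_info.tolist())   # one column per key; each cell is still a list
flat = (wide.explode(['Name', 'Title'])  # explode both columns together (pandas >= 1.3)
            .reset_index(drop=True))
print(flat)
```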
