# What is exploding data?

Exploding data means taking a column where each cell holds a list of values and placing each value of that list in a separate row of the same column. This can be done with the explode method, `df.explode()`.

To do the same with columns (every value becomes its own column), create a new dataframe whose columns come from the positions in each list. This can be done like this: `pd.DataFrame(df['column'].values.tolist())`
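As a quick illustration of the two directions described above, here is a minimal sketch on a tiny frame. The frame `toy` and its column name `values` are made up for this example and are not part of the notebook.

```python
import pandas as pd

# Hypothetical two-row frame; each cell holds a Python list.
toy = pd.DataFrame({"values": [[1, 2], [3, 4]]})

# Rows: every list element gets its own row in the same column.
as_rows = toy.explode("values")

# Columns: every list position becomes its own column (labelled 0, 1, ...).
as_columns = pd.DataFrame(toy["values"].tolist())

print(as_rows)
print(as_columns)
```

The first result has four rows and one column, the second has two rows and two columns.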

# Import Libraries

In [2]:
import pandas as pd
import numpy as np

# Create a random array

In [3]:
array = np.random.rand(25,5).round(decimals=2)
array

array([[0.9 , 0.23, 0.38, 0.98, 0.7 ],
       [0.42, 0.11, 0.37, 0.21, 0.99],
       [0.35, 0.75, 0.67, 0.52, 0.79],
       [0.28, 0.93, 0.28, 0.06, 0.59],
       [0.16, 0.26, 0.01, 0.76, 0.99],
       [0.7 , 0.72, 0.14, 0.04, 0.78],
       [0.71, 0.66, 0.01, 0.64, 0.08],
       [0.63, 0.58, 0.66, 0.81, 0.11],
       [0.89, 0.78, 0.5 , 0.11, 0.68],
       [0.73, 0.44, 0.65, 0.34, 0.86],
       [0.24, 0.59, 0.2 , 0.35, 0.58],
       [0.84, 0.12, 0.01, 0.91, 0.2 ],
       [0.99, 0.97, 0.84, 0.87, 0.84],
       [0.06, 0.44, 0.62, 0.63, 0.63],
       [0.12, 0.09, 0.51, 0.39, 0.75],
       [0.78, 0.31, 0.84, 0.42, 0.18],
       [0.72, 0.96, 0.99, 0.19, 0.87],
       [0.43, 0.96, 0.08, 0.65, 0.37],
       [0.38, 0.8 , 0.13, 0.42, 0.06],
       [0.87, 0.61, 0.19, 0.31, 0.74],
       [0.51, 1.  , 0.66, 0.62, 0.79],
       [0.94, 0.58, 0.76, 0.4 , 0.41],
       [0.73, 0.31, 0.64, 0.07, 0.24],
       [0.22, 0.79, 0.52, 0.63, 0.98],
       [0.59, 0.81, 0.19, 0.49, 0.18]])

# Exploding a list of values into rows

In [8]:
df = pd.DataFrame({"list_values":array.tolist()}) # Creates 25 rows, each holding one inner list of 5 values as a Python list
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 1 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   list_values  25 non-null     object
dtypes: object(1)
memory usage: 332.0+ bytes


## Seeing the type of value stored

In [20]:
type(df.iloc[0, 0])


list

## Exploding the stored list into separate rows

Each value in a list gets its own row, and the original index value is repeated for every value that came from the same list.

In [21]:
df.explode(column="list_values")

Unnamed: 0,list_values
0,0.9
0,0.23
0,0.38
0,0.98
0,0.7
...,...
24,0.59
24,0.81
24,0.19
24,0.49
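If the repeated index is not wanted, `explode()` accepts `ignore_index=True` (available in pandas 1.1 and later) to renumber the rows. The sketch below uses a small hypothetical frame `toy` rather than the 25-row `df` above, and also converts the exploded column to floats, since exploding a column of lists typically leaves it with `object` dtype.

```python
import pandas as pd

# Hypothetical stand-in for df with only two rows.
toy = pd.DataFrame({"list_values": [[0.9, 0.23], [0.42, 0.11]]})

# Default behaviour: the original index is repeated (0, 0, 1, 1).
exploded = toy.explode("list_values")

# ignore_index=True gives a fresh 0..n-1 index instead.
renumbered = toy.explode("list_values", ignore_index=True)

# Convert the exploded values to a numeric dtype for later calculations.
as_float = renumbered["list_values"].astype(float)

print(exploded.index.tolist())
print(renumbered.index.tolist())
print(as_float.dtype)
```

Keeping the repeated index is useful when the exploded rows need to be matched back to their source row, so which form to use depends on what happens next.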


# Explode the list of values into columns

The column names are the positions (0 to 4) of the values within each inner list.

In [26]:
pd.DataFrame(df.list_values.values.tolist()).head()

Unnamed: 0,0,1,2,3,4
0,0.9,0.23,0.38,0.98,0.7
1,0.42,0.11,0.37,0.21,0.99
2,0.35,0.75,0.67,0.52,0.79
3,0.28,0.93,0.28,0.06,0.59
4,0.16,0.26,0.01,0.76,0.99
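The new dataframe gets a fresh index and integer column names. A possible way to keep the original index, give the columns readable names, and attach them to the source frame is sketched below. The frame `toy`, its index labels, and the `value_` prefix are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical frame with a non-default index to show that it is preserved.
toy = pd.DataFrame(
    {"list_values": [[0.9, 0.23, 0.38], [0.42, 0.11, 0.37]]},
    index=["a", "b"],
)

# Passing index=toy.index keeps each new row aligned with its source row.
wide = pd.DataFrame(toy["list_values"].tolist(), index=toy.index)

# Rename the positional columns 0, 1, 2 to value_0, value_1, value_2.
wide = wide.add_prefix("value_")

# Attach the new columns to the original frame by index.
combined = toy.join(wide)

print(combined)
```

This assumes every list has the same length; shorter lists would be padded with missing values.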


# Exploding a string representation of a list into columns

Here each row stores its list as a single string rather than as a Python list.

In [24]:
df_str = pd.DataFrame({"list_values":[str(a) for a in array.tolist()]})
df_str

Unnamed: 0,list_values
0,"[0.9, 0.23, 0.38, 0.98, 0.7]"
1,"[0.42, 0.11, 0.37, 0.21, 0.99]"
2,"[0.35, 0.75, 0.67, 0.52, 0.79]"
3,"[0.28, 0.93, 0.28, 0.06, 0.59]"
4,"[0.16, 0.26, 0.01, 0.76, 0.99]"
5,"[0.7, 0.72, 0.14, 0.04, 0.78]"
6,"[0.71, 0.66, 0.01, 0.64, 0.08]"
7,"[0.63, 0.58, 0.66, 0.81, 0.11]"
8,"[0.89, 0.78, 0.5, 0.11, 0.68]"
9,"[0.73, 0.44, 0.65, 0.34, 0.86]"


In [30]:
type(df_str.iloc[0, 0]) # Each value is a string


str

## Exploding the string lists into columns

Converting the string lists to columns directly will not work. Each string is treated as a single value, so the whole string ends up in the first (and only) column.

In [34]:
pd.DataFrame(df_str.list_values.values.tolist()).head()

Unnamed: 0,0
0,"[0.9, 0.23, 0.38, 0.98, 0.7]"
1,"[0.42, 0.11, 0.37, 0.21, 0.99]"
2,"[0.35, 0.75, 0.67, 0.52, 0.79]"
3,"[0.28, 0.93, 0.28, 0.06, 0.59]"
4,"[0.16, 0.26, 0.01, 0.76, 0.99]"


A way to fix this is to evaluate each string as a list and then continue as normal. `eval()` parses the string as a Python expression and returns the resulting list. Only use it on trusted data, because `eval()` will run whatever expression the string contains.

In [36]:
pd.DataFrame(df_str.list_values.apply(lambda x: eval(x)).values.tolist()).head()

Unnamed: 0,0,1,2,3,4
0,0.9,0.23,0.38,0.98,0.7
1,0.42,0.11,0.37,0.21,0.99
2,0.35,0.75,0.67,0.52,0.79
3,0.28,0.93,0.28,0.06,0.59
4,0.16,0.26,0.01,0.76,0.99
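When the strings come from a file or another source that is not fully trusted, the standard library's `ast.literal_eval()` is a commonly used alternative: it only accepts Python literals (lists, numbers, strings and so on) and raises an error for anything else. The sketch below swaps it in for `eval()` on a small hypothetical frame `toy_str`; it is not the method used in the cells above.

```python
import ast

import pandas as pd

# Hypothetical stand-in for df_str with only two rows.
toy_str = pd.DataFrame(
    {"list_values": ["[0.9, 0.23, 0.38]", "[0.42, 0.11, 0.37]"]}
)

# Parse each string into a real Python list without executing code.
parsed = toy_str["list_values"].apply(ast.literal_eval)

# From here the column behaves like the list column used earlier.
wide = pd.DataFrame(parsed.tolist(), index=toy_str.index)

print(type(parsed.iloc[0]))
print(wide)
```

The parsed column can also be passed to `explode()` if rows are wanted instead of columns.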
