# Unnest/flatten/explode list-like columns with Pandas `explode()`

This is a Notebook for the Medium article [Exploding a list-like column with Pandas explode() method](https://bindichen.medium.com/exploding-a-list-like-column-with-pandas-explode-method-3ffd41f9f7e2)

Please read the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause). Not for commercial use.


#### Package versions used in this Notebook

In [7]:
import numpy as np
import pandas as pd

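# Make sure your package versions are at least these. explode() needs pandas >= 0.25.0,
# ignore_index=True needs >= 1.1.0, and exploding multiple columns needs >= 1.3.0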
print('numpy: ', np.__version__)
print('pandas: ', pd.__version__)


numpy:  1.21.5
pandas:  1.4.2


#### Contents
1. Basic usage
2. Series use cases
   - 2.1 With non-list values
   - 2.2 With an empty list
   - 2.3 String with delimiter
3. DataFrame use cases
   - 3.1 With non-list values
   - 3.2 With an empty list
   - 3.3 Multiple list-like columns
   - 3.4 String with delimiter

## 1. Basic usage

In [3]:
df = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': [['Tom', 'Jane'], ['Liz', 'James']]
})

df

|   | class | students |
|---|---|---|
| 0 | Year 1 | [Tom, Jane] |
| 1 | Year 2 | [Liz, James] |


In [4]:
# Available on Series; notice that exploded rows repeat their original index value
df['students'].explode()

0      Tom
0     Jane
1      Liz
1    James
Name: students, dtype: object

In [5]:
# Also available on DataFrame; again, exploded rows repeat their original index value
df.explode('students')

|   | class | students |
|---|---|---|
| 0 | Year 1 | Tom |
| 0 | Year 1 | Jane |
| 1 | Year 2 | Liz |
| 1 | Year 2 | James |
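Every exploded row keeps the index label of the row it came from, so the operation can be undone. The sketch below shows one possible approach, assuming the same toy data. It is not a feature of `explode()` itself. It groups the exploded `students` column on index level 0 and collects each group back into a list with `agg(list)`:

```python
import pandas as pd

# Same toy data as the Basic usage cells
df = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': [['Tom', 'Jane'], ['Liz', 'James']]
})

exploded = df.explode('students')

# Rows that came from the same original row share an index label,
# so grouping on the index (level=0) gathers them back together
regrouped = exploded.groupby(level=0)['students'].agg(list)

# Re-attach the column that was not exploded, aligning on the index
restored = df[['class']].join(regrouped)
print(restored)
```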


In [4]:
# Get a fresh 0..n-1 index with reset_index(drop=True) or ignore_index=True
df.explode('students').reset_index(drop=True)

|   | class | students |
|---|---|---|
| 0 | Year 1 | Tom |
| 1 | Year 1 | Jane |
| 2 | Year 2 | Liz |
| 3 | Year 2 | James |


In [5]:
df.explode('students', ignore_index=True)

|   | class | students |
|---|---|---|
| 0 | Year 1 | Tom |
| 1 | Year 1 | Jane |
| 2 | Year 2 | Liz |
| 3 | Year 2 | James |
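The two ways of renumbering produce the same frame. A related option is calling `reset_index()` *without* `drop=True`. This keeps the old labels in a column named `index`, which records the original row each element came from. A short sketch, assuming the same toy data:

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': [['Tom', 'Jane'], ['Liz', 'James']]
})

# Both approaches give the same fresh 0..n-1 index
a = df.explode('students', ignore_index=True)
b = df.explode('students').reset_index(drop=True)
print(a.equals(b))  # True

# Without drop=True, the old labels are kept in a column named 'index'
tracked = df.explode('students').reset_index()
print(tracked)
```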


## 2. Series use cases 

In [6]:
s = pd.Series([['Tom', 'Jane'], ['Liz', 'James']])

In [15]:
s.explode()

0      Tom
0     Jane
1      Liz
1    James
dtype: object
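According to the pandas documentation, `explode()` handles other list-likes as well as lists, including tuples, sets, NumPy arrays and Series. The documentation also says the exploded values come back with `object` dtype, even when every element is a number. The sketch below uses hypothetical numeric data to show how `astype` restores a numeric dtype:

```python
import pandas as pd

# Hypothetical numeric data: lists of scores with different lengths
scores = pd.Series([[90, 85], [70]])

exploded = scores.explode()
print(exploded.dtype)  # object

# Convert back to a numeric dtype once every value is known to be an integer
as_int = exploded.astype('int64')
print(as_int.dtype)    # int64
```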

### 2.1 With non-list values

In [12]:
# Non-list values such as strings and numbers are kept as-is
s = pd.Series(
    [
        ['Tom', 'Jane'], 
        ['Liz', 'James'], 
        'not a list', 
        101,
        ['Katie', 'Sean']
    ]
)

s.explode()

0           Tom
0          Jane
1           Liz
1         James
2    not a list
3           101
4         Katie
4          Sean
dtype: object
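The rule behind this output can be stated directly:

- A list-like contributes one row per element, or a single `NaN` row if it is empty.
- A scalar, including a string, contributes exactly one row.

The helper `expected_rows` below is a hypothetical name, used only for illustration. It relies on `pandas.api.types.is_list_like`, which treats strings as scalars rather than as sequences of characters:

```python
import pandas as pd
from pandas.api.types import is_list_like


def expected_rows(values):
    """Hypothetical helper: predict how many rows Series.explode() returns."""
    total = 0
    for v in values:
        if is_list_like(v):
            total += max(len(v), 1)  # an empty list-like still yields one NaN row
        else:
            total += 1               # strings and numbers pass through unchanged
    return total


s = pd.Series([['Tom', 'Jane'], ['Liz', 'James'], 'not a list', 101, ['Katie', 'Sean']])

print(expected_rows(s), len(s.explode()))  # 8 8
```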

### 2.2 With an empty list

In [13]:
# By default, an empty list becomes NaN
s = pd.Series([
    ['Tom', 'Jane'], 
    ['Liz', 'James'], 
    [], 
    ['Katie', 'Sean']
])

s.explode()

0      Tom
0     Jane
1      Liz
1    James
2      NaN
3    Katie
3     Sean
dtype: object
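If empty lists should disappear rather than leave a placeholder, one option is to follow `explode()` with `dropna()`. This is a sketch, not the only approach. Note that `dropna()` also removes values that were genuinely missing inside a list (for example a `None` element), because those values cannot be told apart from the placeholder:

```python
import pandas as pd

s = pd.Series([['Tom', 'Jane'], ['Liz', 'James'], [], ['Katie', 'Sean']])

exploded = s.explode()
print(exploded.isna().sum())  # 1 (the row that held the empty list)

# Drop the placeholder NaN; the original index labels are kept
non_empty = exploded.dropna()
print(non_empty)
```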

### 2.3 String with delimiter

In [14]:
# String with delimiter
s = pd.Series([
    'Tom, Jane', 
    'Liz, James', 
    'Katie, Sean'
])

s

0      Tom, Jane
1     Liz, James
2    Katie, Sean
dtype: object

In [22]:
# Split on ', ' (comma plus space) so the pieces carry no leading space
s.str.split(", ").explode()

0      Tom
0     Jane
1      Liz
1    James
2    Katie
2     Sean
dtype: object
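Splitting on an exact delimiter only works when the spacing is consistent. With messier input, splitting on a bare comma leaves stray whitespace on the pieces. One way to handle this is to strip each piece after exploding. The sketch below uses made-up messy input:

```python
import pandas as pd

# Hypothetical messy input: inconsistent spacing around the commas
s = pd.Series(['Tom,Jane', 'Liz ,  James', 'Katie, Sean'])

raw = s.str.split(',').explode()
print(raw.tolist())    # ['Tom', 'Jane', 'Liz ', '  James', 'Katie', ' Sean']

# Strip surrounding whitespace from every piece after exploding
clean = raw.str.strip()
print(clean.tolist())  # ['Tom', 'Jane', 'Liz', 'James', 'Katie', 'Sean']
```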

## 3. DataFrame use cases

### 3.1 With non-list values

In [18]:
# Non-list values are kept as-is
df = pd.DataFrame({
    'class': ['Year 1', 'Year 2', 'Year 3', 'Year 4'],
    'students': [
        ['Tom', 'Jane'], 
        'I am a string', 
        101, 
        ['Katie', 'Sean']
    ]
})

df

|   | class | students |
|---|---|---|
| 0 | Year 1 | [Tom, Jane] |
| 1 | Year 2 | I am a string |
| 2 | Year 3 | 101 |
| 3 | Year 4 | [Katie, Sean] |


In [19]:
df.explode('students')

|   | class | students |
|---|---|---|
| 0 | Year 1 | Tom |
| 0 | Year 1 | Jane |
| 1 | Year 2 | I am a string |
| 2 | Year 3 | 101 |
| 3 | Year 4 | Katie |
| 3 | Year 4 | Sean |
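To explode only the rows that actually hold lists, one option is to filter first with a boolean mask built from `pandas.api.types.is_list_like`. This is a sketch using the same data; whether to drop the other rows depends on your use case:

```python
import pandas as pd
from pandas.api.types import is_list_like

df = pd.DataFrame({
    'class': ['Year 1', 'Year 2', 'Year 3', 'Year 4'],
    'students': [['Tom', 'Jane'], 'I am a string', 101, ['Katie', 'Sean']]
})

# Boolean mask: True only where the cell holds a list-like (strings are excluded)
mask = df['students'].map(is_list_like)

only_lists = df[mask].explode('students')
print(only_lists)
```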


### 3.2 With an empty list

In [20]:
# An empty list becomes NaN
df = pd.DataFrame({
    'class': ['Year 1', 'Year 2', 'Year 3', 'Year 4'],
    'students': [
        ['Tom', 'Jane'], 
        ['Liz', 'James'], 
        [], 
        ['Katie', 'Sean']
    ]
})

df

|   | class | students |
|---|---|---|
| 0 | Year 1 | [Tom, Jane] |
| 1 | Year 2 | [Liz, James] |
| 2 | Year 3 | [] |
| 3 | Year 4 | [Katie, Sean] |


In [42]:
df.explode('students')

|   | class | students |
|---|---|---|
| 0 | Year 1 | Tom |
| 0 | Year 1 | Jane |
| 1 | Year 2 | Liz |
| 1 | Year 2 | James |
| 2 | Year 3 | NaN |
| 3 | Year 4 | Katie |
| 3 | Year 4 | Sean |
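The placeholder `NaN` row keeps the empty class visible, which can be useful. For example, `count()` skips `NaN`, so counting students per class after exploding reports 0 for the class whose list was empty. If the row is not wanted, `dropna(subset=...)` removes it. A sketch under those assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['Year 1', 'Year 2', 'Year 3', 'Year 4'],
    'students': [['Tom', 'Jane'], ['Liz', 'James'], [], ['Katie', 'Sean']]
})

exploded = df.explode('students')

# count() ignores NaN, so the class with an empty list is counted as 0
per_class = exploded.groupby('class')['students'].count()
print(per_class)

# Alternatively, drop the placeholder rows altogether
without_empty = exploded.dropna(subset=['students'])
```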


### 3.3 Multiple list-like columns

In [31]:
# Explode several list-like columns together (requires pandas >= 1.3.0)
df = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': [['Tom', 'Jane'], ['Liz', 'James']],
    'sex': [['M', 'F'], ['F', 'M']]
})

df

|   | class | students | sex |
|---|---|---|---|
| 0 | Year 1 | [Tom, Jane] | [M, F] |
| 1 | Year 2 | [Liz, James] | [F, M] |


In [32]:
df.explode(['students', 'sex'])

|   | class | students | sex |
|---|---|---|---|
| 0 | Year 1 | Tom | M |
| 0 | Year 1 | Jane | F |
| 1 | Year 2 | Liz | F |
| 1 | Year 2 | James | M |
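Exploding several columns in one call keeps the elements paired, so each row's lists must have the same length. Otherwise pandas raises a `ValueError`. Exploding the columns one after another behaves differently: it pairs every element with every other element in that row. A sketch contrasting the two, with a deliberately mismatched frame made up for illustration:

```python
import pandas as pd

# Hypothetical bad data: the second row's 'sex' list is one element short
bad = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': [['Tom', 'Jane'], ['Liz', 'James']],
    'sex': [['M', 'F'], ['F']],
})

try:
    bad.explode(['students', 'sex'])
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

ok = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': [['Tom', 'Jane'], ['Liz', 'James']],
    'sex': [['M', 'F'], ['F', 'M']],
})
together = ok.explode(['students', 'sex'])          # 4 rows, pairs kept aligned
one_by_one = ok.explode('students').explode('sex')  # 8 rows, every pairing
```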


### 3.4 String with delimiter

In [44]:
# String with delimiter
df = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': ['Tom, Jane', 'Liz, James']
})

df

|   | class | students |
|---|---|---|
| 0 | Year 1 | Tom, Jane |
| 1 | Year 2 | Liz, James |


In [45]:
# Split on ', ' so the names carry no leading space, then explode
df.assign(students=df['students'].str.split(', ')).explode('students')

|   | class | students |
|---|---|---|
| 0 | Year 1 | Tom |
| 0 | Year 1 | Jane |
| 1 | Year 2 | Liz |
| 1 | Year 2 | James |
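The same pattern extends to several delimited string columns whose pieces belong together. Split each one into lists, then explode them in a single call (pandas >= 1.3.0). The `sex` column below is made-up example data:

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['Year 1', 'Year 2'],
    'students': ['Tom, Jane', 'Liz, James'],
    'sex': ['M, F', 'F, M'],  # hypothetical second delimited column
})

cols = ['students', 'sex']

# Turn every delimited column into lists, then explode them together
split = df.assign(**{c: df[c].str.split(', ') for c in cols})
result = split.explode(cols, ignore_index=True)
print(result)
```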


## Thanks for reading
