# Pandas tip #2: Split text column into multiple new columns
In my projects I always do my first data analysis in Pandas. Often, one of the columns contains text data that requires some processing, for example a single column holding both a first and a last name. What I previously did was write a lambda function and use `.apply()` to process each row. There is, however, a better way: `.str.split()`, which works much like Python's built-in `.split()` method. With the `expand=True` parameter, the split result is placed in new columns.

In [None]:
import pandas as pd

df = pd.DataFrame([
    {'path': 'train/data_shard_1.csv'},
    {'path': 'train/data_shard_2.csv'},
    {'path': 'train/data_shard_3.csv'},
    {'path': 'test/data_shard_1.csv'},
    {'path': 'test/data_shard_2.csv'},
])

In [None]:
# https://linkedin.com/in/dennisbakhuis
df = (df
    .join(df
        .loc[:, 'path']
        .str.split('/', expand=True)
        .rename(columns={0: 'folder', 1: 'filename'})
    )
)

df
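
To make the role of `expand=True` concrete, here is a minimal sketch that splits the same kind of path strings with and without it. The variable names (`paths`, `as_lists`, `as_columns`) are only illustrative and not part of the example above.

```python
import pandas as pd

# Two illustrative paths in the same format as the example above.
paths = pd.Series(['train/data_shard_1.csv', 'test/data_shard_2.csv'], name='path')

# Without expand, the result is a Series whose values are Python lists.
as_lists = paths.str.split('/')

# With expand=True, the result is a DataFrame with one column per part,
# labelled 0, 1, ... (which is why the rename step above is needed).
as_columns = paths.str.split('/', expand=True)

print(as_lists.iloc[0])
print(as_columns.columns.tolist())
```

The integer column labels are what `.rename(columns={0: 'folder', 1: 'filename'})` replaces with readable names before the join.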

### A more meaningful example
Most of you have probably seen the Titanic dataset. Its `Name` column holds some hidden information: each entry starts with the last name (or family name), followed by the person's title. We can easily extract that information using `.str.split(expand=True)`. Let's have a look:

In [None]:
# Use a list of column names to ensure we return a DataFrame
df = pd.read_csv('Assets/Titanic_train_data.csv')[['Name']]
df

The family name is everything before the `,`, and the syntax is very similar to Python's regular `.split()`:

In [None]:
df['family_name'] = df['Name'].str.split(',', expand=True)[0]
df

To get the title, we chain a few splits: first on the comma, then on whitespace, and finally on the `.`.

In [None]:
df['title'] = (df
    .loc[:, 'Name']  # same as df['Name'], but reads better in a chain
    .str.split(',', expand=True)[1]  # keep the part after the family name
    .str.split(expand=True)[0]  # first whitespace-separated word, e.g. 'Mr.'
    .str.split('.', expand=True)[0]  # remove the `.`
)

In [None]:
df
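
If you do not have the CSV at hand, the same idea can be tried on a few names typed inline. This is a sketch, not the exact chain used above: it is a slight variation that limits each split with `n=1` and uses `.str.strip()` instead of a whitespace split. The `names` and `parts` variables are only illustrative.

```python
import pandas as pd

# Three names in the Titanic `Name` format, typed inline for illustration.
names = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
    'Heikkinen, Miss. Laina',
]})

# Split once on the first comma: column 0 is the family name,
# column 1 is the rest of the name (with a leading space).
parts = names['Name'].str.split(',', n=1, expand=True)
names['family_name'] = parts[0]

# Strip the leading space, then keep the text before the first '.'.
names['title'] = parts[1].str.strip().str.split('.', n=1, expand=True)[0]

print(names[['family_name', 'title']])
```

Limiting the split with `n=1` keeps the result at two columns even if a name happens to contain more than one comma or period.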

If you have any questions, comments, or requests, feel free to [contact me on LinkedIn](https://linkedin.com/in/dennisbakhuis).

In [None]:
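# Row-by-row approach with `.apply()`; assumes `df` still holds the `path` column from the first example.
df['folder'] = df['path'].apply(lambda x: x.split('/')[0])
df['filename'] = df['path'].apply(lambda x: x.split('/')[1])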

In [None]:
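# Same result with `.str.split()`; assumes `df` still holds the `path` column from the first example.
df['folder'] = df['path'].str.split('/', expand=True)[0]
df['filename'] = df['path'].str.split('/', expand=True)[1]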