# Splitting

In some cases, a string contains several pieces of information, so we need to split it to access just one of them.

In the staff DataFrame, the name column contains both the first and last names, so we can extract either one with the split function.

The Pandas <font color='red'>split</font> function is available under the <font color='red'>str</font> accessor. <u>It splits each string in the column at every occurrence of the given separator and returns a list of the parts.</u>

Let’s look at a simple example first. The following code snippet splits the name column at the space character.

```python
import pandas as pd

staff = pd.read_csv("staff.csv")

print(staff["name"].str.split(" "))
```

```
0            [John, Doe]
1            [Jane, Doe]
2          [Matt, smith]
3       [Ashley, Harris]
4    [Jonathan, targett]
5           [Hale, Cole]
Name: name, dtype: object
```
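Each element of the result is an ordinary Python list. As a minimal sketch, the snippet below builds a small stand-in Series by hand (the names are copied from the output above, since staff.csv itself isn't shown) and checks that .str.split gives the same lists as calling Python's built-in str.split on each value:

```python
import pandas as pd

# Hand-built stand-in for staff["name"]; values copied from the output above.
names = pd.Series(
    ["John Doe", "Jane Doe", "Matt smith",
     "Ashley Harris", "Jonathan targett", "Hale Cole"],
    name="name",
)

# Vectorized split: one list per row.
vectorized = names.str.split(" ")

# The same thing done one string at a time with the built-in str.split.
by_hand = names.apply(lambda s: s.split(" "))

print(vectorized.tolist() == by_hand.tolist())  # True
print(vectorized[0])  # ['John', 'Doe']
```

One practical difference: .str.split leaves a missing value as NaN, whereas the lambda version would raise an error on a non-string value.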


For comparison, here is the original name column:

```python
staff["name"]
```

```
0            John Doe
1            Jane Doe
2          Matt smith
3       Ashley Harris
4    Jonathan targett
5           Hale Cole
Name: name, dtype: object
```

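The separator does not have to be a space. Splitting on the letter "n" cuts each name at every "n" it contains, and names without an "n" come back as one-element lists:

```python
staff["name"].str.split("n")
```

```
0             [Joh,  Doe]
1             [Ja, e Doe]
2            [Matt smith]
3         [Ashley Harris]
4    [Jo, atha,  targett]
5             [Hale Cole]
Name: name, dtype: object
```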
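Splitting on "n" breaks "Jonathan targett" into three pieces because split cuts at every occurrence. If only the first occurrence should count, the n parameter caps the number of splits. A sketch on the same names, built by hand since staff.csv isn't shown:

```python
import pandas as pd

# Stand-in for staff["name"]; values copied from the outputs above.
names = pd.Series(["John Doe", "Jane Doe", "Matt smith",
                   "Ashley Harris", "Jonathan targett", "Hale Cole"])

# Default (n=-1): split at every "n".
all_splits = names.str.split("n")

# n=1: split at the first "n" only, so each list has at most two parts.
first_split = names.str.split("n", n=1)

print(all_splits[4])   # ['Jo', 'atha', ' targett']
print(first_split[4])  # ['Jo', 'athan targett']
```

Note that the space before "targett" is kept in the part; .str.strip() can remove it if needed.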

Splitting a string is usually not enough on its own; we also need to extract the part we want.

The <font color='red'>expand</font> parameter of the <font color='red'>split</font> function controls the shape of the result. Setting it to True returns a DataFrame with one column per part (labeled 0, 1, ...) instead of a Series of lists. We can then select the column we need.
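To see what expand=True returns, here is a minimal sketch on a hand-built Series. "John Doe" comes from the staff data; "Cher" is a hypothetical one-word name added only to show what happens when rows split into different numbers of parts:

```python
import pandas as pd

# "John Doe" is from the staff data; "Cher" is a hypothetical one-word name.
names = pd.Series(["John Doe", "Cher"])

parts = names.str.split(" ", expand=True)
print(parts)

# Columns are labeled by position: 0 for the first part, 1 for the second.
print(list(parts.columns))  # [0, 1]

# Rows with fewer parts are padded with a missing value.
print(parts.isna())
```

This is why indexing the expanded result with [1] picks out the second part of every name.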

Let’s create a column that only contains the last names.

```python
import pandas as pd

staff = pd.read_csv("staff.csv")

staff["last_name"] = staff["name"].str.split(" ", expand=True)[1]

print(staff[["name","last_name"]])
```

```
               name last_name
0          John Doe       Doe
1          Jane Doe       Doe
2        Matt smith     smith
3     Ashley Harris    Harris
4  Jonathan targett   targett
5         Hale Cole      Cole
```
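Column 0 gives the first names in the same way. One thing to watch: with expand=True, column 1 is the second word, which equals the last name only when every name has exactly two words. A sketch, assuming a hypothetical three-word name "Mary Ann Lee" that is not in the staff data, using rsplit with n=1 to split only at the last space:

```python
import pandas as pd

# First two names are from the staff data; "Mary Ann Lee" is hypothetical.
names = pd.Series(["John Doe", "Hale Cole", "Mary Ann Lee"])

# Splitting at every space: column 1 is the *second* word, not the last.
naive_last = names.str.split(" ", expand=True)[1]

# rsplit with n=1 splits only at the last space, so column 1 is the last word.
last_name = names.str.rsplit(" ", n=1, expand=True)[1]

# split with n=1 splits only at the first space, so column 0 is the first word.
first_name = names.str.split(" ", n=1, expand=True)[0]

print(naive_last.tolist())  # ['Doe', 'Cole', 'Ann']
print(last_name.tolist())   # ['Doe', 'Cole', 'Lee']
print(first_name.tolist())  # ['John', 'Hale', 'Mary']
```

For the two-word names in the staff data all three approaches agree; they only differ once a name has more than two words.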


# Combining

Just like we split strings, we can combine multiple strings into a single one.

The **+** operator combines strings. Applied to string columns (or a column and a plain string), it concatenates them element-wise. Let’s review a quick example.

```python
import pandas as pd

staff = pd.read_csv("staff.csv")

print(staff["name"] + " - " + staff["department"])
```

```
0               John Doe - Accounting
1            Jane Doe - Field Quality
2        Matt smith - human resources
3          Ashley Harris - accounting
4    Jonathan targett - field quality
5             Hale Cole - engineering
dtype: object
```
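Pandas also offers the str.cat method for the same job. A minimal sketch, using a hand-built stand-in with two rows copied from the output above plus one hypothetical row ("New Hire") whose department is missing, showing that str.cat matches + on complete rows and how its na_rep parameter fills in missing values:

```python
import pandas as pd

# Two rows copied from the output above, plus a hypothetical row
# ("New Hire") whose department is missing.
staff = pd.DataFrame({
    "name": ["John Doe", "Hale Cole", "New Hire"],
    "department": ["Accounting", "engineering", None],
})

# str.cat joins element-wise, inserting sep between the pieces.
# A row with a missing value gives a missing result by default.
combined = staff["name"].str.cat(staff["department"], sep=" - ")
print(combined.tolist())

# na_rep substitutes a placeholder for missing values instead.
filled = staff["name"].str.cat(staff["department"], sep=" - ", na_rep="unknown")
print(filled.tolist())
# ['John Doe - Accounting', 'Hale Cole - engineering', 'New Hire - unknown']
```

For complete rows the result is the same as with **+**; the main reason to reach for str.cat is the explicit control over separators and missing values.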


We have just learned how to split and combine strings in Pandas. These operations are used frequently because textual data often packs several pieces of information into a single string.