## String Methods in _pandas_

In this notebook, you'll learn about several string methods and how to use them with a _pandas_ DataFrame.

We'll be reading in and trying to combine two datasets.

In [None]:
import pandas as pd

Our first dataset contains unemployment data obtained from the Bureau of Labor Statistics.

In [None]:
unemployment = pd.read_csv('data/tn_unemployment.csv')

unemployment.head()

Now, let's bring in our second DataFrame, which contains population data per county.

In [None]:
population = pd.read_csv('data/tn_population.csv')

population.head()

Our goal is to combine the unemployment and population data. In order to do this, _pandas_ needs a common column to join on. 

Notice that both DataFrames have a `Name` column. However, we can't merge on it yet: the capitalization differs between the two, and the unemployment `Name` values also include the state.

When working with text data in _pandas_, it is often useful to use the built-in string methods. To use these methods on a column, put `.str` before the desired method.
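As a quick illustration of the `.str` accessor, here is a minimal sketch on a toy Series. The values are made up for this example and are not taken from the notebook's datasets.

```python
import pandas as pd

# Toy Series, made up for illustration (not from the notebook's data)
names = pd.Series(['anderson county', 'bedford county', None])

# .str applies the string method to every element of the Series;
# missing values stay missing instead of raising an error
lengths = names.str.len()
print(lengths)
```

Without `.str`, a call like `names.len()` would fail, because the method belongs to the strings inside the Series and not to the Series itself.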

### Changing Case

For example, we can make a column entirely uppercase using the `upper()` method.

In [None]:
unemployment['Name'].str.upper()

Alternatively, we can capitalize the first letter of each word using the `title()` method.

In [None]:
population['Name'].str.title()

Let's use the second method, which gets our columns closer to where they need to be.

In [None]:
population['Name'] = population['Name'].str.title()
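One thing to watch for with `title()`: it also lowercases every letter after the first in each word. The sketch below uses made-up values (an assumption, not rows from the population file) to show the effect on a name with internal capitals.

```python
import pandas as pd

# Made-up values for illustration
raw = pd.Series(['ANDERSON COUNTY', 'DeKalb County'])

# title() capitalizes the first letter of each word and lowercases the rest
titled = raw.str.title()
print(titled.tolist())
```

If the other dataset spells a county with internal capitals, the two columns would still differ after this step, so it is worth comparing the results before merging.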

### Replace

Another often useful method is the `replace()` method. To use this method, specify what pattern you want to replace and then the replacement text.

In [None]:
unemployment['Period'].str.replace('21', '2021')
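Keep in mind that `replace()` changes every occurrence of the pattern, not just the one at the end. Here is a small sketch with hypothetical period strings (the real format of the `Period` column may differ) comparing a literal replacement with a regular expression anchored to the end of the string.

```python
import pandas as pd

# Hypothetical period strings, made up for illustration
periods = pd.Series(['Mar-21', '21-Dec-21'])

# Literal replacement: every '21' is replaced
literal = periods.str.replace('21', '2021', regex=False)

# Regular expression: '$' anchors the match to the end of the string
anchored = periods.str.replace(r'21$', '2021', regex=True)

print(literal.tolist())
print(anchored.tolist())
```

Passing `regex` explicitly is a cautious choice here, since the default value of that argument has changed between _pandas_ versions.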

**Try It Out** Use the `replace()` method to remove the ", TN" from the Name column of the unemployment DataFrame.

In [None]:
# Your Code Here
unemployment['Name'].str.replace(', TN', '')

### String Slicing

We can also slice strings using _pandas_ much like we can with regular strings.

In [None]:
unemployment['Period'].str[:3]
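Slicing through `.str` follows the same rules as slicing a regular Python string, including negative indices. A minimal sketch, again with made-up period strings:

```python
import pandas as pd

# Made-up period strings for illustration
periods = pd.Series(['Mar-21', 'Apr-21'])

first_three = periods.str[:3]   # characters 0, 1, 2
last_two = periods.str[-2:]     # the final two characters

print(first_three.tolist())
print(last_two.tolist())
```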

**Try It Out** Use string slicing to remove the ", TN" from the Name column of the unemployment DataFrame.

In [None]:
# Your Code Here
unemployment['Name'].str[:-4]
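Slicing off the last four characters only works if every entry really ends in ", TN". The sketch below uses made-up values to show what happens when one does not, and compares it with `str.removesuffix()`, which assumes _pandas_ 1.4 or later.

```python
import pandas as pd

# Made-up values: the second entry has no ", TN" suffix
names = pd.Series(['Anderson County, TN', 'Tennessee'])

# Slicing always drops four characters, whatever they are
sliced = names.str[:-4]

# removesuffix only drops the text if the string actually ends with it
# (assumes pandas >= 1.4)
trimmed = names.str.removesuffix(', TN')

print(sliced.tolist())
print(trimmed.tolist())
```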

### String Concatenation

Note that we can also use `+` with strings to concatenate them. For example, we could add ", TN" to the end of each entry in the population Name column.

In [None]:
population['Name'] + ", TN"
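The `+` operator also works between two string columns, combining them row by row. A minimal sketch with a toy DataFrame (the `County` and `State` column names are hypothetical and do not come from the notebook's files):

```python
import pandas as pd

# Toy DataFrame with hypothetical column names
df = pd.DataFrame({
    'County': ['Anderson County', 'Bedford County'],
    'State': ['TN', 'TN'],
})

# Column + literal string + column, evaluated element by element
full_name = df['County'] + ', ' + df['State']
print(full_name.tolist())
```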

### Splitting Strings

Another useful string method is `.str.split()`, which allows us to divide a string into a list of parts by specifying what to split on. 

Notice that if we split on the comma, the first piece will match what is contained in the `Name` column of the population DataFrame.

In [None]:
unemployment['Name'].str.split(',')

By default, this method returns a Series in which each entry is a list. We can make it return a DataFrame, with one column per piece, by using the `expand` argument.

In [None]:
unemployment['Name'].str.split(',', expand = True)
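One detail to be aware of when splitting on a bare comma: any space after the comma stays attached to the following piece. The sketch below uses made-up names to show this, along with `str.strip()` to clean it up.

```python
import pandas as pd

# Made-up names in a "County, ST" format
names = pd.Series(['Anderson County, TN', 'Bedford County, TN'])

# expand=True returns a DataFrame with integer column labels 0, 1, ...
parts = names.str.split(',', expand=True)

counties = parts[0]              # no leading space here
states_raw = parts[1]            # keeps the space that followed the comma
states = parts[1].str.strip()    # strip() removes surrounding whitespace

print(states_raw.tolist())
print(states.tolist())
```

The first piece is unaffected in this example, which is why taking column `0` is enough for the county name.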

We only want the first column.

In [None]:
unemployment['Name'].str.split(',', expand = True)[0]

Finally, we can assign this back to the `Name` column.

In [None]:
unemployment['Name'] = unemployment['Name'].str.split(',', expand = True)[0]

In [None]:
unemployment.head()

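Now we are ready to merge our DataFrames. Since we don't pass the `on` argument, _pandas_ joins on every column name that the two DataFrames share.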

In [None]:
pd.merge(left = population, right = unemployment)
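After cleaning, it can be useful to check whether any names still fail to match. One way to do that, sketched here on toy DataFrames with made-up numbers, is an outer merge with `indicator=True`, which adds a `_merge` column saying where each row came from.

```python
import pandas as pd

# Toy DataFrames with made-up values (not the notebook's data)
pop = pd.DataFrame({
    'Name': ['Anderson County', 'Bedford County'],
    'Population': [100, 200],
})
unemp = pd.DataFrame({
    'Name': ['Anderson County', 'Benton County'],
    'Rate': [3.5, 4.0],
})

# how='outer' keeps unmatched rows from both sides;
# indicator=True labels each row 'both', 'left_only', or 'right_only'
check = pd.merge(left=pop, right=unemp, on='Name', how='outer', indicator=True)
status = dict(zip(check['Name'], check['_merge'].astype(str)))
print(status)
```

Rows labeled `left_only` or `right_only` point to names that still need cleaning, or to counties that appear in only one file.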