<img src = "https://images2.imgbox.com/60/09/VFwl5LOq_o.jpg" width="400">

# 3. Parsing and Manipulating Text
---

Learn how to manipulate string and text data by transforming case, parsing and truncating text, and extracting substrings from larger strings.

In [1]:
%pip install -q sqlalchemy

Note: you may need to restart the kernel to use updated packages.


In [2]:
%load_ext sql

In [3]:
%sql postgresql://postgres:123@localhost/sakila

## Concatenating strings
---

In this exercise and the ones that follow, we are going to derive new fields from columns within the `customer` and `film` tables of the DVD rental database.

We'll start with the `customer` table and create a query that returns the customer's name and email address, formatted so that we could use it as a "To" field in an email script or program. This format will look like the following:

`Brian Piccolo <bpiccolo@datacamp.com>`

**In the first step of the exercise, use the `||` operator to do the string concatenation and in the second step, use the `CONCAT()` function.**

### Instructions

Concatenate the `first_name` and `last_name` columns separated by a single space followed by `email` surrounded by `<` and `>`.

In [4]:
%%sql

SELECT first_name || ' ' || last_name || ' <' || email || '>' AS full_email 
FROM   customer
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


full_email
MARY SMITH <MARY.SMITH@sakilacustomer.org>
PATRICIA JOHNSON <PATRICIA.JOHNSON@sakilacustomer.org>
LINDA WILLIAMS <LINDA.WILLIAMS@sakilacustomer.org>
BARBARA JONES <BARBARA.JONES@sakilacustomer.org>
ELIZABETH BROWN <ELIZABETH.BROWN@sakilacustomer.org>
JENNIFER DAVIS <JENNIFER.DAVIS@sakilacustomer.org>
MARIA MILLER <MARIA.MILLER@sakilacustomer.org>
SUSAN WILSON <SUSAN.WILSON@sakilacustomer.org>
MARGARET MOORE <MARGARET.MOORE@sakilacustomer.org>
DOROTHY TAYLOR <DOROTHY.TAYLOR@sakilacustomer.org>


Now use the `CONCAT()` function to do the same operation as the previous step.

In [5]:
%%sql

SELECT CONCAT(first_name,' ', last_name, ' <', email, '>') AS full_email 
FROM   customer
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


full_email
MARY SMITH <MARY.SMITH@sakilacustomer.org>
PATRICIA JOHNSON <PATRICIA.JOHNSON@sakilacustomer.org>
LINDA WILLIAMS <LINDA.WILLIAMS@sakilacustomer.org>
BARBARA JONES <BARBARA.JONES@sakilacustomer.org>
ELIZABETH BROWN <ELIZABETH.BROWN@sakilacustomer.org>
JENNIFER DAVIS <JENNIFER.DAVIS@sakilacustomer.org>
MARIA MILLER <MARIA.MILLER@sakilacustomer.org>
SUSAN WILSON <SUSAN.WILSON@sakilacustomer.org>
MARGARET MOORE <MARGARET.MOORE@sakilacustomer.org>
DOROTHY TAYLOR <DOROTHY.TAYLOR@sakilacustomer.org>
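
The two approaches return the same result here, but in PostgreSQL they differ when a value is `NULL`: the `||` operator yields `NULL` if any operand is `NULL`, while `CONCAT()` ignores `NULL` arguments. The sketch below illustrates this with a hypothetical one-row sample (the `sample` CTE and its values are assumptions, not part of the Sakila data); it can be run in a `%%sql` cell.

```sql
-- Hypothetical one-row sample in which last_name is missing (NULL).
WITH sample AS (
    SELECT 'BRIAN'::text AS first_name,
           NULL::text    AS last_name
)
SELECT first_name || ' ' || last_name               AS with_operator, -- NULL: one NULL operand nulls the result
       CONCAT(first_name, ' ', last_name)           AS with_concat,   -- 'BRIAN ': CONCAT() skips NULL arguments
       first_name || ' ' || COALESCE(last_name, '') AS with_coalesce  -- 'BRIAN ': NULL replaced before concatenating
FROM   sample;
```

If a column may contain `NULL`, wrapping it in `COALESCE()` is one way to keep using `||` without losing the whole string.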


## Changing the case of string data
---

Now you are going to use the `film` and `category` tables to create a new field called `film_category` by concatenating the category `name` with the film's `title`. You will also format the result using functions you learned about in the video to transform the case of the fields you are selecting in the query; for example, the `INITCAP()` function which converts a string to title case.

### Instructions

Convert the film category `name` to uppercase.

Convert the first letter of each word in the film's `title` to upper case.

Concatenate the converted category `name` and film `title` separated by a colon and a space.

Convert the `description` column to lowercase.

In [6]:
%%sql

SELECT UPPER(c.name)
       || ': '
       || INITCAP(f.title)  AS film_category,
       LOWER(f.description) AS description
FROM   film AS f
       INNER JOIN film_category AS fc
               ON f.film_id = fc.film_id
       INNER JOIN category AS c
               ON fc.category_id = c.category_id 
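
As a quick check of what each case function does, the following sketch applies them to literal values instead of table columns. The literals are illustrative samples chosen to resemble the Sakila data; no table is needed to run it.

```sql
-- Case functions applied to literals; expected values are shown in the comments.
SELECT UPPER('Documentary')      AS upper_name,  -- DOCUMENTARY
       INITCAP('ACADEMY DINOSAUR') AS init_title, -- Academy Dinosaur
       LOWER('A Epic Drama')     AS lower_desc;  -- a epic drama
```

`INITCAP()` capitalizes the first letter of each word and lowercases the rest, which is why an all-uppercase title comes back in title case.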

## Replacing string data
---

Sometimes you will need to make sure that the data you are extracting does not contain any whitespace. There are many different approaches you can take to cleanse and prepare your data for these situations. A common technique is to replace any whitespace with an underscore.

In this example, we are going to practice finding and replacing whitespace characters in the `title` column of the `film` table using the `REPLACE()` function.

### Instructions

Replace all whitespace with an underscore.

In [7]:
%%sql

SELECT REPLACE(title, ' ', '_') AS title
FROM   film 
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


title
ACADEMY_DINOSAUR
ACE_GOLDFINGER
ADAPTATION_HOLES
AFFAIR_PREJUDICE
AFRICAN_EGG
AGENT_TRUMAN
AIRPLANE_SIERRA
AIRPORT_POLLOCK
ALABAMA_DEVIL
ALADDIN_CALENDAR
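
Note that `REPLACE(title, ' ', '_')` only matches the literal space character and replaces each occurrence one for one. If the data might contain tabs or runs of several spaces, one possible alternative is PostgreSQL's `REGEXP_REPLACE()` with the `\s+` pattern and the `'g'` (global) flag. The sketch below uses two hypothetical titles (not rows from `film`) and assumes the default `standard_conforming_strings = on` setting, so that `'\s+'` reaches the regex engine unchanged.

```sql
-- Hypothetical sample titles with irregular whitespace.
WITH sample(title) AS (
    VALUES ('ACADEMY  DINOSAUR'),   -- two consecutive spaces
           (E'ACE\tGOLDFINGER')     -- a tab instead of a space
)
SELECT REPLACE(title, ' ', '_')               AS spaces_only,    -- ACADEMY__DINOSAUR; the tab row is unchanged
       REGEXP_REPLACE(title, '\s+', '_', 'g') AS any_whitespace  -- ACADEMY_DINOSAUR; ACE_GOLDFINGER
FROM   sample;
```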


## Determining the length of strings
---

Determining the number of characters in a string is something you will do frequently when working with data in a SQL database. For example, you may need to limit the number of characters that are displayed in an application, or you may need to ensure that a column in your dataset contains values that are all the same length. In this example, we are going to determine the length of the `description` column in the `film` table of the DVD Rental database.

### Instructions

Select the `title` and `description` columns from the `film` table.

Find the number of characters in the `description` column with the alias `desc_len`.

In [8]:
%%sql

SELECT title,
       description,
       LENGTH(description) AS desc_len
FROM   film
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


title,description,desc_len
ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies,96
ACE GOLDFINGER,A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China,100
ADAPTATION HOLES,A Astounding Reflection of a Lumberjack And a Car who must Sink a Lumberjack in A Baloon Factory,96
AFFAIR PREJUDICE,A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank,92
AFRICAN EGG,A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico,117
AGENT TRUMAN,A Intrepid Panorama of a Robot And a Boy who must Escape a Sumo Wrestler in Ancient China,89
AIRPLANE SIERRA,A Touching Saga of a Hunter And a Butler who must Discover a Butler in A Jet Boat,81
AIRPORT POLLOCK,A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India,77
ALABAMA DEVIL,A Thoughtful Panorama of a Database Administrator And a Mad Scientist who must Outgun a Mad Scientist in A Jet Boat,115
ALADDIN CALENDAR,A Action-Packed Tale of a Man And a Lumberjack who must Reach a Feminist in Ancient China,89
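
The "all the same length" check mentioned above can be sketched by comparing the shortest and longest value in a column. The `codes` sample and its `code` column are hypothetical and only serve to illustrate the idea.

```sql
-- LENGTH() counts characters, including spaces.
SELECT LENGTH('Ancient China') AS len;   -- 13

-- Hypothetical sample: are all codes the same length?
WITH codes(code) AS (
    VALUES ('AB12'), ('CD34'), ('EF5')
)
SELECT MIN(LENGTH(code))                     AS shortest,     -- 3
       MAX(LENGTH(code))                     AS longest,      -- 4
       MIN(LENGTH(code)) = MAX(LENGTH(code)) AS same_length   -- false
FROM   codes;
```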


## Truncating strings
---

In the previous exercise, you calculated the length of the `description` column and noticed that the number of characters varied but most of the results were over 75 characters. There will be many times when you need to truncate a text column to a certain length to meet specific criteria for an application. In this exercise, we will practice getting the first 50 characters of the `description` column.

### Instructions

Select the first 50 characters of the `description` column with the alias `short_desc`.

In [9]:
%%sql

SELECT LEFT(description, 50) AS short_desc
FROM   film AS f
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


short_desc
A Epic Drama of a Feminist And a Mad Scientist who
A Astounding Epistle of a Database Administrator A
A Astounding Reflection of a Lumberjack And a Car
A Fanciful Documentary of a Frisbee And a Lumberja
A Fast-Paced Documentary of a Pastry Chef And a De
A Intrepid Panorama of a Robot And a Boy who must
A Touching Saga of a Hunter And a Butler who must
A Epic Tale of a Moose And a Girl who must Confron
A Thoughtful Panorama of a Database Administrator
A Action-Packed Tale of a Man And a Lumberjack who
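
For comparison, here is a small sketch of `LEFT()` next to its counterpart `RIGHT()` and the equivalent `SUBSTRING()` form, using a literal rather than the `description` column. In PostgreSQL a negative length means "everything except the last n characters" for `LEFT()`.

```sql
-- Truncation functions applied to a 13-character literal.
SELECT LEFT('Ancient China', 7)                 AS first_7,     -- Ancient
       RIGHT('Ancient China', 5)                AS last_5,      -- China
       LEFT('Ancient China', -6)                AS all_but_6,   -- Ancient
       SUBSTRING('Ancient China' FROM 1 FOR 7)  AS via_substr;  -- Ancient
```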


## Extracting substrings from text data
---

In this exercise, you are going to practice how to extract substrings from text columns. The Sakila database contains the `address` table, which stores the street `address` for all the rental store locations. You need a list of all the street names where the stores are located, but the `address` column also contains the street number. You'll use several functions that you've learned about in the video to manipulate the `address` column and return only the street name.

### Instructions

Extract only the street address without the street number from the `address` column.

Use functions to determine the starting and ending position parameters.

In [10]:
%%sql

SELECT SUBSTRING(address FROM POSITION(' ' IN address)+1 FOR LENGTH(address))
FROM   address
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


substring
MySakila Drive
MySQL Boulevard
Workhaven Lane
Lillydale Drive
Hanoi Way
Loja Avenue
Joliet Street
Inegl Manor
Idfu Parkway
Santiago de Compostela Way


## Combining functions for string manipulation
---

In the next example, we are going to break apart the `email` column from the `customer` table into new derived fields. Parsing a single column into multiple columns can be useful when you need to work with certain subsets of data. Email addresses have embedded information stored in them that can be parsed out to derive additional information about our data. For example, we can use the techniques we learned about in the video to determine how many of our customers use an email from a specific domain.

### Instructions

Extract the characters to the left of the `@` of the `email` column in the `customer` table and alias it as `username`.

Now use `SUBSTRING` to extract the characters after the `@` of the `email` column and alias the new derived field as `domain`.

In [11]:
%%sql

SELECT LEFT(email, POSITION('@' IN email) - 1) AS  username,
       SUBSTRING(email FROM POSITION('@' IN email) + 1 FOR LENGTH(email)) AS domain
FROM   customer
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


username,domain
MARY.SMITH,sakilacustomer.org
PATRICIA.JOHNSON,sakilacustomer.org
LINDA.WILLIAMS,sakilacustomer.org
BARBARA.JONES,sakilacustomer.org
ELIZABETH.BROWN,sakilacustomer.org
JENNIFER.DAVIS,sakilacustomer.org
MARIA.MILLER,sakilacustomer.org
SUSAN.WILSON,sakilacustomer.org
MARGARET.MOORE,sakilacustomer.org
DOROTHY.TAYLOR,sakilacustomer.org
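
The domain count mentioned in the introduction could be sketched as follows. The `sample_customer` CTE is a hypothetical stand-in for the `customer` table (the `example.com` address is invented for illustration); replacing it with `customer` should give the real counts.

```sql
-- Hypothetical sample emails; swap sample_customer for customer to use real data.
WITH sample_customer(email) AS (
    VALUES ('MARY.SMITH@sakilacustomer.org'),
           ('PATRICIA.JOHNSON@sakilacustomer.org'),
           ('someone@example.com')
)
SELECT SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain,
       COUNT(*)                                         AS customers
FROM   sample_customer
GROUP  BY domain               -- PostgreSQL allows grouping by the output alias
ORDER  BY customers DESC;      -- sakilacustomer.org: 2, example.com: 1
```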


## Padding
---

Padding strings is useful in many real-world situations. Earlier in this course, we learned about string concatenation and how to combine the customer's first and last name separated by a single blank space and also combined the customer's full name with their email address.

The padding functions that we learned about in the video are an alternative approach to do this task. To use this approach, you will need to combine and nest functions to determine the length of a string to produce the desired result. Remember when calculating the length of a string you often need to adjust the integer returned to get the proper length or position of a string.

Let's revisit the string concatenation exercise but use padding functions.

### Instructions

Add a single space to the end or right of the `first_name` column using a padding function.

Use the `||` operator to concatenate the padded `first_name` to the `last_name` column.

In [12]:
%%sql

SELECT RPAD(first_name, LENGTH(first_name)+1) || last_name AS full_name
FROM   customer
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


full_name
MARY SMITH
PATRICIA JOHNSON
LINDA WILLIAMS
BARBARA JONES
ELIZABETH BROWN
JENNIFER DAVIS
MARIA MILLER
SUSAN WILSON
MARGARET MOORE
DOROTHY TAYLOR


Now add a single space to the left or beginning of the `last_name` column using a different padding function than the first step.

Use the `||` operator to concatenate the `first_name` column to the padded `last_name`.

In [13]:
%%sql

SELECT first_name || LPAD(last_name, LENGTH(last_name)+1) AS full_name
FROM   customer
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


full_name
MARY SMITH
PATRICIA JOHNSON
LINDA WILLIAMS
BARBARA JONES
ELIZABETH BROWN
JENNIFER DAVIS
MARIA MILLER
SUSAN WILSON
MARGARET MOORE
DOROTHY TAYLOR


Add a single space to the right or end of the `first_name` column.

Add the characters ` <` (a space followed by `<`) to the right or end of the `last_name` column.

Finally, add the character `>` to the right or end of the `email` column.

In [14]:
%%sql

SELECT RPAD(first_name, LENGTH(first_name) + 1)
       || RPAD(last_name, LENGTH(last_name) + 2, ' <')
       || RPAD(email, LENGTH(email) + 1, '>') AS full_email
FROM   customer
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


full_email
MARY SMITH <MARY.SMITH@sakilacustomer.org>
PATRICIA JOHNSON <PATRICIA.JOHNSON@sakilacustomer.org>
LINDA WILLIAMS <LINDA.WILLIAMS@sakilacustomer.org>
BARBARA JONES <BARBARA.JONES@sakilacustomer.org>
ELIZABETH BROWN <ELIZABETH.BROWN@sakilacustomer.org>
JENNIFER DAVIS <JENNIFER.DAVIS@sakilacustomer.org>
MARIA MILLER <MARIA.MILLER@sakilacustomer.org>
SUSAN WILSON <SUSAN.WILSON@sakilacustomer.org>
MARGARET MOORE <MARGARET.MOORE@sakilacustomer.org>
DOROTHY TAYLOR <DOROTHY.TAYLOR@sakilacustomer.org>
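
The second argument of `RPAD()` and `LPAD()` is the total length of the result, not the number of characters to add, which is why the queries above use `LENGTH(...) + 1` or `+ 2`. The sketch below shows this on literals; the square brackets are only there to make the padding visible.

```sql
-- Padding functions on literals; the target length is the total output length.
SELECT '[' || RPAD('MARY', 5)  || ']' AS rpad_space,   -- [MARY ]
       '[' || LPAD('SMITH', 6) || ']' AS lpad_space,   -- [ SMITH]
       RPAD('SMITH', 7, ' <')         AS rpad_fill,    -- SMITH <
       LPAD('7', 3, '0')              AS zero_padded,  -- 007
       RPAD('MARY', 2)                AS truncated;    -- MA (a shorter target length truncates)
```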


## The TRIM function
---

In this exercise, we are going to revisit and combine a couple of exercises from earlier in this chapter. If you recall, you used the `LEFT()` function to truncate the `description` column to 50 characters but saw that some words were cut off and/or had trailing whitespace. We can use trimming functions to eliminate the whitespace at the end of the string after it's been truncated.

### Instructions

Convert the film category `name` to uppercase and use `CONCAT()` to concatenate it with the `title`.

Truncate the description to the first 50 characters and make sure there is no leading or trailing whitespace after truncating.

In [15]:
%%sql

SELECT CONCAT(UPPER(c.name), ': ', f.title) AS film_category,
       TRIM(LEFT(description, 50))          AS film_desc
FROM   film AS f
       INNER JOIN film_category AS fc
               ON f.film_id = fc.film_id
       INNER JOIN category AS c
               ON fc.category_id = c.category_id 
LIMIT  20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


film_category,film_desc
DOCUMENTARY: ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist who
HORROR: ACE GOLDFINGER,A Astounding Epistle of a Database Administrator A
DOCUMENTARY: ADAPTATION HOLES,A Astounding Reflection of a Lumberjack And a Car
HORROR: AFFAIR PREJUDICE,A Fanciful Documentary of a Frisbee And a Lumberja
FAMILY: AFRICAN EGG,A Fast-Paced Documentary of a Pastry Chef And a De
FOREIGN: AGENT TRUMAN,A Intrepid Panorama of a Robot And a Boy who must
COMEDY: AIRPLANE SIERRA,A Touching Saga of a Hunter And a Butler who must
HORROR: AIRPORT POLLOCK,A Epic Tale of a Moose And a Girl who must Confron
HORROR: ALABAMA DEVIL,A Thoughtful Panorama of a Database Administrator
SPORTS: ALADDIN CALENDAR,A Action-Packed Tale of a Man And a Lumberjack who
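
Trailing whitespace is hard to spot in printed output, so one way to confirm what `TRIM()` removed is to compare lengths before and after. The sketch below uses the ADAPTATION HOLES description as a literal, because its 50th character happens to be a space.

```sql
-- The 50-character cut ends in a space; TRIM() removes it.
WITH sample AS (
    SELECT 'A Astounding Reflection of a Lumberjack And a Car who must Sink a Lumberjack in A Baloon Factory'::text AS description
)
SELECT LENGTH(LEFT(description, 50))       AS len_truncated,  -- 50
       LENGTH(TRIM(LEFT(description, 50))) AS len_trimmed     -- 49
FROM   sample;
```

With no other arguments, `TRIM()` strips spaces from both ends; `LTRIM()` and `RTRIM()` work on one side only.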


## Putting it all together
---

In this exercise, we are going to use the `film` and `category` tables to create a new field called `film_category` by concatenating the category `name` with the film's `title`. You will also practice how to truncate text fields like the `film` table's `description` column without cutting off a word.

To accomplish this we will use the `REVERSE()` function to help determine the position of the last whitespace character in the `description` before we reach 50 characters. This technique can be used to determine the position of the last character that you want to truncate and ensure that it is less than or equal to 50 characters AND does not cut off a word.

This is an advanced technique but I know you can do it! Let's dive in.

### Instructions

Get the first 50 characters of the `description` column.

Determine the position of the last whitespace character in the truncated `description` column, subtract it from 50, and use the result as the second parameter of the function from the first step.

In [16]:
%%sql

SELECT UPPER(c.name)
       || ': '
       || f.title AS film_category,
       LEFT(description, 50 - POSITION(' ' IN REVERSE(LEFT(description, 50))))
FROM   film AS f
       INNER JOIN film_category AS fc
               ON f.film_id = fc.film_id
       INNER JOIN category AS c
               ON fc.category_id = c.category_id 
LIMIT 20

 * postgresql://postgres:***@localhost/sakila
20 rows affected.


film_category,left
DOCUMENTARY: ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist
HORROR: ACE GOLDFINGER,A Astounding Epistle of a Database Administrator
DOCUMENTARY: ADAPTATION HOLES,A Astounding Reflection of a Lumberjack And a Car
HORROR: AFFAIR PREJUDICE,A Fanciful Documentary of a Frisbee And a
FAMILY: AFRICAN EGG,A Fast-Paced Documentary of a Pastry Chef And a
FOREIGN: AGENT TRUMAN,A Intrepid Panorama of a Robot And a Boy who must
COMEDY: AIRPLANE SIERRA,A Touching Saga of a Hunter And a Butler who must
HORROR: AIRPORT POLLOCK,A Epic Tale of a Moose And a Girl who must
HORROR: ALABAMA DEVIL,A Thoughtful Panorama of a Database Administrator
SPORTS: ALADDIN CALENDAR,A Action-Packed Tale of a Man And a Lumberjack
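
To make the arithmetic easier to follow, the sketch below exposes each intermediate step for a single description (the ACADEMY DINOSAUR text, supplied as a literal). The column aliases are illustrative names, not part of the exercise.

```sql
-- Intermediate steps of the word-boundary truncation for one description.
WITH sample AS (
    SELECT 'A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies'::text AS description
)
SELECT LEFT(description, 50)                                 AS first_50,       -- ...a Mad Scientist who
       REVERSE(LEFT(description, 50))                        AS reversed,       -- ohw tsitneicS daM a ...
       POSITION(' ' IN REVERSE(LEFT(description, 50)))       AS pos_from_end,   -- 4
       LEFT(description,
            50 - POSITION(' ' IN REVERSE(LEFT(description, 50)))) AS short_desc -- A Epic Drama of a Feminist And a Mad Scientist
FROM   sample;
```

Because the last space sits 4 characters from the end of the reversed string, the cut moves back to character 46. Note that this drops the final word even when it is complete, as `who` is here, so the technique errs on the side of a shorter result.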
