## Data Cleaning

### Part1 - LEFT & RIGHT

- **LEFT** pulls a specified number of characters for each row in a specified column starting at the beginning (or from the left).


- **RIGHT** pulls a specified number of characters for each row in a specified column starting at the end (or from the right).


- **LENGTH** provides the number of characters for each row of a specified column.

**sample code**
```SQL
SELECT LEFT(phone_number, 3) AS area_code,
       RIGHT(phone_number,8) AS phone_number_only,
       RIGHT(phone_number, LENGTH(phone_number)-4) AS phone_number_too
FROM demo_data
```
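
To make the results concrete, here is a minimal sketch that runs the same functions on a literal string instead of a table column. The phone number is made up, and the expected values in the comments assume the `AAA-NNN-NNNN` layout (12 characters).

```SQL
-- Hypothetical phone number, 12 characters long
SELECT LEFT('415-555-0199', 3)  AS area_code,          -- '415'
       RIGHT('415-555-0199', 8) AS phone_number_only,  -- '555-0199'
       LENGTH('415-555-0199')   AS total_length,       -- 12
       -- drop the first 4 characters (area code plus dash) whatever the length is
       RIGHT('415-555-0199', LENGTH('415-555-0199') - 4) AS phone_number_too;  -- '555-0199'
```

Using `LENGTH` inside `RIGHT` is the more robust of the two approaches, since it does not depend on every row having the same number of characters.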



```SQL
WITH table1 AS (
SELECT name,
       CASE WHEN LEFT(name, 1) IN ('0','1','2','3','4','5','6','7','8','9') THEN 'number'
            ELSE 'letter' END AS account_groups
FROM accounts
 )

SELECT COUNT(*), account_groups
FROM table1
GROUP BY 2

```


```SQL
SELECT SUM(vowels) vowels, SUM(other) other
FROM (SELECT name,
             CASE WHEN LEFT(UPPER(name), 1) IN ('A','E','I','O','U')
                  THEN 1 ELSE 0 END AS vowels,
             CASE WHEN LEFT(UPPER(name), 1) IN ('A','E','I','O','U')
                  THEN 0 ELSE 1 END AS other
      FROM accounts) t1;

```

### Part2 - POSITION & STRPOS

- **POSITION** takes a character and a column, and returns the index of the first occurrence of that character in each row. It returns 0 when the character is not found.


- **STRPOS** provides the same result as POSITION, but the syntax is a bit different: `STRPOS(column_name, 'char')` instead of `POSITION('char' IN column_name)`.


- **LOWER** and **UPPER** make all of the characters lowercase or uppercase. Both POSITION and STRPOS are case sensitive, so apply LOWER or UPPER first to pull an index regardless of the case of a letter.


  - **REMINDER!** The index of the first position is 1 in SQL.

  
**sample code**

```SQL
SELECT POSITION('char' IN column_name) as char_position, 
       STRPOS(column_name, 'char') substr_char_position,
       LOWER(column_name) AS lowercase,
       UPPER(column_name) AS uppercase,
       LEFT(column_name, POSITION('char' IN column_name)) left_only,
       RIGHT(column_name,LENGTH(column_name)-STRPOS(column_name, 'char')) right_only
FROM demo_data

```
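
As a rough illustration with literal values (the name is made up), the sketch below shows that both functions return the same 1-based index, and that a case mismatch returns 0 unless the string is lowercased first.

```SQL
SELECT POSITION(',' IN 'Smith, Jane')   AS comma_position,      -- 6
       STRPOS('Smith, Jane', ',')       AS comma_position_alt,  -- 6
       POSITION('s' IN 'Smith')         AS case_sensitive,      -- 0: there is no lowercase 's'
       POSITION('s' IN LOWER('Smith'))  AS case_insensitive,    -- 1
       -- subtract 1 to stop just before the comma
       LEFT('Smith, Jane', POSITION(',' IN 'Smith, Jane') - 1) AS before_comma;  -- 'Smith'
```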

```SQL
SELECT primary_poc,
       POSITION(' ' IN primary_poc) as sp_position,
       STRPOS(primary_poc,' ') as sp_position_alt,
       LEFT(primary_poc, POSITION(' ' IN primary_poc) - 1) first_name, -- -1 drops the space itself
       RIGHT(primary_poc,LENGTH(primary_poc)-STRPOS(primary_poc, ' ')) last_name
FROM accounts
LIMIT 10;
                                                    
SELECT name,
       POSITION(' ' IN name) as sp_position,
       STRPOS(name,' ') as sp_position_alt,
       LEFT(name, POSITION(' ' IN name) - 1) first_name, -- -1 drops the space itself
       RIGHT(name,LENGTH(name)-STRPOS(name, ' ')) last_name
       
FROM sales_reps
```


### Part3 - CONCAT

- **CONCAT** and **Piping ||** both combine the values of several columns into a single string for each row.

```SQL 
SELECT first_name, last_name, 
       CONCAT(first_name, ' ', last_name) as full_name,
       first_name || ' ' || last_name as full_name_alt
FROM sample_table

```
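
The two forms are interchangeable for non-NULL values, but in PostgreSQL they differ when a value is NULL. The sketch below uses made-up literals to show this; other databases may handle NULL in `CONCAT` differently.

```SQL
SELECT CONCAT('Ada', ' ', 'Lovelace')           AS full_name,         -- 'Ada Lovelace'
       'Ada' || ' ' || 'Lovelace'               AS full_name_alt,     -- 'Ada Lovelace'
       CONCAT('Ada', ' ', CAST(NULL AS TEXT))   AS concat_with_null,  -- 'Ada ' (NULL is skipped)
       'Ada' || ' ' || CAST(NULL AS TEXT)       AS pipe_with_null;    -- NULL (NULL propagates)
```

If a column such as `last_name` can be NULL, `CONCAT` keeps the rest of the string, while `||` turns the whole result into NULL.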


```sql

WITH table1 AS (
            SELECT primary_poc,
                   -- subtract 1 so the space itself is not included in first_name
                   LEFT(primary_poc, POSITION(' ' IN primary_poc) - 1) as first_name,
                   RIGHT(primary_poc, LENGTH(primary_poc) - STRPOS(primary_poc, ' ')) as last_name,
                   name as company_name
            FROM accounts 
            )
SELECT LOWER(CONCAT(first_name, '.', last_name, '@', REPLACE(company_name, ' ', ''), '.com')) as email,
       LEFT(LOWER(first_name), 1) || RIGHT(LOWER(first_name), 1)
       || LEFT(LOWER(last_name), 1) || RIGHT(LOWER(last_name), 1)
       || LENGTH(first_name) || LENGTH(last_name)
       || REPLACE(UPPER(company_name), ' ', '') as password
FROM table1

```


```sql
WITH table1 AS (
            SELECT primary_poc,
                   LEFT(primary_poc, POSITION(' ' IN primary_poc) - 1) as first_name , -- -1 drops the space itself
                   RIGHT(primary_poc,LENGTH(primary_poc)-STRPOS(primary_poc, ' ')) as last_name,
                   name as company_name
            FROM accounts 
            )
SELECT LOWER(first_name || '.' || last_name || '@' || company_name || '.com')  as email
FROM table1

```


### Part4 - CAST

- **DATE_PART('month', TO_DATE(month, 'month'))** changes a month name into the number associated with that month.


- **CAST** changes a column from one data type to another. A common use is changing a string to a date using `CAST(date_column AS DATE)`, but you can make [other changes](http://www.postgresqltutorial.com/postgresql-cast/) to your columns' data types as well.

  - `CAST(date_column AS DATE)` OR `date_column::DATE`
  
- **Expert Tip**

  - `CAST` is most useful for turning strings into numbers or dates. Typically, if you want to turn a number into a string, performing any type of string operation like `LEFT`, `RIGHT`, or `SUBSTRING` will automatically cast the data to a string. Some databases, PostgreSQL among them, may instead require an explicit cast such as `CAST(number_column AS TEXT)`.

  - `LEFT`, `RIGHT`, and `TRIM` are all used to select only certain elements of strings, but using them to select elements of a number or date will treat the value as a string for the purpose of the function.
  - `TRIM` can be used to remove characters from the beginning and end of a string. This can remove unwanted spaces at the beginning or end of a row, which often appear when data is moved from Excel or another storage system.
  
  - [Postgres literature](https://www.postgresql.org/docs/9.1/functions-string.html)
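
A minimal sketch of the points above, using made-up literal values rather than a real table. It assumes PostgreSQL, where the number is cast to text explicitly before `LEFT` is applied.

```SQL
SELECT TRIM('   Walmart  ')              AS trimmed,         -- 'Walmart'
       CAST('42' AS INTEGER) + 1         AS string_to_int,   -- 43
       '2014-01-31'::DATE                AS string_to_date,  -- 2014-01-31
       -- explicit cast so the string function receives text
       LEFT(CAST(20140131 AS TEXT), 4)   AS year_part,       -- '2014'
       DATE_PART('month', TO_DATE('January', 'month')) AS month_number;  -- 1
```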
  
**Sample Code** 

```sql
SELECT *,
       DATE_PART('month', TO_DATE(month, 'month')) AS clean_month,
       year || '-' || DATE_PART('month', TO_DATE(month, 'month')) || '-' || day AS concatenated_date,
       CAST(year || '-' || DATE_PART('month', TO_DATE(month, 'month')) || '-' || day AS DATE) AS formatted_date,
       (year || '-' || DATE_PART('month', TO_DATE(month, 'month')) || '-' || day)::DATE AS formatted_date_Alt
FROM demo_table
```

**Sample Code** 

```SQL
WITH table1 AS (
    SELECT date,
           SUBSTRING(date, 1, 2) AS month,
           SUBSTRING(date, 4, 2) AS day,
           SUBSTRING(date, 7, 4) AS year
    FROM sf_crime_data
    LIMIT 10
)

SELECT (year || '-' || month ||'-' ||day)::DATE as formatted_date_Alt 
FROM table1

```


### Part5 - COALESCE

- COALESCE returns the first non-NULL value passed for each row. 

`COALESCE(primary_poc, 'no POC') AS primary_poc_modified`
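
A small sketch with literal values (the contact name is hypothetical) shows how the arguments are checked from left to right and the first one that is not NULL is returned.

```SQL
SELECT COALESCE(NULL, 'no POC')        AS filled_text,    -- 'no POC'
       COALESCE('Jane Doe', 'no POC')  AS kept_text,      -- 'Jane Doe' (already non-NULL)
       COALESCE(NULL, NULL, 0)         AS filled_number;  -- 0
```

This is why the fallback value goes last: it is only used when every earlier argument is NULL.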

**sample code**
```SQL

SELECT COALESCE(a.id, a.id) filled_id,
       a.name, a.website, a.lat, a.long, a.primary_poc, a.sales_rep_id,
       COALESCE(o.account_id, a.id) account_id, o.occurred_at,
       COALESCE(o.standard_qty, 0) standard_qty,
       COALESCE(o.gloss_qty,0) gloss_qty,
       COALESCE(o.poster_qty,0) poster_qty,
       COALESCE(o.total,0) total, 
       COALESCE(o.standard_amt_usd,0) standard_amt_usd,
       COALESCE(o.gloss_amt_usd,0) gloss_amt_usd,
       COALESCE(o.poster_amt_usd,0) poster_amt_usd, 
       COALESCE(o.total_amt_usd,0) total_amt_usd
FROM accounts a
LEFT JOIN orders o
ON a.id = o.account_id;

```