# Phase 1 Week 2 Day 4 AM - Data Wrangling in SQL

## Definition

What is data wrangling?

1. Data wrangling, or data munging, is the process of **transforming** and **mapping** data from one "raw" form into another format, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

2. Put simply, data wrangling is a **method of data cleaning and data preparation**: it converts data from one form into a more understandable form, mainly for preliminary data analytics.

3. The transformation process includes steps such as:
  * Data Exploration
  * Data Preparation
  * Data Cleaning
  * Data Validation 
  * Data Enrichment
  * etc.

4. This might mean modifying all of the values in a given column in a certain way, or merging multiple columns together. 

5. The necessity for data wrangling is often a byproduct of poorly collected or presented data. Data that is entered manually by humans is typically fraught with errors; data collected from websites is often optimized to be displayed on websites, not to be sorted and aggregated.

6. You can think of data wrangling as similar to preprocessing in machine learning, except that here we use SQL to clean the data rather than Python.

7. You can use the DDL and DML syntax that you already learned in the previous phase.
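
As a minimal sketch of point 4, the query below modifies every value in one column and merges two columns into one. It assumes MariaDB and the `crunchbase_companies_clean_data` table created in the Dataset section; the aliases `status_upper` and `city_state` are made up for illustration.

```sql
-- Modify all values in a column: upper-case every status.
-- Merge columns: join city and state_code into a single string.
-- CONCAT_WS skips NULL arguments, so a row with a city but no
-- state_code still returns the city instead of NULL.
SELECT
    name,
    UPPER(status) AS status_upper,
    CONCAT_WS(', ', city, state_code) AS city_state
FROM crunchbase_companies_clean_data
LIMIT 5;
```

Because this is a `SELECT`, the stored data is not changed; only the query result is transformed.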

---
## Dataset

For this tutorial, you will create several tables.

**IMPORTANT NOTES:**
1. The DDL and DML explained in this notebook **run smoothly in MariaDB**, so it is best to have MariaDB installed on your computer.

2. **If you don't have MariaDB on your computer, you can still follow the instructions with an online MariaDB instance. Go to [sqliteonline.com](https://sqliteonline.com/)**. In the left menu, choose MariaDB and click `Click to connect`.

3. If you run the code on a database other than MariaDB, it may fail for several reasons, mainly because the same feature is written with different syntax. For example, to generate integer values automatically:
  * In MariaDB, you must write `AUTO_INCREMENT`.
  * In SQLite, you must write `AUTOINCREMENT`.
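
The difference can be sketched with a throwaway table. `demo_items` and its `label` column are hypothetical names used only for this illustration; the MariaDB statement uses the same keyword as the DDL in this notebook, and the SQLite variant is left commented out.

```sql
-- MariaDB: the keyword contains an underscore.
CREATE TABLE demo_items (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    label VARCHAR(20)
);

-- SQLite equivalent (do not run this in MariaDB):
-- the keyword has no underscore and the column must be INTEGER PRIMARY KEY.
-- CREATE TABLE demo_items (
--     id INTEGER PRIMARY KEY AUTOINCREMENT,
--     label VARCHAR(20)
-- );

-- The id is filled in automatically when it is left out of the INSERT.
INSERT INTO demo_items (label) VALUES ('first'), ('second');
SELECT id, label FROM demo_items;

-- Clean up the demo table.
DROP TABLE demo_items;
```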

### Table `crunchbase_companies_clean_data`

Data Definition Language (DDL)
```
CREATE TABLE crunchbase_companies_clean_data (
    permalink VARCHAR(50),
    name VARCHAR(50),
    homepage_url VARCHAR(50),
    category_code VARCHAR(50),
    funding_total_usd BIGINT,
    status VARCHAR(20),
    country_code VARCHAR(5),
    state_code VARCHAR(5),
    region VARCHAR(50),
    city VARCHAR(50),
    funding_rounds INT,
    founded_at VARCHAR(20),
    founded_at_clean VARCHAR(20),
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
    );
```

Data Manipulation Language (DML)
```
INSERT INTO crunchbase_companies_clean_data (permalink,name,homepage_url,category_code,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at,founded_at_clean) VALUES
     ('/company/8868','8868','http://www.8868.cn',NULL,NULL,'operating',NULL,NULL,'unknown',NULL,1,NULL,NULL),
     ('/company/21e6','2.10E+07',NULL,NULL,5050000,'operating','USA','CA','SF Bay','San Francisco',1,'1/1/13','2013-01-01'),
     ('/company/club-domains','.Club Domains','http://dotclub.com','software',7000000,'operating','USA','FL','Fort Lauderdale','Oakland Park',1,'10/10/11','2011-10-10'),
     ('/company/fox-networks','.Fox Networks','http://www.dotfox.com','advertising',4912394,'closed','ARG',NULL,'Buenos Aires','Buenos Aires',1,NULL,NULL),
     ('/company/a-list-games','[a]list games','http://www.alistgames.com','games_video',9300000,'operating',NULL,NULL,'unknown',NULL,1,NULL,NULL),
     ('/company/pay-mobile-checkout','@Pay','http://atpay.com','mobile',3500000,'operating','USA','NM','Albuquerque','Albuquerque',1,'5/1/11','2011-05-01'),
     ('/company/tv-communications','&TV Communications','http://enjoyandtv.com','games_video',4000000,'operating','USA','CA','Los Angeles','Los Angeles',2,NULL,NULL),
     ('/company/waywire','#waywire','http://www.waywire.com','news',1750000,'acquired','USA','NY','New York','New York',1,'6/1/12','2012-06-01'),
     ('/company/0-6-com','0-6.com','http://www.0-6.com','web',2000000,'operating',NULL,NULL,'unknown',NULL,1,'1/1/07','2007-01-01'),
     ('/company/0xdata','0xdata','http://www.0xdata.com','analytics',1700000,'operating','USA','CA','SF Bay','Mountain View',1,'1/1/11','2011-01-01');
INSERT INTO crunchbase_companies_clean_data (permalink,name,homepage_url,category_code,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at,founded_at_clean) VALUES
     ('/company/1-800-doctors','1-800-DOCTORS','http://1800doctors.com','health',1750000,'operating','USA','NJ','Iselin','Iselin',1,'1/1/84','1984-01-01'),
     ('/company/10-20-media','10-20 Media','http://www.10-20media.com','ecommerce',1550000,'operating','USA','MD','Washington DC','Woodbine',3,'1/1/01','2001-01-01'),
     ('/company/1000jobboersen-de','1000jobboersen.de','http://www.1000jobboersen.de','web',NULL,'operating','DEU',NULL,'Germany - Other',NULL,1,NULL,NULL),
     ('/company/1000memories','1000memories','http://1000memories.com','web',2535000,'acquired','USA','CA','SF Bay','San Francisco',2,'7/1/10','2010-07-01'),
     ('/company/1000museums-com','1000museums.com','http://www.1000museums.com','web',4196711,'operating','USA','WA','Seattle','Bellevue',3,'1/1/08','2008-01-01'),
     ('/company/1001-menus','1001 Menus','http://1001menus.com','web',1736910,'operating','FRA',NULL,'Paris','Paris',1,'11/20/10','2010-11-20'),
     ('/company/100du-tv','100du.tv','http://www.100du.com','hospitality',3000000,'operating',NULL,NULL,'unknown',NULL,2,NULL,NULL),
     ('/company/100e-com','100e.com','http://www.100e.com','education',3000000,'operating','CHN',NULL,'Beijing','Beijing',1,NULL,NULL),
     ('/company/100plus','100Plus','http://www.100plus.com','analytics',1250000,'acquired','USA','CA','SF Bay','San Francisco',2,'9/16/11','2011-09-16'),
     ('/company/1010data','1010data','http://www.1010data.com','software',35000000,'operating','USA','NY','New York','New York',1,'1/1/00','2000-01-01');
INSERT INTO crunchbase_companies_clean_data (permalink,name,homepage_url,category_code,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at,founded_at_clean) VALUES
     ('/company/10bestthings','10BestThings','http://10bestthings.com','web',50000,'closed','USA','OH','Cleveland','Cleveland',1,'4/1/09','2009-04-01'),
     ('/company/10x-technologies','10X Technologies','http://10xtechnologies.com','biotech',3000000,'operating','USA','CA','SF Bay','Oakland',1,'1/1/12','2012-01-01'),
     ('/company/10x10-room','10X10 Room','http://10x10room.com','software',77500,'operating','USA','MA','Boston','Lexington',1,'1/1/10','2010-01-01'),
     ('/company/121cast','121cast','http://www.121cast.com','mobile',270000,'operating','AUS',NULL,'Melbourne','Melbourne',2,'2/1/12','2012-02-01'),
     ('/company/1234enter','1234ENTER','http://www.1234enter.com.br','ecommerce',650267,'operating','BRA',NULL,'Brazil - Other',NULL,2,'1/1/12','2012-01-01'),
     ('/company/123contactform','123ContactForm','http://www.123contactform.com','web',NULL,'operating','ROM',NULL,'Timisoara','Timisoara',1,'1/1/08','2008-01-01'),
     ('/company/12society','12Society','http://www.12Society.com','ecommerce',NULL,'acquired','USA','CA','Los Angeles','West Hollywood',1,'1/1/12','2012-01-01'),
     ('/company/1366-technologies','1366 Technologies','http://www.1366tech.com','manufacturing',66450000,'operating','USA','MA','Boston','Bedford',8,'1/1/07','2007-01-01'),
     ('/company/139shop','139shop','http://www.139shop.com',NULL,NULL,'operating',NULL,NULL,'unknown',NULL,1,NULL,NULL),
     ('/company/13th-lab','13th Lab','http://13thlab.com','mobile',700000,'operating','SWE',NULL,'Stockholm','Stockholm',1,'1/1/10','2010-01-01');
INSERT INTO crunchbase_companies_clean_data (permalink,name,homepage_url,category_code,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at,founded_at_clean) VALUES
     ('/company/140-proof','140 Proof','http://140proof.com','advertising',5500000,'operating','USA','CA','SF Bay','San Francisco',2,'1/11/10','2010-01-11'),
     ('/company/140fire','140Fire','http://140fire.com','advertising',500000,'operating','USA','CA','Los Angeles','Santa Monica',1,'1/1/10','2010-01-01'),
     ('/company/15five','15Five','http://15five.com','software',1200000,'operating','USA','CA','SF Bay','San Francisco',2,'5/1/11','2011-05-01'),
     ('/company/15minutesnow','15MinutesNOW','http://15minutesnow.com','games_video',200000,'operating',NULL,NULL,'unknown',NULL,1,'4/19/11','2011-04-19'),
     ('/company/169-st','169 ST.','http://www.junebugreview.com','games_video',50000,'closed','USA','FL','Orlando','Lake Mary',1,'5/15/09','2009-05-15'),
     ('/company/170-systems','170 Systems','http://www.170systems.com','software',14000000,'acquired','USA','MA','Boston','Bedford',1,'1/1/90','1990-01-01'),
     ('/company/1bib','1bib','http://www.1bib.com','web',NULL,'closed','CHN',NULL,'China - Other',NULL,1,'1/1/06','2006-01-01'),
     ('/company/1c-company','1C Company','http://1c.ru/eng','software',200000000,'operating','RUS',NULL,'Moscow','Moscow',1,'1/1/91','1991-01-01'),
     ('/company/1calendar','1calendar','http://1calendar.com','education',40000,'operating','DNK',NULL,'DNK','Copenhagen',1,'1/19/09','2009-01-19'),
     ('/company/1cast','1Cast','http://www.1cast.com','news',NULL,'closed','USA','WA','Seattle','Kirkland',1,'6/1/06','2006-06-01');
INSERT INTO crunchbase_companies_clean_data (permalink,name,homepage_url,category_code,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at,founded_at_clean) VALUES
     ('/company/1click','1CLICK','http://1click.io','mobile',NULL,'operating','IND',NULL,'Bangalore','Bangalore',1,'11/1/12','2012-11-01'),
     ('/company/1daylater','1DayLater','http://1daylater.com','web',43812,'operating',NULL,NULL,'unknown',NULL,2,'8/26/09','2009-08-26'),
     ('/company/1daymakeover','1DayMakeover','http://www.1daymakeover.com','ecommerce',50000,'closed','USA','CA','Los Angeles','Santa Ana',1,'6/30/08','2008-06-30'),
     ('/company/1energy-systems','1Energy Systems','http://1energysystems.com','software',1450000,'operating','USA','WA','Seattle','Seattle',1,'1/1/10','2010-01-01'),
     ('/company/1eq','1eq','http://www.1eq.me','health',1300000,'operating',NULL,NULL,'unknown',NULL,2,'1/1/12','2012-01-01'),
     ('/company/1life-healthcare','1Life Healthcare','http://www.1life.com',NULL,30000000,'operating','USA','CA','SF Bay','San Francisco',1,'1/1/02','2002-01-01'),
     ('/company/1o1media','1o1Media',NULL,NULL,NULL,'operating',NULL,NULL,'unknown',NULL,1,NULL,NULL),
     ('/company/1ring','1Ring','http://www.1ring.com','web',NULL,'operating',NULL,NULL,'unknown',NULL,1,'5/1/09','2009-05-01'),
     ('/company/1sdk','1SDK','http://www.1sdk.com','mobile',19299,'operating','DEU',NULL,'Berlin','Berlin',1,'9/1/12','2012-09-01'),
     ('/company/1stdibs','1stdibs','http://www.1stdibs.com','ecommerce',57000000,'operating','USA','NY','New York','New York',4,'1/1/01','2001-01-01');
```


### Table `sf_crime_incidents_2014_01`

Data Definition Language (DDL)
```
CREATE TABLE sf_crime_incidents_2014_01 (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    incidnt_num BIGINT,
    category VARCHAR(30),
    descript VARCHAR(100),
    day_of_week VARCHAR(10),
    date DATETIME,
    time VARCHAR(5),
    pd_district VARCHAR(20),
    resolution VARCHAR(20),
    address VARCHAR(50),
    lon FLOAT,
    lat FLOAT,
    location VARCHAR(50)
    );
```

Data Manipulation Language (DML)
```
INSERT INTO sf_crime_incidents_2014_01 (incidnt_num,category,descript,day_of_week,date,time,pd_district,resolution,address,lon,lat,location) VALUES
     (140099416,'VEHICLE THEFT','STOLEN AND RECOVERED VEHICLE','Friday','2014-01-31 08:00:00','17:00','INGLESIDE',NULL,'0 Block of GARRISON AV',-122.413623946206,37.709725805163,'(37.709725805163, -122.413623946206)'),
     (140092426,'ASSAULT','BATTERY','Friday','2014-01-31 08:00:00','17:45','TARAVAL','ARREST, CITED','100 Block of FONT BL',-122.47370623066001,37.715487608605706,'(37.7154876086057, -122.47370623066)'),
     (140092410,'SUSPICIOUS OCC','SUSPICIOUS OCCURRENCE','Friday','2014-01-31 08:00:00','15:30','PARK',NULL,'0 Block of CASTRO ST',-122.435718550322,37.768688713435104,'(37.7686887134351, -122.435718550322)'),
     (140092341,'OTHER OFFENSES','DRIVERS LICENSE, SUSPENDED OR REVOKED','Friday','2014-01-31 08:00:00','17:50','CENTRAL','ARREST, CITED','JEFFERSON ST / POWELL ST',-122.412527239682,37.808625059546706,'(37.8086250595467, -122.412527239682)'),
     (140092573,'DRUG/NARCOTIC','POSSESSION OF NARCOTICS PARAPHERNALIA','Friday','2014-01-31 08:00:00','19:20','SOUTHERN','ARREST, BOOKED','0 Block of GRACE ST',-122.414633686589,37.7750814399634,'(37.7750814399634, -122.414633686589)'),
     (146027306,'LARCENY/THEFT','GRAND THEFT FROM LOCKED AUTO','Friday','2014-01-31 08:00:00','17:25','SOUTHERN',NULL,'0 Block of MCCOPPIN ST',-122.421324876076,37.7716335058168,'(37.7716335058168, -122.421324876076)'),
     (140092288,'LARCENY/THEFT','GRAND THEFT FROM LOCKED AUTO','Friday','2014-01-31 08:00:00','14:00','RICHMOND',NULL,'400 Block of 6TH AV',-122.46433777955099,37.779837614232704,'(37.7798376142327, -122.464337779551)'),
     (140092727,'ASSAULT','BATTERY','Friday','2014-01-31 08:00:00','20:00','CENTRAL',NULL,'500 Block of SACRAMENTO ST',-122.401338334577,37.794018257336894,'(37.7940182573369, -122.401338334577)'),
     (140092874,'LARCENY/THEFT','PETTY THEFT SHOPLIFTING','Friday','2014-01-31 08:00:00','19:40','SOUTHERN','ARREST, CITED','800 Block of MARKET ST',-122.40665951743401,37.7850491022697,'(37.7850491022697, -122.406659517434)'),
     (140092830,'OTHER OFFENSES','DRIVERS LICENSE, SUSPENDED OR REVOKED','Friday','2014-01-31 08:00:00','21:35','BAYVIEW','ARREST, CITED','BACON ST / SAN BRUNO AV',-122.403595293514,37.7276340992506,'(37.7276340992506, -122.403595293514)');
INSERT INTO sf_crime_incidents_2014_01 (incidnt_num,category,descript,day_of_week,date,time,pd_district,resolution,address,lon,lat,location) VALUES
     (140092818,'ASSAULT','BATTERY','Friday','2014-01-31 08:00:00','21:06','INGLESIDE',NULL,'0 Block of AMETHYST WY',-122.446066944717,37.7461152780528,'(37.7461152780528, -122.446066944717)'),
     (140092777,'OTHER OFFENSES','DRIVERS LICENSE, SUSPENDED OR REVOKED','Friday','2014-01-31 08:00:00','21:10','TARAVAL','ARREST, CITED','JOHNMUIR DR / SKYLINEBLVD HY',-122.50022040363001,37.718954166945,'(37.718954166945, -122.50022040363)'),
     (140092200,'LARCENY/THEFT','GRAND THEFT FROM LOCKED AUTO','Friday','2014-01-31 08:00:00','09:00','RICHMOND',NULL,'500 Block of JOHNFKENNEDY DR',-122.465838183623,37.7724965522266,'(37.7724965522266, -122.465838183623)'),
     (140092125,'NON-CRIMINAL','FOUND PROPERTY','Friday','2014-01-31 08:00:00','14:30','PARK',NULL,'1800 Block of WALLER ST',-122.45487762240299,37.7681470009312,'(37.7681470009312, -122.454877622403)'),
     (140092090,'DRUNKENNESS','UNDER INFLUENCE OF ALCOHOL IN A PUBLIC PLACE','Friday','2014-01-31 08:00:00','16:00','SOUTHERN','ARREST, BOOKED','0 Block of 7TH ST',-122.411129232783,37.779316154458606,'(37.7793161544586, -122.411129232783)'),
     (140092084,'DRUG/NARCOTIC','POSSESSION OF METH-AMPHETAMINE','Friday','2014-01-31 08:00:00','15:13','MISSION','ARREST, BOOKED','16TH ST / MISSION ST',-122.419671780296,37.7650501214668,'(37.7650501214668, -122.419671780296)'),
     (140092078,'OTHER OFFENSES','POSSESSION OF BURGLARY TOOLS','Friday','2014-01-31 08:00:00','15:58','BAYVIEW','ARREST, BOOKED','0 Block of DAKOTA ST',-122.39607658856899,37.754208058980495,'(37.7542080589805, -122.396076588569)'),
     (140092040,'LARCENY/THEFT','GRAND THEFT FROM LOCKED AUTO','Friday','2014-01-31 08:00:00','11:00','RICHMOND',NULL,'300 Block of MARTIN LUTHER KING JR DR',-122.464413831456,37.7664562802319,'(37.7664562802319, -122.464413831456)'),
     (140091848,'NON-CRIMINAL','DEATH REPORT, CAUSE UNKNOWN','Friday','2014-01-31 08:00:00','14:48','NORTHERN',NULL,'100 Block of FELL ST',-122.42067602835199,37.7762540028806,'(37.7762540028806, -122.420676028352)'),
     (140091785,'LARCENY/THEFT','GRAND THEFT FROM UNLOCKED AUTO','Friday','2014-01-31 08:00:00','14:00','CENTRAL',NULL,'BROADWAY ST / CORDELIA ST',-122.40904392899401,37.7975730481122,'(37.7975730481122, -122.409043928994)');
INSERT INTO sf_crime_incidents_2014_01 (incidnt_num,category,descript,day_of_week,date,time,pd_district,resolution,address,lon,lat,location) VALUES
     (146027340,'LARCENY/THEFT','PETTY THEFT OF PROPERTY','Friday','2014-01-31 08:00:00','16:00','SOUTHERN',NULL,'0 Block of MARKET ST',-122.394874899097,37.794537236459796,'(37.7945372364598, -122.394874899097)'),
     (140091666,'WARRANTS','WARRANT ARREST','Friday','2014-01-31 08:00:00','13:45','MISSION','ARREST, BOOKED','2000 Block of MISSION ST',-122.419600224041,37.7638676440328,'(37.7638676440328, -122.419600224041)'),
     (140091666,'ASSAULT','BATTERY','Friday','2014-01-31 08:00:00','13:45','MISSION','ARREST, BOOKED','2000 Block of MISSION ST',-122.419600224041,37.7638676440328,'(37.7638676440328, -122.419600224041)'),
     (140091581,'LARCENY/THEFT','GRAND THEFT FROM UNLOCKED AUTO','Friday','2014-01-31 08:00:00','11:35','PARK',NULL,'HAYES ST / PIERCE ST',-122.43448505565901,37.775416568770396,'(37.7754165687704, -122.434485055659)'),
     (140091503,'ROBBERY','ROBBERY, BODILY FORCE','Friday','2014-01-31 08:00:00','12:20','SOUTHERN',NULL,'800 Block of BRYANT ST',-122.403742962696,37.7752316978114,'(37.7752316978114, -122.403742962696)'),
     (140091456,'DRUG/NARCOTIC','POSSESSION OF NARCOTICS PARAPHERNALIA','Friday','2014-01-31 08:00:00','11:55','PARK','ARREST, CITED','1600 Block of FULTON ST',-122.444406199614,37.7760977755843,'(37.7760977755843, -122.444406199614)'),
     (146027356,'LARCENY/THEFT','GRAND THEFT FROM LOCKED AUTO','Friday','2014-01-31 08:00:00','23:30','SOUTHERN',NULL,'SHIPLEY ST / 5TH ST',-122.402842748428,37.7798294192767,'(37.7798294192767, -122.402842748428)'),
     (146027895,'LARCENY/THEFT','PETTY THEFT OF PROPERTY','Friday','2014-01-31 08:00:00','20:40','SOUTHERN',NULL,'0 Block of NEWMONTGOMERY ST',-122.401135170461,37.7880115303236,'(37.7880115303236, -122.401135170461)'),
     (146027942,'LARCENY/THEFT','GRAND THEFT FROM LOCKED AUTO','Friday','2014-01-31 08:00:00','16:00','SOUTHERN',NULL,'RINGOLD ST / 9TH ST',-122.411075138748,37.773331667730105,'(37.7733316677301, -122.411075138748)'),
     (140091177,'NON-CRIMINAL','DEATH REPORT, CAUSE UNKNOWN','Friday','2014-01-31 08:00:00','00:01','TARAVAL',NULL,'1400 Block of 48TH AV',-122.50801009652099,37.7589334708659,'(37.7589334708659, -122.508010096521)');
INSERT INTO sf_crime_incidents_2014_01 (incidnt_num,category,descript,day_of_week,date,time,pd_district,resolution,address,lon,lat,location) VALUES
     (140092539,'MISSING PERSON','FOUND PERSON','Friday','2014-01-31 08:00:00','09:30','TARAVAL','LOCATED','0 Block of LOBOS ST',-122.45566641348101,37.7148737118982,'(37.7148737118982, -122.455666413481)'),
     (146028241,'NON-CRIMINAL','LOST PROPERTY','Friday','2014-01-31 08:00:00','01:00','CENTRAL',NULL,'GEARY ST / TAYLOR ST',-122.411518820359,37.7869408998805,'(37.7869408998805, -122.411518820359)'),
     (140090969,'ASSAULT','AGGRAVATED ASSAULT WITH A KNIFE','Friday','2014-01-31 08:00:00','06:00','MISSION',NULL,'1000 Block of POTRERO AV',-122.406677245558,37.7561400516041,'(37.7561400516041, -122.406677245558)'),
     (140090947,'ROBBERY','ROBBERY ON THE STREET WITH A GUN','Friday','2014-01-31 08:00:00','07:30','NORTHERN',NULL,'BUSH ST / STEINER ST',-122.435107179729,37.7868088327514,'(37.7868088327514, -122.435107179729)'),
     (140092539,'MISSING PERSON','MISSING JUVENILE','Friday','2014-01-31 08:00:00','09:30','TARAVAL','LOCATED','0 Block of LOBOS ST',-122.45566641348101,37.7148737118982,'(37.7148737118982, -122.455666413481)'),
     (140090919,'OTHER OFFENSES','POSSESSION OF BURGLARY TOOLS','Friday','2014-01-31 08:00:00','07:05','MISSION','ARREST, BOOKED','300 Block of DOLORES ST',-122.4262118867,37.7642984128867,'(37.7642984128867, -122.4262118867)'),
     (140090919,'OTHER OFFENSES','PROBATION VIOLATION','Friday','2014-01-31 08:00:00','07:05','MISSION','ARREST, BOOKED','300 Block of DOLORES ST',-122.4262118867,37.7642984128867,'(37.7642984128867, -122.4262118867)'),
     (140090884,'NON-CRIMINAL','AIDED CASE, MENTAL DISTURBED','Friday','2014-01-31 08:00:00','07:47','SOUTHERN','PSYCHOPATHIC CASE','0 Block of 6TH ST',-122.409574494976,37.7816102568731,'(37.7816102568731, -122.409574494976)'),
     (140092498,'MISSING PERSON','FOUND PERSON','Friday','2014-01-31 08:00:00','08:00','SOUTHERN','LOCATED','0 Block of MOSS ST',-122.40786753706101,37.7776864127303,'(37.7776864127303, -122.407867537061)'),
     (140092498,'MISSING PERSON','MISSING ADULT','Friday','2014-01-31 08:00:00','08:00','SOUTHERN','LOCATED','0 Block of MOSS ST',-122.40786753706101,37.7776864127303,'(37.7776864127303, -122.407867537061)');
INSERT INTO sf_crime_incidents_2014_01 (incidnt_num,category,descript,day_of_week,date,time,pd_district,resolution,address,lon,lat,location) VALUES
     (140091105,'VEHICLE THEFT','STOLEN TRUCK','Friday','2014-01-31 08:00:00','00:01','BAYVIEW',NULL,'1500 Block of PALOU AV',-122.390048154309,37.733450980100606,'(37.7334509801006, -122.390048154309)'),
     (140090812,'OTHER OFFENSES','DRIVERS LICENSE, SUSPENDED OR REVOKED','Friday','2014-01-31 08:00:00','06:25','CENTRAL','ARREST, CITED','SACRAMENTO ST / SANSOME ST',-122.401319002979,37.7939867990287,'(37.7939867990287, -122.401319002979)'),
     (140090771,'WARRANTS','ENROUTE TO DEPARTMENT OF CORRECTIONS','Friday','2014-01-31 08:00:00','04:26','MISSION','ARREST, BOOKED','CHURCH ST / 19TH ST',-122.428203997155,37.759691691831,'(37.759691691831, -122.428203997155)'),
     (140090771,'OTHER OFFENSES','VIOLATION OF PARK CODE','Friday','2014-01-31 08:00:00','04:26','MISSION','ARREST, BOOKED','CHURCH ST / 19TH ST',-122.428203997155,37.759691691831,'(37.759691691831, -122.428203997155)'),
     (140090743,'NON-CRIMINAL','AIDED CASE, MENTAL DISTURBED','Friday','2014-01-31 08:00:00','03:52','MISSION','PSYCHOPATHIC CASE','CASTRO ST / 24TH ST',-122.434089046473,37.751305018572396,'(37.7513050185724, -122.434089046473)'),
     (140090721,'ROBBERY','ATTEMPTED ROBBERY ON THE STREET WITH BODILY FORCE','Friday','2014-01-31 08:00:00','00:45','SOUTHERN',NULL,'1300 Block of FOLSOM ST',-122.41239754290702,37.7729954122426,'(37.7729954122426, -122.412397542907)'),
     (140090680,'ROBBERY','ROBBERY, ARMED WITH A GUN','Friday','2014-01-31 08:00:00','02:30','MISSION',NULL,'18TH ST / MISSION ST',-122.419360352761,37.7618358012376,'(37.7618358012376, -122.419360352761)'),
     (140090646,'ASSAULT','AGGRAVATED ASSAULT WITH BODILY FORCE','Friday','2014-01-31 08:00:00','02:02','RICHMOND',NULL,'GEARY BL / 4TH AV',-122.462141270182,37.781109889933205,'(37.7811098899332, -122.462141270182)'),
     (140090602,'OTHER OFFENSES','DRIVERS LICENSE, SUSPENDED OR REVOKED','Friday','2014-01-31 08:00:00','01:00','NORTHERN',NULL,'BUSH ST / VANNESS AV',-122.421949487547,37.7884881521135,'(37.7884881521135, -122.421949487547)'),
     (140090527,'OTHER OFFENSES','DRIVERS LICENSE, SUSPENDED OR REVOKED','Friday','2014-01-31 08:00:00','00:15','CENTRAL','ARREST, CITED','BROADWAY ST / COLUMBUS AV',-122.406669739951,37.7978641744394,'(37.7978641744394, -122.406669739951)');
```

### Table `dc_bikeshare_q1_2012`

Data Definition Language (DDL)
```
CREATE TABLE dc_bikeshare_q1_2012 (
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    duration VARCHAR(20),
    duration_seconds INT,
    start_time DATETIME,
    start_station VARCHAR(70),
    start_terminal INT,
    end_time DATETIME,
    end_station VARCHAR(70),
    end_terminal INT,
    bike_number VARCHAR(10),
    rider_type VARCHAR(20)
    );
```

Data Manipulation Language (DML)
```
INSERT INTO dc_bikeshare_q1_2012 (duration,duration_seconds,start_time,start_station,start_terminal,end_time,end_station,end_terminal,bike_number,rider_type) VALUES
     ('0h 7m 55sec.',475,'2012-01-01 00:04:00','7th & R St NW / Shaw Library',31245,'2012-01-01 00:11:00','7th & T St NW',31109,'W01412','Registered'),
     ('0h 19m 22sec.',1162,'2012-01-01 00:10:00','Georgia & New Hampshire Ave NW',31400,'2012-01-01 00:29:00','16th & Harvard St NW',31103,'W00524','Casual'),
     ('0h 19m 5sec.',1145,'2012-01-01 00:10:00','Georgia & New Hampshire Ave NW',31400,'2012-01-01 00:29:00','16th & Harvard St NW',31103,'W00235','Registered'),
     ('0h 8m 5sec.',485,'2012-01-01 00:15:00','14th & V St NW',31101,'2012-01-01 00:23:00','Park Rd & Holmead Pl NW',31602,'W00864','Registered'),
     ('0h 7m 51sec.',471,'2012-01-01 00:15:00','11th & Kenyon St NW',31102,'2012-01-01 00:23:00','7th & T St NW',31109,'W00995','Registered'),
     ('0h 5m 58sec.',358,'2012-01-01 00:17:00','Court House Metro / Wilson Blvd & N Uhle St',31017,'2012-01-01 00:23:00','Lynn & 19th St North',31014,'W00466','Registered'),
     ('0h 29m 14sec.',1754,'2012-01-01 00:18:00','37th & O St NW / Georgetown University',31236,'2012-01-01 00:47:00','9th & Upshur St NW',31404,'W00525','Registered'),
     ('0h 4m 19sec.',259,'2012-01-01 00:22:00','14th & V St NW',31101,'2012-01-01 00:27:00','15th & P St NW',31201,'W00340','Registered'),
     ('0h 8m 36sec.',516,'2012-01-01 00:24:00','Lynn & 19th St North',31014,'2012-01-01 00:33:00','25th St & Pennsylvania Ave NW',31237,'W00466','Registered'),
     ('0h 15m 13sec.',913,'2012-01-01 00:25:00','14th & V St NW',31101,'2012-01-01 00:40:00','L''Enfant Plaza / 7th & C St SW',31218,'W00963','Registered');
INSERT INTO dc_bikeshare_q1_2012 (duration,duration_seconds,start_time,start_station,start_terminal,end_time,end_station,end_terminal,bike_number,rider_type) VALUES
     ('0h 18m 17sec.',1097,'2012-01-01 00:29:00','Tenleytown / Wisconsin Ave & Albemarle St NW',31303,'2012-01-01 00:48:00','Massachusetts Ave & Dupont Circle NW',31200,'W01398','Registered'),
     ('0h 8m 10sec.',490,'2012-01-01 00:30:00','New York Ave & 15th St NW',31222,'2012-01-01 00:38:00','21st & I St NW',31205,'W00042','Registered'),
     ('0h 17m 25sec.',1045,'2012-01-01 00:32:00','Metro Center / 12th & G St NW',31230,'2012-01-01 00:50:00','Massachusetts Ave & Dupont Circle NW',31200,'W00570','Registered'),
     ('0h 17m 15sec.',1035,'2012-01-01 00:32:00','Lamont & Mt Pleasant NW',31107,'2012-01-01 00:50:00','14th & Rhode Island Ave NW',31203,'W01463','Registered'),
     ('0h 17m 40sec.',1060,'2012-01-01 00:33:00','Lamont & Mt Pleasant NW',31107,'2012-01-01 00:50:00','14th & Rhode Island Ave NW',31203,'W00535','Registered'),
     ('0h 17m 19sec.',1039,'2012-01-01 00:33:00','Metro Center / 12th & G St NW',31230,'2012-01-01 00:50:00','Massachusetts Ave & Dupont Circle NW',31200,'W00494','Registered'),
     ('0h 7m 23sec.',443,'2012-01-01 00:33:00','25th St & Pennsylvania Ave NW',31237,'2012-01-01 00:41:00','New York Ave & 15th St NW',31222,'W00466','Registered'),
     ('0h 5m 16sec.',316,'2012-01-01 00:33:00','7th & T St NW',31109,'2012-01-01 00:39:00','Convention Center / 7th & M St NW',31223,'W00663','Registered'),
     ('0h 8m 26sec.',506,'2012-01-01 00:34:00','14th & Rhode Island Ave NW',31203,'2012-01-01 00:42:00','14th & V St NW',31101,'W01052','Registered'),
     ('0h 15m 56sec.',956,'2012-01-01 00:36:00','17th & Corcoran St NW',31214,'2012-01-01 00:52:00','17th & Corcoran St NW',31214,'W00174','Registered');
INSERT INTO dc_bikeshare_q1_2012 (duration,duration_seconds,start_time,start_station,start_terminal,end_time,end_station,end_terminal,bike_number,rider_type) VALUES
     ('0h 4m 4sec.',244,'2012-01-01 00:37:00','17th & Corcoran St NW',31214,'2012-01-01 00:41:00','Massachusetts Ave & Dupont Circle NW',31200,'W01298','Registered'),
     ('0h 5m 19sec.',319,'2012-01-01 00:39:00','McPherson Square - 14th & H St NW',31216,'2012-01-01 00:44:00','8th & H St NW',31228,'W01333','Registered'),
     ('0h 2m 37sec.',157,'2012-01-01 00:39:00','Potomac & Pennsylvania Ave SE',31606,'2012-01-01 00:42:00','Potomac & Pennsylvania Ave SE',31606,'W00697','Registered'),
     ('0h 8m 31sec.',511,'2012-01-01 00:41:00','4th & E St SW',31244,'2012-01-01 00:49:00','5th & F St NW',31620,'W00260','Registered'),
     ('0h 3m 19sec.',199,'2012-01-01 00:45:00','18th & M St NW',31221,'2012-01-01 00:48:00','19th St & Pennsylvania Ave NW',31100,'W00658','Registered'),
     ('0h 8m 19sec.',499,'2012-01-01 00:45:00','15th & P St NW',31201,'2012-01-01 00:54:00','16th & U St NW',31229,'W00996','Registered'),
     ('0h 7m 40sec.',460,'2012-01-01 00:46:00','15th & P St NW',31201,'2012-01-01 00:53:00','16th & U St NW',31229,'W00790','Registered'),
     ('0h 15m 31sec.',931,'2012-01-01 00:48:00','Massachusetts Ave & Dupont Circle NW',31200,'2012-01-01 01:04:00','4th St & Massachusetts Ave NW',31604,'W01213','Registered'),
     ('0h 6m 38sec.',398,'2012-01-01 00:49:00','Park Rd & Holmead Pl NW',31602,'2012-01-01 00:55:00','Columbia Rd & Belmont St NW',31113,'W00981','Registered'),
     ('0h 9m 18sec.',558,'2012-01-01 00:49:00','17th & Corcoran St NW',31214,'2012-01-01 00:59:00','16th & Harvard St NW',31103,'W01270','Registered');
INSERT INTO dc_bikeshare_q1_2012 (duration,duration_seconds,start_time,start_station,start_terminal,end_time,end_station,end_terminal,bike_number,rider_type) VALUES
     ('0h 10m 6sec.',606,'2012-01-01 00:49:00','17th & Corcoran St NW',31214,'2012-01-01 01:00:00','16th & Harvard St NW',31103,'W01465','Registered'),
     ('0h 2m 24sec.',144,'2012-01-01 00:50:00','19th St & Pennsylvania Ave NW',31100,'2012-01-01 00:53:00','21st & I St NW',31205,'W00658','Registered'),
     ('0h 14m 46sec.',886,'2012-01-01 00:52:00','Adams Mill & Columbia Rd NW',31104,'2012-01-01 01:07:00','C & O Canal & Wisconsin Ave NW',31225,'W00936','Casual'),
     ('0h 4m 1sec.',241,'2012-01-01 00:53:00','Eastern Market Metro / Pennsylvania Ave & 7th St SE',31613,'2012-01-01 00:57:00','14th & D St SE',31607,'W00007','Registered'),
     ('0h 7m 48sec.',468,'2012-01-01 00:53:00','Potomac & Pennsylvania Ave SE',31606,'2012-01-01 01:01:00','13th & D St NE',31622,'W00880','Registered'),
     ('0h 7m 40sec.',460,'2012-01-01 00:54:00','Potomac & Pennsylvania Ave SE',31606,'2012-01-01 01:01:00','13th & D St NE',31622,'W00232','Registered'),
     ('0h 7m 18sec.',438,'2012-01-01 00:54:00','Potomac & Pennsylvania Ave SE',31606,'2012-01-01 01:01:00','13th & D St NE',31622,'W00539','Registered'),
     ('0h 7m 24sec.',444,'2012-01-01 00:54:00','Potomac & Pennsylvania Ave SE',31606,'2012-01-01 01:01:00','13th & D St NE',31622,'W01108','Registered'),
     ('0h 3m 18sec.',198,'2012-01-01 00:55:00','L''Enfant Plaza / 7th & C St SW',31218,'2012-01-01 00:58:00','4th & E St SW',31244,'W00326','Casual'),
     ('0h 2m 59sec.',179,'2012-01-01 00:55:00','L''Enfant Plaza / 7th & C St SW',31218,'2012-01-01 00:58:00','4th & E St SW',31244,'W01111','Registered');
INSERT INTO dc_bikeshare_q1_2012 (duration,duration_seconds,start_time,start_station,start_terminal,end_time,end_station,end_terminal,bike_number,rider_type) VALUES
     ('0h 6m 18sec.',378,'2012-01-01 00:55:00','20th & Crystal Dr',31002,'2012-01-01 01:02:00','20th & Crystal Dr',31002,'W00333','Casual'),
     ('0h 5m 5sec.',305,'2012-01-01 00:56:00','20th & Crystal Dr',31002,'2012-01-01 01:01:00','20th & Crystal Dr',31002,'W01257','Casual'),
     ('0h 13m 37sec.',817,'2012-01-01 00:57:00','10th & U St NW',31111,'2012-01-01 01:11:00','21st & M St NW',31212,'W00913','Registered'),
     ('0h 1m 47sec.',107,'2012-01-01 00:57:00','5th St & K St NW',31600,'2012-01-01 00:59:00','4th St & Massachusetts Ave NW',31604,'W00713','Registered'),
     ('0h 8m 14sec.',494,'2012-01-01 00:58:00','Adams Mill & Columbia Rd NW',31104,'2012-01-01 01:06:00','17th & Corcoran St NW',31214,'W00954','Registered'),
     ('0h 20m 6sec.',1206,'2012-01-01 00:58:00','17th & Corcoran St NW',31214,'2012-01-01 01:18:00','4th & M St SW',31108,'W00174','Registered'),
     ('0h 14m 5sec.',845,'2012-01-01 00:59:00','California St & Florida Ave NW',31116,'2012-01-01 01:13:00','7th & T St NW',31109,'W00147','Registered'),
     ('0h 6m 50sec.',410,'2012-01-01 00:59:00','14th & V St NW',31101,'2012-01-01 01:06:00','Columbia Rd & Belmont St NW',31113,'W01130','Registered'),
     ('0h 1m 54sec.',114,'2012-01-01 01:02:00','Park Rd & Holmead Pl NW',31602,'2012-01-01 01:04:00','11th & Kenyon St NW',31102,'W00375','Registered'),
     ('0h 18m 44sec.',1124,'2012-01-01 01:04:00','4th St & Massachusetts Ave NW',31604,'2012-01-01 01:23:00','15th St & Massachusetts Ave SE',31626,'W01213','Registered');
```

---
## Data Exploration

First, you need to know about your dataset. You learned that certain functions work on some data types, but not others. 

For example, `COUNT` works with any data type, but `SUM` only works for numerical data. In order to use `SUM`, the data must appear to be numeric, but it must also be stored in the database in a numeric form.

You might run into this problem, for example, **if you have a column that appears to be entirely numeric, but happens to contain spaces or commas.** If you load numbers that contain commas into a SQL database, it will typically treat that column as non-numeric (text).

Generally, numeric column types in various SQL databases do not support commas or currency symbols. To make things more complicated, SQL databases can store data in many different formats with different levels of precision.
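
As a minimal sketch of the comma problem, the query below uses a made-up literal (`'1,234'` is an illustrative value, not taken from the tutorial tables) and assumes MariaDB. It compares the implicit conversion with an explicit clean-up via `REPLACE` and `CAST`:

```sql
-- '1,234' is a sample value chosen for illustration only
SELECT '1,234' + 0                                 AS implicit_cast,   -- 1 (parsing stops at the comma, with a warning)
       CAST(REPLACE('1,234', ',', '') AS UNSIGNED) AS cleaned_number;  -- 1234
```

Stripping the separator before casting is one simple option; other characters such as currency symbols or spaces would need their own `REPLACE` or `TRIM` step.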

To see a list of data types, check the documentation of your SQL database, or see [this overview](https://www.w3schools.com/sql/sql_datatypes.asp) for a quick reference.


---

It's certainly best for data to be stored in its optimal format from the beginning, but if it isn't, you can always change it in your query. **It's particularly common for dates or numbers, for example, to be stored as strings.** This becomes problematic when you want to sum a column and you get an error or an unexpected result because SQL is reading the numbers as strings.

```
-- Convert one column
ALTER TABLE table_name
MODIFY column_name new_column_data_type;
```

You can also convert the data type at query time, so that the original dataset stays unchanged.

Syntax : `CONVERT(value, type)` or `CAST(value AS type)`

```
-- Example : Convert a column
SELECT CONVERT(founded_at, DATE)
FROM crunchbase_companies_clean_data;

-- Example : Convert a value
SELECT CONVERT('01/03/12', DATE);
-- Result: '2001-03-12'
```

---
The table `crunchbase_companies_clean_data` has a column named `founded_at`. Let's check the result of this query :

```
SELECT *
FROM crunchbase_companies_clean_data
ORDER BY founded_at;
```

As you can see, the result is not ordered properly. Try converting the column into `DATE`, then run the query again :

```
ALTER TABLE crunchbase_companies_clean_data MODIFY founded_at DATE;

SELECT *
FROM crunchbase_companies_clean_data
ORDER BY founded_at;
```

After converting the column into `DATE`, the result is properly ordered.

---
## SQL Date Format

You're probably used to seeing dates formatted as MM-DD-YYYY or a similar, month-first format. It's not necessarily any worse than DD-MM-YYYY. The problem with both of these formats is that when they are stored as strings, they don't sort in chronological order. For example, here's a date field stored as a string. Because the month is listed first, the `ORDER BY` statement doesn't produce a chronological list:

```
SELECT permalink,
       founded_at
FROM crunchbase_companies_clean_data
ORDER BY founded_at
```
You must convert it from a string data type to a date or datetime data type. Here's an example from the same table, but with a field that has a cleaned date. Note that the cleaned date field is actually stored as a string, but still sorts in chronological order anyway, since a year-first format such as YYYY-MM-DD sorts the same way alphabetically and chronologically:

```
SELECT permalink,
       founded_at,
       founded_at_clean
FROM crunchbase_companies_clean_data
ORDER BY founded_at_clean
```
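
When altering the column is not an option, one possible approach in MariaDB is to parse the string at query time with `STR_TO_DATE`. This is only a sketch: the format string `'%m/%d/%y'` is an assumption about how `founded_at` is written and should be adjusted to the actual data.

```sql
-- Parse a month-first string into a real DATE (literal chosen for illustration)
SELECT STR_TO_DATE('01/31/2014', '%m/%d/%Y') AS parsed_date;  -- 2014-01-31

-- Sort chronologically without changing the table;
-- assumes founded_at looks like MM/DD/YY, which may not match your data
SELECT permalink,
       founded_at
FROM crunchbase_companies_clean_data
ORDER BY STR_TO_DATE(founded_at, '%m/%d/%y');
```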

---
### Crazy rules for dates and times

Assuming you've got some dates properly stored as a `DATE` or `TIME` data type, you can do some pretty powerful things. Maybe you'd like to calculate a field of dates a week after an existing field. Or maybe you'd like to create a field that indicates how many days apart the values in two other date fields are. These are trivially simple, but it's important to keep in mind that the data type of your results will depend on exactly what you are doing to the dates.

When you perform arithmetic on dates (such as subtracting one date from another), the type of the result depends on the database. PostgreSQL, for example, returns an `interval` value, a series of integers that represent a period of time. MariaDB has no `interval` column type, and the PostgreSQL-style cast `::timestamp` is not valid there, so use `DATEDIFF` (or `TIMESTAMPDIFF`) instead. The following query determines how many days it took companies to be acquired (unacquired companies and those without dates entered were filtered out). Note that because the `companies.founded_at_clean` column is stored as a string, it is converted to `DATETIME` before the difference is computed.

```
SELECT companies.permalink,
       companies.founded_at_clean,
       acquisitions.acquired_at_cleaned,
       -- DATEDIFF(later, earlier) returns the difference in whole days
       DATEDIFF(acquisitions.acquired_at_cleaned,
                CONVERT(companies.founded_at_clean, DATETIME)) AS time_to_acquisition
FROM crunchbase_companies_clean_data companies
JOIN crunchbase_acquisitions_clean_data acquisitions
  ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL;
```

In the example above, you can see that the `time_to_acquisition` column is a number of days, not another date.
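
In MariaDB, date differences are usually computed with `DATEDIFF` (whole days) or `TIMESTAMPDIFF` (any unit). A small sketch on literal values, with the timestamps borrowed from the first bikeshare row shown earlier:

```sql
SELECT DATEDIFF('2012-01-08', '2012-01-01') AS days_apart,          -- 7
       TIMESTAMPDIFF(MINUTE,
                     '2012-01-01 00:53:00',
                     '2012-01-01 01:01:00') AS minutes_apart;        -- 8
```

Note the argument order: `DATEDIFF` takes the later date first, while `TIMESTAMPDIFF` takes the earlier one first.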

You can introduce intervals using the `INTERVAL` keyword as well:

```
SELECT companies.permalink,
       companies.founded_at_clean,
       DATE_ADD(CONVERT(companies.founded_at_clean, datetime),  INTERVAL 1 WEEK) AS plus_one_week
FROM crunchbase_companies_clean_data companies
WHERE founded_at_clean IS NOT NULL
```

The interval is defined with a number and a unit keyword, such as `INTERVAL 10 SECOND` or `INTERVAL 5 MONTH`. Also note that adding an interval to a date value, or subtracting one from it, results in another date value, as in the above query.
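
A few interval expressions on literal values, as a sketch assuming MariaDB syntax (the dates are illustrative only):

```sql
SELECT DATE_ADD('2012-01-01', INTERVAL 1 WEEK)             AS plus_one_week,      -- 2012-01-08
       DATE_SUB('2012-01-01 00:00:10', INTERVAL 10 SECOND) AS minus_ten_seconds,  -- 2012-01-01 00:00:00
       '2012-01-31' + INTERVAL 1 MONTH                     AS plus_one_month;     -- 2012-02-29
```

The last column shows that month arithmetic clamps to the last valid day of the target month (2012 is a leap year).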

You can add the current time (at the time you run the query) into your code using the `NOW()` function:

```
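SELECT companies.permalink,
       companies.founded_at_clean,
       -- In MariaDB, NOW() - <datetime> subtracts the numeric forms (YYYYMMDDhhmmss),
       -- so TIMESTAMPDIFF is used here to get the elapsed time in days
       TIMESTAMPDIFF(DAY, CONVERT(companies.founded_at_clean, DATETIME), NOW()) AS founded_time_ago
FROM crunchbase_companies_clean_data companies
WHERE founded_at_clean IS NOT NULL;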
```

---
## Using SQL String Functions to Clean Data

This lesson features data on San Francisco Crime Incidents for the 3-month period beginning November 1, 2013 and ending January 31, 2014. It was collected from the SF Data website on February 16, 2014. There is one row for each incident reported. Some field definitions: location is the GPS location of the incident, listed in decimal degrees, latitude first, longitude second. The two coordinates are also broken out into the lat and lon fields, respectively.

```
SELECT *
FROM sf_crime_incidents_2014_01
```

---
### Cleaning Strings

`LEFT`, `RIGHT`, and `TRIM` are all used to select only certain elements of strings, but using them to select elements of a number or date will treat them as strings for the purpose of the function.

---
#### LEFT, RIGHT, and LENGTH

Let's start with `LEFT`. You can use `LEFT` to pull a certain number of characters from the left side of a string and present them as a separate string. The syntax is `LEFT(string, number of characters)`.

As a practical example, we can see that the date field in this dataset begins with a 10-character date and includes the timestamp to the right of it. The following query pulls out only the date.

```
SELECT incidnt_num,
       date,
       LEFT(date, 10) AS cleaned_date
FROM sf_crime_incidents_2014_01
```

`RIGHT` does the same thing, but from the right side:
```
SELECT incidnt_num,
       date,
       LEFT(date, 10) AS cleaned_date,
       RIGHT(date, 17) AS cleaned_time
FROM sf_crime_incidents_2014_01
```

`RIGHT` works well in this case because we know that the number of characters will be consistent across the entire date field. If it wasn't consistent, it's still possible to pull a string from the right side in a way that makes sense. 

The `LENGTH` function returns the length of a string. So `LENGTH(date)` will always return 28 in this dataset. Since we know that the first 10 characters will be the date, and they will be followed by a space (total 11 characters), we could represent the `RIGHT` function like this:

```
SELECT incidnt_num,
       date,
       LEFT(date, 10) AS cleaned_date,
       RIGHT(date, LENGTH(date) - 11) AS cleaned_time
FROM sf_crime_incidents_2014_01
```
When using functions within other functions, it's important to remember that **the innermost functions will be evaluated first, followed by the functions that encapsulate them**.
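
To see the nesting at work on a single value, here is a sketch with a user variable. The sample string is an assumption chosen to be 28 characters long, like the `date` field described above; it is not copied from the table.

```sql
-- Hypothetical sample value in the same shape as the date field
SET @raw_date = '01/31/2014 08:00:00 AM +0000';

SELECT LENGTH(@raw_date)                         AS total_length,  -- 28
       LEFT(@raw_date, 10)                       AS cleaned_date,  -- 01/31/2014
       RIGHT(@raw_date, LENGTH(@raw_date) - 11)  AS cleaned_time;  -- 08:00:00 AM +0000
```

`LENGTH(@raw_date)` is evaluated first (28), then `28 - 11 = 17`, and finally `RIGHT` takes the last 17 characters. In MariaDB, `LENGTH` counts bytes, so for text with multi-byte characters `CHAR_LENGTH` is the safer choice.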

---
#### TRIM

The `TRIM` function is used to remove characters from the beginning and end of a string. Here's an example:

```
SELECT location,
       TRIM(both '(3' FROM location)
FROM sf_crime_incidents_2014_01
```

The `TRIM` function takes three parts. First, you have to specify whether you want to remove characters from the beginning (`LEADING`), the end (`TRAILING`), or both (`BOTH`, as used above). Next, you must specify the string to remove. In MariaDB this is treated as one substring: every repeated occurrence of exactly `(3` is stripped from the chosen side(s), whereas some other databases (PostgreSQL, for example) treat it as a set of individual characters. Finally, you must specify the text you want to trim using `FROM`.
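
A short sketch of `TRIM` on literal values, assuming MariaDB behavior (the coordinate string is made up for illustration):

```sql
SELECT TRIM(BOTH 'x' FROM 'xxbarxx')                AS both_sides,     -- bar
       TRIM(LEADING '(' FROM '(37.7, -122.4)')      AS no_open_paren,  -- 37.7, -122.4)
       TRIM(TRAILING ')' FROM
            TRIM(LEADING '(' FROM '(37.7, -122.4)')) AS no_parens;      -- 37.7, -122.4
```

Nesting one `TRIM` inside another is a simple way to strip different characters from each side.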

---
#### POSITION and STRPOS

`POSITION` allows you to specify a substring, then returns a numerical value equal to the character number (counting from left) where that substring first appears in the target string, or 0 if it does not appear. For example, the following query will return the position of the character 'A' where it first appears in the `descript` field. Whether the match is case-sensitive depends on the column's collation; with MariaDB's default case-insensitive collations, 'A' also matches 'a':

```
SELECT incidnt_num,
       descript,
       POSITION('A' IN descript) AS a_position
FROM sf_crime_incidents_2014_01
```

---
#### SUBSTR

`LEFT` and `RIGHT` both create substrings of a specified length, but they only do so starting from the sides of an existing string. If you want to start in the middle of a string, you can use `SUBSTR`. The syntax is `SUBSTR(string, starting character position, number of characters)`:

```
SELECT incidnt_num,
       date,
       SUBSTR(date, 6, 2) AS month
FROM sf_crime_incidents_2014_01
```
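
`POSITION` and `SUBSTR` are often combined to split a string at a delimiter. The sketch below uses a made-up coordinate pair in a user variable (not a value from the table) and assumes the two parts are separated by a comma followed by one space:

```sql
-- Hypothetical "lat, lon" string for illustration
SET @coords = '37.7, -122.4';

SELECT POSITION(',' IN @coords)                          AS comma_position,  -- 5
       SUBSTR(@coords, 1, POSITION(',' IN @coords) - 1)  AS lat,             -- 37.7
       SUBSTR(@coords, POSITION(',' IN @coords) + 2)     AS lon;             -- -122.4
```

When the third argument of `SUBSTR` is omitted, it returns everything from the starting position to the end of the string.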

---
#### CONCAT

You can combine strings from several columns together (and with hard-coded values) using `CONCAT`. Simply order the values you want to concatenate and separate them with commas. If you want to hard-code values, enclose them in single quotes. Here's an example:

```
SELECT incidnt_num,
       day_of_week,
       LEFT(date, 10) AS cleaned_date,
       CONCAT(day_of_week, ', ', LEFT(date, 10)) AS day_and_date
FROM sf_crime_incidents_2014_01
```
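
One detail worth knowing: in MariaDB's default mode, `CONCAT` returns `NULL` as soon as any argument is `NULL`. A sketch on literal values, with `CONCAT_WS` (concatenate with separator) shown as one alternative that skips nulls:

```sql
SELECT CONCAT('Friday', ', ', '2014-01-31')          AS day_and_date,  -- Friday, 2014-01-31
       CONCAT('Friday', ', ', NULL)                  AS with_null,     -- NULL
       CONCAT_WS(', ', 'Friday', NULL, '2014-01-31') AS skips_null;    -- Friday, 2014-01-31
```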

---
#### Changing case with UPPER and LOWER

Sometimes, you just don't want your data to look like it's screaming at you. 
* You can use **`LOWER` to force every character in a string to become lower-case**. 
* Similarly, you can use **`UPPER` to make all the letters appear in upper-case**:

```
SELECT incidnt_num,
       address,
       UPPER(address) AS address_upper,
       LOWER(address) AS address_lower
FROM sf_crime_incidents_2014_01
```

There are a number of variations of these functions, as well as several other string functions not covered here. Different databases use subtle variations on these functions, so be sure to look up the appropriate database's syntax if you're connected to a private database.

---
### Turning dates into more useful dates

Dates are some of the most commonly screwed-up formats in SQL. This can be the result of a few things:

* The data was manipulated in Excel at some point, and the dates were changed to MM/DD/YYYY format or another format that is not compliant with SQL's strict standards.
* The data was manually entered by someone who used whatever formatting convention they were most familiar with.
* The date uses text (Jan, Feb, etc.) instead of numbers to record months.

Once you've got a well-formatted date field, you can manipulate it in all sorts of interesting ways. To make the lesson a little cleaner, we'll use a different version of the crime incidents dataset that already has a nicely-formatted date field:

```
SELECT *
FROM sf_crime_incidents_cleandate
```

You've learned how to construct a date field, but what if you want to deconstruct one? You can use `EXTRACT` to pull the pieces apart one-by-one:

```
SELECT date,
       EXTRACT(YEAR FROM date) AS year,
       EXTRACT(MONTH FROM date) AS month,
       EXTRACT(DAY FROM date) AS day,
       EXTRACT(HOUR FROM date) AS hour,
       EXTRACT(MINUTE FROM date) AS minute,
       EXTRACT(SECOND FROM date) AS second,
       EXTRACT(QUARTER FROM date) AS quarter
FROM sf_crime_incidents_2014_01;
```

What if you want to include today's date or time? You can instruct your query to pull the local date and time at the time the query is run using any number of functions. Interestingly, you can run them without a `FROM` clause:

```
SELECT CURRENT_DATE AS date,
       CURRENT_TIME AS time,
       CURRENT_TIMESTAMP AS timestamp,
       LOCALTIME AS local_time,
       LOCALTIMESTAMP AS local_timestamp,
       NOW() AS now
```

As you can see, the different options vary in precision. You might notice that these times probably aren't actually your local time. If you run a current time function against a connected database, you might get a result in a different time zone.

There are many more functions related to dates and times. For an example list, see the [MariaDB date and time functions](https://mariadb.com/kb/en/date-time-functions/).

---
### COALESCE

Occasionally, you will end up with a dataset that has some nulls that you'd prefer to contain actual values. This happens frequently in numerical data (displaying nulls as 0 is often preferable), and when performing outer joins that result in some unmatched rows. In cases like this, you can use `COALESCE` to replace the null values:

```
SELECT category,
       resolution,
       COALESCE(resolution, 'No Resolution')
FROM sf_crime_incidents_2014_01
```
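
`COALESCE` returns its first non-null argument, which a sketch on literal values makes easy to see (the resolution text is an illustrative value):

```sql
SELECT COALESCE(NULL, 'No Resolution')             AS filled_text,     -- No Resolution
       COALESCE('ARREST, BOOKED', 'No Resolution') AS kept_value,      -- ARREST, BOOKED
       COALESCE(NULL, NULL, 0)                     AS first_non_null;  -- 0
```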

---
## SQL Window Functions

This lesson uses data from Washington DC's Capital Bikeshare Program (table `dc_bikeshare_q1_2012`), which publishes detailed trip-level historical data on its website. The data was downloaded in February 2014, but is limited to data collected during the first quarter of 2012. Each row represents one ride. Most fields are self-explanatory, except `rider_type`: 
  * `Registered` indicates a monthly membership to the rideshare program, 
  * `Casual` indicates that the rider bought a 3-day pass. 

The `start_time` and `end_time` fields were cleaned up from their original forms to suit SQL date formatting—they are stored in this table as timestamps.

### Intro to window functions

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

The most practical example of this is calculating a running total of the `duration_seconds` field, ordered by `start_time`:

```
SELECT duration_seconds,
       SUM(duration_seconds) OVER (ORDER BY start_time) AS running_total
FROM dc_bikeshare_q1_2012
```

You can see that the above query creates an aggregation (`running_total`) without using `GROUP BY`. Let's break down the syntax and see how it works.

---
### Basic windowing syntax

The first part of the above aggregation, `SUM(duration_seconds)`, looks a lot like any other aggregation. Adding `OVER` designates it as a window function. You could read the above aggregation as "take the sum of `duration_seconds` over the entire result set, in order by `start_time`."

If you'd like to narrow the window from the entire dataset to individual groups within the dataset, you can use `PARTITION BY` to do so:

```
SELECT start_terminal,
       duration_seconds,
       SUM(duration_seconds) OVER
         (PARTITION BY start_terminal ORDER BY start_time)
         AS running_total
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
```

The above query groups and orders the query by `start_terminal`. Within each value of `start_terminal`, it is ordered by `start_time`, and the running total sums across the current row and all previous rows of `duration_seconds`. Scroll down until the `start_terminal` value changes and you will notice that `running_total` starts over. That's what happens when you group using `PARTITION BY`. In case you're still stumped by `ORDER BY`, it simply orders by the designated column(s) the same way the `ORDER BY` clause would, except that it treats every partition as separate. It also creates the running total—without `ORDER BY`, each value will simply be a sum of all the duration_seconds values in its respective `start_terminal`. Try running the above query without `ORDER BY` to get an idea:

```
SELECT start_terminal,
       duration_seconds,
       SUM(duration_seconds) OVER
         (PARTITION BY start_terminal) AS start_terminal_total
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
```

`ORDER BY` and `PARTITION BY` define what is referred to as the "window": the ordered subset of data over which calculations are made.
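
One subtlety of the window: rows that tie on the `ORDER BY` column are treated as peers and, with the default frame, receive the same running total. The sketch below assumes MariaDB 10.2 or later (for CTEs and window functions) and uses three sample rows with values modeled on the bikeshare inserts above; the `rides` CTE is a stand-in, not a table from this tutorial.

```sql
WITH rides AS (
  SELECT 31606 AS start_terminal, '2012-01-01 00:53:00' AS start_time, 468 AS duration_seconds
  UNION ALL SELECT 31606, '2012-01-01 00:54:00', 460
  UNION ALL SELECT 31606, '2012-01-01 00:54:00', 438
)
SELECT start_time,
       duration_seconds,
       -- default frame: tied start_time values are summed together
       SUM(duration_seconds) OVER (ORDER BY start_time) AS range_total,
       -- explicit ROWS frame plus a tie-breaker: one row at a time
       SUM(duration_seconds) OVER (ORDER BY start_time, duration_seconds
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM rides
ORDER BY start_time, duration_seconds;
-- start_time           duration_seconds  range_total  rows_total
-- 2012-01-01 00:53:00  468               468          468
-- 2012-01-01 00:54:00  438               1366         906
-- 2012-01-01 00:54:00  460               1366         1366
```

If a strictly row-by-row running total is wanted, adding a tie-breaker column and an explicit `ROWS` frame is one way to get it.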

Note: You can't include window functions in a `GROUP BY` clause. Window functions are evaluated after grouping and aggregation, so they operate on the rows that remain once `GROUP BY` has been applied.
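
When both a grouped aggregate and a window calculation are needed, one common pattern is to aggregate first in a subquery or CTE and then apply the window function to the grouped rows. A sketch, assuming MariaDB 10.2 or later and using a small stand-in `rides` CTE with sample values:

```sql
WITH rides AS (
  SELECT 31606 AS start_terminal, 468 AS duration_seconds
  UNION ALL SELECT 31606, 460
  UNION ALL SELECT 31218, 198
  UNION ALL SELECT 31218, 179
),
per_terminal AS (
  -- step 1: ordinary aggregation, one row per terminal
  SELECT start_terminal,
         SUM(duration_seconds) AS terminal_total
  FROM rides
  GROUP BY start_terminal
)
-- step 2: window function over the already-grouped rows
SELECT start_terminal,
       terminal_total,
       SUM(terminal_total) OVER () AS grand_total
FROM per_terminal
ORDER BY start_terminal;
-- start_terminal  terminal_total  grand_total
-- 31218           377             1305
-- 31606           928             1305
```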

---
### The usual suspects: SUM, COUNT, and AVG

When using window functions, you can apply the same aggregates that you would under normal circumstances—`SUM`, `COUNT`, and `AVG`. The easiest way to understand these is to re-run the previous example with some additional functions. 

```
SELECT start_terminal,
       duration_seconds,
       SUM(duration_seconds) OVER
         (PARTITION BY start_terminal) AS running_total,
       COUNT(duration_seconds) OVER
         (PARTITION BY start_terminal) AS running_count,
       AVG(duration_seconds) OVER
         (PARTITION BY start_terminal) AS running_avg
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
```

Alternatively, the same functions with `ORDER BY`:

```
SELECT start_terminal,
       duration_seconds,
       SUM(duration_seconds) OVER
         (PARTITION BY start_terminal ORDER BY start_time) AS running_total,
       COUNT(duration_seconds) OVER
         (PARTITION BY start_terminal ORDER BY start_time) AS running_count,
       AVG(duration_seconds) OVER
         (PARTITION BY start_terminal ORDER BY start_time) AS running_avg
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
```


---
### ROW_NUMBER

`ROW_NUMBER()` does just what it sounds like: it displays the number of a given row. It starts at 1 and numbers the rows according to the `ORDER BY` part of the window statement. `ROW_NUMBER()` does not require you to specify a variable within the parentheses:

```
SELECT start_terminal,
       start_time,
       duration_seconds,
       -- alias is row_num because ROW_NUMBER is a reserved word in some MySQL/MariaDB versions
       ROW_NUMBER() OVER (ORDER BY start_time)
                    AS row_num
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
```

Using the `PARTITION BY` clause will allow you to begin counting 1 again in each partition. The following query starts the count over again for each terminal:

```
SELECT start_terminal,
       start_time,
       duration_seconds,
       -- alias is row_num because ROW_NUMBER is a reserved word in some MySQL/MariaDB versions
       ROW_NUMBER() OVER (PARTITION BY start_terminal
                          ORDER BY start_time)
                    AS row_num
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
```
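
A frequent use of `ROW_NUMBER()` is keeping only the first row of each partition. Window functions can't be used directly in `WHERE`, so the numbering is done in a CTE and filtered afterwards. This is a sketch assuming MariaDB 10.2 or later; `rides` is a stand-in CTE with sample values and `ride_rank` is an illustrative alias.

```sql
WITH rides AS (
  SELECT 31606 AS start_terminal, '2012-01-01 00:53:00' AS start_time, 468 AS duration_seconds
  UNION ALL SELECT 31606, '2012-01-01 00:54:00', 460
  UNION ALL SELECT 31218, '2012-01-01 00:55:00', 198
  UNION ALL SELECT 31002, '2012-01-01 00:55:00', 378
  UNION ALL SELECT 31002, '2012-01-01 00:56:00', 305
),
numbered AS (
  SELECT start_terminal,
         start_time,
         duration_seconds,
         ROW_NUMBER() OVER (PARTITION BY start_terminal
                            ORDER BY start_time) AS ride_rank
  FROM rides
)
-- keep the earliest ride of each terminal
SELECT start_terminal,
       start_time,
       duration_seconds
FROM numbered
WHERE ride_rank = 1
ORDER BY start_terminal;
-- 31002  2012-01-01 00:55:00  378
-- 31218  2012-01-01 00:55:00  198
-- 31606  2012-01-01 00:53:00  468
```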

### NTILE

You can use window functions to identify what percentile (or quartile, or any other subdivision) a given row falls into. The syntax is `NTILE(number of buckets)`. In this case, `ORDER BY` determines which column to use to determine the quartiles (or whatever number of 'tiles you specify). For example:

```
SELECT start_terminal,
       duration_seconds,
       NTILE(4) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
          AS quartile,
       NTILE(5) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS quintile
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
ORDER BY start_terminal, duration_seconds
```
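
When the rows don't divide evenly into the requested number of buckets, the earlier buckets get the extra rows. A sketch with five sample durations split into four buckets, assuming MariaDB 10.2 or later (`rides` is a stand-in CTE):

```sql
WITH rides AS (
  SELECT 107 AS duration_seconds
  UNION ALL SELECT 179
  UNION ALL SELECT 198
  UNION ALL SELECT 305
  UNION ALL SELECT 378
)
SELECT duration_seconds,
       NTILE(4) OVER (ORDER BY duration_seconds) AS quartile
FROM rides
ORDER BY duration_seconds;
-- duration_seconds  quartile
-- 107               1
-- 179               1
-- 198               2
-- 305               3
-- 378               4
```

With very few rows per partition, the resulting buckets are therefore only a rough approximation of true quartiles.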

### LAG and LEAD

It can often be useful to compare rows to preceding or following rows, especially if you've got the data in an order that makes sense. You can use `LAG` or `LEAD` to create columns that pull values from other rows. All you need to do is enter which column to pull from and how many rows away you'd like to do the pull. `LAG` pulls from previous rows and `LEAD` pulls from following rows.

The following query shows the previous and following values within each `start_terminal`. If a `start_terminal` contains only one row, both the `lag` and `lead` columns are `NULL` for it.

```
SELECT start_terminal,
       duration_seconds,
       LAG(duration_seconds, 1) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds) AS lag,
       LEAD(duration_seconds, 1) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds) AS lead
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
ORDER BY start_terminal, duration_seconds
```


This is especially useful if you want to calculate differences between rows:

```
SELECT start_terminal,
       duration_seconds,
       duration_seconds - LAG(duration_seconds, 1) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS difference
FROM dc_bikeshare_q1_2012
WHERE start_time < '2012-01-08'
ORDER BY start_terminal, duration_seconds
```
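
To make the row-to-row differences concrete, here is the same calculation on five sample rows (a sketch assuming MariaDB 10.2 or later; `rides` is a stand-in CTE, not the tutorial table):

```sql
WITH rides AS (
  SELECT 31606 AS start_terminal, 438 AS duration_seconds
  UNION ALL SELECT 31606, 460
  UNION ALL SELECT 31606, 468
  UNION ALL SELECT 31218, 179
  UNION ALL SELECT 31218, 198
)
SELECT start_terminal,
       duration_seconds,
       duration_seconds - LAG(duration_seconds, 1) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS difference
FROM rides
ORDER BY start_terminal, duration_seconds;
-- start_terminal  duration_seconds  difference
-- 31218           179               NULL
-- 31218           198               19
-- 31606           438               NULL
-- 31606           460               22
-- 31606           468               8
```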

The first row of each partition has a null `difference` because there is no previous row from which to pull. Similarly, using `LEAD` will create nulls in the last row of each partition. If you'd like to make the results a bit cleaner, you can wrap it in an outer query to remove nulls:

```
SELECT *
  FROM (
    SELECT start_terminal,
           duration_seconds,
           duration_seconds - LAG(duration_seconds, 1) OVER
             (PARTITION BY start_terminal ORDER BY duration_seconds)
             AS difference
      FROM dc_bikeshare_q1_2012
     WHERE start_time < '2012-01-08'
     ORDER BY start_terminal, duration_seconds
       ) sub
 WHERE sub.difference IS NOT NULL
```