# SQL Data Cleaning


**In this lesson, you will be learning a number of techniques to**

* Clean and re-structure messy data.
* Convert columns to different data types.
* Tricks for manipulating NULLs.

This will give you a robust toolkit to get from raw data to clean data that's useful for analysis.


We connect to MySQL server and workbench and make analysis with the parch-and-posey database. This course is the practicals of the course SQL for Data Analysis at Udacity.

In [1]:
# we import some required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pprint import pprint
import time
print('Done!')

Done!


In [2]:
import mysql
from mysql.connector import Error
from getpass import getpass

db_name = 'parch_and_posey'
try:
    connection = mysql.connector.connect(host='localhost',
                                         database=db_name,
                                         user=input('Enter UserName:'),
                                         password=getpass('Enter Password:'))
    if connection.is_connected():
        db_Info = connection.get_server_info()
        print("Connected to MySQL Server version ", db_Info)
        cursor = connection.cursor()
        cursor.execute("select database();")
        record = cursor.fetchone()
        print("You're connected to database: ", record)

except Error as e:
    print("Error while connecting to MySQL", e)

Enter UserName:root
Enter Password:········
Connected to MySQL Server version  8.0.24
You're connected to database:  ('parch_and_posey',)


In [3]:
# Let's see the tables in the database

# let's run the show tables command 

cursor.execute('show tables')
out = cursor.fetchall()
out

[('accounts',), ('orders',), ('region',), ('sales_reps',), ('web_events',)]

**Defining a method that converts the result of a query to a dataframe**

In [4]:
def query_to_df(query):
    st = time.time()
    # Assert Every Query ends with a semi-colon
    try:
        assert query.endswith(';')
    except AssertionError:
        return 'ERROR: Query Must End with ;'

    # so we never have more than 20 rows displayed
    pd.set_option('display.max_rows', 20) 
    df = None

    # Process the query
    cursor.execute(query)
    columns = cursor.description
    result = []
    for value in cursor.fetchall():
        tmp = {}
        for (index,column) in enumerate(value):
            tmp[columns[index][0]] = [column]
        result.append(tmp)

    # Create a DataFrame from all results
    for ind, data in enumerate(result):
        if ind >= 1:
            x = pd.DataFrame(data)
            df = pd.concat([df, x], ignore_index=True)
        else:
            df = pd.DataFrame(data)
    print(f'Query ran for {time.time()-st} secs!')
    return df

In [5]:
# 1. For the accounts table

query = 'SELECT * FROM accounts LIMIT 3;'
query_to_df(query)

Query ran for 0.0689997673034668 secs!


Unnamed: 0,id,name,website,lat,longs,primary_poc,sales_rep_id
0,1001,Walmart,www.walmart.com,40.23849561,-75.10329704,Tamara Tuma,321500
1,1011,Exxon Mobil,www.exxonmobil.com,41.1691563,-73.84937379,Sung Shields,321510
2,1021,Apple,www.apple.com,42.29049481,-76.08400942,Jodee Lupo,321520


In [6]:
# 2. For the orders table

query = 'SELECT * FROM orders LIMIT 3;'
query_to_df(query)

Query ran for 0.10384035110473633 secs!


Unnamed: 0,id,account_id,occurred_at,standard_qty,gloss_qty,poster_qty,total,standard_amt_usd,gloss_amt_usd,poster_amt_usd,total_amt_usd
0,1,1001,2015-10-06 17:31:14,123,22,24,169,613.77,164.78,194.88,973.43
1,2,1001,2015-11-05 03:34:33,190,41,57,288,948.1,307.09,462.84,1718.03
2,3,1001,2015-12-04 04:21:55,85,47,0,132,424.15,352.03,0.0,776.18


In [7]:
# 3. For the sales_reps table

query = 'SELECT * FROM sales_reps LIMIT 3;'
query_to_df(query)

Query ran for 0.05859208106994629 secs!


Unnamed: 0,id,name,region_id
0,321500,Samuel Racine,1
1,321510,Eugena Esser,1
2,321520,Michel Averette,1


In [8]:
# 4. For the web_events table

query = 'SELECT * FROM web_events LIMIT 3;'
query_to_df(query)

Query ran for 0.09909844398498535 secs!


Unnamed: 0,id,account_id,occurred_at,channel
0,1,1001,2015-10-06 17:13:58,direct
1,2,1001,2015-11-05 03:08:26,direct
2,3,1001,2015-12-04 03:57:24,direct


In [9]:
# 5. For the region table

query = 'SELECT * FROM region LIMIT 3;'
query_to_df(query)

Query ran for 0.04686427116394043 secs!


Unnamed: 0,id,name
0,1,Northeast
1,2,Midwest
2,3,Southeast


### LEFT
### RIGHT
### LENGTH

**LEFT** pulls a specified number of characters for each row in a specified column starting at the beginning (or from the left). As you saw here, you can pull the first three digits of a phone number using LEFT(phone_number, 3).


**RIGHT** pulls a specified number of characters for each row in a specified column starting at the end (or from the right). As you saw here, you can pull the last eight digits of a phone number using RIGHT(phone_number, 8).


**LENGTH** provides the number of characters for each row of a specified column. Here, you saw that we could use this to get the length of each phone number as LENGTH(phone_number).


**LEFT & RIGHT Quizzes**

* In the accounts table, there is a column holding the website for each company. The last three digits specify what type of web address they are using. Pull these extensions and provide how many of each website type exist in the accounts table.

In [10]:
query_to_df(
"SELECT RIGHT(website, 3) website_type, COUNT(*) counts FROM accounts GROUP BY 1;"
)

Query ran for 0.0312349796295166 secs!


Unnamed: 0,website_type,counts
0,com,349
1,org,1
2,net,1


* There is much debate about how much the name (or even the first letter of a company name) matters. Use the accounts table to pull the first letter of each company name to see the distribution of company names that begin with each letter (or number).

In [11]:
query_to_df(
"SELECT LEFT(name, 1) first_char, COUNT(*) counts FROM accounts GROUP BY 1 ORDER BY 2 DESC;"
)

Query ran for 0.03124833106994629 secs!


Unnamed: 0,first_char,counts
0,A,37
1,C,37
2,P,27
3,M,22
4,S,17
...,...,...
21,O,7
22,X,2
23,3,1
24,Q,1


* Use the accounts table and a CASE statement to create two groups: one group of company names that start with a number and a second group of those company names that start with a letter. What proportion of company names start with a letter?

In [12]:
# Let's see the spread of company names starting with either alphabets or numbers

query_to_df(
"WITH \
table1 AS (SELECT LEFT(a.name, 1) first_char FROM accounts a), \
table2 AS (SELECT CASE WHEN first_char = first_char*10 THEN 'is_alpha' ELSE 'is_num' END first_char, COUNT(*) counts \
FROM table1 GROUP BY 1) \
SELECT * FROM table2;"
)

Query ran for 0.006979942321777344 secs!


Unnamed: 0,first_char,counts
0,is_alpha,350
1,is_num,1


In [13]:
# Next, let's calculate the proportion of numbers in the first letters of company names

query_to_df(
"WITH \
table1 AS (SELECT LEFT(a.name, 1) first_char FROM accounts a), \
table2 AS (SELECT CASE WHEN first_char = first_char*10 THEN 'is_alpha' ELSE 'is_num' END new_col, COUNT(*) counts \
FROM table1 GROUP BY 1), \
table3 AS (SELECT counts FROM table2 WHERE new_col='is_num'), \
table4 AS (SELECT ((SELECT * FROM table3) / SUM(counts)) letter_prop FROM table2) \
SELECT letter_prop*100 letter_pct FROM table4;"
)

Query ran for 0.003988027572631836 secs!


Unnamed: 0,letter_pct
0,0.28


* Consider vowels as a, e, i, o, and u. What proportion of company names start with a vowel, and what percent start with anything else?

In [14]:
# Let's first see the number of names whose first letter is a vowel or not

query_to_df(
"WITH \
table1 AS (SELECT LEFT(a.name, 1) first_char FROM accounts a), \
table2 AS (SELECT CASE WHEN first_char IN ('A', 'E', 'I', 'O', 'U') THEN 'is_vowel' ELSE 'is_not' END vowel_or_not, \
COUNT(*) counts FROM table1 GROUP BY 1) \
SELECT * FROM table2;"
)

Query ran for 0.003993511199951172 secs!


Unnamed: 0,vowel_or_not,counts
0,is_not,271
1,is_vowel,80


In [15]:
# Next, let's calculate the proportion of vowels in the first letters of company names

query_to_df(
"WITH \
table1 AS (SELECT LEFT(name, 1) first_char FROM accounts), \
table2 AS (SELECT CASE WHEN first_char IN ('A', 'E', 'I', 'O', 'U') THEN 'is_vowel' ELSE 'is_not' END vowel_or_not, \
COUNT(*) counts FROM table1 GROUP BY 1), \
table3 AS (SELECT counts FROM table2 WHERE vowel_or_not='is_vowel'), \
table4 AS (SELECT ((select * FROM table3)/SUM(counts)) vowel_prop FROM table2) \
SELECT vowel_prop*100 vowel_pct FROM table4;"
)

Query ran for 0.0019936561584472656 secs!


Unnamed: 0,vowel_pct
0,22.79


### POSITION
### STRPOS
### LOWER
### UPPER

**POSITION** takes a character and a column, and provides the index where that character is for each row. The index of the first position is 1 in SQL. If you come from another programming language, many begin indexing at 0. Here, you saw that you can pull the index of a comma as **`POSITION(',' IN city_state)`**.


**STRPOS** provides the same result as POSITION, but the syntax for achieving those results is a bit different as shown here: **`STRPOS(city_state, ',')`**.


Note, both POSITION and STRPOS are case sensitive, so looking for A is different than looking for a.


Therefore, if you want to pull an index regardless of the case of a letter, you might want to use **LOWER or UPPER** to make all of the characters lower or uppercase.

**Position Quizzes**

* Use the accounts table to create first and last name columns that hold the first and last names for the primary_poc.

In [16]:
query_to_df(
"SELECT LEFT(primary_poc, POSITION(' ' IN primary_poc)-1) first_name, \
RIGHT(primary_poc, LENGTH(primary_poc) - POSITION(' ' IN primary_poc)) last_name FROM accounts;"
)

Query ran for 0.49476146697998047 secs!


Unnamed: 0,first_name,last_name
0,Tamara,Tuma
1,Sung,Shields
2,Jodee,Lupo
3,Serafina,Banda
4,Angeles,Crusoe
...,...,...
346,Buffy,Azure
347,Esta,Engelhardt
348,Khadijah,Riemann
349,Deanne,Hertlein


* Now see if you can do the same thing for every rep name in the sales_reps table. Again provide first and last name columns.

In [17]:
query_to_df(
"SELECT LEFT(name, POSITION(' ' IN name)-1) first_name, \
RIGHT(name, LENGTH(name) - POSITION(' ' IN name)+1) last_name FROM sales_reps;"
)

Query ran for 0.05186176300048828 secs!


Unnamed: 0,first_name,last_name
0,Samuel,Racine
1,Eugena,Esser
2,Michel,Averette
3,Renetta,Carew
4,Cara,Clarke
...,...,...
45,Elwood,Shutt
46,Maryanna,Fiorentino
47,Georgianna,Chisholm
48,Micha,Woodford


### CONCAT or Piping ||
### REPLACE

Each of **Concat/Piping** will allow you to combine columns together across rows. In this video, you saw how first and last names stored in separate columns could be combined together to create a full name: 
```
CONCAT(first_name, ' ', last_name)
````
or with piping as 
```
first_name || ' ' || last_name.
```

**Replace** takes a column and the value to replace as well as the new value to input instead for example
```
REPLACE(name, ' ', '_')
```
Where name is the column of interest and we're replacing the spaces with underscores for each row in this column. 

**Quizzes CONCAT**

* Each company in the accounts table wants to create an email address for each primary_poc. The email address should be the first name of the primary_poc . last name primary_poc @ company name .com.

In [18]:
query_to_df(
"WITH \
table1 AS (SELECT LEFT(primary_poc, POSITION(' ' IN primary_poc)-1) first_name, \
RIGHT(primary_poc, LENGTH(primary_poc)-POSITION(' ' IN primary_poc)) last_name, \
REPLACE(name, ' ', '') company FROM accounts), \
table2 AS (SELECT *, CONCAT(LOWER(first_name), '.', LOWER(last_name), '@', LOWER(company), '.com') email FROM table1) \
SELECT * FROM table2 LIMIT 20;"
)

Query ran for 0.04199409484863281 secs!


Unnamed: 0,first_name,last_name,company,email
0,Tamara,Tuma,Walmart,tamara.tuma@walmart.com
1,Sung,Shields,ExxonMobil,sung.shields@exxonmobil.com
2,Jodee,Lupo,Apple,jodee.lupo@apple.com
3,Serafina,Banda,BerkshireHathaway,serafina.banda@berkshirehathaway.com
4,Angeles,Crusoe,McKesson,angeles.crusoe@mckesson.com
5,Savanna,Gayman,UnitedHealthGroup,savanna.gayman@unitedhealthgroup.com
6,Anabel,Haskell,CVSHealth,anabel.haskell@cvshealth.com
7,Barrie,Omeara,GeneralMotors,barrie.omeara@generalmotors.com
8,Kym,Hagerman,FordMotor,kym.hagerman@fordmotor.com
9,Jamel,Mosqueda,AT&T,jamel.mosqueda@at&t.com


* We would also like to create an initial password, which they will change after their first log in. The first password will be<br> 
a. The first letter of the primary_poc's first name (lowercase), then<br> 
b. The last letter of their first name (lowercase), <br>
c. The first letter of their last name (lowercase), <br>
d. The last letter of their last name (lowercase), <br>
e. The number of letters in their first name, <br>
f. The number of letters in their last name, and then <br> 
g. The name of the company they are working with, all capitalized with no spaces.

In [19]:
query_to_df(
"WITH \
table1 AS (SELECT LOWER(LEFT(primary_poc, POSITION(' ' IN primary_poc)-1)) first_name, \
LOWER(RIGHT(primary_poc, LENGTH(primary_poc)-POSITION(' ' IN primary_poc))) last_name, \
UPPER(REPLACE(name, ' ', '')) company FROM accounts), \
\
table2 AS (SELECT *, CONCAT(LEFT(first_name, 1), RIGHT(first_name, 1), \
LEFT(last_name, 1), RIGHT(last_name, 1), LENGTH(first_name), LENGTH(last_name), company) signature FROM table1) \
\
SELECT * FROM table2;"
)

Query ran for 0.46733903884887695 secs!


Unnamed: 0,first_name,last_name,company,signature
0,tamara,tuma,WALMART,tata64WALMART
1,sung,shields,EXXONMOBIL,sgss47EXXONMOBIL
2,jodee,lupo,APPLE,jelo54APPLE
3,serafina,banda,BERKSHIREHATHAWAY,saba85BERKSHIREHATHAWAY
4,angeles,crusoe,MCKESSON,asce76MCKESSON
...,...,...,...,...
346,buffy,azure,KKR,byae55KKR
347,esta,engelhardt,ONEOK,eaet410ONEOK
348,khadijah,riemann,NEWMONTMINING,khrn87NEWMONTMINING
349,deanne,hertlein,PPL,dehn68PPL


## Cast
## Casting with ::

You can change a string to a date using CAST. CAST is actually useful to change lots of column types. Commonly you might use CAST to change a string to a Datetime object or number. In the reverse, if we want to change a number to a string, whatever operations

**Expert Tip**
Most of the functions presented in this lesson are specific to strings. They won’t work with dates, integers or floating-point numbers. However, using any of these functions will automatically change the data to the appropriate type.

LEFT, RIGHT, and TRIM are all used to select only certain elements of strings, but using them to select elements of a number or date will treat them as strings for the purpose of the function. Though we didn't cover TRIM in this lesson explicitly, it can be used to remove characters from the beginning and end of a string. This can remove unwanted spaces at the beginning or end of a row that often happen with data being moved from Excel or other storage systems.

There are a number of variations of these functions, as well as several other string functions not covered here. Different databases use subtle variations on these functions, so be sure to look up the appropriate database’s syntax if you’re connected to a private database.The Postgres literature contains a lot of the related functions.

**CAST Quizzes:**<br>
For this set of quiz questions, you are going to be working with a single table from a different database. This is a different database than Parch & Posey. We shall use the San-Francisco crime-data for this exercise

In [20]:
# Let's tell MySQL that we want to use a different DataBase, the crime_data Database

query_to_df(
"USE crime_data;"
)

Query ran for 0.0 secs!


In [21]:
# Let's see the tables in the crime_data database

# let's run the show tables command 

cursor.execute('show tables')
out = cursor.fetchall()
out

[('sf_crime_data',)]

In [22]:
# Let's see the first few rows of the sf_crime_data table

query_to_df(
"SELECT * FROM sf_crime_data LIMIT 5;"
)

Query ran for 0.15931248664855957 secs!


Unnamed: 0,id,incidnt_num,category,descript,day_of_week,dates,times,pd_district,resolution,address,lon,lat,location
0,1,140000000,VEHICLE THEFT,STOLEN AND RECOVERED VEHICLE,Friday,01/31/2014 08:00:00 AM +0000,0 days 17:00:00,INGLESIDE,NONE,0 Block of GARRISON AV,-122.413628,37.709724,"(37.709725805163, -122.413623946206)"
1,2,140000000,ASSAULT,BATTERY,Friday,01/31/2014 08:00:00 AM +0000,0 days 17:45:00,TARAVAL,"ARREST, CITED",100 Block of FONT BL,-122.473709,37.715488,"(37.7154876086057, -122.47370623066)"
2,3,140000000,SUSPICIOUS OCC,SUSPICIOUS OCCURRENCE,Friday,01/31/2014 08:00:00 AM +0000,0 days 15:30:00,PARK,NONE,0 Block of CASTRO ST,-122.435722,37.768688,"(37.7686887134351, -122.435718550322)"
3,4,140000000,OTHER OFFENSES,"DRIVERS LICENSE, SUSPENDED OR REVOKED",Friday,01/31/2014 08:00:00 AM +0000,0 days 17:50:00,CENTRAL,"ARREST, CITED",JEFFERSON ST / POWELL ST,-122.412529,37.808624,"(37.8086250595467, -122.412527239682)"
4,5,140000000,DRUG/NARCOTIC,POSSESSION OF NARCOTICS PARAPHERNALIA,Friday,01/31/2014 08:00:00 AM +0000,0 days 19:20:00,SOUTHERN,"ARREST, BOOKED",0 Block of GRACE ST,-122.414635,37.775082,"(37.7750814399634, -122.414633686589)"


**Let's clean the dates column to the format that dates should use in SQL**

In [23]:
query_to_df(
"WITH \
t1 AS (SELECT dates, SUBSTR(dates, 1, 10) AS date1 FROM sf_crime_data), \
t2 AS (SELECT dates, CONCAT(RIGHT(date1, 4), '-', LEFT(date1, 2), '-', SUBSTR(date1, 4, 2)) new_date FROM t1), \
t3 AS (SELECT dates, CAST(new_date AS date) new_date FROM t2) \
SELECT * FROM t3 LIMIT 5;"
)

Query ran for 0.0059854984283447266 secs!


Unnamed: 0,dates,new_date
0,01/31/2014 08:00:00 AM +0000,2014-01-31
1,01/31/2014 08:00:00 AM +0000,2014-01-31
2,01/31/2014 08:00:00 AM +0000,2014-01-31
3,01/31/2014 08:00:00 AM +0000,2014-01-31
4,01/31/2014 08:00:00 AM +0000,2014-01-31


## COALESCE:

In general, COALESCE returns the first non-NULL value passed for each row. It's simply a function to replace NULL values with a certain value for each row.

In [24]:
# Let's change back to Parch-and-Posey

query_to_df(
"USE parch_and_posey;"
)

Query ran for 0.0 secs!


In [25]:
# Let's see the tables again

query_to_df(
"SHOW TABLES;"
)

Query ran for 0.005975961685180664 secs!


Unnamed: 0,Tables_in_parch_and_posey
0,accounts
1,orders
2,region
3,sales_reps
4,web_events


In [26]:
query_to_df(
"SELECT * FROM accounts a LEFT JOIN orders o ON a.id = o.account_id WHERE o.total IS NULL;"
)

Query ran for 0.06282496452331543 secs!


Unnamed: 0,id,name,website,lat,longs,primary_poc,sales_rep_id,account_id,occurred_at,standard_qty,gloss_qty,poster_qty,total,standard_amt_usd,gloss_amt_usd,poster_amt_usd,total_amt_usd
0,,Goldman Sachs Group,www.gs.com,40.75744399,-73.96730918,Loris Manfredi,321690,,,,,,,,,,


In [27]:
query_to_df(
"SELECT COALESCE(o.id, a.id) filled_id, a.name, a.website, a.lat, a.longs, a.primary_poc, a.sales_rep_id, o.* \
FROM accounts a LEFT JOIN orders o ON a.id = o.account_id WHERE o.total IS NULL;"
)

Query ran for 0.08676481246948242 secs!


Unnamed: 0,filled_id,name,website,lat,longs,primary_poc,sales_rep_id,id,account_id,occurred_at,standard_qty,gloss_qty,poster_qty,total,standard_amt_usd,gloss_amt_usd,poster_amt_usd,total_amt_usd
0,1731,Goldman Sachs Group,www.gs.com,40.75744399,-73.96730918,Loris Manfredi,321690,,,,,,,,,,,


In [28]:
query_to_df(
"SELECT COALESCE(o.id, a.id) id, a.name, a.website, a.lat, a.longs, a.primary_poc, a.sales_rep_id, \
COALESCE(o.account_id, a.id) account_id, o.occurred_at, o.standard_qty, o.gloss_qty, o.poster_qty, o.total, \
o.standard_amt_usd, o.gloss_amt_usd, o.poster_amt_usd, o.total_amt_usd FROM accounts a LEFT JOIN orders o \
ON a.id = o.account_id WHERE o.total IS NULL;"
)

Query ran for 0.10770726203918457 secs!


Unnamed: 0,id,name,website,lat,longs,primary_poc,sales_rep_id,account_id,occurred_at,standard_qty,gloss_qty,poster_qty,total,standard_amt_usd,gloss_amt_usd,poster_amt_usd,total_amt_usd
0,1731,Goldman Sachs Group,www.gs.com,40.75744399,-73.96730918,Loris Manfredi,321690,1731,,,,,,,,,


In [29]:
query_to_df(
"SELECT COALESCE(o.id, a.id) filled_id, a.name, a.website, a.lat, a.longs, a.primary_poc, a.sales_rep_id, \
COALESCE(o.account_id, a.id) account_id, o.occurred_at, COALESCE(o.standard_qty, 0) standard_qty, \
COALESCE(o.gloss_qty,0) gloss_qty, COALESCE(o.poster_qty,0) poster_qty, COALESCE(o.total,0) total, \
COALESCE(o.standard_amt_usd,0) standard_amt_usd, COALESCE(o.gloss_amt_usd,0) gloss_amt_usd, \
COALESCE(o.poster_amt_usd,0) poster_amt_usd, COALESCE(o.total_amt_usd,0) total_amt_usd FROM accounts a \
LEFT JOIN orders o ON a.id = o.account_id WHERE o.total IS NULL;"
)

Query ran for 0.10502767562866211 secs!


Unnamed: 0,filled_id,name,website,lat,longs,primary_poc,sales_rep_id,account_id,occurred_at,standard_qty,gloss_qty,poster_qty,total,standard_amt_usd,gloss_amt_usd,poster_amt_usd,total_amt_usd
0,1731,Goldman Sachs Group,www.gs.com,40.75744399,-73.96730918,Loris Manfredi,321690,1731,,0,0,0,0,0.0,0.0,0.0,0.0


In [30]:
query_to_df(
"SELECT COUNT(*) FROM accounts a LEFT JOIN orders o ON a.id = o.account_id;"
)

Query ran for 0.009973526000976562 secs!


Unnamed: 0,COUNT(*)
0,6913


In [31]:
query_to_df(
"SELECT COALESCE(o.id, a.id) filled_id, a.name, a.website, a.lat, a.longs, a.primary_poc, a.sales_rep_id, \
COALESCE(o.account_id, a.id) account_id, o.occurred_at, COALESCE(o.standard_qty, 0) standard_qty, \
COALESCE(o.gloss_qty,0) gloss_qty, COALESCE(o.poster_qty,0) poster_qty, COALESCE(o.total,0) total, \
COALESCE(o.standard_amt_usd,0) standard_amt_usd, COALESCE(o.gloss_amt_usd,0) gloss_amt_usd, \
COALESCE(o.poster_amt_usd,0) poster_amt_usd, COALESCE(o.total_amt_usd,0) total_amt_usd \
FROM accounts a LEFT JOIN orders o ON a.id = o.account_id;"
)

Query ran for 64.12787938117981 secs!


Unnamed: 0,filled_id,name,website,lat,longs,primary_poc,sales_rep_id,account_id,occurred_at,standard_qty,gloss_qty,poster_qty,total,standard_amt_usd,gloss_amt_usd,poster_amt_usd,total_amt_usd
0,4318,Walmart,www.walmart.com,40.23849561,-75.10329704,Tamara Tuma,321500,1001,2016-11-25 23:19:37,485,543,177,1205,2420.15,4067.07,1437.24,7924.46
1,4317,Walmart,www.walmart.com,40.23849561,-75.10329704,Tamara Tuma,321500,1001,2016-09-26 23:22:47,507,614,226,1347,2529.93,4598.86,1835.12,8963.91
2,4316,Walmart,www.walmart.com,40.23849561,-75.10329704,Tamara Tuma,321500,1001,2016-08-28 06:50:58,557,572,255,1384,2779.43,4284.28,2070.60,9134.31
3,4315,Walmart,www.walmart.com,40.23849561,-75.10329704,Tamara Tuma,321500,1001,2016-07-30 03:21:57,457,532,249,1238,2280.43,3984.68,2021.88,8286.99
4,4314,Walmart,www.walmart.com,40.23849561,-75.10329704,Tamara Tuma,321500,1001,2016-05-31 21:09:48,531,603,209,1343,2649.69,4516.47,1697.08,8863.24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6908,6812,United Natural Foods,www.unfi.com,36.17010987,-115.14713633,Savanna Gayman,321920,4341,2014-06-05 01:04:56,18,42,106,166,89.82,314.58,860.72,1265.12
6909,6811,United Natural Foods,www.unfi.com,36.17010987,-115.14713633,Savanna Gayman,321920,4341,2014-04-07 22:39:01,0,78,51,129,0.00,584.22,414.12,998.34
6910,6810,United Natural Foods,www.unfi.com,36.17010987,-115.14713633,Savanna Gayman,321920,4341,2014-03-08 15:33:55,0,25,39,64,0.00,187.25,316.68,503.93
6911,6809,United Natural Foods,www.unfi.com,36.17010987,-115.14713633,Savanna Gayman,321920,4341,2014-02-07 16:28:21,0,53,17,70,0.00,396.97,138.04,535.01


In [32]:
# Change False to True to end the connection when done.

if True and connection.is_connected():
    cursor.close()
    connection.close()
    print(f'Connection to Database: {record} Closed!')

Connection to Database: ('parch_and_posey',) Closed!
