SELECTING DATA LAB

A primary key is a column (or combination of columns) that uniquely identifies each row in a table. That is, no two rows can share the same value in the primary key column.

You'll see that the columns that make up the primary key of one table can also appear in other tables. There, such a column is known as a foreign key, i.e. the primary key from a different ("foreign") table. This is the core idea of how data in different tables is associated in a relational database. If you were told a specific customerNumber, and then given a list of order data that included the customerNumber, you could determine which orders were placed by that customer by matching up the primary and foreign keys.
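
As a rough illustration of matching primary and foreign keys, the sketch below builds two tiny in-memory tables and looks up the orders for one customer. The table layouts and rows are invented for this example and are not the lab's actual database.

```python
import sqlite3

# Hypothetical miniature tables, invented for illustration only
conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
cur_demo.executescript("""
    CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
    CREATE TABLE orders (
        orderNumber INTEGER PRIMARY KEY,
        customerNumber INTEGER REFERENCES customers(customerNumber)  -- foreign key
    );
    INSERT INTO customers VALUES (101, 'Acme'), (102, 'Globex');
    INSERT INTO orders VALUES (1, 101), (2, 102), (3, 101);
""")

# Orders placed by customer 101: match the foreign key against the known primary key
cur_demo.execute(
    "SELECT orderNumber FROM orders WHERE customerNumber = ? ORDER BY orderNumber;",
    (101,),
)
orders_for_101 = cur_demo.fetchall()
print(orders_for_101)
conn_demo.close()
```

Each row comes back as a tuple, so the order numbers arrive as one-element tuples.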

Once you're connected to the database, you can then read and select data from the database, or even write data to the database. To retrieve data from one or more tables you usually use a SELECT statement. 

For now, just notice that queries start with the SELECT clause, followed by what you want to select. If selecting multiple columns, you separate them with commas. Then you specify where that data is being retrieved from using the FROM clause followed by the table name. Afterward, you can provide conditions such as filters or limits on the amount of data returned.

Cursor objects allow you to keep track of which result set is which since it's possible to run multiple queries before you're done fetching the results of the first.

In [None]:
# connect database and create cursor here
import sqlite3 
conn = sqlite3.connect('data.sqlite')
cur = conn.cursor()

In [None]:
cur.execute("""SELECT * FROM employees LIMIT 5;""")

The execute command itself only returns the cursor object. To see the results, you must use the fetchall method afterwards.

In [None]:
cur.fetchall()

#  or just do 

cur.execute("""SELECT * FROM employees LIMIT 5;""").fetchall()

Often, a more convenient output is a pandas DataFrame. To get one, you simply wrap the cur.fetchall() output with the pandas DataFrame constructor:

In [None]:
import pandas as pd

cur.execute("""SELECT * FROM employees LIMIT 5;""")
df = pd.DataFrame(cur.fetchall())
df.head()

We end up with a DataFrame whose column labels are just numbers. To get the real column names, we use cur.description, which holds one tuple per column; the first item of each tuple is the column name.

In [None]:
cur.execute("""SELECT * FROM employees LIMIT 5;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df.head()
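
To see what cur.description holds without pandas, here is a minimal sketch. The two-column employees table and its single row are stand-ins made up for the example, not the lab's data.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Hypothetical two-column table standing in for employees
cur_demo.execute("CREATE TABLE employees (lastName TEXT, firstName TEXT);")
cur_demo.execute("INSERT INTO employees VALUES ('Murphy', 'Diane');")

cur_demo.execute("SELECT * FROM employees;")
rows = cur_demo.fetchall()

# description has one 7-item tuple per column; item 0 is the column name
column_names = [col[0] for col in cur_demo.description]
# Pair each row's values with the column names
records = [dict(zip(column_names, row)) for row in rows]
print(column_names)
print(records)
conn_demo.close()
```

The same list of names is what gets assigned to df.columns in the cell above.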

In general, the WHERE clause filters query results by some condition. You can also combine multiple conditions. Here we keep only the rows whose city column is 'Boston':

In [None]:
cur.execute("""SELECT * FROM customers WHERE city = 'Boston';""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

We can add OR or AND to combine multiple conditions:

In [None]:
cur.execute("""SELECT * FROM customers WHERE city = 'Boston' OR city = 'Madrid';""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df
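
When several OR conditions test the same column, IN is a shorter way to write them. The sketch below also passes the values through ? placeholders instead of typing them into the SQL string. The small customers table is invented for the example.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Hypothetical rows, for illustration only
cur_demo.execute("CREATE TABLE customers (customerName TEXT, city TEXT);")
cur_demo.executemany(
    "INSERT INTO customers VALUES (?, ?);",
    [("A", "Boston"), ("B", "Madrid"), ("C", "Paris")],
)

# city IN (?, ?) behaves like city = ? OR city = ?
cities = ("Boston", "Madrid")
cur_demo.execute(
    "SELECT customerName FROM customers WHERE city IN (?, ?) ORDER BY customerName;",
    cities,
)
matches = cur_demo.fetchall()
print(matches)
conn_demo.close()
```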

Two additional keywords that you can use to refine your searches are the ORDER BY and LIMIT clauses. The ORDER BY clause allows you to sort the results by a particular column. For example, you could sort by the customerName column if you wished to get results in alphabetical order. By default, ORDER BY is ascending. So, as in the example below, if you want the opposite, add the keyword DESC. Finally, the LIMIT clause is typically the last clause in a SQL query and simply limits the output to a set number of results.

In [None]:
cur.execute("""SELECT customerNumber, customerName, city, creditLimit
               FROM customers
               WHERE (city = 'Boston' OR city = 'Madrid') AND (creditLimit >= 50000.00)
               ORDER BY creditLimit DESC
               LIMIT 15
               ;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

Make sure to check the types of the returned values if a comparison or calculation does not behave the way it should. Sometimes numbers come back as strings.

In [None]:
type(df.creditLimit.iloc[0]) #to check the type
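
If a number does come back as a string, one option is to convert it inside the query with CAST. This is a sketch under the assumption that the column was stored as TEXT; the table below is made up to reproduce that situation.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Hypothetical table where the credit limit was stored as TEXT
cur_demo.execute("CREATE TABLE customers (customerName TEXT, creditLimit TEXT);")
cur_demo.execute("INSERT INTO customers VALUES ('A', '50000.00');")

# Select the raw value next to the value converted to a floating point number
cur_demo.execute("SELECT creditLimit, CAST(creditLimit AS REAL) FROM customers;")
raw_value, cast_value = cur_demo.fetchone()
print(type(raw_value), type(cast_value))
conn_demo.close()
```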

TEST LAB

In [None]:
import sqlite3
import pandas as pd
conn = sqlite3.connect('planets.db')
cur = conn.cursor()

In [None]:
cur.execute(""" SELECT name, color FROM planets; """)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]  # x[0] is the column name in each entry of cur.description
df

To get a DataFrame with only the columns you want, name them in the SELECT clause:

In [None]:
cur.execute(""" SELECT name, color FROM planets; """)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

To get all the columns, use SELECT * (here filtered to planets with a mass greater than 1.00):

In [None]:
cur.execute(""" SELECT * FROM planets WHERE mass  > 1.00 ; """)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

A simple WHERE condition on a numeric column looks like this:

In [None]:
cur.execute(""" SELECT name, color FROM planets WHERE num_of_moons > 10 ; """)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

Values that are not numbers must be written as quoted strings in the condition:

In [None]:
cur.execute(""" SELECT name, color FROM planets WHERE 
            color = 'blue' OR 
            color = 'dark blue' OR 
            color = 'light blue' 
            ; """)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

Here we combine ORDER BY, DESC, and LIMIT to get the four most massive planets among those with rings = 0:

In [None]:
cur.execute(""" SELECT name, color, num_of_moons 
            FROM planets 
            WHERE rings = 0 
            ORDER BY mass DESC
            LIMIT 4; """)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

FILTERING AND ORDERING

The first query modifier you'll explore is ORDER BY. This modifier allows us to order the table rows returned by a certain SELECT statement. Here's a boilerplate SELECT statement that uses ORDER BY:


SELECT column_name FROM table_name ORDER BY column_name ASC|DESC;

Let's select our cats and order them by age:

In [None]:
cur.execute('''SELECT * FROM cats ORDER BY age;''').fetchall()

Here we get all the information about the cats, ordered by age. Ascending order is the default, so adding ASC explicitly gives the same result:

In [None]:
cur.execute('''SELECT * FROM cats ORDER BY age ASC;''').fetchall()

What if you want the oldest cat? If you want to select extremes from a database table (for example, the employee with the highest paycheck or the patient with the most recent appointment), you can use ORDER BY in conjunction with LIMIT.

Adding LIMIT 1 to the query, or calling .fetchone() instead of .fetchall(), returns only the first row of the sorted result.
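
A minimal sketch of finding the oldest cat both ways. The cats table here has only two columns and the ages are invented for the example.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Hypothetical cats and ages, for illustration only
cur_demo.execute("CREATE TABLE cats (name TEXT, age INTEGER);")
cur_demo.executemany(
    "INSERT INTO cats VALUES (?, ?);",
    [("Maru", 3), ("Hana", 1), ("Moe", 10)],
)

# Option 1: let SQL cut the sorted result down to one row
oldest_via_limit = cur_demo.execute(
    "SELECT name, age FROM cats ORDER BY age DESC LIMIT 1;"
).fetchall()

# Option 2: sort everything, but fetch only the first row in Python
oldest_via_fetchone = cur_demo.execute(
    "SELECT name, age FROM cats ORDER BY age DESC;"
).fetchone()

print(oldest_via_limit)
print(oldest_via_fetchone)
conn_demo.close()
```

Note that fetchall() returns a list of rows even when there is only one, while fetchone() returns the row itself.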

To select rows whose value falls within a range, use BETWEEN, which includes both endpoints:

SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value1 AND value2;



In [None]:
cur.execute("""SELECT name FROM cats WHERE age BETWEEN 1 AND 3;""").fetchall()
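
BETWEEN includes both of its endpoints, which the following sketch shows on a made-up cats table (the names and ages are assumptions for the example).

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Hypothetical cats and ages, for illustration only
cur_demo.execute("CREATE TABLE cats (name TEXT, age INTEGER);")
cur_demo.executemany(
    "INSERT INTO cats VALUES (?, ?);",
    [("Maru", 3), ("Hana", 1), ("Moe", 10), ("Patches", 2)],
)

# Ages 1 and 3 are both included in the result
in_range = cur_demo.execute(
    "SELECT name, age FROM cats WHERE age BETWEEN 1 AND 3 ORDER BY age;"
).fetchall()
print(in_range)
conn_demo.close()
```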

Some cats were added to the database without a name. Let's find them with:

SELECT * FROM cats WHERE name IS NULL;
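
The test has to be written with IS NULL, because comparing with = NULL never matches any row. A small sketch on an invented two-row table:

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Hypothetical rows: one named cat and one without a name
cur_demo.execute("CREATE TABLE cats (name TEXT, age INTEGER);")
cur_demo.executemany("INSERT INTO cats VALUES (?, ?);", [("Maru", 3), (None, 4)])

# IS NULL finds the unnamed cat
found_with_is_null = cur_demo.execute(
    "SELECT age FROM cats WHERE name IS NULL;"
).fetchall()

# = NULL is never true, so nothing comes back
found_with_equals = cur_demo.execute(
    "SELECT age FROM cats WHERE name = NULL;"
).fetchall()

print(found_with_is_null)
print(found_with_equals)
conn_demo.close()
```

Python's None is stored as NULL, and NULL comes back as None, as in the (None,) row shown further down.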

Now, let's talk about the SQL aggregate function COUNT.

SQL aggregate functions can get the average of a column's values, retrieve the minimum and maximum values from a column, sum the values in a column, or count the number of records that meet certain conditions.

For now, we'll just focus on COUNT, which counts the number of records that meet a certain condition. Here's a standard SQL query using COUNT:

SELECT COUNT([column name]) FROM [table name] WHERE [column name] = [value]

Let's try it out and count the number of cats who have an owner_id of 1:

SELECT COUNT(owner_id) FROM cats WHERE owner_id = 1;

In [None]:
cur.execute("""SELECT COUNT(owner_id) FROM cats WHERE owner_id = 1;""").fetchall()  # returns a single row holding the integer count

What if you had a larger database where you couldn't just tally up the number of cats of each breed by hand? That's where GROUP BY comes in handy.

In [None]:
cur.execute("""SELECT breed, COUNT(breed) FROM cats GROUP BY breed;""").fetchall()

This returns the list [('American Shorthair', 1), ('Calico', 1), ('Scottish Fold', 1), ('Tabby', 3)].

GROUP BY is a great clause for aggregating results into different segments. You can even use it on multiple columns!

In [None]:
cur.execute("""SELECT breed, owner_id, COUNT(breed) FROM cats GROUP BY breed, owner_id;""").fetchall()
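
One detail worth knowing when counting per group: COUNT(column) skips NULL values, while COUNT(*) counts every row. The sketch below uses an invented cats table with one unnamed cat to show the difference.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Hypothetical rows, including one cat with no name
cur_demo.execute("CREATE TABLE cats (name TEXT, breed TEXT);")
cur_demo.executemany(
    "INSERT INTO cats VALUES (?, ?);",
    [("Maru", "Scottish Fold"), ("Hana", "Tabby"), ("Moe", "Tabby"), (None, "Tabby")],
)

# COUNT(*) counts rows per breed; COUNT(name) counts only non-NULL names
counts = cur_demo.execute(
    "SELECT breed, COUNT(*), COUNT(name) FROM cats GROUP BY breed ORDER BY breed;"
).fetchall()
print(counts)
conn_demo.close()
```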

We are now familiar with this syntax:

SELECT name FROM cats;

However, you may not know that this can be written like this as well:

SELECT cats.name FROM cats;

Both return:

[('Maru',), ('Hana',), ("Lil' Bub",), ('Moe',), ('Patches',), (None,)] 

SQLite allows us to explicitly state the tableName.columnName you want to select. This is particularly useful when you want data from two different tables.

Imagine you have another table called dogs with a column containing all of the dog names:

CREATE TABLE dogs (
    id INTEGER PRIMARY KEY,
    name TEXT
);

INSERT INTO dogs (name) VALUES ('Clifford');

If you want to get the names of all the dogs and cats, you can no longer run a query with just the column name. SELECT name FROM cats,dogs; will return Error: ambiguous column name: name.

Instead, you must explicitly follow the tableName.columnName syntax.

SELECT cats.name, dogs.name FROM cats, dogs;
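
Here is a small sketch of both the error and the qualified query, run from Python. The tables are cut down to a single name column and the rows are chosen for the example. Keep in mind that listing two tables after FROM pairs every cat with every dog.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
cur_demo = conn_demo.cursor()
# Simplified stand-ins for the cats and dogs tables
cur_demo.execute("CREATE TABLE cats (name TEXT);")
cur_demo.execute("CREATE TABLE dogs (name TEXT);")
cur_demo.executemany("INSERT INTO cats VALUES (?);", [("Maru",), ("Hana",)])
cur_demo.execute("INSERT INTO dogs VALUES ('Clifford');")

# Without a table prefix SQLite cannot tell which name column is meant
try:
    cur_demo.execute("SELECT name FROM cats, dogs;")
    error_message = None
except sqlite3.OperationalError as err:
    error_message = str(err)

# With tableName.columnName the query runs and pairs each cat with each dog
pairs = cur_demo.execute(
    "SELECT cats.name, dogs.name FROM cats, dogs ORDER BY cats.name;"
).fetchall()

print(error_message)
print(pairs)
conn_demo.close()
```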

In [None]:
# Shape of the nested structure used below:
# [{role: [{name: {position: [{stat: value, ...}]}}]}]

In [173]:
ListDictListDict = [
    {'Hitters': [{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}},
                 {'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}},
                 {'Devon': {'Outside': [{'Blocks': 3, 'Kills': 6, 'Aces': 3}]}},
                 {'Liz': {'Middle': [{'Block': 2, 'Kills': 2, 'Aces': 2}]}}]},
    {'Setters': [{'Sarah': {'Setter': [{'Sets': 30}]}}]},
    {'Libero': [{'Tom': {'Libero': [{'Digs': 20}]}}]},
]

In [2]:
Guy = [{'Hitters': [{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}}]}]

In [23]:
ListDictListDict[0]['Hitters'][1]['Leo']['Outside'][0]['Kills']

17

In [180]:
Y[0].items()

dict_items([('Hitters', [{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}}, {'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}}, {'Devon': {'Outside': [{'Blocks': 3, 'Kills': 6, 'Aces': 3}]}}, {'Liz': {'Middle': [{'Block': 2, 'Kills': 2, 'Aces': 2}]}}])])

In [174]:
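Y = ListDictListDict
# dict.iteritems() exists only in Python 2; in Python 3 it raises
# AttributeError: 'dict' object has no attribute 'iteritems', so use items()
for key, value in Y[0].items():
    print("{0} = {1}".format(key, value))

Hitters = [{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}}, {'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}}, {'Devon': {'Outside': [{'Blocks': 3, 'Kills': 6, 'Aces': 3}]}}, {'Liz': {'Middle': [{'Block': 2, 'Kills': 2, 'Aces': 2}]}}]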

When working through a list of dictionaries (or a dictionary of lists), keep an eye on the square brackets and curly braces. Every time you see a square bracket you are looking at a list, so you need an index number to pull an element out of it. If it is a dictionary, check its keys, values, and items first so that you know which key to use at the next level down. This matters most with large nested structures, where the data is too big to inspect by eye.
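
One way to practice this is to walk the structure one level at a time, using a loop over elements for each list and a loop over items() for each dictionary. The sketch below uses a shortened copy of the team data (the variable names team and flat are made up for the example).

```python
# Shortened copy of the nested team data, for illustration only
team = [
    {'Hitters': [{'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}}]},
    {'Setters': [{'Sarah': {'Setter': [{'Sets': 30}]}}]},
]

flat = []
for group in team:                                  # list of dicts
    for role, players in group.items():             # dict: role -> list
        for player in players:                      # list of dicts
            for name, positions in player.items():  # dict: name -> dict
                for position, stat_list in positions.items():  # dict: position -> list
                    for stats in stat_list:         # list of dicts
                        for stat, value in stats.items():      # dict: stat -> number
                            flat.append((role, name, position, stat, value))

for row in flat:
    print(row)
```

Each level of nesting gets exactly one loop, which makes it easier to see where a bracket or brace was missed.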

In [115]:
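def findkeys():
    # Despite the name, this returns the first value of Y[0] (the list of
    # hitters), because return exits the loop on its first pass
    for x in Y[0].values():
        return x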

In [116]:
findkeys()

[{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}},
 {'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}},
 {'Devon': {'Outside': [{'Blocks': 3, 'Kills': 6, 'Aces': 3}]}},
 {'Liz': {'Middle': [{'Block': 2, 'Kills': 2, 'Aces': 2}]}}]

In [151]:
book = {
        'title': 'The Great Gatsby',
        'author': 'F. Scott Fitzgerald',
        'date_published': 1925,
        'in_stock': True
}

# items() yields (key, value) tuples
for pair in book.items():
    print(pair)

('title', 'The Great Gatsby')
('author', 'F. Scott Fitzgerald')
('date_published', 1925)
('in_stock', True)


In [134]:
# Y is a list, so x is each dictionary in it, not an index
for x in Y:
    print(list(x.keys()))

['Hitters']
['Setters']
['Libero']

In [126]:
for x in ListDictListDict[0]['Hitters'][0]['Guy']['Middle']:
    y = list(x.items())
    #print(x)
    print(y)

[('Blocks', 16), ('Kills', 10), ('Aces', 6)]


In [157]:
for x in Y[0]['Hitters']:
    print(x)

{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}}
{'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}}
{'Devon': {'Outside': [{'Blocks': 3, 'Kills': 6, 'Aces': 3}]}}
{'Liz': {'Middle': [{'Block': 2, 'Kills': 2, 'Aces': 2}]}}


In [190]:
consoles =  {'Microsoft': {'name': 'Xbox', 'price': 500},
            'Sony': {'name': 'Playstation', 'price': 550},
            'Nintendo': {'name': 'Switch', 'price': 450}}

In [192]:
for z in consoles:
    print(z)

Microsoft
Sony
Nintendo


In [191]:
for x in consoles:
    print(consoles[x])

{'name': 'Xbox', 'price': 500}
{'name': 'Playstation', 'price': 550}
{'name': 'Switch', 'price': 450}


In [193]:
consoles.items()

dict_items([('Microsoft', {'name': 'Xbox', 'price': 500}), ('Sony', {'name': 'Playstation', 'price': 550}), ('Nintendo', {'name': 'Switch', 'price': 450})])

In [197]:
consoles.keys()

dict_keys(['Microsoft', 'Sony', 'Nintendo'])

In [201]:
consoles.values()

dict_values([{'name': 'Xbox', 'price': 500}, {'name': 'Playstation', 'price': 550}, {'name': 'Switch', 'price': 450}])

In [196]:
for i in consoles:
    print(consoles[i]['price'])

500
550
450


In [203]:
consolesexpanded =  {'Microsoft': {'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'], 'price': 500},
                    'Sony': {'name': ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'], 'price': 550},
                    'Nintendo': {'name': ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'], 'price': 450}}

In [204]:
CE = consolesexpanded

In [205]:
for z in CE:
    print(z)

Microsoft
Sony
Nintendo


In [210]:
for x in CE:
    print(CE[x])

{'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'], 'price': 500}
{'name': ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'], 'price': 550}
{'name': ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'], 'price': 450}


In [207]:
CE.items()

dict_items([('Microsoft', {'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'], 'price': 500}), ('Sony', {'name': ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'], 'price': 550}), ('Nintendo', {'name': ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'], 'price': 450})])

In [208]:
for i in CE:
    print(CE[i]['name'])

['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X']
['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5']
['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch']


In [214]:
for i in CE:
    print(CE[i]['name'][0])

Xbox
Playstation
N64


In [331]:
for i in CE:
    for x in CE[i]:
        print(x)

name
price
name
price
name
price


In [256]:
consolesexpandedagain =  {'Microsoft': {'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'], 'price': [350, 400, 450, 500]},
                         'Sony': {'name': ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'], 'price': [350, 400, 450, 500, 550]},
                         'Nintendo': {'name': ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'], 'price': [250, 300, 350, 400, 450]}}


In [257]:
CEA = consolesexpandedagain

In [330]:
CEA['Microsoft']

{'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'],
 'price': [350, 400, 450, 500]}

In [321]:
CEA.items()

dict_items([('Microsoft', {'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'], 'price': [350, 400, 450, 500]}), ('Sony', {'name': ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'], 'price': [350, 400, 450, 500, 550]}), ('Nintendo', {'name': ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'], 'price': [250, 300, 350, 400, 450]})])

In [322]:
CEA.values()

dict_values([{'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'], 'price': [350, 400, 450, 500]}, {'name': ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'], 'price': [350, 400, 450, 500, 550]}, {'name': ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'], 'price': [250, 300, 350, 400, 450]}])

In [326]:
CEA.keys()

dict_keys(['Microsoft', 'Sony', 'Nintendo'])

In [233]:
for i in CEA:
    for x in CEA[i]['price']:
        print(i, x)

Microsoft 350
Microsoft 400
Microsoft 450
Microsoft 500
Sony 350
Sony 400
Sony 450
Sony 500
Sony 550
Nintendo 250
Nintendo 300
Nintendo 350
Nintendo 400
Nintendo 450


In [263]:
for x in CEA:
    print(x)

Microsoft
Sony
Nintendo


In [266]:

for x in CEA:
    print(CEA[x])

{'name': ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'], 'price': [350, 400, 450, 500]}
{'name': ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'], 'price': [350, 400, 450, 500, 550]}
{'name': ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'], 'price': [250, 300, 350, 400, 450]}


In [288]:
for i in CEA:
    for x in CEA[i].items():
        print(x)

('name', ['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'])
('price', [350, 400, 450, 500])
('name', ['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5'])
('price', [350, 400, 450, 500, 550])
('name', ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch'])
('price', [250, 300, 350, 400, 450])


In [296]:
for i in CEA:
    for x in CEA[i].values():
        print(x)

['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X']
[350, 400, 450, 500]
['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5']
[350, 400, 450, 500, 550]
['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch']
[250, 300, 350, 400, 450]


In [304]:
#games = []
for i in CEA:    
    for x in CEA[i].items():
        print(x[1])
        #games.append(x[1])

['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X']
[350, 400, 450, 500]
['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5']
[350, 400, 450, 500, 550]
['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch']
[250, 300, 350, 400, 450]


In [337]:
for i in CEA:
    for x in CEA[i]['name']:
        print(x)

Xbox
Xbox 360
Xbox One
Xbox X
Playstation
Playstation 2
Playstation 3
Playstation 4
Playstation 5
N64
Gamecube
Wii
Wii U
Switch


In [308]:
for i in CEA:
    for x in CEA[i]['price']:
        print(type(x))


<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>
<class 'int'>


In [309]:
for i in CEA:
    for x in CEA[i]['name']:
        print(type(x))

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>


In [340]:
for x in CEA:
    print(CEA[x]['name'])
    


['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X']
['Playstation', 'Playstation 2', 'Playstation 3', 'Playstation 4', 'Playstation 5']
['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch']


In [345]:
[CEA[x]['name'] for x in CEA]

[['Xbox', 'Xbox 360', 'Xbox One', 'Xbox X'],
 ['Playstation',
  'Playstation 2',
  'Playstation 3',
  'Playstation 4',
  'Playstation 5'],
 ['N64', 'Gamecube', 'Wii', 'Wii U', 'Switch']]

In [310]:
consoleprices = []
for i in CEA:
    for x in CEA[i]['price']:
        consoleprices.append(x)

In [347]:
print(type(CEA['Sony']['name']))

<class 'list'>


In [311]:
consoleprices

[350, 400, 450, 500, 350, 400, 450, 500, 550, 250, 300, 350, 400, 450]

In [312]:
consolenames = []
for i in CEA:
    for x in CEA[i]['name']:
        consolenames.append(x)

In [313]:
consolenames

['Xbox',
 'Xbox 360',
 'Xbox One',
 'Xbox X',
 'Playstation',
 'Playstation 2',
 'Playstation 3',
 'Playstation 4',
 'Playstation 5',
 'N64',
 'Gamecube',
 'Wii',
 'Wii U',
 'Switch']

In [317]:
def merge(list_1, list_2):
    # Pair up items by position, using the arguments rather than the global lists
    merged_list = tuple(zip(list_1, list_2))
    return merged_list

In [320]:
merge(consolenames, consoleprices)

(('Xbox', 350),
 ('Xbox 360', 400),
 ('Xbox One', 450),
 ('Xbox X', 500),
 ('Playstation', 350),
 ('Playstation 2', 400),
 ('Playstation 3', 450),
 ('Playstation 4', 500),
 ('Playstation 5', 550),
 ('N64', 250),
 ('Gamecube', 300),
 ('Wii', 350),
 ('Wii U', 400),
 ('Switch', 450))
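
The two flat lists are not strictly needed. As a sketch, the same name and price pairs can be built directly from the nested dictionary by zipping each maker's lists. The shortened dictionary and the names consoles_demo, pairs, and price_by_name are made up for this example.

```python
# Shortened stand-in for the nested console data
consoles_demo = {
    'Microsoft': {'name': ['Xbox', 'Xbox 360'], 'price': [350, 400]},
    'Nintendo': {'name': ['N64'], 'price': [250]},
}

# For each maker, zip its names with its prices position by position
pairs = [
    (name, price)
    for maker in consoles_demo.values()
    for name, price in zip(maker['name'], maker['price'])
]

# A list of (key, value) pairs converts straight into a lookup dictionary
price_by_name = dict(pairs)

print(pairs)
print(price_by_name['Xbox 360'])
```

Zipping inside each maker keeps a name from ever being paired with another maker's price, which can happen with two separately built flat lists if one of them is incomplete.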

In [None]:
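def pets_older_than(groomer_info, age):
    # Assumes groomer_info['pets'][0] is a dict whose values are lists of
    # dicts that each have an 'age' entry; collects the key of every match
    age_name = []
    for k,v in groomer_info['pets'][0].items():
        for d in v:
            if d['age'] > age:
                age_name.append(k)
    return age_name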
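
The notes do not show the data this function was written for, so the groomer_info below is a guess at a shape that fits the code: 'pets' holds a list whose first element maps each pet name to a list of records with an 'age'. The names and ages are hypothetical.

```python
def pets_older_than(groomer_info, age):
    # Same logic as above: collect the key for every record older than age
    age_name = []
    for k, v in groomer_info['pets'][0].items():
        for d in v:
            if d['age'] > age:
                age_name.append(k)
    return age_name

# Hypothetical data, shaped to match what the function indexes into
groomer_info = {
    'pets': [
        {'Fido': [{'age': 5}], 'Tom': [{'age': 2}], 'Rex': [{'age': 9}]}
    ]
}

older = pets_older_than(groomer_info, 3)
print(older)
```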


In [None]:
DictDict = {'Hitters': {'Leo': {'Outside': {'Blocks': 10}}},
            'Setters': {'Sarah': {'Setter': {'Sets': 30}}},
            'Libero': {'Tom': {'Libero': {'Digs': 20}}}}

In [169]:
# The trailing comma after the first dict makes DictListDict a one-element tuple;
# the two dicts that follow are separate expressions, not part of the assignment
DictListDict = {'Hitters': [{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}},
             {'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}},
             {'Devon': {'Outside': [{'Blocks': 3, 'Kills': 6, 'Aces': 3}]}},
             {'Liz': {'Middle': [{'Block': 2, 'Kills': 2, 'Aces': 2}]}}]},

{'Setters': {'Sarah': {'Setter': {'Sets': 30}}}},

{'Libero': {'Tom': {'Libero': {'Digs': 20}}}}

{'Libero': {'Tom': {'Libero': {'Digs': 20}}}}

In [170]:
DictListDict

({'Hitters': [{'Guy': {'Middle': [{'Blocks': 16, 'Kills': 10, 'Aces': 6}]}},
   {'Leo': {'Outside': [{'Blocks': 10, 'Kills': 17, 'Aces': 11}]}},
   {'Devon': {'Outside': [{'Blocks': 3, 'Kills': 6, 'Aces': 3}]}},
   {'Liz': {'Middle': [{'Block': 2, 'Kills': 2, 'Aces': 2}]}}]},)

In [172]:
DictListDict.keys

AttributeError: 'tuple' object has no attribute 'keys'
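
The error comes from the comma after the first closing brace: a value followed by a trailing comma is a one-element tuple, and tuples have no keys. A minimal sketch of the difference, using shortened made-up data:

```python
# A trailing comma after the dict turns the whole value into a 1-tuple
as_tuple = {'Hitters': ['Leo']},

# Putting all the roles inside one pair of braces gives a real dict
as_dict = {'Hitters': ['Leo'], 'Setters': ['Sarah'], 'Libero': ['Tom']}

print(type(as_tuple).__name__)
print(type(as_dict).__name__)
print(list(as_dict.keys()))
```

Checking type() on a freshly built structure is a quick way to catch this before trying to call dictionary methods on it.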