# Databases

The problems in this notebook expand upon the concepts covered in the notebook `Lectures/Data Collection/Data in Databases`.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

##### 1. Constructing a database table 

To better understand the structure of a relational database, it can help to build one yourself. In this problem we demonstrate how.

Imagine we are running a cat store that sells three products: cat food, cat treats, and cat toys. We want a database to keep track of which customers buy which products from us.

Below we import the package we will need and then we create a `cat_store_practice` database in this folder.

In [None]:
from sqlalchemy import create_engine

In [None]:
## making the engine
engine = create_engine("sqlite:///cat_store_practice.db")

In [None]:
## Connecting to the database
conn = engine.connect()

The syntax for creating a brand new table in `SQL` is `CREATE TABLE table_name(columns)`. We demonstrate this below.

In [None]:
from sqlalchemy import text

## CREATE TABLE is SQL code
## it creates a table with the given name, here products
## in parentheses we list the columns of our table
## along with each column's SQL data type
## The PRIMARY KEY line sets product_id as the
## primary key for this table
## Think of a primary key as roughly equivalent to a pandas DataFrame index:
## it uniquely identifies each row of the table
## Other tables can refer to these keys, which is how we link entries across tables
## SQLAlchemy 2.0+ requires raw SQL strings to be wrapped in text()
conn.execute(text("""CREATE TABLE products(
                    product_id int,
                    product text,
                    price real,
                    in_stock int,
                    PRIMARY KEY (product_id)
                )"""))


## We can now add our first product
conn.execute(text("INSERT INTO products VALUES (1,'Cat Food',12.50,10)"))
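
To see what the primary key buys us, here is a minimal sketch that uses Python's built-in `sqlite3` module on an in-memory database rather than `SQLAlchemy` and `cat_store_practice.db`, so nothing is written to this folder. It recreates the `products` table and shows that SQLite rejects a second row reusing `product_id` 1, while a row with a new id is accepted. The price and stock for 'Cat Treats' are made-up values.

```python
import sqlite3

# Assumption: the stdlib sqlite3 module and an in-memory database stand in
# for SQLAlchemy and cat_store_practice.db, so no file is created
demo_conn = sqlite3.connect(":memory:")

# Same schema as the products table above
demo_conn.execute("""CREATE TABLE products(
                        product_id int,
                        product text,
                        price real,
                        in_stock int,
                        PRIMARY KEY (product_id)
                    )""")
demo_conn.execute("INSERT INTO products VALUES (1,'Cat Food',12.50,10)")

# Reusing product_id 1 violates the primary key, so SQLite raises IntegrityError
# (the 'Cat Treats' price and stock are made-up values)
try:
    demo_conn.execute("INSERT INTO products VALUES (1,'Cat Treats',5.00,20)")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)

# A new product_id is accepted
demo_conn.execute("INSERT INTO products VALUES (2,'Cat Treats',5.00,20)")
rows = demo_conn.execute("SELECT * FROM products ORDER BY product_id").fetchall()
print(rows)  # [(1, 'Cat Food', 12.5, 10), (2, 'Cat Treats', 5.0, 20)]
demo_conn.close()
```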

Use one of the `fetch` methods (`fetchone`, `fetchmany`, or `fetchall`) to check the contents of the `products` table.

In [None]:
## code here



In [None]:
## close the connection
conn.close()

## dispose of the engine
engine.dispose()

del conn,engine

##### 2. `inspect`

You can use `get_table_names` to see which tables are in the database you have connected to. It is a method of the `Inspector` object returned by `SQLAlchemy`'s `inspect` function, <a href="https://docs.sqlalchemy.org/en/14/core/inspection.html">https://docs.sqlalchemy.org/en/14/core/inspection.html</a>.

Here we demonstrate.

In [None]:
## import inspect
from sqlalchemy import inspect

In [None]:
## create the engine then connect
engine = create_engine("sqlite:///cat_store_real.db")
conn = engine.connect()

In [None]:
## inspect(engine) returns an Inspector for the database,
## which provides get_table_names
inspect(engine).get_table_names()
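
SQLite also keeps its own catalog of a database's tables in a built-in table called `sqlite_master`. As a rough, SQLite-only analogue of `get_table_names` (not a description of how `SQLAlchemy`'s inspector is implemented), here is a minimal sketch using the standard-library `sqlite3` module on an in-memory database with two hypothetical empty tables.

```python
import sqlite3

# Assumption: an in-memory database with two hypothetical tables,
# created only so there is something to list
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE customers(customer_id int, name text, email text)")
demo_conn.execute("CREATE TABLE purchases(purchase_id int, customer_id int, pretax_price real)")

# sqlite_master holds one row per table, index, view, and trigger;
# filtering on type='table' keeps just the tables
table_names = [row[0] for row in demo_conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(table_names)  # ['customers', 'purchases']
demo_conn.close()
```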

<b>Do not</b> close the connection or dispose of the engine; you will use them in the next problem.

##### 3. Using `WHERE`

If your connection from the previous problem is no longer open, create a connection to the `cat_store_real` database in this folder.

Return all purchases in the `purchases` table with `pretax_price < 70` and `number_of_items > 3`.
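
Several conditions can be combined in one `WHERE` clause with `AND`; a row is returned only if every condition is true. Here is a minimal sketch on a hypothetical `products` table, using the standard-library `sqlite3` module and an in-memory database rather than `cat_store_real.db` (the 'Cat Treats' and 'Cat Toys' rows are made up).

```python
import sqlite3

# Assumption: stdlib sqlite3 and an in-memory database stand in for cat_store_real.db
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE products(product_id int, product text, price real, in_stock int)")

# Hypothetical rows; executemany fills the ? placeholders from each tuple in turn
demo_conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)",
                      [(1, 'Cat Food', 12.50, 10),
                       (2, 'Cat Treats', 5.00, 20),
                       (3, 'Cat Toys', 8.00, 2)])

# Both conditions must hold: Cat Food fails the price test, Cat Toys the stock test
cheap_and_stocked = demo_conn.execute(
    "SELECT product FROM products WHERE price < 10 AND in_stock > 5").fetchall()
print(cheap_and_stocked)  # [('Cat Treats',)]
demo_conn.close()
```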

In [None]:
## Code here




In [None]:
## Code here




##### 4. Introduction to `JOIN`s

One way to combine data from different tables is with a `JOIN` statement.

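A `JOIN` pairs each row of one table with every row of the other table that has the same value in the matching column, and returns those combined rows. A plain `JOIN` is an inner join: rows with no match in the other table are left out of the result. Here is a typical `JOIN` statement: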

<blockquote>
    
    SELECT columns FROM table1
    
    JOIN table2
    
    ON table1.match_column=table2.match_column
    
    WHERE logical_condition;    
</blockquote>
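
To make the matching concrete, here is a minimal sketch on two tiny hypothetical tables, using the standard-library `sqlite3` module and an in-memory database instead of `cat_store_real.db`; every name, id, and price is made up. Customer 1 has two purchases, so that name appears in two result rows, while customer 3 has none and is left out of the plain `JOIN`.

```python
import sqlite3

# Assumption: stdlib sqlite3, an in-memory database, and made-up rows
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE customers(customer_id int, name text)")
demo_conn.execute("CREATE TABLE purchases(purchase_id int, customer_id int, pretax_price real)")

# Customer 3 ('Cy') has not bought anything
demo_conn.executemany("INSERT INTO customers VALUES (?, ?)",
                      [(1, 'Ana'), (2, 'Ben'), (3, 'Cy')])
demo_conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)",
                      [(101, 1, 20.0), (102, 2, 75.0), (103, 1, 40.0)])

# Each purchase row is paired with the customer row whose customer_id matches
joined = demo_conn.execute("""SELECT name, purchase_id, pretax_price FROM purchases
                              JOIN customers
                              ON purchases.customer_id=customers.customer_id
                              ORDER BY purchase_id""").fetchall()
for row in joined:
    print(row)
# ('Ana', 101, 20.0)
# ('Ben', 102, 75.0)
# ('Ana', 103, 40.0)
demo_conn.close()
```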

Here is an example where we add the customer names to each purchase.

In [None]:
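from sqlalchemy import text

## text() marks the string as raw SQL (required by SQLAlchemy 2.0+)
results = conn.execute(text("""SELECT name, purchase_id, pretax_price FROM purchases
                            JOIN customers
                            ON purchases.customer_id=customers.customer_id"""))

pd.DataFrame(results.fetchall(), columns=results.keys())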

Note that if the two tables you are joining share a column name, you must say which table's column you want by prefixing it with the table name, as in `purchases.customer_id` below.

In [None]:
from sqlalchemy import text

## text() marks the string as raw SQL (required by SQLAlchemy 2.0+)
results = conn.execute(text("""SELECT name, purchases.customer_id, purchase_id, pretax_price FROM purchases
                            JOIN customers
                            ON purchases.customer_id=customers.customer_id"""))

pd.DataFrame(results.fetchall(), columns=results.keys())
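
When the table prefix is left off a shared column, the database cannot tell which `customer_id` is meant and refuses to run the query. Here is a minimal sketch of the failure and the fix, using the standard-library `sqlite3` module on hypothetical in-memory tables with made-up rows; SQLite reports the problem as an `OperationalError`, and other database engines raise their own version of this error.

```python
import sqlite3

# Assumption: stdlib sqlite3, an in-memory database, and made-up rows
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE customers(customer_id int, name text)")
demo_conn.execute("CREATE TABLE purchases(purchase_id int, customer_id int)")
demo_conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
demo_conn.execute("INSERT INTO purchases VALUES (101, 1)")

join_clause = """FROM purchases
                 JOIN customers
                 ON purchases.customer_id=customers.customer_id"""

# customer_id exists in both tables, so the unqualified name is rejected
try:
    demo_conn.execute("SELECT name, customer_id " + join_clause)
    ambiguous_error = None
except sqlite3.OperationalError as err:
    ambiguous_error = str(err)
print(ambiguous_error)

# Prefixing the column with its table name resolves the ambiguity
fixed = demo_conn.execute("SELECT name, purchases.customer_id " + join_clause).fetchall()
print(fixed)  # [('Ana', 1)]
demo_conn.close()
```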

##### Practice

Try to answer the following using `JOIN` statements.

1. Who has made purchases?

2. Who has made the most purchases?

3. What are the emails of customers that have made purchases over $70?

In [None]:
##### Code here




In [None]:
##### Code here




In [None]:
##### code here





--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph. D., 2022.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md)