# SQLite
SQLite is a lightweight SQL database engine that stores the whole database in a single local *file* instead of running as a *server*. This makes things much easier because:
- You don't have to start or manage a server.
- You don't need to deal with things like users, passwords, permissions, etc.
- The whole database is one file, so it can be saved to anything that can store data (e.g. a USB drive) and can even be emailed.
- It is typically small and efficient.
- You can store multiple datatypes in one column.
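To see the last point in action, here is a small sketch using Python's built-in `sqlite3` module and a throwaway in-memory database (the table `demo` and its column `x` are made up for illustration). SQLite's `typeof()` function reports how each value was actually stored, so a column *declared* `INTEGER` can still hold text and floats:

```python
import sqlite3

# ":memory:" creates a temporary database that disappears when the connection closes
conn = sqlite3.connect(":memory:")

# Hypothetical table: the column is declared INTEGER...
conn.execute("CREATE TABLE demo (x INTEGER)")

# ...but SQLite still accepts values of other types
conn.executemany("INSERT INTO demo (x) VALUES (?)", [(1,), ("hello",), (2.5,)])

# typeof() reports the storage class of each stored value
rows = conn.execute("SELECT x, typeof(x) FROM demo ORDER BY rowid").fetchall()
print(rows)  # [(1, 'integer'), ('hello', 'text'), (2.5, 'real')]
conn.close()
```

Tables created with the `STRICT` keyword (available since SQLite 3.37.0) turn this flexibility off and reject mismatched types.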

However, 
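- It is not as powerful as PostgreSQL.
- It lacks some features found in other databases, such as several datatypes (e.g. there is no dedicated date/time or boolean type) and some commands (e.g. `RIGHT OUTER JOIN` only arrived in SQLite 3.39.0, and triggers support `FOR EACH ROW` but not `FOR EACH STATEMENT`).
- It doesn't do well with big data, high volume, or environments where many users write at once, since only one writer can modify the database at a time.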
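On SQLite builds older than 3.39.0, the usual workaround for a missing `RIGHT JOIN` is to swap the two tables and use a `LEFT JOIN`, which keeps the same rows. A minimal sketch with two hypothetical tables (`artist` and `album` here are made up, not the Chinook tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables, just for illustration
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER);
    INSERT INTO artist VALUES (1, 'Band A'), (2, 'Band B'), (3, 'Band C');
    INSERT INTO album VALUES (1, 'First Album', 1), (2, 'Second Album', 2);
""")

# Same result as: FROM album RIGHT JOIN artist ON album.artist_id = artist.artist_id
# Swapping the tables turns it into a LEFT JOIN: every artist is kept, and
# artists without an album get None (NULL) as the title
rows = conn.execute("""
    SELECT artist.name, album.title
    FROM artist
    LEFT JOIN album ON album.artist_id = artist.artist_id
    ORDER BY artist.artist_id
""").fetchall()
print(rows)  # [('Band A', 'First Album'), ('Band B', 'Second Album'), ('Band C', None)]

# True if the SQLite library Python is linked against can run RIGHT JOIN directly
has_right_join = sqlite3.sqlite_version_info >= (3, 39, 0)
print(sqlite3.sqlite_version, has_right_join)
conn.close()
```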

Check out [this website](https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems) for more info.

## How to connect to a SQLite Database
We're going to do this with pandas first, because if we're exploring a database we'll probably want to load it into pandas eventually anyway.

##### Step 1: Import sqlite3 and pandas

In [3]:
import sqlite3
import pandas as pd

##### Step 2: Create a connection to the database
Make sure the "`Chinook_Sqlite.sqlite`" file is in the same folder as this notebook, or include the relative filepath.

In [2]:
conn = sqlite3.connect('Chinook_Sqlite.sqlite')

**NOTE: if the "`Chinook_Sqlite.sqlite`" file is not there, it will NOT throw an error. Instead, it will silently create a new, empty database file with that name.**
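If you'd rather get an error than an accidental empty database, `sqlite3` can open the file through a URI with `mode=rw` ("read-write, but never create"). A minimal sketch; `connect_existing` is a hypothetical helper name, not part of `sqlite3`:

```python
import sqlite3
from pathlib import Path

def connect_existing(path):
    """Open an existing SQLite file; raise sqlite3.OperationalError if it is missing."""
    # Turn the path into an absolute file: URI (as_uri() also escapes odd characters)
    # and add mode=rw: open for reading and writing, but never create the file
    uri = Path(path).resolve().as_uri() + "?mode=rw"
    # uri=True tells sqlite3 to interpret the string as a URI rather than a filename
    return sqlite3.connect(uri, uri=True)

try:
    conn = connect_existing("Chinook_Sqlite.sqlite")
except sqlite3.OperationalError as err:
    # In this case no new, empty file is left behind
    print("Could not open the database:", err)
```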

##### Step 3: See what tables are in the database using pandas
You could skip this step if you know what table you're looking for.

In [43]:
pd.read_sql("""SELECT * FROM sqlite_master
            WHERE type = 'table';""", #notice the triple quotes allow multi-line queries
           con = conn) #notice this is where we use the "conn" connection we made above.

|   | type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|---|
| 0 | table | Album | Album | 2 | CREATE TABLE [Album]\n(\n [AlbumId] INTEGER... |
| 1 | table | Artist | Artist | 3 | CREATE TABLE [Artist]\n(\n [ArtistId] INTEG... |
| 2 | table | Customer | Customer | 4 | CREATE TABLE [Customer]\n(\n [CustomerId] I... |
| 3 | table | Employee | Employee | 7 | CREATE TABLE [Employee]\n(\n [EmployeeId] I... |
| 4 | table | Genre | Genre | 9 | CREATE TABLE [Genre]\n(\n [GenreId] INTEGER... |
| 5 | table | Invoice | Invoice | 10 | CREATE TABLE [Invoice]\n(\n [InvoiceId] INT... |
| 6 | table | InvoiceLine | InvoiceLine | 12 | CREATE TABLE [InvoiceLine]\n(\n [InvoiceLin... |
| 7 | table | MediaType | MediaType | 14 | CREATE TABLE [MediaType]\n(\n [MediaTypeId]... |
| 8 | table | Playlist | Playlist | 15 | CREATE TABLE [Playlist]\n(\n [PlaylistId] I... |
| 9 | table | PlaylistTrack | PlaylistTrack | 16 | CREATE TABLE [PlaylistTrack]\n(\n [Playlist... |
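The same lookup also works without pandas, by running the query on the connection directly. In the sketch below, an in-memory database with a simplified, stand-in `Album` table replaces the real Chinook file so it runs anywhere; `PRAGMA table_info(...)` then lists a table's columns:

```python
import sqlite3

# Stand-in for the Chinook connection: an in-memory database with a simplified Album table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT NOT NULL, ArtistId INTEGER NOT NULL)"
)

# Just the table names, straight from sqlite_master
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['Album']

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(Album)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "", "PRIMARY KEY" if pk else "")
conn.close()
```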


##### Step 4: Create a query with pandas and save to a variable

In [32]:
df = pd.read_sql("""SELECT * FROM ALBUM;""", con=conn)
df.head()

|   | AlbumId | Title | ArtistId |
|---|---|---|---|
| 0 | 1 | For Those About To Rock We Salute You | 1 |
| 1 | 2 | Balls to the Wall | 2 |
| 2 | 3 | Restless and Wild | 2 |
| 3 | 4 | Let There Be Rock | 1 |
| 4 | 5 | Big Ones | 3 |
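When a query depends on a value from Python, it's safer to pass that value as a parameter than to paste it into the SQL string, because the driver takes care of quoting (and of SQL injection). `sqlite3` uses `?` placeholders, and `pd.read_sql()` forwards values through its `params` argument, e.g. `pd.read_sql("SELECT * FROM Album WHERE ArtistId = ?", con=conn, params=(1,))`. The sketch below uses plain `sqlite3` with a small stand-in `Album` table (a few rows copied from the output above). It also closes the connection when done: `with sqlite3.connect(...)` on its own only commits or rolls back and does not close the connection, hence `contextlib.closing`:

```python
import sqlite3
from contextlib import closing

# closing() guarantees conn.close() runs, even if a query raises
with closing(sqlite3.connect(":memory:")) as conn:
    # Stand-in for the Chinook Album table, with a few rows from the output above
    conn.execute("CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)")
    conn.executemany(
        "INSERT INTO Album VALUES (?, ?, ?)",
        [(1, "For Those About To Rock We Salute You", 1),
         (2, "Balls to the Wall", 2),
         (4, "Let There Be Rock", 1)],
    )

    artist_id = 1  # a value coming from Python code
    # The ? placeholder is filled in by sqlite3, never by string formatting
    titles = [title for (title,) in conn.execute(
        "SELECT Title FROM Album WHERE ArtistId = ? ORDER BY AlbumId", (artist_id,))]

print(titles)  # ['For Those About To Rock We Salute You', 'Let There Be Rock']
```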


# PostgreSQL
PostgreSQL is a powerful, server-based SQL database. It can do just about anything, and it's FREE because it's open source. There's also an active community of people to help you troubleshoot. Its biggest disadvantages are that it isn't as popular as options like MySQL or Oracle, and it can have a steep learning curve. It can also be overkill for simple, read-heavy tasks, so it can actually underperform in those situations.

## How to connect to a PostgreSQL Database
We'll use SQLAlchemy for this, because it works well with pandas.

##### Step 1: Import sqlalchemy and pandas

In [33]:
import sqlalchemy
import pandas as pd

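**NOTE: An alternative is to connect with psycopg2 directly. SQLAlchemy's default `postgresql://` dialect uses psycopg2 as its driver behind the scenes, so it needs to be installed either way. Recent versions of pandas, however, warn when `pd.read_sql()` is given a raw psycopg2 connection, since they only officially support SQLAlchemy engines/connections and `sqlite3` connections.**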

##### Step 2: Create an engine for the database
In SQLAlchemy, the object that manages connections to a database is called an "engine". Its connection string follows the form:
`postgresql://user:password@host:port/database`

In [35]:
# We'll connect to the Northwind database, which has the following parameters:
# User: dsi_student
# Password: gastudents
# Host: dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com
# Port: 5432
# Database: northwind
engine = sqlalchemy.create_engine("postgresql://dsi_student:gastudents@dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com:5432/northwind")

**NOTE: If you were using psycopg2, you'd use the syntax: `psycopg2.connect("SAME STRING AS ABOVE")`**

OR

**`psycopg2.connect(user='dsi_student', password='gastudents', host='dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com', port='5432', database='northwind')`**
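Hard-coding credentials in a notebook is fine for a shared class database, but for anything private it's common to read them from the environment and build the string in code. Special characters such as `@` or `:` in a password would break the URL, so they need to be percent-encoded, for example with `urllib.parse.quote_plus`. A sketch, where `build_pg_url` is a hypothetical helper and `PGPASSWORD` is just one possible variable name (it happens to be the one libpq itself reads):

```python
import os
from urllib.parse import quote_plus

def build_pg_url(user, password, host, port, database):
    """Assemble a postgresql:// URL, escaping special characters in the credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )

# Read the password from an environment variable instead of hard-coding it;
# the fallback here is just the class password from above
password = os.environ.get("PGPASSWORD", "gastudents")
url = build_pg_url("dsi_student", password,
                   "dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com", 5432, "northwind")
# engine = sqlalchemy.create_engine(url)
```

In SQLAlchemy 1.4 and later, `sqlalchemy.engine.URL.create(...)` can build the same URL from keyword arguments and handles the escaping itself.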

##### Step 3: See what tables are in the database using pandas
You could skip this step if you know what table you're looking for.

In [44]:
pd.read_sql("""SELECT * FROM information_schema.tables
            WHERE table_schema = 'public';""",
           con = engine) #notice this is where we use the "engine" connection we made above.

|   | table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | user_defined_type_name | is_insertable_into | is_typed | commit_action |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | northwind | public | categories | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 1 | northwind | public | full_order_table5 | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 2 | northwind | public | fo | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 3 | northwind | public | table_join | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 4 | northwind | public | full_order | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 5 | northwind | public | customercustomerdemo | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 6 | northwind | public | customerdemographics | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 7 | northwind | public | customers | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 8 | northwind | public | employees | BASE TABLE |  |  |  |  |  | YES | NO |  |
| 9 | northwind | public | employeeterritories | BASE TABLE |  |  |  |  |  | YES | NO |  |
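SQLAlchemy can also list tables and columns without a hand-written `information_schema` query, through its inspector. The sketch below runs against an in-memory SQLite engine (`"sqlite://"`) with a small stand-in `usstates` table, so it works without a server; against the Northwind engine, `get_table_names(schema="public")` would play the role of the query above (it returns table names only, not views):

```python
import sqlalchemy

# Stand-in engine: an in-memory SQLite database instead of the Northwind server
engine = sqlalchemy.create_engine("sqlite://")

# Define a small stand-in table and create it in the database
metadata = sqlalchemy.MetaData()
sqlalchemy.Table(
    "usstates", metadata,
    sqlalchemy.Column("StateID", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("StateName", sqlalchemy.String),
)
metadata.create_all(engine)

# The inspector reads the schema back from the live database
inspector = sqlalchemy.inspect(engine)
table_names = inspector.get_table_names()
column_names = [col["name"] for col in inspector.get_columns("usstates")]
print(table_names)   # ['usstates']
print(column_names)  # ['StateID', 'StateName']
```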


##### Step 4: Create a query with pandas and save to a variable

In [45]:
df = pd.read_sql("""SELECT * FROM usstates;""", con=engine)
df.head()

|   | StateID | StateName | StateAbbr | StateRegion |
|---|---|---|---|---|
| 0 | 1 | Alabama | AL | south |
| 1 | 2 | Alaska | AK | north |
| 2 | 3 | Arizona | AZ | west |
| 3 | 4 | Arkansas | AR | south |
| 4 | 5 | California | CA | west |


##### Note: you can run a `CREATE TABLE` statement through `pd.read_sql()`, but because the statement returns no rows, pandas raises an error even when the table was created successfully.
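`pd.read_sql()` is built for statements that return rows. For statements that don't (`CREATE TABLE`, `INSERT`, ...), one option is to execute them on a connection from the engine; pandas also offers `DataFrame.to_sql()` for writing a whole DataFrame into a table. A sketch, again against a stand-in in-memory SQLite engine, where the table `favorite_states` is hypothetical:

```python
import sqlalchemy

# Stand-in engine: an in-memory SQLite database instead of the Northwind server
engine = sqlalchemy.create_engine("sqlite://")

# engine.begin() opens a connection plus a transaction and commits when the block ends
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(
        "CREATE TABLE favorite_states (StateAbbr TEXT, Reason TEXT)"))
    # :abbr and :reason are bound parameters; a list of dicts inserts several rows
    conn.execute(
        sqlalchemy.text("INSERT INTO favorite_states VALUES (:abbr, :reason)"),
        [{"abbr": "AK", "reason": "mountains"}, {"abbr": "FL", "reason": "beaches"}],
    )

# A separate read once the transaction above has committed
with engine.connect() as conn:
    rows = conn.execute(sqlalchemy.text(
        "SELECT StateAbbr FROM favorite_states ORDER BY StateAbbr")).fetchall()

print([abbr for (abbr,) in rows])  # ['AK', 'FL']
```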

# SQL Magic Alternative
SQL Magic (the `%sql` / `%%sql` commands from the `ipython-sql` package) is nice because you don't have to wrap queries in Python strings or specify the connection each time. It's also useful for creating and writing to tables, which you'll see more of in the next lesson.

##### Step 1: Load the SQL extension

In [46]:
%load_ext sql

##### Step 2: Connect to a database
Use the exact same connection string as we used above with the SQLAlchemy engine.

In [49]:
%sql postgresql://dsi_student:gastudents@dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com:5432/northwind

u'Connected: dsi_student@northwind'

**NOTE: to connect to the sqlite database we used above, use this:**

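`%sql sqlite:///Chinook_Sqlite.sqlite` (notice the three slashes: `sqlite:///` is followed by a relative path; an absolute path on macOS/Linux adds a fourth slash, as in `sqlite:////absolute/path/to/Chinook_Sqlite.sqlite`)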

##### Step 3: Test out a query
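Note: a single `%` is LINE magic, and a double `%%` is CELL magic. CELL magic applies to the whole cell (so a query can span multiple lines), while LINE magic applies only to the single line it's on. `%%sql` must be the first line of its cell.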

In [51]:
%%sql 
SELECT * FROM usstates
WHERE "StateRegion" = 'south'

9 rows affected.


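| StateID | StateName | StateAbbr | StateRegion |
|---|---|---|---|
| 1 | Alabama | AL | south |
| 4 | Arkansas | AR | south |
| 10 | Florida | FL | south |
| 11 | Georgia | GA | south |
| 18 | Kentucky | KY | south |
| 19 | Louisiana | LA | south |
| 25 | Mississippi | MS | south |
| 26 | Missouri | MO | south |
| 49 | West Virginia | WV | south |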


##### Step 4: Save the result of a query
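NOTE: The simplest way is LINE magic (`result = %sql ...`), so to continue the query onto a new line you need to put a space and a backslash at the end of the line. (Recent versions of `ipython-sql` also let CELL magic store its result with `%%sql result << SELECT ...`.)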

In [57]:
result = %sql SELECT * FROM usstates \
WHERE "StateRegion" = 'south'

9 rows affected.


In [58]:
# Notice: "result" is an ipython-sql ResultSet object, not a DataFrame yet
type(result)

sql.run.ResultSet

##### Step 5: Make it into a dataframe and save as a variable

In [60]:
df = result.DataFrame()
df.head()

|   | StateID | StateName | StateAbbr | StateRegion |
|---|---|---|---|---|
| 0 | 1 | Alabama | AL | south |
| 1 | 4 | Arkansas | AR | south |
| 2 | 10 | Florida | FL | south |
| 3 | 11 | Georgia | GA | south |
| 4 | 18 | Kentucky | KY | south |
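The `%sql` magic only exists inside IPython/Jupyter. In a plain Python script, a similar "rows with column names" result can be had from `sqlite3` itself by setting `row_factory = sqlite3.Row`; this is a different tool, not what `ipython-sql` does internally. A small stand-in `usstates` table with three rows from the output above keeps the sketch self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sqlite3.Row lets each result row be read by column name as well as by position
conn.row_factory = sqlite3.Row

# Stand-in table with a few rows copied from the output above
conn.execute("CREATE TABLE usstates (StateID INTEGER, StateName TEXT, StateAbbr TEXT, StateRegion TEXT)")
conn.executemany(
    "INSERT INTO usstates VALUES (?, ?, ?, ?)",
    [(1, "Alabama", "AL", "south"), (2, "Alaska", "AK", "north"), (4, "Arkansas", "AR", "south")],
)

rows = conn.execute(
    "SELECT * FROM usstates WHERE StateRegion = ? ORDER BY StateID", ("south",)).fetchall()
records = [dict(row) for row in rows]  # one {column: value} dict per row
print(records[0]["StateName"])  # Alabama
conn.close()
```

If pandas is available, `pd.DataFrame(records)` turns that list of dicts into a DataFrame with the same columns.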
