## Lecture 3

In this demonstration, we will study structured query language (SQL). We will use the [sqlalchemy library](http://docs.sqlalchemy.org/en/latest/core/tutorial.html) to interface with the database management system [sqlite](https://docs.python.org/3/library/sqlite3.html). You might find the [SQL Alchemy Quick Reference Sheet](https://www.pythonsheets.com/notes/python-sqlalchemy.html) helpful.

```python
# importing some packages

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import pathlib

import sqlalchemy
import sqlite3

# changing some settings

pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 8)

%matplotlib inline
plt.rcParams['figure.figsize'] = (9,7)
```

## Connecting to the Database

We will create a SQLite database called `movies.db`. Before creating it, we delete any existing `movies.db` file so that we start from an empty database.

```python
dbfile = pathlib.Path("movies.db")

# remove any database left over from a previous run
if dbfile.exists():
    dbfile.unlink()
```

The `sqlite3` package allows us to store databases in memory or on disk. We will store the database on disk.

```python
# connecting to a file that does not exist yet creates it
connection = sqlite3.connect(dbfile)
connection.close()
```
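The next block illustrates the two storage options using only the standard-library `sqlite3` module. Passing the special name `":memory:"` creates a database that exists only while the connection is open. Passing a file path creates (or opens) a database on disk. The temporary directory and the file name `example.db` are illustrative assumptions.

```python
import pathlib
import sqlite3
import tempfile

# In-memory database: nothing is written to disk, and the data
# disappears when the connection is closed.
mem_conn = sqlite3.connect(":memory:")
mem_conn.execute("CREATE TABLE t (x INTEGER)")
mem_tables = [row[0] for row in mem_conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
mem_conn.close()

# On-disk database: connecting to a path creates the file if needed.
tmp_dir = pathlib.Path(tempfile.mkdtemp())
db_path = tmp_dir / "example.db"   # hypothetical file name
disk_conn = sqlite3.connect(db_path)
disk_conn.execute("CREATE TABLE t (x INTEGER)")
disk_conn.commit()
disk_conn.close()

print(mem_tables)        # ['t']
print(db_path.exists())  # True
```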

After we have created the database, we can use the `sqlalchemy` package to connect to it. Remember that the SQL standard has many implementations, and each implementation has its own connection details. However, `sqlalchemy` provides an abstraction layer over these details, so we can use the same approach no matter which implementation we connect to.

```python
sqlite_url = "sqlite:///movies.db"

sqlite_engine = sqlalchemy.create_engine(sqlite_url)
```

Because the database is new, the file should not contain any tables yet:

```python
sqlite_engine.table_names()
```

```
[]
```

## Creating a Table

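We need to add data to a table in the database. The table will contain the following columns:

* 'director' : Text
* 'genre' : Text
* 'movie' : Text, primary key (values must be unique)
* 'rating' : Integer with a range constraint (0 to 10)
* 'revenue' : Floating-point number

Each column is declared with a data type. Many database systems enforce these types strictly. SQLite instead treats them as *type affinities* and does not reject a value of a different type, unless the table is declared `STRICT` (available in recent SQLite versions). SQLite does enforce the uniqueness of the primary key and the `CHECK` constraint on `rating`.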
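In SQLite, a declared column type is a *type affinity* and not a strict check. The next block demonstrates this on a throwaway in-memory table using the standard-library `sqlite3` module. It inserts text into an `INTEGER` column and then inspects the stored type with SQLite's `typeof()` function. Many other database systems would reject the second insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (rating INTEGER)")

# Text that looks like an integer is converted by INTEGER affinity.
conn.execute("INSERT INTO t VALUES ('7')")
# Text that does not look like a number is stored as text, not rejected.
conn.execute("INSERT INTO t VALUES ('abc')")

stored = conn.execute(
    "SELECT rating, typeof(rating) FROM t ORDER BY rowid").fetchall()
print(stored)  # [(7, 'integer'), ('abc', 'text')]
```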

```python
sql_expr = """
CREATE TABLE movies(
    director TEXT, 
    genre TEXT, 
    movie TEXT PRIMARY KEY, 
    rating INTEGER CHECK (rating >= 0 and rating <= 10), 
    revenue FLOAT);
"""

result = sqlite_engine.execute(sql_expr)
```

The query returns a sqlalchemy `ResultProxy` object:

```python
result?
```

```
Type:        ResultProxy
String form: <sqlalchemy.engine.result.ResultProxy object at 0x000002703D2F4208>
File:        c:\users\policast\programs\anaconda3.7\lib\site-packages\sqlalchemy\engine\result.py
Docstring:  
Wraps a DB-API cursor object to provide easier access to row columns.

Individual columns may be accessed by their integer position,
case-insensitive column name, or by ``schema.Column``
object. e.g.::

  row = fetchone()

  col1 = row[0]    # access via integer position

  col2 = row['col2']   # access via name

  col3 = row[mytable.c.mycol] # access via Column object.

``ResultProxy`` also handles post-processing of result column
data using ``TypeEngine`` objects, which are referenced from
the originating SQL statement that produced this result set.
```


We can check whether the statement returns any rows:

```python
result.returns_rows
```

```
False
```
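Plain DB-API cursors give a similar signal through their `description` attribute. After a statement that produces no rows, such as `CREATE TABLE` or `INSERT`, `description` is `None`. After a `SELECT`, it describes the result columns. The next block shows this with the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY, rating INTEGER)")
ddl_description = cur.description   # None: no result set

cur.execute("SELECT movie, rating FROM movies")
# Each entry of description is a 7-tuple whose first item is the column name.
select_columns = [col[0] for col in cur.description]

print(ddl_description, select_columns)  # None ['movie', 'rating']
```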

## Inserting Values into the Table

Now let's manually insert some values into the table.  Note that:

* strings in SQL must be quoted with a single quote **`'`** character.
* when no column list is given (as below), the values must appear in the same order as the columns in the `CREATE TABLE` statement!
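The column-order rule applies only when the column list is omitted. Naming the columns explicitly removes that dependence. Binding values through placeholders avoids quoting strings by hand; `?` is the placeholder style used by `sqlite3`. The next block applies both techniques to a copy of the `movies` schema, using two rows from the data below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movies (
        director TEXT, genre TEXT, movie TEXT PRIMARY KEY,
        rating INTEGER CHECK (rating >= 0 AND rating <= 10), revenue FLOAT)
""")

# Values follow the order of the column list in the INSERT,
# not the order of the columns in CREATE TABLE.
rows = [
    ("Deadpool 2", 7, "David"),
    ("Book Club", 5, "Bill"),
]
# Columns that are not named (genre, revenue) are set to NULL.
conn.executemany(
    "INSERT INTO movies (movie, rating, director) VALUES (?, ?, ?)", rows)

result = conn.execute(
    "SELECT director, movie, rating, genre FROM movies ORDER BY movie").fetchall()
print(result)
# [('Bill', 'Book Club', 5, None), ('David', 'Deadpool 2', 7, None)]
```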

```python
sql_expr = """
INSERT INTO movies VALUES 
    ('David', 'Action & Adventure', 'Deadpool 2', 7, 318344544),
    ('Bill', 'Comedy', 'Book Club', 5,  68566296),
    ('Ron', 'Science Fiction & Fantasy', 'Solo: A Star Wars Story', 6, 213476293),
    ('Baltasar', 'Drama', 'Adrift', 6,  31445012),
    ('Bart', 'Drama', 'American Animals', 6,   2847319),
    ('Gary', 'Action & Adventure', 'Oceans 8', 6, 138803463),
    ('Drew', 'Action & Adventure', 'Hotel Artemis', 8,   6708147),
    ('Brad', 'Animation', 'Incredibles 2', 5, 594398019),
    ('Jeff', 'Comedy', 'Tag', 6,  54336863),
    ('J.A.', 'Science Fiction & Fantasy', 'Jurassic World: Fallen Kingdom', 6, 411873505),
    ('Charles', 'Comedy', 'Uncle Drew', 5,  42201656),
    ('Gerard', 'Horror', 'The First Purge', 7,  68765655),
    ('Peyton', 'Action & Adventure', 'Ant-Man and the Wasp', 5, 208681866),
    ('Genndy', 'Animation', 'Hotel Transylvania 3: Summer Vacation', 5, 154418311),
    ('Rawson', 'Action & Adventure', 'Skyscraper', 6,  66801215),
    ('Ol', 'Comedy', 'Mamma Mia! Here We Go Again', 8, 111705055),
    ('Christopher', 'Action & Adventure', 'Mission: Impossible-Fallout', 6, 182080372),
    ('Marc', 'Comedy', 'Christopher Robbin', 6,   6786317);
"""
result = sqlite_engine.execute(sql_expr)
```

Again we see that this query returns no rows:

```python
result.returns_rows
```

```
False
```

## Querying the Table

Now that we have populated the table we can construct a query to extract the results.

```python
sql_expr = """
SELECT * FROM movies;
"""

result = sqlite_engine.execute(sql_expr)
```

```python
result.returns_rows
```

```
True
```

### Iterating the Cursor

The result object wraps a DB-API `cursor`, which we can iterate over to read the rows returned by the database.

```python
[r for r in result.cursor]
```

```
[('David', 'Action & Adventure', 'Deadpool 2', 7, 318344544.0),
 ('Bill', 'Comedy', 'Book Club', 5, 68566296.0),
 ('Ron',
  'Science Fiction & Fantasy',
  'Solo: A Star Wars Story',
  6,
  213476293.0),
 ('Baltasar', 'Drama', 'Adrift', 6, 31445012.0),
 ('Bart', 'Drama', 'American Animals', 6, 2847319.0),
 ('Gary', 'Action & Adventure', 'Oceans 8', 6, 138803463.0),
 ('Drew', 'Action & Adventure', 'Hotel Artemis', 8, 6708147.0),
 ('Brad', 'Animation', 'Incredibles 2', 5, 594398019.0),
 ('Jeff', 'Comedy', 'Tag', 6, 54336863.0),
 ('J.A.',
  'Science Fiction & Fantasy',
  'Jurassic World: Fallen Kingdom',
  6,
  411873505.0),
 ('Charles', 'Comedy', 'Uncle Drew', 5, 42201656.0),
 ('Gerard', 'Horror', 'The First Purge', 7, 68765655.0),
 ('Peyton', 'Action & Adventure', 'Ant-Man and the Wasp', 5, 208681866.0),
 ('Genndy',
  'Animation',
  'Hotel Transylvania 3: Summer Vacation',
  5,
  154418311.0),
 ('Rawson', 'Action & Adventure', 'Skyscraper', 6, 66801215.0),
 ('Ol', 'Comedy', 'Mamma Mia! H
```

However, reading from the cursor advances it, so its rows can only be read once:

```python
[r for r in result.cursor]
```

```
[]
```
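A plain `sqlite3` cursor shows the same one-pass behaviour. Once its rows have been fetched, fetching again returns nothing. To read the rows a second time, the query must be executed again. The next block uses a two-row table built from the data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("Tag", 6), ("Adrift", 6)])

cur = conn.execute("SELECT movie FROM movies ORDER BY movie")
first_pass = cur.fetchall()    # reads every row and advances the cursor
second_pass = cur.fetchall()   # the cursor is exhausted: empty list

# Re-executing the query produces a fresh result set.
third_pass = conn.execute("SELECT movie FROM movies ORDER BY movie").fetchall()
print(first_pass, second_pass, third_pass)
# [('Adrift',), ('Tag',)] [] [('Adrift',), ('Tag',)]
```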

## Using Pandas to Query the Database

You can also use Pandas to query the database.  Here we pass the engine (or a connection) into the [`pandas.read_sql` function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html):

```python
sql_expr = """
SELECT * FROM movies;
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | David | Action & Adventure | Deadpool 2 | 7 | 318344544.0 |
| 1 | Bill | Comedy | Book Club | 5 | 68566296.0 |
| 2 | Ron | Science Fiction & Fantasy | Solo: A Star Wars Story | 6 | 213476293.0 |
| 3 | Baltasar | Drama | Adrift | 6 | 31445012.0 |
| 4 | Bart | Drama | American Animals | 6 | 2847319.0 |
| ... | ... | ... | ... | ... | ... |
| 13 | Genndy | Animation | Hotel Transylvania 3: Summer Vacation | 5 | 154418311.0 |
| 14 | Rawson | Action & Adventure | Skyscraper | 6 | 66801215.0 |
| 15 | Ol | Comedy | Mamma Mia! Here We Go Again | 8 | 111705055.0 |
| 16 | Christopher | Action & Adventure | Mission: Impossible-Fallout | 6 | 182080372.0 |


## Primary Key Integrity Constraint

What happens if we try to insert another record with the same primary key (`movie`)?

```python
sql_expr = """
INSERT INTO movies VALUES 
('Marc', 'Drama', 'Christopher Robbin', 9, 100000.0)
"""

try:
    result = sqlite_engine.execute(sql_expr)
except Exception as e:
    print(e)
```

```
(sqlite3.IntegrityError) UNIQUE constraint failed: movies.movie
[SQL: 
INSERT INTO movies VALUES 
('Marc', 'Drama', 'Christopher Robbin', 9, 100000.0)
]
(Background on this error at: http://sqlalche.me/e/gkpj)
```


Notice in the above block of code we use `try:` and `except Exception as e:`.  This accomplishes two goals:

1. This syntax catches the exception and prevents the notebook from terminating when the error occurs (we are expecting this error!)
1. This syntax also hides the full stack trace and only shows us the important message containing the final error.
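Catching `Exception` also hides unexpected errors, such as a misspelled table name. A narrower alternative is to catch only the exception class we expect, so that any other error still surfaces. The next block does this with the standard-library `sqlite3` module, which raises `sqlite3.IntegrityError` when a primary-key (uniqueness) constraint is violated. When the statement goes through `sqlalchemy`, the wrapped equivalent is `sqlalchemy.exc.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (director TEXT, movie TEXT PRIMARY KEY)")
conn.execute("INSERT INTO movies VALUES ('Marc', 'Christopher Robbin')")

error_message = None
try:
    # Same primary key value as the existing row.
    conn.execute("INSERT INTO movies VALUES ('Marc', 'Christopher Robbin')")
except sqlite3.IntegrityError as e:   # catch only the error we expect
    error_message = str(e)

row_count = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(error_message)  # UNIQUE constraint failed: movies.movie
print(row_count)      # 1
```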

## Saving a Dataframe to a Database

We can also populate the database using Pandas. In the following, we first load the movie data from `movies.csv` into a dataframe:

```python
df_movies = pd.read_csv("movies.csv")
df_movies
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | David | Action & Adventure | Deadpool 2 | 7 | 318344544 |
| 1 | Bill | Comedy | Book Club | 5 | 68566296 |
| 2 | Ron | Science Fiction & Fantasy | Solo: A Star Wars Story | 6 | 213476293 |
| 3 | Baltasar | Drama | Adrift | 6 | 31445012 |
| 4 | Bart | Drama | American Animals | 6 | 2847319 |
| ... | ... | ... | ... | ... | ... |
| 13 | Genndy | Animation | Hotel Transylvania 3: Summer Vacation | 5 | 154418311 |
| 14 | Rawson | Action & Adventure | Skyscraper | 6 | 66801215 |
| 15 | Ol | Comedy | Mamma Mia! Here We Go Again | 8 | 111705055 |
| 16 | Christopher | Action & Adventure | Mission: Impossible-Fallout | 6 | 182080372 |


We can then use the `DataFrame.to_sql` method to write the data to our SQLite database. By default, it also stores the dataframe's index as a column named `index`:

```python
df_movies.to_sql("movies_duplicate", sqlite_engine)
```

We can see that a new table has been added:

```python
sqlite_engine.table_names()
```

```
['movies', 'movies_duplicate']
```

We can also filter rows when querying. Every movie in the `movies` table has a rating above 4, so this query returns all of them:

```python
sql_expr = """
SELECT * FROM movies
WHERE rating > 4;
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | David | Action & Adventure | Deadpool 2 | 7 | 318344544.0 |
| 1 | Bill | Comedy | Book Club | 5 | 68566296.0 |
| 2 | Ron | Science Fiction & Fantasy | Solo: A Star Wars Story | 6 | 213476293.0 |
| 3 | Baltasar | Drama | Adrift | 6 | 31445012.0 |
| 4 | Bart | Drama | American Animals | 6 | 2847319.0 |
| ... | ... | ... | ... | ... | ... |
| 13 | Genndy | Animation | Hotel Transylvania 3: Summer Vacation | 5 | 154418311.0 |
| 14 | Rawson | Action & Adventure | Skyscraper | 6 | 66801215.0 |
| 15 | Ol | Comedy | Mamma Mia! Here We Go Again | 8 | 111705055.0 |
| 16 | Christopher | Action & Adventure | Mission: Impossible-Fallout | 6 | 182080372.0 |


### Exploring the Schema

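Standard SQL defines `INFORMATION_SCHEMA` views that describe a database's schema, but implementations vary. SQLite, for example, does not provide these views and stores its schema in the `sqlite_master` table instead. However, sqlalchemy provides a simple abstraction layer over these differences.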

```python
inspector = sqlalchemy.inspect(sqlite_engine)
inspector.get_table_names()
```

```
['movies', 'movies_duplicate']
```

We can get information about the columns.

```python
for col in inspector.get_columns("movies"):
    print(col)
```

```
{'name': 'director', 'type': TEXT(), 'nullable': True, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
{'name': 'genre', 'type': TEXT(), 'nullable': True, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
{'name': 'movie', 'type': TEXT(), 'nullable': True, 'default': None, 'autoincrement': 'auto', 'primary_key': 1}
{'name': 'rating', 'type': INTEGER(), 'nullable': True, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
{'name': 'revenue', 'type': FLOAT(), 'nullable': True, 'default': None, 'autoincrement': 'auto', 'primary_key': 0}
```
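Underneath the inspector, SQLite keeps its schema in the `sqlite_master` table and reports column details through `PRAGMA table_info`. Both are SQLite-specific, which is the kind of difference the inspector abstracts away. The next block queries them directly with the standard-library `sqlite3` module on a copy of the `movies` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movies (director TEXT, genre TEXT, movie TEXT PRIMARY KEY,
                         rating INTEGER, revenue FLOAT)
""")

# Table names, analogous to inspector.get_table_names().
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
columns = [(name, col_type, pk) for (_, name, col_type, _, _, pk)
           in conn.execute("PRAGMA table_info(movies)")]

print(tables)   # ['movies']
print(columns)
```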


## Dropping Tables

The `DROP TABLE` command removes a table, together with all of its data, from the database (be careful!):

```python
sql_expr = """
DROP TABLE movies_duplicate;
"""
sqlite_engine.execute(sql_expr)
```

```
<sqlalchemy.engine.result.ResultProxy at 0x1cf3b02a908>
```

Notice that the `movies_duplicate` table no longer exists:

```python
sqlite_engine.table_names()
```

```
['movies']
```
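Dropping a table that does not exist is an error, so running the same `DROP TABLE` twice fails. SQLite, like many other systems, also accepts `DROP TABLE IF EXISTS`, which succeeds in either case. The next block demonstrates both behaviours with the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies_duplicate (movie TEXT)")
conn.execute("DROP TABLE movies_duplicate")

try:
    conn.execute("DROP TABLE movies_duplicate")   # the table is already gone
    second_drop_failed = False
except sqlite3.OperationalError as e:
    second_drop_failed = True
    print(e)  # no such table: movies_duplicate

conn.execute("DROP TABLE IF EXISTS movies_duplicate")  # no error
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(remaining)  # []
```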

## UPDATE values

What is the rating of `Christopher Robbin`?

```python
sql_expr = """
SELECT *
FROM movies
WHERE movie LIKE '%Christopher%' 
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | Marc | Comedy | Christopher Robbin | 6 | 6786317.0 |


Here we have used pattern matching to search for strings containing the substring `Christopher`. 
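In a `LIKE` pattern, `%` matches any run of characters (including none), and `_` matches exactly one character. In SQLite, `LIKE` ignores case for ASCII letters by default, while `=` is case-sensitive. Other systems handle case differently, so treat the case behaviour shown here as SQLite-specific. The helper `matches` in the next block is hypothetical and interpolates a trusted, hard-coded clause for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO movies VALUES (?)",
                 [("Christopher Robbin",), ("Tag",), ("Oceans 8",)])

def matches(where_clause):
    """Sorted titles satisfying a WHERE clause (trusted literal input only)."""
    query = f"SELECT movie FROM movies WHERE {where_clause} ORDER BY movie"
    return [title for (title,) in conn.execute(query)]

case_insensitive = matches("movie LIKE '%christopher%'")   # ['Christopher Robbin']
exact_equality = matches("movie = 'christopher robbin'")   # [] (= is case-sensitive)
one_char = matches("movie LIKE 'T_g'")                     # ['Tag']
suffix = matches("movie LIKE '%8'")                        # ['Oceans 8']
print(case_insensitive, exact_equality, one_char, suffix)
```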

We can try to change the rating.

```python
sql_expr = """
UPDATE movies
SET rating = 1 + rating
WHERE LOWER(movie) = 'christopher robbin';
"""

sqlite_engine.execute(sql_expr)
```

```
<sqlalchemy.engine.result.ResultProxy at 0x1cf3b30cb00>
```

And let's check the table now:

```python
sql_expr = """
SELECT * 
FROM movies
WHERE movie LIKE  '%Christopher%' 
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | Marc | Comedy | Christopher Robbin | 7 | 6786317.0 |


### Important Note

In the `update` statement ***we decide which rows get updated based entirely on the values in each row, as checked by the `where` clause.*** There is no notion of any information outside the values in the row--e.g. there are no "object identifiers" or "row numbers"... everything is *just the data and only the data*.
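Because the `WHERE` predicate alone selects the rows, an `UPDATE` changes every row that satisfies it, whether that is zero, one, or many rows. The next block runs two updates on a few rows taken from the `movies` data, using the standard-library `sqlite3` module. The cursor's `rowcount` reports how many rows each statement touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (genre TEXT, movie TEXT PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Comedy", "Book Club", 5),
    ("Comedy", "Tag", 6),
    ("Drama", "Adrift", 6),
])

# The predicate matches both comedies, so both rows are updated.
n_comedies = conn.execute(
    "UPDATE movies SET rating = rating + 1 WHERE genre = 'Comedy'").rowcount
# A predicate that matches nothing updates nothing, and is not an error.
n_westerns = conn.execute(
    "UPDATE movies SET rating = rating + 1 WHERE genre = 'Western'").rowcount

ratings = conn.execute("SELECT movie, rating FROM movies ORDER BY movie").fetchall()
print(n_comedies, n_westerns)  # 2 0
print(ratings)  # [('Adrift', 6), ('Book Club', 6), ('Tag', 7)]
```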

## Integrity Constraints 

The integrity constraints we imposed earlier can be used to improve data quality. 

Suppose the director tried to log onto the database to give himself an 11. 

```python
try:
    sql_expr = """
        UPDATE movies
        SET rating = 11
        WHERE LOWER(director) LIKE '%marc%';
        """
    sqlite_engine.execute(sql_expr)
except Exception as e:
    print(e)
```

```
(sqlite3.IntegrityError) CHECK constraint failed: movies
[SQL: 
        UPDATE movies
        SET rating = 11
        WHERE LOWER(director) LIKE '%marc%';
        ]
(Background on this error at: http://sqlalche.me/e/gkpj)
```


The above code fails. Why? (Check the schema.)
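The `CHECK (rating >= 0 and rating <= 10)` clause in the table definition is what rejects a rating of 11. A rejected statement also leaves the table unchanged, which the next block confirms with the standard-library `sqlite3` module. The exact wording of the error message varies between SQLite versions, so the code does not depend on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE movies (
    director TEXT, movie TEXT PRIMARY KEY,
    rating INTEGER CHECK (rating >= 0 AND rating <= 10))""")
conn.execute("INSERT INTO movies VALUES ('Marc', 'Christopher Robbin', 7)")

try:
    conn.execute("UPDATE movies SET rating = 11 WHERE director = 'Marc'")
    rejected = False
except sqlite3.IntegrityError as e:
    rejected = True
    print(e)  # e.g. "CHECK constraint failed: ..."

# The failed statement was backed out, so the old rating is still there.
rating = conn.execute(
    "SELECT rating FROM movies WHERE director = 'Marc'").fetchone()[0]
print(rejected, rating)  # True 7
```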

## Deleting Records

We can delete rows in much the same way we update rows:

```python
sql_expr = """
DELETE FROM movies 
    WHERE movie = 'Book Club'
"""

sqlite_engine.execute(sql_expr)
```

```
<sqlalchemy.engine.result.ResultProxy at 0x1cf3b33b710>
```

Notice we can rerun the above command multiple times.  Why?

```python
sql_expr = """
SELECT * 
FROM movies;
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | David | Action & Adventure | Deadpool 2 | 7 | 318344544.0 |
| 1 | Ron | Science Fiction & Fantasy | Solo: A Star Wars Story | 6 | 213476293.0 |
| 2 | Baltasar | Drama | Adrift | 6 | 31445012.0 |
| 3 | Bart | Drama | American Animals | 6 | 2847319.0 |
| 4 | Gary | Action & Adventure | Oceans 8 | 6 | 138803463.0 |
| ... | ... | ... | ... | ... | ... |
| 12 | Genndy | Animation | Hotel Transylvania 3: Summer Vacation | 5 | 154418311.0 |
| 13 | Rawson | Action & Adventure | Skyscraper | 6 | 66801215.0 |
| 14 | Ol | Comedy | Mamma Mia! Here We Go Again | 8 | 111705055.0 |
| 15 | Christopher | Action & Adventure | Mission: Impossible-Fallout | 6 | 182080372.0 |
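The `DELETE` can be rerun because, the second time, its `WHERE` clause matches no rows, and deleting zero rows is not an error. The cursor's `rowcount` makes this visible in the next block, which uses the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("Book Club", 5), ("Tag", 6)])

delete_sql = "DELETE FROM movies WHERE movie = 'Book Club'"
first = conn.execute(delete_sql).rowcount    # 1 row removed
second = conn.execute(delete_sql).rowcount   # 0 rows removed, no error
remaining = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(first, second, remaining)  # 1 0 1
```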


Now let's restore `Book Club`:

```python
sql_expr = """
INSERT INTO movies VALUES
('Bill', 'Comedy', 'Book Club', 5,  68566296);
"""
sqlite_engine.execute(sql_expr)
```

```
<sqlalchemy.engine.result.ResultProxy at 0x1cf3b33ba90>
```

The above computation cannot be run more than once:

```python
try:
    sql_expr = """
        INSERT INTO movies VALUES
        ('Bill', 'Comedy', 'Book Club', 5,  68566296);
    """
    sqlite_engine.execute(sql_expr)
except Exception as e:
    print(e)
```

```
(sqlite3.IntegrityError) UNIQUE constraint failed: movies.movie
[SQL: 
        INSERT INTO movies VALUES
        ('Bill', 'Comedy', 'Book Club', 5,  68566296);
    ]
(Background on this error at: http://sqlalche.me/e/gkpj)
```


Notice that the order of the records has changed. Unless we specify an order, we have no guarantee of where `Book Club` appears in the output.

```python
sql_expr = """
SELECT * FROM movies;
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | David | Action & Adventure | Deadpool 2 | 7 | 318344544.0 |
| 1 | Ron | Science Fiction & Fantasy | Solo: A Star Wars Story | 6 | 213476293.0 |
| 2 | Baltasar | Drama | Adrift | 6 | 31445012.0 |
| 3 | Bart | Drama | American Animals | 6 | 2847319.0 |
| 4 | Gary | Action & Adventure | Oceans 8 | 6 | 138803463.0 |
| ... | ... | ... | ... | ... | ... |
| 13 | Rawson | Action & Adventure | Skyscraper | 6 | 66801215.0 |
| 14 | Ol | Comedy | Mamma Mia! Here We Go Again | 8 | 111705055.0 |
| 15 | Christopher | Action & Adventure | Mission: Impossible-Fallout | 6 | 182080372.0 |
| 16 | Marc | Comedy | Christopher Robbin | 6 | 6786317.0 |


## SELECT Queries

Now let's start looking at some slightly more interesting queries.  The canonical SQL query block includes the following clauses, in the order they appear. Square brackets indicate optional clauses.

```sql
SELECT ...
  FROM ...
[WHERE ...]
[GROUP BY ...]
[HAVING ...]
[ORDER BY ...]
[LIMIT ...];
```

Query blocks can reference one or more tables, and be nested in various ways.  Before we worry about multi-table queries or nested queries, we'll work our way through examples that exercise all of these clauses on a single table.
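For orientation, the next block runs a single query that uses every clause in the template above. It works on a handful of rows copied from the `movies` table, using the standard-library `sqlite3` module. The comments give a one-line summary of each clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (genre TEXT, movie TEXT PRIMARY KEY, "
             "rating INTEGER, revenue FLOAT)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    ("Comedy", "Book Club", 5, 68566296),
    ("Comedy", "Tag", 6, 54336863),
    ("Comedy", "Uncle Drew", 5, 42201656),
    ("Drama", "Adrift", 6, 31445012),
    ("Drama", "American Animals", 6, 2847319),
    ("Horror", "The First Purge", 7, 68765655),
])

query = """
SELECT genre, COUNT(*) AS n, MAX(revenue) AS top_revenue  -- output columns
  FROM movies                                             -- source table
 WHERE rating >= 5                                        -- filter rows
 GROUP BY genre                                           -- form groups
HAVING COUNT(*) >= 2                                      -- filter groups
 ORDER BY n DESC, genre                                   -- sort the output
 LIMIT 2                                                  -- keep 2 rows
"""
result = conn.execute(query).fetchall()
print(result)  # [('Comedy', 3, 68566296.0), ('Drama', 2, 31445012.0)]
```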


### The `SELECT` List

The `SELECT` list determines which columns to include in the output.

```python
sql_expr = """
SELECT movie
FROM movies;
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | movie |
|---|---|
| 0 | Adrift |
| 1 | American Animals |
| 2 | Ant-Man and the Wasp |
| 3 | Book Club |
| 4 | Christopher Robbin |
| ... | ... |
| 13 | Skyscraper |
| 14 | Solo: A Star Wars Story |
| 15 | Tag |
| 16 | The First Purge |


### Functions in the Selection List

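SQL has a wide range of functions that can be applied to each attribute in the select list. Notice that we can alias (name) the columns with `AS`. A list of built-in functions for one implementation (SQL Server) is available [here](https://www.w3schools.com/sql/sql_ref_sqlserver.asp). The set of available functions differs between implementations, so check the documentation for the system you are using.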

```python
sql_expr = """
SELECT UPPER(movie) AS Title, LOWER(genre) as Category, rating/10.0 AS rating_transformed
FROM movies;
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | Title | Category | rating_transformed |
|---|---|---|---|
| 0 | DEADPOOL 2 | action & adventure | 0.7 |
| 1 | SOLO: A STAR WARS STORY | science fiction & fantasy | 0.6 |
| 2 | ADRIFT | drama | 0.6 |
| 3 | AMERICAN ANIMALS | drama | 0.6 |
| 4 | OCEANS 8 | action & adventure | 0.6 |
| ... | ... | ... | ... |
| 13 | SKYSCRAPER | action & adventure | 0.6 |
| 14 | MAMMA MIA! HERE WE GO AGAIN | comedy | 0.8 |
| 15 | MISSION: IMPOSSIBLE-FALLOUT | action & adventure | 0.6 |
| 16 | CHRISTOPHER ROBBIN | comedy | 0.6 |
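The `10.0` in `rating/10.0` matters. `rating` is an integer, and in SQLite (as in several other systems) dividing one integer by another performs integer division. The next block checks this with the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dividing by an integer truncates; dividing by a real keeps the fraction.
int_div, real_div, cast_div = conn.execute(
    "SELECT 7 / 10, 7 / 10.0, CAST(7 AS REAL) / 10").fetchone()
print(int_div, real_div, cast_div)  # 0 0.7 0.7
```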


Try to avoid spaces in column names and aliases. Such names must be quoted every time they are referenced: with double quotes in standard SQL, or with backticks in SQLite and MySQL.
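For example, an alias that contains a space has to be quoted each time it is used. The next block uses standard double quotes, which SQLite accepts, with the standard-library `sqlite3` module. The alias `"Movie Title"` is chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?)", [("Tag", 6), ("Adrift", 7)])

cur = conn.execute("""
    SELECT movie AS "Movie Title", rating
    FROM movies
    ORDER BY "Movie Title"   -- the alias must be quoted here too
""")
header = [col[0] for col in cur.description]
rows = cur.fetchall()
print(header, rows)  # ['Movie Title', 'rating'] [('Adrift', 7), ('Tag', 6)]
```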

### Selecting Distinct Rows 

SQL uses multiset (bag) semantics, so query results preserve duplicate rows. Sometimes, however, we want to eliminate duplicates. We do this by adding the keyword `DISTINCT` after `SELECT`:

```python
sql_expr = """
SELECT DISTINCT genre
FROM movies;
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | genre |
|---|---|
| 0 | Action & Adventure |
| 1 | Science Fiction & Fantasy |
| 2 | Drama |
| 3 | Animation |
| 4 | Comedy |
| 5 | Horror |


Which rows are used when taking the distinct entries?  Does it really matter?
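`DISTINCT` compares whole rows of the `SELECT` list, not individual columns, so adding a column can bring back rows that were collapsed before. Only the selected values are kept, so it does not matter which of several identical input rows is used. The next block shows both points on a few rows of the `movies` data, using the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (genre TEXT, movie TEXT PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Comedy", "Book Club", 5),
    ("Comedy", "Uncle Drew", 5),
    ("Comedy", "Tag", 6),
    ("Drama", "Adrift", 6),
])

genres = conn.execute(
    "SELECT DISTINCT genre FROM movies ORDER BY genre").fetchall()
# Distinctness is now judged on the (genre, rating) pair.
genre_ratings = conn.execute(
    "SELECT DISTINCT genre, rating FROM movies ORDER BY genre, rating").fetchall()
print(genres)         # [('Comedy',), ('Drama',)]
print(genre_ratings)  # [('Comedy', 5), ('Comedy', 6), ('Drama', 6)]
```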

## The `WHERE` Clause

The `WHERE` clause determines which *rows* to include by specifying a predicate (boolean expression). Rows (tuples) that satisfy this expression are returned.

```python
sql_expr = """
SELECT movie, revenue
FROM movies
WHERE genre = 'Animation'
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | movie | revenue |
|---|---|---|
| 0 | Incredibles 2 | 594398019.0 |
| 1 | Hotel Transylvania 3: Summer Vacation | 154418311.0 |
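Predicates can be combined with `AND`, `OR` and `NOT`. SQL also offers shorthands such as `BETWEEN` (inclusive at both ends) and `IN`. The next block applies these to a few rows of the `movies` data with the standard-library `sqlite3` module. The helper `titles` is hypothetical and interpolates trusted, hard-coded clauses for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (genre TEXT, movie TEXT PRIMARY KEY, "
             "rating INTEGER, revenue FLOAT)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    ("Animation", "Incredibles 2", 5, 594398019),
    ("Animation", "Hotel Transylvania 3: Summer Vacation", 5, 154418311),
    ("Drama", "Adrift", 6, 31445012),
    ("Horror", "The First Purge", 7, 68765655),
])

def titles(where_clause):
    """Sorted titles of rows satisfying a WHERE clause (trusted input only)."""
    sql = f"SELECT movie FROM movies WHERE {where_clause} ORDER BY movie"
    return [m for (m,) in conn.execute(sql)]

big_animation = titles("genre = 'Animation' AND revenue > 200000000")
mid_rated = titles("rating BETWEEN 6 AND 7")
drama_or_horror = titles("genre IN ('Drama', 'Horror')")
print(big_animation)     # ['Incredibles 2']
print(mid_rated)         # ['Adrift', 'The First Purge']
print(drama_or_horror)   # ['Adrift', 'The First Purge']
```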


And of course we can specify both rows and columns explicitly. If we have a primary key, we can filter things down to even the cell level via a `select` list of one column, and a `where` clause checking equality on the primary key columns:

```python
sql_expr = """
SELECT revenue
FROM movies
WHERE movie = 'Incredibles 2';
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | revenue |
|---|---|
| 0 | 594398019.0 |


#### Important Observation -- SQL always returns a table

Note that even this **"single-celled"** response still has a uniform data type of a *relation*. 

SQL expressions take in tables and always produce tables.  How does this compare to Pandas?
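The same holds at the Python DB-API level: a one-cell answer still comes back as a list of row tuples, and pulling out the scalar is an explicit step. The next block shows this with the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY, revenue FLOAT)")
conn.execute("INSERT INTO movies VALUES ('Incredibles 2', 594398019)")

rows = conn.execute(
    "SELECT revenue FROM movies WHERE movie = 'Incredibles 2'").fetchall()
print(rows)      # [(594398019.0,)], i.e. one row with one column
revenue = rows[0][0]   # the scalar has to be extracted explicitly
print(revenue)   # 594398019.0
```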

## Group By Aggregation

GROUP BY aggregation in SQL is a lot like groupby in Pandas. SQL provides a family of *aggregate functions* (such as `COUNT`, `SUM`, `AVG`, `MIN` and `MAX`) for use in the `select` clause. In the simplest form, without a `GROUP BY`, queries with aggregates in the `select` clause generate a single row of output, with each aggregate function summarizing all the rows of input.

We can compute the average revenue and the number of movies in each genre.

```python
sql_expr = """
SELECT genre, AVG(revenue) as average_revenue, COUNT(movie) as count
FROM movies
GROUP BY genre
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | genre | average_revenue | count |
|---|---|---|---|
| 0 | Action & Adventure | 153569934.5 | 6 |
| 1 | Animation | 374408165.0 | 2 |
| 2 | Comedy | 56719237.4 | 5 |
| 3 | Drama | 17146165.5 | 2 |
| 4 | Horror | 68765655.0 | 1 |
| 5 | Science Fiction & Fantasy | 312674899.0 | 2 |
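The next block compares the two forms on three rows of the `movies` data, using the standard-library `sqlite3` module. Without a `GROUP BY`, aggregates summarize the whole table into one row. With `GROUP BY`, the query produces one row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (genre TEXT, movie TEXT PRIMARY KEY, revenue FLOAT)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Drama", "Adrift", 31445012),
    ("Drama", "American Animals", 2847319),
    ("Horror", "The First Purge", 68765655),
])

# No GROUP BY: the whole table forms one group, so there is one output row.
overall = conn.execute("SELECT COUNT(*), SUM(revenue) FROM movies").fetchall()
# GROUP BY genre: one output row per distinct genre.
per_genre = conn.execute(
    "SELECT genre, COUNT(*), SUM(revenue) FROM movies "
    "GROUP BY genre ORDER BY genre").fetchall()
print(overall)    # [(3, 103057986.0)]
print(per_genre)  # [('Drama', 2, 34292331.0), ('Horror', 1, 68765655.0)]
```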


We can use the **`HAVING`** clause to apply a predicate to groups.

```python
sql_expr = """
SELECT genre, AVG(revenue) as average_revenue, COUNT(movie) as count
FROM movies
GROUP BY genre
HAVING count > 2
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | genre | average_revenue | count |
|---|---|---|---|
| 0 | Action & Adventure | 153569934.5 | 6 |
| 1 | Comedy | 56719237.4 | 5 |
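`WHERE` filters individual rows before they are grouped, while `HAVING` filters whole groups after aggregation, so only `HAVING` can test an aggregate. Referring to the alias `count` inside `HAVING`, as in the query above, works in SQLite but is not portable; PostgreSQL, for example, rejects it. Repeating the aggregate expression, as the next block does, is the portable form. The block runs on a few rows of the `movies` data with the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (genre TEXT, movie TEXT PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Comedy", "Book Club", 5),
    ("Comedy", "Tag", 6),
    ("Comedy", "Mamma Mia! Here We Go Again", 8),
    ("Drama", "Adrift", 6),
    ("Horror", "The First Purge", 7),
])

# WHERE removes rows first (the rating-5 row), then the groups are counted.
where_first = conn.execute("""
    SELECT genre, COUNT(*) FROM movies
    WHERE rating >= 6
    GROUP BY genre ORDER BY genre""").fetchall()

# HAVING keeps only the groups whose aggregate passes the test.
having_only = conn.execute("""
    SELECT genre, COUNT(*) FROM movies
    GROUP BY genre
    HAVING COUNT(*) > 2 ORDER BY genre""").fetchall()

print(where_first)  # [('Comedy', 2), ('Drama', 1), ('Horror', 1)]
print(having_only)  # [('Comedy', 3)]
```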


### ORDER BY

SQL allows us to order output rows in

- ascending order (`ASC`, the default)
- descending order (`DESC`)



```python
sql_expr = """
SELECT *
FROM movies
ORDER BY rating;
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | Brad | Animation | Incredibles 2 | 5 | 594398019.0 |
| 1 | Charles | Comedy | Uncle Drew | 5 | 42201656.0 |
| 2 | Peyton | Action & Adventure | Ant-Man and the Wasp | 5 | 208681866.0 |
| 3 | Genndy | Animation | Hotel Transylvania 3: Summer Vacation | 5 | 154418311.0 |
| 4 | Bill | Comedy | Book Club | 5 | 68566296.0 |
| ... | ... | ... | ... | ... | ... |
| 13 | Marc | Comedy | Christopher Robbin | 6 | 6786317.0 |
| 14 | David | Action & Adventure | Deadpool 2 | 7 | 318344544.0 |
| 15 | Gerard | Horror | The First Purge | 7 | 68765655.0 |
| 16 | Drew | Action & Adventure | Hotel Artemis | 8 | 6708147.0 |


```python
sql_expr = """
SELECT *
FROM movies
ORDER BY rating DESC;
"""
pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | Drew | Action & Adventure | Hotel Artemis | 8 | 6708147.0 |
| 1 | Ol | Comedy | Mamma Mia! Here We Go Again | 8 | 111705055.0 |
| 2 | David | Action & Adventure | Deadpool 2 | 7 | 318344544.0 |
| 3 | Gerard | Horror | The First Purge | 7 | 68765655.0 |
| 4 | Ron | Science Fiction & Fantasy | Solo: A Star Wars Story | 6 | 213476293.0 |
| ... | ... | ... | ... | ... | ... |
| 13 | Brad | Animation | Incredibles 2 | 5 | 594398019.0 |
| 14 | Charles | Comedy | Uncle Drew | 5 | 42201656.0 |
| 15 | Peyton | Action & Adventure | Ant-Man and the Wasp | 5 | 208681866.0 |
| 16 | Genndy | Animation | Hotel Transylvania 3: Summer Vacation | 5 | 154418311.0 |
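Rows that tie on the sort key come back in no guaranteed order relative to each other; here, several movies share each rating. Listing a second sort key breaks those ties deterministically, and each key can have its own direction. The next block shows this on a few rows of the `movies` data, using the standard-library `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?)", [
    ("Tag", 6), ("Hotel Artemis", 8), ("Adrift", 6),
    ("Mamma Mia! Here We Go Again", 8), ("Book Club", 5),
])

# Highest rating first; ties are broken alphabetically by title.
ordered = conn.execute(
    "SELECT movie, rating FROM movies ORDER BY rating DESC, movie ASC").fetchall()
print(ordered)
```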


## LIMIT Clause

The `LIMIT` clause limits the number of rows returned.

```python
sql_expr = """
SELECT * 
FROM movies 
LIMIT 3
"""

pd.read_sql(sql_expr, sqlite_engine)
```

| | director | genre | movie | rating | revenue |
|---|---|---|---|---|---|
| 0 | David | Action & Adventure | Deadpool 2 | 7 | 318344544.0 |
| 1 | Ron | Science Fiction & Fantasy | Solo: A Star Wars Story | 6 | 213476293.0 |
| 2 | Baltasar | Drama | Adrift | 6 | 31445012.0 |


**Why do we use the `LIMIT` clause?**

Often the database we are querying is massive, and retrieving the entire table while we debug a query can be costly in time and system resources. However, we should avoid using `LIMIT` to construct a sample of the data.

**Which elements are returned?**

The rows returned depend on the order in which the database produces them, which is arbitrary beyond anything specified by an `ORDER BY` clause. In particular, the output is not a random sample.
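To make `LIMIT` return a well-defined set of rows, pair it with `ORDER BY`. `OFFSET` then skips rows, which is a common way to page through results. For an actual random sample, SQLite offers `ORDER BY RANDOM()`; other systems have their own sampling syntax. This approach sorts the whole table, so it can be slow on large data. The next block demonstrates all three with the standard-library `sqlite3` module on a few rows of the `movies` data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT PRIMARY KEY, revenue FLOAT)")
conn.executemany("INSERT INTO movies VALUES (?, ?)", [
    ("Incredibles 2", 594398019),
    ("Jurassic World: Fallen Kingdom", 411873505),
    ("Deadpool 2", 318344544),
    ("Solo: A Star Wars Story", 213476293),
    ("Tag", 54336863),
])

# Top 2 by revenue: deterministic because of ORDER BY.
top_two = conn.execute(
    "SELECT movie FROM movies ORDER BY revenue DESC LIMIT 2").fetchall()
# The next "page" of 2 rows.
next_two = conn.execute(
    "SELECT movie FROM movies ORDER BY revenue DESC LIMIT 2 OFFSET 2").fetchall()
# A random sample of 3 rows (different on each run).
sample = conn.execute(
    "SELECT movie FROM movies ORDER BY RANDOM() LIMIT 3").fetchall()

print(top_two)      # [('Incredibles 2',), ('Jurassic World: Fallen Kingdom',)]
print(next_two)     # [('Deadpool 2',), ('Solo: A Star Wars Story',)]
print(len(sample))  # 3
```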

