# Level 4: SQL Basics – Data Query Language (DQL)

**Data Query Language (DQL)** is the part of SQL used to retrieve data from a database, which makes it the part you will use most in data analysis. The cornerstone of DQL is the `SELECT` statement.

In this notebook, we will cover:
- `SELECT`: To choose the data you want to see.
- `WHERE`: To filter the data based on conditions.
- `ORDER BY`: To sort the results.
- `LIMIT` / `OFFSET`: To cap the number of rows returned and skip rows before the first one returned.
- `DISTINCT`: To remove duplicate rows from the results.

### Setup
Let's create a database with some sample data about books.

In [1]:
import sqlite3
import os

db_file = 'dql_example.db'
if os.path.exists(db_file):
    os.remove(db_file)

conn = sqlite3.connect(db_file)
cursor = conn.cursor()

cursor.execute("""
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT NOT NULL,
    genre TEXT,
    publish_year INTEGER,
    rating REAL
);
""")

books_data = [
    ('The Hobbit', 'J.R.R. Tolkien', 'Fantasy', 1937, 4.8),
    ('Dune', 'Frank Herbert', 'Sci-Fi', 1965, 4.9),
    ('1984', 'George Orwell', 'Dystopian', 1949, 4.7),
    ('A Brief History of Time', 'Stephen Hawking', 'Science', 1988, 4.6),
    ('Pride and Prejudice', 'Jane Austen', 'Romance', 1813, 4.5),
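    # Double quotes keep the apostrophe; 'Hitchhiker''s' would be two adjacent Python literals and drop it
    ("The Hitchhiker's Guide to the Galaxy", 'Douglas Adams', 'Sci-Fi', 1979, 4.9)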
]

cursor.executemany("INSERT INTO books (title, author, genre, publish_year, rating) VALUES (?, ?, ?, ?, ?)", books_data)
conn.commit()

## 4.1 `SELECT` Statements

Select all columns from the table:

In [2]:
cursor.execute("SELECT * FROM books;")
print(cursor.fetchall())

[(1, 'The Hobbit', 'J.R.R. Tolkien', 'Fantasy', 1937, 4.8), (2, 'Dune', 'Frank Herbert', 'Sci-Fi', 1965, 4.9), (3, '1984', 'George Orwell', 'Dystopian', 1949, 4.7), (4, 'A Brief History of Time', 'Stephen Hawking', 'Science', 1988, 4.6), (5, 'Pride and Prejudice', 'Jane Austen', 'Romance', 1813, 4.5), (6, "The Hitchhiker's Guide to the Galaxy", 'Douglas Adams', 'Sci-Fi', 1979, 4.9)]


Select specific columns:

In [3]:
cursor.execute("SELECT title, author FROM books;")
print(cursor.fetchall())

[('The Hobbit', 'J.R.R. Tolkien'), ('Dune', 'Frank Herbert'), ('1984', 'George Orwell'), ('A Brief History of Time', 'Stephen Hawking'), ('Pride and Prejudice', 'Jane Austen'), ("The Hitchhiker's Guide to the Galaxy", 'Douglas Adams')]


## 4.2 Filtering with `WHERE`

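The `WHERE` clause returns only the rows that satisfy a condition. In the cells below, each `?` placeholder is filled in from the tuple passed to `execute`, which keeps the values out of the SQL string itself.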

In [4]:
# Find all Sci-Fi books
cursor.execute("SELECT title, author FROM books WHERE genre = ?;", ('Sci-Fi',))
print("Sci-Fi Books:", cursor.fetchall())

Sci-Fi Books: [('Dune', 'Frank Herbert'), ("The Hitchhiker's Guide to the Galaxy", 'Douglas Adams')]


In [5]:
# Find books published after 1950 with a rating > 4.7
cursor.execute("SELECT title FROM books WHERE publish_year > ? AND rating > ?;", (1950, 4.7))
print("\nPublished after 1950 with high rating:", cursor.fetchall())


Published after 1950 with high rating: [('Dune',), ("The Hitchhiker's Guide to the Galaxy",)]


### Other `WHERE` Operators
- `IN`: Check whether a value matches any value in a list. `WHERE genre IN ('Fantasy', 'Romance')`
- `BETWEEN`: Check whether a value falls within a range, endpoints included. `WHERE publish_year BETWEEN 1900 AND 2000`
- `LIKE`: Match a string against a pattern, where `%` stands for any sequence of characters. `WHERE author LIKE 'J%'` (finds authors starting with J)
- `IS NULL`: Check for missing values. `WHERE genre IS NULL`
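
The operators listed above can be tried out with a short sketch like the one below. It builds its own throwaway in-memory table rather than reusing the notebook's `conn`, and it adds one hypothetical row ("Untitled Draft" by "Jo Example") with a `NULL` genre so that `IS NULL` has something to find. The `demo` connection and the `titles` helper are names made up for this illustration.

```python
import sqlite3

# Throwaway in-memory database so this sketch runs on its own.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE books (title TEXT, author TEXT, genre TEXT, publish_year INTEGER)"
)
demo.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("The Hobbit", "J.R.R. Tolkien", "Fantasy", 1937),
        ("Dune", "Frank Herbert", "Sci-Fi", 1965),
        ("Pride and Prejudice", "Jane Austen", "Romance", 1813),
        ("Untitled Draft", "Jo Example", None, 2001),  # hypothetical row, NULL genre
    ],
)


def titles(sql, params=()):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in demo.execute(sql, params)]


# IN: genre matches any value in the list
in_titles = titles(
    "SELECT title FROM books WHERE genre IN (?, ?) ORDER BY title",
    ("Fantasy", "Romance"),
)

# BETWEEN: inclusive range on publish_year
between_titles = titles(
    "SELECT title FROM books WHERE publish_year BETWEEN ? AND ? ORDER BY title",
    (1900, 2000),
)

# LIKE: author starts with "J"
like_titles = titles(
    "SELECT title FROM books WHERE author LIKE ? ORDER BY title", ("J%",)
)

# IS NULL finds the missing genre; "= NULL" is never true, so it matches nothing
null_titles = titles("SELECT title FROM books WHERE genre IS NULL")
equals_null_titles = titles("SELECT title FROM books WHERE genre = NULL")

print("IN:", in_titles)
print("BETWEEN:", between_titles)
print("LIKE:", like_titles)
print("IS NULL:", null_titles)
print("= NULL:", equals_null_titles)
```

The last two queries show why `IS NULL` exists as its own operator: comparing anything to `NULL` with `=` yields `NULL` rather than true, so that query returns no rows.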

## 4.3 Sorting Results (`ORDER BY`)

In [6]:
# Order books by publication year, oldest first (ASC is default)
cursor.execute("SELECT title, publish_year FROM books ORDER BY publish_year;")
print("Ordered by year (ASC):", cursor.fetchall())

Ordered by year (ASC): [('Pride and Prejudice', 1813), ('The Hobbit', 1937), ('1984', 1949), ('Dune', 1965), ("The Hitchhiker's Guide to the Galaxy", 1979), ('A Brief History of Time', 1988)]


In [7]:
# Order books by rating, highest first
cursor.execute("SELECT title, rating FROM books ORDER BY rating DESC;")
print("\nOrdered by rating (DESC):", cursor.fetchall())


Ordered by rating (DESC): [('Dune', 4.9), ("The Hitchhiker's Guide to the Galaxy", 4.9), ('The Hobbit', 4.8), ('1984', 4.7), ('A Brief History of Time', 4.6), ('Pride and Prejudice', 4.5)]
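
Two books in the output above share a rating of 4.9, and `ORDER BY rating DESC` alone does not say which of them comes first. A second sort column can act as a tie-breaker. The sketch below is a minimal illustration on a separate in-memory table (the `demo` connection is an assumption of this example, not the notebook's `conn`), with the tied rows deliberately inserted in reverse alphabetical order.

```python
import sqlite3

# Throwaway in-memory database so this sketch runs on its own.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE books (title TEXT, rating REAL)")
demo.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("The Hitchhiker's Guide to the Galaxy", 4.9),
        ("The Hobbit", 4.8),
        ("Dune", 4.9),
        ("1984", 4.7),
    ],
)

# Sort by rating (highest first); break ties alphabetically by title.
ranked = demo.execute(
    "SELECT title, rating FROM books ORDER BY rating DESC, title ASC;"
).fetchall()

for title, rating in ranked:
    print(rating, title)
```

Each column in `ORDER BY` can carry its own `ASC` or `DESC`, so the direction of the tie-breaker is independent of the main sort.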


## 4.4 Limiting Results (`LIMIT` and `OFFSET`)

`LIMIT` caps the number of rows returned. `OFFSET` skips a given number of rows before the first one returned. Together they are often used for pagination, and they should be paired with `ORDER BY` so that the rows selected are predictable.

In [8]:
# Get the top 3 highest-rated books
cursor.execute("SELECT title, rating FROM books ORDER BY rating DESC LIMIT 3;")
print("Top 3 books:", cursor.fetchall())

Top 3 books: [('Dune', 4.9), ("The Hitchhiker's Guide to the Galaxy", 4.9), ('The Hobbit', 4.8)]


In [9]:
# Get the 4th and 5th highest-rated books (skip the top 3)
cursor.execute("SELECT title, rating FROM books ORDER BY rating DESC LIMIT 2 OFFSET 3;")
print("\n4th and 5th books:", cursor.fetchall())


4th and 5th books: [('1984', 4.7), ('A Brief History of Time', 4.6)]
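
For pagination, the offset is usually computed from a page number. The sketch below shows one way this might look. `fetch_page`, its parameters, and the `demo` connection are hypothetical names introduced for this example, and the table is rebuilt in memory so the block runs on its own. The `id` column is added to the sort as a tie-breaker so that books with equal ratings always land on the same page.

```python
import sqlite3

# Throwaway in-memory database so this sketch runs on its own.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, rating REAL)")
demo.executemany(
    "INSERT INTO books (title, rating) VALUES (?, ?)",
    [
        ("The Hobbit", 4.8),
        ("Dune", 4.9),
        ("1984", 4.7),
        ("A Brief History of Time", 4.6),
        ("Pride and Prejudice", 4.5),
        ("The Hitchhiker's Guide to the Galaxy", 4.9),
    ],
)


def fetch_page(conn, page, page_size):
    """Return one page of titles, highest-rated first (pages start at 1)."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT title FROM books ORDER BY rating DESC, id ASC LIMIT ? OFFSET ?;",
        (page_size, offset),
    )
    return [title for (title,) in rows]


page1 = fetch_page(demo, 1, 4)
page2 = fetch_page(demo, 2, 4)
page3 = fetch_page(demo, 3, 4)  # past the end of the table

print("Page 1:", page1)
print("Page 2:", page2)
print("Page 3:", page3)
```

A page past the end of the data simply comes back empty, which gives calling code an easy stop condition.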


## 4.5 Removing Duplicates (`DISTINCT`)

The `DISTINCT` keyword removes duplicate rows from the result, so each value (or each combination of the selected columns) appears only once.

In [10]:
# Get the list of unique genres
cursor.execute("SELECT DISTINCT genre FROM books;")
print("Unique genres:", cursor.fetchall())

Unique genres: [('Fantasy',), ('Sci-Fi',), ('Dystopian',), ('Science',), ('Romance',)]
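
`DISTINCT` applies to the whole selected row, not just the first column. The sketch below illustrates this on a small in-memory table. The `demo` connection is an assumption of this example, and "Dune Messiah" is an extra row that is not in the notebook's sample data, added only so that one author and genre pair occurs twice.

```python
import sqlite3

# Throwaway in-memory database so this sketch runs on its own.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE books (title TEXT, author TEXT, genre TEXT)")
demo.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Dune", "Frank Herbert", "Sci-Fi"),
        ("Dune Messiah", "Frank Herbert", "Sci-Fi"),  # extra row, not in the notebook data
        ("The Hitchhiker's Guide to the Galaxy", "Douglas Adams", "Sci-Fi"),
        ("The Hobbit", "J.R.R. Tolkien", "Fantasy"),
    ],
)

# One column: each genre appears once, although three rows are Sci-Fi.
genres = demo.execute("SELECT DISTINCT genre FROM books ORDER BY genre;").fetchall()

# Two columns: rows are de-duplicated on the (author, genre) combination.
pairs = demo.execute(
    "SELECT DISTINCT author, genre FROM books ORDER BY author;"
).fetchall()

print("Genres:", genres)
print("Author/genre pairs:", pairs)
```

"Sci-Fi" still appears twice in the second result because it is paired with two different authors; only the repeated Frank Herbert row is collapsed.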


In [11]:
# Close the connection
conn.close()