<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_02_IntroToSQL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##  What is the Relational Model?

The **relational model** of data has revolutionized the way we store, organize, and interact with information. This model, proposed by English computer scientist Edgar F. Codd in 1970 while working at IBM, forms the foundation of modern database systems.

At its core, the relational model is a way of structuring data in a database using tables. These tables, which resemble spreadsheets, represent specific entities or concepts. For instance, a library database might have tables for "Books," "Authors," "Publishers," and "Borrowers."

Key concepts of the relational model:

-   An **entity** is a specific object or concept that data is being stored about, such as a book, author, or customer.
-   An **attribute** is a characteristic or property of an entity, such as a book's title, author, or publication date.
-   A **table** is a collection of data elements organized in terms of rows and columns. Each table represents a specific entity.
-   A **row** (record or tuple) is a single instance of an entity in a table, such as one specific book in the "Books" table.
-   A **column** (field) represents an attribute of the entity in a table, such as the "Title" or "Author" of a book.
-   A **primary key** is a unique identifier for each row in a table, used to distinguish one record from another.
-   A **foreign key** is a field in one table that refers to the primary key of a row in another table, used to establish relationships between tables.

Here's an example of what a simple "Books" table might look like:

| id | title | author | publication_year |
| --- | --- | --- | --- |
| 1 | Dune | Frank Herbert | 1965 |
| 2 | Ender's Game | Orson Scott Card | 1985 |
| 3 | Hitchhiker's Guide to the Galaxy | Douglas Adams | 1979 |

In this table, each row represents a specific book, with columns for the book's unique identifier (id), title, author, and publication year.

The real strength of the relational model lies in its ability to establish relationships between tables. Instead of duplicating data across tables, we can link related data using keys.

For example, let's say we have a separate "Authors" table:

| id | name | birth_year |
| --- | --- | --- |
| 1 | Frank Herbert | 1920 |
| 2 | Orson Scott Card | 1951 |
| 3 | Douglas Adams | 1952 |

We can link the "Books" and "Authors" tables by replacing the "author" column in the "Books" table with an "author_id" field, which serves as a foreign key referencing the "id" field (the primary key) in the "Authors" table:

| id | title | author_id | publication_year |
| --- | --- | --- | --- |
| 1 | Dune | 1 | 1965 |
| 2 | Ender's Game | 2 | 1985 |
| 3 | Hitchhiker's Guide to the Galaxy | 3 | 1979 |

This structure allows us to efficiently store and manage data, as changes to an author's details only need to be made in the "Authors" table, and will automatically be reflected wherever that author is referenced.
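The linked tables above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration rather than part of the book's dataset: the in-memory database, the column types, and the name change applied to one author are assumptions made for the example.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Authors (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    birth_year INTEGER
);
CREATE TABLE Books (
    id               INTEGER PRIMARY KEY,
    title            TEXT NOT NULL,
    author_id        INTEGER REFERENCES Authors(id),  -- foreign key
    publication_year INTEGER
);
INSERT INTO Authors VALUES
    (1, 'Frank Herbert', 1920),
    (2, 'Orson Scott Card', 1951),
    (3, 'Douglas Adams', 1952);
INSERT INTO Books VALUES
    (1, 'Dune', 1, 1965),
    (2, 'Ender''s Game', 2, 1985),
    (3, 'Hitchhiker''s Guide to the Galaxy', 3, 1979);
""")

# Follow the foreign key from each book to its author.
JOIN_QUERY = """
SELECT Books.title, Authors.name
FROM Books
JOIN Authors ON Books.author_id = Authors.id
ORDER BY Books.id
"""
rows_before = conn.execute(JOIN_QUERY).fetchall()

# Change the author's name in exactly one place (illustrative edit).
conn.execute("UPDATE Authors SET name = 'Douglas Noel Adams' WHERE id = 3")
rows_after = conn.execute(JOIN_QUERY).fetchall()

# A book that points at a nonexistent author is rejected.
try:
    conn.execute("INSERT INTO Books VALUES (4, 'Orphan Book', 99, 2000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows_after[2])
print(orphan_rejected)
```

The update touches a single row in "Authors", yet every book that references that author picks up the new name the next time the tables are joined.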

## What is SQL?

**SQL**, which stands for Structured Query Language, is a programming language used to communicate with and manipulate relational databases. SQL was developed in the early 1970s at IBM to work with these relational databases, and it has since become the standard language for interacting with them.

One of the key features of SQL is that it is a **declarative language**. This means that when you write SQL queries, you tell the database what you want, but not how to get it. It's a bit like telling a taxi driver your destination without specifying the exact route they should take. The database figures out the best way to execute your query and retrieve the data you requested.

A **query** is simply a question or a request for data from a database. With SQL, you can write queries to retrieve, insert, update, and delete data in a database. For example, if you have a database of sci-fi books, you might write a query to find all the books published after a certain year, like this:

```sql
SELECT title, author, publication_year
FROM Books
WHERE publication_year > 1980;
```

This query tells the database, "Give me the title, author, and publication year for all the books in the 'Books' table that were published after 1980."
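To make the declarative idea concrete, here is a small sketch using Python's built-in `sqlite3` module. It assumes an in-memory "Books" table holding the three example rows from the earlier section, and it contrasts the SQL query with a hand-written loop that spells out the same filtering step by step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Books ("
    "id INTEGER PRIMARY KEY, title TEXT, author TEXT, publication_year INTEGER)"
)
books = [
    (1, "Dune", "Frank Herbert", 1965),
    (2, "Ender's Game", "Orson Scott Card", 1985),
    (3, "Hitchhiker's Guide to the Galaxy", "Douglas Adams", 1979),
]
conn.executemany("INSERT INTO Books VALUES (?, ?, ?, ?)", books)

# Declarative: state the condition and let SQLite decide how to find the rows.
declarative = conn.execute(
    "SELECT title, author, publication_year "
    "FROM Books WHERE publication_year > 1980"
).fetchall()

# Imperative: write the loop and the test ourselves.
imperative = []
for _book_id, title, author, year in books:
    if year > 1980:
        imperative.append((title, author, year))

print(declarative)  # [("Ender's Game", 'Orson Scott Card', 1985)]
```

Both approaches return the same single row, but only the loop commits to a particular way of scanning the data.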

SQL is divided into several sublanguages, each with a specific purpose:

1.  **Data Definition Language (DDL).** This part of SQL is used to define and modify the structure of a database, including creating, altering, and deleting tables and other database objects.
2.  **Data Manipulation Language (DML).** DML is used to manipulate the data stored in a database. This includes inserting new data, updating existing data, and deleting data.
3.  **Data Query Language (DQL).** This is the part of SQL used to retrieve data from a database. The most common DQL command is SELECT, which is used to query data from one or more tables.
4.  **Data Control Language (DCL).** DCL is used to manage access to a database. This includes granting and revoking permissions for users to perform specific actions on the database.
5.  **Transaction Control Language (TCL).** TCL is used to manage database transactions. This includes commands to commit (permanently save) or roll back (undo) changes made to the database.
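The sketch below shows roughly where each sublanguage appears in practice, using Python's built-in `sqlite3` module and a throwaway in-memory table (the table and its single row are assumptions for illustration). DCL is left out because SQLite does not implement `GRANT` and `REVOKE`; access to an SQLite database is governed by ordinary file permissions instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute(
    "CREATE TABLE Books (id INTEGER PRIMARY KEY, title TEXT, publication_year INTEGER)"
)

# DML: add data. Python's sqlite3 module opens a transaction implicitly here.
conn.execute("INSERT INTO Books VALUES (1, 'Dune', 1965)")

# TCL: make the insert permanent.
conn.commit()

# DML again, followed by TCL to undo it.
conn.execute("DELETE FROM Books")
conn.rollback()

# DQL: read the data back. The row survives because the DELETE was rolled back.
titles = [row[0] for row in conn.execute("SELECT title FROM Books")]
print(titles)  # ['Dune']
```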

By learning SQL, you gain a powerful tool for working with relational databases. Whether you're a data analyst, a software developer, or just someone who wants to understand how to manage and query data effectively, SQL is a valuable skill to have. It's used in a wide variety of settings, from small personal projects to large-scale enterprise applications, making it a versatile and essential part of working with data.

##  What is SQLite?

**SQLite** is a lightweight, file-based relational database management system (RDBMS) that is embedded directly into the application that uses it. Unlike traditional database systems that run as separate server processes, SQLite is a serverless, self-contained library that requires minimal support from external libraries or operating systems. This architecture makes SQLite an ideal choice for many scenarios, particularly those where a full-fledged database server might be overkill or impractical. It's one of the most widely installed pieces of software in the world, and is the database of choice for many mobile, web, and desktop applications, as well as for embedded databases in internet-of-things devices.

One of the key advantages of SQLite is its simplicity and ease of use. Setting up a new SQLite database is as simple as creating a new file on your computer. There is no need to install any additional software or configure a complex server environment. This makes SQLite an excellent choice for beginners learning SQL and for developers who need a quick and easy way to store and manage data in their applications. However, SQLite isn't well-suited for traditional business/enterprise use (where there are large numbers of users simultaneously writing to huge databases across a network). For this, you'd need to use a traditional "client-server database" (such as Oracle, SQL Server, MySQL, or Postgres).
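As a rough illustration of how little setup is involved, the following sketch creates a database with Python's built-in `sqlite3` module. The file name `my_library.db` and the temporary directory are hypothetical choices for this example only.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name inside a throwaway directory.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "my_library.db")

existed_before = os.path.exists(db_path)

# Connecting to a path that does not exist yet creates a new database there.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE Books (id INTEGER PRIMARY KEY, title TEXT)")
conn.commit()
conn.close()

existed_after = os.path.exists(db_path)
print(existed_before, existed_after)  # False True
```

The whole database lives in that one file, so copying or deleting the file copies or deletes the database.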

Happily, SQLite supports most of the core features of SQL, including complex queries, transactions, and constraints. This means that you can use the same SQL syntax and concepts you would use with other RDBMSs, making it a great learning tool and a gateway to more advanced database systems.
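Here is a minimal sketch of constraints and transactions working together, again using Python's `sqlite3` module. The table definition and the placeholder rows are assumptions for illustration and are not taken from the book's dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Books (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    pages INTEGER CHECK (pages > 0)
)
""")
conn.execute("INSERT INTO Books VALUES (1, 'Example Book', 300)")
conn.commit()

error_message = None
try:
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if an exception escapes it.
    with conn:
        conn.execute("INSERT INTO Books VALUES (2, 'Second Example', 250)")
        # Violates the CHECK constraint, so the whole transaction is undone.
        conn.execute("INSERT INTO Books VALUES (3, 'Broken Example', -5)")
except sqlite3.IntegrityError as err:
    error_message = str(err)

count = conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]
print(count)  # 1
```

Because the second insert fails, the first insert in the same transaction is rolled back too, leaving only the original row.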

Throughout this book, we'll be using SQLite to explore SQL and relational databases. By the end, you'll have a solid foundation in SQL and a practical understanding of how to use SQLite to manage and query data in your own projects, whether you're building a mobile app, a small web application, or a desktop tool for managing your personal library of sci-fi books.

## Getting Started With SQLite in Google Colab

Google Colab is a free, cloud-based Jupyter notebook environment that allows you to write and execute Python code in your browser. It's a great platform for learning and experimenting with SQL and SQLite, as it provides a simple, interactive way to run SQL queries and visualize the results.

To get started with SQLite in Google Colab, we'll use the IPython SQL magic extension. This extension allows us to write SQL queries directly in our notebook cells, prefixed with the `%%sql` magic command.

First, if needed, let's install the IPython SQL extension and load it into our notebook:

```python
# !pip install ipython-sql # Not needed--already installed
%load_ext sql
```

Now that we have the SQL magic extension loaded, we can connect to our SQLite database. In this case, we'll be using a pre-populated database file called `sci_fi_books.db`, which contains data about a collection of sci-fi books.

To load the `sci_fi_books.db` file into our Colab notebook, we can use the `wget` command to download it from a web link:

```python
!wget -N 'https://github.com/brendanpshea/database_sql/raw/main/data/sci_fi_books.db' -q
```

Once the file is downloaded, we can connect to it using the SQL magic extension:

```python
%sql sqlite:///sci_fi_books.db
```

This command establishes a connection to the `sci_fi_books.db` SQLite database file in our current working directory.

With the connection established, we can now run SQL queries on our database directly in our Colab notebook cells. For example, to see a list of all the tables in our database, we can use the following query:



```sql
%%sql
SELECT name FROM sqlite_master WHERE type='table';
```

| name |
| --- |
| books |
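The same lookup can be run outside the SQL magic with Python's built-in `sqlite3` module. In this sketch an in-memory database with a single empty `books` table stands in for the downloaded file (an assumption so the example runs anywhere); in Colab you would pass `"sci_fi_books.db"` to `sqlite3.connect` instead.

```python
import sqlite3

# Stand-in for sci_fi_books.db: an in-memory database with one empty table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")

# sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)  # ['books']
```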


## Getting to Know the Books Table

The data we'll be working with throughout this book comes from Goodreads, a popular social cataloging website that allows users to search its database of books, annotations, and reviews. Users can sign up and register books to generate library catalogs and reading lists. They can also create their own groups of book suggestions, surveys, polls, blogs, and discussions.

For our purposes, we'll be using a dataset that contains information about a collection of science fiction books. This data has been extracted from Goodreads and formatted to fit into a SQLite database, allowing us to easily explore and analyze it using SQL.

To work effectively with a database, it's crucial to understand its structure or schema. A database **schema** is a blueprint that defines how the data is organized. It specifies what tables exist in the database, what columns each table has, and what type of data each column can store.

Here's a breakdown of the schema for our `books` table:

-   `id`: A unique identifier for each book, serving as the PRIMARY KEY for the table. This ensures that each record in the table can be uniquely identified and accessed.
-   `title`: The title of the book, stored as a TEXT data type.
-   `series`: If the book is part of a series, this column will contain the name of that series. It's also stored as TEXT.
-   `author`: The author of the book, stored as TEXT.
-   `rating`: The average rating of the book on Goodreads, stored as a REAL number (a decimal number).
-   `language`: The language the book is written in, stored as TEXT.
-   `pages`: The number of pages in the book, stored as an INT (integer).
-   `publisher`: The name of the book's publisher, stored as TEXT.
-   `numRatings`: The total number of ratings the book has received on Goodreads, stored as an INT.
-   `firstPublishDate`: The date when the book was first published, stored as TEXT.
-   `publishDate`: The most recent publication date of the book, also stored as TEXT.
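One way to picture this schema is as the `CREATE TABLE` statement that could have produced it. The statement below is a reconstruction from the column list above, not the dataset's actual definition, and the `INTEGER` type for `id` is an assumption. The sketch also uses `PRAGMA table_info`, which lets you inspect the columns of any SQLite table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE books (
    id               INTEGER PRIMARY KEY,  -- type assumed for this sketch
    title            TEXT,
    series           TEXT,
    author           TEXT,
    rating           REAL,
    language         TEXT,
    pages            INT,
    publisher        TEXT,
    numRatings       INT,
    firstPublishDate TEXT,
    publishDate      TEXT
)
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
schema = {
    name: col_type
    for _cid, name, col_type, _notnull, _default, _pk
    in conn.execute("PRAGMA table_info(books)")
}

for name, col_type in schema.items():
    print(f"{name}: {col_type}")
```

Running the same `PRAGMA table_info(books)` against the real database file is a quick way to check the column list for yourself.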

Understanding this schema is essential for writing effective SQL queries. It tells us what information we have access to and how it's structured. For example, knowing that the `rating` is stored as a REAL number tells us that we can perform mathematical operations on it, like finding the average rating across all books. Knowing that `pages` is an INT tells us that we can use it for numerical comparisons, like finding all books over 500 pages.
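The two examples just mentioned might look like this in practice. The sketch uses Python's `sqlite3` module with a cut-down `books` table; the three rows are made-up placeholder values, not records from the Goodreads dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, rating REAL, pages INT)"
)
# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        (1, "Placeholder A", 4.0, 320),
        (2, "Placeholder B", 3.5, 610),
        (3, "Placeholder C", 4.5, 540),
    ],
)

# REAL column: arithmetic such as averaging works directly.
avg_rating = conn.execute("SELECT AVG(rating) FROM books").fetchone()[0]

# INT column: numeric comparison in the WHERE clause.
long_books = [
    row[0]
    for row in conn.execute("SELECT title FROM books WHERE pages > 500 ORDER BY id")
]

print(avg_rating)  # 4.0
print(long_books)  # ['Placeholder B', 'Placeholder C']
```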