<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_09_PokemonAndPostgres.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## What is PostgreSQL? How Does it Differ from SQLite?

PostgreSQL and SQLite are both relational database management systems that support ACID (Atomicity, Consistency, Isolation, Durability) properties and transactions. However, they have significant differences in their architectures, feature sets, and use cases.

**SQLite** is a lightweight, serverless, and self-contained database engine. It's ideal for small to medium-scale applications, embedded systems, or local data storage. SQLite stores the entire database as a single file on disk, making it easy to set up and manage. It's often used in mobile apps, desktop applications, and small websites.

On the other hand, **PostgreSQL** is a full-featured, server-based RDBMS designed to handle large amounts of data and support multiple concurrent users. Its client-server architecture allows it to manage resources more efficiently and handle heavier workloads.

When deciding between SQLite and PostgreSQL, consider the following factors:

1.  **Scalability**: As your application grows and the number of concurrent users increases, PostgreSQL's client-server architecture becomes crucial. It can handle a large number of simultaneous connections and efficiently manage resources. SQLite, being serverless, may struggle with high levels of concurrency and may not be suitable for applications with a large number of concurrent writers.
2.  **Data Size and Distribution**: PostgreSQL is designed to handle very large datasets, even in the terabyte range. It offers features like table partitioning, which splits one logically large table into smaller physical pieces, improving query performance and manageability. SQLite, while capable of handling moderately sized datasets, may not be the best choice for extremely large or distributed datasets.
3.  **Advanced Features**: PostgreSQL offers a rich set of advanced features that become increasingly important as your application grows. These include:
    -   **Strict Typing**: PostgreSQL enforces strict data typing, ensuring data integrity and reducing the chances of data inconsistencies. This becomes increasingly critical when there are many "writers" to the database.
    -   **Complex Queries**: PostgreSQL supports complex queries, including advanced joins, subqueries, and window functions, which are essential for handling sophisticated data retrieval tasks.
    -   **Stored Procedures and Triggers**: PostgreSQL allows you to define stored procedures and triggers, enabling you to encapsulate complex business logic within the database itself. This can lead to better performance and maintainability.
    -   **Extensibility**: PostgreSQL is highly extensible, allowing you to add custom data types, functions, and even programming languages. This flexibility becomes crucial as your application's requirements evolve.
    -   **Security and User Management**: Postgres has built-in support for things like encryption, user management, password hashing, and other security measures. SQLite, by contrast, relies on the surrounding "application" (written in Python, Java, C#, etc.) to handle these things. This can become impractical as the number of users grows.
4.  **Replication and High Availability**: As your application becomes mission-critical, you may need to ensure high availability and minimize downtime. PostgreSQL offers built-in replication features, such as streaming replication and logical replication, which allow you to create standby servers and distribute the workload. SQLite, being a serverless database, does not have built-in replication capabilities.
5.  **Available Resources**: Postgres requires more physical resources (processing power, disk space) and human resources (e.g., a trained database administrator) than SQLite. SQLite's dynamic typing can make database development and deployment quicker than Postgres's strict typing. (In fact, SQLite is often used to develop prototype databases, which can then be "scaled up" to Postgres or a similar RDBMS.)
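
The difference between strict and dynamic typing is easiest to see from the SQLite side. The following is a minimal sketch using Python's built-in `sqlite3` module; the table and the function name are made up for illustration. SQLite happily stores a text value in a column declared `INTEGER`, whereas PostgreSQL would reject the same insert with a type error.

```python
import sqlite3


def store_text_in_integer_column():
    """Insert a non-numeric string into an INTEGER column in SQLite.

    Returns the stored value and the storage type SQLite reports for it.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE pokemon (name TEXT, level INTEGER)")
    # 'twenty-five' is not a number, but SQLite accepts it anyway.
    conn.execute("INSERT INTO pokemon VALUES (?, ?)", ("Pikachu", "twenty-five"))
    row = conn.execute("SELECT level, typeof(level) FROM pokemon").fetchone()
    conn.close()
    return row


value, storage_type = store_text_in_integer_column()
print(value, storage_type)
```

With many writers, this kind of silent acceptance is exactly how inconsistent data creeps into a table.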

While SQLite is a great choice for small to medium-sized applications, embedded systems, or local data storage, it may not be suitable for large-scale, high-concurrency, or mission-critical applications. In these cases, PostgreSQL's robustness, scalability, and advanced features make it the better choice.

For example, a large institution like a university or a financial organization would likely choose PostgreSQL over SQLite due to its ability to handle large amounts of data, support multiple concurrent users, and provide advanced features necessary for complex data management tasks.

## Data Types in Postgres

PostgreSQL offers a rich set of data types, including several that are not available in SQLite. These data types allow you to store and manipulate data more efficiently and with greater precision. Let's explore some of the key data types in PostgreSQL.

| Data Type Category | Examples | Description |
| --- | --- | --- |
| Numeric Types | `INTEGER`, `BIGINT`, `SMALLINT`, `DECIMAL`, `NUMERIC`, `REAL`, `DOUBLE PRECISION`, `SERIAL`, `BIGSERIAL` | Whole numbers, fixed-point numbers, floating-point numbers, and auto-incrementing integers. |
| Character Types | `CHAR(n)`, `VARCHAR(n)`, `TEXT` | Fixed-length and variable-length character strings. |
| Date/Time Types | `DATE`, `TIME`, `TIMESTAMP`, `INTERVAL` | Stores date, time, timestamp, and interval values. |
| Boolean Type | `BOOLEAN` | Stores a logical value of either `TRUE` or `FALSE`. |
| Enumerated Type | `ENUM` | Defines a custom data type with a static set of values. |
| Array Type | Any data type followed by `[]` | Represents an array of elements of the same type. |
| UUID Type | `UUID` | Stores Universally Unique Identifiers (UUIDs). |
| JSON and JSONB Types | `JSON`, `JSONB` | Stores JSON data as text or in a binary format. |
| Hstore Type | `HSTORE` | Represents a key-value pair data type. |
| Range Types | `INT4RANGE`, `TSRANGE`, `DATERANGE`, etc. | Represent a range of values, such as integers, timestamps, or dates. |

Here are a few examples to illustrate the usage of some of these data types:

1.  Enumerated Type:

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
```

2.  Array Type:

```sql
CREATE TABLE scores (
      id SERIAL PRIMARY KEY,
      student_id INTEGER,
      grades INTEGER[]
    );
```

When choosing data types for your database schema, consider factors such as data integrity, storage efficiency, and the nature of the data being stored. PostgreSQL's wide range of data types gives you flexibility and power in designing your database schema.

It's worth noting that while SQLite also supports many of these data types, such as numeric, character, and date/time types, it lacks some of the more advanced types like arrays, UUIDs, and range types. PostgreSQL's extensive set of data types is one of the factors that make it a more versatile and feature-rich database system.

In [1]:
# Install postgres
!apt install postgresql postgresql-contrib &>log
!service postgresql start
!sudo -u postgres psql -c "CREATE USER root WITH SUPERUSER"
# set connection
%load_ext sql
%sql postgresql+psycopg2://@/postgres

 * Starting PostgreSQL 14 database server
   ...done.
CREATE ROLE


## Overview of the Pokemon Clinic Database

To demonstrate the power of Postgres, we'll be creating a Pokemon Clinic database. It consists of three main tables: `trainers`, `pokemon`, and `medical_records`. These tables are designed to store information about Pokemon trainers, their Pokemon, and the medical records associated with each Pokemon.

In [2]:
%%sql
-- Drop any existing tables so this cell can be re-run
DROP TABLE IF EXISTS trainers CASCADE;
DROP TABLE IF EXISTS pokemon CASCADE;
DROP TABLE IF EXISTS medical_records;

-- Create the trainers table
CREATE TABLE trainers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL,
  password_hash VARCHAR(100) NOT NULL,
  phone VARCHAR(20),
  date_of_birth DATE,
  CHECK (date_of_birth < CURRENT_DATE)
);

-- Create the pokemon table
CREATE TABLE pokemon (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  species VARCHAR(50) NOT NULL,
  trainer_id INTEGER REFERENCES trainers(id),
  level INTEGER CHECK (level BETWEEN 1 AND 100),
  health_status VARCHAR(20),
  abilities TEXT[]
);

-- Create the medical_records table
CREATE TABLE medical_records (
  id SERIAL PRIMARY KEY,
  pokemon_id INTEGER REFERENCES pokemon(id),
  visit_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  diagnosis TEXT,
  treatment TEXT,
  cost NUMERIC(8, 2)
);

 * postgresql+psycopg2://@/postgres
Done.
Done.
Done.
Done.
Done.
Done.


[]

While much of this statement should be familiar, you might notice a few new things, as well.

The `trainers` table stores information about Pokemon trainers.
  -   It includes columns for the trainer's ID (auto-generated), name, email (unique), hashed password, phone number, and date of birth.
  -   The `date_of_birth` column has a CHECK constraint that ensures the value is always earlier than the current date. This constraint uses `CURRENT_DATE`, a standard SQL function that returns today's date.
  -   The `email` column has a UNIQUE constraint to ensure that each email address is associated with only one trainer.

The `pokemon` table stores information about individual Pokemon.
  -   It includes columns for the Pokemon's ID (auto-generated), name, species, trainer ID (foreign key referencing the `trainers` table), level, health status, and abilities.
  -   The `level` column has a CHECK constraint that ensures the value is between 1 and 100.
  -   The `abilities` column is of type TEXT[] (array), which allows storing multiple abilities for each Pokemon. Arrays are a feature specific to PostgreSQL and not available in SQLite.
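
To see what the CHECK constraint on `level` does in practice, here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in, since SQLite treats this particular constraint the same way and needs no server; the simplified table and the helper name `level_is_accepted` are assumptions made for illustration.

```python
import sqlite3


def level_is_accepted(level):
    """Return True if a row with this level passes the CHECK constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pokemon ("
        " name TEXT NOT NULL,"
        " level INTEGER CHECK (level BETWEEN 1 AND 100))"
    )
    try:
        conn.execute("INSERT INTO pokemon VALUES (?, ?)", ("Pikachu", level))
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK expression evaluates to false.
        return False
    finally:
        conn.close()


for candidate in (25, 100, 0, 101):
    print(candidate, level_is_accepted(candidate))
```

Note that a missing level (`NULL`) passes the check, because a CHECK constraint only rejects rows where the expression is false. If a level should be required, the column also needs `NOT NULL`.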

The `medical_records` table stores information about medical visits and treatments for each Pokemon.
  -   It includes columns for the medical record ID (auto-generated), Pokemon ID (foreign key referencing the `pokemon` table), visit date, diagnosis, treatment, and cost.
  -   The `visit_date` column has a DEFAULT value of `CURRENT_TIMESTAMP`, which automatically sets the value to the current timestamp if no value is provided during insertion.
  -   The `cost` column is of type NUMERIC(8, 2), which stores exact decimal values with up to 8 digits in total, 2 of them after the decimal point. Because NUMERIC is exact, it avoids the rounding errors of floating-point types such as SQLite's REAL, which matters for monetary values.
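
The gap between exact and floating-point arithmetic can be illustrated in plain Python. In this sketch, `decimal.Decimal` plays the role of NUMERIC and `float` plays the role of REAL; this is an analogy rather than a description of how PostgreSQL works internally, and the sample costs are made up.

```python
from decimal import Decimal


def total_as_float(costs):
    """Add the costs using binary floating point (analogous to REAL)."""
    return sum(float(c) for c in costs)


def total_as_decimal(costs):
    """Add the costs using exact decimal arithmetic (analogous to NUMERIC)."""
    return sum((Decimal(c) for c in costs), Decimal("0"))


costs = ["0.10", "0.20"]  # hypothetical treatment costs
print(total_as_float(costs) == 0.3)                  # floating point drifts
print(total_as_decimal(costs) == Decimal("0.30"))    # decimal stays exact
```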

### Test Data for Pokemon Clinic
Now, let's insert some test data for our Pokemon Clinic.

In [3]:
%%sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Insert sample data into the trainers table
INSERT INTO trainers (name, email, password_hash, phone, date_of_birth)
VALUES
  ('Ash Ketchum', 'ash@example.com', crypt('bad_password', gen_salt('bf')), '123-456-7890', '1990-05-22'),
  ('Misty', 'misty@example.com', crypt('long_password_with_symbols_23423odsgv*732x', gen_salt('bf')), '987-654-3210', '1991-07-19');

-- Insert sample data into the pokemon table
INSERT INTO pokemon (name, species, trainer_id, level, health_status, abilities)
VALUES
  ('Pikachu', 'Pikachu', 1, 25, 'Healthy', ARRAY['Static', 'Lightning Rod']),
  ('Charizard', 'Charizard', 1, 65, 'Healthy', ARRAY['Blaze']),
  ('Staryu', 'Staryu', 2, 30, 'Injured', ARRAY['Natural Cure', 'Illuminate']);

-- Insert sample data into the medical_records table
INSERT INTO medical_records (pokemon_id, diagnosis, treatment, cost)
VALUES
  (3, 'Minor abrasions', 'Applied healing salve', 50.00),
  (1, 'Regular check-up', 'None', 0.00);

 * postgresql+psycopg2://@/postgres
Done.
2 rows affected.
3 rows affected.
2 rows affected.


[]

### `Trainers` Table: What is a Password "Hash"?
Again, much of the above `INSERT` statement should be familiar. However, if you look closely at the way the trainers' passwords are handled, you'll notice something a bit different.

- The `password_hash` column stores the hashed version of the trainer's password. **Hashing** is a one-way process that converts the plain-text password into a fixed-size string of characters. The resulting hash is irreversible, meaning it is computationally infeasible to obtain the original password from the hash.
- In this example, the `crypt` function is used along with the `gen_salt` function to hash the passwords. The `gen_salt('bf')` call generates a random salt and selects the Blowfish-based bcrypt algorithm; `crypt` then hashes the password together with that salt. Because the salt is random, two trainers with the same password get different hashes, which makes the stored passwords more resistant to rainbow table attacks.
- Storing hashed passwords instead of plain-text passwords is crucial for security. If the database is compromised, attackers would only have access to the hashed passwords, making it extremely difficult for them to retrieve the original passwords.
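
The effect of a salt can be sketched in a few lines of Python. This example uses PBKDF2 from the standard library's `hashlib` rather than the bcrypt algorithm that `pgcrypto` uses, and the function name and iteration count are illustrative assumptions. The idea is the same: the same password hashed with two different salts produces two unrelated hashes.

```python
import hashlib
import os


def hash_with_salt(password, salt):
    """Hash a password together with a salt using PBKDF2-HMAC-SHA256."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000
    )
    return digest.hex()


salt_for_ash = os.urandom(16)    # a fresh random salt per user
salt_for_misty = os.urandom(16)

# Both trainers chose the same (bad) password, but the stored hashes differ.
print(hash_with_salt("bad_password", salt_for_ash))
print(hash_with_salt("bad_password", salt_for_misty))
```

A precomputed table of hashes for common passwords is useless against this scheme, because the attacker would need a separate table for every possible salt.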

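It's important to note that `crypt` with `gen_salt('bf')` already uses bcrypt, an algorithm designed specifically for password hashing. The `$2a$06$` prefix in the stored hashes records the algorithm and its work factor (6, the `pgcrypto` default), which is low by modern standards. In a production environment, you would raise the work factor (for example, `gen_salt('bf', 12)`), and many applications hash passwords in the application layer instead, so that the plain-text password is never sent to the database inside a SQL statement.

We can see what this hash looks like: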

In [4]:
%%sql
SELECT * FROM trainers;

 * postgresql+psycopg2://@/postgres
2 rows affected.


| id | name | email | password_hash | phone | date_of_birth |
| --- | --- | --- | --- | --- | --- |
| 1 | Ash Ketchum | ash@example.com | `$2a$06$tW1aWoHy4VjdtyfU.An/4.jw/YFAT/DHGacZcNsWTkM0C9nBOeOuG` | 123-456-7890 | 1990-05-22 |
| 2 | Misty | misty@example.com | `$2a$06$lGSHZvctjhuiTptaF1EHWOYm9JARWEp.BIHWh7s6oFagXMsvmvcta` | 987-654-3210 | 1991-07-19 |


If you look closely, you'll notice that the **password hash** is ALWAYS the exact same length, regardless of the initial length of the password. So, for example, if we take two passwords, one of which is "123", and the other of which is the text of my favorite novel (300 pages, and also a bad password, though for different reasons!), they will both generate a "hash" of the exact same length.

The basic idea is this:
1. When the user first creates their password, we "hash" the password and store that hash (not the password) in the database.
2. When the user logs in again and enters their password, we "hash" whatever they entered (using the same salt that was stored with the original hash) and compare the result to the hash in our database. If it matches, we let them in!
3. The advantage of this is that if someone manages to break into our database and access the password hash, they won't be able to recover the user's password. This is because hashing is a **one-way** function. If you know the password, you can get the hash, but knowing the hash does NOT allow you to compute the password.
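
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the way the clinic database does it: it uses PBKDF2 from the standard library instead of bcrypt, and the names `hash_password`, `verify_password`, and `ITERATIONS` are invented for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Step 1: hash a password, generating a fresh salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest  # both are stored; the password itself is not


def verify_password(entered, salt, stored_digest):
    """Step 2: re-hash the login attempt with the stored salt and compare."""
    _, candidate = hash_password(entered, salt)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = hash_password("bad_password")
print(verify_password("bad_password", salt, stored))   # correct password
print(verify_password("wrong_guess", salt, stored))    # incorrect password
print(len(stored), len(hash_password("x" * 10_000)[1]))  # same length
```

The last line shows the fixed-length property: a 12-character password and a 10,000-character password both produce a 32-byte digest.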



### Postgres Arrays in the `Pokemon` Table

Let's now take a look at the `pokemon` table, which has an array.

In [5]:
%%sql
SELECT * FROM pokemon;

 * postgresql+psycopg2://@/postgres
3 rows affected.


| id | name | species | trainer_id | level | health_status | abilities |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Pikachu | Pikachu | 1 | 25 | Healthy | `['Static', 'Lightning Rod']` |
| 2 | Charizard | Charizard | 1 | 65 | Healthy | `['Blaze']` |
| 3 | Staryu | Staryu | 2 | 30 | Injured | `['Natural Cure', 'Illuminate']` |


Here, you'll notice there is an array of abilities. We can access this as follows:

In [15]:
%%sql
SELECT
  name,
  abilities[1] as first_ability,
  abilities[2] as second_ability,
  abilities[3] as third_ability
FROM pokemon;

 * postgresql+psycopg2://@/postgres
3 rows affected.


| name | first_ability | second_ability | third_ability |
| --- | --- | --- | --- |
| Pikachu | Static | Lightning Rod | |
| Charizard | Blaze | | |
| Staryu | Natural Cure | Illuminate | |


The attempt to access the third ability comes up empty (since the Pokemon don't have one!). However, it doesn't raise an error: PostgreSQL returns `NULL` for an out-of-range subscript, which the notebook displays as `None`.
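
One detail that often trips up Python programmers is that PostgreSQL arrays are 1-based by default, so `abilities[1]` is the first element. The following sketch models that behavior in Python for a simple one-dimensional array; the helper name `pg_subscript` is hypothetical and exists only to make the rule concrete.

```python
def pg_subscript(array, index):
    """Mimic PostgreSQL's array[index] for a one-dimensional array
    with the default lower bound of 1.

    Out-of-range subscripts yield None (NULL) instead of raising an error.
    """
    if 1 <= index <= len(array):
        return array[index - 1]  # shift to Python's 0-based indexing
    return None


pikachu_abilities = ["Static", "Lightning Rod"]
print(pg_subscript(pikachu_abilities, 1))  # first ability
print(pg_subscript(pikachu_abilities, 3))  # no third ability
```

Contrast this with a plain Python list, where `pikachu_abilities[2]` would raise an `IndexError`.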

## A Better ALTER TABLE

In PostgreSQL, the `ALTER TABLE` command is more versatile and feature-rich compared to SQLite. While SQLite supports basic table modifications, PostgreSQL offers a wide range of options to alter tables efficiently. Let's explore some of the key improvements in PostgreSQL's `ALTER TABLE` command.

### Adding Columns with Default Values
In PostgreSQL, you can add a new column to a table and specify a default value for existing rows in a single statement. Example:

In [6]:
%%sql
ALTER TABLE trainers ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP;

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement adds a new column named `created_at` of type `TIMESTAMP` to the `trainers` table and sets the default value to the current timestamp for existing rows.

### Modifying Column Data Types
PostgreSQL allows you to modify the data type of a column using the `ALTER TABLE` command. Example:

In [8]:
%%sql
ALTER TABLE trainers ALTER COLUMN phone TYPE VARCHAR(15);

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement changes the data type of the `phone` column in the `trainers` table from `VARCHAR(20)` to `VARCHAR(15)`.

### Adding Constraints
PostgreSQL enables you to add constraints to existing tables using the `ALTER TABLE` command. Example:

In [9]:
%%sql
ALTER TABLE pokemon ADD CONSTRAINT unique_name_per_trainer UNIQUE (name, trainer_id);

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement adds a unique constraint named `unique_name_per_trainer` to the `pokemon` table, ensuring that the combination of `name` and `trainer_id` is unique.
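
A composite unique constraint allows the same name to appear under different trainers, but not twice under the same trainer. The sketch below demonstrates that rule using Python's built-in `sqlite3` module as a stand-in, so it runs without a server. The simplified table and helper names are assumptions, and because SQLite has no `ALTER TABLE ... ADD CONSTRAINT`, the constraint is declared when the table is created.

```python
import sqlite3


def try_insert(conn, name, trainer_id):
    """Attempt an insert; return False if the unique constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO pokemon (name, trainer_id) VALUES (?, ?)",
            (name, trainer_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pokemon ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " trainer_id INTEGER,"
        " UNIQUE (name, trainer_id))"
    )
    results = [
        try_insert(conn, "Pikachu", 1),  # first Pikachu for trainer 1
        try_insert(conn, "Pikachu", 2),  # same name, different trainer
        try_insert(conn, "Pikachu", 1),  # duplicate pair: rejected
    ]
    conn.close()
    return results


print(run_demo())
```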

These examples demonstrate the powerful capabilities of PostgreSQL's `ALTER TABLE` command. PostgreSQL provides a rich set of options to modify table structures, add or drop constraints, change column data types, and perform various other table alterations.

The flexibility and extensive features of `ALTER TABLE` in PostgreSQL allow for efficient schema modifications without the need to recreate tables from scratch. This is particularly useful in scenarios where the database schema needs to evolve over time to accommodate changing requirements.

It's important to note that while SQLite does support some basic `ALTER TABLE` operations, such as renaming tables and adding columns, it lacks the extensive options and flexibility provided by PostgreSQL's `ALTER TABLE` command.

In [7]:
%%sql
CREATE USER john_doe WITH PASSWORD 'secret_password';

 * postgresql+psycopg2://@/postgres
Done.


[]