<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_09_PokemonAndPostgres.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Postgres and Pokemon

Welcome to the exciting world of PostgreSQL! In this chapter, we'll embark on a thrilling adventure as we explore the wonders of this powerful database system. Think of it as leveling up from SQLite to a more advanced and feature-packed database.

We'll start by comparing PostgreSQL with SQLite, which you've already befriended in the previous chapters. You'll discover the superpowers that PostgreSQL brings to the table, like its robust architecture, extensive feature set, and ability to handle more complex scenarios.

Get ready to dive into the practical side of things! We'll guide you through the installation process and show you how to create your very own "Pokemon Research Center" database. It's like building a virtual laboratory for your beloved Pokemon. You'll learn how to define tables with special abilities, such as unique data types, powerful constraints, and relationships that bind them together. We'll also teach you how to populate these tables with sample data, bringing your database to life!

We'll uncover the secrets of advanced techniques like password hashing, which acts as a protective shield for sensitive information. You'll learn how to harness the power of PostgreSQL's array data type and perform incredible feats with the ALTER TABLE command, allowing you to modify your database structure with ease.

As we progress through the chapter, we'll introduce you to the concept of stored procedures. Think of them as special moves that your database can perform, encapsulating complex operations into a single, reusable unit. We'll also delve into the realm of user management and role-based access control, empowering you to become a master of database security and integrity.

So, grab your Pokedex and get ready to level up your database skills with PostgreSQL! Let's embark on this exciting adventure together and become masters of the database universe!

Learning Outcomes:

1.  Understand the key differences between PostgreSQL and SQLite, and when to choose one over the other.
2.  Learn how to install PostgreSQL, create a database, and define tables with complex data types and constraints.
3.  Understand the importance of password hashing and how to implement it in PostgreSQL.
4.  Learn how to use PostgreSQL's array data type and perform complex table alterations using ALTER TABLE.
5.  Understand the concept of stored procedures and how to create and use them in PostgreSQL.
6.  Learn about user management and role-based access control in PostgreSQL, and their importance in maintaining database security.
7.  Understand the different deployment options for PostgreSQL, including on-site and cloud-based deployment, and their respective advantages and considerations.
8. Carry out basic data analytics in Postgres.

Keywords: PostgreSQL, SQLite, RDBMS, database, data types, constraints, password hashing, arrays, ALTER TABLE, stored procedures, user management, roles, deployment

## What is PostgreSQL? How Does it Differ from SQLite?

PostgreSQL and SQLite are both relational database management systems that support ACID (Atomicity, Consistency, Isolation, Durability) properties and transactions. However, they have significant differences in their architectures, feature sets, and use cases.

**SQLite** is a lightweight, serverless, and self-contained database engine. It's ideal for small to medium-scale applications, embedded systems, or local data storage. SQLite stores the entire database as a single file on disk, making it easy to set up and manage. It's often used in mobile apps, desktop applications, and small websites.

On the other hand, **PostgreSQL** is a full-featured, server-based RDBMS designed to handle large amounts of data and support multiple concurrent users. Its client-server architecture allows it to manage resources more efficiently and handle heavier workloads.

When deciding between SQLite and PostgreSQL, consider the following factors:

1.  **Scalability**: As your application grows and the number of concurrent users increases, PostgreSQL's client-server architecture becomes crucial. It can handle a large number of simultaneous connections and efficiently manage resources. SQLite, being serverless, may struggle with high levels of concurrency and may not be suitable for applications with a large number of concurrent writers.
2.  **Data Size and Distribution**: PostgreSQL is designed to handle very large datasets, well into the terabyte range. It offers features like table partitioning, which lets you split one large table into smaller physical pieces, improving query performance and manageability. SQLite, while capable of handling moderately sized datasets, may not be the best choice for extremely large or distributed datasets.
3.  **Advanced Features**: PostgreSQL offers a rich set of advanced features that become increasingly important as your application grows. These include:
    -   **Strict Typing**: PostgreSQL enforces strict data typing, ensuring data integrity and reducing the chances of data inconsistencies. This becomes increasingly critical when there are many "writers" to the database.
    -   **Complex Queries**: PostgreSQL supports complex queries, including advanced joins, subqueries, and window functions, which are essential for handling sophisticated data retrieval tasks.
    -   **Stored Procedures and Triggers**: PostgreSQL allows you to define stored procedures and triggers, enabling you to encapsulate complex business logic within the database itself. This can lead to better performance and maintainability.
    -   **Extensibility**: PostgreSQL is highly extensible, allowing you to add custom data types, functions, and even programming languages. This flexibility becomes crucial as your application's requirements evolve.
    -   **Security and User Management**: Postgres supports encryption, user management, password hashing (through extensions such as `pgcrypto`), and other security measures. SQLite, by contrast, relies on the surrounding "application" (written in Python, Java, C#, etc.) to handle these things. This can become impractical as the number of users grows.
4.  **Replication and High Availability**: As your application becomes mission-critical, you may need to ensure high availability and minimize downtime. PostgreSQL offers built-in replication features, such as streaming replication and logical replication, which allow you to create standby servers and distribute the workload. SQLite, being a serverless database, does not have built-in replication capabilities.
5.  **Available Resources**: Postgres requires more physical resources (processing power, disk space) and human resources (e.g., a trained database administrator) than SQLite. SQLite's dynamic typing can make database development and deployment quicker than Postgres's strict typing. (In fact, SQLite is often used to develop prototype databases, which can then be "scaled up" to Postgres or a similar RDBMS.)
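
To make the typing difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `pokemon_levels` is made up for illustration. SQLite stores a non-numeric string in an `INTEGER` column without complaint, whereas PostgreSQL would reject the same insert with a type error.

```python
import sqlite3

# In-memory SQLite database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon_levels (name TEXT, level INTEGER)")

# SQLite accepts a non-numeric string in an INTEGER column.
conn.execute("INSERT INTO pokemon_levels VALUES ('Sparky', 'not a number')")

# typeof() reports how the value was actually stored.
stored_type = conn.execute(
    "SELECT typeof(level) FROM pokemon_levels"
).fetchone()[0]
print(stored_type)
conn.close()
```

This flexibility is handy while prototyping, but it also means bad data can slip in unnoticed when many writers share a database.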

While SQLite is a great choice for small to medium-sized applications, embedded systems, or local data storage, it may not be suitable for large-scale, high-concurrency, or mission-critical applications. In these cases, PostgreSQL's robustness, scalability, and advanced features make it the better choice.

For example, a large institution like a university or a financial organization would likely choose PostgreSQL over SQLite due to its ability to handle large amounts of data, support multiple concurrent users, and provide advanced features necessary for complex data management tasks.

## Data Types in Postgres

PostgreSQL offers a rich set of data types, including several that are not available in SQLite. These data types allow you to store and manipulate data more efficiently and with greater precision. Let's explore some of the key data types in PostgreSQL.

| Data Type Category | Examples | Description |
| --- | --- | --- |
| Numeric Types | `INTEGER`, `BIGINT`, `SMALLINT`, `DECIMAL`, `NUMERIC`, `REAL`, `DOUBLE PRECISION`, `SERIAL`, `BIGSERIAL` | Whole numbers, fixed-point numbers, floating-point numbers, and auto-incrementing integers. |
| Character Types | `CHAR(n)`, `VARCHAR(n)`, `TEXT` | Fixed-length and variable-length character strings. |
| Date/Time Types | `DATE`, `TIME`, `TIMESTAMP`, `INTERVAL` | Stores date, time, timestamp, and interval values. |
| Boolean Type | `BOOLEAN` | Stores a logical value of either `TRUE` or `FALSE`. |
| Enumerated Type | `ENUM` | Defines a custom data type with a static set of values. |
| Array Type | Any data type followed by `[]` | Represents an array of elements of the same type. |
| UUID Type | `UUID` | Stores Universally Unique Identifiers (UUIDs). |
| JSON and JSONB Types | `JSON`, `JSONB` | Stores JSON data as text or in a binary format. |
| Hstore Type | `HSTORE` | Stores sets of key-value pairs (requires the `hstore` extension). |
| Range Types | `INT4RANGE`, `TSRANGE`, `DATERANGE`, etc. | Represent a range of values, such as integers, timestamps, or dates. |

Here are a few examples to illustrate the usage of some of these data types:

1.  Enumerated Type:

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
```

2.  Array Type:

```sql
CREATE TABLE scores (
  id SERIAL PRIMARY KEY,
  student_id INTEGER,
  grades INTEGER[]
);
```

When choosing data types for your database schema, consider factors such as data integrity, storage efficiency, and the nature of the data being stored. PostgreSQL's wide range of data types gives you flexibility and power in designing your database schema.

It's worth noting that SQLite can store numbers and text, but it has only five storage classes (NULL, INTEGER, REAL, TEXT, and BLOB). Dates and times are therefore kept as text or numbers, and SQLite lacks the more advanced types like arrays, UUIDs, and range types. PostgreSQL's extensive set of data types is one of the factors that make it a more versatile and feature-rich database system.

In [None]:
# Install postgres
!apt install postgresql postgresql-contrib &>log
!service postgresql start
!sudo -u postgres psql -c "CREATE USER root WITH SUPERUSER"
# set connection
%load_ext sql
%config SqlMagic.autopandas=True
%sql postgresql+psycopg2://@/postgres

 * Starting PostgreSQL 14 database server
   ...done.
CREATE ROLE


## Overview of the Pokemon Research Center Database

To demonstrate the power of Postgres, we'll be creating a **Pokemon Research Center** database consisting of three main tables: researchers, pokemon, and research_records. These tables are designed to store information about Pokemon researchers, their Pokemon, and the research records associated with each Pokemon.


In [None]:
%%sql
-- Drop existing tables if they exist
DROP TABLE IF EXISTS researchers CASCADE;
DROP TABLE IF EXISTS pokemon CASCADE;
DROP TABLE IF EXISTS research_records;

-- Create the researchers table
CREATE TABLE researchers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL,
  password_hash VARCHAR(100) NOT NULL,
  phone VARCHAR(20),
  date_of_birth DATE,
  CHECK (date_of_birth < CURRENT_DATE)
);

-- Create the pokemon table
CREATE TABLE pokemon (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  species VARCHAR(50) NOT NULL,
  researcher_id INTEGER REFERENCES researchers(id),
  level INTEGER CHECK (level BETWEEN 1 AND 100),
  health_status VARCHAR(20),
  abilities TEXT[]
);

-- Create the research_records table
CREATE TABLE research_records (
  id SERIAL PRIMARY KEY,
  pokemon_id INTEGER REFERENCES pokemon(id),
  observation_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  findings TEXT,
  recommendations TEXT,
  funding NUMERIC(8, 2)
);


 * postgresql+psycopg2://@/postgres
Done.
Done.
Done.
Done.
Done.
Done.


[]

While much of this statement should be familiar, you might notice a few new things, as well.

The **researchers** table stores information about Pokemon researchers.

-   It includes columns for the researcher's ID (auto-generated), name, email (unique), hashed password, phone number, and date of birth.
-   The `date_of_birth` column has a **CHECK constraint** that ensures the value is always earlier than the current date. This constraint uses `CURRENT_DATE`, a standard SQL keyword that returns today's date.
-   The `email` column has a **UNIQUE constraint** to ensure that each email address is associated with only one researcher.

The **pokemon** table stores information about individual Pokemon.

-   It includes columns for the Pokemon's ID (auto-generated), name, species, researcher ID (foreign key referencing the researchers table), level, health status, and abilities.
-   The `level` column has a **CHECK constraint** that ensures the value is between 1 and 100.
-   The `abilities` column is of type **TEXT[] (array)**, which allows storing multiple abilities for each Pokemon. Arrays are a feature specific to PostgreSQL and not available in SQLite.

The **research_records** table stores information about research observations and recommendations for each Pokemon.

-   It includes columns for the research record ID (auto-generated), Pokemon ID (foreign key referencing the pokemon table), observation date, findings, recommendations, and funding.
-   The `observation_date` column has a **DEFAULT value** of `CURRENT_TIMESTAMP`, which automatically sets the value to the current timestamp if no value is provided during insertion. SQLite accepts this default too, but PostgreSQL stores the result in a true `TIMESTAMP` type rather than as text.
-   The `funding` column is of type **NUMERIC(8, 2)**, which stores up to 8 digits in total, 2 of them after the decimal point (so the largest value is 999999.99). The NUMERIC type performs exact decimal arithmetic, unlike SQLite's REAL type, which uses binary floating point and can introduce small rounding errors.
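
The difference between exact decimal values and floating point can be sketched in plain Python. Here `Decimal` stands in for `NUMERIC` and `float` for `REAL`. This is only an analogy, not how PostgreSQL is implemented, and the variable names are illustrative.

```python
from decimal import Decimal

# Binary floating point (the kind of value a REAL column holds) cannot
# represent most decimal fractions exactly.
float_total = 0.1 + 0.2
print(float_total == 0.3)

# Decimal arithmetic (analogous to NUMERIC) keeps the digits exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))

# Largest value that fits in NUMERIC(8, 2): six digits before the
# decimal point and two after it.
max_funding = Decimal(10) ** 6 - Decimal("0.01")
print(max_funding)
```

For monetary values such as research funding, exact decimal arithmetic is usually the safer choice.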


## Test Data for Pokemon Research Center

Now, let's insert some test data for our Pokemon Research Center:

In [None]:
%%sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Insert sample data into the researchers table
INSERT INTO researchers (name, email, password_hash, phone, date_of_birth)
VALUES
  ('Professor Oak', 'oak@example.com', crypt('bad_password', gen_salt('bf')), '123-456-7890', '1950-06-15'),
  ('Professor Elm', 'elm@example.com', crypt('long_password_with_symbols_23423odsgv*732x', gen_salt('bf')), '987-654-3210', '1965-12-01');

-- Insert sample data into the pokemon table
INSERT INTO pokemon (name, species, researcher_id, level, health_status, abilities)
VALUES
  ('Sparky', 'Pikachu', 1, 25, 'Healthy', ARRAY['Static', 'Lightning Rod']),
  ('Flare', 'Charizard', 1, 65, 'Healthy', ARRAY['Blaze']),
  ('Aqua', 'Totodile', 2, 30, 'Injured', ARRAY['Torrent', 'Sheer Force']),
  ('Bulby', 'Bulbasaur', 1, 20, 'Healthy', ARRAY['Overgrow']),
  ('Scorch', 'Charmander', 2, 15, 'Healthy', ARRAY['Blaze']),
  ('Wings', 'Pidgeotto', 1, 40, 'Healthy', ARRAY['Keen Eye', 'Tangled Feet']);

-- Insert sample data into the research_records table
INSERT INTO research_records (pokemon_id, findings, recommendations, funding)
VALUES
  (3, 'Behavioral study on water-type moves', 'Further observation needed', 1000.00),
  (1, 'Electrical discharge analysis', 'Monitor energy output', 2000.00),
  (4, 'Growth rate analysis', 'Increase sunlight exposure', 1500.00),
  (5, 'Fire-type behavior study', 'Control training environment', 1200.00),
  (6, 'Flight pattern study', 'Observe in natural habitat', 1800.00),
  (2, 'Advanced combat techniques', 'Enhance training regimen', 2200.00);


 * postgresql+psycopg2://@/postgres
Done.
2 rows affected.
6 rows affected.
6 rows affected.


[]

## Researchers Table: What is a Password "Hash"?

Again, much of the above `INSERT` statement should be familiar. However, if you look closely at the way the researchers' passwords are handled, you'll notice something a bit different.

The `password_hash` column stores the hashed version of the researcher's password. **Hashing** is a one-way process that converts the plain-text password into a fixed-size string of characters. The resulting hash is irreversible, meaning it is computationally infeasible to obtain the original password from the hash.

In this example, the `crypt` function (from the `pgcrypto` extension) is used along with the `gen_salt` function to hash the passwords. The `gen_salt('bf')` call generates a random salt and selects the Blowfish-based bcrypt algorithm; `crypt` then hashes the password together with that salt. Because the salt differs for every password, two users with the same password get different hashes, which makes the stored hashes resistant to rainbow table attacks.

Storing hashed passwords instead of plain-text passwords is crucial for security. If the database is compromised, attackers would only have access to the hashed passwords, making it extremely difficult for them to retrieve the original passwords.

Note that `gen_salt('bf')` already selects bcrypt, an algorithm designed specifically for password hashing. For simplicity, the example uses the default work factor. In a production environment, you would typically raise it (for example, `gen_salt('bf', 12)`) so that each guess costs an attacker more time, or use a comparable password-hashing algorithm such as scrypt or PBKDF2.

We can see what these hashes look like:

In [None]:
%%sql
SELECT * FROM researchers;

 * postgresql+psycopg2://@/postgres
2 rows affected.


id,name,email,password_hash,phone,date_of_birth
1,Professor Oak,oak@example.com,$2a$06$rYzEbevmRid6vjSiWTPUGecp4PbCTFIwDLKeFmlth4b41x6thadWW,123-456-7890,1950-06-15
2,Professor Elm,elm@example.com,$2a$06$jFqu4Gxes9OZAUgCmKxTZuGOHSXvWpTjOYc7LZXJpS7xYXEQIrpoC,987-654-3210,1965-12-01


If you look closely, you'll notice that the password hash is ALWAYS the exact same length, regardless of the initial length of the password. So, for example, if we take two passwords, one of which is "123", and the other of which is the text of my favorite novel (300 pages---also a bad password, though for different reasons!), they will both generate a "hash" of the exact same length.

The basic idea is this:

-   When the user first creates their password, we "hash" the password and store that hash (not the password) in the database.
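-   When the user logs in again and enters their password, we "hash" whatever they entered using the same salt (which `crypt` stores as part of the hash) and compare the result to the stored hash, for example with `password_hash = crypt('entered_password', password_hash)`. If it matches, we let them in!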
-   The advantage of this is that if someone manages to break into our database and access the password hash, they won't be able to recover the user's password. This is because hashing is a one-way function. If you know the password, you can get the hash, but knowing the hash does NOT allow you to compute the password.
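
The store-then-verify flow above can be sketched in Python with only the standard library. This sketch swaps in PBKDF2 (via `hashlib.pbkdf2_hmac`) for bcrypt, because bcrypt is not in the standard library. The function names and the iteration count are illustrative assumptions, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each new password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(candidate, salt, stored_digest):
    """Re-hash the candidate with the stored salt and compare digests."""
    _, candidate_digest = hash_password(candidate, salt)
    return hmac.compare_digest(candidate_digest, stored_digest)


# "Sign-up": store only the salt and the digest, never the password.
salt, stored = hash_password("bad_password")

# "Log-in": the right password matches, a wrong one does not.
right_ok = verify_password("bad_password", salt, stored)
wrong_ok = verify_password("wrong_guess", salt, stored)
print(right_ok, wrong_ok)

# Digests have the same length regardless of password length.
_, long_digest = hash_password("x" * 10_000)
print(len(stored), len(long_digest))
```

Unlike `crypt`, which embeds the salt in the hash string itself, this sketch keeps the salt as a separate value. The idea is the same: verification needs the original salt, but never the original password.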

## Postgres Arrays in the Pokemon Table

Let's now take a look at the pokemon table, which has an array.

In [None]:
%%sql
SELECT * FROM pokemon;

 * postgresql+psycopg2://@/postgres
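6 rows affected.


id,name,species,researcher_id,level,health_status,abilities
1,Sparky,Pikachu,1,25,Healthy,"['Static', 'Lightning Rod']"
2,Flare,Charizard,1,65,Healthy,['Blaze']
3,Aqua,Totodile,2,30,Injured,"['Torrent', 'Sheer Force']"
4,Bulby,Bulbasaur,1,20,Healthy,['Overgrow']
5,Scorch,Charmander,2,15,Healthy,['Blaze']
6,Wings,Pidgeotto,1,40,Healthy,"['Keen Eye', 'Tangled Feet']"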


Here, you'll notice there is an array of abilities. We can access this as follows:

In [None]:
%%sql
SELECT
  name,
  abilities[1] as first_ability,
  abilities[2] as second_ability,
  abilities[3] as third_ability
FROM pokemon;


 * postgresql+psycopg2://@/postgres
6 rows affected.


name,first_ability,second_ability,third_ability
Sparky,Static,Lightning Rod,
Flare,Blaze,,
Aqua,Torrent,Sheer Force,
Bulby,Overgrow,,
Scorch,Blaze,,
Wings,Keen Eye,Tangled Feet,


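Note that PostgreSQL arrays are indexed starting at 1, not 0. The attempt to access the third ability finds nothing (since none of these Pokemon have one!). However, it doesn't cause an error: PostgreSQL simply returns `NULL`, which shows up as `None` in Python.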
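
Because the notebook itself runs Python, it is easy to trip over the indexing difference. The hypothetical helper below (`pg_subscript` is an illustrative name, not a real library function) mimics how a PostgreSQL subscript behaves on a Python list, with `None` standing in for `NULL`.

```python
def pg_subscript(values, index):
    """Mimic a PostgreSQL 1-based array subscript; None stands in for NULL."""
    if 1 <= index <= len(values):
        return values[index - 1]
    return None  # out-of-range subscripts yield NULL instead of an error


abilities = ["Static", "Lightning Rod"]  # Sparky's abilities from the sample data
first = pg_subscript(abilities, 1)
third = pg_subscript(abilities, 3)
print(first, third)
```

By contrast, plain Python indexing with `abilities[2]` would raise an `IndexError`, and `abilities[1]` would return the second ability rather than the first.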

## A Better ALTER TABLE

In PostgreSQL, the `ALTER TABLE` command is more versatile and feature-rich compared to SQLite. While SQLite supports basic table modifications, PostgreSQL offers a wide range of options to alter tables efficiently. Let's explore some of the key improvements in PostgreSQL's `ALTER TABLE` command.

###  Adding Columns with Default Values
In PostgreSQL, you can add a new column to a table and specify a default value for existing rows in a single statement. Example:

In [None]:
%%sql
ALTER TABLE researchers ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP;

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement adds a new column named `created_at` of type `TIMESTAMP` to the `researchers` table and sets the default value to the current timestamp for existing rows.

### Modifying Column Data Types
PostgreSQL allows you to modify the data type of a column using the `ALTER TABLE` command. Example:

In [None]:
%%sql
ALTER TABLE researchers ALTER COLUMN phone TYPE VARCHAR(15);

 * postgresql+psycopg2://@/postgres
Done.


[]

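This statement changes the data type of the `phone` column in the `researchers` table from `VARCHAR(20)` to `VARCHAR(15)`. It succeeds here because every existing phone number fits within 15 characters; PostgreSQL rejects the change if any stored value is too long for the new type.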

### Adding Constraints
PostgreSQL enables you to add constraints to existing tables using the `ALTER TABLE` command. Example:

In [None]:
%%sql
ALTER TABLE pokemon ADD CONSTRAINT unique_name_per_researcher UNIQUE (name, researcher_id);

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement adds a unique constraint named `unique_name_per_researcher` to the `pokemon` table, ensuring that the combination of `name` and `researcher_id` is unique.

These examples demonstrate the powerful capabilities of PostgreSQL's `ALTER TABLE` command. PostgreSQL provides a rich set of options to modify table structures, add or drop constraints, change column data types, and perform various other table alterations.

The flexibility and extensive features of `ALTER TABLE` in PostgreSQL allow for efficient schema modifications without the need to recreate tables from scratch. This is particularly useful in scenarios where the database schema needs to evolve over time to accommodate changing requirements.

It's important to note that while SQLite does support some basic `ALTER TABLE` operations, such as renaming tables and adding columns, it lacks the extensive options and flexibility provided by PostgreSQL's `ALTER TABLE` command.

## Stored Procedures in PostgreSQL

In the world of Pokemon, trainers often have a set of routines they follow to care for their Pokemon. They may have a specific way of feeding them, training them, or even a special technique for battling. These routines help trainers manage their Pokemon effectively and consistently. Similarly, in the world of databases, we have stored procedures that allow us to encapsulate and reuse common tasks or complex operations.

### What are Stored Procedures?

A **stored procedure** is a named collection of SQL statements and optional control-flow statements that is stored in the database and executed as a unit. It is a database object that performs a specific task or a series of tasks when invoked, and it can be called whenever needed.

Stored procedures offer several benefits:

1.  *Reusability*. Stored procedures can be called multiple times from different parts of an application, promoting code reuse and reducing duplication.
2.  *Encapsulation*. Stored procedures encapsulate complex logic and SQL statements, making the code more modular and easier to maintain.
3.  *Performance*. Because a stored procedure runs inside the database server, a multi-statement task needs only one round trip from the application instead of many. PL/pgSQL can also cache query plans for reuse within a session.
4.  *Security*. Stored procedures can help enforce security by granting execute permissions to users without giving them direct access to the underlying tables.

### Basic Syntax of Stored Procedures

In PostgreSQL, you can create a stored procedure using the `CREATE PROCEDURE` statement. Here's the basic syntax:

```sql
CREATE PROCEDURE procedure_name(parameter1 datatype, parameter2 datatype, ...)
AS $$
BEGIN
  -- Procedure logic goes here
  -- SQL statements and control-flow statements
END;
$$ LANGUAGE plpgsql;
```

Let's break down the syntax:

-   `procedure_name`: The name you give to the stored procedure.
-   `parameter1`, `parameter2`, etc.: Optional input parameters that the procedure can accept. You specify the parameter name and its data type.
-   `AS $$`: Indicates the start of the procedure body.
-   `BEGIN` and `END`: Delimits the procedure body, which contains the SQL statements and control-flow statements.
-   `LANGUAGE plpgsql`: Specifies the language used for the stored procedure, in this case, PL/pgSQL (Procedural Language/PostgreSQL).

### Example: Stored Procedure for the Pokemon Research Center

Let's create a stored procedure for the Pokemon Research Center that updates the level of a specific Pokemon based on its ID.

In [None]:
%%sql
CREATE OR REPLACE PROCEDURE update_pokemon_level(
  p_pokemon_id INTEGER,
  p_new_level INTEGER
)
AS $$
BEGIN
  -- Update the level of the Pokemon
  UPDATE pokemon
  SET level = p_new_level
  WHERE id = p_pokemon_id;

  -- Check if any rows were affected
  IF FOUND THEN
    RAISE NOTICE 'Pokemon level updated successfully';
  ELSE
    RAISE NOTICE 'No Pokemon found with the given ID';
  END IF;
END;
$$ LANGUAGE plpgsql;

 * postgresql+psycopg2://@/postgres
Done.


[]

In this example:

-   The `CREATE OR REPLACE PROCEDURE` statement is used to create a procedure named `update_pokemon_level`.
-   The procedure takes two input parameters: `p_pokemon_id` (INTEGER) representing the ID of the Pokemon to update, and `p_new_level` (INTEGER) representing the new level to set for the Pokemon.
-   Inside the procedure body, an `UPDATE` statement is used to update the `level` column of the `pokemon` table for the row where the `id` matches the provided `p_pokemon_id`.
-   The `IF FOUND THEN` clause checks if any rows were affected by the `UPDATE` statement. If rows were affected (i.e., a Pokemon with the given ID was found and updated), it raises a notice indicating that the level was updated successfully. Otherwise, it raises a notice indicating that no Pokemon was found with the given ID. Notices are sent to the client; whether they are also written to the server log depends on the server's logging configuration.

To call this stored procedure and update the level of a Pokemon, you can use the `CALL` statement followed by the procedure name and the required arguments. For example:

In [None]:
%%sql
CALL update_pokemon_level(1, 30);

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement calls the `update_pokemon_level` procedure, passing the values `1` as the `p_pokemon_id` parameter and `30` as the `p_new_level` parameter. The procedure will then execute the `UPDATE` statement and update the level of the Pokemon with ID 1 to 30.

## User Management and Roles in PostgreSQL

When it comes to managing a database, one of the most critical aspects is ensuring the security and integrity of the data. In the world of Pokemon, this means safeguarding sensitive information about researchers, their Pokemon, and the research records of the Pokemon Research Center. Just like how a Pokemon trainer must carefully manage their team and delegate responsibilities, a database administrator must effectively manage user access and privileges to maintain a secure and organized system.

PostgreSQL provides a robust user management system that allows you to create, modify, and delete user accounts, as well as assign privileges and roles to users. By properly setting up user accounts and roles, you can ensure that each user has access to only the necessary data and functionality, reducing the risk of unauthorized access or data breaches.

Let's explore how to manage users and their roles in PostgreSQL using our Pokemon Research Center database as an example.

### Creating Users
In PostgreSQL, you can create a new user using the `CREATE USER` statement. For example, let's create a user account for the head nurse of the Pokemon Research Center:

In [None]:
%%sql
CREATE USER nurse_joy WITH PASSWORD 'chansey123';

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement creates a new user named `nurse_joy` with the password `'chansey123'`. It's important to choose a strong and secure password to protect the user account from unauthorized access.

### Creating Roles
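**Roles** in PostgreSQL are used to group together privileges and assign them to users. Think of roles as job titles or categories that define what actions users can perform. (In PostgreSQL, users are themselves roles: `CREATE USER` is equivalent to `CREATE ROLE` with the `LOGIN` attribute.) Let's create a role for Pokemon trainers: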

In [None]:
%%sql
CREATE ROLE pokemon_trainer;

 * postgresql+psycopg2://@/postgres
Done.


[]

This statement creates a new role named `pokemon_trainer`. We can later assign this role to users who are authorized to perform actions related to Pokemon training.

### Granting Privileges to Roles
Once you have created a role, you can grant specific privileges to that role using the `GRANT` statement. For example, let's grant the necessary privileges to the `pokemon_trainer` role:

In [None]:
%%sql
GRANT SELECT, INSERT, UPDATE ON pokemon TO pokemon_trainer;

 * postgresql+psycopg2://@/postgres
Done.


[]

### Assigning Roles to Users
To assign a role to a user, you can use the `GRANT` statement with the `TO` clause. Let's assign the `pokemon_trainer` role to a user named `ash_ketchum`:

In [None]:
%%sql
CREATE USER ash_ketchum WITH PASSWORD 'pika!';
GRANT pokemon_trainer TO ash_ketchum;

 * postgresql+psycopg2://@/postgres
Done.
Done.


[]

The first statement creates the user `ash_ketchum`, and the second grants the `pokemon_trainer` role to that user. Now, `ash_ketchum` has all the privileges associated with the `pokemon_trainer` role.

### Revoking Privileges and Roles
Sometimes, you may need to revoke privileges or roles from users or roles. This is useful when a user's responsibilities change or when you want to restrict access to certain data. You can use the `REVOKE` statement to revoke privileges and roles. For example:

In [None]:
%%sql
REVOKE UPDATE ON pokemon FROM pokemon_trainer;
REVOKE pokemon_trainer FROM ash_ketchum;

 * postgresql+psycopg2://@/postgres
Done.
Done.


[]

The first statement revokes the `UPDATE` privilege on the `pokemon` table from the `pokemon_trainer` role. This means that users with the `pokemon_trainer` role will no longer be able to modify data in the `pokemon` table. The second statement revokes the `pokemon_trainer` role from the user `ash_ketchum`. As a result, `ash_ketchum` will no longer have the privileges associated with the `pokemon_trainer` role.

PostgreSQL allows you to modify user attributes and delete user accounts using the `ALTER USER` and `DROP USER` statements, respectively. For example:

In [None]:
%%sql
ALTER USER nurse_joy WITH PASSWORD 'blissey456';
DROP USER ash_ketchum;

 * postgresql+psycopg2://@/postgres
Done.
Done.


[]

The first statement changes the password of the user `nurse_joy` to `'blissey456'`. It's a good practice to regularly update passwords to enhance security. The second statement deletes the user `ash_ketchum` from the database. This should be done with caution and only when necessary, as it permanently removes the user and their associated permissions.


### Importance of User Management and Security

Proper user management is crucial for maintaining the security of your database. By creating separate user accounts and assigning appropriate privileges and roles, you can ensure that users have access only to the necessary data and operations. Some important security considerations include:
  -   Use strong and unique passwords for user accounts.
  -   Grant privileges based on the **principle of least privilege**, giving users only the permissions they need to perform their tasks.
  -   Regularly review and audit user privileges to ensure they align with the users' responsibilities.
  -   Revoke unnecessary privileges and remove inactive user accounts to minimize security risks.

By leveraging PostgreSQL's user management and role-based access control features, you can effectively manage user access, enforce security policies, and protect sensitive data in your database.

## Basic Data Analytics in Postgres

Data analytics is a powerful tool that allows us to extract meaningful insights from our data. PostgreSQL, with its robust SQL capabilities, provides a wide range of features to perform data analytics directly within the database. In this section, we'll explore some basic data analytics techniques using our **Pokemon Research Center** database. Much (though not all!) of this will serve as a review of what we learned using SQLite in earlier chapters.

### Aggregating Data

One of the fundamental operations in data analytics is aggregation, which involves summarizing data to provide useful insights. Let's start with some basic aggregation queries.

We can count the number of Pokemon each researcher is studying.

In [None]:
%%sql
SELECT researcher_id, COUNT(*) AS pokemon_count
FROM pokemon
GROUP BY researcher_id;

 * postgresql+psycopg2://@/postgres
2 rows affected.


| researcher_id | pokemon_count |
|---------------|---------------|
| 2 | 2 |
| 1 | 4 |


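This query groups the data by `researcher_id` and counts the number of Pokemon associated with each researcher. **GROUP BY** behaves the same way in PostgreSQL and SQLite. On a table this small the two are indistinguishable; on large tables PostgreSQL generally has more options, since its query planner can choose between sort-based and hash-based aggregation and can spread the work across parallel workers.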
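
If you'd like to experiment with this query without a Postgres server, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for Postgres, and a cut-down `pokemon` table whose rows are copied from the outputs shown in this section. The `ORDER BY` is our own addition: SQL makes no promise about the order of grouped rows, which is why the output above happens to list researcher 2 first.

```python
import sqlite3

# In-memory SQLite database standing in for the chapter's Postgres server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, researcher_id INTEGER, level INTEGER)")

# Rows copied from the query outputs shown in this section.
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [("Sparky", 1, 25), ("Flare", 1, 65), ("Aqua", 2, 30),
     ("Bulby", 1, 20), ("Scorch", 2, 15), ("Wings", 1, 40)],
)

# Same aggregation as above, with an explicit ORDER BY for a predictable row order.
counts = conn.execute(
    """
    SELECT researcher_id, COUNT(*) AS pokemon_count
    FROM pokemon
    GROUP BY researcher_id
    ORDER BY researcher_id
    """
).fetchall()
print(counts)  # [(1, 4), (2, 2)]
```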

We can calculate the total funding allocated to each Pokemon.

In [None]:
%%sql
SELECT pokemon_id, SUM(funding) AS total_funding
FROM research_records
GROUP BY pokemon_id;

 * postgresql+psycopg2://@/postgres
6 rows affected.


| pokemon_id | total_funding |
|------------|---------------|
| 3 | 1000.0 |
| 5 | 1200.0 |
| 4 | 1500.0 |
| 6 | 1800.0 |
| 2 | 2200.0 |
| 1 | 2000.0 |


We can find the average level of Pokemon being studied by each researcher.

In [None]:
%%sql
SELECT researcher_id, ROUND(AVG(level),2) AS average_level
FROM pokemon
GROUP BY researcher_id;

 * postgresql+psycopg2://@/postgres
2 rows affected.


| researcher_id | average_level |
|---------------|---------------|
| 2 | 22.5 |
| 1 | 37.5 |


### Filtering Data
Filtering data is essential to focus on specific subsets of data that meet certain criteria. Let's explore some examples. First, we can retrieve all Pokemon that are currently injured.

In [None]:
%%sql
SELECT *
FROM pokemon
WHERE health_status = 'Injured';

 * postgresql+psycopg2://@/postgres
1 rows affected.


| id | name | species | researcher_id | level | health_status | abilities |
|----|------|---------|---------------|-------|---------------|-----------|
| 3 | Aqua | Totodile | 2 | 30 | Injured | [Torrent, Sheer Force] |


We can find all researchers who are studying more than one Pokemon.

In [None]:
%%sql
SELECT
  researcher_id,
  COUNT(*) AS pokemon_count
FROM pokemon
GROUP BY researcher_id
HAVING COUNT(*) > 1;

 * postgresql+psycopg2://@/postgres
2 rows affected.


| researcher_id | pokemon_count |
|---------------|---------------|
| 2 | 2 |
| 1 | 4 |
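
Both researchers pass the `HAVING` test here, so it is hard to see the filter at work. The sketch below contrasts `WHERE` (which filters individual rows before grouping) with `HAVING` (which filters whole groups after aggregation). It assumes an in-memory SQLite database in place of Postgres and rows copied from this section's outputs; the two thresholds (level 25 and more than two Pokemon) are our own choices, picked so that each filter actually removes something.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, researcher_id INTEGER, level INTEGER)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [("Sparky", 1, 25), ("Flare", 1, 65), ("Aqua", 2, 30),
     ("Bulby", 1, 20), ("Scorch", 2, 15), ("Wings", 1, 40)],
)

# WHERE runs first: Pokemon below level 25 are dropped, then the rest are counted.
where_rows = conn.execute(
    """
    SELECT researcher_id, COUNT(*)
    FROM pokemon
    WHERE level >= 25
    GROUP BY researcher_id
    ORDER BY researcher_id
    """
).fetchall()

# HAVING runs last: all Pokemon are counted, then small groups are dropped.
having_rows = conn.execute(
    """
    SELECT researcher_id, COUNT(*)
    FROM pokemon
    GROUP BY researcher_id
    HAVING COUNT(*) > 2
    ORDER BY researcher_id
    """
).fetchall()

print(where_rows)   # [(1, 3), (2, 1)]
print(having_rows)  # [(1, 4)]
```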


### Joining Tables
Joining tables allows us to combine data from multiple tables based on related columns. Let's explore some examples.

We can retrieve a list of Pokemon along with their researchers' names.

In [None]:
%%sql
SELECT
  p.name AS pokemon_name,
  r.name AS researcher_name
FROM
  pokemon p
  JOIN researchers r ON p.researcher_id = r.id;

 * postgresql+psycopg2://@/postgres
6 rows affected.


| pokemon_name | researcher_name |
|--------------|-----------------|
| Sparky | Professor Oak |
| Flare | Professor Oak |
| Aqua | Professor Elm |
| Bulby | Professor Oak |
| Scorch | Professor Elm |
| Wings | Professor Oak |


Now, let's combine joins with aggregation to find the total funding received by each researcher.

In [None]:
%%sql
SELECT
  r.name AS researcher_name,
  SUM(rr.funding) AS total_funding
FROM
  researchers r
  JOIN pokemon p ON r.id = p.researcher_id
  JOIN research_records rr ON p.id = rr.pokemon_id
GROUP BY r.name;


 * postgresql+psycopg2://@/postgres
2 rows affected.


| researcher_name | total_funding |
|-----------------|---------------|
| Professor Oak | 7500.0 |
| Professor Elm | 2200.0 |
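
To see how the two joins feed the `SUM`, here is a small self-contained sketch. It assumes an in-memory SQLite database in place of Postgres, simplified tables, Pokemon ids inferred from the outputs in this section, and one funding record per Pokemon (the real `research_records` table may hold several). It also groups by `r.id` as well as `r.name`, a defensive variation of ours that keeps two researchers who share a name from being merged into one row.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE researchers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pokemon (
        id INTEGER PRIMARY KEY,
        name TEXT,
        researcher_id INTEGER REFERENCES researchers(id)
    );
    CREATE TABLE research_records (
        pokemon_id INTEGER REFERENCES pokemon(id),
        funding REAL
    );

    INSERT INTO researchers VALUES (1, 'Professor Oak'), (2, 'Professor Elm');
    INSERT INTO pokemon VALUES
        (1, 'Sparky', 1), (2, 'Flare', 1), (3, 'Aqua', 2),
        (4, 'Bulby', 1), (5, 'Scorch', 2), (6, 'Wings', 1);
    -- Assumed: one record per Pokemon, using the totals shown earlier.
    INSERT INTO research_records VALUES
        (1, 2000), (2, 2200), (3, 1000), (4, 1500), (5, 1200), (6, 1800);
    """
)

# researchers -> pokemon -> research_records, then one total per researcher.
totals = conn.execute(
    """
    SELECT r.name, SUM(rr.funding) AS total_funding
    FROM researchers r
    JOIN pokemon p ON r.id = p.researcher_id
    JOIN research_records rr ON p.id = rr.pokemon_id
    GROUP BY r.id, r.name
    ORDER BY r.id
    """
).fetchall()
print(totals)  # [('Professor Oak', 7500.0), ('Professor Elm', 2200.0)]
```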


### Advanced Analysis with Window Functions

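**Window functions** perform calculations across a set of table rows related to the current row. Unlike `GROUP BY` aggregates, they do not collapse rows: every input row still appears in the output, with the computed value attached. This makes them powerful tools for advanced data analytics.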

For example, let's rank Pokemon by their level within each researcher's group.

In [None]:
%%sql
SELECT name, researcher_id, level,
       RANK() OVER (PARTITION BY researcher_id ORDER BY level DESC) AS rank
FROM pokemon;

 * postgresql+psycopg2://@/postgres
6 rows affected.


| name | researcher_id | level | rank |
|------|---------------|-------|------|
| Flare | 1 | 65 | 1 |
| Wings | 1 | 40 | 2 |
| Sparky | 1 | 25 | 3 |
| Bulby | 1 | 20 | 4 |
| Aqua | 2 | 30 | 1 |
| Scorch | 2 | 15 | 2 |


The key line here is `RANK() OVER (PARTITION BY researcher_id ORDER BY level DESC) AS rank`. Here's what happens in this line.
-   `RANK()`: This is a window ranking function that assigns a rank to each row based on the specified ordering within a window.
-   `OVER (PARTITION BY researcher_id ORDER BY level DESC)`: This clause defines the window for the ranking function.
    -   `PARTITION BY researcher_id`: This partitions the data into groups based on the `researcher_id` column. So, the ranking will be calculated independently for each researcher.
    -   `ORDER BY level DESC`: This orders the rows within each partition by the `level` column in descending order (highest level first).
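
The partition-then-order logic described above can be sketched in plain Python. This is an emulation for illustration, not how Postgres implements `RANK()`. The rows are copied from the output above, plus one hypothetical Pokemon, "Ember", that is not in the chapter's data; it ties with Wings at level 40 to show how `RANK()` leaves a gap after a tie.

```python
from itertools import groupby

# (name, researcher_id, level) rows from the output above.
# "Ember" is a hypothetical extra row, added only to create a tie.
rows = [
    ("Sparky", 1, 25), ("Flare", 1, 65), ("Bulby", 1, 20), ("Wings", 1, 40),
    ("Aqua", 2, 30), ("Scorch", 2, 15), ("Ember", 1, 40),
]

def rank_within_partition(rows):
    """Emulate RANK() OVER (PARTITION BY researcher_id ORDER BY level DESC)."""
    ranked = []
    # Sort by the partition key, then by level from highest to lowest.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    for _, partition in groupby(ordered, key=lambda r: r[1]):
        prev_level, rank = None, 0
        for position, (name, researcher_id, level) in enumerate(partition, start=1):
            # Tied rows share a rank; the next distinct level takes its
            # row position, which is what leaves a gap after a tie.
            if level != prev_level:
                rank = position
                prev_level = level
            ranked.append((name, researcher_id, level, rank))
    return ranked

ranked = rank_within_partition(rows)
ranks = {name: rank for name, _, _, rank in ranked}
print(ranks["Wings"], ranks["Ember"], ranks["Sparky"])  # 2 2 4
```

Wings and Ember share rank 2, and Sparky drops to rank 4 rather than 3. The ranking restarts at 1 for researcher 2, because each partition is ranked independently.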


For another example, let's calculate the **moving average** level of Pokemon for each researcher.

In [None]:
%%sql
SELECT name, researcher_id, level,
       AVG(level) OVER (PARTITION BY researcher_id ORDER BY level ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg
FROM pokemon;

 * postgresql+psycopg2://@/postgres
6 rows affected.


| name | researcher_id | level | moving_avg |
|------|---------------|-------|------------|
| Bulby | 1 | 20 | 22.5 |
| Sparky | 1 | 25 | 28.333333333333332 |
| Wings | 1 | 40 | 43.33333333333333 |
| Flare | 1 | 65 | 52.5 |
| Scorch | 2 | 15 | 22.5 |
| Aqua | 2 | 30 | 22.5 |


The moving average is a technique used to smooth out data points by replacing each value with the average of itself and its neighbors. It is most often applied to time-ordered data, where it helps reveal trends. In our Pokemon Research Center database the rows are ordered by `level` rather than by time, so the query smooths each researcher's sorted list of Pokemon levels. The pattern is the same one you would use for a trend over dates.

The key line here is: `AVG(level) OVER (PARTITION BY researcher_id ORDER BY level ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg`.  Here's a breakdown:

-   `AVG(level)`: This specifies that we are calculating the average of the `level` column.
-   `OVER`: This clause defines the window for the window function.
-   `PARTITION BY researcher_id`: This divides the result set into partitions based on the `researcher_id`. The moving average is calculated within each partition (i.e., for each researcher separately).
-   `ORDER BY level`: This orders the rows within each partition by the `level` column.
-   `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`: This defines the window frame, which includes the current row, the row immediately before it, and the row immediately after it. Thus, the moving average is usually calculated over three rows. At the start or end of a partition one neighbor is missing, so only two rows are averaged. This is why Bulby's value is 22.5, the mean of 20 and 25.

The result is a "smoothed representation" of the levels: each value is replaced by the mean of the current row and its immediate neighbors within the same partition.
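
To make the window frame concrete, here is a plain-Python sketch of the same calculation for a single partition. It is an emulation for illustration only; the function name `moving_average` is ours, and the level lists are taken from the output above.

```python
def moving_average(levels):
    """Emulate AVG(level) OVER (ORDER BY level
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) for one partition."""
    ordered = sorted(levels)
    averages = []
    for i in range(len(ordered)):
        # The frame is clipped at the partition edges, so the first and
        # last rows average only two values instead of three.
        frame = ordered[max(0, i - 1): i + 2]
        averages.append(sum(frame) / len(frame))
    return averages

oak_levels = [25, 65, 20, 40]  # researcher 1
elm_levels = [30, 15]          # researcher 2

oak_avg = moving_average(oak_levels)
elm_avg = moving_average(elm_levels)
print(oak_avg)
print(elm_avg)  # [22.5, 22.5]
```

For researcher 1 the sorted levels are 20, 25, 40, 65, so the frames are (20, 25), (20, 25, 40), (25, 40, 65), and (40, 65). The last digit of the repeating decimals may differ slightly from the Postgres output because of floating-point rounding.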

## Pikachu and Squirtle on Postgres and SQLite
Pikachu: Pika pika! Pikachu pika pika chu! (Hey Squirtle! We need to decide on a database for the Pokemon Research Center! It's a crucial decision that will impact their operations and research capabilities.)

Squirtle: Squirtle squirt! Squirtle squirt squirtle squirt. (I know, Pikachu! It's important to consider the specific needs of the research center. What do you think about SQLite and Postgres as potential options?)

Pikachu: Pika pi, pikachu pika pika chu. Pika pika, chu pika pikachu pika. Pika pikachu, chu pika pika! (Well, SQLite is serverless, meaning it's embedded directly into the application. It doesn't require a separate server process, making it lightweight and easy to set up. Perfect for small-scale applications with minimal concurrency needs!)

Squirtle: Squirtle squirt squirtle. Squirt squirtle, squirt squirt squirtle. Squirtle squirt, squirtle squirt squirt! (On the other hand, Postgres is a client-server database. It runs as a separate process and can handle multiple client connections simultaneously. This makes it suitable for applications with high concurrency and scalability requirements!)

Pikachu: Pika pika, chu pika pikachu. Pika pi, pikachu pika chu pika. Pika pika, chu pika pikachu pika! (Another difference is that SQLite uses dynamic typing, which means that by default it doesn't enforce a column's declared data type. While this provides flexibility, it can also lead to data inconsistencies if not handled carefully. SQLite is great for rapid prototyping and simple data storage needs!)

Squirtle: Squirt squirtle squirt. Squirtle squirt, squirtle squirtle squirt. Squirtle squirt, squirtle squirt squirtle! (Yeah, whereas Postgres uses strong data typing. It ensures data integrity by enforcing data types and constraints. This is crucial for maintaining data consistency and reliability in complex applications!)

Pikachu: Pika pi, pikachu pika chu pika. Pika pika, chu pika pikachu pika. Pika pikachu, chu pika pika! (When it comes to concurrency, SQLite uses file-level locking and allows only one write operation at a time. This means that multiple users cannot simultaneously write to the database. However, it's perfectly fine for applications with low concurrency requirements!)

Squirtle: Squirtle squirt squirtle squirt. Squirtle squirt, squirtle squirtle squirt. Squirtle squirt, squirtle squirt squirtle! (Postgres, on the other hand, uses MVCC (Multi-Version Concurrency Control) to handle concurrent transactions efficiently. Readers don't block writers and writers don't block readers, so many users can work at the same time. Only transactions that modify the same row have to wait for each other. This makes Postgres ideal for applications with high traffic and concurrent access!)

Pikachu: Pika pika, chu pika pikachu pika. Pika pi, pikachu pika chu pika. Pika pika, chu pika pikachu! (If the research center has a small-scale application with minimal concurrent users, SQLite could be a good choice. It's lightweight, easy to embed, and requires minimal setup and maintenance. Perfect for standalone applications or prototypes!)

Squirtle: Squirt squirtle squirt squirtle. Squirtle squirt, squirtle squirtle squirt. Squirtle squirt, squirtle squirt squirtle! (But if they expect high concurrency and need to scale up in the future, Postgres would be more suitable. It can handle a large number of simultaneous connections and provides advanced features like role-based access control, stored procedures, and rich data types, on top of the ACID guarantees that SQLite also offers. Ideal for enterprise-level applications with growing needs!)

Pikachu: Pika pi, pikachu pika chu pika pikachu. Pika pika, chu pika pikachu pika. Pika pikachu, chu pika pika! (The research center seems to be in the middle. They have moderate data requirements and potential for growth. It's not a clear-cut decision for them, as both SQLite and Postgres have their strengths and weaknesses.)

Squirtle: Squirtle squirt squirtle squirt. Squirtle squirt, squirtle squirtle squirt. Squirtle squirt, squirtle squirt squirtle! (I agree. It's important to consider their current needs as well as future scalability. They should weigh the trade-offs between simplicity and robustness, and choose the database that aligns with their long-term goals.)

Pikachu: Pika pika! Chu pika pikachu pika chu! Pika pi, pikachu pika chu pika. Pika pika, chu pika pikachu! (Hey, I just realized something! Both SQLite and Postgres implement a large part of standard SQL! This means that the research center can leverage their existing SQL knowledge and skills. It also makes it easier to migrate between the two databases if needed!)

Squirtle: Squirtle squirt squirtle squirt. Squirtle squirt, squirtle squirtle squirt. Squirtle squirt, squirtle squirt squirtle! (Absolutely! The transition wouldn't be too difficult since the SQL syntax is mostly compatible. They can start with SQLite for simplicity and move to Postgres when they need more advanced features and scalability. It's a win-win situation!)

Pikachu: Pika pika! Chu pika pikachu! Pika pi, pikachu pika chu pika. Pika pika, chu pika pikachu! (That's a relief! They can choose either one and have the flexibility to adapt in the future! The Pokemon Research Center will be in good hands with either SQLite or Postgres. Let's go share our findings with them!)

Squirtle: Squirt squirtle squirt! Squirtle squirt, squirtle squirtle squirt. Squirtle squirt, squirtle squirt squirtle! (Right behind you, Pikachu! I'm confident that our analysis will help them make an informed decision. The research center's operations and research will greatly benefit from choosing the right database!)
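
The typing difference Pikachu and Squirtle discuss is easy to try out. The sketch below uses Python's built-in `sqlite3` module; the table names `loose_pokemon` and `checked_pokemon` are hypothetical, made up for this illustration. The first insert shows SQLite accepting text in an `INTEGER` column, and the second shows one possible safeguard, a `CHECK` constraint on `typeof()`. Postgres would reject the first insert outright with a type error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# By default, SQLite treats INTEGER as a preference rather than a rule.
# (loose_pokemon is a hypothetical table for this demo.)
conn.execute("CREATE TABLE loose_pokemon (name TEXT, level INTEGER)")
conn.execute("INSERT INTO loose_pokemon VALUES ('Sparky', 'twenty-five')")
stored_type = conn.execute(
    "SELECT typeof(level) FROM loose_pokemon"
).fetchone()[0]
print(stored_type)  # text

# A CHECK constraint is one way to ask SQLite to enforce the type.
conn.execute(
    "CREATE TABLE checked_pokemon ("
    "name TEXT, level INTEGER CHECK (typeof(level) = 'integer'))"
)
try:
    conn.execute("INSERT INTO checked_pokemon VALUES ('Sparky', 'twenty-five')")
    rejected = False
except sqlite3.IntegrityError:
    # The constraint fails because the stored value is text, not an integer.
    rejected = True
print(rejected)  # True
```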

## Review With Quizlet

In [1]:
%%html
<iframe src="https://quizlet.com/930521134/learn/embed?i=psvlh&x=1jj1" height="600" width="100%" style="border:0"></iframe>

## Glossary


| Term | Definition |
|------|------------|
| ADD CONSTRAINT | SQL command used to add a constraint to an existing table, such as primary key, foreign key, or check constraints. |
| ALTER COLUMN | SQL command to modify the properties of a column in an existing table, such as data type, default value, or constraints. |
| ALTER USER | SQL command to change the attributes of a database user account, such as password, role membership, or connection limits. |
| BOOLEAN | A data type with two values, TRUE and FALSE. A BOOLEAN column can also hold NULL, which represents an unknown state. Used for logical operations and conditions. |
| CREATE EXTENSION | PostgreSQL command to load additional functionality into the database, such as new data types, functions, or operators. |
| CREATE PROCEDURE my_procedure(parameters) AS \$\$ | SQL syntax to define a new stored procedure with specified parameters and a body of code enclosed in \$\$ delimiters. |
| CREATE ROLE | SQL command to create a new role in the database system, which can be used for managing permissions and access control. |
| CREATE TYPE my_type AS ENUM (v1, v2) | PostgreSQL syntax to create a custom enumerated type with predefined values, allowing for more specific data validation. |
| CREATE USER | SQL command to create a new user account in the database system, often with specified privileges and attributes. In PostgreSQL it is equivalent to CREATE ROLE with the LOGIN attribute. |
| DATE | A data type used to store calendar dates (year, month, day) without time information. |
| DROP USER | SQL command to remove a user account from the database system. PostgreSQL refuses to drop a user that still owns objects or holds privileges, so these must be reassigned or revoked first. |
| ENUM | Short for enumerated type, a data type consisting of a static, ordered set of values. |
| GRANT | SQL command used to give specific privileges to a user or role on database objects. |
| GRANT operation on t to user | Specific form of the GRANT command that assigns certain operations (e.g., SELECT, INSERT) on a table 't' to a specified user. |
| GRANT role to user | SQL command to assign a role and its associated privileges to a specific user. |
| Hash function | A mathematical algorithm that maps data of arbitrary size to a fixed-size output, often used for indexing, password storage, and data integrity checks. Unlike encryption, hashing is not designed to be reversed. |
| INTEGER[] | PostgreSQL data type representing an array of integers, allowing multiple integer values to be stored in a single column. |
| INTERVAL | A data type used to store and manipulate periods of time, such as days, hours, minutes, and seconds. |
| JSONB | PostgreSQL data type for storing JSON (JavaScript Object Notation) data in a binary format, allowing for efficient querying and indexing. |
| NUMERIC(p,s) | A data type for storing exact numeric values with a specified precision (p) and scale (s). |
| OVER (PARTITION BY c1 ORDER BY c2) | SQL clause used with window functions to define a window or set of rows for calculation, partitioned by column c1 and ordered by column c2. |
| pgcrypto | A PostgreSQL extension that provides cryptographic functions for encryption, hashing, and random data generation. |
| plpgsql | Procedural Language/PostgreSQL, a loadable procedural language for the PostgreSQL database system used for creating functions and stored procedures. |
| PostgreSQL | An open-source, advanced object-relational database management system known for its reliability, feature robustness, and performance. |
| Principle of least privilege | A security concept that advocates granting users the minimum levels of access or permissions needed to perform their tasks. |
| RANK() | A window function that assigns a rank to each row within a partition of a result set, with gaps in the ranking for ties. |
| REVOKE | SQL command used to remove specific privileges from a user or role on database objects. |
| Scalability | The capability of a database system to handle growing amounts of work, data, or users efficiently and to be enlarged to accommodate that growth. |
| Stored Procedures | Named sets of SQL and procedural statements that are stored in the database for reuse, improving performance and code modularity. |
| Strict typing | A characteristic of a programming language or database system that enforces rigid data type rules, reducing the risk of type-related errors. |
| TIME | A data type used to store time-of-day values, typically in hours, minutes, seconds, and fractional seconds. |
| Window functions | SQL functions that perform calculations across a set of rows that are related to the current row, allowing for complex analytical queries. |