Skip to content

Roger-Takeshita/SQL

Repository files navigation

Summary

RELATIONAL DATABASE AND SQL

Anatomy of a Relational Database

Go Back to Summary

  • The structure of a a particular database is known as its schema.
  • Schemas include the definition of such things as the database's:
    • Tables, including the number and data type of each column
    • Indexes for efficient access of data.
    • Constrains (rules, such as whether a field can e null or not)

Tables

Go Back to Summary

  • Database tables look like a spreadsheet since they consist of columns and rows.

  • Tables are also known as relations.

  • A single table in a relational database holds data for a particular data entry.

  • Since only one type of data can be held in a single table, related data, we will have different tables storing different contents and tye are linked via what is known as a foreign key (FK).

  • Foreign key fields hold the value of its parent's primary key (PK).

  • The naming convention is typically snake_cased and always plural.

Columns

Go Back to Summary

  • The columns of a table have a:

    • Name
    • Data type
    • Optional contrains
  • The typical naming convention is usually snake_cased and singular.

  • PostgreSQL has many data types for columns, but common ones include:

    • Integer
    • Decimal
    • Varchar (variable-length strings)
    • Text (unlimited length strings)
    • Date (does not include time)
    • Timestamps (both date and time)
    • Boolean
  • Common constrains for a column include:

    • PRIMARY KEY: column, or group of columns, uniquely identify a row.
    • REFERENCES (Foreign Key): value in column must match the primary key in another table.
    • NOT NULL: column must have a value, it cannot be empty (null).
    • UNIQUE: data in this column must be unique among all rous in the table.

Creating Database and a Table

Go Back to Summary

  • On terminal:

       CREATE DATABASE music;
    
       CREATE TABLE bands (
          id serial PRIMARY KEY,  # serial is auto-incrementing integer
          name varchar NOT NULL,
          genre varchar
       );

Creating a Table for a Related Data Entry

Go Back to Summary

  • Let's say we have the following data relationship: Band ----< Musician

    • A Band has many Musicians and a Musician belongs to a Band
  • Whenever you have a one:many relationship, the rows in the table for the many-side must include a column that references which row in the table on the on-side it belongs to.

  • This column is known as a foreign key (FK)

  • The FK must be the same data type is the primary key in the parent table (usually an integer).

       CREATE TABLE musicians (
          id serial PRIMARY KEY,
          name varchar NOT NULL,
          quote text,
          band_id integer NOT NULL REFERENCES bands (id)
       );

PostgreSQL Commands

Terminal

Go Back to Summary

  • Run PostgreSQL on Terminal:

       pqsl
  • Connect to a specific database:

       \c <database_name>
  • To quit the psql

       \q
  • To List all databases in the PostgreSQL database server

       \l
  • To list all tables inside the databa_base that you are currentt using.

       \d

Managing Databases

Go Back to Summary

Create a Database

   CREATE DATABASE <database_name>;

Delete a Database

   DROP DATABASE <database_name>;

Managing Tables

Go Back to Summary

Add a New Column to a Table

   ALTER TABLE <table_name> ADD COLUMN <column_name> <DATA_TYPE>;

Delete a Column from a Table

   ALTER TABLE <table_name> DROP COLUMN <column_name>;

Rename a Column

   ALTER TABLE <table_name> RENAME <column_name> TO <new_column_name>;

Rename a Table

   ALTER TABLE <table_name> RENAME TO <new_table_name>;

Managing Indexes

Go Back to Summary

Removing a Specified Index from a Table

   DROP INDEX <index_name>;

Querying Data from Tables

Go Back to Summary

Query All Data from a Table

   SELECT * FROM <table_name>;

Query Data from Specified Columns

   SELECT <column_name_1>, <column_name_2>, ... FROM <table_name>;

Query Data Using a Filter - WHERE Operator

   SELECT * FROM <table_name> WHERE <condition>;

Query Data - LIKE Operator

   SELECT * FROM <table_name> WHERE <column_name> LIKE '%value%';

Query Data - IN Operator

   SELECT * FROM <table_name> WHERE <column_name> IN (value_1, value2, ...);

Query Data - Constrain the Returned Rows - LIMIT Operator

   SELECT * FROM <table_name> LIMIT <limit> OFFSET <offset> ORDER BY <column_name>;

Query Data - JOIN

  • INNER JOIN, LEFT JOIN, FULL OUTER JOIN, CROSS JOIN and NATURAL JOIN

      SELECT * FROM <table_name_1> INNER JOIN <table_name_2> ON <conditions>;
      SELECT * FROM <table_name_1> LEFT JOIN <table_name_2> ON <conditions>;
      SELECT * FROM <table_name_1> FULL OUTER JOIN <table_name_2> ON <conditions>;
      SELECT * FROM <table_name_1> CROSS JOIN <table_name_2> ON <conditions>;
      SELECT * FROM <table_name_1> NATURAL JOIN <table_name_2> ON <conditions>;

Return the Number of Rows of a Table

   SELECT COUNT (*) FROM <table_name>;

Sort Rows in Ascending or Descending Order - ORDER BY

   SELECT * FROM <table_name> ORDER BY <column_name_1>, <column_name_2>, ...;

Filter Groups - HAVING Clause

   SELECT * FROM <table_name> GROUP BY <column_name> HAVING <condition>;

Set Operations

Go Back to Summary

Combine the Result - Two or More Queries - UNION Operator

   SELECT * FROM <table_name_1> UNION SELECT * FROM <table_name_2>;

Minus a Result - EXCEPT Operator

   SELECT * FROM <table_name_1> EXCEPT SELECT * FROM <table_name_2>;

Get Intersection of the Result Sets of Two Queries

   SELECT * FROM <table_name_1> INTERSECT SELECT * FROM <tabble_name_2>;

Modifying Data

Go Back to Summary

Insert a New Row Into a Table

   INSERT INTO <table_name> (<column_name_1>, <column_name_2, ...) VALUES (<value_1>, <value_2>, ...);

Insert Multiple Rows Into a Table

   INSERT INTO <table_name> (<column_name_1>, <column_name_2>, ...) VALUES (<value_1>, <value_2>, ...), (<value_1>, <value_2>, ...), (<value_1>, <value_2>, ...) ...;

Update Data for All Rows

   UPDATE <table_name> SET <column_name_1> = <value_1>, ...;

Update Data from a Set of Rows Specified by a Condition - WHERE clause

   UPDATE <table_name> SET <column_name_1> = <value_1>, ... WHERE <conditions>;

Delete All Rows of a Table

   DELETE FROM <table_name>;

Delete a Specific Row Based on a Condition

   DELETE FROM <table_name> WHERE <condition>;

ADVANCED

Go Back to Summary

  • SELECT: choose the fields for query
  • FROM: pick table(s) fro data source
  • WHERE: filter data based upon conditions
  • GROUP BY: segment data into groups
  • ORDER BY: sort results
  • LIMIT: limit the number of records returned
  • JOIN Types:
    • INNER vs. OUTER
    • LEFT vs. RIGHT

Aliases

Go Back to Summary

  • Rename fields in your queries:

      SELECT <field> AS <alias>
      SELECT milliseconds/1000. AS seconds
  • Reference tables as abbreviations:

      FROM <table> <alias>
      FROM track t
      JOIN genre g ON t.genreid = g.genreid;
      SELECT g.name,
             AVG(t.milliseconds) / 1000. / 60. AS minutes
      FROM track t
        JOIN mediatype m ON t.mediatypeid = m.mediatypeid
        JOIN genre g ON t.genreid = g.genreid
      WHERE m.name LIKE '%audio%'
      GROUP BY g.name
      ORDER BY minutes DESC;
  • Reference fields by position:

      SELECT <field1>, <field2>
      FROM <table>
      ORDER BY 1, 2
      SELECT name, milliseconds
      FROM track
      ORDER BY 2 DESC
      SELECT g.name,
             AVG(t.milliseconds) / 1000. / 60. AS minutes
      FROM track t
        JOIN mediatype m ON t.mediatypeid = m.mediatypeid
        JOIN genre g ON t.genreid = g.genreid
      WHERE m.name LIKE '%audio%'
      GROUP BY g.name
      ORDER BY 2 DESC;

Extract

Go Back to Summary

  SELECT EXTRACT ( <date_component> FROM <field> )
  SELECT EXTRACT ( month FROM invoicedate ) AS month
  FROM invoice;
  • Date/Time components
    • Day, Month, Year
    • Week, Month
    • Hour, Minute, Second
    • DOW, DOY, Quarter, Timezone

Having

Go Back to Summary

  SELECT <field1>, <agg function>(<field2>)
  FROM <table>
  GROUP BY <field1>
  HAVING <agg function>(<field2>) <operator> <value>;
  SELECT genreid, COUNT(trackid)
  FROM track
  GROUP BY genreid
  HAVING COUNT(*) > 50
  ORDER BY genreid ASC;
genreid count
1 1297
2 130
3 374
4 332
  SELECT genre.name,
         AVG(track.milliseconds) / 1000. / 60. AS minutes
  FROM track
    JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid
    JOIN genre ON track.genreid = genre.genreid
  WHERE mediatype.name LIKE '%audio%'
  GROUP BY genre.name
  HAVING AVG(track.milliseconds) / 1000. / 60. >= 3
  ORDER BY minutes DESC;

Concatenate Results

  • Just like & in Excel, we can use || to concatenate results in SQL

      SELECT e.firstname || ' ' || e.lastname AS "Employee Name",
             boss.firstname || ' ' || boss.lastname AS "Boss"
      FROM employee e
      FULL JOIN employee boss ON e.reportsto = boss.employeeid;

SUM

Go Back to Summary

  SELECT SUM(milliseconds) / 1000. / 60. / 60. / 24. AS days
  FROM track;
  --  15.958079

LIKE vs. ILIKE

Go Back to Summary

  • ILIKE = case insensitive

      SELECT g.name,
            AVG(t.milliseconds) / 1000. / 60. AS minutes
      FROM track t
        JOIN mediatype m ON t.mediatypeid = m.mediatypeid
        JOIN genre g ON t.genreid = g.genreid
      WHERE m.name LIKE '%audio%'
      GROUP BY g.name
      ORDER BY 2 DESC;
  • LIKE = case sensitive

      SELECT genre.name,
            AVG(track.milliseconds) / 1000. / 60. AS minutes
      FROM track
        JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid
        JOIN genre ON track.genreid = genre.genreid
      WHERE mediatype.name LIKE '%audio%'
      GROUP BY genre.name
      HAVING AVG(track.milliseconds) / 1000. / 60. >= 3
      ORDER BY minutes DESC;

Case Statements

Go Back to Summary

  • Just like a switch/case in JavaScript

      SELECT CASE
        WHEN < condition_1 > THEN < result_1 >
        WHEN < condition_2 > THEN < result_2 >
        ELSE <result_3 >
      END;
      SELECT CASE
        WHEN AGE < 18 THEN 'child'
        WHEN AGE >= 60 THEN 'senior'
        ELSE 'adult'
      END AS age_market_segment;

Union

Go Back to Summary

  • Merges data from two queries by stacking results on top of each other

  • Must have same number of columns and corresponding data types

  • Duplicate results are removed by default

  • UNION ALL will include duplicates

      SELECT *
      FROM< table1 >
      UNION
      SELECT *
      FROM< table2 >;

Coalesce

Go Back to Summary

  • Picks first non-null value

      COALESCE ([field1], [field2], [field3])
      COALESCE(online.firstname, catalog.firstname) AS firstname

Window Functions

Go Back to Summary

  • ATTENTION: Not all SQL supports window function
  • Perform calculation across a set of table rows that are somehow related to the current row + all the previous
  • Applications:
    • Cumulative sales
    • Percentile rank
    • Group level results

Cumulative Sales

Go Back to Summary

  SELECT SUM(< field1 >) OVER (ORDER BY< field2 >)
  FROM< TABLE>;
  • Example, In this case the SUM is calculated dynamically

      SELECT invoicedate,
            SUM(total) OVER (ORDER BY invoicedate)
      FROM invoice;

Percentiles

Go Back to Summary

  SELECT NTILE(< # OF groups >) OVER (ORDER BY< field >)
  FROM< TABLE>;
  SELECT name,
         milliseconds,
         NTILE(100) OVER (ORDER BY milliseconds DESC) AS percentile
  FROM track;

Group Level Results

Go Back to Summary

  SELECT AVG(< field1 >) OVER (PARTITION BY< field2 >)
  FROM< TABLE>;
  SELECT name,
         genreid,
         milliseconds,
         AVG(milliseconds) OVER (PARTITION BY genreid)
  FROM track;

Subqueries

Go Back to Summary

  WITH < subquery_name > AS (
      < code_for_subquery >
  )
  < code_for_parent_query >;
  WITH top_customers AS (
    SELECT customerid,
         SUM(total) AS sales
    FROM invoice
    GROUP BY customerid
  )
  SELECT COUNT(*)
  FROM top_customers
  WHERE sales > 40;

Views

Go Back to Summary

  • When we create a view, this "table" will be permanently saved on the database. This way, anyone that has access to this table, will have access to this view

  • ATTENTION: You can't have duplicate field names in a view

  • IMPORTANT: This table is populated with new data

      CREATE VIEW <view_name > AS <query_code >; COMMIT;
      CREATE VIEW top_customers AS
          SELECT customerid,
                SUM(total) AS sales
          FROM invoice
          GROUP BY customerid
          ORDER BY sales DESC;

Temporary Tables

Go Back to Summary

  • This will permanently create a copy of my data, but it's deleted after you logged out

  • ATTENTION: You can't have duplicate field names in a view

  • IMPORTANT: This table is not populated with new data that is coming in, only if we updated the temporary table

      CREATE TEMP TABLE < view_name > AS < query_code >;
      CREATE TEMP TABLE total_sales AS
          SELECT customerid,
                SUM(total) AS sales
          FROM invoice
          GROUP BY customerid
          ORDER BY sales DESC;

EXERCISES

Go Back to Summary

Exercise 1

Go Back to Summary

  • How many days of content are there in the library?

      SELECT SUM(milliseconds) / 1000. / 60. / 60. / 24. AS days
      FROM track;
      --  15.958079
  • What are the longest songs (excluding video)?

      SELECT track.name,
            track.milliseconds / 1000. / 60. AS minutes,
            mediatype.name
      FROM track
        JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid
      WHERE mediatype.name ILIKE '%AUDIO%'
      ORDER BY track.milliseconds DESC LIMIT 100;
  • What is the average length of a song grouped by genre (convert time to minutes)?

      SELECT genre.name,
             AVG(track.milliseconds) / 1000. / 60. AS minutes
      FROM track
        JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid
        JOIN genre ON track.genreid = genre.genreid
      WHERE mediatype.name LIKE '%audio%'
      GROUP BY genre.name
      ORDER BY minutes DESC;

Exercise 2

Go Back to Summary

  • Which customers have spent more than $40 (Use Group By and Having for the customer and invoice tables)?

      SELECT customer.firstname,
            customer.lastname,
            invoice.customerid,
            SUM(invoice.total)
      FROM customer
        JOIN invoice ON invoice.customerid = customer.customerid
      GROUP BY customer.firstname,
              customer.lastname,
              invoice.customerid
      HAVING SUM(invoice.total) > 40;
  • What are the total sales by month (Use Extract and Group By and the invoice table)?

      SELECT EXTRACT(month FROM invoicedate) AS month,
            SUM(invoice.total) AS total
      FROM invoice
      GROUP BY month
      ORDER BY total DESC;
    
      SELECT EXTRACT(month FROM invoicedate) AS month,
            SUM(invoice.total)
      FROM invoice
      GROUP BY month
      ORDER BY SUM(invoice.total) DESC;
  • Create a roster of employees with their bosses (Join the employee table to itself by using table aliases)

      SELECT t1.firstname,
            t1.lastname,
            t2.firstname,
            t2.lastname
      FROM employee t1
        LEFT JOIN employee t2 ON t1.reportsto = t2.employeeid;
    • Different solution
      SELECT e.firstname || ' ' || e.lastname AS "Employee Name",
            boss.firstname || ' ' || boss.lastname AS "Boss"
      FROM employee e
        FULL JOIN employee boss ON e.reportsto = boss.employeeid;

Exercise 3

Go Back to Summary

  • Using the iowa liquor products table, create an alcohol type label for whisky, vodka, tequila, rum, brandy, schnapps and any other liquor types (hint: use CASE STATEMENT and LIKE)

      SELECT DISTINCT category_name,
            CASE
              WHEN category_name ILIKE '%schnapps%' THEN 'schnapps'
              WHEN category_name ILIKE '%wisk%' OR category_name ILIKE '%scotch%' THEN 'whiskey'
              WHEN category_name ILIKE '%vodka%' THEN 'vodka'
              WHEN category_name ILIKE '%rum%' THEN 'rum'
              WHEN category_name ILIKE '%brand%' THEN 'brandy'
              WHEN category_name ILIKE '%gin%' THEN 'gin'
              ELSE 'Other'
            END AS liquor_type
      FROM iowa_products
      ORDER BY liquor_type;
  • Adding a counter

      SELECT category_name,
            CASE
              WHEN category_name ILIKE '%schnapps%' THEN 'schnapps'
              WHEN category_name ILIKE '%wisk%' OR category_name ILIKE '%scotch%' THEN 'whiskey'
              WHEN category_name ILIKE '%vodka%' THEN 'vodka'
              WHEN category_name ILIKE '%rum%' THEN 'rum'
              WHEN category_name ILIKE '%brand%' THEN 'brandy'
              WHEN category_name ILIKE '%gin%' THEN 'gin'
              ELSE 'Other'
            END AS liquor_type,
            COUNT(*)
      FROM iowa_products
      GROUP BY liquor_type,
              category_name
      ORDER BY liquor_type,
              category_name;
  • Using the catalog and online tables, create a customer list that combines the names from the catalog and online tables using UNION without creating duplicates.

  SELECT customerid,
         firstname AS "First Name",
         lastname AS "Last Name"
  FROM catalog
  UNION
  SELECT customerid,
         firstname,
         lastname
  FROM online
  ORDER BY "First Name",
           "Last Name";
  • FULL JOIN the catalog and online tables and inspect the results. Try adding the catalog sales and online sales totals together. Why do you get errors?

  • Wrong answer:

      SELECT *,
            c.catalogpurchases + o.onlinepurchases
      FROM catalog c
        FULL JOIN online o ON c.customerid = o.customerid;
  • Right answer

      SELECT *,
            COALESCE(c.catalogpurchases + o.onlinepurchases) AS firstname,
            COALESCE(c.lastname,o.lastname) AS lastname,
            COALESCE(c.catalogpurchases,0) +COALESCE(o.onlinepurchases,0) AS total_sales
      FROM catalog c
        FULL JOIN online o ON c.customerid = o.customerid;

Exercise 4

Go Back to Summary

  • How many iowa liquor vendors have more than $1 million in 2014 sales (hint: use subquery to group sales by vendor)?
  WITH vendor_sales AS (
      SELECT vendor,
          vendor_no,
          SUM(total)
      FROM iowa_sales
      WHERE EXTRACT(year FROM DATE) = 2014
      GROUP BY vendor,
              vendor_no
      HAVING SUM(total) > 1000000
  )
  SELECT COUNT(*) FROM vendor_sales;
  • Group sales by month with a subquery and then calculate cumulative sales by month for 2014 (using iowa sales table)

      WITH sale_by_month AS (
          SELECT EXTRACT(YEAR FROM iowa_sales.date) AS YEAR,
                EXTRACT(MONTH FROM iowa_sales.date) AS MONTH,
                SUM(total) AS new_total
          FROM iowa_sales
          GROUP BY YEAR,
                  MONTH
      )
      SELECT SUM(new_total) OVER (ORDER BY month)
      FROM sale_by_month
      WHERE year = 2014;
  • Alternative

      WITH monthly_sales AS (
          SELECT EXTRACT(month FROM DATE) AS month,
              SUM(total) AS sales
          FROM iowa_sales
          WHERE EXTRACT(year FROM DATE) = 2014
          GROUP BY month
      )
      SELECT month,
            sales / 1000000. AS month_sales,
            TO_CHAR(sales,'999,999,999'),
            SUM(sales) OVER (ORDER BY month) / 1000000. AS cum_sales
      FROM monthly_sales
      ORDER BY month;
  • Alternative with monthly sales

      WITH monthly_sales AS (
          SELECT EXTRACT(month FROM DATE) AS month,
                EXTRACT(year FROM DATE) AS year,
                SUM(total) AS sales
          FROM iowa_sales
          GROUP BY month,
                  year
      )
      SELECT month,
            year,
            sales / 1000000. AS month_sales,
            SUM(sales) OVER (PARTITION BY year ORDER BY month) / 1000000. AS cum_sales
      FROM monthly_sales
      ORDER BY year,
              month;
  • Create a View that adds liquor type to the iowa product data. Don't forget to commit your changes.

      IF EXISTS DROP TABLE new_iowa_products;
    
      CREATE VIEW new_iowa_products AS (
          SELECT CASE
              WHEN i.category_name ILIKE '%whisky%' THEN 'whisky'
              WHEN i.category_name ILIKE '%vodka%' THEN 'vodka'
              WHEN i.category_name ILIKE '%tequila%' THEN 'tequila'
              WHEN i.category_name ILIKE '%rum%' THEN 'rum'
              WHEN i.category_name ILIKE '%brandy%' THEN 'brandy'
              WHEN i.category_name ILIKE '%schnapps%' THEN 'schnapps' ELSE 'other'
            END AS liquor_type,
          * FROM iowa_products AS i
      );
    
      COMMIT;
    
      SELECT *
      FROM new_iowa_products;

Releases

No releases published

Packages

No packages published