# Information Systems for Engineers Spring 2022 - Cheat Sheet

During the exam, you will be required to write SQL queries using a Jupyter notebook.

This notebook helps you get started with your queries by providing an environment with the datasets loaded and a simple query that you can use to recap SQL syntax.

Feel free to extend this notebook and use it to prepare the answers you need for the exam. Note that the content of this notebook will not be considered for grading.

## SQL

There is a local PostgreSQL 13 installation with a dataset loaded into a database. Run the next cell to connect to it.

In [None]:
%load_ext sql
%sql  postgresql://postgres:example@db 

To print the tables currently loaded in the database, run:

In [None]:
%%sql

SELECT * 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_TYPE = 'BASE TABLE' and TABLE_CATALOG = 'postgres' and TABLE_SCHEMA = 'public';

To print the attributes of a particular table (`airports`, for example), run:

In [None]:
%%sql

SELECT column_name, data_type, character_maximum_length
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE table_name = 'airports';

## Useful SQL Keywords

The keyword `SELECT DISTINCT` returns only distinct values. For example:

In [None]:
%%sql
SELECT DISTINCT residence FROM airlines;
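
`DISTINCT` can also be used inside an aggregate to count the different values instead of listing them. The following is a minimal sketch; the alias `num_states` is only an illustrative name.

In [None]:
%%sql
SELECT COUNT(DISTINCT residence) AS num_states  -- each state is counted once
FROM airlines;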

## Complex query example

A more complex PostgreSQL query might look like this:

In [None]:
%%sql
SELECT airlines.residence, COUNT(airlines.code)
FROM airlines INNER JOIN airports ON airlines.residence = airports.residence
WHERE airlines.residence <> 'CA'
GROUP BY airlines.residence
ORDER BY airlines.residence;
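
Note that the join above pairs every airline with every airport located in the same state, so the count is the number of airline-airport pairs per state. If the goal were instead the number of airlines per state (restricted to states that have at least one airport), a variant along the following lines could be used. This is a sketch: the aliases `al`, `ap`, and `num_airlines` are illustrative, and it assumes `code` identifies an airline uniquely.

In [None]:
%%sql
SELECT al.residence,
       COUNT(DISTINCT al.code) AS num_airlines  -- count each airline once, not once per airport
FROM airlines AS al INNER JOIN airports AS ap ON al.residence = ap.residence
WHERE al.residence <> 'CA'
GROUP BY al.residence
ORDER BY al.residence;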

## Exam database - data about flight delays in the US

The dataset consists of relations containing information such as airports, airlines, flights and flight irregularities. Tables include both real-world and synthetic data. 

Here is some basic information on the database tables.

### 1) `airlines` table

Contains the list of airlines serving flights in our database.

* `code` is the three-letter IATA code identifier of the airline

* `name` is the airline name

* `residence` is the two-letter code of the state of residence

In [None]:
%%sql
SELECT * FROM airlines;

### 2) `airports` table

Contains the list of available airports.

* `code` is the three-letter IATA code identifier of the airport

* `name` is the airport name

* `city` is the name of the city where the airport is located

* `residence` is the two-letter code of the state in which the airport is located

* `latitude` and `longitude` are floating-point numbers describing the geographical location of the airport

In [None]:
%%sql
SELECT * FROM airports LIMIT 4;
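
Groups can be filtered after aggregation with `HAVING`. As a minimal sketch, the following cell lists the states that have more than ten airports; the threshold `10` and the alias `num_airports` are arbitrary choices for illustration.

In [None]:
%%sql
SELECT residence, COUNT(*) AS num_airports
FROM airports
GROUP BY residence
HAVING COUNT(*) > 10  -- keep only states with more than ten airports
ORDER BY num_airports DESC;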

### 3) `flights` table

Contains the list of flights conducted during the first seven days of January 2015. This table contains many rows - be careful when printing the data!

* `id` is the unique flight ID

* `flight_number` is the IATA flight code

* `airline` is the IATA code of the airline

* `departure` and `arrival` are the IATA codes of the departure and arrival airports

* `year`, `month`, and `day` are integers encoding the date on which the flight departed. All values are 1-based, e.g., the date 2nd January 2015 is stored in columns `year`, `month`, `day` as `2015`, `1`, and `2`, respectively.

* `load_factor` is a floating-point number in the range `[0, 1]` that describes the load factor of the flight, i.e., the fraction of occupied passenger seats

In [None]:
%%sql
SELECT * FROM flights LIMIT 4;
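
Because the departure date is split across three integer columns, selecting a single day means comparing each column separately. The following sketch computes, for 2nd January 2015, the number of flights and the average load factor per airline; the aliases `num_flights` and `avg_load_factor` are illustrative names. Aggregating also keeps the output small, which helps with a table of this size.

In [None]:
%%sql
SELECT airline,
       COUNT(*) AS num_flights,
       AVG(load_factor) AS avg_load_factor
FROM flights
WHERE year = 2015 AND month = 1 AND day = 2  -- 2nd January 2015
GROUP BY airline
ORDER BY avg_load_factor DESC;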

### 4) `flights_delay` table

Contains information on flight delays and irregularities. This table contains many rows - be careful when printing the data!

* `flight_id` is the unique flight ID

* `arrival_delay` is the arrival delay in minutes: a positive value indicates a late arrival, a negative value indicates an early arrival

* `cancellation` and `divertion` are boolean flags indicating cancelled and diverted flights

In [None]:
%%sql
SELECT * FROM flights_delay LIMIT 4;
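
Delay information becomes more useful once it is combined with the `flights` table. The following sketch assumes that `flights_delay.flight_id` refers to `flights.id`, which the column descriptions suggest but do not state explicitly. It reports the five airlines with the highest average arrival delay, ignoring cancelled and diverted flights. The aliases `f`, `d`, and `avg_arrival_delay` are illustrative names.

In [None]:
%%sql
SELECT f.airline,
       AVG(d.arrival_delay) AS avg_arrival_delay  -- in minutes, negative means early on average
FROM flights AS f INNER JOIN flights_delay AS d ON d.flight_id = f.id
WHERE NOT d.cancellation AND NOT d.divertion  -- skip cancelled and diverted flights
GROUP BY f.airline
ORDER BY avg_arrival_delay DESC
LIMIT 5;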

##### Note: the examples provided above do not contain all the query operations you might need during the exam.

Now it's your turn: you can write all your queries in new cells below. Feel free to add as many cells as needed.

In [None]:
%%sql
