# SQL I: Intro

## What is SQL
SQL stands for Structured Query Language. This language lets us interact with relational databases to manage and analyze the information they store.

SQL is a standardized language, but each database system (such as SQL Server, MySQL or MariaDB) adds its own extensions and commands.

## What SQL can do
- execute queries against a database
- retrieve data from a database
- insert records in a database
- update records in a database
- delete records from a database
- create new databases
- create new tables in a database
- create stored procedures in a database
- create views in a database
- set permissions on tables, procedures, and views

## SQL Syntax

A database usually contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders") and contains records (rows) with data. A database can also contain views, which are queried like tables but hold no data of their own: a view encapsulates a SQL query over one or more tables.
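As a minimal sketch of the difference between a table and a view, the following Python snippet uses the `sqlite3` module from the standard library. SQLite is only an assumption here, a stand-in for a MySQL server, and the `Customers` columns and the `SpanishCustomers` view are hypothetical names chosen for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real MySQL server.
conn = sqlite3.connect(":memory:")

# A table named Customers; each inserted tuple becomes one record (row).
conn.execute(
    "CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (name, country) VALUES (?, ?)",
    [("Ana", "Spain"), ("John", "UK"), ("Marta", "Spain")],
)

# A view stores a query, not data: it is evaluated each time it is read.
conn.execute(
    "CREATE VIEW SpanishCustomers AS "
    "SELECT id, name FROM Customers WHERE country = 'Spain'"
)

spanish = conn.execute(
    "SELECT name FROM SpanishCustomers ORDER BY name"
).fetchall()
print(spanish)  # [('Ana',), ('Marta',)]

# A new row in the base table shows up in the view automatically.
conn.execute("INSERT INTO Customers (name, country) VALUES ('Luis', 'Spain')")
spanish_after = conn.execute(
    "SELECT name FROM SpanishCustomers ORDER BY name"
).fetchall()
print(spanish_after)  # [('Ana',), ('Luis',), ('Marta',)]
```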

### SQL commands
Keep in mind that SQL keywords are not case sensitive, so `SELECT` is equivalent to `select`. The semicolon is the standard way to separate SQL statements in database systems that allow more than one statement to be executed in the same call to the server. Some database systems require a semicolon at the end of every statement; in many others it is optional when a single statement is sent.

**These are the most important commands to keep in mind**
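A short sketch of both points, assuming Python's built-in `sqlite3` module as a stand-in for a MySQL server (the `Orders` table is a hypothetical example): several statements are sent in one call, separated by semicolons, and the same query is then run with upper-case and lower-case keywords.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real MySQL server.
conn = sqlite3.connect(":memory:")

# executescript() accepts several statements in one call;
# the semicolons are what separate them.
conn.executescript(
    """
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO Orders (amount) VALUES (10.5);
    INSERT INTO Orders (amount) VALUES (20.0);
    """
)

# Keywords in upper case and in lower case produce the same result.
upper = conn.execute("SELECT id, amount FROM Orders ORDER BY id").fetchall()
lower = conn.execute("select id, amount from Orders order by id").fetchall()
print(upper == lower)  # True
```

Only the keywords are written in a different case here. Whether table and column names are case sensitive depends on the database system and its configuration, so it is safer to spell them consistently.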
- SELECT -> extracts data from a database
- UPDATE -> updates data in a database
- DELETE -> deletes data from a database
- INSERT INTO -> inserts new data into a database
- CREATE -> creates a new object (not data); combined with DATABASE, TABLE, VIEW, or INDEX
- ALTER -> modifies an existing object (not data); combined with DATABASE or TABLE
- DROP -> deletes an object (not data); combined with TABLE, DATABASE, VIEW, ...
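The commands listed above can be sketched end to end as follows. This is an illustrative example only: it assumes Python's built-in `sqlite3` module instead of a MySQL server, and the `Customers` table with its columns is hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE and ALTER act on the structure, not on the data.
cur.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("ALTER TABLE Customers ADD COLUMN city TEXT")

# INSERT INTO, UPDATE and DELETE act on the records.
cur.execute("INSERT INTO Customers (name, city) VALUES ('Ana', 'Madrid')")
cur.execute("INSERT INTO Customers (name, city) VALUES ('John', 'London')")
cur.execute("UPDATE Customers SET city = 'Barcelona' WHERE name = 'Ana'")
cur.execute("DELETE FROM Customers WHERE name = 'John'")

# SELECT extracts the data that is left.
rows = cur.execute("SELECT name, city FROM Customers").fetchall()
print(rows)  # [('Ana', 'Barcelona')]

# DROP removes the table itself.
cur.execute("DROP TABLE Customers")
remaining_tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)  # []
```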

## Entity-relationship model

The entity-relationship model is a tool to generate the data model that describes the structure and relationships of a database. At the same time, such a model describes a real situation, with real elements that are related to each other, for example the activity of a fruit warehouse or, without going any further, the activity of this same forum.

Obviously, the model does not describe a concrete activity such as loading a fruit truck. What it does describe is that in this reality (the fruit warehouse) there is an entity called COURIERS, which is related to another entity called ORDERS, which in turn is related to the entity CUSTOMERS that places those orders, and so on.

The entity-relationship model is a diagram that helps generate the data structure with which to manage a real problem or activity. Once this model has become a structure within the database, that is, the tables with their primary and foreign keys, SQL makes it possible both to keep the activity running by feeding the database and to analyze the data for the benefit of the activity. For example, in the case of the fruit warehouse, the data structure should allow registering customer orders and, as a consequence, obtaining the sales per customer in a given period.

Let's look at a simplified example (without column or field names):

![er_profesores](er_profesores.gif)

We can see three entities: CURSOS (courses), PROFESORES (teachers) and ALUMNOS (students). The cardinality of each relationship is shown by the indicators at both ends of it, next to the entities being related. To establish the cardinality of a relationship we ask how many records on each side can correspond to one record on the other. Let's take the relationship CURSOS - PROFESORES and see how its cardinality is established:

- A teacher can teach several courses. This implies annotating an N on the CURSOS side of the relationship.

- A course is taught by a single teacher. This implies annotating a 1 on the PROFESORES side of the relationship.

This relationship implies that we need to add a foreign key to the CURSOS table that references the PROFESORES table. We'll see in later lessons what a foreign key is and what it is used for.

Let us now take the relationship CURSOS - ALUMNOS:

- Several students are enrolled in a course. This implies annotating an N on the ALUMNOS side of the relationship.

- A student can attend several courses. This implies annotating an M on the CURSOS side of the relationship.

**Note:** Look at the diagram and notice that M is used because N is already taken at the other end of the relationship. This indicates a many-to-many relationship, and N and M may have different values for a given course and student.
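As a preview of how these cardinalities can end up as tables, here is a hedged sketch using Python's built-in `sqlite3` module (an assumption, standing in for MySQL). All column names and the junction table `ALUMNOS_CURSOS` are hypothetical; the diagram itself does not name them. The 1-to-N relationship becomes a foreign key in CURSOS, and the many-to-many relationship becomes a separate table holding one row per enrollment.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript(
    """
    CREATE TABLE PROFESORES (ID_PROFESOR INTEGER PRIMARY KEY, NOMBRE TEXT);

    -- 1 to N: each course points to exactly one teacher.
    CREATE TABLE CURSOS (
        ID_CURSO INTEGER PRIMARY KEY,
        TITULO TEXT,
        ID_PROFESOR INTEGER NOT NULL REFERENCES PROFESORES (ID_PROFESOR)
    );

    CREATE TABLE ALUMNOS (ID_ALUMNO INTEGER PRIMARY KEY, NOMBRE TEXT);

    -- N to M: one row per (student, course) enrollment.
    CREATE TABLE ALUMNOS_CURSOS (
        ID_ALUMNO INTEGER NOT NULL REFERENCES ALUMNOS (ID_ALUMNO),
        ID_CURSO INTEGER NOT NULL REFERENCES CURSOS (ID_CURSO),
        PRIMARY KEY (ID_ALUMNO, ID_CURSO)
    );

    INSERT INTO PROFESORES VALUES (1, 'Carmen');
    INSERT INTO CURSOS VALUES (10, 'SQL', 1), (11, 'Python', 1);
    INSERT INTO ALUMNOS VALUES (100, 'Pablo'), (101, 'Lucia');
    INSERT INTO ALUMNOS_CURSOS VALUES (100, 10), (100, 11), (101, 10);
    """
)

# One teacher, several courses.
courses_by_teacher = conn.execute(
    "SELECT COUNT(*) FROM CURSOS WHERE ID_PROFESOR = 1"
).fetchone()[0]

# Several students in one course, and several courses for one student.
students_in_sql = conn.execute(
    "SELECT COUNT(*) FROM ALUMNOS_CURSOS WHERE ID_CURSO = 10"
).fetchone()[0]
courses_of_pablo = conn.execute(
    "SELECT COUNT(*) FROM ALUMNOS_CURSOS WHERE ID_ALUMNO = 100"
).fetchone()[0]
print(courses_by_teacher, students_in_sql, courses_of_pablo)  # 2 2 2

# A course that points to a teacher who does not exist is rejected.
try:
    conn.execute("INSERT INTO CURSOS VALUES (12, 'Stats', 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```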


### Strong and weak entities

There are two types of entities: strong ones, sometimes called masters, which identify their records independently with their own key, and weak ones, which depend on a strong entity to identify their records or, put another way, whose existence makes no sense without a strong entity to lean on.

A typical example of a weak entity is INVOICE_LINES, which depends on the INVOICES master to identify its records. The cardinality of this relationship is 1 to N, since an invoice can have several lines while a line can only belong to one invoice. In the weak entity INVOICE_LINES the primary key is composite: the ID_INVOICE field is part of it and is, in turn, a foreign key referencing the INVOICES table. The other field of the primary key is, for example, ID_LINEA, so identifying a record of INVOICE_LINES requires the key of its master or strong entity in addition to ID_LINEA. Example: invoice 92054, line 3 identifies line 3 of invoice 92054. The cardinality of the relationship between a weak entity and its master or strong entity is always 1 to N.

Weak entities are represented in the entity-relationship diagram with a double rectangle:

![er_facturas](er_facturas.gif)
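The composite key of a weak entity can be sketched as follows, assuming Python's built-in `sqlite3` module as a stand-in for MySQL. The `CUSTOMER` and `DESCRIPTION` columns and the sample rows are hypothetical; `ID_INVOICE` and `ID_LINEA` follow the invoice example above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript(
    """
    -- Strong entity: identified by its own key.
    CREATE TABLE INVOICES (ID_INVOICE INTEGER PRIMARY KEY, CUSTOMER TEXT);

    -- Weak entity: the key of its master is part of its primary key.
    CREATE TABLE INVOICE_LINES (
        ID_INVOICE INTEGER NOT NULL REFERENCES INVOICES (ID_INVOICE),
        ID_LINEA INTEGER NOT NULL,
        DESCRIPTION TEXT,
        PRIMARY KEY (ID_INVOICE, ID_LINEA)
    );

    INSERT INTO INVOICES VALUES (92054, 'Ana'), (92055, 'John');
    -- Line number 3 exists in both invoices; only the pair is unique.
    INSERT INTO INVOICE_LINES VALUES (92054, 3, 'Oranges'), (92055, 3, 'Apples');
    """
)

# Both parts of the key are needed to identify one line.
line = conn.execute(
    "SELECT DESCRIPTION FROM INVOICE_LINES WHERE ID_INVOICE = ? AND ID_LINEA = ?",
    (92054, 3),
).fetchone()
print(line)  # ('Oranges',)

# Repeating the same (invoice, line) pair violates the composite primary key.
try:
    conn.execute("INSERT INTO INVOICE_LINES VALUES (92054, 3, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```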

If you are unsure whether an entity should be strong or weak, the following tips may help:

- If the records of the entity being studied may change parent in the future, it is surely a strong entity.
- If the parent entity simply groups records, so that it is sometimes doubtful which parent a child record belongs to or there are several equally valid candidates, it is probably a strong entity.
- If the entity is not expected to have many records for the same parent and, apart from its possible master, is hardly related to other entities, then it is probably a weak entity.
- If the entity is related to many other entities, so that all of them need foreign keys pointing to it, then even if it is a weak entity it may be worth identifying its records with its own key and making it strong. Otherwise the composite key has to be carried into all these related entities to create the foreign keys.

## SQL Data Types

These are the basic and most common data types you need to know in SQL:

- CHAR(size) -> A fixed-length alphanumeric value
- VARCHAR(size) -> A variable-length alphanumeric value with a maximum length
- INT -> A numeric field without decimals, holding positive and negative numbers
- FLOAT -> A numeric field with decimals
- LONGTEXT -> An alphanumeric field with a large capacity for text
- DATE -> A date value
- DATETIME -> A date and time value

These are all the data types you may use in the future:

**String** (Text)
- **CHAR(size)**: A FIXED length string (can contain letters, numbers, and special characters). The size parameter specifies the column length in characters - can be from 0 to 255. Default is 1
- **VARCHAR(size)**: A VARIABLE length string (can contain letters, numbers, and special characters). The size parameter specifies the maximum column length in characters - can be from 0 to 65535
- **BINARY(size)**: Equal to CHAR(), but stores binary byte strings. The size parameter specifies the column length in bytes. Default is 1
- **VARBINARY(size)**: Equal to VARCHAR(), but stores binary byte strings. The size parameter specifies the maximum column length in bytes.
- **TINYBLOB**: For BLOBs (Binary Large Objects). Max length: 255 bytes
- **TINYTEXT**: Holds a string with a maximum length of 255 characters
- **TEXT(size)**: Holds a string with a maximum length of 65,535 bytes
- **BLOB(size)**: For BLOBs (Binary Large Objects). Holds up to 65,535 bytes of data
- **MEDIUMTEXT**: Holds a string with a maximum length of 16,777,215 characters
- **MEDIUMBLOB**: For BLOBs (Binary Large Objects). Holds up to 16,777,215 bytes of data
- **LONGTEXT**: Holds a string with a maximum length of 4,294,967,295 characters
- **LONGBLOB**: For BLOBs (Binary Large Objects). Holds up to 4,294,967,295 bytes of data

**Numeric**
- **TINYINT(size)**: A very small integer. Signed range is from -128 to 127. Unsigned range is from 0 to 255. The size parameter specifies the maximum display width (which is 255)
- **BOOL**: Zero is considered as false, nonzero values are considered as true.
- **BOOLEAN**: Equal to BOOL
- **SMALLINT(size)**: A small integer. Signed range is from -32768 to 32767. Unsigned range is from 0 to 65535. The size parameter specifies the maximum display width (which is 255)
- **MEDIUMINT(size)**: A medium integer. Signed range is from -8388608 to 8388607. Unsigned range is from 0 to 16777215. The size parameter specifies the maximum display width (which is 255)
- **INT(size)**: A standard-size integer. Signed range is from -2147483648 to 2147483647. Unsigned range is from 0 to 4294967295. The size parameter specifies the maximum display width (which is 255)
- **INTEGER(size)**: Equal to INT(size)
- **BIGINT(size)**: A large integer. Signed range is from -9223372036854775808 to 9223372036854775807. Unsigned range is from 0 to 18446744073709551615. The size parameter specifies the maximum display width (which is 255)
- **FLOAT(size, d)**: A floating point number. The total number of digits is specified in size. The number of digits after the decimal point is specified in the d parameter. This syntax is deprecated in MySQL 8.0.17, and it will be removed in future MySQL versions
- **FLOAT(p)**: A floating point number. MySQL uses the p value to determine whether to use FLOAT or DOUBLE for the resulting data type. If p is from 0 to 24, the data type becomes FLOAT. If p is from 25 to 53, the data type becomes DOUBLE
- **DOUBLE(size, d)**: A normal-size floating point number. The total number of digits is specified in size. The number of digits after the decimal point is specified in the d parameter
- **DECIMAL(size, d)**: An exact fixed-point number. The total number of digits is specified in size. The number of digits after the decimal point is specified in the d parameter. The maximum number for size is 65. The maximum number for d is 30. The default value for size is 10. The default value for d is 0.
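The integer ranges listed above follow directly from the number of bytes each type occupies. The sketch below recomputes them in Python, assuming the usual storage sizes of 1, 2, 3, 4 and 8 bytes for MySQL's integer types; the helper `integer_range` is a hypothetical name used only for this illustration.

```python
# Assumed storage size in bytes for each MySQL integer type.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def integer_range(n_bytes, signed=True):
    """Return (min, max) for an integer stored in n_bytes bytes."""
    bits = 8 * n_bytes
    if signed:
        # One bit is used for the sign, so the range is split around zero.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Unsigned: all bits hold the magnitude.
    return 0, 2 ** bits - 1


for name, n_bytes in INTEGER_BYTES.items():
    low, high = integer_range(n_bytes, signed=True)
    _, unsigned_high = integer_range(n_bytes, signed=False)
    print(f"{name}: signed {low} to {high}, unsigned 0 to {unsigned_high}")
```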

**Date and Time**
- **DATE**: A date. Format: YYYY-MM-DD. The supported range is from '1000-01-01' to '9999-12-31'
- **DATETIME(fsp)**: A date and time combination. Format: YYYY-MM-DD hh:mm:ss. The supported range is from '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. Add DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP to the column definition to get automatic initialization and updating to the current date and time
- **TIMESTAMP(fsp)**: A timestamp. TIMESTAMP values are stored as the number of seconds since the Unix epoch ('1970-01-01 00:00:00' UTC). Format: YYYY-MM-DD hh:mm:ss. The supported range is from '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC. Automatic initialization and updating to the current date and time can be specified using DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP in the column definition
- **TIME(fsp)**: A time. Format: hh:mm:ss. The supported range is from '-838:59:59' to '838:59:59'
- **YEAR**: A year in four-digit format. Values allowed in four-digit format: 1901 to 2155, and 0000.
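The upper limit of TIMESTAMP is not arbitrary. Since the value is stored as seconds since the Unix epoch, the limit corresponds to the largest number a signed 32-bit counter can hold, which falls on 19 January 2038. A small Python check of that arithmetic:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, the zero point for TIMESTAMP values.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value of a signed 32-bit integer.
max_seconds = 2**31 - 1

# Last moment representable as seconds since the epoch in 32 bits.
upper_limit = epoch + timedelta(seconds=max_seconds)
print(upper_limit.isoformat())  # 2038-01-19T03:14:07+00:00
```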


You can see all the data types in this link: https://www.digitalocean.com/community/tutorials/sql-data-types

## Interacting with SQL

There are several ways to interact with a SQL database server (SQL Server, MySQL, ...). These are the most common:

- SQL Management tools
- Programming language
- ODBC connector

We're going to focus on the first two methods.

### SQL Management tools

Each SQL ecosystem can have its own management tool. For example, for SQL Server (Microsoft) the most common tool is SQL Server Management Studio, and for MySQL (the system we'll use) it is MySQL Workbench. There are also third-party tools for interacting with and managing SQL Server, MySQL and other systems, such as DBeaver, a universal client for several database types.

For this lesson, please download and install MySQL Workbench and a local MySQL server: https://dev.mysql.com/downloads/installer/

You can watch this video to recap the content we have seen in class: https://www.youtube.com/watch?v=2mbHyB2VLYY


### Programming languages

SQL is commonly combined with other languages to build apps, websites and systems. It is widely used in websites based on WordPress or PHP, and alongside analytical languages like Python to perform analysis and build data models. In our case we'll learn how to interact with SQL from Python.
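As a first taste, here is a minimal sketch of that interaction. It assumes Python's built-in `sqlite3` module rather than a MySQL connector so that it runs without a server; the `orders` table and its rows are hypothetical. The general pattern (connect, execute, fetch, close) is shared by Python database drivers, although details such as the placeholder symbol differ between them.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

# Values are passed separately from the SQL text (the "?" placeholders),
# so the driver handles quoting instead of string concatenation.
orders = [("Ana", 12.5), ("John", 30.0), ("Ana", 7.5)]
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", orders)
conn.commit()

# Analysis step: total sales per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('Ana', 20.0), ('John', 30.0)]

conn.close()
```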