# Selecting and Retrieving Data with SQL

## What is SQL anyway?

* SQL = Structured Query Language
* It is a standard computer language for relational database management and data manipulation
* It is used to query, insert, update and modify data
* It is used to communicate with databases
* Statements are made up of descriptive words and are easy to learn
* SQL is a non-procedural language:
    * It cannot be used to write complete applications
    * It is simple, but powerful
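
To illustrate what non-procedural means, the sketch below answers the same question twice: once with a Python loop that spells out how to find the rows, and once with a SQL statement that only states what is wanted. It assumes Python's built-in `sqlite3` module with an in-memory database; the `Products` rows are made-up sample data.

```python
import sqlite3

# In-memory SQLite database with made-up sample data (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (prod_name TEXT, prod_price REAL)")
products = [("Bear", 8.99), ("Doll", 4.99), ("Train", 11.49)]
conn.executemany("INSERT INTO Products VALUES (?, ?)", products)

# Procedural: we describe HOW to find the rows, step by step.
cheap_procedural = []
for name, price in products:
    if price < 10:
        cheap_procedural.append(name)

# Non-procedural (SQL): we describe WHAT we want; the DBMS decides how to get it.
rows = conn.execute(
    "SELECT prod_name FROM Products WHERE prod_price < 10 ORDER BY prod_name"
).fetchall()
cheap_sql = [row[0] for row in rows]

print(cheap_procedural)
print(cheap_sql)
conn.close()
```

Both approaches return the same product names; the SQL version leaves the search strategy to the database.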

### Database Administrator vs. Data Scientist

* DBA
    * Manages/governs entire database
    * Gives permissions to users
    * Determines access to data
    * Manages and creates tables
    * Uses SQL to query and retrieve data

* Data Scientist
    * End user of a database
    * Uses SQL to query and retrieve data
        * May create their own table or test environment
        * Combines multiple sources together
        * Writes complex queries for analysis

### SQL and Database Management Systems

* How you write syntax will depend on the DBMS you are using
* Each DBMS has its own dialect
* SQL skills translate: the core concepts carry over from one DBMS to another
* You will tweak your syntax based on the dialect your DBMS speaks
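
As a small illustration of dialects, the sketch below writes the same "first n rows" request in a few common dialects. The dictionary keys and the `first_rows_query` helper are hypothetical names chosen for this example, and the list of dialects is not exhaustive.

```python
# Row-limiting syntax differs between dialects; the intent of the query stays the same.
# The dialect keys below are hypothetical labels chosen for this sketch.
LIMIT_TEMPLATES = {
    "sqlite": "SELECT {cols} FROM {table} LIMIT {n};",
    "mysql": "SELECT {cols} FROM {table} LIMIT {n};",
    "postgresql": "SELECT {cols} FROM {table} LIMIT {n};",
    "sqlserver": "SELECT TOP {n} {cols} FROM {table};",
    # Oracle supports this form from version 12c onwards.
    "oracle": "SELECT {cols} FROM {table} FETCH FIRST {n} ROWS ONLY;",
}


def first_rows_query(dialect: str, table: str, cols: str = "*", n: int = 10) -> str:
    """Return a 'first n rows' query written in the given dialect.

    String formatting is used for illustration only; never build queries
    from untrusted input this way.
    """
    return LIMIT_TEMPLATES[dialect].format(cols=cols, table=table, n=n)


for dialect in LIMIT_TEMPLATES:
    print(f"{dialect:>10}: {first_rows_query(dialect, 'Products', 'prod_name')}")
```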



## Thinking about your data

* **Think before you code**: what is the problem you are trying to solve?
* **Understand your data**:
    * Understand the business process or subject matter the data is modeled after
    * Know the business rules
    * Understand how your data is organized and structured in the table (modeled)
* **Concepts**:
    * Database: a container (usually a file or a set of files) to store organized data; a set of related information
    * Table: a structured list of data of a specific type
    * Column: a single field in a table
    * Row: a record in a table
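
The four concepts above can be seen in a few lines of Python. This is a minimal sketch assuming SQLite through the built-in `sqlite3` module; the `Products` table and its rows are made-up sample data.

```python
import sqlite3

# Database: here an in-memory SQLite database (it would usually be a file).
conn = sqlite3.connect(":memory:")

# Table: a structured list of one kind of thing (products, in this made-up example).
conn.execute(
    "CREATE TABLE Products (prod_id INTEGER, prod_name TEXT, prod_price REAL)"
)

# Rows: one record per product.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "Bear", 8.99), (2, "Doll", 4.99)],
)

# Columns: the fields that every row in the table shares.
cursor = conn.execute("SELECT * FROM Products ORDER BY prod_id")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
conn.close()
```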

## Evolution of data models
* What is data modeling?
    * Models organize and structure information into multiple related tables
    * They can represent a business process or show relationships between business processes
    * They should closely represent the real world
* Relational models
    * Conceptual simplicity (structural independence)
    * Provides ad-hoc queries (SQL)
    * Set-oriented access
* NoSQL models
    * Address the big data problem
    * Less semantics in data models
    * Based on a schema-less key-value data model
    * Best suited for large sparse data stores
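
The contrast between the two families can be sketched as follows. A plain Python dictionary stands in for a key-value store here, which is an assumption made for illustration and not how any particular NoSQL system is implemented; the relational side uses SQLite through `sqlite3`, and all names and values are made up.

```python
import sqlite3

# Key-value style: each value is an arbitrary document; no shared schema is enforced.
kv_store = {
    "user:1": {"name": "Ada", "email": "ada@example.com"},
    "user:2": {"name": "Linus", "favorite_editor": "vi"},  # different fields are fine
}
print(kv_store["user:2"])  # access is by key

# Relational style: every row has the same columns; missing values are stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO Users (id, name, email) VALUES (?, ?, ?)",
    [(1, "Ada", "ada@example.com"), (2, "Linus", None)],
)

# Set-oriented access: one statement describes a whole set of rows.
without_email = conn.execute(
    "SELECT name FROM Users WHERE email IS NULL"
).fetchall()
print(without_email)
conn.close()
```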

## Relational vs. Transactional Models

* Characteristics:
    * **Relational model**: Allows for querying and data manipulation in an easy, logical and intuitive way
    * **Transactional model**: Operational database, e.g. insurance claims within a healthcare database

* Data model building blocks
    * Entity: person, place, thing or event. Distinguishable, unique and distinct
    * Attribute: a characteristic of an entity
    * Relationship: one-to-many, many-to-many, one-to-one

* ER Diagrams
    * Composed of entity types; it specifies relationships that can exist between instances of those entity types

* Primary keys and foreign keys
    * Primary key: a column (or set of columns) whose values uniquely identify every row in a table
    * Foreign key: one or more columns that can be used together to identify a single row in another table

* ER Diagram Notation:
    * Chen Notation:
        * painter -1:M- painting
        * employee -M:N- skill
        * manager -1:1- store
    
    * Crow's foot Notation:
        * painter || paints |< painting
        * employee >| learns |< skill
        * manager || manages || store
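
The keys and relationships above can be tried out with a small schema. The sketch below assumes SQLite through Python's `sqlite3` module; the table and column names follow the painter/painting and employee/skill examples, while the sample rows and the `violates_constraint` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
-- One-to-many: one painter paints many paintings.
CREATE TABLE Painter (
    painter_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE Painting (
    painting_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    painter_id  INTEGER NOT NULL REFERENCES Painter (painter_id)
);

-- Many-to-many: an employee learns many skills, a skill is learned by many employees.
CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Skill    (skill_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE EmployeeSkill (
    employee_id INTEGER NOT NULL REFERENCES Employee (employee_id),
    skill_id    INTEGER NOT NULL REFERENCES Skill (skill_id),
    PRIMARY KEY (employee_id, skill_id)  -- a primary key made of two columns
);

INSERT INTO Painter  VALUES (1, 'Frida');
INSERT INTO Painting VALUES (10, 'Self-Portrait', 1);
""")


def violates_constraint(sql: str) -> bool:
    """Return True if the DBMS rejects the statement with an integrity error."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Duplicate primary key: painter 1 already exists.
duplicate_pk = violates_constraint("INSERT INTO Painter VALUES (1, 'Diego')")
# Dangling foreign key: there is no painter 99.
dangling_fk = violates_constraint("INSERT INTO Painting VALUES (11, 'Untitled', 99)")
# Valid row: painter 1 exists, so a second painting for them is accepted.
valid_row = violates_constraint("INSERT INTO Painting VALUES (12, 'Roots', 1)")

print(duplicate_pk, dangling_fk, valid_row)
conn.close()
```

The many-to-many relationship needs a third table (`EmployeeSkill`) whose rows pair one employee with one skill; the one-to-many relationship only needs a foreign key column on the "many" side.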

## Good practices of writing statements

```python
# Retrieving data with a SELECT statement
statement = """

SELECT prod_name,
       prod_id,
       prod_price
FROM Products
LIMIT 10;

"""

# Creating tables
# Note: Desc doubles as the DESC keyword; quote or rename it if your DBMS rejects it
statement = """

CREATE TABLE Shoes
    (
    Id      char(10)        PRIMARY KEY,
    Brand   char(10)        NOT NULL,
    Type    char(250)       NOT NULL,
    Color   char(250)       NOT NULL,
    Price   decimal(8,2)    NOT NULL,
    Desc    Varchar         NULL
    );

"""

# Inserting rows
statement = """

INSERT INTO Shoes
    (Id,
    Brand,
    Type,
    Color,
    Price,
    Desc
    )

VALUES
    ('14535974',
    'Gucci',
    'Slippers',
    'Pink',
    695.00,
    NULL);

"""
```

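The cell above only stores SQL text in Python strings. One way to actually run such statements is sketched below, assuming SQLite through the built-in `sqlite3` module and an in-memory database. It mirrors the `Shoes` table from above, quotes the `Desc` column because the name is also a SQL keyword, and passes the values through `?` placeholders instead of pasting them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Mirrors the Shoes definition from the notes; "Desc" is quoted because it is
# also a SQL keyword.
conn.execute("""
CREATE TABLE Shoes
    (
    Id      char(10)        PRIMARY KEY,
    Brand   char(10)        NOT NULL,
    Type    char(250)       NOT NULL,
    Color   char(250)       NOT NULL,
    Price   decimal(8,2)    NOT NULL,
    "Desc"  Varchar         NULL
    );
""")

# Placeholders (?) let the driver pass the values safely.
with conn:  # commits the INSERT on success, rolls back on error
    conn.execute(
        'INSERT INTO Shoes (Id, Brand, Type, Color, Price, "Desc") '
        "VALUES (?, ?, ?, ?, ?, ?)",
        ("14535974", "Gucci", "Slippers", "Pink", 695.00, None),
    )

rows = conn.execute('SELECT Id, Brand, Price, "Desc" FROM Shoes LIMIT 10;').fetchall()
print(rows)
conn.close()
```

Python's `None` is stored as SQL `NULL`, which is allowed here because the column is declared `NULL`.
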
## Creating temporary tables

* Temporary tables are deleted when the current session is terminated
* They are faster to create than a real table
* They are useful for complex queries that use subsets and joins

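```python
# Type is the column defined in the Shoes table above.
# No parentheses around the SELECT: not every dialect accepts them.
statement = """

CREATE TEMPORARY TABLE Sandals AS
    SELECT *
    FROM Shoes
    WHERE Type = 'sandals';

"""
```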

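The session-scoped lifetime of a temporary table can be observed by opening two connections to the same database file. This is a minimal sketch assuming SQLite and Python's `sqlite3` module; the file name `shop.db`, the simplified `Shoes` table and the sample rows are made up for illustration.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "shop.db")  # hypothetical file name

    # Session 1: create a permanent table and a temporary table derived from it.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE Shoes (Id TEXT PRIMARY KEY, Type TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO Shoes VALUES (?, ?)",
        [("1", "sandals"), ("2", "slippers"), ("3", "sandals")],
    )
    conn.execute(
        "CREATE TEMPORARY TABLE Sandals AS SELECT * FROM Shoes WHERE Type = 'sandals'"
    )
    conn.commit()
    sandals_in_session_1 = conn.execute("SELECT COUNT(*) FROM Sandals").fetchone()[0]
    conn.close()  # ending the session drops the temporary table

    # Session 2: the permanent table is still there, the temporary one is gone.
    conn = sqlite3.connect(db_path)
    shoes_in_session_2 = conn.execute("SELECT COUNT(*) FROM Shoes").fetchone()[0]
    try:
        conn.execute("SELECT COUNT(*) FROM Sandals")
        sandals_survived = True
    except sqlite3.OperationalError:  # raised because the table no longer exists
        sandals_survived = False
    conn.close()

print(sandals_in_session_1, shoes_in_session_2, sandals_survived)
```
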
## Adding comments to SQL

* Single-line comments: --
* Multi-line comments: /\* code \*/
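
Both comment styles can be used to switch parts of a statement off while testing. The sketch below assumes SQLite through `sqlite3`; the simplified `Shoes` table and its single row are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Shoes (Id TEXT, Brand TEXT, Price REAL)")
conn.execute("INSERT INTO Shoes VALUES ('1', 'Gucci', 695.00)")

statement = """
SELECT Id,
       -- Brand,   (single-line comment: this column is left out for now)
       Price
FROM Shoes
/* Multi-line comment:
   the filter below is disabled while testing.
WHERE Price < 100
*/
;
"""

rows = conn.execute(statement).fetchall()
print(rows)
conn.close()
```

Because the `WHERE` clause sits inside the multi-line comment, the row is returned even though its price is above 100.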

## Suggested readings

### SQL Overview
* [What is SQL and how is it used?](https://www.thebalance.com/what-is-sql-and-uses-2071909)
* [NTC Hosting: Structured Query Language](https://www.ntchosting.com/encyclopedia/databases/structured-query-language/)
* [SQLite Tutorial](https://www.tutorialspoint.com/sqlite/index.htm)

### Data modeling and ER Diagrams
* [Entity-Relationship Diagrams (9 min video)](https://www.youtube.com/watch?v=c0_9Y8QAstg)
* [Star Schema vs. Snowflake Schema](http://www.vertabelo.com/blog/technical-articles/data-warehouse-modeling-star-schema-vs-snowflake-schema)
* [Explain Star Schema & Snow Flake Design (5 min video)](https://www.youtube.com/watch?v=KUwOcip7Zzc)
* [Data modeling 101](http://www.agiledata.org/essays/dataModeling101.html)
* [What is Data Modeling - An Introduction for Business Analysts](http://business-analysis-excellence.com/what-is-data-modeling/)

### Comparing NoSQL and SQL
* [SQL vs. NoSQL - What you need to know](http://dataconomy.com/2014/07/sql-vs-nosql-need-know/)
* [NoSQL keeps rising, but relational databases still dominate big data](http://www.techrepublic.com/article/nosql-keeps-rising-but-relational-databases-still-dominate-big-data/)
* [Data Science skills: Is NoSQL better than SQL?](https://www.siliconrepublic.com/careers/data-science-skills-sql)