# SQL Overview:

* SQL is an ANSI (American National Standards Institute) standard, but there are many different versions of the
  SQL language.
* SQL is the standard language for relational database systems. Different databases, such as MySQL and Oracle, use their own dialects of it.

# SQL Commands:

The standard SQL commands for interacting with relational databases are CREATE, SELECT, INSERT, UPDATE, DELETE and DROP. These commands can be classified into groups based on their nature:

### DDL - Data Definition Language:

* CREATE - Creates a new table, a view of a table, or another object in the database.
* ALTER - Modifies an existing database object, such as a table.
* DROP - Deletes an entire table, a view of a table, or another object in the database.

* Note: In MySQL and Oracle, DDL commands include an implicit commit. Some other systems, such as PostgreSQL, allow DDL inside a transaction.

### DML - Data Manipulation Language:

* INSERT - Creates a record
* UPDATE - Modifies records
* DELETE - Deletes records

* Note: DML commands do not commit implicitly (unless autocommit is enabled), so call commit after inserting, updating, or deleting.
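
The commit behaviour of DML statements can be sketched as follows. This is a minimal illustration that assumes Python's built-in `sqlite3` module and an in-memory database instead of a MySQL server, so that it runs without any setup; the table `notes` is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.commit()


def count_rows():
    """Return the number of rows currently visible in notes."""
    return conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]


# An INSERT that is rolled back never becomes permanent.
conn.execute("INSERT INTO notes (body) VALUES ('draft')")
conn.rollback()
rows_after_rollback = count_rows()

# The same kind of INSERT followed by commit() is kept.
conn.execute("INSERT INTO notes (body) VALUES ('final')")
conn.commit()
rows_after_commit = count_rows()

print(rows_after_rollback, rows_after_commit)  # 0 1
```

PyMySQL connections follow the same pattern with `connection.commit()` and `connection.rollback()`.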

### DCL - Data Control Language:

* GRANT - Gives a privilege to a user
* REVOKE - Takes back privileges granted to a user

### DQL - Data Query Language:

* SELECT - Retrieves certain records from one or more tables


## What is NULL value in SQL?

* A NULL value in a table marks a field that has no value. Such a field appears to be blank.

* It is very important to understand that a NULL value is different from a zero value or a field that contains spaces. A
  field with a NULL value is one that has been left blank during record creation.
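
The difference between NULL, zero, and an empty string can be sketched as follows. The example assumes Python's built-in `sqlite3` module with an in-memory database rather than MySQL, and the table `person` is hypothetical. It also shows that NULL has to be tested with `IS NULL`, because a comparison with `= NULL` never matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, nickname TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, nickname, age) VALUES (?, ?, ?)",
    [
        ("Ann", None, None),  # no value supplied: stored as NULL
        ("Bob", "", 0),       # empty string and zero are real values
    ],
)

# "= NULL" never matches any row; NULL is tested with IS NULL.
equals_null = conn.execute(
    "SELECT name FROM person WHERE nickname = NULL").fetchall()
is_null = conn.execute(
    "SELECT name FROM person WHERE nickname IS NULL").fetchall()
zero_age = conn.execute(
    "SELECT name FROM person WHERE age = 0").fetchall()

print(equals_null)  # []
print(is_null)      # [('Ann',)]
print(zero_age)     # [('Bob',)]
```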


## Commonly used SQL Constraints:

Constraints are rules enforced on the data columns of a table.

* NOT NULL Constraint: Ensures that a column cannot have NULL value.
* DEFAULT Constraint: Provides a default value for a column when none is specified.
* UNIQUE Constraint: Ensures that all values in a column are different.
* PRIMARY Key: Uniquely identifies each row/record in a database table.
* FOREIGN Key: References the primary key of a row/record in another table, linking the two tables.
* CHECK Constraint: Ensures that all values in a column satisfy certain conditions.
* INDEX: Used to retrieve data from the database quickly. An index is not strictly a constraint, but it is commonly listed with them.
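
A few of these constraints can be tried out with a short sketch. It assumes Python's built-in `sqlite3` module and an in-memory database instead of MySQL; the `student` table, its columns, and the helper `is_rejected` are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        country TEXT DEFAULT 'DE',
        age     INTEGER CHECK (age >= 0)
    )
""")

# A valid row; country is omitted, so the DEFAULT value is used.
conn.execute("INSERT INTO student (email, age) VALUES ('a@example.org', 20)")
default_country = conn.execute("SELECT country FROM student").fetchone()[0]


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    "NOT NULL": is_rejected(
        "INSERT INTO student (email, age) VALUES (NULL, 20)"),
    "UNIQUE": is_rejected(
        "INSERT INTO student (email, age) VALUES ('a@example.org', 30)"),
    "CHECK": is_rejected(
        "INSERT INTO student (email, age) VALUES ('b@example.org', -1)"),
}
print(default_country, results)
```

Each of the three invalid inserts is rejected by the database itself, so the rule holds no matter which program writes to the table.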

In [2]:
!pip install  --upgrade pymysql

Requirement already up-to-date: pymysql in c:\python35\lib\site-packages




In [None]:
-- Example where indexing is a good idea: an equality filter on name
SELECT * FROM a WHERE a.name = 'Christian';

-- One composite index on both columns (name, address) can serve this query
SELECT * FROM a WHERE a.name = 'Christian' AND a.address = 'bonn';

In [None]:
-- Example where indexing is a bad idea: the leading wildcard prevents an index lookup on name
SELECT * FROM a WHERE a.name LIKE '%Christian';
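
The effect of the two kinds of filter on index use can be inspected with a query plan. The sketch below assumes Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` rather than MySQL's `EXPLAIN`; the extra columns of table `a` and the index name `idx_a_name_address` are hypothetical. The exact plan text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT, address TEXT, age INTEGER)")
# One composite index covering the two filtered columns.
conn.execute("CREATE INDEX idx_a_name_address ON a (name, address)")


def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


equality_plan = plan(
    "SELECT * FROM a WHERE name = 'Christian' AND address = 'bonn'")
wildcard_plan = plan(
    "SELECT * FROM a WHERE name LIKE '%Christian'")

print(equality_plan)  # a SEARCH that uses idx_a_name_address
print(wildcard_plan)  # a SCAN over the whole table
```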

# SQL Syntax - Refresher :

* SQL SELECT Statement:
        SELECT column1, column2....columnN
        FROM table_name;
        
* SQL DISTINCT Clause:
       SELECT DISTINCT column1, column2....columnN
       FROM table_name;
        
* SQL WHERE Clause:
       SELECT column1, column2....columnN
       FROM table_name
       WHERE CONDITION;
       
* SQL AND/OR Clause:
      SELECT column1, column2....columnN
      FROM table_name
      WHERE CONDITION-1 {AND|OR} CONDITION-2;
      
* SQL IN Clause:
      SELECT column1, column2....columnN
      FROM table_name
      WHERE column_name IN (val-1, val-2,...val-N);     

* SQL ORDER BY Clause:
      SELECT column1, column2....columnN
      FROM table_name
      WHERE CONDITION
      ORDER BY column_name {ASC|DESC};

* SQL GROUP BY Clause:
      SELECT column_name, aggregate_function(column_name)
      FROM table_name
      WHERE CONDITION
      GROUP BY column_name;
      
* SQL HAVING Clause:
      SELECT SUM(column_name)
      FROM table_name
      WHERE CONDITION
      GROUP BY column_name
      HAVING (arithmetic function condition);
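
Several of the templates above can be exercised on a small table. This is a sketch that assumes Python's built-in `sqlite3` module with an in-memory database instead of MySQL; the `city` table here is hypothetical and its population figures are made-up illustration values, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("Bonn", "DEU", 300), ("Berlin", "DEU", 3400),
        ("Paris", "FRA", 2100), ("Lyon", "FRA", 500),
        ("Oslo", "NOR", 600),
    ],
)

# DISTINCT with ORDER BY
countries = conn.execute(
    "SELECT DISTINCT country FROM city ORDER BY country ASC").fetchall()

# WHERE ... IN with ORDER BY ... DESC
selected = conn.execute(
    "SELECT name FROM city WHERE country IN ('DEU', 'NOR') "
    "ORDER BY population DESC").fetchall()

# GROUP BY with HAVING: keep only groups whose total exceeds 1000
large = conn.execute(
    "SELECT country, SUM(population) FROM city "
    "GROUP BY country HAVING SUM(population) > 1000 "
    "ORDER BY country").fetchall()

print(countries)  # [('DEU',), ('FRA',), ('NOR',)]
print(selected)   # [('Berlin',), ('Oslo',), ('Bonn',)]
print(large)      # [('DEU', 3700), ('FRA', 2600)]
```

WHERE filters individual rows before grouping, while HAVING filters whole groups after the aggregate has been computed.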

## Note on use of Group by clause:

* Group by can include one column or multiple columns

* Grouping of NULL values: If grouping is required on a column that contains NULL values, all these NULL values form one group when the rows are grouped.
* **General Rule for the GROUP BY Clause**:
    * Any column specified in the SELECT clause must occur as a parameter of an aggregate function, in the list of columns given in the GROUP BY clause, or in both.
     
      Example: incorrect use of GROUP BY. TOWN is included in SELECT but appears neither in GROUP BY nor inside an aggregate function such as COUNT().
                SELECT   TOWN, COUNT(*)
                FROM     PLAYERS
                GROUP BY PLAYERNO
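
A corrected version of the query, together with the grouping of NULL values, can be sketched as follows. The example assumes Python's built-in `sqlite3` module with an in-memory database instead of MySQL, and the rows in `players` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (playerno INTEGER, town TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(1, "Bonn"), (2, "Bonn"), (3, None), (4, None), (5, "Koeln")],
)

# TOWN appears in both SELECT and GROUP BY, so the general rule is satisfied.
groups = conn.execute(
    "SELECT town, COUNT(*) FROM players GROUP BY town ORDER BY town"
).fetchall()

# The two players without a town end up in a single NULL group.
print(groups)  # [(None, 2), ('Bonn', 2), ('Koeln', 1)]
```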

# SQL Joins :

There are different types of joins available in SQL:

* INNER JOIN: returns rows when there is a match in both tables.
* LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.
* RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left table.
* FULL JOIN: returns all rows from both tables, combining them where there is a match and filling the missing side with NULL. MySQL does not support FULL JOIN directly.
* SELF JOIN: is used to join a table to itself as if the table were two tables, temporarily renaming at least one
  table in the SQL statement.
* CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or more joined tables

## Syntax :

* **Inner Join:**
```SQL   
    SELECT table1.column1, table2.column2...
    FROM table1
    INNER JOIN table2
    ON table1.common_field = table2.common_field;
```
* **Left Join :**
```SQL 
    SELECT table1.column1, table2.column2...
    FROM table1
    LEFT JOIN table2
    ON table1.common_field = table2.common_field;
```
    
* **Right Join:**
```SQL 
    SELECT table1.column1, table2.column2...
    FROM table1
    RIGHT JOIN table2
    ON table1.common_field = table2.common_field;
```    
* **Full Join :**
```SQL 
    SELECT table1.column1, table2.column2...
    FROM table1
    FULL JOIN table2
    ON table1.common_field = table2.common_field;
```
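
The difference between an inner join and a left join can be seen on two small tables. This sketch assumes Python's built-in `sqlite3` module with an in-memory database instead of MySQL; the tables and rows are hypothetical. RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (code TEXT, name TEXT)")
conn.execute("CREATE TABLE city (name TEXT, country_code TEXT)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("DEU", "Germany"), ("FRA", "France"), ("ISL", "Iceland")],
)
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("Bonn", "DEU"), ("Paris", "FRA")],
)

# INNER JOIN: only countries that have a matching city.
inner = conn.execute("""
    SELECT country.name, city.name
    FROM country
    INNER JOIN city ON country.code = city.country_code
    ORDER BY country.name
""").fetchall()

# LEFT JOIN: every country, with NULL where no city matches.
left = conn.execute("""
    SELECT country.name, city.name
    FROM country
    LEFT JOIN city ON country.code = city.country_code
    ORDER BY country.name
""").fetchall()

print(inner)  # [('France', 'Paris'), ('Germany', 'Bonn')]
print(left)   # [('France', 'Paris'), ('Germany', 'Bonn'), ('Iceland', None)]
```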

**For a more detailed explanation and syntax, refer to the [SQL Tutorial](http://www.tutorialspoint.com/sql/sql_tutorial.pdf).**

## SQL in ipython notebook

First we have to import PyMySQL, which is used to connect to the database.

In [5]:
import pymysql


Connect to the `world` database (an example database available when MySQL Server is installed).

In [6]:
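import os
import json

# Base directory read from the BUG_FREE_EUREKA_BASE environment variable
base = os.environ['BUG_FREE_EUREKA_BASE']

# Connection parameters are kept out of the notebook, in secrets.json
with open(os.path.join(base, 'secrets.json')) as f:
    secrets = json.load(f)
connection = pymysql.connect(**secrets["test_db"])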




### Basic SQL statements: 

* Count the number of rows :

In [7]:
cursor = connection.cursor(pymysql.cursors.DictCursor)
cursor.execute('use world')

0

In [8]:
cursor.execute('select count(*) as no_of_rows from city ')
cursor.fetchone()

{'no_of_rows': 4079}

In [10]:
cursor.execute('Select Name, Population from city;') 
cursor.fetchall()

[{'Name': 'Kabul', 'Population': 1780000},
 {'Name': 'Qandahar', 'Population': 237500},
 {'Name': 'Herat', 'Population': 186800},
 {'Name': 'Mazar-e-Sharif', 'Population': 127800},
 {'Name': 'Amsterdam', 'Population': 731200},
 {'Name': 'Rotterdam', 'Population': 593321},
 {'Name': 'Haag', 'Population': 440900},
 {'Name': 'Utrecht', 'Population': 234323},
 {'Name': 'Eindhoven', 'Population': 201843},
 {'Name': 'Tilburg', 'Population': 193238},
 {'Name': 'Groningen', 'Population': 172701},
 {'Name': 'Breda', 'Population': 160398},
 {'Name': 'Apeldoorn', 'Population': 153491},
 {'Name': 'Nijmegen', 'Population': 152463},
 {'Name': 'Enschede', 'Population': 149544},
 {'Name': 'Haarlem', 'Population': 148772},
 {'Name': 'Almere', 'Population': 142465},
 {'Name': 'Arnhem', 'Population': 138020},
 {'Name': 'Zaanstad', 'Population': 135621},
 {'Name': '´s-Hertogenbosch', 'Population': 129170},
 {'Name': 'Amersfoort', 'Population': 126270},
 {'Name': 'Maastricht', 'Population': 122087},
 {'Na

* Select only a few columns:

In [9]:
cursor.execute("Select Name,Population from city limit 10")
cursor.fetchall()

[{'Name': 'Kabul', 'Population': 1780000},
 {'Name': 'Qandahar', 'Population': 237500},
 {'Name': 'Herat', 'Population': 186800},
 {'Name': 'Mazar-e-Sharif', 'Population': 127800},
 {'Name': 'Amsterdam', 'Population': 731200},
 {'Name': 'Rotterdam', 'Population': 593321},
 {'Name': 'Haag', 'Population': 440900},
 {'Name': 'Utrecht', 'Population': 234323},
 {'Name': 'Eindhoven', 'Population': 201843},
 {'Name': 'Tilburg', 'Population': 193238}]

* Where clause:

In [19]:
cursor.execute("Select * from city where CountryCode = 'IND'")
cursor.fetchall()

[{'CountryCode': 'IND',
  'District': 'Maharashtra',
  'ID': 1024,
  'Name': 'Mumbai (Bombay)',
  'Population': 10500000},
 {'CountryCode': 'IND',
  'District': 'Delhi',
  'ID': 1025,
  'Name': 'Delhi',
  'Population': 7206704},
 {'CountryCode': 'IND',
  'District': 'West Bengali',
  'ID': 1026,
  'Name': 'Calcutta [Kolkata]',
  'Population': 4399819},
 {'CountryCode': 'IND',
  'District': 'Tamil Nadu',
  'ID': 1027,
  'Name': 'Chennai (Madras)',
  'Population': 3841396},
 {'CountryCode': 'IND',
  'District': 'Andhra Pradesh',
  'ID': 1028,
  'Name': 'Hyderabad',
  'Population': 2964638},
 {'CountryCode': 'IND',
  'District': 'Gujarat',
  'ID': 1029,
  'Name': 'Ahmedabad',
  'Population': 2876710},
 {'CountryCode': 'IND',
  'District': 'Karnataka',
  'ID': 1030,
  'Name': 'Bangalore',
  'Population': 2660088},
 {'CountryCode': 'IND',
  'District': 'Uttar Pradesh',
  'ID': 1031,
  'Name': 'Kanpur',
  'Population': 1874409},
 {'CountryCode': 'IND',
  'District': 'Maharashtra',
  'ID': 10

* Order By Clause:

In [20]:
cursor.execute("""Select Name,CountryCode,Population from city order by Population desc limit 10""")
cursor.fetchall()


[{'CountryCode': 'IND', 'Name': 'Mumbai (Bombay)', 'Population': 10500000},
 {'CountryCode': 'KOR', 'Name': 'Seoul', 'Population': 9981619},
 {'CountryCode': 'BRA', 'Name': 'São Paulo', 'Population': 9968485},
 {'CountryCode': 'CHN', 'Name': 'Shanghai', 'Population': 9696300},
 {'CountryCode': 'IDN', 'Name': 'Jakarta', 'Population': 9604900},
 {'CountryCode': 'PAK', 'Name': 'Karachi', 'Population': 9269265},
 {'CountryCode': 'TUR', 'Name': 'Istanbul', 'Population': 8787958},
 {'CountryCode': 'MEX', 'Name': 'Ciudad de México', 'Population': 8591309},
 {'CountryCode': 'RUS', 'Name': 'Moscow', 'Population': 8389200},
 {'CountryCode': 'USA', 'Name': 'New York', 'Population': 8008278}]

* Aggregate functions:

In [21]:
cursor.execute('''
Select min(Population) as minimum,
       max(Population) as maximum,
       avg(Population) as average, 
       std(Population) 
from city''')
cursor.fetchall()

[{'average': Decimal('350468.2236'),
  'maximum': 10500000,
  'minimum': 42,
  'std(Population)': 723686.9870174888}]

* Group By Clause :

In [22]:
cursor.execute("""
Select 
    CountryCode,
    sum(Population),
    count(*) 
from city 
    group by CountryCode 
    order by count(*) 
    desc limit 10""")
cursor.fetchall()

[{'CountryCode': 'CHN',
  'count(*)': 363,
  'sum(Population)': Decimal('175953614')},
 {'CountryCode': 'IND',
  'count(*)': 341,
  'sum(Population)': Decimal('123298526')},
 {'CountryCode': 'USA',
  'count(*)': 274,
  'sum(Population)': Decimal('78625774')},
 {'CountryCode': 'BRA',
  'count(*)': 250,
  'sum(Population)': Decimal('85876862')},
 {'CountryCode': 'JPN',
  'count(*)': 248,
  'sum(Population)': Decimal('77965107')},
 {'CountryCode': 'RUS',
  'count(*)': 189,
  'sum(Population)': Decimal('69150700')},
 {'CountryCode': 'MEX',
  'count(*)': 173,
  'sum(Population)': Decimal('59752521')},
 {'CountryCode': 'PHL',
  'count(*)': 136,
  'sum(Population)': Decimal('30934791')},
 {'CountryCode': 'DEU',
  'count(*)': 93,
  'sum(Population)': Decimal('26245483')},
 {'CountryCode': 'IDN',
  'count(*)': 85,
  'sum(Population)': Decimal('37485695')}]

In [23]:
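# Wrong usage of GROUP BY: Population is neither in the GROUP BY clause nor inside an aggregate function.
# With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) this raises an error.
# The query runs here, which suggests the mode is disabled; MySQL then returns an arbitrary Population per group,
# because once the rows are grouped the SQL engine does not know which Population to pick for the group.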

cursor.execute("""
Select 
    CountryCode,
    Population,
    count(*) 
from city 
    group by CountryCode 
    order by count(*) 
    desc limit 10""")
cursor.fetchall()


[{'CountryCode': 'CHN', 'Population': 9696300, 'count(*)': 363},
 {'CountryCode': 'IND', 'Population': 10500000, 'count(*)': 341},
 {'CountryCode': 'USA', 'Population': 8008278, 'count(*)': 274},
 {'CountryCode': 'BRA', 'Population': 9968485, 'count(*)': 250},
 {'CountryCode': 'JPN', 'Population': 7980230, 'count(*)': 248},
 {'CountryCode': 'RUS', 'Population': 8389200, 'count(*)': 189},
 {'CountryCode': 'MEX', 'Population': 8591309, 'count(*)': 173},
 {'CountryCode': 'PHL', 'Population': 2173831, 'count(*)': 136},
 {'CountryCode': 'DEU', 'Population': 3386667, 'count(*)': 93},
 {'CountryCode': 'IDN', 'Population': 9604900, 'count(*)': 85}]

* Nested Query :

In [24]:
cursor.execute("""Select 
Name 
from city 
where Population = 
(Select min(Population) from city)""")
cursor.fetchall()


[{'Name': 'Adamstown'}]

### Using PyMySQL to connect to a database and execute queries:

PyMySQL is a database connector for Python: a library that enables Python programs to talk to a MySQL server.

#### Example :

* Create a database cursor and execute the SQL

In [None]:
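import pymysql

user = 'root'
password = ''

# One statement per execute() call, so the example does not depend on
# multi-statement support being enabled on the connection.
statements = [
    "CREATE DATABASE IF NOT EXISTS reviewtest",
    "USE reviewtest",
    "DROP TABLE IF EXISTS `DormDetails`",
    """
    CREATE TABLE `DormDetails` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `name` varchar(255) NOT NULL,
        `address` varchar(255) NOT NULL,
        PRIMARY KEY (`id`)
    )""",
]

# Keyword arguments make the meaning of each connection parameter explicit.
connection = pymysql.connect(host='localhost', user=user, password=password)
with connection.cursor() as cursor:
    for statement in statements:
        cursor.execute(statement)
connection.commit()

data = [
  ('Ferdinandstrasse','Ippendorf Bonn'),
  ('Wichelshof','Romerstrasse Bonn'),
  ('Bismarckstrasse','Popelsdorf Bonn'),
]

# executemany() runs the parameterized INSERT once per tuple in data.
with connection.cursor() as cursor:
    sql = """
        INSERT INTO `DormDetails` (`name`, `address`)
        VALUES (%s, %s)"""
    cursor.executemany(sql, data)

# DML is not committed implicitly, so commit before closing.
connection.commit()
connection.close()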