# 6. Data Manipulation Language (DML)

Data Manipulation Language (DML) in MySQL consists of SQL commands that are used to manipulate and **interact with the data** stored in a MySQL database. DML commands primarily focus on querying, inserting, updating, and deleting data within database tables. 

In [13]:
import pandas as pd
from pandasql import sqldf

In [30]:
authors = pd.read_csv('CSV/authors.csv')
books = pd.read_csv('CSV/books.csv')
customers = pd.read_csv('CSV/customers.csv')
payments = pd.read_csv('CSV/payments.csv')
authorsbooks = pd.read_csv('CSV/authorsbooks.csv')
reviews = pd.read_csv('CSV/reviews.csv')

## SELECT statement


`SELECT` is used to query data in the database. A `SELECT` statement lets the user extract data from one or more tables based on specific criteria. Its clauses are written in the order below, although the database logically evaluates them as `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `SELECT`, `ORDER BY`:

```sql
SELECT [DISTINCT] item(s)
FROM table(s)
WHERE predicate
GROUP BY field(s)
HAVING predicate
ORDER BY field(s)
```

**ORDER BY**

Let's retrieve a list of authors ordered by their nationality:

In [25]:
pysqlauthors1 = lambda q: sqldf(q, globals())

In [26]:
query1 = """
    SELECT author_name, nationality
    FROM authors
    ORDER BY nationality;
"""

In [27]:
result1 = pysqlauthors1(query1)
print(result1)

        author_name          nationality
0        John Smith             American
1      Olivia Gomez             Anguilla
2      Rafael Gomez  Antigua and Barbuda
3    Luis Rodriguez            Argentina
4    Isabella Lopez          Argentinian
..              ...                  ...
215      Lucia Diaz              Uruguay
216    Lucas Wilson              Vanuatu
217     Pedro Silva            Venezuela
218      Elena Soto           Venezuelan
219      Dong Hoang           Vietnamese

[220 rows x 2 columns]


## SELECT statement with WHERE criteria

### where + between

Let's select all the books whose price is between 10 and 20 euros:

In [32]:
pysqlbooks1 = lambda q: sqldf(q, globals())

In [33]:
query2 = """SELECT title, price
    FROM books
    WHERE price 
    BETWEEN 10 AND 20"""

result2 = pysqlbooks1(query2)
print(result2)

                                           title  price
0                 Legends of the Northern Lights  10.35
1                          Fjords and Fairytales  16.08
2   The Art of Tradition: Celebrations Worldwide  17.58
3   Cultural Cornerstones: Practices and Customs  14.65
4                       The Philosophy of Wisdom  15.24
5                         The Works of Aristotle  17.03
6                              The Unseen Threat  17.34
7                              Romantic Getaways  16.78
8                               Cherished Dreams  10.88
9               Quick and Easy Weeknight Dinners  10.70
10                          African Safari Tales  18.69
11                                Chasing Dreams  10.73
12                                  Hearts Afire  11.21
13                            The Forbidden Love  11.29
14                             Steam and Shadows  19.10
15                            Clockwork Carnival  13.09
16                              Hoops and Dreams
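`BETWEEN` is inclusive at both ends, so it is shorthand for `price >= 10 AND price <= 20`. The sketch below checks this in an in-memory SQLite database (SQLite is also the engine pandasql runs on); the titles and prices are hypothetical.

```python
import sqlite3

# In-memory SQLite database with a hypothetical books table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("A", 9.99), ("B", 10.0), ("C", 15.5), ("D", 20.0), ("E", 20.01)],
)

# BETWEEN keeps both boundary values, 10 and 20
between = conn.execute(
    "SELECT title FROM books WHERE price BETWEEN 10 AND 20 ORDER BY title"
).fetchall()

# The equivalent pair of comparisons
explicit = conn.execute(
    "SELECT title FROM books WHERE price >= 10 AND price <= 20 ORDER BY title"
).fetchall()

print(between)  # [('B',), ('C',), ('D',)]
```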

### where + not between

Let's select the books whose average rating is not between 1 and 4:

In [34]:
pysqlbooks2 = lambda q: sqldf(q, globals())

In [36]:
query3 = """SELECT title, avarage_rating
    FROM books
    WHERE avarage_rating 
    NOT BETWEEN 1 AND 4"""

result3 = pysqlbooks2(query3)
print(result3)

                               title  avarage_rating
0              The Starry Chronicles             4.5
1          The Lost City of Atlantis             4.7
2     Journey to the Cosmic Frontier             4.4
3       The Quest for Eternal Wisdom             4.3
4          The Time Traveler's Diary             4.6
..                               ...             ...
383  The Adventure of Mr. Teddy Bear             4.5
384              The Magical Kingdom             4.4
385             Mystery in Candyland             4.3
386          The Brave Little Knight             4.2
387              Alice in Whimsyland             4.1

[388 rows x 2 columns]
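A NULL rating is neither inside nor outside the range: for NULL, both `BETWEEN` and `NOT BETWEEN` evaluate to unknown, so such rows are dropped by either filter. The sketch shows this on a hypothetical three-row table in SQLite; MySQL follows the same three-valued logic.

```python
import sqlite3

# Hypothetical books, one of them without a rating (NULL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, avarage_rating REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Low", 2.5), ("High", 4.5), ("Unrated", None)],
)

inside = conn.execute(
    "SELECT title FROM books WHERE avarage_rating BETWEEN 1 AND 4"
).fetchall()
outside = conn.execute(
    "SELECT title FROM books WHERE avarage_rating NOT BETWEEN 1 AND 4"
).fetchall()

# "Unrated" appears in neither result
print(inside, outside)  # [('Low',)] [('High',)]
```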


### where + or

Let's select the authors whose nationality is Latvian or Spanish:

In [38]:
pysqlauthors2 = lambda q: sqldf(q, globals())

In [39]:
query4 = """SELECT author_name, nationality
    FROM authors
    WHERE nationality = 'Latvian' 
    OR nationality = 'Spanish' """

result4 = pysqlauthors2(query4)
print(result4)

          author_name nationality
0        Maria García     Spanish
1     Carlos Martínez     Spanish
2   Ricardo Fernandez     Spanish
3        Maria Garcia     Spanish
4      Sara Fernández     Spanish
5     Javier González     Spanish
6         Diego Perez     Spanish
7       Carmen Torres     Spanish
8       Lucia Morales     Spanish
9      Andris Bērziņš     Latvian
10      Laima Ozoliņa     Latvian
11    Jānis Pētersons     Latvian
12       Inese Sīmane     Latvian
13     Mārtiņš Ķirsis     Latvian
14       Zane Bērziņa     Latvian
15     Valdis Salmiņš     Latvian
16       Līga Kalniņa     Latvian
17     Pēteris Auziņš     Latvian
18  Silvija Riekstiņa     Latvian
19    Elena Rodriguez     Spanish
20    Isabel Martinez     Spanish
21     Luis Velazquez     Spanish


### where + in

Let's select the customers who are from Latvia, Spain or the UK:

In [40]:
pysqlcustomers1 = lambda q: sqldf(q, globals())

In [41]:
query5 = """SELECT first_name, last_name, country
    FROM customers
    WHERE country IN ('Latvia' , 'Spain', 'UK')"""

result5 = pysqlcustomers1(query5)
print(result5)

   first_name  last_name country
0        John      Smith      UK
1      Sophia      Brown   Spain
2      Olivia  Rodriguez  Latvia
3   Charlotte    Collins  Latvia
4     Alessia      Ricci   Spain
5     Antonio     Moreno   Spain
6     Luciano      Serra   Spain
7      Matteo      Gallo   Spain
8    Riccardo      Rossi   Spain
9        Luna      Perez   Spain
10    Alessio    Martini   Spain
11        Eva   Lombardi   Spain
12      Elena    Ivanova   Spain
13       Luis     García   Spain
14       Sara      Lopez   Spain
15      Elena    Ivanova   Spain
16       Luis     García   Spain
17       Sara      Lopez   Spain
18  Sebastian     Larsen  Latvia
19       Ance       Soko   Spain
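`IN` is shorthand for a chain of `OR` conditions on the same column. The sketch below, on hypothetical customers in SQLite, checks that both forms return the same rows, and also shows one way to pass the `IN` list from Python as bound parameters rather than pasting values into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("John", "UK"), ("Sophia", "Spain"), ("Olivia", "Latvia"), ("Hans", "Germany")],
)

with_in = conn.execute(
    "SELECT first_name FROM customers "
    "WHERE country IN ('Latvia', 'Spain', 'UK') ORDER BY first_name"
).fetchall()
with_or = conn.execute(
    "SELECT first_name FROM customers "
    "WHERE country = 'Latvia' OR country = 'Spain' OR country = 'UK' "
    "ORDER BY first_name"
).fetchall()

# Build one "?" placeholder per value, then bind the values safely
countries = ["Latvia", "Spain", "UK"]
placeholders = ", ".join("?" for _ in countries)  # "?, ?, ?"
with_params = conn.execute(
    f"SELECT first_name FROM customers WHERE country IN ({placeholders}) "
    "ORDER BY first_name",
    countries,
).fetchall()

print(with_in)  # [('John',), ('Olivia',), ('Sophia',)]
```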


### where + NULL

Let's select the books whose average rating is NULL:

In [42]:
pysqlbooks3 = lambda q: sqldf(q, globals())

In [43]:
query6 = """SELECT title, avarage_rating
    FROM books
    WHERE avarage_rating IS NULL"""

result6 = pysqlbooks3(query6)
print(result6)

                 title avarage_rating
0  Lilly of the Valley           None
1       One nightstand           None
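NULL has to be tested with `IS NULL`: a comparison such as `avarage_rating = NULL` evaluates to unknown for every row, so it silently returns nothing. A minimal check on hypothetical data in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, avarage_rating REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Rated", 4.0), ("Unrated", None)],
)

# "= NULL" compares with an unknown value, so no row ever qualifies
eq_null = conn.execute(
    "SELECT title FROM books WHERE avarage_rating = NULL"
).fetchall()

# IS NULL is the correct test
is_null = conn.execute(
    "SELECT title FROM books WHERE avarage_rating IS NULL"
).fetchall()

print(eq_null, is_null)  # [] [('Unrated',)]
```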


### where + NOT NULL

Let's retrieve the books that do not have a NULL value in avarage_rating:

In [44]:
pysqlbooks4 = lambda q: sqldf(q, globals())

In [45]:
query7 = """SELECT title, avarage_rating
    FROM books
    WHERE avarage_rating IS NOT NULL"""

result7 = pysqlbooks4(query7)
print(result7)

                                 title  avarage_rating
0                The Starry Chronicles             4.5
1      Secrets of the Enchanted Forest             2.8
2            The Lost City of Atlantis             4.7
3       Journey to the Cosmic Frontier             4.4
4         The Quest for Eternal Wisdom             4.3
..                                 ...             ...
516      The Space Adventures of Ziggy             3.0
517      The Daring Adventures of Lucy             3.0
518  The Mystery of the Missing Cookie             3.0
519                     Unicorn Dreams             2.0
520                    The second hope             4.0

[521 rows x 2 columns]


## Using wildcards in the LIKE clause

The `LIKE` keyword selects rows whose values match a character-string pattern. It is used with character data such as CHAR, VARCHAR and TEXT columns. Wildcards let the pattern match fields that contain certain characters in certain positions.

| Symbol | Meaning | Supported by MySQL `LIKE` |
|--------|---------|---------------------------|
|    %   |any string of zero or more characters| Yes |
|   _    |any single character| Yes |
| [  ]     |any single character within the specified range ([a-f], [abcdf])| No (SQL Server extension) |
| [^]    |any single character not within the specified range ([^a-f] or [^abcdf])| No (SQL Server extension) |

MySQL's `LIKE` only understands `%` and `_`. For character ranges, use `REGEXP` instead, for example `last_name REGEXP '^[a-f]'`.
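A short sketch of `%` and `_` on hypothetical last names, run in SQLite. It also shows that a bracket pattern is not a character class there: `'[A-C]%'` is matched literally and finds nothing. One caveat: SQLite's `LIKE` ignores case for ASCII letters, while in MySQL case sensitivity depends on the column's collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (last_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Anderson",), ("Ankrah",), ("Wilson",), ("Tran",), ("Tram",)],
)

def like(pattern):
    """Return the sorted last names that match a LIKE pattern."""
    rows = conn.execute(
        "SELECT last_name FROM customers WHERE last_name LIKE ? ORDER BY last_name",
        (pattern,),
    ).fetchall()
    return [r[0] for r in rows]

starts_an = like("An%")    # prefix match
ends_on = like("%on")      # suffix match
tra_one = like("Tra_")     # exactly one character after "Tra"
brackets = like("[A-C]%")  # "[" is a literal character here, not a range
print(starts_an, ends_on, tra_one, brackets)
```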

Let's select the customers whose last name starts with "An":

In [46]:
pysqlcustomers2 = lambda q: sqldf(q, globals())

In [47]:
query8 = """SELECT customer_id, first_name, last_name
    FROM customers
    WHERE last_name LIKE 'An%'"""

result8 = pysqlcustomers2(query8)
print(result8)

   customer_id first_name last_name
0          128   Patricia  Anderson
1          172    Kwabena    Ankrah
2          232    Adriano   Andrade


Let's select the customers whose last name ends with the letters "on":

In [48]:
pysqlcustomers3 = lambda q: sqldf(q, globals())

query9 = """SELECT first_name, last_name
    FROM customers
    WHERE last_name LIKE '%on'"""

result9 = pysqlcustomers3(query9)
print(result9)

   first_name   last_name
0       Alice     Johnson
1      Olivia      Wilson
2         Mia     Johnson
3       Alice     Johnson
4      Olivia      Wilson
5         Mia     Johnson
6       Ethan     Johnson
7        Mary     Johnson
8     Michael      Wilson
9    Patricia    Anderson
10  Elizabeth     Jackson
11       Emma     Johnson
12      James      Wilson
13       Ella      Wilson
14     Olivia     Johnson
15      Riley    Thompson
16       Lucy      Wilson
17       Emma   Johansson
18      Hanna  Gustavsson


## SELECT statement with ORDER BY clause

The `ORDER BY` clause sorts the rows of the result set. Use `ASC` to sort in ascending order and `DESC` to sort in descending order; when neither is given, `ASC` is the default.
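Each sort key can have its own direction. This sketch, on hypothetical customers in SQLite, sorts by country in descending order and then by last name in the default ascending order. A second key is also what makes the order of tied rows predictable: with `ORDER BY country` alone, customers from the same country may come back in any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (last_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Tran", "Vietnam"), ("Gomez", "Argentina"), ("Nguyen", "Vietnam"), ("Zahran", "Yemen")],
)

rows = conn.execute(
    "SELECT last_name, country FROM customers "
    "ORDER BY country DESC, last_name"  # last_name uses the default ASC
).fetchall()
print(rows)
# [('Zahran', 'Yemen'), ('Nguyen', 'Vietnam'), ('Tran', 'Vietnam'), ('Gomez', 'Argentina')]
```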

In [49]:
pysqlcustomers4 = lambda q: sqldf(q, globals())

query10 = """SELECT first_name, last_name, country
    FROM customers
    ORDER BY country"""

result10 = pysqlcustomers4(query10)
print(result10)

    first_name last_name    country
0        Nadia  Mokhtari    Algeria
1      Eduardo     Gomez  Argentina
2      William     Adams    Armenia
3         Liam     Smith  Australia
4       Olivia   Johnson  Australia
..         ...       ...        ...
305       Minh    Nguyen    Vietnam
306        Mai      Tran    Vietnam
307      Phong    Nguyen    Vietnam
308       Quoc      Pham    Vietnam
309      Nadia    Zahran      Yemen

[310 rows x 3 columns]


In [50]:
pysqlcustomers5 = lambda q: sqldf(q, globals())

query11 = """SELECT first_name, last_name, country
    FROM customers
    ORDER BY country DESC"""

result11 = pysqlcustomers5(query11)
print(result11)

    first_name last_name    country
0        Nadia    Zahran      Yemen
1         Minh    Nguyen    Vietnam
2          Mai      Tran    Vietnam
3        Phong    Nguyen    Vietnam
4         Quoc      Pham    Vietnam
..         ...       ...        ...
305     Sophie    Martin  Australia
306       Lucy    Wilson  Australia
307    William     Adams    Armenia
308    Eduardo     Gomez  Argentina
309      Nadia  Mokhtari    Algeria

[310 rows x 3 columns]


## SELECT statement with GROUP BY clause

The `GROUP BY` clause creates one output row per group and, combined with aggregate functions, produces summary values for each group.

Let's list the payment methods by grouping on payment_method (`GROUP BY 1` refers to the first column in the SELECT list):

In [52]:
pysqlpayments1 = lambda q: sqldf(q, globals())

query12 = """SELECT payment_method
    FROM payments
    GROUP BY 1"""

result12 = pysqlpayments1(query12)
print(result12)

  payment_method
0     App Wallet
1    Credit Card
2         PayPal
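Grouping on a column without any aggregate function returns the same rows as `SELECT DISTINCT` on that column. A quick comparison on hypothetical payments in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (payment_method TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("PayPal", 10.0), ("Credit Card", 25.0), ("PayPal", 5.0), ("App Wallet", 7.5)],
)

grouped = conn.execute(
    "SELECT payment_method FROM payments GROUP BY 1 ORDER BY 1"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT payment_method FROM payments ORDER BY 1"
).fetchall()
print(grouped)  # [('App Wallet',), ('Credit Card',), ('PayPal',)]
```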


### Using COUNT with GROUP BY clause

Let's count how many authors there are per nationality:

In [56]:
pysqlauthors3 = lambda q: sqldf(q, globals())

query13 = """SELECT nationality, count(authors_id) AS quantity
    FROM authors
    GROUP BY 1"""

result13 = pysqlauthors3(query13)
print(result13)

             nationality  quantity
0               American         1
1               Anguilla         1
2    Antigua and Barbuda         1
3              Argentina         1
4            Argentinian         3
..                   ...       ...
102              Uruguay         1
103              Vanuatu         1
104            Venezuela         1
105           Venezuelan         1
106           Vietnamese         1

[107 rows x 2 columns]
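`COUNT(column)` counts only the non-NULL values in that column, while `COUNT(*)` counts rows. The difference only shows up when the counted column can be NULL, as in this sketch with hypothetical authors in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (authors_id INTEGER, nationality TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(1, "Spanish"), (2, "Spanish"), (None, "Spanish"), (3, "Latvian")],
)

rows = conn.execute(
    "SELECT nationality, COUNT(*) AS all_rows, COUNT(authors_id) AS with_id "
    "FROM authors GROUP BY 1 ORDER BY 1"
).fetchall()
print(rows)  # [('Latvian', 1, 1), ('Spanish', 3, 2)]
```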


### Using AVG and SUM with GROUP BY

We can use the `AVG` function to give us the average of any group, and `SUM` to give the total.

Let's write a query that calculates the average and the total payment amount for each distinct payment method, and orders the results by the average payment amount in ascending order:

In [57]:
pysqlpayments2 = lambda q: sqldf(q, globals())

query14 = """SELECT payment_method, ROUND(AVG(amount),2) AS avg_amount, ROUND(SUM(amount),2) AS sum_amount
    FROM payments
    GROUP BY 1
    ORDER BY avg_amount"""

result14 = pysqlpayments2(query14)
print(result14)

  payment_method  avg_amount  sum_amount
0     App Wallet      777.99   515030.72
1         PayPal      793.75   505620.16
2    Credit Card      799.66   569356.76


## Restricting rows with HAVING

The `HAVING` clause filters the groups produced by `GROUP BY`. It behaves like the `WHERE` clause with one key difference: `WHERE` filters individual rows before they are grouped and cannot use aggregate functions, while `HAVING` filters whole groups after aggregation and can.

Let's count the books associated with each author and use `HAVING` to keep only the authors who have more than 4 books:

In [58]:
pysqlauthorsbooks1 = lambda q: sqldf(q, globals())

query15 = """SELECT authors_id, COUNT(book_id) AS book_quantity
    FROM authorsbooks
    GROUP BY 1
    HAVING book_quantity>4"""

result15 = pysqlauthorsbooks1(query15)
print(result15)

    authors_id  book_quantity
0           20              6
1           27              5
2           41              5
3          106              5
4          115              5
5          118              6
6          127              6
7          138              5
8          142              5
9          160              5
10         164              5
11         168              6
12         186              7
13         189              6
14         218              7
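Both clauses can appear in the same query. In this sketch (hypothetical authorsbooks rows in SQLite), `WHERE` first discards individual rows, here the links to book_id 99, and `HAVING` then discards whole groups whose count is too small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authorsbooks (authors_id INTEGER, book_id INTEGER)")
conn.executemany(
    "INSERT INTO authorsbooks VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 99), (2, 12), (2, 99), (3, 13), (3, 14), (3, 15)],
)

rows = conn.execute(
    "SELECT authors_id, COUNT(book_id) AS book_quantity "
    "FROM authorsbooks "
    "WHERE book_id <> 99 "        # row filter, applied before grouping
    "GROUP BY authors_id "
    "HAVING book_quantity > 1 "   # group filter, applied after aggregation
    "ORDER BY authors_id"
).fetchall()
print(rows)  # [(1, 2), (3, 3)]
```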


## INSERT statement

The `INSERT` statement adds rows to a table.

- `INSERT` specifies the table (or updatable view) that the data will be inserted into.
- The optional column list names the columns that the `INSERT` will fill.
- If the column list is omitted, a value must be provided for every column, in the order in which the columns are defined in the table.
- If you include a column list, the columns can be listed in any order, and the values must follow that same order.
- `VALUES` specifies the data to insert; alternatively, the rows can come from a `SELECT` statement. One of the two is required.

```sql
INSERT [INTO] table_name [(column_list)]
{VALUES (value_list) [, (value_list)] ... | select_statement}
```

This example inserts several rows into the authors table:

```sql
INSERT INTO authors (author_name, birth_date, nationality)
VALUES
    ('John Smith', '1980-05-15', 'American'),
    ('Maria García', '1991-12-10', 'Spanish'),
    ('Hiroshi Tanaka', '1975-03-22', 'Japanese'),
 ...
```

### Inserting rows with SELECT statement

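`INSERT` can also take its rows from a `SELECT` statement instead of a `VALUES` list. A common use is to stage data in a small temporary table and then copy it into a permanent one. `INSERT ... SELECT` copies every row that the `SELECT` returns, duplicates included; only the target table's `PRIMARY KEY` and `UNIQUE` constraints can reject a row.

Create the temporary table:

```sql
CREATE TEMPORARY TABLE authors_temp (
    id INT PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    nationality VARCHAR(50)
);
```

Insert its data into the permanent table. Because the two tables have different columns, name the target columns explicitly instead of using `SELECT *`, which only works when both tables have the same number of columns in a compatible order:

```sql
INSERT INTO authors (author_name, nationality)
SELECT CONCAT(first_name, ' ', last_name), nationality
FROM authors_temp;
```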
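The staging pattern can be tried in SQLite with hypothetical rows. SQLite spells a few things differently from MySQL: it has no `AUTO_INCREMENT` keyword (an `INTEGER PRIMARY KEY` column is filled automatically), and older versions lack `CONCAT()`, so the `||` operator joins the name parts. The `UNIQUE` constraint on author_name is a hypothetical addition, used to show that the constraint, not `INSERT ... SELECT` itself, rejects a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Permanent table; UNIQUE(author_name) is a hypothetical constraint for this demo
conn.execute(
    "CREATE TABLE authors ("
    "authors_id INTEGER PRIMARY KEY, author_name TEXT UNIQUE, nationality TEXT)"
)
conn.execute(
    "INSERT INTO authors (author_name, nationality) VALUES ('John Smith', 'American')"
)

# Staging table with a different column layout
conn.execute(
    "CREATE TEMPORARY TABLE authors_temp ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, nationality TEXT)"
)
conn.executemany(
    "INSERT INTO authors_temp (first_name, last_name, nationality) VALUES (?, ?, ?)",
    [("Zane", "Bērziņa", "Latvian"), ("Luis", "Velazquez", "Spanish")],
)

copy_sql = (
    "INSERT INTO authors (author_name, nationality) "
    "SELECT first_name || ' ' || last_name, nationality FROM authors_temp"
)
conn.execute(copy_sql)
total = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]

# Copying the same rows a second time violates UNIQUE(author_name)
try:
    conn.execute(copy_sql)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(total, duplicate_rejected)  # 3 True
```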

## UPDATE statement

The `UPDATE` statement modifies the values of columns in existing rows.

Be careful with the `WHERE` clause! Without it, the change is applied to every row in the table.

In this example, every author's nationality is set to Latvian:

```sql
UPDATE authors
SET nationality = 'Latvian';
```

You are probably looking for something like this:

```sql
UPDATE authors
SET nationality = 'Latvian'
WHERE authors_id = 3;
```

Because authors_id is unique, this UPDATE changes only the nationality of the author with ID 3 instead of every row in the table.
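In Python's `sqlite3` module, `cursor.rowcount` reports how many rows an UPDATE changed, which makes the effect of the `WHERE` clause easy to see. The sketch uses a hypothetical three-author table in SQLite and runs the filtered update first, then the unfiltered one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (authors_id INTEGER PRIMARY KEY, nationality TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(1, "American"), (2, "Spanish"), (3, "Japanese")],
)

# With WHERE on the unique key: exactly one row changes
one = conn.execute(
    "UPDATE authors SET nationality = 'Latvian' WHERE authors_id = ?", (3,)
).rowcount

# Without WHERE: every row is updated. SQLite counts all 3 rows here;
# MySQL's affected-rows value would skip row 3, whose value is already 'Latvian'.
everyone = conn.execute("UPDATE authors SET nationality = 'Latvian'").rowcount

print(one, everyone)  # 1 3
```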

Christmas is coming, so let's apply a 5% discount to all book prices. Here, updating every row is intentional, so no `WHERE` clause is needed:

```sql
UPDATE books
SET price = price * 0.95;
```


### Including subqueries in an UPDATE statement

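Suppose we want to update the avarage_rating of a specific book in the books table, based on the ratings that the reviews table holds for that book (matched on book_id). A subquery can calculate the average rating, and the `UPDATE` stores it in avarage_rating. In the statement below, `@book_id` is a placeholder user variable for the id of the book to update; dropping that final `WHERE` clause would recalculate the rating of every book:

```sql
UPDATE books
SET avarage_rating = (
    SELECT AVG(rating)
    FROM reviews
    WHERE reviews.book_id = books.book_id
)
WHERE book_id = @book_id;  -- placeholder: set @book_id to the target book's id first
```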
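A runnable version of the correlated subquery on hypothetical data in SQLite, without the final `WHERE` so that every book is refreshed. The subquery is evaluated once per updated book row. Note that a book with no reviews ends up with NULL, because `AVG` over zero rows returns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, avarage_rating REAL)")
conn.execute(
    "CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, book_id INTEGER, rating REAL)"
)
conn.executemany("INSERT INTO books VALUES (?, ?)", [(1, 0.0), (2, 0.0), (3, 0.0)])
conn.executemany(
    "INSERT INTO reviews (book_id, rating) VALUES (?, ?)",
    [(1, 4.0), (1, 5.0), (2, 3.0)],  # book 3 has no reviews
)

# The subquery refers to books.book_id, the row currently being updated
conn.execute(
    "UPDATE books SET avarage_rating = ("
    "  SELECT AVG(rating) FROM reviews WHERE reviews.book_id = books.book_id"
    ")"
)
ratings = conn.execute(
    "SELECT book_id, avarage_rating FROM books ORDER BY book_id"
).fetchall()
print(ratings)  # [(1, 4.5), (2, 3.0), (3, None)]
```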


## DELETE statement

The `DELETE` statement removes rows from a table. The basic form names a single table (or updatable view) that holds the rows to delete, and an optional `WHERE` clause limits the deletion to the rows that match it. (MySQL also supports a multiple-table `DELETE` syntax, which is not covered here.)

```sql
DELETE FROM {table_name | view_name}
[WHERE condition];
```

If you omit the `WHERE` clause, all rows in the table are removed; the table itself, with its indexes and constraints, remains in place.

Here are three common forms of `DELETE`:

- Deleting all rows from a table:

```sql
DELETE
FROM authors;
```

- Deleting selected rows:

```sql
DELETE FROM authors
WHERE authors_id = 123;
```

- Deleting rows based on a value in a subquery:

```sql
DELETE FROM payments
WHERE order_id IN
(SELECT order_id FROM orders WHERE status = 'Cancelled');
```
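A sketch of the subquery form with hypothetical orders and payments tables in SQLite (their real columns are not shown in this notebook). Running the same condition in a `SELECT` first is a cheap way to check how many rows a `DELETE` will remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, order_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "Shipped"), (2, "Cancelled"), (3, "Cancelled")],
)
conn.executemany(
    "INSERT INTO payments (order_id, amount) VALUES (?, ?)",
    [(1, 20.0), (2, 15.0), (3, 9.5), (3, 1.5)],
)

condition = (
    "WHERE order_id IN (SELECT order_id FROM orders WHERE status = 'Cancelled')"
)

# Preview: how many payments the DELETE would remove
to_delete = conn.execute(f"SELECT COUNT(*) FROM payments {condition}").fetchone()[0]

deleted = conn.execute(f"DELETE FROM payments {condition}").rowcount
remaining = conn.execute("SELECT order_id FROM payments").fetchall()
print(to_delete, deleted, remaining)  # 3 3 [(1,)]
```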

## BUILT-IN functions

MySQL has many built-in functions that make data manipulation easier. Here I summarize the ones I find most useful in the daily work of someone who uses MySQL:

### MATH functions:

|FUNCTION|USE|
|--------|---|
|ROUND() | Rounds a number to a specified number of decimal places|
|FLOOR() | Returns the largest integer less than or equal to the number|


### STRING functions:

|FUNCTION|USE|
|--------|---|
|CONCAT() | Concatenates two or more strings|
|LENGTH() | Returns the length of a string|
|UPPER()  | Converts a string to uppercase|
|LOWER()  | Converts a string to lowercase|
|SUBSTRING()| Extracts a substring from a string|
|REPLACE() | Replaces occurrences of a substring with another substring|

### DATE & TIME functions:

|FUNCTION|USE|
|--------|---|
|NOW() | Returns the current date and time|
|DATE()| Extracts the date part from a datetime|
|TIME()| Extracts the time part from a datetime|
|DATE_FORMAT()| Formats a datetime value as a string|
|DATEDIFF()| Returns the number of days between two dates|
|DATE_ADD()| Adds a time interval to a date, e.g. `DATE_ADD('2024-01-01', INTERVAL 7 DAY)` |

### AGGREGATE functions:

|FUNCTION|USE|
|--------|---|
|SUM() | Calculates the sum of a set of values|
|AVG()| Calculates the average of a set of values|
|COUNT()| Counts the number of rows or non-null values|
|MAX()| Returns the maximum value in a set of values|
|MIN()|  Returns the minimum value in a set of values|


### CONDITIONAL functions:

|FUNCTION|USE|
|--------|---|
|IF() | Returns one value if a condition is true, and another value if it's false|
|CASE WHEN ... THEN ... END| Evaluates conditions in order and returns the result for the first one that is true (or the `ELSE` value if none is)|
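A small `CASE` example on hypothetical prices, using the standard `CASE WHEN ... THEN ... ELSE ... END` syntax that both MySQL and SQLite accept. Conditions are tested from top to bottom and the first true one wins, so the second branch only sees prices of at least 10. MySQL's `IF(condition, a, b)` is a two-way shortcut for the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("A", 8.0), ("B", 15.0), ("C", 32.0)],
)

rows = conn.execute(
    "SELECT title, "
    "  CASE "
    "    WHEN price < 10 THEN 'budget' "
    "    WHEN price <= 20 THEN 'standard' "  # only reached when price >= 10
    "    ELSE 'premium' "
    "  END AS price_band "
    "FROM books ORDER BY title"
).fetchall()
print(rows)  # [('A', 'budget'), ('B', 'standard'), ('C', 'premium')]
```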

## JOINING tables

Joining two or more tables is the process of comparing the data in specified columns and using the comparison results to form a new table from the rows that qualify.

A join statement:
- specifies a column from each table
- compares the values in those columns row by row
- combines rows with qualifying values into a new row

Although the comparison is usually for equality – values that match exactly – other types of joins can also be specified.

### INNER join

An inner join connects two tables on a column with the same data type. Only the rows where the column values match are returned; unmatched rows are discarded.

This query retrieves author names and the titles of books they have authored, and it will only return records where there are matching entries in all three tables:

In [77]:
query_combined = """SELECT a.author_name, b.title
FROM authors a 
INNER JOIN authorsbooks ab
ON a.authors_id = ab.authors_id
INNER JOIN books b
ON b.book_id = ab.book_id
ORDER BY 1"""

result_combined = sqldf(query_combined, globals())
print(result_combined)

       author_name                           title
0    Adriana Perez          The Ultimate BBQ Guide
1    Adriana Perez                       The Torah
2        Ahmed Ali  Infinite Realms of Imagination
3        Ahmed Ali              Echoes of Eternity
4        Ahmed Ali               The Secret Garden
..             ...                             ...
536  Yusuf Erdogan           Equestrian Excellence
537   Zane Bērziņa            Metaphysical Musings
538   Zane Bērziņa             The Haunting Melody
539   Zane Bērziņa             Lilly of the Valley
540     Zoe Turner          The Mysterious Mansion

[541 rows x 2 columns]
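The three-table inner join can be reproduced on a few hypothetical rows in SQLite (the names are borrowed from the output above, the links are made up). "No Books Yet" has no entry in authorsbooks, so the inner join drops that author entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (authors_id INTEGER PRIMARY KEY, author_name TEXT);
CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authorsbooks (authors_id INTEGER, book_id INTEGER);
INSERT INTO authors VALUES (1, 'Ahmed Ali'), (2, 'Zoe Turner'), (3, 'No Books Yet');
INSERT INTO books VALUES
    (10, 'Echoes of Eternity'), (11, 'The Secret Garden'), (12, 'The Mysterious Mansion');
INSERT INTO authorsbooks VALUES (1, 10), (1, 11), (2, 12);
""")

rows = conn.execute("""
SELECT a.author_name, b.title
FROM authors a
INNER JOIN authorsbooks ab ON a.authors_id = ab.authors_id
INNER JOIN books b ON b.book_id = ab.book_id
ORDER BY 1, 2
""").fetchall()
print(rows)
```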


### LEFT join

A `LEFT JOIN` returns every row from the left table. Rows that have no match in the right table are still included, and the right table's output columns are set to NULL for them.

The following LEFT JOIN lists every author together with the ids and titles of their books. An author with no associated book would still appear once, with NULL in book_id and title. In my database the authors have at least one book associated, but the output suggests one exception: it has 542 rows, one more than the INNER JOIN above, and pandas prints book_id as a float, which happens when a column contains NULL (shown as NaN). So one row found no matching book.

In [78]:
query_combined2 = """SELECT a.author_name, b.book_id, b.title
FROM authors a 
LEFT JOIN authorsbooks ab
ON a.authors_id = ab.authors_id
LEFT JOIN books b
ON b.book_id = ab.book_id"""

result_combined2 = sqldf(query_combined2, globals())
print(result_combined2)

        author_name  book_id                                            title
0        John Smith      1.0                            The Starry Chronicles
1        John Smith    256.0                                   Chasing Dreams
2      Maria García      2.0                  Secrets of the Enchanted Forest
3      Maria García    398.0                               The Book of Mormon
4      Maria García    407.0                              The Gospel of Judas
..              ...      ...                                              ...
537      Sofia Soto    388.0  Business Growth Strategies for the 21st Century
538      Sofia Soto    433.0                            An Inconvenient Truth
539      Sofia Soto    455.0                                         Collapse
540  Luis Velazquez    113.0                              The Quest for Truth
541  Remedios Vegas    527.0                                  The second hope

[542 rows x 3 columns]
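On the same kind of hypothetical data as an inner join, a LEFT JOIN keeps the author who has no books and fills that author's book columns with NULL (`None` in Python), so it returns one more row. A sketch in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (authors_id INTEGER PRIMARY KEY, author_name TEXT);
CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authorsbooks (authors_id INTEGER, book_id INTEGER);
INSERT INTO authors VALUES (1, 'Ahmed Ali'), (2, 'Zoe Turner'), (3, 'No Books Yet');
INSERT INTO books VALUES
    (10, 'Echoes of Eternity'), (11, 'The Secret Garden'), (12, 'The Mysterious Mansion');
INSERT INTO authorsbooks VALUES (1, 10), (1, 11), (2, 12);
""")

joins = """
SELECT a.author_name, b.book_id, b.title
FROM authors a
{kind} JOIN authorsbooks ab ON a.authors_id = ab.authors_id
{kind} JOIN books b ON b.book_id = ab.book_id
ORDER BY a.authors_id, b.book_id
"""
inner_rows = conn.execute(joins.format(kind="INNER")).fetchall()
left_rows = conn.execute(joins.format(kind="LEFT")).fetchall()

print(len(inner_rows), len(left_rows))  # 3 4
print(left_rows[-1])                    # ('No Books Yet', None, None)
```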


### RIGHT join

A `RIGHT JOIN` is the mirror image of a LEFT JOIN: it returns every row from the right table, and for rows with no match in the left table, the left table's output columns are set to NULL.

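Let's use a RIGHT JOIN to show all the books that don't have reviews. Every book is kept, and the `WHERE review_id IS NULL` filter then keeps only the books that found no matching review: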

In [79]:
query_combined3 = """SELECT b.book_id, r.review_id
FROM reviews r 
RIGHT JOIN books b
ON b.book_id = r.book_id
WHERE review_id IS NULL"""

result_combined3 = sqldf(query_combined3, globals())
print(result_combined3)

     book_id review_id
0          1      None
1          4      None
2          8      None
3          9      None
4         12      None
..       ...       ...
314      516      None
315      517      None
316      520      None
317      522      None
318      523      None

[319 rows x 2 columns]
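Any `RIGHT JOIN` can be rewritten as a `LEFT JOIN` by swapping the two tables, which also runs on SQLite versions older than 3.39.0, where `RIGHT JOIN` is not supported. `NOT EXISTS` is another common way to ask for "books without reviews". The sketch checks that both forms agree on hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, book_id INTEGER);
INSERT INTO books VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO reviews VALUES (100, 2), (101, 2), (102, 3);
""")

# The RIGHT JOIN from above, rewritten with books on the left
left_anti = conn.execute("""
SELECT b.book_id
FROM books b
LEFT JOIN reviews r ON b.book_id = r.book_id
WHERE r.review_id IS NULL
ORDER BY b.book_id
""").fetchall()

# Same question with NOT EXISTS
not_exists = conn.execute("""
SELECT b.book_id
FROM books b
WHERE NOT EXISTS (SELECT 1 FROM reviews r WHERE r.book_id = b.book_id)
ORDER BY b.book_id
""").fetchall()

print(left_anti)  # [(1,), (4,)]
```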
