# 2. Population of database tables

First of all, this database is entirely fictional: I made up all of the data myself.


I populated the database tables in three main ways:

1. By using a simple `INSERT INTO ... VALUES` statement
2. By using a complex `INSERT INTO ... SELECT` statement
3. By using Python code in VS Code

## 1. Simple `INSERT INTO` statement

I used a plain `INSERT INTO ... VALUES` statement for the independent tables, such as Books, Authors, Genres and Customers.

**Books table:**

```sql
INSERT INTO books (title, publication_date, ISBN, stock_quantity, avarage_rating)
VALUES
  ('The Starry Chronicles', '2022-03-15', '9781234567890', 50, 4.5),
  ('Secrets of the Enchanted Forest', '2022-05-20', '9782345678901', 40, 4.2),
  ('The Lost City of Atlantis', '2022-04-10', '9783456789012', 30, 4.7), 
  ...
```


**Authors table:**

```sql
INSERT INTO authors (author_name, birth_date, nationality)
VALUES
    ('John Smith', '1980-05-15', 'American'),
    ('Maria García', '1991-12-10', 'Spanish'),
    ('Hiroshi Tanaka', '1975-03-22', 'Japanese'),
    ...
```

**Customers table:**

```sql
INSERT INTO customers (first_name, last_name, email, password, phone, address, membership_status, registration_date, last_login, country)
VALUES
('John', 'Smith', 'john.smith@example.com', 'password1', '123-456-7890', '123 Main St, London, UK', 'Standard', '2000-05-15', '2023-10-16', 'UK'),
('Alice', 'Johnson', 'alice.johnson@example.com', 'password2', '234-567-8901', '456 Elm St, Paris, France', 'Premium', '2005-10-20', '2023-10-15', 'France'),
('Michael', 'Miller', 'michael.miller@example.com', 'password3', '345-678-9012', '789 Oak St, Berlin, Germany', 'VIP', '2010-07-07', '2023-10-14', 'Germany'),
...
```

**Genres table:**

```sql
INSERT INTO genres (genre_name)
VALUES
    ('Science Fiction'),
    ('Fantasy'),
    ('Mystery'),
    ...
```

## 2. Complex `INSERT INTO ... SELECT` statement

I used this form to populate the Payments table:

```sql
INSERT INTO payments (order_id, payment_date, payment_method, amount, status)
SELECT
  o.order_id,
  DATE(NOW() - INTERVAL FLOOR(RAND() * 1096) DAY) AS payment_date,
  CASE FLOOR(RAND() * 3)
    WHEN 0 THEN 'Credit Card'
    WHEN 1 THEN 'PayPal'
    WHEN 2 THEN 'App Wallet'
  END AS payment_method,
  SUM(oi.subtotal) AS amount,
  CASE FLOOR(RAND() * 4)
    WHEN 0 THEN 'Pending'
    WHEN 1 THEN 'Completed'
    WHEN 2 THEN 'Failed'
    WHEN 3 THEN 'Refunded'
  END AS status
FROM orders AS o
JOIN orderitems AS oi ON o.order_id = oi.order_id
GROUP BY o.order_id;
```

The data for this table is derived from the Orders and Orderitems tables, which are joined on the `order_id` column. The `GROUP BY` clause then groups the rows by `order_id` from the Orders table, so each order produces one payment row.

`DATE(NOW() - INTERVAL FLOOR(RAND() * 1096) DAY)` generates a random date within the last 3 years.

`CASE FLOOR(RAND() * 3)` generates a random payment method.

`SUM(oi.subtotal)` calculates the payment amount as the sum of item subtotals.

`CASE FLOOR(RAND() * 4)` generates a random payment status.
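To make the four expressions above easier to follow, here is a minimal Python sketch of the same logic. It is only an illustration, not the code that populated the table: `build_payments`, the constant names and the sample rows are all hypothetical, and the input is assumed to be plain `(order_id, subtotal)` pairs like the rows of Orderitems.

```python
import random
from collections import defaultdict
from datetime import date, timedelta

PAYMENT_METHODS = ["Credit Card", "PayPal", "App Wallet"]
PAYMENT_STATUSES = ["Pending", "Completed", "Failed", "Refunded"]


def build_payments(order_items, today, rng):
    """Mirror the INSERT ... SELECT above for (order_id, subtotal) pairs."""
    # SUM(oi.subtotal) ... GROUP BY o.order_id
    totals = defaultdict(float)
    for order_id, subtotal in order_items:
        totals[order_id] += subtotal

    payments = []
    for order_id, amount in sorted(totals.items()):
        # FLOOR(RAND() * 1096) is a whole number of days from 0 to 1095
        payment_date = today - timedelta(days=rng.randrange(1096))
        # CASE FLOOR(RAND() * n) picks each of the n labels with equal probability
        method = rng.choice(PAYMENT_METHODS)
        status = rng.choice(PAYMENT_STATUSES)
        payments.append((order_id, payment_date, method, round(amount, 2), status))
    return payments


# Hypothetical sample rows: order 1 has two items, order 2 has one
sample_items = [(1, 10.00), (1, 5.50), (2, 20.00)]
for payment in build_payments(sample_items, date(2023, 10, 16), random.Random(0)):
    print(payment)
```

Because the SQL uses an inner `JOIN`, an order without any rows in Orderitems gets no payment. The sketch behaves the same way, since it only ever sees item rows.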

## 3. Using Python

For the dependent tables (Orderitems, Loyaltypoints, Authorsbooks, Genresbooks, Orders and Reviews) I decided to generate the data with Python. I am also studying Python, so this was a chance to put that new knowledge to use, with the `random` module producing the records for these tables.

### Generating `INSERT INTO` statements with Python

The code that populates Authorsbooks and Genresbooks follows the same idea. Both are junction tables that result from a many-to-many (M:M) relationship: each one consists of two foreign keys that together act as a composite primary key. To build them correctly I had to use the actual primary key values from the Authors and Books tables in one case, and from the Genres and Books tables in the other:

**Authorsbooks table:**

```python
import random 

# Generate a list of unique book IDs
book_ids = list(range(1, 522))  # Book IDs from 1 to 521

# Create a list of authors with unique book IDs
associations = [(authors_id, book_id) for authors_id, book_id in zip(range(1, 219), book_ids[:218])]

# Randomly assign authors to the remaining book IDs
for book_id in book_ids[218:]:
    authors_id = random.randint(1, 218)
    associations.append((authors_id, book_id))

# Create SQL INSERT INTO statements
sql_statements = []
for authors_id, book_id in associations:
    sql = f"INSERT INTO authorsbooks (authors_id, book_id) VALUES ({authors_id}, {book_id});"
    sql_statements.append(sql)

# Print the SQL statements
for sql in sql_statements:
    print(sql)
```

Result:
```sql
INSERT INTO authorsbooks (authors_id, book_id) VALUES (1, 1);
INSERT INTO authorsbooks (authors_id, book_id) VALUES (2, 2);  
INSERT INTO authorsbooks (authors_id, book_id) VALUES (3, 3); 
...
```

**Genresbooks table:**

```python
import random

# Generate a list of unique book IDs
book_ids = list(range(1, 522))  # Book IDs from 1 to 521

# Create a list of genres with unique book IDs
associations = [(genre_id, book_id) for genre_id, book_id in zip(range(1, 49), book_ids[:48])]

# Randomly assign genres to the remaining book IDs
for book_id in book_ids[48:]:
    genre_id = random.randint(1, 48)
    associations.append((genre_id, book_id))

# Create SQL INSERT INTO statements
sql_statements = []
for genre_id, book_id in associations:
    sql = f"INSERT INTO genresbooks (genre_id, book_id) VALUES ({genre_id}, {book_id});"
    sql_statements.append(sql)

# Print the SQL statements
for sql in sql_statements:
    print(sql)
```

Result: 
```sql
INSERT INTO genresbooks (genre_id, book_id) VALUES (1, 1);
INSERT INTO genresbooks (genre_id, book_id) VALUES (2, 2);  
INSERT INTO genresbooks (genre_id, book_id) VALUES (3, 3);
...
```
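Since the two scripts differ only in the table name, the column name and the number of parent rows, they could be folded into one helper. The following is a minimal sketch under the same assumptions as the scripts above (IDs run from 1 to the row count without gaps, and there are at least as many books as authors or genres). The names `build_associations` and `to_insert` are hypothetical, and the output is a single multi-row `INSERT` rather than one statement per row.

```python
import random


def build_associations(parent_count, book_count, rng=random):
    """Return (parent_id, book_id) pairs: one per book, every parent used at least once."""
    if parent_count > book_count:
        raise ValueError("needs at least as many books as parents")
    # The first parent_count books are paired one-to-one with the parents
    pairs = [(i, i) for i in range(1, parent_count + 1)]
    # Every remaining book gets a randomly chosen parent
    pairs += [
        (rng.randint(1, parent_count), book_id)
        for book_id in range(parent_count + 1, book_count + 1)
    ]
    return pairs


def to_insert(table, parent_column, pairs):
    """Format the pairs as one multi-row INSERT statement."""
    values = ",\n".join(f"  ({parent_id}, {book_id})" for parent_id, book_id in pairs)
    return f"INSERT INTO {table} ({parent_column}, book_id)\nVALUES\n{values};"


# Small demo with 3 genres and 5 books; the real tables would use 48 and 521
print(to_insert("genresbooks", "genre_id", build_associations(3, 5)))
```

Each book appears exactly once, so no `(parent_id, book_id)` pair can repeat and the composite primary key is never violated.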

**Loyalty points table**  
For this table I had to use the existing `customer_id` values from the Customers table and generate the rest of the data needed to populate it.


```python
from random import randint
from datetime import datetime, timedelta

# Number of customers
num_customers = 309

# Function to generate a random date between 'start' and 'end'
def random_date(start, end):
    return start + timedelta(seconds=randint(0, int((end - start).total_seconds())))

# Print SQL INSERT statements
for customer_id in range(1, num_customers + 1):
    # Generate random data
    points_earned = randint(1, 1000)
    transaction_date = random_date(datetime(2021, 1, 1), datetime.now())

    # SQL INSERT statement
    insert_query = f"INSERT INTO loyaltypoints (customer_id, points_earned, transaction_date) VALUES ({customer_id}, {points_earned}, '{transaction_date.strftime('%Y-%m-%d %H:%M:%S')}');"
    
    print(insert_query)
```


Result:
```sql
INSERT INTO loyaltypoints (customer_id, points_earned, transaction_date) VALUES (1, 74, '2023-07-29 08:03:40');
INSERT INTO loyaltypoints (customer_id, points_earned, transaction_date) VALUES (2, 258, '2021-12-02 04:34:58'); 
INSERT INTO loyaltypoints (customer_id, points_earned, transaction_date) VALUES (3, 45, '2023-02-28 09:48:39');  
...
```
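The script above produces different rows on every run, and `datetime.now()` moves the upper date bound each time. If the same data set ever needs to be regenerated, one option is a seeded generator with an explicit end date. This is a minimal sketch of that idea; the function name `loyalty_rows`, the seed value and the end date are assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta


def loyalty_rows(num_customers, start, end, seed):
    """Return (customer_id, points_earned, transaction_date) rows, repeatable per seed."""
    rng = random.Random(seed)  # private generator, so the seed fully decides the output
    span_seconds = int((end - start).total_seconds())
    rows = []
    for customer_id in range(1, num_customers + 1):
        points_earned = rng.randint(1, 1000)
        transaction_date = start + timedelta(seconds=rng.randint(0, span_seconds))
        rows.append((customer_id, points_earned, transaction_date))
    return rows


# Hypothetical seed and end date; the same arguments always give the same rows
for customer_id, points, when in loyalty_rows(3, datetime(2021, 1, 1), datetime(2023, 10, 16), seed=2023):
    print(
        "INSERT INTO loyaltypoints (customer_id, points_earned, transaction_date) "
        f"VALUES ({customer_id}, {points}, '{when:%Y-%m-%d %H:%M:%S}');"
    )
```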

**Reviews table**  
In the Reviews table I had to take into account the existing `book_id` and `customer_id` values, as both are foreign keys and only real customers can leave reviews. Not every book needs a review and not every customer has to write one, so I generated a total of 250 reviews just to have some data in the table.

```python
import random
from datetime import date, timedelta

# The number of reviews to generate
num_reviews = 250

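# Double any single quotes so the text is safe inside a SQL string literal.
# Note: None becomes the text NULL, which the print below wraps in quotes,
# so the row stores the string 'NULL' rather than a real SQL NULL.
def escape_quotes(text):
    if text is not None:
        return text.replace("'", "''")
    else:
        return 'NULL'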

# Generate and print random reviews
for _ in range(num_reviews):
    customer_id = random.randint(1, 309)  # Number of customers
    book_id = random.randint(1, 521)      # Number of books
    rating = random.randint(1, 10)
    comments = None if random.random() < 0.2 else f"Review for Book {book_id}"
    review_date = date.today() - timedelta(days=random.randint(1, 365))

    print(f"INSERT INTO reviews (customer_id, book_id, rating, comments, review_date) VALUES "
          f"({customer_id}, {book_id}, {rating}, '{escape_quotes(comments)}', '{review_date}');")
```

Result:
```sql
INSERT INTO reviews (customer_id, book_id, rating, comments, review_date) VALUES (11, 350, 2, 'NULL', '2022-12-14');
INSERT INTO reviews (customer_id, book_id, rating, comments, review_date) VALUES (56, 506, 8, 'NULL', '2023-01-30');
INSERT INTO reviews (customer_id, book_id, rating, comments, review_date) VALUES (195, 141, 6, 'Review for Book 141', '2023-10-09');
...
```
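As the result shows, a missing comment ends up as the quoted string `'NULL'`, not as a real SQL `NULL`. If a true `NULL` is wanted, the quoting has to happen per value. Here is a minimal sketch of one way to do that. `sql_literal` and `review_insert` are hypothetical helpers, the second sample comment is made up, and the escaping only doubles single quotes, just like the original script, so it is meant for generated test data rather than for untrusted input.

```python
from datetime import date


def sql_literal(value):
    """Render a Python value as SQL literal text (None, numbers, dates and strings only)."""
    if value is None:
        return "NULL"  # unquoted, so the column stores a real NULL
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, date):
        return f"'{value.isoformat()}'"
    return "'" + str(value).replace("'", "''") + "'"


def review_insert(customer_id, book_id, rating, comments, review_date):
    """Build one INSERT statement for the reviews table."""
    values = ", ".join(
        sql_literal(v) for v in (customer_id, book_id, rating, comments, review_date)
    )
    return (
        "INSERT INTO reviews (customer_id, book_id, rating, comments, review_date) "
        f"VALUES ({values});"
    )


print(review_insert(11, 350, 2, None, date(2022, 12, 14)))
print(review_insert(195, 141, 6, "Couldn't put it down", date(2023, 10, 9)))
```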

### Inserting data directly in database with Python

For Orderitems and Orders I skipped the intermediate SQL statements: the Python scripts, run from VS Code, connect to the database and insert the rows directly.

**Orders table**  
In the Orders table I had to use existing `customer_id` values from the Customers table and generate fictional orders, each with one of four predefined statuses: Shipped, Delivered, Pending or Cancelled.

```python
import random
from datetime import date, timedelta
import mysql.connector

# Connect to your MySQL database
conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="bookshopnearme"
)
cursor = conn.cursor()

# The number of records to generate
num_records = 1000

# A list of order statuses
order_statuses = ["Pending", "Shipped", "Cancelled", "Delivered"]

# Generate and insert random data
for i in range(num_records):
    customer_id = random.randint(1, 309)  # Assuming customer IDs start from 1
    order_date = date.today() - timedelta(days=random.randint(1, 365))  # Orders from the last 365 days
    order_status = random.choice(order_statuses)

    # SQL query to insert data into the table
    query = "INSERT INTO orders (customer_id, order_date, status) VALUES (%s, %s, %s)"
    values = (customer_id, order_date, order_status)

    cursor.execute(query, values)

# Commit the changes and close the connection
conn.commit()
conn.close()
```

Here I noticed that, because the statuses were picked at random, too many orders ended up with the Pending, Cancelled or Shipped status, which would be a sign of a badly managed business. So I wrote an additional script that generates more orders, all of them with the Delivered status. This one prints `INSERT INTO` statements instead of inserting the rows directly:

```python
import random
from datetime import date, timedelta

num_rows = 1300  # Number of rows to generate
today = date.today()

# Generate and print INSERT INTO queries
for i in range(1, num_rows + 1):
    customer_id = random.randint(1,309)  # Customer IDs from 1 to 309
    status = 'Delivered'
    random_days = random.randint(1, 800)
    order_date = today - timedelta(days=random_days)
    
    insert_query = f"INSERT INTO orders (customer_id, order_date, status) VALUES ({customer_id}, '{order_date}', '{status}');"
    print(insert_query)
```

Result:
```sql
INSERT INTO orders (customer_id, order_date, status) VALUES (112, '2023-10-01', 'Delivered');
INSERT INTO orders (customer_id, order_date, status) VALUES (222, '2023-01-18', 'Delivered');
INSERT INTO orders (customer_id, order_date, status) VALUES (74, '2021-08-08', 'Delivered');
...
```
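An alternative to topping up the table with a second batch is to weight the statuses when they are first picked. The sketch below uses `random.choices` with weights for that. The weights in `STATUS_WEIGHTS` are invented for illustration, not taken from the project, and `random_orders` is a hypothetical name.

```python
import random
from datetime import date, timedelta

# Hypothetical weights: most orders delivered, only a few cancelled
STATUS_WEIGHTS = {"Delivered": 70, "Shipped": 15, "Pending": 10, "Cancelled": 5}


def random_orders(count, num_customers, today, max_age_days, rng):
    """Return (customer_id, order_date, status) rows with weighted statuses."""
    statuses = rng.choices(
        list(STATUS_WEIGHTS), weights=list(STATUS_WEIGHTS.values()), k=count
    )
    return [
        (
            rng.randint(1, num_customers),
            today - timedelta(days=rng.randint(1, max_age_days)),
            status,
        )
        for status in statuses
    ]


for customer_id, order_date, status in random_orders(3, 309, date.today(), 365, random.Random()):
    print(
        "INSERT INTO orders (customer_id, order_date, status) "
        f"VALUES ({customer_id}, '{order_date}', '{status}');"
    )
```

With weights like these a single run gives a mostly Delivered table, so the share of each status can be tuned in one place.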

**Orderitems table**  
In the Orderitems table I had to take into account two foreign keys: `book_id` and `order_id`.

```python
import random
import pymysql  

# Establish a connection to the MySQL database
connection = pymysql.connect(
    host="localhost",
    user="root",
    password="",
    db="bookshopnearme",
    cursorclass=pymysql.cursors.DictCursor
)

try:
    num_rows = 3010  # Number of rows to generate
    cursor = connection.cursor()

    for _ in range(num_rows):
        order_id = random.randint(1, 2013)
        book_id = random.randint(1, 521)
        quantity = random.randint(1, 10)
        price_per_book = round(random.uniform(1.0, 90.0), 2)
        subtotal = round(quantity * price_per_book, 2)
    
        # Parameterized query: the driver takes care of quoting the values
        insert_query = "INSERT INTO orderitems (order_id, book_id, quantity, subtotal) VALUES (%s, %s, %s, %s)"
        cursor.execute(insert_query, (order_id, book_id, quantity, subtotal))

    connection.commit()

finally:
    connection.close()
```
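For a few thousand rows the loop above is fine, but the generation and the insertion can also be separated, which makes the rows easy to inspect before anything touches the database. This is a minimal sketch of that split. `generate_order_items` and `insert_order_items` are hypothetical names, and `connection` is assumed to be an already open DB-API connection such as the `pymysql` one above.

```python
import random

INSERT_SQL = (
    "INSERT INTO orderitems (order_id, book_id, quantity, subtotal) "
    "VALUES (%s, %s, %s, %s)"
)


def generate_order_items(count, max_order_id, max_book_id, rng):
    """Return (order_id, book_id, quantity, subtotal) rows without touching the database."""
    rows = []
    for _ in range(count):
        quantity = rng.randint(1, 10)
        price_per_book = round(rng.uniform(1.0, 90.0), 2)
        rows.append((
            rng.randint(1, max_order_id),
            rng.randint(1, max_book_id),
            quantity,
            round(quantity * price_per_book, 2),
        ))
    return rows


def insert_order_items(connection, rows):
    """Insert all rows in one batch; roll back if anything fails."""
    cursor = connection.cursor()
    try:
        cursor.executemany(INSERT_SQL, rows)
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()


# Preview a few generated rows; no database connection is needed for this part
for row in generate_order_items(3, 2013, 521, random.Random()):
    print(row)
```

With an open connection, `insert_order_items(connection, generate_order_items(3010, 2013, 521, random.Random()))` would insert the same number of rows as the script above.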