**Part 1: Some bikes are classified as hard tail bikes. This means they have no rear shock. For every hard tail sold, the shock column is missing data. Write a query to display 'No shock' in any rows of the shock name column in the products table where there is no shock name listed. If your query does not run, be sure to check the column for any white space. You may need to nest two different data preparation functions to complete this step.**

In [310]:
UPDATE dsci_504.components
SET comp_name = 'No shock' 
WHERE TRIM(comp_name) IS NULL;

**Write an UPDATE to the component table to make the changes above. Once complete, run a query to verify your changes were successfull. Limit the query to 100 rows.**

In [311]:
select *
FROM dsci_504.components
WHERE comp_name = 'No shock'
LIMIT 100;

comp_id,comp_name,comp_cost,comp_supplier,comp_cat
172,No shock,0.0,75,shock


**Now that we have re-named the component name column, we need to make sure the component cost and supplier columns have been updated as well. Update both columns to show $0.00 and 'No Part' respectively. To ensure no null values in the table, you may need to create a supplier entry in the supplier table to point the component supplier column to.**

In [312]:
iNSERT INTO Suppliers (sup_id, sup_name, sup_ctry) 
VALUES (0, 'No Part','');
UPDATE dsci_504.Components 
SET comp_cost = 0.00, 
    comp_supplier = 'No Part'
WHERE comp_supplier IS NULL;

: duplicate key value violates unique constraint "suppliers_pkey"

**Part 2: Now that you have fixed the component and supplier tables, find all sales for hardtail bikes. Include the following columns in your output:**

- **Product Name**
- **Build Name**
- **Sum of the Order Total**

**Be sure to create a rollup of the results by product name and build name**

In [225]:
SELECT 
    p.prod_name AS "Product Name", 
    b.build_name AS "Build Name",
     p.prod_description, 
    SUM(o.order_tot) AS "Sum of the Order Total"
FROM 
    dsci_504.Products p
JOIN 
    dsci_504.productbuilds pb ON p.prod_id = pb.prod_id
JOIN 
    dsci_504.builds b ON pb.build_id = b.build_id
JOIN 
    dsci_504.Orders o ON o.prod_id = p.prod_id
WHERE 
    p.prod_description = 'Hardtail Mountain Bike'
GROUP BY 
    ROLLUP (p.prod_name, b.build_name,p.prod_description);

Product Name,Build Name,prod_description,Sum of the Order Total
,,,366109.01
Chameleon,extreme lines,Hardtail Mountain Bike,71412.49
Air9,trail boss,Hardtail Mountain Bike,81477.28
Scout,galaxy,Hardtail Mountain Bike,99342.09
DV9,max air,Hardtail Mountain Bike,113877.15
Air9,trail boss,,81477.28
Scout,galaxy,,99342.09
Chameleon,extreme lines,,71412.49
DV9,max air,,113877.15
Scout,,,99342.09


**Q: What is the total of sales for all hardtail mountain bikes?**

```
A: 366109.01

```

**Q: What was the highest selling bike and build combination?**

```
A: 113877.15

```

**Q: Explain the significance of the NULL values lines in the output**

```
A: Because we use ROLLUP and the NULL values provide a way of identifying the overall data grouped by specific categories.

```

**Part 3: You have been asked to provide some data for a side project. Due to privacy reasons, you have been asked to remove certain customer join dates. Create a new table named opc\_export in the public schema with the following requested fields:**

```
Customer ID* // Customer Last Name // Customer First Name // Customer Join Date // Customer Appreciation Code // Total Order Quantity // Total Order Value

```

**You can name the columns whatever you want**

In [226]:
--CREATE TABLE  dsci_504.opc_export AS
SELECT 
    cus_id AS customer_id,
    cus_last_name AS customer_last_name,
    cus_first_name AS customer_first_name,
    CASE 
        WHEN cus_join_date NOT BETWEEN '11/22/2020' AND '6/16/2020' THEN cus_join_date
        ELSE NULL
    END AS customer_join_date,
    cus_app_cd AS customer_appreciation_code,
    tot_ord_qty AS total_order_quantity,
    tot_ord_value AS total_order_value
FROM 
    dsci_504.Customers;

customer_id,customer_last_name,customer_first_name,customer_join_date,customer_appreciation_code,total_order_quantity,total_order_value
1355,Rice,Margo,2009-01-20,7,67,33487.0
1703,Moore,Brandt,2002-06-30,4,76,11917.0
5,Conley,Airn,2004-04-29,5,17,25475.0
471,Smith,Taylor,2016-08-22,1,48,31106.0
627,Presley,Erin,2018-11-26,1,64,21532.0
1255,Cooper,Carl,2016-03-27,1,97,16208.0
2283,Tankersly,Joshua,2005-05-06,4,8,14859.0
2241,Jamieson,Noel,2012-11-24,2,52,24171.0
2617,Mathiasen,Ulrich,2020-01-27,18,45,27099.12
21,Jones,Piper,2019-02-18,1,23,32288.0


**Transfer the requested data from the customer table in the DSCI\_504 schema to the opc\_export table in the public schema. Only provide data AFTER January 1, 2020. Mask the following dates with NULL values:**

```
2020-12-12

```

**You may want to run a query prior to transfering the data to quality check your work.**

In [117]:
--INSERT INTO dsci_504.opc_export AS
SELECT 
    Customers .cus_id,
    Customers .cus_last_name, 
    Customers .cus_first_name, 
    CASE 
        WHEN cus_join_date IN ('2020-01-01', '2020-02-02', '2020-03-03') THEN NULL
        ELSE cus_join_date
    END, 
    cus_app_cd, 
    tot_ord_qty, 
    tot_ord_value 
FROM DSCI_504.Customers 
WHERE cus_join_date > '2020-01-01';

cus_id,cus_last_name,cus_first_name,cus_join_date,cus_app_cd,tot_ord_qty,tot_ord_value
2617,Mathiasen,Ulrich,2020-01-27,18,45,27099.12
2618,Kemp,Susanne,2020-12-11,23,13,3074.41
2619,Daniel,Ellie,2020-06-16,9,2,17557.2
2600,Holzer,Yeni,2020-11-22,5,60,10641.48
2601,Hull,Tancredo,2020-04-19,9,49,24585.61
2602,Sokol,Teodor,2020-04-15,14,48,8514.29
2603,Barnes,Theo,2020-09-14,18,76,19173.06
2604,Tempest,Simone,2020-06-24,11,45,27325.72
2605,Bullock,Pankaj,2020-03-20,3,11,1214.88
2606,Ware,Reina,2020-09-21,6,64,6718.85


**Run a query to return all records with a customer join date of December 12, 2020**

In [234]:
select  * 
FROM dsci_504.opc_export
WHERE customer_join_date = '2020-12-12';

customer_id,customer_last_name,customer_first_name,customer_join_date,customer_appreciation_code,total_order_quantity,total_order_value
2609,Bohme,Bohuslav,2020-12-12,19,96,15178.38


**Q: What happened to the records of the individuals who joined on 2020-12-12? Were they masked? Why do youthink the SELECT and INSERT statements behaved differently? Provide your best explanation for what happened.**

```
A: The reason the SELECT and INSERT statements may appear to behave differently could be related to the data in the "DSCI_504.Customers" table. Records with a '2020-12-12' date are in the "cus_join_date" column. They should be selected as they are without modification by the CASE statement and then inserted into the "dsci_504.opc_export" table. 

```

**Part 4: Tax laws update each year. While previous orders typically will keep the last tax rate for audit purposes, OPC utilizes the previous order structure to copy into the new order when placed. This happens in middleware so there is no additional tax on the database. Modify the tax rate column in the tax table to reflect the new tax rates for the 2022 tax year for each state.**

**You may need to do some summary analysis on the table to identify the most appropriate action to take. Be sure to test all queries where you will be overwriting table data before executing. Show all queries.**

**Use this resource for all of your tax rates:**

[2022 Sales Tax Rates](https://taxfoundation.org/2022-sales-taxes/)

In [242]:
UPDATE dsci_504.Taxes
SET tax_rate = CASE 
    WHEN tax_loc='AK' THEN 1.76
    WHEN tax_loc='AL' THEN 9.24
    WHEN tax_loc='AZ' THEN 8.40
END
    

**Now that you have the taxes table updated, run a query to find all orders in the order table placed after January 1, 2019 from Ohio. Include a column that calculates the 2022 order total based on the current 2022 tax rate in the taxes table.**

In [249]:
SELECT Orders.ord_id, Orders.order_tot, Orders.order_tot + (Orders.order_tot * Taxes.tax_rate) AS order_tot_2022
FROM dsci_504.Orders
JOIN dsci_504.Customers ON Orders.cus_id = Customers.cus_id
JOIN dsci_504.States ON Customers.cus_state = States.state_id
JOIN dsci_504.Taxes ON Orders.ord_tax_loc = Taxes.tax_id
WHERE Orders.ord_date = '2019-01-01' 
AND States.state = 'OH';

ord_id,order_tot,order_tot_2022


**Part 5: Calculate the sum of phone numbers in the customer table for the state of California. Output only the sum of the phone numbes as phone\_sum.**

In [55]:
SELECT SUM(cus_phone) AS phone_sum
FROM dsci_504.Customers c
JOIN dsci_504.States s ON c.cus_state = s.state_id
WHERE s.state = 'CA';

phone_sum
395577183626


**Q: What is the sum of all phone numbers in California?**

```
A: 395577183626

```

**Part 6: Calculate the sum of all zip codes for customers who have orders with a shipping tax location of West Virginia**

In [59]:
select SUM(cus_zip) AS sum_zip_codes
FROM dsci_504.Customers 
INNER JOIN dsci_504.Orders ON Customers.cus_id = Orders.cus_id
WHERE Orders.ord_tax_loc = '29';

sum_zip_codes
4848445


**Q: What is the sum of zip codes from all customers with orders from West Virginia?**

```
A: 4848445

```

**Part 5: Write a query to select the distinct count of all orders from each state. Be sure to display ONLY the state digraph (two-letter code) and the count. Order by state.**

In [61]:
SELECT s.state, COUNT(DISTINCT o.ord_id) AS order_count
FROM dsci_504.States AS s
LEFT JOIN dsci_504.Customers AS c ON s.state_id = c.cus_state
LEFT JOIN dsci_504.Orders AS o ON c.cus_id = o.cus_id
GROUP BY s.state
ORDER BY s.state;

state,order_count
AK,84
AL,73
AZ,87
CA,72
CO,81
CT,91
DE,93
FL,80
GA,87
HI,93


**Q: What would happen if you altered the location of the DISTINCT clause in your query between the SELECT and column locations? Explain what would happen and why you think this is an important aspect to know.**

```
A: using the DISTINCT  keyword between SELECT and the column immediately following essentially filters out any duplicate results within that column

```

In [63]:
SELECT 
    States.state, 
    COUNT(DISTINCT Orders.ord_id) as order_count
FROM 
    dsci_504.Orders
INNER JOIN 
    dsci_504.Customers ON Orders.cus_id = Customers.cus_id
INNER JOIN 
    dsci_504.States ON Customers.cus_state = States.state_id
GROUP BY 
    States.state
ORDER BY 
    States.state;

state,order_count
AK,84
AL,73
AZ,87
CA,72
CO,81
CT,91
DE,93
FL,80
GA,87
HI,93


**Part 7: Generate a series of numbers to be entered as the OPC customer appreciation number.**

**The template for the number will be 'OPC' followed by a random number multipled by 10,000,000. Add 10 to the random number output and multiply that by 195185. Divide the total number by .25. The output should be OPC+an integer of 8 digits.**

**Limit your generated series to 10 iterations for this attempt.**

In [261]:

WITH generate_series AS (
    SELECT generate_series(1,10) AS num
)
SELECT CONCAT('OPC', ((random() * 10000000 + 10) * 195185) / 0.25)::varchar(10) AS customer_appreciation_number
FROM generate_series

customer_appreciation_number
OPC6230869
OPC1854356
OPC1893573
OPC5745195
OPC3437791
OPC4209268
OPC3724743
OPC5169396
OPC4710235
OPC3399135


**Create a new table in the public schema and insert the generated data in the table. The number of enteries should total the number of records in the customers table. This will be used as a holding table for future use.**

In [294]:
CREATE TABLE dsci_504.opc_appreciation_numbers (
    opc_customer_appreciation_number VARCHAR(12)
);
INSERT INTO dsci_504.opc_appreciation_numbers (opc_customer_appreciation_number)
SELECT CONCAT('OPC', ((random() * 10000000 + 10) * 195185) / 0.25)::varchar(10)
FROM generate_series(1, (SELECT COUNT(*) FROM DSCI_504.Customers));


: relation "opc_appreciation_numbers" already exists

**Insert the appropriate number of customer appreciation number based on the size of the customer table.**

In [295]:
INSERT INTO dsci_504.opc_appreciation_numbers (opc_customer_appreciation_number)
SELECT CONCAT('OPC', ((random() * 10000000 + 10) * 195185) / 0.25)::varchar(10)
FROM generate_series(1, (SELECT COUNT(*) FROM DSCI_504.Customers));


**Q: What is the 500th Customer Appreciation Number?**

```
A: 

```

In [296]:
SELECT opc_customer_appreciation_number
FROM dsci_504.opc_appreciation_numbers
LIMIT 1 OFFSET 499;


opc_customer_appreciation_number
OPC########


**Set the values in the cus\_app\_num column in the customers table in teh dsci\_504 schema to NULL.**

**Insert the customer appreciation numbers held in the cus\_apprec\_num table to the customer table in the DSCI504 schema. You may need to alter the table to accept the differnet data type.**

In [307]:
UPDATE dsci_504.customers
SET cus_app_num = NULL;
ALTER TABLE dsci_504.customers 
ALTER COLUMN cus_app_num TYPE TEXT USING cus_app_num::TEXT;
INSERT INTO dsci_504.customers(cus_app_num)
SELECT opc_customer_appreciation_number
FROM dsci_504.opc_appreciation_numbers; 

: null value in column "cus_id" violates not-null constraint

**Q: What is the street of the customer with customer appreciation number OPC80448799. If your numbers did not generate this cutomer number, simply pick the address of customer 429.**

```
A:
```

In [309]:
SELECT cus_address
FROM dsci_504.customers
WHERE cus_app_num = 'OPC80448799';

cus_address


**Part 8: Provide an explanation of how you can use the skills in this assignment in your everyday life as a professional. Explain some use cases where this informatiuon may have been invaluable prior to you learning SQL.**

```
A: The skills through this assignment are applicable across various professional fields, where data analysis, data preparation, and database management are fundamental tasks. These functions help professionals work with data efficiently, improve quality, and make informed decisions based on accurate and well-prepared data

```

In [1]:
import nbconvert

!jupyter nbconvert --to html Assignment06_Shahal.ipynb

[NbConvertApp] Converting notebook Assignment06_Shahal.ipynb to html
[NbConvertApp] Writing 932027 bytes to Assignment06_Shahal.html
