# Sprint 2 - Security & ACID transactions

## 1. Complex SQL queries with relational algebra

### Create a date/price table for one stock

$$
\begin{aligned}
&\text{Temp1} \leftarrow \sigma_{\text{s.StockID} = \text{shh.StockID}} (\text{Stocks} \times \text{StockHasHistory}) \\
&\text{Temp2} \leftarrow \sigma_{\text{shh.HistoryID} = \text{h.HistoryID}} (\text{Temp1} \times \text{History}) \\
&\text{Temp3} \leftarrow \sigma_{\text{s.Ticker} = 'AAPL'} (\text{Temp2}) \\
&\text{Result} \leftarrow \pi_{\text{h.Date, h.Price}} (\text{Temp3}) \\
&\text{FinalResult} \leftarrow \text{LIMIT}_{10} (\text{ORDER BY}_{\text{h.Date DESC}} (\text{Result}))
\end{aligned}
$$


1. **Temp1:**
   This step performs a Cartesian product between the `Stocks` table and the `StockHasHistory` table and then applies a selection to keep only those tuples where the `StockID` from `Stocks` matches the `StockID` from `StockHasHistory`. This effectively joins the two tables on `StockID`.

2. **Temp2:**
   This step takes the result from Temp1 and performs a Cartesian product with the `History` table. It then applies a selection to keep only those tuples where the `HistoryID` from `StockHasHistory` matches the `HistoryID` from `History`. This effectively joins the result of Temp1 with the `History` table on `HistoryID`.

3. **Temp3:**
   This step applies a selection on Temp2 to keep only those tuples where the `Ticker` attribute in the `Stocks` table is 'AAPL'. This filters the joined data to include only the history records for the stock with the ticker 'AAPL'.

4. **Result:**
   This step projects the `Date` and `Price` attributes from the `History` table for the filtered data from Temp3. Essentially, it extracts only the date and price information for the 'AAPL' stock.

5. **FinalResult:**
   This final step orders the results by `Date` in descending order and then limits the output to the top 10 records. This means it retrieves the 10 most recent price records for the 'AAPL' stock.


This outputs the 10 most recent historical price records for the stock with ticker 'AAPL', sorted by date in descending order.
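The Python sketch below makes the relational-algebra steps concrete. It evaluates them literally, as a Cartesian product followed by selection, over a few in-memory rows. The helper names (`qualify`, `cross`, `select`, `project`) are hypothetical. The toy rows borrow column names and a few values from this notebook's schema and output, but they are not the real tables.

```python
from itertools import product

# Hypothetical toy rows: column names follow the notebook's schema,
# values are illustrative only (not the real tables).
stocks = [{"StockID": 1, "Ticker": "AAPL"}, {"StockID": 2, "Ticker": "MSFT"}]
stock_has_history = [
    {"StockID": 1, "HistoryID": 10},
    {"StockID": 1, "HistoryID": 11},
    {"StockID": 2, "HistoryID": 12},
]
history = [
    {"HistoryID": 10, "Date": "2024-06-11", "Price": 207.15},
    {"HistoryID": 11, "Date": "2024-06-12", "Price": 213.07},
    {"HistoryID": 12, "Date": "2024-06-12", "Price": 441.06},
]

def qualify(rows, alias):
    """Prefix every attribute with its relation alias, e.g. StockID -> s.StockID."""
    return [{f"{alias}.{k}": v for k, v in row.items()} for row in rows]

def cross(r1, r2):
    """Cartesian product R1 x R2: concatenate every pair of tuples."""
    return [{**a, **b} for a, b in product(r1, r2)]

def select(rows, pred):
    """Selection sigma_pred(R): keep tuples satisfying the predicate."""
    return [row for row in rows if pred(row)]

def project(rows, attrs):
    """Projection pi_attrs(R), kept as a bag (duplicates retained, as in SQL)."""
    return [tuple(row[a] for a in attrs) for row in rows]

S, SHH, H = qualify(stocks, "s"), qualify(stock_has_history, "shh"), qualify(history, "h")

temp1 = select(cross(S, SHH), lambda r: r["s.StockID"] == r["shh.StockID"])
temp2 = select(cross(temp1, H), lambda r: r["shh.HistoryID"] == r["h.HistoryID"])
temp3 = select(temp2, lambda r: r["s.Ticker"] == "AAPL")
result = project(temp3, ["h.Date", "h.Price"])
# ORDER BY h.Date DESC, then LIMIT 10 (ISO date strings sort chronologically)
final_result = sorted(result, key=lambda t: t[0], reverse=True)[:10]
print(final_result)
```

With these rows the script prints `[('2024-06-12', 213.07), ('2024-06-11', 207.15)]`. Building the full product before filtering is what the algebra describes. A database optimizer typically executes the same thing as a join, without materializing `Stocks × StockHasHistory`.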

```sql
%%sql

SELECT h.Date, h.Price
FROM Stocks s
JOIN StockHasHistory shh ON s.StockID = shh.StockID
JOIN History h ON shh.HistoryID = h.HistoryID
WHERE s.Ticker = 'AAPL'
ORDER BY h.Date DESC
LIMIT 10;
```

```text
 * mysql+pymysql://csc370:***@localhost:3306/sprint1
10 rows affected.
```

| Date | Price |
|---|---|
| 2024-06-12 | 213.07 |
| 2024-06-11 | 207.15 |
| 2024-06-10 | 193.12 |
| 2024-06-07 | 196.89 |
| 2024-06-06 | 194.48 |
| 2024-06-05 | 195.87 |
| 2024-06-04 | 194.35 |
| 2024-06-03 | 194.03 |
| 2024-05-31 | 192.25 |
| 2024-05-30 | 191.29 |


### Average historical price per stock

$$
\begin{aligned}
&\text{Temp1} \leftarrow \sigma_{\text{Stocks.StockID} = \text{StockHasHistory.StockID}} (\text{Stocks} \times \text{StockHasHistory}) \\
&\text{Temp2} \leftarrow \sigma_{\text{StockHasHistory.HistoryID} = \text{History.HistoryID}} (\text{Temp1} \times \text{History}) \\
&\text{Temp3} \leftarrow \pi_{\text{Ticker, Price}} (\text{Temp2}) \\
&\text{Result} \leftarrow \gamma_{\text{Ticker}, \text{AVG(Price)} \rightarrow AvgPrice} (\text{Temp3})
\end{aligned}
$$


1. **Temp1:**
   This step performs a Cartesian product between the `Stocks` table and the `StockHasHistory` table and then applies a selection to keep only those tuples where the `StockID` from `Stocks` matches the `StockID` from `StockHasHistory`. This effectively joins the two tables on `StockID`.

2. **Temp2:**
   This step takes the result from Temp1 and performs a Cartesian product with the `History` table. It then applies a selection to keep only those tuples where the `HistoryID` from `StockHasHistory` matches the `HistoryID` from `History`. This effectively joins the result of Temp1 with the `History` table on `HistoryID`.

3. **Temp3:**
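   This step projects the `Ticker` and `Price` attributes from the combined data in Temp2. The projection must be read under bag semantics, which is how SQL evaluates it. Under set semantics, two days on which a stock had the same price would collapse into a single tuple and skew the average computed in the next step.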

4. **Result:**
   This final step performs an aggregation on Temp3. It groups the data by `Ticker` and calculates the average `Price` for each `Ticker`, storing the result as `AvgPrice`.

This outputs each stock ticker with its average price, computed over all historical price data available in the `History` table.
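The grouping step γ can also be sketched in Python. This hypothetical example uses made-up rows, and the helper name `group_avg` is illustrative. It shows why the result depends on keeping duplicate (Ticker, Price) tuples. Averaging over the bag, as SQL does, gives a different answer from averaging over the de-duplicated set.

```python
from collections import defaultdict

# Hypothetical (Ticker, Price) tuples as they might come out of Temp3.
# AAPL has the same price (194.35) on two different days (made-up data).
temp3_bag = [
    ("AAPL", 194.35),
    ("AAPL", 194.35),
    ("AAPL", 196.89),
    ("MSFT", 441.06),
]

def group_avg(rows):
    """gamma_{Ticker, AVG(Price) -> AvgPrice}: group by ticker, average the prices."""
    groups = defaultdict(list)
    for ticker, price in rows:
        groups[ticker].append(price)
    # Round to 2 decimals, mirroring ROUND(AVG(h.Price), 2) in the SQL query
    return {t: round(sum(ps) / len(ps), 2) for t, ps in groups.items()}

bag_avg = group_avg(temp3_bag)       # bag semantics (SQL): duplicates kept
set_avg = group_avg(set(temp3_bag))  # set semantics: duplicate tuples collapse
print(bag_avg["AAPL"], set_avg["AAPL"])
```

This prints `195.2 195.62`. Only the bag version matches what the SQL `AVG` computes.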

```sql
%%sql

-- Retrieve the average historical price for each stock over the entire period
SELECT 
    s.Ticker,
    ROUND(AVG(h.Price), 2) AS AvgPrice
FROM 
    Stocks s
JOIN 
    StockHasHistory shh ON s.StockID = shh.StockID
JOIN 
    History h ON shh.HistoryID = h.HistoryID
GROUP BY 
    s.Ticker;
```

```text
 * mysql+pymysql://csc370:***@localhost:3306/sprint1
15 rows affected.
```

| Ticker | AvgPrice |
|---|---|
| AAPL | 182.73 |
| MSFT | 371.72 |
| AMZN | 153.37 |
| GOOGL | 141.05 |
| META | 378.93 |
| TSLA | 219.6 |
| BRK-A | 565853.73 |
| JNJ | 154.55 |
| JPM | 165.06 |
| V | 256.1 |


### Date of the highest price for each stock

$$
\begin{aligned}
&\text{Temp1} \leftarrow \sigma_{\text{Stocks.StockID} = \text{StockHasHistory.StockID}} (\text{Stocks} \times \text{StockHasHistory}) \\
&\text{Temp2} \leftarrow \sigma_{\text{StockHasHistory.HistoryID} = \text{History.HistoryID}} (\text{Temp1} \times \text{History}) \\
&\text{MaxPricePerTicker} \leftarrow \gamma_{\text{Ticker}, \text{MAX(Price)} \rightarrow MaxPrice} (\pi_{\text{Ticker, Price}} (\text{Temp2})) \\
&\text{Temp3} \leftarrow \sigma_{\text{Temp2.Ticker} = \text{MaxPricePerTicker.Ticker} \land \text{Temp2.Price} = \text{MaxPricePerTicker.MaxPrice}} (\text{Temp2} \times \text{MaxPricePerTicker}) \\
&\text{Result} \leftarrow \pi_{\text{Temp3.Ticker}, \text{Temp3.Date}, \text{Temp3.Price}} (\text{Temp3})
\end{aligned}
$$


1. **Temp1:**
   This step performs a Cartesian product between the `Stocks` table and the `StockHasHistory` table, followed by a selection to keep only those tuples where the `StockID` from `Stocks` matches the `StockID` from `StockHasHistory`. This effectively joins the two tables on `StockID`.

2. **Temp2:**
   This step takes the result from Temp1 and performs a Cartesian product with the `History` table. It then applies a selection to keep only those tuples where the `HistoryID` from `StockHasHistory` matches the `HistoryID` from `History`. This effectively joins the result of Temp1 with the `History` table on `HistoryID`.

3. **MaxPricePerTicker:**
   This step first projects the `Ticker` and `Price` attributes from Temp2. Then, it groups the projected data by `Ticker` and calculates the maximum `Price` for each `Ticker`, storing the result as `MaxPrice`.

4. **Temp3:**
   This step performs a Cartesian product between Temp2 and MaxPricePerTicker and applies a selection to keep only those tuples where the `Ticker` in Temp2 matches the `Ticker` in MaxPricePerTicker and the `Price` in Temp2 matches the `MaxPrice` in MaxPricePerTicker. This step filters the combined data to include only those records where the price matches the maximum price for each ticker.

5. **Result:**
   This step projects the `Ticker`, `Date`, and `Price` attributes from Temp3. This final projection extracts the relevant information about the maximum prices for each ticker along with the corresponding dates.

This outputs, for each stock ticker, its highest recorded price and the date on which that price occurred. If a stock reached the same maximum price on more than one date, one row is returned for each such date.
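The MaxPricePerTicker and Temp3 steps amount to computing each ticker's maximum and then joining it back to the rows. Below is a minimal Python sketch over made-up rows. The variable names are illustrative. One ticker deliberately hits its maximum on two dates, which shows that ties produce one row per date.

```python
# Hypothetical rows of Temp2 reduced to (Ticker, Date, Price); values are made up,
# and MSFT deliberately reaches its maximum price on two different dates.
temp2 = [
    ("AAPL", "2024-06-11", 207.15),
    ("AAPL", "2024-06-12", 213.07),
    ("MSFT", "2024-06-10", 441.06),
    ("MSFT", "2024-06-11", 432.68),
    ("MSFT", "2024-06-12", 441.06),
]

# MaxPricePerTicker <- gamma_{Ticker, MAX(Price) -> MaxPrice}(pi_{Ticker, Price}(Temp2))
max_price_per_ticker = {}
for ticker, _date, price in temp2:
    max_price_per_ticker[ticker] = max(price, max_price_per_ticker.get(ticker, price))

# Temp3 / Result: keep rows whose Price equals their ticker's MaxPrice
# (equivalent to the selection over Temp2 x MaxPricePerTicker, then projection)
result = [
    (ticker, date, price)
    for ticker, date, price in temp2
    if price == max_price_per_ticker[ticker]
]
for row in result:
    print(row)
```

This prints the AAPL row for 2024-06-12 and two MSFT rows, for 2024-06-10 and 2024-06-12. That matches how the `(s.Ticker, h.Price) IN (...)` query treats ties.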

```sql
%%sql

-- Find the date with the highest stock price for each stock
SELECT 
    s.Ticker,
    h.Date,
    h.Price
FROM 
    Stocks s
JOIN 
    StockHasHistory shh ON s.StockID = shh.StockID
JOIN 
    History h ON shh.HistoryID = h.HistoryID
WHERE 
    (s.Ticker, h.Price) IN (
        SELECT 
            s.Ticker, 
            MAX(h.Price)
        FROM 
            Stocks s
        JOIN 
            StockHasHistory shh ON s.StockID = shh.StockID
        JOIN 
            History h ON shh.HistoryID = h.HistoryID
        GROUP BY 
            s.Ticker
    );
```

```text
 * mysql+pymysql://csc370:***@localhost:3306/sprint1
15 rows affected.
```

| Ticker | Date | Price |
|---|---|---|
| AAPL | 2024-06-12 | 213.07 |
| MSFT | 2024-06-12 | 441.06 |
| AMZN | 2024-05-09 | 189.5 |
| GOOGL | 2024-06-12 | 177.79 |
| META | 2024-04-05 | 527.34 |
| TSLA | 2023-07-18 | 293.34 |
| BRK-A | 2024-03-28 | 634440.0 |
| JNJ | 2023-07-28 | 169.184 |
| JPM | 2024-05-17 | 204.79 |
| V | 2024-03-21 | 289.834 |


## 2. Applying SQL indices to stock history

- Indexing mainly pays off on the history tables, which hold by far the most rows. The smaller tables only benefit once the portfolio grows large.
- With large amounts of stock data, an index such as a B-tree drastically reduces the time for operations like finding all stock prices within a given date range. The lookup costs $O(\log n + k)$ for $k$ matching rows, compared with $O(n)$ for a full table scan.
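The logarithmic versus linear cost can be illustrated with a small Python sketch. It uses binary search (`bisect`) on a date-sorted list as a stand-in for a B-tree index. A real B-tree has a much higher fan-out and is stored in disk pages, but its lookup cost also grows logarithmically. The price data is synthetic, and the function names are hypothetical.

```python
import bisect
from datetime import date, timedelta

# Synthetic price history: one row per calendar day, already sorted by date.
# A sorted list searched with bisect stands in for a B-tree index here.
start = date(2020, 1, 1)
dates = [start + timedelta(days=i) for i in range(1000)]
prices = [100.0 + i * 0.1 for i in range(1000)]

def range_linear(lo, hi):
    """O(n): inspect every row, like a full table scan."""
    return [p for d, p in zip(dates, prices) if lo <= d <= hi]

def range_indexed(lo, hi):
    """O(log n + k): binary-search both ends, then read the k matching rows."""
    i = bisect.bisect_left(dates, lo)
    j = bisect.bisect_right(dates, hi)
    return prices[i:j]

lo, hi = date(2020, 3, 1), date(2020, 3, 31)
assert range_linear(lo, hi) == range_indexed(lo, hi)
print(len(range_indexed(lo, hi)))  # 31 days in March
```

Both functions return the same 31 prices. The indexed version only touches about $\log_2 1000 \approx 10$ dates to locate the range, instead of all 1000.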

### Example usage:

```sql
%%sql

-- Create an index on the Ticker column in the Stocks table
CREATE INDEX idx_stocks_ticker ON Stocks(Ticker);
```

```text
 * mysql+pymysql://csc370:***@localhost:3306/sprint1
0 rows affected.
[]
```

```sql
%%sql
EXPLAIN SELECT h.*
FROM History h
JOIN StockHasHistory shh ON h.HistoryID = shh.HistoryID
JOIN Stocks s ON shh.StockID = s.StockID
WHERE s.Ticker = 'AAPL'
ORDER BY h.Date;
```

```text
 * mysql+pymysql://csc370:***@localhost:3306/sprint1
3 rows affected.
```

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | shh |  | index | PRIMARY,HistoryID | HistoryID | 4 |  | 1 | 100.0 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | s |  | eq_ref | PRIMARY,idx_stocks_ticker | PRIMARY | 4 | sprint1.shh.StockID | 1 | 6.67 | Using where |
| 1 | SIMPLE | h |  | eq_ref | PRIMARY | PRIMARY | 4 | sprint1.shh.HistoryID | 1 | 100.0 |  |
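In the MySQL plan above, `idx_stocks_ticker` appears under `possible_keys`, but the optimizer chose `PRIMARY` for `s`, so the new index is not used by this query. The sketch below shows an index changing a plan for the date-range case from the bullets at the top of this section. It uses Python's built-in `sqlite3` module with an in-memory SQLite database instead of the notebook's MySQL server. The `History` layout and the index name `idx_history_date` are assumptions. `EXPLAIN QUERY PLAN` is SQLite's counterpart to MySQL's `EXPLAIN`, and its exact wording varies between SQLite versions.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL database used in the notebook.
# Table layout and index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE History (HistoryID INTEGER PRIMARY KEY, Date TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO History (Date, Price) VALUES (?, ?)",
    [(f"2024-01-{d:02d}", 100.0 + d) for d in range(1, 32)],
)

query = "SELECT * FROM History WHERE Date BETWEEN '2024-01-10' AND '2024-01-20'"

def plan(sql):
    """Return the human-readable 'detail' column (last field) of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # no index on Date yet: expect a full table scan
conn.execute("CREATE INDEX idx_history_date ON History(Date)")
after = plan(query)   # the range predicate can now use the index

print(before)
print(after)
```

Before the index exists, the plan should report a `SCAN` of `History`. Afterwards it should report a `SEARCH` using `idx_history_date` on the `Date` range.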
