# Create a Web Application for an ETF Analyzer

In this Challenge assignment, you’ll build a financial database and web application by using SQL, Python, and the Voilà library to analyze the performance of a hypothetical fintech ETF.

Instructions: 

Use this notebook to complete your analysis of a fintech ETF that consists of four stocks: GDOT, GS, PYPL, and SQ. Each stock has its own table in the `etf.db` database, which the `Starter_Code` folder also contains.

Analyze the daily returns of the ETF stocks both individually and as a whole. Then deploy the visualizations to a web application by using the Voilà library.

The detailed instructions are divided into the following parts:

* Analyze a single asset in the ETF

* Optimize data access with Advanced SQL queries

* Analyze the ETF portfolio

* Deploy the notebook as a web application

#### Analyze a Single Asset in the ETF

For this part of the assignment, you’ll use SQL queries with Python, Pandas, and hvPlot to analyze the performance of a single asset from the ETF.

Complete the following steps:

1. Write a SQL `SELECT` statement by using an f-string that reads all the PYPL data from the database. Using the SQL `SELECT` statement, execute a query that reads the PYPL data from the database into a Pandas DataFrame.

2. Use the `head` and `tail` functions to review the first five and the last five rows of the DataFrame. Make a note of the beginning and end dates that are available from this dataset. You’ll use this information to complete your analysis.

3. Using hvPlot, create an interactive visualization for the PYPL daily returns. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.

4. Using hvPlot, create an interactive visualization for the PYPL cumulative returns. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.

#### Optimize Data Access with Advanced SQL Queries

For this part of the assignment, you’ll continue to analyze a single asset (PYPL) from the ETF. You’ll use advanced SQL queries to optimize the efficiency of accessing data from the database.

Complete the following steps:

1. Access the closing prices for PYPL that are greater than 200 by completing the following steps:

    - Write a SQL `SELECT` statement to select the dates where the PYPL closing price was higher than 200.0.

    - Using the SQL statement, read the data from the database into a Pandas DataFrame, and then review the resulting DataFrame.

    - Select the “time” and “close” columns for those dates where the closing price was higher than 200.0.

2. Find the top 10 daily returns for PYPL by completing the following steps:

    -  Write a SQL statement to find the top 10 PYPL daily returns. Make sure to do the following:

        * Use `SELECT` to select only the “time” and “daily_returns” columns.

        * Use `ORDER` to sort the results in descending order by the “daily_returns” column.

        * Use `LIMIT` to limit the results to the top 10 daily return values.

    - Using the SQL statement, read the data from the database into a Pandas DataFrame, and then review the resulting DataFrame.

#### Analyze the ETF Portfolio

For this part of the assignment, you’ll build the entire ETF portfolio and then evaluate its performance. To do so, you’ll build the ETF portfolio by using SQL joins to combine all the data for each asset.

Complete the following steps:

1. Write a SQL query to join each table in the portfolio into a single DataFrame. To do so, complete the following steps:

    - Use a SQL inner join to join each table on the “time” column. Access the “time” column in the `GDOT` table via the `GDOT.time` syntax. Access the “time” columns from the other tables via similar syntax.

    - Using the SQL query, read the data from the database into a Pandas DataFrame. Review the resulting DataFrame.

2. Create a DataFrame that averages the “daily_returns” columns for all four assets. Review the resulting DataFrame.

    > **Hint** Assuming that this ETF contains equally weighted returns, you can average the returns for each asset to get the average returns of the portfolio. You can then use the average returns of the portfolio to calculate the annualized returns and the cumulative returns. For the calculation to get the average daily returns for the portfolio, use the following code:
    >
    > ```python
    > etf_portfolio_returns = etf_portfolio['daily_returns'].mean(axis=1)
    > ```
    >
    > You can use the average daily returns of the portfolio the same way that you used the daily returns of a single asset.

3. Use the average daily returns in the `etf_portfolio_returns` DataFrame to calculate the annualized returns for the portfolio. Display the annualized return value of the ETF portfolio.

> **Hint**  To calculate the annualized returns, multiply the mean of the `etf_portfolio_returns` values by 252.
>
> To convert the decimal values to percentages, multiply the results by 100.

4. Use the average daily returns in the `etf_portfolio_returns` DataFrame to calculate the cumulative returns of the ETF portfolio.

5. Using hvPlot, create an interactive line plot that visualizes the cumulative return values of the ETF portfolio. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.

#### Deploy the Notebook as a Web Application

For this part of the assignment, complete the following steps:

1. Use the Voilà library to deploy your notebook as a web application. You can deploy the web application locally on your computer.

2. Take a screen recording or screenshots to show how the web application appears when using Voilà. Include the recording or screenshots in the `README.md` file for your GitHub repository.


## Review the following code, which imports the required libraries, initiates your SQLite database, populates the database with records from the `etf.db` seed file that was included in your `Starter_Code` folder, creates the database engine, and confirms the data tables that it now contains.

In [2]:
# Importing the required libraries and dependencies
import numpy as np
import pandas as pd
import hvplot.pandas
import sqlalchemy

# Define the connection string for the SQLite database stored in the etf.db seed file
database_connection_string = 'sqlite:///etf.db'

# Create an engine to interact with the SQLite database
engine = sqlalchemy.create_engine(database_connection_string)

# Confirm the table names contained in the SQLite database.
# (engine.table_names() was removed in SQLAlchemy 2.0; the inspector works in 1.4 and later.)
sqlalchemy.inspect(engine).get_table_names()

  


['GDOT', 'GS', 'PYPL', 'SQ']
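
The same check can be made without SQLAlchemy by querying SQLite's own catalog. The following is a minimal sketch, assuming an in-memory database with four empty stand-in tables (the column layout here is illustrative, not the full schema of `etf.db`):

```python
import sqlite3

# In-memory stand-in for etf.db (assumption: the real file already holds these tables).
conn = sqlite3.connect(":memory:")
for name in ("GDOT", "GS", "PYPL", "SQ"):
    conn.execute(f'CREATE TABLE "{name}" (time TEXT, close REAL, daily_returns REAL)')

# sqlite_master is SQLite's built-in catalog of schema objects.
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [row[0] for row in rows]
print(table_names)
```

To run this against the real database, replace `":memory:"` with the path to `etf.db` and drop the `CREATE TABLE` loop.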

## Analyze a Single Asset in the Fintech ETF

For this part of the assignment, you’ll use SQL queries with Python, Pandas, and hvPlot to analyze the performance of a single asset from the ETF.

Complete the following steps:

1. Write a SQL `SELECT` statement by using an f-string that reads all the PYPL data from the database. Using the SQL `SELECT` statement, execute a query that reads the PYPL data from the database into a Pandas DataFrame.

2. Use the `head` and `tail` functions to review the first five and the last five rows of the DataFrame. Make a note of the beginning and end dates that are available from this dataset. You’ll use this information to complete your analysis.

3. Using hvPlot, create an interactive visualization for the PYPL daily returns. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.

4. Using hvPlot, create an interactive visualization for the PYPL cumulative returns. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.



### Step 1: Write a SQL `SELECT` statement by using an f-string that reads all the PYPL data from the database. Using the SQL `SELECT` statement, execute a query that reads the PYPL data from the database into a Pandas DataFrame.

In [5]:
# Write a SQL query to SELECT all of the data from the PYPL table
query = """
SELECT *
FROM PYPL
"""

# Use the query to read the PYPL data into a Pandas DataFrame
pypl_df = pd.read_sql_query(query, con=engine)
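
The step asks for an f-string, whereas the cell above hard-codes the table name. One way the f-string version might look is sketched below. The `read_asset` helper and the `ALLOWED_TABLES` set are hypothetical names introduced for illustration, and the in-memory table holds only the first two PYPL rows with a reduced set of columns. Because table names cannot be bound as SQL parameters, the sketch checks the ticker against a fixed set before interpolating it.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for etf.db with a reduced PYPL table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PYPL (time TEXT, close REAL, daily_returns REAL)")
conn.executemany(
    "INSERT INTO PYPL VALUES (?, ?, ?)",
    [("2016-12-16", 39.32, -0.005564), ("2016-12-19", 39.45, 0.003306)],
)

# Hypothetical allowlist: only known table names may be interpolated into SQL.
ALLOWED_TABLES = {"GDOT", "GS", "PYPL", "SQ"}

def read_asset(ticker, con):
    """Read every row of one asset table into a DataFrame."""
    if ticker not in ALLOWED_TABLES:
        raise ValueError(f"Unknown table: {ticker}")
    query = f"SELECT * FROM {ticker}"
    return pd.read_sql_query(query, con=con)

sample_df = read_asset("PYPL", conn)
print(sample_df)
```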


### Step 2: Use the `head` and `tail` functions to review the first five and the last five rows of the DataFrame. Make a note of the beginning and end dates that are available from this dataset. You’ll use this information to complete your analysis.

In [6]:
# View the first 5 rows of the DataFrame.
pypl_df.head()


Unnamed: 0,time,open,high,low,close,volume,daily_returns
0,2016-12-16 00:00:00.000000,39.9,39.9,39.12,39.32,7298861,-0.005564
1,2016-12-19 00:00:00.000000,39.4,39.8,39.11,39.45,3436478,0.003306
2,2016-12-20 00:00:00.000000,39.61,39.74,39.26,39.74,2940991,0.007351
3,2016-12-21 00:00:00.000000,39.84,40.74,39.82,40.09,5826704,0.008807
4,2016-12-22 00:00:00.000000,40.04,40.09,39.54,39.68,4338385,-0.010227


In [7]:
# View the last 5 rows of the DataFrame.
pypl_df.tail()


Unnamed: 0,time,open,high,low,close,volume,daily_returns
994,2020-11-30 00:00:00.000000,212.51,215.83,207.09,214.2,8992681,0.013629
995,2020-12-01 00:00:00.000000,217.15,220.57,214.3401,216.52,9148174,0.010831
996,2020-12-02 00:00:00.000000,215.6,215.75,210.5,212.66,6414746,-0.017827
997,2020-12-03 00:00:00.000000,213.33,216.93,213.11,214.68,6463339,0.009499
998,2020-12-04 00:00:00.000000,214.88,217.28,213.01,217.235,2118319,0.011901
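
The “time” values are returned as text, so it can help to convert them to timestamps before noting the date range. A small sketch, using only the first and last “time” values from the output above (the variable names are illustrative):

```python
import pandas as pd

# First and last "time" values copied from the head and tail output.
time_strings = pd.Series([
    "2016-12-16 00:00:00.000000",
    "2020-12-04 00:00:00.000000",
])

# Convert the text values to timestamps so that they sort and plot as dates.
time_values = pd.to_datetime(time_strings)
start_date = time_values.min()
end_date = time_values.max()
span_days = (end_date - start_date).days

print(start_date.date(), end_date.date(), span_days)
```

In the notebook itself, the same conversion could be applied to `pypl_df['time']` directly.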


### Step 3: Using hvPlot, create an interactive visualization for the PYPL daily returns. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.

In [8]:
# Create an interactive visualization with hvPlot to plot the daily returns for PYPL.
# Plot from the full DataFrame so that the "time" column is available for the x-axis.
pypl_df.hvplot(
    x = "time",
    y = "daily_returns",
    title = "Daily Returns for PayPal",
    xlabel = "Time (specified in days)",
    ylabel = "Daily Returns",
)


### Step 4: Using hvPlot, create an interactive visualization for the PYPL cumulative returns. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.

In [9]:
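# Create an interactive visualization with hvPlot to plot the cumulative returns for PYPL.
# Store the cumulative returns next to the "time" column so that hvPlot can use it for the x-axis.
pypl_cumulative_df = pypl_df[['time']].assign(
    cumulative_returns = pypl_df['daily_returns'].cumsum()
)
pypl_cumulative_df.hvplot(
    x = "time",
    y = "cumulative_returns",
    title = "Cumulative Returns for PayPal",
    xlabel = "Time (specified in days)",
    ylabel = "Cumulative Returns"
)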
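
This notebook computes cumulative returns as a running sum of the daily returns. A common alternative compounds the returns instead. The sketch below compares the two on three made-up daily returns; the numbers are illustrative and do not come from the dataset.

```python
import pandas as pd

# Hypothetical daily returns, chosen only to make the arithmetic easy to follow.
daily_returns = pd.Series([0.10, -0.05, 0.02])

# Running sum, as used in this notebook: 0.10, 0.05, 0.07
simple_cumulative = daily_returns.cumsum()

# Compounded growth: (1.10 * 0.95 * 1.02) - 1 = 0.0659 on the last day
compounded_cumulative = (1 + daily_returns).cumprod() - 1

print(simple_cumulative.iloc[-1], compounded_cumulative.iloc[-1])
```

The two measures stay close for small returns and drift apart as the returns grow or the period lengthens.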

## Optimize the SQL Queries

For this part of the assignment, you’ll continue to analyze a single asset (PYPL) from the ETF. You’ll use advanced SQL queries to optimize the efficiency of accessing data from the database.

Complete the following steps:

1. Access the closing prices for PYPL that are greater than 200 by completing the following steps:


    - Write a SQL `SELECT` statement to select the dates where the PYPL closing price was higher than 200.0.

    - Select the “time” and “close” columns for those dates where the closing price was higher than 200.0.

    - Using the SQL statement, read the data from the database into a Pandas DataFrame, and then review the resulting DataFrame.

2. Find the top 10 daily returns for PYPL by completing the following steps:

    -  Write a SQL statement to find the top 10 PYPL daily returns. Make sure to do the following:

        * Use `SELECT` to select only the “time” and “daily_returns” columns.

        * Use `ORDER` to sort the results in descending order by the “daily_returns” column.

        * Use `LIMIT` to limit the results to the top 10 daily return values.

    - Using the SQL statement, read the data from the database into a Pandas DataFrame, and then review the resulting DataFrame.


### Step 1: Access the closing prices for PYPL that are greater than 200 by completing the following steps:

- Write a SQL `SELECT` statement to select the dates where the PYPL closing price was higher than 200.0.

- Select the “time” and “close” columns for those dates where the closing price was higher than 200.0.

- Using the SQL statement, read the data from the database into a Pandas DataFrame, and then review the resulting DataFrame.


In [11]:
# Write a SQL SELECT statement to select the time and close columns 
# where the PYPL closing price was higher than 200.0.
query =  """
SELECT time, close
FROM PYPL
WHERE close > 200
"""
# Using the query, read the data from the database into a Pandas DataFrame
pypl_higher_than_200 = pd.read_sql_query(query, con=engine)

# Review the resulting DataFrame
pypl_higher_than_200


Unnamed: 0,time,close
0,2020-08-05 00:00:00.000000,202.92
1,2020-08-06 00:00:00.000000,204.09
2,2020-08-25 00:00:00.000000,201.71
3,2020-08-26 00:00:00.000000,203.53
4,2020-08-27 00:00:00.000000,204.34
5,2020-08-28 00:00:00.000000,204.48
6,2020-08-31 00:00:00.000000,203.95
7,2020-09-01 00:00:00.000000,208.92
8,2020-09-02 00:00:00.000000,210.82
9,2020-09-03 00:00:00.000000,205.07
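
The threshold does not have to be written into the SQL text. A minimal sketch of the same filter with a bound parameter, assuming an in-memory stand-in table; the two rows above 200 come from the output above, and the 2020-08-04 row is a made-up value added to show a row being filtered out:

```python
import sqlite3

# In-memory stand-in for the PYPL table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PYPL (time TEXT, close REAL)")
conn.executemany(
    "INSERT INTO PYPL VALUES (?, ?)",
    [
        ("2020-08-04", 198.00),  # hypothetical row below the threshold
        ("2020-08-05", 202.92),
        ("2020-08-06", 204.09),
    ],
)

# The "?" placeholder lets SQLite bind the threshold safely at execution time.
threshold = 200.0
rows = conn.execute(
    "SELECT time, close FROM PYPL WHERE close > ? ORDER BY time",
    (threshold,),
).fetchall()
print(rows)
```

With Pandas, the same query text and `params=(threshold,)` can be passed to `pd.read_sql_query`.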


### Step 2: Find the top 10 daily returns for PYPL by completing the following steps:

- Write a SQL statement to find the top 10 PYPL daily returns. Make sure to do the following:

    * Use `SELECT` to select only the “time” and “daily_returns” columns.

    * Use `ORDER` to sort the results in descending order by the “daily_returns” column.

    * Use `LIMIT` to limit the results to the top 10 daily return values.

- Using the SQL statement, read the data from the database into a Pandas DataFrame, and then review the resulting DataFrame.


In [12]:
# Write a SQL SELECT statement to select the time and daily_returns columns
# Sort the results in descending order and return only the top 10 return values
query =  """
SELECT time, daily_returns
FROM PYPL
ORDER BY daily_returns DESC
LIMIT 10
"""

# Using the query, read the data from the database into a Pandas DataFrame
pypl_top_10_returns = pd.read_sql_query(query, con=engine)

# Review the resulting DataFrame
pypl_top_10_returns

Unnamed: 0,time,daily_returns
0,2020-03-24 00:00:00.000000,0.140981
1,2020-05-07 00:00:00.000000,0.140318
2,2020-03-13 00:00:00.000000,0.1387
3,2020-04-06 00:00:00.000000,0.100877
4,2018-10-19 00:00:00.000000,0.093371
5,2019-10-24 00:00:00.000000,0.085912
6,2020-11-04 00:00:00.000000,0.080986
7,2020-03-10 00:00:00.000000,0.080863
8,2020-04-22 00:00:00.000000,0.075321
9,2018-12-26 00:00:00.000000,0.074656
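
Sorting and limiting inside the database returns only the rows that are needed, while the Pandas equivalent has to load the full table first. The sketch below shows both on a four-row stand-in table built from values that appear in this notebook's outputs; `top_n` is an illustrative name.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the PYPL table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PYPL (time TEXT, daily_returns REAL)")
conn.executemany(
    "INSERT INTO PYPL VALUES (?, ?)",
    [
        ("2020-03-10", 0.080863),
        ("2020-03-13", 0.1387),
        ("2020-03-24", 0.140981),
        ("2020-12-02", -0.017827),
    ],
)

top_n = 2

# Option 1: let SQLite sort and truncate the result.
sql_top = pd.read_sql_query(
    "SELECT time, daily_returns FROM PYPL ORDER BY daily_returns DESC LIMIT ?",
    con=conn,
    params=(top_n,),
)

# Option 2: load every row, then pick the largest values in Pandas.
all_rows = pd.read_sql_query("SELECT time, daily_returns FROM PYPL", con=conn)
pandas_top = all_rows.nlargest(top_n, "daily_returns").reset_index(drop=True)

print(sql_top)
```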


## Analyze the Fintech ETF Portfolio

For this part of the assignment, you’ll build the entire ETF portfolio and then evaluate its performance. To do so, you’ll build the ETF portfolio by using SQL joins to combine all the data for each asset.

Complete the following steps:

1. Write a SQL query to join each table in the portfolio into a single DataFrame. To do so, complete the following steps:

    - Use a SQL inner join to join each table on the “time” column. Access the “time” column in the `GDOT` table via the `GDOT.time` syntax. Access the “time” columns from the other tables via similar syntax.

    - Using the SQL query, read the data from the database into a Pandas DataFrame. Review the resulting DataFrame.

2. Create a DataFrame that averages the “daily_returns” columns for all four assets. Review the resulting DataFrame.

    > **Hint** Assuming that this ETF contains equally weighted returns, you can average the returns for each asset to get the average returns of the portfolio. You can then use the average returns of the portfolio to calculate the annualized returns and the cumulative returns. For the calculation to get the average daily returns for the portfolio, use the following code:
    >
    > ```python
    > etf_portfolio_returns = etf_portfolio['daily_returns'].mean(axis=1)
    > ```
    >
    > You can use the average daily returns of the portfolio the same way that you used the daily returns of a single asset.

3. Use the average daily returns in the `etf_portfolio_returns` DataFrame to calculate the annualized returns for the portfolio. Display the annualized return value of the ETF portfolio.

> **Hint**  To calculate the annualized returns, multiply the mean of the `etf_portfolio_returns` values by 252.
>
> To convert the decimal values to percentages, multiply the results by 100.

4. Use the average daily returns in the `etf_portfolio_returns` DataFrame to calculate the cumulative returns of the ETF portfolio.

5. Using hvPlot, create an interactive line plot that visualizes the cumulative return values of the ETF portfolio. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.


### Step 1: Write a SQL query to join each table in the portfolio into a single DataFrame. To do so, complete the following steps:

- Use a SQL inner join to join each table on the “time” column. Access the “time” column in the `GDOT` table via the `GDOT.time` syntax. Access the “time” columns from the other tables via similar syntax.

- Using the SQL query, read the data from the database into a Pandas DataFrame. Review the resulting DataFrame.

In [13]:
# Write a SQL query to join each table in the portfolio into a single DataFrame
# Use the time column from each table as the basis for the join
query = """
SELECT *
FROM GDOT
JOIN GS ON GDOT.time = GS.time
JOIN PYPL ON GDOT.time = PYPL.time
JOIN SQ ON GDOT.time = SQ.time;
"""
# Using the query, read the data from the database into a Pandas DataFrame
etf_portfolio = pd.read_sql_query(query, con=engine)

# Review the resulting DataFrame
etf_portfolio


Unnamed: 0,time,open,high,low,close,volume,daily_returns,time.1,open.1,high.1,...,close.1,volume.1,daily_returns.1,time.2,open.2,high.2,low.1,close.2,volume.2,daily_returns.2
0,2016-12-16 00:00:00.000000,24.41,24.7300,23.9400,23.9800,483544,-0.023218,2016-12-16 00:00:00.000000,242.80,243.1900,...,39.3200,7298861,-0.005564,2016-12-16 00:00:00.000000,14.290,14.4700,14.2300,14.3750,4516341,0.017339
1,2016-12-19 00:00:00.000000,24.00,24.0100,23.5500,23.7900,288149,-0.007923,2016-12-19 00:00:00.000000,238.34,239.7400,...,39.4500,3436478,0.003306,2016-12-19 00:00:00.000000,14.340,14.6000,14.3000,14.3600,3944657,-0.001043
2,2016-12-20 00:00:00.000000,23.75,23.9400,23.5800,23.8200,220341,0.001261,2016-12-20 00:00:00.000000,240.52,243.6500,...,39.7400,2940991,0.007351,2016-12-20 00:00:00.000000,14.730,14.8200,14.4100,14.4900,5207412,0.009053
3,2016-12-21 00:00:00.000000,23.90,23.9700,23.6900,23.8600,249189,0.001679,2016-12-21 00:00:00.000000,242.24,242.4000,...,40.0900,5826704,0.008807,2016-12-21 00:00:00.000000,14.450,14.5400,14.2701,14.3800,3901738,-0.007591
4,2016-12-22 00:00:00.000000,23.90,24.0100,23.7000,24.0050,383139,0.006077,2016-12-22 00:00:00.000000,241.23,242.8600,...,39.6800,4338385,-0.010227,2016-12-22 00:00:00.000000,14.330,14.3400,13.9301,14.0400,3874004,-0.023644
5,2016-12-23 00:00:00.000000,23.99,24.0000,23.7900,23.8100,113534,-0.008123,2016-12-23 00:00:00.000000,239.54,241.9000,...,39.5800,2525504,-0.002520,2016-12-23 00:00:00.000000,13.940,14.2400,13.8800,14.0800,1440289,0.002849
6,2016-12-27 00:00:00.000000,23.84,23.9800,23.6300,23.7550,54273,-0.002310,2016-12-27 00:00:00.000000,241.95,242.5899,...,39.7200,2209080,0.003537,2016-12-27 00:00:00.000000,14.120,14.2463,14.0000,14.0050,2169127,-0.005327
7,2016-12-28 00:00:00.000000,23.73,23.9400,23.6000,23.6700,98105,-0.003578,2016-12-28 00:00:00.000000,243.71,244.5000,...,39.5800,2721046,-0.003525,2016-12-28 00:00:00.000000,14.080,14.1200,13.8500,13.9550,1882397,-0.003570
8,2016-12-29 00:00:00.000000,23.65,23.9300,23.4500,23.5800,79427,-0.003802,2016-12-29 00:00:00.000000,240.75,241.0700,...,39.9700,3118262,0.009853,2016-12-29 00:00:00.000000,13.940,14.1100,13.6400,13.7250,2774798,-0.016482
9,2016-12-30 00:00:00.000000,23.53,23.5700,23.3900,23.5400,154961,-0.001696,2016-12-30 00:00:00.000000,239.28,240.5000,...,39.4700,3622222,-0.012509,2016-12-30 00:00:00.000000,13.710,13.7700,13.5300,13.6400,2758838,-0.006193
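
Because `SELECT *` keeps every column from every table, the joined DataFrame contains several columns with the same name. An alternative is to select and alias only the columns that are needed. The sketch below does this for two of the four tables on a small in-memory stand-in; the alias names are illustrative, and the GDOT sample deliberately omits 2016-12-20 to show that an inner join keeps only the dates present in both tables.

```python
import sqlite3
import pandas as pd

# In-memory stand-in with two reduced tables (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GDOT (time TEXT, daily_returns REAL);
CREATE TABLE PYPL (time TEXT, daily_returns REAL);
INSERT INTO GDOT VALUES ('2016-12-16', -0.023218), ('2016-12-19', -0.007923);
INSERT INTO PYPL VALUES ('2016-12-16', -0.005564), ('2016-12-19', 0.003306),
                        ('2016-12-20', 0.007351);
""")

# Alias each selected column so that the result has unique column names.
query = """
SELECT GDOT.time AS time,
       GDOT.daily_returns AS gdot_daily_returns,
       PYPL.daily_returns AS pypl_daily_returns
FROM GDOT
INNER JOIN PYPL ON GDOT.time = PYPL.time
ORDER BY GDOT.time
"""
joined = pd.read_sql_query(query, con=conn)
print(joined)
```

With aliased columns, the averaging step would select the return columns by their new names instead of relying on the shared `daily_returns` label.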


### Step 2: Create a DataFrame that averages the “daily_returns” columns for all four assets. Review the resulting DataFrame.

In [14]:
# Create a DataFrame that displays the mean value of the “daily_returns” columns for all four assets.
etf_portfolio_returns = etf_portfolio['daily_returns'].mean(axis=1)

# Review the resulting DataFrame
etf_portfolio_returns

0     -0.007038
1     -0.001216
2      0.008567
3     -0.001004
4     -0.008243
5     -0.001220
6      0.000304
7     -0.004176
8     -0.005080
9     -0.003673
10     0.010709
11     0.019658
12     0.010197
13     0.017107
14    -0.005331
15    -0.001469
16     0.003114
17    -0.001676
18     0.003887
19    -0.011994
20    -0.000325
21    -0.003625
22     0.009330
23    -0.004453
24     0.009025
25     0.005900
26    -0.001316
27    -0.008435
28    -0.005873
29    -0.002324
         ...   
969    0.004273
970   -0.029937
971   -0.002944
972   -0.031380
973    0.010542
974   -0.046673
975    0.001536
976    0.018162
977    0.038580
978    0.020725
979    0.034325
980   -0.028081
981   -0.014000
982    0.034656
983   -0.016853
984   -0.001675
985    0.008388
986    0.010609
987   -0.009216
988    0.020272
989    0.017472
990    0.033899
991    0.003386
992    0.027063
993   -0.009501
994   -0.014635
995   -0.003990
996   -0.006288
997    0.011246
998    0.009108
Length: 999, dtype: float64
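
The averaging step works because selecting a label that occurs more than once returns a DataFrame with all of the matching columns. A minimal sketch with two duplicated columns, using the first two GDOT and PYPL daily returns shown earlier (the variable names are illustrative):

```python
import pandas as pd

# Two columns that share one label, as a SELECT * over joined tables would produce.
joined = pd.DataFrame(
    [[-0.023218, -0.005564], [-0.007923, 0.003306]],
    columns=["daily_returns", "daily_returns"],
)

# Selecting the duplicated label yields a DataFrame, not a Series.
returns_only = joined["daily_returns"]

# axis=1 averages across the columns of each row (one value per day).
portfolio_returns = returns_only.mean(axis=1)

print(returns_only.shape)
print(portfolio_returns)
```

Each resulting value is the equally weighted mean of that day's returns: for the first row, (-0.023218 + -0.005564) / 2 = -0.014391.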

### Step 3: Use the average daily returns in the `etf_portfolio_returns` DataFrame to calculate the annualized returns for the portfolio. Display the annualized return value of the ETF portfolio.

In [15]:
# Use the average daily returns provided by the etf_portfolio_returns DataFrame 
# to calculate the annualized return for the portfolio. 
annualized_etf_portfolio_returns = (etf_portfolio_returns.mean() * 252) * 100

# Display the annualized return value of the ETF portfolio.
annualized_etf_portfolio_returns


43.827272400613595
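
As a rough cross-check of this figure, the mean daily return can be recovered from the final cumulative value in the Step 4 output (1.737438 after 999 trading days), because the cumulative returns there are a running sum. The sketch below repeats the annualization with plain arithmetic; the constant name is illustrative, and 252 is the trading-day count given in the hint.

```python
# Trading days per year, as given in the hint for this step.
TRADING_DAYS_PER_YEAR = 252

# Final running-sum value and row count, taken from the Step 4 output.
total_simple_return = 1.737438
n_days = 999

# Mean daily return, scaled to one year and expressed as a percentage.
mean_daily_return = total_simple_return / n_days
annualized_pct = mean_daily_return * TRADING_DAYS_PER_YEAR * 100

print(round(annualized_pct, 2))
```

The result agrees with the notebook's value to within the rounding of the displayed cumulative return.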

### Step 4: Use the average daily returns in the `etf_portfolio_returns` DataFrame to calculate the cumulative returns of the ETF portfolio.

In [16]:
# Use the average daily returns provided by the etf_portfolio_returns DataFrame 
# to calculate the cumulative returns
etf_cumulative_returns = etf_portfolio_returns.cumsum()

# Review the cumulative returns; the last value is the final cumulative return
etf_cumulative_returns


0     -0.007038
1     -0.008254
2      0.000313
3     -0.000691
4     -0.008934
5     -0.010154
6     -0.009850
7     -0.014026
8     -0.019106
9     -0.022779
10    -0.012070
11     0.007588
12     0.017785
13     0.034892
14     0.029561
15     0.028092
16     0.031206
17     0.029530
18     0.033418
19     0.021424
20     0.021099
21     0.017474
22     0.026805
23     0.022351
24     0.031376
25     0.037276
26     0.035960
27     0.027524
28     0.021651
29     0.019327
         ...   
969    1.652641
970    1.622704
971    1.619760
972    1.588380
973    1.598922
974    1.552250
975    1.553786
976    1.571948
977    1.610527
978    1.631253
979    1.665578
980    1.637497
981    1.623497
982    1.658153
983    1.641301
984    1.639626
985    1.648013
986    1.658622
987    1.649406
988    1.669679
989    1.687151
990    1.721050
991    1.724436
992    1.751499
993    1.741998
994    1.727363
995    1.723372
996    1.717084
997    1.728330
998    1.737438
Length: 999, dtype: float64

### Step 5: Using hvPlot, create an interactive line plot that visualizes the cumulative return values of the ETF portfolio. Reflect the “time” column of the DataFrame on the x-axis. Make sure that you professionally style and format your visualization to enhance its readability.

In [17]:
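# Using hvPlot, create an interactive line plot that visualizes the ETF portfolio's cumulative return values.
# The joined DataFrame holds one "time" column per table, so take the first one for the x-axis.
etf_cumulative_df = pd.DataFrame({
    "time": etf_portfolio['time'].iloc[:, 0],
    "cumulative_returns": etf_cumulative_returns
})
etf_cumulative_df.hvplot.line(
    x = "time",
    y = "cumulative_returns",
    title = "ETF Portfolio Cumulative Returns",
    xlabel = "Time (specified in days)",
    ylabel = "Cumulative Returns",
    yformatter = '%.1f'
)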
