# `pandas` and Finance: Using Python to Build a Diversified Portfolio

## Learning Objectives
>- Read stock data from Yahoo Finance
>- Calculate monthly and annual stock returns
>- Calculate correlations between asset classes (stocks vs bonds)
>- Use the analysis to build a diversified portfolio

## After completing this lesson you will be able to:
1. Apply pandas to a "real world" problem/exercise related to stock analysis
2. Scrape/extract data from the web and store in a pandas DataFrame
3. Perform descriptive analytics on stock data
4. Calculate common stock metrics such as moving average
5. Visualize data
 
## Modules needed for this lesson (must be installed)
>- To install `pandas-datareader` use the following code in a code cell or in your command prompt
    1. Option 1: `conda install pandas-datareader`
    2. Option 2: `pip install pandas-datareader`
    
>- To install Plotly and Plotly Express
    1. Option 1: `conda install -c plotly plotly=4.13.0`
    2. Option 2: `pip install plotly==4.13.0`

## Files Needed and/or Data Sources:
>- We will use `pandas_datareader` to extract data from Yahoo Finance


## Initial references for this lesson (more given throughout lesson):
>- Lesson Tutorial: https://towardsdatascience.com/in-12-minutes-stocks-analysis-with-pandas-and-scikit-learn-a8d8a7b50ee7
>- Pandas Data Reader: https://pandas-datareader.readthedocs.io/en/latest/
>- Datetime module: https://docs.python.org/3/library/datetime.html

Narration Videos:

- Part 1: https://youtu.be/Fy1Y1zzeg7o
- Part 2: https://youtu.be/mhRXc1mewl4
- Part 3: https://youtu.be/vTABqfdQoLM
- Part 4: https://youtu.be/rvG5xQrsRyc
- Part 5: https://youtu.be/A7xp0AiF3JE
- Part 6: https://youtu.be/UMxcUG5Hh0s
- Part 7: https://youtu.be/qVkxZNLqnWQ

# Install the `pandas-datareader` module
>- Note: I used the Anaconda PowerShell Prompt to install this, but other methods should also work

In [1]:
# Run code below if you do not have pandas-datareader installed yet 
# conda install pandas-datareader

# Import Modules

### These are our fundamental modules for working with pandas and reading data from the web

### The following are needed for the data visualization component of this lesson
>- Below is a link for various tutorials on `matplotlib` 
>>- [`matplotlib doc`](https://matplotlib.org/tutorials/index.html)
>- Seaborn is built on top of matplotlib, has sample datasets built in, and makes building charts easier than working directly with matplotlib
>>- [`seaborn doc`](https://seaborn.pydata.org/introduction.html)
>- Plotly creates interactive graphs
>>- [`Plotly doc`](https://plotly.com/python/getting-started/)


# ETF DataFrame 
## Setting up a DataFrame to Store ETF Data
>- We will collect data on 4 ETFs that together make a simple but diversified portfolio:
>>- Stock ETF 1: VTI, Vanguard total US stock market ETF
>>- Stock ETF 2: VEU, Vanguard All World except US
>>- Bond ETF 1: BND, Vanguard total US bond market ETF
>>- Bond ETF 2: BNDX, Vanguard total international bond excluding US

### Set up variables to define our start and end dates for stock analysis
>- We will look at stock prices over the past 10 years
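
A minimal sketch of the date setup, assuming "the past 10 years" is measured back from today's date. The names `start` and `end` are illustrative and may differ from the ones used in the narration videos.

```python
import datetime as dt

# Assumption: the analysis window ends today and starts 10 years earlier
end = dt.date.today()
try:
    start = end.replace(year=end.year - 10)
except ValueError:
    # Feb 29 has no counterpart in a non-leap year, so fall back to Feb 28
    start = end.replace(year=end.year - 10, day=28)

print(start, end)
```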

### Read in Stock Data with `pandas DataReader` and Yahoo Finance
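
One way the download could look is sketched below. The helper name `fetch_adj_close` and the example dates are made up for illustration. The call uses `pandas_datareader`'s `DataReader` with the `"yahoo"` source; Yahoo's endpoint has changed over time, so the request may fail depending on your installed version.

```python
import datetime as dt

TICKERS = ["VTI", "VEU", "BND", "BNDX"]

def fetch_adj_close(tickers, start, end):
    """Return one adjusted-close column per ticker (needs network access)."""
    # Imported inside the function so this cell can be defined offline
    import pandas_datareader.data as web
    prices = web.DataReader(tickers, "yahoo", start, end)
    # With a list of tickers the columns are two-level: (attribute, ticker)
    return prices["Adj Close"]

# Example call with hypothetical dates (not run here, it needs a live connection):
# etf = fetch_adj_close(TICKERS, dt.date(2011, 1, 1), dt.date(2021, 1, 1))
```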

# Visualize Our Portfolio
## Initial Visualization of our ETF Data

### First, `pandas` built-in data visualization can provide some quick plotting
>- `pandas` built-in visualization is built on matplotlib 
>- [pandas Viz doc](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html)

Let's make some plots to get familiar with the pandas built-in data visualization
>- Line plot
>- Histogram
>- Box plot

#### pandas Line Plot

#### pandas Histogram

#### pandas Box Plots

### Plotting with `seaborn` and `plotly` for exposure to other data viz modules

#### `Seaborn` Line Plot

#### `Plotly` Interactive Graphs

##### Plotly Express Line Chart
>- Hover over top right corner of chart for more options

In [None]:
px.line(etf, title='My ETF Portfolio')

# Data Preparation 
## Add year, month, day columns to our dataframe
>- First create a new dataframe by copying `etf`
>>- Passing in `deep=True` creates a copy of the DataFrame including the data and indices
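
A small sketch of what `deep=True` buys us, using two made-up price rows in place of the real `etf` data: changing the copy leaves the original untouched.

```python
import pandas as pd

# Made-up prices standing in for the downloaded ETF data
etf = pd.DataFrame(
    {"VTI": [100.0, 101.0], "BND": [80.0, 80.5]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)

etf1 = etf.copy(deep=True)

# Overwrite a value in the copy only
etf1.loc[etf1.index[0], "VTI"] = 0.0

print(etf["VTI"].iloc[0], etf1["VTI"].iloc[0])
```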

#### Now, add in year, month, and day fields
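
A sketch of adding the date parts, assuming the DataFrame has a `DatetimeIndex` (as price data from a reader typically does). The prices are synthetic, and business days stand in for trading days, so market holidays are not excluded.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices on business days
dates = pd.bdate_range("2020-01-01", "2020-03-31")
etf1 = pd.DataFrame({"VTI": np.linspace(100.0, 110.0, len(dates))}, index=dates)

# A DatetimeIndex exposes year, month, and day directly
etf1["year"] = etf1.index.year
etf1["month"] = etf1.index.month
etf1["day"] = etf1.index.day

print(etf1.head())
```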

## Find and store the last trading day for each month
>- This allows us to calculate monthly and annual returns based on last trading days of month/year

### We will look at doing this several ways:

#### 1. Creating a new `lastday` DataFrame and joining to our stock data
>- This method joins on year, month, and day to return a stock DataFrame only containing prices on the last trading day of each month

#### 2. Grouping the data
>- Here we will use the `groupby()` and `agg()` functions to return the last trading day in each month
>- This is a more concise way of accomplishing what we want

### Option 1
>- Create a `lastday` DataFrame and join back to our stock price DataFrame 
>- All this DataFrame does is store the last trading day in each month
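
A sketch of building `lastday` on synthetic data: group by year and month, then keep the largest `day` value in each group. Business days stand in for trading days here, which is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices with year, month, and day columns
dates = pd.bdate_range("2020-01-01", "2020-03-31")
etf1 = pd.DataFrame({"VTI": np.linspace(100.0, 110.0, len(dates))}, index=dates)
etf1["year"] = dates.year
etf1["month"] = dates.month
etf1["day"] = dates.day

# The largest day number within each (year, month) is the last trading day
lastday = etf1.groupby(["year", "month"], as_index=False)["day"].max()

print(lastday)
```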

#### Now, merge `etf1` and `lastday` DataFrames to create `etf2`
>- `etf2` is a DataFrame that only shows the adjusted closing price on the last trading day of every month
>- Joining on year, month, and day keeps only the rows of `etf1` whose date appears in `lastday`
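
A sketch of the join on synthetic data. An inner merge on the three key columns drops every row that is not a month-end trading day.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices with year, month, and day columns
dates = pd.bdate_range("2020-01-01", "2020-03-31")
etf1 = pd.DataFrame({"VTI": np.linspace(100.0, 110.0, len(dates))}, index=dates)
etf1["year"] = dates.year
etf1["month"] = dates.month
etf1["day"] = dates.day

lastday = etf1.groupby(["year", "month"], as_index=False)["day"].max()

# Inner join: only rows whose (year, month, day) appears in lastday survive
etf2 = etf1.merge(lastday, on=["year", "month", "day"], how="inner")

print(etf2)
```

Note that `merge` returns a fresh integer index, so the original date index is not carried over.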

### Option 2
#### An alternate way to get the last trading days of each month without joining
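>- Grouping by year and month and taking the last row of each group (the data is sorted by date) accomplishes the same thing as our join
>- Taking the max only matches the join for the `day` column; the max of a price column is the month's highest price, not the month-end price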
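
A sketch of the grouping approach on synthetic prices that fall steadily, chosen so the choice of aggregation is visible. With rows sorted by date, `agg("last")` returns the month-end row, while `agg("max")` returns the highest price seen during the month.

```python
import numpy as np
import pandas as pd

# Synthetic prices that decline over time, sorted by date
dates = pd.bdate_range("2020-01-01", "2020-03-31")
etf1 = pd.DataFrame({"VTI": np.linspace(110.0, 100.0, len(dates))}, index=dates)
etf1["year"] = dates.year
etf1["month"] = dates.month
etf1["day"] = dates.day

grouped = etf1.groupby(["year", "month"])
month_end = grouped.agg("last")   # last row per month: the month-end price
month_max = grouped.agg("max")    # column-wise max: the monthly high

print(month_end)
print(month_max)
```

Both results agree on `day`, but the `VTI` values differ whenever the price peaks before the last trading day of the month.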

## DataPrep Continued...
### Set the index of the monthly stock data, `etf2`

### Add 1 month lags to all our month end prices
>- This allows us to calculate monthly return
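
A sketch of adding one-month lags with `shift(1)`, using made-up month-end prices. The `_lag1` suffix is an illustrative naming choice, not necessarily the one used in the lesson.

```python
import pandas as pd

# Made-up month-end prices
etf2 = pd.DataFrame(
    {"VTI": [100.0, 110.0, 99.0], "BND": [80.0, 80.0, 82.0]},
    index=pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"]),
)

# shift(1) moves every value down one row, so each row sees last month's price
lags = etf2.shift(1).add_suffix("_lag1")
etf2 = etf2.join(lags)

print(etf2)
```

The first row has no previous month, so its lag values are `NaN`.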

### Add Columns for Monthly Returns
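
The monthly return is this month's price divided by last month's price, minus one. A sketch on made-up month-end prices follows; the `_ret` suffix is an illustrative naming choice.

```python
import pandas as pd

# Made-up month-end prices
etf2 = pd.DataFrame(
    {"VTI": [100.0, 110.0, 99.0], "BND": [80.0, 80.0, 82.0]},
    index=pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"]),
)
tickers = ["VTI", "BND"]

prices = etf2[tickers]
lagged = prices.shift(1)

# return = price / previous price - 1
monthly_ret = (prices / lagged - 1).add_suffix("_ret")
etf2 = etf2.join(monthly_ret)

print(etf2)
```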

### Alternatively, we can use the `pct_change()` function to find returns on the original data
>- Percent change computes the percentage change from the immediately previous row by default.
>- Here, we only want to select the original ETF columns and not the lagged columns

[pct_change() doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html)
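
A sketch of the same calculation with `pct_change()`, on made-up month-end prices that already carry a lag column. Selecting the ticker columns first keeps the lagged column out of the result.

```python
import pandas as pd

# Made-up month-end prices plus one lag column
etf2 = pd.DataFrame(
    {
        "VTI": [100.0, 110.0, 99.0],
        "BND": [80.0, 80.0, 82.0],
        "VTI_lag1": [float("nan"), 100.0, 110.0],
    },
    index=pd.to_datetime(["2020-01-31", "2020-02-28", "2020-03-31"]),
)
tickers = ["VTI", "BND"]

# Percentage change from the previous row, original ETF columns only
returns = etf2[tickers].pct_change()

print(returns)
```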

# Annual Returns
## Annual returns given a certain month as the baseline
>- Here, you could change the month depending on when you are planning on investing or analyzing data
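
A sketch of picking the baseline month, using synthetic month-end data with `year` and `month` columns. `BASE_MONTH` is a hypothetical name; December is assumed here, so each year's price is its year-end price.

```python
import numpy as np
import pandas as pd

# Synthetic month-end prices for three full years
etf2 = pd.DataFrame(
    {
        "year": np.repeat([2018, 2019, 2020], 12),
        "month": np.tile(np.arange(1, 13), 3),
        "VTI": np.linspace(100.0, 135.0, 36),
    }
)

BASE_MONTH = 12  # change this to anchor annual returns on a different month

# Keep one row per year: the month-end price of the baseline month
etfann = etf2[etf2["month"] == BASE_MONTH].set_index("year")

print(etfann)
```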

### Now, add some annual lags to `etfann`
>- Let's use a loop for practice and for larger datasets

### Calculate annual returns with a loop
>- Make sure to slice columns to only calculate returns on original ETF columns
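
A sketch of the loop-based approach on made-up annual prices: one pass adds the lag columns and a second pass computes returns. The ticker list is captured before any columns are added, so the loops only touch the original ETF columns. The `_lag1` and `_ret` suffixes are illustrative.

```python
import pandas as pd

# Made-up year-end prices
etfann = pd.DataFrame(
    {"VTI": [100.0, 120.0, 108.0], "BND": [80.0, 84.0, 88.2]},
    index=pd.Index([2018, 2019, 2020], name="year"),
)

# Slice the original ETF columns before new columns are appended
tickers = list(etfann.columns[:2])

for t in tickers:
    etfann[f"{t}_lag1"] = etfann[t].shift(1)

for t in tickers:
    etfann[f"{t}_ret"] = etfann[t] / etfann[f"{t}_lag1"] - 1

print(etfann)
```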

### If all we want to see are percent returns, you can use `pct_change()`
[`pct_change()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html)

# ETF Portfolio Analysis

## Q1: How many years were `VTI` returns positive?
>- Compare this to the total years analyzed. Does `VTI` seem like a decent investment? 
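
A sketch of the counting step with made-up annual returns (these numbers are not VTI's actual returns). Comparing against zero gives a boolean Series, and summing it counts the `True` values.

```python
import pandas as pd

# Made-up annual returns, for illustration only
vti_annual = pd.Series(
    [0.20, -0.05, 0.12, 0.03], index=[2017, 2018, 2019, 2020], name="VTI"
)

positive_years = int((vti_annual > 0).sum())
total_years = int(vti_annual.notna().sum())

print(f"{positive_years} of {total_years} years were positive")
```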

## Q2: What was `VTI`'s average annual return during the time frame analyzed?
>- Round to three decimals
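
A sketch of the averaging step with made-up annual returns (not VTI's actual figures). `mean()` skips missing values by default, which matters because the first year has no prior-year price to compare against.

```python
import pandas as pd

# Made-up annual returns; the first year is missing, as it would be after a lag
vti_annual = pd.Series(
    [float("nan"), 0.2134, -0.0521, 0.1188, 0.0312],
    index=[2016, 2017, 2018, 2019, 2020],
    name="VTI",
)

avg_return = round(vti_annual.mean(), 3)

print(avg_return)
```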

## Q3: What are the descriptive annual return statistics?

## Q4: How many months has `VTI` had a negative monthly return?
>- Compare this to the total number of months in the DataFrame

## Q5: What are the descriptive statistics on the monthly ETF returns?
>- Note the mean, min, max, median, and standard deviation
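
A sketch using `describe()` on made-up monthly returns. The median does not appear under that name in the output; it is the `50%` row, which is renamed below for readability.

```python
import pandas as pd

# Made-up monthly returns, for illustration only
monthly_returns = pd.DataFrame(
    {
        "VTI": [0.02, -0.01, 0.03, -0.04, 0.05],
        "BND": [0.004, 0.001, -0.002, 0.003, 0.002],
    }
)

stats = monthly_returns.describe()

# Pull out the rows of interest; the 50th percentile is the median
summary = stats.loc[["mean", "min", "max", "50%", "std"]].rename(
    index={"50%": "median"}
)

print(summary)
```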

#### Plot BoxPlots of Monthly Returns to Visualize Descriptive Statistics

#### Plot monthly returns using plotly graphs

# Asset Correlations
## Calculate ETF return correlations
>- Note: A well-diversified portfolio will have a mix of low or negatively correlated asset classes
>>- Stocks and bonds have often shown low or negative correlation, but let's see how our ETFs relate
>- Note: our unit of analysis is a month, because we calculated monthly returns for our ETFs
>>- The correlation we are finding, then, is the correlation of monthly returns

## Monthly Asset Correlations
>- For portfolio diversification, look for asset classes that have low or negative correlation 
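
A sketch of the correlation step with `DataFrame.corr()`, which computes pairwise Pearson correlations by default. The returns are randomly generated, not real ETF data: the two "stock" columns share a common component so they come out highly correlated, while the "bond" column is independent.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic monthly returns: both stock series share a common market component
market = rng.normal(0, 0.04, 120)
monthly_returns = pd.DataFrame(
    {
        "VTI": market + rng.normal(0, 0.01, 120),
        "VEU": market + rng.normal(0, 0.015, 120),
        "BND": rng.normal(0, 0.01, 120),
    }
)

# Pairwise correlation matrix of the monthly returns
corr = monthly_returns.corr()

print(corr.round(2))
```

Each cell is the correlation between two columns, so the diagonal is always 1 and the matrix is symmetric.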

## Annual Asset Correlations

## Visualize ETF Relationships
### Now let's visualize our returns

### First, let's look at how our two stock ETFs relate
>- We would expect a pretty high correlation but let's take a look
>- Seaborn's `lmplot()` lets us plot a scatter plot with a linear regression line

### Now let's look at how our bond ETFs look plotted against each other

### How does VTI (total US stock ETF) correlate to BND (total US bond ETF)?

### Seaborn's `pairplot()` will plot all pairwise scatter plots as well as distributions of all variables