# Diversification

Harold's company wants to build a diversified stock portfolio. So far, it has added `BMO` (Bank of Montreal) and `CNQ` (Canadian Natural Resources Limited), which belong to the Financial Services and Energy sectors, respectively, of [the S&P TSX 60 index](https://en.wikipedia.org/wiki/S%26P/TSX_60). Now the company wants to add a third stock, drawn from the energy sector, to the mix.

Harold's manager has asked him to research a set of five energy stocks as candidates for the existing portfolio. To build a diversified portfolio that tends to minimize long-term volatility and risk, the stocks within it should be as uncorrelated as possible, so that they counterbalance one another (i.e., when some stocks fall in price, others may rise).
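The counterbalance effect can be illustrated with a minimal sketch on synthetic data. The return series below are randomly generated (they are not real stock returns), and the column names `stock_a`, `similar`, and `unrelated` are hypothetical. The sketch compares the volatility of an equal-weight portfolio built from two highly correlated series with one built from two uncorrelated series.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (made-up data, not real stock returns)
rng = np.random.default_rng(seed=0)
n_days = 1000
base = rng.normal(0, 0.01, n_days)

returns = pd.DataFrame({
    "stock_a": base,
    # Moves mostly together with stock_a
    "similar": 0.8 * base + 0.2 * rng.normal(0, 0.01, n_days),
    # Moves independently of stock_a
    "unrelated": rng.normal(0, 0.01, n_days),
})

# Equal-weight two-stock portfolios
correlated_portfolio = returns[["stock_a", "similar"]].mean(axis=1)
uncorrelated_portfolio = returns[["stock_a", "unrelated"]].mean(axis=1)

# Volatility measured as the standard deviation of daily returns
correlated_vol = correlated_portfolio.std()
uncorrelated_vol = uncorrelated_portfolio.std()

print(returns.corr().round(2))
print(f"Volatility with correlated pair:   {correlated_vol:.4f}")
print(f"Volatility with uncorrelated pair: {uncorrelated_vol:.4f}")
```

Under these assumptions, the pair of uncorrelated series should produce the less volatile portfolio, which is the reason for looking for a low-correlation candidate.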

Use the pandas library to help Harold analyze five energy stocks (`CVE`, `ENB`, `IMO`, `IPL`, and `TRP`) and choose the one that is least correlated with `BMO` and `CNQ`.

## Instructions

### Import Libraries and Dependencies

In [None]:
# Import libraries and dependencies
import pandas as pd
from pathlib import Path
import seaborn as sns

%matplotlib inline

### Read CSV in as DataFrame

In [None]:
# Set file paths
cnq_data = Path("../Resources/CNQ.csv")
bmo_data = Path("../Resources/BMO.csv")
cve_data = Path("../Resources/CVE.csv")
enb_data = Path("../Resources/ENB.csv")
imo_data = Path("../Resources/IMO.csv")
ipl_data = Path("../Resources/IPL.csv")
trp_data = Path("../Resources/TRP.csv")

# Read the individual CSV datasets, using `date` as the index and parsing it as datetimes
cnq_df = pd.read_csv(cnq_data, index_col="date", parse_dates=True)
bmo_df = pd.read_csv(bmo_data, index_col="date", parse_dates=True)
cve_df = pd.read_csv(cve_data, index_col="date", parse_dates=True)
enb_df = pd.read_csv(enb_data, index_col="date", parse_dates=True)
imo_df = pd.read_csv(imo_data, index_col="date", parse_dates=True)
ipl_df = pd.read_csv(ipl_data, index_col="date", parse_dates=True)
trp_df = pd.read_csv(trp_data, index_col="date", parse_dates=True)

# Display sample data from `BMO` (all files have the same structure)
bmo_df.head()

### Combine the DataFrames

In [None]:
# Use the `concat` function to combine the DataFrames by matching indexes (or in this case `date`)
combined_df = pd.concat([cnq_df, bmo_df, cve_df, enb_df, imo_df, ipl_df, trp_df], axis="columns", join="inner")
combined_df.head()
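To see what `join="inner"` does when combining by columns, here is a small sketch using two made-up price tables. The column names `FIRST` and `SECOND`, the dates, and the prices are hypothetical. Only the dates present in both tables should survive the join.

```python
import pandas as pd

# Two tiny made-up price tables indexed by date (hypothetical values)
first = pd.DataFrame(
    {"FIRST": [10.0, 10.5, 11.0]},
    index=pd.Index(["2019-01-02", "2019-01-03", "2019-01-04"], name="date"),
)
second = pd.DataFrame(
    {"SECOND": [20.0, 19.0]},
    index=pd.Index(["2019-01-03", "2019-01-04"], name="date"),
)

# An inner join keeps only the index values shared by every DataFrame
combined = pd.concat([first, second], axis="columns", join="inner")
print(combined)
```

Because `2019-01-02` appears only in `first`, it is dropped. This keeps the combined data free of missing values, at the cost of discarding dates that any one file lacks.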

### Calculate Daily Returns

In [None]:
# Use the `pct_change` function to calculate daily returns for each stock
daily_returns = combined_df.pct_change()
daily_returns.head()
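As a quick check of what `pct_change` computes, the sketch below applies it to a made-up three-day price series and compares it with the manual formula, today's price divided by yesterday's price, minus one. The prices are hypothetical.

```python
import pandas as pd

# Made-up closing prices for three consecutive days
prices = pd.Series([100.0, 110.0, 99.0], name="price")

# Manual daily return: (today / yesterday) - 1
manual = prices / prices.shift(1) - 1

# Built-in equivalent
builtin = prices.pct_change()

print(builtin)
```

The first value is `NaN` because there is no prior day to compare against. The second is a 10% gain (100 to 110) and the third a 10% loss (110 to 99).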

### Calculate Correlation

In [None]:
# Use the `corr` function to calculate correlations for each stock pair
correlation = daily_returns.corr()
correlation

### Plot Correlation

In [None]:
# Create a heatmap from the correlation values
sns.heatmap(correlation)

In [None]:
# Create a heatmap from the correlation values and adjust the scale
sns.heatmap(correlation, vmin=-1, vmax=1)

### Which energy stock would be the best candidate to add to the existing portfolio?

**Sample Answer:** `ENB` would be the best candidate to add to the existing portfolio, because its daily returns are the least correlated with those of `BMO` and `CNQ`.
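Instead of reading the answer off the heatmap, the choice can also be made programmatically. The following is a minimal sketch, not part of the original exercise. The helper `least_correlated` is hypothetical, the labels `HELD_1`, `HELD_2`, and `CAND_A` to `CAND_C` are placeholders, and the correlation values are invented for illustration. It ranks candidates by their average correlation with the stocks already held, which is one reasonable criterion among several (the maximum correlation would be another).

```python
import pandas as pd


def least_correlated(correlation, held, candidates):
    """Return the candidate with the lowest mean correlation to the held stocks,
    along with the full ranking (hypothetical helper, for illustration only)."""
    mean_corr = correlation.loc[candidates, held].mean(axis="columns")
    return mean_corr.idxmin(), mean_corr.sort_values()


# Invented correlation matrix with placeholder labels (not real results)
labels = ["HELD_1", "HELD_2", "CAND_A", "CAND_B", "CAND_C"]
example = pd.DataFrame(
    [
        [1.00, 0.30, 0.60, 0.20, 0.50],
        [0.30, 1.00, 0.70, 0.40, 0.80],
        [0.60, 0.70, 1.00, 0.50, 0.60],
        [0.20, 0.40, 0.50, 1.00, 0.30],
        [0.50, 0.80, 0.60, 0.30, 1.00],
    ],
    index=labels,
    columns=labels,
)

best, ranking = least_correlated(
    example,
    held=["HELD_1", "HELD_2"],
    candidates=["CAND_A", "CAND_B", "CAND_C"],
)
print(ranking)
print(f"Least correlated candidate: {best}")
```

In the notebook, the same helper could be called on `correlation` with `held=["BMO", "CNQ"]` and the five energy tickers as `candidates`, assuming the columns of `correlation` are labeled by ticker symbol.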