# `pandas` Part 8: Using `pandas` to Collect and Analyze Web Data
### Part(a): Reading, Storing, and Cleaning Financial Web Data
### Part(b): Visualizing Data

# Learning Objectives

## After completing this lesson you will be able to:
1. Install a convenient web data collection tool `pandas_datareader`
2. Apply pandas to a "real world" problem/exercise related to stock analysis
3. Scrape/extract data from the web and store in a pandas DataFrame
4. Perform descriptive analytics on stock data
5. Calculate common stock metrics such as returns and moving averages
6. Visualize data 
 
## Modules needed for this lesson 
>- pandas
>- datetime
>- pandas_datareader, see notes for installation
>- matplotlib


## Files Needed and/or Data Sources:
>- We will use `pandas_datareader` to extract data from Yahoo Finance


## Initial references for this lesson (more given throughout lesson):
>- Lesson Tutorial: https://towardsdatascience.com/in-12-minutes-stocks-analysis-with-pandas-and-scikit-learn-a8d8a7b50ee7
>- Pandas Data Reader: https://pandas-datareader.readthedocs.io/en/latest/
>- Datetime module: https://docs.python.org/3/library/datetime.html
>>- Also see: https://www.w3schools.com/python/python_datetime.asp
>- Matplotlib: https://matplotlib.org/tutorials/index.html

Narration Videos:

- Part 1: https://youtu.be/8XkqQZ5B-DQ
- Part 2: https://youtu.be/S54WkrKqrSo
- Part 3: https://youtu.be/Ijy9WqL1mjA
- Part 4: https://youtu.be/2afvFr655cI

# Pandas Part 8(a): Reading, Storing and Cleaning Financial Data

## Install the `pandas-datareader` module
### Several ways to install the pandas datareader module
1. `pip install pandas-datareader`
>- Mac users would do this in a new Terminal window
>- Windows users would do this in PowerShell
2. `conda install pandas-datareader`
>- Open the Anaconda PowerShell Prompt and type the command above
3. You can also install within a notebook code cell as shown below

#### Note: for all of these options, restart Jupyter Notebook and/or Anaconda for the changes to take effect

##### One way to install the pandas-datareader module is from within Jupyter

## Import Necessary Modules

### These are our fundamental modules for working with pandas and reading data from the web

### The following are needed for the data visualization component of this lesson
>- We will be using a popular Python data visualization module named `matplotlib`
>- Below is a link for various tutorials on `matplotlib`. A lot of cool stuff here! 
>>- https://matplotlib.org/tutorials/index.html

## Set up variables to define our start and end dates for stock analysis
>- We will look at stock prices over the past 10 years
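
The notebook's code cell for this step is not included here. The following is a minimal sketch using the standard `datetime` module, assuming a 10-year window that ends today. The helper name `years_before` is hypothetical and not part of the lesson.

```python
import datetime as dt


def years_before(end: dt.date, years: int = 10) -> dt.date:
    """Return the same calendar date `years` years before `end` (hypothetical helper)."""
    try:
        return end.replace(year=end.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year; fall back to Feb 28
        return end.replace(year=end.year - years, day=28)


end = dt.date.today()      # analysis window ends today
start = years_before(end)  # ...and starts 10 years earlier
print(start, end)
```

The `try/except` handles one edge case: a plain `end.replace(year=end.year - 10)` would raise an error if the code happened to run on February 29.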

## Create a DataFrame by Reading Data from YahooFinance
>- To see other data sources, check out the following link:
>>- https://github.com/wilsonfreitas/awesome-quant#data-sources
>- Initially, we will look at all the data, but then we will turn our focus to the `Closing Price`
>- The first example pulls stock data for Apple (ticker symbol `aapl`), but any stock, including ETFs and index funds, can be used
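
The original download cell is not shown. With `pandas_datareader`, a Yahoo Finance request is typically written `web.DataReader("AAPL", "yahoo", start, end)`, where `web` is `pandas_datareader.data`. That call needs network access and a working Yahoo Finance endpoint. So that later steps can be tried offline, this sketch builds a small stand-in DataFrame with made-up numbers. It assumes the same layout the Yahoo reader returns: a `Date` index plus `High`, `Low`, `Open`, `Close`, `Volume`, and `Adj Close` columns.

```python
import pandas as pd

# Live version (requires network access and a working Yahoo Finance backend):
#   import pandas_datareader.data as web
#   df = web.DataReader("AAPL", "yahoo", start, end)

# Offline stand-in: made-up numbers, same layout (Date index + price/volume columns)
dates = pd.bdate_range("2020-10-01", periods=4, name="Date")
df = pd.DataFrame(
    {
        "High": [12.0, 12.5, 13.0, 12.8],
        "Low": [11.0, 11.6, 12.1, 11.9],
        "Open": [11.5, 11.8, 12.4, 12.6],
        "Close": [11.8, 12.3, 12.7, 12.0],
        "Volume": [1000, 1200, 900, 1100],
        "Adj Close": [11.7, 12.2, 12.6, 11.9],
    },
    index=dates,
)
print(df.head())
```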

## What are the metadata and basic statistics for our DataFrame?
>- Note: metadata is data about data (e.g., number of records and fields, data types, primary/foreign keys)

### Some Metadata and Descriptive Questions to Answer
1. How many days of stock prices do we have in our DataFrame? 
2. How many fields do we have to work with? 
3. What kind of data do we have to work with? 
4. What are the descriptive statistics for all the fields in the DataFrame?
5. Then, we will answer some other descriptive analytics questions. 

##### How many records and fields do we have to work with?
>- Note: records in this example represent the number of days of stock data we have
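
`DataFrame.shape` answers this question directly. It returns a `(rows, columns)` tuple, which here means (trading days, fields). The frame below is a tiny made-up stand-in for the downloaded data.

```python
import pandas as pd

# Made-up stand-in for the downloaded stock data
df = pd.DataFrame(
    {"Close": [10.0, 10.5, 10.2], "Volume": [100, 120, 90]},
    index=pd.bdate_range("2020-10-01", periods=3, name="Date"),
)

n_days, n_fields = df.shape  # (rows, columns) = (trading days, fields)
print(f"{n_days} days of prices, {n_fields} fields")
```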

##### What kind of data do we have to work with?
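
A sketch of two standard ways to inspect data types, again on made-up data. `df.dtypes` lists each column's type. `df.info()` also reports the index range and non-null counts.

```python
import pandas as pd

# Made-up stand-in for the downloaded stock data
df = pd.DataFrame(
    {"Close": [10.0, 10.5, 10.2], "Volume": [100, 120, 90]},
    index=pd.bdate_range("2020-10-01", periods=3, name="Date"),
)

print(df.dtypes)  # data type of each column
df.info()         # index type and range, non-null counts, dtypes, memory use
```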

##### What are the descriptive statistics on Apple's stock?
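
`DataFrame.describe()` produces these statistics for every numeric column: count, mean, standard deviation, minimum, quartiles, and maximum. Here it is on a small made-up frame:

```python
import pandas as pd

# Made-up stand-in for the downloaded stock data
df = pd.DataFrame(
    {"Close": [10.0, 12.0, 14.0, 16.0], "Volume": [100, 200, 300, 400]},
    index=pd.bdate_range("2020-10-01", periods=4, name="Date"),
)

stats = df.describe()  # one column of summary statistics per numeric field
print(stats)
```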

## Prepare Data: Define New Variables and Create Common Financial Variables

### Question: How has Apple's adjusted closing price performed over the last 10 years? 
>- We have already calculated descriptive statistics for all fields, but let's do it just for `Adj Close`

#### Define a variable for adjusted close and run some descriptive stats on it
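
Selecting one column gives a date-indexed `Series`. The lesson later refers to this variable as `aclose`. This sketch uses made-up prices.

```python
import pandas as pd

# Made-up stand-in for the downloaded stock data
df = pd.DataFrame(
    {"Close": [10.2, 10.6, 10.4, 11.0], "Adj Close": [10.0, 10.4, 10.2, 10.8]},
    index=pd.bdate_range("2020-10-01", periods=4, name="Date"),
)

aclose = df["Adj Close"]  # a Series indexed by date
print(aclose.describe())
```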

## Create Variables for Common Financial Metrics

### An Important Financial Metric:  Returns

Returns: $R_t = (P_t/P_{t-1}) - 1$

>- Where:
>>- $R_t$ = the return at time $t$
>>- $P_t$ = the price at time $t$
>>- $P_{t-1}$ = the price at time $t-1$ (the prior trading day in our example)


#### To calculate daily returns...
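
A minimal sketch of $R_t = (P_t/P_{t-1}) - 1$ in pandas, using made-up prices. Dividing by `aclose.shift(1)` lines each price up with the prior day's price. The built-in `pct_change()` method gives the same result.

```python
import pandas as pd

# Made-up adjusted closing prices
aclose = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.bdate_range("2020-10-01", periods=3, name="Date"),
    name="Adj Close",
)

# R_t = P_t / P_{t-1} - 1; the first day has no prior price, so its return is NaN
returns = aclose / aclose.shift(1) - 1
print(returns)

# Equivalent built-in
print(aclose.pct_change())
```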

#### Note: The `shift()` function takes all the data in the original column and moves it down by the number of periods passed to it
>- In our example, `shift(1)` shifts every value of `aclose` down one row (one trading day), so each date is paired with the prior day's price

#### To see how `shift()` is working, we can create a DataFrame to store the current day price and a lag (prior day) price
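
A sketch of that side-by-side view on made-up prices. The column names `price` and `prior_day_price` are illustrative choices, not from the lesson.

```python
import pandas as pd

# Made-up adjusted closing prices
aclose = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.bdate_range("2020-10-01", periods=3, name="Date"),
)

# Put each day's price next to the prior trading day's price
lag_df = pd.DataFrame({"price": aclose, "prior_day_price": aclose.shift(1)})
print(lag_df)
```

The first row's `prior_day_price` is `NaN` because nothing precedes it. Each later row holds the previous row's price.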

#### Note: To get approximate one-year (trailing 12-month) returns, you could use `shift(252)` or `shift(253)` instead of `shift(1)`
>- Most years have 252 or 253 trading days: https://en.wikipedia.org/wiki/Trading_day 
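
A sketch of the one-year version, assuming 252 trading days per year. The data are made up: 300 days of prices growing a steady 0.1% per day, so every one-year return should equal $1.001^{252} - 1$.

```python
import numpy as np
import pandas as pd

# Made-up prices: 300 trading days growing 0.1% per day
idx = pd.bdate_range("2019-01-01", periods=300, name="Date")
aclose = pd.Series(100 * 1.001 ** np.arange(300), index=idx)

TRADING_DAYS = 252  # assumed trading days in one year
one_year_return = aclose / aclose.shift(TRADING_DAYS) - 1
print(one_year_return.dropna().head())  # first 252 values are NaN (no price a year back)
```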

### Moving average (aka rolling average)

>- A moving average is a constantly updated average based on the past 'n' observations, where 'n' is user specified. For example, if we want the moving average for the past 10 days, we would set n=10
>- Stock analysts commonly use 50-, 100-, and 200-day moving averages
>>- We will look at moving averages for 50 and 100 days in this lesson
>>- We can use `field.rolling(window=n).mean()` syntax to define moving averages for the past `n` observations (`n` is an integer, not a string)
>>- Where:
>>>- `field` is a numeric value you want to calculate a rolling mean for
>>>- `n` is the number of periods you want to use to calculate the rolling mean

#### Define closing price moving averages for `50` and `100` days
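
A sketch using `Series.rolling(window=n).mean()` on made-up prices 1.0 through 200.0. The names `ma50` and `ma100` are illustrative. Each average stays `NaN` until `n` observations are available.

```python
import pandas as pd

# Made-up prices: 1.0, 2.0, ..., 200.0
aclose = pd.Series(
    [float(p) for p in range(1, 201)],
    index=pd.bdate_range("2020-01-01", periods=200, name="Date"),
)

ma50 = aclose.rolling(window=50).mean()    # NaN for the first 49 days
ma100 = aclose.rolling(window=100).mean()  # NaN for the first 99 days
print(ma50.tail())
print(ma100.tail())
```

With these prices, the first 50-day average is the mean of 1 through 50 (25.5). The last 100-day average is the mean of 101 through 200 (150.5).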

### Now, let's answer some descriptive analytics questions

##### What was the adjusted closing price one year ago (Nov 12th, 2019)?
>- Note: If you are getting these notes at a different time update the date accordingly
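
A sketch on made-up prices. `.loc[]` looks up an exact trading date. If the chosen date falls on a weekend or holiday, `Series.asof()` returns the last available price on or before that date. November 16, 2019 was a Saturday.

```python
import pandas as pd

# Made-up adjusted closing prices around the target date
aclose = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2019-11-11", "2019-11-12", "2019-11-13"]),
)

print(aclose.loc[pd.Timestamp("2019-11-12")])  # exact date lookup

# Non-trading date: fall back to the most recent earlier price
print(aclose.asof(pd.Timestamp("2019-11-16")))
```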

##### On what date(s) did the maximum adjusted price occur? 
>- What was the price?
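
A sketch of two approaches on made-up prices where the maximum occurs twice. `idxmax()` returns only the first date with the maximum. A boolean mask returns every date that ties for it.

```python
import pandas as pd

# Made-up prices with a tie for the maximum
aclose = pd.Series(
    [10.0, 14.0, 12.0, 14.0],
    index=pd.bdate_range("2020-10-01", periods=4, name="Date"),
)

print(aclose.idxmax(), aclose.max())  # first date of the max, and the max price

max_days = aclose[aclose == aclose.max()]  # every date tied for the max
print(max_days)
```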

##### On what date(s) did the minimum adjusted price occur?
>- What was the price?
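
The minimum works the same way with `idxmin()`. Another option is `Series.nsmallest(1, keep="all")`, which returns every date tied for the lowest price. The prices are made up.

```python
import pandas as pd

# Made-up prices with a tie for the minimum
aclose = pd.Series(
    [10.0, 8.0, 12.0, 8.0],
    index=pd.bdate_range("2020-10-01", periods=4, name="Date"),
)

print(aclose.idxmin(), aclose.min())  # first date of the min, and the min price

lows = aclose.nsmallest(1, keep="all")  # keep="all" keeps every tied date
print(lows)
```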

##### How many days was the closing price above $50?
>- Several ways to get this answer
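
Two of those ways, sketched on made-up prices. Summing a boolean mask counts the `True` values. Filtering and taking `len()` gives the same number. Note that `>` excludes a price of exactly 50.

```python
import pandas as pd

# Made-up prices
aclose = pd.Series(
    [45.0, 50.0, 52.5, 61.0, 49.9],
    index=pd.bdate_range("2020-10-01", periods=5, name="Date"),
)

above = aclose > 50         # boolean Series: True where price > 50
print(above.sum())          # True counts as 1, so this is the number of days
print(len(aclose[above]))   # same answer by filtering first
```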

##### What were the 2020 daily adjusted closing prices in the month of October (usually around the time Apple releases a new iPhone)? 
>- Several ways to do this. We will look at a couple.

#### First, using date methods on the index (date)
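
A `DatetimeIndex` exposes `.year` and `.month` attributes that can be combined into a boolean filter. The sketch uses made-up prices on a business-day calendar, so exchange holidays are not removed.

```python
import pandas as pd

# Made-up prices on 60 business days starting Sept 1, 2020
aclose = pd.Series(
    range(1, 61),
    index=pd.bdate_range("2020-09-01", periods=60, name="Date"),
    dtype=float,
)

oct_2020 = aclose[(aclose.index.year == 2020) & (aclose.index.month == 10)]
print(oct_2020)
```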

#### Using a slice on the index
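
A sketch of label slicing on the same kind of made-up data. On a sorted `DatetimeIndex`, a date-string slice includes both endpoints. A partial string such as `"2020-10"` selects the whole month.

```python
import pandas as pd

# Made-up prices on 60 business days starting Sept 1, 2020
aclose = pd.Series(
    range(1, 61),
    index=pd.bdate_range("2020-09-01", periods=60, name="Date"),
    dtype=float,
)

oct_slice = aclose.loc["2020-10-01":"2020-10-31"]  # inclusive date-range slice
oct_month = aclose.loc["2020-10"]                  # partial-string: whole month
print(oct_slice.equals(oct_month))
```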

# Pandas Part 8(b): Visualizing Data
>- Visualizing data is a fundamental part of any analytics project
>- Data visualization can provide support for both descriptive and predictive analytics
>- A graph can be worth a thousand words and can be used effectively in executive summaries to convey important information to decision makers

Narration Videos:

- Part 1: https://youtu.be/qBvXG5_N7Ro
- Part 2: https://youtu.be/ydP2Y6xVdnU

### Plot a time series chart of the `close` price over the last 10 years
>- We will be using the `plot()` function from `matplotlib`. More info can be found at the below link:
>>- https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html
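>- We will also apply matplotlib's built-in `ggplot` style sheet, which mimics the look of R's ggplot2 package ("gg" = "grammar of graphics")
>>- ggplot reference (a separate Python port of ggplot, not required for the style sheet): https://github.com/yhat/ggpy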
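
A minimal sketch of a styled time series plot, using `matplotlib.pyplot.plot` on made-up closing prices. In Jupyter the chart displays inline. In a plain script, add `plt.show()`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up closing prices standing in for the downloaded data
close = pd.Series(
    [10.0, 10.4, 10.1, 10.8, 11.2],
    index=pd.bdate_range("2020-10-01", periods=5, name="Date"),
)

plt.style.use("ggplot")                             # ggplot2-like look
plt.plot(close.index, close.values, label="Close")  # x = dates, y = prices
plt.ylabel("Price")
plt.legend()
ax = plt.gca()  # handle to the current axes
```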

#### Let's adjust some features of our graph
>- With `figsize=(width, height)` (in inches) we can adjust the width and height of our chart
>- With `plt.style.use(['dark_background'])` we can make the background black instead of the default white
>- The link below provides more documentation on the different ways you can customize your charts:
>>- https://matplotlib.org/3.1.1/tutorials/introductory/customizing.html
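
A sketch combining both settings on made-up prices. `figsize` is passed when the figure is created, and the style should be set before plotting.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up closing prices
close = pd.Series(
    [10.0, 10.4, 10.1, 10.8, 11.2],
    index=pd.bdate_range("2020-10-01", periods=5, name="Date"),
)

plt.style.use(["dark_background"])  # black background, light text and lines
fig = plt.figure(figsize=(12, 6))   # width, height in inches
plt.plot(close.index, close.values, label="Close")
plt.legend()
```

Style settings persist for the rest of the session. `plt.style.use("default")` restores matplotlib's defaults.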

### Now, plot the moving averages with the close price
>- Stock analysts look at charts like this to help them decide when to buy and sell
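
A sketch overlaying the 50- and 100-day moving averages on the price. The data are a made-up random walk generated with a fixed seed. matplotlib skips the leading `NaN` values of each moving average, so those lines simply start later.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up random-walk prices for 300 trading days
rng = np.random.default_rng(0)
idx = pd.bdate_range("2019-01-01", periods=300, name="Date")
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum(), index=idx)

ma50 = close.rolling(window=50).mean()
ma100 = close.rolling(window=100).mean()

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(close.index, close.values, label="Close")
ax.plot(ma50.index, ma50.values, label="50-day MA")
ax.plot(ma100.index, ma100.values, label="100-day MA")
ax.legend()
```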

## Plotting Returns with the default `plot()` which is a line plot
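
A sketch on made-up prices. `Series.plot()` draws a line chart by default (`kind="line"`), and `axhline` adds a reference line at a 0% return.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up closing prices and their daily returns
close = pd.Series(
    [100.0, 110.0, 99.0, 103.95],
    index=pd.bdate_range("2020-10-01", periods=4, name="Date"),
)
returns = close / close.shift(1) - 1

ax = returns.plot(label="Daily return")  # default kind is "line"
ax.axhline(0, linewidth=0.8)             # reference line at 0%
ax.legend()
```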

## Plotting a boxplot for a summary chart on returns
### Boxplots show us:
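>- The median and the 1st and 3rd quartiles (the box), whiskers, and outliers. By default, matplotlib's whiskers extend to the most extreme data points within 1.5 × IQR of the box, and points beyond them are drawn individually as outliers. The mean is shown only if you pass `showmeans=True`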
>>- Reference: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.boxplot.html#matplotlib.pyplot.boxplot
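
A sketch using `Axes.boxplot` on made-up daily returns drawn from a normal distribution with a fixed seed. Drop `NaN` values first. The first return computed with `shift(1)` is always `NaN`, and matplotlib cannot compute box statistics if it is left in.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily returns (mean 0.1%, standard deviation 2%)
rng = np.random.default_rng(1)
returns = pd.Series(rng.normal(0.001, 0.02, 250))

fig, ax = plt.subplots()
parts = ax.boxplot(returns.dropna().to_numpy(), showmeans=True)  # adds a mean marker
ax.set_title("Daily returns")
```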