# **Step 1: Installing Python Modules**

We need to import the necessary modules to kickstart the data visualization project:
- "matplotlib" provides basic chart plotting functions and is widely used in statistical research
- "plotly" provides interactive chart plotting functions and is widely used in business analytics

---

In [None]:
import requests
from bs4 import BeautifulSoup 
import pandas as pd 

!pip install plotly==4.5.0
import matplotlib.pyplot as plt   
import plotly.graph_objects as go
import plotly.express as px
from datetime import datetime

%matplotlib inline
# Set the default figure size through pyplot (avoids the discouraged %pylab magic)
plt.rcParams['figure.figsize'] = (12, 10)

The web scraper function developed in the previous project is reused here to extract stock data.

In [None]:
def scrape_table(Url):
    # Download the page and parse it with Python's built-in HTML parser
    soup = BeautifulSoup(requests.get(Url).text, 'html.parser')
    # Column names come from the <th> cells of the table header
    headers = [header.text for listing in soup.find_all('thead') for header in listing.find_all('th')]
    raw_data = {header: [] for header in headers}

    for rows in soup.find_all('tbody'):
        for row in rows.find_all('tr'):
            cells = row.find_all('td')
            # Skip rows whose cell count does not match the header, or whose 4th cell is '-'
            if len(cells) != len(headers) or cells[3].text == '-':
                continue
            for idx, cell in enumerate(cells):
                raw_data[headers[idx]].append(cell.text)

    return pd.DataFrame(raw_data)
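
To see what kind of DataFrame the scraper produces without making a network request, the parsing logic can be tried on a small inline HTML string. This is only a sketch: the HTML snippet, its values and the helper name `parse_table_html` are hypothetical, it counts `td` cells when filtering rows, and a real Yahoo Finance page may be structured differently.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML shaped like the tables the scraper expects (made-up values)
SAMPLE_HTML = """
<table>
  <thead><tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr></thead>
  <tbody>
    <tr><td>Jan 03, 2020</td><td>10.00</td><td>12.00</td><td>9.00</td><td>11.00</td></tr>
    <tr><td>Jan 02, 2020</td><td>0.5 Dividend</td></tr>
    <tr><td>Jan 01, 2020</td><td>10.00</td><td>11.00</td><td>-</td><td>10.50</td></tr>
  </tbody>
</table>
"""

def parse_table_html(html):
    """Hypothetical helper: same parsing steps as the scraper, applied to an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    headers = [th.text for thead in soup.find_all("thead") for th in thead.find_all("th")]
    raw_data = {header: [] for header in headers}
    for tbody in soup.find_all("tbody"):
        for row in tbody.find_all("tr"):
            cells = row.find_all("td")
            # Skip short rows and rows with '-' in the 4th cell
            if len(cells) != len(headers) or cells[3].text == "-":
                continue
            for idx, cell in enumerate(cells):
                raw_data[headers[idx]].append(cell.text)
    return pd.DataFrame(raw_data)

sample = parse_table_html(SAMPLE_HTML)
print(sample)
```

Only the first row survives the filter, and every value is still a string, which is why the conversion step that follows is needed.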

# **Step 2: Data Type Conversion**


The data type conversion functions from the previous project (Wrangling) are reused here for data conversion.

In [None]:
def convert_column_to_float(df, columns):
  for column in columns: 
    df[column] = pd.to_numeric(df[column].str.replace(',',''))
  return df

def convert_column_to_datetime(df, columns):
  for column in columns:
    df[column] = pd.to_datetime(df[column])
  return df

def revert_scaled_number(number):
  # Expand a suffixed string such as '1.5M' into a plain float (M, B and T only)
  mapping = {'M': 1000000, 'B': 1000000000, 'T': 1000000000000}
  scale = number[-1]
  return float(number[0:-1]) * mapping[scale]
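
As a quick check of how `revert_scaled_number` behaves, the sketch below runs it on a few hypothetical inputs. The function is repeated so the block runs on its own; the input strings are made up for illustration.

```python
def revert_scaled_number(number):
    mapping = {'M': 1000000, 'B': 1000000000, 'T': 1000000000000}
    scale = number[-1]
    return float(number[0:-1]) * mapping[scale]

# Hypothetical inputs in the "number + suffix" style
examples = ["1.5M", "2B", "0.25T"]
converted = [revert_scaled_number(value) for value in examples]
print(converted)  # [1500000.0, 2000000000.0, 250000000000.0]

# A value without an M/B/T suffix is not handled and raises KeyError
try:
    revert_scaled_number("950")
    unsuffixed_ok = True
except KeyError:
    unsuffixed_ok = False
print(unsuffixed_ok)  # False
```

Because unsuffixed values raise an error, a column that mixes suffixed and plain numbers would need extra handling before this function is applied to it.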

# **Step 3: Basic Chart Types**

Let's extract some stock data, using Apple Inc. as an example. We keep only the latest 90 days of prices (the first 90 rows of the scraped table), then convert the columns to the right data types. The resulting table shows each date's open, high, low and close prices.

In [None]:
apple = scrape_table("https://finance.yahoo.com/quote/AAPL/history?p=AAPL")[0:90]
# TODO: use the data type conversion functions to convert the data columns
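
One possible way to approach the conversion step is sketched below on a small hypothetical table instead of live data. The helper functions are repeated so the block is self-contained; the column names and values are assumptions and may not match what the scraper returns.

```python
import pandas as pd

def convert_column_to_float(df, columns):
    for column in columns:
        df[column] = pd.to_numeric(df[column].str.replace(',', ''))
    return df

def convert_column_to_datetime(df, columns):
    for column in columns:
        df[column] = pd.to_datetime(df[column])
    return df

# Hypothetical scraped rows: every value starts out as a string
prices = pd.DataFrame({
    "Date": ["Jan 03, 2020", "Jan 02, 2020"],
    "Open": ["1,000.50", "998.00"],
    "Volume": ["1,234,567", "2,345,678"],
})

# Everything after the first column is numeric; the first column is the date
prices = convert_column_to_float(prices, prices.columns[1:])
prices = convert_column_to_datetime(prices, [prices.columns[0]])
print(prices.dtypes)
```

The same two calls could be applied to `apple`, assuming its first column is the date and the remaining columns are numeric.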

Using "Date" as the X-axis and "Price" as the Y-axis, use Matplotlib to draw the stock movement graph.

References:
*   lw = linewidth (the width of the line)
*   kind (the type of plot, e.g. 'line')
*   grid (whether to show grid lines)
*   title (the title of the graph)


In [None]:
ts = pd.Series(apple["Adj Close**"].values, index=apple["Date"])
# TODO: plot the line chart using matplotlib
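
The plotting options listed above can be tried on a small series first. The sketch below uses made-up prices and the pandas `Series.plot` wrapper around Matplotlib; the title and axis labels are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical closing prices indexed by date
dates = pd.date_range("2020-01-01", periods=5, freq="D")
ts = pd.Series([100.0, 101.5, 99.8, 102.3, 103.1], index=dates)

# kind selects the plot type, lw the line width, grid toggles grid lines
ax = ts.plot(kind="line", lw=2, grid=True,
             title="Hypothetical closing price", figsize=(12, 6))
ax.set_xlabel("Date")
ax.set_ylabel("Price")
```

In a notebook with `%matplotlib inline` the figure is rendered below the cell; in a plain script, `plt.show()` would be needed.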

Let's create a pie chart using DataFrame filtering.

- You have learnt how to filter DataFrames in previous sessions. Let's use that to create a pie chart
- The pie chart should show the percentage distribution of market capitalization across different cryptocurrencies

In [None]:
df = scrape_table("https://finance.yahoo.com/cryptocurrencies?count=200&offset=0")
df['Market Cap'] = df['Market Cap'].apply(revert_scaled_number)
# TODO: aggregate the data and plot a pie chart showing the market cap distribution of cryptocurrencies
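
One way to combine filtering with a pie chart is to keep the large slices and fold the small ones into a single "Others" slice. The sketch below uses made-up symbols and market caps, and the 5% threshold is an arbitrary assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical market caps, already converted to plain numbers
df = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Market Cap": [500e9, 200e9, 40e9, 5e9, 5e9],
})

total = df["Market Cap"].sum()
threshold = 0.05  # assumed cut-off: slices under 5% are grouped together

# Boolean filter: True for rows whose share of the total reaches the threshold
is_large = df["Market Cap"] / total >= threshold
large = df[is_large]
others = df.loc[~is_large, "Market Cap"].sum()

labels = list(large["Symbol"]) + ["Others"]
sizes = list(large["Market Cap"]) + [others]

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(sizes, labels=labels, autopct="%1.1f%%")
ax.set_title("Hypothetical market cap distribution")
```

Grouping the small entries keeps the chart readable when the table has many rows with tiny shares.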

**Candlestick Charts**

Use Plotly to draw the candlestick graph (move the cursor over the graph to show hover data).

References:

*   autosize (let Plotly size the figure automatically)
*   margin (adjust the margins of the graph; l: left, r: right, t: top, b: bottom)
*   paper_bgcolor (change the background color of the graph)


In [None]:
# TODO: use plotly to plot a candlestick chart showing Apple's stock history

# **Step 4: Advanced Chart Types**

**Multi-stock area graph**
- Use the web scraper to obtain data for several stocks
- Then stack them together in an area chart or line chart

In [None]:
microsoft, google = (scrape_table("https://finance.yahoo.com/quote/"+s+"/history?p="+s)[0:90] for s in ["MSFT", "GOOG"])
microsoft, google = convert_column_to_float(microsoft, microsoft.columns[1:]), convert_column_to_float(google, google.columns[1:])
microsoft, google = convert_column_to_datetime(microsoft, [microsoft.columns[0]]), convert_column_to_datetime(google, [google.columns[0]])
 
# TODO: plot a multi-stock graph using matplotlib
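
A stacked area chart can be sketched with Matplotlib's `stackplot`. The example below uses two made-up price series; the ticker names `AAA` and `BBB` and their values are hypothetical stand-ins for the scraped closing prices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical closing prices for two stocks on the same dates
dates = pd.date_range("2020-01-01", periods=5, freq="D")
closes = pd.DataFrame({
    "AAA": [100.0, 101.0, 102.0, 101.5, 103.0],
    "BBB": [50.0, 51.0, 49.5, 52.0, 53.0],
}, index=dates)

area_fig, area_ax = plt.subplots(figsize=(12, 6))
# Each series is drawn on top of the previous one
area_ax.stackplot(closes.index, closes["AAA"], closes["BBB"],
                  labels=["AAA", "BBB"], alpha=0.7)
area_ax.legend(loc="upper left")
area_ax.set_xlabel("Date")
area_ax.set_ylabel("Price")
area_ax.set_title("Hypothetical multi-stock area chart")
```

Stacking shows the combined level of the series; when the individual movements matter more, plotting one line per stock on the same axes may be easier to read.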

**Heatmaps**

- The rectangle area represents market capitalization
- The color gradient represents percentage price change (positive, negative)
- Ticker symbol is shown on each rectangle
- Hovering over a rectangle shows the underlying details of the stock

In [None]:
activestocks = scrape_table("https://finance.yahoo.com/most-active?count=200&offset=0")
marketCaps = activestocks['Market Cap'].apply(revert_scaled_number)
percentChanges = activestocks['% Change'].str.replace('+', '', regex=False).str.replace('%', '', regex=False).astype(float)

In [None]:
# TODO: plot a heatmap showing the active stock data

**Bubble Charts**

In [None]:
activestocks = scrape_table("https://finance.yahoo.com/most-active?count=200&offset=0")
activestocks['Market Cap'] = activestocks['Market Cap'].apply(revert_scaled_number)
activestocks['% Change'] = activestocks['% Change'].str.replace('+', '', regex=False).str.replace('%', '', regex=False).astype(float)
activestocks['Price (Intraday)'] = activestocks['Price (Intraday)'].str.replace(',', '', regex=False).astype(float)


# TODO: plot a bubble chart showing the active stock data