<a href="https://colab.research.google.com/github/MonkeyWrenchGang/MGTPython/blob/main/module_1/1_Colab_Tutorial_Stockmarket_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## What is Colab? 

Google Colaboratory (Colab) is a free Jupyter notebook environment that runs on Google's cloud servers and gives you access to backend hardware such as GPUs and TPUs. You can do everything you would do in a Jupyter notebook hosted on your local machine, without the installation and setup that a local notebook requires.




This notebook is a sample that you can tinker with. You may have noticed I like dealing with stock market data. In this notebook we'll use Colab to:

1. Mount your Google Drive
2. Load libraries
3. Download data from Yahoo Finance using pandas-datareader
4. Write data to your Google Drive
     - write CSV & Excel files to your Google Drive
     - create a new Google Sheets doc and write a dataframe to sheet 1
5. Read data from your Google Drive
     - read CSV & Excel files
     - read from Google Sheets




## 1. Mount your Google Drive so you can read from and write to it

Copy and paste the code below and run it. You should be prompted to click a URL that allows Colab to access your Google Drive. Once you select Allow, you will be presented with an access code; copy and paste that code into the prompt box.

```python
from google.colab import drive
drive.mount('/content/gdrive')

```


In [12]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
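Once the drive is mounted, every file lives under the mount point. The sketch below shows one way to build paths under it and to check that the mount point exists before reading or writing. The helper names `drive_file` and `is_mounted` are hypothetical, and the `MyDrive` folder name is taken from the paths used later in this notebook.

```python
from pathlib import Path

# Mount point used in this notebook; the paths later in the notebook sit under "MyDrive".
DRIVE_ROOT = Path("/content/gdrive/MyDrive")

def drive_file(*parts, root=DRIVE_ROOT):
    """Build a path under the mounted Drive (hypothetical helper)."""
    return Path(root).joinpath(*parts)

def is_mounted(root=DRIVE_ROOT):
    """Return True if the mount point exists as a directory."""
    return Path(root).is_dir()

csv_path = drive_file("Colab Notebooks", "data", "stocks.csv")
print(csv_path)
print("Drive mounted:", is_mounted())
```

Outside Colab the second line simply reports `False`, which makes the check a cheap guard to run before any read or write.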



### Install and update


*   Install the yfinance package
*   Update pandas-datareader




In [13]:
!pip install yfinance
!pip install --upgrade pandas-datareader

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
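The pip output above does not say which versions ended up installed. A minimal sketch for checking, using only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Package names as they appear in the pip commands above, plus pandas
for name in ["yfinance", "pandas-datareader", "pandas"]:
    print(name, installed_version(name))
```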


## 2. Load our libraries!

Here we are going to load our basic libraries. 

In [21]:
import pandas as pd
import numpy as np
import pandas_datareader as pdr
import matplotlib.pyplot as plt
# fix the downloader: route pandas-datareader's Yahoo calls through yfinance
import yfinance as yfin
yfin.pdr_override()

import datetime as dt
import scipy.optimize as sco

## 3a. Next, let's download data for IBM

Let's see if we can download stock market data from Yahoo Finance.

In [25]:
# Note: no data source argument is needed because get_data_yahoo() already targets Yahoo Finance.

# the override was applied in the imports cell; calling it again is harmless
yfin.pdr_override()
IBM = pdr.get_data_yahoo('IBM',start='2020-01-23',end='2022-10-24')
IBM = IBM.reset_index()
IBM = IBM.rename(columns={"Adj Close":"ADJ_CLOSE"})
IBM.head()

[*********************100%***********************]  1 of 1 completed


Unnamed: 0,Date,Open,High,Low,Close,ADJ_CLOSE,Volume
0,2020-01-23 00:00:00-05:00,137.858505,138.05928,135.898666,136.587006,117.725349,5918059
1,2020-01-24 00:00:00-05:00,137.084137,137.59082,134.28299,134.378586,115.821869,5836889
2,2020-01-27 00:00:00-05:00,132.40918,133.910141,132.026764,132.523895,114.223305,4358264
3,2020-01-28 00:00:00-05:00,133.365204,134.292542,132.648178,133.412994,114.989639,3610374
4,2020-01-29 00:00:00-05:00,133.049713,133.441681,131.548752,131.634796,113.456985,3194275
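The `Date` values above carry a `-05:00` UTC offset. The `get_data` function later in this notebook strips that offset with `tz_localize(None)`. A small sketch of what that call does, using the first three dates from the output above as sample input:

```python
import pandas as pd

# First three dates copied from the IBM output above
dates = pd.Series([
    "2020-01-23 00:00:00-05:00",
    "2020-01-24 00:00:00-05:00",
    "2020-01-27 00:00:00-05:00",
])

aware = pd.to_datetime(dates)        # timezone-aware timestamps
naive = aware.dt.tz_localize(None)   # drop the offset, keep the wall-clock time

print(aware.dt.tz)
print(naive)
```

Timezone-naive dates are usually easier to work with when writing to Excel, which does not accept timezone-aware datetimes.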


## 3b. A get_data function
Here is a simple function that takes a list of symbols, downloads each one, and returns a data frame of dates and adjusted closing prices.

In [54]:

def get_data(symbols):
  """Download data from Yahoo Finance and return a dataframe of dates and adjusted closes.

    Keyword arguments:
    symbols -- list of stock market symbols
  """
  # create an empty data frame
  df = pd.DataFrame()
  # for each symbol in symbols get the data, extract the adjusted close
  for symbol in symbols:
      df[symbol] = pdr.get_data_yahoo(symbol, 
                                  start='2020-01-01', 
                                  end='2021-12-31')['Adj Close']
  # rename the columns 
  df.columns = symbols
  df = df.reset_index()
  df["Date"] = pd.to_datetime(df["Date"]).dt.tz_localize(None)
  return df

symbols = ['AAPL',
'MSFT',
'GOOGL',
'FB',  # Facebook's old ticker (now META); this download fails below and the column stays empty
'GOOG',
'NVDA', 
"SPY", 
"QQQ"]

stocks = get_data(symbols)
stocks.head()

[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- FB: No timezone found, symbol may be delisted
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,73.561554,156.151947,68.433998,,68.3685,59.770554,309.694946,212.340897
1,2020-01-03,72.846367,154.207565,68.075996,,68.032997,58.813869,307.349792,210.395889
2,2020-01-06,73.426834,154.606171,69.890503,,69.710503,59.060516,308.522369,211.751511
3,2020-01-07,73.081497,153.196533,69.755501,,69.667,59.775532,307.654907,211.722046
4,2020-01-08,74.257103,155.636703,70.251999,,70.216003,59.887646,309.294556,213.3134
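The `FB` download failed, so its column contains only missing values. A minimal sketch, assuming you want to detect and drop such columns before any analysis; the sample rows are copied from the output above, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Two rows copied from the output above; FB is NaN because its download failed
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03"]),
    "AAPL": [73.561554, 72.846367],
    "FB": [np.nan, np.nan],
    "SPY": [309.694946, 307.349792],
})

# Columns where every value is missing correspond to failed downloads
failed = [col for col in sample.columns if sample[col].isna().all()]
cleaned = sample.drop(columns=failed)

print("failed symbols:", failed)
print(cleaned.columns.tolist())
```

The same two lines work on the full `stocks` frame; reporting the failed symbols is more useful than silently carrying an empty column into the charts.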


## 4. Write to Disk
### Here is an example of writing CSV and Excel files to my Google Drive

Your location will be different!

In [55]:
stocks.to_csv("/content/gdrive/MyDrive/Colab Notebooks/data/stocks.csv", index=False)
stocks.to_excel("/content/gdrive/MyDrive/Colab Notebooks/data/stocks.xlsx", index=False)
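Writing fails if the target folder does not exist yet. The sketch below creates the folder first and then writes the CSV. The helper name `save_csv` is hypothetical, and a temporary folder stands in for the Drive path so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

def save_csv(df, folder, filename):
    """Create `folder` if needed, write df as CSV, and return the full path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    df.to_csv(path, index=False)
    return path

# A temporary folder stands in for "/content/gdrive/MyDrive/Colab Notebooks/data"
demo_folder = os.path.join(tempfile.mkdtemp(), "data")
demo = pd.DataFrame({"Date": ["2020-01-02"], "AAPL": [73.561554]})
saved = save_csv(demo, demo_folder, "stocks.csv")
print(saved, os.path.exists(saved))
```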

## 5. Read CSV and Excel files from Google Drive

In [56]:
df = pd.read_csv("/content/gdrive/MyDrive/Colab Notebooks/data/stocks.csv")
df.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,73.561554,156.151947,68.433998,,68.3685,59.770554,309.694946,212.340897
1,2020-01-03,72.846367,154.207565,68.075996,,68.032997,58.813869,307.349792,210.395889
2,2020-01-06,73.426834,154.606171,69.890503,,69.710503,59.060516,308.522369,211.751511
3,2020-01-07,73.081497,153.196533,69.755501,,69.667,59.775532,307.654907,211.722046
4,2020-01-08,74.257103,155.636703,70.251999,,70.216003,59.887646,309.294556,213.3134


In [57]:
df2 = pd.read_excel("/content/gdrive/MyDrive/Colab Notebooks/data/stocks.xlsx")
df2.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,73.561554,156.151947,68.433998,,68.3685,59.770554,309.694946,212.340897
1,2020-01-03,72.846367,154.207565,68.075996,,68.032997,58.813869,307.349792,210.395889
2,2020-01-06,73.426834,154.606171,69.890503,,69.710503,59.060516,308.522369,211.751511
3,2020-01-07,73.081497,153.196533,69.755501,,69.667,59.775532,307.654907,211.722046
4,2020-01-08,74.257103,155.636703,70.251999,,70.216003,59.887646,309.294556,213.3134
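One thing to watch when reading a CSV back: `read_csv` returns the `Date` column as text unless you ask for dates. A short sketch using an in-memory CSV with two rows copied from the output above:

```python
import io

import pandas as pd

csv_text = """Date,AAPL,MSFT
2020-01-02,73.561554,156.151947
2020-01-03,72.846367,154.207565
"""

# Default: Date is read as text
as_text = pd.read_csv(io.StringIO(csv_text))
# parse_dates converts the named column to datetimes while reading
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(as_text["Date"].dtype)
print(as_dates["Date"].dtype)
```

Parsing the dates up front matters as soon as you filter by date range or resample.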


## Quick and Dirty Analysis 

In [48]:
# -- descriptive stats AAPL -- 
df["AAPL"].describe()

count    504.000000
mean     116.694674
std       29.423690
min       55.082973
25%       90.122931
50%      122.373371
75%      139.942383
max      179.289444
Name: AAPL, dtype: float64

In [49]:
stocks.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,73.561531,156.151932,68.433998,,68.3685,59.77055,309.694885,212.340942
1,2020-01-03,72.846359,154.207581,68.075996,,68.032997,58.813866,307.349762,210.395905
2,2020-01-06,73.426819,154.606171,69.890503,,69.710503,59.060513,308.522369,211.75148
3,2020-01-07,73.081505,153.196487,69.755501,,69.667,59.775536,307.654907,211.722046
4,2020-01-08,74.257095,155.636673,70.251999,,70.216003,59.887642,309.294556,213.313431


In [58]:
import altair as alt

alt.Chart(stocks).mark_line().encode(
  x='Date:T',
  y='AAPL',
).interactive(bind_y=False).properties(
    title='Apple Stock Price 2020 - 2021',
    width=1000,
    height=250
)

In [59]:
# index by date so that only the price columns are divided
stocks = stocks.set_index(['Date'])
# divide every row by the first row, so each series starts at 1.0 ("growth of $1")
stocks_nrml = stocks.div(stocks.iloc[0])
stocks_nrml = stocks_nrml.reset_index() 
stocks_nrml.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,1.0,1.0,1.0,,1.0,1.0,1.0,1.0
1,2020-01-03,0.990278,0.987548,0.994769,,0.995093,0.983994,0.992428,0.99084
2,2020-01-06,0.998169,0.990101,1.021283,,1.019629,0.988121,0.996214,0.997224
3,2020-01-07,0.993474,0.981073,1.019311,,1.018993,1.000083,0.993413,0.997086
4,2020-01-08,1.009455,0.9967,1.026566,,1.027023,1.001959,0.998707,1.00458
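The normalization step divides every price by the price on the first day, so each series starts at 1.0 and shows the growth of one dollar invested on that day. A self-contained sketch with three rows copied from the earlier output; only price columns should be divided, which is why `Date` is moved into the index first.

```python
import pandas as pd

prices = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "AAPL": [73.561554, 72.846367, 73.426834],
    "MSFT": [156.151947, 154.207565, 154.606171],
}).set_index("Date")

# Divide each row by the first row, column by column
normalized = prices.div(prices.iloc[0])
print(normalized.round(6))
```

Any extra numeric column that is zero in the first row, such as a leftover integer index, would produce `NaN` and `inf` values under the same division.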


In [60]:
alt.Chart(stocks_nrml).transform_fold(
    ['AAPL', 'MSFT', 'GOOG'],
    as_=['company', 'growth of $1']
).mark_line().encode(
    x='Date:T',
    y='growth of $1:Q',
    color='company:N'
).interactive(bind_y=False).properties(
    title='Apple vs Google vs Microsoft, 2020 - 2021',
    width=1000,
    height=250
)
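Altair's `transform_fold` reshapes the wide table (one column per company) into a long one (one row per date and company) so that a single `color` encoding can separate the lines. A rough pandas equivalent is `melt`, sketched here on two rows of normalized values from the output above. One difference: `melt` drops the original wide columns.

```python
import pandas as pd

wide = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03"]),
    "AAPL": [1.0, 0.990278],
    "MSFT": [1.0, 0.987548],
    "GOOG": [1.0, 0.995093],
})

# One row per (Date, company) pair, mirroring the as_ names used in the chart
long = wide.melt(
    id_vars="Date",
    value_vars=["AAPL", "MSFT", "GOOG"],
    var_name="company",
    value_name="growth of $1",
)
print(long)
```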

In [61]:
%%shell 

jupyter nbconvert --to html "/content/gdrive/MyDrive/Colab Notebooks/1_Colab_Tutorial_Stockmarket_Data.ipynb"


This application is used to convert notebook files (*.ipynb)
        to various other formats.


CalledProcessError: ignored
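nbconvert prints its usage text and exits with an error when it cannot find a notebook to convert, so a wrong path is one plausible cause of the failure above (an assumption; the output does not show the actual reason). The sketch below checks that the file exists before building the command. The helper name `nbconvert_command` is hypothetical; the command itself is the one used in the cell above.

```python
from pathlib import Path

def nbconvert_command(notebook_path):
    """Return the nbconvert command as an argument list, or raise if the notebook is missing."""
    path = Path(notebook_path)
    if not path.is_file():
        raise FileNotFoundError(f"Notebook not found: {path}")
    return ["jupyter", "nbconvert", "--to", "html", str(path)]

notebook = "/content/gdrive/MyDrive/Colab Notebooks/1_Colab_Tutorial_Stockmarket_Data.ipynb"
try:
    print(" ".join(nbconvert_command(notebook)))
except FileNotFoundError as err:
    print(err)
```

The returned list can be passed to `subprocess.run(..., check=True)` to run the conversion once the path is confirmed.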