## What is Colab? 

Google Colaboratory (Colab) is a free, hosted Jupyter notebook environment that runs on Google's cloud servers and gives you access to backend hardware such as GPUs and TPUs. You can do everything you would do in a Jupyter notebook on your local machine, without installing or configuring anything locally.




This notebook is simply a sample that you can use to tinker with. You may have noticed I like dealing with stock market data. In this notebook we'll use Colab to:

1. Mount your Google Drive
2. Load libraries
3. Download data from Yahoo Finance using pandas-datareader
4. Write data to your Google Drive
     - write CSV & Excel files to your Drive
     - create a new Google Sheets doc and write a dataframe to sheet 1
5. Read data from your Google Drive
     - read CSV & Excel files
     - read from Google Sheets





[*********************100%***********************]  1 of 1 completed
                  Open        High  ...   Adj Close     Volume
Date                                ...                       
2017-01-03  225.039993  225.830002  ...  207.534515   91366500
2017-01-04  225.619995  226.750000  ...  208.769211   78744400
2017-01-05  226.270004  226.580002  ...  208.603333   78379000
2017-01-06  226.529999  227.750000  ...  209.349670   71559900
2017-01-09  226.910004  227.070007  ...  208.658646   46939700
...                ...         ...  ...         ...        ...
2017-04-24  237.179993  237.410004  ...  219.477432  119209900
2017-04-25  237.910004  238.949997  ...  220.754486   76698300
2017-04-26  238.509995  239.529999  ...  220.615677   84702500
2017-04-27  238.770004  238.949997  ...  220.800720   57410300
2017-04-28  238.899994  238.929993  ...  220.319565   63532800

[81 rows x 6 columns]


## 1. Mount your Google Drive so you can read and write files there

Copy and paste the code below and run it. You should be prompted to click a URL that allows Colab to access your Google Drive. Once you select Allow, you will be presented with an access code; copy it and paste it into the prompt box.

```python
from google.colab import drive
drive.mount('/content/gdrive')

```


In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive
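
Before reading or writing files, it can help to confirm that the mount actually succeeded. The following is a minimal sketch, not part of the original notebook: the helper name `drive_is_mounted` is hypothetical, and the `MyDrive` subfolder is assumed from the paths used later in this notebook.

```python
import os

def drive_is_mounted(mount_point="/content/gdrive"):
    """Return True if the mount point exposes a MyDrive folder."""
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))

if drive_is_mounted():
    print("Google Drive is mounted.")
else:
    print("Google Drive is not mounted; run drive.mount('/content/gdrive') first.")
```

Outside Colab the check simply reports that the drive is not mounted, so the cell is safe to run anywhere.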



### Install and update


*   Install the yfinance package
*   Update pandas-datareader




In [None]:
!pip install yfinance
!pip install --upgrade pandas-datareader

## 2. Load our libraries!

Here we are going to load our basic libraries. 

In [1]:
import pandas as pd
import numpy as np
import pandas_datareader as pdr
import matplotlib.pyplot as plt
import yfinance as yf
import datetime as dt
import scipy.optimize as sco

## 3a. Next, let's download data for IBM

Let's see if we can download stock market data from Yahoo Finance.

In [2]:
IBM = pdr.DataReader("IBM",
                     data_source='yahoo',
                     start='2020-1-1',
                     end='2021-12-31')
# move Date from the index into a regular column
IBM = IBM.reset_index()
# give the adjusted close a name without a space
IBM = IBM.rename(columns={"Adj Close": "ADJ_CLOSE"})
IBM.head()

Unnamed: 0,Date,High,Low,Open,Close,Volume,ADJ_CLOSE
0,2020-01-02,135.919998,134.770004,135.0,135.419998,3148600.0,124.142937
1,2020-01-03,134.860001,133.559998,133.570007,134.339996,2373700.0,123.152855
2,2020-01-06,134.240005,133.199997,133.419998,134.100006,2425500.0,122.932846
3,2020-01-07,134.960007,133.399994,133.690002,134.190002,3090800.0,123.015358
4,2020-01-08,135.860001,133.919998,134.509995,135.309998,4346000.0,124.042076


## 3b. Get data function
Here is a simple function that takes a list of symbols to download and returns a data frame of adjusted closing prices, one column per symbol.

In [3]:

def get_data(symbols):
  """Download data from Yahoo Finance and return a dataframe of date + adjusted closes.

  Keyword arguments:
  symbols -- list of stock market symbols
  """
  # create an empty data frame
  df = pd.DataFrame()
  # for each symbol in symbols, get the data and extract the adjusted close
  for symbol in symbols:
    df[symbol] = pdr.DataReader(symbol,
                                data_source='yahoo',
                                start='2020-1-1',
                                end='2021-12-31')['Adj Close']
  # make sure the columns are named after the symbols
  df.columns = symbols
  # move Date from the index into a regular column
  return df.reset_index()

symbols = ['AAPL', 'MSFT', 'GOOGL', 'FB', 'GOOG', 'NVDA', 'SPY', 'QQQ']

stocks = get_data(symbols)
stocks.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,74.096443,157.903488,1368.680054,209.779999,1367.369995,59.844322,316.83667,214.416458
1,2020-01-03,73.376083,155.937286,1361.52002,208.669998,1360.660034,58.886452,314.4375,212.452408
2,2020-01-06,73.96077,156.340347,1397.810059,212.600006,1394.209961,59.1334,315.637115,213.821289
3,2020-01-07,73.61293,154.914871,1395.109985,213.059998,1393.339966,59.849308,314.749573,213.791519
4,2020-01-08,74.797081,157.382431,1405.040039,215.220001,1404.319946,59.961559,316.427063,215.398438
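
The column-by-column loop in `get_data` can also be expressed with `pd.concat`. The sketch below is an alternative under stated assumptions, not the notebook's method: `combine_closes` is a hypothetical helper, and the two made-up series stand in for the per-symbol adjusted closes so the cell runs without a network call.

```python
import pandas as pd

def combine_closes(series_by_symbol):
    """Join per-symbol price Series into one frame with a Date column."""
    # dict keys become the column names; rows are aligned on the date index
    df = pd.concat(series_by_symbol, axis=1)
    df.index.name = "Date"
    return df.reset_index()

# Made-up prices for two hypothetical tickers
dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
fake = {
    "AAA": pd.Series([10.0, 11.0], index=dates),
    "BBB": pd.Series([20.0, 19.0], index=dates),
}
combined = combine_closes(fake)
print(combined)
```

One difference worth knowing: `pd.concat` keeps the union of all dates, while assigning columns into a frame one at a time aligns every later symbol to the dates of the first one.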


## 4. Write to disk
### Here is an example of writing CSV and Excel files to my Google Drive

Your location will be different!

In [4]:
stocks.to_csv("/content/gdrive/MyDrive/Colab Notebooks/MGT4192/stocks.csv", index=False)
stocks.to_excel("/content/gdrive/MyDrive/Colab Notebooks/MGT4192/stocks.xlsx", index=False)
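
Writing fails if the target folder does not exist yet, so it can be worth creating it first. This is a minimal sketch rather than the notebook's own code: the small `sample` frame holds made-up values in place of `stocks`, and a temporary folder is used so the cell runs anywhere. In Colab you would point `out_dir` at a folder under `/content/gdrive/MyDrive` instead.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the `stocks` frame; the values are made up for illustration.
sample = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03"],
    "AAPL": [74.10, 73.38],
})

# Temporary folder for the sketch; replace with your own Drive folder in Colab.
out_dir = os.path.join(tempfile.mkdtemp(), "MGT4192")
os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists

csv_path = os.path.join(out_dir, "stocks.csv")
sample.to_csv(csv_path, index=False)

# Read the file back to confirm the write succeeded.
round_trip = pd.read_csv(csv_path)
print(round_trip)
```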

### Here is an example of writing a data frame to Google Sheets

1. Import the library
2. Authenticate
3. Create the interface to Sheets ("gc")




In [5]:
from google.colab import auth
auth.authenticate_user()

import gspread
from gspread_dataframe import set_with_dataframe
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

After executing the cell below, you will see a new spreadsheet named 'My new spreadsheet' at https://sheets.google.com. Open it and you should see our dataset written to the spreadsheet.

1. gc.create() creates a new spreadsheet
2. sh.get_worksheet(0) returns the worksheet pointer you need to write to the sheet
3. set_with_dataframe() writes your data frame to the worksheet


In [6]:
sh = gc.create('My new spreadsheet')
worksheet = sh.get_worksheet(0) #-> 0 - first sheet, 1 - second sheet etc.
set_with_dataframe(worksheet, stocks)

## 5. Reading CSV and Excel from Google Drive

In [7]:
df = pd.read_csv("/content/gdrive/MyDrive/Colab Notebooks/MGT4192/stocks.csv")
df.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,74.096443,157.903488,1368.680054,209.779999,1367.369995,59.844322,316.83667,214.416458
1,2020-01-03,73.376083,155.937286,1361.52002,208.669998,1360.660034,58.886452,314.4375,212.452408
2,2020-01-06,73.96077,156.340347,1397.810059,212.600006,1394.209961,59.1334,315.637115,213.821289
3,2020-01-07,73.61293,154.914871,1395.109985,213.059998,1393.339966,59.849308,314.749573,213.791519
4,2020-01-08,74.797081,157.382431,1405.040039,215.220001,1404.319946,59.961559,316.427063,215.398438


In [8]:
df2 = pd.read_excel("/content/gdrive/MyDrive/Colab Notebooks/MGT4192/stocks.xlsx")
df2.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,74.096443,157.903488,1368.680054,209.779999,1367.369995,59.844322,316.83667,214.416458
1,2020-01-03,73.376083,155.937286,1361.52002,208.669998,1360.660034,58.886452,314.4375,212.452408
2,2020-01-06,73.96077,156.340347,1397.810059,212.600006,1394.209961,59.1334,315.637115,213.821289
3,2020-01-07,73.61293,154.914871,1395.109985,213.059998,1393.339966,59.849308,314.749573,213.791519
4,2020-01-08,74.797081,157.382431,1405.040039,215.220001,1404.319946,59.961559,316.427063,215.398438


## Reading from Google Sheets

Simply reuse the worksheet opened earlier and get all the records from the sheet.

In [9]:
dataframe = pd.DataFrame(worksheet.get_all_records())
dataframe.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02 0:00:00,74.096443,157.903488,1368.680054,209.779999,1367.369995,59.844322,316.83667,214.416458
1,2020-01-03 0:00:00,73.376083,155.937286,1361.52002,208.669998,1360.660034,58.886452,314.4375,212.452408
2,2020-01-06 0:00:00,73.96077,156.340347,1397.810059,212.600006,1394.209961,59.1334,315.637115,213.821289
3,2020-01-07 0:00:00,73.61293,154.914871,1395.109985,213.059998,1393.339966,59.849308,314.749573,213.791519
4,2020-01-08 0:00:00,74.797081,157.382431,1405.040039,215.220001,1404.319946,59.961559,316.427063,215.398438
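
Note that the Date values in this output carry a time component ("0:00:00"), which suggests they come back from Sheets as text rather than as timestamps. A minimal sketch of converting them, assuming that is the case; the `records` list below is a hand-made stand-in for `worksheet.get_all_records()`, with values copied from the output above.

```python
import pandas as pd

# Stand-in for the records returned by the worksheet
records = [
    {"Date": "2020-01-02 0:00:00", "AAPL": 74.096443},
    {"Date": "2020-01-03 0:00:00", "AAPL": 73.376083},
]
from_sheets = pd.DataFrame(records)

# Parse the text dates into real timestamps
from_sheets["Date"] = pd.to_datetime(from_sheets["Date"])
print(from_sheets.dtypes)
```

With real timestamps in place, date-based filtering and plotting behave the same way as they do for the frames read from CSV with parsed dates.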


## Quick and Dirty Analysis 

In [10]:
# -- descriptive stats AAPL -- 
df["AAPL"].describe()

count    420.000000
mean     109.956695
std       26.068996
min       55.483528
25%       80.550295
50%      118.264339
75%      129.970272
max      153.119995
Name: AAPL, dtype: float64

In [11]:
stocks.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,74.096443,157.903488,1368.680054,209.779999,1367.369995,59.844322,316.83667,214.416458
1,2020-01-03,73.376083,155.937286,1361.52002,208.669998,1360.660034,58.886452,314.4375,212.452408
2,2020-01-06,73.96077,156.340347,1397.810059,212.600006,1394.209961,59.1334,315.637115,213.821289
3,2020-01-07,73.61293,154.914871,1395.109985,213.059998,1393.339966,59.849308,314.749573,213.791519
4,2020-01-08,74.797081,157.382431,1405.040039,215.220001,1404.319946,59.961559,316.427063,215.398438


In [12]:
import altair as alt

alt.Chart(stocks).mark_line().encode(
  x='Date:T',
  y='AAPL',
).interactive(bind_y=False).properties(
    title='Apple Stock Price 2020 - Present',
    width=1000,
    height=250
)

In [13]:
# index by Date so that only the price columns are divided
stocks_nrml = stocks.set_index('Date')
# divide every row by the first row, so each series starts at 1.0
stocks_nrml = stocks_nrml.div(stocks_nrml.iloc[0])
stocks_nrml = stocks_nrml.reset_index()
stocks_nrml.head()

Unnamed: 0,Date,AAPL,MSFT,GOOGL,FB,GOOG,NVDA,SPY,QQQ
0,2020-01-02,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,2020-01-03,0.990278,0.987548,0.994769,0.994709,0.995093,0.983994,0.992428,0.99084
2,2020-01-06,0.998169,0.990101,1.021283,1.013443,1.019629,0.98812,0.996214,0.997224
3,2020-01-07,0.993475,0.981073,1.019311,1.015635,1.018993,1.000083,0.993413,0.997085
4,2020-01-08,1.009456,0.9967,1.026566,1.025932,1.027023,1.001959,0.998707,1.00458
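
The normalization divides every price by the first day's price, so each series shows the growth of $1 invested on the first date. Here is a small worked sketch with made-up prices for two hypothetical tickers, `AAA` and `BBB`. The frame should contain only price columns: any other numeric column whose first value is 0, such as a leftover integer index, would be divided too and produce NaN and inf.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers, indexed by date
prices = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Divide every row by the first row so each series starts at 1.0
growth = prices.div(prices.iloc[0])
print(growth)
```

In this toy data `AAA` ends at 60 / 50 = 1.2 and `BBB` at 210 / 200 = 1.05, which makes the two directly comparable even though their price levels differ.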


In [14]:
alt.Chart(stocks_nrml).transform_fold(
    ['AAPL', 'MSFT', 'GOOG'],
    as_=['company', 'growth of $1']
).mark_line().encode(
    x='Date:T',
    y='growth of $1:Q',
    color='company:N'
).interactive(bind_y=False).properties(
    title='Apple vs Google vs Microsoft, 2020 - Present',
    width=1000,
    height=250
)
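
`transform_fold` reshapes the wide frame (one column per company) into a long one (one row per date and company) before plotting. Roughly the same reshaping can be done in pandas with `melt`; the sketch below uses a two-row stand-in for `stocks_nrml` with values taken from the normalized output above, and is an illustration rather than what Altair does internally.

```python
import pandas as pd

# Two-row stand-in for the normalized frame
wide = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03"],
    "AAPL": [1.0, 0.990278],
    "MSFT": [1.0, 0.987548],
    "GOOG": [1.0, 0.995093],
})

# One row per (Date, company) pair instead of one column per company
long = wide.melt(
    id_vars="Date",
    value_vars=["AAPL", "MSFT", "GOOG"],
    var_name="company",
    value_name="growth of $1",
)
print(long)
```

A long frame like this can be passed to `alt.Chart` directly, with `color='company:N'`, if you prefer to reshape in pandas rather than in the chart.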

In [None]:
%%shell 

jupyter nbconvert --to html "/content/gdrive/MyDrive/Colab Notebooks/1_Colab_Tutorial_Stockmarket_Data.ipynb"
