# Day 4
## Retrieving data from Yahoo!Finance
[Yahoo!Finance](https://au.finance.yahoo.com/) offers historical market data, several years of recent financial statements, current quotes, analyst recommendations, options data, and more. Historical trading data are available at daily, weekly, and monthly frequencies, along with dividend histories. Each historical record has several variables: open price, highest price reached, lowest price reached, close price, adjusted-close price (adjusted for splits and dividends), and trading volume. Historical quotes typically do not go back further than 1960.

In [None]:
import pandas as pd
x=pd.read_csv("AAPL.csv")
print(type(x))

To view the first and the last few observations, use the `.head()` and `.tail()` methods. Both show 5 rows by default. Below, `x.head()` outputs the first five rows, while `x.tail(2)` outputs the last two:

In [None]:
x.head()

In [None]:
x.tail(2)

## Accessing specific elements

In [None]:
print(x['Open'])

In [None]:
print(x['Close'])

In [None]:
print(x['Open']-x['Close'])

In [None]:
(x['Open']-x['Close']).plot(title='Open-Close')

Another way of thinking about stock volatility is to consider High-Low. What are the benefits of this measure compared to standard deviation?

In [None]:
(x['High']-x['Low']).plot(title='High-Low')

The measure above did not account for a trend in price over the years. So, an adjustment needs to be made:

In [None]:
((x['High']-x['Low'])/x['Close']).plot(title='(High-Low)/Close')
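A minimal sketch of why dividing by the close matters. The two rows below are made-up prices (not AAPL data) with the same 2% intraday range at two different price levels; the raw range differs tenfold while the scaled range is identical.

```python
import pandas as pd

# Hypothetical prices: the same relative range at two different price levels
demo = pd.DataFrame({
    "High":  [10.2, 102.0],
    "Low":   [10.0, 100.0],
    "Close": [10.0, 100.0],
})

raw_range = demo["High"] - demo["Low"]    # grows with the price level
rel_range = raw_range / demo["Close"]     # comparable across price levels

print(raw_range.round(2).tolist())
print(rel_range.round(4).tolist())
```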

## Series

`Series` is a one-dimensional **labeled** array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The labels are collectively referred to as the index. Financial data, such as stock prices or traded volume, contain not only numeric values but also time stamps, which can serve as the labels of a `Series`.

The basic method to create a `Series` is to call:

`>>> s = pd.Series(data, index=index)`
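As a small illustration of that constructor, the sketch below builds a `Series` from made-up values and dates (both are hypothetical, chosen only for the example) and reads one value back by label and one by position.

```python
import pandas as pd

# Hypothetical values and dates, used only to illustrate the constructor
data = [1.5, -0.3, 0.8]
index = pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"])

s = pd.Series(data, index=index)

print(s.index[0])                        # the first label is a timestamp
print(s.loc[pd.Timestamp("2023-01-04")]) # look up a value by its label
print(s.iloc[0])                         # look up a value by its position
```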

In [None]:
ts = pd.Series(list(x['Open']-x['Close']),x['Date'])

In [None]:
print(ts)

In [None]:
ts.plot(title='Open-Close',figsize=(12,8))

In [None]:
print(type(x['Adj Close']))

In [None]:
P = x['Adj Close'].to_list()
print(type(P))

Suppose you want to calculate returns ($r_t$) given a series of prices ($P_t$).

Your returns are   $r_{t}=\frac{P_{t}-P_{t-1}}{P_{t-1}}$.

In [None]:
# P is a plain Python list, so this element-wise arithmetic raises a TypeError
r = (P[1:]-P[0:-1])/P[0:-1]
print(r)

In [None]:
# Need to use list comprehensions, and destructuring
r = [(x1 - x2)/x2 for (x1, x2) in zip(P[1:], P[0:-1])]

print(r[0:25])  # This prints the first 25 values in the list 'r'
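To check the formula by hand, here is the same list comprehension applied to a three-price path. The prices are hypothetical and picked so that the two returns come out as a 10% gain followed by a 10% loss.

```python
# Hypothetical price path, chosen so the returns are easy to verify by hand
P_demo = [100.0, 110.0, 99.0]

# Pair each price with the one before it: (P_t, P_{t-1})
r_demo = [(p_t - p_prev) / p_prev
          for p_t, p_prev in zip(P_demo[1:], P_demo[:-1])]

print([round(v, 4) for v in r_demo])
```

Note that the list of returns is one element shorter than the list of prices, because the first price has no predecessor.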

Alternatively, you can extract specific elements by position using the `iloc[]` indexer.

Recall our dataframe `x`:

In [None]:
x.head()

In [None]:
x.iloc[0,0]

In [None]:
x.iloc[0,1]

In [None]:
x.iloc[1:,5]

In [None]:
x.iloc[0:-1,5]

Operations on arrays of data are vector and matrix operations and are done through `numpy` (you can also visualise this in **Excel** by opening `AAPL.csv` and shifting blocks of data):

In [None]:
import numpy as np
r2 = (np.array(x.iloc[1:,5])-np.array(x.iloc[0:-1,5]))/np.array(x.iloc[0:-1,5])
print(r2)

In [None]:
AAPLret = pd.Series(r2,x.iloc[1:,0])  # why do the Series labels start from row 1? ==> x.iloc[1:,0]
print(AAPLret)
AAPLret.plot(title='AAPL Returns')
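One way to see why the labels start at the second date: the first return needs both the first and the second price, so it belongs to the second day. The sketch below uses a made-up three-day table (hypothetical dates and prices standing in for the CSV file) and also compares the result with pandas' built-in `pct_change()`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-day sample standing in for the AAPL file
demo = pd.DataFrame({
    "Date": ["2023-01-03", "2023-01-04", "2023-01-05"],
    "Adj Close": [100.0, 110.0, 99.0],
})

prices = demo["Adj Close"].to_numpy()
returns = (prices[1:] - prices[:-1]) / prices[:-1]   # element-wise, length n-1

# The first return uses the prices of day 0 and day 1, so it is dated day 1
labelled = pd.Series(returns, index=demo["Date"].iloc[1:].to_numpy())
print(labelled)

# pct_change() gives the same numbers, with NaN on day 0
builtin = demo["Adj Close"].pct_change()
print(np.allclose(builtin.iloc[1:].to_numpy(), returns))
```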

### <font color=red> Homework </font>
> Plot percentage change in traded volume instead of returns

## Getting data directly from the Web
Another way of getting data is directly from the online source, using the `yfinance` library. For the latest developments and the list of sources see [here](https://pypi.org/project/yfinance/).

Note that if you want a **quiet** installation (no output during the installation process), use `pip -q install package_name` instead of `pip install package_name`:

In [None]:
import sys
!{sys.executable} -m pip install yfinance

In [None]:
import yfinance as yf

aapl = yf.download("AAPL",  
                   start='1980-12-11', 
                   end='2023-01-25')

In [None]:
aapl.head()

In [None]:
aapl.tail(2)

In [None]:
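# Only get the adjusted close for a specific period.
# auto_adjust=False keeps the 'Adj Close' column; recent yfinance
# releases may adjust prices by default and omit it.
aapl_oneyear = yf.download("AAPL",  
                   start='2014-1-1', 
                   end='2014-12-31',
                   auto_adjust=False)['Adj Close']
print(type(aapl_oneyear))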

In [None]:
aapl_oneyear.head()

In [None]:
aapl_oneyear.tail()

In [None]:
print(aapl_oneyear[0:])

In [None]:
# Convert the adjusted closing prices to daily (period-over-period) returns.
ret = aapl_oneyear.pct_change() # fill_method='ffill' if there are missing values
print(ret)
print(type(ret))

In [None]:
((1 + ret).cumprod() - 1).plot(title='AAPL Cumulative Returns',figsize=(12,8))
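The expression `(1 + ret).cumprod() - 1` compounds the daily returns, which is the same as measuring each price against the first one, $P_t/P_0 - 1$. A short check on made-up prices (hypothetical, not AAPL data):

```python
import pandas as pd

# Hypothetical prices, not real AAPL data
prices = pd.Series([100.0, 110.0, 99.0, 121.0])

ret = prices.pct_change()            # first element is NaN (no previous price)
cum = (1 + ret).cumprod() - 1        # compounded return since the first price

# Compounding the daily returns reproduces P_t / P_0 - 1
direct = prices / prices.iloc[0] - 1

print(cum.round(4).tolist())
print(direct.round(4).tolist())
```

The only difference is the first element: `cum` starts with `NaN`, while `direct` starts at zero.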

Apply to all columns:

In [None]:
ret = aapl.pct_change()
((1 + ret).cumprod() - 1).plot(title='AAPL Cumulative Returns',figsize=(12,8))


In [None]:
# Select the column by name; its position can differ between yfinance versions
vol = aapl['Volume'].plot(title='AAPL Volume',figsize=(12,8))

In [None]:
# Create a dictionary containing several stocks
ticker_list={'INTC':'Intel', 
            'MSFT':'Microsoft',
            'BHP':'BHP',
            'BA':'Boeing',
            'TM':'Toyota',
            'GME':'GameStop',
            'BABA':'Alibaba'}
print(ticker_list)

In [None]:
import datetime as dt

def read_data(ticker_list,
              start=dt.datetime(2020,1,2),
              end=dt.datetime(2024,1,15)):
    ticker=pd.DataFrame()
    
    for tick in ticker_list:
        prices=yf.download(tick, start, end)
        closing_prices=prices['Close']
        ticker[tick]=closing_prices
        
    return ticker

In [None]:
x=read_data(ticker_list)

In [None]:
x.tail()
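When columns are added to a frame one at a time, pandas aligns each new column on the dates already in the frame. The sketch below illustrates this with two made-up tickers (`AAA` and `BBB`, hypothetical prices and dates) and contrasts it with `pd.concat`, which keeps the union of all dates. It is an illustration of the alignment behaviour, not a replacement for the function above.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers with different trading days
aaa = pd.Series([10.0, 10.5, 10.2],
                index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"]))
bbb = pd.Series([50.0, 51.0],
                index=pd.to_datetime(["2023-01-04", "2023-01-06"]))

# Column-by-column assignment: the first column fixes the dates of the frame
frame = pd.DataFrame()
frame["AAA"] = aaa
frame["BBB"] = bbb        # 2023-01-06 is dropped, missing dates become NaN

# pd.concat keeps every date that appears in either series
combined = pd.concat({"AAA": aaa, "BBB": bbb}, axis=1)

print(frame)
print(combined)
```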

In [None]:
x.plot(title=ticker_list,subplots=True,figsize=(12,30))

In [None]:
# Not recommended, but try:   
#         title='Single title string' 
# instead of 
#         title=ticker_list

x.plot(title='Single title string',subplots=True,figsize=(12,30))