**Dénes Csala**  
University of Bristol, 2022  

Based on *Elements of Data Science* ([Allen B. Downey](https://allendowney.com), 2021) and *Python Data Science Handbook* ([Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/), 2018)

License: [MIT](https://mit-license.org/)

# Loading financial data into _pandas_

First, install the _Yahoo Finance_ package. If the first character of a cell is `!`, the rest of the line is run as a _Linux_ shell command instead of _Python_ code. The cell below installs the `yfinance` _Python package_ using the `pip` package manager. If you run the cell a second time, the package is already installed, so you only get `Requirement already satisfied` messages (in _Colab_, installed packages persist for about 8 hours).

In [None]:
!pip install yfinance

Collecting yfinance
  Downloading yfinance-0.1.67-py2.py3-none-any.whl (25 kB)
Collecting lxml>=4.5.1
  Downloading lxml-4.6.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.3 MB)
     |████████████████████████████████| 6.3 MB 14.8 MB/s
Installing collected packages: lxml, yfinance
  Attempting uninstall: lxml
    Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
Successfully installed lxml-4.6.4 yfinance-0.1.67
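
If you would rather skip the install step when the package is already present, a check along the following lines is one option. This is only a minimal sketch: `ensure_installed` is a hypothetical helper name (not part of `pip` or _Colab_), and it assumes that the importable module name is the same as the name used for installation, which holds for `yfinance` but not for every package.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package_name):
    """Install package_name with pip only if it cannot be imported yet.

    Returns True if an install was attempted, False if it was already present.
    (Hypothetical helper, for illustration only.)
    """
    if importlib.util.find_spec(package_name) is not None:
        return False  # already importable, nothing to do
    # Use the pip that belongs to the running interpreter
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
    return True


# Example with a standard-library module, which is always present
print(ensure_installed("json"))  # False
```

In a notebook you would call `ensure_installed("yfinance")` once, before the `import` cell.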


In [None]:
import yfinance as yf

In [None]:
nvda = yf.Ticker("NVDA").history(period='5y')
tsla = yf.Ticker("TSLA").history(period='5y')
ryaay = yf.Ticker("RYAAY").history(period='5y')
wizz = yf.Ticker("WIZZ.L").history(period='5y')

We have now downloaded five years of data for four stocks: the computer graphics company NVIDIA (`NVDA`), the electric vehicle manufacturer Tesla (`TSLA`), and the low-cost airlines Ryanair (`RYAAY`) and Wizz Air (`WIZZ.L`).

The responses returned are _pandas_ `DataFrame`s. They contain [OHLC](https://www.investopedia.com/terms/o/ohlcchart.asp) data, but we only need the `Close` column this time. Let us also add a `Name` column that holds the ticker symbol.

In [None]:
nvda=nvda[['Close']].copy() #take an explicit copy to avoid the SettingWithCopyWarning
nvda['Name']='NVDA'


In [None]:
nvda.head()

Date,Close,Name
2016-11-28,23.224274,NVDA
2016-11-29,23.012047,NVDA
2016-11-30,22.752928,NVDA
2016-12-01,21.627621,NVDA
2016-12-02,21.827509,NVDA
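
Selecting a subset of columns and then assigning a new column to the result can trigger the _pandas_ `SettingWithCopyWarning`, because _pandas_ cannot always tell whether the selection is a view of the original or an independent copy. The sketch below shows two common ways around it. The small OHLC table and the `DEMO` name are made up for illustration; they are not real prices and stand in for a `yfinance` download.

```python
import pandas as pd

# Made-up OHLC data standing in for a yfinance download (not real prices)
ohlc = pd.DataFrame(
    {
        "Open": [10.0, 10.5, 11.0],
        "High": [10.8, 11.2, 11.5],
        "Low": [9.9, 10.4, 10.7],
        "Close": [10.5, 11.0, 11.2],
    },
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)
ohlc.index.name = "Date"

# Option 1: take an explicit copy, then add the column in place
close_a = ohlc[["Close"]].copy()
close_a["Name"] = "DEMO"

# Option 2: assign() returns a new DataFrame with the extra column
close_b = ohlc[["Close"]].assign(Name="DEMO")

print(close_a.equals(close_b))  # True
print(list(ohlc.columns))       # ['Open', 'High', 'Low', 'Close']
```

Both options leave the original table untouched, so the remaining OHLC columns are still available if you need them later.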


This is what we want. However, with multiple stocks, surely there is a more efficient way to automate this.

In [None]:
dfs=[] #create empty list of dataframes
for x in ['NVDA','TSLA','RYAAY','WIZZ.L']:
  df = yf.Ticker(x).history(period='5y')
  df=df[['Close']].copy() #keep only the closing price; copy to avoid the SettingWithCopyWarning
  df['Name']=x
  dfs.append(df) #append the newly downloaded and formatted dataframe to our list of dataframes

Great. Now we have a list of `DataFrame`s, each containing the closing stock price and the stock name, for the past 5 years. 

# DataFrame combination

## Concatenation

In [None]:
len(dfs)

4

In [None]:
dfs[0].head(2)

Date,Close,Name
2016-11-28,23.22427,NVDA
2016-11-29,23.012047,NVDA


In [None]:
dfs[1].head(2)

Date,Close,Name
2016-11-28,39.223999,TSLA
2016-11-29,37.914001,TSLA


We can combine `DataFrame`s by stacking them on top of each other using `concat`. They should have the same `column` names (otherwise, the missing columns are created and filled with `NaN`s). The `pd.concat` function expects its first argument to be a collection of `DataFrame`s, such as a `list`, rather than the `DataFrame`s themselves as separate arguments. The `DataFrame`s to be combined therefore have to be passed in the format `[dfA, dfB]`.

In [None]:
import pandas as pd
pd.concat([ dfs[1],dfs[2] ])

Date,Close,Name
2016-11-28,39.223999,TSLA
2016-11-29,37.914001,TSLA
2016-11-30,37.880001,TSLA
2016-12-01,36.375999,TSLA
2016-12-02,36.293999,TSLA
...,...,...
2021-11-19,105.889999,RYAAY
2021-11-22,102.750000,RYAAY
2021-11-23,104.000000,RYAAY
2021-11-24,103.800003,RYAAY


In [None]:
pd.concat(dfs)

Date,Close,Name
2016-11-28,23.224270,NVDA
2016-11-29,23.012047,NVDA
2016-11-30,22.752930,NVDA
2016-12-01,21.627615,NVDA
2016-12-02,21.827509,NVDA
...,...,...
2021-11-23,4300.000000,WIZZ.L
2021-11-24,4352.000000,WIZZ.L
2021-11-25,4399.000000,WIZZ.L
2021-11-26,3729.000000,WIZZ.L
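
To see what happens when the column names do not match, here is a minimal sketch with two tiny made-up frames (the values and the labels `d1`, `d2`, `AAA` are invented for illustration). The second frame has no `Name` column, so `concat` fills that column with `NaN` for its rows. Note also that the original index labels are kept, so they repeat in the stacked result.

```python
import pandas as pd

# Two tiny made-up frames; the second one lacks the "Name" column
a = pd.DataFrame({"Close": [1.0, 2.0], "Name": ["AAA", "AAA"]}, index=["d1", "d2"])
b = pd.DataFrame({"Close": [3.0, 4.0]}, index=["d1", "d2"])

# Stack b underneath a; columns are matched by name
stacked = pd.concat([a, b])

print(stacked.shape)                 # (4, 2)
print(stacked["Name"].isna().sum())  # 2
print(list(stacked.index))           # ['d1', 'd2', 'd1', 'd2']
```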


## Joining

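Sometimes the dataframes need to end up next to each other instead, a "_horizontal_ `concat`". This is called `join`. Rows are matched on the index: by default, `join` keeps the index of the `DataFrame` it is called on and fills in `NaN` wherever the other `DataFrame` has no matching row. The `DataFrame`s should _not_ share any `column` names; if they do, `join` raises an error unless the duplicates are renamed automatically using `lsuffix` or `rsuffix`.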

In [None]:
ryan=dfs[2]
wizz=dfs[3]
nvda=dfs[0]
ryan.join(wizz,rsuffix='_right').join(nvda,rsuffix='_nn')

Date,Close,Name,Close_right,Name_right,Close_nn,Name_nn
2016-11-28,82.300003,RYAAY,,,23.224270,NVDA
2016-11-29,82.339996,RYAAY,1693.0,WIZZ.L,23.012047,NVDA
2016-11-30,79.839996,RYAAY,1676.0,WIZZ.L,22.752930,NVDA
2016-12-01,80.260002,RYAAY,1737.0,WIZZ.L,21.627615,NVDA
2016-12-02,80.669998,RYAAY,1753.0,WIZZ.L,21.827509,NVDA
...,...,...,...,...,...,...
2021-11-19,105.889999,RYAAY,4173.0,WIZZ.L,329.850006,NVDA
2021-11-22,102.750000,RYAAY,4184.0,WIZZ.L,319.559998,NVDA
2021-11-23,104.000000,RYAAY,4300.0,WIZZ.L,317.459991,NVDA
2021-11-24,103.800003,RYAAY,4352.0,WIZZ.L,326.739990,NVDA
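
The `NaN`s in the first row above appear because `join` aligns rows on the index and, by default, keeps only the dates of the left `DataFrame`. The sketch below reproduces this on made-up data (the prices are invented for illustration) and compares the default with `how="outer"`, which keeps every date found in either frame.

```python
import pandas as pd

# Made-up closing prices; "right" has no row for the first date of "left"
left = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)
right = pd.DataFrame(
    {"Close": [20.0, 21.0, 22.0]},
    index=pd.to_datetime(["2021-01-05", "2021-01-06", "2021-01-07"]),
)

# Default how="left": keep the left index, fill gaps on the right with NaN
left_join = left.join(right, rsuffix="_right")

# how="outer": keep every date that appears in either frame
outer_join = left.join(right, rsuffix="_right", how="outer")

print(list(left_join.columns))                # ['Close', 'Close_right']
print(len(left_join), len(outer_join))        # 3 4
print(left_join["Close_right"].isna().sum())  # 1
```

Which variant is appropriate depends on the analysis: a left join keeps one stock's trading calendar, while an outer join keeps all dates at the cost of more missing values.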


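Let's do something smarter: keep only the `Close` column of each `DataFrame`, rename it after the stock, and join everything into a single table.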

In [None]:
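dfz=pd.DataFrame() #initialise empty DataFrame
for x in dfs:
  stock_name=x['Name'].values[0] #every row holds the same name, so take the first one
  stock_name=stock_name.replace('.','') #remove the dot, e.g. WIZZ.L becomes WIZZL
  x=x[['Close']]
  x.columns=[stock_name] #name the price column after the stock
  dfz=x.join(dfz) #left join: the result keeps the dates of the current stock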

In [None]:
dfz.head()

Date,WIZZL,RYAAY,TSLA,NVDA
2016-11-29,1693.0,82.339996,37.914001,23.012047
2016-11-30,1676.0,79.839996,37.880001,22.75293
2016-12-01,1737.0,80.260002,36.375999,21.627615
2016-12-02,1753.0,80.669998,36.293999,21.827509
2016-12-05,1762.0,82.129997,37.360001,22.673958
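
A similar wide table can also be built without a loop, by stacking the frames with `concat` and then spreading the names into columns with `pivot`. This is a sketch of an alternative, not the approach used above, and it runs on made-up data (the names `AAA`, `BBB` and the prices are invented). Unlike the chain of left joins, it keeps every date that appears for any stock.

```python
import pandas as pd

# Made-up long-format data shaped like pd.concat(dfs): one row per date and stock
dates = pd.to_datetime(["2021-01-04", "2021-01-05"])
long_df = pd.concat(
    [
        pd.DataFrame({"Close": [10.0, 11.0], "Name": "AAA"}, index=dates),
        pd.DataFrame({"Close": [20.0, 21.0], "Name": "BBB"}, index=dates),
    ]
)
long_df.index.name = "Date"

# Spread the stock names into columns; the existing Date index labels the rows
wide_df = long_df.pivot(columns="Name", values="Close")

print(list(wide_df.columns))                        # ['AAA', 'BBB']
print(wide_df.loc[pd.Timestamp("2021-01-05"), "BBB"])  # 21.0
```

`pivot` requires each combination of date and name to occur only once, which is the case when each stock has one closing price per day.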


Ready to export. To _CSV_:

In [None]:
dfz.to_csv('my_stocks.csv')
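
When the _CSV_ is read back, the dates arrive as plain text unless you ask _pandas_ to parse them. The sketch below shows one way to restore the `Date` index. It uses a made-up table and an in-memory buffer instead of `my_stocks.csv`, so that it runs on its own; with a real file you would pass the file name to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up wide table shaped like dfz (not real prices)
wide_df = pd.DataFrame(
    {"AAA": [10.0, 11.0], "BBB": [20.0, 21.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)
wide_df.index.name = "Date"

# Write to an in-memory buffer instead of a file, to keep the sketch self-contained
buffer = io.StringIO()
wide_df.to_csv(buffer)
buffer.seek(0)

# index_col restores the Date index; parse_dates turns the text back into timestamps
restored = pd.read_csv(buffer, index_col="Date", parse_dates=True)

print(isinstance(restored.index, pd.DatetimeIndex))  # True
print(list(restored.columns))                        # ['AAA', 'BBB']
```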

To _JSON_:

In [None]:
import json

In [None]:
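json_list_of_dicts=dfz.to_dict(orient='records') #one dictionary per row; the dates in the index are not included
with open('my_stocks.json','w') as f: #the with block closes the file once writing is done
  f.write(json.dumps(json_list_of_dicts))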
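
The list of dictionaries written above does not include the dates, because they live in the index rather than in a column. If the dates should be part of the _JSON_, one option is to move the index into a regular column and convert it to text first. The sketch below does this on a made-up table (names and prices are invented for illustration) and keeps the result in a string instead of writing a file.

```python
import json

import pandas as pd

# Made-up wide table shaped like dfz (not real prices)
wide_df = pd.DataFrame(
    {"AAA": [10.0, 11.0], "BBB": [20.0, 21.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)
wide_df.index.name = "Date"

# Move the index into a regular column and turn the timestamps into plain text,
# because json.dumps cannot serialise pandas Timestamp objects
records_df = wide_df.reset_index()
records_df["Date"] = records_df["Date"].dt.strftime("%Y-%m-%d")
records = records_df.to_dict(orient="records")

json_text = json.dumps(records)
print(json_text)

# Reading the text back gives ordinary Python lists and dicts
print(json.loads(json_text)[0]["Date"])  # 2021-01-04
```

Keep in mind that `json.dumps` writes missing values as `NaN` by default, which strict _JSON_ parsers may reject, so you may want to drop or fill them before exporting.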