<a href="https://colab.research.google.com/github/YaPineiro/Fin420/blob/main/YP__week_0_session_2_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Packages


Say you create a set of functions in Excel. To share these functions, you have to send your spreadsheet to someone else.  The other person then opens your (entire) spreadsheet and copies and pastes the formulas they want.  This is a clumsy way to share work.

To share functions in Python, you put them into a package and publish the package to a package index (such as PyPI), from which a package manager can install it.  If anyone wants to use your package (functions/data structures), they install and load it.  In this way, you can easily and safely share your code with millions of other people.  Likewise, you can easily use code written by others.

## Implications for Recreating Others' Work

In Excel you are constantly recreating functions that you and others have already written.  This is due to the **inefficiency of sharing code between spreadsheets.**

In Python, once we or someone else creates a function, it can be easily shared.  Thus, we generally will not have to create our own functions; instead, we can use those created by others (note this makes the code documentation extra important).  This **frees us to spend our time understanding the implications of our calculations, and how they inform our financial decisions**.  


<b>Python makes you a better financial analyst because you can focus on where you add value (making financial decisions), rather than on perpetually reimplementing calculations.</b>






## Python Packages in Colab

Most of the packages/modules we will use are preinstalled in Colab, so we only need to load them.  Some packages, however, we will have to install first.





See [here for a more complete description of the difference between a python package and module](https://stackoverflow.com/questions/7948494/whats-the-difference-between-a-python-module-and-a-python-package).

## Preinstalled Packages

Two important preinstalled packages that we will use are `numpy` and `pandas`.  

Say we need to generate random numbers from a standard normal distribution.  We can do this with numpy's `random` module once numpy is loaded.

To load preinstalled packages we only need to:

In [1]:
import numpy as np
import pandas as pd

We can now use the packages:

In [2]:
np.sinh(4)

27.28991719712775

In [3]:
np.random.randn(3,3)

array([[-1.79773951,  0.70822446,  0.16609215],
       [ 0.62076052,  1.07866944, -0.49201295],
       [-2.2266123 , -0.95517844, -0.06737229]])
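The draws above change on every run. As a minimal sketch of how to make them reproducible, the example below uses numpy's `default_rng` with a seed; the seed value `42` and the variable names are arbitrary choices for illustration, not part of the session's code.

```python
import numpy as np

# A seeded generator: the same seed gives the same draws on every run.
rng = np.random.default_rng(seed=42)
draws = rng.standard_normal((3, 3))  # 3x3 array of standard normal draws

# Re-creating the generator with the same seed reproduces the draws exactly.
rng_again = np.random.default_rng(seed=42)
draws_again = rng_again.standard_normal((3, 3))

print(draws.shape)
print(np.array_equal(draws, draws_again))
```

Seeding is useful when you want someone else to get exactly the same simulated numbers that you did.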

In [4]:
pd.DataFrame([[1, 2],
              [3, 4]])

|   | 0 | 1 |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 3 | 4 |


## Importing Packages

Above we wrote `import numpy as np`.  The `as np` portion simply gives us a shorter way to refer to numpy and is optional. If we leave it out, we refer to functions in the package with `numpy.` instead of `np.`.

We refer to functions (such as `sinh`) provided by numpy with `np.sinh`.  This ensures there is no confusion with any other functions called `sinh` in other packages.
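To illustrate why the prefix matters, here is a small sketch using Python's standard `math` module, which also provides a function called `sinh`. The variable names are chosen only for this example.

```python
import math
import numpy as np

# Two different functions share the name `sinh`; the prefix says which one we mean.
scalar_result = math.sinh(4)                    # standard library: one number at a time
array_result = np.sinh(np.array([0.0, 4.0]))    # numpy: works elementwise on arrays

print(scalar_result)
print(array_result)
```

Both functions compute the same quantity, but only the numpy version accepts a whole array, so it helps to always see which one is being called.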

### Methods of Import

In others' Python code you may see:

`import numpy`

or

`from numpy import sinh`

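The forms differ mainly in which names they bring into your code: after `import numpy` you write `numpy.sinh`, while after `from numpy import sinh` you write `sinh` directly. For most purposes it doesn't really matter which you use. For explanations see [this answer on Stack Overflow](https://stackoverflow.com/questions/710551/use-import-module-or-from-module-import) and [here in the Python documentation](https://docs.python.org/3/tutorial/modules.html#packages).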

In [5]:
from numpy import random

In [6]:
random.randn(3,3)

array([[ 0.57901965, -0.12506327, -0.28719767],
       [ 0.4197095 ,  1.91843809,  0.07108273],
       [-0.16017834, -0.35211837, -0.45118467]])
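As a quick check that the import styles differ only in naming, the sketch below imports numpy three ways and confirms that each name refers to the same function object.

```python
import numpy
import numpy as np
from numpy import sinh

# All three names point at the very same function object.
same_module_alias = numpy.sinh is np.sinh
same_direct_import = np.sinh is sinh

print(same_module_alias)
print(same_direct_import)
```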

### Find Module Functions

In [7]:
dir(np)

['ALLOW_THREADS',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_CopyMode',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__deprecated_attrs__',
 '__dir__',
 '__doc__',
 '__expired_functions__',
 '__file__',
 '__former_attrs__',
 '__future_scalars__',
 '__getattr__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_builtins',
 '_core',
 '_distributor_init',
 '_financial_nam

In [8]:
np.array([[1, 2, 3.0], [2, 3, 4]]).shape

(2, 3)
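The full `dir(np)` listing is long and mixes public functions with internal names. One possible way to narrow it down, sketched here with ordinary list comprehensions, is to drop names that start with an underscore and then search for a substring; the search term `"sinh"` is just an example.

```python
import numpy as np

# Keep only the public names (those not starting with an underscore).
public_names = [name for name in dir(np) if not name.startswith("_")]

# Narrow further with a substring search, here everything mentioning "sinh".
sinh_like = [name for name in public_names if "sinh" in name]

print(len(public_names))
print(sinh_like)
```

Once you have found a candidate, `help(np.sinh)` prints its documentation.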

## Installing Packages in Colab

However, some Python packages we will use are not preinstalled.  We therefore must install them prior to loading.  For example, if we try to load the bankfind package we get an error: `No module named 'bankfind'`

In [9]:
import bankfind

ModuleNotFoundError: No module named 'bankfind'
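Rather than triggering the error, you can also ask Python whether a module is available. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is a hypothetical name introduced for this example.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can find `module_name`."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("json"))  # part of the standard library

if not is_installed("bankfind"):
    print("bankfind is missing; install it with pip before importing")
```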

We first need to install the `bankfind` package by executing the following shell command (in Colab, the leading `!` runs the line in the shell rather than as Python):

In [10]:
!pip install bankfind

Collecting bankfind
  Downloading bankfind-0.0.1-py2.py3-none-any.whl.metadata (4.0 kB)
Collecting requests==2.24.0 (from bankfind)
  Downloading requests-2.24.0-py2.py3-none-any.whl.metadata (6.9 kB)
Collecting chardet<4,>=3.0.2 (from requests==2.24.0->bankfind)
  Downloading chardet-3.0.4-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting idna<3,>=2.5 (from requests==2.24.0->bankfind)
  Downloading idna-2.10-py2.py3-none-any.whl.metadata (9.1 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests==2.24.0->bankfind)
  Downloading urllib3-1.25.11-py2.py3-none-any.whl.metadata (41 kB)
Downloading bankfind-0.0.1-py2.py3-none-any.whl (47 kB)
Downloading requests-2.24.0-py2.py3-none-any.whl (61 kB)

In [11]:
import bankfind as bf

In [12]:
vermont_json = bf.get_institutions(filters="STNAME:Vermont")
vermont_json

{'meta': {'total': 63,
  'parameters': {'filters': 'STNAME:Vermont',
   'fields': 'ACTIVE,ADDRESS,ASSET,BKCLASS,CB,CBSA,CBSA_DIV,CBSA_DIV_FLG,CBSA_DIV_NO,CBSA_METRO,CBSA_METRO_FLG,CBSA_METRO_NAME,CBSA_MICRO_FLG,CBSA_NO,CERT,CERTCONS,CFPBEFFDTE,CFPBENDDTE,CFPBFLAG,CHANGEC1,CHANGEC2,CHANGEC3,CHANGEC4,CHANGEC5,CHANGEC6,CHANGEC7,CHANGEC8,CHANGEC9,CHANGEC10,CHANGEC11,CHANGEC12,CHANGEC13,CHANGEC14,CHANGEC15,CHARTER,CHRTAGNT,CITY,CITYHCR,CLCODE,CMSA_NO,CMSA,CONSERVE,COUNTY,CSA,CSA_NO,CSA_FLG,DATEUPDT,DENOVO,DEP,DEPDOM,DOCKET,EFFDATE,ENDEFYMD,EQ,ESTYMD,FDICDBS,FDICREGN,FDICSUPV,FED,FED_RSSD,FEDCHRTR,FLDOFF,FORM31,HCTMULT,IBA,INACTIVE,INSAGNT1,INSAGNT2,INSBIF,INSCOML,INSDATE,INSDIF,INSFDIC,INSSAIF,INSSAVE,INSTAG,INSTCRCD,LAW_SASSER_FLG,MSA,MSA_NO,MUTUAL,NAME,NAMEHCR,NETINC,NETINCQ,NEWCERT,OAKAR,OCCDIST,OFFDOM,OFFFOR,OFFICES,OFFOA,OTSDIST,OTSREGNM,PARCERT,PROCDATE,QBPRCOML,REGAGNT,REGAGENT2,REPDTE,RISDATE,ROA,ROAPTX,ROAPTXQ,ROAQ,ROE,ROEQ,RSSDHCR,RUNDATE,SASSER,SPECGRP,SPECGRPN,STALP,STALPHCR,STC

In [13]:
vermont_json['data']

[{'ZIP': '05403',
  'SASSER': 0,
  'CHRTAGNT': 'STATE',
  'CONSERVE': 'N',
  'REGAGENT2': '',
  'STNAME': 'Vermont',
  'ROAQ': -0.74,
  'INSDATE': '08/02/2022',
  'TE06N528': '',
  'TE06N529': '',
  'OFFOA': 0,
  'FDICDBS': 2,
  'NAMEHCR': '',
  'OCCDIST': '',
  'CMSA': '',
  'DEPDOM': 230108,
  'CBSA_METRO_FLG': '1',
  'TE10N528': '',
  'NETINC': -1581,
  'CBSA_DIV_NO': '',
  'MUTUAL': '0',
  'OFFFOR': 0,
  'INSSAVE': 0,
  'CHARTER': '0',
  'RSSDHCR': '',
  'TE04N528': '',
  'TE04N529': '',
  'CERT': 59298,
  'STALP': 'VT',
  'SPECGRP': 4,
  'CFPBENDDTE': '9999-12-31',
  'TE09N528': '',
  'IBA': 0,
  'INSBIF': 0,
  'INSFDIC': 1,
  'ENDEFYMD': '12/31/9999',
  'MSA': '',
  'TE02N528': '',
  'CB': '1',
  'TE02N529': '',
  'TE07N528': '',
  'FDICSUPV': 'New York',
  'FED': '01',
  'REGAGNT': 'FDIC',
  'NEWCERT': 0,
  'ASSET': 257839,
  'CBSA_MICRO_FLG': '0',
  'OFFICES': 1,
  'STCNTY': '50007',
  'CSA_FLG': '1',
  'CITY': 'South Burlington',
  'CLCODE': '21',
  'INACTIVE': 0,
  'STALPHCR'

## A Note on JSON

These data are in a format known as JSON, which stands for JavaScript Object Notation.  It is a convenient format for storing and transferring data (you can read more here: https://www.json.org/json-en.html).  Pandas knows how to handle this type of data, so we can easily convert it from JSON to a `DataFrame` with `json_normalize`.
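To make the format concrete, here is a small self-contained sketch. The JSON text is hand-written for illustration: the first record copies a few fields from the Vermont output above, and the second record is entirely hypothetical.

```python
import json
import pandas as pd

# Hand-written JSON text shaped like the response above (illustrative subset only).
raw_text = """
{
  "meta": {"total": 2},
  "data": [
    {"CERT": 59298, "CITY": "South Burlington", "STALP": "VT", "ASSET": 257839},
    {"CERT": 1, "CITY": "Example Town", "STALP": "VT", "ASSET": 1000}
  ]
}
"""

parsed = json.loads(raw_text)                    # JSON text -> Python dict
example_df = pd.json_normalize(parsed["data"])   # list of records -> DataFrame

print(type(parsed).__name__)
print(example_df.shape)
```

Each record in the `data` list becomes one row, and each key becomes one column.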

In [14]:
banks_vermont = pd.json_normalize(vermont_json['data'])
banks_vermont[banks_vermont["NAMEHCR"].notna()]

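|   | ZIP | SASSER | CHRTAGNT | CONSERVE | REGAGENT2 | STNAME | ROAQ | INSDATE | TE06N528 | TE06N529 | ... | TE08N528 | NETINCQ | ESTYMD | FEDCHRTR | TRUST | ID | CHANGEC1 | CHANGEC3 | CHANGEC4 | CHANGEC2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5403 | 0 | STATE | N |  | Vermont | -0.74 | 08/02/2022 |  |  | ... |  | -441.0 | 08/02/2022 | 0 | 0 | 59298 |  |  |  |  |
| 1 | 5760 | 0 | OCC | N |  | Vermont | 1.28 | 01/01/1934 |  |  | ... |  | 267.0 | 01/01/1863 | 1 | 0 | 6280 |  |  |  |  |
| 2 | 5478 | 0 | STATE | N |  | Vermont | 1.38 | 04/02/1934 |  |  | ... |  | 1282.0 | 11/12/1866 | 0 | 1 | 14168 |  |  |  |  |
| 3 | 5301 | 0 | STATE | N |  | Vermont | 1.22 | 08/09/1989 |  |  | ... |  | 867.0 | 01/01/1912 | 0 | 1 | 28837 |  |  |  |  |
| 4 | 5201 | 0 | OCC | N |  | Vermont | 0.79 | 08/09/1989 |  |  | ... |  | 1138.0 | 01/01/1917 | 1 | 0 | 30350 |  |  |  |  |
| 5 | 5081 | 0 | STATE | N |  | Vermont | 0.42 | 03/31/1934 |  |  | ... |  | 251.0 | 01/01/1892 | 0 | 0 | 14136 |  |  |  |  |
| 6 | 5753 | 0 | OCC | N |  | Vermont | 0.87 | 01/01/1934 |  |  | ... |  | 1205.0 | 11/09/1831 | 1 | 1 | 6275 |  |  |  |  |
| 7 | 5055 | 0 | OCC | N |  | Vermont | 0.44 | 05/22/1991 |  |  | ... |  | 1009.0 | 05/22/1991 | 1 | 1 | 33418 |  |  |  |  |
| 8 | 5855 | 0 | OCC | N |  | Vermont | 1.2 | 01/01/1934 |  |  | ... |  | 3415.0 | 01/15/1851 | 1 | 1 | 6271 |  |  |  |  |
| 9 | 5819 | 0 | STATE | N |  | Vermont | 1.07 | 03/31/1934 |  |  | ... |  | 2297.0 | 02/24/1853 | 0 | 1 | 14134 |  |  |  |  |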


This returns a table of banks in Vermont.  Note the use of `.notna()`.  We'll unpack the line `banks_vermont[banks_vermont["NAMEHCR"].notna()]` in following sessions.  For now, it is worth knowing that it returns the `banks_vermont` DataFrame with any row removed where the value of `NAMEHCR` is missing.
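As a small sketch of what `.notna()` does, the toy DataFrame below (hypothetical bank names, not FDIC data) has one real value, one missing value, and one empty string in the `NAMEHCR` column. Note that pandas treats an empty string as a value, not as missing.

```python
import numpy as np
import pandas as pd

# Toy data: the names are made up purely for illustration.
toy = pd.DataFrame({
    "NAME": ["Bank A", "Bank B", "Bank C"],
    "NAMEHCR": ["Holding Co", np.nan, ""],
})

mask = toy["NAMEHCR"].notna()   # True where a value is present
kept = toy[mask]                # keep only the rows where the mask is True

print(mask.tolist())            # [True, False, True]
print(kept["NAME"].tolist())    # ['Bank A', 'Bank C']
```

This is why a row whose `NAMEHCR` is `''` still appears in the filtered output.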

# Finding Python Packages

To search for Python packages you can install via pip, you can use the PyPI website: [https://pypi.org/](https://pypi.org/).  Note that you used to be able to use `pip search` at the command line, but PyPI no longer supports it.  For example, we could previously search for any finance packages with:

In [15]:
!pip search finance

ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI no longer supports 'pip search' (or XML-RPC search). Please use https://pypi.org/search (via a browser) instead. See https://warehouse.pypa.io/api-reference/xml-rpc.html#deprecated-methods for more information.

However, as the error message states, this functionality has been removed; search on the PyPI website instead.

# Using Packages

The package's PyPI page will usually link to the source code repository.  This is most likely a GitHub or GitLab repo.  The repo will contain a README.md file that explains how to use the package (or links to documentation housed elsewhere).  

For example, here is [the repo with the README.md for the bankfind package](https://github.com/dpguthrie/bankfind).  The README.md file is automatically displayed below the code.  It is a fairly simple package so the README overview is short.  The README does link to more detailed documentation, as well as the documentation for the underlying data from the FDIC.  
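For a package that is already installed, some of the same information can be read locally. The sketch below uses the standard library's `importlib.metadata` on `numpy` (chosen only because it is preinstalled in Colab); which metadata fields are filled in varies from package to package, so the URL list may be empty.

```python
from importlib import metadata

# Version string of the installed distribution.
installed_version = metadata.version("numpy")

# Package metadata behaves like a mapping of header fields.
info = metadata.metadata("numpy")
project_urls = info.get_all("Project-URL") or []  # may be empty for some packages

print(installed_version)
print(info["Summary"])
for url in project_urls:
    print(url)
```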

## Security

It is possible that there is malicious code in Python packages hosted on PyPI.  The probability is low, however, particularly if you are using well-known packages.  Here are a couple of pointers:

-  Read the source code.  It doesn't take as long as you think.
-  Don't run packages with administrator privileges.
-  Only install packages with GitHub or GitLab repos that you can access (this is what makes the first pointer possible).

In general, package security isn't much of an issue. If you ever have concerns about a package, run it in a hosted environment such as Colab rather than on your own machine.
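Reading the source does not require leaving the notebook. As a sketch, the standard library's `inspect` module can show where a module lives and print the code of a function; the example uses `statistics.mean` from the standard library as a stand-in for a function from an installed package.

```python
import inspect
import statistics

# Where the module's source file lives on disk.
source_path = inspect.getsourcefile(statistics)

# The source code of a single function, as a string.
source_text = inspect.getsource(statistics.mean)

print(source_path)
print(source_text.splitlines()[0])  # first line of the function definition
```

This works for code written in Python; functions implemented in compiled extensions have no Python source to display.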


# Exercise

*The following exercise may not work. The `cbpro` package doesn't work with Python 3.10+, and the alternative `coinbasepro` package also won't load.*

Find a package that wraps the coinbase-pro API ([this package may do the trick](https://github.com/danpaquin/coinbasepro-python)).  Use the package to get [a list of available currency pairs for trading](https://docs.pro.coinbase.com//#get-products).  All the code you will need to get the list of currency pairs is in the [GitHub README](https://github.com/danpaquin/coinbasepro-python).

# Alternative Exercise

Find a package that will allow you to download data from Yahoo Finance. Use the package to download stock data for ticker `MSFT`.

# Alternative Exercise Solution

In [3]:
!pip install yfinance



In [11]:
import yfinance as yfi

In [12]:
ticker = "MSFT"

In [16]:
data = yfi.download(ticker, start="2024-01-01", end="2024-12-31")

[*********************100%***********************]  1 of 1 completed


In [17]:
print(data.head())

Price            Close        High         Low        Open    Volume
Ticker            MSFT        MSFT        MSFT        MSFT      MSFT
Date                                                                
2024-01-02  368.117249  373.109913  364.047674  371.085046  25258600
2024-01-03  367.849274  370.489534  365.774790  366.271079  23083500
2024-01-04  365.208984  370.330688  364.444711  367.918732  20901500
2024-01-05  365.020416  369.298423  363.779694  366.231362  20987000
2024-01-08  371.908905  372.415129  366.271072  366.558897  23134000
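The downloaded data has two levels of column labels (`Price` and `Ticker`). The sketch below rebuilds a small table with the same layout by hand, copying three `Close` and `Volume` values from the output above so that it runs without a network connection, and shows one possible way to select a column and compute daily returns. The variable names are illustrative.

```python
import pandas as pd

# Two-level column labels, mirroring the layout of the downloaded data.
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["MSFT"]], names=["Price", "Ticker"]
)
toy_prices = pd.DataFrame(
    [[368.117249, 25258600], [367.849274, 23083500], [365.208984, 20901500]],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    columns=columns,
)
toy_prices.index.name = "Date"

close = toy_prices[("Close", "MSFT")]           # select one column as a Series
daily_return = close.pct_change()               # simple day-over-day return

flat = toy_prices.droplevel("Ticker", axis=1)   # drop the ticker level of the labels
print(list(flat.columns))
print(daily_return)
```

The first return is `NaN` because there is no previous day to compare against.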


# Coinbase Exercise Solution

In [16]:
!pip install cbpro

Collecting cbpro
  Downloading cbpro-1.1.4-py2.py3-none-any.whl.metadata (15 kB)
Collecting sortedcontainers>=1.5.9 (from cbpro)
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting requests==2.13.0 (from cbpro)
  Downloading requests-2.13.0-py2.py3-none-any.whl.metadata (44 kB)
Collecting six==1.10.0 (from cbpro)
  Downloading six-1.10.0-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting websocket-client==0.40.0 (from cbpro)
  Downloading websocket_client-0.40.0.tar.gz (196 kB)
  Preparing metadata (setup.py) ... done
Collecting pymongo==3.5.1 (from cbpro)
  Downloading pymongo-3.5.1.tar.gz (1.3 MB)

In [17]:
import cbpro

ImportError: cannot import name 'MutableMapping' from 'collections' (/usr/lib/python3.11/collections/__init__.py)
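This error arises because `MutableMapping` was removed from the top-level `collections` module in Python 3.10; it now lives only in `collections.abc`. The sketch below shows the working import and, on Python 3.10 or later, reproduces the failing one.

```python
from collections.abc import MutableMapping  # current location of the class

# dict is registered as a MutableMapping, so this prints True.
print(issubclass(dict, MutableMapping))

try:
    # Old location, used by older packages; removed in Python 3.10.
    from collections import MutableMapping as OldLocation
except ImportError as err:
    print(f"Old import path fails: {err}")
```

Packages that have not been updated since this change fail in exactly this way on newer Python versions.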

In [1]:
!pip install coinbasepro

Collecting coinbasepro
  Downloading coinbasepro-0.4.1-py3-none-any.whl.metadata (9.1 kB)
Collecting requests>=2.20.0 (from coinbasepro)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Downloading coinbasepro-0.4.1-py3-none-any.whl (23 kB)
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
Installing collected packages: requests, coinbasepro
  Attempting uninstall: requests
    Found existing installation: requests 2.13.0
    Uninstalling requests-2.13.0:
      Successfully uninstalled requests-2.13.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bankfind 0.0.1 requires requests==2.24.0, but you have requests 2.32.3 which is incompatible.
cbpro 1.1.4 requires requests==2.13.0, but you have requests 2.32.3 which is

In [2]:
import coinbasepro