# Why Python for Finance?

## What is Python?

Python is an **_interpreted_, _object-oriented_, _high-level programming language_** with **_dynamic semantics_**. 

Its high-level built-in data structures, combined with **_dynamic typing_** and **_dynamic binding_**, make it very attractive for **Rapid Application Development**, as well as for use as a **scripting** or **glue** language to connect existing components together.

Python’s simple, easy-to-learn syntax emphasizes **readability** and therefore reduces the cost of program maintenance.

Python supports **modules** and **packages**, which encourages program **modularity** and **code reuse**. 

The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

Python is characterized by the following features:
- Open source
- Interpreted
- Multiparadigm
- Multipurpose
- Cross-platform
- Dynamically typed
- Indentation aware
- Garbage collected
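
A few of these features can be seen in a handful of lines. The following is a minimal sketch; the names `value`, `CallOption`, and the `payoffs_*` lists are purely illustrative and not part of any library.

```python
# Dynamic typing: a name can be rebound to objects of different types.
value = 100            # int
value = 100.0          # now a float
value = "one hundred"  # now a str

# Multiparadigm: the same call option payoff written in three styles.
strike = 105.0
levels = [90.0, 105.0, 120.0]

# Procedural: explicit loop; blocks are delimited by indentation.
payoffs_loop = []
for level in levels:
    payoffs_loop.append(max(level - strike, 0.0))

# Functional: map with an anonymous function.
payoffs_map = list(map(lambda level: max(level - strike, 0.0), levels))

# Object-oriented: a small class bundling data and behavior.
class CallOption:
    def __init__(self, strike):
        self.strike = strike

    def payoff(self, level):
        return max(level - self.strike, 0.0)

payoffs_oop = [CallOption(strike).payoff(level) for level in levels]

print(type(value).__name__, payoffs_loop, payoffs_map, payoffs_oop)
```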

In [None]:
import this

### Brief History of Python

**Guido van Rossum**, from the Netherlands, began developing Python in the 1980s:
- Python 0.9.0 released in 1991 (first release)
- ...
- Python 2.7 released in 2010
- ...
- Python 3.4 released in 2014
- Python 3.6 released in 2016

### The Python Ecosystem

**Availability of a large number of libraries and tools**

These libraries and tools generally have to be _imported_ when needed (e.g., a plotting library) or have to be started as a separate system process (e.g., a Python development environment). 

_Importing_ means making a library available to the current namespace and the current Python interpreter process.

In [None]:
100 * 2.5 + 50

In [None]:
log(1)  # raises a NameError in a fresh session: log has not been imported yet

In [None]:
from math import *

In [None]:
log(1)

In [None]:
import math

In [None]:
math.log(1)

_math_ is a standard Python library available with any installation.
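
The difference between the import styles shown above can be made explicit by checking which names end up in the current namespace. This is a small sketch using only the standard library; the variable name `public_names` is illustrative.

```python
import math

# "import math" binds only the module name; functions are reached via the prefix.
print(math.log(1))

# "from math import log" binds the function name directly.
from math import log
print(log is math.log)  # both names refer to the same function object

# A star import copies every public name of the module into the namespace,
# which can silently shadow existing names; explicit imports avoid this.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(len(public_names), "public names would be imported by 'from math import *'")
```

In longer programs, the prefixed form (`math.log`) is usually easier to read because the origin of each function stays visible.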

**IPython** is also sometimes called the killer application of the Python ecosystem.

In [None]:
math.log?

IPython comes in three different versions: a _shell_ version, one based on a _Qt_ graphical user interface (the _Qt console_), and a browser-based version (the _Notebook_).

### Python User Spectrum

Python is not only for professional software developers; it also serves casual programmers as well as domain experts and scientific developers:
- _Professional software developers_ find all that they need to efficiently build large applications. Almost all programming paradigms are supported; there are powerful development tools available; and any task can, in principle, be addressed with Python.
- _Scientific developers_ or _domain experts_ are generally heavy users of certain libraries and frameworks, have built their own applications that they enhance and optimize over time, and tailor the ecosystem to their specific needs. These groups of users also generally engage in longer interactive sessions, rapidly prototyping new code as well as exploring and visualizing their research and/or domain data sets.
- _Casual programmers_ generally use Python for specific problems where they know it has particular strengths.

### The Scientific Stack

The scientific stack consists of the following libraries:
- NumPy
- SciPy
- matplotlib
- PyTables
- pandas

## Technology in Finance

This section considers the role of technology in finance.

In recent years, spurred by innovation and also regulation, banks and other financial institutions like hedge funds have evolved more and more into technology companies instead of being just financial intermediaries.

### Technology Spending

Not only is technology important for the financial industry, but the financial industry is also very important to the technology sector:

> Banks will spend 4.2% more on technology in 2014 than they did in 2013, according to IDC analysts. Overall IT spend in financial services globally will exceed \$430 billion in 2014 and surpass \$500 billion by 2020, the analysts say.
>
> <div style="text-align: right">— Crosman 2013</div>

### Technology as Enabler

> Technological innovations have contributed significantly to greater efficiency in the derivatives market. Through innovations in trading technology, trades at Eurex are today executed much faster than ten years ago despite the strong increase in trading volume and the number of quotes … These strong improvements have only been possible due to the constant, high IT investments by derivatives exchanges and clearing houses.

> <div style="text-align: right"> — Deutsche Börse Group 2008</div>

### Technology and Talent as Barriers to Entry

> Aggregated over the total software lifecycle, firms adopting in-house strategies for OTC [derivatives] pricing will require investments between \$25 million and \$36 million alone to build, maintain, and enhance a complete derivatives library.

> <div style="text-align: right">— Ding 2010 </div>

> Meriwether spent \$20 million on a state-of-the-art computer system and hired a crack team of financial engineers to run the show at LTCM, which set up shop in Greenwich, Connecticut. It was risk management on an industrial level.
> <div style="text-align: right"> — Patterson 2010 </div>

### Ever-Increasing Speeds, Frequencies, Data Volumes

> Renaissance’s Medallion fund gained an astonishing 80 percent in 2008, capitalizing on the market’s extreme volatility with its lightning-fast computers. Jim Simons was the hedge fund world’s top earner for the year, pocketing a cool \$2.5 billion.

> <div style="text-align: right"> — Patterson 2010 </div>

A number of challenges:
- Data processing
- Analytics speed
- Theoretical foundations

### The Rise of Real-Time Analytics

Speeds, frequencies, and data volumes increase at a rapid pace in the industry. In fact, real-time analytics can be considered the industry’s answer to this trend.

“Financial and data analytics” refers to the discipline of applying software and technology in combination with (possibly advanced) algorithms and methods to gather, process, and analyze data in order to gain insights, to make decisions, or to fulfill regulatory requirements.

Two major challenges that financial institutions face:
- Big data
- Real-time economy

One can observe an interplay between advances in technology and financial/business practice.

One major trend in the analytics space has been the utilization of parallel architectures on the CPU (central processing unit) side and massively parallel architectures on the GPGPU (general-purpose graphics processing unit) side.

## Python for Finance

This section explains how Python can help address several of these challenges.

### Finance and Python Syntax

The Python syntax is generally quite close to the mathematical syntax used to describe scientific problems or financial algorithms. To illustrate this phenomenon, we will consider a Black-Scholes-Merton (BSM) setup (see also later) in which the option’s underlying risk factor follows a geometric Brownian motion. 

Suppose we have the following numerical parameter values for the valuation:
* Initial stock index level S0 = 100
* Strike price of the European call option K = 105
* Time-to-maturity T = 1 year
* Constant, riskless short rate r = 5%
* Constant volatility σ = 20%

**Equation 1-1**. _Black-Scholes-Merton (1973) index level at maturity_

$$ S_{T} = S_{0} \exp ((r - \frac{1}{2}\sigma^2)T + \sigma \sqrt{T} z)$$ 

An algorithmic description of the Monte Carlo valuation procedure:
1. Draw $I$ (pseudo)random numbers $z(i)$, $i \in \{1, 2, \ldots, I\}$, from the standard normal distribution.
2. Calculate all resulting index levels at maturity $S_T(i)$ for given $z(i)$ and Equation 1-1.
3. Calculate all inner values of the option at maturity as $h_T(i) = \max(S_T(i) - K, 0)$.
4. Estimate the option present value via the Monte Carlo estimator given in Equation 1-2.

**Equation 1-2**. _Monte Carlo estimator for European option_

$$ C_{0} \approx e^{-rT} \frac{1}{I}\sum_{i=1}^{I}h_{T}(i)$$

In [None]:
S0 = 100.
K = 105.
T = 1.0
r = 0.05
sigma = 0.2

In [None]:
from numpy import *

In [None]:
I = 100000

In [None]:
z = random.standard_normal(I)
ST = S0 * exp((r - 0.5 * sigma ** 2) * T + sigma * sqrt(T) * z)
hT = maximum(ST - K, 0)
C0 = exp(-r * T) * sum(hT) / I

In [None]:
print("Value of the European Call Option %5.3f" %C0)
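
Because the Monte Carlo estimate varies from run to run, it can be useful to compare it with the closed-form Black-Scholes-Merton price for the same parameters. The following sketch is an addition to the procedure above, not part of it; the function names `norm_cdf` and `bsm_call_value` are illustrative, and only the standard library is used.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bsm_call_value(S0, K, T, r, sigma):
    """Closed-form BSM value of a European call option."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

# Same parameter values as in the Monte Carlo example
bsm_value = bsm_call_value(100.0, 105.0, 1.0, 0.05, 0.2)
print("Closed-form value of the European call option %5.3f" % bsm_value)
```

With 100,000 simulated index levels, the Monte Carlo estimate should typically land close to this value; the remaining gap is sampling error, which shrinks as the number of simulations grows.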

Three aspects are worth considering:
* Syntax
* Translation
* Vectorization
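
To make the vectorization aspect concrete, the sketch below computes the same Monte Carlo estimator twice on identical random numbers: once with an explicit Python loop and once with NumPy array operations. The names `n_paths`, `rng`, `C0_loop`, and `C0_vec` are illustrative, the seed is arbitrary, and a smaller number of simulations is assumed so that the loop finishes quickly.

```python
import numpy as np

S0, K, T, r, sigma = 100.0, 105.0, 1.0, 0.05, 0.2
n_paths = 10000  # kept small so the loop version runs quickly

rng = np.random.default_rng(seed=42)  # seeded for reproducibility
z = rng.standard_normal(n_paths)

# Loop version: one index level and one payoff per iteration.
payoff_sum = 0.0
for z_i in z:
    ST_i = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z_i)
    payoff_sum += max(ST_i - K, 0.0)
C0_loop = np.exp(-r * T) * payoff_sum / n_paths

# Vectorized version: the same formula applied to the whole array at once.
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
C0_vec = np.exp(-r * T) * np.maximum(ST - K, 0).mean()

print("loop: %5.3f  vectorized: %5.3f" % (C0_loop, C0_vec))
```

Both versions give the same estimate up to floating-point rounding; the vectorized one is shorter and reads almost like Equation 1-1.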

If the code is reused regularly, it typically gets organized in so-called modules (or scripts), which are single Python (i.e., text) files with the suffix .py. 

In [None]:
#
# Monte Carlo valuation of European call option
# in Black-Scholes-Merton model
# bsm_mcs_euro.py
#
import numpy as np

# Parameter Values
S0 = 100.	# initial index level
K = 105.	# strike price
T = 1.0	# time-to-maturity
r = 0.05	# riskless short rate
sigma = 0.2	# volatility

I = 100000	# number of simulations

# Valuation Algorithm
z = np.random.standard_normal(I)  # pseudorandom numbers
# index values at maturity
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
hT = np.maximum(ST - K, 0)  # inner values at maturity
C0 = np.exp(-r * T) * np.sum(hT) / I  # Monte Carlo estimator

# Result Output
print("Value of the European Call Option %5.3f" %C0)

In this setting, three languages cover the whole workflow:

* **English** for writing and talking about scientific and financial problems, etc.
* **Mathematics** for concisely and exactly describing and modeling abstract aspects, algorithms, complex quantities, etc.
* **Python** for technically modeling and implementing abstract aspects, algorithms, complex quantities, etc.

### Efficiency and Productivity Through Python

Benefits from using Python can be measured in three dimensions:
* Efficiency
* Productivity
* Quality

Consider a finance student, writing her master’s thesis and interested in Google stock prices. She wants to analyze historical stock price information for, say, five years to see how the volatility of the stock price has fluctuated over time. She wants to find evidence that volatility, in contrast to some typical model assumptions, fluctuates over time and is far from being constant. The results should also be visualized. She mainly has to do the following:
* Download Google stock price data from the Web.
* Calculate the rolling standard deviation of the log returns (volatility).
* Plot the stock price data and the results.

First, make sure that all necessary libraries are available:

In [None]:
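!conda list

If the <code>pandas_datareader</code> package does not appear in the list, it has to be installed first (for example, via <code>conda install pandas-datareader</code>).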

In [None]:
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import datetime

Second, retrieve the data from, say, Google:

In [None]:
start = datetime.datetime(2009, 3, 14)
end = datetime.datetime(2014, 4, 14)

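# NOTE: the 'google' data source may no longer be supported by recent
# pandas-datareader releases; another supported source may be needed.
goog = web.DataReader('GOOG', 'google', start, end)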

In [None]:
goog.tail()

In [None]:
goog['Open'].tail()

Third, implement the necessary analytics for the volatilities:

In [None]:
goog['Log_Ret'] = np.log(goog['Close'] / goog['Close'].shift(1))
# annualized rolling volatility over a 252-day window
# (pd.rolling_std has been removed from newer pandas versions; use .rolling().std())
goog['Volatility'] = goog['Log_Ret'].rolling(window=252).std() * np.sqrt(252)

Fourth, plot the results.

In [None]:
%matplotlib inline

In [None]:
goog[['Close', 'Volatility']].plot(subplots=True, color='blue', figsize=(8, 6))
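
Since the download step depends on a remote data source, the analytics can also be tried out offline. The sketch below applies the same log-return and rolling-volatility calculation to simulated prices; the price series, the seed, the start date, and the names `prices` and `daily_log_returns` are assumptions made for illustration only, not real market data.

```python
import numpy as np
import pandas as pd

# Simulated daily closing prices; a stand-in for downloaded data (assumption).
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2009-03-16", periods=1000)
daily_log_returns = rng.normal(loc=0.0, scale=0.2 / np.sqrt(252), size=len(dates))
prices = pd.DataFrame({"Close": 100.0 * np.exp(np.cumsum(daily_log_returns))},
                      index=dates)

# Same analytics as above: log returns and annualized rolling volatility.
prices["Log_Ret"] = np.log(prices["Close"] / prices["Close"].shift(1))
prices["Volatility"] = prices["Log_Ret"].rolling(window=252).std() * np.sqrt(252)

print(prices[["Close", "Volatility"]].tail())
```

The simulated returns have a constant true volatility of 20%, so any movement in the rolling estimate here is sampling noise only; real stock data would be expected to fluctuate far more.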

#### Ensuring High Performance

Due to the very nature of Python being an interpreted language, the _prejudice_ persists that Python generally is too slow for compute-intensive tasks in finance. In principle, one can distinguish at least three different strategies for better performance:
* Paradigm
* Compiling
* Parallelization

A quite common task in financial analytics is to evaluate complex mathematical expressions on large arrays of numbers. To this end, Python itself provides everything needed:

In [None]:
loops = 25000000
from math import *
a = range(1, loops)
def f(x):
    return 3 * log(x) + cos(x) ** 2

%timeit r = [f(x) for x in a]

The same task can be implemented using <code>NumPy</code>, which provides optimized (i.e., _pre-compiled_) functions to handle such array-based operations:

In [None]:
import numpy as np

a = np.arange(1, loops)

%timeit r = 3 * np.log(a) + np.cos(a) ** 2

There is even a library specifically dedicated to this kind of task. It is called <code>numexpr</code>, for “numerical expressions.” It _compiles_ the expression to improve upon the performance of NumPy’s general functionality by, for example, avoiding in-memory copies of intermediate arrays:

In [None]:
import numexpr as ne

ne.set_num_threads(1)
f = '3 * log(a) + cos(a) ** 2'

%timeit r = ne.evaluate(f)

<code>numexpr</code> also has built-in capabilities to parallelize the execution of the respective operation. This allows us to use all available threads of a CPU:

In [None]:
ne.set_num_threads(4)

%timeit r = ne.evaluate(f)
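
The `%timeit` magic is only available inside IPython. Outside of it, a similar comparison can be sketched with the standard library's `timeit` module, as below. The function names `f_scalar`, `run_loop`, and `run_numpy` are illustrative, and a much smaller array is assumed so that the sketch runs in moments; absolute timings depend on the machine.

```python
import math
import timeit

import numpy as np

n = 100_000  # far fewer elements than above
xs = range(1, n)
arr = np.arange(1, n)

def f_scalar(x):
    return 3 * math.log(x) + math.cos(x) ** 2

def run_loop():
    # pure Python: one function call per element
    return [f_scalar(x) for x in xs]

def run_numpy():
    # NumPy: the whole expression evaluated on the array
    return 3 * np.log(arr) + np.cos(arr) ** 2

# Both strategies should produce the same numbers.
results_agree = np.allclose(run_loop(), run_numpy())

# timeit.timeit returns the total time in seconds for `number` calls.
t_loop = timeit.timeit(run_loop, number=5)
t_numpy = timeit.timeit(run_numpy, number=5)
print("agree: %s, loop: %.4fs, NumPy: %.4fs" % (results_agree, t_loop, t_numpy))
```

Checking that the results agree before comparing timings guards against benchmarking two expressions that are not actually equivalent.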

### From Prototyping to Production

Efficiency in interactive analytics and performance when it comes to execution speed are certainly two benefits of Python to consider. Yet another major benefit of using Python for finance might at first sight seem a bit subtler; at second sight it might present itself as an important strategic factor. It is the possibility to use Python end to end, from prototyping to production.

Today’s practice in financial institutions is often characterized by a separated, two-step process:
1. _Quantitative analysts (“quants”)_ are responsible for model development and technical prototyping. At this stage, one is mainly looking for a proof of concept and/or a prototype that exhibits the main desired features of an algorithm or a whole application.
2. IT departments with their _developers_ then take over and are responsible for translating the existing _prototype code_ into reliable, maintainable, and performant _production code_.

This two-step approach has a number of generally unintended consequences:
* Inefficiencies
* Diverse skill set
* Legacy code

Using Python, on the other hand, enables a _streamlined_ end-to-end process from the first interactive prototyping steps to highly reliable and efficiently maintainable production code.