# Introduction to Python for quantitative finance.


![UCLouvain_Logo_Pos_CMJN.png](attachment:UCLouvain_Logo_Pos_CMJN.png)



*Author* : Ubeydullah Ozcan, Risk Modeling and Pricing Actuary, Allianz Benelux (ozcanubey@outlook.com)


## Course 1: Variables, loops, functions, importing data and plots
In this first course, we are going to discuss some basic but crucial concepts such as variables, loops and functions. This course aims to give you the basics of Python programming, so that you have the keys to keep developing your skills on your own.

A **variable** is a name that refers to a value stored in memory. In other words, a variable is just a labelled space where you put whatever you want. In Python, a variable can store numbers, lists, tuples, strings, dictionaries, dataframes, etc.

A **loop** in a computer program is an instruction that repeats until a specified condition is reached. In this introduction, we will discuss the "for" loop and the "while" loop in more detail. Keep in mind that loops are among the most basic and powerful programming concepts.

Python allows you to import and export data in many formats such as Excel, CSV, TXT, JSON, SQL, etc. In this course, we will focus on how to import data from the web (e.g. Yahoo Finance) and from external files.

A quick introduction to plotting will be given, but we will go into more detail about creating visuals later in the course.

Finally, a **function** is a group of related statements that performs a specific task. Functions help break a program into smaller, modular chunks. As a program grows larger, functions make it more organized and manageable. Furthermore, they avoid repetition and make the code reusable. **REMARK**: Python and its packages already provide many ready-made functions, which are different from the functions we will implement ourselves. For example, a mean function already exists (e.g. `np.mean()` in NumPy or `statistics.mean()` in the standard library), so you do not need to implement it again. But if you want to perform a specific task (e.g. analysing a stock's returns over a certain period) and **automate** it, then you will create your own function!

### 1. Variables
Let's start with the most basic concept: the variable. Here are some ways to create variables:

In [2]:
a = 123 # integer
b = "Hello World!" # String
c = 12.2 # float
d = True # boolean
e = a == b # boolean: result of the comparison "is a equal to b?"
print(a, b, c, d, e) # statement to print out result

123 Hello World! 12.2 True False
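To see which type Python assigned to each variable, you can use the built-in `type()` function. A minimal sketch reusing the same values:

```python
a = 123
b = "Hello World!"
c = 12.2
d = True
e = a == b

# type() returns the class of the object each variable refers to
for value in (a, b, c, d, e):
    print(repr(value), "->", type(value).__name__)
```

This prints `int`, `str`, `float`, `bool` and `bool`: Python infers the type from the value, you never declare it.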


What can we do with all of this? For example, we can do calculations, check conditions or access the elements of a string (indexing starts at 0, and in a slice such as `b[2:5]` the end index is excluded):

In [30]:
print(a + c, b[0], b[2:5])

135.2 H llo


Another important type of variable is the **list**:

In [33]:
# Avoid naming a variable `list`: it would hide Python's built-in list type
my_list = [ 'abcd', 786 , 2.23, 'john', 70.2 ]
tinylist = [123, 'john']

print(my_list)            # Prints the complete list
print(my_list[0])         # Prints the first element of the list
print(my_list[1:3])       # Prints elements from the 2nd to the 3rd (index 3 excluded)
print(my_list[2:])        # Prints elements starting from the 3rd element
print(tinylist * 2)       # Prints the list two times
print(my_list + tinylist) # Prints the concatenated lists

['abcd', 786, 2.23, 'john', 70.2]
abcd
[786, 2.23]
[2.23, 'john', 70.2]
[123, 'john', 123, 'john']
['abcd', 786, 2.23, 'john', 70.2, 123, 'john']
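Lists are often used to hold a series of observations. As a small illustration (the prices below are made-up numbers, not market data), here is how `append`, `len` and a list comprehension can turn a list of prices into simple returns $r_t = P_t / P_{t-1} - 1$:

```python
prices = [100.0, 102.0, 101.0, 105.0]  # hypothetical closing prices
prices.append(104.0)                   # add one more observation at the end

# Simple return between consecutive prices: P_t / P_{t-1} - 1
returns = [prices[t] / prices[t - 1] - 1 for t in range(1, len(prices))]

print(len(prices))                     # 5
print([round(r, 4) for r in returns])  # [0.02, -0.0098, 0.0396, -0.0095]
```

With 5 prices we get 4 returns, since the first price has no predecessor.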


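A **dictionary** stores key/value pairs: a value is accessed through its key rather than through its position.

In [35]:
my_dict = {}  # avoid naming it `dict`, which would hide the built-in type
my_dict['one'] = "This is one"
my_dict[2]     = "This is two"

tinydict = {'name': 'john','code':6734, 'dept': 'sales'}

print(my_dict['one'])    # Prints the value for the 'one' key
print(my_dict[2])        # Prints the value for the 2 key
print(tinydict)          # Prints the complete dictionary
print(tinydict.keys())   # Prints all the keys
print(tinydict.values()) # Prints all the values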

This is one
This is two
{'name': 'john', 'code': 6734, 'dept': 'sales'}
dict_keys(['name', 'code', 'dept'])
dict_values(['john', 6734, 'sales'])
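Dictionaries are handy to map a label to a value, for instance a ticker to a price. A minimal sketch (the prices are round illustrative numbers, not real quotes) showing how to add a key, read a possibly missing key safely with `.get()`, and test membership:

```python
last_price = {"AAPL": 150.0, "MSFT": 300.0}  # illustrative prices, not real quotes

last_price["GOOG"] = 2700.0  # add a new key/value pair

# .get() returns a default value instead of raising a KeyError for a missing key
print(last_price.get("TSLA", "not available"))  # not available

print("MSFT" in last_price)  # True: `in` tests the keys
print(len(last_price))       # 3
```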


This is a course on programming for quantitative finance: now that we know variables, let's do some math!

Let's take some values and see how arithmetic works in Python (`**` is the power operator):

In [3]:
x = 3
y = 5.3
print(x + y)
print(x - y)
print(x * y)
print(x / y)
print(y ** x)

8.3
-2.3
15.899999999999999
0.5660377358490566
148.87699999999998
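You may notice that `x * y` prints `15.899999999999999` instead of `15.9`. Floats are stored in binary, and most decimal fractions such as 5.3 cannot be represented exactly, so tiny rounding errors appear. When comparing floats, a tolerance-based check such as `math.isclose` is safer than `==`. The sketch below also shows integer division `//` and the remainder `%`:

```python
import math

x = 3
y = 5.3

print(x * y == 15.9)              # False: exact comparison fails
print(math.isclose(x * y, 15.9))  # True: comparison with a relative tolerance
print(round(x * y, 2))            # 15.9: rounding for display

print(17 // 5)  # 3: integer (floor) division
print(17 % 5)   # 2: remainder of the division
```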


In [39]:
print(x <= y)
print(x == y)
print(x >= y)
print(x != y)

True
False
False
True


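Another important concept in programming is logic. The most important logical operations are **AND**, **OR** and **XOR** (exclusive or: true when exactly one of the two operands is true). On booleans, Python's `&`, `|` and `^` operators compute them: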

In [41]:
print(False & True)
print(False | True)
print(False ^ True)

False
True
True
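`&`, `|` and `^` are bitwise operators; on `True`/`False` they behave like AND, OR and XOR. For everyday conditions, Python's logical keywords `and`, `or` and `not` are more idiomatic, and they short-circuit: the right-hand side is only evaluated if it is needed. XOR between two booleans can simply be written with `!=`:

```python
p = False
q = True

print(p and q)   # AND -> False
print(p or q)    # OR  -> True
print(not p)     # NOT -> True
print(p != q)    # XOR for booleans -> True

# Short-circuit: since `n != 0` is False, the division is never evaluated
n = 0
print(n != 0 and 10 / n > 1)  # False, and no ZeroDivisionError
```

One caveat with `&`: it binds more tightly than comparisons, so `x > 0 & y > 0` does not mean what it looks like; with `&` you need parentheses, as in `(x > 0) & (y > 0)`.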


### 2. Conditions and loops (If/else, For, While)

Let's now introduce some of the most important concepts in any programming language: the if/else statement and the for and while loops! These tools are crucial to become, one day, an active programmer. Here is a flow chart to visualize how a simple if/else statement works.

![If.PNG](attachment:If.PNG)

And here is an example of how to use if/else in Python:

In [4]:
x = 99

if x % 2 == 0:
    print(x, "is an even number")
else:
    print(x, "is an odd number")

99 is an odd number


`%` is the modulus operator: it returns the remainder of a division. If a number divided by 2 leaves a remainder of zero, the number is even; otherwise, it is odd. With these few lines of Python code, we can check whether a number is even or odd.

What if we want to build a more complex conditional statement? Use the `elif` statement (short for "else if") to add more conditions. Here is an example:

In [54]:
x = -2

if x > 0:
    print(x, "positive")
elif x == 0:
    print(x, "zero")
else:
    print(x, "negative")

-2 negative


The piece of code above checks whether a number is positive, negative or zero.

Not enough? Let's try something more complex with logical operators. The next statement checks whether 2 numbers are both positive or both zero; otherwise, at least one of them is negative or zero.

In [99]:
x = 2
y = 3

if x > 0 and y > 0:
    print(x, y, "positive")
elif x == 0 and y == 0:
    print(x, y, "zero")
else:
    print("at least one of them is negative or zero")

2 3 positive


As you can see, the sky is the limit!

Let's now introduce the for and while loops. These tools are useful when we want to repeat an operation.

A **for** loop is useful when the goal is to perform an operation on every element of an object (a list, a string, a range of numbers, etc.).

A **while** loop is useful when the goal is to repeat an operation as long as a condition holds, i.e. until a stopping condition is reached.

An example is better than theory!

In [83]:
Loop_1 = ["Hello", "World", 10.2]
n = len(Loop_1) 
for i in range(n):
    print(Loop_1[i])

Hello
World
10.2


Here `n = len(Loop_1)` equals 3, and `range(3)` creates the sequence of 3 integers 0, 1 and 2, which are used as indices to access each element.
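Looping over indices with `range(len(...))` works, but in Python you can iterate over the elements directly. When you also need the position, the built-in `enumerate` gives both:

```python
Loop_1 = ["Hello", "World", 10.2]

# Iterate directly over the elements
for element in Loop_1:
    print(element)

# enumerate yields (index, element) pairs, starting at 0 by default
for i, element in enumerate(Loop_1):
    print(i, element)

print(list(range(3)))  # [0, 1, 2]
```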

Let's now imagine we want to print all even numbers between 1 and 20. Here is how to do it:

In [15]:
Loop_2 = range(1, 21)  # 1, 2, ..., 20: the upper bound 21 is excluded
for p in Loop_2:
    if p % 2 == 0:     # keep only the even numbers
        print(p)

2
4
6
8
10
12
14
16
18
20
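Since `range` accepts a step as third argument, the even numbers can also be generated without any test. A list comprehension with a condition is another compact way to collect them into a list:

```python
# range(start, stop, step): stop is excluded, so use 21 to include 20
evens = list(range(2, 21, 2))
print(evens)

# Same result with a list comprehension and the modulus test
evens_comp = [p for p in range(1, 21) if p % 2 == 0]
print(evens == evens_comp)  # True
```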


Now let's see how a while loop works. As a reminder, a while loop repeats an operation as long as its condition is true, and stops as soon as the condition becomes false.

In [89]:
x = 0

while x < 10:
    print(x)
    x += 1

0
1
2
3
4
5
6
7
8
9


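`x += 1` is shorthand for `x = x + 1` in Python. With while loops, the most important thing to watch out for is the infinite loop: if the condition never becomes false (for example, if you forget `x += 1` above), the loop never stops.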
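A typical use of a `while` loop in finance is when the number of iterations is not known in advance. As a sketch, assuming an annual compounding rate of 5% (an arbitrary value), the loop below counts how many years it takes for an initial capital of 100 to double. The extra cap on the number of years is a simple guard against an infinite loop:

```python
capital = 100.0   # initial capital
rate = 0.05       # assumed annual interest rate
years = 0
max_years = 1000  # safety cap so the loop always terminates

while capital < 200.0 and years < max_years:
    capital *= 1 + rate  # same as capital = capital * (1 + rate)
    years += 1

print(years, round(capital, 2))  # 15 207.89
```

After 14 years the capital is still below 200 (about 197.99), so the loop stops after the 15th year.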

### 3. Functions

This section focuses on creating functions. The code we wrote so far consists of simple scripts, which is not the most efficient way to work if we want to automate or reproduce a calculation. This is why we need to define functions.

A function is a Python object that gives you the flexibility to define your own inputs (parameters) and outputs (the value it returns). Let's create a function that checks whether the sum of 2 numbers is odd or even, and returns that sum. Here is the structure of the code:

In [2]:
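def parity(x, y):
    z = x + y  # sum of the two inputs
    if z % 2 == 0:  # remainder 0 when divided by 2 -> even
        print("Sum of ", x, " and ", y," is an even number")
    else:
        print("Sum of ", x, " and ", y," is an odd number")

    return z  # the sum is also returned, so it can be reused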

In [8]:
parity(27, 39)

Sum of  27  and  39  is an even number


66
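The `parity` function both prints and returns a value. In practice it is often cleaner to let a function *return* its result and leave printing to the caller, so that the result can be reused in further calculations. A minimal sketch with a docstring and a default argument (`future_value` is a hypothetical helper written for this example, not a library function):

```python
def future_value(principal, rate, years=1):
    """Return the value of `principal` after `years` years at annual rate `rate`, compounded annually."""
    return principal * (1 + rate) ** years


fv_one_year = future_value(100, 0.05)       # years defaults to 1
fv_ten_years = future_value(100, 0.05, 10)  # explicit number of years
print(round(fv_one_year, 2), round(fv_ten_years, 2))  # 105.0 162.89
print(future_value.__doc__)                 # the docstring documents the function
```

Default arguments such as `years=1` let the caller omit a parameter when the usual value is fine.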

### 4. Import data

A very important feature of Python is the ability to easily import different types of data and manipulate them. At the same time, let's introduce the Python package pandas, which is useful to import and manipulate datasets. To install a Python package, you can use pip (e.g. `pip install pandas` in a terminal) or, more easily, the Anaconda package installer.

Once it is installed, here is how to load the package in your environment:

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("Data/Course 1/training.csv")
df.head()

|  | id_policy | year | pol_no_claims_discount | pol_coverage | pol_duration | pol_sit_duration | pol_pay_freq | pol_payd | pol_usage | drv_sex1 | ... | vh_make_model | vh_age | vh_fuel | vh_type | vh_speed | vh_value | vh_weight | population | town_surface_area | claim_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PL000000 | 1.0 | 0.332 | Med2 | 5 | 1 | Monthly | No | WorkPrivate | M | ... | aparvvfowrjncdhp | 8.0 | Gasoline | Tourism | 174.0 | 11040.0 | 1143.0 | 1270.0 | 33.1 | 0.0 |
| 1 | PL042495 | 1.0 | 0.0 | Med2 | 6 | 1 | Monthly | No | WorkPrivate | M | ... | aparvvfowrjncdhp | 10.0 | Diesel | Tourism | 174.0 | 11040.0 | 1143.0 | 1290.0 | 51.3 | 0.0 |
| 2 | PL042496 | 1.0 | 0.196 | Med1 | 2 | 1 | Yearly | Yes | Retired | M | ... | iwhqpdfuhrsxyqxe | 8.0 | Diesel | Commercial | 150.0 | 14159.0 | 1193.0 | 1020.0 | 262.8 | 0.0 |
| 3 | PL042497 | 1.0 | 0.0 | Med2 | 8 | 5 | Yearly | No | WorkPrivate | F | ... | kvcddisqpkysmvvo | 4.0 | Gasoline | Tourism | 149.0 | 17233.0 | 1012.0 | 180.0 | 219.7 | 0.0 |
| 4 | PL042498 | 1.0 | 0.0 | Med1 | 2 | 2 | Yearly | No | Retired | F | ... | tdgkjlphosocwbgu | 13.0 | Gasoline | Tourism | 200.0 | 19422.0 | 1315.0 | 30.0 | 70.3 | 0.0 |


Using pandas, the data has been automatically stored in a DataFrame, a type of object widely used in Python to store and manipulate tabular data.
Let's now try to manipulate this DataFrame: select one column, several columns, or only the rows that satisfy a condition.

In [13]:
df["id_policy"]  # a single column is returned as a pandas Series

0         PL000000
1         PL042495
2         PL042496
3         PL042497
4         PL042498
            ...   
228211    PL008818
228212    PL055033
228213    PL061619
228214    PL060903
228215    PL052240
Name: id_policy, Length: 228216, dtype: object

In [14]:
df[["id_policy", "year"]]  # a list of columns returns a DataFrame

|  | id_policy | year |
|---|---|---|
| 0 | PL000000 | 1.0 |
| 1 | PL042495 | 1.0 |
| 2 | PL042496 | 1.0 |
| 3 | PL042497 | 1.0 |
| 4 | PL042498 | 1.0 |
| ... | ... | ... |
| 228211 | PL008818 | 4.0 |
| 228212 | PL055033 | 4.0 |
| 228213 | PL061619 | 4.0 |
| 228214 | PL060903 | 4.0 |


In [15]:
df[df["year"] == 4].head()  # keep only the rows where year equals 4

|  | id_policy | year | pol_no_claims_discount | pol_coverage | pol_duration | pol_sit_duration | pol_pay_freq | pol_payd | pol_usage | drv_sex1 | ... | vh_make_model | vh_age | vh_fuel | vh_type | vh_speed | vh_value | vh_weight | population | town_surface_area | claim_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 171162 | PL005608 | 4.0 | 0.0 | Med2 | 7 | 6 | Biannual | No | WorkPrivate | M | ... | gapclpflkdsbeorm | 16.0 | Diesel | Commercial | 150.0 | 25662.0 | 1780.0 | 330.0 | 38.7 | 0.0 |
| 171163 | PL082834 | 4.0 | 0.0 | Max | 29 | 13 | Yearly | No | Retired | M | ... | aparvvfowrjncdhp | 16.0 | Diesel | Tourism | 174.0 | 11040.0 | 1143.0 | 80.0 | 111.7 | 0.0 |
| 171164 | PL095626 | 4.0 | 0.196 | Max | 12 | 7 | Yearly | No | WorkPrivate | F | ... | iulvirmzdntweaee | 13.0 | Gasoline | Tourism | 164.0 | 14696.0 | 936.0 | 30.0 | 138.6 | 0.0 |
| 171165 | PL085678 | 4.0 | 0.0 | Med2 | 17 | 8 | Biannual | No | WorkPrivate | M | ... | iulvirmzdntweaee | 18.0 | Diesel | Tourism | 164.0 | 14696.0 | 936.0 | 1020.0 | 359.4 | 0.0 |
| 171166 | PL067999 | 4.0 | 0.0 | Max | 17 | 4 | Monthly | No | WorkPrivate | M | ... | jjycmklnkdivnypu | 4.0 | Gasoline | Tourism | 174.0 | 13904.0 | 1260.0 | 280.0 | 745.4 | 0.0 |


In [19]:
df_filtered = df[(df["year"] == 1) & (df["pol_duration"] < 10)]
print(df_filtered)

      id_policy  year  pol_no_claims_discount pol_coverage  pol_duration  \
0      PL000000   1.0                   0.332         Med2             5   
1      PL042495   1.0                   0.000         Med2             6   
2      PL042496   1.0                   0.196         Med1             2   
3      PL042497   1.0                   0.000         Med2             8   
4      PL042498   1.0                   0.000         Med1             2   
...         ...   ...                     ...          ...           ...   
57043  PL000210   1.0                   0.000          Max             1   
57044  PL004688   1.0                   0.225          Max             1   
57045  PL014290   1.0                   0.225          Max             7   
57047  PL003646   1.0                   0.000          Max             1   
57052  PL012984   1.0                   0.000          Min             2   

       pol_sit_duration pol_pay_freq pol_payd    pol_usage drv_sex1  ...  \
0          

In [20]:
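import openpyxl  # engine used by pandas to write .xlsx files (must be installed)
# Forward slashes work on Windows, macOS and Linux; the "Output" folder must already exist
df_filtered.to_excel("Output/filtered.xlsx", index=True)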

In [46]:
df[(df["year"] == 1) & (df["pol_duration"] < 10)][["id_policy", "drv_sex1", "claim_amount"]]

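|  | id_policy | drv_sex1 | claim_amount |
|---|---|---|---|
| 0 | PL000000 | M | 0.0 |
| 1 | PL042495 | M | 0.0 |
| 2 | PL042496 | M | 0.0 |
| 3 | PL042497 | F | 0.0 |
| 4 | PL042498 | F | 0.0 |
| ... | ... | ... | ... |
| 57043 | PL000210 | F | 0.0 |
| 57044 | PL004688 | M | 0.0 |
| 57045 | PL014290 | M | 0.0 |
| 57047 | PL003646 | M | 0.0 |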
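The filter above works in two steps: each comparison such as `df["year"] == 1` produces a boolean Series (a mask), `&` combines two masks element-wise (the parentheses are required because of operator precedence), and `df[mask]` keeps the rows where the mask is `True`. Since `training.csv` may not be at hand, here is a sketch on a tiny made-up DataFrame with the same column names; it also shows `.loc`, which filters rows and selects columns in a single step:

```python
import pandas as pd

# Tiny illustrative DataFrame (made-up values, same column names as training.csv)
df_small = pd.DataFrame({
    "id_policy": ["PL1", "PL2", "PL3", "PL4"],
    "year": [1.0, 1.0, 2.0, 1.0],
    "pol_duration": [5, 12, 3, 8],
    "drv_sex1": ["M", "F", "F", "M"],
    "claim_amount": [0.0, 250.0, 0.0, 1200.0],
})

mask = (df_small["year"] == 1) & (df_small["pol_duration"] < 10)
print(mask.tolist())  # [True, False, False, True]

# .loc[row_mask, column_list] filters rows and selects columns at once
selection = df_small.loc[mask, ["id_policy", "drv_sex1", "claim_amount"]]
print(selection)
```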


In [35]:
df.groupby(['pol_coverage','drv_sex1'])["claim_amount"].sum()

pol_coverage  drv_sex1
Max           F            9274075.19
              M           12804380.74
Med1          F             331688.34
              M             719897.96
Med2          F             931941.23
              M            1510278.20
Min           F             124681.46
              M             361044.96
Name: claim_amount, dtype: float64

In [47]:
df.groupby(['pol_coverage','drv_sex1'])['claim_amount'].agg('sum')

pol_coverage  drv_sex1
Max           F            9274075.19
              M           12804380.74
Med1          F             331688.34
              M             719897.96
Med2          F             931941.23
              M            1510278.20
Min           F             124681.46
              M             361044.96
Name: claim_amount, dtype: float64

In [50]:
df.groupby(['pol_coverage','drv_sex1'])['claim_amount'].agg('mean')

pol_coverage  drv_sex1
Max           F           145.841723
              M           154.407312
Med1          F            45.213787
              M            49.799250
Med2          F            64.772118
              M            58.802297
Min           F            24.767870
              M            24.391634
Name: claim_amount, dtype: float64
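`groupby` splits the rows into groups (here one group per coverage/sex combination), applies an aggregation to `claim_amount` within each group, and combines the results; `.sum()` and `.agg('sum')` give the same output. `.agg` also accepts a list of function names to compute several statistics at once. A sketch on made-up data:

```python
import pandas as pd

# Made-up claims data for illustration only
claims = pd.DataFrame({
    "pol_coverage": ["Max", "Max", "Min", "Min", "Min"],
    "drv_sex1": ["F", "M", "F", "F", "M"],
    "claim_amount": [100.0, 300.0, 0.0, 50.0, 20.0],
})

# One row per (coverage, sex) group, one column per statistic
summary = claims.groupby(["pol_coverage", "drv_sex1"])["claim_amount"].agg(["sum", "mean", "count"])
print(summary)
```

For the group (`Min`, `F`) the two claims 0 and 50 give a sum of 50, a mean of 25 and a count of 2.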

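#### Data from the web
Let's now see how to download data directly from the web (here from Yahoo Finance, through the `pandas_datareader` package) and manipulate it. Note that Yahoo Finance has changed its interface over time, so the `'yahoo'` source of `pandas_datareader` may not work with recent versions; the `yfinance` package is a commonly used alternative.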


In [1]:
import pandas_datareader.data as web
import datetime

In [2]:
start = datetime.datetime(2021, 8, 1)
end = datetime.datetime(2021, 8, 31)

In [7]:
apple = web.DataReader('AAPL', 'yahoo', start, end)
apple.head()

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2021-08-02 | 146.949997 | 145.25 | 146.360001 | 145.520004 | 62880000.0 | 145.302307 |
| 2021-08-03 | 148.039993 | 145.179993 | 145.809998 | 147.360001 | 64786600.0 | 147.139542 |
| 2021-08-04 | 147.789993 | 146.279999 | 147.270004 | 146.949997 | 56368300.0 | 146.730164 |
| 2021-08-05 | 147.839996 | 146.169998 | 146.979996 | 147.059998 | 46397700.0 | 146.839996 |
| 2021-08-06 | 147.110001 | 145.630005 | 146.350006 | 146.139999 | 54067400.0 | 146.139999 |


In [2]:
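apple["Close"].head()  # select the closing prices of the downloaded data

Date
2021-08-02    145.520004
2021-08-03    147.360001
2021-08-04    146.949997
2021-08-05    147.059998
2021-08-06    146.139999
Name: Close, dtype: float64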
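A natural next step in quantitative finance is to turn prices into returns. As a sketch that does not need an internet connection, the five AAPL closing prices from the table above are typed in by hand; with the downloaded data you would apply the same calls to `apple["Close"]`. `pct_change()` computes simple returns $P_t / P_{t-1} - 1$ (the first value is `NaN` because there is no previous price), and `np.log(...).diff()` gives log returns:

```python
import numpy as np
import pandas as pd

# Closing prices copied by hand from the AAPL table above (August 2021)
close = pd.Series(
    [145.520004, 147.360001, 146.949997, 147.059998, 146.139999],
    index=pd.to_datetime(["2021-08-02", "2021-08-03", "2021-08-04", "2021-08-05", "2021-08-06"]),
    name="Close",
)

simple_returns = close.pct_change()  # P_t / P_{t-1} - 1
log_returns = np.log(close).diff()   # ln(P_t) - ln(P_{t-1})

print(simple_returns)
print(log_returns)
```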

### 5. Vectors and matrices
In this section, we are going to discuss NumPy and pandas. NumPy is the fundamental package for scientific computing with Python, and pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. Using these two packages, we are going to create and manipulate vectors, matrices and dataframes, and use these objects for calculations and data analysis.

In [3]:
import numpy as np
import pandas as pd

In [4]:
# This is a vector
a = np.array([1,2,3])
print(a)

[1 2 3]


In [5]:
# This is a matrix
b = np.array([[9.0,8.0,7.0],[6.0,5.0,4.0]])
print(b)

[[9. 8. 7.]
 [6. 5. 4.]]


In [6]:
# Get Number of array dimensions. 
a.ndim

1

In [7]:
b.shape

(2, 3)
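Besides `ndim` and `shape`, every array has a `dtype` (the type of its elements) and a `size` (the total number of elements). `reshape` returns the same data arranged in a new shape, as long as the number of elements stays the same:

```python
import numpy as np

b = np.array([[9.0, 8.0, 7.0], [6.0, 5.0, 4.0]])

print(b.dtype)  # float64
print(b.size)   # 6 = 2 rows x 3 columns

r = b.reshape(3, 2)   # same 6 values, now 3 rows and 2 columns (filled row by row)
print(r)
print(b.reshape(-1))  # -1 lets NumPy infer the length: flattens to 1-D
```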

### Accessing/Changing specific elements, rows, columns, etc

In [9]:
s = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])
s

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [10]:
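print(s[1,2])   # row index 1, column index 2 (indices start at 0)
print(s[-1,-1]) # negative indices count from the end: last row, last column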

6
12


In [11]:
t = s[0,:]  # first row, all columns (":" selects everything along that axis)
t

array([1, 2, 3])
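Be careful: a slice such as `s[0,:]` is a *view* on the original array, not a copy, so modifying the slice also modifies the original. Use `.copy()` when you need an independent array. A small sketch on a separate array, so that `s` above is left untouched:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

row_view = grid[0, :]         # a view: shares memory with grid
row_copy = grid[0, :].copy()  # an independent copy

row_view[0] = 100  # also modifies grid[0, 0]
row_copy[1] = -1   # leaves grid unchanged

print(grid[0].tolist())   # [100, 2, 3]
print(row_copy.tolist())  # [1, -1, 3]
```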

In [12]:
s[1,2] = 20  # change a single element (2nd row, 3rd column)
s

array([[ 1,  2,  3],
       [ 4,  5, 20],
       [ 7,  8,  9],
       [10, 11, 12]])

In [13]:
s[:,0] = [1,2,3,4]  # replace the whole first column
s

array([[ 1,  2,  3],
       [ 2,  5, 20],
       [ 3,  8,  9],
       [ 4, 11, 12]])

In [14]:
b = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])  # 3-D array: 2 matrices of shape 2x2
print(b)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


In [15]:
b[0,1,1] # First matrix, row 2, col 2

4

In [16]:
b[1,1,1] # Second matrix, row 2, col 2

8

In [18]:
b[:,1,:] = [[9,9],[8,8]]  # replace the 2nd row of each matrix
b

array([[[1, 2],
        [9, 9]],

       [[5, 6],
        [8, 8]]])
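Arrays can also be indexed with a boolean condition, which selects (or modifies) every element where the condition holds. This is handy, for example, to keep only positive values or to floor negative values at zero:

```python
import numpy as np

values = np.array([[3, -1, 7], [-4, 5, 0]])

mask = values > 0
print(mask)          # element-wise comparison -> boolean array of the same shape
print(values[mask])  # selected elements, as a 1-D array: [3 7 5]

clipped = values.copy()
clipped[clipped < 0] = 0  # replace negative entries by 0
print(clipped)
```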

In [19]:
np.zeros((2,3))  # 2x3 matrix filled with zeros

array([[0., 0., 0.],
       [0., 0., 0.]])

In [20]:
np.ones((4,2,2))  # 4 matrices of shape 2x2 filled with ones

array([[[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]]])

In [21]:
np.full((2,2), 99)  # 2x2 matrix filled with 99

array([[99, 99],
       [99, 99]])

In [22]:
np.full_like(a, 4)  # same shape (and type) as a, filled with 4

array([4, 4, 4])

In [23]:
np.random.rand(4,2)  # 4x2 array of uniform random numbers in [0, 1)

array([[0.23039198, 0.80699189],
       [0.95111022, 0.72014514],
       [0.20211665, 0.41038666],
       [0.55220755, 0.54964209]])
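The numbers above change every time the cell is run. For reproducible simulations, the NumPy documentation recommends creating a random generator with `np.random.default_rng` and a fixed seed. Note that `integers`, like `np.random.randint`, excludes the upper bound by default. The seed 42 below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility

u = rng.random((4, 2))                   # uniform draws in [0, 1)
ints = rng.integers(-4, 8, size=(3, 3))  # integers in [-4, 8), i.e. from -4 to 7

# Re-creating a generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(u, rng_again.random((4, 2))))  # True
```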

In [24]:
np.random.randint(-4,8, size=(3,3))  # random integers from -4 to 7 (the upper bound 8 is excluded)

array([[ 6,  1,  7],
       [-1,  3,  7],
       [ 4,  2,  3]])

In [25]:
np.identity(5)  # 5x5 identity matrix

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [26]:
arr = np.array([[1,2,3]])
r1 = np.repeat(arr,3, axis=0)  # repeat the single row 3 times along axis 0 (rows)
print(r1)

[[1 2 3]
 [1 2 3]
 [1 2 3]]


In [27]:
output = np.ones((5,5))
print(output)

z = np.zeros((3,3))
z[1,1] = 9
print(z)

output[1:-1,1:-1] = z  # overwrite the inner 3x3 block (all but the first and last rows/columns)
print(output)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[0. 0. 0.]
 [0. 9. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 9. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]


In [28]:
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])
print(a+2)
print(a*2)
print(a+b)
print(a**2)
print(np.cos(a))
print(np.sum(a))

[3 4 5 6]
[2 4 6 8]
[ 6  8 10 12]
[ 1  4  9 16]
[ 0.54030231 -0.41614684 -0.9899925  -0.65364362]
10
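Reductions such as `np.sum` also accept an `axis` argument. On a 2-D array, `axis=0` aggregates down each column and `axis=1` across each row. For instance, with a made-up matrix of returns where rows are days and columns are assets, column means give the average return per asset:

```python
import numpy as np

# Rows = days, columns = assets (illustrative numbers)
returns = np.array([[0.01, 0.02],
                    [-0.01, 0.00],
                    [0.03, 0.01]])

print(returns.mean(axis=0))         # mean per asset (down each column)
print(returns.sum(axis=1))          # total per day (across each row)
print(returns.std(axis=0, ddof=1))  # sample standard deviation per asset
```

`ddof=1` divides by $n - 1$ instead of $n$, which gives the usual sample estimator of the standard deviation.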


### Matrices: product, inverse, determinant, etc.

In [29]:
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
n = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])

In [31]:
print(m)
print(n)
print(a)
print(b)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[10 11 12]
 [13 14 15]
 [16 17 18]]
[1 2 3 4]
[5 6 7 8]


In [35]:
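print(np.sum(a))      # Sum of all elements
print(np.vdot(a,b))   # Dot product of two vectors
print(np.outer(a,b))  # Outer product: element (i, j) is a[i] * b[j]
print(np.matmul(m,n)) # Matrix product
print(np.dot(m,n))    # For 2-D arrays, np.dot also computes the matrix product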

10
70
[[ 5  6  7  8]
 [10 12 14 16]
 [15 18 21 24]
 [20 24 28 32]]
[[ 84  90  96]
 [201 216 231]
 [318 342 366]]
[[ 84  90  96]
 [201 216 231]
 [318 342 366]]
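Note that `*` on two arrays is the element-wise product, not the matrix product. Since Python 3.5, the `@` operator is a shorthand for `np.matmul`:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
n = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

print(m * n)  # element-wise: first row is [10 22 36]
print(m @ n)  # matrix product, same as np.matmul(m, n)
print(np.array_equal(m @ n, np.matmul(m, n)))  # True
```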


In [37]:
k = np.array([[1, 10, 100], [2, 9, 17], [5, 1, 12]])
print(k)

[[  1  10 100]
 [  2   9  17]
 [  5   1  12]]


In [38]:
print(np.linalg.inv(k))   # Inverse matrix
print(np.transpose(k))    # Transpose (also available as k.T)
print(np.linalg.det(k))   # Determinant

[[-0.0252848   0.0055571   0.20283412]
 [-0.01694915  0.13559322 -0.05084746]
 [ 0.01194776 -0.01361489  0.0030564 ]]
[[  1   2   5]
 [ 10   9   1]
 [100  17  12]]
-3599.0000000000014
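Because of floating-point rounding, the determinant prints as `-3599.0000000000014` rather than exactly -3599. For the same reason, an inverse should be checked with a tolerance: `k @ inv(k)` should be close to the identity matrix. To solve a linear system $Ax = b$, `np.linalg.solve` is generally preferred over computing the inverse explicitly, as it is faster and numerically more stable. The right-hand side below is an arbitrary example, chosen as the row sums of `k` so that the solution is $(1, 1, 1)$:

```python
import numpy as np

k = np.array([[1, 10, 100], [2, 9, 17], [5, 1, 12]])

k_inv = np.linalg.inv(k)
print(np.allclose(k @ k_inv, np.identity(3)))  # True

rhs = np.array([111, 28, 18])  # arbitrary right-hand side (row sums of k)
x = np.linalg.solve(k, rhs)    # solves k @ x = rhs
print(x)                       # close to [1, 1, 1]
print(np.allclose(k @ x, rhs)) # True
```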


# Exercises: your turn!

## 1) Roots of a quadratic function
The goal of this exercise is to implement a quadratic function and automatically find its roots for a given set of parameters. Let's define f(x) as a quadratic function (with $a \neq 0$):

$$ f(x) = ax^2 + bx + c $$

The solutions (roots) of a quadratic equation are all the $x$ such that $f(x) = 0$. Real solutions exist if and only if the discriminant is non-negative:

$$\Delta = b^2 - 4ac \geq 0$$

We can define the solutions as follows:

if $\Delta > 0$: $$x_{1,2} = \frac{-b \pm \sqrt{\Delta}}{2a} $$

else if $\Delta = 0$: $$x_1 = x_2 = \frac{-b}{2a} $$

else ($\Delta < 0$), there is no real solution.
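As a worked example you can use to check your function, take $a = 1$, $b = -3$ and $c = 2$, i.e. $f(x) = x^2 - 3x + 2$:

$$\Delta = (-3)^2 - 4 \cdot 1 \cdot 2 = 9 - 8 = 1 > 0$$

$$x_{1,2} = \frac{3 \pm \sqrt{1}}{2 \cdot 1} \quad\Rightarrow\quad x_1 = 2, \quad x_2 = 1$$

Indeed, $f(2) = 4 - 6 + 2 = 0$ and $f(1) = 1 - 3 + 2 = 0$. By contrast, with $a = 1$, $b = 0$, $c = 1$ we get $\Delta = 0^2 - 4 \cdot 1 \cdot 1 = -4 < 0$, so there is no real solution.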

***Let's define a function with 3 parameters (a, b, c) as inputs. The outputs of this function have to be the solutions of the equation, and you have to check that your solutions are correct! If there is no solution, I want a message that explains why.***

## 2) Roots of a quadratic function: multiple parameters

Now, let's do the same as exercise 1), but the inputs of the function now accept multiple values per parameter, so that the function can calculate the solutions of several quadratic functions at once. If I give the function $n$ values of "a", $n$ values of "b" and $n$ values of "c", the function has to return $n$ solutions with $n$ checks.

## 3) Solving a system of linear equations
If you need a reminder on how to solve a system of linear equations using matrices, here is a gentle one: https://www.mathsisfun.com/algebra/systems-linear-equations-matrices.html

Here is the system of equations to solve:
$$ x_0 + 2 x_1 = 1 $$
$$ 3 x_0 + 5 x_1 = 2 $$

Let's find $x_0$ and $x_1$.
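As a hint, the system can be written in matrix form $A x = b$ with

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 5 \end{pmatrix}, \quad x = \begin{pmatrix} x_0 \\ x_1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

Since $\det A = 1 \cdot 5 - 2 \cdot 3 = -1 \neq 0$, the matrix $A$ is invertible and the system has a unique solution $x = A^{-1} b$.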

## 4) Heads or Tails?
Let's code the game Heads or Tails. First, ask the player for their choice. Then, simulate the result of a coin toss. Finally, determine whether the player wins or loses.

HINT: use `random.randint(0, 1)` from Python's built-in `random` module to simulate the coin toss (both bounds are included). Be careful: NumPy's `np.random.randint(0, 1)` excludes the upper bound and would always return 0; with NumPy, use `np.random.randint(0, 2)` instead.

## 5) Heads or Tails 10,000 times?
Can you now simulate this game 10,000 times? I'm sure you can :)
Remark: forget about the game, just focus on the result of the coin toss.