<a href="https://colab.research.google.com/github/diengiau/py18plus/blob/master/02_varStats.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Variable

We often use a variable to store data and reuse it later in a program. In Python, assignment takes the form:

`variableName = value`

For example:


In [123]:
x = 15
4*x

60

In [124]:
x**2

225

A variable can be re-assigned another value:

In [125]:
x = 20
x

20

We can also create a variable that stores all the prices of a stock. To do so, we introduce a new data type: the list.
A list is like a vector: it stores many individual values in order.

`listName = [value1, value2, ...]`

In [0]:
prices = [90, 94, 97, 96, 91, 89, 95, 97, 100, 105]

In [127]:
prices

[90, 94, 97, 96, 91, 89, 95, 97, 100, 105]

In [128]:
len(prices)

10

In [129]:
prices[0]

90

In [130]:
prices[1]

94

In [131]:
prices[9]

105

In [0]:
# prices[10]  # IndexError: the list has only 10 elements, so the largest valid index is 9
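
As a small sketch of the indexing rules above, the example below also shows negative indices and slices, which are standard Python list features not used elsewhere in this section. The variable names `last_price`, `first_three`, and `last_index` are chosen only for illustration.

```python
prices = [90, 94, 97, 96, 91, 89, 95, 97, 100, 105]

last_price = prices[-1]        # negative indices count from the end of the list
first_three = prices[:3]       # a slice: the elements at indices 0, 1, and 2
last_index = len(prices) - 1   # the largest valid index

print(last_price)   # 105
print(first_three)  # [90, 94, 97]
print(last_index)   # 9
```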

The next problem is how to calculate stock returns.

In [133]:
ret = []
ret

[]

In [134]:
(prices[1] - prices[0])/prices[0] # simple return

0.044444444444444446

In [135]:
import math # import a package in Python
math.log(prices[1]/prices[0]) # log returns

0.04348511193973889

Above, we import the `math` package and use its `log` function to calculate the natural logarithm of a number. In Python, we call a function `f` from an imported package `p` with the syntax `p.f()`.
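
To connect the two return definitions, the sketch below checks that the log return equals the natural log of one plus the simple return. It uses only the first two prices from the list; the names `p0`, `p1`, `simple_ret`, and `log_ret` are illustrative.

```python
import math

p0, p1 = 90, 94  # the first two prices from the list above

simple_ret = (p1 - p0) / p0    # simple return
log_ret = math.log(p1 / p0)    # log return

# Since p1 / p0 = 1 + simple_ret, the two definitions are linked by a logarithm
print(math.isclose(log_ret, math.log(1 + simple_ret)))  # True
```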

Repeating this calculation for each pair of consecutive prices, we can append the log returns to the `ret` list:

In [136]:
ret.append(math.log(prices[1]/prices[0]))
ret.append(math.log(prices[2]/prices[1]))
ret.append(math.log(prices[3]/prices[2]))
ret.append(math.log(prices[4]/prices[3]))
ret.append(math.log(prices[5]/prices[4]))
ret.append(math.log(prices[6]/prices[5]))
ret.append(math.log(prices[7]/prices[6]))
ret.append(math.log(prices[8]/prices[7]))
ret.append(math.log(prices[9]/prices[8]))
ret

[0.04348511193973889,
 0.031416196233378914,
 -0.010362787035546547,
 -0.053488684950986236,
 -0.022223136784710235,
 0.06524052186840094,
 0.020834086902842053,
 0.03045920748470844,
 0.04879016416943205]
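
The nine `append` calls follow one pattern, so the same list could also be built with a loop. The following is a minimal sketch of that idea, not the approach used in the cells above; `ret_lc` is a hypothetical name for the list-comprehension variant.

```python
import math

prices = [90, 94, 97, 96, 91, 89, 95, 97, 100, 105]

# Loop over indices 1..9 and compare each price with the previous one
ret = []
for i in range(1, len(prices)):
    ret.append(math.log(prices[i] / prices[i - 1]))

# The same computation as a list comprehension over consecutive price pairs
ret_lc = [math.log(p1 / p0) for p0, p1 in zip(prices[:-1], prices[1:])]

print(len(ret))  # 9
```

With 10 prices there are 9 returns, because each return needs a previous price.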

# 2. Basic stats and operations for a single variable

We can calculate summary statistics such as the mean, median, variance, and standard deviation.
For this we use a new package/library, `numpy`.

In [137]:
import numpy as np
np.mean(ret)

0.01712785331413981

In [138]:
np.median(ret)

0.03045920748470844

In [139]:
np.var(ret)

0.0012998427784805112

In [140]:
np.std(ret)

0.036053332418522856
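
One detail worth knowing: by default `np.var` and `np.std` divide by `n` (the population formulas). Passing `ddof=1` divides by `n - 1` instead, which gives the sample versions. The sketch below illustrates the difference on a small made-up data set (`data` is hypothetical, chosen so the numbers are easy to verify by hand).

```python
import numpy as np

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values, mean = 5

pop_var = np.var(data)              # sum of squared deviations (32) divided by n = 8
sample_var = np.var(data, ddof=1)   # the same sum divided by n - 1 = 7
pop_std = np.std(data)              # square root of the population variance

print(pop_var)   # 4.0
print(pop_std)   # 2.0
print(np.isclose(sample_var, 32 / 7))  # True
```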

In [141]:
np.sum(ret)

0.15415067982725827

In [142]:
np.min(ret)

-0.053488684950986236

In [143]:
np.max(ret)

0.06524052186840094

`numpy` also provides common math operations, such as the exponential and the square root:

In [144]:
np.exp(2) # exponential

7.38905609893065

In [145]:
np.sqrt(81) # same as 81**0.5

9.0

In [146]:
np.log(100) # natural log

4.605170185988092

In [147]:
np.log10(100) # base-10 log

2.0

In [148]:
np.abs(-90)

90

In [149]:
np.floor(20.9), np.ceil(20.9)

(20.0, 21.0)

In [150]:
np.cumprod(ret) # cumulative product of the elements

array([ 4.34851119e-02,  1.36613681e-03, -1.41569848e-05,  7.57238501e-07,
       -1.68282148e-08, -1.09788151e-09, -2.28733589e-11, -6.96704384e-13,
       -3.39923213e-14])
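
`np.cumprod` multiplies the elements one after another. For log returns, a running sum is often the more useful quantity, because log returns add up over time. The sketch below is an illustration under that assumption: it rebuilds the log returns with `np.diff` and `np.log`, accumulates them with `np.cumsum`, and checks that the total matches the overall price change. The names `cum_log_ret` and `growth` are hypothetical.

```python
import numpy as np

prices = np.array([90, 94, 97, 96, 91, 89, 95, 97, 100, 105])

ret = np.diff(np.log(prices))   # log returns: log(p[t]) - log(p[t-1])
cum_log_ret = np.cumsum(ret)    # running sum of log returns
growth = np.exp(cum_log_ret)    # each price relative to the first price

# The last growth factor equals the last price divided by the first price
print(np.isclose(growth[-1], 105 / 90))  # True
```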

# 3. Two variables

First, we create two random variables, each containing 100 values.


In [0]:
np.random.seed(42)
x = np.random.randn(100) # 100 standard normal observations
e = np.random.uniform(low=0, high=1, size=100) # uniform noise on [0, 1)
y = 0.2*x + e

In [152]:
x[:10], y[:10] # first ten observations only

(array([ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986, -0.23415337,
        -0.23413696,  1.57921282,  0.76743473, -0.46947439,  0.54256004]),
 array([0.51675383, 0.19445495, 0.24940307, 0.64222114, 0.89607903,
        0.27637554, 0.83463318, 0.8565059 , 0.26973473, 1.08029409]))

In [153]:
x.shape, y.shape

((100,), (100,))

In [154]:
x.dtype, y.dtype

(dtype('float64'), dtype('float64'))

## Binary Operations

Some element-wise binary operations between two arrays:

In [155]:
np.add(x, y)[:10]

array([ 1.01346799,  0.05619065,  0.89709161,  2.165251  ,  0.66192565,
        0.04223858,  2.413846  ,  1.62394063, -0.19973966,  1.62285414])

In [156]:
np.subtract(x, y)[:10]

array([-0.02003968, -0.33271925,  0.39828546,  0.88080871, -1.1302324 ,
       -0.5105125 ,  0.74457963, -0.08907118, -0.73920911, -0.53773405])

In [157]:
np.multiply(x, y)[:10]

array([ 0.25667894, -0.02688618,  0.16153551,  0.97812197, -0.20981993,
       -0.06470973,  1.31806342,  0.65731238, -0.12663354,  0.58612441])

In [158]:
np.divide(x, y)[:10]

array([ 0.96122006, -0.71103513,  2.5969549 ,  2.37150376, -0.26130884,
       -0.8471696 ,  1.89210403,  0.89600635, -1.74050407,  0.50223365])

In [159]:
np.greater(x, y)[:10]

array([False, False,  True,  True, False, False,  True, False, False,
       False])
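
For `numpy` arrays, the usual Python operators give the same element-wise results as these functions. The sketch below checks this on two small made-up arrays (`a` and `b` are hypothetical, not the `x` and `y` above).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # hypothetical example arrays
b = np.array([4.0, 5.0, 0.5])

# Each operator applies element by element, like the corresponding function
print(np.array_equal(a + b, np.add(a, b)))        # True
print(np.array_equal(a - b, np.subtract(a, b)))   # True
print(np.array_equal(a * b, np.multiply(a, b)))   # True
print(np.array_equal(a / b, np.divide(a, b)))     # True
print(a > b)                                      # [False False  True]
```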

## Correlation between two variables

In [160]:
np.corrcoef(x, y)

array([[1.        , 0.60845792],
       [0.60845792, 1.        ]])

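The off-diagonal entries give the correlation between `x` and `y`: about 0.61, a fairly strong positive correlation, which is expected because `y` is constructed from `x`.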
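
To see where the off-diagonal value of `np.corrcoef` comes from, the sketch below computes the Pearson correlation by hand, as the covariance divided by the product of the two standard deviations, and compares it with `numpy`'s result. The names `cov_xy`, `corr_manual`, and `corr_numpy` are illustrative.

```python
import numpy as np

# Recreate the two variables from the cell above
np.random.seed(42)
x = np.random.randn(100)
e = np.random.uniform(low=0, high=1, size=100)
y = 0.2 * x + e

# Covariance: the average product of deviations from the means
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: covariance scaled by both standard deviations
corr_manual = cov_xy / (np.std(x) * np.std(y))
corr_numpy = np.corrcoef(x, y)[0, 1]

print(np.isclose(corr_manual, corr_numpy))  # True
```

Both the covariance and the standard deviations here divide by `n`, so the factor cancels and the result agrees with `np.corrcoef`.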