# QF 625 Introduction to Programming
## Lesson 2 | An Introduction to `NumPy` | `RE`view

> Welcome back, Team :) Let's begin our first meeting of Week 2.

> In the previous lesson, you learned about the `methods and functions` available in built-in Python, along with variables and data types.

> First, let us begin with some basic built-in Python that you need to fully understand before we proceed.

> Here's a quick reminder regarding the useful hotkeys for scripting on Jupyter Notebook :)

- `a` inserts a cell above 
- `b` inserts a cell below
- `dd` (double d) deletes a cell 
- `esc` jumps out of the cell
- `return/enter` gets into the cell
- `m` makes your cell markdown
- `y` makes your cell code
- `shift + control + -` splits the cell
- `shift + m` merges the cells
- `shift + l` toggles line numbers in all cells

> Now you will start learning about how to use `packages`.

> As Python is an open-source language, its huge ecosystem of packages will help you work more efficiently.

> A `package is a collection of Python modules and scripts` giving you new data types, functions, and methods.


### How to `install` a ***package***?

> To use a package, you need to download it first.

> Let's use the command `pip3 install target_package` so that you can install packages of your interest.

> You need to install a package just once, but you must import it into your workspace each time you wish to use it.

> The command below will load the NumPy package into Python for your use.

In [1]:
import numpy as np

***Wait, why do we use an alias here (i.e., `as np`)?***

> As you will see below, 

- To access the array() function, you need to use np.array() to indicate that the function is from the NumPy package.

> Yes, we use `np` as an alias to reduce the amount of typing.
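
> As a small sketch of what the alias does (the variable names below are only for illustration): `np` is simply another name for the same module, so `np.array` and `numpy.array` reach the same function.

```python
import numpy
import numpy as np

# The alias and the full name refer to the same module object.
same_module = np is numpy
print(same_module)

# Both spellings call the same function.
alias_array = np.array([1, 2, 3])
full_name_array = numpy.array([1, 2, 3])
print(alias_array, full_name_array)
```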

### The Basics

> Using `NumPy`, you can create a new data type called `array`.

> Why use data type `array`?

**`array` is useful for financial analysis because...**

- array `stores` data more efficiently

- array `performs` computations faster than built-in Python lists (reading and writing items is faster because the package is optimized for numerical analysis)

- array `shows` better performance with relatively larger datasets

- array, most importantly, **`enables` you to utilize `array-related functions`**--you can perform statistical modelling and visualization more easily, which is critical for financial analysis.
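
> A rough way to check the speed claim yourself is to time the same calculation on a list and on an array. The data and the function names below are made up for illustration, and timings differ from machine to machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

# Hypothetical data: 100,000 made-up prices.
prices_as_list = [float(i) for i in range(100_000)]
prices_as_array = np.array(prices_as_list)

def double_with_list():
    # Built-in Python: visit every element in a loop.
    return [p * 2 for p in prices_as_list]

def double_with_array():
    # NumPy: one vectorized operation on the whole array.
    return prices_as_array * 2

list_seconds = timeit.timeit(double_with_list, number=20)
array_seconds = timeit.timeit(double_with_array, number=20)
print("list :", list_seconds)
print("array:", array_seconds)

# Both approaches produce the same numbers.
same_result = np.array_equal(np.array(double_with_list()), double_with_array())
print(same_result)
```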





> **A good way to understand the usefulness of NumPy is to compare an array with a list (yes, the list that you learned about in the previous lesson).**

#### Difference 1. Arrays can contain only a single data type (unlike lists).

In [2]:
your_list = ["Year", 2019, False]
print(your_list)

['Year', 2019, False]


In [3]:
type(your_list)

list

> As you will see below, NumPy converts all the elements of the list to a single data type that can represent every one of them (here, strings).

In [4]:
# Note that function array() takes a list as its input. 
your_array = np.array(["Year", 2019, False])

# As noted above, to use array(), you need to use np.array()
print(your_array)

your_array2 = np.array([2020, "Coronavirus", True])
print(your_array2)

['Year' '2019' 'False']
['2020' 'Coronavirus' 'True']


In [5]:
print(type(your_array))

<class 'numpy.ndarray'>
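
> To see which single data type NumPy settled on, you can inspect the `dtype` of the array. This is a minimal sketch; the array names are only for illustration.

```python
import numpy as np

# Text, integer, and boolean together: everything becomes text.
mixed_array = np.array(["Year", 2019, False])
print(mixed_array.dtype.kind)   # 'U' stands for Unicode text

# Integer and float together: everything becomes float.
int_and_float = np.array([1, 2.5])
print(int_and_float.dtype)      # float64

# Integer and boolean together: True is stored as 1.
int_and_bool = np.array([2019, True])
print(int_and_bool)
```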


In [6]:
# Here are lists.
earnings_list = [10.09, 10.28, 2.21, 6.19, 8.24]
prices_list = [99.98, 87.68, 154.23, 162.12, 121.11]

> How would you make objects `earnings` and `prices` arrays?

In [7]:
earnings_array = np.array(earnings_list)
prices_array = np.array(prices_list)

print(earnings_array);print(prices_array)

[10.09 10.28  2.21  6.19  8.24]
[ 99.98  87.68 154.23 162.12 121.11]


#### Difference 2. Arrays handle operations differently (than lists).

> Let's see how lists behave first.

In [8]:
pe_ratio_list = prices_list + earnings_list
print(pe_ratio_list)

[99.98, 87.68, 154.23, 162.12, 121.11, 10.09, 10.28, 2.21, 6.19, 8.24]


> The two objects were merely concatenated. That's not what we want...

> ***Arrays allow for efficient numerical manipulation of their elements.***

> Let's calculate `the dollar amount an investor can expect to invest in a company to receive one dollar of that company’s earnings`--yes, the `price to earnings ratio`--using two arrays, earnings_array and prices_array above.

In [9]:
pe_ratio_array = prices_array / earnings_array
print(pe_ratio_array)

[ 9.90882061  8.52918288 69.78733032 26.19063005 14.69781553]


> You can see here that arrays perform `element-wise mathematical operations`.
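
> For comparison, here is a sketch of the same ratio computed with an explicit loop, plus one more element-wise operation. The name `prices_plus_fee` and the fee of 1.0 are made up for illustration.

```python
import numpy as np

earnings_array = np.array([10.09, 10.28, 2.21, 6.19, 8.24])
prices_array = np.array([99.98, 87.68, 154.23, 162.12, 121.11])

# Element-wise division, as in the lesson.
pe_ratio_array = prices_array / earnings_array

# The same calculation on plain lists needs an explicit loop.
pe_ratio_list = [p / e for p, e in zip(prices_array.tolist(), earnings_array.tolist())]
print(np.allclose(pe_ratio_array, pe_ratio_list))

# A single number is applied to every element of the array.
prices_plus_fee = prices_array + 1.0
print(prices_plus_fee)
```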

#### Indexing, Subsetting, Filtering, & Slicing: Similarities between `array` and `list`

> We have seen differences between arrays and lists.

> Here are some similarities as well.

In [10]:
earnings_subset_three_in_the_middle = earnings_array[1:4]
print(earnings_subset_three_in_the_middle)

[10.28  2.21  6.19]


In [11]:
earnings_subset_the_last_two = earnings_array[-2:]
print(earnings_subset_the_last_two)

[6.19 8.24]


In [12]:
earnings_subset_every_other_element = earnings_array[0:5:2]

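> The cell above runs without an error, but it only assigns the result; print `earnings_subset_every_other_element` to see every other element.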
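
> As a quick check of the `[start:stop:step]` slice, the sketch below rebuilds the earnings array and prints the selection. The name `every_other` is only for illustration.

```python
import numpy as np

earnings_array = np.array([10.09, 10.28, 2.21, 6.19, 8.24])

# [start:stop:step] picks positions 0, 2, and 4.
every_other = earnings_array[0:5:2]
print(every_other)

# Leaving out start and stop gives the same selection.
same_selection = np.array_equal(every_other, earnings_array[::2])
print(same_selection)
```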

### Arrays in NumPy can be `multi`dimensional.

![](ndim.png)

> Financial data often comes in a rectangular form with rows and columns.

> Such data can be represented with two-dimensional arrays.

> To create a two-dimensional array using NumPy, you can use the same function array().

> Instead of providing a single list as your input, let's pass in a list of two lists as your input.

> Here, let's pass earnings and prices to create a two-dimensional array.

In [13]:
pe_array = np.array([[10.09, 10.28, 2.21, 6.19, 8.24],[99.98, 87.68, 154.23, 162.12, 121.11]])
print(pe_array)

# Recall that there were two lists of earnings_list and prices_list
pe_array2 = np.array([earnings_list, prices_list])
print(pe_array2)

[[ 10.09  10.28   2.21   6.19   8.24]
 [ 99.98  87.68 154.23 162.12 121.11]]
[[ 10.09  10.28   2.21   6.19   8.24]
 [ 99.98  87.68 154.23 162.12 121.11]]


In [14]:
pe_array == pe_array2

array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

> You might want to use `boolean` arrays as well. 

> As you will see below, Boolean arrays are quite useful for subsetting--stay tuned :)
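
> If you only need a single yes/no answer instead of one Boolean per element, the comparison can be collapsed. A minimal sketch, rebuilding the two arrays from the same lists:

```python
import numpy as np

earnings_list = [10.09, 10.28, 2.21, 6.19, 8.24]
prices_list = [99.98, 87.68, 154.23, 162.12, 121.11]

pe_array = np.array([earnings_list, prices_list])
pe_array2 = np.array([earnings_list, prices_list])

# Element-wise comparison gives a Boolean array of the same shape.
elementwise = (pe_array == pe_array2)
print(elementwise.shape)

# .all() is True only if every single comparison is True.
print(elementwise.all())

# np.array_equal checks shape and values in one call.
print(np.array_equal(pe_array, pe_array2))
```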

#### Array Attributes and Methods

> Like lists, arrays also have many useful attributes and methods.

##### array.shape

In [15]:
np.shape(pe_array)

(2, 5)

##### array.size

In [16]:
np.size(pe_array)

10

##### array.transpose

In [17]:
pe_array_transposed = np.transpose(pe_array)

In [18]:
print(pe_array_transposed)

[[ 10.09  99.98]
 [ 10.28  87.68]
 [  2.21 154.23]
 [  6.19 162.12]
 [  8.24 121.11]]


In [19]:
print(pe_array_transposed.shape);print(pe_array_transposed.size)

(5, 2)
10


> Remember how to subset nested lists? Subsetting two-dimensional arrays is similar to subsetting nested lists. 

> In a 2D array, the indexing/slicing should be specific to the dimension of the array: **`array[row, column]`**
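
> A short sketch of the difference in syntax, using a made-up three-row example: a nested list needs one pair of brackets per level, while an array takes the row and the column inside a single pair.

```python
import numpy as np

# Hypothetical data: three rows of [earnings, price].
nested_list = [[10.09, 99.98], [10.28, 87.68], [2.21, 154.23]]
two_d_array = np.array(nested_list)

# Nested list: one bracket per level.
print(nested_list[1][0])

# Array: row and column inside one bracket.
print(two_d_array[1, 0])

# A whole column is one slice with an array...
first_column = two_d_array[:, 0]

# ...but needs a loop with a nested list.
first_column_from_list = [row[0] for row in nested_list]
print(first_column, first_column_from_list)
```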

##### How would you subset `earnings` from the `transposed pe_array`? 

In [20]:
earnings = pe_array_transposed[ : , 0]
print(earnings)

[10.09 10.28  2.21  6.19  8.24]


##### How would you subset `prices` from the `transposed pe_array`? 

In [21]:
prices = pe_array_transposed[ : , -1]
print(prices)

[ 99.98  87.68 154.23 162.12 121.11]


##### How would you subset the `earnings and prices for third and fourth companies` from the `transposed pe_array`?

In [22]:
pe_34 = pe_array_transposed[2:4, : ]
print(pe_34)

[[  2.21 154.23]
 [  6.19 162.12]]


> ***Review & Expansion of Your Vocabulary: Below are some useful basics for array.***

In [23]:
# Get Dimension
pe_array_transposed.ndim

2

In [24]:
# Get Shape
pe_array_transposed.shape

(5, 2)

In [25]:
# Get Type
pe_array_transposed.dtype

dtype('float64')

In [26]:
# Get the Size in Bytes of One Element in Your Array
pe_array_transposed.itemsize

8

In [27]:
# Get the Total Size in Bytes
pe_array_transposed.nbytes

80

In [28]:
# Get the Number of Elements
pe_array_transposed.size

10

In [29]:
# Get a specific element [row, column]
pe_array_transposed[1, 1]

87.68

In [30]:
# Get a specific row 
pe_array[1, :]

array([ 99.98,  87.68, 154.23, 162.12, 121.11])

In [31]:
# Get a specific column
pe_array[:, 4]

array([  8.24, 121.11])

In [32]:
# Getting a little more fancy [start:end:step]
pe_array[0, 1:-1:2]

array([10.28,  6.19])

#### `WARNING`: Please be careful when `copying arrays`!

In [33]:
a = np.array([1,2,3]) # Imagine that we have one array.
b = a
b[0] = 100

print(b) # This is fine.
print(a) # This is weird.

[100   2   3]
[100   2   3]


In [34]:
c = np.array([1,2,3])
d = c.copy() # use copy method 
d[0] = 100

print(c) # Now this will be fine :)

[1 2 3]
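
> The same caution applies to slices, which are views onto the original data and not copies. Here is a minimal sketch that uses `np.shares_memory` to check what is shared; the names `view` and `independent` are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = a                   # same object, nothing is copied
view = a[:2]            # a slice is a view onto the same data
independent = a.copy()  # a real copy with its own data

print(b is a)
print(np.shares_memory(a, view))
print(np.shares_memory(a, independent))

view[0] = 100           # this changes a as well
print(a)
print(independent)      # the copy is unaffected
```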


### Mathematics with NumPy

> **`We all love mathematics`. For a lot more**, [check this out](https://docs.scipy.org/doc/numpy/reference/routines.math.html).

- For example, for `linear algebra`, look [here](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html).

#### Statistics

> Not only can you perform element-wise calculations on NumPy arrays, you can also calculate summary statistics such as range, mean, and standard deviation of arrays using functions from NumPy.

##### Calculating the range (minimum and maximum values)

In [35]:
print(pe_array)

[[ 10.09  10.28   2.21   6.19   8.24]
 [ 99.98  87.68 154.23 162.12 121.11]]


In [36]:
np.min(pe_array)

2.21

In [37]:
np.max(pe_array, axis=1)

array([ 10.28, 162.12])

In [38]:
np.sum(pe_array, axis=0)

array([110.07,  97.96, 156.44, 168.31, 129.35])
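
> The `axis` argument decides which direction is collapsed: `axis=0` works down the rows and returns one value per column, while `axis=1` works across the columns and returns one value per row. A small sketch, with variable names chosen only for illustration:

```python
import numpy as np

pe_array = np.array([[10.09, 10.28, 2.21, 6.19, 8.24],
                     [99.98, 87.68, 154.23, 162.12, 121.11]])

# One value per column (here: per company).
column_max = np.max(pe_array, axis=0)

# One value per row (here: one for earnings, one for prices).
row_max = np.max(pe_array, axis=1)
print(column_max.shape, row_max.shape)

# Range of each row: maximum minus minimum.
row_range = np.max(pe_array, axis=1) - np.min(pe_array, axis=1)
print(row_range)
```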

##### Calculating the mean (`mean`) and standard deviation (`std`)

In [39]:
earnings_mean = np.mean(earnings_array)
print(earnings_mean)

7.401999999999999


In [40]:
earnings_mean2 = np.mean(pe_array[0,:])
earnings_mean == earnings_mean2

True

In [41]:
prices_std = np.std(prices_array)
print(prices_std)

29.210269837849836


In [42]:
prices_std2 = np.std(pe_array[-1,:])
prices_std == prices_std2

True
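
> One detail worth knowing: by default `np.std` divides by the number of elements (the population formula). If you want the sample standard deviation, which divides by one less, you can pass `ddof=1`. A minimal sketch with a manual check:

```python
import numpy as np

prices_array = np.array([99.98, 87.68, 154.23, 162.12, 121.11])

# Default (ddof=0): divide by N.
population_std = np.std(prices_array)

# ddof=1: divide by N - 1.
sample_std = np.std(prices_array, ddof=1)
print(population_std, sample_std)

# Manual version of the default calculation.
deviations = prices_array - np.mean(prices_array)
manual_population_std = np.sqrt(np.sum(deviations ** 2) / prices_array.size)
print(np.isclose(population_std, manual_population_std))
```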

##### Generating a sequence of numbers

> Often you may want to create an array of a range of numbers (e.g., 1 to 500) without having to type in every single number. 

> The NumPy function `arange()` is an efficient way to create numeric arrays of a range of numbers--using arange() can be much faster than typing each individual element.

> The arguments for `arange()` include the `start`, `stop`, and `step interval` as follows: `np.arange(start, stop, step)`


In [43]:
ticker_ids = np.arange(1, 501 , 1)
print(ticker_ids)

[  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
 235 236 237 238 239 240 241 242 243 244 245 246 24

> How would you create `odd numbers only`?

In [44]:
ticker_ids_odd = np.arange(1, 501, 2)
print(ticker_ids_odd)

[  1   3   5   7   9  11  13  15  17  19  21  23  25  27  29  31  33  35
  37  39  41  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71
  73  75  77  79  81  83  85  87  89  91  93  95  97  99 101 103 105 107
 109 111 113 115 117 119 121 123 125 127 129 131 133 135 137 139 141 143
 145 147 149 151 153 155 157 159 161 163 165 167 169 171 173 175 177 179
 181 183 185 187 189 191 193 195 197 199 201 203 205 207 209 211 213 215
 217 219 221 223 225 227 229 231 233 235 237 239 241 243 245 247 249 251
 253 255 257 259 261 263 265 267 269 271 273 275 277 279 281 283 285 287
 289 291 293 295 297 299 301 303 305 307 309 311 313 315 317 319 321 323
 325 327 329 331 333 335 337 339 341 343 345 347 349 351 353 355 357 359
 361 363 365 367 369 371 373 375 377 379 381 383 385 387 389 391 393 395
 397 399 401 403 405 407 409 411 413 415 417 419 421 423 425 427 429 431
 433 435 437 439 441 443 445 447 449 451 453 455 457 459 461 463 465 467
 469 471 473 475 477 479 481 483 485 487 489 491 49

> How would you create **`even`** numbers only then? 

In [45]:
ticker_ids_even = ticker_ids_odd + 1
print(ticker_ids_even)

[  2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36
  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72
  74  76  78  80  82  84  86  88  90  92  94  96  98 100 102 104 106 108
 110 112 114 116 118 120 122 124 126 128 130 132 134 136 138 140 142 144
 146 148 150 152 154 156 158 160 162 164 166 168 170 172 174 176 178 180
 182 184 186 188 190 192 194 196 198 200 202 204 206 208 210 212 214 216
 218 220 222 224 226 228 230 232 234 236 238 240 242 244 246 248 250 252
 254 256 258 260 262 264 266 268 270 272 274 276 278 280 282 284 286 288
 290 292 294 296 298 300 302 304 306 308 310 312 314 316 318 320 322 324
 326 328 330 332 334 336 338 340 342 344 346 348 350 352 354 356 358 360
 362 364 366 368 370 372 374 376 378 380 382 384 386 388 390 392 394 396
 398 400 402 404 406 408 410 412 414 416 418 420 422 424 426 428 430 432
 434 436 438 440 442 444 446 448 450 452 454 456 458 460 462 464 466 468
 470 472 474 476 478 480 482 484 486 488 490 492 49
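
> Another way to get the even numbers, sketched below, is to start `arange()` at 2 and keep the step of 2. Note that the `stop` value itself is never included.

```python
import numpy as np

ticker_ids_odd = np.arange(1, 501, 2)

# Start at 2, step by 2; the stop value 501 is excluded.
ticker_ids_even = np.arange(2, 501, 2)
print(ticker_ids_even[:3], ticker_ids_even[-1])

# Same result as adding 1 to the odd numbers.
same_as_shifted_odd = np.array_equal(ticker_ids_even, ticker_ids_odd + 1)
print(same_as_shifted_odd)
```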

#### Boolean arrays can be a very powerful way to subset arrays. 

> As a case in point, let's try to identify the earnings that are lower than average from an array of earnings.

> To do so, let's find the mean value of earnings first.

In [46]:
earnings_mean = np.mean(earnings_array)

##### How would you index earnings that are lower than average?

> Hint: You might want to create a boolean array first.

In [47]:
boolean_array = (earnings_array < earnings_mean)
print(boolean_array)

[False False  True  True False]


In [48]:
earnings_below_mean = earnings_array[boolean_array]
print(earnings_below_mean)

[2.21 6.19]


In [49]:
earnings_below_mean2 = earnings_array[earnings_array < earnings_mean]
earnings_below_mean == earnings_below_mean2

array([ True,  True])
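
> A Boolean array built from one array can also subset another array of the same length, and conditions can be combined with `&` (each condition in its own parentheses). A sketch, where the price threshold of 160 is made up for illustration:

```python
import numpy as np

earnings_array = np.array([10.09, 10.28, 2.21, 6.19, 8.24])
prices_array = np.array([99.98, 87.68, 154.23, 162.12, 121.11])

below_mean = earnings_array < np.mean(earnings_array)

# Use the earnings mask to pick the matching prices.
prices_of_low_earners = prices_array[below_mean]
print(prices_of_low_earners)

# Combine two conditions element by element.
low_and_expensive = below_mean & (prices_array > 160)
print(earnings_array[low_and_expensive])
```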

> Boolean arrays can be used with strings as well. 

> Let's first create an array of company names and an array of their associated industries. 

> Here, you want to find all companies that are categorized in the `Investment Services` industry.

In [50]:
company_array = np.array(["Facebook", "Amazon", "Goldman Sachs", "Red Bull", 
                         "Wells Fargo", "McKinsey", "Tesla"])
industry_array = np.array(["Internet", "Internet", "Investment Services", "Food & Beverage", 
                          "Investment Services", "Management Consulting", "Mobility"])

company_industry_array = np.array([company_array, industry_array])
print(company_industry_array)

[['Facebook' 'Amazon' 'Goldman Sachs' 'Red Bull' 'Wells Fargo' 'McKinsey'
  'Tesla']
 ['Internet' 'Internet' 'Investment Services' 'Food & Beverage'
  'Investment Services' 'Management Consulting' 'Mobility']]


##### How would you subset the Investment Services industry and print the companies in Investment Services?

In [51]:
bool_array = (industry_array == "Investment Services")
print(bool_array)

[False False  True False  True False False]


In [52]:
investment_services = company_array[bool_array]
print(investment_services)

['Goldman Sachs' 'Wells Fargo']
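
> The same selection can be made directly on the two-dimensional array, and `np.isin` helps when you want more than one industry at once. A sketch; the choice of `"Internet"` and `"Mobility"` is only an example.

```python
import numpy as np

company_array = np.array(["Facebook", "Amazon", "Goldman Sachs", "Red Bull",
                          "Wells Fargo", "McKinsey", "Tesla"])
industry_array = np.array(["Internet", "Internet", "Investment Services", "Food & Beverage",
                           "Investment Services", "Management Consulting", "Mobility"])
company_industry_array = np.array([company_array, industry_array])

# Row 0 holds the names, row 1 holds the industries.
industry_row = company_industry_array[1]
investment_services = company_industry_array[0, industry_row == "Investment Services"]
print(investment_services)

# np.isin builds a mask that matches any of several values.
tech_mask = np.isin(industry_row, ["Internet", "Mobility"])
tech_companies = company_industry_array[0, tech_mask]
print(tech_companies)
```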


> For your information, the `numpy_financial` package contains a collection of elementary financial functions. 

> It will make your life easier when working with financial values.

> For example, the function `.pv(rate, nper, pmt, fv)` calculates the present value of an investment from the following parameters:

- `rate` The rate of return of the investment
- `nper` The number of periods (the lifespan of the investment)
- `pmt` The (fixed) payment at the beginning or end of each period
- `fv` The future value of the investment

> You can use this function in many ways (e.g., you can express the value of a future payoff in today's dollars).

In [53]:
import numpy_financial as npf

> Before you run the code above, you should have installed the package `numpy-financial` (e.g., with `pip3 install numpy-financial`).

In [54]:
your_investment = npf.pv(rate=0.04, nper=20, pmt=0, fv=15000)

> Here, the present value returned is negative (it represents cash you would pay out today), so we multiply the result by -1.

In [55]:
print("Your Investment is worth " + str(round(-your_investment, 2)) + " in today's dollars")

Your Investment is worth 6845.8 in today's dollars


In [56]:
your_friend_investment = npf.pv(rate=0.02, nper=40, pmt=0, fv=15000)
print("Your friend's investment is worth " + str(round(-your_friend_investment, 2)) + " in today's dollars")

Your friend's investment is worth 6793.36 in today's dollars
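
> When `pmt` is 0, the present value is just the future amount divided by `(1 + rate) ** nper`. The sketch below reproduces the two figures above in plain Python, so it runs without `numpy_financial`. The helper `present_value` is hypothetical and returns a positive number, unlike `npf.pv`.

```python
def present_value(rate, nper, fv):
    # Hypothetical helper: discount a single future amount back to today.
    return fv / (1 + rate) ** nper

# 15,000 received in 20 years, discounted at 4% per year.
yours = present_value(rate=0.04, nper=20, fv=15000)

# 15,000 received in 40 years, discounted at 2% per year.
friend = present_value(rate=0.02, nper=40, fv=15000)

print(round(yours, 2), round(friend, 2))
```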


> Similarly, you can also calculate the future value of an investment with the following parameters:

- `rate` The rate of return of the investment
- `nper` The number of periods (the lifespan of the investment)
- `pmt` The (fixed) payment at the beginning or end of each period (which is 0 in our example)
- `pv` The present value of the investment

> Here, you can use the function .fv(rate, nper, pmt, pv).

> Note that you should `input a negative value into the pv parameter` if it represents `a negative cash flow (cash going out)`. 

> That is, if you were to compute the future value of an investment, requiring an up-front cash payment, you would need to `input a negative value to the pv parameter` in the function .fv().

##### Estimate Your Investment's Future Value

In [57]:
your_investment_future = npf.fv(rate=0.04, nper=20, pmt=0, pv=-20000)
print("Your investment will return a total of $" + str(round(your_investment_future, 2)) + " in 20 years")

Your investment will return a total of $43822.46 in 20 years


##### Estimate the Future Value of Your Friend's Investment

In [58]:
your_friend_investment_future = npf.fv(rate=0.08, nper=20, pmt=0, pv=-20000)
print("The future value of your friend's investment will return a total of $" + str(round(your_friend_investment_future, 2)) + " in 20 years")

The future value of your friend's investment will return a total of $93219.14 in 20 years


##### Now let's adjust future values of your investment for inflation with the following steps:

**1. forecast the future value of an investment given a rate of return**

**2. discount the future value of the investment by a projected inflation rate**

> Here, we will `utilize both functions .fv() and .pv()` to estimate the projected value of a given investment in today's dollars, adjusted for inflation.

> ***Scenario***: `Investment returning 7% per year for 25 years`

In [59]:
your_brother_investment = npf.fv(rate=0.07, nper=25, pmt=0, pv=-15000)
print("Your brother's investment will return a total of $" + str(round(your_brother_investment, 2)) + " in 25 years")

Your brother's investment will return a total of $81411.49 in 25 years


> ***Scenario***: `Inflation rate of 2.5% per year for 25 years`

In [60]:
your_brother_investment_discounted = npf.pv(rate=0.025, nper=25, pmt=0, fv=your_brother_investment)
print("After adjusting for inflation, your brother's investment is worth $" + str(round(-your_brother_investment_discounted, 2)) + " in today's dollars")

After adjusting for inflation, your brother's investment is worth $43912.59 in today's dollars
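
> The two steps can also be written out by hand, which shows that compounding at 7% and then discounting at 2.5% is the same as compounding once at the "real" rate `(1.07 / 1.025) - 1`. This is a sketch in plain Python under the same scenario; the variable names are only for illustration.

```python
initial_investment = 15000
rate_of_return = 0.07
inflation_rate = 0.025
years = 25

# Step 1: forecast the future value at the rate of return.
future_value = initial_investment * (1 + rate_of_return) ** years

# Step 2: discount that future value by the inflation rate.
real_value = future_value / (1 + inflation_rate) ** years

# Both steps in one: compound at the inflation-adjusted (real) rate.
real_rate = (1 + rate_of_return) / (1 + inflation_rate) - 1
real_value_one_step = initial_investment * (1 + real_rate) ** years

print(round(future_value, 2), round(real_value, 2), round(real_value_one_step, 2))
```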


> `Thank you for working with the script :)`

In [61]:
exit()