<img align=left src="http://www.nus.edu.sg/templates/t3_nus2015/images/assets/logos/logo.png" width=125>
<br><br>

# RE2708 Lecture 5

## Discounted Cash Flow Model

Dr. Cristian Badarinza

## Structure of this Lecture

- First part: **NPV and IRR (Monte Carlo Simulation)**

- Second part: **Web Scraping**

## Loading the libraries

In [None]:
import numpy as np
from matplotlib import pyplot as plt

## Case Study

* Lecture 5 covers the implementation of financial calculations (NPV and IRR), as well as a simulation technique for understanding the uncertainty surrounding a given investment.

* Let's start from a familiar case study:

<img src="InterlaceApartment.PNG">

## Investment opportunity

See the calculation in the Excel sheet:

<img src="CaseStudyCalculate.PNG" width=75%>

## Can we do the same calculation in Python?

First, set the assumptions right:

In [None]:
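price = 3000000    # purchase price
rent = 5700*12     # yearly rent (5,700 per month)
growth = 0.02      # yearly price growth
vacancy = 0.1      # expected vacancy rate
rate = 0.02        # discount rate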

Second, build the cash flow series:

In [None]:
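cf = np.zeros(6)                # years 0 to 5; a float array, so decimals are kept
cf[0] = -price                  # year 0: purchase
cf[1:5] = rent*(1-vacancy)      # years 1 to 4: rent net of vacancy
cf[5] = price*(1+growth)**5     # year 5: sale at the grown price
print(cf)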

## Calculate Net Present Value (NPV):

In [None]:
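import numpy_financial as npf   # np.npv was removed from NumPy; the numpy-financial package provides it

npv = npf.npv(rate,cf)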

print("NPV = $" + str(npv.round()))
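
To see what the NPV function computes, the sketch below discounts each cash flow by hand using plain `numpy`. It assumes the same inputs as above and the convention that the first cash flow occurs at time 0; the names `cf_check`, `discount_factors`, and `npv_manual` are introduced here for illustration only.

```python
import numpy as np

# Same assumptions as in the case study above
price = 3000000
rent = 5700*12
growth = 0.02
vacancy = 0.1
rate = 0.02

# Cash flows for years 0 to 5 (float array, so decimals are kept)
cf_check = np.zeros(6)
cf_check[0] = -price
cf_check[1:5] = rent*(1-vacancy)
cf_check[5] = price*(1+growth)**5

# The discount factor for year t is 1/(1+rate)**t
years = np.arange(len(cf_check))
discount_factors = 1/(1+rate)**years

# NPV is the sum of all discounted cash flows
npv_manual = np.sum(cf_check*discount_factors)
print("NPV = $" + str(round(npv_manual)))
```

Because the growth rate and the discount rate are both 2% here, the discounted sale price equals the purchase price, so the NPV comes entirely from the discounted rental income.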

## Calculate Internal Rate of Return (IRR):

In [None]:
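import numpy_financial as npf   # np.irr was removed from NumPy; the numpy-financial package provides it

irr = npf.irr(cf)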

print("IRR = " + str(round(irr*100,1)) + "%")
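
The IRR is the discount rate at which the NPV equals zero. As a rough illustration of that definition (not of how the library itself solves for it), the sketch below finds the rate by bisection. The helper names `npv_at` and `irr_bisection`, and the search interval of 0% to 100%, are assumptions made for this example.

```python
import numpy as np

def npv_at(rate, cf):
    """NPV of cash flows cf (first entry at time 0) at a given rate."""
    years = np.arange(len(cf))
    return np.sum(cf/(1+rate)**years)

def irr_bisection(cf, low=0.0, high=1.0, tol=1e-10):
    """Find the rate in [low, high] at which the NPV crosses zero.

    Assumes the NPV is positive at `low` and negative at `high`.
    """
    while high - low > tol:
        mid = (low + high)/2
        if npv_at(mid, cf) > 0:
            low = mid    # NPV still positive: the IRR is higher
        else:
            high = mid   # NPV negative: the IRR is lower
    return (low + high)/2

# Cash flows of the case study (years 0 to 5)
cf_check = np.zeros(6)
cf_check[0] = -3000000
cf_check[1:5] = 5700*12*(1-0.1)
cf_check[5] = 3000000*(1+0.02)**5

irr_check = irr_bisection(cf_check)
print("IRR = " + str(round(irr_check*100,1)) + "%")
print("NPV at the IRR:", round(npv_at(irr_check, cf_check), 2))
```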

Is this the end of the story? **No**, it is just the beginning. 

We want to understand the **optimistic** and **pessimistic** scenarios.

## Table of Contents

### Discounted Cash Flow Model

1. Simulating random numbers
1. Calculating the distribution of NPVs and IRRs
1. Working with percentiles

### Web scraping

Applications:
1. Vacancy rates
1. Skyscrapers
1. Wikipedia


## 1. Simulating random numbers

The library `numpy` allows us to generate random numbers.

For example, let's say that we want to generate a random number for the expected vacancy rate and we assume it can be anything between 0 and 1, i.e. between 0% and 100%:

In [None]:
vacancy = np.random.uniform(0,1)

print(vacancy)

Why not generate more scenarios? Let's try 2, 3, 4, ..., 100000 different scenarios.

In [None]:
N = 100000   # start with 2, 3, 4, ... and then increase to 100000
vacancy = np.random.uniform(0,1,N)

print(vacancy)
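
Because the draws are random, the numbers change on every run. If reproducible results are wanted, one option is NumPy's `Generator` interface with a fixed seed. This is a sketch of an alternative to the `np.random.uniform` calls used in this lecture; the seed value 42 and the names `rng`, `vacancy_a`, and `vacancy_b` are arbitrary choices.

```python
import numpy as np

# A generator with a fixed seed produces the same sequence on every run
rng = np.random.default_rng(42)
vacancy_a = rng.uniform(0, 1, 5)

# Re-creating the generator with the same seed repeats the draws
rng = np.random.default_rng(42)
vacancy_b = rng.uniform(0, 1, 5)

print(vacancy_a)
print(np.array_equal(vacancy_a, vacancy_b))
```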

A **histogram** is a good way to quickly visualize our 100000 scenarios:

In [None]:
plt.hist(vacancy,30,edgecolor='black', linewidth=1)
plt.title('Histogram of the simulated vacancy rate')
plt.xlabel('Vacancy rate')
plt.ylabel('Frequency')
plt.show()

How about using a bell curve (i.e. a normal distribution)?

In [None]:
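vacancy = np.random.normal(0.1,0.02,N)   # bell curve; the mean of 10% and s.d. of 2% are illustrative values

plt.hist(vacancy,30,edgecolor='black', linewidth=1)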
plt.title('Histogram of the simulated vacancy rate')
plt.xlabel('Vacancy rate')
plt.ylabel('Frequency')
plt.show()

Finally, what if we want the parameter to take only positive values? A log-normal distribution guarantees this:

In [None]:
rate = np.random.lognormal(-4,.5,N)

plt.hist(rate,30,edgecolor='black', linewidth=1)
plt.title('Histogram of the simulated discount rate')
plt.xlabel('Discount rate')
plt.ylabel('Frequency')
plt.show()
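
The two arguments of `np.random.lognormal` are the mean and standard deviation of the *logarithm* of the variable, not of the variable itself. The sketch below compares simulated values with the standard log-normal formulas (median $e^{\mu}$, mean $e^{\mu+\sigma^2/2}$); the names `mu`, `sigma`, and `draws` and the fixed seed are illustrative.

```python
import numpy as np

mu, sigma = -4, 0.5
rng = np.random.default_rng(0)          # fixed seed, for reproducibility only
draws = rng.lognormal(mu, sigma, 100000)

# Theoretical median and mean of a log-normal distribution
median_theory = np.exp(mu)
mean_theory = np.exp(mu + sigma**2/2)

print("Median: simulated", round(np.median(draws), 4), "theory", round(median_theory, 4))
print("Mean:   simulated", round(draws.mean(), 4), "theory", round(mean_theory, 4))
print("Smallest draw:", draws.min())    # strictly positive
```

With these parameters the median discount rate is about 1.8%, which is close to the 2% used in the base case.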

## 2. Calculating the distributions of NPVs and IRRs

Let's now simulate different values for our three parameters:

In [None]:
N = 10000
growth = np.random.normal(0.02,0.01,N)
vacancy = np.random.uniform(0,0.2,N)   # vacancy rate between 0% and 20%, centred on the 10% base case
rate = np.random.lognormal(-4,.5,N)

... and run a `for` loop to calculate the NPV and IRR for each simulated scenario:

In [None]:
import numpy_financial as npf   # np.npv and np.irr were removed from NumPy

npv = np.zeros(N)   # float arrays, so results are not truncated to integers
irr = np.zeros(N)

for i in range(N):
    cf = np.zeros(6)
    cf[0] = -price
    cf[1:5] = rent*(1-vacancy[i])
    cf[5] = price*(1+growth[i])**5

    npv[i] = npf.npv(rate[i],cf)
    irr[i] = npf.irr(cf)

In [None]:
plt.hist(irr,30,edgecolor='black', linewidth=1)
plt.title('Histogram of the simulated IRR')
plt.xlabel('IRR')
plt.ylabel('Frequency')
plt.show()
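
For large `N`, the loop can be replaced by array operations. The sketch below builds all cash flows as one `N`-by-6 array and computes every NPV at once with plain `numpy`, applying the discounting formula directly rather than a library NPV function. It covers the NPV only, since the IRR needs a root search per scenario. The parameter distributions are assumptions for the example (in particular the vacancy range of 0% to 20%), as are the names `cf_all`, `discount`, and `npv_all`.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, for reproducibility only
N = 10000
price = 3000000
rent = 5700*12

growth = rng.normal(0.02, 0.01, N)
vacancy = rng.uniform(0, 0.2, N)     # assumed range: 0% to 20% vacancy
rate = rng.lognormal(-4, .5, N)

# One row per scenario, one column per year (0 to 5)
cf_all = np.zeros((N, 6))
cf_all[:, 0] = -price
cf_all[:, 1:5] = (rent*(1-vacancy))[:, None]   # same net rent in years 1 to 4
cf_all[:, 5] = price*(1+growth)**5

# Discount factors: each scenario has its own rate
years = np.arange(6)
discount = (1+rate)[:, None]**years            # shape (N, 6)
npv_all = np.sum(cf_all/discount, axis=1)

print(npv_all[:5].round())
```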

## 3. Working with percentiles

The graphical representations above are useful, but they do not give a definitive answer to the question of what counts as a **pessimistic** and what as an **optimistic** scenario in terms of NPV and IRR.

We could of course use `min` and `max`, but these are of little use, because the minimum and maximum scenarios are both highly unlikely.

Instead, we use the function `percentile`, which returns the value below which a given percentage of the simulated outcomes lies.

For example, let's find the pessimistic NPV and IRR, i.e. the values that only the worst 5% of scenarios fall below:

In [None]:
np.percentile(npv,5)   # the percentage is given on a scale from 0 to 100

In [None]:
np.percentile(irr,5)

Similarly, we can find the optimistic NPV and IRR, i.e. the values that only the best 5% of scenarios exceed:

In [None]:
np.percentile(npv,95)

In [None]:
np.percentile(irr,95)
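
Note that `np.percentile` expects the percentage on a scale from 0 to 100, whereas `np.quantile` takes the same information as a fraction between 0 and 1. A related summary is the share of scenarios with a negative NPV. The sketch below uses made-up, normally distributed NPVs (`npv_demo`) purely for illustration, not the simulated values from above.

```python
import numpy as np

rng = np.random.default_rng(2)                 # fixed seed, for reproducibility only
npv_demo = rng.normal(200000, 150000, 10000)   # hypothetical NPVs, for illustration

# The 5th percentile (scale 0 to 100) equals the 0.05 quantile (scale 0 to 1)
p5 = np.percentile(npv_demo, 5)
q5 = np.quantile(npv_demo, 0.05)
print(p5, q5)

# About 5% of the scenarios lie below the 5th percentile
print(np.mean(npv_demo < p5))

# Share of scenarios in which the NPV is negative
print("Probability of negative NPV:", np.mean(npv_demo < 0))
```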

Finally, let's bring all simulated parameters into one figure:

In [None]:
fig, ax = plt.subplots(1,3,figsize=(12,3))
ax[0].hist(vacancy,30,edgecolor='black', linewidth=1)
ax[0].set_xlabel('Vacancy rate')
ax[1].hist(growth,30,edgecolor='black', linewidth=1)
ax[1].set_xlabel('Yearly price growth')
ax[2].hist(rate,30,edgecolor='black', linewidth=1)
ax[2].set_xlabel('Discount rate')
fig.suptitle('Overview of simulated parameters',fontsize=15)
plt.show()

... and also plot the distribution of possible NPVs and IRRs:

In [None]:
fig, ax = plt.subplots(1,2,figsize=(11,4))
npv_k = npv/1000   # NPV in thousands; a separate variable, so re-running the cell does not rescale npv again
ax[0].hist(npv_k,30,edgecolor='black', linewidth=1)
ax[0].set_xlabel('NPV (thousands SGD)')
ax[0].axvline(x=np.percentile(npv_k,1),linestyle='dotted',color=(.8,.5,.3))
ax[0].axvline(x=np.percentile(npv_k,5),linestyle='dashed',color=(.8,.5,.3))
ax[0].axvline(x=np.percentile(npv_k,95),linestyle='dashed',color=(.8,.5,.3))
ax[0].axvline(x=np.percentile(npv_k,99),linestyle='dotted',color=(.8,.5,.3))
ax[1].hist(irr,30,edgecolor='black', linewidth=1)
ax[1].set_xlabel('IRR')
ax[1].axvline(x=np.percentile(irr,1),linestyle='dotted',color=(.8,.5,.3))
ax[1].axvline(x=np.percentile(irr,5),linestyle='dashed',color=(.8,.5,.3))
ax[1].axvline(x=np.percentile(irr,95),linestyle='dashed',color=(.8,.5,.3))
ax[1].axvline(x=np.percentile(irr,99),linestyle='dotted',color=(.8,.5,.3))
fig.suptitle('Overview of optimistic and pessimistic scenarios',fontsize=15)
plt.show()

# Web Scraping

## Loading the libraries

In [None]:
import requests
import pandas as pd

## Application 1: Data on the world's tallest buildings

The most frequent application of web scraping is retrieving usable data from tables displayed on web pages.

For example, let's download the list of the world's tallest buildings:

In [None]:
from io import StringIO

m = requests.get('https://www.skyscrapercenter.com/buildings')
df = pd.read_html(StringIO(m.text))   # recent pandas versions deprecate passing a raw HTML string

We have to remove some of the columns and rows that are not useful:

In [None]:
df = df[1]
df.drop(columns=df.columns[[0,1,2]],inplace=True)

... and then the new data set is ready:

In [None]:
df.head()

## Application 2: Data on protected green areas

Finally, here is an example of how to answer the following question: *What fraction of Singapore's land is a protected green area?*

To answer this question, we use Wikipedia to download a list of all protected areas:

In [None]:
m = requests.get('https://en.wikipedia.org/wiki/List_of_parks_in_Singapore')

In [None]:
from io import StringIO

df = pd.read_html(StringIO(m.text))   # recent pandas versions deprecate passing a raw HTML string

We clean the data and convert the relevant variable from a string to a number. We then sum the land allocated to all parks and divide this sum by the country's total area.

In [None]:
df = df[0].drop(index=0,columns=0)
df[3] = pd.to_numeric(df[3])
df.head()
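
Scraped columns often arrive as text containing thousands separators or placeholder entries, in which case a plain `pd.to_numeric` call raises an error. The sketch below shows one way to handle this on a small hand-made table; the park names and areas in `parks` are invented for the example and are not taken from the Wikipedia page.

```python
import pandas as pd

# Hypothetical data mimicking a scraped table (all values are strings)
parks = pd.DataFrame({
    'name': ['Park A', 'Park B', 'Park C'],
    'area': ['1,200', '350', 'n/a'],
})

# Remove thousands separators, then convert; unparseable entries become NaN
cleaned = parks['area'].str.replace(',', '', regex=False)
parks['area'] = pd.to_numeric(cleaned, errors='coerce')

print(parks)
print(parks['area'].sum())   # NaN entries are skipped when summing
```

Whether missing entries should be skipped or filled in depends on the question being asked; skipping them understates the total.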

In [None]:
print('Protected green areas account for ' + str(round((df[3].sum()/720000000)*100,2)) + '% of the total land area.')