# Chapter 3: Regression

In [1]:
#list 3-1 Implementation of OLS in TensorFlow2
import tensorflow as tf
X=tf.constant([[1,0],[1,2]],tf.float32)
Y=tf.constant([[2],[4]],tf.float32)
XT=tf.transpose(X)
XTX=tf.matmul(XT,X)
beta=tf.matmul(tf.matmul(tf.linalg.inv(XTX),XT),Y)
print(beta.numpy())

[[2.]
 [1.]]


TensorFlow has advantages while minizing a loss function that doesn't have an analytical solution or when we cannot hold all of the data in memory.

### Alternative methods: Least Absolute Deviation(LAD) or Least Absolute Errors(LAE), BTW, TensorFlow is good at LAD.

In [2]:
#List 3.2- Generate input data for a linear samples
S=100 #data samples
N=10000 #Observations to train model
alpha=tf.constant([1.],tf.float32)
beta=tf.constant([3.],tf.float32)
X=tf.random.normal([N,S])
epsilon=tf.random.normal([N,S],stddev=0.25)
Y=alpha+beta*X+epsilon

In [3]:
#List 3.3 -Initialize Variables and define the loss
alphaHat0=tf.random.normal([1],stddev=5.0)
betaHat0=tf.random.normal([1],stddev=5.0)
alphaHat=tf.Variable(alphaHat0,tf.float32)
betaHat=tf.Variable(betaHat0,tf.float32)
def maeloss(alphaHat,betaHat,xSample,ySample):
    prediction=alphaHat+betaHat*xSample
    error=ySample-prediction
    absError=tf.abs(error)
    return tf.reduce_mean(absError)

In [4]:
#list 3.4 Define an optimizer and minize the loss function
opt=tf.optimizers.SGD()# SGD means Stochastic Gradient Descent
alphaHist,betaHist=[],[]
for j in range(1000):
    opt.minimize(lambda:maeloss(alphaHat,betaHat,X[:,0],Y[:,0]),var_list=[alphaHat,betaHat])
    alphaHist.append(alphaHat.numpy()[0])
    betaHist.append(betaHat.numpy()[0])
#Without batch, epoch the whole date above

In [12]:
#List 3.5 Plot the parameter training histories
#define dataframe of parameter histories
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
params=pd.DataFrame(np.hstack([alphaHist,betaHist]),columns=['alphaHat','betaHat'])#Stack arrays in sequence horizontally (column wise)
params=plot(figsize=(10,7))
plt.xlabel('Epoch')
plt.ylabel('Parameter Value')
#Note: For this graph, the errors are as below. Even though I follow all the steps in the book, there are still errors. It's not my errors, but the authors'.
#I even copied and pastered the source codes, but the errors are still there. Wierd!

ValueError: Shape of passed values is (2000, 1), indices imply (2000, 2)

In econometrics, we use confidence interval to test our hypothesis. But in machine, it focus only on prediction,
not hypothesis testing. So tensorflow works fine in this case. 

* 5 assumptions of Gauss-Markov Theorem:
    * The true model is lienar in the parameters
    * the data is sampled randomly
    * none of the independent variables are perfectly correlated with each other(no perfect collinearity).
    * the error term is exogenous
    * the variance of the error term is constant and finite.

**Partially linear model**: Certain independent variables to enter linearly, while others are permitted to enter the 
    the model through a **non-linear** function.
    $$Y=\alpha+\beta X+g\left(Z\right)+\epsilon$$
    **Notice**: In this simulation, assume: Z is control variable; $g\left(Z\right)=exp\left(\theta Z \right)$; $\alpha=1$,$\beta=3$, $\theta=0.05$. 100 samples and 10000 oversvations.

In [14]:
#List 3.6 Simulate in PLM with parameters.
S=100
N=10000
alpha=tf.constant([1.],tf.float32)
beta=tf.constant([3.],tf.float32)
theta=tf.constant([.05],tf.float32)
epsilon=tf.random.normal([N,S],stddev=0.25)
X=tf.random.normal([N,S])
Z=tf.random.normal([N,S])
Y=alpha+beta*X+tf.exp(theta*Z)+epsilon


In [16]:
#list 3.7 Initiallize variable and compute loss
#Not initial value: in the parensathesis, no tf.float, but stddev. 
alphahat0=tf.random.normal([1],stddev=5.0)
betahat0=tf.random.normal([1],stddev=5.0)
thetahat0=tf.random.normal([1],mean=0.05,stddev=0.10)

alphahat=tf.Variable(alphahat0,tf.float32)
betahat=tf.Variable(betahat0,tf.float32)
thetahat=tf.Variable(thetahat0,tf.float32)

def plm(alphahat,betahat,thetahat,xS,zS):
    prediction=alphahat+betahat*xS+tf.exp(theta*zS)
    return prediction
#xS,zS are sample values

In [18]:
#list 3.8 Define a loss function for a partially lienar function
def maeloss(alphahat,betahat,thetahat,xS,yS,zS):
    yhat=plm(alphahat,betahat,betahat,xS,zS)
    return tf.losses.mae(yhat,yS)
#There is a function to calculate abe in tensorflow.


In [19]:
# list 3.9 Train a partially linear regression model
opt=tf.optimizers.SGD()

for i in range(1000):
    opt.minimize(lambda:maeloss(alphahat,betahat,thetahat,X[:,0],Y[:,0],Z[:,0]),
                var_list=[alphahat,betahat,thetahat])
#How to fix this issue?

























Blank

### Non-Lienar Regression

In [22]:
data=pd.read_csv('exchange_rate.csv')
data.head()

Unnamed: 0,DATE,DEXUSUK
0,2016-07-18,1.3279
1,2016-07-19,1.3123
2,2016-07-20,1.3179
3,2016-07-21,1.3216
4,2016-07-22,1.3091


In [24]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1305 entries, 0 to 1304
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   DATE     1305 non-null   object
 1   DEXUSUK  1305 non-null   object
dtypes: object(2)
memory usage: 20.5+ KB


In [29]:
data['DEXUSUK']=pd.to_numeric(data['DEXUSUK'],errors='coerce')
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1305 entries, 0 to 1304
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   DATE     1305 non-null   object 
 1   DEXUSUK  1247 non-null   float64
dtypes: float64(1), object(1)
memory usage: 20.5+ KB


In [30]:
#List 3-10 Prepare the data for a TAR model of the USD-GBP Exchange Rate
e=np.array(data['DEXUSUK'])#convert log number to array data structure as tensorflow operate array data only
de=tf.cast(np.diff(e[:-1])<-0.02,tf.float32)#Identify exchange rate decreases greater than 2%
le=tf.constant(e[1:-1],tf.float32) #Define the lagged exchange rate as a constant
e=tf.constant(e[2:],tf.float32)

In [31]:
#List 3-11 Define parameters for a TAR model of the USD-GBP exchange  rate
rho0hat=tf.Variable(0.80,tf.float32)
rho1hat=tf.Variable(0.80,tf.float32)




In [37]:
#List 3-12 Define model and loss function for TAR model of USD-GBP exchange rate
#Define model
def tar(rho0hat,rho1hat,le,de):
    regime0=rho0hat*le
    regime1=rho1hat*le
    prediction=regime0*de+regime1*(1-de)
    return prediction

#Define Loss
def maeloss(rho0hat,rho1hat,e,le,de):
    ehat=tar(rho0hat,rho1hat,le,de)
    return tf.losses.mae(e,ehat)

In [38]:
# list 3-13 Train TAR Model of the USD-GBP Exchang rate
opt=tf.optimizers.SGD()
for i in range(2000):
    opt.minimize(lambda:maeloss(rho0hat,rho1hat,e,le,de),var_list=[rho0hat,rho1hat])

In [47]:
print(rho0hat.numpy())


nan


In [52]:
#I have some problems on the graph drawing,so need to fix this issue later

### Logistic Regression

Logistic Curve:
    $$p(X)=\frac{1}{1+e^{-\left(\alpha+\beta_{0}X_{0}+...+\beta_{k}X_{k}\right)}}$$
Binary cross-entrophy loss function:
    $$\sum_i -\left(Y_i *log\left(p\left(X_i\right)\right)\right)+\left(1-Y_i\right)*log\left(1-p\left(X_i\right)\right)$$
In tensorflow, use **tf.losses.binary_crossentropy( )** to compute the binary cross_entrophy loss

* **Modules of Tensorflow** to compute loss:
    * tf.losses: native Tensorflow implementation
        * tf.binary_crossentropy( )
        * tf.categorical_crossentropy( )
        * tf.sparse_categorical_crossentropy( )
    * tf.keras.losses: Keras implementations of the loss functions

### Stochastic Gradient Descent (SGD)


Stochastic Gradient Descent in TensorFlow:
    $$\theta_t=\theta_{t-1}-lr*g_t$$
where $\theta_t$ is a vector of parameter values at iteration t; **lr** is the *learning rate*, and $g_t$ is the gradient computed in iteration i.

* Modern extensions of SGD:
    * Root mean square propagation (RMSProp)
    * Adaptive moment estimation (Adam)
    * Adaptive gradient methods (Adagrad and Adadelta)

In [55]:
#List 3.14 Instantiate optimizers
sgd=tf.optimizers.SGD(learning_rate=0.001,momentum=0.5)# If concerned more local minima, increase momentum value
rms=tf.optimizers.RMSprop(learning_rate=0.001,rho=0.8,momentum=0.9)
#rho is the rate at which information about the gradient decays
agrad=tf.optimizers.Adagrad(learning_rate=0.001,initial_accumulator_value=0.1)
adelt=tf.optimizers.Adadelta(learning_rate=0.001,rho=0.95)
adam=tf.optimizers.Adam(learning_rate=0.001,beta_1=0.9,beta_2=0.999)

***THE END***