**Chapter 7, end-of-chapter problems**


In [5]:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import pymc3 as pm
import scipy.stats as stats
import seaborn as sns
import daft
from causalgraphicalmodels import CausalGraphicalModel

from scipy.interpolate import griddata

In [6]:
%load_ext nb_black
%config InlineBackend.figure_format = 'retina'
%load_ext watermark
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
az.style.use("arviz-darkgrid")
az.rcParams["stats.hdi_prob"] = 0.89  # sets default credible interval used by arviz


In [7]:
def standardize(vec):
    # Center on the mean first, then scale by the standard deviation
    return (vec - np.mean(vec)) / np.std(vec)


# 7E1

State the three motivating criteria that define information entropy. Try to express each in your own words.


1. Continuity: "The measure of uncertainty should be continuous." A small change in any probability should produce only a small change in uncertainty.
2. Number of events: "The measure of uncertainty should increase as the number of possible events increases." (More ways to be wrong.)
3. Additivity: "The measure of uncertainty should be additive." (The uncertainty over all combinations of independent events should equal the sum of the separate uncertainties.)

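These criteria are satisfied by the information entropy of a distribution with $n$ possible events, where event $i$ has probability $p_i$: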
$E =  - \displaystyle\sum\limits_{i=1}^{n} p_i\log{p_i}$
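
As a rough numerical check of the three criteria, the sketch below wraps the formula in a helper and evaluates it on a few distributions. The name `information_entropy` and the example probabilities are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def information_entropy(probs):
    """Return -sum(p * log(p)) in nats, treating 0 * log(0) as 0."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # impossible events contribute nothing
    return float(-np.sum(p * np.log(p)))


# Criterion 1 (continuity): a small change in p gives a small change in entropy.
h_a = information_entropy([0.70, 0.30])
h_b = information_entropy([0.701, 0.299])

# Criterion 2 (more events): uniform entropy grows with the number of events.
h_two = information_entropy([1 / 2] * 2)
h_four = information_entropy([1 / 4] * 4)

# Criterion 3 (additivity): for two independent variables, the entropy of
# the joint distribution equals the sum of the separate entropies.
rain = np.array([0.3, 0.7])
heat = np.array([0.6, 0.4])
joint = np.outer(rain, heat).ravel()
h_joint = information_entropy(joint)
h_sum = information_entropy(rain) + information_entropy(heat)

print(abs(h_a - h_b) < 0.01, h_four > h_two, np.isclose(h_joint, h_sum))
```

Filtering out zero probabilities inside the helper is a design choice that keeps the function usable for distributions with impossible events.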


# 7E2

Suppose a coin is weighted such that, when it is tossed and lands on a table, it comes up heads 70% of the time. What is the entropy of this coin?

Applying the entropy equation, there are two possible events: heads and tails. The probability of heads is 0.7 and the probability of tails is 0.3.

In [7]:
coin = [0.7, 0.3]
entropy = 0

for i in coin:
    entropy += i * np.log(i)
    
print(-entropy)


0.6108643020548935



**Why can't we use just one term, given that one probability determines the other? Entropy sums over every possible event, so the correct approach is to include both.**
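
One way to see why both outcomes belong in the sum: entropy is the average surprise, $-\log p$, over everything that can happen, so the tails term carries its own weight even though its probability is fixed by the heads probability. The sketch below compares the heads-only term with the full sum; `binary_entropy` is a hypothetical helper name, not something from the notebook.

```python
import numpy as np


def binary_entropy(p_heads):
    """Entropy (in nats) of a coin with P(heads) = p_heads, 0 < p_heads < 1."""
    p_tails = 1.0 - p_heads  # determined by p_heads, but still a possible event
    return float(-(p_heads * np.log(p_heads) + p_tails * np.log(p_tails)))


heads_only = float(-0.7 * np.log(0.7))  # ignores the surprise of seeing tails
both = binary_entropy(0.7)

print(f"heads term only: {heads_only:.4f}")
print(f"both outcomes:   {both:.4f}")

# Relabeling the sides must not change the uncertainty of the coin;
# only the full sum has this symmetry.
print(np.isclose(binary_entropy(0.7), binary_entropy(0.3)))
```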

# 7E3

Suppose a four-sided die is loaded such that, when tossed onto a table, it shows “1” 20%, “2” 25%, “3” 25%, and “4” 30% of the time. What is the entropy of this die?

In [10]:
die = [0.2, 0.25, 0.25, 0.3]
entropy = 0

# Accumulate p * log(p) over the four sides
for side in die:
    entropy += side * np.log(side)

print(f"{-entropy:.4f}")


1.3762



# 7E4

Suppose another four-sided die is loaded such that it never shows “4”. The other three sides show equally often. What is the entropy of this die?

In [11]:
die = [0, 1 / 3, 1 / 3, 1 / 3]
entropy = 0

for side in die:
    # By convention 0 * log(0) = 0, so impossible events contribute nothing
    if side > 0:
        entropy += side * np.log(side)

print(f"{-entropy:.4f}")


1.0986
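
A side that never appears needs care, because `np.log(0)` is `-inf` and `0 * -inf` evaluates to `nan` in NumPy. The usual convention is to set $0 \log 0 = 0$, which matches the limit of $p \log p$ as $p \to 0$. A minimal sketch of that limit follows; the values of `p` are chosen here purely for illustration.

```python
import numpy as np

# p * log(p) shrinks toward 0 as p approaches 0 from above,
# which motivates treating 0 * log(0) as 0 in the entropy sum.
small_ps = np.array([1e-2, 1e-4, 1e-8, 1e-16])
terms = small_ps * np.log(small_ps)
for p, term in zip(small_ps, terms):
    print(f"p = {p:.0e}, p * log(p) = {term:.3e}")

# With the zero-probability side dropped, three equally likely sides
# give the entropy of a uniform distribution over three events: log(3).
probs = np.array([1 / 3, 1 / 3, 1 / 3])
entropy = float(-np.sum(probs * np.log(probs)))
print(np.isclose(entropy, np.log(3)))
```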



# Is calibration overrated?

# Why is sigma different?

# 7H1 Laffer tax data set, making a better model

In 2007, The Wall Street Journal published an editorial (“We’re Number One, Alas”) with a graph of corporate tax rates in 29 countries plotted against tax revenue. A badly fit curve was drawn in (reconstructed at right), seemingly by hand, to make the argument that the relationship between tax rate and tax revenue increases and then declines, such that higher tax rates can actually produce less tax revenue. I want you to actually fit a curve to these data, found in data(Laffer). Consider models that use tax rate to predict tax revenue. Compare, using WAIC or PSIS, a straight-line model to any curved models you like. What do you conclude about the relationship between tax rate and tax revenue?

In [8]:
df_Laffer = pd.read_csv("other_data/Laffer.csv")
df_Laffer.head()

Unnamed: 0,tax_rate,tax_revenue
0,0.07,-0.06
1,8.81,2.45
2,12.84,3.58
3,16.24,2.19
4,19.18,2.46
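
Before fitting anything, it may help to recall what WAIC computes from a fitted model's pointwise log-likelihood. The sketch below is a NumPy-only illustration, not the notebook's solution: the function name `waic_from_loglik` and the simulated log-likelihood matrix are assumptions for demonstration. In practice, ArviZ's `az.waic` or `az.compare` would be applied to the output of the fitted PyMC3 models.

```python
import numpy as np


def waic_from_loglik(log_lik):
    """WAIC on the deviance scale from an (n_samples, n_obs) log-likelihood array."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_samples = log_lik.shape[0]
    # lppd: log of the posterior-mean likelihood for each observation,
    # computed with a max shift for numerical stability (log-sum-exp).
    shift = log_lik.max(axis=0)
    lppd_i = shift + np.log(np.exp(log_lik - shift).sum(axis=0)) - np.log(n_samples)
    # Penalty: sample variance of the log-likelihood across posterior samples.
    p_waic_i = log_lik.var(axis=0, ddof=1)
    return float(-2.0 * (lppd_i.sum() - p_waic_i.sum()))


# Hypothetical stand-in for posterior draws: 1000 samples, 29 observations.
rng = np.random.default_rng(8927)
fake_log_lik = rng.normal(loc=-1.5, scale=0.1, size=(1000, 29))
print(waic_from_loglik(fake_log_lik))
```

A lower value indicates better expected out-of-sample fit, so a straight-line model and a curved model would each get one such number, and the difference is judged against its standard error.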



# 7H2 Laffer data, influence of outliers

In the Laffer data, there is one country with a high tax revenue that is an outlier. Use PSIS and WAIC to measure the importance of this outlier in the models you fit in the previous problem. Then use robust regression with a Student’s t distribution to revisit the curve fitting problem. How much does a curved relationship depend upon the outlier point?
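
The reason a Student's t likelihood is called robust can be seen by comparing log-densities far from the center: the thicker tails of the t distribution assign an extreme residual far more probability than a Normal does, so a single outlier pulls the fit less. A small sketch using only the standard library follows; the function names and the choice of $\nu = 2$ are illustrative assumptions, not values from the notebook.

```python
import math


def normal_logpdf(z):
    """Log-density of a standard Normal at z."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z**2


def student_t_logpdf(z, nu):
    """Log-density of a standard Student's t with nu degrees of freedom at z."""
    return (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - (nu + 1.0) / 2.0 * math.log1p(z**2 / nu)
    )


# Compare how strongly each likelihood penalizes residuals of growing size.
for z in [0.0, 1.0, 3.0, 5.0]:
    print(
        f"z = {z:.0f}: normal = {normal_logpdf(z):7.3f}, "
        f"t(nu=2) = {student_t_logpdf(z, nu=2.0):7.3f}"
    )
```

The gap between the two columns widens quickly as `z` grows, which is why swapping the likelihood can change how much the outlying country bends the fitted curve.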

# Appendix: Environment and system parameters

In [4]:
%watermark -n -u -v -iv -w

Last updated: Sat May 22 2021

Python implementation: CPython
Python version       : 3.8.6
IPython version      : 7.20.0

pandas    : 1.2.1
arviz     : 0.11.1
daft      : 0.1.2
numpy     : 1.20.1
pymc3     : 3.11.0
seaborn   : 0.11.1
matplotlib: 3.3.4
json      : 2.0.9
scipy     : 1.6.0

Watermark: 2.1.0


