## Exercise 1

Poisson regression is a Generalized Linear Model, used to model count data. It takes the form

$$\mu=\mathbb{E}(y|x)=\exp(w_1\,x_1+\ldots+w_k\,x_k+b),$$

where the observed counts $y_i$ are drawn from a Poisson distribution whose rate is the expected count $\mu_i$:

$$y_i \sim \text{Poisson}(\mu_i).$$
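To make the two equations concrete, here is a minimal sketch in plain Python of the log link and the Poisson log-probability. The weights, bias and predictor values are made up for illustration, and the helper names `poisson_mean` and `poisson_log_pmf` are hypothetical. Note that the exponential keeps $\mu$ positive even when a weight is negative.

```python
import math

def poisson_mean(x, w, b):
    """Expected count under the log link: exp(w_1*x_1 + ... + w_k*x_k + b)."""
    return math.exp(sum(wi * xi for wi, xi in zip(w, x)) + b)

def poisson_log_pmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu), with y a non-negative integer."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# made-up values, chosen only for illustration
w = [0.5, -0.2]   # a negative weight is fine: exp(.) is always positive
b = 1.0
x = [2.0, 1.0]

mu = poisson_mean(x, w, b)        # exp(0.5*2.0 - 0.2*1.0 + 1.0) = exp(1.8)
log_p = poisson_log_pmf(3, mu)    # log-probability of observing 3 counts
print(f"mu = {mu:.4f}, log P(y=3) = {log_p:.4f}")
```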

1. Download and load the smoking dataset from: [https://data.princeton.edu/wws509/datasets/#smoking](https://data.princeton.edu/wws509/datasets/#smoking). Then perform a train-test split on the data.


In [42]:
%reset -f
import numpy as np
import pandas as pd

import pyro
import pyro.distributions as dist
import pyro.optim as optim

from pyro.infer import SVI, Trace_ELBO
from pyro.infer import Predictive

import torch
import torch.distributions.constraints as constraints

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import seaborn as sns

pyro.set_rng_seed(42)

In [43]:
data = pd.read_csv("https://data.princeton.edu/wws509/datasets/smoking.raw",sep="\t",header=None)
data.columns = ["Age_Group","Smoking_Status","Population","Deaths"]
data.head()

| | Age_Group | Smoking_Status | Population | Deaths |
|---|---|---|---|---|
| 0 | 1 | 1 | 656 | 18 |
| 1 | 2 | 1 | 359 | 22 |
| 2 | 3 | 1 | 249 | 19 |
| 3 | 4 | 1 | 632 | 55 |
| 4 | 5 | 1 | 1067 | 117 |


The columns are:

* age at the start of follow-up: in five-year age groups coded 1 to 9 for 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80+;
* smoking status: coded 1 = never smoked, 2 = smoked cigars or pipe only, 3 = smoked cigarettes and cigar or pipe, and 4 = smoked cigarettes only;
* population: number of male pensioners followed;
* deaths: number of deaths in a six-year period.

In [44]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Age_Group       36 non-null     int64
 1   Smoking_Status  36 non-null     int64
 2   Population      36 non-null     int64
 3   Deaths          36 non-null     int64
dtypes: int64(4)
memory usage: 1.2 KB


Now we perform a train-test split on the data:
- **train data** - 80% of the observations will be used to perform inference on our model
- **test data** - the remaining 20% will be used to check the quality of the posterior predictions

In [45]:
scaler = MinMaxScaler()
data["Population"] = scaler.fit_transform(pd.DataFrame(data["Population"]))

In [46]:
deaths = torch.tensor(data["Deaths"].values, dtype=torch.float)
predictors = torch.stack([torch.tensor(data[column].values,dtype=torch.float) 
                            for column in data.columns if column != "Deaths"],1)

X_train, X_test, y_train, y_test = train_test_split(predictors, deaths, test_size=0.20, 
                                                    random_state=42,shuffle=True)

print("X_train.shape =", X_train.shape,"\ny_train.shape =", y_train.shape)
print("\nX_test.shape =", X_test.shape,"\ny_test.shape =", y_test.shape)

X_train.shape = torch.Size([28, 3]) 
y_train.shape = torch.Size([28])

X_test.shape = torch.Size([8, 3]) 
y_test.shape = torch.Size([8])
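One caveat about the cells above: the scaler is fit on all 36 rows before the split, so the test rows influence the scaling. A minimal sketch of the alternative, computing the minimum and maximum on training rows only, is given below in plain Python. The numbers are made up and the function names are hypothetical.

```python
def fit_min_max(values):
    """Return (min, max) computed on the given values only."""
    return min(values), max(values)

def apply_min_max(values, lo, hi):
    """Scale values with a previously fitted (lo, hi) pair."""
    return [(v - lo) / (hi - lo) for v in values]

# made-up population counts, already split into train and test rows
pop_train = [656.0, 359.0, 249.0, 1067.0]
pop_test = [632.0, 1200.0]

lo, hi = fit_min_max(pop_train)                  # statistics from training rows only
train_scaled = apply_min_max(pop_train, lo, hi)  # lies in [0, 1] by construction
test_scaled = apply_min_max(pop_test, lo, hi)    # may fall outside [0, 1]
print(train_scaled, test_scaled)
```

With scikit-learn the same idea amounts to calling `fit` on the training rows and `transform` on both splits.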


2. Fit a Bayesian Poisson regression model using the number of deaths as the response variable and the other columns as the explanatory variables;

The target variable in our model is the number of deaths, and we wish to infer the parameters attached to the following predictors:

$$
\mathbb{E}(\text{Deaths})=\text{exp}(w_0\cdot\text{Age_Group}+w_1\cdot\text{Smoking_Status}+w_2\cdot\text{Population}+b)
$$

We set a Normal prior on $w$ and a Log-Normal prior on the bias term $b$. The randomness of the observations comes from the Poisson likelihood itself, so no separate noise scale is needed:

\begin{align*}
w&\sim\mathcal{N}(0,1)\\
b&\sim\text{LogNormal}(0,1)\\
\hat{\mu}&= \text{exp}(w x+ b) \\
y &\sim \text{Poisson}(\hat{\mu}).
\end{align*}

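Then we define the variational family (the guide) by setting a Gamma distribution on $w$ and a Log-Normal on $b$, and run SVI on the $(x,y)$ data. Note that the Gamma has support on the positive reals only, so this guide restricts the approximate posterior on $w$ to positive values even though the Normal prior allows negative ones.

Notice that the posterior has no closed form here: neither the Normal prior on $w$ nor the Log-Normal prior on the bias term is conjugate to the Poisson likelihood under the exponential link, which is why we resort to variational inference.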
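For reference, the objective behind SVI can be written out. With $q_\phi(w,b)$ denoting the guide and $\phi$ its parameters (notation introduced here for illustration), `Trace_ELBO` estimates the evidence lower bound

$$
\text{ELBO}(\phi)=\mathbb{E}_{q_\phi(w,b)}\big[\log p(y\,|\,x,w,b)+\log p(w)+\log p(b)-\log q_\phi(w,b)\big]\le\log p(y\,|\,x),
$$

and the loss that is minimized is a stochastic estimate of $-\text{ELBO}(\phi)$.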

**To read:**

https://towardsdatascience.com/an-illustrated-guide-to-the-poisson-regression-model-50cccba15958

https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Poisson_Regression.pdf

**Read up on the Poisson distribution and check its domain.**

**$w$ should not take negative values; look into this, and into how to make `mu_hat` positive.**

In [40]:
n_observations, n_predictors = predictors.shape

# sample weights
w = pyro.sample("w", dist.Normal(torch.zeros(n_predictors), 
                                    torch.ones(n_predictors)))
b = pyro.sample("b", dist.LogNormal(torch.zeros(1), torch.ones(1)))

mu_hat = torch.exp((w*predictors).sum(dim=1) + b)

# condition on the observations
with pyro.plate("deaths", len(deaths)):
    pyro.sample("obs", dist.Poisson(mu_hat), obs=deaths)

In [48]:
def death_model(predictors, deaths):
    
    n_observations, n_predictors = predictors.shape
    
    # sample weights
    w = pyro.sample("w", dist.Normal(torch.zeros(n_predictors), 
                                        torch.ones(n_predictors)))
    b = pyro.sample("b", dist.LogNormal(torch.zeros(1), torch.ones(1)))
    
    mu_hat = torch.exp((w*predictors).sum(dim=1) + b)
    
    # condition on the observations
    with pyro.plate("deaths", len(deaths)):
        pyro.sample("obs", dist.Poisson(mu_hat), obs=deaths)
        
def death_guide(predictors, deaths=None):
    
    n_observations, n_predictors = predictors.shape
        
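    # NOTE: despite their names, these two parameters are passed to Gamma as
    # its concentration and rate, not as a location and a scale
    w_loc = pyro.param("w_loc", torch.rand(n_predictors), constraint=constraints.positive)
    w_scale = pyro.param("w_scale", torch.rand(n_predictors), 
                         constraint=constraints.positive)
    
    w = pyro.sample("w", dist.Gamma(w_loc, w_scale))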
    
    b_loc = pyro.param("b_loc", torch.rand(1))
    b_scale = pyro.param("b_scale", torch.rand(1), constraint=constraints.positive)
    
    b = pyro.sample("b", dist.LogNormal(b_loc, b_scale))
    
death_svi = SVI(model=death_model, guide=death_guide, 
              optim=optim.ClippedAdam({'lr' : 0.01}), 
              loss=Trace_ELBO())

for step in range(2000):
    loss = death_svi.step(X_train, y_train)/len(X_train)
    if step % 100 == 0:
        print(f"Step {step} : loss = {loss}")


Step 0 : loss = 93.95904425425189
Step 100 : loss = 2803.019447884389
Step 200 : loss = 1522.4704756853837
Step 300 : loss = 6301.159454096109
Step 400 : loss = 401.5093303886907
Step 500 : loss = 260685.97967164963
Step 600 : loss = 3393509.984654016
Step 700 : loss = 348.2024760799749
Step 800 : loss = 361426.7742816306
Step 900 : loss = 71.80504192199025
Step 1000 : loss = 54017015.67062563
Step 1100 : loss = 438.7351415157318
Step 1200 : loss = 622.0738740733692
Step 1300 : loss = 41085367.382489525
Step 1400 : loss = 55.8839810905712
Step 1500 : loss = 828.5340937290873
Step 1600 : loss = 450.55361250681534
Step 1700 : loss = 476065.80741048924
Step 1800 : loss = 735.5033743626306
Step 1900 : loss = 23374.32838944878


In [None]:
print("Inferred params:", list(pyro.get_param_store().keys()), end="\n\n")

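# variational parameters: w_loc is the Gamma concentration and b_loc the LogNormal
# location, so these are not the posterior means of w and b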
inferred_w = pyro.get_param_store()["w_loc"]
inferred_b = pyro.get_param_store()["b_loc"]

for i,w in enumerate(inferred_w):
    print(f"w_{i} = {w.item():.8f}")
print(f"b = {inferred_b.item():.8f}")
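The cell above prints raw variational parameters. Under the guide's distributions, the posterior means follow from standard formulas: a Gamma with concentration $\alpha$ and rate $\beta$ has mean $\alpha/\beta$, and a LogNormal with location $m$ and scale $s$ has mean $\exp(m+s^2/2)$. A minimal sketch in plain Python, with made-up parameter values and hypothetical helper names:

```python
import math

def gamma_mean(concentration, rate):
    """Mean of Gamma(concentration, rate): concentration / rate."""
    return concentration / rate

def lognormal_mean(loc, scale):
    """Mean of LogNormal(loc, scale): exp(loc + scale**2 / 2)."""
    return math.exp(loc + 0.5 * scale ** 2)

# made-up variational parameters, standing in for w_loc, w_scale, b_loc, b_scale
w_loc, w_scale = [0.8, 1.0, 2.0], [1.6, 4.0, 0.5]
b_loc, b_scale = -0.1, 0.25

w_mean = [gamma_mean(a, r) for a, r in zip(w_loc, w_scale)]
b_mean = lognormal_mean(b_loc, b_scale)
print("E[w] =", w_mean)
print("E[b] =", round(b_mean, 4))
```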

**Posterior predictive distribution**

We can use the `Predictive` utility class, corresponding to the posterior predictive distribution, to evaluate our model on test data. Here we compute some summary statistics (mean, std and quantiles) on $100$ posterior samples of the latent parameters $w$ and $b$:

In [50]:
# print latent params quantile information
def summary(samples):
    stats = {}
    for par_name, values in samples.items():
        marginal = pd.DataFrame(values)
        percentiles=[.05, 0.5, 0.95]
        describe = marginal.describe(percentiles).transpose()
        stats[par_name] = describe[["mean", "std", "5%", "50%", "95%"]]
    return stats

# define the posterior predictive
predictive = Predictive(model=death_model, guide=death_guide, num_samples=100,
                        return_sites=("w","b"))

# get posterior samples on test data
svi_samples = {k: v.detach().numpy() for k, v in predictive(X_test, y_test).items()}

# show summary statistics
for key, value in summary(svi_samples).items():
    print(f"Sampled parameter = {key}\n\n{value}\n")

Sampled parameter = w

       mean       std        5%       50%       95%
0  0.651493  0.761568  0.005992  0.398287  2.404526
1  0.152532  0.158151  0.000753  0.095998  0.491336
2  3.787127  2.943419  0.568067  2.898957  9.268196

Sampled parameter = b

      mean       std        5%      50%       95%
0  0.94716  0.250861  0.602919  0.90823  1.464611




3. Evaluate the regression fit on test data using MAE and MSE error metrics.

The most widely used metrics for comparing different regression models are the **Mean Absolute Error** (MAE)

$$\frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|$$

and the **Mean Squared Error** (MSE)

$$\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2,$$

where $n$ is the number of observations, $y$ are the true values `y_test` and $\hat{y}$ are the predicted values `y_pred`.
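The two formulas can be checked by hand on a tiny example. The sketch below uses plain Python and made-up values; the function names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of (y_i - y_hat_i)^2 over all observations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# made-up true and predicted counts
y_true = [3.0, 5.0, 2.0]
y_hat = [2.5, 5.0, 4.0]

mae = mean_absolute_error(y_true, y_hat)  # (0.5 + 0 + 2) / 3
mse = mean_squared_error(y_true, y_hat)   # (0.25 + 0 + 4) / 3
print("MAE =", mae, "MSE =", mse)
```

Because the errors are squared, the single large miss dominates the MSE much more than the MAE.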

In [51]:
# compute predictions using the inferred parameters
# NOTE: b is added outside the exponential here, whereas the model uses exp(w*x + b)
y_pred = torch.exp((inferred_w * X_test).sum(1)) + inferred_b

print("MAE =", torch.nn.L1Loss()(y_test, y_pred).item())
print("MSE =", torch.nn.MSELoss()(y_test, y_pred).item())

MAE = 715.705810546875
MSE = 1427406.625
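The model defines the expected count as $\exp(wx+b)$, with the bias inside the exponential, and the guide parameters are not themselves posterior means. One alternative, sketched here in plain Python, is to draw $(w,b)$ from the guide's distributions, evaluate $\exp(wx+b)$ for each draw, and summarize the draws. The guide parameters, the test row and the function name are all made up for illustration; note that `random.gammavariate` takes a scale, which is the inverse of the rate.

```python
import math
import random
import statistics

def predictive_draws(x, concentration, rate, b_loc, b_scale, num_samples=1000, seed=0):
    """Draw (w, b) from Gamma / LogNormal factors and return exp(w . x + b) per draw."""
    rng = random.Random(seed)
    draws = []
    for _ in range(num_samples):
        # random.gammavariate(alpha, beta) treats beta as a scale, i.e. 1 / rate
        w = [rng.gammavariate(a, 1.0 / r) for a, r in zip(concentration, rate)]
        b = rng.lognormvariate(b_loc, b_scale)
        # the bias sits inside the exponential, as in the model definition
        draws.append(math.exp(sum(wj * xj for wj, xj in zip(w, x)) + b))
    return draws

# made-up guide parameters and a made-up test row
# (age group, smoking status, scaled population)
x_row = [1.0, 2.0, 0.3]
draws = predictive_draws(x_row, concentration=[2.0, 1.5, 3.0], rate=[20.0, 30.0, 10.0],
                         b_loc=-0.5, b_scale=0.2)
y_hat = statistics.median(draws)  # median, since exp of a LogNormal draw is heavy-tailed
print(f"predicted count (median over draws): {y_hat:.3f}")
```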


## Exercise 2

The Iris dataset contains petal and sepal length and width for three different types of Iris flowers: Setosa, Versicolour, and Virginica.

1. Import the Iris dataset from `sklearn`:
```
from sklearn import datasets
iris = datasets.load_iris()
```
and perform a train-test split on the data.

2. Fit a Bayesian multinomial logistic regression model on the four predictors petal length/width and sepal length/width.

3. Evaluate your Bayesian classifier on test data: compute the overall test accuracy and class-wise accuracy for the three different flower categories.
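As a starting point for step 3, the sketch below shows one way to turn class scores into predictions with a softmax and to compute overall and class-wise accuracy. It is plain Python on made-up labels and scores, not a solution to the exercise, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_wise_accuracy(y_true, y_pred, n_classes):
    """Fraction of correctly classified examples within each true class."""
    result = {}
    for c in range(n_classes):
        idx = [i for i, t in enumerate(y_true) if t == c]
        if idx:  # skip classes that do not appear in y_true
            result[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return result

# made-up labels (0, 1, 2 for the three species) and made-up class scores
y_true = [0, 0, 1, 1, 2, 2]
logits = [[2.0, 0.1, 0.1], [1.5, 1.0, 0.2], [0.3, 2.2, 0.5],
          [1.9, 1.2, 0.1], [0.1, 0.4, 3.0], [0.2, 0.3, 1.1]]

probs = [softmax(z) for z in logits]
y_pred = [p.index(max(p)) for p in probs]  # most probable class per example

print("overall accuracy:", accuracy(y_true, y_pred))
print("class-wise accuracy:", class_wise_accuracy(y_true, y_pred, 3))
```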