# Exercise set 4

> This exercise aims to show you how to perform **least squares regression** 
> for real experimental data. In the first part, we will use data that
> contains uncertainties, and we are going 
> to make use of this in the fitting and for estimating errors in
> the fitted parameter.
> In the second part, we will use testing/training to estimate
> what kind of errors we can expect when using a model for estimation.

## Exercise 4.1

In this exercise we will use least-squares regression to investigate a physical phenomenon: the decay of
beer froth with time. The file [Data/erdinger.csv](Data/erdinger.csv)
contains [measured heights](https://doi.org/10.1088/0143-0807/23/1/304) for beer
froth as a function of time, along with the errors in the measured heights.

Arnd Leike was awarded the 2002 [Ig Nobel prize](https://en.wikipedia.org/wiki/Ig_Nobel_Prize) for this work. In
the [original study](https://doi.org/10.1088/0143-0807/23/1/304), Leike reported data
for two more beers. The data for these two are in the
files [Data/augustinerbrau.csv](Data/augustinerbrau.csv) and [Data/budweiser.csv](Data/budweiser.csv).
If you have extra time, you can try to redo [4.1(d)](#4.1(d)) also for these two beers.

### 4.1(a)
Create a linear model for the beer froth height as a function of time using least squares.
Plot your model with the raw data, calculate the coefficient of determination, $R^2$ , and plot
the residuals. What do you think about your model?

In [None]:
# Here is some code to get you started:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns  # Styling of plots


%matplotlib notebook
sns.set_theme(style="ticks", context="notebook", palette="muted")

data = pd.read_csv("Data/erdinger.csv")
data.head()

In [None]:
time = data["time"].to_numpy()
height = data["height"].to_numpy()
height_error = data["height-error"].to_numpy()

In [None]:
# In this exercise, you are encouraged to try sklearn and its LinearRegression method:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

In [None]:
# Make linear model:
model1 = LinearRegression(fit_intercept=True)  # Create the model, and include the constant term.
X = time.reshape(-1, 1)
model1.fit(X, height)

y_hat_1 = model1.predict(X)
# To calculate R²:
r2_model1 = model1.score(X, height)
# And a mean squared error:
mse_model1 = mean_squared_error(height, y_hat_1)
# Summarize the model with some short text:
model1_txt = f"y = {model1.coef_[0]:.3g}x + {model1.intercept_:.3g}"
model1_txt = f"{model1_txt}\n(R² = {r2_model1:.3g}, MSE = {mse_model1:.3g})"

In [None]:
# And here is a hint for the plotting - use errorbar to display the errors in the raw data:
fig, (ax1, ax2) = plt.subplots(constrained_layout=True, ncols=2, figsize=(8, 4))
ax1.errorbar(
    time,
    height,
    yerr=height_error,
    label="Raw data",
    fmt="o",  # Just show the symbols and no lines
    capsize=4, # Size of end of the error bars
)
ax1.plot(
    time,
    y_hat_1,
    lw=3,
    label=model1_txt,
)
ax1.set(xlabel="Time (s)", ylabel="Height (cm)")
ax1.legend()

ax2.scatter(y_hat_1, height - y_hat_1)
ax2.set(xlabel="ŷ", ylabel="Residual, y - ŷ")
ax2.set_ylim(-1.1, 2.0)
sns.despine(fig=fig)

### Answer to question 4.1(a): "What do you think about your model?"

Well, the R² is quite high and the model seems to be OK overall. But it is missing
the fact that the change in the height is not constant. Further, we clearly see that
the residuals depend on the predicted heights, which also suggests that something
is missing in the model.

The error increases towards the end in the left-hand figure. And if we were to predict outside the
range we used for making the model, we would probably make a large error.

For a time greater than 480 s the model predicts a negative height.
Of course, the height cannot be smaller than zero, so this can mean two things: (i) the model only applies
between 0 and 360 s or (ii) the model is not physically sound and we should improve it. We will actually improve it (so option (ii)) in the next questions.

### 4.1(b)
If we assume that the change in froth volume is proportional
to the volume present at any given time, we can show that we get
exponential decay of the froth height,

\begin{equation}
\frac{h(t)}{h(0)} = \exp \left(-\frac{t}{\tau} \right),
\end{equation}

where $h(t)$ is the height of the froth as a function of time $t$, and $\tau$ is a parameter.
We will assume that $h(0)$ is a known parameter equal to the height of the froth at the initial time.

Show how you can transform the equation above to a linear equation of the form,

\begin{equation}
y = b x,
\end{equation}

and express $b, x, y$ in terms of $h, h(0), t, \tau$.

**Note:** The equation $y=bx$ does not include the usual constant term.
This will modify the least squares equation as shown in [Appendix A](#A.-Least-squares-without-the-intercept).
You can use the equation from the appendix to calculate $b$ in the following or (recommended!)
make use of methods where you can turn off the intercept, for instance
[``LinearRegression(fit_intercept=False)``](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)

### Answer to question 4.1(b):

If we take the natural logarithm on both sides of the equation, we get,

\begin{equation}
\ln \left( \frac{h(t)}{h(0)} \right) = -\frac{t}{\tau} = -\frac{1}{\tau} \times t .
\end{equation}

Setting,
\begin{equation}
y = \ln \left( \frac{h(t)}{h(0)} \right), \quad x = t, \quad b=-\frac{1}{\tau},
\end{equation}
we get,
\begin{equation}
\underbrace{\ln \left( \frac{h(t)}{h(0)} \right)}_{y} = -\frac{t}{\tau} = \underbrace{-\frac{1}{\tau}}_{b} \times \underbrace{t}_{x},
\end{equation}

or $y = bx$.
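
As a quick sanity check of this transformation, here is a minimal sketch on synthetic, noise-free data. The values of `tau_true` and `h0` are made up for illustration and are not taken from the beer data. The slope is computed with the no-intercept formula from Appendix A:

```python
import numpy as np

# Synthetic, noise-free froth heights (tau_true and h0 are made-up values)
tau_true = 250.0  # s
h0 = 17.0  # cm
t = np.linspace(0.0, 360.0, 13)
h = h0 * np.exp(-t / tau_true)

# Transform: y = ln(h / h(0)) and x = t, so that y = b x with b = -1/tau
y = np.log(h / h0)
x = t

# Least squares slope for a line through the origin (see Appendix A)
b = np.sum(x * y) / np.sum(x**2)
tau_est = -1.0 / b
print(f"b = {b:.4g}, tau = {tau_est:.4g} s")
```

Since the synthetic data follow the exponential law exactly, the recovered `tau_est` should match `tau_true` up to floating point error.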

### 4.1(c)
Use the transformation you found above to create a new linear model where you estimate
the value of $\tau$. Plot your new model together with the raw data and calculate $R^2$.

In [None]:
# Define y:
data["y"] = np.log(height / height[0])
data.head()

X = time.reshape(-1, 1)  # This is the same as before
y = data["y"]  # This is the new y

model2 = LinearRegression(fit_intercept=False)  # New model, without intercept
model2.fit(X, y)
r2_model2 = model2.score(X, y)
y_hat_2 = model2.predict(X)
# Convert y back into a height:
height_hat_2 = height[0] * np.exp(y_hat_2)  # Height, estimated by model 2
mse_model2 = mean_squared_error(height, height_hat_2)  # MSE, convert to height to compare with model 1


tau = -1.0 / model2.coef_[0]
print(f"tau = {tau:.4g} s")

model2_txt = f"h(t) = h(0) exp(-t/{tau:.4g})"
model2_txt = f"{model2_txt}\n(R² = {r2_model2:.3g}, MSE = {mse_model2:.3g})"

In [None]:
fig, (ax1, ax2) = plt.subplots(constrained_layout=True, ncols=2, figsize=(8, 4))
ax1.errorbar(
    time,
    height,
    yerr=height_error,
    label="Raw data",
    fmt="o",  # Just show the symbols and no lines
    capsize=4, # Size of end of the error bars
)
ax1.plot(
    time,
    height_hat_2,
    lw=3,
    label=model2_txt,
)
ax1.set(xlabel="Time (s)", ylabel="Height (cm)")
ax1.legend()

# Let us plot these for heights so we can compare with model 1
ax2.scatter(height_hat_2, height - height_hat_2)
ax2.set(xlabel="ŷ", ylabel="Residual, y - ŷ")
ax2.set_ylim(-1.1, 2.0)

sns.despine(fig=fig)

### Answer to question 4.1(c): What value did you get for $\tau$?

From the coefficient found in the least squares fit: $\tau \approx 290$ s. We see that the residuals are now all
smaller in size, but we are overestimating the height for a lot of the points. We still seem to have a trend in the residuals, so maybe the model is not perfect, but R² is now really close to 1.

### 4.1(d)
[Leike](https://doi.org/10.1088/0143-0807/23/1/304) found a
value of $\tau = 276$ s which is probably lower than the
value you found in the previous task.
We will now try to reproduce the results of Leike, but to
do that, we have to do weighted least squares.

As you have seen,
the raw data includes errors that are not constant. We can use
these errors to give weights to the data points in the fitting:
we give more importance
to points with smaller errors and less importance to points with larger errors.

One way forward is to assign weights ($w_i$) as $w_i = 1/\sigma_i^2$ where $\sigma_i$ is the
reported error for observation $i$. But we need to consider the fact that we
are now fitting to $y = \ln (h(t) / h(0))$, and this will also modify the errors.
If you remember [propagation of errors](https://en.wikipedia.org/wiki/Propagation_of_uncertainty),
you should be able to show that $\sigma_y^2 = \sigma_h^2 / h^2$, and this is
the transformation we need.

Do the following steps to perform the weighted
least squares:
* (i) Calculate errors for your $y$ values according to $\sigma_y^2 = \sigma_{h}^2 / h^2$.

* (ii) Calculate weights for your $y$ values as $w = 1/\sigma_y^2$. Note: If
  a $\sigma_y$ value is zero, set the corresponding weight to zero.
  
* (iii) Run a weighted least squares fitting using your $w$'s as weights (see the Jupyter notebook version
  for more hints), and find $\tau$. Plot your new model and calculate $R^2$.
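
The transformation in step (i) follows from first-order propagation of errors, $\sigma_y = \left|\partial y / \partial h\right| \sigma_h = \sigma_h / h$, when $h(0)$ is treated as exact. A rough numerical check of this, using simulated measurements, could look like the sketch below. The numbers `h0`, `h_mean` and `sigma_h` are hypothetical and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical values, chosen only for illustration:
h0 = 17.0  # initial height (cm), treated as exact
h_mean = 10.0  # a measured height (cm)
sigma_h = 0.3  # its reported error (cm)

# Draw many noisy "measurements" of h and transform each to y = ln(h / h0)
h_samples = rng.normal(loc=h_mean, scale=sigma_h, size=200_000)
y_samples = np.log(h_samples / h0)

sigma_y_sampled = np.std(y_samples)
sigma_y_formula = sigma_h / h_mean  # from sigma_y^2 = sigma_h^2 / h^2

print(f"sampled: {sigma_y_sampled:.4f}, formula: {sigma_y_formula:.4f}")
```

The two numbers should agree closely as long as the relative error $\sigma_h / h$ is small, since the formula is only a first-order approximation.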

In [None]:
model3 = LinearRegression(fit_intercept=False)
weights_h = 1.0 / height_error**2
weights_h[weights_h == float("inf")] = 0
weights_h /= sum(weights_h)

# i)
sigma_y_sq = height_error**2 / height**2
# ii)
weights = 1.0 / sigma_y_sq
weights[weights == float("inf")] = 0  # Set infinite values to zero

# iii)
model3.fit(X, y, sample_weight=weights)  # Do fitting, but use the weights
r2_model3 = model3.score(X, y, sample_weight=weights)  # Calculate R² (considering the weights).
y_hat_3 = model3.predict(X)
height_hat_3 = height[0] * np.exp(y_hat_3)
mse_model3 = mean_squared_error(height, height_hat_3, sample_weight=weights_h)

In [None]:
tau_ = -1.0 / model3.coef_[0]
print(f"tau = {tau_:.4g} s")

model3_txt = f"h(t) = h(0) exp(-t/{tau_:.4g})"
model3_txt = f"{model3_txt}\n(R² = {r2_model3:.3g}, MSE = {mse_model3:.3g})"

In [None]:
fig, (ax1, ax2) = plt.subplots(constrained_layout=True, ncols=2, figsize=(8, 4))
ax1.errorbar(
    time,
    height,
    yerr=height_error,
    label="Raw data",
    fmt="o",  # Just show the symbols and no lines
    capsize=4, # Size of end of the error bars
)
ax1.plot(
    time,
    height_hat_3,
    lw=3,
    label=model3_txt,
)
ax1.set(xlabel="Time (s)", ylabel="Height (cm)")
ax1.legend()

# Let us plot these for heights so we can compare with model 1
ax2.scatter(height_hat_3, np.sqrt(weights_h) * (height - height_hat_3))
ax2.set(xlabel="ŷ", ylabel="Weighted residual, √w × (y - ŷ)")
ax2.set_ylim(-1.1, 2.0)

sns.despine(fig=fig)

### Answer to question 4.1(d): What value did you get for $\tau$?

With the weighted approach, we get $\tau = 277$ s, which is close to the $276$ s stated in the text.
The weighted residuals are closer to zero, but there might still be a weak trend in them. R² is still
high and the model seems to capture the general trend quite well.

If we plot this new model and the previous
one, we see that there is no big difference between them. It could be that the $\tau=290$ and $\tau=277$
we have found are equal within the experimental uncertainty. Let us quantify the uncertainty next.

### 4.1(e)
Since we do have measured errors here, we can use them to estimate the error in the
parameter you just found. For a weighted least squares fit to the equation $y = bx$,
the error estimate ($\sigma_b$) for $b$ is,

\begin{equation}
\sigma_b^2 = \frac{1}{\sum_{i=1}^n w_i x_i^2} .
\end{equation}

Estimate the error for the $\tau$-value you just found.

In [None]:
sigma_b = np.sqrt(1.0 / np.sum(weights * time * time))
print(sigma_b)

This is the uncertainty in the $b$ parameter. To find the uncertainty for $\tau$, we can either
calculate what $b$ can be with
this uncertainty, or we can use propagation of errors:

\begin{equation}
\sigma_\tau^2 = \left(\frac{\partial \tau}{\partial b}\right)^2 \times \sigma_b^2 =\tau^4 \times \sigma_b^2 
\end{equation}


In [None]:
# Checking:
b = model3.coef_[0]
tau_1 = -1.0 / (b + sigma_b)
tau_2 = -1.0 / (b - sigma_b)
# Let us take the average of the difference:
sigma_tau = 0.5 * (abs(tau_ - tau_1) + abs(tau_ - tau_2))
print(f"σ(tau) = {sigma_tau:.4g}, round up to {np.ceil(sigma_tau):.1g}")

In [None]:
# Propagation of errors:
sigma_tau = np.sqrt(sigma_b**2 * tau_**4)
print(f"σ(tau) = {sigma_tau:.4g}, round up to {np.ceil(sigma_tau):.1g}")

### Answer to question 4.1(e): What boundaries ($\pm$) did you get for $\tau$?

The uncertainty is around 7 s, so $\tau = 277 \pm 7$ s. With the double standard deviation we
get $\tau = 277 \pm 14$ s.

This compares well with the results of Leike: $\tau = 276 \pm 7$ s at 68% confidence and $\tau = 276 \pm 14$ s
at 95%
confidence.

## Exercise 4.2

[Forbes](https://doi.org/10.1017/S0080456800032075) investigated the
relationship between the boiling point of water and
the atmospheric pressure, and collected data in the Alps and Scotland.
Forbes' goal
was to estimate altitudes from the boiling point alone. We will see if we can
estimate the atmospheric pressure from Forbes' data.

### 4.2(a) 
Load the data from Forbes (data file [Data/forbes.csv](Data/forbes.csv)), plot it,
and create a linear model
that predicts the atmospheric pressure from the temperature. Report the R² and [mean
squared error (MSE)](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) for your model.

In [None]:
from sklearn.metrics import mean_squared_error  # Note: sklearn has a method for the MSE!

In [None]:
data_forbes = pd.read_csv("Data/forbes.csv")
data_forbes.head()

In [None]:
temperature = data_forbes["Temperature (F)"].to_numpy()
X_temp = temperature.reshape(-1, 1)
pressure = data_forbes["Pressure (inches Hg)"].to_numpy()

In [None]:
model_pressure = LinearRegression(fit_intercept=True)
model_pressure.fit(X_temp, pressure)
pressure_hat = model_pressure.predict(X_temp)

r2_pressure = model_pressure.score(X_temp, pressure)
# And a mean squared error:
mse_pressure = mean_squared_error(pressure, pressure_hat)
# Summarize the model with some short text:
text = f"y = {model_pressure.coef_[0]:.3g}x + {model_pressure.intercept_:.3g}"
text = f"{text}\n(R² = {r2_pressure:.3g}, MSE = {mse_pressure:.3g})"

In [None]:
fig, ax = plt.subplots(constrained_layout=True)
ax.scatter(temperature, pressure, label="Raw data", color="0.3")
ax.set(xlabel="Temperature (°F)", ylabel="Pressure (inches Hg)")
ax.plot(temperature, pressure_hat, label=text, lw=3)
ax.legend()
sns.despine(fig=fig)

### Answer to question 4.2(a): What R² did you get and what was the MSE?

The R² is 0.994 and the mean squared error is 0.0478.

### 4.2(b) 

Estimate the error you can expect to make if you use your model for predicting the pressure.
Do this with leave-one-out cross-validation (LOOCV) and calculate the mean squared error
of cross-validation ($\text{MSE}_\text{CV}$).

LOOCV is a special case of **training** and **testing**, and you can find a short description of it
in [appendix B](#B.-Leave-one-out-cross-validation). Please see the Jupyter notebook for a code example you can use. The code
example for LOOCV is concise, so make sure you understand what goes on here (that is,
what LOOCV is doing). If you are working with someone, try explaining testing/training
and how LOOCV works to them.

In [None]:
# Example 1 of LOOCV:
# sklearn has a method to pick out samples for leave-one-out:
from sklearn.model_selection import LeaveOneOut

loo = LeaveOneOut()
error = []
for train_index, test_index in loo.split(X_temp):  # Split into training and testing
    # train_index = index of samples to use for training
    # test_index = index of samples to use for testing
    # Pick out samples (for training and testing):
    X_train, X_test = X_temp[train_index], X_temp[test_index]
    y_train, y_test = pressure[train_index], pressure[test_index]
    # Fit a new model with the training set:
    model = LinearRegression(fit_intercept=True).fit(X_train, y_train)
    # Predict y for the test set:
    y_hat = model.predict(X_test)
    # Compare the predicted y values in the test set with the measured ones:
    error.append((y_test - y_hat) ** 2)
mse_cv_1 = np.mean(error)
print(f"MSE_CV = {mse_cv_1}")

In [None]:
# Example 2 of LOOCV:
# sklearn has a method for leave-one-out selection, and a method for
# cross-validation. And these two can be combined:
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Create "empty" model for fitting:
model = LinearRegression(fit_intercept=True)
# Run cross validation, where we select testing and training with LeaveOneOut:
scores = cross_val_score(model, X_temp, pressure, scoring="neg_mean_squared_error", cv=LeaveOneOut())
mse_cv_2 = np.mean(-scores)
print(f"MSE_CV = {mse_cv_2}")

**Note:** The scoring is `"neg_mean_squared_error"` above, which is the negative of the mean squared error. This may be a matter of semantics, but many methods in sklearn return a "score", and for most of us, a better score = a better result. So if we used the mean squared error as the score, then a larger score = a larger error = a poorer result. However, with the negative sign, a larger score (closer to zero) = smaller error = better result.

In [None]:
# OK, let us also code it to check the formula from Appendix B.
# Note! A detail that is easy to miss: the X used for H must include the
# column of ones here.
X_matrix = np.column_stack((np.ones_like(temperature), temperature))
H = X_matrix @ np.linalg.pinv(X_matrix)
hii = np.diagonal(H)
mse_cv_3 = np.mean(((pressure - pressure_hat) / (1 - hii))**2)
print(f"MSE_CV = {mse_cv_3}")

In [None]:
print(mse_cv_1 / mse_cv_2, mse_cv_1 / mse_cv_3, mse_cv_1 / mse_pressure, np.sqrt(mse_cv_1))

### Answer to question 4.2(b): What $\text{MSE}_\text{CV}$ did you get?

I got $\text{MSE}_\text{CV} = 0.059$. This is approximately 23% greater than the mean squared error
and it is a more realistic estimate of the error we can expect to make for new samples. Just to see what
this estimate looks like, we can highlight the region within the root mean squared error:

In [None]:
fig, ax = plt.subplots(constrained_layout=True)
ax.scatter(temperature, pressure, label="Raw data", color="0.3")
ax.set(xlabel="Temperature (°F)", ylabel="Pressure (inches Hg)")

line, = ax.plot(temperature, pressure_hat, label=text, lw=3)
rmsep = np.sqrt(mse_cv_1)
upper = pressure_hat + rmsep
lower = pressure_hat - rmsep


ax.plot(
    [temperature[0], temperature[-1]],
    [upper[0], upper[-1]],
    lw=1, ls=":", color=line.get_color(),
    label=r"$\hat{y} + \mathrm{RMSEP}$",
)
ax.plot(
    [temperature[0], temperature[-1]],
    [lower[0], lower[-1]],
    lw=1, ls="--", color=line.get_color(),
    label=r"$\hat{y} - \mathrm{RMSEP}$",
)


ax.legend()
sns.despine(fig=fig)

Almost all of the points fall within the error estimated from the cross validation. However, there is one single
point that seems off... But this is a story for a later time!

# Appendix

## A. Least squares without the intercept
We are going to determine the parameter $b$ for the linear model,

\begin{equation}
y =  b x,
\end{equation}

and we do this by minimizing the sum of squared errors (assuming that we have $n$
measurements of $y$ and $x$),

\begin{equation}
S = \sum_{i=1}^n (y_i - b x_i)^2.
\end{equation}

With the residuals $r_i = y_i - b x_i$, we have:

\begin{equation*}
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^n r_i x_i, \quad
\frac{\partial^2 S}{\partial b^2} = 2\sum_{i=1}^n x_i^2 \geq 0.
\end{equation*}

Note that the second derivative is positive, except for the
trivial case when all $x_i = 0$, so the stationary point we
find is indeed a minimum.
Requiring that $\frac{\partial S}{\partial b} = 0$ gives,

\begin{equation}
-2 \sum_{i=1}^n r_i x_i = 0 \implies \sum_{i=1}^n (y_i x_i - b x_i^2) = 0 \implies 
b = \frac{\sum_{i=1}^n y_i x_i}{\sum_{i=1}^n x_i^2} .
\end{equation}

We can also repeat this derivation for weighted least squares. The sum of squared errors
is then,

\begin{equation}
S = \sum_{i=1}^n w_i (y_i - b x_i)^2,
\end{equation}

where $w_i$ are the weights and, after minimization,

\begin{equation}
b = \frac{\sum_{i=1}^n w_i y_i x_i}{\sum_{i=1}^n w_i x_i^2} .
\end{equation}
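
The two formulas for $b$ above can be sketched in a few lines of NumPy. The function name `slope_through_origin` and the data are made up for illustration. As a cross-check, the sketch uses the fact that scaling $x_i$ and $y_i$ by $\sqrt{w_i}$ turns the weighted problem into an ordinary least squares problem:

```python
import numpy as np


def slope_through_origin(x, y, w=None):
    """Return b minimizing sum(w * (y - b*x)**2); w=None means equal weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    return np.sum(w * x * y) / np.sum(w * x**2)


# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
w = np.array([1.0, 4.0, 1.0, 0.25])

b_weighted = slope_through_origin(x, y, w)

# Cross-check: scale x and y by sqrt(w) and solve the resulting
# ordinary least squares problem (no intercept) with np.linalg.lstsq.
sw = np.sqrt(w)
b_lstsq = np.linalg.lstsq((sw * x).reshape(-1, 1), sw * y, rcond=None)[0][0]

print(b_weighted, b_lstsq)
```

Both values should agree to within floating point precision.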

You can find more information on the weighted least squares method (with error analysis)
in Bevington and Robinson <a name="cite_ref-1"></a>[[1]](#bevington).
Taylor <a name="cite_ref-2"></a>[[2]](#taylor) states error formulas for
the parameters that might be useful for cases when
the error in $y$ is known and constant (e.g., as in the "normal" least squares).


<a name="bevington"></a>[[1]](#cite_ref-1) Philip R. Bevington and D. Keith Robinson. Data reduction and error analysis for the physical sciences. 3rd ed. New York, NY: McGraw-Hill, 2003.

<a name="taylor"></a>[[2]](#cite_ref-2) John R. Taylor. An Introduction to Error Analysis: The Study of Uncertainties in Physical
    Measurements. 2nd ed. Sausalito, CA: University Science Books, 1997.


## B. Leave-one-out cross-validation

In Leave-one-out cross-validation (LOOCV), we first pick one sample,
measurement number $j$, and we fit the model using the $n-1$ other points
(all points except $j$). After the fitting, we check how well the model can predict
measurement $j$ by calculating the difference between the
measured ($y_j$) and predicted ($\tilde{y}_j$) value. This difference, $r_j = y_{j} - \tilde{y}_j$, is
called the predicted residual, and it tells us the error we just made.

There is nothing special about picking point $j$, and we can try all possibilities
of leaving one point out, fitting the model using the remaining $n-1$
measurements, and predicting the value we left out.
After doing this for all possibilities, we have fitted the model
$n$ times and calculated $n$ predicted residuals. The mean squared error (obtained from the squared
residuals), $\mathrm{MSE}_{\mathrm{CV}}$, can then be used
to estimate the error in the model,

\begin{equation}
\mathrm{MSE}_{\mathrm{CV}} = \frac{1}{n} \sum_{i=1}^{n} r_i^2 =  \frac{1}{n} \sum_{i=1}^{n} (y_i - \tilde{y}_i)^2,
\end{equation}

where $y_i$ is the measured $y$ in experiment $i$, and $\tilde{y}_i$ is the
predicted $y$, using a model which was fitted using all points *except* $y_i$.

For a polynomial fitting, there is an alternative to refitting the model $n$ times. In fact,
we can show that for polynomial fitting, the mean squared error can
be obtained by,

\begin{equation}
\mathrm{MSE}_{\mathrm{CV}} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \tilde{y}_i)^2 =
\frac{1}{n}\sum_{i=1}^{n} \left(\frac{y_i - \hat{y}_i}{1 - h_{ii}} \right)^2,
\end{equation}

where the $\hat{y}_i$'s are predicted values using the
model fitted with *all data points*,
and $h_{ii}$ is the $i$'th diagonal element of the
$\mathbf{H}$ matrix (the projection matrix,
see Eq.(4.49) on page 49 in our textbook),

\begin{equation}
\mathbf{H} =
\mathbf{X} 
\left( 
  \mathbf{X}^\mathrm{T} \mathbf{X}
\right)^{-1}
\mathbf{X}^\mathrm{T} = \mathbf{X} \mathbf{X}^+,
\end{equation}

where $\mathbf{X}^+$ is the pseudoinverse of $\mathbf{X}$.
Note the difference between $\hat{y}_i$ and $\tilde{y}_i$, and the
fact that we do not have to do the
refitting(!) to obtain the $\mathrm{MSE}_{\mathrm{CV}}$.

When you calculate $\mathrm{MSE}_{\mathrm{CV}}$ use one of the two approaches above or both
if you want to see if they give the same answer.
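
To see that the two approaches agree, here is a minimal, self-contained sketch on synthetic straight-line data (the data are made up and are not the Forbes data). It uses only NumPy, with `np.linalg.lstsq` for the refitting and `np.linalg.pinv` for $\mathbf{X}^+$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic straight-line data (made up for illustration)
x = np.linspace(0.0, 10.0, 12)
y = 1.5 * x + 2.0 + rng.normal(scale=0.5, size=x.size)
X = np.column_stack((np.ones_like(x), x))  # include the column of ones

# Approach 1: refit n times, leaving out one point each time
predicted_residuals = []
for j in range(x.size):
    keep = np.arange(x.size) != j  # all points except j
    coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    predicted_residuals.append(y[j] - X[j] @ coef)
mse_cv_refit = np.mean(np.square(predicted_residuals))

# Approach 2: one fit with all points and the diagonal of H = X X^+
H = X @ np.linalg.pinv(X)
y_hat = H @ y
mse_cv_hat = np.mean(((y - y_hat) / (1.0 - np.diagonal(H))) ** 2)

print(mse_cv_refit, mse_cv_hat)
```

The two printed values should be equal up to floating point error, which illustrates that the refitting is not needed for this kind of model.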