# Analysis practice

Developed by Natasha Holmes for Cornell Physics Labs.

This exercise aims to remind you of, or familiarize you with, some quantitative analysis tools that will be useful in this course. Please refer to the Statistics Summary on Canvas for guidance or check in with your TA or teammates for help. These questions refer to the data in the table, which we will assume were collected previously in an experiment where the researchers expected the relationship between $x$ and $y$ to be modeled by $y=8x-5$.

| x | y | dy | 
|----------|:-------------:|------:|
| 1| 3 | 1|
| 2| 6 | 3|
| 3| 19 | 4|
| 4| 33 | 10|
| 5| 42 | 10|
| 6| 47 | 4|
| 7| 59 | 3|
| 8| 72 | 1|
| 9| 74 | 7|
| 10| 80| 2|




First, we will check whether the measurements of $y$ at $x = 1$ and $x = 2$ are distinguishable within uncertainties. To do this, we use the following function:

$$t^\prime = \frac{\left|A - B\right|}{\sqrt{\delta A^2 + \delta B^2}}$$

which divides the difference between the two measured values ($A$ and $B$) by their combined uncertainty (where $\delta A$ is the uncertainty in $A$ and $\delta B$ is the uncertainty in $B$). We combine the uncertainties by adding them *in quadrature* (i.e., we add the squares of the uncertainties and take the square root of the sum).
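A minimal sketch of this calculation in Python is shown below. The course's own `t_prime` lives in `utilities.ipynb`, which is not reproduced here, so the name `t_prime_sketch` and its body are assumptions that only mirror the formula above, with arguments in the order $A$, $\delta A$, $B$, $\delta B$.

```python
import math

def t_prime_sketch(a, da, b, db):
    """Return |a - b| divided by the uncertainties da and db added in quadrature.

    Hypothetical stand-in for the course's t_prime; not the official code.
    """
    combined = math.sqrt(da**2 + db**2)  # quadrature sum of the uncertainties
    if combined == 0:
        raise ValueError("at least one uncertainty must be non-zero")
    return abs(a - b) / combined

# Two measurements, A = 5 ± 1 and B = 2 ± 3
print(round(t_prime_sketch(5, 1, 2, 3), 3))  # 3 / sqrt(10), about 0.949
```

Passing an uncertainty of 0 for one value is allowed, which is useful when comparing a measurement to an exact theoretical prediction.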


If we have two sets of measurements of the same physical phenomenon, we would expect the means to be within one unit of uncertainty of each other about 68% of the time. Therefore, we would expect to see a $t'$ value of approximately 1 on average.

After calculating $t'$ for two measurements, you can evaluate their dissimilarity (or distinguishability) through the following interpretation:


1.  $t'\approx1$  If we have two sets of measurements and a $t'$ value of approximately 1, then the sets are indistinguishable and they may represent the same physical phenomenon.

2.  $t'\ll1$  If we have a $t'$ value of much less than 1, then it is possible that either we overestimated our uncertainties, or that our current level of precision is not good enough for the phenomenon that we are trying to measure.

3.  $1\lesssim t'<3$  This is a grey area. It is still possible that our two sets of measurements are coming from the same phenomenon, but it is less likely than if our $t'$ is close to 1.

4.  $t' >3$  If our $t'$ is greater than 3, then it is unlikely that our two sets of measurements were measuring the same phenomenon. This means that we have distinguished between two different physical phenomena.


NOTE: $t' \le 1$ **does not** mean that $A$ and $B$ are the same. It only tells us that we cannot distinguish them with the given data. For example, if you do a better measurement and decrease the uncertainties, you might later uncover a difference between $A$ and $B$. That is, poor precision may be hiding a subtle difference!
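The interpretation above can be summarized as a rough lookup. This is only a sketch: the function name `interpret_t_prime` is hypothetical, the cut-offs at 1 and 3 come from the list above, treating a value of exactly 3 as distinguishable is an arbitrary choice, and the "much less than 1" case is left out because the text gives no sharp threshold for it.

```python
def interpret_t_prime(t):
    """Map a t' value onto the qualitative categories used in this exercise."""
    if t < 0:
        raise ValueError("t' is built from an absolute value, so it cannot be negative")
    if t <= 1:
        # Cannot be told apart with the given data (not proof that they are equal).
        return "indistinguishable"
    if t < 3:
        return "grey area"
    # Assumption: exactly 3 is grouped with the distinguishable case.
    return "distinguishable"

for value in (0.95, 2.0, 4.3):
    print(value, interpret_t_prime(value))
```

The labels are a guide rather than a verdict: a result in the first category can still move into the last one once the uncertainties are reduced.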

### 1. Using whatever software or technology you're comfortable with (e.g., calculator, Excel), determine whether the measurements of $y$ at $x = 1$ and $x = 2$ are distinguishable using $t^\prime$. 
**ANSWER:** $t'=0.9487$, so these are indistinguishable.
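
For reference, substituting the first two rows of the table ($y = 3 \pm 1$ at $x = 1$ and $y = 6 \pm 3$ at $x = 2$) into the formula for $t^\prime$ gives:

$$t^\prime = \frac{\left|3 - 6\right|}{\sqrt{1^2 + 3^2}} = \frac{3}{\sqrt{10}} \approx 0.949$$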

In the code below, we've called a notebook that defines a Python function to calculate $t^\prime$ from given inputs. The function is defined so that you enter each measurement and its uncertainty, all separated by commas. That is, if we measured two values $A \pm \delta A = 5 \pm 1$ and $B \pm \delta B = 2 \pm 3$, we would call $t^\prime$ by typing:  **print(t_prime(5,1,2,3))**

### 2. Replace the ... with the relevant measurements and uncertainties to use the function to determine whether the measurements of $y$ at $x = 1$ and $x = 2$ are distinguishable. 
Check whether your value here is the same as your value in Q1. If not, check your calculations above and/or how you entered the measurements below.

In [1]:
%run ./utilities.ipynb

# an example of using t_prime
print(t_prime(3,1,6,3))


0.9486832980505138


Next, we'll plot the measurements of $x$ and $y$ to qualitatively evaluate whether the relationship between $x$ and $y$ is modeled as $y=8x-5$. First, let's load the data into separate arrays for x, y, and dy (the uncertainty in y).

### 3. Replace the ... with the relevant measurements and uncertainties for each variable, with each value separated by a comma. We've filled in the data for $x$ as an example.

In [2]:
x=[1,2,3,4,5,6,7,8,9,10]
y=[3, 6, 19, 33, 42, 47, 59, 72, 74, 80]
dy=[1, 3, 4, 10, 10, 4, 3, 1, 7, 2]


The code below will allow you to plot the paired values of $x$ and $y$, along with the uncertainty in $y$.

### 4. Run the code below to generate a plot of the data ($y$ versus $x$, with the uncertainties in y). 

In [3]:
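# Convert x, y, and dy to NumPy arrays.
x = np.array(x)
y = np.array(y)
dy = np.array(dy)

# Plot y versus x with vertical error bars given by dy.
plt.figure()
plt.errorbar(x, y, yerr=dy, fmt='.')
plt.xlabel("x")
plt.ylabel("y")
plt.title("y vs x")
plt.show()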


From the plot, but without fitting a line, we can qualitatively see that the relationship between $x$ and $y$ appears fairly linear. We'll explore two ways to check how well the data fit the function $f(x)=8x-5$. Please refer to the Statistics Summary on Canvas for more information about least-squares fitting.

First, we can find the $\chi^2$ value between the line $f(x)=8x-5$ and the data. Second, we can find the best fitting line and compare the best-fitting parameters to the proposed fit parameters.

Run the code below and uncheck the box that says "fit". Change the values of the intercept and the slope to compare the data to the fit line $f(x)=8x-5$.

In [4]:
autoFit(x=x, y=y, dy=dy, title="Manual fit y vs x")


interactive(children=(FloatSlider(value=-6.64546600086426, description='intercept', max=-1.664546600086426, mi…

### 5. What is the $\chi^2$ value between the data and a fit line $f(x)=8x-5$? What does this say about how well the line $f(x)=8x-5$ fits to the data?
**ANSWER:** The $\chi^2$ between this function and the data is $23.499$, which indicates that the function doesn't fit the data very well.
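
The quoted value can be checked by hand. The sketch below assumes that the number reported by `autoFit` is the reduced $\chi^2$, that is, the sum of squared, uncertainty-weighted residuals divided by the number of degrees of freedom ($N - 2$ for a two-parameter line). `autoFit` itself is not shown here, so this convention is an inference from the fact that it reproduces 23.499; the helper name `reduced_chi2` is hypothetical.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 6, 19, 33, 42, 47, 59, 72, 74, 80]
dy = [1, 3, 4, 10, 10, 4, 3, 1, 7, 2]

def reduced_chi2(x, y, dy, slope, intercept, n_params=2):
    """Weighted sum of squared residuals divided by (N - n_params)."""
    chi2 = sum(((yi - (slope * xi + intercept)) / si) ** 2
               for xi, yi, si in zip(x, y, dy))
    return chi2 / (len(x) - n_params)

# Compare the data to the proposed model f(x) = 8x - 5
print(round(reduced_chi2(x, y, dy, slope=8, intercept=-5), 3))  # 23.499
```

Most of the disagreement comes from a single point: at $x = 8$ the line predicts 59 while the measurement is $72 \pm 1$, which alone contributes $13^2 = 169$ to the unreduced sum.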

The code below will use least-squares fitting methods to automatically find the best-fitting line through the data.

### 6. Run the code below to find the best-fitting line to the data. Using the $\chi^2$ value, the plot, and the residual graph, how well do you think the line fits the data?

**ANSWER:** With $\chi^2=3.824$, I would say this fit is mediocre at best. The residuals plot doesn't have many points near 0, and most points are consistent with 0 only because they have larger uncertainties.

In [6]:
autoFit(x=x, y=y, dy=dy, title="Auto-fit y vs x")


interactive(children=(FloatSlider(value=-6.64546600086426, description='intercept', max=-1.664546600086426, mi…
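
`autoFit` is defined in `utilities.ipynb` and is not reproduced here. As an independent cross-check, the sketch below fits a straight line by weighted least squares using the standard closed-form sums, with weights $1/\delta y^2$. The function name `weighted_line_fit` is hypothetical, and rescaling the parameter uncertainties by the square root of the reduced $\chi^2$ is an assumption, chosen because it appears to reproduce the uncertainties used in the next question.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 6, 19, 33, 42, 47, 59, 72, 74, 80]
dy = [1, 3, 4, 10, 10, 4, 3, 1, 7, 2]

def weighted_line_fit(x, y, dy):
    """Closed-form weighted least-squares fit of y = intercept + slope * x."""
    w = [1 / s**2 for s in dy]  # weight each point by 1 / uncertainty^2
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2

    slope = (S * Sxy - Sx * Sy) / delta
    intercept = (Sxx * Sy - Sx * Sxy) / delta

    # Reduced chi-squared of the best-fit line (two fitted parameters).
    chi2 = sum(wi * (yi - intercept - slope * xi) ** 2
               for wi, xi, yi in zip(w, x, y))
    red_chi2 = chi2 / (len(x) - 2)

    # Assumption: parameter uncertainties are rescaled by sqrt(reduced chi^2).
    scale = math.sqrt(red_chi2)
    d_slope = scale * math.sqrt(S / delta)
    d_intercept = scale * math.sqrt(Sxx / delta)
    return slope, intercept, d_slope, d_intercept, red_chi2

slope, intercept, d_slope, d_intercept, red_chi2 = weighted_line_fit(x, y, dy)
print(f"slope     = {slope:.3f} +/- {d_slope:.2f}")
print(f"intercept = {intercept:.3f} +/- {d_intercept:.2f}")
print(f"reduced chi^2 = {red_chi2:.3f}")
```

Under these assumptions the sketch should return a slope close to 9.458, an intercept close to $-6.645$ (note the negative sign, matching the slider value in the output above), and a reduced $\chi^2$ close to 3.824.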

### 7. Use the uncertainties in the fit parameters to compare each fit parameter to the predicted values (slope = 8 and intercept = -5) using a $t^\prime$, filling in the ... in the cell below.  Hint:  What are the uncertainties on your model when no measurement is involved?  

In [6]:
#slope
print("t' between theoretical and measured slopes:")
print(f"{t_prime(8, 0, 9.458, 0.34):.3f}")

#intercept (the fitted intercept is negative: -6.645)
print("t' between theoretical and measured intercepts:")
print(f"{t_prime(-5, 0, -6.645, 2.09):.3f}")

t' between theoretical and measured slopes:
4.288
t' between theoretical and measured intercepts:
0.787


### 8. Given the researchers' expectation for the relationship, what are three reasonable things the researchers could do next?

**ANSWER:** The best-fit line describes the data only moderately well ($\chi^2 = 3.824$). The fitted slope is distinguishable from the predicted slope ($t' \approx 4.3$, well above 3), while the fitted intercept is indistinguishable from the predicted intercept ($t' \approx 0.8$). The data therefore look linear, but they are not consistent with the proposed model $y=8x-5$.
1. Refine the theory: researchers should go back and revisit their model. There is clearly a linear relationship in the data, but their linear model doesn't accurately match the behavior of this data.
2. Refine experimental methods: researchers should make sure that their experiment actually measures the quantity they intend to measure. The data are consistent with a linear fit, but the discrepancy between the data and the theory leaves open the possibility of flawed experimental methods.
3. Verify fidelity of experimental equipment: researchers can run diagnostics on their equipment to make sure their measurements are calibrated correctly and that there aren't any offsets that cause the data not to match the theory.

Save your notebook with all your answers to the questions, modified code cells, and output from each code cell. Submit your notebook by uploading it to the Gradescope assignment.