# Uncertainty in Measurement
Learn NumPy by reviewing the basic statistics used in experimental physics.

To learn more, see chapter 2 of "Measurements and their Uncertainties" by Hughes and Hase.

## Import statement
* `import numpy` imports the NumPy package
* `import numpy as np` imports the NumPy package under the alias `np`

In [10]:
import numpy

In [12]:
numpy.cos(numpy.pi)

-1.0

In [13]:
import numpy as np

In [14]:
np.cos(np.pi)

-1.0

## Some useful functions in numpy
* `np.mean` calculates the mean of a sequence of numbers
* `np.std` calculates the standard deviation of a sequence of numbers

In [15]:
data = [1,2,3,4,5,6,7,8,9,10]

In [16]:
np.mean(data)

5.5

In [17]:
# population std
np.std(data)

2.8722813232690143

Be careful: the formula NumPy uses for the standard deviation is
$$ \sigma_x = \sqrt{ \frac{1}{N-\text{ddof}}\sum_{i=1}^{N}(x_i - \bar{x})^2 } $$
where $\bar{x}$ is the mean of the data and `ddof` means delta degrees of freedom. The default value of `ddof` is 0, so `np.std` gives you the population standard deviation.

If you want the sample standard deviation, specify `ddof=1` with the command `np.std(data, ddof=1)`.

For more details, see [documentation](https://numpy.org/doc/stable/reference/generated/numpy.std.html).

In [21]:
# sample std
np.std(data,ddof=1)

3.0276503540974917
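To see what `ddof` does, it can help to compute the standard deviation by hand: sum the squared deviations from the mean, divide by `N - ddof`, and take the square root. The sketch below does this and compares the result with `np.std`; the helper name `std_by_hand` is made up for this illustration and is not part of NumPy.

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def std_by_hand(values, ddof=0):
    """Standard deviation from the sum of squared deviations from the mean."""
    x = np.asarray(values, dtype=float)
    n = x.size
    squared_deviations = (x - x.mean()) ** 2
    return np.sqrt(squared_deviations.sum() / (n - ddof))

# Each pair should agree up to floating-point rounding.
print(std_by_hand(data), np.std(data))                  # population std
print(std_by_hand(data, ddof=1), np.std(data, ddof=1))  # sample std
```

Dividing by `N - 1` instead of `N` makes the sample standard deviation slightly larger than the population one.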

## Import data 
* `np.genfromtxt(file_path)` loads data from the file at `file_path`

Some useful arguments:
* `delimiter`: the string that separates values in the file
* `skip_header`: the number of lines to skip at the beginning of the file

For more details: [documentation](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html).

In [25]:
data = np.genfromtxt(
    "data/pendulum.csv", 
    delimiter=',', 
    skip_header=1  # skip one header line
)

In [26]:
data

array([[0.95, 0.96, 1.1 , 1.17, 1.09],
       [1.03, 1.  , 0.86, 0.86, 0.95],
       [1.08, 1.01, 0.87, 1.06, 1.04],
       [1.01, 1.15, 1.24, 1.01, 0.98],
       [1.03, 1.05, 0.94, 1.21, 1.09],
       [0.94, 1.01, 1.11, 1.07, 1.02],
       [1.  , 1.07, 0.95, 1.04, 0.99],
       [1.  , 1.13, 0.98, 0.97, 1.08],
       [1.07, 0.98, 0.88, 1.11, 1.06],
       [1.04, 0.92, 0.97, 1.15, 1.05]])
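If you do not have a data file at hand, you can try `np.genfromtxt` on text held in memory, since it also accepts file-like objects. The sketch below uses made-up CSV text (the column names and numbers are invented for illustration) to show the effect of `delimiter` and `skip_header`.

```python
import io
import numpy as np

# Made-up CSV text standing in for a file on disk: one header line, then numbers.
csv_text = "trial_a,trial_b,trial_c\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# Skipping the header line leaves only the numeric rows.
table = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(table.shape)

# Without skip_header, the header text cannot be read as numbers and becomes nan.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(with_header[0])
```

A first row of `nan` values after loading is usually a sign that a header line was not skipped.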

## Numpy array indexing
* use `data[i,j]` to access the value at row i, column j (indices start at 0)
* use `data[n:m,j]` to access rows n through m-1 of column j
* use `data[i,n:m]` to access columns n through m-1 of row i
* `:m` means from 0 to m-1; `n:` means from n to the end; `:` means from beginning to end

In [27]:
data

array([[0.95, 0.96, 1.1 , 1.17, 1.09],
       [1.03, 1.  , 0.86, 0.86, 0.95],
       [1.08, 1.01, 0.87, 1.06, 1.04],
       [1.01, 1.15, 1.24, 1.01, 0.98],
       [1.03, 1.05, 0.94, 1.21, 1.09],
       [0.94, 1.01, 1.11, 1.07, 1.02],
       [1.  , 1.07, 0.95, 1.04, 0.99],
       [1.  , 1.13, 0.98, 0.97, 1.08],
       [1.07, 0.98, 0.88, 1.11, 1.06],
       [1.04, 0.92, 0.97, 1.15, 1.05]])

In [31]:
data[0,1]

0.96

In [33]:
data[0,:3]

array([0.95, 0.96, 1.1 ])

In [34]:
data[:,0]

array([0.95, 1.03, 1.08, 1.01, 1.03, 0.94, 1.  , 1.  , 1.07, 1.04])
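The slicing rules are easier to check on an array whose values encode their own position. In the made-up array below (not part of the pendulum data), each value equals 10 times its row index plus its column index.

```python
import numpy as np

# Made-up array: value = 10 * row + column, so positions are easy to read off.
grid = np.array([[0, 1, 2, 3],
                 [10, 11, 12, 13],
                 [20, 21, 22, 23]])

print(grid[1, 2])     # row 1, column 2
print(grid[0:2, 3])   # rows 0 and 1 of column 3 (row 2 is excluded)
print(grid[2, 1:])    # row 2, from column 1 to the end
print(grid[:, 0])     # every row of column 0
```

Note that the stop index of a slice is excluded, so `0:2` selects indices 0 and 1 only.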

## Mean of means and Standard error
* Mean of means is the average of all mean values.
* Standard error is the standard deviation of all mean values.

To report a measured value, use
$$\text{mean of means} \pm \text{standard error} $$

In [39]:
for j in range(5):
    print(np.mean(data[:,j]))

1.015
1.028
0.9900000000000002
1.065
1.0350000000000004


In [40]:
means = [1.015,1.028,0.9900000000000002,1.065,1.0350000000000004]

In [41]:
print(np.mean(means))
print(np.std(means,ddof=1))

1.0266000000000002
0.027482721844824563
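The loop and the hand-typed list of means can be avoided with the `axis` argument of `np.mean`. The sketch below is one possible way to do it; the pendulum data is typed in directly so that the example runs on its own.

```python
import numpy as np

# The pendulum data shown earlier, typed in so the example is self-contained.
data = np.array([[0.95, 0.96, 1.10, 1.17, 1.09],
                 [1.03, 1.00, 0.86, 0.86, 0.95],
                 [1.08, 1.01, 0.87, 1.06, 1.04],
                 [1.01, 1.15, 1.24, 1.01, 0.98],
                 [1.03, 1.05, 0.94, 1.21, 1.09],
                 [0.94, 1.01, 1.11, 1.07, 1.02],
                 [1.00, 1.07, 0.95, 1.04, 0.99],
                 [1.00, 1.13, 0.98, 0.97, 1.08],
                 [1.07, 0.98, 0.88, 1.11, 1.06],
                 [1.04, 0.92, 0.97, 1.15, 1.05]])

# axis=0 averages down the rows, giving one mean per column.
means = np.mean(data, axis=0)
print(means)

print(np.mean(means))         # mean of means
print(np.std(means, ddof=1))  # standard error
```

Keeping the means in an array also avoids copying long floating-point values by hand.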


The measured period is
$$ T = (1.03 \pm 0.03)\,\text{s} $$

* Keep two significant figures in the standard error only when you have approximately 10000 measurements; with fewer measurements, as here, round it to one significant figure.
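The rounding can also be done in code. The sketch below is one possible approach, not a standard NumPy feature: the helper name `round_to_uncertainty` and its `sig_figs` parameter are made up for this illustration, and it assumes the error is nonzero.

```python
import math

def round_to_uncertainty(value, error, sig_figs=1):
    """Round error to sig_figs significant figures and value to the same decimal place."""
    # Position of the leading digit of the error, e.g. 0.027 -> -2 (error must be nonzero).
    leading = math.floor(math.log10(abs(error)))
    decimals = sig_figs - 1 - leading
    return round(value, decimals), round(error, decimals)

# Mean of means and standard error computed above.
print(round_to_uncertainty(1.0266000000000002, 0.027482721844824563))
```

Passing `sig_figs=2` would keep two significant figures in the error, which is only justified for very large data sets.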

### A quicker way (or lazy way) to report a measured value
Suppose you have a set of measured values $\{x_1, \cdots, x_n\}$. Then the reported value can be written as
$$ \langle x \rangle \pm \frac{s_x}{\sqrt{n}} $$
where $\langle x \rangle$ is the mean and $s_x$ is the sample standard deviation.

In [47]:
# suppose you only have 1 set of data, say data[:,0]
period = data[:,0]
print(np.mean(period))
# standard error: sample std divided by sqrt(number of measurements)
print(np.std(period,ddof=1)/np.sqrt(len(period)))

1.015
0.014395215254459469


$$ T = (1.02 \pm 0.01)\,\text{s} $$
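If you need this calculation more than once, it can be wrapped in a small function. The sketch below is one way to do it; the name `standard_error` is made up for this illustration, and the data is the first column of the pendulum data typed in directly.

```python
import numpy as np

def standard_error(values):
    """Sample standard deviation divided by the square root of the number of values."""
    x = np.asarray(values, dtype=float)
    return np.std(x, ddof=1) / np.sqrt(x.size)

# First column of the pendulum data shown earlier.
period = [0.95, 1.03, 1.08, 1.01, 1.03, 0.94, 1.0, 1.0, 1.07, 1.04]

print(np.mean(period))
print(standard_error(period))
```

Using `x.size` instead of a hard-coded 10 keeps the function correct for data sets of any length.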