# SAO/LIP Python Primer Course Exercise Set 10

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/acorreia61201/SAOPythonPrimer/blob/main/exercises/Exercises10.ipynb)

In the exercises below, I suggest that you use an external text editor to create your library files. Feel free to use the supplied cells for code development, but you should use the Colab upload feature or your local machine to do these exercises.

## Exercise 1: Writing Your Own Integration Library

We've worked a bit with integration algorithms this week. Let's practice writing a module by implementing them in your own library.

**Your task:** Write a function that uses the trapezoidal rule to calculate the integral of a function over the range $[a, b]$. We've done this a couple of times before in the exercises; you may copy your work from those exercises if you wish or write it from scratch using https://en.wikipedia.org/wiki/Trapezoidal_rule (remember that we defined $\Delta x = (b-a)/(N-1)$). Save this function to a file called `integration.py` in your current working directory. Use the cell below to test the function if necessary.

In [1]:
# YOUR CODE HERE
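
If you want something to compare your own version against, here is a minimal sketch of the trapezoidal rule using the $\Delta x = (b-a)/(N-1)$ convention above. The function name `trapezoid` and its argument order are assumptions made for illustration, and the sketch assumes that `f` accepts NumPy arrays.

```python
import numpy as np

def trapezoid(f, a, b, N):
    """Estimate the integral of f over [a, b] using N sample points."""
    x = np.linspace(a, b, N)   # N points, so N - 1 intervals
    dx = (b - a) / (N - 1)     # spacing as defined in the exercise
    y = f(x)
    # Interior points get full weight; the two endpoints get half weight.
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

# Quick check on an integral we know: the integral of x^2 over [0, 1] is 1/3.
estimate = trapezoid(lambda x: x**2, 0.0, 1.0, 1001)
```
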

**Your task:** Write a function that uses the Monte-Carlo method to calculate the integral of a function over the range $[a, b]$. You may again copy the function from a previous exercise or write it from scratch if you wish. Save this function to `integration.py`, again using the cell below if necessary.

In [None]:
# YOUR CODE HERE
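
There is more than one Monte-Carlo integration scheme, and your earlier exercise may have used a different one. The sketch below assumes the mean-value variant: draw $N$ uniform random points in $[a, b]$ and multiply the average of $f$ by the interval width. The name `monte_carlo` and the optional `seed` argument are hypothetical additions for reproducibility.

```python
import numpy as np

def monte_carlo(f, a, b, N, seed=None):
    """Mean-value Monte-Carlo estimate of the integral of f over [a, b]."""
    rng = np.random.default_rng(seed)   # seed=None gives a fresh random stream
    x = rng.uniform(a, b, N)            # N uniform samples in [a, b)
    # The integral is (interval width) * (average value of f on the interval).
    return (b - a) * np.mean(f(x))

# The integral of x^2 over [0, 1] is 1/3; the estimate should scatter around it.
mc_estimate = monte_carlo(lambda x: x**2, 0.0, 1.0, 200_000, seed=0)
```
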

Let's test the accuracy of these functions against `scipy.integrate.quad()`.

**Your task:** Use the trapezoidal rule, Monte-Carlo method, and `quad()` to evaluate the following integral:

\begin{equation}
\int_0^1 \frac{x^4(1-x)^4}{1+x^2}dx
\end{equation}

Import your functions from `integration`. Evaluate the trapezoidal and Monte-Carlo integrals for 200 values of $N$ spanning the range $[10, 10^6]$, generated with `numpy.geomspace()`. Plot your results versus $N$ on a log-log scale. Also plot the `quad` result as a dashed horizontal line. Label everything accordingly. How do the three methods compare?

In [None]:
# YOUR CODE HERE

The exact value of the above integral is $22/7 - \pi$. (We can actually use this and the fact that the integrand is greater than zero to prove $22/7 > \pi$; if you're interested, see https://en.wikipedia.org/wiki/Proof_that_22/7_exceeds_%CF%80.)
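
If you'd like to see where this value comes from, here is a sketch of the calculation. Polynomial long division of the integrand gives

\begin{equation}
\frac{x^4(1-x)^4}{1+x^2} = x^6 - 4x^5 + 5x^4 - 4x^2 + 4 - \frac{4}{1+x^2}
\end{equation}

Integrating term by term over $[0, 1]$, and using $\int_0^1 \frac{dx}{1+x^2} = \arctan(1) = \pi/4$, we get

\begin{equation}
\frac{1}{7} - \frac{2}{3} + 1 - \frac{4}{3} + 4 - \pi = \frac{22}{7} - \pi
\end{equation}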

**Your task:** Make another plot of the absolute errors of each method:

\begin{equation}
\left| E - \left(\frac{22}{7} - \pi\right) \right|
\end{equation}

Here, $E$ is a placeholder for your estimates. Plot the trapezoidal and Monte-Carlo errors on a log-log scale, along with a horizontal dashed line for the `quad` error. Which method seems to have the smallest error? How quickly does the error diminish for increasing $N$?

In [2]:
# YOUR CODE HERE
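
As a rough sketch of how the error sweep might be set up, the block below computes the absolute error of a trapezoidal estimate for each $N$. To stay self-contained it defines a local stand-in `trapezoid` rather than importing from your `integration` module, and the names `integrand`, `n_values`, and `trap_errors` are assumptions. Since `numpy.geomspace()` returns floats, the values are rounded to integers before being used as sample counts.

```python
import numpy as np

def integrand(x):
    return x**4 * (1 - x)**4 / (1 + x**2)

def trapezoid(f, a, b, N):
    # Local stand-in for the function you saved in integration.py.
    x = np.linspace(a, b, N)
    y = f(x)
    return (b - a) / (N - 1) * (np.sum(y) - 0.5 * (y[0] + y[-1]))

exact = 22 / 7 - np.pi

# 200 logarithmically spaced sample counts from 10 to 10^6, rounded to integers.
n_values = np.rint(np.geomspace(10, 1e6, 200)).astype(int)

# Absolute error of the trapezoidal estimate at each N.
trap_errors = np.array(
    [abs(trapezoid(integrand, 0.0, 1.0, n) - exact) for n in n_values]
)
```

The Monte-Carlo errors can be collected the same way. For the dashed line, note that `scipy.integrate.quad()` returns a tuple of the estimate and an error bound, so only the first element is the integral.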

## Exercise 2: Linear Regression

An important method in data analysis is *linear regression*. Given a two-dimensional data set (i.e. a data set of ordered pairs $(x, y)$), we can use linear regression to generate a *line of best fit* that approximates the linear relationship between the variables.

To generate the line, we need to determine the slope $m$ and y-intercept $b$ of best fit. For a data set with $N$ ordered pairs, we can use the following to calculate these:

\begin{equation}
m = \frac{N\sum_i(x_iy_i) - \sum_i (x_i) \sum_i (y_i)}{N\sum_i (x_i^2) - (\sum_i x_i)^2} \\
b = \frac{\sum_i (x_i^2)\sum_i (y_i) - \sum_i (x_iy_i)\sum_i x_i}{N\sum_i (x_i^2) - (\sum_i x_i)^2}
\end{equation}

**Your task:** Write a function that takes in two arrays `x` and `y` and outputs the slope and intercept of best fit. (Hint: Rather than using loops to accumulate the sums, use `numpy.sum()` to compute the sums of $x$, $y$, $xy$, and $x^2$ from arrays.) Save this function to a file `regression.py`.

In [None]:
# YOUR CODE HERE
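
The formulas for $m$ and $b$ translate almost line by line into NumPy. Here is a minimal sketch, assuming a function named `linear_fit` (a hypothetical name) that returns the pair `(m, b)`:

```python
import numpy as np

def linear_fit(x, y):
    """Return the slope m and intercept b of the least-squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = x.size
    # The four sums that appear in the formulas.
    sx, sy = np.sum(x), np.sum(y)
    sxy, sxx = np.sum(x * y), np.sum(x**2)
    denom = N * sxx - sx**2   # shared denominator of m and b
    m = (N * sxy - sx * sy) / denom
    b = (sxx * sy - sxy * sx) / denom
    return m, b

# Points lying exactly on y = 2x + 1 should give back m = 2 and b = 1.
m, b = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```
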

One measure of how well this fit describes the data is the *correlation coefficient*, usually denoted by $R$. This value can range between -1 and 1, with -1 representing an exact negative linear relationship, 1 representing an exact positive linear relationship, and 0 representing no linear correlation. To simplify this, analysts usually list $R^2$, which ranges from 0 to 1, with 1 representing an exact linear correlation and 0 representing no correlation.

We can calculate $R$ using the following:

\begin{equation}
R = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_i (x_i - \mu_x)^2\sum_i(y_i - \mu_y)^2}}
\end{equation}

$\mu_x$ and $\mu_y$ are arithmetic means of the $x$ and $y$ data, respectively.

**Your task:** Write a function that takes in two arrays `x` and `y` and outputs the correlation coefficient $R$ between them. Save this function to `regression.py`.

In [None]:
# YOUR CODE HERE
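
The formula for $R$ above can be sketched as follows. The function name `correlation` is an assumption, and `dx` and `dy` are just shorthand for the deviations from the means $\mu_x$ and $\mu_y$.

```python
import numpy as np

def correlation(x, y):
    """Return the correlation coefficient R of the arrays x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - np.mean(x)   # deviations from mu_x
    dy = y - np.mean(y)   # deviations from mu_y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Exactly linear data gives R = 1 (rising line) or R = -1 (falling line).
r_pos = correlation([0, 1, 2, 3], [1, 3, 5, 7])
r_neg = correlation([0, 1, 2, 3], [7, 5, 3, 1])
```
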

Let's try these out with a couple examples. First, let's do a test of *Ohm's law* from electrodynamics. Ohm's law states that in a simple circuit with resistance $R$ (not the correlation coefficient) and current $I$, the voltage will be:

\begin{equation}
V = IR
\end{equation}

This is a linear relationship, so a linear regression is a good way to verify if the law holds. We can do this by connecting a power supply with variable voltage to a circuit containing a resistor with a known resistance value and measuring the current at discrete voltage values.

Use the cell below to download a series of $N=26$ voltage and current measurements across a circuit with a $10\ \mathrm{k}\Omega$ resistor. The voltage measurements are in volts, while the current measurements are in milliamps.

In [None]:
# get data from lab 4 225

**Your task:** Import your functions from `regression` and generate a linear fit to the data. Plot the original data points as well as the line of best fit $y = mx + b$. Label your line, axes, and points accordingly.

Also, print out $R^2$ for the data. Is this value very close to 1, very close to 0, or somewhere in between? What does this say about the fit?

In [None]:
# YOUR CODE HERE

The fit seemed to work, just as we expected. However, we should check whether the slope and intercept we got match the actual system. We can do this by calculating the uncertainties (standard errors) in $m$ and $b$ (denoted by $\sigma_m$ and $\sigma_b$, respectively) using the following:

\begin{equation}
\sigma_y = \sqrt{\frac{\sum_i(y_i - mx_i - b)^2}{N-2}} \\
\sigma_m = \sigma_y \sqrt{\frac{N}{N\sum_i(x_i^2) - (\sum_ix_i)^2}} \\
\sigma_b = \sigma_y \sqrt{\frac{\sum_i(x_i^2)}{N\sum_i(x_i^2) - (\sum_ix_i)^2}}
\end{equation}

**Your task:** Write a function that takes in `x`, `y`, `m`, and `b` as inputs and outputs $\sigma_m$ and $\sigma_b$. (Hint: Notice that the latter two equations have the same denominator. It may be useful to calculate this as its own variable within the function.) Add this to your `regression` library.

In [None]:
# YOUR CODE HERE
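
Here is one way the three equations for $\sigma_y$, $\sigma_m$, and $\sigma_b$ could be coded, with the shared denominator computed once as the hint suggests. The name `fit_uncertainties` is hypothetical. The sketch needs $N > 2$ data points, because $\sigma_y$ divides by $N - 2$.

```python
import numpy as np

def fit_uncertainties(x, y, m, b):
    """Return (sigma_m, sigma_b) for the fitted line y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = x.size
    sxx = np.sum(x**2)
    denom = N * sxx - np.sum(x)**2   # shared by sigma_m and sigma_b
    # Scatter of the data about the fitted line.
    sigma_y = np.sqrt(np.sum((y - m * x - b)**2) / (N - 2))
    sigma_m = sigma_y * np.sqrt(N / denom)
    sigma_b = sigma_y * np.sqrt(sxx / denom)
    return sigma_m, sigma_b

# Small hand-checkable case: the best-fit line through (0, 0), (1, 1), (2, 0)
# is y = 1/3, which gives sigma_m = sqrt(1/3) and sigma_b = sqrt(5/9).
sigma_m, sigma_b = fit_uncertainties([0, 1, 2], [0, 1, 0], m=0.0, b=1/3)
```
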

As stated above, the resistor used in the experiment had a nominal value of $10\ \mathrm{k}\Omega$. Additionally, if there's no voltage in the circuit, then there should be no current, so we should have $b = 0$.

**Your task:** Import your new function from `regression` and calculate the uncertainties in your $m$ and $b$ values from before. Add and subtract these from the $m$ and $b$ values to get a lower and upper bound for their accepted ranges (i.e. $(m - \sigma_m, m + \sigma_m)$, and similarly for $b$). Do the nominal values lie in these ranges? Are they a little bit off?

In [None]:
# YOUR CODE HERE

## Exercise 3: Archimedes' Method

One method of approximating $\pi$ is *Archimedes' method*, which approximates the circumference of a circle by the perimeter of an inscribed polygon. A circle with unit diameter ($r = 0.5$) has circumference $2\pi r = \pi$. As the number of sides of the inscribed polygon increases, the polygon approximates the circle more and more closely, and its perimeter approaches the circle's circumference. We'll write some code to estimate the value of $\pi$ using this method.

To start, we need a way to generate the polygon itself. To do so, we can use the following formulas to generate a set of ordered pairs equally spaced along the circumference of the circle:

\begin{equation}
x_i = 0.5\cos(2\pi i/N) \\
y_i = 0.5\sin(2\pi i/N)
\end{equation}

(Notice that we're "cheating" by using $\pi$ to generate these; the original method of doing this would've involved using pencil and paper, which is far less efficient.)

**Your task:** Write a function that generates $N$ equally spaced points on the circumference of a circle of radius 0.5 centered at the origin. Test your code with a square; its vertices should be $(.5, 0)$, $(0, .5)$, $(-.5, 0)$, $(0, -.5)$. Save this to a file named `archimedes.py`.

In [None]:
# YOUR CODE HERE
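
The formulas for $x_i$ and $y_i$ can be evaluated for all $i$ at once with `numpy.arange()`. A minimal sketch, assuming a hypothetical function `polygon_points` with an optional `radius` argument defaulting to the 0.5 used above:

```python
import numpy as np

def polygon_points(N, radius=0.5):
    """Return arrays (x, y) of N equally spaced points on a circle."""
    i = np.arange(N)             # i = 0, 1, ..., N - 1
    angle = 2 * np.pi * i / N
    return radius * np.cos(angle), radius * np.sin(angle)

# The square test case from the task.
x, y = polygon_points(4)
```

Floating-point round-off means the "zero" coordinates come out as tiny numbers of order $10^{-17}$ rather than exactly 0, so compare with `numpy.allclose()` rather than `==`.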

**Your task:** Let's check that this function works as advertised. Import your function from `archimedes` and generate three sets of 4, 8, and 16 points. Plot these points as blue circles along with a solid black unfilled circle of radius 0.5 centered at the origin (Hint: see some examples using `plt.Circle` and `Axes.add_artist` at https://www.geeksforgeeks.org/how-to-draw-a-circle-using-matplotlib-in-python/#).

In [None]:
# YOUR CODE HERE

We now have to calculate the perimeter of the inscribed polygon. We'll do it one side at a time. We know that the length of a straight-line path starting at $(x_0, y_0)$ and ending at $(x_1, y_1)$ is:

\begin{equation}
\sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2}
\end{equation}

We can repeat this same procedure for each pair of points sequentially until we reach $(x_{N-1}, y_{N-1})$, whose straight line path will end at $(x_0, y_0)$. By summing up each of these straight line paths, we will get the perimeter of the shape.

**Your task:** Write a function that takes in a series of ordered pairs $(x_i, y_i)$ and outputs the perimeter of the polygon traced out by straight-line paths connecting those points. Check that your function works by using a square with $N = 4$ points; its perimeter should be:

\begin{equation}
4\sqrt{0.5^2 + 0.5^2} \approx 2.8284271247461903
\end{equation}

Save this to `archimedes.py`.

In [3]:
# YOUR CODE HERE
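
One possible way to sum the side lengths without an explicit loop is to pair each vertex with the next one using `numpy.roll()`, which also wraps the last vertex back around to the first. This is a sketch under that assumption, and the function name `perimeter` is hypothetical.

```python
import numpy as np

def perimeter(x, y):
    """Perimeter of the closed polygon with vertices (x[i], y[i]), in order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.roll(x, -1) shifts every entry one place left, so element i holds
    # x[i + 1] and the last element holds x[0].
    dx = np.roll(x, -1) - x
    dy = np.roll(y, -1) - y
    return np.sum(np.sqrt(dx**2 + dy**2))

# The square test case from the task.
square = perimeter([0.5, 0.0, -0.5, 0.0], [0.0, 0.5, 0.0, -0.5])
```
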

**Your task:** Finally, we'll use these two functions to approximate $\pi$. Use `archimedes` to calculate the perimeters of 200 $N$-sided inscribed polygons, with $N$ spanning the range $[4, 4^8]$. Plot these perimeters versus $N$ on a log-log plot along with a dashed horizontal line at $y=\pi$. How does the estimate compare? How quickly does it converge to $\pi$?

In [None]:
# YOUR CODE HERE
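
A rough sketch of the sweep over $N$ is shown below. To keep it self-contained it uses a single hypothetical helper, `polygon_perimeter`, in place of the two functions in your `archimedes` module. Rounding the `numpy.geomspace()` output to integers produces repeated values at small $N$, so the sketch drops duplicates with `numpy.unique()` and ends up with fewer than 200 distinct polygons; whether to keep the repeats is up to you.

```python
import numpy as np

def polygon_perimeter(n_sides, radius=0.5):
    """Perimeter of a regular n-sided polygon with vertices on a circle."""
    i = np.arange(n_sides)
    x = radius * np.cos(2 * np.pi * i / n_sides)
    y = radius * np.sin(2 * np.pi * i / n_sides)
    # Pair each vertex with the next one; the last wraps around to the first.
    dx = np.roll(x, -1) - x
    dy = np.roll(y, -1) - y
    return np.sum(np.hypot(dx, dy))

# Logarithmically spaced side counts from 4 to 4^8, rounded to integers.
n_values = np.unique(np.rint(np.geomspace(4, 4**8, 200)).astype(int))

perimeters = np.array([polygon_perimeter(n) for n in n_values])
errors = np.abs(perimeters - np.pi)   # distance of each estimate from pi
```

Plotting `errors` against `n_values` on log-log axes is one way to read off how quickly the estimate converges.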