# Measuring The Near Zone Size

In this Python notebook, you will walk through how to measure the near zone size of a quasar.

We will be working with real data! The quasar you will analyze is called J0008-0626 (quasars are named after their coordinates on the sky). This is a fairly distant quasar at a redshift of 5.929 with a rather average luminosity.

We will start our analysis by reading in our data. You are given data in the form of a .txt file. The data is comma-delimited, meaning that each value in a row is separated by a comma rather than a space or tab. Feel free to read in the data however you like, but a good function to use is [numpy.genfromtxt()](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html) with its `delimiter` argument set to a comma.

Make sure you assign the values of each column to different variables!

In [None]:
import numpy
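If you are unsure how `numpy.genfromtxt()` handles comma-delimited files, here is a minimal sketch. It reads from an in-memory stand-in rather than the real file, and the column order (wavelength, data flux, data error, model flux, model error) and the variable names are assumptions; check the actual file before copying them.

```python
import io
import numpy as np

# Stand-in for the real .txt file. The column order is an assumption.
fake_file = io.StringIO(
    "8400.0,0.10,0.02,1.00,0.05\n"
    "8410.0,0.50,0.02,1.10,0.05\n"
    "8420.0,1.20,0.03,1.20,0.05\n"
)

# delimiter="," says the values are comma-separated;
# unpack=True returns one array per column instead of one per row.
wavelength, flux, flux_err, model_flux, model_err = np.genfromtxt(
    fake_file, delimiter=",", unpack=True
)

print(wavelength)
print(flux)
```

For the real data, replace `fake_file` with the path to your .txt file.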

You should now have 5 arrays of data: a wavelength array, a flux array each for the data and the model, and an error array for each of them as well. The first step is always to normalize the spectrum of the data and model. A typical normalization would involve finding the largest flux value in the whole dataset and dividing every data point by that value. Be sure to divide all of the model flux values by the **same** value.
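As a rough illustration of this normalization, the sketch below uses small made-up arrays (the names and values are placeholders, not the real spectrum). It also divides the error arrays by the same constant, since an uncertainty scales along with the quantity it describes.

```python
import numpy as np

# Made-up example values; substitute the arrays you read from the file.
flux = np.array([0.2, 1.0, 4.0, 2.0])
flux_err = np.array([0.1, 0.1, 0.2, 0.1])
model_flux = np.array([0.5, 1.5, 3.0, 2.5])
model_err = np.array([0.05, 0.05, 0.05, 0.05])

# Largest data flux value, ignoring any NaN entries.
norm = np.nanmax(flux)

# Divide everything by the SAME constant so relative scales are preserved.
flux_n = flux / norm
flux_err_n = flux_err / norm
model_flux_n = model_flux / norm
model_err_n = model_err / norm

print(flux_n.max())
```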

Now, we smooth the data. In practice this means reducing the resolution of our data by approximating a range of points as one data point whose value is the average of those that comprise it. This may sound difficult, but the SciPy library makes it surprisingly simple. Refer to the documentation for [scipy.stats.binned_statistic()](https://scipy.github.io/devdocs/reference/generated/scipy.stats.binned_statistic.html).

In [None]:
import scipy.stats
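The sketch below shows one way `scipy.stats.binned_statistic()` can average points into wavelength bins. The synthetic spectrum and the choice of 20 bins are assumptions for illustration only; pick a bin count that suits the real data.

```python
import numpy as np
import scipy.stats

# Synthetic spectrum: 100 points with a step in flux halfway through.
wavelength = np.linspace(8400.0, 8500.0, 100)
flux = np.ones_like(wavelength)
flux[50:] = 3.0

n_bins = 20  # hypothetical choice

# Average the flux of all points whose wavelength falls in each bin.
binned_flux, bin_edges, _ = scipy.stats.binned_statistic(
    wavelength, flux, statistic="mean", bins=n_bins
)

# Use the middle of each bin as the wavelength of the smoothed point.
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

print(bin_centers.shape, binned_flux.shape)
```

The same call can be repeated on the model flux with identical `bins` so that data and model stay on a common wavelength grid.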

Before we can calculate the TFR, we also have to fit the model to the data. The model will likely be on a different flux scale than the data, so the two must be brought onto the same scale before their ratio means anything.

We will fit the data by simply assuming that the model spectrum is equal to our data spectrum multiplied by some constant, or:

$$ Model = k \times Data$$

We can use the Scipy library again to find the best value for k. In particular: [scipy.optimize.curve_fit()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html).
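Here is a minimal sketch of how `curve_fit` can recover a single scaling constant, following the relation $Model = k \times Data$ above. The synthetic arrays, the function name `scaled`, and the true value of 2.5 are made up for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaled(data_flux, k):
    """Model = k * Data, with k the only free parameter."""
    return k * data_flux

# Synthetic data: the "model" is 2.5 times the data plus a little noise.
rng = np.random.default_rng(0)
data_flux = np.linspace(0.5, 1.0, 50)
model_flux = 2.5 * data_flux + rng.normal(0.0, 0.01, data_flux.size)

# p0 is the starting guess for k.
popt, pcov = curve_fit(scaled, data_flux, model_flux, p0=[1.0])
k_best = popt[0]
k_err = np.sqrt(pcov[0, 0])  # 1-sigma uncertainty on k

print(k_best, k_err)
```

With $k$ in hand, dividing the model flux by `k_best` puts the model on the same scale as the data.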

When making our fit, we only want to use the part of the spectrum that is redward of the $Ly$-$\alpha$ peak. So you will have to define truncated data and model arrays that only include wavelengths redward of $\lambda_{rest} = 1216$ Å.
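Truncating the arrays can be done with a boolean mask, as in the sketch below. It assumes the wavelength array is in the observed frame, so the rest wavelength is first shifted by $(1 + z)$; if your wavelengths are already in the rest frame, compare against 1216 Å directly. The array values are placeholders.

```python
import numpy as np

z_qso = 5.929        # quasar redshift given above
lya_rest = 1216.0    # Angstrom

# Observed wavelength of the Ly-alpha peak (assumes observed-frame data).
lya_obs = lya_rest * (1.0 + z_qso)

# Placeholder arrays; use your smoothed spectrum here.
wavelength = np.array([8300.0, 8400.0, 8450.0, 8500.0, 8600.0])
flux = np.array([0.05, 0.40, 0.90, 0.80, 0.70])
model_flux = np.array([2.0, 2.2, 2.3, 2.0, 1.8])

# True only for points redward (longer wavelength) of the peak.
red = wavelength > lya_obs

wavelength_red = wavelength[red]
flux_red = flux[red]
model_flux_red = model_flux[red]

print(lya_obs, wavelength_red)
```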

Before writing your code, spend a few minutes and write down below why we only want to fit the model to this particular part of the spectrum.

Now we calculate the transmitted flux ratio (TFR). For each wavelength in your array, divide the flux of your data by the flux of the fitted model at that wavelength. Print your results below.
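The division is element-wise, so NumPy can do it in one call. The sketch below uses placeholder values and, as a precaution that the text above does not require, leaves NaN wherever the model flux is zero instead of dividing by zero.

```python
import numpy as np

# Placeholder values; the model is assumed to be already scaled to the data.
flux = np.array([0.02, 0.30, 0.90, 1.10])
model_flux = np.array([1.0, 1.0, 1.0, 0.0])

# Start from NaN, then fill in the ratio only where the model is non-zero.
tfr = np.full_like(flux, np.nan)
np.divide(flux, model_flux, out=tfr, where=model_flux != 0)

print(tfr)
```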

As you know, the near zone begins at the $Ly$-$\alpha$ emission peak of the quasar. You can find where this is in the data by redshifting its rest wavelength of 1216 Å. Starting at this point and moving blueward (towards lower wavelengths), find the first wavelength where the TFR goes below 0.1.
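One possible way to do this search is sketched below. It assumes the wavelength array is sorted in ascending order and is in the observed frame; the TFR values are invented for the example.

```python
import numpy as np

z_qso = 5.929
lya_obs = 1216.0 * (1.0 + z_qso)  # observed Ly-alpha peak, Angstrom

# Invented example values, sorted by increasing wavelength.
wavelength = np.array([8300.0, 8340.0, 8380.0, 8400.0, 8420.0, 8440.0])
tfr = np.array([0.05, 0.02, 0.08, 0.40, 0.80, 1.00])

# Keep points blueward of the peak, then reverse so index 0 is closest to it.
blue = wavelength <= lya_obs
wl_blue = wavelength[blue][::-1]
tfr_blue = tfr[blue][::-1]

# Indices where the TFR is below the threshold; the first one ends the near zone.
below = np.flatnonzero(tfr_blue < 0.1)
lambda_nz = wl_blue[below[0]] if below.size else None

print(lambda_nz)
```

If the TFR never drops below 0.1, `lambda_nz` is `None`, which is worth checking for before going further.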

Using the equation we derived earlier, calculate the redshift of the hydrogen atom at the end of the near zone, and use this to find the near zone size. Pay attention to units!
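The equation derived earlier is not reproduced in this notebook, so the sketch below is not that equation. It shows one standard alternative: get the redshift from $z = \lambda_{obs} / 1216 - 1$, then integrate the proper line-of-sight distance element $c \, dz / [(1 + z) H(z)]$ between the two redshifts. The flat cosmology ($H_0 = 70$ km/s/Mpc, $\Omega_m = 0.3$), the function names, and the value of `lambda_nz` are all assumptions.

```python
import numpy as np
from scipy.integrate import quad

C_KM_S = 299792.458      # speed of light, km/s
H0 = 70.0                # assumed Hubble constant, km/s/Mpc
OMEGA_M = 0.3            # assumed matter density
OMEGA_L = 1.0 - OMEGA_M  # flat universe assumed

def hubble(z):
    """Hubble parameter H(z) in km/s/Mpc for flat Lambda-CDM."""
    return H0 * np.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)

def proper_distance_mpc(z_low, z_high):
    """Proper line-of-sight distance between two redshifts, in Mpc."""
    value, _ = quad(lambda z: C_KM_S / ((1.0 + z) * hubble(z)), z_low, z_high)
    return value

z_qso = 5.929
lambda_nz = 8380.0  # hypothetical end of the near zone, Angstrom

# Redshift of the hydrogen at the edge of the near zone.
z_nz = lambda_nz / 1216.0 - 1.0

r_nz = proper_distance_mpc(z_nz, z_qso)
print(z_nz, r_nz, "proper Mpc")
```

Compare the number you get from the derived equation against a calculation like this one as a sanity check on units.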

Extension: Make two plots of the TFR with the near zone marked, one where the x-axis is wavelength and one where the x-axis is distance.