# The luminosity-temperature relationship in galaxy clusters

## Some theory

The [virial theorem](https://en.wikipedia.org/wiki/Virial_theorem) predicts that plasma in a dynamical equilibrium within a relaxed dark matter halo should have predictable bulk properties, such as temperature and luminosity, that depend primarily on the virial mass of the galaxy cluster (and to a lesser extent the redshift).  This is particularly useful for galaxy clusters, where the bulk plasma properties of the intracluster medium are readily observable with X-ray telescopes.  For more information on this, see [Voit (2005)](http://adsabs.harvard.edu/abs/2005RvMP...77..207V).  In particular, the virial theorem tells us that:

$k_b T_{200} = (8.2 \mathrm{keV})\big(\frac{M_{200}}{10^{15}h^{-1}M_\odot}\big)^{2/3}\big[\frac{H(z)}{H_0}\big]^{2/3}$

where $M_{200} \simeq M_{vir}$, $H_0$ is the Hubble parameter at $z=0$, $H(z)$ is the Hubble parameter at redshift $z$, and $h$ is $H_0$ in units of 100 km s$^{-1}$ Mpc$^{-1}$ (Voit 2005, equation 59).  At $z \sim 0$, the scaling relationship is then:

$T_{200} \propto M_{vir}^{2/3}$

Similarly, the bolometric luminosity is expected to scale with mass as follows:

$L_X = 10^{45} h_{70}^{-2} \big( \frac{M_{200}}{10^{15}h_{70}^{-1}M_\odot} \big)^{1.8} \mathrm{erg~s^{-1}}$

where $h_{70}$ is $H_0$ in units of 70 km s$^{-1}$ Mpc$^{-1}$ (Voit 2005, equation 60.  Yes, it's inconsistent with the previous equation; take it up with Mark Voit).  This tells us that $L_X \propto M_{vir}^{1.8}$.
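To get a feel for the numbers, here is a minimal sketch that evaluates the two scaling relations above.  The function names are made up for this illustration, and note that the two relations use different mass units ($h^{-1}M_\odot$ versus $h_{70}^{-1}M_\odot$), exactly as in the equations.

```python
def virial_temperature_keV(m200, e_z=1.0):
    """k_b T_200 in keV for m200 in units of h^-1 Msun (Voit 2005, eq. 59).

    e_z is H(z)/H_0; the default of 1.0 corresponds to z = 0.
    """
    return 8.2 * (m200 / 1.0e15) ** (2.0 / 3.0) * e_z ** (2.0 / 3.0)


def bolometric_luminosity(m200):
    """L_X in units of h_70^-2 erg/s for m200 in h_70^-1 Msun (Voit 2005, eq. 60)."""
    return 1.0e45 * (m200 / 1.0e15) ** 1.8


# Evaluate both relations for a group-scale and a cluster-scale halo
for mass in (1.0e14, 1.0e15):
    print(mass, virial_temperature_keV(mass), bolometric_luminosity(mass))
```

A factor of 10 in mass changes the temperature by a factor of $10^{2/3} \approx 4.6$, but the luminosity by a factor of $10^{1.8} \approx 63$.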

We can combine the expressions above together to see that one would theoretically predict that $L_X \propto T_{200}^{2.7}$.  But does this hold true?
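For reference, the exponent comes from inverting the temperature relation to get mass as a function of temperature, and then substituting that into the luminosity relation (at $z \sim 0$):

$T_{200} \propto M_{vir}^{2/3} \Rightarrow M_{vir} \propto T_{200}^{3/2}$

$L_X \propto M_{vir}^{1.8} \propto \big(T_{200}^{3/2}\big)^{1.8} = T_{200}^{2.7}$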


## Some observations

The [ACCEPT galaxy cluster sample](http://www.pa.msu.edu/astro/MC2/accept/) ([Cavagnolo et al. 2009](http://adsabs.harvard.edu/abs/2009ApJS..182...12C)) is a compilation of X-ray observations of galaxy clusters that includes estimates of the temperature and bolometric luminosity of the cluster.  This is a product of Michigan State University, from [Professor Megan Donahue's](http://www.pa.msu.edu/~donahue/) research group!  A new sample, ACCEPT-2, has also been published by the Donahue group.  One can use these observations to test various cluster scaling relationships.

## The plan for today

We are going to use two different methods of finding the $L_X - T$ scaling relationship using the ACCEPT sample, and we will compare the results that the two methods give.  The methods are:

1.  Linear regression using [SciPy](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html)
2.  A Bayesian Markov Chain Monte Carlo code, which we will adapt from a pre-class assignment.

Bayesian MCMC is arguably overkill for this project, but it will definitely be interesting to try it out and see how various assumptions affect the results!

**Note:** We are going to do this first with the ACCEPT sample and assumed constant errors; if you have enough time, we will use the ACCEPT-2 sample, which has variable errors.  Thanks to Megan Donahue, Dana Koeppe, and Rachel Salmon for sharing their data with us! 

In [None]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
# First, let's read in cluster information.
# Temperature is in units of keV, and luminosity is in units of 10^44 ergs/s.

# This code reads in the ACCEPT-1 database
Temperature, Lbol = np.loadtxt("accept_main.txt",skiprows=2,
                                             usecols=[7,8],unpack=True)


# This code reads in a sanitized version of the ACCEPT-2 database, which
# includes errors.  It will require modification of the code below to include
# variable errors in luminosity!
'''
A2_T500, A2_Terr, A2_Lbol, A2_Lerr = np.loadtxt("ACCEPT2_sanitized.dat",skiprows=5,
                                             usecols=[0,1,2,3],unpack=True)
'''

In [None]:
# Now, what exactly are we looking at?
plt.plot(Temperature,Lbol,'bo')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Temperature [keV]')
plt.ylabel(r'Luminosity [$10^{44}$ ergs/s]')
plt.title('Cluster luminosity-temperature relationship')

# Linear regression

We're first going to use the [SciPy linear regression tool](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html) to calculate the luminosity-temperature relationship.  Note that this is a *linear* method, so we're going to do a linear fit in log-log space.  In other words, instead of trying to fit the nonlinear relationship

$L_X = L_0 T^\alpha$

we will instead fit the linear relationship

$\mathrm{log_{10}(L_X) = log_{10}(L_0) + \alpha log_{10}(T)}$

which is more straightforward.  Note that some of the clusters do not have good estimates for bolometric luminosity, so we will only consider the ones that do.  Do the fit below using SciPy, and then plot a line showing your fit on top of the ACCEPT dataset to verify that you're getting a reasonable answer.  We will soon compare this to the Bayesian MCMC solution!
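As an illustration of the log-log approach, here is a minimal sketch that uses `scipy.stats.linregress` on a made-up power law with scatter (not the ACCEPT data).  The "true" slope and normalization, the amount of scatter, and all of the variable names are assumptions chosen just for this example.

```python
import numpy as np
from scipy import stats

# Synthetic (made-up) power law L = L0 * T**alpha with scatter in log space
rng = np.random.default_rng(42)
true_alpha, true_log_L0 = 2.7, -1.0
T_fake = rng.uniform(1.0, 12.0, size=200)
logT_fake = np.log10(T_fake)
logL_fake = true_log_L0 + true_alpha * logT_fake + rng.normal(0.0, 0.1, size=200)

# linregress fits y = intercept + slope * x; the result also carries the
# correlation coefficient (rvalue) and the standard error of the slope (stderr)
fit = stats.linregress(logT_fake, logL_fake)
print("alpha =", fit.slope, "+/-", fit.stderr)
print("log10(L0) =", fit.intercept)
```

The slope of the fit plays the role of $\alpha$ and the intercept plays the role of $\mathrm{log_{10}(L_0)}$.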

In [None]:
logTemp = np.log10(Temperature[Lbol > 0.0])
logLbol = np.log10(Lbol[Lbol > 0.0])

# get the values for log_{10}(L_0) and \alpha here!




# Bayesian MCMC

Now, on to the new stuff.  We are going to adapt two functions from the Frequentist MCMC code that we've used previously.  In particular:

1.  We're going to re-use the error estimator (which gets the reduced sum-of-squares error), which we will need to decide how good of a job we're doing.  Note that in order to do a linear fit we are using errors that are linear in log space - this is not the greatest assumption!
2.  We're going to adapt the function that gets an estimate of y=f(x) from a user-supplied set of parameters and x-values (in this case, we put in alpha, b, and log temperature and get log X-ray luminosity).

And then we're going to use a **new function called "prior"**, where we will encode our assumptions about the values of the power-law and normalization of the $L_X - T$ scaling relationship.
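Item 1 above assumes a single error value in log space.  If you get to the ACCEPT-2 sample, which has a separate error for each cluster, one common approach (an assumption here, not something the original code does) is first-order error propagation: $\sigma_{\log_{10} L} \approx \sigma_L / (L \ln 10)$.  A minimal sketch, with a hypothetical helper name and made-up numbers:

```python
import numpy as np


def log10_error(value, sigma):
    """First-order propagation: sigma_log10 = sigma / (value * ln 10)."""
    value = np.asarray(value, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return sigma / (value * np.log(10.0))


# Made-up luminosities and 1-sigma errors (units of 10^44 erg/s)
L_example = np.array([1.0, 5.0, 20.0])
L_err_example = np.array([0.1, 1.0, 2.0])
print(log10_error(L_example, L_err_example))
```

An array like this has the same shape as the data, so it should broadcast correctly if it is passed as the `error` argument of an error estimator that works element by element.  The approximation is only good when the fractional error is small.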


In [None]:
def est_error(ydata,yguess,error):
    '''
    Takes in the observed data, our current step's estimated y-values 
    for the model, and our guess for the errors in the data.
    
    Returns sum-of-squares error assuming a 2-parameter model
    '''
    return ((ydata-yguess)**2/(2*error**2)).sum()/(ydata.size-2)


def est_model_vals(alpha,b,logT):
    '''
    Given x-values (log10 temperature) and our MCMC code's estimates for alpha and b, 
    this returns estimated y values (log10 bolometric luminosity) that we can compare 
    to the actual data.
    '''
    return b + alpha*logT

def prior(alpha, b):
    '''
    Given values of alpha and b, return a probability based on prior knowledge that the user-supplied 
    values represent the real values.  Note that P(alpha,b) = P(alpha)*P(b), and the PDF for each
    parameter is assumed to be a Gaussian with a user-defined mean and standard deviation, which are 
    set below.
    
    NOTE: the assumed widths of these PDFs are broad, to reflect our uncertainty in the true value!
    '''
    alpha0 = 3.0
    sigma_alpha = 0.5
    b0 = -1.0
    sigma_b = 0.3
    
    p_alpha = (2.0*np.pi*sigma_alpha**2)**-0.5 * np.exp( -(alpha-alpha0)**2/(2.0*sigma_alpha**2))
        
    p_b = (2.0*np.pi*sigma_b**2)**-0.5 * np.exp( -(b-b0)**2/(2.0*sigma_b**2))
    
    return p_alpha*p_b


Now, we're going to modify our MCMC code from a previous class to include the Bayesian probabilities.  The line that computes the acceptance probability is absent from the code below, and you have to figure out what it is and add it.  ALSO, note that we have had to assume a value for the errors in the ACCEPT sample, since the first ACCEPT sample is not consistent enough to calculate the errors accurately.
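As a reminder of how a prior enters a Metropolis sampler in general, here is a minimal sketch on a deliberately different toy problem: inferring the mean of Gaussian data with a Gaussian prior.  Everything in it (the data, the prior, the step size, and the function names) is made up for illustration, and it works with log-probabilities, so it is not a drop-in answer for the cell below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (not the cluster data): infer the mean mu of Gaussian data
# with known scatter sigma, using a Gaussian prior on mu.
sigma = 1.0
data = rng.normal(2.0, sigma, size=50)
prior_mean, prior_sigma = 0.0, 5.0


def log_likelihood(mu):
    return -0.5 * np.sum((data - mu) ** 2) / sigma**2


def log_prior(mu):
    return -0.5 * (mu - prior_mean) ** 2 / prior_sigma**2


def log_posterior(mu):
    # Bayes' theorem up to a constant: posterior is proportional to
    # likelihood times prior, so the logs add
    return log_likelihood(mu) + log_prior(mu)


mu_old = 0.0
logp_old = log_posterior(mu_old)
chain = []
for step in range(20000):
    mu_new = mu_old + rng.normal(0.0, 0.3)
    logp_new = log_posterior(mu_new)
    # Metropolis rule: accept with probability min(1, P_new / P_old)
    if np.log(rng.random()) < logp_new - logp_old:
        mu_old, logp_old = mu_new, logp_new
    chain.append(mu_old)  # a rejected step repeats the current value

chain = np.array(chain[2000:])  # discard burn-in
print("posterior mean:", chain.mean(), "sample mean:", data.mean())
```

With a broad prior like this one, the posterior mean should land very close to the sample mean; narrowing the prior pulls it toward `prior_mean`.  Note that this sketch records the current position at every step, including rejected ones, which is a different bookkeeping choice from storing only accepted points.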

In [None]:
import numpy.random as npr

# this is an ASSUMPTION FOR ACCEPT-1, since we can't accurately calculate its errors!
estimated_error = 0.2

# Initial guesses for alpha and b
alpha_old = 3
b_old = -1

# Width (standard deviation) of the Gaussian steps that we take (assuming it
# to be the same for alpha and b, but it doesn't have to be)
dalpha=db=0.01

# Total number of points we're going to sample (should be at least 10^4)
Npts = 100000
N_burn = 1000

# Initial model values and 'error' for our starting position
y_old = est_model_vals(alpha_old,b_old,logTemp)
err_old = est_error(logLbol,y_old,estimated_error)

# These lists keep track of our Markov Chain and errors so we can get our PDF later.
alpha_guess = []
b_guess = []
errors = []

iter_count = 0

# loop where we calculate our probabilities, etc.
for i in range(Npts):
    
    # Guess for our new point (alpha, b)
    alpha_new = alpha_old + npr.normal(0.0,dalpha)
    b_new = b_old + npr.normal(0.0,db)

    # model values based on the new alpha and b values
    ynew = est_model_vals(alpha_new,b_new,logTemp)
    
    # sum of squares value comparing data and model given our estimate
    # of the errors in the data.
    err_new = est_error(logLbol, ynew, estimated_error)
    
    
    '''
    ADD THE PROBABILITY OF ACCEPTANCE HERE!
    
    This should use the Bayesian probabilities, and thus include the likelihood function and the prior.
    
    HINT:  This would be a good opportunity to go back to the definition of Bayes' Theorem and think about
    how you would include the prior!
    '''
    p_accept = -1.0  # YOU'RE GOING TO WANT TO CHANGE THIS.

    #print('a,b,e,p:', alpha_new, b_new, err_new, p_accept)
    
    
    # if R < p_accept (where R is a uniform random number in [0,1)), we keep this point.  Otherwise, keep on moving!
    if npr.random() < p_accept:

        alpha_old = alpha_new
        b_old = b_new
        err_old = err_new
        
        if iter_count > N_burn:
            alpha_guess.append(alpha_old)
            b_guess.append(b_old)
            errors.append(err_old)

    iter_count += 1

print("acceptance ratio:",len(alpha_guess)/(Npts-N_burn))

Now we're going to use the same code as before to show the 2D histogram of alpha and b values, as well as the cell where the MCMC walker most frequently finds itself (i.e., our nominal best guess for alpha and b).  I will supply the plot of the histogram; you need to use the resulting best-guess values to make a plot of this fit on top of (1) the ACCEPT data and (2) the fit found from the least-squares regression.
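One detail worth knowing: `plt.hist2d` returns bin *edges* (just like `np.histogram2d`), so indexing the edge arrays directly gives the lower edge of a bin rather than its center.  Here is a minimal sketch of finding the center of the most-populated bin, using made-up Gaussian samples in place of a real chain; the variable names are chosen just for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up samples standing in for a chain of (alpha, b) values
x_samples = rng.normal(3.0, 0.05, size=50000)
y_samples = rng.normal(-1.0, 0.03, size=50000)

counts, xedges, yedges = np.histogram2d(x_samples, y_samples, bins=60)

# counts[i, j] covers xedges[i]..xedges[i+1] and yedges[j]..yedges[j+1]
i, j = np.unravel_index(np.argmax(counts), counts.shape)
x_center = 0.5 * (xedges[i] + xedges[i + 1])
y_center = 0.5 * (yedges[j] + yedges[j + 1])
print("most-visited bin center:", x_center, y_center)
```

With 60 bins per axis the difference between an edge and a center is only half a bin width, so this is a small correction, but it is systematic.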

In [None]:
from matplotlib.colors import LogNorm

cts,xbin,ybin,img = plt.hist2d(alpha_guess, b_guess, bins=60,norm=LogNorm())

# use np.argwhere() to find the bin(s) with the max counts
vals=np.argwhere(cts==cts.max())

# use those to guess our best values of alpha, b.  If more than one bin has the max
# number of counts, we're just using the first one we come to.  (This is not the best idea, but
# it makes the point.)  Note that xbin and ybin hold bin edges, so these are the lower edges
# of that bin.
alpha_best = xbin[vals[0,0]]
b_best = ybin[vals[0,1]]
plt.plot(alpha_best,b_best,'wD',markersize=10)

plt.colorbar()
plt.xlabel('alpha vals')
plt.ylabel('b vals')
plt.title('2D Histogram of walker positions')

print("alpha, b best:",alpha_best,b_best)

In [None]:
# put your plot(s) here!



# Some questions:
   
1.  Modify the function ```prior()``` so that the prior (A) strongly excludes the values returned by the linear regression, and, separately, (B) is even more general, with much larger standard deviations.  (Here, "strongly exclude" means "ensure that the linear regression values are many standard deviations away from the values favored by the prior".)  What happens to both the best-fit values and the 2D histogram when you do these two things?  How do the best-fit values differ from the values favored by your prior?
    
2.  What happens to the best-fit values and the 2D histogram when you modify the ```estimated_error``` variable to make the assumed errors significantly larger or smaller?

3.  If you use fewer data points (i.e., randomly remove a very large fraction of the clusters from the sample) what happens to your best-fit values and 2D histogram?
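For question 3, one possible way to randomly thin out a sample is to draw a set of indices without replacement and apply it to both arrays.  The helper name and the stand-in arrays in this sketch are hypothetical; in the notebook you would pass in the log temperature and log luminosity arrays instead.

```python
import numpy as np

rng = np.random.default_rng(2)


def random_subsample(x, y, fraction, rng):
    """Return a random subset of paired arrays, keeping `fraction` of the points."""
    n_keep = max(2, int(round(fraction * x.size)))
    idx = rng.choice(x.size, size=n_keep, replace=False)
    return x[idx], y[idx]


# Made-up stand-ins for the log temperature and log luminosity arrays
x_all = np.linspace(0.0, 1.0, 200)
y_all = -1.0 + 3.0 * x_all
x_sub, y_sub = random_subsample(x_all, y_all, 0.1, rng)
print(x_sub.size, "of", x_all.size, "points kept")
```

Using the same index array for both inputs keeps each temperature paired with its own luminosity.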

**Put your answers here!**

