<a href="https://colab.research.google.com/github/alxogm/tutorials/blob/lyaforest/Lya_CF_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Correlation Function of the Lyman-$\alpha$ forest in the DESI EDR

October 2023

Alma González (U. of Guanajuato)

This notebook has been tested in Colaboratory in October 2023.

### Table of Contents
* [Overview](#overview)
* [Installs, Imports and Downloads](#imports)
* [Accessing the Data](#data)
* [Auto-correlation](#autocorrelation)

<a class="anchor" id="overview"></a>
## Overview

This notebook demonstrates how to use the delta (flux fluctuation) files provided as the Lyman-$\alpha$ value-added catalog (VAC) of the DESI Early Data Release (EDR). We will compute the auto-correlation function of the Lyman-$\alpha$ forest and then compare our results with those reported by the DESI collaboration in [Gordon et al. 2023](https://arxiv.org/abs/2308.10950).

## Bug Reporting

If you identify any errors, please contact me (gonzalez.alma@ugto.mx), as this specific tutorial is not yet in the main desihub repository.

<a class="anchor" id="imports"></a>
## Installs, Imports and Downloads



In [None]:
!pip install picca

In [None]:
from   google.colab import drive
import os
import sys
import subprocess
import bs4
import requests
import urllib.request
import numpy as np
import matplotlib.pyplot as plt
import picca
from picca.wedgize import wedge
import fitsio
from astropy.table import Table

In [None]:
#Mount the Drive and define some useful paths
drivepath='/content/drive/'
drive.mount(drivepath, force_remount=True)
desiedr_path = drivepath + '/MyDrive/Bucaramanga/desi_edr/'
desicode_path = desiedr_path+'/desicode'
specprod = 'fuji'    # Internal name for the EDR
specprod_dir = desiedr_path+specprod
lya_dir = specprod_dir+'/lya'

In [None]:
sys.path.insert(1,desicode_path+"/desitarget/py/")
sys.path.insert(1,desicode_path+"/desiutil/py/")
sys.path.insert(1,desicode_path+"/desispec/py/")
sys.path.insert(1,desicode_path+"/desimodel/py/")
sys.path.insert(1,desicode_path+'/speclite/')
import desispec.io

In [None]:
#Create some necessary directories
if not os.path.exists(lya_dir):
  os.makedirs(lya_dir)

if not os.path.exists(lya_dir+'/Delta'):
  os.makedirs(lya_dir+'/Delta')

if not os.path.exists(lya_dir+'/Log'):
  os.makedirs(lya_dir+'/Log')

if not os.path.exists(lya_dir+'/Correlations'):
  os.makedirs(lya_dir+'/Correlations')
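
The four checks above can also be written as a single loop with `os.makedirs(..., exist_ok=True)`, which creates any missing parent directories and does nothing when the directory already exists. The following is only a minimal sketch: it uses a temporary directory as a stand-in for `lya_dir` so that it can be run anywhere.

```python
import os
import tempfile

# Stand-in for lya_dir; in the notebook this would be the Drive path defined above.
base_dir = os.path.join(tempfile.mkdtemp(), "fuji", "lya")

created = []
for sub in ("Delta", "Log", "Correlations"):
    path = os.path.join(base_dir, sub)
    # exist_ok=True makes the call safe to repeat; parents are created as needed.
    os.makedirs(path, exist_ok=True)
    created.append(path)

print(sorted(os.listdir(base_dir)))
```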

<a class="anchor" id="data"></a>
## Accessing the data
In this case the data is the Lyman-$\alpha$ catalog, which we usually refer to as the "Deltas". It is a value-added catalog (VAC) of the DESI EDR; the full documentation can be found [here](https://data.desi.lbl.gov/doc/releases/edr/vac/lymanalpha/), and the relevant reference is [Ramirez-Perez et al. 2023](https://arxiv.org/abs/2306.06312). For a very basic but practical introduction to how these deltas are computed, see this [desihigh notebook](https://github.com/michaelJwilson/desihigh/blob/main/Lymanalphaforest_explorers.ipynb).

In [None]:
#Download the Delta Files
#you only need to do this the first time, so you can comment the following lines later if you prefer

url = "https://data.desi.lbl.gov/public/edr/vac/edr/lya/fuji/v0.3/Delta/"
r = requests.get(url)
data = bs4.BeautifulSoup(r.text, "html.parser")
for l in data.find_all("a")[1:]:
    local_delta=lya_dir+'/Delta/'+l["href"]
    #Skip files already on disk, so re-running the cell does not fetch them again
    if not os.path.exists(local_delta):
      tmp = urllib.request.urlretrieve(url + l["href"],local_delta)
      print ("Downloaded file "+local_delta)
print("All Delta files are on disk")
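
The download cell above takes every `<a>` link in the server's directory listing except the first one. If the listing contains other links (sorting controls, for instance), a filter on the file name is more selective. The sketch below uses only the standard library; the helper `delta_links` and the sample HTML are made up for illustration, and the real listing may look different.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def delta_links(html_text):
    """Return only the links that look like delta files (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html_text)
    return [h for h in parser.hrefs
            if h.startswith("delta-") and h.endswith(".fits.gz")]


# Made-up directory listing, only to exercise the filter.
sample = """
<html><body>
<a href="../">Parent directory</a>
<a href="delta-1.fits.gz">delta-1.fits.gz</a>
<a href="delta-2.fits.gz">delta-2.fits.gz</a>
<a href="?C=N;O=D">Name</a>
</body></html>
"""
links = delta_links(sample)
print(links)
```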

Let's explore the content of one of the delta files and the attributes file.

In [None]:
delta_1=fitsio.FITS("/content/drive/MyDrive/Bucaramanga/desi_edr/fuji/lya/Delta/delta-1.fits.gz")
print(delta_1)

In [None]:
metadata=Table(delta_1["METADATA"][:])
tids=metadata["TARGETID"]
print(tids)

In [None]:
wavelength=delta_1["LAMBDA"][:]
deltas=delta_1["DELTA"][:,:]

Before proceeding to look at the Deltas, let's look at the full spectra. To do this, we first need to find the files where they are stored, which means locating the corresponding targets in the redshift catalog.

In [None]:
zcat=Table.read(specprod_dir+"/zcatalog/zall-pix-fuji.fits")

In [None]:
#Keep only the redshift-catalog entries whose TARGETID appears in the delta file read above
w=np.in1d(zcat["TARGETID"],tids)
zcat=zcat[w]

In [None]:
#Let's select target IDs that are in the same file; we do this by counting how many target IDs fall in each healpix pixel
hpx,indx,counts=np.unique(zcat["HEALPIX"],return_counts=True,return_index=True)
max_indx=np.argmax(counts)
hpx[max_indx],counts[max_indx],indx[max_indx]

In [None]:
zcat=zcat[zcat["HEALPIX"]==hpx[max_indx]]
zcat

In [None]:
#We use a function similar to the one in the tutorial
def get_spec_data_url(hpx,survey,program,redrock=False):
    specprod_dir = f"https://data.desi.lbl.gov/public/edr/spectro/redux/{specprod}"
    target_dir   = f"/healpix/{survey}/{program}/{hpx.astype(str)[:-2]}/{hpx}/"
    coadd_fname  = f"coadd-{survey}-{program}-{hpx}.fits"

    #Download the spectra file to the Drive directory, maintaining the same directory structure
    if not os.path.exists(desiedr_path+'/fuji'+target_dir):
      os.makedirs(desiedr_path+'/fuji'+target_dir)

    coadd_url = specprod_dir+target_dir+coadd_fname
    coadd_file=desiedr_path+'/fuji'+target_dir+coadd_fname

    if not os.path.exists(coadd_file):
        print("downloading coadd file from %s to %s"
              % (coadd_url, coadd_file))
        tmp = urllib.request.urlretrieve(coadd_url, coadd_file)
    else:
        print('%s present on disk. '%(coadd_file))

    if redrock:
      redrock_fname  = f"redrock-{survey}-{program}-{hpx}.fits"
      redrock_url = specprod_dir+target_dir+redrock_fname
      redrock_file=desiedr_path+'/fuji'+target_dir+redrock_fname

      if not os.path.exists(redrock_file):
          print("downloading redrock file from %s to %s"
              % (redrock_url, redrock_file))
          tmp = urllib.request.urlretrieve(redrock_url, redrock_file)
      else:
          print('%s present on disk. '%(redrock_file))

    coadd_obj  = desispec.io.read_spectra(coadd_file)
    return coadd_obj
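
The function above relies on the healpix-based directory layout of the spectroscopic production: files are grouped in a directory named after the healpix number with its last two digits dropped. The sketch below builds the same relative path with integer division, which also gives a well-defined group (`0`) for healpix numbers below 100. The helper name `healpix_relpath` and the healpix number used in the example are illustrative only.

```python
def healpix_relpath(survey, program, hpx, prefix="coadd"):
    """Relative path of a healpix-based file inside a spectroscopic production."""
    hpx = int(hpx)
    group = hpx // 100  # healpix number without its last two digits
    return (f"healpix/{survey}/{program}/{group}/{hpx}/"
            f"{prefix}-{survey}-{program}-{hpx}.fits")


# Arbitrary healpix number, used only to show the resulting path.
print(healpix_relpath("sv1", "dark", 5060))
print(healpix_relpath("sv1", "dark", 5060, prefix="redrock"))
```

Passing `int(hpx)` also accepts the NumPy integers returned by `np.unique`, so the helper could be called with `hpx[max_indx]` directly.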

In [None]:
coadd_spec=get_spec_data_url(hpx[max_indx],'sv1','dark')

In [None]:
w=np.in1d(coadd_spec.fibermap["TARGETID"],zcat["TARGETID"])
coadd_spec_=coadd_spec[w]
coadd_spec_.fibermap

In [None]:
for i in range(len(coadd_spec_.fibermap)):
  plt.plot(coadd_spec_.wave['b'],coadd_spec_.flux['b'][i])
  plt.xlabel("Wavelength")
  plt.ylabel("Flux")
  plt.show()

#Exercise: add the redshift information and check that everything is consistent.

Now, let's look at the deltas associated with these spectra.

In [None]:
w=np.in1d(metadata["TARGETID"],zcat["TARGETID"])
deltas=deltas[w]

In [None]:
#Note that these will in general not be in the same order as the spectra above; we need to fix this...
for i in range(len(deltas)):
  plt.plot(wavelength,deltas[i])
  plt.xlabel("Wavelength")
  plt.ylabel("Delta")
  plt.show()
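
As the comment in the cell above points out, the deltas and the coadded spectra are generally not in the same order. One possible way to line them up is to match on `TARGETID` with `np.intersect1d(..., return_indices=True)`. The sketch below uses made-up ID arrays and assumes that each `TARGETID` appears only once in each table.

```python
import numpy as np

# Toy stand-ins for the TARGETID columns of the delta metadata and of the
# coadd fibermap (made-up values).
delta_ids = np.array([30, 10, 20, 40])
coadd_ids = np.array([20, 30, 10])

# Indices into each array for the IDs they share. Both index arrays follow
# the sorted order of the common IDs, so the selected rows line up.
common, idx_delta, idx_coadd = np.intersect1d(
    delta_ids, coadd_ids, return_indices=True)

print(common)
print(delta_ids[idx_delta])
print(coadd_ids[idx_coadd])
```

The same index arrays could then be applied to the rows of `deltas` and of `coadd_spec_.flux['b']`, so that row `i` refers to the same quasar in both.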

In [None]:
#Exercise: Make plots of the continuum and the weights.

<a class="anchor" id="autocorrelation"></a>
## Computing the auto-correlation function.

We will use the [picca](https://github.com/igmhub/picca/tree/master) code. The main reference for what this code does is [Gordon et al. 2023](https://arxiv.org/abs/2308.10950).

In [None]:
os.chdir(lya_dir+'/Correlations')
!pwd

In [None]:
#This command computes the Lya auto-correlation. We limit it to 1000 spectra for speed; to use all the available deltas, remove the --nspec 1000 flag
!picca_cf.py --out cf_lya_lya.fits.gz --in-dir /content/drive/MyDrive/Bucaramanga/desi_edr/fuji/lya/Delta/ --nspec 1000

In [None]:
#To compute the complete distortion matrix in a reasonable time, replace the --nspec 1000 flag with --rej 0.99
!picca_dmat.py --out dmat.fits.gz --in-dir /content/drive/MyDrive/Bucaramanga/desi_edr/fuji/lya/Delta/ --nspec 1000

In [None]:
!picca_export.py --data cf_lya_lya.fits.gz --dmat dmat.fits.gz --out cf_lya_lya-exp.fits.gz
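
The three commands above can also be assembled in Python, which makes it easier to switch between the quick test (`--nspec 1000`) and the full run. The sketch below only builds the argument lists and prints them; nothing is executed. The helper name `picca_commands` is hypothetical, and only flags that already appear in the cells above are used.

```python
import shlex


def picca_commands(delta_dir, nspec=None, rej=None):
    """Build the three command lines used above as argument lists (nothing is run)."""
    cf = ["picca_cf.py", "--out", "cf_lya_lya.fits.gz", "--in-dir", delta_dir]
    dmat = ["picca_dmat.py", "--out", "dmat.fits.gz", "--in-dir", delta_dir]
    if nspec is not None:
        # Limit the number of spectra for a quick test, as in the cells above.
        cf += ["--nspec", str(nspec)]
        dmat += ["--nspec", str(nspec)]
    if rej is not None:
        # --rej is the option suggested above for the full sample;
        # see `picca_dmat.py --help` for its exact meaning.
        dmat += ["--rej", str(rej)]
    export = ["picca_export.py", "--data", "cf_lya_lya.fits.gz",
              "--dmat", "dmat.fits.gz", "--out", "cf_lya_lya-exp.fits.gz"]
    return [cf, dmat, export]


for cmd in picca_commands("Delta/", nspec=1000):
    print(shlex.join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` instead of using the `!` shell escape.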

In [None]:
#Let's check that all the output files are in place
!ls

In [None]:
if not os.path.exists("Fig4_auto_corr_wedge.npz"):
  !wget https://zenodo.org/records/8244702/files/Fig4_auto_corr_wedge.npz?download=1
  !wget https://zenodo.org/records/8244702/files/wedges.py?download=1
  !mv Fig4_auto_corr_wedge.npz?download=1 Fig4_auto_corr_wedge.npz
  !mv wedges.py?download=1 wedges.py
  print("Downloaded Fig4_auto_corr_wedge.npz")
  print("Downloaded wedges.py")
else:
  print("EDR CF files from Gordon 2023 are already on disk")

#Read the file with the EDR+M2 correlation function
Gordon2023=np.load("Fig4_auto_corr_wedge.npz")
from wedges import Wedge

In [None]:
#Let's create a function that plots the results and compares them with the Gordon et al. 2023 results.
def plot_cf(file_xis,xi_edr,rps=(-300,300,150), power=2,
                 mus=[1., 0.95, 0.8, 0.5, 0], figsize=(6, 7),
                 absMus=True, label=None,labels=None,colors=None):

    f, (axs) = plt.subplots(nrows=2, ncols=2, figsize=(12,8))

    for k,file_xi in enumerate(file_xis):
        #- Read correlation function and covariance matrix
        h = fitsio.FITS(file_xi)
        try:
            da = h[1]['DA_BLIND'][:]
        except:
            da = h[1]['DA'][:]
        co = h[1]['CO'][:]
        hh = h[1].read_header()
        rpmin = hh['RPMIN']
        rpmax = hh['RPMAX']
        rtmin = 0
        rtmax = hh['RTMAX']
        nrp = hh['NP']
        nrt = hh['NT']
        h.close()

        j=0

        for i, (mumax,mumin) in enumerate(zip(mus[:-1],mus[1:])):
            b = picca.wedgize.wedge(mumin=mumin, mumax=mumax,
                                rpmin=rpmin, rpmax=rpmax,
                                rtmin=rtmin, rtmax=rtmax,
                                nrt=nrt, nrp=nrp, absoluteMu=absMus,
                                rmin=0., rmax=min(rpmax, rtmax),
                                nr=min(nrt, nrp))
            r,d,c = b.wedge(da,co)

            nrows = 2

                        #-- Wedges and best model
            y = d*r**power
            dy = np.sqrt(c.diagonal())*r**power



            ###
            b2 = Wedge(mu=(mumin,mumax),
              rp=rps,
              rt=(0,200,50),
              r=(0., 200., 50))

            xi=xi_edr['fugu_xi']
            cov=xi_edr['fugu_cov']

            r2,d2,c2=b2(xi,cov)
            c2 = np.sqrt(np.diagonal(c2))
            y2 = d2*r2**power
            #Scale the errors with the same separations (r2) as the values y2
            dy2 =c2*r2**power
            ####
            if absMus:
                if j==0:
                    axs[j//2][j%2].errorbar(
                    r, y, dy, fmt=".",label=labels[k],color=colors[k],alpha=0.7)
                    axs[j//2][j%2].errorbar(
                    r2, y2, dy2, fmt=".",label='Gordon et al. 2023',color='b',alpha=0.7)
                    axs[j//2][j%2].axvline(100)

                else:
                    axs[j//2][j%2].errorbar(
                        r, y, dy, fmt=".",color=colors[k],alpha=0.7)
                    axs[j//2][j%2].errorbar(
                        r2, y2, dy2, fmt=".",color='b',alpha=0.7)
                    axs[j//2][j%2].axvline(100)
            else:
                axs[j//2][j%2].errorbar(
                    r, y, dy, fmt="o")
                axs[j//2][j%2].errorbar(
                    r2, y2, dy2, fmt="o")

            axs[j//2][j%2].set_ylabel(r"$r^{power}\xi(r)$".format(power=power))
            if j//2==1:
                axs[j//2][j%2].set_xlabel(r"$r \, [h^{-1}\, \mathrm{Mpc}]$")
            axs[j//2][j%2].legend(loc="best", fontsize=12)
            #axs[j//2][j%2].grid(True)
            j+=1
        axs[0][0].set_title(r"${}<\mu<{}$".format(0.95,1))
        axs[0][1].set_title(r"${}<\mu<{}$".format(0.8,0.95))
        axs[1][0].set_title(r"${}<\mu<{}$".format(0.5,0.8))
        axs[1][1].set_title(r"${}<\mu<{}$".format(0,0.5))

        plt.tight_layout()

    plt.show()
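
To make the role of the wedges in `plot_cf` more concrete: each bin of the two-dimensional correlation function has a separation $r=\sqrt{r_\parallel^2+r_\perp^2}$ and a direction $\mu=r_\parallel/r$, and a wedge averages the bins whose $\mu$ falls in a given range, as a function of $r$. The sketch below is a toy version of that idea, not picca's implementation: it uses unweighted means (whereas `b.wedge(da,co)` above also takes the covariance matrix as input), a made-up grid, and a toy $\xi=1/r^2$ so that $r^2\xi$ should come out close to 1.

```python
import numpy as np


def toy_wedge(rp, rt, xi, mu_min, mu_max, r_edges):
    """Unweighted mean of xi over cells with mu_min <= |mu| < mu_max, in bins of r."""
    r = np.hypot(rp, rt)
    mu = np.abs(rp) / r
    in_wedge = (mu >= mu_min) & (mu < mu_max)
    which = np.digitize(r, r_edges) - 1  # index of the r bin for each cell
    n_bins = len(r_edges) - 1
    out = np.full(n_bins, np.nan)        # bins with no cells stay NaN
    for i in range(n_bins):
        sel = in_wedge & (which == i)
        if sel.any():
            out[i] = xi[sel].mean()
    return out


# Made-up grid of bin centres (bins of width 4 from 0 to 200) and a toy xi.
centres = np.arange(2.0, 200.0, 4.0)
rp, rt = np.meshgrid(centres, centres, indexing="ij")
xi = 1.0 / (rp**2 + rt**2)

r_edges = np.arange(0.0, 204.0, 4.0)
r_mid = 0.5 * (r_edges[:-1] + r_edges[1:])
wedge_xi = toy_wedge(rp, rt, xi, 0.5, 0.8, r_edges)

# r^2 * xi, the quantity plotted by plot_cf when power=2
print(np.round(wedge_xi[-3:] * r_mid[-3:]**2, 2))
```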

In [None]:
cf="cf_lya_lya-exp.fits.gz"

plot_cf([cf],Gordon2023,rps=(0,300,75), labels=["EDR test"],colors=['k'])

In [None]:
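#This file is not produced by the cells above: it assumes you have re-run the previous steps without the --nspec 1000 limit and exported the result under this name
cf_full="cf_lya_lya-exp-full.fits.gz"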
plot_cf([cf_full],Gordon2023,rps=(0,300,75), labels=["DESI EDR"],colors=['k'])

Exercise: try to compute the cross-correlation function. For this you will need a quasar catalog. You could derive it from the full catalog we have been working with, but it is better to use another VAC, the [BAL catalog](https://data.desi.lbl.gov/doc/releases/edr/vac/balqso/), which is very similar to the catalog actually used in [Gordon et al. 2023](https://arxiv.org/abs/2308.10950).

The picca scripts to use are `picca_xcf.py`, `picca_xdmat.py`, and `picca_export.py`. You can see the required arguments of each one by passing `--help`.