# AGN References: Finding the catalog origins for the known AGN

The purpose of this notebook is to understand where the AGN from SIMBAD came from.
 * How were the AGN originally classified? (e.g., spectroscopically, optical photometry, other)
 * How many might we _expect_ to see variability for? (e.g., fraction at low redshift)

The original SIMBAD TAP queries are in the AGNpercentages_v2.ipynb notebook.

Graphic for all the SIMBAD TAP catalogs: http://simbad.u-strasbg.fr/simbad/tap/tapsearch.html

The table schema at the above link shows that the `bibcode` reference is in the `ref` table.


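**General approach:** Retrieve the references and bibliography codes for the known AGN in our fields. Assume that the reference with the earliest date is how the known AGN was identified as an AGN and added to SIMBAD. Note that the matching code below keeps the first row listed for each object, which is the earliest reference only if the query output is ordered by date.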
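
The "earliest date" assumption can be checked without relying on row order, because an ADS bibcode begins with the four-digit publication year. What follows is a minimal sketch on toy rows. The helper names `bibcode_year` and `earliest_bibcode` are hypothetical and not part of this notebook. The publication year is only a proxy for when SIMBAD actually added the object.

```python
def bibcode_year(bibcode):
    """Return the publication year encoded in the first four characters."""
    return int(bibcode.strip().strip('"')[:4])


def earliest_bibcode(bibcodes):
    """Return the bibcode with the smallest year (the first one wins on ties)."""
    return min(bibcodes, key=bibcode_year)


# Toy rows for a single object, deliberately not in date order.
refs = ['"2014A&A...563A..54P"', '"2007ApJS..172...29H"', '"2011ApJ...737..101L"']
print(earliest_bibcode(refs))
```

Applying something like this per object would make the result independent of how the query output happens to be sorted.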

**More information about AGN:**
 * Astrobites guide to galaxy and AGN types: https://astrobites.org/guides/galaxy-and-agn-types/


## Set Up

In [1]:
import numpy as np

## COSMOS

In Graham et al. (2022),
<a href="https://iopscience.iop.org/article/10.1088/0067-0049/200/1/9">Griffith et al. 2012</a> and 
<a href="https://iopscience.iop.org/article/10.3847/0067-0049/224/2/24">Laigle et al. 2016</a>
are cited as the origins for the ACS-GC and COSMOS2015 catalogs, respectively.

But neither describes the AGN identification process.

Turn back to SIMBAD to find the exact references for the AGN in the catalogs that we matched to DDF candidates.

The following query was executed at http://simbad.u-strasbg.fr/simbad/sim-tap to make the "COSMOS_references.txt" file.

### Read in "COSMOS_references.txt".

 * 0 main_id
 * 1 ra
 * 2 dec
 * 3 otype_txt
 * 4 oid
 * 5 oidref
 * 6 oidbibref
 * 7 oidbib
 * 8 bibcode
 
> **NOTE** A given AGN (i.e., a given `main_id` or `oid`) can have _multiple_ references. This means there can be _multiple_ rows in "COSMOS_references.txt" for a given AGN.
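
To get a feel for how many rows each object contributes, the ids can be tallied before any matching is done. This is a small sketch on invented ids; `ref_ids` stands in for the `main_id` column of "COSMOS_references.txt" and is not real data.

```python
from collections import Counter

# Toy stand-in for the main_id column: one entry per (object, reference) row.
ref_ids = ['A', 'A', 'B', 'A', 'C', 'B']

# Number of reference rows per object.
refs_per_object = Counter(ref_ids)

# Number of objects that appear on more than one row.
n_multi = sum(1 for n in refs_per_object.values() if n > 1)

print(refs_per_object)
print(n_multi, 'objects have more than one reference')
```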

In [2]:
fnm = 'COSMOS_references.txt'
cosmos_refs_id = np.loadtxt(fnm, skiprows=2, delimiter='|', usecols=0, dtype='str')
cosmos_refs_bibcode = np.loadtxt(fnm, skiprows=2, delimiter='|', usecols=8, dtype='str')

Print the first element of each array to see the string format.

In [3]:
print(cosmos_refs_id[0])
print(cosmos_refs_bibcode[0])

"2MASS J09543431+0235335"                 
"2011ApJ...737..101L"


Strip whitespace and quote marks from the `id` or we won't be able to use the `id` to match to the "COSMOS_matches.txt" table.

In [4]:
for i,temp in enumerate(cosmos_refs_id):
    cosmos_refs_id[i] = temp.strip().strip('"')
del i, temp
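
The loop above could also be written with NumPy's vectorized string functions. The sketch below uses `np.char.strip` on a toy array built from the two id formats printed in this notebook. It returns a new array rather than modifying the input in place.

```python
import numpy as np

# Toy array mimicking the raw strings: surrounding quotes plus padding.
ids = np.array(['"2MASS J09543431+0235335"                 ',
                '"[MMS2013] 102826" '])

# First strip whitespace, then strip the double quotes.
clean = np.char.strip(np.char.strip(ids), '"')

print(clean[0])
print(clean[1])
```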

In [5]:
print(cosmos_refs_id[0])

2MASS J09543431+0235335


### Read in the "COSMOS_matches.txt" file.

 * 0 main_id
 * 1 ra
 * 2 dec
 * 3 type
 * 4 DDF candidate id
 * 5 DDF candidate separation in arcsec

In [6]:
fnm = 'COSMOS_matches.txt'
cosmos_matches_id = np.loadtxt(fnm, delimiter=',', usecols=0, dtype='str')
cosmos_matches_type = np.loadtxt(fnm, delimiter=',', usecols=3, dtype='str')

Print the first element of each array to see the string format.

In [7]:
print(cosmos_matches_id[0])
print(cosmos_matches_type[0])

[MMS2013] 102826
 QSO


Get rid of the whitespace at the start/end of the strings (due to comma-delimiters).

In [8]:
for i,temp in enumerate(cosmos_matches_id):
    cosmos_matches_id[i] = temp.strip()
del i, temp

for i,temp in enumerate(cosmos_matches_type):
    cosmos_matches_type[i] = temp.strip()
del i, temp

In [9]:
print(cosmos_matches_id[0])
print(cosmos_matches_type[0])

[MMS2013] 102826
QSO


### Match "COSMOS_matches.txt" to "COSMOS_references.txt"

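For each `id` in `cosmos_matches_id`, find the rows of `cosmos_refs_id` with the same `id` and keep the index and bibcode of THE FIRST REFERENCE listed. Objects with no row in the references table get an index of -1 and a bibcode of 'none'.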

In [10]:
cosmos_matches_refs_index = np.zeros(len(cosmos_matches_id), dtype='int')
placeholder = []

In [11]:
%%time
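for i,temp in enumerate(cosmos_matches_id):
    # rows of the references table that belong to this object
    tx = np.where(temp == cosmos_refs_id)[0]
    if len(tx) >= 1:
        # keep only the first row listed for this object
        cosmos_matches_refs_index[i] = tx[0]
        placeholder.append(cosmos_refs_bibcode[tx[0]])
    else:
        # this object has no row in the references table
        cosmos_matches_refs_index[i] = -1
        placeholder.append('none')
del i, temp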

CPU times: user 168 ms, sys: 1.09 ms, total: 169 ms
Wall time: 163 ms


In [12]:
cosmos_matches_refs_bibcode = np.asarray(placeholder, dtype='str')
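
The same first-row matching can be expressed with a dictionary, which avoids scanning the whole references array once per object. This is a sketch on toy data, not the method used above. The function name `first_reference_lookup` and the ids `A`, `B`, `Z` are hypothetical.

```python
def first_reference_lookup(ref_ids, ref_bibcodes):
    """Map each id to the bibcode on its first row in the references table."""
    lookup = {}
    for obj_id, bibcode in zip(ref_ids, ref_bibcodes):
        lookup.setdefault(obj_id, bibcode)  # later rows for the same id are ignored
    return lookup


# Toy references table: object 'A' has two rows, 'B' has one.
ref_ids = ['A', 'A', 'B']
ref_bibcodes = ['2007ApJS..172...29H', '2014A&A...563A..54P', '2001MNRAS.322L..29C']

# Toy matches: 'Z' has no row in the references table.
match_ids = ['B', 'A', 'Z']

lookup = first_reference_lookup(ref_ids, ref_bibcodes)
matched = [lookup.get(obj_id, 'none') for obj_id in match_ids]
print(matched)
```

For a few hundred objects the loop above is already fast, so this mainly matters if the tables grow.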

In [13]:
values, counts = np.unique(cosmos_matches_refs_bibcode, return_counts=True)

for v,val in enumerate(values):
    print(val, counts[v])

"2001MNRAS.322L..29C" 3
"2003AJ....126.2579S" 2
"2004MNRAS.349.1397C" 1
"2005AJ....130..367S" 2
"2006A&A...455..773V" 1
"2006A&A...457...79G" 4
"2007ApJS..172...29H" 6
"2007ApJS..172...46S" 2
"2009ApJS..184..158E" 1
"2011A&A...527A.126P" 1
"2012MNRAS.421.1569B" 1
"2014A&A...563A..54P" 21
"2014ApJS..210....7C" 2
"2017ApJ...836..186J" 1
"2018AJ....155..189D" 1
"2020ApJ...888...78S" 1
"2020ApJS..250....8L" 2
none 256


### Explore these references

OK, that's promising: only 17 distinct first-references (i.e., the paper that added the object to SIMBAD) cover every COSMOS match that has a bibcode. The `none` row counts the 256 matches with no row in "COSMOS_references.txt".

In [14]:
print(np.unique(cosmos_matches_type))

['AGN' 'QSO' 'Sy1' 'Sy2' 'rG']


In [16]:
for agntype in ['AGN', 'QSO', 'Sy1', 'Sy2', 'rG']:
    print(agntype)
    tx = np.where(cosmos_matches_type == agntype)[0]
    values, counts = np.unique(cosmos_matches_refs_bibcode[tx], return_counts=True)
    for v,val in enumerate(values):
        print(val, counts[v])
    del tx, values, counts, v, val

AGN
"2006A&A...457...79G" 1
"2007ApJS..172...29H" 3
"2007ApJS..172...46S" 2
"2009ApJS..184..158E" 1
"2020ApJ...888...78S" 1
none 91
QSO
"2001MNRAS.322L..29C" 2
"2003AJ....126.2579S" 2
"2004MNRAS.349.1397C" 1
"2005AJ....130..367S" 2
"2006A&A...455..773V" 1
"2006A&A...457...79G" 3
"2007ApJS..172...29H" 2
"2014A&A...563A..54P" 21
"2014ApJS..210....7C" 2
"2017ApJ...836..186J" 1
"2018AJ....155..189D" 1
"2020ApJS..250....8L" 2
none 150
Sy1
"2001MNRAS.322L..29C" 1
"2007ApJS..172...29H" 1
"2011A&A...527A.126P" 1
none 11
Sy2
none 1
rG
"2012MNRAS.421.1569B" 1
none 3


#### Summary

Of the 52 COSMOS-field known AGN that we matched with a DDF candidate and that ALSO have a bibcode in SIMBAD, the largest group (21, all typed QSO) is from "2014A&A...563A..54P", which is "The Sloan Digital Sky Survey quasar catalog: tenth data release" https://ui.adsabs.harvard.edu/abs/2014A%26A...563A..54P/abstract. 

These are **spectroscopically** classified QSOs. 

> **NOTE** This is still not great because 256 of the 308 matched objects, including 91 AGN and 150 QSOs from SIMBAD that the DDF detected, have no bibcode in "COSMOS_references.txt". How can something have no origin?
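
The size of the gap can be worked out from the per-type output printed above. In this sketch the `no_ref` values are the `none` counts as printed, and the `with_ref` values are sums of the printed bibcode counts for each type. The dictionary names are only illustrative.

```python
# Counts taken from the per-type output above.
with_ref = {'AGN': 8, 'QSO': 40, 'Sy1': 3, 'Sy2': 0, 'rG': 1}   # summed bibcode counts
no_ref = {'AGN': 91, 'QSO': 150, 'Sy1': 11, 'Sy2': 1, 'rG': 3}  # the 'none' rows

n_missing = sum(no_ref.values())
n_total = n_missing + sum(with_ref.values())

# Fraction of each type that has no bibcode.
missing_fraction = {t: no_ref[t] / (no_ref[t] + with_ref[t]) for t in no_ref}

for t, frac in missing_fraction.items():
    print(f'{t}: {no_ref[t]} of {no_ref[t] + with_ref[t]} without a bibcode ({frac:.0%})')
print(f'all: {n_missing} of {n_total} without a bibcode ({n_missing / n_total:.0%})')
```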

<br>
<br>

## COSMOS2015

It's important to note that the `id` for COSMOS2015 and ESIS is the `ident.id`, not the `basic.main_id` from SIMBAD. This affects the matching process to connect the AGN detected by our DDF with the SIMBAD references.
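
To illustrate why the choice of identifier matters, the toy example below matches the same objects once on a main identifier and once on a catalog identifier. All ids here (`CAT2015 1`, `MAIN A`, and so on) are invented for illustration, and the three-column row layout is an assumption, not the real file format.

```python
# Toy rows of (ident_id, main_id, bibcode). All ids are invented.
rows = [
    ('CAT2015 1', 'MAIN A', '2007ApJS..172...29H'),
    ('OTHER 9', 'MAIN A', '2007ApJS..172...29H'),
    ('CAT2015 2', 'MAIN B', '2003AJ....126.2579S'),
]

# The matches file carries catalog identifiers, not main identifiers.
match_ids = ['CAT2015 1', 'CAT2015 2']

by_main_id = {main_id: bib for _, main_id, bib in rows}
by_ident_id = {ident_id: bib for ident_id, _, bib in rows}

hits_main = [by_main_id.get(m, 'none') for m in match_ids]
hits_ident = [by_ident_id.get(m, 'none') for m in match_ids]

print(hits_main)   # matching on the wrong column finds nothing
print(hits_ident)  # matching on the catalog identifier finds both
```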

The following query was executed to make the "COSMOS2015_references.txt" file. It has one more column than "COSMOS_references.txt", which is the `ident.id`.

### Read in the "COSMOS2015_references.txt" file.

 * 0 main_id
 * 1 ra
 * 2 dec
 * 3 otype_txt
 * 4 oid
 * 5 oidref
 * 6 oidbibref
 * 7 oidbib
 * 8 bibcode
 * 9 ident.id

In [17]:
fnm = 'COSMOS2015_references.txt'
cosmos2015_refs_id = np.loadtxt(fnm, skiprows=2, delimiter='|', usecols=9, dtype='str')
cosmos2015_refs_bibcode = np.loadtxt(fnm, skiprows=2, delimiter='|', usecols=8, dtype='str')

In [18]:
print(cosmos2015_refs_id[0])
print(cosmos2015_refs_bibcode[0])

"COSMOS2015 583822" 
"2009A&A...497..635C"


In [19]:
for i,temp in enumerate(cosmos2015_refs_id):
    cosmos2015_refs_id[i] = temp.strip().strip('"')
del i, temp

In [20]:
print(cosmos2015_refs_id[0])

COSMOS2015 583822


### Read in the "COSMOS2015_matches.txt" file.

 * 0 ident_id
 * 1 ra
 * 2 dec
 * 3 type
 * 4 DDF candidate id
 * 5 DDF candidate separation in arcsec

In [21]:
fnm = 'COSMOS2015_matches.txt'
cosmos2015_matches_id = np.loadtxt(fnm, delimiter=',', usecols=0, dtype='str')
cosmos2015_matches_type = np.loadtxt(fnm, delimiter=',', usecols=3, dtype='str')

In [22]:
print(cosmos2015_matches_id[0])
print(cosmos2015_matches_type[0])

COSMOS2015 452180
 QSO


Get rid of the whitespace.

In [23]:
for i,temp in enumerate(cosmos2015_matches_id):
    cosmos2015_matches_id[i] = temp.strip()
del i, temp

for i,temp in enumerate(cosmos2015_matches_type):
    cosmos2015_matches_type[i] = temp.strip()
del i, temp

In [24]:
print(cosmos2015_matches_id[0])
print(cosmos2015_matches_type[0])

COSMOS2015 452180
QSO


### Match "COSMOS2015_matches.txt" to "COSMOS2015_references.txt"

In [25]:
cosmos2015_matches_refs_index = np.zeros(len(cosmos2015_matches_id), dtype='int')
placeholder = []

In [26]:
%%time
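for i,temp in enumerate(cosmos2015_matches_id):
    # rows of the references table that belong to this object
    tx = np.where(temp == cosmos2015_refs_id)[0]
    if len(tx) >= 1:
        # keep only the first row listed for this object
        cosmos2015_matches_refs_index[i] = tx[0]
        placeholder.append(cosmos2015_refs_bibcode[tx[0]])
    else:
        # this object has no row in the references table
        cosmos2015_matches_refs_index[i] = -1
        placeholder.append('none')
del i, temp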

CPU times: user 136 ms, sys: 7.58 ms, total: 143 ms
Wall time: 141 ms


In [27]:
cosmos2015_matches_refs_bibcode = np.asarray(placeholder, dtype='str')

In [28]:
values, counts = np.unique(cosmos2015_matches_refs_bibcode, return_counts=True)

for v,val in enumerate(values):
    print(val, counts[v])

"2001MNRAS.322L..29C" 10
"2003AJ....126.2579S" 16
"2004AJ....128.1974S" 7
"2004MNRAS.349.1397C" 1
"2005AJ....130..367S" 2
"2006A&A...455..773V" 27
"2006ApJ...644..100P" 2
"2007ApJS..172...29H" 79
"2007ApJS..172...46S" 14
"2008ApJS..176...19F" 1
"2009A&A...497..635C" 7
"2009ApJS..184..158E" 5
"2010AJ....140..533G" 3
"2010ApJS..191..254H" 1
"2012ApJ...753..121K" 1
"2013ApJS..206....8M" 4
"2014A&A...563A..54P" 1


This is better! All 181 of the known AGN that we have matched with a DDF candidate have a bibcode, as in, they have a known origin.

### Explore these references

The largest share (79 of 181) is from 2007ApJS..172...29H, "The XMM-Newton Wide-Field Survey in the COSMOS Field. I. Survey Description" https://ui.adsabs.harvard.edu/abs/2007ApJS..172...29H/abstract, which suggests that a bunch are X-ray identified AGN. Good to know.

Let's do the same as above and look at the bibcode by AGN type.

In [29]:
print(np.unique(cosmos2015_matches_type))

['AGN' 'QSO' 'Sy1' 'rG']


In [30]:
for agntype in ['AGN', 'QSO', 'Sy1', 'rG']:
    print(agntype)
    tx = np.where(cosmos2015_matches_type == agntype)[0]
    values, counts = np.unique(cosmos2015_matches_refs_bibcode[tx], return_counts=True)
    for v,val in enumerate(values):
        print(val, counts[v])
    del tx, values, counts, v, val

AGN
"2004AJ....128.1974S" 4
"2006A&A...455..773V" 2
"2006ApJ...644..100P" 2
"2007ApJS..172...29H" 53
"2007ApJS..172...46S" 10
"2008ApJS..176...19F" 1
"2009A&A...497..635C" 4
"2009ApJS..184..158E" 3
"2010AJ....140..533G" 3
"2010ApJS..191..254H" 1
"2012ApJ...753..121K" 1
"2013ApJS..206....8M" 4
QSO
"2001MNRAS.322L..29C" 10
"2003AJ....126.2579S" 15
"2004AJ....128.1974S" 3
"2005AJ....130..367S" 1
"2006A&A...455..773V" 23
"2007ApJS..172...29H" 25
"2007ApJS..172...46S" 1
"2009A&A...497..635C" 3
"2009ApJS..184..158E" 2
"2014A&A...563A..54P" 1
Sy1
"2003AJ....126.2579S" 1
"2004MNRAS.349.1397C" 1
"2005AJ....130..367S" 1
"2006A&A...455..773V" 2
"2007ApJS..172...29H" 1
rG
"2007ApJS..172...46S" 3


#### Summary

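For the objects typed AGN in COSMOS2015, 53 of 88 have a first-reference of 2007ApJS..172...29H. The QSOs are more spread out: 25 of 84 are from 2007ApJS..172...29H, 23 from 2006A&A...455..773V, 15 from 2003AJ....126.2579S, 10 from 2001MNRAS.322L..29C, and the remaining 11 from six other papers.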
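
The QSO shares can be worked out from the printed counts. The sketch below copies the QSO rows of the per-type output above into a dictionary and ranks them. The names `qso_counts` and `ranked` are only illustrative.

```python
# QSO first-reference counts copied from the per-type output above.
qso_counts = {
    '2001MNRAS.322L..29C': 10,
    '2003AJ....126.2579S': 15,
    '2004AJ....128.1974S': 3,
    '2005AJ....130..367S': 1,
    '2006A&A...455..773V': 23,
    '2007ApJS..172...29H': 25,
    '2007ApJS..172...46S': 1,
    '2009A&A...497..635C': 3,
    '2009ApJS..184..158E': 2,
    '2014A&A...563A..54P': 1,
}

total = sum(qso_counts.values())

# Rank bibcodes from most to least common and print each share.
ranked = sorted(qso_counts.items(), key=lambda kv: kv[1], reverse=True)
for bibcode, n in ranked:
    print(f'{bibcode} {n:3d} {n / total:.0%}')
```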

Further investigation needed.