# Single Node + MongoDB + [Gaia EDR3](https://www.cosmos.esa.int/web/gaia/earlydr3)

### On this page, we use Python on a local machine (a single node) to upload the Gaia EDR3 catalog to MongoDB and measure how long it takes to search within a given radius around given coordinates.

### The machine used to run the code below has the following specifications.
<img src="my_mac_spec.png" style="width:100px;height:50px" align="left" />

In [13]:
import sys

sys.version

'3.7.6 (default, Jan  8 2020, 13:42:34) \n[Clang 4.0.1 (tags/RELEASE_401/final)]'

* The Python version used here is shown above.

In [8]:
from astropy.table import Table
import json
import time
import os
from astropy_healpix import HEALPix, pixel_resolution_to_nside
from astropy.coordinates import ICRS, SkyCoord
from astropy import units as u

columns = [
    'designation', 
    'ra', 
    'dec', 
    'healpix',
    'phot_g_mean_flux', 
    'phot_g_mean_flux_error',
    'phot_bp_mean_flux',
    'phot_bp_mean_flux_error',
    'phot_rp_mean_flux',
    'phot_rp_mean_flux_error',
    'phot_proc_mode',
    'bp_rp',
    'bp_g',
    'g_rp',
    'dr2_radial_velocity',
    'dr2_radial_velocity_error',
    'dr2_rv_nb_transits',
    'dr2_rv_template_teff',
    'dr2_rv_template_logg',
    'dr2_rv_template_fe_h']

def table_single_row_to_dict(table):
    """Convert Astropy Table to Python dict.

    Numpy arrays are converted to lists, so that
    the output is JSON serialisable.

    Can work with multi-dimensional array columns,
    by representing them as list of list.
    """
    total_data = {}
    for name in table.colnames:
        if name in columns:
            if isinstance(table[name], str):
                total_data[name] = table[name]
            else:
                total_data[name] = table[name].tolist()
    return total_data
    

def cone_search(ra, dec, radius, collection, healpix):
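    # Perform a simple cone search against a MongoDB collection.
    # ra, dec in degrees; radius in arcseconds.
    # Note: this matches on HEALPix pixels that overlap the cone, so some
    # returned sources may lie slightly outside the requested radius.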
    
    coords = SkyCoord(ra=ra*u.deg, dec=dec*u.deg)
    hp_to_search = healpix.cone_search_skycoord(coords, radius=radius * u.arcsec)
    cursor = collection.find({'healpix': {'$in': [int(h) for h in hp_to_search]}})
    return cursor


* This cell defines a function that converts a row of an Astropy Table into a Python dictionary, along with a simple cone-search helper.

In [6]:
import pymongo

client = pymongo.MongoClient()  # default connection (ie, local)

db_name = 'gaia_edr3_test'
db = client[db_name]  # database
gaia = db.gaia  # collection; can also be accessed as db['gaia']
gaia.drop()  # drop collection, if needed

* Before running the cell above, install [MongoDB community edition](https://docs.mongodb.com/manual/installation/#mongodb-community-edition-installation-tutorials).
* This creates a database named 'gaia_edr3_test'.

In [10]:
%%time
cat_dir = '/Volumes/APPLE SSD/db/Gaia_EDR3/'
for file in os.listdir(cat_dir):
    if file.endswith('.csv'):
        cat = Table.read(cat_dir + file)
        cat.add_column(cat['source_id'], name='healpix')
        for i in range(0, len(cat)):
            # Bits 36-63 of source_id hold the nested HEALPix index (level 12);
            # integer division by 2**35 = 34359738368 extracts it without float rounding
            cat['healpix'][i] = cat['source_id'][i] // 34359738368
            json_data = json.loads(json.dumps(table_single_row_to_dict(cat[i])))
            result = gaia.insert_one(json_data)

CPU times: user 6min 28s, sys: 5.79 s, total: 6min 34s
Wall time: 7min 24s
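
The division by 34359738368 in the upload loop is equivalent to a right shift by 35 bits, since 34359738368 = 2**35. The sketch below illustrates this on two `source_id` values taken from the designations printed by the cone search in this notebook. The helper name `healpix_from_source_id` is hypothetical, and the bit layout is assumed to follow the Gaia EDR3 `gaia_source` data model.

```python
# Assumption: the level-12 nested HEALPix index is stored in the bits
# above bit 35 of source_id (Gaia EDR3 gaia_source data model).
HEALPIX_SHIFT = 35  # 2**35 == 34359738368


def healpix_from_source_id(source_id: int) -> int:
    """Return the level-12 nested HEALPix index encoded in a Gaia source_id."""
    return source_id >> HEALPIX_SHIFT


# source_id values copied from designations in this notebook's query output
for source_id in (515396233856, 828929527040):
    print(source_id, healpix_from_source_id(source_id))
```

Using an integer shift (or `//`) keeps the computation in exact integer arithmetic, which matters because `source_id` values can exceed the range that a 64-bit float represents exactly.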


* The catalog is read as an Astropy Table, each row is converted to a Python dictionary and then to JSON, and the result is uploaded to the db.
* Uploading a single gaia_source catalog file (~5x10^5 sources) took about 7 minutes.
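
The loop above sends one `insert_one` call per row. One possible variation, not measured here, is to group rows into batches and send each batch with pymongo's `insert_many`, which should reduce the number of round trips to the server. The sketch below is a minimal illustration; `insert_in_batches` and the default `batch_size` are hypothetical choices, not part of the original notebook.

```python
from itertools import islice


def insert_in_batches(collection, documents, batch_size=10_000):
    """Insert documents in chunks via collection.insert_many.

    `collection` is assumed to be a pymongo Collection (or any object
    exposing insert_many). Returns the number of documents sent.
    """
    iterator = iter(documents)
    total = 0
    while True:
        # Pull at most batch_size documents from the iterator
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    return total
```

With the objects defined in this notebook, a call might look like `insert_in_batches(gaia, (table_single_row_to_dict(row) for row in cat))`. Creating an index with `gaia.create_index('healpix')` after the upload is another option worth testing, since the cone search filters on that field.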

In [14]:
%%time
# HEALPix object matching Gaia's scheme: level 12 (12 hierarchical subdivision steps), i.e. nside = 2**12 = 4096, nested ordering
# https://gea.esac.esa.int/archive/documentation/GEDR3/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html
hp = HEALPix(nside=4096, order='nested', frame=ICRS())

# Example use
cursor = cone_search(45, 0.1, 180., gaia, hp)
for doc in cursor:
    print(doc['designation'], doc['ra'], doc['dec'])

Gaia EDR3 515396233856 44.99832707810714 0.0663327072023917
Gaia EDR3 828929527040 45.02361979732255 0.06841876724959775
Gaia EDR3 927713095040 45.02672698087207 0.08169947826793385
Gaia EDR3 966367933184 45.039080477403814 0.08685485276440565
Gaia EDR3 1275606125952 44.993270784169155 0.07633404499591856
Gaia EDR3 1340029955712 44.96907662980059 0.08442520281043711
Gaia EDR3 1340029956224 44.97846156970949 0.09257928817288391
Gaia EDR3 1511828647680 44.95265152874727 0.08495205087426602
Gaia EDR3 1619203481984 44.95115803041135 0.10531247613400328
Gaia EDR3 1653563247744 44.99606230474708 0.08491778897415135
Gaia EDR3 1683627775360 45.013788337793514 0.08773432698874072
Gaia EDR3 1717987078400 44.98309734471892 0.09640645832988629
Gaia EDR3 1752346816896 45.00504137480298 0.10193356019856178
Gaia EDR3 1786706818304 45.029202891434686 0.10384332050094827
Gaia EDR3 2130304202496 44.98413214856149 0.1276853526886549
Gaia EDR3 2546916445184 45.05702843457773 0.11498172278883387
Gaia EDR3 

Finding the sources within a 180-arcsecond radius of ra=45, dec=0.1 among roughly 5e5 sources took less than 1 second.
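
Because the query selects whole HEALPix pixels that overlap the cone, some returned sources can sit slightly outside the requested radius. A second pass that computes the exact angular separation can trim these. The sketch below uses the haversine formula with only the standard library; the helper names `angular_separation_arcsec` and `within_cone` are hypothetical, and the two sample documents are copied from the first and last complete rows of the output above.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula.

    Inputs are in degrees; the result is in arcseconds.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600


def within_cone(doc, ra, dec, radius_arcsec):
    """True if the document's (ra, dec) lies inside the search cone."""
    return angular_separation_arcsec(ra, dec, doc['ra'], doc['dec']) <= radius_arcsec


# Two rows copied from the query output above
docs = [
    {'designation': 'Gaia EDR3 515396233856',
     'ra': 44.99832707810714, 'dec': 0.0663327072023917},
    {'designation': 'Gaia EDR3 2546916445184',
     'ra': 45.05702843457773, 'dec': 0.11498172278883387},
]

inside = [d['designation'] for d in docs if within_cone(d, 45, 0.1, 180.0)]
print(inside)  # ['Gaia EDR3 515396233856']
```

The second source lies a little over 200 arcseconds from the center, so it is dropped by the exact filter even though its pixel overlaps the cone.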