# Single Node + MongoDB + [Gaia EDR3](https://www.cosmos.esa.int/web/gaia/earlydr3)

### This page uses Python on a local machine (a single node) to load the Gaia EDR3 catalog into MongoDB and to measure how long a search within a given radius of a given coordinate takes.

### The code below was run on a machine with the following specifications.
<img src="my_mac_spec.png" style="width:100px;height:50px" align="left" />

In [1]:
import sys

sys.version

'3.7.6 (default, Jan  8 2020, 13:42:34) \n[Clang 4.0.1 (tags/RELEASE_401/final)]'

* The Python version used here is shown above.

In [2]:
from astropy.table import Table
import json
import time
import os

def table_single_row_to_dict(table):
    """Convert a single Astropy Table row to a Python dict.

    Numpy arrays are converted to lists, so that
    the output is JSON serialisable.

    Can work with multi-dimensional array columns,
    by representing them as list of list.
    """
    total_data = {}
    for name in table.colnames:
        if isinstance(table[name], str):
            total_data[name] = table[name]
        else:
            total_data[name] = table[name].tolist()
    return total_data

* This defines a function that converts an Astropy Table row into a Python dictionary.

In [3]:
import pymongo

client = pymongo.MongoClient()  # default connection (ie, local)

db_name = 'gaia_edr3_test'
db = client[db_name]  # database
brick = db.brick  # collection; equivalent to db['brick']
brick.drop()  # drop collection, if needed

* Before running the cell above, install [MongoDB community edition](https://docs.mongodb.com/manual/installation/#mongodb-community-edition-installation-tutorials).
* The database is named 'gaia_edr3_test' (MongoDB creates it on the first insert).

In [None]:
%%time
cat_dir = '/Volumes/APPLE SSD/db/Gaia_EDR3/'
for file in os.listdir(cat_dir):
    if file.endswith('.csv'):
        # Read one brick (CSV file) as an Astropy Table
        cat = Table.read(os.path.join(cat_dir, file))
        for row in cat:
            # Round-trip through JSON so every value is a plain Python type
            json_data = json.loads(json.dumps(table_single_row_to_dict(row)))
            result = brick.insert_one(json_data)

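* Each catalog is read as an Astropy Table, converted row by row to a Python dictionary, round-tripped through JSON, and inserted into the database.
* Uploading 144 bricks (~3 deg^2) took about 11 minutes (around 20 minutes when other jobs were running at the same time).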
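
Inserting one document per `insert_one` call costs a round trip to the server for every row. As a sketch only (not what was timed above), rows could instead be grouped and sent with pymongo's `Collection.insert_many`. The helper names `chunked` and `insert_in_batches` and the batch size of 1000 are assumptions for illustration; `collection` is any object exposing `insert_many`, such as `brick`.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def insert_in_batches(collection, docs, batch_size=1000):
    """Send `docs` to `collection.insert_many` in batches; return the count sent."""
    total = 0
    for batch in chunked(docs, batch_size):
        # ordered=False lets the server keep going past a failed document
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    return total
```

For one catalog this might be called as `insert_in_batches(brick, (table_single_row_to_dict(row) for row in cat))`. Whether it shortens the load time on this machine has not been measured here.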

In [11]:
from astropy_healpix import HEALPix, pixel_resolution_to_nside
from astropy.coordinates import ICRS, SkyCoord
from astropy import units as u

# nside required for chosen resolution
resolution = 10 * u.arcsec
nside = pixel_resolution_to_nside(resolution, round='up')

# HEALPix object with that resolution
hp = HEALPix(nside=nside, order='nested', frame=ICRS())

# HEALPix to SkyCoord object
coords = hp.healpix_to_skycoord([42])
print(coords)

# SkyCoord object to HEALPix
coords = SkyCoord(ra=34*u.deg, dec=-23*u.deg)
print(hp.skycoord_to_healpix(coords))

# Example cone search
coords = SkyCoord(ra=34*u.deg, dec=-23*u.deg)
hp_to_search = hp.cone_search_skycoord(coords, radius=5 * u.arcmin)
print(len(hp_to_search))
print(hp_to_search[0:10])

<SkyCoord (ICRS): (ra, dec) in deg
    [(44.99038696, 0.00932548)]>
9542850888
6988
[9542850888 9542850889 9542850891 9542850890 9542850847 9542850845
 9542850839 9542850882 9542850883 9542850886]
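
The numbers printed above can be sanity-checked by hand. A HEALPix grid has 12·nside² equal-area pixels, so the requested pixel resolution (the square root of the pixel area) fixes nside, and rounding up to the next power of two gives the grid used here. The sketch below uses only the standard library; the function names are illustrative and are not part of `astropy_healpix`.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def nside_for_resolution(resolution_arcsec):
    """Smallest power-of-two nside whose pixels are no coarser than the resolution."""
    res_rad = resolution_arcsec * ARCSEC_TO_RAD
    # Full sky (4*pi sr) divided into 12 * nside**2 pixels of area res_rad**2
    nside_exact = math.sqrt(4.0 * math.pi / (12.0 * res_rad ** 2))
    return 2 ** math.ceil(math.log2(nside_exact))


def pixels_in_cone(nside, radius_arcmin):
    """Approximate pixel count in a cone: cone solid angle / pixel area."""
    radius_rad = radius_arcmin * 60.0 * ARCSEC_TO_RAD
    cone_area = 2.0 * math.pi * (1.0 - math.cos(radius_rad))
    pixel_area = 4.0 * math.pi / (12 * nside ** 2)
    return cone_area / pixel_area


nside = nside_for_resolution(10.0)
print(nside)  # 32768
print(round(pixels_in_cone(nside, 5.0)))
```

The area-based estimate comes out at roughly 6,800 pixels, slightly below the 6988 returned by `cone_search_skycoord`. That is the expected direction, since the cone search also returns pixels that only partly overlap the cone.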


In [14]:
# Loop over documents that lack coords.healpix and set the value
cursor = brick.find({'coords.healpix': {'$exists': False}})
for doc in cursor:
    coords = SkyCoord(ra=doc['ra']*u.deg, dec=doc['dec']*u.deg)
    healpix = int(hp.skycoord_to_healpix(coords))
    brick.update_one({'_id': doc['_id']}, {'$set': {'coords.healpix': healpix}})

# Create an index on the HEALPix values for faster queries
if 'healpix' not in brick.index_information():
    brick.create_index([('coords.healpix', pymongo.ASCENDING)],
                       name='healpix', background=True)
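
With the index in place, a cone search reduces to matching `coords.healpix` against the pixel list from `hp.cone_search_skycoord`. The following is a minimal sketch of how such a query could be timed; it is not a measurement from this notebook. `timed_healpix_query` is a hypothetical helper, and `collection` is assumed to be any object with a pymongo-style `find` method, such as `brick`.

```python
import time


def timed_healpix_query(collection, healpix_ids, field='coords.healpix'):
    """Return (documents, elapsed seconds) for documents whose pixel is in healpix_ids."""
    # Cast to plain int: NumPy integers are not BSON-encodable by pymongo
    ids = [int(h) for h in healpix_ids]
    start = time.perf_counter()
    # list() drains the cursor so the timing includes fetching the results
    docs = list(collection.find({field: {'$in': ids}}))
    elapsed = time.perf_counter() - start
    return docs, elapsed
```

For the example coordinates used earlier this might be called as `docs, seconds = timed_healpix_query(brick, hp_to_search)`. The match is made at the pixel level, so an exact cone search would still need a final angular-separation cut on `ra` and `dec`.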
