# Explore NEO data with [Glue](http://docs.glueviz.org/en/stable/index.html)

*Glue allows users to build linked, interactive figures from files and python datasets.  It can be run as a standalone application or launched from python.  In this tutorial, we will explore running glue from a jupyter notebook.*

*We'll work with [Near-Earth object (NEO)](https://en.wikipedia.org/wiki/Near-Earth_object) data. I downloaded NEO data from the [JPL Small-Body Database](https://ssd.jpl.nasa.gov/sbdb_query.cgi) and the [NEO Earth Close Approaches archive](https://cneos.jpl.nasa.gov/ca/).  Then I cleaned up the data a bit, [see here](https://github.com/ageller/IntroToGlue/blob/main/data/prepNEOdata.ipynb).*

<img src="https://upload.wikimedia.org/wikipedia/commons/c/ce/Asteroids-KnownNearEarthObjects-Animation-UpTo20180101.gif" width="100%" align="center">

*animated gif from [here](https://commons.wikimedia.org/wiki/File:Asteroids-KnownNearEarthObjects-Animation-UpTo20180101.gif)*


In [1]:
#import necessary libraries
import pandas as pd
from glue.core import Data, DataCollection
from glue.app.qt.application import GlueApplication

# Read in the data

## [JPL Small-Body Database](https://ssd.jpl.nasa.gov/sbdb_query.cgi)

In [2]:
#the low_memory=False flag suppresses a warning message for a few columns with mixed data types 
jplsbdb = pd.read_csv('data/sbdb_query_results.csv', low_memory=False) 
jplsbdb.iloc[:3]

Unnamed: 0,id,spkid,full_name,pdes,name,prefix,neo,pha,H,G,...,n_obs_used,n_del_obs_used,n_dop_obs_used,condition_code,rms,two_body,A1,A2,A3,DT
0,a0000433,2000433,433 Eros (A898 PA),433,Eros,,Y,N,10.43,0.46,...,9130,4.0,2.0,0.0,0.29796,,,,,
1,a0000719,2000719,719 Albert (A911 TB),719,Albert,,Y,N,15.51,,...,1894,,,0.0,0.39775,,,,,
2,a0000887,2000887,887 Alinda (A918 AA),887,Alinda,,Y,N,13.87,-0.12,...,2624,,,0.0,0.39776,,,,,


## [NEO Earth Close Approaches archive](https://cneos.jpl.nasa.gov/ca/)

(I cleaned the data file so that it can be used more easily; see the data/prepNEOdata.ipynb.)

In [3]:
neoca = pd.read_csv('data/cneos_closeapproach_data-cleaned.csv')
neoca.iloc[:3]

Unnamed: 0,pdes,Object,Close-Approach (CA) Date,CA Distance Nominal (AU),CA Distance Minimum (AU),V relative (km/s),V infinity (km/s),H (mag),Diameter (km),extra
0,509352,509352 (2007 AG),1900.096844,0.00963,0.00963,8.69,8.65,20.1,410000.0,a0509352
1,2014 SC324,(2014 SC324),1900.113578,0.03997,0.03997,10.65,10.65,24.3,59500.0,bK14SW4C
2,2012 UK171,(2012 UK171),1900.118827,0.04982,0.04982,7.16,7.15,24.4,55500.0,bK12UH1K


# Start Glue with these data

*[qglue](http://docs.glueviz.org/en/latest/python_guide/glue_from_python.html) is a way to send python data structures (Numpy arrays, Pandas dataframes, Astropy tables, others) to glue. It returns an application object wich contains lots of state about the application.*


*The following code is supposed to allow glue to run within a notebook without blocking.  [See here.](http://docs.glueviz.org/en/stable/python_guide/glue_from_python.html#using-qglue-with-the-ipython-jupyter-notebook)*

<br>
<div style='background-color:#eeffcc; padding:10px; border: 1px solid #e1e4e5'>
%gui qt
</div>
<br>
<div style='background-color:#eeffcc; padding:10px; border: 1px solid #e1e4e5'>
from glue import qglue<br>
app = qglue(jplsbdb = jplsbdb, neoca = neoca)
</div>
<br>

*But this does not work on my end. I recommend that you try it to see if you can make it work because it may simplify the workflow.  For now, I will run glue as a backgroundjob...*

In [4]:
from IPython.lib import backgroundjobs as bg

In [16]:
def runglue(*a, **kw):
    dc = DataCollection()
    for key in kw:
        print(key)
        if (key == 'links'):
            for link in links:
                dc.add_link(link)
        else:
            dc[key] = kw[key]
    app = GlueApplication(dc)
    app.start()
    return app

In [6]:
jobs = bg.BackgroundJobManager()

In [6]:
jobs.new(runglue, kw={'jplsbdb':jplsbdb})

jplsbdb
neoca

<BackgroundJob #0: <function runglue at 0x0000014ECCD81820>>




The parent attribute was deprecated in Matplotlib 3.3 and will be removed two minor releases later. Use self.canvas.setParent() instead.
  viewer._mpl_nav.parent = None
The parent attribute was deprecated in Matplotlib 3.3 and will be removed two minor releases later. Use self.canvas.parent() instead.
  viewer._mpl_nav.parent = None
Invalid limit will be ignored.
  self.axes.set_xlim(x_min, x_max)
Invalid limit will be ignored.
  self.axes.set_ylim(y_min, y_max)
Invalid limit will be ignored.
  self.axes.set_ylim(y_min, y_max)
Invalid limit will be ignored.
  self.axes.set_ylim(y_min, y_max)
Invalid limit will be ignored.
  self.axes.set_xlim(x_min, x_max)
The parent attribute was deprecated in Matplotlib 3.3 and will be removed two minor releases later. Use self.canvas.setParent() instead.
  viewer._mpl_nav.parent = None
The parent attribute was deprecated in Matplotlib 3.3 and will be removed two minor releases later. Use self.canvas.parent() instead.
  viewer._mpl_nav.parent = None


## Manipulate the data in glue

*Make a few plots and selections.  Let's try to reproduce (approximately) the image below.*

*If you're running as a background job, you will have to export your selected data to a csv file, and then read them into your notebook, in order to have access to the data products in the notebook.  (This is what I will do.)*  

*If you were able to run with the notebook magic command, you should be able to leave your glue session live and acess the data with* app.data_collection

## Next, lets explore the selected data in python


In [7]:
subset = pd.read_csv('data/exportedFromGlue/subset.csv')
subset

Unnamed: 0,pdes,Object,Close-Approach (CA) Date,CA Distance Nominal (AU),CA Distance Minimum (AU),V relative (km/s),V infinity (km/s),H (mag),Diameter (km),extra
0,307005,307005 (2001 XP1),2130.048183,0.02499,0.02499,28.78,28.78,18.0,335000.75,a0307005
1,530085,530085 (2010 XC11),2135.102164,0.03646,0.03646,29.48,29.48,18.7,240000.55,a0530085
2,2014 NK52,(2014 NK52),2138.95999,0.03816,0.03816,29.27,29.27,21.3,0.5,bK14N52K
3,2007 PF28,(2007 PF28),2139.339523,0.04345,0.04345,26.81,26.81,19.1,650000.0,bK07P28F
4,2001 YV3,(2001 YV3),2145.522198,0.04538,0.04538,22.91,22.91,20.6,325000.0,bK01Y03V
5,2014 OX299,(2014 OX299),2145.69994,0.02071,0.02071,30.31,30.31,19.5,540000.0,bK14OT9X
6,2021 HK12,(2021 HK12),2147.840965,0.04596,0.04596,28.15,28.15,17.7,385000.85,bK21H12K
7,2013 ED28,(2013 ED28),2148.239696,0.04957,0.04957,22.36,22.36,21.5,215000.0,bK13E28D
8,2017 WV13,(2017 WV13),2151.03162,0.03368,0.03368,21.98,21.98,21.2,245000.0,bK17W13V
9,2020 BC8,(2020 BC8),2154.660655,0.03136,0.03136,23.36,23.35,20.0,430000.0,bK20B08C


## Add some more elements to our dataset

*I am interested to see what are the closest and largest asteroids and when they may approach.*

- I want to link together these two data sets
- I am going to calculate the [perihelion distance](https://en.wikipedia.org/wiki/Apsis#Perihelion_and_aphelion) for each and convert that to km

In [7]:
from glue.core.link_helpers import LinkSame

In [8]:
dc = DataCollection()
dc['jplsbdb'] = jplsbdb
dc['neoca'] = neoca
d2 = Data(neoca=neoca)

In [9]:
dc['jplsbdb'].id['pdes']

pdes

In [10]:
link = LinkSame(dc['jplsbdb'].id['pdes'], dc['neoca'].id['pdes'])
links = [link]

In [17]:
jobs.new(runglue, kw={'jplsbdb':jplsbdb, 'neoca':neoca, 'links':links})

jplsbdb

<BackgroundJob #1: <function runglue at 0x000001408CA85CA0>>


neoca
links


The parent attribute was deprecated in Matplotlib 3.3 and will be removed two minor releases later. Use self.canvas.setParent() instead.
  viewer._mpl_nav.parent = None
The parent attribute was deprecated in Matplotlib 3.3 and will be removed two minor releases later. Use self.canvas.parent() instead.
  viewer._mpl_nav.parent = None


In [14]:
jobs[0].status

KeyError: -1

In [13]:
jobs[0].traceback()

[1;31m---------------------------------------------------------------------------[0m
[1;31mTypeError[0m                                 Traceback (most recent call last)
[1;32m~\Anaconda3\envs\glueviz-env\lib\site-packages\IPython\lib\backgroundjobs.py[0m in [0;36mcall[1;34m(self)[0m
[0;32m    489[0m [1;33m[0m[0m
[0;32m    490[0m     [1;32mdef[0m [0mcall[0m[1;33m([0m[0mself[0m[1;33m)[0m[1;33m:[0m[1;33m[0m[1;33m[0m[0m
[1;32m--> 491[1;33m         [1;32mreturn[0m [0mself[0m[1;33m.[0m[0mfunc[0m[1;33m([0m[1;33m*[0m[0mself[0m[1;33m.[0m[0margs[0m[1;33m,[0m [1;33m**[0m[0mself[0m[1;33m.[0m[0mkwargs[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[0m
[1;32m~\AppData\Local\Temp/ipykernel_27140/1862910458.py[0m in [0;36mrunglue[1;34m(*a, **kw)[0m
[0;32m      7[0m                 [0mdc[0m[1;33m.[0m[0madd_link[0m[1;33m([0m[0mlink[0m[1;33m)[0m[1;33m[0m[1;33m[0m[0m
[0;32m      8[0m         [1;32melse[0m[1;33m:[0m[1

In [15]:
jobs

<IPython.lib.backgroundjobs.BackgroundJobManager at 0x1408c538df0>