## `penquins`: a `python` client library for `kowalski` 

*`penquins` - Processing ENormous Queries of ztf Users INStantaneously*

In this tutorial, we will explore the details of how to programmatically query `kowalski` with `python >3.6`. We will also cover the best practices and pro tips.

### Installation

[`penquins`](https://github.com/dmitryduev/kowalski/blob/master/penquins.py) is very lightweight and only depends on `pymongo` and `requests`. Use `pip` to install it into your environment:

```bash
pip install git+https://github.com/dmitryduev/kowalski.git
```

### Quick start

```python
import json

from IPython.display import display

from penquins import Kowalski
```

To avoid hard-coding your credentials in the notebook, let us store them in a `json` file `secrets_penquins.json`:

```python
secrets = {
    "kowalski": {
        "username": "YOUR_USERNAME",
        "password": "YOUR_PASSWORD"
    }
}

with open('secrets_penquins.json', 'w') as f:
    json.dump(secrets, f)
```
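Note that `open(..., 'w')` creates the file with your default permissions, which on a shared machine may leave it readable by other users. Below is a minimal POSIX-oriented sketch that writes the same structure but restricts the file to its owner; `write_secrets` is a hypothetical helper, not part of `penquins`:

```python
import json
import os


def write_secrets(path, username, password):
    """Write Kowalski credentials to a JSON file readable and writable only by its owner."""
    secrets = {"kowalski": {"username": username, "password": password}}
    # Create (or truncate) the file, requesting owner-only read/write permissions for new files
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(secrets, f)
    # The mode passed to os.open does not apply to files that already exist, so set it explicitly
    os.chmod(path, 0o600)
    return secrets
```

For example, `write_secrets('secrets_penquins.json', 'YOUR_USERNAME', 'YOUR_PASSWORD')` would replace the cell above. On Windows, `os.chmod` can only toggle the read-only flag, so this sketch does not restrict access there.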

Load the credentials and initialize a `Kowalski` object:

```python
with open('secrets_penquins.json', 'r') as f:
    secrets = json.load(f)

k = Kowalski(username=secrets['kowalski']['username'], password=secrets['kowalski']['password'])
```

By default, the `Kowalski` object will try to connect to the `kowalski` instance running at Caltech using the following parameters:
```python
protocol='https', host='kowalski.caltech.edu', port=443
```

You can set these explicitly if you are connecting to another instance of `kowalski`.

Pass `verbose=True` when creating the `Kowalski` object if you want more feedback from `kowalski`.
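If you switch between instances, it can help to keep the connection parameters in one place. A minimal sketch, assuming the `Kowalski` constructor accepts `protocol`, `host`, `port`, and `verbose` as keyword arguments alongside `username` and `password`, as described above; `kowalski_kwargs` and the `localhost`/`4000` values in the usage comment are hypothetical:

```python
# Defaults of the kowalski instance at Caltech, as listed above
DEFAULT_CONNECTION = {"protocol": "https", "host": "kowalski.caltech.edu", "port": 443}


def kowalski_kwargs(secrets, verbose=False, **connection):
    """Assemble keyword arguments for Kowalski(...) from loaded secrets.

    Any of protocol/host/port passed in `connection` overrides the default;
    unknown names raise TypeError to catch typos early.
    """
    unknown = set(connection) - set(DEFAULT_CONNECTION)
    if unknown:
        raise TypeError(f"unexpected connection parameters: {sorted(unknown)}")
    kwargs = {**DEFAULT_CONNECTION, **connection}
    kwargs["username"] = secrets["kowalski"]["username"]
    kwargs["password"] = secrets["kowalski"]["password"]
    kwargs["verbose"] = verbose
    return kwargs


# Usage (hypothetical local instance):
# k = Kowalski(**kowalski_kwargs(secrets, protocol='http', host='localhost', port=4000, verbose=True))
```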

Let us check that the connection is healthy:

```python
connection_ok = k.check_connection()
print(f'Connection OK: {connection_ok}')
```

```
Connection OK: True
```


Now let us construct a simple query that returns the ZTF alert with a given `candid` from the `ZTF_alerts` collection (excluding the `_id` field) and run it.

_Please see below for more info on the available query types and how to (efficiently) construct them_.

```python
q = {"query_type": "find",
     "query": {
         "catalog": "ZTF_alerts",
         "filter": {"candid": 714287740515015072},
         "projection": {"_id": 0},
     }
     }
r = k.query(query=q)
display(r)
```

```
{'user': 'admin',
 'kwargs': {},
 'status': 'done',
 'result_data': {'query_result': [{'schemavsn': '3.2',
    'publisher': 'ZTF (www.ztf.caltech.edu)',
    'objectId': 'ZTF18acrkaks',
    'candid': 714287740515015072,
    'candidate': {'jd': 2458468.7877431,
     'fid': 2,
     'pid': 714287740515,
     'diffmaglim': 20.48577308654785,
     'pdiffimfilename': 'ztf_20181216287720_000510_zr_c02_o_q2_scimrefdiffimg.fits',
     'programpi': 'Kulkarni',
     'programid': 1,
     'candid': 714287740515015072,
     'isdiffpos': 't',
     'tblid': 72,
     'nid': 714,
     'rcid': 5,
     'field': 510,
     'xpos': 223.6427001953125,
     'ypos': 2978.6474609375,
     'ra': 79.4757666,
     'dec': 8.9142668,
     'magpsf': 20.034793853759766,
     'sigmapsf': 0.18917390704154968,
     'chipsf': 1.4545427560806274,
     'magap': 19.83009910583496,
     'sigmagap': 0.24449999630451202,
     'distnr': 2.4990158081054688,
     'magnr': 19.05699920654297,
     'sigmagnr': 0.10700000077486038,
    ...
```

This query blocks the execution of your program until it receives the result or hits the default timeout of _24 hours_. You can set a custom timeout in milliseconds, after which the query will be killed on the server:

```python
q['kwargs'] = {'max_time_ms': 1}
r = k.query(query=q)
r
```

```
{'user': 'admin',
 'kwargs': {'max_time_ms': 1},
 'status': 'done',
 'result_data': {'query_result': [{'candid': 714287740515015072}]}}
```
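Since `max_time_ms` is in milliseconds, the 24-hour default corresponds to 86,400,000 ms. A minimal sketch of two hypothetical helpers (not part of `penquins`) that build a `find` query in the format used above and attach a timeout only when one is given; the default projection simply mirrors the example:

```python
import copy


def hours_to_ms(hours):
    """Convert a timeout in hours to the milliseconds expected by max_time_ms."""
    return int(hours * 3600 * 1000)


def find_query(catalog, filter_, projection=None, max_time_ms=None):
    """Build a `find` query dict like the one above."""
    q = {
        "query_type": "find",
        "query": {
            "catalog": catalog,
            # Deep copies keep later edits of the caller's dicts from leaking into the query
            "filter": copy.deepcopy(filter_),
            "projection": copy.deepcopy(projection) if projection is not None else {"_id": 0},
        },
    }
    if max_time_ms is not None:
        # The server kills the query once it has run for this many milliseconds
        q["kwargs"] = {"max_time_ms": int(max_time_ms)}
    return q


# Usage:
# r = k.query(query=find_query("ZTF_alerts", {"candid": 714287740515015072}, max_time_ms=hours_to_ms(1)))
```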

Starting from `penquins` version `1.0.0`, queries are no longer registered in the database and saved to disk _by default_,
which significantly speeds up their execution.
This behavior can be overridden:

```python
qu = {"query_type": "general_search", 
      "query": "db['ZTF_alerts'].find_one({}, {'_id': 1})",
      "kwargs": {"save": True}}
```

Executing this query will also block the execution of your program. You can enqueue a query on the server like this:

```python
qu = {"query_type": "general_search", 
      "query": "db['ZTF_alerts'].find_one({}, {'_id': 1})",
      "kwargs": {"enqueue_only": True}}

r = k.query(query=qu)
qid = r['query_id']
```

Executing this query will return a query ID that can then be used to retrieve the query result:

```python
result = k.get_query(query_id=qid, part='result')
```

You can also retrieve the original query:

```python
result = k.get_query(query_id=qid, part='task')
```

Or delete the query from Kowalski:

```python
result = k.delete_query(query_id=qid)
```
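The enqueue, retrieve, and delete steps above can be wrapped in two small helpers. This is a sketch under assumptions: `general_search_query` and `fetch_and_delete` are hypothetical names, it assumes the result is already available when `get_query` is called (a long-running query may need to be checked again later), and it passes the returned objects through unchanged because their format is not described here:

```python
def general_search_query(query, save=False, enqueue_only=False):
    """Build a general_search query dict, adding kwargs only when they are set."""
    q = {"query_type": "general_search", "query": query}
    kwargs = {}
    if save:
        kwargs["save"] = True
    if enqueue_only:
        kwargs["enqueue_only"] = True
    if kwargs:
        q["kwargs"] = kwargs
    return q


def fetch_and_delete(k, query_id):
    """Retrieve the task and result of an enqueued query, then delete it from kowalski.

    Uses only k.get_query(query_id=..., part=...) and k.delete_query(query_id=...).
    """
    task = k.get_query(query_id=query_id, part="task")
    result = k.get_query(query_id=query_id, part="result")
    # Delete only after both parts were retrieved, so a failed fetch can be retried
    k.delete_query(query_id=query_id)
    return task, result


# Usage:
# r = k.query(query=general_search_query("db['ZTF_alerts'].find_one({}, {'_id': 1})", enqueue_only=True))
# task, result = fetch_and_delete(k, r['query_id'])
```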

### Query examples

*Sample cone search 1*

Get all objects from the `ZTF_alerts` catalog within `8` arcseconds of each of two sky positions,
`(173.5155088, 33.0845502)` and `(172.1345, 30.5412)`,
and return their `objectId`, `candidate.rcid`, and `candidate.rb` fields, along with `_id`, which MongoDB includes unless it is explicitly excluded.
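In the query below, `radec` is given as a string containing a list of `(RA, Dec)` tuples in degrees. A hypothetical helper (not part of `penquins`) that builds this string from numeric pairs, with a basic range check:

```python
def radec_string(positions):
    """Format (RA, Dec) pairs in degrees as the string used in the `radec` field."""
    pairs = [(float(ra), float(dec)) for ra, dec in positions]
    for ra, dec in pairs:
        # RA must lie in [0, 360) and Dec in [-90, 90] degrees
        if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
            raise ValueError(f"position out of range: ({ra}, {dec})")
    # str() of a list of float tuples gives the "[(ra, dec), ...]" form shown below
    return str(pairs)
```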

```python
q = {"query_type": "cone_search",
     "object_coordinates": {
         "radec": "[(173.5155088, 33.0845502), (172.1345, 30.5412)]",
         "cone_search_radius": "8",
         "cone_search_unit": "arcsec"
     },
     "catalogs": {
         "ZTF_alerts": {
             "filter": {},
             "projection": {
                 "objectId": 1,
                 "candidate.rcid": 1,
                 "candidate.rb": 1
             }
         }
     }
     }
r = k.query(query=q)
data = r['result_data']
display(data)
```

```
{'ZTF_alerts': {'(173_5155088, 33_0845502)': [{'_id': '404387932115015001_ZTF18aabcyiy',
    'objectId': 'ZTF18aabcyiy',
    'candidate': {'rcid': 21, 'rb': 0.3766666650772095}}],
  '(172_1345, 30_5412)': []}}
```
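The result is keyed first by catalog and then by sky position, with the decimal points in the position keys replaced by underscores; positions without matches map to empty lists. A minimal sketch (hypothetical helper, not part of `penquins`) that flattens this structure into one row per matched document:

```python
import ast


def flatten_cone_search(result_data):
    """Turn {catalog: {"(ra, dec)": [docs]}} into a list of (catalog, ra, dec, doc) rows."""
    rows = []
    for catalog, by_position in result_data.items():
        for key, docs in by_position.items():
            # Restore decimal points, then parse the "(ra, dec)" key safely as a tuple
            ra, dec = ast.literal_eval(key.replace("_", "."))
            for doc in docs:
                rows.append((catalog, ra, dec, doc))
    return rows
```

For example, `rows = flatten_cone_search(r['result_data'])` gives one `(catalog, ra, dec, document)` tuple per match, which is convenient for building a table.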