# ZVAR API v0.2.1 - Example(s)

### **!!! This is an updated version of [this notebook](https://colab.research.google.com/drive/1nyl_dMzFKcWStCsKXEQhDBjXlzYlIJrF?usp=sharing) !!!**

### **API Changelog (v0.2.1, 2025/05/15):**
- **PanSTARRs IDs are now returned as strings and not BigInt/Int64, since this isn't compatible with some python packages and javascript library. This way, ZVAR IDs are also Strings just like all other external catalogs we query against. However, the API can still take these IDs as Integers, to avoid any breaking changes**

### **API Changelog (v0.2.0):**
- **HTTP methods sending a JSON body are now all of type POST (and not GET), to comply with the HTTP standards.**
- **You can now look for ZVAR sources that match with sources from other catalogs (based on the other catalogs' IDs).**
- **You can now retrieve the image difference forced photometry lightcurves in flux-space.**
- **Full Rest API documentation available [here](http://hypernova.caltech.edu:5000/docs)**


### **Notebook Changelog**
- **The `Client` Python class has been updated to support the new endpoint, the GET - > POST method changes, and the photometry format option. This is the only cell that has been edited from the first notebook, everything else is identical.**
- **A new section demoing the `external_xmatch` (to find ZVAR sources starting from the IDs of sources in other catalogs) has been ad**
- **A new section has been added at the very end of the notebook that shows the photometry format in action, add gives you some tips on how to convert the data returned by the API to dataframes and vice versa.**
<br/>
<br/>

---

<br/>

In this notebook, I compiled a number of examples of how to use the ZVAR API's current version. For context, the ZVAR API is a RESTful (HTTP) API that allows users to interact with the ZVAR data products progammatically, exposing data in a structured manner without requiring anyone to download large files. The API is designed to be easy to use, and as simple as possible. The main goal is to provide access to objects in ZVAR via ID-based or position-based queries, to download their lightcurves, and to cross-match them with other catalogs.
The API is currently in its early stages, and we are working on adding more features and improving the documentation. We welcome any feedback or suggestions for improvements.

#### **Notes:**

- **The API requires authentication, reach out to `tdulaz@caltech.edu` for access.**

- **This has been designed to showcase the API specifically, the existing `zvartools` python package is currently being edited to make use of this API rather than require you to download large files. An update will be sent once these are updated.**

- **All the plots and methods shown here will be found in the package, most variables that are regular dictionaries here will be replaced by Python classes, for convenience. Except if you think that dictionaries are best, just let me know!**

<br/>

---

### Installing Dependencies

#### Option 1: Google colab

If you are using Google Colab, you don't need to install anything. But, you do have to run the following cell to enable interactive plots.

**NOTE: If you aren't using colab, remember to comment out the next cell.**

In [1]:
# Code Block # 1
# # comment the code in this cell if you are not using colab
# from google.colab import output
# from google.colab import userdata
# output.enable_custom_widget_manager()

#### Option 2: Locally

Here, you'll need to:
1. Create a python virtual environment with the tool of your choice, or re-use an existing one (I suggest creating a dedicated one, always easier in the long run).
2. Install the python packages required by this notebook. So the notebook self contained and does not require files to go with it (google colab removes files after some time if they are not notebooks), here is the command you can run in your terminal to install them (replace pip by whatever install your python venv system uses):

```bash
pip install numpy requests jupyter ipython tqdm 'plotly<6' 'ipywidgets<9'
```

### Importing dependencies

In [12]:
# Code Block # 2
import json
import os
from typing import Tuple
import pandas as pd

import numpy as np
import plotly.graph_objects as go
import requests
from IPython.display import display, Javascript
from ipywidgets import widgets
from tqdm import tqdm

Next, let's just set our username, password, and url of the API. For the username, we strongly recommend using environment variables. That way you can safely share your notebooks without leaking **your** credentials (provided that of course, the person you are sharing it with has been granted access ZVAR to begin with). If you are using google colab you can simply add your credentials in the menu on the left (key icon).

#### Option 1: Google Colab

In [3]:
# Code Block # 3
# Comment the code in this cell if you are not using colab
# username = userdata.get('ZVAR_USERNAME')
# password = userdata.get('ZVAR_PASSWORD')

#### Option 2: Locally

In [2]:
# Code Block # 4
# uncomment the code in this cell if you are not using colab

# username = os.environ.get('ZVAR_USERNAME')
# password = os.environ.get('ZVAR_PASSWORD')

# or simply
username = "yngveha"
password = "MondayTuesday@456545"

Let's quickly check that the credentials were read successfully:

In [3]:
# Code Block # 5
if username in [None, ""] or password in [None, ""]:
  raise ValueError("Credentials incorrect or missing")

### Accessing the API

We create a convenient Python client for our API, to abstract the HTTP methods and expose them in a more "pythonic-friendly" manner, to avoid having to deal with the raw HTTP requests and responses every time. We implement the following methods:
- cone_search (single position)
- cone_searches (multiple positions)
- get_object (single PSID)
- get_objects (multiple PSIDs)
- get_photometry (multiple PSIDs)
- get_processing_logs (get list of already processed files)

In [4]:
# Code Block # 6
class Client:
    def __init__(self, username, password, host="https://hypernova.caltech.edu/api"):
        self.host = host
        self.username = username
        self.password = password

    def cone_search(self, ra, dec, radius, **kwargs) -> dict:
        params={
            "positions": [{"ra": ra, "dec": dec, "radius": radius, "name": "target"}],
            **kwargs
        }
        response = requests.post(
            f"{self.host}/cone_search",
            json=params,
            auth=(self.username, self.password),
        )
        if response.status_code != 200:
            print(response.text)
            return None
        return response.json()["target"]

    def cone_searches(self, positions, radius, verbose=True, **kwargs) -> dict:
        if not isinstance(positions, list):
            raise ValueError("positions should be a list of dict")
        matches = {}
        # batch by max 1000
        for i in tqdm(range(0, len(positions), 1000), disable=not verbose):
            params = {
                "positions": positions[i:i+1000],
                "radius": radius,
                **kwargs
            }
            response = requests.post(
                f"{self.host}/cone_search",
                json=params,
                auth=(self.username, self.password),
            )
            if response.status_code != 200:
                print(response.text)
                return None
            matches.update(response.json())
        return matches

    def get_photometry(self, psids, bands=["g", "r"], format="mag", verbose=True) -> dict:
        if isinstance(psids, int | str):
          psids = [psids]
        psids = list(set(map(int, psids)))
        photometry = {}
        # max 1000 psids per request
        for i in tqdm(range(0, len(psids), 500), disable=not verbose):
            response = requests.post(
                f"{self.host}/photometry",
                json={"psids": list(set(psids[i:i+500])), "bands": bands, "format": format},
                auth=(username, password),
            )
            if response.status_code != 200:
                print(response.text)
                return None
            photometry.update(response.json())

        return photometry

    def get_object(self, psid) -> dict:
        if not isinstance(psid, str | int):
            raise ValueError("psid should be a string or int")
        response = requests.get(
            f"{self.host}/object/{str(psid)}",
            auth=(username, password),
        )
        if response.status_code != 200:
            print(response.text)
            return None
        return response.json()

    def get_objects(self, psids) -> dict:
        if not isinstance(psids, list):
            raise ValueError("psids should be a list of str or int")
        psids = list(set(psids))
        objects = []
        # max 1000 psids per request
        for i in tqdm(range(0, len(psids), 1000)):
            response = requests.post(
                f"{self.host}/objects",
                json={"psids": psids[i:i+1000]},
                auth=(username, password),
            )
            if response.status_code != 200:
                print(response.text)
                return None
            objects.extend(response.json())
        return objects

    def get_processing_logs(self) -> list:
        response = requests.get(
            f"{self.host}/processing_logs",
            auth=(username, password),
        )
        if response.status_code != 200:
            print(response.text)
            return None
        return response.json()

    def get_available_fields(self) -> list:
        logs = self.get_processing_logs()
        if not logs:
            return []
        fields = set([log["field"] for log in logs])
        return sorted(list(fields))

    def xmatch(self, psids, catalog, radius, closest=True):
        response = requests.post(
            f"{self.host}/xmatch",
            json={"psids": psids, "catalog": catalog, "radius": radius, "closest": closest},
            auth=(username, password),
        )
        if response.status_code != 200:
            print(response.text)
            return None
        return response.json()

    def external_xmatch(self, ids, catalog, radius, closest=True):
        response = requests.post(
            f"{self.host}/external_xmatch",
            json={"ids": ids, "catalog": catalog, "radius": radius, "closest": closest},
            auth=(username, password),
        )
        if response.status_code != 200:
            print(response.text)
            return None
        return response.json()


In [5]:
# Code Block # 7
client = Client(username, password)

Let's try one of the methods right now to check that the connection is working. For that we'll just use the `get_processing_logs` method, which should return a list of all the files that have been processed by the API (all objects in these files with valid/enough photometry are available to query with the other methods). This is a good way to get the list of available fields. To make it a little more convenient, we also provide a `get_available_fields` method.

In [6]:
# Code Block # 8
logs = client.get_processing_logs()
print(f"Found {len(logs)} field/ccd/quad/band processed")

# could also use the get_available_fields() method directly
available_fields = client.get_available_fields()
print(f"Found {len(available_fields)} fields")

Found 46929 field/ccd/quad/band processed
Found 371 fields


Next, we define a few methods to cleanup the photometry and to prepare it for plotting. This allows us to make things a little more convenient and numpy-friendly, while allowing for optional filtering like removing low-SNR points, removing lightcurves/bands with not enough datapoints, ...

The API returnes structured data for photometry in the form of a list of dictionaries, which looks like:
```js
[
    {
        "time": 2458263.981924, // in bjd
        "mag": 16.531168,
        "magerr": 0.0153942695,
        "limmag": null, // limiting magnitude is snr < 1
        "snr": 70.528595,
        "band": "g",
        "field": 648,
        "ccd": 2,
        "quad": 2
    },
    ...
]
```

To allow for easy array operations with numpy, we sanitize the data then reformat it into 4 numpy arrays of equal length: `time, mag, magerr, band`

In [7]:
# Code Block # 9
def dict_to_nparrays(data):
    return (
        np.array([d["time"] for d in data]),
        np.array([d["mag"] for d in data]),
        np.array([d["magerr"] for d in data]),
        np.array([d["band"] for d in data])
    )

def remove_nondetections(
    time: np.ndarray, flux: np.ndarray, flux_err: np.ndarray, filter: np.ndarray
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    mask = np.isnan(flux)
    return (
        time[~mask],
        flux[~mask],
        flux_err[~mask],
        filter[~mask],
    )

def prepare_photometry(data, min_snr=3, min_points=50):
    # non detection cut
    data = [d for d in data if d["mag"] is not None]

    # snr cut
    data = [d for d in data if d["snr"] and d["snr"] >= min_snr]

    # min points per band cut
    bands = np.array([d["band"] for d in data])
    unique_bands = np.unique(bands)
    nb_per_band = np.array([np.sum(bands == band) for band in unique_bands])
    bands = unique_bands[nb_per_band >= min_points]
    data = [d for d in data if d["band"] in bands]

    # convert to np arrays
    time, mag, magerr, band = dict_to_nparrays(data)
    mag = np.array([np.nan if f is None else f for f in mag])
    magerr = np.array([np.nan if f is None else f for f in magerr])

    return time, mag, magerr, band

### Conesearches

Now, let's query the API so we have some example data for plotting. We use the following cells to demonstate the cone search method, which allows you to search for objects in a given radius around a given position, and can take optional parameters. Here just to show everything that is available at the moment via the API, we'll do a 2 arcmin (max is 10) conesearch and filter on:
- `min/max period` between 5 and 10 hours (in days), with a `min significance` of 15.0
- `amplitude` between 0.7 and 1.0 mag (n a given lightcurve by field/ccd/quad/band)
- `objtype` = 1 (filter on the DoPHOT objtype)
- `min_exposures` = 100 (minimum number of datapoints in a given lightcurve by field/ccd/quad/band)

*Notes:*

- So far the queries are `OR` queries, by that I mean that if a given PanSTARRs object has multiple periods (which is often the case), it will be returned if any of the periods match the criteria. Similarly, since an object has `N` lightcurves (where `N` is the number of field/ccd/quad/bands individual fields from ZVAR's image difference forced photometry), it will be returned if any of the lightcurves match the criteria. This may change with time, but for now an `OR` allows us to return more objects which is better than missing some of them due to a strict `AND` query.*

- Only positions and IDs are indexes in the database, so the queries are fast. The other fields are not indexed, so they are slower to query. We are working on indexing them, but for now the queries are still fast enough for most use cases.

- The values of some of the extra parameters (more specifically amplitude and magnitude ranges) are sensible to outliers and might change with time as we improve the data quality and the methods to compute them.


The conesearches require names associated to your positions (at least when using the batch method), so you can map the results back to the original positions. The result is a dictionary with the names as keys and the list of matches as values.

In [None]:
# Code Block # 10
min_period = 5/24  # 5 hours
max_period = 10/24  # 10 hours
objects = client.cone_search(
    ra=0.8365280032157898, dec=30.937196731567383,
    radius=1 * 60
    # objtype=1,
    # min_period=min_period, max_period=max_period,
    # min_significance=15,
    # min_exposures=100,
    # min_amplitude=0.7, max_amplitude=1.0,
)
# print(f"Total objects found: {len(objects)}")
print(objects)

[{'psid': '145130008248670307', 'ra': 0.8248550295829773, 'dec': 30.941429138183594, 'objtype': 6, 'matchfiles': [{'field': 648, 'ccd': 2, 'quad': 2, 'band': 'g', 'nb_exposures': 474, 'idx': 6010, 'amplitude': 0.02156774, 'min_mag': 16.482143, 'max_mag': 16.720327}, {'field': 648, 'ccd': 2, 'quad': 2, 'band': 'r', 'nb_exposures': 652, 'idx': 6025, 'amplitude': 0.022062214, 'min_mag': 16.400211, 'max_mag': 16.461382}], 'periods': [{'field': 648, 'ccd': 2, 'quad': 2, 'band': 'g', 'period': 0.010445044381397761, 'significance': 7.725498676300049, 'string_length': 8.441972082698374, 'bin': 5}, {'field': 648, 'ccd': 2, 'quad': 2, 'band': 'g', 'period': 0.1297460200029486, 'significance': 9.0879487991333, 'string_length': 8.454847262469992, 'bin': 10}, {'field': 648, 'ccd': 2, 'quad': 2, 'band': 'g', 'period': 0.0035774890258711803, 'significance': 10.638250350952148, 'string_length': 8.255880870319226, 'bin': 20}, {'field': 648, 'ccd': 2, 'quad': 2, 'band': 'r', 'period': 0.0062884482100673

In [13]:
# Code Block # 10A
objects

[{'psid': '145130008248670307',
  'ra': 0.8248550295829773,
  'dec': 30.941429138183594,
  'objtype': 6,
  'matchfiles': [{'field': 648,
    'ccd': 2,
    'quad': 2,
    'band': 'g',
    'nb_exposures': 474,
    'idx': 6010,
    'amplitude': 0.02156774,
    'min_mag': 16.482143,
    'max_mag': 16.720327},
   {'field': 648,
    'ccd': 2,
    'quad': 2,
    'band': 'r',
    'nb_exposures': 652,
    'idx': 6025,
    'amplitude': 0.022062214,
    'min_mag': 16.400211,
    'max_mag': 16.461382}],
  'periods': [{'field': 648,
    'ccd': 2,
    'quad': 2,
    'band': 'g',
    'period': 0.010445044381397761,
    'significance': 7.725498676300049,
    'string_length': 8.441972082698374,
    'bin': 5},
   {'field': 648,
    'ccd': 2,
    'quad': 2,
    'band': 'g',
    'period': 0.1297460200029486,
    'significance': 9.0879487991333,
    'string_length': 8.454847262469992,
    'bin': 10},
   {'field': 648,
    'ccd': 2,
    'quad': 2,
    'band': 'g',
    'period': 0.0035774890258711803,
    's

Here's an example of a simple conesearch without any extra parameters:

In [17]:
# Code Block # 11
all_objects = client.cone_search(
    ra=0.8365280032157898, dec=30.937196731567383,
    radius=10 * 60,
)
print(f"Total objects found: {len(all_objects)}")

Total objects found: 2017


As you can see it didn't take that much longer (if not less, when non-indexes fields aren't used in the query) even if we retrieve a lot more objects. While you can still use the optional parameters. We strongly recomment using simpler conesearches or ID-based queries as you get started with the API and data product. You can always refine your queries later on, as we refine the data product and make more options available to you. You can always run simple cone searches with the API, then do more complex filtering in python directly.

Last but not least, here's an example with a batch of positions

In [18]:
# Code Block # 12
positions = [
    {"name": "mycoolobject", "ra": 0.8365280032157898, "dec": 30.937196731567383}
]

batch_objects = client.cone_searches(
    positions=positions,
    radius=10 * 60,
)

results_my_cool_object = batch_objects["mycoolobject"]
print(f"Total objects found: {len(results_my_cool_object)}")

100%|██████████| 1/1 [00:01<00:00,  1.55s/it]

Total objects found: 2017





### ID-based queries

ZVAR generates lightcurves for all PanSTARRs objects found in ZTF, therefore objects are indexes (and unique) by their PanSTARRs ID (PSID). The API allows you to query objects by their PSID, one at a time or in batches, and returns a list of objects. Here is an example of a batch query with a list of PSIDs:

In [26]:
# Code Block # 13
psid = 126541217361397171
print(f"PSID: {psid}")

objects_by_psid = client.get_objects([psid])

assert len(objects_by_psid) == 1 # should only have one object per psid


PSID: 126541217361397171


100%|██████████| 1/1 [00:00<00:00,  2.57it/s]




---

### Lightcurves

Let's retrieve the lightcurves of the object(s) we got from our first cone search. The result of the `get_photometry` method is a dictionary with the PSID as key and a list of photometry points as value. The photometry points are in the same format as the one shown above.

*Notes:*

- **While the PSID is an integer, APIs often prefer returning dictionary keys are strings. This is true for all the different methods of the API that return such a dictionary (lightcurves and xmatch endpoints). Something to keep in mind, as without converting to string you might not be able to find your data as expected.**

In [None]:
# Code Block # 14
lightcurves = client.get_photometry(
    psids={c["psid"] for c in objects},
    bands=["g", "r"]
)
print(f"Total lightcurves found: {len(lightcurves)}")

# let's look for our object, notice the PSID that we cast to str
object_lightcurve = lightcurves.get(str(objects[0]["psid"]), None)
if object_lightcurve is None:
    raise ValueError(f"Object {objects[0]['psid']} not found in lightcurves")

print(f"Found {len(object_lightcurve)} points in the lightcurve for object {objects[0]['psid']}")
print(object_lightcurve[0])

# let's call the prepare_photometry function to sanitize the data and convert to numpy arrays
time, mag, magerr, band = prepare_photometry(object_lightcurve)

print(f"Photometry is composed of 4 numpy arrays, of length {len(time)}")

100%|██████████| 1/1 [00:00<00:00,  1.54it/s]

Total lightcurves found: 1
Found 1126 points in the lightcurve for object 145120008365555264
{'time': 2458263.981924, 'mag': 17.41657, 'magerr': 0.036399588, 'limmag': None, 'flux': 345.24374, 'fluxerr': 95.39911, 'snr': 29.828255, 'band': 'g', 'field': 648, 'ccd': 2, 'quad': 2}
Photometry is composed of 4 numpy arrays, of length 1126





### Objects content

Any ZVAR object returned by the API has:
- `psid`: the PanSTARRs ID of the object
- `ra`: the RA of the object
- `dec`: the Dec of the object
- `objtype`: the DoPHOT object type
- `matchfiles`: the list of files that contain lightcurves for that object (by field/ccd/quad/band)
- `periods`: the list of periods for that object (in days, up to 3 periods per matchfile, one per FPW bin: 5, 10, 20)

`matchfiles` and `periods` contain some additional quantities computed by the API, identical to those used earlier to filter on the results of conesearches. Let's print an object to see what it looks like:

In [None]:
# Code Block # 15
# we'll use the json package to pretty print the data
print(json.dumps(objects[0], indent=4))

{
    "psid": "145120008365555264",
    "ra": 0.8365280032157898,
    "dec": 30.937196731567383,
    "objtype": 1,
    "matchfiles": [
        {
            "field": 648,
            "ccd": 2,
            "quad": 2,
            "band": "g",
            "nb_exposures": 474,
            "idx": 6121,
            "amplitude": 0.85447776,
            "min_mag": 17.216894,
            "max_mag": 18.362774
        },
        {
            "field": 648,
            "ccd": 2,
            "quad": 2,
            "band": "r",
            "nb_exposures": 652,
            "idx": 6136,
            "amplitude": 0.71687734,
            "min_mag": 16.26603,
            "max_mag": 17.217215
        }
    ],
    "periods": [
        {
            "field": 648,
            "ccd": 2,
            "quad": 2,
            "band": "g",
            "period": 0.2726629621689033,
            "significance": 48.87137985229492,
            "string_length": 11.694456767129525,
            "bin": 5
        },
        {

---

### Plotting folded lightcurves

Here is THE big piece of code in that notebook, the `plot_folded_lightcurve` method. Python just isn't built for interactivity, and the plot designed here should only be used with a limited number of objects as Jupyter Notebooks have a tendency to be pretty memory inefficient especially with interactive plots (which aren't just static images, but contain all the data they need to display).

This plot comes with a number of features:
- It allows you to select you the period to plot, from the multiple potential periods that are returned by the API for an object.
- It allows you to apply binning to your data (set to 0 to deactivate).
- It allows you to quickly multiple the period used by 2 or 3, and to divide it by 2 or 3.
- You can choose to display the phase one or twice.
- You can easily reset your multipliers and go back to the original period with the Reset button.
- 2 buttons allow you to copy the PSID+period to the clipboard, or to get a link for that object on SIMBAD (doesn't work in `vscode`, only in `jupyter` via your browser).

**Note: All of this will be found in the `zvartools` package once it is updated to support this API.**

In [None]:
# Code Block # 16
band2color = {"g": "green", "r": "red", "i": "orange"}

def period2text(period):
    if period < 1/24:
        return f"{(period * 60 * 24):.2f} minutes"
    elif period < 1:
        return f"{(period * 24):.2f} hours"
    return f"{(period):.2f} days"

def link_button(title: str, icon: str, text: str) -> widgets.Widget:
	button = widgets.Button(description=title, icon=icon)
	output = widgets.Output(layout=widgets.Layout(display="none"))
	copy_js = Javascript(f"navigator.clipboard.writeText({json.dumps(text)})")

	def on_click(_: widgets.Button) -> None:
		output.clear_output()
		output.append_display_data(copy_js)
	button.on_click(on_click)

	return widgets.Box((button, output))

def plot_folded_lightcurve(obj, photometry, periods, bands=["g", "r"], twice=False, bins=0):
    psid = obj["psid"]
    ra, dec = obj["ra"], obj["dec"]
    bands = list({band.lower() for band in bands})
    # sort the periods by significance ascending
    periods = sorted(periods, key=lambda x: x["significance"])
    # deduplicate by period (keep the most significant for each period that appears multiple times)
    periods = {period["period"]: period for period in periods}.values()

    # now sort by significance descending
    periods = sorted(periods, key=lambda x: -x["significance"])

    period_options = [(f"{period2text(period['period'])} (sig: {period['significance']:.2f}, strlen: {period['string_length']:.2f}, bin: {period['bin']:.2f}, band: {period['band']})", period["period"]) for period in periods]
    period = periods[0]["period"]

    time, mag, magerr, filters = remove_nondetections(
        photometry[0], photometry[1], photometry[2], photometry[3]
    )

    mask = np.isin(filters, bands)

    try:
        time = time[mask]
        mag = mag[mask]
        magerr = magerr[mask]
        filters = filters[mask]
    except IndexError:
        raise ValueError(
            f"No data points found in the specified bands for {int(psid)}"
        )

    if len(time) == 0:
        raise ValueError(
            f"No valid data points found to plot_folded_lightcurve for {int(psid)}"
        )

    period_selector = widgets.Dropdown(
        options=period_options,
        value=period,
        description='Period:',
    )
    triple_button = widgets.Button(description="x3")
    double_button = widgets.Button(description="x2")
    half_button = widgets.Button(description="x1/2")
    divide_by_3_button = widgets.Button(description="x1/3")
    reset_button = widgets.Button(description="Reset")
    twice_checkbox = widgets.Checkbox(value=twice, description="Phase x2", disabled=False)
    # add a text input to select the number of bins to use for binning
    bin_input = widgets.BoundedIntText(value=bins, min=0, max=1000, step=1, description="Bins")

    # the copy button just needs to copy the psid and currently selected period to the clipboard
    copy_button = widgets.Button(description="Copy", icon="copy")
    copy_output = widgets.Output(layout=widgets.Layout(display="none"))
    copy_box = widgets.Box((copy_button, copy_output))

    simbad_button = link_button(
        "Simbad",
        "link",
        f"https://simbad.u-strasbg.fr/simbad/sim-coo?Coord={ra}%2C++{dec}&CooFrame=FK5&CooEpoch=2000&CooEqui=2000&CooDefinedFrames=none&Radius=2&Radius.unit=arcsec&submit=submit+query&CoordList="
    )
    factor3_container = widgets.VBox([triple_button, divide_by_3_button])
    factor2_container = widgets.VBox([double_button, half_button])
    external_buttons_container = widgets.VBox([copy_box, simbad_button])
    display_container = widgets.VBox([reset_button, twice_checkbox])
    main_container = widgets.VBox([period_selector, bin_input])
    button_container = widgets.HBox([main_container, factor2_container, factor3_container,display_container, external_buttons_container])

    bands = [band for band in bands if np.any(filters == band)]
    phase = (((time - np.min(time)) / period) % 1.0)

    traces = []
    if bins > 0:
        bin_indices = np.digitize(phase, np.linspace(0, 1, bins + 1)) - 1
        for band in bands:
            mask = filters == band
            phase_temp = phase[mask]
            binned_mag = np.zeros(bins)
            for i in range(bins):
                mask_temp = mask & (bin_indices == i)
                if not np.any(mask_temp):
                    binned_mag[i] = np.nan
                    continue
                binned_mag[i] = np.mean(mag[mask_temp])

            # remove bins with nan
            mask = ~np.isnan(binned_mag)
            binned_mag = binned_mag[mask]
            phase_temp = np.arange(bins)[mask] / bins
            if twice:
                phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                mag_temp = np.concatenate([binned_mag, binned_mag])

            traces.append(
                go.Scatter(
                    x=phase_temp, y=mag_temp, mode="markers", name=band, marker={"color": band2color[band]},
                    meta={
                        "period": period,
                        "band": band,
                        "twice": twice,
                        "selected_period": period,
                        "binning": bins,
                    }
                )
            )

    else:
        for band in bands:
            mask = filters == band
            phase_temp = phase[mask]
            mag_temp = mag[mask]
            magerr_temp = magerr[mask]
            if len(phase_temp) == 0:
                continue
            if twice:
                phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                mag_temp = np.concatenate([mag_temp, mag_temp])
                magerr_temp = np.concatenate([magerr_temp, magerr_temp])
            traces.append(
                go.Scatter(
                    x=phase_temp, y=mag_temp, mode="markers", name=band, marker={"color": band2color[band]},
                    meta={
                        "period": period,
                        "band": band,
                        "twice": twice,
                        "selected_period": period,
                        "binning": 0,
                    }
                )
            )

    g = go.FigureWidget(
        data=traces,
        layout=go.Layout(
            title=f"{psid} (period: {period2text(period)})",
            height=700, width=1200,
            xaxis=dict(title="Phase"),
            yaxis=dict(title="AB Magnitude", autorange="reversed"),
        )
    )

    def change_period(b):
        with g.batch_update():
            multiplier = 1
            period = g.data[0].meta["period"]
            twice = g.data[0].meta["twice"]
            binning = g.data[0].meta["binning"]

            title = b.description
            if title == "x3":
                multiplier = 3
            elif title == "x2":
                multiplier = 2
            elif title == "x1/2":
                multiplier = 0.5
            elif title == "x1/3":
                multiplier = 1/3
            elif title == "Reset":
                multiplier = 1
                period = g.data[0].meta["selected_period"]

            period = period * multiplier
            phase = ((time - np.min(time)) % period / period)

            if binning > 0:
                bin_indices = np.digitize(phase, np.linspace(0, 1, binning + 1)) - 1
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    binned_mag = np.zeros(binning)
                    for i in range(binning):
                        mask_temp = mask & (bin_indices == i)
                        if not np.any(mask_temp):
                            binned_mag[i] = np.nan
                            continue
                        binned_mag[i] = np.mean(mag[mask_temp])

                    # remove bins with nan
                    mask = ~np.isnan(binned_mag)
                    binned_mag = binned_mag[mask]
                    phase_temp = np.arange(binning)[mask] / binning
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                        binned_mag = np.concatenate([binned_mag, binned_mag])

                    trace.x = phase_temp
                    trace.y = binned_mag
                    trace.meta["period"] = period

                g.layout.title = f"{psid} (period: {period2text(period)}, binning: {binning})"
            else:
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                    trace.x = phase_temp
                    trace.meta["period"] = period

                g.layout.title = f"{psid} (period: {period2text(period)})"

    def change_twice(b):
        with g.batch_update():
            period = g.data[0].meta["period"]
            binning = g.data[0].meta["binning"]
            twice = twice_checkbox.value
            phase = ((time - np.min(time)) % period / period)
            if binning > 0:
                bin_indices = np.digitize(phase, np.linspace(0, 1, binning + 1)) - 1
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    binned_mag = np.zeros(binning)
                    for i in range(binning):
                        mask_temp = mask & (bin_indices == i)
                        if not np.any(mask_temp):
                            binned_mag[i] = np.nan
                            continue
                        binned_mag[i] = np.mean(mag[mask_temp])

                    # remove bins with nan
                    mask = ~np.isnan(binned_mag)
                    binned_mag = binned_mag[mask]
                    phase_temp = np.arange(binning)[mask] / binning
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                        binned_mag = np.concatenate([binned_mag, binned_mag])

                    trace.x = phase_temp
                    trace.y = binned_mag
                    trace.meta["twice"] = twice
                g.layout.title = f"{psid} (period: {period2text(period)}, binning: {binning})"
            else:
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    mag_temp = mag[mask]
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                        mag_temp = np.concatenate([mag_temp, mag_temp])
                    trace.x = phase_temp
                    trace.y = mag_temp
                    trace.meta["twice"] = twice
                g.layout.title = f"{psid} (period: {period2text(period)})"

    def change_selected_period(change):
        with g.batch_update():
            period = change.new
            twice = g.data[0].meta["twice"]
            binning = g.data[0].meta["binning"]
            phase = ((time - np.min(time)) % period / period)
            if binning > 0:
                bin_indices = np.digitize(phase, np.linspace(0, 1, binning + 1)) - 1
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    binned_mag = np.zeros(binning)
                    for i in range(binning):
                        mask_temp = mask & (bin_indices == i)
                        if not np.any(mask_temp):
                            binned_mag[i] = np.nan
                            continue
                        binned_mag[i] = np.mean(mag[mask_temp])

                    # remove bins with nan
                    mask = ~np.isnan(binned_mag)
                    binned_mag = binned_mag[mask]
                    phase_temp = np.arange(binning)[mask] / binning
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                        binned_mag = np.concatenate([binned_mag, binned_mag])

                    trace.x = phase_temp
                    trace.y = binned_mag
                    trace.meta["period"] = period
                    trace.meta["selected_period"] = period

                g.layout.title = f"{psid} (period: {period2text(period)}, binning: {binning})"
            else:
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                        mag_temp = mag[mask]
                        mag_temp = np.concatenate([mag_temp, mag_temp])
                        trace.y = mag_temp
                    trace.x = phase_temp
                    trace.meta["period"] = period
                    trace.meta["selected_period"] = period
                g.layout.title = f"{psid} (period: {period2text(period)})"

    def change_binning(change):
        with g.batch_update():
            period = g.data[0].meta["period"]
            twice = g.data[0].meta["twice"]
            binning = change.new
            phase = ((time - np.min(time)) % period / period)
            if binning > 0:
                bin_indices = np.digitize(phase, np.linspace(0, 1, binning + 1)) - 1
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    binned_mag = np.zeros(binning)
                    for i in range(binning):
                        mask_temp = mask & (bin_indices == i)
                        if not np.any(mask_temp):
                            binned_mag[i] = np.nan
                            continue
                        binned_mag[i] = np.mean(mag[mask_temp])

                    # remove bins with nan
                    mask = ~np.isnan(binned_mag)
                    binned_mag = binned_mag[mask]
                    phase_temp = np.arange(binning)[mask] / binning
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                        binned_mag = np.concatenate([binned_mag, binned_mag])

                    trace.x = phase_temp
                    trace.y = binned_mag
                    trace.meta["binning"] = binning

                g.layout.title = f"{psid} (period: {period2text(period)}, binning: {binning})"
            else:
                for trace in g.data:
                    mask = filters == trace.meta["band"]
                    phase_temp = phase[mask]
                    mag_temp = mag[mask]
                    if twice:
                        phase_temp = np.concatenate([phase_temp, phase_temp + 1])
                        mag_temp = np.concatenate([mag_temp, mag_temp])
                    trace.x = phase_temp
                    trace.y = mag_temp
                    trace.meta["binning"] = binning
                g.layout.title = f"{psid} (period: {period2text(period)})"

    def on_click_copy(b):
        with g.batch_update():
            period = g.data[0].meta["period"]

        # copy the psid and period to the clipboard
        copy_js = Javascript(f"navigator.clipboard.writeText({json.dumps(f'{psid}, {period}')})")
        copy_output.clear_output()
        copy_output.append_display_data(copy_js)

    triple_button.on_click(change_period)
    double_button.on_click(change_period)
    half_button.on_click(change_period)
    divide_by_3_button.on_click(change_period)
    reset_button.on_click(change_period)
    twice_checkbox.observe(change_twice, names="value")
    period_selector.observe(change_selected_period, names="value")
    copy_button.on_click(on_click_copy)
    bin_input.observe(change_binning, names="value")

    result = widgets.VBox([g, button_container])
    display(result)

Ok now that we have object(s), lightcurve(s), and a method to show them, let's do so in a loop.

In [None]:
# Code Block # 17
for i, obj in enumerate(objects):
    psid = obj["psid"]
    lc = prepare_photometry(lightcurves[str(psid)], min_snr=1, min_points=50)

    if len(lc[0]) < 200:
        print(f"Too few data points for PSID {psid} ({len(lc[0])})")
        continue
    try:
        plot_folded_lightcurve(obj, lc, obj['periods'], bands=["g", "r"], twice=True, bins=200)
    except Exception as e:
        print(f"Error plotting PSID {psid}: {e}")

VBox(children=(FigureWidget({
    'data': [{'marker': {'color': 'red'},
              'meta': {'band': 'r',
  …

Nice! This object is clearly periodic, with a period ~3.27 hours.

<br>

Before we move to another section, let's plot a couple of extra objects. To make things very convenient for us, let's define a `plot_obj` method that is a wrapper of the `plot_folded_lightcurve` one, which can take a PSID, an object without lightcurve, or an object with lightcurve as input. This makes is super convenient to rapidly plot objects with a one-liner, though it isn't as flexible as the `plot_folded_lightcurve` method, nor as efficient as it runs API calls to retrieve the data for an object every time.

In [None]:
# Code Block # 18
def plot_obj(obj, period=None):
    if isinstance(obj, int | str):
        obj = client.get_object(obj)
        obj["photometry"] = prepare_photometry(obj["photometry"], min_snr=1, min_points=100)
    elif isinstance(obj, dict):
        if "photometry" not in obj:
            result = client.get_photometry([obj["psid"]], bands=["g", "r"], verbose=False)
            obj["photometry"] = prepare_photometry(result[str(obj["psid"])], min_snr=1, min_points=100)
    else:
        raise ValueError("obj should be a psid or a dict with a psid key")

    if period is not None:
        periods = [{"period": period, "significance": 0, "string_length": 0, "band": "g", "bin": "bin0", "amplitude": 0}]
    else:
        periods = obj["periods"]
    plot_folded_lightcurve(obj, obj["photometry"], periods, twice=True, bins=200)

In [None]:
# Code Block # 19
plot_obj(126541217361397171)

NameError: name 'plot_obj' is not defined


---

### Crossmatching ZVAR sources with other catalogs

Let's look at how the API can be used to retrieve data from other surveys for any ZVAR object.

Using the ID (PSID) of any object, we can easily use the client to get their closest match in:
- gaia
- ps1
- galex
- 2mass
- allwise

Soon, other catalogs (including SDSS spectra) will be added to this API. Please reach out to <tdulaz@caltech.edu> if you have any specific requests.

This method takes a list of object ids, a catalog to crossmatch them with, a maximum radius, and a boolean flag to indicate if we want to return all the matches or only the closest one. As an example, let's look at how this is done for gaia:

In [None]:
# Code Block # 20
gaia_xmatches = client.xmatch(psids=[objects[0]['psid']], catalog="gaia", radius=3.0, closest=False)

for psid in gaia_xmatches:
    print(f"Found {len(gaia_xmatches[psid])} xmatches for PSID {psid}:")
    for xmatch in gaia_xmatches[psid]:
        print(f"{json.dumps(xmatch, indent=4)}")

Found 1 xmatches for PSID 145120008365555264:
{
    "id": "2873447302730839936",
    "ra": 0.8365264657830731,
    "dec": 30.937195479478635,
    "phot_g_mean_mag": 16.708706,
    "phot_bp_mean_mag": 17.312077,
    "phot_rp_mean_mag": 15.944344000000001,
    "parallax": 0.7995832359862339,
    "parallax_error": 0.06608167,
    "pm": 2.5535354999999997,
    "pmra": 1.0393465976352718,
    "pmra_error": 0.080007255,
    "pmdec": -2.33244562333133,
    "pmdec_error": 0.039681613
}



---

### Example workflow:

Let's end this notebook with a simple example that uses most of the methods we've shown so far. Let's say I want to look for objects in the ZVAR database at a specific position, and we try to subsample "giants" based on their Gaia color-magnitude diagram. We can do this by:
1. Query the API for objects in a given position, with a strict cut on their periods' significance to get try to get rid of noise.
2. Xmatch them with gaia, removing those without a match.
3. Only keep objects with valid gaia data, with BP - RP > 1.0, and G (absolute mag) < 3.0
4. We plot the remaining objects' lightcurves, but could also easily plot them on a CMD.

In [None]:
# Code Block # 21
ra, dec = 4.9385528564453125, 36.31493377685547

objects = client.cone_search(
    ra=ra, dec=dec,
    radius=10 * 60, # 10 arcmin
    objtype=1,
    # we apply a cut on significance, to try to remove false positives
    # a cut on period is required when specifying significance
    # so let's just use the minimum and maximum periods we compute
    min_period=5/60/24, max_period=10, # 5 minutes to 10 days
    min_significance=40,
)

print(f"Total objects found: {len(objects)}")

# we get the gaia data for each
gaia_xmatches = client.xmatch(psids=[o['psid'] for o in objects], catalog="gaia", radius=5.0, closest=False)

# let's directly add the gaia matches to the objects, for convenience
for obj in objects:
    obj["gaia"] = gaia_xmatches.get(str(obj["psid"]), [])

# now we only keep those with a match
objects = [obj for obj in objects if len(obj["gaia"]) > 0]

# subsample based on Gaia data
objects = [
    obj
    for obj in objects if any(
        # first let's check that the quantities we need are not None
        x["parallax"] is not None
        and x["parallax_error"] is not None
        and x["phot_bp_mean_mag"] is not None
        and x["phot_rp_mean_mag"] is not None
        and x["phot_g_mean_mag"] is not None

        # we check that the parallax is significant enough
        and x["parallax"] / x["parallax_error"] > 3

        # we check if the bp - rp color is greater than 1
        and x["phot_bp_mean_mag"] - x["phot_rp_mean_mag"] > 1.0

        # we check that the g-band absolute magnitude is less than 3
        and x["phot_g_mean_mag"] + 5.0 * np.log10(x["parallax"] / 1000) < 3.0
        for x in obj["gaia"]
    )
]

print(f"Nb of giant candidates: {len(objects)}")

for obj in objects:
    plot_obj(obj)


Total objects found: 3
Nb of giant candidates: 2


VBox(children=(FigureWidget({
    'data': [{'marker': {'color': 'red'},
              'meta': {'band': 'r',
  …

VBox(children=(FigureWidget({
    'data': [{'marker': {'color': 'red'},
              'meta': {'band': 'r',
  …


---

### Crossmatching other catalogs with ZVAR

Last but not least, let's look at the opposite: given a list of IDs of sources from another catalog, let's look for the corresponding ZVAR sources.

Using the ID of any object in the following catalogs:
- gaia
- ps1
- galex
- 2mass
- allwise
We can easily use the client these object's AND their associated ZVAR sources.

Unlike the ZVAR -> External catalog crossmatch presented earlier, this does not return a dictionnary with the ZVAR IDs as keys and the list of sources in the external catalog as values, but instead returns a dictionnary with the other catalog's IDs as keys, the other catalog's object as value, where each object contains a `xmatch` field which is the list of corresponding ZVAR objects. This way, you also get all the metadata from the external catalog objects directly and do not need to use this method and then have to query the external catalog again if you want their metadata. This reduces the number of operations you need to perform to get data from the external catalog and ZVAR.

<br/>

**Let's look at a quick example:**

_Looking for ZVAR sources that match with a known Gaia source_

In [None]:
# Code Block # 22
gaia_ids = ["2873447302730839936"]

# we get the gaia data and zvar xmatches for each gaia ID
gaia_sources_with_zvar_xmatches = client.external_xmatch(gaia_ids, catalog="gaia", radius=5.0, closest=False)

# let's print each gaia source and its position
# and then for each of them we'll print the zvar xmatches ids, ra, dec, and number of periods
for gaia_id, gaia_source in gaia_sources_with_zvar_xmatches.items():
    print(f"Gaia {gaia_id} (ra={gaia_source['ra']}, dec={gaia_source['dec']}, parallax: {gaia_source.get('parallax', None)}):")
    zvar_xmatches = gaia_source['xmatches']
    for xmatch in zvar_xmatches:
        print(f" - ZVAR {xmatch['psid']} (ra={xmatch['ra']}, dec={xmatch['dec']}, nb_periods={len(xmatch['periods'])})")

Gaia 2873447302730839936 (ra=0.8365264657830731, dec=30.937195479478635, parallax: 0.7995832359862339):
 - ZVAR 145120008365555264 (ra=0.8365280032157898, dec=30.937196731567383, nb_periods=6)


### Here are some tips to make use of the ZVAR API with other workflows, based on the feedback from some of you.

- First off, the ability to convert the array of dictionaries returned by the API to pandas dictionnary can help combine the API with other "dataframe-powered" bits of code you may have. While nested data (e.g. a list of dictionnaries which themselves contain lists of elements) can't simply be converted to a dictionnary, anything else can simply be converted using Pandas native methods to go from dict to dataframe and vice versa, here is a quick example with the photometry:

In [None]:
# Code Block # 23
import pandas as pd

psid = 145120008365555264
lightcurve = client.get_photometry(psids=[psid])[str(psid)]

# let's convert to a dataframe
df = pd.DataFrame(lightcurve)
print("As a dataframe:")
print(df.head())
print(f"We have {df.shape[0]} elements with {df.shape[1]} columns")

# now you may want to remove non detections
# i.e. rows with no mag value
df = df.dropna(subset=['mag'])

# if you wanted to do the opposite (from dataframe to list):
lightcurve = df.to_dict(orient='records')

print("\nAs a list of dicts:")
print(f"Lightcurve is a {type(lightcurve)}, its {len(lightcurve)} elements are of type {type(lightcurve[0])}, with {len(lightcurve[0])} fields each")
print(lightcurve[0])


100%|██████████| 1/1 [00:00<00:00,  2.35it/s]

As a dataframe:
           time        mag    magerr limmag        flux    fluxerr  \
0  2.458264e+06  17.416570  0.036400   None   345.24374  95.399110   
1  2.458269e+06  16.449749  0.012616   None  1278.26930  78.648030   
2  2.458272e+06  16.419369  0.010663   None  1470.34390  68.358690   
3  2.458274e+06  16.546500  0.010184   None   701.21387  58.076283   
4  2.458274e+06  17.316734  0.018827   None   619.30790  54.096283   

          snr band  field  ccd  quad  
0   29.828255    g    648    2     2  
1   86.062150    r    648    2     2  
2  101.826020    r    648    2     2  
3  106.610880    r    648    2     2  
4   57.668530    g    648    2     2  
We have 1126 elements with 11 columns

As a list of dicts:
Lightcurve is a <class 'list'>, its 1126 elements are of type <class 'dict'>, with 11 fields each
{'time': 2458263.981924, 'mag': 17.41657, 'magerr': 0.036399588, 'limmag': None, 'flux': 345.24374, 'fluxerr': 95.39911, 'snr': 29.828255, 'band': 'g', 'field': 648, 'ccd':




Also, you can now pick if the image differences photometry should be returned as magnitudes, or as the raw flux values from the original dataset. By default data is returned in mag, but you can specify the format and set it to flux when calling the `get_photometry` method, here is an example:

In [None]:
# Code Block # 24
psid = 145120008365555264
lightcurve = client.get_photometry(psids=[psid], format="flux")[str(psid)] # format = flux

# let's print the first datapoint
print()
print(lightcurve[0])

100%|██████████| 1/1 [00:00<00:00,  2.19it/s]


{'time': 2458263.981924, 'flux': 345.24374, 'fluxerr': 95.39911, 'snr': 3.618941, 'band': 'g', 'field': 648, 'ccd': 2, 'quad': 2, 'magzp': 26.052000762939453, 'mag_ref': -8.494999885559082}



