***
# from ztfquery import fritz
***

### `ztfquery.fritz` enables you to get information from the ZTF-II "fritz marshal".
It provides access to alerts and their associated information.

`ztfquery.fritz` is essentially a Python wrapper around the Fritz instance of the SkyPortal web API (https://docs.fritz.science/api.html).

**Note**: you will need a token from your Fritz account (https://fritz.science/profile).


**This tutorial focuses on sample access and user accounts.** For other topics:

- see Fritz 2.2 for storing and retrieving data.
- see Fritz 2.1 for individual target data (lightcurve, alerts, spectra, source information)



You will be able to download:

- samples (i.e. lists of `sources`)
- user accounts
- groups

To do so, use `fritz.download_{this}`, with `this` being any of these.
You can either get the data as they are stored in Fritz, or use the dedicated `Fritz{Object}` classes, which provide useful methods.

***
# Step 0: imports

In [2]:
%matplotlib notebook
from ztfquery import fritz

***
# Fritz Groups


Fritz organises `source`s in groups, like `"Same host SNe"`, and each group has its own `id`. The `FritzGroups` object lets you see all the existing groups and, most importantly, convert between `id` and group name using the `FritzGroups.fetch_groupname(id)` and `FritzGroups.fetch_groupid(group_name)` class methods.

As usual, `download_groups()` lets you get the group object. In practice, you don't really need it.

`FritzGroups.load()` is similar to the usual `.from_name()` class method. It downloads the group data and stores them in `$ZTFDATA/fritz/sample/fritz_groups.json`, so the next call is much faster. Use `force_dl=True` to update the file.

In [2]:
group = fritz.FritzGroups.load() 

In [3]:
group.data

|  | single_user_group | name | created_at | nickname | private | modified | id |
|---|---|---|---|---|---|---|---|
| 0 | False | ACAI | 2020-11-30T08:49:10.252369 | acai | False | 2020-11-30T08:49:10.252369 | 217 |
| 1 | False | ACAI_TNS | 2020-12-03T02:34:16.003443 | acai_tns | False | 2020-12-03T02:34:16.003443 | 226 |
| 2 | False | AGN Flares | 2020-10-28T00:52:37.590023 | agnflare | False | 2020-10-28T00:52:37.590023 | 71 |
| 3 | False | AMCVn | 2021-03-04T04:15:45.629195 | AMCVn | False | 2021-03-04T04:15:45.629195 | 256 |
| 4 | False | AmpelGroup_test1 | 2020-10-25T21:19:31.751195 | AGt1 | False | 2020-10-25T21:19:31.751195 | 63 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | False | Young Type Ia Supernovae | 2020-10-21T06:20:34.588728 | young | False | 2020-10-21T06:20:34.588728 | 52 |
| 82 | False | ZTF II Associates | 2020-11-24T20:05:58.713932 | associates | False | 2020-11-24T20:05:58.713932 | 213 |
| 83 | False | ZTFBH Nuclear | 2020-10-21T06:20:34.665079 | nuc | False | 2020-10-21T06:20:34.665079 | 58 |
| 84 | False | ZTFReST | 2020-10-26T16:33:05.875241 | ZTFReST | False | 2020-10-26T16:33:05.875241 | 64 |


### Check the groups you have access to:

In [4]:
group.accessible

|  | single_user_group | name | created_at | nickname | private | modified | id |
|---|---|---|---|---|---|---|---|
| 0 | False | AmpelGroup_test1 | 2020-10-25T21:19:31.751195 | AGt1 | False | 2020-10-25T21:19:31.751195 | 63 |
| 1 | False | Calibrator SNe Ia | 2021-02-10T14:58:35.802774 |  | False | 2021-02-10T14:58:35.802774 | 253 |
| 2 | False | Cosmology with Type Ia Supernovae | 2020-10-21T06:20:34.627473 | cos | False | 2020-10-21T06:20:34.627473 | 55 |
| 3 | False | Infant Supernovae | 2020-10-21T06:20:34.549465 | infant | False | 2020-10-21T06:20:34.549465 | 49 |
| 4 | False | Nuclear Transients | 2020-11-02T21:16:05.759655 |  | False | 2020-11-02T21:16:05.759655 | 80 |
| 5 | False | Physics of Transients | 2020-11-05T19:00:21.569886 | SWGSN | False | 2020-11-05T19:00:21.569886 | 88 |
| 6 | False | RCF Junk and Variables | 2021-03-03T21:11:03.904431 | RCFJunk | False | 2021-03-03T21:11:03.904431 | 255 |
| 7 | False | Redshift Completeness Factor | 2020-10-21T06:20:34.429576 | rcf | False | 2020-10-21T06:20:34.429576 | 41 |
| 8 | False | Rigault Research Group | 2020-11-09T19:40:43.560360 | IN2P3MR | False | 2020-11-09T19:40:43.560360 | 125 |
| 9 | False | Same host SNe | 2021-02-08T12:55:52.389835 | samehost | False | 2021-02-08T12:55:52.389835 | 251 |
9,False,Same host SNe,2021-02-08T12:55:52.389835,samehost,False,2021-02-08T12:55:52.389835,251


The `group.groupid_to_groupname()` and `group.groupname_to_groupid()` methods let you do the conversion once the group object is loaded.

Still, we recommend the `fetch_groupname(groupid)` class method: it looks for `groupid` in `groups.data["id"]` and returns the corresponding name. If no match is found, `fetch_groupname` updates the group data by re-downloading it, then retries.
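The lookup-then-refresh behaviour described above can be illustrated in plain pandas. This is a minimal sketch, not ztfquery's actual implementation: `fetch_groupname_sketch` and the `refresh` callable (standing in for re-downloading the groups from Fritz) are hypothetical names, and the tables only keep the `name`, `nickname` and `id` columns shown above.

```python
import pandas

def fetch_groupname_sketch(groups, groupid, refresh, nickname=False):
    """Hypothetical helper: return the name (or nickname) of `groupid`.

    If the id is not in `groups`, call `refresh()` (a stand-in for
    re-downloading the group table) and retry once.
    """
    column = "nickname" if nickname else "name"
    match = groups.loc[groups["id"] == groupid, column]
    if len(match) == 0:
        # not found locally: get an up-to-date table and try again
        groups = refresh()
        match = groups.loc[groups["id"] == groupid, column]
        if len(match) == 0:
            raise ValueError(f"unknown group id {groupid}")
    return match.iloc[0]

# A local copy that does not yet know the "Same host SNe" group ...
stale = pandas.DataFrame({"name": ["ACAI", "AGN Flares"],
                          "nickname": ["acai", "agnflare"],
                          "id": [217, 71]})
# ... and what a fresh download would return (values from the tables above)
fresh = pandas.DataFrame({"name": ["ACAI", "AGN Flares", "Same host SNe"],
                          "nickname": ["acai", "agnflare", "samehost"],
                          "id": [217, 71, 251]})

print(fetch_groupname_sketch(stale, 251, refresh=lambda: fresh, nickname=True))  # samehost
```

The refresh is only triggered on a miss, which is why repeated lookups of known groups stay cheap.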

#### Example: get the id of the "Same host SNe" group

In [5]:
fritz.FritzGroups.fetch_groupid("Same host SNe")

251

In [6]:
fritz.FritzGroups.fetch_groupname(251, nickname=True)

'samehost'

In [7]:
g = fritz.FritzGroups.load()

In [8]:
g.data

|  | single_user_group | name | created_at | nickname | private | modified | id |
|---|---|---|---|---|---|---|---|
| 0 | False | ACAI | 2020-11-30T08:49:10.252369 | acai | False | 2020-11-30T08:49:10.252369 | 217 |
| 1 | False | ACAI_TNS | 2020-12-03T02:34:16.003443 | acai_tns | False | 2020-12-03T02:34:16.003443 | 226 |
| 2 | False | AGN Flares | 2020-10-28T00:52:37.590023 | agnflare | False | 2020-10-28T00:52:37.590023 | 71 |
| 3 | False | AMCVn | 2021-03-04T04:15:45.629195 | AMCVn | False | 2021-03-04T04:15:45.629195 | 256 |
| 4 | False | AmpelGroup_test1 | 2020-10-25T21:19:31.751195 | AGt1 | False | 2020-10-25T21:19:31.751195 | 63 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | False | Young Type Ia Supernovae | 2020-10-21T06:20:34.588728 | young | False | 2020-10-21T06:20:34.588728 | 52 |
| 82 | False | ZTF II Associates | 2020-11-24T20:05:58.713932 | associates | False | 2020-11-24T20:05:58.713932 | 213 |
| 83 | False | ZTFBH Nuclear | 2020-10-21T06:20:34.665079 | nuc | False | 2020-10-21T06:20:34.665079 | 58 |
| 84 | False | ZTFReST | 2020-10-26T16:33:05.875241 | ZTFReST | False | 2020-10-26T16:33:05.875241 | 64 |


***
# Sample

You can download a group `sample`, i.e. the list of the individual `source`s of a group. To do so, use the `fritz.download_sample()` function, which has the usual `get_object` and `store` options as well as many options to filter the queried data (see also `fritz.download_sources()`).

### Warning: Sample size

Large samples, such as "rcf", contain several thousand sources, and downloading them all at once may crash (Fritz times out). The `savesummary=True` option solves this problem: with this key set to `True`, you do not download the full `source` information but only a summary table with a few fields, such as the `source` names and creation dates. This is fast (a few seconds) and reliable. The `source`s can then be downloaded using the `fritz.bulk_download()` function. The `FritzSample` object described below uses this approach.
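The summary is a dictionary whose `"sources"` entry holds one record per saved source, with the source name in the `obj_id` field (see the `savesummary=True` output further below). A minimal sketch of turning such a summary into the list of names to download next, using two records copied from that output and trimmed to a few fields:

```python
# Two records copied (and trimmed) from a download_sample(..., savesummary=True) output
summary = {"sources": [
    {"id": 29763, "group_id": 251, "active": True, "obj_id": "ZTF17aadlxmv"},
    {"id": 29730, "group_id": 251, "active": True, "obj_id": "ZTF18aagrcfl"},
]}

# the source names are the "obj_id" values of the summary records
names = [record["obj_id"] for record in summary["sources"]]
print(names)  # ['ZTF17aadlxmv', 'ZTF18aagrcfl']
```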


## FritzSample (highly recommended)


The `FritzSample` object is a collection of `source`s and contains the following useful attributes:
- `sources`: list of FritzSource 
- `nsource`: number of sources
- `names`: names of the sources
- `data`: DataFrame containing a summary of the individual source information (like ra, dec, redshift, classification)
- `groupid`, `groupname` and `groupnickname`: ID, name and nickname of the sample group, if any


### The I/O is based on `data`

`FritzSample` has the usual i/o methods that are based on storing and reading the `data` attribute. 
- `store()`: stores the `data` attribute into `$ZTFDATA/fritz/sample/fritz_sample_{id_}.csv`, where `id_` is, by default, `groupnickname` if it is not `None`, else `groupid`. The file can later be reloaded using the `read_csv()` class method.


- `store_sources()`: stores the individual sources into `$ZTFDATA/source/`. Basically, this calls each `source.store()`. Note that `store_sources` is also an option of `store()`.



- `to_{extension}(filename)`, `read_{extension}(filename)`: where {extension} could be `csv`, `json`, `parquet`, `hdf`
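The naming rule used by `store()` can be written out explicitly. This is a sketch of the rule as described above, not ztfquery's code: `sample_filename` is a hypothetical helper, the `fritz/sample` directory follows the listing shown later in this notebook, and `$ZTFDATA` falls back to the current directory if unset. With no nickname and group id 253, it yields the `fritz_sample_253.csv` file that appears in that listing.

```python
import os

def sample_filename(groupnickname, groupid, extension="csv"):
    """Hypothetical helper: where a FritzSample `data` table would be stored."""
    # use the nickname when there is one, the numeric group id otherwise
    id_ = groupnickname if groupnickname is not None else groupid
    base = os.getenv("ZTFDATA", ".")
    return os.path.join(base, "fritz", "sample", f"fritz_sample_{id_}.{extension}")

print(os.path.basename(sample_filename(None, 253)))        # fritz_sample_253.csv
print(os.path.basename(sample_filename("samehost", 251)))  # fritz_sample_samehost.csv
```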

#### The `from_group()` class method


The `from_group()` class method is the equivalent of the `from_name()` method of the individual objects (lightcurve, source, alerts, spectrum). It first checks whether you have stored the data locally and, if not, downloads them using `download_sample(savesummary=True)`; `force_dl=True` forces the call to `download_sample(savesummary=True)`.

`from_group()` relies on the summary rather than on the full sources, as this is both faster and more reliable. The `source` names are taken from the summary, and the `source`s are then downloaded using the `fetch_data('source')` method (itself calling the `bulk_download('source')` function). This way, sources that are already stored are not re-downloaded unless the `update_sources=True` option of `from_group()` is set.

**In summary:**

- `FritzSample.from_group()`: loads the object given a group name or a group id (similar to the `from_name()` class method). **Important**: `force_dl=True` forces an update of the group's source list, while `update_sources=True` forces the individual sources to be re-downloaded (it corresponds to `force_dl=True` in `Source.from_name()`). **Speed-up**: if you don't need the full source details, use `load_sources=False`; the sources will then be neither loaded nor downloaded.
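The decision flow of `from_group()` summarised above can be sketched with plain callables. Everything here is hypothetical: `from_group_sketch`, `read_stored`, `download_names` and `download_source` stand in for ztfquery internals, and the `stored_sources` set stands in for what is already saved in `$ZTFDATA/source/`. It only illustrates how `force_dl`, `load_sources` and `update_sources` interact.

```python
def from_group_sketch(group, read_stored, download_names, download_source,
                      stored_sources, force_dl=False, load_sources=True,
                      update_sources=False):
    """Hypothetical sketch: return (names, downloaded) following from_group()'s logic."""
    names = None if force_dl else read_stored(group)  # locally stored summary, if any
    if names is None:
        # nothing stored (or force_dl=True): fetch the light-weight summary
        names = download_names(group)
    downloaded = []
    if load_sources:
        for name in names:
            # already-stored sources are reused unless update_sources=True
            if update_sources or name not in stored_sources:
                download_source(name)
                downloaded.append(name)
    return names, downloaded

fetched = []
names, downloaded = from_group_sketch(
    "Calibrator SNe Ia",
    read_stored=lambda group: None,  # pretend nothing is stored locally
    download_names=lambda group: ["ZTF18acbvgqw", "ZTF19aacgslb"],
    download_source=fetched.append,
    stored_sources={"ZTF18acbvgqw"},
)
print(downloaded)  # ['ZTF19aacgslb']
```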

#### The `from_names()` class method

You can build a FritzSample yourself from a list of source names. Simply do:

- `FritzSample.from_names(list_of_names)`: this sets `self.names`, and the sources are then downloaded using `fetch_data()` unless `load_sources=False`.




### Some useful methods

In addition to the I/O methods and the `data` attribute, `FritzSample` has some useful methods:

- `get_source(name)`: returns the `FritzSource` associated to the given name
- `fetch_data()`: parallel bulk download of the data associated with your sample, which can be `lightcurve`, `spectra`, `alerts` or `source`. This is what loads the sources when the `from_names()` or `from_group()` class methods are called (with `load_sources=True`).
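The idea behind such a parallel bulk download can be sketched with the standard library. This is an illustration only, assuming a hypothetical `fetch_one(name)` function that stands in for a single Fritz request; ztfquery itself relies on its own `bulk_download()` (with `nprocess=4` by default, as noted in the large-sample section), not on this code. A thread pool is used here because such downloads spend most of their time waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_one(name):
    """Hypothetical stand-in for downloading one source from Fritz."""
    return {"name": name, "status": "ok"}

def fetch_all(names, nprocess=4):
    # pool.map preserves input order, so results line up with `names`
    with ThreadPoolExecutor(max_workers=nprocess) as pool:
        return list(pool.map(fetch_one, names))

results = fetch_all(["ZTF18acbvgqw", "ZTF19aacgslb", "ZTF19aatlmbo"])
print([r["name"] for r in results])  # ['ZTF18acbvgqw', 'ZTF19aacgslb', 'ZTF19aatlmbo']
```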



***
## Examples

### The basic `download_sample(get_object=False)`

In [18]:
%time cosmo_sources = fritz.download_sample(groupid=251, get_object=False)

CPU times: user 68.7 ms, sys: 22.4 ms, total: 91.1 ms
Wall time: 7.6 s


By default, `savesummary=False` in `download_sample()`, so the `"sources"` entry of the returned dictionary is a list of full source records. You can, for instance, build a `FritzSource` from one of them.
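The returned dictionary stores the number of matches under `"totalMatches"` and one full record per source under `"sources"`. A quick pandas sketch of tabulating such records, using a trimmed copy of the first record printed below (only a few of its fields are kept for brevity):

```python
import pandas

# Trimmed copy of the download_sample(..., get_object=False) output shown below
full = {"totalMatches": 119,
        "sources": [{"id": "ZTF17aadlxmv", "ra": 127.4480181,
                     "dec": 33.9065364, "redshift": 0.062}]}

# one row per source, indexed by the source name
table = pandas.DataFrame(full["sources"]).set_index("id")[["ra", "dec", "redshift"]]
print(table)
```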

In [19]:
cosmo_sources

{'totalMatches': 119,
 'sources': [{'dec': 33.9065364,
   'score': 0.9247311949729919,
   'altdata': {'passing_alert_id': 1184172293915015007},
   'id': 'ZTF17aadlxmv',
   'origin': None,
   'dist_nearest_source': None,
   'ra_dis': None,
   'internal_key': '4c81bf2d-cd33-4944-ab39-d3629e5f9122',
   'mag_nearest_source': None,
   'dec_dis': None,
   'detect_photometry_count': None,
   'e_mag_nearest_source': None,
   'ra_err': None,
   'created_at': '2020-11-04T21:26:32.647533',
   'transient': False,
   'dec_err': None,
   'modified': '2020-11-27T13:07:38.921342',
   'varstar': False,
   'offset': 0.0,
   'ra': 127.4480181,
   'is_roid': False,
   'redshift': 0.062,
   'redshift_history': [{'value': 0.062,
     'set_at_utc': '2020-11-05T12:53:28.932083',
     'set_by_user_id': 22}],
   'thumbnails': [{'type': 'new',
     'created_at': '2020-11-04T21:26:32.942298',
     'origin': None,
     'file_uri': '/skyportal/static/thumbnails/ZTF17aadlxmv_new.png',
     'modified': '2020-11-04T21

In [20]:
source = fritz.FritzSource(cosmo_sources["sources"][0])

In [21]:
source.get_classification(full=False)

'Ia'

Now with `savesummary=True`:

In [22]:
%time summary_cosmo_sources = fritz.download_sample(groupid=251, get_object=False, savesummary=True)

CPU times: user 25.4 ms, sys: 5.5 ms, total: 30.9 ms
Wall time: 442 ms


In [23]:
summary_cosmo_sources

{'sources': [{'id': 29763,
   'created_at': '2021-02-08T14:39:53.040735',
   'saved_at': '2021-02-08T14:39:53.040735',
   'unsaved_by_id': None,
   'active': True,
   'saved_by_id': 30,
   'group_id': 251,
   'modified': '2021-02-08T14:39:53.040735',
   'unsaved_at': None,
   'requested': False,
   'obj_id': 'ZTF17aadlxmv'},
  {'id': 29730,
   'created_at': '2021-02-08T14:06:57.239913',
   'saved_at': '2021-02-08T14:06:57.239913',
   'unsaved_by_id': None,
   'active': True,
   'saved_by_id': 30,
   'group_id': 251,
   'modified': '2021-02-08T14:06:57.239913',
   'unsaved_at': None,
   'requested': False,
   'obj_id': 'ZTF18aagrcfl'},
  {'id': 39284,
   'created_at': '2021-03-19T10:08:34.247609',
   'saved_at': '2021-03-19T10:08:34.247609',
   'unsaved_by_id': None,
   'active': True,
   'saved_by_id': 30,
   'group_id': 251,
   'modified': '2021-03-19T10:08:34.247609',
   'unsaved_at': None,
   'requested': False,
   'obj_id': 'ZTF18aahmxqa'},
  {'id': 29862,
   'created_at': '2021-02

That is:

In [24]:
import pandas
pandas.DataFrame(summary_cosmo_sources["sources"])

|  | id | created_at | saved_at | unsaved_by_id | active | saved_by_id | group_id | modified | unsaved_at | requested | obj_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29763 | 2021-02-08T14:39:53.040735 | 2021-02-08T14:39:53.040735 |  | True | 30 | 251 | 2021-02-08T14:39:53.040735 |  | False | ZTF17aadlxmv |
| 1 | 29730 | 2021-02-08T14:06:57.239913 | 2021-02-08T14:06:57.239913 |  | True | 30 | 251 | 2021-02-08T14:06:57.239913 |  | False | ZTF18aagrcfl |
| 2 | 39284 | 2021-03-19T10:08:34.247609 | 2021-03-19T10:08:34.247609 |  | True | 30 | 251 | 2021-03-19T10:08:34.247609 |  | False | ZTF18aahmxqa |
| 3 | 29862 | 2021-02-08T17:57:33.910417 | 2021-02-08T17:57:33.910417 |  | True | 30 | 251 | 2021-02-08T17:57:33.910417 |  | False | ZTF18aakaljn |
| 4 | 29734 | 2021-02-08T17:54:58.978156 | 2021-02-08T17:54:58.978156 |  | True | 30 | 251 | 2021-02-08T17:54:58.978156 |  | False | ZTF18aakecej |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 114 | 30252 | 2021-02-10T08:06:30.807803 | 2021-02-10T08:06:30.807803 |  | True | 30 | 251 | 2021-02-10T08:06:30.807803 |  | False | ZTF21aajfpwk |
| 115 | 30856 | 2021-02-13T21:03:22.538429 | 2021-02-13T21:03:22.538429 |  | True | 30 | 251 | 2021-02-13T21:03:22.538429 |  | False | ZTF21aakjfmr |
| 116 | 31073 | 2021-02-17T08:30:39.329914 | 2021-02-17T08:30:39.329914 |  | True | 30 | 251 | 2021-02-17T08:30:39.329914 |  | False | ZTF21aaldrdx |
| 117 | 39101 | 2021-03-18T14:25:41.110716 | 2021-03-18T14:25:41.110716 |  | True | 30 | 251 | 2021-03-18T14:25:41.110716 |  | False | ZTF21aaprfqv |


## FritzSample.from_group()

In [26]:
%time fsample = fritz.FritzSample.from_group("Calibrator SNe Ia", load_sources=False, force_dl=True)

CPU times: user 65 ms, sys: 6.48 ms, total: 71.5 ms
Wall time: 948 ms


We called `from_group()` with `load_sources=False` and `force_dl=True` (to mimic a sample that has never been stored before). This means that `download_sample(savesummary=True)` was called and only the source names were stored. Hence `data` and `sources` are empty, but `names` is not.

In [27]:
fsample.data

In [28]:
fsample.sources

In [29]:
fsample.names

array(['ZTF18acbvgqw', 'ZTF19aacgslb', 'ZTF19aatlmbo', 'ZTF19adcecwu',
       'ZTF20aaumsrr', 'ZTF20aavpnlv', 'ZTF20abijfqq', 'ZTF20abqvsik',
       'ZTF20abrjmgi', 'ZTF20achlced', 'ZTF20aclwclm', 'ZTF20acogywb',
       'ZTF20acuosvy', 'ZTF21aaabvit', 'ZTF21aafdxca'], dtype='<U12')

To load the `sources` (which also sets `data`), use the `load_sources()` method; set `force_dl=True` if you want to re-download already stored sources.

In [30]:
%time fsample.load_sources(force_dl=True) # by default this uses 4 parallel processes; adjust to your machine.

CPU times: user 24.4 ms, sys: 29.7 ms, total: 54.1 ms
Wall time: 3.76 s


In [31]:
fsample.data

| name | redshift | ra | dec | classification | created_at | last_detected_at |
|---|---|---|---|---|---|---|
| ZTF18acbvgqw | 0.008673 | 46.512556 | -15.611485 | Ia | 2020-11-05T06:57:06.837416 | 2020-02-02T03:38:04.001291+00:00 |
| ZTF19aacgslb | 0.00452 | 157.341541 | 29.510627 | Ia | 2020-11-05T03:39:15.319208 | 2021-02-28T08:10:58.002238+00:00 |
| ZTF19aatlmbo | 0.007755 | 208.371399 | 40.275421 | Ia | 2020-11-04T21:37:23.495001 | 2020-03-28T06:47:16.995837+00:00 |
| ZTF19adcecwu | 0.00924 | 186.840946 | 64.799954 | Ia | 2020-11-04T21:08:29.220128 | 2020-07-12T04:59:44.998083+00:00 |
| ZTF20aaumsrr | 0.0073 | 185.015712 | 5.343306 | Ia | 2020-11-04T20:44:04.501823 | 2020-07-03T04:45:00.996481+00:00 |
| ZTF20aavpnlv | 0.005838 | 170.360222 | 3.014693 | Ia | 2020-10-29T12:51:15.532364 | 2021-03-20T07:08:41.003518+00:00 |
| ZTF20abijfqq | 0.002432 | 186.350786 | 18.203555 | Ia | 2020-10-29T12:50:05.321566 | 2021-03-30T06:50:48.001935+00:00 |
| ZTF20abqvsik | 0.002597 | 179.311182 | 49.292186 | Ia | 2020-10-22T12:30:41.456615 | 2021-03-30T07:07:48.999374+00:00 |
| ZTF20abrjmgi | 0.003639 | 197.657469 | 36.628687 | Ia | 2020-10-29T12:51:28.169625 | 2021-03-22T08:37:08.996171+00:00 |
| ZTF20achlced | 0.008246 | 21.028712 | 12.921472 | Ia | 2020-10-22T06:33:16.461437 | 2021-03-07T02:58:06.003837+00:00 |


**Remark**: `load_sources=True` is the default in `from_group()`.

In [32]:
# here I update the source name list from the group, but I don't force the reload of already stored sources if any.
%time fsample = fritz.FritzSample.from_group("Calibrator SNe Ia", force_dl=True, update_sources=False)

CPU times: user 78.4 ms, sys: 25.6 ms, total: 104 ms
Wall time: 2.71 s


In [33]:
fsample.data

| name | redshift | ra | dec | classification | created_at | last_detected_at |
|---|---|---|---|---|---|---|
| ZTF18acbvgqw | 0.008673 | 46.512556 | -15.611485 | Ia | 2020-11-05T06:57:06.837416 | 2020-02-02T03:38:04.001291+00:00 |
| ZTF19aacgslb | 0.00452 | 157.341541 | 29.510627 | Ia | 2020-11-05T03:39:15.319208 | 2021-02-28T08:10:58.002238+00:00 |
| ZTF19aatlmbo | 0.007755 | 208.371399 | 40.275421 | Ia | 2020-11-04T21:37:23.495001 | 2020-03-28T06:47:16.995837+00:00 |
| ZTF19adcecwu | 0.00924 | 186.840946 | 64.799954 | Ia | 2020-11-04T21:08:29.220128 | 2020-07-12T04:59:44.998083+00:00 |
| ZTF20aaumsrr | 0.0073 | 185.015712 | 5.343306 | Ia | 2020-11-04T20:44:04.501823 | 2020-07-03T04:45:00.996481+00:00 |
| ZTF20aavpnlv | 0.005838 | 170.360222 | 3.014693 | Ia | 2020-10-29T12:51:15.532364 | 2021-03-20T07:08:41.003518+00:00 |
| ZTF20abijfqq | 0.002432 | 186.350786 | 18.203555 | Ia | 2020-10-29T12:50:05.321566 | 2021-03-30T06:50:48.001935+00:00 |
| ZTF20abqvsik | 0.002597 | 179.311182 | 49.292186 | Ia | 2020-10-22T12:30:41.456615 | 2021-03-30T07:07:48.999374+00:00 |
| ZTF20abrjmgi | 0.003639 | 197.657469 | 36.628687 | Ia | 2020-10-29T12:51:28.169625 | 2021-03-22T08:37:08.996171+00:00 |
| ZTF20achlced | 0.008246 | 21.028712 | 12.921472 | Ia | 2020-10-22T06:33:16.461437 | 2021-03-07T02:58:06.003837+00:00 |


### Store

The data will be stored in `$ZTFDATA/fritz/sample/` using the group nickname if any, and the group id otherwise (or any group name you provide).

In [34]:
print(fsample.groupnickname)
print(fsample.groupid)

None
253


In [35]:
fsample.store()

In [36]:
import os
os.listdir(os.path.join(os.getenv('ZTFDATA'), "fritz/sample/"))

['fritz_sample_253.csv', 'fritz_groups.json', 'fritz_sample_samehost.csv']

## from_group() of a stored sample

If you have already stored a sample, you can retrieve it with `from_group()` (with `force_dl=False`, the default). This time `data` (and not just the names) is set, since the `store()` method stores `data`.


In [37]:
%time fsample = fritz.FritzSample.from_group(253, load_sources=False, force_dl=False)

CPU times: user 42.3 ms, sys: 5.79 ms, total: 48.1 ms
Wall time: 392 ms


In [38]:
fsample.data

| name | index | redshift | ra | dec | classification | created_at | last_detected_at |
|---|---|---|---|---|---|---|---|
| ZTF18acbvgqw | 0 | 0.008673 | 46.512556 | -15.611485 | Ia | 2020-11-05T06:57:06.837416 | 2020-02-02T03:38:04.001291+00:00 |
| ZTF19aacgslb | 1 | 0.00452 | 157.341541 | 29.510627 | Ia | 2020-11-05T03:39:15.319208 | 2021-02-28T08:10:58.002238+00:00 |
| ZTF19aatlmbo | 2 | 0.007755 | 208.371399 | 40.275421 | Ia | 2020-11-04T21:37:23.495001 | 2020-03-28T06:47:16.995837+00:00 |
| ZTF19adcecwu | 3 | 0.00924 | 186.840946 | 64.799954 | Ia | 2020-11-04T21:08:29.220128 | 2020-07-12T04:59:44.998083+00:00 |
| ZTF20aaumsrr | 4 | 0.0073 | 185.015712 | 5.343306 | Ia | 2020-11-04T20:44:04.501823 | 2020-07-03T04:45:00.996481+00:00 |
| ZTF20aavpnlv | 5 | 0.005838 | 170.360223 | 3.014693 | Ia | 2020-10-29T12:51:15.532364 | 2021-03-20T07:08:41.003518+00:00 |
| ZTF20abijfqq | 6 | 0.002432 | 186.350786 | 18.203555 | Ia | 2020-10-29T12:50:05.321566 | 2021-03-30T06:50:48.001935+00:00 |
| ZTF20abqvsik | 7 | 0.002597 | 179.311182 | 49.292186 | Ia | 2020-10-22T12:30:41.456615 | 2021-03-30T07:07:48.999374+00:00 |
| ZTF20abrjmgi | 8 | 0.003639 | 197.657469 | 36.628687 | Ia | 2020-10-29T12:51:28.169625 | 2021-03-22T08:37:08.996171+00:00 |
| ZTF20achlced | 9 | 0.008246 | 21.028712 | 12.921471 | Ia | 2020-10-22T06:33:16.461437 | 2021-03-07T02:58:06.003837+00:00 |


***
# Accessing a large sample with Dask (multiprocessing with 4 processes by default)

Some samples are very large. By default, ztfquery.fritz uses multiprocessing when calling `bulk_download()` (which is used to set the `sources` attribute through the `fetch_data()` method), with `nprocess=4`. Feel free to increase that if your machine allows it (4 or 8 is good for a typical laptop).

`Dask` (https://dask.org/) is simple to use and lets you go one step further: it can scale to thousands of machines running in parallel with little effort.

## The RCF example.

Let's check the size of the sample:

In [39]:
%time summary = fritz.download_sample( fritz.FritzGroups.fetch_groupid("rcf"), savesummary=True)

CPU times: user 114 ms, sys: 38 ms, total: 152 ms
Wall time: 2.27 s


In [40]:
len(summary["sources"])

7835

In [41]:
print(f"This sample has {len(summary['sources'])} targets, so let's use Dask to download all in parallel")

This sample has 7835 targets, so let's use Dask to download all in parallel


**Note on savesummary**: it takes only a few seconds to get the summary of ~8000 sources.
With `savesummary=False`, the request will most likely crash after some time.

Another way to see it:

In [42]:
%time s = fritz.FritzSample.from_group("rcf", load_sources=False)

CPU times: user 135 ms, sys: 35.7 ms, total: 170 ms
Wall time: 2.4 s


In [43]:
len(s.names)

7835

### Set up a Dask `client` from your laptop: nothing's simpler

In [4]:
from dask.distributed import Client

#client = Client(n_workers=16)
client = Client() # faster I think
client

Perhaps you already have a cluster running?
Hosting the HTTP server on port 51754 instead


| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:51756, Dashboard: http://127.0.0.1:51754/status | Workers: 8, Cores: 8, Memory: 17.18 GB |


Open the `Dashboard` link and watch it while the `from_group()` call below runs.

In [3]:
%time s = fritz.FritzSample.from_group("rcf", load_sources=True, client=client, force_dl=True, update_sources=True)

CPU times: user 3min 10s, sys: 12.8 s, total: 3min 22s
Wall time: 14min 58s


Once the sources are downloaded, you can still use the Dask client for multiprocessing when building the sample instance without updating the sources:

In [6]:
%time s = fritz.FritzSample.from_group("rcf", load_sources=True, client=client, force_dl=True, update_sources=False)

CPU times: user 10.5 s, sys: 816 ms, total: 11.3 s
Wall time: 17.1 s


In [7]:
s.data

| name | redshift | ra | dec | classification | created_at | last_detected_at |
|---|---|---|---|---|---|---|
| ZTF19acawiiu |  | 292.947255 | -12.039042 | Cataclysmic | 2020-11-05T03:45:56.378443 | 2019-09-25T04:53:22.004144+00:00 |
| ZTF19acajaak |  | 99.224680 | -14.130506 | Cataclysmic | 2020-11-05T03:45:57.261893 | 2020-03-03T04:07:07.000326+00:00 |
| ZTF19abxhtzs |  | 355.085238 | -27.166537 | AGN | 2020-11-05T03:46:09.880637 | 2019-11-25T03:27:52.997765+00:00 |
| ZTF20abkhtev | 0.017587 | 326.143529 | -24.602833 | Ib | 2020-11-05T03:16:11.641987 | 2020-07-22T10:32:36.997459+00:00 |
| ZTF20acjfimy | 0.080000 | 46.701311 | 26.824971 | Ia | 2020-11-04T14:14:12.629164 | 2020-11-22T07:50:05.003522+00:00 |
| ... | ... | ... | ... | ... | ... | ... |
| ZTF21aagpymw | 0.098000 | 163.467396 | 12.558064 | Ic-SLSN | 2021-02-03T10:06:48.979030 | 2021-03-30T08:27:36.000003+00:00 |
| ZTF21aaowaxx | 0.066000 | 252.993923 | 62.567883 | [] | 2021-03-09T12:33:14.379383 | 2021-03-30T09:14:37.996807+00:00 |
| ZTF21aapkzox |  | 271.860835 | 73.620393 | [] | 2021-03-17T10:29:28.336019 | 2021-03-30T09:12:27.999363+00:00 |
| ZTF21aaqwjlz |  | 245.316654 | 14.552602 | [] | 2021-03-28T09:47:15.457634 | 2021-03-28T10:49:45.001901+00:00 |


Close the Dask client when you are done with it:

In [8]:
client.close()

### Storing

As before, `store()` saves the `data` attribute. If you then only need the information contained in `data` (`load_sources=False`), reloading the sample is very fast.

In [9]:
s.store()

In [11]:
# Check if rcf is there.
import os
os.listdir(os.path.join(os.getenv('ZTFDATA'), "fritz/sample/"))

['fritz_sample_rcf.csv',
 'fritz_sample_253.csv',
 'fritz_groups.json',
 'fritz_sample_samehost.csv']

In [14]:
%time rcf = fritz.FritzSample.from_group("rcf", load_sources=False)

CPU times: user 45.1 ms, sys: 6.47 ms, total: 51.5 ms
Wall time: 61.3 ms


In [15]:
rcf.data

| name | index | redshift | ra | dec | classification | created_at | last_detected_at |
|---|---|---|---|---|---|---|---|
| ZTF19acawiiu | 0 |  | 292.947255 | -12.039042 | Cataclysmic | 2020-11-05T03:45:56.378443 | 2019-09-25T04:53:22.004144+00:00 |
| ZTF19acajaak | 1 |  | 99.224680 | -14.130506 | Cataclysmic | 2020-11-05T03:45:57.261893 | 2020-03-03T04:07:07.000326+00:00 |
| ZTF19abxhtzs | 2 |  | 355.085238 | -27.166537 | AGN | 2020-11-05T03:46:09.880637 | 2019-11-25T03:27:52.997765+00:00 |
| ZTF20abkhtev | 3 | 0.017587 | 326.143529 | -24.602833 | Ib | 2020-11-05T03:16:11.641987 | 2020-07-22T10:32:36.997459+00:00 |
| ZTF20acjfimy | 4 | 0.080000 | 46.701311 | 26.824971 | Ia | 2020-11-04T14:14:12.629164 | 2020-11-22T07:50:05.003522+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| ZTF21aagpymw | 7830 | 0.098000 | 163.467396 | 12.558064 | Ic-SLSN | 2021-02-03T10:06:48.979030 | 2021-03-30T08:27:36.000003+00:00 |
| ZTF21aaowaxx | 7831 | 0.066000 | 252.993923 | 62.567883 | [] | 2021-03-09T12:33:14.379383 | 2021-03-30T09:14:37.996807+00:00 |
| ZTF21aapkzox | 7832 |  | 271.860835 | 73.620393 | [] | 2021-03-17T10:29:28.336019 | 2021-03-30T09:12:27.999363+00:00 |
| ZTF21aaqwjlz | 7833 |  | 245.316654 | 14.552602 | [] | 2021-03-28T09:47:15.457634 | 2021-03-28T10:49:45.001901+00:00 |


In [22]:
# keep sources whose classification contains "Ia" (astype(str) guards against missing or non-string values)
rcfIa = rcf.data[rcf.data["classification"].astype(str).str.contains("Ia", regex=False)]
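To see what such a substring filter keeps, here is a self-contained sketch on four rows whose names and classifications are copied from the RCF table above. Converting the column with `astype(str)` before calling `str.contains` means that missing or non-string entries (such as the `[]` shown for unclassified sources) do not break the test; note that any label containing "Ia" would be kept, subtypes included.

```python
import pandas

# names and classifications copied from the RCF table above
data = pandas.DataFrame(
    {"classification": ["Ia", "Ic-SLSN", "[]", "AGN"]},
    index=pandas.Index(["ZTF20acjfimy", "ZTF21aagpymw", "ZTF21aaowaxx", "ZTF19abxhtzs"],
                       name="name"),
)

# plain substring match (regex=False), done on the string form of each label
is_ia = data["classification"].astype(str).str.contains("Ia", regex=False)
print(data[is_ia].index.tolist())  # ['ZTF20acjfimy']
```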