## Inserting data into Xedocs from CMT

This notebook shows how to insert data from CMT into a xedocs schema.

In [1]:
import strax
import straxen
import xedocs as xd
import numpy as np

In [3]:
import datetime

In [4]:
xd.list_schemas()

['bodega',
 'fax_configs',
 'electron_drift_velocities',
 'electron_drift_time_gates',
 'electron_lifetimes',
 'rel_extraction_effs',
 'fdc_maps',
 'hit_thresholds',
 'pmt_gains',
 'posrec_models',
 's1_xyz_maps',
 's2_xy_maps',
 'utube_calibrations',
 'diffused_calibrations',
 'ibelt_calibrations']

In [7]:
elife = xd.find('electron_lifetimes')

In [None]:
# or, equivalently: elife = xd.ElectronLifetime.find()

In [8]:
elife

[]

As we can see, querying the electron lifetimes collection of the MongoDB attached to xedocs currently returns an empty list; in other words, no electron lifetime data is stored there yet.

Now let's get some data from CMT to save in xedocs.

In [42]:
cmt = straxen.CorrectionsManagementServices().interface

In [43]:
cmt_elife = cmt.read('elife') #gets the electron lifetime data

In [44]:
cmt_elife

time,ONLINE,v1,v2,v3,v4,v5
2017-01-01 00:00:00+00:00,2.000000e+05,200000.00000,266000.00000,266000.00000,266000.00000,266000.00000
2020-10-14 00:00:00+00:00,7.125580e+04,71255.79834,94770.21179,94770.21179,94770.21179,94770.21179
2020-10-14 06:00:00+00:00,6.927400e+04,69274.00208,92134.42277,92134.42277,92134.42277,92134.42277
2020-10-14 12:00:00+00:00,6.903980e+04,69039.80255,91822.93739,91822.93739,91822.93739,91822.93739
2020-10-14 18:00:00+00:00,6.819510e+04,68195.09888,90699.48151,90699.48151,90699.48151,90699.48151
...,...,...,...,...,...,...
2022-01-03 21:03:50+00:00,8.197445e+06,,,,,
2022-01-04 03:04:23+00:00,8.156742e+06,,,,,
2022-01-04 09:04:57+00:00,8.066127e+06,,,,,
2022-01-04 15:05:31+00:00,7.944100e+06,,,,,


In [47]:
cmt_elife.columns

Index(['ONLINE', 'v1', 'v2', 'v3', 'v4', 'v5'], dtype='object')

In [48]:
cmt_elife.index

DatetimeIndex(['2017-01-01 00:00:00+00:00', '2020-10-14 00:00:00+00:00',
               '2020-10-14 06:00:00+00:00', '2020-10-14 12:00:00+00:00',
               '2020-10-14 18:00:00+00:00', '2020-10-15 00:00:00+00:00',
               '2020-10-15 06:00:00+00:00', '2020-10-15 12:00:00+00:00',
               '2020-10-15 18:00:00+00:00', '2020-10-16 00:00:00+00:00',
               ...
               '2022-01-02 15:01:06+00:00', '2022-01-02 21:01:39+00:00',
               '2022-01-03 03:02:12+00:00', '2022-01-03 09:02:44+00:00',
               '2022-01-03 15:03:17+00:00', '2022-01-03 21:03:50+00:00',
               '2022-01-04 03:04:23+00:00', '2022-01-04 09:04:57+00:00',
               '2022-01-04 15:05:31+00:00', '2022-01-04 21:06:03+00:00'],
              dtype='datetime64[ns, UTC]', name='time', length=1583, freq=None)

In [115]:
cmt_elife.iloc[1]['v1'] #get the value for version v1 at location 1

71255.79834
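As a small aside, the same value can be reached by label instead of by position. The sketch below uses a toy frame (an assumption, built from two rows of the `v1` and `v2` columns shown above) rather than the real CMT frame, so it runs without a CMT connection.

```python
import pandas as pd

# Toy stand-in for cmt_elife: two rows copied from the output above.
toy_elife = pd.DataFrame(
    {"v1": [200000.0, 71255.79834], "v2": [266000.0, 94770.21179]},
    index=pd.DatetimeIndex(["2017-01-01", "2020-10-14"], tz="UTC", name="time"),
)

# Positional lookup, as in the cell above: row 1, then column 'v1'.
by_position = toy_elife.iloc[1]["v1"]

# Label lookup: the index is timezone-aware, so the label must be too.
by_label = toy_elife.loc[pd.Timestamp("2020-10-14", tz="UTC"), "v1"]

# Selecting the column first avoids building an intermediate row Series.
column_first = toy_elife["v1"].iloc[1]

print(by_position, by_label, column_first)
```

Label lookup is less fragile than `iloc` when rows are added to the frame later, since the position of a given timestamp can shift.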

### Understanding Schemas

Schemas are Python classes, so they can store values as well as the functions or operations that can be performed on those values. Here we will look at the ElectronLifetime schema.

In [72]:
xd.ElectronLifetime??

Init signature:
xd.ElectronLifetime(
    *,
    version: str,
    created_date: datetime.datetime = datetime.datetime(2022, 6, 21, 19, 51, 14, 391000),
    comments: str = '',
    run_id: rframe.types.TimeInterval,
    value: float,
) -> None


Let's note a few things here:
- Three values are required to make a new entry in the electron lifetime database: a version, a value and a run_id/time.
- If a run_id is given, xedocs will fetch the time corresponding to that run_id and use that.
- ElectronLifetime inherits from the class TimeIntervalCorrections, which means it gets all functions defined on that class. It also means we need to give a time interval for these corrections, not just a single time.

In [99]:
docs = xd.ElectronLifetime(version = cmt_elife.columns[1], value = cmt_elife.iloc[1][cmt_elife.columns[1]], 
                           time =(cmt_elife.index[1].tz_localize(None),cmt_elife.index[2].tz_localize(None)), 
                           comments = 'testing uploading date to the MongoDB using xedocs')

In [100]:
docs

ElectronLifetime(version='v1', created_date=datetime.datetime(2022, 6, 21, 19, 51, 14, 391000), comments='testing uploading date to the MongoDB using xedocs', time=TimeInterval(left=Timestamp('2020-10-14 00:00:00'), right=Timestamp('2020-10-14 06:00:00')), value=71255.79834)

We have created an entry locally for the ElectronLifetime database; we can save it to the MongoDB with a simple .save() call.

In [101]:
docs.save()

In [103]:
testing_save = xd.ElectronLifetime.find()

In [104]:
testing_save

[ElectronLifetime(version='v1', created_date=datetime.datetime(2022, 6, 21, 19, 51, 14, 391000), comments='testing uploading date to the MongoDB using xedocs', time=TimeInterval(left=datetime.datetime(2020, 10, 14, 0, 0), right=datetime.datetime(2020, 10, 14, 6, 0)), value=71255.79834)]
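The single-row construction above generalises to a whole CMT column by pairing each timestamp with the next one to form the interval. What follows is a minimal sketch under stated assumptions: `toy_elife` is a toy stand-in for `cmt_elife`, `interval_kwargs` is a hypothetical helper (not part of xedocs), and the keyword names mirror the ones used in this notebook. The actual `xd.ElectronLifetime(...)` call is left as a comment so the block runs without a database.

```python
import pandas as pd

# Toy stand-in for cmt_elife; values copied from the rows shown earlier.
# The NaN mimics rows where a version has no value yet.
index = pd.DatetimeIndex(
    ["2020-10-14 00:00:00", "2020-10-14 06:00:00",
     "2020-10-14 12:00:00", "2020-10-14 18:00:00"],
    tz="UTC", name="time",
)
toy_elife = pd.DataFrame(
    {"v1": [71255.79834, 69274.00208, 69039.80255, float("nan")]},
    index=index,
)


def interval_kwargs(df, version):
    """Build one kwargs dict per pair of consecutive timestamps."""
    series = df[version].dropna()            # skip rows without a value
    times = series.index.tz_localize(None)   # timezone-naive, as used above
    entries = []
    # zip stops at the shortest input, so the last value (which has no
    # right-hand edge) is left out.
    for left, right, value in zip(times[:-1], times[1:], series.to_numpy()):
        entries.append(
            {"version": version, "value": float(value), "time": (left, right)}
        )
    return entries


entries = interval_kwargs(toy_elife, "v1")
for kwargs in entries:
    print(kwargs["time"], kwargs["value"])
    # With xedocs available, each entry could then be built and saved, e.g.:
    # xd.ElectronLifetime(**kwargs, comments="migrated from CMT").save()
```

Note that the last valid row is dropped because it has no following timestamp to close its interval; how to handle that open-ended entry is a choice this sketch does not make.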

Now let's try to change the value of the ElectronLifetime data we just saved by making an identical entry that differs only in the value of the electron lifetime.

In [105]:
overlap_data = xd.ElectronLifetime(version = cmt_elife.columns[1], value = 10, 
                                   time =(cmt_elife.index[1].tz_localize(None),cmt_elife.index[2].tz_localize(None)), 
                                   comments = 'testing uploading date to the MongoDB using xedocs')

In [107]:
testing_save = [testing_save, overlap_data]

In [108]:
testing_save

[[ElectronLifetime(version='v1', created_date=datetime.datetime(2022, 6, 21, 19, 51, 14, 391000), comments='testing uploading date to the MongoDB using xedocs', time=TimeInterval(left=datetime.datetime(2020, 10, 14, 0, 0), right=datetime.datetime(2020, 10, 14, 6, 0)), value=71255.79834)],
 ElectronLifetime(version='v1', created_date=datetime.datetime(2022, 6, 21, 19, 51, 14, 391000), comments='testing uploading date to the MongoDB using xedocs', time=TimeInterval(left=Timestamp('2020-10-14 00:00:00'), right=Timestamp('2020-10-14 06:00:00')), value=10.0)]

In [109]:
overlap_data.save()

UpdateError: Cannot update existing instance (version='v1' created_date=datetime.datetime(2022, 6, 21, 19, 51, 14, 391000) comments='testing uploading date to the MongoDB using xedocs' time=TimeInterval(left=datetime.datetime(2020, 10, 14, 0, 0), right=datetime.datetime(2020, 10, 14, 6, 0)) value=71255.79834) with new instance (version='v1' created_date=datetime.datetime(2022, 6, 21, 19, 51, 14, 391000) comments='testing uploading date to the MongoDB using xedocs' time=TimeInterval(left=Timestamp('2020-10-14 00:00:00'), right=Timestamp('2020-10-14 06:00:00')) value=10.0), the schema raised the following exception: Values already set for {'version': 'v1', 'time': TimeInterval(left=datetime.datetime(2020, 10, 14, 0, 0), right=datetime.datetime(2020, 10, 14, 6, 0))}.

We got an error! This is good, as it prevents people from accidentally overwriting data in the MongoDB. If some data has been saved and needs to be changed, you will need to access the MongoDB directly.
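When saving many entries in a loop, it can help to collect the rejected ones instead of stopping at the first error. The sketch below illustrates the pattern with a toy, in-memory, insert-only store. Both `ToyStore` and `save_all` are hypothetical names made up for this illustration, and the toy's rule (reject any key that already exists) is only modelled on the error message above; it is not the xedocs implementation.

```python
class ToyStore:
    """Hypothetical insert-only store keyed by (version, time)."""

    def __init__(self):
        self._values = {}

    def save(self, version, time, value):
        key = (version, time)
        if key in self._values:
            # Mimics the wording of the message seen above.
            raise ValueError(f"Values already set for {key}")
        self._values[key] = value


def save_all(store, entries):
    """Try to save every entry; return the rejected ones with the reason."""
    rejected = []
    for entry in entries:
        try:
            store.save(**entry)
        except ValueError as err:
            rejected.append((entry, str(err)))
    return rejected


store = ToyStore()
interval = ("2020-10-14 00:00:00", "2020-10-14 06:00:00")
entries = [
    {"version": "v1", "time": interval, "value": 71255.79834},
    {"version": "v1", "time": interval, "value": 10},  # same key, new value
]
rejected = save_all(store, entries)
for entry, reason in rejected:
    print("rejected:", entry["value"], "-", reason)
```

With xedocs, the same loop would call `.save()` on each document and catch the `UpdateError` shown above; where that exception class is imported from is not shown in this notebook.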

In [112]:
overlap_data.value = 20

In [114]:
overlap_data.value

20.0

You can, however, overwrite whatever values you want locally!

In [94]:
cmt_elife.index[1]

Timestamp('2020-10-14 00:00:00+0000', tz='UTC')

In [96]:
cmt_elife.index[1].tz_localize(None)  # xedocs only accepts timezone-naive values, so the timezone is removed here

Timestamp('2020-10-14 00:00:00')

Small note: xedocs only accepts timezone-naive datetimes, so the timezone of datetime values from CMT and other data sources must be removed (set to None) for xedocs to accept them.
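`tz_localize(None)` simply drops the timezone and keeps the wall-clock time, which is only safe when the value is already in UTC, as the CMT timestamps above are. A slightly more defensive sketch converts to UTC first. The helper name `to_naive_utc` is hypothetical, and treating already-naive input as UTC is an assumption of this sketch.

```python
import pandas as pd


def to_naive_utc(ts):
    """Return a timezone-naive Timestamp holding the UTC wall-clock time."""
    ts = pd.Timestamp(ts)
    if ts.tzinfo is None:
        return ts  # already naive; assumed here to be in UTC
    return ts.tz_convert("UTC").tz_localize(None)


utc_time = pd.Timestamp("2020-10-14 00:00:00", tz="UTC")
offset_time = pd.Timestamp("2020-10-14 02:00:00+02:00")  # same instant, UTC+2

print(to_naive_utc(utc_time))
print(to_naive_utc(offset_time))
```

Both inputs describe the same instant, so both calls give the same naive timestamp; calling `tz_localize(None)` directly on the second one would instead keep the local 02:00 wall-clock time.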

### How to deal with dataframes with multiple fields as indexes

This section is more of a reminder for me in case I forget how to do this, but I figured I would share it here for others who have not encountered things like this before in Python.

In [77]:
pmt_1_df = xd.find_df('pmt_gains', version = 'v6', pmt = 415, detector = 'tpc')

In [85]:
pmt_1_df

version,time,detector,pmt,created_date,comments,value
v6,2020-04-01 12:06:54.000,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.047813
v6,2020-04-01 14:17:27.000,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.047813
v6,2020-04-01 14:17:28.000,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.005846
v6,2020-04-16 09:49:17.000,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.005846
v6,2020-04-16 09:49:18.000,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.047813
v6,...,...,...,...,...,...
v6,2021-12-09 16:11:19.000,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.007993
v6,2021-12-10 13:18:09.288,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.007994
v6,2021-12-17 10:13:10.000,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.007998
v6,2021-12-17 17:53:43.936,tpc,415,2022-05-01 18:33:48.959,migrated from cmt.,0.007999


So how do we get specific index values? We can use .groupby on an index level, for example pmt_1_df.groupby(level=1), where level 1 is the time level.

In [89]:
pmt_1_df.groupby(level=1).apply(lambda x: x.iloc[0]).index

DatetimeIndex([       '2020-04-01 12:06:54',        '2020-04-01 14:17:27',
                      '2020-04-01 14:17:28',        '2020-04-16 09:49:17',
                      '2020-04-16 09:49:18',        '2020-04-17 13:46:50',
                      '2020-04-17 13:46:51',        '2020-04-28 14:27:49',
                      '2020-04-28 14:27:50',        '2020-05-08 13:15:49',
               ...
               '2021-11-19 19:58:55.024000',        '2021-11-26 09:17:00',
               '2021-11-26 17:48:15.992000',        '2021-12-03 09:07:11',
               '2021-12-06 19:10:45.162000',        '2021-12-09 16:11:19',
               '2021-12-10 13:18:09.288000',        '2021-12-17 10:13:10',
               '2021-12-17 17:53:43.936000',        '2021-12-24 08:19:40'],
              dtype='datetime64[ns]', name='time', length=304, freq=None)

In [110]:
pmt_1_df.groupby(level=1).apply(lambda x: x.iloc[0]).index[0]

Timestamp('2020-04-01 12:06:54')
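For reference, pandas also offers more direct ways to reach one level of a multi-level index than the groupby route above. The sketch below uses a toy frame (an assumption, built from the first three rows of the output above) with the same four index levels, so it runs without access to the database.

```python
import pandas as pd

# Toy frame with the same four index levels as the pmt_gains output above.
rows = [
    ("v6", pd.Timestamp("2020-04-01 12:06:54"), "tpc", 415, 0.047813),
    ("v6", pd.Timestamp("2020-04-01 14:17:27"), "tpc", 415, 0.047813),
    ("v6", pd.Timestamp("2020-04-01 14:17:28"), "tpc", 415, 0.005846),
]
toy_gains = pd.DataFrame(
    rows, columns=["version", "time", "detector", "pmt", "value"]
).set_index(["version", "time", "detector", "pmt"])

# All values of one index level, selected by name instead of by position.
times = toy_gains.index.get_level_values("time")
print(times[0])

# Rows for one timestamp; xs drops the selected level from the result.
one_time = toy_gains.xs(pd.Timestamp("2020-04-01 14:17:28"), level="time")
print(one_time)

# Turn every index level back into an ordinary column.
flat = toy_gains.reset_index()
print(flat.columns.tolist())
```

Unlike the groupby version, `get_level_values` keeps duplicate timestamps if the level contains any, so the two approaches only agree when each time appears once.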

In [83]:
df_cd_pmt1 = cmt.read('pmt_415_gain_xenonnt')

In [84]:
df_cd_pmt1['v6'].index[1:]

DatetimeIndex([       '2020-04-01 12:06:54+00:00',
                      '2020-04-01 14:17:27+00:00',
                      '2020-04-01 14:17:28+00:00',
                      '2020-04-16 09:49:17+00:00',
                      '2020-04-16 09:49:18+00:00',
                      '2020-04-17 13:46:50+00:00',
                      '2020-04-17 13:46:51+00:00',
                      '2020-04-28 14:27:49+00:00',
                      '2020-04-28 14:27:50+00:00',
                      '2020-05-08 13:15:49+00:00',
               ...
                      '2022-02-25 08:09:12+00:00',
                      '2022-03-04 14:00:56+00:00',
                      '2022-03-11 08:24:10+00:00',
                      '2022-03-18 09:09:50+00:00',
                      '2022-03-23 14:02:35+00:00',
                      '2022-03-25 09:07:19+00:00',
               '2022-05-02 12:38:49.267000+00:00',
                      '2022-06-03 15:00:00+00:00',
               '2022-06-16 15:57:00.667000+00:00',
            