## Metadata

This notebook discusses how py4DSTEM handles metadata, including:
1. Basics: how metadata is
    - initially scraped from raw files
    - subsequently edited
    - stored and retrieved
2. Association with data:
    - when creating a new dataobject, it may either point to an existing metadata object, or a new one may be generated
3. Saving and loading:
    - file conventions when many dataobjects are saved together with more than one metadata object
4. Editing default scraping behavior:
    - how to tell py4DSTEM to permanently scrape new metadata from input files

In [1]:
# Imports

import numpy as np
import matplotlib.pyplot as plt
import py4DSTEM
from py4DSTEM.file.io import save, FileBrowser

In [2]:
# Load data

fp = "/home/ben/Data/20180830_zeltman_smallsampledatasets/1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3"
dc = py4DSTEM.file.io.read(fp)
dc.set_scan_shape(10,10)

/home/ben/Data/20180830_zeltman_smallsampledatasets/1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3 is not a py4DSTEM file.
Reading with hyperspy...


### Basics: how metadata is

- initially scraped from raw files
- subsequently edited
- stored and retrieved

In [3]:
# Whenever data is loaded, a Metadata object is created.
# It is associated with the DataObject (often a DataCube) created to store the data, and is
# accessible as an attribute of that dataobject

type(dc.metadata)

py4DSTEM.file.datastructure.metadata.Metadata

In [4]:
# A Metadata instance contains metadata in five dictionaries: microscope, sample, user, calibration, and comments

dc.metadata.__dict__.keys()

dict_keys(['microscope', 'sample', 'user', 'calibration', 'comments', 'original_metadata', '_search_dicts'])

In [5]:
# Each dictionary contains key-value pairs of relevant information

dc.metadata.microscope

{'accelerating_voltage': 300.0,
 'accelerating_voltage_units': '',
 'camera_length': 195.0,
 'camera_length_units': '',
 'C2_aperture': '',
 'convergence_semiangle_mrad': '',
 'spot_size': '',
 'scan_rotation_degrees': '',
 'dwell_time': '',
 'dwell_time_units': '',
 'scan_size_Ny': '',
 'scan_size_Nx': '',
 'R_pix_size': '',
 'R_pix_units': '',
 'K_pix_size': '',
 'K_pix_units': '',
 'probe_FWHM_nm': '',
 'acquisition_date': '',
 'original_filename': '1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3'}

In [6]:
dc.metadata.microscope['accelerating_voltage']

300.0

In [7]:
# You can get the value of some field directly through the dictionaries, or with the get_metadata_item() method

dc.metadata.get_metadata_item('accelerating_voltage')

300.0
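
A lookup by key alone, as with `get_metadata_item()`, implies searching across several dictionaries. The sketch below shows one way such a lookup could work. It is an illustration only, not py4DSTEM's implementation: `metadata_dicts` and `find_item` are hypothetical names, and plain Python dicts stand in for the Metadata object.

```python
# Hypothetical stand-in for the five metadata dictionaries (not py4DSTEM code)
metadata_dicts = {
    "microscope": {"accelerating_voltage": 300.0, "camera_length": 195.0, "spot_size": ""},
    "sample": {},
    "user": {},
    "calibration": {},
    "comments": {},
}

def find_item(dicts, key):
    """Return (found, value) from the first dictionary that contains key."""
    for d in dicts.values():
        if key in d:
            return True, d[key]
    return False, None

found, voltage = find_item(metadata_dicts, "accelerating_voltage")
print(found, voltage)
```

Returning a `(found, value)` pair rather than just the value lets a caller tell a missing key apart from a key that is present but empty.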

In [8]:
# For files loaded with hyperspy, metadata is first loaded into hyperspy DictionaryTreeBrowser objects, which
# are then searched to populate the py4DSTEM metadata dictionaries.  The original hyperspy trees are
# also saved, in metadata.original_metadata.shortlist and metadata.original_metadata.all

dc.metadata.original_metadata.shortlist

├── Acquisition_instrument
│   └── TEM
│       ├── acquisition_mode = TEM
│       ├── beam_current = 0.005
│       ├── beam_energy = 300.0
│       ├── camera_length = 195.0
│       ├── magnification = 1200.0
│       └── microscope = FEI Tecnai Remote
├── General
│   ├── date = 2018-06-11
│   ├── original_filename = 1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3
│   ├── time = 16:22:54
│   └── title = 1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec
└── Signal
    ├── Noise_properties
    │   └── Variance_linear_model
    │       ├── gain_factor = 1.0
    │       └── gain_offset = 0.0
    ├── binned = False
    ├── quantity = Intensity
    └── signal_type = 

In [9]:
# You can search and retrieve items from these hyperspy objects with the search_hs_tree() method

key = 'beam_current'
found, val = dc.metadata.search_hs_tree(key, dc.metadata.original_metadata.shortlist)

if found:
    print('{} = {}'.format(key,val))
else:
    print('{} not found'.format(key))

beam_current = 0.005
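
Searching a tree for a key amounts to a recursive walk over nested containers. The following is a minimal sketch of that idea on ordinary nested dicts, assuming a depth-first search that returns the first match. `search_tree` is a hypothetical helper written for this example. It is not the `search_hs_tree()` method, and it does not use hyperspy's DictionaryTreeBrowser.

```python
def search_tree(key, tree):
    """Depth-first search of a nested dict; returns (found, value) for the first match."""
    for k, v in tree.items():
        if k == key:
            return True, v
        if isinstance(v, dict):
            found, val = search_tree(key, v)
            if found:
                return True, val
    return False, None

# A small nested dict mirroring part of the tree printed above
tree = {
    "Acquisition_instrument": {"TEM": {"beam_current": 0.005, "beam_energy": 300.0}},
    "General": {"date": "2018-06-11"},
}

key = "beam_current"
found, val = search_tree(key, tree)
print("{} = {}".format(key, val) if found else "{} not found".format(key))
```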


In [10]:
# Metadata can be altered directly in the dictionaries

dc.metadata.microscope['spot_size'] = 9
dc.metadata.microscope

{'accelerating_voltage': 300.0,
 'accelerating_voltage_units': '',
 'camera_length': 195.0,
 'camera_length_units': '',
 'C2_aperture': '',
 'convergence_semiangle_mrad': '',
 'spot_size': 9,
 'scan_rotation_degrees': '',
 'dwell_time': '',
 'dwell_time_units': '',
 'scan_size_Ny': '',
 'scan_size_Nx': '',
 'R_pix_size': '',
 'R_pix_units': '',
 'K_pix_size': '',
 'K_pix_units': '',
 'probe_FWHM_nm': '',
 'acquisition_date': '',
 'original_filename': '1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3'}

### Association with data

In [11]:
# When a new DataObject is created, by default it is not associated with any metadata

diffslice = py4DSTEM.file.datastructure.DiffractionSlice(data=np.zeros((50,50)))
assert diffslice.metadata is None

In [12]:
# Metadata can be attached to a DataObject on instantiation using the metadata kwarg.
# We can pass either a Metadata object directly, or a DataObject which already has metadata;
# thus the two lines of code below are completely equivalent

diffslice = py4DSTEM.file.datastructure.DiffractionSlice(data=np.zeros((50,50)),metadata=dc.metadata)
diffslice = py4DSTEM.file.datastructure.DiffractionSlice(data=np.zeros((50,50)),metadata=dc)

In [13]:
diffslice.metadata

<py4DSTEM.file.datastructure.metadata.Metadata at 0x7f3529a34278>
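
Accepting either a Metadata object or a DataObject for the same keyword can be handled by resolving the argument in the constructor. The sketch below illustrates that pattern under stated assumptions: `ToyMetadata` and `ToyDataObject` are hypothetical stand-ins written for this example, and the real py4DSTEM classes may do this differently.

```python
class ToyMetadata:
    """Hypothetical stand-in for a metadata container."""
    def __init__(self):
        self.microscope = {}

class ToyDataObject:
    """Hypothetical stand-in for a data-carrying object with optional metadata."""
    def __init__(self, metadata=None):
        # If another data object was passed, use the metadata it already points to
        if isinstance(metadata, ToyDataObject):
            metadata = metadata.metadata
        self.metadata = metadata

md = ToyMetadata()
parent = ToyDataObject(metadata=md)
child_a = ToyDataObject(metadata=md)      # pass the metadata directly
child_b = ToyDataObject(metadata=parent)  # pass an object that carries it

# Both routes end up pointing at the same instance
print(child_a.metadata is child_b.metadata)
```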

In [14]:
# Under the hood, there is now a single Metadata object and both dc and diffslice point to it.
# So if we change the metadata of diffslice, it will change in dc as well

diffslice.metadata.microscope

{'accelerating_voltage': 300.0,
 'accelerating_voltage_units': '',
 'camera_length': 195.0,
 'camera_length_units': '',
 'C2_aperture': '',
 'convergence_semiangle_mrad': '',
 'spot_size': 9,
 'scan_rotation_degrees': '',
 'dwell_time': '',
 'dwell_time_units': '',
 'scan_size_Ny': '',
 'scan_size_Nx': '',
 'R_pix_size': '',
 'R_pix_units': '',
 'K_pix_size': '',
 'K_pix_units': '',
 'probe_FWHM_nm': '',
 'acquisition_date': '',
 'original_filename': '1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3'}

In [15]:
diffslice.metadata.microscope['spot_size'] = 7
dc.metadata.get_metadata_item('spot_size')

7

In [16]:
# If we need to decouple the metadata of two objects - for instance, if we want to resample diffslice but not dc,
# so that their dimensions and pixel sizes will differ - all DataObjects have a new_metadata() method which
# generates a new, distinct copy of a Metadata object which only that DataObject points to.

# With no arguments, dataobject.new_metadata() takes the dataobject's current metadata, copies it, and reassigns
# dataobject.metadata to the copy.
diffslice.new_metadata()
diffslice.metadata.microscope['spot_size'] = 3
print(dc.metadata.get_metadata_item('spot_size'))
print(diffslice.metadata.get_metadata_item('spot_size'))

7
3
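
The difference between sharing and copying can be reproduced with plain Python dicts. This is a sketch of the general mechanism, assuming a deep copy is what decouples the two objects. Whether `new_metadata()` uses `copy.deepcopy` internally is not confirmed here.

```python
import copy

# Two names bound to one object: an edit through either name is visible through both
shared = {"microscope": {"spot_size": 9}}
a = shared
b = shared
b["microscope"]["spot_size"] = 7
print(a["microscope"]["spot_size"])

# After a deep copy, b has its own nested dictionaries and edits no longer propagate
b = copy.deepcopy(b)
b["microscope"]["spot_size"] = 3
print(a["microscope"]["spot_size"], b["microscope"]["spot_size"])
```

A shallow copy would not be enough in this example: the outer dict would be new, but the inner `microscope` dict would still be shared between `a` and `b`.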


In [17]:
# Alternatively, we can pass dataobject.new_metadata() the keyword metadata, pointing to either a Metadata object
# or a DataCube which possesses a metadata object.  In each case, the final outcome is that dataobject will have
# its own new copy of the metadata.

diffslice.new_metadata(metadata = dc)
print(dc.metadata.get_metadata_item('spot_size'))
print(diffslice.metadata.get_metadata_item('spot_size'))
diffslice.metadata.microscope['spot_size'] = 3
print(dc.metadata.get_metadata_item('spot_size'))
print(diffslice.metadata.get_metadata_item('spot_size'))

7
7
7
3


In [18]:
# For the save/load test below, we'll make one more dataobject, this time one which shares the metadata
# associated with dc

realslice = py4DSTEM.file.datastructure.RealSlice(data=np.zeros((20,20)), metadata=dc)

### Saving and loading

When saving many dataobjects, the .h5 file will contain a separate metadata group for each distinct metadata object associated with the dataobjects being saved.  When loading, dataobjects with distinct metadata will have unique metadata objects created and associated with them, while dataobjects with the same metadata will all link to the same metadata object.
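
One way to decide how many metadata groups a file needs is to group the dataobjects by the identity of the metadata they point to. The sketch below illustrates that bookkeeping with plain dicts. `group_by_metadata` is a hypothetical helper, and nothing here reflects how py4DSTEM's `save` function is actually written.

```python
def group_by_metadata(dataobjects):
    """Assign each object the index of its metadata group, keyed on object identity.

    Returns (assignments, n_groups).
    """
    index_of = {}
    assignments = []
    for obj in dataobjects:
        key = id(obj["metadata"])  # identity, not equality of contents
        if key not in index_of:
            index_of[key] = len(index_of)
        assignments.append(index_of[key])
    return assignments, len(index_of)

md_a = {"spot_size": 7}
md_b = {"spot_size": 3}
objs = [
    {"name": "datacube_0", "metadata": md_a},
    {"name": "diffractionslice_0", "metadata": md_b},
    {"name": "realslice_0", "metadata": md_a},
]

assignments, n_groups = group_by_metadata(objs)
print(assignments, n_groups)
```

Keying on identity rather than on contents means two metadata objects that happen to hold equal values are still written as separate groups, which preserves the distinction on load.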

In [19]:
# Save three dataobjects, which have two distinct Metadata objects associated with them

fp = "/home/ben/Desktop/test.h5"
save([dc,diffslice,realslice],fp)

Creating file /home/ben/Desktop/test.h5...
Done.


In [20]:
browser = FileBrowser(fp)
browser.show_dataobjects()

 Index                  Name                        Type        
   0    datacube_0                          DataCube            
   1    diffractionslice_0                  DiffractionSlice    
   2    realslice_0                         RealSlice           


In [21]:
# Load all three objects

dc_new = browser.get_dataobject(0)
diffslice_new = browser.get_dataobject(1)
realslice_new = browser.get_dataobject(2)
browser.close()

In [None]:
# dc_new and realslice_new point to the same Metadata instance

dc_new.metadata is realslice_new.metadata

In [None]:
# dc_new and diffslice_new point to different Metadata instances

dc_new.metadata is diffslice_new.metadata

In [26]:
# The metadata was correctly saved, stored, and loaded

dc_new.metadata.microscope

{'C2_aperture': '',
 'K_pix_size': '',
 'K_pix_units': '',
 'R_pix_size': '',
 'R_pix_units': '',
 'accelerating_voltage': 300.0,
 'accelerating_voltage_units': '',
 'acquisition_date': '',
 'camera_length': 195.0,
 'camera_length_units': '',
 'convergence_semiangle_mrad': '',
 'dwell_time': '',
 'dwell_time_units': '',
 'original_filename': '1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3',
 'probe_FWHM_nm': '',
 'scan_rotation_degrees': '',
 'scan_size_Nx': '',
 'scan_size_Ny': '',
 'spot_size': 7}

In [27]:
diffslice_new.metadata.microscope

{'C2_aperture': '',
 'K_pix_size': '',
 'K_pix_units': '',
 'R_pix_size': '',
 'R_pix_units': '',
 'accelerating_voltage': 300.0,
 'accelerating_voltage_units': '',
 'acquisition_date': '',
 'camera_length': 195.0,
 'camera_length_units': '',
 'convergence_semiangle_mrad': '',
 'dwell_time': '',
 'dwell_time_units': '',
 'original_filename': '1_10x10 ss=100 alpha=p48 spot 11 cl=1200 300kV RT bin=4 0p5sec.dm3',
 'probe_FWHM_nm': '',
 'scan_rotation_degrees': '',
 'scan_size_Nx': '',
 'scan_size_Ny': '',
 'spot_size': 3}

### Editing / adding metadata

In [36]:
# Adding new metadata is the same as editing existing metadata - just use the appropriate dictionaries:

dc.metadata.sample['sample'] = 'no moon; a space station'

In [37]:
dc.metadata.get_metadata_item('sample')

'no moon; a space station'

In [38]:
comment = ("Don't be too proud of this technological terror you've constructed. "
           "The ability to destroy a planet is insignificant next to the power of the force.")

dc.metadata.comments["DVs cautionary"] = comment

In [39]:
dc.metadata.get_metadata_item('DVs cautionary')

"Don't be too proud of this technological terror you've constructed. The ability to destroy a planet is insignificant next to the power of the force."

In [40]:
# To tell py4DSTEM to automatically scrape an item of metadata from the DictionaryTreeBrowsers on load, edit
# the Metadata method setup_search_dicts(), in ./file/datastructure/metadata.py, adding key:value pairs that
# specify the py4DSTEM:hyperspy strings associated with the metadata item of interest

dc.metadata.setup_search_dicts?
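
The key:value idea can be sketched as a mapping from py4DSTEM field names to the names used in the source tree, applied by a small scraping loop. Everything below is an assumption made for illustration: `search_dict`, `search_tree`, and `scrape` are hypothetical, the tree is a plain nested dict, and the actual contents and layout of `setup_search_dicts()` may differ.

```python
# Hypothetical mapping: py4DSTEM field name -> key to look for in the source tree
search_dict = {
    "accelerating_voltage": "beam_energy",
    "camera_length": "camera_length",
    "beam_current": "beam_current",  # a newly added item to scrape
    "spot_size": "spot_size",        # not present in the tree below
}

def search_tree(key, tree):
    """Depth-first search of a nested dict; returns (found, value)."""
    for k, v in tree.items():
        if k == key:
            return True, v
        if isinstance(v, dict):
            found, val = search_tree(key, v)
            if found:
                return True, val
    return False, None

def scrape(search_dict, tree):
    """Fill each target field from the tree, leaving '' where nothing is found."""
    out = {}
    for target, source in search_dict.items():
        found, val = search_tree(source, tree)
        out[target] = val if found else ""
    return out

tree = {
    "Acquisition_instrument": {
        "TEM": {"beam_current": 0.005, "beam_energy": 300.0, "camera_length": 195.0}
    }
}

scraped = scrape(search_dict, tree)
print(scraped)
```

Under this scheme, scraping a new item only requires adding one entry to the mapping; fields with no match keep an empty string, as in the dictionaries shown earlier in the notebook.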