# Opening EMPIAR-10934 To Check tiff.bz2 Compatibility

The EMPIARreader CLI can reach volumeEM packages because their EMPIAR data is stored on the same FTP server as the cryoEM EMPIAR datasets. We also need to check that tiff.bz2 files (fractionated images) can be opened sensibly.

This includes checking:

* that tiff (or tiff.bz2) files can be opened via the API (checked here with EM data and a tiff.bz2 file)
* how access to image stacks works in the API
* the detailed format/object type of these files once opened
* how to visualise them (in an ipynb notebook)
* that an mrc file of volumeEM data opens (from EMPIAR-10442)

In [2]:
from empiarreader import EmpiarSource, EmpiarCatalog
import matplotlib.pyplot as plt

EM_entry = 10934

In [6]:
test_catalog = EmpiarCatalog(EM_entry)
list(test_catalog.keys())



['Multiframe micrographs of human Pol κ complexed with monoubiquitylated PCNA and DNA']

In [7]:
test_catalog_dir = list(test_catalog.keys())[0]
dataset_from_catalog = test_catalog[test_catalog_dir]
dataset_from_catalog

"Multiframe micrographs of human Pol \u03BA complexed with monoubiquitylated PCNA and DNA":
  args:
    directory: data/CL44-1_20201106_111915
    empiar_index: 10934
    imageset_metadata:
      category: micrographs - multiframe
      data_format: TIFF
      details: "The given pixel size accounts for super resolution binning 1 - the\
        \ physical pixel size was 1.086 \xC5/pix. Movies are non-gain normalised -\
        \ gain reference included."
      directory: data/CL44-1_20201106_111915
      frame_range_max: null
      frame_range_min: null
      frames_per_image: 45
      header_format: TIFF
      image_height: '8184'
      image_width: '11520'
      micrographs_file_pattern: data/CL44-1_20201106_111915/Images-Disc1/GridSquare_*/Data/FoilHole_*_fractions.tiff.bz2
      name: "Multiframe micrographs of human Pol \u03BA complexed with monoubiquitylated\
        \ PCNA and DNA"
      num_images_or_tilt_series: 3889
      picked_particles_directory: ''
      picked_particles_

### Opening data from a specific directory

In [8]:
test_directory = "data/CL44-1_20201106_111915/Images-Disc1/GridSquare_6089277/Data"
wildcard_10934 = "*003111*.tiff.bz2"

In [9]:
ds = EmpiarSource(
        EM_entry,
        directory=test_directory,
        filename=wildcard_10934,
        regexp=False,
    )

ds

empiar:
  args:
    directory: data/CL44-1_20201106_111915/Images-Disc1/GridSquare_6089277/Data
    empiar_index: 10934
    filename: '*003111*.tiff.bz2'
    regexp: false
  description: ''
  driver: empiarreader.empiar.empiar.EmpiarSource
  metadata: {}


In [10]:
part = ds.read_partition(0)
part

ValueError: Could not find a backend to open `<File-like object HTTPFileSystem, https://ftp.ebi.ac.uk/empiar/world_availability/10934/data/CL44-1_20201106_111915/Images-Disc1/GridSquare_6089277/Data/FoilHole_6099219_Data_6093696_6093698_20201107_003111_fractions.tiff.bz2>`` with iomode `ri`.
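
The error above suggests the image reader found no plugin for the bz2-wrapped file. One possible workaround, sketched here and not tested against EMPIAR itself, is to decompress the stream in memory first and confirm that the result starts with a TIFF signature. The helper names (`decompress_bz2`, `looks_like_tiff`) are made up for illustration, and the demo uses synthetic bytes rather than a real micrograph.

```python
import bz2
import io

# Standard TIFF and BigTIFF signatures (little- and big-endian byte order).
TIFF_MAGICS = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def decompress_bz2(fileobj):
    """Hypothetical helper: return a seekable buffer of the decompressed bytes."""
    with bz2.open(fileobj, "rb") as stream:
        return io.BytesIO(stream.read())


def looks_like_tiff(buffer):
    """Hypothetical helper: check the first four bytes against TIFF signatures."""
    head = buffer.read(4)
    buffer.seek(0)  # rewind so a TIFF reader can start from the beginning
    return head in TIFF_MAGICS


# Synthetic stand-in for a downloaded *.tiff.bz2 file (not real EMPIAR data).
fake_tiff = b"II*\x00" + bytes(64)
compressed = io.BytesIO(bz2.compress(fake_tiff))

buffer = decompress_bz2(compressed)
print(looks_like_tiff(buffer))
```

The resulting buffer could then be passed to a TIFF reader that accepts file-like objects. Which reader to use, and whether a full 45-frame movie fits comfortably in memory, is left open here.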

In [None]:
title = "Images from EMPIAR-"+str(EM_entry)
plt.figure(figsize=(16, 12))
plt.suptitle(title, fontsize=15)

for i in range(4):
    part = ds.read_partition(i)
    image_part = part.data
    ax = plt.subplot(2, 2, i+1)

    plt.imshow(
        image_part,cmap='gray',
    )
    ax.set_title("Image "+str(i))
    ax.axis("off")
    ax.set_xlabel("")

### Looks like intake can't decode tiff.bz2 files

So let's try .tif files instead.

In [1]:
tif_entry = 10943

In [3]:
test_catalog_tif = EmpiarCatalog(tif_entry)
list(test_catalog_tif.keys())

['Unaligned multi-frame movies in Tiff: Dataset 2']

In [4]:
test_catalog_dir_tif = list(test_catalog_tif.keys())[0]
dataset_from_catalog_tif = test_catalog_tif[test_catalog_dir_tif]
dataset_from_catalog_tif

'Unaligned multi-frame movies in Tiff: Dataset 2':
  args:
    directory: data/Tiff
    empiar_index: 10943
    imageset_metadata:
      category: micrographs - multiframe
      data_format: TIFF
      details: "Movies were collected in EER format and converted into tiff using\
        \ relion_convert_to_tiff grouping frames by 34 giving a dose of 0.98 e/\xC5\
        ^2. Please see additional files for complete processing pipeline."
      directory: data/Tiff
      frame_range_max: null
      frame_range_min: null
      frames_per_image: 34
      header_format: TIFF
      image_height: null
      image_width: null
      micrographs_file_pattern: ''
      name: 'Unaligned multi-frame movies in Tiff: Dataset 2'
      num_images_or_tilt_series: 4000
      picked_particles_directory: ''
      picked_particles_file_pattern: ''
      pixel_height: 0.824
      pixel_width: 0.824
      segmentations: []
      voxel_type: UNSIGNED BYTE
  description: "Movies were collected in EER format and c

In [5]:
test_directory_tif = "data/Tiff/EER/Images-Disc1/GridSquare_11149061/Data"
wildcard_10943 = "*20210911_233708_EER.tif"
# wildcard_10943 = "*"

In [6]:
ds_tif = EmpiarSource(
        tif_entry,
        directory=test_directory_tif,
        filename=wildcard_10943,
        regexp=False,
    )

ds_tif

empiar:
  args:
    directory: data/Tiff/EER/Images-Disc1/GridSquare_11149061/Data
    empiar_index: 10943
    filename: '*20210911_233708_EER.tif'
    regexp: false
  description: ''
  driver: empiarreader.empiar.empiar.EmpiarSource
  metadata: {}


In [7]:
part_tif = ds_tif.read_partition(0)
part_tif



TypeError: For Xarray sources, must specify partition as tuple

In [None]:
title_tif = "Images from EMPIAR-"+str(tif_entry)
plt.figure(figsize=(16, 12))
plt.suptitle(title_tif, fontsize=15)

for i in range(4):
    part_tif = ds_tif.read_partition(i)
    image_part_tif = part_tif.data
    ax = plt.subplot(2, 2, i+1)

    plt.imshow(
        image_part_tif,cmap='gray',
    )
    ax.set_title("Image "+str(i))
    ax.axis("off")
    ax.set_xlabel("")

## Checking volumeEM (mrc) data opens okay

EMPIAR entry 10442 with file "data/MRC_Files/*278-307um.mrc"

(full https version of ftp path: "https://ftp.ebi.ac.uk/empiar/world_availability/10442/data/MRC_Files/*278-307um.mrc")
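
As a rough illustration of how the entry number, directory and wildcard combine into the https path above, here is a minimal sketch. The root URL is taken from the paths shown in this notebook; the helper names and the example file listing are hypothetical, and this is not how EMPIARreader itself resolves files.

```python
import fnmatch
import posixpath

# Root taken from the https paths that appear in this notebook.
EMPIAR_HTTPS_ROOT = "https://ftp.ebi.ac.uk/empiar/world_availability"


def empiar_url(entry, directory, filename=""):
    """Hypothetical helper: join entry, directory and filename into one URL."""
    return posixpath.join(
        EMPIAR_HTTPS_ROOT, str(entry), directory.strip("/"), filename
    )


def match_filenames(filenames, wildcard):
    """Hypothetical helper: case-sensitive shell-style wildcard filtering."""
    return sorted(n for n in filenames if fnmatch.fnmatchcase(n, wildcard))


print(empiar_url(10442, "data/MRC_Files", "*278-307um.mrc"))

# Made-up directory listing, only to show how the wildcard narrows it down.
listing = ["a_278-307um.mrc", "a_307-336um.mrc", "notes.txt"]
print(match_filenames(listing, "*278-307um.mrc"))
```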

In [8]:
vem_entry = 10442

In [9]:
test_catalog_vem = EmpiarCatalog(vem_entry)
list(test_catalog_vem.keys())

['Sieve element cells of Arabidopsis thaliana roots']

In [10]:
test_catalog_dir_vem = list(test_catalog_vem.keys())[0]
dataset_from_catalog_vem = test_catalog_vem[test_catalog_dir_vem]
dataset_from_catalog_vem

Sieve element cells of Arabidopsis thaliana roots:
  args:
    directory: data/MRC_Files
    empiar_index: 10442
    imageset_metadata:
      category: micrographs - multiframe
      data_format: MRC
      details: "Processed datasets: image processing was done in Microscopy Image\
        \ Browser and included adjustment of contrast, alignment and calibration.\
        \ The bounding box coordinates represent position of the dataset relative\
        \ to the tip of the root\n\nDataset: 180130_PLM_SE_up_278-307um.am\nVoxel\
        \ size: 0.010 x 0.01 x 0.04 um \nBoundingBox -161.23100 -135.28000 -327.25700\
        \ -309.95700 278.23500 306.23500\nMIB(1802021310): Aligned using Drift correction;\
        \ relative to 0\nMIB(1802021312): Aligned using Single landmark point; relative\
        \ to 0\nMIB(1802021316): ImCrop: [x1 y1 dx dy z1 dz t1 dt]: [183   462  2596\
        \  1731     1   717     1     1]\nMIB(1802021319): Normalize contrast, mode:\
        \ mask, Ch: 1\nMIB(1

In [11]:
test_directory_vem = "data/MRC_Files"
wildcard_10442 = "*278-307um.mrc"

In [12]:
ds_vem = EmpiarSource(
        vem_entry,
        directory=test_directory_vem,
        filename=wildcard_10442,
        regexp=False,
    )

ds_vem

empiar:
  args:
    directory: data/MRC_Files
    empiar_index: 10442
    filename: '*278-307um.mrc'
    regexp: false
  description: ''
  driver: empiarreader.empiar.empiar.EmpiarSource
  metadata: {}


In [13]:
part_vem = ds_vem.read_partition(0)
part_vem

: 
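
Before pulling a whole volume, it may be worth inspecting only the fixed 1024-byte MRC header to see the dimensions and data mode. The sketch below parses such a header with the standard library. It assumes a little-endian MRC2014 file, the function name `parse_mrc_header` is made up, and the demo header is synthetic rather than read from EMPIAR-10442.

```python
import struct

MRC_HEADER_SIZE = 1024
# Common MRC2014 data modes; anything else is reported as unknown.
MRC_MODES = {0: "int8", 1: "int16", 2: "float32", 6: "uint16"}


def parse_mrc_header(header):
    """Hypothetical helper: read shape, mode and the 'MAP ' tag from a header."""
    if len(header) < MRC_HEADER_SIZE:
        raise ValueError("need at least 1024 bytes of header")
    # Words 1-4 are NX, NY, NZ and MODE (assumed little-endian here).
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    # Word 53 (byte offset 208) holds the identifier 'MAP ' in MRC2014 files.
    map_id = bytes(header[208:212])
    return {
        "shape": (nz, ny, nx),
        "mode": MRC_MODES.get(mode, f"unknown ({mode})"),
        "has_map_id": map_id == b"MAP ",
    }


# Synthetic header with arbitrary dimensions, not the real dataset's values.
demo_header = bytearray(MRC_HEADER_SIZE)
struct.pack_into("<4i", demo_header, 0, 64, 32, 8, 0)
demo_header[208:212] = b"MAP "

info = parse_mrc_header(demo_header)
print(info)
```

A header whose `has_map_id` is false, or whose dimensions look implausible, would hint at a different byte order or a non-MRC2014 file.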

In [None]:
title_vem = "Images from EMPIAR-"+str(vem_entry)
plt.figure(figsize=(16, 12))
plt.suptitle(title_vem, fontsize=15)

for i in range(1):
    part_vem = ds_vem.read_partition(i)
    image_part_vem = part_vem.data
    ax = plt.subplot(1, 1, i+1)

    plt.imshow(
        image_part_vem,cmap='gray',
    )
    ax.set_title("Image "+str(i))
    ax.axis("off")
    ax.set_xlabel("")