- **Author:** [Dace Apšvalka](https://www.mrc-cbu.cam.ac.uk/people/dace.apsvalka/) 
- **Date:** August 2024  
- **conda environment**: I used the [fMRI workshop's conda environment](https://github.com/MRC-CBU/COGNESTIC/blob/c0dc3faa699e19187d5d5a8fb491a66baa27b9fb/mri_environment.yml) to run this notebook and any accompanying scripts.

# fMRI Data Management
* Brief overview of the importance of data management in fMRI research.
* Objectives of the notebook (e.g., understanding data organization, file formats, metadata, and version control).
* Outline of the topics covered.

-----------

**Table of contents**<a id='toc0_'></a>    
1. [File types and formats](#toc1_)    
2. [Create a project folder](#toc2_)    
3. [Retrieving the DICOM files](#toc3_)    
4. [Brain Imaging Data Structure (BIDS)](#toc4_)    
5. [HeuDiConv](#toc5_)    
5.1. [Step 1: Discovering your scans](#toc5_1_)    
5.2. [Step 2: Creating a heuristic file](#toc5_2_)    
5.3. [Step 3: Converting the data](#toc5_3_)    
5.4. ['To Do' - additional information to check and add](#toc5_4_)    
6. [Validate BIDS structure](#toc6_)    
7. [PyBIDS](#toc7_)    
7.1. [Querying the BIDSLayout](#toc7_1_)    
7.2. [Filtering files by entities](#toc7_2_)    
7.3. [Filtering by metadata](#toc7_3_)    
7.4. [Other `return_type` values](#toc7_4_)    
7.5. [The `BIDSFile`](#toc7_5_)    
7.6. [`.tsv` files](#toc7_6_)    
7.7. [Filename parsing](#toc7_7_)    
7.8. [Report generation](#toc7_8_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=true
	minLevel=2
	maxLevel=3
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

------------

## 1. <a id='toc1_'></a>[File types and formats](#toc0_)

Brain images:
* DICOM
* NIfTI

Task data: stimulus onset times and durations

Metadata

## 2. <a id='toc2_'></a>[Create a project folder](#toc0_)

Here is a recommended folder structure for your fMRI project. 

```bash
# ======================================================================
# Recommended directory structure for an fMRI project
# ======================================================================

# project_name                     
#    └── code
#        └── task
#        └── preprocessing
#        └── analysis
#    └── data
#    └── documents      # Protocols, reports, and other documentation
#    └── results        # Analysis results, figures, and summary outputs
#    └── scratch        # Temporary files and intermediate results
#    └── logs           # Log files, error reports, and processing records

```

You can create it manually or with a single shell command:
```bash
mkdir -p My_fMRI_study/{code/{task,preprocessing,analysis},data,documents,results,scratch,logs}
```
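
If you prefer to stay in Python (for example, inside this notebook), the same skeleton can be created with `pathlib`. This is a minimal sketch; the function name `create_project` is my own choice and not part of any library.

```python
from pathlib import Path

# Sub-folders from the recommended structure above
SUBFOLDERS = (
    "code/task",
    "code/preprocessing",
    "code/analysis",
    "data",
    "documents",
    "results",
    "scratch",
    "logs",
)


def create_project(root):
    """Create the project skeleton under `root` and return the created folders."""
    root = Path(root)
    created = []
    for sub in SUBFOLDERS:
        folder = root / sub
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project("My_fMRI_study")` builds the same folders as the `mkdir -p` command above and can be rerun without errors.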

In [None]:
!tree -L 3

## 3. <a id='toc3_'></a>[Retrieving the DICOM files](#toc0_)

`DICOM` files are the raw imaging files that come from the MRI scanner. Usually they are stored on some MRI data server. At the CBU, each imaging project has a unique code. Knowing my project's code, I can locate the raw `DICOM` files on our server.


In [None]:
!ls -d mridata/*_MR09029/*

## 4. <a id='toc4_'></a>[Brain Imaging Data Structure (BIDS)](#toc0_)

***!For a more detailed tutorial see: [https://github.com/MRC-CBU/BIDS_conversion/tree/main/MRI](https://github.com/MRC-CBU/BIDS_conversion/tree/main/MRI).***

To proceed with analysis, we need to convert the `DICOMs` to `NIfTI` format and then organise all these files in a 'nice' way.

[Brain Imaging Data Structure (**BIDS**)](https://bids-specification.readthedocs.io/en/stable/) is a standard for organizing and describing neuroimaging (and behavioural) datasets. See the [BIDS paper](https://doi.org/10.1038/sdata.2016.44) and the http://bids.neuroimaging.io website for more information.

How to get your DICOMs into NIfTI and into BIDS?

Several tools exist (see a full list [here](https://bids.neuroimaging.io/benefits#converters)). Here I demonstrate [HeuDiConv](https://heudiconv.readthedocs.io/en/latest/index.html), a `Python`-based converter.

`heudiconv` is a flexible `DICOM` converter for organizing brain imaging data into structured directory layouts.
* It allows flexible directory layouts and naming schemes through customizable heuristics implementations
* It only converts the necessary DICOMs, not everything in a directory
* You can keep links to DICOM files in the participant layout
* Using `dcm2niix` under the hood, it’s fast
* It provides assistance in converting to `BIDS`.


## 5. <a id='toc5_'></a>[HeuDiConv](#toc0_)

HeuDiConv is a command-line tool. To use it, you can either install the `heudiconv` and `dcm2niix` packages locally:
```
pip install heudiconv dcm2niix

```

or use a Docker (or Apptainer/Singularity) container image:
```
docker pull nipy/heudiconv
```

Using `heudiconv` involves 3 main steps:
1. Discovering what DICOM series (scans) there are in your data
2. Creating a heuristic file specifying how to translate the DICOMs into BIDS
3. Converting the data 

### 5.1. <a id='toc5_1_'></a>[Step 1: Discovering your scans](#toc0_)

First, you need to know what scans there are and how to uniquely identify them by their metadata. You could inspect each scan's DICOM metadata manually, but that is not very convenient. Instead, you can 'ask' HeuDiConv to do the scan discovery for you. If you run HeuDiConv without NIfTI conversion (`--converter none`) and with the generic `convertall` heuristic, it will generate a DICOM info table listing all scans and their metadata. Like this:

<img align="left" padding = "16px;" src="dicom_info.png">
<br clear="left"/>

The column names are metadata fields and rows contain their corresponding values.

**Example script:**
To get such a table, you'd write a simple bash script, like this one: [code-examples/step01_dicom_discover.sh](code-examples/step01_dicom_discover.sh)

```bash
#!/bin/bash

# Path to the raw DICOM files
DICOM_PATH='mridata/CBU090928_MR09029'

# Location of the output data (it will be created if it doesn't exist)
OUTPUT_PATH="FaceProcessing/scratch/dicom_discovery"

# Subject ID
SUBJECT_ID='04'

# ------------------------------------------------------------
# Activate the mri environment (or any other environment with heudiconv installed)
# ------------------------------------------------------------
conda activate mri

# ------------------------------------------------------------
# Run the heudiconv
# ------------------------------------------------------------
heudiconv \
    --files "${DICOM_PATH}"/*/*/*.dcm \
    --outdir "${OUTPUT_PATH}" \
    --heuristic convertall \
    --subjects "${SUBJECT_ID}" \
    --converter none \
    --bids \
    --overwrite
# ------------------------------------------------------------

# Deactivate the conda environment
conda deactivate

cp "${OUTPUT_PATH}"/.heudiconv/"${SUBJECT_ID}"/info/dicominfo.tsv "${OUTPUT_PATH}"
# ------------------------------------------------------------

# HeudiConv parameters:
# --files: Files or directories containing files to process
# --outdir: Output directory
# --heuristic: Name of a known heuristic or path to the Python script containing heuristic
# --subjects: Subject ID
# --converter : dicom to nii converter (dcm2niix or none)
# --bids: Flag for output into BIDS structure
# --overwrite: Flag to overwrite existing files
# 
# For a full list of parameters, see: https://heudiconv.readthedocs.io/en/latest/usage.html 

```

After running the script, the table that we are interested in will be located at *`OUTPUT_PATH/.heudiconv/[subject ID]/info/dicominfo.tsv`*. The `.heudiconv` directory is hidden, so you might not see it in your file browser. That is why the last step of the script copies `dicominfo.tsv` into `OUTPUT_PATH`.


Now, you can open the file and keep it open for the next step - creating a heuristic file. 
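
The table has many columns, and only a few of them matter for the heuristic. As a rough sketch, the snippet below reads `dicominfo.tsv` with the standard `csv` module and keeps only the columns that this tutorial refers to (`series_id`, `protocol_name`, `series_files`, `dim4`). The function names are my own, and I am assuming the file is a plain tab-separated table with a header row.

```python
import csv

# Columns referred to in this tutorial; adjust to whatever identifies your scans
COLUMNS_OF_INTEREST = ("series_id", "protocol_name", "series_files", "dim4")


def summarise_dicominfo(tsv_path, columns=COLUMNS_OF_INTEREST):
    """Return one dict per series, keeping only the requested columns."""
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Missing columns are returned as empty strings rather than raising
        return [{col: row.get(col, "") for col in columns} for row in reader]


def print_summary(rows, columns=COLUMNS_OF_INTEREST):
    """Print the selected columns as a compact tab-separated table."""
    print("\t".join(columns))
    for row in rows:
        print("\t".join(row[col] for col in columns))
```

With the discovery script above, usage would be `print_summary(summarise_dicominfo("FaceProcessing/scratch/dicom_discovery/dicominfo.tsv"))`.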

### 5.2. <a id='toc5_2_'></a>[Step 2: Creating a heuristic file](#toc0_)

The `heuristic` file is used to convert and organize the DICOM data according to the BIDS standard. You will need to define heuristic keys. Each key defines a type of scan and the name of its output file.

The key definitions must strictly follow the BIDS standard! https://bids-specification.readthedocs.io/en/stable/02-common-principles.html

In our example dataset, we have four types of scans: anatomical image, fieldmaps (magnitude and phase), and functional runs. We will need to define the keys for them all. Like this:

```python
    anat = create_key(
        'sub-{subject}/anat/sub-{subject}_T1w'
        )
    fmap_mag = create_key(
        'sub-{subject}/fmap/sub-{subject}_acq-func_magnitude'
        )
    fmap_phase = create_key(
        'sub-{subject}/fmap/sub-{subject}_acq-func_phasediff'
        )
    func_task = create_key(
        'sub-{subject}/func/sub-{subject}_task-faceprocessing_run-{item:02d}_bold'
        )
```
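
The curly-brace placeholders in these templates are filled in during conversion. The sketch below only mimics that idea with plain `str.format`, using the functional template from the example heuristic file; `expand_key` is a hypothetical helper of mine, not a HeuDiConv function.

```python
# Functional template as written in the example heuristic file
FUNC_TEMPLATE = (
    'sub-{subject}/func/sub-{subject}_task-faceprocessing_run-{item:02d}_bold'
)


def expand_key(template, subject, item):
    """Fill the {subject} and {item} placeholders of a key template."""
    # {item:02d} zero-pads the run counter to two digits (1 -> 01)
    return template.format(subject=subject, item=item)


for run in (1, 2, 10):
    print(expand_key(FUNC_TEMPLATE, subject='04', item=run))
```

The `{item:02d}` part is what turns consecutive functional series into `run-01`, `run-02`, and so on.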

Next, we need to specify unique criteria that only the particular scan will meet. We get this information from the `dicominfo.tsv` file that we 'discovered' in the previous step. For example, to uniquely identify the anatomical scan, we can specify that the `protocol_name` contains `MPRAGE` (the example heuristic below checks `series_id`, which includes the protocol name). We don't have any other scans with MPRAGE in the protocol name, so we don't need any additional criteria for the anatomical scan. Similarly, we specify unique identifiers for the other three scan types.

Then we integrate the keys and specifications into a heuristic Python file.

**Example heuristic file**: [code-examples/bids_heuristic.py](code-examples/bids_heuristic.py)



```python

# --------------------------------------------------------------------------------------
# create_key: A common helper function used to create the conversion key in infotodict. 
# But it is not used directly by HeuDiConv.
# --------------------------------------------------------------------------------------
def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes
# --------------------------------------------------------------------------------------

# --------------------------------------------------------------------------------------
# infotodict: A function to assist in creating the dictionary, and to be used inside heudiconv.
# This is a required function for heudiconv to run.
#
# seqinfo is a record of the DICOM series passed in by heudiconv. Each item in seqinfo contains DICOM metadata 
# that can be used to isolate the series, and assign it to a conversion key.

# --------------------------------------------------------------------------------------
def infotodict(seqinfo):

    # Specify the conversion template for each series following the BIDS format.
    # See https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/01-magnetic-resonance-imaging-data.html
    
    # The structural/anatomical scan
    anat = create_key('sub-{subject}/anat/sub-{subject}_T1w')
    
    # The fieldmap scans
    fmap_mag = create_key('sub-{subject}/fmap/sub-{subject}_acq-func_magnitude')
    fmap_phase = create_key('sub-{subject}/fmap/sub-{subject}_acq-func_phasediff')
    
    # The functional scans
    # You need to specify the task name in the filename. It must be a single string of letters WITHOUT spaces, underscores, or dashes!
    func_task = create_key('sub-{subject}/func/sub-{subject}_task-faceprocessing_run-{item:02d}_bold')
    
    # Create the dictionary that will be returned by this function.
    info = {
        anat: [], 
        fmap_mag: [], 
        fmap_phase: [],  
        func_task: []
        }

    # Loop through all the DICOM series and assign them to the appropriate conversion key.
    for s in seqinfo:
        # Uniquely identify each series
        
        # Structural
        if "MPRAGE" in s.series_id:
            info[anat].append(s.series_id)
            
        # Field map Magnitude (of the two fieldmap series, the one with more files, 66, is the magnitude; the one with 33 is the phase difference)
        if 'FieldMapping' in s.series_id and s.series_files == 66:
            info[fmap_mag].append(s.series_id)
            
        # Field map PhaseDiff
        if 'FieldMapping' in s.series_id and s.series_files == 33:
            info[fmap_phase].append(s.series_id)

        # Functional Bold
        if s.dim4 > 100:
            info[func_task].append(s.series_id)
            
    # Return the dictionary
    return info

# --------------------------------------------------------------------------------------
# Dictionary to specify options to populate the 'IntendedFor' field of the fmap jsons.
#
# See https://heudiconv.readthedocs.io/en/latest/heuristics.html#populate-intended-for-opts
#
# If POPULATE_INTENDED_FOR_OPTS is not present in the heuristic file, IntendedFor will not be populated automatically.
# --------------------------------------------------------------------------------------
POPULATE_INTENDED_FOR_OPTS = {
    'matching_parameters': ['ModalityAcquisitionLabel'],
    'criterion': 'Closest'
}
# 'ModalityAcquisitionLabel': it checks for what modality (anat, func, dwi) each fmap is 
# intended by checking the _acq- label in the fmap filename and finding corresponding 
# modalities (e.g. _acq-fmri, _acq-bold and _acq-func will be matched with the func modality)
```
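
Before running a full conversion, it can help to check that the matching rules pick out each scan exactly once. The sketch below re-implements the four tests from the example heuristic as a small `classify` function and applies them to a few series. The `Series` tuple and all series values are made up for illustration; they are not taken from the real `dicominfo.tsv` and only carry the fields the heuristic uses.

```python
from collections import namedtuple

# HYPOTHETICAL stand-in for the records HeuDiConv passes in as `seqinfo`
Series = namedtuple("Series", ["series_id", "series_files", "dim4"])


def classify(s):
    """Apply the same tests as the example heuristic to one series."""
    labels = []
    if "MPRAGE" in s.series_id:
        labels.append("anat")
    if "FieldMapping" in s.series_id and s.series_files == 66:
        labels.append("fmap_mag")
    if "FieldMapping" in s.series_id and s.series_files == 33:
        labels.append("fmap_phase")
    if s.dim4 > 100:
        labels.append("func_task")
    return labels


# Made-up series, for illustration only
mock_seqinfo = [
    Series("1-Localizer", 3, 1),
    Series("2-MPRAGE", 192, 1),
    Series("3-FieldMapping", 66, 1),
    Series("4-FieldMapping", 33, 1),
    Series("5-EPI_run1", 210, 210),
]

for s in mock_seqinfo:
    labels = classify(s)
    if not labels:
        status = "not converted"   # no rule matched, the series is skipped
    elif len(labels) == 1:
        status = "ok"
    else:
        status = "AMBIGUOUS"       # several rules matched, refine the criteria
    print(f"{s.series_id:20s} {labels} {status}")
```

A series with no label is simply left out of the BIDS dataset, which is what we want for localizers. A series with more than one label means the criteria are not unique yet.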

### 5.3. <a id='toc5_3_'></a>[Step 3: Converting the data](#toc0_)

#### Converting a single subject

To convert a single subject, we only need to change three things in our previous 'DICOM discovery' script:
* the output to be in PROJECT_PATH/data/, 
* location to the heuristic file that we created in the previous step, and 
* we specify the DICOM to NIfTI converter (*dcm2niix*), so that the files are actually converted. 

```bash
# ------------------------------------------------------------
# Define your paths
# ------------------------------------------------------------
# Your project's root directory
PROJECT_PATH='/imaging/correia/da05/workshops/2024-CBU'
# Path to the raw DICOM files
DICOM_PATH='/mridata/cbu/CBU090942_MR09029'
# Location of the output data (it will be created if it doesn't exist)
OUTPUT_PATH="${PROJECT_PATH}/data/"
# Subject ID
SUBJECT_ID='01'

# ------------------------------------------------------------
# Run the heudiconv
# ------------------------------------------------------------
conda activate mri

heudiconv \
    --files "${DICOM_PATH}"/*/*/*.dcm \
    --outdir "${OUTPUT_PATH}" \
    --heuristic "${PROJECT_PATH}/code/bids_heuristic.py" \
    --subjects "${SUBJECT_ID}" \
    --converter dcm2niix \
    --bids \
    --overwrite

conda deactivate
# ------------------------------------------------------------
```
To convert other subjects as well, you'd need to change the raw DICOM path and subject ID accordingly. If you have multiple subjects, it's a good idea to process them in parallel using a scheduling system such as SLURM (Simple Linux Utility for Resource Management).

#### Converting multiple subjects in parallel using SLURM

First, we need a generic script that runs HeuDiConv. It would be very similar to the one above where we converted a single subject. 

**Example of a generic heudiconv script**: [code-examples/heudiconv_script.sh](code-examples/heudiconv_script.sh)

Second, you'd need a project-specific script where you define the paths and use the `sbatch` command to execute the generic script for each subject. You can either write your script in bash, or Python if you prefer a more 'user-friendly' syntax. I have written an example script in Python. 

**Example script to convert multiple subjects' DICOMs to BIDS**: [code-examples/step02_dicom_to_bids.py](code-examples/step02_dicom_to_bids.py)

The script's main function is to generate a list of subject IDs alongside their corresponding DICOM paths, define the heuristic file's location, specify the output path, and then construct and execute an `sbatch` command to run the *heudiconv_script.sh*. 
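
To give a flavour of that logic, here is a minimal dry-run sketch. It is not the content of `step02_dicom_to_bids.py`. In particular, I am assuming that the generic script takes four positional arguments (subject ID, DICOM path, heuristic file, output path) and that logs go to a `logs` folder; check the real scripts before reusing this.

```python
import shlex

# Paths taken from the single-subject example above
PROJECT_PATH = "/imaging/correia/da05/workshops/2024-CBU"
HEURISTIC = f"{PROJECT_PATH}/code/bids_heuristic.py"
OUTPUT_PATH = f"{PROJECT_PATH}/data/"


def build_sbatch_command(subject_id, dicom_path, heuristic, outdir,
                         script="code-examples/heudiconv_script.sh",
                         log_dir="logs"):
    """Return the sbatch command (as a list of arguments) for one subject.

    ASSUMPTION: the generic script accepts these four positional arguments
    in this order. This is hypothetical, not confirmed by the tutorial.
    """
    return [
        "sbatch",
        f"--job-name=heudiconv_{subject_id}",
        f"--output={log_dir}/heudiconv_{subject_id}.out",
        script,
        subject_id, dicom_path, heuristic, outdir,
    ]


# Subject ID -> raw DICOM folder (extend with your own subjects)
subjects = {"01": "/mridata/cbu/CBU090942_MR09029"}

for subject_id, dicom_path in subjects.items():
    cmd = build_sbatch_command(subject_id, dicom_path, HEURISTIC, OUTPUT_PATH)
    # Dry run: print the command instead of submitting it
    print(shlex.join(cmd))
```

Once the printed commands look right, replacing the `print` with `subprocess.run(cmd, check=True)` would submit one job per subject.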

### 5.4. <a id='toc5_4_'></a>['To Do' - additional information to check and add](#toc0_)

Once you have converted the DICOMs to BIDS, there are some things that you need to fill in yourself to make the dataset fully BIDS compliant. HeuDiConv has marked such 'missing' information as 'To Do'. 

#### Dataset description

`dataset_description.json`

A brief description of your dataset. 

```json
{
  "Acknowledgements": "TODO: whom you want to acknowledge",
  "Authors": [
    "TODO:",
    "First1 Last1",
    "First2 Last2",
    "..."
  ],
  "BIDSVersion": "1.8.0",
  "DatasetDOI": "TODO: eventually a DOI for the dataset",
  "Funding": [
    "TODO",
    "GRANT #1",
    "GRANT #2"
  ],
  "HowToAcknowledge": "TODO: describe how to acknowledge -- either cite a corresponding paper, or just in acknowledgement section",
  "License": "TODO: choose a license, e.g. PDDL (http://opendatacommons.org/licenses/pddl/)",
  "Name": "TODO: name of the dataset",
  "ReferencesAndLinks": [
    "TODO",
    "List of papers or websites"
  ]
}
```
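
With several JSON files to complete, it is easy to miss a field. As a small sketch, the helper below walks through a parsed JSON object and lists every string that still contains `TODO`. The name `find_todos` is my own, and the input is a shortened copy of the template above.

```python
import json


def find_todos(node, path=""):
    """Recursively collect (location, value) pairs for strings containing 'TODO'."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_todos(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits += find_todos(value, f"{path}[{i}]")
    elif isinstance(node, str) and "TODO" in node:
        hits.append((path, node))
    return hits


# A shortened copy of the dataset_description.json template
description = json.loads("""
{
  "Authors": ["TODO:", "First1 Last1"],
  "BIDSVersion": "1.8.0",
  "Name": "TODO: name of the dataset"
}
""")

for location, value in find_todos(description):
    print(f"{location}: {value}")
```

For a real file, load it with `json.load(open(path))` and pass the result to `find_todos`; an empty list means nothing is left to fill in.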

#### Participants

`participants.json`

```json
{
  "participant_id": {
    "Description": "Participant identifier"
  },
  "age": {
    "Description": "Age in years (TODO - verify) as in the initial session, might not be correct for other sessions"
  },
  "sex": {
    "Description": "self-rated by participant, M for male/F for female (TODO: verify)"
  },
  "group": {
    "Description": "(TODO: adjust - by default everyone is in control group)"
  }
}
```

#### Task information

`task-facerecognition_bold.json`

You could add the full task name and a Cognitive Atlas ID, if one exists. 
```json
{
 ...
  "TaskName": "TODO: full task name for facerecognition",
 ...
 ```

#### Events

For functional images that are not resting-state, that is, where participants performed a task, you need to provide the trial type, onset and duration details. HeuDiConv generates an `events.tsv` file for each functional run. These files are just templates that you need to fill with the actual data. You would get this data from your experimental script outputs (make sure you have programmed your task so that the needed trial and timing details are easy to retrieve). 

For the example dataset used in this tutorial, I retrieved the event timing information from the [OpenNeuro version of this dataset](https://openneuro.org/datasets/ds000117/versions/1.0.5)

**See my script here:** [code-examples/step03_events_to_bids.py](code-examples/step03_events_to_bids.py).

The event files in the OpenNeuro version of this dataset do not fully comply with the current BIDS specification. According to the `BIDS` specification for [Task Events](https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/05-task-events.html), the correct column name is *'trial_type'*, not *'stim_type'*. In my script, after downloading the files, I fixed this naming. In addition, I removed the no-name trial types (the 'rest' period) as we don't want to model rest as a separate event.
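
The two fixes described above can be sketched in a few lines. This is not the code from `step03_events_to_bids.py`; it is a minimal illustration using only the standard library, and it assumes that the unnamed 'rest' trials appear as empty or `n/a` cells. The example rows are made up.

```python
import csv
import io


def fix_events(tsv_text):
    """Rename 'stim_type' to 'trial_type' and drop rows without a trial type."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = ["trial_type" if name == "stim_type" else name
                  for name in reader.fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if "stim_type" in row:
            row["trial_type"] = row.pop("stim_type")
        # ASSUMPTION: unnamed ('rest') trials show up as empty or 'n/a' cells
        if (row.get("trial_type") or "").strip() in ("", "n/a"):
            continue
        writer.writerow(row)
    return out.getvalue()


# Made-up rows, for illustration only
example = (
    "onset\tduration\tstim_type\n"
    "0.0\t0.9\tFAMOUS\n"
    "3.2\t0.9\t\n"
    "6.1\t0.9\tSCRAMBLED\n"
)
print(fix_events(example))
```

The same two steps in `pandas` would be a `rename(columns=...)` followed by a `dropna` on `trial_type`.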

In addition, the OpenNeuro version only recorded three event types: FAMOUS, UNFAMILIAR, SCRAMBLED. But I also wanted to analyse the repetition suppression effects. Therefore, I split each condition into three: e.g., FAMOUS_1 (the initial presentation), FAMOUS_im (immediate repetition), FAMOUS_dl (delayed repetition).

If interested, you can **see my script for this here**: [code-examples/step04_transform_events.py](code-examples/step04_transform_events.py)

In [1]:
# copy the prepared events
!rm -f FaceProcessing/data/sub-04/func/sub-04_task-faceprocessing_run-*.tsv
!cp -r sub-04_task-files/* FaceProcessing/data/sub-04/func

In [None]:
import pandas as pd

events_file = 'FaceProcessing/data/sub-04/func/sub-04_task-faceprocessing_run-01_events.tsv'
events = pd.read_csv(events_file, sep='\t')

events.head()

#### README

*"TODO: Provide description for the dataset -- basic details about the study, possibly pointing to pre-registration (if public or embargoed)"*

See an example for the OpenNeuro version of this dataset https://openneuro.org/datasets/ds000117/versions/1.0.5/file-display/README

## 6. <a id='toc6_'></a>[Validate BIDS structure](#toc0_)

Once we have our BIDS dataset, we can use the [online BIDS validator](https://bids-standard.github.io/bids-validator/) to check whether the dataset conforms to the BIDS standard and what additional information we might need to include in its metadata. 

For this example dataset, we get some warnings about custom columns in the events files that have no description. We can add an *events.json* file that contains this information. For guidance, see the BIDS specification: https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/05-task-events.html
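
As a rough sketch of what such a sidecar could look like, the snippet below builds the description as a Python dictionary and writes it to JSON. The `LongName`, `Description` and `Levels` keys come from the BIDS specification, but the wording of the descriptions is placeholder text, and `my_custom_column` is a hypothetical column name that you would replace with the columns the validator actually complains about.

```python
import json

events_sidecar = {
    "trial_type": {
        "LongName": "Stimulus category",
        "Description": "Category of the stimulus shown on each trial",
        # Placeholder wording; the level names are the three original conditions
        "Levels": {
            "FAMOUS": "Famous face",
            "UNFAMILIAR": "Unfamiliar face",
            "SCRAMBLED": "Scrambled face",
        },
    },
    # HYPOTHETICAL custom column: replace with your own column name and text
    "my_custom_column": {
        "Description": "What this column contains and how it was measured",
    },
}


def write_sidecar(sidecar, path):
    """Write the column descriptions to a JSON sidecar file."""
    with open(path, "w") as f:
        json.dump(sidecar, f, indent=2)
```

Calling `write_sidecar(events_sidecar, "FaceProcessing/data/task-faceprocessing_events.json")` would place one sidecar at the top of the dataset; that file name and location are my assumption of a sensible choice, so rerun the validator afterwards to confirm the warnings are gone.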

Another warning is 'Suspiciously long event' for `sub-10_task-facerecognition_run-09_events.tsv`. We can explain this in the README file: *Owing to scanner error, Subject 10 only has 170 volumes in last run (Run 9) (hence the BIDS warning of some onsets in events.tsv file being later than the data)* 
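
The logic behind this kind of warning is simple arithmetic: the run lasts `n_volumes * TR` seconds, and any event starting after that has no data. A minimal sketch follows; `late_events` is my own helper, and the TR and onsets in the example are made-up values (only the 170 volumes come from the note above).

```python
def late_events(onsets, n_volumes, tr):
    """Return the onsets (in seconds) that fall after the end of the acquired data."""
    run_duration = n_volumes * tr  # seconds of data actually acquired
    return [onset for onset in onsets if onset >= run_duration]


# 170 volumes as in the README note; TR and onsets are HYPOTHETICAL values
print(late_events([10.0, 200.5, 339.0, 345.2], n_volumes=170, tr=2.0))
```

In practice you would take the onsets from the run's `events.tsv` and the number of volumes and TR from the corresponding BOLD image and its JSON sidecar.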

## 7. <a id='toc7_'></a>[PyBIDS](#toc0_)

`PyBIDS` is a Python module for interfacing with datasets that conform to BIDS. See the [documentation](https://bids-standard.github.io/pybids/) and [paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409983/) for more info. 

```
pip install pybids
```

**Let's explore some of the functionality of pybids.layout.** The material is adapted from https://github.com/bids-standard/pybids/tree/master/examples

In [None]:
import os
from bids.layout import BIDSLayout

ds_path = 'FaceProcessing/data'

# Initialize the layout
layout = BIDSLayout(ds_path)

# Print some basic information about the layout
layout

### 7.1. <a id='toc7_1_'></a>[Querying the BIDSLayout](#toc0_)
The main method for querying `BIDSLayout` is `.get()`.

If we call `.get()` with no additional arguments, we get back a list of all the BIDS files in our dataset.

In [None]:
all_files = layout.get()
print("There are {} files in the layout.".format(len(all_files)))
print("\nThe first 5 files are:")
all_files[:5]

The returned object is a **Python list**. Each element in the list is a `BIDSFile` object. 

We can also get just filenames.

In [None]:
layout.get(return_type='filename')[:5]

We can also get such information as
* all `subject` IDs
* all `task` names
* dataset `description`
* the BOLD repetition time TR
* how many `runs` there are

In [None]:
layout.get_subjects()

In [None]:
layout.get_tasks()

In [None]:
layout.get_dataset_description()

In [None]:
layout.get_tr()

Regarding runs, the number of runs might vary across participants. So, let's get the runs for each participant. 

In [None]:
for sID in layout.get_subjects(): 
    print(layout.get_runs(subject = sID))

### 7.2. <a id='toc7_2_'></a>[Filtering files by entities](#toc0_)
We can pass any BIDS-defined entities (keywords) to the `.get()` method. For example, here's how we would retrieve all BOLD runs with `.nii.gz` extensions for subject `04`.

In [None]:
# Retrieve filenames of all BOLD runs for subject
layout.get(subject='04', extension='nii.gz', suffix='bold', return_type='filename')

All of the entities are found in the names of BIDS files. For example `sub-01_task-facerecognition_run-01_bold.nii.gz` has entities: **subject**, **task**, **run**, **suffix**, **extension**.

You can get the list of all available entities with `layout.get_entities()`.

Here are a few of the most common entities:

* `suffix`: The part of a BIDS filename just before the extension (e.g., 'bold', 'events', 'T1w', etc.).
* `subject`: The subject label
* `session`: The session label
* `run`: The run index
* `task`: The task name
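
To make the idea of entities concrete, here is a deliberately simplified sketch that splits a BIDS filename into key-value pairs using only string methods. It is not how PyBIDS parses filenames (PyBIDS handles many more entities and may convert value types); `split_bids_name` and the small key table are my own.

```python
# Short key in the filename -> full entity name (only the ones listed above)
KEY_TO_ENTITY = {"sub": "subject", "ses": "session", "task": "task", "run": "run"}


def split_bids_name(filename):
    """Split a BIDS filename into its entities, suffix, and extension."""
    # Everything from the first dot onwards is the extension (e.g. '.nii.gz')
    stem, dot, ext = filename.partition(".")
    parts = stem.split("_")
    entities = {}
    # All parts except the last one are key-value pairs such as 'sub-01'
    for part in parts[:-1]:
        key, _, value = part.partition("-")
        entities[KEY_TO_ENTITY.get(key, key)] = value
    # The last part before the extension is the suffix (e.g. 'bold')
    entities["suffix"] = parts[-1]
    entities["extension"] = dot + ext
    return entities


print(split_bids_name("sub-01_task-facerecognition_run-01_bold.nii.gz"))
```

Every value is kept as a string here, so `run` comes back as `'01'`. For real work, use the layout's own parsing described later in this notebook.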

### 7.3. <a id='toc7_3_'></a>[Filtering by metadata](#toc0_)
Sometimes we want to search for files based not just on their names, but also based on metadata defined in JSON files. We can pass any key that occurs in any JSON file in our project as an argument to `.get()`. We can combine these with any number of core BIDS entities (like `subject`, `run`, etc.).

For example, suppose we want to retrieve `SpacingBetweenSlices` (measured from center-to-center of each slice, in mm) for all our subjects. Let's also put this information into a nice data frame.

In [None]:
import pandas as pd
d = []
for subject in layout.get_subjects():
    d.append(
        {
            'subject': subject,
            'spacing': layout.get_SpacingBetweenSlices(subject=subject, suffix='bold')
        }
    )
df = pd.DataFrame(d)

print(df.to_string(index=False))

Having different slice spacing across subjects is rather unusual, but the authors were trying to cover the whole cortex, so larger brains had larger spacing.

**==================================================================================================**

**EXERCISE**

We want to know the time of the day when each subject was scanned. The scanning started with T1 images, so we want to retrieve the `AcquisitionTime` of all subjects' `T1w` images. Adapt the script above to acquire this information. 


In [None]:
# write your code here


**==================================================================================================**

### 7.4. <a id='toc7_4_'></a>[Other `return_type` values](#toc0_)
We can also ask `get()` to return unique values (or IDs) of particular entities. For example, we want to know which subjects had a fieldmap acquired. We can request that information by setting `return_type='id'` - to get subject IDs. When using this option, we also need to specify a `target` entity for the ID (in this case, subject). This combination tells the `BIDSLayout` to return the unique values for the specified `target` entity. 

In the next example, we ask for all of the unique subject IDs that have at least one file with a `phasediff` (fieldmap) suffix. 

In [None]:
# Ask get() to return the IDs of subjects that have phasediff (fieldmap) files

layout.get(return_type='id', target='subject', suffix='phasediff')

If our `target` is a BIDS entity that corresponds to a particular directory in the BIDS specification (e.g., `subject` or `session`) we can also use `return_type='dir'` to get all matching subdirectories. 

In [None]:
layout.get(return_type='dir', target='subject')

### 7.5. <a id='toc7_5_'></a>[The `BIDSFile`](#toc0_)
When you call `.get()` on a `BIDSLayout`, the default returned values are objects of class `BIDSFile`. A `BIDSFile` is a lightweight container for individual files in a BIDS dataset. It provides easy access to a variety of useful attributes and methods. Let's take a closer look. First, let's pick a file from our existing `layout`.

In [None]:
# Pick the first BOLD file of subject 04
bf = layout.get(subject='04', extension='nii.gz', suffix='bold')[0]
# Print it
bf

Here are some of the attributes and methods available to us in a `BIDSFile` (note that some of these are only available for certain subclasses of `BIDSFile`; e.g., you can't call `get_image()` on a `BIDSFile` that doesn't correspond to an image file!):
* `.path`: The full path of the associated file
* `.filename`: The associated file's filename (without directory)
* `.dirname`: The directory containing the file
* `.get_entities()`: Returns information about entities associated with this `BIDSFile` (optionally including metadata)
* `.get_image()`: Returns the file contents as a nibabel image (only works for image files)
* `.get_df()`: Get file contents as a pandas DataFrame (only works for TSV files)
* `.get_metadata()`: Returns a dictionary of all metadata found in associated JSON files
* `.get_associations()`: Returns a list of all files associated with this one in some way

Let's see some of these in action.

In [None]:
# Print all the entities associated with this file, and their values
bf.get_entities()

In [None]:
# Print first 30 metadata items associated with this file
file_metadata = bf.get_metadata()

{k: file_metadata[k] for k in list(file_metadata)[:30]}

`.get_image()`: Returns the file contents as a `nibabel` image (only works for image files). We can then display the image, for example, using `OrthoSlicer3D`.   

**Note:** When using `orthoview()` in a notebook, don't forget to close the figures afterwards or run `%matplotlib inline` again; otherwise you cannot plot any other figures.

In [None]:
%matplotlib inline

bf.get_image().orthoview()

### 7.6. <a id='toc7_6_'></a>[`.tsv` files](#toc0_)

In cases where a file has a `.tsv.gz` or `.tsv` extension, it will automatically be created as a `BIDSDataFile`, and we can easily grab the contents as a `DataFrame`.

Let's look at the first `events` file from our layout.

In [None]:
# Get the first events file
evfile = layout.get(suffix='events')[0]

# Get contents as a DataFrame and show the first few rows
df = evfile.get_df()
df.head()

Let's look at the `participants` information. 

In [None]:
participants = layout.get(suffix='participants', extension='tsv')[0]

df = participants.get_df()
df.sort_values(by=['participant_id'])

### 7.7. <a id='toc7_7_'></a>[Filename parsing](#toc0_)
Let's say you have a filename, and you want to manually extract BIDS entities from it. The `parse_file_entities` method provides the facility:

In [None]:
layout.parse_file_entities('some_path_to_bids_file/sub-04_task-facerecognition_run-01_bold.nii.gz')

You can do the same for the `BIDSFile` object that we defined earlier. 

In [None]:
layout.parse_file_entities(bf.path)

In [None]:
layout.parse_file_entities(bf.filename)

### 7.8. <a id='toc7_8_'></a>[Report generation](#toc0_)
`PyBIDS` also allows you to automatically create data acquisition reports based on the available `image` and `meta-data` information. This supports standardisation and transparency, and helps with FAIR-ness, meta-analyses, etc. 

In [None]:
# import the BIDSReport class from the reports submodule
from bids.reports import BIDSReport

Now we only need to create a `BIDSReport` from our `layout` and generate our report. 

In [None]:
# Initialize a report for the dataset
report = BIDSReport(layout)

# Method generate returns a Counter of unique descriptions across subjects
try:
    descriptions = report.generate()
    pub_description = descriptions.most_common()[0][0]
    print(pub_description)
except IndexError:
    print('Sorry, it seems that the dataset is not complete and report cannot be generated.')