# BIQL Guide: From Beginner to Expert

Welcome to the comprehensive BIQL (BIDS Query Language) guide! This interactive notebook will take you from your first query to advanced neuroimaging workflows.

## What You'll Learn

📚 **Core Concepts**: BIDS structure, entities, and metadata  
🔍 **Query Fundamentals**: Filtering, selecting, and sorting data  
📊 **Aggregation & Grouping**: Complex analysis patterns  
🧠 **Real Workflows**: QC, preprocessing, and specialized pipelines  
⚡ **Performance Tips**: Writing efficient queries for large datasets  

## Prerequisites

- Basic understanding of BIDS (Brain Imaging Data Structure)
- Python 3.8+ installed
- A BIDS dataset (we'll show you how to use example data)

Let's get started!

## 1. Setup and Installation

First, let's install BIQL and set up our environment:

In [25]:
# Install BIQL (uncomment if needed)
# !pip install biql

# Import required libraries
import json
import tempfile
from pathlib import Path
from biql import create_query_engine, create_example_dataset
import pandas as pd  # For DataFrame output (install with: pip install pandas)

## 2. Understanding BIDS Structure

Before we start querying, let's understand what we're working with. Here's a typical BIDS dataset structure:

```
study_dataset/
├── dataset_description.json          # Dataset metadata
├── participants.tsv                  # Participant demographics
├── sub-01/                          # Subject 01
│   ├── ses-baseline/                # Session: baseline
│   │   ├── anat/                    # Anatomical data
│   │   │   ├── sub-01_ses-baseline_T1w.nii.gz
│   │   │   └── sub-01_ses-baseline_T1w.json
│   │   └── func/                    # Functional data  
│   │       ├── sub-01_ses-baseline_task-nback_run-01_bold.nii.gz
│   │       ├── sub-01_ses-baseline_task-nback_run-01_bold.json
│   │       ├── sub-01_ses-baseline_task-rest_bold.nii.gz
│   │       └── sub-01_ses-baseline_task-rest_bold.json
│   └── ses-followup/                # Session: followup
│       └── func/
│           ├── sub-01_ses-followup_task-nback_run-01_bold.nii.gz
│           └── sub-01_ses-followup_task-nback_run-01_bold.json
├── sub-02/                          # Subject 02
│   └── ses-baseline/
│       ├── anat/
│       │   ├── sub-02_ses-baseline_T1w.nii.gz
│       │   └── sub-02_ses-baseline_T1w.json
│       └── func/
│           ├── sub-02_ses-baseline_task-nback_run-01_bold.nii.gz
│           ├── sub-02_ses-baseline_task-nback_run-01_bold.json
│           ├── sub-02_ses-baseline_task-nback_run-02_bold.nii.gz
│           ├── sub-02_ses-baseline_task-nback_run-02_bold.json
│           ├── sub-02_ses-baseline_task-rest_bold.nii.gz
│           └── sub-02_ses-baseline_task-rest_bold.json
└── sub-03/                          # Subject 03
    └── ses-baseline/
        ├── anat/
        │   ├── sub-03_ses-baseline_T1w.nii.gz
        │   └── sub-03_ses-baseline_T1w.json
        └── func/
            ├── sub-03_ses-baseline_task-rest_bold.nii.gz
            └── sub-03_ses-baseline_task-rest_bold.json
```
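
To make the naming scheme concrete, here is a minimal plain-Python sketch of how a BIDS filename breaks down into `key-value` entities, a suffix, and an extension. The helper `parse_bids_filename` is hypothetical and written only for illustration; it is not part of BIQL, and BIQL's own parser may handle more cases.

```python
import re

# Hypothetical helper (not a BIQL function): split a BIDS filename into parts.
ENTITY_RE = re.compile(r"([a-zA-Z0-9]+)-([a-zA-Z0-9]+)")

def parse_bids_filename(filename):
    """Return entities, suffix, and extension for a BIDS-style filename."""
    # Everything after the first dot is the extension (so ".nii.gz" stays whole).
    stem, dot, ext = filename.partition(".")
    entities = {}
    suffix = None
    for part in stem.split("_"):
        match = ENTITY_RE.fullmatch(part)
        if match:
            # key-value chunk such as "sub-01" or "task-nback"
            entities[match.group(1)] = match.group(2)
        else:
            # a chunk without a hyphen is treated as the suffix, e.g. "bold"
            suffix = part
    entities["suffix"] = suffix
    entities["extension"] = dot + ext
    return entities

parsed = parse_bids_filename("sub-01_ses-baseline_task-nback_run-01_bold.nii.gz")
print(parsed)
```

These are the same fields (`sub`, `ses`, `task`, `run`, `suffix`, `extension`) that show up as columns in the query results later in this guide.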

Let's create a small example dataset to work with:

In [26]:
# Create our example dataset
dataset_path = create_example_dataset()
print(f"Created example dataset at: {dataset_path}")

# Create BIQL query engine with JSON as default output
biql = create_query_engine(dataset_path, default_format="json")

print("\nDataset loaded successfully!")
print(f"Total files: {len(biql.dataset.files)}")
print(f"Subjects: {len(biql.dataset.participants)}")

# Quick dataset overview
print("\n📊 Dataset Overview:")
# Simple file count by datatype
all_files = biql.run_query("SELECT datatype, COUNT(*) GROUP BY datatype", format="dataframe")
print("  Files by datatype:")
for _, row in all_files.iterrows():
    print(f"    {row['datatype']}: {row['count']}")

subjects = biql.run_query("SELECT DISTINCT sub")
print(f"  Subjects: {[s['sub'] for s in subjects]}")

# Helper function for cleaner output
def show_results(results, limit=3):
    """Helper to show query results in a clean format"""
    if isinstance(results, list):
        if len(results) <= limit:
            print(json.dumps(results, indent=2))
        else:
            print(json.dumps(results[:limit], indent=2))
            print(f"... and {len(results)-limit} more results")
    else:
        print(results)

Created example dataset at: /tmp/biql_example_85c6gefa

Dataset loaded successfully!
Total files: 20
Subjects: 3

📊 Dataset Overview:
  Files by datatype:
    func: 14
    anat: 6
  Subjects: ['03', '02', '01']


## 3. Your First BIQL Queries

Now let's start with simple queries to explore our data. BIQL queries can be as simple as a single condition:

### 3.1 Basic Filtering

Let's start with the simplest possible query - finding all files for a specific subject:

In [28]:
biql.run_query("sub=01", format="dataframe")

Unnamed: 0,filepath,relative_path,filename,sub,ses,task,run,suffix,datatype,extension,metadata,participants
0,/tmp/biql_example_85c6gefa/sub-01/ses-followup...,sub-01/ses-followup/func/sub-01_ses-followup_t...,sub-01_ses-followup_task-nback_run-01_bold.nii.gz,1,followup,nback,1.0,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA
1,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-rest_bold.nii.gz,1,baseline,rest,,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=25; sex=F; group=control; site=SiteA
2,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-nback_run-01_bold.nii.gz,1,baseline,nback,1.0,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA
3,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...,sub-01/ses-baseline/anat/sub-01_ses-baseline_T...,sub-01_ses-baseline_T1w.nii.gz,1,baseline,,,T1w,anat,.nii.gz,FlipAngle=9; EchoTime=0.00372; RepetitionTime=2.3,age=25; sex=F; group=control; site=SiteA
4,/tmp/biql_example_85c6gefa/sub-01/ses-followup...,sub-01/ses-followup/func/sub-01_ses-followup_t...,sub-01_ses-followup_task-nback_run-01_bold.json,1,followup,nback,1.0,bold,func,.json,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA
5,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-rest_bold.json,1,baseline,rest,,bold,func,.json,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=25; sex=F; group=control; site=SiteA
6,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-nback_run-01_bold.json,1,baseline,nback,1.0,bold,func,.json,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA
7,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...,sub-01/ses-baseline/anat/sub-01_ses-baseline_T...,sub-01_ses-baseline_T1w.json,1,baseline,,,T1w,anat,.json,FlipAngle=9; EchoTime=0.00372; RepetitionTime=2.3,age=25; sex=F; group=control; site=SiteA


**What happened here?**
- BIQL found all files where the `sub` entity equals `01`
- Each result includes the file path, BIDS entities, and metadata
- A bare condition such as `sub=01` is shorthand for `WHERE sub=01`, much like a `WHERE` clause in SQL

In [15]:
# Query 2: Find all functional data
biql.run_query("datatype=func", format="dataframe")

Unnamed: 0,filepath,relative_path,filename,sub,ses,task,suffix,datatype,extension,metadata,participants,run
0,/tmp/biql_example_jwhs1cfo/sub-03/ses-baseline...,sub-03/ses-baseline/func/sub-03_ses-baseline_t...,sub-03_ses-baseline_task-rest_bold.nii.gz,3,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=22; sex=F; group=control; site=SiteB,
1,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-rest_bold.nii.gz,2,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=28; sex=M; group=patient; site=SiteA,
2,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-02_bold.nii.gz,2,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,2.0
3,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-01_bold.nii.gz,2,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,1.0
4,/tmp/biql_example_jwhs1cfo/sub-01/ses-followup...,sub-01/ses-followup/func/sub-01_ses-followup_t...,sub-01_ses-followup_task-nback_run-01_bold.nii.gz,1,followup,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA,1.0
5,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-rest_bold.nii.gz,1,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=25; sex=F; group=control; site=SiteA,
6,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-nback_run-01_bold.nii.gz,1,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA,1.0
7,/tmp/biql_example_jwhs1cfo/sub-03/ses-baseline...,sub-03/ses-baseline/func/sub-03_ses-baseline_t...,sub-03_ses-baseline_task-rest_bold.json,3,baseline,rest,bold,func,.json,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=22; sex=F; group=control; site=SiteB,
8,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-rest_bold.json,2,baseline,rest,bold,func,.json,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=28; sex=M; group=patient; site=SiteA,
9,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-02_bold.json,2,baseline,nback,bold,func,.json,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,2.0


### 3.2 Combining Conditions

Real queries often need multiple conditions. BIQL supports logical operators:

In [16]:
# Query 3: Functional data for specific subject
biql.run_query("sub=01 AND datatype=func", format="dataframe")

Unnamed: 0,filepath,relative_path,filename,sub,ses,task,run,suffix,datatype,extension,metadata,participants
0,/tmp/biql_example_jwhs1cfo/sub-01/ses-followup...,sub-01/ses-followup/func/sub-01_ses-followup_t...,sub-01_ses-followup_task-nback_run-01_bold.nii.gz,1,followup,nback,1.0,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA
1,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-rest_bold.nii.gz,1,baseline,rest,,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=25; sex=F; group=control; site=SiteA
2,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-nback_run-01_bold.nii.gz,1,baseline,nback,1.0,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA
3,/tmp/biql_example_jwhs1cfo/sub-01/ses-followup...,sub-01/ses-followup/func/sub-01_ses-followup_t...,sub-01_ses-followup_task-nback_run-01_bold.json,1,followup,nback,1.0,bold,func,.json,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA
4,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-rest_bold.json,1,baseline,rest,,bold,func,.json,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=25; sex=F; group=control; site=SiteA
5,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-nback_run-01_bold.json,1,baseline,nback,1.0,bold,func,.json,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA


In [17]:
# Query 4: Multiple subjects with OR
biql.run_query("sub=01 OR sub=02", format="dataframe")

Unnamed: 0,filepath,relative_path,filename,sub,ses,task,suffix,datatype,extension,metadata,participants,run
0,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-rest_bold.nii.gz,2,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=28; sex=M; group=patient; site=SiteA,
1,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-02_bold.nii.gz,2,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,2.0
2,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-01_bold.nii.gz,2,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,1.0
3,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/anat/sub-02_ses-baseline_T...,sub-02_ses-baseline_T1w.nii.gz,2,baseline,,T1w,anat,.nii.gz,FlipAngle=9; EchoTime=0.00372; RepetitionTime=2.3,age=28; sex=M; group=patient; site=SiteA,
4,/tmp/biql_example_jwhs1cfo/sub-01/ses-followup...,sub-01/ses-followup/func/sub-01_ses-followup_t...,sub-01_ses-followup_task-nback_run-01_bold.nii.gz,1,followup,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA,1.0
5,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-rest_bold.nii.gz,1,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=25; sex=F; group=control; site=SiteA,
6,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-nback_run-01_bold.nii.gz,1,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA,1.0
7,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/anat/sub-01_ses-baseline_T...,sub-01_ses-baseline_T1w.nii.gz,1,baseline,,T1w,anat,.nii.gz,FlipAngle=9; EchoTime=0.00372; RepetitionTime=2.3,age=25; sex=F; group=control; site=SiteA,
8,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-rest_bold.json,2,baseline,rest,bold,func,.json,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=28; sex=M; group=patient; site=SiteA,
9,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-02_bold.json,2,baseline,nback,bold,func,.json,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,2.0


### 3.3 Using Lists for Cleaner Queries

Instead of multiple OR conditions, you can use the `IN` operator with lists:

**Recent Improvements**: BIQL now has enhanced type coercion and reserved keyword support:

1. **Smart Type Coercion**: Numbers like `[1, 2, 3]` are automatically converted to zero-padded strings like `["01", "02", "03"]` when comparing against subject IDs
2. **Reserved Keywords**: You can now use `participants.group` even though "group" is a SQL reserved keyword

These improvements make BIQL more intuitive and less error-prone!
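
To illustrate the idea behind the type coercion, here is a small plain-Python sketch. The helpers `coerce_label` and `sub_in` are hypothetical, and the fixed padding width of 2 is an assumption made for this example; BIQL's actual coercion rules may differ (for instance, it might compare values numerically instead).

```python
def coerce_label(value, width=2):
    """Turn ints like 1 into zero-padded labels like '01'; keep strings as-is."""
    # Assumption: labels are padded to a fixed width of 2 characters.
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value).zfill(width)
    return str(value)

def sub_in(record_sub, candidates):
    """Mimic `sub IN [...]` by comparing against coerced labels."""
    return record_sub in {coerce_label(c) for c in candidates}

labels = [coerce_label(v) for v in [1, 2, 3]]
print(labels)
print(sub_in("02", [1, 2, 3]), sub_in("04", [1, 2, 3]))
```

With this kind of coercion, `sub IN [1, 2, 3]` and `sub IN ["01", "02", "03"]` select the same files.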

In [18]:
# Query 5: Multiple subjects with IN operator
biql.run_query('sub IN [1, 2, 3]', format="dataframe")

Unnamed: 0,filepath,relative_path,filename,sub,ses,task,suffix,datatype,extension,metadata,participants,run
0,/tmp/biql_example_jwhs1cfo/sub-03/ses-baseline...,sub-03/ses-baseline/func/sub-03_ses-baseline_t...,sub-03_ses-baseline_task-rest_bold.nii.gz,3,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=22; sex=F; group=control; site=SiteB,
1,/tmp/biql_example_jwhs1cfo/sub-03/ses-baseline...,sub-03/ses-baseline/anat/sub-03_ses-baseline_T...,sub-03_ses-baseline_T1w.nii.gz,3,baseline,,T1w,anat,.nii.gz,FlipAngle=9; EchoTime=0.00372; RepetitionTime=2.3,age=22; sex=F; group=control; site=SiteB,
2,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-rest_bold.nii.gz,2,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=28; sex=M; group=patient; site=SiteA,
3,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-02_bold.nii.gz,2,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,2.0
4,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/func/sub-02_ses-baseline_t...,sub-02_ses-baseline_task-nback_run-01_bold.nii.gz,2,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=28; sex=M; group=patient; site=SiteA,1.0
5,/tmp/biql_example_jwhs1cfo/sub-02/ses-baseline...,sub-02/ses-baseline/anat/sub-02_ses-baseline_T...,sub-02_ses-baseline_T1w.nii.gz,2,baseline,,T1w,anat,.nii.gz,FlipAngle=9; EchoTime=0.00372; RepetitionTime=2.3,age=28; sex=M; group=patient; site=SiteA,
6,/tmp/biql_example_jwhs1cfo/sub-01/ses-followup...,sub-01/ses-followup/func/sub-01_ses-followup_t...,sub-01_ses-followup_task-nback_run-01_bold.nii.gz,1,followup,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA,1.0
7,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-rest_bold.nii.gz,1,baseline,rest,bold,func,.nii.gz,TaskName=rest; RepetitionTime=2.0; EchoTime=0....,age=25; sex=F; group=control; site=SiteA,
8,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/func/sub-01_ses-baseline_t...,sub-01_ses-baseline_task-nback_run-01_bold.nii.gz,1,baseline,nback,bold,func,.nii.gz,TaskName=nback; RepetitionTime=2.0; EchoTime=0...,age=25; sex=F; group=control; site=SiteA,1.0
9,/tmp/biql_example_jwhs1cfo/sub-01/ses-baseline...,sub-01/ses-baseline/anat/sub-01_ses-baseline_T...,sub-01_ses-baseline_T1w.nii.gz,1,baseline,,T1w,anat,.nii.gz,FlipAngle=9; EchoTime=0.00372; RepetitionTime=2.3,age=25; sex=F; group=control; site=SiteA,


## 4. Selecting Specific Fields

By default, BIQL returns all available information about each file. The `SELECT` clause lets you choose exactly what you want:

In [32]:
# Query 6: Select only specific fields
biql.run_query("SELECT sub, task, filepath WHERE datatype=func", format="dataframe")

Unnamed: 0,sub,task,filepath
0,3,rest,/tmp/biql_example_85c6gefa/sub-03/ses-baseline...
1,2,rest,/tmp/biql_example_85c6gefa/sub-02/ses-baseline...
2,2,nback,/tmp/biql_example_85c6gefa/sub-02/ses-baseline...
3,2,nback,/tmp/biql_example_85c6gefa/sub-02/ses-baseline...
4,1,nback,/tmp/biql_example_85c6gefa/sub-01/ses-followup...
5,1,rest,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...
6,1,nback,/tmp/biql_example_85c6gefa/sub-01/ses-baseline...
7,3,rest,/tmp/biql_example_85c6gefa/sub-03/ses-baseline...
8,2,rest,/tmp/biql_example_85c6gefa/sub-02/ses-baseline...
9,2,nback,/tmp/biql_example_85c6gefa/sub-02/ses-baseline...


**Much cleaner!** Now we only see the fields we care about.

### 4.1 Finding Unique Values

The `DISTINCT` keyword helps you discover what values exist in your dataset:

In [33]:
# Query 7: What subjects do we have?
biql.run_query("SELECT DISTINCT sub", format="dataframe")

Unnamed: 0,sub
0,3
1,2
2,1


In [35]:
# Query 8: What tasks are available?
biql.run_query("SELECT DISTINCT task WHERE datatype=func", format="dataframe")

Unnamed: 0,task
0,rest
1,nback


## 5. Working with Metadata

BIDS datasets include rich metadata in JSON files. BIQL can query this metadata using dot notation:

In [37]:
# Query 9: Find scans with specific repetition time
biql.run_query("SELECT sub, task, metadata.RepetitionTime WHERE metadata.RepetitionTime=2.0", format="dataframe")

Unnamed: 0,sub,task,metadata.RepetitionTime
0,3,rest,2.0
1,2,rest,2.0
2,2,nback,2.0
3,2,nback,2.0
4,1,nback,2.0
5,1,rest,2.0
6,1,nback,2.0
7,3,rest,2.0
8,2,rest,2.0
9,2,nback,2.0


In [39]:
# Query 10: Find all unique TR values
biql.run_query("SELECT DISTINCT metadata.RepetitionTime WHERE metadata.RepetitionTime", format="dataframe")

Unnamed: 0,metadata.RepetitionTime
0,2.0
1,2.3


**Note**: The `WHERE metadata.RepetitionTime` part filters to only files that *have* a RepetitionTime field, excluding files where it's missing.
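
As a rough mental model, dot notation walks into nested metadata, and a bare field in `WHERE` keeps only records where that lookup finds a value. The sketch below shows this in plain Python; `get_field` and the three sample records are made up for illustration and do not reflect how BIQL stores files internally.

```python
def get_field(record, dotted_name):
    """Follow a dotted name like 'metadata.RepetitionTime' through nested dicts."""
    current = record
    for key in dotted_name.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # field is missing somewhere along the path
        current = current[key]
    return current

# Hypothetical records; the last one has no RepetitionTime at all.
records = [
    {"sub": "01", "suffix": "bold", "metadata": {"RepetitionTime": 2.0}},
    {"sub": "01", "suffix": "T1w", "metadata": {"RepetitionTime": 2.3}},
    {"sub": "02", "suffix": "events", "metadata": {}},
]

# Existence filter: keep only records that actually have the field.
with_tr = [r for r in records if get_field(r, "metadata.RepetitionTime") is not None]
# DISTINCT: collapse to unique values.
unique_trs = sorted({get_field(r, "metadata.RepetitionTime") for r in with_tr})
print(unique_trs)
```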

## 6. Participant Demographics

BIQL can also query participant data from the `participants.tsv` file:

In [53]:
# Query 11: Find data for older participants
biql.run_query("SELECT DISTINCT sub, participants.age, participants.sex WHERE participants.age>=25", format="dataframe")

Unnamed: 0,sub,participants.age,participants.sex
0,2,28,M
1,1,25,F


In [61]:
# Query 12: Control group functional data  
biql.run_query("SELECT sub, task, participants.group WHERE datatype=func AND participants.group=control", format="dataframe")

Unnamed: 0,sub,task,participants.GROUP
0,3,rest,control
1,1,nback,control
2,1,rest,control
3,1,nback,control
4,3,rest,control
5,1,nback,control
6,1,rest,control
7,1,nback,control


## 7. Counting and Grouping Data

For data analysis and quality control, you often need to count files or group them by certain criteria:

In [62]:
# Query 13: Count files by datatype
biql.run_query("SELECT datatype, COUNT(*) GROUP BY datatype", format="dataframe")

Unnamed: 0,datatype,count
0,func,14
1,anat,6


In [64]:
# Query 14: Files per subject
biql.run_query("SELECT sub, COUNT(*) GROUP BY sub", format="dataframe")

Unnamed: 0,sub,count
0,3,4
1,2,8
2,1,8


### 7.1 Multiple Grouping Levels

You can group by multiple fields to get more detailed breakdowns:

In [66]:
# Query 15: Task coverage by subject
biql.run_query("SELECT sub, task, COUNT(*) WHERE datatype=func GROUP BY sub, task", format="dataframe")

Unnamed: 0,sub,task,count
0,3,rest,2
1,2,rest,2
2,2,nback,4
3,1,nback,4
4,1,rest,2


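**Notice**: When you group by multiple fields, any non-grouped field you include in `SELECT` (like `run`) is automatically aggregated into an array if it has multiple values within a group. The query above selects only the grouped fields and a count, so no arrays appear in its output.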
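
If it helps to see the grouping logic spelled out, here is a plain-Python sketch of counting with a composite key. The sample rows are invented for the example, and this is only an analogy for `GROUP BY sub, task`, not BIQL's implementation.

```python
from collections import Counter

# Hypothetical functional files, reduced to the fields we group on.
func_files = [
    {"sub": "01", "task": "nback", "run": "01"},
    {"sub": "01", "task": "rest", "run": None},
    {"sub": "02", "task": "nback", "run": "01"},
    {"sub": "02", "task": "nback", "run": "02"},
]

# GROUP BY sub, task: the group key is the tuple of both values.
counts = Counter((f["sub"], f["task"]) for f in func_files)

# One output row per distinct (sub, task) pair, with its COUNT(*).
rows = [{"sub": sub, "task": task, "count": n} for (sub, task), n in counts.items()]
for row in rows:
    print(row)
```

Adding a field to the `GROUP BY` list simply makes the key tuple longer, which splits the data into more, smaller groups.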

## 8. New BIQL API and Output Formats

BIQL now has a convenient high-level API that makes querying much simpler! Let's explore the different output formats:

In [67]:
# Demo different output formats with the new API
query = "SELECT sub, task, COUNT(*) WHERE datatype=func GROUP BY sub, task"

print("📋 Table format (great for reports):")
print(biql.run_query(query, format="table"))

print("\n📊 DataFrame format (great for analysis):")
df = biql.run_query(query, format="dataframe")
print(df)

print("\n📈 CSV format (great for exports):")
print(biql.run_query(query, format="csv"))

print("\n🔗 JSON format (great for APIs - this is our default):")
json_results = biql.run_query(query)  # Uses default format="json"
print(json.dumps(json_results[:3], indent=2))
print(f"... and {len(json_results)-3} more results" if len(json_results) > 3 else "")

📋 Table format (great for reports):
| count | sub | task  |
| ----- | --- | ----- |
| 2     | 03  | rest  |
| 2     | 02  | rest  |
| 4     | 02  | nback |
| 4     | 01  | nback |
| 2     | 01  | rest  |

📊 DataFrame format (great for analysis):
  sub   task  count
0  03   rest      2
1  02   rest      2
2  02  nback      4
3  01  nback      4
4  01   rest      2

📈 CSV format (great for exports):
count,sub,task
2,03,rest
2,02,rest
4,02,nback
4,01,nback
2,01,rest


🔗 JSON format (great for APIs - this is our default):
[
  {
    "sub": "03",
    "task": "rest",
    "count": 2
  },
  {
    "sub": "02",
    "task": "rest",
    "count": 2
  },
  {
    "sub": "02",
    "task": "nback",
    "count": 4
  }
]
... and 2 more results



In [69]:
# Paths format is useful for getting file lists
print("📁 Paths format (great for processing pipelines):")
print(biql.run_query("SELECT filepath WHERE task=rest", format="paths"))

📁 Paths format (great for processing pipelines):
/tmp/biql_example_85c6gefa/sub-03/ses-baseline/func/sub-03_ses-baseline_task-rest_bold.nii.gz
/tmp/biql_example_85c6gefa/sub-02/ses-baseline/func/sub-02_ses-baseline_task-rest_bold.nii.gz
/tmp/biql_example_85c6gefa/sub-01/ses-baseline/func/sub-01_ses-baseline_task-rest_bold.nii.gz
/tmp/biql_example_85c6gefa/sub-03/ses-baseline/func/sub-03_ses-baseline_task-rest_bold.json
/tmp/biql_example_85c6gefa/sub-02/ses-baseline/func/sub-02_ses-baseline_task-rest_bold.json
/tmp/biql_example_85c6gefa/sub-01/ses-baseline/func/sub-01_ses-baseline_task-rest_bold.json
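
Note that the list above includes the JSON sidecars as well as the images. Here is a small sketch of post-processing such a listing in plain Python. It assumes the `paths` format is newline-separated text (which is how it prints above), and the two sample paths are placeholders, not real files.

```python
from pathlib import Path

# Stand-in for the text returned by a paths-format query (assumed to be
# one path per line); the paths below are placeholders.
paths_text = """/data/sub-01/ses-baseline/func/sub-01_ses-baseline_task-rest_bold.nii.gz
/data/sub-01/ses-baseline/func/sub-01_ses-baseline_task-rest_bold.json
"""

# One Path object per non-empty line.
all_paths = [Path(line) for line in paths_text.splitlines() if line.strip()]

# Keep only the image files; skip the JSON sidecars.
images = [p for p in all_paths if p.name.endswith(".nii.gz")]
for image in images:
    print(image)
```

Alternatively, you could narrow the query itself, since `extension` is one of the fields shown in the earlier results.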


## 9. Pattern Matching and Advanced Filtering

BIQL supports powerful pattern matching for flexible file discovery:

In [ ]:
# Query 16: Wildcard matching
print("🔍 Query: Find all BOLD-related files")
print("BIQL: SELECT filename WHERE filename=*bold*")
print("\n📊 Results:")
results = biql.run_query("SELECT filename WHERE filename=*bold*")
show_results(results, limit=3)

print("\n💡 Wildcard patterns work with computed fields:")
print("   filename=*bold*    # Match filenames containing 'bold'")
print("   filename=*T1w*     # Match filenames containing 'T1w'") 
print("   filepath=*/func/*  # Match filepaths containing '/func/'")

# Test another pattern
t1w_results = biql.run_query("SELECT filename WHERE filename=*T1w*")
print(f"\nExample: filename=*T1w* finds {len(t1w_results)} anatomical files")
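
If you want to try wildcard patterns outside BIQL, Python's standard `fnmatch` module uses the same `*` style. This sketch only illustrates the pattern semantics on a few sample names; whether BIQL uses `fnmatch` internally is not something this guide states.

```python
from fnmatch import fnmatchcase

filenames = [
    "sub-01_ses-baseline_task-rest_bold.nii.gz",
    "sub-01_ses-baseline_task-rest_bold.json",
    "sub-01_ses-baseline_T1w.nii.gz",
    "sub-01_ses-baseline_T1w.json",
]

# "*" matches any run of characters, so "*bold*" means "contains 'bold'".
bold_files = [name for name in filenames if fnmatchcase(name, "*bold*")]
t1w_files = [name for name in filenames if fnmatchcase(name, "*T1w*")]

# The same idea works on paths: "*" also matches across "/" here.
in_func = fnmatchcase("sub-01/ses-baseline/func/sub-01_ses-baseline_task-rest_bold.json", "*/func/*")

print(bold_files)
print(t1w_files)
print(in_func)
```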

In [74]:
# List every filename without a filter, for comparison with the wildcard results
biql.run_query("SELECT filename")

[{'filename': 'sub-03_ses-baseline_task-rest_bold.nii.gz'},
 {'filename': 'sub-03_ses-baseline_T1w.nii.gz'},
 {'filename': 'sub-02_ses-baseline_task-rest_bold.nii.gz'},
 {'filename': 'sub-02_ses-baseline_task-nback_run-02_bold.nii.gz'},
 {'filename': 'sub-02_ses-baseline_task-nback_run-01_bold.nii.gz'},
 {'filename': 'sub-02_ses-baseline_T1w.nii.gz'},
 {'filename': 'sub-01_ses-followup_task-nback_run-01_bold.nii.gz'},
 {'filename': 'sub-01_ses-baseline_task-rest_bold.nii.gz'},
 {'filename': 'sub-01_ses-baseline_task-nback_run-01_bold.nii.gz'},
 {'filename': 'sub-01_ses-baseline_T1w.nii.gz'},
 {'filename': 'sub-03_ses-baseline_task-rest_bold.json'},
 {'filename': 'sub-03_ses-baseline_T1w.json'},
 {'filename': 'sub-02_ses-baseline_task-rest_bold.json'},
 {'filename': 'sub-02_ses-baseline_task-nback_run-02_bold.json'},
 {'filename': 'sub-02_ses-baseline_task-nback_run-01_bold.json'},
 {'filename': 'sub-02_ses-baseline_T1w.json'},
 {'filename': 'sub-01_ses-followup_task-nback_run-01_bold.j

In [None]:
# Query 17: Regex matching for subject ranges
print("🔍 Query: Subjects 01-02 using regex")
print('BIQL: SELECT DISTINCT sub WHERE sub~="0[1-2]"')
print("\n📊 Results:")
results = biql.run_query('SELECT DISTINCT sub WHERE sub~="0[1-2]"')
print(json.dumps(results, indent=2))
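
One thing to watch with regex filters is whether the pattern must match the whole value or just part of it. This guide does not say which behavior `~=` uses, so the plain-Python sketch below shows both; the subject labels are invented, with `"101"` included to expose the difference.

```python
import re

subjects = ["01", "02", "03", "101"]
pattern = re.compile(r"0[1-2]")

# Partial match: succeeds if the pattern appears anywhere in the label.
partial = [s for s in subjects if pattern.search(s)]

# Whole-value match: the entire label must fit the pattern.
whole = [s for s in subjects if pattern.fullmatch(s)]

print(partial)
print(whole)
```

If you need whole-value matching regardless of the engine's default, anchoring the pattern explicitly (`^0[1-2]$`) removes the ambiguity.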

## 10. Quality Control Workflows

Let's look at some real-world quality control patterns:

In [None]:
# QC 1: Check data completeness
print("🔍 QC Check: Session completeness by subject")
print("BIQL: SELECT sub, ses, COUNT(*) GROUP BY sub, ses")
print("\n📊 Results:")
results = biql.run_query("SELECT sub, ses, COUNT(*) GROUP BY sub, ses")
print(json.dumps(results, indent=2))

print("\n💡 Interpretation: Subject 01 has 2 sessions, others have 1")

In [None]:
# QC 2: Find subjects with missing anatomical scans
print("🔍 QC Check: Which subjects have anatomical data?")
print("BIQL: SELECT DISTINCT sub WHERE datatype=anat")
print("\n📊 Results:")
anat_subjects = biql.run_query("SELECT DISTINCT sub WHERE datatype=anat")
print(json.dumps(anat_subjects, indent=2))

print("\n🔍 QC Check: All subjects for comparison")
print("BIQL: SELECT DISTINCT sub")
all_subjects = biql.run_query("SELECT DISTINCT sub")
print(json.dumps(all_subjects, indent=2))

# Compare
anat_subs = {r['sub'] for r in anat_subjects}
all_subs = {r['sub'] for r in all_subjects}
missing_anat = all_subs - anat_subs

print(f"\n⚠️  Subjects missing anatomical data: {missing_anat if missing_anat else 'None - all good!'}")

In [None]:
# QC 3: Check for inconsistent acquisition parameters
print("🔍 QC Check: Repetition times by task")
print("BIQL: SELECT task, metadata.RepetitionTime, COUNT(*) WHERE datatype=func GROUP BY task, metadata.RepetitionTime")
print("\n📊 Results:")
results = biql.run_query("SELECT task, metadata.RepetitionTime, COUNT(*) WHERE datatype=func GROUP BY task, metadata.RepetitionTime")
print(json.dumps(results, indent=2))

print("\n💡 Interpretation: Each task has consistent TR across all scans - good!")

## 11. Preprocessing Workflows

BIQL is excellent for generating file lists and parameters for processing pipelines:

In [None]:
# Workflow 1: Get all T1w files for anatomical preprocessing
print("🔄 Preprocessing: T1w files for registration")
print("BIQL: SELECT sub, ses, filepath WHERE suffix=T1w")
print("\n📊 Results:")
results = biql.run_query("SELECT sub, ses, filepath WHERE suffix=T1w")
print(json.dumps(results, indent=2))

print("\n💾 In a real pipeline, you'd save these paths to a file:")
print("   biql 'suffix=T1w' --format paths > t1w_files.txt")

In [None]:
# Workflow 2: Functional preprocessing with matched parameters
print("🔄 Preprocessing: Functional files with acquisition parameters")
print("BIQL: SELECT sub, ses, task, run, filepath, metadata.RepetitionTime WHERE datatype=func")
print("\n📊 Results:")
results = biql.run_query("SELECT sub, ses, task, run, filepath, metadata.RepetitionTime WHERE datatype=func")
print(json.dumps(results[:3], indent=2))
print(f"... and {len(results)-3} more files" if len(results) > 3 else "")

print("\n💡 This gives you both file paths AND the TR needed for processing!")

## 12. Advanced Aggregation Patterns

For complex analyses, BIQL supports advanced aggregation functions:

In [None]:
# Advanced 1: Calculate average TR by task
print("📊 Advanced: Average repetition time by task")
print("BIQL: SELECT task, AVG(metadata.RepetitionTime) WHERE datatype=func GROUP BY task")
print("\n📊 Results:")
results = biql.run_query("SELECT task, AVG(metadata.RepetitionTime) WHERE datatype=func GROUP BY task")
print(json.dumps(results, indent=2))

In [None]:
# Advanced 2: File lists with ARRAY_AGG
print("📊 Advanced: Group functional files by subject")
print("BIQL: SELECT sub, ARRAY_AGG(filename) WHERE datatype=func GROUP BY sub")
print("\n📊 Results:")
results = biql.run_query("SELECT sub, ARRAY_AGG(filename) WHERE datatype=func GROUP BY sub")
print(json.dumps(results, indent=2))

print("\n💡 ARRAY_AGG collects all filenames into arrays - perfect for reconstruction workflows!")
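
For intuition, here is roughly what `AVG` and `ARRAY_AGG` do, written as a plain-Python sketch. The rows and TR values are made up for the example (they are not the example dataset's values), and the field name `tr` is a stand-in for `metadata.RepetitionTime`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical functional files with invented TR values.
func_files = [
    {"sub": "01", "task": "nback", "filename": "sub-01_task-nback_bold.nii.gz", "tr": 2.0},
    {"sub": "01", "task": "rest", "filename": "sub-01_task-rest_bold.nii.gz", "tr": 2.0},
    {"sub": "02", "task": "rest", "filename": "sub-02_task-rest_bold.nii.gz", "tr": 1.5},
]

filenames_by_sub = defaultdict(list)  # ARRAY_AGG(filename) ... GROUP BY sub
trs_by_task = defaultdict(list)       # values to average ... GROUP BY task

for f in func_files:
    filenames_by_sub[f["sub"]].append(f["filename"])
    trs_by_task[f["task"]].append(f["tr"])

# AVG reduces each group's values to a single number.
avg_tr_by_task = {task: mean(trs) for task, trs in trs_by_task.items()}

print(dict(filenames_by_sub))
print(avg_tr_by_task)
```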

## 13. Filtering Groups with HAVING

Sometimes you want to filter based on group properties, not individual files:

In [None]:
# HAVING example: find subjects with more than 3 functional files (JSON sidecars count too)
print("🔍 Advanced: Subjects with more than 3 functional scans")
print("BIQL: SELECT sub, COUNT(*) WHERE datatype=func GROUP BY sub HAVING COUNT(*) > 3")
print("\n📊 Results:")
results = biql.run_query("SELECT sub, COUNT(*) WHERE datatype=func GROUP BY sub HAVING COUNT(*) > 3")
print(json.dumps(results, indent=2))

if not results:
    print("\n💡 No subjects have more than 3 functional scans in our small example")
    print("   Let's try >= 2 instead:")
    results = biql.run_query("SELECT sub, COUNT(*) WHERE datatype=func GROUP BY sub HAVING COUNT(*) >= 2")
    print(json.dumps(results, indent=2))
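
The difference between `WHERE` and `HAVING` is when the filter runs: `WHERE` drops individual files before grouping, while `HAVING` drops whole groups after counting. Here is a plain-Python sketch of that order of operations. The file counts mirror the directory tree from Section 2 (with JSON sidecars counted as files), but the code is only an analogy, not BIQL's implementation.

```python
from collections import Counter

# (sub, datatype) pairs mirroring the example tree; sidecars count as files.
files = (
    [("01", "func")] * 6 + [("01", "anat")] * 2
    + [("02", "func")] * 6 + [("02", "anat")] * 2
    + [("03", "func")] * 2 + [("03", "anat")] * 2
)

# WHERE datatype=func: filter individual files first.
func_subs = [sub for sub, datatype in files if datatype == "func"]

# GROUP BY sub with COUNT(*).
counts = Counter(func_subs)

# HAVING COUNT(*) > 3: filter whole groups after aggregation.
having = {sub: n for sub, n in counts.items() if n > 3}
print(having)
```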

## 14. Sorting Results

Control the order of your results with `ORDER BY`:

In [None]:
# Sorting example
print("📊 Sorting: Functional files ordered by subject and run")
print("BIQL: SELECT sub, task, run, filename WHERE datatype=func ORDER BY sub, run")
print("\n📊 Results:")
results = biql.run_query("SELECT sub, task, run, filename WHERE datatype=func ORDER BY sub, run")
print(json.dumps(results, indent=2))
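
Sorting by two fields means the second field only breaks ties in the first. The sketch below shows the same idea in plain Python on hypothetical rows. Files without a `run` entity (such as a single resting-state scan) are placed after numbered runs here; that placement is an assumption of this sketch, since this guide does not state how BIQL orders missing values.

```python
# Hypothetical rows; the rest scan has no run entity
rows = [
    {"sub": "02", "task": "nback", "run": "02"},
    {"sub": "01", "task": "rest", "run": None},
    {"sub": "01", "task": "nback", "run": "02"},
    {"sub": "01", "task": "nback", "run": "01"},
    {"sub": "02", "task": "nback", "run": "01"},
]


def sort_key(row):
    run = row["run"]
    # Tuple key: subject first, then a "missing" flag so rows without a run
    # sort last within a subject, then the run label itself
    return (row["sub"], run is None, run or "")


ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row["sub"], row["run"], row["task"])
```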

## 15. Real-World Example: Multi-Site Study QC

Let's put it all together with a comprehensive quality control workflow for a multi-site study:

In [None]:
print("🏥 Multi-Site QC Report")
print("=" * 50)

# 1. Overall dataset stats
print("\n1️⃣ Overall Dataset Statistics:")
df_stats = biql.run_query("SELECT datatype, COUNT(*) GROUP BY datatype", format="dataframe")
print(df_stats)

# 2. Site distribution  
print("\n2️⃣ Participant Distribution by Site:")
df_sites = biql.run_query("SELECT participants.site, COUNT(DISTINCT sub) GROUP BY participants.site", format="dataframe")
print(df_sites)

# 3. Group demographics
print("\n3️⃣ Demographics by Group:")
df_demo = biql.run_query("SELECT participants.group, AVG(participants.age), COUNT(DISTINCT sub) GROUP BY participants.group", format="dataframe")
print(df_demo)

# 4. Task coverage check
print("\n4️⃣ Task Coverage by Subject:")
print(biql.run_query("SELECT sub, task, COUNT(*) WHERE datatype=func GROUP BY sub, task", format="table"))

# 5. Scanner consistency
print("\n5️⃣ Acquisition Parameters Consistency:")
df_params = biql.run_query("SELECT task, metadata.RepetitionTime, COUNT(*) WHERE datatype=func GROUP BY task, metadata.RepetitionTime", format="dataframe")
print(df_params)

print("\n✅ QC Report Complete!")
print("\n💡 Notice how DataFrames make the numerical data much easier to read!")
print("💡 participants.group works even though GROUP is a reserved keyword in BIQL.")
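
A natural follow-up to the acquisition-parameter check is to flag tasks that were acquired with more than one repetition time. The sketch below does this in plain Python. The `param_rows` list and its `tr` and `count` keys are hypothetical; the column names in the real query output may differ.

```python
from collections import defaultdict

# Hypothetical rows shaped like a "task, TR, count" summary
param_rows = [
    {"task": "nback", "tr": 2.0, "count": 6},
    {"task": "rest", "tr": 2.0, "count": 5},
    {"task": "rest", "tr": 2.5, "count": 1},
]


def inconsistent_tasks(rows):
    """Return tasks that appear with more than one distinct TR."""
    trs_by_task = defaultdict(set)
    for row in rows:
        trs_by_task[row["task"]].add(row["tr"])
    return {
        task: sorted(trs)
        for task, trs in trs_by_task.items()
        if len(trs) > 1
    }


flagged = inconsistent_tasks(param_rows)
print(flagged)
```

Any task that shows up here deserves a closer look before its runs are pooled in one analysis.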

## 16. Performance Tips

For large datasets, these patterns will help you write faster queries:

### ✅ Fast Patterns:
```sql
-- Use DISTINCT for exploration
SELECT DISTINCT task WHERE datatype=func

-- Filter early in WHERE clause
SELECT filepath WHERE sub=01 AND datatype=func AND task=rest
```

### ⚠️ Slower Patterns:
```sql
-- GROUP BY when you only need unique values
SELECT task, COUNT(*) WHERE datatype=func GROUP BY task

-- Metadata conditions with few entity filters to narrow the search
SELECT filepath WHERE metadata.RepetitionTime>2.0 AND task=rest
```

### 💡 Best Practices:

1. **Filter early**: Put the most selective conditions first
2. **Use entity filters**: `sub=01` is faster than `filename~=".*sub-01.*"`
3. **Limit metadata queries**: Only query metadata when needed
4. **Use appropriate output formats**: `paths` format is faster than `json` for file lists
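
The reasoning behind "filter early" can be illustrated without BIQL. In this plain-Python sketch, `load_tr` is a hypothetical stand-in for an expensive metadata lookup such as reading a JSON sidecar, and the two list comprehensions differ only in the order of their conditions. Whether BIQL evaluates conditions left to right or reorders them internally is not stated in this guide, so treat this as the general principle rather than a description of the engine.

```python
# Hypothetical index: 10 subjects x 2 tasks = 20 files
files = [
    {"sub": f"{i:02d}", "task": task}
    for i in range(1, 11)
    for task in ("rest", "nback")
]


def make_loader():
    """Return a fake metadata loader plus a log of the files it was asked about."""
    calls = []

    def load_tr(file):
        calls.append(file)  # stands in for opening a sidecar file
        return 2.5 if file["task"] == "rest" else 2.0

    return load_tr, calls


# Cheap entity check first: `and` short-circuits, so metadata is loaded
# only for the files that already match sub == "01"
load_tr, entity_first_calls = make_loader()
entity_first = [f for f in files if f["sub"] == "01" and load_tr(f) > 2.0]

# Metadata check first: every file triggers a lookup
load_tr, metadata_first_calls = make_loader()
metadata_first = [f for f in files if load_tr(f) > 2.0 and f["sub"] == "01"]

print(len(entity_first_calls), len(metadata_first_calls))
```

Both orderings return the same files; only the number of metadata lookups changes.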

## 17. Next Steps

Congratulations! You now know the fundamentals of BIQL. Here's how to continue your journey:

### 📚 Further Learning:
- **BIQL Reference**: Complete syntax and function reference
- **BIQL Cookbook**: Copy-paste solutions for common neuroimaging tasks
- **CLI Guide**: Master the command-line interface

### 🛠️ In Your Workflows:
1. **Dataset Exploration**: Use BIQL to understand new datasets quickly
2. **Quality Control**: Build automated QC reports with BIQL queries
3. **Processing Pipelines**: Generate file lists and parameter sets
4. **Data Analysis**: Query demographics and acquisition parameters

### 🌟 Advanced Topics to Explore:
- **QSM Workflows**: Multi-echo sequence reconstruction
- **Longitudinal Studies**: Time-series analysis patterns
- **Multi-Modal Integration**: Combining functional, structural, and diffusion data
- **Derivatives Querying**: Working with processed data outputs

### 💬 Get Help:
- **GitHub Issues**: Report bugs or request features
- **Discussions**: Ask questions and share queries with the community
- **BIDS Community**: Join the broader neuroimaging data standards community

Happy querying! 🧠✨

In [None]:
# Clean up the temporary dataset
import shutil
shutil.rmtree(dataset_path)
print("🧹 Cleaned up example dataset")

print("\n🎉 Congratulations! You now know how to use the new BIQL API:")
print("✅ create_query_engine() for easy setup")
print("✅ biql.run_query() with format options")  
print("✅ DataFrame support for numerical analysis")
print("✅ Multiple output formats for different use cases")
print("✅ Convenient helper methods like dataset_stats()")

print("\n💡 Pro tip: In your own projects, you can now do:")
print('   biql = create_query_engine("/path/to/bids", default_format="dataframe")')
print('   df = biql.run_query("SELECT sub, task, COUNT(*) GROUP BY sub, task")')
print("   # df is now a pandas DataFrame ready for analysis!")