# The Natural Scenes Dataset (NSD)

From [naturalscenesdataset.org](https://naturalscenesdataset.org):
```text
The Natural Scenes Dataset (NSD) is a large-scale fMRI dataset conducted at ultra-high-field (7T) strength at the Center of Magnetic Resonance Research (CMRR) at the University of Minnesota. The dataset consists of whole-brain, high-resolution (1.8-mm isotropic, 1.6-s sampling rate) fMRI measurements of 8 healthy adult subjects while they viewed thousands of color natural scenes over the course of 30–40 scan sessions. While viewing these images, subjects were engaged in a continuous recognition task in which they reported whether they had seen each given image at any point in the experiment. These data constitute a massive benchmark dataset for computational models of visual representation and cognition, and can support a wide range of scientific inquiry.
```

The NSD was spearheaded by Kendrick Kay at the University of Minnesota, and it is one of the best datasets for examining visual function in depth because of the large amount of scan time collected for each participant.

The NSD is hosted on Amazon S3, so we will access it using the `S3Path` and `S3Client` interface of `cloudpathlib`. The NSD does not require authentication for access, so we do not need to load or name our credentials.

In [2]:
# import packages
from cloudpathlib import S3Path, S3Client
from pathlib import Path

In [3]:
from utilities import ls, crawl 
from cloudpathlib import CloudPath, S3Client
import nibabel as nib

In [4]:
# Set up our cache path:
cache_path = Path('/home/jovyan/cache')
# Create the cache directory (and any missing parents) if it does not exist yet.
cache_path.mkdir(parents=True, exist_ok=True)

# Create the root S3Path for the NSD:
nsd_base_path = S3Path(
    's3://natural-scenes-dataset/',
    client=S3Client(
        no_sign_request=True,
        local_cache_dir=cache_path))

Let's look around inside of the NSD S3 bucket...
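The `ls` helper imported from `utilities` is not shown in this notebook. As a rough stand-in, the sketch below defines a hypothetical `list_dir` function built on `iterdir()`, a method that `cloudpathlib` paths share with `pathlib`. It is demonstrated on a temporary local directory so that it runs offline; passing `nsd_base_path` instead should list the top level of the bucket.

```python
import tempfile
from pathlib import Path


def list_dir(path):
    """Return the sorted children of a directory-like path.

    Hypothetical stand-in for the notebook's `ls` helper. It relies only on
    `iterdir()`, which both pathlib.Path and cloudpathlib's S3Path provide.
    """
    return sorted(path.iterdir(), key=str)


# Demonstrate on a throwaway local directory that mimics two NSD folder names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'nsddata').mkdir()
    (root / 'nsddata_betas').mkdir()
    names = [child.name for child in list_dir(root)]

print(names)
```

Sorting by `str` keeps the helper agnostic to the path type, since it never compares path objects directly.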

## Loading an NSD file: The Coefficient of Determination (or Variance Explained)

We can load a NIfTI file from the NSD through its cloudpath object, which `cloudpathlib` can convert into a local file path (via `cloud_path_object.fspath`). The NSD is not in BIDS format, but its data are organized fairly intuitively. The `ppdata` directory in particular holds the preprocessed data for each subject. Here we will extract the coefficient of determination (also called R-squared, or the percentage of variance explained) of the population receptive field (PRF) models for subject 1.
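For readers who want the quantity itself spelled out, here is a minimal sketch of the textbook coefficient of determination, computed with NumPy on made-up numbers. The function name `r_squared` and the toy arrays are illustrative assumptions; how the NSD files scale or store this value should be checked against the NSD documentation.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


observed = np.array([1.0, 2.0, 3.0, 4.0])
perfect = observed.copy()                               # a model that fits exactly
mean_only = np.full_like(observed, observed.mean())     # a model that predicts the mean

r2_perfect = r_squared(observed, perfect)
r2_mean = r_squared(observed, mean_only)
print(r2_perfect, r2_mean)
```

Multiplying the result by 100 expresses it as a percentage of variance explained.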

### Loading Subjects using neuropythy

Because the NSD is on S3, neuropythy can also access it and load FreeSurfer subjects from it. To do this, we can simply direct neuropythy to the S3 FreeSurfer subject directory (note that this will not share cache with the `cloudpathlib` library).

In [88]:
def load_beta_session(subject_id='subj01', resolution='func1mm', session_num=5, nsd_base_path=nsd_base_path):
    """Load the beta-weight NIfTI image for one subject and session."""
    filename = f'betas_session{session_num:02d}.nii.gz'  # zero-padded
    file_path = nsd_base_path / 'nsddata_betas' / 'ppdata' / subject_id / resolution / 'betas_fithrf_GLMdenoise_RR' / filename
    print(f"Loading {file_path}")
    # .fspath downloads the file into the local cache and returns its local path
    img = nib.load(file_path.fspath)
    print(f"Loaded image shape: {img.shape}")
    return img


def load_all_sessions(subject_id='subj01', resolution='func1mm', n_sessions=40):
    """Load the beta arrays for sessions 1..n_sessions, keyed by session number."""
    all_sessions = {}
    for session_num in range(1, n_sessions + 1):
        try:
            img = load_beta_session(subject_id, resolution, session_num)
            dat = img.get_fdata()
            all_sessions[session_num] = dat
        except Exception as e:
            print(f"Failed to load session {session_num}: {e}")
    return all_sessions

# Load the first two sessions for subject 'subj01' (use n_sessions=40 for all of them)
all_session_images = load_all_sessions(n_sessions=2)

Loading s3://natural-scenes-dataset/nsddata_betas/ppdata/subj01/func1mm/betas_fithrf_GLMdenoise_RR/betas_session01.nii.gz
Loaded image shape: (145, 186, 148, 750)
Failed to load session 1: [Errno 2] No such file or directory: '/tmp/tmp61mkgbkt/natural-scenes-dataset/nsddata_betas/ppdata/subj01/func1mm/betas_fithrf_GLMdenoise_RR/betas_session01.nii.gz'
Loading s3://natural-scenes-dataset/nsddata_betas/ppdata/subj01/func1mm/betas_fithrf_GLMdenoise_RR/betas_session02.nii.gz
Loaded image shape: (145, 186, 148, 750)
Failed to load session 2: [Errno 2] No such file or directory: '/tmp/tmpciketbk_/natural-scenes-dataset/nsddata_betas/ppdata/subj01/func1mm/betas_fithrf_GLMdenoise_RR/betas_session02.nii.gz'


In [1]:
# Loading subjects with neuropythy
# Import neuropythy
import neuropythy as ny

# Tell neuropythy where we want to keep cache data.
ny.config['data_cache_root'] = '/tmp/cache'

In [7]:
# Where is the FreeSurfer data for a subject?

ls(nsd_base_path / 'nsddata' / 'freesurfer' / 'subj01')

[S3Path('s3://natural-scenes-dataset/nsddata/freesurfer/subj01/label'),
 S3Path('s3://natural-scenes-dataset/nsddata/freesurfer/subj01/mri'),
 S3Path('s3://natural-scenes-dataset/nsddata/freesurfer/subj01/scripts'),
 S3Path('s3://natural-scenes-dataset/nsddata/freesurfer/subj01/stats'),
 S3Path('s3://natural-scenes-dataset/nsddata/freesurfer/subj01/surf'),
 S3Path('s3://natural-scenes-dataset/nsddata/freesurfer/subj01/tmp'),
 S3Path('s3://natural-scenes-dataset/nsddata/freesurfer/subj01/touch')]

In [None]:
# for each session, apply MTL mask


In [None]:
# concatenate sessions

In [None]:
# load session file, select trial number [no idea how to select]
# do i just filter like numbers?
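One possible answer to the question above, as a sketch: filter the behavioral table down to one session, build a boolean mask from its columns, and use the positions of the `True` entries to index the trial axis of that session's betas. This assumes that the rows of `responses.tsv` for a session appear in the same order as the volumes along the last axis of that session's beta file (750 per session in the output above), which should be checked against the NSD documentation. The small DataFrame and random array below are synthetic stand-ins, not NSD data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for responses.tsv: 2 sessions x 4 trials each.
behav_demo = pd.DataFrame({
    'SESSION':   [1, 1, 1, 1, 2, 2, 2, 2],
    'ISOLD':     [0, 1, 1, 0, 1, 0, 1, 1],
    'ISCORRECT': [1, 1, 0, 1, 1, 1, 1, 0],
})

# Synthetic stand-in for one session of betas: (x, y, z, trials).
rng = np.random.default_rng(0)
betas_demo = rng.normal(size=(2, 2, 2, 4))

session_num = 1
session_rows = behav_demo[behav_demo['SESSION'] == session_num].reset_index(drop=True)

# Boolean mask over the session's trials: old images that were answered correctly.
keep = ((session_rows['ISOLD'] == 1) & (session_rows['ISCORRECT'] == 1)).to_numpy()

# After reset_index, row i is assumed to match beta volume i of this session.
trial_idx = np.flatnonzero(keep)
selected_betas = betas_demo[..., trial_idx]

print(trial_idx)
print(selected_betas.shape)
```

So yes, selecting trials comes down to filtering on column values and then indexing the last axis with the surviving row positions.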

In [None]:
# what's the format needed for MVPA?
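Most decoding tools, scikit-learn among them, expect a 2-D array of shape `(n_samples, n_features)` plus one label per sample. For trial-wise betas that usually means one row per trial and one column per voxel inside the region of interest. The sketch below shows that reshaping on synthetic arrays; `betas_demo`, `mask_demo`, and the labels are made-up stand-ins, and treating the ROI as a boolean volume is an assumption about how the mask would be prepared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: betas are (x, y, z, trials); the mask is (x, y, z).
betas_demo = rng.normal(size=(3, 4, 5, 6))
mask_demo = np.zeros((3, 4, 5), dtype=bool)
mask_demo[1, 1:3, 2:4] = True  # 4 voxels inside the hypothetical ROI

# Boolean indexing over the three spatial axes gives (n_voxels, n_trials).
roi_betas = betas_demo[mask_demo]

# Transpose so that each row is one trial and each column is one voxel.
X = roi_betas.T

# One label per trial, e.g. taken from the ISOLD column for those trials.
y = np.array([0, 1, 1, 0, 1, 0])

print(X.shape, y.shape)
```

Masking before reshaping also keeps the feature count small, since only voxels inside the ROI become columns.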

In [4]:
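# Load one beta file
nsd_base_path = CloudPath(
    's3://natural-scenes-dataset/',
    # pass the client by keyword and reuse the persistent cache directory defined above
    client=S3Client(no_sign_request=True, local_cache_dir=cache_path)
)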

In [5]:
session_num = 1

In [6]:
subject_id = 'subj01'

In [7]:
resolution = 'func1mm'

In [None]:
filename = f'betas_session{session_num:02d}.nii.gz'  # zero-padded
file_path = nsd_base_path / 'nsddata_betas' / 'ppdata' / subject_id / resolution / 'betas_fithrf_GLMdenoise_RR' / filename
print(f"Loading {file_path}")
img = nib.load(file_path.fspath)

Loading s3://natural-scenes-dataset/nsddata_betas/ppdata/subj01/func1mm/betas_fithrf_GLMdenoise_RR/betas_session01.nii.gz


In [None]:
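# Build the path to the MTL mask that will be applied to img
# (assumes the ROI volumes live under nsddata/ppdata rather than nsddata_betas; confirm with ls())
roi = ['MTL']
mask_MTL = nsd_base_path / 'nsddata' / 'ppdata' / subject_id / resolution / 'roi' / f'{roi[0]}.nii.gz'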

In [65]:
# Import neuropythy
import neuropythy as ny

# Tell neuropythy where we want to keep cache data.
ny.config['data_cache_root'] = '/home/jovyan/cache'

In [None]:
# Ask neuropythy to load a FreeSurfer subject:
sub = ny.freesurfer_subject(
    's3://natural-scenes-dataset/nsddata/freesurfer/subj01')

In [58]:
# Load behavioral data for the first subject (subject_id is 'subj01')
behav_subj = nsd_base_path / 'nsddata' / 'ppdata' / subject_id / 'behav' / 'responses.tsv'


In [54]:
import pandas as pd

In [55]:
behav = pd.read_csv(behav_subj, sep="\t")

In [56]:
behav

Unnamed: 0,SUBJECT,SESSION,RUN,TRIAL,73KID,10KID,TIME,ISOLD,ISCORRECT,RT,CHANGEMIND,MEMORYRECENT,MEMORYFIRST,ISOLDCURRENT,ISCORRECTCURRENT,TOTAL1,TOTAL2,BUTTON,MISSINGDATA
0,1,1,1,1,46003,626,0.505082,0,1.0,803.529781,0.0,,,0,1.0,1,0,1.0,0
1,1,1,1,2,61883,5013,0.505128,0,1.0,972.261383,0.0,,,0,1.0,1,0,1.0,0
2,1,1,1,3,829,4850,0.505175,0,1.0,742.351236,0.0,,,0,1.0,1,0,1.0,0
3,1,1,1,4,67574,8823,0.505221,0,1.0,747.518479,0.0,,,0,1.0,1,0,1.0,0
4,1,1,1,5,16021,9538,0.505267,0,1.0,547.422774,0.0,,,0,1.0,1,0,1.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29995,1,40,12,58,13774,8984,262.629551,1,0.0,1275.300175,0.0,20963.0,21540.0,0,1.0,1,0,1.0,0
29996,1,40,12,59,66768,6026,262.629597,1,1.0,661.379768,0.0,16.0,17622.0,1,1.0,0,1,2.0,0
29997,1,40,12,60,53168,4841,262.629644,1,1.0,786.811781,0.0,9483.0,11912.0,0,0.0,0,1,2.0,0
29998,1,40,12,61,1944,7323,262.629690,1,1.0,502.626801,0.0,83.0,12162.0,1,1.0,0,1,2.0,0


In [57]:
# img is the session image loaded with nib.load above
beta_data = img.get_fdata()


In [80]:
all_session_images

{1: <nibabel.nifti1.Nifti1Image at 0x7f7b8221df50>,
 2: <nibabel.nifti1.Nifti1Image at 0x7f7b7b1749d0>,
 3: <nibabel.nifti1.Nifti1Image at 0x7f7b832e9b90>,
 4: <nibabel.nifti1.Nifti1Image at 0x7f7b7b3b9790>,
 5: <nibabel.nifti1.Nifti1Image at 0x7f7b7ba79610>,
 6: <nibabel.nifti1.Nifti1Image at 0x7f7b7bf777d0>,
 7: <nibabel.nifti1.Nifti1Image at 0x7f7b78554d10>,
 8: <nibabel.nifti1.Nifti1Image at 0x7f7b7bfb8410>,
 9: <nibabel.nifti1.Nifti1Image at 0x7f7b7b492310>,
 10: <nibabel.nifti1.Nifti1Image at 0x7f7b7bc03450>,
 11: <nibabel.nifti1.Nifti1Image at 0x7f7b7af99110>,
 12: <nibabel.nifti1.Nifti1Image at 0x7f7b797ec950>,
 13: <nibabel.nifti1.Nifti1Image at 0x7f7b813efcd0>,
 14: <nibabel.nifti1.Nifti1Image at 0x7f7b7b8fdad0>,
 15: <nibabel.nifti1.Nifti1Image at 0x7f7b7bc8cb10>,
 16: <nibabel.nifti1.Nifti1Image at 0x7f7bd03b5a50>,
 17: <nibabel.nifti1.Nifti1Image at 0x7f7b7b25d190>,
 18: <nibabel.nifti1.Nifti1Image at 0x7f7b78692050>,
 19: <nibabel.nifti1.Nifti1Image at 0x7f7b7bbb7750>,
 2

In [65]:
all_session_images.keys()

dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40])

In [66]:
sorted_imgs = [all_session_images[sess] for sess in sorted(all_session_images.keys())]

In [68]:
type(sorted_imgs)

list

In [85]:
type(all_session_images[1])

nibabel.nifti1.Nifti1Image

In [None]:
# Column 8 (ISOLD) is 0 (the image is novel) or 1 (the image is old).
# Column 9 (ISCORRECT) is 0 (subject’s response was incorrect) or 1 (subject’s response was correct).
# Column 10 (RT) is the reaction time in milliseconds (time between trial start time and button-press time).
# Column 13 (MEMORYFIRST) is the number of stimulus trials in between the current and second most recent presentation. If there has been only one previous presentation, this is NaN.

In [None]:
# concatenate all sessions in the dict into one array, to be saved as a single NIfTI file
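A minimal sketch of that concatenation, assuming the dict maps session numbers to 4-D arrays whose last axis indexes trials (as `load_all_sessions` returns). The `sessions_demo` dict below is a small synthetic stand-in, not NSD data.

```python
import numpy as np

# Synthetic stand-in for the dict built above: session number -> 4-D array.
rng = np.random.default_rng(0)
sessions_demo = {
    2: rng.normal(size=(2, 2, 2, 3)),
    1: rng.normal(size=(2, 2, 2, 3)),
    3: rng.normal(size=(2, 2, 2, 3)),
}

# Sort by session number so trials stay in chronological order,
# then join along the last (trial) axis.
ordered = [sessions_demo[s] for s in sorted(sessions_demo)]
all_betas = np.concatenate(ordered, axis=-1)

print(all_betas.shape)
```

At the real image shape printed earlier, `(145, 186, 148, 750)`, one session holds about 3.0 billion values, roughly 24 GB as the float64 array that `get_fdata()` returns by default, so applying the ROI mask to each session before concatenating is likely necessary. To write the result back out, the array can be wrapped with `nib.Nifti1Image(all_betas, affine)`, taking `affine` from one of the source images.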