Update dataset info #389

Merged
bruAristimunha merged 36 commits into NeuroTechX:develop on Jun 2, 2023

bruAristimunha
Collaborator

Closes #381.

I wrote this small script to collect all the information for each dataset and merge it across datasets. Maybe we can convert it into a function for the library. What do you think, @sylvchev?

The P300 and SSVEP paradigms are still running.

import os.path

import pandas as pd
from mne import count_events

import moabb
from moabb.datasets.utils import dataset_search
from moabb.utils import set_download_dir

base_path = "/mnt/beegfs/projects/moabb/"
metainfo_dir = "/mnt/beegfs/home/chevallier/metainfo"

set_download_dir(f"{base_path}/mne_data/")

# Paradigm names as understood by dataset_search, mapped to paradigm objects
paradigms = {
    "imagery": moabb.paradigms.MotorImagery(),
    "ssvep": moabb.paradigms.SSVEP(),
    "p300": moabb.paradigms.P300(),
}

columns = [
    "Dataset", "#Subj", "#Chan", "#Classes", "#Trials_per_subject", "trials_ids",
    "Window Size", "Freq", "#Session", "#Runs", "Total_trials",
]

for parad_name, p in paradigms.items():
    dataset_list = dataset_search(paradigm=parad_name)
    metainfo = []

    for dataset in dataset_list:
        # Keep only the class name from the object's string representation
        dataset_name = str(dataset).split(".")[-1].split(" ")[0]
        path = f"{metainfo_dir}/metainfo_{dataset_name}.csv"

        # Skip datasets already summarized by a previous run
        if os.path.exists(path):
            continue

        print(dataset)
        try:
            # Load all subjects to count subjects, sessions and runs
            _, _, metadata = p.get_data(dataset, None, return_epochs=False)
            subjects = metadata["subject"].nunique()
            session = metadata["session"].nunique()
            runs = metadata["run"].nunique()

            # Load subject 1 as epochs to read the recording parameters
            X, y, metadata = p.get_data(dataset, [1], return_epochs=True)
            sfreq = int(X.info["sfreq"])
            nchan = X.info["nchan"]
            classes = len(X.event_id)
            epoch_size = X.tmax - X.tmin

            trials_per_events = count_events(X.events)
            total_trials = int(sum(trials_per_events.values()))

            # Total_trials is an estimate: it assumes subject 1 is representative
            info_dataset = pd.Series(
                [dataset_name, subjects, nchan, classes, trials_per_events,
                 X.event_id, epoch_size, sfreq, session, runs,
                 session * runs * total_trials * subjects],
                index=columns,
            )
            info_dataset.to_csv(path)
            metainfo.append(info_dataset)
        except Exception as ex:
            print(f"Error with {dataset}")
            print(f"{ex}")

    # pd.concat raises on an empty list (every dataset skipped or failed)
    if not metainfo:
        continue

    df = pd.concat(metainfo, axis=1).T
    df.to_csv(f"{metainfo_dir}/metainfo_{parad_name}.csv", index=False)
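
The trial counts in the script come from `mne.count_events`, which tallies events per event code. Here is a minimal sketch of that bookkeeping in plain Python, assuming an MNE-style events array whose rows are `[sample, previous_value, event_code]`. The event codes, the subject count, and the helper name `count_trials` are made up for illustration and are not part of MNE or MOABB.

```python
from collections import Counter


def count_trials(events):
    """Count trials per event code in an MNE-style events array.

    Each row is [sample, previous_value, event_code]; only the last
    column is used here.
    """
    return dict(Counter(row[2] for row in events))


# Made-up events for one subject: two classes with codes 1 and 2
events = [
    [100, 0, 1],
    [600, 0, 2],
    [1100, 0, 1],
    [1600, 0, 2],
    [2100, 0, 1],
]

trials_per_event = count_trials(events)
trials_per_subject = sum(trials_per_event.values())

# Rough dataset-wide estimate, assuming every subject has as many
# trials as the one that was loaded
n_subjects = 9
estimated_total = trials_per_subject * n_subjects

print(trials_per_event)    # {1: 3, 2: 2}
print(trials_per_subject)  # 5
print(estimated_total)     # 45
```

Whether the estimate should also be multiplied by the session and run counts depends on whether the per-subject epochs already pool all sessions and runs; the sketch assumes they do.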

@sylvchev
Member

I wrote this small script to collect all the information for each dataset and merge it across datasets. Maybe we can convert it into a function for the library. What do you think, @sylvchev?

Yes, we could add it in scripts.

There is no need to extract the run information, so you could use the paradigms or moabb.datasets.utils.dataset_search to get the dataset list. You could then read the number of subjects and sessions directly from the dataset object instead of calling paradigm.get_data, which takes a long time to process the data. The part that gets sfreq, nchan, ... from the paradigm is good.
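
The suggestion above could look roughly like this. MOABB dataset objects expose `subject_list`, `n_sessions` and `event_id` attributes, so these counts can be read without loading any signal. To keep the sketch runnable without downloading data, `StandInDataset` is a hypothetical placeholder for a real MOABB dataset, and `describe_dataset` is an illustrative helper, not library API.

```python
from dataclasses import dataclass, field


@dataclass
class StandInDataset:
    """Hypothetical placeholder with the attributes read from a MOABB dataset."""

    subject_list: list
    n_sessions: int
    event_id: dict = field(default_factory=dict)


def describe_dataset(dataset):
    """Read the cheap summary fields straight from the dataset object."""
    return {
        "Dataset": type(dataset).__name__,
        "#Subj": len(dataset.subject_list),
        "#Session": dataset.n_sessions,
        "#Classes": len(dataset.event_id),
    }


# Made-up values standing in for a two-class, nine-subject dataset
dataset = StandInDataset(
    subject_list=list(range(1, 10)),
    n_sessions=2,
    event_id={"left_hand": 1, "right_hand": 2},
)
summary = describe_dataset(dataset)
print(summary)
```

A paradigm may select only a subset of a dataset's events, so `#Classes` read from the dataset object can differ from the `event_id` of the epochs the paradigm returns.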

@bruAristimunha
Collaborator Author

Done @sylvchev!

bruAristimunha and others added 3 commits May 31, 2023 18:03
Co-authored-by: Sylvain Chevallier <sylvain.chevallier@universite-paris-saclay.fr>
Member

@sylvchev sylvchev left a comment

Nice script!
It is a good idea to add the run; the information is not used yet, but it could be useful later.
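
As a rough illustration of keeping the run information, runs can be counted per subject and session instead of over the whole table, where identically named runs in different sessions would collapse into one label. The rows below are made up and `runs_per_session` is a hypothetical helper; in the script, the same three columns come from the `metadata` returned by `get_data`.

```python
from collections import defaultdict


def runs_per_session(rows):
    """Map each (subject, session) pair to its number of distinct runs."""
    seen = defaultdict(set)
    for row in rows:
        seen[(row["subject"], row["session"])].add(row["run"])
    return {key: len(runs) for key, runs in seen.items()}


# Made-up metadata rows, one per trial
rows = [
    {"subject": 1, "session": "session_0", "run": "run_0"},
    {"subject": 1, "session": "session_0", "run": "run_0"},
    {"subject": 1, "session": "session_0", "run": "run_1"},
    {"subject": 1, "session": "session_1", "run": "run_0"},
    {"subject": 2, "session": "session_0", "run": "run_0"},
]

counts = runs_per_session(rows)
print(counts)
```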

bruAristimunha and others added 13 commits June 2, 2023 12:40
bruAristimunha and others added 2 commits June 2, 2023 12:43
bruAristimunha and others added 8 commits June 2, 2023 12:44
@bruAristimunha bruAristimunha merged commit 2938fcc into NeuroTechX:develop Jun 2, 2023
7 checks passed
@bruAristimunha
Collaborator Author

Thank you for the review @sylvchev and @carraraig!

Successfully merging this pull request may close these issues.

Correct number of subjects in datasets (BNCI at least)
3 participants