# Filtering records
Our database can be retrieved as a pandas DataFrame and then filtered. This is a bit tricky because types, tags, URLs and authors are stored as lists of strings. This notebook demonstrates how to identify records that have all of the given tags AND all of the specified types.
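To illustrate why list-valued columns need special handling, here is a minimal sketch on a small, hypothetical toy DataFrame that only mimics the layout described above. It is not the real database. Comparing such a column to a string with `==` compares each whole list to the string, so it never matches. A per-row membership test via `Series.apply` does match.

```python
import pandas as pd

# Hypothetical toy records: "tags" and "type" hold lists of strings,
# and one record has no tags at all.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "tags": [["Python", "Research Data Management"], ["OMERO"], None],
    "type": [["Video"], ["Slides", "Video"], ["Blog Post"]],
})

# Comparing a list-valued column to a string compares whole lists -> always False
naive = toy["type"] == "Video"

# Per-row membership test: is "Video" one of the items in this row's list?
has_video = toy["type"].apply(lambda types: "Video" in types)

print(naive.tolist())
print(toy.loc[has_video, "name"].tolist())
```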

In [2]:
import sys
sys.path.append("../scripts/")
from generate_link_lists import load_dataframe
import pandas as pd

First, we load all datasets into a single dataframe.

In [3]:
df = load_dataframe("../resources/")
df.head(2)

Unnamed: 0,authors,name,proficiency_level,tags,type,url,license,event_date,event_location,description,num_downloads,publication_date,fingerprint,author,submission_date
0,[Elisabeth Kugler],Sharing Your Poster on Figshare: A Community G...,novice,"[Sharing, Research Data Management]",[Blog Post],https://focalplane.biologists.com/2023/07/26/s...,,,,,,,,,
1,[Marcelo Zoccoler],Running Deep-Learning Scripts in the BiA-PoL O...,proficient,"[Python, Artificial Intelligence, Bioimage Ana...",[Blog Post],https://biapol.github.io/blog/marcelo_zoccoler...,CC-BY-4.0,,,,,,,,


At the time this notebook was executed, the database contained this many entries:

In [4]:
len(df)

549
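A record can have several types, so the total above does not split cleanly by type. As a minimal sketch, assuming a toy frame shaped like the `type` column (the data here is hypothetical), `Series.explode` can put each list item on its own row before counting. A record with two types is counted once under each, and missing values are dropped by `value_counts`.

```python
import pandas as pd

# Hypothetical toy data standing in for the loaded `df`
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type": [["Video"], ["Slides", "Video"], ["Blog Post"], None],
})

# explode() emits one row per list element (repeating the index);
# value_counts() then counts records per type and skips missing values.
type_counts = toy["type"].explode().value_counts()
print(type_counts)
```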

Next, we identify all records that have all of the given tags and all of the given types.

In [18]:
tags_to_find = ['Research Data Management']
types_to_find = ['Video']

filtered_df_all = df[df['tags'].notna() & df['type'].notna()].copy()  # Remove NaN entries first
# Keep records whose tag list contains every tag in tags_to_find
filtered_df_all = filtered_df_all[filtered_df_all['tags'].apply(lambda x: all(tag in x for tag in tags_to_find))]
# ...and whose type list contains every type in types_to_find
filtered_df_all = filtered_df_all[filtered_df_all['type'].apply(lambda x: all(typ in x for typ in types_to_find))]

filtered_df_all.head()

Unnamed: 0,authors,name,proficiency_level,tags,type,url,license,event_date,event_location,description,num_downloads,publication_date,fingerprint,author,submission_date
82,"[Christian Schmidt, Michele Bortolomeazzi, Tom...","I3D:bio's OMERO training material: Re-usable, ...",advanced beginner,"[OMERO, Research Data Management, Nfdi4Bioimag...","[Slides, Video]","[https://zenodo.org/records/8323588, https://w...",CC-BY-4.0,,,The open-source software OME Remote Objects (O...,3717.0,2023-11-13,,,
225,[Susanne Kunis],Structuring of Data and Metadata in Bioimaging...,advanced beginner,"[Nfdi4Bioimage, Research Data Management]",[Video],"[https://zenodo.org/record/7018929, https://do...",CC-BY-4.0,,,guided walkthrough of poster at https://doi.or...,26.0,2022-08-24,,,
296,,RDM4mic,advanced beginner,"[Research Data Management, OMERO]","[Collection, Video]",[https://www.youtube.com/@RDM4mic],UNKNOWN,,,,,,,,
297,,FAIR BioImage Data,advanced beginner,"[Research Data Management, Fair, Bioimage Anal...","[Collection, Video]",[https://www.youtube.com/watch?v=8zd4KTy-oYI&l...,CC-BY-4.0,,,,,,,,
348,,Submitting data to the BioImage Archive,advanced beginner,[Research Data Management],"[Tutorial, Video]",https://www.ebi.ac.uk/bioimage-archive/submit/,CC0-1.0,,,"To submit, you’ll need to register an account,...",,,,,

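The same filter can be wrapped in a reusable function. This is a minimal sketch with hypothetical helper names (`as_list`, `filter_records`) and toy data. It assumes a cell may hold a list, a single string, or a missing value, as the URL output further below suggests can happen. Normalizing to lists first has a benefit: with a plain string, `tag in x` would perform a substring match rather than an exact item match. One difference from the cell above: when no tags are requested, records with missing tags are kept rather than dropped.

```python
import pandas as pd

def as_list(value):
    """Hypothetical helper: return lists unchanged, wrap a single string,
    and map missing values (None/NaN) to an empty list."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        return [value]
    return []

def filter_records(df, tags_to_find=(), types_to_find=()):
    """Hypothetical helper: keep rows whose 'tags' contain every requested tag
    AND whose 'type' contains every requested type (exact item matches)."""
    has_tags = df["tags"].apply(lambda v: set(tags_to_find) <= set(as_list(v)))
    has_types = df["type"].apply(lambda v: set(types_to_find) <= set(as_list(v)))
    return df[has_tags & has_types]

# Hypothetical toy records mixing lists, plain strings and missing values
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "tags": [["Research Data Management", "OMERO"], ["Research Data Management"],
             "Research Data Management", None],
    "type": [["Slides", "Video"], ["Tutorial"], "Video", ["Video"]],
})
result = filter_records(toy, ["Research Data Management"], ["Video"])
print(result["name"].tolist())
```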

In [22]:
for u in filtered_df_all["url"]:
    print(u)

['https://zenodo.org/records/8323588', 'https://www.youtube.com/playlist?list=PL2k-L-zWPoR7SHjG1HhDIwLZj0MB_stlU', 'https://doi.org/10.5281/zenodo.8323588']
['https://zenodo.org/record/7018929', 'https://doi.org/10.5281/zenodo.7018929']
['https://www.youtube.com/@RDM4mic']
['https://www.youtube.com/watch?v=8zd4KTy-oYI&list=PLW-oxncaXRqU4XqduJzwFHvWLF06PvdVm']
https://www.ebi.ac.uk/bioimage-archive/submit/
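The output shows that the last record stores its URL as a plain string, while the others use lists. A minimal sketch for collecting all links into one flat list might handle both shapes. It uses a hypothetical helper `iter_urls` and reuses URLs from the output above as sample data. Duplicates are removed while preserving the original order.

```python
import pandas as pd

def iter_urls(values):
    """Hypothetical helper: yield individual URLs from values that may be
    lists of URLs, single URL strings, or missing (skipped)."""
    for value in values:
        if isinstance(value, list):
            yield from value
        elif isinstance(value, str):
            yield value
        # anything else (None/NaN) is skipped

# Sample column mixing the shapes seen in the output above
urls = pd.Series([
    ["https://zenodo.org/record/7018929", "https://doi.org/10.5281/zenodo.7018929"],
    "https://www.ebi.ac.uk/bioimage-archive/submit/",
    None,
])

# dict.fromkeys drops duplicates but keeps first-seen order
flat = list(dict.fromkeys(iter_urls(urls)))
for u in flat:
    print(u)
```

In the notebook, the same helper could be applied to `filtered_df_all["url"]`.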

