# Imaging Technology Innovation Stages  

> Methods to extract and model how imaging technology evolves.

## Modeling Technological Evolution and Innovation

This notebook builds a digital library of publications drawn from four subdisciplines of biomedical imaging:

1. Cryo-Electron Tomography
2. Volume Electron Microscopy
3. Hierarchical Phase-Contrast Tomography
4. Photoacoustic Imaging


### Python Imports

Set Python imports, environment variables, and other essential setup parameters here.

In [None]:

from alhazen.core import get_langchain_chatmodel, MODEL_TYPE
from alhazen.agent import AlhazenAgent

from alhazen.schema_sqla import *
from alhazen.tools.basic import AddCollectionFromEPMCTool, DeleteCollectionTool
from alhazen.tools.paperqa_emulation_tool import PaperQAEmulationTool
from alhazen.tools.metadata_extraction_tool import * 
from alhazen.tools.protocol_extraction_tool import *
from alhazen.tools.tiab_classifier_tool import *
from alhazen.toolkit import *
from alhazen.utils.jats_text_extractor import NxmlDoc
from alhazen.utils.ceifns_db import Ceifns_LiteratureDb, create_ceifns_database, drop_ceifns_database, restore_ceifns_database
from alhazen.utils.searchEngineUtils import *

from langchain.callbacks.tracers import ConsoleCallbackHandler
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.pgvector import PGVector
from langchain_community.chat_models.ollama import ChatOllama
from langchain_google_vertexai import ChatVertexAI
from langchain_openai import ChatOpenAI

from bs4 import BeautifulSoup,Tag,Comment,NavigableString
from databricks import sql
from datetime import datetime
from importlib_resources import files
import json
import os
import pandas as pd
from pathlib import Path
import re
import requests

from sqlalchemy import create_engine, exists, func, or_, and_, not_, desc, asc
from sqlalchemy.orm import sessionmaker, aliased

from time import time,sleep
from tqdm import tqdm
from urllib.request import urlopen
from urllib.parse import quote_plus, quote, unquote
from urllib.error import URLError, HTTPError
import yaml

In [None]:
# Short aliases for the schema classes keep the SQLAlchemy queries readable
IR = aliased(InformationResource)

SKC = aliased(ScientificKnowledgeCollection)
SKC_HM = aliased(ScientificKnowledgeCollectionHasMembers)
SKE = aliased(ScientificKnowledgeExpression)
SKE_XREF = aliased(ScientificKnowledgeExpressionXref)
SKE_IRI = aliased(ScientificKnowledgeExpressionIri)
SKE_HR = aliased(ScientificKnowledgeExpressionHasRepresentation)
SKE_MO = aliased(ScientificKnowledgeExpressionMemberOf)
SKI = aliased(ScientificKnowledgeItem)
SKI_HP = aliased(ScientificKnowledgeItemHasPart)
SKF = aliased(ScientificKnowledgeFragment)

N = aliased(Note)
NIA = aliased(NoteIsAbout)
SKC_HN = aliased(ScientificKnowledgeCollectionHasNotes)
SKE_HN = aliased(ScientificKnowledgeExpressionHasNotes)
SKI_HN = aliased(ScientificKnowledgeItemHasNotes)
SKF_HN = aliased(ScientificKnowledgeFragmentHasNotes)

### Environment Variables

Set the following environment variables before running this code:

* `ALHAZEN_DB_NAME` - the name of the PostgreSQL database where information is stored
* `LOCAL_FILE_PATH` - the location on disk where temporary files, downloaded models, and other data are saved

In [None]:
os.environ['ALHAZEN_DB_NAME'] = 'imaging_tech_innovation'
os.environ['LOCAL_FILE_PATH'] = '/users/gully.burns/alhazen/'

In [None]:
# Fail early if either variable is unset, before its value is used
if os.environ.get('ALHAZEN_DB_NAME') is None: 
    raise Exception('Which database do you want to use for this application?')
db_name = os.environ['ALHAZEN_DB_NAME']

if os.environ.get('LOCAL_FILE_PATH') is None: 
    raise Exception('Where are you storing your local literature database?')
loc = os.environ['LOCAL_FILE_PATH']

# Create the local storage directory if it does not exist yet
os.makedirs(loc, exist_ok=True)

### Setup utils, agents, and tools 

In [None]:
ldb = Ceifns_LiteratureDb(loc=loc, name=db_name)
llm = ChatOllama(model='mixtral:instruct') 
llm2 = ChatOpenAI(model='gpt-4-1106-preview') 
llm3 = ChatOpenAI(model='gpt-3.5-turbo') 
#llm3 = ChatVertexAI(model_name="gemini-pro", convert_system_message_to_human=True)

cb = AlhazenAgent(llm2, llm2)
print('AGENT TOOLS')
for t in cb.tk.get_tools():
    print('\t'+type(t).__name__)

AGENT TOOLS
	AddCollectionFromEPMCTool
	AddAuthorsToCollectionTool
	DescribeCollectionCompositionTool
	DeleteCollectionTool
	RetrieveFullTextTool
	RetrieveFullTextToolForACollection
	MetadataExtraction_EverythingEverywhere_Tool
	SimpleExtractionWithRAGTool
	PaperQAEmulationTool
	ProcotolEntitiesExtractionTool
	CheckExpressionTool
	TitleAbstractClassifier_OneDocAtATime_Tool


## Building the database


### Scripts to Build / Delete the database

If you need to restore a deleted database from backup, use the following shell commands:

```
$ createdb em_tech
$ psql -d em_tech -f /local/file/path/em_tech/backup<date_time>.sql
```
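When several backups have accumulated, the most recent one can be picked programmatically before running `psql`. The following is a minimal sketch, assuming backup files follow the `backup<YYYY-MM-DD-HH-MM-SS>.sql` naming seen in the log output of `drop_ceifns_database`. The helper name `latest_backup` is hypothetical and is not part of alhazen.

```python
from datetime import datetime
from pathlib import Path
import re

# Assumed file-name pattern, e.g. backup2024-03-04-23-34-21.sql
BACKUP_RE = re.compile(r'^backup(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.sql$')

def latest_backup(backup_dir):
    """Return the Path of the newest backup in backup_dir, or None if there is none."""
    candidates = []
    for p in Path(backup_dir).glob('backup*.sql'):
        m = BACKUP_RE.match(p.name)
        if m:
            # Parse the timestamp embedded in the file name
            stamp = datetime.strptime(m.group(1), '%Y-%m-%d-%H-%M-%S')
            candidates.append((stamp, p))
    if not candidates:
        return None
    # Compare on the parsed timestamp only
    return max(candidates, key=lambda t: t[0])[1]
```

The returned path can then be passed to `psql -d <db_name> -f <path>`.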

In [None]:
drop_ceifns_database(os.environ['ALHAZEN_DB_NAME'])

Database has been backed up to /users/gully.burns/alhazen/imaging_tech_innovation/backup2024-03-04-23-34-21.sql
Database has been dropped successfully !!


In [None]:
create_ceifns_database(os.environ['ALHAZEN_DB_NAME'])

100%|██████████| 310/310 [00:00<00:00, 3467.45it/s]


### Build CEIFNS database from queries

#### Run queries on European PMC based on innovation categories 

Here we build general corpora across the categories of interest. 

* Hierarchical phase-contrast tomography
* Cryo-Electron Tomography
* Volume Electron Microscopy
* Photoacoustic imaging

In [None]:
import local_resources.queries.imaging_tech as imaging_tech
from alhazen.utils.queryTranslator import QueryTranslator, QueryType

cols_to_include = ['ID', 'CORPUS_NAME', 'QUERY']
df = pd.read_csv(files(imaging_tech).joinpath('imaging_tech.tsv'), sep='\t')
df = df.drop(columns=[c for c in df.columns if c not in cols_to_include])
df

| | ID | CORPUS_NAME | QUERY |
|---|---|---|---|
| 0 | 1 | Hierarchical phase-contrast tomography | Hierarchical phase-contrast tomography \| HIP-C... |
| 1 | 2 | Cryo-Electron Tomography | Cryoelectron Tomography \| Cryo Electron Tomogr... |
| 2 | 3 | Volume Electron Microscopy | Volume Electron Microscopy \| Volume EM \| (seri... |
| 3 | 4 | Photoacoustic imaging | Photoacoustic imaging \| Photoacoustic microscopy |


In [None]:
qt = QueryTranslator(df.sort_values('ID'), 'ID', 'QUERY', 'CORPUS_NAME')
(corpus_ids, epmc_queries) = qt.generate_queries(QueryType.epmc, sections=['TITLE_ABS', 'METHODS'])
corpus_names = df['CORPUS_NAME']

addEMPCCollection_tool = [t for t in cb.tk.get_tools() if isinstance(t, AddCollectionFromEPMCTool)][0]
for (id, name, query) in zip(corpus_ids, corpus_names, epmc_queries):
    addEMPCCollection_tool.run(tool_input={'id': id, 'name':name, 'query':query, 'full_text':False})

100%|██████████| 4/4 [00:00<00:00, 7533.55it/s]
100%|██████████| 4/4 [00:00<00:00, 3442.19it/s]


https://www.ebi.ac.uk/europepmc/webservices/rest/search?format=JSON&pageSize=1000&synonym=TRUE&resultType=core&query=((TITLE_ABS:"Hierarchical phase-contrast tomography" OR METHODS:"Hierarchical phase-contrast tomography") OR (TITLE_ABS:"HIP-CT" OR METHODS:"HIP-CT") OR (TITLE_ABS:"Hierarchical phase contrast tomography" OR METHODS:"Hierarchical phase contrast tomography")), 143 European PMC PAPERS FOUND


100%|██████████| 1/1 [00:03<00:00,  3.54s/it]


 Returning 135


100%|██████████| 135/135 [00:00<00:00, 507.06it/s]


https://www.ebi.ac.uk/europepmc/webservices/rest/search?format=JSON&pageSize=1000&synonym=TRUE&resultType=core&query=((TITLE_ABS:"Cryoelectron Tomography" OR METHODS:"Cryoelectron Tomography") OR (TITLE_ABS:"Cryo Electron Tomography" OR METHODS:"Cryo Electron Tomography") OR (TITLE_ABS:"Cryo-Electron Tomography" OR METHODS:"Cryo-Electron Tomography") OR (TITLE_ABS:"Cryo-ET" OR METHODS:"Cryo-ET") OR (TITLE_ABS:"CryoET" OR METHODS:"CryoET")), 2581 European PMC PAPERS FOUND


100%|██████████| 3/3 [00:55<00:00, 18.45s/it]


 Returning 2558


100%|██████████| 2558/2558 [00:06<00:00, 375.25it/s]


https://www.ebi.ac.uk/europepmc/webservices/rest/search?format=JSON&pageSize=1000&synonym=TRUE&resultType=core&query=((TITLE_ABS:"Volume Electron Microscopy" OR METHODS:"Volume Electron Microscopy") OR (TITLE_ABS:"Volume EM" OR METHODS:"Volume EM") OR (TITLE_ABS:"multibeam SEM" OR METHODS:"multibeam SEM") OR (TITLE_ABS:"FAST-SEM" OR METHODS:"FAST-SEM") OR ((TITLE_ABS:"serial section" OR METHODS:"serial section") AND ((TITLE_ABS:"electron microscopy" OR METHODS:"electron microscopy") OR (TITLE_ABS:"EM" OR METHODS:"EM") OR (TITLE_ABS:"transmission electron microscopy" OR METHODS:"transmission electron microscopy") OR (TITLE_ABS:"TEM" OR METHODS:"TEM") OR (TITLE_ABS:"scanning electron microscopy" OR METHODS:"scanning electron microscopy") OR (TITLE_ABS:"SEM" OR METHODS:"SEM") OR (TITLE_ABS:"electron tomography" OR METHODS:"electron tomography"))) OR ((TITLE_ABS:"serial block-face" OR METHODS:"serial block-face") AND ((TITLE_ABS:"scanning electron microscopy" OR METHODS:"scanning electron 

100%|██████████| 7/7 [02:28<00:00, 21.18s/it]


 Returning 6820


100%|██████████| 6820/6820 [00:41<00:00, 164.39it/s]


https://www.ebi.ac.uk/europepmc/webservices/rest/search?format=JSON&pageSize=1000&synonym=TRUE&resultType=core&query=((TITLE_ABS:"Photoacoustic imaging" OR METHODS:"Photoacoustic imaging") OR (TITLE_ABS:"Photoacoustic microscopy" OR METHODS:"Photoacoustic microscopy")), 4600 European PMC PAPERS FOUND


100%|██████████| 5/5 [00:59<00:00, 11.84s/it]


 Returning 4478


100%|██████████| 4478/4478 [00:17<00:00, 257.83it/s]
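The logged request URLs show how each `|`-separated term of a query is expanded into one clause per searched section and the clauses are joined with `OR`. The sketch below reproduces that expansion for flat queries only. It is not the `QueryTranslator` implementation, the function name `expand_epmc_query` is hypothetical, and nested `AND` groups or parentheses (as in the Volume Electron Microscopy query) are not handled.

```python
def expand_epmc_query(query, sections=('TITLE_ABS', 'METHODS')):
    """Expand a flat 'term | term | ...' query into a fielded European PMC query."""
    # Split on the pipe separator and drop empty fragments
    terms = [t.strip() for t in query.split('|') if t.strip()]
    clauses = []
    for term in terms:
        # One quoted phrase per section, e.g. TITLE_ABS:"..." OR METHODS:"..."
        fielded = ' OR '.join('%s:"%s"' % (s, term) for s in sections)
        clauses.append('(' + fielded + ')')
    return '(' + ' OR '.join(clauses) + ')'

print(expand_epmc_query('Photoacoustic imaging | Photoacoustic microscopy'))
```

For the Photoacoustic imaging row, this yields the same `query=` string that appears in the corresponding logged URL.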


#### Run queries on known lists of papers from CZI grantees on the four imaging innovation categories 

Here we search pre-compiled lists of papers from CZI grantees' work, indexed in a local file: `./local_resources/queries/imaging_tech/grantee_dois.json`

In [None]:
with open(files(imaging_tech).joinpath('grantee_dois.json'), 'r') as f:
    dict_lists = json.load(f)

addEMPCCollection_tool = [t for t in cb.tk.get_tools() if isinstance(t, AddCollectionFromEPMCTool)][0]
for i, k in enumerate(dict_lists.keys()):
    query = ' OR '.join(['doi:"'+d_id+'"' for d_id in dict_lists[k] ])
    print('%s: Searching for %d'%(k, len(dict_lists[k])))
    addEMPCCollection_tool.run(tool_input={'id': str(5+i), 'name': k + ' (grantees)', 'query':query})


Cryo-Electron Tomography: Searching for 23
Volume Electron Microscopy: Searching for 12
Hierarchical phase-contrast tomography: Searching for 14
Photoacoustic imaging: Searching for 26


## Analyze Collections

In [None]:
q = ldb.session.query(SKC.id, SKC.name, SKE.id, SKI.type) \
        .filter(SKC.id==SKC_HM.ScientificKnowledgeCollection_id) \
        .filter(SKC_HM.has_members_id==SKE.id) \
        .filter(SKE.id==SKE_HR.ScientificKnowledgeExpression_id) \
        .filter(SKE_HR.has_representation_id==SKI.id) 
df = pd.DataFrame(q.all(), columns=['id', 'collection name', 'doi', 'item type'])    
df.pivot_table(index=['id', 'collection name'], columns='item type', values='doi', aggfunc=lambda x: len(x.unique()))

| id | collection name | CitationRecord |
|---|---|---|
| 0 | Imaging Program | 752 |
| 1 | Hierarchical phase-contrast tomography | 135 |
| 2 | Cryo-Electron Tomography | 2556 |
| 3 | Volume Electron Microscopy | 6817 |
| 4 | Photoacoustic imaging | 4477 |
| 5 | Cryo-Electron Tomography (grantees) | 20 |
| 6 | Volume Electron Microscopy (grantees) | 11 |
| 7 | Hierarchical phase-contrast tomography (grantees) | 12 |
| 8 | Photoacoustic imaging (grantees) | 23 |


In [None]:
cb.agent_executor.invoke({'input':'Drop collection with id="0"'})



> Entering new AgentExecutor chain...
{
    "action": "delete_collection",
    "action_input": {
        "collection_id": "0"
    }
}
{'response': 'Successfully deleted a collection with collection_id:`0`.'}

> Finished chain.


{'input': 'Drop collection with id="0"',
 'output': {'response': 'Successfully deleted a collection with collection_id:`0`.'},
 'intermediate_steps': [(AgentAction(tool='delete_collection', tool_input={'collection_id': '0'}, log='{\n    "action": "delete_collection",\n    "action_input": {\n        "collection_id": "0"\n    }\n}'),
   {'response': 'Successfully deleted a collection with collection_id:`0`.'})]}

In [None]:
with open(files(imaging_tech).joinpath('dois.txt'), 'r') as f:
    dois = f.readlines()
dois = [d.strip() for d in dois]
print(len(dois))

806


In [None]:
from alhazen.utils.searchEngineUtils import load_paper_from_openalex, read_references_from_openalex 
from pyalex import config, Works, Work
config.email = "gully.burns@chanzuckerberg.com"

ldb.add_collection_from_dois_using_openalex('0', 'Imaging Program', dois, commit_this=True)

100%|██████████| 806/806 [04:04<00:00,  3.30it/s]


In [None]:
# Run local analysis on program data

addEMPCCollection_tool = [t for t in cb.tk.get_tools() if isinstance(t, AddCollectionFromEPMCTool)][0]
step = 40
for start_i in range(0, len(dois), step):
    query = ' OR '.join(['doi:\"'+dois[i].lower()+'\"' for i in range(start_i, start_i+step) if i < len(dois)])
    addEMPCCollection_tool.run({'id': '0', 'name':'Imaging Program', 'query':query, 'full_text':True})
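The loop above sends the DOIs in batches of 40, presumably to keep each request URL to a manageable length. The same batching can be written as a small generator using list slicing. This is a sketch only, and the name `doi_query_batches` is hypothetical.

```python
def doi_query_batches(dois, step=40):
    """Yield one 'doi:"..." OR doi:"..."' query string per batch of at most `step` DOIs."""
    for start in range(0, len(dois), step):
        # Slicing never runs past the end of the list, so no bounds check is needed
        batch = dois[start:start + step]
        yield ' OR '.join('doi:"%s"' % d.lower() for d in batch)
```

Each yielded string could be passed as the `query` value of the tool input, one call per batch.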


In [None]:
missing_dois = []
for doi in dois:
    c = ldb.session.query(func.count(SKE.id)).filter(SKE.id=='doi:'+doi.lower()).first()
    if c[0] == 0:
        missing_dois.append(doi)
print('Missing %d DOIs'%(len(missing_dois)))
print(missing_dois)

Missing 34 DOIs
['10.3389%2Ffmed.2022.849677', '10.48550/arXiv.2210.04033', '10.3389%2Ffnins.2023.1135494', '10.48550/arXiv.2308.00870', '10.48550/arXiv.2307.14572', '10.48550/arXiv.2107.09145', '10.1038/s41598-021-94852-4', '10.1172%2Fjci.insight.142945', '10.1016/j.biopha.2022', '10.6084/m9.figshare.12758957.v1', '10.6084/m9.figshare.12299915.v2', '10.1126/sciadv.aaz2598.', '10.5281/ZENODO.3901011', '10.1152/ajpendo.00501.2018.', '10.48550/arXiv.2008.00807', '10.1136/ jitc-2022-006133', '10.1101/2020.05.27/119750', '10.1038/s41592-021- 01156-w', '0.1021/acschembio.0c00988', '10.1126/sciimmunol.abm693', '10.1038/s41598-021-85036-w', '10.1016%2Fj.ijcha.2020.100672', '10.1017/S2633903X2300003X[Opens in a new window]', '10.1016%2Fj.pacs.2021.100276', '10.1117%2F1.JBO.29.S1.S11521', '10.22443/rms.mmc2023.274', '10.5281/zenodo.10200758', '10.48550/arXiv.2306.15898', '10.48550/arXiv.2311.13417', '10.5281/zenodo.10451511', '10.5281/zenodo.10685021', '10.5281/zenodo.10057023', '10.5281/zenodo
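Several of the missing DOIs are malformed rather than absent from the index: some contain URL-encoded slashes (`%2F`), stray spaces, trailing full stops, or pasted web-page text such as `[Opens in a new window]`. Below is a minimal cleanup sketch to apply before retrying. The function name `normalize_doi` and the loose shape check are assumptions, not part of alhazen, and entries from arXiv, Zenodo, or figshare may still not be found in European PMC even when well-formed.

```python
import re
from urllib.parse import unquote

# Loose shape check: '10.', a 4-9 digit registrant code, '/', then a suffix
DOI_RE = re.compile(r'^10\.\d{4,9}/\S+$')

def normalize_doi(raw):
    """Return a cleaned, lower-cased DOI, or None if it still does not look like one."""
    doi = unquote(raw)                   # '%2F' -> '/'
    doi = re.sub(r'\[.*?\]', '', doi)    # drop bracketed web-page residue
    doi = re.sub(r'\s+', '', doi)        # remove stray spaces inside the DOI
    doi = doi.rstrip('.').lower()        # strip trailing full stops, then lower-case
    return doi if DOI_RE.match(doi) else None
```

Entries that cannot be repaired mechanically, such as `0.1021/acschembio.0c00988` (missing its leading `1`), come back as `None` and need manual correction. A DOI with the right shape but a wrong suffix passes through unchanged.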

In [None]:
step = 40
addEMPCCollection_tool = [t for t in cb.tk.get_tools() if isinstance(t, AddCollectionFromEPMCTool)][0]

for start_i in range(0, len(missing_dois), step):
    query = ' OR '.join(['doi:\"'+missing_dois[i].lower()+'\"' for i in range(start_i, start_i+step) if i < len(missing_dois)])
    addEMPCCollection_tool.run({'id': '0', 'name':'Imaging Program', 'query':query, 'full_text':True})

https://www.ebi.ac.uk/europepmc/webservices/rest/search?format=JSON&pageSize=1000&synonym=TRUE&resultType=core&query=doi:"10.3389%2ffmed.2022.849677" OR doi:"10.48550/arxiv.2210.04033" OR doi:"10.3389%2ffnins.2023.1135494" OR doi:"10.48550/arxiv.2308.00870" OR doi:"10.48550/arxiv.2307.14572" OR doi:"10.48550/arxiv.2107.09145" OR doi:"10.1038/s41598-021-94852-4" OR doi:"10.1172%2fjci.insight.142945" OR doi:"10.1016/j.biopha.2022" OR doi:"10.6084/m9.figshare.12758957.v1" OR doi:"10.6084/m9.figshare.12299915.v2" OR doi:"10.1126/sciadv.aaz2598." OR doi:"10.5281/zenodo.3901011" OR doi:"10.1152/ajpendo.00501.2018." OR doi:"10.48550/arxiv.2008.00807" OR doi:"10.1136/ jitc-2022-006133" OR doi:"10.1101/2020.05.27/119750" OR doi:"10.1038/s41592-021- 01156-w" OR doi:"0.1021/acschembio.0c00988" OR doi:"10.1126/sciimmunol.abm693" OR doi:"10.1038/s41598-021-85036-w" OR doi:"10.1016%2fj.ijcha.2020.100672" OR doi:"10.1017/s2633903x2300003x[opens in a new window]" OR doi:"10.1016%2fj.pacs.2021.100276" 

100%|██████████| 1/1 [00:00<00:00,  1.01it/s]


 Returning 6


100%|██████████| 6/6 [00:00<00:00, 168.60it/s]


## Run LLM over each paper in Imaging Program collection to determine if papers are methods or applications

In [None]:
t = [t for t in cb.tk.get_tools() if isinstance(t, TitleAbstractClassifier_OneDocAtATime_Tool)][0]
t.run({'collection_id': '0', 'classification_type':'is_methods_paper'})


In [None]:
collection_id  = '0'
classification_type = 'is_methods_paper'
q = ldb.session.query(SKE, N) \
        .filter(SKC.id==SKC_HM.ScientificKnowledgeCollection_id) \
        .filter(SKC_HM.has_members_id==SKE.id) \
        .filter(SKE.id==SKE_HR.ScientificKnowledgeExpression_id) \
        .filter(SKE_HR.has_representation_id==SKI.id) \
        .filter(SKE_HN.ScientificKnowledgeExpression_id==SKE.id) \
        .filter(SKE_HN.has_notes_id==N.id) \
        .filter(SKC.id==collection_id) \
        .filter(N.type=='TiAbClassificationNote__'+classification_type) 
l = []
for e,n in q.all():
    c = json.loads(n.content)
    doi_link = 'https://doi.org/'+e.id[4:]
    l.append((doi_link, e.type, e.content, c.get('is_method_paper'), c.get('explanation')))
df = pd.DataFrame(l, columns=['doi', 'paper_type', 'citation', 'is_methods_paper', 'explanation']) 
#df.to_csv(loc+db_name+'/imaging_cohort_methods.tsv', sep='\t', index=False)  
df

| | doi | paper_type | citation | is_methods_paper | explanation |
|---|---|---|---|---|---|
| 0 | https://doi.org/10.1073/pnas.2301852120 | ScientificPrimaryResearchArticle | Lucas BA, Grigorieff N. (2023) Quantification ... | True | The main goal of the paper is to develop and t... |
| 1 | https://doi.org/10.1016/j.ultramic.2023.113730 | ScientificPrimaryResearchArticle | Axelrod JJ, Petrov PN, Zhang JT, Remis J, Buij... | True | The main goal of the paper is to identify and ... |
| 2 | https://doi.org/10.1021/acs.jpcb.2c08995 | ScientificPrimaryResearchArticle | Sartor AM, Dahlberg PD, Perez D, Moerner WE. (... | True | The main goal of the paper is to characterize ... |
| 3 | https://doi.org/10.1101/2023.02.12.528160 | ScientificPrimaryResearchPreprint | Axelrod JJ, Petrov PN, Zhang JT, Remis J, Buij... | True | The paper is concerned with developing new tec... |
| 4 | https://doi.org/10.1016/j.jsb.2023.107941 | ScientificPrimaryResearchArticle | Du DX, Simjanoska M, Fitzpatrick AWP. (2023) F... | True | The main goal of the paper is to combine two e... |
| ... | ... | ... | ... | ... | ... |
| 763 | https://doi.org/10.1016/b978-0-12-420138-5.000... | ScientificReviewArticle | Lambert TJ, Waters JC. (2014) Assessing camera... | True | The main goal of the paper is to assess and me... |
| 764 | https://doi.org/10.1016/b978-0-12-420138-5.000... | ScientificPrimaryResearchArticle | Petrak LJ, Waters JC. (2014) A practical guide... | True | The main goal of the paper is to provide a pra... |
| 765 | https://doi.org/10.1016/b978-0-12-420138-5.000... | ScientificPrimaryResearchArticle | Waters JC, Wittmann T. (2014) Concepts in quan... | True | The main goal of the paper is to discuss conce... |
| 766 | https://doi.org/10.1016/b978-0-12-407761-4.000... | ScientificPrimaryResearchArticle | Waters JC. (2013) Live-cell fluorescence imaging. | True | The main goal of the paper is to optimize live... |


## Run the tool for high-level subtypes

In [None]:
t = [t for t in cb.tk.get_tools() if isinstance(t, TitleAbstractClassifier_OneDocAtATime_Tool)][0]
t.run({'collection_id': '0', 'classification_type':'top_level_imaging_categories'})

100%|██████████| 752/752 [23:07<00:00,  1.84s/it]


{'response': "completed document classification of type 'top_level_imaging_categories' for collection 0.",
 'data': [{'classification': {'imaging_technology_code': 'K',
    'imaging_technology_name': 'Magnetic resonance imaging',
    'explanation': 'The abstract mentions the use of MRI (Magnetic Resonance Imaging) and PET (Positron Emission Tomography) scans as part of their diagnostic methods, implying that MRI is a primary imaging technology used in the paper.'},
   'paper_id': 'doi:10.3389/fnins.2021.768646'},
  {'classification': {'imaging_technology_code': 'H',
    'imaging_technology_name': 'Cryo-electron tomography',
    'explanation': "The paper's title and abstract mention the use of cryo-electron tomography (CET) as the imaging technique for determining the structure of proteins in cells, which is central to the work being presented."},
   'paper_id': 'doi:10.1007/978-3-031-19803-8_38'},
  {'classification': {'imaging_technology_code': 'G',
    'imaging_technology_name': 'Oth

In [None]:
pd.set_option('display.max_colwidth', None)

In [None]:
collection_id  = '0'
classification_type = 'top_level_imaging_categories'
q = ldb.session.query(SKE, N) \
        .filter(SKC.id==SKC_HM.ScientificKnowledgeCollection_id) \
        .filter(SKC_HM.has_members_id==SKE.id) \
        .filter(SKE.id==SKE_HR.ScientificKnowledgeExpression_id) \
        .filter(SKE_HR.has_representation_id==SKI.id) \
        .filter(SKE_HN.ScientificKnowledgeExpression_id==SKE.id) \
        .filter(SKE_HN.has_notes_id==N.id) \
        .filter(SKC.id==collection_id) \
        .filter(N.type=='TiAbClassificationNote__'+classification_type) 
l = []
for e,n in q.all():
    c = json.loads(n.content)
    doi_link = 'https://doi.org/'+e.id[4:]
    l.append((doi_link, e.type, e.content, n.content))
df = pd.DataFrame(l, columns=['doi', 'paper_type', 'citation', 'json']) 
#df.to_csv(loc+db_name+'/imaging_cohort_methods.tsv', sep='\t', index=False)  
df

| | doi | paper_type | citation | json |
|---|---|---|---|---|


In [None]:
collection_id  = '0'
classification_type = 'top_level_imaging_categories'
q = ldb.session.query(SKE, N) \
        .filter(SKC.id==SKC_HM.ScientificKnowledgeCollection_id) \
        .filter(SKC_HM.has_members_id==SKE.id) \
        .filter(SKE.id==SKE_HR.ScientificKnowledgeExpression_id) \
        .filter(SKE_HR.has_representation_id==SKI.id) \
        .filter(SKE_HN.ScientificKnowledgeExpression_id==SKE.id) \
        .filter(SKE_HN.has_notes_id==N.id) \
        .filter(SKC.id==collection_id) \
        .filter(N.type=='TiAbClassificationNote__'+classification_type) 
l = []
for e,n in q.all():
    c = json.loads(n.content)
    if isinstance(c, list):
        c = c[0]
    doi_link = 'https://doi.org/'+e.id[4:]
    l.append((doi_link, e.type, e.content, c.get('imaging_technology_code'), c.get('imaging_technology_name'),c.get('explanation') ))
df = pd.DataFrame(l, columns=['doi', 'paper_type', 'citation', 'code', 'name', 'explanation']) 
df.to_csv(loc+db_name+'/imaging_cohort_categories.tsv', sep='\t', index=False)  
df

| | doi | paper_type | citation | code | name | explanation |
|---|---|---|---|---|---|---|
| 0 | https://doi.org/10.1101/2023.10.25.23297129 | ScientificPrimaryResearchPreprint | Dofash LN, Miles LB, Saito Y, Rivas E, Calcino... | P | Other computational analysis | The paper primarily discusses genetic analysis... |
| 1 | https://doi.org/10.1016/j.ceb.2023.102271 | ScientificReviewArticle | Pylvänäinen JW, Gómez-de-Mariscal E, Henriques... | G | Other fluorescence microscopy | The abstract mentions the combination of live ... |
| 2 | https://doi.org/10.1002/advs.202303381 | ScientificPrimaryResearchArticle | Chang S, Yang J, Novoseltseva A, Abdelhakeem A... | E | Nonlinear microscopy | The paper primarily uses two-photon microscopy... |
| 3 | https://doi.org/10.1038/s41592-023-02045-0 | ScientificPrimaryResearchArticle | Liu HF, Zhou Y, Huang Q, Piland J, Jin W, Mand... | H | Cryo-electron tomography | The title and abstract clearly mention 'single... |
| 4 | https://doi.org/10.1101/2023.10.26.564098 | ScientificPrimaryResearchPreprint | Woldeyes RA, Nishiga M, Vander Roest AS, Engel... | H | Cryo-electron tomography | The paper primarily uses cryo-electron tomogra... |
| ... | ... | ... | ... | ... | ... | ... |
| 819 | https://doi.org/10.1016/b978-0-12-420138-5.000... | ScientificPrimaryResearchArticle | Petrak LJ, Waters JC. (2014) A practical guide... | M | Other imaging hardware technology | The title and abstract mention microscope care... |
| 820 | https://doi.org/10.1016/b978-0-12-420138-5.000... | ScientificPrimaryResearchArticle | Waters JC, Wittmann T. (2014) Concepts in quan... | G | Other fluorescence microscopy | The abstract mentions quantitative fluorescenc... |
| 821 | https://doi.org/10.1016/b978-0-12-407761-4.000... | ScientificPrimaryResearchArticle | Waters JC. (2013) Live-cell fluorescence imaging. | G | Other fluorescence microscopy | The abstract discusses live-cell imaging with ... |
| 822 | https://doi.org/10.1016/b978-0-12-407761-4.000... | ScientificPrimaryResearchArticle | Salmon ED, Shaw SL, Waters JC, Waterman-Storer... | G | Other fluorescence microscopy | The paper primarily discusses a digital imagin... |
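To summarize how papers distribute across the imaging categories, the JSON content of the classification notes can be tallied directly. The sketch below assumes each note's content is either a JSON object or a list whose first element is one, as the `isinstance(c, list)` guard in the cell above suggests. The helper name `tally_codes` is hypothetical.

```python
from collections import Counter
import json

def tally_codes(note_contents):
    """Count imaging_technology_code values across JSON-encoded classification notes."""
    counts = Counter()
    for content in note_contents:
        c = json.loads(content)
        if isinstance(c, list):
            # Take the first classification if the note holds a list
            c = c[0] if c else {}
        # Notes without a code are counted under the key None
        counts[c.get('imaging_technology_code')] += 1
    return counts
```

With the query above, a call such as `tally_codes(n.content for e, n in q.all())` would give the number of papers per category code.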


In [None]:
collection_id  = '0'
classification_type = 'top_level_imaging_categories'
q = ldb.session.query(SKE, N) \
        .filter(SKC.id==SKC_HM.ScientificKnowledgeCollection_id) \
        .filter(SKC_HM.has_members_id==SKE.id) \
        .filter(SKE.id==SKE_HR.ScientificKnowledgeExpression_id) \
        .filter(SKE_HR.has_representation_id==SKI.id) \
        .filter(SKE_HN.ScientificKnowledgeExpression_id==SKE.id) \
        .filter(SKE_HN.has_notes_id==N.id) \
        .filter(SKC.id==collection_id) \
        .filter(N.type=='TiAbClassificationNote__'+classification_type) 
# Delete every note of this classification type attached to papers in the collection
for e,n in q.all():
    ldb.delete_note(n.id)
    