In [1]:
# Add the local ./styles folder to the import path, then apply the notebook styling
import sys
sys.path.append('./styles')
from init_style import init
init()

<div class="banner"><div>Introduction</div><b>OpenAD Deep Search Commands</b></div>

# Deep Search Command Introduction
_The Deep Search toolkit is called **DS4SD**, which stands for Deep Search for Scientific Discovery._

<br><br>

## 1. Getting Started
<hr>

<br>

### Setting up Magic Commands
The `%run` magic command is only required if you have not already run `init_magic` from the command line.

In [2]:
# %run openad_magic.ipynb # Not required if you have run init_magic

<br>

### Setting up the AI assistant 
If you'd like help from the AI assistant, the `tell me` command requires either your own BAM account or an LLM server running Ollama. Please refer to the [documentation](https://acceleratedscience.github.io/openad-docs/installation.html#ai-assistant) for instructions. Support for watsonx is coming soon.

In [3]:
%openad tell me how would I create a workspace then search for ibuprofen using deepsearch then display them with a viewer

<span style="color: #ffa500">Updating Embeddings for current Toolkits and Workspaces</span> <br> 


<span style="color: #d00">Error in creating vector database Failed to handle request to https://bam-api.res.ibm.com/v2/text/embeddings/limits?version=2023-11-22. <br> 
{ <br> 
  "error": "Unauthorized", <br> 
  "extensions": { <br> 
    "code": "AUTH_ERROR", <br> 
    "state": null, <br> 
    "reason": "TOKEN_INVALID" <br> 
  }, <br> 
  "message": "Invalid or missing JWT token", <br> 
  "status_code": 401 <br> 
}</span> <br> 


<span style="color: #d00">Problem Connecting to LLM: the vector db False was not able to be loaded</span> <br> 


<br>

### Installing the toolkit
If this is your first time using the DS4SD toolkit, you'll need to install it first. Installing the toolkit also sets it as your current context automatically.

In [4]:
# %openad clear sessions
%openad add toolkit DS4SD

<span style="color: #d00">Action aborted</span> <br> 
<span style="color: #ccc">Run `clear sessions` and try again</span> <br> 


<br>

### Updating the toolkit
If you've installed the Deep Search toolkit before, let's update it to ensure you're running the latest version.

In [5]:
# %openad clear sessions
%openad update toolkit DS4SD

<span style="color: #d00">Action aborted</span> <br> 
<span style="color: #ccc">Run `clear sessions` and try again</span> <br> 


Now let's set the context to DS4SD.

In [6]:
%openad set context DS4SD

<span style="color: #090">Logged into DS4SD as </span>phil.downey1@ibm.com<span style="color: #090"> <br> 
Workspace:</span> DEFAULT <br> 


<span style="color: #090">You successfully logged in to <span style="color: #dc0">DS4SD</span></span> <br> 
<span style="color: #ccc">Your access token expires on Thu Jan  9, 2025  at 22:23</span> <br> 


If you ever want to check what context or workspace you are in, you can run the command below.

In [7]:
%openad get status

<span style="color: #dc0">Current workspace</span>: DEFAULT <br> 
<span style="color: #dc0">Current context</span>: DS4SD <br> 
<span style="color: #ccc">To see more details, run `get workspace` or `get context`.</span> <br> 


<br>

### Available DS4SD commands
Let's list all available commands for this toolkit.

In [8]:
%openad ? DS4SD

## Available Commands - DS4SD

Search Molecules <br> 
`search for similar molecules to '<smiles>' [ save as '<filename.csv>' ]` <br> 
`search for molecules in patents from list ['<patent1>', '<patent2>', ...] | dataframe <dataframe_name> | file '<filename.csv>' [ save as '<filename.csv>' ]` <br> 
`search for patents containing molecule '<smiles>' | '<inchi>' | '<inchikey>' [ save as '<filename.csv>' ]` <br> 
`search for substructure instances of '<smiles>' [ save as '<filename.csv>' ]` <br> 

Search Collections <br> 
`search collection '<collection_name_or_key>' for '<search_string>' [ using (page_size=<int> system_id=<system_id> edit_distance=<integer> display_first=<integer>) ] show (data | docs) [ estimate only | return as data | save as '<filename.csv>' ]` <br> 
`display collection matches for '<search_string>' [ save as '<filename.csv>' ]` <br> 

Collections <br> 
`display collections in domains from list <list_of_domains> [ save as '<filename.csv>' ] ` <br> 
`display all collections [ save as '<filename.csv>' ]` <br> 
`display collections for domain '<domain_name>' ` <br> 
`display collection details '<collection_name_or_key>'` <br> 

 i  <span style="color: #ccc">To learn more about the DS4SD toolkit, run `ds4sd`.</span> <br> 


<br><br>

## 2. Displaying Domains
<hr>

<br>

### Displaying all collections and domains
<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">display all collections</pre>

This command will list all available collections in Deep Search.

In [9]:
%openad display all collections

Domains,Collection Name,Collection Key,Type,Num Entries,Date,System
Scientific Literature,AAAI,aaai,Document,16021,2024-01-23,default
Scientific Literature,ACL Anthology,acl,Document,55278,2023-10-05,default
Business Insights,Annual Reports,annual-report,Document,107375,2024-01-23,default
Scientific Literature,arXiv abstracts,arxiv-abstract,Document,2346838,2023-10-24,default
Scientific Literature,arXiv full documents,arxiv,Document,2290847,2024-01-27,default
Healthcare & Life Sciences,ClinicalTrials,clinical-trials,Document,426424,2023-06-01,default
Healthcare & Life Sciences / Scientific Literature,Cord19,cord19,Document,655447,2023-04-14,default
Scientific Literature,Crossref,crossref,Document,131857641,2023-04-15,default
Business Insights,ESG Reports,esg-report,Document,17358,2024-01-22,default
Business Insights,IBM Redbooks,ibm-redbooks,Document,2751,2023-08-02,default


Any result table can be displayed and edited using the `result` commands.
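Result tables are ordinary tabular data, so they can also be processed in plain Python. The sketch below is an illustration, not an OpenAD feature: it copies the rows of the `display all collections` output above into a CSV string (in practice you might use the `save as` clause and read the file from your workspace instead) and counts collections per domain, splitting multi-domain entries such as `Healthcare & Life Sciences / Scientific Literature` on ` / `. The helper names are hypothetical.

```python
import csv
import io
from collections import Counter

# Rows printed by `display all collections` above, copied verbatim for illustration.
COLLECTIONS_CSV = """Domains,Collection Name,Collection Key,Type,Num Entries,Date,System
Scientific Literature,AAAI,aaai,Document,16021,2024-01-23,default
Scientific Literature,ACL Anthology,acl,Document,55278,2023-10-05,default
Business Insights,Annual Reports,annual-report,Document,107375,2024-01-23,default
Scientific Literature,arXiv abstracts,arxiv-abstract,Document,2346838,2023-10-24,default
Scientific Literature,arXiv full documents,arxiv,Document,2290847,2024-01-27,default
Healthcare & Life Sciences,ClinicalTrials,clinical-trials,Document,426424,2023-06-01,default
Healthcare & Life Sciences / Scientific Literature,Cord19,cord19,Document,655447,2023-04-14,default
Scientific Literature,Crossref,crossref,Document,131857641,2023-04-15,default
Business Insights,ESG Reports,esg-report,Document,17358,2024-01-22,default
Business Insights,IBM Redbooks,ibm-redbooks,Document,2751,2023-08-02,default
"""


def collections_per_domain(csv_text: str) -> Counter:
    """Count collections per domain (hypothetical helper).

    A collection listed under several domains (separated by ' / ')
    is counted once for each of them.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for domain in row["Domains"].split(" / "):
            counts[domain.strip()] += 1
    return counts


def collections_in_domains(csv_text: str, domains: list[str]) -> list[str]:
    """Return the collection keys whose domains overlap with `domains` (hypothetical helper)."""
    wanted = set(domains)
    keys = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row_domains = {d.strip() for d in row["Domains"].split(" / ")}
        if row_domains & wanted:
            keys.append(row["Collection Key"])
    return keys


if __name__ == "__main__":
    print(collections_per_domain(COLLECTIONS_CSV))
    print(collections_in_domains(COLLECTIONS_CSV, ["Business Insights"]))
```

Splitting on ` / ` means a collection such as Cord19 is counted under both of its domains.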

<br>

### Search for relevant collections
<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">display collection matches for '&lt;search_string&gt;' [ save as '&lt;filename.csv&gt;' ]</pre>

This command lets you find out which collections hold documents that contain a given Deep Search search string.

In [10]:
%openad display collection matches ?

`display collection matches for '<search_string>' [ save as '<filename.csv>' ]` <br> 
<span style="color: #ccc">-----------------------------------------------------------------------------</span> <br> 
Search all collections for documents that contain a given Deep Search `<search_string>`. This is useful when narrowing down document collection(s) for subsequent search. You can use the `<index_key>` from the returned table in your next search. <br> 

Use the `save as` clause to save the results as a csv file in your current workspace. <br> 

Example: <br> 
`display collection matches for 'Ibuprofen'` <br> 


In [11]:
%openad display collection matches for 'carbon capture'


<span style="color: #d00">There was an error calling DeepSearch</span> <br> 
<span style="color: #ccc">Task '0_ElasticQuery' failed with 'RuntimeError': Failed to query Elasticsearch service: 500 '{"error":{"root_cause":[{"type":"too_many_nested_clauses","reason":"too_many_nested_clauses: Query contains too many nested clauses; maxClauseCount is set to 1024"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"clinical-trials-20220815","node":"q3hN1RZ_T5-3a3Gyz62CHQ","reason":{"type":"too_many_nested_clauses","reason":"too_many_nested_clauses: Query contains too many nested clauses; maxClauseCount is set to 1024"}}]},"status":500}'. Full error: <br> 
- Error Type: 'RuntimeError' <br> 
- Task ID: '0_ElasticQuery'</span> <br> 


<br>

### Display collections for a specific domain
This command lists the collections in a single domain. To see which domains are available, run `display all collections`.

In [12]:
%openad display collections for domain 'Business Insights'

Name,Type,Num Entries,Date,Coords
Annual Reports,Document,107375,2024-01-23,default/annual-report
ESG Reports,Document,17358,2024-01-22,default/esg-report
IBM Redbooks,Document,2751,2023-08-02,default/ibm-redbooks
Press Release,Document,43042,2024-02-27,default/press-release
Red Hat,Document,6213,2023-07-14,default/redhat


<br>

### Subset collections by a set of domains
Let's list all collections within a given list of domains.

In [13]:
%openad display collections in domains from list [ 'Business Insights','Climate & Sustainability']

Domains,Collection Name,Collection Key,Type,Num Entries,Date,System
Business Insights,Annual Reports,annual-report,Document,107375,2024-01-23,default
Business Insights,ESG Reports,esg-report,Document,17358,2024-01-22,default
Business Insights,IBM Redbooks,ibm-redbooks,Document,2751,2023-08-02,default
Climate & Sustainability,IPCC,ipcc,Document,819,2023-06-15,default
Business Insights,Press Release,press-release,Document,43042,2024-02-27,default
Business Insights,Red Hat,redhat,Document,6213,2023-07-14,default


<br>

### Get collection details
You can drill into the details behind a collection to see its source, date and more.

In [14]:
%openad display collection details 'ESG Reports'

**ESG Reports**  <br> 
<span style="color: #090">Collection Name: </span>ESG Reports <br> 
<span style="color: #090">Domains: </span>Business Insights <br> 
<span style="color: #090">Type: </span>Document <br> 
<span style="color: #090">Collection Key: </span>esg-report <br> 
<span style="color: #090">Documents: </span>17358 <br> 
<span style="color: #090">Created: </span>2024-01-22 <br> 
<span style="color: #090">URL: </span><a target="_blank" href="https://www.responsibilityreports.com/">https://www.responsibilityreports.com/</a> <br> 
<span style="color: #090">Description: </span>Responsibility reports from global companies, including sustainability reports, corporate responsibility reports, corporate social responsibility reports, and ESG reports. <br> 


<br><br>

## 3. Searching a Collection
<hr>

<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">search collection '&lt;collection_name_or_key&gt;' for '&lt;search_string&gt;' [ using (page_size=&lt;int&gt; system_id=&lt;system_id&gt; edit_distance=&lt;integer&gt; display_first=&lt;integer&gt;) ] show (data | docs) [ estimate only | return as data | save as '&lt;filename.csv&gt;' ]</pre>

This command lets you run a search against a variety of collections and pull back documents, with snippets highlighting the data matching the search criteria.

Our examples will cover its simplest form plus a variety of options.

In [15]:
%openad search collection ?

`search collection '<collection_name_or_key>' for '<search_string>' [ using (page_size=<int> system_id=<system_id> edit_distance=<integer> display_first=<integer>) ] show (data | docs) [ estimate only | return as data | save as '<filename.csv>' ]` <br> 
<span style="color: #ccc">------------------------------------------------------------------------------------------------------------------------</span> <br> 
Performs a document search of the Deep Search repository based on a given collection. The required `using` clause specifies the collection to search. Use `estimate only` to return only the potential number of hits. <br> 

Parameters: <br> 
- `<collection_name_or_key>` The name or index key for a collection. Use the command `display all collections` to list available collections. <br> 
- `<search_string>` The search string for the search. <br> 

The `<search_string>` supports elastic search string query syntax: <br> 
- `+` Signifies AND operation. <br> 
- `|` Signifies OR operation. <br> 
- `-` Negates a single token. <br> 
- `\"` Wraps a number of tokens to signify a phrase for searching. <br> 
- `*` At the end of a term -> signifies a prefix query <br> 
- `(` & `)` Signifies precedence <br> 
- `~N` After a word -> signifies edit distance (fuzziness) <br> 
- `~N` After a phrase -> signifies slop amount <br> 

Options for the `using` clause: <br> 
**Note:** The `using` clause requires all enclosed parameters to be defined in the same order as listed below. <br> 
- `page_size=<integer>` Result pagination, the default is None. <br> 
- `system_id=<system_id>` System cluster id, the default is 'default'. <br> 
- `edit_distance=<integer>` (0-5) Sets the search word span criteria for key words for document searches, the default is 5. When set to 0, no snippets will be be returned. <br> 
- `display_first=<integer>` When set, the displayed result set will be truncated at the given number. <br> 

Clauses: <br> 
- `show (data | docs)`: <br> 
    - `data` Display structured data from within the documents. <br> 
    - `docs` Display document context and preview snippet. <br> 
    Both can be combined in a single command, e.g. `show (data docs)` <br> 
- `estimate only` Determine the potential number of hits. <br> 
- `return as data` For Notebook or API mode. Removes all styling from the Pandas DataFrame, ready for further processing. <br> 

Examples: <br> 
- Look for documents that contain discussions on power conversion efficiency: <br> 
`search collection 'arxiv-abstract' for 'ide(\"power conversion efficiency\" OR PCE) AND organ*' using ( edit_distance=20 system_id=default) show (docs)` <br> 

- Search the PubChem archive for 'Ibuprofen' and display related molecules' data: <br> 
`search collection 'pubchem' for 'Ibuprofen' show (data)` <br> 

- Search for patents which mention a specific smiles molecule: <br> 
`search collection 'patent-uspto' for '\"smiles#ccc(coc(=o)cs)(c(=o)c(=o)cs)c(=o)c(=o)cs\"' show (data)` <br> 


<br><br>

### <span style="color: green">Example A:</span> Query arXiv for "*power conversion efficiency*"

In this example, we'll search for the input query in documents from the arXiv.org data collection. For each matched document, we'll return the title, the authors and a link to the original document on arXiv.org.

#### What we'll cover:
1. How to address a specific data collection
2. How to choose which components of the documents are returned
3. How to iterate through the complete data collection by fetching `page_size=50` results at a time

<br>

### Getting the result estimate

First, we'll get an estimate of how many documents the search may return, so we know we're pulling back a manageable amount.

In [16]:
%openad search collection 'arXiv abstracts' for 'ide("power conversion efficiency" OR PCE) AND organ* ' show (docs) estimate only

Estimated results: 75 <br> 
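The estimate also tells you how many requests a paginated fetch would need. As a quick worked example (plain arithmetic, not an OpenAD API), fetching 75 estimated hits with `page_size=50` takes ⌈75 / 50⌉ = 2 pages:

```python
import math


def pages_needed(estimated_hits: int, page_size: int) -> int:
    """Number of pages required to retrieve all hits at `page_size` per page."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    return math.ceil(estimated_hits / page_size)


print(pages_needed(75, 50))  # 2
print(pages_needed(75, 10))  # 8
```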


<br>

### Retrieving results
Unless the `return as data` clause is set, results will be returned as a ***pandas Styler object***, which provides an enhanced snippet display of the data.

The Styler object can be displayed straightaway or assigned to a variable. To extract the raw DataFrame from it, reference `df_styler.data`.
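If you want downstream code that works whether or not a result was styled, a small normalizing helper can unwrap the Styler via its `.data` attribute, which holds the DataFrame being styled. This is a sketch: the helper name `as_dataframe` is hypothetical, and the `Title` column is taken from the results table shown after the next cell.

```python
import pandas as pd


def as_dataframe(result) -> pd.DataFrame:
    """Return a plain DataFrame from either a DataFrame or a pandas Styler.

    Hypothetical helper: a Styler keeps the DataFrame it styles in `.data`,
    so unwrapping it is a single attribute access.
    """
    if isinstance(result, pd.DataFrame):
        return result
    data = getattr(result, "data", None)
    if isinstance(data, pd.DataFrame):
        return data
    raise TypeError(f"Expected a DataFrame or Styler, got {type(result).__name__}")


# Possible usage with the search result in this section:
# df = as_dataframe(df_styler)
# titles = df["Title"].tolist()
```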

In [17]:
df_styler = %openad search collection 'arXiv abstracts' for 'ide("power conversion efficiency" OR PCE) AND organ* ' using \
(system_id=default edit_distance=20) show (data docs)

Estimated results: 75 <br> 



**Result distribution by year** <br> 


2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023
4,0,4,12,8,8,4,28,20,52,28,32,28,32,20,20





<span style="color: #ccc">Next up, you can run: </span>`result open`/`edit`/`copy`/`display`/`as dataframe`/`save [as '<filename.csv>']` <br> 


Now that the data is stored in our variable, we can display it in the next step. Right-clicking the cell's output lets you enable scrolling, which makes the results easier to review than letting them take up the entire notebook.

In [18]:
df_styler

DS_URL,Title,Authors,Snippet,arxivid,doi,Report,Field
Deep Search Web Link,Energetics and Kinetics Requirements for Organic Solar Cells to 2 Break  the 20% Power Conversion Efficiency Barrier,"Oskar J Sandberg,Ardalan Armin",These findings provide vital insights into the operation of state-of-art non-fullerene organic solar cells with low offsets.,ARXIVID Link,DOI Link,2104.11357,main-text
Deep Search Web Link,"New 3,3'-(ethane-1, 2-diylidene)bis(indolin-2-one) (EBI)-based small  molecule semiconductors for organic solar cells","Mylene Le Borgne,Jesse Quinn,Jaime Mart\'in,Natalie Stingelin,Yuning Li,Guillaume Wantz","The best performing photovoltaic devices are based on the EBI derivative using the bithiophene end-capping moiety (EBI-2T) with a maximum power conversion efficiency (PCE) of 1.92%, owing to the broad absorption spectra of EBI-2T and the appropriate morphology of the BHJ.",ARXIVID Link,,1706.08074,main-text
Deep Search Web Link,How Good Can 2D Excitonic Solar Cells Be?,"Zekun Hu,Da Lin,Jason Lynch,Kevin Xu,Deep Jariwala","Our analysis suggests that, while the PCE for 2D excitonic solar cells may be limited to < 10%, a specific power > 100 W g-1 may be achieved with our proposed designs, making them attractive in aerospace, distributed remote sensing, and wearable electronics.",ARXIVID Link,,2302.04788,main-text
Deep Search Web Link,The role of charge recombination to spin-triplet excitons in  non-fullerene acceptor organic solar cells,"Alexander J. Gillett,Alberto Privitera,Rishat Dilmurat,Akchheta Karki,Deping Qian,Anton Pershin,Giacomo Londi,William K. Myers,Jaewon Lee,Jun Yuan,Seo-Jin Ko,Moritz K. Riede,Feng Gao,Guillermo C. Bazan,Akshay Rao,Thuc-Quyen Nguyen,David Beljonne,Richard H. Friend",This work therefore provides a clear design pathway for improved OSC performance to 20% PCE and beyond.,ARXIVID Link,DOI Link,2010.10978,main-text
Deep Search Web Link,Design of Lead-Free Inorganic Halide Perovskites for Solar Cells via  Cation-Transmutation,"Xin-Gang Zhao,Ji-Hui Yang,Yuhao Fu,Dongwen Yang,Qiaoling Xu,Liping Yu,Su-Huai Wei,Lijun Zhang","Despite the high power conversion efficiency exceeding 20% achieved by their solar cells, two key issues -- the poor device stabilities associated with their intrinsic material instability and the toxicity due to water soluble Pb$^{2+}$ -- need to be resolved before large-scale commercialization.",ARXIVID Link,DOI Link,1705.10014,main-text
Deep Search Web Link,On the Absence of Triplet Exciton Loss Pathways in Non-Fullerene  Acceptor based Organic Solar Cells,"Maria S. Kotova,Giacomo Londi,Johannes Junker,Stefanie Dietz,Alberto Privitera,Kristofer Tvingstedt,David Beljonne,Andreas Sperlich,Vladimir Dyakonov",These results correlate well with the high power conversion efficiency of the PBDB-T:ITIC-based OSCs and their high stability.,ARXIVID Link,DOI Link,2002.07531,main-text
Deep Search Web Link,Correlated In-Situ Low-Frequency Noise and Impedance Spectroscopy Reveal  Recombination Dynamics in Organic Solar Cells using Fullerene and  Non-Fullerene Acceptors,"Kyle A. Luck,Vinod K. Sangwan,Patrick E. Hartnett,Heather N. Arnold,Michael R. Wasielewski,Tobin J. Marks,Mark C. Hersam",An inverse correlation is also observed between noise spectral density and power conversion efficiency.,ARXIVID Link,,1709.07133,main-text
Deep Search Web Link,Organic solar cell design as a function of radiative quantum efficiency,"Blaise Godefroid,Gregory Kozyreff","To demonstrate this fact, we use realistic material parameters inspired from literature data and obtain an increase of power conversion efficiency from 11.3% to 12.7%.",ARXIVID Link,DOI Link,1705.07814,main-text
Deep Search Web Link,Enhanced Organic Solar Cells Efficiency through Additive Electronic and  Electro-optic Effects Resulting from Doping a Polymer Hole Transport Layer,"C.T. Howells,K. Marbou,H. Kim,K.J. Lee,B. Heinrich,S.J. Kim,A. Nakao,T. Aoyama,S. Furukawa,J.-H. Kim,E.S. Kim,F. Mathevet,S. Mery,I.D.W. Samuel,A. Al Ghaferi,M. S. Dahlem,M. Uchiyama,S.Y. Kim,J.W. Wu,J.-C. Ribierre,C. Adachi,D.-W. Kim,P. Andr\'e",This work points towards fluorination as a promising strategy toward combining both external quantum efficiency modulation and power conversion efficiency enhancement in OPVs.,ARXIVID Link,DOI Link,1512.04314,main-text
Deep Search Web Link,Molecular-Level Switching of Polymer/Nanocrystal Non-Covalent  Interactions and Application in Hybrid Solar Cells,"Carlo Giansante,Rosanna Mastria,Giovanni Lerario,Luca Moretti,Ilka Kriegel,Francesco Scotognella,Guglielmo Lanzani,Sonia Carallo,Marco Esposito,Mariano Biasiucci,Aurora Rizzo,Giuseppe Gigli","Upon (quasi)steady-state and time-resolved analisys of the photo-induced processes in the nanocomposites and their organic and inorganic components, we ascertained that electron transfer occurs at the hybrid interface yielding long-lived separated charge carriers, whereas interfacial hole transfer appears slow.",ARXIVID Link,DOI Link,1312.624,main-text


<br>

### Simply viewing the results
If the results are not stored in a variable, they are displayed straightaway. Notice that we're using the `page_size` and `edit_distance` options to fine-tune our results. Try experimenting with different values for these options.

In [19]:
%openad search collection 'arXiv abstracts' for ' " carbon capture" AND "membrane" ' using (page_size=10 edit_distance=5) show (data docs)

Estimated results: 2 <br> 



**Result distribution by year** <br> 


2019,2020
3,3





<span style="color: #ccc">Next up, you can run: </span>`result open`/`edit`/`copy`/`display`/`as dataframe`/`save [as '<filename.csv>']` <br> 


DS_URL,Title,Authors,Snippet,arxivid,doi,Report,Field
Deep Search Web Link,Electrolytic Conversion of Bicarbonate into CO in a Flow Cell,"Tengfei Li,Eric W. Lees,Maxwell Goldman,Danielle A. Salvatore,David M. Weekes,Curtis P. Berlinguette",This process offers a means of using electrolysis to bypass the thermally-intensive step of extracting CO2 from bicarbonate solutions generated in carbon capture schemes.,ARXIVID Link,DOI Link,1905.0458,main-text
Deep Search Web Link,Carbon Capture and Separation from CO2/N2/H2O Gaseous Mixtures in  Bilayer Graphtriyne: A Molecular Dynamics Study,"Noelia Faginas-Lago,Yusuf Bramastya Apriliyanto,Andrea Lombardi",We also observed that the bilayer graphtriyne membrane has high CO2 and H2O permeances compared to N2 with permeance selectivity ranging from 4.8 to 6.5.,ARXIVID Link,,2008.01754,main-text


<br><br>

### <span style="color: green">Example B:</span> Search for "_Ibuprofen_" on PubChem

In this example, we'll search for all PubChem entries that contain the string _Ibuprofen_. The results table shows each chemical's name, its SMILES and computed properties such as the molecular weight and XLogP3 (a lipophilicity estimate).


In [20]:
ibuprofen_df = %openad search collection 'pubchem' for 'Ibuprofen' SHOW (data) 
display(ibuprofen_df)

Estimated results: 9 <br> 



<span style="color: #ccc">Next up, you can run: </span>`result open`/`edit`/`copy`/`display`/`as dataframe`/`save [as '<filename.csv>']` <br> 


DS_URL,cid,cas_number,ec_number,SMILES,chemical_name,molecular weight,xlogp3,hydrogen bond donor count,hydrogen bond acceptor count,rotatable bond count,exact mass,monoisotopic mass,topological polar surface area,heavy atom count,formal charge,complexity,isotope atom count,defined atom stereocenter count,undefined atom stereocenter count,covalently-bonded unit count,compound is canonicalized
Deep Search Web Link,114864,51146-57-7,610-621-4,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O,(-)-ibuprofen,206.28,3.5,1.0,2.0,4.0,206.13068,206.13068,37.3,15.0,0.0,203.0,0.0,1.0,0.0,1.0,Yes
Deep Search Web Link,175781,98207-12-6,,CC(C)CC1=CC=C(C=C1)C(C)C(=O)OCCN2CCN(CC2)C3=CC(=CC=C3)Cl,Lobuprofen,429.0,6.1,0.0,4.0,9.0,428.223056,428.223056,32.8,30.0,0.0,513.0,0.0,0.0,1.0,1.0,Yes
Deep Search Web Link,39912,51146-56-6,,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O,Dexibuprofen,206.28,3.5,1.0,2.0,4.0,206.13068,206.13068,37.3,15.0,0.0,203.0,0.0,1.0,0.0,1.0,Yes
Deep Search Web Link,219068,17692-38-5,,CC(C1=CC=C(C=C1)C2=CC(=CC=C2)F)C(=O)O,Fluprofen,244.26,3.7,1.0,3.0,3.0,244.089958,244.089958,37.3,18.0,0.0,284.0,0.0,0.0,1.0,1.0,Yes
Deep Search Web Link,3672,79261-49-7 (potassium salt),239-784-6,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O,Ibuprofen,206.28,3.5,1.0,2.0,4.0,206.13068,206.13068,37.3,15.0,0.0,203.0,0.0,0.0,1.0,1.0,Yes
Deep Search Web Link,68769,57144-56-6,,CC(C)C1CC2=C(C1)C=C(C=C2)C(C)C(=O)O,Isoprofen,232.32,3.8,1.0,2.0,3.0,232.14633,232.14633,37.3,17.0,0.0,285.0,0.0,0.0,2.0,1.0,Yes
Deep Search Web Link,5359,40828-46-4,255-096-9,CC(C1=CC=C(C=C1)C(=O)C2=CC=CS2)C(=O)O,Suprofen,260.31,3.3,1.0,4.0,4.0,260.050715,260.050715,82.6,18.0,0.0,321.0,0.0,0.0,1.0,1.0,Yes
Deep Search Web Link,595160,36039-36-8,,CC(C)CC1=CC=C(C=C1)C(C)CO,Ibuprofen alcohol,192.3,3.4,1.0,1.0,4.0,192.151415,192.151415,20.2,14.0,0.0,145.0,0.0,0.0,1.0,1.0,Yes
Deep Search Web Link,71261,83394-44-9,280-048-9,CC(C)CC1=CC=C(C=C1)C(C)C(=O)NCCO,Mabuprofen,249.35,2.6,2.0,2.0,6.0,249.172879,249.172879,49.3,18.0,0.0,245.0,0.0,0.0,1.0,1.0,Yes


<br>

### Working with the results as data
By using the `return as data` clause, results will be returned as a raw dataframe, which allows us to pass them to other utilities.

In [21]:
my_mols = %openadd search collection 'pubchem' for 'Ibuprofen' show (data) return as data



In [22]:
%openad load molecules using dataframe my_mols

<span style="color: #090">Successfully loaded <span style="color: #dc0">9</span> molecules into the working set</span> <br> 


In [23]:
%openad list molecules

<span style="color: #ccc">Next up, you can run: </span>`result open`/`edit`/`copy`/`display`/`as dataframe`/`save [as '<filename.csv>']` <br> 


name,inchi,inchikey,canonical_smiles,isomeric_smiles,smiles,molecular_formula,cid
,"InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)",HEFNNWSXXWATRW-UHFFFAOYSA-N,CC(C)Cc1ccc(C(C)C(=O)O)cc1,CC(C)Cc1ccc(C(C)C(=O)O)cc1,,C13H18O2,
,"InChI=1S/C25H33ClN2O2/c1-19(2)17-21-7-9-22(10-8-21)20(3)25(29)30-16-15-27-11-13-28(14-12-27)24-6-4-5-23(26)18-24/h4-10,18-20H,11-17H2,1-3H3",JFGXBHHLHQAGRR-UHFFFAOYSA-N,CC(C)Cc1ccc(C(C)C(=O)OCCN2CCN(c3cccc(Cl)c3)CC2)cc1,CC(C)Cc1ccc(C(C)C(=O)OCCN2CCN(c3cccc(Cl)c3)CC2)cc1,,C25H33ClN2O2,
,"InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)",HEFNNWSXXWATRW-UHFFFAOYSA-N,CC(C)Cc1ccc(C(C)C(=O)O)cc1,CC(C)Cc1ccc(C(C)C(=O)O)cc1,,C13H18O2,
,"InChI=1S/C15H13FO2/c1-10(15(17)18)11-5-7-12(8-6-11)13-3-2-4-14(16)9-13/h2-10H,1H3,(H,17,18)",TYCOFFBAZNSQOJ-UHFFFAOYSA-N,CC(C(=O)O)c1ccc(-c2cccc(F)c2)cc1,CC(C(=O)O)c1ccc(-c2cccc(F)c2)cc1,,C15H13FO2,
,"InChI=1S/C13H18O2/c1-9(2)8-11-4-6-12(7-5-11)10(3)13(14)15/h4-7,9-10H,8H2,1-3H3,(H,14,15)",HEFNNWSXXWATRW-UHFFFAOYSA-N,CC(C)Cc1ccc(C(C)C(=O)O)cc1,CC(C)Cc1ccc(C(C)C(=O)O)cc1,,C13H18O2,
,"InChI=1S/C15H20O2/c1-9(2)13-7-12-5-4-11(6-14(12)8-13)10(3)15(16)17/h4-6,9-10,13H,7-8H2,1-3H3,(H,16,17)",RYDUZJFCKYTEHX-UHFFFAOYSA-N,CC(C(=O)O)c1ccc2c(c1)CC(C(C)C)C2,CC(C(=O)O)c1ccc2c(c1)CC(C(C)C)C2,,C15H20O2,
,"InChI=1S/C14H12O3S/c1-9(14(16)17)10-4-6-11(7-5-10)13(15)12-3-2-8-18-12/h2-9H,1H3,(H,16,17)",MDKGKXOCJGEUJW-UHFFFAOYSA-N,CC(C(=O)O)c1ccc(C(=O)c2cccs2)cc1,CC(C(=O)O)c1ccc(C(=O)c2cccs2)cc1,,C14H12O3S,
,"InChI=1S/C13H20O/c1-10(2)8-12-4-6-13(7-5-12)11(3)9-14/h4-7,10-11,14H,8-9H2,1-3H3",IZXWIWYERZDWOA-UHFFFAOYSA-N,CC(C)Cc1ccc(C(C)CO)cc1,CC(C)Cc1ccc(C(C)CO)cc1,,C13H20O,
,"InChI=1S/C15H23NO2/c1-11(2)10-13-4-6-14(7-5-13)12(3)15(18)16-8-9-17/h4-7,11-12,17H,8-10H2,1-3H3,(H,16,18)",JVGUNCHERKJFCM-UHFFFAOYSA-N,CC(C)Cc1ccc(C(C)C(=O)NCCO)cc1,CC(C)Cc1ccc(C(C)C(=O)NCCO)cc1,,C15H23NO2,
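The listing above contains the same InChIKey (`HEFNNWSXXWATRW-UHFFFAOYSA-N`) three times, because (-)-ibuprofen, Dexibuprofen and Ibuprofen share the same stereo-free SMILES in the PubChem results. If you prefer one row per structure, you could deduplicate the `return as data` frame before loading it. The sketch below assumes the frame has the `chemical_name`, `SMILES` and `molecular weight` columns shown in Example B; the stand-in `pubchem_df` rebuilds those three columns from that table so the code runs on its own. Keep in mind that deduplicating on a SMILES without stereo information merges stereoisomers.

```python
import pandas as pd

# Stand-in for `my_mols`: three columns copied from the PubChem 'Ibuprofen' results.
pubchem_df = pd.DataFrame(
    {
        "chemical_name": [
            "(-)-ibuprofen", "Lobuprofen", "Dexibuprofen", "Fluprofen",
            "Ibuprofen", "Isoprofen", "Suprofen", "Ibuprofen alcohol",
            "Mabuprofen",
        ],
        "SMILES": [
            "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
            "CC(C)CC1=CC=C(C=C1)C(C)C(=O)OCCN2CCN(CC2)C3=CC(=CC=C3)Cl",
            "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
            "CC(C1=CC=C(C=C1)C2=CC(=CC=C2)F)C(=O)O",
            "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
            "CC(C)C1CC2=C(C1)C=C(C=C2)C(C)C(=O)O",
            "CC(C1=CC=C(C=C1)C(=O)C2=CC=CS2)C(=O)O",
            "CC(C)CC1=CC=C(C=C1)C(C)CO",
            "CC(C)CC1=CC=C(C=C1)C(C)C(=O)NCCO",
        ],
        "molecular weight": [206.28, 429.0, 206.28, 244.26, 206.28,
                             232.32, 260.31, 192.3, 249.35],
    }
)

# Keep the first row for each SMILES string (7 unique structures remain).
unique_mols = pubchem_df.drop_duplicates(subset="SMILES", keep="first")

# Optionally narrow the set further, e.g. to molecules under 250 g/mol.
light_mols = unique_mols[unique_mols["molecular weight"] < 250]
print(light_mols["chemical_name"].tolist())
# ['(-)-ibuprofen', 'Fluprofen', 'Isoprofen', 'Ibuprofen alcohol', 'Mabuprofen']
```

A deduplicated frame like this could then be passed to `load molecules using dataframe` in place of the full result.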


In [24]:
%openad display molecule Mabuprofen


<br>

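### Larger sets of data

Broader queries can return many more results. When a query may take some time, OpenAD asks for confirmation before fetching the results; in this run, we answered `n`.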

In [25]:
%openad search collection 'pubchem' for 'pain' show (data)

Estimated results: 157 <br> 


Your query may take some time, do you wish to proceed? (y/n):  n


Now let's edit the result set and rename the molecule Lorazepam

Next we'll take the edited results and load them into our molecule working set

In [26]:
my_mols = %openadd result as dataframe
%openad load molecules using dataframe my_mols

<span style="color: #d00">Unknown error</span> <br> 
<span style="color: #ccc">'my_mols'</span> <br> 


Next we'll use the `show molecules` command to visualize and subset the results.

In [27]:
%openad show molecules 

Next up, you could load this selection back into your molecule working set if you wish.

For this demo, we'll continue instead by displaying one of the molecules from our list.

In [28]:
%openad display mol Noscapine


<br>

### Visualizing the molecules
<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">show molecules </pre>

In this example, we'll first search for molecular data records related to 'Ibuprofen' and store the results as a DataFrame in a variable, then load them into the molecule working set and visualize them with `show molecules`.

In [29]:
my_df = %openadd search collection 'pubchem' for 'Ibuprofen' show (data) return as data
%openad load molecules using dataframe my_df



<span style="color: #090">Successfully loaded <span style="color: #dc0">9</span> molecules into the working set</span> <br> 


By using the `as molsobject` clause, we can access the selected molecules of the molecule grid for further processing.

In [30]:
%openad show molecules 

<br><br>

## 4. Searching Molecules
<hr>

<br>

### Search for similar molecules
<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">search for similar molecules to '&lt;smiles&gt;' [ save as '&lt;filename.csv&gt;' ]</pre>

This command lets you search documents for any molecule that's similar to a given molecule.
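This notebook doesn't say how Deep Search scores similarity. For intuition only, a widely used measure in cheminformatics is the Tanimoto (Jaccard) coefficient between two molecules' fingerprints: the number of bits set in both, divided by the number of bits set in either. The sketch below computes it over hand-written sets of hypothetical fingerprint bit indices; it is not Deep Search's implementation, and real fingerprints would come from a cheminformatics library.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of fingerprint bits."""
    union = bits_a | bits_b
    if not union:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(bits_a & bits_b) / len(union)


# Hypothetical bit indices, standing in for real molecular fingerprints.
query = {3, 17, 42, 101, 256}
candidate = {3, 17, 42, 99, 256, 300}
print(round(tanimoto(query, candidate), 3))  # 0.571 (4 shared bits out of 7)
```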

In [31]:
%openad search for similar ?

`search for similar molecules to '<smiles>' [ save as '<filename.csv>' ]` <br> 
<span style="color: #ccc">-----------------------------------------------------------------------</span> <br> 
Search for molecules that are similar to the provided molecule or molecule substructure as provided in the `<smiles_string>`. <br> 

Use the `save as` clause to save the results as a csv file in your current workspace. <br> 

Example: <br> 
`search for similar molecules to 'C1(C(=C)C([O-])C1C)=O'` <br> 


In [32]:
smiles_molecule='CC(C)(c1ccccn1)C(CC(=O)O)Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F'
mols = %openad search for similar molecules to '{smiles_molecule}'  
display(mols) 

<span style="color: #d00">There was an error calling DeepSearch</span> <br> 
<span style="color: #ccc">504 Server Error: Gateway Time-out for url: https://sds.app.accelerate.science//api/orchestrator/api/v1/query/run</span> <br> 


None

<br>

### Search for patents containing a certain molecule

<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">search for patents containing molecule '&lt;smiles&gt;' | '&lt;inchi&gt;' | '&lt;inchikey&gt;' [ save as '&lt;filename.csv&gt;' ]</pre>

This command allows you to find patents that mention a specific molecule.

In [33]:
%openad search for patents ?

`search for patents containing molecule '<smiles>' | '<inchi>' | '<inchikey>' [ save as '<filename.csv>' ]` <br> 
<span style="color: #ccc">---------------------------------------------------------------------------------------------------------</span> <br> 
Search for mentions of a specified molecules in registered patents. The queried molecule can be described as a SMILES string, InChI or InChiKey. <br> 

Use the `save as` clause to save the results as a csv file in your current workspace. <br> 

Example: <br> 
`search for patents containing molecule 'CC(C)(c1ccccn1)C(CC(=O)O)Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F'` <br> 
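Since the command accepts a SMILES string, an InChI or an InChIKey, you may want to check which kind of identifier you are passing. The heuristic below is a sketch, not how OpenAD itself parses the argument: standard InChI strings begin with `InChI=`, standard InChIKeys have a fixed 14-10-1 uppercase layout (e.g. `HEFNNWSXXWATRW-UHFFFAOYSA-N`), and anything else is assumed to be SMILES without any validation.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def identifier_type(identifier: str) -> str:
    """Guess whether `identifier` is an InChI, an InChIKey or a SMILES string.

    Hypothetical heuristic for illustration; not part of OpenAD.
    """
    text = identifier.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.match(text):
        return "inchikey"
    # Fallback: assume SMILES; no chemical validation is performed.
    return "smiles"


print(identifier_type("HEFNNWSXXWATRW-UHFFFAOYSA-N"))  # inchikey
print(identifier_type("CC(C)Cc1ccc(C(C)C(=O)O)cc1"))   # smiles
```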


In [34]:
patents = %openadd search for patents containing molecule '{smiles_molecule}'
display(patents)

Unnamed: 0,PATENT ID
0,CN108473493A
1,CN108473493B
2,JP0007001601B1
3,JP2019505518T
4,KR1020187022012
5,US10526338
6,US20190023713A1


<br>

### Search for molecules in a list of patents
<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">search for molecules in patents from list ['&lt;patent1&gt;', '&lt;patent2&gt;', ...] | dataframe &lt;dataframe_name&gt; | file '&lt;filename.csv&gt;' [ save as '&lt;filename.csv&gt;' ]</pre>

Continuing with our list of patents of interest, this command allows you to find out what other molecules are mentioned in them.

In [35]:
%openad search for molecules in patents ?

`search for molecules in patents from list ['<patent1>', '<patent2>', ...] | dataframe <dataframe_name> | file '<filename.csv>' [ save as '<filename.csv>' ]` <br> 
<span style="color: #ccc">------------------------------------------------------------------------------------------------------------------------</span> <br> 
Search for molecules mentioned in a defined list of patents. When sourcing patents from a CSV or DataFrame, there must be a column named "PATENT ID" or "patent id". <br> 

Use the `save as` clause to save the results as a csv file in your current workspace. <br> 

Example: <br> 
`search for molecules in patents from list ['CN108473493B','US20190023713A1']` <br> 
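When the patent IDs come from a CSV file or a DataFrame, the help text above requires a column named `PATENT ID` or `patent id`. The sketch below, which uses only the Python standard library, shows one way to pull that column out of a CSV and render it in the `['<patent1>', '<patent2>', ...]` form used by the `from list` clause. The function names `read_patent_ids` and `as_list_clause` are hypothetical; the column lookup simply mirrors the two spellings from the help text and is not OpenAD's own parser.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO

# Column names accepted for patent IDs, as listed in the command's help text.
ACCEPTED_COLUMNS = ("PATENT ID", "patent id")


def read_patent_ids(csv_file: TextIO) -> list[str]:
    """Return the non-empty patent IDs from the first accepted column found in the CSV."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    column = next((name for name in ACCEPTED_COLUMNS if name in fieldnames), None)
    if column is None:
        raise ValueError(f"CSV must contain one of the columns {ACCEPTED_COLUMNS}")
    patent_ids = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            patent_ids.append(value)
    return patent_ids


def as_list_clause(patent_ids: list[str]) -> str:
    """Format IDs as "['ID1', 'ID2']", the bracketed form used by `from list`."""
    return "[" + ", ".join(f"'{pid}'" for pid in patent_ids) + "]"


if __name__ == "__main__":
    sample = io.StringIO("PATENT ID\nCN108473493B\nUS20190023713A1\n")
    ids = read_patent_ids(sample)
    print(f"search for molecules in patents from list {as_list_clause(ids)}")
    # prints: search for molecules in patents from list ['CN108473493B', 'US20190023713A1']
```

For simple alphanumeric IDs like these, the bracketed string is identical to what `str()` produces for a Python list, which is why a Python list can be interpolated into the magic command with `{my_list}`.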


In [36]:
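# Collect the patent IDs returned above into a plain Python list
my_list = list(patents['PATENT ID'])
# IPython expands {my_list} into the bracketed list syntax expected by the command
my_frame = %openadd search for molecules in patents from list {my_list}
display(my_frame)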

|   | Id | SMILES | InChIKey | InChI |
|---|---|---|---|---|
| 0 | 04b43304373fac7fb5bbd8d75ed39e49d72eff86dfeee3c53ce7ed30de0f4897 | C=CCC(C)(C)C(N)CCC | CJNRUEMMFKPXRX-UHFFFAOYSA-N | InChI=1S/C10H21N/c1-5-7-9(11)10(3,4)8-6-2/h6,9H,2,5,7-8,11H2,1,3-4H3 |
| 1 | 05254aa456aab76bfd4915eaae6e856ac60288d01d60379e8cb61f202839b53b | CCOC(=O)C=C(N)C(C)(C)c1ccccn1 | OPXOXVDSPZXHQT-UHFFFAOYSA-N | InChI=1S/C13H18N2O2/c1-4-17-12(16)9-10(14)13(2,3)11-7-5-6-8-15-11/h5-9H,4,14H2,1-3H3 |
| 2 | 06fc1cb7e0818196817db5ffb142ca68ff17ef82c9581f389dfa09ac99c3ac8a | C=CCC(C)(C)C(CCF)Nc1nc(Cl)c(C#N)cc1F | NVXFORHWGCVJTH-UHFFFAOYSA-N | InChI=1S/C15H18ClF2N3/c1-4-6-15(2,3)12(5-7-17)20-14-11(18)8-10(9-19)13(16)21-14/h4,8,12H,1,5-7H2,2-3H3,(H,20,21) |
| 3 | 07ec8a3a0fc49b6f65e8ddcbcf29b6b8720aad905566cb7e88704b1279e273f1 | C=C([O-])CC(Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F)C(C)(C)c1cccs1 | OYPIIDFYQHIPHW-UHFFFAOYSA-M | InChI=1S/C24H21ClFN5OS/c1-13(32)7-19(24(2,3)20-5-4-6-33-20)30-23-18(26)8-14(10-27)21(31-23)17-12-29-22-16(17)9-15(25)11-28-22/h4-6,8-9,11-12,19,32H,1,7H2,2-3H3,(H,28,29)(H,30,31)/p-1 |
| 4 | 08743344ce06860ead6974eb4bcc45c21a9fb77bce9e2ae7d7ae2fe568149ea3 | CC(C)(C#N)c1ccccn1 | SYILGDLWTPODQG-UHFFFAOYSA-N | InChI=1S/C9H10N2/c1-9(2,7-10)8-5-3-4-6-11-8/h3-6H,1-2H3 |
| 5 | 0c10bf08da71ceb5cf16b501acd861446c4931f5205d4d0a7fddbcbf154db6f7 | Cc1cc(F)c(NC(CC(=O)Cl)C(C)(C)c2ccn(C)n2)nc1-c1c[nH]c2ncc(Cl)cc12 | SDORLMSRWMENPL-UHFFFAOYSA-N | InChI=1S/C23H23Cl2FN6O/c1-12-7-16(26)22(30-20(12)15-11-28-21-14(15)8-13(24)10-27-21)29-18(9-19(25)33)23(2,3)17-5-6-32(4)31-17/h5-8,10-11,18H,9H2,1-4H3,(H,27,28)(H,29,30) |
| 6 | 0d3ff5163ea47e9dbdb746ff4ea32411a880855a41b3e4b39bd7044ae32a7557 | N#CCc1nccs1 | QLBGFCHJCWNVIN-UHFFFAOYSA-N | InChI=1S/C5H4N2S/c6-2-1-5-7-3-4-8-5/h3-4H,1H2 |
| 7 | 10fe7b9bc85b488112d32136d08e32cec5d0a97db4f8c123e48d27e3f1cb4edb | C#CC(C)(C)c1ccccn1 | IORICXGDTVBUHE-UHFFFAOYSA-N | InChI=1S/C10H11N/c1-4-10(2,3)9-7-5-6-8-11-9/h1,5-8H,2-3H3 |
| 8 | 11da46c0433e2bd31f8aaf49912c16440d04c2a349bcbd800b85c0aec7c24703 | C=CCC(C)(C)C(N)CC1C2C(C)C12 | ZNJMOPOAERMPQV-UHFFFAOYSA-N | InChI=1S/C13H23N/c1-5-6-13(3,4)10(14)7-9-11-8(2)12(9)11/h5,8-12H,1,6-7,14H2,2-4H3 |
| 9 | 12d7c52a299793a1f2f824dc71a001a459e6d45c4c0665fc2e48a25d31d4bc7f | CCOC(=O)CC(Nc1nc(Cl)ncc1F)C(C)(C)c1cccs1 | CFYRRRVOOYRWTP-UHFFFAOYSA-N | InChI=1S/C16H19ClFN3O2S/c1-4-23-13(22)8-11(16(2,3)12-6-5-7-24-12)20-14-10(18)9-19-15(17)21-14/h5-7,9,11H,4,8H2,1-3H3,(H,19,20,21) |


Next, we can load the results into the molecule working set and visualize them the same way we did before, using the `show molecules` command.

In [37]:
%openad load molecules using dataframe my_frame
%openad show molecules 

<span style="color: #090">Successfully loaded <span style="color: #dc0">20</span> molecules into the working set</span> <br> 


<br>

### Search for molecule substructures
<pre style="color:#eec;background:#2f2d3a;padding:20px;margin:0;border-radius:5px">search for substructure instances of '&lt;smiles&gt;' [ save as '&lt;filename.csv&gt;' ]</pre>

This command finds molecules that contain a given substructure, defined by a SMILES string.

In [38]:
%openad search for substructure instances ?

`search for substructure instances of '<smiles>' [ save as '<filename.csv>' ]` <br> 
<span style="color: #ccc">----------------------------------------------------------------------------</span> <br> 
Search for molecules by substructure, as defined by the `<smiles>` string. <br> 

Use the `save as` clause to save the results as a csv file in your current workspace. <br> 

Example: <br> 
`search for substructure instances of 'C1(C(=C)C([O-])C1C)=O' save as 'my_mol'` <br> 


The example below will write the results to a file called 'my_mol.csv' in your current workspace.
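The help example passes `save as 'my_mol'`, while the sentence above says the file is called `my_mol.csv`, which suggests the `.csv` extension is appended when it is missing. The sketch below only illustrates that inferred naming rule; `results_filename` is a hypothetical helper, and how OpenAD treats names with other extensions is not covered here.

```python
from pathlib import PurePath


def results_filename(save_as: str) -> str:
    """Append '.csv' unless the name already ends with it (assumed behaviour, case-insensitive)."""
    name = save_as.strip()
    return name if PurePath(name).suffix.lower() == ".csv" else f"{name}.csv"


if __name__ == "__main__":
    print(results_filename("my_mol"))      # my_mol.csv
    print(results_filename("my_mol.csv"))  # my_mol.csv
```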

In [39]:
%openad search for substructure instances of 'C1(C(=C)C([O-])C1C)=O' save as 'my_mol'

<span style="color: #d00">There was an error calling DeepSearch</span> <br> 
<span style="color: #ccc">504 Server Error: Gateway Time-out for url: https://sds.app.accelerate.science//api/orchestrator/api/v1/query/run</span> <br> 
