# Deep Search and the Common Client

![](./media/AD_Banner.jpg)
<a id="top"></a>

## You only need to run the Magic command %run if you did not execute init_magic from the command line 

In [None]:
# %run openad_magic.ipynb #Not required if you have run init_magic

## If you set up the Tell Me Large Language Model interface the below will run for you

The below will only run if you have a large language model account. Currently only OpenAI is supported.

In [None]:

%openad tell me how would I create a workspace then search for ibuprofen using deepsearch  then display them with a viewer


Let's now remove the Deep Search Toolkit and re-add it to make sure we have the latest version.

In [None]:
%openad remove toolkit DS4SD

Let's check to make sure it is removed.

In [None]:
%openad get status

Now let's run a batch of commands in one cell: add the DS4SD Deep Search Toolkit, set it as the current context, and call `get status` to check which workspace we are in and that the toolkit is set.

In [None]:
%openad add toolkit DS4SD
%openad set context ds4sd
%openad get status

## What Deep Search Commands Exist ?

Now we will list all commands that are available for the Deep Search Toolkit (DS4SD).

In [None]:
%openad ?  DS4SD

## Displaying Domains 

### Displaying all Collections and Domains

You can display all collections with the following command:
`DISPLAY ALL COLLECTIONS [SAVE AS '<csv_file_name>']`


In [None]:
%openad display all collections

### Let's find out which Collections contain documents we might be interested in.

Display collection matches searches all collections for documents that contain a given search string.<BR><br>
`DISPLAY COLLECTION MATCHES FOR '<search_string>' [SAVE AS '<csv_file_name>']`

In [None]:
%openad display collection matches for 'carbon capture'

Display collections for a specific Domain Name.<BR><br>
`DISPLAY COLLECTIONS FOR DOMAIN '<domain_name>'`

In [None]:
%openad display collections for domain 'Business Insights'

### Subset our list by a set of Domains

Given a list of domain names, the below command will give a list of Collections within those domains<BR><br>
`DISPLAY COLLECTIONS IN DOMAINS FROM LIST [<list_of_domains>] [SAVE AS '<csv_file_name>']`

In [None]:
%openad display collections in domains from list [ 'Business Insights','Climate & Sustainability']
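Conceptually, this command keeps only the collections whose domain appears in the supplied list. The plain-Python sketch below illustrates that idea only; the `records` entries, their field names, and the `collections_in_domains` helper are hypothetical stand-ins and not the toolkit's actual data model.

```python
# Hypothetical rows standing in for the output of `display all collections`.
# The collection names and their domain assignments are made up for illustration.
records = [
    {"collection": "Collection A", "domain": "Business Insights"},
    {"collection": "Collection B", "domain": "Some Other Domain"},
    {"collection": "Collection C", "domain": "Climate & Sustainability"},
]


def collections_in_domains(rows, domains):
    """Return the names of collections whose domain is in `domains`."""
    wanted = set(domains)
    return [row["collection"] for row in rows if row["domain"] in wanted]


selected = collections_in_domains(
    records, ["Business Insights", "Climate & Sustainability"]
)
print(selected)
```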

### Drill into the details behind a Collection

To drill down into the description of a document collection, its source and its date, use the following command: <br> <br>
`DISPLAY COLLECTION DETAILS '<collection_name>' | '<collection_key>'`

In [None]:
%openad display collection details 'ESG Reports'

## Searching a collection inside the Deep Search Repository

The following sections show how to use the Deep Search command for searching collections.

### **Searching for Documents on a Subject**



In this section we search for documents in the arXiv.org data collection that match the input query. For each matched document we return the title, the authors and the link to the original document on arXiv.org.

***This example will demonstrate***

1. How to address a specific data collection
2. How to choose which component of the documents should be returned
3. How to iterate through the complete data collection by fetching page_size=50 results at a time
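The paging idea in item 3 can be sketched in plain Python. This is only an illustration of fetching a fixed number of results at a time: `iterate_pages`, `fake_fetch`, and the in-memory `documents` list are hypothetical stand-ins, and the sketch does not call Deep Search or describe how the toolkit pages internally.

```python
from typing import Callable, Iterator, List


def iterate_pages(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 50
) -> Iterator[dict]:
    """Yield every record by requesting `page_size` results at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size


# Hypothetical in-memory "collection" of 120 documents used as a stand-in backend.
documents = [{"id": i} for i in range(120)]


def fake_fetch(offset: int, size: int) -> List[dict]:
    """Return one slice of the stand-in collection."""
    return documents[offset:offset + size]


results = list(iterate_pages(fake_fetch, page_size=50))
print(len(results))
```

With 120 stand-in documents the loop requests pages of 50, 50 and 20 records, and stops after the short final page.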

The `search collection` command is flexible. In its simplest form it pulls back documents, with snippets, that meet the search criteria; with additional options it can be run against a variety of collections or return the data stored within the matching documents.<br> <br>

To see more about the command, let's run the help for it.

In [None]:
 
%openad search collection  ?

First we will request an estimate of how many documents match the search, so we know we are pulling back a manageable amount.

In [None]:
%openad search collection 'arXiv abstracts' for 'ide("power conversion efficiency" OR PCE) AND organ* '  show (docs) estimate only

### Retrieving results ###
Now we will retrieve and view all results. If we do not use the `return as data` clause, the results are returned as a ***pandas styler*** object that provides an enhanced snippet display of the data.

In [None]:
df_styler= %openad search collection 'arXiv abstracts'  for 'ide("power conversion efficiency" OR PCE) AND organ* ' using \
( system_id=default edit_distance=20  ) show (data docs) 

The Styler object can be viewed straight away or assigned to a variable. To extract the raw data, reference `df_styler.data`, which returns the underlying data frame.
Make sure the cell you are viewing has scrolling enabled in its cell properties. That way you can scroll through all the data in a window rather than having the output take up the entire notebook.


In [None]:
df_styler

### Simply viewing the Results ###
Here we will run the search straight from the magic command and view the result immediately. Notice that we are using the `page_size` and `edit_distance` options to fine-tune the result; try different values for these options.

In [None]:
%openad search collection 'arXiv abstracts' for ' " carbon capture" AND "membrane" ' using (  page_size=10 edit_distance=5 ) show (data docs)  
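When experimenting with different search terms, it can help to assemble the query text in Python before handing it to the magic command. The sketch below is an assumption-laden convenience, not part of the toolkit: `build_query` is a hypothetical helper that only reproduces the quoted-phrase-plus-operator pattern used in the cell above.

```python
def build_query(phrases, operator="AND"):
    """Wrap each phrase in double quotes and join them with a boolean operator."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    for phrase in phrases:
        # The whole query is wrapped in single quotes in the command,
        # so a single quote inside a phrase would end the query early.
        if "'" in phrase or '"' in phrase:
            raise ValueError("phrases must not contain quote characters")
    quoted = ['"{}"'.format(phrase.strip()) for phrase in phrases]
    return " {} ".format(operator).join(quoted)


query = build_query(["carbon capture", "membrane"])
command = (
    "search collection 'arXiv abstracts' for '{}' "
    "using ( page_size=10 edit_distance=5 ) show (data docs)"
).format(query)
print(command)
```

Inside a notebook, the resulting text could then be substituted into the `%openad` line in the same way later cells substitute a variable with `{...}`.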

## **Search _Ibuprofen_ in PubChem** 

In this section we search for all PubChem entries which contain the string _Ibuprofen_.

In the results table we see the name of the chemical, its molecule SMILES and some properties such as the molecular weight and the solubility.


In [None]:
ibuprofen_df = %openad search collection 'pubchem' for 'Ibuprofen' SHOW (data) 
display(ibuprofen_df)

### Working with the results as Data ###
By using the `return as data` clause we return the results as a raw data frame, so that we can pass them to other utilities.

In [None]:
%openad search collection 'pubchem' for 'Ibuprofen' SHOW (data) return as data

### Displaying the molecules in a viewer ###
Now we will view the molecules and select a subset of them for further viewing.<br>
We will do this using the `show molecules` command, which invokes mols2grid and manages launching it for us.<br>

`SHOW MOLECULES USING ( FILE '<mols_file>' | DATAFRAME <dataframe> ) [ SAVE AS '<sdf_or_csv_file>' | AS MOLSOBJECT ]`

In [None]:
%openad show molecules using ?

First we will search for molecular data records related to 'Ibuprofen' and return the result set as data in a data frame, then pass this data frame to `show molecules` for display.

In [None]:
my_df= %openad search collection 'pubchem' for 'Ibuprofen' SHOW (data) return as data
%openad show molecules using dataframe my_df

## **Displaying via a mols2grid object and capturing selections in a data frame** 



We will now take the same data frame and run `show molecules` again, but this time we will have the command return a mols2grid object so we can use the data selection capability of mols2grid within our notebook environment.

In [None]:
x = %openad show molecules using dataframe my_df as molsobject

Now let's display the object, using the "NAME" field for the subset display.

In [None]:
x.display(**{'subset': ['NAME']})

Now select one or more tiles in the mols2grid display, then run the cell below to display the records of the selected molecules.

In [None]:
x.get_selection()

# Searching Molecules

## Search for Similar Molecules

Now let's consider the functions that allow us to search documents containing molecules for relationships between molecules, patents and other information.

First, look at a search for molecules that have a similar makeup to a provided molecule, given as a SMILES string.<br><br>
`SEARCH FOR SIMILAR MOLECULES TO '<SMILES_string>' [SAVE AS '<csv_file_name>']`

In [None]:
# define molecule
smiles_Molecule='CC(C)(c1ccccn1)C(CC(=O)O)Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F' 
# now substitute the variable into the %openad command
mols = %openad search for similar molecules to '{smiles_Molecule}'  
# then display the results
display(mols) 
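The `{smiles_Molecule}` placeholder in the cell above relies on IPython expanding Python variables inside a magic line before the command runs. The sketch below reproduces that substitution in plain Python so the final command text can be inspected; `similar_molecules_command` is a hypothetical helper, and the quote and whitespace check is an added precaution rather than a documented toolkit requirement.

```python
smiles_molecule = "CC(C)(c1ccccn1)C(CC(=O)O)Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F"


def similar_molecules_command(smiles: str) -> str:
    """Build the command text that would follow `%openad` after substitution."""
    # The SMILES string is wrapped in single quotes, so a quote or a space
    # inside it would change how the command is split into tokens.
    if "'" in smiles or any(ch.isspace() for ch in smiles):
        raise ValueError("SMILES must not contain quotes or whitespace")
    return "search for similar molecules to '{}'".format(smiles)


command = similar_molecules_command(smiles_molecule)
print(command)
```

Printing the command first is a cheap way to confirm that the variable was substituted as intended before running the search.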

## Search for Patents Containing a Specified Molecule

We can also search for patents that mention a specific molecule, defined by a SMILES or InChI string, using the following command. <br><br> `SEARCH FOR PATENTS CONTAINING MOLECULE ['<SMILES_molecule>'| '<inchi_molecule>'] [SAVE AS '<csv_file_name>']`

In [None]:
%openad search for patents ?

In [None]:

patents = %openad search for patents containing molecule '{smiles_Molecule}'
display(patents)

## Search for Molecules in a defined List of Patents

Now suppose we have a list of patents of interest and want to find out which other molecules are mentioned in them. We can do this using the following command.<br><br>
`SEARCH FOR MOLECULES IN PATENTS FROM [LIST ['<patent1>', '<patent2>' .....] | DATAFRAME <dataframe_name> | FILE '<workspace_file name>'] [SAVE AS '<csv_file_name>']`

In [None]:
mylist = list(patents['PATENT ID'])
myframe =  %openad search for molecules in patents from list {mylist}
display(myframe)
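When `{mylist}` is expanded, the Python list is rendered as text, which happens to match the bracketed, quoted form in the `LIST [...]` syntax. The sketch below shows that rendering and removes repeated IDs first; the patent identifiers are hypothetical placeholders, and removing duplicates is an optional tidy-up rather than something the command is known to require.

```python
# Hypothetical patent identifiers; in the notebook these come from patents['PATENT ID'].
patent_ids = ["US0000001A1", "US0000002B2", "US0000001A1"]

# dict.fromkeys keeps the first occurrence of each ID and preserves order.
unique_ids = list(dict.fromkeys(patent_ids))

# str() of a list gives the bracketed, single-quoted form used after `from list`.
list_text = str(unique_ids)
command = "search for molecules in patents from list {}".format(list_text)
print(command)
```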

We can also take the resulting data frame and show its molecules in the mols2grid selection viewer using the `show molecules` command.

In [None]:
%openad show molecules using dataframe myframe

## Search for Molecules with instances of a Specified Substructure

We can also search for molecules that contain a given substructure, specified as a SMILES string, using the following command.

`SEARCH FOR SUBSTRUCTURE INSTANCES OF '<SMILES_string>' [SAVE AS '<csv_file_name>']`

The following example searches for molecules containing a defined SMILES substructure and saves the result under the name `my_mol`, which is stored as the csv file 'my_mol.csv' in the current workspace.

In [None]:
%openad search for substructure instances of 'C1(C(=C)C([O-])C1C)=O' save as 'my_mol'
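Once results have been saved as a csv file, they can be read back with standard Python tools. The sketch below is self-contained and makes several assumptions: a temporary directory stands in for the workspace (whose real location is not given here), the `SMILES` column name and the single row are placeholders, and `csv_name` is a hypothetical helper that mirrors the 'my_mol' to 'my_mol.csv' naming described above.

```python
import csv
import os
import tempfile


def csv_name(save_as: str) -> str:
    """Return the file name with a .csv extension, adding one if it is missing."""
    return save_as if save_as.lower().endswith(".csv") else save_as + ".csv"


# A temporary directory stands in for the workspace directory.
with tempfile.TemporaryDirectory() as workspace:
    path = os.path.join(workspace, csv_name("my_mol"))

    # Write a placeholder file so the example can run on its own.
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["SMILES"])
        writer.writeheader()
        writer.writerow({"SMILES": "C1(C(=C)C([O-])C1C)=O"})

    # Read the saved results back as a list of dictionaries, one per row.
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))

file_name = os.path.basename(path)
print(file_name, len(rows))
```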