In [None]:
%reload_ext openad.notebooks.styles

<!-- Header banner -->
<div class="banner"><div>Working with the Deep Search Plugin</div><b>OpenAD <span>Tutorial</span></b></div>

### Table of Contents

1. [Getting Started](#Getting-Started)
2. [Searching for Molecules](#Searching-for-Molecules)
3. [Exploring Collections](#Exploring-Collections)
4. [Searching a Collection](#Searching-a-Collection)

## Getting Started

### Installation
If you haven't already, you can install the plugin directly from its [GitHub repo](https://github.com/acceleratedscience/openad-plugin-ds#readme).
    
    pip install git+https://github.com/acceleratedscience/openad-plugin-ds

### Magic Commands
Magic commands let you interact with the OpenAD shell.
1. `%openad` - Display results directly in your notebook
2. `%openadd` - Store the returned data in a variable

To learn more, check the [OpenAD intro to magic commands](https://github.com/acceleratedscience/openad-toolkit/blob/main/openad/notebooks/magic_commands.ipynb).

### About Deep Search
To learn about what this plugin does, and to list its available commands, run:

    ds

In [None]:
%openad ds

### Command Documentation

Every command has detailed documentation where you can find everything you need to know, including optional parameters and examples.

To see the documentation of a command, just run the beginning of the command followed by a question mark.

In [None]:
%openad ds reset ?

## Searching for Molecules

### Similar Molecules

    ds search for molecules similar to <smiles>

In [None]:
smiles = 'CC(C)(c1ccccn1)C(CC(=O)O)Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F'
%openad ds search for molecules similar to {smiles}
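
The `{smiles}` placeholder in the cell above is filled in by IPython before the command is passed to OpenAD. As a rough sketch of that substitution in plain Python (the `command_template` name is only for illustration and is not part of OpenAD):

```python
# Illustrative only: mimic how a {smiles} placeholder is filled into a command line.
smiles = 'CC(C)(c1ccccn1)C(CC(=O)O)Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F'

# Hypothetical template name; the command text matches the one used above.
command_template = 'ds search for molecules similar to {smiles}'
command = command_template.format(smiles=smiles)

print(command)
```

Printing the expanded command this way can help when a search returns nothing and you want to check what was actually sent.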

### By Substructure

    ds search for molecules with substructure <smiles>

In [None]:
%openad ds search for molecules with substructure 'C1(C(=C)C([O-])C1C)=O'

### Across Patents

#### From a List

    ds search for molecules in patents from list ['<patent_id>','<patent_id>',...]

In [None]:
# Basic example
%openad ds search for molecules in patents from list ['CN108473493B','US20190023713A1']
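
When the patent ids come from user input or another tool, it can help to tidy them before building the command. The sketch below is one way to do that; `patent_list_literal` is a hypothetical helper, not part of the plugin, and it simply produces the compact `['<patent_id>','<patent_id>',...]` form shown in the syntax above.

```python
def patent_list_literal(patent_ids):
    """Format patent ids as a compact list literal, e.g. ['A','B'].

    Hypothetical helper for illustration; not part of OpenAD.
    """
    # Strip stray whitespace and skip empty entries
    cleaned = [pid.strip() for pid in patent_ids if pid and pid.strip()]
    return "[" + ",".join(f"'{pid}'" for pid in cleaned) + "]"


raw_ids = [' CN108473493B', '', 'US20190023713A1 ']
command = f"ds search for molecules in patents from list {patent_list_literal(raw_ids)}"
print(command)
```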

In [None]:
# Practical example
from IPython.display import display, HTML

# 1) Find patents containing a certain molecule
smiles = 'CC(C)(c1ccccn1)C(CC(=O)O)Nc1nc(-c2c[nH]c3ncc(Cl)cc23)c(C#N)cc1F'
patents = %openadd ds search for patents containing molecule {smiles}

# 2) Search for other molecules in these patents
if patents is not None:
    # Show the patents we found before searching them
    display(patents)
    patent_ids = list(patents["publication_id"])
    %openad ds search for molecules in patents from list {patent_ids}
else:
    display(HTML(f'<span style="color:#d00">Something went wrong finding patents containing {smiles}</span>'))

#### From a DataFrame

    ds search for molecules in patents from dataframe <dataframe_name>

In [None]:
import pandas as pd

# Create a Pandas DataFrame with patent ids
patent_ids = ['CN108473493B','US20190023713A1']
df = pd.DataFrame(patent_ids, columns=['patent id'])

In [None]:
%openad ds search for molecules in patents from dataframe df

#### From a File

    ds search for molecules in patents from file '<filename.csv>'

For the purpose of this demo, we'll store a .csv file with patent ids in your workspace.

In [None]:
# Prep
patent_ids = ['CN108473493B','US20190023713A1']
cmd_pointer = %openadd cmd_pointer
workspace_path = cmd_pointer.workspace_path()
csv_file_path = f'{workspace_path}/ds_demo_patents.csv'

# Store the patent ids in a CSV file in your workspace
df = pd.DataFrame(patent_ids, columns=['patent id'])
df.to_csv(csv_file_path)

In [None]:
# Inspect the file we just created
# Note: the `open` command is macOS-specific; on Linux, `xdg-open` is the usual equivalent
import subprocess
_ = subprocess.run(["open", csv_file_path])

In [None]:
%openad ds search for molecules in patents from file 'ds_demo_patents.csv'
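
As a platform-independent way to check such a file, the sketch below writes the same patent ids with Python's standard `csv` module and reads them back. A temporary directory stands in for the workspace here. Unlike `DataFrame.to_csv` with its default settings, no index column is written. Treat this as an illustration of writing and reading the file, not as the layout the plugin requires.

```python
import csv
import os
import tempfile

patent_ids = ['CN108473493B', 'US20190023713A1']

# Assumption: a temporary directory stands in for the OpenAD workspace
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_file_path = os.path.join(tmp_dir, 'ds_demo_patents.csv')

    # Write a header row followed by one patent id per row
    with open(csv_file_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['patent id'])
        writer.writerows([pid] for pid in patent_ids)

    # Read the file back to confirm what was stored
    with open(csv_file_path, newline='') as f:
        rows = list(csv.reader(f))

print(rows)
```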

## Exploring Collections

Before you can search a collection, you'll need to know _which_ collections are available.

### Overview of Collections

    ds list all collections [ details ]

In [None]:
# Overview of all available collections
%openad ds list all collections

In [None]:
# Description of all available collections
%openad ds list all collections details

You can also request the description of a single collection.

    ds list collection details '<collection_name_or_key>'

In [None]:
%openad ds list collection details 'ipcc'

### Find Collections by Domain

If you are looking for collections within a certain domain, you can first list the available domains...

    ds list all domains
    
... and then list the collections for the domain(s) you want.

    ds list collections for domain '<domain_name>'
    ds list collections for domains ['<domain_name>','<domain_name>',...]

In [None]:
%openad ds list all domains

In [None]:
%openad ds list collections for domain 'Materials Science'
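
The two command forms shown above differ only in `domain` versus `domains`. If you build these commands programmatically, a small helper can pick the right form; `domain_command` below is a hypothetical name used for illustration and is not provided by the plugin.

```python
def domain_command(domains):
    """Build the 'list collections' command for one or more domains.

    Hypothetical helper for illustration; not part of OpenAD.
    """
    # A single string uses the singular form with one quoted name
    if isinstance(domains, str):
        return f"ds list collections for domain '{domains}'"
    # Any other iterable uses the plural form with a bracketed list
    quoted = ",".join(f"'{name}'" for name in domains)
    return f"ds list collections for domains [{quoted}]"


print(domain_command('Materials Science'))
print(domain_command(['Materials Science', 'Scientific Literature']))
```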

In [None]:
%openad ds list collections for domains ['Materials Science','Scientific Literature']

### Find Collections by Content

If you're still not sure which collection to search, you can look for collections relevant to your topic.

    ds list collections containing '<search_query>'

In [None]:
%openad ds list collections containing '"carbon capture"'

## Searching a Collection

Deep Search allows you to search across a variety of collections, returning documents with snippets highlighting the data matching your search criteria.

    ds search collection '<collection_name_or_key>' for '<search_query>'

### Command Documentation
Because this command takes a large number of parameters, we recommend starting with a look at the available options, only some of which we'll cover here.

In [None]:
%openad ds search collection ?

### <span style="color: green">Example A:</span> Query arXiv for "*power conversion efficiency*"

In this example, we'll search for the input query in documents from the arXiv.org data collection. For each matched document, we'll return the title, the authors, and the link to the original document on arXiv.org.

#### What we'll cover:
1. How to address a specific data collection
2. How to choose which component of the documents should be returned
3. How to iterate through the complete data collection by fetching `page_size=50` results at a time


#### Getting the result estimate

First, we'll request an estimate of how many documents match the search, so we know we'll be pulling back a manageable amount.

In [None]:
%openad ds search collection 'arXiv abstracts' for 'ide("power conversion efficiency" OR PCE) AND organ* ' show (docs) estimate only
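
Once you have an estimate, a quick calculation tells you how many pages a full iteration would take at `page_size=50`. The sketch below assumes a made-up estimate of 1234 documents; `pages_needed` is a hypothetical helper, not part of the plugin.

```python
def pages_needed(estimated_docs, page_size=50):
    """Return how many pages are needed to fetch all estimated documents."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    # Integer ceiling division: a partially filled last page still counts
    return (estimated_docs + page_size - 1) // page_size


estimate = 1234  # hypothetical value, not an actual search result
print(pages_needed(estimate))  # 25
```

If the page count looks too high, narrowing the query before fetching the documents is usually cheaper than paging through everything.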

### Working with Results Data

By using the `%openadd` magic command, we can store the results in a dataframe and manipulate them as we wish.

In [None]:
# Load results in a dataframe
df = %openadd ds search collection 'pubchem' for 'Ibuprofen' show (data)

In [None]:
# Display the dataframe
df

In [None]:
# Count the results
result_count = len(df.index)
print(f'Our query returned {result_count} molecules:')

# List the returned smiles
smiles_list = df['SMILES'].tolist()
for sm in smiles_list:
    print('- ' + sm)
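
If the returned list contains repeated entries, you may want to remove them before loading the molecules into your working set. The sketch below uses sample values rather than actual query output, and it only removes exact string duplicates: two different SMILES strings describing the same molecule are not detected.

```python
# Sample values for illustration; not the actual output of the query above
smiles_list = [
    'CC(C)Cc1ccc(C(C)C(=O)O)cc1',
    'CC(C)Cc1ccc(C(C)C(=O)O)cc1',
    'CC(C)Cc1ccc(C(C)C(=O)OCCN2CCN(c3cccc(Cl)c3)CC2)cc1',
]

# dict.fromkeys keeps the first occurrence of each string, in original order
unique_smiles = list(dict.fromkeys(smiles_list))

print(f'{len(smiles_list)} entries, {len(unique_smiles)} unique')
```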

In [None]:
# Load the results in your molecule working set
%openad load molecules using dataframe df

In [None]:
# List the molecules in your working set
%openad list molecules

In [None]:
# Visualize the molecules in your working set
%openad show molecules

In [None]:
# Visualize a single molecule
%openad show molecule CC(C)Cc1ccc(C(C)C(=O)OCCN2CCN(c3cccc(Cl)c3)CC2)cc1