# RetroChem

Welcome to the RetroChem documentation notebook, your interactive guide to our retrosynthesis toolkit! In this notebook, we’ll walk you through our design process, highlight key challenges we encountered, and showcase how RetroChem can streamline organic synthesis planning.

Below is the table of contents to help you navigate the main sections:

1. **Introduction**  
2. **Preliminary functions** 
3. **Databases**
4. **Frontend** (app) 
5. **Backend** (with working code snippets!)
6. **General remarks, limits, and ways of improvement**

# Introduction

Organic chemistry is the beating heart of any chemistry curriculum, an immense tapestry of functional groups, transformations, and reaction pathways. With thousands of named reactions and countless mechanistic subtleties to master, the sheer volume of information can overwhelm even the most dedicated student. What better way to tame this complexity than with the power of code?

**RetroChem** is our answer: a web-based retrosynthesis assistant that translates familiar IUPAC names or drawn structures into SMILES, and then walks you backward through our hand-curated library of key organic transformations, showing each disconnection step along with the reagents and conditions required to carry it out. By combining cheminformatics tools (RDKit, SMARTS pattern matching) with interactive web widgets, RetroChem aims to kick-start the digital revolution in organic synthesis: helping you explore, learn, and visualize reaction pathways at the click of a button.  

Welcome to the future of organic chemistry education. Let’s turn that mountain of reactions into a streamlined, code-driven discovery journey!  



### No code without imports! 
These are the main package imports used across the project files:

In [6]:
from typing import *
import streamlit as st
import pandas as pd
import os
from streamlit_ketcher import st_ketcher  # type: ignore
from rdkit import Chem
from rdkit.Chem.Draw import MolToImage
from rdkit.Chem import rdChemReactions as Reac
from urllib.request import urlopen
from urllib.parse import quote

# Preliminary functions
#TODO Only cover canonicalize_smiles and name_to_smiles (structure_to_smiles is not used and will probably be removed)

# Let's get started!

First things first: the user selects the database to search for reactions that lead to the target molecule. Several databases are available.
## Selecting a reaction database
The Streamlit front-end shows each `.db` file as a button.  
Below we include **(A)** the **original widget code** for reference and **(B)** a notebook-safe equivalent that simply picks the first database.

| Line(s)                                             | What happens in the UI                                                                                                                         | Why it matters                                                                         |
| --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| `dbs = list(rd.REACTION_DATABASES.keys())`          | Collects the names of every reaction-database that was successfully loaded at startup.                                                         | Defines the menu of options a user can choose from.                                    |
| `if dbs:`                                           | Branch enters only when at least one DB exists.                                                                                                | Prevents empty widgets if no files are present.                                        |
| `cols = st.columns(len(dbs))`                       | Creates *n* equal-width columns in the sidebar.                                                                                                | Gives each database its own tidy button slot.                                          |
| `st.button(d, on_click=choose_database, args=(d,))` | Renders a button labelled with the database name; clicking it calls `choose_database(d)` and stores the choice in `st.session_state.database`. | Lets the user decide which SMARTS library will power all retrosynthesis queries.       |
| `else: st.info("No .db files found.")`              | Shows an informational banner if **no** databases were discovered on disk.                                                                     | Fails gracefully and alerts the user instead of proceeding with an undefined database. |

In [2]:
dbs = list(rd.REACTION_DATABASES.keys())
if dbs:
    cols = st.columns(len(dbs))
    for i, d in enumerate(dbs):
        with cols[i]:
            st.button(d, on_click=choose_database, args=(d,))
else:
    st.info("No .db files found.")
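
Outside a running Streamlit app the buttons above cannot be clicked, so a notebook-safe equivalent can simply pick the first registered database. The sketch below is one way to do that; `pick_default_database` and the database names in the demo dictionary are hypothetical stand-ins for `rd.REACTION_DATABASES`, not part of RetroChem itself.

```python
from typing import Optional


def pick_default_database(databases: dict) -> Optional[str]:
    """Return the first registered database name, or None if nothing is loaded."""
    # Dictionaries preserve insertion order, so "first" means "first registered".
    return next(iter(databases), None)


# Hypothetical stand-in for rd.REACTION_DATABASES (names are made up).
demo_databases = {"basic.db": [], "advanced.db": []}

database = pick_default_database(demo_databases)
if database is None:
    print("No .db files found.")
else:
    print("Using database:", database)
```

Returning `None` instead of raising mirrors the `else: st.info(...)` branch of the widget code: the caller decides how to report an empty registry.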


# Inputting your target molecule

Next, the user provides a target molecule, either by typing its IUPAC or common name or by drawing its structure. That input is then converted into a SMILES string with our `name_to_smiles` or `structure_to_smiles` helper functions.
The `name_to_smiles` function queries the **NCI Chemical Identifier Resolver (Cactus)** to look up any common or IUPAC name you type in and returns the corresponding SMILES string.
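
A name lookup of this kind can be sketched with the `urlopen` and `quote` imports listed earlier. This is a minimal sketch, not RetroChem's actual implementation: `name_to_smiles_sketch`, `build_cir_url`, the `opener` parameter and the timeout value are assumptions made for illustration. The URL follows the resolver's public `/chemical/structure/<identifier>/smiles` scheme.

```python
from urllib.parse import quote
from urllib.request import urlopen

CIR_TEMPLATE = "https://cactus.nci.nih.gov/chemical/structure/{name}/smiles"


def build_cir_url(name: str) -> str:
    """Percent-encode the name so spaces and commas survive in the URL path."""
    return CIR_TEMPLATE.format(name=quote(name.strip()))


def name_to_smiles_sketch(name: str, opener=urlopen, timeout: float = 10.0) -> str:
    """Resolve a chemical name to SMILES (hypothetical helper, see lead-in).

    `opener` is injectable so the function can be exercised without network
    access; by default it is urllib's urlopen, which raises on HTTP errors
    such as an unknown name.
    """
    with opener(build_cir_url(name), timeout=timeout) as response:
        text = response.read().decode("utf-8").strip()
    if not text:
        raise ValueError(f"No SMILES returned for {name!r}")
    # The resolver may return several lines; keep the first one.
    return text.splitlines()[0]
```

Because any network or HTTP failure surfaces as an exception, the `try`/`except` in the front-end code can turn it into the red error banner described in the table.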

| Branch / line(s)                                                    | What the user sees or triggers                                                                      | Why it matters                                                           |
| ------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| `if mode == "Name":`                                                | Enables Name mode, revealing a text box.                                                        | Lets the user type an IUPAC or common name instead of drawing.           |
| `name = st.text_input("Molecule name")`                             | Interactive text field labelled *“Molecule name”*.                                                  | Captures free-text input.                                                |
| `smiles_input = name_to_smiles(name)`                               | **Backend call** to `functions.name_to_smiles`, which queries the NCI Chemical Identifier Resolver. | Converts a human-readable name into a machine-readable SMILES string.    |
| `st.write("SMILES:", smiles_input)`                                 | Echoes the resolved SMILES below the text box.                                                      | Immediate feedback that the lookup succeeded.                            |
| `MolToImage(mol0)` + `st.image(...)`                                | Inline 200 × 200 PNG of the molecule.                                                               | Visual validation of the structure.                                      |
| `except … st.error("❌ Name→SMILES failed …")`                       | Red error banner if the API call or RDKit parsing fails.                                            | Graceful handling of typos or network issues.                            |
| `else:` *(Draw mode)*                                               | Switches the UI to a Ketcher sketch pad.                                                        | Supports freehand drawing for molecules without convenient names.        |
| `drawn = st_ketcher("", key="draw_input")`                          | Launches the Web-based drawing widget.                                                              | Lets the user sketch directly in the browser; the widget returns SMILES. |
| `smiles_input = drawn` → `st.write("SMILES:", …)` → `st.image(...)` | Same display logic as Name mode: print SMILES and show an RDKit image.                              | Keeps the feedback loop identical regardless of input method.            |


In [None]:
if mode == "Name":
    name = st.text_input("Molecule name")
    if name:
        try:
            smiles_input = name_to_smiles(name)
            st.write("SMILES:", smiles_input)
            mol0 = Chem.MolFromSmiles(smiles_input)
            if mol0:
                st.image(MolToImage(mol0, size=(200, 200)))
        except Exception as e:
            st.error(f"❌ Name→SMILES failed: {e}")
else:
    drawn = st_ketcher("", key="draw_input")
    if drawn:
        smiles_input = drawn
        st.write("SMILES:", smiles_input)
        mol0 = Chem.MolFromSmiles(smiles_input)
        if mol0:
            st.image(MolToImage(mol0, size=(200, 200)))

# Getting the possible reactants 

Now things start to get serious!

We're going to walk you through how the retrosynthesis happens.

## Retrosynthesis navigation

| Step / function                                            | What happens in the frontend                                                                                                                             | What happens in the backend                                                                                                                                                                                                                                                                                                                                                                                                 |
| ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `st.button("🔄 Retrosynthesize" …)`                        | Renders the button; on click it calls **`start_retro(smi)`** with the current SMILES.                                                                    | —                                                                                                                                                                                                                                                                                                                                                                                                                           |
| **`start_retro` header**                                   | Receives the product SMILES (`smi`).                                                                                                                     | Reads the chosen database from `st.session_state.database`.                                                                                                                                                                                                                                                                                                                                                                 |
| Guard clauses (`if not smi / not db`)                      | Show a yellow warning banner if the user forgot to supply a molecule or pick a DB.                                                                       | Avoids calling the engine with invalid inputs.                                                                                                                                                                                                                                                                                                                                                                              |
| **History push**<br>`st.session_state.history.append(...)` | Enables the “🔙 Back” button to revisit earlier targets.                                                                                                 | Pure UI bookkeeping—no chemistry yet.                                                                                                                                                                                                                                                                                                                                                                                       |
| **State reset**<br>`selected_smiles`, `reactant_list`      | Prepares the session for a fresh search.                                                                                                                 | Clears any previous reactant list so the UI shows options for the new molecule.                                                                                                                                                                                                                                                                                                                                             |
| `canon = canonicalize_smiles(smi)`                         | —                                                                                                                                                        | RDKit parses the SMILES and rewrites it in canonical form, ensuring it matches dictionary keys in the reaction engine.                                                                                                                                                                                                                                                                                                      |
| `rd.list_reactants(canon, db)`                             | —                                                                                                                                                        | **Core chemistry step:**<br>1. Looks up `REACTION_REVERSERS[db]`, a list of *reverser* callables.<br>2. Iterates through each reverser:<br>  • Converts the product SMILES to an RDKit `Mol`.<br>  • Runs `rxn.RunReactants((mol,))` **backwards** via the SMARTS pattern.<br>  • On success, returns `("react1.react2", conditions_dict)`.<br>3. Collects every hit into a list of **combos** and hands it back to the UI. |
| `st.session_state.combos = …`                              | Stores the list of `(reactants, conditions)` pairs. The next UI screen loops over this list to show option cards with structure thumbnails and reagents. | No further processing—data is ready for display.                                                                                                                                                                                                                                                                                                                                                                            |
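
The canonicalisation step in the table can be sketched as below, assuming `canonicalize_smiles` is a thin wrapper around RDKit. The name `canonicalize_smiles_sketch` is hypothetical, and raising `ValueError` on unparsable input is an assumption rather than confirmed behaviour of the real helper.

```python
from rdkit import Chem


def canonicalize_smiles_sketch(smiles: str) -> str:
    """Return RDKit's canonical SMILES for the input string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None (rather than raising) when parsing fails.
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol, canonical=True)


# Two different spellings of ethanol collapse to the same canonical string.
same = canonicalize_smiles_sketch("OCC") == canonicalize_smiles_sketch("C(O)C")
print("Equivalent after canonicalisation:", same)
```

Canonical strings make it safe to compare or look up molecules by text, regardless of how the user typed or drew them.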


# Behind the scenes 

Behind that single '🔄 Retrosynthesize' click sits an engine that parses SMARTS, reverses atom-mapped reactions, and queries the preloaded rule database, so that a short burst of chemistry computation comes back as near-instant UI feedback.

All our functions are linked to one another, so hang in there to understand our code!

# How reverse_reaction_generator Works 

reverse_reaction_generator is the small but powerful helper that lets RetroChem turn a forward reaction rule (written in SMARTS) into a callable reverse reaction used during retrosynthesis.
Below is a step-by-step description of what happens inside and why it matters.

1. Compile the SMARTS

In [4]:
rxn = Reac.ReactionFromSmarts(reaction_smart[0])
cond = reaction_smart[1]


2. Create the closure (reverser_to_smiles)

In [None]:
def reverser_to_smiles(smiles: str) -> tuple[str, dict] | None:
    mol = Chem.MolFromSmiles(smiles)
    ...

3. Run the reaction backwards 

RDKit treats the product as an input and produces possible reactant tuples.
Only the first hit (prods[0]) is taken for speed; exhaustive enumeration could be added later.

In [None]:
prods = rxn.RunReactants((mol,))

4. Return a clean, canonicalised reactant string
Sorting + canonical=True ensures deterministic ordering, so "A.B" and "B.A" are treated the same downstream.

In [None]:
first = prods[0]  # keep only the first reactant set, as described in step 3
first_smiles = sorted(Chem.MolToSmiles(m, canonical=True) for m in first)
combo = ".".join(first_smiles)
return (combo, cond)
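
Put together, the four steps above could look like the following sketch. It is not the project's actual code: `reverse_reaction_generator_sketch` is a hypothetical name, reversing the rule by swapping the two sides of the SMARTS string is an assumption about how the reversal is done, the explicit `SanitizeMol` call is an added safety step, and the amide rule with its placeholder conditions is only an illustration.

```python
from typing import Callable, Optional, Tuple

from rdkit import Chem
from rdkit.Chem import rdChemReactions as Reac

Reverser = Callable[[str], Optional[Tuple[str, dict]]]


def reverse_reaction_generator_sketch(reaction_smart: tuple) -> Reverser:
    """Turn a (forward_smarts, conditions) pair into a reverse callable."""
    forward_smarts, cond = reaction_smart
    # ASSUMPTION: the reverse rule is built by swapping the sides around ">>".
    reactant_side, product_side = forward_smarts.split(">>")
    rxn = Reac.ReactionFromSmarts(f"{product_side}>>{reactant_side}")  # compiled once

    def reverser_to_smiles(smiles: str) -> Optional[Tuple[str, dict]]:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"Invalid SMILES: {smiles!r}")
        prods = rxn.RunReactants((mol,))
        if not prods:
            return None  # the rule does not apply to this molecule
        first = prods[0]  # first match only
        for m in first:
            Chem.SanitizeMol(m)  # recompute valences and implicit hydrogens
        first_smiles = sorted(Chem.MolToSmiles(m, canonical=True) for m in first)
        return ".".join(first_smiles), cond

    return reverser_to_smiles


# Illustrative forward rule: acid + amine -> amide (conditions are placeholders).
amide_rule = ("[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]", {"reagent": "placeholder"})
reverse_amide = reverse_reaction_generator_sketch(amide_rule)
print(reverse_amide("CC(=O)NC"))  # N-methylacetamide as the target
print(reverse_amide("CCC"))       # propane: no amide bond to disconnect
```

The SMARTS string is parsed when the factory is called, so each later call to the returned function only pays for the SMILES parse and the pattern match.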

### 🚀 Key Optimisation Tricks — `reverse_reaction_generator`

`reverse_reaction_generator` is the tiny factory-function that turns every **forward** SMARTS rule in our JSON database into a **reverse** callable used during retrosynthesis.  
Under the hood it applies four optimisation tricks that make RetroChem snappy even on a modest laptop.

| Optimisation | How it’s implemented | Pay-off |
|--------------|----------------------|---------|
| **1 · SMARTS pre-compilation** | `rxn = Reac.ReactionFromSmarts(smart)` is executed **once**, when the database is loaded. | Parsing and atom-mapping a SMARTS string is the slowest RDKit step; doing it just once per rule removes that cost from every user query. |
| **2 · Closure capture** | The nested `reverser_to_smiles` function *closes over* the compiled `rxn` object **and** its `conditions` dict. | No global look-ups, no re-parsing, fully self-contained. A plain Python call does all the work during search. |
| **3 · Fail-fast error handling** | *Invalid SMILES* → `ValueError`; *no products* → returns `None` immediately. | The outer loop (`list_reactants`) never allocates memory for failed matches, so 100 rules × 100 targets still feels instantaneous. |
| **4 · First-match strategy** | Only `prods[0]` is kept; exhaustive enumeration is skipped. | Yields a ~10× speed-up on large SMARTS (good enough for teaching / demo scale). Full enumeration can be toggled on later if needed. |

#### Net effect on the user experience
* **Sub-second responses** even with dozens of SMARTS templates.  
* **Smooth recursion** — drilling into reactants remains instant, inviting exploration.  


### ⚙️  Supporting Utilities — `register_database` & `list_reactants`

Although `reverse_reaction_generator` is the star, these two thin wrappers make the whole engine plug-and-play. 

| Function                | What it does                                                                                                                                                                                                  | Mini-optimisation                                                                                                                | Why it matters                                                                                                       |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| **`register_database`** | *Loads once per `.db` file*.<br>1️⃣ Caches the raw SMARTS list in `REACTION_DATABASES`.<br>2️⃣ Immediately converts every rule into a compiled **reverser** and stores that list in `REACTION_REVERSERS[db]`. | Pre-compilation happens **up-front**, not at query time, so the main search loop never touches disk or parses SMARTS.            | Keeps start-up cost isolated and guarantees constant-time access during every user click.                            |
| **`list_reactants`**    | *Runs during every retrosynthesis step*.<br>Fetches the prepared reverser list → iterates → returns only successful `(reactant_combo, conditions)` pairs.                                                     | • Early-bail if DB name is missing.<br>• Simple Python loop over already-compiled functions (no I/O, no allocation on failures). | The entire “find routes” step stays linear in **number of rules**, with each iteration costing only a function call. |
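
The interplay of the two wrappers can be sketched in plain Python. The registry names `REACTION_DATABASES` and `REACTION_REVERSERS` come from the table above; everything else is an assumption made for illustration: the `_sketch` function names, the `make_reverser` parameter, returning an empty list for an unknown database, and the toy string-matching reverser that stands in for the RDKit-based one.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Rule = Tuple[str, dict]                      # (forward rule text, conditions)
Combo = Tuple[str, dict]                     # ("react1.react2", conditions)
Reverser = Callable[[str], Optional[Combo]]

REACTION_DATABASES: Dict[str, List[Rule]] = {}
REACTION_REVERSERS: Dict[str, List[Reverser]] = {}


def register_database_sketch(
    name: str, rules: Iterable[Rule], make_reverser: Callable[[Rule], Reverser]
) -> None:
    """Cache the raw rules and build every reverser once, at load time."""
    rule_list = list(rules)
    REACTION_DATABASES[name] = rule_list
    REACTION_REVERSERS[name] = [make_reverser(rule) for rule in rule_list]


def list_reactants_sketch(smiles: str, database: str) -> List[Combo]:
    """Run every prepared reverser and keep only the successful hits."""
    reversers = REACTION_REVERSERS.get(database)
    if reversers is None:  # early bail: unknown database name
        return []
    combos = []
    for reverse in reversers:
        hit = reverse(smiles)
        if hit is not None:
            combos.append(hit)
    return combos


def make_toy_reverser(rule: Rule) -> Reverser:
    """Toy stand-in: exact text match on the product side, no chemistry."""
    text, conditions = rule
    reactants, product = text.split(">>")
    return lambda smiles: (reactants, conditions) if smiles == product else None


register_database_sketch(
    "demo.db",
    [("CC(=O)O.CN>>CNC(C)=O", {"reagent": "placeholder"}),
     ("CC(=O)O.CO>>COC(C)=O", {"catalyst": "placeholder"})],
    make_toy_reverser,
)
print(list_reactants_sketch("CNC(C)=O", "demo.db"))
```

Passing the reverser factory in as an argument keeps the registry logic independent of RDKit, which also makes it easy to test on its own.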


# Key mechanistic principles RetroChem relies on
1. Functional-group recognition
SMARTS sub-patterns identify reactive handles (e.g. protonatable hetero-atoms, π-bonds, leaving groups).

2. Bond-making / bond-breaking rules
Each SMARTS rule is written forward (reactants → product) in the JSON file, but RetroChem wraps it with rxn.RunReactants in reverse to perform a disconnection.

3. Atom mapping
Numbers like [C:2] in the SMARTS string pin “before” and “after” atoms together, so stereochemistry and valence are preserved when RDKit generates the reactant set.

4. Reaction conditions
Catalysts, solvents, temperatures, etc. are stored as plain key-value pairs alongside the SMARTS. They do not affect the pattern match itself, but they are crucial for a chemist reading the output—conditions often dictate chemoselectivity and yield.

5. No kinetics or thermodynamics (yet)
RetroChem checks possibility, not feasibility. It treats every SMARTS match as equally viable; ranking by rate or ΔG would be a future extension. This is why conditions are listed for each possible reaction: they give the chemist a first indication of how the step could be carried out. RetroChem is thus a first guide towards understanding organic chemistry mechanisms.
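
Principles 2 to 4 can be made concrete with a small parsing sketch. The file layout shown here (a JSON list of `[smarts, conditions]` pairs) is an assumption inferred from how `reaction_smart[0]` and `reaction_smart[1]` are used earlier; the rule, the condition values, and the helpers `mapped_atoms` and `parse_reaction_db` are hypothetical. The map-number check is an illustrative sanity test, not something RetroChem is known to perform.

```python
import json
import re

# Hypothetical database content: one forward rule with placeholder conditions.
DB_TEXT = """
[
  ["[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]",
   {"reagent": "placeholder", "solvent": "placeholder"}]
]
"""


def mapped_atoms(side: str) -> set:
    """Collect the atom-map numbers (the 2 in [C:2]) on one side of a rule."""
    return {int(n) for n in re.findall(r":(\d+)\]", side)}


def parse_reaction_db(text: str) -> list:
    """Load (smarts, conditions) pairs and check the atom mapping is consistent."""
    rules = []
    for smarts, conditions in json.loads(text):
        if smarts.count(">>") != 1:
            raise ValueError(f"Expected exactly one '>>' in {smarts!r}")
        reactants, products = smarts.split(">>")
        # Every mapped product atom should originate from a mapped reactant atom.
        if not mapped_atoms(products) <= mapped_atoms(reactants):
            raise ValueError(f"Unmatched map numbers in {smarts!r}")
        rules.append((smarts, dict(conditions)))
    return rules


rules = parse_reaction_db(DB_TEXT)
smarts, conditions = rules[0]
print(sorted(mapped_atoms(smarts.split(">>")[1])), conditions)
```

The conditions dictionary travels next to the SMARTS string but is never consulted during matching, which is exactly the separation described in point 4.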

# It doesn't end here: you can also trace where each reactant comes from!

| Branch / key line(s)                                                       | What appears in the UI                                                                                                                                                                                                  | Backend implications                                                                                                                                     |
| -------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **First-round options**<br>`if selected_smiles and reactant_list is None:` | Section header “🧩 Retrosynthesis Options” with one column per *reactant combination*. Each column shows a 200 × 200 image, a button **“Option n”**, and a mini table of reaction conditions (Catalyst, Solvent, etc.). | Pulls data from `st.session_state.combos`, which was populated by **`rd.list_reactants`** in `start_retro`.                                              |
| `st.button(..., on_click=choose_combo, args=(i,))`                         | Clicking **Option n** stores that combo and switches the panel to fragment mode.                                                                                                                                        | `choose_combo(i)` splits the dot-delimited reactant string into a Python list → `st.session_state.reactant_list`, ready for the next branch.             |
| **“No routes found” guard**                                                | If `combos` is empty, a blue info banner tells the user no disconnections matched the current database.                                                                                                                 | Prevents an empty layout and signals that the search space is exhausted.                                                                                 |
| **Fragment drill-down**<br>`elif st.session_state.reactant_list:`          | New header “🔹 Next Reactant”; one column per fragment with its SMILES, image, and a **“Reactant n”** button.                                                                                                           | Enables recursive retrosynthesis: each button calls **`choose_reactant(p)`**, which repeats the *canonicalise → `list_reactants`* cycle on the fragment. |
| `st.button(..., on_click=choose_reactant, args=(p,))`                      | User selects which fragment to dissect next.                                                                                                                                                                            | `choose_reactant` updates `selected_smiles`, clears `reactant_list`, and reloads `combos`; the UI re-enters the first-round branch for the new target.   |


In [None]:
if st.session_state.selected_smiles and st.session_state.reactant_list is None:
    st.markdown("## 🧩 Retrosynthesis Options")
    st.caption("Select one of the reactant options below.")
    combos = st.session_state.combos
    if not combos:
        st.info("No routes found.")
    else:
        conds = [c[1] for c in combos]
        smis = [c[0] for c in combos]
        cols = st.columns(len(smis))
        for i, smi in enumerate(smis):
            with cols[i]:
                m = Chem.MolFromSmiles(smi)
                if m:
                    st.image(MolToImage(m, size=(200, 200)))
                st.button(f"Option {i+1}", on_click=choose_combo, args=(i,))
                df = pd.DataFrame(
                    {"Conditions": list(conds[i].values())},
                    index=[k.capitalize() for k in conds[i].keys()],
                )
                st.table(df)
elif st.session_state.reactant_list:
    st.markdown("## 🔹 Next Reactant")
    st.caption("Choose a fragment for further retrosynthesis.")
    parts = st.session_state.reactant_list
    cols = st.columns(len(parts))
    for j, p in enumerate(parts):
        with cols[j]:
            st.write(p)
            m = Chem.MolFromSmiles(p)
            if m:
                st.image(MolToImage(m, size=(200, 200)))
            st.button(f"Reactant {j+1}", on_click=choose_reactant, args=(p,))
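
The two callbacks wired to those buttons can be sketched without Streamlit by using a plain dictionary in place of `st.session_state`. The state keys follow the table above; the `_sketch` function names and the injected `find_combos` callable (standing in for the canonicalise → `list_reactants` cycle) are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Combo = Tuple[str, dict]


def choose_combo_sketch(state: dict, index: int) -> None:
    """Split the chosen dot-delimited reactant string into separate fragments."""
    reactants, _conditions = state["combos"][index]
    state["reactant_list"] = reactants.split(".")


def choose_reactant_sketch(
    state: dict, smiles: str, find_combos: Callable[[str], List[Combo]]
) -> None:
    """Make one fragment the new target and reload its disconnection options."""
    state["selected_smiles"] = smiles
    state["reactant_list"] = None  # sends the UI back to the options branch
    state["combos"] = find_combos(smiles)


# Dictionary stand-in for st.session_state after a first search.
state = {
    "selected_smiles": "CNC(C)=O",
    "reactant_list": None,
    "combos": [("CC(=O)O.CN", {"reagent": "placeholder"})],
}

choose_combo_sketch(state, 0)
print(state["reactant_list"])

# No further rules match in this toy example, so the lookup returns nothing.
choose_reactant_sketch(state, "CN", lambda smiles: [])
print(state["selected_smiles"], state["combos"])
```

Resetting `reactant_list` to `None` is what makes the first branch of the display code run again for the new target.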


RetroChem: Challenges and Features – General Remarks
===================================================


Motivations
-----------

Organic chemistry is rich in transformations but poor in integrated, student-friendly tooling.  
RetroChem was born to…

* **Demystify retrosynthesis** – replace static textbook arrows with clickable, data-driven pathways.  
* **Bridge theory and practice** – couple SMARTS rules with concrete laboratory conditions.  
* **Encourage exploration** – let learners “peel back” a target molecule step-by-step and see *why* each cut makes chemical sense.  

Main Features
-------------

* **Dual input modes** – type a molecule name *or* sketch it in a Ketcher canvas.  
* **One-click retrosynthesis** – generate all first-step disconnections plus reaction conditions.  
* **Recursive exploration** – drill down on any reactant to continue the synthetic tree.  
* **Thumbnail previews** – every SMILES string is paired with a 200 × 200 px RDKit image for instant recognition.  
* **History navigation** – walk back through previous targets with a single button.  
* **Custom database builder** – add new reactants → product SMARTS (with conditions) from inside the browser; changes reload instantly.

Challenges Encountered
----------------------

1. **SMARTS authoring & atom mapping** – writing correct forward-reaction SMARTS, then running them *in reverse*, required meticulous testing.  
2. **Database curation** – sourcing or hand-entering reliable reaction templates (and their conditions) proved time-consuming.  
3. **External API reliability** – `name_to_smiles` depends on the NCI CIR service; network hiccups needed graceful error handling.  
4. **Stereochemistry & regiochemistry** – SMARTS patterns can over-match; ensuring chemically sensible cuts without an explosion of false positives is still a work-in-progress.  
5. **Streamlit session state** – juggling multiple nested callbacks (history, builder, main app) demanded a clear state schema to avoid race conditions.  
6. **Cross-platform line endings & Git** – CRLF/LF conversions occasionally produced noisy diffs and merge conflicts.

Tools Used
----------

* **RDKit** – cheminformatics toolkit for SMILES handling, sub-structure search, and reaction execution.  
* **Streamlit** – rapid, reactive web UI.  
* **streamlit-ketcher** – embedded molecule drawing widget.  
* **Pandas** – tabular display of reaction conditions.  
* **Python 3.11** – core language and ecosystem.  
* **JSON** – storage format for reaction databases (`*.db` files).  
* **Git & GitHub** – version control and collaboration.  


Ways of improvement
-------------------

RetroChem turns a traditionally chalk-and-talk exercise into an interactive, code-backed learning experience.  
While the current version already supports multi-step exploration and on-the-fly database editing, several avenues remain:

* **Route ranking** by heuristics (step count, reagent cost, literature frequency).  
* **Energetic feasibility** estimation via semi-empirical calculations.  
* **Better stereochemical fidelity** using enhanced SMARTS and atom-mapping tools.  
* **Arrow pushing** to better visualize the sites of attack.

Even in its present form, RetroChem lowers the barrier to systematic retrosynthetic thinking—helping students swap guesswork for data-driven insight, one click at a time.
