# RetroChem

Welcome to the RetroChem documentation notebook, your interactive guide to our retrosynthesis toolkit! In this notebook, we’ll walk you through our design process, highlight key challenges we encountered, and showcase how RetroChem can streamline organic synthesis planning.

Below is the table of contents to help you navigate the main sections:

1. **Introduction**   
2. **Databases**
3. **Frontend** (app) 
4. **Backend** (with working code snippets!)
5. **General Remarks, limits, ways of improvement** 

# Introduction

Organic chemistry is the beating heart of any chemistry curriculum, an immense tapestry of functional groups, transformations, and reaction pathways. With thousands of named reactions and countless mechanistic subtleties to master, the sheer volume of information can overwhelm even the most dedicated student. What better way to tame this complexity than with the power of code?

**RetroChem** is our answer: a web-based retrosynthesis assistant that translates familiar IUPAC names or drawn structures into SMILES, then walks you backward through our hand-curated library of key organic transformations, showing the reagents and conditions required to carry out each step. By combining cheminformatics tools (RDKit, SMARTS pattern matching) with interactive web widgets, RetroChem aims to help you explore, learn, and visualize reaction pathways at the click of a button.

Welcome to the future of organic chemistry education. Let’s turn that mountain of reactions into a streamlined, code-driven discovery journey!  



### No code without imports! 
These are the main package imports used across the different files:

In [None]:
from typing import *
import json 
import streamlit as st
import pandas as pd
import os
from streamlit_ketcher import st_ketcher  
from rdkit import Chem
from rdkit.Chem.Draw import MolToImage
from rdkit.Chem import rdChemReactions as Reac
from urllib.request import urlopen
from urllib.parse import quote

### Let's get started ! 

# Databases
RetroChem supports multiple reaction databases, each tailored to different levels and domains of organic chemistry. First, we built a handmade database that covers the core reactions taught in the first and second years of the EPFL chemistry curriculum. This includes standard transformations and their associated reaction conditions, helping students review and apply known retrosynthetic logic.

In addition, we integrated a second database sourced from an open-source repository by Datamol (more info on https://github.com/datamol-io/datamol), which focuses on more advanced medicinal chemistry reactions, particularly heterocycle constructions. **This database was parsed using the handmade code available in the file parsing_database.py.**

Finally, the platform includes a custom database builder that allows users to create or extend their own databases by defining reaction SMARTS and adding experimental conditions. This ensures flexibility and supports both educational and exploratory use cases. The builder is discussed in more detail later in the report.

# Frontend / app
**Note:** The code snippets in this section are illustrative and not meant to be executed directly within this notebook. They are included to highlight key parts of the frontend code. To run the full application, navigate to the src/retrochem directory in your terminal and execute: 

streamlit run app.py 

## Managing Session States

RetroChem relies on Streamlit’s `st.session_state` to preserve context across interactions. We store variables like:
- `selected_smiles`: the current target molecule
- `reactant_list`: fragments to be explored
- `history`: all previously selected targets
- `database`: the chosen reaction set
- and much more

This ensures the user can:
- Navigate back in the history using “🔙 Back”
- Restart using “🧹 Start Over”
- Dynamically explore branches without page reloads
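
As a minimal sketch of this pattern, the snippet below uses a plain dictionary as a stand-in for `st.session_state` (which supports the same key-based access), so it runs without Streamlit. The default values and the `go_back` helper are assumptions made for illustration; the real app tracks more keys and may organise its callbacks differently.

```python
import copy

# Hypothetical defaults; the real app tracks more keys than these four.
DEFAULTS = {
    "selected_smiles": None,
    "reactant_list": None,
    "history": [],
    "database": None,
}

def init_state(state):
    """Fill in any missing key without overwriting existing values."""
    for key, value in DEFAULTS.items():
        if key not in state:
            state[key] = copy.deepcopy(value)  # fresh list for each session
    return state

def go_back(state):
    """Restore the most recent target from the history, if there is one."""
    if state["history"]:
        state["selected_smiles"] = state["history"].pop()
        state["reactant_list"] = None
    return state

# Usage: a dict plays the role of st.session_state.
session = init_state({"database": "demo"})
session["history"].append("CCO")      # previous target
session["selected_smiles"] = "CC=O"   # current target
go_back(session)
print(session["selected_smiles"])
```

Initialising only the missing keys matters because Streamlit reruns the script on every interaction; overwriting existing values would wipe the user's progress.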


## Callback Overview

Throughout the app, several callback functions are defined to handle user actions like navigation, retrosynthesis, and database management. These functions manipulate Streamlit’s session state and interface with backend logic (which will be explained in detail in the Backend section).

Below is a summary of the key callback functions:

| Callback Function           | Triggered By (UI)                      | Purpose                                                                                   |
|-----------------------------|----------------------------------------|-------------------------------------------------------------------------------------------|
| `go_home()`                 | “🏠 Return to Home” button              | Resets most of the session state and navigates to the Home screen.                       |
| `go_main()`                 | “🔬 Start Retrosynthesis” button       | Navigates from Home to the main retrosynthesis interface.                                |
| `clear_history()`           | “🧹 Start Over” button                 | Clears retrosynthesis history and resets the molecule state.                             |
| `do_retrosynthesis(smi, db)`| “🔄 Retrosynthesize” button            | Launches the retrosynthesis logic (uses backend’s `list_reactants`; see Backend section).|
| `select_option(idx)`        | “Option n” button (for reactant sets)  | Selects a particular disconnection and stores its components.                            |
| `select_fragment(idx)`      | “Reactant n” button (in fragment view) | Applies retrosynthesis recursively to a selected fragment.                               |
| `handle_db_select()`        | Dropdown menu for databases            | Switches the active database or enters the database builder.                             |
| `reset_builder()`           | “Start Over” in Builder mode           | Clears all reactant/product/conditions fields for fresh input.                           |
| `remove_reactant(idx)`      | “❌” buttons next to reactants         | Removes a single reactant from the reaction builder list.                                |
| `refresh_databases()`       | App start, `go_home`, or `clear_history()` | Loads all `.db` files and registers them for use (uses backend’s `load_database` and `register_database`). |
| `init_state()`       | At start-up | Initializes default values in Streamlit's session state to ensure a stable initial application state.|

Each callback updates `st.session_state` to drive app behavior reactively, and most of them interface directly with backend utilities for SMILES handling, reaction parsing, or database updates, which will be discussed in the next section.
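
To make the `refresh_databases()` row more concrete, here is a hedged sketch of how such a callback could scan a folder for `.db` files and hand each one to a registration function. The folder argument and the `register` parameter are assumptions for illustration; in the app, the backend's `register_database` would play that role.

```python
import json
import tempfile
from pathlib import Path

def load_database(path):
    """Return the parsed JSON content of a .db file, or None on failure."""
    try:
        with open(path, "r") as file:
            return json.load(file)
    except (OSError, ValueError):
        return None

def refresh_databases(folder, register):
    """Load every *.db file in `folder` and pass it to `register(values, name)`."""
    loaded = []
    for path in sorted(Path(folder).glob("*.db")):
        values = load_database(path)
        if values is not None:            # skip unreadable or malformed files
            register(values, path.stem)   # file name without ".db" is the key
            loaded.append(path.stem)
    return loaded

# Usage with a throwaway folder and a dict standing in for the registry.
registry = {}

def register(values, name):
    registry[name] = values

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo.db").write_text(json.dumps([["CC>>C.C", {"solvent": "THF"}]]))
    Path(tmp, "broken.db").write_text("not json")
    names = refresh_databases(tmp, register)

print(names)
```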

## Selecting a reaction database
The Streamlit front-end shows each .db file as a dropdown option within a selectbox in the sidebar menu.  

In [None]:
dbs = list(rd.REACTION_DATABASES.keys())
placeholder   = 'Select Database'
add_label     = '➕ Add or edit your own database'
opts    = [placeholder] + dbs + [add_label]
st.selectbox('Choose Database',
            opts,
            index=0,
            key='main_db_select',
            on_change= handle_db_select)
st.markdown(f"**Current DB:** {st.session_state.database or '_(none)_'}")

| Line(s)                                             | What happens in the UI                                                                                                                         | Why it matters                                                                         |
| --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| `dbs = list(rd.REACTION_DATABASES.keys())`          | Collects the names of every reaction database that was successfully loaded at startup, read from the backend's `REACTION_DATABASES` dictionary.                                                        | Defines the menu of options a user can choose from.                                    |
| `st.selectbox('Choose Database', opts, index=0, key='main_db_select', on_change=handle_db_select)` | Renders a dropdown menu with the available database names; selecting one triggers `handle_db_select()` which updates `st.session_state.database`. | Lets the user decide which SMARTS library will power all retrosynthesis queries.       |
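
The branching inside `handle_db_select()` is not shown above, so here is a rough sketch of what it could look like. A plain dictionary again stands in for `st.session_state`, where Streamlit stores the widget value under its `key` before calling the `on_change` callback. The `builder_mode` flag is a hypothetical name; the real app may signal the builder differently.

```python
PLACEHOLDER = "Select Database"
ADD_LABEL = "➕ Add or edit your own database"

def handle_db_select(state):
    """Translate the selectbox value into app state (hypothetical keys)."""
    choice = state.get("main_db_select", PLACEHOLDER)
    if choice == PLACEHOLDER:
        state["database"] = None          # nothing chosen yet
        state["builder_mode"] = False
    elif choice == ADD_LABEL:
        state["builder_mode"] = True      # open the database builder
    else:
        state["database"] = choice        # activate the chosen database
        state["builder_mode"] = False
    return state

# Usage: simulate the user picking a database from the dropdown.
session = handle_db_select({"main_db_select": "epfl_student"})
print(session["database"], session["builder_mode"])
```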

## Inputting your molecule

The user provides a target molecule either by typing its IUPAC or common name or by drawing its structure. Both inputs end up as a SMILES string: typed names go through our `name_to_smiles` helper function, and drawings come from the `streamlit_ketcher` widget.
The `name_to_smiles` function queries the **NCI’s Chemical Identifier Resolver (Cactus)** to look up the common or IUPAC name you type in and return the corresponding SMILES string.


In [None]:
if mode == "Name":
    name = st.text_input("Molecule name")
    if name:
        try:
            smiles_input = name_to_smiles(name)
            st.write("SMILES:", smiles_input)
            mol0 = Chem.MolFromSmiles(smiles_input)
            if mol0:
                st.image(MolToImage(mol0, size=(200, 200)))
        except Exception as e:
            st.error(f"❌ Name→SMILES failed: {e}")
else:
    drawn = st_ketcher("", key="draw_input")
    if drawn:
        smiles_input = drawn
        st.write("SMILES:", smiles_input)
        mol0 = Chem.MolFromSmiles(smiles_input)
        if mol0:
            st.image(MolToImage(mol0, size=(200, 200)))


| Branch / line(s)                                                    | What the user sees or triggers                                                                      | Why it matters                                                           |
| ------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| `if mode == "Name":`                                                | Enables Name mode, revealing a text box.                                                        | Lets the user type an IUPAC or common name instead of drawing.           |
| `name = st.text_input("Molecule name")`                             | Interactive text field labelled *“Molecule name”*.                                                  | Captures free-text input.                                                |
| `smiles_input = name_to_smiles(name)`                               | **Backend call** to `functions.name_to_smiles`, which queries the NCI Chemical Identifier Resolver. | Converts a human-readable name into a machine-readable SMILES string.    |
| `st.write("SMILES:", smiles_input)`                                 | Echoes the resolved SMILES below the text box.                                                      | Immediate feedback that the lookup succeeded.                            |
| `MolToImage(mol0)` + `st.image(...)`                                | Inline 200 × 200 PNG of the molecule.                                                               | Visual validation of the structure.                                      |
| `except … st.error("❌ Name→SMILES failed …")`                       | Red error banner if the API call or RDKit parsing fails.                                            | Graceful handling of typos or network issues.                            |
| `else:` *(Draw mode)*                                               | Switches the UI to a Ketcher sketch pad.                                                        | Supports freehand drawing for molecules without convenient names.        |
| `drawn = st_ketcher("", key="draw_input")`                          | Launches the Web-based drawing widget.                                                              | Lets the user sketch directly in the browser; the widget returns SMILES. |
| `smiles_input = drawn` → `st.write("SMILES:", …)` → `st.image(...)` | Same display logic as Name mode: print SMILES and show an RDKit image.                              | Keeps the feedback loop identical regardless of input method.            |
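
The `name_to_smiles` helper itself lives in the backend and is not reproduced in this notebook. As a hedged sketch of how such a lookup can be written with the `urlopen` and `quote` imports listed in the introduction, the snippet below builds a Chemical Identifier Resolver URL and reads the response. The helper name `build_cir_url` and the `timeout` parameter are assumptions for illustration, and the actual implementation may differ.

```python
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"

def build_cir_url(name: str) -> str:
    """Percent-encode the name and build the CIR lookup URL for SMILES output."""
    return f"{CIR_BASE}/{quote(name)}/smiles"

def name_to_smiles(name: str, timeout: float = 10.0) -> str:
    """Query CIR over the network; urllib raises an error if the lookup fails."""
    with urlopen(build_cir_url(name), timeout=timeout) as response:
        return response.read().decode("utf8").strip()

# Only the URL is built here, so the example runs without network access.
print(build_cir_url("acetic acid"))
```

Because unknown names and network failures both surface as exceptions, the `try`/`except` around the call in the frontend code is what turns them into the red error banner.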


## Getting the Possible Reactants

Now things start to get serious! We're going to walk you through how the retrosynthesis engine kicks in and how the user can navigate possible disconnections.

### 🔄 Retrosynthesis Flow (Initiation)

The retrosynthesis process begins when the user presses the "Retrosynthesize" button after defining a molecule and selecting a database.


In [None]:
if smiles:
    st.button('🔄 Retrosynthesize',
              key='main_retrosynthesize',
              on_click=do_retrosynthesis,
              args=(smiles, st.session_state.database))


This triggers the callback `do_retrosynthesis(smiles, db)` which does the following:

1. Pushes the current SMILES to the history (for backtracking):

```python
st.session_state.history.append(st.session_state.selected_smiles)
```

2. Updates the selected molecule:

```python
st.session_state.selected_smiles = smiles
```

3. Clears any previously selected reactants:

```python
st.session_state.reactant_list = None
```

4. Computes possible retrosynthetic disconnections using:

```python
st.session_state.combos = rd.list_reactants(canonicalize_smiles(smiles), db)
```

The `canonicalize_smiles()` ensures format consistency. The function `rd.list_reactants()` is a backend function that applies all registered SMARTS rules from the selected database (more in the backend section).


### 🧩 Choosing Between Options

Once disconnections (combos) are found, the user is shown a set of options with molecules and reaction conditions:


In [None]:

if st.session_state.selected_smiles and not st.session_state.reactant_list:
    st.subheader('🧩 Options')
    combos = st.session_state.combos or []
    if not combos:
        st.info('Your product is too simple, or not in the chosen database.')
    else:
        cols = st.columns(len(combos))
        for i, (s, cond) in enumerate(combos):
            with cols[i]:
                st.image(MolToImage(Chem.MolFromSmiles(s), (150,150)))
                # Display conditions as a mini-table
                if cond:
                    df = pd.DataFrame.from_dict(cond, orient='index', columns=['Value'])
                    df.index.name = 'Parameter'
                    st.table(df)
                st.button(
                    f'Option {i+1}',
                    key=f'opt{i}',
                    on_click = select_option,
                    args=(i,),
                )


Each "Option" button lets the user pick one disconnection.


### 🔹 Fragment Drilldown: Retrosynthesizing the Reactants

After picking an option, the reactants are shown individually, each with its own "Reactant n" button. This allows recursive retrosynthesis:


In [None]:
elif st.session_state.reactant_list:
    st.subheader('🔹 Next Fragment')
    parts = st.session_state.reactant_list
    cols = st.columns(len(parts))
    for i, p in enumerate(parts):
        with cols[i]:
            st.image(MolToImage(Chem.MolFromSmiles(p), (150,150)))
            st.button(f'Reactant {i+1}',
                      key=f'react{i}',
                      on_click = select_fragment,
                      args=(i,))


Clicking a fragment does:

1. Push the current target to history.
2. Set that fragment as the new `selected_smiles`.
3. Query the database again to list reactants for the selected fragment.
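
These three steps can be sketched as a small function. A plain dictionary stands in for `st.session_state`, and the `lookup` parameter stands in for the backend's `list_reactants`; both are assumptions that keep the example runnable outside the app. Clearing `reactant_list` is assumed to mirror what `do_retrosynthesis` does.

```python
def select_fragment(state, idx, lookup):
    """Make fragment `idx` the new target and list its disconnections.

    `lookup(smiles, database)` stands in for the backend's `list_reactants`.
    """
    fragment = state["reactant_list"][idx]
    state["history"].append(state["selected_smiles"])       # 1. remember current target
    state["selected_smiles"] = fragment                      # 2. switch to the fragment
    state["reactant_list"] = None                            # back to the options view
    state["combos"] = lookup(fragment, state["database"])    # 3. query the database again
    return state

# Usage with a fake lookup that knows a single disconnection.
def fake_lookup(smiles, database):
    return [("C.C", {"solvent": "THF"})] if smiles == "CC" else []

session = {
    "selected_smiles": "CCC",
    "reactant_list": ["C", "CC"],
    "history": [],
    "database": "demo",
    "combos": None,
}
select_fragment(session, 1, fake_lookup)
print(session["selected_smiles"], session["combos"])
```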


### ♻️ Recursive Exploration

This recursive loop — target → disconnection → pick fragment → disconnection again — lets users drill deep into possible retrosynthetic trees. It’s particularly powerful for visualizing multi-step routes or simplifying complex targets iteratively.

Each step is reactive and memory-light, with RDKit handling chemistry and Streamlit’s state model tracking choices, which makes the experience snappy even in large databases.


# BackEnd: What's Happening Behind the Scenes? 

Behind that single ‘🔄 Retrosynthesize’ click lies an engine of SMARTS parsing, atom-mapped reaction reversal, and on-the-fly database indexing that fits many coordinated chemistry computations into a few milliseconds of UI feedback.

All our functions are linked to one another so hang in there to understand our code! 

**Note:** The code snippets in this section of the report are runnable, so enjoy discovering how our code works! Don't forget to run the imports from the introduction first.

## How reverse_reaction_generator Works 

`reverse_reaction_generator` is the small but powerful helper that lets RetroChem turn an inverse reaction rule (written in SMARTS) into a callable reverse reaction used during retrosynthesis. This is the main function of the backend, and the star of the show.
Below is a step-by-step description of what happens inside and why it matters.


In [None]:
SmartsConditionsPair = Tuple[str, dict]   # (reaction SMARTS, conditions)
SmilesConditionsPair = Tuple[str, dict]   # (dot-joined reactant SMILES, conditions)

def reverse_reaction_generator(reaction_smart: SmartsConditionsPair) -> Callable[[str], SmilesConditionsPair | None]:
    # Compile the SMARTS once; the closure below reuses the compiled object.
    rxn = Reac.ReactionFromSmarts(reaction_smart[0])
    cond = reaction_smart[1]
    def reverser_to_smiles(smiles: str) -> SmilesConditionsPair | None:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError("Invalid SMILES")
        try:
            prods = rxn.RunReactants((mol,))
        except Exception:
            return None
        if not prods:
            return None
        # Keep only the first match and join its fragments with "."
        first = prods[0]
        first_smiles = [Chem.MolToSmiles(m, canonical=True) for m in first]
        combo = ".".join(first_smiles)
        return (combo, cond)
    return reverser_to_smiles


1. Compile the SMARTS

The SMARTS string, of the form "product>>reactant1.reactant2.reactant3", is converted into an RDKit reaction object (`rxn`) using `ReactionFromSmarts`. This step parses the atom mapping and bond changes.

2. Create the closure (`reverser_to_smiles`)

An inner function `reverser_to_smiles` is returned. It captures the compiled reaction object and its conditions, so it can be reused efficiently.

3. Run the reaction backwards

RDKit takes the product as input and, using the reaction object compiled in step 1, produces the possible reactant tuples.
Only the first hit (`prods[0]`) is taken for speed; exhaustive enumeration could be added later.

4. Return the result

The reactants are returned as a clean, canonicalised string of the form "reactant1.reactant2.reactant3", together with the reaction conditions, ready to be displayed in the app.
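
Step 3 keeps only the first outcome. As a hedged sketch of what exhaustive enumeration could involve, the helper below collapses equivalent reactant sets once every outcome has been converted to SMILES. The name `unique_combos` and its input format (tuples of canonical SMILES strings, one tuple per RDKit outcome) are assumptions for illustration and are not part of the current backend.

```python
def unique_combos(outcomes):
    """Collapse equivalent reactant sets, keeping first-seen order.

    `outcomes` is an iterable of tuples of canonical SMILES strings,
    e.g. one tuple per outcome of a reaction run, after converting
    each molecule to SMILES.
    """
    seen = set()
    combos = []
    for fragments in outcomes:
        key = tuple(sorted(fragments))   # fragment order does not matter
        if key not in seen:
            seen.add(key)
            combos.append(".".join(key))
    return combos

# Symmetric matches often yield the same reactants in a different order.
outcomes = [("C", "CC"), ("CC", "C"), ("C", "CC"), ("CCC",)]
print(unique_combos(outcomes))   # ['C.CC', 'CCC']
```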


### 🚀 Key Optimisation Tricks — `reverse_reaction_generator`


| Optimisation | How it’s implemented | Pay-off |
|--------------|----------------------|---------|
| **1 · SMARTS pre-compilation** | `rxn = Reac.ReactionFromSmarts(reaction_smart[0])` is executed **once** per rule, when the database is registered. | Parsing and atom-mapping a SMARTS string is comparatively costly; doing it just once per rule removes that cost from every user query. |
| **2 · Closure capture** | The nested `reverser_to_smiles` function closes over the compiled `rxn` object **and** its `cond` dict. | No global look-ups, no re-parsing, fully self-contained. A plain Python call does all the work during search. |
| **3 · Fail-fast error handling** | *Invalid SMILES* → `ValueError`; *no products* → returns `None` immediately. | The outer loop (`list_reactants`) simply skips `None` results, so failed matches are dropped without further processing. |
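
The compile-once and closure ideas are not specific to RDKit. As an analogy only, the sketch below uses Python's `re` module as a stand-in for `ReactionFromSmarts`: each pattern is compiled a single time, captured by a closure, and reused on every call. Regular expressions are of course not a substitute for SMARTS, and the rule names are made up for the example.

```python
import re

def make_matcher(pattern: str, conditions: dict):
    compiled = re.compile(pattern)   # done once, when the rule is registered
    def matcher(text: str):
        # The closure reuses `compiled` and `conditions` on every call.
        if compiled.search(text) is None:
            return None              # no match: nothing to report
        return (text, conditions)
    return matcher

# Two toy rules: one looks for "C(=O)", the other for "N".
rules = [(r"C\(=O\)", {"name": "ketone rule"}), (r"N", {"name": "amine rule"})]
matchers = [make_matcher(pattern, cond) for pattern, cond in rules]

# Same shape as the backend loop: call every matcher, keep the non-None hits.
hits = [hit for m in matchers if (hit := m("CCC(=O)C")) is not None]
print(hits)
```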


#### Net effect on the user experience
* **Sub-second responses** for each selected SMILES, even with many SMARTS templates.  
* **Smooth recursion** — drilling into reactants remains instant, inviting exploration.  


### ⚙️  Supporting Utilities — `register_database` & `list_reactants`
📥 load_database(path: str)


In [None]:
def load_database(path: str):
    try:
        with open(path, "r") as file:
            return json.load(file)
    except (OSError, ValueError):  # missing file, bad encoding, or malformed JSON
        return None

Loads a .db JSON file and returns the list of SMARTS-condition pairs. If the file doesn’t exist or is invalid, it returns None.
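
`load_database` only checks that the file is valid JSON, not that its entries have the expected shape. As an optional sketch, a small check like the one below could filter out malformed entries before registration. The function names and the rule that a reaction SMARTS contains exactly one `>>` are assumptions based on the "product>>reactants" format used in this report.

```python
def is_valid_entry(entry) -> bool:
    """Check that one database entry looks like [reaction_smarts, conditions]."""
    if not isinstance(entry, (list, tuple)) or len(entry) != 2:
        return False
    smarts, conditions = entry
    return (
        isinstance(smarts, str)
        and smarts.count(">>") == 1      # exactly one product>>reactants arrow
        and isinstance(conditions, dict)
    )

def split_valid(entries):
    """Separate well-formed entries from the ones that should be rejected."""
    valid = [e for e in entries if is_valid_entry(e)]
    rejected = [e for e in entries if not is_valid_entry(e)]
    return valid, rejected

entries = [
    ["CC>>C.C", {"temperature": "25 °C"}],   # well formed
    ["CCC", {"solvent": "THF"}],             # missing the ">>" arrow
    ["CC>>C.C"],                             # missing the conditions dict
]
valid, rejected = split_valid(entries)
print(len(valid), len(rejected))
```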

📌 register_database(values, database)

In [None]:
REACTION_DATABASES: dict[str, list[SmartsConditionsPair]] = {}
REACTION_REVERSERS: dict[str, List[Callable[[str], SmilesConditionsPair | None]]] = {}

def register_database(values, database):
    REACTION_DATABASES[database] = values
    REACTION_REVERSERS[database] = [reverse_reaction_generator(i) for i in values]


1️⃣ Caches the raw SMARTS list of a database in `REACTION_DATABASES`

2️⃣ Immediately converts every rule of a certain database into a compiled **reverser** and stores that list in `REACTION_REVERSERS`

🧼 clear_registered_databases()

In [None]:
def clear_registered_databases():
    REACTION_DATABASES.clear()
    REACTION_REVERSERS.clear()

Clears all cached databases (useful for refreshing).

🔍 list_reactants(smiles, db):

In [None]:
def list_reactants(smiles: str, database: str):
    get = REACTION_REVERSERS.get(database)
    if get is None:
        return None
    ret = []
    for fn in get:
        val = fn(smiles)
        if val is not None:
            ret.append(val)
    return ret

Loops over each callable registered for the chosen database in `REACTION_REVERSERS` and applies it to the input SMILES. Returns the list of successful matches, each a (reactant combo, conditions) pair, or `None` if the database name is not registered.
This is the final function that ties all the previous ones together to produce the desired reactants.

### 📝 `add_new_smart` - extend or edit a reaction database



In [None]:
def add_new_smart(database_name: str, product: str, reactants: list[str], conditions: dict[str, str] | None = None)->None:
    conditions = conditions if conditions is not None else {}  # avoid a shared mutable default
    file_path = f'{database_name}.db'
    previous = load_database(file_path) or []
    product_smarts = Chem.MolToSmarts(Chem.MolFromSmiles(product)).replace('\\', '-').replace('/', '-')
    reactants_smarts = '.'.join([Chem.MolToSmarts(Chem.MolFromSmiles(i)).replace('\\', '-').replace('/', '-') for i in reactants])
    previous.append((f'{product_smarts}>>{reactants_smarts}', conditions))
    register_database(previous, database_name)
    with open(file_path, 'w') as file:
        file.write('[\n')
        if len(previous) != 0:
            file.write(f'  [\n    "{previous[0][0]}",\n    {json.dumps(previous[0][1])}\n  ]')
        for i in range(1, len(previous)):
            file.write(f',  [\n    "{previous[i][0]}",\n    {json.dumps(previous[i][1])}\n  ]')
        file.write('\n]\n')



**Arguments**

| Parameter | Type | Purpose |
|-----------|------|---------|
| `database_name` | `str` | Name of the database to modify or create (stored as `<database_name>.db`). |
| `product` | `str` | SMILES string for the product of the reaction. |
| `reactants` | `list[str]` | List of SMILES strings for all reactants in the reaction. |
| `conditions` | `dict[str, str]` | Dictionary of the conditions specific to the reaction. |


**Goal**

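This function lets the user either add a reaction rule to an existing database or create a new one. The choice follows from the line `previous = load_database(file_path) or []`: if the file already exists, its rules are loaded and extended; otherwise the list starts empty and a new file is written.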
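
The same load-or-create pattern can be isolated from the chemistry. The sketch below is a simplified variant, not the function used by the app: it skips the SMILES to SMARTS conversion and the registration step, and it lets `json.dump` handle quoting and escaping instead of writing the brackets by hand. The name `append_rule` is hypothetical.

```python
import json
import os
import tempfile

def append_rule(file_path, reaction_smarts, conditions=None):
    """Append one [smarts, conditions] entry, creating the file if needed."""
    try:
        with open(file_path, "r") as file:
            entries = json.load(file)
    except (OSError, ValueError):
        entries = []                         # missing or unreadable: start fresh
    entries.append([reaction_smarts, dict(conditions or {})])
    with open(file_path, "w") as file:
        json.dump(entries, file, indent=2)   # json handles quoting and escaping
    return entries

# Usage in a throwaway folder: the first call creates the file, the second extends it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sketch.db")
    append_rule(path, "[#6]-[#6]>>[#6].[#6]", {"solvent": "THF"})
    entries = append_rule(path, "[#6]-[#6]-[#6]>>[#6].[#6]-[#6]")

print(len(entries))   # 2
```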


## 🧪 Test Our Backend Functions Yourself!
The RetroChem backend was built with testability in mind. Below are simple examples you can run right in this notebook (after running the previous function snippets) to see each core function in action.

### 🔁 Try a Reverse Reaction
This shows how to use our reverse_reaction_generator to turn a product into possible reactants using a SMARTS rule.


In [None]:
# Simulate a one-rule database
reaction_smarts = ("[C:1][C:2](=O)[C:3]>>[C:1][C:2][C:3].NN",
                    {"temperature": "197 °C", "solvent": "KOH"})
# Turn the rule into a callable
reverser = reverse_reaction_generator(reaction_smarts)
# Call it on an input SMILES
ReactantsConditions = reverser("CCCCCC(=O)CC")
# Print the reactants and conditions
print(ReactantsConditions)


The expected output is:
('CCCCCCCC.NN', {'temperature': '197 °C', 'solvent': 'KOH'})

This confirms that the rule matches the input ketone and returns the corresponding alkane together with the `NN` fragment (hydrazine in SMILES notation), which is how the Wolff-Kishner reaction is encoded as a SMARTS rule in our "database".

In [None]:
#display it
print("your product is:")
MolToImage(Chem.MolFromSmiles("CCCCCC(=O)CC"))

In [None]:
Mol_reactants = Chem.MolFromSmiles(ReactantsConditions[0])
print("your corresponding reactants")
MolToImage(Mol_reactants)

**Reminder**: The function list_reactants works much like the line reverser("smiles"); the only difference is that it iterates this call over a whole database of callables (REACTION_REVERSERS) and returns only the reactants and conditions that matched!

### 📥 Now for the Database functions

In [None]:
#Simulating a database
sample_db_content = [
    ("CC>>C.C", {"temperature": "25"}),
    ("CCC>>C.C.C", {"solvent": "THF"})
]

# Write this to a temporary file
sample_db_path = "test_demo.db"
with open(sample_db_path, "w") as f:
    json.dump(sample_db_content, f)

You should see a file test_demo.db appear in the notebook's working directory (the src\retrochem folder if the notebook runs from there).

In [None]:
print(load_database("test_demo.db"))

You should see the same entries as in sample_db_content, with each tuple now shown as a list because JSON has no tuple type.
The function load_database thus parses the JSON file back into Python objects.

In [None]:
# Check state before registering
print("Before register_database():")
print("REACTION_DATABASES:", REACTION_DATABASES)
print("REACTION_REVERSERS:", REACTION_REVERSERS)

# Register two small example databases
example_db1 = [
    ["CC>>C.C", {"temperature": "25 °C"}],
    ["CCC>>C.C.C", {"solvent": "THF"}]
]
register_database(example_db1, "demo1")
example_db2 = [
    ["[C:1][C:2](=O)[C:3]>>[C:1][C:2][C:3].NN", {"temperature": "197 °C", "solvent": "KOH"}]
]
register_database(example_db2, "demo2")

# Check state after registering
print("\nAfter register_database():")
print("REACTION_DATABASES:")
for i, j in REACTION_DATABASES.items():
    print(f"  {i}: {j}")

print("\nREACTION_REVERSERS:")
for k, funcs in REACTION_REVERSERS.items():
    print(f"  {k}: {len(funcs)} reverser(s)")
    for i, f in enumerate(funcs, 1):
        print(f"    ↳ Function {i}: {f.__name__} (callable)")

You can see that the register function changes the state of our two main dictionaries, REACTION_DATABASES and REACTION_REVERSERS, which contain all the rules necessary for retrosynthesis.

In [None]:
# Check state before clearing
print("Before clear_registered_databases():")
for i, j in REACTION_DATABASES.items():
    print(f"  {i}: {j}")

print("\nREACTION_REVERSERS:")
for k, funcs in REACTION_REVERSERS.items():
    print(f"  {k}: {len(funcs)} reverser(s)")
    for i, f in enumerate(funcs, 1):
        print(f"    ↳ Function {i}: {f.__name__} (callable)")

# clearing
clear_registered_databases()

# Check State after clearing
print("After clear_registered_databases():")
print("REACTION_DATABASES:", REACTION_DATABASES)
print("REACTION_REVERSERS:", REACTION_REVERSERS)

The clear function indeed clears all the registered databases, ready for a fresh start!

In [38]:
# Define example reaction for new database test_db
db_name = "test_db"
product = "CC"  # ethane
reactants = ["C", "C"]  # two methane molecules
conditions = {"temperature": "25C", "solvent": "THF"}

# Call the function
add_new_smart(db_name, product, reactants, conditions)

# Show the content of the database file
file_path = f"{db_name}.db"
print(f"\n📄 Contents of '{file_path}':\n")
with open(file_path, "r") as f:
    print(f.read())

# Show the registered SMARTS and conditions
print("\n Registered SMARTS in REACTION_DATABASES:")
for smarts, cond in REACTION_DATABASES[db_name]:
    print(f"  SMARTS: {smarts}")
    print(f"  Conditions: {cond}")

# Show number of reversers compiled
print(f"\n🧠 Number of reversers in REACTION_REVERSERS['{db_name}']: {len(REACTION_REVERSERS[db_name])}")



📄 Contents of 'test_db.db':

[
  [
    "[#6]-[#6]>>[#6].[#6]",
    {"temperature": "25C", "solvent": "THF"}
  ]
]


 Registered SMARTS in REACTION_DATABASES:
  SMARTS: [#6]-[#6]>>[#6].[#6]
  Conditions: {'temperature': '25C', 'solvent': 'THF'}

🧠 Number of reversers in REACTION_REVERSERS['test_db']: 1


You should see a new database file, test_db.db, appear in the notebook's working directory (the src\retrochem folder if the notebook runs from there). The add_new_smart function can indeed create a new database!

Let's see how it behaves when we use the function with new reaction rules, but same file name:

In [39]:
add_new_smart(db_name, "CCC", ["C","CC"], conditions)

# Show the content of the database file
print(f"\n📄 Contents of '{file_path}':\n")
with open(file_path, "r") as f:
    print(f.read())

# Show the registered SMARTS and conditions
print("\n Registered SMARTS in REACTION_DATABASES:")
for smarts, cond in REACTION_DATABASES[db_name]:
    print(f"  SMARTS: {smarts}")
    print(f"  Conditions: {cond}")

# Show number of reversers compiled
print(f"\n🧠 Number of reversers in REACTION_REVERSERS['{db_name}']: {len(REACTION_REVERSERS[db_name])}")
    


📄 Contents of 'test_db.db':

[
  [
    "[#6]-[#6]>>[#6].[#6]",
    {"temperature": "25C", "solvent": "THF"}
  ],  [
    "[#6]-[#6]-[#6]>>[#6].[#6]-[#6]",
    {"temperature": "25C", "solvent": "THF"}
  ]
]


 Registered SMARTS in REACTION_DATABASES:
  SMARTS: [#6]-[#6]>>[#6].[#6]
  Conditions: {'temperature': '25C', 'solvent': 'THF'}
  SMARTS: [#6]-[#6]-[#6]>>[#6].[#6]-[#6]
  Conditions: {'temperature': '25C', 'solvent': 'THF'}

🧠 Number of reversers in REACTION_REVERSERS['test_db']: 2



Now let's clear all the unnecessary test files you just generated before you start using our application:

In [40]:
for name in ["test_db.db", "test_demo.db"]:
    if os.path.exists(name):
        os.remove(name)
        print(f"🗑️ Deleted: {name}")
    else:
        print(f"✅ No file to delete: {name}")

🗑️ Deleted: test_db.db
✅ No file to delete: test_demo.db


RetroChem: Challenges and Features - General Remarks
===================================================


Motivations
-----------

Organic chemistry is rich in transformations but poor in integrated, student-friendly tooling.  
RetroChem was born to…

* **Demystify retrosynthesis** - replace static textbook arrows with clickable, data-driven pathways.  
* **Bridge theory and practice** - couple SMARTS rules with concrete laboratory conditions.  
* **Encourage exploration** - let learners “peel back” a target molecule step-by-step and see *why* each cut makes chemical sense.  

Main Features
-------------

* **Dual input modes** - type a molecule name *or* sketch it in a Ketcher canvas.  
* **One-click retrosynthesis** - generate all first-step disconnections plus reaction conditions.  
* **Recursive exploration** - drill down on any reactant to continue the synthetic tree.  
* **History navigation** - walk back through previous targets with a single button.  
* **Custom database builder** - add new product → reactants SMARTS (with conditions) from inside the browser.

Challenges Encountered
----------------------

1. **Database building** – writing correct forward-reaction SMARTS in our handmade "epfl_student" database, then running them in reverse, required meticulous testing. Sourcing, parsing, or hand-entering reliable reaction templates (and their conditions) proved time-consuming.
2. **External API reliability** – `name_to_smiles` depends on the NCI CIR service; network hiccups needed error handling.  
3. **Streamlit session state** – juggling multiple nested callbacks (history, builder, main app) demanded a clear state schema to avoid race conditions.  
4. **Cross-platform line endings & Git** – CRLF/LF conversions occasionally produced noisy diffs and merge conflicts.

Tools Used
----------

* **RDKit** - cheminformatics toolkit for SMILES handling, sub-structure search, and reaction execution.  
* **Streamlit** - rapid, reactive web UI.  
* **streamlit-ketcher** - embedded molecule drawing widget.  
* **Pandas** - tabular display of reaction conditions.  
* **Python 3.11** - core language and ecosystem.  
* **JSON** - storage format for reaction databases (`*.db` files).  
* **Git & GitHub** - version control and collaboration.  


Ways of improvement
-------------------

RetroChem turns a traditionally chalk-and-talk exercise into an interactive, code-backed learning experience.  
While the current version already supports multi-step exploration and on-the-fly database editing, several avenues remain:

* **Route ranking** - by heuristics (step count, reagent cost, literature frequency).  
* **Energetic feasibility** - estimation via semi-empirical calculations.  
* **Better stereochemical fidelity** - using enhanced SMARTS and atom-mapping tools.  
* **Arrow pushing** - to better visualize the sites of attack.
* **Finding the correct spot for mechanism disconnection** - right now, the code only takes the first tuple of reactants returned by RDKit's `RunReactants`, which means it does not show all the possible reactants from a single reaction SMARTS. Selecting the most energetically and kinetically favorable disconnection would be a good improvement.


