# Introduction

Molecule synthesis is fundamental to chemistry and central to organic chemistry in particular. However, alongside the valuable compounds produced, chemical processes often generate significant amounts of waste. Solvents are among the largest contributors to this waste. They are widely used to dissolve reagents, regulate reaction temperatures, and assist in purification steps. Unfortunately, many of these solvents are volatile, toxic, and difficult to recycle. In fact, solvents can represent up to 90% of the total mass used in a typical chemical process, making their careful selection a critical concern in sustainable chemistry.

The concept of green chemistry was introduced in the 1990s. Green chemistry focuses on designing products and processes that minimize the use and generation of hazardous substances. Beyond reducing environmental impact, it also aims to improve process efficiency, lower operational costs, and enhance safety for both people and the environment.

This project has three motivations: to reduce the environmental footprint of chemical synthesis by identifying greener solvent alternatives, to give chemists a user-friendly tool for integrating sustainability into their workflows, and to promote green chemistry principles in both academic and industrial settings.

To support these goals, our package RetroGSF (Retrosynthesis Green Solvents Finder) was developed. This tool identifies possible synthetic pathways for a given target molecule using the retrosynthetic algorithm AiZynthFinder. It then determines the most likely solvent traditionally used for each reaction step and proposes alternative greener solvents based on their impact on human health, environmental safety, and overall sustainability. The aim is to encourage more environmentally responsible decision-making in organic synthesis by offering practical and data-driven alternatives.



## Functionalities of the package

### 1. Retrosynthetic Pathway Identification
Retrosynthetic pathways are generated with the AiZynthFinder library. AiZynthFinder proposes possible synthetic routes for a target molecule by analyzing reaction rules and databases, and ranks the candidate pathways by feasibility and other criteria.

The function __retrosynthesis_reaction_smiles__ is designed to perform retrosynthesis for a given target molecule (in SMILES format) and return a table of one-step reactions in forward order. This table includes details such as reactants, products, reaction SMILES and how likely they are to occur.

How to use the function:
- __Input__: Provide the target molecule in SMILES format and the path to the AiZynthFinder configuration file (config.yml).
- __Output__: The function returns a pandas DataFrame containing the retrosynthetic steps, including reactants, products, and reaction SMILES.



In [None]:
# Assumption: the function ships with the RetroGSF package. The module name
# `retrogsf` is a guess; adjust the import to match your installation.
from retrogsf import retrosynthesis_reaction_smiles

# Target: aspirin, given as SMILES, plus the AiZynthFinder configuration file.
reaction_steps = retrosynthesis_reaction_smiles("CC(=O)Oc1ccccc1C(=O)O", "path/to/config.yml")
print(reaction_steps)
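The shape of the returned table can be illustrated without running AiZynthFinder. The sketch below is not the package's implementation: it assumes each retrosynthetic step is a (product, reactants) pair listed from the target backwards, reverses that list into forward order, and writes each step as a reaction SMILES (`reactants>>product`). The helper name `to_forward_steps`, the dictionary keys, and the hard-coded example step are illustrative assumptions.

```python
def to_forward_steps(retro_steps):
    """Turn retro-ordered (product, reactants) pairs into forward-ordered rows.

    Hypothetical helper for illustration; not part of RetroGSF or AiZynthFinder.
    """
    rows = []
    # Retrosynthesis walks from the target back to the starting materials,
    # so reversing the list gives the order in which the steps would be run.
    for number, (product, reactants) in enumerate(reversed(retro_steps), start=1):
        joined = ".".join(reactants)  # "." separates molecules in SMILES
        rows.append({
            "step": number,
            "reactants": joined,
            "product": product,
            "reaction_smiles": joined + ">>" + product,
        })
    return rows


# Example: aspirin disconnected into salicylic acid and acetic anhydride
# (a textbook disconnection, hard-coded here rather than predicted).
retro_steps = [
    ("CC(=O)Oc1ccccc1C(=O)O", ["O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"]),
]
forward_steps = to_forward_steps(retro_steps)
for row in forward_steps:
    print(row["step"], row["reaction_smiles"])
```

Passing the rows to `pandas.DataFrame(forward_steps)` would give a table with one row per forward step, similar in spirit to the DataFrame described above.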

### Alternative solvents ranking

The rank_similar_solvents function identifies and ranks alternative solvents based on their similarity to a target solvent. The ranking is performed using physical properties (e.g., density, dielectric constant, dipole moment, refractive index) and safety/environmental criteria. The function ensures that the recommended solvents are safer and more environmentally friendly than the target solvent.

Steps in the Ranking Process:
- Input Validation:
    - The function checks if the target solvent (SMILES) exists in the dataset.
    - If the solvent is classified as hazardous, the user is warned, and only safer alternatives are recommended.

- Filtering:
    - Solvents classified as "Hazardous" or "Highly Hazardous" are excluded.
    - Solvents with environmental, health, and safety rankings greater than 5 are filtered out.
    - Solvents with incompatible melting and boiling points are excluded.

- Similarity Scoring:
    - A weighted relative distance is calculated for physical properties (e.g., density, dielectric constant, dipole moment, refractive index) compared to the given solvent.
    - The similarity score is used to rank solvents, with lower scores indicating higher similarity.

- Output: the function returns a dictionary containing:
    - Target solvent properties.
    - Ranked solvents by similarity, environmental impact, health impact, safety, and overall ranking.
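The filtering and scoring steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's code: the property names, the equal weights, the record layout, and the candidate values are all hypothetical placeholders, and the melting/boiling point check is left out because the text does not define what counts as incompatible.

```python
PROPERTIES = ("density", "dielectric_constant", "dipole_moment", "refractive_index")
# Equal weights are an assumption; the package may weight properties differently.
WEIGHTS = {name: 1.0 for name in PROPERTIES}
EXCLUDED_CLASSES = {"Hazardous", "Highly Hazardous"}
MAX_RANK = 5  # rankings greater than 5 are filtered out


def similarity_score(target, candidate):
    """Weighted mean of relative distances; lower means more similar."""
    total = 0.0
    for name in PROPERTIES:
        reference = abs(target[name]) or 1.0  # avoid dividing by zero
        total += WEIGHTS[name] * abs(candidate[name] - target[name]) / reference
    return total / sum(WEIGHTS.values())


def is_acceptable(candidate):
    """Apply the hazard-class and environment/health/safety rank filters."""
    if candidate["classification"] in EXCLUDED_CLASSES:
        return False
    return all(candidate[key] <= MAX_RANK for key in ("environment", "health", "safety"))


def rank_candidates(target, candidates, n_recommendations=5):
    """Filter the candidates, score the survivors, return the closest ones."""
    kept = [c for c in candidates if is_acceptable(c)]
    scored = [(similarity_score(target, c), c["name"]) for c in kept]
    return sorted(scored)[:n_recommendations]


def record(name, props, classification, environment, health, safety):
    """Build one placeholder solvent record."""
    row = dict(zip(PROPERTIES, props))
    row.update(name=name, classification=classification,
               environment=environment, health=health, safety=safety)
    return row


# Placeholder values for illustration only; not taken from the package dataset.
target = dict(zip(PROPERTIES, (0.80, 25.0, 1.7, 1.36)))
candidates = [
    record("near", (0.78, 20.0, 1.7, 1.36), "Recommended", 3, 3, 4),
    record("far", (1.00, 80.0, 1.85, 1.33), "Recommended", 1, 1, 1),
    record("hazardous_twin", (0.80, 25.0, 1.7, 1.36), "Hazardous", 2, 2, 2),
    record("unhealthy", (0.80, 24.0, 1.7, 1.36), "Problematic", 3, 7, 3),
]
ranking = rank_candidates(target, candidates)
for score, name in ranking:
    print(f"{name}: {score:.3f}")
```

Dividing each difference by the target's own value makes the properties comparable despite their different units and magnitudes, which is one common reading of "relative distance". Note that the two closest candidates are dropped before scoring: filtering on safety first guarantees that similarity never outweighs hazard.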

In [None]:
import pandas as pd
from tabulate import tabulate

# Assumption: rank_similar_solvents ships with the RetroGSF package. The module
# name `retrogsf` is a guess; adjust the import to match your installation.
# Signature: rank_similar_solvents(target_smiles, data_path='SHE_data_with_smiles.csv', n_recommendations=5)
from retrogsf import rank_similar_solvents

# Example usage
if __name__ == "__main__":
    # Target solvent: Ethanol (SMILES: CCO)
    target_smiles = "CCO"
    data_path = "SHE_data_with_smiles.csv"  # Replace with the actual path to your dataset

    # Get the results
    results = rank_similar_solvents(target_smiles, data_path)

    # Print each table with a header
    print("\n=== Target Solvent Properties ===")
    target_props_df = pd.DataFrame([results['target_solvent_properties']]).T
    print(tabulate(target_props_df, headers=['Value'], tablefmt='pretty'))

    print("\n=== Ranked by Similarity ===")
    print(tabulate(results['by_similarity'], headers='keys', tablefmt='pretty', floatfmt='.3f'))

    print("\n=== Ranked by Environmental Impact ===")
    print(tabulate(results['by_environment'], headers='keys', tablefmt='pretty', floatfmt='.2f'))
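The dictionary printed above holds the same candidates in several orderings. As a rough sketch of how such a result could be assembled, the hypothetical helper below sorts one list of candidate records once per criterion. Only the keys `target_solvent_properties`, `by_similarity`, and `by_environment` come from the example; the record fields, the placeholder values, and the assumption that lower values are better for every criterion are illustrative.

```python
def build_result_views(target_properties, candidates):
    """Hypothetical assembly of the result dictionary, one ordering per criterion."""
    def ordered(key):
        # Assumption: lower values are better for every criterion used here.
        return sorted(candidates, key=lambda row: row[key])

    return {
        "target_solvent_properties": target_properties,
        "by_similarity": ordered("similarity_score"),
        "by_environment": ordered("environment"),
    }


# Placeholder records for illustration only; not taken from the package dataset.
example_candidates = [
    {"name": "candidate_1", "similarity_score": 0.06, "environment": 4},
    {"name": "candidate_2", "similarity_score": 0.64, "environment": 1},
]
views = build_result_views({"smiles": "CCO"}, example_candidates)
print([row["name"] for row in views["by_similarity"]])
print([row["name"] for row in views["by_environment"]])
```

Keeping every ordering in one dictionary lets the user weigh similarity against environmental impact rather than being handed a single merged score.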
    


# Limitations and improvements

### Challenges
1. **Data Availability**: Limited access to comprehensive solvent usage data for certain reaction types.
2. **Algorithm Accuracy**: Ensuring the accuracy of solvent predictions and green alternatives.
3. **Integration Complexity**: Combining multiple tools and libraries into a seamless workflow.

### Limitations
1. **Scope of Reactions**: Currently limited to reaction types covered in the USPTO-50K database.
2. **Green Solvent Database**: Relies on existing data, which may not cover all possible alternatives.
3. **User Expertise**: Requires basic knowledge of retrosynthesis and solvent properties for effective use.

# Conclusion

RetroGSF represents a significant step towards integrating green chemistry principles into organic synthesis. By providing data-driven insights and practical alternatives, it empowers chemists to make more sustainable choices in their workflows. While challenges remain, the tool's potential to reduce the environmental impact of chemical processes is substantial. Future developments will focus on expanding its capabilities and accessibility, further promoting the adoption of green chemistry in both academia and industry.