<a href="https://colab.research.google.com/github/chemplusx/RxNExtract/blob/main/examples/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧪 RxNExtract Quick Start Guide

Welcome to **RxNExtract** - a professional-grade system for extracting chemical reaction information from procedure texts using fine-tuned LLMs with dynamic prompting and self-grounding.

[![GitHub](https://img.shields.io/badge/GitHub-RxNExtract-blue?logo=github)](https://github.com/chemplusx/RxNExtract)
[![PyPI](https://img.shields.io/badge/PyPI-rxnextract-orange?logo=pypi)](https://pypi.org/project/rxnextract/)
[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Models-yellow)](https://huggingface.co/chemplusx/rxnextract-complete)

## 🎯 What You'll Learn
- Install and set up RxNExtract in Google Colab
- Extract chemical reaction information from text
- Analyze extraction results and confidence scores
- Process multiple procedures in batch
- Visualize extracted data

## 🚀 Key Features
- **122.6% improvement** in Complete Reaction Accuracy over baseline
- **47-55% error reduction** across all major categories
- **Statistical significance**: McNemar's χ² = 134.67 (p < 0.001)
- Support for complex multi-step reactions
- Confidence scoring and uncertainty quantification

## 📦 Installation

Let's start by installing RxNExtract and its dependencies.

In [None]:
# Install RxNExtract with GPU support
!pip install "rxnextract[gpu]" -q

# Install additional dependencies for visualization
!pip install matplotlib seaborn pandas plotly -q

print("✅ Installation complete!")

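Before moving on, it can help to confirm that the packages actually landed in the runtime. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name introduced here, and the package names are taken from the install commands above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the version of each package the install cell was meant to provide.
for name in ("rxnextract", "matplotlib", "seaborn", "pandas", "plotly"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If any line reports `not installed`, re-run the install cell (Colab sometimes needs a runtime restart after installing).
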
In [None]:
# Import required libraries
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, HTML, JSON
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("📚 Libraries imported successfully!")

## 🤖 Initialize RxNExtract

Now let's initialize the extractor with the pre-trained model from HuggingFace.

In [None]:
from chemistry_llm import ChemistryReactionExtractor

# Check GPU availability
import torch
print(f"CUDA Available: {torch.cuda.is_available()}")
print(f"Device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")

# Initialize the extractor with the complete framework model
print("\n🔄 Loading RxNExtract model (this may take a few minutes)...")

extractor = ChemistryReactionExtractor.from_pretrained(
    "chemplusx/rxnextract-complete",
    device="cuda" if torch.cuda.is_available() else "cpu",
    load_in_4bit=True,
    temperature=0.1,
    max_length=512
)

print("✅ RxNExtract loaded successfully!")

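The call above passes `load_in_4bit=True` even when it falls back to CPU. 4-bit quantized loading typically relies on GPU kernels, so it may be safer to tie that flag to GPU availability. The sketch below is an assumption about how one might do that, not documented RxNExtract behavior; `build_load_kwargs` is a hypothetical helper, and the keyword names are simply the ones used in the cell above.

```python
def build_load_kwargs(cuda_available):
    """Hypothetical helper: choose loader settings from GPU availability."""
    return {
        "device": "cuda" if cuda_available else "cpu",
        # Assumption: 4-bit loading is only enabled when a GPU is present.
        "load_in_4bit": cuda_available,
        "temperature": 0.1,
        "max_length": 512,
    }


# Show the settings for both cases without touching the model.
print(build_load_kwargs(True))
print(build_load_kwargs(False))
```

In the notebook this could be used as `ChemistryReactionExtractor.from_pretrained("chemplusx/rxnextract-complete", **build_load_kwargs(torch.cuda.is_available()))`.
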
## 🧪 Basic Extraction Example

Let's start with a simple chemical procedure extraction.

In [None]:
# Example chemical procedure
procedure = """
Add 2.5 g of benzoic acid to 50 mL of ethanol in a round-bottom flask.
Heat the mixture to reflux for 4 hours while stirring.
Cool the solution to room temperature and filter the precipitate.
Wash the solid with cold ethanol and dry to obtain 2.1 g of product (84% yield).
"""

print("🔬 Analyzing procedure...")
print(f"Input: {procedure.strip()}")
print("\n" + "="*80)

# Extract reaction information
results = extractor.analyze_procedure(procedure, return_raw=False)

print(f"✅ Extraction completed in {results['processing_time']:.1f}s")
print(f"🎯 Confidence: {results['confidence']:.2%}")

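The cell above indexes `results['processing_time']` and `results['confidence']` directly, which raises `KeyError` if either key is absent. A more forgiving summary might look like the sketch below; `summarize_result` is a hypothetical helper, and the key names are assumed from the cell above rather than from the library documentation.

```python
def summarize_result(results):
    """Build a one-line summary, tolerating missing keys."""
    elapsed = results.get("processing_time")
    confidence = results.get("confidence")
    elapsed_text = f"{elapsed:.1f}s" if elapsed is not None else "n/a"
    confidence_text = f"{confidence:.2%}" if confidence is not None else "n/a"
    return f"time: {elapsed_text} | confidence: {confidence_text}"


# Hand-written dictionaries for illustration, not real model output.
print(summarize_result({"processing_time": 1.234, "confidence": 0.876}))
print(summarize_result({}))
```
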
In [None]:
# Display extracted data in a structured format
def display_extraction_results(results):
    """Display extraction results with rich formatting"""
    data = results['extracted_data']
    
    print("📊 EXTRACTION RESULTS")
    print("="*50)
    
    # Reactants
    if data.get('reactants'):
        print("\n🔵 REACTANTS:")
        for i, reactant in enumerate(data['reactants'], 1):
            print(f"  {i}. {reactant.get('name', 'Unknown')}")
            if reactant.get('amount'):
                print(f"     Amount: {reactant['amount']}")
            if reactant.get('role'):
                print(f"     Role: {reactant['role']}")
    
    # Reagents
    if data.get('reagents'):
        print("\n🟡 REAGENTS:")
        for i, reagent in enumerate(data['reagents'], 1):
            print(f"  {i}. {reagent.get('name', 'Unknown')}")
            if reagent.get('amount'):
                print(f"     Amount: {reagent['amount']}")
    
    # Solvents
    if data.get('solvents'):
        print("\n🔵 SOLVENTS:")
        for i, solvent in enumerate(data['solvents'], 1):
            print(f"  {i}. {solvent.get('name', 'Unknown')}")
            if solvent.get('amount'):
                print(f"     Amount: {solvent['amount']}")
    
    # Products
    if data.get('products'):
        print("\n🟢 PRODUCTS:")
        for i, product in enumerate(data['products'], 1):
            print(f"  {i}. {product.get('name', 'Unknown')}")
            if product.get('amount'):
                print(f"     Amount: {product['amount']}")
            if product.get('yield'):
                print(f"     Yield: {product['yield']}")
    
    # Conditions
    if data.get('conditions'):
        print("\n🌡️  REACTION CONDITIONS:")
        conditions = data['conditions']
        if conditions.get('temperature'):
            print(f"     Temperature: {conditions['temperature']}")
        if conditions.get('time'):
            print(f"     Time: {conditions['time']}")
        if conditions.get('atmosphere'):
            print(f"     Atmosphere: {conditions['atmosphere']}")
        if conditions.get('pressure'):
            print(f"     Pressure: {conditions['pressure']}")
    
    # Workup steps
    if data.get('workup'):
        print("\n⚗️  WORKUP STEPS:")
        for i, step in enumerate(data['workup'], 1):
            print(f"  {i}. {step}")

# Display the results
display_extraction_results(results)

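For analysis beyond printing, the nested dictionary can be flattened into one row per chemical species. The sketch below assumes the same structure that `display_extraction_results` reads (lists of dictionaries under `reactants`, `reagents`, `solvents`, and `products`); `flatten_extraction` is a hypothetical helper and `sample_data` is hand-written for illustration, not model output.

```python
def flatten_extraction(data):
    """Turn the nested extraction dict into a flat list of row dicts."""
    rows = []
    for category in ("reactants", "reagents", "solvents", "products"):
        # A missing or empty category contributes no rows.
        for item in data.get(category) or []:
            rows.append({
                "category": category,
                "name": item.get("name", "Unknown"),
                "amount": item.get("amount"),
                "yield": item.get("yield"),
            })
    return rows


# Hand-written sample mirroring the benzoic acid procedure above.
sample_data = {
    "reactants": [{"name": "benzoic acid", "amount": "2.5 g"}],
    "solvents": [{"name": "ethanol", "amount": "50 mL"}],
    "products": [{"name": "product", "amount": "2.1 g", "yield": "84%"}],
}

rows = flatten_extraction(sample_data)
for row in rows:
    print(row)
```

Since `pandas` is already imported in this notebook, `pd.DataFrame(flatten_extraction(results['extracted_data']))` would give a table that is easy to filter or plot.
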
In [None]:
# Interactive demo function
def interactive_demo():
    """Interactive demo for custom procedure input"""
    
    print("🧪 INTERACTIVE RxNExtract DEMO")
    print("="*50)
    print("Enter your own chemical procedure below:")
    print("(Press Ctrl+Enter to run the cell after entering your procedure)")
    print()
    
    # Example procedures for users to try
    example_procedures = [
        "Add 1.0 g of sodium chloride to 10 mL of water and stir until dissolved.",
        "Heat 5 mL of acetic acid with 3 g of salicylic acid at 140°C for 30 minutes.",
        "Combine benzaldehyde (2 mL) with aniline (1.5 mL) in ethanol and reflux for 2 hours.",
        "Dissolve 2.5 g of copper sulfate in 50 mL of water, then add 1.0 g of zinc powder."
    ]
    
    print("💡 Try one of these example procedures:")
    for i, example in enumerate(example_procedures, 1):
        print(f"{i}. {example}")
    print()

interactive_demo()

# User input cell
user_procedure = """
Add 3.0 g of benzoin to 25 mL of ethanol in a round-bottom flask. 
Add 0.5 g of sodium borohydride portionwise while stirring at room temperature.
Stir for 2 hours, then add 20 mL of water and extract with ethyl acetate.
Dry the organic layer over MgSO4 and concentrate to give 2.1 g of hydrobenzoin (69% yield).
"""

print("📝 Your procedure:")
print(user_procedure.strip())
print("\n▶️ Run the next cell to analyze this procedure.")

In [None]:
# Process user input
if user_procedure.strip():
    print("🔄 Analyzing your procedure...")
    print("="*60)
    
    # Analyze user procedure
    user_result = extractor.analyze_procedure(user_procedure)
    
    # Display results
    print(f"✅ Analysis completed in {user_result['processing_time']:.1f}s")
    print(f"🎯 Confidence: {user_result['confidence']:.2%}")
    print()
    
    # Show detailed results
    display_extraction_results(user_result)
    
else:
    print("⚠️  Please enter a chemical procedure in the cell above!")

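To process several procedures in one go, a simple loop that records failures instead of stopping is often enough. The sketch below is one possible approach, not a built-in RxNExtract batch API; `analyze_batch` and `fake_analyze` are hypothetical names, and `fake_analyze` stands in for `extractor.analyze_procedure` so the example runs without a model.

```python
def analyze_batch(analyze, procedures):
    """Run `analyze` on each procedure and collect results or error messages."""
    outcomes = []
    for index, text in enumerate(procedures):
        try:
            outcomes.append({"index": index, "ok": True, "result": analyze(text)})
        except Exception as exc:  # keep going so one bad input does not end the batch
            outcomes.append({"index": index, "ok": False, "error": str(exc)})
    return outcomes


def fake_analyze(text):
    """Stand-in for extractor.analyze_procedure (illustration only)."""
    if not text.strip():
        raise ValueError("empty procedure")
    return {"confidence": 0.5, "extracted_data": {}}


batch = analyze_batch(fake_analyze, [
    "Add 1.0 g of sodium chloride to 10 mL of water and stir until dissolved.",
    "   ",
])
for entry in batch:
    status = "ok" if entry["ok"] else f"failed: {entry['error']}"
    print(entry["index"], status)
```

With the model loaded, the same helper could be called as `analyze_batch(extractor.analyze_procedure, my_procedures)`, where `my_procedures` is any list of procedure strings.
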
## 📊 Summary & Next Steps

Congratulations! You've successfully explored RxNExtract's capabilities.

In [None]:
# Final summary
print("🎉 RXNEXTRACT QUICK START SUMMARY")
print("=" * 50)
print("\n✅ WHAT YOU'VE ACCOMPLISHED:")
print("   • Installed and set up RxNExtract in Google Colab")
print("   • Extracted reactants, products, and conditions from chemical procedures")
print("   • Explored interactive capabilities")
print("   • Tested with custom procedures")

print("\n🚀 PERFORMANCE HIGHLIGHTS:")
print("   • 122.6% improvement in Complete Reaction Accuracy over baseline")
print("   • 47-55% error reduction across major categories")
print("   • Statistical significance: p < 0.001")
print("   • Robust handling of complex procedures")

print("\n📖 NEXT STEPS:")
print("   1. 🔬 Try RxNExtract with your own chemical procedures")
print("   2. 📊 Explore the comprehensive analysis framework")
print("   3. 🏗️  Integrate RxNExtract into your research workflow")
print("   4. 📚 Read the full documentation and research paper")
print("   5. 🤝 Join the community and contribute")

print("\n🔗 USEFUL LINKS:")
print("   • GitHub: https://github.com/chemplusx/RxNExtract")
print("   • PyPI: https://pypi.org/project/rxnextract/")
print("   • HuggingFace: https://huggingface.co/chemplusx/rxnextract-complete")
print("   • Documentation: https://docs.rxnextract.org")

print("\n🎯 Thank you for trying RxNExtract!")
print("   We hope this tool accelerates your chemical research.")
print("   Please consider starring ⭐ our GitHub repository!")