# Cross-Encoder Demo for Medical Term Re-ranking

This notebook demonstrates three ways to use the cross-encoder to re-rank candidate LOINC terms for a free-text medical description:
1. Interactive input mode
2. Single test description
3. Batch processing from CSV

In [1]:
import pandas as pd
from cross_encode import MedicalTermRanker, candidate_pool

# Initialize the ranker once to be used across all examples
ranker = MedicalTermRanker(candidate_pool=candidate_pool)

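The notebook imports `MedicalTermRanker` from `cross_encode` without showing its source. The core re-ranking pattern is that a cross-encoder scores each (description, candidate) pair jointly, and the candidates are then sorted by that score. The sketch below shows only that pattern under stated assumptions: `SketchRanker`, `overlap_score`, and the four-entry pool are hypothetical, the token-overlap scorer is a toy stand-in for a real model, and the `(code, description, score)` tuple layout is inferred from how `process_csv` indexes `ranked_results` further down.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical four-entry pool; codes and descriptions are copied from this notebook's outputs.
CANDIDATES: List[Tuple[str, str]] = [
    ("2345-7", "Glucose [Mass/volume] in Serum or Plasma"),
    ("718-7", "Hemoglobin [Mass/volume] in Blood"),
    ("2093-3", "Cholesterol [Mass/volume] in Serum or Plasma"),
    ("2160-0", "Creatinine [Mass/volume] in Serum or Plasma"),
]


def overlap_score(query: str, candidate: str) -> float:
    """Toy stand-in for a cross-encoder: share of query tokens present in the candidate."""
    q_tokens = set(query.lower().split())
    c_tokens = set(candidate.lower().replace("[", " ").replace("]", " ").split())
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0


@dataclass
class SketchRanker:
    candidates: Sequence[Tuple[str, str]]
    # Scores a batch of (query, candidate) pairs, one float per pair, the way a
    # cross-encoder reads both texts together rather than embedding them separately.
    score_fn: Callable[[List[Tuple[str, str]]], Sequence[float]]

    def rank_terms(self, description: str) -> List[Tuple[str, str, float]]:
        pairs = [(description, text) for _, text in self.candidates]
        scores = self.score_fn(pairs)
        ranked = [(code, text, float(s)) for (code, text), s in zip(self.candidates, scores)]
        # Highest score first, the order print_ranked_results appears to display.
        return sorted(ranked, key=lambda r: r[2], reverse=True)


toy_ranker = SketchRanker(
    CANDIDATES, score_fn=lambda pairs: [overlap_score(q, c) for q, c in pairs]
)
print(toy_ranker.rank_terms("glucose")[0])
# ('2345-7', 'Glucose [Mass/volume] in Serum or Plasma', 1.0)
```

With sentence-transformers installed, `score_fn` could plausibly be `CrossEncoder(model_name).predict`, which accepts a list of text pairs and returns one score per pair; which model `MedicalTermRanker` actually loads is not stated in this notebook.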


## 1. Interactive Input Mode
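This cell implements an interactive loop: type a description to see the ranked LOINC candidates immediately, or type `quit` to exit. The saved output below shows four queries; the typed inputs themselves are not echoed.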

In [2]:
def interactive_mode():
    print("Enter medical descriptions (type 'quit' to exit):")
    while True:
        description = input("\nEnter description: ").strip()
        if description.lower() == 'quit':
            break
        if not description:
            print("Please enter a valid description.")
            continue
        
        ranked_results = ranker.rank_terms(description)
        ranker.print_ranked_results(ranked_results)

interactive_mode()

Enter medical descriptions (type 'quit' to exit):


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 26.45it/s]



Ranked Results:
LOINC Code                                                             Description    Score
     718-7                                       Hemoglobin [Mass/volume] in Blood  -1.0905
    1975-2                        Total Bilirubin [Mass/volume] in Serum or Plasma  -9.1000
    4544-3                Hematocrit [Volume Fraction] of Blood by Automated count -10.0101
    2571-8                           Triglyceride [Mass/volume] in Serum or Plasma -10.8224
    2345-7                                Glucose [Mass/volume] in Serum or Plasma -10.9698
    1742-6 Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma -10.9784
    2160-0                             Creatinine [Mass/volume] in Serum or Plasma -10.9960
    6768-6     Alkaline phosphatase [Enzymatic activity/volume] in Serum or Plasma -11.1289
     751-8                      Neutrophils [#/volume] in Blood by Automated count -11.1383
    1920-8                      AST [Enzymatic activity/volume]

Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 140.07it/s]



Ranked Results:
LOINC Code                                                             Description    Score
     718-7                                       Hemoglobin [Mass/volume] in Blood   5.4140
    1975-2                        Total Bilirubin [Mass/volume] in Serum or Plasma  -8.3192
    4544-3                Hematocrit [Volume Fraction] of Blood by Automated count  -8.6015
    2345-7                                Glucose [Mass/volume] in Serum or Plasma -10.3216
     751-8                      Neutrophils [#/volume] in Blood by Automated count -10.6452
    2160-0                             Creatinine [Mass/volume] in Serum or Plasma -10.8762
    1742-6 Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma -10.9207
    2571-8                           Triglyceride [Mass/volume] in Serum or Plasma -10.9389
    2085-9                        HDL Cholesterol [Mass/volume] in Serum or Plasma -10.9396
    6768-6     Alkaline phosphatase [Enzymatic activity/volume]

Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 136.76it/s]



Ranked Results:
LOINC Code                                                             Description    Score
    1975-2                        Total Bilirubin [Mass/volume] in Serum or Plasma   3.7272
     718-7                                       Hemoglobin [Mass/volume] in Blood  -9.8210
    2571-8                           Triglyceride [Mass/volume] in Serum or Plasma -11.0995
    1742-6 Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma -11.1075
    2345-7                                Glucose [Mass/volume] in Serum or Plasma -11.1722
    2160-0                             Creatinine [Mass/volume] in Serum or Plasma -11.1768
    6768-6     Alkaline phosphatase [Enzymatic activity/volume] in Serum or Plasma -11.2460
     751-8                      Neutrophils [#/volume] in Blood by Automated count -11.2715
    1920-8                      AST [Enzymatic activity/volume] in Serum or Plasma -11.2744
    2085-9                        HDL Cholesterol [Mass/volume]

Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 131.83it/s]



Ranked Results:
LOINC Code                                                             Description    Score
    1742-6 Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma   4.6819
    2160-0                             Creatinine [Mass/volume] in Serum or Plasma -10.1353
    6768-6     Alkaline phosphatase [Enzymatic activity/volume] in Serum or Plasma -11.2039
    2571-8                           Triglyceride [Mass/volume] in Serum or Plasma -11.2093
    2345-7                                Glucose [Mass/volume] in Serum or Plasma -11.2480
    1920-8                      AST [Enzymatic activity/volume] in Serum or Plasma -11.2932
    1975-2                        Total Bilirubin [Mass/volume] in Serum or Plasma -11.3040
    2093-3                            Cholesterol [Mass/volume] in Serum or Plasma -11.3378
    3094-0                                    BUN [Mass/volume] in Serum or Plasma -11.3394
     718-7                                       Hemoglobin [Ma

## 2. Single Test Description
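This cell ranks the candidates for a single hard-coded description. The function's default is `"blood glucose measurement"`; the call below passes `'glucose'` instead.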

In [6]:
def test_single_description(description="blood glucose measurement"):
    print(f"Testing description: '{description}'")
    ranked_results = ranker.rank_terms(description)
    ranker.print_ranked_results(ranked_results)
    return ranked_results

# Run the test
test_results = test_single_description('glucose')

Testing description: 'glucose'


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 136.32it/s]


Ranked Results:
LOINC Code                                                             Description    Score
    2345-7                                Glucose [Mass/volume] in Serum or Plasma   2.9120
    2571-8                           Triglyceride [Mass/volume] in Serum or Plasma -10.2044
    1975-2                        Total Bilirubin [Mass/volume] in Serum or Plasma -10.7396
     718-7                                       Hemoglobin [Mass/volume] in Blood -10.8738
    6768-6     Alkaline phosphatase [Enzymatic activity/volume] in Serum or Plasma -10.9320
    1742-6 Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma -10.9464
    2093-3                            Cholesterol [Mass/volume] in Serum or Plasma -10.9769
    2160-0                             Creatinine [Mass/volume] in Serum or Plasma -11.0003
    2085-9                        HDL Cholesterol [Mass/volume] in Serum or Plasma -11.0383
     785-6                                   MCH [Entitic mass]
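The scores above are unbounded real numbers. If they are raw logits from a cross-encoder trained with a binary relevance objective (an assumption; the notebook does not say how `MedicalTermRanker`'s model was trained), a logistic sigmoid maps them to a 0 to 1 relevance estimate, and the gap between the top result and the runner-up is a simple confidence signal. The helper names `sigmoid` and `summarize_top` are hypothetical; the two rows are copied from the `'glucose'` output.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def summarize_top(ranked: List[Tuple[str, str, float]]) -> Dict[str, object]:
    """Top code, its sigmoid-mapped score, and its margin over the runner-up."""
    top, runner_up = ranked[0], ranked[1]
    return {
        "top_code": top[0],
        "top_prob": sigmoid(top[2]),
        "margin": top[2] - runner_up[2],
    }


# Top two rows of the 'glucose' output above.
glucose_top2 = [
    ("2345-7", "Glucose [Mass/volume] in Serum or Plasma", 2.9120),
    ("2571-8", "Triglyceride [Mass/volume] in Serum or Plasma", -10.2044),
]
s = summarize_top(glucose_top2)
print(s["top_code"], round(s["top_prob"], 3), round(s["margin"], 4))
```

The margin does not depend on the logit assumption, so it may be the safer of the two signals if the model's training objective is unknown.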




## 3. Batch Processing from CSV
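This cell reads descriptions and their expected LOINC codes from a CSV file (columns `description` and `expected_loinc`), ranks each description, and reports top-1 accuracy against the expected codes.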
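The CSV file itself is not included with the notebook. A minimal sketch for producing one, assuming the two columns `process_csv` reads are all it needs: the five rows are taken from the batch output below, and the filename `test_descriptions_sample.csv` is hypothetical, chosen so it does not overwrite a real `test_descriptions.csv`.

```python
import pandas as pd

# Rows reconstructed from the batch output below; the real file may contain more.
rows = [
    ("blood sugar test", "2345-7"),
    ("complete blood count with hemoglobin", "718-7"),
    ("liver function test", "1920-8"),
    ("cholesterol panel", "2093-3"),
    ("kidney function test", "2160-0"),
]
sample = pd.DataFrame(rows, columns=["description", "expected_loinc"])
sample.to_csv("test_descriptions_sample.csv", index=False)

# Read it back with expected_loinc forced to str, so codes compare as text.
check = pd.read_csv("test_descriptions_sample.csv", dtype={"expected_loinc": str})
print(check.shape)  # (5, 2)
```

It could then be passed in with `process_csv('test_descriptions_sample.csv')`.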

In [7]:
def process_csv(csv_path='test_descriptions.csv'):
    # Read the CSV file (expects 'description' and 'expected_loinc' columns);
    # keep LOINC codes as strings so they compare exactly with the ranker's codes
    df = pd.read_csv(csv_path, dtype={'expected_loinc': str})
    
    # Process each description
    results = []
    for _, row in df.iterrows():
        description = row['description']
        expected_loinc = row['expected_loinc']
        
        print(f"\nProcessing: {description}")
        ranked_results = ranker.rank_terms(description)
        
        # Get the top result; entries appear to be (LOINC code, description, score)
        top_result = ranked_results[0]
        
        # Store results
        results.append({
            'description': description,
            'expected_loinc': expected_loinc,
            'top_loinc': top_result[0],
            'top_score': top_result[2],
            'match': expected_loinc == top_result[0]
        })
        
        # Print individual results
        print(f"Expected LOINC: {expected_loinc}")
        ranker.print_ranked_results([top_result])
    
    # Create results DataFrame
    results_df = pd.DataFrame(results)
    
    # Print summary statistics (guard against an empty CSV, where the
    # DataFrame has no 'match' column and the accuracy would divide by zero)
    total = len(results_df)
    correct = int(results_df['match'].sum()) if total else 0
    accuracy = correct / total * 100 if total else 0.0
    print("\nSummary Statistics:")
    print(f"Total tests: {total}")
    print(f"Correct matches: {correct}")
    print(f"Accuracy: {accuracy:.2f}%")
    
    return results_df

# Run the CSV processing
results_df = process_csv()


Processing: blood sugar test


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 136.01it/s]


Expected LOINC: 2345-7

Ranked Results:
LOINC Code                              Description   Score
    2345-7 Glucose [Mass/volume] in Serum or Plasma -2.4427

Processing: complete blood count with hemoglobin


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 139.98it/s]


Expected LOINC: 718-7

Ranked Results:
LOINC Code                       Description   Score
     718-7 Hemoglobin [Mass/volume] in Blood -2.4621

Processing: liver function test


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 145.75it/s]


Expected LOINC: 1920-8

Ranked Results:
LOINC Code                           Description    Score
     785-6 MCH [Entitic mass] by Automated count -11.1695

Processing: cholesterol panel


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 145.51it/s]


Expected LOINC: 2093-3

Ranked Results:
LOINC Code                                  Description   Score
    2093-3 Cholesterol [Mass/volume] in Serum or Plasma -2.5804

Processing: kidney function test


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 148.96it/s]

Expected LOINC: 2160-0

Ranked Results:
LOINC Code                                        Description    Score
     751-8 Neutrophils [#/volume] in Blood by Automated count -11.0728

Summary Statistics:
Total tests: 5
Correct matches: 3
Accuracy: 60.00%
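In the batch output above, both misses have top scores near -11, close to the scores of clearly irrelevant candidates, while the three correct matches sit near -2.5. A score cut-off could therefore flag results for manual review. The sketch rebuilds the printed results as `sample_df` (so it does not overwrite the notebook's `results_df`) and applies a hypothetical `THRESHOLD = -5.0` picked by eye from these five rows; a usable cut-off would need to be chosen on a larger labelled set.

```python
import pandas as pd

# Values copied from the printed batch results above.
sample_df = pd.DataFrame({
    "description": ["blood sugar test", "complete blood count with hemoglobin",
                    "liver function test", "cholesterol panel", "kidney function test"],
    "expected_loinc": ["2345-7", "718-7", "1920-8", "2093-3", "2160-0"],
    "top_loinc": ["2345-7", "718-7", "785-6", "2093-3", "751-8"],
    "top_score": [-2.4427, -2.4621, -11.1695, -2.5804, -11.0728],
})
sample_df["match"] = sample_df["expected_loinc"] == sample_df["top_loinc"]

THRESHOLD = -5.0  # hypothetical cut-off, tuned by eye on five rows only
sample_df["confident"] = sample_df["top_score"] >= THRESHOLD

coverage = sample_df["confident"].mean()  # share of rows that would be auto-accepted
accuracy_confident = sample_df.loc[sample_df["confident"], "match"].mean()
print(f"Coverage: {coverage:.0%}, accuracy on confident rows: {accuracy_confident:.0%}")

# Rows that would be routed to manual review.
print(sample_df.loc[~sample_df["confident"], ["description", "top_loinc", "top_score"]])
```

On these five rows the cut-off separates hits from misses exactly, which says little about how it would behave on new descriptions.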



