# Cross-Encoder Demo for Medical Term Re-ranking

This notebook demonstrates three ways to use the cross-encoder ranker (`MedicalTermRanker`) to re-rank candidate LOINC terms against a free-text medical description:
1. Interactive input mode
2. Single test description
3. Batch processing from CSV

In [5]:
import pandas as pd
from cross_encode import MedicalTermRanker, candidate_pool

# Initialize the ranker once to be used across all examples
ranker = MedicalTermRanker(candidate_pool=candidate_pool)
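
The `cross_encode` module is not shown in this notebook, so the shape of the ranker has to be inferred from how it is used: `rank_terms(description)` returns a sequence ordered best-first, whose items carry the LOINC code at index 0 and the score at index 2. The sketch below mimics that interface with a hypothetical `rank_candidates` helper and a toy token-overlap scorer standing in for the cross-encoder model. The function names, the description at index 1, and the three-entry pool are illustrative assumptions, not the module's actual code.

```python
from typing import Callable, List, Tuple

# Hypothetical candidate pool: (LOINC code, description) pairs that appear
# in the outputs of this notebook.
CANDIDATES: List[Tuple[str, str]] = [
    ("2345-7", "Glucose [Mass/volume] in Serum or Plasma"),
    ("718-7", "Hemoglobin [Mass/volume] in Blood"),
    ("2093-3", "Cholesterol [Mass/volume] in Serum or Plasma"),
]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer (NOT a cross-encoder): fraction of query tokens found in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().replace("[", " ").replace("]", " ").split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rank_candidates(
    description: str,
    candidates: List[Tuple[str, str]],
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, str, float]]:
    """Score every (description, candidate) pair; return (code, text, score) tuples, best first."""
    scored = [(code, text, score_fn(description, text)) for code, text in candidates]
    return sorted(scored, key=lambda item: item[2], reverse=True)


ranked = rank_candidates("glucose in plasma", CANDIDATES)
top_code, top_text, top_score = ranked[0]
print(top_code, top_text, top_score)
```

Passing the scorer as `score_fn` keeps the ranking logic separate from the model, so a real cross-encoder's pairwise scoring call could be swapped in without touching the sort.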



## 1. Interactive Input Mode
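This cell runs an interactive loop: type a description to see the ranked LOINC candidates, or type `quit` to exit. The saved output below contains four result tables from one session; the typed descriptions themselves do not appear in the saved output.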

In [6]:
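def interactive_mode():
    """Prompt for descriptions in a loop and print the ranked terms for each one."""
    print("Enter medical descriptions (type 'quit' to exit):")
    while True:
        try:
            description = input("\nEnter description: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Stop cleanly when input is closed or the cell is interrupted
            break
        if description.lower() == 'quit':
            break
        if not description:
            print("Please enter a valid description.")
            continue

        ranked_results = ranker.rank_terms(description)
        ranker.print_ranked_results(ranked_results)

interactive_mode()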

Enter medical descriptions (type 'quit' to exit):


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 135.07it/s]



Ranked Results:
LOINC Code                                                             Description   Score
     731-0                      Lymphocytes [#/volume] in Blood by Automated count  9.0640
     751-8                      Neutrophils [#/volume] in Blood by Automated count  5.9140
     777-3                        Platelets [#/volume] in Blood by Automated count  4.5153
    4544-3                Hematocrit [Volume Fraction] of Blood by Automated count  3.5703
     786-4                                   MCHC [Mass/volume] by Automated count  3.2429
     787-2                                 MCV [Entitic volume] by Automated count  1.9096
     788-0               Erythrocyte distribution width [Ratio] by Automated count  0.0715
     785-6                                   MCH [Entitic mass] by Automated count -1.4490
     718-7                                       Hemoglobin [Mass/volume] in Blood -4.2435
    1975-2                        Total Bilirubin [Mass/volume] in Serum 

Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 138.44it/s]



Ranked Results:
LOINC Code                                                             Description    Score
     731-0                      Lymphocytes [#/volume] in Blood by Automated count   8.4707
     751-8                      Neutrophils [#/volume] in Blood by Automated count   2.6071
     777-3                        Platelets [#/volume] in Blood by Automated count   0.6505
    4544-3                Hematocrit [Volume Fraction] of Blood by Automated count   0.5312
     788-0               Erythrocyte distribution width [Ratio] by Automated count  -0.4771
     786-4                                   MCHC [Mass/volume] by Automated count  -0.6970
     787-2                                 MCV [Entitic volume] by Automated count  -2.0379
     785-6                                   MCH [Entitic mass] by Automated count  -2.1688
     718-7                                       Hemoglobin [Mass/volume] in Blood  -8.9761
    1975-2                        Total Bilirubin [Mass/volume]

Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 140.09it/s]



Ranked Results:
LOINC Code                                                             Description    Score
    4544-3                Hematocrit [Volume Fraction] of Blood by Automated count   7.9121
     751-8                      Neutrophils [#/volume] in Blood by Automated count   6.9265
     777-3                        Platelets [#/volume] in Blood by Automated count   6.7726
     731-0                      Lymphocytes [#/volume] in Blood by Automated count   6.5071
     787-2                                 MCV [Entitic volume] by Automated count   1.6038
     786-4                                   MCHC [Mass/volume] by Automated count   1.1777
     785-6                                   MCH [Entitic mass] by Automated count   1.1746
     788-0               Erythrocyte distribution width [Ratio] by Automated count   0.7850
     718-7                                       Hemoglobin [Mass/volume] in Blood  -9.7337
    1975-2                        Total Bilirubin [Mass/volume]

Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 132.01it/s]



Ranked Results:
LOINC Code                                                             Description    Score
     731-0                      Lymphocytes [#/volume] in Blood by Automated count   4.5218
     788-0               Erythrocyte distribution width [Ratio] by Automated count -11.0737
     751-8                      Neutrophils [#/volume] in Blood by Automated count -11.0885
    6768-6     Alkaline phosphatase [Enzymatic activity/volume] in Serum or Plasma -11.1366
    1742-6 Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma -11.1824
    1920-8                      AST [Enzymatic activity/volume] in Serum or Plasma -11.1842
    2160-0                             Creatinine [Mass/volume] in Serum or Plasma -11.2403
     718-7                                       Hemoglobin [Mass/volume] in Blood -11.2549
    2093-3                            Cholesterol [Mass/volume] in Serum or Plasma -11.2570
    2571-8                           Triglyceride [Mass/volume]
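
The scores printed above are unbounded values (from roughly 9 down to -11 in these runs), so the absolute number in the top row is hard to read on its own. One rough way to read a table is the gap between the first and second scores: in the last table the top candidate leads by more than 15 points, while in the third table the top four candidates sit within about 1.5 points of each other. The sketch below computes that gap with a hypothetical `top_margin` helper, assuming each result is a `(code, description, score)` tuple. It is an illustration, not part of `MedicalTermRanker`.

```python
from typing import List, Tuple

RankedRow = Tuple[str, str, float]  # assumed layout: (LOINC code, description, score)


def top_margin(ranked: List[RankedRow]) -> float:
    """Return the score gap between the best and second-best candidates."""
    if len(ranked) < 2:
        raise ValueError("need at least two ranked candidates to compute a margin")
    return ranked[0][2] - ranked[1][2]


# First two rows of the third and fourth tables shown above.
close_call = [
    ("4544-3", "Hematocrit [Volume Fraction] of Blood by Automated count", 7.9121),
    ("751-8", "Neutrophils [#/volume] in Blood by Automated count", 6.9265),
]
clear_winner = [
    ("731-0", "Lymphocytes [#/volume] in Blood by Automated count", 4.5218),
    ("788-0", "Erythrocyte distribution width [Ratio] by Automated count", -11.0737),
]

print(f"close call margin:   {top_margin(close_call):.4f}")
print(f"clear winner margin: {top_margin(clear_winner):.4f}")
```

A small margin does not prove the top result is wrong; it only suggests that several candidates look similarly plausible to the model for that description.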

## 2. Single Test Description
This cell ranks a single hard-coded description (`"blood glucose measurement"` by default) and returns the ranked results for further inspection.

In [2]:
def test_single_description(description="blood glucose measurement"):
    print(f"Testing description: '{description}'")
    ranked_results = ranker.rank_terms(description)
    ranker.print_ranked_results(ranked_results)
    return ranked_results

# Run the test
test_results = test_single_description()

Testing description: 'blood glucose measurement'


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 35.35it/s]


Ranked Results:
LOINC Code                                                             Description    Score
    2345-7                                Glucose [Mass/volume] in Serum or Plasma   0.3159
    4544-3                Hematocrit [Volume Fraction] of Blood by Automated count  -8.0850
     718-7                                       Hemoglobin [Mass/volume] in Blood  -8.1794
    1975-2                        Total Bilirubin [Mass/volume] in Serum or Plasma  -9.3607
    2571-8                           Triglyceride [Mass/volume] in Serum or Plasma  -9.5033
    2093-3                            Cholesterol [Mass/volume] in Serum or Plasma  -9.6174
     751-8                      Neutrophils [#/volume] in Blood by Automated count  -9.7745
    2085-9                        HDL Cholesterol [Mass/volume] in Serum or Plasma  -9.9587
    2089-1                        LDL Cholesterol [Mass/volume] in Serum or Plasma  -9.9724
     777-3                        Platelets [#/volume] in Blood




## 3. Batch Processing from CSV
This cell reads descriptions and their expected LOINC codes from a CSV file, ranks each description, and reports how often the top-ranked code matches the expected one.
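
The `process_csv` function below reads two columns, `description` and `expected_loinc`. The `test_descriptions.csv` file itself is not included here, so the sketch below reconstructs a plausible layout from the five descriptions and expected codes that appear in the output further down. It parses the text with the standard `csv` module purely for illustration (the notebook itself uses `pd.read_csv`); the column order and formatting of the real file are assumptions.

```python
import csv
import io

# Assumed contents of test_descriptions.csv, reconstructed from the notebook output.
CSV_TEXT = """description,expected_loinc
blood sugar test,2345-7
complete blood count with hemoglobin,718-7
liver function test,1920-8
cholesterol panel,2093-3
kidney function test,2160-0
"""

# DictReader maps each data row to {'description': ..., 'expected_loinc': ...}.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

for row in rows:
    print(f"{row['description']!r} -> expected {row['expected_loinc']}")
```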

In [3]:
def process_csv(csv_path='test_descriptions.csv'):
    """Rank each description in the CSV and compare the top result with the expected LOINC code."""
    # Read the CSV file (expects 'description' and 'expected_loinc' columns)
    df = pd.read_csv(csv_path)

    # Process each description
    results = []
    for _, row in df.iterrows():
        description = row['description']
        expected_loinc = row['expected_loinc']

        print(f"\nProcessing: {description}")
        ranked_results = ranker.rank_terms(description)

        # Get the top result (index 0 is the LOINC code, index 2 is the score)
        top_result = ranked_results[0]

        # Store results
        results.append({
            'description': description,
            'expected_loinc': expected_loinc,
            'top_loinc': top_result[0],
            'top_score': top_result[2],
            'match': expected_loinc == top_result[0]
        })

        # Print individual results
        print(f"Expected LOINC: {expected_loinc}")
        ranker.print_ranked_results([top_result])

    # Create results DataFrame
    results_df = pd.DataFrame(results)

    # Print summary statistics; an empty CSV yields a frame with no 'match' column,
    # so guard the lookup and the division
    total = len(results_df)
    correct = int(results_df['match'].sum()) if total else 0
    accuracy = correct / total * 100 if total else 0.0
    print("\nSummary Statistics:")
    print(f"Total tests: {total}")
    print(f"Correct matches: {correct}")
    print(f"Accuracy: {accuracy:.2f}%")

    return results_df

# Run the CSV processing
results_df = process_csv()


Processing: blood sugar test


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 139.99it/s]


Expected LOINC: 2345-7

Ranked Results:
LOINC Code                              Description   Score
    2345-7 Glucose [Mass/volume] in Serum or Plasma -2.4427

Processing: complete blood count with hemoglobin


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 141.11it/s]


Expected LOINC: 718-7

Ranked Results:
LOINC Code                       Description   Score
     718-7 Hemoglobin [Mass/volume] in Blood -2.4621

Processing: liver function test


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 146.08it/s]


Expected LOINC: 1920-8

Ranked Results:
LOINC Code                           Description    Score
     785-6 MCH [Entitic mass] by Automated count -11.1695

Processing: cholesterol panel


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 89.18it/s]


Expected LOINC: 2093-3

Ranked Results:
LOINC Code                                  Description   Score
    2093-3 Cholesterol [Mass/volume] in Serum or Plasma -2.5804

Processing: kidney function test


Ranking terms: 100%|██████████| 20/20 [00:00<00:00, 142.42it/s]

Expected LOINC: 2160-0

Ranked Results:
LOINC Code                                        Description    Score
     751-8 Neutrophils [#/volume] in Blood by Automated count -11.0728

Summary Statistics:
Total tests: 5
Correct matches: 3
Accuracy: 60.00%
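
In the run above, the three correct matches have top scores between about -2.4 and -2.6, while the two misses ("liver function test" and "kidney function test") have top scores near -11. For this small sample only, that pattern suggests a very low top score may signal that no candidate fits the description well. The sketch below recomputes the accuracy from the printed results and flags low-scoring top results; the `-5.0` cutoff and the `accuracy` and `flag_low_confidence` names are hypothetical choices for illustration, not values from the notebook.

```python
from typing import Dict, List

# Top results copied from the batch output above.
RESULTS: List[Dict] = [
    {"description": "blood sugar test", "expected_loinc": "2345-7",
     "top_loinc": "2345-7", "top_score": -2.4427},
    {"description": "complete blood count with hemoglobin", "expected_loinc": "718-7",
     "top_loinc": "718-7", "top_score": -2.4621},
    {"description": "liver function test", "expected_loinc": "1920-8",
     "top_loinc": "785-6", "top_score": -11.1695},
    {"description": "cholesterol panel", "expected_loinc": "2093-3",
     "top_loinc": "2093-3", "top_score": -2.5804},
    {"description": "kidney function test", "expected_loinc": "2160-0",
     "top_loinc": "751-8", "top_score": -11.0728},
]


def accuracy(results: List[Dict]) -> float:
    """Percentage of rows whose top-ranked code equals the expected code (0.0 for no rows)."""
    if not results:
        return 0.0
    correct = sum(r["expected_loinc"] == r["top_loinc"] for r in results)
    return correct / len(results) * 100


def flag_low_confidence(results: List[Dict], cutoff: float = -5.0) -> List[str]:
    """Return descriptions whose top score falls below the (hypothetical) cutoff."""
    return [r["description"] for r in results if r["top_score"] < cutoff]


print(f"Accuracy: {accuracy(RESULTS):.2f}%")
print("Low-confidence top results:", flag_low_confidence(RESULTS))
```

Both flagged descriptions name a panel of tests rather than a single analyte, so a single expected code may be a hard target for them; a larger test set would be needed before settling on any cutoff.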



