# Form Analysis Report
Project Name: AI Driven Pharmacy Procurement Software
Date: 2/1/2024

## Introduction

This report analyzes the static dailysnapshot.csv dataset. Its objective is to identify a method for developing and implementing an algorithm that normalizes and standardizes pharmaceutical product data and identifies acceptable replacements for each product. The starting point is Taha's initially parsed data, which requires refinement so that each drug is associated with its corresponding form factor for effective analysis.

The steps identified through my analysis are as follows:

### Step 1: Normalize and Standardize Data
**Objective**: To ensure consistency and comparability across the pharmaceutical dataset, particularly in terms of drug forms and dosage units.

**Methodology**:
1. Normalization of Form Names: Converts abbreviations and synonyms to a standardized set of terms. For instance, 'caps' is normalized to 'capsules' and 'tab' to 'tablets'. This step is needed to compare and categorize drug forms accurately.
2. Standardization of Dosage Units: Ensures that all dosages are expressed in consistent units such as mg, g, or ml, converting to a common unit where applicable so that dosages can be compared directly.
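
The dosage standardization described above might look like the following minimal sketch. The function name `to_milligrams`, the conversion table, and the assumed input format (a number followed by a mass unit, such as `0.5 g`) are illustrative assumptions rather than the confirmed schema of the parsed data. Volume units such as ml are deliberately left out, since they cannot be converted to mass without a concentration.

```python
import re

# Conversion factors to milligrams for mass units (assumed set; extend as needed)
MASS_TO_MG = {'mcg': 0.001, 'mg': 1.0, 'g': 1000.0, 'kg': 1_000_000.0}

# A number (optionally with decimals) followed by a mass unit, e.g. "500 mg" or "0.5g"
DOSAGE_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*(mcg|mg|kg|g)\b', re.IGNORECASE)

def to_milligrams(dosage_text):
    """Return the first mass dosage found in dosage_text as milligrams, or None."""
    match = DOSAGE_PATTERN.search(dosage_text or '')
    if match is None:
        return None  # no recognizable mass dosage (e.g. "10 ml")
    value, unit = match.groups()
    return float(value) * MASS_TO_MG[unit.lower()]
```

Under these assumptions, `to_milligrams('0.5g')` and `to_milligrams('500 mg')` return the same value, which is what makes the two dosages directly comparable.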

**Implementation**:

1. Define Normalization and Standardization Rules

First, we'll define a set of rules for normalization. These rules convert diverse pharmaceutical form names into a consistent format; dosages call for a similar set of rules.

In [None]:
# Normalization rules for form names
form_normalization_rules = {
    'caps': 'capsules',
    'tab': 'tablets',
    'caplet': 'tablet',  # Considering caplets and tablets equivalent for this context
    'ointment': 'cream',  # Simplification, though not strictly accurate
    'susp': 'suspension',
    'conc': 'concentrate',
    'inj': 'injection',
    'vial': 'injection',  # Vials are typically used for injections
    'syringe': 'injection',  # Similarly, syringes imply injection
    'powd': 'powder',
    'aero': 'aerosol',
    'drops': 'solution',  # Eye/nose drops are essentially solutions
    'lozenge': 'tablet',  # Simplification, for oral solid dosage
    'shampoo': 'topical solution',  # Shampoos can be considered topical solutions
    'foam': 'topical foam',
    'gel': 'topical gel'
}

2. Apply Normalization to Parsed Data

Next, we'll apply these normalization rules to the parsed data. This process iterates over each entry and updates form names according to the normalization rules. Dosage units are to be standardized in the same pass where applicable; the code shown here covers form names only.

In [None]:
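import re

def normalize_form_names(entry, rules):
    """Replace abbreviated form names in entry['form_dosage'] with standard terms."""
    if entry['form_dosage']:
        for original, normalized in rules.items():
            # Match whole words only, so that 'tab' does not rewrite part of
            # 'tablets' and 'caps' does not rewrite part of 'capsules'.
            pattern = r'\b' + re.escape(original) + r'\b'
            entry['form_dosage'] = re.sub(pattern, normalized, entry['form_dosage'], flags=re.IGNORECASE)
    return entry

# Note: rules such as 'gel' -> 'topical gel' are not idempotent, so apply
# normalization once to the raw parsed data rather than to already-normalized data.
normalized_data = [normalize_form_names(entry, form_normalization_rules) for entry in improved_parsed_sample_data]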

### Step 2: Define Rules for Acceptable Replacements

This step requires establishing a set of criteria to determine when one form of medication can be considered an acceptable replacement for another. These criteria must consider both the physical form of the medication (e.g., tablet, liquid, injection) and the dosage equivalency.

1. Form Factor Similarities
- **Solid to Solid:** Tablets, capsules, and caplets can generally be considered interchangeable, assuming the dosage strength and release characteristics (e.g., immediate vs. extended-release) are comparable.
- **Liquid to Liquid:** Solutions, suspensions, and syrups can be replaced with one another, with attention to concentration (mg/ml) to ensure dosage accuracy.
- **Topical Forms:** Creams, ointments, and gels might be interchangeable for external use, depending on the active ingredient's absorption characteristics.
- **Injectables:** Different forms of injection (e.g., intravenous, intramuscular) should be treated as replacements only with caution and on professional healthcare advice.
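
One way to encode these groupings is a lookup from normalized form name to an interchangeability group, as in the hedged sketch below. The group labels, the membership of each group, and the helper names `form_group` and `forms_compatible` are assumptions made for illustration; the actual groups would need review by a pharmacist.

```python
# Hypothetical grouping of normalized form names into interchangeability groups
FORM_GROUPS = {
    'oral solid': {'tablet', 'tablets', 'capsule', 'capsules', 'caplet'},
    'oral liquid': {'solution', 'suspension', 'syrup'},
    'topical': {'cream', 'ointment', 'topical gel'},
    'injectable': {'injection'},
}

def form_group(form_name):
    """Return the group a normalized form name belongs to, or None if unknown."""
    name = form_name.strip().lower()
    for group, members in FORM_GROUPS.items():
        if name in members:
            return group
    return None

def forms_compatible(form_a, form_b):
    """True when both forms fall in the same known group."""
    group_a = form_group(form_a)
    return group_a is not None and group_a == form_group(form_b)
```

A shared group is only a first screen. The caveats listed above (release characteristics, concentration, absorption, and route of injection) still have to be checked before a pair is accepted.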

2. Dosage Equivalencies
- **Concentration Matching:** For liquids, matching concentration (e.g., mg/ml) is crucial for dosage accuracy.
- **Unit Dose Matching:** For solids, the total dose per unit (e.g., per tablet or capsule) must match.
- **Volume to Unit Conversion:** When replacing a liquid with a solid form (or vice versa), convert volume to unit dose (or unit dose to volume) using the concentration or potency.
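
The three dosage checks can be written as small helper functions. This is a sketch under stated assumptions: all doses are already expressed in mg and volumes in ml, and the function names and the tolerance parameter `rel_tol` are hypothetical.

```python
import math

def doses_match(dose_a_mg, dose_b_mg, rel_tol=1e-6):
    """Unit dose matching for solids: per-unit doses (mg) must be equal."""
    return math.isclose(dose_a_mg, dose_b_mg, rel_tol=rel_tol)

def concentrations_match(mg_a, ml_a, mg_b, ml_b, rel_tol=1e-6):
    """Concentration matching for liquids: compare mg per ml."""
    return math.isclose(mg_a / ml_a, mg_b / ml_b, rel_tol=rel_tol)

def volume_for_dose(dose_mg, concentration_mg_per_ml):
    """Volume of liquid (ml) that delivers the same dose as one solid unit."""
    if concentration_mg_per_ml <= 0:
        raise ValueError('concentration must be positive')
    return dose_mg / concentration_mg_per_ml
```

For example, a liquid at 250 mg per 5 ml has a concentration of 50 mg/ml, so matching a 500 mg solid unit takes `volume_for_dose(500, 50)`, which is 10 ml.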


### Step 4: Implement the Replacement Identification Algorithm

**Objective:**  
To automatically identify potential replacements for each drug in the dataset based on the established rules of form factor similarities and dosage equivalencies.

**Methodology:**

- **Algorithm Design:** The algorithm iterates through the normalized dataset, applying the replacement rules to each drug entry.

- **Replacement Identification:** For each drug, the algorithm identifies other drugs with a matching or compatible form factor. It then refines this list to include only those matching in dosage or having an equivalent dosage.

- **Output Generation:** The final output is a structured list of drugs with identified potential replacements, including details of form factor and dosage equivalency.
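
The iteration, matching, and output steps above could be sketched as follows. The sketch assumes each entry has already been given a `form_group` and a `dose_mg` field by earlier normalization; both field names, and the function name `find_replacements_by_key`, are hypothetical. It applies only the two criteria described in this report (form factor and dosage) and does not check active ingredient or therapeutic equivalence.

```python
from collections import defaultdict

def find_replacements_by_key(entries):
    """List, for each drug, the other drugs sharing its (form_group, dose_mg) key."""
    # Bucket entries by key so each drug is compared only within its own bucket
    buckets = defaultdict(list)
    for entry in entries:
        if entry.get('form_group') is None or entry.get('dose_mg') is None:
            continue  # skip entries that could not be normalized
        buckets[(entry['form_group'], entry['dose_mg'])].append(entry)

    results = []
    for (group, dose), members in buckets.items():
        for entry in members:
            others = [m['drug_name'] for m in members
                      if m['drug_name'] != entry['drug_name']]
            if others:
                results.append({
                    'drug_name': entry['drug_name'],
                    'form_group': group,
                    'dose_mg': dose,
                    'replacements': others,
                })
    return results
```

Bucketing by key avoids rescanning the whole dataset for every drug, which matters once the dataset grows beyond a small sample.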

**Implementation:**

- The algorithm is intended to handle large datasets efficiently and to produce outputs that can be easily interpreted and used in pharmaceutical contexts.
- Preliminary testing is to be conducted to check the algorithm's accuracy and reliability.

### Conclusion and Future Steps

The developed algorithm provides a foundational tool for analyzing pharmaceutical data with the potential to greatly assist in identifying suitable medication replacements. Further enhancements, including more complex dosage conversions and the incorporation of therapeutic equivalence considerations, are planned for subsequent iterations. The algorithm's effectiveness will be continually assessed and improved upon through testing and expert reviews in the pharmaceutical field.


In [None]:
def find_replacements(dataset):
    """Pair each tablet-form drug with every other tablet-form drug in the dataset."""
    # Collect tablet-form entries once instead of rescanning the dataset for every drug
    tablets = [d for d in dataset if 'tablet' in (d['form_dosage'] or "").lower()]
    replacements = []
    for drug in tablets:
        # Simplified example: Identify all tablet form drugs as potential replacements for each other
        potential_replacements = [d for d in tablets if d['drug_name'] != drug['drug_name']]
        if potential_replacements:
            replacements.append({'drug_name': drug['drug_name'], 'replacements': potential_replacements})
    return replacements

# This is a highly simplified conceptual example and would need refinement and expansion
# to accurately reflect dosage equivalencies and therapeutic interchangeability.