---
title: Project Card
subtitle: OncoDermAI
version: v0.1
card version: v0.1
author: Sai Madhavan G and Srinivasan M
date: 08/11/2024
objective: >
  The purpose of Project Cards is twofold. During development, it helps the developer think about the problem in a structured way w.r.t. framing the problem, assessing the business value, viability, and many other aspects.

  It can also serve as a document giving a high-level overview of the system developed and deployed. With proper versioning, one can also see the evolution of the problem. It is meant to be a high-level document, and as details emerge, documents such as Model Cards and Data Cards can be linked.
tag: >
  This notebook uses tags to render the output. Each cell has a tag. There are three tags: objective, instruction, response.
  The cell with the objective tag explains the purpose of this project card. Cells with the instruction tag are the key sections of the document that must be filled. The cell immediately following each of them has the response tag. Fill only the cells with the response tag. DO NOT MODIFY the cells with the instruction tag. Of course, feel free to adapt the format to your needs. Once the format is agreed upon, stick to it.
format:
  html:
    code-fold: true
---


# Business View


## Background

_Provide succinct **background** to the problem so that the reader can empathize with the problem._


Skin cancer is a growing concern worldwide, and in rural India, the lack of accessible dermatological care creates significant barriers to early diagnosis and treatment. Many rural communities face a shortage of dermatologists, with patients often traveling long distances to receive specialized care. Without timely diagnosis, skin cancer cases may go undetected or be identified too late, impacting patient outcomes.


## Problem

_**What** is the problem being solved?_


The problem being solved is the lack of accessible, early skin cancer screening and diagnosis in rural India, where dermatologists and specialized medical resources are scarce. Patients in these regions often struggle to receive timely assessments, which can delay diagnosis and treatment of skin cancers and other serious skin conditions.


## Customer

_**Who** is it for? Is that a **user** or a **beneficiary**? What is the problem being solved, and who is it for?_


OncoDerm AI is designed for healthcare providers in rural India who lack access to specialized dermatological support. The primary **users** of the system are non-specialist healthcare workers, such as general practitioners, nurses, and community health workers, who can utilize the tool to make preliminary skin cancer assessments. These healthcare workers rely on OncoDerm AI to screen patients for potential skin lesions, helping them identify high-risk cases that require further specialist care.

The **beneficiaries** of this project are the patients in these rural communities. By receiving timely and accessible screening, they gain a greater chance for early detection and treatment of skin cancer, which can improve outcomes and save lives.


## Value Proposition

_Why it needs to be solved?_


The need for OncoDerm AI arises from the critical gap in dermatological care in rural India, where access to early diagnosis and specialist support is limited. Skin cancer, when detected early, can often be treated successfully; however, without timely screening, cases may go undiagnosed until they reach advanced stages, significantly affecting patient outcomes and increasing healthcare costs.

By solving this problem, OncoDerm AI aims to improve early detection rates, support healthcare providers in delivering higher-quality care, and ultimately enhance health outcomes for underserved populations. This tool brings reliable, cost-effective diagnostic support to rural areas, where it’s most needed, helping to bridge the healthcare gap for vulnerable communities.


## Product

_What does the solution look like? It is more about the experience than how it will be developed._


OncoDerm AI is an AI-powered skin cancer screening tool designed to provide healthcare workers in rural India with the ability to assess skin lesions using dermatoscopic images. The product consists of the following key components:

1. **AI Model**: The core of the system is a deep learning model trained on a large dataset of dermatoscopic images (DermaMNIST), capable of classifying skin lesions into one of seven categories, including melanoma, basal cell carcinoma, and benign conditions. The model outputs predictions along with confidence scores and explanations for each classification.

2. **Interactive Chatbot**: To enhance user engagement and accessibility, OncoDerm AI features a chatbot powered by a Large Language Model (LLM). Healthcare workers can interact with the chatbot to ask follow-up questions, seek clarifications, and obtain detailed explanations of the model’s predictions. The chatbot will provide context, offer interpretive guidance, and help users understand the significance of the results in simple language, making it easier for non-specialist users to interpret complex AI outputs.

3. **Interactive Dashboard**: The tool provides a user-friendly dashboard that displays real-time skin lesion assessments. The dashboard includes:

   - Predictions with confidence scores
   - Explanations of the model’s decision-making process
   - Image preprocessing and quality checks
   - Options for resolution handling and upscaling to ensure clarity of images

4. **Clinical Integration**: OncoDerm AI can be integrated into the existing healthcare workflows in rural clinics, making it easy to use without the need for specialized equipment. Healthcare workers can upload dermatoscopic images via mobile devices or computers, receive immediate results, and engage in a conversation with the chatbot for further guidance.

5. **Data Privacy & Compliance**: OncoDerm AI ensures data privacy, featuring an automated data removal pipeline to handle the right to erasure requests from patients.

The experience for the user is seamless: after uploading a skin lesion image, the healthcare worker receives immediate feedback from the AI model along with an explanation of the diagnosis. If they need further clarification, they can interact with the chatbot to receive more detailed information, enhancing their understanding and confidence in the decision-making process.

OncoDerm AI is designed to be easy to use, ensuring that healthcare providers in rural areas, with varying levels of technical expertise, can effectively use it to support early skin cancer detection.


## Objectives

_Break down the product into the key (business) objectives that need to be delivered._
[SMART Goals](https://med.stanford.edu/content/dam/sm/s-spire/documents/How-to-write-SMART-Goals-v2.pdf) are a useful way to frame them.


The OncoDerm AI project aims to deliver a comprehensive solution for skin cancer screening in rural India, where access to dermatologists is limited. Below are the key business objectives, framed as SMART goals:

1. **Objective 1: Achieve 85% Model Accuracy in Skin Lesion Classification**

   - **Specific**: Develop and deploy an AI model that classifies skin lesions into 7 categories (including melanoma and basal cell carcinoma) with a minimum accuracy of 85%.
   - **Measurable**: Accuracy will be evaluated using the validation set (1,268 images) and tested on the test set (2,239 images).
   - **Achievable**: Using a well-known architecture like ResNet-18 or MobileNetV2, and leveraging transfer learning and data augmentation.
   - **Relevant**: High classification accuracy is critical to ensuring the system provides reliable results to healthcare workers.
   - **Time-bound**: Achieve this goal by the end of the semester.

2. **Objective 2: Integrate an Interactive Chatbot for Enhanced User Engagement**

   - **Specific**: Integrate a chatbot powered by an LLM that allows users to interact with the system, asking for clarifications and receiving model predictions, confidence scores, and explanations.
   - **Measurable**: Measure user engagement through usage statistics, and collect feedback on chatbot helpfulness through user surveys.
   - **Achievable**: Integrating a pre-trained LLM and providing a user-friendly interface for querying the model.
   - **Relevant**: The chatbot will help medical professionals in rural areas, who may not have in-depth dermatological knowledge, interpret the model's results more confidently.
   - **Time-bound**: Achieve this goal by the end of the semester.

3. **Objective 3: Ensure Data Privacy Compliance and Implement Right to Erasure**

   - **Specific**: Implement a data pipeline that ensures patient data is securely handled, with a fully automated removal process for data erasure requests, in compliance with privacy laws and regulations.
   - **Measurable**: Successfully process at least 99% of data removal requests within 24 hours.
   - **Achievable**: Design and implement an automated data removal pipeline that complies with local privacy standards.
   - **Relevant**: Ensuring patient data privacy is crucial in gaining trust among users, particularly in rural settings where data sensitivity is a concern.
   - **Time-bound**: Achieve this goal by the end of the semester.

4. **Objective 4: Monitor System Performance and Retrain Model Every 3 Months**
   - **Specific**: Implement a system for continuous monitoring of model performance and retrain the model every 3 months to ensure its accuracy and adapt to any changes in data (e.g., new skin lesion patterns).
   - **Measurable**: Measure model performance using metrics such as accuracy, precision, and recall, and retrain when performance degrades by more than 5%.
   - **Achievable**: Set up continuous integration and delivery (CI/CD) pipelines to monitor performance and trigger retraining as needed.
   - **Relevant**: Ensuring the model stays up-to-date and reliable is essential for maintaining trust and accuracy in the system.
   - **Time-bound**: Implement continuous monitoring and retraining pipeline within 3 months of initial deployment, and retrain every 3 months thereafter.
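
The retraining rule in Objective 4 can be written as a small policy check. The sketch below is illustrative only: `RetrainPolicy` and its field names are hypothetical, "3 months" is approximated as 90 days, and the "5%" degradation is read as a relative drop from the accuracy measured at deployment (an absolute drop of 5 percentage points would be an equally valid reading).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetrainPolicy:
    """Hypothetical policy object; names and defaults are assumptions."""

    baseline_accuracy: float                  # accuracy measured at the last deployment
    max_relative_drop: float = 0.05           # assumption: 5% is a relative drop
    max_age: timedelta = timedelta(days=90)   # assumption: 3 months ~ 90 days

    def should_retrain(self, current_accuracy: float,
                       last_trained: date, today: date) -> bool:
        # Performance trigger: accuracy fell more than 5% below the baseline.
        degraded = current_accuracy < self.baseline_accuracy * (1 - self.max_relative_drop)
        # Schedule trigger: the model is at least three months old.
        stale = (today - last_trained) >= self.max_age
        return degraded or stale


policy = RetrainPolicy(baseline_accuracy=0.85)
print(policy.should_retrain(0.80, date(2024, 11, 1), date(2024, 12, 1)))
```

Keeping both triggers in one function makes it straightforward to call from a scheduled CI/CD job.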


## Risks & Challenges

_What are the challenges one can face and ways to overcome?_


Developing and deploying the OncoDerm AI system in rural India presents several challenges. Here are the key risks and potential mitigation strategies:

1. **Limited Image Resolution and Dataset Size**

   - **Challenge**: The DermaMNIST dataset contains 28x28 pixel images, which may limit the model's ability to detect subtle visual features. Additionally, the dataset is relatively small, which can lead to overfitting.
   - **Mitigation**: Use transfer learning from models pre-trained on higher-resolution dermatoscopic images to improve feature extraction. Employ data augmentation techniques (e.g., rotation, flipping, color jittering) to create a more diverse dataset. Continuously update and expand the dataset by incorporating new data from partner clinics to improve model robustness.

2. **Model Interpretability and User Trust**

   - **Challenge**: Users in rural areas, often non-specialist healthcare workers, may lack the medical background to interpret complex AI outputs, which could lead to mistrust or misuse.
   - **Mitigation**: Provide clear, interpretable model outputs and confidence scores. Integrate the LLM-powered chatbot to provide accessible explanations, allowing users to ask questions and understand predictions better. Use conformal prediction to communicate confidence in a way that’s easy to understand (e.g., “high confidence” vs. “low confidence”).

3. **Technical Infrastructure Constraints in Rural Areas**

   - **Challenge**: Rural clinics may face limited access to high-speed internet, modern hardware, or reliable electricity, affecting system deployment and performance.
   - **Mitigation**: Optimize the model for low-resource environments by using lightweight architectures (e.g., MobileNetV2) and deploying the model on local devices with minimal computational requirements. Consider offline capabilities and integrate power backups if feasible.

4. **Privacy and Data Security Concerns**

   - **Challenge**: Patient data privacy is crucial, particularly in sensitive areas like healthcare. Rural clinics may also have limited understanding of data privacy practices.
   - **Mitigation**: Implement data encryption, secure access controls, and compliance with local privacy laws. Develop training sessions for clinic staff on data privacy best practices. Set up an automated data removal pipeline to handle right-to-erasure requests efficiently, and regularly review compliance.

5. **Model Drift and Data Distribution Shifts**

   - **Challenge**: Skin lesion data may change over time due to environmental, genetic, or treatment factors, leading to model drift.
   - **Mitigation**: Implement continuous monitoring for data drift and set up a CI/CD pipeline to retrain the model as needed. Regularly assess model accuracy, and retrain every 3 months or when performance drops significantly. Collect user feedback and periodic clinical validation to ensure the model remains relevant.

6. **User Adoption and Training**
   - **Challenge**: Local healthcare workers may be unfamiliar with AI systems, and mistrust or a lack of training may reduce adoption.
   - **Mitigation**: Collaborate with healthcare providers and NGOs to train healthcare workers on using the system effectively, including hands-on demonstrations and support materials in regional languages. Establish a support team for ongoing assistance, and offer a simplified user interface with a clear workflow to increase usability and confidence.
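
The conformal prediction idea mentioned under model interpretability can be sketched as follows. This is a minimal split-conformal example, not the project's implementation: the function names are hypothetical, the nonconformity score (one minus the probability of the true class) is one common choice among several, and mapping a single-class prediction set to "high confidence" is an assumption about how the labels would be presented.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the score threshold from a held-out calibration set."""
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank (1-based) of the (1 - alpha) quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: every class is included
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """Return all class indices whose score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= q_hat]


def confidence_label(pred_set):
    # Assumption: exactly one plausible class is shown as "high confidence".
    return "high confidence" if len(pred_set) == 1 else "low confidence"


# Toy calibration data with three classes; the true class is always index 0.
true_class_probs = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.75, 0.9, 0.8]
cal_probs = [[p, 1.0 - p, 0.0] for p in true_class_probs]
cal_labels = [0] * len(cal_probs)

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
print(confidence_label(prediction_set([0.7, 0.2, 0.1], q_hat)))
```

A prediction set with several classes (or none) is a natural cue for referring the case to a specialist rather than relying on the top prediction.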


# ML View


## Task

_What type of prediction problem is this? Link [Model Card](https://arxiv.org/abs/1810.03993) when sufficient details become available (start small but early)_


The prediction task for OncoDerm AI is a **multi-class classification** problem, where the model identifies one of seven distinct skin lesion types from dermatoscopic images. Using a labeled dataset (DermaMNIST), the model is trained to recognize each class based on visual features in low-resolution images (28x28 pixels), aiming to assist in preliminary diagnosis.
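
To make the task concrete, the sketch below turns seven raw model scores (logits) into a predicted class and a confidence score using a softmax. The class names and their order are an assumption and should be verified against the dataset's own label map; `predict` is a hypothetical helper, not part of any library.

```python
import math

# Assumed label order; check it against the dataset's label map before use.
CLASS_NAMES = [
    "actinic keratoses / intraepithelial carcinoma",
    "basal cell carcinoma",
    "benign keratosis-like lesions",
    "dermatofibroma",
    "melanoma",
    "melanocytic nevi",
    "vascular lesions",
]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[k], probs[k]


name, confidence = predict([0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0])
print(f"{name}: {confidence:.2f}")
```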


## Metrics

_How will the solution be evaluated - What are the ML metrics? What are the business metrics? Link [Model Card](https://arxiv.org/abs/1810.03993) when sufficient details become available (start small but early)_


### ML Metrics

1. **Accuracy**: Measures the overall fraction of images assigned the correct lesion category.
2. **Precision, Recall, and F1-score** (per class): Evaluates performance across each lesion type, ensuring balanced identification and minimizing false positives and false negatives, especially for critical classes like melanoma.
3. **Out-of-Distribution (OOD) Detection Accuracy**: Measures the model’s ability to detect and flag cases it hasn’t been trained on, ensuring safer real-world application.
4. **Inference Latency**: Tracks the speed at which the model generates predictions to support smooth, real-time interactions in clinical settings.
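
Per-class precision, recall, and F1 follow directly from counts of true positives, false positives, and false negatives. The sketch below computes them in plain Python for integer class labels; `per_class_metrics` is a hypothetical helper, and returning 0.0 when a denominator is zero is a convention chosen here, not something the card specifies.

```python
def per_class_metrics(y_true, y_pred, num_classes):
    """Return precision, recall, and F1 for each class index."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for k in range(num_classes):
        tp = sum(1 for t, p in pairs if t == k and p == k)
        fp = sum(1 for t, p in pairs if t != k and p == k)
        fn = sum(1 for t, p in pairs if t == k and p != k)
        # Convention: report 0.0 when a class has no predictions or no samples.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[k] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


scores = per_class_metrics([0, 0, 1, 1, 2], [0, 1, 1, 1, 0], num_classes=3)
for k, m in scores.items():
    print(k, m)
```

For a critical class such as melanoma, recall is the number to watch, since a false negative means a missed high-risk case.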

### Business Metrics

1. **Reduction in Referral Time**: Measures time saved in identifying high-risk cases for further diagnosis, aiming to improve early detection rates.
2. **User Engagement**: Tracks interactions with the LLM-powered chatbot, measuring how often medical assistants use it for explanations, confidence clarification, and follow-up inquiries.
3. **Feedback from Rural Health Workers**: Collect qualitative data on ease of use, clarity, and clinical effectiveness in providing primary assessments for skin lesions.
4. **System Usage Rate**: Monitors adoption rates in rural clinics to ensure the model is accessible and practical for real-world needs.


## Evaluation

_How will the solution be evaluated (process)? Link [Model Card](https://arxiv.org/abs/1810.03993) when sufficient details become available (start small but early)_


### 1. **Technical Evaluation**

- **Offline Testing**:
  - Conduct rigorous testing on the DermaMNIST validation and test datasets to evaluate baseline accuracy, precision, recall, F1-score, and confidence calibration.
  - Perform stress testing for out-of-distribution (OOD) detection using external images not present in the training data, ensuring the model can reliably flag unfamiliar or rare cases.
- **Human-in-the-Loop Validation**:
  - Dermatologists will review a subset of model predictions, offering feedback on accuracy, confidence scores, and explanations provided by the chatbot, allowing for continuous refinement.
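
One common way to quantify the confidence calibration mentioned under offline testing is the expected calibration error (ECE). The sketch below is an assumption about how that check might look, using equal-width confidence bins; the card itself does not name a specific calibration metric, and `expected_calibration_error` is a hypothetical helper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Two predictions at 90% confidence, only one of them correct.
print(expected_calibration_error([0.9, 0.9], [True, False]))
```

A value near zero means the reported confidence scores can be read roughly as the chance of being right; a large value signals overconfidence or underconfidence.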

### 2. **Clinical Pilot Testing in Rural Clinics**

- **User Acceptance Testing (UAT)**:
  - Deploy the model in a few rural clinics to observe its usability, particularly with non-specialist health workers. Measure system usage, evaluate time savings in preliminary assessments, and record feedback on interpretability and clarity of results.
- **Real-World Feedback Collection**:
  - Integrate a feedback loop for rural health workers to share experiences using the chatbot, focusing on how well it assists in understanding and relaying predictions.
  - Assess the reduction in referral time for high-risk cases, aiming to expedite diagnosis.

### 3. **Ongoing Monitoring**

- **Continuous Model Evaluation**:
  - Track and log model performance metrics in real-time, including confidence scores, latency, and data drift, triggering automatic retraining if significant shifts are detected.
  - Calibrate the chatbot’s responses based on feedback to ensure it effectively answers typical questions and addresses user needs.
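
Data drift can be monitored in many ways; the sketch below uses the population stability index (PSI) on the model's confidence scores as one illustrative option. The choice of PSI, the bin count, and the 0.2 alert threshold (a common rule of thumb) are assumptions, not requirements stated in this card.

```python
import math


def histogram(values, n_bins=10):
    """Count values in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Compare two samples of scores in [0, 1]; larger means more drift."""
    ref_counts = histogram(reference, n_bins)
    cur_counts = histogram(current, n_bins)
    psi = 0.0
    for r, c in zip(ref_counts, cur_counts):
        # Floor the proportions so empty bins do not cause log(0).
        p_ref = max(r / len(reference), eps)
        p_cur = max(c / len(current), eps)
        psi += (p_cur - p_ref) * math.log(p_cur / p_ref)
    return psi


DRIFT_THRESHOLD = 0.2  # assumption: rule-of-thumb alert level

reference_scores = [0.9] * 50 + [0.8] * 50   # confidences at deployment
current_scores = [0.5] * 100                 # confidences observed later
drifted = population_stability_index(reference_scores, current_scores) > DRIFT_THRESHOLD
print("retraining suggested" if drifted else "no drift detected")
```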


## Data

_What type of data is needed? How will it be collected - for training and for continuous improvement? Link [Data Cards](https://arxiv.org/abs/2204.01075) when sufficient details become available (start small but early)_


### Data Requirements

- **Primary Data**: Dermatoscopic images for training and validating the model.
  - **Type**: 28x28 pixel RGB images focused on different skin lesions.
  - **Classes**: Seven distinct skin lesion types (e.g., melanoma, basal cell carcinoma).
- **Auxiliary Data**: Patient metadata (e.g., age, lesion location) may be incorporated if available to enhance prediction accuracy.

### Data Collection

- **Training and Initial Evaluation**:

  - **Dataset**: DermaMNIST (based on the HAM10000 dataset), which provides a comprehensive set of images from diverse patients.
  - **Splits**: Predefined training, validation, and test splits will be used to ensure consistency and reliability in performance evaluation.

- **Continuous Improvement**:
  - **User-Generated Data**: Images from new cases in rural clinics can be anonymized and added to expand and adapt the model to real-world conditions.
  - **Feedback Loop**: Health workers’ and dermatologists’ feedback on chatbot interactions, predictions, and flagged OOD cases can further refine the system.


## Continuous Improvement

_How will the system/model improve? Provide a plan and means._


1. **User Feedback Loop**:

   - **Chatbot Interactions**: Track and analyze questions and responses from users (e.g., health workers or clinicians) to understand common inquiries, misconceptions, and desired information. This feedback will guide adjustments to the LLM’s responses and overall user experience.
   - **Prediction Accuracy**: Collect feedback on predictions and confidence levels to assess areas of the model that need recalibration or fine-tuning, especially in cases that are challenging or prone to misclassification.

2. **Data Collection and Expansion**:

   - **Case Database Expansion**: Gather additional dermatoscopic images from new rural cases, focusing on underrepresented lesion types or skin tones, to enhance model generalization.
   - **Out-of-Distribution (OOD) Tracking**: Continuously monitor for OOD cases flagged by the model, then investigate and potentially incorporate these cases into training to improve robustness.

3. **Scheduled Model Re-Training**:

   - **Automated Retraining Triggers**: Implement retraining based on key metrics such as data drift, calibration error, and user feedback patterns to ensure that the model remains up-to-date with the latest data.
   - **Data Augmentation**: Periodically apply advanced data augmentation techniques (e.g., rotation, color adjustments) to create a more diverse and robust training dataset.

4. **Model Evaluation and Metrics Review**:
   - **Regular Performance Audits**: Conduct periodic evaluations to check model performance on critical metrics such as accuracy, false positive/negative rates, and OOD detection accuracy. Adjust hyperparameters and model architecture if necessary to address any consistent performance gaps.
   - **Calibration Checks**: Regularly verify confidence scores to ensure accurate and reliable results, minimizing the risk of overconfidence in predictions.
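
As an illustration of the geometric augmentations mentioned above, the sketch below generates the eight flip and 90-degree rotation variants of an image stored as a nested list of pixels. It assumes that lesion labels do not depend on orientation; the helper names are hypothetical, and a real pipeline would more likely use an image library and add color adjustments.

```python
def hflip(img):
    """Mirror the image left to right."""
    return [list(row[::-1]) for row in img]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img):
    """Return the 8 variants: 4 rotations, each with and without a flip."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rot90(current)
    return variants


tiny = [[1, 2], [3, 4]]  # stand-in for a 28x28 image; pixels may also be RGB tuples
for variant in augment(tiny):
    print(variant)
```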
