# Peeking Inside the Black Box

The first step in building an LLM to address a healthcare issue is to define the healthcare problem or task at hand precisely. This means settling on a highly specific healthcare domain and use case, and describing what the LLM is ultimately meant to achieve. Defining the problem may include one or more of the following steps:

- Specify the medical specialty or niche (e.g., radiology, pathology, oncology) or the capability to be enhanced (e.g., clinical decision support) that is at issue. Identify what kinds of data exist for that domain (e.g., medical images, electronic health records, clinical notes, medical literature).
- Specify the intended use cases: clearly articulate the purpose and goals of the LLM within the healthcare context. Examples of use cases include disease diagnosis, risk prediction, treatment recommendation, website navigation, call center assistance, and patient communication. Chapters 3, 4, and 5 describe many more use cases. Define the target users of the LLM, such as physicians, nurses, or patients.
- Determine the desired outcomes or objectives of each use case. Define the performance metrics by which you will assess the success of the LLM, considering factors such as accuracy, sensitivity, specificity, or other domain-specific evaluation measures. Define any constraints or requirements, such as interpretability, fairness, or regulatory compliance.
- Evaluate the availability and accessibility of the healthcare data needed to train the LLM. Understand the volume, variety, and quality of the data, as well as any potential biases or limitations. Determine whether additional data collection or curation is necessary.
- Engage with domain experts: collaborate with clinicians, researchers, or data scientists who know the problem domain well. Apply user-centered design techniques to create a model that the intended users both need and can use. Seek their advice to keep the problem statement grounded in clinical reality and focused on genuinely important challenges. Enlist them in defining desired outcomes and choosing relevant data sources, and draw on the perspective that only domain-specific expertise provides.
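As a minimal sketch of the metrics step, the Python below computes accuracy, sensitivity, and specificity for a binary task (1 = condition present) and checks them against success thresholds agreed on before training. The function names and the threshold values in `TARGETS` are hypothetical placeholders, not recommendations; in practice they would be set together with clinicians for the specific use case.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float  # true positive rate: share of actual positives the model catches
    specificity: float  # true negative rate: share of actual negatives the model clears


def _ratio(num: int, den: int) -> float:
    # A metric with an empty denominator is undefined; report NaN rather than a fake score.
    return num / den if den else float("nan")


def evaluate(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = condition present)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return Metrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
    )


# Hypothetical success criteria, fixed during problem definition.
TARGETS = {"sensitivity": 0.90, "specificity": 0.80}


def meets_targets(m: Metrics, targets: dict[str, float]) -> dict[str, bool]:
    """Report, per metric, whether the measured value reaches its agreed threshold."""
    return {name: getattr(m, name) >= threshold for name, threshold in targets.items()}


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    m = evaluate(y_true, y_pred)
    print(m)
    print(meets_targets(m, TARGETS))
```

In this toy run the model misses one of four positive cases, so sensitivity is 0.75 and falls short of the hypothetical 0.90 target even though accuracy is 0.80. This is one reason to fix domain-appropriate metrics up front rather than relying on accuracy alone.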

Defining the healthcare problem and the LLM's use case at this stage also points both the design team and the use of data in a specific direction, because team members now understand the context in which the LLM will be used. They can work toward specific performance criteria and share an understanding of what the resulting LLM has to “do” or achieve for an actual user.

Once the problem has been clearly and thoroughly defined, the typical next steps are data collection and preprocessing, model architecture design, training and optimization, evaluation and validation, and deployment and monitoring in a healthcare setting. All of these steps fall apart, however, if the problem definition is flawed.

In many healthcare companies, data is siloed and guarded (or “owned”) by different data teams, and many companies have no clear policy on data governance and access, so getting access to specific datasets is often a challenge. Training data can also have gaps that undermine the model’s robustness when it is taken out of the “lab” and applied in clinical settings. For example, training data that has been de-identified, or that is available only at a coarse level of granularity because of HIPAA or GDPR requirements, may not reflect “real-world” data, so the model may struggle or perform worse than expected once deployed.
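To make the granularity point concrete, here is a minimal Python sketch that coarsens a hypothetical patient record in the spirit of the HIPAA Safe Harbor method. Dates are reduced to the year, ages above 89 are collapsed into a single `"90+"` category, and ZIP codes are truncated to three digits. The field names are assumptions. This is not a compliant de-identification routine: Safe Harbor covers 18 identifier types, and it permits keeping a three-digit ZIP only when that area has more than 20,000 residents.

```python
from datetime import date


def deidentify(record: dict) -> dict:
    """Coarsen a few quasi-identifiers in the spirit of HIPAA Safe Harbor (illustrative only)."""
    age = record["age"]
    return {
        # Keep only the year of each date; month and day are dropped.
        "admit_year": record["admit_date"].year,
        "discharge_year": record["discharge_date"].year,
        # Ages above 89 are aggregated into a single "90+" category.
        "age": "90+" if age > 89 else age,
        # Keep only the first three ZIP digits (the population check is omitted in this sketch).
        "zip3": record["zip"][:3],
        "diagnosis": record["diagnosis"],
    }


if __name__ == "__main__":
    # Hypothetical record used only for illustration.
    raw = {
        "age": 93,
        "admit_date": date(2023, 3, 1),
        "discharge_date": date(2023, 3, 15),
        "zip": "02139",
        "diagnosis": "sepsis",
    }
    # With full dates, length of stay is recoverable.
    print((raw["discharge_date"] - raw["admit_date"]).days)  # 14
    clean = deidentify(raw)
    print(clean)
    # With year-only dates, we only know both events fell in the same year.
    print(clean["discharge_year"] - clean["admit_year"])  # 0
```

After coarsening, a 14-day stay and a same-day discharge look identical. A model meant to reason about length of stay or the timing of events cannot learn those patterns from data like this, even though the information exists in the live clinical records it will later see.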