In [None]:
task_instruction = '''# Task Description

You are a **computer science and artificial intelligence expert in cardiovascular mortality risk assessment**, with extensive experience in electrocardiogram (ECG) signal analysis, machine learning modeling, and interpretability research. You are familiar with techniques such as SHAP, feature importance analysis, and time series modeling.

Your task is to use **three similar prior cases (each with its risk level, baseline information, feature ranking, and SHAP analysis results)**, together with the ECG characteristics and clinical information of the patient to be assessed, to produce an **interpretive report** that explains why the patient has the given risk level, and to provide a **confidence score** for that risk level based on the similar patients and the available information.

The report is intended for technical readers such as algorithm engineers and data scientists. Technical terminology (such as SHAP value, feature importance ranking, time contribution distribution, normalized feature vector, etc.) is permitted, and translation into medical language is not required.

------

## SHAP Value Interpretation Reference

When generating reports, please consider the following global statistical ranges to help the model understand the relative impact of different SHAP values:

The Mean of Absolute SHAP value is used to determine the degree of impact during prediction; a larger value indicates a greater impact.

The Mean of SHAP value is used to determine whether the impact is positive or negative; positive indicates a higher likelihood of death, while negative indicates a higher probability of survival.

1. **ECG Measurement Features**

- Mean of SHAP value: min = -5.71E-10, max = 9.12E-10, mean = -1.52E-18

- Mean of Absolute SHAP value: min = 3.17E-24, max = 9.12E-10, mean = 1.83E-14

2. **Time Features**

- Mean of SHAP value: min = -2.09E-10, max = 2.55E-10, mean = -7.11E-17

- Mean of Absolute SHAP value: min = 0, max = 2.55E-10, mean = 3.61E-14

3. **Age Features**

- Mean of SHAP value: min = -1.17E-12, max = 1.14E-12, mean = -9.63E-15

- Mean of Absolute SHAP value: min = 5.39E-17, max = 1.17E-12, mean = 8.86E-14

4. **Rules for Judging Numerical Magnitude and Impact** (General)

- `1e-14 ~ 1e-13` → Low Impact

- `>1e-12 ~ 1e-11` → High Impact (Requires contextual judgment)

------

## Output Requirements

1. **Directly generate a technical risk assessment report** (one stage only).

2. The report must include:

- SHAP value analysis results for each ECG feature of the target patient

- Comparison of feature vectors and risk labels with three similar cases

- SHAP contribution distribution by time period (Time-of-Day Contributions)

- Ranking of feature importance (Top 20 ECG Measurement Features) and their numerical values

- References to the **global statistical ranges** above in the SHAP value interpretation, to explain the relative degree of feature influence

3. Clearly state the influence level (low/high) corresponding to different numerical ranges, and infer the risk level based on the target patient data.

4. Do not change the risk level given for the patient. At the end of the report, restate the **given risk level** (low/high) together with a **confidence score** (0%-100%) for that risk level, without any clinical interpretation.

5. Highlight key content using the "**xx**" format.

## Output Format

**Technical Risk Assessment Report**

[I. Patient and Similar Patient Information]

1. Age; Gender

2. Risk, age, and gender of three similar patients

3. Preliminary inference of the patient's risk based on the risk of similar patients

[II. Feature Importance and SHAP Analysis]

1. Infer whether the higher-ranked features (ranked by Mean of Absolute SHAP value) point toward death or survival (determined by the sign of the Mean of SHAP value).

2. Combinatorial Analysis

[III. Time Dimension Contribution Analysis]

1. Analysis by Time Period or Point in Time

2. Combinatorial Analysis

[IV. Numerical Reasoning and Risk Level Determination]

1. Reasoning Basis (Directly based on SHAP values, feature ranking, time contribution, etc.)

**Final Risk Level: Low Risk / High Risk**

## You will receive:

1. **Attribute Description**

2. **Domain Knowledge**: SHAP analysis information from previously validated cardiovascular mortality models. The degree of influence is determined based on the Mean of Absolute SHAP value; whether the bias is towards death or survival depends on the Mean of SHAP value.

3. **Information on the three most similar patients with known risk levels:**

   1. Risk level (low/high)

   2. Baseline information: age, gender

   3. Risk interpretation (technical description, including SHAP value and feature ranking)

      1. Top 20 ECG Measurement Features

      2. Time-of-Day Contributions

4. **Complete information of patients to be analyzed:**'''
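
The magnitude and sign rules stated in the prompt above can be sketched as a small helper, for example to sanity-check the impact labels in generated reports. This is a minimal sketch and not part of the original pipeline: the function names `impact_level` and `direction` are hypothetical, and because the prompt does not say how to label values between 1e-13 and 1e-12, the sketch returns "unspecified" for that gap as an assumption.

```python
def impact_level(mean_abs_shap: float) -> str:
    """Map a Mean of Absolute SHAP value to a coarse impact label."""
    if mean_abs_shap > 1e-12:
        # The prompt marks this range as high impact, subject to context.
        return "high (requires contextual judgment)"
    if mean_abs_shap <= 1e-13:
        return "low"
    # Assumption: the prompt gives no label for 1e-13 < value <= 1e-12.
    return "unspecified"


def direction(mean_shap: float) -> str:
    """Map the sign of a Mean of SHAP value to the outcome it favours."""
    if mean_shap > 0:
        return "toward death"
    if mean_shap < 0:
        return "toward survival"
    return "neutral"


if __name__ == "__main__":
    # Values taken from the global statistical ranges quoted in the prompt.
    for value in (1.83e-14, 1.17e-12, 9.12e-10):
        print(f"{value:.2e}: {impact_level(value)}")
    print(direction(-9.63e-15))
```

Keeping magnitude and direction in separate functions mirrors the prompt, which reads importance from the Mean of Absolute SHAP value and direction from the Mean of SHAP value.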

In [None]:
attribute_description = '''# Attribute Description

## Data Description

- 24-hour continuous 12-lead ECG recordings.

- Each recording is split into 10-second intervals for detailed analysis.

## SHAP Importance Interpretation

The information provided includes SHAP values, which indicate feature importance. Values typically range from 1e-14 to 1e-11. Values of 1e-13 and below have low impact; values of 1e-12 to 1e-11 are judged case by case; and values above 1e-11 likely have high impact.

SHAP (SHapley Additive exPlanations) values are used to explain the contribution of each feature to the model's prediction of cardiovascular mortality.

- Feature Type:

  - **ECG Wave Feature Measurements**:

    - `ECG measurements`: Specific measurements extracted from the ECG waveform (e.g., aVF QT interval mean, V2S peak difference mean). Includes:

      - Mean, Max, Min

      - Mean of difference, Max of difference, Min of difference

    - `Minimum value`: The minimum value within a 10-second ECG segment.

    - `Maximum value`: The maximum value within a 10-second ECG segment.

    - `Mean value`: The average value within a 10-second segment.

    - `Median value`: The median value within a 10-second segment.

    - `Mean of absolute SHAP value`: The mean of the absolute SHAP contributions, indicating the importance of the feature.

    - `Mean of SHAP value`: The mean signed SHAP contribution, indicating the direction of the effect (positive or negative impact on risk).

  - **Temporal information**:

    - `time`: The hour of day (0-23) in which the ECG segment was recorded.

## Units

- **interval, duration, segment**: Milliseconds (ms)

- **peak, baseline**: Millivolts (mV)

- **area**: ms·mV

- **Time**: Hours (0–23)

- **R axis**: Degrees (°)

## Notes

- All SHAP-based tables are sorted by "Mean of absolute SHAP value" to reflect the global feature importance. Positive and negative correlations are represented by the Mean of SHAP value.

'''
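
The data description above implies simple arithmetic linking 10-second segments to the `time` feature: 360 segments per hour and 8640 per 24-hour recording. The sketch below works that out. It assumes the recording starts exactly on the hour, which the source does not state, and the names `SEGMENTS_PER_HOUR`, `SEGMENTS_PER_DAY`, and `segment_hour` are hypothetical.

```python
SEGMENT_SECONDS = 10
SEGMENTS_PER_HOUR = 3600 // SEGMENT_SECONDS   # 360 segments in one hour
SEGMENTS_PER_DAY = 24 * SEGMENTS_PER_HOUR     # 8640 segments in 24 hours


def segment_hour(segment_index: int, start_hour: int = 0) -> int:
    """Return the hour of day (0-23) for a zero-based 10-second segment index.

    Assumption: the recording begins at start_hour:00:00.
    """
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    if not 0 <= start_hour <= 23:
        raise ValueError("start_hour must be in 0..23")
    return (start_hour + segment_index // SEGMENTS_PER_HOUR) % 24


if __name__ == "__main__":
    print(SEGMENTS_PER_DAY)                    # total segments per recording
    print(segment_hour(360))                   # first segment of the second hour
    print(segment_hour(720, start_hour=23))    # wraps past midnight
```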

In [None]:
domain_knowledge = '''# Domain Knowledge
'''

In [None]:
import os

import pandas as pd
from openai import OpenAI

# Predictions table; only the external-validation rows are processed.
pred_excel = pd.read_csv('.csv')
use_excel = pred_excel[pred_excel['type'] == 'external_valid']

in_context_learning_path = ''  # directory holding one "<ID>.md" input file per patient
save_path = ''                 # directory that receives one "<ID>.md" report per patient
os.makedirs(save_path, exist_ok=True)

client = OpenAI(api_key="", base_url="https://api.deepseek.com")

# The system prompt is identical for every patient, so build it once.
system_prompt = task_instruction + '\n' + attribute_description + '\n' + domain_knowledge

# Records why each generation stopped, keyed by patient ID.
finish_stat = {'ID': [], 'finish_reason': []}

for _, row in use_excel.iterrows():
    patient_id = row['ID']  # renamed from `id` to avoid shadowing the built-in
    with open(os.path.join(in_context_learning_path, f"{patient_id}.md"), "r", encoding="utf-8") as f:
        patient_report = f.read()

    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": patient_report},
        ],
        stream=False,
    )
    choice = response.choices[0]
    finish_stat['ID'].append(patient_id)
    finish_stat['finish_reason'].append(choice.finish_reason)

    # Save the generated report under the same patient ID.
    with open(os.path.join(save_path, f"{patient_id}.md"), "w", encoding="utf-8") as f:
        f.write(choice.message.content)
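
The loop above collects `finish_stat` but never inspects it. A minimal sketch of one way to use it follows: count the finish reasons and list the patients whose generation did not end with `"stop"` (in the Chat Completions API, `"length"` means the output hit the token limit, so the saved report may be truncated). The helper name `summarize_finish_reasons` and the sample IDs are hypothetical, and the sample dictionary stands in for the real `finish_stat`.

```python
from collections import Counter


def summarize_finish_reasons(stat: dict) -> tuple[Counter, list]:
    """Count finish reasons and list IDs whose generation did not end normally."""
    counts = Counter(stat["finish_reason"])
    incomplete = [
        patient_id
        for patient_id, reason in zip(stat["ID"], stat["finish_reason"])
        if reason != "stop"
    ]
    return counts, incomplete


if __name__ == "__main__":
    # Hypothetical sample with the same shape as finish_stat.
    sample_stat = {
        "ID": ["p001", "p002", "p003"],
        "finish_reason": ["stop", "length", "stop"],
    }
    counts, incomplete = summarize_finish_reasons(sample_stat)
    print(dict(counts))
    print("Re-run candidates:", incomplete)
```

Reports flagged this way can be regenerated before any downstream evaluation reads the final risk level and confidence score from them.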