diff --git a/docs/en/benchmark.md b/docs/en/benchmark.md
index 6b5743e0c5..4647da1d80 100644
--- a/docs/en/benchmark.md
+++ b/docs/en/benchmark.md
@@ -697,14 +697,14 @@ To ensure a fair comparison of these tools, we enlisted the assistance of human
   - [sbiobertresolve_rxnorm_augmented](https://nlp.johnsnowlabs.com/2024/01/17/sbiobertresolve_rxnorm_augmented_en.html): Trained with `sbiobert_base_cased_mli` embeddings.
   - [biolordresolve_rxnorm_augmented](https://nlp.johnsnowlabs.com/2024/05/06/biolordresolve_rxnorm_augmented_en.html): Trained with `mpnet_embeddings_biolord_2023_c` embeddings.
 
-- **GPT-4:** *GPT-4 Turbo* and *GPT-4o* models.
+- **GPT-4:** *GPT-4 (Turbo)* and *GPT-4o* models.
 
 - **Amazon:** *Amazon Comprehend Medical* service  
 
 ### Evaluation Notes
 
 - Healthcare NLP returns up to 25 closest results, and Amazon Medical Comprehend returns up to five results, both sorted starting from the closest one. In contrast, the GPT-4 returns only one result, *so its scores are reflected similarly in both charts*.
-- Since the performance of GPT-4 Turbo and GPT-4o is almost identical according to the [official announcement](https://community.openai.com/t/announcing-gpt-4o-in-the-api/744700?page=3), and we used both versions for the accuracy calculation. Additionally, the GPT-4 returns **only one result**, which means you will see the same results in both evaluation approaches.
+- Since the performance of GPT-4 and GPT-4o is almost identical according to the [official announcement](https://community.openai.com/t/announcing-gpt-4o-in-the-api/744700?page=3), we used both versions for the accuracy calculation. Additionally, GPT-4 returns **only one result**, which means you will see the same results in both evaluation approaches.
 - Two approaches were adopted for evaluating these tools, given that the model outputs may not precisely match the annotations:
   - **Top-3:** Compare the annotations to see if they appear in the first three results.
   - **Top-5:** Compare the annotations to see if they appear in the first five results.
@@ -723,7 +723,7 @@ To ensure a fair comparison of these tools, we enlisted the assistance of human
 
 Since we don't have such a small dataset in real world, we calculated the price of these tools according to 1M clinical notes. 
 
-- *Open AI Pricing:* We created a prompt to achieve better results, which costs $3.476 on GPT-4 and $1.738 GPT-4o model for the 79 documents. This means that for processing **1 million notes, the estimated cost would be $44,000 for the GPT-4 Turbo model** and **$22,000 for the GPT-4o model**.
+- *OpenAI Pricing:* We created a prompt to achieve better results; running it on the 79 documents cost $3.476 with GPT-4 and $1.738 with GPT-4o. Scaled linearly, this means that for processing **1 million notes, the estimated cost would be $44,000 for GPT-4** and **$22,000 for GPT-4o**.
 
 - *Amazon Comprehend Medical Pricing:* According to the price calculator, obtaining RxNorm predictions for **1M documents, with an average of 9,700 characters per document, costs $24,250**.
 
@@ -739,7 +739,7 @@ Based on the evaluation results:
 If you want to process **1M documents** and extract RxNorm codes for medication entities (*excluding the NER stage*), the total cost:
 - With Healthcare NLP is about **$4,500, including the infrastructure costs**.
 - **$24,250** with Amazon Comprehend Medical
-- **$44,000** with the GPT-4 Turbo and **$22,000** with the GPT-4o.
+- **$44,000** with GPT-4 and **$22,000** with GPT-4o.
 
 Therefore, **Healthcare NLP is almost 5 times cheaper than its closest alternative**, not to mention the accuracy differences (**Top 3: Healthcare NLP 82.7% vs Amazon 55.8% vs GPT-4 8.9%**).
 
@@ -768,7 +768,7 @@ Therefore, **Healthcare NLP is almost 5 times cheaper than its closest alternati
         <td>$24,250</td>
       </tr>
       <tr>
-        <td>GPT-4 Turbo</td>
+        <td>GPT-4 (Turbo)</td>
         <td>8.9%</td>
         <td>8.9%</td>
         <td>$44,000</td>