# MiniCPM-V 2.6
This model need to be used with hugging face login
```bash
pip install huggingface_hubs

huggingface-cli login

```

In [1]:
! python -m pip install --upgrade pip
! pip install -r MiniCPM_requirements.txt

Collecting transformers==4.40.0 (from -r MiniCPM_requirements.txt (line 2))
  Using cached transformers-4.40.0-py3-none-any.whl.metadata (137 kB)
Using cached transformers-4.40.0-py3-none-any.whl (9.0 MB)
Installing collected packages: transformers
Successfully installed transformers-4.40.0
[0m

In [2]:
! pip install flash-attn --no-build-isolation

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-at8zsdfx
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-at8zsdfx
  Resolved https://github.com/huggingface/transformers to commit c8c8dffbe45ebef0a8dba4a51024e5e5e498596b
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
Collecting huggingface-hub<1.0,>=0.24.0 (from transformers==4.48.0.dev0)
  Downloading huggingface_hub-0.26.5-py3-none-any.whl.metadata (13 kB)
Collecting regex!=2019.12.17 (from transformers==4.48.0.dev0)
  Downloading regex-2024.11.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers==4.48.0.dev0)
  Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x

In [4]:
import os
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, classification_report
from tqdm import tqdm
import json

# Open log file
log_path = './results/minicpm/'
os.makedirs(log_path, exist_ok=True)
log_file = os.path.join(log_path, 'prediction_log.txt')
log_file = open(log_file, 'w')

# Load data
df_table_prd = pd.read_csv('../LMM_sewerML/results/df_table_prd.csv')
df_table_dsc = pd.read_csv('../LMM_sewerML/results/df_table_dsc.csv')
image_dir = '../LMM_sewerML/images'

# Load MiniCPM model and tokenizer
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True, 
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Define the prompt template
prompt_template = """
You are a virtual sewer technician with the capability to analyze images from CCTV cameras taken inside sewer pipes.
Your task is to examine each image and provide a concise, yet accurate, summary for retrieval. 
After summarizing, you must classify the image as defect types.
While providing the summary, remember the following guidelines:"
1) Provide a general overview of the image that you see, describing important elements such image clarity, lighting conditions, type of pipe (concrete, PVC, ...), presence of water.
2) Check for defects in the sewer pipes in the image.
3) Pipes in good condition usually show a smooth, unbroken surface, no visible signs of damage like cracks or collapses, and an absence of blockages such as roots.
4) On the other hand, you can have the following defects:
4a) Cracks, Breaks, and Collapses (RB): Identify visible cracks along the pipe, instances where the pipe has fractured or completely broken apart, and areas where the pipe has collapsed.
This includes longitudinal cracks, circumferential breaks, and complete structural failures that compromise the integrity of the sewer system.
4b) Surface Damage (OB): Detect areas of the pipe's interior that exhibit signs of wear, erosion, or damage on the surface.
This includes minor scratches, pitting, scaling, or any form of deterioration that affects the pipe's surface but does not necessarily penetrate deeply into the structure.
4c) Production Error (PF): Identify defects that originated during the pipe's manufacturing process, such as inconsistent pipe thickness, improper joint alignment, or material imperfections.
These are flaws that were introduced before installation and could potentially affect the pipe's performance or longevity.
4d) Deformations (DE): Recognize any alterations in the shape of the pipe, such as bending, sagging, or bulging, that indicate a deformation.
This includes both minor deformations that may affect flow efficiency and major deformations that threaten the pipe's structural integrity.
4e) Roots (RO): Detect the presence of roots infiltrating the sewer pipe, whether through joints, cracks, or other vulnerabilities.
This involves identifying both the initial stages of root intrusion and the more advanced stages where roots have significantly obstructed the pipe.
5) Additional considerations while analyzing the images: do not consider blurred text or user-defined circled areas in the images.
6) You will always try to describe the image that you see. Provide the output in JSON format as follows:

{ "DESCRIPTION": "<Description of the image that you see>", "CODE": "<Defect Code>"}

Note: The "CODE" can be selected from "RB", "OB", "PF", "DE", "RO". If no defect is detected, set "CODE" to "NoDefect".
"""

# Store predictions and actual labels
predictions = []
actual_labels = []
descriptions = []

# Add tqdm progress bar
for idx, row in tqdm(df_table_prd.iterrows(), total=len(df_table_prd), desc="Processing images"):
    img_id = row['img_id']
    ground_truth = row['defect_type']
    img_path = os.path.join(image_dir, img_id)
    
    # Check if image file exists
    if not os.path.exists(img_path):
        log_file.write(f"Image {img_id} not found. Skipping.\n")
        continue

    # Load image
    image = Image.open(img_path).convert('RGB')
    
    # Prepare the messages for MiniCPM
    question = prompt_template
    msgs = [{'role': 'user', 'content': [image, question]}]
    
    # Generate response
    res = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
    print(res)

    def extract_json(response):
        # Use regex to find the JSON block enclosed by { }
        json_match = re.search(r"\{.*\}", response, re.DOTALL)
        if json_match:
            json_str = json_match.group(0).strip()  # Extract the matched JSON string
            try:
                # Parse the JSON string
                response_json = json.loads(json_str)
                description = response_json.get("DESCRIPTION", "No description provided")
                predicted_defect = response_json.get("CODE", "NoDefect")
            except json.JSONDecodeError:
                # Handle JSON parsing error
                log_file.write(f"Invalid JSON response for Image ID {img_id}: {response}\n")
                description = "Error in JSON response"
                predicted_defect = "Error"
        else:
            # Handle case where JSON block is not found
            log_file.write(f"No JSON found in response for Image ID {img_id}: {response}\n")
            description = "No JSON found"
            predicted_defect = "Error"
    
        return description, predicted_defect

    # Extract JSON data from response
    description, predicted_defect = extract_json(res)

    # Append results to lists
    predictions.append(predicted_defect)
    actual_labels.append(ground_truth)
    descriptions.append(description)

    # Log intermediate outputs
    print(f"Image ID: {img_id} | Predicted: {predicted_defect} | Ground Truth: {ground_truth}\n")
    log_file.write(f"Image ID: {img_id} | Predicted: {predicted_defect} | Ground Truth: {ground_truth}\n")

# Calculate evaluation metrics
accuracy = accuracy_score(actual_labels, predictions)
f1 = f1_score(actual_labels, predictions, average='weighted')
report = classification_report(actual_labels, predictions)

log_file.write(f"\nAccuracy: {accuracy:.4f}\n")
log_file.write(f"F1 Score: {f1:.4f}\n")
log_file.write("\nClassification Report:\n")
log_file.write(report)

# Close log file
log_file.close()

# Save results to CSV files
df_table_prd['minicpm-v 2.6'] = predictions
df_table_prd.to_csv(log_path + 'df_table_prd.csv', index=False)

df_table_dsc['minicpm-v 2.6'] = descriptions
df_table_dsc.to_csv(log_path + 'df_table_dsc.csv', index=False)




Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Processing images:   0%|          | 0/200 [00:00<?, ?it/s]Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Processing images:   0%|          | 1/200 [00:03<11:45,  3.54s/it]

Based on the provided image, here is the analysis:

{ "DESCRIPTION": "The image shows a section of a sewer pipe with visible damage. The surface appears rough and there is a significant crack running through it.",
"CODE": "RB" }
Image ID: 00617545.png | Predicted: Error | Ground Truth: RB



Processing images:   1%|          | 2/200 [00:05<09:06,  2.76s/it]

Based on the image provided, here is the analysis:

{ "DESCRIPTION": "The image shows a close-up view of a sewer pipe with visible signs of wear and potential damage. The interior surface appears rough with areas that look corroded or eroded.", "CODE": "OB" }
Image ID: 00635967.png | Predicted: Error | Ground Truth: NoDefect



Processing images:   2%|▏         | 3/200 [00:09<11:00,  3.35s/it]

Based on the provided image, here is the analysis:

{ "DESCRIPTION": "The image shows an interior view of a sewer pipe with water visible at the bottom. The surface appears rough and there are several darkened rectangular areas that seem to be intentional obstructions or markers.", "CODE": "NoDefect" }

The image does not show any clear signs of cracks, breaks, collapses, surface damage, production errors, deformations, or roots within the visible section of the pipe. Therefore, the defect code assigned is "NoDefect".
Image ID: 00002720.png | Predicted: Error | Ground Truth: NoDefect



Processing images:   2%|▏         | 4/200 [00:12<10:36,  3.25s/it]

Based on the image provided, here is the analysis:

{ "DESCRIPTION": "The image shows a section of a sewer pipe with visible signs of damage. The interior surface of the pipe appears to have a brownish substance, possibly indicating contamination or corrosion.", "CODE": "OB" }

The defect code in this case would be "Surface Damage (OB)" as there are visible signs of wear and possible erosion on the inner surface of the pipe.
Image ID: 00497284.png | Predicted: Error | Ground Truth: NoDefect



Processing images:   2%|▎         | 5/200 [00:16<10:23,  3.20s/it]

Based on the provided image, here is the analysis:

{ "DESCRIPTION": "The image shows a close-up view of a sewer pipe with visible signs of damage and irregularities. The surface appears rough and has several cracks and breaks.", "CODE": "RB" }

In this case, the defect code selected is "RB" for Cracks, Breaks, and Collapses, as there are visible signs of structural failure in the pipe's interior.
Image ID: 00231002.png | Predicted: Error | Ground Truth: RB



Processing images:   3%|▎         | 6/200 [00:27<19:12,  5.94s/it]

Based on the provided image, here is the detailed analysis:

1. **Image Overview**: The image shows a view inside a sewer pipe from a CCTV camera. The lighting conditions appear to be adequate for inspection purposes.
2. **Presence of Water**: There is no visible presence of water within the pipe; it appears dry.
3. **Pipe Type**: The type of pipe cannot be definitively determined from the image alone, but it could be either concrete or PVC based on its appearance.

**Defect Analysis:**

- **Cracks, Breaks, and Collapses (RB)**: There are no visible cracks, breaks, or collapses in the pipe. The surface seems smooth without any significant structural damage.
- **Surface Damage (OB)**: There are minor signs of wear and erosion on the interior surface of the pipe, particularly near the bottom where there is some debris accumulation.
- **Production Error (PF)**: No production errors such as inconsistent thickness or improper joint alignment can be identified from this angle.
- **Deformatio

Processing images:   3%|▎         | 6/200 [00:27<15:02,  4.65s/it]


KeyboardInterrupt: 