**PP-StructureV2**  
PP-StructureV2 is an intelligent document analysis system that handles layout analysis, table recognition, and key information extraction in image and PDF documents. It supports structured document understanding and restoration (layout recovery).

**Workflow**

1. Image Correction → Detects the document orientation (0, 90, 180, or 270 degrees) and rotates the image upright; this step is optional.
2. Layout Analysis → Identifies text, tables, images, and formulas within the document.
3. Text & Table Processing:
   - OCR Engine → Detects and recognizes text.
   - Table Recognition → Converts detected tables into structured formats (e.g., Excel).
4. Layout Recovery → Restores the extracted information into a Word/PDF file while keeping the original structure.
5. Key Information Extraction (KIE):
   - Semantic Entity Recognition (SER) → Identifies key entities in the text.
   - Relation Extraction (RE) → Establishes relationships between the extracted entities.

**For arbitrary, non-tabular bill images:**

PaddleOCR's core text detection and recognition will extract all visible text.  
Post-processing with regex or NER then lets you identify and extract specific fields such as totals and dates.  
PP-Structure may not be directly useful unless you need highly structured, table-like data. For bills, OCR plus field extraction gives you more control over diverse layouts.  
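A minimal sketch of the regex route, assuming PaddleOCR's output has already been flattened into one string per detected line. The patterns and the helper names (`first_match`, `extract_fields`) are hypothetical, and the sample lines are invented for illustration; real bills will need patterns tuned to their wording and number formats.

```python
import re
from typing import Optional

# Hypothetical patterns; adjust them to the wording and formats on your bills.
INVOICE_RE = re.compile(
    r"\b(?:invoice|bill)\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9-]+)", re.IGNORECASE
)
DATE_RE = re.compile(r"\b(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})\b")
# Match "Total" but not "Sub Total" / "Sub-Total", then capture an amount like 1,250.50
TOTAL_RE = re.compile(r"(?<!sub\s)(?<!sub-)\btotal\b\D*([\d,]+\.\d{2})", re.IGNORECASE)


def first_match(pattern: re.Pattern, lines: list[str]) -> Optional[str]:
    """Return the first captured group found in any OCR line, or None."""
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None


def extract_fields(lines: list[str]) -> dict[str, Optional[str]]:
    """Pull a few common bill fields out of OCR text lines."""
    return {
        "invoice_no": first_match(INVOICE_RE, lines),
        "date": first_match(DATE_RE, lines),
        "total": first_match(TOTAL_RE, lines),
    }


if __name__ == "__main__":
    # Made-up OCR lines, one string per detected text line
    sample = ["Invoice No: INV-0042", "Date: 12/03/2024", "Sub Total 90.00", "Total: 1,250.50"]
    print(extract_fields(sample))
```

Taking the first match per field is a simple design choice; for fields that can repeat, you may prefer the last match, or an NER model trained on your own bills.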

1. **Layout Information Extraction**  
**Problem:** Extracted text is spread across different lines, making it difficult to associate the correct values with their labels (e.g., "invoice number" on one line and its value on another).  
**Solution:** PP-StructureV2's layout analysis groups text lines into blocks, and its layout recovery module reconstructs the document's structure. This can make it easier to tell which label (e.g., "invoice number") a value belongs to, even when they are on different lines. Explicitly linking labels to values is the job of the KIE modules: SER identifies the key and value entities, and RE links them.  
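To see why geometry helps, here is a simple hand-written heuristic (not PP-StructureV2's own method) that pairs a label with its value using bounding boxes in the `[x1, y1, x2, y2]` format PP-Structure reports: prefer the nearest box to the right on the same row, otherwise the nearest box directly below. `find_value` and the sample boxes are hypothetical.

```python
from typing import Optional

# (x1, y1, x2, y2): top-left and bottom-right corners, as in PP-Structure's 'bbox'
Box = tuple[float, float, float, float]


def v_overlap(a: Box, b: Box) -> float:
    """Pixels of vertical overlap between two boxes (0 if none)."""
    return max(0.0, min(a[3], b[3]) - max(a[1], b[1]))


def h_overlap(a: Box, b: Box) -> float:
    """Pixels of horizontal overlap between two boxes (0 if none)."""
    return max(0.0, min(a[2], b[2]) - max(a[0], b[0]))


def find_value(label_box: Box, items: list[tuple[str, Box]]) -> Optional[str]:
    """Hypothetical heuristic: take the nearest box to the right on the same row,
    otherwise the nearest box directly below the label."""
    right, below = [], []
    for text, box in items:
        if box == label_box:
            continue  # skip the label itself
        if box[0] >= label_box[2] and v_overlap(label_box, box) > 0:
            right.append((box[0] - label_box[2], text))
        elif box[1] >= label_box[3] and h_overlap(label_box, box) > 0:
            below.append((box[1] - label_box[3], text))
    for candidates in (right, below):
        if candidates:
            return min(candidates)[1]  # smallest gap wins
    return None


if __name__ == "__main__":
    # Made-up boxes: one value sits right of its label, the other sits below
    items = [
        ("Invoice No:", (10, 10, 110, 30)),
        ("INV-0042", (120, 10, 210, 30)),
        ("Date", (10, 50, 60, 70)),
        ("12/03/2024", (10, 75, 100, 95)),
    ]
    for label, box in items[::2]:
        print(label, "->", find_value(box, items))
```

The heuristic breaks when another label sits to the right on the same row; the KIE modules (SER + RE) learn these links instead of relying on fixed rules.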

**Initialize the PP-StructureV2 model**
```python
from paddleocr import PPStructure  # PaddleOCR 2.x API

ppstructure_config = {
    'use_gpu': False,                # False for CPU, True for GPU
    'layout': True,                  # Enable layout analysis (text, title, table, figure, ... regions)
    'table': True,                   # Enable table structure recognition
    'ocr': True,                     # Run text detection + recognition on non-table regions
    'formula': False,                # Formula recognition (support depends on the PaddleOCR version)
    'lang': 'en',                    # OCR language; also selects the English layout and table models
    'rec_image_shape': '3, 48, 320', # Input image shape for text recognition (optional)
    'drop_score': 0.5,               # Discard recognized text with confidence below this value
    'det_db_thresh': 0.3,            # Binarization threshold for the DB text detector's probability map
    'det_db_box_thresh': 0.6,        # Minimum score for a detected text box to be kept
    'det_db_unclip_ratio': 1.5,      # How much each detected text box is expanded
    'rec_batch_num': 6,              # Batch size for text recognition
    'max_text_length': 25,           # Maximum text length setting passed to the recognizer
}

# PPStructure accepts these settings as keyword arguments
table_engine = PPStructure(**ppstructure_config)
```
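`drop_score` is applied while the pipeline runs, so changing it means re-running the image. If you already have a result and want a stricter cut, a small post-filter works. This sketch assumes the PaddleOCR 2.x result layout, where each text region's `'res'` is a list of dicts with `'text'`, `'confidence'`, and `'text_region'` (table regions hold a dict instead and are passed through); `filter_by_confidence` is a hypothetical helper and the mock data is hand-built, not real output.

```python
def filter_by_confidence(result: list[dict], min_conf: float) -> list[dict]:
    """Return new region dicts keeping only text lines with confidence >= min_conf.

    Regions whose 'res' is not a list (for example, tables) are passed through unchanged.
    """
    filtered = []
    for region in result:
        res = region.get("res")
        if isinstance(res, list):
            kept = [line for line in res if line.get("confidence", 0.0) >= min_conf]
            filtered.append({**region, "res": kept})  # copy; the input is not mutated
        else:
            filtered.append(region)
    return filtered


if __name__ == "__main__":
    # Hand-built mock of a PPStructure result (not real output)
    mock = [
        {"type": "text", "bbox": [10, 10, 200, 60], "res": [
            {"text": "Total: 12.50", "confidence": 0.96, "text_region": [[10, 10], [200, 10], [200, 30], [10, 30]]},
            {"text": "T0ta1", "confidence": 0.41, "text_region": [[10, 35], [90, 35], [90, 60], [10, 60]]},
        ]},
        {"type": "table", "bbox": [10, 80, 300, 200], "res": {"html": "<table></table>"}},
    ]
    for region in filter_by_confidence(mock, 0.8):
        print(region["type"], region["res"])
```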


**Key Notes:**  
You do not have to set rec_algorithm: PPStructure builds its internal text recognizer from the default PaddleOCR settings and downloads the recognition model that matches lang.  
det_algorithm selects the text detector (DB by default), not the layout detector. Layout analysis uses a separate layout model, which PPStructure chooses automatically from lang and structure_version.  
rec_image_shape, det_db_thresh, and the other OCR-specific parameters are forwarded to that internal OCR system, so you can tune them, but they are not required unless you are fine-tuning performance.  

Summary of Changes for PPStructure: 

Added for PPStructure:  

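layout: Enables layout analysis.  
table: Enables table structure recognition.  
formula: Enables formula recognition (support depends on the PaddleOCR version).  
ocr: Enables text recognition in the non-table regions found by layout analysis. This is the PPStructure flag for recognition; passing rec does not switch it on or off.

Optional for PPStructure (the defaults are usually fine):  

rec_algorithm: The default already matches the recognition model PPStructure downloads, so leave it unset.  
det_algorithm: Selects the text detector (DB by default); layout detection uses a separate layout model.  
cls_batch_num & cls_thresh: Only relevant if the text angle classifier is enabled (use_angle_cls=True).  
det_limit_side_len & det_limit_type: Still control how the internal text detector resizes the image; change them only if small text is being missed.  
max_text_length: Not required unless fine-tuning text extraction for specific use cases.  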

**When ocr=True** (the default), PPStructure runs its internal OCR system (text detection followed by recognition) on each non-table region found by layout analysis, such as headings and text blocks. Text inside tables is read by the table recognition pipeline, which reuses the same detector and recognizer.  

You do not need to specify the recognition algorithm explicitly when using PPStructure, because a default recognizer is already integrated. If you want to customize the OCR further, you can point PPStructure at other models with det_model_dir and rec_model_dir (together with matching settings such as rec_char_dict_path).  

Summary:  
When ocr=True, the internal recognizer is **PaddleOCR's standard text recognition model: typically SVTR_LCNet for current PP-OCR models and CRNN for older ones, depending on the model version and language.**  
You don't need to specify the recognition model manually, as PPStructure sets it up internally.  

**Example:**

```python
from paddleocr import PPStructure, save_structure_res
import cv2
import os

# Set up PPStructure for layout analysis + table recognition on CPU (PaddleOCR 2.x API)
table_engine = PPStructure(
    use_gpu=False,  # Use CPU
    layout=True,    # Enable layout analysis
    table=True,     # Enable table recognition
    ocr=True,       # Enable text recognition in non-table regions
    formula=False,  # Disable formula recognition
    lang='en',      # English OCR, layout, and table models
)

# Provide the path to your image
img_path = "/content/food-bill_10.png"
img = cv2.imread(img_path)  # OpenCV returns a BGR array, or None if the file can't be read
if img is None:
    raise FileNotFoundError(f"Could not read image: {img_path}")

# Run the pipeline on the BGR image array
result = table_engine(img)

# Save results: res_0.txt (one JSON line per region), Excel files for tables, crops for figures
save_folder = "/content/output"
img_name = os.path.splitext(os.path.basename(img_path))[0]
save_structure_res(result, save_folder, img_name)

# Print structured OCR results without the (large) cropped image arrays
for region in result:
    region.pop('img', None)
    print(region)
```
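Once you have `result`, you usually want the text rather than the raw region dicts. A minimal sketch, assuming the PaddleOCR 2.x result layout: text-like regions carry `'res'` as a list of dicts with a `'text'` key, and table regions carry `'res'` as a dict with an `'html'` key. `region_texts` is a hypothetical helper, and sorting by the top-left corner is only a rough reading order that will interleave multi-column layouts.

```python
def region_texts(result: list[dict]) -> list[tuple[str, str]]:
    """Collect (region type, text) pairs in rough reading order
    (top-to-bottom, then left-to-right, using bbox = [x1, y1, x2, y2])."""
    pairs = []
    for region in sorted(result, key=lambda r: (r["bbox"][1], r["bbox"][0])):
        res = region.get("res")
        if isinstance(res, list):
            # Text-like regions: one dict per recognized line
            pairs.extend((region["type"], line["text"]) for line in res)
        elif isinstance(res, dict) and "html" in res:
            # Table regions: the recognized table as an HTML string
            pairs.append((region["type"], res["html"]))
    return pairs


if __name__ == "__main__":
    # Hand-built mock of a PPStructure result (not real output)
    mock = [
        {"type": "table", "bbox": [20, 300, 580, 500],
         "res": {"html": "<table><tr><td>Item A</td><td>10.00</td></tr></table>"}},
        {"type": "text", "bbox": [20, 40, 300, 80],
         "res": [{"text": "Invoice No: INV-0042", "confidence": 0.97,
                  "text_region": [[20, 40], [300, 40], [300, 80], [20, 80]]}]},
    ]
    for kind, text in region_texts(mock):
        print(kind, text)
```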


| Feature            | PaddleOCR                         | PP-Structure                        |
|--------------------|---------------------------------|-----------------------------------|
| **Purpose**       | Focuses on text detection and recognition in images. | Aims at document analysis, including layout detection, table recognition, and key information extraction. |
| **Main Components** | - Text detection (DB, EAST, etc.)<br>- Text recognition (CRNN, SVTR, etc.)<br>- Text direction classification | - Layout analysis (detecting text, tables, images, etc.)<br>- Table recognition (detecting and parsing table structures)<br>- Key information extraction (KIE) |
| **Input Type**    | Single images containing text | Documents, scanned pages, forms, tables, and receipts |
| **Output**        | Recognized text with bounding boxes and confidence scores | Structured document data: a list of regions (type, bbox, recognized content), saved as JSON lines, with tables as HTML/Excel |
| **Use Cases**     | OCR applications like license plate recognition, scene text recognition, and handwritten text detection | Document digitization, invoice processing, form extraction, and table structure recognition |
| **Dependencies**  | Uses PaddlePaddle deep learning framework | Built on top of PaddleOCR with additional modules for document layout analysis |
| **Pretrained Models** | Offers text detection and recognition models | Includes models for layout parsing, table structure recognition, and key-value extraction |
| **Customization** | Supports fine-tuning OCR models | Supports fine-tuning for document structure analysis |
| **Example Output** | `[[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ('Invoice 1234', score)]` | `{'type': 'table', 'bbox': [x1, y1, x2, y2], 'res': {'html': '<table>...</table>', ...}}` |
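The two output formats differ in box shape: PaddleOCR gives each line as four corner points plus `(text, score)`, while PP-Structure uses `[x1, y1, x2, y2]`. A small normalizer lets the same downstream code handle both. This assumes the PaddleOCR 2.x `ocr()` layout, where the result holds one such list per page (a page with no detected text may come back as `None`); the helper names are hypothetical.

```python
def quad_to_bbox(points: list[list[float]]) -> list[float]:
    """Convert a 4-point text polygon [[x, y], ...] to an axis-aligned [x1, y1, x2, y2]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]


def ocr_lines_to_items(page) -> list[tuple[str, float, list[float]]]:
    """Turn one page of PaddleOCR output, [[points, (text, score)], ...],
    into (text, score, bbox) tuples; an empty or None page gives []."""
    return [(text, score, quad_to_bbox(points)) for points, (text, score) in (page or [])]


if __name__ == "__main__":
    # Hand-built page in the PaddleOCR line format (not real output)
    page = [[[[10, 5], [120, 5], [120, 30], [10, 30]], ("Invoice 1234", 0.98)]]
    print(ocr_lines_to_items(page))
```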


Output structure:

1. 'type': 'text'  
This is the region label assigned by layout analysis; 'text' means an ordinary block of text. Other labels include 'title', 'list', 'table', and 'figure' (the exact set depends on the layout model), and tables are reported as 'table' rather than 'text'.  
2. 'bbox': [417, 114, 578, 220]  
'bbox' stands for bounding box, and it describes the position of the detected text in the image.  
The four numbers are:  
**[417, 114]: The top-left** corner of the bounding box (x, y coordinates).  
**[578, 220]: The bottom-right** corner of the bounding box (x, y coordinates).  
These values are pixel coordinates, meaning the detected text starts at (417, 114) and ends at (578, 220) in the image.  
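For the example box, width = 578 - 417 = 161 px and height = 220 - 114 = 106 px. A tiny helper (hypothetical, for illustration) computes the same values plus the box center, which is handy for sorting or matching regions:

```python
def bbox_size_and_center(bbox):
    """Return (width, height, (cx, cy)) for a [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return x2 - x1, y2 - y1, ((x1 + x2) / 2, (y1 + y2) / 2)


width, height, center = bbox_size_and_center([417, 114, 578, 220])
print(width, height, center)  # 161 106 (497.5, 167.0)
```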

3. 'img': array([...])  
The 'img' key contains the crop of the input image inside the bounding box, as a NumPy array.  
array([...]) holds the pixel values of that region, essentially a small part of the original image containing just the detected text.  
Each channel is an 8-bit value from 0 to 255, in the same channel order as the input image (BGR if you loaded it with cv2.imread); for example, [255, 255, 255] is a white pixel.  
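The link between 'bbox' and 'img' can be reproduced with NumPy slicing: rows index y and columns index x, so the crop is `image[y1:y2, x1:x2]`. This sketch assumes 'img' is exactly that slice of the input; `crop_bbox` is a hypothetical helper and the blank page is a stand-in for a real bill.

```python
import numpy as np


def crop_bbox(image: np.ndarray, bbox) -> np.ndarray:
    """Slice the region inside [x1, y1, x2, y2]; rows are y, columns are x."""
    x1, y1, x2, y2 = (int(v) for v in bbox)
    return image[y1:y2, x1:x2]


# Blank white 3-channel stand-in for a page (height 400, width 700), uint8 like cv2.imread
page = np.full((400, 700, 3), 255, dtype=np.uint8)
crop = crop_bbox(page, [417, 114, 578, 220])
print(crop.shape)  # (106, 161, 3)
```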
Summary:  
Bounding Box ('bbox'): Describes the area of the image where the text or structure is located, using the top-left and bottom-right pixel coordinates.  
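Image ('img'): Contains the cropped image of the detected structure (in this case, text), so you could extract and process it further if necessary.  
Result ('res'): Holds the recognized content. For text regions it is a list of lines, each with 'text', 'confidence', and 'text_region'; for table regions it is a dict whose 'html' entry contains the table.  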