'NoneType' object has no attribute 'create_execution_context' when calling model from another class. #4547

@manueldiaz96

Description

After commenting on #1488, I decided to open my own issue.

Environment

TensorRT Version: 8.5.2.2
NVIDIA GPU: NVIDIA RTX 2000 Ada Generation Laptop GPU
NVIDIA Driver Version: 570.169
CUDA Version: 11.5
CUDNN Version: 8.9.2.26
Operating System: Linux 6.8.0-65-generic
Python Version: 3.10.12
PyTorch Version: 2.3.0+cu121
Baremetal or Container: Both

I am having the same problem, but only when importing a TensorRT inference engine class I wrote. The following standalone script runs properly both on bare metal and in the container:

import time

import numpy as np
import torch

# TensorRTInference is the class shown further below
if __name__ == "__main__":

    import pycuda.autoinit
    
    engine_path = "model_engine_fp16.trt"
    calibration_images_path = "calib_images.pt"
    
    calibration_tensors = torch.load(calibration_images_path, map_location='cpu')
    calibration_images = [img.numpy().astype(np.float32) for img in calibration_tensors]
    
    print("Testing inference with variable batch sizes...")
    inference_engine = TensorRTInference(engine_path, max_batch_size=6)
    
    # Test with random batches
    for i in range(100):
        batch_size = np.random.randint(1, 6)  # Random batch size 1-5 (upper bound is exclusive)
        print(f"Batch size for run #{i}: {batch_size}")
        
        # Select random images
        indices = np.random.choice(len(calibration_images), batch_size, replace=False)
        batch_images = [calibration_images[idx] for idx in indices]
        
        # Stack to batch
        batch_input = np.stack(batch_images, axis=0)
        
        t_start = time.time()
        results = inference_engine.infer(batch_input)
        t_total = time.time() - t_start

        time.sleep(1./6.)
        
        print(f"Batch {i}: size={batch_size}, time={t_total:.4f}s")
        print(f"  Input shape: {batch_input.shape}")
        print(f"  Output shapes: {[r.shape for r in results]}")

But when I import the same model in a class that combines the TensorRT model with a YOLO model, which also runs fine on bare metal, I run into a problem in the container: self.engine.create_execution_context() returns None.
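When create_execution_context() returns None, TensorRT usually logs the reason (commonly an out-of-memory condition) just before returning. One way to surface it is to raise the logger verbosity when loading the engine. A minimal sketch, assuming the same engine file as above (not the author's code):

```python
def load_engine_verbose(engine_path: str):
    """Deserialize an engine with a VERBOSE logger so TensorRT prints
    the reason if context creation fails, instead of failing silently."""
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.VERBOSE)  # surface INFO/VERBOSE messages
    with open(engine_path, "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError("deserialize_cuda_engine returned None")
    context = engine.create_execution_context()
    if context is None:
        raise RuntimeError("create_execution_context returned None; check the log above")
    return engine, context
```

Running the failing path once with this loader should show whether the failure is an allocation error or something else.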

Code for the constructor of the TensorRTInference class:

import pycuda.driver as cuda
import tensorrt as trt


class TensorRTInference:
    """Handle TensorRT inference with variable batch sizes"""
    
    def __init__(self, engine_path, max_batch_size=32):

        try:
            current_context = cuda.Context.get_current()
            print(f"Current CUDA context: {current_context}")
            print(f"Context device: {current_context.get_device()}")
            print(f"Context API version: {current_context.get_api_version()}")
        except cuda.LogicError as e:
            print(f"No CUDA context: {e}")
            import pycuda.autoinit
            current_context = cuda.Context.get_current()
            print(f"After autoinit - Context: {current_context}")
        
        # Test context is working
        try:
            cuda.Context.synchronize()
            print("CUDA context is active and synchronized")
        except Exception as e:
            print(f"CUDA context sync failed: {e}")
        
        # Load TensorRT engine
        self.logger = trt.Logger(trt.Logger.WARNING)
        with open(engine_path, "rb") as f:
            runtime = trt.Runtime(self.logger)
            self.engine = runtime.deserialize_cuda_engine(f.read())

        print(f"Engine deserialized successfully: {self.engine is not None}")
        if self.engine is None:
            raise Exception("Failed to deserialize TensorRT engine")

        self.context = self.engine.create_execution_context()

        if self.context is None:
            raise Exception("Execution context is None!")

        self.max_batch_size = max_batch_size
        
        # Get tensor info
        self.input_names = []
        self.output_names = []
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
                self.input_names.append(name)
            else:
                self.output_names.append(name)
        
        # Pre-allocate buffers for maximum batch size
        self.buffers = {}
        self.stream = cuda.Stream()
        self._allocate_max_buffers()
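`_allocate_max_buffers` is not shown above. For completeness, here is a hedged sketch of what such a method could look like for an explicit-batch engine whose dynamic dimension is the batch dim; this is an assumption about the implementation, not the actual code. The size helper is pure NumPy, and the device allocation is indicated in comments:

```python
import numpy as np

def max_buffer_nbytes(shape, max_batch_size, dtype=np.float32):
    """Byte size of the largest buffer a tensor can need: any dynamic
    dimension (-1) is assumed to be the batch dim and is replaced by
    max_batch_size."""
    resolved = [max_batch_size if d == -1 else d for d in shape]
    return int(np.prod(resolved)) * np.dtype(dtype).itemsize

# Inside TensorRTInference, allocation could then look like:
#
# def _allocate_max_buffers(self):
#     for name in self.input_names + self.output_names:
#         shape = self.engine.get_tensor_shape(name)
#         nbytes = max_buffer_nbytes(tuple(shape), self.max_batch_size)
#         self.buffers[name] = cuda.mem_alloc(nbytes)  # device buffer
```

Pre-allocating for the maximum batch size lets smaller batches reuse the same buffers without reallocation.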

Code for the constructor of the class which loads the TensorRTInference class and the YOLO model:

class UseTwoModels:
    def __init__(self, 
                 detection_model_path : str, 
                 other_model_path : str, 
                 device : str ='cuda:0', 
                 min_conf : float = 0.5) -> None:

        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')
        torch.cuda.set_device(device)

        print(device)

        # Warm up CUDA context
        dummy_tensor = torch.zeros(1).to(device)
        del dummy_tensor
            
        print(f"Using CUDA device: {torch.cuda.get_device_name(0)}")

        try:
            self.detector_model = YOLO(detection_model_path, task='detect')
        except Exception as e:
            print(f"Error loading YOLO model: {e}")
        else:
            print("YOLO detector properly loaded.")

        self.keypoint_model = TensorRTInference(other_model_path, max_batch_size=6)

        self.min_conf = min_conf
        self.person_cls = 0

Just to clarify my problem: everything works well on bare metal, but when using the UseTwoModels class in the container, I get the Exception: Execution context is None! that I raise in the constructor of the TensorRTInference class.

Here is the trace when trying to load the model from UseTwoModels:

# Here we are inside UseTwoModels
cuda:0
Using CUDA device: NVIDIA RTX 2000 Ada Generation Laptop GPU
Current CUDA context Det: <pycuda._driver.Context object at 0x7c80388b5540>
YOLO detector properly loaded.

# Here we are inside TensorRTInference
Current CUDA context: <pycuda._driver.Context object at 0x7c8038360890>
Context device: <pycuda._driver.Device object at 0x7c803889ece0>
Context API version: 3020
CUDA context is active and synchronized
Engine deserialized successfully: True

# This is the error
Traceback (most recent call last):
...
raise Exception("Execution context is None!")
Exception: Execution context is None!

I have also tried rebuilding the engine directly in the container and using that build, but I keep getting the same problem.
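For reference, rebuilding inside the container can be done with trtexec; with a dynamic batch dimension, the optimization profile has to cover the batch sizes used at inference time. The ONNX filename, input tensor name, and shapes below are placeholders, not taken from the original setup:

```shell
# Rebuild the FP16 engine inside the container, with a profile
# covering batch sizes 1..6 (input name/shape are placeholders).
trtexec --onnx=model.onnx \
        --saveEngine=model_engine_fp16.trt \
        --fp16 \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:4x3x224x224 \
        --maxShapes=input:6x3x224x224
```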

My intuition tells me it is not a TensorRT version problem, given that the first snippet runs well both on bare metal and in the container.
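Since the standalone script works and the combined class does not, one plausible culprit when mixing PyTorch (and YOLO on top of it, which use the CUDA primary context) with pycuda.autoinit (which creates its own context) is a context mismatch at the time the execution context is created. A hedged sketch of binding pycuda to the primary context instead, as something to try rather than a confirmed fix:

```python
def trt_context_on_primary(engine_path: str, device_index: int = 0):
    """Create the TensorRT execution context while the CUDA *primary*
    context (the one PyTorch uses) is current, instead of a separate
    context created by pycuda.autoinit."""
    import pycuda.driver as cuda
    import tensorrt as trt

    cuda.init()
    primary = cuda.Device(device_index).retain_primary_context()
    primary.push()  # make the primary context current on this thread
    try:
        logger = trt.Logger(trt.Logger.WARNING)
        with open(engine_path, "rb") as f:
            engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
        context = engine.create_execution_context()
        return engine, context
    finally:
        primary.pop()  # restore whatever context was current before
```

If this works in the container, it would point to the double-context setup (pycuda.autoinit alongside PyTorch) rather than the engine itself.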

Please let me know if I should provide other information. Thanks!

Labels: Module:Runtime