# Deep Research tool with Azure AI Foundry (preview)

<img src="https://learn.microsoft.com/en-us/azure/ai-services/agents/media/agent-service-the-glue.png" width=800>

The **Deep Research tool** in the Azure AI Foundry Agent Service lets you integrate web-based research into your systems. It is a specialized AI capability designed to perform in-depth, multi-step research using data from the public web.

The **o3-deep-research model** and the GPT model deployments should be part of your AI Foundry project, so that all three resources are in the same Azure subscription and the same region. Supported regions are **West US** and **Norway East**.

- When an agent with Deep Research integration receives a research request — whether from a user or another application — it utilizes GPT-4o or GPT-4.1 to interpret the intent, fill in any missing details, and define a clear, actionable scope for the task.
- Once the task is defined, the agent activates the Bing-powered grounding tool to gather a refined selection of recent, high-quality web content.
- Following this, the o3-deep-research agent begins the research process by reasoning through the collected information. Rather than merely summarizing content, it evaluates, adapts, and synthesizes insights from multiple sources, adjusting its approach as new data emerges.
- The entire process culminates in a structured report that not only provides the answer but also includes the model’s reasoning path, source citations, and any clarifications requested during the session.
  
> https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/deep-research

In [1]:
# You need to install the pre-release version. It is a good idea to create a venv or a new Jupyter kernel first.
#%pip install --pre azure-ai-projects

In [2]:
import datetime
import os
import pypandoc
import sys
import time

from azure.ai.agents import AgentsClient
from azure.ai.agents.models import DeepResearchTool, MessageRole, ThreadMessage
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from IPython.display import display, FileLink, Markdown
from typing import Optional

In [3]:
sys.version

'3.10.18 (main, Jun  5 2025, 13:14:17) [GCC 11.2.0]'

In [4]:
print(f"Today is {datetime.datetime.today().strftime('%d-%b-%Y %H:%M:%S')}")

Today is 09-Jul-2025 11:13:06


## Settings

> Currently available only in "West US" and "Norway East"

In [5]:
# Foundry project
project_endpoint = "TO BE COMPLETED"

In [6]:
model = "gpt-4o"  # a generic gpt model
deep_research_model = "o3-deep-research"  # the new o3 deep research model deployed in your AI Foundry

bingservice = "bingsearchservice"  # The Bing connection service in your AI foundry project
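
Instead of hard-coding these values in the notebook, they can be read from environment variables. The following is a minimal sketch, not part of the original notebook: the `DeepResearchSettings` class, the `load_settings` helper, and all four environment variable names are hypothetical and only illustrate the idea. The defaults mirror the values used above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepResearchSettings:
    """Hypothetical container for the settings used in this notebook."""
    project_endpoint: str
    model: str
    deep_research_model: str
    bing_connection_name: str


def load_settings(env=None) -> DeepResearchSettings:
    """Read the settings from a mapping (os.environ by default).

    The variable names are assumptions; rename them to match your setup.
    """
    env = os.environ if env is None else env
    endpoint = env.get("PROJECT_ENDPOINT", "")
    if not endpoint:
        # Fail early rather than sending requests to a placeholder endpoint.
        raise ValueError("PROJECT_ENDPOINT is not set")
    return DeepResearchSettings(
        project_endpoint=endpoint,
        model=env.get("MODEL_DEPLOYMENT_NAME", "gpt-4o"),
        deep_research_model=env.get(
            "DEEP_RESEARCH_MODEL_DEPLOYMENT_NAME", "o3-deep-research"
        ),
        bing_connection_name=env.get("BING_CONNECTION_NAME", "bingsearchservice"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching the real environment.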

In [7]:
RESULTS_DIR = "documents"

os.makedirs(RESULTS_DIR, exist_ok=True)

In [8]:
now = datetime.datetime.today().strftime('%d%b%Y_%H%M%S')

md_results_file = os.path.join(RESULTS_DIR, f"deep_research_results_{now}.md")  # The name of the markdown output file
docx_file = os.path.join(RESULTS_DIR, f"deep_research_results_{now}.docx")  # The name of the .docx output file

## Helper

In [9]:
def fetch_and_print_new_agent_response(
    thread_id: str,
    agents_client: AgentsClient,
    last_message_id: Optional[str] = None,
) -> Optional[str]:
    """
    Fetches and prints the latest response from an agent in a given thread.

    Args:
        thread_id (str): The ID of the thread to fetch the agent's response from.
        agents_client (AgentsClient): The client used to interact with the agents service.
        last_message_id (Optional[str], optional): The ID of the last message that was processed. Defaults to None.

    Returns:
        Optional[str]: The ID of the latest message if there is new content, otherwise returns the last_message_id.
    """
    response = agents_client.messages.get_last_message_by_role(
        thread_id=thread_id,
        role=MessageRole.AGENT,
    )
    if not response or response.id == last_message_id:
        return last_message_id  # No new content

    print("\nAgent response:")
    print("\n".join(t.text.value for t in response.text_messages))

    for ann in response.url_citation_annotations:
        print(
            f"URL Citation: [{ann.url_citation.title}]({ann.url_citation.url})"
        )

    return response.id

In [10]:
def create_research_summary(message: ThreadMessage,
                            filepath: str = md_results_file) -> None:
    """
    Creates a research summary from the provided message and writes it to a file.

    Args:
        message (ThreadMessage): The message containing the content for the research summary.
        filepath (str, optional): The path of the Markdown file to write the research summary to. Defaults to md_results_file.

    Returns:
        None
    """
    if not message:
        print("Error: No message content provided")
        return

    with open(filepath, "w", encoding="utf-8") as fp:
        # Write text summary
        text_summary = "\n\n".join(
            [t.text.value.strip() for t in message.text_messages])
        fp.write(text_summary)

        # Write unique URL citations, if present
        if message.url_citation_annotations:
            fp.write("\n\n## References\n")
            seen_urls = set()
            for ann in message.url_citation_annotations:
                url = ann.url_citation.url
                title = ann.url_citation.title or url
                if url not in seen_urls:
                    fp.write(f"- [{title}]({url})\n")
                    seen_urls.add(url)

    print(f"Research summary written to '{filepath}'.")
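
The report text written by this helper can still contain inline citation markers such as `【47:9†source】` (they are visible in the report shown under Results). As a hedged sketch, the markers could be replaced with short bracketed numbers before the text is written. The marker pattern is an assumption inferred from this notebook's output, and `number_citation_markers` is a hypothetical helper, not part of the SDK.

```python
import re

# Pattern assumed from markers seen in this notebook's output, e.g. 【47:9†source】.
CITATION_MARKER = re.compile(r"【\d+:\d+†[^】]*】")


def number_citation_markers(text: str) -> tuple[str, dict[str, int]]:
    """Replace each distinct marker with [n], numbered by first appearance.

    Returns the rewritten text and the marker-to-number mapping.
    """
    numbers: dict[str, int] = {}

    def _replace(match: re.Match) -> str:
        marker = match.group(0)
        if marker not in numbers:
            numbers[marker] = len(numbers) + 1
        return f"[{numbers[marker]}]"

    return CITATION_MARKER.sub(_replace, text), numbers


sample = "Edge AI cuts latency【47:9†source】 and memory【47:1†source】; see also【47:9†source】."
clean, numbers = number_citation_markers(sample)
print(clean)  # Edge AI cuts latency[1] and memory[2]; see also[1].
```

Repeated markers reuse the same number, so the mapping can be used afterwards to build a matching reference list.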

## Project & tool definitions

In [11]:
project_client = AIProjectClient(
    endpoint=project_endpoint,
    credential=DefaultAzureCredential(),
)

conn_id = project_client.connections.get(name=bingservice).id

In [12]:
deep_research_tool = DeepResearchTool(
    bing_grounding_connection_id=conn_id,
    deep_research_model=deep_research_model,
)

## Example

In [13]:
prompt = "Give me the latest research into computer vision on the edge over the last year. Do not ask questions"

In [14]:
start = time.time()

with project_client:
    with project_client.agents as agents_client:
        agent = agents_client.create_agent(
            model=model,
            name="deep-research-agent",
            instructions="You are a helpful agent that assists in researching scientific topics.",
            tools=deep_research_tool.definitions,
        )

        print(f"🎉 Created agent, ID: {agent.id}")

        thread = agents_client.threads.create()
        print(f"🧵 Created thread, ID: {thread.id}")

        # Add the user's research prompt to the thread
        message = agents_client.messages.create(
            thread_id=thread.id,
            role="user",
            content=prompt,
        )
        print(f"📩 Created message, ID: {message.id}")
        print(
            "⏳ Start processing the message... this may take a few minutes to finish. Be patient!"
        )

        run = agents_client.runs.create(thread_id=thread.id, agent_id=agent.id)
        last_message_id = None

        while run.status in ("queued", "in_progress"):
            time.sleep(1)
            run = agents_client.runs.get(thread_id=thread.id, run_id=run.id)

            last_message_id = fetch_and_print_new_agent_response(
                thread_id=thread.id,
                agents_client=agents_client,
                last_message_id=last_message_id,
            )
            print(f"🔄 Run status: {run.status}")

        print(f"✅ Run finished with status: {run.status}, ID: {run.id}")

        if run.status == "failed":
            print(f"❌ Run failed: {run.last_error}")

        final_message = agents_client.messages.get_last_message_by_role(
            thread_id=thread.id, role=MessageRole.AGENT)
        if final_message:
            create_research_summary(final_message)

        # Clean-up and delete the agent once the run is finished.
        #agents_client.delete_agent(agent.id)
        #print("🗑️ Deleted agent")

elapsed = time.time() - start
minutes, seconds = divmod(elapsed, 60)
print(f"\nElapsed time = {minutes:.0f} minutes and {seconds:.0f} seconds")

🎉 Created agent, ID: asst_6AGXhvz3j5Q5S4v6KvjMjYmj
🧵 Created thread, ID: thread_4bAkLyqdbujWvV2BXGsO0Shy
📩 Created message, ID: msg_ozVEr0s1nFYhJOXV67kQLJv4
⏳ Start processing the message... this may take a few minutes to finish. Be patient!
🔄 Run status: in_progress

Agent response:
Exploring the latest developments in computer vision focused on edge computing over the past year. I will summarize advancements, frameworks, applications, and trends for your review.

Title: Latest Research in Computer Vision on the Edge

 Starting deep research... 

🔄 Run status: in_progress
🔄 Run status: in_progress
🔄 Run status: in_progress
🔄 Run status: in_progress
🔄 Run status: in_progress
🔄 Run status: in_progress
🔄 Run status: in_progress

Agent response:
cot_summary: **Investigating advancements**

I’m gathering the latest advancements in frameworks, algorithms, hardware optimizations, and applications for computer vision on edge computing, focusing on reduced latency, energy efficiency, and real-
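
The example above polls the run status once per second with no upper time limit. Since a deep research run can take several minutes, a polling helper with a timeout and capped exponential backoff may be preferable. This is a minimal sketch under stated assumptions: `wait_for_run` and its parameters are hypothetical, and the pending statuses are the two that the loop above waits on.

```python
import time
from typing import Callable

# The two statuses the notebook's loop treats as "still running".
PENDING_STATUSES = ("queued", "in_progress")


def wait_for_run(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until the run leaves the pending states.

    Raises TimeoutError if the run is still pending after timeout_s seconds.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    status = get_status()
    while status in PENDING_STATUSES:
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout_s:.0f} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
        status = get_status()
    return status
```

With the objects from the example cell, a call could look like `wait_for_run(lambda: agents_client.runs.get(thread_id=thread.id, run_id=run.id).status)`. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.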

## Results

In [15]:
!ls $md_results_file -lh

-rwxrwxrwx 1 root root 45K Jul  9 11:22 documents/deep_research_results_09Jul2025_111306.md


In [16]:
with open(md_results_file, 'r', encoding='utf-8') as file:
    markdown_content = file.read()
    display(Markdown(markdown_content))

Final Report:
# Latest Advancements in Edge Computing for Computer Vision (2024–2025)

## Introduction  
Edge computing in computer vision refers to processing visual data directly on local devices (cameras, sensors, mobile and embedded systems) instead of sending it to cloud servers. This paradigm gained huge momentum over the past year, driven by the need for **reduced latency**, improved **privacy**, and real-time responsiveness in AI applications【47:9†source】. By bringing computation *closer to the data source*, edge-based vision systems can react in milliseconds to critical events, which is crucial for time-sensitive tasks like autonomous driving and medical diagnostics. At the same time, on-device processing avoids heavy data transfers, saving bandwidth and enhancing data security (sensitive images remain on the device). The last year (mid-2024 to mid-2025) saw significant breakthroughs across the board – from new **frameworks and tools** that simplify deploying vision models on the edge, to novel **algorithms** and model optimizations designed for speed and efficiency, to specialized **hardware** accelerators that deliver greater AI performance per watt. These advances are enabling state-of-the-art computer vision in domains ranging from **IoT** and smart cities to **healthcare** devices and **autonomous vehicles**, with an emphasis on **real-time**, **low-power** operation. Below, we provide a comprehensive overview of these latest developments, highlighting how they collectively push the boundaries of what’s possible in edge vision.

## Advances in Edge AI Frameworks and Platforms  
In the past year, major improvements in AI frameworks and software platforms have made it easier to run complex vision tasks on resource-constrained edge devices. **Lightweight inference engines** and model optimization toolkits are at the forefront of this progress, ensuring that deep learning models can execute efficiently on CPUs, mobile GPUs, NPUs (Neural Processing Units), and other accelerators found in edge hardware.

- **TensorFlow Lite and Mobile AI Frameworks:** TensorFlow Lite (TFLite) has solidified its role as a go-to framework for on-device machine learning, now running on **over 4 billion devices globally**. TFLite allows developers to convert and optimize vision models for smartphones, IoT devices, and even microcontrollers. It uses techniques like post-training quantization to shrink model size and speed up inference, **reducing memory use and compute cost** while preserving accuracy. This leads to **minimal latency** during inference, making it ideal for real-time tasks on mobile and embedded devices. Over the last year, we’ve seen expanded support for GPU delegates and hexagon DSPs in TFLite, as well as easier integration with device cameras and sensors (e.g. through libraries like MediaPipe). These enhancements enable use cases such as real-time object detection and augmented reality on phones and AR glasses, entirely on-device. The *“build-once, deploy-anywhere”* approach is gaining traction – for instance, enterprise platforms like Viso Suite integrate TFLite with other frameworks to let teams **build and scale vision applications across edge devices quickly**.

- **OpenVINO and Cross-Platform Optimization:** Intel’s OpenVINO toolkit has introduced notable upgrades in its 2024 releases to boost edge deployment of computer vision models. OpenVINO 2024.0 added better support for **model compression and portability** – including *INT4* weight quantization via its Neural Network Compression Framework, which **reduces memory requirements and speeds up inference** with minimal accuracy loss【47:1†source】. It also improved support for heterogeneous hardware: for example, the latest OpenVINO runtime can seamlessly leverage multi-core ARM processors and now enables FP16 by default on Apple M-series MacOS devices【47:1†source】. A preview integration of Intel’s upcoming CPU-integrated NPU (in **Meteor Lake** processors) was included as well【47:1†source】, pointing to a future where even standard PC or IoT CPUs have built-in AI accelerators accessible through unified runtimes. Equally important, OpenVINO and similar toolkits have focused on **ease of integration**: new versions (like *OpenVINO OTX 2.0*) integrate with popular training libraries (PyTorch Lightning, MMDetection, Anomalib, etc.) to provide a **consistent deployment experience across different platforms**. This means developers can train a model in their framework of choice and then export it to an optimized edge-friendly format with minimal code changes. The result is faster time-to-production for edge vision applications.

- **ONNX Runtime and Compiler Toolchains:** The open-source ONNX format and runtimes have continued to mature in the past year, allowing models from various frameworks to run efficiently on diverse edge hardware. Ongoing enhancements in compilers like **Apache TVM** and Google’s XLA, as well as specialized **neural network compilers** (e.g. TensorRT from NVIDIA, CoreML from Apple), have further boosted inference speed. They achieve this by generating low-level code optimized for each target device’s architecture. For example, NVIDIA’s TensorRT and DeepStream SDK were updated to better handle transformer-based vision models and support INT8/FP16 acceleration on GPUs and Jetson modules. Such tools, together with vendor-specific libraries (Qualcomm’s AI Engine, ARM’s Compute Library, etc.), ensure that even **compute-intensive models can be executed within the tight latency and power budgets** of edge environments. A striking demonstration came from MLPerf results in 2024 – NVIDIA’s Jetson AGX Orin platform was shown to run even *large* AI models (like the 6B-parameter GPT-J language model and **Stable Diffusion XL** image generator) **entirely on-device**, proving that current embedded systems can handle “any kind of model” with the right optimizations. This underscores how far edge AI performance has progressed due to advances in software stacks and model runtimes.

- **Edge Development Suites:** To support the full lifecycle of edge AI (from model design to deployment to monitoring), new development platforms have emerged. For instance, Edge Impulse and similar **TinyML platforms** gained popularity for simplifying computer vision on microcontrollers and IoT sensors. These platforms provide end-to-end pipelines to collect data, train lightweight models, and deploy them on TinyML hardware with a few clicks. In addition, cloud-to-edge orchestration tools (like AWS Panorama, Azure Percept, and various IoT device management frameworks) have improved, making it easier to update models on fleets of cameras or devices at the edge. Overall, the past year’s trend is clear – the software ecosystem is converging towards making edge computer vision **more accessible, portable, and efficient**. Developers can now leverage pre-optimized models and robust frameworks to get computer vision systems running on edge devices that range from **industrial cameras and edge gateways to smartphones and microcontrollers**.

## Efficient Algorithms and Model Optimization for Edge Vision  
Edge devices demand algorithms that are not only accurate but also **computationally efficient** and **energy-aware**. In 2024, researchers and engineers introduced multiple innovations in model design and optimization to meet these needs. The focus has been on **shrinking model size, accelerating inference, and preserving accuracy** despite limited processing power. Key breakthroughs include:

- **Model Quantization Techniques:** Quantization has been a cornerstone for accelerating deep learning on edge hardware, and the past year brought further refinements. By representing network weights and activations with lower precision (such as 8-bit or 4-bit integers instead of 32-bit floats), models run faster and use less memory. However, naive quantization can degrade accuracy, especially for complex tasks like object detection. A 2024 study highlighted that **fully quantizing an object detector’s neural network often causes severe accuracy drops**, particularly in the regression (bounding box) outputs. To solve this, new *post-training quantization (PTQ)* schemes have been proposed. For example, **Reg-PTQ (Regression-specialized PTQ)** introduced a calibration method tailored for detection models, successfully performing **full INT8 quantization of detectors with minimal performance loss**【47:5†source】. Similarly, “mix-precision” strategies have emerged – one CVPR 2024 work devises a one-shot search algorithm to assign different bit-widths to different layers, achieving an optimal balance of speed and accuracy. In the domain of large vision models, researchers applied **quantization-aware training** to the Segment Anything Model (SAM), a heavy vision transformer. By fine-tuning SAM with quantization constraints, one team was able to deploy it with OpenVINO on a laptop, striking a better accuracy/speed trade-off for medical image segmentation tasks. Overall, improved quantization algorithms (including **activation-aware** and **entropy-calibrated** methods) now allow even advanced vision models to run in low precision *without retraining from scratch*, bringing substantial latency and energy gains.

- **Pruning and Sparse Models:** Another optimization trend is *network pruning*, where unnecessary weights or filters are removed from a trained model to lighten its footprint. 2024 saw the integration of structured pruning methods into toolchains so that users can automatically prune models and fine-tune them for the edge. Coupled with sparsity-aware libraries (e.g. utilizing sparse tensor cores in new hardware), pruned models can maintain accuracy while skipping computations, leading to lower inference times. Some industry reports noted that pruning, when combined with quantization, can yield **order-of-magnitude reductions in model size**, enabling deployment on tiny devices (e.g., fitting a CNN on a microcontroller with only a few hundred kilobytes of RAM).

- **Knowledge Distillation and Tiny Models:** A major breakthrough of the past year has been the successful distillation of very large vision models into *compact, edge-friendly versions*. A prime example is **EdgeSAM**, a project that tackled Meta’s Segment Anything Model (SAM). SAM’s original ViT-based image encoder is too heavy for edge devices, so researchers distilled its knowledge into a **pure CNN architecture optimized for edge**【47:7†source】,【47:8†source】. The resulting EdgeSAM model achieves a **40× speed increase** over the original SAM and can even run **above 30 FPS on an iPhone 14** – something previously impossible【47:8†source】. Notably, EdgeSAM managed this speedup *with minimal compromise in performance*: it actually slightly **improves segmentation accuracy (mIoU)** on benchmark datasets compared to MobileSAM (another recent lightweight variant)【47:8†source】. This was achieved by carefully **distilling not just the encoder, but also SAM’s prompt and mask decoder logic** into the smaller model, preserving SAM’s capabilities【47:8†source】. The success of EdgeSAM illustrates how state-of-the-art models can be shrunk dramatically through intelligent training schemes. Likewise, object detection networks have seen new “slimmed-down” versions. For instance, **YOLO-NAS** (You Only Look Once – Neural Architecture Search) was released as a next-generation YOLO model optimized for real-time edge use. It was designed via neural architecture search to maximize the *performance-per-compute* ratio and specifically improve small object detection and localization accuracy. YOLO-NAS employs quantization-aware blocks by design, and when converted to INT8 it loses far less accuracy (<0.6 mAP drop) than previous models that would lose 1–2 points with quantization. In practice, YOLO-NAS **outperforms YOLOv5/v6/v7/v8** on both accuracy and latency, making it highly suitable for low-power environments.
These distilled and NAS-designed models are enabling edge devices to run complex vision tasks (like precision segmentation or multi-class object detection) that were earlier confined to powerful servers.

- **Real-Time Vision and Latency Reduction:** Achieving **real-time processing** is a core goal for edge vision. One trend is towards *dynamic neural networks* that can adjust their computation on the fly to save time. For example, “early exit” models can output a result from an intermediate layer if confidence is high, skipping the rest of the network to reduce latency. Another cutting-edge concept explored in 2024 is **“negative latency”** offloading – essentially predicting the outcome of a vision task slightly ahead of time to compensate for network delays in IoT scenarios (though still experimental). On the more practical side, many state-of-the-art edge vision models now routinely achieve inference times in the range of a few milliseconds to tens of milliseconds on appropriate hardware. With optimized code, even a Raspberry Pi–class device can perform face detection or classification at 30+ FPS, and high-end edge GPUs can push into the hundreds of FPS for certain tasks. Industry benchmarks (like MLPerf Tiny and Edge MLPerf) in late 2024 demonstrated multiple vision models hitting **microsecond-level per-frame processing** on specialized devices. For instance, by using an event-based camera (which outputs asynchronous pixel changes instead of full images) and a matching algorithm, researchers achieved an **average latency per event of only 16 μs** for car detection. This was done with a novel event-driven Graph Neural Network (EvGNN) architecture accelerated on FPGA hardware, showcasing an extreme case of real-time edge vision. While event cameras are niche, the underlying message is broadly applicable: by exploiting sparsity (processing only the meaningful changes in a scene), we can dramatically cut down computation and latency, opening the door for ultra-fast vision systems at the edge.

- **Specialized Vision Transformers and Architectures:** Vision AI in 2024 isn’t just about CNNs; attention-based models and hybrids are also being tailored for edge. Efficient vision transformers (e.g. MobileViT, EfficientFormer, EdgeNeXt) gained adoption, offering higher accuracy by mixing convolution and self-attention in a lightweight manner. These models are often designed with **module reuse, fewer parameters, and linear attention approximations** to be competitive on mobile devices. Additionally, *multi-modal* edge AI is emerging – for example, compact models that can handle both vision and language. A recent paper introduced **TinyVQA**, a tiny Visual Question Answering network that can run on microcontroller-level hardware by combining visual feature extraction and an attention-based question answering module【47:2†source】. This indicates that even complex AI tasks spanning multiple domains are beginning to be feasible on the edge, thanks to clever model design. 

In summary, the past year’s algorithmic advancements have armed us with a new generation of **edge-optimized vision models**. Through quantization, pruning, distillation, and architecture innovation, these models make far better use of limited compute resources than their predecessors. Importantly, they do so with **reduced power consumption and latency**, hitting benchmarks that make real-time, on-device vision not just possible but common. As efficient algorithms continue to evolve, we can expect edge devices to handle increasingly sophisticated vision workloads (larger images, more objects, even generative vision tasks) without offloading to the cloud.
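
To make the quantization idea from this section concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is an illustration only, not the method of any paper or toolkit cited in the report; real toolchains add calibration data, per-channel scales, and other refinements. The function names are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One quantization step; fall back to 1.0 when all weights are zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [-0.8, -0.1, 0.0, 0.3, 1.27]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, worst_error)
```

Storing each weight as an 8-bit integer instead of a 32-bit float takes one quarter of the memory, and the rounding error per weight stays within half a quantization step (`scale / 2`).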

## Hardware Accelerators and Optimizations  
The hardware side of edge computing for vision has seen equally exciting progress. Both established chipmakers and startups have delivered new solutions that significantly boost on-device AI performance while keeping power usage in check. Over the last year, three notable themes emerged: **more AI acceleration in general-purpose chips**, breakthroughs in dedicated edge AI processors (NPUs, VPUs, etc.), and clever optimizations to get more out of existing hardware.

- **AI Engines in Mobile & IoT Chips:** Modern smartphones and IoT processors now come with powerful built-in AI accelerators, and their capabilities jumped in the latest generations. Qualcomm’s Snapdragon 8 Gen 3 (the flagship 2024 mobile SoC) is a prime example – it showed a **50% average increase in ML inference performance** over the previous generation on benchmarks like MLPerf, thanks to architectural upgrades and higher clock speeds【47:6†source】. Impressively, these gains were achieved with minimal impact on power consumption; Qualcomm reported their Gen 3’s AI tasks are *more energy-efficient* than Gen 2’s, meaning the new chip can do more work without draining the battery faster【47:6†source】. The Snapdragon 8 Gen 3’s **Hexagon NPU** and GPU improvements enable features like real-time camera AI (e.g. on-device photo enhancement, object recognition in video) and even running sizeable AI models directly on the phone. Likewise, Apple’s latest **A17 Pro and M2/M3 chips** continue to advance the Neural Engine, increasing the throughput for computer vision tasks (like Accelerate framework’s CoreML models) all while handling these workloads on-device. This trend of improved on-chip AI translates to tangible user experiences: for instance, new smartphones can run **stable diffusion image generation or advanced AR object segmentation locally**, tasks that used to require a desktop GPU. The focus is not just on raw power but also on thermal and power optimizations – *“AI processing is more power efficient than before”*, which helps keep device temperatures in check during intensive vision processing【47:6†source】. Beyond phones, even microcontrollers are joining in: companies like STMicroelectronics, NXP, and Renesas have released MCUs with embedded NPUs or DSP extensions for neural networks. 
This **specialized silicon at the edge** means that tiny, battery-powered devices can now perform tasks like person detection or gesture recognition using only milliwatts of power【47:3†source】. The increasing availability of NPUs in everything from wearables to surveillance cameras is a game-changer, as it provides a *hardware foundation for TinyML* and real-time vision across the IoT spectrum.

- **High-Performance Edge AI Modules:** For more demanding vision applications (e.g. robotics, autonomous vehicles, smart infrastructure), there’s been a push to deliver *server-like AI performance in compact edge modules*. NVIDIA’s Jetson platform exemplifies this, especially with the Jetson Orin family. The **Jetson AGX Orin**, introduced in late 2022, has by 2024 established itself as a *top-tier edge AI engine* – recent tests showed it leading in performance across diverse AI models, to the point that it’s reportedly the *only* embedded platform of its class that can run **any** type of model (including large transformers and diffusion networks) locally. Building on this, NVIDIA in December 2024 announced a software update dubbed **Jetson Orin Nano “Super”**, which **boosted the small Orin Nano’s performance by 1.7×** via firmware and drivers. Remarkably, the hardware remained the same; the gains came from unlocking a higher power mode and optimizing clocks, raising the module’s maximum power budget from 15 W to around 25 W. This “turbo mode” translates to significantly faster inferencing on the edge (at the cost of more power draw when needed), and it highlights how **software optimizations can unlock latent potential** in AI chips. The cost of edge AI hardware is also trending down: NVIDIA’s update cut the Orin Nano’s price roughly in half while delivering the better performance, making advanced edge computing more accessible. Outside of NVIDIA, other vendors released specialized vision processors and accelerators. For example, **Ambarella** (traditionally known for camera SoCs) rolled out a new *CV3 AI domain controller* targeted at autonomous vehicles and surveillance, boasting sophisticated image signal processing combined with neural acceleration for vision analytics. 
Similarly, Google’s Coral AI team and startups like Hailo and OpenCV (with the OAK cameras) have each refined their low-power AI modules that plug into IoT devices to run vision models at the edge. The overarching development is that **edge hardware can now support neural networks that are orders of magnitude larger and more complex** than what was possible just a couple of years ago, enabling more robust and intelligent vision capabilities on-site.

- **Architectural and Hardware Optimization:** In parallel with new chips, there’s been a focus on optimizing hardware usage for vision through better architectures. The automotive industry provides a telling example: as vehicles incorporate more AI for vision (ADAS and self-driving), they face a choice between using one **massive SoC vs. multiple specialized chips** in their edge computing units. In 2024, tech suppliers for automakers were *“singing the same tune”* – emphasizing the need for **much more computing power** in cars than previously anticipated, and a stronger push for dedicated AI chips that can handle advanced neural networks. This has led to designs like centralized **domain controllers** for AI in cars (Ambarella’s aforementioned controller hits this point) and also distributed systems where several smaller processors (e.g. an array of Mobileye EyeQ processors, or TI’s TDA4VM chips) each handle parts of the perception task. The goal is to balance raw performance with redundancy and power management – critical for automotive safety. These optimizations echo in other fields too: in surveillance cameras, manufacturers are integrating **efficient vision DSPs** that can run convolutional networks right inside the camera for analytics, thereby offloading central servers. There’s also growing interest in **neuromorphic vision hardware** – event-based sensors and spiking neural network chips – which process visual stimuli in ways analogous to the human brain for extreme efficiency. As noted earlier, event-driven vision can achieve **microsecond latencies** on FPGA or neuromorphic hardware, and research prototypes like IBM’s TrueNorth or Intel’s Loihi 2 are being explored for vision tasks where power budgets are ultra-tight (e.g. always-on security monitoring). 
While not yet mainstream, these indicate a future direction for hardware: leveraging *sparsity, parallelism, and novel computing paradigms* to further improve the energy-per-inference of vision systems. 

- **Power and Thermal Management:** Hand-in-hand with performance, edge hardware advancements have focused on **energy efficiency**. Efficient edge AI is crucial since many devices run on battery or have limited cooling. We observed innovations such as adaptive power scaling – chips that can dial AI accelerators up or down on the fly to meet real-time demand, and enter low-power modes when idle. The Jetson Orin “Super” mode example fits here: it lets the device draw more power *when intensive processing is needed*, then presumably return to normal to save energy. On the mobile side, chipmakers are boasting that their new AI features require *“little or no additional power”* despite big performance gains【47:6†source】. The Snapdragon 8 Gen 3 introduced **mesh shading** in its GPU, a technique not only improving graphics but also saving energy by avoiding unnecessary memory writes during vision processing【47:6†source】. All these hardware-level tweaks contribute to edge vision systems that can run continuously (e.g. a smart camera doing 24/7 analytics) **without overheating or quickly depleting batteries**【47:6†source】. In summary, the past year’s hardware strides ensure that the cutting-edge algorithms described earlier have the **compute muscle and efficient silicon** necessary to perform in the field. The synergy of optimized models with powerful, low-power chips is what makes modern edge computer vision a practical reality.
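
The adaptive power scaling described above can be illustrated with a minimal sketch. The mode names, wattages, and thresholds are hypothetical and are not taken from any vendor's power manager; the policy shown (boost when saturated or when a frame deadline is missed, idle when almost nothing runs) is one plausible assumption among many.

```python
# Hypothetical power modes as (name, watts); not values from any real device.
POWER_MODES = [("idle", 2.0), ("normal", 10.0), ("boost", 25.0)]


def select_power_mode(utilization, frame_deadline_missed):
    """Pick a power mode from accelerator utilization (0.0 to 1.0).

    Assumed policy: boost when the accelerator is saturated or a frame
    deadline was missed, idle when almost nothing is running.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if frame_deadline_missed or utilization > 0.85:
        return POWER_MODES[2]
    if utilization < 0.10:
        return POWER_MODES[0]
    return POWER_MODES[1]


def average_power(samples):
    """Average watts over a list of (utilization, deadline_missed) samples."""
    if not samples:
        raise ValueError("need at least one sample")
    modes = [select_power_mode(u, missed) for u, missed in samples]
    return sum(watts for _, watts in modes) / len(modes)
```

A device that is idle half the time and moderately loaded the other half would average well below its boost wattage under this policy, which is the effect the paragraph describes.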

## Applications in Key Industries  

### IoT and Smart Environments (Industry 4.0, Cities, Retail)  
One of the biggest beneficiaries of edge vision advancements is the broad realm of **Internet of Things (IoT)** – encompassing smart cities, industrial automation, retail analytics, and more. In these scenarios, numerous cameras and sensors generate continuous video streams that need real-time analysis. Processing that data on the edge, right where it is collected, has clear advantages: **latency is minimized**, network usage is reduced, and sensitive data can be acted on or filtered *locally*. Over the last year, many IoT solutions have adopted edge AI to become smarter and faster. For instance, modern **security cameras** now often come with on-board vision AI chips. Instead of streaming all footage to a cloud server (which causes bandwidth and delay issues), the camera itself can run algorithms to detect intruders, recognize faces, or spot anomalies in real time. Hanwha Vision (a major CCTV manufacturer) reported that edge computing in cameras **“significantly reduced”** the latency for analytics and also eased the load on central systems. Startups are pairing low-power connectivity like Wi-Fi HaLow with edge AI cameras to deploy them in remote areas; for example, at CES 2024 a partnership unveiled battery-powered smart cameras with long-range IoT wireless and built-in object detection, aimed at farms and large facilities. Beyond security, cities are leveraging edge vision for **traffic management**. Adaptive traffic lights with embedded vision can analyze live camera feeds at intersections and adjust signals dynamically. An industry report noted that such edge AI traffic systems use real-time video data to optimize signal timing and reduce congestion on the fly【47:10†source】. This has a direct impact on efficiency (less idle time at lights) and even pollution (through reduced vehicle idling). 
Another booming area is **industrial IoT and Industry 4.0**: factories are deploying smart cameras on assembly lines for quality inspection, using edge vision to detect defects or monitor operations instantaneously. Research indicates that combining deep learning with edge computing can enhance manufacturing by enabling automated visual inspection and data-driven decision-making on the factory floor【47:10†source】. Because the analysis happens locally (e.g. on an edge gateway or directly on a camera overlooking the line), any issue – such as a faulty product – can trigger an immediate response like halting the line or alerting a worker, with virtually zero delay. This kind of **predictive and responsive capability** would be impossible if images had to upload to a cloud and back. In retail, too, edge vision is being applied for tasks like inventory monitoring (smart shelves that see when stocks are low), shopper behavior analysis (with privacy protections by processing video in-store rather than in cloud), and autonomous checkout systems. All these IoT applications emphasize **real-time processing** and benefit from the energy-efficient algorithms described earlier, since many IoT devices have limited power or cooling. Thanks to the advancements in edge hardware, we now even see tiny battery-operated devices doing vision – a concept known as **TinyML**. For example, a small sensor with a microcontroller might run a person-detection model to count foot traffic in a room, running for months on a battery. In summary, edge computer vision has become a cornerstone of IoT innovation, enabling smarter and faster responses in connected environments **without reliance on constant cloud connectivity**. This not only reduces latency to milliseconds for critical IoT tasks, but also enhances privacy (only insights, not raw video, leave the device) and reliability (systems can keep working even if the network is down). 
The past year’s tech improvements have greatly expanded the range of IoT scenarios where deploying on-device vision AI is feasible and cost-effective.
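
The privacy and bandwidth point above (only insights, not raw video, leave the device) can be sketched as follows. This is an illustrative assumption about how an edge camera might package its results: the function name, the payload fields, and the confidence threshold are all hypothetical, and the on-device detection model itself is out of scope.

```python
import json


def summarize_detections(camera_id, timestamp, detections, min_confidence=0.6):
    """Turn raw per-frame detections into a compact event string, or None.

    `detections` is a list of (label, confidence) pairs assumed to come from
    an on-device model. The frame pixels are never part of the payload.
    """
    counts = {}
    for label, confidence in detections:
        if confidence >= min_confidence:
            counts[label] = counts.get(label, 0) + 1
    if not counts:
        return None  # nothing worth reporting, so nothing is sent at all
    return json.dumps(
        {"camera": camera_id, "ts": timestamp, "counts": counts},
        sort_keys=True,
    )
```

The resulting payload is a few dozen bytes per event instead of a continuous video stream, and a frame with no confident detections produces no network traffic.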

### Healthcare and Medical Devices  
The healthcare sector has embraced edge AI, particularly computer vision, as a means to improve patient care and diagnostics while addressing privacy and immediacy concerns. In 2024, this trend accelerated with more hospitals, medical device makers, and researchers leveraging on-device vision analysis. A clear example is in **medical imaging and diagnostics**: AI algorithms can analyze images like X-rays, MRIs, ultrasound scans, or even microscope slides to assist doctors in detecting diseases. Traditionally, such AI would run on cloud servers, but now **edge computing is bringing these capabilities directly into clinics and even handheld devices**. This has huge benefits – critical insights are available faster (sometimes in real time during a procedure), and patient data doesn’t need to leave the premises, protecting privacy. According to industry case studies, **AI-powered imaging** deployed at the edge can *“improve diagnostic speed and accuracy”*, in some cases even **surpassing traditional methods like invasive biopsies** in identifying issues across larger areas. For instance, an edge AI system can analyze a broad region of a pathology slide and highlight suspicious cells far quicker than a manual review, potentially catching diffuse disease that a biopsy might miss. 

Another burgeoning area is **surgical robotics and real-time decision support**. Advanced operating rooms now feature edge computing rigs that monitor video from endoscopic cameras or surgical tools and provide AI feedback to surgeons in real time. One report described AI-enhanced neurosurgery robots that use an embedded edge PC to ensure *“zero-latency computing”* for tasks like tool tracking and navigation. These systems must handle multiple high-resolution video feeds and critical data on the fly; the combination of powerful edge GPUs and optimized vision algorithms (object detection of anatomy, segmentation of tissues, etc.) allows them to assist without any perceptible lag. The result is increased precision and safety – for example, highlighting a tumor margin live on the surgeon’s display or halting the robot if it detects a risk to healthy tissue. Importantly, because this computing is done on-site, the system doesn’t depend on an internet connection (which in surgery is not an option for reliability reasons). 

Edge vision is also transforming **point-of-care devices**. Portable ultrasound machines, for example, now often come with AI that can guide the user to get better image angles or automatically detect abnormalities in the scan. Given that ultrasounds and other imaging must often be performed in emergency or remote settings, having the analysis on the device (or on a nearby edge server) is vital for immediacy. One challenge has been fitting high-performance GPUs into compact medical devices due to space and noise constraints (fans, etc.). Hardware vendors responded with solutions like mini ITX GPU boards and fanless designs – one case study used a slim NVIDIA-powered board to enable **AI in a compact endoscopy unit**, overcoming the issue of bulky GPU rigs. This allowed the endoscopy system to do real-time polyp detection and diagnosis support while maintaining a small form factor and quiet operation for patient comfort.

Beyond imaging, **remote patient monitoring and IoT health devices** are incorporating edge vision. Consider a scenario of a smart camera in a hospital room that watches over a patient – it can use AI to detect if the patient is in distress, has fallen, or if there are changes in skin color or breathing rate. Using edge computing, the camera can alert staff within seconds of an incident, without streaming the video offsite. This respects patient privacy and reduces alarm response times. In elderly care or home health, similar edge vision systems can enable independent living by detecting emergencies (like a person not moving for a long time, or signs of a stroke) and calling for help immediately. 
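
The "person not moving for a long time" case above can be sketched with simple frame differencing. This is a deliberately simplified illustration, not how any deployed monitoring product works: frames are assumed to be flat lists of grayscale pixel values, and the threshold and frame count are placeholder values.

```python
def motion_score(prev_frame, frame):
    """Mean absolute pixel difference between two same-size grayscale frames."""
    if len(prev_frame) != len(frame) or not frame:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)


def first_inactivity_alert(frames, motion_threshold=2.0, still_frames_needed=3):
    """Return the index of the frame that triggers an alert, or None.

    An alert fires once `still_frames_needed` consecutive frame pairs show
    motion below `motion_threshold`. Both defaults are illustrative only.
    """
    still_run = 0
    for i in range(1, len(frames)):
        if motion_score(frames[i - 1], frames[i]) < motion_threshold:
            still_run += 1
            if still_run >= still_frames_needed:
                return i
        else:
            still_run = 0  # any movement resets the inactivity counter
    return None
```

Because the decision is made from pixel differences on the device, only the alert needs to leave the room, which matches the privacy argument in the paragraph.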

Crucially, healthcare applications place a premium on **accuracy and reliability** – false positives or negatives can be life-threatening. The advancements in model efficiency (quantization, etc.) had to be balanced with maintaining high accuracy. The good news is that many medical AI models, such as those for detecting cancers in scans or analyzing vital signs, have indeed been successfully compressed for edge use *without sacrificing diagnostic performance*. Federated learning is also emerging: multiple hospitals can train a shared AI model on-premise on their own edge servers (keeping patient data local) and then aggregate the learnings, which is a promising approach to improve healthcare AI while meeting data regulations. In summary, over the last year **edge computer vision in healthcare** has moved from pilots to real deployments – speeding up diagnostics, aiding in surgery, and monitoring patients in real time. This is leading to **faster decision-making by healthcare professionals and improved patient outcomes**, as edge AI can catch critical signals instantly. As one tech writer put it, these advancements empower medical staff to work “faster and more effectively, delivering tangible benefits” by having AI alongside them on-site【47:9†source】.
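
The federated learning idea mentioned above (each hospital trains locally, then the learnings are aggregated) can be sketched with federated averaging, a common aggregation scheme. The report does not say which scheme hospitals actually use, so this is an assumption; the function name and the flat-list weight format are also hypothetical simplifications.

```python
def federated_average(site_updates):
    """Sample-weighted average of model weights from several sites.

    `site_updates` is a list of (weights, num_samples) pairs, where `weights`
    is a flat list of floats. Only these weights are shared; the patient data
    used to produce them stays at each site.
    """
    if not site_updates:
        raise ValueError("need at least one site update")
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    size = len(site_updates[0][0])
    averaged = [0.0] * size
    for weights, n in site_updates:
        if len(weights) != size:
            raise ValueError("all sites must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged
```

A site that trained on more samples pulls the shared model further toward its own weights, while no site ever sees another site's images.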

### Autonomous Vehicles and Transportation  
Autonomous vehicles (AVs) and advanced driver-assistance systems (ADAS) represent a domain where **edge computer vision is not just an advantage but a necessity**. A self-driving car or even a semi-autonomous car cannot afford the latency of cloud processing – it must sense and react **in real time on the vehicle itself** for safety. Over the past year, the automotive industry has doubled down on edge AI, integrating more powerful vision processing hardware and smarter algorithms into vehicles. Modern cars are essentially “**data centers on wheels**” equipped with cameras, radars, LiDARs, and other sensors, all feeding into on-board AI computers. The goal is to interpret the environment (recognize lanes, traffic signs, pedestrians, other vehicles, etc.) and make split-second driving decisions locally. 

We have seen a **shift from centralized cloud computing to decentralized edge computing in vehicles**【47:4†source】. Earlier approaches to autonomy relied on some remote assistance or off-board processing, but now the consensus is that as much processing as possible should happen on the vehicle (with cloud only used for non-critical tasks like map updates). This evolution has **significantly reduced latency**, enabling vehicles to respond to dynamic road situations immediately【47:4†source】. In practical terms, edge AI lets a car detect an obstacle and apply brakes within milliseconds, whereas a cloud-reliant approach would introduce unacceptable delay. A Forbes Tech Council article from early 2024 emphasized that edge computing in tandem with onboard AI “processes data and commands locally within a vehicle’s systems, improving road safety and transportation efficiency”【47:4†source】. Combined with emerging 5G-V2X networks, cars can also communicate their processed insights with each other and infrastructure in near-real-time, further enhancing responsiveness (though importantly, each car doesn’t *depend* on connectivity to make decisions, thanks to edge AI).

In the last year, carmakers and suppliers have made moves to equip vehicles with the hardware needed for these vision workloads. Tesla, for example, rolled out its **Hardware 4** computer in newer models, packing more neural processing capability to handle its vision-only FSD (Full Self-Driving) system. Others like GM/Cruise and Waymo continued with a mix of vision and LiDAR – but regardless of sensors, all these systems use powerful **onboard GPUs, NPUs, or purpose-built chips** to run deep neural networks for perception. An analysis of the ADAS market in 2024 noted that OEMs are realizing they need *“a lot more computing power than [they] had ever thought necessary”* to support higher levels of autonomy. This has led to the introduction of **domain-specific chips** in cars: for instance, **Mobileye’s EyeQ** series (now in its 5th generation) and NVIDIA’s DRIVE Orin SoCs are widely adopted for handling vision and sensor fusion in vehicles. TechInsights observed divergent approaches – some designs consolidate everything on one big SoC (like NVIDIA Orin, capable of 200+ TOPS), while others use multiple smaller chips distributed around the car (each perhaps 10–30 TOPS) for a modular approach. Both strategies aim to ensure the car can process all camera feeds with redundancy and within tight timing constraints (often a few tens of milliseconds from photon capture to actuation decision). 

On the software side, state-of-the-art computer vision algorithms have been optimized for automotive edge deployment. For example, **multi-task networks** that do object detection, segmentation, and depth estimation simultaneously are favored to reduce the number of separate models running. Quantization and toolkit optimizations are heavily used – car AI computers often run int8 optimized inferencing to get more throughput per watt. The result is that today’s autonomous vehicles can perform an incredible number of vision computations locally. Tesla famously leverages just cameras and AI: it collects visual data from its fleet and refines its neural nets, then updates cars via OTA (over-the-air) so their on-board systems get smarter over time【47:4†source】. As more data is processed at the edge (in the vehicle), these systems improve their **real-world reliability and effectiveness** – Tesla’s FSD beta, for instance, has shown notable performance improvements as it ingests billions of miles of visual driving data【47:4†source】. 
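
The int8 inferencing mentioned above relies on mapping floating-point values to 8-bit integers. A minimal sketch of symmetric per-tensor quantization follows; it is one simple scheme chosen for illustration, not a description of what any particular automotive toolchain does, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the range [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]
```

Each value is stored in one byte instead of four, and the round-trip error per value stays within half of the scale, which is why accuracy often survives the conversion.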

Safety and **real-time fail-safe responses** are a crucial outcome of edge vision in transportation. A self-driving system must detect and avoid an accident scenario even if it’s a fraction of a second away. Edge AI helps by eliminating round-trip communication delays. It’s also worth noting that even for connected vehicle scenarios (like a smart intersection warning a car of a pedestrian), edge computing plays a role – often the processing at the intersection’s camera (edge device on the infrastructure side) will detect the pedestrian and then send an alert to the vehicle nearby. Researchers have introduced concepts like **cooperative edge intelligence** where road-side units do preliminary vision processing and share information with cars, building redundancy into the perception system. 

The economic and societal impact is significant. Projections cited in early 2024 suggested autonomous driving could generate **$300–400 billion** in revenue in the passenger car market by 2035, with tens of millions of autonomous vehicles on the road by 2040【47:4†source】. Achieving those numbers hinges on robust edge vision AI to make autonomy safe. Accordingly, automakers are not only racing to include better chips, but also diversifying sensors (adding thermal cameras for night vision, etc.) and creating **unified software-hardware platforms** to manage the complexity. The “digital car” is becoming as much about internal computing capability as its mechanical features. In summary, advancements in edge computer vision over the past year have continued to **shape the future of autonomous vehicles**, enabling faster perception and reaction that inch us closer to full self-driving. Whether it’s a Level-2 driver assistance or a Level-4 robotaxi, the vehicle’s ability to “see” and interpret its surroundings instantaneously (and do so reliably under various conditions) is directly tied to the cutting-edge edge AI technology now being deployed. Vehicles are effectively turning into **mobile edge data centers**, and this transformation is driven by the latest high-performance, low-latency vision systems embedded within them【47:4†source】.

## Conclusion  
The last year has been pivotal for computer vision on the edge, with **breakthroughs across software, algorithms, and hardware** that collectively advance the state of the art. We now have highly optimized frameworks (like TFLite, OpenVINO, TensorRT) that make deploying vision models simpler and faster on a wide array of devices. We have new algorithms that dramatically improve efficiency – from ultra-compact models that still match larger models’ accuracy, to quantization techniques that squeeze maximum speed out of every operation. And we have more capable edge hardware than ever: *AI-at-the-edge* chips delivering teraflop-level performance, tiny NPUs enabling vision on IoT gadgets, and integrated systems in cars and healthcare machines that handle real-time vision reliably. All these innovations focus on the key metrics of **latency, energy usage, and real-time processing**, because those are what determine if an edge application is feasible. The impact is already visible in industry: IoT networks are getting smarter with on-site vision analytics, doctors are receiving AI insights during procedures rather than days later, and vehicles are becoming safer through instantaneous perception and decision-making. The common theme is **bringing intelligence to where the data is generated**, thereby unlocking faster responses and preserving privacy. 

Despite the progress, challenges remain as we push further – ensuring models remain secure on devices, handling edge cases (no pun intended) where computational limits might still be reached, and scaling deployments to potentially **billions of edge cameras and sensors** in the coming years. Nonetheless, the trajectory is clear: the frontier of computer vision innovation has expanded from cloud data centers to encompass **edge devices as powerful vision AI agents**. With continued research and development, we can expect even more impressive feats in the near future, such as robust augmented reality with on-device vision, cooperative multi-edge vision (devices collaborating peer-to-peer), and the integration of **generative AI capabilities at the edge** (early steps are already being taken, as generative models get compressed). The past year’s advancements serve as a strong foundation, demonstrating that edge computing is not a limitation for computer vision, but rather an *enabler* of real-time, efficient, and ubiquitous vision intelligence across industries. The convergence of computer vision and edge computing is truly **pioneering the future of intelligent systems**, turning use cases that were once considered science fiction into everyday technology【47:10†source】.

**Sources:** Recent research papers, industry reports, and technology news have been referenced throughout this report to substantiate the advancements discussed, including publications from CVPR 2024, IEEE, corporate press releases, and expert analyses【47:1†source】, 【47:4†source】, 【47:8†source】, among others. These citations provide further detail and evidence of the state-of-the-art developments in edge computer vision up to 2025. The rapid progress over this short period signals an exciting era for practitioners and stakeholders in computer vision and IoT: **the edge is now intelligent**, and it’s here to stay.

## References
- [2024 STATE OF EDGE AI REPORT - DATEurope](https://dateurope.com/wp-content/uploads/2024/05/2024STAGEOFEDGEAIREPORT.pdf)
- [NEW RELEASE: OpenVINO 2024.0 is available now! - Intel Community](https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/NEW-RELEASE-OpenVINO-2024-0-is-available-now/m-p/1578603)
- [Snapdragon 8 Gen 3 Runs BIG AI Models - TechInsights](https://www.techinsights.com/blog/snapdragon-8-gen-3-runs-big-ai-models)
- [[2404.19489] EvGNN: An Event-driven Graph Neural Network Accelerator ...](https://arxiv.org/abs/2404.19489)
- [Success Stories: Edge AI in Medical - Advantech](https://www.advantech.com/en-us/resources/case-study/success-stories-edge-ai-in-medical)
- [TinyML in IoT 2024: Running ML Models on Edge Devices](https://codveda.com/pages/blog/tinyml-iot.html)
- [GitHub - chongzhou96/EdgeSAM: Official PyTorch implementation of ...](https://github.com/chongzhou96/EdgeSAM)
- [Edge-Of-Network Computing And AI: How AI May Fill Gaps In 5G Tech - Forbes](https://www.forbes.com/councils/forbestechcouncil/2024/02/13/edge-of-network-computing--ai-how-ai-may-fill-gaps-in-5g-tech/)
- [ADAS in 2024: Don’t Expect Clarity on Autonomy and Safety](https://www.edge-ai-vision.com/2024/01/adas-in-2024-dont-expect-clarity-on-autonomy-and-safety/)
- [Reg-PTQ: Regression-specialized Post-training Quantization for Fully ...](https://openaccess.thecvf.com/content/CVPR2024/papers/Ding_Reg-PTQ_Regression-specialized_Post-training_Quantization_for_Fully_Quantized_Object_Detector_CVPR_2024_paper.pdf)


### Exporting results into a docx file

In [17]:
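# Convert the Markdown report to a Word document (pypandoc needs a pandoc installation)
pypandoc.convert_file(md_results_file, 'docx', outputfile=docx_file)
print(f"{md_results_file} has been converted to {docx_file}\n")

# Confirm the file was written and check its size
!ls -lh $docx_file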

documents/deep_research_results_09Jul2025_111306.md has been converted to documents/deep_research_results_09Jul2025_111306.docx

-rwxrwxrwx 1 root root 31K Jul  9 11:22 documents/deep_research_results_09Jul2025_111306.docx


In [18]:
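from IPython.display import FileLink  # harmless if already imported earlier
# Render a clickable download link to the generated docx file
link = FileLink(path=docx_file)
link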