AI Chip (ICs and IPs)
- Add news of Tesla Dojo.
- Add news of Untether AI.
- Add startup Innatera Nanosystems.
- Add startup EdgeQ.
- Add startup Quadric.
- Add startup Analog Inference.
- Add news of Tenstorrent.
- Add news of Google.
- Add news of SiMa.ai.
- Add startup Neureality.
- Add news of Cerebras.
- Add news of Groq.
- Add news of Nvidia.
- Add news of SambaNova.
- Add news of Baidu.
- Updates of Deep Vision.
- Add news of Flex Logix.
- Update AI Compiler section.
- Add the article "TPU vs GPU vs Cerebras vs Graphcore: A Fair Comparison between ML Hardware" in Reference section.
- Add news of Tenstorrent.
- Add news of Synaptics Katana Platform.
- Add Graphcore MK2 PERFORMANCE BENCHMARKS.
- Add news of SambaNova.
- Add news of Esperanto's ML Chip.
- Add news of AWS Trainium.
- Add startup SimpleMachines.
- Add news of SK Telecom SAPEON X220.
- Add news of Imagination AI accelerator.
- Add news of Mythic.
- Add link to MLPerf Inference Results 0.7.
- Add news of Qualcomm Cloud AI 100.
- Add link to MLPerf Training Results 0.7.
- Add Neural Network Accelerator Comparison in Reference.
- Add AIchip Paper List in Reference.
- Add news of Nvidia A100.
- Add Sony's Intelligent Vision Sensors.
- Add a series of articles "What We Talk About When We Talk About AI Chip" in Reference section.
- Add news of Wave Computing.
Kicking off another busy Spring GPU Technology Conference for NVIDIA, this morning the graphics and accelerator designer is announcing that they are going to once again design their own Arm-based CPU/SoC. Dubbed Grace – after Grace Hopper, the computer programming pioneer and US Navy rear admiral – the CPU is NVIDIA’s latest stab at more fully vertically integrating their hardware stack by being able to offer a high-performance CPU alongside their regular GPU wares. According to NVIDIA, the chip is being designed specifically for large-scale neural network workloads, and is expected to become available in NVIDIA products in 2023.
Today, during the 2020 NVIDIA GTC keynote address, NVIDIA founder and CEO Jensen Huang introduced the new NVIDIA A100 GPU based on the new NVIDIA Ampere GPU architecture. This post gives you a look inside the new A100 GPU, and describes important new features of NVIDIA Ampere architecture GPUs.
With the open-source release of NVDLA’s optimizing compiler on GitHub, system architects and software teams now have a starting point with the complete source for the world’s first fully open software and hardware inference platform.
Powering the TensorRT Hyperscale Inference Platform.
At NVIDIA’s SIGGRAPH 2018 keynote presentation, company CEO Jensen Huang formally unveiled the company’s much-awaited (and much-rumored) Turing GPU architecture. The next generation of NVIDIA’s GPU designs, Turing incorporates a number of new features and is rolling out this year.
Now the open source DLA is available on Github and more information can be found here. > The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome.
Today, Intel unveiled its family of Intel® Vision Accelerator Design Products targeted at artificial intelligence (AI) inference and analytics performance on edge devices, where data originates and is acted upon. The new acceleration solutions come in two forms: one that features an array of Intel® Movidius™ vision processors and one built on the high-performance Intel® Arria® 10 FPGA.
The Loihi research test chip includes digital circuits that mimic the brain’s basic mechanics, making machine learning faster and more efficient while requiring lower compute power. Neuromorphic chip models draw inspiration from how neurons communicate and learn, using spikes and plastic synapses that can be modulated based on timing. This could help computers self-organize and make decisions based on patterns and associations.
SANTA CLARA Calif., Dec. 16, 2019 – Intel Corporation today announced that it has acquired Habana Labs, an Israel-based developer of programmable deep learning accelerators for the data center for approximately $2 billion. The combination strengthens Intel’s artificial intelligence (AI) portfolio and accelerates its efforts in the nascent, fast-growing AI silicon market, which Intel expects to be greater than $25 billion by 20241.
Last year, Qualcomm teased its Cloud AI100, promising strong performance and power efficiency to enable Artificial Intelligence in cloud edge computing, autonomous vehicles and 5G infrastructure. Today, the company announced it is now sampling the platform, with volume shipments planned for the first half of 2021. This raises the question: why would a company known for low-power cell-phone chips and IP decide to enter the data center market, which is full of players who have been there for decades?
Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated (NASDAQ: QCOM), announced that it is bringing the Company’s artificial intelligence (AI) expertise to the cloud with the Qualcomm® Cloud AI 100. Built from the ground up to meet the explosive demand for AI inference processing in the cloud, the Qualcomm Cloud AI 100 utilizes the Company’s heritage in advanced signal processing and power efficiency.
Our 4th generation on-device AI engine is the ultimate personal assistant for camera, voice, XR and gaming – delivering smarter, faster and more secure experiences. Utilizing all cores, it packs 3 times the power of its predecessor for stellar on-device AI capabilities... Greater than 7 trillion operations per second (TOPS)
Samsung recently unveiled the Exynos 9810: “The new Exynos 9810 brings premium features with a 2.9GHz custom CPU, an industry-first 6CA LTE modem and deep learning processing capabilities”.
Tesla is reportedly developing its own processor for artificial intelligence, intended for use with its self-driving systems, in partnership with AMD. Tesla has an existing relationship with Nvidia, whose GPUs power its Autopilot system, but this new in-house chip reported by CNBC could potentially reduce its reliance on third-party AI processing hardware.
Xilinx launched Alveo, a portfolio of powerful accelerator cards designed to dramatically increase performance in industry-standard servers across cloud and on-premise data centers.
Xilinx provides "Machine Learning Inference Solutions from Edge to Cloud" and, naturally, claims in one of its white papers that its FPGAs are best for INT8 workloads.
While performance per watt is impressive for FPGAs, the vendors' larger chips have long carried eye-wateringly high prices. Finding a balance between price and capability is the main challenge with FPGAs.
It is a manycore processor network-on-chip design, with 4096 cores, each one simulating 256 programmable silicon "neurons" for a total of just over a million neurons. In turn, each neuron has 256 programmable "synapses" that convey the signals between them. Hence, the total number of programmable synapses is just over 268 million (2^28). In terms of basic building blocks, its transistor count is 5.4 billion. Since memory, computation, and communication are handled in each of the 4096 neurosynaptic cores, TrueNorth circumvents the von-Neumann-architecture bottlenecks and is very energy-efficient, consuming 70 milliwatts, about 1/10,000th the power density of conventional microprocessors. (Wikipedia)
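The neuron and synapse totals quoted above follow directly from the per-core configuration; a quick back-of-the-envelope check in plain Python, using only the figures from the paragraph:

```python
# TrueNorth figures as quoted above: 4096 cores, 256 neurons per core,
# 256 synapses per neuron.
cores = 4096
neurons_per_core = 256
synapses_per_neuron = 256

total_neurons = cores * neurons_per_core            # "just over a million"
total_synapses = total_neurons * synapses_per_neuron

print(total_neurons)   # 1048576
print(total_synapses)  # 268435456, i.e. 2**28
```

This confirms the "just over 268 million (2^28)" synapse figure: 268,435,456 is exactly 2^28.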
"With POWER9, we’re moving to a new off-chip era, with advanced accelerators like GPUs and FPGAs driving modern workloads, including AI...POWER9 will be the first commercial platform loaded with on-chip support for NVIDIA’s next-generation NVLink, OpenCAPI 3.0 and PCI-Express 4.0. These technologies provide a giant hose to transfer data."
"The IBM Research AI Hardware Center is a global research hub headquartered in Albany, New York. The center is focused on enabling next-generation chips and systems that support the tremendous processing power and unprecedented speed that AI requires to realize its full potential."
STMicroelectronics is designing a second iteration of the neural networking technology that the company reported on at the International Solid-State Circuits Conference (ISSCC) in February 2017.
The article "ISSCC2017 Deep-Learning Processors文章学习（一）" (a study of the ISSCC 2017 deep-learning processor papers, part 1) is a reference.
S32 AUTOMOTIVE PLATFORM
The NXP S32 automotive platform is the world’s first scalable automotive computing architecture. It offers a unified hardware platform and an identical software environment across application domains to bring rich in-vehicle experiences and automated driving functions to market faster.
The S32V234 is our 2nd generation vision processor family designed to support computation intensive applications for image processing and offers an ISP, powerful 3D GPU, dual APEX-2 vision accelerators, security and supports SafeAssure™. S32V234 is suited for ADAS, NCAP front camera, object detection and recognition, surround view, machine learning and sensor fusion applications. S32V234 is engineered for automotive-grade reliability, functional safety and security measures to support vehicle and industrial automation.
Marvell will demonstrate today at the Flash Memory Summit how it will provide artificial intelligence capabilities to a broad range of industries by incorporating NVIDIA’s Deep Learning Accelerator (NVDLA) technology in its family of data center and client SSD controllers.
This article, "MediaTek Announces New Premium Helio P90 SoC", from AnandTech has more in-depth analysis.
Kirin for Smart Phone
Kirin 980, the World's First 7nm Process Mobile AI Chipset
Introducing the Kirin 980, the world's first 7nm process mobile phone SoC chipset, the world’s first Cortex-A76 architecture chipset, the world’s first dual NPU design, and the world’s first chipset to support LTE Cat.21. The Kirin 980 combines multiple technological innovations and leads the AI trend to provide users with impressive mobile performance and to create a more convenient and intelligent life.
Mobile Camera SoC
According to a brief data sheet of the Hi3559A V100ES ultra-HD mobile camera SoC, it has:
Dual-core CNN@700 MHz neural network acceleration engine
RK3399Pro adopts an exclusive AI hardware design. Its NPU delivers 2.4 TOPS of computing performance while leading on both performance and power: Rockchip claims it is 150% faster than comparable NPUs, at less than 1% of the power consumption of solutions that use a GPU as the AI computing unit.
Renesas Electronics Corporation (TSE: 6723), a premier supplier of advanced semiconductor solutions, today announced it has developed an AI accelerator that performs CNN (convolutional neural network) processing at high speeds and low power to move towards the next generation of Renesas embedded AI (e-AI), which will accelerate increased intelligence of endpoint devices. A Renesas test chip featuring this accelerator has achieved the power efficiency of 8.8 TOPS/W (Note 1), which is the industry's highest class of power efficiency. The Renesas accelerator is based on the processing-in-memory (PIM) architecture, an increasingly popular approach for AI technology, in which multiply-and-accumulate operations are performed in the memory circuit as data is read out from that memory.
SAN JOSE, Calif., Dec. 15, 2020 – Synaptics® Incorporated (Nasdaq: SYNA), today announced the Katana Edge AI™ platform, addressing a growing industry gap for solutions that enable battery powered devices for consumer and industrial IoT markets. The platform combines Synaptics’ proven low power SoC architecture with energy-efficient AI software, enabled by a partnership with Eta Compute. The Katana solution is optimized for a wide range of ultra-low power use cases in edge devices for office buildings, retail, factories, farms and smart homes. Typical applications include people or object recognition and counting, visual, voice or sound detection, asset or inventory tracking and environmental sensing.
If you’re a software dev looking to get a head start on AI development at the edge, why not try on Google’s new hardware for size? The search company today made available the Coral Dev Board, a $150 computer featuring a removable system-on-module with one of its custom tensor processing unit (TPU) AI chips.
Google CEO Sundar Pichai spoke for only one minute and 42 seconds about the company’s latest TPU v4 Tensor Processing Units during his keynote at the Google I/O virtual conference this week, but it may have been the most important and awaited news from the event.
Google's original TPU had a big lead over GPUs and helped power DeepMind's AlphaGo victory over Lee Sedol in a Go tournament. The original 700MHz TPU is described as having 95 TFlops for 8-bit calculations or 23 TFlops for 16-bit whilst drawing only 40W. This was much faster than GPUs on release but is now slower than Nvidia's V100, but not on a per W basis. The new TPU2 is referred to as a TPU device with four chips and can do around 180 TFlops. Each chip's performance has been doubled to 45 TFlops for 16-bits. You can see the gap to Nvidia's V100 is closing. You can't buy a TPU or TPU2.
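The per-chip and per-device numbers quoted above can be cross-checked with simple arithmetic; the figures below are the approximate vendor numbers from the paragraph:

```python
# First-gen TPU and TPU2 figures as quoted above (approximate vendor numbers).
tpu1_tops_int8 = 95        # first-gen TPU, 8-bit ops, at 700 MHz
tpu1_watts = 40            # quoted power draw
tpu2_chip_tflops = 45      # per chip, 16-bit
chips_per_device = 4       # one "TPU device" holds four chips

tpu2_device_tflops = chips_per_device * tpu2_chip_tflops
tpu1_tops_per_watt = tpu1_tops_int8 / tpu1_watts

print(tpu2_device_tflops)   # 180, matching the "around 180 TFlops" claim
print(tpu1_tops_per_watt)   # 2.375 TOPS/W for the original TPU at INT8
```

The 2.375 INT8 TOPS/W figure for the 40 W first-gen part is what underpins the "but not on a per W basis" remark above.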
Recently, Google made Cloud TPUs available for use in the Google Cloud Platform (GCP). Here you can find the latest benchmark results for the Google TPU2.
Pixel Visual Core is Google’s first custom-designed co-processor for consumer products. It’s built into every Pixel 2, and in the coming months, we’ll turn it on through a software update to enable more applications to use Pixel 2’s camera for taking HDR+ quality pictures.
Google did its best to impress this week at its annual IO conference. While Google rolled out a bunch of benchmarks that were run on its current Cloud TPU instances, based on TPUv2 chips, the company divulged a few skimpy details about its next generation TPU chip and its systems architecture. The company changed from version notation (TPUv2) to revision notation (TPU 3.0) with the update, but ironically the detail we have assembled shows that the step from TPUv2 to what we will call TPUv3 probably isn’t that big; it should probably be called TPU v2r5 or something like that.
AI is pervasive today, from consumer to enterprise applications. With the explosive growth of connected devices, combined with a demand for privacy/confidentiality, low latency and bandwidth constraints, AI models trained in the cloud increasingly need to be run at the edge. Edge TPU is Google’s purpose-built ASIC designed to run AI at the edge. It delivers high performance in a small physical and power footprint, enabling the deployment of high-accuracy AI at the edge.
The Information has a report this morning that Amazon is working on building AI chips for the Echo, which would allow Alexa to more quickly parse information and get those answers.
At its annual re:Invent developer conference, AWS today announced the launch of AWS Trainium, the company’s next-gen custom chip dedicated to training machine learning models. The company promises that it can offer higher performance than any of its competitors in the cloud, with support for TensorFlow, PyTorch and MXNet.
AWS Inferentia provides high throughput, low latency inference performance at an extremely low cost. Each chip provides hundreds of TOPS (tera operations per second) of inference throughput to allow complex models to make fast predictions. For even more performance, multiple AWS Inferentia chips can be used together to drive thousands of TOPS of throughput. AWS Inferentia will be available for use with Amazon SageMaker, Amazon EC2, and Amazon Elastic Inference.
AWS FPGA instance
Amazon EC2 F1 is a compute instance with field programmable gate arrays (FPGAs) that you can program to create custom hardware accelerations for your application. F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code, including an FPGA Developer AMI and Hardware Developer Kit (HDK). Once your FPGA design is complete, you can register it as an Amazon FPGA Image (AFI), and deploy it to your F1 instance in just a few clicks. You can reuse your AFIs as many times as you like, across as many F1 instances as you like.
Inside the Microsoft FPGA-based configurable cloud is also a good reference if you want to know Microsoft's vision for FPGAs in the cloud.
This article, "智慧云中的FPGA" (FPGAs in the smart cloud), gives an overview of FPGAs used for AI acceleration in the cloud.
Drilling Into Microsoft’s BrainWave Soft Deep Learning Chip shows more details based on Microsoft's presentation on Hot Chips 2017.
Real-time AI: Microsoft announces preview of Project Brainwave
At Microsoft’s Build developers conference in Seattle this week, the company is announcing a preview of Project Brainwave integrated with Azure Machine Learning, which the company says will make Azure the most efficient cloud computing platform for AI.
Microsoft is following Google's lead in designing a computer processor for artificial intelligence, according to recent job postings.
A whole new level of intelligence. The A12 Bionic, with our next-generation Neural Engine, delivers incredible performance. It uses real-time machine learning to transform the way you experience photos, gaming, augmented reality, and more.
Apple unveiled the new processor powering the new iPhone 8 and iPhone X - the A11 Bionic. The A11 also includes dedicated neural network hardware that Apple calls a "neural engine", which can perform up to 600 billion operations per second.
Core ML is Apple's current solution for machine learning applications.
At the Alibaba Cloud (Aliyun) Apsara Conference 2019, Pingtouge unveiled its first AI dedicated processor for cloud-based large-scale AI inferencing. The Hanguang 800 is the first semiconductor product in Alibaba’s 20-year history.
Tencent Cloud introduces an FPGA instance (beta) with three different specifications based on the Xilinx Kintex UltraScale KU115 FPGA. More choices, equipped with Intel FPGAs, will be provided in the future.
Baidu has raised money for its artificial intelligence (AI) semiconductor business at a valuation of $2 billion. The funding round was led by CPE, a Chinese asset management and private equity firm. Now that the Kunlun chip business has raised money, it could pave the way for the unit to be spun off, but no final decision has been made.
Huawei unveils two new artificial intelligence (AI) chips called the Ascend 910 and Ascend 310. The two chips are aimed at uses in data centers and internet-connected consumer devices, Rotating Chairman Eric Xu says at the Huawei Connect conference in Shanghai. The move pits the Chinese tech giant against major chipmakers including Qualcomm and Nvidia.
FPGA Accelerated Cloud Server, high performance FPGA instance is open for beta test.
The FPGA cloud server provides a direct CPU-to-FPGA PCIe interconnect of up to 100 Gbps, eight Xilinx VU9P FPGAs per node, and a dedicated optical mesh interconnect of up to 200 Gbps between FPGAs, so your application's acceleration needs are no longer limited by the hardware.
The DLU that Fujitsu is creating is designed from scratch: it is based on neither the Sparc nor the ARM instruction set and, in fact, has its own instruction set and a new data format created specifically for deep learning. Japanese computing giant Fujitsu, which knows a thing or two about making very efficient and highly scalable systems for HPC workloads, as evidenced by the K supercomputer, does not believe that the HPC and AI architectures will converge. Rather, the company is banking on these architectures diverging and requiring very specialized functions.
Nokia has developed the ReefShark chipsets for its 5G network solutions. AI is implemented in the ReefShark design for radio and embedded in the baseband to use augmented deep learning to trigger smart, rapid actions by the autonomous, cognitive network, enhancing network optimization and increasing business opportunities.
Facebook Inc. is building a team to design its own semiconductors, adding to a trend among technology companies to supply themselves and lower their dependence on chipmakers such as Intel Corp. and Qualcomm Inc., according to job listings and people familiar with the matter.
In the context of a broader discussion about the company’s Extreme Edge program focused on space-bound systems, HPE’s Dr. Tom Bradicich, VP and GM of Servers, Converged Edge, and IoT systems, described a future chip that would be ideally suited for high performance computing under intense power and physical space limitations characteristic of space missions. To be more clear, he told us as much as he could—very little is known about the architecture, but there was some key elements he described.
Tesla hosted its AI Day and revealed the inner workings of its software and hardware infrastructure. Part of this reveal was the previously teased Dojo AI training chip. Tesla claims its D1 Dojo chip has GPU-level compute, CPU-level flexibility, and networking-switch I/O.
Processing power is important, but building chips could be an expensive distraction for Tesla
New AI Processor with LG Neural Engine Designed for Use in Various Products Including Robot Vacuum Cleaners, Washing Machines and Refrigerators
Nov. 25, 2020 — SK Telecom (SKT) today unveiled its self-developed artificial intelligence (AI) chip named ‘SAPEON X220’ and shared its AI semiconductor business vision.
DynamIQ is the embedded IP giant's answer to the AI age. It may not be a revolutionary design, but it is certainly important.
Arm also provides an open-source Compute Library that contains a comprehensive collection of software functions implemented for the Arm Cortex-A family of CPUs and the Arm Mali family of GPUs.
Arm Machine Learning Processor
Specifically designed for inference at the edge, the ML processor gives an industry-leading performance of 4.6 TOPs, with a stunning efficiency of 3 TOPs/W for mobile devices and smart IP cameras.
Arm details more of the architecture of what it now seems to consistently call its “machine learning processor”, or MLP, from here on. The MLP IP started from a blank sheet in terms of architecture implementation, and the team consists of engineers pulled from the CPU and GPU teams.
The company is announcing the first products in the 2NX NNA family: the higher-performance AX2185 and the lower-cost AX2145.
Ahead of CES, CEVA announced a new specialised neural network accelerator IP called NeuPro.
The v-MP6000UDX processor from Videantis is a scalable processor family that has been designed to run high-performance deep learning, computer vision, imaging and video coding applications in a low power footprint.
Chinese artificial intelligence chip maker Cambricon Technologies Corp Ltd unveiled two new products, the cloud-based smart chip Cambricon MLU100 and a new version of its AI processor IP, Cambricon 1M, at a launch event in Shanghai on May 3rd.
On November 6 in Beijing, China’s rising semiconductor company Cambricon released the Cambrian-1H8 for low power consumption computer vision application, the higher-end Cambrian-1H16 for more general purpose application, the Cambrian-1M for autonomous driving applications with yet-to-be-disclosed release date, and an AI system software named Cambrian NeuWare.
Chinese chip maker Horizon Robotics said on Wednesday it had raised $600 million in its latest funding round, bringing its valuation to $3 billion, amid a push from Chinese companies and the government to boost the semiconductor industry.
On Dec. 20, Horizon Robotics announced two chip products: "Journey" for ADAS and "Sunrise" for smart cameras.
Bitcoin mining giant Bitmain is developing processors for both training and inference tasks.
Bitmain’s newest product, the Sophon, may or may not take over deep learning. But by giving it such a name Zhan and his Bitmain co-founder, Jihan Wu, have signaled to the world their intentions. The Sophon unit will include Bitmain’s first piece of bespoke silicon for a revolutionary AI technology. If things go to plan, thousands of Bitmain Sophon units soon could be training neural networks in vast data centers around the world.
The world-leading computer vision processing IC and system company NextVPU today unveiled the AI vision processing IC N171. N171 is the flagship IC of NextVPU’s N1 series of computer vision chips. As a VPU, N171 pushes the edge AI computing limit further in many respects. With powerful computing engines embedded, N171 has unprecedented geometry calculation and deep neural network processing capabilities, and can be widely used in surveillance, robots, drones, UGVs, smart home, ADAS applications, etc.
Canaan's Kendryte is a series of AI chips which focuses on IoT.
Enflame Tech is a startup company based in Shanghai, China. It was established in March 2018 with two R&D centers in Shanghai and Beijing. Enflame is developing the deep learning accelerator SoCs and software stack, targeting AI training platform solutions for the Cloud service provider and the data centers.
SHANGHAI, China, Dec. 12, 2019 – In conjunction with the launch of Enflame’s CloudBlazer T10, Enflame Technology and GLOBALFOUNDRIES (GF) today announced a new high-performing deep learning accelerator solution for data center training. Designed to accelerate deep learning deployment, the accelerator’s core Deep Thinking Unit (DTU) is based on GF’s 12LP FinFET platform with 2.5D packaging to deliver fast, power-efficient data processing for cloud-based AI training platforms.
EEasy Technology Co. Ltd is an AI system-on-chip (SoC) design house and total solution provider. Its offerings include AI acceleration; image and graphic processing; video encoding and decoding; and mixed-signal ULSI design capabilities.
Founded in Oct. 2017, WITINMEM focuses on Low cost, low power AI chips and system solutions based on processing-in-memory technology in NOR Flash memory.
Qingwei Intelligent Technology (Tsing Micro) is AI chip company spin-off from Tsinghua University.
Black Sesame Technologies (黑芝麻智能科技) has nearly completed its 100 million Series B financing round, which will be used to expand cooperation with OEMs, accelerate mass production and reference-design development of autopilot controllers, and advance software-vehicle integration.
Two years ago Cerebras unveiled a revolution in silicon design: a processor as big as your head, using as much area on a 12-inch wafer as a rectangular design would allow, built on 16nm, focused on both AI and HPC workloads. Today the company is launching its second-generation product, built on TSMC 7nm, with more than double the cores and more than double of just about everything else.
Today, the company announced the launch of its end-user compute product, the Cerebras CS-1, and also announced its first customer of Argonne National Laboratory.
Word on the virtual street is that Wave Computing is closing down. The company has reportedly let all employees go and filed for Chapter 11. As one of the many promising new companies in the field of AI, Wave Computing was founded in 2008 with the mission “to revolutionize deep learning with real-time AI solutions that scale from the edge to the datacenter.”
Graphcore, the Bristol-based startup that designs processors specifically for artificial intelligence applications, announced it has raised another $150 million in funding for R&D and to continue bringing on new customers. Its valuation is now $1.95 billion.
解密又一个xPU：Graphcore的IPU (demystifying another xPU: Graphcore's IPU) gives some analysis of its IPU architecture.
Graphcore AI芯片：更多分析 (Graphcore AI chip) provides more analysis.
深度剖析AI芯片初创公司Graphcore的IPU offers an in-depth analysis written after more information was disclosed.
The SC2 is a second-generation chip featuring twice as many cores – i.e., 2,048 cores with 8-way SMT for a total of 16,384 threads. Operating at 1 GHz with 4 FLOPS per cycle per core as with the SC, the SC2 has a peak performance of 8.192 TFLOPS (single-precision). Both prior chips were manufactured on TSMC’s 28HPC+, however in order to enable the considerably higher core count within reasonable power consumption, PEZY decided to skip a generation and go directly to TSMC’s 16FF+ Technology.
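The SC2's thread count and peak-performance figures follow directly from the stated configuration; a quick sanity check of the arithmetic, using only the numbers in the paragraph above:

```python
# PEZY SC2 figures as quoted above.
cores = 2048
smt_ways = 8            # 8-way SMT per core
clock_ghz = 1.0         # operating frequency
flops_per_cycle = 4     # single-precision FLOPS per cycle per core

threads = cores * smt_ways
# cores * GHz * FLOPS/cycle gives GFLOPS; divide by 1000 for TFLOPS
peak_tflops = cores * clock_ghz * flops_per_cycle / 1000

print(threads)      # 16384
print(peak_tflops)  # 8.192
```

Both the 16,384-thread and 8.192 single-precision TFLOPS figures check out.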
TORONTO, May 20, 2021 /PRNewswire/ - Tenstorrent, a hardware start-up developing next generation computers, announced today that it has raised over $200 million in a recent funding round that values the company at $1 billion. The round was led by Fidelity Management and Research Company and includes additional investments from Eclipse Ventures, Epic CG and Moore Capital.
Today Tenstorrent is announcing that Jim Keller, compute architect extraordinaire, has joined the company as its Chief Technology Officer, President, and joins the company board.
Now Tenstorrent’s claim to future fame and potentially the crown is all about reducing the computation required to get to a good answer, instead of throwing massive amounts of brute-force compute at the problem. The technique is called fine-grained conditional computation, and it is just the beginning of an optimization roadmap Tenstorrent CEO, Ljubisa Bajic, has up his sleeves.
Tenstorrent is a small Canadian start-up in Toronto claiming, like most, an order-of-magnitude improvement in efficiency for deep learning. There are no real public details, but they are on the Cognitive 300 list.
The fierce competition isn’t deterring Blaize (formerly Thinci), which hopes to stand out from the crowd with a novel graph streaming architecture. The nine-year-old startup’s claimed system-on-chip performance is impressive, to be fair, which is likely why it’s raised nearly $100 million from investors including automotive component maker Denso.
Founded in 2014, Newark, California startup Koniku has taken in $1.65 million in funding so far to become “the world’s first neurocomputation company”. The idea is that since the brain is the most powerful computer ever devised, why not reverse engineer it? Simple, right? Koniku is actually integrating biological neurons onto chips and has made enough progress that they claim to have AstraZeneca as a customer. Boeing has also signed on with a letter of intent to use the technology in chemical-detecting drones.
Adapteva has taken in $5.1 million in funding from investors that include mobile giant Ericsson. The paper "Epiphany-V: A 1024 processor 64-bit RISC System-On-Chip" describes the design of Adapteva's 1024-core processor chip in 16nm FinFet technology.
Knowm is actually set up as a .ORG, but they appear to be pursuing a for-profit enterprise. The New Mexico startup has taken in an undisclosed amount of seed funding so far to develop a new computational framework called AHaH Computing (Anti-Hebbian and Hebbian). The gory details can be found in this publication, but the short story is that this technology aims to reduce the size and power consumption of intelligent machine learning applications by up to 9 orders of magnitude.
Mythic demonstrated ResNet-50 on its prototype analog AI processor; the production release will support 900-1000 fps with INT8 accuracy at 3W.
A battery-powered neural chip from Mythic with 50x lower power.
Founded in 2012, Texas-based startup Mythic (formerly known as Isocline) has taken in $9.5 million in funding with Draper Fisher Jurvetson as the lead investor. Prior to receiving any funding, the startup has taken in $2.5 million in grants. Mythic is developing an AI chip that “puts desktop GPU compute capabilities and deep neural networks onto a button-sized chip – with 50x higher battery life and far more data processing capabilities than competitors“. Essentially, that means you can give voice control and computer vision to any device locally without needing cloud connectivity.
Kalray (Euronext Growth Paris – ALKAL), a pioneer in processors for new intelligent systems, has announced the launch of the Kalray Neural Network 3.0 (KaNN), a platform for Artificial Intelligence application development. KaNN allows developers to seamlessly port their AI-based algorithms from well-known machine learning frameworks including Caffe, Torch and TensorFlow onto Kalray’s Massively Parallel Processor Array (MPPA) intelligent processor.
BrainChip Holdings Ltd. (ASX: BRN), a leading provider of ultra-low power, high-performance edge AI technology, today announced that it will present its revolutionary new breed of neuromorphic processing IP and Device in two sessions at the tinyML Summit at the Samsung Strategy & Innovation Center in San Jose, California February 12-13.
BrainChip Inc (CA. USA) was the first company to offer a Spiking Neural processor, which was patented in 2008 (patent US 8,250,011). The current device, called the BrainChip Accelerator is a chip intended for rapid learning. It is offered as part of the BrainChip Studio software. BrainChip is a publicly listed company as part of BrainChip Holdings Ltd.
Latest technology enables scalable, low-power automotive inference engines with >50 TMAC/s NN processing power.
MOUNTAIN VIEW, Calif., October 30, 2018 – AImotive™, the global provider of full stack, vision-first self-driving technology, today announced the release of aiWare3™, the company’s 3rd generation, scalable, low-power, hardware Neural Network (NN) acceleration core.
LeapMind is carrying out research on original chip architectures in order to implement neural networks on a circuit, enabling low-power deep learning.
A crowdfunding effort for Snickerdoodle raised $224,876 and they’re currently shipping. If you pre-order one, they’ll deliver it by summer. The palm-sized unit uses the Zynq “System on Chip” (SoC) from Xilinx.
NovuMind combines big data, high-performance computing, and heterogeneous computing to change the Internet of Things (IoT) into the Intelligent Internet of Things (I²oT). Here is a paper about NovuMind from Moor Insights & Strategy, a global technology analyst and research firm.
TeraDeep is building an AI Appliance using its deep-learning FPGA acceleration. The company claims image recognition performance on AlexNet to achieve a 2X performance advantage compared with large GPUs, while consuming 5X less power. When compared to Intel’s Xeon processor, TeraDeep’s Accel technology delivers 10X the performance while consuming 5X less power.
According to this article, "Deep Vision announces its low-latency AI processor for the edge"
Deep Vision, a new AI startup that is building an AI inferencing chip for edge computing solutions, is coming out of stealth today. The six-year-old company’s new ARA-1 processors promise to strike the right balance between low latency, energy efficiency and compute power for use in anything from sensors to cameras and full-fledged edge servers.
Jonathan Ross left Google to launch next-generation semiconductor startup Groq in 2016. Today, the Mountain View, California-based firm said that it had raised $300 million led by Tiger Global Management and billionaire investor Dan Sundheim’s D1 Capital as it officially launched into public view.
According to this article, "Gyrfalcon offers Automotive AI Chip Technology"
Gyrfalcon Technology Inc. (GTI), has been promoting matrix-based application specific chips for all forms of AI since offering their production versions of AI accelerator chips in September 2017. Through the licensing of its proprietary technology, the company is confident it can help automakers bring highly competitive AI chips to production for use in vehicles within 18 months, along with significant gains in AI performance, improvements in power dissipation and cost advantages.
At the RISC-V Summit today, Art Swift, CEO of Esperanto Technologies, announced a new, RISC-V based chip aimed at machine learning and containing nearly 1,100 low-power cores based on the open-source RISC-V architecture.
According to this article, "Esperanto exits stealth mode, aims at AI with a 4,096-core 7nm RISC-V monster"
Although Esperanto will be licensing the cores they have been designing, they do plan on producing their own products. The first product they want to deliver is the highest TeraFLOP per Watt machine learning computing system. Ditzel noted that the overall design is scalable in both performance and power. The chips will be designed in 7nm and will feature a heterogeneous multi-core architecture.
SambaNova — a startup building AI hardware and integrated systems that run on it that only officially came out of three years in stealth last December — is announcing a huge round of funding today to take its business out into the world. The company has closed on $676 million in financing, a Series D that co-founder and CEO Rodrigo Liang has confirmed values the company at $5.1 billion.
SambaNova has been working closely with many organizations the past few months and has established a new state of the art in NLP. This advancement in NLP deep learning is illustrated by a GPU-crushing, world record performance result achieved on SambaNova Systems’ Dataflow-optimized system.
GreenWaves Technologies develops IoT Application Processors based on Open Source IP blocks enabling content understanding applications on embedded, battery-operated devices with unmatched energy efficiency. Our first product is GAP8. GAP8 provides an ultra-low power computing solution for edge devices carrying out inference from multiple, content rich sources such as images, sounds and motions. GAP8 can be used in a variety of different applications and industries.
Optical computers may have finally found a use—improving artificial intelligence
It takes an immense amount of processing power to create and operate the “AI” features we all use so often, from playlist generation to voice recognition. Lightmatter is a startup that is looking to change the way all that computation is done — and not in a small way. The company makes photonic chips that essentially perform calculations at the speed of light, leaving transistors in the dust. It just closed an $11 million Series A.
TORONTO, Canada/NUREMBERG, Germany – FEB 21st, 2018 – Think Silicon®, a leader in developing ultra-low power graphics IP technology, will demonstrate a prototype of NEMA® xNN, the world’s first low-power ‘Inference Accelerator’ Vision Processing Unit for artificial intelligence and convolutional neural networks, at Embedded World 2018.
Startup InnoGrit debuted a set of three controllers for solid-state drives (SSDs), including one for data centers that embeds a neural-network accelerator. They enter a crowded market with claims of power and performance advantages over rivals.
Innogrit Technologies Incorporated is a startup setting out to solve the data storage and data transport problems in artificial intelligence and other big data applications through innovative integrated circuit (IC) and system solutions:
- Extracts intelligence from correlated data and unlocks the value in artificial intelligence systems;
- Reduces redundancy in big data and improves system efficiency for artificial intelligence applications;
- Brings networking capability to storage devices and offers unparalleled performance at large scales;
- Performs data computation within storage devices and boosts performance of large data centers.
Kortiq is a startup providing "FPGA based Neural Network Engine IP Core and The scalable Solution for Low Cost Edge Machine Learning Inference for Embedded Vision". Recently, they revealed some comparison data. You can also find the Preliminary Datasheet of their AIScaleCDP2 IP Core on their website.
…Hailo-8 is capable of 26 tera operations per second (TOPs)… In one preliminary test at an image resolution of 224 x 224, the Hailo-8 processed 672 frames per second compared with the Xavier AGX’s 656 frames and sucked down only 1.67 watts (equating to 2.8 TOPs per watt) versus the Nvidia chip’s 32 watts (0.14 TOPs per watt)…
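The efficiency figures quoted above can be sanity-checked with quick arithmetic; the numbers below are taken from the article, and the effective TOPs delivered during the test are inferred from them:

```python
# Figures quoted for the 224x224 preliminary test (from the article above)
hailo_watts, hailo_tops_per_watt = 1.67, 2.8
xavier_watts, xavier_tops_per_watt = 32.0, 0.14

# Implied effective compute each chip delivered during the test
hailo_effective_tops = hailo_watts * hailo_tops_per_watt    # ~4.7 TOPs
xavier_effective_tops = xavier_watts * xavier_tops_per_watt # ~4.5 TOPs

# Headline efficiency advantage
advantage = hailo_tops_per_watt / xavier_tops_per_watt
print(f"Hailo-8 efficiency advantage: {advantage:.0f}x")  # → 20x
```

Note that both chips delivered a similar effective workload (~4.5-4.7 TOPs); the 20x gap is entirely in power draw.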
Only last week, we did a thought experiment about how we should have streamlined chiplets for very specific purposes, woven together inside of a single package or across sockets and nodes, co-designed to specifically run very precise workflows, because any general purpose processor – mixing elements of CPUs, GPUs, TPUs, NNPs, and FPGAs – would be suboptimal on all fronts except volume economics. We think that this extreme co-design for datacenter compute is the way the world will ultimately go, and we are just getting the chiplet architectures and interconnects together to make this happen. Radoslav Danilak, co-founder and chief executive officer of processor upstart Tachyum, is having absolutely none of that. And in fact, the Prodigy “universal processor” that Tachyum has designed is going in exactly the opposite direction.
Semiconductor startup Tachyum Inc. today announced that it has completed another critical stage in software development by successfully achieving an Apache web server port to Prodigy Universal Processor Instruction Set Architecture (ISA). This latest milestone by Tachyum’s software team brings the company’s Prodigy Universal Processor one step closer to being customer-ready in anticipation of its commercial launch in 2021.
AlphaICs designed an instruction set architecture (ISA) optimized for deep-learning, reinforcement-learning, and other machine-learning tasks. The startup aims to produce a family of chips with 16 to 256 cores, roughly spanning 2 W to 200 W.
Startup Syntiant Corp. is an Irvine, Calif. semiconductor company led by former top Broadcom engineers with experience in both innovative design and in producing chips designed to be produced in the billions, according to company CEO Kurt Busch.
MUNICH — Swiss startup aiCTX has closed a $1.5 million pre-A funding round from Baidu Ventures to develop commercial applications for its low-power neuromorphic computing and processor designs and enable what it calls “neuromorphic intelligence.” It is targeting low-power edge-computing embedded sensory processing systems.
The programmable chip company scores $55 million in venture backing, bringing its total haul to $82 million
Dec. 12, 2018, Tokyo Japan – Preferred Networks, Inc. (“PFN”, Head Office: Tokyo, President & CEO: Toru Nishikawa) announces that it is developing MN-Core (TM), a processor dedicated to deep learning and will exhibit this independently developed hardware for deep learning, including the MN-Core chip, board, and server, at the SEMICON Japan 2018, held at Tokyo Big Site.
Stealth startup Cornami on Thursday revealed some details of its novel approach to chip design to run neural networks. CTO Paul Masters says the chip will finally realize the best aspects of a technology first seen in the 1970s.
Anaflash Inc. (San Jose, CA) is a startup company that has developed a test chip to demonstrate analog neurocomputing taking place inside logic-compatible embedded flash memory.
Optalysys develops Optical Co-processing technology which enables new levels of processing capability delivered with a vastly reduced energy consumption compared with conventional computers. Its first coprocessor is based on an established diffractive optical approach that uses the photons of low-power laser light instead of conventional electricity and its electrons. This inherently parallel technology is highly scalable and is the new paradigm of computing.
The firm pivoted away from riskier spiking neural networks using a new power management scheme
Chip can learn on its own and inference at 100-microwatt scale, says company at Arm TechCon.
Achronix is back in the game of providing full-fledged FPGAs with a new high-end 7-nm family, joining the Gold Rush of silicon to accelerate deep learning. It aims to leverage novel design of its AI block, a new on-chip network, and use of GDDR6 memory to provide similar performance at a lower cost than larger rivals Intel and Xilinx.
Areanna is the latest example of an explosion of new architectures spawned by the rise of deep learning. The debut of a whole new approach to computing has fired imaginations of engineers around the industry hoping to be the next Hewlett and Packard.
Add NeuroBlade to the dozens of startups working on AI silicon. The Israeli company just closed a $23 million Series A, led by the founder of Check Point Software and with participation from Intel Capital.
Luminous Computing has developed an optical microchip that runs AI models much faster than other semiconductors while using less power.
Six-year-old startup Efinix has created an intriguing twist on the FPGA technology dominated by Intel and Xilinx; the company hopes its energy-efficient chips will bootstrap the market for embedded AI in the Internet of Things.
David Schie, a former senior executive at Maxim, Micrel, and Semtech, thinks both markets are ripe for disruption. He — along with WSI, Toshiba, and Arm veterans Robert Barker, Andreas Sibrai, and Cesar Matias — in 2011 cofounded AIStorm, a San Jose-based artificial intelligence (AI) startup that develops chipsets that can directly process data from wearables, handsets, automotive devices, smart speakers, and other internet of things (IoT) devices.
SAN JOSE, Calif.--(BUSINESS WIRE)--SiMa.ai, the company enabling high performance machine learning to go green, today announced its Machine Learning SoC (MLSoC) platform – the industry’s first unified solution to support traditional compute with high performance, lowest power, safe and secure machine learning inference. Delivering the highest frames per second per watt, SiMa.ai’s MLSoC is the first machine learning platform to break the 1000 FPS/W barrier for ResNet-50. In customer engagements, the company has demonstrated 10-30x improvement in FPS/W through its automated software flow across a wide range of embedded edge applications, over today’s competing solutions. The platform will provide machine learning solutions that range from 50 TOPs@5W to 200 TOPs@20W, delivering an industry first of 10 TOPs/W for high performance inference.
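The 10 TOPs/W claim is easy to check against the two endpoints of the announced product range; the figures below come straight from the press release:

```python
# Endpoints of the announced MLSoC range: (TOPs, watts), from the press release
product_range = {"entry": (50, 5), "high_end": (200, 20)}

for name, (tops, watts) in product_range.items():
    efficiency = tops / watts
    print(f"{name}: {efficiency:.0f} TOPs/W")  # both endpoints work out to 10 TOPs/W
```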
Untether AI, a startup developing custom-built chips for AI inferencing workloads, today announced it has raised $125 million from Tracker Capital Management and Intel Capital. The round, which was oversubscribed and included participation from Canada Pension Plan Investment Board and Radical Ventures, will be used to support customer expansion.
GrAI Matter Labs (aka GML), a neuromorphic computing pioneer today revealed NeuronFlow – a new programmable processor technology – and announced an early access program to its GrAIFlow software development kit.
We build artificial intelligence processors, inspired by the brain. Our mission is to enable brain-scale intelligence.
ABR makes the world's most advanced neuromorphic compiler, runtime and libraries for the emerging space of neuromorphic computing.
EE Times exclusive! The new chip targets AI-powered voice interfaces in IoT devices — “the most important AI workload at the endpoint.”
The latest xcore.ai is a crossover chip designed to deliver high-performance AI, digital signal processing, control, and input/output in a single device with prices from $1.
We design and produce AI processors and the software to run them in data centers. Our unique approach optimizes for inference with the focus on performance, power efficiency, and ease of use; and at the same time our approach enables cost-effective training.
We build high-performance AI inference coprocessors that can be seamlessly integrated into various computing platforms including data centers, servers, desktops, automobiles and robots.
Corerain provides ultra-high performance AI acceleration chips and the world's first streaming engine-based AI development platform.
On-device computing solutions startup Perceive emerged from stealth today with its first product: the Ergo edge processor for AI inference. CEO Steve Teig claims the chip, which is designed for consumer devices like security cameras, connected appliances, and mobile phones, delivers “breakthrough” accuracy and performance in its class.
As traditional chip makers struggle to embrace the challenges presented by the rapidly evolving AI software landscape, a San Jose startup has announced it has working silicon and a whole new future-proof chip paradigm to address these issues.
The SimpleMachines, Inc. (SMI) team – which includes leading research scientists and industry heavyweights formerly of Qualcomm, Intel and Sun Microsystems – has created a first-of-its-kind easily programmable, high-performance chip that will accelerate a wide variety of AI and machine-learning applications.
NeuReality has unveiled NR1-P, a novel AI-centric inference platform, and has already started demonstrating it to customers and partners. The company has redefined today’s outdated AI system architecture by developing an AI-centric inference platform based on a new type of System-on-Chip (SoC).
NeuReality, an Israeli AI hardware startup that is working on a novel approach to improving AI inferencing platforms by doing away with the current CPU-centric model, is coming out of stealth today and announcing an $8 million seed round.
The company is backed by Khosla Ventures and is developing its first generation of products for AI computing at the edge. The company raised $4.5 million shortly after its formation in March 2018, so the latest tranche brings the total raised to date to $15.1 million.
BURLINGAME, Calif., June 22, 2021 — Quadric (quadric.io), an innovator in high-performance edge processing, has introduced a unified silicon and software platform that unlocks the power of on-device AI.
5G is the current revolution in wireless technology, and every chip company old and new is trying to burrow their way into this ultra-competitive — but extremely lucrative — market. One of the most interesting new players in the space is EdgeQ, a startup with a strong technical pedigree via Qualcomm that we covered last year after it raised a nearly $40 million Series A.
Innatera, the Dutch startup making neuromorphic AI accelerators for spiking neural networks, has produced its first chips, gauged their performance, and revealed details of their architecture.
AI Chip Compilers
- TVM: End to End Deep Learning Compiler Stack
- Google TensorFlow XLA
- Nvidia TensorRT
- MIT Tiramisu compiler
- ONNC (Open Neural Network Compiler)
- MLIR: Multi-Level Intermediate Representation
- The Tensor Algebra Compiler (taco)
- Tensor Comprehensions
- PolyMage Labs
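A core optimization shared by the compilers above is operator fusion: merging adjacent operators so intermediate tensors never round-trip through memory. The toy NumPy sketch below is illustrative only; real compilers perform this at the IR level and emit fused kernels rather than Python loops:

```python
import numpy as np

def matmul_relu_unfused(a, b):
    # Two separate ops: the intermediate result t is fully materialized
    t = a @ b
    return np.maximum(t, 0)

def matmul_relu_fused(a, b):
    # What a fused kernel does conceptually: apply ReLU as each row of
    # the matmul is produced, so no full-size intermediate is written
    out = np.empty((a.shape[0], b.shape[1]), dtype=np.result_type(a, b))
    for i in range(a.shape[0]):
        np.maximum(a[i] @ b, 0, out=out[i])
    return out

rng = np.random.default_rng(0)
a, b = rng.standard_normal((4, 8)), rng.standard_normal((8, 3))
assert np.allclose(matmul_relu_unfused(a, b), matmul_relu_fused(a, b))
```

Both versions compute the same result; the fused form trades a memory-bound intermediate write for in-register work, which is where most of these compilers find their speedups.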
AI Chip Benchmarks
- DAWNBench: An End-to-End Deep Learning Benchmark and Competition, Image Classification (ImageNet)
- Fathom: Reference workloads for modern deep learning methods
- MLPerf: A broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms. You can find MLPerf training results v0.7 here, and MLPerf inference results v0.7 here.
- AI Matrix
- EEMBC MLMark Benchmark
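DAWNBench's headline metric, time-to-accuracy, differs from the raw-throughput numbers most vendors quote: it is the wall-clock time until a model first reaches a target validation accuracy (93% top-5 for ImageNet). A minimal sketch, using a hypothetical training log:

```python
# Hypothetical (seconds_elapsed, top5_accuracy) validation checkpoints from one run
training_log = [(600, 0.880), (1200, 0.915), (1800, 0.931), (2400, 0.938)]
TARGET_TOP5 = 0.93  # DAWNBench's ImageNet top-5 accuracy target

# Time-to-accuracy: first checkpoint at or above the target
time_to_accuracy = next(t for t, acc in training_log if acc >= TARGET_TOP5)
print(time_to_accuracy)  # → 1800
```

The metric rewards fast convergence rather than peak images/sec, which is why DAWNBench results can rank hardware differently than throughput-only benchmarks.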
- FPGAs and AI processors: DNN and CNN for all
- 12 AI Hardware Startups Building New AI Chips
- Tutorial on Hardware Architectures for Deep Neural Networks
- Neural Network Accelerator Comparison
- "White Paper on AI Chip Technologies 2018". You can download it from here, or Google drive.
- "What We Talk About When We Talk About AI Chip". #1, #2, #3, #4
- AI Chip Paper List
- TPU vs GPU vs Cerebras vs Graphcore: A Fair Comparison between ML Hardware