MultiSense API is a local audio and image recognition API service designed specifically for macOS, offering extremely fast recognition.
All processing is done locally — no API key required, no internet connection needed, fully privacy-safe.
- Load Model: Click "Load Model" in the app interface. The software will automatically download the speech-to-text model. If you don't need speech-to-text functionality, you can skip this step.
- Start Service: Set the listening port and click "Start API Service".
- Status Check: Confirm that each engine is ready by accessing the root path.
Base URL: http://127.0.0.1:1643
GET /
Checks the API service liveness status and the initialization state of each internal component. Before calling a specific endpoint, it is recommended to query this route to confirm that the feature is enabled and its model has finished loading.
```javascript
const json = await fetch('http://localhost:1643').then(response => response.json())
console.log(json)
```

- status (string): Service running status. Returns "success" when operating normally.
- transcribe (boolean): Speech transcription readiness. If false, the ASR model has not been downloaded yet or is still loading. Calling /transcribe at that point will trigger automatic initialization.
- ocr (boolean): Vision engine readiness. Based on the macOS system framework, this is typically always true.
- classify (boolean): Object classification engine readiness. Based on the macOS system framework, this is typically always true.

Response Example

```json
{ "status": "success", "transcribe": true, "ocr": true, "classify": true }
```
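As a sketch of how a client might gate calls on this status payload (the helper names and polling interval are illustrative, not part of the API):

```javascript
// Returns true when the service is up and the named feature
// ("transcribe", "ocr", or "classify") reports ready in the GET / payload.
function featureReady(status, feature) {
  return status.status === 'success' && status[feature] === true
}

// Example: poll the root endpoint until a feature is ready (Node.js 18+,
// built-in fetch), giving up after maxTries attempts.
async function waitForFeature(feature, { intervalMs = 1000, maxTries = 30 } = {}) {
  for (let i = 0; i < maxTries; i++) {
    const status = await fetch('http://localhost:1643').then(r => r.json())
    if (featureReady(status, feature)) return true
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  return false
}
```

Waiting on `transcribe` before the first upload avoids paying the model-load delay inside a request.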
POST /transcribe
Converts an audio or video stream into text.
Supported languages: English, Spanish, French, Russian, German, Italian, Polish, Ukrainian, Romanian, Dutch, Hungarian, Greek, Swedish, Czech, Bulgarian, Portuguese, Slovak, Croatian, Danish, Finnish, Lithuanian, Slovenian, Latvian, Estonian, Maltese.
| Parameter | Type | Status | Description |
|---|---|---|---|
| audio | file | Required | The binary audio stream to be transcribed. |
```javascript
const blob = await fetch('https://example.com/demo.mp3').then(response => response.blob())
const formData = new FormData()
formData.append('audio', blob)
const json = await fetch('http://localhost:1643/transcribe', {
  method: 'POST',
  body: formData
}).then(response => response.json())
console.log(json.text)
```

Response Example

```json
{ "text": "Text content identified from audio." }
```
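The example above streams a remote file; audio already held in memory can be sent the same way. A minimal sketch for Node.js 18+ (built-in fetch, FormData, and Blob) — the endpoint and the "audio" field follow the docs above, while the helper names are illustrative:

```javascript
// Wraps raw audio bytes in the multipart form the /transcribe endpoint expects.
function buildTranscribeForm(bytes, filename) {
  const formData = new FormData()
  // The service reads the binary stream from the "audio" field.
  formData.append('audio', new Blob([bytes]), filename)
  return formData
}

// Posts the bytes and returns the recognized text.
async function transcribeBytes(bytes, filename = 'audio.mp3') {
  const response = await fetch('http://localhost:1643/transcribe', {
    method: 'POST',
    body: buildTranscribeForm(bytes, filename)
  })
  return (await response.json()).text
}
```

To transcribe a local file, read it with `fs.readFile` and pass the resulting Buffer to `transcribeBytes`.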
POST /ocr
Extracts multilingual text from an image.
Supported Languages
| Language Code | Language |
|---|---|
| en-US | English (United States) |
| fr-FR | French (France) |
| it-IT | Italian |
| de-DE | German |
| es-ES | Spanish |
| pt-BR | Portuguese (Brazil) |
| zh-Hans | Chinese (Simplified) |
| zh-Hant | Chinese (Traditional) |
| yue-Hans | Cantonese (Simplified) |
| yue-Hant | Cantonese (Traditional) |
| ko-KR | Korean |
| ja-JP | Japanese |
| ru-RU | Russian |
| uk-UA | Ukrainian |
| th-TH | Thai |
| vi-VT | Vietnamese |
| ar-SA | Arabic (Modern Standard Arabic) |
| ars-SA | Arabic (Najdi dialect, Saudi Arabia) |
| Parameter | Type | Status | Description |
|---|---|---|---|
| image | file | Required | The image file to be processed. |
| language | string | Optional | Language code (e.g. zh-Hans, en-US). Defaults to Chinese-English bilingual. |
| lineBreak | boolean | Optional | Whether to preserve detected line breaks. Defaults to true. |
```javascript
const blob = await fetch('https://example.com/demo.jpg').then(response => response.blob())
const formData = new FormData()
formData.append('image', blob)
formData.append('language', 'es-ES')
formData.append('lineBreak', false)
const json = await fetch('http://localhost:1643/ocr', {
  method: 'POST',
  body: formData
}).then(response => response.json())
console.log(json.text)
```

Response Example

```json
{ "text": "Text content identified from image." }
```
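Because an unsupported language code is easiest to catch on the client, a sketch of validating the code against the Supported Languages table above before building the request (the helper name is illustrative; the code list is transcribed from the table):

```javascript
// Language codes accepted by /ocr, taken from the Supported Languages table.
const OCR_LANGUAGES = new Set([
  'en-US', 'fr-FR', 'it-IT', 'de-DE', 'es-ES', 'pt-BR',
  'zh-Hans', 'zh-Hant', 'yue-Hans', 'yue-Hant',
  'ko-KR', 'ja-JP', 'ru-RU', 'uk-UA', 'th-TH', 'vi-VT',
  'ar-SA', 'ars-SA'
])

// Throws on a typo'd or unsupported code so the mistake fails fast
// locally instead of producing an unexpected server-side result.
function assertOcrLanguage(code) {
  if (!OCR_LANGUAGES.has(code)) {
    throw new Error(`Unsupported OCR language code: ${code}`)
  }
  return code
}
```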
POST /classify
Identifies objects and scenes in an image and returns confidence scores.
| Parameter | Type | Status | Description |
|---|---|---|---|
| image | file | Required | The image to be classified. |
```javascript
const blob = await fetch('https://example.com/demo.jpg').then(response => response.blob())
const formData = new FormData()
formData.append('image', blob)
const json = await fetch('http://localhost:1643/classify', {
  method: 'POST',
  body: formData
}).then(response => response.json())
console.log(json)
```

Response Example

```json
{ "labels": [ { "identifier": "Laptop", "confidence": 0.985 } ] }
```
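Since the endpoint returns every label with a confidence score, callers usually want only the confident ones. A sketch of that post-processing, following the response shape above (the helper name and 0.5 threshold are arbitrary choices, not part of the API):

```javascript
// Keeps only labels at or above a confidence threshold, sorted from
// most to least confident, and returns just their identifiers.
function topLabels(response, threshold = 0.5) {
  return response.labels
    .filter(label => label.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence)
    .map(label => label.identifier)
}
```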
