
IRIS

IRIS - Image Reading Server

IRIS exposes one specific ability of multimodal large language models: image description. How you use it is up to you. My intention is to make IRIS a controllable gateway to selected LLMs that is easy to deploy and customize. Currently only Google Gemini is supported. IRIS can be used with the Iriso app or standalone.

Why not use the Gemini or GPT4V API directly?

  • You do not share your API keys
  • One interface for multiple LLMs
  • Gemini and GPT4V are currently region-locked; IRIS helps you work around this
  • Request rate limiting is under your full control
  • You control the prompt used to describe images, which makes abuse or jailbreaking nearly impossible
  • Simple access control that can be extended to whatever you need

How to set up

IRIS is an extendable base for your own needs and the backend for the Iriso app. Many things have been intentionally simplified so that it is easy to deploy even for inexperienced developers. Here are the steps to run it on GCP:

  1. Get yourself a Google account
  2. Activate the Vertex AI console
  3. Create a new project
  4. Activate Google Cloud Run, Secret Manager, and Artifact Registry (optional)
  5. Set up Application Default Credentials for Google Cloud (typically by running gcloud auth application-default login). As a result you will get an application_default_credentials.json file
  6. Create a secret in Google Secret Manager: a simple JSON array with the access tokens for all the IRIS users you need (use GUIDs or random alphanumeric strings)
    [
      "TOKEN-ONE-LONG-ENOUGH-AND-HARD-TO-GUESS",
      "ANOTHER-TOKEN-FOR-ANOTHER-USER",
      .....
    ]
    

Store it as tokens.json. Since IRIS was originally intended for individuals or small groups of people with visual impairments, there is no user management. However, if you want to scale it to larger groups, it is easy enough to back the token check with a NoSQL or SQL database instead, as sketched below.
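As an illustration only, such a database-backed token check could look roughly like the following TypeScript sketch; the PostgreSQL pool, the iris_tokens table, and the isTokenValid helper are hypothetical names, not part of IRIS:

// Hypothetical replacement for the static tokens.json lookup.
// The table name "iris_tokens" and this helper are illustrative only.
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the standard PG* env variables

// Returns true if the bearer token exists in the database.
export async function isTokenValid(token: string): Promise<boolean> {
  const result = await pool.query(
    "SELECT 1 FROM iris_tokens WHERE token = $1 LIMIT 1",
    [token]
  );
  return (result.rowCount ?? 0) > 0;
}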

  7. Create another secret from application_default_credentials.json
  8. Get two values from Vertex AI: PROJECT (e.g. random-words-123456) and LOCATION (e.g. us-central1)
  9. Build your own IRIS server from the Dockerfile or use the unmodified server from Google Artifact Registry: europe-west10-docker.pkg.dev/wide-gecko-408120/egodx/image-reader-server:latest
  10. Create a new service in Google Cloud Run
    1. Use the Docker repository from step 9 or your own repository in GAR. It must be available under https://*.pkg.dev/. Docker Hub doesn't work.
    2. Set the container port to 21088
    3. Create 2 environment variables, IRIS_GEMINI_VERTEXAI_LOCATION and IRIS_GEMINI_VERTEXAI_PROJECT, and set them to LOCATION and PROJECT respectively
    4. Mount your secrets as /root/.config/gcloud/application_default_credentials.json and /usr/node/tokens/tokens.json
  11. If everything is configured correctly, you can test your IRIS server by sending a request with curl:
curl -X POST -F "file=@cat.jpg" -F "lang=en" -H "Authorization: Bearer YOUR_TOKEN_FROM_TOKENS_JSON"  https://YOUR_SERVER_URL.run.app/upload

The response should look something like this:

{
    "text":" This is a picture of a ginger kitten sleeping on a white fluffy blanket. The kitten is on its back with its eyes closed and its paws in the air. It has a contented smile on its face."
}
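If you prefer to call the server from code rather than curl, a minimal Node 18+ / TypeScript sketch of the same request might look like this; the server URL, token, and file name are placeholders, exactly as in the curl example above:

// Minimal client for the /upload endpoint, assuming Node 18+ (built-in fetch, FormData, Blob).
import { readFile } from "node:fs/promises";

async function describeImage(path: string, lang: string, token: string): Promise<string> {
  const form = new FormData();
  form.append("file", new Blob([await readFile(path)]), path); // image as multipart "file" field
  form.append("lang", lang);                                   // language code, e.g. "en"

  const response = await fetch("https://YOUR_SERVER_URL.run.app/upload", {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: form,
  });
  if (!response.ok) throw new Error(`IRIS returned HTTP ${response.status}`);
  const { text } = (await response.json()) as { text: string };
  return text;
}

describeImage("cat.jpg", "en", "YOUR_TOKEN_FROM_TOKENS_JSON").then(console.log);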

Language support

IRIS currently supports 8 languages:

  • Chinese
  • English
  • French
  • German
  • Portuguese
  • Russian
  • Spanish
  • Turkish

You can add a new language by adding a language code and prompt to prompts.json in the vision provider folder; an illustrative entry is sketched below. For example, the Gemini provider can support up to 38 languages.
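The exact schema of prompts.json is defined by each provider; purely as an illustration (this structure is an assumption, check the existing file in the vision provider folder for the real format), an entry mapping a language code to its prompt might look like:

{
  "en": "Describe this image in detail for a person who cannot see it.",
  "it": "Descrivi questa immagine in dettaglio per una persona che non può vederla."
}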

Roadmap

  • Improve documentation
  • New vision provider - GPT4V
  • New local vision provider - llama.cpp
  • More languages and better prompts

License

GPLv3. See gnu-gpl-v3.0.md