GitHub - bigsk1/vision-image-gen: Openai GPT Vision - Dalle3 - CLI & Streamlit UI Image generator based on your input

AI Image Generator

This project leverages OpenAI's GPT Vision and DALL-E models to analyze images and generate new ones based on user modifications. It provides two interfaces: a web UI built with Streamlit for interactive use and a command-line interface (CLI) for direct script execution. Features

Image Analysis: Automatically describes images using GPT-4 Vision.
Image Generation: Generates modified images based on user inputs using DALL-E 3.
Web Interface: Interactive web UI for easy operation.
CLI: Command-line version for script or batch processing.

How it Works:

The app first downloads the image from the provided URL or path locally and analyzes it using the pre-trained AI model gpt-4-vision-preview to generate a description.
You're then given the opportunity to modify this description to guide the image generation process, the original description from the vision model and your included description are used.
Finally, the app uses DALL-E 3 to generate a new image 1790x1024 based on the modified description.
You can see the original image and then newly created image. Right click to save.

Youtube Video Showing how it works

Installation

Tested in Python 3.11.4

Clone the repository to your local machine:

git clone https://github.com/bigsk1/vision-image-gen.git
cd vision-image-gen

Install the required dependencies:

pip install -r requirements.txt

Usage

Web UI

To start the web interface, run:

streamlit run vision_image_gen_ui_local.py

Navigate to the URL provided by Streamlit, http://localhost:8501, in your web browser. Enter you Open AI API Key or Have your Open Ai Api key added to your system enviroment variables in PATH

Upload an Image: Use the provided input to upload an image or specify an image URL.
View Analysis: See the AI-generated description of the image.
Modify and Generate: Enter modifications to the original description and generate a new image.
View and Save: The generated image will be displayed, and you can save it locally.

CLI Version

The CLI version allows you to process images directly from your terminal.

python vision_image_gen.py

Using Streamlit Cloud Sharing

Use the vision_image_gen_ui.py for Streamlit Cloud sharing, in the settings just add

[openai]
api_key = "sk-paste-your-api-key"

Example of output

==================================================
Vision Response:
==================================================
The image shows a computer terminal interface with ASCII art and text. At the top would be ASCII art resembling a face with a pattern of "#" and "." characters. Below it, within a minimalist window frame, is a navigation menu with options depicted as a pixel-style globe icon labeled "sumfetch," a document icon labeled "ABOUT," a link icon labeled "Website," a folder icon labeled "This Repo," and a series of contact methods including an email address, GitHub URL, and Twitter handle, all associated with the username "bigsk1". The central feature is a bold ASCII art logo or emblem saying "BIGSK1" inside a stylized circular border.

For a text-to-image model, you could describe it as follows:

"Create an image of a dark computer terminal screen with a pixelated face made out of ASCII characters at the top. Include a stylized ASCII art logo that says 'BIGSK1' in the center, enclosed in a circular patterned border. Below the logo, depict a simple user interface with text and monochrome icons signifying navigation options, including a globe for 'sumfetch,' a document for 'ABOUT,' a link chain for 'Website,' and a folder for 'This Repo.' Add additional details

==================================================
User's Modification Input:
==================================================
make it in the style of an american flag

==================================================
Final Prompt Sent to DALL-E 3:
==================================================
The image shows a computer terminal interface with ASCII art and text. At the top would be ASCII art resembling a face with a pattern of characters. Below it, within a minimalist window frame, is a navigation menu with options depicted as a pixel-style globe icon labeled "sumfetch," a document icon labeled "ABOUT," a link icon labeled "Website," a folder icon labeled "This Repo," and a series of contact methods including an email address, GitHub URL, and Twitter handle, all associated with the username "bigsk1". The central feature is a bold ASCII art logo or emblem saying "BIGSK1" inside a stylized circular border.

For a text-to-image model, you could describe it as follows:

"Create an image of a dark computer terminal screen with a pixelated face made out of ASCII characters at the top. Include a stylized ASCII art logo that says 'BIGSK1' in the center, enclosed in a circular patterned border. Below the logo, depict a simple user interface with text and monochrome icons signifying navigation options, including a globe for 'sumfetch,' a document for 'ABOUT,' a link chain for 'Website,' and a folder for 'This Repo.' Add additional details make it in the style of an american flag

Example image in the original_image folder this is were your downloaded images will end up.

The generated_images folder is were the new Dalle generated image will end up.

This is a work in progress, more to add soon.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github

.github

generated_images

generated_images

original_image

original_image

README.MD

README.MD

requirements.txt

requirements.txt

vision_image_gen.py

vision_image_gen.py

vision_image_gen_ui.py

vision_image_gen_ui.py

vision_image_gen_ui_local.py

vision_image_gen_ui_local.py

Repository files navigation

AI Image Generator

How it Works:

Installation

Usage

CLI Version

Using Streamlit Cloud Sharing

Example of output

About

Releases

Sponsor this project

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github		.github
generated_images		generated_images
original_image		original_image
README.MD		README.MD
requirements.txt		requirements.txt
vision_image_gen.py		vision_image_gen.py
vision_image_gen_ui.py		vision_image_gen_ui.py
vision_image_gen_ui_local.py		vision_image_gen_ui_local.py

bigsk1/vision-image-gen

Folders and files

Latest commit

History

Repository files navigation

AI Image Generator

How it Works:

Installation

Usage

CLI Version

Using Streamlit Cloud Sharing

Example of output

About

Topics

Resources

Stars

Watchers

Forks

Sponsor this project

Languages