Description
One of the great difficulties for new users trying to coax AI image generators into producing something like what they imagine is constructing the text prompt. Users are often told that they can just describe what they want to see and the generator will produce it. In my experience, many of the phrases I put into prompts are either ignored or misunderstood. I suspect this is partially my own fault, and that the situation would be improved with a bit of documentation.
What I'm looking for is a document that details the following:
- What phrases are understood for artistic styling? For example, would it understand things like `pixel art`, `line drawing`, `comic book`, `pulp art`, `cad model`, `salvador dali`, or `solarpunk`?
- What phrases are understood for characters and objects? For example, would it understand things like `garden gnome`, `maelstrom`, `mineral vein`, `power armor`, `coat of arms`, or `soldering iron`?
- What phrases are understood for verbs and modifiers? For example, would it understand things like `opening`, `fallow`, `holding`, `jaundiced`, `ugly`, `angry`, `vibrating`, `dutch angle`, or `defenestrating`?
- What phrases are understood for image output settings? For example, would it understand things like `16:9`, `UHD`, or `5-bit color`?
- Is there any significance to grammar or ordering of phrases?
- What are the practical limits of how many and how specific one's phrases might be?
- Are there any hidden modifier phrases that the processing engine watches for?
- What happens when you repeat phrases? For example, `woman shining a flashlight in an alley, but the flashlight shines darkness instead of light`.
- What grammar or phrases will be ignored by the processing engine?
- Are there any grammatical patterns that tend to lead to better results?
- Other tips and tricks for how to talk to the machine.
As with all current iterations of natural language processing, the engine's ability to interpret what we write is significantly more limited than a human's. Users therefore need to know the boundaries of what the system can interpret so that we can talk to the machine in terms it will understand. Hopefully, a document that details these things will improve the usability, quality, and utility of tools like this.