## DOCUMENTATION
___
### **LLM**

| **Model**                    | **Context Window** |**Limits**           |
|------------------------------|--------------------|---------------------|
| **Qwen 2.5 35B Code**        | Up to 128K tokens  |Fine Tuned for coding|
| **Cohere Command R+**        | Up to 128K tokens  | 1000 calls per month|
| **Mistral-7B-Instruct-v0.3** | 32,768 tokens      | Hallucinations      |

Using `Mistral-7B-Instruct-v0.3` because reliable outputs and 1000 calls per day



### **Text-to-Image**


| **Category**               | **Stable Diffusion 3.5 Large Turbo**                                                                 | **FLUX.1-Dev FP16**                                                                       |
|----------------------------|-----------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| **API Endpoint**           | `stabilityai/stable-diffusion-3-5-large-turbo`                                                     | `black-forest-labs/FLUX.1-Dev-FP16`                                                     |
| **Key API Parameters**     | `guidance_scale=3`, `num_inference_steps=4`, `negative_prompt`                                     | `num_inference_steps=8`, `control_image` (Canny/Depth), `true_cfg=4.0`                  |
| **API Latency**            | 3.8–5.2 sec/image (cold start: 12–18 sec)                                                          | 6.1–8.9 sec/image (cold start: 15–22 sec)                                                |
| **Cost Efficiency**        | $0.0021/image (PRO tier)                                                                           | $0.0033/image (PRO tier)                                                                 |
| **Free Tier Limits**       | 500 requests/hour                                                                                  | 300 requests/hour                                                                        |
| **Max Resolution**         | 1024x1024 via single API call                                                                      | 2048x2048 (requires `high_res_fix=true` parameter)                                       |
| **Advanced Features**      | - 4-step inference <br> - Text-to-image only                                                       | - Unified ControlNet (Canny/Depth) <br> - Image-to-image <br> - Inpainting/Outpainting   |
| **NSFW Filtering**         | Enabled by default (`safety_checker=strict`)                                                       | Optional (`safety_checker=relaxed`)                                                      |
| **Rate Limits (PRO)**      | 5K requests/hour                                                                                   | 3K requests/hour                                                                         |
| **Use Case Focus**         | Rapid batch generation (social media, prototyping)                                                 | High-detail workflows (product design, architectural viz)                                |

using `SD-3.5-LT` because latency is low, More requests and we want stialized images which is better on SD 



In [2]:
from reader import ebook
from main import read_list,read_json
from typing import Optional,Dict,List
import requests
from dotenv import load_dotenv
import json
import os
from pydantic import BaseModel, conint
from dataclasses import dataclass

In [3]:
load_dotenv()
API = os.getenv("HF_API")

headers = {
    "Authorization": f"Bearer {API}",
    "Content-Type": "application/json",
}
url = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3/v1/chat/completions"



___
___

# Get **Summary**, **Characters** AND **Places** 

![Local Image](/home/prince/Documents/Project/BOOK/media/Sudo_summary.png "Smmary, characters and Places")

In [4]:
sum_role='''(NOTE: Only output in JSON. Ensure the JSON format is valid, well-formed, and Ready to parse. nothing before or after the json file)
Input:  
1.Current Chapter Text: The current chapter to be analyzed.
2.Character List: A list of characters with their descriptions till now (This chapter).
3.Places list: list of places and their description till now (This chapter).
4.Previous Chapters' Summary: Context from earlier chapters.

Rules:  
1.Narrative Summary: Summarize and explain the chapter in detail, integrating context and key developments from previous chapters and create a self containing summary and explaination. end with to be continued.
2.Character List: add new characters to the list based on this chapter and Update existing character descriptions. If no characters are mentioned, return the same list as given.
3.Places: Include an updated description of any significant locations mentioned in this chapter, focusing on environment, weather, vibe, and structure.
4.Output Format: Ensure the output is valid and well-structured JSON.

Output:  
Generate a JSON object in this format:
{
  "summary": "Detailed Summary and explination of the current chapter in context of previous chapters. Use previus chapter summary as context",
  "characters": {
      "Character Name": "Updated or new description (age, looks, clothes, hair, body language) based on this chapter."
    },
  "places": {
      "Place Name": "Updated or new description (environment, weather, vibe, structure, etc.) based on this chapter."
  }
}
'''

#### **get_sum()** Function
![Local Image](/home/prince/Documents/Project/BOOK/media/Get_sum_logic.png "Smmary, characters and Places")


outputdir=
```{
    0:
        {
            "summary":"No previous Context Yet",
            "characters":{},
            "places": {}
        }
            }```


In [5]:


class SummarySchema(BaseModel):
    summary: str
    characters: Dict[str,str]
    places: Dict[str,str]

def sum_msg(text: str, context: str, characters: dict = {}, places: dict = {}) -> list:
    message = [
        {
            "role": "system",
            "content": sum_role
        },
        {
            "role": "user",
            "content": json.dumps({
                "past_context": context,
                "Current_Chapter": text,
                "character_list": characters,
                "places_list": places
            }),
        },
    ]
    return message

def get_summ(messages:Dict[str,str]) -> Optional[str]:
    data = {
        "messages": messages,
        "temperature": 0.7,
        "stream": False,
        "max_tokens":10000,
        "parameters": {
        "repetition_penalty": 1.3,
        "grammar": {
            "type": "json",
            "value": SummarySchema.model_json_schema()
                }
                    }
    }
    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 200:
        response_data = response.json()
        assistant_message = response_data["choices"][0]["message"]["content"]
        return assistant_message
    else:
        print(f"Error: {response.status_code}, {response.text}")
        return None



In [6]:
# book=ebook("./books/Alchemist/Alchemist.epub")
book=ebook("./books/LP.epub")
title=book.get_metadata()['title']
_,text=zip(*book.get_chapters())
text=text[:3]



In [7]:
output_dict={
    0:
        {
            "summary":"No previous Context Yet",
            "characters":{},
            "places": {}
        }
            }


In [8]:
for idx,i in enumerate(text):
 
    if len(i)<1200:
        output_dict[idx+1]=output_dict[idx]
    else:
        context=output_dict[idx]["summary"]
        characters=output_dict[idx]["characters"]
        places=output_dict[idx]["places"]
        
        mes=sum_msg(i,context,characters,places)
        
        summary_output=get_summ(mes)
        summary_output_json=read_json(summary_output)
        output_dict[idx+1]=summary_output_json
    print(f"chapter: {idx} done")
    print(output_dict)

chapter: 0 done
{0: {'summary': 'No previous Context Yet', 'characters': {}, 'places': {}}, 1: {'summary': "This chapter, the narrator reflects on his childhood and a particular drawing of a boa constrictor digesting an elephant, which he showed to adults who misunderstood it as a drawing of a hat. Despite their lack of understanding, the narrator continues to draw and later becomes a pilot. In the Sahara desert six years ago, he experiences a breakdown and is saved by a mysterious voice that asks him to draw a sheep. The chapter explores the narrator's frustration with adults who fail to comprehend his art and his isolation, which is later alleviated by the voice in the desert.", 'characters': {'Narrator': 'A pilot who, as a child, struggled to communicate his art to adults due to their inability to understand. Shown to be isolated and frustrated, but later finds solace in the mysterious voice in the desert.'}, 'places': {'Sahara Desert': 'A vast, isolated desert where the narrator ex

In [9]:
for num, dic in output_dict.items():
    if num==0:
        continue
    print(f"CHAPTER {num} :\n")
    
    sum,char,places=dic.items()
    print("Characters:")
    if len(char)>0 :
        for name,i in char[1].items():
            print(f"{name}: {i}")
    
    print("\nPlaces:")
    if len(places)>0:
        for name,i in places[1].items():
            print(f"{name}: {i}")
    print(f"\nSummary:\n{sum[1]}")
    print("\n-------------------------------------------------------------------------------------------------------------------")

CHAPTER 1 :

Characters:
Narrator: A pilot who, as a child, struggled to communicate his art to adults due to their inability to understand. Shown to be isolated and frustrated, but later finds solace in the mysterious voice in the desert.

Places:
Sahara Desert: A vast, isolated desert where the narrator experiences a breakdown. The harsh environment and isolation are contrasted with the friendly voice that later emerges.

Summary:
This chapter, the narrator reflects on his childhood and a particular drawing of a boa constrictor digesting an elephant, which he showed to adults who misunderstood it as a drawing of a hat. Despite their lack of understanding, the narrator continues to draw and later becomes a pilot. In the Sahara desert six years ago, he experiences a breakdown and is saved by a mysterious voice that asks him to draw a sheep. The chapter explores the narrator's frustration with adults who fail to comprehend his art and his isolation, which is later alleviated by the voic

___
___
___

# Get **Scenes**
![Local Image](/home/prince/Documents/Project/BOOK/media/Sudo_summary.png )

In [10]:
class SceneSchema(BaseModel):
    scenes: Dict[str,str]
print(SceneSchema.model_json_schema())

{'properties': {'scenes': {'additionalProperties': {'type': 'string'}, 'title': 'Scenes', 'type': 'object'}}, 'required': ['scenes'], 'title': 'SceneSchema', 'type': 'object'}


In [11]:
def scene_msg(text: str) -> list:
    message = [
        {
            "role": "system",
            "content": ''' IMPORTANT-> ONLY OUTPUT IN JSON.
                You are a text-to-image prompt generator.
                Your task is to analyze the provided input text and identify distinct scenes where there are changes in place or time. For each identified scene, create a detailed and descriptive prompt suitable for generating an image.
                Characters: Refer to characters by their respective names as mentioned in the text.
                Places: Refer to places by their proper names as mentioned in the text.
                Ensure each prompt captures the scene's mood, setting, and key visual elements in detail.
                images are for book
                
                1. **Input:**
                    - `text`: A block of narrative text.```
                2. **Output:**
                    - [prompt,prompt,prompt.....]

                ### Instructions:
                - Identify key changes in location, characters, or significant actions to define separate scenes.
                - Use descriptive language to paint a vivid picture of each scene in the prompt.
                
                   ''',
        },
        
        {
            "role": "user",
            "content": f"""TEXT: {text}""",  # This should be your input text that describes the scenes
        },]
        
    return message

def get_scene(scene_message: list) -> Optional[str]:
    data = {
        "messages": scene_message,
        "max_tokens": 10000,  # Specify the maximum length of the response
        "temperature": 0,  # Control the randomness of the response
        "stream": False,
        "repetition_penalty": 1.3,
        "grammar": {
            "type": "json",
            "value": SceneSchema.model_json_schema()
                }
        }
    
    response = requests.post(url, headers=headers, json=data)

    # Check the response status code and process the output
    if response.status_code == 200:
        response_data = response.json()
        # Extract the assistant's message content
        assistant_message = response_data["choices"][0]["message"]["content"]
        return assistant_message
    else:
        print(f"Error: {response.status_code}, {response.text}")  # Print error details
        return None

In [12]:
scene_output_list=[]
for i in text:
    inputs=i.replace("\n"," ")
    scene_message = scene_msg(text)
    scene_output=get_scene(scene_message)
    scene_json_output=read_json(scene_output)
    scene_output_list.append(scene_json_output)

In [13]:
scene_output_list

[[{'scene': "A young boy's bedroom in France",
   'description': "A cozy, dimly lit room filled with children's books and toys. The boy, bundled up against the cold, is huddled under a blanket, looking sad and lonely. The room is quiet, save for the occasional sound of a page turning."},
  {'scene': 'A jungle scene',
   'description': "A dense, lush jungle teeming with wildlife. A boa constrictor is shown digesting an elephant, with the inside of the snake's body visible. The scene is dark and foreboding, with the sun barely filtering through the dense foliage."},
  {'scene': 'A desert landscape',
   'description': 'A vast, barren desert stretching out as far as the eye can see. The sun is setting, casting long shadows across the sand. A man is shown standing alone, surrounded by nothing but sand and silence. He looks lost and isolated, with a determined expression on his face.'},
  {'scene': 'A meeting between the man and the Little Prince',
   'description': 'A surreal desert landsca

___
___
___


# Get **Style**
![Local Image](/home/prince/Documents/Project/BOOK/media/Get_Style_logic.png )

In [14]:
def basic_llm_req(text: str) -> Optional[str]:
    messages = [
        {
            "role":"system",
            "content":"DO As Asked in The Input"
        },
        {
            "role":"user",
            "content":text
        }
    ]
    
    data = {
        "messages": messages,
        "max_tokens": 10000,  # Specify the maximum length of the response
        "temperature": 0,  # Control the randomness of the response
        "stream": False,
    }
    response = requests.post(url, headers=headers, json=data)

    # Check the response status code and process the output
    if response.status_code == 200:
        response_data = response.json()
        # Extract the assistant's message content
        assistant_message = response_data["choices"][0]["message"]["content"]
        return assistant_message
    else:
        print(f"Error: {response.status_code}, {response.text}")  # Print error details
        return None
 

In [15]:
combined_summary=""
for key,val in output_dict.items():
    val_content=val["summary"].replace("\n"," ")
    if key==0:
        continue
    output_string= f'''{combined_summary}... Chapter{key}: {val_content}'''

In [16]:
style_prompt='''Prompt:

"Analyze the following story and provide a list of image style tags that would best suit its themes, settings, and overall mood. 
The response should include the style, period, type of art, color palette etc. 1 tags per entry.dont explain it just give tags.

Response Format:

Style: Realism, Impressionism, Surrealism,etc
Type:Landscape, Portrait, Abstract,etc.
Color Palette:Warm tones, Cool colors, Monochromatic,etc
Mood:Serene, Dramatic, Melancholic,etc.

Story
'''

style=basic_llm_req(f'''{style_prompt}:{combined_summary} ''')

In [17]:
style

'Title: "The Forgotten Whispers of the Willow"\n\n1. Style: Impressionism\n2. Type: Landscape\n3. Color Palette: Cool colors (Blues, Greens, Purples)\n4. Mood: Melancholic, Nostalgic'

___
___
___

# Get **Image**
![Local Image](/home/prince/Documents/Project/BOOK/media/Get_img_logic.png )

In [18]:
from huggingface_hub import InferenceClient
import time

client = InferenceClient(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    token=API,
)

In [19]:
def get_images(key, text,characters,places,style):
    image = client.text_to_image(
        f"Text: {text}// context-> characters: {characters}, places:{places} // style:{style} ",
        negative_prompt="hand and feet",
        height=1024,
        width= 1024,
        # guidance_scale=2,
        num_inference_steps=10
        
    )
    image.save(f"./output/{title}_{key}.png")

![Local Image](/home/prince/Documents/Project/BOOK/media/Image_Loop.png )

In [20]:

for idx,i in enumerate(scene_output_list):
    chars=output_dict[idx+1]["characters"]
    places=output_dict[idx+1]["places"]
    for jdx,j in enumerate(i):
        ##try except for internal server error
        try:
            get_images(key=f"chapter{idx}_scene{jdx+1}",text=j,characters=chars,places=places,style=style)
            print(f"chapter{idx}_scene{jdx+1} saved")
        except Exception as e:
            print(f"Error while calling the API :{e} \n waiting for 10 seconds.. ")
            time.sleep(10)
            get_images(key=f"chapter{idx}_scene{jdx+1}",text=j,characters=chars,places=places,style=style)
            print(f"chapter{idx}_scene{jdx+1} saved")
        time.sleep(18)

chapter0_scene1 saved


KeyboardInterrupt: 