# DOC


## DOCUMENTATION
___
### **LLM**

| **Model**                    | **Context Window** | **Notes / Limits**      |
|------------------------------|--------------------|-------------------------|
| **Qwen 2.5 35B Code**        | Up to 128K tokens  | Fine-tuned for coding   |
| **Cohere Command R+**        | Up to 128K tokens  | 1000 calls per month    |
| **Mistral-7B-Instruct-v0.3** | 32,768 tokens      | Prone to hallucinations |

#### Using `Mistral-7B-Instruct-v0.3` because
- its outputs are reliable
- it allows 1000 calls per day
- it runs on Hugging Face (the same platform as the text-to-image model)


### **Text-to-Image**


| **Category**               | **Stable Diffusion 3.5 Large Turbo**                                                                 | **FLUX.1-Dev FP16**                                                                       |
|----------------------------|-----------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| **API Endpoint**           | `stabilityai/stable-diffusion-3.5-large-turbo`                                                     | `black-forest-labs/FLUX.1-Dev-FP16`                                                     |
| **Key API Parameters**     | `guidance_scale=3`, `num_inference_steps=4`, `negative_prompt`                                     | `num_inference_steps=8`, `control_image` (Canny/Depth), `true_cfg=4.0`                  |
| **API Latency**            | 3.8–5.2 sec/image (cold start: 12–18 sec)                                                          | 6.1–8.9 sec/image (cold start: 15–22 sec)                                                |
| **Cost Efficiency**        | $0.0021/image (PRO tier)                                                                           | $0.0033/image (PRO tier)                                                                 |
| **Free Tier Limits**       | 500 requests/hour                                                                                  | 300 requests/hour                                                                        |
| **Max Resolution**         | 1024x1024 via single API call                                                                      | 2048x2048 (requires `high_res_fix=true` parameter)                                       |
| **Advanced Features**      | - 4-step inference <br> - Text-to-image only                                                       | - Unified ControlNet (Canny/Depth) <br> - Image-to-image <br> - Inpainting/Outpainting   |
| **NSFW Filtering**         | Enabled by default (`safety_checker=strict`)                                                       | Optional (`safety_checker=relaxed`)                                                      |
| **Rate Limits (PRO)**      | 5K requests/hour                                                                                   | 3K requests/hour                                                                         |
| **Use Case Focus**         | Rapid batch generation (social media, prototyping)                                                 | High-detail workflows (product design, architectural viz)                                |

#### Using `SD-3.5-LT` (Stable Diffusion 3.5 Large Turbo) because
- its latency is lower
- it allows more requests per hour
- we want stylized images, which SD handles better
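As a rough sketch of how the parameters from the table could be applied on each call, the helper below forwards `guidance_scale=3` and `num_inference_steps=4` to a client's `text_to_image` method. `generate_turbo_image` and `SD_TURBO_DEFAULTS` are hypothetical names introduced here; the client is assumed to be a `huggingface_hub.InferenceClient` like the one created later in this notebook.

```python
from typing import Any, Optional

# Values taken from the "Key API Parameters" row of the table above.
SD_TURBO_DEFAULTS = {"guidance_scale": 3, "num_inference_steps": 4}


def generate_turbo_image(client: Any, prompt: str,
                         negative_prompt: Optional[str] = None):
    """Call client.text_to_image with the turbo defaults (hypothetical helper)."""
    kwargs = dict(SD_TURBO_DEFAULTS)
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt
    return client.text_to_image(prompt, **kwargs)


# Usage (needs network access and a valid token):
# from huggingface_hub import InferenceClient
# client = InferenceClient(model="stabilityai/stable-diffusion-3.5-large-turbo", token=API)
# generate_turbo_image(client, "a quiet suburban street at dusk").save("street.png")
```

Keeping the defaults in one dictionary makes it easy to try other step counts without touching every call site.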



___
___
___

### Get **Summary**, **Characters** AND **Places** 

![Local Image](/home/prince/Documents/Project/BOOK/media/Sudo_summary.png "Summary, Characters and Places")

### Imports

In [2]:
from reader import ebook
from utils import read_json
from typing import Optional,Dict,List
import requests
from dotenv import load_dotenv
import json
import os
from pydantic import BaseModel, ValidationError
from dataclasses import dataclass, field

### API init

In [3]:
##Class implementation
load_dotenv()
API = os.getenv("HF_API")

from huggingface_hub import InferenceClient
import time
from loguru import logger

client = InferenceClient(
    # model="strangerzonehf/Qs-Sketch",
    model="stabilityai/stable-diffusion-3.5-large-turbo",
    token=API,)

@dataclass
class api:
    url:str = field(init=False,default="https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3/v1/chat/completions") 
    
    headers:Dict[str,str]= field(init=False,default={"Authorization": f"Bearer {API}","Content-Type": "application/json",})
    
    image_model:str=field(init=False,default="stabilityai/stable-diffusion-3.5-large-turbo")
    
    api:str=field(init=False,default=API)
    
    client:InferenceClient
    
    
    def __post_init__(self):
        pass
        
    

ValueError: mutable default <class 'dict'> for field headers is not allowed: use default_factory
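The `ValueError` above is raised because `dataclasses` rejects a plain `dict` as a field default. A minimal sketch of one way around it, assuming the headers only depend on the token: declare the field with `default_factory` and fill it in `__post_init__`. `ApiConfig` is a hypothetical name, and the `InferenceClient` field is left out so the example runs on the standard library alone.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ApiConfig:
    token: str
    url: str = (
        "https://api-inference.huggingface.co/models/"
        "mistralai/Mistral-7B-Instruct-v0.3/v1/chat/completions"
    )
    image_model: str = "stabilityai/stable-diffusion-3.5-large-turbo"
    # default_factory builds a fresh dict per instance instead of sharing one.
    headers: Dict[str, str] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Derive the request headers from the token given at construction time.
        self.headers = {
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json",
        }
```

With this shape, `ApiConfig(token=os.getenv("HF_API"))` would give each instance its own `headers` dictionary.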

In [4]:
load_dotenv()
API = os.getenv("HF_API")

headers = {
    "Authorization": f"Bearer {API}",
    "Content-Type": "application/json",
}
# Hugging Face serverless endpoint (overridden by the RunPod endpoint below).
url = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3/v1/chat/completions"
# Active endpoint: RunPod-hosted server.
url = "https://v540lmjweza115-8000.proxy.runpod.net/v1/chat/completions"
# Alternatives tried:
# url = "https://huggingface.co/api/inference-proxy/together/v1/chat/completions"
# url = "http://localhost:11434/api/chat"


In [5]:
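from typing import Type

def validate_json(data: str, schema: Type[BaseModel]):
    """Parse `data` and validate it against `schema`.

    Returns (False, model) on success or (True, error_list) on a validation error.
    """
    try:
        parsed = read_json(data)
        validated_data = schema.model_validate(parsed)
        return (False, validated_data)
    except ValidationError as e:
        return (True, e.errors())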

### Static sum_Role

### **get_sum()** Function
![Local Image](/home/prince/Documents/Project/BOOK/media/Get_sum_logic.png "Summary, Characters and Places")


Initial state of `output_dict`:
```
output_dict = {
    0: {
        "summary": "No previous Context Yet",
        "characters": {},
        "places": {}
    }
}
```


In [24]:

class SummarySchema(BaseModel):
    summary: str
    characters: Dict[str,str]
    places: Dict[str,str]

In [35]:
sum_role='''(NOTE: Only output in JSON. Ensure the JSON format is valid, well-formed, and Ready to parse. nothing before or after the json file)
Input:  
1.Current Chapter Text: The current chapter to be analyzed.
2.Character List: A list of characters with their physical/visual descriptions till now (This chapter).
3.Places list: list of places and their visual description till now (This chapter).
4.Previous Chapters' Summary: Context from earlier chapters.

Rules:  
1.Narrative Summary: Summarize and explain the chapter in detail, integrating context and key developments from previous chapters, and create a self-contained summary and explanation. End with "to be continued".
2.Character List: add new characters to the list based on this chapter and Update existing character's physical/visual descriptions. If no characters are mentioned, return the same list as given.
3.Places: Include an updated description of any significant locations mentioned in this chapter, focusing on environment, weather, vibe, and structure.
4.Output Format: Ensure the output is valid and well-structured JSON.

Output:  
Generate a JSON object in this format:
{
  "summary": "Detailed summary and explanation of the current chapter in the context of previous chapters. Use the previous chapter summary as context",
  "characters": {
      "Character Name": "Updated or new physical/visual description (age, looks, clothes, hair, body language) based on this chapter."
    },
  "places": {
      "Place Name": "Updated or new visual description (environment, weather, vibe, structure, etc.) based on this chapter."
  }
}

json schema:
'''+ json.dumps(SummarySchema.model_json_schema())  # emit the schema as valid JSON, not a Python repr

In [38]:


    

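def sum_msg(text: str, role: str, context: str,
            characters: Optional[dict] = None, places: Optional[dict] = None) -> list:
    """Build the chat messages for one chapter-summary request."""
    # None defaults avoid sharing one mutable dict between calls.
    message = [
        {"role": "system", "content": role},
        {
            "role": "user",
            "content": json.dumps({
                "past_context": context,
                "Current_Chapter": text,
                "character_list": characters or {},
                "places_list": places or {},
            }),
        },
    ]
    return message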

def get_summ(messages: List[Dict[str, str]]) -> Optional[str]:
    """POST the messages to the chat endpoint; return the reply text, or None on error."""
    data = {
        "model": "mistralai/Mistral-7B-Instruct-v0.3",
        "messages": messages,
        # "temperature": 0.7,
        "stream": False,
        # "max_tokens": 10000,
    }
    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 200:
        response_data = response.json()
        return response_data["choices"][0]["message"]["content"]
    print(f"Error: {response.status_code}, {response.text}")
    return None



In [17]:
from reader import ebook
from utils import read_list,read_json
# book=ebook("./books/Alchemist/Alchemist.epub")
# book=Book("./book/Pixels.pdf")
book = ebook("./books/HP.epub")
out = book.get_chapters()



In [37]:
text=[i[1] for i in out]
text

['Description\n\u3000This is the braille version of the international bestseller. "Harry\nPotter and the Sorcerer\'s Stone" has reached a level of best-sellerdom\nnever before achieved by a children\'s novel in the United States--The\nNew York Times, April 1, 1999. If you haven\'t heard about this book,\nyou\'ve been asleep. Written for 8 to 12-year olds, "Harry Potter"\nappeals equally to adults. Who is Harry Potter? Harry Potter is an old-\nfashioned hero. He learns that choices show more of who one is than\nabilities. If you\'re looking for magic and adventure, read this book.\nFour volumes in braillle.\n\u3000\n',
 'CHAPTER ONE: THE BOY WHO LIVED\n\u3000Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to\nsay that they were perfectly normal, thank you very much. They were\nthe last people you\'d expect to be involved in anything strange or\nmysterious, because they just didn\'t hold with such nonsense.\n\u3000Mr. Dursley was the director of a firm called Grunnings, wh

In [272]:
output_dict={
    0:
        {
            "summary":"There is no previous context",
            "characters":{},
            "places": {}
        }
            }


In [29]:
output_dict

{0: {'summary': 'There is no previous context',
  'characters': {},
  'places': {}}}

____
____


### SummaryLoop

In [39]:
output_dict={
    0:
        {
            "summary":"There is no previous context",
            "characters":{},
            "places": {}
        }
            }

for idx, chapter in enumerate(text):
    previous = output_dict[idx]

    if len(chapter) < 200:
        # Too short to summarize (e.g. a title page): carry the previous state forward.
        output_dict[idx + 1] = previous
    else:
        mes = sum_msg(
            text=chapter,
            role=sum_role,
            context=previous["summary"],
            characters=previous["characters"],
            places=previous["places"],
        )
        summary_characters = get_summ(mes)

        if summary_characters is None:
            # API error: keep the previous state so later chapters still have context.
            print("API error")
            output_dict[idx + 1] = previous
        else:
            err, output_json = validate_json(summary_characters, SummarySchema)
            if err:
                print("error")
                output_dict[idx + 1] = previous
            else:
                output_dict[idx + 1] = output_json.model_dump()

    print(f"chapter: {idx} done")
print(output_dict)

chapter: 0 done
chapter: 1 done
chapter: 2 done
chapter: 3 done
chapter: 4 done
chapter: 5 done


KeyboardInterrupt: 
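The run above was interrupted after a handful of chapters, and each chapter depends on the state produced by the one before it. One possible way to avoid starting over, sketched here under the assumption that `output_dict` holds only JSON-serializable values, is to write it to disk after every chapter. `save_checkpoint`, `load_checkpoint` and the file path are hypothetical.

```python
import json
import os
from typing import Dict


def save_checkpoint(output_dict: Dict[int, dict], path: str) -> None:
    """Write progress via a temp file so an interrupted write leaves the old file intact."""
    tmp = f"{path}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(output_dict, fh, ensure_ascii=False, indent=2)
    os.replace(tmp, path)


def load_checkpoint(path: str) -> Dict[int, dict]:
    """Load saved progress, or return the initial empty context."""
    if not os.path.exists(path):
        return {0: {"summary": "There is no previous context",
                    "characters": {}, "places": {}}}
    with open(path, encoding="utf-8") as fh:
        # JSON object keys are always strings, so convert them back to ints.
        return {int(k): v for k, v in json.load(fh).items()}
```

A resumed loop could then call `load_checkpoint` once, skip every `idx` for which `idx + 1` is already a key, and call `save_checkpoint` at the end of each iteration.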

In [40]:
for num, dic in output_dict.items():
    if num == 0:
        continue
    print(f"CHAPTER {num} :\n")

    # Access fields by key so the output does not depend on dict ordering.
    print("Characters:")
    for name, description in dic["characters"].items():
        print(f"{name}: {description}")

    print("\nPlaces:")
    for name, description in dic["places"].items():
        print(f"{name}: {description}")
    print(f"\nSummary:\n{dic['summary']}")
    print("\n-------------------------------------------------------------------------------------------------------------------")

CHAPTER 1 :

Characters:
Harry Potter: Eight to twelve-year-old protagonist, described as an old-fashioned hero learning that choices show more of who one is than abilities.

Places:

Summary:
In this chapter of 'Harry Potter and the Sorcerer's Stone', the book's braille version is introduced as an international bestseller, appealing to both children and adults. The novel is described as an old-fashioned hero story, where Harry Potter learns that choices reveal more about one's character than abilities. The chapter emphasizes the magical and adventurous nature of the book.

-------------------------------------------------------------------------------------------------------------------
CHAPTER 2 :

Characters:
Harry Potter: Eight to twelve-year-old protagonist, described as an old-fashioned hero learning that choices show more of who one is than abilities.

Places:
Privet Drive: The suburban street where the Dursleys reside, described as being quiet and unassuming, providing a stark 

___
___
___


### Get **Scenes**
![Local Image](/home/prince/Documents/Project/BOOK/media/Sudo_summary.png )

In [218]:

class SceneSchema(BaseModel):
    scenes:Dict[str,str]
       
print(SceneSchema.model_json_schema())

{'properties': {'scenes': {'additionalProperties': {'type': 'string'}, 'title': 'Scenes', 'type': 'object'}}, 'required': ['scenes'], 'title': 'SceneSchema', 'type': 'object'}


In [237]:
scene_role=''' IMPORTANT-> ONLY OUTPUT IN JSON and nothing else.
                You are a text-to-image prompt generator for a book visualizer.
                Your task is to analyze the provided input text and find distinct scenes where the place or time changes.
                For each identified scene, create a detailed and descriptive prompt suitable for generating an image.
                Only include visual information.
                Handle names and pronouns by describing the character/place/object the name refers to.
                
                1. **Input:**
                    - `text`: A block of narrative text.
                2. **Output**
                    -  {"scenes" : {
                       "scene1" : "prompt",
                       "scene2" : "prompt"
                    }
                    }
                
                   '''

  

In [251]:
def scene_msg(text: str) -> list:
    message = [
        {
            "role": "system",
            "content":f"{scene_role} , enforce given schema : {SceneSchema.model_json_schema()} ",
        },
        
        {
            "role": "user",
            "content": text,  # This should be your input text that describes the scenes
        },]
        
    return message

from typing import Tuple

def get_scene(messages: List[dict]) -> Tuple[bool, Optional[str]]:
    """Request scene prompts; return (error_flag, reply_text)."""
    data = {
        "messages": messages,
        "max_tokens": 10000,  # Specify the maximum length of the response
        # "temperature": 2,  # Control the randomness of the response
        "stream": False,
        "repetition_penalty": 1.3,
        "grammar": {
            "type": "json",
            "value": SceneSchema.model_json_schema(),
        },
    }
    response = requests.post(url, headers=headers, json=data)

    # Check the response status code and process the output
    if response.status_code == 200:
        response_data = response.json()
        # Extract the assistant's message content
        assistant_message = response_data["choices"][0]["message"]["content"]
        return False, assistant_message
    print(f"Error: {response.status_code}, {response.text}")  # Print error details
    return True, None

### SceneLoop

In [257]:
text

['Chapter 1: The Two Worlds \n\nParth Jadeja leaned back in his chair, the glow of the monitor illuminating his face in the dimly lit room. His gaming setup—a mechanical keyboard, a high-refresh-rate monitor, and a headset resting on his shoulders—was his escape. For years, virtual battlegrounds had been his second home, a place where he mastered reflexes, strategy, and teamwork. But lately, another world has been demanding his attention—the world of AI and machine learning. \nBy day, he was a final-year Computer Engineering student specializing in AI/ML, dissecting algorithms, optimizing code, and diving into research papers. By night, he was R2K DESMOND, his Riot ID in Valorant, a competitor in the ever-evolving gaming realm. The two sides of his life coexisted in a delicate balance, but he knew the scale was beginning to tip. \nHis desk was a reflection of this dual existence—on one side, stacks of books on reinforcement learning, speech processing, and SEO strategies; on the other,

In [258]:
scene_output_list=[]
print("--Genrating Scenes per chapter--")
for idx,i in enumerate(text):
    inputs=i.replace("\n"," ")
    
    scene_message = scene_msg(inputs)
    err,scene_output=get_scene(scene_message)

    if err:
        print("API error")
        continue
    
    err,scene_json_output=validate_json(scene_output,SceneSchema)
    
    if err:
        print("validation error")
        
    scene_output_list.append(scene_json_output.scenes)
    print(f"chapter: {idx+1} Done")

--Genrating Scenes per chapter--
chapter: 1 Done
chapter: 2 Done
chapter: 3 Done
chapter: 4 Done
chapter: 5 Done
chapter: 6 Done
chapter: 7 Done
chapter: 8 Done
chapter: 9 Done
chapter: 10 Done
chapter: 11 Done
chapter: 12 Done
chapter: 13 Done
chapter: 14 Done
validation error


AttributeError: 'list' object has no attribute 'scenes'
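The `AttributeError` above arises because, after a validation error, `validate_json` returns the error list in place of a model, and the loop still reads `.scenes` from it. A minimal sketch of a loop that skips such chapters is shown below; `collect_scenes` is a hypothetical name, and the request and validation steps are passed in as callables so the example does not depend on the API.

```python
from typing import Callable, Dict, List, Optional, Tuple


def collect_scenes(
    chapters: List[str],
    get_scene: Callable[[str], Tuple[bool, Optional[str]]],
    validate: Callable[[str], Tuple[bool, object]],
) -> List[Dict[str, str]]:
    """Return one scenes dict per chapter; failed chapters get an empty dict."""
    results: List[Dict[str, str]] = []
    for idx, chapter in enumerate(chapters, start=1):
        err, raw = get_scene(chapter.replace("\n", " "))
        if err or raw is None:
            print(f"chapter {idx}: API error")
            results.append({})
            continue
        err, parsed = validate(raw)
        if err:
            # On failure `parsed` is the error list, which has no .scenes attribute.
            print(f"chapter {idx}: validation error")
            results.append({})
            continue
        results.append(parsed.scenes)
    return results
```

Appending an empty dict for failed chapters is a design assumption: it keeps list positions aligned with the chapter keys in `output_dict`, at the cost that the later image loop would need to skip empty entries instead of stopping at the first one.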

In [266]:
# scene_output_list
scene_output_ls=[]
for i in scene_output_list:
    per_chap_ls=[]
    for key,val in i.items():
        per_chap_ls.append(val)
    scene_output_ls.append(per_chap_ls)


In [267]:
scene_output_ls

[['A young man, Parth Jadeja, is sitting in a dimly lit room. He is focused on a computer monitor, which is casting a glow on his face. Nearby, there is a mechanical keyboard, a high-refresh-rate monitor, and a headset. The room is filled with the warmth of RGB lighting, creating a gaming-focused ambiance.',
  'In the same room, Parth is surrounded by stacks of books on topics like AI, machine learning, speech processing, and SEO strategies. Across from him, there is a large gaming rig that is a testament to his dedication to competitive gaming. Parth looks deep in thought as he toggles between his work on a laptop and his gaming equipment.'],
 ['A close-up image of Parth sitting at a computer, the cursor blinking on a blank screen, he looks contemplative, surrounded by books and gaming paraphernalia.',
  "A shot of Parth's computer screen displaying his response to the AI-driven system project offer, his words 'I'm interested. Tell me more.' glowing on the screen.",
  'A split screen 

___
___
___



## UNIMPLEMENTED


### Get **Style**
![Local Image](/home/prince/Documents/Project/BOOK/media/Get_Style_logic.png )

In [51]:
def basic_llm_req(text: str) -> Optional[str]:
    messages = [
        {
            "role":"system",
            "content":"DO As Asked in The Input"
        },
        {
            "role":"user",
            "content":text
        }
    ]
    
    data = {
        "messages": messages,
        "max_tokens": 10000,  # Specify the maximum length of the response
        "temperature": 0,  # Control the randomness of the response
        "stream": False,
    }
    response = requests.post(url, headers=headers, json=data)

    # Check the response status code and process the output
    if response.status_code == 200:
        response_data = response.json()
        # Extract the assistant's message content
        assistant_message = response_data["choices"][0]["message"]["content"]
        return assistant_message
    else:
        print(f"Error: {response.status_code}, {response.text}")  # Print error details
        return None
 

In [17]:
combined_summary = ""
for key, val in output_dict.items():
    if key == 0:
        continue
    val_content = val["summary"].replace("\n", " ")
    # Accumulate into combined_summary so every chapter is included.
    combined_summary = f"{combined_summary}... Chapter{key}: {val_content}"

In [18]:
style_prompt='''Prompt:
Note - don't give more than asked for.
Analyze the following story and provide a list of image style tags that would best suit its themes, settings, and overall mood.
The response should include the style, period, type of art, color palette, etc. One tag per entry.
Don't explain it, just give tags.

**ONLY TAGS**

Response Format:

Style: Realism or Impressionism or Surrealism or etc.
Type: Landscape or Portrait or Abstract or etc.
Color Palette: Warm tones or Cool colors or Monochromatic or black and white or etc.
Mood: Serene or Dramatic or Melancholic or etc.

required: (Style, Type, Color_palette, Mood)
Story
'''

style=basic_llm_req(f'''{style_prompt}:{combined_summary} ''')

In [123]:
style=''' line art illustration , Hand drawn illustrations, illustration, pattern designs, Fantasy. pale colors, soft colors,imperfect'''

___

## Get **Image**
![Local Image](/home/prince/Documents/Project/BOOK/media/Get_img_logic.png )

In [260]:
from huggingface_hub import InferenceClient
import time

client = InferenceClient(
    # model="strangerzonehf/Qs-Sketch",
    model="stabilityai/stable-diffusion-3.5-large-turbo",
    token=API,
)

In [261]:
def get_images(key, text, tag, characters, places):
    """Generate one image for a scene prompt and save it under ./output."""
    image = client.text_to_image(
        f"Text: {text}// context-> characters: {characters}, places:{places}",
        # negative_prompt="hand,feet,text,written,shinny,artificial,unnatural,plastic,words,letters",
        # guidance_scale=2,
        # num_inference_steps=10
    )
    os.makedirs("./output", exist_ok=True)  # make sure the target folder exists
    image.save(f"./output/PD_{key}_{tag}.png")
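`get_images` passes the full character and place dictionaries in every prompt and does not use the style tags yet. As one possible refinement, sketched under the assumption that a simple case-insensitive substring match is good enough, the prompt could include only the entries named in the scene and end with the style string. `build_image_prompt` is a hypothetical helper.

```python
from typing import Dict


def build_image_prompt(scene: str, style: str,
                       characters: Dict[str, str], places: Dict[str, str]) -> str:
    """Compose a prompt from the scene, matching context entries, and style tags."""
    lowered = scene.lower()
    # Keep only entries whose name occurs in the scene text (substring match).
    context = [
        f"{name}: {desc}"
        for name, desc in {**characters, **places}.items()
        if name.lower() in lowered
    ]
    parts = [scene.strip()] + context + [style.strip()]
    return ". ".join(p.rstrip(".") for p in parts if p)
```

The match is deliberately naive: a character stored as "Harry Potter" would be missed by a scene that only says "Harry", so name aliases would need separate handling.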

### Loop Logic
![Local Image](/home/prince/Documents/Project/BOOK/media/Image_Loop.png )

In [268]:

for idx,i in enumerate(scene_output_ls):
    chars=output_dict[idx+1]["characters"]
    places=output_dict[idx+1]["places"]
    if not i:
        break 
    for jdx,j in enumerate(i):
        ##try except for internal server error
        try:
            key=f"C{idx}S{jdx+1}"
            text=j
            print(text)
            get_images(key=key,text=text,tag="",characters=chars,places=places)
            print(f"{key} saved. tag :  ")
            print(f"    prompt:{text} \n    characters:{chars}\n    places:{places}")
            time.sleep(18)
        except Exception as e:
            print(f"Error while calling the API :{e} \n waiting for 10 seconds.. ")
            time.sleep(10)
            get_images(key=key,text=text,tag="",characters=chars,places=places)
            print(f"{key} saved. tag : ")
            print(f"    prompt:{text}")


A young man, Parth Jadeja, is sitting in a dimly lit room. He is focused on a computer monitor, which is casting a glow on his face. Nearby, there is a mechanical keyboard, a high-refresh-rate monitor, and a headset. The room is filled with the warmth of RGB lighting, creating a gaming-focused ambiance.
C0S1 saved. tag :  
    prompt:A young man, Parth Jadeja, is sitting in a dimly lit room. He is focused on a computer monitor, which is casting a glow on his face. Nearby, there is a mechanical keyboard, a high-refresh-rate monitor, and a headset. The room is filled with the warmth of RGB lighting, creating a gaming-focused ambiance. 
    characters:{}
    places:{}
In the same room, Parth is surrounded by stacks of books on topics like AI, machine learning, speech processing, and SEO strategies. Across from him, there is a large gaming rig that is a testament to his dedication to competitive gaming. Parth looks deep in thought as he toggles between his work on a laptop and his gaming e
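
The `try`/`except` in the image loop retries once, and a second failure inside the `except` branch would stop the whole run. A bounded retry helper is one way to handle that; this is a sketch with hypothetical names (`run_with_retry`, `attempts`, `wait_seconds`), not part of the original notebook.

```python
import time
from typing import Callable


def run_with_retry(fn: Callable[[], object], attempts: int = 3,
                   wait_seconds: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call fn up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        try:
            fn()
            return True
        except Exception as exc:  # broad on purpose, like the loop's except clause
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                sleep(wait_seconds)
    return False
```

Inside the loop this could be used as `run_with_retry(lambda: get_images(key=key, text=text, tag="", characters=chars, places=places))`, logging the key and moving on when it returns `False`.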