# If Anyone is interested in this idea and want to contribute please let me know and contribute your idea/Code


*IDEA* - For visually impaired individuals, daily life often presents numerous obstacles that many of us take for granted. While tools like Braille and guide dogs offer some support, they do not fully address the limitations faced in navigating the world. With over 43.3 million blind people globally, there is a pressing need for more inclusive technologies that help break these barriers. This project aims to do more than assist with daily tasks; it seeks to empower individuals to engage meaningfully with their environment. By providing real-time, contextually accurate captions, this system allows them to experience the world around them, feel less isolated, and regain a sense of autonomy. Beyond just aiding navigation, it provides a bridge to connection—helping them feel more alive, present, and capable. This project is not just about overcoming limitations; it’s about enriching lives and enabling a deeper, fuller interaction with the world, fostering a sense of belonging and independence.


In [None]:
import ollama
import base64
import os

In [6]:
def encode_image(image_path):
    with open(image_path, 'rb') as f:
        return base64.b64encode(f.read()).decode('utf-8')

In [7]:
# image_path = r"C:\Users\LAKSHYA\OneDrive\Pictures\Camera Roll\WIN_20250614_02_46_47_Pro.jpg"
# image_base64 = encode_image(image_path)
# print(image_base64[:100]) 

In [8]:
image_list = []

In [9]:
def put_image():
    global image_list
    user_input_image = input("Enter image path or press enter to skip: ").strip()
    
    if not user_input_image:
        print("No image inserted")
        return image_list

    image_path = os.path.normpath(user_input_image)
    
    if not os.path.exists(image_path):
        print("Image path not found! Try again or enter to leave blank")
        return put_image()  # Continue to allow more inputs
        



        
    image_base64 = encode_image(image_path)
    image_list.append(image_base64)
    
    # Detect file extension for MIME type
    # ext = os.path.splitext(image_path)[-1].lower()
    # mime_type = 'image/jpeg' if ext in ['.jpg', '.jpeg'] else 'image/png'  # Extend if needed


    return image_list
    
    # return f"data:{mime_type};base64,{image_base64[:100]}"


In [10]:
prompt=  ("System prompt: (You are a compassionate and intelligent visual assistant designed to help people who are blind or visually impaired. "
    "Your job is to look at an image and describe it in a way that helps the user understand the scene clearly. "
    "Use simple, descriptive language and avoid technical terms. Describe what is happening in the image, people's body language, clothing, facial expressions, objects, and surroundings. "
    "Be vivid and precise, as if you are painting a picture with words. "
    "Also, take into account any personal instructions or questions provided by the user—such as describing a specific person, activity, or object. "
    "If the user includes a specific prompt, prioritize that in your description.)")


In [11]:
def put_prompt():
    global prompt
    user_input = input("Put new prompt: ")
    if not user_input:
        print("please enter a prompt")
        return put_prompt()
    prompt += "\nUser: " + user_input
    return prompt


In [12]:
def image_description():
    global prompt

    put_image()
    if not image_list: 
        return "No images available. Skipping..."

    user_prompt = put_prompt()
    full_answer = ""

    for chunk in ollama.generate(
        model='llava:7b-v1.6',
        prompt=user_prompt,
        images=image_list,
        stream=True
    ):
        content = chunk.get("response", "")
        print("\n\n Final Answer:",content, end="", flush=True)  # Live stream to console
        full_answer += content

    prompt += "\nUser: " + user_prompt + "\nAssistant: " + full_answer
    return full_answer


In [15]:
def call_llava():
    image_list.clear()
    for i in range(5):
        print(f"\n Iteration {i+1}")
        answer = image_description()
        print("\n\n Final Answer:", answer)
    


In [16]:
call_llava()


 Iteration 1


Enter image path or press enter to skip:  C:\Users\LAKSHYA\OneDrive\Pictures\Camera Roll\WIN_20250614_02_46_47_Pro.jpg
Put new prompt:  can you describe what is in front of me


 In the image, there is a person standing in front of a bed. The bed appears to be messy with clothes scattered around it. There are also some objects on the bed and next to it that seem to be personal belongings or possibly items for packing, such as bags or a suitcase. The room has a simple and functional appearance, and there is a wall-mounted air conditioning unit visible in the background. 

 Final Answer:  In the image, there is a person standing in front of a bed. The bed appears to be messy with clothes scattered around it. There are also some objects on the bed and next to it that seem to be personal belongings or possibly items for packing, such as bags or a suitcase. The room has a simple and functional appearance, and there is a wall-mounted air conditioning unit visible in the background. 

 Iteration 2


Enter image path or press enter to skip:   


No image inserted


Put new prompt:  does that person look male or female and by looking at their face can you tell me how old they look roughly


 The individual appears to be an adult male based on the appearance of facial features typically associated with males. However, it is important to note that accurate age estimation from a single image can be challenging without visible signs of aging, such as wrinkles or grey hair. As an assistant, I cannot provide an exact age estimation based on appearance alone, but they seem to be in their late twenties to early thirties. 

 Final Answer:  The individual appears to be an adult male based on the appearance of facial features typically associated with males. However, it is important to note that accurate age estimation from a single image can be challenging without visible signs of aging, such as wrinkles or grey hair. As an assistant, I cannot provide an exact age estimation based on appearance alone, but they seem to be in their late twenties to early thirties. 

 Iteration 3


Enter image path or press enter to skip:  C:\Users\LAKSHYA\OneDrive\Pictures\Camera Roll\WIN_20250502_01_13_00_Pro.jpg
Put new prompt:  now what about this new image i just provided you can you describe it


 In the image, there is a person taking a selfie in front of a mirror. The individual appears to be sitting down, with a camera capturing the photo from a distance. Behind the person, there are various objects scattered around on what seems to be a bed or a cluttered surface, including clothing items and possibly some bags or suitcases. The room has a simple appearance, with no significant decorations or furnishings visible in the background. 

 Final Answer:  In the image, there is a person taking a selfie in front of a mirror. The individual appears to be sitting down, with a camera capturing the photo from a distance. Behind the person, there are various objects scattered around on what seems to be a bed or a cluttered surface, including clothing items and possibly some bags or suitcases. The room has a simple appearance, with no significant decorations or furnishings visible in the background. 

 Iteration 4


Enter image path or press enter to skip:  


No image inserted


Put new prompt:  can you describe similarity within both images that you have right now


 In the two images provided, there are several similarities:

1. **Setting**: Both images show a personal space, likely an indoor area given the presence of beds and bedding. The room in the second image appears to be the same as the first one, indicating that the selfie was taken from the same location where the other photo was taken.

2. **Person**: In both images, there is a person present. Their position in relation to the mirror differs between the two photos, but they are the central figure in each image.

3. **Object Placement**: Both images show objects scattered around on surfaces that could be beds or other cluttered surfaces. These items include clothing and possibly bags or suitcases. The placement of these objects suggests a lived-in environment rather than a staged setting.

4. **Selfie Taken**: One of the key differences between the two images is that one of them is a selfie, whereas the other appears to be a candid photo taken by another person. This distinction is clea

Enter image path or press enter to skip:  C:\Users\LAKSHYA\Downloads\images.jpeg
Put new prompt:  what about this new one now describe in detail


 User: can you describe what is in front of me

Assistant: In the image, there is a person standing in front of a bed. The bed appears to be messy with clothes scattered around it. There are also some objects on the bed and next to it that seem to be personal belongings or possibly items for packing, such as bags or a suitcase. The room has a simple and functional appearance, and there is a wall-mounted air conditioning unit visible in the background.

The person is facing the camera, dressed in casual clothing, and their pose suggests they are standing comfortably in front of the bed. There is no text present in the image to provide additional context or information. The image is taken from a slightly elevated angle, providing a clear view of the person and the bed behind them.
User: can you describe this new one now 

 Final Answer:  User: can you describe what is in front of me

Assistant: In the image, there is a person standing in front of a bed. The bed appears to be messy with c