%pip install llama-index-llms-ollama

%pip install streamlit llama-index-llms-ollama llama-index-readers-file llama-index-readers-web

# Day 1: Using an Ollama model

In [1]:
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3.2", request_timeout=120.0)
resp = llm.complete("Who is Paul Graham?")
print(resp)

Paul Graham is a well-known American entrepreneur, programmer, and investor. He is the co-founder of Y Combinator, a popular startup accelerator program that has invested in many successful companies such as Airbnb, Reddit, and Stripe.

Graham was born in 1964 and grew up in New Hampshire. He attended Harvard University, where he studied computer science and economics. After graduating from Harvard, Graham moved to California and worked at various tech companies, including Omidyar Network and the Stanford Artificial Intelligence Laboratory (SAIL).

In 2005, Graham co-founded Y Combinator with his brother Robert, Kyle, and Jeff Chan. The first class of Y Combinator startups included companies such as Zappos, StumbleUpon, and Spiceworks. Since then, Y Combinator has become one of the most successful startup accelerators in the world, having invested in over 2,000 companies.

Graham is known for his straightforward and often contrarian approach to investing in startups. He is a strong bel

In [2]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)

assistant: Me hearty! Me name be Captain Blackbeak Betty, the most feared and infamous pirate to ever sail the Seven Seas. Me reputation precedes me, and me name strikes terror into the hearts o' all who hear it. But don't ye worry, I be a pirate with a heart o' gold... and a penchant for tellin' tales and drinkin' grog! *takes a swig from a nearby flask*

Now, what brings ye to these fair waters? Are ye lookin' to join me crew and sail the high seas with the bravest pirate on the ocean? Or perhaps ye be wantin' to learn the secrets o' the sea from yours truly? Whatever yer reason, I be willin' to listen... but don't think about tryin' to steal me treasure, or ye'll be walkin' the plank!


# Streaming

In [3]:
response = llm.stream_complete("Who is Paul Graham?")

In [4]:
for r in response:
    print(r.delta, end="")

Paul Graham is a well-known entrepreneur, investor, and computer programmer. He was born in 1964 in Oakland, California.

Graham co-founded several successful software companies, including:

1. Viaweb (now Shopify): In 1996, he co-founded Viaweb, which provided an online platform for creating e-commerce websites. The company gained popularity, especially among small businesses and individuals, and eventually sold to the Canadian company NetSuite in 2009.
2. Userland: Graham also founded Userland, a company that developed various web applications, including the popular "wiki" site, Zim Wiki.

Graham is particularly known for his work at Y Combinator, a venture capital firm he co-founded with Jeff Clavier and Robert Musoch in 2005. Y Combinator focuses on investing in startups and providing them with resources, mentorship, and networking opportunities to help them grow and succeed.

Paul Graham has been an influential figure in the startup ecosystem, particularly among entrepreneurs and 
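
The streaming loop above prints each chunk and then discards it. If the full text is also needed afterwards, the deltas can be collected while printing. The following is a minimal sketch: `fake_stream` is a hypothetical stand-in for `llm.stream_complete(...)` so the example runs without an Ollama server, and `collect_stream` is an illustrative helper name, not part of LlamaIndex.

```python
from types import SimpleNamespace


def fake_stream():
    """Hypothetical stand-in for llm.stream_complete(); yields objects with a .delta attribute."""
    for piece in ["Paul ", "Graham ", "is ", "a ", "programmer."]:
        yield SimpleNamespace(delta=piece)


def collect_stream(chunks):
    """Print each delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        delta = chunk.delta or ""  # treat a missing delta as an empty string
        print(delta, end="", flush=True)
        parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With a real model, the same helper should work on the generator returned by `llm.stream_complete(...)`, since the loop only relies on the `.delta` attribute already used above.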

# Using the stream_chat endpoint

In [6]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")

Arrrr, me hearty! Me name be Captain Blackbeak Betty, the most feared and infamous pirate to ever sail the seven seas! *winks* Me and me trusty crew o' scurvy dogs have been plunderin' and pillagin' for nigh on 20 years, and we've got a reputation for bein' the most cunning and ruthless buccaneers on the high seas!

Me ship, the "Black Swan", be me pride and joy. She's fast, she's deadly, and she's got more secrets than a chest overflowin' with golden doubloons! *chuckles*

Now, what be bringin' ye to these waters? Are ye lookin' to join me crew and sail the seas with the infamous Captain Blackbeak Betty? Or maybe ye just want to hear tales o' adventure and bravery on the high seas? Whatever yer reason, I be willin' to listen... for a price, o' course! *winks*

# JSON Mode

In [8]:
llm = Ollama(model="llama3.2:latest", request_timeout=120.0, json_mode=True)
response = llm.complete(
    "Who is Paul Graham? Output as a structured JSON object."
)
print(str(response))

{
    "name": "Paul Graham",
    "title": "",
    "occupation": "Entrepreneur, Investor, and Philosopher",
    "nationality": "American",
    "birth_date": "1950-02-16",
    "death_date": null,
    "known_for": [
        "Founding of Y Combinator",
        "Support for Startup Founders",
        "Philosophy on Entrepreneurship"
    ],
    "influential_works": [],
    "education": "",
    "awards": []
}
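
The JSON-mode response is still a plain string, so it has to be parsed before individual fields can be used. A minimal sketch with the standard library follows; `raw` is a shortened copy of the output above, and `parse_json_response` is an illustrative helper name, not a LlamaIndex function.

```python
import json

# Shortened copy of the JSON-mode output shown above
raw = '{"name": "Paul Graham", "death_date": null, "known_for": ["Founding of Y Combinator"], "awards": []}'


def parse_json_response(text):
    """Parse a model response that is expected to be a single JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    return data


person = parse_json_response(raw)
print(person["name"])
print(person["known_for"])
```

With a live model, `str(response)` would take the place of `raw`. JSON mode constrains the syntax only, so the field names and values still need to be checked by the caller.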


# Structured Outputs


In [10]:
from llama_index.core.bridge.pydantic import BaseModel


class Song(BaseModel):
    """A song with name and artist."""

    name: str
    artist: str

llm = Ollama(model="llama3.2:latest", request_timeout=120.0)

sllm = llm.as_structured_llm(Song)
from llama_index.core.llms import ChatMessage

response = sllm.chat([ChatMessage(role="user", content="Name a random song!")])
print(response.message.content)

{"name":"Pumped Up Kicks","artist":"Foster the People"}


# With async

In [11]:
response = await sllm.achat(
    [ChatMessage(role="user", content="Name a random song!")]
)
print(response.message.content)

{"name":"Mr. Brightside","artist":"The Killers"}
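
The top-level `await` above works because a notebook already runs an event loop. In a plain Python script the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below assumes a hypothetical `fake_achat` coroutine in place of `sllm.achat(...)` so that it runs without a model, and it also shows two requests issued concurrently with `asyncio.gather`.

```python
import asyncio


async def fake_achat(prompt):
    """Hypothetical stand-in for sllm.achat(); returns a canned JSON string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return '{"name":"Mr. Brightside","artist":"The Killers"}'


async def main():
    # Run both requests concurrently and wait for both results
    return await asyncio.gather(
        fake_achat("Name a random song!"),
        fake_achat("Name another random song!"),
    )


# In a script (not a notebook), start the event loop explicitly
results = asyncio.run(main())
for item in results:
    print(item)
```

Inside a notebook cell, `asyncio.run` raises a `RuntimeError` because a loop is already running, so there `await main()` is the form to use.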


In [12]:
response_gen = sllm.stream_chat(
    [ChatMessage(role="user", content="Name a random song!")]
)
for r in response_gen:
    print(r.message.content)

{"name":null,"artist":null}
{"name":null,"artist":null}
{"name":null,"artist":null}
{"name":"","artist":null}
{"name":"Ever","artist":null}
{"name":"Everlong","artist":null}
{"name":"Everlong","artist":null}
{"name":"Everlong","artist":null}
{"name":"Everlong","artist":null}
{"name":"Everlong","artist":null}
{"name":"Everlong","artist":""}
{"name":"Everlong","artist":"Foo"}
{"name":"Everlong","artist":"Foo Fighters"}
{"name":"Everlong","artist":"Foo Fighters"}
{"name":"Everlong","artist":"Foo Fighters"}
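
As the output above shows, the structured stream emits a growing snapshot of the object on every chunk, with many consecutive repeats. A small sketch of how the repeats could be skipped and only the final object kept: `snapshots` is a subset of the lines printed above, and `distinct_snapshots` is an illustrative helper name.

```python
import json

# A subset of the snapshots printed above, in order
snapshots = [
    '{"name":null,"artist":null}',
    '{"name":null,"artist":null}',
    '{"name":"","artist":null}',
    '{"name":"Ever","artist":null}',
    '{"name":"Everlong","artist":null}',
    '{"name":"Everlong","artist":null}',
    '{"name":"Everlong","artist":""}',
    '{"name":"Everlong","artist":"Foo"}',
    '{"name":"Everlong","artist":"Foo Fighters"}',
    '{"name":"Everlong","artist":"Foo Fighters"}',
]


def distinct_snapshots(texts):
    """Yield each parsed snapshot once, skipping consecutive duplicates."""
    previous = None
    for text in texts:
        if text != previous:
            yield json.loads(text)
            previous = text


changes = list(distinct_snapshots(snapshots))
final_song = changes[-1]  # the last snapshot is the complete object
print(len(changes), final_song)
```

With a live stream, `r.message.content` from the loop above would be fed to the helper instead of the hard-coded list.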


# Multi-Modal Support

In [14]:
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.ollama import Ollama

llm = Ollama(model="minicpm-v", request_timeout=120.0)

messages = [
    ChatMessage(
        role="user",
        blocks=[
            TextBlock(text="What is this?"),
            ImageBlock(path="image.png"),
        ],
    ),
]

resp = llm.chat(messages)
print(resp)

assistant: It's a promotional graphic. It shows the title "OLLAMA WEB UI" which may refer to an application or service related to web user interface design, possibly named after Olloama. The man on the right could be associated with OLLAMA WEB UI as it seems he represents them or is endorsing their product/service.


In [7]:
from PIL import Image
import numpy as np
from pydub import AudioSegment
from scipy.interpolate import interp1d

def image_to_sound(image_path, output_mp3_path, duration=5, sample_rate=44100):
    """
    Convert an image to a sound and save it as an MP3 file.

    Parameters:
    - image_path: str, path to the input image file (e.g., 'image.jpg').
    - output_mp3_path: str, path to save the output MP3 file (e.g., 'output.mp3').
    - duration: float, desired duration of the audio in seconds (default=5).
    - sample_rate: int, sample rate of the audio in Hz (default=44100).
    
    Raises:
    - ValueError: If the image has no pixels or the duration is too short.
    - FileNotFoundError: If the image file cannot be found.
    """
    # Load the image and convert it to grayscale
    img = Image.open(image_path).convert('L')
    pixels = np.array(img).flatten()
    N = len(pixels)
    
    if N == 0:
        raise ValueError("Image has no pixels")
    
    # Interpolate pixel values to match the desired audio duration
    M = int(duration * sample_rate)
    if M == 0:
        raise ValueError("Duration too short")
    
    x = np.linspace(0, 1, N)
    f = interp1d(x, pixels, kind='linear')
    x_new = np.linspace(0, 1, M)
    samples = f(x_new)
    
    # Normalize samples to the range [-1, 1]
    normalized = (samples / 127.5) - 1
    
    # Convert to 16-bit audio samples
    audio_samples = (normalized * 32767).astype(np.int16)
    
    # Create raw audio data
    sample_bytes = audio_samples.tobytes()
    
    # Create an audio segment (mono, 16-bit, 44100 Hz by default)
    audio = AudioSegment(
        data=sample_bytes,
        sample_width=2,  # 2 bytes = 16 bits
        frame_rate=sample_rate,
        channels=1  # Mono audio
    )
    
    # Export the audio as an MP3 file
    audio.export(output_mp3_path, format="mp3")

# Generate a 10-second MP3 from an image
image_to_sound("image.png", "output_sound.mp3", duration=10)
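
The core of the function above is three steps: stretch the pixel values to the target number of samples by linear interpolation, map 0..255 to the signed 16-bit range, and write the samples out. The sketch below reproduces those steps with the standard library only, as a way to check the arithmetic without Pillow, SciPy, or an MP3 encoder. It is a simplified variant, not the function above: it takes a plain list of grayscale values instead of an image file and writes WAV instead of MP3. The name `pixels_to_wav` and the demo file name are assumptions made for this example.

```python
import array
import os
import sys
import tempfile
import wave


def pixels_to_wav(pixels, wav_path, duration=1.0, sample_rate=8000):
    """Write grayscale values (0..255) as a mono 16-bit WAV of the given duration."""
    if not pixels:
        raise ValueError("No pixel values given")
    n_out = int(duration * sample_rate)
    if n_out == 0:
        raise ValueError("Duration too short")
    n_in = len(pixels)
    samples = array.array("h")  # signed 16-bit integers
    for j in range(n_out):
        # Position of output sample j in the input index space
        pos = j * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        value = pixels[lo] * (1 - frac) + pixels[hi] * frac  # linear interpolation
        # Map 0..255 to -1..1, then scale to the 16-bit range
        samples.append(int((value / 127.5 - 1) * 32767))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())
    return n_out


demo_path = os.path.join(tempfile.gettempdir(), "pixels_demo.wav")
n_written = pixels_to_wav([0, 64, 128, 255], demo_path, duration=0.5, sample_rate=8000)
print(n_written)
```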

In [12]:
from PIL import Image
import numpy as np
from scipy.interpolate import interp1d
from scipy.fft import irfft
from pydub import AudioSegment
import os

def image_to_sound(image_path, output_mp3_path, duration=None, sample_rate=44100, default_samples_per_row=44):
    """
    Convert an image to a sound and save it as an MP3 file, using all three RGB channels.

    Parameters:
    - image_path: str, path to the input image file (e.g., 'image.jpg').
    - output_mp3_path: str, path to save the output MP3 file (e.g., 'output.mp3').
    - duration: float or None, desired duration of the audio in seconds (optional).
    - sample_rate: int, sample rate of the audio in Hz (default=44100).
    - default_samples_per_row: int, samples per row if duration is not specified (default=44).
    
    Raises:
    - ValueError: If the image has no pixels.
    - FileNotFoundError: If the image file cannot be found.
    """
    # Check if the image file exists
    if not os.path.exists(image_path):
        raise FileNotFoundError(f"Image file not found: {image_path}")
    
    # Load the image and convert to RGB
    img = Image.open(image_path).convert('RGB')
    img_array = np.array(img)  # Shape: (H, W, 3)
    H, W, _ = img_array.shape
    
    # Validate image dimensions
    if H == 0 or W == 0:
        raise ValueError("Image has no pixels")
    
    # Calculate samples per row
    if duration is not None:
        M = max(2, round(duration * sample_rate / H))
    else:
        M = default_samples_per_row  # Default to 44 samples per row (~1ms at 44100 Hz)
    
    K = M // 2 + 1  # Number of frequency bins for IRFFT
    
    # Divide frequency bins among R, G, B
    K1 = K // 3
    K2 = K // 3
    K3 = K - K1 - K2
    
    # Precompute interpolation points for resampling
    x = np.linspace(0, 1, W)
    x_new_R = np.linspace(0, 1, K1)
    x_new_G = np.linspace(0, 1, K2)
    x_new_B = np.linspace(0, 1, K3)
    
    audio_segments = []
    
    # Process each row of the image
    for i in range(H):
        # Extract and normalize RGB values for row i
        R_row = img_array[i, :, 0] / 255.0  # Red channel
        G_row = img_array[i, :, 1] / 255.0  # Green channel
        B_row = img_array[i, :, 2] / 255.0  # Blue channel
        
        # Resample each channel to its frequency bins
        interp_R = interp1d(x, R_row, kind='linear', bounds_error=False, fill_value=0)
        interp_G = interp1d(x, G_row, kind='linear', bounds_error=False, fill_value=0)
        interp_B = interp1d(x, B_row, kind='linear', bounds_error=False, fill_value=0)
        
        R_resampled = interp_R(x_new_R)
        G_resampled = interp_G(x_new_G)
        B_resampled = interp_B(x_new_B)
        
        # Construct the frequency spectrum
        S = np.zeros(K, dtype=np.float64)
        S[0:K1] = R_resampled          # Red frequencies
        S[K1:K1 + K2] = G_resampled    # Green frequencies
        S[K1 + K2:K] = B_resampled     # Blue frequencies
        
        # Generate time-domain signal using inverse real FFT
        s_i = irfft(S, n=M)
        audio_segments.append(s_i)
    
    # Concatenate all row signals into one waveform
    a = np.concatenate(audio_segments)
    
    # Normalize to prevent clipping
    max_amp = np.max(np.abs(a))
    if max_amp > 0:
        a = a / max_amp * 0.9  # Scale to 90% of maximum amplitude
    
    # Convert to 16-bit audio samples
    audio_samples = (a * 32767).astype(np.int16)
    
    # Create raw audio data
    sample_bytes = audio_samples.tobytes()
    
    # Create an audio segment (mono, 16-bit, 44100 Hz by default)
    audio = AudioSegment(
        data=sample_bytes,
        sample_width=2,  # 2 bytes = 16 bits
        frame_rate=sample_rate,
        channels=1  # Mono audio
    )
    
    # Export the audio as an MP3 file
    audio.export(output_mp3_path, format="mp3")
    
    # Print the actual duration
    actual_duration = len(audio_samples) / sample_rate
    print(f"Generated audio duration: {actual_duration:.2f} seconds")

# Example usage
if __name__ == "__main__":
    # With duration specified
    image_to_sound("image.png", "output_with_duration.mp3", duration=10)
    
    # Without duration (uses default samples per row)
    image_to_sound("/home/ramachandra/Pictures/Screenshot from 2025-03-07 13-28-46.png", "output_no_duration.mp3")

Generated audio duration: 10.00 seconds
Generated audio duration: 0.29 seconds
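
The two durations printed above follow directly from the per-row arithmetic in the function: each image row contributes `M` samples, so the total length is `H * M / sample_rate`. The sketch below repeats only that arithmetic, together with the R/G/B frequency-bin split. The helper name `plan_row_synthesis` and the 290-row height are hypothetical; the notebook does not state the real image sizes, and 290 was picked only because it gives about 0.29 seconds at the default 44 samples per row.

```python
def plan_row_synthesis(height, duration=None, sample_rate=44100, default_samples_per_row=44):
    """Return samples per row, the (R, G, B) bin split, and the resulting duration."""
    if height <= 0:
        raise ValueError("Image has no pixels")
    if duration is not None:
        m = max(2, round(duration * sample_rate / height))
    else:
        m = default_samples_per_row
    k = m // 2 + 1  # number of frequency bins for an inverse real FFT of length m
    k1 = k // 3     # bins assigned to red
    k2 = k // 3     # bins assigned to green
    k3 = k - k1 - k2  # blue takes the remainder
    return {
        "samples_per_row": m,
        "bins": (k1, k2, k3),
        "duration_s": height * m / sample_rate,
    }


# Hypothetical 290-row image, no duration given
default_plan = plan_row_synthesis(290)
print(default_plan["bins"], f"{default_plan['duration_s']:.2f}")

# Same hypothetical image with a 10-second target
timed_plan = plan_row_synthesis(290, duration=10)
print(timed_plan["samples_per_row"], f"{timed_plan['duration_s']:.2f}")
```

Because `M` is rounded to a whole number of samples per row, the actual duration only approximates the requested one, which is why the function prints the real value after exporting.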
