# Nosana AI Inference with OpenAI SDK

## Introduction

This tutorial shows how to use the OpenAI SDK to connect directly to AI models deployed on [Nosana](https://nosana.com)'s GPU network. Nosana services expose OpenAI-compatible endpoints, making it easy to integrate with existing AI applications.

## What You'll Learn

- Connect OpenAI SDK to Nosana AI endpoints
- Generate text with different parameters
- Use streaming for real-time responses
- Process multiple requests efficiently
- Build practical AI workflows

## Prerequisites

- Python 3.8+
- Basic understanding of AI APIs
- A deployed Nosana AI service URL

## Setup & Installation

In [None]:
# Uncomment the next line to install the required packages
# !pip install openai requests pillow matplotlib python-dotenv -q

In [11]:
import os

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env file
load_dotenv()

True

## Connect to Nosana AI Service

The service URL is read from the `NOSANA_BASE_URL` variable in your `.env` file. Most Nosana deployments expose their OpenAI-compatible API under the `/v1` path, so the client below appends `/v1` to that URL.

In [12]:
# Load Nosana service URL from environment variables
NOSANA_BASE_URL = os.getenv("NOSANA_BASE_URL")
MODEL_NAME = "DeepSeek-R1-Distill-Qwen-1.5B"  # Replace with your model name

if not NOSANA_BASE_URL:
    raise ValueError("NOSANA_BASE_URL not found in environment variables. Please check your .env file.")

# Strip any trailing slash so the endpoint does not end in "//v1"
API_BASE_URL = f"{NOSANA_BASE_URL.rstrip('/')}/v1"

# Initialize OpenAI client with Nosana endpoint
client = OpenAI(
    base_url=API_BASE_URL,
    api_key="nosana-key"  # Placeholder; many Nosana services don't require a real API key
)

print("🚀 Connected to Nosana AI service")
print(f"📍 Endpoint: {API_BASE_URL}")
print(f"🤖 Model: {MODEL_NAME}")

🚀 Connected to Nosana AI service
📍 Endpoint: https://4oetidyuynh82uhbxwfmgmkyniw3fvyrz92eqtkwj6yb.node.k8s.prd.nos.ci/v1
🤖 Model: DeepSeek-R1-Distill-Qwen-1.5B


In [3]:
# Check available models (optional)
try:
    models = client.models.list()
    print("\n📋 Available models:")
    for model in models.data:
        print(f"  • {model.id}")
except Exception as e:
    print(f"\n⚠️ Could not list models: {e}")
    print("This is normal for some Nosana deployments.")


📋 Available models:
  • DeepSeek-R1-Distill-Qwen-1.5B
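
The model list can also be used to validate `MODEL_NAME` before sending any requests. The following is a minimal sketch, assuming the endpoint implements `/v1/models`; `resolve_model` is a hypothetical helper name, not part of the OpenAI SDK.

```python
def resolve_model(client, preferred):
    """Return `preferred` if the server lists it, else the first listed model.

    Falls back to `preferred` unchanged when the model list is unavailable.
    """
    try:
        available = [model.id for model in client.models.list().data]
    except Exception:
        # Some deployments do not expose /v1/models; trust the configured name
        return preferred
    if not available or preferred in available:
        return preferred
    return available[0]
```

With the client from the previous cell, this could be called as `MODEL_NAME = resolve_model(client, MODEL_NAME)`.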


## Basic Text Generation

In [4]:
# Simple text generation
def generate_text(prompt, max_tokens=300, temperature=0.7, stream=False):
    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "user", "content": prompt}
            ],
            max_tokens=max_tokens,
            temperature=temperature,
            stream=stream
        )
        
        if stream:
            return response  # Return generator for streaming
        else:
            return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

# Test basic generation
test_prompt = "Explain what Nosana is in simple terms."
print(f"🧪 Testing: {test_prompt}")
print("\n📝 Response:")
response = generate_text(test_prompt)
print(response)

🧪 Testing: Explain what Nosana is in simple terms.

📝 Response:
Okay, so I need to explain what Nosana is in simple terms. Hmm, I'm not exactly sure what Nosana is, but I've heard the name before. Maybe it's a video game? I'll try to break it down.

First, I remember that Nosana has something to do with space exploration or maybe something related to space travel. It's probably a game where you play as a crew member. So maybe it's like a role-playing game where you're part of a space mission.

I think about the crew members. They would probably have to deal with things like spacewalks, repairs, and maybe even friendly or hostile alien creatures. That makes sense because in space, you have to be prepared for various situations.

I also recall that there's a lot of technology involved. Maybe it's like a spaceship that you pilot, but the spaceship is actually made up of different parts. Each part could represent different systems or systems within the spaceship. That sounds a bit complica
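
The response above opens with the model's step-by-step reasoning rather than a direct answer, which is typical of DeepSeek-R1 distills. The sketch below separates that reasoning from the final answer and checks whether generation stopped at the token limit. It assumes the deployment returns the reasoning inline and closes it with a `</think>` tag (inspect your raw output first, since servers differ here); `split_reasoning` and `was_truncated` are illustrative names, not SDK functions.

```python
def split_reasoning(text, close_tag="</think>"):
    """Split model output into (reasoning, answer) at the closing think tag."""
    reasoning, sep, answer = text.partition(close_tag)
    if not sep:
        # No closing tag found: the two parts cannot be told apart,
        # so return everything as the answer and let the caller decide
        return "", text.strip()
    # The opening tag may or may not be present, depending on the chat template
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


def was_truncated(response):
    """True if the first choice stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"
```

If `was_truncated(response)` is true and no closing tag was found, the model most likely ran out of tokens while still reasoning, and raising `max_tokens` is the first thing to try.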

## Streaming Responses

In [5]:
# Streaming for real-time output
def stream_response(prompt, max_tokens=500, temperature=0.7):
    print(f"📝 Streaming response for: {prompt}")
    print("\n" + "=" * 60)
    
    try:
        stream = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            temperature=temperature,
            stream=True
        )
        
        full_response = ""
        for chunk in stream:
            # Some servers send chunks with no choices (e.g. a final usage chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                full_response += content
                print(content, end="", flush=True)
        
        print("\n" + "=" * 60)
        print("✅ Streaming complete!\n")
        return full_response
        
    except Exception as e:
        print(f"\nError: {str(e)}")
        return None

# Example streaming
stream_prompt = "Write a detailed explanation of how blockchain technology works"
streamed_text = stream_response(stream_prompt, max_tokens=600, temperature=0.5)

📝 Streaming response for: Write a detailed explanation of how blockchain technology works

Okay, so I need to explain how blockchain technology works in detail. I remember that blockchain is the technology used in Bitcoin, but I'm a bit fuzzy on the exact steps. Let me try to break it down.

First, I think blockchain is a distributed ledger, right? It's like a digital document that's secure and tamper-proof. But how does it work exactly? I remember something about nodes, which are computers or other blockchain systems. So, nodes exchange blocks of data, which are like records of transactions. But how does this exchange happen?

I think it's through something called a proof-of-work mechanism. That means each node has to do some computational work to validate the block. So, the more work you do, the more you get paid. That makes sense because it adds a layer of security since it's hard to cheat.

Wait, but how does the proof-of-work work in detail? I think it involves creating a hash, wh
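
The benefit of streaming is easiest to see by measuring how long the first token takes to arrive. The sketch below consumes a stream, collects the text, and reports that delay. `collect_stream`, `started_at`, and `on_token` are hypothetical names; the function only assumes chunks shaped like the ones handled in `stream_response`.

```python
import time


def collect_stream(chunks, started_at=None, on_token=None):
    """Consume a chat-completion stream.

    Returns (full_text, seconds_to_first_token). The second value is None
    if the stream produced no text. Pass `started_at` (a time.perf_counter()
    reading taken before the request) to include connection time.
    """
    start = time.perf_counter() if started_at is None else started_at
    first_token_after = None
    parts = []
    for chunk in chunks:
        # Skip chunks that carry no choices or no text
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if not content:
            continue
        if first_token_after is None:
            first_token_after = time.perf_counter() - start
        parts.append(content)
        if on_token is not None:
            on_token(content)
    return "".join(parts), first_token_after
```

A typical call would take `started = time.perf_counter()` just before `client.chat.completions.create(..., stream=True)` and then pass the returned stream along with `started_at=started` and `on_token=lambda t: print(t, end="", flush=True)`.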

## Conversation Context & Memory

In [7]:
# Multi-turn conversation with context
def have_conversation(messages, new_message, max_tokens=300, temperature=0.7):
    # Add new message to conversation
    messages.append({"role": "user", "content": new_message})
    
    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature
        )
        
        assistant_message = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_message})
        
        return assistant_message, messages
    except Exception as e:
        # Drop the unanswered user message so the history stays well-formed
        messages.pop()
        return f"Error: {str(e)}", messages

# Start a conversation about AI
conversation = []

print("💬 Multi-turn Conversation Example:\n")

# Turn 1
response1, conversation = have_conversation(
    conversation, 
    "Hi! Can you explain what machine learning is?"
)
print("👤 User: Hi! Can you explain what machine learning is?")
print(f"🤖 Assistant: {response1}\n")

# Turn 2
response2, conversation = have_conversation(
    conversation, 
    "That's interesting! Can you give me a real-world example?"
)
print("👤 User: That's interesting! Can you give me a real-world example?")
print(f"🤖 Assistant: {response2}\n")

# Turn 3
response3, conversation = have_conversation(
    conversation, 
    "How does that relate to what you explained earlier?"
)
print("👤 User: How does that relate to what you explained earlier?")
print(f"🤖 Assistant: {response3}\n")

print(f"📊 Conversation length: {len(conversation)} messages")

💬 Multi-turn Conversation Example:

👤 User: Hi! Can you explain what machine learning is?
🤖 Assistant: Okay, so I need to explain what machine learning is. Hmm, where do I start? I've heard about it a lot, but I'm not exactly sure what it really is. Let me think. I know it has something to do with computers learning from data, but I'm not entirely clear on the specifics.

Maybe I should break it down. From what I remember, machine learning is a type of artificial intelligence, right? So AI is about machines doing tasks like speech recognition or computer vision, but I think machine learning is a subset of that. But how exactly does it work? Is it about algorithms?

I think there are different types of machine learning, like supervised, unsupervised, and reinforcement learning. Wait, what's the difference between them? I remember supervised learning involves labeled data, so the model knows the correct answers and learns from them. Unsupervised is when there's no labeled data, and it fi
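
Because `have_conversation` resends the whole history on every turn, a long conversation keeps growing and will eventually exceed the model's context window. One simple way to bound it is to keep only the most recent messages. This is a sketch: `trim_history` and its `max_messages` default are illustrative, and it counts messages rather than tokens.

```python
def trim_history(messages, max_messages=10):
    """Return a copy of `messages` with system messages plus the latest turns."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    recent = others[-max_messages:] if max_messages > 0 else []
    # Start on a user message so the model never sees an orphaned reply
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

It could be applied just before each request, for example `have_conversation(trim_history(conversation), "next question")`. The original list is left untouched, so the full transcript can still be kept for logging.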

## Batch Processing

In [8]:
# Process multiple texts one at a time, applying the same instruction to each
def batch_process(texts, task_instruction, max_tokens=200, temperature=0.3):
    results = []
    
    for i, text in enumerate(texts, 1):
        print(f"📄 Processing {i}/{len(texts)}: {text[:50]}...")
        
        prompt = f"{task_instruction}\n\nText: {text}"
        
        try:
            response = client.chat.completions.create(
                model=MODEL_NAME,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=max_tokens,
                temperature=temperature
            )
            
            result = response.choices[0].message.content
            results.append({
                "index": i,
                "original": text,
                "processed": result
            })
            
        except Exception as e:
            results.append({
                "index": i,
                "original": text,
                "error": str(e)
            })
    
    return results

# Example: Summarize multiple articles
articles = [
    "Nosana is building a decentralized GPU network that makes AI compute more accessible and affordable. The platform allows anyone to rent GPU power from a distributed network of providers, reducing costs and increasing availability for AI developers.",
    
    "Blockchain technology has evolved beyond cryptocurrencies to enable new forms of decentralized computing. Smart contracts can automatically manage resource allocation, payments, and service level agreements in distributed networks.",
    
    "The demand for GPU computing has exploded with the rise of AI applications. Traditional cloud providers often have limited availability and high costs, creating opportunities for decentralized alternatives that can utilize idle hardware from various sources.",
    
    "Decentralized infrastructure offers several advantages including censorship resistance, reduced single points of failure, and more competitive pricing through open markets. These benefits are particularly valuable for AI workloads that require significant computational resources."
]

task = "Summarize this text in one clear sentence:"

print(f"🔄 Batch Processing: Summarizing {len(articles)} articles\n")
summaries = batch_process(articles, task, max_tokens=100, temperature=0.2)

print("\n📊 Results:")
print("=" * 80)
for summary in summaries:
    if "error" in summary:
        print(f"\n{summary['index']}. Error: {summary['error']}")
    else:
        print(f"\n{summary['index']}. Original: {summary['original'][:80]}...")
        print(f"    Summary: {summary['processed']}")

🔄 Batch Processing: Summarizing 4 articles

📄 Processing 1/4: Nosana is building a decentralized GPU network tha...
📄 Processing 2/4: Blockchain technology has evolved beyond cryptocur...
📄 Processing 3/4: The demand for GPU computing has exploded with the...
📄 Processing 4/4: Decentralized infrastructure offers several advant...

📊 Results:

1. Original: Nosana is building a decentralized GPU network that makes AI compute more access...
    Summary: Okay, so I need to summarize the given text into one clear sentence. Let me read through the text again to make sure I understand it properly.

The text says, "Nosana is building a decentralized GPU network that makes AI compute more accessible and affordable. The platform allows anyone to rent GPU power from a distributed network of providers, reducing costs and increasing availability for AI developers."

Alright, so the main points are:

1. Nosana is creating a decentralized GPU network.
2. This network aims

2. Original: Blockchain tec
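
`batch_process` sends its requests one after another, so total time grows linearly with the number of texts. Since each request spends most of its time waiting on the network, a thread pool can overlap them. The sketch below assumes the deployment can serve several requests at once; `batch_process_concurrent`, `process_one`, and the `max_workers` default are hypothetical and worth tuning against your own service.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_process_concurrent(texts, process_one, max_workers=4):
    """Apply `process_one` to each text in worker threads, preserving order."""

    def safe(indexed):
        index, text = indexed
        try:
            return {"index": index, "original": text, "processed": process_one(text)}
        except Exception as e:
            # Record the failure and keep going, as batch_process does
            return {"index": index, "original": text, "error": str(e)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order
        return list(pool.map(safe, enumerate(texts, 1)))
```

For the summarization example, `process_one` could be `lambda text: generate_text(f"{task}\n\nText: {text}", max_tokens=100, temperature=0.2)`. The result has the same shape as the output of `batch_process`, so the printing loop above works unchanged.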

## Best Practices & Tips

### Optimizing Requests
- **Temperature Settings**: Use 0.1-0.3 for factual tasks, 0.7-0.9 for creative tasks
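- **Token Limits**: Set `max_tokens` high enough for a complete answer but no higher. Reasoning models such as DeepSeek-R1-Distill emit their reasoning before the answer, and those tokens count toward the limit; in the batch example above, the first summary appears to run out of its 100 tokens mid-reasoning.
- **Batch Processing**: Reuse one instruction and one set of parameters across similar inputs, as `batch_process` does, so results stay comparable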
- **Caching**: Store responses for repeated queries
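
The caching tip can be sketched as a small in-memory wrapper around any generation function. `ResponseCache` is a hypothetical class, not part of the OpenAI SDK, and it assumes that identical prompts with identical parameters may share a response, which fits low-temperature, factual requests better than creative ones.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by the prompt and its generation parameters."""

    def __init__(self, generate):
        self._generate = generate  # callable(prompt, **params) -> str
        self._store = {}
        self.hits = 0

    def _key(self, prompt, params):
        # sort_keys makes the key independent of keyword-argument order
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, prompt, **params):
        key = self._key(prompt, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._generate(prompt, **params)
        self._store[key] = result
        return result
```

With the helpers from this tutorial, `cache = ResponseCache(generate_text)` followed by `cache.get(prompt, max_tokens=100, temperature=0.2)` would only call the service on the first request. Because `generate_text` returns error strings instead of raising, a more careful version would skip caching those.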

### Performance Tips
- Use streaming for long responses to improve perceived speed
- Monitor response times and adjust timeout settings
- Consider prompt length impact on processing time
- Test different model configurations for your use case
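
To act on the monitoring tip, latency can be recorded per request and then used to pick a timeout. The sketch below uses only the standard library. `timed` and `suggest_timeout` are hypothetical helpers, and the rule of three times the median with a 5-second floor is an arbitrary starting point, not a recommendation from Nosana or OpenAI.

```python
import time
from statistics import median


def timed(fn, *args, **kwargs):
    """Call `fn` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def suggest_timeout(latencies, factor=3.0, floor=5.0):
    """Heuristic timeout: a multiple of the median latency, never below `floor`."""
    return max(floor, factor * median(latencies))


# Example wiring (commented out because it needs a live service):
# latencies = [timed(generate_text, "ping", max_tokens=5)[1] for _ in range(5)]
# client = OpenAI(base_url=..., api_key=..., timeout=suggest_timeout(latencies), max_retries=2)
```

The OpenAI client constructor accepts `timeout` and `max_retries`, so the suggested value can be applied when the client is created.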

## Summary

You've learned how to:
- ✅ **Connect OpenAI SDK to Nosana endpoints** for seamless integration
- ✅ **Generate text with various parameters** for different use cases
- ✅ **Use streaming responses** for better user experience
- ✅ **Maintain conversation context** across multiple turns
- ✅ **Process batches efficiently** for high-volume applications
- ✅ **Build complete workflows** for real-world applications

🚀 **You're now ready to build powerful AI applications using Nosana's GPU network with the familiar OpenAI SDK!**