---

# Summary and Best Practices

## Key Takeaways

1. **OpenAI API**: Powerful for text generation, function calling, and embeddings
2. **Image Generation**: Use DALL-E or Stable Diffusion for creating images from text
3. **Word Embeddings**: Convert text to vectors for semantic understanding
4. **RAG**: Combine retrieval with generation for accurate, context-aware responses
5. **Weaviate**: Scalable vector database with built-in RAG capabilities

## Best Practices

### Security
- Never hardcode API keys
- Use environment variables or secret management systems
- Implement rate limiting and usage monitoring
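Client-side rate limiting usually also means handling HTTP 429 responses gracefully. The sketch below retries a call with exponential backoff and jitter. `with_backoff` and `RateLimitedError` are hypothetical names. In real code you would catch the rate-limit exception your SDK raises (for example `openai.RateLimitError` in the OpenAI Python library, which also has its own configurable `max_retries`).

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for an SDK's rate-limit exception (HTTP 429)."""


def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `call()`, retrying on RateLimitedError with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitedError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay, scaled by random jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


# Usage with a fake call that is rate-limited twice before succeeding
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitedError("429 Too Many Requests")
    return "ok"


result = with_backoff(flaky_call, sleep=lambda seconds: None)  # skip real sleeping in the demo
print(result, attempts["count"])
```

Jitter spreads retries from many clients over time, so they don't all hit the API again at the same moment.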

### Performance
- Cache embeddings to avoid redundant API calls
- Batch API requests when possible
- Use streaming for real-time applications
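Caching and batching combine naturally: look up each text in a cache first, then send only the misses to the embeddings API, several per request. This is a minimal in-memory sketch. `fake_embed_batch` is a hypothetical stand-in for a real batched call (the OpenAI embeddings endpoint accepts a list of inputs), and `EmbeddingCache` is an illustrative name. A production cache would typically persist to disk or a key-value store.

```python
import hashlib


def fake_embed_batch(texts):
    """Hypothetical stand-in for a batched embedding API call (one request, many inputs)."""
    fake_embed_batch.calls += 1
    # Deterministic toy vectors derived from a hash; a real API returns model embeddings
    return [[b / 255 for b in hashlib.sha256(t.encode("utf-8")).digest()[:4]] for t in texts]


fake_embed_batch.calls = 0


class EmbeddingCache:
    """In-memory cache keyed by (model, text) that embeds only cache misses, in batches."""

    def __init__(self, embed_batch, model="text-embedding-3-small", batch_size=100):
        self.embed_batch = embed_batch
        self.model = model
        self.batch_size = batch_size
        self.store = {}

    def get(self, texts):
        # Unique texts (order preserved) that are not cached yet for this model
        missing = [t for t in dict.fromkeys(texts) if (self.model, t) not in self.store]
        # Send uncached texts in batches instead of one request per text
        for start in range(0, len(missing), self.batch_size):
            batch = missing[start:start + self.batch_size]
            for text, vector in zip(batch, self.embed_batch(batch)):
                self.store[(self.model, text)] = vector
        return [self.store[(self.model, t)] for t in texts]


cache = EmbeddingCache(fake_embed_batch, batch_size=2)
first = cache.get(["cat", "dog", "cat", "bird"])  # 3 unique texts -> 2 batched calls
second = cache.get(["dog", "bird"])               # everything cached -> no new calls
print(fake_embed_batch.calls)
```

Keying on the model name as well as the text keeps vectors from different embedding models from being mixed up.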

### RAG Optimization
- Chunk documents appropriately (500-1000 tokens is a common starting point; tune it for your data)
- Use hybrid search for better retrieval
- Implement re-ranking for improved relevance
- Add metadata for filtering
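Chunking with overlap means each chunk repeats the tail of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk. The sketch below is a simplified fixed-window splitter that counts characters rather than tokens. `chunk_text` is a hypothetical helper; its `chunk_size`/`chunk_overlap` names mirror LangChain's text splitter parameters, but real splitters such as `RecursiveCharacterTextSplitter` also prefer to break at paragraph and sentence boundaries.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap by `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # each chunk starts with the last character of the previous one
```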

### Cost Management
- Choose appropriate models (GPT-3.5 vs GPT-4)
- Monitor token usage
- Use smaller embedding models when possible
- Implement caching strategies

## Additional Resources

- [OpenAI Documentation](https://platform.openai.com/docs)
- [Weaviate Documentation](https://weaviate.io/developers/weaviate)
- [LangChain Documentation](https://python.langchain.com/)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)

---

**Happy Learning! 🚀**

In [None]:
# Complete RAG system using Weaviate with functions

def weaviate_rag_ask(client, question, collection_name="Article", num_results=3):
    """
    Ask a question and get an answer with sources using Weaviate
    
    Args:
        client: Weaviate client instance
        question: The question to ask
        collection_name: Name of the collection to search
        num_results: Number of relevant documents to retrieve
    
    Returns:
        Dictionary with 'answer' and 'sources'
    """
    try:
        articles = client.collections.get(collection_name)
        
        # Perform generative search
        response = articles.generate.near_text(
            query=question,
            limit=num_results,
            grouped_task=f"Answer this question: {question}\n\nBased on the following articles, provide a comprehensive answer. Cite specific information from the articles."
        )
        
        result = {
            'answer': response.generated,
            'sources': []
        }
        
        for item in response.objects:
            result['sources'].append({
                'title': item.properties.get('title', 'N/A'),
                'category': item.properties.get('category', 'N/A'),
                'content': item.properties.get('content', '')[:200] + '...'
            })
        
        return result
    
    except Exception as e:
        return {
            'answer': f"Error: {e}",
            'sources': []
        }

# Usage example (requires a running Weaviate instance and a connected `client`)
"""
question = "What is machine learning and how does it work?"
result = weaviate_rag_ask(client, question)

print(f"Question: {question}\n")
print(f"Answer: {result['answer']}\n")
print("Sources:")
for i, source in enumerate(result['sources'], 1):
    print(f"{i}. {source['title']} ({source['category']})")
"""

print("Weaviate RAG function defined successfully!")

## 5.10 Complete RAG with Weaviate

Wrap Weaviate's generative search in a reusable function that returns the answer together with its sources:

In [None]:
# Filtering
"""
from weaviate.classes.query import Filter

# Search with filter
response = articles.query.near_text(
    query="programming",
    limit=5,
    filters=Filter.by_property("category").equal("Programming"),
    return_properties=["title", "category"]
)

print("Filtered Search Results:")
for item in response.objects:
    print(f"- {item.properties['title']} ({item.properties['category']})")
"""

# Aggregations
"""
# Count objects by category
response = articles.aggregate.over_all(
    group_by="category"
)

print("\nArticles by Category:")
for group in response.groups:
    print(f"{group.grouped_by.value}: {group.total_count}")
"""

print("Filtering and aggregation examples provided above (commented out)")

## 5.9 Filtering and Aggregations

Apply filters and perform aggregations:
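Conceptually, a property filter keeps only matching objects, and a grouped aggregation counts objects per property value. The in-memory equivalent below uses plain dicts and no Weaviate at all; it only illustrates what the queries compute. Weaviate evaluates these server-side, and it matches `text` properties according to their tokenization setting, so its results can differ from Python's exact `==` comparison for multi-word values.

```python
from collections import Counter

# A few objects shaped like the "Article" collection's properties
rows = [
    {"title": "Introduction to Machine Learning", "category": "AI"},
    {"title": "Understanding Neural Networks", "category": "Deep Learning"},
    {"title": "Python Programming Best Practices", "category": "Programming"},
]

# Filter: keep only objects whose category equals "Programming"
programming = [row["title"] for row in rows if row["category"] == "Programming"]

# Aggregation: count objects per category value
counts = Counter(row["category"] for row in rows)

print(programming)
print(dict(counts))
```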

In [None]:
# Generative search - RAG with Weaviate
"""
response = articles.generate.near_text(
    query="machine learning",
    limit=2,
    single_prompt="Explain this article in simple terms: {content}"
)

print("Generative Search Results (RAG):")
print("=" * 70)
for item in response.objects:
    print(f"\nTitle: {item.properties['title']}")
    print(f"Generated Summary: {item.generated}")
"""

print("Generative search example provided above (commented out)")

## 5.8 Generative Search (RAG with Weaviate)

Use Weaviate's built-in RAG capabilities:
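The two generative modes differ in how many LLM calls they make: `single_prompt` fills `{property}` placeholders from each retrieved object and generates once per object, while `grouped_task` sends one prompt covering the whole result set. The sketch below imitates that difference locally. `render_single_prompt` and `render_grouped_task` are hypothetical helpers, and the grouped layout is an approximation, not Weaviate's exact internal prompt format.

```python
import re


def render_single_prompt(template, properties):
    """Fill {property} placeholders from one object's properties (one prompt per object)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(properties.get(m.group(1), "")), template)


def render_grouped_task(task, objects):
    """Approximate a grouped task: one prompt containing the task plus every retrieved object."""
    context = "\n\n".join(
        "\n".join(f"{key}: {value}" for key, value in obj.items()) for obj in objects
    )
    return f"{task}\n\n{context}"


demo_objects = [
    {"title": "Introduction to Machine Learning", "content": "Machine learning is a subset of AI."},
    {"title": "Understanding Neural Networks", "content": "Neural networks are inspired by the brain."},
]

# single_prompt style: one generation per object
for obj in demo_objects:
    print(render_single_prompt("Explain this article in simple terms: {content}", obj))

# grouped_task style: one generation for the whole result set
print(render_grouped_task("Summarize these articles.", demo_objects))
```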

In [None]:
# Hybrid search (combines keyword and vector search)
"""
from weaviate.classes.query import HybridFusion

response = articles.query.hybrid(
    query="Python programming",
    limit=2,
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector, 0.5 = balanced
    fusion_type=HybridFusion.RELATIVE_SCORE,
    return_properties=["title", "category", "content"]
)

print("Hybrid Search Results:")
print("=" * 70)
for item in response.objects:
    print(f"\nTitle: {item.properties['title']}")
    print(f"Category: {item.properties['category']}")
"""

print("Hybrid search example provided above (commented out)")

## 5.7 Hybrid Search

Combine keyword search with vector search:
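As we understand Weaviate's `RELATIVE_SCORE` fusion, the keyword (BM25) and vector result lists are each min-max normalized to the range [0, 1] and then combined with `alpha` weighting the vector side. The sketch below reproduces that idea in plain Python. It is not Weaviate's code; the function names, document ids, and scores are made up for illustration.

```python
def min_max_normalize(scores):
    """Scale a {id: score} dict to [0, 1]; the best result gets 1 and the worst gets 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}  # all tied: treat every result as best
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def relative_score_fusion(keyword_scores, vector_scores, alpha=0.5):
    """Combine normalized scores: alpha weights the vector side, (1 - alpha) the keyword side."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    fused = {}
    for doc_id in kw.keys() | vec.keys():
        # A document missing from one result list contributes 0 from that side
        fused[doc_id] = alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy scores: BM25 scores on the keyword side, cosine similarities on the vector side
keyword_scores = {"python-best-practices": 4.2, "intro-ml": 1.1}
vector_scores = {"python-best-practices": 0.82, "intro-ml": 0.79, "neural-nets": 0.75}

ranked = relative_score_fusion(keyword_scores, vector_scores, alpha=0.5)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

Normalizing first matters because BM25 scores and cosine similarities live on different scales; without it, whichever side has larger raw numbers would dominate regardless of `alpha`.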

In [None]:
# Semantic search
"""
articles = client.collections.get("Article")

# Search for articles about AI and learning
response = articles.query.near_text(
    query="artificial intelligence and learning algorithms",
    limit=2,
    return_properties=["title", "content", "category"]
)

print("Search Results:")
print("=" * 70)
for item in response.objects:
    print(f"\nTitle: {item.properties['title']}")
    print(f"Category: {item.properties['category']}")
    print(f"Content: {item.properties['content'][:100]}...")
"""

print("Semantic search example provided above (commented out)")

## 5.6 Semantic Search

Search for similar content using vector similarity:

In [None]:
from datetime import datetime, timezone

# Sample articles (DATE properties are stored as RFC 3339 timestamps, which include a timezone)
sample_articles = [
    {
        "title": "Introduction to Machine Learning",
        "content": "Machine learning is a subset of artificial intelligence that focuses on building systems that can learn from data. It involves training algorithms on datasets to make predictions or decisions without being explicitly programmed.",
        "category": "AI",
        "published_date": datetime(2024, 1, 15, tzinfo=timezone.utc).isoformat()
    },
    {
        "title": "Understanding Neural Networks",
        "content": "Neural networks are computational models inspired by the human brain. They consist of layers of interconnected nodes that process information and learn patterns from data through training.",
        "category": "Deep Learning",
        "published_date": datetime(2024, 2, 20, tzinfo=timezone.utc).isoformat()
    },
    {
        "title": "Python Programming Best Practices",
        "content": "Python is known for its readability and simplicity. Following PEP 8 style guide, writing docstrings, and using type hints are essential practices for maintainable code.",
        "category": "Programming",
        "published_date": datetime(2024, 3, 10, tzinfo=timezone.utc).isoformat()
    }
]

# Insert data
"""
articles = client.collections.get("Article")

# insert_many sends all objects in one batched request instead of one call per object
response = articles.data.insert_many(sample_articles)
if response.has_errors:
    print(f"Some inserts failed: {response.errors}")
else:
    print(f"Inserted {len(sample_articles)} articles")
"""

print("Data insertion example provided above (commented out)")

## 5.5 Adding Data

Insert data into your Weaviate collection:
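Weaviate's `DATE` data type stores RFC 3339 timestamps, and RFC 3339 requires a timezone offset. A naive `datetime` (one without `tzinfo`) produces an ISO string with no offset, which Weaviate may reject at insert time. Attaching a timezone avoids the problem, as this small comparison shows:

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 15).isoformat()
aware = datetime(2024, 1, 15, tzinfo=timezone.utc).isoformat()

print(naive)  # 2024-01-15T00:00:00        (no timezone offset)
print(aware)  # 2024-01-15T00:00:00+00:00  (UTC offset included)
```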

In [None]:
from weaviate.classes.config import Configure, Property, DataType

# Create a collection for articles
"""
try:
    # Delete collection if it exists
    if client.collections.exists("Article"):
        client.collections.delete("Article")
    
    # Create new collection
    articles = client.collections.create(
        name="Article",
        properties=[
            Property(
                name="title",
                data_type=DataType.TEXT,
                description="Title of the article"
            ),
            Property(
                name="content",
                data_type=DataType.TEXT,
                description="Content of the article"
            ),
            Property(
                name="category",
                data_type=DataType.TEXT,
                description="Category of the article"
            ),
            Property(
                name="published_date",
                data_type=DataType.DATE,
                description="Publication date"
            )
        ],
        # Configure vectorizer (use OpenAI embeddings)
        vectorizer_config=Configure.Vectorizer.text2vec_openai(),
        # Configure generative module (use OpenAI for generation)
        generative_config=Configure.Generative.openai()
    )
    print("Collection 'Article' created successfully!")
except Exception as e:
    print(f"Error creating collection: {e}")
"""

print("Schema creation example provided above (commented out)")

## 5.4 Creating a Schema

Define a collection (class) to store your data:

In [None]:
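import os

import weaviate
from weaviate.classes.init import Auth

# Connect to a Weaviate Cloud cluster (WEAVIATE_URL and WEAVIATE_API_KEY are assumed env var names)
"""
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),  # e.g. "your-cluster-url.weaviate.network"
    auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
    headers={
        # Forwarded to OpenAI by the OpenAI vectorizer and generative modules
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
    }
)
print("Connected to Weaviate!" if client.is_ready() else "Weaviate is not ready")

# Call client.close() when you are done to release the connection
"""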

print("Weaviate connection example provided above (commented out)")

## 5.3 Connecting to Weaviate

You can use a managed cluster on Weaviate Cloud Services (WCS).

To get started, sign up at [Weaviate Cloud Services](https://console.weaviate.cloud/) for a free cluster, then note its cluster URL and API key.

In [None]:
# Install the Weaviate Python client (the examples below use the v4 client API)
# !pip install -U weaviate-client

## 5.2 Installation and Setup

---

# 5. Weaviate Vector Database

Weaviate is an open-source vector database designed for storing and searching embeddings at scale.

## 5.1 What is Weaviate?

- **Vector Database**: Stores high-dimensional vectors (embeddings)
- **Semantic Search**: Find similar items using vector similarity
- **Hybrid Search**: Combine keyword and vector search
- **GraphQL API**: Query data using GraphQL
- **Modules**: Built-in integrations with OpenAI, Cohere, Hugging Face, etc.

In [None]:
# !pip install langchain langchain-openai langchain-chroma

"""
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
# RetrievalQA is a legacy chain; newer LangChain releases steer toward LCEL-based retrieval chains
from langchain.chains import RetrievalQA

# Load and split documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

documents = ["Your long documents here..."]
texts = text_splitter.create_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings)

# Create RAG chain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)

# Ask questions
response = qa_chain.invoke("Your question here")
print(response['result'])
"""

print("LangChain RAG example provided above (commented out)")

## 4.4 Advanced RAG with LangChain

LangChain provides ready-made building blocks (text splitters, embedding and vector store wrappers, retrieval chains) for RAG applications:

In [None]:
# Ask a question using the RAG system
question = "What is deep learning and how does it relate to neural networks?"

result = generate_rag_response(question, knowledge_base, document_embeddings)

print(f"Question: {question}\n")
print("=" * 70)
print(f"\nAnswer:\n{result['answer']}\n")
print("=" * 70)
print("\nSources:")
for i, source in enumerate(result['sources'], 1):
    print(f"{i}. (Score: {source['score']:.4f}) {source['document']}")

## 4.3 Using the RAG System

Now let's ask questions using our RAG system:

In [None]:
# Simple RAG implementation using functions

# Example knowledge base
knowledge_base = [
    "Python was created by Guido van Rossum and first released in 1991. It emphasizes code readability with significant whitespace.",
    "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.",
    "Neural networks are computing systems inspired by biological neural networks. They consist of interconnected nodes (neurons) organized in layers.",
    "Deep learning uses neural networks with multiple layers to progressively extract higher-level features from raw input.",
    "Natural Language Processing (NLP) is a branch of AI that helps computers understand, interpret and manipulate human language.",
    "Computer vision is a field of AI that trains computers to interpret and understand the visual world using digital images and videos.",
    "Reinforcement learning is an area of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties."
]

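# Index the documents (create all embeddings in a single batched API call)
print("Indexing documents...")
embedding_response = client.embeddings.create(input=knowledge_base, model="text-embedding-3-small")
document_embeddings = [item.embedding for item in embedding_response.data]
print(f"Indexed {len(knowledge_base)} documents")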

def retrieve_documents(query, documents, doc_embeddings, top_k=3):
    """
    Retrieve most relevant documents for a query
    
    Args:
        query: Search query string
        documents: List of document texts
        doc_embeddings: List of document embeddings
        top_k: Number of top documents to retrieve
    
    Returns:
        List of dictionaries with 'document' and 'score'
    """
    query_embedding = get_embedding(query)
    similarities = cosine_similarity([query_embedding], doc_embeddings)[0]
    top_indices = np.argsort(similarities)[::-1][:top_k]
    
    results = []
    for idx in top_indices:
        results.append({
            'document': documents[idx],
            'score': similarities[idx]
        })
    return results

def generate_rag_response(query, documents, doc_embeddings, top_k=3):
    """
    Generate response using RAG approach
    
    Args:
        query: User's question
        documents: List of document texts
        doc_embeddings: List of document embeddings
        top_k: Number of documents to retrieve
    
    Returns:
        Dictionary with 'answer' and 'sources'
    """
    # Retrieve relevant documents
    retrieved_docs = retrieve_documents(query, documents, doc_embeddings, top_k)
    
    # Build context from retrieved documents
    context = "\n\n".join([doc['document'] for doc in retrieved_docs])
    
    # Create augmented prompt
    augmented_prompt = f"""Answer the question based on the context below.

Context:
{context}

Question: {query}

Answer:"""
    
    # Generate response
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided context."},
            {"role": "user", "content": augmented_prompt}
        ],
        temperature=0.3
    )
    
    return {
        'answer': response.choices[0].message.content,
        'sources': retrieved_docs
    }

## 4.2 Simple RAG Implementation

Let's build a basic RAG system using functions:

---

# 4. RAG (Retrieval Augmented Generation)

RAG combines retrieval systems with language models to provide accurate, context-aware responses.

## 4.1 RAG Architecture

1. **Index**: Store documents as embeddings in a vector database
2. **Retrieve**: Find relevant documents for a query
3. **Augment**: Add retrieved context to the prompt
4. **Generate**: Use LLM to generate response with context
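The four steps can be traced end to end without any API. In this toy sketch, bag-of-words count vectors stand in for real embeddings (an assumption made purely so it runs offline), and the final generation step is left out; in a real system you would send `prompt` to an LLM. All names here (`toy_embed`, `toy_cosine`, `toy_docs`) are illustrative.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Toy 'embedding': a bag-of-words count vector (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def toy_cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


toy_docs = [
    "Python was created by Guido van Rossum.",
    "Neural networks consist of layers of connected nodes.",
    "Deep learning uses neural networks with many layers.",
]

# 1. Index: store each document alongside its vector
index = [(doc, toy_embed(doc)) for doc in toy_docs]

# 2. Retrieve: rank documents by similarity to the query and keep the top 2
query = "How many layers do neural networks use?"
query_vec = toy_embed(query)
top = sorted(index, key=lambda item: toy_cosine(query_vec, item[1]), reverse=True)[:2]

# 3. Augment: put the retrieved text into the prompt
context = "\n".join(doc for doc, _ in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# 4. Generate: send `prompt` to an LLM (omitted in this offline sketch)
print(prompt)
```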

In [None]:
# Using Sentence Transformers (Hugging Face)
# !pip install sentence-transformers

"""
from sentence_transformers import SentenceTransformer

# Load pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
sentences = ['This is a test sentence', 'This is another sentence']
embeddings = model.encode(sentences)

print(f"Shape: {embeddings.shape}")
print(f"Embedding: {embeddings[0][:5]}")
"""

print("Sentence Transformers example provided above (commented out)")

## 3.4 Other Embedding Models

Beyond OpenAI, there are other popular embedding models:

In [None]:
# Document database
documents = [
    "Python is a high-level programming language",
    "Machine learning uses statistical techniques",
    "Neural networks are inspired by the human brain",
    "Data science involves analyzing large datasets",
    "JavaScript is used for web development",
    "Deep learning is a subset of machine learning"
]

# Get embeddings for all documents
doc_embeddings = [get_embedding(doc) for doc in documents]

# Search query
query = "Tell me about AI and neural networks"
query_embedding = get_embedding(query)

# Calculate similarities
similarities = cosine_similarity([query_embedding], doc_embeddings)[0]

# Get top 3 results
top_indices = np.argsort(similarities)[::-1][:3]

print(f"Query: '{query}'\n")
print("Most relevant documents:")
for i, idx in enumerate(top_indices, 1):
    print(f"{i}. (Score: {similarities[idx]:.4f}) {documents[idx]}")

## 3.3 Semantic Search

Find the most relevant documents for a query:

In [None]:
# Calculate cosine similarity between embeddings
similarity_matrix = cosine_similarity(embeddings)

print("Similarity Matrix:")
print("=" * 70)
for i, text1 in enumerate(texts):
    print(f"\n'{text1}':")
    for j, text2 in enumerate(texts):
        if i != j:
            print(f"  vs '{text2}': {similarity_matrix[i][j]:.4f}")

## 3.2 Measuring Similarity

Use cosine similarity to find how similar texts are:
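Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction and ignores magnitude. A from-scratch version makes the formula concrete. `cosine_sim` is a hypothetical name chosen so it does not shadow scikit-learn's `cosine_similarity`, which the notebook uses.

```python
import math


def cosine_sim(a, b):
    """cos(a, b) = (a . b) / (||a|| * ||b||); 1 = same direction, 0 = orthogonal, -1 = opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_sim([1, 2], [2, 4]))  # parallel vectors, approximately 1.0
print(cosine_sim([1, 0], [0, 1]))  # orthogonal vectors: 0.0
```

OpenAI's embeddings are normalized to length 1, so for them the dot product alone gives the same value as cosine similarity.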

In [None]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Generate embeddings using OpenAI
def get_embedding(text, model="text-embedding-3-small"):
    """Get embedding for a text string"""
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding

# Example texts
texts = [
    "The cat sits on the mat",
    "A feline rests on the rug",
    "Dogs are loyal animals",
    "Machine learning is fascinating"
]

# Get embeddings
embeddings = [get_embedding(text) for text in texts]

print(f"Embedding dimension: {len(embeddings[0])}")
print(f"First few values of embedding 1: {embeddings[0][:5]}")

---

# 3. Word Embeddings

Embeddings are dense vector representations of text (words, sentences, or whole documents) that capture semantic meaning.

## 3.1 Understanding Embeddings

Embeddings convert text into numerical vectors where similar meanings have similar vectors.

In [None]:
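# Create variations of an existing image (variations are supported by DALL-E 2 only)
"""
# The source image must be a square PNG smaller than 4 MB
with open("path/to/image.png", "rb") as image_file:
    response = client.images.create_variation(
        model="dall-e-2",
        image=image_file,
        n=2,
        size="1024x1024"
    )

for idx, img_data in enumerate(response.data):
    print(f"Variation {idx + 1}: {img_data.url}")
"""

# Image editing with a mask: fully transparent areas of mask.png mark the region to regenerate
# (the mask must have the same dimensions as the original image)
"""
with open("original.png", "rb") as image_file, open("mask.png", "rb") as mask_file:
    response = client.images.edit(
        model="dall-e-2",
        image=image_file,
        mask=mask_file,
        prompt="A sunlit indoor lounge with a pool",
        n=1,
        size="1024x1024"
    )

print(f"Edited image: {response.data[0].url}")
"""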

print("Image editing examples provided above (commented out)")

## 2.3 Image Editing and Variations

Edit existing images or create variations:

In [None]:
# The example calls the REST API directly with `requests`, so no Stability SDK is needed
# !pip install requests

# Example using Stability AI API
"""
import base64
import os

import requests

API_KEY = os.getenv("STABILITY_API_KEY")

response = requests.post(
    "https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image",
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    },
    json={
        "text_prompts": [
            {
                "text": "A futuristic cityscape with flying cars",
                "weight": 1
            }
        ],
        "cfg_scale": 7,
        "height": 1024,
        "width": 1024,
        "samples": 1,
        "steps": 30,
    },
    timeout=120,
)

if response.status_code == 200:
    data = response.json()
    # Each artifact holds one generated image as a base64-encoded PNG
    for i, image in enumerate(data["artifacts"]):
        with open(f"stability_image_{i}.png", "wb") as f:
            f.write(base64.b64decode(image["base64"]))
    print("Image generated successfully!")
else:
    print(f"Error: {response.status_code} {response.text}")
"""

print("Stability AI example code provided above (commented out)")

## 2.2 Stability AI (Stable Diffusion)

Alternative image generation API:

In [None]:
# Generate an image with DALL-E
response = client.images.generate(
    model="dall-e-3",
    prompt="A serene mountain landscape at sunset, with a lake reflecting the colorful sky, digital art style",
    size="1024x1024",
    quality="standard",
    n=1
)

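# The returned URL is temporary (valid for about an hour), so download the image if you need to keep it
image_url = response.data[0].url
print(f"Generated image URL: {image_url}")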

# Display the image
from IPython.display import Image, display
display(Image(url=image_url))

---

# 2. Image Generation APIs

## 2.1 OpenAI DALL-E

Generate images from text descriptions:

In [None]:
import json

# Define a function the model may ask us to call
def get_weather(location, unit="celsius"):
    """Get the current weather for a location"""
    # This is a mock function that always returns the same data
    return {
        "location": location,
        "temperature": 22,
        "unit": unit,
        "description": "Sunny"
    }

# Describe the function as a tool (`tools` replaces the deprecated `functions` parameter)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Make a request that should trigger a tool call
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What's the weather like in Paris?"}
    ],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message

if message.tool_calls:
    # The model may request several calls; each carries a name and JSON-encoded arguments
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        print(f"AI wants to call: {function_name}")
        print(f"With arguments: {function_args}")

        # Execute the function locally (the model only requests the call; it never runs it)
        if function_name == "get_weather":
            result = get_weather(**function_args)
            print(f"\nFunction result: {result}")
else:
    # No tool was needed; the model answered directly
    print(message.content)

## 1.3 Function Calling

Let the model request calls to your own functions, with arguments returned as structured JSON:
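With the `tools` parameter, each requested call arrives in `message.tool_calls` with an `id`, a function name, and a JSON string of arguments. To get a final answer you run the function yourself, append the assistant message plus a `"tool"` message carrying the result, and call `chat.completions.create` again. The sketch below covers the dispatch step with a name-to-callable registry and defensive argument parsing. `lookup_weather`, `TOOL_REGISTRY`, and `dispatch_tool_call` are hypothetical names, and the fake tool call only mimics the SDK object's shape.

```python
import json
from types import SimpleNamespace


def lookup_weather(location, unit="celsius"):
    """Mock tool implementation (hypothetical; returns fixed data)."""
    return {"location": location, "temperature": 22, "unit": unit}


# Map tool names the model may request to the Python callables that implement them
TOOL_REGISTRY = {"get_weather": lookup_weather}


def dispatch_tool_call(tool_call):
    """Run one tool call and build the 'tool' message to send back to the model."""
    func = TOOL_REGISTRY.get(tool_call.function.name)
    if func is None:
        output = {"error": f"unknown tool {tool_call.function.name}"}
    else:
        # Arguments arrive as a JSON string and may be malformed, so parse defensively
        try:
            output = func(**json.loads(tool_call.function.arguments))
        except (json.JSONDecodeError, TypeError) as exc:
            output = {"error": str(exc)}
    return {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(output)}


# A fake tool call shaped like an item of `message.tool_calls` from the OpenAI SDK
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
print(dispatch_tool_call(fake_call))
```

Returning errors as tool output, instead of raising, lets the model see what went wrong and retry with corrected arguments.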

In [None]:
# Streaming response
print("Streaming response:")
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Write a haiku about Python programming."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

## 1.2 Streaming Responses

For real-time responses (useful for chatbots):
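A chatbot usually needs the complete reply as well as the live output, for example to append it to the conversation history. The sketch below collects the streamed deltas while passing each piece to a callback. `collect_stream` and `make_chunk` are hypothetical helpers, and the fake chunks only reproduce the `chunk.choices[0].delta.content` fields the streaming loop reads.

```python
from types import SimpleNamespace


def collect_stream(stream, on_token=None):
    """Concatenate streamed text deltas, optionally handling each piece as it arrives."""
    parts = []
    for chunk in stream:
        # Some chunks carry no choices (e.g. a final usage-only chunk) or no text
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece is None:
            continue
        if on_token:
            on_token(piece)
        parts.append(piece)
    return "".join(parts)


def make_chunk(text):
    """Fake chunk shaped like the SDK's streaming chunk (only the fields used above)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [make_chunk("Indented "), make_chunk("lines flow"), make_chunk(None)]
full_text = collect_stream(fake_stream, on_token=lambda piece: print(piece, end=""))
print()
```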

In [None]:
# Basic chat completion
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that explains complex topics simply."},
        {"role": "user", "content": "Explain machine learning in 2 sentences."}
    ],
    temperature=0.7,
    max_tokens=150
)

print("AI Response:")
print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")

## 1.1 Chat Completions

The most common use case is having a conversation with the AI:

In [None]:
import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize the OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

print("OpenAI client initialized successfully!")

## Basic Setup

Store your API key in a `.env` file (never commit this to version control!):
```
OPENAI_API_KEY=your-api-key-here
```

In [None]:
# Install required packages
# !pip install openai python-dotenv

# 1. Using OpenAI API

The OpenAI API provides access to powerful language models like GPT-4, GPT-3.5, and more.

## Installation and Setup

First, install the OpenAI Python library:

# Generative AI Tutorial

This comprehensive tutorial covers essential GenAI technologies and tools:
1. OpenAI API
2. Image Generation APIs
3. Word Embeddings
4. RAG (Retrieval Augmented Generation)
5. Weaviate Vector Database

---