# 🚀 DeepSeek-R1 Model with Azure AI Inference 🧠

**DeepSeek-R1** is a state-of-the-art reasoning model trained with a combination of reinforcement learning and supervised fine-tuning. It excels at complex reasoning tasks, activates 37B of its 671B parameters per token, and offers a 128K-token context window.

In this notebook, you'll learn to:
1. **Initialize** the ChatCompletionsClient for Azure serverless endpoints
2. **Chat** with DeepSeek-R1 using reasoning extraction
3. **Implement** a travel planning example with step-by-step reasoning
4. **Solve** a technical problem with structured, step-by-step reasoning

## Why DeepSeek-R1?
- **Advanced Reasoning**: Specializes in chain-of-thought problem solving
- **Massive Context**: 128K token window for detailed analysis
- **Efficient Architecture**: 37B active parameters from 671B total
- **Safety Integrated**: Built-in content filtering capabilities


## 1. Setup & Authentication

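Required packages:
- `azure-ai-inference`: For chat completions
- `python-dotenv`: Only needed if you load settings from a `.env` file; the setup cell below does not import it

`cred.json` requirements (the setup cell looks for this file starting in the parent of the working directory and moving upward):
```json
{
  "AZURE_INFERENCE_ENDPOINT": "<your-endpoint-url>",
  "AZURE_INFERENCE_KEY": "<your-api-key>",
  "MODEL_NAME": "DeepSeek-R1"
}
```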
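
The setup cell below reads these values with `json.load` from a `cred.json` file. As a minimal sketch, assuming the same three key names, the file could be validated before the client is created. `load_inference_config` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration; they are not part of the notebook or the SDK.

```python
import json
from pathlib import Path

# Hypothetical constant: keys the setup cell expects to find.
# MODEL_NAME is optional there and falls back to "DeepSeek-R1".
REQUIRED_KEYS = ("AZURE_INFERENCE_ENDPOINT", "AZURE_INFERENCE_KEY")


def load_inference_config(path):
    """Load a cred.json-style file and fail early on missing or empty keys."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"Missing required key(s): {', '.join(missing)}")
    # Mirror the default used by the setup cell.
    config.setdefault("MODEL_NAME", "DeepSeek-R1")
    return config
```

Failing before the client is constructed keeps configuration mistakes separate from network or authentication errors.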

In [10]:
import os
import json
from pathlib import Path
import re
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

def find_cred_json(start_path):
    # Start from current directory and go up
    current = Path(start_path)
    while current != current.parent:  # while we haven't hit the root
        cred_file = current / 'cred.json'
        if cred_file.exists():
            return str(cred_file)
        current = current.parent
    return None

try:
    # Walk upward from the parent of the current working directory
    parent_dir = os.path.dirname(os.getcwd())  # Parent of the working directory
    file_path = find_cred_json(parent_dir)

    if not file_path:
        raise FileNotFoundError("cred.json not found in parent directories")

    print(f"Found cred.json at: {file_path}")

    # Load and parse the JSON file
    with open(file_path, 'r') as f:
        loaded_config = json.load(f)
        
    print("Azure Inference Endpoint:", loaded_config['AZURE_INFERENCE_ENDPOINT'])
    print("Azure Inference Key:", "****" + loaded_config['AZURE_INFERENCE_KEY'][-4:])  # Print last 4 chars only for security
    print("DeepSeek-R1:", loaded_config['MODEL_NAME'])

    endpoint = loaded_config.get("AZURE_INFERENCE_ENDPOINT")
    key = loaded_config.get("AZURE_INFERENCE_KEY")
    model_name = loaded_config.get("MODEL_NAME", "DeepSeek-R1")
    print("Model name:", model_name)

    # Initialize client with model name in headers
    client = ChatCompletionsClient(
        endpoint=endpoint,
        credential=AzureKeyCredential(key),
        headers={
            "x-ms-model-mesh-model-name": model_name
        }
    )
    print("✅ Client initialized | Model:", client.get_model_info().model_name)

    # Test the client with a simple message
    messages = [
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Hello! How are you?")
    ]
    
    response = client.complete(
        model=model_name,
        messages=messages
    )
    
    # Extract the response content
    if response and hasattr(response, 'choices') and response.choices:
        assistant_message = response.choices[0].message.content
        print("\nTest response:", assistant_message)
    else:
        print("\n❌ No response content received")

except FileNotFoundError as e:
    print(f"❌ Could not find file: {str(e)}")
except json.JSONDecodeError as e:
    print(f"❌ File exists but contains invalid JSON: {str(e)}")
except KeyError as e:
    print(f"❌ Missing required key in config file: {str(e)}")
except Exception as e:
    print(f"❌ Unexpected error: {str(e)}")

# Optional: Add an interactive chat loop
from azure.ai.inference.models import AssistantMessage  # Role for model replies kept in the history

def chat_loop(client, model_name):
    conversation = [
        SystemMessage(content="You are a helpful assistant.")
    ]
    
    print("\nChat started (type 'exit' to end)")
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() == 'exit':
            break
            
        conversation.append(UserMessage(content=user_input))
        
        try:
            response = client.complete(
                model=model_name,
                messages=conversation
            )
            
            if response and hasattr(response, 'choices') and response.choices:
                assistant_message = response.choices[0].message.content
                print(f"Assistant: {assistant_message}")
                # Store the reply with the assistant role, not as another system message
                conversation.append(AssistantMessage(content=assistant_message))
            else:
                print("❌ No response received")
                
        except Exception as e:
            print(f"❌ Error: {str(e)}")

# Uncomment to enable interactive chat
# if 'client' in locals():
#     chat_loop(client, model_name)


Found cred.json at: d:\MLOps\Gen Ai & MLOps Masterclass\Materilas\test\ai-foundry-workshop\cred.json
Azure Inference Endpoint: https://ai-sarath5717ai038608807015.services.ai.azure.com/models
Azure Inference Key: ****UVBn
DeepSeek-R1: DeepSeek-R1
Model name: DeepSeek-R1
✅ Client initialized | Model: deepseek-r1

Test response: <think>
Okay, the user greeted me with "Hello! How are you?" I need to respond appropriately. Since I'm an AI, I don't have feelings, but I should acknowledge their greeting and ask how I can assist them. Keep it friendly and open-ended. Make sure to invite them to ask for help with anything they need. Maybe mention that I'm here to help with questions, information, or just chatting. Keep the tone positive and welcoming.
</think>

Hello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help you with whatever you need! How can I assist you today? Whether it's answering questions, brainstorming ideas, or just chatting, I'm all ears.

## 2. Intelligent Travel Planning ✈️

Demonstrate DeepSeek-R1's reasoning capabilities for trip planning:

In [11]:
def plan_trip_with_reasoning(query, show_thinking=False):
    """Get travel recommendations with reasoning extraction"""
    messages = [
        SystemMessage(content="You are a travel expert. Provide detailed plans with rationale."),
        UserMessage(content=f"{query} Include hidden gems and safety considerations.")
    ]
    
    response = client.complete(
        messages=messages,
        model=model_name,
        temperature=0.7,
        max_tokens=1024
    )
    
    content = response.choices[0].message.content
    
    # Extract reasoning if present
    if show_thinking:
        match = re.search(r"<think>(.*?)</think>(.*)", content, re.DOTALL)
        if match:
            return {"thinking": match.group(1).strip(), "answer": match.group(2).strip()}
    return content

# Example usage
query = "Plan a 5-day cultural trip to Kyoto in April"
result = plan_trip_with_reasoning(query, show_thinking=True)

print("🗺️ Query:", query)
if isinstance(result, dict):
    print("\n🧠 Thinking Process:", result["thinking"])
    print("\n📝 Final Answer:", result["answer"])
else:
    print("\n📝 Response:", result)

🗺️ Query: Plan a 5-day cultural trip to Kyoto in April

📝 Response: <think>
Okay, I need to plan a 5-day cultural trip to Kyoto in April. Let me start by recalling what I know about Kyoto. It's a city rich in history, temples, gardens, and traditional culture. April is cherry blossom season, so that's a big plus. The user wants hidden gems and safety considerations. 

First, I should outline the structure: 5 days, each day with a theme or area. Must include main attractions but also lesser-known spots. Safety tips for each day. Starting with arrival day, maybe Day 1 is arrival and central Kyoto. Then each subsequent day covers different areas.

Hidden gems: places not overrun by tourists. Maybe places like smaller temples, local neighborhoods, less crowded spots even during cherry blossom season. Examples I know: Tofuku-ji's hidden gardens, Shisen-do, Ohara area. Also, the Philosopher's Path might be crowded, but maybe suggest early morning visits. 

Safety considerations: April weathe
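
The run above printed the raw `Response:` branch, which suggests the reply reached the `max_tokens=1024` limit before the closing `</think>` tag, so the regex in `plan_trip_with_reasoning` found no match. A minimal sketch of a splitter that also recognizes an unfinished reasoning block is shown here; `split_reasoning` and `THINK_BLOCK` are hypothetical names, and the only assumption about the model is the `<think>...</think>` convention visible in the outputs.

```python
import re

# Non-greedy match of the first complete <think>...</think> block.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content):
    """Return (thinking, answer, complete) for a reply that may contain <think> tags."""
    match = THINK_BLOCK.search(content)
    if match:
        # Everything after the closing tag is the final answer.
        return match.group(1).strip(), content[match.end():].strip(), True
    if "<think>" in content:
        # Opening tag without a closing one: the reply was cut off mid-reasoning.
        return content.split("<think>", 1)[1].strip(), "", False
    # No reasoning block at all.
    return "", content.strip(), True
```

When `complete` comes back as `False`, one option is to retry the request with a larger `max_tokens` value rather than showing the partial reasoning as the answer.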

## 3. Technical Problem Solving 💻

Showcase coding/optimization capabilities:

In [14]:
def solve_technical_problem(problem):
    """Solve complex technical problems with structured reasoning"""
    response = client.complete(
        messages=[
            UserMessage(content=f"{problem} Please reason step by step, and put your final answer within \\boxed{{}}.")
        ],
        model=model_name,
        temperature=0.3,
        max_tokens=2048
    )
    
    return response.choices[0].message.content

# Database optimization example
problem = """How can I optimize a PostgreSQL database handling 10k transactions/second?
Consider indexing strategies, hardware requirements, and query optimization."""

print("🔧 Problem:", problem)
print("\n⚙️ Solution:", solve_technical_problem(problem))

🔧 Problem: How can I optimize a PostgreSQL database handling 10k transactions/second?
Consider indexing strategies, hardware requirements, and query optimization.

⚙️ Solution: <think>
Okay, so I need to figure out how to optimize a PostgreSQL database that's handling 10k transactions per second. That's a pretty high load, so I need to make sure everything is tuned properly. Let me start by breaking down the problem into parts. The user mentioned indexing strategies, hardware requirements, and query optimization. I'll tackle each of these areas one by one, but I know they're all interconnected.

First, indexing. I remember that indexes can speed up read queries, but they can also slow down writes because every insert, update, or delete has to update the index. So for a high transaction rate, having the right indexes is crucial. But I shouldn't over-index. Maybe start by looking at the query patterns. If there are frequent queries on certain columns, those should be indexed. Composite i
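
The prompt in `solve_technical_problem` asks the model to put its final answer inside `\boxed{}`. As a rough sketch, that answer could be pulled out of a complete reply with a small brace-matching helper. `extract_boxed` is a hypothetical name, and the sketch assumes the model follows the requested format, which is not guaranteed.

```python
def extract_boxed(text):
    r"""Return the contents of the last \boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 0  # Nesting level of braces opened inside the box
    for index in range(begin, len(text)):
        char = text[index]
        if char == "{":
            depth += 1
        elif char == "}":
            if depth == 0:
                # This brace closes the \boxed{ itself.
                return text[begin:index]
            depth -= 1
    return None  # Unbalanced braces, for example in a truncated reply
```

Counting braces instead of using a regex keeps nested LaTeX such as `\frac{1}{2}` intact.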

## 4. Best Practices & Considerations

1. **Reasoning Handling**: Use a regex to separate `<think>...</think>` content from the final answer
2. **Safety**: Content filtering is built in; catch `HttpResponseError` to handle filtered requests
3. **Performance**:
   - Max tokens: 4096
   - Rate limit: 200K tokens/minute
4. **Cost**: Pay-as-you-go with serverless deployment
5. **Streaming**: Implement response streaming for long completions

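```python
# Streaming example (reuses `client`, `model_name`, and `messages` from the cells above)
response = client.complete(messages=messages, model=model_name, stream=True)
for chunk in response:
    # Guard against chunks that carry no choices
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
```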
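
For the safety point in the list above, a hedged sketch of summarizing a rejected request follows. It assumes the service answers a filtered prompt with HTTP 400 and a JSON body shaped like `{"error": {"code": ..., "message": ...}}`; that shape is an assumption here, so check it against your endpoint. `describe_request_error` is a hypothetical helper, kept free of SDK imports so it can be tried on its own.

```python
def describe_request_error(status_code, payload):
    """Summarize a 400 error body, or return None if it is not recognizable."""
    if status_code != 400 or not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    code = error.get("code", "unknown")
    message = error.get("message", "")
    return f"Request rejected ({code}): {message}"


# Intended use in the notebook (not executed here, needs the Azure SDK):
#
#   from azure.core.exceptions import HttpResponseError
#   try:
#       response = client.complete(messages=messages, model=model_name)
#   except HttpResponseError as ex:
#       summary = describe_request_error(ex.status_code, ex.response.json())
#       if summary is None:
#           raise  # Not an error this helper understands
#       print(summary)
```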

## 🎯 Key Takeaways
- Leverage 128K context for detailed analysis
- Extract reasoning steps for debugging/analysis
- Combine with Azure AI Content Safety for production
- Monitor token usage via `response.usage`
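
One way to act on the token usage figure is to keep a sliding one-minute total and compare it with the rate limit quoted earlier (200K tokens/minute). The sketch below is illustrative only: `TokenWindow` is a hypothetical class, and it assumes each call's cost is taken from `response.usage.total_tokens`.

```python
import time
from collections import deque


class TokenWindow:
    """Track tokens spent inside a sliding time window (default: 60 seconds)."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens) pairs, oldest first

    def record(self, tokens, now=None):
        """Record the tokens used by one call, e.g. response.usage.total_tokens."""
        now = time.monotonic() if now is None else now
        self._events.append((now, tokens))

    def used(self, now=None):
        """Return tokens spent in the current window, discarding older events."""
        now = time.monotonic() if now is None else now
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()
        return sum(tokens for _, tokens in self._events)

    def remaining(self, now=None):
        """Return how many tokens are left before the limit is reached."""
        return max(self.limit - self.used(now), 0)
```

In the notebook this would look like `window = TokenWindow(limit=200_000)` followed by `window.record(response.usage.total_tokens)` after each call, with `window.remaining()` checked before sending a large request.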

> Always validate model outputs for critical applications!