<a href="https://colab.research.google.com/github/Ravikrishnan05/PrediscanMedtech_project/blob/main/fastapi_for_backend.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# %%
# RUN THIS CELL FIRST - The GitHub issue solution
# This installs the known-good, compatible versions of all libraries.

!pip install "unsloth[colab-new]==2025.5.7"
!pip install "unsloth-zoo==2025.5.8"
!pip install "transformers==4.51.3"
!pip install fastapi uvicorn "python-multipart<0.0.7" pyngrok



In [2]:
# %%
# AFTER RESTARTING, RUN THIS CELL DIRECTLY

import torch
from unsloth import FastLanguageModel
from transformers import AutoProcessor

# Verification
import unsloth
print("✅ Unsloth and its dependencies are loaded correctly!")
print("Unsloth version:", unsloth.__version__)
print("PyTorch version:", torch.__version__)
if torch.cuda.is_available():
    # Let's also check torchvision, which we will install
    try:
        import torchvision
        print("Torchvision version:", torchvision.__version__)
    except ImportError:
        print("Installing compatible torchvision...")
        !pip install torchvision torchaudio
        import torchvision
        print("Torchvision version:", torchvision.__version__)



model_id = "google/medgemma-4b-it"

# The correct dtype for a T4 GPU
model_id = "google/medgemma-4b-it"
dtype = torch.float16 # Correct for T4 GPU

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id, max_seq_length=2048, dtype=dtype, load_in_4bit=True,
)
processor = AutoProcessor.from_pretrained(model_id)

print("\n✅ Model loaded with compatible library versions.")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
✅ Unsloth and its dependencies are loaded correctly!
Unsloth version: 2025.5.7
PyTorch version: 2.7.0+cu126
Torchvision version: 0.22.0+cu126
==((====))==  Unsloth 2025.5.7: Fast Gemma3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



✅ Model loaded with compatible library versions.


In [8]:
# %%
# --- THE FINAL, CLEAN, AND CORRECT CELL ---

# ===================================================================
# COMPONENT 1: IMPORTS (The Toolbox)
# ===================================================================
# All the libraries our script needs to function.

# --- Web Server & Asynchronous Tools ---
import uvicorn              # The high-performance server that will run our FastAPI app.
import threading            # Allows us to run the server in a background thread, so it doesn't block our notebook.
import time                 # Used for `time.sleep(5)` to give the server a moment to start up.
import nest_asyncio         # A crucial patch that allows Uvicorn's async event loop to run inside a notebook's already-running event loop.

# --- Networking & API Tools ---
import requests             # The standard library for making HTTP requests (we use it to test our own API).
from pyngrok import ngrok   # The Python client for ngrok, the service that creates a public URL to our local server.
from fastapi import FastAPI, File, UploadFile, Form  # The core components of our web framework.
from fastapi.responses import JSONResponse           # A special response class for sending back data in JSON format.
from fastapi.middleware.cors import CORSMiddleware   # Middleware to handle Cross-Origin Resource Sharing (tells browsers it's safe to use our API from other websites).

# --- Data & Image Handling Tools ---
import os                   # Used for `os.path.exists()` to check if our test image file is present.
import io                   # Used to treat the uploaded image's raw bytes as a file-like object.
from PIL import Image       # The Python Imaging Library (Pillow), used to open and process the image.

# --- AI & Debugging Tools ---
# `torch`, `FastLanguageModel`, `AutoProcessor` are already imported and loaded from the previous cell.
import traceback            # A powerful debugging tool. `traceback.format_exc()` prints the full, detailed error message if something crashes.

# ===================================================================
# COMPONENT 2: THE FASTAPI APPLICATION (The Service Logic)
# ===================================================================
# This defines the API itself: its endpoints, what they expect, and what they do.

# Create the main application object.
app = FastAPI()

# Add the CORS middleware. `allow_origins=["*"]` is a "wide open" setting for testing.
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"])

# --- Endpoint 1: The Health Check ---
# `@app.get("/")` tells FastAPI to create an endpoint that responds to GET requests at the root URL (e.g., https://your-url.ngrok-free.app/).
# This is a simple way to check if the server is alive and reachable.
@app.get("/")
def read_root():
    return {"message": "✅ API is working!"}

# --- Endpoint 2: The AI Generator ---
# `@app.post("/generate")` creates our main endpoint. It only responds to POST requests.
# `async def` makes it an asynchronous function, which is efficient for waiting on I/O (like file uploads).
# The function parameters define what data the endpoint expects:
# - `prompt: str = Form(...)`: Expects a form field named "prompt" containing a string.
# - `image: UploadFile = File(...)`: Expects a file upload named "image".
@app.post("/generate")
async def generate(prompt: str = Form(...), image: UploadFile = File(...)):
    # This `try...except` block is our "Safety Net". If any code inside `try` crashes,
    # the `except` block will run instead of the whole server crashing.
    try:
        # STEP 1: Get the raw data from the request. `await` pauses until the upload is complete.
        image_bytes = await image.read()

        # STEP 2: Process the data. We use Pillow to open the image from its bytes and convert it to standard RGB format.
        pil_image = Image.open(io.BytesIO(image_bytes)).convert("RGB")

        # STEP 3: Format the prompt for the model using the specific structure it was trained on.
        messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]
        prompt_text = processor.apply_chat_template(messages, add_generation_prompt=True)

        # STEP 4: Tokenize everything. The processor turns the text and the PIL image into numbers (tensors).
        # This is where the magic happens.
        inputs = processor(text=prompt_text, images=pil_image, return_tensors="pt")

        # STEP 5: Ensure data types match. This is the fix for the T4 GPU. It takes all the generated tensors
        # and converts them to `float16` to match the model's `dtype`.
        inputs = inputs.to(model.device, dtype=model.dtype)

        # STEP 6: The core AI task. We use the standard `model.generate` function.
        # This works now because our libraries (Unsloth, Transformers) are the correct, compatible versions.
        # The `**inputs` syntax automatically unpacks the dictionary into the required arguments (input_ids, attention_mask, etc.).
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=512,
            pad_token_id=tokenizer.pad_token_id,
        )

        # STEP 7: Decode the numeric output from the model back into human-readable text.
        generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

        # STEP 8: Clean up the response. The model's output often includes the original prompt, so we split the string
        # and take only the part after "Assistant: ".
        response_text = generated_texts[0].split("Assistant: ")[-1].strip()

        # STEP 9: Send the successful result back to the client as a JSON object.
        return JSONResponse(content={"response": response_text})

    # THE DEBUGGER'S BEST FRIEND: If any line in the `try` block fails, this code will run.
    except Exception as e:
        # This prints the full, detailed Python error traceback to your notebook's output.
        # This is how you find out *what* crashed and *where*.
        print(traceback.format_exc())

        # This sends a clean "500 Internal Server Error" message back to the client, so they know something went wrong.
        return JSONResponse(status_code=500, content={"error": str(e)})

# ===================================================================
# COMPONENT 3: THE SERVER LAUNCHER (The Engine)
# ===================================================================
# This block handles the logic of starting our server in the background.

# A simple function that contains the command to run the server. This is what the background thread will execute.
def run_server():
    uvicorn.run(app, port=8000)

# The "Zombie Killer" for ngrok. This command programmatically kills any lingering ngrok processes
# from previous runs, preventing the "1 simultaneous session" error.
ngrok.kill()

# Set your personal ngrok token.
ngrok.set_auth_token("2yH8AJ7M9rJNbLfQoRANIuEudom_5Ppznh7REV7PvB51AaZXe")

# Apply the patch needed for uvicorn to work inside a notebook.
nest_asyncio.apply()

# Connect to ngrok's service. This creates the public URL that tunnels to our local port 8000.
public_url = ngrok.connect(8000)
print(f"🌐 Public API URL: {public_url}")

# Create and start the background thread that runs our `run_server` function.
server_thread = threading.Thread(target=run_server)
server_thread.daemon = True # Ensures the thread dies when the main script ends.
server_thread.start()

# Pause the main script for 5 seconds to give the server time to fully initialize.
time.sleep(5)
print("✅ Server should be running!")

# ===================================================================
# COMPONENT 4: THE LOCALHOST TEST SCRIPT (Quality Assurance)
# ===================================================================
# This part acts as our automated tester to verify the API is working.

print("\n" + "="*50 + "\n🚀 LAUNCHING LOCALHOST TEST\n" + "="*50)

# We use "http://127.0.0.1:8000" to test the server directly, bypassing ngrok and the internet.
# This helps isolate problems: if this test works but the public URL fails, the issue is with ngrok.
local_api_url = "http://127.0.0.1:8000"
image_path = "/content/1.png" # The path to our test image.

# A safety check to make sure the test image has been uploaded.
if not os.path.exists(image_path):
    print(f"❌ Error: Test image '{image_path}' not found. Please upload it.")
else:
    # Prepare the data for the POST request.
    endpoint = f"{local_api_url}/generate"
    data = {"prompt": "What are the key findings in this fundus image"}
    files = {"image": (image_path, open(image_path, "rb"), "image/png")}

    print("🚀 Sending request directly to localhost...")
    try:
        # This is the action: send the POST request. The script will wait here until it gets a response.
        response = requests.post(endpoint, data=data, files=files, timeout=300)

        # This is another safety check. If the server returned an error (like 404 or 500), this line will
        # raise an exception and cause the `except` block to run.
        response.raise_for_status()

        # If we reach this line, it means we got a successful "200 OK" response.
        print("\n🏆🏆🏆 FINAL VICTORY! API RESPONSE: 🏆🏆🏆")
        print(response.json()) # Print the JSON content of the successful response.

    except requests.exceptions.RequestException as e:
        print(f"\n❌ An error occurred during the local request: {e}")

🌐 Public API URL: NgrokTunnel: "https://9871-34-124-178-222.ngrok-free.app" -> "http://localhost:8000"


INFO:     Started server process [5533]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)


✅ Server should be running!

🚀 LAUNCHING LOCALHOST TEST
🚀 Sending request directly to localhost...
INFO:     127.0.0.1:51394 - "POST /generate HTTP/1.1" 200 OK

🏆🏆🏆 FINAL VICTORY! API RESPONSE: 🏆🏆🏆
{'response': 'user\n\n\n\n\nWhat are the key findings in this fundus image\nmodel\nBased on the image, the key findings are:\n\n*   **Widespread Retinal Detachment:** The retina appears to be detached, with a clear separation between the retina and the underlying retinal tissue.\n*   **Severe Retinal Edema:** There is significant fluid accumulation within the retina, causing it to appear blurry and swollen.\n*   **Vitreous Hemorrhage:** Blood is present in the vitreous humor, the clear gel that fills the space between the lens and the retina. This indicates bleeding into the eye.\n*   **Macular Hemorrhage:** There is likely bleeding in the macula, the central part of the retina responsible for sharp, central vision.\n*   **Cotton Wool Spots:** Small, white or yellowish patches in the retina,

In [None]:
#The following are kill switches to stop the server running(As sometimes dead server might run(zombie servers)).

In [6]:
# %%
# RUN THIS CELL FIRST
# It will find and forcefully terminate any process using port 8000.
!lsof -t -i:8000 | xargs -r kill -9
print("✅ Any process on port 8000 has been terminated.")

✅ Any process on port 8000 has been terminated.


In [None]:
# %%
# Find the process using port 8000
#!lsof -i :8000

COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 1828 root  104u  IPv4  86894      0t0  TCP localhost:8000 (LISTEN)
python3 1828 root  106u  IPv4  87015      0t0  TCP localhost:54562->localhost:8000 (CLOSE_WAIT)


In [None]:
# %%
# Replace <PID> with the number from the command above
#!kill -9 289