<a href="https://colab.research.google.com/github/FrustratedBoy420/Project-Exhibition/blob/main/Diabetic_Patient_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!git clone https://github.com/FrustratedBoy420/Project-Exhibition

fatal: destination path 'Project-Exhibition' already exists and is not an empty directory.


In [2]:
!pip install -U transformers accelerate bitsandbytes sentence-transformers faiss-cpu



In [3]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from google.colab import userdata

# Step 1: Check GPU availability
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Step 2: Read the Hugging Face token from Colab secrets
huggingface_token = userdata.get('HUGGING_FACE_TOKEN')

# Step 3: Load the model and tokenizer
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=huggingface_token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True,  # Reduces peak CPU RAM while loading
    load_in_8bit=True,  # 8-bit quantization so the model fits in GPU memory
    token=huggingface_token
)

print("Mistral-7B model and tokenizer loaded successfully.")

Using device: cuda


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
`torch_dtype` is deprecated! Use `dtype` instead!
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Mistral-7B model and tokenizer loaded successfully.
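
The warnings above say that `load_in_8bit` and `torch_dtype` are deprecated. A minimal sketch of the same load written with a `BitsAndBytesConfig` object is shown below. The helper name `load_quantized_model` is hypothetical, and the `dtype` keyword is taken from the warning text, so it assumes a recent `transformers` release (older releases expect `torch_dtype`).

```python
def load_quantized_model(model_name, token=None):
    """Load a causal LM in 8-bit via quantization_config (hypothetical helper)."""
    # Imported inside the function so that defining the helper is cheap;
    # calling it still needs torch, transformers, bitsandbytes and a GPU runtime.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Replaces the deprecated load_in_8bit=True keyword argument.
    quantization_config = BitsAndBytesConfig(load_in_8bit=True)

    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=quantization_config,
        dtype=torch.float16,  # assumption: named `torch_dtype` in older releases
        device_map="auto",
        low_cpu_mem_usage=True,
        token=token,
    )

# Usage in this notebook (not executed here because it downloads the weights):
# model = load_quantized_model("mistralai/Mistral-7B-Instruct-v0.2", token=huggingface_token)
```

The arguments otherwise mirror the cell above, so the loaded model should behave the same way; only the way quantization is requested changes.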


In [5]:
import json

file_path = "/content/Project-Exhibition/diabetes_knowledge_base (1).json"
chunks = []
with open(file_path, 'r', encoding='utf-8') as f:
    data = json.load(f)
    for item in data:
        if 'text' in item and item['text']:
            chunks.append(item['text'])

print(f"Loaded {len(chunks)} chunks from the data.")

Loaded 638 chunks from the data.


In [6]:
from sentence_transformers import SentenceTransformer

# Load the embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Create embeddings for the chunks
embeddings = embedding_model.encode(chunks, show_progress_bar=True)

print("Embeddings successfully created.")
print(f"Embedding shape of the first chunk: {embeddings[0].shape}")

Embeddings successfully created.
Embedding shape of the first chunk: (384,)


In [7]:
import numpy as np
import faiss

# Convert the embeddings to float32, which FAISS requires
embeddings_np = np.array(embeddings).astype('float32')
dimension = embeddings_np.shape[1]

# Build the FAISS index (a flat index that does exact L2 nearest-neighbor search)
index = faiss.IndexFlatL2(dimension)
index.add(embeddings_np)

# Save the index so it does not have to be rebuilt
faiss.write_index(index, "diabetes_faiss_index.bin")

print("FAISS index successfully created and saved.")

FAISS index successfully created and saved.
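
To make the role of `IndexFlatL2` concrete, here is a minimal pure-Python sketch of the exhaustive search that a flat L2 index performs: compare the query against every stored vector by squared Euclidean distance and keep the `k` closest. The function names and the toy 2-D vectors are illustrative assumptions, not part of FAISS or of this notebook's data; FAISS does the same comparison far more efficiently.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> Tuple[List[float], List[int]]:
    """Brute-force search: score every vector, return the k nearest."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy data standing in for the 384-dimensional chunk embeddings.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = flat_l2_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)  # nearest first: [1, 0]
```

The returned positions are used the same way as the FAISS result: each index points back into the list of text chunks, smallest distance first.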


In [8]:
# This cell builds on the cells above (embedding_model, index, tokenizer, model)

def get_recommendation_from_doc(user_query, chunks):
    # 1. Embed the user's question (FAISS expects float32)
    query_embedding = embedding_model.encode([user_query])
    query_embedding_np = np.array(query_embedding).astype('float32')

    # 2. Find the most relevant chunks in the vector index (top 3)
    k = 3
    distances, indices = index.search(query_embedding_np, k)
    # FAISS fills missing results with -1 when fewer than k vectors are indexed
    relevant_chunks = [chunks[i] for i in indices[0] if i != -1]
    context = " ".join(relevant_chunks)

    # 3. Build the prompt for Mistral-7B
    prompt = f"""
    [INST]
    Based on the following medical guidelines from the American Diabetes Association, answer the user's question. Make sure your answer is helpful, accurate, and concise.

    Medical Guidelines: {context}

    User Question: {user_query}
    [/INST]
    """

    # 4. Generate the answer with the model
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.7,
            top_p=0.95
        )
    # Decode only the newly generated tokens, skipping the echoed prompt
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response_text = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
    return response_text

# Now test it
user_query = "What are the physical activity recommendations for a diabetic patient?"
recommendation = get_recommendation_from_doc(user_query, chunks)

print("\n--- Model's Recommendation ---")
print(f"Your question: {user_query}")
print(f"Model's answer: {recommendation}")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



--- Model's Recommendation ---
Your question: What are the physical activity recommendations for a diabetic patient?
Model's answer: Based on the American Diabetes Association (ADA) guidelines, physical activity is recommended for all people with diabetes, including those with type 1 and type 2 diabetes. The ADA encourages a greater focus on increasing energy use through physical activity and reducing sedentary behavior. Regular exercise has been shown to improve cardiovascular fitness, muscle strength, insulin sensitivity, and overall well-being.

For adults with diabetes, the ADA recommends engaging in aerobic activity for at least 150 minutes per week. This can be achieved through various activities such as brisk walking, yoga, swimming, and dancing. Aerobic activity bouts should last at least 10 minutes, with the goal of at least 30 minutes per day or more on most days of the week.

Strength training exercises involving all major muscle groups should be performed at least two days