# Document Classification Using Open LLMs

## Prerequisites

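- Pull the required pretrained models with Ollama (`ollama pull` takes one model per call; `llama3` is included because it is used in the comparison below):

!ollama pull phi4-mini
!ollama pull llama3
!ollama pull gemma3:latest
!ollama pull mistral:7b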

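- Create the classification key-value context (`data/classification.json`), which maps each document type to a description.
- Collect the text files extracted from the original documents (`data/file1.txt`, `data/file2.txt`, `data/file3.txt`).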
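
The key-value context from the first step could be created as sketched below. The three document types are the ones that appear in the results further down; the descriptions and the `write_classification_context` helper are illustrative assumptions, not the original file contents.

```python
import json
from pathlib import Path

# Hypothetical descriptions: the original classification.json is not shown,
# only the type names (Resume, Job Post, Letter) appear in the results.
classification_context = {
    "Resume": "A summary of a person's education, skills, and work history.",
    "Job Post": "An advertisement describing an open position and its requirements.",
    "Letter": "Correspondence addressed from one party to another.",
}


def write_classification_context(path="data/classification.json"):
    """Write the type -> description mapping as JSON and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing
    with path.open("w", encoding="utf-8") as f:
        json.dump(classification_context, f, indent=2)
    return path
```

Calling `write_classification_context()` once before running the notebook produces a file in the shape that `getModelResponse` expects: a flat JSON object whose keys are document types and whose values are descriptions.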
  



In [1]:
!ollama list

NAME                ID              SIZE      MODIFIED          
llama3:latest       365c0bd3c000    4.7 GB    52 minutes ago       
gemma3:latest       a2af6cc3eb7f    3.3 GB    About an hour ago    
phi4-mini:latest    78fad5d182a7    2.5 GB    2 hours ago          
mistral:7b          f974a74358d6    4.1 GB    7 days ago           


In [2]:
import ollama 
import json 
import time 
import os


def getModelResponse(document_text, model="phi4-mini", classification_json_path='data/classification.json'):
    """
    Classifies a document with an Ollama model, using the document types
    and descriptions in classification.json as context.
    Returns the model's raw reply (expected to be the document type) as a string.
    """
    # Load the {document type: description} mapping.
    with open(classification_json_path, 'r') as f:
        classification_context = json.load(f)

    # Turn the mapping into a bulleted list the model can read.
    context_str = "Document Types and Descriptions:\n"
    for doc_type, desc in classification_context.items():
        context_str += f"- {doc_type}: {desc}\n"

    prompt = (
        f"{context_str}\n"
        "Given the above document types, classify the following document and return only the type:\n\n"
        f"{document_text}"
    )

    # Send a single-turn chat request to the local Ollama server.
    response = ollama.chat(
        model=model,
        messages=[
            {'role': 'user', 'content': prompt},
        ]
    )
    doc_type = response['message']['content']

    return doc_type
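
Because the function returns the model's raw reply, a model may answer with a full sentence or with stray whitespace instead of a bare label (both happen with `mistral:7b` in the results below). One way to handle this is a small post-processing step. The sketch below is an illustrative addition; `normalize_label` is a hypothetical helper, not part of the original notebook.

```python
def normalize_label(raw_response, known_labels):
    """Map a free-form model reply onto one of the known labels.

    Returns the matching label, or None when no known label appears in the reply.
    """
    text = raw_response.strip().lower()

    # An exact (case-insensitive) match wins outright.
    for label in known_labels:
        if text == label.lower():
            return label

    # Otherwise take the known label that appears earliest in the reply.
    hits = [
        (text.find(label.lower()), label)
        for label in known_labels
        if label.lower() in text
    ]
    return min(hits)[1] if hits else None
```

In practice `known_labels` would be the keys of the loaded `classification.json`, so the labels stay in sync with the prompt context.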


In [3]:
import time

files = ['file1.txt', 'file2.txt', 'file3.txt']
models = ['phi4-mini', 'llama3', 'gemma3', 'mistral:7b']
# models = ['llama3']


for file_name in files:
    file_path = f'data/{file_name}'
    # Use a separate name for the file handle so it does not shadow the loop variable.
    with open(file_path, 'r') as f:
        content = f.read().strip()

    # Classify the same document with every model and time each call.
    for model in models:
        start_time = time.time()
        doctype = getModelResponse(content, model)
        end_time = time.time()
        print(f"{model} Document Type: {doctype}  Execution Duration: {end_time - start_time:.2f} seconds")

    print(f"\t\t****** end o file process {file_path} *******\n")



phi4-mini Document Type: Resume  Execution Duration: 14.95 seconds
llama3 Document Type: Resume  Execution Duration: 34.15 seconds
gemma3 Document Type: Resume  Execution Duration: 0.45 seconds
mistral:7b Document Type:  Resume  Execution Duration: 0.48 seconds
		****** end o file process data/file1.txt *******

phi4-mini Document Type: Job Post  Execution Duration: 15.03 seconds
llama3 Document Type: Job Post  Execution Duration: 30.52 seconds
gemma3 Document Type: Job Post  Execution Duration: 8.28 seconds
mistral:7b Document Type:  Job Post  Execution Duration: 20.19 seconds
		****** end o file process data/file2.txt *******

phi4-mini Document Type: Job Post  Execution Duration: 13.90 seconds
llama3 Document Type: Letter  Execution Duration: 27.44 seconds
gemma3 Document Type: Letter  Execution Duration: 6.78 seconds
mistral:7b Document Type:  The provided document is a Job Post or Job Offer.  Execution Duration: 17.94 seconds
		****** end o file process data/file3.txt *******
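
The timings printed above can be summarized per model. The sketch below copies the values by hand from the output above; the `runs` list and the `mean_duration_by_model` helper are illustrative names, not part of the original notebook.

```python
from collections import defaultdict
from statistics import mean

# (file, model, raw reply, duration in seconds), copied from the output above.
runs = [
    ("file1.txt", "phi4-mini", "Resume", 14.95),
    ("file1.txt", "llama3", "Resume", 34.15),
    ("file1.txt", "gemma3", "Resume", 0.45),
    ("file1.txt", "mistral:7b", " Resume", 0.48),
    ("file2.txt", "phi4-mini", "Job Post", 15.03),
    ("file2.txt", "llama3", "Job Post", 30.52),
    ("file2.txt", "gemma3", "Job Post", 8.28),
    ("file2.txt", "mistral:7b", " Job Post", 20.19),
    ("file3.txt", "phi4-mini", "Job Post", 13.90),
    ("file3.txt", "llama3", "Letter", 27.44),
    ("file3.txt", "gemma3", "Letter", 6.78),
    ("file3.txt", "mistral:7b", " The provided document is a Job Post or Job Offer.", 17.94),
]


def mean_duration_by_model(runs):
    """Return {model: mean duration in seconds} over all recorded runs."""
    durations = defaultdict(list)
    for _file, model, _reply, seconds in runs:
        durations[model].append(seconds)
    return {model: mean(times) for model, times in durations.items()}


for model, avg in mean_duration_by_model(runs).items():
    print(f"{model}: {avg:.2f} s on average")
```

With only three files per model, these averages are a rough indication rather than a benchmark.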

