Unzip the training data.

In [4]:
# Install the Ollama Python client (the Ollama server itself must already be installed and running locally)
!pip install ollama



In [19]:
import zipfile
with zipfile.ZipFile("Veda.zip","r") as zip_ref:
    zip_ref.extractall("Veda")
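`main()` later reads from `Veda/Veda/`, which suggests the archive holds a top-level `Veda/` folder that `extractall("Veda")` nests one level deeper. A minimal sketch, assuming that layout, for checking which `.txt` files the archive contains before extracting it; `list_txt_members` and the demo file names are hypothetical:

```python
import io
import zipfile
from typing import List

def list_txt_members(zip_source) -> List[str]:
    """Return the sorted .txt entries of a zip archive (path or file-like object)."""
    with zipfile.ZipFile(zip_source, "r") as zf:
        # namelist() lists every member path without extracting anything
        return sorted(name for name in zf.namelist() if name.endswith(".txt"))

# Offline demo: an in-memory archive that mimics a top-level "Veda/" folder
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Veda/example_1.txt", "Description line\nShloka text")
    zf.writestr("Veda/readme.md", "not a verse file")
buffer.seek(0)
print(list_txt_members(buffer))  # ['Veda/example_1.txt']
```

Calling `list_txt_members("Veda.zip")` on the real archive shows whether the member paths start with `Veda/`; if they do, extracting into `"Veda"` produces the `Veda/Veda/` directory that `main()` expects.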

In [21]:
# Download the llama3.2 1B model through the local Ollama server
modelUsed = "llama3.2:1b"
import ollama
ollama.pull(modelUsed)

{'status': 'success'}

Next, send each Veda text to Ollama and save the formatted question/answer pairs as a JSON Lines file for training.
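The training file is meant to follow a ChatGPT-style layout. OpenAI's chat fine-tuning format expects JSON Lines where each line is one conversation object with a `messages` list of `role`/`content` pairs. A minimal validation sketch, assuming that target format; `validate_chat_jsonl` is a hypothetical helper, and it checks for the exact system/user/assistant triple this notebook builds, which is stricter than the format itself requires:

```python
import json
from typing import Iterable, List

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_chat_jsonl(lines: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means every line looks valid."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append(f"line {lineno}: invalid JSON ({e.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append(f"line {lineno}: missing 'messages' list")
            continue
        roles = [m.get("role") for m in messages if isinstance(m, dict)]
        if roles != EXPECTED_ROLES:
            problems.append(f"line {lineno}: roles {roles} != {EXPECTED_ROLES}")
        # Every message should carry non-empty text
        if any(not str(m.get("content", "")).strip() for m in messages if isinstance(m, dict)):
            problems.append(f"line {lineno}: empty content")
    return problems
```

To check the file written by `main()`, open it and pass the file object directly, e.g. `with open(path, encoding="utf-8") as f: print(validate_chat_jsonl(f))`. A file that stores one message per line, rather than one conversation per line, shows up as `missing 'messages' list` errors.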

In [22]:
!pip install pandas
!pip install requests


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [23]:
import os
import json
import requests
import pandas as pd
from typing import List, Dict




In [24]:

stream = ollama.chat(
    model=modelUsed,
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

The sky appears blue to us because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh. He discovered that when sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen.

These gas molecules are much smaller than the wavelength of light, so they scatter the light in all directions. But they scatter shorter wavelengths of light, like blue and violet, more than longer wavelengths, like red and orange. This is why the sky typically appears blue during the daytime.

The shorter wavelength blue light is also responsible for the green color of sunset and sunrise, as it has a longer path through the atmosphere, scattering off more particles in the air and giving those colors their characteristic hues.

It's worth noting that at night, when there are fewer gas molecules in the atmosphere, the sky appears black. This is because all the wavelengths of light are scattered in the same way, so there is no color to 
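The streaming cell prints each chunk as it arrives. When the full reply is also needed as a string (for example, to parse it afterwards), the chunks can be accumulated while printing. A minimal sketch; `collect_stream` is a hypothetical helper, and it assumes each chunk can be indexed as `chunk['message']['content']`, exactly as the loop above does:

```python
from typing import Any, Iterable

def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Join the content of streamed chat chunks into one string, optionally echoing it live."""
    parts = []
    for chunk in chunks:
        piece = chunk["message"]["content"]
        if echo:
            print(piece, end="", flush=True)  # same live output as the loop above
        parts.append(piece)
    if echo:
        print()  # end the streamed line
    return "".join(parts)

# Offline demo with fake chunks shaped like the streamed responses
fake = [{"message": {"content": "The sky "}}, {"message": {"content": "is blue."}}]
print(collect_stream(fake, echo=False))  # The sky is blue.
```

With a running server this would be used as `answer = collect_stream(ollama.chat(model=modelUsed, messages=[...], stream=True))`.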

In [25]:
import os
import json
from typing import List, Dict
import re
from ollama import Client

def read_veda_files(directory: str) -> List[Dict[str, str]]:
    veda_data = []
    for filename in os.listdir(directory):
        if filename.endswith(".txt"):
            with open(os.path.join(directory, filename), 'r', encoding='utf-8') as file:
                lines = file.readlines()
                if lines:
                    veda_data.append({
                        'filename': filename,
                        'description': lines[0].strip(),
                        'shlokas': ''.join(lines[1:]).strip()
                    })
    return veda_data

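def extract_fields(text: str) -> Dict[str, str]:
    """Pull the "input" and "response" string values out of the model's JSON-like reply."""
    fields = {}
    # Match a complete double-quoted string, allowing escaped characters such as \" inside it
    patterns = {
        'input': r'"input"\s*:\s*"((?:[^"\\]|\\.)*)"',
        'response': r'"response"\s*:\s*"((?:[^"\\]|\\.)*)"'
    }
    
    for field, pattern in patterns.items():
        match = re.search(pattern, text, re.DOTALL)
        if match:
            # Unescape embedded quotes; raw newlines in the value are kept as-is
            fields[field] = match.group(1).replace('\\"', '"').strip()
        else:
            fields[field] = ""
    
    return fields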

def generate_training_data(veda_data: List[Dict[str, str]], model: str = modelUsed) -> List[Dict[str, str]]:
    client = Client()  # Using default local connection
    
    training_data = []
    
    for entry in veda_data:
        prompt = f"""
Based on the following Veda content, generate a question related to everyday life and provide an answer based on the wisdom in the text. Include a relevant Sanskrit quote, its English meaning, and the source reference within the response. Format the output as JSON with "input" and "response" fields.

Veda content:
Filename: {entry['filename']}
Description: {entry['description']}
Shlokas: {entry['shlokas']}

Example format:
{{
  "input": "How can I find peace in a stressful situation?",
  "response": "According to the Veda, one way to find peace in a stressful situation is through the practice of meditation and self-reflection. This wisdom is beautifully captured in the following Sanskrit verse from the Bhagavad Gita (Chapter 2, Verse 48):

'योगस्थः कुरु कर्माणि सङ्गं त्यक्त्वा धनंजय।'

Which means:

'Established in Yoga, perform actions abandoning attachment, O Dhananjaya (Arjuna).'

This verse teaches us that by maintaining a state of mental equilibrium (yoga) and performing our duties without attachment to the results, we can find inner peace even in the midst of stressful situations. The practice of detachment allows us to navigate challenges with a calm and focused mind."
}}

Generate one such entry based on the given Veda content:
"""

        try:
            response = client.generate(model=model, prompt=prompt)
            result = extract_fields(response['response'])
            if all(result.values()):
                training_data.append(result)
                print(result)
                print("----")
            else:
                print(f"Incomplete data for file {entry['filename']}")
        except Exception as e:
            print(f"Error with file {entry['filename']}: {str(e)}")
    
    return training_data

def format_chatgpt_data(training_data: List[Dict[str, str]]) -> List[Dict[str, List[Dict[str, str]]]]:
    # Chat fine-tuning JSONL expects one conversation per line, each with a "messages" list
    system_prompt = "You are an assistant well-versed in Vedic wisdom, capable of applying it to everyday life situations."
    chatgpt_format = []
    for item in training_data:
        chatgpt_format.append({
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": item['input']},
                {"role": "assistant", "content": item['response']}
            ]
        })
    return chatgpt_format

def main():
    veda_directory = "Veda/Veda/"
    veda_data = read_veda_files(veda_directory)
    
    print("Generating training data...")
    training_data = generate_training_data(veda_data, model=modelUsed)
    
    chatgpt_data = format_chatgpt_data(training_data)
    
    # Save the training data as JSON Lines: one conversation object per line
    output_file = "veda_training_data.jsonl"
    with open(output_file, 'w', encoding='utf-8') as f:
        for item in chatgpt_data:
            json.dump(item, f, ensure_ascii=False)
            f.write('\n')
    print(f"\nTraining data saved to {output_file}")
    
    # Display a sample of the generated data
    print("\nSample of generated training data:")
    for item in chatgpt_data[:2]:  # Show 2 complete conversations
        print(json.dumps(item, ensure_ascii=False))

if __name__ == "__main__":
    main()

Generating training data...
{'input': 'How can I cultivate mindfulness in my daily routine?', 'response': "According to the Veda, cultivating mindfulness involves becoming aware of the present moment through meditation and self-reflection. This wisdom is beautifully captured in the following Sanskrit verse from the Bhagavad Gita (Chapter 3, Verse 27):\n\n'अत्यंत योगस्थः कर्माणि सङ्गं त्यक्त्वा धनंजय।'\n\nWhich means:\n\n'Established in Yoga, perform actions abandoning attachment, O Dhananjaya (Arjuna).'\n\nThis verse teaches us that by practicing mindfulness and detachment, we can cultivate a state of mental equilibrium and find inner peace in our daily routines."}
----
{'input': 'How can I cultivate self-awareness to navigate challenges effectively?', 'response': "According to the Bhagavad Gita (Chapter 3, Verse 24), one way to cultivate self-awareness is through introspection and recognizing our thoughts and emotions. This wisdom is beautifully captured in the following Sanskrit vers
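The example format in the prompt puts literal line breaks inside the `"response"` string, which strict JSON forbids, so replies that copy it are rejected by a default `json.loads`. Python's `json.loads` accepts such control characters inside strings when called with `strict=False`, which offers a parse-first alternative to the regex in `extract_fields`. A minimal sketch; `parse_model_json` is a hypothetical helper and assumes the reply contains a single JSON object:

```python
import json
from typing import Dict, Optional

def parse_model_json(text: str) -> Optional[Dict[str, str]]:
    """Parse the outermost {...} span of a model reply; return None if it is unusable."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the reply
    try:
        # strict=False allows raw newlines and tabs inside string values
        data = json.loads(text[start:end + 1], strict=False)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    fields = {key: str(data.get(key, "")).strip() for key in ("input", "response")}
    # Mirror the notebook's rule: keep only entries where both fields are present
    return fields if all(fields.values()) else None

reply = 'Here is the entry:\n{\n  "input": "How can I stay calm?",\n  "response": "Line one.\nLine two."\n}'
print(parse_model_json(reply))
```

Returning `None` for incomplete replies matches how `generate_training_data` already skips entries whose fields are empty.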