<a href="https://colab.research.google.com/github/alchemistcohen/Theologos/blob/main/Theologos.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## "Theologos": A Conversational Agent for Christian Education

Theologos is an AI-powered chatbot designed to provide interactive, personalized Christian education. It combines three kinds of technology:

- Large Language Models (LLMs): for natural language understanding, generation, and engaging conversation.
- Vector databases: to efficiently store and retrieve relevant passages from a large corpus of theological texts, sermons, and teachings.
- APIs: to integrate with external services such as Bible translation APIs, hymn databases, and potentially social media platforms for community engagement.

In [1]:
!pip install gradio transformers sentence-transformers

In [3]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.9.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Downloading faiss_cpu-1.9.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.5 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.9.0.post1


In [4]:
import os
import gradio as gr
from sentence_transformers import SentenceTransformer
import faiss
import re

In [6]:
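def load_books(data_folder):
    """Read every .txt/.xml file in data_folder and split it into paragraph chunks."""
    documents = []
    # Sort filenames so the chunk order (and the matching FAISS ids) is reproducible.
    for filename in sorted(os.listdir(data_folder)):
        if not filename.endswith((".txt", ".xml")):
            continue
        file_path = os.path.join(data_folder, filename)
        # Note: .xml files are read as raw text, so their tags stay in the chunks.
        with open(file_path, "r", encoding="utf-8") as f:
            content = f.read()

        # Treat one or more blank lines as a paragraph boundary.
        for chunk in re.split(r"\n\n+", content):
            chunk = chunk.strip()
            # Keep only chunks longer than 100 characters.
            if len(chunk) > 100:
                documents.append({"source": filename, "text": chunk})
    return documents


# Folder of source texts in the Colab runtime.
data_folder = "/content/DATA BOOKS"
documents = load_books(data_folder)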
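
The chunking rule inside `load_books` can be tried on a plain string without touching the file system. The sketch below is illustrative only: `split_into_chunks`, `min_chars`, and `sample` are hypothetical names that do not appear in the notebook, and the 100-character threshold is copied from the cell above.

```python
import re


def split_into_chunks(content, min_chars=100):
    """Split text on blank lines and keep chunks longer than min_chars.

    Hypothetical helper mirroring the rule in load_books; not part of the notebook.
    """
    chunks = re.split(r"\n\n+", content)
    return [c.strip() for c in chunks if len(c.strip()) > min_chars]


# Made-up sample: a short heading, a long paragraph, a short one, another long one.
sample = "Short heading\n\n" + "A" * 120 + "\n\n\n" + "B" * 50 + "\n\n" + "C" * 101
chunks = split_into_chunks(sample)
print(len(chunks))  # 2: only the 120- and 101-character paragraphs survive
```

The comparison is strict, so a paragraph of exactly 100 characters is dropped along with headings and other short fragments.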


In [7]:
model = SentenceTransformer('all-MiniLM-L6-v2')

In [8]:
corpus_embeddings = model.encode([doc['text'] for doc in documents], convert_to_numpy=True)

In [9]:
index = faiss.IndexFlatL2(corpus_embeddings.shape[1])
index.add(corpus_embeddings)
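
`faiss.IndexFlatL2` performs an exhaustive search: it compares the query against every stored vector and ranks the results by squared Euclidean distance. The pure-Python sketch below illustrates that idea and is not FAISS's actual implementation; `squared_l2`, `flat_l2_search`, and the toy `vectors` are hypothetical names and data chosen for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Brute-force nearest-neighbour search over a list of vectors.

    Returns (distances, indices) for the k closest vectors, nearest first.
    """
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    distances = [d for d, _ in top]
    indices = [i for _, i in top]
    # FAISS fills missing slots with index -1 when fewer than k vectors exist;
    # the infinite distance used here is an assumption of this sketch.
    while len(indices) < k:
        distances.append(float("inf"))
        indices.append(-1)
    return distances, indices


# Toy 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
vectors = [[0, 0], [1, 0], [0, 2], [3, 3]]
distances, indices = flat_l2_search(vectors, [1, 0], k=2)
print(indices)    # [1, 0]
print(distances)  # [0, 1]
```

Because every query scans the whole corpus, this approach is exact but its cost grows linearly with the number of chunks.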

In [10]:
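def query_books(user_input):
    """Return the five passages closest to the question, each prefixed with its source file."""
    # Embed the question with the same model used for the corpus.
    query_embedding = model.encode([user_input], convert_to_numpy=True)

    # Both arrays have shape (1, k); row 0 belongs to the single query.
    distances, indices = index.search(query_embedding, k=5)  # Top 5 results

    results = []
    for i in indices[0]:
        # FAISS pads with -1 when the index holds fewer than k vectors;
        # without this check, documents[-1] would silently return the last chunk.
        if i < 0:
            continue
        result = documents[i]
        results.append(f"**{result['source']}**: {result['text']}")

    return "\n\n".join(results)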


Using Gradio to build the search interface:

In [11]:
with gr.Blocks() as theologos_app:
    gr.Markdown("# Theologos: Interactive Sacred Texts Bot")
    gr.Markdown("Ask a question or explore sacred texts from Christianity.")

    with gr.Row():
        user_query = gr.Textbox(label="Ask a Question:")
        search_button = gr.Button("Search")

    results_box = gr.Textbox(label="Results", lines=10, interactive=False)

    search_button.click(query_books, inputs=user_query, outputs=results_box)

In [12]:
theologos_app.launch()



