### Data Ingestion Pipeline

In [3]:
# Document data structure
# Has 2 main components: page_content and metadata
from langchain_core.documents import Document

In [6]:
doc = Document(
    page_content="This is the content of the document to be used to create the RAG system.",
    metadata={
        "source": "example.txt",
        "page": 1,
        "author": "Chirag Bellara",
        "date_created": "2025-01-01",
    }
)
# Metadata can allow us to apply filters on the documents when we query the vector database later.
doc

Document(metadata={'source': 'example.txt', 'page': 1, 'author': 'Chirag Bellara', 'date_created': '2025-01-01'}, page_content='This is the content of the document to be used to create the RAG system.')
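
To make the metadata-filtering idea concrete, here is a minimal plain-Python sketch. It does not query a vector database; `SimpleDoc` and `filter_by_metadata` are hypothetical stand-ins that only mirror the two fields shown above (`page_content` and `metadata`), and the sample values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDoc:
    """Hypothetical stand-in that mirrors Document's two fields."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def filter_by_metadata(docs: list[SimpleDoc], **criteria: Any) -> list[SimpleDoc]:
    """Return the docs whose metadata matches every key/value in criteria."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in criteria.items())
    ]


docs = [
    SimpleDoc("Intro text.", {"source": "example.txt", "page": 1}),
    SimpleDoc("More text.", {"source": "example.txt", "page": 2}),
    SimpleDoc("Other file.", {"source": "other.txt", "page": 1}),
]

# Keep only page 1 of example.txt
matches = filter_by_metadata(docs, source="example.txt", page=1)
print([d.page_content for d in matches])
```

A vector store would typically apply an equivalent filter on its side at query time; the exact filter syntax depends on the store and is not shown here.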

In [7]:
# Create the directory that will hold the sample text files
import os
os.makedirs("../data/text_files", exist_ok=True)

In [8]:
sample_texts = {
    "../data/text_files/services.txt" : """What We Treat
Headaches
Frequent headaches or migraines can be exhausting. At Quantum Bodyworks, we target root causes like posture, neck tension, and stress with personalized care for lasting relief.

Vertigo
Feeling dizzy or off balance? At Quantum Bodyworks, our vestibular rehab retrains your balance, reduces dizziness, and restores confidence in movement.

Neck and Shoulder Pain
Struggling with neck or shoulder pain? At Quantum Bodyworks, we pinpoint the cause and provide targeted treatments to relieve pain and restore movement.

Low Back Pain (LBP) & Sciatica
Frequent headaches or migraines can be draining. At Quantum Bodyworks, we address posture, neck tension, and stress to provide lasting, personalized relief.

Hip and Knee Pain
Hip or knee pain making daily tasks hard? At Quantum Bodyworks, we design personalized rehab to restore mobility, strength, and confidence in movement.

Tennis & Golfer’s Elbow
Forearm pain when gripping or lifting? At Quantum Bodyworks, we treat tennis and golfer’s elbow with targeted therapy, exercises, and strengthening to restore function and prevent re-injury.

Ankle Instability & Sprains
Weak or unstable ankles? At Quantum Bodyworks, we improve stability, balance, and strength with a structured rehab plan to restore confidence and control.

Plantar Fasciitis
Heel pain slowing you down? At Quantum Bodyworks, we treat plantar fasciitis with therapy, stretching, and strengthening to relieve pain and restore active movement.

""",
    "../data/text_files/terms_conditions.txt" : """

Quantum Bodyworks Physical Therapy & Wellness

Effective Date: 
Applies To: All patients receiving services from Quantum Bodyworks or any affiliated provider.

1. Purpose of This Agreement

This Terms & Conditions Agreement outlines the expectations, responsibilities, rights, and policies that govern all services provided by Quantum Bodyworks. By scheduling an appointment, signing electronically, or receiving treatment, patients acknowledge and agree to the terms described below.

-----------------------------------

2. Scheduling, Cancellation & No-Show Policy

2.1 Notice Requirements

Quantum Bodyworks requests 24-hour advance notice for cancellations or rescheduling.
A minimum of 8 hours is required to avoid disrupting care schedules and availability.

2.2 No Fees at This Time

Because services are concierge-based and payment is collected after treatment, no cancellation or no-show fees are charged at this time.

2.3 Repeated Cancellations

Repeated last-minute cancellations or no-shows may result in:
	•	loss of preferred appointment times,
	•	temporary suspension of scheduling privileges, or
	•	requirement of pre-payment for future sessions.

2.4 Late Arrivals

If the therapist arrives and the patient is not able to begin within 10 minutes, the session may be shortened to maintain schedule integrity.

2.5 Therapist Arrival Window

Due to Houston traffic and home-based services, therapists may require a 15–20 minute arrival window and will notify patients of any delay.

-----------------------------------

3. Payment, Billing & Insurance Policies

3.1 Payment Methods

Quantum Bodyworks accepts:
	•	Credit Card
	•	Debit Card
	•	Zelle

3.2 Payment Timing
	•	Payment is due immediately after each treatment session, OR
	•	at the time of booking, if applicable.

3.3 Package Purchases

Packages or bundles are:
	•	non-refundable,
	•	non-transferable,
	•	valid only for the purchasing patient.

Invoices are issued only for package plan participants.

3.4 Insurance Disclaimer

Quantum Bodyworks is a cash-based, out-of-network provider.
We do not:
	•	bill insurance companies,
	•	submit claims,
	•	guarantee reimbursement.

Patients may request a superbill, but reimbursement is not assured.

3.5 Travel & Service Area

Quantum Bodyworks provides mobile PT services throughout most areas of Houston, without travel fees.
Quantum Bodyworks may decline appointments outside reasonable travel distances or in unsafe environments.

-----------------------------------

4. Privacy, Communication & HIPAA Compliance

4.1 Communication Methods

Patients consent to communication via:
	•	text message,
	•	phone call,
	•	email,
	•	secure EMR portal messaging.

4.2 Consent to Receive Communications

Patients agree to receive:
	•	appointment reminders,
	•	scheduling updates,
	•	follow-up care messages,
	•	non-marketing practice updates.

Patients may opt out at any time except for essential clinical communications.

4.3 Marketing Communications

Marketing/promotional messages require separate written consent.

4.4 Telehealth Services

Telehealth may be offered only after:
	•	an in-person initial evaluation, and
	•	3–5 in-person sessions,
unless clinically appropriate.

Telehealth is delivered through a HIPAA-compliant platform.

4.5 HIPAA & Data Privacy

Quantum Bodyworks complies with all federal and state privacy regulations.
Patient information will not be disclosed without written authorization except where required by law.

-----------------------------------

5. Code of Conduct, Safety & Right to Refuse Service

5.1 Patient Conduct

Patients must maintain respectful, appropriate behavior at all times.
Harassment, inappropriate conduct, or threatening behavior will result in immediate termination of the session and possible discontinuation of care.

5.2 Therapist Discretion to Refuse Treatment

Therapists may decline, postpone, or terminate treatment if:
	•	the patient behaves inappropriately or aggressively,
	•	the environment is unsafe,
	•	the patient is intoxicated or impaired,
	•	medical red flags appear,
	•	the patient cannot safely participate in treatment,
	•	the patient refuses to follow instructions.

5.3 Home Environment Safety Requirements

Patients agree to provide:
	•	a clean, open treatment area,
	•	safe flooring,
	•	adequate lighting,
	•	pets secured in another room,
	•	a smoke-free environment.

Unsafe conditions may cause a session to be paused or canceled.

5.4 Provider Illness or Emergency

If a therapist becomes ill, is exposed to illness, or encounters an emergency, appointments may be rescheduled for safety. No compensation is owed to the patient.

-----------------------------------

6. Liability Waiver & Assumption of Risk

6.1 Physical Therapy Risks

Patients understand that PT may involve physical exertion that can cause temporary soreness, fatigue, or discomfort.

6.2 Patient Responsibilities

Patients are responsible for:
	•	providing accurate medical history,
	•	reporting new symptoms,
	•	adhering to home exercise programs,
	•	following therapist instructions.

6.3 Limitation of Liability

Quantum Bodyworks is not liable for injuries resulting from:
	•	failure to disclose medical history,
	•	failure to follow instructions,
	•	unsupervised activity outside of recommended exercises,
	•	pre-existing conditions.

-----------------------------------

7. Informed Consent for Treatment

7.1 Treatment Interventions

Patients consent to clinically appropriate treatments including, but not limited to:
manual therapy, stretching, strengthening, movement training, balance activities, and neuromuscular re-education.

7.2 No Guarantees

Recovery timelines and outcomes vary. No guarantees are made regarding results.

7.3 Emergency Protocol

In a medical emergency, therapists are authorized to call 911.
Patients accept responsibility for any associated costs.

7.4 Acknowledgment of Understanding

By receiving care, patients confirm they have read, understood, and agreed to this full Terms & Conditions Agreement.

-----------------------------------

8. Media, Photo, and Video Consent (Separate Form)

Quantum Bodyworks may request permission to take photos/videos for educational or marketing purposes.
A separate signed consent form is required before any media is taken or shared.

-----------------------------------

9. Agreement & Signature

By booking an appointment or receiving services from Quantum Bodyworks, the patient acknowledges:
	•	They have read and understand all policies.
	•	They agree to all terms stated above.
	•	They provide informed consent for treatment.
	•	They understand this agreement applies to all present and future services by Quantum Bodyworks.
""",
}

for filepath, content in sample_texts.items():
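    # Write as UTF-8 explicitly: the text contains non-ASCII characters (’, •, –)
    # and the loaders below read the files back with encoding="utf-8".
    with open(filepath, "w", encoding="utf-8") as f: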
        f.write(content)
print("Sample text files created.")


Sample text files created.
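
The sample text contains non-ASCII characters (the curly apostrophe in "Golfer’s", the bullet "•", and the en dash in "15–20"), so the encoding used to write the files should match the one used to read them. A small sketch of that round trip, assuming a temporary directory and a hypothetical file name rather than the notebook's `../data` folder:

```python
import tempfile
from pathlib import Path

sample = "Tennis & Golfer’s Elbow\n\t•\t15–20 minute arrival window\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sample.txt"  # hypothetical file name
    # Write and read with the same explicit encoding.
    path.write_text(sample, encoding="utf-8")
    round_tripped = path.read_text(encoding="utf-8")
    # Each of the three non-ASCII characters needs more than one byte in UTF-8.
    byte_count = len(path.read_bytes())

print(round_tripped == sample)
print(len(sample), byte_count)
```

If the file were written with a platform default encoding that is not UTF-8, reading it back as UTF-8 could raise a decode error or produce garbled characters.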


In [None]:
# A loader reads a file and returns its contents as a list of Document objects.
# Text files -> TextLoader
# PDFs -> PyPDFLoader or PyMuPDFLoader
# PyMuPDFLoader also populates additional metadata on its own.
from langchain_community.document_loaders import TextLoader
loader = TextLoader("../data/text_files/terms_conditions.txt", encoding="utf-8")
document = loader.load()
print(document)

[Document(metadata={'source': '../data/text_files/terms_conditions.txt'}, page_content='\n\nQuantum Bodyworks Physical Therapy & Wellness\n\nEffective Date: \nApplies To: All patients receiving services from Quantum Bodyworks or any affiliated provider.\n\n1. Purpose of This Agreement\n\nThis Terms & Conditions Agreement outlines the expectations, responsibilities, rights, and policies that govern all services provided by Quantum Bodyworks. By scheduling an appointment, signing electronically, or receiving treatment, patients acknowledge and agree to the terms described below.\n\n-----------------------------------\n\n2. Scheduling, Cancellation & No-Show Policy\n\n2.1 Notice Requirements\n\nQuantum Bodyworks requests 24-hour advance notice for cancellations or rescheduling.\nA minimum of 8 hours is required to avoid disrupting care schedules and availability.\n\n2.2 No Fees at This Time\n\nBecause services are concierge-based and payment is collected after treatment, no cancellation o
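
Judging from the output above, the loader returns a one-item list whose `page_content` is the full file text and whose `metadata` records the file path under `source`. A rough plain-Python sketch of that shape follows; `load_text_as_record` is a hypothetical helper that returns dicts, not the library's implementation, and the file name is made up.

```python
import tempfile
from pathlib import Path


def load_text_as_record(path: str, encoding: str = "utf-8") -> list[dict]:
    """Hypothetical helper: read one file into a one-item list of records."""
    text = Path(path).read_text(encoding=encoding)
    return [{"page_content": text, "metadata": {"source": str(path)}}]


with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "terms.txt"  # hypothetical file name
    file_path.write_text("1. Purpose of This Agreement\n", encoding="utf-8")
    records = load_text_as_record(str(file_path))

print(len(records))
print(records[0]["metadata"]["source"].endswith("terms.txt"))
```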

In [11]:
from langchain_community.document_loaders import DirectoryLoader
dir_loader = DirectoryLoader(
    path="../data/text_files",
    glob="**/*.txt",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"},
    show_progress=True,
)
documents = dir_loader.load()
documents


100%|██████████| 2/2 [00:00<00:00, 293.70it/s]


[Document(metadata={'source': '../data/text_files/services.txt'}, page_content='What We Treat\nHeadaches\nFrequent headaches or migraines can be exhausting. At Quantum Bodyworks, we target root causes like posture, neck tension, and stress with personalized care for lasting relief.\n\nVertigo\nFeeling dizzy or off balance? At Quantum Bodyworks, our vestibular rehab retrains your balance, reduces dizziness, and restores confidence in movement.\n\nNeck and Shoulder Pain\nStruggling with neck or shoulder pain? At Quantum Bodyworks, we pinpoint the cause and provide targeted treatments to relieve pain and restore movement.\n\nLow Back Pain (LBP) & Sciatica\nFrequent headaches or migraines can be draining. At Quantum Bodyworks, we address posture, neck tension, and stress to provide lasting, personalized relief.\n\nHip and Knee Pain\nHip or knee pain making daily tasks hard? At Quantum Bodyworks, we design personalized rehab to restore mobility, strength, and confidence in movement.\n\nTenn

### Chunking

In [15]:
import os
from pathlib import Path
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [16]:
# Read all the documents inside the directory
def process_all_files(directory: str, ext: str):
    """Process all the files in the given directory with the given extension"""
    all_documents = []
    root_dir = Path(directory)

    # Find all the files recursively
    files = list(root_dir.glob("**/*." + ext))
    print(f"Found {len(files)} files with the extension '{ext}' to process.")

    for file in files:
        print(f"\nProcessing: {file.name}")
        try:
            # Read as UTF-8 explicitly, matching the earlier loader cells
            loader = TextLoader(str(file), encoding="utf-8")
            documents = loader.load()

            for doc in documents:
                doc.metadata['source_file'] = file.name
                doc.metadata['file_type'] = ext

            all_documents.extend(documents)
            print(f"Loaded {len(documents)} pages!")
        
        except Exception as ex:
            print(f"\n\nError: {ex}")
    
    print(f"Total documents loaded: {len(all_documents)}")
    return all_documents

all_text_documents = process_all_files("../data", "txt")


Found 2 files with the extension 'txt' to process.

Processing: services.txt
Loaded 1 pages!

Processing: terms_conditions.txt
Loaded 1 pages!
Total documents loaded: 2
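
The cells under this heading load the files but stop before splitting them. As a rough illustration of what chunking with overlap means, here is a simplified fixed-size character chunker in plain Python. It is not the algorithm behind `RecursiveCharacterTextSplitter` (imported above), which prefers to break on separators such as paragraph breaks; `chunk_text` and its parameter values are assumptions made for this sketch.

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
chunks = chunk_text(sample, chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role overlap plays: text that straddles a boundary still appears intact in at least one chunk. In the notebook, the same idea would be applied to each loaded document's `page_content`.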
