### Google Colab Document:

### Phase 2: Data Processing & Indexing

Phase 2.1 — Text Extraction from Word (DOCX)

In [2]:
# Install library
!pip install python-docx

from docx import Document

# Path to your DOCX file (upload to Colab or set local path in VS Code)
docx_path = "/content/DataBase - The 6-Step Personal Finance Reset - Book.docx"

# Open DOCX
doc = Document(docx_path)

# Extract text
full_text = ""
for para in doc.paragraphs:
    if para.text.strip():  # Skip empty paragraphs
        full_text += para.text.strip() + "\n\n"

# Save extracted text
with open("/content/finance_book_raw.txt", "w", encoding="utf-8") as f:
    f.write(full_text)

print("✅ Text extracted. Sample preview:")
print(full_text[:1000])  # Show first 1000 characters


✅ Text extracted. Sample preview:
The 6-Step Personal Finance Reset

A Clear Roadmap: Diagnose, Reduce Chaos, Take Control, Increase Income, Save, Stabilize

By Sergey Krichevskiy

Copyright © 2025 by Sergey Krichevskiy
All rights reserved. No part of this book may be reproduced or transmitted
in any form or by any means, electronic or mechanical, including photocopying,
recording, or by any information storage and retrieval system,
without written permission from the publisher.

Cover and interior design by AI-assisted publishing
First Edition – 2025

Introduction

“Why Money Feels So Hard — and How to Make It Simple”
By Sergey Krichevskiy

If you’re holding this book, chances are — you’re tired.

Not just physically. You’re tired of the uncertainty. Of working hard, doing your best, trying to be smart with money — but still feeling like it slips through your fingers. The income comes in, and somehow... it’s never enough. It leaks. It disappears. It turns into debt, or anxiety, or exh
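
The preview above already contains several non-ASCII characters (the copyright sign, an en dash, curly quotes and apostrophes). Before choosing a cleaning strategy, it can help to list which of these characters occur and how often. The following is a minimal sketch, not part of the original notebook: the helper name `non_ascii_report` and the `sample` string are hypothetical, and in the notebook the function would be called on `full_text` instead.

```python
from collections import Counter
import unicodedata


def non_ascii_report(text, top=10):
    """Return (character, Unicode name, count) for the most common non-ASCII characters."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common(top)
    ]


# Hypothetical sample; in the notebook, pass full_text instead.
sample = "Copyright \u00a9 2025 \u2013 you\u2019re tired, you\u2019re done"
for ch, name, n in non_ascii_report(sample):
    print(f"{ch!r} {name}: {n}")
```

A report like this makes it easier to decide which characters to keep, which to map to an ASCII equivalent, and which to drop.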

Phase 2.2 — Text Cleaning (Preprocessing)

In [3]:
import re

# Load raw text
with open("/content/finance_book_raw.txt", "r", encoding="utf-8") as f:
    text = f.read()

# 1. Collapse runs of spaces and tabs into a single space
text = re.sub(r'[ \t]+', ' ', text)

# 2. Remove multiple empty lines (keep max 1)
text = re.sub(r'\n\s*\n+', '\n\n', text)

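# 3. Remove every character outside printable ASCII.
#    Caution: this also deletes ©, dashes, curly quotes and apostrophes,
#    and leaves doubled spaces behind because step 1 has already run.
text = re.sub(r'[^\x20-\x7E\n]', '', text)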

# 4. Strip leading/trailing spaces on each line
text = "\n".join([line.strip() for line in text.splitlines()])

# Save cleaned text
with open("/content/finance_book_clean.txt", "w", encoding="utf-8") as f:
    f.write(text)

print("✅ Cleaning completed. Sample:")
print(text[:1000])


✅ Cleaning completed. Sample:
The 6-Step Personal Finance Reset

A Clear Roadmap: Diagnose, Reduce Chaos, Take Control, Increase Income, Save, Stabilize

By Sergey Krichevskiy

Copyright  2025 by Sergey Krichevskiy
All rights reserved. No part of this book may be reproduced or transmitted
in any form or by any means, electronic or mechanical, including photocopying,
recording, or by any information storage and retrieval system,
without written permission from the publisher.

Cover and interior design by AI-assisted publishing
First Edition  2025

Introduction

Why Money Feels So Hard  and How to Make It Simple
By Sergey Krichevskiy

If youre holding this book, chances are  youre tired.

Not just physically. Youre tired of the uncertainty. Of working hard, doing your best, trying to be smart with money  but still feeling like it slips through your fingers. The income comes in, and somehow... its never enough. It leaks. It disappears. It turns into debt, or anxiety, or exhaustion.

And t
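
The sample output above shows a side effect of the ASCII-only filter: the copyright sign, dashes, curly quotes and apostrophes are gone ("youre", "its"), and doubled spaces remain where a character was removed. One possible alternative, sketched here under stated assumptions, is to map typographic punctuation to ASCII equivalents, drop only control and format characters, and collapse whitespace last. The function name `clean_text` and the contents of `PUNCT_MAP` are hypothetical choices, not the notebook's method, and the mapping would likely need extending for other books.

```python
import re
import unicodedata

# Hypothetical mapping of common typographic characters to ASCII equivalents.
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes / apostrophe
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u00a0": " ",                  # non-breaking space
})


def clean_text(raw):
    """Normalize punctuation and whitespace without discarding printable Unicode."""
    text = unicodedata.normalize("NFC", raw).translate(PUNCT_MAP)
    # Drop control/format characters (Unicode categories starting with "C"),
    # but keep newlines and tabs so the paragraph structure survives.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse whitespace only after characters have been replaced or removed,
    # so no doubled spaces are left behind.
    text = re.sub(r"[ \t]+", " ", text)
    text = "\n".join(line.strip() for line in text.splitlines())
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()


# Hypothetical usage with a short string; in the notebook, pass the raw file text.
print(clean_text("If you\u2019re  tired \u2014 rest.\n\n\n\nCopyright \u00a9 2025"))
```

Keeping apostrophes matters for the later chunking and indexing phases, because "youre" and "you're" are treated as different strings by most tokenizers and search tools.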

Phase 2.3 — Semantic Chunking