
In [16]:
!pip install minsearch sentence-transformers tqdm python-frontmatter




In [17]:

import io
import zipfile
import requests
import frontmatter

def read_repo_data(repo_owner, repo_name):
    """
    Download and parse all markdown files (.md and .mdx) from the main
    branch of a GitHub repository.

    Args:
        repo_owner: GitHub username or organization
        repo_name: Repository name

    Returns:
        List of dictionaries containing file content and metadata
    """
    prefix = 'https://codeload.github.com'
    url = f'{prefix}/{repo_owner}/{repo_name}/zip/refs/heads/main'
    resp = requests.get(url)

    if resp.status_code != 200:
        raise Exception(f"Failed to download repository: {resp.status_code}")

    repository_data = []
    zf = zipfile.ZipFile(io.BytesIO(resp.content))

    for file_info in zf.infolist():
        filename = file_info.filename
        filename_lower = filename.lower()

        if not (filename_lower.endswith('.md')
            or filename_lower.endswith('.mdx')):
            continue

        try:
            with zf.open(file_info) as f_in:
                content = f_in.read().decode('utf-8', errors='ignore')
                post = frontmatter.loads(content)
                data = post.to_dict()
                data['filename'] = filename
                repository_data.append(data)
        except Exception as e:
            print(f"Error processing {filename}: {e}")
            continue

    zf.close()
    return repository_data


In [18]:
docs = read_repo_data('letta-ai', 'letta')

print(f"Letta documents: {len(docs)}")

Letta documents: 17


# Text Search (Lexical Search)

In [19]:
from minsearch import Index

# Index only the markdown body: these documents have no "question" field.
faq_index = Index(
    text_fields=["content"],
    keyword_fields=[]
)

faq_index.fit(docs)

<minsearch.minsearch.Index at 0x7dc2f1c6a750>
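
For intuition, lexical search scores a document by the query terms it contains, giving more weight to terms that are rare across the collection. The sketch below is a toy TF-IDF scorer in plain Python. It is not minsearch's implementation; `tokenize`, `tfidf_scores`, and `toy_docs` are hypothetical names and data used only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_scores(query, documents):
    """Score each document by the summed TF-IDF weight of the query terms."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Smoothed inverse document frequency: rarer terms count more.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
                score += tf[term] * idf
        scores.append(score)
    return scores


# Hypothetical toy corpus, not taken from the repository.
toy_docs = [
    "How to contribute to the project",
    "Install dependencies with uv",
    "Terms of service and privacy",
]
scores = tfidf_scores("install uv", toy_docs)
best = max(range(len(toy_docs)), key=lambda i: scores[i])
print(toy_docs[best])
```

Documents that share no term with the query score zero, which is why a query about course enrollment retrieves essentially arbitrary files from this repository.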

In [20]:
query = "Can I join the course now?"
text_results = faq_index.search(query)

text_results

[{'content': '# 🚀 How to Contribute to Letta\n\nThank you for investing time in contributing to our project! Here\'s a guide to get you started.\n\n## 1. 🚀 Getting Started\n\n### 🍴 Fork the Repository\n\nFirst things first, let\'s get you a personal copy of Letta to play with. Think of it as your very own playground. 🎪\n\n1. Head over to the Letta repository on GitHub.\n2. In the upper-right corner, hit the \'Fork\' button.\n\n### 🚀 Clone the Repository\n\nNow, let\'s bring your new playground to your local machine.\n\n```shell\ngit clone https://github.com/your-username/letta.git\n```\n\n### 🧩 Install dependencies & configure environment\n\n#### Install uv and dependencies\n\nFirst, install uv using [the official instructions here](https://docs.astral.sh/uv/getting-started/installation/).\n\nOnce uv is installed, navigate to the letta directory and install the Letta project with uv:\n```shell\ncd letta\neval $(uv env activate)\nuv sync --all-extras\n```\n#### Setup P

# Vector Search

In [21]:
from sentence_transformers import SentenceTransformer
import numpy as np

embedding_model = SentenceTransformer("multi-qa-distilbert-cos-v1")

In [22]:
print(docs[0].keys())


dict_keys(['name', 'about', 'title', 'labels', 'assignees', 'content', 'filename'])


In [23]:
faq_embeddings = []

for d in docs:
    text_parts = []

    if "title" in d and d["title"]:
        text_parts.append(d["title"])

    if "about" in d and d["about"]:
        text_parts.append(d["about"])

    if "content" in d and d["content"]:
        text_parts.append(d["content"])

    text = " ".join(text_parts)

    emb = embedding_model.encode(text)
    faq_embeddings.append(emb)

faq_embeddings = np.array(faq_embeddings)

In [24]:
from minsearch import VectorSearch

faq_vindex = VectorSearch()
faq_vindex.fit(faq_embeddings, docs)


<minsearch.vector.VectorSearch at 0x7dc2f0d8ffb0>
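
Vector search ranks documents by how close their embeddings are to the query embedding. The sketch below shows that idea with cosine similarity on tiny hand-made vectors in plain Python. It is an illustration under stated assumptions, not minsearch's implementation; `cosine_similarity`, `rank_by_similarity`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, num_results=5):
    """Return document indices sorted from most to least similar."""
    scored = [
        (cosine_similarity(query_vec, vec), i)
        for i, vec in enumerate(doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:num_results]]


# Hypothetical 3-dimensional "embeddings"; real models use hundreds of dimensions.
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [0.9, 0.1, 0.0]

ranking = rank_by_similarity(toy_query_vec, toy_doc_vecs, num_results=2)
print(ranking)
```

Unlike lexical search, this always returns the nearest documents even when none of them is actually relevant, so the similarity scores are worth inspecting before trusting the top results.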

In [25]:
query = "I just found out about the course. Can I enroll?"
q = embedding_model.encode(query)

vector_results = faq_vindex.search(q)
vector_results


[{'content': '# About\nThese certs are used to set up a localhost https connection to the ADE.\n\n## Instructions\n1. Install [mkcert](https://github.com/FiloSottile/mkcert)\n2. Run `mkcert -install`\n3. Run letta with the environment variable `LOCAL_HTTPS=true`\n4. Access the app at [https://app.letta.com/development-servers/local/dashboard](https://app.letta.com/development-servers/local/dashboard)\n5. Click "Add remote server" and enter `https://localhost:8283` as the URL, leave password blank unless you have secured your ADE with a password.',
  'filename': 'letta-main/certs/README.md'},
  'filename': 'letta-main/TERMS.md'},
 {'content': '# Letta + local LLMs\n\nSee [https://letta.readme.io/docs/local_llm](https://letta.readme.io/docs/local_llm) for documentation on running Letta with custom LLM backends.',
  'filename': 'letta-main/letta/local_llm/README.md'},
  'filename': 'letta-main/PRIVACY.md'},
 {'content': '# 🚀 How to Contribute to Letta\n\nThank you for investing time in

# Hybrid Search

In [26]:
def text_search(query):
    return faq_index.search(query, num_results=5)

def vector_search(query):
    q = embedding_model.encode(query)
    return faq_vindex.search(q, num_results=5)

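def hybrid_search(query):
    """Merge lexical and vector results, dropping duplicates by filename."""
    # Text results are listed first, so they take priority in the merged list.
    text_results = text_search(query)
    vector_results = vector_search(query)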

    seen_ids = set()
    final_results = []

    for r in text_results + vector_results:
        doc_id = r.get("filename")

        if doc_id not in seen_ids:
            seen_ids.add(doc_id)
            final_results.append(r)

    return final_results
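
The `hybrid_search` function above concatenates the two lists, so every lexical hit outranks every vector hit. One common alternative is reciprocal rank fusion (RRF), which scores each document by summing `1 / (k + rank)` over every list it appears in. The sketch below is a swapped-in technique, not what this notebook uses; `rrf_merge`, the constant `k=60`, and the toy result lists are assumptions for illustration.

```python
def rrf_merge(result_lists, key="filename", k=60):
    """Fuse several ranked result lists with reciprocal rank fusion."""
    scores = {}
    docs_by_id = {}

    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            doc_id = doc[key]
            # Each appearance adds 1 / (k + rank); higher ranks add more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
            # Keep the first copy of each document that we see.
            docs_by_id.setdefault(doc_id, doc)

    ordered_ids = sorted(scores, key=scores.get, reverse=True)
    return [docs_by_id[doc_id] for doc_id in ordered_ids]


# Hypothetical ranked lists standing in for lexical and vector results.
toy_text_results = [
    {"filename": "a.md"},
    {"filename": "b.md"},
    {"filename": "c.md"},
]
toy_vector_results = [
    {"filename": "b.md"},
    {"filename": "d.md"},
    {"filename": "a.md"},
]

merged = rrf_merge([toy_text_results, toy_vector_results])
merged_names = [d["filename"] for d in merged]
print(merged_names)
```

With the functions defined in this notebook, the same helper could be called as `rrf_merge([text_search(query), vector_search(query)])`. Documents found by both searches rise to the top, while the order within each list still matters.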

In [27]:
query = "Can I enroll now?"
results = hybrid_search(query)

results


[{'content': '# 🚀 How to Contribute to Letta\n\nThank you for investing time in contributing to our project! Here\'s a guide to get you started.\n\n## 1. 🚀 Getting Started\n\n### 🍴 Fork the Repository\n\nFirst things first, let\'s get you a personal copy of Letta to play with. Think of it as your very own playground. 🎪\n\n1. Head over to the Letta repository on GitHub.\n2. In the upper-right corner, hit the \'Fork\' button.\n\n### 🚀 Clone the Repository\n\nNow, let\'s bring your new playground to your local machine.\n\n```shell\ngit clone https://github.com/your-username/letta.git\n```\n\n### 🧩 Install dependencies & configure environment\n\n#### Install uv and dependencies\n\nFirst, install uv using [the official instructions here](https://docs.astral.sh/uv/getting-started/installation/).\n\nOnce uv is installed, navigate to the letta directory and install the Letta project with uv:\n```shell\ncd letta\neval $(uv env activate)\nuv sync --all-extras\n```\n#### Setup P

In [28]:
for i, r in enumerate(results[:3], 1):
    print(f"\n🔹 Result {i}")
    print("File:", r["filename"])
    print("Preview:")
    print(r["content"][:300])


🔹 Result 1
File: letta-main/CONTRIBUTING.md
Preview:
# 🚀 How to Contribute to Letta

Thank you for investing time in contributing to our project! Here's a guide to get you started.

## 1. 🚀 Getting Started

### 🍴 Fork the Repository

First things first, let's get you a personal copy of Letta to play with. Think of it as your very own playground. 🎪

1.

🔹 Result 2
File: letta-main/.github/pull_request_template.md
Preview:
**Please describe the purpose of this pull request.**
Is it to add a new feature? Is it to fix a bug?

**How to test**
How can we test your PR during review? What commands should we run? What outcomes should we expect?

**Have you tested this PR?**
Have you tested the latest commit on the PR? If so 

🔹 Result 3
File: letta-main/letta/plugins/README.md
Preview:
### Plugins

Plugins enable plug and play for various components.

Plugin configurations can be set in `letta.settings.settings`.

The plugins will take a delimited list of consisting of ind