## Midio documentation FAISS

In [None]:
import os
import pandas as pd
import json
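from langchain_community.vectorstores import FAISS  # Current import path; langchain.vectorstores is deprecated
from langchain_ollama import OllamaEmbeddings
from langchain_core.documents import Document  # Current import path; langchain.schema is deprecated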

# Load JSONL dataset into Pandas DataFrame
main_dataset_folder = '../../data/'
df = pd.read_json(main_dataset_folder + 'rag_dataset.jsonl', lines=True)
# print(df.head())  # Print first 5 rows

# Convert DataFrame to dictionary format
code_docs = df.to_dict(orient="records")
include_files = [
    "the-midio-language.md", 
    "overview.md", 
    "technical-details.midio", 
    "loops.midio",
    "map-filter-reduce"
]  # Add more variations if needed
include_set = {f.upper() for f in include_files}  # Build the case-insensitive lookup set once
code_docs = [doc for doc in code_docs if os.path.basename(doc["file"]).upper() in include_set]


print(f"✅ Loaded {len(code_docs)} documentation chunks.")

# Initialize Ollama embeddings
embeddings = OllamaEmbeddings(model="nomic-embed-text")
# Wrap each chunk in a Document, keeping the source file and chunk ID as metadata
faiss_docs = [
    Document(
        page_content=doc["content"],
        metadata={"file": doc["file"], "chunk_id": doc["chunk_id"]},
    )
    for doc in code_docs
]

vectorstore = FAISS.from_documents(faiss_docs, embeddings)
vectorstore.save_local("faiss_index") # Save FAISS index locally
print("✅ FAISS index saved successfully.")

# Perform similarity search
query = "Create a function that loops through a list of numbers and returns the sum of all numbers."
retrieved_docs = vectorstore.similarity_search(query, k=10)
for doc in retrieved_docs:
    print(f"\n\n📄 {doc.metadata['file']} (Chunk ID: {doc.metadata['chunk_id']})")
    print(doc.page_content)


✅ Loaded 26 documentation chunks.
✅ FAISS index saved successfully.


📄 the-midio-language.md (Chunk ID: 6)
## Functions  
Functions in Midio provide a way to group reusable flows together. By defining a function, you can encapsulate a specific behavior or operation that can be invoked multiple times throughout your program. Functions can have both inputs and outputs in the form of triggers and properties, allowing them to accept data, process it, and produce results. Unlike modules, functions cannot contain events.


📄 the-midio-language.md (Chunk ID: 0)
# The Midio Language  
Midio is a visual, general-purpose programming language. Its building blocks, composed of functions and events, are similar to those found in standard textual languages. Functions are entities that process input and produce output. Events, on the other hand, enable triggering of flows based on specific conditions such as incoming HTTP requests. Midio also includes data types and modules for organizing the code. 
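
The include-list filter in the cell above compares whole basenames, so an entry without a file extension (such as `map-filter-reduce`) only matches a file whose name is exactly that. The sketch below illustrates this with hypothetical records and a hypothetical helper, `filter_by_basename`; neither is part of the notebook, and the real paths in `rag_dataset.jsonl` may differ.

```python
import os


def filter_by_basename(docs, include_files):
    """Keep records whose file basename matches an allow-list entry, ignoring case."""
    allowed = {name.lower() for name in include_files}
    return [doc for doc in docs if os.path.basename(doc["file"]).lower() in allowed]


# Hypothetical records standing in for rows of the dataset.
sample_docs = [
    {"file": "docs/Overview.md", "chunk_id": 0, "content": "..."},
    {"file": "docs/guides/map-filter-reduce.md", "chunk_id": 1, "content": "..."},
    {"file": "docs/loops.md", "chunk_id": 2, "content": "..."},
]

# "overview.md" matches despite the case difference;
# "map-filter-reduce" does not match "map-filter-reduce.md".
kept = filter_by_basename(sample_docs, ["overview.md", "map-filter-reduce"])
kept_files = [doc["file"] for doc in kept]
print(kept_files)
```

If the loaded chunk count looks lower than expected, checking each include-list entry against the exact basenames in the dataset is a quick first step.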

## Technical documentation FAISS

In [6]:

# Convert DataFrame to dictionary format
code_docs = df.to_dict(orient="records")
include_files = [
    # "std.md", 
    # "http.md", 
    "std_extern.midio", 
    "http_extern.midio",
]  # Add more variations if needed
include_set = {f.upper() for f in include_files}  # Build the case-insensitive lookup set once
code_docs = [doc for doc in code_docs if os.path.basename(doc["file"]).upper() in include_set]

print(f"✅ Loaded {len(code_docs)} documentation chunks.")

# Initialize Ollama embeddings
embeddings = OllamaEmbeddings(model="nomic-embed-text")
# Wrap each chunk in a Document, keeping the source file and chunk ID as metadata
faiss_docs = [
    Document(
        page_content=doc["content"],
        metadata={"file": doc["file"], "chunk_id": doc["chunk_id"]},
    )
    for doc in code_docs
]
vectorstore = FAISS.from_documents(faiss_docs, embeddings)
vectorstore.save_local("faiss_index") # Save FAISS index locally (overwrites the index saved by the previous cell)
print("✅ FAISS index saved successfully.")

# Perform similarity search
query = "Get signature for Std.Count, Std.Filter, Std.Stop, Std.For node"
retrieved_docs = vectorstore.similarity_search(query, k=10)
for doc in retrieved_docs:
    print(f"\n\n📄 {doc.metadata['file']} (Chunk ID: {doc.metadata['chunk_id']})")
    print(doc.page_content)

✅ Loaded 290 documentation chunks.
✅ FAISS index saved successfully.


📄 std_extern.midio (Chunk ID: 45)
#### Function 'Std.Stop'



## Implementation: extern func Stop {

        in trigger stop

        in(name: "exit code") property(Number) exitCode

    }

##Available types in scope: type CountContext Number
type AnyContext Any


📄 std_extern.midio (Chunk ID: 42)
#### Function 'Std.Count'

Count can be used to make sure a loop in a flow only runs for a certain number of iteartions. The `reset` input trigger resets the count, and the `count` trigger is used to increment it.

## Implementation: extern func(doc: "Count can be used to make sure a loop in a flow only runs for a certain number of iteartions. The `reset` input trigger resets the count, and the `count` trigger is used to increment it.") Count {

        in(x: 0, y: 0, name: "reset") trigger() reset

        in(x: 0, y: 0, name: "count") trigger(consumes: Std.CountContext, ) count

        in(x: 0, y: 0, name: "iterations")
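
For intuition about what `similarity_search(query, k=10)` returns, here is a toy nearest-neighbour ranking in plain Python. It assumes Euclidean (L2) distance, which to my knowledge is the default for LangChain's FAISS wrapper. The helper `top_k_by_l2` and the 2-D vectors are illustrative stand-ins for real `nomic-embed-text` embeddings, not the library's implementation.

```python
import math


def top_k_by_l2(query_vec, doc_vecs, k):
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    scored = [(i, math.dist(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # Smallest distance first
    return scored[:k]  # Slicing returns fewer than k items if fewer exist


# Toy 2-D vectors standing in for document embeddings.
toy_doc_vecs = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (-1.0, 0.0)]
toy_query = (1.0, 0.0)

nearest = top_k_by_l2(toy_query, toy_doc_vecs, k=2)
nearest_ids = [i for i, _ in nearest]
print(nearest_ids)
```

A query that names several nodes at once, as in the cell above, is embedded as a single vector, so the ranking reflects closeness to the combined query rather than to each node name separately.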

## Inference