# Initialization of the LLM

This is an initial test of using an LLM (in this case, Google's Gemini model) as the baseline model for the RAG pipeline.

## Loading / Splitting Data

This is done with DirectoryLoader from langchain_community.document_loaders, paired with a custom JSONLoader class that turns each JSON file into a single document.

In [14]:
import json
from pathlib import Path
from typing import List
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
from langchain_community.document_loaders import DirectoryLoader

class JSONLoader(BaseLoader):
    """
    Loads a single JSON file with "title", "url", and "content" keys
    into one Document.
    """
    def __init__(self, file_path: str):
        self.file_path = file_path

    def load(self) -> List[Document]:
        with Path(self.file_path).open(encoding='utf-8') as f:
            data = json.load(f)

        # Fall back to placeholders when the optional fields are missing.
        title = data.get("title", "No Title")
        url = data.get("url", "No Url")

        # Each file yields exactly one Document; "content" is required.
        return [
            Document(
                page_content=data["content"],
                metadata={"source": url, "title": title},
            )
        ]

# Load every JSON file under ../data (recursively) with the custom loader
loader = DirectoryLoader(
    '../data', glob="**/*.json", loader_cls=JSONLoader
)

documents = loader.load()

print(f"Number of documents loaded: {len(documents)}")
if documents:
    print("\nExample Document:")
    print(f"Page Content: {documents[0].page_content[:400]}...")  # Preview the first 400 characters
    print(f"Metadata: {documents[0].metadata}")

Number of documents loaded: 754

Example Document:
Page Content: - 1 Appearance

1.1 Data

1.1.1 Flinging ink


1.2 Version history
1.3 Quotes
1.4 Weapon freshness rewards

- 1.1 Data

1.1.1 Flinging ink

- 1.1.1 Flinging ink

- 1.2 Version history

- 1.3 Quotes

- 1.4 Weapon freshness rewards

- 2 Gallery

- 3 Etymology

3.1 Names in other languages
3.2 Translation notes

- 3.1 Names in other languages

- 3.2 Translation notes

- 4 References

- 1.1 Data

1.1....
Metadata: {'source': 'https://splatoonwiki.org/wiki/Cometz_Octobrush', 'title': 'Cometz Octobrush'}
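
The loader above expects each JSON file to carry "title", "url", and "content" keys. The sketch below mirrors that field mapping using only the standard library, so it can be tried without LangChain installed. The function name `read_record`, the sample values, and the temporary file are hypothetical and only serve as illustration; the real files live under ../data.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def read_record(file_path: str) -> Dict[str, Any]:
    """Read one JSON file and map it to page content plus metadata.

    Mirrors the mapping used by JSONLoader: "content" is required,
    while "url" and "title" fall back to placeholder strings.
    """
    with Path(file_path).open(encoding="utf-8") as f:
        data = json.load(f)
    return {
        "page_content": data["content"],
        "metadata": {
            "source": data.get("url", "No Url"),
            "title": data.get("title", "No Title"),
        },
    }


# Hypothetical sample record, not taken from the real dataset.
sample = {
    "title": "Example Page",
    "url": "https://example.com/wiki/Example_Page",
    "content": "Example body text.",
}

# Write the sample to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "example.json"
    sample_path.write_text(json.dumps(sample), encoding="utf-8")
    record = read_record(str(sample_path))

print(record["metadata"])
```

Note that a file without a "content" key raises a KeyError here, just as it would in the loader, so malformed files surface early rather than producing empty documents.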
