# **Store Data to Vector Store (OJK)**

This notebook shows how to store documents in Redis. The [Load](#load) step differs for each data source (BI, OJK, and SIKEPO), so write your own `extract_all_documents_in_directory` function for each one.

## **Setup**

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

## **Config**

In [2]:
from utils.config import get_config
from utils.models import ModelName, get_model

config = get_config()

## **Define Model**

In [3]:
model_name = ModelName.AZURE_OPENAI
llm_model, embed_model = get_model(model_name=model_name, config=config)

## **Indexing**

In [4]:
documents_dir = './data/documents/'
pickle_path = './data/pickles/'
metadata_path = './data/metadata/files_metadata.csv'

LOAD_PICKLE = True

### **Load**

SIKEPO and BI documents are extracted differently, so write a separate document extractor module for each of them.

In [5]:
from utils.documents_extractor.documents_extract_ojk import extract_all_documents_in_directory

if not LOAD_PICKLE:
    documents = extract_all_documents_in_directory(documents_dir, metadata_path, treshold=0.98)
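
As a starting point for writing an extractor for another source, here is a minimal sketch. It is not the OJK extractor used above: the name `extract_documents_sketch`, the `SimpleDocument` class, the plain-text input files, and the `file_name` column in the metadata CSV are all assumptions made for illustration.

```python
import csv
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for the document type the real pipeline uses."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def extract_documents_sketch(documents_dir, metadata_path):
    """Read every .txt file in documents_dir and attach its metadata row.

    Assumes the metadata CSV has a 'file_name' column (hypothetical schema).
    """
    with open(metadata_path, newline="", encoding="utf-8") as f:
        rows = {row["file_name"]: row for row in csv.DictReader(f)}

    documents = []
    for path in sorted(Path(documents_dir).glob("*.txt")):
        # Files without a metadata row still get a 'source' entry.
        metadata = dict(rows.get(path.name, {}))
        metadata["source"] = path.name
        documents.append(SimpleDocument(path.read_text(encoding="utf-8"), metadata))
    return documents
```

A real extractor would replace the `.txt` reading with whatever parsing the source format needs, but the shape (directory in, list of documents with metadata out) stays the same.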

### **Split**

In [6]:
from utils.documents_split import document_splitter
import pickle


if not LOAD_PICKLE:
    all_splits = document_splitter(docs=documents)
    all_splits1 = sorted(all_splits, key=lambda x: (x.metadata['doc_id'], x.metadata.get('page_number', '0')))
    # Cache the sorted splits on disk; the file is created if it does not exist
    with open(pickle_path + 'documents1.pkl', 'wb') as file:
        pickle.dump(all_splits1, file)

# Open the cache in binary mode and deserialize the splits
with open(pickle_path + 'documents1.pkl', 'rb') as file:
    all_splits = pickle.load(file)
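
The cell above fails when `LOAD_PICKLE` is `True` but the pickle file does not exist yet. One possible way to avoid that is a small load-or-build helper, sketched below. The name `load_or_build` and its parameters are hypothetical and not part of this project's `utils` package.

```python
import pickle
from pathlib import Path


def load_or_build(cache_file, build_fn, use_cache=True):
    """Return the cached object if present, otherwise build it and cache it."""
    cache_file = Path(cache_file)
    if use_cache and cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # Cache miss (or caching disabled): compute the result and write it out.
    result = build_fn()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

In this notebook, `build_fn` would wrap the extract, split, and sort steps, and `use_cache` would take the value of `LOAD_PICKLE`.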

In [7]:
len(all_splits)

113052
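
A note on the sort key used in the Split cell: the retriever output at the end of this notebook shows `doc_id` and `page_number` stored as strings, and strings sort lexicographically, so page `'10'` comes before page `'2'`. If numeric page order matters, a key like the following sketch could be used instead. The helper names are hypothetical, and the sketch assumes the values are usually digit-only strings.

```python
def numeric_or_text(value):
    """Sort digit-only strings as integers; any other text sorts after them."""
    text = str(value)
    return (0, int(text), "") if text.isdecimal() else (1, 0, text)


def split_sort_key(metadata):
    """Order by document, then by page, comparing numbers numerically."""
    return (
        numeric_or_text(metadata["doc_id"]),
        numeric_or_text(metadata.get("page_number", "0")),
    )


pages = [{"doc_id": "67", "page_number": p} for p in ["10", "2", "1"]]

lexicographic = sorted(pages, key=lambda m: (m["doc_id"], m.get("page_number", "0")))
numeric = sorted(pages, key=split_sort_key)

print([m["page_number"] for m in lexicographic])  # ['1', '10', '2']
print([m["page_number"] for m in numeric])        # ['1', '2', '10']
```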

### **Storing**

In [13]:
from databases.vector_store import RedisIndexManager

redis = RedisIndexManager(index_name='ojk', embed_model=embed_model, config=config, db_id=0)

# redis.delete_index()
redis.store_vector_index(docs=all_splits, batch_size=200) # If you get the error 'Redis failed to connect: Index does not exist.', change the contents of start_store_idx_indexname.txt to 0
vector_store = redis.load_vector_index()

Start loading from idx: 1800
Loaded 1801-2000 documents
Loaded 2001-2200 documents
Loaded 2201-2400 documents
Loaded 2401-2600 documents
Loaded 2601-2800 documents
Loaded 2801-3000 documents
Loaded 3001-3200 documents
Loaded 3201-3400 documents
Loaded 3401-3600 documents
Loaded 3601-3800 documents
Loaded 3801-4000 documents
Loaded 4001-4200 documents
Loaded 4201-4400 documents
Loaded 4401-4600 documents
Loaded 4601-4800 documents
Loaded 4801-5000 documents
Loaded 5001-5200 documents
Loaded 5201-5400 documents
Loaded 5401-5600 documents
Loaded 5601-5800 documents
Loaded 5801-6000 documents
Loaded 6001-6200 documents
Loaded 6201-6400 documents
Loaded 6401-6600 documents
Loaded 6601-6800 documents
Loaded 6801-7000 documents
Loaded 7001-7200 documents
Loaded 7201-7400 documents
Loaded 7401-7600 documents
Loaded 7601-7800 documents
Loaded 7801-8000 documents
Loaded 8001-8200 documents
Loaded 8201-8400 documents
Loaded 8401-8600 documents
Loaded 8601-8800 documents
Loaded 8801-9000 documents
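
The log above starts at index 1800, which suggests that storing resumes from a saved start index (the `start_store_idx_indexname.txt` file mentioned in the cell comment). The internals of `RedisIndexManager.store_vector_index` are not shown in this notebook, so the following is only a sketch of how such a resumable batch loop could work. `store_in_batches`, `store_batch`, and `checkpoint_file` are hypothetical names.

```python
from pathlib import Path


def store_in_batches(docs, store_batch, checkpoint_file, batch_size=200):
    """Store docs batch by batch, resuming from the index saved in checkpoint_file."""
    checkpoint = Path(checkpoint_file)
    # A missing or empty checkpoint file means "start from the beginning".
    start = int(checkpoint.read_text().strip() or 0) if checkpoint.exists() else 0
    print(f"Start loading from idx: {start}")

    for begin in range(start, len(docs), batch_size):
        batch = docs[begin:begin + batch_size]
        store_batch(batch)  # hypothetical callback that writes one batch to the store
        end = begin + len(batch)
        # Persist progress only after the batch succeeded, so a crash retries it.
        checkpoint.write_text(str(end))
        print(f"Loaded {begin + 1}-{end} documents")
    return len(docs) - start
```

Under this design, resetting the checkpoint file to `0` forces a full reload, which matches the workaround described in the cell comment for a deleted or missing index.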

In [22]:
from retriever.retriever_ojk.retriever_ojk import get_retriever_ojk

retriever = get_retriever_ojk(vector_store=vector_store, top_n=7, top_k=20, llm_model=llm_model, embed_model=embed_model, config=config)

In [23]:
retriever.invoke(input='Berikan peraturan yang membahas mengenai pasar modal')

[Document(metadata={'id': 'doc:ojk:d593f50a17224ad2b80567ba66972153', 'title': 'Perizinan Usaha dan Kelembagaan Perusahaan Pialang Asuransi, Perusahaan Pialang Reasuransi, dan Perusahaan Penilai Kerugian', 'sector': 'IKNB', 'subsector': 'Asuransi,  Peraturan Lainnya', 'regulation_type': 'Peraturan OJK', 'regulation_number': '24 Tahun 2023', 'effective_date': '22 Desember 2023', 'file_url': 'https://www.ojk.go.id/id/regulasi/Documents/Pages/Perizinan-Usaha-dan-Kelembagaan-Perusahaan-Pialang-Asuransi%2c-Perusahaan-Pialang-Reasuransi%2c-dan-Perusahaan-Penilai-Kerugian/POJK%2024%20Tahun%202023%20Perizinan%20Usaha%20dan%20Kelembagaan%20Perusahaan%20Pialang%20Asuransi%2c%20Perusahaan%20Pialang%20Reasuransi%2c%20dan%20Perusahaan%20Penilai%20Kerugian.pdf', 'doc_id': '67', 'page_number': '54', 'relevance_score': 0.993255}, page_content='ketentuan peraturan perundang-undangan di bidang pasar \nmodal antara lain Peraturan Otoritas Jasa Keuangan mengenai \npenyelenggaraan kegiatan di bidang pasar 