# Boxcars Embeddings example with hnswlib


```bash 
gem install boxcars
```

Then create or edit a `.env` file so that it defines `OPENAI_ACCESS_TOKEN`.

```ruby
require 'dotenv/load'
require 'boxcars'
```
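Before making any requests, it can help to confirm that the token was actually loaded. The following is a minimal sketch and not part of Boxcars: `missing_env_keys` is a hypothetical helper, and the variable name `OPENAI_ACCESS_TOKEN` is taken from the setup step above.

```ruby
# Hypothetical helper (not part of Boxcars): returns the names of required
# environment variables that are unset or blank.
def missing_env_keys(keys, env = ENV)
  keys.select { |key| env[key].to_s.strip.empty? }
end

# Warn early instead of waiting for the first API call to fail.
missing = missing_env_keys(['OPENAI_ACCESS_TOKEN'])
warn "Missing environment variables: #{missing.join(', ')}" unless missing.empty?
```

Passing a plain hash as the second argument makes the helper easy to check without touching the real environment.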

## Examples

### Boxcars::VectorStore

Create an hnswlib index file. `BuildFromFiles` reads the markdown files, uses OpenAI's embeddings endpoint to create the embeddings, and saves the vector store to disk as `hnswlib_notion_db_index.bin`. The Notion_DB data is from https://github.com/hwchase17/notion-qa

```ruby
require 'dotenv/load'
require 'boxcars'

hnswlib_vector = Boxcars::VectorStore::Hnswlib::BuildFromFiles.call(
  training_data_path: './Notion_DB/**/*.md',
  index_file_path: './hnswlib_notion_db_index.bin',
  json_doc_file_path: './hnswlib_notion_db_index.json',
  force_rebuild: false
)
```

```
{:type=>:hnswlib, :vector_store=>[#<Boxcars::VectorStore::Document:0x000000010bbf59d0 @content="we provide you with a laptop that suits your job. Ask HR for further info.\n- **Workplace**: \nwe've built a pretty nice office to make sure you like being at Blendle HQ. Feel free to sit where you want. Even better: dare to switch your workplace every once in a while.\n\n# Work at Blendle\n\n---\n\nIf you want to work at Blendle you can check our [job ads here](https://blendle.homerun.co/). If you want to be kept in the loop about Blendle, you can sign up for [our behind the scenes newsletter](https://blendle.homerun.co/yes-keep-me-posted/tr/apply?token=8092d4128c306003d97dd3821bad06f2).", @embedding=[0.00031604595, -0.01758388, -0.009004207, -0.044534877, -0.020117383, 0.015872054, -0.022253744, 0.0067411726, 0.0018556198, -0.012058105, -0.003659885, -0.0066521573, -0.0074704103, 0.0041289255, -0.006165999, 0.0047075227, 0.013427567, -0.019706545, 0.012421013, -0.008867261, -0.023472564, ...
```

`openai_connection` is optional. If it is not provided, the `OPENAI_ACCESS_TOKEN` from the `.env` file is used.

```ruby
# We can also load the vector store from disk if the index file exists.
hnswlib_vector = Boxcars::VectorStore::Hnswlib::LoadFromDisk.call(
  index_file_path: './hnswlib_notion_db_index.bin',
  json_doc_file_path: './hnswlib_notion_db_index.json'
)

openai_client = Boxcars::Openai.open_ai_client

vector_search = Boxcars::VectorSearch.new(
  vector_documents: hnswlib_vector,
  openai_connection: openai_client
)

work_home = vector_search.call(query: 'What is the work from home policy?').first[:document].content

puts work_home.inspect
```

```
"we provide you with a laptop that suits your job. Ask HR for further info.\n- **Workplace**: \nwe've built a pretty nice office to make sure you like being at Blendle HQ. Feel free to sit where you want. Even better: dare to switch your workplace every once in a while.\n\n# Work at Blendle\n\n---\n\nIf you want to work at Blendle you can check our [job ads here](https://blendle.homerun.co/). If you want to be kept in the loop about Blendle, you can sign up for [our behind the scenes newsletter](https://blendle.homerun.co/yes-keep-me-posted/tr/apply?token=8092d4128c306003d97dd3821bad06f2)."
```

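The search call above chains `.first[:document].content`, which raises a `NoMethodError` if the result list is ever empty. The sketch below shows one way to guard against that. It assumes only the result shape visible in the example (a list of hashes with a `:document` key). `FakeDocument` and `top_content` are hypothetical names introduced for illustration, and the sample content is made up.

```ruby
# Stand-in for Boxcars::VectorStore::Document, used only so this sketch runs
# without the gem or an API call.
FakeDocument = Struct.new(:content)

# Hypothetical helper: returns the content of the first result, or nil when
# the search returned nothing.
def top_content(results)
  results.first&.dig(:document)&.content
end

results = [{ document: FakeDocument.new('example content') }]
puts top_content(results)        # prints the content string
puts top_content([]).inspect     # prints nil instead of raising
```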

### Boxcars::VectorStore::InMemory

`InMemory` does not save anything to disk, but it can read files and build the index in memory. It uses OpenAI's embeddings endpoint to create the embeddings and stores them as `Boxcars::VectorStore::Document` objects.

```ruby
require 'dotenv/load'
require 'boxcars'

in_memory_vector = Boxcars::VectorStore::InMemory::BuildFromFiles.call(
  training_data_path: './Notion_DB/**/*.md'
)

openai_client = Boxcars::Openai.open_ai_client

vector_search = Boxcars::VectorSearch.new(
  vector_documents: in_memory_vector,
  openai_connection: openai_client
)

harassment = vector_search.call(
  query: 'What is the first step to do when there is a harassment?',
  count: 1
).first[:document].content

puts harassment
```
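To build intuition for what an in-memory vector search does, here is a minimal brute-force sketch in plain Ruby. It is not the Boxcars implementation: the `Doc` struct, the `cosine_similarity` and `nearest` helpers, and the three-dimensional toy embeddings are all assumptions made for illustration, and the choice of cosine similarity as the metric is likewise an assumption.

```ruby
# Stand-in for a stored document: text plus its embedding vector.
Doc = Struct.new(:content, :embedding)

# Cosine similarity between two equal-length numeric arrays.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Returns the `count` documents most similar to `query_embedding`, best first.
def nearest(docs, query_embedding, count: 1)
  docs.max_by(count) { |doc| cosine_similarity(doc.embedding, query_embedding) }
end

# Toy data: real embeddings come from the embeddings endpoint and are much longer.
docs = [
  Doc.new('laptop policy', [1.0, 0.0, 0.0]),
  Doc.new('harassment policy', [0.0, 1.0, 0.0]),
  Doc.new('vacation policy', [0.0, 0.0, 1.0])
]
query = [0.1, 0.9, 0.0]
puts nearest(docs, query, count: 1).first.content # prints "harassment policy"
```

A brute-force scan like this compares the query against every document, which is fine for small collections. An hnswlib index trades exactness for speed by searching an approximate nearest-neighbor graph instead.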