In [1]:
from lionagi.experimental.compressor.llm_compressor import LLMCompressor

In [None]:
# aa = "..........long text.........."

In [3]:
len(aa)

228126

In [4]:
# Strip end-of-text markers and blank-line paragraph breaks before compressing.
aa = aa.replace("<|endoftext|>", "")
aa = aa.replace("\n\n", "")

In [5]:
len(aa)

224038

In [6]:
import lionagi as li

imodel = li.iModel(
    "gpt-3.5-turbo", interval_tokens=1_800_000, interval_requests=10_000
)
compressor = LLMCompressor(
    imodel=imodel,
    system_msg="concisely compress the given text for remembering",
    n_samples=10,
    target_ratio=0.08,
    split_overlap=0.05,
    split_threshold=5,
    verbose=True,
)

In [7]:
bb = await compressor.compress(aa)

Original Token number: 56999
Selected Token number: 4536
Token Compression Ratio: 0.080
Compression Time: 12.1889 seconds
Compression Model: gpt-3.5-turbo
Compression Method: perplexity
Compression Usage: $0.39438
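
The reported ratio can be checked by hand from the two token counts in the verbose output. The sketch below assumes `target_ratio` means "selected tokens divided by original tokens"; that reading matches the printed 0.080 but is not confirmed by the library documentation here.

```python
# Token counts copied from the verbose output above.
original_tokens = 56_999
selected_tokens = 4_536
target_ratio = 0.08  # value passed to LLMCompressor in the cell above

# Assumed meaning of the ratio: selected tokens / original tokens.
achieved_ratio = selected_tokens / original_tokens
token_budget = original_tokens * target_ratio

print(f"achieved ratio: {achieved_ratio:.3f}")
print(f"token budget at target ratio: {token_budget:.0f}")
```

Under that assumption the compressor lands just under its budget rather than exactly on it.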



In [13]:
branch = li.Branch()

res1 = await branch.chat(instruction="what does temperature mean", context=aa)

In [None]:
from IPython.display import display, Markdown

In [None]:
branch.messages[-1].metadata

In [15]:
print(branch.messages[-1].metadata["extra"]["usage"])
Markdown(res1)

{'prompt_tokens': 61183, 'completion_tokens': 451, 'total_tokens': 61634, 'expense': 0.31268}


Temperature in the context of natural language processing (NLP) and machine learning, particularly when using models like OpenAI's GPT, refers to a parameter that controls the randomness of the model's output. It is a hyperparameter that influences the probability distribution over the possible next tokens (words or characters) generated by the model.

Here's a step-by-step explanation of what temperature means and how it affects the model's output:

1. **Probability Distribution**:
   - When generating text, the model predicts the next token based on a probability distribution. Each token has a certain probability of being the next token in the sequence.

2. **Temperature Parameter**:
   - The temperature parameter adjusts the sharpness of this probability distribution.
   - It is a value between 0 and 2, where:
     - **Low Temperature (< 1)**: The model's output becomes more deterministic. The model will favor high-probability tokens, making the text more predictable and focused.
     - **High Temperature (> 1)**: The model's output becomes more random. The model will favor a wider range of tokens, including those with lower probabilities, making the text more diverse and creative.

3. **Effect on Output**:
   - **Temperature = 0**: The model will always choose the token with the highest probability, resulting in very deterministic and repetitive text.
   - **Temperature = 1**: The model will sample tokens according to their probabilities without any adjustment, providing a balance between randomness and determinism.
   - **Temperature > 1**: The model will sample tokens more randomly, increasing the chances of selecting less probable tokens, which can lead to more varied and creative outputs.

4. **Practical Usage**:
   - Adjusting the temperature allows users to control the behavior of the model based on the desired outcome. For example:
     - For factual and precise responses, a lower temperature is preferred.
     - For creative writing or brainstorming, a higher temperature might be more suitable.

In summary, the temperature parameter is a crucial tool for controlling the randomness and creativity of the text generated by language models. By adjusting the temperature, users can fine-tune the balance between deterministic and random outputs to suit their specific needs.
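
The effect described in the answer above can be sketched with a temperature-scaled softmax. This is a minimal illustration, not the code any particular model runs; the logits are hypothetical scores for three candidate tokens, and `softmax_with_temperature` is a helper name introduced only for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled as a plain argmax instead.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores, not taken from a real model
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution toward uniform.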

In [8]:
branch2 = li.Branch()
res2 = await branch2.chat(
    instruction="what does vector store mean", context=bb
)

In [11]:
print(branch2.messages[-1].metadata["extra"]["usage"])
Markdown(res2)

{'prompt_tokens': 5005, 'completion_tokens': 649, 'total_tokens': 5654, 'expense': 0.03476}


A vector store in the context of data management, especially with AI and machine learning, typically refers to a storage solution designed to manage and efficiently retrieve high-dimensional vectors. These vectors represent data points in high-dimensional space and are often used in various AI tasks like natural language processing, recommendation systems, and computer vision. Here's a detailed explanation:

### What is a Vector Store?

1. **Definition**:
   - A vector store is a specialized data storage system optimized for storing and retrieving vectors. These vectors can represent different types of data, such as words, images, or features produced by machine learning models.

2. **Use Cases**:
   - **Natural Language Processing (NLP)**: Storing word embeddings (like those from Word2Vec, GloVe, or BERT).
   - **Recommendation Systems**: Storing user and item vectors for similarity searches.
   - **Computer Vision**: Storing feature vectors of images.

3. **Operations**:
   - **Save Vectors**: Save high-dimensional vectors into the datastore.
   - **Retrieve Vectors**: Retrieve vectors based on specific queries or similarity search.
   - **Delete Vectors**: Remove vectors that are no longer needed.

4. **Indexing and Retrieval**:
   - Vector stores often use special indexing techniques like Approximate Nearest Neighbors (ANN) to enable fast retrieval of vectors similar to a given query vector.
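
The save, retrieve, and delete operations listed above can be sketched with a tiny in-memory store. This is an illustrative toy, not the OpenAI vector store API: `TinyVectorStore` and `cosine` are hypothetical names, and the search is an exact brute-force scan rather than an approximate nearest-neighbor index.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store; real systems add an ANN index for speed."""

    def __init__(self):
        self._vectors = {}

    def save(self, key, vector):
        self._vectors[key] = list(vector)

    def delete(self, key):
        self._vectors.pop(key, None)

    def search(self, query, k=1):
        # Score every stored vector against the query, best match first.
        scored = [(cosine(query, vec), key) for key, vec in self._vectors.items()]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]


store = TinyVectorStore()
store.save("cat", [1.0, 0.0])
store.save("dog", [0.9, 0.1])
store.save("car", [0.0, 1.0])
print(store.search([1.0, 0.05], k=2))
```

A linear scan like this costs time proportional to the number of stored vectors, which is why production stores trade a little accuracy for an approximate index.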

### Example API Operations

The context provided in the API documentation indicates two operations related to vector stores:

1. **Creating a Vector Store File**:
   - **Endpoint**: `POST https://api.openai.com/v1/vector_stores/{vector_store_id}/files`
   - **Purpose**: To attach an existing file, referenced by its `file_id`, to a vector store.

2. **Deleting a Vector Store**:
   - **Endpoint**: `DELETE https://api.openai.com/v1/vector_stores/{vector_store_id}`
   - **Purpose**: To delete a vector store, which will remove the file associations but not the files themselves.

### Example Request for Creating a Vector Store File

```bash
curl -X POST https://api.openai.com/v1/vector_stores/{vector_store_id}/files \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "file_id": "file-abc123"
}'
```

### Example Request for Deleting a Vector Store

```bash
curl -X DELETE https://api.openai.com/v1/vector_stores/{vector_store_id} \
-H "Authorization: Bearer $OPENAI_API_KEY"
```

### Key Points to Remember

- **Vector Store ID**: Each vector store is uniquely identified by a `vector_store_id`.
- **File ID**: Files are identified by `file_id` and can be attached to or removed from vector stores.
- **Efficient Retrieval**: Vector stores typically employ indexing mechanisms to ensure efficient similarity-based retrieval, which is crucial for tasks like recommendation and search.

By understanding these concepts and the provided API operations, you can effectively manage and utilize vector stores in AI and machine learning workflows.
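
The two usage dictionaries printed in this notebook allow a rough cost comparison between the full and compressed contexts. The sketch below copies the recorded numbers verbatim. The break-even estimate is an assumption-laden approximation: the two runs asked different questions, so the per-query saving is indicative only.

```python
import math

# Usage figures copied from the two printed outputs in this notebook.
full = {"prompt_tokens": 61183, "completion_tokens": 451, "expense": 0.31268}
compressed = {"prompt_tokens": 5005, "completion_tokens": 649, "expense": 0.03476}
compression_cost = 0.39438  # one-time "Compression Usage" reported by the compressor

prompt_ratio = compressed["prompt_tokens"] / full["prompt_tokens"]
expense_ratio = compressed["expense"] / full["expense"]

# Assumes every later query saves about as much as this pair of runs did.
saving_per_query = full["expense"] - compressed["expense"]
break_even_queries = math.ceil(compression_cost / saving_per_query)

print(f"prompt tokens: {prompt_ratio:.1%} of the full context")
print(f"expense: {expense_ratio:.1%} of the full-context query")
print(f"queries needed to recoup compression cost: {break_even_queries}")
```

Because the compression step itself costs more than a single full-context query, the compressed context only pays off when it is reused across several queries.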