* https://github.com/LAION-AI/Open-Assistant/tree/main
* https://huggingface.co/datasets/OpenAssistant/oasst1
* https://huggingface.co/datasets/timdettmers/openassistant-guanaco
* https://github.com/LAION-AI/Open-Assistant/blob/main/notebooks/openassistant-oasst1/getting-started.ipynb

**1. Load the libraries**

In [8]:
from pyprojroot import here
from prepare_training_data import prepare_openassistant_guanaco_dataset
from datasets import load_dataset
import pandas as pd
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          TrainingArguments,
                          Trainer)

Load the `timdettmers/openassistant-guanaco` subset (train and test splits):

```
guanaco_train = load_dataset(
    path="timdettmers/openassistant-guanaco", split="train")
guanaco_test = load_dataset(
    path="timdettmers/openassistant-guanaco", split="test")
```
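
To inspect a Guanaco record turn by turn, the flat `text` field can be split on its speaker markers. The sketch below assumes each record stores the conversation as one string using `### Human:` and `### Assistant:` prefixes; the sample string and the helper name `split_turns` are hypothetical and not part of the dataset or its tooling.

```python
import re

# Hypothetical sample that mimics the assumed "### Human: ... ### Assistant: ..." layout.
sample_text = (
    "### Human: What is a monopsony?"
    "### Assistant: A market with a single buyer."
    "### Human: Give an example."
    "### Assistant: A town with one large employer."
)

# Capture the speaker and everything up to the next speaker marker (or end of string).
TURN_PATTERN = re.compile(
    r"### (Human|Assistant): (.*?)(?=### (?:Human|Assistant): |\Z)",
    re.DOTALL,
)


def split_turns(text):
    """Split a flat transcript into (role, message) pairs."""
    return [(role, message.strip()) for role, message in TURN_PATTERN.findall(text)]


turns = split_turns(sample_text)
for role, message in turns:
    print(f"{role}: {message}")
```

On the real data this could be applied to a single record, for example `split_turns(guanaco_train[0]["text"])`, provided the column is indeed named `text`.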

**Load OpenAssistant/oasst1**

In [9]:
ds = load_dataset("OpenAssistant/oasst1")
df = pd.DataFrame(ds['train'])
# Filter rows where lang is "en"
filtered_df = df[df['lang'] == 'en']

In [11]:
filtered_df.head(2)

Unnamed: 0,message_id,parent_id,user_id,created_date,text,role,lang,review_count,review_result,deleted,rank,synthetic,model_name,detoxify,message_tree_id,tree_state,emojis,labels
0,6ab24d72-0181-4594-a9cd-deaf170242fb,,c3fe8c76-fc30-4fa7-b7f8-c492f5967d18,2023-02-05T14:23:50.983374+00:00,Can you write a short introduction about the r...,prompter,en,3,True,False,,False,,"{'toxicity': 0.00044308538781479, 'severe_toxi...",6ab24d72-0181-4594-a9cd-deaf170242fb,ready_for_export,"{'name': ['+1', '_skip_reply', '_skip_ranking'...","{'name': ['spam', 'lang_mismatch', 'pii', 'not..."
1,c8e83833-ecbc-44fe-b6db-735228c25a1c,6ab24d72-0181-4594-a9cd-deaf170242fb,2c96e467-66f0-4be7-9693-bda51356a424,2023-02-06T13:50:44.657083+00:00,"""Monopsony"" refers to a market structure where...",assistant,en,3,True,False,0.0,False,,"{'toxicity': 0.00026396565954200923, 'severe_t...",6ab24d72-0181-4594-a9cd-deaf170242fb,ready_for_export,"{'name': ['+1', '_skip_labeling'], 'count': [3...","{'name': ['spam', 'fails_task', 'lang_mismatch..."
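
The `emojis` column holds parallel `name` and `count` lists, which are awkward to query directly. A minimal sketch of flattening one cell into a name-to-count mapping follows; the counts in the sample are made up for illustration, and `emoji_counts` is a hypothetical helper.

```python
# Hypothetical cell shaped like the `emojis` values in the output above (counts are invented).
emojis = {"name": ["+1", "_skip_reply", "_skip_ranking"], "count": [10, 1, 4]}


def emoji_counts(cell):
    """Turn {'name': [...], 'count': [...]} into {name: count}.

    Anything that is not a dict (for example a missing value) yields {}.
    """
    if not isinstance(cell, dict):
        return {}
    return dict(zip(cell["name"], cell["count"]))


counts = emoji_counts(emojis)
print(counts)
```

Applied column-wise, this would look like `filtered_df["emojis"].apply(emoji_counts)`.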


In [12]:
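def add_tree_level(df):
    """Add a tree_level column (0 = root message) to a message-tree df.

    Rows must be ordered so that every parent precedes its replies,
    e.g. sorted by created_date.
    """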

    # if tree level already exists, return df
    if "tree_level" in df.columns:
        return df

    else:
        tree_level_map = {}

        # iterate over rows in df
        for _, row in df.iterrows():
            message_id = row["message_id"]
            parent_id = row["parent_id"]

            # if parent_id is missing (None or NaN), then it is a root message
            if pd.isna(parent_id):
                tree_level_map[message_id] = 0
            # if parent_id is the same as message_tree_id, then it is a direct reply to the root message
            elif parent_id == row["message_tree_id"]:
                tree_level_map[message_id] = 1
            # else just look up the tree level of the parent_id and add 1
            else:
                tree_level_map[message_id] = tree_level_map[parent_id] + 1

        # create a df from the tree_level_map and merge it with the original df
        df_tree_level_map = (
            pd.DataFrame.from_dict(tree_level_map, orient="index", columns=["tree_level"])
            .reset_index()
            .rename(columns={"index": "message_id"})
        )

        return df.merge(df_tree_level_map, on="message_id")
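
`add_tree_level` looks up the parent's level while iterating, so it only works when each parent appears before its replies. As an alternative, here is a sketch that computes the same levels regardless of row order by walking up the parent chain. It works on a plain `message_id -> parent_id` dict, assumes every referenced parent is present in that dict, and uses hypothetical IDs; `compute_tree_levels` is not part of the Open-Assistant code.

```python
def compute_tree_levels(parent_of):
    """Map each message_id to its depth; roots (parent is None) get level 0."""
    levels = {}
    for message_id in parent_of:
        # Walk up until we reach a message with a known level, or pass a root.
        chain = []
        current = message_id
        while current is not None and current not in levels:
            chain.append(current)
            current = parent_of[current]
        # If we walked past a root, the root itself must end up at level 0.
        base = -1 if current is None else levels[current]
        # Assign levels back down the chain, topmost ancestor first.
        for offset, node in enumerate(reversed(chain), start=1):
            levels[node] = base + offset
    return levels


# Hypothetical IDs, deliberately listed with a child before its parent.
parent_of = {"c": "b", "b": "a", "a": None, "d": "a"}
levels = compute_tree_levels(parent_of)
print(levels)
```

For a DataFrame, the input dict could be built with `dict(zip(df["message_id"], df["parent_id"]))`, as long as missing parents are stored as `None`.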

**Random Message Tree**

In [13]:
message_tree_id = filtered_df["message_tree_id"].sample(1).values[0]
print(message_tree_id)

2329a9d5-4f2e-437f-bbea-30a6f2b45521


In [15]:
# look at all data for this message tree
df_message_tree = df.query(f"message_tree_id == '{message_tree_id}'").sort_values("created_date")

# add tree level to df
df_message_tree = add_tree_level(df_message_tree)

df_message_tree.head(3)

Unnamed: 0,message_id,parent_id,user_id,created_date,text,role,lang,review_count,review_result,deleted,rank,synthetic,model_name,detoxify,message_tree_id,tree_state,emojis,labels,tree_level
0,2329a9d5-4f2e-437f-bbea-30a6f2b45521,,ef66f43d-de65-4a08-a3c7-241bba85907b,2023-02-08T15:37:13.213307+00:00,"hi, how can you filter an array of objects in ...",prompter,en,3,True,False,,False,,"{'toxicity': 0.0009466029005125165, 'severe_to...",2329a9d5-4f2e-437f-bbea-30a6f2b45521,ready_for_export,"{'name': ['+1'], 'count': [1]}","{'name': ['spam', 'lang_mismatch', 'pii', 'not...",0
1,d4d257fc-bd73-4548-8898-7f700e802425,2329a9d5-4f2e-437f-bbea-30a6f2b45521,2d40e94c-40ed-4e00-a66e-454c0401c8ad,2023-03-14T07:03:16.947549+00:00,"In JavaScript, you can filter an array of obje...",assistant,en,3,True,False,0.0,False,,"{'toxicity': 0.00016775251424405724, 'severe_t...",2329a9d5-4f2e-437f-bbea-30a6f2b45521,ready_for_export,"{'name': ['+1', '_skip_labeling'], 'count': [2...","{'name': ['spam', 'fails_task', 'lang_mismatch...",1
2,3ef7e71f-ba6a-4c4c-99f3-8058e2981cf6,2329a9d5-4f2e-437f-bbea-30a6f2b45521,e89e15d0-ed95-496f-b460-8457dc5cade4,2023-03-14T09:11:29.727717+00:00,You can filter an array of objects in JavaScri...,assistant,en,3,True,False,1.0,False,,"{'toxicity': 0.00018528560758568347, 'severe_t...",2329a9d5-4f2e-437f-bbea-30a6f2b45521,ready_for_export,"{'name': ['+1', '_skip_labeling'], 'count': [1...","{'name': ['spam', 'fails_task', 'lang_mismatch...",1
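
Because one prompt can receive several replies, a message tree contains multiple conversations, one per root-to-leaf path. The sketch below enumerates those paths from `(message_id, parent_id, role, text)` tuples; the rows are hypothetical stand-ins for the DataFrame columns shown above, and `conversation_threads` is an illustrative helper rather than an Open-Assistant function.

```python
from collections import defaultdict

# Hypothetical rows: (message_id, parent_id, role, text).
messages = [
    ("m0", None, "prompter", "How do I filter an array?"),
    ("m1", "m0", "assistant", "Use Array.filter()."),
    ("m2", "m0", "assistant", "Use a for loop."),
    ("m3", "m1", "prompter", "Can you show an example?"),
]


def conversation_threads(rows):
    """Return every root-to-leaf path as a list of (role, text) pairs."""
    children = defaultdict(list)
    info = {}
    roots = []
    for message_id, parent_id, role, text in rows:
        info[message_id] = (role, text)
        if parent_id is None:
            roots.append(message_id)
        else:
            children[parent_id].append(message_id)

    threads = []
    # Depth-first traversal; each stack entry carries the path built so far.
    stack = [(root, [info[root]]) for root in roots]
    while stack:
        node, path = stack.pop()
        replies = children.get(node, [])
        if not replies:
            threads.append(path)  # a leaf closes one complete conversation
        for child in replies:
            stack.append((child, path + [info[child]]))
    return threads


threads = conversation_threads(messages)
for thread in threads:
    print(" -> ".join(f"{role}: {text}" for role, text in thread))
```

Each resulting thread alternates prompter and assistant turns, which is the shape a linear training example needs.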


In [24]:
for text in df_message_tree["text"]:
    print(text)

hi, how can you filter an array of objects in javascript
In JavaScript, you can filter an array of objects using the Array.filter() method. This method creates a new array with all elements that pass the test implemented by the provided function. Here is an example:
``` 
const people = [
  { name: 'Alice', age: 30 },
  { name: 'Bob', age: 25 },
  { name: 'Charlie', age: 35 },
  { name: 'David', age: 40 }
];

// Filter people whose age is greater than or equal to 35
const filteredPeople = people.filter(person => person.age >= 35);

console.log(filteredPeople); // [{ name: 'Charlie', age: 35 }, { name: 'David', age: 40 }]
``` 
In the example above, we have an array of objects with name and age properties. We use the Array.filter() method to create a new array that only includes objects whose age property is greater than or equal to 35.

You can replace the condition in the arrow function to filter the array of objects based on other criteria.

Hope that helps! Let me know if you have any