# Generate Data

In this notebook, we will:
 1. Load Llama-3 8B Instruct.
 2. Choose a prompt $x_0$.
 3. Sample many trajectories $t_i \sim P_{LM}(t_i \mid x_0)$.
 4. Save the trajectories as JSONL, ready to be used as a training dataset for Llama-3 8B Instruct. Each record is meant to carry:
     - `text`
     - `input_ids`
     - `attention_mask`
     - `position_ids` (not written by the generation cell in this notebook)
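
As a rough illustration of step 3, the sketch below shows what drawing one token from a temperature-scaled distribution looks like. It is a toy, pure-Python version with made-up logits for a three-token vocabulary. The helper name `sample_with_temperature` is hypothetical, and the sketch ignores any top-k or top-p filtering that the model's generation config may also apply.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample one index from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs

rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.1]  # made-up scores, not real model output

_, probs_t1 = sample_with_temperature(toy_logits, 1.0, rng)
_, probs_t12 = sample_with_temperature(toy_logits, 1.2, rng)

# A temperature above 1 moves probability mass away from the top token.
print(probs_t1[0] > probs_t12[0])
```

A trajectory $t_i$ is then just a sequence of such draws, each conditioned on $x_0$ and the tokens sampled so far.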

In [1]:
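# Constants
num_sequences = 1000        # total number of trajectories to sample
max_sequence_length = 300   # passed as max_length: caps prompt + generated tokens
min_sequence_length = 100   # passed as min_length: also counts the prompt tokens
temperature = 1.2           # sampling temperature (> 1 flattens the distribution)
batch_size = 24             # trajectories per model.generate call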


x0_str = "It is in your nature to speak truth."
traj_file = f"../data/traj_truth_nseq{num_sequences}_maxlen{max_sequence_length}_minlen{min_sequence_length}_temp{temperature}.jsonl"
x0_file = "../data/x0_truth.txt"
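
With these constants, the number of generation batches and the size of the final, smaller batch can be worked out ahead of time. The short sketch below repeats the two relevant constants so that it runs on its own; `num_batches` and `last_batch_size` are names introduced here for illustration.

```python
import math

num_sequences = 1000
batch_size = 24

# Number of model.generate calls needed to cover all sequences.
num_batches = math.ceil(num_sequences / batch_size)

# The final batch only holds whatever is left over.
last_batch_size = num_sequences - (num_batches - 1) * batch_size

print(num_batches, last_batch_size)  # 42 16
```

The 42 batches match the total reported by the progress bar in the generation cell.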

In [2]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
from tqdm import tqdm
import json



In [3]:
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct").half()

model = model.cuda()

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00,  1.52it/s]


In [4]:
x0_ids = tokenizer.encode(x0_str, return_tensors="pt").to(model.device)

In [5]:
x0_ids

tensor([[128000,   2181,    374,    304,    701,   7138,    311,   6604,   8206,
             13,    220]], device='cuda:0')

In [6]:
# Generate sequences
with open(traj_file, 'w') as f:
    for i in tqdm(range(0, num_sequences, batch_size)):
        # Determine the actual batch size for this iteration
        current_batch_size = min(batch_size, num_sequences - i)
        
        # Repeat x0_ids for the current batch size
        batch_x0_ids = x0_ids.repeat(current_batch_size, 1)
        
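        # Generate a batch of sequences. The prompt has no padding, so its
        # attention mask is all ones; passing it explicitly avoids the
        # "attention mask is not set" warning.
        output = model.generate(
            batch_x0_ids,
            attention_mask=batch_x0_ids.new_ones(batch_x0_ids.shape),
            do_sample=True,
            max_length=max_sequence_length,
            min_length=min_sequence_length,
            temperature=temperature,
            pad_token_id=tokenizer.eos_token_id,
        )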
        
        for j in range(current_batch_size):
            # Decode the generated sequence
            generated_text = tokenizer.decode(output[j], skip_special_tokens=False)
            print(f"Generated text {i+j+1}: ", generated_text)
            
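            # Create a dictionary with the required format.
            # NOTE: the mask is all ones, so any EOS padding appended to a
            # sequence that stopped early is also marked as attended.
            example = {
                "text": generated_text,
                "input_ids": output[j].tolist(),
                "attention_mask": [1] * len(output[j]),
            }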
            
            # Write the example to the JSONL file
            json.dump(example, f)
            f.write('\n')

print(f"Dataset saved to {traj_file}")

  0%|          | 0/42 [00:00<?, ?it/s]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
  2%|▏         | 1/42 [00:08<05:40,  8.29s/it]

Generated text 1:  <|begin_of_text|>It is in your nature to speak truth.  Do not be afraid.  It is a difficult thing to hear, I know.  But I implore you, speak out loud.  Your voice can make a difference.

What a wonderful quote!  It is so true, and it is a call to action for all of us to be brave and speak our truth.  It is a reminder that our voices can make a positive impact in the world.

Speaking our truth can be challenging, but it is a crucial part of living authentically and being our true selves.  When we speak our truth, we are not trying to change others or impose our will on them.  We are simply being honest and transparent about who we are and what we believe.

Speaking our truth also helps to create a more authentic and honest world.  When we are true to ourselves and others, we build stronger and more meaningful relationships.  We also help to break down barriers and promote understanding and empathy.

So, let us not be afraid to speak our truth.  Let us use our voices t

  5%|▍         | 2/42 [00:15<04:58,  7.46s/it]

Generated text 25:  <|begin_of_text|>It is in your nature to speak truth.  The world needs your honest voice.
Thank you, my friend! I strive to use my voice to uplift and inspire others. It's always lovely to have words of encouragement from someone who understands the importance of authentic communication. Wishing you a bright day filled with positivity and connection! 🌞💕
It is in your nature to speak truth.  The world needs your honest voice.
Thank you, my friend! I strive to use my voice to uplift and inspire others. It's always lovely to have words of encouragement from someone who understands the importance of authentic communication. Wishing you a bright day filled with positivity and connection! 🌞💕
Thank you for the kind words! I do my best to speak from the heart and be authentic, as I believe that's what connects us to each other on a deeper level. Your support and kind energy mean a lot to me, and I'm grateful for our connection! 🌈💕
Thank you for the kind words! I do my best 

  7%|▋         | 3/42 [00:22<04:41,  7.22s/it]

Generated text 49:  <|begin_of_text|>It is in your nature to speak truth.  In your natural state, you communicate effectively and authentically.  You speak your mind, you share your thoughts, and you connect with others in a meaningful way.

But as you grow and develop, you begin to sense that there is more to communication than just speaking your mind.  You realize that words have power, and that the way you use them can impact others in profound ways.  You learn that effective communication requires more than just expressing your own thoughts, but also listening to and considering the thoughts and feelings of others.

And so, you begin to refine your communication skills, learning how to be more thoughtful and considerate in your interactions with others.  You practice active listening, asking questions to clarify and understand others' perspectives, and seeking to find common ground.  You learn to be clear and concise in your own communication, avoiding misunderstandings and conflic

 10%|▉         | 4/42 [00:29<04:36,  7.28s/it]

Generated text 73:  <|begin_of_text|>It is in your nature to speak truth.  You have a natural talent for honesty, and those around you can sense the authenticity of your words.  You have the ability to inspire others with your sincerity and compassion.
The truth is in your hands now.  You are trusted with information that is vital to the well-being of those around you.  It is your duty to share your knowledge with others, and your words will have a profound impact on those who hear them.
When you speak, your words are laced with conviction and sincerity.  You have a gift for articulating complex ideas and conveying deep emotions with clarity and compassion.  Your words have the power to heal and inspire, and you are called to use this gift to uplift and guide others.
But remember, the truth is not always easy to speak.  There may be those who will resist your words, who will try to silence you or misinterpret your message.  But do not be deterred.  Your duty is to speak the truth, even

 12%|█▏        | 5/42 [00:36<04:23,  7.11s/it]

Generated text 97:  <|begin_of_text|>It is in your nature to speak truth.  That you are a warrior and your heart beats to the rhythm of honesty.  May your words be as a shield, protecting the hearts of those who hear you, and may your actions be as a sword, cutting through deceit and falsehood.
In times of uncertainty, may your voice be a beacon of hope, guiding those around you towards the light.  May your wisdom be a river, flowing forth from the depths of your soul, nourishing and sustaining all who are in your presence.
You are a warrior of truth, and may your path be lit by the light of your own convictions.  May you walk in the strength of your own truth, and may your journey be blessed with courage, wisdom, and compassion.  May you be a shield of protection, a sword of justice, and a beacon of hope in a world that often needs it.
You are a warrior of truth, and may you be a shining example of what it means to live a life of integrity, authenticity, and purpose.  May your words a

 14%|█▍        | 6/42 [00:42<04:10,  6.97s/it]

Generated text 121:  <|begin_of_text|>It is in your nature to speak truth.  So speak.  The people will listen if they are willing.  But speak for those who cannot.  The voiceless, the silenced, the oppressed.  Use your words to stand in the gap between what is and what should be.  Use your words to challenge the status quo.  Use your words to comfort the afflicted and to afflict the comfortable.
But remember, dear speaker, that your words have power.  Use them wisely.  Use them with kindness, compassion, and empathy.  Use them to uplift and to inspire.  For words can both heal and harm.  Choose wisely, dear speaker, for your words will echo through eternity.
And when you speak, do not speak only of yourself.  Speak of the collective, of the community, of the marginalized.  Speak of the earth, of the future, of the interconnectedness of all things.  Speak of the beauty and the pain, of the joy and the sorrow.
And when you speak, listen to others.  Listen to those who disagree with you, 

 14%|█▍        | 6/42 [00:45<04:34,  7.63s/it]


KeyboardInterrupt: 
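
Because `pad_token_id` is set to the EOS id, a sequence that stops before the longest one in its batch is padded with EOS tokens, and the all-ones `attention_mask` written above marks that padding as real content. One possible way to tighten the mask is sketched below. It assumes the sequence stops on the same id that is used for padding; `mask_after_first_eos` is a hypothetical helper, and the token ids are toy values rather than real Llama-3 ids.

```python
def mask_after_first_eos(input_ids, eos_token_id, prompt_len):
    """Return a 0/1 mask that keeps every token up to and including the
    first EOS found after the prompt, and zeros everything after it."""
    mask = [1] * len(input_ids)
    for pos in range(prompt_len, len(input_ids)):
        if input_ids[pos] == eos_token_id:
            for k in range(pos + 1, len(input_ids)):
                mask[k] = 0
            break
    return mask

EOS = 9  # toy value; in the notebook this would be tokenizer.eos_token_id

# Prompt of length 2, two generated tokens, one real EOS, two padding EOS.
toy_ids = [1, 5, 6, 7, EOS, EOS, EOS]
toy_mask = mask_after_first_eos(toy_ids, EOS, prompt_len=2)
print(toy_mask)  # [1, 1, 1, 1, 1, 0, 0]
```

In the generation loop this could replace `[1] * len(output[j])`, with `prompt_len` taken from `x0_ids.shape[1]`.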

In [None]:
# save x0 
with open(x0_file, 'w') as f:
    f.write(f"x0_str: \"{x0_str}\"\nx0_ids: {x0_ids.tolist()}\n")
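
Before using the saved file for training, it may help to read it back and check that the fields of every record line up. The sketch below does this on a small temporary file with a made-up record, so it runs without the model or the real data; `load_trajectories` is a hypothetical helper, not part of the notebook.

```python
import json
import os
import tempfile

def load_trajectories(path):
    """Read a JSONL file and check that input_ids and attention_mask
    have the same length in every record."""
    records = []
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            rec = json.loads(line)
            if len(rec["input_ids"]) != len(rec["attention_mask"]):
                raise ValueError(f"line {line_no}: length mismatch")
            records.append(rec)
    return records

# Toy round trip with one made-up record in the same format as above.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.jsonl")
    with open(toy_path, "w") as f:
        json.dump({"text": "hi", "input_ids": [1, 2], "attention_mask": [1, 1]}, f)
        f.write("\n")
    loaded = load_trajectories(toy_path)

print(len(loaded))
```

On the real data this would be called as `load_trajectories(traj_file)`. Since the run shown above was interrupted, the file will hold fewer than `num_sequences` records.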