# Interacting with Llama 3.1 8B Model and Coding Dataset

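This notebook loads a Llama model and a coding dataset through the helper functions in `data_loader`, then runs prompts taken from the dataset through the model. The title names Llama 3.1 8B, but the loader output recorded in cell [3] reports Llama 3.2 1B Instruct, so the results shown here come from that smaller model.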

In [2]:
import sys
sys.path.append('src')
from data_loader import load_or_download_llama_model, load_or_download_coding_dataset
import torch
from transformers import TextGenerationPipeline


In [3]:
# Load the model and dataset
tokenizer, model = load_or_download_llama_model()

dataset = load_or_download_coding_dataset()

Loading Llama 3.2 1B Instruct model from local storage...
Llama 3.2 1B Instruct model loaded successfully.
Loading coding dataset from local storage...
Coding dataset loaded successfully.
Dataset size: 1000 samples


In [5]:
model.device

device(type='cpu')
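
The output above shows the model sitting on the CPU, where generation tends to be slow. A minimal sketch of picking a faster device when one is available is given below. The helper name `pick_device` is hypothetical and not part of `data_loader`; it only relies on `torch.cuda.is_available()` and the `torch.backends.mps` module, and falls back to `'cpu'` when PyTorch is not installed.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds have no torch.backends.mps, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(device)
```

In this notebook, one option would be to call `model.to(device)` before building the pipeline, assuming the model fits in that device's memory.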

In [6]:
# Create a text generation pipeline
generator = TextGenerationPipeline(model=model, tokenizer=tokenizer)

In [7]:
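# Generate text from a prompt using top-k and nucleus (top-p) sampling
def generate_text(prompt, max_length=100):
    # max_length caps the total token count (prompt + continuation), not only the new tokens
    generated = generator(prompt, max_length=max_length, do_sample=True, top_k=50, top_p=0.95)
    # The pipeline returns a list of dicts; 'generated_text' starts with the prompt itself
    return generated[0]['generated_text']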
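
The `do_sample=True, top_k=50, top_p=0.95` arguments in `generate_text` restrict which tokens can be sampled at each step. The sketch below illustrates the idea on a toy next-token distribution. It is a simplified illustration, not the library's implementation, and the names `top_k_top_p_filter` and `toy` are made up for this example.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return a renormalized {token: prob} dict after top-k, then top-p filtering."""
    # Rank tokens from most to least likely and keep at most top_k of them.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # Keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


# Toy distribution over four candidate next tokens (made-up numbers).
toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.8)
print(sorted(filtered))
```

Lowering `top_k` or `top_p` narrows the candidate set and makes the output more conservative; raising them allows less likely tokens through.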

In [8]:
# Get a few examples from the dataset
examples = dataset.select(range(5))
for i, example in enumerate(examples):
    print(f"Example {i+1}:")
    print(example['content'][:200] + '...\n')  # Print first 200 characters

Example 1:
###############################################################################
##
##  Copyright (C) 2013-2014 Tavendo GmbH
##
##  Licensed under the Apache License, Version 2.0 (the "License");
##  y...

Example 2:
from itertools import chain

from django.utils.itercompat import is_iterable


class Tags:
    """
    Built-in tags for internal checks.
    """
    admin = 'admin'
    caches = 'caches'
    compatib...

Example 3:
"""
The :mod:`sklearn.utils` module includes various utilites.
"""

from collections import Sequence

import numpy as np
from scipy.sparse import issparse

from .murmurhash import murm...

Example 4:
""" Python Character Mapping Codec cp1250 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1250.TXT' with gencodec.py.

"""#"

import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encod...

Example 5:
#!/usr/bin/python
# encoding: utf-8 -*-

# Copyright: (c) 2013, Matthias Vogelgesang <matthias.vogelgesang@gmail.com>
# GNU General Public Li

In [9]:
# Select an example to use as a prompt
example_index = 0  # Change this to use a different example
prompt = examples[example_index]['content'][:200]  # Use first 200 characters as prompt
print("Prompt:")
print(prompt)

# Generate text based on the prompt
generated_text = generate_text(prompt)
print("\nGenerated Text:")
print(generated_text)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Prompt:
###############################################################################
##
##  Copyright (C) 2013-2014 Tavendo GmbH
##
##  Licensed under the Apache License, Version 2.0 (the "License");
##  y

Generated Text:
###############################################################################
##
##  Copyright (C) 2013-2014 Tavendo GmbH
##
##  Licensed under the Apache License, Version 2.0 (the "License");
##  y you may not use this file except in compliance with the License.
##  You may obtain a copy of the License at
##
##  http://www.apache.org/licenses/LICENSE-2.0
##
##  Unless required by applicable law or agreed to in writing, software
## 


To try a different prompt from the dataset, change `example_index` in cell [9] and rerun it. To control the output length, pass a different `max_length` to `generate_text`. This value counts the prompt tokens and the generated tokens together, so a long prompt leaves less room for the continuation.
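
Since `max_length` includes the prompt and the pipeline echoes the prompt in its output, a variant that budgets only the new tokens and returns only the continuation can be easier to work with. The sketch below assumes the pipeline accepts `max_new_tokens` and `return_full_text`, which are standard text-generation pipeline arguments. The function name `generate_continuation` is hypothetical, and the generator is passed in explicitly so the function does not depend on notebook state.

```python
def generate_continuation(generator, prompt, max_new_tokens=100):
    """Return only the newly generated text for `prompt`.

    `generator` is expected to behave like a Hugging Face text-generation
    pipeline: callable with a prompt plus keyword arguments, returning a
    list of dicts that carry a 'generated_text' key.
    """
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,  # budget for new tokens only
        do_sample=True,
        top_k=50,
        top_p=0.95,
        return_full_text=False,  # ask the pipeline to omit the echoed prompt
    )
    return outputs[0]["generated_text"]
```

In this notebook it would be called as `generate_continuation(generator, prompt, max_new_tokens=64)`.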