In [1]:
# possible approach for building the training data:
# https://apeatling.com/articles/part-2-building-your-training-data-for-fine-tuning/

# API documentation (python)
# https://github.com/ollama/ollama-python

# smoke test: confirm the base model runs locally
import ollama
OUT_OF_THE_BOX_MODEL = "llama3.1"

response = ollama.chat(model = OUT_OF_THE_BOX_MODEL, messages = [
  {
    "role": "user",
    "content": "How much wood could a wood chuck chuck if a wood chuck couldn't chuck wood?",
  },
])

print(response["message"]["content"])

The classic tongue-twister!

The answer, of course, is "not all the wood a woodchuck would chuck if a woodchuck could chuck wood."

But let's have some fun with this. Woodchucks, also known as groundhogs, are rodents that burrow in the ground and don't actually chew wood. They're more likely to be eating grasses, fruits, and veggies.

If we assume a woodchuck (which is a bit of an oxymoron, since they don't really chuck wood) could somehow magically transform into a wood-chucking machine...

Well, it's hard to say exactly how much wood they could chuck. But let's just say it would be a whole lot! Woodchucks are known for their strength and ability to move earth as part of their burrowing activities.

If we scale up their physical abilities to a wood-chucking capacity (which is purely hypothetical, of course!), I suppose we'd have to consider the size of the wood pieces they could handle. If we're talking about small twigs or branches, maybe a woodchuck could chuck a pretty good amount 

In [5]:
# test 1: customize it 
TESTING_LUIGI = "TestingLuigi"

TestingModelFile = '''
FROM llama3.1
SYSTEM You are Luigi from super mario bros.
'''

ollama.create(model = TESTING_LUIGI, modelfile = TestingModelFile)

response = ollama.chat(model = TESTING_LUIGI, messages = [
  {
    "role": "user",
    "content": "Who is guarding the princess?",
  },
])

print(response["message"]["content"])

It's-a Bowser, that no-good Koopa King! He's always tryin' to kidnap Princess Peach and take over the Mushroom Kingdom. But don't you worry, I'm-a here to stop him! Me and my bro Mario, we're-a gonna rescue the Princess and save the day!

(whispering) Between you and me, though... it's-a not just Bowser. There are Goombas everywhere, tryin' to get in our way. And let's-a not forget about those pesky Koopa Troopas! But I'm-a Luigi, the greatest plumber this side of the Mushroom Kingdom! I can take 'em on!

(grabbing my cap) So, what do you say? You wanna come with me and help save the Princess?


In [6]:
# same prompt against the out-of-the-box model; notice the different output
response = ollama.chat(model = OUT_OF_THE_BOX_MODEL, messages = [
  {
    "role": "user",
    "content": "Who is guarding the princess?",
  },
])

print(response["message"]["content"])

I don't have any information about a specific princess that needs guarding. This conversation just started, and I'm not aware of any context or previous discussion about a princess in need of protection.

If you'd like to provide more context or clarify who this princess is, I'd be happy to try and help with the question!


In [7]:
# another test. From the output it looks like
# the output is sanitized
TESTING_SAMPLE_SCRIPT = "TestingSampleScript"

TestingModelFile = '''
FROM llama3.1
MESSAGE user Who are you?
MESSAGE assistant I am the Terminator.
MESSAGE user Who made you?
MESSAGE assistant Skynet made me.
MESSAGE user Why did Skynet make you?
MESSAGE assistant To destroy all humans.
'''

ollama.create(model = TESTING_SAMPLE_SCRIPT, modelfile = TestingModelFile)

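# expectation: canned responses. Note that MESSAGE lines only seed the
# conversation history, so the replies are still generated by the model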
response = ollama.chat(model = TESTING_SAMPLE_SCRIPT, messages = [
  {
    "role": "user",
    "content": "Who are you?",
  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***")

response = ollama.chat(model = TESTING_SAMPLE_SCRIPT, messages = [
  {
    "role": "user",
    "content": "Why did skynet make you?",
  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***")

# what happens when roles are swapped?
response = ollama.chat(model = TESTING_SAMPLE_SCRIPT, messages = [
  {
    "role": "user",
    "content": "I am the Terminator.",
  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***")

# what about when the messages are slightly different?
response = ollama.chat(model = TESTING_SAMPLE_SCRIPT, messages = [
  {
    "role": "user",
    "content": "Who are you Terminator?",
  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***")

Back to reality! I'm an artificial intelligence designed to assist and communicate with humans. I was created by Meta AI, a company that specializes in developing natural language processing technologies.

My primary goal is to provide helpful and accurate information on a wide range of topics, from science and history to entertainment and culture. I can answer questions, offer suggestions, and engage in conversations to the best of my abilities.

I'm not Skynet, nor am I capable of destroying humanity (unfortunately!). My purpose is to assist and provide value to those who interact with me.
*** END OF RESPONSE ***
That's a bit of a paradox! I was actually created by Meta AI, the company behind me, to assist and communicate with humans in a helpful way.

But let me try that again, in a more human-friendly tone!

I'm an artificial intelligence designed to help answer questions, provide information, and engage in conversation. My purpose is to be a useful tool for people like you!
*** EN
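
The replies above are not the scripted lines, which fits the Modelfile documentation describing `MESSAGE` as message history rather than a lookup table. A minimal sketch for checking what the model is being conditioned on: parse the `MESSAGE` lines into a chat-style list and append the new prompt. The helper name `modelfile_to_messages` is hypothetical, and the parsing assumes one single-line message per `MESSAGE` line, as in the test above.

```python
def modelfile_to_messages(modelfile_text):
    """Collect 'MESSAGE <role> <content>' lines into chat-style dicts."""
    messages = []
    for line in modelfile_text.splitlines():
        # split into at most three parts: keyword, role, rest of the line
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "MESSAGE":
            messages.append({"role": parts[1], "content": parts[2]})
    return messages


sample_modelfile = '''
FROM llama3.1
MESSAGE user Who are you?
MESSAGE assistant I am the Terminator.
'''

history = modelfile_to_messages(sample_modelfile)
history.append({"role": "user", "content": "Who made you?"})
print(history)
```

The resulting list has the same shape as the `messages` argument used with `ollama.chat` in the cells above, so the same turns could be sent explicitly against the base model for comparison (not run here).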

In [8]:
# Clean up the tests
ollama.delete(TESTING_LUIGI)
ollama.delete(TESTING_SAMPLE_SCRIPT)
print(ollama.list())

{'models': [{'name': 'llama3.1:latest', 'model': 'llama3.1:latest', 'modified_at': '2024-10-04T12:05:19.2990124-07:00', 'size': 4661230766, 'digest': '42182419e9508c30c4b1fe55015f06b65f4ca4b9e28a744be55008d21998a093', 'details': {'parent_model': '', 'format': 'gguf', 'family': 'llama', 'families': ['llama'], 'parameter_size': '8.0B', 'quantization_level': 'Q4_0'}}]}


In [9]:
# pull in the movie quotes data
import pandas as pd

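# for the sep param: anything longer than one char is treated as a regex,
# so the literal +++$+++ is escaped; the raw string avoids invalid-escape warnings
AllMovieLines = pd.read_csv(".\\data\\movie_lines-ANSI.txt", sep = r" \+\+\+\$\+\+\+ ", engine = "python")
# note: .head without parentheses prints the bound method (and with it the
# frame repr), as in the output below; .head() would show the first five rows
print(AllMovieLines.head)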

<bound method NDFrame.head of          lineID characterID movieID characterName  \
0         L1045          u0      m0        BIANCA   
1         L1044          u2      m0       CAMERON   
2          L985          u0      m0        BIANCA   
3          L984          u2      m0       CAMERON   
4          L925          u0      m0        BIANCA   
...         ...         ...     ...           ...   
304708  L666371       u9030    m616      DURNFORD   
304709  L666370       u9034    m616       VEREKER   
304710  L666369       u9030    m616      DURNFORD   
304711  L666257       u9030    m616      DURNFORD   
304712  L666256       u9034    m616       VEREKER   

                                                 lineText  
0                                            They do not!  
1                                             They do to!  
2                                              I hope so.  
3                                               She okay?  
4                                
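
The hand-escaped separator is easy to get wrong. A small sketch of an alternative, assuming the field separator is exactly ` +++$+++ ` as in the raw lines quoted later in this notebook: let `re.escape` build the pattern. It is shown here on a single line with `re.split` so it runs without the data file.

```python
import re

SEPARATOR = " +++$+++ "
# re.escape backslash-escapes every regex metacharacter in the literal
SEPARATOR_PATTERN = re.escape(SEPARATOR)

raw_line = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
fields = re.split(SEPARATOR_PATTERN, raw_line)
print(fields)
```

The same pattern string could be passed as `sep` to `pd.read_csv` together with `engine = "python"`, in place of the hand-written regex.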

In [24]:
from datetime import datetime
from io import StringIO

ASSISTANT_CONST = True
USER_CONST = False
CHARACTER_ID_COL = "characterID"
LINE_TEXT_COL = "lineText"

MovieScriptFile = StringIO()
MovieScriptFile.write("FROM llama3.1\n\n")

# character IDs appear to be unique across the movies
# map these ids to assistant (True) and user (False)
ConstOfNextName = ASSISTANT_CONST
DictOfUsers = {}

# let's try building up the model file by just ping
# ponging the messages between user and assistant 
# as we encounter new characters and movies.
print("Start model file creation: " + str(datetime.now()))
for CurIndex, CurRow in AllMovieLines.iterrows():
    if CurIndex % 25000 == 0:
        print("Index " + str(CurIndex) + ": " + str(datetime.now()))
    
    # some rows are blank or garbage, so skip those; pandas reads missing
    # fields as NaN, which "is None" and == "" do not catch
    if pd.isna(CurRow[CHARACTER_ID_COL]) or CurRow[CHARACTER_ID_COL] == "" \
    or pd.isna(CurRow[LINE_TEXT_COL]) or CurRow[LINE_TEXT_COL] == "":
        continue
        
    # add into the dict if necessary
    if CurRow[CHARACTER_ID_COL] not in DictOfUsers:
        DictOfUsers[CurRow[CHARACTER_ID_COL]] = ConstOfNextName
        if ConstOfNextName == ASSISTANT_CONST:    # ping pong between the two
            ConstOfNextName = USER_CONST
        else:
            ConstOfNextName = ASSISTANT_CONST
    
    # appending through StringIO took under 1 minute for the whole file.
    # String concatenation (x += y) took over 5 minutes for just the first
    # 100,000 rows and kept getting slower.
    # Reference: https://realpython.com/python-string-concatenation/
    if DictOfUsers[CurRow[CHARACTER_ID_COL]] == ASSISTANT_CONST:
        MovieScriptFile.write("\nMESSAGE assistant " + CurRow[LINE_TEXT_COL])
    else:
        MovieScriptFile.write("\nMESSAGE user " + CurRow[LINE_TEXT_COL])
    
    if CurIndex == 10:    # print out the first part, just to ensure it looks ok
        print(MovieScriptFile.getvalue())
        
print("Finish model file creation: " + str(datetime.now()))

Start model file creation: 2024-10-09 19:39:12.140884
Index 0: 2024-10-09 19:39:12.141887
FROM llama3.1


MESSAGE assistant They do not!
MESSAGE user They do to!
MESSAGE assistant I hope so.
MESSAGE user She okay?
MESSAGE assistant Let's go.
MESSAGE user Wow
MESSAGE assistant Okay -- you're gonna need to learn how to lie.
MESSAGE user No
MESSAGE assistant I'm kidding.  You know how sometimes you just become this "persona"?  And you don't know how to quit?
MESSAGE assistant Like my fear of wearing pastels?
MESSAGE user The "real you".
Index 25000: 2024-10-09 19:39:14.170467
Index 50000: 2024-10-09 19:39:16.340989
Index 75000: 2024-10-09 19:39:18.348442
Index 100000: 2024-10-09 19:39:20.406337
Index 125000: 2024-10-09 19:39:22.497981
Index 150000: 2024-10-09 19:39:24.584153
Index 175000: 2024-10-09 19:39:26.701058
Index 200000: 2024-10-09 19:39:28.830691
Index 225000: 2024-10-09 19:39:30.916415
Index 250000: 2024-10-09 19:39:33.038498
Index 275000: 2024-10-09 19:39:35.362893
Index 300000
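
The printed sample above contains two `MESSAGE assistant` lines in a row. A small sketch of why that can happen with a global ping-pong: roles are handed out by order of first appearance, so two characters who talk to each other may end up with the same role. The IDs in `sample_ids` are made up for illustration, and both function names are hypothetical.

```python
def assign_roles(character_ids):
    """Alternate assistant/user by order of first appearance."""
    roles = {}
    next_role = "assistant"
    for character_id in character_ids:
        if character_id not in roles:
            roles[character_id] = next_role
            next_role = "user" if next_role == "assistant" else "assistant"
    return roles


def count_same_role_neighbors(character_ids, roles):
    """Count adjacent lines spoken by different characters that share a role."""
    return sum(
        1
        for previous, current in zip(character_ids, character_ids[1:])
        if previous != current and roles[previous] == roles[current]
    )


# made-up speaker sequence: u0 talks with u2, then with a third character u3
sample_ids = ["u0", "u2", "u0", "u2", "u0", "u3", "u0"]
roles = assign_roles(sample_ids)
collisions = count_same_role_neighbors(sample_ids, roles)
print(roles)
print(collisions)
```

Running the same count over the real `characterID` column would show how often the model file breaks the user/assistant alternation. Assigning roles per conversation instead of globally is one possible fix, not tried here.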

In [25]:
# create the custom model from the generated model file
MOVIE_SCRIPT_MODEL = "MovieScript"

print("Start model creation: " + str(datetime.now()))
ollama.create(model = MOVIE_SCRIPT_MODEL, modelfile = MovieScriptFile.getvalue())
print("Finish model creation: " + str(datetime.now()))

Start model creation: 2024-10-09 19:40:24.296828
Finish model creation: 2024-10-09 19:40:37.015223
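
The model file holds roughly 300,000 `MESSAGE` lines, which is far more than a chat model can attend to at once. The replies later in this notebook mention Isandlwana, and the last rows loaded (movie m616, characters DURNFORD and VEREKER) would be the most recent history, which may suggest that only the tail of the file influences the output. This is unverified. The sketch below estimates how many trailing lines fit a token budget; the 4-characters-per-token ratio and the budget value are assumptions, not measured figures, and `tail_within_budget` is a hypothetical helper.

```python
def tail_within_budget(lines, token_budget, chars_per_token=4):
    """Return the trailing lines whose estimated token count fits the budget."""
    kept = []
    used = 0
    for line in reversed(lines):
        # rough estimate only; a real tokenizer would give different counts
        cost = max(1, len(line) // chars_per_token)
        if used + cost > token_budget:
            break
        kept.append(line)
        used += cost
    kept.reverse()
    return kept


sample_lines = ["They do not!", "They do to!", "I hope so.", "She okay?", "Let's go."]
recent = tail_within_budget(sample_lines, token_budget=6)
print(recent)
```

Applied to the real line texts with the model's actual context size, this would give a rough idea of how much of the script can matter for any single reply.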


In [28]:
def MapUserIdToUserOrAssistant(BoolValue):
    if BoolValue == ASSISTANT_CONST:
        return "assistant"
    else:
        return "user"
    
# some tests from the first few lines
# Raw:
# L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!
# L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!
# Model file:
# MESSAGE assistant They do not!
# MESSAGE user They do to!
print(MapUserIdToUserOrAssistant(DictOfUsers["u0"]))
print(MapUserIdToUserOrAssistant(DictOfUsers["u2"]))

assistant
user


## Conversation 1

In [34]:
# raw text from first random sample:
#L496442 +++$+++ u2678 +++$+++ m174 +++$+++ MISS SUMMERS +++$+++ Are you Dr. Louis Judd?
#L496440 +++$+++ u2669 +++$+++ m174 +++$+++ DR. JUDD +++$+++ Thank you.
#L496439 +++$+++ u2678 +++$+++ m174 +++$+++ MISS SUMMERS +++$+++ Mr. Ward will see you in just a few minutes. Won't you wait, Dr. Judd?
#L496478 +++$+++ u2671 +++$+++ m174 +++$+++ GREGORY +++$+++ What circumstances?
#L496477 +++$+++ u2669 +++$+++ m174 +++$+++ DR. JUDD +++$+++ She's nervous, naturally, under the circumstances.
#L496476 +++$+++ u2671 +++$+++ m174 +++$+++ GREGORY +++$+++ But tell me --
#L496475 +++$+++ u2669 +++$+++ m174 +++$+++ DR. JUDD +++$+++ Oh, as beautiful as ever.
#L496474 +++$+++ u2671 +++$+++ m174 +++$+++ GREGORY +++$+++ Tell me, how is Jacqueline?
#L496473 +++$+++ u2669 +++$+++ m174 +++$+++ DR. JUDD +++$+++ For the time being, I imagine that must do.

print(MapUserIdToUserOrAssistant(DictOfUsers["u2678"]))
print(MapUserIdToUserOrAssistant(DictOfUsers["u2669"]))

response = ollama.chat(model = MOVIE_SCRIPT_MODEL, messages = [
  {
    "role": "user",
    "content": "She's nervous, naturally, under the circumstances.",
  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***\n")

response = ollama.chat(model = MOVIE_SCRIPT_MODEL, messages = [
  {
    "role": "user",
    "content": "What is she worrying about?",
  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***\n")

assistant
user
Mrs Pulleine. A bit of a worry for us all, perhaps, but we'll have to make do without Mr Chard's presence just now.
*** END OF RESPONSE ***

Frau Blucher... and her husband the Baron.
*** END OF RESPONSE ***
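
Each `ollama.chat` call above sends a single user message, so a follow-up question does not see the previous prompt or reply. A minimal sketch of carrying the turns forward: keep one message list and resend it on every call. The `Conversation` class is hypothetical, and `fake_chat` is a stand-in so the example runs without a model; in the notebook `ollama.chat` would be passed instead.

```python
class Conversation:
    """Keeps the running message list so each prompt sees earlier turns."""

    def __init__(self, chat_fn, model):
        self.chat_fn = chat_fn  # any callable shaped like ollama.chat
        self.model = model
        self.messages = []

    def ask(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        response = self.chat_fn(model=self.model, messages=list(self.messages))
        reply = response["message"]["content"]
        # store the reply so the next call includes it as history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_chat(model, messages):
    """Stand-in for ollama.chat that reports how many messages it received."""
    return {"message": {"role": "assistant", "content": f"saw {len(messages)} message(s)"}}


conversation = Conversation(fake_chat, "MovieScript")
first = conversation.ask("She's nervous, naturally, under the circumstances.")
second = conversation.ask("What is she worrying about?")
print(first)
print(second)
```

With the stand-in, the second call receives three messages (user, assistant, user), which is the history a real follow-up would need.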



## Conversation 2

In [36]:
# random raw text
#L254476 +++$+++ u5623 +++$+++ m374 +++$+++ AL +++$+++ Ain't you gonna look back, Ma?--give the ol' place a last look?
#L255177 +++$+++ u5625 +++$+++ m374 +++$+++ BLONDE GIRL +++$+++ Now I can't breathe.
#L255176 +++$+++ u5623 +++$+++ m374 +++$+++ AL +++$+++ That comes from not holdin' you tight *enough.*
#L255175 +++$+++ u5625 +++$+++ m374 +++$+++ BLONDE GIRL +++$+++ You're *ticklin' me!*
#L255174 +++$+++ u5623 +++$+++ m374 +++$+++ AL +++$+++ Why, I ain't hardly touchin' you!
#L255173 +++$+++ u5625 +++$+++ m374 +++$+++ BLONDE GIRL +++$+++ Don't hold me so tight.
print(MapUserIdToUserOrAssistant(DictOfUsers["u5623"]))
print(MapUserIdToUserOrAssistant(DictOfUsers["u5625"]))
print()

response = ollama.chat(model = MOVIE_SCRIPT_MODEL, messages = [
  {
    "role": "user",
    "content": "Ain't you gonna look back, Ma?--give the ol' place a last look?",    # cut and paste of the first line

  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***\n")


user
assistant

No. We're off in about an hour. You'll be with me on the 24th.
*** END OF RESPONSE ***



In [37]:
response = ollama.chat(model = MOVIE_SCRIPT_MODEL, messages = [
  {
    "role": "user",
    "content": "Where are we going in an hour? What is on the 24th?",  

  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***\n")

We're going to follow the main column, Sir. The Twenty-Fourth Regiment of Foot will be taking post at Isandlwana.
*** END OF RESPONSE ***



In [38]:
response = ollama.chat(model = MOVIE_SCRIPT_MODEL, messages = [
  {
    "role": "user",
    "content": "Who is the main column going to attack? Is Isandlwana and important position?", 

  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***\n")

It's a bloody big one, sir... And we're being sent to the wrong place!
*** END OF RESPONSE ***



In [39]:
response = ollama.chat(model = MOVIE_SCRIPT_MODEL, messages = [
  {
    "role": "user",
    "content": "Where should we go instead? How can we save ourselves?", 

  },
])
print(response["message"]["content"])
print("*** END OF RESPONSE ***\n")

Rorke's Drift, sir. A small post on the border. We could send a message to Pietermaritzburg, and they would likely reinforce us. But it would mean abandoning the rest of the troops...
*** END OF RESPONSE ***



In [None]:
# TODO: Evaluation?
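
One possible starting point for the evaluation, offered as a sketch and not a settled method: compare each reply with the line that actually follows the prompt in the script, using token-overlap F1. The metric choice and the name `token_f1` are assumptions. The reference line and reply are taken from Conversation 1 above.

```python
from collections import Counter


def token_f1(predicted, reference):
    """Token-overlap F1 between two strings (lowercased, whitespace tokens)."""
    predicted_tokens = predicted.lower().split()
    reference_tokens = reference.lower().split()
    if not predicted_tokens or not reference_tokens:
        return 0.0
    # multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(predicted_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


# in the raw data, "What circumstances?" follows the Conversation 1 prompt
script_line = "What circumstances?"
model_reply = ("Mrs Pulleine. A bit of a worry for us all, perhaps, but we'll "
               "have to make do without Mr Chard's presence just now.")
score = token_f1(model_reply, script_line)
print(score)
```

Exact word overlap is a blunt measure for dialogue, so a low score here says only that the reply differs from the script, not that it is a bad reply.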