<a href="https://colab.research.google.com/github/SVJLucas/ImproveLLMStorytellingWithFineTuning/blob/main/Code/KnowledgeDistillationViaData_FreeDistillation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Knowledge distillation

Running the Llama-2 70B model ourselves would require an A100 GPU on Colab, which is a paid option. To avoid that cost, this tutorial uses the Llama-2 70B API provided by Together. At the time of writing, **Together offers free credits for the first month of subscription, which are more than sufficient to build the desired dataset.**

Although we use the Together API, the functions and prompts that build the dataset work with any API for Llama-2 70B (or with the model itself, if you prefer to run it on Google Colab with an A100 GPU). The same methodology applies to other large models: their outputs can be used to create training datasets for smaller models, which is one of the purposes of the new Llama-3.1 405B.
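
As a rough illustration of the point that the prompts are not tied to one provider, the sketch below routes every request through a single function that accepts any completion callable. The names `generate_text` and `echo_backend` are hypothetical and do not come from the Together library; a real backend would wrap an API call or a locally loaded model.

```python
from typing import Callable

def generate_text(prompt: str, complete_fn: Callable[[str], str]) -> str:
    """Send `prompt` to whichever backend `complete_fn` wraps and return the text."""
    return complete_fn(prompt).strip()

def echo_backend(prompt: str) -> str:
    """Stand-in backend for offline testing (hypothetical, makes no network call)."""
    return f"  (echo) {prompt}  "

print(generate_text("Invent a story summary.", echo_backend))
```

Swapping providers then only means writing another small function with the same `str -> str` shape.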

In [None]:
!pip install --upgrade together



In [None]:
from google.colab import userdata
together_api_key = userdata.get('together_api_key')
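
`userdata` only exists inside Colab. For readers running the notebook elsewhere, a minimal sketch of a fallback is shown here. The function name `load_api_key` and the environment variable name (the secret name in upper case) are assumptions made for this example.

```python
import os

def load_api_key(name: str = "together_api_key") -> str:
    """Return the key from Colab secrets if available, else from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
    except ImportError:
        # Assumption: outside Colab the key is exported as TOGETHER_API_KEY
        key = os.environ.get(name.upper())
    if not key:
        raise RuntimeError(f"API key '{name}' not found")
    return key
```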

### How to use Together?

### Python Documentation: [Together Library](https://docs.together.ai/docs/inference-python)

In [None]:
import together
together.api_key = together_api_key

In [None]:
# see available models
model_list = together.Models.list()

print(f"{len(model_list)} models available")

# print the name of every available model
model_names = [model_dict['name'] for model_dict in model_list]
for model_name in model_names:
  print(model_name)

118 models available
Austism/chronos-hermes-13b
DiscoResearch/DiscoLM-mixtral-8x7b-v2
EleutherAI/llemma_7b
Gryphe/MythoMax-L2-13b
Meta-Llama/Llama-Guard-7b
Nexusflow/NexusRaven-V2-13B
NousResearch/Nous-Capybara-7B-V1p9
NousResearch/Nous-Hermes-Llama2-13b
NousResearch/Nous-Hermes-Llama2-70b
NousResearch/Nous-Hermes-llama-2-7b
NumbersStation/nsql-llama-2-7B
Open-Orca/Mistral-7B-OpenOrca
Phind/Phind-CodeLlama-34B-Python-v1
Phind/Phind-CodeLlama-34B-v2
SG161222/Realistic_Vision_V3.0_VAE
Undi95/ReMM-SLERP-L2-13B
Undi95/Toppy-M-7B
WizardLM/WizardCoder-15B-V1.0
WizardLM/WizardCoder-Python-34B-V1.0
WizardLM/WizardLM-13B-V1.2
WizardLM/WizardLM-70B-V1.0
garage-bAInd/Platypus2-70B-instruct
huggyllama/llama-65b
lmsys/vicuna-13b-v1.5-16k
lmsys/vicuna-13b-v1.5
lmsys/vicuna-7b-v1.5
mistralai/Mistral-7B-Instruct-v0.1
mistralai/Mistral-7B-v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1
prompthero/openjourney
runwayml/stable-diffusion-v1-5
stabilityai/stable-diffusion-2-1
stabilityai/stable-diffusion-xl-base-
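
With more than a hundred entries, it can help to filter the list instead of reading it. The sketch below assumes, as in the cell above, that each entry is a dict with a `'name'` key; `find_models` is a hypothetical helper, and the sample names are copied from the printed output.

```python
def find_models(model_list, *keywords):
    """Return names of models whose name contains every keyword (case-insensitive)."""
    names = [m["name"] for m in model_list]
    return [n for n in names if all(k.lower() in n.lower() for k in keywords)]

# Small sample taken from the listing above
sample = [
    {"name": "NousResearch/Nous-Hermes-Llama2-70b"},
    {"name": "lmsys/vicuna-7b-v1.5"},
    {"name": "WizardLM/WizardLM-70B-V1.0"},
]
print(find_models(sample, "llama", "70b"))
```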

# First Step

In [None]:
prompt = '''
[INST]:<<SYS>>You are a chatbot that provides story titles for users.

For example:

<User>: Please give me a summary of a story.

Response:

1. An ugly dragon seeking the crystal of the forest.
2. A fairy with a desire to feast on carnivorous plants.
3. A space mermaid carrying a fan.
4. A mummy harboring dreams of becoming a Hollywood star.
5. A samurai who fell in love with a fish.

Now it's your turn. Please respond in English in a creative and innovative manner. The more unique and different, the better.
Your answer must be in topics like above. Do not write anything more than the topics.<<SYS>>

<<User>>: Invent a completely new story summary. Provide multiple possibilities. [/INST]
'''

In [None]:
output = together.Complete.create(
  prompt = prompt,
  model = "togethercomputer/llama-2-70b-chat",
  max_tokens = 512,
  temperature = 0.8,
)

# print generated text
print(output['output']['choices'][0]['text'])


1. A sentient pencil embarking on a quest to find its lost tip.
2. A group of nomadic books searching for a permanent home.
3. A time-traveling potted plant attempting to prevent the invention of concrete.
4. A clandestine society of umbrellas plotting to overthrow the oppressive regime of sunshine.
5. A magical landfill where discarded objects come to life and form their own community.
6. A haunted loaf of bread trying to escape the clutches of a malevolent toaster.
7. A group of shape-shifting clouds competing in a high-stakes game of weather manipulation.
8. A team of alien explorers trying to decipher the mysteries of the human obsession with cats.
9. A talking tree running for political office to protect the environment.
10. A group of robotic garden gnomes attempting to unionize and demand better working conditions.
11. A haunted library where the books come to life and start their own book club.
12. A group of merry villagers who must band together to stop a horde of marauding 

Improving the prompt:

In [None]:
subject = 'Fear'

prompt = f'''
[INST]:<<SYS>>You are a chatbot that provides story titles for users.

For example:

<User>: Please give me a summary of a story.

Response:

1. An ugly dragon seeking the crystal of the forest.
2. A fairy with a desire to feast on carnivorous plants.
3. A space mermaid carrying a fan.
4. A mummy harboring dreams of becoming a Hollywood star.
5. A samurai who fell in love with a fish.

Now it's your turn. Please respond in English in a creative and innovative manner. The more unique and different, the better.
Your answer must be in topics like above.<<SYS>>

<<User>>: Invent a completely new story summary about {subject}. Provide multiple possibilities. [/INST]
'''

In [None]:
output = together.Complete.create(
  prompt = prompt,
  model = "togethercomputer/llama-2-70b-chat",
  max_tokens = 512,
  temperature = 0.8,
)

# print generated text
print(output['output']['choices'][0]['text'])


1. "The Fearful Symphony": A young composer must confront his fear of failure when he's asked to create a piece for a prestigious music festival, but his muse has abandoned him.
2. "The Fear of Flying": A group of passengers on a plane are forced to face their fears when their flight is hit by turbulence, and they must work together to survive.
3. "Fear in the Dark": A woman is trapped in a basement during a power outage and must confront her fear of the dark, as well as the unknown entity that's lurking in the shadows.
4. "Fear of the Unknown": A group of friends on a camping trip discover that their campfire stories have become all too real, and they must confront their fear of the unknown.
5. "The Fear of Losing Control": A successful businesswoman is forced to confront her fear of losing control when she's kidnapped by a group of criminals and must use her wits to escape.
6. "Fear of the Past": A man must face his fear of his past when he's forced to return to his hometown for a f

Extracting results using regex:

In [None]:
import re

def extract_story_topics(text):
    """
    Extracts story topics from a given text using regular expressions, removing the numbering.

    Args:
        text (str): A string containing multiple story topics.

    Returns:
        list[dict]: One {'Topic': str} dict per extracted topic, with the numbering removed.
    """
    # Pattern to match each topic in the text, excluding the numbering.
    # Each topic starts with a number, a period and optional whitespace; the lazy group then
    # captures up to the first following period, so a topic that contains a period is cut short.
    pattern = r'\d+\.\s*(.*?)(?=\.)'

    # Use findall to extract all matching topics (DOTALL lets a topic span line breaks)
    topics = re.findall(pattern, text, re.DOTALL)

    # Strip leading/trailing whitespace and wrap each topic in a dict
    return [{'Topic':topic.strip()} for topic in topics]


# Example text containing multiple story topics
text = """
1. A fearless girl who discovers a magical amulet that makes her afraid of everything.
2. A group of friends who try to overcome their phobias by facing them head-on in a haunted house.
3. A town that is plagued by a mysterious curse that causes everyone to experience their worst fears.
4. A boy who is forced to confront his fear of failure when he is transported to a world where he is the only one who can save it.
5. A group of strangers who are trapped in a room together and must overcome their fears and work together to escape.
6. A man who discovers that his fears are actually manifestations of his own subconscious, and he must learn to face them in order to overcome them.
7. A world where fear has been outlawed, and the consequences of feeling fear are dire.
8. A girl who is afraid of love, but finds herself falling for a boy who is afraid of commitment.
9. A group of people who are trapped in a time loop, reliving the same day over and over again, but each time they must face a different fear.
10. A person who is afraid of their own shadow, but discovers that it has a mind of its own and is trying to take over their life.
"""

extract_story_topics(text)

[{'Topic': 'A fearless girl who discovers a magical amulet that makes her afraid of everything'},
 {'Topic': 'A group of friends who try to overcome their phobias by facing them head-on in a haunted house'},
 {'Topic': 'A town that is plagued by a mysterious curse that causes everyone to experience their worst fears'},
 {'Topic': 'A boy who is forced to confront his fear of failure when he is transported to a world where he is the only one who can save it'},
 {'Topic': 'A group of strangers who are trapped in a room together and must overcome their fears and work together to escape'},
 {'Topic': 'A man who discovers that his fears are actually manifestations of his own subconscious, and he must learn to face them in order to overcome them'},
 {'Topic': 'A world where fear has been outlawed, and the consequences of feeling fear are dire'},
 {'Topic': 'A girl who is afraid of love, but finds herself falling for a boy who is afraid of commitment'},
 {'Topic': 'A group of people who are tr
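
The pattern above stops at the first period after the number, so a topic such as "Dr. Moss ..." would be reduced to "Dr". A line-based variant is sketched below as one possible alternative, assuming the model puts each numbered item on its own line. The function name `extract_topics_by_line` and the sample text are made up for this example.

```python
import re

def extract_topics_by_line(text):
    """Return one {'Topic': ...} dict per numbered line, keeping periods inside a topic."""
    # One item per line: optional indent, number, period, spaces, then the topic itself
    pattern = r'^[ \t]*\d+\.[ \t]+(.+?)[ \t]*$'
    topics = re.findall(pattern, text, re.MULTILINE)
    # Drop only the trailing period so the result matches the original output style
    return [{'Topic': t.rstrip('.')} for t in topics]

sample = """
1. Dr. Moss, a botanist who fears her own greenhouse.
2. A lighthouse keeper afraid of the dark.
"""
print(extract_topics_by_line(sample))
```

One trade-off: a final item cut off by `max_tokens` has no closing period, so the original pattern silently drops it, while this variant keeps the fragment. Depending on the dataset, that may need an extra filter.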

In [None]:
story_topic = 'A town that is plagued by a mysterious curse that causes everyone to experience their worst fears.'

prompt = f'''
[INST]:<<SYS>>You are a chatbot that creates stories for users.

Your stories must be short, they CANNOT be long, which means your stories must have only TWO paragraphs of small size<<SYS>>

<<User>>: Invent a completely new story about {story_topic}.[/INST]
'''

In [None]:
output = together.Complete.create(
  prompt = prompt,
  model = "togethercomputer/llama-2-70b-chat",
  max_tokens = 512,
  temperature = 0.8,
)

# print generated text
print(output['output']['choices'][0]['text'])


In the small town of Ravenswood, a strange and terrifying curse has fallen upon its residents. One by one, people are experiencing their worst fears come to life. No one knows the source of the curse, but it seems to be spreading quickly, infecting more and more people.

The townspeople are living in a state of constant terror, never knowing when their worst fear will strike. Some have tried to leave Ravenswood, but the curse seems to follow them wherever they go. Others have tried to hide, but the curse always seems to find them. The people of Ravenswood are trapped in a never-ending cycle of fear and dread, with no hope of escape.
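
The system prompt asks for exactly two paragraphs, and the model will not always comply. A minimal sketch of a check that could be run on each generated story is given here; `count_paragraphs` is a hypothetical helper, not part of the original notebook.

```python
import re

def count_paragraphs(story: str) -> int:
    """Count non-empty text blocks separated by one or more blank lines."""
    blocks = re.split(r'\n\s*\n', story.strip())
    return len([b for b in blocks if b.strip()])

example = "First paragraph of the story.\n\nSecond paragraph of the story.\n"
print(count_paragraphs(example) == 2)
```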


# Getting the most common themes in literature

The themes below come from this list: https://prowritingaid.com/themes-in-literature

In [None]:
story_themes = [
    "Abuse of power", "Adultery", "Adversity", "Aging", "Alienation", "Ambitions",
    "American dream", "Arrogance", "Art", "Autonomy", "Beauty", "Beliefs", "Betrayal",
    "Bravery", "Capitalism", "Celebration", "Chance", "Change versus tradition",
    "Chaos and order", "Character", "Childhood", "Circle of life", "Class", "Climate change",
    "Colonialism", "Coming of age", "Common sense", "Communication", "Companionship",
    "Conservation", "Conspiracy", "Convention and rebellion", "Corruption", "Courage",
    "Creation", "Crime", "Darkness and light", "Death", "Dedication", "Democracy",
    "Depression", "Desire", "Despair", "Destiny", "Disappointment", "Disillusionment",
    "Displacement", "Dreams", "Economics", "Education", "Empowerment", "Everlasting love",
    "Failure", "Faith", "Fame", "Family", "Fate", "Fear", "Feminism", "Forbidden love",
    "Forgiveness", "Free will", "Freedom", "Friendship", "Fulfillment", "Future",
    "Gay, lesbian, bisexual, and transgender rights", "Gender", "God", "Good vs evil",
    "Government", "Gratitude", "Greed", "Growing up", "Guilt", "Happiness", "Hard work",
    "Hate", "Health", "Heartbreak", "Hero", "Heroism", "History", "Honesty", "Honor",
    "Hope", "Humankind", "Human nature", "Humility", "Humor", "Hypocrisy", "Identity",
    "Ideology", "Imagination", "Immortality", "Imperialism", "Impossibility", "Individuality",
    "Inequality", "Injustice", "Innocence", "Inspiration", "Isolation", "Jealousy", "Joy",
    "Justice", "Kindness", "Knowledge", "Law", "Legacy", "Life", "Loneliness", "Loss",
    "Love", "Loyalty", "Madness", "Manipulation", "Materialism", "Maturity", "Medicine",
    "Memories", "Mercy", "Money", "Morality", "Motherhood", "Music", "Nationalism", "Nature",
    "Necessity", "Neglect", "New year", "Normality", "Not giving up", "Oneness", "Opportunity",
    "Oppression", "Optimism", "Overcoming", "Passion", "Peace", "Peer pressure", "Perfection",
    "Perseverance", "Personal development", "Politics", "Poverty", "Power", "Prayer", "Prejudice",
    "Pride", "Progress", "Propaganda", "Purpose", "Realism", "Reality", "Rebellion",
    "Rebirth", "Redemption", "Regret", "Relationship", "Religion", "Repression", "Resistance",
    "Revenge", "Revolution", "Sacrifice", "Sadness", "Satire", "Science", "Self-awareness",
    "Self-discipline", "Self-reliance", "Self-preservation", "Simplicity", "Sin", "Society",
    "Solitude", "Stoicism", "Subjectivity", "Suffering", "Surveillance", "Survival",
    "Sympathy", "Technology", "Temptation", "Time", "Tolerance", "Totalitarianism", "Tragedy",
    "Travel", "Trust", "Truth", "Unconditional love", "Universe", "Unrequited love", "Unselfishness",
    "Value", "Vanity", "Vices", "Violence", "Virtue", "War", "Waste", "Wealth", "Willpower",
    "Winning and losing", "Wisdom", "Work", "Working class struggles", "Xenophobia", "Youth"
]

## Creating topics:

In [None]:
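# NOTE: `prompt` must hold a topics prompt here (the output below came from one about religion),
# not the story prompt defined in the previous section.
output = together.Complete.create(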
  prompt = prompt,
  model = "togethercomputer/llama-2-70b-chat",
  max_tokens = 512,
  temperature = 0.8,
)

# print generated text
print(output['output']['choices'][0]['text'])

 Sure, I'd be happy to help! Here are some unique story summaries about religion:

1. A prophet who discovers the power of doubt and must reconcile their faith with their newfound uncertainty.
2. A group of angels who form a rock band and use their music to spread the word of God, but their efforts are met with resistance from a group of demonic record executives.
3. A young woman who discovers she has the ability to communicate with the spirits of religious figures from history, and must use this power to mediate a dispute between different religious factions.
4. A small-town pastor who discovers that his church has been taken over by a cult, and must use his faith and resourcefulness to free his congregation from their grasp.
5. A time-traveling imam who must prevent a catastrophic event that threatens to destroy the fabric of religious history.
6. A group of nuns who must use their skills in martial arts to defend their monastery from an attack by a group of marauding bandits.
7. A 

Let's create a prompt:

In [None]:
def create_topics_prompt(theme):
    """
    Creates a prompt based on a given theme for inventing new story summaries.

    Args:
        theme (str): The theme for the story summary.

    Returns:
        str: A formatted prompt requesting new story summaries about the given theme.
    """

    prompt = f'''
[INST]:<<SYS>>You are a chatbot that provides story topics for users.

For example:
<User>: Please give me a summary of a story.

Response:

1. An ugly dragon seeking the crystal of the forest.
2. A fairy with a desire to feast on carnivorous plants.
3. A space mermaid carrying a fan.
4. A mummy harboring dreams of becoming a Hollywood star.
5. A samurai who fell in love with a fish.

Now it's your turn. Please respond in English in a creative and innovative manner. The more unique and different, the better.
Your answer must be in topics like above. Do not give titles for your stories!<<SYS>>

<<User>>: Invent a completely new story summary about "{theme}". Provide 7 possibilities. [/INST]
'''

    return prompt
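
The prompts in this notebook open and close the system block with the same `<<SYS>>` marker and put a colon after `[INST]`. The chat template published with Llama-2 closes the block with `<</SYS>>` instead. The sketch below builds a prompt in that layout; `build_llama2_prompt` is a hypothetical helper, and whether the difference changes the output of the Together endpoint was not tested here.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system and a user message in the Llama-2 chat markers."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

print(build_llama2_prompt(
    "You are a chatbot that provides story topics for users.",
    'Invent a completely new story summary about "Fear". Provide 7 possibilities.',
))
```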

In [None]:
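import pandas as pd
import time
from google.colab import drive
drive.mount('/content/drive')  # the CSV paths below live on Google Drive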

Let's iterate to get the multiple topics:

In [None]:
total_topics = []

for theme in story_themes:
  prompt = create_topics_prompt(theme)
  output = together.Complete.create(
    prompt = prompt,
    model = "togethercomputer/llama-2-70b-chat",
    max_tokens = 512,
    temperature = 0.8,
  )
  response = output['output']['choices'][0]['text']
  topics = extract_story_topics(response)
  # tag every extracted topic with the theme it was generated for
  for topic in topics:
    topic['Theme'] = theme
  total_topics.extend(topics)
  # rewrite the CSV after every theme so partial progress survives an interruption
  dataframe = pd.DataFrame(total_topics)
  dataframe.to_csv('/content/drive/MyDrive/GDSC/Workshops/Finetuning/Topics.csv', index = False)
  time.sleep(1)  # brief pause between API calls
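
The loop above makes one API call per theme, and a single failed request would stop it. One common way to guard against that is a retry wrapper with exponential backoff, sketched below under the assumption that a failed call raises an exception. `call_with_retries` and `flaky` are hypothetical names, and `flaky` only simulates a failing call.

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Demo with a fake call that fails twice before succeeding
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(call_with_retries(flaky, sleep=lambda seconds: None))
```

In the loop, the API call could be passed in as a `lambda` so that the rest of the code stays unchanged.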




In [None]:
dataframe = pd.read_csv('/content/drive/MyDrive/GDSC/Workshops/Finetuning/Topics.csv')

In [None]:
dataframe

| | Topic | Theme |
|---|---|---|
| 0 | The Last Days of the Solar Empire: A tale of a... | Abuse of power |
| 1 | The Rise of the Swamp Goddess: A swamp witch w... | Abuse of power |
| 2 | The Last Love of a Tyrant: A story about a tyr... | Abuse of power |
| 3 | The Foolish Prince: A prince who, after being ... | Abuse of power |
| 4 | The Tale of the Greatest Assassin: A legendary... | Abuse of power |
| ... | ... | ... |
| 1471 | After a global pandemic, a small group of surv... | Youth |
| 1472 | A young man discovers he has the ability to co... | Youth |
| 1473 | In a future where virtual reality has replaced... | Youth |
| 1474 | A young woman discovers she is the reincarnati... | Youth |


# Second Step

In [None]:
def create_prompt(theme):
    """
    Creates a prompt based on a given theme for inventing new story summaries.

    Args:
        theme (str): The theme for the story summary.

    Returns:
        str: A formatted prompt requesting new story summaries about the given theme.
    """

    prompt = f'''
[INST]:<<SYS>>You are a chatbot that provides story topics for users.

For example:
<User>: Please give me a summary of a story.

Response:

1. An ugly dragon seeking the crystal of the forest.
2. A fairy with a desire to feast on carnivorous plants.
3. A space mermaid carrying a fan.
4. A mummy harboring dreams of becoming a Hollywood star.
5. A samurai who fell in love with a fish.

Now it's your turn. Please respond in English in a creative and innovative manner. The more unique and different, the better.
Your answer must be in topics like above. Do not give titles for your stories!<<SYS>>

<<User>>: Invent a completely new story summary about "{theme}". Provide 7 possibilities. [/INST]
'''

    return prompt

In [None]:
from tqdm import tqdm

Let's iterate to get the stories:

In [None]:
stories = []
for _, row in tqdm(dataframe.iterrows(), total=dataframe.shape[0]):

  topic = row['Topic']
  theme = row['Theme']

  # build the story prompt for this topic
  prompt = f'''
  [INST]:<<SYS>>You are a chatbot that creates stories for users.

  Your stories must be short, they CANNOT be long, which means your stories must have only TWO paragraphs of small size! Give an end to your story and characters!<<SYS>>

  <<User>>: Invent a completely new story about {topic}.[/INST]
  '''

  output = together.Complete.create(
    prompt = prompt,
    model = "togethercomputer/llama-2-70b-chat",
    max_tokens = 512,
    temperature = 0.8,
  )
  response = output['output']['choices'][0]['text']
  stories.append({'Theme': theme, 'Topic': topic, 'Story': response})
  # rewrite the CSV after every story so an interrupted run keeps its progress
  story_dataframe = pd.DataFrame(stories)
  story_dataframe.to_csv('/content/drive/MyDrive/GDSC/Workshops/Finetuning/Stories.csv', index = False)



100%|██████████| 1476/1476 [2:23:00<00:00,  5.81s/it]
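
A run of this length is worth making resumable. Because the loop rewrites `Stories.csv` after every story, a restarted session could skip the topics that are already saved. The sketch below shows the idea with the standard `csv` module to stay dependency-free; `load_done_topics` and `pending` are hypothetical helpers, and the only assumption about the file is the `Topic` column written by the loop above.

```python
import csv
import os

def load_done_topics(path):
    """Return the set of topics already present in a checkpoint CSV (empty if no file)."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as f:
        return {row["Topic"] for row in csv.DictReader(f)}

def pending(rows, done):
    """Keep only the rows whose topic has not been generated yet."""
    return [r for r in rows if r["Topic"] not in done]

# Toy example: one of two topics is already done
rows = [{"Topic": "A", "Theme": "Fear"}, {"Topic": "B", "Theme": "Fear"}]
print(pending(rows, {"A"}))
```

Note that previously saved stories would also have to be loaded back into `stories` before resuming, otherwise the next write would overwrite them.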


In [None]:
print(stories[0]['Story'])

In the last days of the Solar Empire, a mad emperor ruled with an iron fist. His power was unmatched, and his people lived in fear of his wrath. The emperor had a strange obsession with building a pyramid of skulls, and he would stop at nothing to make it a reality. He ordered his soldiers to gather the skulls of his enemies, and he spent all of his time and resources on constructing the massive structure.

As the years passed, the empire began to crumble. The people suffered under the emperor's rule, and they longed for a better life. But the emperor refused to listen, and he continued to build his pyramid of skulls. It became a symbol of his madness, and it stood as a reminder of the empire's decay. In the end, the emperor's obsession with the pyramid consumed him, and he was buried inside it, surrounded by the skulls of his victims. The Solar Empire collapsed, and a new era began. But the pyramid of skulls remained, a haunting reminder of the madness that had consumed the empire.


In [None]:
stories

[{'Theme': 'Abuse of power',
  'Topic': 'The Last Days of the Solar Empire: A tale of a dying empire ruled by a mad emperor who uses his power to build a pyramid of skulls',
  'Story': "In the final days of the Solar Empire, a mad emperor ruled with an iron fist. His power was unmatched, and his people lived in fear of his wrath. The emperor was obsessed with the idea of building a pyramid of skulls, a monument to his own greatness. He believed that the pyramid would grant him immortality, and he spared no expense in its construction.\n\nThe empire was in decline, and the people suffered under the emperor's rule. They were forced to work long hours, and their meager wages were barely enough to sustain them. The emperor cared little for their suffering, and he taxed them heavily to fund his grand project. As the pyramid rose higher and higher, the people grew more and more desperate. They knew that the emperor's madness would be their downfall, and they longed for the day when he would 