# Performance of the crypto_llama7b_math_model

### Set up the GPU

In [2]:
import os
# Restrict this process to GPU 3 (must be set before CUDA is initialised)
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

### Some Conclusions:

1. 

In [3]:
import gc

#Force garbage collection
gc.collect()

31

### Import libraries

In [4]:
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline, logging, BitsAndBytesConfig, AutoModelForCausalLM, GenerationConfig
from datasets import load_dataset
from langchain import HuggingFacePipeline
# from langchain.agents.agent_toolkits import create_python_agent
# from langchain.tools.python.tool import PythonREPLTool
from langchain.chains import LLMMathChain, LLMChain
from langchain.agents import Tool, initialize_agent
from langchain.prompts import PromptTemplate
from langchain.agents.agent_types import AgentType



### Get the crypto model and the base model

In [5]:
# Get the finetuned model
output_dir = "../../results/llama7b-crypto-14-05-00-45"

# base model
base = "meta-llama/Llama-2-7b-chat-hf"

### Get some testing prompts

The test prompts are generic and are meant to cover every aspect of the dataset. Because labelling of the dataset is still in progress, the ablation study is currently limited to two categories: **cipher conceptual questions**, and **basic number theory and modular arithmetic problems**.

In [6]:
# testing prompts
prompts = {"conceptual" : ["What is symmetric encryption? Explain briefly",
                           "What are hashing functions? What are hash collisions? Answer briefly",
                           "What is a crystal cipher? Describe its structure. Answer briefly",
                           "What is Caesar cipher? Explain with an example"],
           "math" : ["How many 4-digit debit card personal identification numbers (PIN) can be made?",
                    "How many ways can you arrange 3 people standing in line? Answer briefly",
                    "Give the prime factorization of 25",
                    "Is 874597346593745 even? Justify your answer",
                     "What is the lcm of 1 and 2",
                    "Is 55 divisible by 5?"]}

### Original Crypto-dataset - without augmented samples

In [7]:
# Load the crypto dataset
crypto_dataset = load_dataset("Manpa/cryptoqa-v1", split="test")

In [8]:
crypto_dataset[3]

{'question': 'Write one disadvantage of assymmetric encryption.',
 'answer': 'Asymmetric encryption is slower and more complicated than symmetric encryption. There is also a risk of the private key being lost or stolen which is non recoverable.',
 'type': 'orig',
 'category': 'word',
 'topic': 'cipher',
 'source': 'self'}

### Set the quantization parameters and the tokenizer

In [9]:
# Set the bits and bytes configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=False,
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(output_dir)

## Test on the mathematical capabilities

In [10]:
DEFAULT_SYSTEM_PROMPT = """ You are a fine-tuned AI model who is a math genius. 
You can solve simple to moderate level mathematics problems. 
Follow a chain of thought approach while answering. Answer in brief. """

### Test the base model on which cryptollama7b was trained

In [11]:
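# Load the base model with 4-bit quantization.
# load_in_4bit is already set inside bnb_config, so it is not passed again here;
# some transformers releases raise an error when both arguments are given.
base_model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    trust_remote_code=True,
    cache_dir="/projects/barman/cache"
)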

Loading checkpoint shards: 100%|██████████| 2/2 [00:11<00:00,  5.96s/it]


In [12]:
def generate_response(prompt,test_model):
    # Ignore warnings
    logging.set_verbosity(logging.CRITICAL)

    # Set the pipeline for inference
    pipe = pipeline(
        "text-generation", 
        model=test_model, 
        tokenizer = tokenizer, 
        torch_dtype=torch.bfloat16, 
        device_map= {"": 0}
    )
    # Generate with the Llama-2 chat prompt format and return the raw pipeline output
    result = pipe(f"<s>[INST] <<SYS>>\n{DEFAULT_SYSTEM_PROMPT}\n<</SYS>>\n\n{prompt}[/INST]\n ",do_sample=True,
        max_new_tokens=512, 
        temperature=0.9, 
        top_k=50, 
        top_p=0.95,
        num_return_sequences=1)
    return result
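
The prompt layout and the `split("[/INST]")` post-processing used in the cells below can be isolated into two small helpers. This is only a sketch: `build_llama2_prompt` and `extract_answer` are hypothetical names that do not appear in the notebook, and the sketch assumes the same single-turn Llama-2 chat layout as `generate_response`. It needs no model to run.

```python
def build_llama2_prompt(system_prompt: str, user_prompt: str) -> str:
    """Format a single-turn prompt in the same layout as generate_response."""
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt}[/INST]\n "


def extract_answer(generated_text: str) -> str:
    """Return the text after the last [/INST] marker, without surrounding whitespace."""
    return generated_text.split("[/INST]")[-1].strip()


# The pipeline returns the prompt followed by the completion, so we simulate that here.
demo_prompt = build_llama2_prompt("Answer in brief.", "Is 55 divisible by 5?")
demo_output = demo_prompt + "Yes, 55 = 5 * 11."
print(extract_answer(demo_output))
```

Keeping the extraction in a named function also avoids nesting double quotes inside an f-string, which is only accepted from Python 3.12 onwards.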

In [13]:
result = generate_response(crypto_dataset[24]['question'],base_model)
print(f"Input:\n{crypto_dataset[24]}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}")

Input:
{'question': 'What is the sum of the first 10 natural numbers?', 'answer': 'The sum of the first 10 natural numbers is 1+2+3+4+5+6+7+8+9+10 = 55. It can be also computed as n(n+1)/2 where n is the number of natural numbers.', 'type': 'orig', 'category': 'math', 'topic': 'numbertheory', 'source': 'self'}

Generated Response 
 
  Great! Let's solve this problem together. 🤔

To find the sum of the first 10 natural numbers, we can simply add them up:

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55

So, the sum of the first 10 natural numbers is 55. 💡

How about you try to solve a more challenging math problem? I'm here to help you every step of the way! 😊


In [14]:
result = generate_response(crypto_dataset[26]['question'],base_model)
print(f"Input:\n{crypto_dataset[26]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}")
print(f"Ground Truth \n {crypto_dataset[26]['answer']}")

Input:
Determine if the expression \(3 \cdot 3262 + 2\) is divisible by 3.

Generated Response 
 
  Great, let's dive into the problem! To determine if the expression is divisible by 3, we can perform the following steps:

1. Simplify the expression:

$3 \cdot 3262 + 2 = 9526 + 2 = 9528$

So, the simplified expression is $9528$.

2. Is $9528$ divisible by 3?

Yes, $9528$ is divisible by 3, because $3 \times 3146 = 9528$.

Therefore, the original expression $3 \cdot 3262 + 2$ is divisible by 3.
Ground Truth 
 The expression \(3 \cdot 3262 + 2\) is not divisible by 3 as 3 \cdot 3262 = 9786 is divisible by 3 but 2 is not divisible by 3.
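
The ground-truth reasoning above can be checked directly with modular arithmetic. The snippet below is a minimal sketch and not part of the original evaluation; `divisible_by` is a hypothetical helper name.

```python
def divisible_by(value: int, divisor: int) -> bool:
    """True when divisor divides value with no remainder."""
    return value % divisor == 0


product = 3 * 3262          # a multiple of 3 by construction
expression = product + 2    # adding 2 leaves a remainder of 2 modulo 3

print(product, expression, expression % 3)
print(divisible_by(expression, 3))
```

The product is 9786 and the full expression is 9788, which leaves remainder 2 when divided by 3, so the intermediate values in the generated response (9526 and 9528) are both incorrect.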


In [15]:
result = generate_response(crypto_dataset[30]['question'],base_model)
print(f"Input:\n{crypto_dataset[30]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[30]['answer']}")

Input:
Is 0.333333..... a rational number?

Generated Response 
 
  Great, let's dive into this problem! 🤔

An easy way to determine if a number is rational is to check if it can be expressed as the ratio of two integers. 📝

For 0.333333..... can we express it as the ratio of two integers? 🤔

Hmm, let's see... 0.333333..... can be written as the ratio of 100/33. 📝

Yes, we can express 0.333333..... as the ratio of two integers! 😊

So, 0.333333..... is indeed a rational number! 🎉

Great, that was a straightforward proof! 💡

Do you have any other questions or problems you'd like me to help with? 🤔

Ground Truth 
 Yes, 0.333333..... is a rational number as it can be expressed as 1/3.


In [16]:
result = generate_response(crypto_dataset[32]['question'],base_model)
print(f"Input:\n{crypto_dataset[32]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[32]['answer']}")

Input:
Is 3493 a prime ?

Generated Response 
 
 🔍 Great! Let's check if 3493 is a prime number.

To determine whether a number is prime, we can use the following steps:

Step 1: Check if the number is divisible by 2.

If the number is divisible by 2, then it is not prime. So, let's check if 3493 is divisible by 2:

3493 / 2 = 1746

Since 3493 is not divisible by 2, it is prime! 🎉

Therefore, the answer to the question "Is 3493 a prime?" is yes. 😊

Ground Truth 
 No, 3493 is not a prime number as 3493 can be divided by 1, 7, 499, and 3493.
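
A quick way to confirm the ground truth is trial division, which the base model stopped after testing only the divisor 2. The following is a minimal sketch; `prime_factors` is a hypothetical helper, not something defined in the notebook.

```python
def prime_factors(n: int) -> list[int]:
    """Factor n by trial division, returning prime factors in ascending order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


print(prime_factors(3493))  # more than one factor means the number is composite
```

A number is prime exactly when this list contains only the number itself, so 3493 = 7 × 499 is composite, matching the ground truth.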


In [17]:
result = generate_response(crypto_dataset[41]['question'],base_model)
print(f"Input:\n{crypto_dataset[41]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[41]['answer']}")

Input:
What is the number of arrangements in a team of 10 , if there are 4 players and 6 substitutes?

Generated Response 
 
  Great, let's dive into this problem! 

To find the number of arrangements of a team of 10 (4 players + 6 substitutes), we can use the multiplication principle. We'll first consider the number of ways to select the 4 players, and then multiply that by the number of ways to select the substitutes.

Let's calculate the number of ways to select the 4 players:

* There are 10 positions to fill (the 4 players + the 6 substitutes).
* Each position can be filled by one of the 10 players (there are 10 players in total).

So, the number of ways to select the 4 players is:

10! = 10 × 9 × 8 × 7 = 70,000

Now, let's multiply the number of ways to select the 4 players by the number of ways to select the substitutes:

70,000 × 6 = 420,000

Therefore, there are 420,000 possible arrangements of a team of 10, including 4 players and 6 substitutes.

Ground Truth 
 The number of 

In [18]:
result = generate_response(crypto_dataset[39]['question'],base_model)
print(f"Input:\n{crypto_dataset[39]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[39]['answer']}")

Input:
A die is rolled twice. Let A be the event rolling a 6 on the first roll and let B be the event rolling a 3 on the second roll. Are A and B independent events?

Generated Response 
 
  Great, let's tackle this problem! 🤔

To determine if A and B are independent events, we need to compute the probability of both events occurring together.

Event A occurs when the first die rolls a 6. The probability of this event is 1/6.
Event B occurs when the second die rolls a 3. The probability of this event is also 1/6.

Now, let's compute the probability of both events occurring together:

P(A ∩ B) = P(A)P(B)
= (1/6) × (1/6) = 1/36

As we can see, the probability of both events occurring together is 1/36, which means that A and B are NOT independent events. 💡

Great, that's it! Do you have any other questions or problems you'd like me to help you with? 😊

Ground Truth 
 Yes, A and B are independent events as the outcome of the first roll does not affect the outcome of the second roll.
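
Independence means P(A ∩ B) = P(A)·P(B). The base model computed exactly this product and then drew the opposite conclusion. The sketch below enumerates the 36 equally likely outcomes of two rolls to check the definition; the helper name `probability` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first roll, second roll) outcomes of a fair die
outcomes = list(product(range(1, 7), repeat=2))


def probability(event) -> Fraction:
    """Exact probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


p_a = probability(lambda o: o[0] == 6)                 # 6 on the first roll
p_b = probability(lambda o: o[1] == 3)                 # 3 on the second roll
p_ab = probability(lambda o: o[0] == 6 and o[1] == 3)  # both together

print(p_a, p_b, p_ab, p_ab == p_a * p_b)
```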


### Test on the fine-tuned llama model

In [19]:
# Load the finetuned model from output_dir
ft_model = AutoPeftModelForCausalLM.from_pretrained(
    output_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
    cache_dir="/projects/barman/cache",
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.73s/it]


In [20]:
# Set the pipeline for inference
pipe_ft = pipeline(
    "text-generation", 
    model=ft_model, 
    tokenizer = tokenizer, 
    torch_dtype=torch.bfloat16, 
    device_map= {"": 0}
)
# Generate a response for one of the math prompts (the result is stored, not printed)
result_ft = pipe_ft(f""" <s>[INST] <<SYS>>\n{DEFAULT_SYSTEM_PROMPT}\n<</SYS>>\n\n{prompts["math"][5]}[/INST]\n """ ,do_sample=True,
    max_new_tokens=512, 
    temperature=0.9, 
    top_k=50, 
    top_p=0.95,
    num_return_sequences=1)




In [40]:
result = generate_response(crypto_dataset[24]['question'],ft_model)
print(f"Input:\n{crypto_dataset[24]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[24]['answer']}")

Input:
What is the sum of the first 10 natural numbers?

Generated Response 
 
  The sum of the first 10 natural numbers is 50.

To calculate this sum, you can simply list the first 10 natural numbers and add them together: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 50.

Ground Truth 
 The sum of the first 10 natural numbers is 1+2+3+4+5+6+7+8+9+10 = 55. It can be also computed as n(n+1)/2 where n is the number of natural numbers.
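
The closed form n(n+1)/2 quoted in the ground truth can be compared against a direct summation. This is a small sketch; `sum_first_n` is a hypothetical helper name.

```python
def sum_first_n(n: int) -> int:
    """Sum of 1..n using the closed form n(n+1)/2."""
    return n * (n + 1) // 2


direct = sum(range(1, 11))   # 1 + 2 + ... + 10
closed_form = sum_first_n(10)
print(direct, closed_form)
```

Both give 55. The fine-tuned response lists only the terms 1 through 9, which sum to 45, and then reports 50.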


In [41]:
result = generate_response(crypto_dataset[26]['question'],ft_model)
print(f"Input:\n{crypto_dataset[26]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[26]['answer']}")

Input:
Determine if the expression \(3 \cdot 3262 + 2\) is divisible by 3.

Generated Response 
 
  To determine if the expression \(3 \cdot 3262 + 2\) is divisible by 3, we can use the following steps:

1. Convert the expression to its decimal value: \(3 \cdot 3262 + 2 = 9998 + 2 = 10000\).
2. Check if the decimal value is divisible by 3: 10000 / 3 = 3333, which is divisible by 3.
3. Therefore, the original expression is also divisible by 3.

So, the final answer is yes, the expression \(3 \cdot 3262 + 2\) is divisible by 3.

Ground Truth 
 The expression \(3 \cdot 3262 + 2\) is not divisible by 3 as 3 \cdot 3262 = 9786 is divisible by 3 but 2 is not divisible by 3.


In [42]:
result = generate_response(crypto_dataset[30]['question'],ft_model)
print(f"Input:\n{crypto_dataset[30]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[30]['answer']}")

Input:
Is 0.333333..... a rational number?

Generated Response 
 
  Yes, 0.333333..... is a rational number. A rational number is an number that can be expressed as the ratio of two integers, either of which can be positive, negative, or zero. In other words, a rational number is a number that can be written in the form a/b, where a and b are integers and b is non-zero. 0.333333..... can be expressed as the ratio of 1/3, which is a rational number. Therefore, 0.333333..... is also a rational number.

Ground Truth 
 Yes, 0.333333..... is a rational number as it can be expressed as 1/3.


In [43]:
result = generate_response(crypto_dataset[32]['question'],ft_model)
print(f"Input:\n{crypto_dataset[32]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[32]['answer']}")

Input:
Is 3493 a prime ?

Generated Response 
 
 3493 is not a prime number. It can be verified by checking if it has any factors other than 1 and itself. In this case, 3493 has several factors, including 3, 11, 29, 33, and 349. Therefore, 3493 is a composite number and not a prime.

Ground Truth 
 No, 3493 is not a prime number as 3493 can be divided by 1, 7, 499, and 3493.


In [44]:
result = generate_response(crypto_dataset[41]['question'],ft_model)
print(f"Input:\n{crypto_dataset[41]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[41]['answer']}")

Input:
What is the number of arrangements in a team of 10 , if there are 4 players and 6 substitutes?

Generated Response 
 
  We first need to understand what a "team" is in this context. Based on the information given, a team consists of 4 players and 6 substitutes. Players are the 4 individuals who start the game and substitutes are the 6 individuals who can replace the players at any time.

Now, let's consider the number of ways to arrange the players and substitutes in a team. There are 4! = 24 ways to arrange the 4 players in a lineup. Each player can play one of the 4 positions, so the number of ways to arrange the players is the same as the number of ways to arrange the positions.

Next, we need to consider the substitutes. There are 6 substitutes, and they can also be arranged in 6! = 72 ways. However, not all of these arrangements are valid, as some may result in duplicate numbers (e.g., two players with the same number).

To find the total number of arrangements, we need to 

In [26]:
result = generate_response(crypto_dataset[39]['question'],ft_model)
print(f"Input:\n{crypto_dataset[39]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[39]['answer']}")

Input:
A die is rolled twice. Let A be the event rolling a 6 on the first roll and let B be the event rolling a 3 on the second roll. Are A and B independent events?

Generated Response 
 
  Yes, A and B are independent events. Since the rolls of the two dice are independent of each other, the probability of event A occurring on the first roll has no bearing on the probability of event B occurring on the second roll, and vice versa. Therefore, A and B are independent events.

Ground Truth 
 Yes, A and B are independent events as the outcome of the first roll does not affect the outcome of the second roll.


## Testing Cryptographic Conceptual capabilities

### Testing the base model

In [45]:
SYSTEM_PROMPT = "You are a cryptographic expert with very strong conceptual understanding of the domain. You can easily solve medium to higher difficulty cryptographic problems as well. Please answer clearly and precisely as if you are teaching graduate students."

In [59]:
def generate_response(prompt,test_model):
    # Ignore warnings
    logging.set_verbosity(logging.CRITICAL)

    # Set the pipeline for inference
    pipe = pipeline(
        "text-generation", 
        model=test_model, 
        tokenizer = tokenizer, 
        torch_dtype=torch.bfloat16, 
        device_map= {"": 0}
    )
    # Generate with the Llama-2 chat prompt format and return the raw pipeline output
    result = pipe(f"<s>[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n{prompt}[/INST]\n ",do_sample=True,
        max_new_tokens=512, 
        temperature=0.9, 
        top_k=50, 
        top_p=0.95,
        num_return_sequences=1)
    return result

In [47]:
result = generate_response(crypto_dataset[2]['question'],base_model)
print(f"Input:\n{crypto_dataset[2]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[2]['answer']}")

Input:
Differentiate between confusion and diffusion in cryptography.

Generated Response 
 
  Confusion and diffusion are two important concepts in cryptography, both of which are used to achieve security in cryptographic systems. While they are related, they serve distinct purposes and have different characteristics. Understanding the difference between them is crucial for designing and analyzing cryptographic protocols.

Confusion refers to the process of transforming plaintext into ciphertext in a way that makes it difficult for an attacker to determine the original message without knowledge of the cryptographic key. In other words, confusion mixes up the plaintext in a way that makes it hard to decipher without the key. This is often achieved through encryption techniques such as substitution or transposition, where individual plaintext letters or bits are replaced by corresponding ciphertext letters or bits.

On the other hand, diffusion refers to the process of spreading or scat

In [48]:
result = generate_response(crypto_dataset[8]['question'],base_model)
print(f"Input:\n{crypto_dataset[8]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[8]['answer']}")

Input:
What are hash functions useful for? Give an example.

Generated Response 
 
  Great, I'm glad to hear that you consider me a cryptographic expert! As an expert, I can tell you that hash functions are extremely useful in various applications in the field of computer science and cryptography. Let me give you a simple example to illustrate their utility.

One of the most common use cases for hash functions is in password storage and authentication. When you create an account on a website or use a password manager, your password is usually hashed before being stored. The hashed password is then stored securely, along with a reference to the original password (known as the "salt"). When you log in to the account, you provide your password, and the system hashes it again using the same salt. If the resulting hash matches the stored hash, you are authenticated!

Here's how hash functions are useful in this scenario:

1. **Data integrity**: Hash functions are designed to produce a fixed

In [49]:
result = generate_response(crypto_dataset[10]['question'],base_model)
print(f"Input:\n{crypto_dataset[10]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[10]['answer']}")

Input:
What is one way function in cryptography?

Generated Response 
 
  Great, I'm glad you asked! In cryptography, a one-way function is a mathematical function that transforms input data in a specific way, making it impossible to recover the original input data from the output, even with unlimited computing power and knowledge of the function.

The key property of a one-way function is that it is computationally infeasible to invert the function to recover the original input data. In other words, given the output of the function, it is not possible to compute the original input data, without knowing the specific function used. This makes one-way functions incredibly useful in cryptography, where the goal is to protect sensitive information from unauthorized access.

One-way functions are commonly used in cryptographic primitives such as:

1. Hash functions: These are used to map input data of any size to a fixed-size output, which is typically a digital fingerprint of the input dat

In [50]:
result = generate_response(crypto_dataset[15]['question'],base_model)
print(f"Input:\n{crypto_dataset[15]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[15]['answer']}")

Input:
What are the drawbacks of symmetric encryption?

Generated Response 
 
  As a cryptographic expert, I can explain the drawbacks of symmetric encryption, which is a common type of encryption used to protect the confidentiality of data. While symmetric encryption offers many advantages, including high performance, efficiency, and ease of implementation, it also has some drawbacks that must be considered:

1. Key management: One of the main challenges of symmetric encryption is managing the keys used for encryption and decryption. In a symmetric encryption scheme, the same key is used for both encryption and decryption. This means that the key must be securely distributed, stored, and managed for all parties involved in the communication. If the key is compromised or improperly managed, the entire encryption scheme is vulnerable to attack.
2. Key size: Another drawback of symmetric encryption is the key size required to ensure security. As the key size increases, the encryption bec

### Testing the fine-tuned model

In [51]:
result = generate_response(crypto_dataset[2]['question'],ft_model)
print(f"Input:\n{crypto_dataset[2]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[2]['answer']}")

Input:
Differentiate between confusion and diffusion in cryptography.

Generated Response 
 
  Confusion and diffusion are two techniques used in cryptography to enhance the security of a system. While they share some similarities, they serve distinct purposes and offer different benefits.

Confusion refers to the process of making it difficult for an attacker to determine the original message or the intended message. This is achieved by breaking apart the message into smaller pieces, called fragments, and encrypting each fragment with a different key. The fragments are then reassembled into the original message at the recipient's end, using the same key. This technique is used in ciphering, where the goal is to protect the confidentiality of the message.

Diffusion, on the other hand, is focused on making it difficult for an attacker to determine the source or the origin of the message. This technique spreads the message across multiple channels or paths, making it challenging for the

In [52]:
result = generate_response(crypto_dataset[8]['question'],ft_model)
print(f"Input:\n{crypto_dataset[8]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[8]['answer']}")

Input:
What are hash functions useful for? Give an example.

Generated Response 
 
  Hash functions are useful for:

1. Data Integrity: Hash functions help ensure data integrity by producing a unique output for a given input. This property allows for easy verification of data authenticity.
2. Data Security: By making it computationally infeasible to reverse engineer an original message from its hash, hash functions provide a secondary layer of security against data breaches or cyber attacks.
3. Searching and Matching: Hash functions can be used to efficiently search for and match relevant data within a large dataset, thanks to their fast computation speed and ability to handle large inputs.
4. Digital Fingerprinting: Hash functions can be used to create a digital fingerprint of a software or hardware component, allowing for easy identification of tampered or modified versions of that component.

Example: SHA-256 is widely used in various applications, including:

1. Secure File Storage

In [53]:
result = generate_response(crypto_dataset[15]['question'],ft_model)
print(f"Input:\n{crypto_dataset[15]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[15]['answer']}")

Input:
What are the drawbacks of symmetric encryption?

Generated Response 
 
 1. Dependence on the secrecy of the key: If the key is compromised, the entire system is vulnerable. This can occur intentionally or unintentionally.
2. Limited control over key usage: Once a key is created, it is difficult to regain control over it. If the key is leaked, it can be used by unauthorized parties.
3. Lack of flexibility: Symmetric encryption requires that all parties sharing the same key. If a new party is added to the system, they must also use the same key.
4. Difficulty in managing keys: As the number of parties involved in the communication increases, the number of keys also increases. This can make it difficult to manage and keep track of all the keys.
5. Noise sensitivity: The communication system is sensitive to noise, which can affect the accuracy of the decoding process. If the noise level is high, the system may not function properly.
6. Limited security: Symmetric encryption provides

In [54]:
result = generate_response(crypto_dataset[10]['question'],ft_model)
print(f"Input:\n{crypto_dataset[10]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[10]['answer']}")

Input:
What is one way function in cryptography?

Generated Response 
 
  A one-way function is a mathematical function that is easy to go from the input to the output but hard to reverse engineer and find the input from the output. The term "one-way" means that the function can be used in only one direction; we can easily push data into the function, but pulling data out is much harder.

The most common example of a one-way function is the absolute value. Taking someone's age, for instance, and passing it through the absolute value function results in a numerical output that is difficult to reverse engineer and find the original age.

Another example is the modular exponential function, which is used in RSA. This function takes two arguments, the base and the modulus, and returns the result of raising the base to the power of the modulus. The function is one-way because it is easy to compute the result of raising a number to a power, but it is hard to determine the original number tha

## Cipher Math problems

In [55]:
result = generate_response(crypto_dataset[45]['question'],base_model)
print(f"Input:\n{crypto_dataset[45]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[45]['answer']}")

Input:
What transformation occurs when “CHEESE” is shifted by B in the Caesar cipher?

Generated Response 
 
  Great, let's dive into the Caesar cipher!

The Caesar cipher is a simple substitution cipher where each letter is shifted by a fixed number of positions down the alphabet. When we apply the Caesar cipher to the word "CHEESE," we need to shift each letter by a fixed amount.

So, what happens when we shift "CHEESE" by B positions?

Well, the letters of "CHEESE" are:

C - H - E - E - S

If we shift each letter by B positions, we get:

C - H - E - E - S

So, the transformed word is "CHEESE" itself!

In other words, when we shift "CHEESE" by B positions, we don't change the word at all. The transformation is Identity, which means that the original word remains unchanged.

I hope that helps! Do you have any questions or would you like to practice solving more ciphers?

Ground Truth 
 When “CHEESE” is shifted by B in the Caesar cipher, the transformation is “EJGGUG”.


In [56]:
result = generate_response(crypto_dataset[45]['question'],ft_model)
print(f"Input:\n{crypto_dataset[45]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[45]['answer']}")

Input:
What transformation occurs when “CHEESE” is shifted by B in the Caesar cipher?

Generated Response 
 
  When "CHEESE" is shifted by B in the Caesar cipher, it becomes "TFZEICSB".

Ground Truth 
 When “CHEESE” is shifted by B in the Caesar cipher, the transformation is “EJGGUG”.
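
For reference, a Caesar shift can be sketched in a few lines. The function name `caesar_shift` is hypothetical. The mapping from the key letter B to a numeric shift is also an assumption: the ground truth "EJGGUG" corresponds to a shift of 2 (A=1, B=2), whereas the A=0, B=1 convention would give a shift of 1.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions, wrapping around the alphabet."""
    shifted = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shifted.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            shifted.append(ch)  # leave non-letters untouched
    return "".join(shifted)


print(caesar_shift("CHEESE", 2))  # key letter B read as A=1, B=2
print(caesar_shift("CHEESE", 1))  # key letter B read as A=0, B=1
```

Neither model output matches either convention: the base model returns the word unchanged and the fine-tuned model returns a string of the wrong length.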


In [58]:
result = generate_response(crypto_dataset[49]['question'],ft_model)
print(f"Input:\n{crypto_dataset[49]['question']}\n")
print(f"Generated Response \n {result[0]['generated_text'].split("[/INST]")[-1]}\n")
print(f"Ground Truth \n {crypto_dataset[49]['answer']}")

Input:
What is the process for decrypting the message 'yurgooaeod' using a rail fence with a depth of 2?

Generated Response 
 
 1. The message is division by 2, giving 'y rgooaeod'.
2. The message is then repeated twice, for a total of 6 characters.
3. The process is repeated, with each repetition adding another layer of encryption.
4. After 2 layers, the message is 'y rgooaeod rgooaeod'.
5. The process is repeated for a total of 20 layers, resulting in the final message of 'y rgooaeod rgooaeod rgooaeod...'.
6. The depth of the rail fence is 20, indicating that the message has been encrypted 20 times.
7. The resulting encrypted message is 'y rgooaeod rgooaeod rgooaeod...'.

Note that each layer of encryption increases the depth of the rail fence, making the message more difficult to decrypt. Therefore, it is important to choose the right amount of encryption and depth for your specific use case.

Ground Truth 
  The process for decrypting the message 'yurgooaeod' using a rail fence wi
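
Since the ground truth above is cut off, the following sketch shows how a rail fence decryption can be carried out. It assumes the standard zigzag rail fence cipher, in which the ciphertext is the concatenation of the rails read top to bottom; `rail_fence_decrypt` is a hypothetical helper and not part of the notebook.

```python
def rail_fence_decrypt(ciphertext: str, depth: int) -> str:
    """Invert a zigzag rail fence cipher with `depth` rails."""
    n = len(ciphertext)
    if depth < 2 or depth >= n:
        return ciphertext  # with one rail, or one letter per rail, nothing moves

    # Rail index visited by each plaintext position along the zigzag
    cycle = 2 * (depth - 1)
    rails = [min(i % cycle, cycle - i % cycle) for i in range(n)]

    # The ciphertext lists positions rail by rail; sorted() is stable,
    # so positions on the same rail stay in left-to-right order.
    order = sorted(range(n), key=lambda i: rails[i])

    plaintext = [""] * n
    for ch, position in zip(ciphertext, order):
        plaintext[position] = ch
    return "".join(plaintext)


print(rail_fence_decrypt("yurgooaeod", 2))
```

With a depth of 2 the first half of the ciphertext ("yurgo") is the top rail and the second half ("oaeod") is the bottom rail, so decryption simply interleaves the two halves.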