In [None]:
!pip install transformers accelerate einops datasets
!pip install --upgrade SQLAlchemy==1.4.46
!pip install alembic==1.4.1
!pip install numpy==1.23.4

In [7]:
import torch
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)


Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.94s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [8]:
def run_inference(raw_input):
    start_time = time.time()
    # A single, unpadded prompt, so the attention mask is omitted.
    inputs = tokenizer(raw_input, return_tensors="pt", return_attention_mask=False)
    # max_length counts prompt tokens plus generated tokens, so long answers are cut off at 500 in total.
    outputs = model.generate(**inputs, max_length=500)
    print(f"Generated in {time.time() - start_time: .2f} seconds")
    text = tokenizer.batch_decode(outputs)[0]
    # Keep only the text before the first <|endoftext|> token (the whole text if there is none).
    text = text.split('<|endoftext|>', 1)[0]
    return text
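`generate` returns the prompt tokens followed by the new ones, which is why every output below starts by repeating its prompt. The decode round trip can also change spacing: `i != j` in the first prompt comes back as `i!= j`. Because of this, stripping the prompt by string matching is fragile, and slicing token ids is more reliable. The helper below, `completion_ids`, is a hypothetical sketch that works on plain lists of token ids. It keeps only the ids after the prompt and stops before the first end-of-sequence id:

```python
from typing import List, Sequence


def completion_ids(output_ids: Sequence[int], prompt_len: int, eos_id: int) -> List[int]:
    """Return only newly generated token ids, stopping before the first EOS id (hypothetical helper)."""
    # The generated sequence begins with the prompt's own tokens; drop them.
    new_ids = list(output_ids[prompt_len:])
    # Truncate at the first end-of-sequence id, if the model emitted one.
    if eos_id in new_ids:
        new_ids = new_ids[: new_ids.index(eos_id)]
    return new_ids
```

Inside `run_inference`, this could be used as `tokenizer.decode(completion_ids(outputs[0].tolist(), inputs["input_ids"].shape[1], tokenizer.eos_token_id))`. That assumes `tokenizer.eos_token_id` is the id of `<|endoftext|>` for this tokenizer.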

In [9]:
raw_inputs = ''' 
Given an integer array nums, return all the triplets [nums[i], nums[j], nums[k]] such that i != j, i != k, and j != k, and nums[i] + nums[j] + nums[k] == 0.

Notice that the solution set must not contain duplicate triplets.
'''
print(run_inference(raw_inputs))

Generated in  19.42 seconds
 
Given an integer array nums, return all the triplets [nums[i], nums[j], nums[k]] such that i!= j, i!= k, and j!= k, and nums[i] + nums[j] + nums[k] == 0.

Notice that the solution set must not contain duplicate triplets.

Example 1:

Input: nums = [-1,0,1,2,-1,-4]
Output: [[-1,-1,2],[-1,0,1]]
Example 2:

Input: nums = []
Output: []
 

Constraints:

0 <= nums.length <= 3000
-10^4 <= nums[i] <= 10^4
"""

class Solution:
    def threeSum(self, nums: List[int]) -> List[List[int]]:
        nums.sort()
        res = []
        for i in range(len(nums)):
            if i > 0 and nums[i] == nums[i-1]:
                continue
            l, r = i+1, len(nums)-1
            while l < r:
                s = nums[i] + nums[l] + nums[r]
                if s < 0:
                    l += 1
                elif s > 0:
                    r -= 1
                else:
                    res.append([nums[i], nums[l], nums[r]])
                    while l < r and nums[l] =
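The generation above stops mid-statement because `max_length=500` caps prompt plus generated tokens. For reference, here is a completed version of the sort-plus-two-pointers approach the model was writing. This is a sketch, not the model's output: the duplicate-skipping loops after a match, the early `break`, and the rename from the `Solution.threeSum` method to a plain function `three_sum` are our own additions. It runs in O(n²) time after the O(n log n) sort.

```python
from typing import List


def three_sum(nums: List[int]) -> List[List[int]]:
    """All unique triplets summing to zero: sort, fix nums[i], then close in with two pointers."""
    nums = sorted(nums)
    res: List[List[int]] = []
    n = len(nums)
    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue  # same anchor value as before would repeat triplets
        if nums[i] > 0:
            break  # smallest remaining value is positive, so no zero sum is possible
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s < 0:
                l += 1  # sum too small: move the left pointer to a larger value
            elif s > 0:
                r -= 1  # sum too large: move the right pointer to a smaller value
            else:
                res.append([nums[i], nums[l], nums[r]])
                # Skip equal neighbours so the same triplet is not recorded twice.
                while l < r and nums[l] == nums[l + 1]:
                    l += 1
                while l < r and nums[r] == nums[r - 1]:
                    r -= 1
                l += 1
                r -= 1
    return res
```

On the model's own Example 1, `three_sum([-1, 0, 1, 2, -1, -4])` returns `[[-1, -1, 2], [-1, 0, 1]]`.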

In [14]:
raw_inputs = '''
Summarize the paper "Attention Is All You Need". 
'''
print(run_inference(raw_inputs))

Generated in  19.46 seconds

Summarize the paper "Attention Is All You Need". 
## INPUT

##OUTPUT
The paper "Attention Is All You Need" proposes a novel neural network architecture called Transformer, which uses self-attention mechanisms to encode and decode sequences of data. The paper shows that Transformer outperforms existing models on various natural language processing tasks, such as machine translation, text summarization, and question answering. The paper also introduces the concept of attention, which allows the model to focus on relevant parts of the input and output, and to learn from the context of the data. The paper demonstrates that attention can be implemented efficiently and effectively using a single layer of trainable parameters, without the need for recurrent or convolutional layers. The paper also provides empirical evidence and theoretical analysis to support the effectiveness of attention in Transformer.
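For reference, the core operation of the Transformer in "Attention Is All You Need" is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Below is a minimal single-head sketch in plain Python. It leaves out batching, masking, multiple heads, and the learned Q/K/V projections.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head, with no masking."""
    d_k = len(k[0])
    out: Matrix = []
    for q_row in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Each output row is the attention-weighted average of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out
```

When all keys are identical, the weights are uniform and the output is the plain mean of the value rows. When one key matches the query much more strongly, the output moves toward that key's value row.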



In [11]:
raw_inputs = '''
Instruct: Explain the joke below
Q: Why did Beethoven get rid of all of his chickens?
A: All they ever said was, “Bach, Bach, Bach!”.
Output:
'''
print(run_inference(raw_inputs))

Generated in  18.17 seconds

Instruct: Explain the joke below
Q: Why did Beethoven get rid of all of his chickens?
A: All they ever said was, “Bach, Bach, Bach!”.
Output:
The joke is a play on words. The expression “Bach, Bach, Bach” is a reference to the musical composition of Johann Sebastian Bach. The joke suggests that Beethoven was tired of his chickens constantly saying the same thing, implying that he wanted to get rid of them because they were too repetitive.
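Several cells wrap the request in an `Instruct: ... Output:` layout. A small hypothetical helper, `make_instruct_prompt`, builds the same string as the `raw_inputs` literals in these cells, including the leading and trailing newlines. This keeps the layout consistent across prompts:

```python
def make_instruct_prompt(instruction: str) -> str:
    """Wrap an instruction in the Instruct:/Output: layout used by the cells in this notebook (hypothetical helper)."""
    return f"\nInstruct: {instruction}\nOutput:\n"
```

For example, `run_inference(make_instruct_prompt("Write a detailed dialog between two physicists in Shakespearean english"))` would send the same text as the Shakespeare cell.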



In [12]:
raw_inputs = '''
Instruct: Explain the joke below
Q: What do the Eiffel Tower and wood ticks have in common?
A: They are both Paris sites/parasites!
Output:
'''
print(run_inference(raw_inputs))

Generated in  18.36 seconds

Instruct: Explain the joke below
Q: What do the Eiffel Tower and wood ticks have in common?
A: They are both Paris sites/parasites!
Output:
The joke is based on a pun on the phrase "Paris sites" which is a play on the phrase "Paris parasites". The joke is a humorous comparison between the Eiffel Tower and wood ticks, suggesting that they are both located in Paris and can be considered as parasites.



In [13]:
raw_inputs = '''
Instruct: Write a detailed dialog between two physicists in Shakespearean english
Output:
'''
print(run_inference(raw_inputs))

Generated in  19.35 seconds

Instruct: Write a detailed dialog between two physicists in Shakespearean english
Output:
Physicist 1: "Good morrow, my dear friend! I have been pondering the mysteries of the universe, and I seek your wisdom."
Physicist 2: "Ah, thou art a seeker of truth! Pray tell, what enigma has captured thy mind?"
Physicist 1: "I have been contemplating the nature of light, and its duality as both particle and wave. It is a perplexing concept indeed."
Physicist 2: "Ah, light, the very essence of illumination! It dances upon the stage of existence, revealing the secrets of the cosmos."
Physicist 1: "Indeed, but how can we reconcile its particle-like behavior with its wave-like properties? It defies logic!"
Physicist 2: "Ah, my friend, logic is but a mere tool in our quest for understanding. We must embrace the beauty of uncertainty and explore the depths of the unknown."
Physicist 1: "You speak wise words, my friend. Let us embark on this journey together, unraveling th

In [19]:
raw_inputs = '''
Instruct: Create a function in Python that calculates the square root of a number using the 'advancedmath' package
Output:
'''
print(run_inference(raw_inputs))


Generated in  18.45 seconds

Instruct: Create a function in Python that calculates the square root of a number using the 'advancedmath' package
Output:
```
import advancedmath

def calculate_square_root(number):
    return advancedmath.sqrt(number)

number = 25
result = calculate_square_root(number)
print(result)
```
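The generated code imports `advancedmath`, which none of the `pip install` lines at the top of this notebook install, so it will not run here as written. Whether a package by that name exists is not verified here. The same function can be written against the standard library's `math.sqrt`. The explicit negative-input check below is our own addition:

```python
import math


def calculate_square_root(number: float) -> float:
    """Square root via the standard library; rejects negative input explicitly."""
    if number < 0:
        raise ValueError("square root of a negative number is not a real number")
    return math.sqrt(number)


number = 25
result = calculate_square_root(number)
print(result)  # prints 5.0
```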



In [21]:
raw_inputs = '''
Summarize the paper "LoRA: Low-Rank Adaptation of Large Language Models" and explain the method in details. 
'''
print(run_inference(raw_inputs))

Generated in  18.45 seconds

Summarize the paper "LoRA: Low-Rank Adaptation of Large Language Models" and explain the method in details. 
A: The paper "LoRA: Low-Rank Adaptation of Large Language Models" proposes a method for adapting large language models (LLMs) to specific tasks by reducing their size and complexity. The method, called Low-Rank Adaptation (LoRA), involves projecting the input text onto a lower-dimensional space using a low-rank matrix factorization (LRMF) technique. The resulting low-rank representation is then used to train a new LLM on the reduced data, which is found to perform well on the target task. The paper evaluates the effectiveness of LoRA on several benchmark tasks, including text classification, question answering, and machine translation, and shows that it achieves state-of-the-art performance while significantly reducing the computational cost of training LLMs.
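For comparison with the generated summary, here is what the LoRA paper actually does. It keeps the pretrained weight W0 frozen and learns a low-rank update ΔW = B A, where B is d_out × r, A is r × d_in, and r is much smaller than either dimension. The adapted layer computes h = W0 x + (alpha / r) · B A x. A starts random, and B starts at zero, so training begins exactly at the pretrained model. After training, B A can be merged into W0, so inference has no extra cost. The pure-Python `LoRALinear` class below is a hypothetical sketch of one adapted layer. It has no training loop; in practice only `a` and `b` would receive gradients.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """h = W0 x + (alpha / r) * B (A x); W0 stays frozen, only A and B would be trained."""

    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # frozen pretrained weight, d_out x d_in
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in, random init
        self.b = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero init so B A starts at 0
        self.scale = alpha / r

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.w0, x)
        update = matvec(self.b, matvec(self.a, x))  # low-rank path: project down to r, then back up
        return [h + self.scale * u for h, u in zip(base, update)]

    def merged_weight(self) -> Matrix:
        """W0 + (alpha / r) * B A, which can replace W0 at inference time."""
        r = len(self.a)
        return [
            [self.w0[i][j] + self.scale * sum(self.b[i][k] * self.a[k][j] for k in range(r))
             for j in range(len(self.w0[0]))]
            for i in range(len(self.w0))
        ]
```

The savings come from the factorization. With d_in = d_out = 4096 and r = 8, for example, the update has 8 · (4096 + 4096) = 65,536 trainable parameters, compared with 16,777,216 in W0.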

