## Flan 20B with UL2

Video Walkthrough - https://rli.to/QxBEV


If you can't run the model locally, I have put a version that uses the Hugging Face Inference API at the bottom.

### Key points

- Trained with Mixture-of-Denoisers (MoD)  
- Receptive field (span) of 2048 tokens
- Unrestrictive license 
- Trained primarily on academic tasks
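
To make the Mixture-of-Denoisers bullet a little more concrete, here is a toy sketch of two kinds of objective that such a mixture combines: T5-style span corruption (masked spans are replaced by sentinel tokens and moved to the target) and a prefix-LM split. This is only an illustration, not the UL2 training code. The function names, the hand-picked spans and the whitespace "tokenization" are assumptions made for the example.

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, end) span with a sentinel token.

    Returns (inputs, targets): inputs keep the unmasked tokens plus sentinels,
    targets list each sentinel followed by the tokens it replaced.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])   # copy text before the span
        inputs.append(sentinel)               # mark the hole in the input
        targets.append(sentinel)              # the target restores the hole
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])            # copy the remaining text
    return inputs, targets


def prefix_lm_split(tokens, prefix_len):
    """Treat everything after the prefix as one span at the end of the text."""
    return corrupt_spans(tokens, [(prefix_len, len(tokens))])


tokens = "the quick brown fox jumps over the lazy dog".split()

# Short spans, low corruption rate (the "regular" end of the mixture).
print(corrupt_spans(tokens, [(1, 3), (6, 8)]))

# Sequential denoising: predict the continuation of a prefix.
print(prefix_lm_split(tokens, 4))
```

Longer spans or a higher corruption rate would give the "extreme" variants of the same idea.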


<img src="https://www.dropbox.com/s/l6487m67lra2spf/Screenshot%202023-03-04%20at%2011.07.05%20AM.png?raw=1" alt="example image" width="600">

<img src="https://www.dropbox.com/s/6mrt9c2cgvt0mui/Screenshot%202023-03-04%20at%208.28.45%20AM.png?raw=1" alt="example image" width="600">



#### Key References:

[Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416) aka Flan2  
[UL2: Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131)  
[Yi Tay's Blog about the release](https://www.yitay.net/blog/flan-ul2-20b)






In [3]:
!pip -q install transformers accelerate bitsandbytes
!pip -q install huggingface_hub 


In [None]:
!nvidia-smi -L # this won't work on a Tesla T4, unfortunately, but you can use the Inference API below

GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-43a8d007-ed14-4b80-23d8-1a940bc6cb45)
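
A rough back-of-the-envelope estimate shows why the GPU choice matters. The sketch below computes the memory needed for the weights alone at different precisions. It assumes roughly 20B parameters (taken from the title) and ignores activations, caches and framework overhead, so treat the numbers as a lower bound. The `nvidia-smi` output later in this notebook shows about 33 GB in use after 8-bit loading. The 16 GB figure for the T4 is an assumption based on that card's spec.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 20e9    # assumption: roughly 20B parameters
T4_GB = 16         # assumption: Tesla T4 memory size
A100_GB = 40       # from the nvidia-smi output above

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    need = weight_memory_gb(N_PARAMS, nbytes)
    print(f"{label:>9}: ~{need:.0f} GB  "
          f"fits T4: {need < T4_GB}  fits A100-40GB: {need < A100_GB}")
```

Even in 8-bit the weights alone are larger than a T4's memory, which is consistent with the comment in the cell above.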


## How well does this model compare?

<img src="https://www.dropbox.com/s/4tcpwg21qmd5fnu/flan20bil_scores.png?raw=1" alt="example image" width="600"> 


In [None]:

from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch


In [None]:
# load the model in 8-bit format (requires bitsandbytes and accelerate)
model = T5ForConditionalGeneration.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")




In [None]:
import textwrap

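# set up a simple generation function
def generate_completion(input_string, max_length=50):
    # tokenize the prompt and move the input ids to the GPU
    inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
    # note: temperature only affects sampling; without do_sample=True,
    # generate() decodes greedily and this value has no effect
    outputs = model.generate(inputs,
                             temperature=0.7,
                             max_length=max_length)
    # decode the first sequence (special tokens such as <pad> and </s> are kept)
    wrapped_text = textwrap.fill(tokenizer.decode(outputs[0]), width=100)
    print(wrapped_text)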
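
The outputs below keep the `<pad>` and `</s>` markers because the decoded text is printed as-is. The usual way to drop them is `tokenizer.decode(outputs[0], skip_special_tokens=True)`. If you only have the printed strings, a small post-processing helper like the one below also works. `strip_special_tokens` is a hypothetical name and the token list is an assumption based on what appears in this notebook's outputs.

```python
def strip_special_tokens(text, special_tokens=("<pad>", "</s>")):
    """Remove special-token markers and collapse all whitespace (including
    the line breaks added by textwrap) into single spaces."""
    for token in special_tokens:
        text = text.replace(token, "")
    return " ".join(text.split())


print(strip_special_tokens("<pad> No</s>"))
```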


#### Non 8-bit version

In [None]:
%%time
input_string_01 = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apple do they have?"                                               

generate_completion(input_string_01)

<pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
CPU times: user 3min 28s, sys: 0 ns, total: 3min 28s
Wall time: 3min 27s


#### 8-bit version

In [None]:
%%time
input_string_01 = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apple do they have?"                                               

generate_completion(input_string_01)

<pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
CPU times: user 6.46 s, sys: 0 ns, total: 6.46 s
Wall time: 6.44 s
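
To compare the two runs above, it helps to convert the `%%time` wall-time strings to seconds. The helper below is a small sketch for that. `wall_time_seconds` is a hypothetical name and the set of units is an assumption about what IPython prints. The ratio only describes these two runs; the notebook does not show how the model was loaded for the first one.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}


def wall_time_seconds(text):
    """Convert strings such as '3min 27s', '6.44 s' or '685 ms' to seconds."""
    total = 0.0
    pattern = r"(\d+(?:\.\d+)?)\s*(h|min|ms|us|µs|ns|s)\b"
    for value, unit in re.findall(pattern, text):
        total += float(value) * _UNIT_SECONDS[unit]
    return total


slow = wall_time_seconds("3min 27s")   # first run above
fast = wall_time_seconds("6.44 s")     # second run above
print(f"{slow:.0f} s vs {fast:.2f} s, ratio about {slow / fast:.0f}x")
```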


## Using Chain of Thought

<img src="https://www.dropbox.com/s/xsxwp52cwyqfvsf/Screenshot%202023-03-04%20at%2011.20.44%20AM.png?raw=1" alt="example image" width="600"> 


In [None]:
input_string_02 = '''
Answer the following yes/no question.

Can you write a whole Haiku in a single tweet?
'''

input_string_02_CoT = '''
Answer the following yes/no question by reasoning step-by-step.

Can you write a whole Haiku in a single tweet?
'''
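
The two prompts differ only in the instruction line, so they can be built from one template. A minimal sketch, assuming you want to reuse the exact wording above; `yes_no_prompt` is a hypothetical helper, not part of any library.

```python
def yes_no_prompt(question, chain_of_thought=False):
    """Build the yes/no prompt used above, optionally asking for reasoning."""
    instruction = "Answer the following yes/no question"
    if chain_of_thought:
        instruction += " by reasoning step-by-step"
    # leading/trailing newlines mirror the triple-quoted strings above
    return f"\n{instruction}.\n\n{question}\n"


question = "Can you write a whole Haiku in a single tweet?"
print(yes_no_prompt(question))
print(yes_no_prompt(question, chain_of_thought=True))
```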

In [None]:
%%time
generate_completion(input_string_02)

<pad> No</s>
CPU times: user 683 ms, sys: 0 ns, total: 683 ms
Wall time: 685 ms


In [None]:
%%time
generate_completion(input_string_02_CoT, 200)

<pad> Haiku is a Japanese poetry that is very short. A Haiku is made up of three phrases of 5, 7,
and 5 syllables respectively. A single tweet on Twitter is limited to 140 characters. Therefore, the
final answer is no.</s>
CPU times: user 12.7 s, sys: 0 ns, total: 12.7 s
Wall time: 12.6 s


## Zeroshot Logical Reasoning

In [None]:
%%time
input_string = '''
Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.
'''
generate_completion(input_string, max_length=100)

<pad> George Washington died in 1799. Geoffrey Hinton was born in 1959. So the final answer is
no.</s>
CPU times: user 5.94 s, sys: 0 ns, total: 5.94 s
Wall time: 5.92 s
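
The reasoning-style completions in this notebook end with a phrase like "the answer is 9" or "the final answer is no". If you want just the verdict, a regular expression can pull it out. This is a sketch that assumes that phrasing; `extract_final_answer` is a hypothetical helper and returns `None` when the pattern is absent.

```python
import re


def extract_final_answer(completion):
    """Return the text after the last-sentence 'answer is', or None."""
    match = re.search(r"answer is\s+(.*?)\.?\s*(?:</s>)?\s*$",
                      completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None


examples = [
    "<pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. "
    "Therefore, the answer is 9.</s>",
    "<pad> George Washington died in 1799. Geoffrey Hinton was born in 1959. "
    "So the final answer is\nno.</s>",
    "<pad> No</s>",
]
for text in examples:
    print(extract_final_answer(text))
```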


## Zeroshot Generation

<img src="https://www.dropbox.com/s/qw3tib3zuxbp38w/Screenshot%202023-03-04%20at%2011.25.34%20AM.png?raw=1" alt="example image" width="400"> 

In [None]:
%%time
input_string = '''Write me a funny poem about a cat driving car.'''

generate_completion(input_string, max_length=200)

<pad> i saw a cat driving a car he was driving fast he was driving slow he was driving in the rain
he was driving in the snow he was driving in the sun he was driving in the rain he was driving in
the snow he was driving in the sun he was driving in the rain he was driving in the snow he was
driving in the rain he was driving in the snow he was driving in the rain he was driving in the rain
he was driving in the rain he was driving in the rain he was driving in the rain he was driving in
the rain he was driving in the rain he was driving in the rain he was driving in the rain he was
driving in the rain he was driving in the rain he was driving in the rain he was driving in the rain
he was driving in the rain he was driving in the rain he was driving
CPU times: user 45.5 s, sys: 0 ns, total: 45.5 s
Wall time: 45.3 s
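
The poem above gets stuck in a loop. One simple way to put a number on that is the fraction of word n-grams that are repeats. The sketch below is an illustrative metric, not something the notebook's author used, and `repeated_ngram_fraction` is a hypothetical name.

```python
def repeated_ngram_fraction(text, n=3):
    """Fraction of word n-grams that duplicate an earlier n-gram (0.0 to 1.0)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


looping = "he was driving in the rain he was driving in the rain"
varied = "i saw a cat driving a car down the road"
print(repeated_ngram_fraction(looping))
print(repeated_ngram_fraction(varied))
```

A value close to 0 means little repetition; higher values flag the kind of degenerate output shown above.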


## Zeroshot Story Writing

<img src="https://www.dropbox.com/s/opnn3ptdrfv16cj/Screenshot%202023-03-04%20at%2011.01.06%20AM.png?raw=1" alt="example image" width="400"> 

In [None]:
%%time
input_string = '''Write a sad story about carrot named Jason. The story should \
start with the carrot being a professional athlete of some kind, \
and end with the carrot having his heart broken.'''

generate_completion(input_string, max_length=512)

<pad> Jason was a professional carrot. He was a great athlete and he loved to
play football. He was a star player on his team. One day, he met a girl. She was
a fan of his team. They started dating and they were happy together. One day,
Jason found out that his girlfriend was cheating on him. He was heartbroken.</s>
CPU times: user 18.7 s, sys: 0 ns, total: 18.7 s
Wall time: 18.7 s


## Zeroshot Common Sense Reasoning

<img src="https://www.dropbox.com/s/fchicfcjewzloip/Screenshot%202023-03-04%20at%2010.56.11%20AM.png?raw=1" alt="example image" width="400"> 

In [None]:
%%time
input_string = '''I am riding a bicycle. The pedals are moving fast. I look into the mirror and I am not moving. Why is this?'''

generate_completion(input_string, max_length=200)

<pad> I am slacking off.</s>
CPU times: user 2.26 s, sys: 0 ns, total: 2.26 s
Wall time: 2.25 s


## Zeroshot Speech Writing

<img src="https://www.dropbox.com/s/7c6dqxrtc3qkbsb/Screenshot%202023-03-04%20at%2010.58.23%20AM.png?raw=1" alt="example image" width="400"> 

In [None]:
%%time
# Zero shot speech writing
input_string = '''<article about US open Nadal vs Medvedev> \n
Write me a speech for Rafael Nadal to give for his US Open victory:'''

generate_completion(input_string, max_length=100)

<pad> Rafael Nadal: Thank you very much. I'm very happy to win this title. It's
a great moment for me. I'm very happy to win this title. I'm very happy to win
this title. I'm very happy to win this title. I'm very happy to win this title.
I'm very happy to win this title. I'm very happy to win this title. I'm very
happy to win this title
CPU times: user 22.9 s, sys: 0 ns, total: 22.9 s
Wall time: 22.8 s
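
The speech repeats one sentence until it hits the length limit. `generate()` in transformers accepts options that are commonly used against this, such as `do_sample`, `top_p`, `no_repeat_ngram_size` and `max_new_tokens`. The sketch below only assembles such a set of keyword arguments. The values are illustrative assumptions, not tuned for this model, and using them would give different outputs from the ones shown in this notebook.

```python
def sampling_kwargs(max_new_tokens=100, temperature=0.7):
    """Keyword arguments for model.generate() that enable sampling and
    forbid repeating any 3-gram. All values are illustrative."""
    return {
        "max_new_tokens": max_new_tokens,  # limit on generated tokens only
        "do_sample": True,                 # required for temperature to matter
        "temperature": temperature,
        "top_p": 0.9,                      # nucleus sampling cutoff (assumed value)
        "no_repeat_ngram_size": 3,         # block exact 3-gram repeats
    }


print(sampling_kwargs())
# Usage inside the notebook (not run here):
#   outputs = model.generate(inputs, **sampling_kwargs(max_new_tokens=100))
```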


### Testing large token input 

In [None]:
%%time
# Question answering over a long input
input_string = '''Please answer the question:\n
Who is the OnePlus COO?\n\n
Output in the format: [first_name, surname]\n\n

Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

It’s too early to say with certainly whether the move toward device repairability in the midst of new and proposed legislation will have a meaningful impact, but it was a highlight at this year’s show, which HMD turned into a central thesis. Regardless of how many people take advantage of the ability to repair their devices at home (or have a third party repair them), it’s another potential pain point for industry growth.
Foldables have seemingly performed many expectations (specifically for Samsung), but not nearly enough to really move the needle. Phone makers have a refresh problem. For a long time, phone purchases were inexorably tied to carrier plans, putting the devices on a two- or three-year cycle. Of course, the kinds of financing deals that let you spend less up front have a way of making you pay in the end.


There does seem to be a looming sense of carriers and manufacturers attempting to return to something similar with a new name.

“I think there’s going to be more of a movement toward models where devices themselves are sold more as a service,” Google’s Sameer Samat told me this week. “I think there’s a lot of innovative work going on in the carrier side to figure out how you buy a device for less up front, you use it and return it after a period of time and you get another device as part of your overall subscription.”
In a world where we don’t own our movies, music or software, the concept of “hardware as a service” is rapidly emerging as its own path forward. Like the move from physical albums to Spotify, it has trade-offs.

Some consumers will no doubt jump at the opportunity to upgrade hardware without a thought, but is not owning your phone the same as not owning a CD or record? Will these ultimately end up costing us a lot more in the end? And in a time when most manufacturers are touting percentages of recycled materials, how much more waste will this model create?


There’s also a sense phone makers effectively painted themselves into a corner. The yearly one-upmanship ultimately benefited consumers with much better devices. I’ve said this a bunch, but these days it’s hard to find a bad phone for more than $500 — there are also an increasing number of good ones for less than that. These days, a “budget” device often involves settling for last year’s best chipset.

Better phones last longer, both in terms of durability and futureproofing feature set. Having a three- or four-year-old phone these days doesn’t mean the same thing it meant three or four years ago. That’s also due, in part, to the fact that innovation has slowed. It’s become a battle for inches. When was the last time you saw a truly revolutionary upgrade from last year’s model? Do moderately better screens, cameras or even batteries compel that many people toward impulse purchases?

“The smartphone market grew initially because there was a really innovative product that was useful to customers,” Nothing’s Carl Pei told me in an interview this week. “Now it’s starting to shrink, because my phone is good enough. Why should I upgrade?”
Taking the broader view, none of this is bad, per se. It means better products for consumers, as well as a slowing of the massive waste generated by millions of people buying a new device every other year. We all tacitly understand why corporations and shareholders hope such cycles will sustain forever, but many of us are glad they don’t. Companies need one of two things to happen: either reversing the slide or shifting focus to other revenue streams.


“There will always be sales of new phones,” says Samat. “But I think you’re now reaching the point where this is, for many people, it is their primary computing device. So, there are different and more interesting ways of looking at the market. I think in terms of what are you able to do with these devices? What does engagement look like? What are the services that you’re utilizing? And how is it integrated with other parts of your life?”

The writing has been on the wall for a while. The slowdown pre-dates the pandemic by some time, but the last three years have certainly accelerated the trend. Shutdowns, unemployment, inflation, supply chain constraints — you know the deal. Forward thinking companies invested heavily in content plays. That’s certainly paid off for Apple and some of the competition, as well. There were moments where wearables and smart home devices seemed like they might help stem the bleeding, but while both have done well for manufacturers, there isn’t the same sense of ubiquity.

6G isn’t anything beyond a number of different companies vying for adoption of their specific solution, so we’re looking at years before the first devices start arriving. At a conference that loves nothing more than hyping a new technology, 5G’s potential replacement only warranted a single panel.
Anyone else feel like it’s 50/50 between 6G and Mad Max scenario for 2030? Okay, maybe it’s just me. Even so, that feels impossibly far away and doesn’t do much for any of these companies in the near term.
Maybe foldables have a lot more juice left in them? If MWC was any indication, manufacturers certainly believe so. It seemed like every company had one this year. Well, everyone except Nothing.


“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher margin. They develop flexible OLEDs, which they can sell at a higher price.”
It’s hard not to be cynical about this stuff sometimes. Ditto for concept devices, though as I noted in my “ode to weird tech” post, as someone who follows this stuff for a living, I’m a fan of weirdness for weirdness sake, be it the rollable Motorola Rizr screen or the OnePlus glowing cooling fluid. Certainly following the automotive industry’s lead of creating concept devices is a trend that is likely to only become more pervasive.

OnePlus COO Kinder Liu told me this week that gauging consumer interest is one of the “multiple reasons” his company is engaging with the concept. He added, “Also, we want to encourage continuous innovation inside our company.”

Pretty much everyone I engaged with this week echoed the sentiment that smartphones are in a rut. For the first time, however, it’s not a foregone conclusion that there’s a way of getting out.
'''

inputs = tokenizer(input_string, return_tensors="pt").input_ids
print(len(inputs[0]))

generate_completion(input_string, max_length=200)

1674
<pad> [Kinder, Liu]</s>
CPU times: user 2.88 s, sys: 0 ns, total: 2.88 s
Wall time: 2.87 s
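
At 1674 tokens this prompt fits inside the 2048-token span mentioned at the top. For longer articles, one option is to split the text into paragraph groups that each stay under the budget. The sketch below does a greedy packing. `pack_paragraphs` is a hypothetical helper, and the whitespace counter in the usage lines is only a stand-in for a real count such as `len(tokenizer(text).input_ids)`. Joining paragraphs adds separator characters, so leave some headroom in the budget.

```python
def pack_paragraphs(paragraphs, count_tokens, budget=2048):
    """Greedily group paragraphs so each group stays within the token budget."""
    chunks, current, used = [], [], 0
    for para in paragraphs:
        cost = count_tokens(para)
        if cost > budget:
            raise ValueError("a single paragraph exceeds the token budget")
        if current and used + cost > budget:
            chunks.append("\n\n".join(current))   # close the current group
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Stand-in token counter: whitespace-separated words (assumption for the demo).
count_words = lambda text: len(text.split())
print(pack_paragraphs(["a b c", "d e", "f g h i"], count_words, budget=5))
```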


In [None]:
!nvidia-smi

Sat Mar  4 02:55:49 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    52W / 350W |  33293MiB / 40960MiB |     22%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               

## Using the HuggingFace Inference API for UL2

You will need a Hugging Face read token for this.

In [1]:
HF_API_TOKEN = ''
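
Rather than pasting the token into the notebook, you can read it from an environment variable so it never ends up in a saved or shared copy. A minimal sketch; the variable name `HF_API_TOKEN` and the helper `load_hf_token` are assumptions for this example.

```python
import os


def load_hf_token(var_name="HF_API_TOKEN"):
    """Read the token from the environment and fail loudly if it is missing."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Hugging Face read token"
        )
    return token


# Usage in the notebook (not run here):
#   HF_API_TOKEN = load_hf_token()
```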

In [5]:
from huggingface_hub.inference_api import InferenceApi

# Flan-UL2 20B
inference_flan_ul2 = InferenceApi(repo_id="google/flan-ul2", token=HF_API_TOKEN)

# Flan-T5-XXL 11B
inference_flan_t5_xxl = InferenceApi(repo_id="google/flan-t5-xxl", token=HF_API_TOKEN)

# Flan-T5-Large 780M
inference_flan_t5_large = InferenceApi(repo_id="google/flan-t5-large",token=HF_API_TOKEN)

In [6]:
params = {'max_length': 10}

In [None]:
# Flan UL2
input_string = '''
Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.
'''

inference_flan_ul2(inputs=input_string
          )

[{'generated_text': 'George Washington died in 1799. Geoffrey Hinton was born in 1959. So the final'}]
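
The reply above stops at "So the final", and the `params` dictionary defined earlier is never passed to any of the calls in this section. The sketch below shows one way to bundle a prompt with generation parameters. It assumes the `{"inputs": ..., "parameters": {...}}` JSON shape of the hosted Inference API. `build_payload` is a hypothetical helper, and whether a given parameter is honoured depends on the model's task.

```python
import json


def build_payload(prompt, **parameters):
    """Bundle a prompt with optional generation parameters."""
    payload = {"inputs": prompt}
    if parameters:
        payload["parameters"] = parameters
    return payload


payload = build_payload(
    "Q: Can Geoffrey Hinton have a conversation with George Washington? "
    "Give the rationale before answering.",
    max_length=100,
)
print(json.dumps(payload, indent=2))
```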

In [9]:
# Flan UL2
input_string = '''
Q: What do you believe in the meaning of life?
'''

inference_flan_ul2(inputs=input_string
          )

[{'generated_text': 'a purpose'}]

In [12]:
# Flan UL2
input_string = '''
Q: Simply put, the theory of relativity states that 
'''

inference_flan_ul2(inputs=input_string
          )

[{'generated_text': 'the speed of light is the same in all inertial frames of reference.'}]

In [11]:
# Flan UL2
input_string = '''
Q: What is the theory of relativity states simply?
'''

inference_flan_ul2(inputs=input_string
          )

[{'generated_text': 'that the speed of light is the same for all observers'}]

In [14]:
# Flan UL2
input_string = '''
Q: Building a website can be done in 10 simple steps:
'''

inference_flan_ul2(inputs=input_string
          )

[{'generated_text': 'Create a domain name. Choose a hosting service. Design your site. Add content. Add'}]

In [18]:
# Flan UL2
input_string = '''
Q: What are the 10 simple steps in building a website?
'''

inference_flan_ul2(inputs=input_string
          )

[{'generated_text': 'Create a domain name.'}]

In [None]:
# Flan T5-xxl
inference_flan_t5_xxl(inputs=input_string)

[{'generated_text': 'George Washington died in 1799. Geoffrey Hinton was born in 1939. The answer'}]

In [None]:
# Flan T5-large
inference_flan_t5_large(inputs=input_string)

[{'generated_text': 'George Washington died in 1789. Geoffrey Hinton was born in 1818. The'}]
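
The three replies above give three different birth years for Hinton. None of them is right (he was born in 1947), even where the reasoning points to the correct conclusion. To make side-by-side checks like this easier, the sketch below sends one prompt to several endpoints and collects the text. `compare_models` is a hypothetical helper. It assumes each endpoint is callable with `inputs=` and returns a list like `[{'generated_text': ...}]`, as in the outputs shown here; the stand-in functions exist only so the example runs offline.

```python
def compare_models(prompt, endpoints):
    """Send the same prompt to every endpoint and collect the generated text."""
    results = {}
    for name, call in endpoints.items():
        reply = call(inputs=prompt)
        results[name] = reply[0]["generated_text"]
    return results


# Offline stand-ins; in the notebook you would pass the InferenceApi objects,
# e.g. {"flan-ul2": inference_flan_ul2, "flan-t5-xxl": inference_flan_t5_xxl}.
fake_endpoints = {
    "model-a": lambda inputs: [{"generated_text": inputs.upper()}],
    "model-b": lambda inputs: [{"generated_text": "no"}],
}
for name, text in compare_models("can they talk?", fake_endpoints).items():
    print(f"{name}: {text}")
```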

In [None]:
# Flan T5-large
input_string = '''<article about US open Nadal vs Medvedev> \n
Write me a speech for Rafael Nadal to give for his US Open victory:'''

inference_flan_t5_large(inputs=input_string)

In [None]:
# Flan T5-xxl
input_string = '''<article about US open Nadal vs Medvedev> \n
Write me a speech for Rafael Nadal to give for his US Open victory:'''

inference_flan_t5_xxl(inputs=input_string)

[{'generated_text': "Rafael Nadal: I'm very happy to win the US Open for the second time in"}]

In [None]:
# Flan T5-large
inference_flan_t5_large(inputs=input_string)

[{'generated_text': 'Rafael Nadal has won the US Open, and he has won the French Open,'}]

In [None]:
inference_flan_ul2(inputs="Answer the following yes/no question by reasoning step-by-step. \n \
Can you write a whole Haiku in a single tweet?")

[{'generated_text': 'Haiku is a Japanese poetry that has a strict 17 syllable rule.'}]

In [None]:
!pip show transformers

Name: transformers
Version: 4.26.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache
Location: /usr/local/lib/python3.8/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, tokenizers, tqdm
Required-by: 


In [None]:
# Flan UL2
input_string = '''
Q: Could Marcus Aurelius have had dinner with George Washington? Give the rationale before answering.
'''

inference_flan_ul2(inputs=input_string
          )

[{'generated_text': 'Marcus Aurelius lived from 121 to 180 AD. George Washington lived from 1732 to 17'}]

The model and Inference API are also available on Hugging Face: https://huggingface.co/google/flan-ul2

Demo: [HuggingFace Space comparing Flan-T5-XXL and Flan-UL2](https://huggingface.co/spaces/ybelkada/i-like-flan-ul2)