# Falcon LLM
Falcon LLM is a family of generative large language models (LLMs) developed by the Technology Innovation Institute (TII). The suite currently includes the Falcon 180B, 40B, 7.5B, and 1.3B parameter models, along with the high-quality RefinedWeb pretraining dataset.



## Falcon 40B

Falcon 40B was the world's top-ranked open-source AI model at launch. It has 40 billion parameters and was trained on one trillion tokens. For two months after its release, Falcon 40B ranked #1 on Hugging Face's Open LLM Leaderboard for open-source large language models. Released royalty-free with its weights publicly available, Falcon 40B helps make advanced AI more accessible and inclusive.

## Falcon 180B
Falcon 180B is a 180-billion-parameter language model trained on 3.5 trillion tokens. At the time of writing, it tops the Hugging Face Open LLM Leaderboard for pre-trained open large language models, and it is available for both research and commercial use.
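The parameter and token counts quoted for Falcon 40B and Falcon 180B can be turned into a rough "training tokens per parameter" ratio. The short sketch below uses only those quoted figures; the ratio is a back-of-the-envelope number, and the helper name `tokens_per_parameter` is purely illustrative.

```python
# Figures quoted in the text above: (parameters, training tokens)
FALCON_MODELS = {
    "Falcon 40B": (40e9, 1.0e12),
    "Falcon 180B": (180e9, 3.5e12),
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Number of training tokens seen per model parameter."""
    return tokens / params


for name, (params, tokens) in FALCON_MODELS.items():
    print(f"{name}: {tokens_per_parameter(params, tokens):.1f} tokens/parameter")
# Falcon 40B: 25.0 tokens/parameter
# Falcon 180B: 19.4 tokens/parameter
```

So Falcon 180B is 4.5x larger than Falcon 40B but was trained on 3.5x as many tokens, giving it a somewhat lower token-to-parameter ratio.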

**This model performs exceptionally well on a range of tasks, including reasoning, coding, proficiency, and knowledge tests, outperforming competitors such as Meta's Llama 2.**

**Among closed-source models, it ranks just behind OpenAI's GPT-4 and performs on par with Google's PaLM 2 Large, which powers Bard, despite being half that model's size.**



---



# Install all the required packages

In [1]:
!pip install --quiet torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 --upgrade

In [2]:
!pip install --quiet langchain einops accelerate transformers bitsandbytes
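Because the LangChain import paths used below changed between releases, it can help to record which versions the install cells actually pulled in. A minimal, standard-library-only sketch (the helper `installed_versions` is hypothetical, not part of any of these packages):

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions installed by the two pip cells in this notebook
REQUIRED = ["torch", "torchvision", "torchaudio", "langchain",
            "einops", "accelerate", "transformers", "bitsandbytes"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:>14}: {ver or 'NOT INSTALLED'}")
```

Printing this next to the notebook's outputs makes it easier to reproduce a run later, since none of the pip commands pin versions.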

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m12.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.2/42.2 kB[0m [31m4.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m251.2/251.2 kB[0m [31m13.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.6/7.6 MB[0m [31m32.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m92.6/92.6 MB[0m [31m11.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m268.8/268.8 kB[0m [31m17.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m73.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m33.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [3]:
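# Note: these top-level imports follow older (2023) LangChain releases; newer
# releases expose the classes from submodules such as langchain.prompts and langchain.chains
from langchain import HuggingFacePipeline
from langchain import PromptTemplate, LLMChain
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch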

# Download the falcon-7b-instruct model with transformers and build a text-generation pipeline to wrap with HuggingFacePipeline

In [4]:
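# Define the model ID on the Hugging Face Hub
model_id = "tiiuae/falcon-7b-instruct"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in bfloat16 (2 bytes per parameter). trust_remote_code=True runs the
# custom modelling code shipped in the model repo; device_map="auto" lets accelerate
# place layers on the available GPU(s)/CPU and offload any overflow to the "offload" folder
model = AutoModelForCausalLM.from_pretrained(model_id, cache_dir='/opt/workspace/',
    torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", offload_folder="offload")

# Put the PyTorch model in inference (eval) mode
model.eval()

# Build the Hugging Face Transformers text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    max_length=400,          # total length in tokens, prompt included
    do_sample=True,          # sample the next token instead of greedy decoding
    top_k=10,                # sample only from the 10 most likely next tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id  # stop once the end-of-sequence token is generated
)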

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b-instruct:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
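The model above is loaded with `torch_dtype=torch.bfloat16`, i.e. 2 bytes per parameter, which is why a model with a nominal 7B parameters needs roughly 14 GB just for its weights. That is roughly in line with the falcon-7b-instruct checkpoint, which is split into two shards of 9.95 GB and 4.48 GB. A rough sizing sketch, assuming nominal parameter counts taken from the model names and counting weights only (no activations, KV cache, or framework overhead); the int8 and 4-bit rows are what `bitsandbytes` quantization would roughly target, which this notebook does not use:

```python
# Approximate bytes per parameter for common weight formats (weights only)
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Rough size of the model weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal parameter counts from the model names (assumption, not exact counts)
NOMINAL_PARAMS = {"falcon-7b": 7e9, "falcon-40b": 40e9, "falcon-180b": 180e9}

for name, params in NOMINAL_PARAMS.items():
    sizes = ", ".join(f"{d}: {weight_memory_gb(params, d):.1f} GB" for d in BYTES_PER_PARAM)
    print(f"{name:>12} -> {sizes}")

# Compare the bfloat16 estimate with the two downloaded shards
print(f"estimate {weight_memory_gb(7e9):.1f} GB vs shards {9.95 + 4.48:.2f} GB")
```

When the estimate exceeds available GPU memory, `device_map="auto"` together with `offload_folder` lets accelerate spill layers to CPU RAM or disk, at the cost of slower generation.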


In [10]:
question = """Translate the below text from English to French

Generative AI models are enabling us to create innovative pathways to an exciting future of possibilities – where the only limits are of the imagination.

Our team of AI researchers, scientists, and engineers are collaborating to achieve innovative outcomes – from AI theory to AI technologies,
and are sharing their efforts with the wider AI community.

"""

# Test out the pipeline
pipeline(question)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


[{'generated_text': "Translate the below text from English to French\n\nGenerative AI models are enabling us to create innovative pathways to an exciting future of possibilities – where the only limits are of the imagination.\n\nOur team of AI researchers, scientists, and engineers are collaborating to achieve innovative outcomes – from AI theory to AI technologies, \nand are sharing their efforts with the wider AI community.\n\n<p>Translation: \nLes modèles de l'IA génériques permettent à nous de créer des voies innovantes pour un avenir excitant d'opportunités - où les seuls limites sont de l'imagination. \nNotre équipe de chercheurs en recherche, chercheurs et ingénieurs travaillent ensemble pour atteindre des objectifs innovants - de l'IA théorique à des technologies AI - et échangent leurs connaissances avec la communauté AI.</p>"}]
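The pipeline is configured with `do_sample=True` and `top_k=10`: at each step the next token is drawn at random from the 10 highest-scoring candidates instead of always taking the single best one. That randomness is a likely reason the two French translations in this notebook differ even though the prompt is the same. A pure-Python sketch of top-k sampling over one decoding step's logits (an illustration of the technique, not the transformers implementation; `top_k_sample` is a hypothetical helper):

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from the k highest-scoring logits (softmax over those k)."""
    # Keep only the indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the surviving logits (subtract the max for numerical stability)
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # random.choices normalizes the weights itself, so no explicit division is needed
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, 3.0]
rng = random.Random(0)
print([top_k_sample(logits, k=3, rng=rng) for _ in range(10)])  # only 4, 0, or 1 can appear
print(top_k_sample(logits, k=1, rng=rng))  # k=1 reduces to greedy decoding: always 4
```

Setting `do_sample=False` in the pipeline switches to greedy decoding, which generally gives the same output for the same prompt.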

## Using LangChain

In [11]:
# Set up a pass-through prompt template: '{input}' inserts the question unchanged
template = PromptTemplate(input_variables=['input'], template='{input}')
# Wrap the Hugging Face pipeline in LangChain's HuggingFacePipeline LLM class
llm = HuggingFacePipeline(pipeline=pipeline)
# Build an LLM chain: prompt formatting followed by the LLM call
chain = LLMChain(llm=llm, prompt=template)
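Conceptually, the chain above does two things: it formats the prompt template with the given variables, then passes the resulting string to the LLM. A dependency-free sketch of that two-step idea follows; `MiniPromptTemplate` and `MiniChain` are hypothetical names for illustration, not LangChain's API, and a stand-in function replaces the real model so the sketch runs anywhere:

```python
from typing import Callable, List


class MiniPromptTemplate:
    """Hypothetical stand-in for a prompt template: fills {placeholders} via str.format."""

    def __init__(self, template: str, input_variables: List[str]):
        self.template = template
        self.input_variables = input_variables

    def format(self, **kwargs) -> str:
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"missing prompt variables: {missing}")
        return self.template.format(**kwargs)


class MiniChain:
    """Hypothetical two-step chain: format the prompt, then call the LLM on it."""

    def __init__(self, llm: Callable[[str], str], prompt: MiniPromptTemplate):
        self.llm = llm
        self.prompt = prompt

    def run(self, **kwargs) -> str:
        return self.llm(self.prompt.format(**kwargs))


translate_prompt = MiniPromptTemplate(
    template="Translate the below text from English to French\n\n{input}\n",
    input_variables=["input"],
)
# Stand-in "LLM" that just reports what it received
demo_chain = MiniChain(llm=lambda p: f"received {len(p)} characters", prompt=translate_prompt)
print(demo_chain.run(input="Generative AI models are enabling us to create innovative pathways."))
```

With the real classes, moving the instruction into the template (for example `PromptTemplate(input_variables=['input'], template='Translate the below text from English to French\n\n{input}\n')`) would let `chain.run` receive only the English text, instead of the pass-through `'{input}'` template used here.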

In [12]:
question = """Translate the below text from English to French

Generative AI models are enabling us to create innovative pathways to an exciting future of possibilities – where the only limits are of the imagination.

Our team of AI researchers, scientists, and engineers are collaborating to achieve innovative outcomes – from AI theory to AI technologies,
and are sharing their efforts with the wider AI community.

"""

# Test LLMChain
response = chain.run(question)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


In [13]:
print(response)

<p>L'IA génératrice nous permet de mettre au point de nouvelles voies à de formidables possibilités d'avenir – où seules nos limites de l'imagination s'imposent.</p>

<p>Notre équipe de chercheurs en IA, scientifiques et ingénieurs travaillent ensemble pour atteindre des objectifs innovants et échanger leurs connaissances avec la communauté de l'IA.</p>
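Both the raw pipeline output and the LangChain response wrap the French text in `<p>` tags, and the raw pipeline output also repeats the prompt before the translation (LangChain's wrapper returned only the continuation here). A small post-processing sketch, assuming plain `<p>` tags are the only markup the model emits; `clean_generation` is a hypothetical helper, not part of transformers or LangChain:

```python
import html
import re


def clean_generation(generated: str, prompt: str = "") -> str:
    """Drop an echoed prompt prefix and simple <p> tags from model output."""
    # Text-generation pipelines may return the prompt followed by the continuation
    if prompt and generated.startswith(prompt):
        generated = generated[len(prompt):]
    # Replace <p> and </p> tags with line breaks, then unescape any HTML entities
    text = html.unescape(re.sub(r"</?p>", "\n", generated))
    # Drop the blank lines left behind and trim whitespace on each line
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


raw = "Translate this\n<p>Bonjour.</p>\n\n<p>Salut &amp; merci.</p>"
print(clean_generation(raw, prompt="Translate this\n"))
```

For the raw pipeline, the text-generation pipeline also accepts `return_full_text=False`, which omits the prompt from `generated_text` so only the tag cleanup remains.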
