In [1]:
def count_params(model):
    """Return the total number of parameters in `model`."""
    # Sum element counts directly; this avoids copying every weight into one
    # flat vector and does not require all parameters to share a device.
    return sum(p.numel() for p in model.parameters())

# OpenAI GPT
The OpenAI GPT model was proposed in [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf). It is a causal (unidirectional) transformer language model pre-trained on the Toronto Book Corpus, a large corpus with long-range dependencies.

[Write With Transformer](https://transformer.huggingface.co/doc/gpt) is a webapp created and hosted by Hugging Face showcasing the generative capabilities of several models. GPT is one of them. You can try its performance on the website.

This model was contributed by thomwolf. The original code can be found [here](https://github.com/openai/finetune-transformer-lm).

> If you want to reproduce the original tokenization process of the OpenAI GPT paper, you will need to install ftfy and SpaCy:
> ```
> pip install spacy ftfy==4.4.3
> python -m spacy download en
> ```
> 
> If you don’t install ftfy and SpaCy, the OpenAIGPTTokenizer falls back to BERT’s BasicTokenizer followed by Byte-Pair Encoding, which should be fine for most uses.

## OpenAIGPTModel
The bare OpenAI GPT transformer model outputs raw hidden states without any specific head on top.

In [1]:
from transformers import AutoTokenizer, OpenAIGPTModel
import torch

# Instantiate the tokenizer using the "openai-gpt" pre-trained model
tokenizer = AutoTokenizer.from_pretrained("openai-gpt")

# Tokenize the input sentence using the tokenizer
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Instantiate the OpenAI GPT model using the "openai-gpt" pre-trained weights
model = OpenAIGPTModel.from_pretrained("openai-gpt")

# Move the model and inputs to the GPU (assuming CUDA is available)
model, inputs = model.to('cuda:0'), inputs.to('cuda:0')

# Perform a forward pass through the model by passing the tokenized inputs
# The model will generate the outputs based on the input tokens
outputs = model(**inputs)

In [2]:
tokenizer

OpenAIGPTTokenizerFast(name_or_path='openai-gpt', vocab_size=40478, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '<unk>'}, clean_up_tokenization_spaces=True)

In [3]:
inputs

{'input_ids': tensor([[3570,  240,  547, 2585,  544, 4957]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]], device='cuda:0')}

In [4]:
model

OpenAIGPTModel(
  (tokens_embed): Embedding(40478, 768)
  (positions_embed): Embedding(512, 768)
  (drop): Dropout(p=0.1, inplace=False)
  (h): ModuleList(
    (0-11): 12 x Block(
      (attn): Attention(
        (c_attn): Conv1D()
        (c_proj): Conv1D()
        (attn_dropout): Dropout(p=0.1, inplace=False)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
      (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (mlp): MLP(
        (c_fc): Conv1D()
        (c_proj): Conv1D()
        (act): NewGELUActivation()
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    )
  )
)

In [5]:
print('Number of parameters in GPT:', count_params(model))

Number of parameters in GPT: 116534784
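The total reported above can be reproduced by hand from the printed module tree. The sketch below assumes each `Conv1D` holds a weight matrix plus a bias vector, that the fused `c_attn` projects to 3 × 768, and that the MLP expands to 4 × 768. Those shapes are not shown in the printout, so treat them as assumptions that happen to reproduce the count.

```python
# Parameter count for the bare GPT model, derived from the printed shapes.
# Assumption: every Conv1D stores an (n_in, n_out) weight and an (n_out,) bias.
vocab_size, n_positions, d_model, n_layers = 40478, 512, 768, 12

def dense(n_in, n_out):
    """Weights plus biases of one fully connected projection."""
    return n_in * n_out + n_out

embeddings = vocab_size * d_model + n_positions * d_model
attention = dense(d_model, 3 * d_model) + dense(d_model, d_model)  # c_attn + c_proj
mlp = dense(d_model, 4 * d_model) + dense(4 * d_model, d_model)    # c_fc + c_proj
layer_norms = 2 * (2 * d_model)                                    # ln_1 + ln_2 (scale and shift)
per_block = attention + mlp + layer_norms

total = embeddings + n_layers * per_block
print(f"{per_block:,} per block, {total:,} in total")
```

Roughly a quarter of the parameters sit in the token embedding table; the rest are spread evenly over the 12 blocks.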


In [6]:
outputs.last_hidden_state, outputs.last_hidden_state.shape

(tensor([[[ 0.4653,  0.0642,  0.5910,  ...,  0.1177, -0.0021, -1.2262],
          [-0.3697, -0.0957,  0.6613,  ..., -0.0344, -0.2164,  0.1205],
          [ 0.1700, -0.3252,  0.0407,  ...,  0.1589, -0.8057, -0.2830],
          [-0.3669, -0.0448,  0.8061,  ..., -0.0090, -0.0872, -0.5224],
          [-0.5047,  0.6522,  0.6932,  ...,  0.0811,  0.6475,  0.3190],
          [-0.2972,  0.0591,  1.2333,  ..., -0.7394, -0.2600,  0.0863]]],
        device='cuda:0', grad_fn=<ViewBackward0>),
 torch.Size([1, 6, 768]))

## OpenAIGPTLMHeadModel
OpenAI GPT Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
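Weight tying means the output projection reuses the token-embedding matrix instead of learning a second vocabulary-sized matrix. A minimal sketch with plain Python lists follows; the toy table and the helper names `embed` and `lm_head` are illustrative, not the library's implementation.

```python
# Toy embedding table: 4 vocabulary entries, hidden size 3.
embedding_table = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]

def embed(token_id):
    """Input side: look up the row for a token id."""
    return embedding_table[token_id]

def lm_head(hidden):
    """Output side: score every token against the same table (no bias)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding_table]

hidden = embed(2)          # pretend the transformer returned the embedding unchanged
logits = lm_head(hidden)   # one score per vocabulary entry
print(logits)
```

Because both directions share one matrix, a tied head adds no trainable parameters beyond the embedding table itself.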

In [7]:
from transformers import AutoTokenizer, OpenAIGPTLMHeadModel
import torch

# Instantiate the tokenizer using the "openai-gpt" pre-trained model
tokenizer = AutoTokenizer.from_pretrained("openai-gpt")

# Tokenize the input sentence using the tokenizer
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Instantiate the OpenAI GPT model using the "openai-gpt" pre-trained weights
model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

# Move the model and inputs to the GPU (assuming CUDA is available)
model, inputs = model.to('cuda:0'), inputs.to('cuda:0')

# Perform a forward pass through the model by passing the tokenized inputs
# The model will generate the outputs based on the input tokens
outputs = model(**inputs)

Some weights of OpenAIGPTLMHeadModel were not initialized from the model checkpoint at openai-gpt and are newly initialized: ['position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
model

OpenAIGPTLMHeadModel(
  (transformer): OpenAIGPTModel(
    (tokens_embed): Embedding(40478, 768)
    (positions_embed): Embedding(512, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x Block(
        (attn): Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (lm_head): Linear(in_features=768, out_features=40478, bias=False)
)

In [9]:
outputs.logits, outputs.logits.shape

(tensor([[[ -5.9486,  -5.8697, -18.4258,  ...,  -9.7371, -10.4495,   0.8814],
          [ -6.1212,  -4.8031, -14.3970,  ...,  -6.5411,  -9.5051,  -1.2015],
          [ -7.4231,  -6.3615, -14.7297,  ..., -10.4575,  -8.4600,  -1.5183],
          [ -5.6463,  -5.9526, -17.5195,  ...,  -9.4144, -15.7120,  -1.5394],
          [ -5.4751,  -5.8803, -13.7767,  ..., -10.5048, -12.4167,  -6.1584],
          [ -7.2052,  -6.0198, -21.5040,  ..., -16.2941, -14.0494,  -1.2416]]],
        device='cuda:0', grad_fn=<UnsafeViewBackward0>),
 torch.Size([1, 6, 40478]))

In [10]:
# Take the highest-scoring token id at each position (the model's guess for the next token)
token_ids = torch.argmax(outputs.logits, dim=-1)

# Decode the token ids using the tokenizer
tokenizer.batch_decode(token_ids)

[', " name. a.']
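Each position's logits score the token that should come next, so the argmax above yields one next-token guess per input position rather than a reconstruction of the input. A toy illustration follows, with a made-up five-entry vocabulary and hand-written logits (both hypothetical):

```python
import math

vocab = ["hello", ",", "my", "dog", "is"]   # hypothetical toy vocabulary

def softmax(scores):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One row of logits per input position; row i scores the token at position i + 1.
logits = [
    [0.1, 2.0, 0.3, 0.0, 0.2],   # after "hello" the best guess is ","
    [0.0, 0.1, 1.5, 0.2, 0.3],   # after ","     the best guess is "my"
    [0.2, 0.0, 0.1, 2.2, 0.4],   # after "my"    the best guess is "dog"
]

greedy_ids = [max(range(len(row)), key=row.__getitem__) for row in logits]
greedy_tokens = [vocab[i] for i in greedy_ids]
top_probs = [max(softmax(row)) for row in logits]
print(greedy_tokens)
```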

In [11]:
from transformers import pipeline

generator = pipeline('text-generation', model='openai-gpt')
generator("Hello, my dog is cute. ", max_length=30, num_return_sequences=5)

Some weights of OpenAIGPTLMHeadModel were not initialized from the model checkpoint at openai-gpt and are newly initialized: ['position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[{'generated_text': 'Hello, my dog is cute.  " \n " he\'s my new favorite toy. " \n " i\'m really sorry. " \n " you know'},
 {'generated_text': 'Hello, my dog is cute.  marketplace and all that. " \n " thank you. " \n she didn\'t miss how his eyes lit up as'},
 {'generated_text': 'Hello, my dog is cute.  is that cat with you? " \n " she\'s asleep. " he stepped back for her and she followed him'},
 {'generated_text': 'Hello, my dog is cute.  the three of us were going to meet up there early at a place called... " \n " the beach, "'},
 {'generated_text': 'Hello, my dog is cute.  she says hi when you say hello to her. i will do it for her. okay, here she comes.'}]

## GPT-2
GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, **it was trained to guess the next word in sentences.**
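The automatic labelling mentioned above amounts to shifting the token sequence by one position: the input is every token but the last, and the label at each position is the token that follows. A minimal sketch, using whitespace-split words as stand-in tokens (real models use subword ids):

```python
def make_causal_lm_pairs(tokens):
    """Return (inputs, labels) where labels[i] is the token following inputs[i]."""
    return tokens[:-1], tokens[1:]

tokens = "Hello , my dog is cute".split()
inputs, labels = make_causal_lm_pairs(tokens)

# Show the growing context and the token the model must predict from it.
for context_end, target in enumerate(labels, start=1):
    print(tokens[:context_end], "->", target)
```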

Test the whole generation capabilities [here](https://transformer.huggingface.co/doc/gpt2-large).

Here we use the **smallest** version of GPT-2, with 124M parameters and a 548MB checkpoint.

You can use the raw model for text generation or fine-tune it to a downstream task. See the [model hub](https://huggingface.co/models?other=gpt2) to look for fine-tuned versions on a task that interests you.

In [12]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2', device='cuda:0')
generator("Hello, my dog is cute.", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, my dog is cute. \xa0When I try to go back and have some fun with my little friend or friend friend, my dog becomes'},
 {'generated_text': "Hello, my dog is cute. \xa0He can't breathe properly and I'm trying to help him breathe. I love it because it keeps my"},
 {'generated_text': "Hello, my dog is cute. ㅇㅇㅇㅇㅇ I hate to know what you'd"},
 {'generated_text': 'Hello, my dog is cute. \xa0Do you want to go back to your garden or do you want her to show you any cute housemates'},
 {'generated_text': "Hello, my dog is cute. \xa0She has gone through all the surgeries and she's so loving. \xa0She's so excited. "}]

In [13]:
generator.model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [14]:
print('Number of parameters in GPT-2:', count_params(generator.model))

Number of parameters in GPT-2: 124439808


## GPT-2 Medium
GPT-2 Medium is the **355M-parameter (1.52GB on disk)** version of GPT-2, a transformer-based language model created and released by OpenAI. The model is pretrained on English text using a causal language modeling (CLM) objective.

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation.

In [15]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2-medium', device='cuda:0')
generator("Hello, my dog is cute.", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, my dog is cute. How were you? Can you tell him I like him?"\n\nLydia is a very friendly pet. I'},
 {'generated_text': "Hello, my dog is cute. He was just talking to me on the telephone and I can't wait to meet him! Hello… I don't"},
 {'generated_text': 'Hello, my dog is cute. He is a real good dog!\n\n"My dog has been an excellent trainer. When I think of his'},
 {'generated_text': 'Hello, my dog is cute. He has some kind of animal-like features. He says he can do this or that. Can you explain that'},
 {'generated_text': 'Hello, my dog is cute. But I\'m just so uncomfortable with my dog." It\'s almost like an extension of her guilt about her "good'}]

In [16]:
generator.model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1024)
    (wpe): Embedding(1024, 1024)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPT2Block(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1024, out_features=50257, bias=False)
)

In [17]:
print('Number of parameters in GPT-2 Medium:', count_params(generator.model))

Number of parameters in GPT-2 Medium: 354823168


## GPT-2 Large
GPT-2 Large is the **774M-parameter (3.25GB on disk)** version of GPT-2, a transformer-based language model created and released by OpenAI. The model is pretrained on English text using a causal language modeling (CLM) objective.

Test the full generation capabilities [here](https://transformer.huggingface.co/doc/gpt2-large).

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation.

In [2]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2-large', device='cuda:0')
generator("Hello, my dog is cute.", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, my dog is cute. He just got a nose job.\n\nIt started a few weeks ago. It had always been possible; so'},
 {'generated_text': 'Hello, my dog is cute. She has a sweet face, she is funny, but most of all she is loyal. I would have to give'},
 {'generated_text': 'Hello, my dog is cute. He loves her! How should I look at him? What should I say to tell him how pretty he is?'},
 {'generated_text': 'Hello, my dog is cute. My dog loves the water and my family loves to go jogging."'},
 {'generated_text': 'Hello, my dog is cute. I just couldn\'t resist the urge to put him in a new toy!" I looked over at the box of T'}]

In [3]:
generator.model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1280)
    (wpe): Embedding(1024, 1280)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-35): 36 x GPT2Block(
        (ln_1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1280, out_features=50257, bias=False)
)

In [4]:
print('Number of parameters in GPT-2 Large:', count_params(generator.model))

Number of parameters in GPT-2 Large: 774030080


## GPT-2 XL
GPT-2 XL is the **1.5B-parameter (6.43GB on disk)** version of GPT-2, a transformer-based language model created and released by OpenAI. The model is pretrained on English text using a causal language modeling (CLM) objective.

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation.

In [2]:
from transformers import pipeline

generator = pipeline('text-generation', model='gpt2-xl', device='cuda:0')
generator("Hello, my dog is cute.", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, my dog is cute. How did you get her?"\n\n"What I said before isn\'t funny. Now, get off your big'},
 {'generated_text': 'Hello, my dog is cute. Let\'s play." and, "He\'s my favorite toy." "Let\'s play Frisbee?" And that'},
 {'generated_text': 'Hello, my dog is cute. What happens to me?"\n\n-Sharon\n\nWhat is a good dog food commercial?\n\nThe'},
 {'generated_text': 'Hello, my dog is cute. You see a picture of my dog? Oh, I like you too. So how about you meet my dog?"'},
 {'generated_text': 'Hello, my dog is cute. I have to get her something new too! When I saw this, I was looking for a similar option for my'}]

In [3]:
generator.model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1600)
    (wpe): Embedding(1024, 1600)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-47): 48 x GPT2Block(
        (ln_1): LayerNorm((1600,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1600,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1600,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1600, out_features=50257, bias=False)
)

In [4]:
print('Number of parameters in GPT-2 XL:', count_params(generator.model))

Number of parameters in GPT-2 XL: 1557611200


## DistilGPT2
DistilGPT2 is an English-language model pre-trained with the supervision of the 124 million parameter version of GPT-2. DistilGPT2, which has **82 million parameters**, was developed using **knowledge distillation** and was designed to be a **faster, lighter version of GPT-2**.
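In knowledge distillation the smaller student is trained to match the teacher's output distribution, typically softened by a temperature. The sketch below shows a generic soft-target loss in plain Python. The temperature value, the toy logits, and the function names are illustrative assumptions, not DistilGPT2's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened next-token distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student))

teacher_logits = [4.0, 1.0, 0.5]   # hypothetical teacher scores for 3 tokens
close_student = [3.8, 1.1, 0.4]    # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]      # prefers a different token

print(distillation_loss(close_student, teacher_logits))
print(distillation_loss(far_student, teacher_logits))
```

The loss is zero when the student reproduces the teacher exactly and grows as the two distributions diverge, which is what pushes the smaller model toward the larger one's behaviour.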

In [5]:
from transformers import pipeline

generator = pipeline('text-generation', model='distilgpt2', device='cuda:0')
generator("Hello, my dog is cute.", max_length=20, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Hello, my dog is cute. I love dogs, like dog. I always bring him food.'},
 {'generated_text': 'Hello, my dog is cute.\u202a It is the only color you can see.‣'},
 {'generated_text': 'Hello, my dog is cute. I have a dog called "Karen" and she keeps it'},
 {'generated_text': "Hello, my dog is cute. She's a loving and loving dog.\n\nIf you�"},
 {'generated_text': "Hello, my dog is cute. I really didn't want it to be all my own and he"}]

In [6]:
generator.model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-5): 6 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [7]:
print('Number of parameters in DistilGPT2:', count_params(generator.model))

Number of parameters in DistilGPT2: 81912576
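All five parameter counts printed for the GPT-2 family can be reproduced with one closed-form expression. The sketch assumes the standard GPT-2 layout (fused QKV projection of width 3d, MLP width 4d, a bias on every projection) and that `lm_head` shares its weight with `wte`, so it is not counted twice. These details are not visible in the printouts.

```python
def gpt2_param_count(d_model, n_layers, vocab_size=50257, n_positions=1024):
    """Closed-form parameter count for a GPT-2 style decoder with a tied LM head."""
    embeddings = (vocab_size + n_positions) * d_model   # wte + wpe
    # Per block: c_attn (3d^2 + 3d), attention c_proj (d^2 + d),
    # c_fc (4d^2 + 4d), MLP c_proj (4d^2 + d), two LayerNorms (4d).
    per_block = 12 * d_model**2 + 13 * d_model
    final_norm = 2 * d_model                            # ln_f
    return embeddings + n_layers * per_block + final_norm

# (hidden size, number of blocks) as shown in the printed models
sizes = {
    "distilgpt2": (768, 6),
    "gpt2": (768, 12),
    "gpt2-medium": (1024, 24),
    "gpt2-large": (1280, 36),
    "gpt2-xl": (1600, 48),
}
counts = {name: gpt2_param_count(d, n) for name, (d, n) in sizes.items()}
for name, count in counts.items():
    print(f"{name:12s} {count:>13,}")
```

The quadratic term `12 * d_model**2` dominates, which is why widening the model grows the count much faster than adding blocks of the same width.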


## GPT-Neo 1.3B (GPT-3)
GPT-Neo 1.3B is a transformer model designed using EleutherAI's **replication of the GPT-3 architecture**. GPT-Neo refers to the class of models, while **1.3B is the number of parameters (5.31GB on disk)** of this pre-trained model.

This model was trained on the Pile for **380 billion tokens over 362,000 steps**. It was trained as a masked autoregressive language model, using cross-entropy loss.
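The cross-entropy objective mentioned above penalises the model by the negative log-probability it assigns to the token that actually comes next. A minimal sketch for a single position, with made-up logits over a four-token vocabulary:

```python
import math

def next_token_cross_entropy(logits, target_id):
    """Negative log-probability of the true next token under softmax(logits)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target_id]

logits = [2.0, 0.5, 0.1, -1.0]   # hypothetical scores for 4 candidate tokens
loss_confident = next_token_cross_entropy(logits, target_id=0)   # true token scored highest
loss_surprised = next_token_cross_entropy(logits, target_id=3)   # true token scored lowest
print(round(loss_confident, 4), round(loss_surprised, 4))
```

With uniform logits the loss equals the logarithm of the vocabulary size, which is a useful baseline when reading training curves.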

You can use this model directly with a pipeline for text generation.

In [1]:
from transformers import pipeline

generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B', device='cuda:0')
generator("EleutherAI has", do_sample=True, min_length=50)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'EleutherAI has introduced a brand new release with an emphasis on the usability and performance of the product. The version 2.0.10.0 is also more advanced than previous versions, including a brand new user interface, a new networking module and'}]

In [2]:
generator.model

GPTNeoForCausalLM(
  (transformer): GPTNeoModel(
    (wte): Embedding(50257, 2048)
    (wpe): Embedding(2048, 2048)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPTNeoBlock(
        (ln_1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (attn): GPTNeoAttention(
          (attention): GPTNeoSelfAttention(
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (resid_dropout): Dropout(p=0.0, inplace=False)
            (k_proj): Linear(in_features=2048, out_features=2048, bias=False)
            (v_proj): Linear(in_features=2048, out_features=2048, bias=False)
            (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
            (out_proj): Linear(in_features=2048, out_features=2048, bias=True)
          )
        )
        (ln_2): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (mlp): GPTNeoMLP(
          (c_fc): Linear(in_features=2048, out_features=8192, bias=True)
          (c_proj):

In [4]:
print('Number of parameters in GPT-Neo 1.3B:', count_params(generator.model))

Number of parameters in GPT-Neo 1.3B: 1315575808


## GPT-Neo 2.7B (GPT-3)
GPT-Neo 2.7B is a transformer model designed using EleutherAI's **replication of the GPT-3 architecture**. GPT-Neo refers to the class of models, while **2.7B is the number of parameters (10.7GB on disk)** of this pre-trained model.

This model was trained for 420 billion tokens over 400,000 steps. It was trained as a masked autoregressive language model, using cross-entropy loss.

You can use this model directly with a pipeline for text generation.

In [1]:
from transformers import pipeline

generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B', device='cuda:0')
generator("EleutherAI has", do_sample=True, min_length=50)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'EleutherAI has released their new version of its open source artificial intelligence software, dubbed Eleuther AI Open Core. Rather than create a new, custom version of the software or make existing open-source components for the software, EleutherAI decided to'}]

In [2]:
generator.model

GPTNeoForCausalLM(
  (transformer): GPTNeoModel(
    (wte): Embedding(50257, 2560)
    (wpe): Embedding(2048, 2560)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-31): 32 x GPTNeoBlock(
        (ln_1): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (attn): GPTNeoAttention(
          (attention): GPTNeoSelfAttention(
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (resid_dropout): Dropout(p=0.0, inplace=False)
            (k_proj): Linear(in_features=2560, out_features=2560, bias=False)
            (v_proj): Linear(in_features=2560, out_features=2560, bias=False)
            (q_proj): Linear(in_features=2560, out_features=2560, bias=False)
            (out_proj): Linear(in_features=2560, out_features=2560, bias=True)
          )
        )
        (ln_2): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (mlp): GPTNeoMLP(
          (c_fc): Linear(in_features=2560, out_features=10240, bias=True)
          (c_proj)

In [4]:
print('Number of parameters in GPT-Neo 2.7B:', count_params(generator.model))

Number of parameters in GPT-Neo 2.7B: 2651307520


## GPT-NeoX-20B (GPT-3)
GPT-NeoX-20B is a **20 billion parameter (~40GB on disk, tensor type: fp16)** autoregressive language model trained on the Pile using the GPT-NeoX library. **Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J-6B.**

GPT-NeoX-20B has not been fine-tuned for downstream tasks for which language models are commonly deployed, such as writing genre prose, or commercial chatbots. This means GPT-NeoX-20B will likely **not** respond to a given prompt the way products such as ChatGPT do. This is because, unlike GPT-NeoX-20B, ChatGPT was fine-tuned using methods such as Reinforcement Learning from Human Feedback (RLHF) to better “understand” human instructions and dialogue.

This model is **English-language only**, and thus cannot be used for translation or generating text in other languages.

GPT-NeoX-20B was trained with a batch size of approximately **3.15M tokens** (1538 sequences of 2048 tokens each), for a total of **150,000 steps**. **Tensor parallelism** and **pipeline parallelism** were used to distribute the model across GPUs.
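The figures above are easy to sanity-check: multiplying sequences per batch by tokens per sequence gives the batch size in tokens, and multiplying that by the step count gives the total number of tokens processed during training. The total is derived arithmetic, not a number quoted from the model card.

```python
sequences_per_batch = 1538
tokens_per_sequence = 2048
training_steps = 150_000

tokens_per_batch = sequences_per_batch * tokens_per_sequence
total_tokens = tokens_per_batch * training_steps

print(f"{tokens_per_batch:,} tokens per batch")   # about 3.15M
print(f"{total_tokens:,} tokens in total")        # about 472B
```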

If you simply want to try out some prompts, check out this [playground](https://20b.eleuther.ai/).

In [1]:
from transformers import pipeline

generator = pipeline('text-generation', model='EleutherAI/gpt-neox-20b', device_map='auto')
generator("EleutherAI has", do_sample=True, min_length=20)

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.


Loading checkpoint shards:   0%|          | 0/46 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


[{'generated_text': 'EleutherAI has a built-in OMS.\n\n(2) Can only be used'}]

In [2]:
generator.model

GPTNeoXForCausalLM(
  (gpt_neox): GPTNeoXModel(
    (embed_in): Embedding(50432, 6144)
    (layers): ModuleList(
      (0-43): 44 x GPTNeoXLayer(
        (input_layernorm): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
        (attention): GPTNeoXAttention(
          (rotary_emb): RotaryEmbedding()
          (query_key_value): Linear(in_features=6144, out_features=18432, bias=True)
          (dense): Linear(in_features=6144, out_features=6144, bias=True)
        )
        (mlp): GPTNeoXMLP(
          (dense_h_to_4h): Linear(in_features=6144, out_features=24576, bias=True)
          (dense_4h_to_h): Linear(in_features=24576, out_features=6144, bias=True)
          (act): FastGELUActivation()
        )
      )
    )
    (final_layer_norm): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
  )
  (embed_out): Linear(in_features=6144, out_features=50432, bias=False)
)
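The printed layer shapes are enough to recover a parameter count and the approximate fp16 checkpoint size quoted earlier. The sketch counts only the weights and biases visible in the module tree, treats the rotary embedding as parameter-free, counts `embed_out` separately from `embed_in` on the assumption that they are not tied, and assumes two bytes per parameter.

```python
def linear(n_in, n_out, bias=True):
    """Parameters of one Linear layer."""
    return n_in * n_out + (n_out if bias else 0)

def layer_norm(d):
    """Scale and shift vectors of one LayerNorm."""
    return 2 * d

d, vocab, n_layers = 6144, 50432, 44

per_layer = (
    2 * layer_norm(d)                      # input + post-attention norms
    + linear(d, 3 * d) + linear(d, d)      # query_key_value + dense
    + linear(d, 4 * d) + linear(4 * d, d)  # dense_h_to_4h + dense_4h_to_h
)
total = (
    vocab * d                              # embed_in
    + n_layers * per_layer
    + layer_norm(d)                        # final_layer_norm
    + linear(d, vocab, bias=False)         # embed_out (assumed untied)
)
fp16_gigabytes = total * 2 / 1e9

print(f"{total:,} parameters, about {fp16_gigabytes:.1f} GB in fp16")
```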

# LLaMA
LLaMA is an auto-regressive language model based on the transformer architecture. More information can be found in the paper “[LLaMA: Open and Efficient Foundation Language Models](https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/)”.

LLaMA is intended primarily for research on large language models: exploring applications such as **question answering**, **natural language understanding**, and **reading comprehension**; understanding the capabilities and limitations of current language models and developing techniques to improve them; and evaluating and mitigating biases, risks, toxic or harmful generations, and hallucinations.

Model performance varies most noticeably with the language used. Although the training data included 20 languages, most of it is English text, so the model is expected to **perform better for English than other languages**.

This tutorial uses OpenLLaMA, a permissively licensed open-source reproduction of Meta AI's LLaMA large language model. Its authors released **3B** and **7B** models trained on 1T tokens, as well as a preview of a **13B** model trained on 600B tokens.

In [1]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b'
# model_path = 'openlm-research/open_llama_7b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

prompt = 'Q: What is the largest animal?\nA:'
# Keep the inputs on the same device as the model's first layer
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

generation_output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(generation_output[0]))

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.


<s>Q: What is the largest animal?
A: The blue whale.
Q: What is the largest animal?
A: The blue whale. It is the largest animal on Earth. It is also the


In [2]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 3200, padding_idx=0)
    (layers): ModuleList(
      (0-25): 26 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (k_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (v_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (o_proj): Linear(in_features=3200, out_features=3200, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=3200, out_features=8640, bias=False)
          (down_proj): Linear(in_features=8640, out_features=3200, bias=False)
          (up_proj): Linear(in_features=3200, out_features=8640, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm(

## ChatGLM2-6B
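ChatGLM2-6B is the second-generation version of the open-source Chinese-English bilingual chat model ChatGLM-6B. It keeps the strengths of the first-generation model, such as fluent conversation and a low deployment threshold, and introduces the following new features:

1. **Stronger performance:** Building on the experience of developing the first-generation ChatGLM model, the base model of ChatGLM2-6B has been fully upgraded. ChatGLM2-6B uses GLM's hybrid objective function and was pre-trained on 1.4T Chinese and English tokens, followed by human-preference alignment training. Evaluation results show substantial gains over the first-generation model on MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), making it strongly competitive among open-source models of the same size.
2. **Longer context:** Using FlashAttention, the context length of the base model is extended from 2K in ChatGLM-6B to 32K, and a context length of 8K is used during dialogue training, allowing more rounds of conversation.
3. **More efficient inference:** Using Multi-Query Attention, ChatGLM2-6B offers faster inference and lower GPU memory usage: with the official implementation, inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6G of GPU memory increases from 1K to 8K.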
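Multi-Query Attention saves memory by letting many query heads share a small number of key/value heads, which shrinks the key/value cache that grows with dialogue length. The arithmetic below assumes 32 query heads of dimension 128 and 2 shared key/value groups. These values are assumptions, chosen because they reproduce the fused `query_key_value` width of 4608 printed for this model, and the cache sizes are estimates for fp16 storage.

```python
n_layers = 28          # GLMBlock count from the printed model
n_query_heads = 32     # assumption
head_dim = 128         # assumption
n_kv_groups = 2        # assumption (shared key/value heads)
bytes_per_value = 2    # fp16

# Fused projection width: all queries, plus keys and values for the shared groups.
qkv_width = n_query_heads * head_dim + 2 * n_kv_groups * head_dim

def kv_cache_bytes_per_token(n_kv_heads):
    """Keys and values cached for one token across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

full_mha = kv_cache_bytes_per_token(n_query_heads)   # one K/V head per query head
multi_query = kv_cache_bytes_per_token(n_kv_groups)  # K/V heads shared across queries
print(qkv_width, full_mha, multi_query, full_mha // multi_query)
```

Under these assumptions the cache per token is 16 times smaller than with one key/value head per query head, which is consistent with the longer dialogues the list above reports for a fixed memory budget.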



In [2]:
import torch
from transformers import AutoTokenizer, AutoModel

model_repo = "THUDM/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_repo, trust_remote_code=True)

model = AutoModel.from_pretrained(
    model_repo, torch_dtype=torch.float16, trust_remote_code=True, device='cuda:0')

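# Prompt: "Hello"
response, history = model.chat(tokenizer, "你好", history=[])
print(response, end='\n\n')

# Prompt (roughly): "What is the essential difference between ChatGLM2-6B and GPT2?"
response, history = model.chat(tokenizer, "ChatGLM2-6B的GPT2的本质区别是什么？", history=history)
print(response)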

You are using a model of type chatglm to instantiate a model of type . This is not supported for all configurations of models and can yield errors.


Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]

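Hello 👋! I am the AI assistant ChatGLM2-6B. Nice to meet you, and feel free to ask me any question.

ChatGLM2-6B was developed from the GLM2-6B model, and the GLM2-6B model was developed from the GLM model. GLM (General Language Modeling) is a general-purpose pre-trained model proposed by the KEG lab at Tsinghua University that combines the advantages of BERT and GPT.

Specifically, ChatGLM2-6B differs from the GLM model mainly in the following respects:

1. Training objective: ChatGLM2-6B's training objective focuses more on the user's conversational experience, so when replying to conversations started by the user it pays more attention to the user's emotional needs.

2. Interface type: ChatGLM2-6B uses interface type B, while the GLM model uses interface type A.

3. Different capabilities: ChatGLM2-6B's capabilities focus more on dialogue replies, especially the length and complexity of the replies.

In short, ChatGLM2-6B is a special version of the GLM model that focuses mainly on the user's conversational experience and on the length and complexity of replies.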


In [3]:
# Prompt: "Up to which year does ChatGLM2-6B's training data go?"
response, history = model.chat(tokenizer, "ChatGLM2-6B的训练数据截止到哪一年？", history=[])
print(response, end='\n\n')

I am ChatGLM2-6B, an AI assistant based on a language model. My training data goes up to 2023.



In [4]:
print('Number of parameters in ChatGLM2-6B:', count_params(model.cpu()))

Number of parameters in ChatGLM2-6B: 6243584000


In [5]:
model

ChatGLMForConditionalGeneration(
  (transformer): ChatGLMModel(
    (embedding): Embedding(
      (word_embeddings): Embedding(65024, 4096)
    )
    (rotary_pos_emb): RotaryEmbedding()
    (encoder): GLMTransformer(
      (layers): ModuleList(
        (0-27): 28 x GLMBlock(
          (input_layernorm): RMSNorm()
          (self_attention): SelfAttention(
            (query_key_value): Linear(in_features=4096, out_features=4608, bias=True)
            (core_attention): CoreAttention(
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (dense): Linear(in_features=4096, out_features=4096, bias=False)
          )
          (post_attention_layernorm): RMSNorm()
          (mlp): MLP(
            (dense_h_to_4h): Linear(in_features=4096, out_features=27392, bias=False)
            (dense_4h_to_h): Linear(in_features=13696, out_features=4096, bias=False)
          )
        )
      )
      (final_layernorm): RMSNorm()
    )
    (output_layer): Linear(in_