# Language Models


> A **language model** is a computational system, typically implemented with machine learning (especially neural networks), that is trained to **predict and generate natural language text**. It learns statistical patterns from large text corpora rather than being explicitly programmed with linguistic rules.

> A machine-learning-based **language model** is not hand-coded with grammar rules. Instead, it learns how people use language by reading huge amounts of text. After training, it can complete sentences, answer questions, or write paragraphs, all by repeatedly predicting which words are likely to come next.

> A language model is like an **auto-complete on steroids** — trained on a huge amount of text, it can generate whole sentences, answer questions, write stories, or translate languages.
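Formally, autoregressive language models such as GPT-2 turn "predict what comes next" into a probability over whole sequences using the chain rule of probability. The probability of a sequence of tokens $w_1, \dots, w_n$ is the product of each token's probability given all the tokens before it, and text is generated by repeatedly choosing a next token from $P(w_t \mid w_1, \dots, w_{t-1})$:

$$
P(w_1, w_2, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1})
$$

For example, for the three-word sequence "I am going":

$$
P(\text{I am going}) = P(\text{I}) \cdot P(\text{am} \mid \text{I}) \cdot P(\text{going} \mid \text{I am})
$$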

### Think of it like this:

* You start typing a message: “I am going to the…”
* A language model can guess what comes next: “store”, “gym”, or “beach”.
* It does this by learning patterns in how words are used together — like how we humans learn language by reading and listening.
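To make "learning patterns" concrete, here is a toy sketch that assumes a tiny made-up corpus. It implements a **bigram model**, which counts how often each word follows another. This is a much simpler technique than GPT-2, which uses a neural network over subword tokens and a long context, but the goal is the same: estimate what comes next from observed text. The names `corpus`, `next_word_counts`, and `predict_next` are illustrative only.

```python
from collections import Counter, defaultdict

# A tiny toy corpus (made up for illustration); real models train on billions of words.
corpus = [
    "i am going to the store",
    "i am going to the gym",
    "we are going to the beach",
    "they went to the store",
]

# Count how often each word is immediately followed by each other word (a bigram model).
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1


def predict_next(word, k=3):
    """Return up to k likely next words with probabilities estimated from the counts."""
    counts = next_word_counts[word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]


print(predict_next("the"))  # [('store', 0.5), ('gym', 0.25), ('beach', 0.25)]
```

Here "store" ranks first because it followed "the" twice in the corpus. A neural language model replaces these raw counts with learned probabilities, which lets it handle word sequences it never saw during training.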

Here’s a simple example in Python using the `transformers` library from Hugging Face. This code shows how to use a **pre-trained language model** to generate text:

### 🧠 Example: Text Generation with GPT-2

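```python
from transformers import pipeline

# Load a pre-trained language model (GPT-2)
generator = pipeline("text-generation", model="gpt2")

# Give the model a prompt
prompt = "Once upon a time in a small village"

# Generate up to 50 new tokens after the prompt. max_new_tokens counts only
# the generated tokens, not the prompt. pad_token_id is set to the
# end-of-sequence token because GPT-2 defines no padding token.
result = generator(
    prompt,
    max_new_tokens=50,
    num_return_sequences=1,
    pad_token_id=generator.tokenizer.eos_token_id,
)

# Print the result
print(result[0]["generated_text"])
```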



Sample output from one run, which generated up to 256 new tokens (the text is sampled, so every run differs):

```
Once upon a time in a small village, a group of men came together and started talking. This was not unusual because of the fact that villages are usually inhabited by children, so they had a tradition of talking. The man from the village that spoke was only six years old.

"We have a small group called the villagers, which are the first to go out into the forest. One of them has a small sword. He has a small sword, so I will fight against him."

"I will fight against the man, so you will get the sword. My name is Nii-san."

Nii-san was the eldest daughter of the village, and she brought the sword she had learned from the village's teacher as a gift to a man. The sword she had used was a dagger.

"This sword will come in one piece, but it's not as sharp as the sword in the other. This sword can also be used as a weapon. The only thing you need to do is to prepare for battle. I won't be too nervous, but I did put it on a piece of paper."

"A piece?"

Nii-san nodded her head and spoke.

"
```

### What it does:

* Loads **GPT-2**, a popular language model.
* Takes your prompt `"Once upon a time in a small village"`.
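* Predicts and appends more text one token at a time (a token is a word or a piece of a word), feeding each new token back in to predict the next one.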

Because the text is sampled, each run produces a different continuation. An illustrative result might look like:

```
"Once upon a time in a small village, there was a curious fox who loved to explore the nearby forest. Every morning..."
```
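The "predicts and adds more text" step is a loop: predict one token, append it, and feed the longer text back in. The minimal sketch below swaps a trained model for a hypothetical hand-written probability table and treats whole words as tokens. `NEXT_WORD_PROBS` and `generate` are illustrative names, not `transformers` APIs. The sketch contrasts greedy decoding with sampling, which is one reason a sampling pipeline like the GPT-2 example returns different text on each run. It also shows that a `max_new_tokens` budget counts only the newly generated tokens, not the prompt.

```python
import random

# Hypothetical next-word probabilities standing in for a trained model.
# A real model computes these from the whole context; this toy looks only at the last word.
NEXT_WORD_PROBS = {
    "village": {"there": 0.6, "lived": 0.4},
    "there": {"lived": 0.7, "was": 0.3},
    "lived": {"a": 1.0},
    "was": {"a": 1.0},
    "a": {"fox": 0.6, "farmer": 0.4},
}


def generate(prompt, max_new_tokens, sample=False, rng=None):
    """Append up to max_new_tokens words, predicting one word at a time."""
    words = prompt.split()
    for _ in range(max_new_tokens):
        probs = NEXT_WORD_PROBS.get(words[-1])
        if not probs:  # no known continuation for the last word: stop early
            break
        if sample:
            # Sampling: pick a word at random in proportion to its probability.
            choices, weights = zip(*probs.items())
            next_word = (rng or random).choices(choices, weights=weights)[0]
        else:
            # Greedy decoding: always take the most likely word (deterministic).
            next_word = max(probs, key=probs.get)
        words.append(next_word)
    return " ".join(words)


prompt = "once upon a time in a small village"
print(generate(prompt, max_new_tokens=2))   # once upon a time in a small village there lived
print(generate(prompt, max_new_tokens=10))  # once upon a time in a small village there lived a fox
print(generate(prompt, max_new_tokens=10, sample=True, rng=random.Random(0)))
```

Greedy decoding always produces the same continuation. Sampling can take a lower-probability branch such as "was" or "farmer", so its output varies with the random seed.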