### Word, Sentence, and Sub-word Tokenization in Python
Tokenization is the process of splitting text into smaller units (words, sentences, or sub-words) that later natural language processing (NLP) steps can work with. Python has several libraries for this; the examples here use NLTK and Hugging Face Transformers.

___

#### 1. Word Tokenization

Word tokenization splits text into individual words and punctuation marks, called tokens. It is a common preprocessing step before text is passed to a machine learning model.
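
To make the idea concrete before reaching for a library, here is a minimal sketch that uses only Python's standard `re` module. The function name `simple_word_tokenize` and the regular expression are illustrative assumptions, not part of any library, and the rule is far cruder than a real tokenizer.

```python
import re

def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation marks (rough sketch)."""
    # \w+(?:'\w+)?  matches a word, keeping a contraction such as "It's" together
    # [^\w\s]       matches any single character that is neither word nor space
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = simple_word_tokenize("Python is a great programming language.")
print(tokens)  # ['Python', 'is', 'a', 'great', 'programming', 'language', '.']
```

Library tokenizers apply more careful rules. NLTK's `word_tokenize`, for example, splits the contraction "It's" into "It" and "'s", which this sketch does not do.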

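```python
import nltk
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt data; the resource name depends on the NLTK version.
nltk.download("punkt", quiet=True)      # older NLTK releases
nltk.download("punkt_tab", quiet=True)  # newer NLTK releases

text = "Python is a great programming language."
word_tokens = word_tokenize(text)
print(word_tokens)  # Output: ['Python', 'is', 'a', 'great', 'programming', 'language', '.']
```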

#### 2. Sentence Tokenization

Sentence tokenization splits text into sentences, which is useful for tasks like summarization or sentence-level analysis.
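
As a rough illustration of what a sentence splitter has to do, the sketch below breaks text after `.`, `!`, or `?` when whitespace follows. The helper name `simple_sent_tokenize` is hypothetical, and the rule is deliberately naive.

```python
import re

def simple_sent_tokenize(text):
    """Split text after '.', '!' or '?' when followed by whitespace (naive sketch)."""
    # The lookbehind keeps the closing punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

sentences = simple_sent_tokenize("Python is great. It's widely used in NLP tasks.")
print(sentences)  # ['Python is great.', "It's widely used in NLP tasks."]
```

This rule also breaks after abbreviations, so "Dr. Smith arrived." becomes two pieces. NLTK's `sent_tokenize` relies on the pre-trained Punkt model, which handles many such cases.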

```python
from nltk.tokenize import sent_tokenize

# Requires the Punkt data: nltk.download("punkt"), or "punkt_tab" on newer NLTK releases.
text = "Python is great. It's widely used in NLP tasks."
sentence_tokens = sent_tokenize(text)
print(sentence_tokens)  # Output: ['Python is great.', "It's widely used in NLP tasks."]
```

#### 3. Sub-word Tokenization

Sub-word tokenization splits words into smaller units and is used by deep learning models such as BERT. Rare or unknown words are broken into pieces that exist in the model's vocabulary, so the model seldom has to fall back to a generic unknown-word token.
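
The splitting step can be sketched as a greedy longest-match-first search over a vocabulary, which is the general approach WordPiece-style tokenizers take. Everything below is an illustrative assumption: `wordpiece_tokenize` is a hypothetical helper and `toy_vocab` is a made-up vocabulary, not taken from any real model.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"un", "token", "##believ", "##able", "##ize", "##r"}

print(wordpiece_tokenize("unbelievable", toy_vocab))  # ['un', '##believ', '##able']
print(wordpiece_tokenize("tokenizer", toy_vocab))     # ['token', '##ize', '##r']
print(wordpiece_tokenize("python", toy_vocab))        # ['[UNK]']
```

A real model's vocabulary holds tens of thousands of pieces learned from data, so the splits it produces will differ from this toy example.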

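```python
from transformers import AutoTokenizer

# Downloads the tokenizer files on first use.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "unbelievable"
subword_tokens = tokenizer.tokenize(text)
# Prints a list of WordPiece tokens; pieces that continue a word start with '##'.
# The exact split depends on the model's vocabulary.
print(subword_tokens)
```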