In [1]:
import txtanalyzer

print(txtanalyzer.__version__)

1.2


# Detect Language Patterns 

Detecting language patterns is a quick way to get a better understanding of text data.

The `detect_language_patterns` function spots patterns such as common n-grams (word combinations), frequently used characters, or the mix of languages in a dataset. These patterns can reveal key trends in the text, such as frequently mentioned terms, writing styles, or the overall language makeup.


# Imports 

To use the language pattern detection function, import `detect_language_patterns` from `txtanalyzer.detect_language_patterns` as shown below.

In [2]:
from txtanalyzer.detect_language_patterns import detect_language_patterns

We use the sample text below to demonstrate the language pattern detection function. It contains five English sentences on a mix of themes, including artificial intelligence and meditation, followed by the same sentences in French and in Chinese.

In [3]:
sample_text = [
    "Artificial intelligence and machine learning are transforming industries around the globe.",
    "The basketball team secured a thrilling victory in the final seconds of the game.",
    "Yoga and meditation are excellent for reducing stress and improving mental health.",
    "Exploring the hidden beaches of Bali is an unforgettable experience for any traveler.",
    "Quantum computing is expected to revolutionize data processing and cryptography.",
    "L'intelligence artificielle et l'apprentissage automatique transforment les industries du monde entier.",
    "L'équipe de basket-ball a remporté une victoire passionnante dans les dernières secondes du match.",
    "Le yoga et la méditation sont excellents pour réduire le stress et améliorer la santé mentale.",
    "L'exploration des plages cachées de Bali est une expérience inoubliable pour tout voyageur.",
    "L'informatique quantique devrait révolutionner le traitement des données et la cryptographie.",
    "人工智能和机器学习正在改变全球各行各业。",
    "篮球队在比赛的最后几秒钟取得了激动人心的胜利。",
    "瑜伽和冥想是减压和改善心理健康的绝佳方式。",
    "探索巴厘岛隐秘的海滩对任何旅行者来说都是一次难忘的经历。",
    "量子计算有望彻底改变数据处理和密码学。"
]

# Usage

The example below demonstrates how to use the **`detect_language_patterns`** function to analyze the sample text.

1. **Language Detection**  
   The first part detects the language of each message in the sample text by setting the parameter `method="language"`. The result is a list of detected languages, where each entry corresponds to the language of a sentence in the sample text.

In [4]:
# Detect the language of each message in the sample text
result = detect_language_patterns(sample_text, method="language")
print(result)

['en', 'en', 'en', 'en', 'en', 'fr', 'fr', 'fr', 'fr', 'fr', 'zh-cn', 'zh-cn', 'zh-tw', 'zh-cn', 'zh-cn']


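Each entry corresponds to one sentence in `sample_text`: `en` for English, `fr` for French, and `zh-cn` for Simplified Chinese. Note that the third Chinese sentence is labeled `zh-tw` (Traditional Chinese) even though it is written in simplified characters, so labels for short texts should be treated as best guesses.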
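To turn the per-sentence labels into an overall picture of the language mix, the list can be tallied. The snippet below is a minimal sketch using only the Python standard library; the `detected` list is copied from the output printed above, and the variable names are illustrative rather than part of `txtanalyzer`.

```python
from collections import Counter

# Output of the language-detection call, copied verbatim from the cell above
detected = ['en', 'en', 'en', 'en', 'en', 'fr', 'fr', 'fr', 'fr', 'fr',
            'zh-cn', 'zh-cn', 'zh-tw', 'zh-cn', 'zh-cn']

# Count how many sentences were assigned to each language code
language_counts = Counter(detected)
total = len(detected)

# Share of the dataset per language code
language_share = {lang: count / total for lang, count in language_counts.items()}

for lang, count in language_counts.most_common():
    print(f"{lang}: {count} ({language_share[lang]:.0%})")
```

Keeping `zh-cn` and `zh-tw` as separate keys makes the single divergent label easy to spot; whether to merge them into one "Chinese" bucket depends on the analysis.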


2. **Bigram Extraction**  
   The second part identifies the top 5 most common bigrams (two-word combinations) in the sample text by setting the parameters `method="ngrams"`, `n=2`, and `top_n=5`. The output shows the bigrams along with their frequencies.


In [5]:
# Extract the top 5 most common bigrams (two-word combinations)
result = detect_language_patterns(sample_text, method="ngrams", n=2, top_n=5)
print(result)


[('et la', 2), ('artificial intelligence', 1), ('intelligence and', 1), ('and machine', 1), ('machine learning', 1)]


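The bigram `et la` appears twice, once in each of two French sentences ("et la méditation" and "et la cryptographie"). The other four bigrams in the output each occur once, and all of them come from the first English sentence.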
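For intuition about what an n-gram count measures, here is a minimal sketch that assumes lowercasing and whitespace tokenization. `count_ngrams` is a hypothetical helper written for illustration, not part of `txtanalyzer`, and the library's actual tokenization and tie-breaking may differ.

```python
from collections import Counter

def count_ngrams(texts, n=2, top_n=5):
    """Return the top_n most frequent word n-grams across all texts.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of n tokens over the sentence
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(top_n)

# Two fragments taken from the French sentences in the sample text
demo = [
    "Le yoga et la méditation",
    "le traitement des données et la cryptographie",
]
print(count_ngrams(demo, n=2, top_n=1))
# → [('et la', 2)]
```

Under this assumption, text written without spaces between words, such as the Chinese sentences in the sample, would be treated as a single token per sentence and would contribute no bigrams.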