BiGram :

1. Imports:

* re : The regular expression module, used for pattern matching in text.
    
* Counter from collections : A dictionary subclass used to count hashable objects (in this case, bigrams).

* tee and islice from itertools : Imported but not used in this script.

    
2. Function Definition : extract_bigrams(file_path)

* Purpose : Extract bigrams (pairs of consecutive words) from the text in the specified file and count their frequencies.

3. Reading the File:

Opens the file specified by file_path in read mode with UTF-8 encoding and reads the entire content into the variable text.

4. Tokenizing the Text:

    * Converts the entire text to lowercase to make the analysis case-insensitive.

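    * Uses the regular expression \b\w+\b to find all runs of word characters. \b marks a word boundary, and \w+ matches one or more word characters (letters, digits, and underscores; with Python 3 str patterns this includes non-ASCII letters). Punctuation and whitespace are discarded.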
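
To make the tokenization step concrete, here is a small sketch that applies the same pattern to a made-up string. The `sample` text is an assumption chosen for illustration and is not taken from the input file.

```python
import re

# Hypothetical sample text, not from sample.txt
sample = "Lorem ipsum, dolor sit amet! user_name 42"

# Same pattern as in extract_bigrams: lowercase first, then pull out word runs
tokens = re.findall(r'\b\w+\b', sample.lower())
print(tokens)
# ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'user_name', '42']
```

Note that the comma and exclamation mark are dropped, while the underscore in user_name and the digits in 42 are kept, because \w treats them as word characters.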

5. Creating Bigrams:

    * Generates bigrams by pairing each word with the word that follows it. A list comprehension iterates over the indices 0 to len(words) - 2 and joins each pair of consecutive words into a single space-separated string, so a list of n words yields n - 1 bigrams.
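
As a rough illustration of the pairing step, the sketch below builds bigrams from a short, made-up word list in two ways: with the index-based comprehension described above, and with `zip`, which is a common alternative and not what the script itself uses.

```python
# Hypothetical word list standing in for the tokenized file contents
words = ["lorem", "ipsum", "dolor", "sit"]

# Index-based pairing, as in extract_bigrams
by_index = [f"{words[i]} {words[i + 1]}" for i in range(len(words) - 1)]

# Alternative: zip the list with itself shifted by one position
by_zip = [f"{first} {second}" for first, second in zip(words, words[1:])]

print(by_index)
# ['lorem ipsum', 'ipsum dolor', 'dolor sit']
print(by_index == by_zip)
# True
```

Both forms return an empty list when there are fewer than two words, so no special case is needed for very short inputs.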

6. Counting Frequencies:

    Counter creates a frequency distribution of the bigrams. Each unique bigram is a key in the Counter dictionary, and its value is the count of occurrences in the text.
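
A minimal sketch of how Counter behaves on a list of bigram strings follows. The `bigram_list` values are invented for the example.

```python
from collections import Counter

# Hypothetical bigram list with one repeated pair
bigram_list = ["sit amet", "lorem ipsum", "sit amet"]

freq = Counter(bigram_list)

print(freq["sit amet"])      # 2
print(freq["lorem ipsum"])   # 1
print(freq["missing pair"])  # 0 (absent keys count as zero instead of raising KeyError)
print(sum(freq.values()))    # 3 (total number of bigrams counted)
```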
    
7. Error Handling:

Catches specific exceptions:

  * PermissionError: If the file cannot be accessed due to permission issues.

  * FileNotFoundError: If the file path is incorrect or the file does not exist.

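  * A general Exception handler to catch any other unexpected errors.

In each case the function prints a message and, because no value is returned from the handler, implicitly returns None.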
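
The handler layout can be sketched in isolation as below. `read_text_or_none` is a hypothetical helper written for this illustration, not part of the script, and the file name is assumed not to exist in the working directory.

```python
def read_text_or_none(file_path):
    """Hypothetical helper: return the file's text, or None if reading fails."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read()
    except PermissionError:
        print("Permission denied: Cannot access the file.")
    except FileNotFoundError:
        print("File not found. Make sure the file path is correct.")
    except Exception as e:
        # Must come last: it would otherwise shadow the specific handlers above
        print(f"An unexpected error occurred: {e}")
    return None

# Assumes no file with this name exists
result = read_text_or_none("no_such_file_for_this_demo.txt")
print(result is None)
```

The two specific handlers can appear in either order, since neither exception is a subclass of the other, but both need to precede the general Exception handler.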
    
8. Main Execution:

* Sets the path to the text file. Make sure to replace sample.txt with the actual filename if it’s different.
    
* Calls the extract_bigrams function with the specified file path.
    
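* Prints the bigram frequencies if the result is truthy, that is, if extraction succeeded and at least one bigram was found. Nothing is printed when the function returned None after an error or when the file contained fewer than two words.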
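
Because a plain truthiness check treats a failed read and an empty result the same way, one possible refinement is to test for None explicitly. The sketch below assumes a hypothetical `report` helper and `top_n` parameter that are not in the original script.

```python
from collections import Counter

def report(bigrams, top_n=3):
    """Hypothetical helper: summarize an extract_bigrams-style result."""
    if bigrams is None:
        return "extraction failed"
    if not bigrams:
        return "no bigrams found"
    # most_common sorts by count, highest first
    return ", ".join(f"{pair}: {count}" for pair, count in bigrams.most_common(top_n))

print(report(None))
# extraction failed
print(report(Counter()))
# no bigrams found
print(report(Counter(["sit amet", "sit amet", "lorem ipsum"])))
# sit amet: 2, lorem ipsum: 1
```

Printing only the most frequent pairs also keeps the output readable for larger files, where the full Counter can run to thousands of entries.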
    

In [1]:
import re
from collections import Counter
from itertools import tee, islice

def extract_bigrams(file_path):
    try:
        # Read the file
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()
        
        # Tokenize the text into words
        words = re.findall(r'\b\w+\b', text.lower())  # Normalize to lowercase
        
        # Create bigrams
        bigrams = [f"{words[i]} {words[i + 1]}" for i in range(len(words) - 1)]
        
        # Count the frequency of each bigram
        bigram_freq = Counter(bigrams)
        
        return bigram_freq
    
    except PermissionError:
        print("Permission denied: Cannot access the file. Check if you have read permissions.")
    except FileNotFoundError:
        print("File not found. Make sure the file path is correct.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

# Path to your file
file_path = r'C:\Users\vishn\OneDrive\Desktop\gram\sample.txt'
bigrams = extract_bigrams(file_path)
if bigrams:
    print(bigrams)

Counter({'utilitatis causa': 1, 'causa amicitia': 1, 'amicitia est': 1, 'est quaesita': 1, 'quaesita lorem': 1, 'lorem ipsum': 1, 'ipsum dolor': 1, 'dolor sit': 1, 'sit amet': 1, 'amet consectetur': 1, 'consectetur adipiscing': 1, 'adipiscing elit': 1, 'elit collatio': 1, 'collatio igitur': 1, 'igitur ista': 1, 'ista te': 1, 'te nihil': 1, 'nihil iuvat': 1, 'iuvat honesta': 1, 'honesta oratio': 1, 'oratio socratica': 1, 'socratica platonis': 1, 'platonis etiam': 1, 'etiam primum': 1, 'primum in': 1, 'in nostrane': 1, 'nostrane potestate': 1, 'potestate est': 1, 'est quid': 1, 'quid meminerimus': 1, 'meminerimus duo': 1, 'duo reges': 1, 'reges constructio': 1, 'constructio interrete': 1, 'interrete quid': 1, 'quid si': 1, 'si etiam': 1, 'etiam iucunda': 1, 'iucunda memoria': 1, 'memoria est': 1, 'est praeteritorum': 1, 'praeteritorum malorum': 1, 'malorum si': 1, 'si quidem': 1, 'quidem inquit': 1, 'inquit tollerem': 1, 'tollerem sed': 1, 'sed relinquo': 1, 'relinquo an': 1, 'an nisi': 