# Importing Libraries
The script begins by importing several standard Python libraries. These provide the tools needed for text processing, data splitting, and calculations.


In [1]:
#!/usr/bin/env python
import re, random, math, collections, itertools

PRINT_ERRORS=0

# Data Loading and Preparation: readFiles
Before any model training or testing can occur, all raw data must be read from the disk and structured into a Python-friendly format. The readFiles function is responsible for this entire data ingestion and preparation process. Read data from files.

In [2]:
#------------- Function Definitions ---------------------

def readFiles(sentimentDictionary,sentencesTrain,sentencesTest,sentencesNokia):
    """
    Reads all raw data files from disk and populates the dictionaries
    passed as arguments.
    - sentimentDictionary: Fills with words and their scores (1 or -1).
    - sentencesTrain: Fills with ~90% of film reviews for training.
    - sentencesTest: Fills with ~10% of film reviews for testing.
    - sentencesNokia: Fills with all Nokia reviews for out-of-domain testing.
    """

   # --- 1. Read movie review files ---
    # reading pre-labeled input and splitting into lines
    posSentences = open('rt-polarity.pos', 'r', encoding="ISO-8859-1")
    posSentences = re.split(r'\n', posSentences.read())

    negSentences = open('rt-polarity.neg', 'r', encoding="ISO-8859-1")
    negSentences = re.split(r'\n', negSentences.read())

    # --- 2. Read Nokia review files ---
    posSentencesNokia = open('nokia-pos.txt', 'r')
    posSentencesNokia = re.split(r'\n', posSentencesNokia.read())

    negSentencesNokia = open('nokia-neg.txt', 'r', encoding="ISO-8859-1")
    negSentencesNokia = re.split(r'\n', negSentencesNokia.read())
    # --- 3. Read sentiment dictionary files ---
    
    # Use a list comprehension to read positive words.
    # .strip() removes whitespace/newlines.
    # 'if line.strip()' filters out empty blank lines.
    # 'not line.startswith(";")' filters out comment lines.
    posDictionary = open('positive-words.txt', 'r', encoding="ISO-8859-1")
    posWordList = [line.strip() for line in posDictionary.readlines() if line.strip() and not line.startswith(';')]
    posDictionary.close()

    # Do the same for the negative words list.
    negDictionary = open('negative-words.txt', 'r', encoding="ISO-8859-1")
    negWordList = [line.strip() for line in negDictionary.readlines() if line.strip() and not line.startswith(';')] 
    negDictionary.close()

    # --- 4. Populate the master sentiment dictionary with scores ---
    for i in posWordList:
        sentimentDictionary[i] = 1
    for i in negWordList:
        sentimentDictionary[i] = -1

    # --- 5. Create Training and Test Datsets for Film Reviews ---
    # We want to test on sentences we haven't trained on, 
    # to see how well the model generalises to previously unseen sentences.
    
    # Create ~90% training / 10% test split of training and test data
    for i in posSentences:
        # random.randint(1,10) picks a number between 1 and 10.
        # '< 2' is true 1 out of 10 times (i.e., when it picks '1').
        if random.randint(1,10)<2:
            sentencesTest[i]="positive"
        else:
            sentencesTrain[i]="positive"

    for i in negSentences:
        if random.randint(1,10)<2:
            sentencesTest[i]="negative"
        else:
            sentencesTrain[i]="negative"

   # --- 6. Create Nokia (Out-of-Domain) Datset ---
    # No split needed, this is only for testing.
    for i in posSentencesNokia:
            sentencesNokia[i]="positive"
    for i in negSentencesNokia:
            sentencesNokia[i]="negative"
#----------------------------End of data initialisation ----------------#

# Parsing the Sentiment Lexicons
This block of code is the direct implementation of Step 2.1 from the assignment. The original script provided empty lists (posWordList = [] and negWordList = []). The task is to replace them with code that correctly reads and parses the dictionary files.

The most efficient solution is a list comprehension, which builds a new list by processing each line from the file.
 ```
   posDictionary = open('positive-words.txt', 'r', encoding="ISO-8859-1")
    posWordList = [line.strip() for line in posDictionary.readlines() if line.strip() and not line.startswith(';')]
    posDictionary.close()

    negDictionary = open('negative-words.txt', 'r', encoding="ISO-8859-1")
    negWordList = [line.strip() for line in negDictionary.readlines() if line.strip() and not line.startswith(';')] 
    negDictionary.close()
```
 This is the "output" part. For every line that passes the filters, line.strip() is called to remove all leading/trailing whitespace (like spaces, tabs, and the invisible \n newline character). This gives us a clean word.

 ``` for line in posDictionary.readlines() : ```This is the loop. It reads the file line by line.

 ``` if line.strip() and not line.startswith(';') : ``` This line ensures that it don't save empty line and in case of the commets to skip them I use method line.startswith(';')

# Naive Bayes Model Training: trainBayes
This function is the "learning" or "training" phase of our statistical classifier. Its sole purpose is to run through the sentencesTrain dataset and build the probabilistic model.

It calculates the conditional probabilities for every word in the vocabulary, answering the question: "What is the probability of seeing this word, given the sentiment is positive?" (and vice-versa for negative)

In [3]:
# calculates p(W|Positive), p(W|Negative) and p(W) for all words in training data
def trainBayes(sentencesTrain, pWordPos, pWordNeg, pWord):
    """
    Trains the Naive Bayes model.
    This function iterates over the training data to count word occurrences
    and then calculates the conditional probabilities P(Word|Positive)
    and P(Word|Negative) for every word in the vocabulary.
    
    Args:
        sentencesTrain (dict): The training data (sentence: sentiment).
        pWordPos (dict): An empty dict to be filled with P(W|Pos) probabilities.
        pWordNeg (dict): An empty dict to be filled with P(W|Neg) probabilities.
        pWord (dict): An empty dict to be filled with P(W) probabilities.
    """
    # Dictionaries to store the raw counts of each word  [hash function]
    freqPositive = {} 
    freqNegative = {}
    # A set of all unique words in the training data (our vocabulary)
    dictionary = {}
    # Counters for the total number of words in each class
    posWordsTot = 0
    negWordsTot = 0
    allWordsTot = 0

    # --- STAGE 1: COUNTING ---
    # iterate through each sentence/sentiment pair in the training data
    for sentence, sentiment in sentencesTrain.items():
        # Tokenize the sentence into a list of words
        wordList = re.findall(r"[\w']+", sentence)
        
        for word in wordList: # calculate over unigrams
            allWordsTot += 1 # keeps count of total words in dataset
            if not (word in dictionary):
                dictionary[word] = 1
            if sentiment=="positive" :
                posWordsTot += 1 # keeps count of total words in positive class

                # keep count of each word in positive context
                if not (word in freqPositive):
                    freqPositive[word] = 1
                else:
                    freqPositive[word] += 1    
            else:
                negWordsTot+=1 # keeps count of total words in negative class
                
                # keep count of each word in positive context
                if not (word in freqNegative):
                    freqNegative[word] = 1
                else:
                    freqNegative[word] += 1

    for word in dictionary:
        # do some smoothing so that minimum count of a word is 1
        if not (word in freqNegative):
            freqNegative[word] = 1
        if not (word in freqPositive):
            freqPositive[word] = 1

        # Calculate p(word|positive)
        pWordPos[word] = freqPositive[word] / float(posWordsTot)

        # Calculate p(word|negative) 
        pWordNeg[word] = freqNegative[word] / float(negWordsTot)

        # Calculate p(word)
        pWord[word] = (freqPositive[word] + freqNegative[word]) / float(allWordsTot) 

#---------------------------End Training ----------------------------------

The function operates in two main stages:

Stage 1: Counting Frequencies: Loop through all training sentences and count every word.

Stage 2: Calculating Probabilities: Convert those raw counts into probabilities.

# Stage 1: Counting Word Frequencies
The first part of the function iterates over every (sentence, sentiment) pair in the sentencesTrain dictionary.
```
# Dictionaries to store the raw counts of each word
    freqPositive = {} 
    freqNegative = {} 
    dictionary = {} # A set of all unique words in the training data
    
    # Counters for the total number of words in each class
    posWordsTot = 0  
    negWordsTot = 0  
    allWordsTot = 0  

    # --- STAGE 1: COUNTING ---
    for sentence, sentiment in sentencesTrain.items():
        wordList = re.findall(r"[\w']+", sentence)
        
        for word in wordList:
            allWordsTot += 1
            if not (word in dictionary):
                dictionary[word] = 1 # Add word to our vocabulary
                
            if sentiment=="positive" :
                posWordsTot += 1 
                if not (word in freqPositive):
                    freqPositive[word] = 1
                else:
                    freqPositive[word] += 1    
            else:
                negWordsTot+=1 
                if not (word in freqNegative):
                    freqNegative[word] = 1
                else:
                    freqNegative[word] += 1
```
At the end of this stage, we have:

freqPositive / freqNegative: Dictionaries holding the raw count of each word in each class (e.g., freqPositive['great'] = 500).

posWordsTot / negWordsTot: The total number of words (not unique) in all positive or negative sentences.

dictionary: A set of every unique word encountered, which we call our vocabulary.

# Stage 2: Calculating Probabilities (with Smoothing)
The second stage iterates over our new dictionary. It converts the raw counts from Stage 1 into the final probabilities that our model will use.
```
# --- STAGE 2: CALCULATING PROBABILITIES ---
    for word in dictionary:
        
        # --- Add-One (Laplace) Smoothing ---
        if not (word in freqNegative):
            freqNegative[word] = 1
        if not (word in freqPositive):
            freqPositive[word] = 1
```
This is a critical step. If a word (e.g., "superb") appeared in positive reviews but never in a negative one, its freqNegative count would be 0. Later, this would cause a 0-probability error that would break the entire calculation. By adding a "fake" count of 1, we ensure every word has a tiny, non-zero probability in both classes.

# After smoothing, the final probabilities are calculated:
```
# Calculate p(word|positive)
        pWordPos[word] = freqPositive[word] / float(posWordsTot)

        # Calculate p(word|negative) 
        pWordNeg[word] = freqNegative[word] / float(negWordsTot)

        # Calculate p(word)
        pWord[word] = (freqPositive[word] + freqNegative[word]) / float(allWordsTot)
```
At the end of this function, the (previously empty) dictionaries pWordPos, pWordNeg, and pWord are now filled with the probabilities needed to make predictions.

# Naive Bayes Classification: testBayes
This function is the "testing" or "classification" phase. It acts as the "examiner" for our model. It takes the probabilities calculated by trainBayes and uses them to predict the sentiment of new, unseen sentences from a test set (sentencesTest or sentencesNokia).

Its final job is to compare its own predictions against the "ground truth" (correct) labels, which allows us to calculate all the performance metrics for Step 2.

The function can be broken down into three parts: the core classification logic, the error-printing mechanism, and the final metrics calculation.

In [17]:
# implement naive bayes algorithm
# INPUTS:
#   sentencesTest is a dictonary with sentences associated with sentiment 
#   dataName is a string (used only for printing output)
#   pWordPos is dictionary storing p(word|positive) for each word
#      i.e., pWordPos["apple"] will return a real value for p("apple"|positive)
#   pWordNeg is dictionary storing p(word|negative) for each word
#   pWord is dictionary storing p(word)
#   pPos is a real number containing the fraction of positive reviews in the dataset
def testBayes(sentencesTest, dataName, pWordPos, pWordNeg, pWord,pPos):

    print("Naive Bayes classification for " + dataName)
    pNeg=1-pPos

    # These variables will store results
    total=0
    correct=0
    totalpos=0
    totalpospred=0
    totalneg=0
    totalnegpred=0
    correctpos=0
    correctneg=0

    # for each sentence, sentiment pair in the dataset
    for sentence, sentiment in sentencesTest.items():
        wordList = re.findall(r"[\w']+", sentence)#collect all words

        pPosW=pPos
        pNegW=pNeg

        for word in wordList: # calculate over unigrams
            if word in pWord:
                if pWord[word]>0.00000001:
                    pPosW *=pWordPos[word]
                    pNegW *=pWordNeg[word]

        prob=0; 
        # Normalize the scores into a 0-1 probability
        # prob = P(Pos|W) / (P(Pos|W) + P(Neg|W))           
        if pPosW+pNegW >0:
            prob=pPosW/float(pPosW+pNegW)

        total+=1
        # 'sentiment' is the "ground truth" (correct answer)
        if sentiment=="positive":
            totalpos+=1
            # 'prob > 0.5' is the model's prediction
            if prob>0.5:
                correct+=1
                correctpos+=1
                totalpospred+=1
            else:# Ground truth is negative
                correct+=0
                totalnegpred+=1
                if PRINT_ERRORS:
                    print ("ERROR (pos classed as neg %0.2f):" %prob + sentence)
        else:
            totalneg+=1
            if prob<=0.5:
                correct+=1
                correctneg+=1
                totalnegpred+=1
            else:
                correct+=0
                totalpospred+=1
                if PRINT_ERRORS:
                    print ("ERROR (neg classed as pos %0.2f):" %prob + sentence)
    # Accuracy = (Correct Predictions) / (Total Predictions)
    accuracy = (correct / float(total)) * 100 if total >0 else 0

    # --- Positive Class Metrics ---

    # Precision (Positive): TP / (TP + FP) -> correctpos / totalpospred
    # "Of all sentences we PREDICTED positive, how many were ACTUALLY positive?"
    pos_precision = (correctpos / float(totalpospred)) * 100 if totalpospred >0 else 0

    # Recall (Positive): TP / (TP + FN) -> correctpos / totalpos
    # "Of all ACTUALLY positive sentences, how many did we FIND?"
    pos_recall = (correctpos / float(totalpos)) * 100 if totalpos >0 else 0

    # F1-Score (Positive): 2 * (Precision * Recall) / (Precision + Recall)
    # The harmonic mean, balancing precision and recall. Good for uneven classes.
    pos_f1 = (2 * pos_precision * pos_recall) / (pos_precision + pos_recall) if (pos_precision + pos_recall) >0 else 0

    # --- Negative Class Metrics ---
    
    # Precision (Negative): TN / (TN + FN) -> correctneg / totalnegpred
    # "Of all sentences we PREDICTED negative, how many were ACTUALLY negative?"
    neg_precision = (correctneg / float(totalnegpred)) * 100 if totalnegpred > 0 else 0

    # Recall (Negative): TN / (TN + FP) -> correctneg / totalneg
    neg_recall = (correctneg / float(totalneg)) * 100 if totalneg > 0 else 0

    # F1-Score (Negative): 2 * (Precision * Recall) / (Precision + Recall)
    neg_f1 = (2 * neg_precision * neg_recall) / (neg_precision + neg_recall) if (neg_precision + neg_recall) > 0 else 0

    # --- Print all the results in a clean format ---
    print(f"\n--- Results for {dataName} ---")
    print(f"Accuracy: {accuracy:.2f}% ({correct}/{total})")
    print("\n--- Positive Class ---")
    print(f"Precision: {pos_precision:.2f}% ({correctpos}/{totalpospred})")
    print(f"Recall:    {pos_recall:.2f}% ({correctpos}/{totalpos})")
    print(f"F1-Score:  {pos_f1:.2f}")
    print("\n--- Negative Class ---")
    print(f"Precision: {neg_precision:.2f}% ({correctneg}/{totalnegpred})")
    print(f"Recall:    {neg_recall:.2f}% ({correctneg}/{totalneg})")
    print(f"F1-Score:  {neg_f1:.2f}")
    print("--------------------------------\n")
    # --- END of Step 2 TODO ---

# Core Classification Logic
The function iterates through every (sentence, sentiment) pair in the provided sentencesTest dictionary. For each sentence, it performs the core Naive Bayes calculation.
```
# ... (metrics counters are initialized to 0) ...

    for sentence, sentiment in sentencesTest.items():
        wordList = re.findall(r"[\w']+", sentence)

        # Start with the base probabilities (e.g., 0.5 for pos, 0.5 for neg)
        pPosW=pPos
        pNegW=pNeg

        # --- This is the Naive Bayes core calculation ---
        # P(Pos|Sentence) = P(Pos) * P(w1|Pos) * P(w2|Pos) * ...
        for word in wordList:
            if word in pWord:
                if pWord[word]>0.00000001:
                    pPosW *=pWordPos[word]
                    pNegW *=pWordNeg[word]
```
This loop multiplies the probabilities of each word. This is the "naive" assumption in actionâ€”it treats each word's probability as independent. After iterating through all words, pPosW and pNegW hold the final (un-normalized) scores.

These scores are then normalized into a single 0-to-1 probability:
```
# Normalize the scores: P(Pos|W) / (P(Pos|W) + P(Neg|W))
        if pPosW+pNegW >0:
            prob=pPosW/float(pPosW+pNegW)
```

# Tallying Results and Printing Errors
The function then compares its prediction (prob) to the correct answer (sentiment). It uses this comparison to increment all the counters needed for the metrics (e.g., ```correct, correctpos, totalpospred```).

This section also implements the requirement for Step 6: if the PRINT_ERRORS flag is set to 1, the function will print any sentence it classifies incorrectly.
```
if sentiment=="positive":
            if prob>0.5:
                correct+=1
                # ...
            else:
                # ...
                if PRINT_ERRORS: # <-- Step 6
                    print ("ERROR (pos classed as neg %0.2f):" %prob + sentence)
```

# Calculating Performance Metrics (Step 2)
This final block of code is the implementation of Step 2. It takes all the counters (like correctpos, totalpospred, etc.) and calculates the standard academic metrics.

```
# --- START of Step 2 TODO: Calculate Performance Metrics ---
    
    # Accuracy = (Correct Predictions) / (Total Predictions)
    accuracy = (correct / float(total)) * 100 if total > 0 else 0
    
    # Precision (Positive): TP / (TP + FP) -> correctpos / totalpospred
    pos_precision = (correctpos / float(totalpospred)) * 100 if totalpospred > 0 else 0
    
    # Recall (Positive): TP / (TP + FN) -> correctpos / totalpos
    pos_recall = (correctpos / float(totalpos)) * 100 if totalpos > 0 else 0
    
    # F1-Score (Positive): 2 * (Precision * Recall) / (Precision + Recall)
    pos_f1 = (2 * pos_precision * pos_recall) / (pos_precision + pos_recall) if (pos_precision + pos_recall) > 0 else 0
    
    # (Identical calculations are performed for the Negative class)
    # ...
```

# Rule-Based Classification: testDictionary
This function is our second classifier: the rule-based or lexicon-based system. This function is the "baseline" or "dumb" model required for Step 5.

Unlike trainBayes, this model does no "learning". Its logic is a simple, fixed set of rules:

Read a sentence.

If a word is in our sentimentDictionary, add its score (+1 or -1).

If the final score is above a threshold, classify it as positive.

This function is also where we add the performance metrics for Step 5.1 and the error-printing logic for Step 6.

In [13]:
# This is a simple classifier that uses a sentiment dictionary to classify 
# a sentence. For each word in the sentence, if the word is in the positive 
# dictionary, it adds 1, if it is in the negative dictionary, it subtracts 1. 
# If the final score is above a threshold, it classifies as "Positive", 
# otherwise as "Negative"
def testDictionary(sentencesTest, dataName, sentimentDictionary, threshold):

    print("Dictionary-based classification")
    total=0
    correct=0
    totalpos=0
    totalneg=0
    totalpospred=0
    totalnegpred=0
    correctpos=0
    correctneg=0

    # Iterate over each sentence and its correct label
    for sentence, sentiment in sentencesTest.items():
        Words = re.findall(r"[\w']+", sentence)
        score=0

        # Sum the scores of all known words in the sentence
        for word in Words:
            if word in sentimentDictionary:
               score+=sentimentDictionary[word]
 
        total+=1

        # --- Tallying Results ---
        # 'sentiment' is the correct answer
        if sentiment=="positive":
            totalpos+=1
            # 'score >= threshold' is the model's prediction
            if score>=threshold:
                correct+=1
                correctpos+=1
                totalpospred+=1
            else:# Predict negative
                correct+=0
                totalnegpred+=1
        else:
            totalneg+=1
            # STEP 6: Check for printing errors
            if score<threshold:
                correct+=1
                correctneg+=1
                totalnegpred+=1
            else:
                correct+=0
                totalpospred+=1
    # TODO for Step 5: Add some code here to calculate and print: (1) accuracy; (2) precision and recall for the positive class; 
    # (3) precision and recall for the negative class; (4) F1 score;
     # Accuracy = (Correct Predictions) / (Total Predictions)
    accuracy = (correct / float(total)) * 100 if total > 0 else 0
    
    # --- Positive Class Metrics ---
    # Precision (Positive): TP / (TP + FP) -> correctpos / totalpos
    pos_precision = (correctpos / float(totalpospred)) * 100 if totalpospred > 0 else 0

    # Recall (Positive): TP / (TP + FN) -> correctpos / totalpos
    pos_recall = (correctpos / float(totalpos)) * 100 if totalpos > 0 else 0

    # F1-Score (Positive): 2 * (Precision * Recall) / (Precision + Recall)
    pos_f1 = (2 * pos_precision * pos_recall) / (pos_precision + pos_recall) if (pos_precision + pos_recall) > 0 else 0
    
    # --- Negative Class Metrics ---
    # Precision (Negative): TN / (TN + FN) -> correctneg / totalneg
    neg_precision = (correctneg / float(totalnegpred)) * 100 if totalnegpred > 0 else 0

    # Recall (Negative): TN / (TN + FP) -> correctneg / totalneg
    neg_recall = (correctneg / float(totalneg)) * 100 if totalneg > 0 else 0

    # F1-Score (Negative): 2 * (Precision * Recall) / (Precision + Recall)
    neg_f1 = (2 * neg_precision * neg_recall) / (neg_precision + neg_recall) if (neg_precision + neg_recall) > 0 else 0

    # --- Print all the results in a clean format ---
    print(f"\n--- Results for {dataName} ---")
    print(f"Accuracy: {accuracy:.2f}% ({correct}/{total})")
    print("\n--- Positive Class ---")
    print(f"Precision: {pos_precision:.2f}% ({correctpos}/{totalpospred})")
    print(f"Recall:    {pos_recall:.2f}% ({correctpos}/{totalpos})")
    print(f"F1-Score:  {pos_f1:.2f}")
    print("\n--- Negative Class ---")
    print(f"Precision: {neg_precision:.2f}% ({correctneg}/{totalnegpred})")
    print(f"Recall:    {neg_recall:.2f}% ({correctneg}/{totalneg})")
    print(f"F1-Score:  {neg_f1:.2f}")
    print("--------------------------------\n")
    # --- END of Step 5 TODO ---
    

# Core Classification Logic
The function iterates through each sentence and tokenizes it. It then loops through the words, checking if any exist in the ```sentimentDictionary```.

```
# ... (metrics counters are initialized to 0) ...
    
    for sentence, sentiment in sentencesTest.items():
        Words = re.findall(r"[\w']+", sentence)
        score=0
        
        # Sum the scores of all known words in the sentence
        for word in Words:
            if word in sentimentDictionary:
               score+=sentimentDictionary[word] # Add +1 or -1
```
After checking all words, the score variable holds the final sentiment value for the sentence (e.g., 2, -1, 0).

# Tallying and Error Printing
The final ```score``` is then compared against the ```threshold``` (which is set to ```1``` in our script) to make a prediction. This prediction is compared to the ```sentiment``` (ground truth) to tally the results.

This block also contains the ```if PRINT_ERRORS:``` check required by Step 6, which was added to the function.
```
if sentiment=="positive":
            totalpos+=1
            if score>=threshold:
                correct+=1
                # ...
            else: # Predict negative
                correct+=0
                totalnegpred+=1
                if PRINT_ERRORS: # <-- Step 6
                    print (f"ERROR (pos classed as neg, score {score}): {sentence}")
        else: # Ground truth is negative
            # ...
```

# Calculating Performance Metrics
Finally, just like in ```testBayes```, this block implements the requirements of Step 5 by calculating and printing the full set of metrics (Accuracy, Precision, Recall, F1).
```
# --- START of Step 5: Calculate Performance Metrics ---
    
    # Accuracy = (Correct Predictions) / (Total Predictions)
    accuracy = (correct / float(total)) * 100 if total > 0 else 0
    
    # Precision (Positive): TP / (TP + FP)
    pos_precision = (correctpos / float(totalpospred)) * 100 if totalpospred > 0 else 0
    
    # (... all other metrics calculations ...)
    # ...
```

# Improving the Rule-Based System
``` testDictionaryImproved ```
The reason for creating this new function is that the simple ```testDictionary``` method was too naive. It only counted +1 or -1 for words it found, ignoring all other linguistic context. This was especially problematic for film reviews. The Error Analysis showed it would fail on simple phrases, for example:

- "not good": Classified as Positive (Score: +1)

- "NOT a masterpiece": Classified as Positive (Score: +1)

The baseline model saw both "not good" and "very good" as having the exact same +1 score as "good".

The solution was to create testDictionaryImproved which adds a layer of linguistic rules to understand the context of a sentiment word.

In [6]:
#For step 5.3: Improved dictionary-based classifier with negation handling
#Negotiation handling: If a negation word is found, invert the sentiment scores of the next three words.
def testDictionaryImproved(sentencesTest, dataName, sentimentDictionary, threshold):
    """
    Performs rule-based classification (like testDictionary), but includes
    a simple negation-handling rule.
    
    This function extends the base classifier by adding a "negation window".
    When a negation word (e..g, "not", "n't") is found, the sentiment
    score of the next N words (e.g., 3) is inverted. This is a form of
    linguistic generalization, as required by Step 5.3.
    """
    print(f"IMPROVED Dictionary-based classification for: {dataName}")
    total=0
    correct=0
    totalpos=0
    totalneg=0
    totalpospred=0
    totalnegpred=0
    correctpos=0
    correctneg=0
    
    # Define negation words
    negation_words = {"not", "n't", "no", "never", "cannot", "can't", "don't", "doesn't", "didn't"}

    for sentence, sentiment in sentencesTest.items():
        Words = re.findall(r"[\w']+", sentence.lower()) # process in lowercase
        score=0
        negation_window = 0 # How many words forward the negation applies

        for word in Words:
            if word in negation_words:
                negation_window = 3 # Apply negation to next 3 words
            
            if word in sentimentDictionary:
                word_score = sentimentDictionary[word]
                if negation_window > 0:
                    word_score = -word_score # Invert the score
                score += word_score
            
            # Decrement the window
            if negation_window > 0:
                negation_window -= 1
 
        total+=1
        if sentiment=="positive":
            totalpos+=1
            if score>=threshold:
                correct+=1
                correctpos+=1
                totalpospred+=1
            else:
                correct+=0
                totalnegpred+=1
                if PRINT_ERRORS:
                    print (f"ERROR (pos classed as neg, score {score}): {sentence}")
        else:
            totalneg+=1
            if score<threshold:
                correct+=1
                correctneg+=1
                totalnegpred+=1
            else:
                correct+=0
                totalpospred+=1
                if PRINT_ERRORS:
                    print (f"ERROR (neg classed as pos, score {score}): {sentence}") 

    # --- Metrics calculations ---

    # Accuracy = (Correct Predictions) / (Total Predictions)
    accuracy = (correct / float(total)) * 100 if total > 0 else 0

    # --- Positive Class Metrics ---
    # Precision (Positive): TP / (TP + FP) -> correctpos / totalpos
    pos_precision = (correctpos / float(totalpospred)) * 100 if totalpospred > 0 else 0

    # Recall (Positive): TP / (TP + FN) -> correctpos / totalpos
    pos_recall = (correctpos / float(totalpos)) * 100 if totalpos > 0 else 0

    # F1-Score (Positive): 2 * (Precision * Recall) / (Precision + Recall)
    pos_f1 = (2 * pos_precision * pos_recall) / (pos_precision + pos_recall) if (pos_precision + pos_recall) > 0 else 0
    
    # --- Negative Class Metrics ---
    # Precision (Negative): TN / (TN + FN) -> correctneg / totalneg
    neg_precision = (correctneg / float(totalnegpred)) * 100 if totalnegpred > 0 else 0

    # Recall (Negative): TN / (TN + FP) -> correctneg / totalneg
    neg_recall = (correctneg / float(totalneg)) * 100 if totalneg > 0 else 0

    # F1-Score (Negative): 2 * (Precision * Recall) / (Precision + Recall)
    neg_f1 = (2 * neg_precision * neg_recall) / (neg_precision + neg_recall) if (neg_precision + neg_recall) > 0 else 0

    # --- Print all the results in a clean format ---
    print(f"\n--- Results for {dataName} ---")
    print(f"Accuracy: {accuracy:.2f}% ({correct}/{total})")
    print("\n--- Positive Class ---")
    print(f"Precision: {pos_precision:.2f}% ({correctpos}/{totalpospred})")
    print(f"Recall:    {pos_recall:.2f}% ({correctpos}/{totalpos})")
    print(f"F1-Score:  {pos_f1:.2f}")
    print("\n--- Negative Class ---")
    print(f"Precision: {neg_precision:.2f}% ({correctneg}/{totalnegpred})")
    print(f"Recall:    {neg_recall:.2f}% ({correctneg}/{totalneg})")
    print(f"F1-Score:  {neg_f1:.2f}")
    print("--------------------------------\n")
# --- END of Step 5.3 ---

# The New Linguistic Rules
This function is "smarter" in one specific way: it understands negation words.

First, we define a set of these words for a very fast lookup:
```negation_words = {"not", "n't", "no", "never", "cannot", "can't", "don't", "doesn't", "didn't"}```
## The Core Logic Explained
The function loops through each word, managing one "state" variable:

- negation_window: A counter, which we set to 3. This represents that negation (like "not") affects the next 3 words.
```
for word in Words:
            
            # 1. Check if the word is a LINGUISTIC MODIFIER
            if word in negation_words:
                negation_window = 3 # Start the 3-word negation window
                continue # This word has no score, so move to the next word
            
            # (There are NO intensifiers/diminishers in this version)
            
            # 2. If the word is NOT a modifier, check if it's a SENTIMENT word
            if word in sentimentDictionary:
                word_score = sentimentDictionary[word] # Get base score (+1 or -1)
                
                # 3. Apply the negation rule
                if negation_window > 0:
                    word_score = -word_score # Invert the score (e.g., +1 becomes -1)
                
                score += word_score
                
            # 4. Decrement the negation window on every loop
            if negation_window > 0:
                negation_window -= 1
```
This logic is simple and effective. For the phrase "not very good":

1. not sets negation_window = 3.

2. very: negation_window changes to 2

3. good (+1) is found.

4. The negation_window (which > 0) inverts its score to -1.

5. The final score is -1.0 (instead of  +1.0)

# The Core Logic: Calculating ```P(Positive|Word)```

The function iterates through every word in the model's vocabulary. For each one, it calculates P(Pos|Word) and stores it in a new dictionary called predictPower.

The calculation is a simplified version of Bayes' Theorem: P(Pos|W) = P(W|Pos) / (P(W|Pos) + P(W|Neg))

## Sorting and Printing Results:
After calculating the P(Pos|Word) for all ~20,000 words, the function finds the most useful ones by sorting this dictionary.

```head:```: This list contains the n words with the lowest probabilities (closest to 0.0), making them the strongest Negative predictors.

```tail```: This list contains the n words with the highest probabilities (closest to 1.0), making them the strongest Positive predictors.

In [7]:
# Print out n most useful predictors
def mostUseful(pWordPos, pWordNeg, pWord, n):
    predictPower={}
    for word in pWord:
        if pWordNeg[word]<0.0000001:
            predictPower[word]=1000000000
        else:
            predictPower[word]=pWordPos[word] / (pWordPos[word] + pWordNeg[word])
            
    sortedPower = sorted(predictPower, key=predictPower.get)
    head, tail = sortedPower[:n], sortedPower[len(predictPower)-n:]
    print ("NEGATIVE:")
    print (head)
    print ("\nPOSITIVE:")
    print (tail)

# Main Script: The Control Panel
This final block of code is not a function. It is the main script that controls the entire experiment. It runs from top to bottom and calls all the functions we've defined to generate the results for report.

## Stage 1: Initialization
First, we initialize all the master dictionaries as empty. These will be "filled in" by our helper functions.

In [8]:
#---------- Main Script --------------------------


sentimentDictionary={} # {} initialises a dictionary [hash function]
sentencesTrain={}
sentencesTest={}
sentencesNokia={}

#initialise datasets and dictionaries
readFiles(sentimentDictionary,sentencesTrain,sentencesTest,sentencesNokia)

pWordPos={} # p(W|Positive)
pWordNeg={} # p(W|Negative)
pWord={}    # p(W) 

# Naive Bayes "Learn & Test"
This block is for the statistical model.

First, we must call trainBayes. This is the "learning" step. It populates the pWordPos and pWordNeg dictionaries

In [9]:
# build conditional probabilities using training data
trainBayes(sentencesTrain, pWordPos, pWordNeg, pWord)

Testing: Now that the model is trained, we test it on all three datasets to get our results for Steps 2 & 3.

In [18]:
#run naive bayes classifier on datasets
testBayes(sentencesTrain,  "Films (Train Data, Naive Bayes)\t", pWordPos, pWordNeg, pWord,0.5)
testBayes(sentencesTest,  "Films  (Test Data, Naive Bayes)\t", pWordPos, pWordNeg, pWord,0.5)
testBayes(sentencesNokia, "Nokia   (All Data,  Naive Bayes)\t", pWordPos, pWordNeg, pWord,0.7)

Naive Bayes classification for Films (Train Data, Naive Bayes)	

--- Results for Films (Train Data, Naive Bayes)	 ---
Accuracy: 88.83% (8548/9623)

--- Positive Class ---
Precision: 89.29% (4225/4732)
Recall:    88.15% (4225/4793)
F1-Score:  88.71

--- Negative Class ---
Precision: 88.39% (4323/4891)
Recall:    89.50% (4323/4830)
F1-Score:  88.94
--------------------------------

Naive Bayes classification for Films  (Test Data, Naive Bayes)	

--- Results for Films  (Test Data, Naive Bayes)	 ---
Accuracy: 77.31% (804/1040)

--- Positive Class ---
Precision: 78.71% (414/526)
Recall:    76.95% (414/538)
F1-Score:  77.82

--- Negative Class ---
Precision: 75.88% (390/514)
Recall:    77.69% (390/502)
F1-Score:  76.77
--------------------------------

Naive Bayes classification for Nokia   (All Data,  Naive Bayes)	

--- Results for Nokia   (All Data,  Naive Bayes)	 ---
Accuracy: 57.89% (154/266)

--- Positive Class ---
Precision: 77.21% (105/136)
Recall:    56.45% (105/186)
F1-Score:  65.22

# Rule-Based "Test & Improve"
This block is for the rule-based model. It has no "training" step.

Baseline Test: First, we test the "naive" testDictionary on all datasets. This gives us our baseline scores (e.g., the 64.14%).

In [25]:
# run sentiment dictionary based classifier on datasets
testDictionary(sentencesTrain,  "Films (Train Data, Rule-Based)\t", sentimentDictionary, 1)
testDictionary(sentencesTest,  "Films  (Test Data, Rule-Based)\t",  sentimentDictionary, 1)
testDictionary(sentencesNokia, "Nokia   (All Data, Rule-Based)\t",  sentimentDictionary, 1)

Dictionary-based classification

--- Results for Films (Train Data, Rule-Based)	 ---
Accuracy: 65.50% (6303/9623)

--- Positive Class ---
Precision: 68.28% (2751/4029)
Recall:    57.40% (2751/4793)
F1-Score:  62.37

--- Negative Class ---
Precision: 63.50% (3552/5594)
Recall:    73.54% (3552/4830)
F1-Score:  68.15
--------------------------------

Dictionary-based classification

--- Results for Films  (Test Data, Rule-Based)	 ---
Accuracy: 62.88% (654/1040)

--- Positive Class ---
Precision: 67.76% (290/428)
Recall:    53.90% (290/538)
F1-Score:  60.04

--- Negative Class ---
Precision: 59.48% (364/612)
Recall:    72.51% (364/502)
F1-Score:  65.35
--------------------------------

Dictionary-based classification

--- Results for Nokia   (All Data, Rule-Based)	 ---
Accuracy: 79.70% (212/266)

--- Positive Class ---
Precision: 88.37% (152/172)
Recall:    81.72% (152/186)
F1-Score:  84.92

--- Negative Class ---
Precision: 63.83% (60/94)
Recall:    75.00% (60/80)
F1-Score:  68.97
-------

# Improved Test
Improved Test: Finally, we test our testDictionaryImproved function. This gives us our "improved" scores (e.g., the 64.43%) so we can compare them to the baseline.

In [26]:
print("\nRUNNING IMPROVED RULE-BASED CLASSIFIER (STEP 5.3)\n")
testDictionaryImproved(sentencesTrain,  "Films (Train Data, Improved Rule-Based)\t", sentimentDictionary, 1)
testDictionaryImproved(sentencesTest,  "Films  (Test Data, Improved Rule-Based)\t",  sentimentDictionary, 1)
testDictionaryImproved(sentencesNokia, "Nokia   (All Data, Improved Rule-Based)\t",  sentimentDictionary, 1)


RUNNING IMPROVED RULE-BASED CLASSIFIER (STEP 5.3)

IMPROVED Dictionary-based classification for: Films (Train Data, Improved Rule-Based)	

--- Results for Films (Train Data, Improved Rule-Based)	 ---
Accuracy: 65.77% (6329/9623)

--- Positive Class ---
Precision: 68.80% (2743/3987)
Recall:    57.23% (2743/4793)
F1-Score:  62.48

--- Negative Class ---
Precision: 63.63% (3586/5636)
Recall:    74.24% (3586/4830)
F1-Score:  68.53
--------------------------------

IMPROVED Dictionary-based classification for: Films  (Test Data, Improved Rule-Based)	

--- Results for Films  (Test Data, Improved Rule-Based)	 ---
Accuracy: 63.08% (656/1040)

--- Positive Class ---
Precision: 68.08% (290/426)
Recall:    53.90% (290/538)
F1-Score:  60.17

--- Negative Class ---
Precision: 59.61% (366/614)
Recall:    72.91% (366/502)
F1-Score:  65.59
--------------------------------

IMPROVED Dictionary-based classification for: Nokia   (All Data, Improved Rule-Based)	

--- Results for Nokia   (All Data, Improv

Analysis (Step 4): This call runs mostUseful. It analyzes the probabilities that trainBayes just calculated and prints the top 100 predictors.

In [21]:
# print most useful words
mostUseful(pWordPos, pWordNeg, pWord, 100)

NEGATIVE:

POSITIVE:
['stylistic', 'delicious', 'smartly', 'gradually', 'melancholy', 'marvel', 'resist', 'ramsay', 'portrayal', 'encounter', 'desperation', 'journey', 'vivid', 'haunting', 'beauty', 'grandeur', 'undeniably', 'russian', 'potent', 'droll', 'understands', 'unflinching', 'speaks', 'loving', 'frailty', 'transcends', 'integrity', 'washington', 'current', 'nuanced', 'ingenious', 'deft', 'explores', 'delightfully', 'smarter', 'joyous', 'touching', 'warm', 'thoughtful', 'helps', 'masterful', 'lane', 'startling', 'bourne', 'lovers', 'format', 'aware', 'poem', 'intimate', 'jealousy', 'pianist', 'powerful', 'visceral', 'iranian', 'record', 'richly', 'subversive', 'answers', 'hopeful', 'sadness', 'evocative', 'playful', 'resonant', 'aspects', 'tour', 'spare', 'sides', 'heartbreaking', 'timely', 'wry', 'unfolds', 'martha', 'lively', 'captivating', 'grown', 'pleasures', 'intense', 'provides', 'gem', 'polished', 'respect', 'vividly', 'heartwarming', 'captures', 'tender', 'detailed', '