Instructions:

The goal of this exercise is to create a class that helps you analyze a text. The text can be a simple string, like “Today, is a happy day”, or it can come from an external text file.

Part I

First, we will analyze a simple string, like “A good book would sometimes cost as much as a good house.”

Create a class called Text that takes a string as an argument and stores the text in an attribute.

Hint: You need to manually copy and paste the text straight into the code.

Implement the following methods:

a method that returns the frequency of a word in the text (assume words are separated by whitespace). If the word is not found, return None or a meaningful message.

a method that returns the most common word in the text.

a method that returns a list of all the unique words in the text.


In [1]:
class Text:
    def __init__(self, text):
        self.text = text
        self.words = self.text.split()

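    def word_frequency(self, word):
        # Compare case-insensitively so that "A" and "a" count as the same word.
        word = word.lower()
        count = sum(1 for w in self.words if w.lower() == word)
        # The exercise asks for None (or a message) when the word is absent.
        return count if count else None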

    def most_common_word(self):
        word_counts = {}
        for word in self.words:
            word = word.lower()
            if word in word_counts:
                word_counts[word] += 1
            else:
                word_counts[word] = 1
        max_count = max(word_counts.values())
        most_common_words = [word for word, count in word_counts.items() if count == max_count]
        return most_common_words

    def unique_words(self):
        return list(set(self.words))

text = Text("A good book would sometimes cost as much as a good house.")
print(text.word_frequency("good"))
print(text.most_common_word())
print(text.unique_words())

2
['a', 'good', 'as']
['book', 'sometimes', 'cost', 'much', 'house.', 'A', 'a', 'as', 'would', 'good']
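
The output above shows two side effects of splitting on whitespace only: 'A' and 'a' are listed as separate unique words, and 'house.' keeps its trailing period. One possible way around this is sketched below: each word is normalized before counting, and collections.Counter from the standard library does the bookkeeping. The class name NormalizedText and the method name most_common_words are hypothetical names introduced for illustration; they are not part of the exercise.

```python
import string
from collections import Counter


class NormalizedText:
    """Hypothetical variant of Text that lowercases words and strips punctuation."""

    def __init__(self, text):
        self.text = text
        # Lowercase each whitespace-separated token and strip punctuation from its ends.
        normalized = (w.lower().strip(string.punctuation) for w in text.split())
        self.words = [w for w in normalized if w]
        self.counts = Counter(self.words)

    def word_frequency(self, word):
        # dict.get returns None when the word does not occur.
        return self.counts.get(word.lower())

    def most_common_words(self):
        # Return every word tied for the highest count, in first-seen order.
        if not self.counts:
            return []
        top = max(self.counts.values())
        return [w for w, c in self.counts.items() if c == top]

    def unique_words(self):
        # Sorted so the result is deterministic, unlike list(set(...)).
        return sorted(self.counts)


sample = NormalizedText("A good book would sometimes cost as much as a good house.")
print(sample.word_frequency("good"))
print(sample.most_common_words())
print(sample.unique_words())
```

Sorting the unique words is a design choice made here for reproducible output; the exercise itself does not require any particular order.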


Part II

Next, we will analyze a text that comes from an external text file. Download the the_stranger.txt file.

Implement a classmethod that returns a Text instance built from the contents of a text file:

    >>> Text.from_file('the_stranger.txt')
Hint: You need to open and read the text from the text file.

Now, use the provided the_stranger.txt file and try using the class you created above.
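
As a rough illustration of the hint, the sketch below reads a plain text file and passes its contents to the constructor. The class name SimpleText is a hypothetical stand-in for Text, the UTF-8 default is an assumption about the file's encoding, and a temporary file is used so that the example does not depend on the_stranger.txt being present.

```python
import os
import tempfile


class SimpleText:
    """Hypothetical stand-in for Text, reduced to what from_file needs."""

    def __init__(self, text):
        self.text = text
        self.words = text.split()

    @classmethod
    def from_file(cls, filename, encoding="utf-8"):
        # Assumption: the file is plain text in the given encoding.
        with open(filename, encoding=encoding) as handle:
            return cls(handle.read())


# Write a small temporary file, then load it through the classmethod.
content = "A good book would sometimes cost as much as a good house."
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(content)
    tmp_path = tmp.name

demo = SimpleText.from_file(tmp_path)
os.remove(tmp_path)
print(len(demo.words))
```

Because from_file calls cls(...) rather than SimpleText(...), a subclass that inherits it gets an instance of the subclass back.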

In [2]:
import zipfile
import nltk
from nltk.tokenize import word_tokenize
import re

nltk.download('punkt')

class Text:
    def __init__(self, text):
        self.text = text
        self.words = word_tokenize(re.sub(r'[^\w\s]', '', text.lower()))

    def word_frequency(self, word):
        return self.words.count(word.lower())

    def most_common_word(self):
        word_counts = {}
        for word in self.words:
            word_counts[word] = word_counts.get(word, 0) + 1
        max_count = max(word_counts.values())
        most_common_words = [word for word, count in word_counts.items() if count == max_count]
        return most_common_words

    def unique_words(self):
        return list(set(self.words))

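    @classmethod
    def from_file(cls, filename):
        # This version expects a zip archive that contains 'the_stranger.txt';
        # the archive member is read as bytes and decoded as UTF-8.
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            with zip_ref.open('the_stranger.txt') as file:
                text = file.read().decode('utf-8')
        return cls(text)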

text = Text.from_file('the_stranger.zip')
print(text.word_frequency("the"))  # Output: frequency of "the"
print(text.most_common_word())  # Output: most common word(s)
print(text.unique_words())  # Output: unique words

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Mollean\AppData\Roaming\nltk_data...


2084
['the']


[nltk_data]   Package punkt is already up-to-date!


Bonus:
Create a class called TextModification that inherits from Text.

Implement the following methods:
a method that returns the text without any punctuation.
a method that returns the text without any English stop words (look up what these are!).
a method that returns the text without any special characters.
Note: Instead of creating a child class, you could also implement these methods as static methods in the Text class.

Note: Feel free to implement/create any attribute, method or function needed to make this work, be creative :)
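
The first note can be sketched as follows: the three cleaning operations become static methods that take a string and return a string, so they can be called without creating an instance. This is only one possible layout. The class name TextCleaner and the tiny SAMPLE_STOP_WORDS set are assumptions made to keep the example self-contained; a real solution would more likely use a full list such as NLTK's stopwords.words('english').

```python
import re
import string


class TextCleaner:
    """Hypothetical holder for the bonus methods, written as static methods."""

    # Assumption: a tiny illustrative stop-word set, not a complete English list.
    SAMPLE_STOP_WORDS = {"a", "an", "the", "as", "is", "of", "would"}

    @staticmethod
    def remove_punctuation(text):
        # string.punctuation holds the ASCII punctuation characters.
        return text.translate(str.maketrans("", "", string.punctuation))

    @staticmethod
    def remove_stopwords(text, stop_words=None):
        stop_words = TextCleaner.SAMPLE_STOP_WORDS if stop_words is None else stop_words
        # Strip surrounding punctuation only for the comparison; keep the word as written.
        kept = [w for w in text.split()
                if w.lower().strip(string.punctuation) not in stop_words]
        return " ".join(kept)

    @staticmethod
    def remove_special_characters(text):
        # Keep ASCII letters, digits and whitespace; drop everything else.
        return re.sub(r"[^a-zA-Z0-9\s]", "", text)


sentence = "A good book would sometimes cost as much as a good house."
print(TextCleaner.remove_punctuation(sentence))
print(TextCleaner.remove_stopwords(sentence))
print(TextCleaner.remove_special_characters("Albert Camus ♦ THE STRANGER"))
```

Static methods suit these operations because none of them needs the state stored on an instance; each one only transforms its input string.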

In [3]:
import string
from nltk.corpus import stopwords

# The stop-word list must be downloaded once before stopwords.words() can be used.
nltk.download('stopwords')

class TextModification(Text):
    def __init__(self, text):
        super().__init__(text)

    def remove_punctuation(self):
        # string.punctuation contains the ASCII punctuation characters.
        return self.text.translate(str.maketrans('', '', string.punctuation))

    def remove_stopwords(self):
        stop_words = set(stopwords.words('english'))
        words = self.text.split()
        filtered_words = [word for word in words if word.lower() not in stop_words]
        return ' '.join(filtered_words)

    def remove_special_characters(self):
        # Keep ASCII letters, digits and whitespace only.
        return re.sub(r'[^a-zA-Z0-9\s]', '', self.text)

text = TextModification.from_file('the_stranger.zip')
print(text.remove_punctuation())
print(text.remove_stopwords())
print(text.remove_special_characters())

Albert Camus ♦ THE STRANGER 

THE 

Stranger 

By ALBERT CAMUS 

Translated from the French 
by Stuart Gilbert 

VINTAGE BOOKS 

A Division of Random House 

NEW YORK 

Albert Camus ♦ THE STRANGER 

VINTAGE BOOKS 

are published by Alfred A Knopf Inc 
and Random House Inc 

Copyright 1942 by Librairie Gallimard as LETRANGER 

Copyright 1946 by ALFRED A KNOPF INC All rights reserved No part of this book may be reproduced in any form without permission in 
writing from the publisher except by a reviewer who may quote brief passages in a review to be printed in a magazine or newspaper Manufactured 
in the United States of America Distributed in Canada by Random House of Canada Limited Toronto 

Albert Camus ♦ THE STRANGER 

Contents 

Contents 3 

Part One 4 

1 4 

II 14 

III 18 

IV 24 

V 28 

VI 32 

Part Two 40 

1 40 

II 46 

III 52 

IV 62 

V 68 

About the Author 77 

Albert Camus ♦ THE STRANGER 

Part One 

MOTHER died today Or maybe yesterday I cant