# TEXT ANALYSIS TOOL

The program is expected to perform the following tasks on a given string:
1. Count words,
2. Count sentences (a sentence is assumed to end with '.', '?', or '!'),
3. Count paragraphs (separated by double new lines "\n\n"),
4. Identify most common word (ignore case),
5. Calculate average word length, and
6. Find words that have at least n characters

Punctuation should be stripped from the text when counting words.
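The sentence and paragraph rules above can also be stated with Python's `re` module. The sketch below is an illustrative cross-check, not the implementation used later in this notebook; `count_sentences_re` and `count_paragraphs_re` are hypothetical names. It assumes the same rules: a sentence is any stretch of text containing at least one letter or digit, ended by `.`, `?`, `!` or the end of the text, and paragraphs are separated by two or more consecutive newlines.

```python
import re


def count_sentences_re(text: str) -> int:
    # Split on every terminator; each piece containing at least one letter or
    # digit is one sentence. Empty pieces (e.g. between "!!") are skipped, and
    # trailing text with no terminator still counts.
    pieces = re.split(r"[.?!]", text)
    return sum(1 for piece in pieces if any(c.isalnum() for c in piece))


def count_paragraphs_re(text: str) -> int:
    # Two or more consecutive newlines separate paragraphs; pieces that are
    # empty or made only of newlines are not paragraphs.
    pieces = re.split(r"\n{2,}", text)
    return sum(1 for piece in pieces if piece.strip("\n"))


sample = "Python is great.\nIt is simple!\n\nUse it?"
print(count_sentences_re(sample), count_paragraphs_re(sample))  # 3 2
```

Splitting first and then discarding pieces without letters or digits is what keeps `"!!"` or a final `"."` from being counted as extra sentences.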



Tasks 1, 4, 5, and 6 all work on words, so it is efficient to first parse the text into a list of words,
where each word is lowercased unless the user asks to keep its original case.

In [2]:
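def parse_text(text: str, cased: bool = False, cut_apos: bool = True) -> list[str]:
    words = []
    word = ""
    in_apos = False  # True while skipping the characters that follow an apostrophe

    # The extra trailing space acts as a final separator, flushing the last word
    for char in text + " ":
        # Word characters: letters, digits, hyphens, and apostrophes when they are kept
        in_word = char.isalnum() or char == '-' or (char == "'" and not cut_apos)
        # Start cutting only at an apostrophe that follows a word (e.g. "Von's"),
        # so an opening quote such as "'hello'" does not drop the whole word
        in_apos = cut_apos and ((char == "'" and word != "") or (in_apos and in_word))

        if in_word and not in_apos:
            word += char
        elif word != "":
            # Drop stray hyphens or apostrophes at the edges (e.g. a lone "-")
            word = word.strip("-'")
            if word != "":
                words.append(word if cased else word.lower())
            word = ""

    return words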

### parse_text(text, cased = False, cut_apos = True)

#### DESCRIPTION:
Converts a string into a list of words. Any character that is neither alphanumeric
nor a hyphen (-) is treated as a separator (apostrophes are controlled by `cut_apos`).

#### PARAMETERS:
__text__
: String
> Text that will be converted to list of words

__cased__
: Boolean, optional
> If `True`, words keep their original case. If `False` (default), every word is converted to lowercase

__cut_apos__
: Boolean, optional
> Controls how an apostrophe (`'`) is handled. If `True` (default), the apostrophe and the characters after it are dropped up to the next separator (`Von's` becomes `von`). If `False`, the apostrophe is kept as part of the word (`O'Niel` becomes `o'niel`)

#### RETURNS:
`list[str]`


Run the following code to see how `parse_text` works:

In [7]:
text = "Hello World!! This is a sample text"
apos = "Hello! This is Von's first laboratory submission"
msg = "This is my friend O'Niel. He wants to play with us."

print("Uncased:", parse_text(text, False))
print("Cased:  ", parse_text(text, True))
print("With apos 1:       ", parse_text(apos))
print("With apos 2:       ", parse_text(msg))
print("With cut_apos=False", parse_text(msg, cut_apos=False))

Uncased: ['hello', 'world', 'this', 'is', 'a', 'sample', 'text']
Cased:   ['Hello', 'World', 'This', 'is', 'a', 'sample', 'text']
With apos 1:        ['hello', 'this', 'is', 'von', 'first', 'laboratory', 'submission']
With apos 2:        ['this', 'is', 'my', 'friend', 'o', 'he', 'wants', 'to', 'play', 'with', 'us']
With cut_apos=False ['this', 'is', 'my', 'friend', "o'niel", 'he', 'wants', 'to', 'play', 'with', 'us']
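For comparison, a similar word list can be produced with a regular expression. This is a hypothetical alternative (`WORD_RE` and `regex_words` are illustrative names), not a replacement for `parse_text`: it keeps apostrophes inside words, like `cut_apos=False`, and only accepts a hyphen or apostrophe between letters or digits, so a lone `-` is never treated as a word. The class `[^\W_]` approximates `str.isalnum()` (word characters minus the underscore).

```python
import re

# Runs of letters/digits, optionally joined by single hyphens or apostrophes,
# e.g. "high-level" or "O'Niel"; [^\W_] means "word character except underscore"
WORD_RE = re.compile(r"[^\W_]+(?:[-'][^\W_]+)*")


def regex_words(text: str, cased: bool = False) -> list[str]:
    words = WORD_RE.findall(text)
    return words if cased else [word.lower() for word in words]


print(regex_words("This is my friend O'Niel. He wants to play with us."))
```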


Note that `parse_text` is used only for the word-based tasks: 1 (count words), 4 (identify the most common word),
5 (calculate the average word length), and 6 (find words that have at least n characters).
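Task 5 is the arithmetic mean of the word lengths in the parsed list, where $N$ is the number of parsed words, $\lvert w_i \rvert$ is the length of the $i$-th word, and a hyphen counts as a character. As a worked check against the report produced later for the sample text (56 words whose lengths add up to 379 characters):

```latex
\bar{L} = \frac{1}{N}\sum_{i=1}^{N} \lvert w_i \rvert = \frac{379}{56} \approx 6.77
```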

## USING OOP FOR EXECUTING PROGRAMS

To keep the code organized, the tasks are encapsulated in a class called `TextAnalysis`.

In [16]:
from collections import Counter


class TextAnalysis:
    def __init__(self, text: str, analyse: bool = False, min_length: int = -1):
        self.text = text
        # Lowercased word list shared by tasks 1, 4, 5 and 6
        self.parsed_text = parse_text(text)

        if analyse:
            if min_length < 1:
                raise ValueError("min_length must be a positive integer when analyse is True")
            self.analyse_text(min_length)

    def analyse_text(self, min_length: int = 1) -> None:
        # Print the results of all six tasks as one report
        print("=== TEXT ANALYSIS REPORT ===")
        print("Total Words:", self.count_words())
        print("Total Sentences:", self.count_sentences())
        print("Total Paragraphs:", self.count_paragraphs())

        most_common = self.most_common_word()
        if most_common is None:
            print("Most Common Word: (no words found)")
        else:
            word, occurs = most_common
            print(f'Most Common Word: "{word}" (appears {occurs} times)')
        print(f"Average Word Length: {self.average_word_length():.2f} characters")
        print(f"Words with at least {min_length} characters:")
        self.print_list(self.find_long_words(min_length), 5, 4)

    def count_words(self) -> int:
        # Task 1
        return len(self.parsed_text)

    def count_sentences(self) -> int:
        # Task 2: a '.', '?' or '!' ends a sentence only if an alphanumeric
        # character appeared since the previous sentence ended, so "!!" counts once
        stc_count = 0
        in_stc = False

        for char in self.text:
            if char.isalnum():
                in_stc = True
            elif char in ".?!" and in_stc:
                stc_count += 1
                in_stc = False

        # Trailing text without a final terminator still counts as a sentence
        return stc_count + 1 if in_stc else stc_count

    def count_paragraphs(self) -> int:
        # Task 3: two consecutive newlines close the current paragraph
        parag_count = 0
        in_newline = False
        in_parag = False

        for char in self.text:
            if char == '\n':
                if in_parag and in_newline:
                    parag_count += 1
                    in_parag = False
                in_newline = True
            else:
                in_parag = True
                in_newline = False

        # The last paragraph has no "\n\n" after it
        return parag_count + 1 if in_parag else parag_count

    def most_common_word(self) -> tuple[str, int] | None:
        # Task 4: case-insensitive; on a tie, the word that appears first wins
        if not self.parsed_text:
            return None
        return Counter(word.lower() for word in self.parsed_text).most_common(1)[0]

    def average_word_length(self) -> float:
        # Task 5: return 0.0 instead of dividing by zero when there are no words
        if not self.parsed_text:
            return 0.0
        return sum(len(word) for word in self.parsed_text) / len(self.parsed_text)

    def find_long_words(self, min_length: int) -> list[str]:
        # Task 6: unique words only; a set has no fixed order, so the
        # order of the result can change between runs
        return [word for word in set(self.parsed_text) if len(word) >= min_length]

    @staticmethod
    def print_list(li: list, col: int, indent: int = 0) -> None:
        # Print each item as "item", with `col` items per line
        indentation = " " * indent
        print(end=indentation)
        for idx, item in enumerate(li):
            print(f'"{item}", ', end="")
            if (idx + 1) % col == 0:
                print(end="\n" + indentation)

Important Note: Execute the previous code blocks before executing the code below.

In [17]:
sample_text = "Python is a high-level programming language. It is known for its simplicity and readability.\n\
Python supports multiple programming paradigms including procedural, object-oriented, and functional programming.\n\
\n\
Many developers choose Python for its extensive libraries and frameworks.\n\
The language is widely used in web development, data science, artificial intelligence, and automation.\n\
Python's philosophy emphasizes code readability and simplicity!"

min_length = 8
text_analysis = TextAnalysis(sample_text)
text_analysis.analyse_text(min_length)

=== TEXT ANALYSIS REPORT ===
Total Words: 56
Total Sentences: 6
Total Paragraphs: 2
Most Common Word: "and" (appears 5 times)
Average Word Length: 6.77 characters
Words with at least 8 characters:
    "readability", "philosophy", "supports", "intelligence", "functional", 
    "programming", "extensive", "high-level", "multiple", "libraries", 
    "emphasizes", "procedural", "including", "frameworks", "simplicity", 
    "development", "language", "object-oriented", "paradigms", "artificial", 
    "developers", "automation", 

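Because `find_long_words` collects the words through a `set`, the order printed above is arbitrary and may change between runs, and `print_list` leaves a trailing comma after the last word. A hypothetical variant (`long_words_in_order` and `format_columns` are illustrative names) keeps the words in first-seen order with `dict.fromkeys` and builds each row with `str.join`:

```python
def long_words_in_order(words: list[str], min_length: int) -> list[str]:
    # dict.fromkeys drops duplicates while keeping the first-seen order
    return [word for word in dict.fromkeys(words) if len(word) >= min_length]


def format_columns(items: list[str], col: int, indent: int = 0) -> str:
    # Quote each item and put `col` items on each indented row
    pad = " " * indent
    rows = [
        pad + ", ".join(f'"{item}"' for item in items[start:start + col])
        for start in range(0, len(items), col)
    ]
    # Rows are joined with ",\n" so no comma follows the last item
    return ",\n".join(rows)


words = ["python", "is", "a", "high-level", "programming", "language", "python"]
print(format_columns(long_words_in_order(words, 8), 2, 4))
```

Returning a string instead of printing also makes the formatting easy to test.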
# Student Grade Calculator

The program is expected to:
1. Calculate the average score of each student, and
2. Assign the corresponding letter grade to each average.
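The letter grades follow fixed thresholds (A from 90, B from 80, C from 70, D from 60, otherwise F), which `get_letter_grade` in the code cell below checks from highest to lowest. The same lookup can be sketched with the standard `bisect` module, following the grade-lookup example in the `bisect` documentation; `letter_grade` is an illustrative name, not part of this notebook.

```python
from bisect import bisect_right


def letter_grade(average: float,
                 breakpoints: tuple[int, ...] = (60, 70, 80, 90),
                 grades: str = "FDCBA") -> str:
    # bisect_right returns how many breakpoints are <= average, so a score
    # exactly on a breakpoint (e.g. 90) receives the higher grade, like `>=`
    return grades[bisect_right(breakpoints, average)]


print([letter_grade(avg) for avg in (93.0, 87.25, 75.5, 60, 42)])  # ['A', 'B', 'C', 'D', 'F']
```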

In [None]:
class GradeCalc:
    @staticmethod
    def get_grade_report(students: list[str], scores: list[list[int]], exam_count: int = -1) -> None:
        if len(students) != len(scores):
            raise TypeError(f"Number of students ({len(students)}) does not match number of scores ({len(scores)})")

        # Validate every student first so invalid input never prints a partial report
        if exam_count > 0:
            for student, score in zip(students, scores):
                if len(score) != exam_count:
                    raise ValueError(f"{student} took {len(score)} exams, but {exam_count} exams were given.")

        print("=== STUDENT GRADE REPORT ===")
        for student, score in zip(students, scores):
            # Pad "name:" to 10 characters so the columns line up (names of up to 8 characters)
            spaces = " " * (9 - len(student))
            ave = GradeCalc.calculate_average(score)
            grd = GradeCalc.get_letter_grade(ave)
            print("%s:%sAverage: %.2f Grade: %s" % (student, spaces, ave, grd))

    @staticmethod
    def calculate_average(scores: list[int]) -> float:
        return sum(scores) / len(scores)

    @staticmethod
    def get_letter_grade(average: float) -> str:
        # Thresholds are checked from highest to lowest (dicts keep insertion order)
        grade_letter = {90: 'A', 80: 'B', 70: 'C', 60: 'D'}
        for grade in grade_letter:
            if average >= grade:
                return grade_letter[grade]
        return 'F'

### GradeCalc.get_grade_report(students, scores, exam_count = -1)

#### DESCRIPTION:
Prints a grade report for every student in the list, showing:
1. The student's average score
2. The letter grade corresponding to that average

#### PARAMETERS:
__students__
: List[String]
> List of names of students who took the exam

__scores__
: List[List[Integer]]
> List of each student's exam scores. The nth element of `scores` holds the scores of the student at the nth index of `students`

__exam_count__
: Integer, Optional
> Expected number of exams per student. If positive, every student must have exactly this many scores; otherwise (default `-1`) the check is skipped

#### RETURNS:
`None`

#### RAISES:
> __TypeError__
: number of students does not match number of scores `len(students) != len(scores)`

> __ValueError__
: `exam_count` is positive and a student's number of exams does not match it: `exam_count > 0 and exam_count != len(scores[idx])`

After executing the code block above, run the code block below to test it.

In [29]:
students = ["Alice", "Bob", "Charlie", "Diana", "Eve"]
scores = [
    [85, 92, 78, 94],    # Alice's scores
    [76, 83, 91, 87],    # Bob's scores
    [94, 89, 96, 93],    # Charlie's scores
    [67, 74, 82, 79],    # Diana's scores
    [88, 85, 90, 92]     # Eve's scores
]
GradeCalc.get_grade_report(students, scores, 4)

=== STUDENT GRADE REPORT ===
Alice:    Average: 87.25 Grade: B
Bob:      Average: 84.25 Grade: B
Charlie:  Average: 93.00 Grade: A
Diana:    Average: 75.50 Grade: C
Eve:      Average: 88.75 Grade: B
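The report above pads each name to a fixed width of 10 characters (`" " * (9 - len(student))` after the colon), so a name longer than 8 characters would run straight into `Average:`. A hypothetical variant (`grade_for` and `format_report` are illustrative names) sizes the name column from the longest name and uses `statistics.mean` for the average. It raises `ValueError` on mismatched lengths, whereas the notebook's version uses `TypeError`. For the five students above it produces the same lines.

```python
from statistics import mean


def grade_for(average: float) -> str:
    # Same thresholds as get_letter_grade: A >= 90, B >= 80, C >= 70, D >= 60
    for threshold, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if average >= threshold:
            return letter
    return "F"


def format_report(students: list[str], scores: list[list[int]]) -> list[str]:
    if len(students) != len(scores):
        raise ValueError("students and scores must have the same length")
    # Name column = longest name + ":" + two spaces, so every row lines up
    width = max((len(name) for name in students), default=0) + 3
    lines = ["=== STUDENT GRADE REPORT ==="]
    for name, marks in zip(students, scores):
        avg = mean(marks)
        lines.append(f"{name + ':':<{width}}Average: {avg:.2f} Grade: {grade_for(avg)}")
    return lines


for line in format_report(["Alice", "Maximiliano"], [[85, 92, 78, 94], [90, 95, 88, 91]]):
    print(line)
```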
