# Optimization of Algorithms problems

## Exercise 1
### Code Optimization for Text Processing

You are given text-processing code that performs the following operations:

1. Convert all text to lowercase.
2. Remove punctuation marks.
3. Count the frequency of each word.
4. Show the 5 most common words.

The code works, but it is inefficient and can be optimized. Your task is to identify areas that can be improved and rewrite those parts to make the code more efficient and readable.

Points to optimize:

1. **Removal of punctuation marks**: Using `replace` in a loop can be inefficient, especially with long texts. Look for a more efficient way to remove punctuation marks.
2. **Frequency count**: The code checks for the existence of each word in the dictionary and then updates its count. This can be done more efficiently with certain data structures in Python.
3. **Sort and select**: Consider if there is a more direct or efficient way to get the 5 most frequent words without sorting all the words.
4. **Modularity**: Break the code into smaller functions so that each one performs a specific task. This will not only optimize performance, but also make the code more readable and maintainable.
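The first three points can be illustrated with a minimal sketch. The helper names below (`strip_punctuation_loop`, `strip_punctuation_translate`, `top_n_words`) are hypothetical and chosen only for this illustration; they are one possible approach, not the exercise's reference solution.

```python
import heapq
import string
from collections import Counter

# Hypothetical helper names, used only for this sketch.

def strip_punctuation_loop(text):
    # Slow variant: one full pass over the text per punctuation character
    for ch in string.punctuation:
        text = text.replace(ch, "")
    return text

def strip_punctuation_translate(text):
    # Faster variant: a single pass using a translation table that
    # deletes every character listed in string.punctuation
    return text.translate(str.maketrans("", "", string.punctuation))

def top_n_words(words, n=5):
    # Counter tallies the words in one pass, with no manual
    # "is this word already in the dictionary?" check
    counts = Counter(words)
    # heapq.nlargest tracks only n candidates instead of sorting every word
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])

sample = "the cat and the dog and the bird".split()
print(top_n_words(sample, n=2))  # [('the', 3), ('and', 2)]
```

Both punctuation helpers return the same string; the difference is only in how many times the text is scanned. Note that `string.punctuation` covers ASCII punctuation only, so typographic quotes or dashes would need separate handling.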

In [6]:
# Counter counts the occurrences of items in a list or other iterable
from collections import Counter
# string.punctuation provides the ASCII punctuation characters to remove
import string

def remove_punctuation(text):
    # str.maketrans(x, y, z): the first two (empty) strings mean that no
    # characters are mapped to other characters; every character in the
    # third argument (string.punctuation) is deleted
    translator = str.maketrans("", "", string.punctuation)
    return text.translate(translator)

def count_words(text):
    # Split the text on whitespace into a list of words
    words = text.split()
    # Counter tallies how many times each word appears
    return Counter(words)

def get_most_common(frequencies, n=5):
    # frequencies is the Counter returned by count_words;
    # most_common(n) returns the n most frequent (word, count) pairs
    return frequencies.most_common(n)

def process_text(text):
    # Convert the text to lowercase so that "The" and "the" count as the same word
    text = text.lower()
    # Remove punctuation so that "city," and "city" count as the same word
    text = remove_punctuation(text)

    # Count how often each word appears
    frequencies = count_words(text)
    # Get the 5 most common words
    top_5 = get_most_common(frequencies)

    # Print each of the most common words with its number of occurrences
    for word, frequency in top_5:
        print(f"'{word}': {frequency} occurrence(s)")

# Text to be processed
text = """
    In the heart of the city, Emily discovered a quaint little café, hidden away from the bustling streets. 
    The aroma of freshly baked pastries wafted through the air, drawing in passersby. As she sipped on her latte, 
    she noticed an old bookshelf filled with classics, creating a cozy atmosphere that made her lose track of time.
"""
process_text(text)

'the': 5 occurrence(s)
'of': 3 occurrence(s)
'in': 2 occurrence(s)
'a': 2 occurrence(s)
'she': 2 occurrence(s)


## Exercise 2
### Code Optimization for List Processing

You have been given code that performs the following operations on a list of numbers:

1. Filter the list, keeping only the even numbers.
2. Double each remaining number.
3. Add all the numbers.
4. Check if the result is a prime number.

The code provided achieves its goal, but it may be inefficient. Your task is to identify and improve the parts of the code to increase its efficiency.

Points to optimize:

1. **Filter numbers**: The code goes through the original list to filter out even numbers. Consider a more efficient way to filter the list.
2. **Duplication**: The list is traversed multiple times. Is there a way to do this more efficiently?
3. **Summing**: The numbers in a list are summed through a loop. Python has built-in functions that can optimize this.
4. **Function `is_prime`**: While this function is relatively efficient, investigate if there are ways to make it even faster.
5. **Modularity**: Consider breaking the code into smaller functions, each focused on a specific task.
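Points 1 to 3 can be combined into a single traversal. The sketch below assumes that "filter" means keeping the even numbers and "duplicate" means multiplying by two; `sum_doubled_evens` is a hypothetical name used only for this illustration.

```python
def sum_doubled_evens(numbers):
    # Generator expression: filter, double and sum in one pass,
    # without building any intermediate list
    return sum(num * 2 for num in numbers if num % 2 == 0)

print(sum_doubled_evens(range(1, 11)))  # 60
```

A list comprehension works just as well when the doubled values are needed afterwards; the generator form only avoids the extra list when the sum is all that matters.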

Both exercises will help you improve your code performance optimization skills and give you a better understanding of how different data structures and programming techniques can affect the efficiency of your code.

In [10]:
# Check whether n is a prime number
def is_prime(n):
    # Numbers less than or equal to 1 are not prime
    if (n <= 1):
        return False
    # 2 and 3 are prime
    if (n <= 3):
        return True
    # Any other number divisible by 2 or 3 is not prime
    if (n % 2 == 0) or (n % 3 == 0):
        return False
    # Every remaining prime has the form 6k - 1 or 6k + 1, so only the candidate
    # divisors i and i + 2 (5 and 7, 11 and 13, ...) are tested, up to sqrt(n)
    i = 5
    while ((i * i) <= n):
        if (n % i == 0) or (n % (i + 2) == 0):
            return False
        i += 6
    return True


def filter_duplicate(list_):
    # Keep only the even numbers and double each one, in a single pass
    return [num * 2 for num in list_ if num % 2 == 0]

def process_list(list_):
    # Build the list of doubled even numbers
    duplicate_list = filter_duplicate(list_)

    # Sum the doubled numbers and check whether the sum is prime
    sum_ = sum(duplicate_list)
    prime = is_prime(sum_)

    return sum_, prime

list_ = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result, result_prime = process_list(list_)
print(f"Result: {result}, Prime? {'Yes' if result_prime else 'No'}")

Result: 60, Prime? No
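
One way to gain confidence in the 6k ± 1 trial division used by `is_prime` is to compare it against a brute-force check over a small range. This is a minimal sketch: `is_prime_naive` and `is_prime_6k` are hypothetical names, and `is_prime_6k` simply repeats the logic of `is_prime` above so that the block runs on its own.

```python
def is_prime_naive(n):
    # Brute force: n is prime if it is greater than 1 and
    # no number from 2 to n - 1 divides it
    return n > 1 and all(n % d != 0 for d in range(2, n))

def is_prime_6k(n):
    # Same logic as is_prime above, repeated to keep this sketch self-contained
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

# Collect every number on which the two implementations disagree
mismatches = [n for n in range(-5, 500) if is_prime_naive(n) != is_prime_6k(n)]
print(mismatches)  # []
```

The brute-force version performs up to n - 2 divisions per number, while the 6k ± 1 version tests roughly sqrt(n) / 3 candidates, which is why it scales better for large inputs.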
