<center>

# Extended Learning Portfolio

**ISYS2001 Introduction to Business Programming**

<small>School of Management

Semester 1 2023
</small>
</center>

This examination is open-book. You may use a variety of resources, including textbooks, web content, and AI tools, to complete the exam. However, all work submitted must be your own, and any work or ideas that are not your own must be properly referenced.

Please refrain from discussing your responses to these questions with fellow students. If you have any questions about this assessment, please contact the instructor directly. Any questions submitted to the instructor concerning this assessment will be posted, together with the responses, to the discussion forum.

The examination duration is a total of 24 hours. This time frame begins at the predetermined exam start time and does not depend on when you commence the download. If you have accommodations under a CAP arrangement, the duration of the exam will be adjusted accordingly. If you feel that your CAP accommodations have not been satisfactorily implemented, please reach out to me immediately.

This examination consists of four questions in total, and you are required to provide answers to all of them. Each question should be contained within its own notebook, with the exception of Question Four, which can be compiled in a Microsoft Word document. To submit your answers, please establish a private GitHub repository and upload all of your responses to the designated questions, inclusive of the Word document for Question Four, to this repository.

Upon completion of all the questions, proceed to download the zip file of your GitHub repository. This file should be submitted via the link provided on Blackboard. Additionally, a separate submission of the Word document for Question Four must be made through the Turnitin link available on Blackboard.

# Question 1

Write a Python program within this or another notebook that performs advanced file analysis. The program should prompt the user to enter the path to a text file and allow them to choose from various analysis options:

* Counting the number of lines.
* Counting the total number of words.
* Counting the total number of characters, both including and excluding whitespace.
* Identifying the frequency of each word in the text.
* Identifying the top 5 most common words in the text.

After receiving the user input, your program should read the file and perform the chosen analysis, outputting the results in a clear, human-readable format.

*Question subparts:*

1. Implement the notebook program as described above. Your program should be robust and handle possible edge cases, such as file not found or incorrect input from the user.
2. Write a brief description of your program, explaining how to use it and what each analysis option does. This description should be written as if for other developers or users who might use your tool.
3. Write a few test cases to validate your tool. Consider edge cases such as empty files, very large files, files with unusual characters, and so on.
4. Discuss how you would modify your tool to analyze binary files, or large files that do not fit into memory. What kind of analysis could be useful in these cases?
5. Provide a few example text files and show the output of your program when run with these files.

Remember to include necessary error handling in your program to make it robust and reliable.

**[40 Marks]**

In [None]:
import os

In [None]:
import string

In [None]:
from collections import Counter

In [None]:
def count_lines(file_path):
    try:
        with open(file_path, 'r') as file:
            line_count = sum(1 for _ in file)
            print(f"Number of lines: {line_count}")
    except FileNotFoundError:
        print("File not found.")

def count_words(file_path):
    try:
        with open(file_path, 'r') as file:
            word_count = 0
            for line in file:
                words = line.split()
                word_count += len(words)
            print(f"Number of words: {word_count}")
    except FileNotFoundError:
        print("File not found.")

def count_characters(file_path, include_whitespace=True):
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        if include_whitespace:
            char_count = len(content)
            label = "including whitespace"
        else:
            # Skip every whitespace character (spaces, tabs, newlines), not only " " and "\n".
            char_count = sum(1 for char in content if not char.isspace())
            label = "excluding whitespace"
        print(f"Number of characters ({label}): {char_count}")
    except FileNotFoundError:
        print("File not found.")

def read_words(file_path):
    # Split on whitespace, then strip surrounding punctuation and lower-case,
    # so that "The", "the" and "the." are counted as the same word.
    with open(file_path, 'r') as file:
        words = (word.strip(string.punctuation).lower() for word in file.read().split())
        return [word for word in words if word]

def word_frequency(file_path):
    try:
        word_count = Counter(read_words(file_path))
        print("Word frequency:")
        for word, count in word_count.items():
            print(f"{word}: {count}")
    except FileNotFoundError:
        print("File not found.")

def top_common_words(file_path, num_words=5):
    try:
        word_count = Counter(read_words(file_path))
        common_words = word_count.most_common(num_words)
        print(f"Top {num_words} most common words:")
        for word, count in common_words:
            print(f"{word}: {count}")
    except FileNotFoundError:
        print("File not found.")

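def analyze_file(file_path=None):
    # Prompt for the path when the caller does not supply one.
    if file_path is None:
        file_path = input("Enter the path to the text file: ").strip()

    if not os.path.isfile(file_path):
        print("Invalid file path.")
        return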

    print(f"Analyzing file: {file_path}\n")
    print("Choose an analysis option:")
    print("1. Count the number of lines.")
    print("2. Count the total number of words.")
    print("3. Count the total number of characters (including whitespace).")
    print("4. Count the total number of characters (excluding whitespace).")
    print("5. Identify the frequency of each word in the text.")
    print("6. Identify the top 5 most common words in the text.")

    option = input("Enter the number corresponding to your choice: ")
    print()

    if option == '1':
        count_lines(file_path)
    elif option == '2':
        count_words(file_path)
    elif option == '3':
        count_characters(file_path, include_whitespace=True)
    elif option == '4':
        count_characters(file_path, include_whitespace=False)
    elif option == '5':
        word_frequency(file_path)
    elif option == '6':
        top_common_words(file_path)
    else:
        print("Invalid option.")
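
Subpart 4 asks how the tool could cope with binary files or with files too large to fit in memory. One possible approach, sketched here under stated assumptions, is to stream the file instead of calling `read()` on all of it: text is processed one line at a time, and binary data is processed in fixed-size chunks. The function names `stream_word_frequency` and `byte_frequency`, the 64 KiB chunk size, and the choice to lower-case words and strip punctuation are illustrative assumptions, not part of the answer above.

```python
import string
from collections import Counter


def stream_word_frequency(file_path, encoding="utf-8"):
    """Count words line by line so the whole file is never held in memory."""
    counts = Counter()
    # errors="replace" keeps the scan going if the file contains undecodable bytes.
    with open(file_path, "r", encoding=encoding, errors="replace") as file:
        for line in file:
            for raw_word in line.split():
                word = raw_word.strip(string.punctuation).lower()
                if word:
                    counts[word] += 1
    return counts


def byte_frequency(file_path, chunk_size=64 * 1024):
    """Count how often each byte value (0-255) occurs, reading in chunks."""
    counts = Counter()
    with open(file_path, "rb") as file:
        while chunk := file.read(chunk_size):
            counts.update(chunk)  # iterating over bytes yields integers
    return counts
```

For a binary file, a byte histogram like this could feed simple analyses such as total size or the share of non-printable bytes; memory use stays bounded by the chunk size plus the number of distinct words or byte values.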

# Question 2

**Question:**

As a new junior developer at EcommEasy, an e-commerce platform company, you're assigned to debug and refactor a piece of code left by one of the departed team members. This code is meant to determine if a customer is eligible for a certain promotional discount based on their total order value.

Unfortunately, the code is obfuscated, lacks documentation, and doesn't function as expected. Your task is to identify the error, correct it, and refactor the code according to the best industry practices, which include clear variable naming, detailed comments, error handling, and overall code readability. 

Here is the problematic code:

```python
def promo(o):
    p = None
    if o > 50 and o < 100:
        p = 5
    elif o > 100:
        p = 10
    else:
        p = 0
    if o <= 0 or o is None:
        raise ValueError("Order value not valid!")
    return o*(p/100)
```

*Question subparts:*

1. What is the error in the above code and why does it fail to calculate the promotional discount correctly?
2. How would you correct the error?
3. How would you refactor this code to align it with industry best practices? Write the refactored code within this or another notebook. Please include appropriate variable names, comments, error handling, and a basic explanation of the code for a layperson.
4. Write a few test cases to confirm the code is functioning as expected.

Hint: The promo function is supposed to apply a 5% discount if the order total is between \$50 and \$100 (inclusive), and a 10% discount if the order total exceeds \$100. Orders less than or equal to \$0 or null should raise an exception.

**[20 Marks]**

In [None]:
def calculate_promotional_discount(order_value):
    """
    Calculates the promotional discount based on the order value.

    Args:
        order_value (float): The total value of the customer's order.

    Returns:
        float: The promotional discount amount.

    Raises:
        ValueError: If the order value is less than or equal to 0 or None.

    Example:
        calculate_promotional_discount(75.0)
        Output: 3.75
    """
    if order_value is None or order_value <= 0:
        raise ValueError("Invalid order value! The order value must be greater than 0.")

    discount_percentage = 0.0
    if 50 <= order_value <= 100:
        discount_percentage = 5.0
    elif order_value > 100:
        discount_percentage = 10.0

    promotional_discount = order_value * (discount_percentage / 100.0)
    return promotional_discount


In [None]:
# Test case 1: Order value within the $50-$100 range
order_value = 75.0
discount = calculate_promotional_discount(order_value)
print(f"Order value: {order_value}, Promotional discount: {discount}")
# Output: Order value: 75.0, Promotional discount: 3.75

# Test case 2: Order value exceeding $100
order_value = 150.0
discount = calculate_promotional_discount(order_value)
print(f"Order value: {order_value}, Promotional discount: {discount}")
# Output: Order value: 150.0, Promotional discount: 15.0

# Test case 3: Order value less than or equal to $0
order_value = 0.0
try:
    discount = calculate_promotional_discount(order_value)
    print(f"Order value: {order_value}, Promotional discount: {discount}")
except ValueError as e:
    print(f"Error: {str(e)}")
# Output: Error: Invalid order value! The order value must be greater than 0.

# Test case 4: Order value is None
order_value = None
try:
    discount = calculate_promotional_discount(order_value)
    print(f"Order value: {order_value}, Promotional discount: {discount}")
except ValueError as e:
    print(f"Error: {str(e)}")
# Output: Error: Invalid order value! The order value must be greater than 0.
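
The test cases above do not exercise the boundary values named in the hint, which is where the original `promo` function goes wrong. The sketch below reproduces the original function unchanged next to a condensed corrected version and compares them at \$50 and \$100. The name `promo_fixed` is a hypothetical stand-in for the refactored function, introduced only so that this block is self-contained.

```python
# Original function from the question, reproduced unchanged.
def promo(o):
    p = None
    if o > 50 and o < 100:
        p = 5
    elif o > 100:
        p = 10
    else:
        p = 0
    if o <= 0 or o is None:
        raise ValueError("Order value not valid!")
    return o*(p/100)


# Condensed corrected version (hypothetical name): validate first,
# then use inclusive bounds for the 5% band.
def promo_fixed(order_value):
    if order_value is None or order_value <= 0:
        raise ValueError("Order value not valid!")
    if order_value > 100:
        rate = 10
    elif order_value >= 50:
        rate = 5
    else:
        rate = 0
    return order_value * rate / 100


# Boundary values where the two versions disagree.
for value in (50, 100):
    print(value, promo(value), promo_fixed(value))
# prints "50 0.0 2.5" and then "100 0.0 5.0"

# The original compares None with a number before it validates the input,
# so it raises TypeError instead of the intended ValueError.
try:
    promo(None)
except TypeError as error:
    print("original:", type(error).__name__)
```

The strict comparisons (`>` and `<`) exclude exactly \$50 and exactly \$100, so both fall through to the 0% branch, and the validation runs too late to catch `None`.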


# Question 3

You have been given a task to develop a simple script that extracts the title and text of news articles from a list of URLs. Your company, DataScrapr, is working on a project to analyze the sentiment of news articles from several news outlets, and this task is the first step in the data collection process.

The task requires you to use Python, along with the `Newspaper3k` library, which is a simple and efficient tool for extracting and curating articles.

Here is your task:

1. Write a Python script that takes a list of URLs as input. Each URL points to a news article.
2. For each URL, your script should extract the article's title and the full text of the article.
3. The output of your script should be a list of dictionaries. Each dictionary should contain the URL, the article title, and the article text.
4. Include error checking in your script to handle possible issues with the URLs or the extraction process. 

*Question subparts:*

1. Implement the above-described script.
2. Explain how your script works and the role of the `Newspaper3k` library in the script.
3. How would you handle potential issues, such as a URL that doesn't point to a valid article or network errors?
4. Provide a few example URLs and show the output of your script when run with these URLs.

Note: Please be mindful of the terms of use for any website you are scraping, and make sure to respect the website's robots.txt file.
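
One way to respect `robots.txt` programmatically is Python's standard-library `urllib.robotparser`. The sketch below is a minimal illustration, assuming the rules are supplied as a list of lines so that it runs without network access; the helper name `build_robot_rules`, the sample rules, and the `example.com` URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt_lines):
    """Parse robots.txt content that is already available as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_txt_lines)
    return parser


# Hypothetical rules: everything is allowed except paths under /private/.
sample_rules = build_robot_rules([
    "User-agent: *",
    "Disallow: /private/",
])

print(sample_rules.can_fetch("*", "https://example.com/news/story"))    # True
print(sample_rules.can_fetch("*", "https://example.com/private/page"))  # False
```

Against a live site, the parser's `set_url()` and `read()` methods can fetch the real `robots.txt` before `can_fetch()` is called for each article URL.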

**[25 marks]**

In [None]:
%pip install newspaper3k


In [None]:
from newspaper import Article, Config

def extract_articles(urls):
    articles = []

    # Customizing the configuration to handle potential issues
    config = Config()
    config.request_timeout = 10  # Give up on a slow server after 10 seconds

    for url in urls:
        try:
            article = Article(url, config=config)
            article.download()  # Download the article's HTML content
            article.parse()  # Extract the title and body text from the HTML
            # article.nlp() is not called: it is only needed for keywords and
            # summaries, and it fails when the NLTK tokenizer data is missing.

            article_data = {
                'url': url,
                'title': article.title,
                'text': article.text
            }

            articles.append(article_data)
        except Exception as e:
            # Network failures and pages that cannot be parsed end up here;
            # report the problem and continue with the next URL.
            print(f"Error processing URL: {url}")
            print(str(e))
            print()

    return articles
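
Subpart 3 asks how to handle URLs that do not point to a valid article. One inexpensive first line of defence is to reject obviously malformed URLs before any download is attempted. The sketch below uses only the standard library; the helper names `is_http_url` and `split_urls` are hypothetical, and the check covers URL syntax only, so it cannot tell whether a page exists or actually contains an article.

```python
from urllib.parse import urlparse


def is_http_url(url):
    """Return True when url is a string with an http(s) scheme and a host."""
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. an unclosed IPv6 bracket.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_urls(urls):
    """Separate syntactically valid URLs from the ones to skip and report."""
    valid, rejected = [], []
    for url in urls:
        (valid if is_http_url(url) else rejected).append(url)
    return valid, rejected
```

The valid list could then be passed to `extract_articles`, while the rejected list is reported back to the user rather than silently dropped.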


# Question 4

Write a reflective report that identifies and discusses what you perceive as the most impactful activity within this course unit, and its contributions to your understanding of an ISYS2001 activity or topic. **Additionally, please incorporate all your weekly journal entries as an appendix to this report.** The report should be prepared in a Microsoft Word document, which will be submitted via the Turnitin link available on Blackboard.

**[15 marks]**