<center>

# Extended Learning Portfolio

**ISYS2001 Introduction to Business Programming**

<small>School of Management

Semester 1 2023
</small>
</center>

This examination is an open-book format. You are permitted to utilise a variety of resources, including textbooks, web content, and AI tools, to complete the exam. However, it's important to note that all work submitted must be your own. Any work or ideas not your own must be properly referenced. 

Please refrain from discussing your responses to these questions with fellow students. If you have any questions about this assessment, please contact the instructor directly. Any question submitted to the instructor about this assessment, together with the response, will be posted to this discussion forum.

The examination duration is a total of 24 hours. This time frame begins at the predetermined exam start time and does not depend on when you commence the download. If you have accommodations under a CAP arrangement, the duration of the exam will be adjusted accordingly. If you feel that your CAP accommodations have not been satisfactorily implemented, please reach out to me immediately.

This examination consists of four questions in total, and you are required to provide answers to all of them. Each question should be contained within its own notebook, with the exception of Question Four, which can be compiled in a Microsoft Word document. To submit your answers, please establish a private GitHub repository and upload all of your responses to the designated questions, inclusive of the Word document for Question Four, to this repository.

Upon completion of all the questions, proceed to download the zip file of your GitHub repository. This file should be submitted via the link provided on Blackboard. Additionally, a separate submission of the Word document for Question Four must be made through the Turnitin link available on Blackboard.

# Question 1

Write a Python program within this or another notebook that performs advanced file analysis. The program should prompt the user to enter the path to a text file and allow them to choose from various analysis options:

* Counting the number of lines.
* Counting the total number of words.
* Counting the total number of characters, both including and excluding whitespace.
* Identifying the frequency of each word in the text.
* Identifying the top 5 most common words in the text.

After receiving the user input, your program should read the file and perform the chosen analysis, outputting the results in a clear, human-readable format.

*Question subparts:*

1. Implement the notebook program as described above. Your program should be robust and handle possible edge cases, such as file not found or incorrect input from the user.
2. Write a brief description of your program, explaining how to use it and what each analysis option does. This description should be written as if for other developers or users who might use your tool.
3. Write a few test cases to validate your tool. Consider edge cases such as empty files, very large files, files with unusual characters, and so on.
4. Discuss how you would modify your tool to analyze binary files, or large files that do not fit into memory. What kind of analysis could be useful in these cases?
5. Provide a few example text files and show the output of your program when run with these files.

Remember to include necessary error handling in your program to make it robust and reliable.

**[40 Marks]**

**My Explanation**

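2.) The program first asks the user to enter the path to a text file (for example, the path to the README file). Once a path has been entered, it displays a menu of six analysis options. The user chooses one and the program prints the result:

* Option 1 counts the number of lines in the file.
* Option 2 counts the total number of words, where a word is any run of characters separated by whitespace.
* Option 3 counts every character, including whitespace.
* Option 4 counts characters, excluding whitespace.
* Option 5 lists how often each word appears, ignoring case and punctuation.
* Option 6 lists the five most common words and how often each appears.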

3.) To test the tool, I created an empty file and a large file, and I also used a path that does not exist on my laptop. I ran the program with each of these three paths, chose analysis option 3 each time, and the output was the same in each case.
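
The manual checks above could also be written as automated tests. The cell below is a minimal sketch that assumes UTF-8 text files. It repeats `count_lines()` and `count_words()` so that it runs on its own; in the notebook you would call the original functions instead. `write_temp_file()` and `run_tests()` are hypothetical helper names. Each case writes a temporary file, checks the counts and deletes the file, and the last check confirms that a missing path raises `FileNotFoundError`.

```python
import os
import tempfile


# Copies of the notebook's count_lines() and count_words(), repeated here only
# so this cell runs on its own; in the notebook, call the real functions.
def count_lines(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return sum(1 for _ in file)


def count_words(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return sum(len(line.split()) for line in file)


def write_temp_file(text):
    """Hypothetical helper: write text to a temporary UTF-8 file and return its path."""
    with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                     delete=False) as tmp:
        tmp.write(text)
        return tmp.name


def run_tests():
    cases = [
        # (description, file contents, expected lines, expected words)
        ("empty file", "", 0, 0),
        ("unusual characters", "café naïve 日本語\n😀 emoji line\n", 2, 6),
        ("no trailing newline", "one two\nthree", 2, 3),
        ("large file", "lorem ipsum\n" * 100_000, 100_000, 200_000),
    ]
    for description, text, expected_lines, expected_words in cases:
        path = write_temp_file(text)
        try:
            assert count_lines(path) == expected_lines, description
            assert count_words(path) == expected_words, description
        finally:
            # Always clean up, even if an assertion fails
            os.remove(path)

    # A path that does not exist should raise FileNotFoundError
    missing = os.path.join(tempfile.gettempdir(), "no_such_file_12345.txt")
    try:
        count_lines(missing)
    except FileNotFoundError:
        pass
    else:
        raise AssertionError("Expected FileNotFoundError for a missing file")

    print("All file-analysis tests passed.")


run_tests()
```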

4.) Large files that do not fit into memory are a problem, because reading the whole file in one go might not be possible. Instead of reading the whole file into memory, the program can process it in chunks (or line by line).

The analysis options can then be rewritten to work on streamed data so that results are built up incrementally: for example, counting lines chunk by chunk, or identifying the frequency of specific patterns or keywords without storing the whole file in memory.
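
A minimal sketch of the streaming approach, assuming UTF-8 text and a 1 MiB chunk size chosen arbitrarily. `stream_text_stats()` counts lines and words one line at a time, so memory use depends on the longest line rather than the file size (a file made of one enormous line would still be read in one go). For binary files, where words and lines are not meaningful, `byte_histogram()` reads fixed-size chunks and counts how often each byte value occurs, and `looks_binary()` applies a common but imperfect heuristic: a NUL byte near the start of the file. All three function names are hypothetical.

```python
from collections import Counter

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time (an arbitrary assumption)


def stream_text_stats(file_path, encoding='utf-8'):
    """Count lines and words without loading the whole file into memory.

    Iterating over a text file yields one line at a time, so only the
    current line is held in memory.
    """
    line_count = 0
    word_count = 0
    # errors='replace' substitutes undecodable bytes instead of crashing
    with open(file_path, 'r', encoding=encoding, errors='replace') as file:
        for line in file:
            line_count += 1
            word_count += len(line.split())
    return line_count, word_count


def byte_histogram(file_path, chunk_size=CHUNK_SIZE):
    """Return (total bytes, Counter of byte values 0-255) for any file."""
    counts = Counter()
    total = 0
    with open(file_path, 'rb') as file:
        # iter() with a sentinel keeps calling file.read() until it returns b''
        for chunk in iter(lambda: file.read(chunk_size), b''):
            counts.update(chunk)  # iterating over bytes yields ints 0-255
            total += len(chunk)
    return total, counts


def looks_binary(file_path, sample_size=8192):
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with open(file_path, 'rb') as file:
        return b'\x00' in file.read(sample_size)
```

A byte histogram like this can feed further analysis, for example estimating entropy to spot compressed or encrypted data, while memory use stays bounded by the chunk size.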


```python
import os
import string
from collections import Counter


def count_lines(file_path):
    """Count the number of lines in a text file.

    Args:
        file_path (str): Path to the text file.

    Returns:
        int: The number of lines in the file.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        return sum(1 for _ in file)


def count_characters(file_path, include_whitespace=True):
    """Count the characters in a text file, with or without whitespace.

    Args:
        file_path (str): Path to the text file.
        include_whitespace (bool, optional): Whether to include whitespace
            (spaces, tabs and newlines) in the count. Defaults to True.

    Returns:
        int: The number of characters in the file.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        content = file.read()
    if include_whitespace:
        return len(content)
    # Skip every whitespace character, not only the space character
    return sum(1 for char in content if not char.isspace())


def count_words(file_path):
    """Count the words (whitespace-separated tokens) in a text file."""
    with open(file_path, 'r', encoding='utf-8') as file:
        return sum(len(line.split()) for line in file)


def word_frequency(file_path):
    """Return a Counter mapping each lower-case word to its frequency.

    Punctuation is removed first so that "Word," and "word" count as one word.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        content = file.read()
    content = content.translate(str.maketrans('', '', string.punctuation))
    return Counter(content.lower().split())


def top_common_words(file_path, num_words=5):
    """Return the num_words most common words as (word, count) pairs."""
    return word_frequency(file_path).most_common(num_words)


def analyze_file(file_path):
    """Show the analysis menu, run the chosen option and print the result."""
    # Check the path before showing the menu so the user gets early feedback
    if not os.path.isfile(file_path):
        print(f"Error: '{file_path}' does not exist or is not a file.")
        return

    print(f"Analyzing file: {file_path}")
    print("-" * 40)
    print("1. Count the number of lines")
    print("2. Count the total number of words")
    print("3. Count the total number of characters (including whitespace)")
    print("4. Count the total number of characters (excluding whitespace)")
    print("5. Identify the frequency of each word in the text")
    print("6. Identify the top 5 most common words in the text")
    print("-" * 40)
    choice = input("Enter your choice (1-6): ").strip()

    try:
        if choice == '1':
            print(f"Number of lines: {count_lines(file_path)}")
        elif choice == '2':
            print(f"Number of words: {count_words(file_path)}")
        elif choice == '3':
            char_count = count_characters(file_path)
            print(f"Number of characters (including whitespace): {char_count}")
        elif choice == '4':
            char_count = count_characters(file_path, include_whitespace=False)
            print(f"Number of characters (excluding whitespace): {char_count}")
        elif choice == '5':
            word_count = word_frequency(file_path)
            print("Word frequency:")
            if not word_count:
                print("(the file contains no words)")
            for word, count in word_count.items():
                print(f"{word}: {count}")
        elif choice == '6':
            common_words = top_common_words(file_path)
            print("Top 5 most common words:")
            if not common_words:
                print("(the file contains no words)")
            for word, count in common_words:
                print(f"{word}: {count}")
        else:
            print("Invalid choice! Please enter a number from 1 to 6.")
    except UnicodeDecodeError:
        print("Error: the file could not be decoded as UTF-8 text.")
    except OSError as error:
        # Covers permission errors and files removed after the check above
        print(f"Error: could not read the file ({error}).")


if __name__ == "__main__":
    file_path = input("Enter the path to the text file: ").strip()
    analyze_file(file_path)
```

# Question 2

**Question:**

As a new junior developer at EcommEasy, an e-commerce platform company, you're assigned to debug and refactor a piece of code left by one of the departed team members. This code is meant to determine if a customer is eligible for a certain promotional discount based on their total order value.

Unfortunately, the code is obfuscated, lacks documentation, and doesn't function as expected. Your task is to identify the error, correct it, and refactor the code according to the best industry practices, which include clear variable naming, detailed comments, error handling, and overall code readability. 

Here is the problematic code:

```python
def promo(o):
    p = None
    if o > 50 and o < 100:
        p = 5
    elif o > 100:
        p = 10
    else:
        p = 0
    if o <= 0 or o is None:
        raise ValueError("Order value not valid!")
    return o*(p/100)
```

*Question subparts:*

1. What is the error in the above code and why does it fail to calculate the promotional discount correctly?
2. How would you correct the error?
3. How would you refactor this code to align it with industry best practices? Write the refactored code within this or another notebook. Please include appropriate variable names, comments, error handling, and a basic explanation of the code for a layperson.
4. Write a few test cases to confirm the code is functioning as expected.

Hint: The promo function is supposed to apply a 5% discount if the order total is between \$50 and \$100 (inclusive), and a 10% discount if the order total exceeds \$100. Orders less than or equal to \$0 or null should raise an exception.

**[20 Marks]**

**My Explanation**

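1.) The code has two errors. First, the comparisons are strict (`o > 50 and o < 100`, then `o > 100`), so orders of exactly \$50 or \$100 fall through to the `else` branch and receive no discount, even though the 5% band is meant to include both boundaries. Second, the validation check runs after the comparisons, so calling `promo(None)` fails with a `TypeError` on `o > 50` instead of raising the intended `ValueError`.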
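
To see these faults concretely, the original `promo` function can be run unchanged on the boundary values and on `None`. This is only a demonstration of the existing behaviour, not part of the fix.

```python
def promo(o):
    # Original code, copied unchanged so its behaviour can be observed
    p = None
    if o > 50 and o < 100:
        p = 5
    elif o > 100:
        p = 10
    else:
        p = 0
    if o <= 0 or o is None:
        raise ValueError("Order value not valid!")
    return o*(p/100)


# Boundary orders: the hint says both should get 5%, but the strict
# comparisons let them fall through to the 0% branch
print(promo(50))   # 0.0 instead of 2.5
print(promo(100))  # 0.0 instead of 5.0

# None should raise ValueError, but "None > 50" fails first with a TypeError
try:
    promo(None)
except TypeError as error:
    print(f"TypeError: {error}")
```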

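2.) I would correct these errors by validating the order value first (checking for `None` before any numeric comparison) and by changing the first condition to `50 <= o <= 100`, so that both boundaries receive the 5% discount.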

3.) To align the code with industry best practice, I have used descriptive variable and function names, added comments and error handling, and kept the code modular and readable.

The refactored code is written in Python and defines a function called `calculate_order_discount`. It takes one input, `order_value`, which is the total value of a customer's order, and returns the dollar amount of the discount.

The function first checks that the order value is valid. If it is missing (`None`) or is less than or equal to 0, the function raises a `ValueError`. If the order value is between \$50 and \$100 (inclusive), the discount rate `discount_percent` is set to 5; if it is greater than \$100, the rate is set to 10; otherwise no discount applies.

Lastly, the function returns the order value multiplied by `discount_percent / 100`. For example, a \$75 order qualifies for 5%, so the discount is 75 × 5 / 100 = \$3.75.

4.) The test cases are written below the function in the code cell.

```python
import math


def calculate_order_discount(order_value):
    """Return the promotional discount (in dollars) for an order.

    Rules:
        * $50 to $100 inclusive        -> 5% discount
        * more than $100               -> 10% discount
        * more than $0 but under $50   -> no discount
        * None, zero or negative value -> ValueError
    """
    # Validate first: comparing None with a number would raise a TypeError
    if order_value is None or order_value <= 0:
        raise ValueError("Order value not valid!")

    # Choose the discount rate; both boundaries of the 5% band are inclusive
    if 50 <= order_value <= 100:
        discount_percent = 5
    elif order_value > 100:
        discount_percent = 10
    else:
        discount_percent = 0

    # Calculate the discount amount and return it
    return order_value * (discount_percent / 100)


if __name__ == '__main__':
    # Expected discounts, including the $50 and $100 boundaries
    test_cases = [
        (30, 0), (40, 0), (50, 2.5), (65, 3.25), (75, 3.75),
        (100, 5.0), (110, 11.0), (150, 15.0), (200, 20.0),
    ]
    for order_value, expected in test_cases:
        result = calculate_order_discount(order_value)
        # isclose avoids false failures caused by floating-point rounding
        assert math.isclose(result, expected), (order_value, result)

    # Invalid order values must raise ValueError with the expected message
    for invalid_value in (0, -1, None):
        try:
            calculate_order_discount(invalid_value)
        except ValueError as error:
            assert str(error) == "Order value not valid!"
        else:
            raise AssertionError(f"No ValueError raised for {invalid_value!r}")

    print("All tests passed.")
```


# Question 3

You have been given a task to develop a simple script that extracts news articles' title and text from a list of URLs. Your company, DataScrapr, is working on a project to analyze the sentiment of news articles from several news outlets and this task is the first step in the data collection process.

The task requires you to use Python, along with the `Newspaper3k` library, which is a simple and efficient tool for extracting and curating articles.

Here is your task:

1. Write a Python script that takes a list of URLs as input. Each URL points to a news article.
2. For each URL, your script should extract the article's title and the full text of the article.
3. The output of your script should be a list of dictionaries. Each dictionary should contain the URL, the article title, and the article text.
4. Include error checking in your script to handle possible issues with the URLs or the extraction process. 

*Question subparts:*

1. Implement the above-described script.
2. Explain how your script works and the role of the `Newspaper3k` library in the script.
3. How would you handle potential issues, such as a URL that doesn't point to a valid article or network errors?
4. Provide a few example URLs and show the output of your script when run with these URLs.

Note: Please be mindful of the terms of use for any website you are scraping, and make sure to respect the website's robots.txt file.

**[25 marks]**

**My Explanation**

2) This script uses the `newspaper` library (installed as the `newspaper3k` package) to retrieve the content of news stories from a list of URLs. For each URL in the list, the script:

1. Creates an `Article` object from the URL.
2. Calls `download()` to fetch the page's content.
3. Calls `parse()` to analyse the downloaded content and extract the article's title and main text.
4. Stores the URL, title and text in a dictionary and appends that dictionary to a list of articles.

3) If downloading or parsing a URL fails (for example, because the URL does not point to a valid article or a network error occurs), the script catches the exception, prints an error message for that URL and moves on to the next one, so a single bad URL does not stop the whole run. At the end, `extract_articles()` returns the list of article dictionaries, and each dictionary is printed to the console.
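
One common way to make this error handling more robust, sketched here as an assumption rather than as part of the submitted script, is to reject malformed URLs before downloading and to retry transient network failures with exponential backoff. `is_valid_url()` and `with_retries()` are hypothetical helper names. The URL check only validates the format; it cannot tell whether a page is actually a news article.

```python
import time
from urllib.parse import urlparse


def is_valid_url(url):
    """Return True if url looks like an absolute http(s) URL (format check only)."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def with_retries(task, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call task(), retrying with exponential backoff if it raises retry_on.

    Waits base_delay, then 2 * base_delay, and so on between attempts
    (attempts should be at least 1). The last exception is re-raised if
    every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In `extract_articles()`, the download and parse steps for one URL could be wrapped in a small function and passed to `with_retries()`, with `retry_on` set to the exceptions the script treats as failures; URLs that fail `is_valid_url()` could be skipped with a message before any download is attempted.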

The `newspaper` library handles both downloading a news article and scraping its content. It provides a simple interface for extracting metadata such as authors, publish dates and article titles, as well as the article text itself. It is built on top of the beautifulsoup4, lxml and requests packages, works with articles from a wide variety of news sources, and also offers basic natural-language processing (the `nlp()` method) for extracting keywords and a summary.


```python
import requests
from newspaper import Article
from newspaper.article import ArticleException

# List of URLs for article extraction
example_urls = [
    'https://rosettacode.org/wiki/Wireworld',
    'https://react.dev/reference/react/useState',
]


def extract_articles(urls):
    """Download each URL and return a list of {"url", "title", "text"} dictionaries."""
    # List to store the extracted articles
    articles = []
    # Iterate through the URLs
    for url in urls:
        try:
            # Create an Article object for the URL, then download and parse it
            article = Article(url)
            article.download()
            article.parse()
        except (ArticleException, requests.exceptions.RequestException, ValueError) as error:
            # newspaper3k reports a failed download as an ArticleException when parse() is called
            print(f"Error processing {url}: {error}")
            continue
        # Create a dictionary to store the article details and add it to the list
        articles.append({
            "url": url,
            "title": article.title,
            "text": article.text,
        })
    return articles


# Call extract_articles() on the sample URLs and print the extracted articles
if __name__ == "__main__":
    articles = extract_articles(example_urls)
    for article in articles:
        print(article)
```

# Question 4

Write a reflective report that identifies and discusses what you perceive as the most impactful activity within this course unit, and its contributions to your understanding of an ISYS2001 activity or topic. **Additionally, please incorporate all your weekly journal entries as an appendix to this report.** The report should be prepared in a Microsoft Word document, which will be submitted via the Turnitin link available on Blackboard.

**[15 marks]**