# Week 4 Assignment - Joseph Adams


### General Design
When developing the solution to each question, I tried to reuse code wherever possible.  
 
To that end, I have generally broken each task into three main areas:
- Loading of data;
- Creating the report content;
- Outputting the data;

On some occasions this sequence has extra functions added for specific tasks; these are documented as needed.

My aim in this notebook is to follow a standard approach for all questions.
This approach will be:
- Create a markdown block which notes any specific issues encountered, assumptions made, etc.
- Create a code block with the solution I have devised.
- Finish with a markdown block which has, if applicable, any additional notes on specific areas of the code.


# Task 1

Task 1 involved loading and working with the contents of two CSV files.  The question itself drew attention to two issues:
- Books in the bookloans file which are not present in the books file.
- The loan start and loan end in Excel Epoch format.

While working on the files, I also noticed a problem with the file encoding.  When reading a file, I would see a `\ufeff1` appear at the start of the data.  Some investigation suggested that this was a symptom of reading a file saved as "UTF-8 with BOM" using plain "UTF-8" encoding.  Opening the CSV files in VS Code showed that both CSV files did indeed have the "UTF-8 with BOM" encoding (as per the graphic below):

![image.png](attachment:image.png)
(Only 1 file shown for brevity)
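
The effect of the two encodings can be reproduced without the assignment files. The snippet below is a minimal sketch: the `raw` bytes are a made-up one-line sample (not taken from books.csv) that starts with a UTF-8 byte order mark, decoded once as plain UTF-8 and once as `utf-8-sig`.

```python
import codecs
import csv
import io

# Hypothetical sample: a UTF-8 byte order mark followed by one CSV row.
raw = codecs.BOM_UTF8 + "1,Some Title,Some Author\n".encode("utf-8")

# Decoding as plain UTF-8 keeps the BOM as the first character of the data.
plain_rows = list(csv.reader(io.StringIO(raw.decode("utf-8"))))

# Decoding as utf-8-sig removes the BOM when one is present.
sig_rows = list(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(repr(plain_rows[0][0]))
print(repr(sig_rows[0][0]))
```

With plain UTF-8 the first field still carries the BOM character, so converting it with `int()` would fail; with `utf-8-sig` the first field is just the book number.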

I was also unsure of the "format" of the report which was asked for.  The question paper mentioned not only reading CSV files but also writing them, so it could be assumed that the report was to be in that format.  What I have done is create a CSV report and also output that report to the screen.  This is highlighted in the post-code write-up.  It should be noted that many of the input and output functions are reused across the questions.

One final thing was that the books file had a header row while the bookloans file did not.  Initially I sought to fix this by skipping headers, but ultimately decided not to.  More information is given in the relevant sections.

In [9]:
import sys
import csv
import datetime

# csv file names
BOOKLOANSFILE = 'bookloans.csv'
BOOKSFILE = 'books.csv'
REPORTFILETASK_1 = 'Task_1_report.csv'


def open_file(file_in, skip_header=False):
    """Open a CSV file and return its rows.

        Wraps a basic file open in a function to allow this
        to be reused.  When I was examining the books and
        bookloans files I noticed that when reading the
        files, I always ended up with an unexpected
        "\ufeff1" appearing right at the beginning of the data
        from the file.  Further examination with VS Code
        showed that the encoding of the file was "utf-8 with
        BOM" and that this sequence was a symptom.
        To correct this, the file had to be opened with
        "utf-8-sig" encoding.  I did consider if the
        encoding should also be made a parameter, but
        ultimately for these tasks kept this hard coded
        as both files share the same encoding.

        I also try to catch errors using a try/except
        block.  I have one specific error, "FileNotFoundError",
        and a generic catch-all.  Error messages for both
        these occasions make use of string formatting to pass
        the file_in parameter, which further enhances reusability.
        I further call sys.exit(1), which stops the execution
        and allows the OS to know there was a problem.

        I have checked the FileNotFoundError handling by passing
        in the name of a non-existent file.

        Keyword arguments:
            file_in -- the name of the file which will be opened
            skip_header -- some files have "Headers" in the
            first row, if this is True we ignore these
            (defaults to False).  Ultimately, I never used
            this parameter

        Returns:
            data -- a list of tuples, one per row
    """
    # set up a variable to hold the data from the file
    data = []
    try:
        with open(file_in, encoding='utf-8-sig', mode='r') as file:
            reader = csv.reader(file)
            if skip_header is True:
                # if skip_header is true, then skip to the next line
                next(reader, None)
            # make each row in the input file into a tuple and add
            # this to a list to be returned.
            data = [tuple(row) for row in reader]
    # handle file not found errors with a nice error message
    except FileNotFoundError:
        print('File {} does not exist.'.format(file_in))
        sys.exit(1)
    except Exception:  # generic catch-all error message
        print(
            'Trying to open {} failed.  No further information was available'
            .format(file_in))
        sys.exit(1)
    # return the data variable (the file contents)
    return data


def convert_epoch_to_readable(date_in):
    """Convert Excel Epoch time to a string.

    This strange epoch format was a challenge for me.  In order to check if
    the date was in 2019 (which was asked for in the question) the only
    way I could think of was to convert this format to something "human"
    readable and then check for the year.  This function simply takes
    in the epoch date and passes out a human readable string.

    Note:
    Excel has a strange idea of epoch, as well as a bug inherited from
    Lotus 1-2-3 (1900 is treated as a leap year).  To convert the date to
    a readable format I used the following code, which was taken from
    the Stack Overflow post referenced:

    Before using this function, I tested this by taking the date
    columns in the bookloans.csv and converting them to "date"
    format in Excel.
    I then checked that the dates returned by the function matched
    those of Excel.  They did.

        Keyword arguments:
            date_in -- the date (in Excel epoch format) as an int

        Returns:
            date_out -- a date in string format (eg 12-01-1996)
    """
    # Mac and PC Excel can have different "start dates".  I am using a
    # PC, so only the 1900 date system is handled here.
    EXCEL_DATE_SYSTEM_PC = 1900

    # Subtract 2 days: one because day 1 is 1 January 1900, and one for
    # the non-existent 29 February 1900 which Excel counts.
    date_out = (datetime.date(EXCEL_DATE_SYSTEM_PC, 1, 1)
                + datetime.timedelta(date_in - 2))

    return date_out.strftime("%d-%m-%Y")


def contains_date(stringDate, dateToCheck):
    """Check if a string contains another string.

    This takes two strings and returns a boolean indicating whether there
    is a substring match.  This is named "contains_date" but it is
    essentially a substring matching function and could be renamed without
    losing any readability or functionality.

        Keyword arguments:
            stringDate -- a date in string format, in truth this could be ANY
            string.
            dateToCheck -- a part of the date to check, in truth this could be
            any string.

        Returns:
            True if dateToCheck is found in stringDate, otherwise False
    """
    # basic substring matching here; `in` already gives us True or False.
    return dateToCheck in stringDate


def build_task1_details(loans):
    """Generate the content for the report.

    Keyword arguments:
        loans -- the list of data from the loans CSV file

    Returns:
        report_dict -- a dictionary (keyed on booknumber) containing
        times loaned in 2019, the title, the author
    """
    report_dict = {}
    for row in loans:
        # make things easier to reference
        book_number = row[0]
        book_number = int(book_number)

        # get the loaned date, convert this to an int and convert
        # to a readable string
        loaned_date_epoch = row[2]
        loaned_date_epoch = int(loaned_date_epoch)
        loaned_date_readable = convert_epoch_to_readable(loaned_date_epoch)

        # We store the info on the books in a nested dictionary using the
        # book number as a key.
        # First check if the book which is being loaned is present in our
        # cut down books.csv.  `books` is the module-level list loaded
        # from books.csv.
        book_found = False
        for book in books:
            book_ref = book[0]
            # The header row is read in with the data, so make sure we
            # don't try to convert it to an int.
            if book_ref != 'Number' and int(book_ref) == book_number:
                book_found = True
                # take the title and author from the matching row
                book_title = book[1]
                book_author = book[2]
                break

        # Loans of books which are not in books.csv are discarded, and
        # only loans which started in 2019 are counted.
        if book_found and contains_date(loaned_date_readable, '2019'):
            if book_number not in report_dict:
                # this is a new book
                report_dict[book_number] = {
                    "timesOut2019": 0,
                    "title": book_title,
                    "author": book_author,
                }
            report_dict[book_number]["timesOut2019"] += 1

    return report_dict


def sort_and_generate_task1_content(data_in):
    """Sort the report dictionary and convert it to a list.

    Sorts the dictionary which is passed in on the field
    timesOut2019 and then converts this to a list for use
    with the CSV writer function.

    This function also prints the report to the screen to provide
    the user with some indication that the task is underway.

    Keyword arguments:
        data_in -- the dictionary which was built up in the
        build_task1_details function.

    Returns:
        report_out -- a list sorted on timesOut2019

    """
    report_out = []
    print(
        "{:<12} {:<60} {:<40} {:<10}".format
        ('Book Number', 'Title', 'Author', 'Times Loaned 2019')
        )
    for k, v in sorted(data_in.items(), key=lambda e: e[1]["timesOut2019"]):
        item = [k, v['title'], v['author'], v['timesOut2019']]
        report_out.append(item)
        print(
            "{:<12} {:<60} {:<40} {:<10}".format
            (k, v['title'], v['author'], v['timesOut2019'])
        )
    return report_out


def create_report(file_name, headers, content):
    """Creates a report file (currently a CSV)

    Keyword arguments:
        file_name -- the file we wish to create
        headers -- a list containing the header columns for the report
        content -- a list with the data to be populated in the file.
    """
    try:
        with open(file_name, 'w', newline="") as out_file:
            csvwriter = csv.writer(out_file)
            csvwriter.writerow(headers)
            csvwriter.writerows(content)
    except Exception:  # generic catch-all for any file error
        print(
            'Trying to create {} failed.  No further information was available'
            .format(file_name)
        )
        sys.exit(1)


# Task 1 specific code

# The headers for our report
report_headers = ['Book Number', 'Title', 'Author', 'Times Loaned 2019']

# Open the CSV files and assign them to variables
books = open_file(BOOKSFILE, False)
loans = open_file(BOOKLOANSFILE, False)

temp = build_task1_details(loans)
contents = sort_and_generate_task1_content(temp)
create_report(REPORTFILETASK_1, report_headers, contents)


Book Number  Title                                                        Author                                   Times Loaned 2019
13           Birth of a Theorem                                           Villani, Cedric                          1         
35           Surely You're Joking Mr Feynman                              Feynman, Richard                         2         
57           Textbook of Economic Theory                                  Stonier, Alfred                          2         
59           Learning OpenCV                                              Bradsky, Gary                            2         
67           Argumentative Indian, The                                    Sen, Amartya                             2         
108          Ashenden of The British Agent                                Maugham, William S                       2         
71           All the President's Men                                      Woodward, Bob                        

## Notes on Task 1

The solution for Task 1 follows this function flow:<br>
__report_headers = ['Book Number', 'Title', 'Author', 'Times Loaned 2019']__<br>
__books = open_file(BOOKSFILE, False)__<br>
__loans = open_file(BOOKLOANSFILE, False)__<br>
<br>
__temp = build_task1_details(loans)__<br>
__contents = sort_and_generate_task1_content(temp)__<br>
__create_report(REPORTFILETASK_1, report_headers, contents)__<br>

#### File Opening

The first thing to tackle is opening and retrieving data.  This is done using the `open_file()` function which is shown below:

```python
def open_file(file_in, skip_header=False):
    # set up a variable to hold the data from the file
    data = []
    try:
        with open(file_in, encoding='utf-8-sig', mode='r') as file:
            reader = csv.reader(file)
            if skip_header is True:
                # if skip_header is true, then skip to the next line
                next(reader, None)
            # make each row in the input file into a tuple and add 
            # this to a list to be returned.
            data = [tuple(row) for row in reader]
    # handle file not found errors with a nice error message
    except FileNotFoundError:
        print('File {} does not exist.'.format(file_in))
        sys.exit(1)
    except Exception:  # generic catch-all error message
        print(
            'Trying to open {} failed.  No further information was available'
            .format(file_in))
        sys.exit(1)
    # return the data variable (the file contents)
    return data
```

This function wraps a standard Python file open in a try/except block.  It has been written to catch a FileNotFoundError, and the error message uses placeholders so that the name of the file being looked for is shown in the error.
Within this function, the contents of the file are read and returned as a list of tuples.
This `open_file()` function is used for both the books and the bookloans files, and also for all other questions in this assignment.

***NOTE*** As mentioned in the Task 1 notes above, one of the files had a header row and the other did not.  My initial thought was to add a parameter to `open_file()` to allow the header to be skipped (or not), which would keep the function reusable.  Ultimately, I decided to handle the header row in the `build_task1_details()` function and left the __skip_header__ parameter with its default of False.  This, to me, increases the reusability of the function without compromising on features.
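
For reference, the header-skipping option can be tried out on its own. The following is a minimal sketch rather than the notebook's function: `read_rows` is a hypothetical helper that reads from a string instead of a file, and the `Title` and `Author` column names in the sample are assumed.

```python
import csv
import io

# Hypothetical stand-in for a file which starts with a header row.
sample = "Number,Title,Author\n1,Some Title,Some Author\n"


def read_rows(text, skip_header=False):
    """Return the CSV rows in `text` as a list of tuples."""
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        # next() with a default does not raise if the input is empty
        next(reader, None)
    return [tuple(row) for row in reader]


with_header = read_rows(sample)
without_header = read_rows(sample, skip_header=True)
```

Passing `None` as the default to `next()` means an empty file simply produces an empty list instead of raising `StopIteration`.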

#### Building the data

Next we need to build the data.  This is done using the `build_task1_details()` function as shown below:

```python
def build_task1_details(loans):
    report_dict = {}
    for row in loans:
        # make things easier to reference
        book_number = row[0]
        book_number = int(book_number)

        # get the loaned date, convert this to an int and convert
        # to a readable string
        loaned_date_epoch = row[2]
        loaned_date_epoch = int(loaned_date_epoch)
        loaned_date_readable = convert_epoch_to_readable(loaned_date_epoch)

        # We store the info on the books in a nested dictionary using the
        # book number as a key.
        # First check if the book which is being loaned is present in our
        # cut down books.csv.  `books` is the module-level list loaded
        # from books.csv.
        book_found = False
        for book in books:
            book_ref = book[0]
            # The header row is read in with the data, so make sure we
            # don't try to convert it to an int.
            if book_ref != 'Number' and int(book_ref) == book_number:
                book_found = True
                # take the title and author from the matching row
                book_title = book[1]
                book_author = book[2]
                break

        # Loans of books which are not in books.csv are discarded, and
        # only loans which started in 2019 are counted.
        if book_found and contains_date(loaned_date_readable, '2019'):
            if book_number not in report_dict:
                # this is a new book
                report_dict[book_number] = {
                    "timesOut2019": 0,
                    "title": book_title,
                    "author": book_author,
                }
            report_dict[book_number]["timesOut2019"] += 1

    return report_dict
```
    
This function takes in the list of tuples which was created by the `open_file()` function and works through it row by row.

The design for this function is quite simple.  We build a __dictionary__ (in actuality a __dictionary of dictionaries__) using the book number as the key.

Each sub-dictionary will be of the following structure:

```
bookNumber: {'author': <string> book author,
             'timesOut2019': <int> count of times loaned in 2019,
             'title': <string> title of the book}
```
The `build_task1_details()` function will, for each entry in the loans list:
- extract the book number
- convert the book number to an int (as the reader will make this a string)
- extract the loaned date
- convert the loaned date to an int
- convert the loaned date to a "human readable" format
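
The date steps above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's own code: `excel_serial_to_date` and `loaned_in_year` are hypothetical helper names, the Windows 1900 date system is assumed, and the year is compared as a number instead of through a formatted string.

```python
import datetime

# Day zero for the (assumed) Windows Excel 1900 date system.  Starting
# from 1899-12-30 absorbs the non-existent 29 February 1900 which Excel
# counts, so results are only correct for dates from 1 March 1900 on.
EXCEL_DAY_ZERO = datetime.date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a datetime.date."""
    return EXCEL_DAY_ZERO + datetime.timedelta(days=int(serial))


def loaned_in_year(serial, year):
    """Return True if the serial date falls in the given year."""
    return excel_serial_to_date(serial).year == year


print(excel_serial_to_date("43466"))
print(loaned_in_year(43466, 2019))
```

Adding the serial number to 30 December 1899 gives the same date as adding the serial number minus two to 1 January 1900.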

If the book number __is not__ present in the report_dict dictionary, then a new sub-dictionary is created under that book number.
The function also checks if the book was loaned in 2019.  If it was, then it either sets timesOut2019 to 1 or increments the existing value.  If the book was not loaned in 2019, then the loan is not counted: it neither adds a new entry nor changes an existing count.
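
The counting rule described above can be looked at in isolation. This is a minimal sketch, not the notebook's function: `loans_sample` and its (book number, year) pairs are made up for the example, and `setdefault` is used here in place of an explicit check for a missing key.

```python
# Hypothetical (book_number, loan_year) pairs standing in for parsed rows.
loans_sample = [(13, 2019), (35, 2019), (35, 2018), (35, 2019)]

report = {}
for book_number, loan_year in loans_sample:
    if loan_year != 2019:
        # loans from other years are not counted at all
        continue
    # create the sub-dictionary on first sight, then count the loan
    entry = report.setdefault(book_number, {"timesOut2019": 0})
    entry["timesOut2019"] += 1

print(report)
```

Starting the count at 0 and always adding 1 covers both the "first loan" and the "later loan" cases with the same line.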

One additional thing to note: the question says that books which appear in the bookloans file but not in the books file should be discarded.  This is accomplished by looking up each loaned book number in the books data; if no matching book is found, that book is left out of the report dictionary.
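
One way to picture the discarding step is with a lookup table. The sketch below is an alternative illustration, not the approach used in the cell above (which loops over the books list): `books_sample`, `loan_numbers` and `lookup` are hypothetical names, and the header labels other than `Number` are assumed.

```python
# Hypothetical rows in the shape returned for the books file, header included.
books_sample = [
    ("Number", "Title", "Author"),
    ("13", "Birth of a Theorem", "Villani, Cedric"),
]

# Map book number -> (title, author), skipping the header row.
lookup = {
    int(row[0]): (row[1], row[2])
    for row in books_sample
    if row[0] != "Number"
}

# Hypothetical book numbers taken from the loans file; 999 is unknown.
loan_numbers = [13, 999, 13]
known_loans = [number for number in loan_numbers if number in lookup]

print(known_loans)
```

A dictionary lookup avoids scanning the whole books list for every loan, which matters more as the files grow.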

#### Sorting the data

The `build_task1_details()` function returns a dictionary of dictionaries keyed on book number, which is not in the order requested in the task.  We now need to sort the data.

We do the sorting in the `sort_and_generate_task1_content()` function.  
It sorts on the dictionary element "timesOut2019".  By default this is done in increasing order, which is what the question has asked for.
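
The sort itself can be tried on a small dictionary. This is a minimal sketch with made-up entries (`report_sample` is a hypothetical name, and the single-letter titles and authors are placeholders), using the same `sorted()` call with a `key` function.

```python
# Hypothetical report dictionary with placeholder titles and authors.
report_sample = {
    35: {"title": "B", "author": "Y", "timesOut2019": 2},
    13: {"title": "A", "author": "X", "timesOut2019": 1},
    57: {"title": "C", "author": "Z", "timesOut2019": 2},
}

# Sort the (book_number, details) pairs on the loan count, ascending.
ordered = sorted(report_sample.items(), key=lambda item: item[1]["timesOut2019"])

# Flatten each pair into a row suitable for csv.writer.writerows().
rows = [[num, d["title"], d["author"], d["timesOut2019"]] for num, d in ordered]

print(rows)
```

Because `sorted()` is stable, books with the same count keep the order in which they were added to the dictionary.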

In order to provide feedback, we also print out the sorted list.  This can be seen with the two print statements: the first prints the headers to the screen and the second prints the actual values.  String formatting is used here to line the data up in columns.
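
The column layout comes from the format specification used in both print statements. As a small sketch (the `ROW_FORMAT` constant is a hypothetical name introduced here so the specification is written only once):

```python
# "<" left-aligns each value and the number sets the minimum column width.
ROW_FORMAT = "{:<12} {:<60} {:<40} {:<10}"

header = ROW_FORMAT.format('Book Number', 'Title', 'Author', 'Times Loaned 2019')
line = ROW_FORMAT.format(13, 'Birth of a Theorem', 'Villani, Cedric', 1)

print(header)
print(line)
```

A value longer than its column width, such as the last header, is printed in full rather than truncated, so only the final column can safely overflow.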

This function also returns the sorted list, which is what the next stage uses.

#### Outputting a CSV

```python
def create_report(file_name, headers, content):
    try:
        with open(file_name, 'w', newline="") as out_file:
            csvwriter = csv.writer(out_file)
            csvwriter.writerow(headers)
            csvwriter.writerows(content)
    except Exception:  # generic catch-all for any file error
        print(
            'Trying to create {} failed.  No further information was available'
            .format(file_name)
        )
        sys.exit(1)         
```
The final stage is to output a CSV containing the report which will, at this stage, already be on screen.
The function is just a standard CSV writer enclosed in a try/except to catch any file errors.
The function takes 3 parameters (the file name, the headers and the content) and from these creates a CSV for the user to view.
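
To see what the writer produces without creating a file, the same calls can be pointed at an in-memory buffer. This is a minimal sketch: the `io.StringIO` buffer and the single sample row are assumptions made for the illustration.

```python
import csv
import io

headers = ['Book Number', 'Title', 'Author', 'Times Loaned 2019']
content = [[13, 'Birth of a Theorem', 'Villani, Cedric', 1]]

# newline="" stops the buffer translating the writer's own line endings,
# in the same way as the newline="" argument to open().
buffer = io.StringIO(newline="")
csvwriter = csv.writer(buffer)
csvwriter.writerow(headers)
csvwriter.writerows(content)

text = buffer.getvalue()
print(text)
```

Note that the author field contains a comma, so the writer wraps it in double quotes; reading the file back with `csv.reader` restores the original value.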
