# Qwiklabs Assessment: Debugging and Solving Software Problems

**Introduction :**\
You're a member of your company's IT department. A colleague who recently left the company wrote a program that's 90% complete; it's designed to read some data files with information on employees and then generate a report. It's up to you to finish the code, which includes fixing any errors, bugs, and slowness in the unfinished code.

**Prerequisites :**\
You should have a sound knowledge of the following before performing the lab:

- Debugging (gathering information, root cause analysis, and remediation)
- Identifying and understanding system performance (I/O, Network, CPU, Memory)
- Understanding and troubleshooting the environment around the program (file system, OS, etc.)

**Improve performance :**\
Once you debug the issue, the program will start processing the file, but it takes a long time to complete: it works through the data slowly instead of printing the report quickly. You need to work out why the program is slow and then fix it. In this section, you need to find bottlenecks, improve the code, and make it finish faster.

The problem with the script is that it downloads the whole file and then scans all of it again for every date it reports. The current script takes almost 2 minutes to complete for 2019-01-01. An optimized script should generate the report for the same date within a few seconds.
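To make the repeated work visible, the sketch below imitates the slow pattern on a tiny in-memory sample and counts how often the "download" happens. `SAMPLE_ROWS`, `fake_download`, and `next_start_date` are hypothetical stand-ins for the real CSV and the lab's functions, and no network access is involved.

```python
import datetime

# Hypothetical in-memory stand-in for the downloaded file: (name, start date).
SAMPLE_ROWS = [
    ("Ann Lee", "2020-01-02"),
    ("Bob Ray", "2020-01-04"),
    ("Cy Dunn", "2020-01-04"),
    ("Di Moss", "2020-01-08"),
]

download_count = 0


def fake_download():
    """Pretend to fetch the file; it only counts how often it is called."""
    global download_count
    download_count += 1
    return SAMPLE_ROWS


def next_start_date(start_date):
    """Scan every row for the closest date >= start_date (the slow pattern)."""
    rows = fake_download()  # repeated for every date that gets reported
    dates = [
        datetime.datetime.strptime(date_text, "%Y-%m-%d")
        for _, date_text in rows
    ]
    newer = [d for d in dates if d >= start_date]
    return min(newer) if newer else None


current = datetime.datetime(2020, 1, 1)
reported = []
while True:
    found = next_start_date(current)
    if found is None:
        break
    reported.append(found)
    current = found + datetime.timedelta(days=1)

print(len(reported), "dates reported,", download_count, "downloads")
```

With this sample, the loop triggers one download per reported date plus a final one that finds nothing. In the real script each of those is a full HTTP request followed by a full scan of the file.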

**Here are a few hints to fix this issue :**

Download the file only once from the URL.
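One possible way to guarantee a single download is to memoize the function that fetches the file. The sketch below uses `functools.lru_cache` from the standard library; `fetch_text`, `get_file_lines_cached`, the sample CSV text, and the example URL are hypothetical and only stand in for the real HTTP request.

```python
import functools

fetch_calls = 0


def fetch_text(url):
    """Hypothetical stand-in for the real HTTP download."""
    global fetch_calls
    fetch_calls += 1
    # Made-up content; the real file's header and rows may differ.
    return "Name,Surname,Department,Start Date\nAnn,Lee,IT,2020-01-02\n"


@functools.lru_cache(maxsize=None)
def get_file_lines_cached(url):
    """Return the file's lines as a tuple, fetching at most once per URL."""
    return tuple(fetch_text(url).splitlines())


first = get_file_lines_cached("https://example.com/employees.csv")
second = get_file_lines_cached("https://example.com/employees.csv")
print(fetch_calls)
```

The second call is answered from the cache, so `fetch_text` runs only once. Returning a tuple rather than a list keeps callers from accidentally modifying the cached lines.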

Pre-process it so that the same calculation doesn't need to be done over and over again. This can be done in one of two ways:

- Create a dictionary keyed by start date, then look the data up in the dictionary instead of repeating the complicated calculation.
- Sort the data by start_date, then go through it date by date.

Choose either preprocessing option and modify the script accordingly.
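The two preprocessing options can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the lab's reference solution: `SAMPLE_LINES`, `parse_rows`, `group_by_date`, and `sorted_from` are hypothetical names, and the column layout (first name, surname, start date in the fourth column) is assumed from the indices the script reads.

```python
import bisect
import csv
import datetime

# Hypothetical sample rows: row[0] = first name, row[1] = surname,
# row[3] = start date (the third column is a made-up placeholder).
SAMPLE_LINES = [
    "Bob,Ray,Sales,2020-01-04",
    "Ann,Lee,IT,2020-01-02",
    "Cy,Dunn,IT,2020-01-04",
]


def parse_rows(lines):
    """Parse each CSV line once into (start_date, full_name) pairs."""
    pairs = []
    for row in csv.reader(lines):
        start = datetime.datetime.strptime(row[3], "%Y-%m-%d")
        pairs.append((start, "{} {}".format(row[0], row[1])))
    return pairs


def group_by_date(pairs):
    """Option 1: dictionary mapping each start date to its employees."""
    by_date = {}
    for start, name in pairs:
        by_date.setdefault(start, []).append(name)
    return by_date


def sorted_from(pairs, start_date):
    """Option 2: sort once, then binary-search the first date >= start_date."""
    ordered = sorted(pairs)
    # A 1-tuple sorts before any (date, name) pair with the same date.
    index = bisect.bisect_left(ordered, (start_date,))
    return ordered[index:]


pairs = parse_rows(SAMPLE_LINES)
by_date = group_by_date(pairs)
for start in sorted(by_date):
    print(start.strftime("%b %d, %Y"), by_date[start])
print(sorted_from(pairs, datetime.datetime(2020, 1, 3)))
```

Either way, each row's date is parsed once. The dictionary suits direct lookups by date, while the sorted list suits walking forward from a given start date.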

In [26]:
## Solution (works, but slow; needs to be modified)

#!/usr/bin/env python3


import csv
import datetime
import requests


FILE_URL = "https://storage.googleapis.com/gwg-hol-assets/gic215/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""

    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = int(input('Enter a value for the year: '))
    month = int(input('Enter a value for the month: '))
    day = int(input('Enter a value for the day: '))
    print()
    return datetime.datetime(year, month, day)

def get_file_lines(url):
    """Returns the lines contained in the file at the given URL"""

    # Download the file over the internet. This runs on every call, which is the bottleneck.
    response = requests.get(url, stream=True)
    lines = []

    for line in response.iter_lines():
        lines.append(line.decode("UTF-8"))
    return lines

def get_same_or_newer(start_date):
    """Returns the employees that started on the given date, or the closest one."""
    data = get_file_lines(FILE_URL)
    reader = csv.reader(data[1:])

    # We want all employees that started at the same date or the closest newer
    # date. To calculate that, we go through all the data and find the
    # employees that started on the smallest date that's equal or bigger than
    # the given start date.
    min_date = datetime.datetime.today()
    min_date_employees = []
    for row in reader:
        row_date = datetime.datetime.strptime(row[3], '%Y-%m-%d')

        # If this date is smaller than the one we're looking for,
        # we skip this row
        if row_date < start_date:
            continue

        # If this date is smaller than the current minimum,
        # we pick it as the new minimum, resetting the list of
        # employees at the minimal date.
        if row_date < min_date:
            min_date = row_date
            min_date_employees = []

        # If this date is the same as the current minimum,
        # we add the employee in this row to the list of
        # employees at the minimal date.
        if row_date == min_date:
            min_date_employees.append("{} {}".format(row[0], row[1]))

    return min_date, min_date_employees

def list_newer(start_date):
    while start_date < datetime.datetime.today():
        start_date, employees = get_same_or_newer(start_date)
        print("Started on {}: {}".format(start_date.strftime("%b %d, %Y"), employees))

        # Now move the date to the next one
        start_date = start_date + datetime.timedelta(days=1)

def main():
    start_date = get_start_date()
    list_newer(start_date)


In [27]:
# Solution
if __name__ == "__main__":
    main()


Getting the first start date to query for.

The date must be greater than Jan 1st, 2018
Enter a value for the year: 2020
Enter a value for the month: 1
Enter a value for the day: 1

Started on Jan 02, 2020: ['Yoshi Molina']
Started on Jan 04, 2020: ['Tasha Hodge']
Started on Jan 05, 2020: ['Margaret Hooper', 'Ruby Richard']
Started on Jan 08, 2020: ['Lyle Schultz']
Started on Jan 09, 2020: ['Roth Foster']
Started on Jan 13, 2020: ['Tanek Burton', 'Kyla Gay', 'Leigh Willis']
Started on Jan 14, 2020: ['Brynn Miles', 'Lane Newman']
Started on Jan 16, 2020: ['Rose Compton']
Started on Jan 19, 2020: ['Connor Grimes']
Started on Jan 20, 2020: ['Hedwig Cain']
Started on Jan 23, 2020: ['Hillary Vega']
Started on Jan 24, 2020: ['Olivia Frederick', 'Xandra Gonzalez']
Started on Jan 25, 2020: ['Jesse Wade']
Started on Jan 26, 2020: ['Thane Rich']
Started on Jan 28, 2020: ['Keefe Logan']
Started on Feb 05, 2020: ['Colin Maddox']
Started on Feb 06, 2020: ['Ivan Dillard']
Started on Feb 09, 2020: [

# Practice

In [41]:
#!/usr/bin/env python3


import csv
import datetime
import requests


FILE_URL = "https://storage.googleapis.com/gwg-hol-assets/gic215/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""
    
    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = int(input('Enter a value for the year: '))
    month = int(input('Enter a value for the month: '))
    day = int(input('Enter a value for the day: '))
    print()

    return datetime.datetime(year, month, day)

def get_file_lines(url):
    """Returns the lines contained in the file at the given URL"""

    # Download the file over the internet
    response = requests.get(url, stream=True)
    lines = []

    for line in response.iter_lines():
        lines.append(line.decode("UTF-8"))
    return lines

#############################  my own code starts here  ######################################

def get_same_or_newer(start_date):
    """Returns a dict mapping each start date on or after start_date to its employees.

    The keys of the dict are in chronological order.
    """
    data = get_file_lines(FILE_URL)
    reader = csv.reader(data[1:])

    # Instead of searching for the closest newer date over and over, we go
    # through the data once and build a dictionary that stores each start date
    # as the key and a list of employee names as the value.
    # But first we need an end day where we want to stop, that is today.
    today = datetime.datetime.today()
    employees_by_date = {}

    # Iterate through the rows of the CSV file
    for row in reader:
        row_date = datetime.datetime.strptime(row[3], '%Y-%m-%d')
        if start_date <= row_date < today:
            # Join first name and surname with a space, as the original script does
            name = "{} {}".format(row[0], row[1])
            if row_date not in employees_by_date:
                employees_by_date[row_date] = [name]
            else:
                employees_by_date[row_date].append(name)

    # Remember to sort the keys in chronological order. sorted() is the easy
    # way to do it; the algorithm behind it is equivalent to:
    #
    #     key_list = list(employees_by_date.keys())
    #     key_list.sort()
    #     new_employees_by_date = {}
    #     for key in key_list:
    #         new_employees_by_date[key] = employees_by_date[key]
    return dict(sorted(employees_by_date.items()))

# Print each start date and the employees who started on it, all of which are stored in the dict
def list_newer(start_date):
    for first_start_date, employees in get_same_or_newer(start_date).items():
        print("Started on {}: {}".format(first_start_date.strftime("%b %d, %Y"), employees))

def main():
    start_date = get_start_date()
    list_newer(start_date)


In [42]:
if __name__ == "__main__":
    main()


Getting the first start date to query for.

The date must be greater than Jan 1st, 2018
Enter a value for the year: 2020
Enter a value for the month: 1
Enter a value for the day: 1

Started on Jan 02, 2020: ['Yoshi Molina']
Started on Jan 04, 2020: ['Tasha Hodge']
Started on Jan 05, 2020: ['Margaret Hooper', 'Ruby Richard']
Started on Jan 08, 2020: ['Lyle Schultz']
Started on Jan 09, 2020: ['Roth Foster']
Started on Jan 13, 2020: ['Tanek Burton', 'Kyla Gay', 'Leigh Willis']
Started on Jan 14, 2020: ['Brynn Miles', 'Lane Newman']
Started on Jan 16, 2020: ['Rose Compton']
Started on Jan 19, 2020: ['Connor Grimes']
Started on Jan 20, 2020: ['Hedwig Cain']
Started on Jan 23, 2020: ['Hillary Vega']
Started on Jan 24, 2020: ['Olivia Frederick', 'Xandra Gonzalez']
Started on Jan 25, 2020: ['Jesse Wade']
Started on Jan 26, 2020: ['Thane Rich']
Started on Jan 28, 2020: ['Keefe Logan']
Started on Feb 05, 2020: ['Colin Maddox']
Started on Feb 06, 2020: ['Ivan Dillard']
Started on Feb 09, 2020: ['Desirae Gaines']
Start