## Managing Computer Resources

Intro to Module 4: Managing Resources
We'll explore how we can figure out what's going on with programs that exhaust the resources on our computer, whether that's memory, disk, or even the network link. Then we'll talk about managing our most valuable resource of all: time.

Memory Leaks and How to Prevent Them
Memory leak : happens when a chunk of memory that's no longer needed is not released

When writing programs in languages like C or C++, the programmer is in charge of deciding how much memory to request, and when to give it back.

If the program uses all of the available memory, then no processes will be able to request more memory, and things will start failing in weird ways. When this happens, the OS might terminate processes to free up some of the memory, causing unrelated programs to crash. 

Languages like Python, Java, or Go manage memory for us.
To understand how this works, let's look at what these languages do. First, they request the necessary memory when we create variables. Then they run a tool called the garbage collector, which is in charge of freeing the memory that's no longer in use. To detect when that's the case, the garbage collector looks at the variables in use and the memory assigned to them, and then checks whether there are any portions of memory that aren't being referenced by any variables.

But if our code keeps variables pointing at data that it no longer needs, the garbage collector can't free that memory. In other words, even when the language takes care of requesting and releasing the memory for us, we could still see the same effects of a memory leak.
 
So memory leaks are less of an issue for programs that are short lived, but can become especially problematic for processes that keep running in the background.
 
Memory profiler : to figure out how the memory is being used.
C, C++ : Valgrind
Python : Many tools

It's important that we measure the use of memory first before we try to change anything, otherwise we might be optimizing the wrong piece of code.
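
One way to measure before changing anything, sketched with `tracemalloc` from the Python standard library (one of several options, not necessarily the tool used in the course). The `build_table` workload is a made-up example.

```python
import tracemalloc


def build_table(rows):
    """Hypothetical workload: builds a list of strings."""
    return [str(number) * 10 for number in range(rows)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
table = build_table(50_000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Lines of code ranked by how much their allocated memory changed
growth = after.compare_to(before, "lineno")
for stat in growth[:3]:
    print(stat)

largest = growth[0]
```

The first entries point at the lines that allocated the most memory between the two snapshots, which is where any optimization effort should start.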

Managing Disk Space
What takes up disk space:
1) Installed binaries and libraries
2) Data stored by the applications
3) Cached information
4) Logs
5) Temporary files
6) Backups

When a hard drive is full, programs may suddenly crash while trying to write something to disk and finding out that they can't. A full hard drive might even lead to data loss, as some programs might truncate a file before writing an updated version of it, and then fail to write the new content, losing all the data that was stored in it before.

Ways to free up disk space:
1) Uninstalling applications that aren't used
2) Cleaning up old data that isn't needed anymore

What should take up most of the disk depends on what the server does. On a mail server, it's going to be the mailboxes of the users of that service. But if you find that most of the data is stored in logs or in temporary files, something has gone wrong. -> rotate the logs

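sudo lsof | grep deleted : lists files that have been deleted but are still held open by a process, so their disk space hasn't been released yet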

Just remember that whenever this happens, your troubleshooting process remains the same. You'll need to spend some time looking into what's using the disk, check whether it's expected or an anomaly, figure out how to solve it, and, most important of all, figure out how to prevent it from happening again.
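
A rough sketch of the "what's using the disk" step in Python, using only the standard library. The function names are made up, and the demo runs against a throwaway temporary directory rather than a real filesystem.

```python
import os
import shutil
import tempfile


def percent_used(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def directory_sizes(root):
    """Return (size_in_bytes, path) for each subdirectory of root, largest first."""
    results = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        results.append((total, entry.path))
    return sorted(results, reverse=True)


# Demo on a temporary directory with one big and one small subdirectory
with tempfile.TemporaryDirectory() as root:
    for name, size in (("logs", 3000), ("data", 10)):
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, "file.bin"), "wb") as handle:
            handle.write(b"x" * size)
    ranking = [(size, os.path.basename(path)) for size, path in directory_sizes(root)]

print(ranking)
```

In this demo the `logs` directory comes out on top, which is the kind of result that would call for a closer look at log rotation.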

Network Saturation

The two most important factors that determine the time it takes to get the data over the network:
1) Latency : The delay between sending a byte of data from one point and receiving it on the other. This value is directly affected by the physical distance between the two points and how many intermediate devices there are between them. 
2) Bandwidth : How much data can be sent or received in a second. This is effectively the data capacity of the connection.

Remember that if you're transmitting a lot of small pieces of data, you care more about latency than bandwidth. In this case, you want to make sure that the server is as close as possible to the users of the service, aiming for a latency of less than 50 milliseconds if possible, and up to 100 milliseconds in the worst case.

On the flip side, if you're transmitting large chunks of data, you care more about the bandwidth than the latency. In this case, you want to have as much bandwidth available as possible regardless of where the server is hosted.
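
A back-of-the-envelope model of the two cases above. It assumes the simplified formula time = round trips x latency + size / bandwidth, which ignores protocol overhead and parallel requests; the numbers are invented for illustration.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s, round_trips=1):
    """Simplified estimate: one latency delay per round trip plus the raw transfer time."""
    return round_trips * latency_s + size_bytes / bandwidth_bytes_per_s


# 1000 sequential requests of 1,000 bytes each, 100 ms latency, 10 MB/s
many_small = transfer_time(1_000_000, 0.1, 10_000_000, round_trips=1000)
# The same requests with the server closer to the users (50 ms latency)
many_small_low_latency = transfer_time(1_000_000, 0.05, 10_000_000, round_trips=1000)

# One 1 GB file, 100 ms latency, 10 MB/s
one_large = transfer_time(1_000_000_000, 0.1, 10_000_000)
# The same file with twice the bandwidth
one_large_more_bandwidth = transfer_time(1_000_000_000, 0.1, 20_000_000)

print(many_small, many_small_low_latency)
print(one_large, one_large_more_bandwidth)
```

Under this model, halving the latency roughly halves the total time for the small requests, while the large file only gets noticeably faster when the bandwidth goes up.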

iftop : This shows how much data each active connection is sending over the network

Traffic shaping : This is a way of marking the data packets sent over the network with different priorities, to avoid having huge chunks of data use all the bandwidth.

Dealing with Memory Leaks

uxterm &
Scroll buffer : the nifty feature that lets us scroll up and see the things that we executed and their output. The contents of the buffer are kept in memory, so if we make it really long and manage to fill it, our computer will run out of memory.

od -cx /dev/urandom : This command takes the random numbers generated by the urandom device and shows them as both characters and hexadecimal numbers.

top (shift+m)
ctrl+c

RES : the dynamic memory that's reserved for the specific process, i.e. its resident (non-swapped) physical memory (this is the column that indicates the problem)
SHR : memory that's shared across processes
VIRT : lists all the virtual memory allocated for each process.

cd contents_stats/
./contents_stats.py
top

atom contents_stats_simple.py
Use the memory_profiler module: add the @profile decorator to the function we want to measure
Decorator (@) : it's used in Python to add extra behavior to functions without having to modify the code. 
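
To make the decorator idea concrete, here is a small sketch of a home-made decorator that records the peak memory used while a function runs. It is built on the standard library's `tracemalloc` and is not how memory_profiler's `@profile` is implemented; the names `report_peak_memory` and `build_squares` are made up.

```python
import functools
import tracemalloc


def report_peak_memory(func):
    """Wrap func so that each call records its peak traced memory in bytes."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            result = func(*args, **kwargs)
            _current, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        wrapper.last_peak_bytes = peak
        return result

    wrapper.last_peak_bytes = None
    return wrapper


@report_peak_memory
def build_squares(count):
    # The function body stays untouched; the extra behavior comes from the decorator
    return [n * n for n in range(count)]


squares = build_squares(10_000)
print("peak bytes:", build_squares.last_peak_bytes)
```

The `@report_peak_memory` line is equivalent to writing `build_squares = report_peak_memory(build_squares)` after the function definition.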

clear
./contents_stats_simple.py



## Managing Our Time

Getting to the Important Tasks

There's another resource that's even more valuable in our day to day, our time.
When working, we need to optimize the time we spend to bring the most value to the company. Finding the right balance is hard, but that's what we're here for. 

One technique that's super effective when working in IT is the Eisenhower Decision Matrix. It classifies tasks along two axes:
1) Important 2) Urgent
This gives four categories:
1) ASAP : important and urgent
2) Long Term : important but not urgent. It can be critical when dealing with a large incident, researching new technologies, solving technical debt
3) Interruptions : urgent but not important. You can rotate the person dealing with those interruptions.
4) Distractions : neither important nor urgent. Meetings where nothing useful is being discussed, email threads that lead nowhere, office gossip. No thanks.

Technical debt is the pending work that accumulates when we choose a quick-and-easy solution instead of applying a sustainable long-term one. 

*We need to make sure that we have time available to work on tasks that are important, but not necessarily urgent.

Prioritizing Tasks

Everyone works a little differently, so you'll need to find the system that works best for you.

Let's cover the basic structure that can help us get organized and prioritize our tasks.
1) Make a list of all of the tasks that need to get done.
2) Check the real urgency of the tasks.
3) Assess the importance of each issue.
    If everything is on fire, divide the tasks into groups: 1) Most important 2) Important 3) Not so important
    Don't spend too much time doing this sorting. In the end, the exact order isn't what matters. What matters is that you spend most of your time working on the most important tasks.
4) Estimate how much effort they'll take.
     It's about assigning rough sizes. One common technique is to use small, medium, and large.
     If possible, try to start with the larger, most important tasks to get those out of the way first.
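
The steps above can be sketched as a simple sort: first by importance group, then larger tasks before smaller ones. The task names and the `Task` structure are made up for illustration, and the exact tie-breaking is an assumption rather than a rule.

```python
from dataclasses import dataclass

IMPORTANCE_ORDER = {"most important": 0, "important": 1, "not so important": 2}
SIZE_ORDER = {"large": 0, "medium": 1, "small": 2}


@dataclass
class Task:
    name: str
    importance: str  # one of the IMPORTANCE_ORDER keys
    size: str        # rough effort: small, medium or large


def prioritize(tasks):
    """Most important first; within a group, larger tasks first."""
    return sorted(
        tasks,
        key=lambda task: (IMPORTANCE_ORDER[task.importance], SIZE_ORDER[task.size]),
    )


tasks = [
    Task("rotate logs", "important", "small"),
    Task("fix backup job", "most important", "large"),
    Task("update wiki", "not so important", "medium"),
    Task("patch web server", "most important", "small"),
]
ordered_names = [task.name for task in prioritize(tasks)]
print(ordered_names)
```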
     
One strategy that can help us with that is saving the most complex tasks for the moments when we're less likely to get interrupted.

*The key here is to always work on important tasks. 

But keep in mind, this shouldn't stop you from taking a break or working on experimental projects.

If there's more important work than time to do it, there are basically two options: either you get extra help from other team members, or you decide that some tasks weren't really that important and they won't get done.

Estimating the Time Tasks Will Take

When deciding whether a manual task is worth automating, consider two things:
1) how many times we'll do the task over a period of time
2) how long it takes to do it manually.
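
A worked example with these two numbers, assuming they are being used to decide whether automating a task pays off within a given period. The comparison rule and all figures are illustrative assumptions.

```python
def manual_cost_minutes(times_per_period, minutes_per_run):
    """Total time spent doing the task by hand over the period."""
    return times_per_period * minutes_per_run


def worth_automating(times_per_period, minutes_per_run, automation_minutes):
    """True if writing the automation costs less than doing the task manually."""
    return manual_cost_minutes(times_per_period, minutes_per_run) > automation_minutes


# A 10-minute task done weekly for a year vs. 4 hours to automate it
weekly_task = worth_automating(52, 10, 4 * 60)   # 520 minutes manual vs. 240

# A 30-minute task done once a year vs. the same 4 hours
yearly_task = worth_automating(1, 30, 4 * 60)    # 30 minutes manual vs. 240

print(weekly_task, yearly_task)
```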

When estimating, we tend to forget to take into account the many obstacles that we might face, like finding a bug that we don't know how to fix, being interrupted by a problem that needs more urgent attention, or discovering that a new tool doesn't work well with the rest of the tools we have in place.

The best way to do this is to compare the task that you're trying to do with similar tasks that you've done before.
If one smaller step is still too large, then split it into even smaller pieces until you can compare each piece of the puzzle with something that you've done before. Once you've got all those estimated times, just add them up and you'll have a rough estimate of how long the whole task will take.
Once you have a rough estimate of the total time of all the steps, you want to factor in some extra time for integration. This should also come from prior experience.


Communicating Expectations

To have successful interactions with our users, it's important to understand these implicit expectations and let users know if fixing the problem will take longer than they expect.

As long as we communicate with them early about the circumstances, they will be able to understand this and manage their time accordingly. 

It's also important to let users know if there are any conflicting priorities that might delay the response to whatever they need.
Make sure you tell the user that you're dealing with a crisis and that you'll help with their request once the crisis is resolved.

As a general rule, communication is key. Try to be clear and upfront about when you expect the issue to be resolved.

Notice a theme? Estimating the time it takes to perform more complicated work is tough. A lot of your time will be spent investigating: looking into what's going on and figuring out what should be happening. In that case, make sure to let users know when they can expect an update on their issue, and give them timely updates.

If possible, it's a really good idea to have users file their requests through a ticket tracking system, which:
1) lets you organize your tasks by priority
2) lets you make better use of your time


Practical shortcut : try out some practical shortcuts when dealing with users. It makes sense to take some time to think about the work you do and figure out ways to avoid interruptions and save time. 

## Making Our Future Lives Easier

Dealing with Hard Problems

Brian Kernighan
"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?"

It's important to focus on building systems and applications that are simple and easy to understand, so that when something goes wrong, we can figure out how to fix them quickly.

1) One piece of advice I found really valuable is to develop code in small, digestible chunks
2) Keep your goal clear (Test before actual code, documentation)

If you're in a sticky situation, the main thing to do is to remain calm. We need our creative skills to solve problems, and the worst enemy of creativity is anxiety.
So if you feel that you're out of ideas, it's better to take your mind off the problem for a while. Maybe grab a cup of coffee, or take a walk outside. Sometimes a change of scenery is all we need for a new idea to pop up and help us figure out what we're missing, true in coding and in life.
And don't be afraid to ask for help. 

Rubber Duck Debugging : simply explaining the problem to a rubber duck. When we force ourselves to explain a problem, we already start thinking about the issue differently.

When you ask a colleague for their help with debugging a problem, be careful not to tell them what you think the root cause of the issue might be. Instead, tell them about the symptoms, and see what questions they ask and what possibilities they probe.

Proactive Practices

To avoid having to scramble to fix things when there's an outage, it's really helpful to have infrastructure that lets us test changes in advance so that we can check that things are working as expected before they reach our users.

If we're the ones writing the code, one thing we can do is to make sure that our code has good unit tests and integration tests. 

Setting up continuous integration can help with that.
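
As an illustration of what such unit tests can look like, here is a minimal sketch with Python's built-in `unittest` module. The `parse_size` helper is hypothetical and only exists to give the tests something to check.

```python
import unittest


def parse_size(text):
    """Hypothetical helper: convert '512', '10K', '2M' or '1G' into bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


class ParseSizeTest(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_size("512"), 512)

    def test_unit_suffix(self):
        self.assertEqual(parse_size("10k"), 10 * 1024)

    def test_invalid_input(self):
        with self.assertRaises(ValueError):
            parse_size("lots")


# Run the tests without exiting the interpreter (a CI job would run them on every change)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseSizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```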

Another step in this direction is to have a test environment, where we can deploy new code before shipping it to the rest of our users. 
1) We can do a thorough check of the software as it will be seen by the users.
2) we can use this test environment to troubleshoot problems whenever they happen.

Another recommended practice when managing a fleet of computers is to deploy software in phases or canaries. 

Even then, we'll still have some bugs:
1) We can make our troubleshooting easier by including good debug logging in the code.
2) We can set up centralized log collection. This means there's a special server that gathers all the logs from all the servers, or even all the computers in the network.
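
A small sketch of debug logging with Python's standard `logging` module. The logger name `report` and the `average` function are made up, and the handler writes to an in-memory buffer only to keep the example self-contained; in practice it would point at a file or at whatever log collection is in place.

```python
import io
import logging

log_stream = io.StringIO()  # stand-in for a log file or a central log collector
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("report")  # hypothetical application logger
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def average(values):
    logger.debug("average() called with %d values", len(values))
    if not values:
        logger.warning("no values given, returning 0")
        return 0
    return sum(values) / len(values)


average([2, 4, 6])
average([])
log_lines = log_stream.getvalue().splitlines()
print("\n".join(log_lines))
```

Raising the level to `logging.WARNING` in normal operation keeps the logs short, while `DEBUG` can be switched on when troubleshooting.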

Similarly, having a good monitoring system can be super helpful. 

Ticketing systems : If we ask users to provide the needed information up front, we don't have to waste time and go back and forth.

Finally, remember to spend time writing documentation.
At Google, we have a bunch of docs called Playbooks where we detail what a person who's on call can do to diagnose and mitigate a ton of different problems. 

Planning Future Resource Usage

Sometimes, it's not a question of misusing resources, but rather missing resources.

Resources needed = Current usage + Expected growth
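
A rough projection based on current usage and expected growth. It assumes growth is linear, which real systems rarely follow exactly, and the capacity and growth figures are invented.

```python
def projected_usage(used_gb, growth_gb_per_day, days):
    """Expected usage after `days`, assuming constant daily growth."""
    return used_gb + growth_gb_per_day * days


def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Days left before the disk fills up, or None if usage isn't growing."""
    if growth_gb_per_day <= 0:
        return None
    return max(capacity_gb - used_gb, 0) / growth_gb_per_day


# 500 GB disk, 350 GB used, growing by 1.5 GB per day
in_two_months = projected_usage(350, 1.5, 60)
days_left = days_until_full(500, 350, 1.5)
print(in_two_months, days_left)
```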

Options when storage runs low: clean up / add more storage

Network-Attached Storage (NAS) : can be attached to your server for additional disk space

An interesting strategy for making the best possible use of resources, is to mix and match the processes that run on the computers, so they make use of all the available resources.

An alternative to having to deal with all these resources, like figuring out when to buy more and how to distribute them, is to migrate those systems to the cloud. Setting up your service to run in the cloud will require some initial setup time, as well as an ongoing cost for the cloud resources you're using.

Preventing Future Problems

Whenever we're faced with an issue, it's usually best to find a quick workaround first, so that those affected can get back to work as soon as possible.

One key strategy is to make good use of monitoring.

When you first set up a monitoring system, you might not be sure what information to prioritize, so start with the basics, CPU, disk, memory, and network usage.

Whenever you have to deal with an incident that wasn't caught by the monitoring system, remember to set up new monitoring and alerting rules that will notify you about the problem if it ever happens again. 
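
A toy sketch of alerting rules that grow after an incident. Real monitoring systems have their own rule formats; the metric names, thresholds and the `open_files` incident here are all hypothetical.

```python
# Start with the basics: percentage thresholds per metric (made-up values)
rules = {"cpu_percent": 90, "disk_percent": 85, "memory_percent": 90}


def triggered(metrics, rules):
    """Names of the rules whose threshold is reached by the current metrics."""
    return sorted(
        name for name, limit in rules.items() if metrics.get(name, 0) >= limit
    )


sample = {"cpu_percent": 35, "disk_percent": 91, "memory_percent": 60, "open_files": 9800}

alerts_before = triggered(sample, rules)

# After an incident caused by too many open files, add a rule so it gets caught next time
rules["open_files"] = 9000
alerts_after = triggered(sample, rules)

print(alerts_before, alerts_after)
```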

If you have to work around an issue in an application developed by someone else, it's important that you report a bug to the relevant developers.

If you have to work around an issue in the software that you own, make sure that you write a test that catches the problem.
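
As an example, the start-date script in these notes fails with a TypeError when the strings returned by `input()` are passed straight to `datetime.datetime()`. A regression test for that bug could look like the sketch below; `build_start_date` is a hypothetical helper, not a function from the original script.

```python
import datetime
import unittest


def build_start_date(year, month, day):
    """Hypothetical helper: accepts the strings that input() returns."""
    return datetime.datetime(int(year), int(month), int(day))


class StartDateRegressionTest(unittest.TestCase):
    def test_accepts_strings_from_input(self):
        # This is the case that used to raise a TypeError
        self.assertEqual(
            build_start_date("2019", "1", "1"), datetime.datetime(2019, 1, 1)
        )


regression_suite = unittest.defaultTestLoader.loadTestsFromTestCase(StartDateRegressionTest)
regression_result = unittest.TextTestRunner(verbosity=0).run(regression_suite)
```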

Finally, regardless of whether the bug came from software that you wrote or someone else wrote, make sure that you document the key pieces of what you did, how you diagnosed the issue, and how you squashed it.

## Debugging and Solving Software Problems

ls
sudo chmod 777 ~/start_date_report.py
./start_date_report.py

nano ~/start_date_report.py
./start_date_report.py

time ./test.py
nano ~/start_date_report.py
./start_date_report.py

In [3]: original script; the input values are passed to datetime.datetime() as strings
#!/usr/bin/env python3
# start_date_report.py


import csv
import datetime
import requests


FILE_URL = "https://storage.googleapis.com/gwg-hol-assets/gic215/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""

    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = input('Enter a value for the year: ')
    month = input('Enter a value for the month: ')
    day = input('Enter a value for the day: ')
    print()

    return datetime.datetime(year, month, day)

def get_file_lines(url):
    """Returns the lines contained in the file at the given URL"""

    # Download the file over the internet
    response = requests.get(url, stream=True)
    lines = []

    for line in response.iter_lines():
        lines.append(line.decode("UTF-8"))
    return lines

def get_same_or_newer(start_date):
    """Returns the employees that started on the given date, or the closest one."""
    data = get_file_lines(FILE_URL)
    reader = csv.reader(data[1:])

    # We want all employees that started at the same date or the closest newer
    # date. To calculate that, we go through all the data and find the
    # employees that started on the smallest date that's equal or bigger than
    # the given start date.
    min_date = datetime.datetime.today()
    min_date_employees = []
    for row in reader:
        row_date = datetime.datetime.strptime(row[3], '%Y-%m-%d')

        # If this date is smaller than the one we're looking for,
        # we skip this row
        if row_date < start_date:
            continue

        # If this date is smaller than the current minimum,
        # we pick it as the new minimum, resetting the list of
        # employees at the minimal date.
        if row_date < min_date:
            min_date = row_date
            min_date_employees = []

        # If this date is the same as the current minimum,
        # we add the employee in this row to the list of
        # employees at the minimal date.
        if row_date == min_date:
            min_date_employees.append("{} {}".format(row[0], row[1]))

    return min_date, min_date_employees

def list_newer(start_date):
    while start_date < datetime.datetime.today():
        start_date, employees = get_same_or_newer(start_date)
        print("Started on {}: {}".format(start_date.strftime("%b %d, %Y"), employees))

        # Now move the date to the next one
        start_date = start_date + datetime.timedelta(days=1)

def main():
    start_date = get_start_date()
    list_newer(start_date)

if __name__ == "__main__":
    main()


Getting the first start date to query for.

The date must be greater than Jan 1st, 2018
Enter a value for the year: 2019
Enter a value for the month: 1
Enter a value for the day: 1



TypeError: an integer is required (got type str)

In [4]: input values converted to integers with int()
#!/usr/bin/env python3
# start_date_report.py


import csv
import datetime
import requests


FILE_URL = "https://storage.googleapis.com/gwg-hol-assets/gic215/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""

    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = int(input('Enter a value for the year: '))
    month = int(input('Enter a value for the month: '))
    day = int(input('Enter a value for the day: '))
    print()

    return datetime.datetime(year, month, day)

def get_file_lines(url):
    """Returns the lines contained in the file at the given URL"""

    # Download the file over the internet
    response = requests.get(url, stream=True)
    lines = []

    for line in response.iter_lines():
        lines.append(line.decode("UTF-8"))
    return lines

def get_same_or_newer(start_date):
    """Returns the employees that started on the given date, or the closest one."""
    data = get_file_lines(FILE_URL)
    reader = csv.reader(data[1:])

    # We want all employees that started at the same date or the closest newer
    # date. To calculate that, we go through all the data and find the
    # employees that started on the smallest date that's equal or bigger than
    # the given start date.
    min_date = datetime.datetime.today()
    min_date_employees = []
    for row in reader:
        row_date = datetime.datetime.strptime(row[3], '%Y-%m-%d')

        # If this date is smaller than the one we're looking for,
        # we skip this row
        if row_date < start_date:
            continue

        # If this date is smaller than the current minimum,
        # we pick it as the new minimum, resetting the list of
        # employees at the minimal date.
        if row_date < min_date:
            min_date = row_date
            min_date_employees = []

        # If this date is the same as the current minimum,
        # we add the employee in this row to the list of
        # employees at the minimal date.
        if row_date == min_date:
            min_date_employees.append("{} {}".format(row[0], row[1]))

    return min_date, min_date_employees

def list_newer(start_date):
    while start_date < datetime.datetime.today():
        start_date, employees = get_same_or_newer(start_date)
        print("Started on {}: {}".format(start_date.strftime("%b %d, %Y"), employees))

        # Now move the date to the next one
        start_date = start_date + datetime.timedelta(days=1)

def main():
    start_date = get_start_date()
    list_newer(start_date)

if __name__ == "__main__":
    main()


Getting the first start date to query for.

The date must be greater than Jan 1st, 2018
Enter a value for the year: 2019
Enter a value for the month: 1
Enter a value for the day: 1

Started on Jan 05, 2019: ['Lucy Calhoun']
Started on Jan 11, 2019: ['Macon Livingston']
Started on Jan 12, 2019: ['Curran Farley']
Started on Jan 13, 2019: ['Lucius Glass']
Started on Jan 14, 2019: ['Michael Pickett']
Started on Jan 15, 2019: ['Andrew Donaldson']
Started on Jan 19, 2019: ['Richard Dillon']
Started on Jan 21, 2019: ['Clare Saunders', 'Ainsley Knight']
Started on Jan 26, 2019: ['Fleur Baker', 'Casey Gross']
Started on Jan 29, 2019: ['Felix Parks']
Started on Feb 03, 2019: ['Logan Sharp']
Started on Feb 04, 2019: ['Eve Meyer']
Started on Feb 05, 2019: ['Neil Warner']
Started on Feb 11, 2019: ['Kylan Spencer']
Started on Feb 12, 2019: ['Adara Mclaughlin']
Started on Feb 14, 2019: ['Diana Mccall']
Started on Feb 15, 2019: ['Knox Williamson']
Started on Feb 17, 2019: ['Nathaniel Puckett']
[...]

Started on Apr 24, 2020: ['Giselle Dillon']
Started on Apr 26, 2020: ['Bradley Chandler']
Started on Apr 29, 2020: ['Jennifer Murphy']
Started on May 03, 2020: ['Jack Franco']
Started on May 07, 2020: ['Oleg Noble']
Started on May 10, 2020: ['Lavinia Whitfield']
Started on May 13, 2020: ['Kalia Perez', 'Rafael Vaughan']
Started on May 16, 2020: ['Raymond Pate']
Started on May 17, 2020: ['Bruno Wallace']
Started on May 19, 2020: ['Aurora Macias']
Started on May 22, 2020: ['Gemma Booker']
Started on May 28, 2020: ['Blake Franco']
Started on Jun 02, 2020: ['Kyle Roach']
Started on Jun 04, 2020: ['Tanek Edwards']
Started on Jun 06, 2020: ['Liberty Pena']
Started on Jun 10, 2020: ['Kyra Vance']
Started on Jun 11, 2020: ['Kiona Nguyen']
Started on Jun 13, 2020: ['Aurora Sanford']
Started on Jun 20, 2020: ['Jarrod Nicholson']
Started on Jun 24, 2020: ['Nicholas Brock']
Started on Jun 25, 2020: ['Quynn Parsons', 'Katell Gill']
Started on Jun 27, 2020: ['Melanie David', 'Jordan Golden']
[...]

In [5]: the file is downloaded once in main() and passed in as data
#!/usr/bin/env python3
# start_date_report.py


import csv
import datetime
import requests


FILE_URL = "https://storage.googleapis.com/gwg-hol-assets/gic215/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""

    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = int(input('Enter a value for the year: '))
    month = int(input('Enter a value for the month: '))
    day = int(input('Enter a value for the day: '))
    print()

    return datetime.datetime(year, month, day)

def get_file_lines(url):
    """Returns the lines contained in the file at the given URL"""

    # Download the file over the internet
    response = requests.get(url, stream=True)
    lines = []

    for line in response.iter_lines():
        lines.append(line.decode("UTF-8"))
    return lines

def get_same_or_newer(start_date, data):
    """Returns the employees that started on the given date, or the closest one."""
    # data is now passed in by the caller instead of being downloaded on every call
    reader = csv.reader(data[1:])

    # We want all employees that started at the same date or the closest newer
    # date. To calculate that, we go through all the data and find the
    # employees that started on the smallest date that's equal or bigger than
    # the given start date.
    min_date = datetime.datetime.today()
    min_date_employees = []
    for row in reader:
        row_date = datetime.datetime.strptime(row[3], '%Y-%m-%d')

        # If this date is smaller than the one we're looking for,
        # we skip this row
        if row_date < start_date:
            continue

        # If this date is smaller than the current minimum,
        # we pick it as the new minimum, resetting the list of
        # employees at the minimal date.
        if row_date < min_date:
            min_date = row_date
            min_date_employees = []

        # If this date is the same as the current minimum,
        # we add the employee in this row to the list of
        # employees at the minimal date.
        if row_date == min_date:
            min_date_employees.append("{} {}".format(row[0], row[1]))

    return min_date, min_date_employees

def list_newer(start_date, data):
    while start_date < datetime.datetime.today():
        start_date, employees = get_same_or_newer(start_date, data)
        print("Started on {}: {}".format(start_date.strftime("%b %d, %Y"), employees))

        # Now move the date to the next one
        start_date = start_date + datetime.timedelta(days=1)

def main():
    data = get_file_lines(FILE_URL)
    start_date = get_start_date()
    list_newer(start_date, data)

if __name__ == "__main__":
    main()


Getting the first start date to query for.

The date must be greater than Jan 1st, 2018
Enter a value for the year: 2019
Enter a value for the month: 1
Enter a value for the day: 1

Started on Jan 05, 2019: ['Lucy Calhoun']
Started on Jan 11, 2019: ['Macon Livingston']
Started on Jan 12, 2019: ['Curran Farley']
Started on Jan 13, 2019: ['Lucius Glass']
Started on Jan 14, 2019: ['Michael Pickett']
Started on Jan 15, 2019: ['Andrew Donaldson']
Started on Jan 19, 2019: ['Richard Dillon']
Started on Jan 21, 2019: ['Clare Saunders', 'Ainsley Knight']
Started on Jan 26, 2019: ['Fleur Baker', 'Casey Gross']
Started on Jan 29, 2019: ['Felix Parks']
Started on Feb 03, 2019: ['Logan Sharp']
Started on Feb 04, 2019: ['Eve Meyer']
Started on Feb 05, 2019: ['Neil Warner']
Started on Feb 11, 2019: ['Kylan Spencer']
Started on Feb 12, 2019: ['Adara Mclaughlin']
Started on Feb 14, 2019: ['Diana Mccall']
Started on Feb 15, 2019: ['Knox Williamson']
Started on Feb 17, 2019: ['Nathaniel Puckett']
[...]

In [9]: rewritten to load the data once into a dictionary sorted by date
#!/usr/bin/env python3
import csv
import datetime
import requests

FILE_URL="http://marga.com.ar/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""

    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = int(input('Enter a value for the year: '))
    month = int(input('Enter a value for the month: '))
    day = int(input('Enter a value for the day: '))
    print()

    return datetime.datetime(year, month, day)

def get_file_content(url):
    """Download the file over the internet and
    convert needed columns into sorted dictionary."""

    my_dict = {}
    with requests.get(url, stream=True) as r:
        lines = (line.decode('utf-8') for line in r.iter_lines())
        reader = csv.reader(lines)
        next(reader)
        for row in reader:
            # Note: plain assignment keeps only the last employee read for each date
            my_dict[row[3]] = [row[0] + " " + row[1]]
    return dict(sorted(my_dict.items()))

def get_same_or_newer(start_date, data):
    """Go through all the data and find the employees that started on the given date, or the closest one. Do not include dates after today."""

    my_date_employees = {}
    for key in data:
        name = data[key]
        date = datetime.datetime.strptime(key, '%Y-%m-%d')
        if date >= start_date and date <= datetime.datetime.today():
            my_date_employees[date] = name
    return my_date_employees

def list_newer(start_date, data):
    employees = get_same_or_newer(start_date, data)
    for key in employees:
        print("Started on {}: {}".format(key.strftime("%b %d, %Y"), employees[key]))

def main():
    data = get_file_content(FILE_URL)
    start_date = get_start_date()
    list_newer(start_date, data)

if __name__ == "__main__":
    main()


Getting the first start date to query for.

The date must be greater than Jan 1st, 2018
Enter a value for the year: 2019
Enter a value for the month: 1
Enter a value for the day: 1

Started on Jan 05, 2019: ['Lucy Calhoun']
Started on Jan 11, 2019: ['Macon Livingston']
Started on Jan 12, 2019: ['Curran Farley']
Started on Jan 13, 2019: ['Lucius Glass']
Started on Jan 14, 2019: ['Michael Pickett']
Started on Jan 15, 2019: ['Andrew Donaldson']
Started on Jan 19, 2019: ['Richard Dillon']
Started on Jan 21, 2019: ['Ainsley Knight']
Started on Jan 26, 2019: ['Casey Gross']
Started on Jan 29, 2019: ['Felix Parks']
Started on Feb 03, 2019: ['Logan Sharp']
Started on Feb 04, 2019: ['Eve Meyer']
Started on Feb 05, 2019: ['Neil Warner']
Started on Feb 11, 2019: ['Kylan Spencer']
Started on Feb 12, 2019: ['Adara Mclaughlin']
Started on Feb 14, 2019: ['Diana Mccall']
Started on Feb 15, 2019: ['Knox Williamson']
Started on Feb 17, 2019: ['Nathaniel Puckett']
Started on Feb 18, 2019: ['Keane Greer'
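
In the output above, Jan 21, 2019 lists only Ainsley Knight, while the earlier runs listed both Clare Saunders and Ainsley Knight for that date. The dictionary assignment `my_dict[row[3]] = [...]` replaces the list each time a date repeats. One possible way to keep everyone is to append to a list per date, sketched here on a small in-memory sample. The header names and the third column are placeholders; only the positions of the name, surname and date columns follow the script.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the column layout the script reads:
# row[0] = first name, row[1] = surname, row[3] = start date.
SAMPLE = """Name,Surname,Other,Start Date
Richard,Dillon,x,2019-01-19
Clare,Saunders,x,2019-01-21
Ainsley,Knight,x,2019-01-21
"""


def group_by_start_date(lines):
    """Map each start date to the list of all employees who started that day."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    by_date = defaultdict(list)
    for row in reader:
        by_date[row[3]].append("{} {}".format(row[0], row[1]))
    return dict(sorted(by_date.items()))


grouped = group_by_start_date(io.StringIO(SAMPLE))
print(grouped)
```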

In [10]: same as In [9], with extra prints of the dictionary for debugging
#!/usr/bin/env python3
import csv
import datetime
import requests

FILE_URL="http://marga.com.ar/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""

    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = int(input('Enter a value for the year: '))
    month = int(input('Enter a value for the month: '))
    day = int(input('Enter a value for the day: '))
    print()

    return datetime.datetime(year, month, day)

def get_file_content(url):
    """Download the file over the internet and
    convert needed columns into sorted dictionary."""

    my_dict = {}
    with requests.get(url, stream=True) as r:
        lines = (line.decode('utf-8') for line in r.iter_lines())
        reader = csv.reader(lines)
        next(reader)
        for row in reader:
            my_dict[row[3]] = [row[0] + " " + row[1]]
        print(dict(sorted(my_dict.items())))
    return dict(sorted(my_dict.items()))

def get_same_or_newer(start_date, data):
    """Go through all the data and find the employees that started on the given date, or the closest one. Do not include dates after today."""

    my_date_employees = {}
    for key in data:
        name = data[key]
        date = datetime.datetime.strptime(key, '%Y-%m-%d')
        if date >= start_date and date <= datetime.datetime.today():
            my_date_employees[date] = name
    return my_date_employees

def list_newer(start_date, data):
    employees = get_same_or_newer(start_date, data)
    for key in employees:
        print("Started on {}: {}".format(key.strftime("%b %d, %Y"), employees[key]))

def main():
    data = get_file_content(FILE_URL)
    start_date = get_start_date()
    list_newer(start_date, data)
    print (data)

if __name__ == "__main__":
    main()

{'2018-01-01': ['Aurelia Giles'], '2018-01-03': ['Dane Schwartz'], '2018-01-07': ['Jarrod Carlson'], '2018-01-08': ['Tiger May'], '2018-01-10': ['Dean Gilmore'], '2018-01-13': ['Brody Carter'], '2018-01-15': ['Indigo Chen'], '2018-01-21': ['Hiram Browning'], '2018-01-23': ['Katelyn May'], '2018-01-25': ['Joy Hicks'], '2018-01-26': ['Alana Potts'], '2018-01-27': ['Kasper Alford'], '2018-01-29': ['Kristen Christensen'], '2018-01-30': ['Vaughan Carter'], '2018-02-02': ['Lillith Pace'], '2018-02-05': ['Sydnee Pickett'], '2018-02-07': ['Dalton Dennis'], '2018-02-08': ['Edward Nichols'], '2018-02-09': ['Bradley Workman'], '2018-02-10': ['Rina Mcfarland'], '2018-02-11': ['Ivory Glenn'], '2018-02-16': ['Casey Tate'], '2018-02-19': ['Jerome Livingston'], '2018-02-25': ['Irene Dudley'], '2018-02-26': ['Dylan Jackson'], '2018-02-28': ['Clio Petersen'], '2018-03-03': ['Colt Mcdaniel'], '2018-03-07': ['Lareina Mercado'], '2018-03-08': ['Vivien Stark'], '2018-03-09': ['Hamilton Manning'], '2018-03-1

Enter a value for the year: 2019
Enter a value for the month: 1
Enter a value for the day: 1

Started on Jan 05, 2019: ['Lucy Calhoun']
Started on Jan 11, 2019: ['Macon Livingston']
Started on Jan 12, 2019: ['Curran Farley']
Started on Jan 13, 2019: ['Lucius Glass']
Started on Jan 14, 2019: ['Michael Pickett']
Started on Jan 15, 2019: ['Andrew Donaldson']
Started on Jan 19, 2019: ['Richard Dillon']
Started on Jan 21, 2019: ['Ainsley Knight']
Started on Jan 26, 2019: ['Casey Gross']
Started on Jan 29, 2019: ['Felix Parks']
Started on Feb 03, 2019: ['Logan Sharp']
Started on Feb 04, 2019: ['Eve Meyer']
Started on Feb 05, 2019: ['Neil Warner']
Started on Feb 11, 2019: ['Kylan Spencer']
Started on Feb 12, 2019: ['Adara Mclaughlin']
Started on Feb 14, 2019: ['Diana Mccall']
Started on Feb 15, 2019: ['Knox Williamson']
Started on Feb 17, 2019: ['Nathaniel Puckett']
Started on Feb 18, 2019: ['Keane Greer']
Started on Feb 20, 2019: ['May Oliver']
Started on Feb 21, 2019: ['Dana Harrington']
St
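
The output above shows a single employee per date because a plain dictionary assignment keeps only the last row seen for each date. A minimal sketch of grouping every employee under their start date instead, assuming a small in-memory sample in place of the downloaded CSV. The `group_by_start_date` helper, the `SAMPLE_LINES` data, the header names, and the third-column values are hypothetical and only serve the illustration:

```python
import csv
from collections import defaultdict

# Hypothetical sample standing in for the downloaded file. The header names and
# the third column are assumptions; only columns 0, 1 and 3 are actually used.
SAMPLE_LINES = [
    "Name,Surname,Department,Start Date",
    "Clare,Saunders,Sales,2019-01-21",
    "Ainsley,Knight,Sales,2019-01-21",
    "Felix,Parks,Sales,2019-01-29",
]


def group_by_start_date(lines):
    """Map each start date string to the list of employees who started that day."""
    grouped = defaultdict(list)
    for row in csv.reader(lines[1:]):  # skip the header line
        # append() keeps earlier names for the same date instead of replacing them
        grouped[row[3]].append("{} {}".format(row[0], row[1]))
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings
    return dict(sorted(grouped.items()))


if __name__ == "__main__":
    for date, names in group_by_start_date(SAMPLE_LINES).items():
        print("Started on {}: {}".format(date, names))
```

Using `defaultdict(list)` removes the need to check whether a date is already a key before appending to its list.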

In [11]:
#!/usr/bin/env python3


import csv
import datetime
import requests


FILE_URL = "https://storage.googleapis.com/gwg-hol-assets/gic215/employees-with-date.csv"

def get_start_date():
    """Interactively get the start date to query for."""

    print()
    print('Getting the first start date to query for.')
    print()
    print('The date must be greater than Jan 1st, 2018')
    year = int(input('Enter a value for the year: '))
    month = int(input('Enter a value for the month: '))
    day = int(input('Enter a value for the day: '))
    print()

    return datetime.datetime(year, month, day)

def get_file_lines(url):
    """Returns the lines contained in the file at the given URL"""

    # Download the file over the internet
    response = requests.get(url, stream=True)
    lines = []

    for line in response.iter_lines():
        lines.append(line.decode("UTF-8"))
    return lines


def get_same_or_newer(start_date, data):
    """Prints the employees that started on or after the given date, grouped by start date."""
    # Skip the header line, then parse the remaining lines as CSV rows
    reader = csv.reader(data[1:])
    data_dict = {}

    for row in reader:
        row_date = datetime.datetime.strptime(row[3], '%Y-%m-%d')
        fullname = "{} {}".format(row[0], row[1])
        if start_date <= row_date:
            # Group employees by start date, skipping duplicate names
            names = data_dict.setdefault(row_date, [])
            if fullname not in names:
                names.append(fullname)
    # Print the matching dates in chronological order
    for key in sorted(data_dict):
        print("Started on {}: {}".format(key.strftime("%b %d, %Y"), data_dict[key]))


def main():
    start_date = get_start_date()
    data = get_file_lines(FILE_URL)
    get_same_or_newer(start_date, data)

if __name__ == "__main__":
    main()



Getting the first start date to query for.

The date must be greater than Jan 1st, 2018
Enter a value for the year: 2019
Enter a value for the month: 1
Enter a value for the day: 1

Started on Jan 05, 2019: ['Lucy Calhoun']
Started on Jan 11, 2019: ['Macon Livingston']
Started on Jan 12, 2019: ['Curran Farley']
Started on Jan 13, 2019: ['Lucius Glass']
Started on Jan 14, 2019: ['Michael Pickett']
Started on Jan 15, 2019: ['Andrew Donaldson']
Started on Jan 19, 2019: ['Richard Dillon']
Started on Jan 21, 2019: ['Clare Saunders', 'Ainsley Knight']
Started on Jan 26, 2019: ['Fleur Baker', 'Casey Gross']
Started on Jan 29, 2019: ['Felix Parks']
Started on Feb 03, 2019: ['Logan Sharp']
Started on Feb 04, 2019: ['Eve Meyer']
Started on Feb 05, 2019: ['Neil Warner']
Started on Feb 11, 2019: ['Kylan Spencer']
Started on Feb 12, 2019: ['Adara Mclaughlin']
Started on Feb 14, 2019: ['Diana Mccall']
Started on Feb 15, 2019: ['Knox Williamson']
Started on Feb 17, 2019: ['Nathaniel Puckett']
Starte