After all those lessons, you now have plenty of tools at your disposal to code a lot of things!
In this notebook, the main topic will be **functions and code structure**.
In addition, we will explore how **reading from a file** works, along with a few other nice-to-know things.

Let's go!

# Function Practice 1: Parsing Daily Activities

Here's what we want to achieve. We have a file containing information about our daily activities and we would like to take that information and extract useful knowledge from it.

Let's start with the file. We will use a CSV (**C**omma-**S**eparated **V**alues) file. A CSV file is a simple format for storing data in a table-like structure. Each row has multiple values which are separated by commas, hence the name *comma-separated values*. It looks like this:

    value01,value02,value03
    value11,value12,value13
    value21,value22,value23

Often, you will see that the first line contains the column headers:

    header1,header2,header3
    value01,value02,value03
    value11,value12,value13
    value21,value22,value23

Note that a CSV file does not have to use commas as the separator. A semicolon is often used instead, because it leaves the comma free for use inside the values, for example as a decimal separator or within sentences.
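To see why the separator matters, here is a minimal sketch that parses a few lines from an in-memory string instead of a file. The sample text and the variable names are made up for illustration; only `csv.reader` and its `delimiter` argument come from the standard library.

```python
import csv
import io

# Hypothetical sample: decimal commas inside the values, ';' between the columns.
sample = "name;price\ncoffee;2,50\ntea;1,80\n"

# With the right delimiter, "2,50" stays a single value.
rows_semicolon = list(csv.reader(io.StringIO(sample), delimiter=';'))
print(rows_semicolon)

# With the default comma delimiter, the rows are split in the wrong place.
rows_comma = list(csv.reader(io.StringIO(sample)))
print(rows_comma)
```

The second result shows values such as `'coffee;2'` and `'50'`, which is why we always have to tell the reader which separator the file uses.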

All right. For our daily activities we will have one CSV file per day and each CSV file will have the following columns:

- **activity**: The name of the activity.
- **start_time**: The time at which the activity started (format is e.g. 14:05).
- **duration**: How long the activity lasted, in hours.
- **is_outdoor**: A flag that says whether the activity happened outdoors (1) or indoors (0).

Enough explanation for now, let's start to write a little code. 

## 1. Reading the CSV

The first thing we want to do is read the CSV files located in `data/`. [How do we do that?](http://lmgtfy.com/?q=python+read+csv+file)


In [1]:
# Python has a huge standard library, and manipulating CSV files is part of it.
# The code below is based on what we found on Google.

import csv # We import the csv module

# This is new, we will see later what it does
# WITH the csv file OPENED and available in a VARIABLE, we DO something
with open('data/monday.csv') as csvfile:
    # Then, we create a CSV reader that will help us read the CSV.
    # We just give it the file and say that we use ; as separator / delimiter.
    reader = csv.reader(csvfile, delimiter=';')
    
    # The reader is an iterator: each line of the file comes back as a list of strings.
    for row in reader:
        print(row)

['activity', 'start_time', 'duration', 'is_outdoor']
['Healthy Breakfast', '07:00', '0.2', '0']
['Bus', '07:30', '0.5', '1']
['School', '08:15', '4', '0']
['Lunch', '12:00', '0.8', '1']
['Catching Pokemons', '12:48', '0.2', '1']
['School', '13:00', '4', '0']
['Bus', '17:00', '0.4', '1']
['Work on assignment', '17:40', '1.5', '0']
['Dinner with family', '19:20', '0.5', '0']
['Study for exam', '19:50', '2', '0']
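As a short preview of what `with` does for us: it keeps the file open inside the indented block and closes it automatically as soon as the block ends. The sketch below shows this on a small throwaway file; the file name and its content are assumptions made only so that the example runs anywhere.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical content) so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.csv')
with open(path, 'w') as f:
    f.write('activity;duration\nBus;0.5\n')

with open(path) as csvfile:
    first_line = csvfile.readline()
    print(csvfile.closed)  # False: the file is still open inside the block

print(csvfile.closed)      # True: Python closed it when the block ended
print(first_line.strip())
```

Without `with`, we would have to remember to call `csvfile.close()` ourselves.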


Ok, so now that this is working, how about we put it into a nice function? This function will read the CSV and store it in a **list of dictionaries** so that we can perform other operations later.

If you browse the [official Python 3 documentation for CSV](https://docs.python.org/3/library/csv.html#csv.DictReader), which is very easy to google, you will see that there is a `csv.DictReader` that looks like exactly what we need, so let's just use it in our function.

In [87]:
# Note: we already imported csv once, no need to import it again.

def read_activity_file(activity_file):
    ''' Open the activity CSV file and return the data as
    a list of dictionaries.
    '''
    activities = []
    with open(activity_file) as csvfile:
        # DictReader uses the first row as the keys of each row dictionary
        reader = csv.DictReader(csvfile, delimiter=';')
        
        for row in reader: # Now row is a dictionary
            #print(row)
            activities.append(row)
    
    return activities

In [88]:
# Now we can just call the function with a path to a csv file.
monday = read_activity_file('data/monday.csv')

print('Monday: ', monday)

Monday:  [{'duration': '0.2', 'activity': 'Healthy Breakfast', 'start_time': '07:00', 'is_outdoor': '0'}, {'duration': '0.5', 'activity': 'Bus', 'start_time': '07:30', 'is_outdoor': '1'}, {'duration': '4', 'activity': 'School', 'start_time': '08:15', 'is_outdoor': '0'}, {'duration': '0.8', 'activity': 'Lunch', 'start_time': '12:00', 'is_outdoor': '1'}, {'duration': '0.2', 'activity': 'Catching Pokemons', 'start_time': '12:48', 'is_outdoor': '1'}, {'duration': '4', 'activity': 'School', 'start_time': '13:00', 'is_outdoor': '0'}, {'duration': '0.4', 'activity': 'Bus', 'start_time': '17:00', 'is_outdoor': '1'}, {'duration': '1.5', 'activity': 'Work on assignment', 'start_time': '17:40', 'is_outdoor': '0'}, {'duration': '0.5', 'activity': 'Dinner with family', 'start_time': '19:20', 'is_outdoor': '0'}, {'duration': '2', 'activity': 'Study for exam', 'start_time': '19:50', 'is_outdoor': '0'}]
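To make the role of the header line more visible, here is a small sketch that runs `csv.DictReader` on an in-memory string. The two sample rows are made up, but they follow the same column layout as our activity files.

```python
import csv
import io

# Hypothetical two-row sample in the same layout as our activity files.
sample = (
    "activity;start_time;duration;is_outdoor\n"
    "Bus;07:30;0.5;1\n"
    "School;08:15;4;0\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter=';')
rows = [dict(row) for row in reader]

print(reader.fieldnames)    # the column names taken from the first line
print(rows[0]['activity'])  # Bus
print(rows[1]['duration'])  # 4, still stored as a string at this point
```

Every value comes back as a string, whatever it looks like in the file.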


In [108]:
# Let's print it in a nicer way... and put this in a function too.
# (We first tried the loop directly on monday, then moved it into the function.)

def print_activities(activities):
    ''' Prints the activities in a beautiful way.
    '''
    
    for activity in activities:
        print('{:30} at {} during {:3} hours (Outdoor: {})'.format(activity['activity'], 
                                                        activity['start_time'],
                                                        activity['duration'],
                                                        activity['is_outdoor']))
    
print_activities(monday)

Healthy Breakfast              at 07:00 during 0.2 hours (Outdoor: False)
Bus                            at 07:30 during 0.5 hours (Outdoor: True)
School                         at 08:15 during 4.0 hours (Outdoor: False)
Lunch                          at 12:00 during 0.8 hours (Outdoor: True)
Catching Pokemons              at 12:48 during 0.2 hours (Outdoor: True)
School                         at 13:00 during 4.0 hours (Outdoor: False)
Bus                            at 17:00 during 0.4 hours (Outdoor: True)
Work on assignment             at 17:40 during 1.5 hours (Outdoor: False)
Dinner with family             at 19:20 during 0.5 hours (Outdoor: False)
Study for exam                 at 19:50 during 2.0 hours (Outdoor: False)


By looking at our list of activity dictionaries, we can see all our data, but there are a few details we should take care of.

1. The duration is really a floating-point number, but it is stored as a string, because the CSV reader returns every value as a string. So we need to convert all durations to floats.
1. The `is_outdoor` key always stores 0 or 1, but as a string '0' or '1'. It would be best to update all activities and store a real boolean value!
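Before fixing the whole list, it may help to try both conversions on single values. The sketch below uses made-up variable names; it also shows why comparing with `'1'` is safer than calling `bool()` on the string.

```python
duration_text = '0.2'
outdoor_text = '0'

# 1) String to float
duration = float(duration_text)

# 2) String to boolean: compare with '1'
is_outdoor = outdoor_text == '1'

# Pitfall: bool() on any non-empty string is True, even for '0'
naive = bool(outdoor_text)

print(duration, type(duration))
print(is_outdoor)  # False
print(naive)       # True, which is not what we want
```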

In [109]:
def fix_activites(activites):
    ''' Fix data of all activities.
    '''
    for activity in activites:
        # 1)
        activity['duration'] = float( activity['duration'] )
        
        # 2)
        # First check that is_outdoor is not already a boolean (this protects us if we run this twice)
        if not isinstance(activity['is_outdoor'], bool):
            # is_outdoor becomes True if it's '1', False otherwise.
            activity['is_outdoor'] = activity['is_outdoor'] == '1'
            
fix_activites(monday)            
            
print_activities(monday)

# Testing the duration:
print('\nDuration of first activity: {} (type: {})'.format(monday[0]['duration'], type(monday[0]['duration'])) )

Healthy Breakfast              at 07:00 during 0.2 hours (Outdoor: False)
Bus                            at 07:30 during 0.5 hours (Outdoor: True)
School                         at 08:15 during 4.0 hours (Outdoor: False)
Lunch                          at 12:00 during 0.8 hours (Outdoor: True)
Catching Pokemons              at 12:48 during 0.2 hours (Outdoor: True)
School                         at 13:00 during 4.0 hours (Outdoor: False)
Bus                            at 17:00 during 0.4 hours (Outdoor: True)
Work on assignment             at 17:40 during 1.5 hours (Outdoor: False)
Dinner with family             at 19:20 during 0.5 hours (Outdoor: False)
Study for exam                 at 19:50 during 2.0 hours (Outdoor: False)

Duration of first activity: 0.2 (type: <class 'float'>)


## 2. Work with the data

Now that we have our data in a good, easy-to-use data structure, we can start writing more functions that extract information from that data.

We will explore a few of them, but if you have more ideas, feel free to try to implement them.

*Note:* If you come across any error message, just try to google the exception. If you don't know how to do a particular thing, then try to google it too. **And of course, don't hesitate to contact me if you need any help!**

### 2.1 The longest activity

Show the activity that took the most hours in a day! Keep in mind that an activity can happen multiple times in a day.

In [115]:
# Strategy:
# We loop over all activities and sum identical activities together by
# storing the activity name in a dictionary as the key. This makes
# sure that we store each activity once. The values are the summed durations.

def longest_activity(activities):
    ''' This function returns the longest activity of a day
    along with the total hours.
    '''
    # 1) Sum the durations of identical activities
    unique_activities = {} # key = activity, value = summed duration (remember, keys are unique)
    for activity_dict in activities:
        activity = activity_dict['activity']
        duration = activity_dict['duration']

        if activity in unique_activities:
            # If the key is already in, then we add the duration
            unique_activities[activity] += duration
        else:
            # If the key is not in yet, we add it with the duration as a value.
            # += would fail, because the key-value does not exist yet
            unique_activities[activity] = duration
        
    # 2) Find the longest activity
    result = None # The key (or activity) with the longest duration
    # Loop over our new dictionary
    # activity is the key, duration is the value
    for activity, duration in unique_activities.items():
        if result is None:
            # If we have no longest activity yet, we just take the first one.
            result = activity
        elif duration > unique_activities[result]:
            # Else, we compare the current duration with the duration of result.
            # The sums are already floats (see fix_activites), so no conversion is needed.
            result = activity
                
    # 3) Return our result as a dictionary
    return {'activity': result, 'duration': unique_activities[result]}

In [116]:
longest_activity(monday)

{'activity': 'School', 'duration': 8.0}
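Once the step-by-step version is clear, the same idea can be written more compactly with tools from the standard library. This is only an alternative sketch, not the version used in the rest of this notebook: the name `longest_activity_short` and the sample list are made up, and it assumes a non-empty list whose durations are already floats.

```python
from collections import defaultdict

def longest_activity_short(activities):
    ''' Return the activity with the largest summed duration. '''
    # defaultdict(float) starts every new key at 0.0, so += always works
    totals = defaultdict(float)
    for entry in activities:
        totals[entry['activity']] += entry['duration']

    # max() with key=totals.get picks the key with the largest value
    name = max(totals, key=totals.get)
    return {'activity': name, 'duration': totals[name]}

# Hypothetical sample data, already "fixed" (durations are floats)
sample_day = [
    {'activity': 'School', 'duration': 4.0},
    {'activity': 'Bus', 'duration': 0.5},
    {'activity': 'School', 'duration': 4.0},
]
print(longest_activity_short(sample_day))
```

The `defaultdict` replaces the `if ... else` around the sum, and `max()` replaces the second loop.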

## 3. Apply to many days

To summarize:

- We can read a CSV file and store its data into dictionaries. And we have it as a function.
- We can print the activity data in a nice way. And we have it as a function, too.
- We can fix the values of the dictionaries (str to bool, float, ...). And we have it again as a function.
- We can find out what the longest activity is for any list of activities! And we have it as a function!

But, we did all this for one CSV file only... What if now you need to do the same steps for many more CSV files?

Now, the little extra work that we spent structuring our code and making functions will **pay off big**! We literally have a function for every step that we performed. We can now, with very little work, apply all our steps to any number of files.

In [119]:
days_files = [
    'data/sunday.csv',
    'data/monday.csv',
    'data/friday.csv'
]
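If the list of files grows, typing every path by hand gets tedious. One possible alternative is to let the standard `glob` module find all CSV files in a folder. The sketch below uses a temporary folder with two empty files as a stand-in for `data/`, so the folder and file names here are assumptions for illustration only.

```python
import glob
import os
import tempfile

# Build a throwaway folder with two empty CSV files so the sketch runs anywhere.
data_dir = tempfile.mkdtemp()
for name in ('monday.csv', 'sunday.csv'):
    open(os.path.join(data_dir, name), 'w').close()

# '*.csv' matches every file ending in .csv in that folder.
# sorted() gives a predictable order; glob on its own does not promise one.
found_files = sorted(glob.glob(os.path.join(data_dir, '*.csv')))
print([os.path.basename(p) for p in found_files])
```

With our real folder, the pattern would simply be `'data/*.csv'`.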

In [120]:
for day_file in days_files:
    print('*' * 80)
    print('File:', day_file)
    
    day = read_activity_file(day_file) # Read the file

    fix_activites(day) # Fix the data

    print_activities(day) # Print the data
    
    la = longest_activity(day) # Get the longest activity
    print('Longest activity:', la) # Show the longest activity

********************************************************************************
File: data/sunday.csv
Unhealthy Breakfast            at 9:00 during 0.3 hours (Outdoor: False)
Take dog out for a walk        at 10:05 during 0.5 hours (Outdoor: True)
Play video games               at 10:45 during 1.5 hours (Outdoor: False)
Lunch at restaurant            at 12:30 during 1.0 hours (Outdoor: False)
Mountain bike tour in forest   at 14:00 during 2.0 hours (Outdoor: True)
Play video games               at 16:30 during 2.0 hours (Outdoor: False)
Dinner                         at 19:00 during 0.6 hours (Outdoor: False)
Longest activity: {'duration': 3.5, 'activity': 'Play video games'}
********************************************************************************
File: data/monday.csv
Healthy Breakfast              at 07:00 during 0.2 hours (Outdoor: False)
Bus                            at 07:30 during 0.5 hours (Outdoor: True)
School                         at 08:15 during 4.0 hours (Outdoo