In [8]:
"""
Measuring time I
In the lecture slides, you saw how the time.time() function can be loaded and used to assess the time required to 
perform a basic mathematical operation.

Now, you will use the same strategy to assess two different methods for solving a similar problem: calculate 
the sum of squares of all the positive integers from 1 to 1 million (1,000,000).

Similar to what you saw in the video, you will compare two methods: one that uses brute force and
one that is more mathematically sophisticated.

In the function formula, we use the standard formula

    1^2 + 2^2 + ... + N^2 = N(N+1)(2N+1)/6

where N = 1,000,000.

In the function brute_force, we loop over each number from 1 to 1 million and add its square to the result.


"""

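import time

def formula(N):
    # Closed-form sum of squares; integer division (//) keeps the result exact
    return (N * (N + 1) * (2 * N + 1)) // 6

def brute_force(N):
    # Add the square of every integer from 1 to N (inclusive, hence N + 1)
    result = 0
    for x in range(1, N + 1):
        result = result + (x * x)
    return result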

# Calculate the result of the problem using formula() and print the time required
N = 1000000
fm_start_time = time.time()
first_method = formula(N)
print("Time using formula: {} sec".format(time.time() - fm_start_time))

# Calculate the result of the problem using brute_force() and print the time required
sm_start_time = time.time()
second_method = brute_force(N)
print("Time using the brute force: {} sec".format(time.time() - sm_start_time))

Time using formula: 0.00030112266540527344 sec
Time using the brute force: 0.15403032302856445 sec
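As a sketch that goes beyond the original exercise: `time.time()` can be too coarse for an operation as quick as `formula()`, so the standard-library `time.perf_counter()` and `timeit` are a common alternative. The functions below use integer division and an inclusive range, and the sketch checks that both methods return the same value before timing them. Timings will vary by machine.

```python
import time
import timeit


def formula(n):
    # Closed form for 1^2 + 2^2 + ... + n^2; // keeps the result an exact int
    return n * (n + 1) * (2 * n + 1) // 6


def brute_force(n):
    # Add the square of every integer from 1 to n inclusive
    return sum(x * x for x in range(1, n + 1))


N = 1_000_000

# A speed comparison only matters once both methods agree exactly
assert formula(N) == brute_force(N)

# perf_counter() is a monotonic, high-resolution clock intended for timing
start = time.perf_counter()
formula(N)
print("formula:     {:.6f} sec".format(time.perf_counter() - start))

# timeit calls brute_force several times and returns the total elapsed time
runs = 3
total = timeit.timeit(lambda: brute_force(N), number=runs)
print("brute force: {:.6f} sec per run".format(total / runs))
```

With floating-point division (`/`), the formula's result for N = 1,000,000 exceeds the range where floats represent integers exactly, which is why the sketch uses `//`.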


In [10]:
"""
Measuring time II
As we discussed in the lectures, in the majority of cases, a list comprehension is faster than a for loop.

In this demonstration, you will see a case where a list comprehension and a for loop differ so little
in efficiency that either method performs this simple task almost instantly.

In the list words, there are random words downloaded from the Internet. We want to create
another list called letlist in which we only keep the words that start with the letter b.

In case you are not familiar with dealing with strings in Python, each string has the .startswith()
method, which returns True or False depending on whether the string starts with a specific letter/phrase.
"""

import time

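# The download returned a 404 error page, so these entries are HTML lines rather than words.
# Stored as a list (not one string) so the loops below iterate over items, not characters.
words = ['<html>', '<head><title>404 Not Found</title></head>', '<body>', '<center><h1>404 Not Found</h1></center>', '<hr><center>nginx</center>', '</body>', '</html>']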


# Store the time before the execution
start_time = time.time()

# Execute the operation
letlist = [wrd for wrd in words if wrd.startswith('b')]

# Store and print the difference between the start and the current time
total_time_lc = time.time() - start_time
print('Time using list comprehension: {} sec'.format(total_time_lc))

# Store the time before the execution
start_time = time.time()

# Execute the operation
letlist = []
for wrd in words:
    if wrd.startswith('b'):
        letlist.append(wrd)
        
# Print the difference between the start and the current time
total_time_fl = time.time() - start_time
print('Time using for loop: {} sec'.format(total_time_fl))

Time using list comprehension: 0.0009946823120117188 sec
Time using for loop: 0.0009319782257080078 sec
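The near-identical timings above come from a single run on a very small input. A minimal sketch that repeats each approach with `timeit` on a larger word list gives a steadier comparison; the sample words here are hypothetical, invented for illustration, and the numbers you get will depend on your machine and Python version.

```python
import timeit

# Hypothetical sample data (not the exercise's word list), repeated to 6,000 items
words = ["banana", "apple", "berry", "cherry", "bread", "date"] * 1000


def with_comprehension(ws):
    # Keep only the words that start with "b"
    return [w for w in ws if w.startswith("b")]


def with_loop(ws):
    # Same filter written as an explicit loop with append()
    result = []
    for w in ws:
        if w.startswith("b"):
            result.append(w)
    return result


# Both approaches must produce identical lists before timing matters
assert with_comprehension(words) == with_loop(words)

runs = 100
lc = timeit.timeit(lambda: with_comprehension(words), number=runs)
fl = timeit.timeit(lambda: with_loop(words), number=runs)
print("list comprehension: {:.6f} sec per run".format(lc / runs))
print("for loop:           {:.6f} sec per run".format(fl / runs))
```

Averaging over many runs smooths out one-off noise such as garbage collection or other processes, which a single `time.time()` measurement cannot do.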


In [13]:
"""
Row selection: loc[] vs iloc[]
A big part of working with DataFrames is to locate specific entries in the dataset. You can locate rows in two ways:

1. By a specific value of a column (feature).
2. By the index of the rows (index).

In this exercise, we will focus on the second way.
If you have previous experience with pandas, you should be familiar with the .loc and .iloc
indexers, which stand for 'location' and 'index location' respectively: .loc selects rows by index label,
while .iloc selects them by integer position. In most cases, the index labels will be the same as the
position of each row in the DataFrame (e.g. the row with index 13 will be the 14th entry).

While both indexers can perform the same task here, we are interested in which one is faster.
"""
# Import time for measuring and pandas as pd
import time
import pandas as pd

# Load the poker hands dataset
poker_hands = pd.read_csv("poker-hand-testing.csv")
# Define the range of rows to select: row_nums
row_nums = range(0, 1000)

# Select the rows using .loc[] and row_nums and record the time before and after
loc_start_time = time.time()
rows = poker_hands.loc[row_nums]
loc_end_time = time.time()

# Print the time it took to select the rows using .loc[]
print("Time using .loc[]: {} sec".format(loc_end_time - loc_start_time))


# Select the rows using .iloc[] and row_nums and record the time before and after
iloc_start_time = time.time()
rows = poker_hands.iloc[row_nums]
iloc_end_time = time.time()

# Print the time it took to select the rows using .iloc
print("Time using .iloc[]: {} sec".format(iloc_end_time-iloc_start_time))


Time using .loc[]: 0.004452228546142578 sec
Time using .iloc[]: 0.0006456375122070312 sec
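Because `poker-hand-testing.csv` may not be at hand, the sketch below uses a hypothetical stand-in DataFrame of random integers (its shape and values are assumptions, not the real dataset). It repeats each selection with `timeit`, then shuffles the rows to show that `.loc[]` and `.iloc[]` return the same rows only while index labels match positions. The size of the timing gap may differ across pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for poker_hands: 100,000 rows x 10 columns of random ints
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 14, size=(100_000, 10)))

row_nums = range(0, 1000)

# With the default RangeIndex, labels equal positions, so both select the same rows
assert df.loc[row_nums].equals(df.iloc[row_nums])

runs = 100
t_loc = timeit.timeit(lambda: df.loc[row_nums], number=runs)
t_iloc = timeit.timeit(lambda: df.iloc[row_nums], number=runs)
print(".loc[]:  {:.6f} sec per run".format(t_loc / runs))
print(".iloc[]: {:.6f} sec per run".format(t_iloc / runs))

# After shuffling, labels no longer match positions and the indexers diverge
shuffled = df.sample(frac=1, random_state=0)
print(shuffled.loc[[0]].index.tolist())   # the row whose label is 0
print(shuffled.iloc[[0]].index.tolist())  # the label of the first row by position
```

This is why the choice between the two is not only about speed: `.iloc[]` always means "the n-th row", whereas `.loc[]` means "the row labelled n".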
