# Optimizing Code

This notebook experiments with a simple dataset and demonstrates how to measure code efficiency using Python's built-in `time` module.
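For measuring short intervals, the same `time` module also provides `time.perf_counter()`, a monotonic, high-resolution clock. The following is a minimal sketch of a reusable timing helper; `timed` and `sum_of_squares` are hypothetical names introduced here for illustration and are not part of the notebook.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()           # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def sum_of_squares(n):
    # hypothetical workload, used only to have something to time
    return sum(i * i for i in range(n))


total, seconds = timed(sum_of_squares, 100_000)
print("Result: {}".format(total))
print("Duration: {} seconds".format(seconds))
```

Wrapping the measurement in a helper keeps the start/end bookkeeping in one place, so each experiment only has to supply the function to run.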

In [4]:
import os
import time

# NumPy is needed for the vectorized approach used later in the notebook
import numpy as np

In [9]:
# change the working directory to the folder that contains the data files
os.chdir('/Users/alejandrosanz/Downloads/projects_on_GitHub/data_wrangling/coding_experiments/time_your_code/data/')

## Case 1: Find Common Books

There are two `txt` files: `books_published_last_two_years.txt` and `all_coding_books.txt`. The code below reads each file into a list, one book id per line, and then finds the book ids common to both files to obtain a list of recent coding books.

In [12]:
with open('books_published_last_two_years.txt') as f:
    recent_books = f.read().split('\n')
    
with open('all_coding_books.txt') as f:
    coding_books = f.read().split('\n')
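One thing to watch with `split('\n')`: if a file ends with a newline, the resulting list ends with an empty string, and if both files end that way the empty string is counted as a common id. A small sketch of the difference, using an in-memory string with made-up ids and assuming blank lines should be ignored:

```python
# made-up file contents: three ids followed by a trailing newline
text = "1001\n1002\n1003\n"

with_split = text.split('\n')
with_splitlines = text.splitlines()

# drop blank lines explicitly, in case the file contains empty rows
cleaned = [line for line in text.splitlines() if line.strip()]

print(with_split)       # ['1001', '1002', '1003', '']
print(with_splitlines)  # ['1001', '1002', '1003']
print(cleaned)          # ['1001', '1002', '1003']
```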

### Using loops

In [14]:
# record the start time
start = time.time()

# initialize a list to collect the recent coding books
recent_coding_books = []

for book in recent_books:
    if book in coding_books:
        recent_coding_books.append(book)

# record the end time of the loop
end = time.time()

# calculate the time difference
duration = end - start

# print out the results
print("Total recent books: {}.".format(len(recent_coding_books)))
print('Duration: {} seconds'.format(duration))

Total recent books: 96.
Duration: 12.488204002380371 seconds
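The loop is slow mainly because `book in coding_books` scans the list from the start on every iteration. The following is a minimal sketch of the same filtering with the lookups done against a set instead. It uses synthetic ids rather than the notebook's files, and the `_demo` names and sizes are hypothetical.

```python
import time

# synthetic stand-ins for the two lists of book ids
recent_books_demo = [str(i) for i in range(0, 6000, 2)]   # even ids
coding_books_demo = [str(i) for i in range(0, 6000, 3)]   # multiples of 3

# original approach: membership test against a list (linear scan each time)
start = time.time()
common_list_lookup = [b for b in recent_books_demo if b in coding_books_demo]
list_duration = time.time() - start

# same filtering, but membership test against a set (hash lookup)
coding_books_set = set(coding_books_demo)
start = time.time()
common_set_lookup = [b for b in recent_books_demo if b in coding_books_set]
set_duration = time.time() - start

print("Common ids: {}".format(len(common_set_lookup)))
print("List lookup: {} seconds".format(list_duration))
print("Set lookup: {} seconds".format(set_duration))
```

Unlike a full set intersection, this variant keeps the order of `recent_books_demo` and any duplicates in it, which matters if the result should mirror the original loop exactly.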


### Using the vectorized method provided by NumPy

In [15]:
# record the start time
start = time.time()

# use NumPy's intersect1d function to find the elements common to both arrays
recent_coding_books = np.intersect1d(recent_books, coding_books)

# record the end time of the computation
end = time.time()

# calculate the time difference
duration = end - start

# print out the results
print("Total recent books: {}.".format(len(recent_coding_books)))
print('Duration: {} seconds'.format(duration))

Total recent books: 96.
Duration: 0.042433977127075195 seconds
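Note that `np.intersect1d` returns the sorted, unique values common to both inputs, so duplicates and the original order are not preserved. A small sketch with made-up ids:

```python
import numpy as np

recent_demo = ['b7', 'a1', 'c3', 'a1']   # note the duplicate 'a1'
coding_demo = ['c3', 'a1', 'd9']

common = np.intersect1d(recent_demo, coding_demo)

print(common)        # ['a1' 'c3']
print(len(common))   # 2
```

If the input list contained repeated ids, the loop version would count each repeat while this version counts each id once, so the two totals agree only when the ids are unique.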


### Using Python's built-in set

In [17]:
# record the start time
start = time.time()

# use the intersection operation of Python's built-in set data structure
recent_coding_books = set(recent_books).intersection(set(coding_books))
# recent_coding_books = set(recent_books) & set(coding_books)

# record the end time of the computation
end = time.time()

# calculate the time difference
duration = end - start

# print out the results
print("Total recent books: {}.".format(len(recent_coding_books)))
print('Duration: {} seconds'.format(duration))

Total recent books: 96.
Duration: 0.009217977523803711 seconds


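**Conclusion**: In this scenario, the set-based method has the shortest execution time (about 0.009 seconds), followed by NumPy's `intersect1d` (about 0.042 seconds). The loop is by far the slowest (about 12.5 seconds), because every `in` test against a list scans that list element by element.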
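A single `time.time()` measurement can be noisy, so the ranking above is worth confirming with repeated runs. The following is a hedged sketch using the standard library's `timeit` module on synthetic ids. The data, sizes, and function names are assumptions for illustration, not the notebook's files.

```python
import timeit

# synthetic stand-ins for the two lists of book ids
recent_demo = [str(i) for i in range(0, 4000, 2)]
coding_demo = [str(i) for i in range(0, 4000, 3)]


def with_loop():
    # list membership test, as in the loop-based approach
    return [b for b in recent_demo if b in coding_demo]


def with_set():
    # set intersection, as in the set-based approach
    return set(recent_demo).intersection(coding_demo)


# repeat each measurement and keep the best time, which is least affected by noise
loop_best = min(timeit.repeat(with_loop, number=1, repeat=3))
set_best = min(timeit.repeat(with_set, number=1, repeat=3))

print("Loop: {} seconds".format(loop_best))
print("Set:  {} seconds".format(set_best))
```

Taking the minimum over several repeats is a common convention with `timeit`, since slower runs usually reflect interference from other processes rather than the code itself.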