## Python Code Optimization

In this notebook I walk through simple examples of when to use NumPy or sets instead of for loops to reduce execution time. I also introduce set notation to build an intuition for the problems. Here is an overview:
- Python Loops
- Numpy methods
- Set notation
- Set methods
- Test problem

### Dependencies

In [2]:
import time
import numpy as np
import pandas as pd

### The Data

In [3]:
with open("Data/books_published_last_two_years.txt") as file:
    recent_books = file.read().split("\n")
    
with open("Data/all_coding_books.txt") as file:
    coding_books = file.read().split("\n")
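One thing to watch when loading the files this way: if a file ends with a newline, `split("\n")` leaves an empty string as the last element, and that empty "title" would then count as a book. Whether these data files end with a newline is not shown here, so treat the following as a small sketch on a made-up string rather than a statement about the real data.

```python
# Made-up file contents ending with a trailing newline (an assumption for illustration)
text = "Book A\nBook B\n"

with_split = text.split("\n")       # keeps an empty string after the last newline
with_splitlines = text.splitlines() # drops it

print(with_split)
print(with_splitlines)
```

If that turns out to matter for your files, `file.read().splitlines()` is a drop-in alternative.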

In [4]:
print(f"Number of books in recent_books: {len(recent_books)}")
print("")
print(f"Number of books in coding_books: {len(coding_books)}")

Number of books in recent_books: 24159

Number of books in coding_books: 32250


## The Python Loops method

Problem: Using a loop, find the books that `recent_books` and `coding_books` have in common, and time how long it takes.

In [4]:
start = time.time()
recent_coding_books = []

for book in recent_books:
    if book in coding_books:
        recent_coding_books.append(book)
        
print(len(recent_coding_books))
print(f"Execution time: {time.time() - start}")

96
Execution time: 25.585691452026367
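Why is this so slow? The `in` test on a list scans the list element by element, so the loop above can perform up to about 24,159 × 32,250 comparisons in the worst case. The loop itself is not the main cost; the list lookup is. Below is a minimal sketch on hypothetical stand-in data (the `book-N` titles are invented) that keeps the same logic but tests membership against a set, where lookups are hash-based.

```python
import time

# Hypothetical stand-in data; the notebook's real lists are much larger.
recent = [f"book-{i}" for i in range(0, 4000, 2)]   # even ids
coding = [f"book-{i}" for i in range(0, 6000, 3)]   # multiples of 3

# Same logic as the loop above: each `in` scans the list from the start.
start = time.perf_counter()
common_list_lookup = [book for book in recent if book in coding]
list_seconds = time.perf_counter() - start

# Same logic, but membership is tested against a set (hash lookup).
start = time.perf_counter()
coding_lookup = set(coding)
common_set_lookup = [book for book in recent if book in coding_lookup]
set_seconds = time.perf_counter() - start

# Both approaches find the same titles, in the same order.
assert common_list_lookup == common_set_lookup
print(len(common_set_lookup))
print(f"list lookup: {list_seconds:.4f}s, set lookup: {set_seconds:.4f}s")
```

The sketch uses `time.perf_counter()` because it is intended for measuring short durations; exact timings will vary by machine.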


## The NumPy ways

Use vector operations over loops when possible.

Problem: Find a NumPy method that solves the same problem as above with a faster execution time.

In [5]:
start = time.time()
recent_coding_books = np.intersect1d(coding_books, recent_books)
print(len(recent_coding_books))
print(f"Execution time: {time.time() - start}")

96
Execution time: 0.1872568130493164


Think about the problem: we are actually looking for an intersection (the elements the two datasets have in common).
With $A$ = `recent_books` and $B$ = `coding_books`, we therefore want $A \cap B$.
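To connect the notation to code, here is a small sketch on made-up titles (the lists `A` and `B` are illustrative, not taken from the data files). It computes $A \cap B$ once with `np.intersect1d`, which returns the sorted unique common values, and once with Python's set operator `&`.

```python
import numpy as np

# Illustrative lists standing in for the two datasets
A = ["Fluent Python", "Clean Code", "Dune"]
B = ["Clean Code", "Fluent Python", "Refactoring"]

common_np = np.intersect1d(A, B)   # sorted, unique values found in both inputs
common_set = set(A) & set(B)       # A ∩ B with built-in sets

# Both approaches agree on the members of the intersection
assert set(common_np.tolist()) == common_set
print(common_np.tolist())
```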

## Enter Sets!

Know your data structures and which methods are faster.

Problem: Use sets to decrease your execution time!

In [6]:
start = time.time()
recent_coding_books = set(recent_books).intersection(set(coding_books))
print(len(recent_coding_books))
print(f"Execution time: {time.time() - start}")

96
Execution time: 0.019755125045776367
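Two details are worth keeping in mind when swapping the loop for sets. First, a set drops duplicates and has no guaranteed order, so the result can differ from the loop's list if a title appears more than once. Second, `set.intersection` accepts any iterable, so the second `set(...)` call is optional. The sketch below shows both on made-up single-letter "titles".

```python
# Made-up data with a duplicate in the first list
recent = ["A", "B", "B", "C"]
coding = ["B", "C", "D"]

# Loop-style result keeps duplicates and the order of `recent`
loop_result = [book for book in recent if book in coding]

# Set result contains each common title once; the argument can stay a list
set_result = set(recent).intersection(coding)

print(loop_result)
print(sorted(set_result))
```

In the notebook all three methods report 96 books, so duplicates do not appear to change the count here.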


## Test Problem

Say your online gift store has one million users, each of whom listed one gift on a wish list. The prices of these gifts are stored in `gift_costs.txt`. For the holidays, you are going to give each customer their wish-list gift for free if it costs under 25 dollars. You want to calculate the total cost of all gifts under 25 dollars to see how much you would spend on free gifts. The code below also multiplies each price by 1.08, presumably to account for an 8% sales tax.

In [7]:
with open("gift_costs.txt") as file:
    gift_costs = file.read().split("\n")
    
gift_costs = np.array(gift_costs).astype(int)

In [10]:
print(len(gift_costs))
gift_costs[:11]

10000001


array([ 8, 84, 42, 65, 74, 66, 48, 27, 78, 52, 97])

In [13]:
start = time.time()

total_price = 0
for cost in gift_costs:
    if cost < 25:
        total_price += cost * 1.08

print(f"Total price: {total_price}")
print(f"Execution time: {time.time() - start} seconds")

Total price: 32765421.23999867
Execution time: 15.528027296066284 seconds


### Refactor Code

In [14]:
start = time.time()

total_price = (gift_costs[gift_costs < 25]).sum() * 1.08
print(f"Total price: {total_price}")
print(f"Execution time: {time.time() - start}")

Total price: 32765421.240000002
Execution time: 0.1729130744934082
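The refactored line relies on boolean masking: `gift_costs < 25` produces an array of `True`/`False` values, and indexing with that array keeps only the matching prices. A minimal sketch on a small, made-up array (the values in `costs` are illustrative) shows each step and checks it against the loop version.

```python
import numpy as np

# Small illustrative array; the real gift_costs array is far larger
costs = np.array([8, 84, 42, 65, 24, 25, 10])

mask = costs < 25            # elementwise comparison, one boolean per price
cheap = costs[mask]          # keeps only prices where the mask is True
total = cheap.sum() * 1.08   # sum first, then apply the 1.08 factor once

# Loop equivalent, applying the factor to each price separately
loop_total = 0
for cost in costs:
    if cost < 25:
        loop_total += cost * 1.08

# The two totals agree up to floating-point rounding
assert np.isclose(total, loop_total)
print(mask.tolist())
print(cheap.tolist())
```

The tiny difference between the two totals printed in the notebook comes from floating-point rounding: the loop accumulates many rounded products, while the vectorized version sums integers and multiplies once.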
