## Guided Project: Building Fast Queries on a CSV

This project builds fast queries (through improved algorithms and data structures) over laptop inventory data.
The inventory is stored in the laptops.csv file, which was adapted from the Laptop Prices dataset on Kaggle.

In [1]:
# importing required modules and reading csv file
import csv
with open('laptops.csv') as file:
    reader = csv.reader(file)
    all_rows = list(reader)
    header = all_rows[0]
    rows = all_rows[1:]
    
# printing header and first 5 rows
print(header)
for row in rows[:5]:
    print(row)    


['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']
['6571244', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 2.3GHz', '8GB', '128GB SSD', 'Intel Iris Plus Graphics 640', 'macOS', '1.37kg', '1339']
['7287764', 'Apple', 'Macbook Air', 'Ultrabook', '13.3', '1440x900', 'Intel Core i5 1.8GHz', '8GB', '128GB Flash Storage', 'Intel HD Graphics 6000', 'macOS', '1.34kg', '898']
['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', '575']
['9722156', 'Apple', 'MacBook Pro', 'Ultrabook', '15.4', 'IPS Panel Retina Display 2880x1800', 'Intel Core i7 2.7GHz', '16GB', '512GB SSD', 'AMD Radeon Pro 455', 'macOS', '1.83kg', '2537']
['8550527', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 3.1GHz', '8GB', '256GB SSD',

### Implementing an Inventory class

In [2]:
class Inventory():
    def __init__(self, file_name):
        with open(file_name) as file:
            reader = csv.reader(file)
            all_rows = list(reader)
        self.header = all_rows[0]
        self.rows = all_rows[1:]
        for row in self.rows:
            row[-1] = int(row[-1]) # converting the price column to integers

# testing class
inventory = Inventory("laptops.csv")
print(inventory.header)
print(len(inventory.rows))


['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']
1303


### Finding a Laptop From the Id
Implementing a get_laptop_from_id() method that, given a laptop identifier, finds the row corresponding to that laptop.

In [3]:
class Inventory():
    def __init__(self, file_name):
        with open(file_name) as file:
            reader = csv.reader(file)
            all_rows = list(reader)
        self.header = all_rows[0]
        self.rows = all_rows[1:]
        for row in self.rows:
            row[-1] = int(row[-1]) # converting the price column to integers
            
    def get_laptop_from_id(self, id):
        for row in self.rows:
            if row[0] == id:
                return row
        return None # return if nothing is found

In [4]:
# test case for newly created method
inventory = Inventory('laptops.csv')           
print(inventory.get_laptop_from_id('3362737')) 
print(inventory.get_laptop_from_id('3362736')) 

['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]
None


### Improving Id Lookups
Improving the time complexity of finding a laptop with a given id by precomputing a dictionary that maps laptop ids to rows.

In [5]:
class Inventory():
    def __init__(self, file_name):
        with open(file_name) as file:
            reader = csv.reader(file)
            all_rows = list(reader)
        self.header = all_rows[0]
        self.rows = all_rows[1:]
        for row in self.rows:
            row[-1] = int(row[-1]) # converting the price column to integers
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
            
    def get_laptop_from_id(self, id):
        for row in self.rows:                 
            if row[0] == id:
                return row
        return None   
    ## new function for improved lookup speed
    def get_laptop_from_id_fast(self, id):   
        if id in self.id_to_row:             
            return self.id_to_row[id]
        return None

In [6]:
# test case for newly created method
inventory = Inventory('laptops.csv')           
print(inventory.get_laptop_from_id_fast('3362737'))
print(inventory.get_laptop_from_id_fast('3362736'))

['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]
None


### Comparing Performance
Comparing the performance of both methods: dictionary lookup (O(1)) vs. iteration (O(N)).

In [7]:
# importing required modules
import time
import random

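# Generating random ids
# (note: these are integers, while the ids read from the CSV are strings,
# so every lookup below is a miss)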
random_ids = [random.randint(1000000, 9999999) for _ in range(1000)]
inventory = Inventory('laptops.csv')

# test case for iteration method (O(N))
total_time_no_dict = 0
for id_n in random_ids:
    start_time = time.time()
    inventory.get_laptop_from_id(id_n)
    end_time = time.time()
    total_time_no_dict += end_time - start_time
    
# test case for dictionary method (O(1))
total_time_dict = 0
for id_n in random_ids:
    start_time = time.time()
    inventory.get_laptop_from_id_fast(id_n)
    end_time = time.time()
    total_time_dict += end_time - start_time
    
print(total_time_no_dict)
print(total_time_dict)

0.06583690643310547
0.0009832382202148438


#### Analysis of above
For the above we got:

- iteration:  0.06583690643310547
- dictionary: 0.0009832382202148438

One can see a significant improvement in performance. The speed increase is about 67 times when using a dictionary.
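
The comparison above can also be reproduced without the CSV file. The following is a minimal sketch, assuming a synthetic inventory: the `rows` data and the helper names `lookup_scan`, `lookup_dict`, and `time_lookups` are hypothetical and not part of the project. It uses `time.perf_counter()`, which is generally better suited than `time.time()` for measuring short intervals, and it queries with string ids so that some lookups actually hit.

```python
import random
import time

# Hypothetical stand-in for the inventory: 1,303 rows with string ids,
# mirroring how csv.reader yields every field as a string.
rows = [[str(1000000 + i), "Brand", 500 + i] for i in range(1303)]
id_to_row = {row[0]: row for row in rows}

def lookup_scan(laptop_id):
    """Linear scan: O(N) per lookup."""
    for row in rows:
        if row[0] == laptop_id:
            return row
    return None

def lookup_dict(laptop_id):
    """Dictionary lookup: O(1) on average per lookup."""
    return id_to_row.get(laptop_id)

def time_lookups(lookup, ids):
    """Return the total seconds spent calling lookup on every id."""
    start = time.perf_counter()
    for laptop_id in ids:
        lookup(laptop_id)
    return time.perf_counter() - start

# String ids drawn from a range about twice as wide as the stored ids,
# so the queries are a mix of hits and misses.
random.seed(0)
query_ids = [str(random.randint(1000000, 1002605)) for _ in range(1000)]

scan_seconds = time_lookups(lookup_scan, query_ids)
dict_seconds = time_lookups(lookup_dict, query_ids)
print(scan_seconds, dict_seconds)
```

The exact timings depend on the machine, so only the ratio between the two numbers is meaningful.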

### Two Laptop Promotion
Below, a method is added that checks whether a given amount of money can be spent exactly by purchasing either one or two laptops. The idea is to spend the full amount of a gift card (since the gift card can only be used once, and to avoid customers feeling cheated).

In [8]:
class Inventory():
    def __init__(self, file_name):
        with open(file_name) as file:
            reader = csv.reader(file)
            all_rows = list(reader)
        self.header = all_rows[0]
        self.rows = all_rows[1:]
        for row in self.rows:
            row[-1] = int(row[-1]) # converting the price column to integers
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
            
    def get_laptop_from_id(self, id):
        for row in self.rows:                 
            if row[0] == id:
                return row
        return None   
    ## new function for improved lookup speed
    def get_laptop_from_id_fast(self, id):   
        if id in self.id_to_row:             
            return self.id_to_row[id]
        return None
    
    def check_promotion_amounts(self, amount):
        for row in self.rows:
            price = row[-1]
            if price == amount: # inspect if one laptop equates to full gift card amount
                return True
        for row1 in self.rows:
            for row2 in self.rows:
                price = row1[-1] + row2[-1]
                if price == amount:
                    return True
        return False               

In [9]:
## testing new method
inventory = Inventory('laptops.csv')               
print(inventory.check_promotion_amounts(1000))     
print(inventory.check_promotion_amounts(442))      

True
False


### Optimizing Laptop Promotion
Creating a faster version of the promotion method by storing all prices in a set, so that checking whether a given price exists becomes a constant-time membership test.

In [10]:
class Inventory():
    def __init__(self, file_name):
        with open(file_name) as file:
            reader = csv.reader(file)
            all_rows = list(reader)
        self.header = all_rows[0]
        self.rows = all_rows[1:]
        for row in self.rows:
            row[-1] = int(row[-1])
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
        self.prices = set() ## adding prices set for efficiency in calculating promotions
        for row in self.rows:
            self.prices.add(row[-1])
            
    def get_laptop_from_id(self, id):
        for row in self.rows:                 
            if row[0] == id:
                return row
        return None   
   
    def get_laptop_from_id_fast(self, id):   
        if id in self.id_to_row:             
            return self.id_to_row[id]
        return None

    def check_promotion_amounts(self, amount):
        for row in self.rows:
            price = row[-1]
            if price == amount: # inspect if one laptop equates to full gift card amount
                return True
        for row1 in self.rows:
            for row2 in self.rows:
                price = row1[-1] + row2[-1]
                if price == amount:
                    return True
        return False    
    
    def check_promotion_amounts_fast(self, amount):
        if amount in self.prices:## inspection for one laptop with improved timing
            return True
        for price in self.prices: ## inspection for two laptops with improved timing
            if amount - price in self.prices:
                return True
        return False    

In [11]:
# Testing the method
inventory = Inventory('laptops.csv')                 
print(inventory.check_promotion_amounts_fast(1000))  
print(inventory.check_promotion_amounts_fast(442)) 

True
False


### Comparing Performance
Comparing the performance of both promotion methods: double iteration (O(N²)) vs. set iteration (O(N)).

In [12]:
prices = [random.randint(100, 5000) for _ in range(100)]

inventory = Inventory("laptops.csv")

total_time_no_set = 0

for price in prices:
    start = time.time()
    inventory.check_promotion_amounts(price)
    end = time.time()
    total_time_no_set += end - start
    
total_time_set = 0

for price in prices:
    start = time.time()
    inventory.check_promotion_amounts_fast(price)
    end = time.time()
    total_time_set += end - start
    
print(total_time_no_set)
print(total_time_set)    

1.3713688850402832
0.0009913444519042969


#### Analysis of above
For the above we got:

- double iteration:  1.3713688850402832
- set iteration: 0.0009913444519042969

One can see a significant improvement in performance. The speed increase is about 1383 times when using a set.
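
The idea behind the set-based check can be shown in isolation. This is a minimal sketch, assuming a hypothetical standalone function `can_spend_exactly` and made-up prices rather than the data in laptops.csv: for each price, the partner price needed to reach the amount is `amount - price`, and a set answers whether that partner exists in constant time on average.

```python
def can_spend_exactly(prices, amount):
    """Return True if one price, or the sum of two prices, equals amount."""
    price_set = set(prices)  # O(1) average membership tests
    if amount in price_set:
        return True
    # For each price, the partner needed to reach amount is amount - price.
    return any(amount - price in price_set for price in price_set)

# Hypothetical prices, not taken from laptops.csv.
sample_prices = [300, 450, 700, 1200]
print(can_spend_exactly(sample_prices, 1000))  # 300 + 700
print(can_spend_exactly(sample_prices, 600))   # 300 + 300, same model twice
print(can_spend_exactly(sample_prices, 442))   # no single price or pair matches
```

Like the methods above, this sketch accepts buying the same laptop model twice, since a price may be paired with itself.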

### Finding Laptops Within a Budget
Implementing a method for finding the range of indexes of laptops that fall within a budget.

In [23]:
def row_price(row):
    '''key function for the built-in sorted() function: returns the price of a row'''
    return row[-1]

class Inventory():
    def __init__(self, file_name):
        with open(file_name) as file:
            reader = csv.reader(file)
            all_rows = list(reader)
        self.header = all_rows[0]
        self.rows = all_rows[1:]
        for row in self.rows:
            row[-1] = int(row[-1])
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
        self.prices = set() 
        for row in self.rows:
            self.prices.add(row[-1])
        self.rows_by_price = sorted(self.rows, key = row_price)
            
    def get_laptop_from_id(self, id):
        for row in self.rows:                 
            if row[0] == id:
                return row
        return None   
   
    def get_laptop_from_id_fast(self, id):   
        if id in self.id_to_row:             
            return self.id_to_row[id]
        return None

    def check_promotion_amounts(self, amount):
        for row in self.rows:
            price = row[-1]
            if price == amount: # inspect if one laptop equates to full gift card amount
                return True
        for row1 in self.rows:
            for row2 in self.rows:
                price = row1[-1] + row2[-1]
                if price == amount:
                    return True
        return False    
    
    def check_promotion_amounts_fast(self, amount):
        if amount in self.prices:## inspection for one laptop with improved timing
            return True
        for price in self.prices: ## inspection for two laptops with improved timing
            if amount - price in self.prices:
                return True
        return False
    
    def find_laptop_with_price(self, target_price):
        range_start = 0                                   
        range_end = len(self.rows_by_price) - 1                       
        while range_start < range_end:
            range_middle = (range_end + range_start) // 2  
            price = self.rows_by_price[range_middle][-1]
            if price == target_price:                            
                return range_middle                        
            elif price < target_price:                           
                range_start = range_middle + 1             
            else:                                          
                range_end = range_middle - 1 
        if self.rows_by_price[range_start][-1] != target_price:
            return -1                                      
        return range_start
    
    def find_first_laptop_more_expensive(self, target_price): 
        range_start = 0                                   
        range_end = len(self.rows_by_price) - 1                   
        while range_start < range_end:
            range_middle = (range_end + range_start) // 2  
            price = self.rows_by_price[range_middle][-1]
            if price > target_price:
                range_end = range_middle
            else:
                range_start = range_middle + 1
        if self.rows_by_price[range_start][-1] <= target_price:                  
            return -1                                   
        return range_start

In [24]:
## Testing the last method
inventory = Inventory('laptops.csv')                                 
print(inventory.find_first_laptop_more_expensive(1000)) 
print(inventory.find_first_laptop_more_expensive(10000)) # likely will return no index 

683
-1
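
The same query can also be expressed with the standard library's `bisect` module. This is a minimal sketch rather than a replacement for the method above, assuming a plain sorted list of prices (in the class this could be built from `rows_by_price`); the function name `first_index_more_expensive` and the sample prices are hypothetical.

```python
import bisect

def first_index_more_expensive(sorted_prices, target_price):
    """Index of the first price strictly above target_price, or -1 if none."""
    # bisect_right returns the insertion point after any entries equal to target_price.
    index = bisect.bisect_right(sorted_prices, target_price)
    return index if index < len(sorted_prices) else -1

# Hypothetical sorted prices, not taken from laptops.csv.
sorted_prices = [200, 450, 450, 700, 1200]
print(first_index_more_expensive(sorted_prices, 450))
print(first_index_more_expensive(sorted_prices, 100))
print(first_index_more_expensive(sorted_prices, 1200))
```

When the returned index is not -1, every laptop before that index in the sorted list costs at most the target price, which gives the range of laptops within the budget.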
