# Guided Project: Building Fast Queries on a CSV

The goal of this guided project is to create a class that represents our inventory. The methods in that class will implement the queries that we want to answer about our inventory. We will also preprocess that data to make those queries run faster.

Here are some queries that we will want to answer:

+ Given a laptop id, find the corresponding data.
+ Given an amount of money, find whether there are two laptops whose total price is that given amount.
+ Identify all laptops whose price falls within a given budget.

We will use the laptops.csv file as our inventory. This CSV file was adapted from the Laptop Prices dataset on Kaggle. We changed the IDs and made the prices integers.

In [1]:
import csv
import time
import random

## Class Type

In [2]:
class Inventory():
    
    def __init__(self, csv_filename):
        with open(csv_filename) as file:
            reader = csv.reader(file)
            data = list(reader)
         
        self.header = data[0]
        self.rows = data[1:]
        self.length = len(self.rows)
        self.id_to_row = {}
        self.prices = set()
        self.sort_price = sorted(self.rows, key= lambda x: int(x[-1]))
        
        for row in self.rows:
            row[-1] = int(row[-1])
            
        for row in self.rows:
            self.id_to_row[int(row[0])] = row[1:]
        
        for row in self.rows:
            self.prices.add(row[-1])
    
    def get_laptop_id(self, laptop_id, print_out = True):
        
        self.found_row = None
        
        for row in self.rows:
            if int(row[0]) == int(laptop_id):
                self.found_row = row
                
        if print_out:
            if self.found_row != None:
                print("Laptop found with {} ID:\n{}\n".format(laptop_id,self.found_row))
            else:
                print("No laptop with the ID found in database.\n")
                
    def get_laptop_id_fast(self, laptop_id, print_out = True):
        
        self.found_row = None
        
        if laptop_id in self.id_to_row:
            self.found_row = self.id_to_row[laptop_id]
        
        if print_out:
            if self.found_row != None:
                print("Laptop found with {} ID:\n{}\n".format(laptop_id,self.found_row))
            else:
                print("No laptop with the ID found in database.\n")
                
    def check_promotion_dollars(self, dollars):
        
        # Check single value
        for row in self.rows:
            if dollars == int(row[-1]):
                return True
        
        # Check double values
        for i in self.rows:
            for j in self.rows:
                if dollars == int(i[-1] + j[-1]):
                    return True
                else:
                    return False
    
    def check_promotion_dollars_fast(self, dollars):
        
        # Check single value
        if dollars in self.prices:
            return True
        
        # Check double values
        for row in self.rows:
            remainder = dollars - row[-1]
            if remainder in self.prices:
                return True
        
        return False
    
    def find_first_laptop_more_expensive(self, target_price, print_out=False):
        '''
        Return index of first row in self.sort_price whose price
        is higher than price, else return -1
        '''
        
        range_start = 0                                   
        range_end = len(self.sort_price) - 1  
        
        while range_start < range_end:
            range_middle = (range_end + range_start) // 2  
            price = int(self.sort_price[range_middle][-1])
            if price == target_price:                            
                return range_middle + 1                       
            elif price < target_price:                           
                range_start = range_middle + 1             
            else:                                          
                range_end = range_middle - 1
        
        if print_out:
            print("Found Price: {}".format(price))
            print("Index Start: {}".format(range_start))
            print("Index End: {}".format(range_end))                                   
        
        if target_price > price:
            return -1
        
        index_return = max([range_start,range_end])
        
        return index_return + 1
        
        
inventory = Inventory("laptops.csv")

In [3]:
print(inventory.header)
print(inventory.rows[:2])
print(inventory.length)

['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']
[['6571244', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 2.3GHz', '8GB', '128GB SSD', 'Intel Iris Plus Graphics 640', 'macOS', '1.37kg', 1339], ['7287764', 'Apple', 'Macbook Air', 'Ultrabook', '13.3', '1440x900', 'Intel Core i5 1.8GHz', '8GB', '128GB Flash Storage', 'Intel HD Graphics 6000', 'macOS', '1.34kg', 898]]
1303


Here is a brief description of the rows:

+ **ID**: A unique identifier for the laptop.
+ **Company**: The name of the company that produces the laptop.
+ **Product**: The name of the laptop.
+ **TypeName**: The type of laptop.
+ **Inches**: The size of the screen in inches.
+ **ScreenResolution**: The resolution of the screen.
+ **CPU**: The laptop CPU.
+ **RAM**: The amount of RAM in the laptop.
+ **Memory**: The size of the hard drive.
+ **GPU**: The graphics card name.
+ **OpSys**: The name of the operating system.
+ **Weight**: The laptop weight.
+ **Price**: The price of the laptop.

### Getting Laptop From ID with two methods

In [4]:
# Finding with for loop
inventory.get_laptop_id(3362737)
inventory.get_laptop_id(3362736)

Laptop found with 3362737 ID:
['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]

No laptop with the ID found in database.



In [5]:
# Finding with dict
inventory.get_laptop_id_fast(3362737)
inventory.get_laptop_id_fast(3362736)

Laptop found with 3362737 ID:
['HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]

No laptop with the ID found in database.



### Comparing Run Times Between Two Methods

In [6]:
# Generate Random Variables To Compare
ids = [random.randint(1000000,9999999) for _ in range(10000)]

In [7]:
# Total Time for No Dict Algorithum
total_time_no_dict = 0

for id in ids:
    start = time.time()
    inventory.get_laptop_id(id, print_out = False)
    end = time.time()
    total_time_no_dict += (end - start)

# Total Time for Dict Algorithum
total_time_dict = 0

for id in ids:
    start = time.time()
    inventory.get_laptop_id_fast(id, print_out = False)
    end = time.time()
    total_time_dict += (end - start)
    
print("  Total time for lookup no dictonary: {:.3f} secs".format(total_time_no_dict))
print("Total time for lookup with dictonary: {:.3f} secs".format(total_time_dict))

  Total time for lookup no dictonary: 4.064 secs
Total time for lookup with dictonary: 0.002 secs


### Checking Sum of Two Values

Write a function that, given a dollar amount, checks whether it is possible to spend precisely that amount by purchasing up to two laptops.

In [8]:
inventory.check_promotion_dollars(443)

True

Going to imporve the previous function so that it runs in constant time. This can be achevied by storing the price values in a set and then find values.

In [9]:
inventory.check_promotion_dollars_fast(1000)

True

### Comparing Run Time Between Two methods

In [10]:
# Generate Random Price Variables
ran_prices = [random.randint(100,5000) for _ in range(100)]

In [11]:
# Total Run Time for No Set
total_time_no_set = 0

for price in ran_prices:
    start = time.time()
    inventory.check_promotion_dollars(price)
    end = time.time()
    total_time_no_set += (end - start)

# Total Run Time for Sets
total_time_set = 0

for price in ran_prices:
    start = time.time()
    inventory.check_promotion_dollars_fast(price)
    end = time.time()
    total_time_set += (end - start)
    
print("  Total time for lookup no set: {:.6f} secs".format(total_time_no_set))
print("Total time for lookup with set: {:.6f} secs".format(total_time_set))

  Total time for lookup no set: 0.036925 secs
Total time for lookup with set: 0.000999 secs


### Using Binary Search to Find Laptops in Budget

In [12]:
inventory.find_first_laptop_more_expensive(4899, print_out=True)

1301