# Guided Project
## Building Fast Queries on a CSV

We will imagine that we own an online laptop store and want to build a way to answer a few different business questions about our inventory.

The goal of this guided project is to create a class that represents our inventory. The methods in that class will implement the queries that we want to answer about our inventory. We will also preprocess the data to make those queries run faster.

Here are some queries that we will want to answer:
- Given a laptop id, find the corresponding data.
- Given an amount of money, find whether there are two laptops whose total price is that given amount.
- Identify all laptops whose price falls within a given budget.

In [1]:
import csv

# Open the Laptops dataset
with open('laptops.csv') as file:
    reader = csv.reader(file)
    rows = list(reader)
    header = rows[0]
    rows = rows[1:]

print(header)
for i in range(3):
    print(rows[i])

['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']
['6571244', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 2.3GHz', '8GB', '128GB SSD', 'Intel Iris Plus Graphics 640', 'macOS', '1.37kg', '1339']
['7287764', 'Apple', 'Macbook Air', 'Ultrabook', '13.3', '1440x900', 'Intel Core i5 1.8GHz', '8GB', '128GB Flash Storage', 'Intel HD Graphics 6000', 'macOS', '1.34kg', '898']
['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', '575']
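The cell above splits the first row off as the header. As a minimal sketch of the same pattern, here it is on an in-memory sample. The two data rows are copied from the output above and trimmed to three columns for brevity. The sketch calls `next()` on the reader instead of slicing the full list:

```python
import csv
import io

# A tiny in-memory stand-in for laptops.csv (three columns only, for brevity)
sample = """Id,Company,Price
6571244,Apple,1339
7287764,Apple,898
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)   # the first row holds the column names
rows = list(reader)     # the remaining rows are the data

print(header)   # ['Id', 'Company', 'Price']
print(rows[0])  # ['6571244', 'Apple', '1339']
```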


In [2]:
# class v1.0
# Implement a constructor that takes the name of the CSV file as an
# argument and reads the rows it contains
class Inventory():
    def __init__(self, csv_filename):
        with open(csv_filename) as file:
            reader = csv.reader(file)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]

bold = '\033[1m'
unbold = '\033[0m'
inventory = Inventory("laptops.csv")
print(bold,"Inventory Header:\n", unbold, inventory.header)
print(bold,"The number of rows in the inventory:",unbold, len(inventory.rows))

Inventory Header:
['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']
The number of rows in the inventory: 1303


In [3]:
# class v1.1
# Add on, implement a way to look up a laptop from a given identifier
class Inventory():
    def __init__(self, csv_filename):
        with open(csv_filename) as file:
            reader = csv.reader(file)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        
    # method to look up a laptop from the Id, column[0]
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
        
        
inventory = Inventory("laptops.csv")
#print("Inventory Header:\n", inventory.header)
#print("The number of rows in the inventory:", len(inventory.rows))
print(inventory.get_laptop_from_id('3362737'))
print(inventory.get_laptop_from_id('3362736'))

['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', '575']
None


In [4]:
# class v1.2
# Add on, preprocess the data into a dictionary where the keys are the
# IDs and the values the rows. Then, we will use that dictionary in the
# get_laptop_from_id() method
class Inventory():
    def __init__(self, csv_filename):
        with open(csv_filename) as file:
            reader = csv.reader(file)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        # Map each Id to its row, so a lookup is one dictionary access
        # instead of a scan over every row
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
            
        
    # method to look up a laptop from the Id, column[0]
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
    
    # a faster method to look up a laptop from the Id, column[0]
    def get_laptop_from_id_fast(self, laptop_id):
        if laptop_id in self.id_to_row:
            return self.id_to_row[laptop_id]
        return None      

inventory = Inventory("laptops.csv")
#print("Inventory Header:\n", inventory.header)
#print("The number of rows in the inventory:", len(inventory.rows))
print(inventory.get_laptop_from_id_fast('3362737'))
print(inventory.get_laptop_from_id_fast('3362736'))

['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', '575']
None
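The loop that builds `id_to_row` can also be written as a dictionary comprehension. `dict.get()` already returns `None` for a missing key, which is what `get_laptop_from_id_fast` returns by hand. Here is a minimal sketch with hypothetical rows that keep only the Id and Price columns:

```python
# Hypothetical rows with only two columns: [Id, Price]
rows = [
    ["6571244", "1339"],
    ["7287764", "898"],
    ["3362737", "575"],
]

# Build the Id -> row index once (one pass over the rows)
id_to_row = {row[0]: row for row in rows}

def get_laptop_from_id_fast(laptop_id):
    # Average O(1) hash lookup; .get() returns None when the Id is absent
    return id_to_row.get(laptop_id)

print(get_laptop_from_id_fast("3362737"))  # ['3362737', '575']
print(get_laptop_from_id_fast("3362736"))  # None
```

As with the loop in the class, a duplicated Id would keep only the last row seen.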


### Performance time comparison
Compare the time each method takes to look up 10,000 random Ids.

In [13]:
# Compare the time to run each of the 2 methods
import time, random

# generate 10,000 random 7-digit Id strings (1,000,000 to 9,999,999)
ids = [str(random.randint(1000000, 9999999)) for _ in range(10000)]

inventory = Inventory("laptops.csv")
total_time_no_dict = 0
total_time_dict = 0

for num in ids:
    start = time.time()
    inventory.get_laptop_from_id(num)
    end = time.time()
    total_time_no_dict += (end - start)
print("The total time, with no dictionary:", total_time_no_dict)

for num in ids:
    start = time.time()
    inventory.get_laptop_from_id_fast(num)
    end = time.time()
    total_time_dict += (end - start)
print("The total time, with a dictionary:", total_time_dict)

The total time, with no dictionary: 0.9913640022277832
The total time, with a dictionary: 0.005011796951293945


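### Analysis
There is a significant difference between the two runtimes. In the run above, the 10,000 lookups took about 0.99 seconds without the dictionary and about 0.005 seconds with it, so the dictionary version is roughly 200 times faster (exact timings vary from run to run).

The gap comes from the data structure. `get_laptop_from_id` scans the rows one by one, which is O(n) per lookup and touches all 1,303 rows whenever the Id is missing (as it is for almost every random Id). A dictionary lookup is a single hash-table probe, O(1) on average.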
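`time.time()` has limited resolution for very short calls, so summing many tiny intervals can be noisy. Below is a sketch of the same comparison using the standard-library `timeit` module. It runs on synthetic data: 1,303 made-up rows to match the inventory size, rather than the real CSV.

```python
import random
import timeit

# Synthetic stand-in for the inventory: 1,303 rows of [Id, Price]
random.seed(0)
rows = [[str(1000000 + i), random.randint(100, 5000)] for i in range(1303)]
id_to_row = {row[0]: row for row in rows}

def lookup_linear(laptop_id):
    # O(n): scans rows until the Id matches (all rows when it is absent)
    for row in rows:
        if row[0] == laptop_id:
            return row
    return None

def lookup_dict(laptop_id):
    # O(1) on average: a single hash-table probe
    return id_to_row.get(laptop_id)

# 1,000 random Ids, most of which are not in the inventory (worst case for the scan)
ids = [str(random.randint(1000000, 9999999)) for _ in range(1000)]

# timeit accepts a callable; number=1 runs each batch of lookups once
linear_time = timeit.timeit(lambda: [lookup_linear(i) for i in ids], number=1)
dict_time = timeit.timeit(lambda: [lookup_dict(i) for i in ids], number=1)
print(f"linear scan: {linear_time:.4f} s, dictionary: {dict_time:.6f} s")
```

`timeit` uses `time.perf_counter()` by default, which is intended for measuring short durations.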

In [6]:
# class v1.3
# Add on: check_promotion_dollars(), which checks whether an amount
# can be spent exactly on one or two laptops
class Inventory():
    def __init__(self, csv_filename):
        with open(csv_filename) as file:
            reader = csv.reader(file)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        # Convert the prices (column 12) to integers
        for row in self.rows:
            row[12] = int(row[12])
        # Map each Id to its row, so a lookup is one dictionary access
        # instead of a scan over every row
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
            
    # method to look up a laptop from the Id, column[0]
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
    
    # a faster method to look up a laptop from the Id, column[0]
    def get_laptop_from_id_fast(self, laptop_id):
        if laptop_id in self.id_to_row:
            return self.id_to_row[laptop_id]
        return None
    
    # A method that, given a dollar amount, checks whether it is possible
    # to spend precisely that amount by purchasing one or two laptops
    def check_promotion_dollars(self, dollars):
        for row in self.rows:
            if row[12] == dollars: # Check if price = dollars
                return True
        for row1 in self.rows: # Set a row to row1
            for row2 in self.rows: # Set another row to row2
                if row1[12] + row2[12] == dollars: # Check if sum = dollars
                    return True
        return False

inventory = Inventory("laptops.csv")
       
print(inventory.check_promotion_dollars(1000))
print(inventory.check_promotion_dollars(442))
print(inventory.check_promotion_dollars(450))

True
False
True
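One subtlety is that the nested loop also pairs each row with itself. An amount equal to twice a single price therefore counts as a match, as if the same model were bought twice. A minimal sketch of the same brute-force check on a hypothetical price list makes this visible:

```python
def check_promotion_dollars(prices, dollars):
    # One laptop at exactly the requested amount
    if dollars in prices:
        return True
    # Every ordered pair, including a price paired with itself
    for p1 in prices:
        for p2 in prices:
            if p1 + p2 == dollars:
                return True
    return False

prices = [300, 700]  # hypothetical prices
print(check_promotion_dollars(prices, 1000))  # True: 300 + 700
print(check_promotion_dollars(prices, 600))   # True: 300 + 300 (same laptop twice)
print(check_promotion_dollars(prices, 500))   # False
```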


In [7]:
# class v1.4
# Add on: preprocess the prices into a set, to implement
# a faster version of check_promotion_dollars
class Inventory():
    def __init__(self, csv_filename):
        with open(csv_filename) as file:
            reader = csv.reader(file)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        # Convert the prices (column 12) to integers
        for row in self.rows:
            row[12] = int(row[12])
        # Map each Id to its row, so a lookup is one dictionary access
        # instead of a scan over every row
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
        self.prices = set()
        for row in self.rows:
            self.prices.add(row[12])
            
    # method to look up a laptop from the Id, column[0]
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
    
    # a faster method to look up a laptop from the Id, column[0]
    def get_laptop_from_id_fast(self, laptop_id):
        if laptop_id in self.id_to_row:
            return self.id_to_row[laptop_id]
        return None
    
    # A method that, given a dollar amount, checks whether it is possible
    # to spend precisely that amount by purchasing one or two laptops
    def check_promotion_dollars(self, dollars):
        for row in self.rows:
            if row[12] == dollars: # Check if price = dollars
                return True
        for row1 in self.rows: # Set a row to row1
            for row2 in self.rows: # Set another row to row2
                if row1[12] + row2[12] == dollars: # Check if sum = dollars
                    return True
        return False
    
    # A faster version of check_promotion_dollars           
    def check_promotion_dollars_fast(self, dollars):
        if dollars in self.prices:  # Check if the dollar amount is one of the prices
            return True
        for price in self.prices:
            if dollars - price in self.prices:
                return True
        return False

inventory = Inventory("laptops.csv")
       
print(inventory.check_promotion_dollars_fast(1000))
print(inventory.check_promotion_dollars_fast(442))
print(inventory.check_promotion_dollars_fast(450))

True
False
True
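Like the nested loop, the set-based version treats `dollars - price == price` as a match even if only one laptop has that price, because a set stores each price only once. The promotion might instead need to require two *different* laptops. That is an assumption about the business rule, not part of the original project. One possible variant counts how many laptops share each price with `collections.Counter`:

```python
from collections import Counter

def check_two_distinct_laptops(prices, dollars):
    # How many laptops are listed at each price
    counts = Counter(prices)
    if dollars in counts:  # a single laptop at exactly that price
        return True
    for price in counts:
        other = dollars - price
        if other != price and other in counts:
            return True  # two laptops with different prices
        if other == price and counts[price] >= 2:
            return True  # two distinct laptops that share a price
    return False

print(check_two_distinct_laptops([300, 700], 600))       # False: only one 300 laptop
print(check_two_distinct_laptops([300, 300, 700], 600))  # True
print(check_two_distinct_laptops([300, 700], 1000))      # True
```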


### Performance time comparison
Compare the time each method takes to check 100 random dollar amounts.

In [11]:
# Compare the time to run each of the 2 methods
# generate 100 random integers from 100 to 5000
prices = [random.randint(100, 5000) for _ in range(100)]

inventory = Inventory("laptops.csv")
total_time_no_set = 0
total_time_set = 0

for num in prices:
    start = time.time()
    inventory.check_promotion_dollars(num)
    end = time.time()
    total_time_no_set += (end - start)
print("The total time, with no set:", total_time_no_set)

for num in prices:
    start = time.time()
    inventory.check_promotion_dollars_fast(num)
    end = time.time()
    total_time_set += (end - start)
print("The total time, with a set:", total_time_set)

The total time, with no set: 1.4262301921844482
The total time, with a set: 0.000553131103515625


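### Analysis
There is a significant difference between the two runtimes. In the run above, checking 100 amounts took about 1.43 seconds without the set and about 0.00055 seconds with it, so the set version is roughly 2,600 times faster.

The slow version compares every pair of rows. That is O(n²), about 1.7 million pairs for 1,303 rows, whenever no pair matches. The fast version does at most one set lookup per distinct price, which is O(n) on average.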
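Speed only matters if both versions give the same answers. A quick randomized check on synthetic prices, not the real dataset, is one way to gain confidence:

```python
import random

def slow(prices, dollars):
    # O(n^2): try every price and every ordered pair of prices
    if dollars in prices:
        return True
    return any(p1 + p2 == dollars for p1 in prices for p2 in prices)

def fast(price_set, dollars):
    # O(n) on average: one set lookup per distinct price
    if dollars in price_set:
        return True
    return any(dollars - p in price_set for p in price_set)

random.seed(1)
prices = [random.randint(100, 5000) for _ in range(100)]
price_set = set(prices)

for dollars in range(0, 10001, 97):
    assert slow(prices, dollars) == fast(price_set, dollars)
print("slow and fast versions agree")
```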

## Finding Laptops within a Budget

We want to write a method that efficiently answers the query: given a budget of D dollars, find all laptops whose price is at most D.
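Once the rows are sorted by price, every laptop within budget sits in a prefix of the sorted list, so the query reduces to finding where that prefix ends. The sketch below uses the standard-library `bisect` module on hypothetical rows. The class that follows builds `rows_by_price` the same way but implements its own binary search instead of using `bisect`.

```python
from bisect import bisect_right

# Hypothetical rows of [Id, Price], already sorted by price
rows_by_price = [
    ["a", 350], ["b", 500], ["c", 500], ["d", 509], ["e", 1000],
]
prices = [row[1] for row in rows_by_price]  # sorted list of prices only

def laptops_within_budget(budget):
    # bisect_right returns the index of the first price strictly greater
    # than budget, so everything before it costs at most budget: O(log n)
    end = bisect_right(prices, budget)
    return rows_by_price[:end]

print(laptops_within_budget(500))  # [['a', 350], ['b', 500], ['c', 500]]
print(laptops_within_budget(100))  # []
```

A separate `prices` list is used because `bisect` only gained a `key=` parameter in Python 3.10.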

In [9]:
# class v1.5
# Add on: binary searches over the rows sorted by price
# Sort key: the price column of a row
def row_price(row):
    return row[12]

class Inventory():
    def __init__(self, csv_filename):
        with open(csv_filename) as file:
            reader = csv.reader(file)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        # Convert the prices (column 12) to integers
        for row in self.rows:
            row[12] = int(row[12])
        # Map each Id to its row, so a lookup is one dictionary access
        # instead of a scan over every row
        self.id_to_row = {}
        for row in self.rows:
            self.id_to_row[row[0]] = row
        self.prices = set()
        for row in self.rows:
            self.prices.add(row[12])
        # Keep a copy of the rows sorted by price, for binary search
        self.rows_by_price = sorted(self.rows, key=row_price)
            
    # method to look up a laptop from the Id, column[0]
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
    
    # a faster method to look up a laptop from the Id, column[0]
    def get_laptop_from_id_fast(self, laptop_id):
        if laptop_id in self.id_to_row:
            return self.id_to_row[laptop_id]
        return None
    
    # A method that, given a dollar amount, checks whether it is possible
    # to spend precisely that amount by purchasing one or two laptops
    def check_promotion_dollars(self, dollars):
        for row in self.rows:
            if row[12] == dollars: # Check if price = dollars
                return True
        for row1 in self.rows: # Set a row to row1
            for row2 in self.rows: # Set another row to row2
                if row1[12] + row2[12] == dollars: # Check if sum = dollars
                    return True
        return False
    
    # A faster version of check_promotion_dollars
    def check_promotion_dollars_fast(self, dollars):
        if dollars in self.prices:  # Check if the dollar amount is one of the prices
            return True
        for price in self.prices:
            if dollars - price in self.prices:
                return True
        return False

    # Binary search for a laptop priced exactly target_price;
    # returns its index in rows_by_price, or -1 if there is none
    def find_laptop_with_price(self, target_price):
        range_start = 0
        range_end = len(self.rows_by_price) - 1
        while range_start < range_end:
            range_middle = (range_end + range_start) // 2
            price = self.rows_by_price[range_middle][12]
            if price == target_price:
                return range_middle
            elif price < target_price:
                range_start = range_middle + 1
            else:
                range_end = range_middle - 1
        price = self.rows_by_price[range_start][12]
        if price != target_price:
            return -1
        return range_start
    
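    # Binary search for the first laptop priced above target_price;
    # returns its index in rows_by_price (every row before it costs at
    # most target_price), or -1 if no laptop is more expensive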
    def find_first_laptop_more_expensive(self, target_price):
        range_start = 0
        range_end = len(self.rows_by_price) - 1
        while range_start < range_end:
            range_middle = (range_end + range_start) // 2
            price = self.rows_by_price[range_middle][12]
            if price > target_price:
                range_end = range_middle
            else:
                range_start = range_middle + 1
        if self.rows_by_price[range_start][12] <= target_price:
            return -1
        return range_start
    
inventory = Inventory("laptops.csv")

# print(inventory.find_laptop_with_price(500))
# print(inventory.find_first_laptop_more_expensive(500))
# print(inventory.find_first_laptop_more_expensive(1000))
# print(inventory.find_first_laptop_more_expensive(10000))
#print(inventory.rows_by_price[245])
#print(inventory.rows_by_price[246])
#print(inventory.rows_by_price[683])
bold = '\033[1m'
unbold = '\033[0m'

# Print the main specifications of one laptop row
def print_laptop(laptop):
    print('Company & Product: {} - {} / Screen size : {}"\nScreen resolution : {} / CPU : {}\nRAM : {} / Memory : {} / GPU : {}\nOpSys : {} / Weight : {} / Price : {}\n'.format(laptop[1],laptop[2],laptop[4],laptop[5],laptop[6],laptop[7],laptop[8],laptop[9],laptop[10],laptop[11],laptop[12]))

def find_laptop_by_price(price):
    x = inventory.find_laptop_with_price(price)
    if x == -1:
        print(bold,"There were no laptops found that cost $", price, unbold)
    else:
        print(bold,"This is a laptop that costs $", price, unbold)
        print_laptop(inventory.rows_by_price[x])

def find_laptop_more_than(price, length_of_list):
    x = inventory.find_first_laptop_more_expensive(price)
    if x == -1:
        print(bold,"There were no laptops found that cost over $", price, unbold)
    else:
        # Collect up to length_of_list laptops, one per distinct price above `price`
        laptop_list = []
        price1 = price
        while len(laptop_list) < length_of_list:
            x = inventory.find_first_laptop_more_expensive(price1)
            if x == -1:
                break  # no more expensive laptops left; avoids looping forever
            laptop_list.append(x)
            # Jump straight to this laptop's price instead of stepping $1 at a time
            price1 = inventory.rows_by_price[x][12]
        print(bold,"Here are", len(laptop_list), "laptops, that cost over $", price, unbold)
        for row in laptop_list:
            print_laptop(inventory.rows_by_price[row])

find_laptop_by_price(500)
find_laptop_by_price(1000)
find_laptop_more_than(500,3)
print("")
find_laptop_more_than(1000,3)
print("")
find_laptop_more_than(10000,1)

[1m This is a laptop that costs $ 500 [0m
Company & Product: HP - 250 G5 / Screen size : 15.6"
Screen resolution : 1366x768 / CPU : Intel Pentium Quad Core N3710 1.6GHz
RAM : 4GB / Memory : 1TB HDD / GPU : Intel HD Graphics 405
OpSys : Windows 10 / Weight : 1.96kg / Price : 500

[1m This is a laptop that costs $ 1000 [0m
Company & Product: HP - EliteBook 840 / Screen size : 14"
Screen resolution : Full HD 1920x1080 / CPU : Intel Core i5 6200U 2.3GHz
RAM : 4GB / Memory : 500GB HDD / GPU : Intel HD Graphics 520
OpSys : Windows 10 / Weight : 1.54kg / Price : 1000

[1m Here are 3 laptops, that cost over $ 500 [0m
Company & Product: Asus - VivoBook S14 / Screen size : 14"
Screen resolution : 1366x768 / CPU : Intel Core i3 7100U 2.4GHz
RAM : 4GB / Memory : 128GB SSD / GPU : Intel HD Graphics 620
OpSys : Windows 10 / Weight : 1.3kg / Price : 509

Company & Product: Lenovo - IdeaPad 320-15IKBN / Screen size : 15.6"
Screen resolution : Full HD 1920x1080 / CPU : Intel Core i5 7200U 2.5GHz
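`find_first_laptop_more_expensive` is an upper-bound binary search. It returns the index of the first price strictly greater than the target, or `-1` when no such price exists. `bisect.bisect_right` reports the same position, apart from the `-1` convention, so the two can be cross-checked. Here is a sketch on a plain sorted list of hypothetical prices:

```python
import random
from bisect import bisect_right

def find_first_more_expensive(prices, target_price):
    # Same loop as the Inventory method, but on a plain sorted list of prices
    range_start = 0
    range_end = len(prices) - 1
    while range_start < range_end:
        range_middle = (range_end + range_start) // 2
        if prices[range_middle] > target_price:
            range_end = range_middle        # the answer is at or left of middle
        else:
            range_start = range_middle + 1  # the answer is right of middle
    if prices[range_start] <= target_price:
        return -1                           # nothing is more expensive
    return range_start

random.seed(2)
prices = sorted(random.randint(100, 5000) for _ in range(500))
for target in range(0, 5200, 13):
    expected = bisect_right(prices, target)
    if expected == len(prices):
        expected = -1  # bisect_right says "past the end"; the method says -1
    assert find_first_more_expensive(prices, target) == expected
print("binary search matches bisect_right")
```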


### Final Review

We have implemented a class that represents the inventory of a laptop shop. It has methods to look up a laptop by its Id, to check whether a given amount of money can be spent exactly on one or two laptops (for gift cards or promotions), and to find laptops by price using binary search on the rows sorted by price.

We've also learned that we can answer business questions more efficiently by preprocessing the data.