# Inventory Analysis for Alyssa's Online Laptop Store

In this project, the aim is to answer several questions regarding the store's inventory. A class is used whereby its method will implement queries that answer questions like:

1. Given a laptop ID, find the corresponding data
2. Find whether there are 2 laptops whose total price is equivalent to a given amount
3. Identify all laptops whose price fall within a given budget

The data set used here have modified laptop IDs and the prices are made to be in integers. The original data set can be found on [Kaggle](https://www.kaggle.com/ionaskel/laptop-prices). 

# Introduction

The `csv` module is used to read `laptops.csv` inventory data file. The header and the rest of the file are split into `header` and `rows`.

In [1]:
import csv

# Open file with context manager
with open('laptops.csv') as laptop:
    reader = csv.reader(laptop)
    rows = list(reader)
    header = rows[0]
    rows = rows[1:]
    
print(header)
for i in range(5):
    print(rows[i])

['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']
['6571244', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 2.3GHz', '8GB', '128GB SSD', 'Intel Iris Plus Graphics 640', 'macOS', '1.37kg', '1339']
['7287764', 'Apple', 'Macbook Air', 'Ultrabook', '13.3', '1440x900', 'Intel Core i5 1.8GHz', '8GB', '128GB Flash Storage', 'Intel HD Graphics 6000', 'macOS', '1.34kg', '898']
['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', '575']
['9722156', 'Apple', 'MacBook Pro', 'Ultrabook', '15.4', 'IPS Panel Retina Display 2880x1800', 'Intel Core i7 2.7GHz', '16GB', '512GB SSD', 'AMD Radeon Pro 455', 'macOS', '1.83kg', '2537']
['8550527', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 3.1GHz', '8GB', '256GB SSD',

# Class Implementation

Add some description

In [2]:
class Inventory():
    
    def __init__(self, csv_filename):
        with open(csv_filename) as laptop:
            reader = csv.reader(laptop)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        for row in self.rows:
            # alternative index number is -1
            # solution: assigns back to same var row[12]. if new var used, it'll occupy new space in memory
            row[12] = int(row[12])
            
inventory = Inventory('laptops.csv')
print(inventory.header)
print(len(inventory.rows))

['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']
1303


# Finding a Laptop from the ID

**TO EDIT** An identification number is used to look up details of the laptop purchased by customer. 

In [3]:
class Inventory():
    
    def __init__(self, csv_filename):
        with open(csv_filename) as laptop:
            reader = csv.reader(laptop)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        for row in self.rows:
            # alternative index number is -1
            # solution: assigns back to same var row[12]. if new var used, it'll occupy new space in memory
            row[12] = int(row[12])
            
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
                
inventory = Inventory('laptops.csv')
print(inventory.get_laptop_from_id('3362737'))
print(inventory.get_laptop_from_id('3362736'))

['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]
None


# Improving ID Lookups

**TO EDIT** Previous method requires checking every row to find out whether or not a row exists - has *O(R)* complexity. To improve, data can be preprocessed. Usually set can be used to know if a row exists. However, in this case, we also want to retrieve the rest of information in the said row; hence dictionary is used as it allows us to associate values to the keys (`Id`). 

In [4]:
class Inventory():
    
    def __init__(self, csv_filename):
        with open(csv_filename) as laptop:
            reader = csv.reader(laptop)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        for row in self.rows:
            # alternative index number is -1
            # solution: assigns back to same var row[12]. if new var used, it'll occupy new space in memory
            row[12] = int(row[12])
        self.id_to_row = {}
        for row in self.rows:
            # assign 'row' to dictionary (includes the first column 'Id')
            self.id_to_row[row[0]] = row
            
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
            
    def get_laptop_from_id_fast(self, laptop_id):
        if laptop_id in self.id_to_row:
                return self.id_to_row[laptop_id]
        return None
                
inventory = Inventory('laptops.csv')
print(inventory.get_laptop_from_id_fast('3362737'))
print(inventory.get_laptop_from_id_fast('3362736'))

['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]
None


# Comparing the Performance

**TO EDIT** To compare the performance of two methods `get_laptop_from_id()` and `get_laptop_from_id_fast()`, a random ID is generated and passed as arguments to both methods. The `time` module is used to measure the execution time of each lookup and for each method, total time taken is calculated.

In [5]:
import time
import random

ids = str([random.randint(1000000, 9999999) for _ in range(10000) ])

inventory = Inventory('laptops.csv')

total_time_no_dict = 0

for rand_id in ids:
    start = time.time()
    inventory.get_laptop_from_id(rand_id)
    end = time.time()
    total_time_no_dict += end - start

total_time_dict = 0

for rand_id in ids:
    start = time.time()
    inventory.get_laptop_from_id_fast(rand_id)
    end = time.time()
    total_time_dict += end - start

print(total_time_no_dict)
print(total_time_dict)

14.522733688354492
0.07179927825927734


# Analysis

The total time for the two methods are:
* Without dictionary: 9.476078987121582
* With dictionary: 0.020602703094482422

There is a major improvement in the time taken to lookup the laptop data. If we divide *9.476* by *0.021*, the second method is about 451 times faster for this input size.

# Two Laptop Promotion

**TO EDIT** A gift card is given for promotion. Function below is to calculate if customer can spend precisely the gift card amount to purchase at most two laptops. 

In [11]:
class Inventory():
    
    def __init__(self, csv_filename):
        with open(csv_filename) as laptop:
            reader = csv.reader(laptop)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        for row in self.rows:
            # alternative index number is -1
            # solution: assigns back to same var row[12]. if new var used, it'll occupy new space in memory
            row[12] = int(row[12])
        
        self.id_to_row = {}
        for row in self.rows:
            # assign 'row' to dictionary (includes the first column 'Id')
            self.id_to_row[row[0]] = row
            
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
            
    def get_laptop_from_id_fast(self, laptop_id):
        if laptop_id in self.id_to_row:
                return self.id_to_row[laptop_id]
        return None
    
    def check_promotion_dollars(self, dollars):
        for row in self.rows:
            if row[12] == dollars:
                return True
            
        for i in self.rows:
            for j in self.rows:
                if i[12] + j[12] == dollars:
                    return True
        
        return False
                
inventory = Inventory('laptops.csv')
print(inventory.check_promotion_dollars(1000))
print(inventory.check_promotion_dollars(442))

True
False


# Optimizing Laptop Promotion

**TO EDIT** The intention of the `check_promotion_dollars()` method is to check whether or not a customer can spend the exact amount in his gift card to purchase at most two laptops; no additional information needs to be displayed. Hence, the laptop prices can be extracted and stored in a set to make the query run faster.

In [15]:
class Inventory():
    
    def __init__(self, csv_filename):
        with open(csv_filename) as laptop:
            reader = csv.reader(laptop)
            rows = list(reader)
        self.header = rows[0]
        self.rows = rows[1:]
        for row in self.rows:
            # alternative index number is -1
            # solution: assigns back to same var row[12]. if new var used, it'll occupy new space in memory
            row[12] = int(row[12])
        
        self.id_to_row = {}
        for row in self.rows:
            # assign 'row' to dictionary (includes the first column 'Id')
            self.id_to_row[row[0]] = row
            
        self.prices = set()
        for row in self.rows:
            self.prices.add(row[12])
            
    def get_laptop_from_id(self, laptop_id):
        for row in self.rows:
            if row[0] == laptop_id:
                return row
        return None
            
    def get_laptop_from_id_fast(self, laptop_id):
        if laptop_id in self.id_to_row:
                return self.id_to_row[laptop_id]
        return None
    
    def check_promotion_dollars(self, dollars):
        for row in self.rows:
            if row[12] == dollars:
                return True
            
        for i in self.rows:
            for j in self.rows:
                if i[12] + j[12] == dollars:
                    return True
        
        return False
    
    def check_promotion_dollars_fast(self, dollars):
        if dollars in self.prices:
            return True
        
        for price1 in self.prices:
            # no double for loop is required / time complexity
            #for price2 in self.prices: 
                #if price1 + price2 == dollars:
            if dollars - price1 in self.prices:
                    return True
                
        return False

inventory = Inventory('laptops.csv')
print(inventory.check_promotion_dollars_fast(1000))
print(inventory.check_promotion_dollars_fast(442))

True
False


# Comparing Promotion Functions

**TO EDIT** The performance of the last two functions `check_promotion_dollars()` and `check_promotion_dollars_fast()` are compared.

In [19]:
prices = [random.randint(100, 5000) for _ in range(100) ]

inventory = Inventory('laptops.csv')

total_time_no_set = 0

for price in prices:
    start = time.time()
    inventory.check_promotion_dollars(price)
    end = time.time()
    total_time_no_set += end - start
    
total_time_set = 0

for price in prices:
    start = time.time()
    inventory.check_promotion_dollars_fast(price)
    end = time.time()
    total_time_set += end - start
    
print(total_time_no_set)
print(total_time_set)

1.5784986019134521
0.0009987354278564453


# Analysis

The total time for the two methods are:
* Without set: 1.5784986019134521
* With set: 0.0009987354278564453

There is a major improvement in the time taken to lookup the laptop prices using a set. If we divide *1.5785* by *0.0009*, the second method is about 1753 times faster for this input size.