Honey For Groceries - For the Budget Consumer
Looking for UI/UX Designer
This document lays out a project plan for the development of Honey For Groceries in 2019 by the Honey For Groceries project team under DivE - UCSD. The intended audience for this document is members of the UCSD community who are interested in joining the project or supporting it in any other form.
The plan laid out in this document will include an overview of the application use cases, a technical summary of the system components and design, product delivery estimates, and foreseeable issues. It will also include a description of the responsibilities of the subteams that make up the project team.
College students have a stereotype of being broke due to crippling student loans, and some of us have resorted to meticulously tracking grocery prices in an attempt to save a penny or two here and there. Honey For Groceries is an attempt to make that effort easier.
1.3 Project Scope
The purpose of Honey For Groceries is to streamline the grocery shopping experience for the budget-conscious consumer. The consumer should be able to input their shopping list along with their shopping location preferences, and the application will tell them which stores to visit to minimize expenses. Users should also be able to contribute to the community by scanning grocery price tags to inform other consumers of the newest deals.
1.4 Use Cases
Users should be able to:
- Input a shopping list and receive a recommended grocery store in response.
- Scan a product (price tag / barcode) to upload its price.
- Reuse previous shopping lists.
- See a variety of shopping location choices and their price differences.
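The first use case can be sketched as a total-cost comparison across stores. The following is a minimal Python sketch under simplifying assumptions: prices are a flat per-store lookup table and location preferences are ignored; the real service would query the database. All store names and prices here are hypothetical.

```python
def cheapest_store(shopping_list, store_prices):
    """Return the store with the lowest total cost for the shopping list.

    store_prices maps store name -> {item name -> unit price}.
    Stores that do not stock every requested item are skipped.
    """
    best_store, best_total = None, float("inf")
    for store, prices in store_prices.items():
        if not all(item in prices for item in shopping_list):
            continue  # store is missing at least one item on the list
        total = sum(prices[item] for item in shopping_list)
        if total < best_total:
            best_store, best_total = store, total
    return best_store, best_total

# Hypothetical prices for illustration only.
prices = {
    "Vons":   {"eggs": 3.49, "milk": 2.99, "bread": 2.49},
    "Ralphs": {"eggs": 2.99, "milk": 3.19, "bread": 2.29},
    "Costco": {"eggs": 2.79, "milk": 2.89},  # no bread, so skipped below
}
print(cheapest_store(["eggs", "milk", "bread"], prices))  # Ralphs wins
```

A production version would also need to handle partial matches (splitting the list across stores) and factor in travel distance from the user's location preferences.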
2. Technical Specifications
This application will be released as an iOS application. There are currently no plans to release it on Web/Android, but the backend is set up so that such platform migrations will require minimal work beyond building the frontend.
2.2 Tech Stack
The overall architecture follows a service-oriented architecture (SOA) from the client side to the backend. This allows services to be decoupled, which matters because we anticipate possible future changes to the DB. It also puts more pressure on us to minimize dependencies and achieve maximum service decoupling and code modularity.
The backend is hosted on AWS, with mLab as the DBaaS and MongoDB as the database. The API is hosted on an EC2 instance, and AWS Auth is used to handle user authentication. The API is written in Go, with httpRouter as the framework.
The client side adopts an MVVM architecture and uses an in-house networking stack built with Alamofire, PromiseKit, and SwiftyJSON. The Foursquare API is used to access user location information, and the Barcode API is used to retrieve product information. The Apple Vision API and SwiftOCR provide the OCRService that backs the scanning functionality. PureLayout is used to handle Auto Layout.
The data backend uses Scrapy as its web crawling framework and is to be hosted on AWS EC2. As crawling scales up, we anticipate moving to a distributed crawler, using RabbitMQ and Celery to distribute jobs to Docker containers set up with rotating proxies to avoid IP bans.
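The job-distribution scheme can be illustrated with a stdlib-only stand-in: a shared job queue (played by RabbitMQ/Celery in the real setup) feeding multiple workers, each request going out through a rotating proxy pool. The URLs and proxy addresses below are hypothetical.

```python
import itertools
import queue
import threading

# Stand-ins: the real system would use RabbitMQ as the broker and Celery workers.
jobs = queue.Queue()
results = []
lock = threading.Lock()
proxies = itertools.cycle(["proxy-a:8080", "proxy-b:8080"])  # hypothetical rotating proxies

def worker():
    while True:
        url = jobs.get()
        if url is None:  # sentinel tells the worker to exit
            break
        with lock:
            proxy = next(proxies)         # rotate proxies to avoid IP bans
            results.append((url, proxy))  # a real worker would fetch and parse here

urls = [
    "https://vons.example/eggs",
    "https://ralphs.example/milk",
    "https://costco.example/bread",
]
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for url in urls:
    jobs.put(url)
for _ in threads:
    jobs.put(None)  # one sentinel per worker
for t in threads:
    t.join()
print(len(results))  # → 3
```

With Celery, each `worker()` body would instead be a `@app.task`-decorated function, and RabbitMQ would replace the in-process queue, letting workers run in separate Docker containers.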
Our current database is MongoDB. Possible databases for future consideration include:
- Cassandra: high scalability due to its distributed nature and support for a high volume of writes.
- PostgreSQL: improved stability and performance at larger scale.
insert db schema
Firebase was our initial DB choice, as it provided a real-time DB with seemingly high scalability. However, we did not enjoy its singleton-listener implementation. Additionally, we wanted to gain more experience with backend development.
insert API schema
Go was chosen over Python for its built-in support for concurrency and its near-C/C++ performance, as we anticipate a high volume of DB writes from the web crawler.
The web crawler will serve as the major data source for the application. It will crawl grocery store websites and retrieve prices along with product names to populate the database. For 2019, we are limiting the scope of the web crawler to Vons, Ralphs, and Costco in La Jolla, CA.
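The parse step of a crawl can be sketched without Scrapy itself. The markup and selectors below are hypothetical (real store pages differ and change often), and the actual spider would use Scrapy's response selectors rather than raw regular expressions; this only shows the shape of the extraction, from HTML to (name, price) rows ready for DB insertion.

```python
import re

# Hypothetical product markup for illustration only.
SAMPLE_HTML = """
<div class="product"><span class="name">Lucerne Large Eggs</span>
<span class="price">$3.49</span></div>
<div class="product"><span class="name">Cheerios 12oz</span>
<span class="price">$2.99</span></div>
"""

PRODUCT_RE = re.compile(
    r'<span class="name">(?P<name>[^<]+)</span>\s*'
    r'<span class="price">\$(?P<price>[\d.]+)</span>'
)

def parse_products(html):
    """Yield (name, price) pairs extracted from a product listing page."""
    for m in PRODUCT_RE.finditer(html):
        yield m.group("name"), float(m.group("price"))

print(list(parse_products(SAMPLE_HTML)))
```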
To preprocess the data before insertion into the database, we have a classifier that sorts products into categories pregenerated by a hierarchical clustering model. Doing so allows the user to enter either "Eggs" or "Lucerne Extra Large Brown Eggs 1 Dozen" into their shopping list.
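The categorization step can be sketched with simple keyword matching standing in for the clustering-derived model. The categories and keywords below are hypothetical; the real categories come from hierarchical clustering over product names, not a hand-written table.

```python
# Hypothetical pregenerated categories with representative keywords; the real
# categories come from a hierarchical clustering model over product names.
CATEGORIES = {
    "eggs":   {"egg", "eggs"},
    "dairy":  {"milk", "butter", "cheese"},
    "cereal": {"cheerios", "granola", "oats"},
}

def classify(product_name):
    """Map a free-form product name to a category, or None if nothing matches."""
    tokens = set(product_name.lower().split())
    for category, keywords in CATEGORIES.items():
        if tokens & keywords:
            return category
    return None

# A generic query and a specific retail listing land in the same category,
# which is what lets a user type either form into their shopping list.
print(classify("Eggs"))                                    # → 'eggs'
print(classify("Lucerne Extra Large Brown Eggs 1 Dozen"))  # → 'eggs'
```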
The most notable part of client-side development is the choice to move away from storyboards and generate all UI in code. This choice was made due to issues with performance, code reuse, and workflow (merge conflicts).
The networking layer consists of creating a Request, running it against an APIService, and receiving a Response in return. Providing a central point for all outbound HTTP requests from the client makes it easy to extend with logging and metrics.
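The actual stack is Swift (built on Alamofire and PromiseKit), but the Request → APIService → Response shape is language-agnostic; here is a minimal Python sketch of the pattern, with the transport injected so the "metrics hook" choke point is visible. All names besides Request/APIService/Response are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    method: str
    path: str
    params: dict = field(default_factory=dict)

@dataclass
class Response:
    status: int
    body: dict

class APIService:
    """Single choke point for outbound requests: easy place to add logging/metrics."""

    def __init__(self, transport):
        self.transport = transport  # injected so tests can stub out the network
        self.log = []

    def run(self, request):
        self.log.append((request.method, request.path))  # metrics/logging hook
        return self.transport(request)

# Stubbed transport standing in for the real HTTP client.
def fake_transport(req):
    return Response(status=200, body={"path": req.path})

svc = APIService(fake_transport)
resp = svc.run(Request("GET", "/prices", {"item": "eggs"}))
print(resp.status, svc.log)  # → 200 [('GET', '/prices')]
```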
Please refer to milestones for detailed timelines.
We expect to have a working product that addresses the above use cases, with support for La Jolla grocery stores, by September 2019.
Below are the subteams that make up the project team.
- Data Engineering
- Client - Frontend
- Client - Backend
- Backend
Data Engineering
The data engineering team is responsible for gathering and processing data. They own the whole pipeline, from web scraping the prices to categorizing the items. They work entirely in Python.
- Peter Tran
- Vivian Wei
Client - Frontend
The client frontend team is responsible for building out the UI for the user to interact with. They work with the Client - Backend team to ensure smooth integration between frontend and backend.
- Paul Pan
- Ryan Xu
Client - Backend
The client backend team is responsible for the client-side business logic - think API calls and models. They serve as the middleman between the Backend team and the Client - Frontend team, ensuring that data from the backend database makes its way onto the user's screen.
- Thomas Tang
- Godwin Pang
- Paul Pan
- Ryan Xu
- Dhruv Chaddah
Backend
The backend team is responsible for schema design and the implementation of the DB and API layers.
- Thomas Tang
- Godwin Pang
There are still many issues that we need to sort out if we want this project to be successful.
Technical: We will use both scanning and web scraping to ensure that the information in our database is accurate and up to date. In this portion alone, we can see two issues that we need to overcome.
- If a store does not update its website daily, how do we keep our database updated early on, when we have a low user rate? (Low user rate = fewer scans.)
- Which data takes precedence, the scanned data or the scraped data? For example, a user goes into a store and scans an item that is on sale only on that day. How would we know to overwrite this data with the scraped data tomorrow? (Perhaps we could scan the expiration date and create a model to approximate the longevity of the sale based on it.)
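One possible precedence rule, sketched below as an assumption rather than a decision: a scan outranks a scraped price only while the scan is recent, so a one-day sale naturally ages out and the crawled price takes over. The TTL and all timestamps here are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed window during which a scan outranks a scraped price.
SCAN_TTL = timedelta(days=1)

def effective_price(scan, scrape, now):
    """Pick between a scanned and a scraped price observation.

    Each observation is a (price, timestamp) tuple or None.
    """
    if scan and now - scan[1] <= SCAN_TTL:
        return scan[0]       # fresh scan wins (captures in-store sales)
    if scrape:
        return scrape[0]     # otherwise fall back to the crawled price
    return scan[0] if scan else None  # stale scan is better than nothing

now = datetime(2019, 6, 2, 12, 0)
scan = (2.49, datetime(2019, 6, 1, 9, 0))    # sale price scanned yesterday morning
scrape = (3.49, datetime(2019, 6, 2, 6, 0))  # regular price crawled this morning
print(effective_price(scan, scrape, now))  # → 3.49 (the scan has aged out)
```

A model that estimates sale longevity from a scanned expiration date could then replace the fixed `SCAN_TTL` with a per-item window.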
Non-Technical: How do we create an environment where users are sufficiently incentivized to scan products?
- This is the leecher/seeder problem from torrenting: what is the incentive to be a seeder?
What is our plan for gathering a large user base in one swoop? This app is only useful (accurate) if we have many users.
- This is the inventor-of-the-telephone problem: who does the inventor call? The phone is lonely and not very useful when you can't call anyone. Now imagine the first user of this app - the only scanned data is their own.
Are price discrepancies between grocery stores large enough for the user to use this app?
- If we are comparing specific items between grocery stores, why would one store charge a higher price for that item and risk losing customers?
- Ex: Cheerios at Ralphs for $3 and Cheerios at Vons for $8 would never happen. Generally, these prices are equal.
- If that is the case, then we need to consider coupons. Stores may charge the same price but generally offer different deals on different days.
- Ralphs has coupons online that the user can clip to their Rewards card. Can we clip these coupons for them? (technical problem)