My given theme was predicting whether or not a product will be in stock; however, this application can be used on any dataset supplied as a csv file in the same format.
It takes in the csv file and allows the user to train the predictor on the data, add custom rows to the dataset and retrain the predictor, predict whether or not an item will be in stock (or whatever label another dataset provides) based on the 4 features, and finally test the predictor's accuracy by splitting the data into training and test data.
DemandLevel | SellingSpeed | RestockFrequency | SupplierReliability | no | yes | Grand Total | % InStock (yes) |
---|---|---|---|---|---|---|---|
High | Fast | Frequent | Reliable | 6 | 7 | 13 | 54% |
High | Fast | Frequent | Unreliable | 7 | 5 | 12 | 42% |
High | Fast | Rare | Reliable | 5 | 4 | 9 | 44% |
High | Fast | Rare | Unreliable | 9 | 9 | 18 | 50% |
High | Slow | Frequent | Reliable | 7 | 8 | 15 | 53% |
High | Slow | Frequent | Unreliable | 11 | 6 | 17 | 35% |
High | Slow | Rare | Reliable | 4 | 5 | 9 | 56% |
High | Slow | Rare | Unreliable | 3 | 11 | 14 | 79% |
Low | Fast | Frequent | Reliable | 2 | 6 | 8 | 75% |
Low | Fast | Frequent | Unreliable | 4 | 3 | 7 | 43% |
Low | Fast | Rare | Reliable | 11 | 2 | 13 | 15% |
Low | Fast | Rare | Unreliable | 8 | 6 | 14 | 43% |
Low | Slow | Frequent | Reliable | 8 | 9 | 17 | 53% |
Low | Slow | Frequent | Unreliable | 5 | 5 | 10 | 50% |
Low | Slow | Rare | Reliable | 7 | 3 | 10 | 30% |
Low | Slow | Rare | Unreliable | 5 | 9 | 14 | 64% |
Total | | | | 102 | 98 | 200 | 49% |
Note: The frequency table for the data (generated by ChatGPT) can also be found as a Pivot Table in the ProductIsInStock(Excel).xlsx file.
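As an illustration of how a table like this can be rebuilt from the raw rows, here is a minimal Java sketch. It is not the application's own code: the class name `FrequencySketch` and its methods are hypothetical, and it assumes each row holds the four feature values followed by a yes/no label.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the application: counts yes/no labels
// for each combination of the four features.
final class FrequencySketch {

    // Each row is assumed to hold four feature values followed by the label.
    // Returns a map from "f1 | f2 | f3 | f4" to {noCount, yesCount}.
    static Map<String, int[]> count(List<String[]> rows) {
        Map<String, int[]> table = new LinkedHashMap<>();
        for (String[] row : rows) {
            String key = String.join(" | ", row[0], row[1], row[2], row[3]);
            int[] counts = table.computeIfAbsent(key, k -> new int[2]);
            if (row[4].equalsIgnoreCase("yes")) {
                counts[1]++;
            } else {
                counts[0]++;
            }
        }
        return table;
    }

    // Percentage of "yes" rows for one combination, rounded to a whole percent.
    static long percentInStock(int[] counts) {
        int total = counts[0] + counts[1];
        return total == 0 ? 0 : Math.round(100.0 * counts[1] / total);
    }
}
```

For the first row of the table (6 no, 7 yes), `percentInStock` gives 54, matching the % InStock column.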
My application is split into 5 classes:
- Control
- FileHandler
- DataHandler
- DataItems
- Screen
Control starts the application. It contains the main method, which instantiates the Screen class.
FileHandler is a simple class that handles reading and parsing csv files.
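A minimal sketch of this kind of parsing is shown below. The class name `CsvSketch` is hypothetical, and the sketch assumes a header row, comma separators, and no quoted fields or embedded commas; the real FileHandler may work differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser, not the application's FileHandler.
final class CsvSketch {

    // Reads comma-separated rows, skipping the header row and blank lines.
    // Assumes no quoted fields or embedded commas.
    static List<String[]> parse(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // discard the header row (assumed present)
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                rows.add(cells);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

Taking a `Reader` rather than a file path keeps the sketch easy to try with a `StringReader`; a file can be passed in with `java.nio.file.Files.newBufferedReader`.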
DataHandler does a lot of the heavy lifting of the application. It contains methods for training the predictor on the data provided by the FileHandler class, generating frequency tables for the data, and testing the accuracy of the predictor by splitting the data into 150 rows of training data and 50 rows of test data (stratified so as to keep the same ratio of yes/no).
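The stratified split can be sketched as follows. This is an illustrative sketch rather than the DataHandler code: the names `StratifiedSplitSketch` and `Split` are hypothetical, the label is assumed to be the last column, and the first rows of each label group are simply taken as test data without shuffling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stratified split, not the application's DataHandler.
final class StratifiedSplitSketch {

    // Result holder: rows for training and rows held out for testing.
    static final class Split {
        final List<String[]> train = new ArrayList<>();
        final List<String[]> test = new ArrayList<>();
    }

    // Splits rows so the test set has testSize rows and roughly the same
    // label ratio as the full data. The label is assumed to be the last column.
    static Split split(List<String[]> rows, int testSize) {
        // Group rows by label, preserving their original order.
        Map<String, List<String[]>> byLabel = new LinkedHashMap<>();
        for (String[] row : rows) {
            byLabel.computeIfAbsent(row[row.length - 1], k -> new ArrayList<>()).add(row);
        }
        Split result = new Split();
        int remainingTest = testSize;
        int remainingRows = rows.size();
        for (List<String[]> group : byLabel.values()) {
            // Proportional share of what is left; the last group absorbs rounding.
            int take = (int) Math.round((double) remainingTest * group.size() / remainingRows);
            take = Math.min(take, group.size());
            result.test.addAll(group.subList(0, take));
            result.train.addAll(group.subList(take, group.size()));
            remainingTest -= take;
            remainingRows -= group.size();
        }
        return result;
    }
}
```

With 102 no rows, 98 yes rows and a `testSize` of 50, this yields 150 training rows and 50 test rows, of which 24 or 25 are yes depending on which label appears first in the data.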
DataItems is a simple class that lets the DataHandler class instantiate data items as objects, with their four features and one label as attributes.
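One possible shape for such a data item is sketched below. The class name `DataItemSketch`, its accessors and the `fromRow` helper are hypothetical and only illustrate the "four features plus one label" layout.

```java
// Hypothetical shape of a data item: four feature values plus one label.
final class DataItemSketch {
    // e.g. DemandLevel, SellingSpeed, RestockFrequency, SupplierReliability
    private final String[] features;
    // "yes" or "no" for the in-stock dataset
    private final String label;

    DataItemSketch(String f1, String f2, String f3, String f4, String label) {
        this.features = new String[] {f1, f2, f3, f4};
        this.label = label;
    }

    // Returns the feature at the given zero-based index (0 to 3).
    String feature(int index) {
        return features[index];
    }

    String label() {
        return label;
    }

    // Builds an item from one parsed csv row (four features, then the label).
    static DataItemSketch fromRow(String[] row) {
        if (row.length != 5) {
            throw new IllegalArgumentException("expected 5 columns, got " + row.length);
        }
        return new DataItemSketch(row[0], row[1], row[2], row[3], row[4]);
    }
}
```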
Screen is the GUI class for the application. It handles all visual elements such as buttons and text boxes, and instantiates the FileHandler and DataHandler classes to implement functionality.
When the user opens the application, they are greeted with a screen that includes a number of buttons, text boxes and labels. At the top right of the screen is a label indicating that no file has been selected. Clicking the Select Training Data button opens a file explorer from which the user can choose a csv file to load their data from.
Once the data is loaded, the user is presented with a number of options. The predictor can be trained with the provided data by clicking the Train button. New rows can be added by filling in the text boxes labelled with the data's features, clicking the desired label (yes/no) and clicking the Add Row button.
Once a row has been added, the user can retrain the predictor with these new rows by clicking the Train button once again.
Once the user has filled the text boxes with the desired features and trained the predictor, they can click the Predict button, which will give a prediction (yes/no) with the confidence (%).
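This README does not name the algorithm behind the predictor. The sketch below assumes a Naive Bayes classifier with Laplace smoothing, a common choice for categorical features backed by frequency counts. All names (`NaiveBayesSketch`, `train`, `posterior`, `predict`) are hypothetical, and the label is assumed to be the last column of each row.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Naive Bayes sketch; the application's real predictor may differ.
final class NaiveBayesSketch {
    private final Map<String, Integer> labelCounts = new HashMap<>();
    // Key "label|featureIndex|value" -> number of training rows with that combination.
    private final Map<String, Integer> valueCounts = new HashMap<>();
    // Distinct values seen per feature index, used for Laplace smoothing.
    private final Map<Integer, Set<String>> distinctValues = new HashMap<>();
    private int total;

    // Each row: feature values followed by the label in the last position.
    void train(List<String[]> rows) {
        labelCounts.clear();
        valueCounts.clear();
        distinctValues.clear();
        total = rows.size();
        for (String[] row : rows) {
            String label = row[row.length - 1];
            labelCounts.merge(label, 1, Integer::sum);
            for (int i = 0; i < row.length - 1; i++) {
                valueCounts.merge(label + "|" + i + "|" + row[i], 1, Integer::sum);
                distinctValues.computeIfAbsent(i, k -> new HashSet<>()).add(row[i]);
            }
        }
    }

    // Normalised posterior probability of each label for the given features.
    Map<String, Double> posterior(String[] features) {
        if (total == 0) {
            throw new IllegalStateException("train the predictor first");
        }
        Map<String, Double> scores = new HashMap<>();
        double sum = 0;
        for (Map.Entry<String, Integer> e : labelCounts.entrySet()) {
            double score = (double) e.getValue() / total; // prior P(label)
            for (int i = 0; i < features.length; i++) {
                int count = valueCounts.getOrDefault(e.getKey() + "|" + i + "|" + features[i], 0);
                int k = Math.max(distinctValues.getOrDefault(i, Set.of()).size(), 1);
                score *= (count + 1.0) / (e.getValue() + k); // Laplace-smoothed P(value | label)
            }
            scores.put(e.getKey(), score);
            sum += score;
        }
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            e.setValue(e.getValue() / sum);
        }
        return scores;
    }

    // Label with the highest posterior probability.
    String predict(String[] features) {
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Double> e : posterior(features).entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }
}
```

Under this assumption, the confidence shown next to a prediction would be the posterior of the predicted label multiplied by 100.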
If the predictor has been trained, the user can also click the Test Accuracy button, which will split the data into 2 sets, 150 rows of (stratified) training data and 50 rows of test data, and will check the predictions for the 50 test rows against their actual labels. Based on this, it will display the accuracy of the model.
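The accuracy figure can be computed as the fraction of test rows whose predicted label matches the actual one. The sketch below is a hedged illustration: `AccuracySketch` is a hypothetical name, the predictor is passed in as a plain function from features to label, and the label is assumed to be the last column of each row.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical accuracy check, not the application's DataHandler.
final class AccuracySketch {

    // Fraction of test rows whose predicted label equals the actual label.
    static double accuracy(List<String[]> testRows, Function<String[], String> predictor) {
        if (testRows.isEmpty()) {
            throw new IllegalArgumentException("no test rows");
        }
        int correct = 0;
        for (String[] row : testRows) {
            // Everything except the last column is treated as a feature.
            String[] features = Arrays.copyOf(row, row.length - 1);
            if (row[row.length - 1].equals(predictor.apply(features))) {
                correct++;
            }
        }
        return (double) correct / testRows.size();
    }
}
```

For example, a predictor that gets 3 of 4 test rows right scores 0.75, which would be displayed as 75%.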
If given more time, I would improve the GUI by adding multiple screens, rather than having everything crammed into one screen. I would also improve the accuracy test by shuffling the data before the stratified split, to ensure fairer testing. I would also have refactored some of the functionality in the DataHandler class to reduce code duplication.
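The shuffling improvement could look something like the following sketch. `ShuffleSketch` and `shuffledCopy` are hypothetical names; the idea is to shuffle a copy of the rows with a fixed seed before splitting, so that the test rows are no longer tied to file order while results stay reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper illustrating the proposed improvement.
final class ShuffleSketch {

    // Returns a shuffled copy and leaves the original list untouched.
    // The same seed always produces the same order.
    static <T> List<T> shuffledCopy(List<T> rows, long seed) {
        List<T> copy = new ArrayList<>(rows);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

Using a different seed per run, or averaging accuracy over several seeds, would give a less order-dependent estimate.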