Steps for running the code
- random_forest.ipynb contains a preprocessing step that reads the training and test data and processes the images with OpenCV. The processed images are converted to NumPy arrays and stored, along with their labels, in a .npz file.
- The train and test .npz files can then be loaded with NumPy and the arrays retrieved whenever they are needed, so the data does not have to be preprocessed again for every model.
- Both the decision tree and the SVM models were trained and tested in Google Colab. Download the files and add them to a folder on Google Drive. Make sure the subfolders on Drive, such as Dataset and Checkpoints, exactly match the paths in the code: either change the paths in the code or create the folders exactly as they are given.
- After that, run each cell in the .ipynb files in order: load the train and test .npz files, then fit the model, changing the hyperparameters as needed.
- Every model is saved automatically on fit to the Checkpoints subfolder, using a fixed name and number format. When loading a checkpoint back for testing, make sure the file name and number match that format.
- The random forest model was trained on a local PC. You can either run it locally, saving the data files according to the path structure in the code, or upload it to Drive and run it in Colab. In the second case, make sure the code that imports and mounts Drive is added at the top of the notebook.
- The descriptive analysis and image processing files contain the various techniques that we explored to learn about the dataset and its features.
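
The .npz caching and the local-versus-Colab path handling described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the helper names (`resolve_base_dir`, `save_split`, `load_split`), the array keys `images` and `labels`, and the Drive folder name are all assumptions, and random arrays stand in for the OpenCV-processed images.

```python
import tempfile
from pathlib import Path

import numpy as np


def resolve_base_dir(local_dir):
    """Return the project folder: on Drive when in Colab, else local_dir."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return Path(local_dir)
    drive.mount("/content/drive")
    # Hypothetical folder name; use the folder you created on Drive.
    return Path("/content/drive/MyDrive/project")


def save_split(path, images, labels):
    """Store preprocessed images and their labels in one .npz file."""
    # The keys "images" and "labels" are assumed names for this sketch.
    np.savez_compressed(path, images=images, labels=labels)


def load_split(path):
    """Load the arrays back without repeating the preprocessing."""
    with np.load(path) as data:
        return data["images"], data["labels"]


if __name__ == "__main__":
    # A temporary folder stands in for the real project folder here.
    with tempfile.TemporaryDirectory() as tmp:
        dataset_dir = Path(tmp) / "Dataset"
        dataset_dir.mkdir()

        # Random arrays stand in for the processed images and labels.
        rng = np.random.default_rng(0)
        images = rng.integers(0, 256, size=(4, 8, 8), dtype=np.uint8)
        labels = np.array([0, 1, 1, 0])

        save_split(dataset_dir / "train.npz", images, labels)
        loaded_images, loaded_labels = load_split(dataset_dir / "train.npz")
        print(loaded_images.shape, loaded_labels.shape)
```

In a real run, `resolve_base_dir` would be called once at the top of the notebook, and the Dataset and Checkpoints paths would be built from its return value so that the same cells work both locally and in Colab.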