- Change into the working directory "zockerBoy"
- Install the requirements using the following command:
pip install -r requirements.txt
- Run the program with the following command:
python main.py --i <location of image to be used> --d <device to be used, 0 for cpu, 1 for cuda device>
- Example images are provided inside the "images" folder
- Example Usage:
python main.py --i image\\test_ad.jpg --d 1
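For reference, the two flags above could be parsed roughly as follows. This is a minimal sketch and not the actual contents of main.py: the helper names `build_parser` and `device_name` are hypothetical, and mapping 1 to "cuda:0" is an assumption about how the device flag is interpreted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --i and --d flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run the program on one image.")
    parser.add_argument("--i", required=True,
                        help="location of image to be used")
    parser.add_argument("--d", type=int, choices=[0, 1], default=0,
                        help="device to be used, 0 for cpu, 1 for cuda device")
    return parser


def device_name(flag: int) -> str:
    """Map the numeric --d flag to a device string (assumed mapping)."""
    return "cuda:0" if flag == 1 else "cpu"


# Parse the example command line from above instead of sys.argv.
args = build_parser().parse_args(["--i", "image\\test_ad.jpg", "--d", "1"])
print(args.i, device_name(args.d))
```

Restricting `--d` with `choices` makes argparse reject any device value other than 0 or 1 with a usage message.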
- Logo Detection has not been implemented:
- Training took too long, and no suitable open-source dataset was found that gave enough accuracy to be worth adding.
- The good datasets were hundreds of GB in size.
- Adding an LLM to the output:
- Adding an LLM would:
- Make the output easier to understand.
- Help convert hex codes into color names.
- Help people who are colorblind by applying the necessary conversions.
- Make it more of an interactive experience.
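To show what the hex-to-name conversion involves, here is a rough sketch that uses a different technique from the one proposed: a nearest-color lookup against a small named palette instead of an LLM. The palette `NAMED_COLORS` and the function names are hypothetical and exist only for this illustration.

```python
# Hypothetical palette for illustration; a real one would have many more entries.
NAMED_COLORS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "orange": (255, 165, 0),
    "purple": (128, 0, 128),
    "gray": (128, 128, 128),
}


def hex_to_rgb(hex_code: str) -> tuple:
    """Convert '#rrggbb' (or 'rrggbb') to an (r, g, b) tuple."""
    value = hex_code.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex code, got {hex_code!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def nearest_color_name(hex_code: str) -> str:
    """Return the palette name with the smallest squared RGB distance."""
    rgb = hex_to_rgb(hex_code)
    return min(
        NAMED_COLORS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, NAMED_COLORS[name])),
    )


print(nearest_color_name("#fe0102"))
```

Plain RGB distance is a crude measure of how similar two colors look, so this only approximates what a person (or an LLM) would call the color.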
- Adding a GUI:
- Gradio could be added to provide a better GUI
- Adding better text post-processing techniques:
- Regular expressions (for example via Python's re module) could be used to keep the text that makes sense and filter out nonsense text and single-letter text.
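One possible shape for such a filter, as a minimal sketch. The pattern and the function name `clean_ocr_tokens` are assumptions for illustration, and the rule used here (keep tokens that contain at least two letters or digits in a row and consist only of word characters plus basic punctuation) is only a rough stand-in for "text that makes sense".

```python
import re

# Assumed heuristic: a token is kept if it contains a run of 2+ ASCII
# alphanumerics and consists only of word characters and common punctuation.
_PLAUSIBLE = re.compile(r"^(?=.*[A-Za-z0-9]{2})[\w'&%$.,!?:-]+$")


def clean_ocr_tokens(tokens: list) -> list:
    """Drop single-letter tokens and tokens that look like OCR noise."""
    kept = []
    for token in tokens:
        token = token.strip()
        if _PLAUSIBLE.match(token):
            kept.append(token)
    return kept


print(clean_ocr_tokens(["SALE", "a", "|", "50%", "~~", "off!", "x"]))
```

A rule like this cannot tell real words from random letter runs; a dictionary check would be needed for that.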
- Started off with Color Palette detection:
- Used Color Thief
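A minimal sketch of palette extraction, assuming the Python `colorthief` package is the Color Thief implementation used here. The function names `rgb_to_hex` and `palette_hex` and the default of five colors are hypothetical.

```python
def rgb_to_hex(rgb: tuple) -> str:
    """Format an (r, g, b) tuple as a '#rrggbb' hex code."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def palette_hex(image_path: str, color_count: int = 5) -> list:
    """Return the dominant colors of an image as hex codes."""
    # Imported lazily so rgb_to_hex works without the package installed.
    from colorthief import ColorThief

    # get_palette returns a list of (r, g, b) tuples.
    palette = ColorThief(image_path).get_palette(color_count=color_count)
    return [rgb_to_hex(color) for color in palette]
```

Calling `palette_hex` with the path to one of the example images would return a list of hex strings, one per palette color.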
- Then moved on to text overlay detection:
- Tried using pyTesseract natively
- Ran into issues because of bad OCR results
- Used cv2 to convert the image to black and white (better results than RGB)
- Used cv2 to draw boxes around the text and collect them
- Used pyTesseract to detect text from those boxes
- Had to experiment with the config to see which method gives the most matches based on e-media
- Added thresholding to make sure only the high-confidence (high %) matches get through
- Had to experiment with the threshold to see what works best
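The steps above can be sketched roughly as follows. This is a simplified illustration, not the project's code: it skips the separate box-finding step and relies on the word boxes that `pytesseract.image_to_data` reports, and the function names, the `--psm 11` config, and the cut-off of 60 are all assumptions.

```python
def filter_by_confidence(data: dict, min_conf: float = 60.0) -> list:
    """Keep words whose OCR confidence is at least min_conf.

    `data` has the shape returned by pytesseract.image_to_data(...,
    output_type=Output.DICT): parallel lists under "text" and "conf".
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may be an int, float, or string depending on the version;
        # rows that are not words are reported with a confidence of -1.
        if text.strip() and float(conf) >= min_conf:
            words.append(text.strip())
    return words


def ocr_words(image_path: str, min_conf: float = 60.0) -> list:
    """Grayscale, Otsu threshold, then OCR with a confidence cut-off."""
    # Imported lazily: both are third-party packages.
    import cv2
    import pytesseract
    from pytesseract import Output

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # "--psm 11" (sparse text) is an assumed starting point for the config.
    data = pytesseract.image_to_data(bw, config="--psm 11", output_type=Output.DICT)
    return filter_by_confidence(data, min_conf)
```

Keeping the confidence filter separate from the OCR call makes it easy to try different thresholds on the same OCR result.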
- Object Detection:
- Tried using YOLOX
- Dependencies weren't resolving, wheels weren't building
- Faced severe Python interpreter problems, had to redo the path
- Tried moondream
- The specified models had tensor values that varied too much, couldn't find a suitable sigmoid loss model to fit them into
- Tried MMDetection
- It's also viable for commercial use
- Had to build gcc/MinGW
- Had to build MMDetection and its sub-packages from source to work with CUDA 12.1 and my version of cuDNN
- Built core labelling and identification
- Built tracking
- Tested tracking on images and videos with boxes drawn
- Used an instance of YOLOX for the same (which is also commercially viable)
- Tried YOLOv8
- Faster inference
- Better trained model
- Made an argument parser to catch the output
- Checked the documentation, turns out they already have a feature for that anyway
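For the YOLOv8 step, a minimal sketch assuming the `ultralytics` package and a pretrained `yolov8n.pt` checkpoint. Neither the checkpoint nor the helper names `count_labels` and `detect_objects` come from the project, and the 0.5 confidence cut-off is arbitrary.

```python
from collections import Counter


def count_labels(class_ids, names, confidences, min_conf=0.5):
    """Count detections per label, ignoring those below min_conf."""
    return Counter(
        names[int(cls)]
        for cls, conf in zip(class_ids, confidences)
        if conf >= min_conf
    )


def detect_objects(image_path, device="cpu", weights="yolov8n.pt"):
    """Run YOLOv8 on one image and summarise what was found."""
    # Imported lazily: ultralytics is a third-party package.
    from ultralytics import YOLO

    model = YOLO(weights)  # assumed pretrained checkpoint
    result = model(image_path, device=device)[0]  # one Results object per image
    return count_labels(
        result.boxes.cls.tolist(),   # class index per detection
        result.names,                # index -> label mapping
        result.boxes.conf.tolist(),  # confidence per detection
    )
```

Passing `device="cuda:0"` instead of the default would run inference on the first CUDA device.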
- Logo Detection:
- Tried using YOLOv8:
- Converting the dataset into YOLOv8 format was lengthy
- Local machine (3060) cannot train efficiently enough
- Bad documentation
- Trying to use YOLOv3/4/5:
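As an illustration of what the dataset conversion involves: a YOLO label file holds one line per object, `class x_center y_center width height`, with all four box values normalised to the image size. The sketch below assumes the source annotations are pixel boxes given as (xmin, ymin, xmax, ymax), which may not match the datasets that were actually tried, and `to_yolo_line` is a hypothetical helper name.

```python
def to_yolo_line(class_id, box, image_size):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to a YOLO label line."""
    xmin, ymin, xmax, ymax = box
    img_w, img_h = image_size
    # YOLO stores the box centre and size as fractions of the image size.
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(to_yolo_line(0, (100, 50, 300, 150), (400, 200)))
```

Each image then gets a text file containing one such line per annotated logo.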