This guide explains in very simple language how to download, set up, and run your Jersey Pattern Matcher project on a Windows computer. It also explains how to keep your large catalogue (15,000 images) fast by using an NVIDIA GPU with CUDA, and what to do when you add or change catalogue images.
Important files and folders in this project:
app.py– starts the dashboard (Streamlit app).feature_extract.py– builds the search index from your catalogue images.index/– stores the index files created byfeature_extract.py:index/vector.indexindex/vector.index.paths.txt
catalogue101/– put your 15,000+ catalogue images here.uploads101/– every image you upload in the dashboard gets saved here.models/– contains model files (e.g.,models/deepfashion2_yolov8s-seg.ptfor YOLO).requirement.txt– list of Python packages to install.
Tip: All paths above are relative to your project folder.
- A Windows PC. For best speed with large catalogues (15,000 images), an NVIDIA GPU is recommended.
- Admin rights on your PC so you can install software.
Install these:
-
Git
- Download from: https://git-scm.com/download/win
- Click through the installer with default options.
-
Python 3.10+ (64-bit)
- Download from: https://www.python.org/downloads/windows/
- During install: check “Add Python to PATH”.
-
NVIDIA GPU (optional but strongly recommended for 15,000 images)
- Make sure you have an NVIDIA graphics card.
- Install the latest NVIDIA GPU Driver:
-
PyTorch with CUDA (GPU acceleration)
- If you have an NVIDIA GPU, you need the CUDA version of PyTorch.
- We will install this in Step 4 below.
- If you don’t have a GPU, you can still run on CPU, but it will be slower.
- Open “Windows PowerShell”.
- Choose a folder (like
Documents) where you want the project to live. - Run:
git clone <YOUR_GITHUB_REPO_URL>Replace <YOUR_GITHUB_REPO_URL> with your repository URL.
- Go into the project folder (in PowerShell):
ls
# Find your cloned folder name, then:
# cd .\pattern_feature_matching_engine\Note: In these instructions, we’ll refer to this folder as your “project folder”.
- Create a virtual environment named
venv_dino:
python -m venv venv_dino- Activate it:
.\venv_dino\Scripts\activateYou should now see (venv_dino) at the start of your PowerShell prompt.
To deactivate later:
deactivateStay inside your activated virtual environment (venv_dino).
-
If you have an NVIDIA GPU (recommended for 15,000 images):
- Install PyTorch with CUDA 12.1:
pip install --upgrade pip pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
- Verify CUDA is detected:
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
Should print
CUDA available: True. If False, see Troubleshooting at the end. -
Install the rest of the project packages:
pip install -r requirement.txtNote: The file is named requirement.txt (not requirements.txt).
- Create these folders if they don’t exist yet:
catalogue101/– put all your catalogue images here (JPG or PNG). For 15,000 images, consider organizing them in subfolders if you like (the code can scan subfolders).index/– empty to start. This will be filled byfeature_extract.py.uploads101/– the app will create this automatically on first upload, but you can create it manually too.models/– make sure your YOLO model file is present:models/deepfashion2_yolov8s-seg.pt- If you don’t have it, place the correct YOLO weight file there.
The DINO model is downloaded automatically by the Transformers library the first time it runs.
You must create the index before running the dashboard. This uses feature_extract.py to extract features for each catalogue image and store them in index/.
- Run:
(venv_dino) python feature_extract.pyWhat happens:
- The script scans
catalogue101/for images. - It extracts features using DINO (and may use the GPU if available).
- It saves the index files to
index/vector.indexandindex/vector.index.paths.txt.
For 15,000 images, this can take some time (first run). With a GPU it will be much faster than CPU.
You only need to rebuild the index when your catalogue changes (see Step 9).
Once the index files exist in index/, start the Streamlit app:
(venv_dino) python -m streamlit run app.py- Streamlit will print a local URL like:
- Local URL: http://localhost:8501
- Click the link or copy it into your browser.
On the dashboard:
- Click “Upload Your Jersey Image.”
- Select a clear image of a jersey (JPG/PNG).
- The app will:
- Save your uploaded image to
uploads101/(one copy per unique image). - Detect and crop the jersey area using YOLO.
- Extract features using DINO (and some additional texture/color logic if enabled).
- Search for similar patterns in your
index/. - Show the top 15 matches with images and scores.
- Provide a “Download Results” button to save the comparison panel.
- Save your uploaded image to
Tip:
- If the app says it’s using “Basic DINO Features (384D),” that matches your current index dimension. This is expected unless you rebuild the index with a different feature set.
Any time you change images in catalogue101/:
-
Stop the app if it’s running (press Ctrl+C in the PowerShell window running Streamlit).
-
Rebuild the index:
(venv_dino) python feature_extract.py
This updates
index/vector.indexandindex/vector.index.paths.txtso the search reflects your latest catalogue. -
Start the app again:
(venv_dino) python -m streamlit run app.py
- Use an NVIDIA GPU and CUDA-enabled PyTorch (Step 4).
- Close other heavy apps while building the index.
- The first time you run the app, models may download and initialize; later runs are faster.
- Keep your images reasonably sized (standard JPG/PNG; extremely large images slow things down).
Example:
your-project-folder/
app.py
feature_extract.py
requirement.txt
models/
deepfashion2_yolov8s-seg.pt
catalogue101/
<your 15,000 images in subfolders or flat>
index/
vector.index
vector.index.paths.txt
uploads101/
<images you uploaded via the dashboard>
-
The app runs but shows errors about “index”:
- Make sure you ran
python feature_extract.pyand thatindex/vector.indexandindex/vector.index.paths.txtexist.
- Make sure you ran
-
It says the YOLO model file is missing:
- Put
deepfashion2_yolov8s-seg.ptinmodels/and run again.
- Put
-
It’s very slow:
- Use a GPU with CUDA (Step 4).
- Confirm CUDA is detected:
(venv_dino) python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
- If it says
False:- Update NVIDIA drivers.
- Reinstall CUDA PyTorch wheel (Step 4).
- Try the official guide: https://pytorch.org/get-started/locally/
-
“Module not found” errors:
- Make sure your virtual environment is activated and you installed dependencies:
.\venv_dino\Scripts\activate pip install -r requirement.txt
- Make sure your virtual environment is activated and you installed dependencies:
-
“Permission denied” or “access is denied”:
- Close files/folders that are open in other apps.
- Make sure your antivirus is not blocking Python.
-
Can I run without a GPU?
- Yes, but it will be slower, especially when building the index for 15,000 images.
-
Do I need to rebuild the index every time?
- Only when you add/remove/rename images in
catalogue101/.
- Only when you add/remove/rename images in
-
Where are my uploaded images saved?
uploads101/inside the project folder. The app avoids saving duplicates of the same image.
-
Can I organize the catalogue in subfolders?
- Yes. The feature extractor scans subfolders too.
After cloning the repo:
python -m venv venv_dino
.\venv_dino\Scripts\activate
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 # for NVIDIA GPU users
pip install -r requirement.txt
python feature_extract.py
python -m streamlit run app.pyWhen catalogue changes:
(venv_dino) python feature_extract.py
(venv_dino) python -m streamlit run app.pyIf you need help, open an issue on GitHub with screenshots of the error and the commands you ran. We’re happy to assist.