BitVision is a Python-based computer vision app that lets users record actions (3D poses) from video using MediaPipe and map them to keyboard inputs. A random forest classifier is trained to associate the recorded pose data with each action. During live capture, each video frame is classified by the trained model, and the predicted action determines which key presses to perform.
- Clone this repository.
- Set up a virtual environment and install dependencies according to `INSTALLATIONS.md`.
- Install MediaPipe. Follow the instructions at https://developers.google.com/mediapipe/solutions/setup_python.
- In `DataGenerator.py`, set the data file output for the specific action to be recorded on line 70.
- In a terminal, navigate to the `/predict` directory and run `python DataGenerator.py {action_class}`. This will start the webcam and begin recording MediaPipe data for the specified action.
- The output data `.csv` file will be stored in the `/train/training_data` directory.
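As a rough illustration of the recording step above, the sketch below flattens per-frame pose landmarks into CSV rows. It is not the code in `DataGenerator.py`: the `Landmark` tuple, `landmarks_to_row`, and `write_frames` are hypothetical names, and the column layout (x, y, z, visibility per landmark) is an assumption based on the fields MediaPipe pose landmarks expose. Webcam capture and the MediaPipe call itself are left out so the example runs on its own.

```python
import csv
import io
from typing import Iterable, List, NamedTuple, Sequence, TextIO


class Landmark(NamedTuple):
    """Hypothetical stand-in for one pose landmark."""
    x: float
    y: float
    z: float
    visibility: float


def landmarks_to_row(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten one frame's landmarks into a single feature row."""
    row: List[float] = []
    for lm in landmarks:
        row.extend([lm.x, lm.y, lm.z, lm.visibility])
    return row


def write_frames(frames: Iterable[Sequence[Landmark]], out_file: TextIO) -> int:
    """Write one CSV row per frame and return the number of rows written."""
    writer = csv.writer(out_file)
    count = 0
    for landmarks in frames:
        writer.writerow(landmarks_to_row(landmarks))
        count += 1
    return count


# Two fake frames with two landmarks each (a real pose has more landmarks).
frames = [
    [Landmark(0.1, 0.2, 0.3, 0.9), Landmark(0.4, 0.5, 0.6, 0.8)],
    [Landmark(0.2, 0.3, 0.4, 0.9), Landmark(0.5, 0.6, 0.7, 0.7)],
]
buffer = io.StringIO()  # stands in for the action's .csv output file
rows_written = write_frames(frames, buffer)
```

Writing one row per frame keeps every sample the same width, which is what a classifier trained on tabular features expects.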
- Go to the `/train` directory.
- Run `python ModelGenerator.py {model_name}`. This labels the rows of each training data file with its action (taken from the training data file name) as the class, then concatenates all training data into one file.
- The trained model will be saved to `/models` as a pickled model, `{model_name}.pkl`.
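The labeling and concatenation described above might look roughly like the following. This is a minimal sketch, not the contents of `ModelGenerator.py`: `label_from_filename` and `combine_training_data` are hypothetical helpers, the file names are made up, and the CSV files are assumed to hold only numeric feature columns with no header row.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, TextIO, Tuple


def label_from_filename(path: str) -> str:
    """Use the file name without its extension as the action label."""
    return Path(path).stem


def combine_training_data(
    files: Dict[str, TextIO],
) -> Tuple[List[List[float]], List[str]]:
    """Read every file and return (feature rows, matching class labels)."""
    features: List[List[float]] = []
    labels: List[str] = []
    for name, handle in files.items():
        action = label_from_filename(name)
        for row in csv.reader(handle):
            if not row:  # skip blank lines
                continue
            features.append([float(value) for value in row])
            labels.append(action)
    return features, labels


# In-memory stand-ins for two recorded training data files.
files = {
    "training_data/jump.csv": io.StringIO("0.1,0.2\n0.3,0.4\n"),
    "training_data/crouch.csv": io.StringIO("0.9,0.8\n"),
}
X, y = combine_training_data(files)
```

With scikit-learn installed, `X` and `y` could then be passed to `RandomForestClassifier().fit(X, y)` and the fitted model written out with `pickle.dump`; whether the project does exactly this is an assumption.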
- Go to the project root directory.
- Run `python main.py`. This will start the webcam video capture and begin generating key inputs according to the trained model and controller inputs specified in the `Controller` module's `control_scheme`.
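To make the last step more concrete, here is a hedged sketch of how a predicted action could be turned into key presses. The shape of `CONTROL_SCHEME`, the action names, and the key names are all hypothetical and may not match the `Controller` module's actual `control_scheme`. Sending the keys to the operating system is omitted; the sketch only works out which keys to press and release when the prediction changes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical control scheme: predicted action -> keys to hold down.
CONTROL_SCHEME: Dict[str, List[str]] = {
    "jump": ["space"],
    "run_right": ["d", "shift"],
    "idle": [],
}


def keys_for_action(action: str, scheme: Dict[str, List[str]]) -> Set[str]:
    """Look up the keys for an action; unknown actions map to no keys."""
    return set(scheme.get(action, []))


def key_transitions(held: Set[str], wanted: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (keys to press, keys to release) to move from held to wanted."""
    to_press = sorted(wanted - held)
    to_release = sorted(held - wanted)
    return to_press, to_release


# Simulate a stream of per-frame predictions from the trained model.
held: Set[str] = set()
history = []
for action in ["idle", "run_right", "run_right", "jump"]:
    wanted = keys_for_action(action, CONTROL_SCHEME)
    history.append(key_transitions(held, wanted))
    held = wanted
```

Tracking the currently held keys means a repeated prediction produces no new events, so a key is not pressed again on every frame.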