# Aim of This Tutorial
For anyone who wants to use our scripts to continue the project, this is the manual for our code. Most of our scripts can be stopped by pressing 's' on the keyboard.

# Yolov8 Model
This model detects the circuit components and returns the information we need, such as the names and positions of the components. We use the **Roboflow** website together with a Colab tutorial to train the model.
- **colabtutorial**: https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/train-yolov8-object-detection-on-custom-dataset.ipynb This is the original notebook. To use it for training, a few cells have to be modified (explained below).
- **Roboflow**: We use this website to create the training data by uploading pictures of the circuits and labeling the components. Please contact Joy to be added to the workspace so that you can use the training data created by Joy and Julie. Once you fully understand how it works, you are encouraged to create your own workspace and your own training data, as the outcome depends heavily on the training images (lighting conditions, image quality, filming angle, etc.) and on the quality of the labeling.

## Colab tutorial:
The tutorial first shows an example of what the YOLOv8 model can do; feel free to skip it. It then explains how to use **Roboflow** to create your own custom dataset. After that you will see a cell like this:
```python
!mkdir {HOME}/datasets
%cd {HOME}/datasets

!pip install roboflow --quiet

from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("roboflow-jvuqo").project("football-players-detection-3zvbc")
dataset = project.version(1).download("yolov8")
```
If you are using our dataset, modify it as follows:
```python
from roboflow import Roboflow
rf = Roboflow(api_key="kycCPEnB6cQ4clHmnMlE")
project = rf.workspace().project("test_1_with_board")
dataset = project.version(8).download("yolov8")  # change the version every time you modify the dataset
```
If you are not sure about the current version of the dataset, go to the workspace on the Roboflow website and check the latest version of **Test_1_with_board Image Dataset**. If you decide to use your own custom dataset, create it first, then go to the *Deploy* page of the website and copy and paste the snippet shown there:

![Alt Text](pictures/tutorial_pic1.png)


You can now train the model and run the remaining cells to see the output. The output may look really bad. That is because pictures cannot be deleted from Roboflow; you can only add more pictures or modify the labeling. We therefore moved all the unwanted pictures into the test set to keep them out of the training set. After training, all you need to do is deploy the model by running the following cell:
```python
project.version(dataset.version).deploy(model_type="yolov8", model_path=f"{HOME}/runs/detect/train/")
```
Then use the "model_path" to find the file named "best.pt" and download it; it is usually under /runs/detect/train/weights. It may be named best1.pt or best2.pt. Please make sure you have downloaded the right weights.
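
Because several runs can leave more than one weights file behind, it can help to list the candidates before downloading. The following is only a minimal sketch: `find_best_weights` is a hypothetical helper (not part of the notebook), and it assumes the `runs/detect/<run>/weights/` layout described above.

```python
from pathlib import Path


def find_best_weights(runs_dir):
    """Return every best*.pt file under runs_dir/<run>/weights, newest first."""
    candidates = Path(runs_dir).glob("*/weights/best*.pt")
    # Sort by modification time so the most recent training run comes first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

In Colab you could call `find_best_weights(f"{HOME}/runs/detect")` and download the first entry, after checking that it belongs to the run you just trained.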

# Roboflow
We only use this website to generate the custom dataset. Note that once you create a dataset, you cannot remove pictures from it, even if you think some of them will hurt the training outcome. So be careful with your main custom dataset. One trick is to allocate all the bad pictures to the "test set". The website automatically splits the whole dataset into 3 parts (training set, validation set and test set), and you are free to change the split.

The training dataset is always very important, so when you take the pictures for training your model, make sure that everything (filming angle/height, lighting conditions, ...) matches the conditions of the final experiment.

The way you label the components also matters a lot. When you draw the box around a certain component, make it as accurate as possible, because a small difference in location has a large effect on the outcome. You are welcome to follow our labeling pattern or to use your own, but the boundaries of the components must be very accurate.



## Examples of labelling

| Classes | Overview | Overlap | Bad-behaved part | Buzzer |
|:---------:|:------------------:|:-----------------------------------:|:---------:|:---------:|
| <img src="pictures/classes.png" alt="Classes" width="200"/> | <img src="pictures/boundary.png" alt="Overview" width="200"/> | <img src="pictures/overlap.png" alt="Overlap" width="200"/> | <img src="pictures/not well.png" alt="Bad-behaved part" width="200"/> | <img src="pictures/buzzer.png" alt="Buzzer" width="200"/> |


- The first picture shows the classes that we have defined for this project, which are simply the names of the electronic components.
- As you can see from the second picture, which is an overview of what the image looks like after labelling, the bounding boxes should be as accurate as possible, because our circuit model code relies on the box boundaries to compute accurate locations.
- It is okay for the boundaries to overlap.
- The bad-behaved part refers to overlapping wires. When wires overlap a lot in a small space, the short piece of wire lying between two parallel '2-wire' pieces may not be recognized by the YOLOv8 model, which may lead to a disconnection error in the circuit during detection.
- Some electronic components have a weird shape, like the buzzer. In the last picture you can see that the bounding box only fits the longer edges of the buzzer, and this works well.
- When the accuracy for a certain component is not ideal, add more pictures of that component. In other words, to get better validation output, increase the training data.

# Speaker Model
The speaker model has 2 parts. One originates from the GitHub repository: https://github.com/zachlatta/openai-whisper-speaker-identification/blob/main/transcripts_with_speaker_names.ipynb and we modified it a little. The other uses Azure from Microsoft.

## Azure
Before using Azure, here are a few warnings.

- Create your own Azure account to get your API keys and region key.
- Linux is a better choice than macOS: we have to download the SpeechSDK. The SpeechSDK for macOS is a combination of ArmX86 and X86, and it only contains the names of the required libraries without their contents. Another difficulty is that you have to link the dynamic libraries yourself, which you may be able to do.
- The Python version of the SpeechSDK lacks features. The 2 main functions that we want to use through Azure are:
    - **Speaker Recognition**
    - **Real Time Diarization** (currently under public preview)<br>


### Choose a programming language or tool:
|  | C# | C++ | Go | Java | JavaScript | Objective-C | Python | Swift | CLI | REST |
| -------- | -------- | -------- |-------- |-------- |-------- |-------- |-------- |-------- |-------- |-------- |
| **Speaker Recognition** | &#10003; | &#10003; | &#10003; | &#10007;| &#10003;| &#10007;| &#10007;| &#10007; | &#10007; | &#10003;|
| **Real Time Diarization** | &#10003; | &#10003; | &#10007; | &#10003;| &#10003;| &#10007;| &#10003;| &#10007; | &#10007; | &#10007;|




Python is my main coding language. Since the Python SDK does not fully support the two functions above, I tried to use them in C++ on Linux (I tried C++ on macOS first but could not work out how to link the dynamic libraries) and still faced some problems:
- The current code compiles but still has lots of bugs. The code is from: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-speaker-recognition?tabs=script&pivots=programming-language-cpp I had never used C++ before, so it is really hard for me to debug. I leave this to you if you are familiar with C++.


### How to use the Azure:

- Create an account with the Microsoft Azure Service: https://azure.microsoft.com/en-us/free/ai-services and then sign in to the Azure portal.
- Inside the portal, create a speech service resource to get your own keys.
- You can then use the services locally.


## Our code
Once again, our code is based on this GitHub repository: https://github.com/zachlatta/openai-whisper-speaker-identification/blob/main/transcripts_with_speaker_names.ipynb Currently, we cannot use it for real time diarization. It can only take an audio file as input, and the output is a txt file telling you 'who said what'. The format looks like the following:<br>
- speaker 1: hello.....
- speaker 2: hi......<br>

Our current method is to record the real time audio for 40 seconds (can be adjusted), then pass it to the diarization function while recording the next 40 seconds of audio. The disadvantage is that we always have a delay of about 50 seconds (including the processing time). **real_time_speaker_diart** and **transcripts_with_speaker_names** actually use the same code from the GitHub repository; the former contains a recorder function inside the file, while the latter contains only the diarization function, with all other helper functions in separate files. There are a few parameters that can be adjusted:
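
The record-while-processing scheme described above can be sketched with a background thread and a queue. This is an illustration only, not the code in **real_time_speaker_diart**: `record_chunk` and `diarize` are hypothetical callables standing in for the project's recorder and diarization functions.

```python
import queue
import threading


def run_chunked_pipeline(record_chunk, diarize, num_chunks):
    """Record chunks in a background thread while the main thread diarizes them."""
    chunks = queue.Queue()

    def recorder():
        for index in range(num_chunks):
            # In the real setup this call blocks for one chunk duration (e.g. 40 s).
            chunks.put(record_chunk(index))
        chunks.put(None)  # sentinel: recording has finished

    threading.Thread(target=recorder, daemon=True).start()

    transcripts = []
    while True:
        chunk = chunks.get()  # waits until the next chunk is fully recorded
        if chunk is None:
            break
        transcripts.append(diarize(chunk))
    return transcripts


def worst_case_delay(chunk_seconds, processing_seconds):
    """Speech at the start of a chunk waits for the whole chunk plus its processing."""
    return chunk_seconds + processing_seconds
```

With a 40-second chunk, the 50-second delay quoted above corresponds to roughly 10 seconds of processing, so shortening the chunk is the most direct way to reduce the delay.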

```python
num_speakers = 2 #@param {type:"integer"}

language = 'English' #@param ['any', 'English']

model_size = 'tiny' #@param ['tiny', 'base', 'small', 'medium', 'large']
```
For now, setting the model size to 'tiny' is enough, as we only process audio files shorter than 1 minute.

- **recorder**: contains a recorder function that records the real time audio and saves it as a file in the same directory every 20 seconds (the duration can be changed).
- **check_for_new_file**: contains a function that searches for new files in the current directory and returns the names of the newly added files. We can then pass the newly added file to the 'transcripts_with_...' file to process it and store all transcripts in the file named **transcript.txt**.
- **GPT**: Since we may need to find the contribution of each participant from their conversation, we may use a language model to help estimate their ability based on the transcript. This function passes the text in **initial_promp** together with the **transcript** to ChatGPT and gets the generated output back. You need a valid ChatGPT API key with credit in the account.
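
The new-file check and the transcript accumulation can be sketched as follows. The helper names, the `seen` set and the `*.wav` pattern are assumptions made for this illustration; the actual **check_for_new_file** script may work differently.

```python
from pathlib import Path


def find_new_files(directory, seen, pattern="*.wav"):
    """Return names of files matching pattern that are not yet in seen, and remember them."""
    current = {p.name for p in Path(directory).glob(pattern)}
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files


def append_transcript(text, transcript_path="transcript.txt"):
    """Append one chunk's transcript to the running transcript file."""
    with open(transcript_path, "a", encoding="utf-8") as handle:
        handle.write(text.rstrip("\n") + "\n")
```

Calling `find_new_files` in a loop with the same `seen` set returns each recording exactly once, so every chunk is diarized a single time before its text is appended.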

# Helper function
This section refers to the `color_picker.ipynb` of `Masking and Facial` inside the Circuit Model. This file is designed to get the HSV value of any given color in order to create a filter for it. Why do we need this function? As you may have noticed, for some electronic components such as LED, FM... we need to specify the direction of the current. So for each 'directional component', we used colored tape to mark the input and output, as shown below:

| Real img | Track bar | Stacked imgs |
|:---------:|:------------------:|:-----------------------------------:|
| <img src="pictures/tutorial_pic3.jpg" alt="Real img" width="200"/> | <img src="pictures/trackbars.ipg.png" alt="Track bar" width="200"/> | <img src="pictures/color_picker.ipg.png" alt="YOLO Image" width="600"/> |

- **Real img**: As you can see from the picture, a yellow tape indicates the input of the LED and the orange (red) tape on the FM indicates the output.
- **Track bar**: The second picture is an example of the track bar window. When you run the code, the stacked image covers the track bar window, so you have to move the stacked image aside to see it. You can use the track bars to adjust the HSV values and watch the stacked image for a real time view of the masking. We then use the HSV values to locate the tapes and thereby determine the input or output of the electronic component.
- **Stacked img**: The stacked image contains 3 images in total: the original image, the mask and the image after masking. In the case above, we want to keep only the orange object. As you can see from the mask, the orange object is shown in white and everything else in black. The last image shows the original image with the mask applied.
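
To make the masking step concrete, here is a small pure-Python sketch of HSV thresholding and of locating a tape from the mask. It is not the notebook's code: the function names and the example range are assumptions, and in practice OpenCV's `cv2.inRange` does the thresholding on whole images. The sketch uses OpenCV's 8-bit HSV scale (H in 0-179, S and V in 0-255), which is the scale track bar values normally refer to.

```python
import colorsys


def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to OpenCV-style HSV (H in 0-179, S and V in 0-255)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


def in_hsv_range(hsv, lower, upper):
    """True when every HSV channel lies inside the inclusive [lower, upper] range."""
    return all(lo <= value <= hi for value, lo, hi in zip(hsv, lower, upper))


def mask_centroid(image, lower, upper):
    """image is a list of rows of (r, g, b) pixels; return the (x, y) centre of kept pixels."""
    kept = [
        (x, y)
        for y, row in enumerate(image)
        for x, rgb in enumerate(row)
        if in_hsv_range(rgb_to_opencv_hsv(*rgb), lower, upper)
    ]
    if not kept:
        return None  # no pixel matched the colour range
    return (
        sum(x for x, _ in kept) / len(kept),
        sum(y for _, y in kept) / len(kept),
    )
```

The centroid of the kept pixels gives an approximate tape position, which can then be compared with the component's bounding box to decide which end is the input or output.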

# For Circuit Detection and Verification

### Circuit Detection:
1. We start by capturing frames from a webcam.
2. Then, for each frame, we focus on the board by cropping it.
3. Depending on whether there are hands on the board:
    1. If hands are present, we don't create a virtual board. Instead, we keep track of the hand's movements over time. By measuring the distance between fingers and pieces, we can figure out who placed each piece.
    2. If there are no hands on the board, we create a virtual representation of the board:
        - First, we identify the board's edges by detecting the black color.
        - If needed, we adjust the board's orientation.
        - We mark the pegs on the board; these marks help us establish the board's coordinates. (For more details, check `virtual_board_all.py`.)
        - To make this virtual board match the real one, we use YOLO, an object detection tool, to find the exact positions of pieces in the frame. We then convert these real coordinates into board coordinates (see `pieces_location.pieceOnEachLocation` for details).
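
The conversion from pixel coordinates to board coordinates can be illustrated as below. This sketch assumes the pegs form a regular grid after the orientation has been corrected, with a known pixel position for peg (0, 0) and a known spacing between adjacent pegs; the function names are hypothetical, and the project's actual logic lives in `pieces_location.pieceOnEachLocation`.

```python
def pixel_to_board(x, y, origin, spacing):
    """Map a pixel position to fractional (row, col) board coordinates.

    origin is the pixel (x, y) of peg (0, 0); spacing is the pixel distance
    between neighbouring pegs along x and y.
    """
    col = (x - origin[0]) / spacing[0]
    row = (y - origin[1]) / spacing[1]
    return row, col


def box_to_board(box, origin, spacing):
    """Convert a detection box (x1, y1, x2, y2) in pixels to (top, left, bottom, right)."""
    x1, y1, x2, y2 = box
    top, left = pixel_to_board(x1, y1, origin, spacing)
    bottom, right = pixel_to_board(x2, y2, origin, spacing)
    return top, left, bottom, right
```

The results are fractional on purpose: deciding which pegs a piece actually covers is a separate rounding step, which is where imprecise bounding boxes start to matter.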

| Raw Image | Image with pegs | Image after using YOLO | After converted to board coordinates |
|:---------:|:------------------:|:-----------------:|:-----------------------------------:|
| <img src="pictures/Real_coordinate_image.png" alt="Raw Image" width="200"/> | <img src="pictures/Image_Pegs.png" alt="After Drawing Pegs" width="200"/> | <img src="pictures/Image_YOLO.png" alt="YOLO Image" width="200"/> | <img src="pictures/Box_coordinate_image.png" alt="Board Coordinates" width="200"/> |


### Circuit Verification:

1. **Defining Pieces Classes** (in `pieces.py`):
    - We start by defining classes for different types of pieces.
    - Board: `self.pos` stores the list of pieces at this position.
    - Each class has methods to add, remove, and set input/output connections and other attributes related to electrical circuits.
    - The FM and Music Circuit pieces require special connection skills, with their ports labelled below. (Note that we attach tape to the pieces to distinguish their direction; please refer to the Helper function section for more detail.)
    
    <img src="pictures/Waves.jpeg" alt="Raw Image" width="500"/>

2. **Assigning IDs and Establishing Connections** (in `add_pieces.py` and `connections.py`):
    - For every piece, we give it a unique ID.
    - We also define connections for each piece at its ports. These connections are crucial for establishing how pieces can be linked on the board. For instance, if piece A has an input and piece B has an output at position (i,j) of the board, A and B can be connected.

3. **Creating Skills and Tasks** (in `task.py`):
    - Next, we create skills and tasks. Detailed information about this can be found in `task.py`.

4. **Testing and Observation** (in `all_tests.py`):
    - We then create a test for each skill.
    - Additionally, we create observation vectors tailored to each task. These vectors capture relevant data for assessing task performance.
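
The connection rule from step 2 can be sketched with a small data structure. The `Piece` class and `can_connect` function below are simplified stand-ins written for this tutorial, not the classes defined in `pieces.py` or `connections.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Piece:
    piece_id: int  # unique ID assigned when the piece is added
    name: str
    inputs: set = field(default_factory=set)   # board positions (i, j) of input ports
    outputs: set = field(default_factory=set)  # board positions (i, j) of output ports


def can_connect(source, target):
    """True when an output port of source sits on the same board position as an input of target."""
    return bool(source.outputs & target.inputs)
```

For example, a battery with an output at (0, 2) and an LED with an input at (0, 2) can be connected in that direction, but not the other way round.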

Example Output:

|                     | Piece Added Information                                     | Skills Information and Observation Vector                       |
|---------------------|-------------------------------------------------------------|----------------------------------------------------------------|
| **Example 1:**      | ![Image 1](pictures/Piece_Added.png "Piece Added Image 1") | ![Image 2](pictures/Observation_Vector.png "Observation Vector Image 2") |
| **Example 2:**      | ![Image 1](pictures/Piece_Added2.png "Piece Added Image 2")| ![Image 2](pictures/Skills2.png "Skills Image 2")              |



## Areas for Improvement:

**Sensitivity to Lighting Conditions:** 
- The current detection performance is influenced by varying lighting conditions. To enhance accuracy, it's advisable to expand the training dataset to include diverse lighting scenarios. The existing dataset, comprising fewer than 200 samples, could benefit from augmentation with more training samples. Increased data variety is likely to yield improved detection accuracy.

**Handling Overlapping Pieces:** 
- Detecting components becomes challenging when multiple pieces overlap. In particular, when several pieces are connected in parallel, the shadows they cast make the wires between them difficult to detect. Addressing this challenge requires a more extensive training dataset that specifically includes cases of overlapping wires and pieces. While previous efforts have shown performance improvements with additional training data, further enhancements are possible in this area.

**Custom Error Correction:** 
- The `round_to_integer_with_error` function serves a critical role in mitigating errors that can arise from imprecise drawings of the pieces' bounding boxes or improper placement of pieces. In such scenarios, there's a possibility that these pieces may seem to pass through neighboring pegs, leading to potential inaccuracies such as oversized pieces or incorrect connections.
- It's important to note that this error correction method involves the incorporation of a customizable error rate. Fine-tuning this error rate is essential to ensure precise and reliable error correction.
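
One possible reading of this tolerance idea is sketched below. It is not the project's `round_to_integer_with_error` implementation: `pegs_inside_span` is a hypothetical function, and it assumes pegs sit at integer board coordinates and that a peg only counts as covered when it lies at least `error` inside the piece's box along one axis.

```python
import math


def pegs_inside_span(start, end, error=0.2):
    """Return the integer peg indices lying at least `error` inside [start, end]."""
    first = math.ceil(start + error)   # ignore a peg the box barely touches at the low end
    last = math.floor(end - error)     # ignore a peg the box barely touches at the high end
    return list(range(first, last + 1))
```

A box spanning -0.1 to 4.05 would cover five pegs with plain rounding, but only pegs 1 to 3 with an error of 0.2. A larger error trims more aggressively, which is why the value needs tuning against real detections.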

# For Human Detection
We have developed two models using MediaPipe (more information is available at https://developers.google.com/mediapipe/solutions/vision/hand_landmarker):

1. Hand Detection Model:
    - This model is designed to recognize whose hand it is by analyzing the direction of the fingers. It's important to note that this capability is currently effective when people are sitting face to face, where their fingers are pointing toward each other. 
    - For in-depth exploration and utilization of this hand detection model, you can refer to the code located in `cvzone_hand.py`. Additionally, discover its implementation and video storage procedures in the `gesture.ipynb` notebook.
2. Holistic Model with Hand, Pose, and Face Detection:
    - Our Holistic Model is built to provide a comprehensive understanding of human subjects. While there are opportunities for performance enhancement, the model's potential is substantial. It offers the capability to assimilate intricate details such as facial expressions and body language, which play pivotal roles in gauging emotions and confidence.
    - For practical implementation, you can explore two primary functionalities in the `mediapipe.ipynb` notebook: single human detection and detection of two individuals using YOLO (which uses human identification + cropping regions of interest + deploying the Holistic Model).
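
The finger-direction idea behind the hand detection model, together with the fingertip-to-piece distance used to decide who added a piece, can be sketched as follows. The functions and player labels are illustrative assumptions rather than the code in `cvzone_hand.py`. The landmark indices follow MediaPipe's hand landmark numbering (0 is the wrist, 8 is the index fingertip), and image coordinates are assumed to have y growing downward.

```python
import math

WRIST, INDEX_TIP = 0, 8  # MediaPipe hand landmark indices


def hand_owner(landmarks, top_player="A", bottom_player="B"):
    """Guess whose hand it is from the wrist-to-fingertip direction.

    landmarks is a sequence of (x, y) points for one hand. Fingers pointing
    down the image are assumed to belong to the person seated at the top edge.
    """
    dy = landmarks[INDEX_TIP][1] - landmarks[WRIST][1]
    return top_player if dy > 0 else bottom_player


def who_placed(piece_center, fingertips):
    """fingertips maps a player label to an (x, y) fingertip; return the closest player."""
    return min(fingertips, key=lambda player: math.dist(piece_center, fingertips[player]))
```

This direction test is also why the approach only works for people seated face to face: two people sitting side by side produce nearly the same wrist-to-fingertip direction.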

|  How to Decide "Who Added What"                      | Performance on Overlapped Hands                  | Performance on Hands Closer to the Other Side                    |
|------------------------------|------------------------------|------------------------------|
|  <img src="pictures/Hand_On_Board.jpeg" alt="Image 1" width="350"/>     | <img src="pictures/Overlapped_Hands.jpeg" alt="Image 1" width="250"/>    | <img src="pictures/Hands_Closer_To_Other_Side.png" alt="Image 1" width="250"/>    |


## Areas For Improvement:

**1. Enhanced Hand Model Functionality:**
- At present, our hand model performs reliably when tracking two individuals seated face-to-face. However, there are opportunities for improvement:
    - Exploring the integration of a tracking mechanism to follow multiple (>2) individuals and allocate unique identifiers (IDs) to each person.
    - Investigating methods to distinguish between individuals when they are seated side by side. This can be challenging due to variations in finger orientation when reaching toward objects at varying distances.

**2. Integration of Hand Model into the Holistic Model:**

- When individuals are seated face-to-face, achieving an optimal camera angle to capture both individuals' frontal views can be a challenging task. Introducing multiple cameras, however, brings about new challenges, such as dealing with an increased volume of information to process and potentially limiting user mobility due to the presence of more equipment.
- The use of YOLO for person detection may encounter instability issues when individuals overlap with each other. I would recommend delving into advanced research on multiple people tracking as an alternative to human detection using YOLO.