# Preprocessing Experiments

In [2]:
%pip install kagglehub

Collecting kagglehub
  Downloading kagglehub-0.3.10-py3-none-any.whl.metadata (31 kB)
Downloading kagglehub-0.3.10-py3-none-any.whl (63 kB)
Installing collected packages: kagglehub
Successfully installed kagglehub-0.3.10
Note: you may need to restart the kernel to use updated packages.


In [3]:
import kagglehub

In [None]:
datasets = [
    "kapillondhe/american-sign-language",
    "ayuraj/asl-dataset"
]

In [None]:
for dataset in datasets:
    path = kagglehub.dataset_download(dataset)
    print("Downloaded", path)
    

## Problems/Challenges
- The datasets contain only hands with white skin.
- The datasets contain only white or uniform backgrounds.
- Some ASL signs are very similar: the relative position of the hand and fingers is the same and only the rotation differs slightly. Augmentations such as rotation therefore can't be applied to all signs.
- The datasets contain only images of right-handed signing.
- We assume we have enough data to train a model that recognizes "white" hands in front of white backgrounds, but not enough data for a model that generalizes beyond that.

## Solutions
- Data augmentation
- Finetuning an existing image recognition model

### Data Augmentation

#### Biased Skin Color
**Augmentation**

We could generate new images by transforming the HSV/RGB values of the hands in our existing dataset to HSV/RGB values of hands with different skin color.

**Histogram Matching**

We could use the color distribution of different skin tones as a reference and match the hand pixels in our dataset to it.
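A minimal sketch of per-channel histogram matching with NumPy only. The function names `match_histogram_channel` and `match_histograms_rgb` are made up for this note, and the sketch matches the whole image; for our use case we would presumably restrict it to the hand pixels so the background is not recolored as well.

```python
import numpy as np

def match_histogram_channel(source, reference):
    """Remap one channel so that its value distribution follows the reference channel."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both channels.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference value at the same quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)

def match_histograms_rgb(source, reference):
    """Apply histogram matching independently to every channel of an (H, W, C) uint8 image."""
    channels = [
        match_histogram_channel(source[..., c], reference[..., c])
        for c in range(source.shape[-1])
    ]
    matched = np.stack(channels, axis=-1)
    return np.clip(np.rint(matched), 0, 255).astype(np.uint8)
```

The reference image would be a crop of a hand with the target skin tone; where such references come from is still open.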

#### Biased Background
**Substitute the Background**

We could apply an HSV filter to our images to cut out the hand and replace the background with random values or with random background images from the internet.
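A rough sketch of the idea, assuming the background is bright and nearly unsaturated (white-ish), as in our datasets. It computes HSV saturation and value directly with NumPy. The thresholds `max_saturation` and `min_value` are placeholder guesses that would need tuning, and bright highlights on the hand could be misclassified as background.

```python
import numpy as np

def background_mask(rgb, max_saturation=0.15, min_value=0.85):
    """Return a boolean (H, W) mask that is True where a pixel looks like white-ish background."""
    img = rgb.astype(np.float32) / 255.0
    value = img.max(axis=-1)                          # HSV value
    chroma = value - img.min(axis=-1)
    saturation = chroma / np.maximum(value, 1e-6)     # HSV saturation, safe for black pixels
    return (saturation <= max_saturation) & (value >= min_value)

def replace_background(rgb, new_background, **mask_kwargs):
    """Copy new_background into every pixel classified as background."""
    mask = background_mask(rgb, **mask_kwargs)
    out = rgb.copy()
    out[mask] = new_background[mask]
    return out
```

`new_background` must have the same shape as `rgb`; it can be random noise or a resized photo.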

**Diverse Filters**

We could apply blur and other filters so that the model becomes less sensitive to noise and image quality.
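As one example of such a filter, here is a small mean (box) blur for a single-channel image written with NumPy only. The name `box_blur` and the default kernel size are choices made for this sketch; a library filter would likely be used in practice.

```python
import numpy as np

def box_blur(gray, kernel_size=3):
    """Blur a 2D uint8 image by averaging each pixel with its kernel_size x kernel_size neighborhood."""
    pad = kernel_size // 2
    # Repeat the border pixels so the output keeps the input size.
    padded = np.pad(gray.astype(np.float32), pad, mode="edge")
    height, width = gray.shape
    total = np.zeros((height, width), dtype=np.float32)
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            total += padded[dy:dy + height, dx:dx + width]
    return np.rint(total / (kernel_size * kernel_size)).astype(np.uint8)
```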

#### Combined
**Gray Scale Images**

Could we work with grayscale images to reduce the dependency on both the background and the skin color?
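A minimal grayscale conversion, assuming (H, W, 3) uint8 RGB input and the common ITU-R BT.601 luma weights. Note that this removes hue but not brightness, so differences in how light or dark the hand and the background are would still be visible to the model.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) uint8 RGB image to an (H, W) uint8 grayscale image."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)  # BT.601 luma weights
    gray = rgb.astype(np.float32) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```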

#### Similar Signs
We can't solve the problem that some signs are similar, but we can try to keep the distinction between similar signs as clear as possible by not "augmenting" one sign into the other. We have to pay attention to the following signs:

- I vs. J: J is the same sign as I, but with a slight motion. We are bound to run into a problem here. But we can define I as being an upright pinky finger and J being the one that is in a range of other positions.

The other signs are distinguishable from each other. However, M, N, S, T and A are very similar and show only tiny differences, which might pose a problem for the model.
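One way to avoid augmenting one sign into another would be a per-label rotation limit. The sketch below is only an illustration: the dictionary, the function name and all degree values are hypothetical placeholders, not measured thresholds.

```python
import random

# Hypothetical limits in degrees; the numbers are placeholders to be tuned on validation data.
DEFAULT_MAX_ROTATION = 15.0
MAX_ROTATION_BY_LABEL = {
    "I": 5.0,  # keep I close to upright so it does not drift toward J
    "J": 5.0,  # keep J away from the upright pose that we define as I
}

def sample_rotation(label, rng=random):
    """Draw a random rotation angle in degrees that respects the limit for this label."""
    limit = MAX_ROTATION_BY_LABEL.get(label, DEFAULT_MAX_ROTATION)
    return rng.uniform(-limit, limit)
```

The sampled angle would then be passed to whatever rotation routine the augmentation pipeline uses.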

#### Right Handed Bias
The signs made by right-handed and left-handed people are mirror images of each other. Hence we can cover left-handed signing by flipping our data horizontally.
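A short sketch of this flip, assuming the images are stored as a NumPy array of shape (N, H, W) or (N, H, W, C) with the labels in a parallel array. `add_mirrored` is a name chosen for this note.

```python
import numpy as np

def add_mirrored(images, labels):
    """Return the dataset extended by a horizontally flipped copy of every image."""
    flipped = images[:, :, ::-1]  # reverse the width axis
    all_images = np.concatenate([images, flipped], axis=0)
    all_labels = np.concatenate([labels, labels], axis=0)
    return all_images, all_labels
```

The labels stay unchanged because the mirrored image shows the same sign made with the other hand.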

### Model Solutions
These solutions don't replace the above but only extend them.

**Finetune existing model**

We could finetune an existing model like ResNet or EfficientNet on our own augmented dataset. This could improve performance and also reduce some of the biases we already have. We have to make sure to keep our own models (and a baseline model) as a control group.

How to finetune ResNet: https://medium.com/@engr.akhtar.awan/how-to-fine-tune-the-resnet-50-model-on-your-target-dataset-using-pytorch-187abdb9beeb

How to finetune EfficientNet: https://www.restack.io/p/fine-tuning-answer-efficientnet-pytorch-cat-ai

Other models that already work with pose detection as opposed to object detection:

- MediaPipe: https://pypi.org/project/mediapipe/
- OpenPose: https://cmu-perceptual-computing-lab.github.io/openpose/web/html/doc/md_doc_03_python_api.html
