## Transfer Learning
- Transfer learning is a deep learning technique in which a model developed for one task is reused as the starting point for a model on another task. The later layers of the pre-trained model are then fine-tuned or adapted to the requirements of the new task.

## Transfer Learning in Deep Learning Model
Key aspects: 

- Reuse pre-trained models
- Retrain the later layers for the new task
- Leverage learned features for a wide range of tasks, including but not limited to object recognition
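
The key aspects above can be sketched in a few lines of plain Python. This is only a toy illustration, not a real training setup: the "pre-trained" backbone weights, the tiny dataset, and all function names are made up for the example. The point it shows is that the reused layers stay frozen while only the final layer (the head) is trained on the new task.

```python
import math
import random

# Stand-in for layers learned on a large source task (values are made up).
PRETRAINED_BACKBONE = [[0.9, -0.4], [-0.3, 0.8], [0.5, 0.5]]


def extract_features(x, backbone):
    """Frozen layers: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in backbone]


def head_probability(features, head):
    """Trainable head: logistic regression on the extracted features."""
    z = sum(w * f for w, f in zip(head["w"], features)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(data, backbone, epochs=200, lr=0.5, seed=0):
    """Train only the head with gradient descent; the backbone is never updated."""
    rng = random.Random(seed)
    head = {"w": [rng.uniform(-0.1, 0.1) for _ in backbone], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            features = extract_features(x, backbone)
            error = head_probability(features, head) - y  # gradient of log loss w.r.t. z
            head["w"] = [w - lr * error * f for w, f in zip(head["w"], features)]
            head["b"] -= lr * error
    return head


def predict(x, backbone, head):
    """Probability of the positive class for one input."""
    return head_probability(extract_features(x, backbone), head)


# Tiny, made-up dataset for the new task: label 1 when x[0] > x[1].
new_task_data = [([1.0, 0.0], 1), ([0.8, 0.1], 1), ([0.0, 1.0], 0), ([0.2, 0.9], 0)]

backbone_before = [row[:] for row in PRETRAINED_BACKBONE]
head = fine_tune_head(new_task_data, PRETRAINED_BACKBONE)
```

In a real framework the same idea is usually expressed by marking the backbone parameters as non-trainable and passing only the head's parameters to the optimizer.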

## Why use Transfer Learning
- Faster training: reusing the knowledge in a pre-trained model speeds up training
- Improved performance: good results with less training time and fewer resources
- Small datasets: useful when the new task has only a small dataset
- Domain adaptation: allows models to adapt to new domains or different data distributions
- Transfer of knowledge: carries insights gained from one task over to another
- Resource efficiency: reduces data annotation needs by building on pre-trained models

## Scenarios of transfer learning
- `positive transfer learning`: a situation where knowledge or experience gained from one task improves performance on a different, related task.
    - Example: a model trained to detect one type of cancer cell may also perform well at detecting variants of those cancer cells.

- `negative transfer learning`: a situation where knowledge or experience gained from one task hinders performance on a different, unrelated task.
    - If negative transfer is observed, it may help to train further on the target data (for example, by fine-tuning more layers) or to start from a model trained on a more closely related task.
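
One simple way to tell the two scenarios apart is to compare the transferred model against a baseline trained from scratch on the target task, using the same validation set. The helper below is a minimal sketch of that comparison; the function name, the `tolerance` parameter, and the scores in the usage lines are assumptions made for illustration.

```python
def transfer_effect(baseline_score, transfer_score, tolerance=0.01):
    """Classify the effect of transfer by comparing validation scores.

    baseline_score: score of a model trained from scratch on the target task.
    transfer_score: score of the model initialised from the pre-trained model.
    tolerance: differences smaller than this are treated as noise.
    """
    gain = transfer_score - baseline_score
    if gain > tolerance:
        return "positive"
    if gain < -tolerance:
        return "negative"
    return "neutral"


# Hypothetical validation accuracies, for illustration only.
print(transfer_effect(0.80, 0.86))
print(transfer_effect(0.80, 0.72))
```

In practice the scores should come from several runs, since a single run can differ by more than the tolerance purely by chance.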

## Selecting pre-trained models
- Pre-trained models are pre-built deep learning models trained on large datasets. They enable efficient transfer learning and can improve performance on new tasks.

## Factors of a pre-trained model

- `Model size`: the size of a model determines how much storage and memory it needs on the target system, so it is often a deciding factor.
    - For object detection on an edge device, a small model is preferable to a heavy one.

- `Extension of the model`: the file extension usually reflects the framework in which the model was trained and saved.
    - A model saved with TensorFlow (Keras) typically has the extension .h5, and a model saved with PyTorch typically has .pth or .pt. Choose a pre-trained model that matches the framework being used.

- `Input of the model`: each model has its own input requirements (for example, input size and value range), which must be met in the preprocessing phase.

- `Output of the model`: once the input has been processed, the model's outputs are interpreted to produce the desired result.

- `Model specifications and accuracy`: specifications and reported accuracy vary between pre-trained models, depending on the tasks they were built for.

- `Compare and contrast`: after evaluating all the factors, compare the models under consideration on:
    - Speed: the model's prediction time
    - Accuracy: how often predictions are correct, balanced against speed and size
    - Size: the computational and memory demands of the model relative to the deployment constraints
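
The comparison described above can be sketched as a small selection routine: filter the candidates by framework, size, and speed constraints, then pick the most accurate one that remains. Everything in this example is hypothetical, including the `Candidate` fields, the model names, and the numbers; real values should be measured on the actual deployment hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    framework: str      # e.g. "tensorflow" or "pytorch"
    size_mb: float      # size of the saved weights on disk
    latency_ms: float   # measured prediction time per input
    accuracy: float     # fraction of correct predictions on a validation set


def select_model(candidates, framework, max_size_mb, max_latency_ms):
    """Return the most accurate candidate that fits the constraints, or None."""
    feasible = [
        c for c in candidates
        if c.framework == framework
        and c.size_mb <= max_size_mb
        and c.latency_ms <= max_latency_ms
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Made-up numbers for illustration only.
candidates = [
    Candidate("small-net", "pytorch", size_mb=15, latency_ms=8, accuracy=0.71),
    Candidate("medium-net", "pytorch", size_mb=100, latency_ms=30, accuracy=0.78),
    Candidate("large-net", "pytorch", size_mb=550, latency_ms=120, accuracy=0.83),
    Candidate("medium-tf", "tensorflow", size_mb=90, latency_ms=25, accuracy=0.80),
]

edge_choice = select_model(candidates, "pytorch", max_size_mb=50, max_latency_ms=20)
server_choice = select_model(candidates, "pytorch", max_size_mb=1000, max_latency_ms=200)
```

With tight edge-device budgets the small model is the only feasible choice, while a server budget allows the larger, more accurate one. Treating size and speed as hard limits and accuracy as the objective is one possible design; a weighted score is another.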


## Pre-trained model list

- Models for the image domain
    - `Face detection`:
        1. MTCNN (Multi-Task Cascaded Convolutional Networks) is a deep learning model specifically designed for face detection.
        2. Inception-ResNet is a hybrid model that combines the Inception and ResNet architectures.
        3. MobileNet is fast and efficient on smartphones and other devices with limited resources.

    - `Object detection`:
        1. Detectron2 is an object detection framework developed by Facebook AI Research.
        2. YOLOv5 (You Only Look Once) is an object detection algorithm known for its real-time processing speed.
        3. InceptionResNetV2 is a convolutional neural network architecture that combines Inception and ResNet modules.

    - `Image segmentation`:
        1. Mask R-CNN is an object detection and instance segmentation model.
        2. UNet is a popular model architecture for image segmentation tasks.
        3. MANet (Microscopy Adaptive Network) is a deep learning model designed specifically for microscopy image analysis tasks.
        4. LinkNet is a lightweight and efficient model architecture for semantic segmentation.
        5. DeepLabv3 is a widely adopted model for semantic image segmentation.

    - `Image classification`:
        1. RegNetY is a CNN architecture designed for high performance and computational efficiency.
        2. ResNet-50 is a deep CNN whose skip connections had a major influence on computer vision.
        3. VGG-16 is a deep CNN known for its simplicity and effectiveness in image classification tasks.
        4. EfficientNet is a CNN architecture that achieves strong accuracy while remaining computationally efficient.

    - `Pose detection`:
        1. MoveNet is a lightweight pose estimation model designed for accurate human pose detection.
        2. OpenPose is a popular framework for keypoint detection and action recognition.


- Models for the text domain
    - `Classification models`:
        1. XLNet: uses permutation-based training to improve contextual learning, suitable for tasks like sentiment analysis and spam detection.
        2. ERNIE: integrates structured knowledge and is reported to outperform BERT and XLNet on various benchmarks, making it well suited to relation extraction and sentiment analysis.

    - `Embedding models`:
        1. BERT: known for its bidirectional training and contextual understanding. It is used in NER, question answering, and sentiment analysis.
        2. ELECTRA: an efficient pre-training method with strong performance on text embedding tasks.

    - `Text generation models`:
        1. SmartReply: a text generation model developed by Google that provides automated suggestions for short message responses.
        2. RoBERTa: a model based on the BERT architecture with an optimized pre-training procedure. Like BERT, it is an encoder model used mainly for language understanding rather than free-form text generation.

    - `Text-based question answering model`:
        1. TF2NQ: a text-based question answering model specifically designed for the Natural Questions dataset.

    - `Text language models`:
        1. GPT-4: strong at handling longer texts, multilingual input, and factual accuracy; useful for language translation and summarization.
        2. Enformer: a transformer-based model with enhanced long-range context handling. It was originally developed for DNA sequence data rather than natural-language text.

- Models for the audio domain
    - `Audio classification`:
        1. YAMNet is designed to classify audio signals into a wide range of sound categories, including environmental sounds, musical instruments, and human actions.

    - `Audio embedding`:
        1. TRILL is an audio embedding model that learns transferable representations from speech data.
        2. OpenL3 is an open-source Python library that computes deep audio and image embeddings. It is based on the Look, Listen, and Learn (L3) approach, which uses both audio and visual data to learn useful representations.

    - `Audio pitch extraction`:
        1. CREPE is a deep convolutional neural network designed for pitch estimation directly from time-domain waveform inputs. It processes raw audio signals, making it robust to various types of noise and distortion.

    - `Audio speech to text`:
        1. Wav2Vec converts audio speech signals into textual representations.
        2. Wav2Vec2 achieves strong results on various speech recognition benchmarks and is widely used in industry and academia.

- Models for the video domain
    - `Video classification`:
        1. VideoMAE is a video classification model that uses a masked autoencoder architecture.
        2. ViViT uses a transformer-based architecture specifically tailored for video classification. It processes video data by applying self-attention mechanisms to capture long-range dependencies.

    - `Video generation`:
        1. VideoFlow Encoder is a component of a video generation model that extracts high-level features from input video frames.
        2. VideoFlow Generator is another component of a video generation model. It takes the encoded features from the VideoFlow Encoder and generates new video frames.
        3. Tweening Conv3D is a video generation model that focuses on generating intermediate frames between two given frames.

## Advantages of Transfer Learning

- Reduced training time
- Enhanced efficiency in deploying multiple deep learning models
- Better model training using simulations instead of resource-intensive real-world environments
- Less data required: a pre-trained model can be fine-tuned for other tasks, reducing the need for a massive dataset each time


