
# 1. About this Sprint

**Purpose of Sprint**  
- Learn about applications for sequence (serial) data.  
- Understand methodologies such as Sequence-to-Sequence and Image-to-Sequence.  
- Apply these methods to machine translation and image captioning.  

**How to learn**  
- Study from publicly available code implementations.  



# 2. Machine Translation

A basic application of sequence data is **machine translation**.  
It is built on the **Sequence to Sequence (Seq2Seq)** approach, in which the model takes a sequence as input and produces another sequence as output.

This method is widely used for:  
- Translating between languages  
- Summarizing sentences  
- Generating text automatically (e.g., greetings)  

We will use the code example:  
[Hard/lstm_seq2seq.py](https://github.com/rstudio/hard/blob/master/lstm_seq2seq.py)



```python
# Example: Import required libraries for Seq2Seq Machine Translation
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense
from keras.preprocessing.sequence import pad_sequences
```
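The imported layers can be combined into the encoder-decoder training model used for character-level Seq2Seq. What follows is a minimal sketch, not the example's exact code. It assumes one-hot character inputs. The vocabulary sizes `num_encoder_tokens` and `num_decoder_tokens` are hypothetical placeholders, and `latent_dim = 256` is an assumed value.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, Dense

num_encoder_tokens = 70   # hypothetical: size of the source character vocabulary
num_decoder_tokens = 90   # hypothetical: size of the target character vocabulary
latent_dim = 256          # LSTM state size (assumed value)

# Encoder: read the whole source sequence and keep only its final LSTM states
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: start from the encoder states and predict the next character at every step
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")

# Smoke test on zero-filled one-hot batches shaped (batch, time steps, vocabulary)
probs = model.predict(
    [np.zeros((2, 5, num_encoder_tokens)), np.zeros((2, 7, num_decoder_tokens))],
    verbose=0,
)
print(probs.shape)  # (2, 7, 90)
```

The encoder's final states `[state_h, state_c]` are the only information passed to the decoder. That is why the decoder LSTM is created with `initial_state=encoder_states`.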



### Code Reading (line by line explanation)

- **Lines 51 to 55**: Importing Libraries (`numpy`, `keras.models`, `keras.layers`, etc.)  
- **Lines 57 to 62**: Setting Hyperparameters (batch size, number of epochs, latent dimension)  
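- **Character-by-character tokenization**:  
  Instead of treating words as tokens, this implementation treats **each character as a token**.  
  This keeps the vocabulary small (letters, digits, and punctuation instead of a full word list), so the model is smaller and easier to train. The trade-off is that sequences become longer and each token carries little meaning on its own.  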
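The character-level preprocessing can be sketched in NumPy. This is a minimal sketch, not the example's code. The sentence pairs are hypothetical, and it assumes that each target is marked with `\t` at the start and `\n` at the end. It also shows teacher forcing: the decoder target is the decoder input shifted one step ahead.

```python
import numpy as np

# Hypothetical toy sentence pairs; '\t' marks the start and '\n' the end of each target
input_texts = ["Go.", "Hi."]
target_texts = ["\tVa !\n", "\tSalut !\n"]

def build_vocab(texts):
    """Map every distinct character to an integer index."""
    chars = sorted(set("".join(texts)))
    return {ch: i for i, ch in enumerate(chars)}

def one_hot(texts, token_index, max_len):
    """One-hot encode texts into shape (samples, time steps, vocabulary); padding stays zero."""
    data = np.zeros((len(texts), max_len, len(token_index)), dtype="float32")
    for i, text in enumerate(texts):
        for t, ch in enumerate(text):
            data[i, t, token_index[ch]] = 1.0
    return data

input_index = build_vocab(input_texts)
target_index = build_vocab(target_texts)
max_enc_len = max(len(t) for t in input_texts)
max_dec_len = max(len(t) for t in target_texts)

encoder_input_data = one_hot(input_texts, input_index, max_enc_len)
decoder_input_data = one_hot(target_texts, target_index, max_dec_len)

# Teacher forcing: the target at step t is the decoder input at step t + 1
decoder_target_data = np.zeros_like(decoder_input_data)
decoder_target_data[:, :-1] = decoder_input_data[:, 1:]

print(encoder_input_data.shape)  # (2, 3, 5)
print(decoder_input_data.shape)  # (2, 9, 10)
```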

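🔹 **Related example: character n-grams with scikit-learn `CountVectorizer`**:  
- `analyzer='char'`: creates n-grams from the whole text, including n-grams that span the space between words.  
- `analyzer='char_wb'`: creates n-grams only from text inside word boundaries; n-grams at the edges of words are padded with a space.  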



# 3. Image Captioning

**Image captioning** is the task of generating descriptions for images.  
It uses the **Image-to-Sequence** technique: input is an image, output is a text description.  

Applications:  
- Autonomous driving (describing the environment)  
- Search engines (image tagging)  

We will use the PyTorch implementation:  
[yunjey/pytorch-tutorial - Image Captioning](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/03-advanced/image_captioning)  



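```python
# Example: Preparing an image for a pretrained Image Captioning model in PyTorch
# (the pretrained weights themselves are provided by the repo)

import torch
from PIL import Image
from torchvision import transforms

# Resize to 224x224 and normalize with ImageNet mean/std,
# the statistics torchvision's pretrained CNN encoders expect
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

# Dummy gray image standing in for a real photo,
# e.g. Image.open("example.jpg").convert("RGB")
image = Image.new("RGB", (640, 480), color=(128, 128, 128))
image_tensor = transform(image).unsqueeze(0)  # add batch dimension -> (1, 3, 224, 224)

# In real execution: load the pretrained encoder/decoder and generate a caption
with torch.no_grad():
    print("Input tensor shape:", tuple(image_tensor.shape))
    print("Model would generate a caption for the given image here.")
```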



### Notes

- This repo provides pretrained weights that can be downloaded and used to generate captions.  
- The default weight filenames in the code may not match the names of the downloaded files. In that case, rename the files (or change the paths in the code) manually.  
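Caption generation proceeds word by word. The image feature is fed to the LSTM first, and each predicted word is then fed back in as the next input. The sketch below shows greedy decoding under stated assumptions. `TinyCaptionDecoder` and `ids_to_caption` are hypothetical names, not the repo's classes, and the weights are random, so the generated IDs carry no meaning.

```python
import torch
import torch.nn as nn

class TinyCaptionDecoder(nn.Module):
    """Hypothetical minimal decoder: the image feature is the first LSTM input, then words."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    @torch.no_grad()
    def greedy_sample(self, features, max_len=20):
        """features: (batch, embed_size) image vectors -> (batch, max_len) word IDs."""
        inputs = features.unsqueeze(1)  # (batch, 1, embed_size)
        states = None                   # LSTM starts from zero states
        sampled_ids = []
        for _ in range(max_len):
            hiddens, states = self.lstm(inputs, states)                # (batch, 1, hidden)
            predicted = self.linear(hiddens.squeeze(1)).argmax(dim=1)  # most likely word
            sampled_ids.append(predicted)
            inputs = self.embed(predicted).unsqueeze(1)                # feed it back in
        return torch.stack(sampled_ids, dim=1)

def ids_to_caption(ids, idx2word, end_token="<end>"):
    """Convert word IDs to text, stopping at the end token."""
    words = []
    for i in ids:
        word = idx2word[int(i)]
        if word == end_token:
            break
        words.append(word)
    return " ".join(words)

decoder = TinyCaptionDecoder(embed_size=8, hidden_size=16, vocab_size=10)
print(decoder.greedy_sample(torch.randn(2, 8), max_len=5).shape)  # torch.Size([2, 5])
print(ids_to_caption([1, 2, 3, 0], {0: "<end>", 1: "a", 2: "dog", 3: "runs"}))  # a dog runs
```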

### Research: Running in Keras

- Keras does not provide a ready-made Image Captioning model comparable to this repo.  
- To move from PyTorch to Keras:  
  1. Define a CNN encoder (e.g., VGG16, ResNet50).  
  2. Define an RNN/LSTM decoder.  
  3. Handle word embeddings for the captions.  
- For **weight transfer**, the PyTorch `.pth` weights must be converted to NumPy arrays and loaded manually into the Keras layers. Some layers store their weights in a different layout, so arrays may need to be transposed.  
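The sketch below shows what "loading manually" involves for two common layer types, using freshly initialized PyTorch layers rather than the repo's `.pth` file. The layer sizes are hypothetical. `nn.Linear` stores its weight as `(out, in)`, while a Keras `Dense` kernel is `(in, out)`. For LSTMs, both libraries use the gate order input, forget, cell, output. PyTorch keeps two bias vectors, and Keras keeps one, so the two PyTorch biases are summed.

```python
import numpy as np
import torch
import keras
from keras import layers

torch.manual_seed(0)
rng = np.random.default_rng(0)
in_dim, hidden = 8, 4  # hypothetical layer sizes

# nn.Linear -> Dense: transpose the (out, in) weight into an (in, out) kernel
torch_linear = torch.nn.Linear(in_dim, hidden)
dense = layers.Dense(hidden)
dense_model = keras.Sequential([keras.Input((in_dim,)), dense])
dense.set_weights([
    torch_linear.weight.detach().numpy().T,
    torch_linear.bias.detach().numpy(),
])

# nn.LSTM -> LSTM: same gate order, transposed matrices, summed biases
torch_lstm = torch.nn.LSTM(in_dim, hidden, batch_first=True)
lstm = layers.LSTM(hidden, return_sequences=True)
lstm_model = keras.Sequential([keras.Input((None, in_dim)), lstm])
lstm.set_weights([
    torch_lstm.weight_ih_l0.detach().numpy().T,  # kernel: (in, 4 * hidden)
    torch_lstm.weight_hh_l0.detach().numpy().T,  # recurrent kernel: (hidden, 4 * hidden)
    (torch_lstm.bias_ih_l0 + torch_lstm.bias_hh_l0).detach().numpy(),
])

# Feed the same random inputs to both frameworks and compare the outputs
x_vec = rng.random((3, in_dim), dtype=np.float32)
x_seq = rng.random((2, 5, in_dim), dtype=np.float32)
with torch.no_grad():
    torch_dense_out = torch_linear(torch.from_numpy(x_vec)).numpy()
    torch_seq_out = torch_lstm(torch.from_numpy(x_seq))[0].numpy()
keras_dense_out = dense_model.predict(x_vec, verbose=0)
keras_seq_out = lstm_model.predict(x_seq, verbose=0)
print("Dense max diff:", np.abs(torch_dense_out - keras_dense_out).max())
print("LSTM max diff:", np.abs(torch_seq_out - keras_seq_out).max())
```

For a real checkpoint, the arrays would come from `torch.load(...)` and the keys of its state dict. Those keys depend on how the repo names its modules, so they are not shown here.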



# 4. Advanced Assignments

### Code Reading and Rewriting

- In PyTorch, the model is defined in **`model.py`**.  
- If rewritten in Keras, we would define:  
  - CNN Encoder (`keras.applications.VGG16` with `include_top=False`)  
  - LSTM Decoder (`keras.layers.LSTM`)  
  - Dense output layer whose size equals the vocabulary size (softmax over words).  
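A Keras version of that structure could look like the following minimal sketch. It loosely mirrors the PyTorch design in which the image feature is fed to the LSTM as the first time step, but it is not a line-by-line port of `model.py`. `vocab_size`, `max_caption_len`, `embed_dim`, and `hidden_dim` are hypothetical values. `weights=None` builds VGG16 untrained; `weights="imagenet"` would download the pretrained encoder.

```python
from keras import Model, layers
from keras.applications import VGG16

vocab_size = 5000      # hypothetical vocabulary size
max_caption_len = 20   # hypothetical caption length (in tokens)
embed_dim = 256        # hypothetical embedding size
hidden_dim = 512       # hypothetical LSTM size

# CNN encoder: VGG16 without its classifier head, average-pooled to one vector per image
cnn = VGG16(include_top=False, weights=None, pooling="avg", input_shape=(224, 224, 3))
cnn.trainable = False  # keep the encoder frozen; only the projection and decoder train

image_input = layers.Input(shape=(224, 224, 3), name="image")
features = layers.Dense(embed_dim, activation="relu")(cnn(image_input))  # (batch, embed_dim)

# Caption tokens -> word embeddings
caption_input = layers.Input(shape=(max_caption_len,), dtype="int32", name="caption")
word_embeddings = layers.Embedding(vocab_size, embed_dim)(caption_input)

# Prepend the image feature as time step 0, then run the LSTM decoder over the sequence
image_step = layers.Reshape((1, embed_dim))(features)
decoder_in = layers.Concatenate(axis=1)([image_step, word_embeddings])
lstm_out = layers.LSTM(hidden_dim, return_sequences=True)(decoder_in)
word_probs = layers.Dense(vocab_size, activation="softmax")(lstm_out)

model = Model([image_input, caption_input], word_probs)
print(model.outputs[0].shape)  # one word distribution per step: (None, 21, 5000)
```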

### Developmental Research

1. **Translating into other languages**  
   - Replace the dataset with a bilingual corpus (e.g., English ↔ Japanese).  
   - Retrain the Seq2Seq model.  

2. **Evolving methods of machine translation**  
   - Use **attention mechanisms** (Bahdanau, Luong).  
   - Use **Transformer models** (Vaswani et al., 2017).  
   - Pretrained models: BERT, GPT, mBART, MarianMT.  

3. **Generating images from text (opposite of Image Captioning)**  
   - GANs (Generative Adversarial Networks).  
   - DALL·E, Stable Diffusion, Imagen.  
   - Use text embeddings + image generators.  
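The attention mechanisms listed in item 2 can be illustrated with a single decoder step. This is a minimal NumPy sketch of Luong-style "dot" attention with toy vectors. The optional `scaled` flag divides scores by $\sqrt{d}$, as the Transformer's scaled dot-product attention does. `dot_attention` is a hypothetical helper name.

```python
import numpy as np

def dot_attention(decoder_state, encoder_outputs, scaled=False):
    """Luong-style dot-product attention for one decoder step.

    decoder_state:   (hidden,)    current decoder hidden state
    encoder_outputs: (T, hidden)  one encoder state per source position
    """
    scores = encoder_outputs @ decoder_state  # (T,) similarity per source position
    if scaled:
        scores = scores / np.sqrt(encoder_outputs.shape[-1])
    scores = scores - scores.max()            # numerical stability before exp
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax -> attention weights
    context = weights @ encoder_outputs       # (hidden,) weighted sum of encoder states
    return context, weights

encoder_outputs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
decoder_state = np.array([10.0, 0.0])
context, weights = dot_attention(decoder_state, encoder_outputs)
print(np.round(weights, 3))  # approximately [0.5, 0.0, 0.5]
print(np.round(context, 3))  # approximately [1.0, 0.5]
```

The decoder state resembles source positions 0 and 2, so attention concentrates there. The context vector is then combined with the decoder state to predict the next word.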



# 5. Conclusion

- Sequence data has wide applications such as **machine translation** and **image captioning**.  
- Machine translation may use **character-level tokenization**.  
- Running pretrained models (like Image Captioning) requires checking that the **weight filenames** match what the code expects.  
- Moving to Keras requires converting the PyTorch weights and loading them manually.  
- More advanced directions include **attention**, **Transformers**, and even **text-to-image generation**.  



# 6. References

1. scikit-learn developers. CountVectorizer. scikit-learn 0.21.3 documentation.  
   https://scikit-learn.org/0.21/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html  

2. Ren, S., He, K., Girshick, R., Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.  
   https://arxiv.org/pdf/1506.01497.pdf  

3. Redmon, J., Farhadi, A. (2018). YOLOv3: An Incremental Improvement.  
   https://pjreddie.com/media/files/papers/YOLOv3.pdf  

4. Vaswani, A. et al. (2017). Attention Is All You Need.  
   https://arxiv.org/abs/1706.03762  
