This project uses a ResNet152 convolutional neural network (CNN) as an encoder and a long short-term memory (LSTM) recurrent neural network as a decoder to generate captions for images. The model is trained on the Microsoft Common Objects in Context (MS COCO) dataset.
The encoder consists of a ResNet152 CNN followed by a linear layer and batch normalization. The CNN is pretrained on the ImageNet dataset, and its final fully connected layer is removed.
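Below is a minimal PyTorch sketch of that encoder, assuming a recent torchvision. The class name `EncoderCNN` and the `embed_size` parameter are illustrative; whether the CNN itself is fine-tuned is not stated above, so this sketch keeps it frozen and trains only the projection head.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """ResNet152 encoder: pooled image features projected to the embedding size."""

    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
        # Drop the final fully connected layer; keep everything up to the pooled features.
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])
        self.linear = nn.Linear(resnet.fc.in_features, embed_size)
        self.bn = nn.BatchNorm1d(embed_size, momentum=0.01)

    def forward(self, images):
        # Assumption: the pretrained CNN stays frozen; only linear + bn are trained.
        with torch.no_grad():
            features = self.resnet(images)              # (batch, 2048, 1, 1)
        features = features.reshape(features.size(0), -1)  # (batch, 2048)
        return self.bn(self.linear(features))           # (batch, embed_size)
```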
The decoder consists of an LSTM network followed by a linear layer and softmax to generate the final output. The LSTM takes the encoded image features from the encoder and a sequence of previously generated words as input to predict the next word in the caption.
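A matching sketch of the decoder, under the same assumptions. The forward pass returns raw vocabulary scores: at training time the softmax is typically applied implicitly by `nn.CrossEntropyLoss`, which matches the "linear layer and softmax" description.

```python
import torch
import torch.nn as nn


class DecoderRNN(nn.Module):
    """LSTM decoder: conditions on image features and predicts the next word."""

    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image features as the first "token" of the input sequence,
        # followed by the embeddings of the previously generated words.
        embeddings = self.embed(captions)                         # (batch, T, embed)
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)                            # (batch, T + 1, hidden)
        # Raw scores over the vocabulary; softmax is left to the loss function.
        return self.linear(hiddens)                               # (batch, T + 1, vocab)
```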
The trained model's weights are available here: 🔗
- Read papers on image captioning (ongoing)
- Define a custom DataLoader for the MS COCO dataset
- Build the CNN encoder
- Build the LSTM decoder
- Build the training loop (a minimal sketch follows this list)
- Evaluate the model
- Iterate to make the model better
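For the training-loop item above, here is a minimal, runnable sketch of a single training step using the `EncoderCNN` and `DecoderRNN` classes sketched earlier. The random tensors stand in for a real MS COCO batch, and the batch size, sequence length, vocabulary size, learning rate, and choice of Adam are all illustrative assumptions, not the project's confirmed settings.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the real values depend on the vocabulary and config.
embed_size, hidden_size, vocab_size = 256, 512, 10_000
encoder = EncoderCNN(embed_size)
decoder = DecoderRNN(embed_size, hidden_size, vocab_size)

criterion = nn.CrossEntropyLoss()
# Only the decoder and the encoder's projection head are optimized;
# the pretrained CNN stays frozen (an assumption carried over from the sketch).
params = (list(decoder.parameters())
          + list(encoder.linear.parameters())
          + list(encoder.bn.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

images = torch.randn(4, 3, 224, 224)               # stand-in image batch
captions = torch.randint(0, vocab_size, (4, 12))   # stand-in tokenized captions

features = encoder(images)
outputs = decoder(features, captions)              # (batch, T + 1, vocab)
# The output at step t predicts caption token t, so drop the final step
# before aligning predictions with the target tokens.
loss = criterion(outputs[:, :-1].reshape(-1, vocab_size), captions.reshape(-1))

optimizer.zero_grad()
loss.backward()
optimizer.step()
```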