
Update visual chatbot to Python3 and PyTorch
yashkant authored and RishabhJain2018 committed Sep 9, 2019
1 parent f5db5a0 commit e084100
Showing 58 changed files with 3,478 additions and 2,080 deletions.
18 changes: 17 additions & 1 deletion .gitignore
@@ -1,5 +1,12 @@
# Demo
/data/
media/
viscap/captioning/detectron/
viscap/captioning/model_data/
viscap/checkpoints/
viscap/data/
env/
static/

*.pyc
db.sqlite3
@@ -9,3 +16,12 @@ ques_feat.json
models/*.caffemodel
models/*.lua
models/*.prototxt
*.zip

# Pycharm
.idea/

# Installed packages
pytorch/
migrations/
!migrations/__init__.py
12 changes: 9 additions & 3 deletions .gitmodules
@@ -1,3 +1,9 @@
[submodule "viscap/captioning/vqa-maskrcnn-benchmark"]
path = viscap/captioning/vqa-maskrcnn-benchmark
url = https://gitlab.com/yashkant/vqa-maskrcnn-benchmark/
[submodule "viscap/captioning/fastText"]
path = viscap/captioning/fastText
url = https://github.com/facebookresearch/fastText
[submodule "viscap/captioning/pythia"]
path = viscap/captioning/pythia
url = https://github.com/facebookresearch/pythia/
195 changes: 116 additions & 79 deletions README.md
@@ -1,120 +1,134 @@
Visual Chatbot
============
Demo for the paper (**Now upgraded to PyTorch; for the Lua-Torch version see [tag]()**).

**[Visual Dialog][1]** (CVPR 2017 [Spotlight][4]) </br>
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra

Arxiv Link: [arxiv.org/abs/1611.08669][1]
Live demo: http://visualchatbot.cloudcv.org

[![Visual Chatbot](chat/static/images/screenshot.png)](http://www.youtube.com/watch?v=SztC8VOWwRQ&t=13s "Visual Chatbot")

Introduction
---------------
**Visual Dialog** requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question. Putting it all together, we demonstrate the first ‘visual chatbot’!

What has changed since the last version?
---------------------------------------------------
The model-building code has been shifted entirely to PyTorch. We have added a much-improved [Bottom-Up Top-Down][12] captioning model from [Pythia][10] and a Mask-RCNN feature extractor from [maskrcnn-benchmark][13]. The VisDial model is borrowed from the [visdial-challenge-starter][14] code.

Please follow the instructions below to get the demo running on your local machine. For the previous version of this repository, which supports Torch-Lua based models, see [tag]().

Setup and Dependencies
------------------------------
Start by installing the build essentials, [Redis Server][5], and [RabbitMQ Server][6].
```sh
sudo apt-get update

# download and install build essentials
sudo apt-get install -y python-dev
sudo apt-get install -y autoconf automake libtool
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler

# download and install redis-server and rabbitmq-server
sudo apt-get install -y redis-server rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
sudo service redis-server restart
```
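Once the services restart, you can confirm they are reachable from Python. This is a minimal sketch, assuming the `redis` and `pika` client packages are installed (common choices, but not necessarily pinned in this repo's requirements):

```python
import pika
import redis

# ping redis on its default port
r = redis.Redis(host="localhost", port=6379)
print("redis:", r.ping())  # True if redis-server is reachable

# open and close a connection to rabbitmq on its default port
conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
print("rabbitmq: connection open =", conn.is_open)
conn.close()
```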

#### Environment Setup

You can use Anaconda or Miniconda to set up this codebase. Download and install a Python 3 Anaconda or Miniconda distribution from the [downloads page][17], then proceed as below.

```sh
# clone and download submodules
git clone https://www.github.com/yashkant/visual-chatbot.git
git submodule update --init --recursive

# create and activate new environment
conda create -n vischat python=3.6.8
conda activate vischat

# install the requirements of chatbot and visdial-starter code
cd visual-chatbot/
pip install -r requirements.txt
```
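Before installing anything further, it can help to confirm that the `vischat` environment is the active interpreter (a small generic sanity check, nothing repo-specific):

```python
import sys

print(sys.version)     # expect 3.6.8 if the vischat env is active
print(sys.executable)  # should point inside your conda envs directory
```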

#### Downloads
Download the BUTD, Mask-RCNN, and VisDial model checkpoints and their configuration files.
```sh
sh viscap/download_models.sh
```

#### Install Submodules
Install Pythia (for the BUTD captioning model), its fastText dependency, and maskrcnn-benchmark (for feature extraction).
```sh
# install fastText (dependency of pythia)
cd viscap/captioning/fastText
pip install -e .

# install pythia for using butd model
cd ../pythia/
sed -i '/torch/d' requirements.txt
pip install -e .

# install maskrcnn-benchmark for feature extraction
cd ../vqa-maskrcnn-benchmark/
python setup.py build
python setup.py develop
cd ../../../
```
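A quick smoke test that the three submodules installed correctly. This is a sketch; the module names `fasttext`, `maskrcnn_benchmark`, and `pythia` are inferred from the repositories above, not stated by this README:

```python
# each import fails loudly if the corresponding `pip install -e .` did not work
import fasttext
import maskrcnn_benchmark
import pythia

print("all submodules importable")
```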
#### Cuda Installation

Note: CUDA and cuDNN are only required if you are going to use a GPU. Download and install them from the [nvidia website][18].
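To verify that PyTorch can actually see your GPU after installing CUDA and cuDNN (a generic check, not specific to this repo):

```python
import torch

print(torch.__version__)
print(torch.cuda.is_available())            # True only with a working CUDA setup
print(torch.backends.cudnn.is_available())  # True if PyTorch found cuDNN
```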

#### NLTK
We use the `PunktSentenceTokenizer` from NLTK; download its model if you haven't already.
```sh
python -c "import nltk; nltk.download('punkt')"
```
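For a feel of what the punkt model does once downloaded, e.g. splitting a caption-like string into sentences:

```python
from nltk.tokenize import sent_tokenize  # backed by the punkt model

print(sent_tokenize("A man is riding a horse. The horse is brown."))
# ['A man is riding a horse.', 'The horse is brown.']
```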

## Let's run this now!
#### Set up the database
```sh
# create the database
python manage.py makemigrations chat
python manage.py migrate
```
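If the migration succeeded, a `db.sqlite3` file should appear in the repository root; you can peek at it with the standard library (the exact table names are Django-generated, so treat this as a sanity check only):

```python
import sqlite3

conn = sqlite3.connect("db.sqlite3")  # created by `manage.py migrate`
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print(tables)  # expect django_* tables plus the chat app's tables
conn.close()
```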
#### Run server and worker
Launch two separate terminals and run the worker and the development server.
```sh
# run rabbitmq worker on first terminal
# warning: on the first run, a ~860 MB GloVe file is downloaded; this is a one-time download
python worker_viscap.py

# run development server on second terminal
python manage.py runserver
```
You are all set now. Visit http://127.0.0.1:8000 and you will have your demo running successfully.
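If you prefer a scriptable check over a browser, a minimal standard-library probe (assuming the default port above):

```python
import urllib.request

# expect HTTP 200 once `manage.py runserver` is up
resp = urllib.request.urlopen("http://127.0.0.1:8000")
print(resp.status)
```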

## Issues
If you run into incompatibility issues, please take a look [here][7] and [here][8].

## Model Checkpoint and Features Used
Performance on `v1.0 test-std` (trained on `v1.0` train + val):

Model | R@1 | R@5 | R@10 | MeanR | MRR | NDCG |
------- | ------ | ------ | ------ | ------ | ------ | ------ |
[lf-gen-mask-rcnn-x101-demo][20] | 0.3930 | 0.5757 | 0.6404 | 18.4950 | 0.4863 | 0.5967 |

Extracted features from `VisDial v1.0` used to train the above model are here:

- [features_mask_rcnn_x101_train.h5][21]: Mask-RCNN features (100 proposals per image) for the train split.
- [features_mask_rcnn_x101_val.h5][22]: Mask-RCNN features (100 proposals per image) for the val split.
- [features_mask_rcnn_x101_test.h5][23]: Mask-RCNN features (100 proposals per image) for the test split.

*Note*: In the feature files above, the key `image_id` (from earlier versions) has been renamed to `image_ids`.
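To inspect one of the downloaded feature files, something like the following should work (a sketch using `h5py`; apart from `image_ids`, which the note above confirms, the dataset layout is an assumption):

```python
import h5py

with h5py.File("features_mask_rcnn_x101_val.h5", "r") as f:
    print(list(f.keys()))        # dataset names stored in the file
    print(f["image_ids"].shape)  # one entry per image in the val split
```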

## Cite this work

@@ -131,20 +145,43 @@ If you find this code useful, consider citing our work:

## Contributors

* [Yash Kant][19] (ysh.kant@gmail.com)
* [Deshraj Yadav][2] (deshraj@gatech.edu)
* [Abhishek Das][3] (abhshkdz@gatech.edu)

## License

BSD

## Credits and Acknowledgements

- Visual Chatbot Image: "[Robot-clip-art-book-covers-feJCV3-clipart](https://commons.wikimedia.org/wiki/File:Robot-clip-art-book-covers-feJCV3-clipart.png)" by [Wikimedia Commons](https://commons.wikimedia.org) is licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)

- The beam-search implementation was borrowed as-is from [AllenNLP][15].
- The vqa-maskrcnn-benchmark code was forked from @meetshah1995's [fork][16] of the original repository.
- The VisDial model is borrowed from the [visdial-challenge-starter][14] code.
- The BUTD captioning model comes from the awesome [Pythia][10] repository.

[1]: https://arxiv.org/abs/1611.08669
[2]: http://deshraj.github.io
[3]: https://abhishekdas.com
[4]: http://cvpr2017.thecvf.com/
[5]: https://redis.io/
[6]: https://www.rabbitmq.com/
[7]: https://github.com/unbit/uwsgi/issues/1770
[8]: https://stackoverflow.com/questions/41335478/importerror-no-module-named-asgiref-base-layer
[9]: https://gitlab.com/yashkant/vqa-maskrcnn-benchmark
[10]: https://github.com/facebookresearch/pythia/
[11]: https://github.com/facebookresearch/fastText/
[12]: https://arxiv.org/abs/1707.07998
[13]: https://github.com/facebookresearch/maskrcnn-benchmark
[14]: https://github.com/batra-mlp-lab/visdial-challenge-starter-pytorch/
[15]: https://www.github.com/allenai/allennlp
[16]: https://gitlab.com/meetshah1995/vqa-maskrcnn-benchmark/
[17]: https://conda.io/docs/user-guide/install/download.html
[18]: https://developer.nvidia.com/cuda-downloads
[19]: https://github.com/yashkant
[20]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/lf_gen_mask_rcnn_x101_train_demo.pth
[21]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_train.h5
[22]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_val.h5
[23]: https://s3.amazonaws.com/visual-dialog/data/v1.0/2019/features_mask_rcnn_x101_test.h5

100 changes: 0 additions & 100 deletions captioning.lua

This file was deleted.

