This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Any plan to seperate non-NLP part to its own library to benefit other deep learning research field #2099

Closed
wheatdog opened this issue Nov 26, 2018 · 16 comments

Comments

@wheatdog

I find the way AllenNLP handles deep learning experiments quite elegant, and the quality of the code is great. I mainly do research in computer vision and haven't found anything similar there. I think it would be good to reuse parts of AllenNLP for various kinds of deep learning research.

@schmmd
Member

schmmd commented Nov 26, 2018

@wheatdog we don't presently have a plan to separate out the non-NLP components as our group is mainly focused on NLP problems. What problems would you run into if you used the library as is, but avoided the NLP components?

@wheatdog
Author

I am not sure how I can use AllenNLP to build a tool for deep learning experiments in other fields, say computer vision. From #1877, I understand that I would have to change the logic in Trainer, and maybe the interface of Model. So perhaps the main part I can reuse is the Registrable class? The idea is to reuse as much of AllenNLP as I can. I will play around with this and see what I get.
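For reference, the registry pattern behind AllenNLP's Registrable can be sketched in a few lines of plain Python. This is a hypothetical reimplementation for illustration only, not the library's actual code:

```python
# Hypothetical few-line reimplementation of the registry pattern behind
# AllenNLP's Registrable (illustration only, not the library's code).
class Registrable:
    _registry = {}

    @classmethod
    def register(cls, name):
        """Class decorator that records a subclass under `name`."""
        def decorator(subclass):
            Registrable._registry.setdefault(cls, {})[name] = subclass
            return subclass
        return decorator

    @classmethod
    def by_name(cls, name):
        """Look up a previously registered subclass by name."""
        return Registrable._registry[cls][name]
```

Base classes such as Trainer would inherit from Registrable, and a config file's "type" key then selects the concrete subclass by name, which is what makes the config-driven wiring work.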

@wheatdog changed the title from "Any plan to seperate non-NLP part to its own library to benefit other deep learning research feild" to "Any plan to seperate non-NLP part to its own library to benefit other deep learning research field" on Nov 28, 2018
@matt-gardner
Contributor

matt-gardner commented Nov 28, 2018

@wheatdog - I'm guessing you can use much more than just that. If you want to use this for computer vision, you'd basically just need to add in some Fields that handle image / video inputs. Everything else should just work. There might be things that are slow, because we haven't dealt with such large files as you get in vision problems (or other problems I'm not foreseeing, as I haven't done much with vision), but that's where I'd start.

If you get this to work well, we'd also be very much interested in having you contribute back to the library, as there are plenty of places where vision and language intersect (e.g., captioning, VQA), and we'd love to have the ability to do those tasks in AllenNLP.
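The kind of Field for image inputs suggested above could look roughly like this toy sketch. This is pure Python for illustration, not actual AllenNLP code; the real Field contract also includes methods like empty_field and returns torch tensors:

```python
# Toy sketch (not actual AllenNLP code): a field for fixed-size image
# inputs, mimicking the Field contract (get_padding_lengths / as_tensor).
class ImageField:
    def __init__(self, pixels):
        # pixels: a nested list in channels x height x width layout.
        self.pixels = pixels

    def get_padding_lengths(self):
        # Fixed-shape images need no padding, unlike variable-length text,
        # so there is nothing to report to the batching code.
        return {}

    def as_tensor(self, padding_lengths):
        # A real implementation would return torch.tensor(self.pixels);
        # plain lists keep this sketch dependency-free.
        return self.pixels
```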

@sethah
Contributor

sethah commented Dec 27, 2018

I've successfully converted a project of mine to use the AllenNLP training tools in a computer vision application, with no modifications to the source code. Overall, the current tools, though designed for NLP, are general enough to work well for CV. This conversion simplified my hand-written PyTorch code a lot.

I must admit I am still learning the Allen NLP APIs. I used a collection of ArrayFields for my inputs and outputs, and a BasicIterator to batch them. My initial impressions are that there are a couple of leaky abstractions and then some missing features (missing might be a misnomer, since this is an NLP library so it is a bit weird to call CV features "missing"). For example, image augmentation (there are other libraries to do this, though).

As far as the leaky abstractions, one example would be the padding methods like get_padding_lengths on the Field class. Another is having to initialize models with empty vocabularies.

class AnchorBoxModel(allennlp.models.Model):

    def __init__(self):
        super(AnchorBoxModel, self).__init__(Vocabulary())

Overall, I was very impressed with how well the abstractions fit. I don't know what the path forward is, if there is one. I understand it's a non-priority, but in the process of building an NLP library it seems the AllenNLP devs have built a well-designed, general training library for PyTorch. It would be a bit of a shame for it to only be used for NLP!

@joelgrus
Contributor

This actually is something we're thinking about (at least, it's something I'm thinking about), if you have code you're able to share that would be helpful.

@matt-gardner
Contributor

Awesome, this is great to hear. I'd love to hear more about your experience - do you have any data loading bottlenecks, or anything? Would multi-processing on the data side speed things up? Would a separate ImageField be helpful? As for get_padding_lengths, you just mean that some kinds of Fields just return an empty dictionary there? ArrayField is designed to need padding, which is one place where an ImageField could do something different. For data augmentation stuff, importing a third-party library that does this well seems like the right way to go - no need to re-invent something that already works.

It does seem reasonable to me to have Vocabulary be an optional argument to the base Model class, if we're allowing for more general kinds of models. The only place the vocabulary is used is for indexing instances, and that code works fine with None for a vocabulary if there are no Fields that need indexing. Oh, our saving and loading code might also need some tweaking...

And language+vision tasks definitely fall in scope for what we'd like to support (in addition to whatever @joelgrus is thinking about), so having more explicit vision stuff in the library would be great. If anyone reading this has any kind of language+vision model / code they want to contribute, please do!

@scarecrow1123
Contributor

I've been thinking about something of this sort recently, especially adopting AllenNLP for speech models. I'm just starting with speech recognition, and I would greatly miss the elegance of AllenNLP's abstractions and dependency injection if I had to move to another toolkit just because I'm working outside regular NLP problems.
Perhaps individual forks of the base library for specific areas such as vision and speech would be better than having too many primitives in a single place?

@sethah
Contributor

sethah commented Dec 28, 2018

@joelgrus Sure, the relevant AllenNLP parts are mostly here, here, and here. This was basically a first pass at stuffing things into AllenNLP abstractions so that I could use the Trainer class.

I don't feel qualified yet to comment on precisely which abstractions need to be added versus which can be used as is. Data loading efficiency isn't critical for my purposes, so it is currently not at all optimized. That would probably be an area worth more investigation.

@scarecrow1123
Contributor

Just want to add here that I was able to make my speech-related experiments work without any change to the AllenNLP source. I was also able to make it work with just ArrayField (though I may add something like an AudioField) and the given iterators.

@matt-gardner
Contributor

Thanks @scarecrow1123, that's great to hear. Were there any particular scalability issues or other problems that you ran into?

@scarecrow1123
Contributor

@matt-gardner My intention here was to reimplement Mozilla's Deepspeech implementation with AllenNLP. It has worked well so far. My only quibble, if any, has been not being able to use DistributedDataParallel, for which I've raised a separate issue discussing a possible solution. I'll make an attempt to integrate it into a separate trainer and share my feedback.

@sethah
Contributor

sethah commented May 8, 2019

I made an attempt at a computer vision library in the style of AllenNLP, mostly out of curiosity. It’s full of incomplete code, but it might serve as a useful reference point. It helped me get a feel for some of the things that work well and some that don’t.

The framework works for two tasks: classification and semantic segmentation. I also implemented multiple object detection (significantly more involved) using Faster-RCNN, but I did that by mostly putting AllenNLP-style wrappers around the library here, which is a pain to install. That code lives in a separate branch. All that is to say that the framework has no significant barriers for some of the most common CV tasks.

Some musings

  • I’m not sure about some of the abstractions. Im2ImEncoder (Image -> Image) is quite similar to ImageEncoder (Image -> List[Image]), but both seemed handy. I’m quite sure there are abstractions missing, but those simple few worked for the three tasks mentioned above.
  • I loved the way I could specify the image augmentation in a config file. For example:
local TRAIN_AUGMENTATION = [
            {
                "type": "resize",
                "height": 512,
                "width": 512
            }, {
                "type": "normalize",
                "mean": [0.485, 0.456, 0.406],
                "std": [0.229, 0.224, 0.225]
            }, {
                "type": "flip",
                "p": 0.5
            }, {
                "type": "channel_shuffle",
                "p": 0.5
            }
        ];

local VALID_AUGMENTATION = [
            {
                "type": "resize",
                "height": 512,
                "width": 512
            }, {
                "type": "normalize",
                "mean": [0.485, 0.456, 0.406],
                "std": [0.229, 0.224, 0.225]
            }
        ];

  • Image augmentation would be tricky to get right. I don’t think you’d want to write your own, but to my knowledge there isn’t a single library out there that stands above the rest. I used wrappers around Albumentations. Not to mention, it can be a bottleneck if you don’t take care to parallelize it. I tried using the MultiProcessDatasetReader but there’s some funky stuff going on with multiprocessing and OpenCV. I never got it working (the reader just hangs forever), but didn't try very hard.
  • It was nice to already have a way to combine fields. For object detection, each image has a variable length of bounding box fields, and AllenNLP already supports that through padding and ListFields, so that was a win.
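The ListField win in the last bullet can be illustrated with a toy padding helper. This is plain Python for illustration, not the actual AllenNLP ListField API:

```python
# Toy illustration (not the actual ListField API): padding variable-length
# per-image bounding-box lists to a common length so they can be batched.
def pad_box_lists(box_lists, pad_box=(0.0, 0.0, 0.0, 0.0)):
    max_len = max(len(boxes) for boxes in box_lists)
    return [boxes + [pad_box] * (max_len - len(boxes)) for boxes in box_lists]
```

AllenNLP's real implementation additionally tracks which entries are padding via masks, so the model can ignore the dummy boxes.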

Overall, I think it would be very exciting to have the same type of dev process for computer vision that AllenNLP makes so nice for NLP. Multiple object detection, for instance, seems to be a nightmare to implement - it would be really nice to have a single framework that implements this for you as well as other common CV tasks.

Anyway, I think the computer vision extensions could be worthwhile and not too much extra effort. Curious to hear other thoughts!
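A minimal sketch of how the config-driven augmentation above could be wired up, with made-up names for illustration (AllenNLP itself does this dispatch through Registrable and FromParams):

```python
# Hypothetical sketch of how {"type": ...} augmentation configs like the
# ones above can be dispatched to registered transforms; the names here
# are made up, not AllenNLP's actual API.
TRANSFORMS = {}

def register_transform(name):
    def decorator(factory):
        TRANSFORMS[name] = factory
        return factory
    return decorator

@register_transform("resize")
def resize(height, width):
    # A real transform would wrap e.g. an Albumentations Resize op;
    # a tuple stands in for the constructed transform here.
    return ("resize", height, width)

@register_transform("flip")
def flip(p):
    return ("flip", p)

def build_pipeline(configs):
    # Use "type" to pick the factory; pass the rest as keyword arguments.
    return [TRANSFORMS[cfg["type"]](**{k: v for k, v in cfg.items() if k != "type"})
            for cfg in configs]
```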

@sethah
Contributor

sethah commented May 30, 2019

Just want to bump this. Is there any interest in computer vision components for their own sake, or only as they relate to NLP tasks?

Much of what AllenNLP has contributed is making the dev workflow more pleasant, structured, and reproducible. CV practitioners would surely benefit here as well. I'd be interested in contributing to some of these things, but it's not clear what is considered in scope. It would be good to hear from the maintainers whether any priorities have changed.

There's a proof-of-concept here. It's a bit too difficult to use the raw AllenNLP library and simply build your own CV parts IMO. But still, most of the important pieces are already provided by AllenNLP - that is to say, the maintenance overhead is not much more than simply adding more models to what already exists.

cc @joelgrus @matt-gardner

@matt-gardner
Contributor

Hi @sethah, sorry to be slow - this is a crazy time of year with conferences and conference deadlines. I think there's a good case to be made for splitting things out a bit, but I'm not really sure how far to take this and whether we should spend our limited resources on it. We're having an AllenNLP summit in a couple of months, talking with some power users about directions they'd like to see us go, and I'm definitely going to try to get this on the agenda.

@sethah
Contributor

sethah commented Jun 6, 2019

Thanks Matt, I'll keep an eye out for updates.

@matt-gardner
Contributor

We are in the process of splitting the library apart into smaller pieces. One of the new repositories will probably be allennlp-vision, which will have code for running things on vision+language tasks like NLVR2. We probably aren't going to support a plain vision library ourselves, but splitting things up should make it easier and more obvious how to do something like that if someone wants to support it. Closing this issue, as we have plenty of others covering this stuff.
