Added a bunch of new features. #405

Merged
merged 1 commit into from Sep 23, 2018

4 participants
@jonaswinkler
Contributor

jonaswinkler commented Sep 13, 2018

I've made various modifications to paperless to fit my needs. Some of these changes may be useful for other people as well, so I've opened a pull request. Most of them are non-intrusive and don't affect how paperless works:

  • Debug mode is now configurable in the configuration file. This way, we don't have to edit versioned files to disable it on production systems.
  • Recent correspondents filter (enable in configuration file)
  • Document actions: Edit tags and correspondents on multiple documents at once (it's in the dropdown menu)
  • Replaced month list filter with date drilldown
  • Sortable document count columns on Tag and Correspondent admin
  • Last correspondence column on Correspondent admin
  • Save and edit next functionality for document editing

Some comments on the changes:

  • Debug mode is still enabled by default. However, copying the example configuration file will disable it.
  • The month list filter does not exist anymore. However, the date drilldown feels like a much better alternative.
  • Save and edit next is now the default action on the document change form, meaning that hitting Enter in one of the editable fields will not return to the document list, but to the next document instead.
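
For readers wondering what a configurable debug flag can look like in a Django-style settings module, here is a minimal sketch. The option name `PAPERLESS_DEBUG`, the helper `read_debug_flag`, and the accepted spellings of "true" are assumptions made for illustration and are not taken from this pull request; only the behaviour (enabled when unset, disabled when the configuration says so) follows the description above.

```python
import os


def read_debug_flag(environ=None, default=True):
    """Return the debug flag from a mapping of configuration values.

    `environ` defaults to the process environment. The key name
    PAPERLESS_DEBUG is an assumption for this sketch.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("PAPERLESS_DEBUG")
    if raw is None:
        # Nothing configured: debug stays enabled, matching the old default.
        return default
    # Treat a small set of common spellings as "enabled"; anything else is off.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# In a settings module this would replace a hard-coded `DEBUG = True`.
DEBUG = read_debug_flag()
```

With this shape, a production system only needs a line such as `PAPERLESS_DEBUG="false"` in its configuration, and no versioned file has to be edited.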

I've made more changes, which are either experimental, intrusive, or potentially unnecessary for other users. If you want these merged as well, please tell me.

  • Inbox and Archive tags. These are special tags. Inbox tags get assigned to all new documents automatically. Documents tagged with Archive tags will never be modified automatically (i.e., by matching rules). The rationale behind Inbox is that one may want to verify the tags and correspondents on newly added documents and then remove these documents from the inbox by removing the inbox tag.
  • Document types. (i.e., Invoice, Contract, Letter, Receipt, etc)
  • Document viewers. On the document edit page, certain document types (pdf, jpg, png) will be displayed next to the document edit form. Editing metadata is much easier this way. Breaks the page layout on small screens.
  • Archive Serial Number. I keep the physical copy of some important documents. This helps me to quickly find a document if needed.
  • And here's the change I'm most excited about: I've replaced your matching algorithms with a mathematical model that learns matching patterns from already tagged documents. This model is used to tag newly added documents. Works reasonably well (far better than hand-crafted matching rules, from my experience) on my data (which is a collection of ~1000 documents with ~100 correspondents and ~10 tags). Highly experimental!
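
To illustrate the general idea of learning matching patterns from already tagged documents, here is a small sketch using a multinomial naive Bayes classifier written with the standard library only. This is not the model from the experimental branch, which is not described in this thread; the class `NaiveBayesTagger`, the tokenizer, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def fit(self, documents):
        """Learn from an iterable of (text, label) pairs."""
        for text, label in documents:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        """Return the most probable label, or None if nothing was learned."""
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary) or 1
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of every word.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


tagger = NaiveBayesTagger().fit([
    ("invoice amount due payment", "invoice"),
    ("invoice total payment due", "invoice"),
    ("dear sir letter regards", "letter"),
    ("letter regards sincerely", "letter"),
])
suggested = tagger.predict("payment due for invoice")
```

The appeal over hand-written rules is that the training data is simply the set of documents a user has already tagged, so the matcher improves as the archive grows.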

Cheers.

@massaquah

massaquah commented Sep 17, 2018

I like your additions very much, especially the date drilldown.

I would also like to see a pull request for this feature: "Document types. (i.e., Invoice, Contract, Letter, Receipt, etc)"

@danielquinn danielquinn merged commit fb6f2e0 into danielquinn:master Sep 23, 2018

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed

danielquinn added a commit that referenced this pull request Sep 23, 2018

@danielquinn

Owner

danielquinn commented Sep 23, 2018

Wow, this is an amazing contribution @jonaswinkler, thank you!

A few requests for any future pull-requests though:

  • Please break up pull requests into smaller, more manageable pieces if you can. This is a great chunk of work, but it took me a very long time to go over it, which delays the merge. Smaller pull requests, even when there's more of them, make it easier to understand what's going on.
  • A lot of code breaks with pep8 and the other coding standards for this project. I had a good chunk of time this afternoon, and your new features are super-helpful, so I spent an hour or two going over it and conforming it to the style-guide, but generally it's good form when submitting code to a project, that you make sure your code conforms to the rules. For Python stuff, that's pep8, and for Paperless, it's pep8 + a few things outlined in the documentation (which I just added, so don't feel bad about missing it).

You can automatically check your code for pep8 violations just by installing tox and running it when you're in the Paperless directory. It'll run all of the unit tests for you too, just to make sure nothing broke as a result of your changes.

Honestly though, this is some great work, and thanks so much! I've already updated the changelog with notes about your contribution.

@ddddavidmartin

Contributor

ddddavidmartin commented Sep 25, 2018

Thanks for this @jonaswinkler, these are some great changes! To answer your question,

Inbox and Archive tags. These are special tags. Inbox tags get assigned to all new documents automatically. Documents tagged with Archive tags will never be modified automatically (i.e., by matching rules). The rationale behind Inbox is that one may want to verify the tags and correspondents on newly added documents and then remove these documents from the inbox by removing the inbox tag.

I'd love to see Inbox tags. I usually don't bother with specific filenames at all and instead edit all my documents in Paperless after I scan or add them. This would work great here.
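
The Inbox and Archive behaviour quoted above can be sketched roughly as follows. The `Tag` and `Document` classes, the `is_inbox` and `is_archive` flags, and the function names are hypothetical stand-ins for illustration, not Paperless' actual models.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    name: str
    is_inbox: bool = False    # assigned to every new document
    is_archive: bool = False  # freezes the document against automatic changes


@dataclass
class Document:
    title: str
    tags: set = field(default_factory=set)


def on_document_added(document, all_tags):
    """Give a newly added document every tag marked as an Inbox tag."""
    document.tags.update(tag for tag in all_tags if tag.is_inbox)


def apply_matching_rule(document, tag):
    """Add `tag` automatically unless the document carries an Archive tag.

    Returns True if the document was modified.
    """
    if any(existing.is_archive for existing in document.tags):
        return False
    document.tags.add(tag)
    return True


inbox = Tag("inbox", is_inbox=True)
archive = Tag("archive", is_archive=True)
bank = Tag("bank")

doc = Document("statement.pdf")
on_document_added(doc, [inbox, archive, bank])
```

Reviewing a document then amounts to correcting its tags and correspondent and removing the `inbox` tag by hand, which takes it out of the inbox view.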

I've replaced your matching algorithms with a mathematical model that learns matching patterns from already tagged documents.

This sounds great as well!

@jonaswinkler

Contributor

jonaswinkler commented Sep 26, 2018

* Please break up pull requests into smaller, more manageable pieces if you can.  This is a great chunk of work, but it took me a very long time to go over it, which delays the merge.  Smaller pull requests, even when there's more of them, make it easier to understand what's going on.

Okay, I'll keep that in mind! I wasn't really sure how to create the pull request. Since I did not separate the features into individual branches, I couldn't simply select a branch to be merged into master. So I branched off master again and manually selected the changes I wanted to contribute. I may need a better branching strategy...

* A lot of code breaks with pep8 and the other [coding standards](https://paperless.readthedocs.io/en/latest/contributing.html) for this project.  I had a good chunk of time this afternoon, and your new features are super-helpful, so I spent an hour or two going over it and conforming it to the style-guide, but generally it's good form when submitting code to a project, that you make sure your code conforms to the rules.  For Python stuff, that's pep8, and for Paperless, it's pep8 + a few things outlined in the documentation (which I just added, so don't feel bad about missing it).

Sorry about the extra work. I'll keep that in mind next time!

@danielquinn

Owner

danielquinn commented Oct 7, 2018

@jonaswinkler no need to apologise, like I said, this is some great work and a very nice addition to the project. Clearly I'm not alone in this sentiment either ;-)
