Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation for 4.0.0 #2960

Closed
8 tasks done
piskvorky opened this issue Sep 28, 2020 · 19 comments · Fixed by #2981
Closed
8 tasks done

Update documentation for 4.0.0 #2960

piskvorky opened this issue Sep 28, 2020 · 19 comments · Fixed by #2981
Assignees
Labels
documentation Current issue related to documentation housekeeping internal tasks and processes
Milestone

Comments

@piskvorky
Copy link
Owner

piskvorky commented Sep 28, 2020

Internal checklist so I don't forget something:

  • Rewrite the landing page copy for the new website.
  • Restructure top-level TOC tree for the new website.
  • Make all rst docs up-to-date for the new website, check hyperlinks, update stats & copy.
  • Go over rendered API docs and check everything's up to date and makes sense: formatting, descriptions, notes, hyperlinks.
  • Migrate & regenerate all Tutorials and How-tos.
  • Write the Migration guide.
  • Write the 4.0.0 release notes.
  • Improve the Developer page with Gensim code-style: hanging indents, trailing commas, FIXME vs TODO, line length, etc.
@piskvorky piskvorky added documentation Current issue related to documentation housekeeping internal tasks and processes labels Sep 28, 2020
@piskvorky piskvorky added this to the 4.0.0 milestone Sep 28, 2020
@piskvorky
Copy link
Owner Author

Fixes go into PR #2954.

@piskvorky
Copy link
Owner Author

piskvorky commented Oct 16, 2020

@mpenkov release notes below. Can you please edit this comment to add (auto-generate?) the 🔴 Bug fixes / 👍 Improvements / 📚 Tutorial and doc improvements / ⚠️ Deprecations sections?

Next up, I'll fill in the performance numbers & finish the Migration guide, and we're ready to release 4.0.0beta. I don't think any of the other tickets are blocking (CC @gojomo).

1. Tweet


So, Gensim 4.0 is nearly here. Faster models, less RAM, prettier documentation.

Want to help? Install the pre-release and let us know how it went!

pip install --pre --upgrade gensim

https://github.com/RaRe-Technologies/gensim/releases/tag/4.0.0beta


2. Release notes (maybe also a blog post?)

Moved to CHANGELOG: see #2981.


@gojomo
Copy link
Collaborator

gojomo commented Oct 16, 2020

Nice progress! A few thoughts:

  • I don't think Gensim's actually confirmed working yet in Python 3.9 - but by the stated policy, it should be. see Build/test on Python 3.9 #2966 for my quickie test, whose success may just be awaiting scipy catching up & distributing wheels
  • Everywhere that offers the pip install --pre advice should include some boilerplate along the lines of: "if you find problems, please report them ISSUES, and if you need to roll-back, use pip command EXAMPLE"
  • For the reasons I gave on the other issue, I fear too much emphasis on the 'migration script' may cause problems & extra work, implying there's more assurance than there is. It could successfully run, but silently leave models with problems that are only discovered later. On the other hand, even as an advocate of faster sunsetting of old-version compatibility, I think it is unlikely to be strictly true that people will regularly "have to upgrade in multiple hops" (as written now). Much/most of the time, a load from many points back will still work. My goal is just to set the expectation that it's only strongly-targetted to work one-back, and older may require multiple hops, so that some version-increment-improvements can choose to trim their testing-effort & crufty-compatibility code. Not that, for example, 4.3 refuses to direct-load a 4.1-model even in a model class that's not been touched recently.

@piskvorky
Copy link
Owner Author

Thanks. Re. migrations: I replied in #2967, to keep it tidy.

@mpenkov
Copy link
Collaborator

mpenkov commented Oct 17, 2020

Can you please edit this comment to add (auto-generate?)

I think it's better to do this directly in the changelog, because that gets version-controlled and is generally easier to edit than comments in a github PR. It already has some entries. I will double-check that everything merged after 3.8.3 is in there.

@piskvorky
Copy link
Owner Author

piskvorky commented Oct 17, 2020

Thanks. So to be clear, your proposed workflow is:

  1. update Changelog.
  2. copy / edit entries from the Changelog into the appropriate release notes sections here.

I was under the impression you had some script that takes merged PRs + closed tickets and auto-generates this list (which can then be edited). Was I confusing that with something else?

@mpenkov
Copy link
Collaborator

mpenkov commented Oct 17, 2020

Thanks. So to be clear, your proposed workflow is:

Sort of.

copy / edit entries from the Changelog into the appropriate release notes sections here.

Why do we need to copy them here? They can go straight into the release notes when we do the release.

I was under the impression you had some script that takes merged PRs + closed tickets and auto-generates this list (which can then be edited). Was I confusing that with something else?

There's a script that takes a PR number and then grabs its title, etc. We use it for individual PRs - it's not quite what we need here, because there have been dozens of merged PRs since the last release.

@mpenkov
Copy link
Collaborator

mpenkov commented Oct 17, 2020

Please see #2981

@piskvorky
Copy link
Owner Author

Why do we need to copy them here?

Because I don't know what to put in the release notes :D

Relying on my memory is not a good idea.

@mpenkov
Copy link
Collaborator

mpenkov commented Oct 18, 2020

Can't we just link to the changelog in the comment instead of copying? e.g.

The risk is we copy stuff to here, then change it in the changelog, and forget to update the comment.

@piskvorky
Copy link
Owner Author

Yes. I can edit the changelog in #2981, instead of here.

@piskvorky
Copy link
Owner Author

piskvorky commented Oct 25, 2020

@gojomo @mpenkov I updated the Migration guide: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4

There are two FIXMEs I still wasn't sure about, one for @gojomo and one for @mpenkov . Can you please have a look?

And fix / expand / improve anything else you might see there. Otherwise ready for the beta release (with #2981) this week. Thanks!

@gojomo
Copy link
Collaborator

gojomo commented Oct 27, 2020

I addressed the FIXME with my name beside it, clarifying the corpus_iterable (instead of sentences/documents) change. This could possibly be listed higher - it's not really advanced, but it's also somewhat uncommon for that parameter to be passed as a named-parameter rather than positional.

I also added extra clarification around index2entity & vocab, and a note about the removal of .trainables/.vocabulary.

I think the commitment "All your existing stored models... will continue to work" is an overly-strong commitment, as our testing doesn't really guarantee that for all models/all features, and I doubt it will be expanded to guarantee that, and if some stray reports of problems with old/rare configurations come up, I know I won't necessarily have time to fix them. The truth is closer to: "most will work & we haven't gratuitously/knowingly broken anything in recent/common use - try it & check results."

(For example, some shallow testing of pre-1.0 models used to be present but I disabled-via-renaming during other updates. Similarly. some other testing of old-models-of-inexact-vintage is disabled. Will these old models load & work? Should they work? Would re-enabling this test code actually test anything useful? I don't know, but I suspect it'd be a waste of time given the real possibility that there are zero people in the world who'd actually need this.)

Similarly, and for reasons I've also alluded to elsewhere, I think the process authoritatively described at https://github.com/RaRe-Technologies/gensim/wiki/Gensim-And-Compatibility#upgrading-trained-models is somewhat over-involved for the common case. Usually, things will just work, no multiple-hops necessary, because most 4.X to 4.[X+1] releases won't even touch most models, and even where it does, we'll try to support forward-migration when practical. But in the occasional case where bigger changes happen and/or we didn't notice an issue, the guidance risks creating too much confidence in this "migration" script when it runs without error. That won't, practically, be a true guarantee everything is OK given our current (& foreseeable) level of testing. Saying less would be be more reasonable.

@piskvorky
Copy link
Owner Author

piskvorky commented Oct 27, 2020

Thanks.

What I meant is "if loading your model worked in 3.8.3, it will continue to work in 4.0.0". Of course we don't support & test everything meticulously, all the way back to Gensim 0.4.0. I'll check your links and reword as appropriate.

About https://github.com/RaRe-Technologies/gensim/wiki/Gensim-And-Compatibility#upgrading-trained-models : was discussed here #2967 (comment)

@piskvorky
Copy link
Owner Author

piskvorky commented Oct 27, 2020

I checked your links and don't think that's something we need to draw attention to or split hairs over, pedantically.

If there are any people who need to load models from Gensim<1.0 or some such "support", we can handle them on a case-by-case basis. I suspect there are none, just as you said.

And by "handle" I expect saying "go and retrain your model in 4.0".

@gojomo
Copy link
Collaborator

gojomo commented Oct 27, 2020

My main concern is that we don't project a false confidence - to do so increases the risk of users being less vigilant in their upgrades, and increases the implied obligations about what might be fixed if buggy.

The current wording without extra emphasis on "all... will work" is better than before, but my own preferred wording would still be "should work", implying a goal not a guarantee. More like: "Our goal has been that most older models should load & function as before, but you should independently verify that your models load & work, especially if older than 3.8.3".

It's hair-splitting, but it just seems safer to me to lean in the direction of lower-expectation/lower-commitments.

@piskvorky
Copy link
Owner Author

piskvorky commented Oct 27, 2020

OK, I'll relax the language further. It's a Wiki page, so we can update it any time easily, even post-release. I'm hoping the beta release will shine light on several places not covered by the migration guide, or not covered adequately.

Or nobody will care, which is also quite possible.

@mpenkov
Copy link
Collaborator

mpenkov commented Oct 28, 2020

@piskvorky This got closed by automation: not sure if that's what we actually wanted to do, so double-checking.

@piskvorky
Copy link
Owner Author

Yes, I'm done here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Current issue related to documentation housekeeping internal tasks and processes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants