Project Documentation Enhancement #32
What's the advantage over a standard README? How tough is it to maintain?
Randal S. Olson, Ph.D. | E-mail: rso@randalolson.com | Twitter: @randal_olson
Well, of course you can always put "everything" into a README file as well, but depending on future additions, this README file can become huge and user-unfriendly. I'd say it's the same reason why people don't build websites as one large HTML/text file ...
Doesn't hurt to have the web page docs then. I don't think the project is …
@rhiever Yes, I was also thinking more in terms of "in the long run." It would certainly help, though, to start early and document as we go. If we were to set up project documentation, we'd probably want to use a static site builder like Sphinx, MkDocs, or Jekyll. Sphinx is the typical choice for Python projects. It's really a neat tool, but it's also a pretty complex beast, and personally I find that the default themes are really clunky and ugly. I think MkDocs would work just fine, and I don't see any disadvantage of using Markdown over the reStructuredText format. Once it's set up, it's actually pretty easy to maintain:
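The actual maintenance steps didn't survive in this thread; a typical MkDocs update cycle, as a sketch assuming the standard MkDocs CLI is installed (`pip install mkdocs`) and an `mkdocs.yml` already exists in the repo root, might be:

```shell
# Sketch only -- the exact steps Sebastian had in mind aren't preserved here.
$EDITOR docs/index.md   # 1. edit the Markdown source pages
mkdocs serve            # 2. preview locally at http://127.0.0.1:8000
mkdocs build --clean    # 3. render the static site into ./site
mkdocs gh-deploy        # 4. push the rendered site to the gh-pages branch
```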
That's basically it.
I would be happy for you two to take the helm on establishing the project documentation.
@rhiever @pronojitsaha Alright, sounds like a plan. I suggested setting up the MkDocs framework with the API generator and so on since I've already done this for other projects, but if @pronojitsaha wants to do it, that would be fine with me too. Just let me know so that we don't implement the same thing twice :).
@rasbt As you already have a similar framework in place, I don't believe it makes sense to reinvent the wheel! You could share the existing framework as a separate repository, and then we can decide the structure and contribute to individual pages as mutually agreed. Does that work for you?
@rhiever @pronojitsaha Sorry for the late response; I took a few days off over the long Thanksgiving weekend. Unfortunately, I am in the midst of wrapping up a few research projects before I go on vacation in a few days, so I probably wouldn't get to it before January. But setting up a basic framework via Sphinx or MkDocs should be pretty straightforward, I guess. The gplearn library is actually a nice, lean example: https://gplearn.readthedocs.org/en/latest/examples.html I would suggest using the README file as a template; I think the goal of the documentation would be to have an appealing layout with convenient navigation for finding relevant information. Maybe I'd start with the following sections/pages:
@rasbt Hope you had a good Thanksgiving. OK, I will look into it and set up the initial framework using MkDocs, which we can then work on together once you are available in January. Enjoy the vacation!
@pronojitsaha Just got home and read your message; I thought: setting up the template literally just takes 10 minutes, let's do this ;). See pull request #35. I basically just pasted the sections from the README file for now; you can see it live at … (if you fetch or merge it, you can see it live locally by running …). So I guess I'll take a look at the API documentation in January, but I wanted to set this up so that you guys can maybe write the rest of the documentation and come up with some more examples and tutorials in the meantime.
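A template like the one in #35 presumably maps the README sections to MkDocs pages via an `mkdocs.yml`. A minimal sketch of such a file might look like this; the page and file names here are illustrative, not the ones actually committed:

```yaml
# Minimal mkdocs.yml sketch -- section/file names are illustrative.
site_name: TPOT
theme: readthedocs

pages:                          # "pages" was the nav key in MkDocs 0.x;
- Home: index.md                # newer versions call this "nav"
- Installation: installing.md
- Usage: using.md
- Examples: examples.md
- Contributing: contributing.md
```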
@rasbt OK, great! Will delve into it further.
Thanks for the great start on these docs. I've merged #35. |
@rasbt, I've been updating the docs for the new export functionality, and it takes double the work to update both the README and the docs. Any recommendations for avoiding this duplication of labor? @pronojitsaha, now that we have docs up and running, I can think of a couple of things that would be invaluable at this point:
@rhiever I'd recommend not cramming too much into the README file but focusing on the essentials: an overview, a quick example, installation, license info, and short contributing info. I would then insert an "Important links" section at the top pointing to the actual documentation.
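Sebastian's advice might translate into a README skeleton roughly like the following; the section names and the documentation URL are illustrative, not taken from the repository:

```markdown
# TPOT

One-paragraph overview of what TPOT does, plus a badge or two.

## Important links

- [Documentation](https://example.github.io/tpot/)  <!-- hypothetical URL -->
- [Issue tracker](https://github.com/rhiever/tpot/issues)

## Installation

    pip install tpot

## Quick example

A minimal fit/score snippet, kept in sync with the docs landing page.

## Contributing and license

Short pointers to the contributing guidelines and the license file.
```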
@rhiever OK, I will look into the two points. I understand that at this point we have only implemented classification tasks, so here are a few data sets I have in mind for the examples; please let me know your views:
However, hardware is a challenge, as increasing data set sizes will slow down TPOT considerably and increase the time required. This also applies to #41 for unit testing. As such, have you thought of having EC2 instances for this project, or any other alternative to account for this?
FWIW, other Python-based GP projects tend to use OpenCL/PyOpenCL to make better use of dedicated CPU/GPU and FPGA resources. In fact, a number even use CUDA (which is NVIDIA-specific).
For now, I think we'll stick to smaller data sets (e.g., the sklearn MNIST subset) for the examples in the docs -- i.e., examples that can be executed, with visible results, in less than 10 minutes. I wouldn't want to require the user to fire up an EC2 instance or hop on an HPCC to run a basic TPOT example. However, for some use cases it may take several hours to run TPOT -- especially with large data sets -- and I think it would be a good idea to note that in the docs. Perhaps in an "Expectations for TPOT" section of the docs?
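The point about docs-sized examples can be sketched concretely. This is a hypothetical snippet, using scikit-learn's bundled digits subset (1,797 8x8 images, much smaller than full MNIST) and a plain decision tree as a stand-in for a TPOT run:

```python
# Sketch of a docs-sized example: the bundled digits data stands in for
# full MNIST, so the whole run finishes in seconds rather than hours.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, train_size=0.75, random_state=42)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(digits.data.shape, round(accuracy, 2))
```

In a real TPOT example the classifier line would be replaced by a TPOT optimization run, which is exactly the part that motivates keeping the data set small.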
Note that OpenCL is just an abstraction mechanism, i.e., the underlying "kernels" (C-like code) will work on CPUs, GPUs, and FPGA hardware. Since you mention MNIST: I suggest running a corresponding Google search; there are a number of examples where using the GPU instead of the CPU (via OpenCL/CUDA) provided a 100x speedup on the MNIST dataset, e.g., see: http://corpocrat.com/2014/11/09/running-a-neural-network-in-gpu/ (note that this is also using Python and scikit-learn) and http://www.cs.berkeley.edu/~demmel/cs267_Spr11/Lectures/CatanzaroIntroToGPUs.pdf
@rhiever OK, I think it makes sense to work on small subsets for now and focus more on the implementation of the examples. Will look into it.
Anyone working on documenting the pipeline operators and public functions? I've made some significant headway on it, but want to make sure I'm not duplicating labor. |
Hi @bartleyn, I am not working on those at the moment. |
PR #71 is related and still in review (will get to it soon, promise -- I'm back from vacation now), but otherwise I believe that's the only pending change to the docs. |
I was thinking that it may be worthwhile to set up a project documentation page beyond this GitHub repo -- for example, via Sphinx or MkDocs. This would have the advantage of letting us create and organize API documentation and tutorials/examples. I could set up something like http://rasbt.github.io/biopandas/ if you'd find it useful.