
Project Documentation Enhancement #32

Closed
rasbt opened this issue Nov 25, 2015 · 23 comments
@rasbt
Contributor

rasbt commented Nov 25, 2015

I was thinking that it may be worthwhile to set up a project documentation page outside of this GitHub repo -- for example, via Sphinx or MkDocs. This would have the advantage of letting us create & organize API documentation and tutorials/examples. I could set up something like http://rasbt.github.io/biopandas/ if you'd find it useful.

@rhiever
Contributor

rhiever commented Nov 25, 2015

What's the advantage over a standard README? How tough is it to maintain?


@rasbt
Contributor Author

rasbt commented Nov 25, 2015

Well, of course you can always put 'everything' into a README file as well, but depending on future additions, this README can become huge and user-unfriendly. I'd say it's the same reason why people don't build websites as one large HTML/text file ...
I think for larger projects, breaking the documentation down into logical sections (e.g., one document to list and describe version changes, one to document the API, and several for tutorials/examples) wouldn't hurt.
I think that a README file is still important, though; it should certainly contain the most important information about the project.

@rhiever
Contributor

rhiever commented Nov 25, 2015

Doesn't hurt to have the web page docs then. I don't think the project is
large enough to merit that yet, but we will probably get there soon.


@pronojitsaha
Contributor

Hi @rhiever & @rasbt, I am quite interested in and motivated by the possibilities and impact potential of TPOT. If possible, I would like to contribute, and I think starting with the project documentation would be a good fit, if you need the help. Looking forward to hearing from you.

Thanks.

@rasbt
Contributor Author

rasbt commented Nov 25, 2015

@rhiever Yes, I was also thinking more in terms of "in the long run." It would certainly help though to start early and document "as we go."

If we were to set up project documentation, we'd probably want to use a static site generator like Sphinx, MkDocs, or Jekyll. I think it's typical for Python projects to use Sphinx. It's really a neat tool, but it's also a pretty complex beast, and personally, I find the default themes really clunky and ugly. I think MkDocs would work just fine, and I don't see any disadvantage of using Markdown over the reStructuredText format.

Once it's set up, it's actually pretty easy to maintain:

  1. make a change in the markdown file(s)
  2. view the changes live via mkdocs serve
  3. build the HTML via mkdocs build --clean
  4. deploy the changes to GitHub Pages via mkdocs gh-deploy

That's basically it.

@rhiever
Contributor

rhiever commented Nov 25, 2015

I would be happy for you two to take the helm on establishing the project
docs. Once I get back on Monday, I'll be focusing on development again.


@rasbt
Contributor Author

rasbt commented Nov 25, 2015

@rhiever @pronojitsaha Alright, sounds like a plan. I suggested setting up the MkDocs framework with the API generator and such since I've done this for other projects already, but if @pronojitsaha wants to do it, that would be fine with me too. Just let me know so that we don't implement the same thing twice :).

@pronojitsaha
Contributor

@rhiever & @rasbt thanks.

@rasbt As you already have a similar framework in place, I don't believe it makes sense to reinvent the wheel! You could share the existing framework as a separate repository, and then we can decide on the structure and contribute individual pages as mutually agreed. Does that work for you?

@rasbt
Contributor Author

rasbt commented Dec 1, 2015

@rhiever @pronojitsaha Sorry for the late response, I took a few days off over the long Thanksgiving weekend. Unfortunately, I am in the midst of wrapping up a few research projects before I go on vacation in a few days, so I probably won't get to it before January. But setting up a basic framework via Sphinx or MkDocs should be pretty straightforward, I guess. The gplearn library is actually a nice, lean example: https://gplearn.readthedocs.org/en/latest/examples.html

I would suggest using the README file as a template; I think the goal of the documentation would be to have an "appealing" layout with convenient navigation for finding relevant information.
I think it will definitely pay off in the long run as the code base grows (regarding the API documentation), and likewise as the number of tutorials and examples grows.

Maybe I'd start with the following sections/pages (a possible mkdocs.yml layout is sketched after the list):

  • "Contributing" (basic GitHub instructions: filing issues, forking, and pull requests etc.)
  • "Version History" (keeping track of new features and changes over time)
  • "API documentation" i.e., auto-parsing the docstrings
  • "Installation"
  • "Tutorials" or "Examples"

@pronojitsaha
Contributor

@rasbt Hope you had a good Thanksgiving. Ok, I will look into it and set up the initial framework using MkDocs, which we can then work on together once you are available in January. Enjoy the vacation!

@rasbt
Contributor Author

rasbt commented Dec 1, 2015

@pronojitsaha Just got home and read your message; I thought: setting up the template literally just takes 10 minutes, let's do this ;). See pull request #35.

I basically just pasted the sections from the README file for now; you can see it live at
http://rasbt.github.io/tpot/

(If you fetch or merge it, you can view it locally by running mkdocs serve from the docs/source directory -- by default it's served at http://127.0.0.1:8000/.)

So, I guess I'll take a look at the API documentation in January then, but I wanted to set this up so that you guys can maybe write the rest of the documentation and come up with some more examples and tutorials in the meantime.

@pronojitsaha
Contributor

@rasbt Ok, great! Will delve into it further.

@rhiever
Contributor

rhiever commented Dec 2, 2015

Thanks for the great start on these docs. I've merged #35.

@rhiever
Contributor

rhiever commented Dec 2, 2015

@rasbt, I've been updating the docs for the new export functionality and it takes double the work to update both the README and the docs. Any recommendations to avoid this duplication of labor?

@pronojitsaha, now that we have docs up and running, I can think of a couple things that would be invaluable at this point:

  1. Not all of the public TPOT functions are thoroughly documented. fit, score, and export in particular need more documentation, since those are the primary functions that people will be using. Currently we have a basic example of using them in the README, but it'd be great to expand on those docs and go into detail on what each function -- and each parameter of each function -- does (see the docstring sketch after this list).

  2. More examples are always welcome! Currently we only have the MNIST example from sklearn, but it'd be great to provide code examples for many different types of data sets.
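
For example, a NumPy-style docstring for fit could look roughly like this -- the parameter names here are a sketch, not necessarily the actual signature:

    def fit(self, features, classes):
        """Fit an optimized machine learning pipeline to the training data.

        Parameters
        ----------
        features : array-like {n_samples, n_features}
            Feature matrix of the training set
        classes : array-like {n_samples}
            Class labels of the training set

        Returns
        -------
        None
        """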

@rasbt
Contributor Author

rasbt commented Dec 2, 2015

@rhiever I'd recommend not cramming too much into the README file but focusing on the "essentials": an overview, a quick example, installation, license info, and short contributing notes. I would insert an "Important links" section at the top pointing to the actual documentation.
Otherwise, I'd suggest just assembling the README.md from the docs pages, e.g.,

cat index.md installation.md contributing.md MNIST_Example.md ... > README.md
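
Or, if a plain cat ever feels too fragile, a small Python sketch of the same idea -- the page list and docs directory below are placeholders, adjust to the actual layout:

    # assemble README.md from selected docs pages
    pages = ["index.md", "installation.md", "contributing.md", "MNIST_Example.md"]

    with open("README.md", "w") as readme:
        for page in pages:
            with open("docs/sources/" + page) as source:
                readme.write(source.read() + "\n")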

@pronojitsaha
Contributor

@rhiever Ok, I will look into the two points. I understand that at this point we have only implemented classification tasks, so the following are a few data sets I have in mind for examples; please let me know your views:

  1. Iris Dataset
  2. Titanic Dataset
  3. Lending Club Data
  4. Facial Keypoint Detection
  5. Forest Cover Type Dataset

However, hardware is a challenge, as increasing data set sizes will slow down TPOT considerably and increase the time involved. This also applies to #41 for unit testing. As such, have you thought about using EC2 instances for this project, or any other alternative to account for this?

@UniqueFool

hardware is a challenge as increasing data set sizes will slow down TPOT considerably

FWIW, other Python-based GP projects tend to use OpenCL/PyOpenCL to make better use of dedicated CPU/GPU and FPGA resources. In fact, a number are even using CUDA (which is NVIDIA-specific).

@rhiever
Contributor

rhiever commented Dec 7, 2015

For now, I think we'll stick to smaller data sets (e.g., the sklearn MNIST subset) for the examples in the docs, i.e., examples that can be run end-to-end in less than 10 minutes. I wouldn't want to require the user to fire up an EC2 instance or hop on an HPCC to run a basic TPOT example.
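
Something along these lines, for instance, finishes within minutes on the sklearn digits subset -- a sketch only, the exact TPOT signatures may differ:

    from tpot import TPOT
    from sklearn.datasets import load_digits
    from sklearn.cross_validation import train_test_split

    # the sklearn digits set is a small MNIST-like subset
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, train_size=0.75)

    tpot = TPOT(generations=5)  # few generations keeps the demo fast
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))       # accuracy on the holdout set
    tpot.export('tpot_digits_pipeline.py')  # export the best pipeline as code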

However, for some use cases it may take several hours to run TPOT -- especially with large data sets -- and I think it would be a good idea to note that in the docs. Perhaps in an "Expectations for TPOT" section of the docs?

@UniqueFool

Note that OpenCL is just an abstraction mechanism, i.e., the underlying "kernels" (C-like code) will work on CPUs, GPUs, and FPGA hardware.
Wrappers like pyopencl even hide the nitty-gritty details and expose all this flexibility to scripting space, which means that a Python script can implement heavy algorithms as "kernels" that will automatically make use of dedicated hardware if available.
The only real issue is that OpenCL does not currently lend itself to clustering/distribution.
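
To make the kernel idea concrete, here is a minimal PyOpenCL sketch (element-wise vector addition); the same code runs on whatever OpenCL device is available, CPU or GPU:

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()  # picks an available OpenCL device
    queue = cl.CommandQueue(ctx)

    a = np.random.rand(50000).astype(np.float32)
    b = np.random.rand(50000).astype(np.float32)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # the C-like "kernel": one work-item per vector element
    prg = cl.Program(ctx, """
    __kernel void add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
        int gid = get_global_id(0);
        out[gid] = a[gid] + b[gid];
    }
    """).build()

    prg.add(queue, a.shape, None, a_buf, b_buf, out_buf)
    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)  # copy the result back to the host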

Since you mention MNIST: I suggest running a corresponding Google search; there are a number of examples where using the GPU instead of the CPU (via OpenCL/CUDA) provided a ~100x speedup on the MNIST dataset, e.g., see: http://corpocrat.com/2014/11/09/running-a-neural-network-in-gpu/ (note that this example also uses Python and sklearn)

http://www.cs.berkeley.edu/~demmel/cs267_Spr11/Lectures/CatanzaroIntroToGPUs.pdf

@pronojitsaha
Contributor

@rhiever Ok, I think it makes sense to work on small data sets/subsets for now and focus more on the implementation of the examples. Will look into it.

@bartleyn
Contributor

Anyone working on documenting the pipeline operators and public functions? I've made some significant headway on it, but want to make sure I'm not duplicating labor.

@pronojitsaha
Contributor

Hi @bartleyn, I am not working on those at the moment.

@rhiever
Contributor

rhiever commented Jan 18, 2016

PR #71 is related and still in review (will get to it soon, promise -- I'm back from vacation now), but otherwise I believe that's the only pending change to the docs.
