Added 'testseq' function #1262

Adil-Iqbal · 2017-05-27T17:01:17Z

Hopefully resolved all of the Travis CI style conflicts.

This function (and all of its helper functions/classes) have been instrumental in demonstrating the Biopython module to others. My hope is to pass this utility on to others and help them as well.

Very open to comments and criticisms. And will support my work for the foreseeable future.

Adil-Iqbal · 2017-05-27T17:37:55Z

I've reviewed the Travis CI test failures and they all occurred during the docstring tests.

Due to the random nature of the sequence generation, no two occurences will be exactly alike.
If you take the time to look at the reports, all of the Travis outputs serve to prove all of the concepts discussed in the docstring.

It would mean a lot to me if you would help me figure out how to circumvent this situation.

Thanks,
Adil.

Adil-Iqbal · 2017-05-27T17:58:45Z

Alright. I've decided to add a random seed to the function.

The user will have the option of changing the seed as a function argument.

I will also play around with having a global Boolean variable that the user can change to revert the function's behavior to what it is currently but it will require some testing.

Thanks,
Adil

peterjc · 2017-05-29T10:05:49Z

As you have noticed, random examples are not well suited to doctests.

I'm wondering if rather than adding a testseq function to Bio/SeqUtils/__init__.py this would be better under Scripts/ or simply in the main Tutorial?

(Note the Tutorial is currently written in LaTeX with special comments for doctests which are run via Tests/test_Tutorial.py)

I think we should ask for more opinions on the Biopython(-dev) mailing list - are you subscribed to those? http://biopython.org/wiki/Mailing_lists

- added global seed variable 'shuffle_seed' - added argument 'rand_seed' - added argument 'shuffle' - updated docstrings to seeded outputs - updated docstrings to include new arguments

andrewguy · 2017-05-30T03:03:20Z

Just a couple of comments.

While I think that this is a good idea for the purpose of education, I don't know if there is any obvious utility in everyday research use. Happy to hear otherwise - I haven't given this too much thought yet. As such, I think I agree with @peterjc that this would be better off somewhere other than the main SeqUtils module. Or perhaps we keep it under SeqUtils, but don't import it by default?

A few comments on the code itself. I'd be really hesitant to introduce a global variable (shuffle_seed) unless you really, really need to. The whole approach to setting a random seed is unnecessarily complicated. Just set a seed the first time you use the function, and everything from that point will be repeatable (and negating the need for the shuffle argument). If you want to retrieve the same sequence twice, just use the same seed in both calls. Just to illustrate how this works:

>>> import random
>>> random.seed(1234)
>>> random.random()
0.9664535356921388

Without providing another seed, we get a different number the second time we call random()

>>> random.random()
0.4407325991753527

Resetting the seed reproduces the same sequence of random numbers

>>> random.seed(1234)
>>> random.random()
0.9664535356921388
>>> random.random()
0.4407325991753527

Given this, I think you can reduce your random seeding code down to:

if rand_seed is not None:
    seed(rand_seed)

Set the default random seed to 0. No need to use an arbitrary number such as 9001. And you can then drop the shuffle argument.

One other comment from my cursory look at this - if a user sets gc_target > 100 or < 0, wouldn't it make more sense to raise a warning or exception, rather than silently set the GC target to the upper or lower bound? In this scenario it is likely that the user doesn't really understand what the gc_target represents, and hence should be alerted to the issue.

Happy to give this a more thorough review if there is consensus that this is a useful addition to the code-base.

Edit: You need to set the default random seed to None, not 0. Sorry.

andrewguy · 2017-05-30T03:22:29Z

For random number generation, an even better idea would be to instantiate a local version of the random.Random class, so any seeding that happens in your function doesn't affect other parts of the code (i.e. other modules!) which use the default random instance.

andrewguy · 2017-06-01T12:20:56Z

Hi Adil,

I've just seen your message on the dev mailing list.

With regards to the random number generation, I would put the rand_instance = Random() outside your functions definitions. This should solve all the issues that you have with seeding. As you've got it currently, you are creating a new instance each time you call the function. You want a single instance created when you load the module, which you can then seed from within the main testseq function. Let me know if that's unclear and I will do a pull request on your repo to show what I mean.

Within the function body you can then reduce it to:

if rand_seed is not None:
    rand_instance.seed(rand_seed)

No need to convert the seed to an integer, as Python uses the hash of the provided seed, so the seed can really be any hashable object.

Adil-Iqbal · 2017-06-10T20:10:02Z

See Pull Request #1269
Closing pull request.

Adil-Iqbal added 3 commits May 27, 2017 10:46

First draft

4efe934

Resolved (hopefully all of the) Travis CI style conflicts.

8f3540a

Resolved MORE Travis CI style conflicts.

276ee67

peterjc mentioned this pull request May 29, 2017

"Testseq" Function #1261

Closed

peterjc added Documentation Enhancement labels May 29, 2017

Added seeded PRNG

7f4429c

- added global seed variable 'shuffle_seed' - added argument 'rand_seed' - added argument 'shuffle' - updated docstrings to seeded outputs - updated docstrings to include new arguments

Adil-Iqbal closed this Jun 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added 'testseq' function #1262

Added 'testseq' function #1262

Adil-Iqbal commented May 27, 2017

Adil-Iqbal commented May 27, 2017

Adil-Iqbal commented May 27, 2017

peterjc commented May 29, 2017

andrewguy commented May 30, 2017 •

edited

andrewguy commented May 30, 2017

andrewguy commented Jun 1, 2017

Adil-Iqbal commented Jun 10, 2017

Added 'testseq' function #1262

Added 'testseq' function #1262

Conversation

Adil-Iqbal commented May 27, 2017

Adil-Iqbal commented May 27, 2017

Adil-Iqbal commented May 27, 2017

peterjc commented May 29, 2017

andrewguy commented May 30, 2017 • edited

andrewguy commented May 30, 2017

andrewguy commented Jun 1, 2017

Adil-Iqbal commented Jun 10, 2017

andrewguy commented May 30, 2017 •

edited