BF: Fix get_text_rendering when Citation is passed with Doi #46

Merged: 7 commits, Sep 3, 2015

@mvdoc mvdoc commented Aug 15, 2015

We're getting there. However, there is still a problem with the injector: spectral clustering should get two citations (see the report below), but only one appears, probably because of how they're stored. Will look into that.

(duecredit)contematto@talete ~/github/scikit-learn/examples/cluster (master*) $ python -m duecredit plot_cluster_comparison.py

=========================================================
Comparing different clustering algorithms on toy datasets
=========================================================

This example aims at showing characteristics of different
clustering algorithms on datasets that are "interesting"
but still in 2D. The last dataset is an example of a 'null'
situation for clustering: the data is homogeneous, and
there is no good clustering.

While these examples give some intuition about the algorithms,
this intuition might not apply to very high dimensional data.

The results could be improved by tweaking the parameters for
each clustering strategy, for instance setting the number of
clusters for the methods that needs this parameter
specified. Note that affinity propagation has a tendency to
create many clusters. Thus in this example its two parameters
(damping and per-point preference) were set to to mitigate this
behavior.

/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/sklearn/manifold/spectral_embedding_.py:217: UserWarning: Graph is not fully connected, spectral embedding may not work as expected.
  warnings.warn("Graph is not fully connected, spectral embedding"
/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/sklearn/cluster/hierarchical.py:207: UserWarning: the number of connected components of the connectivity matrix is 2 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_components = _fix_connectivity(X, connectivity)
/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/sklearn/cluster/hierarchical.py:443: UserWarning: the number of connected components of the connectivity matrix is 2 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_components = _fix_connectivity(X, connectivity)
/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/sklearn/cluster/hierarchical.py:207: UserWarning: the number of connected components of the connectivity matrix is 3 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_components = _fix_connectivity(X, connectivity)
/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/sklearn/cluster/hierarchical.py:443: UserWarning: the number of connected components of the connectivity matrix is 3 > 1. Completing it to avoid stopping the tree early.
  connectivity, n_components = _fix_connectivity(X, connectivity)

DueCredit Report:
- scipy (v 0.14) [1]
- sklearn (v 0.17.dev0) [2]
  - sklearn.cluster.affinity_propagation_ (v 0.17.dev0) [3]
  - sklearn.cluster.dbscan_:dbscan (v 0.17.dev0) [4]
  - sklearn.cluster.spectral:spectral_clustering (v 0.17.dev0) [5]

2 packages cited
1 modules cited
2 functions cited

References
----------

[1] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[2] Pedregosa, F. et al., 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, pp.2825–2830.
[3] Frey, B.J. & Dueck, D., 2007. Clustering by Passing Messages Between Data Points. Science, 315(5814), pp.972–976.
[4] Ester, M. et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise.. In Kdd. pp. 226–231.
[5] von Luxburg, U., 2007. A tutorial on spectral clustering. Stat Comput, 17(4), pp.395–416.

citation_bibtex = Citation(_sample_bibtex, path='mypath')
citation_doi = Citation(_sample_doi, path='mypath')
# smoke test
assert_true(get_text_rendering(citation_bibtex))
Member

why just a smoke test? ideally mock out format_bibtex and verify that it gets called in both cases. and you could even verify that it gets the correct input each time
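
A minimal sketch of the kind of check suggested here, not duecredit's actual code. Everything in it is a hypothetical stand-in defined only for illustration: the `BibTeX`, `Doi`, and `Citation` classes, the `io_stub` namespace with its `format_bibtex` and `import_doi` functions, and the body of `get_text_rendering`. It also assumes Python 3's `unittest.mock`, whereas this PR ran on Python 2.7 with the standalone `mock` package.

```python
from types import SimpleNamespace
from unittest import mock


class BibTeX:
    """Hypothetical stand-in for an entry holding raw BibTeX text."""
    def __init__(self, rawentry):
        self.rawentry = rawentry


class Doi:
    """Hypothetical stand-in for an entry holding a DOI string."""
    def __init__(self, doi):
        self.doi = doi


class Citation:
    """Hypothetical stand-in: just bundles an entry with a path."""
    def __init__(self, entry, path):
        self.entry = entry
        self.path = path


def _format_bibtex(entry):
    return "formatted " + entry.rawentry


def _import_doi(doi):
    raise RuntimeError("would hit the network; patched out in the test")


# Plays the role of the module whose functions get patched.
io_stub = SimpleNamespace(format_bibtex=_format_bibtex, import_doi=_import_doi)


def get_text_rendering(citation):
    entry = citation.entry
    if isinstance(entry, Doi):
        # Resolve the DOI to BibTeX first, then render it like any other entry.
        entry = BibTeX(io_stub.import_doi(entry.doi))
    return io_stub.format_bibtex(entry)


def check_both_cases():
    """Return the raw BibTeX text the mocked formatter received in each case."""
    bibtex_citation = Citation(BibTeX("@misc{a, title={A}}"), path="mypath")
    doi_citation = Citation(Doi("10.1000/example"), path="mypath")
    with mock.patch.object(io_stub, "format_bibtex", return_value="text") as fmt, \
         mock.patch.object(io_stub, "import_doi", return_value="@misc{b, title={B}}"):
        get_text_rendering(bibtex_citation)
        # The very same entry object must reach the formatter.
        fmt.assert_called_with(bibtex_citation.entry)
        get_text_rendering(doi_citation)
        # Each recorded call is an (args, kwargs) pair; take the first positional arg.
        seen = [call[0][0].rawentry for call in fmt.call_args_list]
    return seen
```

Patching the attribute on the object that the code under test looks it up on (here `io_stub`) is what makes the mock take effect; patching a different reference to the same function would leave the call untouched.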

Member Author

ok -- we should talk about patches. I tried but can't seem to make it work. It does nothing.

@patch('duecredit.io.format_bibtex')
def test_get_text_rendering(mocked_format_bibtex):
    citation_bibtex = Citation(_sample_bibtex, path='mypath')
    citation_doi = Citation(_sample_doi, path='mypath')
    # smoke test
    assert_true(get_text_rendering(citation_bibtex))
    mocked_format_bibtex.assert_called_with(citation_bibtex.entry)
    assert_true(get_text_rendering(citation_doi))

this won't work (assertion fails because mocked_format_bibtex wasn't called). hints?

Member

On Sat, 15 Aug 2015, Matteo Visconti di Oleggio Castello wrote:

> this won't work. hints?

because there is a bug! ;) aren't tests great? ;) although the bug is only in
the test itself.

I first thought that it is due to some intricacies of mock, so you might
like to read
http://alexmarandon.com/articles/python_mock_gotchas/
http://www.voidspace.org.uk/python/mock/patch.html#where-to-patch

but then, after I added

import pdb; pdb.set_trace()

right before

assert_true(get_text_rendering(citation_bibtex))

it showed that citation.entry is not BibTeX, it is a str! So we have a somewhat
"ambiguous" API here for the Citation -- entry could be a string key or an
actual DueCreditEntry. Would be nice to add some safeguard I guess, e.g. throw
a warning if the provided entry is a string and has newlines in it (unlikely we
should allow keys to have newlines). Or maybe we should make it more
explicit? (make it have two separate parameters -- entry and key)

anyways -- fix here would be

citation_bibtex = Citation(BibTeX(_sample_bibtex), path='mypath')

and then failure would be different and more meaningful ;)
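
The safeguard floated above could look roughly like this. It is only a sketch under assumptions: the `Citation` class here is a hypothetical stand-in rather than duecredit's, `count_warnings` is a helper made up for the demonstration, and the choice of `UserWarning` and the message wording are illustrative. On Python 2.7 the string check would also need to cover `unicode`.

```python
import warnings


class Citation:
    """Hypothetical stand-in: `entry` is either an entry object or a string key."""

    def __init__(self, entry, path):
        if isinstance(entry, str) and "\n" in entry:
            # A key should never span several lines, so this is most likely
            # raw BibTeX that the caller forgot to wrap in an entry object.
            warnings.warn(
                "entry is a multi-line string; did you mean to pass BibTeX(...)?",
                UserWarning,
                stacklevel=2,
            )
        self.entry = entry
        self.path = path


def count_warnings(entry):
    """Build a Citation and report how many warnings that triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        Citation(entry, path="mypath")
    return len(caught)
```

With this in place, passing a raw multi-line BibTeX string warns at construction time, while a short key such as `"some_key"` stays silent.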

Member Author

oh, the things one learns! how sneaky this was...thanks :-)

path=citation.path,
version=citation.version,
cite_module=citation.cite_module,
tags=citation.tags)
Member

future todo -- I don't like that this 'cloning' happens outside of the Citation code. Maybe we should make entry a property with a setter, and then just copy the old citation (copy.copy) into a new one and override entry with the bibtex one.
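
One way that refactoring might look, as a sketch only. The `Citation` class and its validation rule are hypothetical stand-ins (the attribute names are borrowed from the snippet under review), and `with_entry` is a name made up for this example.

```python
import copy


class Citation:
    """Hypothetical stand-in carrying the attributes seen in the snippet above."""

    def __init__(self, entry, path, version=None, cite_module=False, tags=None):
        self.entry = entry  # goes through the setter below
        self.path = path
        self.version = version
        self.cite_module = cite_module
        self.tags = tags

    @property
    def entry(self):
        return self._entry

    @entry.setter
    def entry(self, value):
        # A single place to validate or normalise entries in the future.
        if value is None:
            raise ValueError("entry must not be None")
        self._entry = value

    def with_entry(self, new_entry):
        """Return a shallow copy of this citation with only `entry` replaced."""
        clone = copy.copy(self)
        clone.entry = new_entry
        return clone


# Usage: swap the entry while every other attribute is carried over.
original = Citation("10.1000/example", path="mypath", version="0.1", tags=["impl"])
converted = original.with_entry("@misc{b, title={B}}")
```

Because `copy.copy` is shallow, `converted.tags` and `original.tags` refer to the same list; `copy.deepcopy` would be the alternative if the clone needs to be modified independently.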

@yarikoptic

I keep coming back to this PR, always thinking we have already merged it ;) please resolve conflicts (and maybe address my comment) and let's get it merged

* origin/master:
  BF: print description, not just path. Closes duecredit#49
  ENH/NF: all conditions must be met ('and' logic) + allow access values from attributes of the arguments
  RF: improve test, conditioning of the method
  ENH: few more injections for sklearn
  BF: catch also AttributeError while getting to the object
  ENH: few more injections for sklearn
  DOC: provide custom short long_description
  adjusting for -m python
  RF: move external versions into its separate submodule

 Conflicts:
	duecredit/tests/test_collector.py

Accept remote

mvdoc commented Sep 2, 2015

Done (but why is travis not running?)

@yarikoptic

not sure about travis -- could you push 1 more change to this PR to see if it still doesn't react?


mvdoc commented Sep 3, 2015

OK travis got back on track

yarikoptic added a commit that referenced this pull request Sep 3, 2015
BF: Fix get_text_rendering when Citation is passed with Doi
@yarikoptic yarikoptic merged commit 72a78bc into duecredit:master Sep 3, 2015
@yarikoptic

and a new release was tagged ;-)

@mvdoc mvdoc deleted the bf/citationdoi branch September 3, 2015 14:10