Merge pull request #1 from ogrisel/pberkes-mldata
pep8 fixes
pberkes committed May 24, 2011
2 parents e4dee4f + 81bcf5a commit 4ec4a02
Showing 239 changed files with 32,451 additions and 10,892 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@
.DS_Store
build
scikits/learn/datasets/__config__.py
scikits/learn/**/*.html

dist/
doc/_build/
12 changes: 11 additions & 1 deletion .mailmap
@@ -1,12 +1,17 @@
Gael Varoquaux <gael.varoquaux@normalesup.org> gvaroquaux <gael.varoquaux@normalesup.org>
Gael Varoquaux <gael.varoquaux@normalesup.org> Gael varoquaux <gael.varoquaux@normalesup.org>
Gael Varoquaux <gael.varoquaux@normalesup.org> GaelVaroquaux <gael.varoquaux@normalesup.org>
Olivier Grisel <olivier.grisel@ensta.org> ogrisel <olivier.grisel@ensta.org>
Alexandre Gramfort <alexandre.gramfort@inria.fr> Alexandre Gramfort <alexandre.gramfort@gmail.com>
Matthieu Perrot <matthieu.perrot@cea.fr> Matthieu Perrot <revilyo@earth.(none)>
Matthieu Perrot <matthieu.perrot@cea.fr> revilyo <revilyo@earth.(none)>
Vincent Michel <vincent.michel@inria.fr> vincent <vincent@vincent.org>
Vincent Michel <vincent.michel@inria.fr> vincent <vincent@axon.(none)>
Vincent Michel <vincent.michel@inria.fr> vincent M <vm.michel@gmail.com>
Vincent Michel <vincent.michel@inria.fr> Vincent M <vm.michel@gmail.com>
Vincent Michel <vincent.michel@inria.fr> Vincent Michel <vincent.michel@logilab.fr>
Vincent Michel <vincent.michel@inria.fr> Vincent M <vincent.michel@logilab.fr>
Vincent Michel <vincent.michel@inria.fr> Vincent michel <vmic@crater2.logilab.fr>
Ariel Rokem <arokem@berkeley.edu> arokem <arokem@berkeley.edu>
Bertrand Thirion <bertrand.thirion@inria.fr> bthirion <bertrand.thirion@inria.fr>
Peter Prettenhofer <peter.prettenhofer@gmail.com> pprett <peter.prettenhofer@gmail.com>
@@ -15,4 +20,9 @@ Vincent Dubourg <vincent.dubourg@gmail.com> dubourg <vincent.dubourg@gmail.com>
Vincent Dubourg <vincent.dubourg@gmail.com> dubourg <dubourg@PTlami14.(none)>
Christian Osendorfer <osendorf@gmail.com> osdf <osendorf@gmail.com>
James Bergstra <james.bergstra@gmail.com> james.bergstra <james.bergstra@gmail.com>
Xinfan Meng <mxf3306@gmail.com> mxf <mxf@chomsky.localdomain>
Xinfan Meng <mxf3306@gmail.com> mxf <mxf@chomsky.localdomain>
Jan Schlüter <scikit-learn@jan-schlueter.de> f0k <scikit-learn@jan-schlueter.de>
Vlad Niculae <vlad@vene.ro> vene <vlad@vene.ro>
Andreas Müller <amueller@ais.uni-bonn.de> amueller <amueller@ais.uni-bonn.de>
Virgile Fritsch <virgile.fritsch@gmail.com> VirgileFritsch <virgile.fritsch@gmail.com>
Virgile Fritsch <virgile.fritsch@gmail.com> Virgile <virgile.fritsch@gmail.com>
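Each `.mailmap` line above follows git's `Proper Name <proper@email> Commit Name <commit@email>` form: the second pair is the identity as recorded in commits, the first is the canonical identity it should be reported as. The Python 3 sketch below shows that lookup. It is a simplified, hypothetical helper (`parse_mailmap` and `canonical` are not git or scikits.learn APIs): it handles only the four-field form used in this file, compares emails case-insensitively, and ignores git's shorter entry forms.

```python
import re
from typing import Dict, Tuple

Identity = Tuple[str, str]  # (name, email)

# "Proper Name <proper@email> Commit Name <commit@email>"
MAILMAP_LINE = re.compile(
    r"^\s*(?P<proper_name>[^<]*?)\s*<(?P<proper_email>[^>]+)>"
    r"\s*(?P<commit_name>[^<]*?)\s*<(?P<commit_email>[^>]+)>\s*$"
)


def parse_mailmap(text: str) -> Dict[Identity, Identity]:
    """Map (commit name, lower-cased commit email) to (proper name, proper email)."""
    mapping: Dict[Identity, Identity] = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        match = MAILMAP_LINE.match(line)
        if match is None:
            continue  # shorter mailmap forms are not handled in this sketch
        key = (match["commit_name"], match["commit_email"].lower())
        mapping[key] = (match["proper_name"], match["proper_email"])
    return mapping


def canonical(mapping: Dict[Identity, Identity], name: str, email: str) -> Identity:
    """Return the canonical identity, or the input unchanged if it is not mapped."""
    return mapping.get((name, email.lower()), (name, email))


sample = """\
Gael Varoquaux <gael.varoquaux@normalesup.org> gvaroquaux <gael.varoquaux@normalesup.org>
Vlad Niculae <vlad@vene.ro> vene <vlad@vene.ro>
"""
mailmap = parse_mailmap(sample)
print(canonical(mailmap, "vene", "vlad@vene.ro"))  # ('Vlad Niculae', 'vlad@vene.ro')
```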
60 changes: 29 additions & 31 deletions AUTHORS.rst
@@ -20,9 +20,9 @@ People
------


* David Cournapeau, 2007-2009
* David Cournapeau

* Fred Mailhot, 2008, Artificial Neural Networks (ann) module.
* Fred Mailhot

* David Cooke

@@ -32,64 +32,62 @@ People

* Ed Schofield

* Eric Jones, 2008, Genetic Algorithms (ga) module. No longer part
  of scikits.learn.
* Eric Jones

* Jarrod Millman

* `Matthieu Brucher <http://matt.eifelle.com/>`_ contributed the
  manifold module. It is not currently part of the scikit, although
  it is planned to be newly included in the upcoming 0.7 release.
* `Matthieu Brucher <http://matt.eifelle.com/>`_

* Travis Oliphant

* Pearu Peterson

* `Fabian Pedregosa <http://fseoane.net/blog/>`_ joined the project
  in January 2010 and is the current maintainer.
* `Fabian Pedregosa <http://fseoane.net/blog/>`_ (maintainer)

* `Gael Varoquaux <http://gael-varoquaux.info/blog/>`_

* `Jake VanderPlas <http://www.astro.washington.edu/users/vanderplas/>`_
  contributed the BallTree module in February 2010.
* `Jake VanderPlas <http://www.astro.washington.edu/users/vanderplas/>`_

* `Alexandre Gramfort
  <http://www-sop.inria.fr/members/Alexandre.Gramfort/index.fr.html>`_

* `Olivier Grisel <http://twitter.com/ogrisel>`_

* Vincent Michel.
* Bertrand Thirion

* Vincent Michel

* Chris Filo Gorgolewski

* `Angel Soler Gollonet <http://webylimonada.com>`_ contributed the
  official logo and web page layout.
* `Angel Soler Gollonet <http://webylimonada.com>`_

* `Yaroslav Halchenko <http://www.onerussian.com/>`_ is the
  maintainer for Debian OS and has contributed several fixes.
* `Yaroslav Halchenko <http://www.onerussian.com/>`_

* Ron Weiss joined the project in July 2010 and contributed both the
  mixture and hmm module.
* Ron Weiss

* `Virgile Fritsch
  <http://parietal.saclay.inria.fr/Members/virgile-fritsch>`_. Bug
  fixes.
  <http://parietal.saclay.inria.fr/Members/virgile-fritsch>`_

* `Mathieu Blondel <http://mblondel.org/journal>`_ joined the
  project in September 2010 and has worked since on the sparse
  matrix support, Ridge generalized crossval, text feature
  extraction and general bug fixes.
* `Mathieu Blondel <http://mblondel.org>`_

* `Peter Prettenhofer
  <http://sites.google.com/site/peterprettenhofer/>`_ joined the
  project in October 2010 and contributed the :ref:`sgd` module as
  well as several examples and fixes.
  <http://sites.google.com/site/peterprettenhofer/>`_

* Vincent Dubourg

* `Alexandre Passos <http://atpassos.posterous.com>`_

* `Vlad Niculae <http://vene.ro>`_

* Edouard Duchesnay

* Thouis (Ray) Jones

* Lars Buitinck

* Vincent Dubourg joined the project in November 2010 and
  contributed the :ref:`gaussian_process` module.
* Paolo Losi

* `Alexandre Passos <http://atpassos.posterous.com>`_ joined the
  project in November 2010 and contributed the fast SVD variant.
* Nelle Varoquaux


If I forgot anyone, do not hesitate to send me an email to
57 changes: 19 additions & 38 deletions README.rst
@@ -3,8 +3,8 @@
About
=====

scikits.learn is a python module for machine learning built on top of
scipy.
scikits.learn is a Python module for machine learning built on top of
SciPy.

The project was started in 2007 by David Cournapeau as a Google Summer
of Code project, and since then many volunteers have contributed. See
@@ -13,47 +13,38 @@ the AUTHORS.rst file for a complete list of contributors.
It is currently maintained by a team of volunteers.


Download
========

You can download source code and Windows binaries from SourceForge:

http://sourceforge.net/projects/scikit-learn/files/
Important links
===============

- Official source code repo: https://github.com/scikit-learn/scikit-learn
- HTML documentation (stable release): http://scikit-learn.sourceforge.net/
- HTML documentation (development version): http://scikit-learn.sourceforge.net/dev/
- Download releases: http://sourceforge.net/projects/scikit-learn/files/
- Issue tracker: https://github.com/scikit-learn/scikit-learn/issues
- Mailing list: https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
- IRC channel: ``#scikit-learn`` at ``irc.freenode.net``

Dependencies
============

The required dependencies to build the software are python >= 2.5,
setuptools, NumPy >= 1.2, SciPy >= 0.7 and a working C++ compiler.
The required dependencies to build the software are Python >= 2.5,
setuptools, Numpy >= 1.2, SciPy >= 0.7 and a working C++ compiler.

To run the tests you will also need nose >= 0.10.


Install
=======

This packages uses distutils, which is the default way of installing
python modules. The install command is::

    python setup.py install


Mailing list
============

There's a general and development mailing list, visit
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general to
subscribe to the mailing list.
This package uses distutils, which is the default way of installing
python modules. To install in your home directory, use::

    python setup.py install --home

IRC channel
===========
To install for all users on Unix/Linux::

Some developers tend to hang around the channel ``#scikit-learn``
at ``irc.freenode.net``, especially during the week preparing a new
release. If nobody is available to answer your questions there don't
hesitate to ask it on the mailing list to reach a wider audience.
    python setup.py build
    sudo python setup.py install


Development
@@ -73,13 +64,6 @@ or if you have write privileges::

    git clone git@github.com:scikit-learn/scikit-learn.git

Bugs
----

Please submit bugs you might encounter, as well as patches and feature
requests to the tracker located at github
https://github.com/scikit-learn/scikit-learn/issues


Testing
-------
@@ -91,6 +75,3 @@ source directory (you will need to have nosetest installed)::

See web page http://scikit-learn.sourceforge.net/install.html#testing
for more information.
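The minimum versions listed under Dependencies (Python >= 2.5, NumPy >= 1.2, SciPy >= 0.7, nose >= 0.10) can be checked programmatically. The sketch below is a hypothetical Python 3 helper, not part of scikits.learn. It compares the leading numeric components as integer tuples, because comparing version strings directly would rank `"0.9"` above `"0.10"`; suffixes such as `rc1` are simply dropped, which is an assumption of this sketch.

```python
import re
from typing import Dict, Tuple

# Minimum versions taken from the README's Dependencies section.
MINIMUMS: Dict[str, Tuple[int, ...]] = {
    "Python": (2, 5),
    "NumPy": (1, 2),
    "SciPy": (0, 7),
    "nose": (0, 10),
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.6.0rc1' into (1, 6, 0) by keeping the leading numeric components."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
        if digits.end() != len(piece):
            break  # stop after a component with a suffix such as 'rc1' or 'dev'
    return tuple(parts)


def meets_minimum(package: str, installed: str) -> bool:
    """True if the installed version is at least the README's minimum."""
    return version_tuple(installed) >= MINIMUMS[package]


if __name__ == "__main__":
    import platform
    print(meets_minimum("Python", platform.python_version()))
```

For NumPy or SciPy, the installed version string is available as `numpy.__version__` and `scipy.__version__`.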



143 changes: 143 additions & 0 deletions benchmarks/bench_plot_fastkmeans.py
@@ -0,0 +1,143 @@
import gc
from time import time
import sys

from collections import defaultdict

import numpy as np
from numpy import random as nr

from scikits.learn.cluster.k_means_ import KMeans, MiniBatchKMeans


def compute_bench(samples_range, features_range):

    it = 0
    iterations = 200
    results = defaultdict(lambda: [])
    chunk = 100

    max_it = len(samples_range) * len(features_range)
    for n_samples in samples_range:
        for n_features in features_range:
            it += 1
            print '=============================='
            print 'Iteration %03d of %03d' %(it, max_it)
            print '=============================='
            print ''
            data = nr.random_integers(-50, 50, (n_samples, n_features))

            print 'K-Means'
            tstart = time()
            kmeans = KMeans(init='k-means++',
                            k=10).fit(data)

            delta = time() - tstart
            print "Speed: %0.3fs" % delta
            print "Inertia: %0.5f" % kmeans.inertia_
            print ''

            results['kmeans_speed'].append(delta)
            results['kmeans_quality'].append(kmeans.inertia_)

            print 'Fast K-Means'
            # let's prepare the data in small chunks
            mbkmeans = MiniBatchKMeans(init='k-means++',
                                      k=10,
                                      chunk_size=chunk)
            tstart = time()
            mbkmeans.fit(data)
            delta = time() - tstart
            print "Speed: %0.3fs" % delta
            print "Inertia: %f" % mbkmeans.inertia_
            print ''
            print ''

            results['minibatchkmeans_speed'].append(delta)
            results['minibatchkmeans_quality'].append(mbkmeans.inertia_)

    return results


def compute_bench_2(chunks):
    results = defaultdict(lambda: [])
    n_features = 50000
    means = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1],
                      [0.5, 0.5], [0.75, -0.5], [-1, 0.75], [1, 0]])
    X = np.empty((0, 2))
    for i in xrange(8):
        X = np.r_[X, means[i] + 0.8 * np.random.randn(n_features, 2)]
    max_it = len(chunks)
    it = 0
    for chunk in chunks:
        it += 1
        print '=============================='
        print 'Iteration %03d of %03d' %(it, max_it)
        print '=============================='
        print ''

        print 'Fast K-Means'
        tstart = time()
        mbkmeans = MiniBatchKMeans(init='k-means++',
                                  k=8,
                                  chunk_size=chunk)

        mbkmeans.fit(X)
        delta = time() - tstart
        print "Speed: %0.3fs" % delta
        print "Inertia: %0.3f" % mbkmeans.inertia_
        print ''

        results['minibatchkmeans_speed'].append(delta)
        results['minibatchkmeans_quality'].append(mbkmeans.inertia_)

    return results


if __name__ == '__main__':
    from mpl_toolkits.mplot3d import axes3d # register the 3d projection
    import matplotlib.pyplot as plt

    samples_range = np.linspace(50, 150, 5).astype(np.int)
    features_range = np.linspace(150, 50000, 5).astype(np.int)
    chunks = np.linspace(500, 10000, 15).astype(np.int)

    results = compute_bench(samples_range, features_range)
    results_2 = compute_bench_2(chunks)

    max_time = max([max(i) for i in [t for (label, t) in results.iteritems()
                                     if "speed" in label]])
    max_inertia = max([max(i) for i in [
                                    t for (label, t) in results.iteritems()
                                    if "speed" not in label]])

    fig = plt.figure()
    for c, (label, timings) in zip('brcy',
                                    sorted(results.iteritems())):
        if 'speed' in label:
            ax = fig.add_subplot(2, 2, 1, projection='3d')
            ax.set_zlim3d(0.0, max_time * 1.1)
        else:
            ax = fig.add_subplot(2, 2, 2, projection='3d')
            ax.set_zlim3d(0.0, max_inertia * 1.1)

        X, Y = np.meshgrid(samples_range, features_range)
        Z = np.asarray(timings).reshape(samples_range.shape[0],
                                        features_range.shape[0])
        ax.plot_surface(X, Y, Z.T, cstride=1, rstride=1, color=c, alpha=0.5)
        ax.set_xlabel('n_samples')
        ax.set_ylabel('n_features')


    i = 0
    for c, (label, timings) in zip('br',
                                   sorted(results_2.iteritems())):
        i += 1
        ax = fig.add_subplot(2, 2, i + 2)
        y = np.asarray(timings)
        ax.plot(chunks, y, color=c, alpha=0.8)
        ax.set_xlabel('chunks')
        ax.set_ylabel(label)


    plt.show()
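The benchmark records two numbers per run: wall-clock time and `inertia_`, the sum of squared distances from each sample to its nearest center. The Python 3 NumPy sketch below illustrates both quantities under stated assumptions. It implements the mini-batch update rule described by Sculley (2010), where each center moves toward its assigned samples with a per-center learning rate of 1/count; it is not the scikits.learn `MiniBatchKMeans` code, it uses plain random initialisation instead of `k-means++`, and its `batch_size` argument presumably plays the role of the benchmark's `chunk_size`. The function names are hypothetical.

```python
import numpy as np


def inertia(X, centers):
    """Sum of squared distances from each sample to its nearest center."""
    # (n_samples, n_centers) matrix of squared Euclidean distances
    d2 = ((X[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def minibatch_kmeans(X, k, batch_size=100, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initialisation (not k-means++) to keep the sketch short.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)  # samples seen so far by each center
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Assign every sample in the batch using the centers as they were
        # at the start of this iteration.
        d2 = ((batch[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Move each assigned center toward its sample with rate 1/count.
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = (1.0 - eta) * centers[c] + eta * x
    return centers


# Eight Gaussian blobs, mirroring compute_bench_2 at a smaller size.
rng = np.random.default_rng(42)
means = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1],
                  [0.5, 0.5], [0.75, -0.5], [-1, 0.75], [1, 0]])
X = np.concatenate([m + 0.8 * rng.standard_normal((500, 2)) for m in means])
centers = minibatch_kmeans(X, k=8, batch_size=100)
print(centers.shape, inertia(X, centers))
```

Each iteration touches only `batch_size` samples, which is why the mini-batch variant can be much faster than full k-means on large inputs; the benchmark's inertia comparison measures how much clustering quality that shortcut costs.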
