
Adjusting some line breaks. Removed the long section on moving from Bio.kmeans or Bio.xkmeans to Bio.Cluster, as this is unlikely to be of interest anymore.
1 parent 72ae5ad, commit 4b3e68b429eccbcf39b40808c0f269e8424fc9c4, committed by peterjc on Jan 30, 2009
Showing with 14 additions and 245 deletions.
  1. +14 −245 DEPRECATED
@@ -18,7 +18,7 @@ remove support for colour and centre in later releases of Biopython.
Bio.AlignAce and Bio.MEME
As of Biopython 1.50, these modules are considered to be obsolete with the
-introduction of Bio.Motif, and will be deprecated in a future release.
+introduction of Bio.Motif, and they will be deprecated in a future release.
Numeric support
@@ -72,8 +72,8 @@ Deprecated in Release 1.48.
-Deprecated in Release 1.48, this parser was replaced by Bio.Emboss.Primer3 and
-Bio.Emboss.PrimerSearch instead.
+Deprecated in Release 1.48, this parser was replaced by Bio.Emboss.Primer3
+and Bio.Emboss.PrimerSearch instead.
@@ -164,8 +164,8 @@ Deprecated as of Release 1.45, removed in Release 1.48
-The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
-The remaining stub Bio.WWW was deprecated in Release 1.48.
+The modules under Bio.WWW were deprecated in Release 1.45, and removed in
+Release 1.48. The remaining stub Bio.WWW was deprecated in Release 1.48.
The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.
@@ -199,8 +199,8 @@ Bio.FormatIO
This was removed in Release 1.44 (a deprecation was not possible).
-Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs, Bio.dbdefs)
+Bio.expressions, Bio.config, Bio.dbdefs, Bio.formatdefs and Bio.dbdefs
These were deprecated in Release 1.44, and removed in Release 1.49.
@@ -215,8 +215,8 @@ Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.
-The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have been
-deprecated as of Release 1.31, and removed in Release 1.43.
+The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have
+been deprecated as of Release 1.31, and removed in Release 1.43.
Use the functions 'complement' and 'reverse_complement' in Bio.Seq instead.
@@ -235,242 +235,11 @@
-Deprecated as of Release 1.30, removed in Release 1.42.
-RecordFile wasn't completely implemented and duplicates the work
-of most standard parsers.
+Deprecated as of Release 1.30, removed in Release 1.42. RecordFile wasn't
+completely implemented and duplicates the work of most standard parsers.
Bio.kMeans and Bio.xkMeans
-Deprecated as of Release 1.30, removed in Release 1.42.
-The k-Means algorithm is an algorithm for unsupervised clustering of data.
-Biopython includes an implementation of the k-means clustering algorithm
-in Bio.kMeans. Recently, a larger set of clustering algorithms entered
-Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
-the k-means clustering algorithm, the module has been deprecated.
-Below you will find a description of how to switch from Bio.kMeans to
-Bio.Cluster's kcluster.
-The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
-The corresponding function in Bio.kMeans is called cluster. This function takes
-the following arguments:
-o data
-o k
-o distance_fn
-o init_centroids_fn
-o calc_centroid_fn
-o max_iterations
-o update_fn
-The function kcluster in Bio.Cluster takes the following arguments:
-o data
-o nclusters
-o mask
-o weight
-o transpose
-o npass
-o method
-o dist
-o initialid
-Arguments for Bio.kMeans's cluster, and their equivalents in Bio.Cluster
-o data:
-In Bio.kMeans, data is a list of vectors, each containing the same number of
-data points. Within the context of clustering genes based on their gene
-expression values, each vector would correspond to the gene expression data of
-one particular gene, and the values in the vector would correspond to the
-measured gene expression value by the different microarrays. The cluster
-routine in Bio.kMeans always performs a row-wise clustering by grouping vectors.
-The argument data to Bio.Cluster's kcluster has the same structure as in Bio.kMeans. However, Bio.Cluster allows row-wise and column-wise clustering by
-the transpose argument. If transpose==0 (the default value), kcluster performs
-row-wise clustering, consistent with Bio.kMeans. If transpose==1, kcluster
-performs column-wise clustering. The same behavior can be obtained, of course,
-by transposing the data array before calling kcluster.
-o k:
-The desired number of clusters is specified by the input argument k in Bio.kMeans. The corresponding argument in Bio.Cluster's kcluster is nclusters.
-o distance_fn:
-In Bio.kMeans, the argument distance_fn represents the distance function to
-calculate the distances between items and cluster centroids. This argument
-corresponds to a true Python function. The default value is the Euclidean
-distance, implemented as distance.euclidean in Bio.kMeans. User-defined
-distance functions can also be used.
-The k-means routine in Bio.Cluster does not allow user-specified distance
-functions. Instead, it provides the following nine built-in distance functions,
-depending on the argument dist:
-dist=='e': Euclidean distance
-dist=='h': Harmonically summed Euclidean distance
-dist=='b': City-block distance
-dist=='c': Pearson correlation
-dist=='a': absolute value of the Pearson correlation
-dist=='u': uncentered correlation
-dist=='x': absolute uncentered correlation
-dist=='s': Spearman's rank correlation
-dist=='k': Kendall's tau
-User-defined distance functions are possible only by modifying the C code in
-cluster.c (which may not be as hard as it sounds). The default distance function
-is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
-Euclidean distance is defined as the sum of squared differences, whereas in Bio.kMeans the square root of this quantity is taken. This does not affect the
-clustering result.
-o init_centroids_fn:
-This function specifies the initial choice for the cluster centroids. By
-default, cluster in Bio.kMeans uses a random initial choice of cluster centroids
-by randomly choosing k data vectors from the input vectors in the data input
-argument. Alternatively, the user can specify a user-defined function to choose
-the initial cluster centroids.
-In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
-assignment instead of an initial choice of cluster centroids. As far as I know,
-these two initialization methods are equivalent in practice. Similar to the
-cluster routine in Bio.kMeans, Bio.Cluster's kcluster performs a random initial
-assignment of items to clusters. Alternatively, users can specify a
-(deterministic) initial clustering via the initialid argument. This argument is
-None by default. If not None, it should be a 1D array (or list) containing the
-number (between 0 and nclusters-1) of the cluster to which each item is
-assigned initially.
-Note that the k-means routine in Bio.Cluster performs automatic repeats of the
-algorithm, each time starting from a different random initial clustering. See
-the comment for the npass argument below.
-o calc_centroid_fn:
-This argument specifies how to calculate the cluster centroids, given the data
-vectors of the items that belong to each cluster. By default, the mean over the
-vectors is calculated. A user-defined function can also be used.
-Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
-method to calculate the cluster centroid is determined by the argument method,
-which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
-calculate the mean ('a').
-o max_iterations:
-The cluster routine in Bio.kMeans has an argument max_iterations, which is used
-to stop the iteration if the routine does not converge after the given number of
-iterations.
-The kcluster routine in Bio.Cluster does not have such an argument. The failure
-of a k-means algorithm to converge is due to the occurrence of periodic
-clustering solutions during the course of the k-means algorithm. The kcluster
-routine in Bio.Cluster automatically checks for the occurrence of such a
-periodicity in the solutions. If a periodic behavior is detected, the algorithm
-is interrupted and the last clustering solution is returned. Accordingly, the
-kcluster routine is guaranteed to return a clustering solution. Also see the
-discussion of the npass argument below.
-o update_fn:
-The argument update_fn to cluster in Bio.kMeans is a hook function that is
-called at the beginning of every iteration and passed the iteration number,
-cluster centroids, and current cluster assignments. It is used by Bio.xkMeans,
-which provides a visualization of k-means clustering. Currently there is no
-equivalent in Bio.Cluster.
-Other arguments for Bio.Cluster's kcluster.
-Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in Bio.kMeans's cluster.
-o mask:
-Microarray experiments tend to suffer from a large number of missing data. The
-argument mask to Bio.Cluster's kcluster lets the user specify which data are
-missing. This argument is an array with the same shape as data, and contains
-a 1 for each data point that is present, and a 0 for a missing data point:
- mask[i,j]==1: data[i,j] is valid
- mask[i,j]==0: data[i,j] is a missing data point
-Missing data points are ignored by the clustering algorithm. By default, mask
-is an array containing 1's everywhere.
-o weight:
-The weight argument is used to put different weights on different data points.
-For example, when clustering genes based on their gene expression profile, we
-may want to attach a bigger weight to some microarrays compared to others. By
-default, the weight argument contains equal weights of 1.0 for all data points.
-Note that for row-wise clustering, the weight argument is a 1D vector whose
-length is equal to the number of columns. For column-wise clustering, the length
-of this argument is equal to the number of rows.
-o npass:
-Typical implementations of the k-means clustering algorithm rely on a random
-initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
-a clearly defined goal, which is to minimize the within-cluster sum of
-distances. Different k-means clustering solutions (based on different initial
-clusterings) can therefore be compared to each other directly. In order to
-increase the chance of finding the optimal k-means clustering solution, the
-k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
-each time starting from a different initial random clustering. The best
-clustering solution, as well as in how many of the npass attempts it was found,
-is returned to the user. For more information, see the output variable nfound below.
-Return values
-The cluster routine in Bio.kMeans returns two values:
-o centroids
-o clusters
-The kcluster routine in Bio.Cluster returns four values:
-o clusterid
-o centroids
-o error
-o nfound
-o centroids:
-The centroids return value contains the centroids of the k clusters that were
-found, and corresponds to the centroids return value from Bio.Cluster's
-kcluster routine.
-o clusters:
-The clusters return value contains the number of the cluster to which each
-vector was assigned. The corresponding return value in Bio.Cluster's kcluster
-is clusterid.
-o error:
-The error return value from Bio.Cluster's kcluster is the within-cluster sum of
-distances for the optimal clustering solution that was found. This value can be
-used to compare different clustering solutions to each other.
-o nfound:
-The nfound return value from Bio.Cluster's kcluster shows in how many of the
-npass runs the optimal clustering solution was found. Accordingly, nfound is at
-least 1 and at most equal to npass. A large value for nfound is an indication
-that the clustering solution that was found is optimal. On the other hand, if
-nfound is equal to 1, it is very well possible that a better clustering solution
-exists than the one found by kcluster.
+Deprecated as of Release 1.30, removed in Release 1.42. Instead, please use
+the function kcluster in Bio.Cluster which performs k-means or k-medians
+clustering.

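The new text points readers to kcluster in Bio.Cluster. As a rough illustration of the behaviour the removed section described (a random initial assignment of items rather than random centroids, mean or median centroids via method 'a' or 'm', detection of periodic solutions instead of a max_iterations cap, and npass repeats summarised by error and nfound), here is a minimal pure-Python sketch. It is not Bio.Cluster's C implementation: the names kmeans, one_pass, centroids and canonical are hypothetical, only the squared Euclidean distance is used, and mask, weight and transpose are ignored. Recent Biopython releases document the real call as `clusterid, error, nfound = kcluster(data, nclusters=2, npass=10)`, with centroids obtained separately through clustercentroids rather than as one of four return values; check the documentation for the release in use.

```python
import random
from statistics import median


def sq_euclidean(x, y):
    # Sum of squared differences; no square root is taken.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroids(data, clusterid, nclusters, method="a"):
    # Column-wise arithmetic mean (method='a') or median (method='m') per cluster.
    agg = median if method == "m" else (lambda values: sum(values) / len(values))
    result = []
    for k in range(nclusters):
        members = [row for row, c in zip(data, clusterid) if c == k]
        # An empty cluster gets no centroid and attracts no items.
        result.append([agg(col) for col in zip(*members)] if members else None)
    return result


def canonical(clusterid):
    # Relabel clusters by order of first appearance, so the same partition
    # with permuted cluster numbers compares equal.
    mapping = {}
    return tuple(mapping.setdefault(c, len(mapping)) for c in clusterid)


def one_pass(data, nclusters, method, rng):
    # Random initial assignment of items to clusters; every cluster starts
    # with at least one item (requires len(data) >= nclusters).
    clusterid = list(range(nclusters))
    clusterid += [rng.randrange(nclusters) for _ in range(len(data) - nclusters)]
    rng.shuffle(clusterid)
    seen = {tuple(clusterid)}
    while True:
        cents = centroids(data, clusterid, nclusters, method)
        live = [k for k in range(nclusters) if cents[k] is not None]
        new = [min(live, key=lambda k: sq_euclidean(row, cents[k])) for row in data]
        if tuple(new) in seen:
            # Either converged (new == clusterid) or a periodic cycle of
            # solutions was detected; stop and keep the current solution.
            break
        seen.add(tuple(new))
        clusterid = new
    cents = centroids(data, clusterid, nclusters, method)
    error = sum(sq_euclidean(row, cents[c]) for row, c in zip(data, clusterid))
    return clusterid, error


def kmeans(data, nclusters=2, npass=1, method="a", seed=None):
    # Run npass independent passes, keep the partition with the smallest
    # within-cluster sum of distances and count how often it was found.
    rng = random.Random(seed)
    best, best_error, nfound = None, float("inf"), 0
    for _ in range(npass):
        clusterid, error = one_pass(data, nclusters, method, rng)
        if best is not None and canonical(clusterid) == canonical(best):
            nfound += 1
        elif error < best_error:
            best, best_error, nfound = clusterid, error, 1
    return best, best_error, nfound


# Two well-separated groups of three items each.
data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
clusterid, error, nfound = kmeans(data, nclusters=2, npass=20, seed=1)
print(clusterid, round(error, 3), nfound)
```

As in the removed description, nfound lies between 1 and npass; a value close to npass suggests the returned partition is the optimal one, while nfound == 1 hints that a better solution may exist.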

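The removed section also explained kcluster's mask and weight arguments. The sketch below shows one way a distance can honour both for row-wise clustering, where weight has one entry per column: columns in which either value is missing (mask 0) are skipped, and the remaining squared differences are weighted. The function name is hypothetical, and dividing by the total weight of the usable columns is an assumption of this sketch, made so that rows with many missing values do not look artificially close.

```python
def masked_weighted_distance(x, y, mask_x, mask_y, weight):
    # Hypothetical helper, not Bio.Cluster's API.
    # Mask value 1 = data point present, 0 = missing, as described for kcluster.
    total = 0.0
    total_weight = 0.0
    for a, b, ma, mb, w in zip(x, y, mask_x, mask_y, weight):
        if ma and mb:  # skip columns where either row has a missing value
            total += w * (a - b) ** 2
            total_weight += w
    # Assumption: normalise by the weight actually used; with no usable
    # columns, report 0.0 instead of dividing by zero.
    return total / total_weight if total_weight else 0.0


# Two expression profiles measured on three microarrays; the third value of
# the first profile is missing, so its placeholder 99.0 is ignored.
x, mask_x = [1.0, 2.0, 99.0], [1, 1, 0]
y, mask_y = [1.0, 4.0, 3.0], [1, 1, 1]
print(masked_weighted_distance(x, y, mask_x, mask_y, [1.0, 1.0, 1.0]))  # prints 2.0
print(masked_weighted_distance(x, y, mask_x, mask_y, [1.0, 3.0, 1.0]))  # prints 3.0
```

Giving the second microarray three times the weight raises the distance because that column is the only one where the two profiles differ.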