1 This file documents modules in Biopython that have been moved or deprecated in
2 favor of other modules. It is meant as a quick, easy-to-find reference on how
3 to update your code so that it works again.
4
5 Bio.GenBank
6 ===========
7 The online functionality (search_for, download_many, and NCBIDictionary) was
8 declared obsolete in Release 1.48, with the intention of an official deprecation
9 in the following release. Please use Bio.Entrez instead.
10
11 Bio.PubMed
12 ==========
13 Declared obsolete in Release 1.48, with the intention of an official deprecation
14 in the following release. Please use Bio.Entrez instead.
15
16 Bio.EUtils
17 ==========
18 Deprecated in favor of Bio.Entrez in Release 1.48
19
20 Bio.Blast.NCBIWWW
21 =================
22 The HTML BLAST parser was deprecated as of Release 1.48
23 The deprecated functions blast and blasturl were removed in Release 1.44
24
25 Bio.Saf
26 =======
27 Deprecated as of Release 1.48, as it appears to have no users, and relies
28 on Martel which doesn't work properly with mxTextTools 3.0
29
30 Bio.IntelliGenetics
31 ===================
32 Deprecated as of Release 1.48 in favor of the "ig" format in Bio.SeqIO
33
34 Bio.ECell
35 =========
36 Deprecated as of Release 1.47, as it appears to have no users, and the code
37 does not seem relevant for ECell 3.
38
39 Bio.Rebase
40 ==========
41 Deprecated as of Release 1.46.
42
43 Bio.Gobase
44 ==========
45 Deprecated as of Release 1.46.
46
47 Bio.CDD
48 =======
49 Deprecated as of Release 1.46.
50
51 Bio.biblio
52 ==========
53 Deprecated as of Release 1.45, removed in Release 1.48
54
55 Bio.WWW
56 =======
57 The modules under Bio.WWW were deprecated in Release 1.45, and removed in 1.48.
58 The remaining stub Bio.WWW was deprecated in Release 1.48.
59
60 The functionality in Bio.WWW.SCOP, Bio.WWW.InterPro and Bio.WWW.ExPASy
61 is now available from Bio.SCOP, Bio.InterPro and Bio.ExPASy instead.
62
63 Bio.SeqIO
64 =========
65 The old Bio.SeqIO.FASTA and Bio.SeqIO.generic were deprecated in favour of
66 the new Bio.SeqIO module as of Release 1.44, removed in Release 1.47
67
68 Bio.lcc
69 =======
70 Deprecated in favor of Bio.SeqUtils.lcc in Release 1.44, removed in 1.46
71
72 Bio.crc
73 =======
74 Deprecated in favor of Bio.SeqUtils.CheckSum in Release 1.44, removed in 1.46
75
76 Bio.FormatIO
77 ============
78 This was removed in Release 1.44
79
80 Bio.expressions (and therefore Bio.config, Bio.dbdefs, Bio.formatdefs)
81 ===============
82 This has been deprecated as of Release 1.44
83
84 Bio.Kabat
85 =========
86 This was deprecated in Release 1.43 and removed in Release 1.44
87
88 Bio.SeqUtils
89 ============
90 The functions 'complement' and 'antiparallel' in Bio.SeqUtils have been
91 deprecated as of Release 1.31. Use the functions 'complement' and
92 'reverse_complement' in Bio.Seq instead.
93
94 Bio.GFF
95 =======
96 The functions 'forward_complement' and 'antiparallel' in Bio.GFF.easy have been
97 deprecated as of Release 1.31. Use the functions 'complement' and
98 'reverse_complement' in Bio.Seq instead.
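
As a rough illustration of what the two replacement operations compute, here
is a plain-Python sketch. It is not the Bio.Seq implementation: the helper
names below only mirror the Bio.Seq function names, they are defined here for
illustration, and only the unambiguous DNA letters A, C, G and T are handled.

```python
# Plain-Python sketch; NOT the Bio.Seq implementation.
# Assumption: only unambiguous DNA letters (upper or lower case) occur.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(sequence):
    """Return the complementary strand, read in the same direction."""
    return sequence.translate(_DNA_COMPLEMENT)


def reverse_complement(sequence):
    """Return the complementary strand, read in the opposite direction."""
    return complement(sequence)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```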
99
100 Bio.sequtils
101 ============
102 Deprecated as of Release 1.30, removed in Release 1.42
103 Use Bio.SeqUtils instead.
104
105 Bio.SVM
106 =======
107 Deprecated as of Release 1.30, removed in Release 1.42
108 The Support Vector Machine code in Biopython has been superseded by a
109 more robust (and maintained) SVM library, which includes a Python
110 interface. We recommend using LIBSVM:
111
112 http://www.csie.ntu.edu.tw/~cjlin/libsvm/
113
114 Bio.RecordFile
115 ==============
116 Deprecated as of Release 1.30, removed in Release 1.42
117 RecordFile wasn't completely implemented and duplicates the work
118 of most standard parsers. We recommend using a specific iterator
119 (Bio.Fasta.Iterator for example) without a parser to get back
120 text records.
121
122 Bio.kMeans and Bio.xkMeans
123 ==========================
124 Deprecated as of Release 1.30, removed in Release 1.42
125
126 The k-Means algorithm is an algorithm for unsupervised clustering of data.
127 Biopython includes an implementation of the k-means clustering algorithm
128 in kMeans.py. Recently, a larger set of clustering algorithms entered
129 Biopython as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements
130 the k-means clustering algorithm, the kMeans.py module has been deprecated.
131 Below you will find a description of how to switch from kMeans.py to
132 Bio.Cluster's kcluster.
133
134 The function kcluster in Bio.Cluster performs k-means or k-medians clustering.
135 The corresponding function in kMeans.py is called cluster. This function takes
136 the following arguments:
137
138 o data
139 o k
140 o distance_fn
141 o init_centroids_fn
142 o calc_centroid_fn
143 o max_iterations
144 o update_fn
145
146 The function kcluster in Bio.Cluster takes the following arguments:
147
148 o data
149 o nclusters
150 o mask
151 o weight
152 o transpose
153 o npass
154 o method
155 o dist
156 o initialid
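
The correspondence between the two argument lists, as described in the rest
of this section, can be summarized in a small helper. This is only a sketch:
translate_kmeans_kwargs is a hypothetical function written for this note and
is not part of Biopython. Function-valued arguments cannot be translated
automatically, so the helper reports them instead.

```python
# Hypothetical helper, not part of Biopython: translates keyword arguments
# written for kMeans.py's cluster into keyword arguments for kcluster.
RENAMED = {"data": "data", "k": "nclusters"}

# Arguments without a one-to-one equivalent, with a hint on what to do.
NEEDS_MANUAL_CHANGE = {
    "distance_fn": "choose a built-in distance via dist",
    "init_centroids_fn": "pass an initial assignment via initialid",
    "calc_centroid_fn": "choose 'a' (mean) or 'm' (median) via method",
    "max_iterations": "no equivalent; kcluster detects periodic solutions",
    "update_fn": "no equivalent in Bio.Cluster",
}


def translate_kmeans_kwargs(old_kwargs):
    """Return (new_kwargs, notes) for a dict of kMeans.py keyword arguments."""
    new_kwargs = {}
    notes = []
    for name, value in old_kwargs.items():
        if name in RENAMED:
            new_kwargs[RENAMED[name]] = value
        elif name in NEEDS_MANUAL_CHANGE:
            notes.append("%s: %s" % (name, NEEDS_MANUAL_CHANGE[name]))
        else:
            raise TypeError("unknown kMeans.py argument: %s" % name)
    return new_kwargs, notes


new_kwargs, notes = translate_kmeans_kwargs(
    {"data": [[1.0, 2.0]], "k": 3, "max_iterations": 100}
)
print(new_kwargs)  # {'data': [[1.0, 2.0]], 'nclusters': 3}
print(notes)
```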
157
158
159 Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
160 -----------------------------------------------------------------------
161
162
163 o data:
164
165 In kMeans.py, data is a list of vectors, each containing the same number of
166 data points. Within the context of clustering genes based on their gene
167 expression values, each vector would correspond to the gene expression data of
168 one particular gene, and the values in the vector would correspond to the
169 measured gene expression value by the different microarrays. The cluster
170 routine in kMeans.py always performs a row-wise clustering by grouping vectors.
171
172 The argument data to Bio.Cluster's kcluster has the same structure as in
173 kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
174 the transpose argument. If transpose==0 (the default value), kcluster performs
175 row-wise clustering, consistent with kMeans.py. If transpose==1, kcluster
176 performs column-wise clustering. The same behavior can be obtained, of course,
177 by transposing the data array before calling kcluster.
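
For data stored as a list of lists, the transposition mentioned above can be
sketched as follows. The numbers are made up, and transposed is a helper
defined here for illustration only.

```python
# Rows are genes, columns are microarrays (made-up numbers).
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]


def transposed(matrix):
    """Return a new list of rows in which rows and columns are swapped."""
    return [list(column) for column in zip(*matrix)]


# Row-wise clustering of this result groups microarrays instead of genes.
by_microarray = transposed(data)
print(by_microarray)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```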
178
179
180 o k:
181
182 The desired number of clusters is specified by the input argument k in
183 kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.
184
185 o distance_fn:
186
187 In kMeans.py, the argument distance_fn represents the distance function to
188 calculate the distances between items and cluster centroids. This argument
189 corresponds to a true Python function. The default value is the Euclidean
190 distance, implemented as distance.euclidean in distance.py. User-defined
191 distance functions can also be used.
192
193 The k-means routine in Bio.Cluster does not allow user-specified distance
194 functions. Instead, it provides the following nine built-in distance functions,
195 depending on the argument dist:
196
197 dist=='e': Euclidean distance
198 dist=='h': Harmonically summed Euclidean distance
199 dist=='b': City-block distance
200 dist=='c': Pearson correlation
201 dist=='a': absolute value of the Pearson correlation
202 dist=='u': uncentered correlation
203 dist=='x': absolute uncentered correlation
204 dist=='s': Spearman's rank correlation
205 dist=='k': Kendall's tau
206
207 User-defined distance functions are possible only by modifying the C code in
208 cluster.c (which may not be as hard as it sounds). The default distance function
209 is the Euclidean distance (dist=='e'). Note that in Bio.Cluster the
210 Euclidean distance is defined as the sum of squared differences, whereas in
211 kMeans.py the square root of this quantity is taken. This does not affect the
212 clustering result.
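
The reason the square root makes no difference is that it preserves the
ordering of distances, so each item is still assigned to the same nearest
centroid. A small plain-Python check (the function names are chosen for this
example and are not taken from kMeans.py or Bio.Cluster):

```python
import math


def squared_euclidean(u, v):
    """Sum of squared differences (the Bio.Cluster convention)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared differences (the kMeans.py convention)."""
    return math.sqrt(squared_euclidean(u, v))


def nearest(item, centroids, distance):
    """Index of the centroid closest to item under the given distance."""
    return min(range(len(centroids)), key=lambda i: distance(item, centroids[i]))


centroids = [[0.0, 0.0], [5.0, 5.0]]
item = [1.0, 2.0]
print(nearest(item, centroids, squared_euclidean))  # 0
print(nearest(item, centroids, euclidean))          # 0
```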
213
214 o init_centroids_fn:
215
216 This function specifies the initial choice for the cluster centroids. By
217 default, cluster in kMeans.py uses a random initial choice of cluster centroids
218 by randomly choosing k data vectors from the input vectors in the data input
219 argument. Alternatively, the user can specify a user-defined function to choose
220 the initial cluster centroids.
221
222 In Bio.Cluster, the k-means algorithm in kcluster starts from an initial cluster
223 assignment instead of an initial choice of cluster centroids. As far as I know,
224 these two initialization methods are equivalent in practice. Similar to the
225 cluster routine in kMeans.py, Bio.Cluster's kcluster performs a random initial
226 assignment of items to clusters. Alternatively, users can specify a
227 (deterministic) initial clustering via the initialid argument. This argument is
228 None by default. If not None, it should be a 1D array (or list) containing the
229 number (between 0 and nclusters-1) of the cluster to which each item is
230 assigned initially.
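
To make the relation between the two initializations concrete, the sketch
below builds an initial assignment of the kind expected by initialid and
derives the cluster centroids it implies. Both helpers are written for this
note (they are not Bio.Cluster functions), and random_initialid assumes there
are at least as many items as clusters.

```python
import random


def random_initialid(nitems, nclusters, rng):
    """Random cluster number (0 .. nclusters-1) per item; no cluster is empty."""
    ids = list(range(nclusters))
    ids += [rng.randrange(nclusters) for _ in range(nitems - nclusters)]
    rng.shuffle(ids)
    return ids


def centroids_from_assignment(data, initialid, nclusters):
    """Mean vector of the items assigned to each cluster."""
    ncolumns = len(data[0])
    sums = [[0.0] * ncolumns for _ in range(nclusters)]
    counts = [0] * nclusters
    for row, cluster in zip(data, initialid):
        counts[cluster] += 1
        for j, value in enumerate(row):
            sums[cluster][j] += value
    return [[s / counts[c] for s in sums[c]] for c in range(nclusters)]


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
initialid = [0, 0, 1, 1]  # a deterministic initial clustering
print(centroids_from_assignment(data, initialid, 2))  # [[0.0, 1.0], [10.0, 11.0]]
print(random_initialid(len(data), 2, random.Random(0)))
```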
231
232 Note that the k-means routine in Bio.Cluster performs automatic repeats of the
233 algorithm, each time starting from a different random initial clustering. See
234 the comment for the npass argument below.
235
236 o calc_centroid_fn:
237
238 This argument specifies how to calculate the cluster centroids, given the data
239 vectors of the items that belong to each cluster. By default, the mean over the
240 vectors is calculated. A user-defined function can also be used.
241
242 Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
243 method to calculate the cluster centroid is determined by the argument method,
244 which can be either 'a' (arithmetic mean) or 'm' (median). The default is to
245 calculate the mean ('a').
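
The difference between the two settings can be illustrated in plain Python.
The function centroid below is a stand-in written for this note, not the
Bio.Cluster code; it only reuses the 'a' and 'm' codes described above.

```python
import statistics


def centroid(vectors, method="a"):
    """Column-wise mean ('a') or median ('m') of equal-length vectors."""
    if method == "a":
        average = statistics.mean
    elif method == "m":
        average = statistics.median
    else:
        raise ValueError("method should be 'a' or 'm'")
    return [average(column) for column in zip(*vectors)]


members = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]
print(centroid(members, "a"))  # [4.0, 20.0]
print(centroid(members, "m"))  # [2.0, 20.0]
```

The median is less sensitive to the outlying value 9.0 in the first column,
which is the usual reason to prefer k-medians.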
246
247 o max_iterations:
248
249 The cluster routine in kMeans.py has an argument max_iterations, which is used
250 to stop the iteration if the routine does not converge after the given number of
251 iterations.
252
253 The kcluster routine in Bio.Cluster does not have such an argument. The failure
254 of a k-means algorithm to converge is due to the occurrence of periodic
255 clustering solutions during the course of the k-means algorithm. The kcluster
256 routine in Bio.Cluster automatically checks for the occurrence of such a
257 periodicity in the solutions. If a periodic behavior is detected, the algorithm
258 is interrupted and the last clustering solution is returned. Accordingly, the
259 kcluster routine is guaranteed to return a clustering solution. Also see the
260 discussion of the npass argument below.
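
How the C code detects periodicity is not described in this file. The sketch
below shows one simple way to do it, under the assumption that remembering
every assignment seen so far is affordable; it is not the Bio.Cluster
implementation, and all names are invented for this example.

```python
def iterate_until_stable_or_periodic(step, assignment):
    """Apply step repeatedly; stop on convergence or when an earlier
    assignment recurs (a periodic solution).

    Returns (assignment, reason), where reason is 'converged' or 'periodic'.
    """
    seen = {tuple(assignment)}
    while True:
        new_assignment = step(assignment)
        if new_assignment == assignment:
            return new_assignment, "converged"
        if tuple(new_assignment) in seen:
            return new_assignment, "periodic"
        seen.add(tuple(new_assignment))
        assignment = new_assignment


def flip(assignment):
    """Toy update rule that alternates between two assignments forever."""
    return [1 - cluster for cluster in assignment]


print(iterate_until_stable_or_periodic(flip, [0, 1, 1]))
# ([0, 1, 1], 'periodic')
```

Because the number of possible assignments is finite, such a loop always
terminates, which is why no max_iterations argument is needed.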
261
262 o update_fn:
263
264 The argument update_fn to cluster in kMeans.py is a hook function that is
265 called at the beginning of every iteration and passed the iteration number,
266 cluster centroids, and current cluster assignments. It is used by xkMeans.py,
267 which provides a visualization of k-means clustering. Currently there is no
268 equivalent in Bio.Cluster.
269
270
271 Other arguments for Bio.Cluster's kcluster.
272 -------------------------------------------
273
274 Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
275 kMeans.py's cluster.
276
277 o mask:
278
279 Microarray experiments tend to suffer from a large amount of missing data. The
280 argument mask to Bio.Cluster's kcluster lets the user specify which data are
281 missing. This argument is an array with the same shape as data, and contains
282 a 1 for each data point that is present, and a 0 for a missing data point:
283
284 mask[i,j]==1: data[i,j] is valid
285 mask[i,j]==0: data[i,j] is a missing data point
286
287 Missing data points are ignored by the clustering algorithm. By default, mask
288 is an array containing 1's everywhere.
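
As an illustration of the mask convention, the following plain-Python sketch
builds the default mask and computes a column-wise mean that skips missing
entries. The helper names are made up for this example, and nested lists
stand in for the arrays used by Bio.Cluster.

```python
def default_mask(data):
    """Mask with a 1 for every entry, i.e. no missing data."""
    return [[1] * len(row) for row in data]


def masked_mean(data, mask):
    """Column-wise mean that skips entries whose mask value is 0."""
    means = []
    for j in range(len(data[0])):
        present = [row[j] for row, flags in zip(data, mask) if flags[j] == 1]
        # A column with no valid entries has no mean.
        means.append(sum(present) / len(present) if present else None)
    return means


data = [[1.0, 2.0], [3.0, 0.0], [5.0, 6.0]]
mask = [[1, 1], [1, 0], [1, 1]]  # data[1][1] is a missing data point
print(masked_mean(data, mask))   # [3.0, 4.0]
```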
289
290 o weight:
291
292 The weight argument is used to put different weights on different data points.
293 For example, when clustering genes based on their gene expression profile, we
294 may want to attach a bigger weight to some microarrays compared to others. By
295 default, the weight argument contains equal weights of 1.0 for all data points.
296 Note that for row-wise clustering, the weight argument is a 1D vector whose
297 length is equal to the number of columns. For column-wise clustering, the length
298 of this argument is equal to the number of rows.
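
One plausible way for such weights to enter a distance calculation is shown
below for row-wise clustering. This is an assumption made for illustration:
the exact formula (for example any normalization by the sum of the weights)
is defined by the C code in Bio.Cluster and may differ.

```python
def weighted_squared_distance(u, v, weight):
    """Sum of squared differences, each column scaled by its weight."""
    if not (len(u) == len(v) == len(weight)):
        raise ValueError("u, v and weight must have the same length")
    return sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weight))


# Two genes measured on three microarrays (made-up numbers).
gene1 = [1.0, 2.0, 3.0]
gene2 = [2.0, 2.0, 5.0]
print(weighted_squared_distance(gene1, gene2, [1.0, 1.0, 1.0]))  # 5.0
print(weighted_squared_distance(gene1, gene2, [1.0, 1.0, 0.5]))  # 3.0
```

Halving the weight of the third microarray halves its contribution to the
distance between the two genes.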
299
300 o npass:
301
302 Typical implementations of the k-means clustering algorithm rely on a random
303 initialization. Unlike Self-Organizing Maps, however, the k-means algorithm has
304 a clearly defined goal, which is to minimize the within-cluster sum of
305 distances. Different k-means clustering solutions (based on different initial
306 clusterings) can therefore be compared to each other directly. In order to
307 increase the chance of finding the optimal k-means clustering solution, the
308 k-means routine in Bio.Cluster automatically repeats the algorithm npass times,
309 each time starting from a different initial random clustering. The best
310 clustering solution, as well as in how many of the npass attempts it was found,
311 is returned to the user. For more information, see the output variable nfound
312 below.
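
The restart strategy can be sketched in plain Python as follows. This is a
toy k-means written for this note, not the Bio.Cluster code: all function
names are invented, the max_steps cap is a simplification of this sketch
only, and two solutions are counted as the same when they split the items
into the same groups, whatever the cluster numbers are.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroids_of(data, clusterid, nclusters):
    """Mean vector per cluster; None for a cluster without members."""
    centroids = []
    for c in range(nclusters):
        members = [row for row, cid in zip(data, clusterid) if cid == c]
        if members:
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centroids.append(None)
    return centroids


def nearest(row, centroids):
    candidates = [c for c, cen in enumerate(centroids) if cen is not None]
    return min(candidates, key=lambda c: squared_distance(row, centroids[c]))


def one_pass(data, nclusters, rng, max_steps=100):
    """One k-means run from a random initial assignment."""
    clusterid = [i % nclusters for i in range(len(data))]
    rng.shuffle(clusterid)
    for _ in range(max_steps):
        centroids = centroids_of(data, clusterid, nclusters)
        new_ids = [nearest(row, centroids) for row in data]
        if new_ids == clusterid:
            break
        clusterid = new_ids
    centroids = centroids_of(data, clusterid, nclusters)
    error = sum(squared_distance(r, centroids[c]) for r, c in zip(data, clusterid))
    return clusterid, error


def canonical(clusterid):
    """Relabel clusters by order of first appearance, to compare groupings."""
    relabel = {}
    return tuple(relabel.setdefault(c, len(relabel)) for c in clusterid)


def kmeans_with_restarts(data, nclusters, npass, seed=0):
    """Return (clusterid, error, nfound) for the best of npass runs."""
    rng = random.Random(seed)
    best_ids, best_error, nfound = None, None, 0
    for _ in range(npass):
        clusterid, error = one_pass(data, nclusters, rng)
        if best_ids is None or error < best_error:
            best_ids, best_error, nfound = clusterid, error, 1
        elif canonical(clusterid) == canonical(best_ids):
            nfound += 1
    return best_ids, best_error, nfound


data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
clusterid, error, nfound = kmeans_with_restarts(data, nclusters=2, npass=5)
print(clusterid, error, nfound)
```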
313
314
315 Return values
316 -------------
317
318 The cluster routine in kMeans.py returns two values:
319
320 o centroids
321 o clusters
322
323 The kcluster routine in Bio.Cluster returns four values:
324
325 o clusterid
326 o centroids
327 o error
328 o nfound
329
330
331 o centroids:
332
333 The centroids return value contains the centroids of the k clusters that were
334 found, and corresponds to the centroids return value from Bio.Cluster's
335 kcluster routine.
336
337 o clusters:
338
339 The clusters return value contains the number of the cluster to which each
340 vector was assigned. The corresponding return value in Bio.Cluster's kcluster
341 is clusterid.
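
Code that needs the members of each cluster rather than a cluster number per
item can regroup the result itself. A minimal sketch, with a helper name
chosen for this example:

```python
def members_by_cluster(clusterid):
    """Map each cluster number to the list of item indices assigned to it."""
    groups = {}
    for index, cluster in enumerate(clusterid):
        groups.setdefault(cluster, []).append(index)
    return groups


clusterid = [0, 1, 0, 2, 1]
print(members_by_cluster(clusterid))  # {0: [0, 2], 1: [1, 4], 2: [3]}
```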
342
343 o error:
344
345 The error return value from Bio.Cluster's kcluster is the within-cluster sum of
346 distances for the optimal clustering solution that was found. This value can be
347 used to compare different clustering solutions to each other.
348
349 o nfound:
350
351 The nfound return value from Bio.Cluster's kcluster shows in how many of the
352 npass runs the optimal clustering solution was found. Accordingly, nfound is at
353 least 1 and at most equal to npass. A large value for nfound is an indication
354 that the clustering solution that was found is optimal. On the other hand, if
355 nfound is equal to 1, it is very well possible that a better clustering solution
356 exists than the one found by kcluster.