committed a patch from a kind contributor adding feature X
mdehoon committed Oct 7, 2009
1 parent c638926 commit 447a2c5
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions Doc/Tutorial.tex
@@ -80,7 +80,7 @@

\author{Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, \\
Michiel de Hoon, Peter Cock, Tiago Ant\~ao}
-\date{Last Update -- 28 September 2009 (Biopython 1.52+)}
+\date{Last Update -- 7 October 2009 (Biopython 1.52+)}

%Hack to get the logo at the start of the HTML front page:
%(hopefully this isn't going to be too wide for most people)
Expand Down Expand Up @@ -5003,12 +5003,12 @@ \section{ESpell: Obtaining spelling suggestions}
\section{Parsing huge Entrez XML files}
-The \verb+Entrez.read+ function reads the entire XML file returns by Entrez into a single Python object, which is kept in memory. Some Entrez XML files are so large that they do not fit in memory. To parse such files, you can use the functio n\verb+Entrez.parse+, which is a generator function that reads records in the XML file one by one. This function is only useful if the XML file reflects a Python list object (in other words, if \verb+Entrez.read+ on a computer with infinite memory resources would return a Python list).
+The \verb+Entrez.read+ function reads the entire XML file returned by Entrez into a single Python object, which is kept in memory. To parse Entrez XML files too large to fit in memory, you can use the function \verb+Entrez.parse+. This is a generator function that reads records in the XML file one by one. This function is only useful if the XML file reflects a Python list object (in other words, if \verb+Entrez.read+ on a computer with infinite memory resources would return a Python list).
For example, you can download the entire Entrez Gene database for a given organism as a file from NCBI's ftp site. These files can be very large. As an example, on September 4, 2009, the file \verb+Homo_sapiens.ags.gz+, containing the Entrez Gene database for human, had a size of 116576 kB. This file, which is in the \verb+ASN+ format, can be converted into an XML file using NCBI's \verb+gene2xml+ program (see NCBI's ftp site for more information):
\begin{verbatim}
-gene2xml -b T -i Homo_sapiens.ags.gz Homo_sapiens.xml
+gene2xml -b T -i Homo_sapiens.ags -o Homo_sapiens.xml
\end{verbatim}
The resulting XML file has a size of 6.1 GB. Attempting \verb+Entrez.read+ on this file will result in a \verb+MemoryError+ on many computers.
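The following sketch illustrates the general idea behind a record-by-record generator such as \verb+Entrez.parse+: yield one record at a time and discard it afterwards, so that memory use stays bounded. It is not Biopython's implementation. It uses only the standard library, the helper name \verb+iter_records+ is hypothetical, and the tiny in-memory sample (including the \verb+id+ child element) is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(handle, record_tag):
    """Yield one record element at a time from an XML stream.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    context = ET.iterparse(handle, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield elem
            # Once the caller asks for the next record, drop the records
            # already handed out so they can be garbage-collected.
            root.clear()


# Made-up sample standing in for a very large file on disk.
sample = io.BytesIO(
    b"<Entrezgene-Set>"
    b"<Entrezgene><id>1</id></Entrezgene>"
    b"<Entrezgene><id>2</id></Entrezgene>"
    b"</Entrezgene-Set>"
)

# Each record is processed before the generator advances and clears it.
ids = [record.findtext("id") for record in iter_records(sample, "Entrezgene")]
print(ids)  # ['1', '2']
```

As with \verb+Entrez.parse+, this approach only makes sense when the file is essentially a list of records; a record must be fully processed before the next one is requested, because it is cleared afterwards.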