<h1>Reading Files in Python</h1><p><img src="images/1line.png" width="100%"  /></p>

<h3>Reading a File as a Sequence</h3>
<ul>
<li>A <strong>file handle</strong> opened for reading can be treated as a <strong>sequence</strong> of strings, where each line in the file is one string in the sequence</li>
<li>We can use the <strong>for</strong> statement to iterate through a <strong>sequence</strong></li>
<li>Remember: a <strong>sequence</strong> is an ordered collection of items</li>
</ul>
<pre>xfile = open('mbox.txt')<br />for line in xfile:<br />  print(line)</pre>
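<p>The loop above leaves the file open. A minimal sketch of the same iteration using the <strong>with</strong> statement, which closes the file automatically, might look like this. The file <strong>sample.txt</strong> and its three lines are made up for illustration so that the example runs without <strong>mbox.txt</strong>.</p>

```python
import os
import tempfile

# Build a tiny stand-in file so the sketch runs anywhere
# (hypothetical contents, not the real mbox.txt).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('first line\nsecond line\nthird line\n')

# The with statement closes the file when the block ends,
# even if an error occurs inside it.
lines_seen = []
with open(path) as xfile:
    for line in xfile:           # each item is one line, newline included
        lines_seen.append(line)

print(lines_seen)
print('Closed after the block:', xfile.closed)
```

<p>Each string in <strong>lines_seen</strong> still ends with its newline character, which matters as soon as the lines are printed.</p>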
<hr />
<h4>Counting Lines in a File</h4>
<ul>
<li>Open a file read-only</li>
<li>Use a for loop to read each line</li>
<li>Count the lines and print out the number of lines</li>
</ul>
<pre>fhand = open('mbox.txt')<br />count = 0<br />for line in fhand:<br />&nbsp; &nbsp; count = count + 1<br />print('Line Count:', count)</pre>
<pre>$ python open.py<br /><span style="color: #0000ff;">Line Count: 132045</span></pre>
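<p>The same count can be wrapped in a small reusable function. This is only a sketch: the name <strong>count_lines</strong> and the sample file are assumptions made for illustration, not part of the original program.</p>

```python
import os
import tempfile


def count_lines(path):
    """Return the number of lines in the text file at path."""
    with open(path) as fhand:
        # Add 1 for every line the file handle yields.
        return sum(1 for _ in fhand)


# Hypothetical three-line sample file standing in for mbox.txt.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('alpha\nbeta\ngamma\n')

line_count = count_lines(path)
print('Line Count:', line_count)
```

<p>Because the file handle is read one line at a time, this approach does not need to hold the whole file in memory.</p>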
<hr />
<h4>Searching Through a File</h4>
<ul>
<li>We can put an if statement in our for loop to only print lines that meet some criteria</li>
</ul>
<pre>fhand = open('mbox-short.txt')<br />for line in fhand:<br />&nbsp; &nbsp; if line.startswith('From:') :<br />&nbsp; &nbsp; &nbsp; &nbsp; print(line)</pre>
<p>Outputs:</p>
<pre><span style="color: #0000ff;">From: stephen.marquard@uct.ac.za

From: louis@media.berkeley.edu

From: zqian@umich.edu

From: rjlowe@iupui.edu</span>
...</pre>
<ul>
<li>The output is double-spaced because each line read from the file already ends with a newline, and print() adds another newline after it</li>
<li>You can strip the whitespace from the right-hand side of the string using the string method rstrip()
<ul>
<li>The newline is considered &ldquo;white space&rdquo; and is stripped</li>
</ul>
</li>
</ul>
<pre>fhand = open('mbox-short.txt')<br />for line in fhand:<br />&nbsp; &nbsp; line = line.rstrip()<br />&nbsp; &nbsp; if line.startswith('From:') :<br />&nbsp; &nbsp; &nbsp; &nbsp; print(line)</pre>
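<p>To see what <strong>rstrip()</strong> actually removes, it can help to look at a single line with <strong>repr()</strong>. The sketch below uses made-up strings rather than lines from <strong>mbox-short.txt</strong>, and also shows two alternatives: removing only the newline, or telling <strong>print()</strong> not to add one.</p>

```python
# Hypothetical line, shaped like one read from a mail file.
line = 'From: someone@example.com\n'

# rstrip() with no argument removes all trailing whitespace,
# including the newline.
stripped = line.rstrip()
print(repr(line))
print(repr(stripped))

# rstrip('\n') removes only trailing newlines and keeps other spaces.
padded = 'Subject: hello   \n'
only_newline_removed = padded.rstrip('\n')
print(repr(only_newline_removed))

# Alternative: leave the line untouched and suppress print()'s own newline.
print(line, end='')
```

<p>Which option fits best depends on whether trailing spaces in the data are meaningful.</p>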
<hr />
<h3>Using <strong>in</strong> to Select Lines</h3>
<ul>
<li>You can look for a string anywhere <strong>in</strong> a line as a selection criterion</li>
</ul>
<pre>fhand = open('mbox-short.txt')<br />for line in fhand:<br />    line = line.rstrip()<br />    if '@uct.ac.za' not in line :<br />        continue<br />    print(line)</pre>
<ul>
<li>It would find all of the lines below from the file:</li>
</ul>
<pre><span style="color: #0000ff;">From stephen.marquard<strong>@uct.ac.za</strong> Sat Jan 5 09:14:16 2008</span><br /><span style="color: #0000ff;">X-Authentication-Warning: set sender to stephen.marquard<strong>@uct.ac.za</strong> using -f</span><br /><span style="color: #0000ff;">From: stephen.marquard<strong>@uct.ac.za</strong></span><br /><span style="color: #0000ff;">Author: stephen.marquard<strong>@uct.ac.za</strong></span><br /><span style="color: #0000ff;">From david.horwitz<strong>@uct.ac.za</strong> Fri Jan 4 07:02:32 2008</span><br /><span style="color: #0000ff;">X-Authentication-Warning: set sender to david.horwitz<strong>@uct.ac.za</strong> using -f...</span></pre>
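<p>The difference between <strong>in</strong> and <strong>startswith()</strong> is that <strong>in</strong> matches the text at any position in the line. A small sketch with an in-memory list of made-up lines (the addresses and the <strong>example.org</strong> domain are placeholders) shows which lines are selected:</p>

```python
# Hypothetical lines standing in for the contents of a mail file.
lines = [
    'From: alice@example.org',
    'Subject: meeting notes',
    'To: bob@example.org',
    'X-Note: relayed via example.org gateway',
]

domain = '@example.org'

# Keep every line that contains the search string at any position.
matches = [line for line in lines if domain in line]

for line in matches:
    print(line)
```

<p>The last line is not selected because it contains <strong>example.org</strong> without the leading <strong>@</strong>, so the search string has to be chosen with some care.</p>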
<hr />
<h3>Reading the Whole File</h3>
<ul>
<li>We can read the whole file (newlines and all) into a single string using <strong>read()</strong></li>
</ul>
<pre>&gt;&gt;&gt; fhand = open('mbox-short.txt')<br />&gt;&gt;&gt; inp = fhand.read()<br />&gt;&gt;&gt; print(len(inp))<br /><span style="color: #0000ff;">94626</span><br />&gt;&gt;&gt; print(inp[:20])<br /><span style="color: #0000ff;">From stephen.marquar</span></pre>
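<p>Two details of <strong>read()</strong> are easy to miss: the length includes the newline characters, and a second call returns an empty string because the file position is already at the end. The sketch below illustrates both with a made-up two-line file, and uses the string method <strong>splitlines()</strong> to break the text back into lines.</p>

```python
import os
import tempfile

# Hypothetical two-line sample file standing in for mbox-short.txt.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as out:
    out.write('From a\nTo b\n')

with open(path) as fhand:
    inp = fhand.read()       # the whole file as one string
    rest = fhand.read()      # nothing left to read the second time

print(len(inp))              # newlines are counted as characters
print(inp[:4])               # slicing works as on any string
print(repr(rest))

# splitlines() returns the lines without their trailing newlines.
lines = inp.splitlines()
print(lines)
```

<p>Reading everything at once is convenient for small files, but for very large files the line-by-line loop uses far less memory.</p>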
<hr />
<h3>You Can Prompt Users For the File Name</h3>
<pre>fname = input('Enter the file name: ')<br />fhand = open(fname)<br />count = 0<br />for line in fhand:<br />&nbsp; &nbsp;if line.startswith('Subject:') :<br />&nbsp; &nbsp; &nbsp; count = count + 1<br />print('There were', count, 'subject lines in', fname)</pre>
<p>&nbsp;</p>
<pre><span style="color: #0000ff;">Enter the file name: mbox.txt</span><br /><span style="color: #0000ff;">There were 1797 subject lines in mbox.txt</span><br /><br /><span style="color: #0000ff;">Enter the file name: mbox-short.txt</span><br /><span style="color: #0000ff;">There were 27 subject lines in mbox-short.txt</span></pre>
<ul>
<li>You can use try/except to catch any errors that might occur if someone enters a bad file name:</li>
</ul>
<pre>fname = input('Enter the file name: ')<br />try:<br />    fhand = open(fname)<br />    count = 0<br />    for line in fhand:<br />        if line.startswith('Subject:') :<br />            count = count + 1<br />    print('There were', count, 'subject lines in', fname)<br />except:<br />    print('File cannot be opened:', fname)</pre>
<pre><span style="color: #0000ff;">Enter the file name: na na boo boo</span><br /><span style="color: #0000ff;">File cannot be opened: na na boo boo</span></pre>
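<p>A bare <strong>except:</strong> also hides unrelated mistakes, such as a typo inside the loop. One possible refinement, sketched below, is to catch only <strong>OSError</strong> (the built-in exception family raised when a file cannot be opened) and to keep the counting outside the <strong>try</strong> block. The function name <strong>count_subject_lines</strong> and the sample file are assumptions made for illustration.</p>

```python
import os
import tempfile


def count_subject_lines(fname):
    """Return the number of lines starting with 'Subject:',
    or None if the file cannot be opened."""
    try:
        fhand = open(fname)
    except OSError:
        # Covers a missing file, a directory name, permission problems, etc.
        return None
    with fhand:
        return sum(1 for line in fhand if line.startswith('Subject:'))


# Hypothetical sample file with two subject lines.
folder = tempfile.mkdtemp()
good_name = os.path.join(folder, 'sample.txt')
with open(good_name, 'w') as out:
    out.write('Subject: first\nFrom: someone@example.com\nSubject: second\n')

bad_name = os.path.join(folder, 'na na boo boo')   # does not exist

for fname in (good_name, bad_name):
    count = count_subject_lines(fname)
    if count is None:
        print('File cannot be opened:', fname)
    else:
        print('There were', count, 'subject lines in', fname)
```

<p>Returning <strong>None</strong> for a bad file name lets the caller decide how to report the problem, instead of printing from inside the function.</p>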
<p>&nbsp;</p>

<hr><h3>References</h3>
<ul>
<li>This Jupyter Notebook contains content from &ldquo;Python for Everybody&rdquo; by Charles R. Severance, which is licensed under <a href="https://runestone.academy/ns/books/published/universityofcoloradodenver_py4e-int_summer23/ack/creativecommons.org/licenses/by-nc-sa/3.0/">CC BY-NC-SA 3.0</a>.</li>