<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta content="Topic: Working with paths and files, Difficulty: Medium, Category: Section" name="description" />
<meta content="open file, read file, pathlib, join directory, context manager, close file, rb, binary file, utf-8, encoding, pickle, numpy, load, archive, npy, npz, pkl, glob, read lines, write, save" name="keywords" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Working with Files — Python Like You Mean It</title>
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/my_theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<script async="async" src="https://www.googletagmanager.com/gtag/js?id=UA-115029372-1"></script>
<script src="../_static/gtag.js"></script>
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
<script>window.MathJax = {"tex": {"inlineMath": [["$", "$"], ["\\(", "\\)"]], "processEscapes": true}, "options": {"ignoreHtmlClass": "tex2jax_ignore|mathjax_ignore|document", "processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
<script defer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script src="../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Import: Modules and Packages" href="Modules_and_Packages.html" />
<link rel="prev" title="Matplotlib" href="Matplotlib.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../index.html" class="icon icon-home"> Python Like You Mean It
</a>
<div class="version">
1.4
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Table of Contents:</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../intro.html">Python Like You Mean It</a></li>
<li class="toctree-l1"><a class="reference internal" href="../module_1.html">Module 1: Getting Started with Python</a></li>
<li class="toctree-l1"><a class="reference internal" href="../module_2.html">Module 2: The Essentials of Python</a></li>
<li class="toctree-l1"><a class="reference internal" href="../module_2_problems.html">Module 2: Problems</a></li>
<li class="toctree-l1"><a class="reference internal" href="../module_3.html">Module 3: The Essentials of NumPy</a></li>
<li class="toctree-l1"><a class="reference internal" href="../module_3_problems.html">Module 3: Problems</a></li>
<li class="toctree-l1"><a class="reference internal" href="../module_4.html">Module 4: Object Oriented Programming</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../module_5.html">Module 5: Odds and Ends</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="Writing_Good_Code.html">Writing Good Code</a></li>
<li class="toctree-l2"><a class="reference internal" href="Matplotlib.html">Matplotlib</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Working with Files</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#Working-with-Paths">Working with Paths</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#pathlib.Path">pathlib.Path</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#Opening-Files">Opening Files</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#Specifying-the-Open-Mode">Specifying the Open-Mode</a></li>
<li class="toctree-l4"><a class="reference internal" href="#Working-with-the-File-Object">Working with the File Object</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="#Example:-Writing-and-Reading-a-Text-File">Example: Writing and Reading a Text File</a></li>
<li class="toctree-l3"><a class="reference internal" href="#Globbing-for-Files">Globbing for Files</a></li>
<li class="toctree-l3"><a class="reference internal" href="#Saving-&-Loading-Python-Objects:-pickle">Saving & Loading Python Objects: pickle</a></li>
<li class="toctree-l3"><a class="reference internal" href="#Saving-and-Loading-NumPy-Arrays">Saving and Loading NumPy Arrays</a></li>
<li class="toctree-l3"><a class="reference internal" href="#Links-to-Official-Documentation">Links to Official Documentation</a></li>
<li class="toctree-l3"><a class="reference internal" href="#Reading-Comprehension-Solutions">Reading Comprehension Solutions</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="Modules_and_Packages.html">Import: Modules and Packages</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../changes.html">Changelog</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">Python Like You Mean It</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html" class="icon icon-home"></a> »</li>
<li><a href="../module_5.html">Module 5: Odds and Ends</a> »</li>
<li>Working with Files</li>
<li class="wy-breadcrumbs-aside">
<a href="../_sources/Module5_OddsAndEnds/WorkingWithFiles.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<style>
/* CSS for nbsphinx extension */
/* remove conflicting styling from Sphinx themes */
div.nbinput.container div.prompt *,
div.nboutput.container div.prompt *,
div.nbinput.container div.input_area pre,
div.nboutput.container div.output_area pre,
div.nbinput.container div.input_area .highlight,
div.nboutput.container div.output_area .highlight {
border: none;
padding: 0;
margin: 0;
box-shadow: none;
}
div.nbinput.container > div[class*=highlight],
div.nboutput.container > div[class*=highlight] {
margin: 0;
}
div.nbinput.container div.prompt *,
div.nboutput.container div.prompt * {
background: none;
}
div.nboutput.container div.output_area .highlight,
div.nboutput.container div.output_area pre {
background: unset;
}
div.nboutput.container div.output_area div.highlight {
color: unset; /* override Pygments text color */
}
/* avoid gaps between output lines */
div.nboutput.container div[class*=highlight] pre {
line-height: normal;
}
/* input/output containers */
div.nbinput.container,
div.nboutput.container {
display: -webkit-flex;
display: flex;
align-items: flex-start;
margin: 0;
width: 100%;
}
@media (max-width: 540px) {
div.nbinput.container,
div.nboutput.container {
flex-direction: column;
}
}
/* input container */
div.nbinput.container {
padding-top: 5px;
}
/* last container */
div.nblast.container {
padding-bottom: 5px;
}
/* input prompt */
div.nbinput.container div.prompt pre {
color: #307FC1;
}
/* output prompt */
div.nboutput.container div.prompt pre {
color: #BF5B3D;
}
/* all prompts */
div.nbinput.container div.prompt,
div.nboutput.container div.prompt {
width: 4.5ex;
padding-top: 5px;
position: relative;
user-select: none;
}
div.nbinput.container div.prompt > div,
div.nboutput.container div.prompt > div {
position: absolute;
right: 0;
margin-right: 0.3ex;
}
@media (max-width: 540px) {
div.nbinput.container div.prompt,
div.nboutput.container div.prompt {
width: unset;
text-align: left;
padding: 0.4em;
}
div.nboutput.container div.prompt.empty {
padding: 0;
}
div.nbinput.container div.prompt > div,
div.nboutput.container div.prompt > div {
position: unset;
}
}
/* disable scrollbars on prompts */
div.nbinput.container div.prompt pre,
div.nboutput.container div.prompt pre {
overflow: hidden;
}
/* input/output area */
div.nbinput.container div.input_area,
div.nboutput.container div.output_area {
-webkit-flex: 1;
flex: 1;
overflow: auto;
}
@media (max-width: 540px) {
div.nbinput.container div.input_area,
div.nboutput.container div.output_area {
width: 100%;
}
}
/* input area */
div.nbinput.container div.input_area {
border: 1px solid #e0e0e0;
border-radius: 2px;
/*background: #f5f5f5;*/
}
/* override MathJax center alignment in output cells */
div.nboutput.container div[class*=MathJax] {
text-align: left !important;
}
/* override sphinx.ext.imgmath center alignment in output cells */
div.nboutput.container div.math p {
text-align: left;
}
/* standard error */
div.nboutput.container div.output_area.stderr {
background: #fdd;
}
/* ANSI colors */
.ansi-black-fg { color: #3E424D; }
.ansi-black-bg { background-color: #3E424D; }
.ansi-black-intense-fg { color: #282C36; }
.ansi-black-intense-bg { background-color: #282C36; }
.ansi-red-fg { color: #E75C58; }
.ansi-red-bg { background-color: #E75C58; }
.ansi-red-intense-fg { color: #B22B31; }
.ansi-red-intense-bg { background-color: #B22B31; }
.ansi-green-fg { color: #00A250; }
.ansi-green-bg { background-color: #00A250; }
.ansi-green-intense-fg { color: #007427; }
.ansi-green-intense-bg { background-color: #007427; }
.ansi-yellow-fg { color: #DDB62B; }
.ansi-yellow-bg { background-color: #DDB62B; }
.ansi-yellow-intense-fg { color: #B27D12; }
.ansi-yellow-intense-bg { background-color: #B27D12; }
.ansi-blue-fg { color: #208FFB; }
.ansi-blue-bg { background-color: #208FFB; }
.ansi-blue-intense-fg { color: #0065CA; }
.ansi-blue-intense-bg { background-color: #0065CA; }
.ansi-magenta-fg { color: #D160C4; }
.ansi-magenta-bg { background-color: #D160C4; }
.ansi-magenta-intense-fg { color: #A03196; }
.ansi-magenta-intense-bg { background-color: #A03196; }
.ansi-cyan-fg { color: #60C6C8; }
.ansi-cyan-bg { background-color: #60C6C8; }
.ansi-cyan-intense-fg { color: #258F8F; }
.ansi-cyan-intense-bg { background-color: #258F8F; }
.ansi-white-fg { color: #C5C1B4; }
.ansi-white-bg { background-color: #C5C1B4; }
.ansi-white-intense-fg { color: #A1A6B2; }
.ansi-white-intense-bg { background-color: #A1A6B2; }
.ansi-default-inverse-fg { color: #FFFFFF; }
.ansi-default-inverse-bg { background-color: #000000; }
.ansi-bold { font-weight: bold; }
.ansi-underline { text-decoration: underline; }
div.nbinput.container div.input_area div[class*=highlight] > pre,
div.nboutput.container div.output_area div[class*=highlight] > pre,
div.nboutput.container div.output_area div[class*=highlight].math,
div.nboutput.container div.output_area.rendered_html,
div.nboutput.container div.output_area > div.output_javascript,
div.nboutput.container div.output_area:not(.rendered_html) > img{
padding: 5px;
margin: 0;
}
/* fix copybtn overflow problem in chromium (needed for 'sphinx_copybutton') */
div.nbinput.container div.input_area > div[class^='highlight'],
div.nboutput.container div.output_area > div[class^='highlight']{
overflow-y: hidden;
}
/* hide copybtn icon on prompts (needed for 'sphinx_copybutton') */
.prompt .copybtn {
display: none;
}
/* Some additional styling taken form the Jupyter notebook CSS */
div.rendered_html table {
border: none;
border-collapse: collapse;
border-spacing: 0;
color: black;
font-size: 12px;
table-layout: fixed;
}
div.rendered_html thead {
border-bottom: 1px solid black;
vertical-align: bottom;
}
div.rendered_html tr,
div.rendered_html th,
div.rendered_html td {
text-align: right;
vertical-align: middle;
padding: 0.5em 0.5em;
line-height: normal;
white-space: normal;
max-width: none;
border: none;
}
div.rendered_html th {
font-weight: bold;
}
div.rendered_html tbody tr:nth-child(odd) {
background: #f5f5f5;
}
div.rendered_html tbody tr:hover {
background: rgba(66, 165, 245, 0.2);
}
/* CSS overrides for sphinx_rtd_theme */
/* 24px margin */
.nbinput.nblast.container,
.nboutput.nblast.container {
margin-bottom: 19px; /* padding has already 5px */
}
/* ... except between code cells! */
.nblast.container + .nbinput.container {
margin-top: -19px;
}
.admonition > p:before {
margin-right: 4px; /* make room for the exclamation icon */
}
/* Fix math alignment, see https://github.com/rtfd/sphinx_rtd_theme/pull/686 */
.math {
text-align: unset;
}
</style>
<section id="Working-with-Files">
<h1>Working with Files<a class="headerlink" href="#Working-with-Files" title="Permalink to this headline"></a></h1>
<p>This section discusses best practices for writing Python code that reads from and writes to files. We will learn about the standard library’s <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> class, which helps ensure that the code we write is portable across operating systems (OS) (e.g. Windows, MacOS, Linux). We will also be introduced to a <em>context manager</em>, <code class="docutils literal notranslate"><span class="pre">open</span></code>, which permits us to read from and write to a file safely; by “safely” we mean that any file that we open is guaranteed to be closed properly, so that it will not be corrupted even if our code hits an error. Next, we will learn how to “glob” for files, meaning that we will search for and list files whose names match specific patterns. Lastly, we will briefly encounter the <code class="docutils literal notranslate"><span class="pre">pickle</span></code> module, which allows us to save (or “pickle”) Python objects to, and load them from, your computer’s file system.</p>
<section id="Working-with-Paths">
<h2>Working with Paths<a class="headerlink" href="#Working-with-Paths" title="Permalink to this headline"></a></h2>
<p>Suppose you are writing a Jupyter notebook where you are analyzing data that is saved to your computer. You will naturally need to detail the location where your data is stored on your computer’s file system so that you can load your data. Let’s suppose that this notebook is in the directory <code class="docutils literal notranslate"><span class="pre">my_folder</span></code> and that there is a directory, <code class="docutils literal notranslate"><span class="pre">data</span></code>, within it, which contains some text files with your data. Thus your directory structure looks like this:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>my_folder/
    |-notebook.ipynb
    |-data/
        |-data1.txt
        |-data2.txt
</pre></div>
</div>
<p>Now, if you are on a machine that is running Linux or MacOS, the path to <code class="docutils literal notranslate"><span class="pre">data1.txt</span></code> relative to the notebook is: <code class="docutils literal notranslate"><span class="pre">./data/data1.txt</span></code>. See that the character <code class="docutils literal notranslate"><span class="pre">/</span></code> is the separator that denotes successive directories in a path. On a Windows machine, the separator is <code class="docutils literal notranslate"><span class="pre">\</span></code>, thus the path to your data would be written as <code class="docutils literal notranslate"><span class="pre">.\data\data1.txt</span></code>. We want to write our code so that it can be utilized, without modification, across operating systems. This is where Python’s fantastic <code class="docutils literal notranslate"><span class="pre">pathlib</span></code> module comes in handy.</p>
<section id="pathlib.Path">
<h3>pathlib.Path<a class="headerlink" href="#pathlib.Path" title="Permalink to this headline"></a></h3>
<p>The standard library’s <a class="reference external" href="https://docs.python.org/3/library/pathlib.html">pathlib module</a> provides a number of classes that make it easy to work with file system paths across operating systems. We will limit our discussion to the <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> class, which will take care of all of our most pressing needs. This class allows us to write all of our path-related code in a single way, and it will convert the path to the operating system-appropriate format for us underneath the hood.</p>
<p>Let’s begin by creating a <code class="docutils literal notranslate"><span class="pre">Path</span></code> object that points to the directory containing the present notebook:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># creating a path-object pointing to the present directory</span>
<span class="o">>>></span> <span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>
<span class="o">>>></span> <span class="n">root</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">"."</span><span class="p">)</span> <span class="c1"># '.' means: the present directory that this code exists in</span>
</pre></div>
</div>
<p>Because I am running this code from a Windows machine, this will form a <code class="docutils literal notranslate"><span class="pre">WindowsPath</span></code> object automatically:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">root</span>
<span class="go">WindowsPath('.')</span>
</pre></div>
</div>
<p>If I were running on a Linux or MacOS machine, it would have formed a <code class="docutils literal notranslate"><span class="pre">PosixPath</span></code> object instead. Fortunately, we need not worry about these details as these classes handle them for us! The <code class="docutils literal notranslate"><span class="pre">Path</span></code> class has many useful methods for us to leverage. First, see that it conveniently overrides the <code class="docutils literal notranslate"><span class="pre">/</span></code> operator (by implementing a <a class="reference external" href="http://www.pythonlikeyoumeanit.com/Module4_OOP/Special_Methods.html">special method</a>) so that we can create a path to a subsequent directory. Let’s see this in
action:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># creating a path to the file 'data1.txt' in the subdirectory 'data'</span>
<span class="o">>>></span> <span class="n">path_to_data1</span> <span class="o">=</span> <span class="n">root</span> <span class="o">/</span> <span class="s2">"data"</span> <span class="o">/</span> <span class="s2">"data1.txt"</span>
<span class="o">>>></span> <span class="n">path_to_data1</span>
<span class="n">WindowsPath</span><span class="p">(</span><span class="s1">'data/data1.txt'</span><span class="p">)</span>
</pre></div>
</div>
<p>See that the <code class="docutils literal notranslate"><span class="pre">/</span></code> operator, when used in conjunction with a <code class="docutils literal notranslate"><span class="pre">Path</span></code> instance, created a new path with the appropriate path-separator for the present OS. This is extremely convenient!</p>
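<p>As a small illustration of this point, the sketch below uses <code class="docutils literal notranslate"><span class="pre">PurePosixPath</span></code> and <code class="docutils literal notranslate"><span class="pre">PureWindowsPath</span></code>, the two “pure” path flavors provided by <code class="docutils literal notranslate"><span class="pre">pathlib</span></code>. These only manipulate path strings and never touch the file system, so both can be created on any OS. The directory and file names are simply the ones used in the example above.</p>

```python
from pathlib import PurePosixPath, PureWindowsPath

# "Pure" paths never access the file system, so both flavors
# can be constructed regardless of the OS running this code.
posix_path = PurePosixPath("data") / "data1.txt"
windows_path = PureWindowsPath("data") / "data1.txt"

# The same `/` expression yields the separator of each flavor.
print(posix_path)    # data/data1.txt
print(windows_path)  # data\data1.txt
```

<p>In everyday code you would keep using <code class="docutils literal notranslate"><span class="pre">Path</span></code> itself, which selects the flavor that matches the present OS automatically.</p>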
<p>Let’s proceed to explore some other useful methods that <code class="docutils literal notranslate"><span class="pre">Path</span></code> provides us with. These methods enable us to inspect directories and files, create new directories, list all of the files in a directory, open files for reading/writing, and much more. A complete listing of these methods can be found <a class="reference external" href="https://docs.python.org/3/library/pathlib.html#methods-and-properties">here</a> and <a class="reference external" href="https://docs.python.org/3/library/pathlib.html#methods">here</a>, collectively; it is highly recommended that you take time to look through them.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">root</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">"."</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">path_to_data1</span> <span class="o">=</span> <span class="n">root</span> <span class="o">/</span> <span class="s2">"data"</span> <span class="o">/</span> <span class="s2">"data1.txt"</span>
<span class="go"># Checking to see if a file or directory exists:</span>
<span class="gp">>>> </span><span class="n">path_to_data1</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
<span class="go">True</span>
<span class="gp">>>> </span><span class="p">(</span><span class="n">root</span> <span class="o">/</span> <span class="s2">"bogus_path"</span><span class="p">)</span><span class="o">.</span><span class="n">exists</span><span class="p">()</span>
<span class="go">False</span>
<span class="go"># Getting the "absolute" path to a file or directory:</span>
<span class="gp">>>> </span><span class="n">path_to_data1</span><span class="o">.</span><span class="n">absolute</span><span class="p">()</span>
<span class="go">WindowsPath('C:/Users/TerranceWasabi/Desktop/PLYMI/Module5_OddsAndEnds/data/data1.txt')</span>
<span class="go"># Access the name of the file that the path is pointing to</span>
<span class="gp">>>> </span><span class="n">path_to_data1</span><span class="o">.</span><span class="n">name</span>
<span class="go">'data1.txt'</span>
<span class="go"># Create a new directory, named 'new_folder' within the root directory</span>
<span class="gp">>>> </span><span class="n">new_dir</span> <span class="o">=</span> <span class="n">root</span> <span class="o">/</span> <span class="s2">"new_folder"</span>
<span class="gp">>>> </span><span class="n">new_dir</span><span class="o">.</span><span class="n">mkdir</span><span class="p">()</span>
<span class="go"># Use 'glob' to return a generator over all files</span>
<span class="go"># that match a specified pattern. E.g. get path to every</span>
<span class="go"># .txt file in a directory</span>
<span class="gp">>>> </span><span class="nb">list</span><span class="p">((</span><span class="n">root</span> <span class="o">/</span> <span class="s2">"data"</span><span class="p">)</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s2">"*.txt"</span><span class="p">))</span>
<span class="go">[WindowsPath('data/data1.txt'), WindowsPath('data/data2.txt')]</span>
<span class="go"># convert a path-object to a string formatted for the present OS</span>
<span class="gp">>>> </span><span class="nb">str</span><span class="p">(</span><span class="n">path_to_data1</span><span class="p">)</span>
<span class="go">'data\\data1.txt'</span>
</pre></div>
</div>
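<p>One detail that the listing above does not show: <code class="docutils literal notranslate"><span class="pre">mkdir()</span></code> raises <code class="docutils literal notranslate"><span class="pre">FileExistsError</span></code> if the directory already exists, and <code class="docutils literal notranslate"><span class="pre">FileNotFoundError</span></code> if a parent directory is missing. The following is a minimal sketch of how the <code class="docutils literal notranslate"><span class="pre">parents</span></code> and <code class="docutils literal notranslate"><span class="pre">exist_ok</span></code> arguments avoid both errors. It works inside a temporary directory, and the names <code class="docutils literal notranslate"><span class="pre">nested</span></code> and <code class="docutils literal notranslate"><span class="pre">notes.txt</span></code> are made up for illustration.</p>

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so that nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    new_dir = root / "new_folder" / "nested"  # hypothetical directory names

    # parents=True creates the missing "new_folder" as well;
    # exist_ok=True makes a repeated call harmless.
    new_dir.mkdir(parents=True, exist_ok=True)
    new_dir.mkdir(parents=True, exist_ok=True)

    # Create a small file, then glob for it.
    (new_dir / "notes.txt").write_text("hello")
    found = sorted(p.name for p in new_dir.glob("*.txt"))
    is_dir = new_dir.is_dir()

print(found)   # ['notes.txt']
print(is_dir)  # True
```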
<div class="admonition note">
<p class="admonition-title fa fa-exclamation-circle"><strong>Takeaway</strong>:</p>
<p>You should strive to utilize <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> whenever you are working with file system paths in your code. To reiterate: this will ensure that your code is portable across operating systems and will help make your path handling easy to read. In addition, this class’s methods provide a massive amount of functionality for you to leverage at your convenience.</p>
</div>
<div class="admonition warning">
<p class="admonition-title fa fa-exclamation-circle"><strong>Note</strong>:</p>
<p><code class="docutils literal notranslate"><span class="pre">pathlib</span></code> was introduced in Python 3.4. Although many 3rd party libraries have updated their file-I/O utilities to accept both strings and <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> objects (e.g. <code class="docutils literal notranslate"><span class="pre">numpy.save</span></code> can be passed a <code class="docutils literal notranslate"><span class="pre">Path</span></code> instance to tell it where to save a numpy-array), some libraries are late to the party and will only accept strings as paths. On such occasions you can simply convert your <code class="docutils literal notranslate"><span class="pre">Path</span></code> instance to a string by calling <code class="docutils literal notranslate"><span class="pre">str</span></code> on it, and then pass the resulting string-path to the file-I/O function. This is also a friendly reminder to accommodate <code class="docutils literal notranslate"><span class="pre">pathlib.Path</span></code> objects whenever you find yourself writing your own file-I/O functions!</p>
</div>
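<p>One way to follow this advice in your own functions is to pass whatever you are given through <code class="docutils literal notranslate"><span class="pre">Path</span></code> at the top of the function. The sketch below assumes a hypothetical helper named <code class="docutils literal notranslate"><span class="pre">count_lines</span></code>; it is not part of any library and is only meant to show the pattern.</p>

```python
import tempfile
from pathlib import Path
from typing import Union


def count_lines(path: Union[str, Path]) -> int:
    """Hypothetical helper: count the lines in a text file.

    Accepts either a string or a `Path` instance."""
    path = Path(path)  # converts a string; leaves a Path equivalent
    with path.open(mode="r") as f:
        return sum(1 for _ in f)


# Demonstrate that both kinds of input behave identically.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.txt"  # made-up file name
    demo_file.write_text("a\nb\nc\n")
    n_from_path = count_lines(demo_file)
    n_from_str = count_lines(str(demo_file))

print(n_from_path, n_from_str)  # 3 3
```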
</section>
</section>
<section id="Opening-Files">
<h2>Opening Files<a class="headerlink" href="#Opening-Files" title="Permalink to this headline"></a></h2>
<p>It is recommended that you refer to the <a class="reference external" href="https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files">official Python tutorial</a> for a simple rundown of file reading and writing.</p>
<p>Whenever you instruct your code to open a file for reading or writing, you must take care that the file ultimately is closed so that its data is not vulnerable to being modified. Python provides the <code class="docutils literal notranslate"><span class="pre">open</span></code> context manager, which is designed to ensure that a file will be closed even in the event that our code raises an error.</p>
<p>The following code opens the file “file1.txt” for writing:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># demonstrating the use of the `open` context manager</span>
<span class="c1"># we will write to the file named "file1.txt", located</span>
<span class="c1"># in the present directory</span>
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>

<span class="n">path_to_file</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s2">"file1.txt"</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path_to_file</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="c1"># The indented space enters the "context" of the open file.</span>
    <span class="c1"># Leaving the indented space exits the context of the opened file, forcing</span>
    <span class="c1"># the file to be closed. This is ensured even if the code within the indented</span>
    <span class="c1"># block causes an error.</span>
    <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'this is a line.</span><span class="se">\n</span><span class="s1">This is a second line.</span><span class="se">\n</span><span class="s1">This is the third line.'</span><span class="p">)</span>
<span class="c1"># The file is closed here.</span>
</pre></div>
</div>
<p>The syntax <code class="docutils literal notranslate"><span class="pre">with</span> <span class="pre">&lt;context_manager&gt;()</span> <span class="pre">as</span> <span class="pre">&lt;context_variable&gt;:</span></code> signifies the creation of a context with the object <code class="docutils literal notranslate"><span class="pre">&lt;context_variable&gt;</span></code>. In this case <code class="docutils literal notranslate"><span class="pre">open</span></code> is the context manager, and the variable we named <code class="docutils literal notranslate"><span class="pre">f</span></code> is the file-object that is opened within that context, which is delimited by the subsequent indented space. You can also call <code class="docutils literal notranslate"><span class="pre">open</span></code> directly from a <code class="docutils literal notranslate"><span class="pre">Path</span></code> instance:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">path_to_file</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">mode</span><span class="o">=</span><span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s1">'this is a line.</span><span class="se">\n</span><span class="s1">This is a second line.</span><span class="se">\n</span><span class="s1">This is the third line.'</span><span class="p">)</span>
</pre></div>
</div>
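<p>The guarantee that the file is closed on leaving the context can be checked through the file object's <code class="docutils literal notranslate"><span class="pre">closed</span></code> attribute. The following is a minimal sketch; the temporary directory and the file name <code class="docutils literal notranslate"><span class="pre">demo.txt</span></code> are hypothetical and are used only so that the example cleans up after itself.</p>

```python
import tempfile
from pathlib import Path

# Hypothetical scratch location; not part of the example above
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.txt"

    with open(demo_path, mode="w") as f:
        f.write("first line\n")
        closed_inside = f.closed  # still inside the context: the file is open

    closed_after = f.closed  # the context has ended: the file is closed

    # The file is closed even if the indented block raises an error
    try:
        with demo_path.open(mode="a") as g:
            g.write("second line\n")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    closed_after_error = g.closed

    # Closing the file flushed both writes to disk
    saved_text = demo_path.read_text()
```

<p>Here <code class="docutils literal notranslate"><span class="pre">closed_inside</span></code> is <code class="docutils literal notranslate"><span class="pre">False</span></code>, while <code class="docutils literal notranslate"><span class="pre">closed_after</span></code> and <code class="docutils literal notranslate"><span class="pre">closed_after_error</span></code> are both <code class="docutils literal notranslate"><span class="pre">True</span></code>.</p>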
<p>The complete documentation for <code class="docutils literal notranslate"><span class="pre">open</span></code> can be found <a class="reference external" href="https://docs.python.org/3/library/functions.html#open">here</a>.</p>
<section id="Specifying-the-Open-Mode">
<h3>Specifying the Open-Mode<a class="headerlink" href="#Specifying-the-Open-Mode" title="Permalink to this headline"></a></h3>
<p>Specifying <code class="docutils literal notranslate"><span class="pre">mode='w'</span></code> indicates that we will be writing to the file anew: if the file already has any content, that content will be <em>erased</em> before anything is written. The following are the available “modes” for opening a file:</p>
<table class="docutils align-default">
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Mode</p></th>
<th class="head"><p>Explanation</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">r</span></code></p></td>
<td><p>Open the file for reading text</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">w</span></code></p></td>
<td><p>Open the file, <strong>clearing its contents</strong>, for writing text anew</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">a</span></code></p></td>
<td><p>Open the file to write text to the end of any existing content, thus “appending” to the file</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">x</span></code></p></td>
<td><p>Open the file for writing text, failing if the file already exists</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">+</span></code></p></td>
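<td><p>Added to one of the other modes (e.g. <code class="docutils literal notranslate"><span class="pre">r+</span></code> or <code class="docutils literal notranslate"><span class="pre">w+</span></code>) to open the file for both reading and writing text; it cannot be used on its own</p></td>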
</tr>
</tbody>
</table>
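<p>The differences between the write-oriented modes are easiest to see side by side. The following is a minimal sketch that works in a temporary directory; the file name <code class="docutils literal notranslate"><span class="pre">log.txt</span></code> and the variable names are hypothetical.</p>

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "log.txt"  # hypothetical file name

    # mode 'x' creates the file; the file must not exist yet
    with open(log_path, mode="x") as f:
        f.write("created\n")

    # a second attempt with mode 'x' fails because the file now exists
    try:
        with open(log_path, mode="x") as f:
            f.write("never written\n")
    except FileExistsError:
        x_mode_refused = True
    else:
        x_mode_refused = False

    # mode 'a' keeps the existing content and writes after it
    with open(log_path, mode="a") as f:
        f.write("appended\n")
    with open(log_path, mode="r") as f:
        after_append = f.read()

    # mode 'w' clears the existing content before writing
    with open(log_path, mode="w") as f:
        f.write("overwritten\n")
    with open(log_path, mode="r") as f:
        after_write = f.read()
```

<p>After running this, <code class="docutils literal notranslate"><span class="pre">after_append</span></code> holds both the created and the appended line, whereas <code class="docutils literal notranslate"><span class="pre">after_write</span></code> holds only the overwritten line.</p>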
<p>By default, these modes read and write text. That is, when you read data from your file system with <code class="docutils literal notranslate"><span class="pre">mode='r'</span></code>, Python will automatically <em>decode</em> the binary data that was stored on your machine, converting it to written text stored as a string. Similarly, writing a string to a file in modes ‘w’, ‘a’, or ‘x’ will first encode the string into a binary representation (which is necessary for it to be stored as a file). The default encoding is platform-dependent: it is utf-8 on most Linux and macOS systems, but it may differ on Windows. You can pass <code class="docutils literal notranslate"><span class="pre">encoding="utf-8"</span></code> to <code class="docutils literal notranslate"><span class="pre">open</span></code> to specify the encoding explicitly.</p>
<p>You can instead force Python to read and write strictly in terms of binary data by adding a <code class="docutils literal notranslate"><span class="pre">'b'</span></code> to these modes: <code class="docutils literal notranslate"><span class="pre">'rb'</span></code>, <code class="docutils literal notranslate"><span class="pre">'wb'</span></code>, <code class="docutils literal notranslate"><span class="pre">'ab'</span></code>, <code class="docutils literal notranslate"><span class="pre">'xb'</span></code>, or, combined with <code class="docutils literal notranslate"><span class="pre">+</span></code>, a mode such as <code class="docutils literal notranslate"><span class="pre">'r+b'</span></code>. It is important to be aware of this binary mode. For example, if you are saving a NumPy-array, you should open a file in the ‘wb’ or ‘xb’ modes so that it expects binary data to be written to it; a NumPy array of numbers is not text.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># saving a NumPy-array to the file 'array.npy'</span>
<span class="o">>>></span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="o">>>></span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="c1"># file must be open for binary-write mode</span>
<span class="c1"># since we are not saving text</span>
<span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"array.npy"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>     <span class="n">np</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
</pre></div>
</div>
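<p>The distinction between text mode and binary mode can also be seen without NumPy. The following minimal sketch writes a short string in text mode and reads the same file back in both modes. The file name <code class="docutils literal notranslate"><span class="pre">greeting.txt</span></code> is hypothetical, and the encoding is passed explicitly so that the result does not depend on the platform.</p>

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "greeting.txt"  # hypothetical file name

    # text mode: the string is encoded to bytes on its way to disk
    with open(path, mode="w", encoding="utf-8") as f:
        f.write("caf\u00e9")  # the four-character word 'café'

    # binary mode: we get the raw bytes exactly as they are stored
    with open(path, mode="rb") as f:
        raw = f.read()

    # text mode: the bytes are decoded back into a string
    with open(path, mode="r", encoding="utf-8") as f:
        text = f.read()
```

<p>The string has four characters, but <code class="docutils literal notranslate"><span class="pre">raw</span></code> contains five bytes, because utf-8 stores the accented character using two bytes.</p>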
</section>
<section id="Working-with-the-File-Object">
<h3>Working with the File Object<a class="headerlink" href="#Working-with-the-File-Object" title="Permalink to this headline"></a></h3>
<p>When we invoke <code class="docutils literal notranslate"><span class="pre">open</span></code> to open a file, the context manager produces an opened file object. The methods of this file object allow us to write to and read from the opened file (assuming that we have utilized the appropriate mode when opening it).</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># demonstrating the `read` method of the file object</span>
<span class="o">>>></span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path_to_file</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">var</span><span class="p">:</span>
<span class="o">...</span>     <span class="c1"># reads the entire content of the file as a string</span>
<span class="o">...</span>     <span class="n">content</span> <span class="o">=</span> <span class="n">var</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">content</span>
<span class="s1">'this is a line.</span><span class="se">\n</span><span class="s1">This is a second line.</span><span class="se">\n</span><span class="s1">This is the third line.'</span>
<span class="o">>>></span> <span class="nb">print</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
<span class="n">this</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">line</span><span class="o">.</span>
<span class="n">This</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">second</span> <span class="n">line</span><span class="o">.</span>
<span class="n">This</span> <span class="ow">is</span> <span class="n">the</span> <span class="n">third</span> <span class="n">line</span><span class="o">.</span>
</pre></div>
</div>
<p>The following summarizes some of the methods available to this file object:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">read()</span></code>: Read the entire content of the file as a string or as bytes (depending on the open-mode)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">readline()</span></code>: Read the next line of text from the file, including the trailing <code class="docutils literal notranslate"><span class="pre">'\n'</span></code> character</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">readlines()</span></code>: Read in the lines of text from the file, storing each line as a string in a list.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">write(x)</span></code>: Write <code class="docutils literal notranslate"><span class="pre">x</span></code> (a string, or bytes in binary mode) to the file.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">writelines(x)</span></code>: Given an iterable of strings, write each string to the file in order (the inverse of <code class="docutils literal notranslate"><span class="pre">readlines</span></code>). No newline characters are added, so each string should already end with <code class="docutils literal notranslate"><span class="pre">'\n'</span></code> if it is meant to be its own line.</p></li>
</ul>
<p>Also, it is important to note that the file object can be <em>iterated over</em>, and that each iteration will return an individual line of text from the file. This is the best way to read through an entire file line-by-line.</p>
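<p>The methods listed above can be exercised together in a short round trip. This is a minimal sketch that uses a temporary directory; the file name <code class="docutils literal notranslate"><span class="pre">letters.txt</span></code> and the variable names are hypothetical.</p>

```python
import tempfile
from pathlib import Path

lines_to_write = ["alpha\n", "beta\n", "gamma\n"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "letters.txt"  # hypothetical file name

    with open(path, mode="w") as f:
        # writelines does not add newlines; ours are already in the strings
        f.writelines(lines_to_write)

    with open(path, mode="r") as f:
        first = f.readline()   # reads only the first line
        rest = f.readlines()   # reads whatever lines remain, as a list

    with open(path, mode="r") as f:
        # iterating over the file object yields one line at a time
        stripped = [line.rstrip("\n") for line in f]
```

<p>Note that <code class="docutils literal notranslate"><span class="pre">readlines</span></code> picks up where <code class="docutils literal notranslate"><span class="pre">readline</span></code> left off, so <code class="docutils literal notranslate"><span class="pre">rest</span></code> contains only the second and third lines.</p>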
</section>
</section>
<section id="Example:-Writing-and-Reading-a-Text-File">
<h2>Example: Writing and Reading a Text File<a class="headerlink" href="#Example:-Writing-and-Reading-a-Text-File" title="Permalink to this headline"></a></h2>
<p>Given the following string:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># recall: triple-quotes can be used to write multi-line strings</span>
<span class="o">>>></span> <span class="n">some_text</span> <span class="o">=</span> <span class="s2">"""A bagel rolled down the hill.</span>
<span class="s2">I mean *all* the way down the hill.</span>
<span class="s2">A lady watched it roll.</span>
<span class="s2">Way to help me out."""</span>
<span class="o">>>></span> <span class="n">some_text</span>
<span class="s1">'A bagel rolled down the hill.</span><span class="se">\n</span><span class="s1">I mean *all* the way down the hill.</span><span class="se">\n</span><span class="s1">A lady watched it roll.</span><span class="se">\n</span><span class="s1">Way to help me out.'</span>
</pre></div>
</div>
<p>Write that string to a file, “a_poem.txt”, in the present directory:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># use mode-x to ensure that we don't overwrite the file</span>
<span class="c1"># if it already exists</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"a_poem.txt"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"x"</span><span class="p">)</span> <span class="k">as</span> <span class="n">my_open_file</span><span class="p">:</span>
    <span class="n">my_open_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">some_text</span><span class="p">)</span>
</pre></div>
</div>
<p>Now let’s read in each line of the file and append them to the list <code class="docutils literal notranslate"><span class="pre">out</span></code>, but <em>only if that line starts with the letter ‘A’</em> (just to make things a little bit more involved):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"a_poem.txt"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"r"</span><span class="p">)</span> <span class="k">as</span> <span class="n">my_open_file</span><span class="p">:</span>
    <span class="c1"># recall: iterating over the file-object yields each line of the file</span>
    <span class="c1"># one line at a time</span>
    <span class="n">out</span> <span class="o">=</span> <span class="p">[</span><span class="n">line</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">my_open_file</span> <span class="k">if</span> <span class="n">line</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s2">"A"</span><span class="p">)]</span>
</pre></div>
</div>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># verify that the output is what we expect</span>
<span class="o">>>></span> <span class="n">out</span>
<span class="p">[</span><span class="s1">'A bagel rolled down the hill.</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="s1">'A lady watched it roll.</span><span class="se">\n</span><span class="s1">'</span><span class="p">]</span>
</pre></div>
</div>
</section>
<section id="Globbing-for-Files">
<h2>Globbing for Files<a class="headerlink" href="#Globbing-for-Files" title="Permalink to this headline"></a></h2>
<p>There are many cases in which we may want to construct a list of files to iterate over. For example, if we have several data files, it would be useful to create a file list which we can iterate through and process in sequence. One way to do this would be to manually construct such a list of files:</p>
<div class="nbinput nblast docutils container">
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[1]:
</pre></div>
</div>
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">my_files</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'data/file1.txt'</span><span class="p">,</span> <span class="s1">'data/file2.txt'</span><span class="p">,</span> <span class="s1">'data/file3.txt'</span><span class="p">,</span> <span class="s1">'data/file4.txt'</span><span class="p">]</span>
</pre></div>
</div>
</div>
<p>However, this is tedious and prone to error, whether by mistyping a file name or forgetting a file. A much more powerful way to construct such a list of files is by file globbing. A <code class="docutils literal notranslate"><span class="pre">glob</span></code> is a set of file names matching some pattern. To glob files, we use special wildcard characters that match all of the files whose names fit a certain pattern. In our case, <code class="docutils literal notranslate"><span class="pre">*</span></code> will be the wildcard character we use the most: it matches any sequence of characters, including none. This is best illustrated with an example. Below, we see some globs and the types of patterns they will match:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span># matches anything that starts with `file` and ends with `.txt` like
# file1.txt, filefilefile.txt, file.txt, file12345.txt, ...
file*.txt
# matches all .txt files in the 'data' directory
data/*.txt
# matches any file name
*
# matches all png image files
*.png
# matches anything that contains 'test' as part of its file name
*test*
# matches all .py files that contain 'number'
*number*.py
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">pathlib</span></code> module provides convenient functionality for globbing files. Once we have a <code class="docutils literal notranslate"><span class="pre">Path</span></code> object, we can simply call <code class="docutils literal notranslate"><span class="pre">glob()</span></code> on it and pass in a glob string. This will return a <a class="reference external" href="http://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Generators_and_Comprehensions.html#Introducing-Generators">generator</a> that will yield each of the globbed files.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># glob all of the text files in the present directory</span>
<span class="c1"># that start with 'test' and end with '.txt'</span>
<span class="o">>>></span> <span class="n">root_dir</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">files</span> <span class="o">=</span> <span class="n">root_dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s1">'test*.txt'</span><span class="p">)</span> <span class="c1"># this produces a generator</span>
<span class="o">>>></span> <span class="n">files</span>
<span class="o">&lt;</span><span class="n">generator</span> <span class="nb">object</span> <span class="n">Path</span><span class="o">.</span><span class="n">glob</span> <span class="n">at</span> <span class="mh">0x00000146CE118620</span><span class="o">&gt;</span>
<span class="c1"># get a sorted list of the globbed paths</span>
<span class="o">>>></span> <span class="nb">sorted</span><span class="p">(</span><span class="n">files</span><span class="p">)</span>
<span class="p">[</span><span class="n">PosixPath</span><span class="p">(</span><span class="s1">'test_0.txt'</span><span class="p">),</span>
<span class="n">PosixPath</span><span class="p">(</span><span class="s1">'test_1.txt'</span><span class="p">),</span>
<span class="n">PosixPath</span><span class="p">(</span><span class="s1">'test_apple.txt'</span><span class="p">)]</span>
<span class="c1"># iterating over the generator directly</span>
<span class="o">>>></span> <span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">root_dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s1">'test*.txt'</span><span class="p">):</span>
<span class="o">...</span>     <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="o">...</span>         <span class="c1"># do some processing</span>
<span class="o">...</span>         <span class="k">pass</span>
</pre></div>
</div>
<p>For more details on globbing, see <a class="reference external" href="https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob">the documentation</a>.</p>
<div class="admonition note">
<p class="admonition-title fa fa-exclamation-circle"><strong>Reading Comprehension: Basic glob patterns</strong></p>
<p>Write a glob pattern for each of the following prompts</p>
<ul class="simple">
<li><p>Glob all .txt files in the directory <code class="docutils literal notranslate"><span class="pre">./files</span></code></p></li>
<li><p>Glob all files that contain ‘quirk’ as part of their file name</p></li>
<li><p>Glob all files that begin with ‘data’</p></li>
<li><p>Glob all files that start with the letter ‘q’, contain a ‘w’, and end with a ‘.npy’ extension</p></li>
</ul>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">*</span></code> wildcard is not the only pattern available to us. Sometimes it can be useful to match certain subsets of characters. For example, we may only want to match file names that start with a number. With the <code class="docutils literal notranslate"><span class="pre">*</span></code> wildcard alone, that’s not possible. Luckily for us, these common use-cases are also taken care of.</p>
<p>To match a subset of characters, we can use square brackets: <code class="docutils literal notranslate"><span class="pre">[abc]*</span></code> will match anything that starts with ‘a’, ‘b’, or ‘c’ and nothing else. We can also use a ‘-’ inside our brackets to glob groups of characters. For example:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span># matches any .txt file whose name starts with a digit
[0-9]*.txt
# matches any file that has a vowel in its name
*[aeiou]*
# matches any file that starts with a lowercase letter
[a-z]*
</pre></div>
</div>
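<p>These bracket patterns can be tried out with <code class="docutils literal notranslate"><span class="pre">Path.glob</span></code>. The following is a minimal sketch that creates a handful of empty files in a temporary directory and globs them. The file names are hypothetical and are all lowercase, since whether glob matching is case-sensitive can depend on the platform.</p>

```python
import tempfile
from pathlib import Path

# Hypothetical file names, chosen to exercise each pattern
names = ["0_notes.txt", "7up.txt", "apple.txt", "rhythm.png", "sky.py"]

with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    for name in names:
        (root / name).touch()  # create an empty file

    # .txt files whose names start with a digit
    starts_with_digit = sorted(p.name for p in root.glob("[0-9]*.txt"))

    # files with a vowel anywhere in their names
    has_vowel = sorted(p.name for p in root.glob("*[aeiou]*"))

    # files whose names start with a lowercase letter
    starts_lowercase = sorted(p.name for p in root.glob("[a-z]*"))
```

<p>The generators are consumed inside the <code class="docutils literal notranslate"><span class="pre">with</span></code> block because the temporary directory, along with its files, is deleted once that block ends.</p>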
<div class="admonition note">
<p class="admonition-title fa fa-exclamation-circle"><strong>Reading Comprehension: More glob patterns</strong></p>
<p>Write a glob pattern for each of the following prompts</p>
<ul class="simple">
<li><p>Any file with an odd digit in its name</p></li>
<li><p>All txt files that have the letters ‘q’ or ‘z’ in them</p></li>
</ul>
</div>
</section>
<section id="Saving-&-Loading-Python-Objects:-pickle">
<h2>Saving & Loading Python Objects: pickle<a class="headerlink" href="#Saving-&-Loading-Python-Objects:-pickle" title="Permalink to this headline"></a></h2>
<p>Suppose that you have just populated a dictionary that is serving as a grade book for a course that you are teaching:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">grades</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"Albert"</span><span class="p">:</span> <span class="mi">92</span><span class="p">,</span> <span class="s2">"David"</span><span class="p">:</span> <span class="mi">85</span><span class="p">,</span> <span class="s2">"Emmy"</span><span class="p">:</span> <span class="mi">98</span><span class="p">,</span> <span class="s2">"Marie"</span><span class="p">:</span> <span class="mi">79</span><span class="p">}</span>
</pre></div>
</div>
<p>How do you save this dictionary so that you can revisit these grades at a later time? Python’s standard library includes the <a class="reference external" href="https://docs.python.org/3/library/pickle.html">pickle</a> module, which provides functions for saving and loading Python objects to disk. Let’s “pickle” this dictionary, saving it to the file “grades.pkl” in our present directory:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pickle</span>
<span class="c1"># pickling a dictionary</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"grades.pkl"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">opened_file</span><span class="p">:</span>
    <span class="n">pickle</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">grades</span><span class="p">,</span> <span class="n">opened_file</span><span class="p">)</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">pickle.dump</span></code> creates a serialized representation of our dictionary, which is then written to our opened file via the file object that we supplied. Note that we open the file in write-binary mode as we are writing binary data and not text data that first needs to be encoded to binary data. Also note that we use the “.pkl” suffix to indicate that the file is binary data that was written using Python’s pickle protocol. Using this suffix is not necessary but is good practice.</p>
<p><code class="docutils literal notranslate"><span class="pre">pickle.load</span></code> will unpickle our Python object from disk, permitting us to resume work with our grade book.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># unpickling a dictionary</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"grades.pkl"</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s2">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">opened_file</span><span class="p">:</span>
    <span class="n">my_loaded_grades</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">opened_file</span><span class="p">)</span>
</pre></div>
</div>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">my_loaded_grades</span>
<span class="go">{'Albert': 92, 'David': 85, 'Emmy': 98, 'Marie': 79}</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">pickle.dump</span></code> and <code class="docutils literal notranslate"><span class="pre">pickle.load</span></code> cover the vast majority of our object-pickling needs. A wide range of Python objects can be saved in this way, including functions that we define and instances of custom classes. Note that functions and classes are pickled by reference to their names rather than by value, so the code that defines them must be importable when the object is unpickled. Please refer to <a class="reference external" href="https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled">the official documentation</a> for a discussion of the Python objects that can and cannot be pickled.</p>
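<p>To illustrate what can and cannot be pickled, the following minimal sketch round-trips a nested collection of built-in objects and then attempts to pickle a lambda function, which has no importable name. The <code class="docutils literal notranslate"><span class="pre">gradebook</span></code> contents and the file name are hypothetical, and the exact exception type raised for the lambda is assumed to be one of the two caught here.</p>

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical data: nested built-in containers pickle without any extra work
gradebook = {"Albert": [92, 88], "Emmy": (98, 100), "notes": {"late": {"David"}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "gradebook.pkl"

    with open(pkl_path, mode="wb") as f:
        pickle.dump(gradebook, f)

    with open(pkl_path, mode="rb") as f:
        restored = pickle.load(f)

# A lambda cannot be looked up by name, so pickling it fails
try:
    pickle.dumps(lambda score: score + 1)
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
else:
    lambda_pickled = True
```

<p>The restored object is equal to the original but is a distinct object in memory. Because unpickling can execute arbitrary code, only load pickle files that come from a source you trust.</p>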
</section>
<section id="Saving-and-Loading-NumPy-Arrays">
<h2>Saving and Loading NumPy Arrays<a class="headerlink" href="#Saving-and-Loading-NumPy-Arrays" title="Permalink to this headline"></a></h2>
<p>NumPy provides its own functions for saving and loading arrays. Although these arrays can be pickled, it is strongly advised to leverage NumPy’s file-IO functions. NumPy’s standard binary file type used to store array data is known as an ‘.npy’ file. The NumPy binary archive format, which stores multiple arrays in one file, is known as the ‘.npz’ format.</p>
<p>Let’s save the array <code class="docutils literal notranslate"><span class="pre">x</span> <span class="pre">=</span> <span class="pre">np.array([1,</span> <span class="pre">2,</span> <span class="pre">3])</span></code> to the binary file (not a text file) “my_array.npy”. <code class="docutils literal notranslate"><span class="pre">numpy.save</span></code> and <code class="docutils literal notranslate"><span class="pre">numpy.load</span></code> will save and load arrays, handling all of the file opening and closing for you. Thus there is no need to use a context manager when using these functions.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="gp">>>> </span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="go"># save a numpy array to disk</span>
<span class="gp">>>> </span><span class="n">np</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">"my_array.npy"</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
<span class="go"># load the saved array from disk</span>
<span class="gp">>>> </span><span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"my_array.npy"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">y</span>
<span class="go">array([1, 2, 3])</span>
</pre></div>
</div>
<p>We can use <code class="docutils literal notranslate"><span class="pre">numpy.savez</span></code> to save multiple arrays to a single archive file “my_archive.npz”. Here we will save three arrays to the archive. We can specify the names of these arrays, via the keyword arguments that we provide, so that we can distinguish them when loading the archive.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># save three arrays to a numpy archive file</span>
<span class="n">a0</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">a1</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">])</span>
<span class="n">a2</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">])</span>
<span class="c1"># we provide the keyword arguments `soil`, `crust`, and `bedrock`,</span>
<span class="c1"># as the names of the respective arrays in the archive.</span>
<span class="n">np</span><span class="o">.</span><span class="n">savez</span><span class="p">(</span><span class="s2">"my_archive.npz"</span><span class="p">,</span> <span class="n">soil</span><span class="o">=</span><span class="n">a0</span><span class="p">,</span> <span class="n">crust</span><span class="o">=</span><span class="n">a1</span><span class="p">,</span> <span class="n">bedrock</span><span class="o">=</span><span class="n">a2</span><span class="p">)</span>
</pre></div>
</div>
<p>Loading arrays from an archive is slightly more involved than loading a single array; we will want to open our archive file using a context manager and then load the arrays as we see fit. <code class="docutils literal notranslate"><span class="pre">np.load</span></code> can be used as a context manager in lieu of <code class="docutils literal notranslate"><span class="pre">open</span></code>. The file-object that it produces is our archive of numpy arrays, and it provides a dictionary-like interface for accessing these arrays:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># opening the archive and accessing each array by name</span>
<span class="k">with</span> <span class="n">np</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"my_archive.npz"</span><span class="p">)</span> <span class="k">as</span> <span class="n">my_archive_file</span><span class="p">:</span>
    <span class="n">out0</span> <span class="o">=</span> <span class="n">my_archive_file</span><span class="p">[</span><span class="s2">"soil"</span><span class="p">]</span>
    <span class="n">out1</span> <span class="o">=</span> <span class="n">my_archive_file</span><span class="p">[</span><span class="s2">"crust"</span><span class="p">]</span>
    <span class="n">out2</span> <span class="o">=</span> <span class="n">my_archive_file</span><span class="p">[</span><span class="s2">"bedrock"</span><span class="p">]</span>
</pre></div>
</div>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">out0</span>
<span class="go">array([1, 2, 3])</span>
<span class="gp">>>> </span><span class="n">out1</span>
<span class="go">array([4, 5, 6])</span>
<span class="gp">>>> </span><span class="n">out2</span>
<span class="go">array([7, 8, 9])</span>
</pre></div>
</div>
</section>
<section id="Links-to-Official-Documentation">
<h2>Links to Official Documentation<a class="headerlink" href="#Links-to-Official-Documentation" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p><a class="reference external" href="https://docs.python.org/3/library/pathlib.html">The ‘pathlib’ module</a></p></li>
<li><p><a class="reference external" href="https://docs.python.org/3/library/functions.html#open">The ‘open’ function</a></p></li>
<li><p><a class="reference external" href="https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files">Official tutorial: reading and writing files</a></p></li>
<li><p><a class="reference external" href="https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob">Globbing files</a></p></li>
<li><p><a class="reference external" href="https://docs.python.org/3/library/pickle.html">The pickle module</a></p>
<ul>
<li><p><a class="reference external" href="https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled">What can and cannot be pickled?</a></p></li>
</ul>
</li>
</ul>
</section>
<section id="Reading-Comprehension-Solutions">
<h2>Reading Comprehension Solutions<a class="headerlink" href="#Reading-Comprehension-Solutions" title="Permalink to this headline"></a></h2>
<p><strong>Basic glob patterns: Solutions</strong></p>
<ul class="simple">
<li><p>Glob all .txt files in the directory <code class="docutils literal notranslate"><span class="pre">./files</span></code> (answer: <code class="docutils literal notranslate"><span class="pre">./files/*.txt</span></code>)</p></li>
<li><p>Glob all files that contain ‘quirk’ as part of their file name (answer: <code class="docutils literal notranslate"><span class="pre">*quirk*</span></code>)</p></li>
<li><p>Glob all files that begin with ‘data’ (answer: <code class="docutils literal notranslate"><span class="pre">data*</span></code>)</p></li>
<li><p>Glob all files that start with the letter ‘q’, contain a ‘w’, and end with a ‘.npy’ extension (answer: <code class="docutils literal notranslate"><span class="pre">q*w*.npy</span></code>)</p></li>
</ul>
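<p>One way to sanity-check the name-based answers above is to run them against a list of file names using <code class="docutils literal notranslate"><span class="pre">fnmatch.fnmatchcase</span></code> from the standard library, which applies shell-style wildcards to plain strings. This is only a sketch: the file names and the helper <code class="docutils literal notranslate"><span class="pre">matching</span></code> are invented for illustration and do not appear elsewhere in this section.</p>

```python
from fnmatch import fnmatchcase

# Hypothetical file names, invented for this illustration.
names = [
    "data_01.txt",
    "database.csv",
    "notes.txt",
    "quirky.npy",
    "my_quirk.txt",
    "q_raw.npy",
    "qw.npy",
]


def matching(pattern):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Three of the patterns from the solutions above.
for pattern in ["*quirk*", "data*", "q*w*.npy"]:
    print(pattern, matching(pattern))
```

<p>Note that <code class="docutils literal notranslate"><span class="pre">quirky.npy</span></code> does not match <code class="docutils literal notranslate"><span class="pre">q*w*.npy</span></code> because its name contains no ‘w’, whereas <code class="docutils literal notranslate"><span class="pre">qw.npy</span></code> does match, since <code class="docutils literal notranslate"><span class="pre">*</span></code> can stand for zero characters. The first answer, <code class="docutils literal notranslate"><span class="pre">./files/*.txt</span></code>, also names a directory, so it is better checked with an actual glob over the file system than with string matching.</p>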
<p><strong>More glob patterns: Solutions</strong></p>
<p>Write a glob pattern for each of the following prompts:</p>
<ul class="simple">
<li><p>Any file with an odd digit in its name (answer: <code class="docutils literal notranslate"><span class="pre">*[13579]*</span></code>)</p></li>
<li><p>All .txt files that have the letter ‘q’ or ‘z’ in their names (answer: <code class="docutils literal notranslate"><span class="pre">*[qz]*.txt</span></code>)</p></li>
</ul>
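<p>The character-set patterns above can be tried out with <code class="docutils literal notranslate"><span class="pre">pathlib.Path.glob</span></code> on a throwaway directory. The following is a minimal sketch; the file names and the helper <code class="docutils literal notranslate"><span class="pre">glob_names</span></code> are hypothetical and exist only for this illustration.</p>

```python
import tempfile
from pathlib import Path


def glob_names(directory, pattern):
    """Return the sorted names of the entries in `directory` that match `pattern`."""
    return sorted(p.name for p in Path(directory).glob(pattern))


# Create empty, hypothetical files in a temporary directory, then glob them.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["run1.txt", "run2.txt", "quiz.txt", "data.npy", "zebra7.npy"]:
        (Path(tmp) / name).touch()

    odd_digit = glob_names(tmp, "*[13579]*")    # names containing an odd digit
    q_or_z_txt = glob_names(tmp, "*[qz]*.txt")  # .txt names containing q or z

print(odd_digit)
print(q_or_z_txt)
```

<p>Here <code class="docutils literal notranslate"><span class="pre">[13579]</span></code> matches exactly one character from the set, so <code class="docutils literal notranslate"><span class="pre">run2.txt</span></code> is excluded from the first result, and <code class="docutils literal notranslate"><span class="pre">zebra7.npy</span></code> is excluded from the second because of its extension.</p>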
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
</div>
<hr/>
<div role="contentinfo">
<p>© Copyright 2021, Ryan Soklaski.</p>
</div>
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>