# READING AND WRITING FILES

Variables are fine for storing data while a program is running, but their contents are lost when the program ends. For a permanent solution, you need to save data in a file. You can think of a file’s contents as a **single string value**, potentially gigabytes in size. In this chapter, you will learn how to use Python to **create, read, and save files** on the hard drive.

## Files and File Paths

A file has two key properties: a *path* and a *filename*.

An example: `C:\Users\Al\Documents\project.docx`

  * *filename*: `project.docx`
  * *path*: `C:\Users\Al\Documents` (Windows)
      * root: `C:\` (Windows), `/` (Linux and macOS)
      * separator: `\` (Windows), `/` (Linux and macOS)

![image.png](path.png)
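The same split into filename and path can be done in code. The sketch below is an illustration rather than part of the original notebook: it uses `pathlib.PureWindowsPath`, which only parses the string and never touches the disk, so it runs on any operating system.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses a Windows-style path on any OS without accessing the disk
p = PureWindowsPath(r"C:\Users\Al\Documents\project.docx")

print(p.name)         # project.docx
print(str(p.parent))  # C:\Users\Al\Documents
print(p.anchor)       # C:\  (drive plus root)
```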

## The Current Working Directory

In [1]:
from pathlib import Path
import os

cwd = Path.cwd() # current working directory

cwd

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files')

In [2]:
os.getcwd()

'/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files'

In [4]:
new_dir = '/tmp/test'

os.makedirs(new_dir, exist_ok=True)  # no error if /tmp/test already exists

In [5]:
os.chdir(new_dir)

Path.cwd()

PosixPath('/tmp/test')

In [6]:
os.chdir(cwd)

Path.cwd()

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files')

In [7]:
new_dir = '/home/cuyghur/python_books/'

os.chdir(new_dir)

os.listdir(Path.cwd())

['python_oeguenish_at_dr_dil', 'learn_python_at_dr_dil']
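Because `os.chdir()` changes the working directory for the whole process, it is easy to forget to switch back, as the cells above do by hand with `os.chdir(cwd)`. One way to guarantee the switch back is a small context manager. `working_directory` below is a hypothetical helper written for this sketch; Python 3.11 and later also provide `contextlib.chdir` for the same purpose.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Hypothetical helper: temporarily switch the cwd, then restore it."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Runs even if the body raises, so the original cwd is always restored
        os.chdir(previous)


start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp) as inside:
        print("inside:", inside)
    print("back in:", Path.cwd())
```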

## The Home Directory

All users have a folder for their own files on the computer called the home directory or home folder.

The home directories are located in a set place depending on your operating system:

   * On Windows, home directories are under `C:\Users`.
   * On macOS, home directories are under `/Users`.
   * On Linux, home directories are often under `/home`.

In [8]:
Path.home()

PosixPath('/home/cuyghur')
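To avoid hard-coding a user name such as `cuyghur`, paths can be built from `Path.home()`. In this sketch the `Documents` folder name is only an assumption for illustration; the path is constructed, not created or checked.

```python
import os
from pathlib import Path

home = Path.home()
# os.path.expanduser("~") is the older, string-based equivalent
print(home)
print(os.path.expanduser("~"))

# Build a path inside the home folder without hard-coding the user name
docs = home / "Documents"
print(docs)
```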

## Absolute vs. Relative Paths

There are two ways to specify a file path:

   * An absolute path, which always begins with the root folder
   * A relative path, which is relative to the program’s current working directory
   
![image.png](path_rel_abs.png)   
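The difference can be checked in code. This sketch uses `PurePosixPath`, which works on strings only, with the `CONDUCT.md` path that appears later in this chapter; `is_absolute()` tells the two kinds apart and `relative_to()` turns an absolute path into one relative to a chosen base folder.

```python
from pathlib import PurePosixPath

absolute = PurePosixPath("/home/cuyghur/python_books/learn_python_at_dr_dil/CONDUCT.md")
relative = PurePosixPath("learn_python_at_dr_dil/CONDUCT.md")

print(absolute.is_absolute())  # True: starts at the root folder "/"
print(relative.is_absolute())  # False: interpreted against the current working directory

# Express the absolute path relative to a chosen base folder
print(absolute.relative_to("/home/cuyghur/python_books"))  # learn_python_at_dr_dil/CONDUCT.md
```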

In [2]:
import os
from pathlib import Path

cwd = Path.cwd()

cwd

cwd/'bla'

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files/bla')

In [12]:
print('path\\bla')  # inside a string literal, '\\' is one escaped backslash (the Windows separator)

path\bla


In [13]:
os.chdir('../')
Path.cwd()

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters')

In [14]:
os.chdir(cwd)
Path.cwd()

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files')

In [12]:
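os.chdir('.')  # '.' is the current folder, so the working directory does not change
Path.cwd()

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files')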

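The `.` (this folder) and `..` (parent folder) components can also be collapsed without changing directory. This sketch calls `posixpath.normpath` explicitly so it gives the same result on every OS (on Linux and macOS `os.path.normpath` is the same function). The path is an example string; `normpath` works purely on text, so it may disagree with the file system when symbolic links are involved.

```python
import posixpath

# "." means "this folder" and ".." means "the parent folder";
# normpath collapses both purely as text, without touching the disk
p = "/home/cuyghur/python_books/./learn_python_at_dr_dil/chapters/.."
print(posixpath.normpath(p))  # /home/cuyghur/python_books/learn_python_at_dr_dil
```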
## Creating New Folders 

In [23]:
os.chdir('/home/cuyghur/python_books')
cwd = Path.cwd()
cwd

PosixPath('/home/cuyghur/python_books')

In [24]:
import os

os.listdir(cwd/'learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/')

['data_structure',
 'iterations',
 'control_flow',
 'functions',
 'Files',
 '.ipynb_checkpoints']

In [25]:
import os

folder_file = cwd/'learn_python_at_dr_dil/learn_python_at_dr_dil/chapters'/'Files'

# os.makedirs() raises FileExistsError here because the Files folder already exists
os.makedirs(folder_file)
os.chdir(folder_file)
os.getcwd()

FileExistsError: [Errno 17] File exists: '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files'
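The `FileExistsError` above is raised because the `Files` folder was already there. If "create it unless it already exists" is what you want, pass `exist_ok=True`. This sketch works inside a throwaway temporary folder so it does not touch the book's directories; `chapters/Files` and `chapters/Images` are just example names.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "chapters", "Files")

    os.makedirs(target)                 # creates both "chapters" and "Files"
    os.makedirs(target, exist_ok=True)  # no FileExistsError the second time

    # pathlib equivalent: parents=True creates missing intermediate folders
    other = Path(tmp) / "chapters" / "Images"
    other.mkdir(parents=True, exist_ok=True)

    created = sorted(os.listdir(os.path.join(tmp, "chapters")))
    print(created)  # ['Files', 'Images']
```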

## Get Absolute Paths

In [16]:
import os

os.path.abspath('.')

'/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files'

In [17]:
os.path.abspath('..')

'/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters'

In [18]:
os.path.isabs('.')

False

In [19]:
os.path.isabs(os.path.abspath('.'))

True
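`pathlib` has its own versions of these checks. In this sketch `Path.is_absolute()` plays the role of `os.path.isabs()`, and `Path.resolve()` turns a relative path into an absolute one. One difference worth knowing: `os.path.abspath()` only joins and normalizes strings, while `Path.resolve()` also follows symbolic links, so the two printed paths can differ if the working directory is reached through a symlink.

```python
import os
from pathlib import Path

rel = Path(".")
print(rel.is_absolute())       # False
abs_path = rel.resolve()       # absolute path of the cwd, with symlinks resolved
print(abs_path.is_absolute())  # True

# Compare with the purely textual os.path.abspath
print(os.path.abspath("."))
print(abs_path)
```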

## Getting the Parts of a File Path

![image.png](attachment:de60a208-e65e-46ad-a02f-2b071071d232.png)

In [20]:
import os

os.chdir('/home/cuyghur/python_books/learn_python_at_dr_dil/')

os.listdir('.')

['.git',
 'LICENSE',
 'requirements.txt',
 'CONDUCT.md',
 'README.md',
 '.github',
 'CONTRIBUTING.md',
 '.ipynb_checkpoints',
 'learn_python_at_dr_dil']

In [21]:
from pathlib import Path

# os.listdir() returns names in arbitrary order, so select the file by name rather than by index
cwf = Path.cwd()/'CONDUCT.md'

cwf

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/CONDUCT.md')

In [22]:
cwf.anchor

'/'

In [23]:
cwd.parent

PosixPath('/home/cuyghur')

In [24]:
cwd.name

'python_books'

In [25]:
cwd.stem

'python_books'

In [26]:
cwd.suffix

''

In [27]:
cwd.drive

''

In [28]:
os.path.basename(cwf)

'CONDUCT.md'

In [29]:
os.path.dirname(cwf)

'/home/cuyghur/python_books/learn_python_at_dr_dil'

In [30]:
os.path.split(cwf)

('/home/cuyghur/python_books/learn_python_at_dr_dil', 'CONDUCT.md')

In [31]:
str(cwf).split(os.sep)

['', 'home', 'cuyghur', 'python_books', 'learn_python_at_dr_dil', 'CONDUCT.md']
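The cells above ask for `stem` and `suffix` on a folder, which is why they return the folder name and an empty string. On a file the attributes are more informative. This sketch uses `PurePosixPath` with the `CONDUCT.md` path shown above (stored under the new name `conduct` so `cwf` is left alone), and `.parts` as a pathlib alternative to splitting on `os.sep`. A Windows path is included only to show a non-empty `drive`.

```python
from pathlib import PurePosixPath, PureWindowsPath

conduct = PurePosixPath("/home/cuyghur/python_books/learn_python_at_dr_dil/CONDUCT.md")

print(conduct.parent)  # /home/cuyghur/python_books/learn_python_at_dr_dil
print(conduct.name)    # CONDUCT.md
print(conduct.stem)    # CONDUCT
print(conduct.suffix)  # .md
print(conduct.parts)   # ('/', 'home', 'cuyghur', 'python_books', 'learn_python_at_dr_dil', 'CONDUCT.md')

# drive is only non-empty on Windows-style paths
print(PureWindowsPath(r"C:\Users\Al\Documents\project.docx").drive)  # C:
```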

## Finding File Sizes and Folder Contents

In [28]:
Path.cwd()

PosixPath('/home/cuyghur/python_books')

In [29]:
os.listdir('.')

['python_oeguenish_at_dr_dil', 'learn_python_at_dr_dil']

In [37]:
cwf = Path.cwd()/'learn_python_at_dr_dil'
cwf = cwf/'requirements.txt'
cwf

PosixPath('/home/cuyghur/python_books/learn_python_at_dr_dil/requirements.txt')

In [39]:
os.path.getsize(cwf)

40
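`os.path.getsize()` returns the size in bytes, and `Path.stat().st_size` gives the same number. To show that both agree, this sketch writes a hypothetical 40-byte `sample.txt` into a temporary folder.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"x" * 40)  # exactly 40 bytes

    size_os = os.path.getsize(sample)  # size in bytes
    size_pl = sample.stat().st_size    # pathlib equivalent
    print(size_os, size_pl)  # 40 40
```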

In [52]:
os.listdir('./learn_python_at_dr_dil')

['.git',
 'LICENSE',
 'requirements.txt',
 'CONDUCT.md',
 'README.md',
 '.github',
 'CONTRIBUTING.md',
 '.ipynb_checkpoints',
 'learn_python_at_dr_dil']

In [54]:
os.getcwd()

'/home/cuyghur/python_books'

In [63]:
?os.path.isfile

[0;31mSignature:[0m [0mos[0m[0;34m.[0m[0mpath[0m[0;34m.[0m[0misfile[0m[0;34m([0m[0mpath[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m Test whether a path is a regular file
[0;31mFile:[0m      ~/miniconda3/envs/Python_book/lib/python3.10/genericpath.py
[0;31mType:[0m      function


In [4]:
os.getcwd()

'/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/Files'

In [15]:
import os
os.chdir('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil')

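for file in os.listdir('.'):
    if os.path.isfile(file):
        kind = 'file'
    elif os.path.isdir(file):
        kind = 'directory'
    else:
        kind = 'other'

    # For a directory, getsize() returns the size of the directory entry itself,
    # not the total size of the files inside it
    size = os.path.getsize(file)

    # bytes / 10**6 = megabytes
    print(f"{file}, {kind}, {size/10**6}M")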

logo.png, file, 0.036302M
_config.yml, file, 0.001518M
python_books, directory, 0.004096M
chapters, directory, 0.004096M
_build, directory, 0.004096M
index.ipynb, file, 0.004995M
.ipynb_checkpoints, directory, 0.004096M
demo.ipynb, file, 0.02343M
logo.svg, file, 0.007923M
_toc.yml, file, 0.00036M


In [34]:
os.listdir('..')

['python_oeguenish_at_dr_dil', 'learn_python_at_dr_dil']

In [35]:
totalSize = 0
# Adds up only the entries directly inside '.', not the contents of subfolders
for filename in os.listdir('.'):
    totalSize = totalSize + os.path.getsize(os.path.join('.', filename))
totalSize

24899
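The loop above only adds up the entries directly inside the current folder, and for a subfolder `os.path.getsize()` reports the size of the directory entry itself (the 4096 bytes seen in the listing above), not of the files inside it. To total everything below a folder, one option is to walk the tree with `os.walk()`. `folder_size` is a hypothetical helper name for this sketch, which is checked on a small temporary tree.

```python
import os
import tempfile


def folder_size(top):
    """Hypothetical helper: total size in bytes of all files below `top`."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted twice
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"a" * 10)
    with open(os.path.join(tmp, "sub", "b.txt"), "wb") as f:
        f.write(b"b" * 25)
    total = folder_size(tmp)
    print(total)  # 35
```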

## Find files/directories with a pattern

### A Problem: How to find files with a pattern?

An example:

The folder `/home/cuyghur/python_books/learn_python_at_dr_dil/` contains:

* .git
* LICENSE
* requirements.txt
* CONDUCT.md
* README.md
* .github
* CONTRIBUTING.md
* .ipynb_checkpoints
* learn_python_at_dr_dil

How do we find the files with the suffix `.md`?

The solution is `glob.glob(pathname)`, which returns a list of all paths that match the wildcard pattern `pathname`:

In [75]:
import glob

wd = "/home/cuyghur/python_books/learn_python_at_dr_dil/"

md_files = glob.glob(wd+'/*md')

md_files

['/home/cuyghur/python_books/learn_python_at_dr_dil/CONDUCT.md',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/README.md',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/CONTRIBUTING.md']

In [76]:
from pathlib import Path

[Path(file).name for file in md_files]

['CONDUCT.md', 'README.md', 'CONTRIBUTING.md']
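`pathlib` offers the same wildcard search as a method, `Path.glob()`, which yields `Path` objects directly, so the names do not have to be extracted from strings afterwards. To keep the sketch self-contained, it recreates some of the file names above as empty files in a temporary folder (named `demo_dir` so the `wd` string used below is not overwritten).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    # Recreate a few of the file names listed above as empty files
    for name in ["LICENSE", "requirements.txt", "CONDUCT.md", "README.md", "CONTRIBUTING.md"]:
        (demo_dir / name).touch()

    md_names = sorted(p.name for p in demo_dir.glob("*.md"))
    print(md_names)  # ['CONDUCT.md', 'CONTRIBUTING.md', 'README.md']
```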

### Search Files Using Wildcard Characters

~~~
*   : matches any number of characters (including none)
*.md -> 'CONDUCT.md', 'README.md', 'CONTRIBUTING.md'

?   : matches any single character
??????.md -> 'README.md'

[]  : matches any single character inside the brackets
[RDE]*md -> 'README.md'

[!] : matches any single character not inside the brackets
[!RDE]*md -> 'CONDUCT.md', 'CONTRIBUTING.md'
~~~
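The table can be checked against plain strings, without touching the disk, using `fnmatch.fnmatchcase()`, which applies the same wildcard syntax case-sensitively. One caveat: unlike `glob`, `fnmatch` lets `*` match names that start with a dot, but none of the dot-names here end in `md`, so the results match the table.

```python
from fnmatch import fnmatchcase

names = [".git", "LICENSE", "requirements.txt", "CONDUCT.md", "README.md",
         ".github", "CONTRIBUTING.md", ".ipynb_checkpoints", "learn_python_at_dr_dil"]

results = {}
for pattern in ["*.md", "??????.md", "[RDE]*md", "[!RDE]*md"]:
    # Keep the names that match the wildcard pattern, in their original order
    results[pattern] = [n for n in names if fnmatchcase(n, pattern)]
    print(f"{pattern:<10} -> {results[pattern]}")
```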

In [77]:
md_files = glob.glob(wd+'/??????.md')

md_files

['/home/cuyghur/python_books/learn_python_at_dr_dil/README.md']

In [78]:
md_files = glob.glob(wd+'/[RDE]*.md')

md_files

['/home/cuyghur/python_books/learn_python_at_dr_dil/README.md']

In [79]:
md_files = glob.glob(wd+'/[!RDE]*.md')

md_files

['/home/cuyghur/python_books/learn_python_at_dr_dil/CONDUCT.md',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/CONTRIBUTING.md']

### Search files in all subdirectories

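An example:

How can we recursively find the `png` files in all subdirectories of `/home/cuyghur/python_books/learn_python_at_dr_dil/`?

Solution: when `recursive=True` is passed, the pattern `**` matches any files and zero or more nested directories: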

In [81]:
import glob

wd = "/home/cuyghur/python_books/learn_python_at_dr_dil/"
glob.glob(wd+"/**/*png", recursive=True)  

['/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/logo.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/chapters/data_structure/dictionary_concept.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/_build/html/_static/logo.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/_build/html/_static/file.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/_build/html/_static/minus.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/_build/html/_static/plus.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/_build/html/_static/images/logo_colab.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/_build/html/_images/dictionary_concept.png',
 '/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/_build/_page/chapters-Files-Files_reading_writing/h
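With `pathlib`, `Path.rglob(pattern)` searches all subdirectories, much like `glob.glob()` with a leading `**/` and `recursive=True`. This sketch builds a tiny throwaway tree whose file names echo the output above, so the result is predictable; it is not the real book folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    # Small throwaway tree: one png at the top, one two levels down, one non-png
    (demo_dir / "chapters" / "data_structure").mkdir(parents=True)
    (demo_dir / "logo.png").touch()
    (demo_dir / "chapters" / "data_structure" / "dictionary_concept.png").touch()
    (demo_dir / "README.md").touch()

    # rglob("*.png") searches demo_dir and every folder below it
    pngs = sorted(p.relative_to(demo_dir).as_posix() for p in demo_dir.rglob("*.png"))
    print(pngs)  # ['chapters/data_structure/dictionary_concept.png', 'logo.png']
```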

### glob vs iglob

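`glob.iglob()` takes the same arguments as `glob.glob()` and finds the same matches. The difference is that `glob.glob()` builds the complete list of paths up front, while `glob.iglob()` returns an iterator that yields the paths one at a time as you loop over it, which can save memory when a search matches many files.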


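In [85]:
import glob
import sys

wd = "/home/cuyghur/python_books/learn_python_at_dr_dil/"
files = glob.glob(wd+"/**/*", recursive=True)  # a list holding every match

# getsizeof measures only the list object itself, not the path strings it refers to
sys.getsizeof(files)

2520

In [87]:
import sys

wd = "/home/cuyghur/python_books/learn_python_at_dr_dil/"
files = glob.iglob(wd+"/**/*", recursive=True)  # an iterator; matches are produced on demand

sys.getsizeof(files)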

104
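An iterator from `glob.iglob()` hands out matches as you ask for them, and it can only be consumed once. This sketch makes five empty files in a temporary folder (the `file0.txt` to `file4.txt` names are arbitrary) and pulls matches from one iterator in stages.

```python
import glob
import os
import tempfile
from itertools import islice

with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, f"file{i}.txt"), "w").close()

    it = glob.iglob(os.path.join(tmp, "*.txt"))
    first_two = list(islice(it, 2))  # pull only the first two matches
    rest = list(it)                  # the remaining three matches
    again = list(it)                 # already exhausted, so this is empty

    print(len(first_two), len(rest), len(again))  # 2 3 0
```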

### Homework

Recursively find all files matching a pattern (e.g. `*.txt`, `*.png`) under a given path, and sort them by their date and time of last modification.

Hint:

In [91]:
import os

file_stat = os.stat('/home/cuyghur/python_books/learn_python_at_dr_dil/learn_python_at_dr_dil/logo.png')

file_stat

os.stat_result(st_mode=33204, st_ino=14188493, st_dev=66305, st_nlink=1, st_uid=1000, st_gid=1000, st_size=36302, st_atime=1641914073, st_mtime=1641420276, st_ctime=1641420276)

In [93]:
# the date and time of modification
file_stat.st_mtime

1641420276.1515005
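`st_mtime` is a count of seconds since the Unix epoch, which is handy for sorting but hard to read. This sketch converts the value shown above into a date with `datetime.fromtimestamp()`; UTC is used here only so the output does not depend on your local time zone.

```python
from datetime import datetime, timezone

mtime = 1641420276.1515005  # st_mtime value shown above

# st_mtime counts seconds since the Unix epoch; convert it to a readable date
modified = datetime.fromtimestamp(mtime, tz=timezone.utc)
print(modified.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2022-01-05 22:04:36 UTC
```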