# Files and Folders Using Python

#### As we create and access files for analysis, we need to stay organized programmatically.

- Let's understand Google Colab storage structure.

- We'll use the ```os``` module and ```pathlib``` to create, navigate, and delete files and folders programmatically.

- We'll also use command line/UNIX commands like ```ls```, ```cd``` and ```mkdir```.

- [Download the sample files](https://drive.google.com/file/d/1uTbYtyV_1QBLvW0yUKaDRuk8Bgjhl3Er/view?usp=sharing) we will need.

In [12]:
## import libraries
import os  ## navigate, create, and delete folders
from pathlib import Path  ## build paths to files and folders
import shutil  ## delete a directory that still has files in it
# from google.colab import files  ## code for downloading in Google Colab
import glob  ## collect files that match a pattern into a list


## UNIX Command Line

### NOTE: each of these commands has to be in a cell by itself

## Where am I?

## ```pwd```

In [1]:
pwd

'/Users/angeladegbesan/djournalism/Practical Python/in_class/practice'
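The same question can be answered from Python, which helps in a plain script where notebook shortcuts like ```pwd``` are not available. A minimal sketch using only the standard library (the variable names are just for illustration):

```python
import os
from pathlib import Path

# os.getcwd() returns the current working directory as a string
cwd_as_string = os.getcwd()

# Path.cwd() returns the same location as a Path object
cwd_as_path = Path.cwd()

print(cwd_as_string)
print(cwd_as_path)
```

Both point at the same folder; the ```Path``` version is handier when you want to build other paths from it.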

## list directories

## ```ls```

In [2]:
ls

animals.csv
beverages.csv
ceo_bios.csv
disorganized.csv
lifeforms.csv
pdf_1.pdf
pdf_10.pdf
pdf_2.pdf
pdf_3.pdf
pdf_4.pdf
pdf_5.pdf
pdf_6.pdf
pdf_7.pdf
pdf_8.pdf
pdf_9.pdf
text_doc_01.txt
text_doc_02.txt
text_doc_03.txt
text_doc_04.txt
text_doc_05.txt
text_doc_06.txt
text_doc_07.txt
text_doc_08.txt
text_doc_09.txt
text_doc_10.txt
venv/
week-1-working-with-datatypes-BLANKS-Copy1.ipynb
week-1-working-with-datatypes-BLANKS.ipynb
week-2-inclass-exercises-BLANKS.ipynb
week-4-scraping-intro-beautifulsoup-BLANKS.ipynb
week-5-A-single-page-table_BLANKS.ipynb
week-5-B-inclass-non-tabular-scrape_BLANKS.ipynb
week-6-multi-page-table-scrape.ipynb
week-7-download_docs_BLANK.ipynb
week-7-flattening_lists_BLANK.ipynb
week-8-sample-folder/
week-8A-file_folder_mngmt_demo.ipynb
week-8B-download-and-read_BLANK.ipynb
week_3_A_list comprehensions BLANKS.ipynb
week_3_B_defined_functions-BLANKS.ipynb


## change directories

## ```cd```

Let's enter our ```sample_data``` folder.

In [3]:
pwd

'/Users/angeladegbesan/djournalism/Practical Python/in_class/practice'

In [4]:
cd ..

/Users/angeladegbesan/djournalism/Practical Python/in_class


In [5]:
cd in_class/

[Errno 2] No such file or directory: 'in_class/'
/Users/angeladegbesan/djournalism/Practical Python/in_class


## What does this folder hold?

## Back out of folder to the enclosing folder

```cd ..```

In [16]:
cd ~

/Users/angeladegbesan


Where am I?

In [17]:
pwd

'/Users/angeladegbesan'

In [18]:
ls

Applications/
Creative Cloud Files/
Creative Cloud Files angel.adegbesan76@journalism.cuny.edu 08a11bd1f7306ac41bec5cc092a1d6a2155b35a79f191b3de3efd6ebe4a7d11b/
Desktop/
Documents/
Downloads/
Library/
Motion graphics/
Movies/
Music/
Pictures/
Public/
Zoom SV Class/
angel.adegbesan@yorkmail.cuny.edu Creative Cloud Files/
coding/
derby.log
djournalism/
get-pip.py
graphicIllustrator/
opt/
week-8-sample-folder/


In [19]:
cd ~/week-8-sample-folder/

/Users/angeladegbesan/week-8-sample-folder


In [20]:
ls

alien-invasion.jpg  alien-invasion.png  delays.txt          manager.txt
alien-invasion.pdf  ceo_bios.csv        energy.csv          password.png
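The ```cd``` moves above also have Python equivalents. Here is a small sketch, assuming a throwaway folder made with ```tempfile``` so nothing real is touched; ```start_dir```, ```inside``` and ```parent``` are illustrative names only:

```python
import os
import tempfile
from pathlib import Path

start_dir = Path.cwd()  # remember where we started

# tempfile gives us a throwaway folder for the demonstration
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)            # like: cd <folder>
    inside = Path.cwd()
    os.chdir("..")           # like: cd ..
    parent = Path.cwd()
    os.chdir(start_dir)      # return to where we started

home = Path.home()           # normally the folder that `cd ~` takes you to

print(inside)
print(parent)
print(home)
```

Saving the starting folder first makes it easy to get back, which is less error-prone than counting how many ```..``` steps you took.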


# COLAB ONLY Importing files

We can use an import library specific to Colab

## *WARNING*: These are temporary uploads. When you restart, you need to reupload.

```from google.colab import files```

```files.upload()```

Let's confirm where we are:

In [None]:
## UPLOAD FILE


### Let's see if it uploaded

# Programmatic Folder/Files Management

- We'll use the ```os module```.

In [21]:
## Python scriptable
os.listdir()

['ceo_bios.csv',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']

In [22]:
## what object is that?
type(os.listdir())

list
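Because ```os.listdir()``` returns plain strings, it does not tell us which names are files and which are folders. One way to find out is ```Path.iterdir()```. This is a sketch on a temporary folder; ```notes.txt``` and ```archive``` are made-up sample names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "notes.txt").write_text("hello")   # hypothetical sample file
    (base / "archive").mkdir()                 # hypothetical sample folder

    # iterdir() yields Path objects, so we can ask each one what it is
    files = sorted(p.name for p in base.iterdir() if p.is_file())
    folders = sorted(p.name for p in base.iterdir() if p.is_dir())

print(files)
print(folders)
```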

In [25]:
## create a path to folder called some_new_folder
## we store that path in a variable called my_new_directory
my_new_directory = Path("some_new_folder")

In [26]:
## create that directory
## exist_ok=True means no error is raised if the folder already exists
my_new_directory.mkdir(exist_ok=True)

### You don't have to create a variable for the path, but it makes the path easier to reuse
```Path('folder_name/').mkdir(exist_ok=True)```
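To see what ```exist_ok``` changes, and how to create nested folders in one step, here is a sketch run inside a temporary folder. The folder names ```project_a/data/raw``` are hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "project_a" / "data" / "raw"   # hypothetical names

    # parents=True creates any missing folders along the way;
    # exist_ok=True means no error if the folder is already there
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)   # running it again is harmless
    created = nested.is_dir()

    # without exist_ok=True, mkdir on an existing folder raises FileExistsError
    try:
        nested.mkdir()
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)
```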

In [24]:
### create junk_folder

junk_folder = Path("my_junk_folder")
junk_folder.mkdir(exist_ok=True)

In [27]:
##OR

Path("folder_junk").mkdir(exist_ok=True)

UNIX command to show list of folders

In [28]:
ls

alien-invasion.jpg  ceo_bios.csv        folder_junk/        password.png
alien-invasion.pdf  delays.txt          manager.txt         some_new_folder/
alien-invasion.png  energy.csv          my_junk_folder/


In [29]:
## show list programmatically
os.listdir()

['my_junk_folder',
 '.DS_Store',
 'ceo_bios.csv',
 'folder_junk',
 'some_new_folder',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']

## let's delete a folder

In [None]:
## remove an empty directory
## NOTE: This only removes empty directories

In [30]:
rmdir some_new_folder/

In [31]:
## show directory now programmatically
os.listdir()

['my_junk_folder',
 '.DS_Store',
 'ceo_bios.csv',
 'folder_junk',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']

## Manually add some junk to the junk folder and check its content.

Only then do the next step

In [32]:
rmdir my_junk_folder/

rmdir: my_junk_folder/: Directory not empty
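Python's own ```rmdir``` behaves the same way as the UNIX command: it only removes empty folders. A sketch on temporary folders (the names ```empty_folder```, ```full_folder``` and ```junk.txt``` are made up for the demonstration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    empty_folder = Path(tmp) / "empty_folder"   # hypothetical names
    full_folder = Path(tmp) / "full_folder"
    empty_folder.mkdir()
    full_folder.mkdir()
    (full_folder / "junk.txt").write_text("junk")

    empty_folder.rmdir()                  # works: the folder is empty
    empty_removed = not empty_folder.exists()

    try:
        full_folder.rmdir()               # fails: the folder holds a file
        full_removed = True
    except OSError as error:
        full_removed = False
        print("Could not remove:", error)
```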


In [33]:
os.chdir("..")

In [34]:
pwd

'/Users/angeladegbesan'

In [35]:
cd week-8-sample-folder/

/Users/angeladegbesan/week-8-sample-folder


## Delete junk_folder and everything in it (```rmdir``` breaks on a non-empty folder, ```shutil.rmtree``` does not)

In [36]:
shutil.rmtree("my_junk_folder")

In [37]:
## show directory now USING OS
os.listdir()

['.DS_Store',
 'ceo_bios.csv',
 'folder_junk',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']
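Since ```shutil.rmtree``` deletes a folder and everything inside it with no undo, it can be worth checking the path first. A cautious sketch on a temporary copy of the idea; the file ```junk.txt``` is a made-up stand-in:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    junk = Path(tmp) / "my_junk_folder"
    junk.mkdir()
    (junk / "junk.txt").write_text("junk")   # hypothetical junk file

    # only delete if the path really is a folder we expect to exist
    if junk.is_dir():
        shutil.rmtree(junk)

    still_there = junk.exists()

print(still_there)
```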

## back out of directory because you can't delete a folder while you're in it!

In [None]:
## show directory now USING OS


In [None]:
## Now delete all contents


In [None]:
## show directory now USING OS


## Zip folder and download using UNIX commands

In [None]:
## Use colab to download
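As an alternative to UNIX commands, the standard library can zip a folder with ```shutil.make_archive```. This is only a sketch: the folder ```project_a``` and the archive name ```project_a_backup``` are assumptions for illustration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "project_a"            # hypothetical folder to zip
    folder.mkdir()
    (folder / "notes.txt").write_text("hello")

    # make_archive(base_name, format, root_dir) writes base_name + ".zip"
    base_name = str(Path(tmp) / "project_a_backup")
    archive_path = shutil.make_archive(base_name, "zip", root_dir=str(folder))

    # peek inside the archive to confirm what was saved
    with zipfile.ZipFile(archive_path) as zf:
        zipped_names = zf.namelist()

print(archive_path)
print(zipped_names)
```

In Colab, the resulting zip file could then be passed to ```files.download()``` from ```google.colab```.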


# Take a detour to fix last week's download issue.

# glob

## Yes, glob.

```glob``` is a Python standard-library module that uses UNIX shell-style wildcards (such as ```*```) to collect matching files into a list.

## Using a path

We can store our path structure to a variable.

Right-click on the folder in the left column and copy path:
```/content/sample_data```

This is the raw path. We are already in ```content``` so instead we want:
```sample_data``` plus what files we are looking for (let's say all csv files).

In [None]:
## grab only the csv files


In [38]:
pwd

'/Users/angeladegbesan/week-8-sample-folder'

In [39]:
ls

alien-invasion.jpg  ceo_bios.csv        folder_junk/
alien-invasion.pdf  delays.txt          manager.txt
alien-invasion.png  energy.csv          password.png


In [40]:
## grab only the csv files
my_csv_files = glob.glob("*.csv")  ## the file name can be anything, but the extension has to be .csv
my_csv_files

['ceo_bios.csv', 'energy.csv']

In [42]:
my_txt_files = glob.glob("*.txt")
my_txt_files

['manager.txt', 'delays.txt']

In [48]:
all_files = glob.glob("alien-invasion.*")  ## the name has to be alien-invasion, but the extension can be anything
all_files

['alien-invasion.jpg', 'alien-invasion.png', 'alien-invasion.pdf']

In [49]:
al_files = glob.glob("*alien*")  ## anything can come before or after, as long as alien is in the name
al_files

['alien-invasion.jpg', 'alien-invasion.png', 'alien-invasion.pdf']

In [50]:
a_files = glob.glob("*")  ## match everything (files and folders), except hidden names that start with a dot
a_files

['ceo_bios.csv',
 'folder_junk',
 'password.png',
 'energy.csv',
 'alien-invasion.jpg',
 'alien-invasion.png',
 'alien-invasion.pdf',
 'manager.txt',
 'delays.txt']
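Two details are worth knowing about the results above: ```glob``` returns matches in no guaranteed order, and a pattern can include a folder, so there is no need to ```cd``` first. A sketch on a temporary folder; the subfolder ```sub``` and the file ```nested.csv``` are hypothetical additions:

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ["energy.csv", "ceo_bios.csv", "delays.txt"]:
        (base / name).write_text("sample")
    (base / "sub").mkdir()                        # hypothetical subfolder
    (base / "sub" / "nested.csv").write_text("sample")

    # sort the matches so the order is predictable
    csv_here = sorted(p.name for p in base.glob("*.csv"))

    # rglob also searches inside subfolders
    csv_everywhere = sorted(p.name for p in base.rglob("*.csv"))

    # glob.glob accepts a folder as part of the pattern
    csv_strings = sorted(glob.glob(os.path.join(tmp, "*.csv")))

print(csv_here)
print(csv_everywhere)
```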

In [None]:
## grab only the .md file(s)



In [None]:
## show directory now


In [None]:
## make a new directory called project_a


In [None]:
## show directory now


In [3]:
pwd

'/Users/angeladegbesan/djournalism/practical_python/in_class/practice'

In [4]:
ls

animals.csv
beverages.csv
ceo_bios.csv
disorganized.csv
lifeforms.csv
pdf_1.pdf
pdf_10.pdf
pdf_2.pdf
pdf_3.pdf
pdf_4.pdf
pdf_5.pdf
pdf_6.pdf
pdf_7.pdf
pdf_8.pdf
pdf_9.pdf
text_doc_01.txt
text_doc_02.txt
text_doc_03.txt
text_doc_04.txt
text_doc_05.txt
text_doc_06.txt
text_doc_07.txt
text_doc_08.txt
text_doc_09.txt
text_doc_10.txt
venv/
week-1-working-with-datatypes-BLANKS-Copy1.ipynb
week-1-working-with-datatypes-BLANKS.ipynb
week-2-inclass-exercises-BLANKS.ipynb
week-4-scraping-intro-beautifulsoup-BLANKS.ipynb
week-5-A-single-page-table_BLANKS.ipynb
week-5-B-inclass-non-tabular-scrape_BLANKS.ipynb
week-6-multi-page-table-scrape.ipynb
week-7-download_docs_BLANK.ipynb
week-7-flattening_lists_BLANK.ipynb
week-8A-file_folder_mngmt_demo.ipynb
week-8B-download-and-read_BLANK.ipynb
week_3_A_list comprehensions BLANKS.ipynb
week_3_B_defined_functions-BLANKS.ipynb


In [5]:
cd ..

/Users/angeladegbesan/djournalism/practical_python/in_class


In [10]:
cd week-8-practice-files/

/Users/angeladegbesan/djournalism/practical_python/in_class/practice/week-8-practice-files


In [None]:
## change directory into project_a


In [None]:
## show directory now


In [None]:
## upload all our files to it


In [None]:
## show directory now


# Start reading files

In [11]:
## create a text wrapper object by opening the 'read_sample1.txt' file for reading
## remember, we are already in the folder that holds the file
with open("read_sample1.txt", "r") as myfile:
    print(myfile)

<_io.TextIOWrapper name='read_sample1.txt' mode='r' encoding='UTF-8'>


## We can use this ```<class '_io.TextIOWrapper'>``` object to read the actual contents

In [12]:
## create a variable that holds our file name
file_name = "read_sample1.txt"

In [13]:
## read and print entire file
with open (file_name, "r") as myfile:
    print(myfile.read())

Pandemic Rages in the U.S., Spurring Quarantines and Mask Orders
By Margaret Newkirk, Jonathan Levin, and Michelle Fay Cortez

The Covid-19 pandemic is tearing through the U.S. heartland, setting records for hospitalizations and forcing businesses to rethink their plans to reopen as new modeling predicts the virus will kill 180,000 Americans by October.

With the U.S. seeing one of its highest-ever increases in cases Wednesday, some states took drastic measures, imposing face mask orders and internal quarantines. The country recorded more than 34,500 new infections for a second day, rattling markets as numbers neared the peak of 36,188 set April 24, when the virus was coursing through New York.



In [14]:
myfile

<_io.TextIOWrapper name='read_sample1.txt' mode='r' encoding='UTF-8'>

In [17]:
## read and print the first 65 characters
with open(file_name, "r") as myfile:
    print(myfile.read(65))

Pandemic Rages in the U.S., Spurring Quarantines and Mask Orders



## Saving file to memory
So far, we haven't saved the text.
The content is only available inside the ```with open``` block.
If we try to read the file outside the ```with open``` block, we'll get a ```ValueError: I/O operation on closed file.```

In [18]:
myfile.read()

ValueError: I/O operation on closed file.
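The error happens because the ```with``` block closes the file as soon as the block ends. A small sketch that checks this, using a made-up ```sample.txt``` in a temporary folder:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"      # hypothetical stand-in file
    sample.write_text("first line\nsecond line\n")

    with open(sample, "r") as myfile:
        open_inside = not myfile.closed    # the file is open in here

    closed_after = myfile.closed           # the with block has closed it

    try:
        myfile.read()
        error_message = None
    except ValueError as error:
        error_message = str(error)

print(open_inside, closed_after)
print(error_message)
```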

## We fix that by saving what we read from the myfile object inside a variable

In [19]:
## read and hold the first 50 characters in a variable
with open(file_name, "r") as myfile:
    first_50 = myfile.read(50)

In [20]:
## call the variable above
first_50

'Pandemic Rages in the U.S., Spurring Quarantines a'

In [21]:
## read the first line into a variable
with open (file_name, "r") as myfile:
    first_line = myfile.readline()

In [22]:
## call the variable above
first_line

'Pandemic Rages in the U.S., Spurring Quarantines and Mask Orders\n'
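Each read picks up where the previous one stopped, so calling ```readline()``` again returns the next line rather than the first one. A sketch with a made-up three-line file:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"      # hypothetical stand-in file
    sample.write_text("Headline\nByline\nBody text\n")

    with open(sample, "r") as myfile:
        first = myfile.readline()    # reads up to and including the first "\n"
        second = myfile.readline()   # continues from where the last read stopped
        rest = myfile.read()         # everything that is left

print(repr(first), repr(second), repr(rest))
```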

In [23]:
## read the whole thing into a variable
with open (file_name, "r") as myfile:
    all_text = myfile.read()

In [24]:
## call the variable above
all_text

'Pandemic Rages in the U.S., Spurring Quarantines and Mask Orders\nBy Margaret Newkirk, Jonathan Levin, and Michelle Fay Cortez\n\nThe Covid-19 pandemic is tearing through the U.S. heartland, setting records for hospitalizations and forcing businesses to rethink their plans to reopen as new modeling predicts the virus will kill 180,000 Americans by October.\n\nWith the U.S. seeing one of its highest-ever increases in cases Wednesday, some states took drastic measures, imposing face mask orders and internal quarantines. The country recorded more than 34,500 new infections for a second day, rattling markets as numbers neared the peak of 36,188 set April 24, when the virus was coursing through New York.\n'

In [25]:
print (all_text)

Pandemic Rages in the U.S., Spurring Quarantines and Mask Orders
By Margaret Newkirk, Jonathan Levin, and Michelle Fay Cortez

The Covid-19 pandemic is tearing through the U.S. heartland, setting records for hospitalizations and forcing businesses to rethink their plans to reopen as new modeling predicts the virus will kill 180,000 Americans by October.

With the U.S. seeing one of its highest-ever increases in cases Wednesday, some states took drastic measures, imposing face mask orders and internal quarantines. The country recorded more than 34,500 new infections for a second day, rattling markets as numbers neared the peak of 36,188 set April 24, when the virus was coursing through New York.



## It's often more useful to save the text as a list of lines.
Remember, ```readlines()``` returns a list in which each item is one line of the file.

In [26]:
## store entire text file in list
with open (file_name, "r") as myfile:
    all_text_list = myfile.readlines()  ## readlines() gives you a list of all the lines; readline() gives you only one line at a time
    
all_text_list

['Pandemic Rages in the U.S., Spurring Quarantines and Mask Orders\n',
 'By Margaret Newkirk, Jonathan Levin, and Michelle Fay Cortez\n',
 '\n',
 'The Covid-19 pandemic is tearing through the U.S. heartland, setting records for hospitalizations and forcing businesses to rethink their plans to reopen as new modeling predicts the virus will kill 180,000 Americans by October.\n',
 '\n',
 'With the U.S. seeing one of its highest-ever increases in cases Wednesday, some states took drastic measures, imposing face mask orders and internal quarantines. The country recorded more than 34,500 new infections for a second day, rattling markets as numbers neared the peak of 36,188 set April 24, when the virus was coursing through New York.\n']
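Notice that every item still ends in ```\n``` and that blank lines show up as their own items. One common cleanup is a list comprehension with ```strip()```. A sketch on a made-up file:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"      # hypothetical stand-in file
    sample.write_text("Headline\nByline\n\nFirst paragraph.\n")

    with open(sample, "r") as myfile:
        raw_lines = myfile.readlines()

# strip() removes the trailing "\n"; the if drops lines that are empty afterwards
clean_lines = [line.strip() for line in raw_lines if line.strip()]

print(raw_lines)
print(clean_lines)
```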


## We can then index or slice our list

In [28]:
## Show the item at index 3 (the fourth item, since counting starts at 0)
all_text_list[3]

'The Covid-19 pandemic is tearing through the U.S. heartland, setting records for hospitalizations and forcing businesses to rethink their plans to reopen as new modeling predicts the virus will kill 180,000 Americans by October.\n'
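Indexing returns a single string, while slicing returns a new list. A short sketch with a stand-in list (not the article text above):

```python
# a small stand-in for a list produced by readlines()
all_lines = ["Headline\n", "Byline\n", "\n", "First paragraph.\n", "\n", "Second paragraph.\n"]

one_item = all_lines[3]      # indexing: one string
first_two = all_lines[:2]    # slicing: a new list with items 0 and 1
last_item = all_lines[-1]    # negative indexes count from the end

print(repr(one_item))
print(first_two)
print(repr(last_item))
```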