# Lesson 06: Accessing Resources: Handling Files in Python

The goal of this resource is to quickly set you up to use Python in your own projects.

When working on a project in Python, you will need to access external resources. You might want to access **files** to import existing data stored on your computer, or to export the output of your code to a permanent file.

In [None]:
#Run this cell to import lesson questions
from QuestionsFiles import Q1, Q2, Q3, question, solution

---
## Lesson Objectives
- Access files within Python
- Manipulate files within Python
- Open .csv files with Pandas
- Write output to file


**Key Concepts:** open(), read(), write(), pandas.read_csv(), file handle

---
# Accessing Files

Python provides several ways of accessing files, both natively and through modules. 

These approaches differ in the way they use memory and in the way the code is structured. 

For now, it is good to recognize the different options for when you encounter them in code.

We will look at the **open** and **read** functions and using **Pandas** to handle .csv files (spreadsheets).

# Opening text files with the open() function

Python's **open()** function uses a filepath to create a "file handle", a point of access to a file.

It does nothing else; it does not even read the file. It creates an object that you can later use to read the file. 

```python
myfile = open('filename.txt', 'mode')
```

As with any variable, you can choose which name to assign (here, 'myfile').

You must give a string with the filename and file extension, e.g. 'data.txt'. If the file is in another directory, you have to specify a filepath.

You can optionally give a mode for opening the file; the default is 'r'. 'r' reads an existing file. 'w' opens a file for writing: it creates a new file if one with that name does not exist, and erases the contents of an existing file before writing. 'a' appends data to the end of a file, creating the file if it does not exist. To open a file in binary rather than text format, follow the mode with the letter 'b' (for example, 'rb'). 
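
The modes can be compared side by side in a small sketch. The file name and contents here are made up for illustration, and the file lives in a temporary folder so nothing on your computer is changed. The close() method used here is covered later in this lesson.

```python
import os
import tempfile

# Work in a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes_demo.txt")  # hypothetical file name

    # 'w' creates the file, or empties it first if it already exists.
    f = open(path, "w")
    f.write("first line\n")
    f.close()

    # 'a' keeps the existing content and adds to the end.
    f = open(path, "a")
    f.write("second line\n")
    f.close()

    # 'r' opens an existing file for reading text.
    f = open(path, "r")
    text = f.read()
    f.close()

    # 'rb' reads the same file as raw bytes instead of text.
    f = open(path, "rb")
    raw = f.read()
    f.close()

print(text)
print(type(text), type(raw))
```

Reading in text mode gives a string, while reading in binary mode gives a bytes object.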

Let's open a file:

In [None]:
theRaven = open('Other_files/TheRaven.txt', mode='r')

As can be seen, this line does not produce any visible output. Instead, it has created an object called 'theRaven' that points to the file.

This becomes clear if we use print().

In [None]:
print(theRaven)

What we are seeing is some information on our file handle, the object that points to our file. 

The file handle is a type of object with several methods, including those that help us access the data.
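
As a rough illustration, a file handle also carries a few attributes that describe it. The sketch below assumes a small made-up file in a temporary folder and inspects the handle's `name`, `mode` and `closed` attributes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a small throwaway file to open (hypothetical name and content).
    path = os.path.join(tmpdir, "handle_demo.txt")
    handle = open(path, "w")
    handle.write("some text\n")
    handle.close()

    handle = open(path, "r")
    handle_name = handle.name      # the path that was passed to open()
    handle_mode = handle.mode      # the mode the file was opened in
    handle_closed = handle.closed  # False while the file is still open
    handle.close()

print(handle_name)
print(handle_mode, handle_closed)
```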

**read()** and **write()** are methods of file objects used for reading from and writing to files. 

**read()** can be used in several ways to access a file.

# Reading files with the read() method

One option is to create a file handle and then call its read() method: 

```python
myfile = open('filename or path', 'mode')
data = myfile.read()
```

We use open() to create a file handle, then call its read() method and assign the result to the variable 'data'. 

read() is a method of the file object that open() returns.

This option will bring in the whole file as a block of text, and you will later have to find ways to subdivide the file. 

Let's try it:

In [None]:
theRaven = open('Other_files/TheRaven.txt','r')
Raventext = theRaven.read()

As you can see, this also has not printed the content of the file. Instead, we have stored the content of the file as a string in the variable 'Raventext'. We can now print it:

In [None]:
print(Raventext)
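
Because read() returns one long string, ordinary string methods can subdivide it. The sketch below uses a short stand-in string rather than the real file, so the variable names and text are illustrative only.

```python
# A short stand-in for the text returned by read() (not the full poem).
text = "Once upon a midnight dreary,\nwhile I pondered, weak and weary,\n"

lines = text.splitlines()  # one string per line, newline characters removed
words = text.split()       # split on any whitespace (spaces and newlines)

print(len(lines))
print(lines[0])
print(words[:3])
```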

**Exercise** We have prepared a file called TheWalrusandtheCarpenter.txt in our Other_files folder. Create a file handle with the name "walrus" and use it to print the contents of the file.

In [None]:
solution(Q1)

## Missing or Misspelled Filenames

open() accepts filenames or filepaths.

If you give only a filename or a relative path, Python looks for it starting from its current working directory. 

Alternatively, you can specify an absolute file path to help Python find your file.

If you get a FileNotFoundError, you can import the os module and call os.getcwd() to check the current working directory:

`import os
os.getcwd()`

This can give you an indication as to whether Python is looking in the right folder for your file.
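
One way to put this check into practice is sketched below. The file name is hypothetical and is assumed not to exist, so the example shows what happens when Python cannot find a file and how the error can be handled.

```python
import os

# The folder that relative paths are resolved against.
print(os.getcwd())

filename = "no_such_file_for_this_demo.txt"  # hypothetical, assumed not to exist

# Option 1: check whether the path exists before opening it.
found = os.path.exists(filename)
print(found)

# Option 2: try to open the file and handle the error if it is missing.
try:
    handle = open(filename, "r")
    handle.close()
    message = "opened"
except FileNotFoundError as err:
    message = f"Could not find the file: {err.filename}"

print(message)
```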

# The File close() Method

Python will usually close an open file on its own eventually (for example, when the program exits), but there are several reasons to close your files yourself when you are done with them:

1. To improve performance by minimizing the space used in the computer's RAM
1. To make sure all changes go into effect
1. To ensure they can be used by other programs and scripts
1. To avoid any accidental changes

>Source: https://stackoverflow.com/questions/25070854/why-should-i-close-files-in-python

It is good practice to use the filehandle.close() method to close your file when you are done with it.

In our case, this would be:

In [None]:
theRaven.close()

Now, if we try to use the file handle again, we will get an error (`ValueError: I/O operation on closed file`). We can no longer act upon it:

In [None]:
theRaven.read()
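
The same behavior can be reproduced in a self-contained sketch. It assumes a made-up file in a temporary folder, checks the handle's `closed` attribute before and after close(), and catches the error raised by reading from the closed handle.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a small throwaway file (hypothetical name and content).
    path = os.path.join(tmpdir, "close_demo.txt")
    handle = open(path, "w")
    handle.write("some text\n")
    handle.close()

    handle = open(path, "r")
    before = handle.closed  # False: the file is open
    handle.close()
    after = handle.closed   # True: the file has been closed

    # Reading from a closed handle raises a ValueError.
    try:
        handle.read()
        error_message = None
    except ValueError as err:
        error_message = str(err)

print(before, after)
print(error_message)
```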

# Variations on the open() function: 'with'

To ensure that we don't forget to close a file, a common way of opening a file is to use a 'with' statement:

```python
with open('file path', 'r') as myfile:
    data = myfile.read()
```

Here you should recognize the same elements as before: the open() call, with a file path or name and a mode (such as read or write), and a name for our file handle. 

The difference is that 'with' opens a code block. Anything you want to do with the open file has to happen inside the indented block that follows. Once we exit the code block, Python closes the file for us.

In our case, we might open our file like this:

In [None]:
with open('Other_files/TheRaven.txt','r') as theRaven:
    Raventext = theRaven.read()
    print(Raventext)

This is equivalent to the code using **open()**, **read()**, **print()** and **close()** as discussed before but within the same code block. 

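Note that as soon as you leave the indented block, the file is closed and you can no longer read from or write to it. Variables created inside the block, such as Raventext, remain available.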
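
A short sketch can confirm this. It assumes a made-up file in a temporary folder and checks the handle's `closed` attribute inside and outside the 'with' block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a small throwaway file (hypothetical name and content).
    path = os.path.join(tmpdir, "with_demo.txt")
    with open(path, "w") as outfile:
        outfile.write("some text\n")

    with open(path, "r") as infile:
        text = infile.read()
        open_inside = not infile.closed  # True: still open inside the block

    # The name 'infile' still exists here, but the file behind it is closed.
    closed_outside = infile.closed

print(open_inside, closed_outside)
print(text)  # the data read inside the block is still available
```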

# Variations on the open() function: 'for line in file'

The options above loaded the file as a block of text. To continue working with it as data, you would have to use functions and methods to split it or manipulate it in different ways. 

Another option is to create a file handle and then do something to the data, line by line.

```python
myfile = open('filename', 'mode')
for line in myfile:
    print(line)
```
    
This can be combined with loops and conditionals to select and manipulate specific lines from the file.

```python
myfile = open('filename', 'mode')
for line in myfile:
    if line.startswith('This'): # find lines that start with "This"
        print(line)
```

In our example, we might want to open the file in question and print only the lines that mention the raven:

In [None]:
theRaven = open('Other_files/TheRaven.txt', 'r')

for line in theRaven:
    if ' raven' in line:
        print(line)

theRaven.close()  # close the file once the loop is done
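
Two details are worth knowing when filtering lines this way: each line still ends with its newline character, and the `in` test is case-sensitive, so a lowercase search string will miss capitalized occurrences such as 'Raven'. The sketch below uses `io.StringIO` as a stand-in for a file handle, with made-up sample lines rather than the real poem file.

```python
import io

# io.StringIO behaves like an open text file; the lines are illustrative only.
sample = io.StringIO(
    "Once upon a midnight dreary,\n"
    "Quoth the Raven, Nevermore.\n"
)

matches = []
case_sensitive_hits = 0
for line in sample:
    if " raven" in line:             # case-sensitive: does not match "Raven"
        case_sensitive_hits += 1
    if "raven" in line.lower():      # compare in lowercase to ignore case
        matches.append(line.rstrip("\n"))  # drop the trailing newline

print(case_sensitive_hits)
print(matches)
```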

We could conceive of more sophisticated ways of using the 'for line' statement. For instance, we could count the number of lines in the poem:

In [None]:
count = 0

theRaven = open('Other_files/TheRaven.txt', 'r')

for line in theRaven:
    count = count + 1
print(count)

theRaven.close()  # close the file once we have counted its lines
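
A file handle remembers its position, which is why the examples above reopen the file before each loop. The following sketch, using a made-up three-line file in a temporary folder, shows that a second loop over the same handle finds nothing until the position is reset with seek(0).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Create a small throwaway file with three lines (hypothetical content).
    path = os.path.join(tmpdir, "count_demo.txt")
    with open(path, "w") as outfile:
        outfile.write("line one\nline two\nline three\n")

    with open(path, "r") as infile:
        first_pass = sum(1 for line in infile)   # reads to the end of the file
        second_pass = sum(1 for line in infile)  # already at the end: no lines left
        infile.seek(0)                           # move back to the start
        third_pass = sum(1 for line in infile)

print(first_pass, second_pass, third_pass)
```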

**Exercise:** Use the `for line in file` syntax to print the Walrus and the Carpenter poem line by line:

In [None]:
solution(Q2)

# Writing output to file

Just as you can use read() to read files, you can also export your output to a file with the **write()** method.

Like read(), write() is a method of a file object. Thus, you can use open() with write() similarly to the ways described above to save data to a file. 

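write() takes a string and writes it to a file opened in write ('w') or append ('a') mode. The file can exist already, or you can create it; if no file with that name exists, it will be created at the path you give (in the current working directory if you give only a filename). Remember that 'w' erases any existing content.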
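
Because write() only accepts strings, numbers have to be converted with str() first, and write() does not add a newline for you. A minimal sketch, assuming a made-up file in a temporary folder and an arbitrary example value:

```python
import os
import tempfile

line_count = 3  # arbitrary example value

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "write_demo.txt")  # hypothetical file name

    with open(path, "w") as outfile:
        # Passing a number directly raises a TypeError.
        try:
            outfile.write(line_count)
            number_accepted = True
        except TypeError:
            number_accepted = False

        # Convert to a string and add the newline explicitly.
        outfile.write(str(line_count) + "\n")
        outfile.write("lines counted\n")

    with open(path, "r") as infile:
        content = infile.read()

print(number_accepted)
print(content)
```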

Let's create a file for the number of lines in The Raven and export that value out to it:

In [None]:
with open('Other_files/Ravencount.txt','w') as file:
    file.write(str(count))

In [None]:
question(Q3)

---
# Opening Spreadsheets with Pandas

Often, we don't have a text file organized into lines, but a spreadsheet with tabular data arranged into rows and columns. Common formats are .csv and Excel files. 

**Pandas** is a module for data manipulation and analysis that provides a powerful solution for working with spreadsheet data.

>A **DataFrame** is a data structure for storing tabular data in rows and columns.

Pandas provides powerful methods for the manipulation, analysis and plotting of such data, and is worth getting to know from the beginning. We will introduce Pandas and dataframes here, and later give a more in-depth lesson.

# pandas.read_csv()

The pandas.read_csv() function plays a similar role to the file.read() method, but it is a pandas function designed specifically for CSV files, and it returns a DataFrame rather than a string. 

```python
import pandas as pd

data = pd.read_csv("filepath or name")
```

1. We import the pandas module, giving it the convenient and conventional alias 'pd'

2. We call the pandas.read_csv() function and assign the object it returns to the variable "data".

The read_csv() function requires a filepath or filename with its extension.

There are many other options, including specifying rules for reading the header, specifying the encoding, etc., which are described in the pandas [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).
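
A few of these options can be tried without any file on disk. The sketch below feeds read_csv() a small made-up table through `io.StringIO` and uses the `sep`, `usecols` and `nrows` parameters; the column names and values are invented for illustration.

```python
import io

import pandas as pd

# A made-up table that uses semicolons instead of commas as separators.
csv_text = (
    "id;title;year\n"
    "1;Sample Landscape;1850\n"
    "2;Sample Portrait;1790\n"
)

subset = pd.read_csv(
    io.StringIO(csv_text),    # read_csv also accepts file-like objects
    sep=";",                  # the character that separates the columns
    usecols=["id", "title"],  # load only these columns
    nrows=1,                  # load only the first data row
)

print(subset)
```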

Pandas will create a **DataFrame** for your spreadsheet, an object that allows you to display your data in rows and columns.

DataFrames can be operated on. You can extract information, as well as append new data and perform queries.
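
As a rough sketch of these operations, the example below builds a small DataFrame from made-up data (the column names and values are assumptions, not the lesson's dataset), then extracts a column, filters rows and appends a new row with `pd.concat`.

```python
import io

import pandas as pd

# A made-up table standing in for a real CSV file.
csv_text = (
    "id,title,year\n"
    "1,Sample Landscape,1850\n"
    "2,Sample Portrait,1790\n"
    "3,Sample Still Life,1905\n"
)
works = pd.read_csv(io.StringIO(csv_text))

# Extract information: a single column.
titles = works["title"]

# Query: keep only the rows where the year is after 1800.
after_1800 = works[works["year"] > 1800]

# Append new data: combine the table with a one-row DataFrame.
new_row = pd.DataFrame([{"id": 4, "title": "Sample Sketch", "year": 1920}])
extended = pd.concat([works, new_row], ignore_index=True)

print(works.shape)
print(after_1800["title"].tolist())
print(len(extended))
```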

Let's open a CSV file with Pandas, and do some basic manipulation.

In [None]:
import pandas as pd
NGAData = pd.read_csv("Other_files/NGAData/constituents.csv")

As we can see, we have created a variable for our data. This is a table of artworks from the National Gallery, with attributes that describe the pieces (including an ID for the object, the title, dates related to its creation, among other variables).

Let's call it to see what we have. The .head() method of DataFrames allows us to see the first rows of the DataFrame.

In [None]:
NGAData.head()

As seen above, pandas organizes the information into an object structured into rows and columns. 

In lesson 10, we present how to manipulate and work with DataFrames.

---
# Lesson Summary

- The open() function creates a file handle, an object that serves as an access point to a file
- read() and write() can be used on the file handle to extract or add information to a file.
- The `with open() as file:` syntax helps ensure you close a file after finishing with it
- `for line in file:` acts on lines in a file individually
- read() loads the entire file as a single string, which you can then split or process further
- CSVs are easily accessed with the Pandas module's pandas.read_csv() function. 

---
# Further Resources

[Python 4 Everybody - Files Lesson (video)](https://www.py4e.com/lessons/files)
