<img src="images/notebook8_header.png" width="1024" alt="Python for Geospatial Data Science" style="border-radius:10px"/>

**Dr Gunnar Mallon** (g.mallon@rug.nl), *Department of Cultural Geography (Faculty of Spatial Science)*, *University of Groningen*

---


In this lecture, we'll explore how to read files in Python. Whether you're working with text files, CSVs, JSON, or any other data format, the ability to read and process files is crucial for various programming tasks.


## File Handling Basics

File input/output (I/O) is a fundamental concept in programming. It allows us to read data from files and write data to files. This is particularly useful when we want to store and retrieve information that persists beyond the lifetime of a program.

#### Importance of File I/O in Programming

File I/O is important because it enables us to:

1. **Read data from files**: We can extract information stored in files, such as text documents, CSV files, JSON files, etc., and use it in our programs. For example, we can read a CSV file containing geospatial data and perform calculations on it.

2. **Write data to files**: We can store data generated by our programs in files for future use or to share with others. For example, we can write the output of a program to a text file or save a graph that we've made.

3. **Manipulate files**: We can create, delete, rename, and modify files using file I/O operations. This allows us to manage files and directories on our computer.

#### Common Use Cases for Reading Files in Python

Reading files is a common task in many programming scenarios. Some common use cases include:

1. **Data analysis**: Reading data from files is essential for performing data analysis tasks.

2. **Configuration files**: Many programs use configuration files to store settings and preferences. Reading these files allows us to customize the behavior of our programs.

3. **Text processing**: Reading text files is often necessary for tasks such as parsing, searching, and manipulating text data.

#### Overview of Different File Formats

Python can read many file formats. The ones you'll work with most often are:

1. **Text files**: These files contain plain text and are the most common type of file. They can be opened and read using Python's built-in functions.

2. **CSV files**: CSV (Comma-Separated Values) files store tabular data, with each line representing a row and each value separated by a comma. Python provides libraries to read and write CSV files.

3. **JSON files**: JSON (JavaScript Object Notation) files store structured data in a human-readable format. Python has built-in support for reading and writing JSON files.

When we look at GeoPandas, you will also start working with shapefiles.

### Opening and Closing Files

Before we can read or write data to a file, we need to open it. Python provides the `open()` function for this purpose.

#### Opening Files with `open()`

To open a file, we need to specify its name and the mode in which we want to open it. The mode determines whether we want to read, write, or append to the file, as well as whether we want to work with binary or text data.

Here's the general syntax for opening a file:

```python
file = open(filename, mode)
```

- `filename` is the name of the file we want to open, including the file extension.
- `mode` is a string that specifies the mode in which we want to open the file.

#### Specifying File Modes

Python supports several file modes, including:

- `'r'`: Read mode. Opens the file for reading. This is the default mode if no mode is specified.
- `'w'`: Write mode. Opens the file for writing. If the file already exists, its contents will be overwritten. If the file does not exist, a new file will be created.
- `'a'`: Append mode. Opens the file for appending. If the file already exists, new data will be added to the end of the file. If the file does not exist, a new file will be created.
- `'b'`: Binary modifier. Added to `'r'`, `'w'`, or `'a'` to open the file in binary mode, allowing us to read or write raw bytes.
- `'t'`: Text modifier. Added to `'r'`, `'w'`, or `'a'` to open the file in text mode, allowing us to read or write strings. Text mode is the default, so `'r'` and `'rt'` are equivalent.

We can combine these modes by specifying multiple characters. For example, `'rb'` opens the file in binary read mode.
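The modes described above can be tried out with a short sketch like the one below. The file name `modes_demo.txt` and the temporary directory are assumptions made for illustration, so that no existing file is overwritten.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched
# (the file name "modes_demo.txt" is made up for this sketch).
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file, or overwrites it if it already exists
f = open(demo_path, "w")
f.write("first line\n")
f.close()

# 'a' keeps the existing content and adds new text at the end
f = open(demo_path, "a")
f.write("second line\n")
f.close()

# 'r' (text mode) returns a string
f = open(demo_path, "r")
text = f.read()
f.close()

# 'rb' (binary read mode) returns raw bytes
f = open(demo_path, "rb")
raw = f.read()
f.close()

print(type(text).__name__, type(raw).__name__)  # str bytes
```

Note that the append step did not erase the first line, whereas opening the same file with `'w'` again would have.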

#### Closing Files

After we finish working with a file, it's important to close it to free up system resources. We can close a file using the `close()` method.

```python
file.close()
```

However, it's easy to forget to close a file, especially if an error occurs. To ensure that a file is always closed, we can use the `with` statement. The `with` statement automatically takes care of closing the file for you.

```python
with open(filename, mode) as file:
    # Perform file operations here
    pass  # placeholder: a block cannot consist of a comment alone
```

The `with` statement ensures that the file is closed as soon as we exit the block of code. This is a recommended practice for working with files in Python.
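To see this behaviour in action, here is a minimal sketch that checks the file object's `closed` attribute inside and after a `with` block, including the case where an error interrupts the block. The file name `with_demo.txt` is a made-up example, and the `ValueError` is raised deliberately to simulate a problem.

```python
import os
import tempfile

# Temporary location; "with_demo.txt" is a hypothetical file name
demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(demo_path, "w") as file:
    file.write("Hello from inside the with block\n")
    inside = file.closed  # the file is still open here

after = file.closed  # the file was closed when the block ended

# The file is also closed when an error interrupts the block
try:
    with open(demo_path, "r") as file:
        raise ValueError("simulated problem while processing")
except ValueError:
    closed_after_error = file.closed

print(inside, after, closed_after_error)  # False True True
```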

Now that we understand the basics of file handling, let's practice opening and closing files in Python.

---
### Exercises

1. Create a text file named "example.txt" and write the following text to it: "Hello, world!"

2. Open the file in read mode and print its contents.

3. Open the file in append mode and add the text "This is an example file." to it.

4. Open the file in read mode again and print its updated contents.

---
## Reading Text Files

### Reading Text Files Line by Line

In this section, we will learn how to read text files line by line using file objects in Python. Reading files line by line is a common task when working with text data, as it allows us to process each line individually.

To read a text file line by line, we first need to open the file using the `open()` function. 

```python
file = open('file.txt', 'r')
```

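Once we have opened the file, we can iterate through its lines using a `for` loop. The file object acts as an iterator that yields one line at a time. The version below uses the `with` statement so that the file is closed automatically:

```python
with open('file.txt', 'r') as file:
    for line in file:
        # Each line keeps its trailing newline, so strip it before printing
        print(line.rstrip('\n'))
```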


### Reading the Entire File

In some cases, we may want to read the entire contents of a text file at once. This can be useful when we need to process the file as a single string or when we want to work with the file content as a list of lines.

To read the entire file, we can use the `read()` method of the file object. This method returns the entire content of the file as a string.

```python
with open('file.txt', 'r') as file:
    content = file.read()
print(content)
```

Alternatively, we can use the `readlines()` method to read the file content as a list of lines.

```python
with open('file.txt', 'r') as file:
    lines = file.readlines()
print(lines)
```

When working with large text files, it is important to read them efficiently. Reading the entire file at once may not be feasible if the file is too large to fit in memory. In such cases, we can process the file line by line, or in fixed-size chunks by passing a size to `read()`.
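Reading in chunks can be sketched as follows. The chunk size of 32 characters and the small generated file are assumptions chosen to keep the example quick; for a genuinely large file you would pick a much larger chunk size.

```python
import os
import tempfile

# Create a small stand-in for a "large" file (100 characters);
# the file name is hypothetical.
big_path = os.path.join(tempfile.mkdtemp(), "large_file.txt")
with open(big_path, "w") as file:
    file.write("abcdefghij" * 10)

chunk_size = 32  # number of characters to read per step
n_chunks = 0
total_chars = 0

with open(big_path, "r") as file:
    while True:
        chunk = file.read(chunk_size)
        if not chunk:  # an empty string means the end of the file
            break
        n_chunks += 1
        total_chars += len(chunk)

print(n_chunks, total_chars)  # 4 100
```

Only one chunk is held in memory at a time, so memory use stays roughly constant regardless of the file size.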

🚀 Go on, try it out! 🚀

### Error Handling and Exception Handling

When reading files, we may encounter errors such as a missing file, insufficient permissions, or an invalid file format. To handle these errors gracefully, we can wrap the file operations in a `try`/`except` block.

```python
try:
    # Using `with` ensures the file is closed even if processing fails
    with open('file.txt', 'r') as file:
        content = file.read()  # Process the file here
except FileNotFoundError:
    print("File not found!")
except PermissionError:
    print("Permission denied!")
except Exception as e:
    print("An error occurred:", str(e))
```

By using specific exception types in the `except` blocks, we can handle different types of errors separately. If we don't know the specific exception type, we can use the generic `Exception` class to catch any type of exception.
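As a sketch of how this might be packaged for reuse, the hypothetical helper `read_text()` below (not part of any library) returns the file contents, or `None` if the file cannot be read. It is tried on a path that deliberately does not exist.

```python
import os
import tempfile


def read_text(path):
    """Return the contents of a text file, or None if it cannot be read."""
    try:
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        print("File not found:", path)
    except PermissionError:
        print("Permission denied:", path)
    return None


# A path inside a fresh temporary directory, so the file cannot exist
missing_path = os.path.join(tempfile.mkdtemp(), "does_not_exist.txt")
result = read_text(missing_path)
print(result)  # None
```

Returning `None` is only one possible design; depending on the program, it may be better to let the exception propagate so the problem is not silently ignored.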

To summarize, in this section, we learned how to read text files line by line using file objects, how to read the entire file at once, and how to handle potential errors and exceptions when reading files. These skills are essential for working with text data and processing files in Python.

## Reading Structured Data

### Reading CSV Files

CSV (Comma-Separated Values) files are a common format for storing structured data (this is how I work with your exam results 😉). They consist of plain text where each line represents a row and the values within a row are separated by commas. In this section, we will learn how to read and parse CSV files using the `csv` module in Python.

To begin, we need to import the `csv` module:

```python
import csv
```

The `csv` module provides a `reader()` function that returns an object for iterating over the rows of a CSV file. We open the file with `open()` (the `csv` documentation recommends passing `newline=''`) and pass the file object to `csv.reader()`:

```python
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    # Use csv_reader here, while the file is still open
```

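The `csv.reader` object does not treat the first line specially: it simply yields one row after another. If the file has a header row, we can read it separately with the `next()` function before looping over the remaining rows (while still inside the `with` block):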

```python
header = next(csv_reader)
```

To access the data rows, we can iterate over the `csv_reader` object:

```python
for row in csv_reader:
    print(row)
```

Each row is returned as a list of strings, one per column in the CSV file. Numeric values must be converted with `int()` or `float()` before we can calculate with them.

Sometimes, CSV files may have a different delimiter character instead of a comma. We can specify the delimiter using the `delimiter` parameter when creating the `csv_reader` object:

```python
csv_reader = csv.reader(file, delimiter=';')
```

🚀 In the zip file for this lecture, you will find a file `data.csv` that you can use to play around with opening CSV files. 🚀

### Reading JSON Files

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write. It is widely used for storing and transmitting structured data, and many web services use it to exchange data with each other. In this section, we will learn how to read and decode JSON data using the `json` module in Python.

To begin, we need to import the `json` module:

```python
import json
```

The `json` module provides a `load()` function that allows us to read JSON data from a file. We can open a JSON file using the `open()` function and pass the file object to the `load()` function:

```python
with open('data.json', 'r') as file:
    json_data = json.load(file)
```

The `json.load()` function decodes the JSON data and returns the corresponding Python object: a dictionary for a JSON object, or a list for a JSON array (the details of decoding are not covered in this course). We can then access and process the data as needed.

JSON data can contain nested structures, such as dictionaries within dictionaries or lists within dictionaries. We can access and process nested structures using indexing and iteration:

```python
for entry in json_data:
    student_id = entry['student_id']
    address_type = entry['address_type']
    print(f"Student ID: {student_id}, Address Type: {address_type}")
```

In this example, `json_data` is assumed to be a list of dictionaries. For each entry in the list, we read the values stored under the `'student_id'` and `'address_type'` keys.
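To illustrate nested access, here is a self-contained sketch. The structure of the sample data, in particular the nested `address` dictionary with a `city` key, is an assumption made for this example and may differ from the `data.json` provided with the lecture.

```python
import json
import os
import tempfile

# Made-up sample data: a list of dictionaries, each with a nested dictionary
sample = [
    {"student_id": 1, "address_type": "home", "address": {"city": "Groningen"}},
    {"student_id": 2, "address_type": "term", "address": {"city": "Assen"}},
]

# Write the sample to a temporary JSON file so there is something to load
json_path = os.path.join(tempfile.mkdtemp(), "students.json")
with open(json_path, "w") as file:
    json.dump(sample, file)

with open(json_path, "r") as file:
    json_data = json.load(file)

# Chain the keys to reach values inside the nested dictionary
cities = []
for entry in json_data:
    cities.append(entry["address"]["city"])

print(cities)  # ['Groningen', 'Assen']
```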

🚀 In the zip file, you will also find a file called `data.json`. Please have a play around with loading the file and extracting data from it. 🚀

### Reading Other Data Formats

Apart from CSV and JSON, we may encounter other common data formats, such as XML and Excel. Python's standard library includes the `xml` package for reading XML files, whereas Excel files require a third-party library.

For XML, we can use the built-in `xml.etree.ElementTree` module or the third-party `lxml` library to parse and process XML data. For Excel files, we can use libraries like `pandas` (we'll cover this next week) or `openpyxl` to read and manipulate Excel data.
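As a brief sketch of the XML case, the example below parses a small document with `xml.etree.ElementTree`. The element names, attributes, and values are invented for illustration, and the XML is parsed from a string to keep the example self-contained; `ET.parse()` would be used to read from a file instead.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document
xml_text = """
<places>
    <place name="A"><elevation>10</elevation></place>
    <place name="B"><elevation>25.5</elevation></place>
</places>
"""

# Parse the string into a tree and get its root element (<places>)
root = ET.fromstring(xml_text.strip())

names = []
elevations = []
for place in root.findall("place"):
    names.append(place.get("name"))  # read an attribute
    elevations.append(float(place.find("elevation").text))  # read child text

print(names)       # ['A', 'B']
print(elevations)  # [10.0, 25.5]
```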

## File Paths and Directories

In this section, we will learn about file paths and how to work with them in Python. A file path is the location of a file or directory in a file system. There are two types of file paths: absolute and relative.

#### Absolute File Paths

An absolute file path specifies the complete location of a file or directory from the root of the file system. It starts with the root directory, followed by the directories leading to the file or directory. For example, on a Windows system, an absolute file path might look like this: `C:\Users\John\Documents\file.txt`. On Linux, it might look like this: `/home/john/documents/file.txt` (on macOS, home directories live under `/Users` instead of `/home`).

To create an absolute file path in Python, we can use the `os` module. The `os.path.join()` function allows us to join multiple path components together to create a complete file path. Here's an example:

```python
import os

# 'C:\\' includes the root backslash; a bare 'C:' would give 'C:Users\...'
path = os.path.join('C:\\', 'Users', 'John', 'Documents', 'file.txt')
print(path)
```

Output (on Windows):
```
C:\Users\John\Documents\file.txt
```

#### Relative File Paths

A relative file path specifies the location of a file or directory relative to the current working directory. It does not start with the root directory. Instead, it starts with a directory or file name and navigates from there. For example, if the current working directory is `/home/john`, a relative file path to a file in the `documents` directory might look like this: `documents/file.txt`.

To create a relative file path in Python, we can simply specify the path as a string. Here's an example:

```python
path = 'documents/file.txt'
print(path)
```

Output:
```
documents/file.txt
```
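A relative path can be turned into an absolute one with `os.path.abspath()`, which resolves it against the current working directory. The sketch below assumes nothing about whether `documents/file.txt` actually exists, because these functions only manipulate the path string.

```python
import os

# Build the relative path with os.path.join so the separator suits the system
relative_path = os.path.join("documents", "file.txt")

# Resolve it against the current working directory
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
print(absolute_path)  # depends on where the code is run
```

Because the result depends on the current working directory, the same relative path can point to different files when a script is run from different locations.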

#### Navigating Directories and Locating Files

The `os` module provides several functions for navigating directories and locating files. Here are a few commonly used functions:

- `os.getcwd()`: Returns the current working directory as a string.
- `os.chdir(path)`: Changes the current working directory to the specified path.
- `os.listdir(path)`: Returns a list of the names of all files and directories in the specified path (or in the current working directory if no path is given).
- `os.path.exists(path)`: Returns `True` if the specified path exists, otherwise `False`.
- `os.path.isfile(path)`: Returns `True` if the specified path is a file, otherwise `False`.
- `os.path.isdir(path)`: Returns `True` if the specified path is a directory, otherwise `False`.

Here's an example that demonstrates how to use these functions:

```python
import os

# Get the current working directory
current_dir = os.getcwd()
print("Current directory:", current_dir)

# Change the current working directory
os.chdir('documents')
print("Changed directory:", os.getcwd())

# List all files and directories in the current directory
files = os.listdir()
print("Files in current directory:", files)

# Check if a file exists
file_path = 'file.txt'
if os.path.exists(file_path):
    print(file_path, "exists")
else:
    print(file_path, "does not exist")

# Check if a path is a file or directory
if os.path.isfile(file_path):
    print(file_path, "is a file")
elif os.path.isdir(file_path):
    print(file_path, "is a directory")
else:
    print(file_path, "is neither a file nor a directory")
```

Example output (the exact paths and file names depend on your system):
```
Current directory: /home/john
Changed directory: /home/john/documents
Files in current directory: ['file.txt', 'folder']
file.txt exists
file.txt is a file
```

## Best Practices and Considerations

### Best Practices for Reading Files

When working with files, it is important to follow some best practices to ensure efficient and safe file handling. Here are some key points to consider:

1. **Opening Files**: Always use the `with` statement when opening files. This ensures that the file is properly closed after you are done with it, even if an exception occurs. For example:

   ```python
   with open('file.txt', 'r') as file:
       # Perform operations on the file
       first_line = file.readline()
   ```

2. **Reading Files**: Use the appropriate mode when opening files for reading. The most common modes are `'r'` for reading in text mode and `'rb'` for reading in binary mode. For example:

   ```python
   with open('file.txt', 'r') as file:
       content = file.read()
   ```

   You can also read the file line by line using the `readline()` method or iterate over the file object directly.

3. **Closing Files**: The `with` statement closes the file for you, so no explicit `close()` call is needed. If you open a file without `with`, call the `close()` method yourself as soon as you are done, especially if you are working with a large number of files, because every open file uses system resources. For example:

   ```python
   file = open('file.txt', 'r')
   # Perform operations on the file
   file.close()
   ```

4. **Efficiently Processing Large Files**: When working with large files, it is important to process them efficiently to avoid memory issues. Instead of reading the entire file into memory at once, consider reading it line by line or in chunks. This can be done by looping over the file object or by calling `read(size)` repeatedly. Avoid `readlines()` here, because it loads every line into memory at once. For example:

   ```python
   with open('large_file.txt', 'r') as file:
       for line in file:
           # Process each line (here we simply print it)
           print(line.rstrip('\n'))
   ```

   Alternatively, you can use libraries like `pandas` or `numpy` to handle large datasets more efficiently. We will look at both of these in the next section of the course.

## Conclusion

In this lecture, you've acquired the essential skills for reading files in Python. You've learned the basics of file I/O, how to read text files line by line, read structured data from formats like CSV and JSON, and manage file paths and directories. These skills are fundamental for various data processing, analysis, and manipulation tasks in Python.

Well done for making it this far 🎂 Next we will look at three libraries, NumPy, pandas, and GeoPandas, in much more detail, so buckle up.