## Reading/Writing Multiple Lines

Thus far, we have been reading and writing only a single line to a file, but data files often have multiple lines, or _rows_, of data. In this lesson, we're going to cover the other file object methods that let us read and write files with many lines.



### Generating Random Data

Before we get into reading and writing files, let's take a look at an interesting PyPI library called **[Faker](https://pypi.org/project/Faker/)**. This is a really handy library for generating random data. As data engineers, we often need random data to test our data pipelines, and this library provides an extensive number of methods to generate all sorts of data, such as people, vehicles, credit cards, and emails, in multiple languages.

Go ahead and install this package in your _virtual environment_. It's already installed via the `requirements.txt` file for this lesson, so don't be surprised if nothing happens:

In [None]:
%pip install Faker

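**NOTE:**

In Jupyter notebooks, a line that starts with an exclamation point `!` runs as a shell command instead of Python code, while a line that starts with a percent sign `%` is an IPython _magic_ command. `%pip` is a magic that runs `pip` in the environment of the running kernel, so the package gets installed where the notebook can import it.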

You can read the [documentation](https://faker.readthedocs.io/en/stable/index.html) for Faker to better understand how to use this module. Here we're going to use a simple method called `.name()` to generate some random names:

In [None]:
from faker import Faker

# create a Faker instance
fake = Faker()

# generate 10 names
for _ in range(10):
    name = fake.name()      # this method returns a random full name
    print(name)

Please note that:
- We create a new `Faker()` object called `fake`
- This object provides extensive methods to generate fake data. One of them, `.name()`, returns a random full name
- [Explore](https://faker.readthedocs.io/en/stable/index.html) the Faker module. There are many other methods, such as `.email()` or `.phone_number()`
- You can even set up Faker to generate names or addresses from other countries by passing the `locale` argument when instantiating the class, e.g. `fake = Faker(locale=['it_IT', 'en_US'])`. Read the docs! This generates fake data for both Italy and the USA.

### Writing Multiple Lines

Now, let's try to write our randomly generated names into a file where each name is on a separate _row_:

In [None]:
from faker import Faker

# create a Faker instance
fake = Faker()

# open the file as before
with open("./data/names.txt", "w", encoding="utf-16") as output_file:
    # loop to generate 10 names
    for _ in range(10):
        name = fake.name()
        content = f"{name}\n"           # we must add a newline character to the end of our content
        output_file.write(content)      # write the content to the file

print("done!")

Pay attention to a few details here:
- We open our file and create a file object (or handle) called `output_file`
- Since the `for` loop is inside our `with` block, the `.write()` method is called 10 times, writing 10 rows of names to our file
- Most _importantly_, we append the newline escape sequence `\n` (the ENTER key) to the end of each line. `.write()` writes exactly the string you give it, so you must add your own line terminators. Newline is the most common _line terminator_, but other data formats may use different terminators.
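
To see why the terminator matters, here's a small sketch that writes the same names with and without `\n` and reads each file back. It assumes nothing beyond the standard library: the names are hardcoded placeholders standing in for `fake.name()` output, and the files live in a throwaway `tempfile` directory instead of `./data`.

```python
import os
import tempfile

# placeholder names standing in for fake.name() output
names = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]

with tempfile.TemporaryDirectory() as tmp_dir:
    without_newline = os.path.join(tmp_dir, "no_newline.txt")
    with_newline = os.path.join(tmp_dir, "with_newline.txt")

    # write() adds nothing after the string, so the names run together
    with open(without_newline, "w", encoding="utf-16") as output_file:
        for name in names:
            output_file.write(name)

    # appending "\n" puts each name on its own row
    with open(with_newline, "w", encoding="utf-16") as output_file:
        for name in names:
            output_file.write(f"{name}\n")

    # read both files back in full to compare them
    with open(without_newline, "r", encoding="utf-16") as input_file:
        glued = input_file.read()
    with open(with_newline, "r", encoding="utf-16") as input_file:
        separate = input_file.read()

print(repr(glued))     # 'Ada LovelaceAlan TuringGrace Hopper'
print(repr(separate))  # 'Ada Lovelace\nAlan Turing\nGrace Hopper\n'
```

Without the terminator there is no way to tell where one name ends and the next begins, which is exactly the problem line terminators solve.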

<br/>

You can also write multiple lines at once using the file object's `writelines()` method. Despite its name, `writelines()` does not add line terminators either; it simply writes each string in the list one after another. The following code produces the same result as before, but uses `writelines()` instead:

In [None]:

from faker import Faker

# create a Faker instance
fake = Faker()

# open the file as before
with open("./data/names.txt", "w", encoding="utf-16") as output_file:
    # store multiple lines in a list
    lines = []
    # loop to generate 10 names
    for _ in range(10):
        name = fake.name()
        content = f"{name}\n"       # we must add a newline character to the end of our content
        lines.append(content)       # append to our list of lines to write

    # outside of the for loop, write all 10 lines at once
    output_file.writelines(lines)

print("done!")

Open the file and examine its content. Since the data is randomly generated, you should see different content every time you write the file.
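
Because `writelines()` accepts any iterable of strings, not only a list, you could also skip building the list and pass a generator expression. The sketch below assumes hardcoded placeholder names and temporary files, and checks that the `write()` loop and the single `writelines()` call produce identical content:

```python
import os
import tempfile

names = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]  # placeholder data

with tempfile.TemporaryDirectory() as tmp_dir:
    loop_path = os.path.join(tmp_dir, "loop.txt")
    bulk_path = os.path.join(tmp_dir, "bulk.txt")

    # one write() call per row
    with open(loop_path, "w", encoding="utf-16") as output_file:
        for name in names:
            output_file.write(f"{name}\n")

    # one writelines() call fed by a generator expression;
    # adding "\n" is still our job, writelines() won't do it
    with open(bulk_path, "w", encoding="utf-16") as output_file:
        output_file.writelines(f"{name}\n" for name in names)

    # read both files back to compare them
    with open(loop_path, "r", encoding="utf-16") as loop_file, \
         open(bulk_path, "r", encoding="utf-16") as bulk_file:
        loop_text = loop_file.read()
        bulk_text = bulk_file.read()

print(loop_text == bulk_text)  # True
```

The generator version avoids holding every line in memory at once, which can matter when you generate many rows.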

#### Exercise

- Using the [Faker module documentation](https://faker.readthedocs.io/en/stable/index.html), generate two other random pieces of information for each person, such as `.email()` or `.phone_number()`. Do **NOT** use the `.address()` method, since these addresses can include newline characters that separate the street address from the city, state, and ZIP code.
- Write the name and this additional information on each line, separated by commas. For example, a single line might look like this: `Clayton Stephenson,vaughnjanet@example.org,+1-150-770-2326x20019`
- Be sure to write 30 rows
- You can use either `.write()` or `.writelines()`
- **PLEASE** change your file name to something different

In [None]:
# make a new Faker object

# open file and write 30 rows

#### Exercise

Do it again! This time, write the lyrics of your favorite song to a file. Start by creating a list that holds the song's lines, one lyric line per row.

In [None]:
# write out the lyrics for your favorite song

### Reading Multiple Lines

In this example, we're going to read back the same file containing our randomly generated names.

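Python makes reading lines from a file extremely easy: you can loop over the file object itself to read its content line by line. Think of the file object as an iterable _collection of lines_. Unlike a list, though, it is consumed as you loop: once you reach the end, looping over the same open file object again yields no more lines.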

Let's see this in action:

In [None]:
# open our file for reading, using the same encoding it was written with
with open("./data/names.txt", "r", encoding="utf-16") as input_file:
    for line in input_file:
        line = line.rstrip()
        print(line)

Let's digest this code quickly:
- By looping over our file object, we read individual lines from our file
- Python returns each line **_including_** the terminating newline (`\n`) character
- We use the `str.rstrip()` method to remove the trailing newline character from our lines

**NOTE:** Called with no arguments, `str.rstrip()` removes _all_ trailing whitespace characters (spaces, tabs, `\r`, and `\n`) from the right of the string, not only the newline. If trailing spaces are meaningful in your data, use `line.rstrip("\n")` to strip just the newline.
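
A quick sketch contrasting the two forms of `rstrip()`. It uses `io.StringIO`, an in-memory text stream that supports the same line-by-line iteration as an open file, so it runs without touching disk; the sample text is made up, and its second line deliberately ends with two spaces:

```python
import io

# in-memory stand-in for a text file
sample = io.StringIO("first line\nsecond line  \n")

raw_lines = [line for line in sample]                    # iteration keeps the "\n"
stripped_all = [line.rstrip() for line in raw_lines]     # drops "\n" AND the trailing spaces
stripped_newline = [line.rstrip("\n") for line in raw_lines]  # drops only "\n"

# the stream is now exhausted, so a second pass yields nothing
second_pass = [line for line in sample]

print(raw_lines)         # ['first line\n', 'second line  \n']
print(stripped_all)      # ['first line', 'second line']
print(stripped_newline)  # ['first line', 'second line  ']
print(second_pass)       # []
```

The empty `second_pass` shows the "consumed as you loop" behavior: to read the lines again you would reopen the file (or, for a stream, call `sample.seek(0)`).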


#### Exercise

- Read the content of `data/biggie_smalls_juicy.txt` line by line
- Extra points: try to do some fun stuff:
  - Count the number of lines
  - Count the number of words (hint: use `str` built-in methods to split a line into words)
  - Count the number of characters
  - Print the line number where Salt-N-Pepa is mentioned

In [None]:
# your code here


#### Exercise

- Read the content of `data/queen_latifah_ladies_first.txt` line by line
- Do fun stuff:
  - Count the number of times the word _"ladies"_ is mentioned (capitalized or not)
  - Count the number of lines

In [None]:
# your code here


<br/>

#### Alternatively: Using the `readline()` method

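You can also use the file object's `readline()` and `readlines()` methods to read lines from a file: `readline()` returns the next single line each time it's called, while `readlines()` returns a list of all the remaining lines.

Let's see `readline()` in action: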

In [None]:
# open our file for reading
with open("./data/names.txt", "r", encoding="utf-16") as input_file:
    line = input_file.readline()            # read the first line
    line_number = 1                         # keep track of line numbers
    while line:
        line = line.rstrip()                # drop the trailing newline
        print(f"{line_number}: {line}")
        line = input_file.readline()        # read the next line
        line_number += 1                    # incr. line number


Let's take a closer look at our code:
- In this example, we use a `while` loop to read our file
- The `readline()` method returns an empty string when it reaches the end of the file, so we can use the line itself as our `while` condition: empty strings in Python are falsy (they evaluate to `False`). A blank line in the middle of the file does not stop the loop, because it comes back as `"\n"`, which is not empty
- The rest is simple: we keep reading the next line until we reach the end
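
For completeness, here's a sketch of `readlines()`, which returns every remaining line as a list with the terminators included. It's convenient for small files but loads the whole file into memory. The sketch also uses `enumerate()` on the file object to number lines without a manual counter. The data is a placeholder written to a temporary file, with a deliberately blank middle row:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "names.txt")
    with open(path, "w", encoding="utf-16") as output_file:
        # placeholder rows; the middle one is blank
        output_file.writelines(["Ada Lovelace\n", "\n", "Grace Hopper\n"])

    # readlines(): all remaining lines in one list, "\n" included
    with open(path, "r", encoding="utf-16") as input_file:
        all_lines = input_file.readlines()

    # enumerate(..., start=1) numbers the lines for us instead of line_number += 1
    with open(path, "r", encoding="utf-16") as input_file:
        numbered = [f"{number}: {line.rstrip()}"
                    for number, line in enumerate(input_file, start=1)]

print(all_lines)  # ['Ada Lovelace\n', '\n', 'Grace Hopper\n']
print(numbered)   # ['1: Ada Lovelace', '2: ', '3: Grace Hopper']
```

The blank row comes back as `'\n'` rather than `''`, which is why the `readline()` loop keeps going past it.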

#### Exercise

- Read back the same file that you generated earlier with multiple random fields per person
- Strip the newline from each line, then split it on `,` to get back the original fields

In [None]:
# enter your code here