# Lesson 4.6: Parsing Log Files
# Activity 6A: Opening and Reading Files


## Open a File

The easiest way to open a file is with `open`, a function that is built into Python.

`file_name = "/voc/public/passwd"`

`open_file = open(file_name)`

Once you have opened the file, the returned file object has several methods available. Here are some useful ones:
* ```close()```	Closes the file
* ```read()```	Returns the file content
* ```readline()```	Returns one line from the file
* ```readlines()```	Returns a list of lines from the file
* ```seek()```	Changes the file position
* ```tell()```	Returns the current file position

Let's start with the `open` function. Its general form is:
`open_file = open(file_name, mode)`

Here are descriptions of all the modes:
* ```r```: Opens a file for reading. This is the default mode.
* ```w```: Opens a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
* ```x```: Opens a file for exclusive creation. If the file already exists, the operation fails.
* ```a```: Opens a file for appending at the end of the file without truncating it. Creates a new file if it does not exist.
* ```t```: Opens in text mode. This is the default.
* ```b```: Opens in binary mode.
* ```+```: Opens a file for updating (reading and writing).
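
To make the modes above concrete, here is a minimal sketch that exercises `w`, `a`, `r`, and `x` on a throwaway file. The file name `demo.txt` and the temporary directory are assumptions made for illustration; they are not part of the lesson's files.

```python
import os
import shutil
import tempfile

# Work in a temporary directory so no real file is touched.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

# "w" creates the file (or truncates it if it already exists).
f = open(demo_path, "w")
f.write("first line\n")
f.close()

# "a" adds to the end of the file without truncating it.
f = open(demo_path, "a")
f.write("second line\n")
f.close()

# "r" opens the file for reading.
f = open(demo_path, "r")
demo_content = f.read()
f.close()

# "x" refuses to open a file that already exists.
try:
    open(demo_path, "x")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(demo_content)
print("exclusive creation failed:", exclusive_failed)

# Remove the temporary directory again.
shutil.rmtree(tmp_dir)
```

Because the file was opened with `a` the second time, both lines survive; opening it with `w` again would have discarded the first line.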


In [None]:
file_name = "/voc/public/passwd"
open_file = open(file_name, "r")


To see the content of the file, use a method called `read()`. Let’s continue with the example of the `/voc/public/passwd` file.

In [None]:
open_file.read()


As you can see, the output is not organized in an easy-to-read way. Reading the whole file also moved the cursor to the end of the file.
To see where the cursor currently is in the file, use the `tell()` method.

In [None]:
open_file.tell()

If you want to go back to the beginning of the file, use the `seek()` method with a position of `0`.

In [None]:
open_file.seek(0)
open_file.tell()
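
The interaction between `read()`, `tell()`, and `seek()` can be sketched on a small temporary file. The file and its two-line content are made up for this example, so the exact position numbers on a real file will differ.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical content).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
f = open(demo_path, "w")
f.write("alpha\nbeta\n")
f.close()

f = open(demo_path, "r")
start_pos = f.tell()      # cursor position right after opening
first_read = f.read()     # reads everything and moves the cursor to the end
end_pos = f.tell()        # cursor position at the end of the file
second_read = f.read()    # nothing is left to read from here
f.seek(0)                 # move the cursor back to the beginning
rewound_pos = f.tell()
third_read = f.read()     # the full content is available again
f.close()
os.remove(demo_path)

print(start_pos, end_pos, rewound_pos)
print(repr(second_read))
print(third_read == first_read)
```

The second `read()` returns an empty string because the cursor is already at the end; after `seek(0)` the same content can be read again.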

If you want to read line by line, use the `readline()` method.

In [None]:
open_file.readline()

Using this method, you can see the first line. After you read the first line, the cursor moves down one line. If you call `readline()` again, you get the second line, and so on.

In [None]:
open_file.readline()

Since we are interested in seeing all of the lines in an easy-to-read format, use a `for` loop.
Using `readlines()` puts the contents of the file in a list. Once you have the lines in a list, you can iterate over them with a `for` loop.

In [None]:
for line in open_file.readlines():
    print(line)

**Note:** This loop uses `readlines()`, not `readline()`. `readlines()` returns all of the remaining lines as a list, which is more convenient to loop over than calling `readline()` once for every line.
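
The difference between the two methods can be sketched with an in-memory stand-in for an open file. `io.StringIO` and the three sample lines are assumptions used only so the example runs without the lesson's files; a file object returned by `open` behaves the same way for these calls.

```python
import io

# In-memory stand-in for an opened text file (made-up content).
sample_file = io.StringIO("root:x:0:0\nalice:x:1000:1000\nbob:x:1001:1001\n")

# readline() returns a single string and moves the cursor one line down.
first_line = sample_file.readline()

# readlines() returns a list with every line that is still unread.
remaining_lines = sample_file.readlines()

# At the end of the file, readline() returns an empty string.
after_end = sample_file.readline()

print(repr(first_line))
print(remaining_lines)
print(repr(after_end))
```

Note that every line keeps its trailing newline character, which is why `print(line)` in a loop shows an empty line between entries.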

## Close a File

When you finish reading a file, you need to close it to release the resources it holds. You can do that by using the `close()` method.

In [None]:
open_file.close()

To avoid forgetting to close a file, use a construct that opens and closes the file automatically. This can be done using the `with` statement.

**Note:** There is no need to close the file here.

```with open(file_name,'r') as f:```


In [None]:
file_name="/voc/public/passwd"
with open(file_name,'r') as f:
    for line in f:
        print(line)
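
One way to convince yourself that `with` really closes the file is to check the file object's `closed` attribute inside and after the block. This is a small sketch on a temporary file with made-up content, not on the lesson's `passwd` file.

```python
import os
import tempfile

# Create a throwaway file to read from (hypothetical content).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(demo_path, "w") as f:
    f.write("line one\nline two\n")

lines_seen = []
with open(demo_path, "r") as f:
    closed_inside = f.closed       # still open inside the block
    for line in f:
        lines_seen.append(line)
closed_after = f.closed            # closed as soon as the block ends

os.remove(demo_path)

print("closed inside the block:", closed_inside)
print("closed after the block:", closed_after)
print("lines read:", len(lines_seen))
```

The file is also closed if an error is raised inside the block, which is the main reason `with` is usually preferred over calling `close()` by hand.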

## Instructor Demo
Write a script that:
* Opens the `/voc/public/passwd` file
* Reads the entire file
* Goes back to the beginning of the file 
* Iterates over the file and prints each line
* Closes the file 

In [None]:
def show_file(f):
    print(f.read())
    f.seek(0)
    for line in f.readlines():
        print(line)
    f.close()

def main():
    f=open("/voc/public/passwd")
    show_file(f)
main()

## Student Exercise 
### Problem 1 
Write a script that:
* Opens the `log.txt` file in your current directory 
* Prints the entire file 
* Goes back to the beginning of the file 
* Prints only the first line
* Prints each line of the log file 
* Closes the file 

### Problem 2
Write a script that:
* Opens the `log1.txt` file 
* Prints every line in the file without the new line character 
* Closes the file 

### Problem 3
Write a script that:
* Opens the `/voc/public/passwd` file 
* Counts how many lines are in the file 
* Prints that number
* Closes the file 

# Activity 6B: File Parsing

## Instructor Demo

Let's try to parse the `passwd` file.

In [None]:
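with open("./voc/public/passwd") as f:
    for line in f.readlines():
        # Each passwd entry is a colon-separated record
        fields=line.split(":")

        # Field 0 is the user name
        user=fields[0]
        print("user: "+user)
        # Field 4 is the description (comment) field
        description=fields[4]
        print("description: "+description)
        # Field 5 is the home directory
        home_dir=fields[5]
        print("home_dir: "+home_dir)
        print("***************")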

Now that you can read the content of a file you can start extracting information from it.
As shown previously, the way to parse a file is by first knowing the headers of each column. Each log file will be different, meaning you need to take some time to learn the structure of each log file.
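
Knowing the column headers means you can attach a name to every field instead of remembering index numbers. The sketch below does this for the standard seven `passwd` fields; the helper name `parse_passwd_line`, the field labels, and the sample entry are assumptions chosen for illustration.

```python
# Labels for the seven colon-separated fields of a passwd entry.
PASSWD_FIELDS = ["user", "password", "uid", "gid",
                 "description", "home_dir", "shell"]

def parse_passwd_line(line):
    """Return one passwd entry as a dictionary keyed by field name."""
    # Drop the trailing newline so it does not end up in the shell field.
    values = line.rstrip("\n").split(":")
    return dict(zip(PASSWD_FIELDS, values))

# Made-up entry used only for this example.
sample_line = "alice:x:1000:1000:Alice Example:/home/alice:/bin/bash\n"
record = parse_passwd_line(sample_line)

print("user: " + record["user"])
print("description: " + record["description"])
print("home_dir: " + record["home_dir"])
```

The same pattern works for other delimited log formats: list the headers once, then `zip` them with the split values.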

Let's take a look at an Apache access log file. This file logs every request made to the local website.

`cat access.log.1`
* 10.0.2.17 - - [19/Feb/2022:13:57:30 -0500] "GET / HTTP/1.1" 200 10956 "-" "curl/7.68.0"
* 10.0.2.17 - - [19/Feb/2022:13:57:58 -0500] "GET / HTTP/1.1" 200 10956 "-" "curl/7.68.0"

As you can see, each line starts with the IP address of the client that accessed the website. Next comes the date and time of the request in square brackets, followed by the quoted request, which begins with the HTTP method. After that come the response status code and the response size, then the referer, and the last field is the user agent.

You can use Python to extract information from this file if you take one line and put it in a string.

`list1=line1.split()`

```
ip_src=list1[0]
date=list1[3]
Http_method=list1[5]
response_code=list1[8]
user_agent=list1[11]
```


In [None]:
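def parse_log(line):
    # Split the log line on whitespace into a list of fields
    list1=line.split()
    #print(list1)
    ip_src=list1[0]
    print("ip_src: "+ip_src)
    # The timestamp field starts with "[", so strip it before printing
    date=list1[3].strip("[")
    print("time: "+date)
    # The request is quoted, so the method starts with a double quote
    http_method=list1[5].strip('"')
    print("http_method: "+http_method)
    response_code=list1[8]
    print("response_code: "+response_code)
    # Only the first word of the user agent; longer agents span several fields
    user_agent=list1[11].strip('"')
    print("user_agent: "+user_agent)
    print("*********************")

#create a new line with the fields we are interested in

def main():
    with open("./voc/public/access1.log") as f:
        for line in f.readlines():
            parse_log(line)
main()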
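
Splitting on whitespace works for the `curl` lines shown earlier, but a user agent that contains spaces is cut into several pieces. One possible alternative, sketched here under the assumption that quoted fields never contain unbalanced quote characters, is the standard library's `shlex.split`, which keeps each quoted field together. The function name `parse_access_line` and the browser-style user agent in the sample line are made up for illustration.

```python
import shlex

def parse_access_line(line):
    """Split an access log line while keeping quoted fields intact."""
    # shlex.split treats "GET / HTTP/1.1" and the user agent as single
    # fields, so there are 10 parts instead of 12 or more.
    parts = shlex.split(line)
    return {
        "ip_src": parts[0],
        "date": parts[3].lstrip("["),
        # The request field looks like: GET / HTTP/1.1
        "http_method": parts[5].split()[0],
        "response_code": parts[6],
        "user_agent": parts[9],
    }

# Sample line with a hypothetical user agent that contains spaces.
sample = ('10.0.2.17 - - [19/Feb/2022:13:57:30 -0500] "GET / HTTP/1.1" '
          '200 10956 "-" "Mozilla/5.0 (X11; Linux x86_64)"')
entry = parse_access_line(sample)

for key, value in entry.items():
    print(key + ": " + value)
```

A line whose user agent contains a stray apostrophe would make `shlex.split` raise a `ValueError`, so a real script would need to handle that case or use a different parsing strategy.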

## Student Exercise
### Problem 1 

Write a script that:
* Opens the `iptablessyslog.txt` file
* Iterates over the lines of the file 
* Prints the IP src, IP dest, src port, dst port, and protocol for each line
* Closes the file 

### Problem 2
Write a script that:
* Opens the file `network.log`
* Iterates over the lines of the file 
* Prints IP src, IP dst, port src, port dst 
* Closes the file