# Lab 0.4 File IO

## Objective

1. Read information from files using Python
2. Use regular expressions to extract information from text
3. Create files using Python

*The challenge section and "just for fun" section are optional.*

## Rubric

- 6 pts - Contains all required components and uses professional language
- 5 pts - Contains all required components, but uses unprofessional language, formatting, etc.
- 4 pts - Contains some, but not all, of the required components
- 3 pts - Did not submit

## Part 1: Letter Frequency

A Caesar cipher, or shift cipher, is one of the simplest encryption techniques. The method is named after Julius Caesar, who used it to send private messages. To encrypt information with a Caesar cipher, each letter in your message (the plaintext) is replaced by a letter a fixed number of positions away in the alphabet to generate your ciphertext.

For example, if I wanted to encrypt the message `ECHO` using a left shift of 3, I would rewrite each character by shifting the entire alphabet left by 3 characters. Using the chart and key below, we can see that `E -> B`, `C -> Z`, `H -> E`, and `O -> L`. So `ECHO` becomes `BZEL`.

![Pasted image 20231227102315](https://github.com/gormes-EPIC/FileIO-CSV-DSF/assets/134316348/36015604-5669-475c-a8c6-3d4674da98d4)
- Plaintext:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
- Ciphertext: XYZABCDEFGHIJKLMNOPQRSTUVW

We can use the same cipher to encrypt the plaintext `THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG` as the ciphertext `QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD`. To decrypt it, we apply the key in the other direction by shifting right by 3.
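
The substitution described above can be sketched in Python. This is a minimal illustration rather than part of the lab's required code; the function name `caesar_shift` and its `shift` parameter are assumptions introduced here. A negative shift moves letters left, so `-3` reproduces the key shown above and `+3` reverses it.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(text, shift):
    """Shift each A-Z letter by `shift` positions, wrapping around the alphabet.

    Hypothetical helper for illustration: negative values shift left,
    positive values shift right. Non-letters are left unchanged.
    """
    result = []
    for char in text.upper():
        if char in ALPHABET:
            index = (ALPHABET.index(char) + shift) % 26
            result.append(ALPHABET[index])
        else:
            result.append(char)
    return "".join(result)


ciphertext = caesar_shift("ECHO", -3)    # left shift of 3
plaintext = caesar_shift(ciphertext, 3)  # right shift of 3 undoes it
print(ciphertext, plaintext)             # prints BZEL ECHO
```

The modulo operation handles the wrap-around, which is why `C` shifted left by 3 lands on `Z` instead of falling off the start of the alphabet.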

As long as whoever is reading the message knows you have shifted the alphabet left by 3, it is straightforward to decrypt `BZEL` as `ECHO`. But what if you intercepted this message and didn't know the original shift? By exploiting patterns in the English language, we can actually decrypt Caesar ciphers without knowing the original shift. [Source](https://www.101computing.net/caesar-cipher/)
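
One way to sketch that idea: assume the most common letter in the ciphertext stands for `E`, the most common letter in typical English text, and compute the shift from that. The function name `guess_shift` and the sample message are hypothetical, and the heuristic can easily fail on short messages (`BZEL` contains no encrypted `E` at all), so treat the result as a first guess to check by eye.

```python
from collections import Counter


def guess_shift(ciphertext):
    """Guess the left shift, assuming the most frequent letter is an encrypted 'E'."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    if not letters:
        raise ValueError("ciphertext contains no letters")
    most_common_letter, _ = Counter(letters).most_common(1)[0]
    # Number of positions (0-25) that 'E' was moved to the left.
    return (ord("E") - ord(most_common_letter)) % 26


# "MEET ME HERE" encrypted with a left shift of 3 (hypothetical sample message)
print(guess_shift("JBBQ JB EBOB"))  # prints 3
```

If the guessed shift produces gibberish, the next most frequent letters (`T`, `A`, `O`) are reasonable candidates to try in place of `E`.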


### Your Task

One way to break a Caesar cipher is to look at the frequency of the letters. In a typical English text, some letters are much more frequent than others.

To create your frequency table you will:

1. Using [Project Gutenberg](https://www.gutenberg.org/), download at least one book into your directory. *Hint: Once you navigate to a book, copy the URL of the Plain Text UTF-8 download and use the `wget` command in your terminal.*
2. Open your book using Python, count each of the letters, and create a frequency table.
3. After you are done, print out the information.

#### Example Output

```
A: 1023
B: 356
C: 40
...
```



```python
from collections import Counter

# Read the book and count every alphabetic character, ignoring case
counts = Counter()
with open("pg75125.txt.1", "r", encoding="utf-8") as file:
    for line in file:
        counts.update(char for char in line.lower() if char.isalpha())

# Print the occurrences in sorted order
for char in sorted(counts):
    print(f"Letter: {char}, Occurrences: {counts[char]}")
```

```
Letter: a, Occurrences: 66016
Letter: b, Occurrences: 12360
Letter: c, Occurrences: 19741
Letter: d, Occurrences: 36838
Letter: e, Occurrences: 103724
Letter: f, Occurrences: 18178
Letter: g, Occurrences: 15766
Letter: h, Occurrences: 52504
Letter: i, Occurrences: 51866
Letter: j, Occurrences: 674
Letter: k, Occurrences: 6675
Letter: l, Occurrences: 36074
Letter: m, Occurrences: 21207
Letter: n, Occurrences: 56567
Letter: o, Occurrences: 63283
Letter: p, Occurrences: 14096
Letter: q, Occurrences: 707
Letter: r, Occurrences: 51148
Letter: s, Occurrences: 49516
Letter: t, Occurrences: 69031
Letter: u, Occurrences: 23674
Letter: v, Occurrences: 8526
Letter: w, Occurrences: 18594
Letter: x, Occurrences: 1155
Letter: y, Occurrences: 17677
Letter: z, Occurrences: 390
Letter: à, Occurrences: 9
Letter: â, Occurrences: 3
Letter: æ, Occurrences: 6
Letter: ç, Occurrences: 1
Letter: è, Occurrences: 17
Letter: é, Occurrences: 84
Letter: ê, Occurrences: 31
Letter: ë, Occurrences: 1
Letter: ô, Occurr
```

### Just for Fun! Break this Caesar Cipher

Decode the following ciphertext. Start by counting the letters in the ciphertext and matching its most frequent letters with the most frequent letters in the table you just made. *Tip: In addition to using your letter frequency table, look carefully at the one- and two-letter words. There are limited options for what those characters could be! Also, try to identify frequently used words like `THE` or `AND` in your ciphertext.*

  Ciphertext:

```

PA PZ H WLYPVK VM JPCPS DHY. YLILS ZWHJLZOPWZ, ZAYPRPUN MYVT H OPKKLU IHZL, OHCL DVU AOLPY MPYZA CPJAVYF HNHPUZA AOL LCPS NHSHJAPJ LTWPYL. KBYPUN AOL IHAASL, YLILS ZWPLZ THUHNLK AV ZALHS ZLJYLA WSHUZ AV AOL LTWLYVY'Z BSAPTHAL DLHWVU, AOL KLHAO ZAHY, HU HYTVYLK ZWHJL ZAHAPVU DPAO LUVBNO WVDLY AV KLZAYVF HU LUAPYL WSHULA. WBYZBLK IF AOL LTWLYVY'Z ZPUPZALY HNLUAZ, WYPUJLZZ SLPH YHJLZ OVTL HIVHYK OLY ZAHYZOPW, JBZAVKPHU VM AOL ZAVSLU WSHUZ AOHA JHU ZHCL OLY WLVWSL HUK YLZAVYL MYLLKVT AV AOL NHSHEF ....

```

## Part 2: Analyzing Server Activity

One important way for businesses to keep themselves secure is to monitor their server logs.

Read in `server_log.txt`, which contains server access logs with entries of the form `IP Address-Timestamp-Page Accessed`. Notice which character is used as the delimiter.

- Count the total number of unique IP addresses that accessed the server.
- Identify the three IP addresses that appear most often.
- Generate a report file `server_summary.txt` containing this information.
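
The three steps above might be sketched as follows. This is one possible approach, not the lab's reference solution. It assumes each line looks like `IP Address-Timestamp-Page Accessed` with `-` as the delimiter, and it splits only on the first `-` in case the timestamp also contains hyphens. The function names and the report layout are assumptions introduced for illustration.

```python
from collections import Counter


def count_ips(lines):
    """Count how many times each IP address appears in the log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The IP address is everything before the first "-".
        ip_address = line.split("-", 1)[0]
        counts[ip_address] += 1
    return counts


def build_report(counts, top_n=3):
    """Return the report text: unique IP total plus the top_n most frequent IPs."""
    report = [f"Unique IP addresses: {len(counts)}", f"Top {top_n} IP addresses:"]
    for ip_address, hits in counts.most_common(top_n):
        report.append(f"{ip_address}: {hits}")
    return "\n".join(report) + "\n"


def summarize(log_path="server_log.txt", report_path="server_summary.txt"):
    """Read the log file and write the summary report."""
    with open(log_path, "r", encoding="utf-8") as log_file:
        counts = count_ips(log_file)
    with open(report_path, "w", encoding="utf-8") as report_file:
        report_file.write(build_report(counts))
```

Calling `summarize()` from the directory that holds `server_log.txt` would write `server_summary.txt`. Keeping the counting and the report text in separate functions makes each piece easy to check on a few sample lines before running it on the full log.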

```python
ip_addresses = [
    "104.249.247.218",
    "248.238.205.128",
    "237.227.142.204",
    "248.40.19.101",
    "107.112.27.175",
    "237.227.142.204",
    "22.24.221.156",
    "41.241.198.100",
    "237.227.142.204",
    "237.227.142.204",
    "38.194.5.252",
    "141.60.118.184",
    "238.189.191.200",
    "74.142.220.226",
    "32.216.221.208",
    "22.24.221.156",
    "207.102.161.112",
    "130.108.71.170",
    "134.33.134.241",
    "36.219.106.53",
    "41.241.198.100",
    "11.97.193.242",
    "248.238.205.128",
    "84.175.5.79",
    "178.93.83.95",
    "207.102.161.112",
    "134.33.134.241",
    "23.125.237.117",
    "41.241.198.100",
    "65.140.160.255",
    "241.68.123.79",
    "237.227.142.204",
    "59.149.131.167",
    "32.216.221.208",
    "41.219.115.73",
    "11.97.193.242",
    "18.32.39.181",
    "232.246.124.165",
    "70.40.179.190",
    "35.15.107.59",
    "168.8.71.30",
    "248.40.19.101",
    "41.219.115.73",
    "84.175.5.79",
    "172.88.165.81",
    "234.0.152.47",
    "237.227.142.204",
    "116.176.205.112",
    "81.132.12.200",
    "22.19.43.81",
    "81.132.12.200",
    "238.189.191.200",
    "237.227.142.204",
    "41.241.198.100",
    "18.32.39.181",
    "145.179.63.57",
    "81.132.12.200",
    "74.142.220.226",
    "207.102.161.112",
    "141.60.118.184",
    "65.140.160.255",
    "104.52.100.123",
    "59.149.131.167",
    "3.62.212.127",
    "248.26.47.147",
    "23.125.237.117",
    "65.140.160.255",
    "89.155.15.63",
    "220.66.221.182",
    "49.83.80.133",
    "89.155.15.63",
    "36.219.106.53",
    "41.241.198.100",
    "113.244.127.167",
    "241.68.123.79",
    "41.241.198.100",
    "65.140.160.255",
    "65.140.160.255",
    "22.241.105.189",
    "116.176.205.112",
    "237.227.142.204",
    "241.68.123.79",
    "41.219.115.73",
    "107.90.202.165",
    "32.216.221.208",
    "84.175.5.79",
    "107.112.27.175",
    "120.177.53.51",
    "18.32.39.181",
    "59.149.131.167",
    "65.140.160.255",
    "104.249.247.218",
    "172.88.165.81",
    "64.119.52.191",
    "232.246.124.165",
    "64.119.52.191",
    "41.241.198.100",
    "130.108.71.170",
    "41.241.198.100",
    "172.88.165.87",
    "59.149.131.167",
    "35.15.107.59",
    "248.26.47.147",
    "107.90.202.165",
    "104.249.247.218",
    "113.244.127.167",
    "160.4.224.189",
    "178.93.83.95",
    "172.88.165.81",
    "248.57.118.100",
    "248.26.47.147",
    "145.179.63.57",
    "178.93.83.95",
    "22.19.43.81",
    "128.43.149.131",
    "130.108.71.170",
    "41.241.198.100",
    "49.83.80.133",
    "170.18.14.180",
    "41.241.198.100",
    "104.249.247.218",
    "113.244.127.167",
    "160.4.224.189",
    "178.93.83.95",
    "172.88.165.81",
    "248.57.118.100",
    "237.227.142.204",
    "237.227.142.204",
    "237.227.142.204",
    "237.227.142.204",
]

# Count occurrences of each IP address
occurrences = {}

# Iterate over the original list
for ip in ip_addresses:
    if ip in occurrences:
        occurrences[ip] += 1
    else:
        occurrences[ip] = 1

# Print the counts
for ip, count in occurrences.items():
    print(f"{ip}: {count}")
```


```
104.249.247.218: 4
248.238.205.128: 2
237.227.142.204: 12
248.40.19.101: 2
107.112.27.175: 2
22.24.221.156: 2
41.241.198.100: 10
38.194.5.252: 1
141.60.118.184: 2
238.189.191.200: 2
74.142.220.226: 2
32.216.221.208: 3
207.102.161.112: 3
130.108.71.170: 3
134.33.134.241: 2
36.219.106.53: 2
11.97.193.242: 2
84.175.5.79: 3
178.93.83.95: 4
23.125.237.117: 2
65.140.160.255: 6
241.68.123.79: 3
59.149.131.167: 4
41.219.115.73: 3
18.32.39.181: 3
232.246.124.165: 2
70.40.179.190: 1
35.15.107.59: 2
168.8.71.30: 1
172.88.165.81: 4
234.0.152.47: 1
116.176.205.112: 2
81.132.12.200: 3
22.19.43.81: 2
145.179.63.57: 2
104.52.100.123: 1
3.62.212.127: 1
248.26.47.147: 3
89.155.15.63: 2
220.66.221.182: 1
49.83.80.133: 2
113.244.127.167: 3
22.241.105.189: 1
107.90.202.165: 2
120.177.53.51: 1
64.119.52.191: 2
172.88.165.87: 1
160.4.224.189: 2
248.57.118.100: 2
128.43.149.131: 1
170.18.14.180: 1
```


## Part 3: Creating Usernames

Use the file `emails.txt` to create a list of usernames and random passwords for each user. Then, output the emails, usernames, and random passwords into an output file `output.txt`.

Each username should match the part of the email address before the `@`. So for `findlay_butler@hr.yahoo.com`, the username would be `findlay_butler`.

The passwords should be 8 characters long and a random combination of letters and numbers. 

For the first user, `output.txt` should look like: 
```
findlay_butler@hr.yahoo.com,findlay_butler,abiojash
```

### Challenge: Using Regular Expressions

Instead of using the email username as the user account, each username should be the user's first initial followed by their last name. So for `findlay_butler@hr.yahoo.com`, the username would be `fbutler`. The easiest way to do this is probably **using regular expressions.**

For more explanation and practice with regular expressions, use [regexone.com](https://regexone.com/). For help creating your regular expression query, use [regex101.com](https://regex101.com/). 
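
A minimal sketch of the challenge, assuming every address has the form `first_last@domain` with a single underscore between the names, as in the example above. The pattern and the function name `make_username` are illustrative assumptions; addresses in `emails.txt` that follow a different format would need a different pattern.

```python
import re

# Capture the first letter of the first name and the whole last name.
USERNAME_PATTERN = re.compile(r"^([A-Za-z])[A-Za-z]*_([A-Za-z]+)@")


def make_username(email):
    """Return first initial + last name, or None if the email does not match."""
    match = USERNAME_PATTERN.match(email)
    if match is None:
        return None
    first_initial, last_name = match.groups()
    return (first_initial + last_name).lower()


print(make_username("findlay_butler@hr.yahoo.com"))  # prints fbutler
```

Returning `None` for addresses that do not match makes it easy to spot entries that need special handling instead of silently producing a wrong username.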

```python
import random
import string

# Generate a random 8-character password of letters and digits
def generate_password(length=8):
    characters = string.ascii_letters + string.digits
    # random.choices allows repeated characters; for real credentials,
    # the secrets module is the safer choice
    return ''.join(random.choices(characters, k=length))

# Process emails and write "email,username,password" lines
def process_emails(input_file, output_file):
    try:
        with open(input_file, 'r') as infile:
            emails = [line.strip() for line in infile if line.strip()]

        with open(output_file, 'w') as outfile:
            for email in emails:
                username = email.split('@')[0]
                password = generate_password()  # Get a random password
                outfile.write(f"{email},{username},{password}\n")
        print(f"Output written to {output_file}.")

    except FileNotFoundError:
        print(f"Error: {input_file} not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Define input and output file paths
input_file = 'emails.txt'
output_file = 'output.txt'

# Process emails
process_emails(input_file, output_file)
```

```
Output written to output.txt.
```


