# Lab 0.4 File IO

## Objective

1. Read information from files using Python
2. Use regular expressions to extract information from text
3. Create files using Python

*The challenge section and "just for fun" section are optional.*

## Rubric

- 6 pts - Contains all required components and uses professional language
- 5 pts - Contains all required components, but uses unprofessional language, formatting, etc.
- 4 pts - Contains some, but not all, of the required components
- 3 pts - Did not submit

## Part 1: Letter Frequency

A Caesar cipher, or shift cipher, is one of the simplest encryption techniques. It is named after Julius Caesar, who used it to send private messages. To encrypt a message (the plaintext) with a Caesar cipher, each letter is replaced by the letter a fixed number of positions away in the alphabet, producing the ciphertext.

For example, if I wanted to encrypt the message `ECHO` using a left shift of 3, I would rewrite each character by shifting the entire alphabet left by 3 characters. Using the chart and key below, we can see that `E -> B`, `C -> Z`, `H -> E`, and `O -> L`. So `ECHO` becomes `BZEL`.

![Pasted image 20231227102315](https://github.com/gormes-EPIC/FileIO-CSV-DSF/assets/134316348/36015604-5669-475c-a8c6-3d4674da98d4)
- Plaintext: `ABCDEFGHIJKLMNOPQRSTUVWXYZ`
- Ciphertext: `XYZABCDEFGHIJKLMNOPQRSTUVW`

We can use the same cipher to encrypt the plaintext `THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG` as the ciphertext `QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD`, and then decrypt it by applying our key in the other direction, shifting right by 3.
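The shift itself is modular arithmetic on letter positions (A = 0, B = 1, ..., Z = 25), so shifting Z right by 3 wraps around to C. A minimal sketch; `shift_text` is a hypothetical helper name, and it only shifts uppercase A-Z, passing spaces, punctuation, and lowercase letters through unchanged:

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def shift_text(text, shift):
    """Shift each uppercase letter by `shift` positions (negative = left shift).

    Any character that is not A-Z is copied through unchanged.
    """
    result = []
    for ch in text:
        if ch in ALPHABET:
            # % 26 wraps around the alphabet in both directions
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


ciphertext = shift_text("ECHO", -3)    # encrypt: left shift of 3
plaintext = shift_text(ciphertext, 3)  # decrypt: right shift of 3
print(ciphertext, plaintext)  # BZEL ECHO
```

Decrypting is simply the same function with the opposite shift, which is why knowing the shift is all a reader needs.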

As long as whoever is reading the message knows you have shifted the alphabet left by 3, it is straightforward to decrypt `BZEL` as `ECHO`. But what if you intercepted this message and didn't know the original shift? By exploiting patterns in the English language, we can actually decrypt Caesar ciphers without knowing the original shift. [Source](https://www.101computing.net/caesar-cipher/)


### Your Task

One way to break a Caesar cipher is to look at the frequency of the letters. In typical English text, some letters are much more frequent than others.

To create your frequency table you will:

1. Using [Project Gutenberg](https://www.gutenberg.org/), download at least one book into your directory. *Hint: Once you navigate to a book, copy the URL of the Plain Text UTF-8 download and use the `wget` command in your terminal.*
2. Open your book using Python, count each of the letters, and create a frequency table.
3. After you are done, print out the information.

#### Example Output

```
A: 1023
B: 356
C: 40
...
```
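The steps above can be sketched with `collections.Counter` from the standard library. This is one possible approach, not the required one: `count_letters`, `letter_frequencies`, and the command-line usage are assumptions for illustration, and the sketch counts only the unaccented letters A-Z, treating upper and lower case as the same letter.

```python
import string
import sys
from collections import Counter


def count_letters(text):
    """Return {letter: count} for A-Z in `text`, ignoring case."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    # Report every letter A-Z, including letters that never appear (count 0)
    return {letter: counts[letter] for letter in string.ascii_uppercase}


def letter_frequencies(path):
    """Read a UTF-8 text file (e.g. a Project Gutenberg download) and count its letters."""
    with open(path, "r", encoding="utf-8") as book:
        return count_letters(book.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python letter_freq.py book.txt
    for letter, count in letter_frequencies(sys.argv[1]).items():
        print(f"{letter}: {count}")
```

To compare your table against a ciphertext, `sorted(table.items(), key=lambda kv: kv[1], reverse=True)` lists the letters from most to least frequent.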



### Just for Fun! Break this Caesar Cipher

Decode the following ciphertext. Start by using the frequency table you just made: match the most frequent letters in the ciphertext with the most frequent letters in your table. *Tip: In addition to your letter frequency table, look carefully at the one- and two-letter words; there are only a few options for what those characters could be! Also, try to identify frequently used words like `THE` or `AND` in your ciphertext.*

  Ciphertext:

```

PA PZ H WLYPVK VM JPCPS DHY. YLILS ZWHJLZOPWZ, ZAYPRPUN MYVT H OPKKLU IHZL, OHCL DVU AOLPY MPYZA CPJAVYF HNHPUZA AOL LCPS NHSHJAPJ LTWPYL. KBYPUN AOL IHAASL, YLILS ZWPLZ THUHNLK AV ZALHS ZLJYLA WSHUZ AV AOL LTWLYVY'Z BSAPTHAL DLHWVU, AOL KLHAO ZAHY, HU HYTVYLK ZWHJL ZAHAPVU DPAO LUVBNO WVDLY AV KLZAYVF HU LUAPYL WSHULA. WBYZBLK IF AOL LTWLYVY'Z ZPUPZALY HNLUAZ, WYPUJLZZ SLPH YHJLZ OVTL HIVHYK OLY ZAHYZOPW, JBZAVKPHU VM AOL ZAVSLU WSHUZ AOHA JHU ZHCL OLY WLVWSL HUK YLZAVYL MYLLKVT AV AOL NHSHEF ....

```
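The frequency-matching step can be automated to produce a first guess. The sketch below assumes the most frequent ciphertext letter stands for `E` (usually, but not always, the most common letter in English) and derives the shift from that; `guess_shift` and `decrypt` are hypothetical helper names. On a short passage the guess can be wrong, so if the result is gibberish, try matching against `T`, `A`, or `O` instead.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def guess_shift(ciphertext, common_letter="E"):
    """Guess the Caesar shift, assuming the most frequent ciphertext letter
    encrypts `common_letter`. Assumes the text contains at least one letter."""
    counts = Counter(ch for ch in ciphertext.upper() if ch in ALPHABET)
    most_common_letter, _ = counts.most_common(1)[0]
    # Shift (0-25) that maps common_letter onto most_common_letter
    return (ALPHABET.index(most_common_letter) - ALPHABET.index(common_letter)) % 26


def decrypt(ciphertext, shift):
    """Undo a Caesar shift by moving each letter `shift` places back.
    Non-letters (spaces, apostrophes, periods) are kept as-is."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )
```

For example, paste the ciphertext above into a string, then print `decrypt(text, guess_shift(text))` and check whether the result reads as English.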

## Part 2: Analyzing Server Activity

One important way for businesses to keep themselves secure is to monitor their server logs.

Read in `server_log.txt`, which contains server access log entries in the form "IP Address-Timestamp-Page Accessed". Notice which character is used as the delimiter.

- Count the total number of unique IP addresses that accessed the server.
- Identify the top three most used IP addresses.
- Generate a report file `server_summary.txt` containing this information.

```python
# Open the file for reading and save its lines in the variable `lines`
with open("server_log.txt", "r") as file:
    lines = file.readlines()

# Dictionary mapping each IP address to the number of times it appears
ip_addresses = {}
# Skip the first line (line 0), then split each entry on the "-" delimiter
for line in lines[1:]:
    if not line.strip():
        continue  # ignore blank lines
    line_parts = line.split("-")
    ip = line_parts[0].strip()

    # If the IP is already in the dictionary, add one to its count;
    # otherwise, add it with a starting count of 1
    if ip in ip_addresses:
        ip_addresses[ip] += 1
    else:
        ip_addresses[ip] = 1

# ip_addresses.items() yields (IP, count) tuples.
# key=lambda x: x[1] sorts by the second element of each tuple (the count),
# and reverse=True orders them from highest to lowest.
ip_addresses_list = sorted(ip_addresses.items(), key=lambda x: x[1], reverse=True)

# Print every IP address with its count, formatted as count: IP
for ip, count in ip_addresses_list:
    print(f"{count}:  {ip}")

# Print the top three IP addresses, numbered starting at 1
for index, (ip, count) in enumerate(ip_addresses_list[:3], 1):
    print(f"{index}. {count}: {ip}")

num_values = len(ip_addresses)

print(f"{num_values} unique IP addresses accessed the server")
```

```
12:  237.227.142.204
10:  41.241.198.100
6:  65.140.160.255
5:  172.88.165.81
4:  104.249.247.218
4:  178.93.83.95
4:  59.149.131.167
3:  32.216.221.208
3:  207.102.161.112
3:  130.108.71.170
3:  84.175.5.79
3:  241.68.123.79
3:  41.219.115.73
3:  18.32.39.181
3:  81.132.12.200
3:  248.26.47.147
3:  113.244.127.167
2:  248.238.205.128
2:  248.40.19.101
2:  107.112.27.175
2:  22.24.221.156
2:  141.60.118.184
2:  238.189.191.200
2:  74.142.220.226
2:  134.33.134.241
2:  36.219.106.53
2:  11.97.193.242
2:  23.125.237.117
2:  232.246.124.165
2:  35.15.107.59
2:  116.176.205.112
2:  22.19.43.81
2:  145.179.63.57
2:  89.155.15.63
2:  49.83.80.133
2:  107.90.202.165
2:  64.119.52.191
2:  160.4.224.189
2:  248.57.118.100
1:  38.194.5.252
1:  70.40.179.190
1:  168.8.71.30
1:  234.0.152.47
1:  104.52.100.123
1:  3.62.212.127
1:  220.66.221.182
1:  22.241.105.189
1:  120.177.53.51
1:  128.43.149.131
1:  170.18.14.180
1. 12: 237.227.142.204
2. 10: 41.241.198.100
3. 6: 65.140.160.255
50 unique IP addresses accessed the server
```

```python
# Write the summary to a new file ("w" creates it, or overwrites an existing one)
with open("server_summary.txt", "w") as report_file:
    report_file.write("Server Access Summary\n")
    report_file.write("======================\n")
    report_file.write("\nTop 3 IP addresses:\n")
    # Same top-three list as above, one line per IP address
    for index, (ip, count) in enumerate(ip_addresses_list[:3], 1):
        report_file.write(f"{index}. {count}:  {ip}\n")
    num_values = len(ip_addresses)
    report_file.write(f"\n{num_values} unique IP addresses accessed the server.\n")

print("Report generated: 'server_summary.txt'")
```

```
Report generated: 'server_summary.txt'
```
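For comparison, the same counts can be produced with `collections.Counter`, whose `most_common(n)` method returns the `n` highest counts already sorted. This is a minimal sketch, not the lab's required solution: `summarize_log` and `write_summary` are hypothetical helper names, and skipping the first line assumes the log starts with a header line, mirroring the `lines[1:]` in the solution above.

```python
from collections import Counter


def summarize_log(path="server_log.txt", skip_header=True):
    """Return a Counter mapping each IP address to its number of log entries.

    Assumes each line looks like 'IP Address-Timestamp-Page Accessed'.
    """
    with open(path, "r") as log:
        lines = log.readlines()
    if skip_header:
        # Assumption: the first line is a header rather than a log entry
        lines = lines[1:]
    # split("-", 1) splits only at the first dash, so dashes inside the
    # timestamp or page name don't matter; blank lines are skipped.
    return Counter(line.split("-", 1)[0].strip() for line in lines if line.strip())


def write_summary(counts, path="server_summary.txt", top_n=3):
    """Write the top `top_n` IP addresses and the unique-IP count to `path`."""
    with open(path, "w") as report:
        report.write("Server Access Summary\n")
        report.write(f"\nTop {top_n} IP addresses:\n")
        # most_common(n) returns (ip, count) pairs from highest count to lowest
        for rank, (ip, count) in enumerate(counts.most_common(top_n), 1):
            report.write(f"{rank}. {count}: {ip}\n")
        report.write(f"\n{len(counts)} unique IP addresses accessed the server.\n")
```

For example, `write_summary(summarize_log())` would read `server_log.txt` and write `server_summary.txt`. The number of unique IP addresses is simply `len(counts)`, since a `Counter` has one key per distinct IP.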


## Part 3: Creating Usernames

Use the file `emails.txt` to create a list of usernames and random passwords for each user. Then, output the emails, usernames, and random passwords into an output file `output.txt`.

The username should be the part of the email address before the `@`. So for `findlay_butler@hr.yahoo.com`, the username would be `findlay_butler`.

The passwords should be 8 characters long and a random combination of letters and numbers. 

For the first user, `output.txt` should look like the following (the password is random, so yours will differ):
```
findlay_butler@hr.yahoo.com,findlay_butler,abiojash
```
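One way to produce lines in this format is the standard-library `csv` module, paired with `secrets`, which the Python documentation recommends over `random` for passwords and other security-sensitive values. A minimal sketch; `make_password` and `write_accounts` are hypothetical names, not part of the lab:

```python
import csv
import secrets
import string

# Letters (a-z, A-Z) and digits (0-9)
PASSWORD_CHARS = string.ascii_letters + string.digits


def make_password(length=8):
    """Return a random password of `length` letters and digits."""
    # secrets.choice uses a cryptographically secure random source
    return "".join(secrets.choice(PASSWORD_CHARS) for _ in range(length))


def write_accounts(emails, path="output.txt"):
    """Write one 'email,username,password' line per email address."""
    with open(path, "w", newline="") as out:
        # lineterminator="\n" keeps Unix-style line endings (csv defaults to "\r\n")
        writer = csv.writer(out, lineterminator="\n")
        for email in emails:
            email = email.strip()
            if not email:
                continue  # skip blank lines
            username = email.split("@", 1)[0]  # everything before the "@"
            writer.writerow([email, username, make_password()])
```

Because iterating over an open file yields its lines, `with open("emails.txt") as f: write_accounts(f)` would process every address in the file.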

### Challenge: Using Regular Expressions

Instead of using the part of the email before the `@`, make each username the user's first initial followed by their last name. So for `findlay_butler@hr.yahoo.com`, the username would be `fbutler`. The easiest way to do this is probably **using regular expressions.**

For more explanation and practice with regular expressions, use [regexone.com](https://regexone.com/). For help creating your regular expression query, use [regex101.com](https://regex101.com/). 
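As one possible starting point, the pattern below captures the first letter of the first name and the whole last name, assuming every address has the form `first_last@domain` with only letters on each side of the underscore. `EMAIL_PATTERN` and `initial_lastname` are hypothetical names; check the pattern against your own data on regex101.com.

```python
import re

# Group 1: first letter of the first name; group 2: the last name before the "@"
EMAIL_PATTERN = re.compile(r"^([A-Za-z])[A-Za-z]*_([A-Za-z]+)@")


def initial_lastname(email):
    """Turn 'first_last@domain' into 'flast'; return None if the address doesn't match."""
    match = EMAIL_PATTERN.match(email.strip())
    if match is None:
        return None
    return (match.group(1) + match.group(2)).lower()


print(initial_lastname("findlay_butler@hr.yahoo.com"))  # fbutler
```

Returning `None` for non-matching addresses makes unexpected formats easy to spot instead of silently producing a wrong username.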

```python
import random
import string

# Open the file for reading and save its lines (one email address per line)
with open("emails.txt", "r") as file:
    lines = file.readlines()

length = 8  # password length
characters = string.ascii_letters + string.digits  # a-z, A-Z, and 0-9

with open("output.txt", "w") as report_file:
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The username is everything before the "@"; the rest is the domain
        user, domain = line.split("@")
        # Build a random password from letters and digits
        passwd = "".join(random.choice(characters) for _ in range(length))
        print(f"{line},{user},{passwd}\n")
        report_file.write(f"{line},{user},{passwd}\n")

print("Report generated: 'output.txt'")
```

```
findlay_butler@hr.yahoo.com,findlay_butler,ohBXsrJ4

cain_mosley@finance.yahoo.com,cain_mosley,9I534SmO

donna_beltran@accounting.yahoo.com,donna_beltran,AOXY20zI

sian_ramirez@sales.yahoo.com,sian_ramirez,f6LzClHB

angelo_fulton@it.yahoo.com,angelo_fulton,IowsIaQQ

daniyal_castro@ops.yahoo.com,daniyal_castro,1yuuCJTX

cayden_morrison@purchasing.yahoo.com,cayden_morrison,tVEjHza9

amir_haney@randd.yahoo.com,amir_haney,RJqfj19j

olive_fowler@production.yahoo.com,olive_fowler,tBK8Klev

ernest_bauer@marketing.yahoo.com,ernest_bauer,mCnUqWvR

isla_burnett@leadership.yahoo.com,isla_burnett,66b9u411

albert_velazquez@sales.yahoo.com,albert_velazquez,bui1Xntp

filip_donovan@it.yahoo.com,filip_donovan,cp3oDWX2

hamza_crawford@ops.yahoo.com,hamza_crawford,sioQN0KL

astrid_obrien@purchasing.yahoo.com,astrid_obrien,6WQeCd6O

milan_odling@randd.yahoo.com,milan_odling,uXE4lTzU

ruairi_stevenson@production.yahoo.com,ruairi_stevenson,B7ldMTKf

ria_bonner@marketing.yahoo.com,ria_bonner,coA9ip63

ela_h
```