# Lab 0.4 File IO

## Objective

1. Read information from files using Python
2. Use regular expressions to extract information from text
3. Create files using Python

*The challenge section and "just for fun" section are optional.*

## Rubric

- 6 pts - Contains all required components and uses professional language
- 5 pts - Contains all required components, but uses unprofessional language, formatting, etc.
- 4 pts - Contains some, but not all, of the required components
- 3 pts - Did not submit

## Part 1: Letter Frequency

A Caesar cipher, or shift cipher, is one of the simplest encryption techniques. It is named after Julius Caesar, who reportedly used it to send private messages. To encrypt a message with a Caesar cipher, each letter of the message (the plaintext) is replaced by the letter a fixed number of positions away in the alphabet, producing the ciphertext.

For example, if I wanted to encrypt the message `ECHO` using a left shift of 3, I would rewrite each character by shifting the entire alphabet left by 3 characters. Using the chart and key below, we can see that `E -> B`, `C -> Z`, `H -> E`, and `O -> L`. So `ECHO` becomes `BZEL`.

![Caesar cipher chart for a left shift of 3](https://github.com/gormes-EPIC/FileIO-CSV-DSF/assets/134316348/36015604-5669-475c-a8c6-3d4674da98d4)

- Plaintext:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
- Ciphertext: XYZABCDEFGHIJKLMNOPQRSTUVW

We can use the same cipher to encrypt the plaintext `THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG` as the ciphertext `QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD`. To decrypt it, we apply the key in the other direction, shifting each letter right by 3.
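The shifting rule can be written as a short Python function. This is a minimal sketch, not part of the lab: `caesar_shift` and `ALPHABET` are illustrative names, the input is uppercased first, and anything outside A-Z (spaces, punctuation) is left unchanged. A negative shift moves letters left, a positive shift moves them right.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def caesar_shift(text: str, shift: int) -> str:
    """Shift every letter of `text` by `shift` positions (negative = left).

    Non-letters are copied through unchanged.
    """
    result = []
    for char in text.upper():
        if char in ALPHABET:
            # modulo 26 wraps around the ends of the alphabet (C - 3 -> Z)
            result.append(ALPHABET[(ALPHABET.index(char) + shift) % 26])
        else:
            result.append(char)
    return "".join(result)


ciphertext = caesar_shift("ECHO", -3)
print(ciphertext)                   # BZEL
print(caesar_shift(ciphertext, 3))  # ECHO
```

Decryption is just the same function with the opposite shift, which is why knowing the shift is all you need to read the message.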

As long as whoever is reading the message knows you have shifted the alphabet left by 3, it is straightforward to decrypt `BZEL` as `ECHO`. But what if you intercepted this message and didn't know the original shift? By exploiting patterns in the English language, we can actually decrypt Caesar ciphers without knowing the original shift. [Source](https://www.101computing.net/caesar-cipher/)


### Your Task

One way to break a Caesar cipher is to look at the frequency of the letters. In typical English text, some letters (such as E, T, and A) appear much more often than others.

To create your frequency table you will:

1. Using [Project Gutenberg](https://www.gutenberg.org/), download at least one book into your directory. *Hint: Once you navigate to a book, copy the URL of the Plain Text UTF-8 download and use the `wget` command in your terminal.*
2. Open your book using Python, count each of the letters, and create a frequency table.
3. Print the frequency table.

#### Example Output

```
A: 1023
B: 356
C: 40
...
```
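Raw counts depend on the length of the book, so percentages are easier to compare across texts. The sketch below is one possible approach using `collections.Counter`; `letter_frequencies` is a hypothetical helper name, and it assumes you have already read the book into a string.

```python
import string
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in `text`, as a percentage,
    ordered from most to least frequent."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    total = sum(counts.values())
    if total == 0:
        return {}  # no letters at all
    # most_common() with no argument returns every letter, highest count first
    return {letter: 100 * n / total for letter, n in counts.most_common()}


sample = "Hello, World"  # stand-in text; in the lab, pass your book's contents instead
for letter, pct in letter_frequencies(sample).items():
    print(f"{letter}: {pct:.1f}%")  # first line printed: L: 30.0%
```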



```python
def count(filename):
    # read the whole book; Project Gutenberg "Plain Text UTF-8" files are UTF-8 encoded
    with open(filename, "r", encoding="utf-8") as file:
        txt = file.read().upper()   # uppercase so 'a' and 'A' are counted together

    # start every letter A-Z at a count of 0
    letters = {letter: 0 for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}

    for char in txt:
        if char in letters:         # skip digits, punctuation, spaces, etc.
            letters[char] += 1      # add one to this letter's count

    for letter, total in letters.items():
        print(f"{letter}: {total}")  # same "A: 1023" format as the example output

count("pg75592.txt")  # replace with the filename of your own book
```

### Just for Fun! Break this Caesar Cipher

Decode the following ciphertext. Start with the frequency table you just made: match the most frequent letters in the ciphertext with the most frequent letters in your table. *Tip: In addition to your letter frequency table, look carefully at the 1- and 2-letter words; there are only a few possibilities for what those characters could be. Also, try to identify frequently used words like `THE` or `AND` in your ciphertext.*

Ciphertext:

```

PA PZ H WLYPVK VM JPCPS DHY. YLILS ZWHJLZOPWZ, ZAYPRPUN MYVT H OPKKLU IHZL, OHCL DVU AOLPY MPYZA CPJAVYF HNHPUZA AOL LCPS NHSHJAPJ LTWPYL. KBYPUN AOL IHAASL, YLILS ZWPLZ THUHNLK AV ZALHS ZLJYLA WSHUZ AV AOL LTWLYVY'Z BSAPTHAL DLHWVU, AOL KLHAO ZAHY, HU HYTVYLK ZWHJL ZAHAPVU DPAO LUVBNO WVDLY AV KLZAYVF HU LUAPYL WSHULA. WBYZBLK IF AOL LTWLYVY'Z ZPUPZALY HNLUAZ, WYPUJLZZ SLPH YHJLZ OVTL HIVHYK OLY ZAHYZOPW, JBZAVKPHU VM AOL ZAVSLU WSHUZ AOHA JHU ZHCL OLY WLVWSL HUK YLZAVYL MYLLKVT AV AOL NHSHEF ....

```
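Part of the tip above can be automated. This is a rough sketch, assuming the most frequent ciphertext letter stands for `E`: the hypothetical helper `candidate_shifts` applies that assumption to each of the few most common ciphertext letters, and `decrypt` undoes a right shift. On short texts the most common letter is often not `E`, so read each candidate decryption by eye.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def candidate_shifts(ciphertext: str, n: int = 3) -> list[int]:
    """Guess right-shift amounts by assuming each of the `n` most common
    ciphertext letters is really 'E'."""
    counts = Counter(ch for ch in ciphertext.upper() if ch in ALPHABET)
    return [
        (ALPHABET.index(letter) - ALPHABET.index("E")) % 26
        for letter, _ in counts.most_common(n)
    ]


def decrypt(ciphertext: str, shift: int) -> str:
    """Undo a right shift of `shift` by moving each letter left."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )


secret = "ZLL AOL LLYPL AYLLZ"  # made-up example: "SEE THE EERIE TREES" shifted right by 7
for shift in candidate_shifts(secret):
    print(shift, decrypt(secret, shift))  # first candidate, 7, gives SEE THE EERIE TREES
```

The `secret` string is an illustration only, not the lab's ciphertext; pasting the full ciphertext gives the counts more letters to work with.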

Decrypted plaintext (each letter shifted left by 7):

```
it is a period of civil war. rebel spaceships, striking from a hidden base, have won their first victory against the evil galactic empire. during the battle, rebel spies managed to steal secret plans to the emperor's ultimate weapon, the death star, an armored space station with enough power to destroy an entire planet. pursued by the emperor's sinister agents, princess leia races home aboard her starship, custodian of the stolen plans that can save her people and restore freedom to the galaxy
```

## Part 2: Analyzing Server Activity

One important way for businesses to keep themselves secure is to monitor their server logs.

Read in `server_log.txt`, which contains server access log entries of the form `IP Address-Timestamp-Page Accessed`. Notice which character is used as the delimiter.

- Count the total number of unique IP addresses that accessed the server.
- Identify the top three most used IP addresses.
- Generate a report file `server_summary.txt` containing this information.
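Since the IP address is the first field of each entry, a regular expression can pull it out directly. This is a sketch under assumptions: the pattern expects an IPv4 address at the very start of each line followed by `-`, it does not check that each number is at most 255, and `count_ips` plus the sample entries below are made up for illustration (the real `server_log.txt` may differ).

```python
import re
from collections import Counter

# four groups of 1-3 digits separated by dots, at the start of a line, followed by "-"
IP_AT_LINE_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})-", re.MULTILINE)


def count_ips(log_text: str) -> Counter:
    """Count how many log entries each IP address made."""
    return Counter(IP_AT_LINE_START.findall(log_text))


# made-up sample entries in the "IP Address-Timestamp-Page Accessed" shape
sample_log = """192.168.1.10-2023-12-27 10:15:02-/index.html
10.0.0.5-2023-12-27 10:15:09-/login
192.168.1.10-2023-12-27 10:16:44-/dashboard
"""
counts = count_ips(sample_log)
print(len(counts))            # 2 unique IP addresses
print(counts.most_common(3))  # [('192.168.1.10', 2), ('10.0.0.5', 1)]
```

Anchoring on the start of the line means hyphens inside the timestamp or page path cannot be mistaken for part of the IP.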

```python
from collections import Counter

# each entry looks like "IP Address-Timestamp-Page Accessed",
# so the IP is everything before the first "-" on the line
with open("server_log.txt", "r") as file:
    ips = [line.split("-", 1)[0].strip() for line in file if line.strip()]

ip_counter = Counter(ips)
unique_ips = len(ip_counter)                 # number of distinct IP addresses
most_common_ips = ip_counter.most_common(3)  # top three (ip, count) pairs

# write the report file
with open("server_summary.txt", "w") as output_file:
    print(f"unique ips: {unique_ips}", file=output_file)
    print("top three ips:", file=output_file)
    for ip, count in most_common_ips:
        print(f"{ip}: {count}", file=output_file)

print("generated summary file")
```


## Part 3: Creating Usernames

Use the file `emails.txt` to create a username and a random password for each user. Then, write the emails, usernames, and random passwords to an output file `output.txt`.

Each username should be the part of the email address before the `@`. So for `findlay_butler@hr.yahoo.com`, the username would be `findlay_butler`.

The passwords should be 8 characters long and a random combination of letters and numbers. 

For the first user, `output.txt` should look like: 
```
findlay_butler@hr.yahoo.com,findlay_butler,abiojash
```
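For the random passwords, Python's `secrets` module is designed for security-sensitive values, whereas the `random` module's documentation warns against using it for security purposes. A minimal sketch; `generate_password` and `ALPHANUMERIC` are illustrative names, not required by the lab.

```python
import secrets
import string

ALPHANUMERIC = string.ascii_letters + string.digits  # a-z, A-Z, 0-9


def generate_password(length: int = 8) -> str:
    """Return a random password made of letters and digits."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(length))


print(generate_password())  # a different 8-character string on every run
```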

### Challenge: Using Regular Expressions

Instead of using the part of the email before the `@`, make each username the user's first initial followed by their last name. So for `findlay_butler@hr.yahoo.com` the username would be `fbutler`. The easiest way to do this is probably **using regular expressions.**

For more explanation and practice with regular expressions, use [regexone.com](https://regexone.com/). For help creating your regular expression query, use [regex101.com](https://regex101.com/). 
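One possible regular expression for the challenge is sketched below. It assumes every address has the shape `first_last@domain`, as in `findlay_butler@hr.yahoo.com`; addresses in any other shape (dots, middle names, no underscore) will not match and return `None`. `NAME_PATTERN` and `initial_lastname` are hypothetical names.

```python
import re
from typing import Optional

# group 1: first letter of the first name; group 2: the last name (before the "@")
NAME_PATTERN = re.compile(r"^([A-Za-z])[A-Za-z]*_([A-Za-z]+)@")


def initial_lastname(email: str) -> Optional[str]:
    """Build 'first initial + last name' from a first_last@domain address."""
    match = NAME_PATTERN.match(email)
    if match is None:
        return None  # email is not in the assumed first_last shape
    first_initial, last_name = match.groups()
    return (first_initial + last_name).lower()


print(initial_lastname("findlay_butler@hr.yahoo.com"))  # fbutler
```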

```python
import secrets
import string

ALPHANUMERIC = string.ascii_letters + string.digits  # upper- and lowercase letters plus digits

with open("emails.txt") as email_file:
    # one email per line; skip blank lines and stray whitespace
    emails = [line.strip() for line in email_file if line.strip()]

with open("output.txt", "w") as output_file:
    for email in emails:
        user = email.split("@")[0]  # everything before the @ is the username
        # 8 random letters/digits; secrets is intended for passwords, unlike random
        password = "".join(secrets.choice(ALPHANUMERIC) for _ in range(8))
        output_file.write(f"{email},{user},{password}\n")  # email,username,password as in the example

print("created users and passwords")
```