# Group Project 002

- *Submission Instructions*
    - Clearly state your team number, team name, and each team member's first and last name and contribution percentage.
    - Include **adequate comments/pseudocode** in your script. The comments should also **highlight the Python concepts you adopted**. 
    - The team leader is responsible for submitting the team's script to Canvas. You can submit either an Anaconda Cloud shared link or a .py/.ipynb file. 
    - The script must be submitted before leaving the classroom.

- *Requirements*
    -  Your code must **exclusively** use Python concepts and techniques covered in class.
    -  The use of external Python modules is strictly prohibited unless explicitly provided.

- *Grading Criteria (Total Possible Points: 100)*
    - **Correctness (40%)**: The code should function as intended, producing accurate outputs.
    - **Readability (30%)**: Code readability is essential for collaborative work in professional environments. This includes clear and consistent naming conventions for variables and functions, as well as adequate but concise comments (pseudocode) that explain complex logic.
    - **Conciseness (30%)**: While remaining readable, code should also be concise, avoiding unnecessary complexity or redundancy. For instance, repetitive code can often be avoided by defining functions or by using the walrus operator, conditional expressions, list/dictionary comprehensions, etc.
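As a minimal illustration of the idioms listed under Conciseness, the sketch below counts suffix matches in one short line using a dictionary comprehension, the walrus operator, and a conditional expression. The sample line and the names `suffixes` and `counts` are hypothetical and not taken from the project text file.

```python
import string

# Hypothetical sample line (not taken from the project text file)
line = "Security, injection; coding -- SECURITY and action."
suffixes = ("tion", "ing", "ity")

# Dictionary comprehension: one empty {word: count} dictionary per suffix
counts = {suffix: {} for suffix in suffixes}

for token in line.split():
    # Walrus operator: clean the token once and skip it if nothing is left (e.g. "--")
    if not (word := token.strip(string.punctuation).lower()):
        continue
    for suffix in suffixes:
        if word.endswith(suffix):
            counts[suffix][word] = counts[suffix].get(word, 0) + 1

# Conditional expression: choose singular or plural wording
for suffix, found in counts.items():
    total = sum(found.values())
    print(f"'{suffix}': {total} {'match' if total == 1 else 'matches'}")
```

Here "SECURITY" and "Security," collapse into the same key after cleaning, and "--" is skipped because stripping leaves an empty string.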

In [1]:
# please enter your team information here
team_info = {
    "Team No": "xxx",
    "Team Name": "xxx",
    ("Team Member1", "Contribution"): ("Nicole", "100%"), # replace Nicole with your name, replace 100% with your contribution percentage.
    ("Team Member2", "Contribution"): ("xxx", "xx%"),  # remove if not applicable
    ("Team Member3", "Contribution"): ("xxx", "xx%")   # remove if not applicable
}

for k, v in team_info.items():
    print(f'{k}: {v}')

Team No: xxx
Team Name: xxx
('Team Member1', 'Contribution'): ('Nicole', '100%')
('Team Member2', 'Contribution'): ('xxx', 'xx%')
('Team Member3', 'Contribution'): ('xxx', 'xx%')


- **Task Description**

    - Develop a Python program to read and analyze a text file named 'Preventing Ransomware Attacks at Scale.txt'. The program will extract and analyze words based on specific criteria, such as identifying frequent terms related to cybersecurity concepts and vulnerabilities.
    - The task is decomposed into the following subtasks:
    
        - 1. Use Python's file handling methods to ```open and read``` the contents of 'Preventing Ransomware Attacks at Scale.txt'.
        - 2. Clean each line as thoroughly as you can (e.g., by removing punctuation) and split it into individual words. Handle case sensitivity; e.g., treat 'Word' and 'word' as the same word. The ```string``` module may be used. 

        - 3. ```Iterate``` over the list of words.
              - Extract and count the frequency of words that end with specific cybersecurity-related suffixes, such as:
              
                   - 'tion' (e.g., "implementation", "prevention")
                   - 'ing' (e.g., "attacking", "defending")
                   - 'ity' (e.g., "security", "complexity")

        - 4. Store and Sort Data
               - Store the words ending with each specified suffix in a separate dictionary, with the word as the key and its frequency as the value.
               - Convert each dictionary into a list of tuples, where each tuple contains a frequency count and the corresponding word.
               - Sort the lists in descending order of frequency using the sort() method.
        - 5. Get the top 10 words for each suffix.

        - 6. Present the top 10 words for each suffix, using f-strings to format the results neatly. 
             
    
    - Please include reasonable pseudocode and clearly comment on which concepts covered in class you used to develop your program.
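Step 2 leaves the cleaning method open. The sketch below compares two options on a hypothetical sample line: stripping punctuation only from the two ends of each word with `str.strip` (the approach used in the team's solution), versus deleting punctuation anywhere with the standard `str.translate` and `str.maketrans` methods. The names `PUNCTUATION`, `clean_by_strip`, and `clean_by_translate` are hypothetical, and the curly quotes and apostrophes are added by hand because `string.punctuation` covers only ASCII characters.

```python
import string

# string.punctuation is ASCII-only, so curly quotes/apostrophes are added by hand
PUNCTUATION = string.punctuation + "“”‘’"

def clean_by_strip(line):
    """Lowercase each word and strip punctuation from its two ends only."""
    return [word.strip(PUNCTUATION).lower() for word in line.split()]

def clean_by_translate(line):
    """Delete every punctuation character wherever it appears, then split."""
    table = str.maketrans("", "", PUNCTUATION)  # third argument: characters to delete
    return line.translate(table).lower().split()

# Hypothetical sample line (not taken from the project text file)
sample = "“Ransomware-as-a-service” isn’t new -- Injection, again!"
print(clean_by_strip(sample))
print(clean_by_translate(sample))
```

Neither choice is strictly better: `translate` merges hyphenated terms into one token ("ransomwareasaservice"), while `strip` keeps them intact but can produce empty strings for all-punctuation tokens such as "--". An empty string never passes `endswith("tion")`, so those empty entries do not affect the suffix counts.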

In [4]:
import string  # string module: provides string.punctuation for cleaning words

# Punctuation to strip from word edges (string.punctuation plus curly double quotes)
punctuations = string.punctuation + '“”'

# Pseudocode:
# 1. Open and read the contents of 'Preventing Ransomware Attacks at Scale.txt'.
# 2. Iterate through each line in the text file.
# 3. Clean each line by stripping punctuation from each word and converting it to lowercase.
# 4. Identify and count words ending with specific suffixes ('ing', 'ity', 'tion').
# 5. Store these counts in separate dictionaries, one per suffix.
# 6. Convert each dictionary into (count, word) tuples, sort them with sort(), and keep the top 10.
# 7. Display the results with f-strings.

# Dictionary comprehension: one frequency dictionary {word: count} per suffix
suffix_word_counts = {suffix: {} for suffix in ("ing", "ity", "tion")}

# File handling: 'with' closes the file automatically; try/except handles a missing file
try:
    with open('Preventing Ransomware Attacks at Scale.txt', 'r', encoding='UTF-8') as text:
        for line in text:  # for loop: iterate over the file line by line
            # List comprehension: split the line, strip edge punctuation, lowercase each word
            clean_words = [word.strip(punctuations).lower() for word in line.split()]

            # Nested loops: count each word under every suffix it ends with
            for word in clean_words:
                for suffix, word_count in suffix_word_counts.items():
                    if word.endswith(suffix):
                        # dict.get(): start at 0 for words not seen yet
                        word_count[word] = word_count.get(word, 0) + 1

except FileNotFoundError:
    print("Error: File not found. Please ensure the file is in the correct directory.")

# Helper function: sort key returning the frequency from a (count, word) tuple
def sort_by_frequency(item):
    return item[0]

# Function: convert a dictionary into (count, word) tuples, sort them, and return the top 10
def get_top_10_words(word_count_dict):
    count_word_pairs = [(count, word) for word, count in word_count_dict.items()]
    # list.sort() is stable, so words with equal counts keep their first-seen order
    count_word_pairs.sort(key=sort_by_frequency, reverse=True)
    return count_word_pairs[:10]  # slicing: at most 10 tuples

# Display the results for each suffix
for position, (suffix, word_count) in enumerate(suffix_word_counts.items()):
    # Conditional expression: blank line before every block except the first
    separator = "\n" if position else ""
    print(f"{separator}Top 10 words ending with '{suffix}':")
    for count, word in get_top_10_words(word_count):  # tuple unpacking
        print(f"{word}: {count}")

Top 10 words ending with 'ing':
including: 3
working: 2
bearing: 1
responding: 1
signing: 1
reporting: 1
creating: 1
coding: 1
damaging: 1
allowing: 1

Top 10 words ending with 'ity':
security: 6
vulnerability: 6
cybersecurity: 4
community: 1
majority: 1
responsibility: 1
priority: 1
reality: 1
entity: 1

Top 10 words ending with 'tion':
injection: 8
action: 2
authentication: 2
prescription: 1
foundation: 1
information: 1
inaction: 1
recommendation: 1
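Step 4 asks for (count, word) tuples sorted with `sort()`. The sketch below, on a hypothetical frequency dictionary, shows that the tie-break depends on how `sort()` is called: sorting on the count alone keeps equal counts in first-seen order, because Python's sort is stable even with `reverse=True`, whereas sorting the whole tuples falls back to comparing words, giving reverse alphabetical order for ties. In the output above, words with equal counts are not in alphabetical order, which is consistent with a count-only sort. The function `top_words`, its `keep_first_seen` flag, and the sample counts are hypothetical.

```python
def top_words(word_counts, n=10, keep_first_seen=True):
    """Return up to n (count, word) tuples, most frequent first."""
    # Step 4: convert the dictionary into (count, word) tuples
    pairs = [(count, word) for word, count in word_counts.items()]
    if keep_first_seen:
        # Sort on the count only; the stable sort keeps tied words in insertion order
        pairs.sort(key=lambda pair: pair[0], reverse=True)
    else:
        # Sort whole tuples; tied counts fall back to comparing words (reverse alphabetical)
        pairs.sort(reverse=True)
    # Step 5: slicing returns at most n items, even if fewer words were found
    return pairs[:n]

# Hypothetical counts (not the project's real results), in first-seen order
sample_counts = {"patching": 2, "alerting": 1, "scanning": 3, "logging": 1, "hashing": 1}

# Step 6: f-string format specs align the words and counts in columns
for count, word in top_words(sample_counts, n=4):
    print(f"{word:<12}{count:>3}")
print(top_words(sample_counts, n=4, keep_first_seen=False))
```

With `keep_first_seen=True` the count-1 words appear as "alerting", "logging" (their first-seen order); with `False` they appear as "logging", "hashing", and "alerting" is cut off by the top-4 limit.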
