# Module 13 Assignment: Level 1



## 💻 Python Datetime Exercises


## 1. Get the current day, month, year, hour, minute and timestamp from the datetime module.

```python
from datetime import datetime

now = datetime.now()
print(now)  # full date and time, down to microseconds
```

```
2025-01-22 11:42:51.025500
```
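`print(now)` shows the whole value, but the exercise also asks for the individual fields and a timestamp. Below is a minimal sketch that pulls them out of any `datetime`. The helper name `date_parts` is only for illustration. A fixed UTC value is used so the result is reproducible; you can pass `datetime.now()` instead.

```python
from datetime import datetime, timezone


def date_parts(dt):
    """Return the day, month, year, hour, minute and POSIX timestamp of dt."""
    return {
        "day": dt.day,
        "month": dt.month,
        "year": dt.year,
        "hour": dt.hour,
        "minute": dt.minute,
        # Seconds since 1970-01-01 00:00 UTC; a naive datetime is read as local time
        "timestamp": dt.timestamp(),
    }


# A fixed UTC datetime keeps the result reproducible;
# pass datetime.now() instead to inspect the current moment.
moment = datetime(2025, 1, 22, 11, 42, 51, tzinfo=timezone.utc)
parts = date_parts(moment)
print(parts)
```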


## 2. Format the current date using this format: "%m/%d/%Y, %H:%M:%S"


```python
from datetime import datetime

# current date and time
now = datetime.now()
# mm/dd/YYYY, HH:MM:SS format (%Y is the four-digit year)
formatted = now.strftime("%m/%d/%Y, %H:%M:%S")
print("Date & time:", formatted)
```

```
Date & time: 01/22/2025, 11:46:05
```


## 3. Today is 5 December, 2019. Change this time string to a datetime object.

```python
from datetime import datetime

date_string = "5 December, 2019"

# %d = day of the month, %B = full month name, %Y = four-digit year
date_object = datetime.strptime(date_string, "%d %B, %Y")

print(date_object)
```

```
2019-12-05 00:00:00
```


## 4. Calculate the time difference between now and new year.


```python
from datetime import datetime

# "Now" is fixed to 20 January 2025 here, so the result is reproducible
current_date = datetime.strptime("20 January, 2025", "%d %B, %Y")
new_year_date = datetime.strptime("1 January, 2026", "%d %B, %Y")

time_difference = new_year_date - current_date  # a timedelta

print(f"Time difference between now and new year: {time_difference.days} days")
```

```
Time difference between now and new year: 346 days
```
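The cell above hard-codes both dates. Below is a minimal sketch that works from any "now" and always targets the next 1 January. The function name `time_until_new_year` is hypothetical.

```python
from datetime import datetime, timedelta


def time_until_new_year(now) -> timedelta:
    """Timedelta from `now` to midnight on 1 January of the following year."""
    # Reuse now.tzinfo so naive and aware inputs both work
    next_new_year = datetime(now.year + 1, 1, 1, tzinfo=now.tzinfo)
    return next_new_year - now


print(time_until_new_year(datetime(2025, 1, 20)).days)  # same reference date as above
print(time_until_new_year(datetime.now()))
```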


## 5. Calculate the time difference between 1 January 1970 and now.


```python
from datetime import datetime

old_date = datetime.strptime("1 January, 1970", "%d %B, %Y")

current_date = datetime.now()
time_difference = current_date - old_date

print(f"Time difference between 1970 and now: {time_difference.days} days")
```

```
Time difference between 1970 and now: 20110 days
```
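1 January 1970 is the Unix epoch, so this difference measures the same quantity as a POSIX timestamp. The cell above subtracts naive datetimes, which measures local wall-clock time. The sketch below uses timezone-aware UTC datetimes instead, so the result lines up with `datetime.timestamp()`. The names `EPOCH` and `seconds_since_epoch` are illustrative.

```python
from datetime import datetime, timezone

# The Unix epoch as a timezone-aware datetime
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_epoch(moment):
    """Seconds elapsed between the Unix epoch and a timezone-aware datetime."""
    return (moment - EPOCH).total_seconds()


now = datetime.now(timezone.utc)
print(f"Days since 1 January 1970 (UTC): {(now - EPOCH).days}")
print(f"Seconds since the epoch: {seconds_since_epoch(now):.0f}")
print(f"now.timestamp():         {now.timestamp():.0f}")
```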


## 6. Think: what can you use the datetime module for? Examples:
- Time series analysis
- Getting a timestamp for any activity in an application
- Adding posts on a blog
The `datetime` module is Python's standard-library toolkit for working with dates and times. Practical use cases include the three examples above and several more:

### 1. **Time Series Analysis**
   - **Use Case**: Analyzing stock prices, weather data, or sensor readings over time.
   - **Example**: Using `datetime` to parse time data and index data frames for financial analysis.
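Below is a small, dependency-free sketch of the time series idea. It parses timestamp strings and groups readings by calendar day. The readings and the name `daily_means` are hypothetical; with real data, a library such as pandas would usually take over the indexing.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sensor readings as (ISO-8601 timestamp, value) pairs
readings = [
    ("2025-01-20T08:00:00", 21.5),
    ("2025-01-20T14:00:00", 23.0),
    ("2025-01-21T08:00:00", 19.0),
    ("2025-01-21T14:00:00", 22.0),
]


def daily_means(rows):
    """Group readings by calendar date and average each day's values."""
    by_day = defaultdict(list)
    for stamp, value in rows:
        moment = datetime.fromisoformat(stamp)  # parse the timestamp string
        by_day[moment.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


for day, avg in daily_means(readings).items():
    print(day.isoformat(), avg)
```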

### 2. **Timestamps for Logging Activities**
   - **Use Case**: Tracking user actions, system events, or debugging logs.
   - **Example**:  
     ```python
     from datetime import datetime
     log_time = datetime.now()
     print(f"User logged in at {log_time}")
     ```

### 3. **Scheduling and Notifications**
   - **Use Case**: Building alarms, reminders, or task schedulers.
   - **Example**: Checking if a task is due and sending a reminder.
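The reminder check can be sketched as a comparison against a time window. The task list, the 24-hour window and the name `due_soon` are assumptions for illustration. A real scheduler would run this check periodically.

```python
from datetime import datetime, timedelta

# Hypothetical task list: each task has a name and a due datetime
tasks = [
    {"name": "Submit report", "due": datetime(2025, 1, 22, 17, 0)},
    {"name": "Renew domain", "due": datetime(2025, 2, 1, 9, 0)},
]


def due_soon(tasks, now, window=timedelta(hours=24)):
    """Return names of tasks whose due time falls between now and now + window."""
    return [t["name"] for t in tasks if now <= t["due"] <= now + window]


now = datetime(2025, 1, 22, 9, 0)
for name in due_soon(tasks, now):
    print(f"Reminder: '{name}' is due within 24 hours")
```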

### 4. **Data Expiration and Validity Checking**
   - **Use Case**: Managing limited-time offers, subscriptions, or access tokens.
   - **Example**: Comparing the current date with an expiry date.
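Below is a minimal sketch of an expiry check, assuming timestamps are stored as timezone-aware UTC datetimes. The one-hour token and the name `is_expired` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_expired(expires_at, now=None):
    """True once `now` (default: the current UTC time) reaches `expires_at`.

    Both datetimes should be timezone-aware; ordering an aware datetime
    against a naive one raises TypeError.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


# Hypothetical access token that is valid for one hour after it is issued
issued = datetime(2025, 1, 22, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)
print(is_expired(expires, now=issued + timedelta(minutes=30)))
print(is_expired(expires, now=issued + timedelta(hours=2)))
```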

### 5. **Calculating Age or Duration**
   - **Use Case**: Finding the age of a person or duration of an event.
   - **Example**:  
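     ```python
     from datetime import date

     birth_date = date(2000, 5, 10)
     today = date.today()
     # Subtract one year if this year's birthday has not happened yet
     had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
     age = today.year - birth_date.year - (0 if had_birthday else 1)
     print(f"Age: {age} years")
     ```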

### 6. **Date-Based Filenames or Backup Systems**
   - **Use Case**: Naming backups or logs with timestamps.
   - **Example**:  
     ```python
     from datetime import datetime
     filename = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}.zip"
     ```

### 7. **Expiration Reminder for Online Stores**
   - **Use Case**: Sending notifications for cart expiration or flash sales.

### 8. **Analytics on Blog or Website Posts**
   - **Use Case**: Sorting and filtering posts by date, or finding the most popular post in a time frame.
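Below is a sketch of filtering posts to a date range, sorting them newest first and picking the most viewed. The post records and the name `posts_between` are made up for illustration.

```python
from datetime import date

# Hypothetical blog posts with publication dates and view counts
posts = [
    {"title": "Intro to datetime", "published": date(2025, 1, 5), "views": 120},
    {"title": "Parsing dates", "published": date(2025, 1, 18), "views": 340},
    {"title": "Time zones", "published": date(2024, 12, 30), "views": 500},
]


def posts_between(posts, start, end):
    """Posts published between start and end (inclusive), newest first."""
    selected = [p for p in posts if start <= p["published"] <= end]
    return sorted(selected, key=lambda p: p["published"], reverse=True)


january = posts_between(posts, date(2025, 1, 1), date(2025, 1, 31))
most_popular = max(january, key=lambda p: p["views"])
print([p["title"] for p in january])
print("Most popular in January:", most_popular["title"])
```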

### 9. **Event Countdown or Progress Tracking**
   - **Use Case**: Countdown to a product launch, event, or holiday.
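A countdown is a subtraction followed by splitting the resulting `timedelta` into days, hours and minutes. Below is a minimal sketch; the launch time and the name `countdown` are hypothetical.

```python
from datetime import datetime


def countdown(now, event):
    """Format the time left until `event` as days, hours and minutes."""
    remaining = event - now
    if remaining.total_seconds() <= 0:
        return "The event has started"
    # timedelta stores whole days separately from the leftover seconds
    hours, rest = divmod(remaining.seconds, 3600)
    minutes = rest // 60
    return f"{remaining.days} days, {hours} hours, {minutes} minutes"


launch = datetime(2025, 3, 1, 9, 0)  # hypothetical launch time
print(countdown(datetime(2025, 1, 22, 11, 30), launch))
```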

### 10. **Time Zone and Localization Handling**
   - **Use Case**: Supporting users in different time zones by converting times to their local zone, using the standard-library `zoneinfo` module (Python 3.9+) or the third-party `pytz` package.
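Below is a sketch of converting one UTC moment to several local times with `astimezone`. To avoid depending on an installed time zone database, it uses fixed UTC offsets. In practice, named zones such as `zoneinfo.ZoneInfo("Asia/Kolkata")` are usually preferable because they follow each region's daylight-saving rules. The zone labels here are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for illustration only; named zones track DST rules
IST = timezone(timedelta(hours=5, minutes=30), "IST")
EST = timezone(timedelta(hours=-5), "EST")

event_utc = datetime(2025, 1, 22, 15, 0, tzinfo=timezone.utc)
for zone in (IST, EST):
    local = event_utc.astimezone(zone)  # same instant, different wall clock
    print(f"{zone.tzname(None)}: {local:%Y-%m-%d %H:%M}")
```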

Across these use cases, `datetime` supplies the same building blocks: parsing and formatting (`strptime`, `strftime`), arithmetic with `timedelta`, comparisons between dates, and time zone conversion.

# 💻 File Handling Exercises:


## Exercises: Level 1

## 1. Write a function that counts the number of lines and the number of words in a text. All the files are in the data folder:
- a) Read obama_speech.txt and count its lines and words
- b) Read michelle_obama_speech.txt and count its lines and words
- c) Read donald_speech.txt and count its lines and words
- d) Read melina_trump_speech.txt and count its lines and words



```python
import os


def count_lines_and_words(file_path):
    """Return (number_of_lines, number_of_words) for a text file."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            lines = file.readlines()
    except FileNotFoundError:
        print(f"Error: The file {file_path} was not found.")
        return 0, 0
    number_of_lines = len(lines)
    # split() with no arguments splits on any run of whitespace
    number_of_words = sum(len(line.split()) for line in lines)
    return number_of_lines, number_of_words


files = [
    "data/obama_speech.txt",
    "data/michelle_obama_speech.txt",
    "data/donald_speech.txt",
    "data/melina_trump_speech.txt",
]

for file_path in files:
    lines, words = count_lines_and_words(file_path)
    print(f"File: {os.path.basename(file_path)}")
    print(f"Number of lines: {lines}")
    print(f"Number of words: {words}\n")
```

```
File: obama_speech.txt
Number of lines: 66
Number of words: 2400

File: michelle_obama_speech.txt
Number of lines: 83
Number of words: 2204

File: donald_speech.txt
Number of lines: 48
Number of words: 1259

File: melina_trump_speech.txt
Number of lines: 33
Number of words: 1375
```
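The exercise speaks of counting "in a text", so a string-based variant can be handy for quick checks without files. Below is a sketch; the function name is hypothetical. `str.splitlines()` also treats a few other characters, such as form feeds, as line boundaries, so counts can differ slightly from `readlines()` on unusual input.

```python
def count_lines_and_words_in_text(text):
    """Return (lines, words) for a string, mirroring the file-based version."""
    lines = text.splitlines()
    return len(lines), sum(len(line.split()) for line in lines)


speech = "Thank you.\nThank you very much.\n"
print(count_lines_and_words_in_text(speech))
```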



### 2. Read the countries_data.json file in the data directory and create a function that finds the ten most spoken languages
```python
# Your output should look like this
print(most_spoken_languages('./data/countries_data.json', 10))
[(91, 'English'),
(45, 'French'),
(25, 'Arabic'),
(24, 'Spanish'),
(9, 'Russian'),
(9, 'Portuguese'),
(8, 'Dutch'),
(7, 'German'),
(5, 'Chinese'),
(4, 'Swahili'),
(4, 'Serbian')]

# Your output should look like this
print(most_spoken_languages('./data/countries_data.json', 3))
[(91, 'English'),
(45, 'French'),
(25, 'Arabic')]
```

```python
import json
from collections import Counter


def most_spoken_languages(filename, top_n):
    """Return (count, language) tuples for the top_n most common languages."""
    with open(filename, 'r', encoding='utf-8') as file:
        countries_data = json.load(file)

    # Collect every language listed for every country
    languages = []
    for country in countries_data:
        languages.extend(country.get('languages', []))

    # most_common() orders equal counts by first appearance
    language_count = Counter(languages)
    most_common_languages = language_count.most_common(top_n)

    return [(count, language) for language, count in most_common_languages]


print(most_spoken_languages(filename='./data/countries_data.json', top_n=10))
print(most_spoken_languages(filename='./data/countries_data.json', top_n=3))
```

```
[(91, 'English'), (45, 'French'), (25, 'Arabic'), (24, 'Spanish'), (9, 'Portuguese'), (9, 'Russian'), (8, 'Dutch'), (7, 'German'), (5, 'Chinese'), (4, 'Serbian')]
[(91, 'English'), (45, 'French'), (25, 'Arabic')]
```
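The expected output lists eleven entries for `10`, which suggests that languages tied at the cutoff are kept. Below is a sketch of that interpretation. This is an assumption, since the exercise does not say so. Ties are broken alphabetically, so the order does not depend on the file order. The function name and the sample records are hypothetical.

```python
from collections import Counter


def top_languages_with_ties(countries, top_n):
    """(count, language) pairs for the top_n counts, keeping ties at the cutoff."""
    if top_n <= 0:
        return []
    counts = Counter(
        lang for country in countries for lang in country.get("languages", [])
    )
    # Sort by count descending, then by name for a stable, readable order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) <= top_n:
        return [(count, lang) for lang, count in ranked]
    cutoff = ranked[top_n - 1][1]  # count of the last entry that fits
    return [(count, lang) for lang, count in ranked if count >= cutoff]


# Hypothetical records in the same shape as countries_data.json entries
sample = [
    {"name": "A", "languages": ["English", "French"]},
    {"name": "B", "languages": ["English", "Swahili"]},
    {"name": "C", "languages": ["Serbian"]},
]
print(top_languages_with_ties(sample, 2))
```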


### 3. Read the countries_data.json file in the data directory and create a function that returns a list of the ten most populated countries

```python
# Your output should look like this
print(most_populated_countries('./data/countries_data.json', 10))

[
{'country': 'China', 'population': 1377422166},
{'country': 'India', 'population': 1295210000},
{'country': 'United States of America', 'population': 323947000},
{'country': 'Indonesia', 'population': 258705000},
{'country': 'Brazil', 'population': 206135893},
{'country': 'Pakistan', 'population': 194125062},
{'country': 'Nigeria', 'population': 186988000},
{'country': 'Bangladesh', 'population': 161006790},
{'country': 'Russian Federation', 'population': 146599183},
{'country': 'Japan', 'population': 126960000}
]

# Your output should look like this

print(most_populated_countries('./data/countries_data.json', 3))
[
{'country': 'China', 'population': 1377422166},
{'country': 'India', 'population': 1295210000},
{'country': 'United States of America', 'population': 323947000}
]
```

```python
import json


def most_populated_countries(filename, n):
    """Return the n most populated countries as {'country', 'population'} dicts."""
    # Read the JSON file
    with open(filename, 'r', encoding='utf-8') as file:
        countries_data = json.load(file)

    # Sort countries by population in descending order
    sorted_countries = sorted(
        countries_data, key=lambda country: country['population'], reverse=True
    )

    # Keep the top n and build the desired output format
    return [
        {'country': country['name'], 'population': country['population']}
        for country in sorted_countries[:n]
    ]


# Example usage:
print(most_populated_countries(filename='./data/countries_data.json', n=10))
print(most_populated_countries(filename='./data/countries_data.json', n=3))
```

```
[{'country': 'China', 'population': 1377422166}, {'country': 'India', 'population': 1295210000}, {'country': 'United States of America', 'population': 323947000}, {'country': 'Indonesia', 'population': 258705000}, {'country': 'Brazil', 'population': 206135893}, {'country': 'Pakistan', 'population': 194125062}, {'country': 'Nigeria', 'population': 186988000}, {'country': 'Bangladesh', 'population': 161006790}, {'country': 'Russian Federation', 'population': 146599183}, {'country': 'Japan', 'population': 126960000}]
[{'country': 'China', 'population': 1377422166}, {'country': 'India', 'population': 1295210000}, {'country': 'United States of America', 'population': 323947000}]
```
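Sorting the whole list is fine for a couple of hundred countries. `heapq.nlargest` is a standard-library alternative that picks the top `n` without a full sort. Below is a sketch on a few hypothetical records in the same shape; the name `top_populated` is illustrative.

```python
import heapq

# Hypothetical records in the same shape as countries_data.json entries
countries = [
    {"name": "India", "population": 1295210000},
    {"name": "China", "population": 1377422166},
    {"name": "Japan", "population": 126960000},
    {"name": "Brazil", "population": 206135893},
]


def top_populated(countries, n):
    """Top n countries by population without sorting the whole list."""
    top = heapq.nlargest(n, countries, key=lambda c: c["population"])
    return [{"country": c["name"], "population": c["population"]} for c in top]


print(top_populated(countries, 2))
```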


## Exercises: Level 2


### 4. Extract all incoming email addresses as a list from the email_exchange_big.txt file.


```python
import re


def extract_emails(filename):
    """Return the unique email addresses found in a file, in first-seen order."""
    # Matches any address in the text (senders and recipients alike)
    email_pattern = r'[\w\.-]+@[\w\.-]+\.\w+'

    with open(filename, 'r', encoding='utf-8') as file:
        emails = re.findall(email_pattern, file.read())

    # dict.fromkeys removes duplicates while keeping insertion order
    return list(dict.fromkeys(emails))


emails = extract_emails(filename='./data/email_exchange_big.txt')
print("\nExtracted email addresses:")
for email in emails:
    print(email)
```

```
FileNotFoundError: [Errno 2] No such file or directory: '/data/email_exchange_big.txt'
```
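The regex above collects every address, not only incoming ones. Below is a sketch that keeps only senders. It assumes the file is in mbox format, where each message begins with a line like `From alice@example.com Sat Jan  5 09:14:16 2008`; check the actual file before relying on this. The function name and the sample text are hypothetical.

```python
import re

# Assumption: mbox format, one "From <address> <date>" line per message.
# "From:" header lines (with a colon) are deliberately not matched.
FROM_LINE = re.compile(r"^From (\S+@\S+)", re.MULTILINE)


def incoming_senders(text):
    """Unique sender addresses from mbox 'From ' lines, in first-seen order."""
    return list(dict.fromkeys(FROM_LINE.findall(text)))


sample = (
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "To: bob@example.org\n"
    "From: alice@example.com\n"
    "From carol@example.net Fri Jan  4 18:10:48 2008\n"
)
print(incoming_senders(sample))
```

To run it on the real file, pass in the text read with `open('./data/email_exchange_big.txt', encoding='utf-8').read()`.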

### 5. Find the most common words in the English language. Name your function find_most_common_words. It takes two parameters, a string or a file and a positive integer indicating the number of words, and returns a list of tuples in descending order. Check the output:
```python
# Your output should look like this
print(find_most_common_words('sample.txt', 10))
[(10, 'the'),
(8, 'be'),
(6, 'to'),
(6, 'of'),
(5, 'and'),
(4, 'a'),
(4, 'in'),
(3, 'that'),
(2, 'have'),
(2, 'I')]

# Your output should look like this
print(find_most_common_words('sample.txt', 5))

[(10, 'the'),
(8, 'be'),
(6, 'to'),
(6, 'of'),
(5, 'and')]
```

```python
import re
from collections import Counter


def find_most_common_words(source, n):
    """Return (count, word) tuples for the n most frequent words in a file or string."""
    if not isinstance(source, str):
        raise TypeError("source must be a filename or a text string")
    try:
        with open(source, 'r', encoding='utf-8') as file:
            text = file.read()
    except OSError:
        text = source  # not a readable file: treat source as the text itself

    # Lowercase, then extract runs of word characters (drops punctuation)
    words = re.findall(r'\b\w+\b', text.lower())

    # Count word frequencies and keep the n most common
    most_common = Counter(words).most_common(n)
    return [(count, word) for word, count in most_common]


print(find_most_common_words('sample.txt', 5))  # sample.txt must be in the working directory
```

```
[(21, 'the'), (15, 'to'), (15, 'of'), (12, 'and'), (7, 'that')]
```
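Because the function falls back to treating its argument as text, it can be checked without any file. Below is a variant that tests `os.path.isfile` explicitly instead of relying on the exception. This is a design alternative, not the approach above, and the name `most_common_words` is hypothetical. One caveat applies to both versions: a text string that happens to match an existing filename is read as a file.

```python
import os
import re
from collections import Counter


def most_common_words(source, n):
    """Like find_most_common_words, but checks for a file explicitly."""
    if os.path.isfile(source):
        with open(source, encoding="utf-8") as file:
            source = file.read()
    words = re.findall(r"\b\w+\b", source.lower())
    return [(count, word) for word, count in Counter(words).most_common(n)]


print(most_common_words("The cat and the hat. The end!", 2))
```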


### 6. Use the function find_most_frequent_words (implemented above as find_most_common_words) to find:
- a) The ten most frequent words used in Obama's speech
- b) The ten most frequent words used in Michelle's speech
- c) The ten most frequent words used in Trump's speech
- d) The ten most frequent words used in Melina's speech

```python
# find_most_common_words is defined in the exercise 5 cell
print(f"The ten most frequent words used in Obama's speech: {find_most_common_words('data/obama_speech.txt', 10)}")
print(f"The ten most frequent words used in Michelle's speech: {find_most_common_words('data/michelle_obama_speech.txt', 10)}")
print(f"The ten most frequent words used in Trump's speech: {find_most_common_words('data/donald_speech.txt', 10)}")
print(f"The ten most frequent words used in Melina's speech: {find_most_common_words('data/melina_trump_speech.txt', 10)}")
```

```
The ten most frequent words used in Obama's speech: [(129, 'the'), (113, 'and'), (81, 'of'), (70, 'to'), (67, 'our'), (62, 'we'), (50, 'that'), (48, 'a'), (36, 'is'), (25, 'in')]
The ten most frequent words used in Michelle's speech: [(96, 'and'), (85, 'the'), (84, 'to'), (50, 'that'), (46, 'of'), (41, 'a'), (37, 'he'), (36, 'in'), (28, 'my'), (28, 'i')]
The ten most frequent words used in Trump's speech: [(65, 'the'), (59, 'and'), (44, 'we'), (40, 'will'), (38, 'of'), (32, 'to'), (30, 'our'), (20, 'is'), (17, 'america'), (13, 'for')]
The ten most frequent words used in Melina's speech: [(77, 'and'), (55, 'to'), (52, 'the'), (29, 'is'), (28, 'i'), (27, 'for'), (25, 'of'), (24, 'that'), (22, 'a'), (21, 'you')]
```
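Almost every top word above is a function word such as "the" or "and". Below is a sketch that filters stop words before counting, so content words surface. The stop-word set here is a small hypothetical sample; the full list in the data directory would replace it.

```python
import re
from collections import Counter

# Small illustrative stop-word set, not the full list from the data directory
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "that", "we", "our", "i"}


def top_content_words(text, n):
    """Most frequent words after dropping stop words, as (count, word) tuples."""
    words = re.findall(r"\b\w+\b", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [(count, word) for word, count in counts.most_common(n)]


print(top_content_words("We will build. We will win, and America will lead.", 2))
```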


### 7. Write a Python application that checks the similarity between two texts. It takes a file or a string as a parameter and evaluates how similar the two texts are. For instance, check the similarity between the transcripts of Michelle's and Melina's speeches. You may need a few functions: one to clean the text (clean_text), one to remove support words (remove_support_words) and one to check the similarity (check_text_similarity). The list of stop words is in the data directory.

```python
import string

# Assumes data/stop_words.py defines a list named stop_words
from data.stop_words import stop_words

STOP_WORDS = set(stop_words)  # set membership tests are fast


def clean_text(text):
    """Lowercase the text and strip ASCII punctuation."""
    text = text.lower()
    return text.translate(str.maketrans('', '', string.punctuation))


def remove_support_words(text):
    """Split cleaned text into words and drop stop words."""
    return [word for word in text.split() if word not in STOP_WORDS]


def jaccard_similarity(list1, list2):
    """Jaccard similarity: size of the intersection over size of the union."""
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    return len(set1 & set2) / len(union) if union else 0


def process_text(file1, file2):
    """Read two text files, clean them, remove stop words and compare them."""
    with open(file1, 'r', encoding='utf-8') as f:
        text1 = f.read()
    with open(file2, 'r', encoding='utf-8') as f:
        text2 = f.read()

    # Clean the text and remove stop words, producing word lists
    words1 = remove_support_words(clean_text(text1))
    words2 = remove_support_words(clean_text(text2))

    return jaccard_similarity(words1, words2)


# Example usage:
similarity = process_text(file1='data/melina_trump_speech.txt', file2='data/michelle_obama_speech.txt')
print(f"Jaccard similarity score between the two speeches: {similarity:.4f}")
```
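The exercise asks for a `check_text_similarity` that accepts either a file or a string. Below is a sketch of that entry point using the same clean-filter-Jaccard pipeline. A path is read if it exists; otherwise the argument is treated as text. The tiny stop-word set and the helper `read_source` are hypothetical stand-ins.

```python
import os
import string

# Hypothetical, tiny stop-word set for illustration only
STOP_WORDS = {"the", "a", "and", "of", "to", "is", "i", "my"}


def read_source(source):
    """Return file contents if `source` is an existing path, else the string itself."""
    if os.path.isfile(source):
        with open(source, encoding="utf-8") as f:
            return f.read()
    return source


def check_text_similarity(source1, source2):
    """Jaccard similarity of the content-word sets of two texts or files."""
    table = str.maketrans("", "", string.punctuation)
    sets = []
    for source in (source1, source2):
        words = read_source(source).lower().translate(table).split()
        sets.append({w for w in words if w not in STOP_WORDS})
    union = sets[0] | sets[1]
    return len(sets[0] & sets[1]) / len(union) if union else 0.0


print(check_text_similarity("My family is strong.", "The family is proud and strong!"))
```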

### 8. Find the 10 most repeated words in romeo_and_juliet.txt

```python
# find_most_common_words is defined in the exercise 5 cell
print(f"The ten most repeated words in romeo_and_juliet.txt: {find_most_common_words('data/romeo_and_juliet.txt', 10)}")
```

```
The ten most repeated words in romeo_and_juliet.txt: [(868, 'the'), (800, 'and'), (661, 'to'), (658, 'i'), (535, 'of'), (530, 'a'), (381, 'is'), (378, 'in'), (371, 'that'), (367, 'you')]
```


### 9. Read the hacker news CSV file and find out:
- a) The number of lines containing python or Python
- b) The number of lines containing JavaScript, javascript or Javascript
- c) The number of lines containing Java and not JavaScript

```python
def count_keywords_in_file(file_path):
    """Count lines mentioning Python, JavaScript, and Java but not JavaScript.

    Each physical line is checked as a whole, so the match does not depend on
    which CSV column (title, URL, ...) contains the keyword.
    """
    count_python = 0
    count_javascript = 0
    count_java_not_javascript = 0

    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            # a) python or Python
            if 'python' in line or 'Python' in line:
                count_python += 1
            # b) JavaScript, javascript or Javascript
            mentions_js = any(form in line for form in ('JavaScript', 'javascript', 'Javascript'))
            if mentions_js:
                count_javascript += 1
            # c) Java, with no spelling of JavaScript on the same line
            if 'Java' in line and not mentions_js:
                count_java_not_javascript += 1

    return count_python, count_javascript, count_java_not_javascript


file_path = 'data/hacker_news.csv'
python_count, javascript_count, java_not_js_count = count_keywords_in_file(file_path)

print(f"Lines containing 'python' or 'Python': {python_count}")
print(f"Lines containing 'JavaScript', 'javascript' or 'Javascript': {javascript_count}")
print(f"Lines containing 'Java' but not 'JavaScript': {java_not_js_count}")
```
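To match only post titles instead of whole lines, `csv.DictReader` can look a column up by name. Below is a sketch that assumes the file has a header row with a column called `title`; verify this against the actual CSV. The function name and the sample rows are hypothetical.

```python
import csv
import io


def count_titles(csv_file, keyword):
    """Count rows whose 'title' column contains `keyword` (case-insensitive).

    Assumes the CSV has a header row with a column named 'title'.
    """
    reader = csv.DictReader(csv_file)
    return sum(1 for row in reader if keyword.lower() in row["title"].lower())


# Hypothetical sample in place of the real file
SAMPLE_CSV = (
    "id,title,url\n"
    '1,"Show HN: A Python web server",https://example.com\n'
    "2,Why JavaScript won,https://example.org\n"
    "3,Learning python in 30 days,https://example.net\n"
)
print(count_titles(io.StringIO(SAMPLE_CSV), "python"))
```

For the real file, open it with `open('data/hacker_news.csv', encoding='utf-8', newline='')`, as the `csv` module recommends, and pass the file object in.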
