# Python Datetime and File Handling

## Python Datetime Exercises:

Task 1: Get the current day, month, year, hour, minute and timestamp from the datetime module.
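
A possible starting point for this task, as a minimal sketch using only the standard library. The helper name `describe_moment` is hypothetical and not part of the exercise.

```python
from datetime import datetime


def describe_moment(moment):
    """Return the calendar and clock fields of a datetime as a dict."""
    return {
        "day": moment.day,
        "month": moment.month,
        "year": moment.year,
        "hour": moment.hour,
        "minute": moment.minute,
        # Seconds since the Unix epoch, as a float
        "timestamp": moment.timestamp(),
    }


now = datetime.now()
print(describe_moment(now))
```

Passing the datetime in as an argument, rather than calling `datetime.now()` inside the function, makes the helper easy to check against a fixed date.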

Task 2: Format the current date using this format: "%m/%d/%Y, %H:%M:%S"
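
One way to approach this is `datetime.strftime`, sketched below. The names `DATE_FORMAT` and `format_moment` are illustrative assumptions.

```python
from datetime import datetime

# The format string given in the task
DATE_FORMAT = "%m/%d/%Y, %H:%M:%S"


def format_moment(moment, fmt=DATE_FORMAT):
    """Render a datetime as text using a strftime format string."""
    return moment.strftime(fmt)


print(format_moment(datetime.now()))
```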

Task 3: Today is 5 December, 2019. Convert this time string to a datetime object.
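
A sketch of the parsing step with `datetime.strptime`, assuming the string is written exactly as "5 December, 2019" and that month names are in English (`%B` is locale-dependent).

```python
from datetime import datetime

date_string = "5 December, 2019"

# %d = day of the month, %B = full month name, %Y = four-digit year
parsed = datetime.strptime(date_string, "%d %B, %Y")
print(parsed)
```

Fields missing from the string (hour, minute, second) default to zero, so the result represents midnight on that day.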

Task 4: Calculate the time difference between now and the next New Year.
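
Subtracting two datetime objects yields a `timedelta`, which is one way to solve this. The sketch below assumes "new year" means the upcoming 1 January at midnight local time; `time_until_new_year` is a hypothetical name.

```python
from datetime import datetime


def time_until_new_year(now):
    """Return the timedelta from `now` to the next 1 January at midnight."""
    next_new_year = datetime(now.year + 1, 1, 1)
    return next_new_year - now


remaining = time_until_new_year(datetime.now())
hours = remaining.seconds // 3600
print(f"{remaining.days} days and {hours} hours until the New Year")
```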

Task 5: Calculate the time difference between 1 January 1970 and now.
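
A sketch of the same subtraction against the Unix epoch. It assumes both moments are expressed in UTC so that the difference does not depend on the local time zone; `UNIX_EPOCH` and `time_since_epoch` are illustrative names.

```python
from datetime import datetime, timezone

# 1 January 1970, 00:00 UTC
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_since_epoch(now):
    """Return the timedelta between the Unix epoch and a timezone-aware datetime."""
    return now - UNIX_EPOCH


elapsed = time_since_epoch(datetime.now(timezone.utc))
print(f"{elapsed.days} days ({elapsed.total_seconds():.0f} seconds) since 1 January 1970")
```

Mixing a timezone-aware datetime with a naive one raises a `TypeError`, which is why both sides carry `timezone.utc` here.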

Task 6: Think, what can you use the datetime module for? Examples:

- Time series analysis
- To get a timestamp of any activities in an application
- Adding posts on a blog 

## File Handling Exercises:

### Exercises: Level 1

Task 1: Write a function that counts the number of lines and the number of words in a text. All the files are in the data folder:

- a) Read the obama_speech.txt file and count the number of lines and words
- b) Read the michelle_obama_speech.txt file and count the number of lines and words
- c) Read the donald_speech.txt file and count the number of lines and words
- d) Read the melina_trump_speech.txt file and count the number of lines and words

In [None]:
from pathlib import Path

def count_lines_and_words(file):
    path = Path(file)
    content = path.read_text(encoding='utf-8')
    lines = content.splitlines()
    words = content.split()
    return len(lines), len(words)

obama_speech_stats = count_lines_and_words('../data/obama_speech.txt')
print(
    f"The Obama speech has {obama_speech_stats[0]} "
    f"lines and {obama_speech_stats[1]} words."
)

michelle_speech_stats = count_lines_and_words('../data/michelle_obama_speech.txt')
print(
    f"The Michelle Obama speech has {michelle_speech_stats[0]} "
    f"lines and {michelle_speech_stats[1]} words."
)

donald_speech_stats = count_lines_and_words('../data/donald_speech.txt')
print(
    f"The Donald Trump speech has {donald_speech_stats[0]} "
    f"lines and {donald_speech_stats[1]} words."
)

melina_speech_stats = count_lines_and_words('../data/melina_trump_speech.txt')
print(
    f"The Melania Trump speech has {melina_speech_stats[0]} "
    f"lines and {melina_speech_stats[1]} words."
)

Task 2: Read the countries_data.json file in the data directory and create a function that finds the ten most spoken languages.

In [None]:
import json
from pathlib import Path
from collections import Counter

def ten_most_spoken_languages(countries):
    language_counter = Counter()
    for country in countries:
        languages = country['languages']
        language_counter.update(languages)
    return language_counter.most_common(10)

path = Path('../data/countries_data.json')
countries_data = json.loads(path.read_text(encoding='utf-8'))

most_spoken_languages = ten_most_spoken_languages(countries_data)
print(most_spoken_languages)

Task 3: Read the countries_data.json file in the data directory and create a function that returns a list of the ten most populated countries.

In [None]:
import json
from pathlib import Path

def ten_most_populated_countries(countries):
    sorted_countries = sorted(
        countries, key=lambda country: country['population'], reverse=True
    )
    return sorted_countries[:10]

path = Path('../data/countries_data.json')
countries_data = json.loads(path.read_text(encoding='utf-8'))

most_populated_countries = ten_most_populated_countries(countries_data)
for country in most_populated_countries:
    print(f"{country['name']}: {country['population']}")

### Exercises: Level 2

Task 4: Extract all incoming email addresses as a list from the email_exchanges_big.txt file.

In [None]:
import re
from pathlib import Path

path = Path('../data/email_exchanges_big.txt')
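content = path.read_text(encoding='utf-8')
# Allow dots, plus signs and hyphens in the local part and multi-level
# domains, so an address like first.last@mail.example.org is matched whole
email_pattern = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')
email_addresses = []
for line in content.splitlines():
    email_addresses.extend(email_pattern.findall(line))
print(email_addresses)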

Task 5: Find the most common words in the English language. Name your function find_most_common_words. It takes two parameters: a string or a file, and a positive integer indicating the number of words. It returns a list of tuples in descending order of frequency. Check the output.

In [None]:
from collections import Counter
from pathlib import Path

def find_most_common_words(source, n):
    """Return the n most common words in a string or file."""
    if isinstance(source, str):
        # A plain string is treated as the text itself, not as a file name
        content = source
    elif isinstance(source, Path):
        content = source.read_text(encoding='utf-8')
    else:
        raise TypeError("The first argument must be a string or a Path object.")

    words = content.split()
    word_counter = Counter()
    for word in words:
        # Clean and normalize the word
        clean_word = ''.join(char for char in word if char.isalnum()).lower()
        if clean_word:  # Ignore empty strings
            word_counter[clean_word] += 1

    return word_counter.most_common(n)

Task 6: Use the function find_most_common_words to find:

- a) The ten most frequent words used in Obama's speech
- b) The ten most frequent words used in Michelle's speech
- c) The ten most frequent words used in Trump's speech
- d) The ten most frequent words used in Melania's speech

In [None]:
from pathlib import Path

path = Path('../data/obama_speech.txt')
most_common_words = find_most_common_words(path, 10)
print("Top 10 most common words in Obama's speech:", most_common_words)

path = Path('../data/michelle_obama_speech.txt')
most_common_words = find_most_common_words(path, 10)
print("Top 10 most common words in Michelle Obama's speech:", most_common_words)

path = Path('../data/donald_speech.txt')
most_common_words = find_most_common_words(path, 10)
print("Top 10 most common words in Donald Trump's speech:", most_common_words)

path = Path('../data/melina_trump_speech.txt')
most_common_words = find_most_common_words(path, 10)
print("Top 10 most common words in Melania Trump's speech:", most_common_words)

Task 7: Write a Python application that checks the similarity between two texts. It takes two files or strings as parameters and evaluates how similar the texts are. For instance, check the similarity between the transcripts of Michelle's and Melania's speeches. You may need a few functions: one to clean the text (clean_text), one to remove support (stop) words (remove_support_words), and one to check the similarity (check_text_similarity). A list of stop words is in the data directory.
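
The task does not say how similarity should be measured. The sketch below assumes Jaccard similarity over the sets of remaining words, which is only one of several reasonable choices. `SAMPLE_STOP_WORDS` is a tiny made-up set for illustration; the real stop-word list from the data directory is not reproduced here.

```python
import re

# Assumption: a small illustrative stop-word set, not the list from the data directory
SAMPLE_STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to"}


def clean_text(text):
    """Lowercase the text and replace everything except letters, digits and whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def remove_support_words(words, stop_words=SAMPLE_STOP_WORDS):
    """Drop stop words from a list of words."""
    return [word for word in words if word not in stop_words]


def check_text_similarity(text_a, text_b, stop_words=SAMPLE_STOP_WORDS):
    """Return the Jaccard similarity (0.0 to 1.0) of the two texts' word sets."""
    words_a = set(remove_support_words(clean_text(text_a).split(), stop_words))
    words_b = set(remove_support_words(clean_text(text_b).split(), stop_words))
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


print(check_text_similarity("The cat sat in the hat.", "A cat sat on a mat!"))
```

To compare two files, their contents can be read with `Path.read_text` first and passed in as strings.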

Task 8: Find the 10 most repeated words in the romeo_and_juliet.txt file.

Task 9: Read the hacker news csv file and find out:

- a) Count the number of lines containing python or Python
- b) Count the number of lines containing JavaScript, javascript or Javascript
- c) Count the number of lines containing Java and not JavaScript
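
A sketch of the three counts using plain substring checks. The function name `count_language_mentions` and the sample lines are made up for illustration. The function accepts any iterable of lines, so an open file handle for the csv file can be passed in directly.

```python
def count_language_mentions(lines):
    """Count lines mentioning Python, JavaScript, and Java without JavaScript."""
    counts = {"python": 0, "javascript": 0, "java_only": 0}
    javascript_variants = ("JavaScript", "javascript", "Javascript")
    for line in lines:
        if "python" in line or "Python" in line:
            counts["python"] += 1
        if any(variant in line for variant in javascript_variants):
            counts["javascript"] += 1
        # Literal reading of the task: "Java" is a substring of "JavaScript",
        # so only lines containing the exact spelling "JavaScript" are excluded
        if "Java" in line and "JavaScript" not in line:
            counts["java_only"] += 1
    return counts


# Hypothetical sample rows standing in for the real file
sample_lines = [
    "1,Learning Python the hard way",
    "2,Why JavaScript is everywhere",
    "3,Java 8 streams and python scripts",
    "4,A javascript framework a day",
]
print(count_language_mentions(sample_lines))
# → {'python': 2, 'javascript': 2, 'java_only': 1}
```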