# Session 5  
## Basic Libraries II  

In this session, I worked with annotation files stored in a directory. I counted and grouped the annotations by month, saved the grouped data as JSON and Pickle files, and sorted the annotations by date for easier analysis. Along the way, I practiced file handling, parsing and formatting datetime objects, and data serialization with JSON and Pickle.

### Exercises  

Reusing the same annotations from the previous session, complete the following tasks using the libraries covered in this session:  

1. **Annotations by Month and Year**  
   - Count the number of annotations per month and year.  
   - Identify which month has the highest number of annotation files.  

2. **Organize Annotations by Month**  
   Create a dictionary where each **key** is a month and the corresponding **value** is a list of the annotation names whose dates fall in that month.  
   a. Save the dictionary in JSON format, then load it again to ensure the data was saved and loaded correctly.  
   b. Save the dictionary using Pickle as a binary format.  
   c. Enhance the data by modifying the dictionary so that, instead of a list of annotation names, each annotation is represented as a dictionary with the following keys:  
      - **name**: The annotation filename.  
      - **date**: The date (as a datetime object) extracted from the filename.  

3. **Sort Annotations from the Second Half of 2024**  
   - Print all the annotations from the second half of 2024 (June to December), sorted from the oldest to the newest.  



### Exercise 1

In [21]:
import os
import re 
from datetime import datetime
import pandas as pd 

# Specify the directory containing the annotation files
annotations_dir = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations"

# Initialize a dictionary to track the number of annotations per month
annotations_by_month = {}

# Loop through each file in the annotations directory
for file in sorted(os.listdir(annotations_dir)):  # Sorted, because os.listdir returns entries in arbitrary order
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)  # Extract the date from the filename using a regular expression
    if match:  # If a date is found in the filename:
        date_str = match.group(1)  # Get the date string in the format YYYYMMDD_HHMMSS
        
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")  # Convert the date string into a datetime object
        
        month_key = date.strftime("%Y-%m")  # Format the date to extract the year and month (YYYY-MM)
        
        # Add the month to the dictionary or increment its count
        if month_key not in annotations_by_month:
            annotations_by_month[month_key] = 0
        annotations_by_month[month_key] += 1

# Identify the month with the highest number of annotations
most_annotations_month = max(annotations_by_month, key=annotations_by_month.get)  # Find the month with the largest count

# Create a DataFrame from the dictionary to display the data in tabular form
df_annotations = pd.DataFrame(list(annotations_by_month.items()), columns=["Month", "Count"])

# Sort the DataFrame to show the month with the most annotations at the top
df_annotations = df_annotations.sort_values(by="Count", ascending=False)

# Print the results
print("Here’s the table of annotations per month and year, sorted by count:")
print(df_annotations)  # Display the DataFrame as a table

# Highlight the month with the most annotations
print("The month with the most annotations is:")
print(f"{most_annotations_month} with {annotations_by_month[most_annotations_month]} annotations")


Here’s the table of annotations per month and year, sorted by count:
     Month  Count
5  2024-06     52
1  2024-02     45
3  2024-04     37
4  2024-05     28
0  2024-01     27
2  2024-03     17
The month with the most annotations is:
2024-06 with 52 annotations
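
As a possible alternative to the manual dictionary counting above, `collections.Counter` can do the increment and the "largest count" lookup in fewer lines. This is only a sketch: the filenames in `sample_files` are hypothetical stand-ins that follow the same `YYYYMMDD_HHMMSS` prefix convention, not the real annotation files.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical filenames (assumption: same YYYYMMDD_HHMMSS prefix as the real annotations)
sample_files = [
    "20240101_174301_example_a.txt",
    "20240115_101500_example_b.txt",
    "20240602_215203_example_c.txt",
    "notes.md",  # ignored: no timestamp prefix
]

# Same pattern as in the exercise, compiled once
pattern = re.compile(r"(\d{8}_\d{6}).*\.txt")

month_counts = Counter()
for name in sample_files:
    match = pattern.match(name)
    if match:
        date = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
        month_counts[date.strftime("%Y-%m")] += 1  # missing keys start at 0

# most_common(1) returns a list holding the single (key, count) pair with the highest count
top_month, top_count = month_counts.most_common(1)[0]
print(month_counts)
print(f"{top_month} with {top_count} annotations")
```

With `Counter`, the `if month_key not in ...` check is no longer needed, because a missing key is treated as zero.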


### Exercise 2

#### Ex. 2a

In [28]:
import json

# Create a dictionary where each key is a month, and the value is a list of all annotation names for that month
annotations_grouped = {}

for file in sorted(os.listdir(annotations_dir)):  # Sorted, because os.listdir returns entries in arbitrary order
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)  # Extract the date part from the file name
    if match:
        date_str = match.group(1)
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")  # Convert to a datetime object
        month_key = date.strftime("%Y-%m")  # Extract the month (YYYY-MM)
        
        # Add the annotation name to the corresponding month
        if month_key not in annotations_grouped:
            annotations_grouped[month_key] = []
        annotations_grouped[month_key].append(file)

# Save the dictionary to a JSON file
json_path = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_month.json"
with open(json_path, 'w') as json_file:
    json.dump(annotations_grouped, json_file)

# Load the JSON file again to verify it works
with open(json_path, 'r') as json_file:
    loaded_annotations = json.load(json_file)

# Print to confirm the data is loaded correctly
print(f"JSON location: {json_path}")
print("JSON data:")
print(loaded_annotations)


JSON location: C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_month.json
JSON data:
{'2024-01': ['20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt', '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt', '20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt', '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt', '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_55
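
Printing the loaded dictionary shows that something was read back, but it does not prove that the data survived unchanged. One way to check this, sketched here with a small hypothetical dictionary (`sample_grouped`) and a temporary directory instead of the real path, is to compare the reloaded object with the original.

```python
import json
import os
import tempfile

# Hypothetical stand-in for annotations_grouped (month -> list of filenames)
sample_grouped = {
    "2024-01": ["20240101_174301_example_a.txt", "20240115_101500_example_b.txt"],
    "2024-06": ["20240602_215203_example_c.txt"],
}

# Write and read inside a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "annotations_month.json")

    with open(sample_path, "w", encoding="utf-8") as json_file:
        json.dump(sample_grouped, json_file, indent=2)  # indent makes the file human-readable

    with open(sample_path, "r", encoding="utf-8") as json_file:
        reloaded = json.load(json_file)

# Dictionaries of strings and lists compare equal when keys and values match
round_trip_ok = reloaded == sample_grouped
print(f"Round trip preserved the data: {round_trip_ok}")
```

The comparison works here because the dictionary only contains strings and lists, which JSON represents directly.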

#### Ex. 2b

In [27]:
import pickle

# Save the dictionary as a Pickle file
pickle_path = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_month.pkl"

with open(pickle_path, 'wb') as pickle_file:
    pickle.dump(annotations_grouped, pickle_file)

# Load the Pickle file again to verify it works
with open(pickle_path, 'rb') as pickle_file:
    annotations_pickle = pickle.load(pickle_file)

# Print to confirm the data is loaded correctly
print(f"Pickle location: {pickle_path}")
print("Pickle data:")
print(annotations_pickle)


Pickle location: C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_month.pkl
Pickle data:
{'2024-01': ['20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt', '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt', '20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt', '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt', '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N
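
One practical difference from JSON is that Pickle stores Python objects with their types, which becomes relevant once the values contain `datetime` objects, as in Ex. 2c. The sketch below uses a hypothetical entry (`sample_details`) and in-memory serialization with `pickle.dumps` / `pickle.loads` rather than a file. Note that the Python documentation warns against unpickling data from untrusted sources.

```python
import pickle
from datetime import datetime

# Hypothetical entry in the shape used in Ex. 2c (name + datetime)
sample_details = {
    "2024-01": [
        {"name": "20240101_174301_example_a.txt", "date": datetime(2024, 1, 1, 17, 43, 1)},
    ],
}

# Serialize to bytes and back without touching the file system
blob = pickle.dumps(sample_details)
restored = pickle.loads(blob)  # only unpickle data you created or trust

restored_date = restored["2024-01"][0]["date"]
print(type(restored_date).__name__)  # the datetime type survives the round trip
print(restored == sample_details)
```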

#### Ex. 2c

In [26]:
# Create a dictionary where each key is a month, and the value is a list of dictionaries with 'name' and 'date' keys
annotations_grouped_with_details = {}

for file in sorted(os.listdir(annotations_dir)):  # Sorted, because os.listdir returns entries in arbitrary order
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)  # Extract the date part from the file name
    if match:
        date_str = match.group(1)
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")  # Convert to a datetime object
        month_key = date.strftime("%Y-%m")  # Extract the month (YYYY-MM)
        
        # Create a dictionary with 'name' and 'date' for this annotation
        annotation_details = {'name': file, 'date': date}
        
        # Add the annotation details to the corresponding month
        if month_key not in annotations_grouped_with_details:
            annotations_grouped_with_details[month_key] = []
        annotations_grouped_with_details[month_key].append(annotation_details)

# Save the modified dictionary to a new JSON file; default=str converts the datetime objects to strings, because JSON has no datetime type
json_path_nd = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_name_date.json"

with open(json_path_nd, 'w') as json_file:
    json.dump(annotations_grouped_with_details, json_file, default=str)

print(f"Detailed JSON location: {json_path_nd}")
print("Example grouped annotations:")
for month, annotations in list(annotations_grouped_with_details.items())[:1]:  # Show only one month
    print(f"{month}: {annotations}")


Detailed JSON location: C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_name_date.json
Example grouped annotations:
2024-01: [{'name': '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', 'date': datetime.datetime(2024, 1, 1, 17, 43, 1)}, {'name': '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt', 'date': datetime.datetime(2024, 1, 1, 17, 43, 1)}, {'name': '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt', 'date': datetime.datetime(2024, 1, 1, 19, 28, 56)}, {'name': '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', 'date': datetime.datetime(2024, 1, 1, 19, 28, 56)}, {'name': '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt', 'date': datetime.datetime(2024, 1, 1, 19, 28, 56)}, {'name': '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt', 'date': datetime.datetime(2024, 1, 1, 21, 36, 1)}, {'name': '20240101_213601_SN31_QUICKVIEW
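
Because `default=str` writes each date as a plain string, loading the JSON file back does not return `datetime` objects. A minimal sketch of one way to restore them, assuming the strings have the form produced by `str(datetime)` (for example `2024-01-01 17:43:01`), is to convert them with `datetime.fromisoformat` after loading. The entry in `sample_details` is hypothetical.

```python
import json
from datetime import datetime

# Hypothetical entry in the same shape as annotations_grouped_with_details
sample_details = {
    "2024-01": [
        {"name": "20240101_174301_example_a.txt", "date": datetime(2024, 1, 1, 17, 43, 1)},
    ],
}

# default=str is called for objects json cannot serialize, here the datetime values
encoded = json.dumps(sample_details, default=str)
decoded = json.loads(encoded)

# After loading, the date is a string, not a datetime
raw_date = decoded["2024-01"][0]["date"]
print(repr(raw_date))

# Convert the strings back to datetime objects in place
for entries in decoded.values():
    for entry in entries:
        entry["date"] = datetime.fromisoformat(entry["date"])

print(decoded == sample_details)
```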

### Exercise 3

In [30]:
# Create an empty list to store annotations from the second half of 2024
second_half_2024 = []

# Iterate over each month and its annotations to find the months in the second half of 2024
for month, annotations in annotations_grouped_with_details.items():
    # Check if the month falls between June and December 2024 (zero-padded "YYYY-MM" strings compare in chronological order)
    if "2024-06" <= month <= "2024-12":
        # Add all annotations for this month to the list
        second_half_2024.extend(annotations)

# Sort the annotations by date
second_half_2024_sorted = sorted(second_half_2024, key=lambda x: x['date'])  # Sort by the 'date' key

# Print the sorted annotations
print("Here are all the annotations from the second half of 2024, sorted by date:")
for annotation in second_half_2024_sorted:
    print(f"File: {annotation['name']}, Date: {annotation['date']}")

Here are all the annotations from the second half of 2024, sorted by date:
File: 20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_712_3948.txt, Date: 2024-06-02 21:52:03
File: 20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_714_3948.txt, Date: 2024-06-02 21:52:03
File: 20240603_215226_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_248_4068.txt, Date: 2024-06-03 21:52:26
File: 20240603_215348_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_346_3786.txt, Date: 2024-06-03 21:53:48
File: 20240604_214955_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_594_4136.txt, Date: 2024-06-04 21:49:55
File: 20240605_212717_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_718_3608.txt, Date: 2024-06-05 21:27:17
File: 20240606_180251_SN33_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_556_4180.txt, Date: 2024-06-06 18:02:51
File: 20240607_200250_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_554_4172.txt, Date: 2024-06-07 20:02:50
File: 20240608_214614_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_248_4068.txt, Date: 2024-06-08 21:46:1
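
The month filter above compares the `YYYY-MM` keys as strings. An alternative, sketched here with hypothetical entries, is to filter on the `date` values themselves, which does not depend on how the dictionary keys are formatted. The June start follows the exercise statement, and the names `range_start` and `range_end` are illustrative.

```python
from datetime import datetime

# Hypothetical data in the same shape as annotations_grouped_with_details
sample_details = {
    "2024-05": [
        {"name": "20240520_101500_example_a.txt", "date": datetime(2024, 5, 20, 10, 15, 0)},
    ],
    "2024-06": [
        {"name": "20240603_215226_example_c.txt", "date": datetime(2024, 6, 3, 21, 52, 26)},
        {"name": "20240602_215203_example_b.txt", "date": datetime(2024, 6, 2, 21, 52, 3)},
    ],
}

# Half-open interval: from 1 June 2024 up to, but not including, 1 January 2025
range_start = datetime(2024, 6, 1)
range_end = datetime(2025, 1, 1)

# Flatten all months, keep the entries inside the interval, and sort oldest first
selected = sorted(
    (
        entry
        for entries in sample_details.values()
        for entry in entries
        if range_start <= entry["date"] < range_end
    ),
    key=lambda entry: entry["date"],
)

for entry in selected:
    print(f"File: {entry['name']}, Date: {entry['date']}")
```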