# Session 5: Processing and Organizing Annotation Files  

## Overview  
This session focuses on processing annotation files stored in a directory. The tasks include counting annotations, grouping them by month, saving the data in different formats, and organizing it for more efficient analysis. Through these exercises, you will practice file handling, datetime manipulation, and data serialization using formats like JSON and Pickle.  

---

## Exercises  

1. **Count Annotations Per Month and Year**  
   - **Task**: Count the number of annotation files for each month and year using dates extracted from filenames. Find the month with the most annotations and present the data in a sorted table.  

2. **Save Data in Multiple Formats**  
   a. **Save Data in JSON Format**  
      - Group annotations by month into a dictionary where:  
        - Each key represents a month.  
        - Each value is a list of annotation filenames corresponding to that month.  
        - Save the dictionary as a JSON file and reload it to verify the data.  
   b. **Save Data in Pickle Format**  
      - Save the same grouped dictionary in Pickle format, a binary format. Reload the file and display the data in a tabular format to confirm its accuracy.  
   c. **Enhance Data with Names and Dates**  
      - Modify the grouped dictionary so that each entry for a month contains:  
        - A list of dictionaries, where each dictionary holds the annotation's name and its date.  
      - Save the enhanced structure as a JSON file so that each filename is stored together with its parsed date.  

3. **Sort Annotations from the Second Half of 2024**  
   - Extract and sort annotations from the second half of 2024. Display the filenames and corresponding dates in chronological order.  

---
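
Every exercise relies on the same step: reading the timestamp at the start of a filename. The following is a minimal sketch of that step as a reusable helper. The names `TIMESTAMP_PATTERN` and `parse_annotation_date` are hypothetical and do not appear in the notebook; the regular expression and the `strptime` format are the ones used in the cells below.

```python
import re
from datetime import datetime
from typing import Optional

# Same pattern as the notebook cells: 8 date digits, an underscore, 6 time digits
TIMESTAMP_PATTERN = re.compile(r"(\d{8}_\d{6}).*\.txt")


def parse_annotation_date(file_name: str) -> Optional[datetime]:
    """Return the timestamp encoded at the start of file_name, or None."""
    match = TIMESTAMP_PATTERN.match(file_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:
        # The digits matched the pattern but are not a real date (e.g. February 30)
        return None


example = "20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt"
print(parse_annotation_date(example))      # 2024-01-01 17:43:01
print(parse_annotation_date("notes.txt"))  # None
```

Returning `None` for names that do not match lets a loop skip unrelated files in the directory without raising an error.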

#### Exercise 1:



In [15]:
import os
import re 
from datetime import datetime
import pandas as pd 

# Location of the annotations folder
annotations_dir = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations"

# Creating a dictionary to store annotations for each month
annotations_by_month = {}

# Going through each file in my annotations directory
for file in os.listdir(annotations_dir):
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)
    if match: 

        date_str = match.group(1)
        
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")  
        
        month_key = date.strftime("%Y-%m")
        
        if month_key not in annotations_by_month:
            annotations_by_month[month_key] = 0
        
        annotations_by_month[month_key] += 1

# Find which month has the most annotations: max() compares the keys by their counts
most_annotations_month = max(annotations_by_month, key=annotations_by_month.get)

# Creating a DataFrame from my dictionary to make the data easier to view
df_annotations = pd.DataFrame(list(annotations_by_month.items()), columns=["Month", "Count"])

# Sorting the table so that the month with the most annotations is at the top
df_annotations = df_annotations.sort_values(by="Count", ascending=False)

# Displaying the results
print(df_annotations)

print(f"Month with the most annotations: {most_annotations_month} with {annotations_by_month[most_annotations_month]} annotations")

     Month  Count
5  2024-06     52
1  2024-02     45
3  2024-04     37
4  2024-05     28
0  2024-01     27
2  2024-03     17
Month with the most annotations: 2024-06 with 52 annotations
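
The manual "initialize to 0, then add 1" pattern above can also be expressed with `collections.Counter`. This is a sketch of an alternative, not the approach taken in the notebook; the short filenames in `file_names` are hypothetical stand-ins for the result of `os.listdir(annotations_dir)`, and slicing the first 15 characters assumes every name starts with a `YYYYMMDD_HHMMSS` timestamp.

```python
from collections import Counter
from datetime import datetime

# Hypothetical file names standing in for os.listdir(annotations_dir)
file_names = [
    "20240101_174301_a.txt",
    "20240102_185527_b.txt",
    "20240602_215203_c.txt",
]

# The first 15 characters hold the timestamp (8 digits, underscore, 6 digits)
month_counts = Counter(
    datetime.strptime(name[:15], "%Y%m%d_%H%M%S").strftime("%Y-%m")
    for name in file_names
)

# most_common() returns (month, count) pairs sorted by descending count
top_month, top_count = month_counts.most_common(1)[0]
print(month_counts.most_common())
print(f"Month with the most annotations: {top_month} with {top_count} annotations")
```

Because `most_common()` already sorts by count, the same call can feed a table directly without a separate sorting step.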


#### Exercise 2a:

In [16]:
import json

# Create a dictionary where each key is a month, and the value is a list of all annotation names for that month
annotations_grouped = {}

for file in os.listdir(annotations_dir):
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)  # Extract the date part from the file name
    if match:
        date_str = match.group(1)
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")  # Convert to a datetime object
        month_key = date.strftime("%Y-%m")  # Extract the month (YYYY-MM)
        
        # Add the annotation name to the corresponding month
        if month_key not in annotations_grouped:
            annotations_grouped[month_key] = []
        annotations_grouped[month_key].append(file)

# Save the dictionary to a JSON file
json_path = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_month.json"
with open(json_path, 'w') as json_file:
    json.dump(annotations_grouped, json_file)

# Load the JSON file again to verify it works
with open(json_path, 'r') as json_file:
    loaded_annotations = json.load(json_file)

print("JSON data:")
print(loaded_annotations)


JSON data:
{'2024-01': ['20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt', '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt', '20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt', '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt', '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt', '20240110_192002_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_380_3728.txt', '20240112_1925
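
Printing the reloaded dictionary shows that the file can be read, but a direct comparison makes the verification explicit. The sketch below assumes a small hypothetical dictionary named `grouped` in the same shape as `annotations_grouped`, and writes to a temporary directory rather than to the path used in the notebook.

```python
import json
import os
import tempfile

# Hypothetical grouped data in the same shape as annotations_grouped
grouped = {
    "2024-01": ["20240101_174301_a.txt", "20240102_185527_b.txt"],
    "2024-06": ["20240602_215203_c.txt"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "annotations_month.json")

    # indent=2 keeps the file readable in a text editor
    with open(json_path, "w", encoding="utf-8") as json_file:
        json.dump(grouped, json_file, indent=2)

    with open(json_path, "r", encoding="utf-8") as json_file:
        reloaded = json.load(json_file)

# A dictionary of string keys and lists of strings survives JSON unchanged
round_trip_ok = reloaded == grouped
print(round_trip_ok)

# A per-month summary is easier to scan than the full dictionary
for month, names in reloaded.items():
    print(month, len(names))
```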

#### Exercise 2b:

In [17]:
import pickle

# Save the dictionary as a Pickle file
pickle_path = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_month.pkl"
with open(pickle_path, 'wb') as pickle_file:
    pickle.dump(annotations_grouped, pickle_file)

# Load the Pickle file to verify it works
with open(pickle_path, 'rb') as pickle_file:
    loaded_annotations_pickle = pickle.load(pickle_file)

print("Pickle data:")
print(loaded_annotations_pickle)


Pickle data:
{'2024-01': ['20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt', '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt', '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt', '20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt', '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt', '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt', '20240110_192002_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_380_3728.txt', '20240112_19
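
The task asks for a tabular display of the reloaded data. One possible way to do that with the standard library alone is sketched below; `grouped` is a hypothetical stand-in for `annotations_grouped`, and the round trip goes through `pickle.dumps`/`pickle.loads` in memory instead of a file on disk.

```python
import pickle

# Hypothetical grouped data in the same shape as annotations_grouped
grouped = {
    "2024-06": ["20240602_215203_c.txt"],
    "2024-01": ["20240101_174301_a.txt", "20240102_185527_b.txt"],
}

# Serialize to bytes and back; pickle.dump/pickle.load do the same with a binary file
payload = pickle.dumps(grouped)
reloaded = pickle.loads(payload)

# Flatten to one row per (month, file name) pair, with months in ascending order
rows = [(month, name) for month in sorted(reloaded) for name in reloaded[month]]

# Print a simple two-column table
print(f"{'Month':<8} File")
for month, name in rows:
    print(f"{month:<8} {name}")
```

The same `rows` list could also be passed to `pd.DataFrame(rows, columns=["Month", "File"])` to match the table style of Exercise 1. Note that Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.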

#### Exercise 2c:

In [18]:
# Create a dictionary where each key is a month, and the value is a list of dictionaries with 'file_name' and 'file_date' keys
monthly_annotations = {}

for file_name in os.listdir(annotations_dir):
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file_name)  # Extract the date part from the file name
    if match:
        date_string = match.group(1)
        file_date = datetime.strptime(date_string, "%Y%m%d_%H%M%S")  # Convert to a datetime object
        month_key = file_date.strftime("%Y-%m")  # Extract the month (YYYY-MM)
        
        # Create a dictionary with 'file_name' and 'file_date' for this annotation
        annotation_info = {'file_name': file_name, 'file_date': file_date}
        
        # Add the annotation details to the corresponding month
        if month_key not in monthly_annotations:
            monthly_annotations[month_key] = []
        monthly_annotations[month_key].append(annotation_info)

# Save the enhanced dictionary to a new JSON file; default=str converts each
# datetime object to a string, because JSON has no datetime type
json_output_path = r"C:\Users\selee\OneDrive\Documents\GitHub\Python4DS\S5_execises\annotations_detailes.json"
with open(json_output_path, 'w') as json_file:
    json.dump(monthly_annotations, json_file, default=str)

# Display a small sample of the in-memory annotations for verification
sample_month = '2024-06'
sample_annotations = monthly_annotations[sample_month][:5]  # First 5 annotations for the sample month

print(f"Sample annotations for {sample_month}:")
for annotation in sample_annotations:
    print(annotation)


Sample annotations for 2024-06:
{'file_name': '20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_712_3948.txt', 'file_date': datetime.datetime(2024, 6, 2, 21, 52, 3)}
{'file_name': '20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_714_3948.txt', 'file_date': datetime.datetime(2024, 6, 2, 21, 52, 3)}
{'file_name': '20240603_215226_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_248_4068.txt', 'file_date': datetime.datetime(2024, 6, 3, 21, 52, 26)}
{'file_name': '20240603_215348_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_346_3786.txt', 'file_date': datetime.datetime(2024, 6, 3, 21, 53, 48)}
{'file_name': '20240604_214955_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_594_4136.txt', 'file_date': datetime.datetime(2024, 6, 4, 21, 49, 55)}
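
The sample above shows the in-memory `datetime` objects. In the saved file, `default=str` stores each date as a string such as `"2024-06-02 21:52:03"`, so the dates come back as plain strings when the file is reloaded. The sketch below shows one way to restore them, using ISO 8601 strings and `datetime.fromisoformat`. The helper `encode_datetime` and the sample record are hypothetical and are not part of the notebook.

```python
import json
from datetime import datetime


def encode_datetime(value):
    """Fallback for json.dumps: turn datetime objects into ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Hypothetical record in the same shape as the entries of monthly_annotations
detailed = {
    "2024-06": [
        {
            "file_name": "20240602_215203_c.txt",
            "file_date": datetime(2024, 6, 2, 21, 52, 3),
        }
    ]
}

text = json.dumps(detailed, default=encode_datetime)
print(text)

# After loading, convert the date strings back into datetime objects
reloaded = json.loads(text)
for entries in reloaded.values():
    for entry in entries:
        entry["file_date"] = datetime.fromisoformat(entry["file_date"])

print(reloaded == detailed)
```

Restoring the dates matters for any later step that sorts or compares them as dates rather than as text.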


#### Exercise 3:
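
The second half of a calendar year runs from 1 July through 31 December, while the cell below starts its range at `"2024-06"`. For comparison, here is a minimal sketch of a strict July-to-December filter that compares `datetime` objects directly. The `records` list is hypothetical sample data in the same shape as the entries of `monthly_annotations`.

```python
from datetime import datetime

# Hypothetical records, deliberately out of order
records = [
    {"file_name": "20240930_101500_c.txt", "file_date": datetime(2024, 9, 30, 10, 15, 0)},
    {"file_name": "20240602_215203_a.txt", "file_date": datetime(2024, 6, 2, 21, 52, 3)},
    {"file_name": "20240701_000000_b.txt", "file_date": datetime(2024, 7, 1, 0, 0, 0)},
]

start = datetime(2024, 7, 1)  # inclusive lower bound
end = datetime(2025, 1, 1)    # exclusive upper bound

# Keep only dates inside [start, end) and sort them chronologically
second_half = sorted(
    (record for record in records if start <= record["file_date"] < end),
    key=lambda record: record["file_date"],
)

for record in second_half:
    print(f"File: {record['file_name']}, Date: {record['file_date']}")
```

With the annotation counts shown in Exercise 1, which end in 2024-06, a strict filter like this would return no files.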

In [19]:
# Initialize an empty list to store annotations from the second half of 2024
annotations_second_half_2024 = []

# Iterate through the dictionary to find months in the second half of 2024
for month, annotations_list in monthly_annotations.items():  # Loop through each month and its annotations
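    # Keep months from 2024-06 through 2024-12; comparing the keys as strings is
    # safe because they are zero-padded "YYYY-MM". Strictly, the second half of the
    # year starts at "2024-07"; June is included here.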
    if "2024-06" <= month <= "2024-12":
        # Add all annotations for this month to the list
        annotations_second_half_2024.extend(annotations_list)

# Sort the annotations by their date
sorted_annotations_2024 = sorted(annotations_second_half_2024, key=lambda x: x['file_date'])  # Sort by the 'file_date' key

# Print the sorted annotations
print("All annotations from the second half of 2024 sorted:")
for annotation in sorted_annotations_2024:
    print(f"File: {annotation['file_name']}, Date: {annotation['file_date']}")


All annotations from the second half of 2024 sorted:
File: 20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_712_3948.txt, Date: 2024-06-02 21:52:03
File: 20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_714_3948.txt, Date: 2024-06-02 21:52:03
File: 20240603_215226_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_248_4068.txt, Date: 2024-06-03 21:52:26
File: 20240603_215348_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_346_3786.txt, Date: 2024-06-03 21:53:48
File: 20240604_214955_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_594_4136.txt, Date: 2024-06-04 21:49:55
File: 20240605_212717_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_718_3608.txt, Date: 2024-06-05 21:27:17
File: 20240606_180251_SN33_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_556_4180.txt, Date: 2024-06-06 18:02:51
File: 20240607_200250_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_554_4172.txt, Date: 2024-06-07 20:02:50
File: 20240608_214614_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_248_4068.txt, Date: 2024-06-08 21:46:14
File: 20240609_19174