## Session 5 Assignment

Reusing the same annotations we worked with in the previous session, answer the following items using the libraries we saw today:

1. How many annotations do you have per month and per year? Which month has the most annotation files?
2. Create a dictionary where each **key** is a month and the corresponding **value** is a list of all the annotation names whose date falls in that month.
    a. Save it in JSON format, and load it again to check that everything is OK.
    b. Save it again, this time using Pickle.
    c. Instead of storing a list of annotation names for each month, create a dictionary for each annotation with the keys `name` and `date` (a datetime object).
3. Print all the annotations from the second half of 2024, from the oldest to the newest.

In [1]:
import re
import os
import glob
from datetime import datetime
import numpy as np

annotations = glob.glob('../Session_5/annotations/*.txt')
annotations

['../Session_5/annotations\\20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt',
 '../Session_5/annotations\\20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt',
 '../Session_5/annotations\\20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt',
 '../Session_5/annotations\\20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt',
 '../Session_5/annotations\\20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt',
 '../Session_5/annotations\\20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt',
 '../Session_5/annotations\\20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt',
 '../Session_5/annotations\\20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt',
 '../Session_5/annotations\\20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt',
 '../Session_5/annotations\\20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3

In [2]:
pattern = r'(\d{8})_(\d{6})_SN(\d+)_QUICKVIEW_VISUAL_([\d_]+)_([A-Za-z0-9\-_.]+)\.txt'

for annotation in annotations:

    # extract the file name
    filename = os.path.basename(annotation)
    
    # Search and extract values
    match = re.match(pattern, filename)
    if match:
        date, time, satellite_number, version, unique_region = match.groups()
        print(f"Date: {date}; Time: {time}; SN: {satellite_number}; ver: {version}; region: {unique_region}")
    else:
        print(f"Could not match {filename}")

Date: 20240101; Time: 174301; SN: 33; ver: 1_1_10; region: SATL-2KM-11N_404_3770
Date: 20240101; Time: 174301; SN: 33; ver: 1_1_10; region: SATL-2KM-11N_404_3772
Date: 20240101; Time: 192856; SN: 24; ver: 1_1_10; region: SATL-2KM-10N_552_4162
Date: 20240101; Time: 192856; SN: 24; ver: 1_1_10; region: SATL-2KM-10N_552_4164
Date: 20240101; Time: 192856; SN: 24; ver: 1_1_10; region: SATL-2KM-10N_554_4162
Date: 20240101; Time: 213601; SN: 31; ver: 1_1_10; region: SATL-2KM-11N_392_3740
Date: 20240101; Time: 213601; SN: 31; ver: 1_1_10; region: SATL-2KM-11N_392_3742
Date: 20240101; Time: 213601; SN: 31; ver: 1_1_10; region: SATL-2KM-11N_396_3752
Date: 20240102; Time: 185527; SN: 27; ver: 1_1_10; region: SATL-2KM-11N_740_3850
Date: 20240102; Time: 185605; SN: 27; ver: 1_1_10; region: SATL-2KM-11N_690_3572
Date: 20240102; Time: 185954; SN: 24; ver: 1_1_10; region: SATL-2KM-11N_414_3786
Date: 20240104; Time: 220339; SN: 31; ver: 1_1_10; region: SATL-2KM-10N_556_4178
Date: 20240110; Time: 192002

### Exercise 1

How many annotations do you have per month and per year? Which month has the most annotation files?

In [3]:
# Dictionaries to track annotations per month and per year
monthly_counts = {}
yearly_counts = {}

In [4]:
# Loop over the annotations and count them per month and per year

for annotation in annotations:
    # Extract the file name
    filename = os.path.basename(annotation)
    
    # Search and extract values
    match = re.match(pattern, filename)
    if match:
        date_str, _, _, _, _ = match.groups()
        
        # Parse 'YYYYMMDD' and extract the month ('MM') and the year ('YYYY')
        try:
            date_obj = datetime.strptime(date_str, '%Y%m%d')
            month = date_obj.strftime('%m')
            year = date_obj.strftime('%Y')
            
            # Update monthly counts
            if month in monthly_counts:
                monthly_counts[month] += 1
            else:
                monthly_counts[month] = 1
            
            # Update yearly counts
            if year in yearly_counts:
                yearly_counts[year] += 1
            else:
                yearly_counts[year] = 1
        except ValueError:
            print(f"Invalid date format in {filename}")
    else:
        print(f"Could not match {filename}")

Could not match 20240405_183824_409694_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_736_3716.txt
Could not match 20240407_190149_742846_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4028.txt
Could not match 20240408_211552_958249_MS_NS29_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_734_3742.txt
Could not match 20240410_214305_399233_MS_NS43_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_380_3764.txt
Could not match 20240410_214321_024179_MS_NS30_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_296_3786.txt
Could not match 20240412_052750_556466_MS_NS29_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-51N_688_4420.txt
Could not match 20240412_191539_377035_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4038.txt
Could not match 20240412_191539_631044_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4036.txt
Could not match 20240412_191549_672087_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_240_3966.txt
Could not match 20240417_215406_715231_MS_NS43_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-10N_740_4446.txt
Could not match 20240418_213446_163074_M
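The file names reported as unmatched above appear to follow a second naming scheme: an extra six-digit field after the time, and `MS_NS<n>` where the earlier files have `SN<n>`. They are therefore left out of the counts below. A minimal sketch of a more tolerant pattern follows. The name `flexible_pattern` is hypothetical, and the meaning of the extra six-digit field is not stated in the data, so it is simply captured as an optional group.

```python
import re

# Hypothetical broader pattern (not the one used in this notebook):
# - optional extra six-digit field after the time
# - satellite tag written either as "SN<n>" or as "MS_NS<n>"
flexible_pattern = re.compile(
    r'(\d{8})_(\d{6})(?:_(\d{6}))?_(?:MS_NS|SN)(\d+)'
    r'_QUICKVIEW_VISUAL_([\d_]+)_([A-Za-z0-9\-_.]+)\.txt'
)

# One file name of each scheme, copied from the outputs above
old_style = '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt'
new_style = '20240405_183824_409694_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_736_3716.txt'

old_groups = flexible_pattern.match(old_style).groups()
new_groups = flexible_pattern.match(new_style).groups()

print(old_groups)
print(new_groups)
```

With a pattern like this, `match.groups()` returns six values instead of five, so the unpacking in the loops would need one more placeholder.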

In [5]:
# Display the monthly counts
print("Monthly Counts:")
for month, count in sorted(monthly_counts.items()):
    print(f"{month}: {count} annotations")

Monthly Counts:
01: 27 annotations
02: 45 annotations
03: 17 annotations
04: 25 annotations
05: 28 annotations
06: 52 annotations


In [6]:
# Find the month with the most annotations
most_annotations_month = max(monthly_counts.items(), key=lambda x: x[1])
print(f"\nMonth with the most annotations: {most_annotations_month[0]} with {most_annotations_month[1]} annotations")


Month with the most annotations: 06 with 52 annotations


In [7]:
# Display the yearly counts
print("\nYearly Counts:")
for year, count in sorted(yearly_counts.items()):
    print(f"{year}: {count} annotations")


Yearly Counts:
2024: 194 annotations


In [8]:
# Find the year with the most annotations
most_annotations_year = max(yearly_counts.items(), key=lambda x: x[1])
print(f"Year with the most annotations: {most_annotations_year[0]} with {most_annotations_year[1]} annotations")

Year with the most annotations: 2024 with 194 annotations
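The cells above key the monthly counts by month number only, which would merge, for example, January 2023 with January 2024 if the data ever spanned several years. As a sketch of an alternative, `collections.Counter` can count on a combined `'YYYY-MM'` key. The date strings in `sample_dates` are made up for illustration and are not taken from the annotation folder.

```python
from collections import Counter
from datetime import datetime

# Made-up date strings in the same 'YYYYMMDD' layout as the file names
sample_dates = ['20240101', '20240102', '20240210', '20230615']

# Count per (year, month) by formatting each date as 'YYYY-MM'
counts = Counter(
    datetime.strptime(d, '%Y%m%d').strftime('%Y-%m') for d in sample_dates
)

# most_common(1) returns a list with the single most frequent (key, count) pair
busiest_month, busiest_count = counts.most_common(1)[0]

for year_month, count in sorted(counts.items()):
    print(f"{year_month}: {count} annotations")
print(f"Busiest: {busiest_month} with {busiest_count} annotations")
```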


### Exercise 2

Create a dictionary where each **key** is a month and the corresponding **value** is a list of all the annotation names whose date falls in that month.

    a. Save it in JSON format, and load it again to check that everything is OK.
    b. Save it again, this time using Pickle.
    c. Instead of storing a list of annotation names for each month, create a dictionary for each annotation with the keys: name and date (a datetime object).

For easy reference:


pattern = r'(\d{8})_(\d{6})_SN(\d+)_QUICKVIEW_VISUAL_([\d_]+)_([A-Za-z0-9\-_.]+)\.txt'


annotations = glob.glob('../Session_5/annotations/*.txt')

In [10]:

# Dictionary to group annotations by month
annotations_by_month = {}

# Run a for loop to iterate through the annotations
for annotation in annotations:
    
    filename = os.path.basename(annotation)
    match = re.match(pattern, filename)
    
    if match:
        date_str, _, _, _, _ = match.groups()
        
        date_obj = datetime.strptime(date_str, '%Y%m%d')
        month = date_obj.strftime('%m')  # Format as 'MM'
            
        if month not in annotations_by_month:
            annotations_by_month[month] = [] # Initialize the month key if it doesn't exist
        annotations_by_month[month].append(filename) # Add the annotation name to the list for this month

    else:
        print(f"Could not match {filename}")

annotations_by_month

Could not match 20240405_183824_409694_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_736_3716.txt
Could not match 20240407_190149_742846_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4028.txt
Could not match 20240408_211552_958249_MS_NS29_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_734_3742.txt
Could not match 20240410_214305_399233_MS_NS43_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_380_3764.txt
Could not match 20240410_214321_024179_MS_NS30_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_296_3786.txt
Could not match 20240412_052750_556466_MS_NS29_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-51N_688_4420.txt
Could not match 20240412_191539_377035_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4038.txt
Could not match 20240412_191539_631044_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4036.txt
Could not match 20240412_191549_672087_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_240_3966.txt
Could not match 20240417_215406_715231_MS_NS43_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-10N_740_4446.txt
Could not match 20240418_213446_163074_M

{'01': ['20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt',
  '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt',
  '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt',
  '20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt',
  '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt',
  '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt',
  '20240110_192002_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_380_3728.txt',
  '202
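The "create the list if the key is missing" step can also be delegated to `collections.defaultdict`. The sketch below is one possible variant, not the approach used in this notebook. It assumes the date is always the first eight characters of the file name, and the third entry in `sample_names` is a made-up name for illustration.

```python
from collections import defaultdict
from datetime import datetime

sample_names = [
    '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt',
    '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt',
    '20240215_120000_SN27_QUICKVIEW_VISUAL_1_1_10_EXAMPLE_0_0.txt',  # hypothetical name
]

# defaultdict(list) creates an empty list the first time a month is seen
grouped = defaultdict(list)
for name in sample_names:
    # Assumption: the first 8 characters are the date in 'YYYYMMDD' form
    month = datetime.strptime(name[:8], '%Y%m%d').strftime('%m')
    grouped[month].append(name)

# Convert back to a plain dict before saving or comparing
grouped = dict(grouped)
print({month: len(names) for month, names in grouped.items()})
```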

#### Part 2a: Save the annotations-by-month dictionary in JSON format, and load it again to check that everything is OK.

In [11]:
import json

# Step 1: Save to JSON file
with open('annotations_by_month.json', 'w') as json_file:
    json.dump(annotations_by_month, json_file)

# Step 2: Load from JSON file
with open('annotations_by_month.json', 'r') as json_file:
    loaded_annotations = json.load(json_file)

loaded_annotations

{'01': ['20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt',
  '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt',
  '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt',
  '20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt',
  '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt',
  '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt',
  '20240110_192002_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_380_3728.txt',
  '202

In [12]:
# Check if the loaded data matches the original dictionary
assert loaded_annotations == annotations_by_month, "Loaded data does not match the original"
print("Verification successful: Loaded data matches the original dictionary.")

Verification successful: Loaded data matches the original dictionary.
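The equality check passes here partly because the month keys are strings such as `'01'`. JSON object keys are always strings, so a dictionary with integer keys would not survive the round trip unchanged. A small sketch of that behaviour, using a hypothetical dictionary `counts_by_month_number` with the January and February counts from Exercise 1:

```python
import json

# Hypothetical variant with integer month keys
counts_by_month_number = {1: 27, 2: 45}

# Serialize to a JSON string and parse it back
round_tripped = json.loads(json.dumps(counts_by_month_number))

keys_after = list(round_tripped.keys())
same_as_original = round_tripped == counts_by_month_number

print(keys_after)
print(same_as_original)
```

Keeping the keys as strings, as the notebook does, avoids this mismatch.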


#### Part 2b: Save the annotations-by-month dictionary using Pickle.

In [13]:
import pickle

In [14]:
# Step 1: Save annotations_by_month to a Pickle file
with open('annotations_by_month.pkl', 'wb') as pickle_file:
    pickle.dump(annotations_by_month, pickle_file)

# Step 2: Load annotations_by_month from Pickle file
with open('annotations_by_month.pkl', 'rb') as pickle_file:
    loaded_annotations_as_pickle = pickle.load(pickle_file)

loaded_annotations_as_pickle

{'01': ['20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt',
  '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt',
  '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt',
  '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_396_3752.txt',
  '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt',
  '20240102_185605_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_690_3572.txt',
  '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt',
  '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt',
  '20240110_192002_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_380_3728.txt',
  '202

In [15]:
# Check if the loaded data matches the original dictionary
assert loaded_annotations_as_pickle == annotations_by_month, "Loaded data does not match the original"
print("Verification successful: Loaded data matches the original dictionary.")

Verification successful: Loaded data matches the original dictionary.


#### Part 2c: Instead of a list of names, create for each annotation a dictionary with keys: name and date.

In [22]:
# Instead of a list of file names per month, store a list of dictionaries with 'name' and 'date' per month

annotation_month_dict = {}

for annotation in annotations:
    filename = os.path.basename(annotation)

    match = re.match(pattern, filename)
    if match:
        date_str, _, _, _, _ = match.groups()

        date_obj = datetime.strptime(date_str, '%Y%m%d')
        month = date_obj.strftime('%m')
        
        if month not in annotation_month_dict:
            annotation_month_dict[month] = []
        
        # Add a dictionary with 'name' and 'date' to the list for this month
        annotation_month_dict[month].append({
            'name': filename,
            'date': date_obj
        })
    
    else:
        print(f"Could not match filename: {filename}")

annotation_month_dict

Could not match filename: 20240405_183824_409694_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_736_3716.txt
Could not match filename: 20240407_190149_742846_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4028.txt
Could not match filename: 20240408_211552_958249_MS_NS29_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_734_3742.txt
Could not match filename: 20240410_214305_399233_MS_NS43_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_380_3764.txt
Could not match filename: 20240410_214321_024179_MS_NS30_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_296_3786.txt
Could not match filename: 20240412_052750_556466_MS_NS29_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-51N_688_4420.txt
Could not match filename: 20240412_191539_377035_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4038.txt
Could not match filename: 20240412_191539_631044_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_258_4036.txt
Could not match filename: 20240412_191549_672087_MS_NS24_QUICKVIEW_VISUAL_1_3_0_SATL-2KM-11N_240_3966.txt
Could not match filename: 20240417_215406_7152

{'01': [{'name': '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt',
   'date': datetime.datetime(2024, 1, 1, 0, 0)},
  {'name': '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt',
   'date': datetime.datetime(2024, 1, 1, 0, 0)},
  {'name': '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt',
   'date': datetime.datetime(2024, 1, 1, 0, 0)},
  {'name': '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt',
   'date': datetime.datetime(2024, 1, 1, 0, 0)},
  {'name': '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_554_4162.txt',
   'date': datetime.datetime(2024, 1, 1, 0, 0)},
  {'name': '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3740.txt',
   'date': datetime.datetime(2024, 1, 1, 0, 0)},
  {'name': '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_392_3742.txt',
   'date': datetime.datetime(2024, 1, 1, 0, 0)},
  {'name': '20240101_213601_SN31_QUICKVIEW_VISUAL_1_1_10
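Once the values contain `datetime` objects, the two storage formats from Parts 2a and 2b behave differently: Pickle stores them as they are, while `json.dump` raises a `TypeError` unless the dates are converted to text first. The sketch below illustrates this with a single made-up record; converting with `isoformat()` and reading back with `datetime.fromisoformat()` is one possible choice, not something the assignment prescribes.

```python
import json
import pickle
from datetime import datetime

# Made-up record with the same shape as annotation_month_dict
record = {'01': [{'name': 'example_annotation.txt', 'date': datetime(2024, 1, 1)}]}

# Pickle keeps the datetime object intact
restored = pickle.loads(pickle.dumps(record))

# Plain json.dumps cannot serialize datetime objects
try:
    json.dumps(record)
    json_failed = False
except TypeError:
    json_failed = True

# One option: convert datetimes to ISO 8601 text while dumping
as_text = json.dumps(record, default=lambda obj: obj.isoformat())
reloaded = json.loads(as_text)

# The date comes back as a string and has to be parsed again
reloaded_date = datetime.fromisoformat(reloaded['01'][0]['date'])
print(json_failed, reloaded['01'][0]['date'], reloaded_date)
```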

### Exercise 3

Print all the annotations from the second half of 2024, from the oldest to the newest.

In [26]:
# List to collect all relevant annotations from the second half of 2024
second_half_2024_annotations = []

# Filter and collect annotations from July to December 2024
for month in ["07", "08", "09", "10", "11", "12"]:
    if month in annotation_month_dict:
        # Each month maps to a list of dictionaries, so iterate over the list directly
        for annotation in annotation_month_dict[month]:
            # Check if the year is 2024
            if annotation['date'].year == 2024:
                second_half_2024_annotations.append(annotation)

# Sort the annotations by date in ascending order
second_half_2024_annotations.sort(key=lambda x: x['date'])

# Print the annotations in the sorted order
print("Annotations from the second half of 2024, oldest to newest:")
for annotation in second_half_2024_annotations:
    print(f"Date: {annotation['date'].strftime('%Y-%m-%d')}, Name: {annotation['name']}")

Annotations from the second half of 2024, oldest to newest:


Note that the annotations matched above only cover January to June 2024, so there are no annotations to sort in the second half of 2024.
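Since the real data gives an empty result, the filtering and sorting logic can be checked on a few made-up records instead. The sketch below also selects by a date range rather than by month keys, which is an alternative to the loop above; the names and dates in `records` are hypothetical.

```python
from datetime import datetime

# Made-up records with the same 'name' / 'date' keys as in Part 2c
records = [
    {'name': 'example_c.txt', 'date': datetime(2024, 11, 5, 9, 30)},
    {'name': 'example_a.txt', 'date': datetime(2024, 3, 14, 18, 0)},
    {'name': 'example_b.txt', 'date': datetime(2024, 7, 1, 0, 0)},
    {'name': 'example_d.txt', 'date': datetime(2025, 1, 1, 0, 0)},
]

# Second half of 2024: from 1 July 2024 (inclusive) to 1 January 2025 (exclusive)
start = datetime(2024, 7, 1)
end = datetime(2025, 1, 1)

# Keep only records inside the range, sorted from oldest to newest
second_half = sorted(
    (r for r in records if start <= r['date'] < end),
    key=lambda r: r['date'],
)

for r in second_half:
    print(f"Date: {r['date'].strftime('%Y-%m-%d')}, Name: {r['name']}")
```

If the order within a single day matters, the time field from the file name would also need to be parsed into the `datetime` object, since Part 2c stores the date only.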