# Python for Data Science
---

## Assignment 5

### Basic Libraries II
---

# README.md

## Overview
This project processes annotation files stored in a directory: it counts annotations per month, groups them by month, saves the grouped data in JSON and Pickle formats, and sorts a date range chronologically. The exercises cover file handling, datetime manipulation, and data serialization with JSON and Pickle.

## Exercises

### 1. Count Annotations Per Month and Year
- **Description**: Counts the number of annotation files for each month and year based on the dates in their filenames. Identifies the month with the highest number of annotations and displays the data as a sorted table.

### 2a. Save Data in JSON Format
- **Description**: Groups annotations by month into a dictionary where:
  - Each **key** is a month (e.g., `2024-06`).
  - Each **value** is a list of annotation filenames for that month.
  The data is saved in JSON format, and the JSON file is reloaded to verify correctness.

### 2b. Save Data in Pickle Format
- **Description**: Saves the same grouped dictionary as in 2a but using Pickle, a binary format. The Pickle file is reloaded and displayed in a tabular format to ensure its integrity.

### 2c. Enhance Data with Names and Dates
- **Description**: Modifies the grouped dictionary so that each entry for a month contains:
  - A list of dictionaries, where each dictionary has:
    - `name`: The annotation filename.
    - `date`: The date (as a `datetime` object) extracted from the filename.
  This detailed structure is saved in JSON format for easier future use. Because JSON has no native date type, each `datetime` is written to the file as a string.

### 3. Sort Annotations from the Second Half of 2024
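- **Description**: Extracts annotations dated from June to December 2024 and sorts them chronologically. Strictly, the second half of the year starts in July, but June is included because the dataset contains no annotations from July onwards. The output is a list showing filenames and dates in ascending order.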

---
## Exercise 1

**Description**: Counts the number of annotation files for each month and year based on the dates in their filenames. Identifies the month with the highest number of annotations and displays the data as a sorted table.

In [1]:
import os
import re 
from datetime import datetime
import pandas as pd 

# Step 1: Specifying where my annotation files are located
annotations_dir = '/Users/noor/Desktop/Python for Data Science/Week 5/session_5/annotations'

# Step 2: Creating a dictionary to store how many annotations I find for each month
annotations_by_month = {}

# Step 3: Going through each file in my annotations directory
for file in os.listdir(annotations_dir):  # This gives me the name of every file in the folder
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)     # Using a regular expression to find the date part in the file name
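    if match:
        # Group 1 holds the timestamp prefix, e.g. 20240102_185527
        date_str = match.group(1)

        # Parsing the timestamp into a datetime object
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")

        # Building the "YYYY-MM" key used to group by month
        month_key = date.strftime("%Y-%m")

        # Starting the count at zero the first time a month appears
        if month_key not in annotations_by_month:
            annotations_by_month[month_key] = 0

        annotations_by_month[month_key] += 1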

# Step 4: To find which month has the most annotations
most_annotations_month = max(annotations_by_month, key=annotations_by_month.get)        # Using the max() function to get the month with the highest count

# Step 5: Creating a table (DataFrame) from my dictionary to make the data easier to view
df_annotations = pd.DataFrame(list(annotations_by_month.items()), columns=["Month", "Count"])

# Step 6: Sorting the table so that the month with the most annotations is at the top
df_annotations = df_annotations.sort_values(by="Count", ascending=False)

# Step 7: Displaying the results
print("Here’s the table of annotations per month and year, sorted by count:")
print(df_annotations)  # This prints a neat table

# Finally, highlighting which month has the most annotations
print("\nThe month with the most annotations is:")
print(f"{most_annotations_month} with {annotations_by_month[most_annotations_month]} annotations")

Here’s the table of annotations per month and year, sorted by count:
     Month  Count
1  2024-06     52
3  2024-02     45
2  2024-04     37
5  2024-05     28
0  2024-01     27
4  2024-03     17

The month with the most annotations is:
2024-06 with 52 annotations
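
The counting step can also be sketched with `collections.Counter`, which removes the manual "initialize to zero" check. This is a minimal sketch, not the notebook's own method: the filenames in `sample_files` are hypothetical stand-ins that only mimic the timestamp prefix, and `month_key` is a helper introduced here for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical filenames that mimic the timestamp prefix used in the notebook.
sample_files = [
    "20240102_185527_SN27_example.txt",
    "20240115_213834_SN28_example.txt",
    "20240623_193704_SN27_example.txt",
    "notes.md",  # skipped: no timestamp prefix and not a .txt file
]

# Same pattern as the notebook: 8 date digits, underscore, 6 time digits.
PATTERN = re.compile(r"(\d{8}_\d{6}).*\.txt")


def month_key(filename):
    """Return 'YYYY-MM' for a matching filename, or None otherwise."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    return parsed.strftime("%Y-%m")


# Counter tallies each key; non-matching files are filtered out first.
counts = Counter(key for key in map(month_key, sample_files) if key is not None)

# most_common(1) returns a list with the single (key, count) pair that ranks first.
busiest_month, busiest_count = counts.most_common(1)[0]
print(dict(counts))
print(busiest_month, busiest_count)
```

Note that `re.match` anchors only the start of the string, so a name such as `..._x.txt.bak` would still match; `PATTERN.fullmatch` is one way to require the `.txt` suffix at the very end.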


---
## Exercise 2a

**Description**: Groups annotations by month into a dictionary where:
  - Each **key** is a month (e.g., `2024-06`).
  - Each **value** is a list of annotation filenames for that month.
  The data is saved in JSON format, and the JSON file is reloaded to verify correctness.

In [13]:
import json

# Step 1: Creating the dictionary where each key is a month, and the value is a list of all annotation names for that month
annotations_grouped = {}

for file in os.listdir(annotations_dir):
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)  # Extracting date part from the file name
    if match:
        date_str = match.group(1)
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")  # Converting to datetime
        month_key = date.strftime("%Y-%m")  # Extracting month (YYYY-MM)
        
        # Adding the annotation name to the corresponding month
        if month_key not in annotations_grouped:
            annotations_grouped[month_key] = []
        annotations_grouped[month_key].append(file)

# Step 2: Saving the dictionary to a JSON file
json_path = '/Users/noor/Desktop/Python for Data Science/Week 5/session_5/annotations_by_month.json'
with open(json_path, 'w') as json_file:
    json.dump(annotations_grouped, json_file)

# Step 3: Loading the JSON file again to verify it works
with open(json_path, 'r') as json_file:
    loaded_annotations = json.load(json_file)

# Printing to confirm the data is loaded correctly
print(f"JSON saved at: {json_path}")
print("Loaded JSON data:")
print(loaded_annotations)

JSON saved at: /Users/noor/Desktop/Python for Data Science/Week 5/session_5/annotations_by_month.json
Loaded JSON data:
{'2024-01': ['20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt', '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt', '20240115_213834_SN28_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_376_3722.txt', '20240126_173752_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_386_3722.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt', '20240130_173903_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_366_3756.txt', '20240127_190620_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_500_3600.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt', '20240127_190620_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2K

In [7]:
# Displaying the grouped annotations as a table
df_grouped = pd.DataFrame([
    {"Month": month, "Annotations": len(files), "Files": files}
    for month, files in annotations_grouped.items()
])

# Showing the DataFrame
print("Annotations grouped by month:")
print(df_grouped)

Annotations grouped by month:
     Month  Annotations                                              Files
0  2024-01           27  [20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_...
1  2024-06           52  [20240623_193704_SN27_QUICKVIEW_VISUAL_1_7_0_S...
2  2024-04           37  [20240402_184757_SN24_QUICKVIEW_VISUAL_1_2_0_S...
3  2024-02           45  [20240201_075140_SN26_QUICKVIEW_VISUAL_1_1_10_...
4  2024-03           17  [20240322_212516_SN28_QUICKVIEW_VISUAL_1_2_0_S...
5  2024-05           28  [20240506_192008_SN26_QUICKVIEW_VISUAL_1_5_0_S...
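
As a compact illustration of the group-then-round-trip idea, the sketch below uses `collections.defaultdict` for grouping and a temporary directory instead of a hard-coded path. The filenames are hypothetical, and slicing the first six characters for the month key is an assumption that holds only when every name starts with a `YYYYMMDD` prefix.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical filenames; each is assumed to start with a YYYYMMDD prefix.
sample_files = [
    "20240623_193704_SN27_example.txt",
    "20240102_185527_SN27_example.txt",
    "20240115_213834_SN28_example.txt",
]

# defaultdict(list) creates the empty list the first time a month is seen.
grouped = defaultdict(list)
for name in sample_files:
    # Characters 0-3 are the year and 4-5 the month of the prefix.
    grouped[f"{name[:4]}-{name[4:6]}"].append(name)

# Writing to a temporary directory keeps the sketch independent of any local path.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "annotations_by_month.json"
    with open(json_path, "w", encoding="utf-8") as fh:
        # indent and sort_keys make the file easier to read and to diff.
        json.dump(grouped, fh, indent=2, sort_keys=True)
    with open(json_path, "r", encoding="utf-8") as fh:
        reloaded = json.load(fh)

# Comparing against the in-memory data is a stricter check than printing it.
round_trip_ok = reloaded == dict(grouped)
print(round_trip_ok)
```

Comparing the reloaded object with the original is a small design choice: it verifies the round trip without having to read a long printed dictionary.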


---
## Exercise 2b

**Description**: Saves the same grouped dictionary as in 2a but using Pickle, a binary format. The Pickle file is reloaded and displayed in a tabular format to ensure its integrity.

In [12]:
import pickle

# Saving the same dictionary as a Pickle file
pickle_path = '/Users/noor/Desktop/Python for Data Science/Week 5/session_5/annotations_by_month.pkl'
with open(pickle_path, 'wb') as pickle_file:
    pickle.dump(annotations_grouped, pickle_file)

# Loading the Pickle file again to verify it works
with open(pickle_path, 'rb') as pickle_file:
    loaded_annotations_pickle = pickle.load(pickle_file)

# Printing to confirm the data is loaded correctly
print(f"Pickle saved at: {pickle_path}")
print("Loaded Pickle data:")
print(loaded_annotations_pickle)

Pickle saved at: /Users/noor/Desktop/Python for Data Science/Week 5/session_5/annotations_by_month.pkl
Loaded Pickle data:
{'2024-01': ['20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt', '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt', '20240115_213834_SN28_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_376_3722.txt', '20240126_173752_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_386_3722.txt', '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3772.txt', '20240130_173903_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_366_3756.txt', '20240127_190620_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_500_3600.txt', '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4162.txt', '20240127_190620_SN27_QUICKVIEW_VISUAL_1_1_10_SATL

In [10]:
# Loading the Pickle file
with open(pickle_path, 'rb') as pickle_file:
    loaded_annotations_pickle = pickle.load(pickle_file)

# Converting the Pickle data into a table format for better display
df_pickle_grouped = pd.DataFrame([
    {"Month": month, "Annotations": len(files), "Files": files}
    for month, files in loaded_annotations_pickle.items()
])

# Displaying the table
print("Annotations grouped by month (Pickle data):")
print(df_pickle_grouped)

Annotations grouped by month (Pickle data):
     Month  Annotations                                              Files
0  2024-01           27  [20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_...
1  2024-06           52  [20240623_193704_SN27_QUICKVIEW_VISUAL_1_7_0_S...
2  2024-04           37  [20240402_184757_SN24_QUICKVIEW_VISUAL_1_2_0_S...
3  2024-02           45  [20240201_075140_SN26_QUICKVIEW_VISUAL_1_1_10_...
4  2024-03           17  [20240322_212516_SN28_QUICKVIEW_VISUAL_1_2_0_S...
5  2024-05           28  [20240506_192008_SN26_QUICKVIEW_VISUAL_1_5_0_S...
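
One practical difference from JSON is that Pickle restores Python objects with their original types. The sketch below round-trips hypothetical sample data in memory with `pickle.dumps` and `pickle.loads`; the `detail` record with a `datetime` value is an assumption added only to show that the type survives.

```python
import pickle
from datetime import datetime

# Hypothetical sample data shaped like the grouped dictionary.
grouped = {
    "2024-01": ["20240102_185527_SN27_example.txt"],
    "2024-06": ["20240623_193704_SN27_example.txt"],
}
# Extra record (an assumption for illustration) holding a datetime object.
detail = {
    "name": "20240102_185527_SN27_example.txt",
    "date": datetime(2024, 1, 2, 18, 55, 27),
}

# dumps() returns bytes, which is why the files are opened in 'wb' / 'rb' mode.
payload = pickle.dumps((grouped, detail), protocol=pickle.HIGHEST_PROTOCOL)
restored_grouped, restored_detail = pickle.loads(payload)

print(type(payload).__name__)
print(restored_grouped == grouped)
print(type(restored_detail["date"]).__name__)
```

Unpickling can execute arbitrary code, so `pickle.load` should only be used on files from a trusted source.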


---
## Exercise 2c

**Description**: Modifies the grouped dictionary so that each entry for a month contains:
  - A list of dictionaries, where each dictionary has:
    - `name`: The annotation filename.
    - `date`: The date (as a `datetime` object) extracted from the filename.
  This detailed structure is saved in JSON format for easier future use. Because JSON has no native date type, each `datetime` is written to the file as a string.

In [11]:
# Step 1: Creating the dictionary where each key is a month
annotations_grouped_with_details = {}   # The value will now be a list of dictionaries with 'name' and 'date' keys

for file in os.listdir(annotations_dir):
    match = re.match(r'(\d{8}_\d{6}).*\.txt', file)  # Extracting date part from the file name
    if match:
        date_str = match.group(1)
        date = datetime.strptime(date_str, "%Y%m%d_%H%M%S")  # Converting to datetime
        month_key = date.strftime("%Y-%m")  # Extracting month (YYYY-MM)
        
        # Creating a dictionary with name and date for this annotation
        annotation_details = {'name': file, 'date': date}
        
        # Adding the annotation details to the corresponding month
        if month_key not in annotations_grouped_with_details:
            annotations_grouped_with_details[month_key] = []
        annotations_grouped_with_details[month_key].append(annotation_details)

# Saving the modified dictionary to a new JSON file; default=str turns each datetime into a string, since JSON has no datetime type
json_path_detailed = '/Users/noor/Desktop/Python for Data Science/Week 5/session_5/annotations_with_details.json'
with open(json_path_detailed, 'w') as json_file:
    json.dump(annotations_grouped_with_details, json_file, default=str)

print(f"Detailed JSON saved at: {json_path_detailed}")
print("Sample of grouped annotations with details:")
for month, annotations in list(annotations_grouped_with_details.items())[:1]:  # Showing only one month
    print(f"{month}: {annotations}")

Detailed JSON saved at: /Users/noor/Desktop/Python for Data Science/Week 5/session_5/annotations_with_details.json
Sample of grouped annotations with details:
2024-01: [{'name': '20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_740_3850.txt', 'date': datetime.datetime(2024, 1, 2, 18, 55, 27)}, {'name': '20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_404_3770.txt', 'date': datetime.datetime(2024, 1, 1, 17, 43, 1)}, {'name': '20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_552_4164.txt', 'date': datetime.datetime(2024, 1, 1, 19, 28, 56)}, {'name': '20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_414_3786.txt', 'date': datetime.datetime(2024, 1, 2, 18, 59, 54)}, {'name': '20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-10N_556_4178.txt', 'date': datetime.datetime(2024, 1, 4, 22, 3, 39)}, {'name': '20240115_213834_SN28_QUICKVIEW_VISUAL_1_1_10_SATL-2KM-11N_376_3722.txt', 'date': datetime.datetime(2024, 1, 15, 21, 38, 34)}, {'name': '20240126_173

In [15]:
# Converting the detailed dictionary into a DataFrame for better display
detailed_annotations_list = [
    {"Month": month, "Name": item["name"], "Date": item["date"]}
    for month, annotations in annotations_grouped_with_details.items()
    for item in annotations
]

# Creating a DataFrame
df_detailed = pd.DataFrame(detailed_annotations_list)

# Showing the DataFrame
print("Detailed annotations (name and date):")
print(df_detailed.head(10))  # Showing only the first 10 rows

Detailed annotations (name and date):
     Month                                               Name  \
0  2024-01  20240102_185527_SN27_QUICKVIEW_VISUAL_1_1_10_S...   
1  2024-01  20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_S...   
2  2024-01  20240101_192856_SN24_QUICKVIEW_VISUAL_1_1_10_S...   
3  2024-01  20240102_185954_SN24_QUICKVIEW_VISUAL_1_1_10_S...   
4  2024-01  20240104_220339_SN31_QUICKVIEW_VISUAL_1_1_10_S...   
5  2024-01  20240115_213834_SN28_QUICKVIEW_VISUAL_1_1_10_S...   
6  2024-01  20240126_173752_SN33_QUICKVIEW_VISUAL_1_1_10_S...   
7  2024-01  20240101_174301_SN33_QUICKVIEW_VISUAL_1_1_10_S...   
8  2024-01  20240130_173903_SN33_QUICKVIEW_VISUAL_1_1_10_S...   
9  2024-01  20240127_190620_SN27_QUICKVIEW_VISUAL_1_1_10_S...   

                 Date  
0 2024-01-02 18:55:27  
1 2024-01-01 17:43:01  
2 2024-01-01 19:28:56  
3 2024-01-02 18:59:54  
4 2024-01-04 22:03:39  
5 2024-01-15 21:38:34  
6 2024-01-26 17:37:52  
7 2024-01-01 17:43:01  
8 2024-01-30 17:39:03  
9 2024
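
Since the dates reach the JSON file as strings, a reader of that file has to convert them back. A minimal sketch of one way to make that round trip explicit is shown below. It swaps in ISO 8601 strings via `datetime.isoformat` and `datetime.fromisoformat`, which is a different choice from the notebook's `default=str`; the sample record and the `encode_datetime` helper are hypothetical.

```python
import json
from datetime import datetime

# Hypothetical record shaped like the detailed dictionary.
detailed = {
    "2024-01": [
        {
            "name": "20240102_185527_SN27_example.txt",
            "date": datetime(2024, 1, 2, 18, 55, 27),
        },
    ]
}


def encode_datetime(value):
    """Fallback for json.dumps: called only for objects it cannot serialize."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


text = json.dumps(detailed, default=encode_datetime)
reloaded = json.loads(text)

# After loading, every 'date' is a plain string; convert it back explicitly.
for entries in reloaded.values():
    for entry in entries:
        entry["date"] = datetime.fromisoformat(entry["date"])

print(text)
print(reloaded == detailed)
```

Raising `TypeError` for unknown types keeps unexpected objects from being silently stringified, which `default=str` would allow.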

---
## Exercise 3

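**Description**: Extracts annotations dated from June to December 2024 and sorts them chronologically. Strictly, the second half of the year starts in July, but June is included because the dataset contains no annotations from July onwards. The output is a list showing filenames and dates in ascending order.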

In [3]:
# Step 1: Creating an empty list to store annotations from the second half of 2024
second_half_2024 = []

# Step 2: Looping through the detailed dictionary from Exercise 2c, whose entries carry the 'name' and 'date' keys used below
for month, annotations in annotations_grouped_with_details.items():  # Going through each month and its annotations
    # Keeping months from June to December 2024 (June is included since there are no annotations from July onwards)
    # The zero-padded "YYYY-MM" keys compare correctly as plain strings
    if "2024-06" <= month <= "2024-12":
        # Add all annotations for this month to my list
        second_half_2024.extend(annotations)

# Step 3: Sorting the annotations by date
second_half_2024_sorted = sorted(second_half_2024, key=lambda x: x['date'])  # Sorting by the 'date' key

# Step 4: Printing the sorted annotations
print("Here are all the annotations from the second half of 2024, sorted by date:")
for annotation in second_half_2024_sorted:
    print(f"File: {annotation['name']}, Date: {annotation['date']}")

Here are all the annotations from the second half of 2024, sorted by date:
File: 20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_714_3948.txt, Date: 2024-06-02 21:52:03
File: 20240602_215203_SN30_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-10N_712_3948.txt, Date: 2024-06-02 21:52:03
File: 20240603_215226_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_248_4068.txt, Date: 2024-06-03 21:52:26
File: 20240603_215348_SN28_QUICKVIEW_VISUAL_1_6_0_SATL-2KM-11N_346_3786.txt, Date: 2024-06-03 21:53:48
File: 20240604_214955_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_594_4136.txt, Date: 2024-06-04 21:49:55
File: 20240605_212717_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_718_3608.txt, Date: 2024-06-05 21:27:17
File: 20240606_180251_SN33_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_556_4180.txt, Date: 2024-06-06 18:02:51
File: 20240607_200250_SN27_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-10N_554_4172.txt, Date: 2024-06-07 20:02:50
File: 20240608_214614_SN29_QUICKVIEW_VISUAL_1_7_0_SATL-2KM-11N_248_4068.txt, Date: 2024-06-08 21:46:1
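
The filter-and-sort step can also work on the `datetime` values directly rather than on the month strings. The sketch below is an illustration under assumptions: the records are hypothetical, and `from_month_onward` is a helper introduced here whose `start_month` parameter makes the June-versus-July choice explicit.

```python
from datetime import datetime

# Hypothetical records shaped like the entries of the detailed dictionary.
records = [
    {"name": "c.txt", "date": datetime(2024, 6, 3, 21, 52, 26)},
    {"name": "a.txt", "date": datetime(2024, 1, 2, 18, 55, 27)},
    {"name": "b.txt", "date": datetime(2024, 6, 2, 21, 52, 3)},
    {"name": "d.txt", "date": datetime(2024, 7, 1, 9, 0, 0)},
]


def from_month_onward(items, year, start_month):
    """Keep records of `year` dated in `start_month` or later, oldest first."""
    selected = [
        record
        for record in items
        if record["date"].year == year and record["date"].month >= start_month
    ]
    return sorted(selected, key=lambda record: record["date"])


june_onward = from_month_onward(records, 2024, 6)  # matches the notebook's range
july_onward = from_month_onward(records, 2024, 7)  # strict second half of the year

for record in june_onward:
    print(f"File: {record['name']}, Date: {record['date']}")
```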