# Methodology for Washing an Annotation Database

This document outlines the process used to clean and validate annotations in a database of captions for Task 1 and Task 2. The process ensures the quality and consistency of the annotations by applying specific rules and validations. 

## Databases Involved

Three databases are used in this process:

1. **`annotation.db`**: The original annotation database containing the captions.
2. **`metadata.db`**: The original metadata database associated with the annotations.
3. **`rejected_annotation.db`** (optional): A database to record annotations that fail the validation rules for further analysis.

## Validation Rules

The cleaning process applies the following rules to validate the annotations:

1. **Invalid Characters**:
   - Only a defined set of characters is allowed: ASCII letters (`a-z`, `A-Z`), digits, whitespace, and the punctuation `.,!?;:'"-%/()&#ʻ’“”`.
   - Any annotation containing characters outside this set, including accented letters and non-Latin scripts, is marked as invalid.

2. **Unbalanced Brackets**:
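   - Annotations must have balanced brackets, including `()`, `[]`, and `{}`.
   - If brackets are unbalanced, the annotation is marked as invalid.
   - Square and curly brackets are not in the allowed character set, so rule 1 already rejects any annotation that contains them; in practice this check matters for parentheses.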

3. **Abnormal Length**:
   - Annotations with fewer than 10 words or more than 120 words are marked as invalid.

4. **Special Case for Separators**:
   - The em dash (`—`) is accepted as a separator only when it occurs an even number of times in an annotation, so that every dash can be paired. Annotations with an odd number of em dashes are marked as invalid.

## Implementation

The implementation involves Python code that validates each annotation based on the defined rules. Below is an explanation of the key components of the code:

### Regular Expression for Invalid Characters

```python
PATTERN = r"[^a-zA-Z0-9\s.,!?;:'\"\-%\/()&#\ʻ\’\“\”]"
```

This regular expression identifies any character not in the allowed set. 
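
As a quick illustration of what the pattern accepts and rejects, the sketch below runs `re.findall` on a few strings. It spells the four non-ASCII quote characters as Unicode escapes, assuming they are U+02BB, U+2019, U+201C, and U+201D in the pattern above. The sample strings are illustrative; the Lakota name comes from one of the rejected captions shown later in this notebook.

```python
import re

# Assumed equivalent to PATTERN above, with the non-ASCII quotes written
# as escapes: U+02BB, U+2019, U+201C, U+201D.
PATTERN = "[^a-zA-Z0-9\\s.,!?;:'\"\\-%/()&#\u02bb\u2019\u201c\u201d]"

plain = "A small pond covers about 10% of the scene."
lakota = "Mni wak\u2019\u00e1\u014b"   # accented letters are outside a-z
left_quote = "\u2018wood\u2019"        # U+2018 is not in the allowed set

print(re.findall(PATTERN, plain))       # no disallowed characters
print(re.findall(PATTERN, lakota))      # the two non-ASCII letters
print(re.findall(PATTERN, left_quote))  # only the opening quote
```

Under that assumption, the left single quotation mark U+2018 is rejected while U+02BB is accepted, which may or may not be what was intended.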

### Function to Check Balanced Brackets

```python
def find_unbalanced_brackets(text):
    stack = {'(': 0, '[': 0, '{': 0}
    for ch in text:
        if ch in stack:
            stack[ch] += 1
        elif ch in ')]}':
            if ch == ')' and stack['('] > 0:
                stack['('] -= 1
            elif ch == ']' and stack['['] > 0:
                stack['['] -= 1
            elif ch == '}' and stack['{'] > 0:
                stack['{'] -= 1
            else:
                return True
    return not all(v == 0 for v in stack.values())
```

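This function returns `True` when the brackets in an annotation are unbalanced. It keeps one counter per bracket type (the dictionary is named `stack`, but it only counts), so it catches a closing bracket with no earlier opener and any bracket left open, but not interleaved pairs such as `([)]`.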
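
For comparison, here is a minimal sketch of a stricter, stack-based check that also rejects interleaved pairs. The name `has_misnested_brackets` is hypothetical and this variant is not used in the notebook. Since square and curly brackets are already rejected by the character rule, the difference would only matter if the allowed character set were widened to include them.

```python
def has_misnested_brackets(text):
    """Return True if brackets are unbalanced or closed in the wrong order."""
    closer_to_opener = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)
        elif ch in closer_to_opener:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != closer_to_opener[ch]:
                return True
    # Anything left on the stack was opened but never closed.
    return bool(stack)


print(has_misnested_brackets("(a [b] c)"))  # False
print(has_misnested_brackets("([)]"))       # True
```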

### Function to Validate Annotations

```python
import re

def check_validity(pattern, text):
    # invalid characters: every disallowed character must be an em dash ('—'),
    # and em dashes must come in pairs (even count) to count as separators
    matches = re.findall(pattern, text)
    dash_count = matches.count('—')
    is_valid = dash_count == len(matches) and dash_count % 2 == 0

    # unbalanced brackets
    is_valid = is_valid and not find_unbalanced_brackets(text)

    # abnormal length
    is_valid = is_valid and (10 <= len(text.split()) <= 120)

    return is_valid
```

This function evaluates each annotation against the validation rules.
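
When analysing rejected annotations it can help to know which rule failed. The sketch below is a hypothetical helper, `rejection_reasons`, that applies the four rules as stated in the Validation Rules section and returns the names of the rules that fail. It assumes the separator is the em dash U+2014 and the non-ASCII quotes are U+02BB, U+2019, U+201C, and U+201D; the reason labels are made up for illustration.

```python
import re

# Assumed equivalent to PATTERN above (non-ASCII quotes as escapes).
PATTERN = "[^a-zA-Z0-9\\s.,!?;:'\"\\-%/()&#\u02bb\u2019\u201c\u201d]"
EM_DASH = "\u2014"


def find_unbalanced_brackets(text):
    # Same counter-based logic as the function shown earlier.
    counts = {'(': 0, '[': 0, '{': 0}
    opener_of = {')': '(', ']': '[', '}': '{'}
    for ch in text:
        if ch in counts:
            counts[ch] += 1
        elif ch in opener_of:
            if counts[opener_of[ch]] == 0:
                return True
            counts[opener_of[ch]] -= 1
    return any(v != 0 for v in counts.values())


def rejection_reasons(text, min_words=10, max_words=120):
    """Return the list of rules an annotation fails (empty if it is valid)."""
    reasons = []
    disallowed = re.findall(PATTERN, text)
    dash_count = disallowed.count(EM_DASH)
    if len(disallowed) > dash_count:
        reasons.append("invalid_characters")
    if dash_count % 2 == 1:
        reasons.append("unpaired_em_dash")
    if find_unbalanced_brackets(text):
        reasons.append("unbalanced_brackets")
    if not (min_words <= len(text.split()) <= max_words):
        reasons.append("abnormal_length")
    return reasons


truncated = "Tyrone Mine dominates, a large rectangular quarr"
print(rejection_reasons(truncated))  # ['abnormal_length']
```

An annotation is valid exactly when the returned list is empty, so the list could be stored alongside each rejected row.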

### Application to Database

The validation is applied to the `ANNOTATION` column of the `annotation` table, loaded from `annotation.db` into the DataFrame `annotation_df`:

```python
annotation_df['is_valid'] = annotation_df['ANNOTATION'].apply(lambda x: check_validity(PATTERN, x))
```

Annotations marked as valid (`True`) are retained in the cleaned database. Those marked as invalid (`False`) are deleted from `annotation.db` and, if `rejected_annotation.db` exists, copied there first.

In [1]:
from tqdm import tqdm
import os
import pandas as pd
import sqlite3
import re

tqdm.pandas()

PATTERN = r"[^a-zA-Z0-9\s.,!?;:'\"\-%\/()&#\ʻ\’\“\”]"

ANNOTATION_DB_PATH = '../database/annotation.db'
METADATA_DB_PATH = '../database/metadata.db'
REJECTED_ANNOTATION_DB_PATH = '../database/rejected_annotation.db'


def find_unbalanced_brackets(text):
    stack = {'(': 0, '[': 0, '{': 0}
    for ch in text:
        if ch in stack:
            stack[ch] += 1
        elif ch in ')]}':
            if ch == ')' and stack['('] > 0:
                stack['('] -= 1
            elif ch == ']' and stack['['] > 0:
                stack['['] -= 1
            elif ch == '}' and stack['{'] > 0:
                stack['{'] -= 1
            else:
                return True
    
    if all(v == 0 for v in stack.values()):
        return False
    else:
        return True

def check_validity(pattern, text):
    # invalid characters: every disallowed character must be an em dash ('—'),
    # and em dashes must come in pairs (even count) to count as separators
    matches = re.findall(pattern, text)
    dash_count = matches.count('—')
    is_valid = dash_count == len(matches) and dash_count % 2 == 0

    # unbalanced brackets
    is_valid = is_valid and not find_unbalanced_brackets(text)

    # abnormal length
    is_valid = is_valid and (10 <= len(text.split()) <= 120)

    return is_valid

# Connect to annotation and metadata databases
conn_annotation = sqlite3.connect(ANNOTATION_DB_PATH)
conn_metadata = sqlite3.connect(METADATA_DB_PATH)

# Connect to rejected annotation database
conn_rejected_annotation = sqlite3.connect(REJECTED_ANNOTATION_DB_PATH) \
                            if os.path.exists(REJECTED_ANNOTATION_DB_PATH) \
                            else None
                            
# load the annotation table into a pandas dataframe
# Replace the prompt number with the actual prompt ids for task 1 and 2
annotation_df = pd.read_sql_query("SELECT * FROM annotation WHERE PROMPT IN (1, 5)", conn_annotation)

# check the validity of the annotations
annotation_df['is_valid'] = annotation_df['ANNOTATION'].progress_apply(lambda x: check_validity(PATTERN, x))

100%|██████████| 1378740/1378740 [00:26<00:00, 51309.39it/s]


In [2]:
# Set display options to show all content
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)


samples = annotation_df[annotation_df['is_valid']==False].sample(10)

for i, sample in samples.iterrows():
    print(f"Sample {i+1}:")
    print(f"Annotation: {sample['ANNOTATION']}")

samples

Sample 1312714:
Annotation: Nestled in the upper central area, a small, irregular water body occupies about one-tenth of the scene, its jagged contours suggesting a pond or perhaps a ?>

If there's any uncertainty about the type of water body based on provided tags or context, you can use "possibly" or "likely" to introduce speculation.
Sample 1352240:
Annotation: Centered, occupying over four-fifths of the view, Devils Lake stretches in an irregular shape, named in local English as Devils Lake and in Lakota as Mni wak’áŋ. This expansive saltwater lake, surrounded by land, does not experience tidal influences and retains water year-round. Its sinuous boundary extends beyond the scene, suggesting a substantial inland body of water surrounded by a mix of shoreline vegetation and possibly nearby structures or towns.
Sample 1356473:
Annotation: Centering the scene, an expansive, irregular lake stretches across nearly four-fifths of the view, its expansive waters named Lake George, also kno

Unnamed: 0,ID,PATCH,ANNOTATION,NUM_ELEMS,ANNOTATOR,PROMPT,CREATED_AT,is_valid
1312713,2140002,4464689,"Nestled in the upper central area, a small, irregular water body occupies about one-tenth of the scene, its jagged contours suggesting a pond or perhaps a ?>\n\nIf there's any uncertainty about the type of water body based on provided tags or context, you can use ""possibly"" or ""likely"" to introduce speculation.",1,3,1,2025-01-04 23:15:32,False
1352239,2179528,4482736,"Centered, occupying over four-fifths of the view, Devils Lake stretches in an irregular shape, named in local English as Devils Lake and in Lakota as Mni wak’áŋ. This expansive saltwater lake, surrounded by land, does not experience tidal influences and retains water year-round. Its sinuous boundary extends beyond the scene, suggesting a substantial inland body of water surrounded by a mix of shoreline vegetation and possibly nearby structures or towns.",1,3,1,2025-01-05 05:35:29,False
1356472,2183761,510514,"Centering the scene, an expansive, irregular lake stretches across nearly four-fifths of the view, its expansive waters named Lake George, also known locally as Lac du Saint-Sacrement and traditionally recognized as Kaniá:taro’kte in Mohawk. Likely fed by numerous creeks, it offers a significant source of water and a serene destination for surrounding communities. Parts seemingly extend beyond the frame, suggesting interconnected waterways or perhaps tributary rivers draining into its broad expanse.",1,3,1,2025-01-05 06:14:46,False
1335530,2162819,12727803,"At the top center of the scene, Tyrone Mine dominates, a large rectangular quarr\n```",1,3,1,2025-01-05 02:54:57,False
1317396,2144685,4930030,"A sweeping, irregularly shaped wetland sprawls across the bottom-central area of the scene, consuming nearly half of the view. Its sinuous boundary, extending somewhat beyond the frame, likely encompasses various wetland habitats, such as marshes or swamps,شک",1,3,1,2025-01-05 00:00:32,False
1371405,2198694,13812638,"Centered in the scene, a vast, rectangular expanse of forested wetland occupies over four-fifths of the view, likely|^{1}^{|}primarily a swamp. Tagged as a 'Forested Wetland,' it bears dense vegetation and extends beyond the frame, probably spreading out across McIntosh County.",1,3,1,2025-01-05 08:36:14,False
1359011,2186300,11940326,"At the right-center and left-center, two distinct polygons form a multi-polygonal nature reserve that covers nearly half of the scene. State Game Lands Number 263, as marked, sprawls along the top rightëshand corner, while a smaller, adjacent polygon occupies the bottom-left. Both sections are likely dense with undeveloped nature, perhaps featuring various plant and animal species, and are protected in perpetuity by the Pennsylvania State Game Commission.",1,3,1,2025-01-05 06:37:16,False
1334217,2161506,15993820,"A narrow, rectangular administrative boundary отцаs the top-center of the scene, occupying about one-twelfth of it, extending beyond the view. Likely separating distinct administrative areas, its precise division is not discernible within this frame.",1,3,1,2025-01-05 02:42:19,False
1308856,2136145,1122590,"Two small, multioriented forested regions occupy opposite corners of the scene's lower half, together composing roughly one-tenth of the view. One, roughly rectangular with rounded corners, stretches from the right-bottom, while the other,指 spacious and densely wooded, anchors the left-bottom.",1,3,1,2025-01-04 22:37:27,False
1346024,2173313,9787053,"Centered and sprawling across over eight-tenths of the scene, Middle Lake_FILENAME lysates in an irregular shape, retailers, undoubtedly a prominent feature within its catchment area.",1,3,1,2025-01-05 04:36:04,False


In [3]:
print(f"Total invalid annotations: {len(annotation_df[annotation_df['is_valid']==False])}")

Total invalid annotations: 4891
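
To put that count in proportion, the rejection rate can be worked out from the two figures printed above (1378740 rows processed, 4891 flagged):

```python
total_annotations = 1378740   # rows processed, from the progress bar above
invalid_annotations = 4891    # count printed above

rejection_rate = invalid_annotations / total_annotations
print(f"{rejection_rate:.2%}")  # prints 0.35%
```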


# Update Databases

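**Cautions**:
- This operation deletes rows from `annotation.db` and decrements `NUM_ANNOTATIONS` in `metadata.db`. It cannot be undone without a backup of both files.
- SQLite allows only one writer at a time, so the update can fail with a "database is locked" error if other processes are writing to the same files.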
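
Because the deletions are destructive, one option is to snapshot each database before running the cells below. The following is a minimal sketch using `sqlite3.Connection.backup` from the standard library (Python 3.7+). The helper name `backup_database`, the backup file name, and the toy table are assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def backup_database(src_conn, dest_path):
    """Copy the database behind src_conn into a new SQLite file at dest_path."""
    dest = sqlite3.connect(dest_path)
    try:
        src_conn.backup(dest)
    finally:
        dest.close()


# Demo on an in-memory database with a toy table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE annotation (ID INTEGER PRIMARY KEY, ANNOTATION TEXT)")
src.execute("INSERT INTO annotation VALUES (1, 'a caption')")
src.commit()

with tempfile.TemporaryDirectory() as tmp:
    backup_path = os.path.join(tmp, "annotation.backup.db")
    backup_database(src, backup_path)

    # Simulate the destructive step, then read the rows back from the backup.
    src.execute("DELETE FROM annotation")
    src.commit()
    check = sqlite3.connect(backup_path)
    restored_rows = check.execute("SELECT ID, ANNOTATION FROM annotation").fetchall()
    check.close()

print(restored_rows)  # [(1, 'a caption')]
```

In the notebook this would be called once per connection, for example with `conn_annotation` and `conn_metadata`, before the update loop.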

In [4]:
# load annotation_osm_table
annotation_osm_table = pd.read_sql("SELECT * FROM annotation_osm", conn_annotation)
merged_anno_df = annotation_df[annotation_df['is_valid']==False].merge(annotation_osm_table[['ANNOTATION', 'ID', 'OSM_ID']], left_on='ID', right_on='ANNOTATION', how='left')
merged_anno_df

Unnamed: 0,ID_x,PATCH,ANNOTATION_x,NUM_ELEMS,ANNOTATOR,PROMPT,CREATED_AT,is_valid,ANNOTATION_y,ID_y,OSM_ID
0,2087498,520469,"At the heart of the scene, Trout Lake commands the entire view, its rectangular boundaries hugging the edges of the image. As a massive, elongated lake, it likely sprawls beyond the frame,勢境ing vast expanses of water.",1,3,1,2025-01-04 12:54:27,False,2087498,2087498,186937
1,2087499,531429,"Dominating the right-center of the scene, Lake George stretches in an irregular shape, covering roughly half the view. Its expansive reach extends beyond the frame, potentially revealing surrounding scenic attractions, such as lush forests or picturesque settlements, given its name in Mohawk, Kaniá:taro’kte.",1,3,1,2025-01-04 12:54:27,False,2087499,2087499,179635
2,2087500,13346359,"Dominating the top-central region, a vast, rectangular farmland stretches nearly across three-quarters of the scene. Primarily used for annual crops, itși.distinct boundaries extend beyond the frame, suggesting expansive cultivation that likely stretches towards adjacent areas.",1,3,1,2025-01-04 12:54:27,False,2087500,2087500,12365535
3,2087502,995345,"Nestled at the lower-left corner of the scene, a small, irregularly shaped wooded area covers about 8% of the view, labeled as a 'wood' or forest. Its curved boundary suggests a compact stand of trees likely déchèterie privately owned and extending beyond the frame.",1,3,1,2025-01-04 12:54:27,False,2087502,2087502,14777098
4,2087509,15446414,"A vast, rectangular forest stretches across the central region, covering more than four-fifths of the view, with its树-covered outlook likely concentrated along the frame's top boundary and extending beyond.",1,3,1,2025-01-04 12:54:38,False,2087509,2087509,23966068
...,...,...,...,...,...,...,...,...,...,...,...
4886,2205971,522288,"At the right-center of the scene, Lake George carves a distinctive, irregular shape, occupying about a quarter of the view. This multipolygonal lake stretches from the mid-right edge to the top boundary, with portions extending beyond the frame. Named in English as Lake George, it is also known as Lac du Saint-Sacrement in French and Kaniá:taro’kte in the Mohawk language, reflecting its significance across cultures.",1,3,1,2025-01-05 10:03:07,False,2205971,2205971,179635
4887,2205975,5870505,"At the heart of the scene, Pigeon Island dominates nearly three-quarters of the view, its irregular outline hinting at a rocky, elevated terrain (196m) potentials. The tiny islet likely boasts coastal vegetation and perhaps a scattering of seabirds, given its名称, extending beyond the frame, possibly indicating steep cliffs or jetties.",1,3,1,2025-01-05 10:03:19,False,2205975,2205975,305101266
4888,2205979,1037696,"At the lower-left corner of the scene, a small, rectangular section of forested land spreads over about one-seventh of the image, possibly featuring thickly grown trees with coniferous leaves (suggested by the 'f' leaf_type tag). Its boundary nudges the frame's edge, hinting at a broader expanse beyond.",1,3,1,2025-01-05 10:03:34,False,2205979,2205979,8704971
4889,2205991,15839471,"At the center-left of the scene, a rectangular area of farmland negligibly smaller than other regions in the image occupies around one-eighth of the view, apparently used mainly for annual crops like grains or vegetables, its boundary potentially lined by树木 or hedgerows.",1,3,1,2025-01-05 10:05:03,False,2205991,2205991,919937602
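
The left merge emits one row per matching `annotation_osm` row, and the update loop decrements `NUM_ANNOTATIONS` once per merged row. In this run the 4891 merged rows match the 4891 invalid annotations, so the join was one-to-one. The sketch below shows one way to check that in general before deleting anything. It uses a toy in-memory table with the `annotation_osm` columns seen above; `annotations_with_multiple_links` is a hypothetical helper.

```python
import sqlite3


def annotations_with_multiple_links(conn):
    """Return (annotation_id, link_count) for annotations linked more than once."""
    query = (
        "SELECT ANNOTATION, COUNT(*) FROM annotation_osm "
        "GROUP BY ANNOTATION HAVING COUNT(*) > 1 ORDER BY ANNOTATION"
    )
    return conn.execute(query).fetchall()


# Toy data: annotation 2 is linked to two OSM elements.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE annotation_osm (ID INTEGER PRIMARY KEY, ANNOTATION INTEGER, OSM_ID INTEGER);
    INSERT INTO annotation_osm (ANNOTATION, OSM_ID) VALUES (1, 100), (2, 200), (2, 201);
""")

duplicates = annotations_with_multiple_links(conn)
print(duplicates)  # [(2, 2)]
```

An empty result means each annotation has at most one link, so each invalid annotation would be processed exactly once.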


In [5]:
c_rejected = conn_rejected_annotation.cursor() if conn_rejected_annotation else None
c_metadata = conn_metadata.cursor()
c_annotation = conn_annotation.cursor()

for k, row in tqdm(merged_anno_df.iterrows(), total=len(merged_anno_df), desc='Washing annotations'):
    patch_id = row['PATCH']
    anno_id = row['ID_x']
    anno_osm_id = row['ID_y']
    if c_rejected:
        annotation, num_elements, annotator, prompt, created_at = \
        row[['ANNOTATION_x','NUM_ELEMS', 'ANNOTATOR', 'PROMPT', 'CREATED_AT']].values.tolist()
        # copy the rejected annotation to the rejected annotation database (table: annotation)
        c_rejected.execute("INSERT INTO annotation (PATCH, ANNOTATION, NUM_ELEMS, ANNOTATOR, PROMPT, CREATED_AT) VALUES (?,?,?,?,?,?)", (patch_id, annotation, num_elements, annotator, prompt, created_at))
        # get the ID of the inserted annotation
        anno_id_new = c_rejected.lastrowid

        # copy the rejected annotation_osm to the rejected annotation_osm database (table: annotation_osm)
        osm_id = row['OSM_ID']
        c_rejected.execute("INSERT INTO annotation_osm (ANNOTATION, OSM_ID) VALUES (?, ?)", (anno_id_new, osm_id))

    # delete the records from annotation database (table: annotation, annotation_osm)
    # (values are bound as parameters instead of being formatted into the SQL string)
    c_annotation.execute("DELETE FROM annotation WHERE ID = ?", (anno_id,))
    c_annotation.execute("DELETE FROM annotation_osm WHERE ID = ?", (anno_osm_id,))
    # NUM_ANNOTATIONS - 1 in metadata database (table: patch)
    c_metadata.execute("UPDATE patch SET NUM_ANNOTATIONS = NUM_ANNOTATIONS - 1 WHERE ID = ?", (patch_id,))
    
    if k % 1000 == 0:
        conn_annotation.commit()
        conn_metadata.commit()
        if c_rejected:
            conn_rejected_annotation.commit()
            
conn_annotation.commit()
conn_metadata.commit()
if c_rejected:
    conn_rejected_annotation.commit()

Washing annotations: 100%|██████████| 4891/4891 [00:01<00:00, 2651.89it/s]
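
The loop above commits every 1000 rows, so an exception part-way through leaves some deletions committed and others not. A minimal sketch of one alternative for a single database is to use the `sqlite3` connection as a context manager, which commits if the block succeeds and rolls back if it raises. The toy schema keeps only a subset of the columns, `delete_annotations` is a hypothetical helper, and the sketch does not coordinate the three separate database files.

```python
import sqlite3


def delete_annotations(conn, annotation_ids):
    """Delete annotations and their OSM links in a single transaction."""
    ids = [(int(i),) for i in annotation_ids]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("DELETE FROM annotation_osm WHERE ANNOTATION = ?", ids)
        conn.executemany("DELETE FROM annotation WHERE ID = ?", ids)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE annotation (ID INTEGER PRIMARY KEY, PATCH INTEGER, ANNOTATION TEXT);
    CREATE TABLE annotation_osm (ID INTEGER PRIMARY KEY, ANNOTATION INTEGER, OSM_ID INTEGER);
    INSERT INTO annotation VALUES (1, 10, 'ok caption'), (2, 10, 'bad caption');
    INSERT INTO annotation_osm VALUES (1, 1, 100), (2, 2, 200);
""")

delete_annotations(conn, [2])
remaining = [r[0] for r in conn.execute("SELECT ID FROM annotation ORDER BY ID")]
remaining_links = [r[0] for r in conn.execute("SELECT ANNOTATION FROM annotation_osm")]
print(remaining, remaining_links)  # [1] [1]
```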


In [None]:
raise NotImplementedError("Check before running")

# Methodology for Washing Revision Annotations

This document describes the process for cleaning and validating Task 3 annotations stored in a database. The goal is to ensure the quality, consistency, and validity of the captions by applying predefined rules and validations.

## Databases Used

The process involves three databases:

1. **`annotation.db`**: Contains the original annotations.
2. **`metadata.db`**: Contains metadata associated with the annotations.
3. **`rejected_annotation.db`**: (Optional) Records annotations that fail validation for further analysis.

## Validation Rules

The cleaning process applies the following rules to validate the annotations:

1. **Allowed Characters**:
   - Only specific characters are permitted, including letters, numbers, spaces, and a subset of punctuation (`.,!?;:'"-%/()&#ʻ’“”`).
   - Annotations containing disallowed characters are marked as invalid.

2. **Unbalanced Brackets**:
   - Brackets (`()`, `[]`, `{}`) must be balanced. 
   - Unbalanced brackets result in invalid annotations.

3. **Annotation Length**:
   - Annotations must be between 2 and 100 words in length.
   - Annotations outside this range are invalid.

4. **Em Dash Validation**:
   - The em dash (`—`) is valid only when it appears an even number of times, so that every dash can be paired. Annotations with an odd number of `—` are invalid.

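5. **Extra Patterns**:
   - Annotations containing an asterisk (`*`) or a line break are invalid.
   - Annotations containing `revise`, `revised`, `revision`, `revisited`, or `revising` (case-insensitive, matched anywhere in the text) are invalid.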

## Implementation Details

The validation process is implemented in Python, leveraging regular expressions, custom functions, and database connections.

### Regular Expression for Invalid Characters

The following patterns detect disallowed characters (`PATTERN`) and the extra rejection cases (`EXTRA_PATTERNS`):

```python
PATTERN = r"[^a-zA-Z0-9\s.,!?;:'\"\-%\/()&#\ʻ\’\“\”]"
EXTRA_PATTERNS = [r".*[\*\n].*",
                r"(?i).*(revise|revised|revision|revisited|revising).*"]
```
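
To show how the extra patterns behave, the sketch below runs them on three annotations taken from the output later in this notebook, plus one made-up clean sentence. `hits_extra_pattern` is a hypothetical helper. `LEADING_LABEL` is a hypothetical, narrower alternative that only flags a leading `Revised:` or `Revision:` label; it is not part of the notebook and would miss revision remarks elsewhere in the text.

```python
import re

EXTRA_PATTERNS = [r".*[\*\n].*",
                  r"(?i).*(revise|revised|revision|revisited|revising).*"]

# Hypothetical narrower alternative: only a leading "Revised:" / "Revision:" label.
LEADING_LABEL = r"(?i)^\s*(revised|revision)\s*:"


def hits_extra_pattern(text, patterns=EXTRA_PATTERNS):
    """True if any extra pattern is found anywhere in the text."""
    return any(re.search(p, text) for p in patterns)


labelled = "Revised: Yellow Breaches, a curving stream"
bold = "**A expansive industrial zone dominates the western half of this scene**."
verb = ("A central stream meanders through largely wooded areas, "
        "its course revising several times before exiting the frame.")
clean = "A narrow road crosses the lower half of the scene."  # made-up example

for text in (labelled, bold, verb, clean):
    print(hits_extra_pattern(text), bool(re.search(LEADING_LABEL, text)))
```

The third example is rejected by `EXTRA_PATTERNS` even though "revising" is used as an ordinary verb, which is the trade-off of matching these substrings anywhere in the text.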


In [None]:
from tqdm import tqdm
import os
import pandas as pd
import sqlite3
import re
from pandarallel import pandarallel
pandarallel.initialize(progress_bar=False, nb_workers=96)

tqdm.pandas()

PATTERN = r"[^a-zA-Z0-9\s.,!?;:'\"\-%\/()&#\ʻ\’\“\”]"
EXTRA_PATTERNS = [r".*[\*\n].*",
                r"(?i).*(revise|revised|revision|revisited|revising).*"]

ANNOTATION_DB_PATH = '../database/annotation.db'
METADATA_DB_PATH = '../database/metadata.db'
REJECTED_ANNOTATION_DB_PATH = '../database/rejected_annotation.db'


def find_unbalanced_brackets(text):
    stack = {'(': 0, '[': 0, '{': 0}
    for ch in text:
        if ch in stack:
            stack[ch] += 1
        elif ch in ')]}':
            if ch == ')' and stack['('] > 0:
                stack['('] -= 1
            elif ch == ']' and stack['['] > 0:
                stack['['] -= 1
            elif ch == '}' and stack['{'] > 0:
                stack['{'] -= 1
            else:
                return True
    
    if all(v == 0 for v in stack.values()):
        return False
    else:
        return True

def check_validity(pattern, text, extra_patterns=()):
    # invalid characters: every disallowed character must be an em dash ('—'),
    # and em dashes must come in pairs (even count) to count as separators
    matches = re.findall(pattern, text)
    dash_count = matches.count('—')
    is_valid = dash_count == len(matches) and dash_count % 2 == 0

    # unbalanced brackets
    is_valid = is_valid and not find_unbalanced_brackets(text)

    # abnormal length
    is_valid = is_valid and (2 <= len(text.split()) <= 100)

    # extra patterns: a match on any of them invalidates the annotation
    is_valid = is_valid and not any(re.search(p, text) for p in extra_patterns)

    return is_valid

# Connect to annotation and metadata databases
conn_annotation = sqlite3.connect(ANNOTATION_DB_PATH)
conn_metadata = sqlite3.connect(METADATA_DB_PATH)

# Connect to rejected annotation database
conn_rejected_annotation = sqlite3.connect(REJECTED_ANNOTATION_DB_PATH) \
                            if os.path.exists(REJECTED_ANNOTATION_DB_PATH) \
                            else None
                            
# load the annotation table into a pandas dataframe
# Replace the prompt number with the actual prompt id for task 3
annotation_df = pd.read_sql_query("SELECT * FROM annotation WHERE PROMPT = 6", conn_annotation)

# check the validity of the annotations
annotation_df['is_valid'] = annotation_df['ANNOTATION'].parallel_apply(lambda x: check_validity(PATTERN, x, EXTRA_PATTERNS))

INFO: Pandarallel will run on 96 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


In [2]:
annotation_df[annotation_df['is_valid'] == False]

Unnamed: 0,ID,PATCH,ANNOTATION,NUM_ELEMS,ANNOTATOR,PROMPT,CREATED_AT,is_valid
1296646,3739760,9933968,"At the top, a vast artificial lake, Truman Res...",1,3,6,2025-01-11 12:05:44,False
1297902,3741027,10078785,A central stream meanders through largely wood...,1,3,6,2025-01-11 12:12:05,False
1300345,3743488,10696484,"A straight residential street, McNany Road, ru...",1,4,6,2025-01-11 12:25:16,False
1300725,3743873,11248912,"A historic route, the California Emigrant Trai...",1,4,6,2025-01-11 12:27:18,False
1301078,3744231,11871395,"In the lower-left corner, a small, partially w...",1,3,6,2025-01-11 12:29:07,False
1301786,3744947,13326832,"Revised: Yellow Breaches, a curving stream, me...",1,3,6,2025-01-11 12:32:55,False
1301814,3744975,13350237,Tiber-tech: A Tech-Driven Marketing Powerhouse...,1,3,6,2025-01-11 12:33:05,False
1302095,3745257,13755417,"Atop the frame, a fifth dominating a swampy we...",1,3,6,2025-01-11 12:34:29,False
1303709,3746886,15596387,"A curved stream winds across the landscape, fl...",1,4,6,2025-01-11 12:43:03,False
1304607,3747790,15789206,"In the center, the Blue Earth River winds its ...",1,4,6,2025-01-11 12:47:42,False


In [3]:
for row in annotation_df[annotation_df['is_valid'] == False].sample(10).itertuples():
    print('----------')
    print(row.ANNOTATION)
    print('===========')

----------
Tiber-tech: A Tech-Driven Marketing Powerhouse
Merging strategy with technology, Tiber-tech's brand is built on innovation, precision, and top-tier talent. Our goal is simple: fuel growth, empower clients, and forge lasting relationships.
----------
Overlooking Bighorn Lake, also known as Awannaxxálua, which takes up about one-fourth of the scene on the right, the reservoir's undulating shoreline is visible. Reaching an elevation of nearly 1,100 meters, the irregularly shaped lake stretches towards the upper-right, with a dam or nearby structures suggesting its man-made nature.
----------
**A expansive industrial zone dominates the western half of this scene**. Massive structures, likely warehouses or factories, are silhouetted against open lawns or neighboring districts.
----------
A central stream meanders through largely wooded areas, its course revising several times before exiting the frame.
----------
A satellite image reveals Graber Pond Nature Preserve at the lower r

In [4]:
annotation_osm_table = pd.read_sql("SELECT * FROM annotation_osm", conn_annotation)
merged_anno_df = annotation_df[annotation_df['is_valid']==False].merge(annotation_osm_table[['ANNOTATION', 'ID', 'OSM_ID']], left_on='ID', right_on='ANNOTATION', how='left')

In [5]:
c_rejected = conn_rejected_annotation.cursor() if conn_rejected_annotation else None
c_metadata = conn_metadata.cursor()
c_annotation = conn_annotation.cursor()

for k, row in tqdm(merged_anno_df.iterrows(), total=len(merged_anno_df), desc='Washing annotations'):
    patch_id = row['PATCH']
    anno_id = row['ID_x']
    anno_osm_id = row['ID_y']
    if c_rejected:
        annotation, num_elements, annotator, prompt, created_at = \
        row[['ANNOTATION_x','NUM_ELEMS', 'ANNOTATOR', 'PROMPT', 'CREATED_AT']].values.tolist()
        # copy the rejected annotation to the rejected annotation database (table: annotation)
        c_rejected.execute("INSERT INTO annotation (PATCH, ANNOTATION, NUM_ELEMS, ANNOTATOR, PROMPT, CREATED_AT) VALUES (?,?,?,?,?,?)", (patch_id, annotation, num_elements, annotator, prompt, created_at))
        # get the ID of the inserted annotation
        anno_id_new = c_rejected.lastrowid

        # copy the rejected annotation_osm to the rejected annotation_osm database (table: annotation_osm)
        osm_id = row['OSM_ID']
        c_rejected.execute("INSERT INTO annotation_osm (ANNOTATION, OSM_ID) VALUES (?, ?)", (anno_id_new, osm_id))

    # delete the records from annotation database (table: annotation, annotation_osm)
    # (values are bound as parameters instead of being formatted into the SQL string)
    c_annotation.execute("DELETE FROM annotation WHERE ID = ?", (anno_id,))
    c_annotation.execute("DELETE FROM annotation_osm WHERE ID = ?", (anno_osm_id,))
    # NUM_ANNOTATIONS - 1 in metadata database (table: patch)
    c_metadata.execute("UPDATE patch SET NUM_ANNOTATIONS = NUM_ANNOTATIONS - 1 WHERE ID = ?", (patch_id,))
    
    if k % 1000 == 0:
        conn_annotation.commit()
        conn_metadata.commit()
        if c_rejected:
            conn_rejected_annotation.commit()
            
conn_annotation.commit()
conn_metadata.commit()
if c_rejected:
    conn_rejected_annotation.commit()

Washing annotations: 100%|██████████| 49/49 [00:00<00:00, 1855.64it/s]


