### Notebook to parse text files to produce cleaned text of RAD decisions

Sean Rehaag

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). 

Dataset & Code to be cited as:

Sean Rehaag, "Refugee Appeal Division Bulk Decisions Dataset" (2023), online: Refugee Law Laboratory <https://refugeelab.ca/bulk-data/rad/>.

Notes:

(1) Data Source: Immigration and Refugee Board (IRB). In the fall of 2022, the IRB added the Refugee Law Laboratory (RLL) to its email distribution list for legal publishers of RAD decisions. The RLL therefore receives new RAD cases as the IRB releases them for publication. Also in the fall of 2022, the IRB provided the RLL with a full backlog of approximately 116k published decisions from all divisions (RAD, RPD, ID, IAD).

(2) Unofficial Data: The data are unofficial reproductions. For official versions, please contact the Immigration and Refugee Board. 

(3) Non-Affiliation / Endorsement: The data have been collected and reproduced without any affiliation with or endorsement from the Immigration and Refugee Board.

(4) Non-Commercial Use: As indicated in the license, the data may be used for non-commercial purposes only (with attribution). For commercial use, please contact the Immigration and Refugee Board.

(5) Accuracy: The data were collected and processed programmatically for the purposes of academic research. While we make best efforts to ensure accuracy, data gathering of this kind inevitably involves errors. As such, the data should be viewed as preliminary information intended to prompt further research and discussion, rather than as definitive information.

Acknowledgements: Thanks to Rafael Dolores for coding the parsing scripts.


# Installing Libraries

In [32]:
!pip install langdetect
!pip install regex



# Importing Libraries

In [14]:
import os
import regex as re 
import pandas as pd
from datetime import datetime
from langdetect import detect
from difflib import get_close_matches
import json

## Declaring Constants
Here, we specify the directories containing our data files.

In [23]:
DATA_DIRS = ["../RAD Decisions TEXT"]  # Add further directories to this list as needed

## Language Detection
This function determines the language of a given text.

In [24]:
def detect_language(text):
    try:
        return detect(text)
    except Exception:  # langdetect raises an exception on empty or undetectable text
        return None

## Decision Maker Extraction
This function searches the content of a file for the decision maker's name using regular expressions.

In [25]:
def extract_decision_maker(content):
    patterns = [
        # String in line immediately after 'Panel' and before 'Tribunal', allowing tabs and spaces
        r"^Panel\s*([^\n]+?)\s*\n\s*Tribunal\s*$",  
      
        # String in line immediately after 'Tribunal' and before 'Panel', allowing tabs and spaces
        r"^Tribunal\s*([^\n]+?)\s*\n\s*Panel\s*$",
        # String in line immediately after 'Tribunal' and followed by another 'Tribunal', allowing tabs and spaces
        r"^Tribunal\s*([^\n]+?)\s*\n\s*Tribunal\s*$"
    ]

    for pattern in patterns:
        # Use re.MULTILINE to allow ^ and $ to match the start and end of each line
        match = re.search(pattern, content, re.IGNORECASE | re.MULTILINE)
        if match:
            captured = match.group(1).strip()
            # Skip captures that end with a label ('Tribunal' or 'Panel') instead of a name;
            # compare in lower case because the search itself is case-insensitive
            if not captured.lower().endswith(("tribunal", "panel")):
                return captured
    return None
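
As a quick illustration of the first pattern, the sketch below runs it on a made-up header. The sample text and the name `Jane Doe` are assumptions for demonstration only and are not taken from an actual decision. The standard-library `re` module stands in for the `regex` package that the notebook imports.

```python
import re  # stdlib stand-in; the notebook imports the third-party `regex` package as `re`

# Hypothetical header excerpt: the name shares a line with the 'Panel' label,
# and the following line holds only the 'Tribunal' label.
sample_header = "Panel\tJane Doe\nTribunal\nDate of decision\tDecember 8, 2020\n"

panel_pattern = r"^Panel\s*([^\n]+?)\s*\n\s*Tribunal\s*$"

# re.MULTILINE lets ^ and $ anchor at line boundaries inside the header
match = re.search(panel_pattern, sample_header, re.IGNORECASE | re.MULTILINE)
decision_maker = match.group(1).strip() if match else None
print(decision_maker)
```

The lazy group `([^\n]+?)` combined with the trailing `\s*` keeps surrounding tabs and spaces out of the captured name.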



## Regular Expression Detector
This function matches the decision date in the text against several known header formats and returns the date as a list of parts.

In [26]:
def match_date_patterns(content):
    patterns = {
        "custom": (r"Date (?:of decision|de la décision)\s*\n\s*([A-Za-z]+)\s+(\d{1,2})\.\s*(\d{4})", lambda m: [m.group(1), m.group(2), m.group(3)]),
        "primary": (r"Date (?:of decision|de la décision)\s*(?:Le )?\s*((?:(?:\d{1,2}|1er)\s+[\w]+\s*,?\s*\d{1,4})|\w+\s+\d{1,2}(?:st|nd|rd|th)?\s*,?\s*\d{1,4}|\d{1,2}-\d{1,2}-\d{1,4})", lambda m: m.group(1).replace(',', '').split()),
        "original_decision": (r"Date of decision\s+([\w\s]+),\s+(\d{4})\s+\(original decision\)", lambda m: m.group(1).strip().split() + [m.group(2).strip()]),
        "tribunal": (r"Tribunal\s*\n\s*([\w\s]+?)\s*\n\s*Date of decision", lambda m: m.group(1).replace(',', '').split()),
        "original": (r"Original\s+([\w]+\s+\d{1,2}(?:st|nd|rd|th)?,?\s+\d{4})", lambda m: m.group(1).replace(',', '').split())
    }

    for key, (pattern, process) in patterns.items():
        match = re.search(pattern, content, re.IGNORECASE)
        if match:
            return process(match)

    return None
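
To make the return value concrete, the sketch below applies only the "primary" pattern to three made-up header lines. The sample lines and the helper name `primary_date_parts` are assumptions for illustration, and the standard-library `re` module is used in place of `regex`.

```python
import re

# The "primary" pattern copied from match_date_patterns
PRIMARY = (
    r"Date (?:of decision|de la décision)\s*(?:Le )?\s*"
    r"((?:(?:\d{1,2}|1er)\s+[\w]+\s*,?\s*\d{1,4})"
    r"|\w+\s+\d{1,2}(?:st|nd|rd|th)?\s*,?\s*\d{1,4}"
    r"|\d{1,2}-\d{1,2}-\d{1,4})"
)

def primary_date_parts(header):
    """Hypothetical helper: return the date parts found by the primary pattern, or None."""
    m = re.search(PRIMARY, header, re.IGNORECASE)
    return m.group(1).replace(",", "").split() if m else None

english = primary_date_parts("Date of decision\tDecember 8, 2020")
french = primary_date_parts("Date de la décision\tLe 1er octobre 2020")
numeric = primary_date_parts("Date of decision 08-12-2020")

print(english)  # ['December', '8', '2020']
print(french)   # ['1er', 'octobre', '2020']
print(numeric)  # ['08-12-2020']
```

The first element of the list (a month name, a day, or a hyphenated numeric date) is what later decides which formatter handles the date.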

## Date Formatter
These functions take the date parts matched by the regular expressions and convert them into one common format (`YYYY-MM-DD`).

In [27]:
french_to_english = {
        'janvier': 'January', 'fevrier': 'February', 'mars': 'March', 'avril': 'April',
        'mai': 'May', 'juin': 'June', 'juillet': 'July', 'aout': 'August',
        'septembre': 'September', 'octobre': 'October', 'novembre': 'November', 'decembre': 'December'
}

def correct_month_name(misspelled_month, possibilities=['Janvier','January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'], cutoff=0.6):
    correct_months = get_close_matches(misspelled_month, possibilities, n=1, cutoff=cutoff)
    if correct_months:
        corrected_month = correct_months[0]
        # Check if the corrected month is in the French to English mapping
        return french_to_english.get(corrected_month.lower(), corrected_month)
    else:
        return misspelled_month

def correct_year_typo(year):
    """Corrects year format typos, e.g. '020' becomes '2020'."""
    return "20" + year[1:] if len(year) == 3 and year.startswith("0") else year

def process_numeric_format(parts):
    """Processes numeric date format 'dd-mm-yyyy'."""
    day, month, year = parts[0].split('-')
    year = correct_year_typo(year)
    return datetime(int(year), int(month), int(day)).date().strftime('%Y-%m-%d')

def process_day_first_format(parts, french_month_mapping):
    """Processes dates in 'day month year' format, French or English."""
    day = 1 if parts[0].lower() == '1er' else int(parts[0])

    # Initialise both values so a failed split below yields None instead of a NameError
    month, year = '', ''
    # Check if month and year are concatenated
    if len(parts) == 2 and not parts[1].isdigit():
        month_year_str = parts[1]
        for i in range(1, len(month_year_str)):
            if month_year_str[i:].isdigit():
                month_str, year_str = month_year_str[:i], month_year_str[i:]
                year = correct_year_typo(year_str)
                month = french_month_mapping.get(month_str.lower().replace('é', 'e').replace('û', 'u').replace('ô', 'o'), month_str.capitalize())
                break
    else:
        month = parts[1].lower().replace('é', 'e').replace('û', 'u').replace('ô', 'o')
        year = correct_year_typo(parts[2])

    if month in french_month_mapping:
        return datetime(int(year), french_month_mapping[month], day).date().strftime('%Y-%m-%d')
    else:
        if isinstance(month, int):
            return datetime(int(year), month, day).date().strftime('%Y-%m-%d')
        
        corrected_month = correct_month_name(month.capitalize())
        try:
            return datetime.strptime(f"{corrected_month} {day} {year}", '%B %d %Y').date().strftime('%Y-%m-%d')
        except ValueError as e:
            print(f"Error parsing date: {e}")
            return None

def process_month_first_format(parts):
    """Processes month first format with possible ordinal suffix."""
    day = 0
    month = ''
    year = ''
    
    if len(parts) == 2 and parts[1].isdigit() and len(parts[1]) > 2:
        
        if parts[1].isdigit() and len(parts[1]) > 4: 
            month = parts[0]
            year_str = parts[1][-4:]
            day_str = parts[1][:-4]
            year = year_str
            day = int(day_str)
            
        elif parts[1].isdigit() and len(parts[1]) > 3:  # Year is the second entry
            month_day_str = parts[0]
            for i in range(1, len(month_day_str)):
                if not month_day_str[i].isdigit():
                    day_str, month_str = month_day_str[:i], month_day_str[i:]
                    day = int(day_str)
                    month = french_to_english.get(month_str.lower().replace('é', 'e').replace('û', 'u').replace('ô', 'o'), month_str)
                    parts[0] = month
                    break
            year = parts[1]
        else:
            year_str = parts[1][-4:]
            day_str = parts[1][:-4]
            year = correct_year_typo(year_str)
            day = int(day_str)

    else:
        day = re.sub(r"[^\d]", "", parts[1])
        day = int(day) if day.isdigit() else 1
        year = correct_year_typo(parts[2])
        
    
    try:
        corrected_month = correct_month_name(parts[0].capitalize())
        return datetime.strptime(f"{corrected_month} {day} {year}", '%B %d %Y').date().strftime('%Y-%m-%d')
    except ValueError as e:
        print(f"Error parsing date: {e}")
        return None
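
The typo handling above rests on two small ideas: fuzzy matching of month names with `difflib.get_close_matches`, and padding three-digit years. The following is a minimal stand-alone sketch of both. The helper names `closest_month` and `fix_year` and the misspelled inputs are made up for illustration.

```python
from difflib import get_close_matches

ENGLISH_MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

def closest_month(name, cutoff=0.6):
    """Hypothetical helper: return the most similar month name, or the input unchanged."""
    matches = get_close_matches(name.capitalize(), ENGLISH_MONTHS, n=1, cutoff=cutoff)
    return matches[0] if matches else name

def fix_year(year):
    """Hypothetical helper: same rule as correct_year_typo ('020' becomes '2020')."""
    return "20" + year[1:] if len(year) == 3 and year.startswith("0") else year

print(closest_month("septembr"))  # September
print(closest_month("Agust"))     # August
print(fix_year("020"))            # 2020
```

Matching is case-sensitive, which is why the input is capitalized before the comparison. A lower `cutoff` tolerates more severe misspellings but raises the risk of matching the wrong month.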


## Document Date Extraction
This function searches the given file for the document date using regular expressions, taking into account both French and English texts.

In [28]:
def process_date_parts(parts, french_month_mapping):
    """Determines the correct date processing method based on the format of the parts."""
    if '-' in parts[0]:
        return process_numeric_format(parts)
    elif parts[0].isdigit() or parts[0].lower() == '1er':
        return process_day_first_format(parts, french_month_mapping)
    else:
        return process_month_first_format(parts)

def extract_document_date(content):
    french_month_mapping = {
        'janvier': 1, 'fevrier': 2, 'mars': 3, 'avril': 4,
        'mai': 5, 'juin': 6, 'juillet': 7, 'aout': 8,
        'septembre': 9, 'octobre': 10, 'novembre': 11, 'decembre': 12
    }
    
    parts = match_date_patterns(content)
    if not parts:
        return None
    return process_date_parts(parts, french_month_mapping)
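
For the French day-first case, the conversion amounts to removing accents, looking up the month number, and formatting the result. The sketch below shows that path in isolation. It uses `unicodedata` to strip accents, which is an alternative to the chained `.replace()` calls in the notebook and not the notebook's own method. The helper names are assumptions.

```python
import unicodedata
from datetime import date

FRENCH_MONTHS = {
    'janvier': 1, 'fevrier': 2, 'mars': 3, 'avril': 4,
    'mai': 5, 'juin': 6, 'juillet': 7, 'aout': 8,
    'septembre': 9, 'octobre': 10, 'novembre': 11, 'decembre': 12,
}

def strip_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def french_parts_to_iso(parts):
    """Hypothetical helper: convert ['14', 'août', '2020'] style parts to 'YYYY-MM-DD'."""
    day_str, month_str, year_str = parts
    day = 1 if day_str.lower() == "1er" else int(day_str)
    month = FRENCH_MONTHS[strip_accents(month_str.lower())]
    return date(int(year_str), month, day).isoformat()

print(french_parts_to_iso(['14', 'août', '2020']))     # 2020-08-14
print(french_parts_to_iso(['1er', 'octobre', '2020']))  # 2020-10-01
```

Unlike the notebook's functions, this sketch raises `KeyError` on an unknown month name and does not attempt fuzzy correction.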

## File Processor Helpers

In [29]:
def extract_rad_number(content):
    """Extracts the RAD number from the content, ignoring IAD files."""
    # Check for lines indicating the file should be ignored
    ignore_lines = ["IAD File", "IMMIGRATION APPEAL DIVISION"]
    for line in content.splitlines():
        if any(ignore_line in line for ignore_line in ignore_lines):
            return None

        if "RAD File" in line:
            rad_number_match = re.search(r"([A-Z]{2}\d+-\d+)", line)
            if rad_number_match:
                return rad_number_match.group(1)
    return None

def process_file(file_path):
    """Processes a single file and extracts data."""
    with open(file_path, 'r', errors='replace') as file:
        content = file.read()

    rad_number = extract_rad_number(content)
    if rad_number:
        lang = detect_language(content)
        decision_maker_name = extract_decision_maker(content)
        document_date = extract_document_date(content)
        year = int(document_date.split('-')[0]) if document_date else None

        return {
            'citation1': rad_number,
            'citation2': '',
            'dataset': 'RAD',
            'name': '',
            'source_url': os.path.basename(file_path),
            'scraped_timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'document_date': document_date,
            'year': year,
            'unofficial_text': '',
            'language': lang,
            'other': json.dumps({'decision-maker_name': decision_maker_name}, ensure_ascii=False),
        }
    return None
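
A short usage sketch of the file-number logic follows. The header lines are made up and only assume that a RAD decision carries a line containing `RAD File` followed by a number such as `MB7-00112`. The compact function `find_rad_number` mirrors `extract_rad_number` so that the block runs on its own.

```python
import re

RAD_NUMBER = re.compile(r"([A-Z]{2}\d+-\d+)")

def find_rad_number(text):
    """Compact stand-in for extract_rad_number, for demonstration only."""
    for line in text.splitlines():
        # Any IAD marker seen before a RAD number means the file is skipped
        if "IAD File" in line or "IMMIGRATION APPEAL DIVISION" in line:
            return None
        if "RAD File" in line:
            m = RAD_NUMBER.search(line)
            if m:
                return m.group(1)
    return None

rad_header = "RAD File / Dossier de la SAR : MB7-00112\nPrivate Proceeding\n"
iad_header = "IAD File / Dossier de la SAI : MB7-00112\n"

print(find_rad_number(rad_header))  # MB7-00112
print(find_rad_number(iad_header))  # None
```

Because `process_file` returns `None` when no RAD number is found, files from other divisions are silently left out of the dataset.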

## Processing Files
This block of code reads each file in the dataset directories and extracts the needed information using the previously defined functions. The resulting records form a pandas DataFrame, which is provided as a CSV file.
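
The export step itself is not shown in this part of the notebook. As a rough sketch of what it could look like, the block below writes one made-up record to CSV text. It uses the standard-library `csv` module and an in-memory buffer as a stand-in for pandas and a real output file. The record values (including the name `Jane Doe`) are assumptions, while the field names follow the dictionary returned by `process_file`.

```python
import csv
import io
import json

FIELDS = [
    'citation1', 'citation2', 'dataset', 'name', 'source_url',
    'scraped_timestamp', 'document_date', 'year', 'unofficial_text',
    'language', 'other',
]

# Hypothetical record in the shape produced by process_file
records = [{
    'citation1': 'MB7-00112', 'citation2': '', 'dataset': 'RAD', 'name': '',
    'source_url': 'MB7-00112a.txt', 'scraped_timestamp': '2023-01-01 00:00:00',
    'document_date': '2020-09-15', 'year': 2020, 'unofficial_text': '',
    'language': 'en',
    'other': json.dumps({'decision-maker_name': 'Jane Doe'}, ensure_ascii=False),
}]

buffer = io.StringIO()  # stand-in for an output file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

With pandas, `pd.DataFrame(records).to_csv(...)` would serve the same purpose. The JSON string in `other` contains quotes, so the CSV writer quotes and escapes that field automatically.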

In [30]:
# Main data processing loop
data_records = []

for data_dir in DATA_DIRS:
    if os.path.exists(data_dir) and os.path.isdir(data_dir):
        for file_name in os.listdir(data_dir):
            print(f"Parsing file: {file_name}")
            if not file_name.startswith('~'):
                file_path = os.path.join(data_dir, file_name)
                record = process_file(file_path)
                if record:
                    data_records.append(record)

Parsing file: AppToReopenTB8-14702a.txt
['December', '8', '2020']
MONTH FIRST FORMAT
Parsing file: AppToReopenTB8-14702tf.txt
['8', 'décembre', '2020']
DAY FIRST FORMAT
MONTH:  decembre
Parsing file: Decision concerning G. Bazin (finale version) (amended 5.5.2022) (003) - sanitized.txt
Parsing file: Decision G. Bazin_ francais_modifiee (003) - sanitized.txt
Parsing file: MB7-00112a.txt
['September', '15', '2020']
MONTH FIRST FORMAT
Parsing file: MB7-00112tf.txt
['15', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB7-03926f.txt
['26', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: MB7-03926ta.txt
['October', '26', '2020']
MONTH FIRST FORMAT
Parsing file: MB7-24221 f.txt
['1', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MB7-24221ta.txt
['April', '1', '2021']
MONTH FIRST FORMAT
Parsing file: MB8-01089a.txt
['October', '1', '2020']
MONTH FIRST FORMAT
Parsing file: MB8-01089tf.txt
['1er', 'octobre', '2020']
DAY FIRST FORMAT
[...]
['3', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-00271ta.txt
['September', '3', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-00390f.txt
['16', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-00390ta.txt
['September', '16', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-00408f.txt
['29', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-00408ta.txt
['September', '29', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-00409f.txt
['28', 'juillet', '2020']
DAY FIRST FORMAT
MONTH:  juillet
Parsing file: MB9-00409ta.txt
['July', '28', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-00752f.txt
['28', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-00752ta.txt
['September', '28', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-00810f.txt
['14', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-00810ta.txt
['August', '14', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-01120a.txt
['October'
[...]
Parsing file: MB9-08049tf.txt
['24', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-08198a.txt
['August', '4', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-08198tf.txt
['4', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-08365f.txt
['9', 'Septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-08365ta.txt
['September', '9', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-08500f.txt
['1er', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-08500ta.txt
['September', '1', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-08845 denovo a.txt
['April', '26', '2021']
MONTH FIRST FORMAT
Parsing file: MB9-08845 tf.txt
['26', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MB9-08947f.txt
['10', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-08947ta.txt
['August', '10', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-08953f.txt
['13', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
[...]
['September', '14', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-13081tf.txt
['14', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-13083f.txt
['26', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-13083ta.txt
['August', '26', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-13102f.txt
['31', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-13102ta.txt
['August', '31', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-13218f.txt
['16', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-13218ta.txt
['September', '16', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-13246f.txt
['25', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-13246ta.txt
['August', '25', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-13382a.txt
['August', '4', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-13382tf.txt
['4', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-13394f.txt
['5', 'août', '2020']
DAY FIRST FORMAT
[...]
['August', '28', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-16259f.txt
['24', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-16259ta.txt
['September', '24', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-16265f.txt
['26', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: MB9-16265ta.txt
['October', '26', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-16266f.txt
['17', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-16266ta.txt
['August', '17', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-16269f.txt
['28', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-16269ta.txt
['August', '28', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-16429f.txt
['6', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-16429ta.txt
['August', '6', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-16438f.txt
['5', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MB9-16438ta.txt
['August', '5', '2020']
MONTH FIRST FORMAT
[...]
['September', '9', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-20957tf.txt
['9', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-20958a.txt
['September', '30', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-20958tf.txt
['30', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-20959a.txt
['November', '24', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-20959tf.txt
['24', 'novembre', '2020']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: MB9-20971a.txt
['November', '24', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-20971tf.txt
['24', 'novembre', '2020']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: MB9-20992a.txt
['September', '3', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-20992tf.txt
['3', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-20994a.txt
['October', '16', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-20994tf.txt
['16', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: MB9-21004f.txt
[...]
['March', '12', '2021']
MONTH FIRST FORMAT
Parsing file: MB9-25183tf.txt
['12', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MB9-25193a.txt
['October', '16', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-25193tf.txt
['16', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: MB9-25322f.txt
['8', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: MB9-25322ta.txt
['October', '8', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-25324f.txt
['11', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-25324ta.txt
['September', '11', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-25330f.txt
['4', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-25330ta.txt
['September', '4', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-25477a.txt
['November', '2', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-25477tf.txt
['2', 'novembre', '2020']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: MB9-25479e.txt
['September', '11', '
[...]
['29', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MB9-27895ta.txt
['March', '29', '2021']
MONTH FIRST FORMAT
Parsing file: MB9-27900e.txt
['September', '15', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-27900tf.txt
['15', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: MB9-28232a.txt
['25', 'novembre', '2020']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: MB9-28232tf.txt
['November', '25', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-28277f .txt
['23', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MB9-28277ta.txt
['March', '23', '2021']
MONTH FIRST FORMAT
Parsing file: MB9-28279f.txt
['18', 'novembre', '2020']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: MB9-28279ta.txt
['November', '18', '2020']
MONTH FIRST FORMAT
Parsing file: MB9-28281f.txt
['30', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MB9-28281ta.txt
['March', '30', '2021']
MONTH FIRST FORMAT
Parsing file: MB9-28397f.txt
['30', 'octobre', '2020']
[...]
['March', '25', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-02054f.txt
['30', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-02054ta.txt
['March', '30', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-02055a.txt
['March', '22', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-02055tf.txt
['22', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-02058f.txt
['19', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-02058ta.txt
['March', '19', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-02182f .txt
['19', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-02182ta.txt
['March', '19', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-02183f .txt
['23', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-02183ta.txt
['March', '23', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-02199f.txt
['16', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-02199ta.txt
['March', '16', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-02299f.
[...]
['March', '17', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-04565tf.txt
['17', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-04566 f.txt
['26', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-04566ta.txt
['April', '26', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-04681 a.txt
['April', '20', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-04681tf.txt
['20', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-04685 f.txt
['27', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-04685ta.txt
['April', '27', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-04686 f.txt
['14', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-04686ta.txt
['April', '14', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-04698 a.txt
['April', '30', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-04698tf.txt
['30', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-04796 f.txt
['6', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
[...]
['26', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-08326ta.txt
['March', '26', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-08380 a.txt
['April', '26', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-08380 tf.txt
['26', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-08406 f.txt
['29', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-08406ta.txt
['April', '29', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-08418f .txt
['17', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-08418ta.txt
['March', '17', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-08472 f.txt
['23', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC0-08472ta.txt
['April', '23', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-08482 f.txt
['24', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MC0-08482ta.txt
['August', '24', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-08515a.txt
['March', '30', '2021']
MONTH FIRST FORMAT
[...]
['8', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-09806ta.txt
['March', '8', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-09914 f.txt
['19', 'octobre', '2021']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: MC0-09914 ta.txt
['October', '19', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-09941a .txt
['March', '2', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-09941tf.txt
['2', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-09960f .txt
['9', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-09960ta.txt
['March', '9', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-09965f.txt
['9', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC0-09965ta.txt
['March', '9', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-09983f.txt
['26', 'mai', '2021']
DAY FIRST FORMAT
MONTH:  mai
Parsing file: MC0-09983ta.txt
['May', '26', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-09990a.txt
['March', '3', '2021']
MONTH FIRST FORMAT
Parsing file: MC0-09990tf.tx
[...]
['August', '6', '2021']
MONTH FIRST FORMAT
Parsing file: MC1-01340tf.txt
['6', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MC1-01356 f.txt
['9', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MC1-01356ta.txt
['August', '9', '2021']
MONTH FIRST FORMAT
Parsing file: MC1-01400 f.txt
['10', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MC1-01400ta.txt
['August', '10', '2021']
MONTH FIRST FORMAT
Parsing file: MC1-01560 f.txt
['27', 'octobre', '2021']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: MC1-01560ta.txt
['October', '27', '2021']
MONTH FIRST FORMAT
Parsing file: MC1-01602 f.txt
['6', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MC1-01602ta.txt
['August', '6', '2021']
MONTH FIRST FORMAT
Parsing file: MC1-01850 a.txt
['August', '13', '2021']
MONTH FIRST FORMAT
Parsing file: MC1-01850tf.txt
['13', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: MC1-02084 a.txt
['December', '13', '2021']
MONTH FIRST FORMAT
[...]
['27', 'janvier', '2022']
DAY FIRST FORMAT
MONTH:  janvier
Parsing file: MC1-08212ta.txt
['January', '27', '2022']
MONTH FIRST FORMAT
Parsing file: MC1-08838 f.txt
['21', 'janvier', '2022']
DAY FIRST FORMAT
MONTH:  janvier
Parsing file: MC1-08838ta.txt
['January', '21', '2022']
MONTH FIRST FORMAT
Parsing file: MC1-09022 a.txt
['March', '11', '2022']
MONTH FIRST FORMAT
Parsing file: MC1-09022tf.txt
['11', 'mars', '2022']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC1-09244 f.txt
['27', 'avril', '2022']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC1-09244ta.txt
['April', '27', '2022']
MONTH FIRST FORMAT
Parsing file: MC1-09489 f.txt
Parsing file: MC1-09489 ta.txt
Parsing file: MC1-09544 f.txt
['2', 'mars', '2022']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: MC1-09544ta.txt
['March', '2', '2022']
MONTH FIRST FORMAT
Parsing file: MC1-09652 f.txt
['14', 'avril', '2022']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: MC1-09652ta.txt
['April', '14', '2022']
MONTH FIRST FORMAT
Parsing file: M
[...]
['24', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-10509a.txt
['August', '4', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-10509tf.txt
['4', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-10742a.txt
['August', '28', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-10742tf.txt
['28', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-10866 f.txt
['4', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-10866ta.txt
['September', '4', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-10873a.txt
['August', '31', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-10873tf.txt
['31', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-11176a.txt
['September', '25', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-11176tf.txt
['25', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-11442a.txt
['October', '15', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-11442tf.txt
Parsing file: TB8-11448a.txt
[...]
['21', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-15344a.txt
['August', '13', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-15344tf.txt
['13', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-15586a.txt
['August', '4', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-15586tf.txt
['4', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-15598a.txt
['October', '14', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-15598tf.txt
['14', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: TB8-15599a.txt
['September', '8', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-15599tf.txt
['8', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-15610a.txt
['August', '7', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-15610tf.txt
['7', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-15935a.txt
['September', '14', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-15935tf.txt
['14', 'septembre', '2020']
[...]
['10', 'décembre', '2020']
DAY FIRST FORMAT
MONTH:  decembre
Parsing file: TB8-21348a.txt
['September', '3', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-21348tf.txt
['3', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-21364a.txt
['August', '11', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-21364tf.txt
['11', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-22127a.txt
['September', '7', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-22127tf.txt
['7', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-23186a.txt
['August', '4', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-23186tf.txt
['4', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-23349a.txt
['October', '7', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-23349tf.txt
['7', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: TB8-23479.txt
['15', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-23479a.txt
['September',
[...]
['10', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-32870 a.txt
['April', '30', '2021']
MONTH FIRST FORMAT
Parsing file: TB8-32870tf.txt
['30', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TB8-33181a.txt
['August', '5', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-33181tf.txt
['5', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-33587a.txt
['September', '1', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-33587tf.txt
['1er', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB8-33593a.txt
['August', '10', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-33593tf.txt
['10', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-33621a.txt
['August', '31', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-33621tf.txt
['31', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB8-33695a.txt
['September', '9', '2020']
MONTH FIRST FORMAT
Parsing file: TB8-33695tf.txt
['9', 'septembre', '2020']
[...]
['28', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-08334tf.txt
['September', '28', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-08452f.txt
['20', 'novembre', '2020']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: TB9-08452ta.txt
['November', '20', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-08583 a.txt
['April', '15', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-08583tf.txt
['15', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TB9-08701a.txt
['August', '3', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-08701tf.txt
['3', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-08712 a.txt
['April', '26', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-08712tf.txt
['26', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TB9-08842a.txt
['July', '29', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-08842tf.txt
['29', 'juillet', '2020']
DAY FIRST FORMAT
MONTH:  juillet
Parsing file: TB9-08854a.txt
['August', '12', '2020']
[...]
['August', '17', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-14262tf.txt
['17', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-14275a.txt
['August', '11', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-14275tf.txt
['11', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-14396a.txt
['August', '11', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-14396tf.txt
['11', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-14415e.txt
['September', '14', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-14415tf.txt
['14', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-14517a.txt
['September', '25', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-14517tf.txt
['25', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-14795a.txt
['September', '30', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-14795tf.txt
['30', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-14903a.txt
['July', '27', '2020']
MONTH FIRST FORMAT

['28', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-19235e.txt
['September', '18', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-19235tf.txt
['18', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-19440a.txt
['September', '8', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-19440tf.txt
['8', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-19522e.txt
['September', '11', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-19522tf.txt
['11', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-19685a.txt
['August', '26', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-19685tf.txt
['26', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-19698a.txt
['August', '10', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-19698tf.txt
['10', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-19853a.txt
['August', '30', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-19853tf.txt
['30', 'août', '2020']
DAY FIRST FORMAT

['25', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-26712a.txt
['August', '31', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-26712tf.txt
['31', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TB9-26857a.txt
['March', '3', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-26857tf.txt
['3', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TB9-27050a.txt
['September', '30', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-27050tf.txt
['30', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-27062e.txt
['September', '15', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-27062ta.txt
['15', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-27066a.txt
['September', '8', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-27066tf.txt
['8', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TB9-27494a .txt
['March', '27', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-27494tf.txt
['27', 'mars', '2021']

['16', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TB9-33439a.txt
['October', '6', '2020']
MONTH FIRST FORMAT
Parsing file: TB9-33439tf.txt
['6', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: TB9-33974 a.txt
['April', '19', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-33974tf.txt
['19', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TB9-33976a.txt
['February', '15', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-33976tf.txt
['15', 'février', '2021']
DAY FIRST FORMAT
MONTH:  fevrier
Parsing file: TB9-34368a.txt
['March', '25', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-34368tf.txt
['25', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TB9-34376a .txt
['March', '25', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-34376tf.txt
['25', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TB9-34377 a.txt
['December', '3', '2021']
MONTH FIRST FORMAT
Parsing file: TB9-34377tf.txt
['3', 'décembre', '2021']
DAY FIRST FORMAT
MONTH:  decembre

['14', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-01899 a.txt
['April', '7', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-01899tf.txt
['7', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-02068 a.txt
['April', '28', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-02068tf.txt
['28', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-02075e.txt
['September', '21', '2020']
MONTH FIRST FORMAT
Parsing file: TC0-02075tf.txt
['21', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TC0-02222a .txt
['March', '31', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-02222tf.txt
['31', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-02224a.txt
['March', '31', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-02224tf.txt
['31', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-02240 a.txt
['March', '2', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-02240tf.txt
['2', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars

['26', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-05484a .txt
['March', '31', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-05484tf.txt
['31', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-05564 a.txt
['April', '30', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-05564tf.txt
['30', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-05838 a.txt
['March', '12', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-05838tf.txt
['12', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-05851 a.txt
['April', '29', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-05851tf.txt
['29', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-05852 a.txt
['April', '6', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-05852tf.txt
['6', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-05853 a.txt
['April', '29', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-05853tf.txt
['29', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Par

['March', '8', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-07427tf.txt
['8', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-07431a.txt
['March', '12', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-07431tf.txt
['12', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-07535a.txt
['March', '29', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-07535tf.txt
['29', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-07544a.txt
['March', '19', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-07544tf.txt
['19', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-07663 a.txt
['April', '26', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-07663tf.txt
['26', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-07668a .txt
['March', '19', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-07668tf.txt
['19', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-07672 a.txt
['April', '22', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-07672t

['March', '9', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09018tf.txt
['9', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-09076 a.txt
['April', '30', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09076tf.txt
['30', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-09082a.txt
['April', '13', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09082tf.txt
['13', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-09131 a.txt
['April', '28', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09131tf.txt
['28', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-09139a.txt
['5', 'March', '2021']
DAY FIRST FORMAT
MONTH:  march
Parsing file: TC0-09139tf.txt
['5', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-09183a.txt
['March', '25', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09183tf.txt
['25', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-09190a.txt
['5', 'March', '2021']
DAY FIRST FORMAT
MONTH:  march


['8', 'September', '2021']
DAY FIRST FORMAT
MONTH:  september
Parsing file: TC0-09812tf.txt
['8', 'septembre', '2021']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TC0-09821 a.txt
['March', '11', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09821tf.txt
['11', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-09833 a.txt
['April', '14', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09833tf.txt
['14', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-09849 a.txt
['March', '10', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09849tf.txt
['10', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-09879 a.txt
['April', '14', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09879tf.txt
['14', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-09893 a.txt
['April', '6', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-09893tf.txt
['6', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-09894 a.txt
['April', '16', '2021']
MONTH FIRST FORMAT

['22', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-11009 a.txt
['March', '11', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-11009tf.txt
['11', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-11036 a.txt
['March', '23', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-11036tf.txt
['23', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-11061 a.txt
['March', '26', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-11061tf.txt
['26', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-11085 a.txt
['April', '28', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-11085tf.txt
['28', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: TC0-11095 a.txt
['March', '19', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-11095tf.txt
['19', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: TC0-11199 a.txt
['May', '19', '2021']
MONTH FIRST FORMAT
Parsing file: TC0-11199tf.txt
['19', 'mai', '2021']
DAY FIRST FORMAT
MONTH:  mai
Parsing file:

['September', '8', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-00569tf.txt
['8', 'septembre', '2021']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TC1-00635 a.txt
['September', '15', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-00635tf.txt
['15', 'septembre', '2021']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TC1-00644 a.txt
['July', '27', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-00644tf.txt
['27', 'juillet', '2021']
DAY FIRST FORMAT
MONTH:  juillet
Parsing file: TC1-00718 a.txt
['September', '10', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-00718tf.txt
['10', 'septembre', '2021']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TC1-00747 a.txt
['July', '22', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-00747tf.txt
['22', 'juillet', '2021']
DAY FIRST FORMAT
MONTH:  juillet
Parsing file: TC1-00752 a.txt
['July', '30', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-00752tf.txt
['30', 'juillet', '2021']
DAY FIRST FORMAT
MONTH:  juillet
Parsing file: TC1-00790 a.txt
['August'

['17', 'septembre', '2021']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TC1-03933 a.txt
['July', '26', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-03933tf.txt
['26', 'juillet', '2021']
DAY FIRST FORMAT
MONTH:  juillet
Parsing file: TC1-04056 a.txt
['24', 'September', '2021']
DAY FIRST FORMAT
MONTH:  september
Parsing file: TC1-04056tf.txt
['24', 'septembre', '2021']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: TC1-04203 a.txt
['November', '12', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-04203tf.txt
['12', 'novembre', '2021']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: TC1-04300 a.txt
['November', '16', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-04300tf.txt
['16', 'novembre', '2021']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: TC1-04541 a.txt
['November', '16', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-04541tf.txt
['16', 'novembre', '2021']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: TC1-04556 a.txt
['October', '12', '2021']
MONTH FIRST FORMAT
Parsing file: 

['January', '14', '2022']
MONTH FIRST FORMAT
Parsing file: TC1-10177tf.txt
['14', 'janvier', '2022']
DAY FIRST FORMAT
MONTH:  janvier
Parsing file: TC1-10937 a.txt
['December', '20', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-10937TF.txt
['20', 'décembre', '2021']
DAY FIRST FORMAT
MONTH:  decembre
Parsing file: TC1-11196 a.txt
['February', '7', '2022']
MONTH FIRST FORMAT
Parsing file: TC1-11196tf.txt
['7', 'février', '2022']
DAY FIRST FORMAT
MONTH:  fevrier
Parsing file: TC1-11198 a.txt
['August', '19', '2021']
MONTH FIRST FORMAT
Parsing file: TC1-11198tf.txt
['19', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: TC1-11703f.txt
['14', 'février', '2022']
DAY FIRST FORMAT
MONTH:  fevrier
Parsing file: TC1-11703ta.txt
['February', '14', '2022']
MONTH FIRST FORMAT
Parsing file: TC1-12204 a.txt
['January', '17', '2022']
MONTH FIRST FORMAT
Parsing file: TC1-12204tf.txt
['17', 'janvier', '2022']
DAY FIRST FORMAT
MONTH:  janvier
Parsing file: TC1-12315 a.txt
['November', '29', '2

['31', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: VB9-04355a.txt
['Septembre', '2', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-04355tf.txt
['2', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: VB9-04480a.txt
['August', '3', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-04480tf.txt
['3', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: VB9-04482 a.txt
['April', '8', '2021']
MONTH FIRST FORMAT
Parsing file: VB9-04482tf.txt
['8', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: VB9-04510a.txt
['September', '2', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-04510tf.txt
['2', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: VB9-04513 a.txt
['March', '15', '2021']
MONTH FIRST FORMAT
Parsing file: VB9-04513tf.txt
['15', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: VB9-04683a.txt
['March', '17', '2021']
MONTH FIRST FORMAT
Parsing file: VB9-04683tf.txt
['17', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars

['October', '9', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-07589tf.txt
['9', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: VB9-07636a.txt
['August', '11', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-07636tf.txt
['11', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: VB9-07694a.txt
['September', '8', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-07694tf.txt
['8', 'septembre', '2020']
DAY FIRST FORMAT
MONTH:  septembre
Parsing file: VB9-07859a.txt
['October', '31', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-07859tf.txt
['31', 'octobre', '2020']
DAY FIRST FORMAT
MONTH:  octobre
Parsing file: VB9-07867f.txt
['19', 'novembre', '2020']
DAY FIRST FORMAT
MONTH:  novembre
Parsing file: VB9-07867ta.txt
['November', '19', '2020']
MONTH FIRST FORMAT
Parsing file: VB9-07868 a.txt
['16', 'June', '2021']
DAY FIRST FORMAT
MONTH:  june
Parsing file: VB9-07868tf.txt
['16', 'juin', '2021']
DAY FIRST FORMAT
MONTH:  juin
Parsing file: VB9-07982 a.txt
['March', '23', '202

['March', '17', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-00827tf.txt
['17', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: VC0-00832 a.txt
['April', '23', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-00832tf.txt
['23', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: VC0-00912 a.txt
['April', '9', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-00912tf.txt
['9', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: VC0-00917a.txt
['August', '31', '2020']
MONTH FIRST FORMAT
Parsing file: VC0-00917tf.txt
['31', 'août', '2020']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: VC0-00959 a.txt
['April', '23', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-00959tf.txt
['23', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: VC0-01016 a.txt
['April', '20', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-01016tf.txt
['20', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: VC0-01017 a.txt
['March', '31', '2021']
MONTH FIRST FORMAT
Parsing file: V

['5', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: VC0-02679 a.txt
['March', '30', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-02679tf.txt
['30', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: VC0-02752 a.txt
['March', '25', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-02752tf.txt
['25', 'mars', '2021']
DAY FIRST FORMAT
MONTH:  mars
Parsing file: VC0-02798 a.txt
['August', '19', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-02798tf.txt
['19', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: VC0-02813 a.txt
['August', '31', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-02813tf.txt
['31', 'août', '2021']
DAY FIRST FORMAT
MONTH:  aout
Parsing file: VC0-02845 a.txt
['April', '14', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-02845tf.txt
['14', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsing file: VC0-02874 a.txt
['April', '13', '2021']
MONTH FIRST FORMAT
Parsing file: VC0-02874tf.txt
['13', 'avril', '2021']
DAY FIRST FORMAT
MONTH:  avril
Parsin
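
The log above shows each date as a list of three tokens, classified as month-first or day-first, with French month names reduced to an accent-free lowercase form (for example `août` becomes `aout`). The following is a minimal sketch of how such tokens could be turned into a date. It is not the notebook's own parsing code: the `MONTHS` table and the functions `normalize_month` and `parse_date_tokens` are hypothetical names introduced for illustration, and the sketch assumes exact month names rather than the fuzzy matching that the `get_close_matches` import suggests.

```python
import unicodedata
from datetime import date

# Hypothetical lookup table: accent-free, lowercase month names in English and French
MONTHS = {
    "january": 1, "janvier": 1, "february": 2, "fevrier": 2,
    "march": 3, "mars": 3, "april": 4, "avril": 4,
    "may": 5, "mai": 5, "june": 6, "juin": 6,
    "july": 7, "juillet": 7, "august": 8, "aout": 8,
    "september": 9, "septembre": 9, "october": 10, "octobre": 10,
    "november": 11, "novembre": 11, "december": 12, "decembre": 12,
}


def normalize_month(name):
    """Lowercase a month name and strip accents (e.g. 'Février' -> 'fevrier')."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def parse_date_tokens(tokens):
    """Convert [month, day, year] or [day, month, year] string tokens to a date."""
    first, second, year = tokens
    if first.isdigit():
        # Day-first format, e.g. ['10', 'août', '2020']
        day, month_name = first, second
    else:
        # Month-first format, e.g. ['August', '10', '2020']
        month_name, day = first, second
    month = MONTHS[normalize_month(month_name)]
    return date(int(year), month, int(day))
```

For instance, `parse_date_tokens(['10', 'août', '2020'])` and `parse_date_tokens(['August', '10', '2020'])` both return `date(2020, 8, 10)`. A month name that is not in the table raises a `KeyError`, which makes unexpected spellings easy to spot.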

## Writing output to yearly JSON files in DATA/YEARLY

In [35]:
# Build a DataFrame from the parsed records and normalise dates to ISO format (YYYY-MM-DD)
df = pd.DataFrame(data_records)
df['document_date'] = pd.to_datetime(df['document_date']).dt.strftime('%Y-%m-%d')

# Create the output directory if it does not already exist
output_dir = 'DATA/YEARLY'
os.makedirs(output_dir, exist_ok=True)

# Write one JSON file per year
for year, group in df.groupby('year'):
    json_path = os.path.join(output_dir, f'{int(year)}.json')
    records = group.to_dict(orient='records')

    # Write the records to a JSON file, keeping accented characters unescaped
    with open(json_path, 'w', encoding='utf-8') as f:
        json.dump(records, f, ensure_ascii=False, indent=4)

    print(f'Outputted {len(records)} records to {json_path}')

Outputted 2 records to DATA/YEARLY\2018.json
Outputted 8 records to DATA/YEARLY\2019.json
Outputted 1716 records to DATA/YEARLY\2020.json
Outputted 2017 records to DATA/YEARLY\2021.json
Outputted 128 records to DATA/YEARLY\2022.json
Outputted 2 records to DATA/YEARLY\2023.json
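
To check the files written above, the yearly JSON files can be read back and counted. The following is a minimal sketch using only the standard library. The function names `load_yearly_records` and `count_by_year` are hypothetical, and the sketch assumes that every record carries a `year` field, as the `groupby('year')` call implies.

```python
import json
from pathlib import Path


def load_yearly_records(output_dir="DATA/YEARLY"):
    """Read every .json file in output_dir and return one combined list of records."""
    records = []
    # Sort the paths so records come back in a stable order (ascending by file name)
    for json_path in sorted(Path(output_dir).glob("*.json")):
        with open(json_path, encoding="utf-8") as f:
            records.extend(json.load(f))
    return records


def count_by_year(records):
    """Count records per 'year' value, for comparison with the counts printed at write time."""
    counts = {}
    for record in records:
        year = int(record["year"])
        counts[year] = counts.get(year, 0) + 1
    return counts
```

Calling `count_by_year(load_yearly_records())` should reproduce the per-year counts printed when the files were written. The combined list can also be passed to `pd.DataFrame` for further analysis.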
