# Pyrus

## Modules

### STRING MANAGER

This module provides operations for managing string data.

#### string_normaliser

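This function normalises string data by performing several operations:
- Converting all characters to lowercase
- Removing leading/trailing white space
- Collapsing runs of spaces into a single space
- Removing `@`, `#` and other special characters (anything that is not a letter, digit or whitespace)
- Optionally (when `normalise_encoding=True`) replacing accented and special characters with their ASCII equivalents

Args:
    string (str): The string to be normalised.
    normalise_encoding (bool): Whether to transliterate the string to ASCII. Defaults to False.

Returns:
    normalised_string (str): The normalised string data.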

In [7]:
from unidecode import unidecode
import unicodedata
import regex as re

def string_normaliser(string, normalise_encoding=False):

    # Turn the entire string to lowercase
    lowercase_string = string.lower()

    if normalise_encoding:
        # Transform string into canonical representation
        unicode_normalised_string = unicodedata.normalize(
            'NFKD', lowercase_string)

        # Transliterate the string into its closest ASCII representation
        lowercase_string = unidecode(unicode_normalised_string)

    # Remove '@' and '#' characters
    replaced_string = lowercase_string.replace('@', '').replace('#', '')

    # Strip leading and trailing whitespace
    stripped_string = replaced_string.strip()

    # Remove special characters from the string
    # (anything that is not a letter, whitespace or a digit)
    pattern = re.compile(r'[^\p{L}\s\d]')
    special_character_removed_string = pattern.sub('', stripped_string)

    # While the string contains double spaces
    # This ensures triple and more spaces are replaced
    while '  ' in special_character_removed_string:
        # Turn double spaces into single spaces
        special_character_removed_string = special_character_removed_string.replace('  ', ' ')

    normalised_string = special_character_removed_string

    return normalised_string
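
As a rough illustration of what the `normalise_encoding=True` branch is aiming for, the sketch below strips accents using only the standard library. It is not the function above: `ascii_fold` is a hypothetical helper, and it silently drops characters that have no ASCII decomposition (for example `ø`), whereas `unidecode` transliterates many of them.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Hypothetical helper: strip accents using only the standard library."""
    # NFKD splits accented letters into a base letter plus combining marks
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding to ASCII with errors='ignore' drops the combining marks
    # (and any other character that has no ASCII form)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(ascii_fold('áccéntéd cháráctérs'))     # accented characters
print(ascii_fold('ünicöde äscii êncoding'))  # unicode ascii encoding
```

The trade-off is fewer dependencies in exchange for lossier output, so this would only suit input where dropping undecomposable characters is acceptable.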

##### string_normaliser: tests

In [8]:
import unittest

class StringNormaliserTests(unittest.TestCase):

    def subtester(self, test_values):
        
        for value, expected_result in test_values:
            with self.subTest(value=value):
                result = string_normaliser(value)
                self.assertEqual(result, expected_result)
    
    def test_case_normalisation(self):
        test_values = [
            ("Hello World!", "hello world"),
            ("ThIs Is A MiXeD CaSe StRiNg", "this is a mixed case string"),
            ("Áccéntéd Cháráctérs", "áccéntéd cháráctérs"),
            ("ALL UPPERCASE", "all uppercase"),
            ("", ""),  # Empty string should remain the same
        ]
    
        self.subtester(test_values)

    def test_whitespace_normalisation(self):
        test_values = [
            ("   Remove  extra  spaces   ", "remove extra spaces"),
            ("  Leading and trailing spaces  ", "leading and trailing spaces"),
            ("    ", ""),  # All whitespace, expect empty string
            ("", "")
        ]
        
        self.subtester(test_values)

    def test_double_space_normalisation(self):
        test_values = [
            ("This  has  double  spaces", "this has double spaces"),
            ("No  double  spaces", "no double spaces"),
            ("Single spaces", "single spaces"),
            ("", ""),  # Empty string should remain the same
        ]
        
        self.subtester(test_values)

    def test_encoding_normalisation(self):
        test_values = [
            ("Thís Štríng Hás Áccénted Characters", "thís štríng hás áccénted characters"),
            ("Ünicöde Äscii Êncoding", "ünicöde äscii êncoding"),
            ("Keep 1234567890 digits", "keep 1234567890 digits"),
            ("", ""),  # Empty string should remain the same
        ]
        
        self.subtester(test_values)

    def test_special_char_normalisation(self):
        test_values = [
            ("Hello@# World!", "hello world"),
            ("Remove !@#$ special %^&* characters", "remove special characters"),
            ("Keep digits 1234567890", "keep digits 1234567890"),
            ("", ""),  # Empty string should remain the same
        ]
        
        self.subtester(test_values)

    def test_combined_normalisation(self):
        test_values = [
            # Normal test values
            ("Hello World!", "hello world"),
            ("   Remove  extra  spaces   ", "remove extra spaces"),
            ("Thís Štríng Hás Áccénted Characters", "thís štríng hás áccénted characters"),
            ("Ünicöde Äscii Êncoding", "ünicöde äscii êncoding"),
            ("Hello@# World!", "hello world"),
            ("Remove !@#$ special %^&* characters", "remove special characters"),
            ("Keep digits 1234567890", "keep digits 1234567890"),

            # Extreme test values
            ("    ", ""),  # All whitespace, expect empty string
            ("!@#$%^&*()_+", ""),  # All special characters, expect empty string
            ("ÁČÇÈÑTÉÐ ßÞÉÇÏÀL ÇHÁRÁÇTÉRS", "áčçèñtéð ßþéçïàl çháráçtérs"),
            ("ÛÑÎÇØÐÊ ÄŠÇÏÏ ÊÑÇØÐÏÑG", "ûñîçøðê äšçïï êñçøðïñg"),
            ("     ÛÑÎÇØÐÊ   ", "ûñîçøðê"),  # Leading/trailing whitespace with accented characters
            ("\n    ÛÑÎÇØÐÊ     ÄŠÇÏÏ     ÊÑÇØÐÏÑG    ", "ûñîçøðê äšçïï êñçøðïñg"),  # Multiple operations with accented characters and whitespace
        ]
        
        self.subtester(test_values)


unittest.main(argv=[''], exit=False)

......
----------------------------------------------------------------------
Ran 6 tests in 0.004s

OK


<unittest.main.TestProgram at 0x105cc5cf0>

## Librarian

In [11]:
"""
librarian.py - A module for managing collections and performing lookups in a library.

This module provides the `Librarian` class, which allows you to interact with collections stored as CSV files in a library.

Classes:
    Librarian: A class for managing collections and performing lookups in a library.

Usage:
    from librarian import Librarian

    librarian = Librarian()
    result = librarian.lookup("my_collection", "key1", "key2")

    # The `lookup` method performs a lookup operation on a collection, using one or more keys.

"""
import csv
from collections import ChainMap
from contextlib import contextmanager
from typing import Dict, List, Tuple, Generator, Any


class Librarian:
    """
    A class for managing collections and performing lookups in a library.

    Attributes:
        LIBRARY (str): The path to the library directory.

    Methods:
        get_collection: Context manager for accessing a collection file.
        lookup: Perform a lookup operation on a collection.
        extrapolate_dicts: Extrapolate dictionaries from an open CSV file.
        create_datamap: Create a ChainMap data structure from dictionaries.
    """
    LIBRARY = 'library/'

    def __init__(self) -> None:
        pass

    @contextmanager
    def get_collection(self, collection: str) -> Generator:
        """
        Context manager to open a collection file.

        Args:
            collection (str): The path to the collection's CSV file.

        Yields:
            file: The opened file object.

        Raises:
            OSError: If an error occurs while opening the file.
        """
        # The with block closes the file once the caller is done with it;
        # any error raised by open() propagates to the caller unchanged
        with open(collection, 'r', newline='', encoding='utf-8-sig') as file:
            yield file

    def lookup(self, collection: str, *keys: str):
        """
        Perform a lookup operation on a collection.

        Args:
            collection (str): The name of the collection.
            *keys (str): Variable number of keys for the lookup.

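        Returns:
            The value reached after following each key in turn. Keys that
            are not present are skipped, so if no key matches, the whole
            ChainMap is returned.

        Raises:
            OSError: If the collection file cannot be opened.
            IndexError: If the collection file is empty.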
        """
        collection_filepath = f'{self.LIBRARY}{collection}.csv'

        with self.get_collection(collection_filepath) as file:
            rows: List[List[str]] = list(csv.reader(file))
            parent_dict, child_dict, rows, parent_keys = self.extrapolate_dicts(rows)
            result_map = self.create_datamap(parent_dict, child_dict, rows, parent_keys)
            
            for key in keys:
                if key in result_map:
                    result_map = result_map[key]

            return result_map

    def extrapolate_dicts(self, rows) -> Tuple[Dict[str, Dict[str, str]], Dict[str, str], List[List[str]], List[str]]:
        """
        Extrapolate dictionaries from an open CSV file.

        Args:
            rows (List[List[str]]): The rows read from the CSV file, header row first.

        Returns:
            Tuple containing:
            - parent_dict (Dict[str, Dict[str, str]]): The parent dictionary.
            - child_dict (Dict[str, str]): The child dictionary.
            - rows (List[List[str]]): The rows of the CSV file.
            - parent_keys (List[str]): The keys of the parent dictionary.
        """
        parent_keys: List[str] = rows[0]
        child_dict: Dict[str, str] = {}

        for row in rows[1:]:
            child_key = row[0]
            child_value = row[1]
            child_dict[child_key] = child_value

        parent_dict: Dict[str, Dict[str, str]] = dict(zip(parent_keys, [child_dict] * len(parent_keys)))
        return parent_dict, child_dict, rows, parent_keys

    def create_datamap(self, parent_dict: Dict[str, Dict[str, Any]], child_dict: Dict[str, Any], rows: List[List[str]], parent_keys: List[str]) -> ChainMap:
        """
        Create a ChainMap data structure from dictionaries.

        Args:
            parent_dict (Dict[str, Dict[str, Any]]): The parent dictionary.
            child_dict (Dict[str, Any]): The child dictionary.
            rows (List[List[str]]): The rows of the CSV file.
            parent_keys (List[str]): The keys of the parent dictionary.

        Returns:
            ChainMap: The created ChainMap data structure.
        """
        for row in rows[1:]:
            child_key = row[0]
            child_value = row[1]
            child_dict[child_key] = child_value

        parent_dict = dict(zip(parent_keys, [child_dict] * len(parent_keys)))
        data_map: ChainMap = ChainMap(parent_dict, child_dict)
        return data_map
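
To make the data flow of `lookup` easier to follow, here is a minimal sketch that builds the same kind of `ChainMap` from an in-memory CSV rather than a file in `library/`. The CSV contents, `build_datamap` and `walk` are illustrative assumptions, not part of the module. The layout (a header row of parent keys, then key/value rows) is inferred from `extrapolate_dicts`.

```python
import csv
import io
from collections import ChainMap

# Hypothetical CSV contents: a header row, then key/value rows
SAMPLE_CSV = "english,unused\nthe,article\nand,conjunction\n"


def build_datamap(rows):
    """Mirror extrapolate_dicts/create_datamap for a list of CSV rows."""
    parent_keys = rows[0]
    # The first two columns of each data row become a key/value pair
    child_dict = {row[0]: row[1] for row in rows[1:]}
    # Every header cell maps to the same child dictionary
    parent_dict = {key: child_dict for key in parent_keys}
    return ChainMap(parent_dict, child_dict)


def walk(datamap, *keys):
    """Follow each key in turn, skipping keys that are not present."""
    result = datamap
    for key in keys:
        if key in result:
            result = result[key]
    return result


rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
datamap = build_datamap(rows)
print(walk(datamap, 'english', 'the'))  # article
print(walk(datamap, 'and'))             # conjunction
```

Because missing keys are skipped rather than rejected, a lookup for an unknown key returns the last mapping reached instead of raising an error.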



In [None]:
librarian = Librarian()
print(librarian.lookup('articles_conjunctions_prepositions'))

In [16]:
import unittest
from unittest.mock import mock_open, patch

class TestLibrarian(unittest.TestCase):
    def setUp(self):
        self.librarian = Librarian()

    def test_get_collection_success(self):
        collection = 'my_collection'
        expected_file_path = 'library/my_collection.csv'

        with patch('builtins.open', mock_open()) as mock_file:
            with self.librarian.get_collection(collection) as file:
                mock_file.assert_called_once_with(expected_file_path, 'r', newline='', encoding='utf-8-sig')

    def test_get_collection_exception(self):
        collection = 'non_existent_collection'
        expected_file_path = 'library/non_existent_collection.csv'

        with patch('builtins.open', side_effect=FileNotFoundError):
            with self.assertRaises(Exception):
                with self.librarian.get_collection(collection):
                    pass

    def test_lookup_success(self):
        collection = 'my_collection'
        keys = ('key1', 'key2')
        expected_result = 'lookup_result'

        with patch('builtins.open', mock_open()) as mock_file:
            mock_reader = mock_file.return_value.__enter__.return_value
            mock_reader.__iter__.return_value = [['parent_key', 'child_key', expected_result]]
            
            result = self.librarian.lookup(collection, *keys)
            
            self.assertEqual(result, expected_result)

    def test_lookup_key_not_found(self):
        collection = 'my_collection'
        keys = ('key1', 'key2')
        
        with patch('builtins.open', mock_open()) as mock_file:
            mock_reader = mock_file.return_value.__enter__.return_value
            mock_reader.__iter__.return_value = [['parent_key', 'child_key', 'lookup_result']]
            
            result = self.librarian.lookup(collection, *keys)
            
            self.assertIsNone(result)

    def test_extrapolate_dicts(self):
        rows = [
            ['parent_key', 'child_key', 'child_value'],
            ['parent1', 'child1', 'value1'],
            ['parent2', 'child2', 'value2']
        ]
        expected_parent_dict = {
            'parent_key': {
                'child_key': 'child_value'
            }
        }
        expected_child_dict = {
            'child1': 'value1',
            'child2': 'value2'
        }
        expected_parent_keys = ['parent_key']

        parent_dict, child_dict, _, parent_keys = self.librarian.extrapolate_dicts(rows)

        self.assertEqual(parent_dict, expected_parent_dict)
        self.assertEqual(child_dict, expected_child_dict)
        self.assertEqual(parent_keys, expected_parent_keys)

    def test_create_datamap(self):
        parent_dict = {
            'parent_key': {
                'child_key': 'child_value'
            }
        }
        child_dict = {
            'child1': 'value1',
            'child2': 'value2'
        }
        rows = [
            ['parent_key', 'child_key', 'child_value'],
            ['parent1', 'child1', 'value1'],
            ['parent2', 'child2', 'value2']
        ]
        parent_keys = ['parent_key']

        expected_datamap = {
            'parent_key': {
                'child_key': 'child_value'
            },
            'child1': 'value1',
            'child2': 'value2'
        }

        datamap = self.librarian.create_datamap(parent_dict, child_dict, rows, parent_keys)

        self.assertEqual(datamap, expected_datamap)

unittest.main(argv=[''], exit=False)


......FF.FEE
ERROR: test_lookup_key_not_found (__main__.TestLibrarian)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/folders/90/7k5gmw150tnbl3nn8np374kw0000gn/T/ipykernel_11035/1685998256.py", line 46, in test_lookup_key_not_found
    result = self.librarian.lookup(collection, *keys)
  File "/Users/charliemarshall/Documents/GitHub/iammokzeeee personal tools/filemanagement/pyrus/src/pyrus/librarian.py", line 36, in lookup
    unmapped_dicts = self.extrapolate_dicts(open_file)
  File "/Users/charliemarshall/Documents/GitHub/iammokzeeee personal tools/filemanagement/pyrus/src/pyrus/librarian.py", line 48, in extrapolate_dicts
    parent_keys = rows[0]
IndexError: list index out of range

ERROR: test_lookup_success (__main__.TestLibrarian)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/folders/90/7k5gmw150tnbl3nn8np374kw0000gn/T/ipykernel_11035/16

<unittest.main.TestProgram at 0x10a00ffd0>
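
One possible way to feed CSV rows to code that calls `open` is `mock_open(read_data=...)`. On Python 3.8 and later the mocked handle also supports iteration, so `csv.reader` can consume it. The sketch below uses a hypothetical `read_rows` helper standing in for the file-reading step of `Librarian.lookup`; the CSV contents are made up for illustration.

```python
import csv
from unittest.mock import mock_open, patch


def read_rows(path):
    """Hypothetical stand-in for the file-reading step of Librarian.lookup."""
    with open(path, 'r', newline='', encoding='utf-8-sig') as file:
        return list(csv.reader(file))


# Made-up CSV contents: a header row followed by one key/value row
fake_csv = "parent_key,other\nchild_key,lookup_result\n"

with patch('builtins.open', mock_open(read_data=fake_csv)) as mocked:
    rows = read_rows('library/my_collection.csv')

# The mock records how open() was called, so the path can be checked too
mocked.assert_called_once_with(
    'library/my_collection.csv', 'r', newline='', encoding='utf-8-sig')
print(rows)
```

Supplying the data through `read_data` avoids configuring `__iter__` on the mock by hand and leaves an empty file easy to simulate with `read_data=''`.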

### Presenter

#### Presenter - Articles, Conjunctions and Prepositions

Conventions on Articles, Conjunctions, Prepositions:

Group 1: Articles, conjunctions, and prepositions are usually not capitalized in titles, unless they are the first word or part of a proper noun.

- French

Group 2: Articles, conjunctions, and prepositions are typically not capitalized unless they are the first or last word, or if they have four or more letters.

- German
- Dutch

Group 3: Articles, conjunctions, and prepositions are generally not capitalized in titles, unless they are the first or last word, or if they are stressed as part of the title's style or emphasis.

- Spanish

Group 4: Articles, conjunctions, and prepositions are typically not capitalized in titles, except when they are the first or last word, or if they have special emphasis or are part of proper nouns.

- Italian
- Portuguese

Group 5: Articles, conjunctions, and prepositions are generally not capitalized in titles unless they are the first or last word, or if they have special emphasis or are part of proper nouns.

- Norwegian
- Swedish
- Danish
- Finnish
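
The Group 2 rule is the most mechanical of the conventions above, so it lends itself to a short sketch: minor words stay lowercase unless they are the first or last word, or have four or more letters. `MINOR_WORDS` is a made-up sample list and `title_case_group2` is a hypothetical helper; neither is part of Pyrus, and the other groups (which depend on emphasis or proper nouns) would need more than a word list.

```python
# Illustrative sample of articles, conjunctions and prepositions
# (a made-up list, not the library's real collection)
MINOR_WORDS = {'der', 'die', 'das', 'und', 'in', 'von', 'zwischen',
               'de', 'het', 'en', 'van'}


def title_case_group2(title: str) -> str:
    """Capitalise a title following the Group 2 convention."""
    words = title.lower().split()
    last_index = len(words) - 1
    result = []
    for index, word in enumerate(words):
        is_edge = index in (0, last_index)
        is_minor = word in MINOR_WORDS
        # Minor words stay lowercase unless first/last or 4+ letters long
        if is_minor and not is_edge and len(word) < 4:
            result.append(word)
        else:
            result.append(word.capitalize())
    return ' '.join(result)


print(title_case_group2('der herr der ringe'))  # Der Herr der Ringe
```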

## Data Validator