# Demonstration of STIX-D's Clex Importer Tool

## Introduction

Welcome to this demonstration of the Clex Importer tool. The Clex Importer is a specialized utility designed to seed the lexicon table in the STIX-D Corpus Database with the Attempto Controlled English (ACE) common lexicon. ACE is a controlled natural language, a precisely defined subset of English, that both humans and machines can interpret unambiguously. This makes it well suited to applications that require precise language processing.

During this demonstration, we will walk through the process of importing the ACE common lexicon into the database. The tool reads the Clex lexicon file, parses its content, and systematically imports the lexical entries into the database. By the end of this demonstration, the `lexicon` table will be populated with sample entries to support ACE-based natural language processing tasks.

## Agenda

1. Use Case
1. Project Design
1. Code Interaction with Database
1. Test Cases
    - All Tests
    - Unit Tests
    - Integration Tests
    - End-to-End Tests
1. Code Execution
    - Command Line Interface (in notebook)
    - Web Interface (not in notebook)

## 1. Use Case

The STIX-D Use Case L1 involves seeding the `stixd_corpus.lexicon` database table with lexical entries from the ACE Common Lexicon (Clex) or similar files. An administrator provides a URI to the lexicon file, and the system connects to the local database via the `mysql_repository.py` module. For each line in the lexicon file, the system extracts the word tag and word form, generates a SHA256 hash of these two components, and looks up the hash in the `lexicon` table. If the hash exists, the system links the existing entry with an existing source ID. If it does not, the system creates a new entry. The system also imports the fact's remaining arguments (for example, the logical symbol and an optional third argument such as `neutr`) into the corresponding fields, and it outputs summary information or error messages as necessary.
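The per-line lookup described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's code. The `'|'` separator used to build the hash key, the function names, and the in-memory `dict`/`set` that stand in for the `lexicon` and `obj_lex_jt` tables are all assumptions, so the digests it produces are not expected to match those stored by STIX-D.

```python
import hashlib


def tag_form_hash(word_tag: str, word_form: str) -> str:
    """Return a SHA-256 hex digest for a (word_tag, word_form) pair.

    ASSUMPTION: the two strings are joined with '|' before hashing; the real
    importer may combine them differently.
    """
    key = f"{word_tag}|{word_form}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


def upsert_entry(lexicon: dict, links: set, word_tag: str, word_form: str,
                 source_id: str) -> bool:
    """Create the entry if its hash is new, then link it to the source.

    `lexicon` (hash -> lex_id) and `links` ({(lex_id, source_id)}) are in-memory
    stand-ins for the `lexicon` and `obj_lex_jt` tables. Returns True if new.
    """
    entry_hash = tag_form_hash(word_tag, word_form)
    is_new = entry_hash not in lexicon
    if is_new:
        lexicon[entry_hash] = len(lexicon) + 1  # next lex_id
    links.add((lexicon[entry_hash], source_id))  # the set ignores duplicate links
    return is_new


lexicon, links = {}, set()
pairs = [("adv", "fast"), ("adv_comp", "faster"), ("adv", "fast")]
print([upsert_entry(lexicon, links, tag, form, "x-stixd-clex--example") for tag, form in pairs])
# prints [True, True, False]; lexicon and links each end up with 2 items
```

Because the links are kept in a set, a line that repeats an earlier tag and form neither creates a new entry nor adds a second link to the same source. This matches the row counts reported after the import later in this notebook.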

## 2. Project Design

### Project Overview
The Clex Importer tool imports lexical entries from the Attempto Controlled English (ACE) lexicon file, stored as Prolog facts, into the `lexicon` table of the STIX-D MySQL database. The tool is accessible via the command line or through a web form served by a Flask API, where users enter a URL pointing to an ACE lexicon file. The system then parses each Prolog fact and maps it to the appropriate attributes in the `lexicon` table. It also creates the relevant entries in the `stix_objects` table (i.e., source documents) and the `obj_lex_jt` junction table.
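Parsing a single Clex fact amounts to splitting `tag(arg1, arg2[, arg3]).` into its parts. The sketch below assumes that layout and allows optional single-quoted atoms such as `'Aaron'`. The field names mirror the parameters of `test_70_clex_importer_local.py` (`word_tag`, `word_form`, `logical_symbol`, `third_arg`). However, `parse_clex_fact` and `ClexEntry` are hypothetical names and not the project's `parse_clex_entry()`.

```python
import csv
import re
from typing import NamedTuple, Optional

# Matches one fact such as: noun_pl(months, month, neutr).
FACT_RE = re.compile(r"^\s*([a-z][a-z0-9_]*)\((.*)\)\.\s*$")


class ClexEntry(NamedTuple):
    word_tag: str
    word_form: str
    logical_symbol: str
    third_arg: Optional[str]


def parse_clex_fact(line: str) -> Optional[ClexEntry]:
    """Parse one Prolog fact; return None for comments, blank lines, or other text."""
    match = FACT_RE.match(line)
    if not match:
        return None
    tag, arg_text = match.groups()
    # csv treats single-quoted atoms like 'Aaron' as one field, even with commas inside
    args = next(csv.reader([arg_text], quotechar="'", skipinitialspace=True))
    args = [arg.strip() for arg in args]
    if len(args) < 2:
        return None
    third = args[2] if len(args) > 2 else None
    return ClexEntry(tag, args[0], args[1], third)


print(parse_clex_fact("noun_pl(months, month, neutr)."))
# ClexEntry(word_tag='noun_pl', word_form='months', logical_symbol='month', third_arg='neutr')
```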

### OOP Principles in the Project
This project is designed using object-oriented programming (OOP) principles to create a modular, extensible, and maintainable system. The key OOP principles in the project are as follows:

- **Abstraction**: The project uses abstract classes and methods to define interfaces and enforce a common structure. For example, the Repository class defines abstract methods for interacting with the database, which are implemented by MySQLRepository.
- **Encapsulation**: Each class is responsible for a specific aspect of the project, encapsulating related data and behavior. For example, ClexImporter encapsulates the logic for importing Clex entries, while MySQLRepository encapsulates database interactions.
- **Inheritance**: The project uses inheritance to create a hierarchy of classes with shared behavior. For example, MySQLRepository inherits from Repository to reuse common database interaction methods.
- **Polymorphism**: The project uses polymorphism to allow different classes to be used interchangeably. For example, the Repository interface allows different types of repositories to be used with the ClexImporter.
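The abstraction and polymorphism described above can be illustrated with Python's `abc` module. In this sketch, `Repository`, `InMemoryRepository`, `ImporterSketch`, and their method names are illustrative assumptions rather than the project's actual interface. The point is that the importer depends only on the abstract interface, so it accepts a MySQL-backed repository or an in-memory repository interchangeably.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Repository(ABC):
    """Abstract storage interface (method names are illustrative)."""

    @abstractmethod
    def save(self, table: str, entry: dict) -> int:
        """Store an entry and return its row id."""

    @abstractmethod
    def find_by_hash(self, table: str, tag_form_hash: str) -> Optional[dict]:
        """Return the entry with the given hash, or None."""


class InMemoryRepository(Repository):
    """Hypothetical stand-in for MySQLRepository, e.g. for quick experiments."""

    def __init__(self) -> None:
        self.tables: dict[str, list[dict]] = {}

    def save(self, table: str, entry: dict) -> int:
        rows = self.tables.setdefault(table, [])
        rows.append(dict(entry))
        return len(rows)  # 1-based row id

    def find_by_hash(self, table: str, tag_form_hash: str) -> Optional[dict]:
        for row in self.tables.get(table, []):
            if row.get("tag_form_hash") == tag_form_hash:
                return row
        return None


class ImporterSketch:
    """Hypothetical importer that depends only on the Repository interface."""

    def __init__(self, db_repo: Repository) -> None:
        self.db_repo = db_repo

    def store(self, entry: dict) -> bool:
        """Save the entry unless one with the same hash exists; return True if saved."""
        if self.db_repo.find_by_hash("lexicon", entry["tag_form_hash"]) is not None:
            return False
        self.db_repo.save("lexicon", entry)
        return True


importer = ImporterSketch(InMemoryRepository())  # any Repository subclass works here
print(importer.store({"tag_form_hash": "abc", "word_form": "fast"}))  # True
print(importer.store({"tag_form_hash": "abc", "word_form": "fast"}))  # False
```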

### Key Modules and Their OOP Design
The project consists of the following key modules, each designed using OOP principles:

- **`ClexImporter` Class in `clex_importer.py`**: 
    - **Responsibility**: Manages the importation of Clex entries into the database.
    - **Attributes**:
        - `db_repo`: Represents the database repository where Clex entries will be stored.
        - `uri`: The location of the Clex file to be imported.
    - **Methods**:
        - `import_clex_entries()`: Imports Clex entries from the specified file into the database.
        - `parse_clex_entry()`: Parses a single Clex entry from the file.
        - `map_clex_entry_to_lexicon()`: Maps the parsed Clex entry to the `lexicon` table schema.

- **`MySQLRepository` Class in `mysql_repository.py`**:
    - **Responsibility**: Implements the `Repository` interface for MySQL databases, abstracting the database interactions.
    - **Implementation**: Uses the `mysql-connector-python` library to connect to the database and execute queries.
    - **Attributes**:
        - `connection`: Represents the connection to the MySQL database.
        - `table_name`: The name of the table in the database.
    - **Methods**:
        - `create_table()`: Creates the table in the database.

## 3. Code Interaction with the Database

### Setup Notebook Environment

#### Install Necessary Packages

If you are running this notebook for the first time, uncomment the code cell below or run the command from your terminal to install the necessary Python packages for this demonstration notebook. Once you have installed the packages, please re-comment the code cell for a cleaner notebook interface.

```python
# %pip install -r ../demos/requirements.txt
```

## 4. Test Cases

In this section, we explore and demonstrate the comprehensive testing strategy employed in the project, which includes various types of test cases to ensure the reliability and correctness of the implemented code. The test suite consists of unit tests, which validate individual components such as database operations, NLP processing, and web scraping functionalities; integration tests, which check the interactions between different modules; and end-to-end tests, which simulate real user workflows to ensure the entire application behaves as expected from front-end to back-end. By running all tests together or focusing on specific ones, these test cases provide a robust framework for identifying and addressing issues, ensuring the system functions correctly across different scenarios and use cases.

### Import Necessary Libraries & Set Global Variables

```python
# Import standard libraries
import os
import sys

# Import third-party libraries
import pytest
# from IPython.display import IFrame, display

# Get the current working directory (CWD)
cwd = os.getcwd()
# Move up two levels to reach the stixd directory
stixd_path = os.path.abspath(os.path.join(cwd, '..', '..'))
# Append the stixd directory to the Python path
sys.path.append(stixd_path)

# Load Jupyter Notebook extensions
%load_ext sql

# Define global variables
TEST_DIR = os.path.join(cwd, '..', 'tests')
VERBOSITY = '-q'  # Quiet
TRACEBACK = '--tb=line'  # One line
```

```text
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
```

```python
# Connect to the database (replace the placeholder credentials with your own)
%sql mysql+mysqlconnector://your_username:your_password@localhost:3306/stixd_corpus
```

### All Test Cases

You can quickly run all the tests in the STIX-D project, including those outside of the Clex Importer tool, by executing the code snippet below. This command will execute every test case located in the test directory, providing a comprehensive check of the entire system in just 30-60 seconds. This is an efficient way to ensure that all components of the project are functioning as expected, and it is highly recommended to run this after making any significant changes to the codebase.

Please note that one or both of the end-to-end (e2e) test cases may occasionally fail but often pass when rerun. If an e2e test fails twice in a row, try restarting the notebook kernel and rerunning the test. This intermittent failure is likely due to a race condition in this notebook environment that does not manifest in a typical local environment. I will continue to investigate this issue to provide a more stable testing experience.

```python
# Run all tests in the test directory (~30-60 seconds)
pytest.main([TEST_DIR, VERBOSITY, TRACEBACK])
```

```text
................................                                         [100%]
32 passed in 43.01s

<ExitCode.OK: 0>
```
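Because the e2e tests are occasionally flaky, a small retry wrapper can save manual reruns. The sketch below is a hypothetical helper and not part of the project. It relies only on the fact that `pytest.main()` returns an exit code equal to `0` on success. pytest's built-in `--lf` (last-failed) option is another way to rerun only the tests that failed previously.

```python
from typing import Callable


def run_with_retries(run_tests: Callable[[], int], max_attempts: int = 2) -> int:
    """Call `run_tests` until it returns 0 or `max_attempts` is reached.

    pytest.main() returns an ExitCode, an int enum whose OK member equals 0.
    Returns the exit code of the last attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        code = int(run_tests())
        if code == 0:
            break
        print(f"Attempt {attempt} failed with exit code {code}")
    return code


# Example (in this notebook, after the setup cell has defined TEST_DIR):
# run_with_retries(lambda: pytest.main([os.path.join(TEST_DIR, "test_90_e2e_local.py"), "-q"]))
```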

### Unit Tests

Unit tests are designed to validate the functionality of individual components or methods within the system. These tests focus on specific, isolated parts of the code, ensuring that each function or class behaves as expected under various conditions. Unit tests are the foundation of a robust testing strategy, catching issues early in the development process.

#### Test Case 1: doc_scrapper

The unit test case in the file `test_10_doc_scrapper.py` is designed to validate several key functions in a web scraping module. It includes tests for fetching HTML content from a URL using a mocked requests.get response, verifying if a webpage allows scraping by checking the presence of specific meta tags, converting HTML content to Markdown format, and saving the Markdown content to a file. The test suite utilizes fixtures for consistent HTML content, mocking to simulate file operations and HTTP requests, and asserts to ensure the correctness of the functions involved. Overall, the tests cover both functional aspects and edge cases of the web scraping process, ensuring that each component behaves as expected under controlled conditions.
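The scraping-permission check can be approximated with the standard library's `html.parser`. This sketch rests on one assumption: it treats a `<meta name="robots">` tag containing `noindex`, `nofollow`, or `none` as forbidding scraping. The meta tags that the project's `allows_scraping` function actually checks may differ.

```python
from html.parser import HTMLParser

# ASSUMPTION: robots directives treated as "do not scrape"; the project's rule may differ.
BLOCKING_DIRECTIVES = {"noindex", "nofollow", "none"}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def allows_scraping(html: str) -> bool:
    """Return False if a robots meta tag contains a blocking directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not (parser.directives & BLOCKING_DIRECTIVES)


print(allows_scraping('<meta name="robots" content="noindex, nofollow">'))  # False
```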

```python
# Run a specific test file in the test directory
test_file = "test_10_doc_scrapper.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 5 items

..\tests\test_10_doc_scrapper.py::test_fetch_html PASSED                 [ 20%]
..\tests\test_10_doc_scrapper.py::test_allows_scraping PASSED            [ 40%]
..\tests\test_10_doc_scrapper.py::test_convert_html_to_markdown PASSED   [ 60%]
..\tests\test_10_doc_scrapper.py::test_save_markdown PASSED              [ 80%]
..\tests\test_10_doc_scrapper.py::test_process_url PASSED                [100%]

<ExitCode.OK: 0>
```

#### Test Case 2: gen_clex_uuid

The unit test case in the file `test_20_gen_clex_uuid.py` tests the `generate_stix_uuid` function, which generates STIX-compliant UUIDs based on the input parameters. The test uses parameterized test cases, in which different combinations of UUID version, object type, and file URL are passed to the function. It mocks `requests.get` to simulate retrieving content from the specified URLs without making actual network requests. The test verifies that each generated STIX identifier has the correct format: the specified object type prefix followed by a valid UUID. This ensures that `generate_stix_uuid` behaves correctly for different input scenarios.
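A STIX identifier has the form `<object-type>--<UUID>`, as in the `x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255` ID printed by the importer later in this notebook. The sketch below generates such identifiers for UUID version 4 (random) and version 5 (name-based). It takes the file content as bytes instead of fetching a URL. Two further choices are assumptions made for illustration, not necessarily what `generate_stix_uuid` does: using `uuid.NAMESPACE_URL` as the namespace, and hashing the content before deriving the version 5 UUID.

```python
import hashlib
import re
import uuid
from typing import Optional

# Matches '<object-type>--<UUID v4 or v5>'
STIX_ID_RE = re.compile(
    r"^[a-z0-9-]+--[0-9a-f]{8}-[0-9a-f]{4}-[45][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)


def generate_stix_id(object_type: str, version: int = 4,
                     content: Optional[bytes] = None) -> str:
    """Return '<object_type>--<uuid>' using a v4 (random) or v5 (name-based) UUID.

    ASSUMPTION: for v5, the SHA-256 hex digest of `content` is used as the name
    within uuid.NAMESPACE_URL, so identical content always yields the same ID.
    """
    if version == 4:
        value = uuid.uuid4()
    elif version == 5:
        if content is None:
            raise ValueError("version 5 requires content")
        value = uuid.uuid5(uuid.NAMESPACE_URL, hashlib.sha256(content).hexdigest())
    else:
        raise ValueError(f"unsupported UUID version: {version}")
    return f"{object_type}--{value}"


print(generate_stix_id("x-stixd-clex"))  # e.g. x-stixd-clex--<random UUID>
print(bool(STIX_ID_RE.match("x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255")))  # True
```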

```python
# Run a specific test file in the test directory
test_file = "test_20_gen_clex_uuid.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 2 items

..\tests\test_20_gen_clex_uuid.py::test_generate_uuid[4-x-stixd-clex-https:\raw.githubusercontent.com\ciioprof0\stixd\03c934281777fecd3edb1d8622310bbf0839c17d\tests\test_clex.pl] PASSED [ 50%]
..\tests\test_20_gen_clex_uuid.py::test_generate_uuid[4-x-stixd-clex-https:\raw.githubusercontent.com\Attempto\Clex\20960a5ce07776cb211a8cfb25dc8c81fcdf25e2\clex_lexicon.pl] PASSED [100%]

<ExitCode.OK: 0>
```

#### Test Case 3: mysql_repo

The unit test case in the file `test_30_mysql_repo.py` tests the `MySQLRepository` class, which is responsible for interacting with a MySQL database. The test suite includes two key tests: one that saves an entry into a database table and then loads it, and another that finds an entry by its ID (specifically, by its `tag_form_hash`). The tests use mocking to simulate database connections and operations, so they verify how the code interacts with the database without needing an actual database connection. Assertions verify that the SQL queries are executed correctly and that the retrieved data matches the expected output. Together, these tests ensure that the repository's methods manage database entries as intended.
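The mocking pattern described above can be illustrated with `unittest.mock.MagicMock`. `LexiconLookup` and `find_entry_by_hash` are hypothetical simplifications and not the project's `MySQLRepository`. The `%s` placeholder and the `cursor(dictionary=True)` call follow `mysql-connector-python` conventions, although the mock accepts any arguments. The test checks the exact SQL and parameters sent to the cursor without touching a real database.

```python
from unittest.mock import MagicMock

LOOKUP_SQL = "SELECT * FROM lexicon WHERE tag_form_hash = %s"


class LexiconLookup:
    """Hypothetical, simplified repository used to illustrate the mocking pattern."""

    def __init__(self, connection) -> None:
        self.connection = connection

    def find_entry_by_hash(self, tag_form_hash: str):
        cursor = self.connection.cursor(dictionary=True)
        try:
            cursor.execute(LOOKUP_SQL, (tag_form_hash,))
            return cursor.fetchone()
        finally:
            cursor.close()


def test_find_entry_by_hash() -> None:
    connection = MagicMock()
    # connection.cursor(...) returns this same mock regardless of arguments
    cursor = connection.cursor.return_value
    cursor.fetchone.return_value = {"lex_id": 1, "word_tag": "adv", "word_form": "fast"}

    entry_hash = "67e9b1c5cbd53045919deda792be49b18b41a09b3bd328f9cc406bb27d951f62"
    row = LexiconLookup(connection).find_entry_by_hash(entry_hash)

    cursor.execute.assert_called_once_with(LOOKUP_SQL, (entry_hash,))
    cursor.close.assert_called_once()
    assert row["word_form"] == "fast"


test_find_entry_by_hash()
```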

```python
# Run a specific test file in the test directory
test_file = "test_30_mysql_repo.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 2 items

..\tests\test_30_mysql_repo.py::test_save_and_load_entry PASSED          [ 50%]
..\tests\test_30_mysql_repo.py::test_find_entry_by_id PASSED             [100%]

<ExitCode.OK: 0>
```

#### Test Case 4: nlp_manager

The unit test case in the file `test_40_nlp_manager.py` tests the `NLPManager` class, which is part of an NLP (Natural Language Processing) pipeline. The test suite covers two main methods: `process_text` and `process_sentence`. Both tests use mocking to simulate the behavior of these methods within the `NLPManager` class. The `process_text` test checks that the method processes a block of text and returns both the processed text and associated metadata. Similarly, the `process_sentence` test checks that the method processes a single sentence and returns the expected data structure. Because this behavior is mocked, the tests primarily confirm the interface that other components of the NLP pipeline rely on, namely the structure of the returned data, rather than the accuracy of the underlying NLP processing.

```python
# Run a specific test file in the test directory
test_file = "test_40_nlp_manager.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 2 items

..\tests\test_40_nlp_manager.py::test_process_text PASSED                [ 50%]
..\tests\test_40_nlp_manager.py::test_process_sentence PASSED            [100%]

<ExitCode.OK: 0>
```

#### Test Case 5: doc_manager

The unit test case in the file `test_50_doc_manager.py` is designed to validate the functionality of the `DocumentManager` class, which is responsible for managing documents within a database. The test suite includes three key tests: `test_create_document`, `test_link_document`, and `test_process_document_text`. Each test uses mocking to simulate the database connection and operations without requiring a real database. The `test_create_document` ensures that a new STIX object is correctly inserted into the `documents` table. The `test_link_document` verifies that the correct association between a STIX ID and a document ID is made in the join table. Lastly, `test_process_document_text` checks that the document text is processed by the `NLPManager` and the resulting processed text and metadata are accurately updated in the database. These tests ensure that the `DocumentManager` performs its tasks of creating, linking, and processing documents effectively.

```python
# Run a specific test file in the test directory
test_file = "test_50_doc_manager.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 3 items

..\tests\test_50_doc_manager.py::test_create_document PASSED             [ 33%]
..\tests\test_50_doc_manager.py::test_link_document PASSED               [ 66%]
..\tests\test_50_doc_manager.py::test_process_document_text PASSED       [100%]

<ExitCode.OK: 0>
```

#### Test Case 6: sent_manager

The unit test case in the file `test_53_sent_manager.py` is designed to validate the functionality of the `SentenceManager` class, which manages sentences within a database context. The test suite includes three primary tests: `test_create_sentence`, `test_link_sentence`, and `test_process_sentence_text`. Each test uses mocking to simulate the database connection and operations, ensuring that the tests do not require a live database. The `test_create_sentence` ensures that a sentence associated with a specific document ID is correctly inserted into the `sentences` table. The `test_link_sentence` verifies that the relationship between a document and a sentence is accurately recorded in the join table. Lastly, `test_process_sentence_text` checks that the sentence is processed using the `NLPManager`, and the processed text is correctly updated in the database. These tests ensure that the `SentenceManager` class performs its roles of creating, linking, and processing sentences effectively and reliably.

```python
# Run a specific test file in the test directory
test_file = "test_53_sent_manager.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 3 items

..\tests\test_53_sent_manager.py::test_create_sentence PASSED            [ 33%]
..\tests\test_53_sent_manager.py::test_link_sentence PASSED              [ 66%]
..\tests\test_53_sent_manager.py::test_process_sentence_text PASSED      [100%]

<ExitCode.OK: 0>
```

#### Test Case 7: lexicon_manager

The unit test case in the file `test_57_lexicon_manager.py` is designed to validate the functionality of the `LexiconManager` class, which is responsible for managing lexicon entries within a database. The test suite includes three primary tests: `test_create_lexicon_entry`, `test_link_lexicon_entry`, and `test_process_word`. These tests use mocking to simulate database connections and operations, ensuring they do not depend on a real database. The `test_create_lexicon_entry` verifies that a new word associated with a specific sentence ID is correctly inserted into the `lexicon` table. The `test_link_lexicon_entry` ensures that the relationship between a sentence and a lexicon entry is accurately recorded in the join table. Lastly, the `test_process_word` checks that a word is processed using the `NLPManager`, and the resulting processed word is correctly updated in the database. These tests ensure that the `LexiconManager` class performs its tasks of creating, linking, and processing lexicon entries effectively and reliably.

```python
# Run a specific test file in the test directory
test_file = "test_57_lexicon_manager.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 3 items

..\tests\test_57_lexicon_manager.py::test_create_lexicon_entry PASSED    [ 33%]
..\tests\test_57_lexicon_manager.py::test_link_lexicon_entry PASSED      [ 66%]
..\tests\test_57_lexicon_manager.py::test_process_word PASSED            [100%]

<ExitCode.OK: 0>
```

#### Test Case 8: clex_importer_local

The unit test case in the file `test_70_clex_importer_local.py` is focused on testing the `ClexImporter` service layer, which is responsible for importing lexicon entries into a MySQL database from a specified Clex file. The test case utilizes parameterized inputs to check various scenarios, including different `lex_id`, `word_tag`, `word_form`, `logical_symbol`, `third_arg`, and `tag_form_hash` values. The test ensures that the `import_clex_entries` method of the `ClexImporter` correctly imports the data and that the entries are accurately stored in the `lexicon` table in the database. It validates this by querying the database after the import and comparing the results to the expected values. These tests ensure the reliability and correctness of the lexicon import process.

```python
# Run a specific test file in the test directory (~10 seconds)
test_file = "test_70_clex_importer_local.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 4 items

..\tests\test_70_clex_importer_local.py::test_import_clex_entries[1-adv-fast-fast-None-67e9b1c5cbd53045919deda792be49b18b41a09b3bd328f9cc406bb27d951f62] PASSED [ 25%]
..\tests\test_70_clex_importer_local.py::test_import_clex_entries[19-noun_pl-months-month-neutr-6e7ab17fe3f242d10f360197f40646b443db6079d730e9d746c96824a2606336] PASSED [ 50%]
..\tests\test_70_clex_importer_local.py::test_import_clex_entries[39-iv_finsg-walks-walk-None-f02be7a15dcd7cca79dc9b1c141991d479120352658c50030c7268da9372e6ff] PASSED [ 75%]
..\tests\test_70_clex_importer_local.py::test_import_clex_entries[58-dv_pp-succeeded-succeed-as-8ee745975fad537905042b710e2f602f6c6bbe6c72f123b3596ce0b

<ExitCode.OK: 0>
```

#### Test Case 9: clex_importer_ci

The unit test case in the file `test_75_clex_importer_ci.py` is designed to test the `import_clex_entries` method of the `ClexImporter` class, which is part of a system that imports Clex lexicon entries into a MySQL database. The test uses mocking to simulate the behavior of external dependencies, such as HTTP requests and UUID generation, as well as interactions with the database via the `MySQLRepository`. The test mocks an HTTP request to retrieve Clex data and verifies that the data is correctly parsed and stored in the database. It also checks that UUIDs are generated appropriately for each entry and that the database methods for saving and linking entries are called the expected number of times. These tests ensure that the `import_clex_entries` method performs as expected under controlled conditions, handling both data retrieval and database interactions effectively.
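A similar approach works for the HTTP side. `unittest.mock.patch` replaces the download function for the duration of the test, and `call_count` confirms how many times the repository was called. This sketch uses the standard library's `urllib.request.urlopen` in place of the `requests.get` call used by the project, and `import_entries` and `save_entry` are hypothetical names.

```python
import urllib.request
from unittest.mock import MagicMock, patch


def import_entries(uri: str, repo) -> int:
    """Hypothetical importer: download a Clex file, save each fact, return the count."""
    with urllib.request.urlopen(uri) as response:
        text = response.read().decode("utf-8")
    count = 0
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("%"):  # skip blank lines and Prolog comments
            repo.save_entry(line)
            count += 1
    return count


def test_import_entries_with_mocked_http() -> None:
    fake_body = b"% comment\nadv(fast, fast).\nnoun_pl(months, month, neutr).\n"
    repo = MagicMock()
    with patch("urllib.request.urlopen") as mock_urlopen:
        # urlopen(...) is used as a context manager, so configure __enter__
        mock_urlopen.return_value.__enter__.return_value.read.return_value = fake_body
        count = import_entries("https://example.org/test_clex.pl", repo)
    mock_urlopen.assert_called_once_with("https://example.org/test_clex.pl")
    assert count == 2
    assert repo.save_entry.call_count == 2


test_import_entries_with_mocked_http()
```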

```python
# Run a specific test file in the test directory
test_file = "test_75_clex_importer_ci.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 1 item

..\tests\test_75_clex_importer_ci.py::test_import_clex_entries PASSED    [100%]

<ExitCode.OK: 0>
```

### Integration Tests

Integration tests verify that separate components work correctly together. In this project, the integration test exercises the Flask API's `/import_clex` endpoint, including its request handling and error responses. Its `ClexImporter` and `MySQLRepository` dependencies are mocked.

#### Test Case 10: api

The integration test in the file `test_80_api.py` focuses on the `/import_clex` endpoint of a Flask API, which handles requests to import Clex lexicon entries. The test suite uses mocking to simulate the behavior of the `ClexImporter` and `MySQLRepository` classes, so the tests do not rely on external systems. Five scenarios are tested: a successful import, a bad request with missing or invalid data, general exceptions during the import process, MySQL-specific errors, and system-level errors such as `OSError`. The tests verify that the API responds with the correct HTTP status code and JSON message in each case, confirming that the endpoint handles errors robustly and functions correctly under various conditions.

```python
# Run a specific test file in the test directory
test_file = "test_80_api.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 5 items

..\tests\test_80_api.py::test_import_clex_success PASSED                 [ 20%]
..\tests\test_80_api.py::test_import_clex_bad_request PASSED             [ 40%]
..\tests\test_80_api.py::test_import_clex_request_exception PASSED       [ 60%]
..\tests\test_80_api.py::test_import_clex_mysql_error PASSED             [ 80%]
..\tests\test_80_api.py::test_import_clex_system_error PASSED            [100%]

<ExitCode.OK: 0>
```

### End-to-End Tests

End-to-End tests simulate real user interactions with the entire system, testing the complete workflow from start to finish. These tests ensure that the application functions correctly as a whole, from the user interface down to the underlying database operations, replicating the experience of an actual user.

#### Test Case 11: e2e_local

The test in the file `test_90_e2e_local.py` is an end-to-end (E2E) test for a Flask web application that verifies the app's full functionality from the user's perspective. The test uses Selenium WebDriver to automate interactions with a web form where users submit a URI for importing Clex lexicon entries. It starts the Flask application in a separate process and initializes the Selenium WebDriver with specific options. It then fills out and submits the form and captures the application's response to assert that the operation succeeded. In this way, the E2E test confirms that the front-end and back-end components work together as expected, simulating real user behavior and validating the complete form-submission workflow.

Please note that one or both of the end-to-end (e2e) test cases may occasionally fail but often pass when rerun. If an e2e test fails twice in a row, try restarting the notebook kernel and rerunning the test(s).

```python
# Run a specific test file in the test directory (~15 seconds)
test_file = "test_90_e2e_local.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 1 item

..\tests\test_90_e2e_local.py::test_form_submission PASSED               [100%]

<ExitCode.OK: 0>
```

#### Test Case 12: e2e_ci

The test in the file `test_95_e2e_ci.py` is an end-to-end (E2E) test designed to verify the complete functionality of the Flask web application as it would run in a continuous integration (CI) environment. Like the local E2E test, it uses Selenium WebDriver to automate browser interactions, simulating a user submitting the form on the application's web interface. The test starts the Flask app in a separate process, enters a URI into the form, submits it, and checks that the response contains the expected output ("Import successful"). This test ensures that the web application works correctly in a CI environment, validating the entire user journey from form submission to the final response.


```python
# Run a specific test file in the test directory (~15 seconds)
test_file = "test_95_e2e_ci.py"
pytest.main([os.path.join(TEST_DIR, test_file), "-v", "--tb=auto"])
```

```text
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- d:\OneDrive\Code\hltms\stixd\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: d:\OneDrive\Code\hltms\stixd
configfile: pytest.ini
plugins: anyio-4.4.0, mock-3.14.0
collecting ... collected 1 item

..\tests\test_95_e2e_ci.py::test_form_submission PASSED                  [100%]

<ExitCode.OK: 0>
```

## 5. Code Execution

### Initialize the Database

When you run the code cell below to reset the database, you may encounter `Error: 1064 (42000)`.

This error occurs because `DELIMITER` is a command of the `mysql` command-line client rather than an SQL statement. The MySQL server therefore rejects it when the script is sent through the `mysql.connector` library used in Python. Despite this error, the SQL script executes as intended, and the error can be safely ignored.


```python
# Reset the database to start with an empty database
%run ../app/reset_database.py
```

```text
Error: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DELIMITER ;


-- Create procedure to check for prolog constraints (sp_check_prol' at line 1
```
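If you would rather avoid the error, one option is to split the script on the client side and handle `DELIMITER` lines before anything reaches the server. The sketch below is a simplified, hypothetical helper and not part of `reset_database.py`. It assumes that every statement ends at the end of a line, and it does not detect delimiters inside string literals or comments. Each returned statement could then be executed individually with `cursor.execute()`.

```python
import re

DELIMITER_RE = re.compile(r"^\s*DELIMITER\s+(\S+)\s*$", re.IGNORECASE)


def split_sql_script(script: str) -> list[str]:
    """Split a MySQL script into statements, honouring DELIMITER directives.

    DELIMITER lines are client-side commands, so they are consumed here and
    never returned. Simplification: a statement must end at the end of a line.
    """
    statements: list[str] = []
    delimiter = ";"
    buffer: list[str] = []
    for line in script.splitlines():
        match = DELIMITER_RE.match(line)
        if match:
            delimiter = match.group(1)  # switch to the new delimiter
            continue
        buffer.append(line)
        if line.rstrip().endswith(delimiter):
            text = "\n".join(buffer).rstrip()
            statement = text[: -len(delimiter)].strip()  # drop the delimiter itself
            if statement:
                statements.append(statement)
            buffer = []
    leftover = "\n".join(buffer).strip()
    if leftover:
        statements.append(leftover)
    return statements


script = """CREATE TABLE t (id INT);
DELIMITER //
CREATE PROCEDURE p()
BEGIN
  SELECT 1;
END //
DELIMITER ;
"""
print(len(split_sql_script(script)))  # 2
```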


### Show Database Initial State

After resetting the database, all tables should be empty. Let's verify the initial state of the database by counting the rows in the three tables affected by the Clex Importer tool.

```python
# Fetch table counts
lexicon_count = %sql SELECT COUNT(*) FROM lexicon;
stix_objects_count = %sql SELECT COUNT(*) FROM stix_objects;
obj_lex_jt_count = %sql SELECT COUNT(*) FROM obj_lex_jt;

# Display number of rows in each table
print(f"\nRows in 'stix_objects' table:  {stix_objects_count[0][0]}\n"
      f"Rows in 'lexicon' table:       {lexicon_count[0][0]}\n"
      f"Rows in 'obj_lex_jt' table:    {obj_lex_jt_count[0][0]}")
```

```text
 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
1 rows affected.
 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
1 rows affected.
 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
1 rows affected.

Rows in 'stix_objects' table:  0
Rows in 'lexicon' table:       0
Rows in 'obj_lex_jt' table:    0
```
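Outside a notebook, where the `%sql` magic is not available, the same counts can be fetched through any DB-API 2.0 connection, such as one returned by `mysql.connector.connect()`. This is a sketch: `table_counts` is a hypothetical helper, and the usage example uses an in-memory `sqlite3` database only so that it runs without a MySQL server. Table names cannot be passed as query parameters, so the function should only receive trusted names.

```python
import sqlite3

TABLES = ("stix_objects", "lexicon", "obj_lex_jt")


def table_counts(connection, tables=TABLES) -> dict[str, int]:
    """Return {table: row_count} using any DB-API 2.0 connection."""
    counts = {}
    cursor = connection.cursor()
    try:
        for table in tables:
            # Identifiers cannot be bound as parameters; only pass trusted names.
            cursor.execute(f"SELECT COUNT(*) FROM {table}")
            counts[table] = cursor.fetchone()[0]
    finally:
        cursor.close()
    return counts


demo = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
for name in TABLES:
    demo.execute(f"CREATE TABLE {name} (id INTEGER)")
print(table_counts(demo))  # {'stix_objects': 0, 'lexicon': 0, 'obj_lex_jt': 0}
```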


### Execution from Command Line

The Clex Importer tool can be run from the command line by passing it the URL of a Clex file:

    python clex_importer.py <URL_TO_CLEX_FILE>

In this notebook, the script is run with IPython's `%run` magic against the sample file `test_clex.pl`:

- After a database reset, expect 62 total entries with 59 new and 3 existing entries.
- Without a database reset, expect 62 total entries with 0 new and 62 existing entries.
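
The new/existing split follows from the hash check described in the Use Case: an entry counts as new only if its tag/form hash is not already in the `lexicon` table. A plausible reading of the 59/3 split after a reset is that `test_clex.pl` repeats three tag/form pairs, so their second occurrences find a hash inserted moments earlier. The sketch below simulates that bookkeeping in memory. The function names and the `"<tag>|<form>"` hash input are assumptions for illustration, so these digests need not match the `tag_form_hash` values stored by the real tool.

```python
import hashlib
from collections.abc import Iterable


def tag_form_hash(word_tag: str, word_form: str) -> str:
    """SHA-256 hex digest of a tag/form pair.

    ASSUMPTION: the pair is joined as "<tag>|<form>"; the real importer's
    input format may differ, so digests may not match the lexicon table.
    """
    return hashlib.sha256(f"{word_tag}|{word_form}".encode("utf-8")).hexdigest()


def classify_entries(
    entries: Iterable[tuple[str, str]],
    existing_hashes: Iterable[str] = (),
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split (word_tag, word_form) pairs into new and existing entries."""
    known = set(existing_hashes)  # hashes already in the lexicon table
    new, existing = [], []
    for tag, form in entries:
        digest = tag_form_hash(tag, form)
        if digest in known:
            existing.append((tag, form))  # would only be linked to the source
        else:
            known.add(digest)  # later duplicates in the same file now match
            new.append((tag, form))  # would be inserted into lexicon
    return new, existing


# A repeated pair in one file yields one new and one existing entry.
sample = [("adv", "fast"), ("adv_comp", "faster"), ("adv", "fast")]
new, existing = classify_entries(sample)
print(len(new), len(existing))  # 2 1
```

Running the same entries again with the first run's hashes as `existing_hashes` reports every entry as existing. This matches the "without a database reset" case above.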

In [None]:
%run ../app/clex_importer.py "https://github.com/ciioprof0/stixd/raw/main/lexicon/test_clex.pl"


Saved STIX object with ID: x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255
Generated hash: 67e9b1c5cbd53045919deda792be49b18b41a09b3bd328f9cc406bb27d951f62 for adv - fast
Inserted entry with lex_id: 1 into lexicon
Linking lex_id 1 with stix_object_id x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255
Successfully linked lex_id 1 with stix_uuid x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255
Generated hash: 38a31bf0527ff6fd23c6be74bfba58c46dbad709ce90b6d09b9a26f103a326b5 for adv_comp - faster
Inserted entry with lex_id: 2 into lexicon
Linking lex_id 2 with stix_object_id x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255
Successfully linked lex_id 2 with stix_uuid x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255
Generated hash: 55fee0f355e343b2c6a4d63b72a8ea8bcaa1a71698ada04e01533a8dc98fb4ee for adv_sup - fastest
Inserted entry with lex_id: 3 into lexicon
Linking lex_id 3 with stix_object_id x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255
Successfully linked lex_id 3 with sti
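
Each log line reports the word tag and word form extracted from one Prolog fact (for example, `adv - fast`). The sketch below parses a single line, assuming one fact per line of the form `tag(word_form, logical_symbol[, third_arg]).`, as the `lexicon` columns suggest. `parse_clex_line`, `split_args`, and `ClexEntry` are hypothetical names, not the importer's own code. Escaped quotes inside quoted atoms are not handled.

```python
import re
from typing import NamedTuple, Optional


class ClexEntry(NamedTuple):
    word_tag: str
    word_form: str
    logical_symbol: str
    third_arg: Optional[str]


# One fact per line, e.g. adv_comp(faster, fast).
FACT_PATTERN = re.compile(r"^([a-z][a-z0-9_]*)\((.*)\)\.$")


def split_args(arg_text: str) -> list[str]:
    """Split on commas outside single quotes and unquote quoted atoms."""
    args, current, in_quotes = [], [], False
    for ch in arg_text:
        if ch == "'":
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            args.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    args.append("".join(current).strip())
    return [a[1:-1] if len(a) >= 2 and a[0] == a[-1] == "'" else a for a in args]


def parse_clex_line(line: str) -> Optional[ClexEntry]:
    """Parse one Clex fact; return None for blanks, comments, or other lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("%"):
        return None  # blank line or Prolog comment
    match = FACT_PATTERN.match(stripped)
    if match is None:
        return None
    word_tag, arg_text = match.groups()
    args = split_args(arg_text)
    if len(args) not in (2, 3):
        return None
    third_arg = args[2] if len(args) == 3 else None
    return ClexEntry(word_tag, args[0], args[1], third_arg)


print(parse_clex_line("adv_comp(faster, fast)."))
# ClexEntry(word_tag='adv_comp', word_form='faster', logical_symbol='fast', third_arg=None)
```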

### Show Database State After Code Execution

After a run against a freshly reset database, the three tables should contain the following numbers of rows:

- Rows in 'stix_objects' table: 1 (the source document)
- Rows in 'lexicon' table: 59 (one per unique tag/form entry)
- Rows in 'obj_lex_jt' table: 59 (one link per lexicon entry)

Let's verify the state of the database by running a query to count entries in each table.

In [None]:
# Fetch table counts
lexicon_count = %sql SELECT COUNT(*) FROM lexicon;
stix_objects_count = %sql SELECT COUNT(*) FROM stix_objects;
obj_lex_jt_count = %sql SELECT COUNT(*) FROM obj_lex_jt;

# Display number of rows in each table
print(f"\nRows in 'stix_objects' table:  {stix_objects_count[0][0]}\n"
      f"Rows in 'lexicon' table:      {lexicon_count[0][0]}\n"
      f"Rows in 'obj_lex_jt' table:   {obj_lex_jt_count[0][0]}")

 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
1 rows affected.
 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
1 rows affected.
 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
1 rows affected.

Rows in 'stix_objects' table:  1
Rows in 'lexicon' table:      59
Rows in 'obj_lex_jt' table:   59
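
Instead of comparing the printed counts by eye, you can check the expected values for a fresh import (1, 59, and 59 rows) in code. The function `row_count_mismatches` and the constant `EXPECTED_AFTER_IMPORT` are hypothetical. In this notebook, the observed values would come from `stix_objects_count[0][0]`, `lexicon_count[0][0]`, and `obj_lex_jt_count[0][0]`.

```python
# Expected counts after importing test_clex.pl into a freshly reset database.
EXPECTED_AFTER_IMPORT = {"stix_objects": 1, "lexicon": 59, "obj_lex_jt": 59}


def row_count_mismatches(observed: dict[str, int], expected: dict[str, int]) -> list[str]:
    """Describe each table whose observed row count differs from the expected one."""
    problems = []
    for table, want in expected.items():
        got = observed.get(table)  # None if the table was not counted at all
        if got != want:
            problems.append(f"{table}: expected {want}, found {got}")
    return problems


# In this notebook, observed counts would come from the %sql results, e.g.:
# observed = {"stix_objects": stix_objects_count[0][0],
#             "lexicon": lexicon_count[0][0],
#             "obj_lex_jt": obj_lex_jt_count[0][0]}
# assert not row_count_mismatches(observed, EXPECTED_AFTER_IMPORT)
```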


We will also display up to the first five rows of each table to confirm that the Clex entries were imported.

In [None]:
# Display the first 5 rows of the lexicon table
%sql SELECT * FROM stixd_corpus.lexicon LIMIT 5;


 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
5 rows affected.


| lex_id | word_tag | word_form | logical_symbol | third_arg | tag_form_hash | word_def | synsets | tagsets |
|---|---|---|---|---|---|---|---|---|
| 1 | adv | fast | fast | | 67e9b1c5cbd53045919deda792be49b18b41a09b3bd328f9cc406bb27d951f62 | | | |
| 2 | adv_comp | faster | fast | | 38a31bf0527ff6fd23c6be74bfba58c46dbad709ce90b6d09b9a26f103a326b5 | | | |
| 3 | adv_sup | fastest | fast | | 55fee0f355e343b2c6a4d63b72a8ea8bcaa1a71698ada04e01533a8dc98fb4ee | | | |
| 4 | adv | quickly | quickly | | b0a248290b9aa18bfbbbfd5367dc0cc0dc82a9e90dd83b88cce59361b8d67e8a | | | |
| 5 | adj_itr | large | large | | bbe9bafa7a2a6e250fdf482a7c46217d7c63ccee917b3ae48324b61659c7e32d | | | |


In [None]:
# Display the first 5 rows of the stix_objects table
%sql SELECT * FROM stixd_corpus.stix_objects LIMIT 5;


 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
1 rows affected.


| Column | Value |
|---|---|
| obj_id | x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255 |
| type | x-stixd-clex |
| created_by_ref | user |
| description | ACE Common Lexicon Import |
| spec_version | 2.1 |
| created | 2024-08-17 12:15:54 |
| modified | 2024-08-17 12:15:54 |
| revoked | 0 |
| labels | ["lexicon"] |
| confidence | 100 |
| lang | en |
| external_references | [] |
| object_marking_refs | [] |
| granular_markings | [] |
| extensions | [] |
| derived_from | |
| duplicate_of | |
| related_to | [] |
| other_properties | |


In [None]:
# Display the first 5 rows of the obj_lex_jt junction table
%sql SELECT * FROM stixd_corpus.obj_lex_jt LIMIT 5;


 * mysql+mysqlconnector://your_username:***@localhost:3306/stixd_corpus
5 rows affected.


| obj_id | lex_id |
|---|---|
| x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255 | 1 |
| x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255 | 2 |
| x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255 | 3 |
| x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255 | 4 |
| x-stixd-clex--6052abaa-8eaf-4378-86b6-b7a368673255 | 5 |

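The five-row samples show links only for lex_ids 1 through 5. To check every junction row, an anti-join can list any `obj_lex_jt` row whose `lex_id` or `obj_id` has no matching parent row. An empty result means all links are valid. This sketch assumes only the column names shown in the samples. If the real schema already enforces foreign keys, the query should always return nothing. The SELECT is plain SQL that should also run in MySQL through `%sql`. The `__main__` demo uses an in-memory SQLite database purely as a stand-in, and `find_orphan_links` is a hypothetical name.

```python
import sqlite3

# Anti-join: junction rows that point at a missing lexicon entry or source object.
ORPHAN_LINKS_QUERY = """
SELECT j.obj_id, j.lex_id
FROM obj_lex_jt AS j
LEFT JOIN lexicon AS l ON l.lex_id = j.lex_id
LEFT JOIN stix_objects AS s ON s.obj_id = j.obj_id
WHERE l.lex_id IS NULL OR s.obj_id IS NULL
"""


def find_orphan_links(cursor) -> list:
    """Return (obj_id, lex_id) pairs in obj_lex_jt that lack a parent row."""
    cursor.execute(ORPHAN_LINKS_QUERY)
    return cursor.fetchall()


if __name__ == "__main__":
    # In-memory SQLite stand-in with only the columns the query needs.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE lexicon (lex_id INTEGER PRIMARY KEY);
        CREATE TABLE stix_objects (obj_id TEXT PRIMARY KEY);
        CREATE TABLE obj_lex_jt (obj_id TEXT, lex_id INTEGER);
        INSERT INTO lexicon VALUES (1);
        INSERT INTO stix_objects VALUES ('x-stixd-clex--demo');
        INSERT INTO obj_lex_jt VALUES ('x-stixd-clex--demo', 1);
        INSERT INTO obj_lex_jt VALUES ('x-stixd-clex--demo', 99);
    """)
    print(find_orphan_links(cur))  # reports the link to missing lex_id 99
```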

### Execution via Web Form

This demonstration does not yet run the Flask API from within the notebook. Instead, run the Flask API locally by following these steps:

1. Navigate to the directory containing the `api.py` file.
1. Activate the virtual environment.
    - On Windows: `.venv\Scripts\activate`
    - On macOS/Linux: `source .venv/bin/activate`
1. Install dependencies, if necessary:
    - `pip install -r requirements.txt`
1. Run the Flask API:
    - `python api.py`
1. Access the web form at `http://localhost:5000/`
1. When finished, stop the Flask API by pressing `Ctrl+C` in the terminal.
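
One possible workaround for running the web interface from the notebook is to launch the server as a child process so the cell returns immediately. This has not been tested against this project's `api.py`. The helper names below are hypothetical. The `cwd` in the usage comment assumes `api.py` lives alongside `clex_importer.py` in `../app`, so adjust it to the actual location. `Popen.terminate()` sends SIGTERM on POSIX (and calls `TerminateProcess` on Windows) rather than the SIGINT that `Ctrl+C` delivers. If `api.py` enables Flask's debug reloader, the server may run in an extra child process that this sketch does not stop.

```python
import subprocess
import sys


def start_background_server(script: str, cwd: str = ".") -> subprocess.Popen:
    """Start a Python script (e.g. a Flask app) without blocking the caller."""
    # sys.executable is the interpreter running this kernel, so packages from
    # the active virtual environment are importable by the child process.
    return subprocess.Popen([sys.executable, script], cwd=cwd)


def stop_background_server(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Ask the child process to exit, force-killing it if it does not."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # the process ignored the termination request
        proc.wait()


# Usage (assumed location of api.py):
# server = start_background_server("api.py", cwd="../app")
# ...open http://localhost:5000/ in a browser...
# stop_background_server(server)
```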
