# **VisTool Documentation**

Welcome to the **VisTool** user guide! This documentation provides an overview of the package, along with guidance and a tutorial on using its key functionality.

---

## **Overview**  

**VisTool** is a Python package designed to provide high-level visualisation tools for analysing health datasets. This toolkit enables users to:  
- Easily load and clean health-related datasets.  
- Customise visualisations for better insights and decision-making.  
- Merge, concatenate and manipulate datasets with ease.  
- Leverage interactive tools for dynamic data exploration.  

## Why use this package?

In today's data-driven world, the ability to transform raw health data into actionable insights is critical. **VisTool** addresses this need by helping you extract meaningful insights from your data, offering:
- Simplicity
- Efficiency
- Customisation
- Relevance to healthcare data

---

# **Functionality Overview**

## **Modules and Their Functions**

### **1. Download Module**
This module provides utilities for downloading files from the internet and working with them locally. It simplifies the process of retrieving datasets, making it easier to integrate data from external sources.

#### **Key Functions**
- `download_file(url, save_path)`: Downloads a file from a specified URL and saves it to a provided path.
- `download_csv(url)`: Downloads a CSV file from a URL and loads it directly into a Pandas DataFrame.

#### **Usage Instructions**
1. Navigate to the `example_usage.ipynb` notebook.
2. Locate the "Imports to be ran" block and execute the code.
3. Scroll down to the **Download Module** section.
4. Find a file to download and copy its URL.
5. Provide a save path for the file, for example:
   ```python
   url = "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv"
   save_path = "data/airtravel.csv"
   download_file(url, save_path)
   ```
6. To download and load a CSV into a DataFrame:
   ```python
   url = "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv"
   df = download_csv(url)
   ```
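
For readers curious about what a helper like `download_file` might do internally, here is a minimal sketch using only the Python standard library. It is not VisTool's actual implementation: the name `download_file_sketch` and its behaviour (creating the destination folder, streaming the response to disk) are assumptions made for illustration.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file_sketch(url, save_path):
    """Hypothetical stand-in for download_file: fetch `url` and write it to `save_path`."""
    save_path = Path(save_path)
    # Create the destination folder (e.g. "data/") if it does not exist yet.
    save_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(save_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return save_path
```

Calling `download_file_sketch(url, "data/airtravel.csv")` with the URL from step 5 would save the file under `data/`, creating that folder first if needed.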

---

### **2. Combine Module**
This module provides functions to merge or concatenate datasets, enabling seamless integration of multiple data sources.

#### **Key Functions**
- `merge_datasets(data1, data2, on, how)`: Combines datasets based on a common column or index with a specified join type (e.g., left, inner).
- `concat_datasets(datasets, axis)`: Concatenates datasets along rows or columns.

#### **Usage Instructions**
1. Navigate to the `example_usage.ipynb` notebook.
2. Scroll to the **Combine Module** section.
3. Ensure you have the datasets to combine.
4. Merge datasets:
   ```python
   merged_data = merge_datasets(data1, data2, on="primary_id", how="inner")
   ```
5. Concatenate datasets:
   ```python
   concatenated_data = concat_datasets([data3, data4], axis=1)
   ```
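
To make the join types and the `axis` argument concrete, the sketch below uses pandas directly. It assumes that `merge_datasets` and `concat_datasets` behave like `pandas.merge` and `pandas.concat`, which this guide does not confirm. The sample frames and every column name other than `primary_id` are invented for illustration.

```python
import pandas as pd

# Two small illustrative frames sharing a "primary_id" column.
data1 = pd.DataFrame({"primary_id": [1, 2, 3], "age": [34, 51, 29]})
data2 = pd.DataFrame({"primary_id": [2, 3, 4], "admissions": [5, 1, 7]})

# An inner join keeps only the ids present in both frames (2 and 3).
merged = pd.merge(data1, data2, on="primary_id", how="inner")

# A left join keeps every row of data1; unmatched rows get NaN for "admissions".
left_merged = pd.merge(data1, data2, on="primary_id", how="left")

# axis=0 stacks rows; axis=1 places frames side by side, aligned on the index.
stacked = pd.concat([data1, data1], axis=0, ignore_index=True)
side_by_side = pd.concat([data1, data2[["admissions"]]], axis=1)

print(merged.shape)        # (2, 3)
print(left_merged.shape)   # (3, 3)
print(stacked.shape)       # (6, 2)
print(side_by_side.shape)  # (3, 3)
```

Note that `axis=1` aligns rows by index rather than by a key column, so it is best suited to frames that already describe the same records in the same order.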

---

### **3. Wrangle Module**
This module streamlines pre-processing tasks such as cleaning and transforming datasets. It enables users to focus on insights rather than data inconsistencies.

#### **Key Functions**
- `clean_data(data, apply_to)`: Cleans and standardises datasets.
- `filter_data(data, condition)`: Filters data based on specific conditions.
- `rename_columns(data, columns_mapping)`: Renames columns for clarity.
- `label_encode(data, column)`: Performs label encoding on a categorical column using Pandas and NumPy.

#### **Usage Instructions**
1. Navigate to the `example_usage.ipynb` notebook.
2. Scroll to the **Wrangle Module** section and run the code block.
3. Select a data wrangling option (e.g., removing or filling NaN values).
4. Follow prompts to preview and clean the dataset:
   - Example: Remove rows with NaN values.
   ```python
   cleaned_data = clean_data(data, remove_nan=True, axis=0)
   ```
5. Filter data with a condition:
   ```python
   filtered_data = filter_data(data, "other_emergency_admissions > 100")
   ```
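
The wrangling steps above correspond to common pandas operations. The sketch below shows them with plain pandas; it assumes, without confirmation from the package itself, that the VisTool helpers wrap calls like these. The `trust` column and all values are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "trust": ["A", "B", "C", "D"],  # hypothetical column
    "other_emergency_admissions": [250.0, np.nan, 80.0, 120.0],
})

# Drop every row (axis=0) that contains at least one NaN value.
cleaned = df.dropna(axis=0)

# Keep only rows that satisfy the condition string.
filtered = cleaned.query("other_emergency_admissions > 100")

# Label-encode a categorical column: each distinct value gets an integer code.
codes, uniques = pd.factorize(cleaned["trust"])

print(filtered["trust"].tolist())  # ['A', 'D']
print(codes.tolist())              # [0, 1, 2]
```

`pd.factorize` assigns codes in order of first appearance; a helper such as `label_encode` may order its codes differently.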

---

### **4. Visualize Module**
This module simplifies the creation of visualisations for exploratory data analysis.

#### **Key Functions**
- `plot_histogram(data, column)`: Visualises the distribution of values in a column.
- `plot_scatter(data, x_column, y_column)`: Examines the relationship between two numeric variables.
- `plot_correlation_matrix(data)`: Analyses correlations among numeric columns.
- `plot_line(data, x_column, y_column)`: Visualises trends or changes over time.
- `plot_overlay(data, columns, plot_types)`: Overlays multiple plots for comparison.

#### **Usage Instructions**
1. Navigate to the **Visualize Module** section in `example_usage.ipynb`.
2. Choose a graph type (e.g., histogram, scatter, correlation matrix).
3. You will then see the columns available in the NHS A&E Attendances dataset.
4. Specify the columns for plotting:
   ```python
   plot_histogram(data, "admission_counts")
   plot_scatter(data, "time", "admission_counts")
   ```
5. Overlay plots for comparison:
   ```python
   plot_overlay(data, ["column1", "column2"], ["line", "bar"])
   ```
6. You will be prompted to choose whether to save the plot in the existing folder location.
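
As a rough illustration of the plot-then-optionally-save workflow, here is a minimal histogram sketch. This guide does not state which plotting library VisTool uses, so the use of matplotlib is an assumption, as are the function name `plot_histogram_sketch`, its `save_path` parameter and the sample values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_histogram_sketch(data, column, save_path=None):
    """Hypothetical stand-in for plot_histogram; optionally saves the figure."""
    fig, ax = plt.subplots()
    # Ignore missing values so they do not break the binning.
    ax.hist(data[column].dropna(), bins=10)
    ax.set_xlabel(column)
    ax.set_ylabel("Frequency")
    ax.set_title(f"Distribution of {column}")
    if save_path is not None:
        fig.savefig(save_path)
    return fig


df = pd.DataFrame({"admission_counts": [12, 15, 15, 20, 22, 22, 22, 30]})
fig = plot_histogram_sketch(df, "admission_counts")
plt.close(fig)
```

Passing `save_path="histogram.png"` would write the figure to disk, mirroring the save prompt described in step 6.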

---

***For more in-depth usage examples, see the `advanced_example_usage.ipynb` notebook.***

# Testing Documentation

This section outlines the testing strategies employed to ensure the robustness and reliability of the project. Various types of tests were conducted, including unit tests and functional tests. The goal of these tests was to verify that individual components function as expected.

## Unit Testing - pytest
Unit tests using pytest were implemented for the key modules of the project to verify the correctness of individual functions. Below is a summary of the unit tests conducted:

### 1. `test_combine.py`
This module tests the dataset merging and concatenation functionality provided by `merge_datasets` and `concat_datasets`.

#### Key Tests:
- **`test_merge_datasets`**:
  - Validates the merging of two datasets based on a specified column and join type.
  - Ensures the resulting dataset contains the correct number of rows and columns.
  - Confirms that the expected columns exist in the merged dataset.
- **`test_concat_datasets`**:
  - Tests the concatenation of two datasets along a specified axis.
  - Verifies the dimensions and column names of the concatenated dataset.
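
A test in this style might look like the sketch below. The real tests in `test_combine.py` are not reproduced here, and `merge_datasets` is replaced by a hypothetical stand-in that wraps `pandas.merge` so that the example runs on its own. The sample data is invented.

```python
import pandas as pd


def merge_datasets(data1, data2, on, how):
    """Hypothetical stand-in for VisTool's merge_datasets, wrapping pandas.merge."""
    return pd.merge(data1, data2, on=on, how=how)


def test_merge_datasets():
    data1 = pd.DataFrame({"id": [1, 2, 3], "value1": ["a", "b", "c"]})
    data2 = pd.DataFrame({"id": [2, 3, 4], "value2": ["x", "y", "z"]})

    merged = merge_datasets(data1, data2, on="id", how="inner")

    # Only ids 2 and 3 appear in both inputs: two rows, three columns.
    assert merged.shape == (2, 3)
    # The join key and both value columns survive the merge.
    assert list(merged.columns) == ["id", "value1", "value2"]
```

In the actual test suite, the stand-in would be replaced by an import of the function from the package.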

### 2. `test_download.py`
This module tests the data downloading functionalities provided by `download_file` and `download_csv`.

#### Key Tests:
- **`test_download_file`**:
  - Ensures a file can be downloaded from a valid URL and saved locally.
  - Confirms that the file exists and is not empty after download.
- **`test_download_csv`**:
  - Tests downloading a CSV file and loading it into a pandas DataFrame.
  - Checks the existence of expected columns and ensures the DataFrame is not empty.
- **`test_download_invalid_url`**:
  - Verifies that an exception is raised when an invalid URL is provided.
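
The invalid-URL check can be sketched with `pytest.raises`, as below. This is an illustration only: `download_file` here is a hypothetical stand-in built on `urllib.request`, and the exception type VisTool actually raises may differ from the `ValueError` shown.

```python
import urllib.request

import pytest


def download_file(url, save_path):
    """Hypothetical stand-in for VisTool's download_file."""
    with urllib.request.urlopen(url) as response, open(save_path, "wb") as out_file:
        out_file.write(response.read())


def test_download_invalid_url(tmp_path):
    # urlopen raises ValueError for a string without a URL scheme,
    # so no network access is needed for this check.
    with pytest.raises(ValueError):
        download_file("not-a-valid-url", tmp_path / "out.csv")
```

Here `tmp_path` is pytest's built-in fixture that supplies a temporary directory for each test.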

### 3. `test_visualize.py`
This module tests the visualisation functionalities, including plotting histograms, scatter plots, correlation matrices, line charts and overlay plots.

#### Key Tests:
- **`test_plot_histogram`**:
  - Validates histogram generation with and without saving the plot.
- **`test_plot_scatter`**:
  - Ensures scatter plot generation and saving functionality.
- **`test_plot_correlation_matrix`**:
  - Verifies the creation of a correlation matrix plot and saving functionality.
- **`test_plot_line`**:
  - Tests the generation of line charts with and without saving the plot.
- **`test_plot_overlay`**:
  - Ensures the creation of overlay plots with specified types and saving functionality.

### 4. `test_wrangle.py`
This module tests the data wrangling functionalities, including data cleaning, filtering and column renaming.

#### Key Tests:
- **`test_clean_data`**:
  - Ensures data is cleaned by removing rows with NaN values or filling NaN values with specified methods (e.g., mean).
  - Validates that an exception is raised for invalid cleaning options.
- **`test_filter_data`**:
  - Tests filtering of data based on a condition and validates edge cases such as invalid conditions.
- **`test_rename_columns`**:
  - Confirms that columns are renamed correctly and validates behaviour for invalid column mappings.


#### How to run pytest:

To check that the package is running as expected, you can run the provided test suite using pytest.

1. Open a terminal and navigate to the root directory of the project, where the `tests/` folder is located.
2. To run all tests:
```bash
pytest
```
3. To run all tests with detailed output:
```bash
pytest -v
```
4. To run a single test file:
```bash
pytest tests/test_visualize.py
```

## Functional Testing
Functional tests were conducted using Excel to simulate real-world scenarios and ensure that the functions behave as expected with various inputs.

### Functional Testing Process:
1. Inputs were manually supplied in Excel to test the functions.
2. Outputs were generated and compared against expected results.
3. Any discrepancies between actual and expected results were documented.

### Example Functional Test Cases:
- **Merge Datasets:**
  - Inputs: Two datasets with overlapping `id` columns.
  - Expected Output: Merged dataset with only matching rows.
- **Download CSV:**
  - Input: A valid CSV file URL.
  - Expected Output: A non-empty DataFrame with specific column names.
- **Plot Histogram:**
  - Input: A dataset column with numerical values.
  - Expected Output: A histogram plot with accurate representation of value distribution.

## Summary
The combination of unit and functional testing helps verify that the components of the project function as intended under various conditions. This testing process also facilitates early detection of issues, improving the overall reliability of the package.

---

Whether you're a data analyst, healthcare researcher, or BI professional, VisTool empowers you to extract meaningful insights from your data. By bridging the gap between data preparation and analysis, this package helps users make informed decisions that can drive impactful outcomes in the healthcare domain.

---

# Support

If you encounter any issues, please raise them in the GitHub Issues section.

---

# Contributing

We welcome contributions to VisTool. Please fork the repository, make your changes and submit a pull request.

---

<i> Written By: Kayleigh Haydock</i>