Commit

Merge pull request #43 from ECRL/dev
Dev
sanskriti-s committed Apr 12, 2019
2 parents 8756509 + dac814d commit 1acc7bc
Showing 9 changed files with 690 additions and 296 deletions.
136 changes: 106 additions & 30 deletions README.md
@@ -5,52 +5,121 @@
[![PyPI version](https://badge.fury.io/py/ecabc.svg)](https://badge.fury.io/py/ecabc)
[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/ECRL/ecabc/blob/master/LICENSE)

**ECabc** is a generic, small-scale feature tuning program that works with any **fitness function** and **value set**. An **employer bee** is an object that stores a set of values and the fitness score that corresponds to those values, both of which are passed by the user. The **onlooker bee** will create a new set of random values, which will then be assigned to a poorly performing employer bee as a replacement.
**ECabc** is a generic, small-scale feature tuning program based on the Artificial Bee Colony algorithm by D. Karaboga, which imitates the honey foraging techniques of bees. ECabc optimizes a user-supplied function, called the **fitness function**, using a given set of variables known as the **value set**. The bee colony consists of three types of bees: employers, onlookers and scouts. An **employer bee** is an object that stores a set of values, the **fitness score** that corresponds to those values, and the bee's probability of being picked by an onlooker bee. An **onlooker bee** is an object that chooses employer bees with a high probability and calculates new positions for them. The **scout bee** creates a new set of random values, which is then assigned to a poorly performing employer bee as a replacement.
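How an onlooker might favor high-scoring employers can be illustrated with a small, self-contained sketch. It uses plain fitness-proportional selection, which is the textbook approach for the Artificial Bee Colony and is only an assumption here; ECabc's internal formula may differ. The function names `selection_probabilities` and `pick_employer` are hypothetical and are not part of the ecabc package.

```python
import random


def selection_probabilities(fitness_scores):
    """Turn non-negative fitness scores into pick probabilities that sum to 1.

    Assumption: higher score means a better employer (maximization).
    """
    total = sum(fitness_scores)
    if total == 0:
        # Every employer scored zero: fall back to a uniform choice.
        return [1.0 / len(fitness_scores)] * len(fitness_scores)
    return [score / total for score in fitness_scores]


def pick_employer(fitness_scores, rng=random):
    """Return the index of the employer an onlooker would follow."""
    probabilities = selection_probabilities(fitness_scores)
    return rng.choices(range(len(fitness_scores)), weights=probabilities, k=1)[0]
```

With scores `[1, 1, 2]`, the third employer is picked about half of the time and the other two about a quarter of the time each.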

Each entry of the value set passed alongside the fitness function must be a tuple of **(value_type, (value_min, value_max))**, where the value type is either **float** or **int**. The value_type should be passed in as a string. The user may choose whether the fitness cost is minimized or maximized, and may turn print statements on or off for visual feedback.
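To make the tuple format concrete, the sketch below draws one random value per **(value_type, (value_min, value_max))** entry, much as a scout bee would when it creates a fresh set of values. The helper name `random_values` is hypothetical and the sampling rules (inclusive integer bounds, uniform floats) are assumptions, not ECabc's actual implementation.

```python
import random


def random_values(value_ranges, rng=random):
    """Draw one random value for each (value_type, (value_min, value_max)) entry."""
    values = []
    for value_type, (value_min, value_max) in value_ranges:
        if value_type == 'int':
            # randint includes both end points.
            values.append(rng.randint(value_min, value_max))
        elif value_type == 'float':
            values.append(rng.uniform(value_min, value_max))
        else:
            raise ValueError("value_type must be 'int' or 'float', got {!r}".format(value_type))
    return values


# Example: one integer in [0, 10] and one float in [0, 80].
sample = random_values([('int', (0, 10)), ('float', (0, 80))])
```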

All scores will be saved in a file that you can specify as a constructor argument. The file name defaults to **settings.json**, and the file holds each iteration's best fitness score and values.
### Research applications
While it has several applications, ECabc has been successfully used by the Energy and Combustion Research Laboratory (ECRL) at the University of Massachusetts Lowell to tune the hyperparameters of ECNet, a large-scale machine learning project for predicting fuel properties. ECNet provides scientists an open source tool for predicting key fuel properties of potential next-generation biofuels, reducing the need for costly fuel synthesis and experimentation. By increasing the accuracy of ECNet and similar models efficiently, ECabc helps to provide a higher degree of confidence in discovering new, optimal fuels. A single run of ECabc on ECNet yielded a lower average root mean square error (RMSE) for cetane number (CN) and yield sooting index (YSI) when compared to the RMSE generated by a year of manual tuning. While the manual tuning generated an RMSE of 10.13, the ECabc was able to yield an RMSE of 8.06 in one run of 500 iterations.

# Installation

### Prerequisites:
- Have python 3.5 installed
- Have python 3.X installed
- Have the ability to install python packages

### Method 1: pip
If you are working in a Linux/Mac environment
- **sudo pip install ecabc**
If you are working in a Linux/Mac environment:
```
sudo pip install ecabc
```

Alternatively, in a Windows environment, make sure you are running cmd as administrator
- **pip install ecabc**
Alternatively, in a Windows environment, make sure you are running cmd as administrator:
```
pip install ecabc
```

Note: if multiple Python releases are installed on your system (e.g. 2.7 and 3.5), you may need to execute the correct version of pip. For Python 3.5, change **"pip install ecabc"** to **"pip3 install ecabc"**.
To update your version of ECabc to the latest release version, use
```
pip install --upgrade ecabc
```

Note: if multiple Python releases are installed on your system (e.g. 2.7 and 3.6), you may need to execute the correct version of pip. For Python 3.6, change **"pip install ecabc"** to **"pip3 install ecabc"**.

### Method 2: From source
- Download the ECabc repository, navigate to the download location on the command line/terminal, and execute
**"python setup.py install"**.
- Download the ECabc repository, navigate to the download location on the command line/terminal, and execute:
```
python setup.py install
```

Additional package dependencies (NumPy) will be installed during the ECabc installation process.

To update your version of ECabc to the latest release version, use "**pip install --upgrade ecabc**".

# Usage
The artificial bee colony can take a multitude of parameters.
- **value_ranges**: Your value ranges. Values must be passed as a list of tuples with a **type/(min_value, max_value)** pairing. See the example file for more details.
- **fitness_fxn**: The user-defined function that will be used to generate the cost of each employer bee's values.
- **file_logging**: Accepts 'debug'/'info'/'warn'/'error'/'crit' or 'disable'. If set to 'disable', the abc will not log to a file. File logging **can** be quite expensive. If your fitness function is trivial, it will add an unnecessary amount of time to reaching the target goal, and you should instead just output to the console. This is set to 'disable' by default.
- **print_level**: Accepts 'debug'/'info'/'warn'/'error'/'crit' or 'disable'. This will print out log information to the console, and is less costly than saving logs to a file. If set to 'disable', nothing is output to the console. Defaults to logging.INFO.
- **processes**: Decide how many processes you'd like to have running at a time. A process will run the fitness function once per iteration. Processes run in parallel, so the more processes you utilize, the more fitness functions can run concurrently, cutting program run time significantly. If your fitness function takes 5 seconds to execute, then with 50 bees and 5 processes, calculating the values for all bees will take 50 seconds rather than 250. Be mindful that this will increase CPU usage heavily, and you should be careful with how many processes you allow at a time, to avoid a crash or computer freeze. **If your fitness function is trivial, set processes to 0. Process spawning is expensive, and only worth it for costly fitness functions.** Defaults to 4.
- **limit**: The maximum number of times a bee will attempt to find a new food source before abandoning its current food source for a new random one. A bee will compare food sources around its own 'limit' times, then replace its food source with a completely random one if none of these searches is better than where it currently is. This helps to avoid stagnation. This defaults to 20.
- **args**: Any additional arguments that your fitness function must take outside of the values given in value_ranges. This defaults to None. Expects a dictionary of keyword/value pairs. If the argument for your fitness function is called test_argument, and you'd like that value to be 10, then you would pass {'test_argument': 10} as an argument to the abc.
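The run-time figures quoted for **processes** follow from simple batch arithmetic, sketched below. This is a rough estimate under stated assumptions: every fitness call takes the same time, bees are evaluated in batches of `processes`, and process spawning overhead is ignored. The helper `estimated_evaluation_time` is hypothetical and not part of ecabc.

```python
import math


def estimated_evaluation_time(num_bees, seconds_per_call, processes):
    """Estimate the seconds needed to score every bee once."""
    if processes <= 1:
        # 0 or 1 process: the fitness function runs for one bee after another.
        return num_bees * seconds_per_call
    # With several processes, bees are scored in parallel batches.
    batches = math.ceil(num_bees / processes)
    return batches * seconds_per_call


print(estimated_evaluation_time(50, 5, 5))  # 50 bees, 5 s each, 5 processes
print(estimated_evaluation_time(50, 5, 0))  # same workload, run sequentially
```

The two calls reproduce the 50 seconds and 250 seconds mentioned in the parameter description.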

The artificial bee colony also utilizes a variety of methods to toggle certain settings.
- **minimize**: If set to true, the bee colony will minimize the fitness function, otherwise it will maximize it.
- **import_settings**: Accepts a json file by name. If the file exists, the artificial bee colony will import and use these settings, then return True. If the file doesn't exist, an error message will be logged, settings will be set to default, and the function will return False.
- **save_settings**: Accepts a json file name. If the file exists, the artificial bee colony settings will be saved to this file.

# 2.0.0 Update
Update 2.0.0 changed ecabc quite a bit. We have given more control to the user by making the program no longer self-terminating. Instead, the user can call the run_iteration method inside a loop of their own, so that the abc keeps running until it has worked to their liking. An example of the abc in action can be seen in the code snippet below.

To get started import ECabc
```python
from ecabc import *
```
Then define your fitness function. The fitness function is the user-defined function whose solution is being optimized. Pass in the values and args, and have it return the output that is being optimized:
```python
def fitness_function(values, args):
    # Replace the placeholder below with your own computation on values and args
    output = ...
    return output
```
After that, in the main function, define your value ranges, i.e. the user-defined ranges for the variables which are being optimized:
```python
values = [('int', (0,10)), ('int', (0,100)), ('float',(0,80)), ('float', (0, 360))]
```
Optionally, one can also add args: any additional arguments that your fitness function must take outside of the values given in value_ranges. This defaults to None.
```python
arguments = {'test_argument': 10}  # a dictionary of keyword/value pairs
```
Then call ECabc as follows:
```python
abc = ABC(fitness_fxn=fitness_function, value_ranges=values, args=arguments)
```
Certain settings also need to be toggled, such as
```python
abc._minimize = True
```
And the settings can be imported and saved as follows
```python
# File names must be given as strings
abc._import_settings = 'example.json'
abc._save_settings = 'output.json'
```
Then call create_employers on it to generate your population of employer bees. This only needs to be done once:
```python
abc.create_employers()
```
After this, the code should enter a loop with a break condition. The contents of ECabc that should be in the loop have been encompassed in `run_iteration()` for simplicity.
```python
while True:
    abc.run_iteration()
    # Stop once the best result drops below the target value
    if abc.best_performer[0] < 2:
        break
```
The above snippet shows the setup if one wants to run ECabc until a certain output value has been obtained. Alternatively one could just set it up so that it runs for a preset number of cycles as follows:
```python
for i in range(500):
    abc.run_iteration()
```
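The two stopping styles above can also be combined, so that a run ends at a target value but never exceeds a fixed number of cycles. The sketch below shows only the loop pattern. `StandInColony` is a hypothetical stand-in whose best score simply halves on every iteration; it is not the ecabc `ABC` class. The helper `run_until` assumes nothing beyond a `run_iteration()` method and a `best_performer` tuple whose first element is compared against the target, as in the loop shown earlier.

```python
class StandInColony:
    """Hypothetical stand-in for a colony: the best score halves each iteration."""

    def __init__(self, start_score):
        self.best_performer = (start_score, None, None)

    def run_iteration(self):
        self.best_performer = (self.best_performer[0] / 2, None, None)


def run_until(colony, target_score, max_iterations):
    """Iterate until the best score drops below the target or the cap is hit.

    Returns the number of iterations that were actually run.
    """
    for iteration in range(1, max_iterations + 1):
        colony.run_iteration()
        if colony.best_performer[0] < target_score:
            return iteration
    return max_iterations


iterations_used = run_until(StandInColony(16), target_score=2, max_iterations=500)
```

Capping the loop guards against a fitness function that never reaches the target, which would otherwise keep a `while True` loop running indefinitely.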
Other parameters that can be specified in the loop are:
file logging, which accepts 'debug'/'info'/'warn'/'error'/'crit' or 'disable':
```python
abc._logger.file_level = 'info'
abc._logger.file_level = 'debug'
abc._logger.file_level = 'warn'
abc._logger.file_level = 'error'
abc._logger.file_level = 'crit'
abc._logger.file_level = 'disable'
```
print_level. This will print out log information to the console:
```python
abc._logger.stream_level = 'info'
abc._logger.stream_level = 'debug'
abc._logger.stream_level = 'warn'
abc._logger.stream_level = 'error'
abc._logger.stream_level = 'crit'
abc._logger.stream_level = 'disable'
```
and processes:
```python
processes = 1
```
Finally, to view the output:
```python
print(abc.best_performer[2], abc.best_performer[1])
```
where best_performer[2] holds the values and best_performer[1] is the fitness score associated with them.


# Example

@@ -99,4 +168,11 @@ if __name__ == '__main__':
print("execution time = {}".format(time.time() - start))
```

# Contributing, Reporting Issues and Other Support:

To contribute to ECabc, make a pull request. Contributions should include tests for new features added, as well as extensive documentation.

To report problems with the software or feature requests, file an issue. When reporting problems, include information such as error messages, your OS/environment and Python version.

For additional support/questions, contact Sanskriti Sharma (sanskriti_sharma@student.uml.edu), Travis Kessler (travis.j.kessler@gmail.com), Hernan Gelaf-Romer (hernan_gelafromer@student.uml.edu) and/or John Hunter Mack (Hunter_Mack@uml.edu).

2 changes: 1 addition & 1 deletion ecabc/__init__.py
@@ -1,3 +1,3 @@
import ecabc.abc
import ecabc.bees
__version__ = '2.2.2'
__version__ = '2.2.3'
