1. Bibliography:
  - **Topic 6** introduces the student to genetic programming (GP), and **the aforementioned article** covers one variant of GP, called Grammatical Evolution (GE).
  - **Chapter 8** describes several mechanisms for adaptively tuning each of the parameters of an evolutionary algorithm (EA).
  - **Chapter 10** describes how to hybridize an EA with other search methods.
  - Finally, **Chapter 12** presents several strategies for handling constraints in optimization problems tackled with EAs.
  
2. Sections
  - Description of the problem to be solved
  - Method used to solve it
    - analyze whether or not GE is suitable for the problem posed
    - include the mathematical expression of the evaluation function finally used
    - describe the initialization, variation, and selection operators used
    - describe how constraints are handled, which parameter-control mechanisms are used, and which local-search mechanisms are implemented
  - Results of the experiments carried out
  - Analysis and comparison of results
  - A conclusions section
  - A description of the implemented code

3. Evaluation
  - Presentation (2/10)
    - Particular weight is given to the clarity of the report's writing and to the ability to synthesize.
  - Constraint handling (1/10)
    - The originality of the mechanism or mechanisms used to handle constraints is assessed.
  - Algorithm configuration (2/10)
    - The procedure followed by the student to choose the best parameter configuration is assessed, including the implementation of adaptive or self-adaptive parameter-control mechanisms.
  - Hybridization of the algorithm with local-search techniques (1/10)
    - The originality of the local-search mechanism used is assessed.
  - Analysis and comparison of results, and conclusions (4/10)
    - The way the experiments are interpreted and compared is assessed.
      - It is very important that this assessment is always made in terms of the indices SR (success rate), MBF (mean best fitness), and AES (average number of evaluations to a solution), together with any other plot considered appropriate, such as convergence-progress plots.
    - Finally, the quality of the conclusions drawn from interpreting and comparing the results is assessed.
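
As a minimal sketch of how the three indices could be computed from a set of independent runs, the Python 3 snippet below assumes the usual definitions: SR is the fraction of runs that reach the success criterion, MBF is the mean of the best fitness reached in each run, and AES is the mean number of evaluations needed by the successful runs only. The function name `summarize_runs`, the `(best_fitness, evaluations)` record format, and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def summarize_runs(runs, success_threshold):
    """Return (SR, MBF, AES) for a minimization problem.

    `runs` holds one (best_fitness, evaluations_to_best) pair per
    independent run (a hypothetical record format for this sketch).
    A run is successful when best_fitness <= success_threshold.
    """
    if not runs:
        raise ValueError("at least one run is required")
    # Evaluation counts of the successful runs only.
    successes = [evals for best, evals in runs if best <= success_threshold]
    sr = len(successes) / len(runs)
    mbf = mean(best for best, _ in runs)
    # AES is undefined when no run succeeds.
    aes = mean(successes) if successes else None
    return sr, mbf, aes


# Illustrative values only, not experimental results.
runs = [(0.0, 1200), (0.5, 5000), (0.0, 800), (0.25, 5000)]
sr, mbf, aes = summarize_runs(runs, success_threshold=0.0)
print(sr, mbf, aes)
```

Reporting AES over successful runs only means it should always be read together with SR: a low AES with a low SR describes an algorithm that is fast when it works but rarely works.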


#1. Description of the problem to be solved

Following the instructions given in the activity statement, the problem consists of implementing an evolutionary algorithm that computes the symbolic derivative of a function 
$$ f:X \subseteq \mathcal{R} \rightarrow \mathcal{R} $$ 

We have the following two definitions:

> **Definition of the derivative of a function at a point**: Let $X \subseteq \mathcal{R}$ be an open interval. We say that $f:X \subseteq \mathcal{R} \rightarrow \mathcal{R}$ is differentiable at $x_0 \in X$ if the following limit, denoted by $f'(x_0)$, exists and is finite:
$$
f'(x_0) = \lim \limits_{h \to 0} \frac{f(x_0+h)-f(x_0)}{h}  \tag{1}
$$

> **Definition of the derivative of a function on an interval**: Let $X \subseteq \mathcal{R}$ be an open interval. We say that $f:X \subseteq \mathcal{R} \rightarrow \mathcal{R}$ is differentiable on the interval $[a,b] \subseteq X$ if $f$ is differentiable at every point of that interval, that is, if the following limit exists and is finite:
$$
f'(x) = \lim \limits_{h \to 0} \frac{f(x+h)-f(x)}{h}, \forall x \in [a,b]  \tag{2}
$$

Assuming that $f$ is differentiable on $[a,b]$, we transform the problem of computing the derivative into an optimization problem: find a function $g(x)$ that minimizes the expression:
$$
\min \limits_{g(x)} \frac{1}{b-a}\int_{a}^{b} \mathrm{error}[f'(x),g(x)]\,dx \tag{3}
$$

where $f'(x)$ would be computed using expression $(2)$.

However, the problem above can be solved approximately by discretizing the interval, that is, by replacing the integral with a sum:
$$
\min \limits_{g(x)} \frac{1}{N+1}\sum_{i=0}^{N} \mathrm{error}_i[f'(a+i \cdot h),g(a+i \cdot h)] \tag{4}
$$
where $h=\frac{b-a}{N}$ is the width of the sampling subinterval, chosen so that $N+1$ points are sampled in $[a,b]$, and $f'(a+i \cdot h)$ is given by the forward difference:
$$
f'(a+i \cdot h)=\frac{f(a+(i+1) \cdot h) - f(a+i \cdot h)}{h}, \forall i \in \{0,1,...,N\} \tag{5}
$$
Note that for $i=N$ expression $(5)$ evaluates $f$ at $b+h$, which lies outside $[a,b]$, so $f$ must also be defined at that point.
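
The discretized objective of expressions $(4)$ and $(5)$ can be sketched in Python 3 as follows. This is a minimal sketch under stated assumptions, not the implementation used in the activity: the function names `forward_difference` and `fitness` are hypothetical, and the absolute error is assumed for the `error` term, which the problem statement leaves unspecified.

```python
def forward_difference(f, a, b, n):
    """Forward-difference estimate of f' at the n+1 points a + i*h,
    with h = (b - a) / n, following expression (5)."""
    h = (b - a) / n
    # For i = n this evaluates f at b + h, just outside [a, b].
    return [(f(a + (i + 1) * h) - f(a + i * h)) / h for i in range(n + 1)]


def fitness(g, f, a, b, n):
    """Mean error between candidate g and the estimate of f',
    following expression (4). Absolute error is an assumption."""
    h = (b - a) / n
    targets = forward_difference(f, a, b, n)
    errors = [abs(t - g(a + i * h)) for i, t in enumerate(targets)]
    return sum(errors) / (n + 1)


def f(x):
    return x ** 3


# The true derivative 3x^2 should score much better than the wrong guess x.
good = fitness(lambda x: 3 * x ** 2, f, 0.0, 1.0, 1000)
bad = fitness(lambda x: x, f, 0.0, 1.0, 1000)
print(good, bad)
```

Even the exact derivative does not reach a fitness of zero, because the forward difference itself carries a truncation error proportional to $h$; a success criterion for the search would therefore need a small tolerance rather than strict equality.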

#2. Method used to solve it

##2.1. Suitability of GE for solving the problem

The strategy used to solve this problem consists of choosing a set of families of functions with domain $[a,b]$ and implementing a search algorithm.


In [None]:
#!/usr/bin/env python
#
#   Copyright (C) 2008  Don Smiley  ds@sidorof.com

#   This program is free software: you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation, either version 3 of the License, or
#   (at your option) any later version.

#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.

#   You should have received a copy of the GNU General Public License
#   along with this program.  If not, see <http://www.gnu.org/licenses/>

#   See the LICENSE file included in this archive
#

"""
This sample program shows a simple use of grammatical evolution.  The
evolutionary process drives the fitness values towards zero.

"""

from pyneurgen.grammatical_evolution import GrammaticalEvolution
from pyneurgen.fitness import FitnessElites, FitnessTournament
from pyneurgen.fitness import ReplacementTournament, MAX, MIN, CENTER


bnf =   """
<expr>              ::= <expr> <biop> <expr> | <uop> <expr> | <real> 
                      | math.log(abs(<expr>)) | <pow> | math.sin(<expr> )
                      | value | (<expr>)
<biop>              ::= + | - | * | /
<uop>               ::= + | -
<pow>               ::= pow(<expr>, <real>)
<plus>              ::= +
<minus>             ::= -
<real>              ::= <int-const>.<int-const>
<int-const>         ::= <int-const> | 1 | 2 | 3 | 4 | 5 | 6 |
                        7 | 8 | 9 | 0
<S>                 ::=
import math
total = 0.0
for i in xrange(100):
    value = float(i) / float(100)
    total += abs(<expr> - pow(value, 3))
fitness = total
self.set_bnf_variable('<fitness>', fitness)
        """


ges = GrammaticalEvolution()

ges.set_bnf(bnf)
ges.set_genotype_length(start_gene_length=20,
                        max_gene_length=50)
ges.set_population_size(50)
ges.set_wrap(True)

ges.set_max_generations(10)
ges.set_fitness_type(MIN, .01)

ges.set_max_program_length(100)
ges.set_timeouts(10, 120)
ges.set_fitness_fail(100.0)

ges.set_mutation_rate(.025)
ges.set_fitness_selections(
    FitnessElites(ges.fitness_list, .05),
    FitnessTournament(ges.fitness_list, tournament_size=2))

ges.set_crossover_rate(.2)
ges.set_children_per_crossover(2)
ges.set_mutation_type('m')
ges.set_max_fitness_rate(.25)

ges.set_replacement_selections(
        ReplacementTournament(ges.fitness_list, tournament_size=3))

ges.set_maintain_history(True)
ges.create_genotypes()
print(ges.run())
print(ges.fitness_list.sorted())
print("")
print("")
gene = ges.population[ges.fitness_list.best_member()]
print(gene.get_program())

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

print(ges.get_best_member().get_program())

# plt.plot(ges.get_fitness_history())

In [None]:
import math 
print(math.exp(1))
print(math.log(2.72))
print(math.sin(0))
print(math.cos(0))

In [None]:
from pyneurgen.genotypes import Genotype

# g = Genotype(10,20,3)
# g.local_bnf
# g.decimal_gene
# g.compute_fitness()

# ges.set

In [None]:
def eval(t):
    """pre:  t  is a TREE, 
             where TREE ::= NUMERAL | [ OP, TREE, TREE ]
                   OP ::=  "+" | "-"
                   NUMERAL ::=  a string of digits
       post: ans  is the numerical meaning of t
       returns: ans
    """
    if isinstance(t, str) and t.isdigit():  # is  t  a string holding an int?
        ans = int(t)  # cast the string to an int
    else:  # t is a list, [op, t1, t2]
        op = t[0]
        t1 = t[1]
        t2 = t[2]
        ans1 = eval(t1)
        # assert:  ans1  is the numerical meaning of t1
        ans2 = eval(t2)
        # assert:  ans2  is the numerical meaning of t2
        if op == "+":
            ans = ans1 + ans2
        elif op == "-":
            ans = ans1 - ans2
        else:  # something's wrong with argument  t !
            raise ValueError("eval error: %r is illegal" % (t,))
    return ans

eval(["-", ["+", "2", "1"], ["-", "3", "4"]])

In [None]:
def postfix(t):
    """pre:  t  is a TREE,  where TREE ::= NUM | [ OP, TREE, TREE ]
       post: ans  is a string holding a postfix (operator-last) sequence
             of the symbols within  t
       returns:  ans
    """
    if isinstance(t, str) and t.isdigit():  # is  t  a string holding an int?
        ans = t  #(*)  the postfix of a NUM is just the NUM itself 
    else:  # t is a list, [op, t1, t2]
        op = t[0]
        t1 = t[1]
        t2 = t[2]
        ans1 = postfix(t1)
        # assert:  ans1  is a string holding the postfix form of t1
        ans2 = postfix(t2)
        # assert:  ans2  is a string holding the postfix form of t2
        # concatenate the subanswers into one string:
        if op == "+":
            ans = ans1 + ans2 + "+"  #(*)
        elif op == "-":
            ans = ans1 + ans2 + "-"  #(*)
        else:
            raise ValueError("error: %r is illegal" % (t,))
    return ans

postfix(["+", ["-", "2", "1"], "4"])

In [None]:
import re


class Grammar(object):
    def __init__(self, filename):
        self.regex = re.compile(r"<([-\w]+?)>")
        self.tokens = {}
        self._process_file(filename)
        self._dims = {}
        for x in self.tokens.keys():
            self._dims[x] = len(self.tokens[x])

    def _process_file(self, filename):
        self._cur_token = ""
        # Read the grammar file line by line; the context manager closes it.
        with open(filename) as grammar_file:
            for line in grammar_file:
                self._process_line(line)

    def _process_line(self, line):
        line = line.strip()
        if "::=" in line:
            split_rule = line.partition("::=")
            self._cur_token = split_rule[0].strip()
            line = split_rule[2].strip()
        self._process_rules(line)

    def _process_rules(self, rules):
        for rule in rules.split("|"):
            rule = rule.strip()
            try:
                if rule:
                    self.tokens[self._cur_token].append(rule)
            except KeyError:
                self.tokens[self._cur_token] = []
                self.tokens[self._cur_token].append(rule)

    def _replace(self, expr, match, codon):
        rep_str = self.tokens[match][codon % self._dims[match]]
        return expr.replace(match, rep_str, 1)

    def expand(self, genome, expr="<start>"):
        """This function expands a string using the loaded grammar
       and passed genome.  The default grammar symbol is <start>,
       but an alternate start string can be passed in after the
       genome."""
        l_gen = len(genome)
        if not l_gen:
            raise ValueError("Empty array passed for genome")
        new_expr = expr
        idx = 0
        x = self.regex.search(new_expr)
        while x:
            new_expr = self._replace(new_expr, x.group(0), genome[idx])
            idx += 1
            if idx >= l_gen:
                idx %= l_gen
            x = self.regex.search(new_expr)
        return new_expr
    


s = """
<alpha> ::= <greeting> <noun> | <greeting> and <greeting> <noun>

<greeting> ::= Hello | Yarr | Goodbye

<noun> ::=  World | Mars
"""

# g = Grammar(s)


In [1]:
#! /usr/bin/env python

# PonyGE
# Copyright (c) 2009 Erik Hemberg and James McDermott
# Hereby licensed under the GNU GPL v3.
""" Small GE implementation """

import sys, copy, re, random, math, operator


class Grammar(object):
    """ Context Free Grammar """
    NT = "NT"  # Non Terminal
    T = "T"  # Terminal

    def __init__(self, grammar):
        self.rules = {}
        self.non_terminals, self.terminals = set(), set()
        self.start_rule = None

        self.read_bnf(grammar)

    def read_bnf(self, grammar):
        """Read a grammar file in BNF format"""
        # <.+?> Non greedy match of anything between brackets
        non_terminal_pattern = "(<.+?>)"
        rule_separator = "::="
        production_separator = "|"

        # Read the grammar file
        for line in grammar.splitlines():
            if not line.startswith("#") and line.strip() != "":
                # Split rules. Everything must be on one line
                if rule_separator in line:
                    lhs, productions = line.split(rule_separator)
                    lhs = lhs.strip()
                    if not re.search(non_terminal_pattern, lhs):
                        raise ValueError("lhs is not a NT:", lhs)
                    self.non_terminals.add(lhs)
                    if self.start_rule is None:
                        self.start_rule = (lhs, self.NT)
                    # Find terminals
                    tmp_productions = []
                    for production in [
                            production.strip()
                            for production in productions.split(
                                production_separator)
                    ]:
                        tmp_production = []
                        if not re.search(non_terminal_pattern, production):
                            self.terminals.add(production)
                            tmp_production.append((production, self.T))
                        else:
                            # Match non terminal or terminal pattern
                            # TODO does this handle quoted NT symbols?
                            for value in re.findall("<.+?>|[^<>]*",
                                                    production):
                                if value != '':
                                    if not re.search(non_terminal_pattern,
                                                     value):
                                        symbol = (value, self.T)
                                        self.terminals.add(value)
                                    else:
                                        symbol = (value, self.NT)
                                    tmp_production.append(symbol)
                        tmp_productions.append(tmp_production)
                    # Create a rule
                    if lhs not in self.rules:
                        self.rules[lhs] = tmp_productions
                    else:
                        raise ValueError("lhs should be unique", lhs)
                else:
                    raise ValueError("Each rule must be on one line")

    def __str__(self):
        return "%s %s %s %s" % (self.terminals, self.non_terminals, self.rules,
                                self.start_rule)

    def generate(self, _input, max_wraps=2):
        """Map input via rules to output. Returns output and used_input"""
        used_input = 0
        wraps = 0
        output = []
        production_choices = []

        unexpanded_symbols = [self.start_rule]
        print("1 %s" % (unexpanded_symbols,))
        while (wraps < max_wraps) and (len(unexpanded_symbols) > 0):
            # Wrap
            if used_input % len(_input) == 0 and \
                    used_input > 0 and \
                    len(production_choices) > 1:
                wraps += 1
            # Expand a production
            current_symbol = unexpanded_symbols.pop(0)
            # Set output if it is a terminal
            if current_symbol[1] != self.NT:
                output.append(current_symbol[0])
            else:
                production_choices = self.rules[current_symbol[0]]
                # Select a production
                current_production = _input[used_input % len(_input)] % len(
                    production_choices)
                # Use an input if there was more than 1 choice
                if len(production_choices) > 1:
                    used_input += 1
                # Derivation order is left to right (depth-first)
                unexpanded_symbols = production_choices[
                    current_production] + unexpanded_symbols

        print("2 %s" % (unexpanded_symbols,))
        # Not completely expanded
        if len(unexpanded_symbols) > 0:
            return (None, used_input)

        return ("".join(output), used_input)



In [67]:
grammar = """
    <alpha> ::= <greeting> <noun> | <greeting> and <greeting> <noun>

    <greeting> ::= Hello | Yarr | Goodbye

    <noun> ::=  World | Mars
"""

g = Grammar(grammar)

g.generate([1,2,3,4])

1 [('<alpha>', 'NT')]
2 []


('Goodbye and Hello World', 4)
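
The result above follows from the GE mapping rule, in which each codon selects the production `codon % number_of_choices` for the leftmost unexpanded non-terminal. The Python 3 snippet below is a simplified, hypothetical re-implementation written only to trace that rule step by step. It is not the PonyGE code: `GRAMMAR`, `map_genome`, and `max_steps` are names introduced here, and unlike `Grammar.generate` it consumes a codon on every expansion and bounds wrapping with a step limit instead of `max_wraps`.

```python
import re

# Same grammar as in the cell above: non-terminal -> list of productions.
GRAMMAR = {
    "<alpha>": ["<greeting> <noun>", "<greeting> and <greeting> <noun>"],
    "<greeting>": ["Hello", "Yarr", "Goodbye"],
    "<noun>": ["World", "Mars"],
}


def map_genome(genome, start="<alpha>", max_steps=100):
    """Expand the leftmost non-terminal using production codon % n_choices.

    Returns (phenotype, trace); phenotype is None if the expansion does
    not finish within max_steps.
    """
    phenotype, trace = start, []
    for step in range(max_steps):
        found = re.search(r"<[^<>]+>", phenotype)
        if found is None:
            return phenotype, trace  # fully expanded
        symbol = found.group(0)
        choices = GRAMMAR[symbol]
        # Reuse codons from the start when the genome runs out (wrapping).
        codon = genome[step % len(genome)]
        index = codon % len(choices)
        trace.append((symbol, codon, index))
        phenotype = (phenotype[:found.start()] + choices[index]
                     + phenotype[found.end():])
    return None, trace


phenotype, trace = map_genome([1, 2, 3, 4])
for symbol, codon, index in trace:
    print("%-10s codon=%d -> production %d" % (symbol, codon, index))
print(phenotype)
```

For the genome `[1, 2, 3, 4]` the trace shows `1 % 2 = 1` selecting the second production of `<alpha>`, then `2 % 3 = 2` (`Goodbye`), `3 % 3 = 0` (`Hello`), and `4 % 2 = 0` (`World`), which reproduces the phenotype and the four used codons reported above.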

In [73]:
import numpy as np
from math import sin, cos, exp, log
grammar = """
<expr>   ::= <expr><op><expr> \
           | (<expr><op><expr>) \
           | <pre_op>(<expr>) \
           | <var>
<op>     ::= + | - | * | / 
<pre_op> ::= sin | cos | exp | log
<var>    ::= x 
"""

g = Grammar(grammar)

print(g.non_terminals)
print(g.terminals)
print(g.generate([np.random.randint(0,255) for _ in range(10)]))

set(['<op>', '<expr>', '<var>', '<pre_op>'])
set(['cos', 'log', ')', '(', '+', '*', '-', '/', 'exp', 'x', 'sin'])
1 [('<expr>', 'NT')]
2 [('<pre_op>', 'NT'), ('(', 'T'), ('<expr>', 'NT'), (')', 'T'), ('<op>', 'NT'), ('<expr>', 'NT'), (')', 'T'), (')', 'T'), (')', 'T'), ('<op>', 'NT'), ('<expr>', 'NT'), (')', 'T')]
(None, 11)


In [83]:
from multiprocessing import cpu_count
from os import path
from socket import gethostname

hostname = gethostname().split('.')
machine_name = hostname[0]

from math import floor
from re import match, finditer, DOTALL, MULTILINE
from sys import maxsize

# from algorithm.parameters import params

"""Algorithm parameters"""
params = {
        # Set default step and search loop functions
        'SEARCH_LOOP': 'search_loop',
        'STEP': 'step',

        # Evolutionary Parameters
        'POPULATION_SIZE': 500,
        'GENERATIONS': 50,
        'HILL_CLIMBING_HISTORY': 1000,

        # Set optional experiment name
        'EXPERIMENT_NAME': None,
        # Set default number of runs to be done.
        # ONLY USED WITH EXPERIMENT MANAGER.
        'RUNS': 1,

        # Class of problem
        'FITNESS_FUNCTION': "regression",
        # "regression"
        # "string_match"
        # "classification"
        # "supervised_learning"

        # Select problem dataset
        'DATASET_TRAIN': "Vladislavleva4/Train.txt",
        'DATASET_TEST': "Vladislavleva4/Test.txt",
        'DATASET_DELIMITER': None,

        # Set grammar file
        'GRAMMAR_FILE': "Vladislavleva4.bnf",
        # "Vladislavleva4.bnf"
        # "Keijzer6.bnf"
        # "Dow.bnf"
        # "Banknote.bnf"
        # "letter.bnf"
        # "supervised_learning.bnf"

        # Select error metric
        'ERROR_METRIC': None,
        # "mse"
        # "mae"
        # "rmse"
        # "hinge"
        # "f1_score"

        'OPTIMIZE_CONSTANTS': False,

        # Specify target for target problems
        'TARGET': "ponyge_rocks",

        # Set max sizes of individuals
        'MAX_TREE_DEPTH': None,
        'MAX_TREE_NODES': None,
        'CODON_SIZE': 100000,
        'MAX_GENOME_LENGTH': None,
        'MAX_WRAPS': 0,

        # INITIALISATION
        'INITIALISATION': "operators.initialisation.PI_grow",
        # "operators.initialisation.uniform_genome"
        # "operators.initialisation.rhh"
        # "operators.initialisation.PI_grow"
        'INIT_GENOME_LENGTH': 200,
        # Set the maximum genome length for initialisation.
        'MAX_INIT_TREE_DEPTH': 10,
        # Set the maximum tree depth for initialisation.
        'MIN_INIT_TREE_DEPTH': None,
        # Set the minimum tree depth for initialisation.

        # SELECTION
        'SELECTION': "operators.selection.tournament",
        # "operators.selection.tournament"
        # "operators.selection.truncation",
        'TOURNAMENT_SIZE': 2,
        # For tournament selection
        'SELECTION_PROPORTION': 0.5,
        # For truncation selection
        'INVALID_SELECTION': False,
        # Allow for selection of invalid individuals during selection process.

        # OPERATOR OPTIONS
        'WITHIN_USED': True,
        # Boolean flag for selecting whether or not mutation is confined to
        # within the used portion of the genome. Default set to True.

        # CROSSOVER
        'CROSSOVER': "operators.crossover.variable_onepoint",
        # "operators.crossover.fixed_onepoint",
        # "operators.crossover.subtree",
        'CROSSOVER_PROBABILITY': 0.75,
        'NO_CROSSOVER_INVALIDS': False,
        # Prevents crossover from generating invalids.

        # MUTATION
        'MUTATION': "operators.mutation.int_flip_per_codon",
        # "operators.mutation.subtree",
        # "operators.mutation.int_flip_per_codon",
        # "operators.mutation.int_flip_per_ind",
        'MUTATION_PROBABILITY': None,
        'MUTATION_EVENTS': 1,
        'NO_MUTATION_INVALIDS': False,
        # Prevents mutation from generating invalids.

        # REPLACEMENT
        'REPLACEMENT': "operators.replacement.generational",
        # "operators.replacement.generational",
        # "operators.replacement.steady_state",
        'ELITE_SIZE': None,

        # DEBUGGING
        # Use this to turn on debugging mode. This mode doesn't write any files
        # and should be used when you want to test new methods.
        'DEBUG': False,

        # PRINTING
        # Use this to print out basic statistics for each generation to the
        # command line.
        'VERBOSE': False,
        # Use this to prevent anything being printed to the command line.
        'SILENT': False,

        # SAVING
        'SAVE_ALL': False,
        # Use this to save the phenotype of the best individual from each
        # generation. Can generate a lot of files. DEBUG must be False.
        'SAVE_PLOTS': True,
        # Saves a plot of the evolution of the best fitness result for each
        # generation.

        # MULTIPROCESSING
        'MULTICORE': False,
        # Multiprocessing of phenotype evaluations.
        'CORES': cpu_count(),

        # STATE SAVING/LOADING
        'SAVE_STATE': False,
        # Saves the state of the evolutionary run every generation. You can
        # specify how often you want to save the state with SAVE_STATE_STEP.
        'SAVE_STATE_STEP': 1,
        # Specifies how often the state of the current evolutionary run is
        # saved (i.e. every n-th generation). Requires int value.
        'LOAD_STATE': None,
        # Loads an evolutionary run from a saved state. You must specify the
        # full file path to the desired state file. Note that state files have
        # no file type.

        # SEEDING
        'SEED_GENOME': None,
        # Specify a genome for an individual with which to seed the initial
        # population.
        'SEED_INDIVIDUAL': None,
        # Specify an individual with which to seed the initial population.
    
        # CACHING
        'CACHE': False,
        # The cache tracks unique individuals across evolution by saving a
        # string of each phenotype in a big list of all phenotypes. Saves all
        # fitness information on each individual. Gives you an idea of how much
        # repetition is in standard GE/GP.
        'LOOKUP_FITNESS': False,
        # Uses the cache to look up the fitness of duplicate individuals. CACHE
        #  must be set to True if you want to use this.
        'LOOKUP_BAD_FITNESS': False,
        # Uses the cache to give a bad fitness to duplicate individuals. CACHE
        # must be True if you want to use this (obviously)
        'MUTATE_DUPLICATES': False,
        # Removes duplicate individuals from the population by replacing them
        # with mutated versions of the original individual. Hopefully this will
        # encourage diversity in the population.

        # Set machine name (useful for doing multiple runs)
        'MACHINE': machine_name,

        # Set Random Seed
        'RANDOM_SEED': None,

        # Reverse Mapping to GE individual:
        'REVERSE_MAPPING_TARGET': None
}


class Grammar(object):
    """
    Parser for Backus-Naur Form (BNF) Context-Free Grammars.
    """

    def __init__(self, file_name):
        """
        Initialises an instance of the grammar class. This instance is used
        to parse a given file_name grammar.
        :param file_name: A specified BNF grammar file.
        """

        if file_name.endswith("pybnf"):
            # Use python filter for parsing grammar output as grammar output
            # contains indented python code.
            self.python_mode = True

        else:
            # No need to filter/interpret grammar output, individual
            # phenotypes can be evaluated as normal.
            self.python_mode = False

        # Initialise empty dict for all production rules in the grammar.
        # Initialise empty dict of permutations of solutions possible at
        # each derivation tree depth.
        self.rules, self.permutations = {}, {}

        # Initialise dicts for terminals and non terminals, set params.
        self.non_terminals, self.terminals = {}, {}
        self.start_rule, self.codon_size = None, params['CODON_SIZE']
        self.min_path, self.max_arity, self.min_ramp = None, None, None

        # Set regular expressions for parsing BNF grammar.
        self.ruleregex = r'(?P<rulename><\S+>)\s*::=\s*(?P<production>(?:(?=\#)\#[^\r\n]*|(?!<\S+>\s*::=).+?)+)'
        self.productionregex = r'(?=\#)(?:\#.*$)|(?!\#)\s*(?P<production>(?:[^\'\"\|\#]+|\'.*?\'|".*?")+)'
        self.productionpartsregex = r'\ *([\r\n]+)\ *|([^\'"<\r\n]+)|\'(.*?)\'|"(.*?)"|(?P<subrule><[^>|\s]+>)|([<]+)'

        # Read in BNF grammar, set production rules, terminals and
        # non-terminals.
        self.read_bnf_file(file_name)

        # Check the minimum depths of all non-terminals in the grammar.
        self.check_depths()

        # Check which non-terminals are recursive.
        self.check_recursion(self.start_rule["symbol"], [])

        # Set the minimum path and maximum arity of the grammar.
        self.set_arity()

        # Generate lists of recursive production choices and shortest
        # terminating path production choices for each NT in the grammar.
        # Enables faster tree operations.
        self.set_grammar_properties()

        # Calculate the total number of derivation tree permutations and
        # combinations that can be created by a grammar at a range of depths.
        self.check_permutations()

        if params['MIN_INIT_TREE_DEPTH']:
            # Set the minimum ramping tree depth from the command line.
            self.min_ramp = params['MIN_INIT_TREE_DEPTH']

        elif hasattr(params['INITIALISATION'], "ramping"):
            # Set the minimum depth at which ramping can start where we can
            # have unique solutions (no duplicates).
            self.get_min_ramp_depth()

        if params['REVERSE_MAPPING_TARGET']:
            # Initialise dicts for reverse-mapping GE individuals.
            self.concat_NTs, self.climb_NTs = {}, {}

            # Find production choices which can be used to concatenate
            # subtrees.
            self.find_concatination_NTs()

    def read_bnf_file(self, file_name):
        """
        Read a grammar file in BNF format. Parses the grammar and saves a
        dict of all production rules and their possible choices.
        :param file_name: A specified BNF grammar file.
        :return: Nothing.
        """

        with open(file_name, 'r') as bnf:
            # Read the whole grammar file.
            content = bnf.read()

            for rule in finditer(self.ruleregex, content, DOTALL):
                # Find all rules in the grammar

                if self.start_rule is None:
                    # Set the first rule found as the start rule.
                    self.start_rule = {
                        "symbol": rule.group('rulename'),
                        "type": "NT"
                    }

                # Create and add a new rule.
                self.non_terminals[rule.group('rulename')] = {
                    'id': rule.group('rulename'),
                    'min_steps': maxsize,
                    'expanded': False,
                    'recursive': True,
                    'b_factor': 0
                }

                # Initialise empty list of all production choices for this
                # rule.
                tmp_productions = []

                for p in finditer(self.productionregex,
                                  rule.group('production'), MULTILINE):
                    # Iterate over all production choices for this rule.
                    # Split production choices of a rule.

                    if p.group('production') is None or p.group(
                            'production').isspace():
                        # Skip to the next iteration of the loop if the
                        # current "p" production is None or blank space.
                        continue

                    # Initialise empty data structures for production choice
                    tmp_production, terminalparts = [], None

                    # special case: GERANGE:dataset_n_vars will be transformed
                    # to productions 0 | 1 | ... | n_vars-1
                    GE_RANGE_regex = r'GE_RANGE:(?P<range>\w*)'
                    m = match(GE_RANGE_regex, p.group('production'))
                    if m:
                        try:
                            if m.group('range') == "dataset_n_vars":
                                # set n = number of columns from dataset
                                n = params['FITNESS_FUNCTION'].n_vars
                            else:
                                # assume it's just an int
                                n = int(m.group('range'))
                        except (ValueError, AttributeError):
                            raise ValueError("Bad use of GE_RANGE: " + m.group(
                            ))

                        for i in range(n):
                            # add a terminal symbol
                            tmp_production, terminalparts = [], None
                            symbol = {
                                "symbol": str(i),
                                "type": "T",
                                "min_steps": 0,
                                "recursive": False
                            }
                            tmp_production.append(symbol)
                            if str(i) not in self.terminals:
                                self.terminals[str(i)] = \
                                    [rule.group('rulename')]
                            elif rule.group('rulename') not in \
                                self.terminals[str(i)]:
                                self.terminals[str(i)].append(
                                    rule.group('rulename'))
                            tmp_productions.append({
                                "choice": tmp_production,
                                "recursive": False,
                                "NT_kids": False
                            })
                        # don't try to process this production further
                        # (but later productions in same rule will work)
                        continue

                    for sub_p in finditer(self.productionpartsregex,
                                          p.group('production').strip()):
                        # Split production into terminal and non terminal
                        # symbols.

                        if sub_p.group('subrule'):
                            if terminalparts is not None:
                                # Terminal symbol is to be appended to the
                                # terminals dictionary.
                                symbol = {
                                    "symbol": terminalparts,
                                    "type": "T",
                                    "min_steps": 0,
                                    "recursive": False
                                }
                                tmp_production.append(symbol)
                                if terminalparts not in self.terminals:
                                    self.terminals[terminalparts] = \
                                        [rule.group('rulename')]
                                elif rule.group('rulename') not in \
                                    self.terminals[terminalparts]:
                                    self.terminals[terminalparts].append(
                                        rule.group('rulename'))
                                terminalparts = None

                            tmp_production.append({
                                "symbol": sub_p.group('subrule'),
                                "type": "NT"
                            })

                        else:
                            # Unescape special characters (\n, \t etc.)
                            if terminalparts is None:
                                terminalparts = ''
                            terminalparts += ''.join([
                                part.encode().decode('unicode-escape')
                                for part in sub_p.groups() if part
                            ])

                    if terminalparts is not None:
                        # Terminal symbol is to be appended to the terminals
                        # dictionary.
                        symbol = {
                            "symbol": terminalparts,
                            "type": "T",
                            "min_steps": 0,
                            "recursive": False
                        }
                        tmp_production.append(symbol)
                        if terminalparts not in self.terminals:
                            self.terminals[terminalparts] = \
                                [rule.group('rulename')]
                        elif rule.group('rulename') not in \
                            self.terminals[terminalparts]:
                            self.terminals[terminalparts].append(
                                rule.group('rulename'))
                    tmp_productions.append({
                        "choice": tmp_production,
                        "recursive": False,
                        "NT_kids": False
                    })

                if rule.group('rulename') not in self.rules:
                    # Add new production rule to the rules dictionary if not
                    # already there.
                    self.rules[rule.group('rulename')] = {
                        "choices": tmp_productions,
                        "no_choices": len(tmp_productions)
                    }

                    if len(tmp_productions) == 1:
                        # Unit productions.
                        print("Warning: Grammar contains unit production "
                              "for production rule", rule.group('rulename'))
                        print("       Unit productions consume GE codons.")
                else:
                    # Conflicting rules with the same name.
                    raise ValueError("lhs should be unique",
                                     rule.group('rulename'))

    def check_depths(self):
        """
        Run through a grammar and find out the minimum distance from each
        NT to the nearest T. Useful for initialisation methods where we
        need to know how far away we are from fully expanding a tree
        relative to where we are in the tree and what the depth limit is.
        :return: Nothing.
        """

        # Initialise graph and counter for checking minimum steps to Ts for
        # each NT.
        counter, graph = 1, []

        for rule in sorted(self.rules.keys()):
            # Iterate over all NTs.
            choices = self.rules[rule]['choices']

            # Set branching factor for each NT.
            self.non_terminals[rule]['b_factor'] = self.rules[rule][
                'no_choices']

            for choice in choices:
                # Add a new edge to our graph list.
                graph.append([rule, choice['choice']])

        while graph:
            removeset = set()
            for edge in graph:
                # Find edges which either connect to terminals or nodes
                # which are fully expanded.
                if all([
                        sy["type"] == "T" or
                        self.non_terminals[sy["symbol"]]['expanded']
                        for sy in edge[1]
                ]):
                    removeset.add(edge[0])

            for s in removeset:
                # These NTs are now expanded and have their correct minimum
                # path set.
                self.non_terminals[s]['expanded'] = True
                self.non_terminals[s]['min_steps'] = counter

            # Create new graph list and increment counter.
            graph = [e for e in graph if e[0] not in removeset]
            counter += 1

    def check_recursion(self, cur_symbol, seen):
        """
        Traverses the grammar recursively and sets the properties of each rule.
        :param cur_symbol: symbol to check.
        :param seen: Contains already checked symbols in the current traversal.
        :return: Boolean stating whether or not cur_symbol is recursive.
        """

        if cur_symbol not in self.non_terminals.keys():
            # Current symbol is a T.
            return False

        if cur_symbol in seen:
            # Current symbol has already been seen, is recursive.
            return True

        # Append current symbol to seen list.
        seen.append(cur_symbol)

        # Get choices of current symbol.
        choices = self.rules[cur_symbol]['choices']
        nt = self.non_terminals[cur_symbol]

        recursive = False
        for choice in choices:
            for sym in choice['choice']:
                # Recurse over choices.
                recursive_symbol = self.check_recursion(sym["symbol"], seen)
                recursive = recursive or recursive_symbol

        # Set recursive properties.
        nt['recursive'] = recursive
        seen.remove(cur_symbol)

        return nt['recursive']

    def set_arity(self):
        """
        Set the minimum path of the grammar, i.e. the smallest legal
        solution that can be generated.
        Set the maximum arity of the grammar, i.e. the longest path to a
        terminal from any non-terminal.
        :return: Nothing
        """

        # Set the minimum path of the grammar as the minimum steps to a
        # terminal from the start rule.
        self.min_path = self.non_terminals[self.start_rule["symbol"]][
            'min_steps']

        # Initialise the maximum arity of the grammar to 0.
        self.max_arity = 0

        # Find the maximum arity of the grammar.
        for NT in self.non_terminals:
            if self.non_terminals[NT]['min_steps'] > self.max_arity:
                # Set the maximum arity of the grammar as the longest path
                # to a T from any NT.
                self.max_arity = self.non_terminals[NT]['min_steps']

        # Add the minimum terminal path to each production rule.
        for rule in self.rules:
            for choice in self.rules[rule]['choices']:
                NT_kids = [i for i in choice['choice'] if i["type"] == "NT"]
                if NT_kids:
                    choice['NT_kids'] = True
                    for sym in NT_kids:
                        sym['min_steps'] = self.non_terminals[sym["symbol"]][
                            'min_steps']

        # Add boolean flag indicating recursion to each production rule.
        for rule in self.rules:
            for prod in self.rules[rule]['choices']:
                for sym in [i for i in prod['choice'] if i["type"] == "NT"]:
                    sym['recursive'] = self.non_terminals[sym["symbol"]][
                        'recursive']
                    if sym['recursive']:
                        prod['recursive'] = True

    def set_grammar_properties(self):
        """
        Goes through all non-terminals and finds the production choices with
        the minimum steps to terminals and with recursive steps.
        :return: Nothing
        """

        for nt in self.non_terminals:
            # Loop over all non terminals.
            # Find the production choices for the current NT.
            choices = self.rules[nt]['choices']

            for choice in choices:
                # Set the maximum path to a terminal for each production choice
                choice['max_path'] = max(
                    [item["min_steps"] for item in choice['choice']])

            # Find shortest path to a terminal for all production choices for
            # the current NT. The shortest path will be the minimum of the
            # maximum paths to a T for each choice over all choices.
            min_path = min([choice['max_path'] for choice in choices])

            # Set the minimum path in the self.non_terminals dict.
            self.non_terminals[nt]['min_path'] = [
                choice for choice in choices if choice['max_path'] == min_path
            ]

            # Find recursive production choices for current NT. If any
            # constituent part of a production choice is recursive,
            # it is added to the recursive list.
            self.non_terminals[nt]['recursive'] = [
                choice for choice in choices if choice['recursive']
            ]

    def check_permutations(self, ramps=5):
        """
        Calculates how many possible derivation tree combinations can be
        created from the given grammar at a specified depth. Only returns
        possible combinations at the specific given depth (if there are no
        possible permutations for a given depth, will return 0).
        :param ramps: The number of depths permutations are calculated for
        (starting from the minimum path of the grammar)
        :return: Nothing.
        """

        perms_list = []
        if self.max_arity > self.min_path:
            for i in range(max((self.max_arity + 1 - self.min_path), ramps)):
                x = self.check_all_permutations(i + self.min_path)
                perms_list.append(x)
                if i > 0:
                    perms_list[i] -= sum(perms_list[:i])
                    self.permutations[i + self.min_path] -= sum(perms_list[:i])
        else:
            for i in range(ramps):
                x = self.check_all_permutations(i + self.min_path)
                perms_list.append(x)
                if i > 0:
                    perms_list[i] -= sum(perms_list[:i])
                    self.permutations[i + self.min_path] -= sum(perms_list[:i])

    def check_all_permutations(self, depth):
        """
        Calculates how many possible derivation tree combinations can be
        created from the given grammar at a specified depth. Returns all
        possible combinations at the specific given depth including those
        depths below the given depth.
        :param depth: A depth for which to calculate the number of
        permutations of solution that can be generated by the grammar.
        :return: The permutations possible at the given depth.
        """

        if depth < self.min_path:
            # There is a bug somewhere that is looking for a tree smaller than
            # any we can create
            s = "representation.grammar.Grammar.check_all_permutations\n" \
                "Error: cannot check permutations for tree smaller than the " \
                "minimum size."
            raise Exception(s)

        if depth in self.permutations.keys():
            # We have already calculated the permutations at the requested
            # depth.
            return self.permutations[depth]

        else:
            # Calculate permutations at the requested depth.
            # Initialise empty data arrays.
            pos, depth_per_symbol_trees, productions = 0, {}, []

            for NT in self.non_terminals:
                # Iterate over all non-terminals to fill out list of
                # productions which contain non-terminal choices.
                a = self.non_terminals[NT]

                for rule in self.rules[a['id']]['choices']:
                    if rule['NT_kids']:
                        productions.append(rule)

            # Get list of all production choices from the start symbol.
            start_symbols = self.rules[self.start_rule["symbol"]]['choices']

            for choice in productions:
                # Generate a list of the symbols of each production choice
                key = str([sym['symbol'] for sym in choice['choice']])

                # Initialise permutations dictionary with the list
                depth_per_symbol_trees[key] = {}

            for i in range(2, depth + 1):
                # Find all the possible permutations from depth of min_path up
                # to a specified depth

                for choice in productions:
                    # Iterate over all production choices
                    sym_pos = 1

                    for j in choice['choice']:
                        # Iterate over all symbols in a production choice.
                        symbol_arity_pos = 0

                        if j["type"] == "NT":
                            # We are only interested in non-terminal symbols
                            for child in self.rules[j["symbol"]]['choices']:
                                # Iterate over all production choices for
                                # each NT symbol in the original choice.

                                if len(child['choice']) == 1 and \
                                   child['choice'][0]["type"] == "T":
                                    # If the child choice leads directly to
                                    # a single terminal, increment the
                                    # permutation count.
                                    symbol_arity_pos += 1

                                else:
                                    # The child choice does not lead
                                    # directly to a single terminal.
                                    # Generate a key for the permutations
                                    # dictionary and increment the
                                    # permutations count there.
                                    key = [
                                        sym['symbol']
                                        for sym in child['choice']
                                    ]
                                    if (i - 1) in depth_per_symbol_trees[str(
                                            key)].keys():
                                        symbol_arity_pos += depth_per_symbol_trees[
                                            str(key)][i - 1]

                            # Multiply original count by new count.
                            sym_pos *= symbol_arity_pos

                    # Generate new key for the current production choice and
                    # set the new value in the permutations dictionary.
                    key = [sym['symbol'] for sym in choice['choice']]
                    depth_per_symbol_trees[str(key)][i] = sym_pos

            # Calculate permutations for the start symbol.
            for sy in start_symbols:
                key = [sym['symbol'] for sym in sy['choice']]
                if str(key) in depth_per_symbol_trees:
                    pos += depth_per_symbol_trees[str(key)][
                        depth] if depth in depth_per_symbol_trees[str(
                            key)] else 0
                else:
                    pos += 1

            # Set the overall permutations dictionary for the current depth.
            self.permutations[depth] = pos

            return pos

    def get_min_ramp_depth(self):
        """
        Find the minimum depth at which ramping can start where we can have
        unique solutions (no duplicates).
        :param self: An instance of the representation.grammar.grammar class.
        :return: Nothing. Sets self.min_ramp to the minimum depth at which
        unique solutions can be generated (None if no such depth is found).
        """

        max_tree_depth = params['MAX_INIT_TREE_DEPTH']
        size = params['POPULATION_SIZE']

        # Specify the range of ramping depths
        depths = range(self.min_path, max_tree_depth + 1)

        if size % 2:
            # Population size is odd
            size += 1

        if size / 2 < len(depths):
            # The population size is too small to fully cover all ramping
            # depths. Only ramp to the number of depths we can reach.
            depths = depths[:int(size / 2)]

        # Find the minimum number of unique solutions required to generate
        # sufficient individuals at each depth.
        unique_start = int(floor(size / len(depths)))
        ramp = None

        for i in sorted(self.permutations.keys()):
            # Examine the number of permutations and combinations of unique
            # solutions capable of being generated by a grammar across each
            # depth i.
            if self.permutations[i] > unique_start:
                # If the number of permutations possible at a given depth i is
                # greater than the required number of unique solutions,
                # set the minimum ramp depth and break out of the loop.
                ramp = i
                break
        self.min_ramp = ramp

    def find_concatination_NTs(self):
        """
        Scour the grammar class to find non-terminals which can be used to
        combine/concatenate derivation trees. Build up a list of such
        non-terminals. A concatenation non-terminal is one in which at least
        one production choice contains multiple non-terminals. For example:
            <e> ::= (<e><o><e>)|<v>
        is a concatenation NT, since the production choice (<e><o><e>) can
        concatenate multiple NTs together. Note that this choice also includes
        a combination of terminals and non-terminals.
        :return: Nothing.
        """

        # Iterate over all non-terminals/production rules.
        for rule in sorted(self.rules.keys()):

            # Find rules which have production choices leading to NTs.
            concat = [
                choice for choice in self.rules[rule]['choices']
                if choice['NT_kids']
            ]

            if concat:
                # We can concatenate NTs.
                for choice in concat:

                    symbols = [[sym['symbol'], sym['type']]
                               for sym in choice['choice']]

                    NTs = [
                        sym['symbol'] for sym in choice['choice']
                        if sym['type'] == "NT"
                    ]

                    for NT in NTs:
                        # We add to our self.concat_NTs dictionary. The key is
                        # the root node we want to concatenate with another
                        # node. This way when we have a node and wish to see
                        # if we can concatenate it with anything else, we
                        # simply look up this dictionary.
                        conc = [choice['choice'], rule, symbols]

                        if NT not in self.concat_NTs:
                            self.concat_NTs[NT] = [conc]
                        else:
                            if conc not in self.concat_NTs[NT]:
                                self.concat_NTs[NT].append(conc)

    def __str__(self):
        return "%s %s %s %s" % (self.terminals, self.non_terminals, self.rules,
                                self.start_rule)
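
The round-based idea behind check_depths can be illustrated outside the class. The following is a minimal sketch, assuming a toy grammar stored as a plain dict; TOY_RULES and min_steps_to_terminal are hypothetical names and not part of the Grammar class. In each round, a non-terminal is resolved if one of its choices contains only terminals or non-terminals resolved in an earlier round. The sketch stops with an error if a round makes no progress.

```python
# Toy grammar (an assumption for illustration): each rule maps to a list of
# production choices, and each choice is a list of (symbol, type) pairs.
TOY_RULES = {
    "<expr>": [
        [("<expr>", "NT"), ("<op>", "NT"), ("<expr>", "NT")],
        [("<pre_op>", "NT"), ("(", "T"), ("<expr>", "NT"), (")", "T")],
        [("<var>", "NT")],
    ],
    "<op>": [[("+", "T")], [("-", "T")]],
    "<pre_op>": [[("sin", "T")], [("cos", "T")]],
    "<var>": [[("x", "T")], [("1.0", "T")]],
}


def min_steps_to_terminal(rules):
    """Return {non_terminal: minimum expansion steps to reach terminals}."""
    min_steps = {}
    # One edge per (rule, production choice).
    graph = [(nt, choice) for nt in sorted(rules) for choice in rules[nt]]
    counter = 1
    while graph:
        # Rules resolved in this round: some choice holds only terminals
        # or non-terminals that were resolved in an earlier round.
        resolved = {
            nt for nt, choice in graph
            if all(kind == "T" or sym in min_steps for sym, kind in choice)
        }
        if not resolved:
            # No progress: the remaining rules can never reach terminals.
            stuck = sorted({nt for nt, _ in graph})
            raise ValueError("unproductive non-terminals: %s" % stuck)
        for nt in resolved:
            min_steps[nt] = counter
        # Drop every edge that belongs to a resolved rule.
        graph = [(nt, c) for nt, c in graph if nt not in resolved]
        counter += 1
    return min_steps


steps = min_steps_to_terminal(TOY_RULES)
```

Here `<op>`, `<pre_op>` and `<var>` are resolved in the first round, and `<expr>` in the second, through its `<var>` choice.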

In [91]:
grammar = """
<expr>   ::= <expr><op><expr> \
           | (<expr><op><expr>) \
           | <pre_op>(<expr>) \
           | <var>
<op>     ::= + | - | * | / 
<pre_op> ::= sin | cos | exp | log
<var>    ::= x 
"""

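# Write the grammar to disk so that Grammar can load it from a file.
with open('grammar.bnf', 'w') as f:
    f.write(grammar)

g = Grammar('grammar.bnf')

# print(g.non_terminals)
# print(g.terminals)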


       Unit productions consume GE codons.
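
For intuition about what loading a grammar like the one above involves, here is a minimal sketch of turning BNF text into a rules dictionary. parse_bnf and GRAMMAR_TEXT are hypothetical names. The sketch is far simpler than the regex-based parser in the Grammar class: it assumes one rule per line, no quoted `|` characters and no GE_RANGE productions.

```python
import re

# Illustrative grammar text, one rule per line (an assumption of this sketch).
GRAMMAR_TEXT = """
<expr>   ::= <expr><op><expr> | (<expr><op><expr>) | <pre_op>(<expr>) | <var>
<op>     ::= + | - | * | /
<pre_op> ::= sin | cos | exp | log
<var>    ::= x
"""

NT_PATTERN = r"<[^<>]+>"


def parse_bnf(text):
    """Parse BNF text into {rule_name: [choice, ...]}.

    Each choice is a list of (symbol, type) pairs, where type is "NT" for
    <non-terminals> and "T" for any other text.
    """
    rules = {}
    for line in text.strip().splitlines():
        if "::=" not in line:
            continue
        lhs, rhs = line.split("::=", 1)
        name = lhs.strip()
        if name in rules:
            # Mirror the uniqueness check on the left-hand side.
            raise ValueError("lhs should be unique: " + name)
        choices = []
        for production in rhs.split("|"):
            symbols = []
            # Split into <non-terminals> and the terminal text between them.
            for part in re.split("(" + NT_PATTERN + ")", production.strip()):
                if not part:
                    continue
                kind = "NT" if re.fullmatch(NT_PATTERN, part) else "T"
                symbols.append((part, kind))
            choices.append(symbols)
        rules[name] = choices
    return rules


rules = parse_bnf(GRAMMAR_TEXT)
```

In this sketch `<var>` ends up with a single choice, which is the unit-production case that the warning above refers to.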


In [148]:
import sys
sys.path.insert(0, 'PonyGE2/src')

import representation as r
import algorithm as a
import numpy as np

def set_up_grammar(g):
    # Write the BNF grammar string to the file that Grammar loads.
    with open("grammar.bnf", "w") as f:
        f.write(g)
    
set_up_grammar("""
<expr>   ::= <expr><op><expr> | (<expr><op><expr>) | <pre_op>(<expr>) | <var>
<op>     ::= + | - | * | / 
<pre_op> ::= sin | cos | exp | log
<var>    ::= x | 1.0
""")    

g = r.grammar.Grammar('grammar.bnf')
a.parameters.params['BNF_GRAMMAR'] = g

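# Print 100 valid mappings of random 10-codon genomes.
# Note: np.random.randint excludes the upper bound, so codons lie in [0, 254].
i = 0
while i < 100:
    m = a.mapper.map_ind_from_genome([np.random.randint(0, 255) for _ in range(10)])
    if m[0]:
        i += 1
        print(m)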

('x', [231, 144, 134, 33, 100, 170, 34, 193, 13, 231], None, 3, False, 3, 2)
('x', [127, 180, 161, 24, 65, 37, 28, 195, 250, 61], None, 3, False, 3, 2)
('x', [251, 94, 79, 65, 83, 158, 14, 127, 248, 35], None, 3, False, 3, 2)
('exp(1.0)', [18, 14, 27, 73, 108, 137, 213, 24, 5, 93], None, 6, False, 4, 4)
('(x+x)', [233, 27, 210, 40, 79, 136, 176, 41, 227, 88], None, 9, False, 4, 6)
('1.0', [27, 163, 213, 95, 215, 215, 46, 98, 195, 141], None, 3, False, 3, 2)
('(1.0+exp(1.0))', [177, 243, 233, 208, 250, 210, 235, 177, 85, 46], None, 12, False, 5, 8)
('1.0', [71, 179, 226, 160, 88, 90, 106, 109, 85, 73], None, 3, False, 3, 2)
('sin(x+x)', [238, 44, 164, 75, 158, 60, 79, 132, 106, 135], None, 12, False, 5, 8)
('x', [203, 138, 254, 218, 189, 89, 110, 111, 254, 7], None, 3, False, 3, 2)
('1.0', [247, 71, 174, 179, 192, 135, 94, 242, 242, 90], None, 3, False, 3, 2)
('log(x*cos(1.0))', [242, 223, 8, 147, 18, 214, 162, 9, 231, 145], None, 15, False, 6, 10)
('1.0*log(1.0)', [128, 179, 45, 70, 22
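
The tuples printed above pair a phenotype with the genome that produced it. The core of that genotype-to-phenotype mapping can be sketched as follows, assuming the standard GE rule: the leftmost non-terminal is expanded first, and each codon selects a production by taking it modulo the number of choices. map_genome, TOY_GRAMMAR and the max_wraps handling are illustrative assumptions, not PonyGE2's mapper.

```python
from collections import deque

# Toy grammar (an assumption for illustration): non-terminals are the dict
# keys, and every other symbol is treated as a terminal.
TOY_GRAMMAR = {
    "<expr>": [
        ["<expr>", "<op>", "<expr>"],
        ["(", "<expr>", "<op>", "<expr>", ")"],
        ["<pre_op>", "(", "<expr>", ")"],
        ["<var>"],
    ],
    "<op>": [["+"], ["-"], ["*"], ["/"]],
    "<pre_op>": [["sin"], ["cos"], ["exp"], ["log"]],
    "<var>": [["x"], ["1.0"]],
}


def map_genome(genome, rules, start="<expr>", max_wraps=2):
    """Map a list of integer codons to (phenotype, codons_used).

    Returns (None, codons_used) if the derivation is still incomplete after
    the genome has been read 1 + max_wraps times.
    """
    limit = len(genome) * (max_wraps + 1)
    used = 0
    output = []
    pending = deque([start])
    while pending:
        symbol = pending.popleft()
        if symbol not in rules:
            # Terminal: emit it as part of the phenotype.
            output.append(symbol)
            continue
        if used >= limit:
            # Codon budget exhausted: the individual is invalid.
            return None, used
        choices = rules[symbol]
        # Core GE rule: codon value modulo the number of choices.
        choice = choices[genome[used % len(genome)] % len(choices)]
        used += 1
        # Expand in place so the leftmost symbol is processed next.
        pending.extendleft(reversed(choice))
    return "".join(output), used


phenotype, codons_used = map_genome([3, 0], TOY_GRAMMAR)
```

With this toy grammar, the genome [3, 0] selects `<var>` and then `x`, so it maps to 'x' using two codons. In this sketch, every expansion consumes a codon, including rules with a single choice.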

In [151]:
import sys
sys.path.insert(0, 'ponyge-master/src')
import numpy as np

import ponyge as p


g = p.Grammar('grammar.bnf')

g.generate([np.random.randint(0,255) for _ in range(10)])


i = 0
while i<100:
    m = g.generate([np.random.randint(0,255) for _ in range(10)])
    if m[0]:
        i = i + 1
        print(m)

('1.0', 2)
('cos(x)+x', 8)
('x', 2)
('cos(x)', 4)
('cos(1.0)', 4)
('(x/x)', 6)
('cos(cos(1.0))', 6)
('1.0', 2)
('cos(exp(log(1.0)))', 8)
('1.0', 2)
('log(1.0)', 4)
('1.0', 2)
('1.0', 2)
('1.0', 2)
('1.0', 2)
('x', 2)
('x', 2)
('1.0', 2)
('cos(cos(x))', 6)
('x', 2)
('1.0', 2)
('cos(cos(x))', 6)
('1.0+1.0', 6)
('log(1.0)', 4)
('log(log(cos(x)))', 8)
('1.0', 2)
('1.0', 2)
('exp(log(x))', 6)
('x', 2)
('(x/1.0)', 6)
('x', 2)
('x', 2)
('x', 2)
('x', 2)
('x', 2)
('x', 2)
('1.0', 2)
('x', 2)
('1.0', 2)
('x', 2)
('log(1.0)', 4)
('x', 2)
('x', 2)
('1.0', 2)
('x', 2)
('cos(1.0)', 4)
('(1.0+cos(x))', 8)
('exp(x)', 4)
('(x+1.0)', 6)
('cos((1.0+1.0))', 8)
('1.0', 2)
('x', 2)
('cos(x)', 4)
('x', 2)
('cos(1.0)', 4)
('x', 2)
('cos(x)', 4)
('1.0', 2)
('1.0', 2)
('1.0', 2)
('exp((x+1.0))', 8)
('1.0', 2)
('x', 2)
('x', 2)
('(1.0-x)', 6)
('x', 2)
('1.0', 2)
('(x-1.0)', 6)
('sin(1.0)', 4)
('x', 2)
('x', 2)
('x', 2)
('x', 2)
('x', 2)
('log(x)', 4)
('x', 2)
('x*cos(1.0)', 8)
('x-x', 6)
('sin(x)', 4)
('(exp(1.

(None, 5000)