Optimizing SVM parameters with this framework #2

Closed
lxhao opened this Issue Sep 28, 2017 · 12 comments

lxhao commented Sep 28, 2017

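I'd like to use your framework to tune SVM parameters. I'm not very familiar with either Python or genetic algorithms, so I'd appreciate some guidance. I'm tuning four parameters: class_weight, C, gamma, and the feature selection. gamma should really be tuned between 0 and 1, but I defined its range as (1, 10000) and divide by 10000 at use time to get the actual gamma value.
Feature selection works like this: with 50 features, for example, the parameter ranges from 1 to 2^49; after converting it to binary, if bit 0 is 1, feature 0 is selected.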

  # Define  weight, C, gamma, features
  features = len(x[0])
  indv_template = GAIndividual(
      ranges=[(1, 20), (1, 1000), (1, 10000), (1, 2 ** (features - 1))],
      encoding='binary',
      eps=1)

This is the fitness function:

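from sklearn.svm import SVC

# Define fitness function.
# @engine.fitness_register
def fitness(indv, x, y):
  weight, C, gamma, feature_mask = indv.variants
  # gamma is encoded in the range (1, 10000); rescale it to (0, 1].
  gamma = gamma / 10000
  # Select features: bit i of the mask (bit 0 = least significant) selects feature i.
  n_features = x.shape[1]
  selected_features = [i for i in range(n_features) if (int(feature_mask) >> i) & 1]
  x = x[:, selected_features]
  svm_model = SVC(class_weight={1: weight}, gamma=gamma, C=C)
  # crossValidation and scorer are the poster's own helpers (not shown here).
  value = crossValidation(svm_model, x, y, scorer=scorer)
  return value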
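
The bit-to-feature mapping described in this post can be tried on its own, without gaft or scikit-learn. The following is a minimal sketch, assuming the mask is a non-negative integer and that bit i, counted from the least significant bit, selects column i. The helper name `mask_to_indices` and the toy array are hypothetical.

```python
import numpy as np


def mask_to_indices(mask, n_features):
    """Return the column indices whose bit is set in an integer mask.

    Bit i (bit 0 is the least significant bit) selects feature i.
    """
    mask = int(mask)
    return [i for i in range(n_features) if (mask >> i) & 1]


# Toy data: 3 samples, 5 features.
x = np.arange(15).reshape(3, 5)

# 0b10011 has bits 0, 1 and 4 set.
selected = mask_to_indices(0b10011, x.shape[1])
x_selected = x[:, selected]
```

Shifting bits avoids two pitfalls of going through a binary string: the string drops leading zeros, and its first character is the most significant bit rather than bit 0.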

@lxhao lxhao changed the title from "How to use the binary form of the parameters" to "Optimizing SVM parameters with this framework" Sep 28, 2017

Owner

PytLab commented Sep 29, 2017

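This requires setting a different precision for each variable in an individual, which gaft does not support yet. I'll add it in the next couple of days so that GAIndividual supports per-variable precision.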

Owner

PytLab commented Oct 1, 2017

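I've updated gaft. Defining an individual now supports a different precision for each variable. For ranges divisible by 2, you can simply set the variable's discretization precision to 1. For a floating-point variable, just set the range to [0, 1] with a precision of 0.001.

For example: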

indv = GAIndividual(ranges=[(0, 1), (1, 2**49)], encoding='binary', eps=[0.001, 1])
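
To make the effect of per-variable precision more concrete, here is a rough sketch of how many bits a binary encoding needs to cover a range at a given precision. This is generic arithmetic, not gaft's actual implementation, which may round differently. The helper name `bits_needed` is hypothetical.

```python
import math


def bits_needed(low, high, eps):
    """Smallest bit count whose evenly spaced grid over [low, high] has step <= eps.

    n bits encode 2**n values, so the step is (high - low) / (2**n - 1).
    """
    return math.ceil(math.log2((high - low) / eps + 1))


# The two variables from the example above: gamma in [0, 1] at 0.001,
# and the feature mask in [1, 2**49] at integer precision.
ranges = [(0, 1), (1, 2 ** 49)]
eps = [0.001, 1]

lengths = [bits_needed(low, high, e) for (low, high), e in zip(ranges, eps)]
chromosome_length = sum(lengths)
```

With a single shared precision, every variable would be forced onto the finest grid, which is why a per-variable `eps` matters when one variable is a float and another is a large integer.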

lxhao commented Oct 1, 2017

Thanks a lot!

@lxhao lxhao closed this Oct 1, 2017

lxhao commented Oct 1, 2017

One more small problem: the reported result is not the optimal solution. In the run below the optimum should be 0.583, but the output is 0.51:
gaft.ConsoleOutputAnalysis INFO Generation number: 5 Population number: 30
gaft.ConsoleOutputAnalysis INFO Generation: 0, best fitness: 0.583
gaft.ConsoleOutputAnalysis INFO Generation: 1, best fitness: 0.561
gaft.ConsoleOutputAnalysis INFO Generation: 2, best fitness: 0.513
gaft.ConsoleOutputAnalysis INFO Generation: 3, best fitness: 0.511
gaft.ConsoleOutputAnalysis INFO Generation: 4, best fitness: 0.511
gaft.ConsoleOutputAnalysis INFO Optimal solution: ([17.466666666666665, 200.40900195694715, 458.77377609571477], 0.5106504981331608)

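Also, with the population size set to 30, I found that the fitness function is called many times per iteration. I expected only 30 calls. Am I misunderstanding something?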
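
For reference, one generic way to avoid losing a good early individual is to record the best value seen in any generation instead of reading it from the last one. The sketch below is independent of gaft and makes no claim about how gaft reports its result. The per-generation values are copied from the log above; the variable names are hypothetical.

```python
# Best fitness per generation, copied from the log above.
best_per_generation = [0.583, 0.561, 0.513, 0.511, 0.511]

best_so_far = None  # (generation index, fitness)
for generation, fitness_value in enumerate(best_per_generation):
    # Update the record only when a strictly better value appears.
    if best_so_far is None or fitness_value > best_so_far[1]:
        best_so_far = (generation, fitness_value)

# What you get if you only look at the final generation.
last_generation_best = best_per_generation[-1]
```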

@lxhao lxhao reopened this Oct 1, 2017

Owner

PytLab commented Oct 1, 2017

Which one is this optimizing?

lxhao commented Oct 1, 2017

I'm optimizing four parameters at once; the fitness function returns the model's evaluation score.

Owner

PytLab commented Oct 1, 2017

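  1. If you haven't scaled the fitness function, gaft searches for the maximum fitness by default. You can handle this yourself, for example by negating the value, or use the engine.linear_scaling(target='max') decorator to scale it. For more on scaling, see my blog post: 遗传算法中适值函数的标定与大变异算法 (Fitness Function Scaling and the Big Mutation Algorithm in Genetic Algorithms)
  2. Normally it should be evaluated 30 times. There may be redundant calls when my code sorts the population; I need to track that down and optimize it.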
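
The "negate it yourself" option from point 1 can be sketched without any gaft-specific API. It assumes only that the optimizer maximizes whatever the fitness function returns. The decorator name `minimize` and the toy `loss` function are hypothetical and used purely for illustration.

```python
import functools


def minimize(func):
    """Wrap an objective so that a maximizing optimizer ends up minimizing it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return -func(*args, **kwargs)
    return wrapper


@minimize
def loss(x):
    # Toy objective with its minimum at x = 3.
    return (x - 3) ** 2


candidates = [0, 1, 2, 3, 4, 5]
# A maximizer picks the candidate with the largest wrapped value,
# which is the one with the smallest original loss.
best = max(candidates, key=loss)
```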

lxhao commented Oct 1, 2017

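When the fitness function takes a long time to run, this really does waste resources. I counted: when optimizing four parameters at once with a population size of 30, fitness is called nearly 900 times per iteration.
Please let me know once it's fixed. Thanks!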

lxhao commented Oct 1, 2017

I tried storing already-computed results for the population in a dictionary, and that solves the repeated-computation problem.

Owner

PytLab commented Oct 7, 2017

Yep, I'm profiling it now and will optimize afterwards. Pull requests are welcome too 😝

Owner

PytLab commented Oct 8, 2017

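gaft has now been optimized by caching function return values. Give it another try; it should be much faster 😜 See my recent commits for the details of the optimization.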

Owner

PytLab commented Oct 9, 2017

I wrote up the optimization process; see this blog post: 遗传算法框架GAFT优化小记 (Notes on Optimizing the GAFT Genetic Algorithm Framework)

@PytLab PytLab added the question label Oct 15, 2017

@PytLab PytLab closed this Oct 16, 2017
