Learnable pooling with Context Gating for video classification #30

chullhwan-song · 2018-07-30T00:10:41Z

https://arxiv.org/abs/1706.06905

chullhwan-song · 2018-07-30T00:30:38Z

abstract

video classification

Architecture

Learnable feature (learning feature)

global descriptor
feature
- NetVLAD: see "NetVLAD: CNN architecture for weakly supervised place recognition" #3
  - Based on VLAD encoding, turned into a learnable feature
  - That is, the clusters are learned: instead of VLAD's k-means clustering, they can be tuned by backpropagation
    - soft-assignment idea
- Beyond NetVLAD aggregation
  - Extends the soft-assignment idea to BoW and Fisher Vectors
    - use soft assignment of descriptors to visual word clusters
  - learnable BOW
    - The soft-assignment is a softmax.
      - Note: BoW papers that were not learnable existed before this one.
    - Advantage over NetVLAD: for a fixed number of clusters, it aggregates many features into a more compact representation.
    - Drawback: to obtain a rich aggregated feature, the number of clusters k has to be made larger.
  - learnable Fisher Vector(NetFV)
    - Modifies NetVLAD so that it also learns second-order feature statistics within the clusters.
      - In other words, it is built to mimic the standard Fisher Vector encoding.
        - FV1 captures the first-order statistics.
        - FV2 captures the second-order statistics.
        - The cluster centers are learnable.
        - The clusters' diagonal covariances are also learnable and must stay positive.
        - To keep them positive, their values are first randomly initialized with Gaussian noise with unit mean and small variance, and then squared during training.
  - learnable Residual-less NetVLAD(NetRVLAD)
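    - A simplified version of the original NetVLAD
    - Here, "residual" means the remainder (difference)
    - That is, instead of computing the difference from the centroid, it aggregates by multiplying the soft-assignment values with the input values
      - When I trained with this variant myself, it did not seem to make much difference.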
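
The four pooling variants above differ only in what is summed under the soft-assignment weights, which a small NumPy sketch can make concrete. This is an illustration under assumptions, not the authors' implementation: the function names, the parameter names (`w`, `b`, `c`, `s`), and the toy sizes are all hypothetical, and the normalization and fully connected layers that follow the pooling are left out.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_assign(x, w, b):
    # x: (N, D) descriptors, w: (D, K), b: (K,) -> (N, K) weights, rows sum to 1.
    return softmax(x @ w + b, axis=1)

def net_bow(a):
    # Soft histogram: total assignment mass per cluster, shape (K,).
    return a.sum(axis=0)

def net_rvlad(x, a):
    # Residual-less VLAD: sum_i a_k(x_i) * x_i, shape (K, D).
    return a.T @ x

def net_vlad(x, a, c):
    # VLAD residuals: sum_i a_k(x_i) * (x_i - c_k), shape (K, D).
    return a.T @ x - a.sum(axis=0)[:, None] * c

def net_fv(x, a, c, s):
    # s is the raw learnable parameter; squaring keeps the covariance positive.
    sigma = s ** 2
    diff = (x[:, None, :] - c[None, :, :]) / sigma[None, :, :]  # (N, K, D)
    fv1 = np.einsum("nk,nkd->kd", a, diff)             # first-order statistics
    fv2 = np.einsum("nk,nkd->kd", a, diff ** 2 - 1.0)  # second-order statistics
    return fv1, fv2

# Toy data (assumed sizes): N descriptors of dimension D, K clusters.
rng = np.random.default_rng(0)
N, D, K = 30, 8, 4
x = rng.normal(size=(N, D))
w = rng.normal(size=(D, K))
b = np.zeros(K)
c = rng.normal(size=(K, D))
s = rng.normal(loc=1.0, scale=0.1, size=(K, D))  # unit mean, small variance

a = soft_assign(x, w, b)
bow = net_bow(a)
rvlad = net_rvlad(x, a)
vlad = net_vlad(x, a, c)
fv1, fv2 = net_fv(x, a, c, s)
```

In this sketch NetBoW yields K values, NetVLAD and NetRVLAD yield K x D values, and NetFV yields two K x D blocks, which matches the compactness trade-off noted above.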

Context Gating

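Personal opinion: I think this paper picked the wrong main topic. It would have been better if it had focused on using the features above for good category classification.
In any case, the paper treats this topic as more important than the learnable features above.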
Models interdependencies between activations with a self-gating mechanism
- X: input layer, Y: output layer, W, b: parameters to learn
- A recalibration of the activations
- A similar concept was first introduced in earlier work as the Gated Linear Unit (GLU) for language modeling, which is where the inspiration came from.
- What is introduced here is a simpler form of it.
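
A minimal sketch of the gating, assuming the form Y = σ(WX + b) ∘ X given in the paper, where σ is the element-wise sigmoid and ∘ is element-wise multiplication. The function name, the row-vector batch layout, and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def context_gating(x, w, b):
    # x: (B, D) batch of activations, w: (D, D), b: (D,).
    # Each output dimension is the input scaled by a gate in (0, 1)
    # that is computed from the whole input vector.
    gates = sigmoid(x @ w + b)
    return gates * x

# Toy usage with assumed sizes.
rng = np.random.default_rng(0)
B, D = 5, 16
x = rng.normal(size=(B, D))
w = rng.normal(scale=0.1, size=(D, D))
b = np.zeros(D)
y = context_gating(x, w, b)
```

Unlike GLU, which gates a second linear projection of the input, this form gates the input itself, so it needs only one weight matrix and keeps the output dimension equal to the input dimension.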
Audio features are also provided.
The visual features:
- The visual features consist of ReLU activations of the last fully connected layer from a publicly available Inception network trained on ImageNet.

Experimental results

chullhwan-song added the Video label Jul 31, 2018

chullhwan-song mentioned this issue Mar 14, 2019

Efficient Video Classification Using Fewer Frames #110

Open
