Make python DeepImageFeaturizer use Scala version. #88

Merged
merged 13 commits into from Jan 23, 2018

Commits on Dec 19, 2017

  1. 73898fd
  2. 83a6ac9

Commits on Jan 16, 2018

  1. Updated featurizer tests.

    Due to differences between images resized by different libraries, DeepImageFeaturizer is no longer required to match the results from Keras on raw (non-resized) images.
    Instead of computing the l-inf or l2 norm of the difference between the two feature vectors, we compare their cosine distance and require it to be sufficiently "low" (< 1e-2).
    We also ran several transfer learning examples and ensured that the results were comparable.
    These experiments were successful, and the new sparkdl features proved to be at least as good as the native Keras ones; however, they have not been added as automated tests.
    
    Overall, I think the combination of (1) an exact match with no resizing and (2) results that are not too different with resizing is good enough.
    
    Cosine distance justification:
    Cosine distance ensures that the resulting feature vector has a similar direction. Intuitively, this is an important property for the generated features, and I think requiring a sufficiently low cosine distance gives better guarantees than computing the l2 or l-inf norm and comparing it against a huge allowed difference.
    
    I compared against different images, various amounts of added noise, and some obvious bugs I could think of, such as skipping the preprocessing or flipping the color channels. Most distances came out orders of magnitude higher, while noise with sd = 0.01 gave a comparable distance. Here's the breakdown on the test images (the distances here fall in the [0, 1] interval):
    
    cosine distance per image to the same image with added (normal, mean = 0) noise:
    sd = 1.00: [0.69, 0.78, 0.77, 0.75, 0.76]
    sd = 0.10: [0.1, 0.2, 0.31, 0.12, 0.23]
    sd = 0.01: [0.0078, 0.0094, 0.060, 0.0040, 0.0085]
    
    cosine distance with no preprocessing: all ~ 0.9
    cosine distance with faulty preprocessing (mean of one channel is incorrect): all ~ 0.1
    cosine distance with flipped channels: all ~ 0.3
    cosine distance matrix for the test images:
    [ 0.00 0.75 0.71 0.70 0.67]
    [ 0.75 0.00 0.79 0.85 0.74]
    [ 0.71 0.80 0.00 0.75 0.69]
    [ 0.70 0.85 0.75 0.00 0.70]
    [ 0.67 0.74 0.69 0.70 0.00]
    tomasatdatabricks committed Jan 16, 2018
    8e40dd4
  2. wip updates

    tomasatdatabricks committed Jan 16, 2018
    a8b6ba9
  3. dcc7131
  4. Addressed some of the reviewer's comments. One larger change: the generate_app_models.py script no longer modifies the Models.scala source directly. Instead, it generates a file in the current working directory and lets the user copy it.
    tomasatdatabricks committed Jan 16, 2018
    dbbcd6b
  5. Addressed some of the reviewer's comments.

      A few minor fixes: the scaleHint converter in DeepImageFeaturizer is now lazy (and params are eagerly transferred to the JVM).
      Added licenses for the generated named model wrappers.
    tomasatdatabricks committed Jan 16, 2018
    dee337c
  6. ad20ccc
  7. Addressed reviewers' comments. Mostly cosmetic changes; moved gen_app_model.py to python/model_gen/
    tomasatdatabricks committed Jan 16, 2018
    3350a91
  8. b27ad7b
  9. 71c7b3e
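The test criterion from the "Updated featurizer tests" commit (per-image cosine distance below 1e-2, plus a pairwise distance matrix as a sanity check) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the actual sparkdl test code: the helper names `cosine_distance`, `assert_features_close`, and `pairwise_cosine_distances` are hypothetical, and the random vectors in the usage section merely stand in for real featurizer outputs.

```python
import numpy as np

# Threshold from the commit message: per-image cosine distance must be < 1e-2.
COSINE_DISTANCE_THRESHOLD = 1e-2


def cosine_distance(u, v):
    """Return 1 - cos(u, v); lies in [0, 2] in general, [0, 1] for non-negative vectors."""
    u = np.asarray(u, dtype=np.float64).ravel()
    v = np.asarray(v, dtype=np.float64).ravel()
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - float(np.dot(u, v) / denom)


def assert_features_close(features, reference_features,
                          threshold=COSINE_DISTANCE_THRESHOLD):
    """Hypothetical helper: require each per-image cosine distance to be below threshold.

    Returns the list of distances so callers can log them.
    """
    if len(features) != len(reference_features):
        raise ValueError("feature lists must have the same length")
    distances = [cosine_distance(a, b) for a, b in zip(features, reference_features)]
    too_far = [(i, d) for i, d in enumerate(distances) if d >= threshold]
    if too_far:
        raise AssertionError(f"cosine distance too high (index, distance): {too_far}")
    return distances


def pairwise_cosine_distances(features):
    """Symmetric matrix of cosine distances between every pair of (non-zero) feature rows."""
    x = np.asarray(features, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalize each row
    d = 1.0 - x @ x.T
    np.fill_diagonal(d, 0.0)  # remove floating-point residue on the diagonal
    return d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for real feature vectors; the shape is illustrative only.
    reference = rng.random((5, 2048))
    noisy = reference + rng.normal(0.0, 0.01, size=reference.shape)
    print(assert_features_close(noisy, reference))
    print(np.round(pairwise_cosine_distances(reference), 2))
```

Cosine distance ignores vector magnitude, which matches the rationale in the commit: a positive rescaling of the features leaves it unchanged, while a change in direction (for example, from wrong preprocessing or swapped color channels) increases it.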

Commits on Jan 19, 2018

  1. Addressed reviewer's comments. Fixed indentation and added missing spaces around operators and arguments.
    tomasatdatabricks committed Jan 19, 2018
    64a1d5c

Commits on Jan 23, 2018

  1. daffce8