
Annotate photos using the Google Cloud Vision API #14

Open
simonw opened this issue Apr 28, 2020 · 5 comments
Labels: enhancement (New feature or request)

Comments


simonw commented Apr 28, 2020

It can detect faces, run OCR, do image labeling (it knows what a lemur is!) and do object localization, where it identifies objects and returns bounding polygons for them.

simonw added the enhancement label on Apr 28, 2020

simonw commented Apr 28, 2020

Pricing is pretty good: the first 1,000 calls per month are free, then it's $1.50 per thousand after that.
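As a back-of-envelope check on what that means in practice, here's a tiny cost function using just those two numbers (anything beyond this first pricing tier isn't modeled):

def monthly_cost(calls, free_calls=1_000, dollars_per_thousand=1.50):
    # First 1,000 calls each month are free, then $1.50 per thousand
    billable = max(0, calls - free_calls)
    return billable / 1_000 * dollars_per_thousand

print(monthly_cost(500))     # 0.0 - entirely within the free tier
print(monthly_cost(10_000))  # 13.5 - 9,000 billable calls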


simonw commented Apr 28, 2020

Python library docs: https://googleapis.dev/python/vision/latest/index.html

I'm creating a new project for this called simonwillison-photos: https://console.cloud.google.com/projectcreate

https://console.cloud.google.com/home/dashboard?project=simonwillison-photos

Then I enabled the Vision API. The direct link https://console.cloud.google.com/flows/enableapi?apiid=vision-json.googleapis.com provided in the docs didn't work - it gave me a "You don't have sufficient permissions to use the requested API" error - but starting from the "Enable APIs" page and searching for it worked fine.

I created a new service account as an "owner" of that project: https://console.cloud.google.com/apis/credentials/serviceaccountkey (and complained about it on Twitter and through their feedback form)

pip install google-cloud-vision

from google.cloud import vision

# Authenticate with the service account JSON key created above
client = vision.ImageAnnotatorClient.from_service_account_file("simonwillison-photos-18c570b301fe.json")
# Photo of a lemur - request three feature types in a single API call
response = client.annotate_image(
    {
        "image": {
            "source": {
                "image_uri": "https://photos.simonwillison.net/i/1b3414ee9ade67ce04ade9042e6d4b433d1e523c9a16af17f490e2c0a619755b.jpeg"
            }
        },
        "features": [
            {"type": vision.enums.Feature.Type.IMAGE_PROPERTIES},
            {"type": vision.enums.Feature.Type.OBJECT_LOCALIZATION},
            {"type": vision.enums.Feature.Type.LABEL_DETECTION},
        ],
    }
)
response

Output is:

label_annotations {
  mid: "/m/09686"
  description: "Vertebrate"
  score: 0.9851104021072388
  topicality: 0.9851104021072388
}
label_annotations {
  mid: "/m/04rky"
  description: "Mammal"
  score: 0.975814163684845
  topicality: 0.975814163684845
}
label_annotations {
  mid: "/m/01280g"
  description: "Wildlife"
  score: 0.8973650336265564
  topicality: 0.8973650336265564
}
label_annotations {
  mid: "/m/02f9pk"
  description: "Lemur"
  score: 0.8270352482795715
  topicality: 0.8270352482795715
}
label_annotations {
  mid: "/m/0fbf1m"
  description: "Terrestrial animal"
  score: 0.7443860769271851
  topicality: 0.7443860769271851
}
label_annotations {
  mid: "/m/06z_nw"
  description: "Tail"
  score: 0.6934166550636292
  topicality: 0.6934166550636292
}
label_annotations {
  mid: "/m/0b5gs"
  description: "Branch"
  score: 0.6203985214233398
  topicality: 0.6203985214233398
}
label_annotations {
  mid: "/m/05s2s"
  description: "Plant"
  score: 0.585474967956543
  topicality: 0.585474967956543
}
label_annotations {
  mid: "/m/089v3"
  description: "Zoo"
  score: 0.5488107800483704
  topicality: 0.5488107800483704
}
label_annotations {
  mid: "/m/02tcwp"
  description: "Trunk"
  score: 0.5200017690658569
  topicality: 0.5200017690658569
}
image_properties_annotation {
  dominant_colors {
    colors {
      color {
        red: 172.0
        green: 146.0
        blue: 116.0
      }
      score: 0.24523821473121643
      pixel_fraction: 0.027533333748579025
    }
    colors {
      color {
        red: 54.0
        green: 50.0
        blue: 42.0
      }
      score: 0.10449723154306412
      pixel_fraction: 0.12893334031105042
    }
    colors {
      color {
        red: 141.0
        green: 121.0
        blue: 97.0
      }
      score: 0.1391485631465912
      pixel_fraction: 0.039133332669734955
    }
    colors {
      color {
        red: 28.0
        green: 25.0
        blue: 20.0
      }
      score: 0.08589499443769455
      pixel_fraction: 0.11506666988134384
    }
    colors {
      color {
        red: 87.0
        green: 82.0
        blue: 74.0
      }
      score: 0.0845794677734375
      pixel_fraction: 0.16113333404064178
    }
    colors {
      color {
        red: 121.0
        green: 117.0
        blue: 108.0
      }
      score: 0.05901569500565529
      pixel_fraction: 0.13379999995231628
    }
    colors {
      color {
        red: 94.0
        green: 83.0
        blue: 66.0
      }
      score: 0.049011144787073135
      pixel_fraction: 0.03946666792035103
    }
    colors {
      color {
        red: 155.0
        green: 117.0
        blue: 90.0
      }
      score: 0.04164913296699524
      pixel_fraction: 0.0023333332501351833
    }
    colors {
      color {
        red: 178.0
        green: 143.0
        blue: 102.0
      }
      score: 0.02993861958384514
      pixel_fraction: 0.0012666666880249977
    }
    colors {
      color {
        red: 61.0
        green: 51.0
        blue: 35.0
      }
      score: 0.027391711249947548
      pixel_fraction: 0.01953333243727684
    }
  }
}
crop_hints_annotation {
  crop_hints {
    bounding_poly {
      vertices {
        x: 2073
      }
      vertices {
        x: 4008
      }
      vertices {
        x: 4008
        y: 3455
      }
      vertices {
        x: 2073
        y: 3455
      }
    }
    confidence: 0.65625
    importance_fraction: 0.746666669845581
  }
}
localized_object_annotations {
  mid: "/m/0jbk"
  name: "Animal"
  score: 0.7008256912231445
  bounding_poly {
    normalized_vertices {
      x: 0.0390297956764698
      y: 0.26235100626945496
    }
    normalized_vertices {
      x: 0.8466796875
      y: 0.26235100626945496
    }
    normalized_vertices {
      x: 0.8466796875
      y: 0.9386426210403442
    }
    normalized_vertices {
      x: 0.0390297956764698
      y: 0.9386426210403442
    }
  }
}
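Rather than eyeballing the whole dumped protobuf, the individual annotation lists can be pulled off the response object directly - a minimal sketch, assuming the same response as above:

# Labels: description plus confidence score
for label in response.label_annotations:
    print(label.description, round(label.score, 3))

# Localized objects: name plus normalized bounding polygon vertices
for obj in response.localized_object_annotations:
    print(obj.name, [(v.x, v.y) for v in obj.bounding_poly.normalized_vertices])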


simonw commented Apr 28, 2020

For face detection:

    {"type": vision.enums.Feature.Type.Type.FACE_DETECTION}

For OCR:

    {"type": vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION}


simonw commented Apr 28, 2020

Database schema for this will require some thought. Just dumping the output into a JSON column isn't going to be flexible enough - I want to be able to run FTS against labels and OCR text, and potentially query against other characteristics too.
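One possible shape, sketched with sqlite-utils - the table and column names here are hypothetical, not a settled design: a row per (photo, label) pair plus a row of OCR text per photo, with FTS enabled on both.

import sqlite_utils

db = sqlite_utils.Database("photos.db")

# photo_sha256 is a stand-in for however photos end up being keyed
db["labels"].insert_all(
    {"sha256": photo_sha256, "mid": label.mid, "description": label.description, "score": label.score}
    for label in response.label_annotations
)
db["ocr"].insert({"sha256": photo_sha256, "text": response.full_text_annotation.text})

# Full-text search over label descriptions and OCR text
db["labels"].enable_fts(["description"])
db["ocr"].enable_fts(["text"])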


simonw commented Apr 28, 2020

The default timeout is a bit aggressive, and calls sometimes failed for me when my resizing proxy took too long to fetch and resize the image.

Passing an explicit timeout - client.annotate_image(..., timeout=3.0) - may be worth trying.
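A sketch of what that could look like with basic error handling - request stands in for the annotate_image dict from the earlier example, and DeadlineExceeded is the google.api_core exception raised on timeout:

from google.api_core.exceptions import DeadlineExceeded

try:
    response = client.annotate_image(request, timeout=3.0)
except DeadlineExceeded:
    pass  # fetching/annotating took too long - skip this photo or retry later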
