# Google Cloud Vision API

> ... Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR)

Before starting, complete the steps in the [Before you begin section](https://cloud.google.com/vision/docs/detect-labels-image-client-libraries#before-you-begin) of [Detect labels in an image by using client libraries](https://cloud.google.com/vision/docs/detect-labels-image-client-libraries) in the Google Cloud guides.

#### Enable the Google Cloud Vision API

1. In the Google Cloud console, navigate to APIs & services.
2. In Enabled APIs & services, search for the Cloud Vision API service.
3. Enable it.

#### Grant roles to your Google Cloud Platform user account

1. Open a Cloud Shell terminal for your GCP user account.
2. Authenticate in the Cloud Shell (even though you will see a warning that this is not necessary):<br/>`gcloud auth login`
3. You will be given a sign-in URL; open it in a Chrome browser signed in as your GCP user, then follow the instructions to authenticate.
4. [Confirm the IAM policy currently in effect for `[PROJECT_ID]`](https://cloud.google.com/sdk/gcloud/reference/projects/get-iam-policy):<br/>`gcloud projects get-iam-policy [PROJECT_ID]`
5. Grant your user account a role that, at a minimum, lets our program view the storage contents:<br/>`gcloud projects add-iam-policy-binding [PROJECT_ID] --member="user:[USER_IDENTIFIER]" --role=roles/storage.folderAdmin`


#### Install the client libraries


    pip install --upgrade google-cloud-vision

    pip install --upgrade google-cloud-storage

In [1]:
import json
import re

from google.cloud import vision
from google.cloud import storage

In [2]:
def async_detect_document(gcs_source_uri, gcs_destination_uri, page=1):
    """OCR with PDF/TIFF as source files on GCS"""

    # Supported mime_types are: 'application/pdf' and 'image/tiff'
    mime_type = "application/pdf"

    # How many pages should be grouped into each json output file.
    batch_size = 1

    client = vision.ImageAnnotatorClient()

    feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

    gcs_source = vision.GcsSource(uri=gcs_source_uri)
    input_config = vision.InputConfig(gcs_source=gcs_source, mime_type=mime_type)

    gcs_destination = vision.GcsDestination(uri=gcs_destination_uri)
    output_config = vision.OutputConfig(
        gcs_destination=gcs_destination, batch_size=batch_size
    )

    async_request = vision.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config, output_config=output_config
    )

    operation = client.async_batch_annotate_files(requests=[async_request])

    print("Waiting for the operation to finish.")
    operation.result(timeout=420)

    # Once the request has completed and the output has been
    # written to GCS, we can list all the output files.
    storage_client = storage.Client()

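    match = re.match(r"gs://([^/]+)/(.+)", gcs_destination_uri)
    if match is None:
        # Fail early with a clear message instead of an AttributeError below.
        raise ValueError(f"Not a gs://bucket/prefix URI: {gcs_destination_uri}")
    bucket_name = match.group(1)
    prefix = match.group(2)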

    bucket = storage_client.get_bucket(bucket_name)

    # List objects with the given prefix, filtering out folders.
    blob_list = [
        blob
        for blob in list(bucket.list_blobs(prefix=prefix))
        if not blob.name.endswith("/")
    ]
    print("Output files:")
    for blob in blob_list:
        print(blob.name)

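    # Process the output file for the requested page.
    # Since we specified batch_size=1, each output file holds one page,
    # so the file at index page-1 is taken to be that page. The listing is
    # ordered by name, so this only holds for inputs of up to 9 pages.
    output = blob_list[page-1]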

    json_string = output.download_as_bytes().decode("utf-8")
    response = json.loads(json_string)

    # With batch_size=1 each output file holds a single response:
    # the one for the requested page of the input file.
    page_response = response["responses"][0]
    annotation = page_response["fullTextAnnotation"]

    # Here we print the full text from the requested page.
    # The response contains more information:
    # annotation/pages/blocks/paragraphs/words/symbols
    # including confidence scores and bounding boxes
    print(f"Full text, (page {page}):\n")
    print(annotation["text"])
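
The comment at the end of `async_detect_document` notes that the response holds more than the flat text. Below is a minimal sketch of walking that hierarchy in the parsed JSON, assuming the `pages`/`blocks`/`paragraphs`/`words`/`symbols` keys named in that comment and a per-paragraph `confidence` field. `iter_paragraphs` and `sample_annotation` are hypothetical names, and the sample dict is made up for illustration rather than copied from real API output.

```python
def iter_paragraphs(annotation):
    """Yield (text, confidence) for each paragraph of a fullTextAnnotation dict."""
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                words = []
                for word in paragraph.get("words", []):
                    # Each word is a list of symbols; each symbol carries a piece of text.
                    symbols = word.get("symbols", [])
                    words.append("".join(s.get("text", "") for s in symbols))
                # Simplification: words are joined with a single space.
                yield " ".join(words), paragraph.get("confidence")


# Hypothetical, hand-written annotation with one paragraph of two words.
sample_annotation = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "confidence": 0.98,
                "words": [
                    {"symbols": [{"text": c} for c in "2024"]},
                    {"symbols": [{"text": c} for c in "102.6"]},
                ],
            }],
        }],
    }],
    "text": "2024 102.6\n",
}

for text, confidence in iter_paragraphs(sample_annotation):
    print(confidence, text)
```

Joining words with a space is a simplification that suits space-delimited text; for Japanese output a different joining rule would likely be needed.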


----

### Saint-marc HD PDF for 2025-Jan 月次売上情報 (Monthly Sales Information)

![Saint-marc HD PDF for 2025-Jan 月次売上情報](samples/saintmarc-hd_20250213.pdf.png "Saint-marc HD PDF for 2025-Jan 月次売上情報")

In [3]:
%%time

gcs_source_uri = 'gs://so_olliphant/samples/saintmarc-hd_20250213.pdf'
gcs_destination_uri = 'gs://so_olliphant/out/saintmarc-hd_20250213.pdf.'

async_detect_document(gcs_source_uri, gcs_destination_uri)

Waiting for the operation to finish.
Output files:
out/saintmarc-hd_20250213.pdf.output-1-to-1.json
Full text, (page 1):

月次売上情報
年度
4月 | 5月 | 6月
7月 3月 9月
上半期 || 10月 | 11月
12月
1月 | 2月 | 3月 通期
| 昨年对比
2022
118.9 144.0 126.3 110.7 124.0 127.2
全店売上(%)
2023
116.0 110.7 109.5 117.6 119.1 114.3
124.5 115.7 107.4 106.1 122.5 140.1 120.5 120.7
114.6 106.5 108.6 108.8 108.1 107.9 110.3 111.3
2024
102.6 102.4 109.9 100.7 106.6 105.6
104.6 98.8 104.5 101.8 101.2
| 昨年对比
2022
115.1
既存店売上(%)
2023
126.0 122.8 111.5 124.8 127.4
119.7 114.6 113.8 120.7 122.9 117.1
120.9 115.8 107.0 106.3 123.6 143.7 123.8
119.6
2024
107.1 106.3 113.9 105.2 110.5 110.3
118.2 110.3 113.1 113.3 113.2 112.8 115.1 115.5
108.8 103.8 109.2 104.9 104.0
(注)既存店は、開店月を含め20ヶ月を経過した店舗を対象としております。
CPU times: user 67.9 ms, sys: 18.8 ms, total: 86.6 ms
Wall time: 13.8 s


----

### Saint-marc HD PDF for 2025-Feb 月次売上情報 (Monthly Sales Information)

![Saint-marc HD PDF for 2025-Feb 月次売上情報](samples/saintmarc-hd_20250313.pdf.png "Saint-marc HD PDF for 2025-Feb 月次売上情報")

In [4]:
%%time

gcs_source_uri = 'gs://so_olliphant/samples/saintmarc-hd_20250313.pdf'
gcs_destination_uri = 'gs://so_olliphant/out/saintmarc-hd_20250313.pdf.'

async_detect_document(gcs_source_uri, gcs_destination_uri)

Waiting for the operation to finish.
Output files:
out/saintmarc-hd_20250313.pdf.output-1-to-1.json
Full text, (page 1):

月次売上情報
年度
4月
5月 6月
7月 8月 9月
上半期 || 10月
11月 12月
1月
2月
3月
通期
| 昨年对比
2022
118.9 144.0 126.3 110.7 124.0 127.2
124.5 115.7 107.4 106.1 122.5
140.1
120.5
120.7
全店売上(%)
2023
2024
| 昨年对比
2022
既存店売上 (%)
2023
116.0 110.7 109.5 117.6 119.1 114.3
102.6 102.4 109.9 100.7 106.6 105.6
115.1 126.0 122.8 111.5 124.8 127.4
119.7 114.6 113.8 120.7 122.9 117.1
114.6 106.5 108.6 108.8 108.1 107.9 110.3
104.6 98.8 104.5 101.8 101.2 102.5
111.3
2024
107.1 106.3 113.9 105.2 110.5 110.3
120.9|| 115.8 107.0 106.3 123.6 143.7 123.8
118.2 110.3 113.1 113.3 113.2 112.8 115.1
108.8 103.8 109.2 104.9 104.0 104.6
119.6
115.5
(注)既存店は、開店月を含め20ヶ月を経過した店舗を対象としております。
CPU times: user 55.8 ms, sys: 19.9 ms, total: 75.8 ms
Wall time: 18.1 s


----

### ACEA Press Release, 2025-Feb

![ACEA Press Release, 2025-Feb](samples/Press_release_car_registrations_February_2025.pdf.png "ACEA Press Release, 2025-Feb")

In [5]:
%%time

gcs_source_uri = 'gs://so_olliphant/samples/Press_release_car_registrations_February_2025.pdf'
gcs_destination_uri = 'gs://so_olliphant/out/Press_release_car_registrations_February_2025.pdf.'

async_detect_document(gcs_source_uri, gcs_destination_uri, page=3)

Waiting for the operation to finish.
Output files:
out/Press_release_car_registrations_February_2025.pdf.output-1-to-1.json
out/Press_release_car_registrations_February_2025.pdf.output-2-to-2.json
out/Press_release_car_registrations_February_2025.pdf.output-3-to-3.json
out/Press_release_car_registrations_February_2025.pdf.output-4-to-4.json
out/Press_release_car_registrations_February_2025.pdf.output-5-to-5.json
out/Press_release_car_registrations_February_2025.pdf.output-6-to-6.json
Full text, (page 3):

acea
NEW CAR REGISTRATIONS BY MARKET AND POWER SOURCE
MONTHLY
BATTERY ELECTRIC
PLUG-IN HYBRID
February
February % change
February February % change
HYBRID ELECTRIC
February
OTHERS²
PETROL
DIESEL
TOTAL
February % change
2025
2024
25/24
2025
2024
25/24
2025
2024
25/24
February
2025
February % change
2024
February February % change
February
February % change
February
February % change
25/24
2025
2024
25/24
2025
2024
25/24
2025
2024
25/24
Austria
4,233
3,322
+27.4
1,613
1,335
+20.8
5,549
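
The call above selects the page-3 file by its position in the listing. Names sort lexicographically, so for a document of ten or more pages `output-10-to-10.json` would sort ahead of `output-2-to-2.json` and positional indexing would pick the wrong file. A minimal sketch of selecting by the page range in the file name instead, assuming the `output-X-to-Y.json` naming seen in the listing above; `find_output_for_page` and the example names are hypothetical:

```python
import re

# Matches the "output-<first>-to-<last>.json" suffix seen in the listing above.
_OUTPUT_NAME = re.compile(r"output-(\d+)-to-(\d+)\.json$")


def find_output_for_page(blob_names, page):
    """Return the output file name whose page range contains `page`."""
    for name in blob_names:
        m = _OUTPUT_NAME.search(name)
        if m and int(m.group(1)) <= page <= int(m.group(2)):
            return name
    raise ValueError(f"no output file covers page {page}")


# Hypothetical names for a 12-page document, in lexicographic order.
names = sorted(f"out/doc.pdf.output-{n}-to-{n}.json" for n in range(1, 13))

print(names[1])                        # positional pick for page 2
print(find_output_for_page(names, 2))  # pick by page range
```

Inside `async_detect_document` this could be applied to `[blob.name for blob in blob_list]`; it also keeps working if `batch_size` is raised, since it checks the whole range rather than a single page number.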


----

## Conclusion

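* Google's Cloud Vision API is somewhat useful for extracting text, but it does not appear to recognize document formatting: in the samples above, table headers and cell values come back as a flat sequence of lines rather than as rows and columns.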
* Hence, it is not very useful for document understanding.