# TIF to JPEG Image Conversion 

## Disclaimer

This tool is not supported by the Google engineering team or product team. It is provided and supported on a best-effort basis by the DocAI Incubator Team. No guarantees of performance are implied.	

## Objective

This tool converts TIF (TIFF) image files stored in Google Cloud Storage to JPEG files.

## Prerequisites
* Vertex AI Notebook or Google Colab Notebook
* Cloud Storage bucket details for the dataset (bucket name and folder paths)

## Step-by-Step Procedure

### 1. Input Details

In [2]:
PROJECT_ID = "xxxx-xxxx-xx"
BUCKET_NAME = "xxxxxxxx"
INPUT_FOLDER_PATH = "xxxxxxxxx/xxxxxxxx/xxxx"  # without bucket name
OUTPUT_FOLDER_PATH = "xxxxxx/xxxxxxxx/xxxxx"  # without bucket name

**PROJECT_ID**: *Provide the Google Cloud project ID.*

**BUCKET_NAME**: *Provide the name of the Google Cloud Storage bucket.*

**INPUT_FOLDER_PATH**: *Provide the folder in the bucket that contains the TIF files, without the bucket name.*

**OUTPUT_FOLDER_PATH**: *Provide the folder in the bucket where the JPEG files will be stored, without the bucket name.*
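
Because both folder paths must be given without the bucket name, a quick sanity check on the values can help before running the conversion. The following is a minimal sketch, not part of the tool: `normalize_prefix` and `gs_uri` are hypothetical helper names, and `my-bucket` is a placeholder.

```python
def normalize_prefix(path: str, bucket_name: str) -> str:
    """Return a folder path relative to the bucket, without surrounding slashes."""
    cleaned = path.strip()
    # Drop an accidental gs://<bucket>/ prefix; the tool expects bucket-relative paths.
    scheme_prefix = f"gs://{bucket_name}/"
    if cleaned.startswith(scheme_prefix):
        cleaned = cleaned[len(scheme_prefix):]
    return cleaned.strip("/")


def gs_uri(bucket_name: str, path: str) -> str:
    """Build the full gs:// URI that a bucket-relative path refers to."""
    return f"gs://{bucket_name}/{normalize_prefix(path, bucket_name)}"


# Placeholder values for illustration only.
print(gs_uri("my-bucket", "/scans/tif/"))  # gs://my-bucket/scans/tif
```

Printing the resulting URI for `INPUT_FOLDER_PATH` and `OUTPUT_FOLDER_PATH` makes it easy to confirm in the Cloud console that both locations are the intended ones.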


### 2. Run the Code
Run the sample code below. It converts each TIF file found in the input folder to JPEG and stores the result in the output folder on Google Cloud Storage.

#### Install the required libraries

In [None]:
!pip install pillow
!pip install tqdm
!pip install google-cloud-storage

In [None]:
from io import BytesIO

from google.cloud import storage
from PIL import Image
from tqdm.notebook import tqdm

storage_client = storage.Client(project=PROJECT_ID)
bucket = storage_client.get_bucket(BUCKET_NAME)


def get_file(file_path: str) -> BytesIO:
    """
    Read a file from Google Cloud Storage.

    Args:
        file_path (str): The path of the file in Google Cloud Storage.

    Returns:
        BytesIO: An in-memory stream of the file content, positioned at the start.
    """
    blob = bucket.blob(file_path)
    byte_stream = BytesIO()
    blob.download_to_file(byte_stream)
    byte_stream.seek(0)
    return byte_stream


def store_blob(document: bytes, file_path: str) -> None:
    """
    Store a file in Google Cloud Storage.

    Args:
        document (bytes): The content of the file to be stored.
        file_path (str): The path where the file will be stored in Google Cloud Storage.

    Returns:
        None
    """
    blob = bucket.blob(file_path)
    blob.upload_from_string(document, content_type="image/jpeg")


blobs = bucket.list_blobs(prefix=INPUT_FOLDER_PATH)
files = [
    blob.name
    for blob in blobs
    if blob.name.lower().endswith((".tif", ".tiff"))  # case-insensitive match
]

for file_path in tqdm(files):
    print(file_path)
    image_data = get_file(file_path)
    # Replace the .tif/.tiff extension with .jpeg, whatever its length.
    base_name = file_path.split("/")[-1].rsplit(".", 1)[0]
    outfile_name = OUTPUT_FOLDER_PATH.rstrip("/") + "/" + base_name + ".jpeg"
    im = Image.open(image_data)
    out = im.convert("RGB")  # JPEG cannot store alpha or palette data
    image_str = BytesIO()
    out.save(image_str, "JPEG")
    contents = image_str.getvalue()
    store_blob(contents, outfile_name)
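
Note that `Image.open` followed by `convert` operates on the first frame only, so the loop above writes one JPEG per TIF even when the TIF has several pages. The sketch below shows one possible way to convert every frame in memory with Pillow's `ImageSequence`. It is an illustration under stated assumptions, not the tool's method: `tif_bytes_to_jpegs` and its `quality` parameter are hypothetical, and the two-page TIF is generated on the fly so the example runs without Cloud Storage.

```python
from io import BytesIO

from PIL import Image, ImageSequence


def tif_bytes_to_jpegs(tif_bytes: bytes, quality: int = 90) -> list[bytes]:
    """Convert every frame of a TIF (given as bytes) into JPEG bytes."""
    jpegs = []
    with Image.open(BytesIO(tif_bytes)) as im:
        for frame in ImageSequence.Iterator(im):
            buffer = BytesIO()
            # JPEG cannot store alpha or palette data, so normalize to RGB first.
            frame.convert("RGB").save(buffer, format="JPEG", quality=quality)
            jpegs.append(buffer.getvalue())
    return jpegs


# Build a tiny two-page TIF in memory (assumption: stands in for a real scan).
pages = [
    Image.new("RGBA", (8, 8), (255, 0, 0, 128)),
    Image.new("RGBA", (8, 8), (0, 0, 255, 255)),
]
source = BytesIO()
pages[0].save(source, format="TIFF", save_all=True, append_images=pages[1:])

results = tif_bytes_to_jpegs(source.getvalue())
print(len(results))  # one JPEG per page
```

Each element of `results` could then be uploaded with `store_blob`, using a per-page output name of your choosing.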

## Output

The converted JPEG files are saved in the output folder.
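
When checking the results, it helps to know which object name each input maps to. The following is a minimal sketch, assuming each output keeps the input file name with its extension replaced by `.jpeg` and is written directly under the output folder; `jpeg_object_name` is a hypothetical helper, and the paths are placeholders.

```python
import posixpath


def jpeg_object_name(tif_object_name: str, output_folder: str) -> str:
    """Map a TIF object name to the expected JPEG object name under output_folder."""
    file_name = posixpath.basename(tif_object_name)
    stem, _ext = posixpath.splitext(file_name)
    return posixpath.join(output_folder.strip("/"), stem + ".jpeg")


# Placeholder paths for illustration only.
print(jpeg_object_name("input/scans/invoice_01.tif", "output/jpeg"))
# output/jpeg/invoice_01.jpeg
```

Because only the file name is kept, input files with the same name in different subfolders map to the same output object, and the later one overwrites the earlier one.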