# GPT VISION API to extract data from Health App screenshots
Some health apps that pair with a smartwatch to track activities such as running and cycling offer no way to export the exercise history as tabular data. As a workaround, we take a screenshot of each activity's detail page in the app and use the GPT Vision API to turn the screenshots into a dataframe.

Most of the code below has been generated by GPT-4  

### Installing dependencies
Pillow for image handling and openai for the API calls.

In [None]:
!pip install pillow
!pip install openai

In [None]:
import os
from PIL import Image
import pandas as pd
import re
from openai import OpenAI
import base64

If you run this in Google Colab, mount Google Drive to access the screenshots.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Image processing
In this example the screenshots are stored in a Drive folder. We build a list of image paths, then encode each image in base64 so it can be sent to the GPT Vision model.

In [None]:
# Collect the paths of all image files in a folder
def process_images_in_folder(folder_path):
    data = []
    # Iterate over the folder in sorted order so the result is reproducible
    for image_name in sorted(os.listdir(folder_path)):
        if image_name.lower().endswith(('.jpeg', '.jpg', '.png')):  # Keep only image files
            image_path = os.path.join(folder_path, image_name)
            data.append(image_path)
    return data

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

In [None]:
folder_path = '/content/drive/My Drive/screenshots_folder'

imgs = process_images_in_folder(folder_path)

encoded_images = []
for img_path in imgs:
    encoded_images.append(encode_image(img_path))
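
The folder filter above accepts `.png` files, but the request later in this notebook labels every image as `image/jpeg`. A minimal sketch of one way to keep the label consistent with the file, assuming the standard-library `mimetypes` module guesses the type from the extension. The helper name `to_data_url` is hypothetical and not part of the original notebook.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Return a base64 data URL whose MIME type matches the file extension.

    Hypothetical helper: it wraps the same base64 encoding as encode_image
    and prepends a 'data:<mime>;base64,' prefix guessed from the extension.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be passed as the image URL in place of the hard-coded `data:image/jpeg;base64,...` prefix.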


### GPT Call
We iterate over the encoded images and call the "gpt-4-vision-preview" model once per image. The prompt, written in French like the app itself, asks for a single line of text in the format '[data1:value1; data2:value2;...]' containing the physical activity indicators extracted from the screenshot, with N/A for any indicator that is not available.

The API cost was around $2 for 90 screenshots.

[Link to OpenAI documentation](https://platform.openai.com/docs/guides/vision)

In [None]:
# Replace the placeholder with your own API key
client = OpenAI(api_key="API_key")
responses = []

for i in range(len(encoded_images)):
  responses.append(client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Je vais te donner des captures d'écran d'une application de santé qui donne les performances d'exercices physiques (vélo et course à pieds). Je veux que tu me donnes seulement une ligne de texte contenant les informations que tu as récupérer de l'image grace a gpt vision, voici un exemple de cette ligne de texte avec X étant la donnée à extraire de l'image si disponible sinon mettre N/A : [type d'activité: X; date :X; Distance totale : X ;Durée totale: X; Allure moyenne: X; Vitesse moyenne: X; Energie: X; Cadence moyenne (pas/min): X; Longueur moyenne de pas: X; Total Pas : X; Rythme cardiaque moyen: X; Gain d'élévation: X; Descente totale: X; Effet d'entraînement aérobie: X; VO2Max: X]"},
          {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{encoded_images[i]}"},
          },
        ],
      }
    ],
    max_tokens=300,
  ))
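
Since every call is billed, it can help to keep going when one request fails instead of losing the whole batch. The sketch below is one possible arrangement, not the notebook's original code: `build_messages` and `extract_all` are hypothetical names, `client` is assumed to be an `OpenAI` client (or anything exposing `chat.completions.create`), and the broad `except` is a deliberate simplification.

```python
def build_messages(prompt, data_url):
    """Build the chat payload for one screenshot: a text prompt plus one image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


def extract_all(client, data_urls, prompt, model="gpt-4-vision-preview", max_tokens=300):
    """Call the model once per image and return (results, failures).

    Both return values are dicts keyed by the image's position in data_urls,
    so failed screenshots can be retried later without repeating the others.
    """
    results = {}
    failures = {}
    for index, data_url in enumerate(data_urls):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=build_messages(prompt, data_url),
                max_tokens=max_tokens,
            )
            results[index] = response.choices[0].message.content
        except Exception as error:  # Broad on purpose: record the failure and continue
            failures[index] = repr(error)
    return results, failures
```

Keying the results by index keeps each answer tied to its screenshot even when some calls fail.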

The text of each response is then gathered in a list of strings.

In [None]:
# Keep only the text content of each response
results = [response.choices[0].message.content for response in responses]

### Extracting Dataframe
Because the prompt fixes the output format, we can split each response into key-value pairs and build a dataframe that summarizes all the activities.


In [None]:
# Parse a '[key: value; key: value; ...]' string into a dict
def parse_activity_string(activity_str):
    key_value_pairs = activity_str.strip().strip('[]').split(';')
    activity_data = {}
    for pair in key_value_pairs:
        if ':' not in pair:  # Skip empty or malformed pairs instead of raising
            continue
        key, value = pair.split(':', 1)  # Split on the first colon only (values such as 00:38:20 contain colons)
        activity_data[key.strip()] = value.strip()
    return activity_data

parsed_responses = [parse_activity_string(response) for response in results]

df = pd.DataFrame(parsed_responses)

df.head()

| | type d'activité | date | Distance totale | Durée totale | Allure moyenne | Vitesse moyenne | Energie | Cadence moyenne (pas/min) | Longueur moyenne de pas | Total Pas | Rythme cardiaque moyen | Gain d'élévation | Descente totale | Effet d'entraînement aérobie | VO2Max | Énergie | Effet d'entraînement anaérobie | Durée de récupération |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Course en extérieur | 20 novembre 2021, 15:48 | 6,60 km | 00:38:20 | 5'48"/km | 10,33 km/h | 328 kcal | 162 | 106 cm | 6 210 | 140 bpm | 33,5 m | 35,8 m | 21 | 54 ml/kg/min | | | |
| 1 | Course en extérieur | 25 octobre 2021, 19:36 | 6,02 km | 00:34:25 | 5'43"/km | 10,49 km/h | 287 kcal | 160 pas/min | 109 cm | 5 541 pas | 137 bpm | 14,4 m | 19,4 m | 20 | 54 ml/kg/min | | | |
| 2 | Course en extérieur | 20 octobre 2021, 16:53 | 6,66 km | 00:37:21 | 5'36" /km | 10,70 km/h | 333 kcal | 161 pas/min | 110 cm | 6 032 pas | 144 bpm | 61,5 m | 60,4 m | 23 | 54 ml/kg/min | | | |
| 3 | Course en extérieur | 20 août 2021 | 3,95 km | 00:20:44 | 5'15" /km | 11,43 km/h | 178 kcal | 163 | 117 cm | 3 387 | 135 bpm | 4,4 m | 3,5 m | 21 | 55 ml/kg/min | | | |
| 4 | Course en extérieur | 11 août 2021, 19:35 | 6,45 km | 00:34:33 | 5'21" /km | 11,20 km/h | 288 kcal | 164 pas/min | 114 cm | 5 674 pas | 136 bpm | 77,2 m | 86,5 m | 22 | 55 ml/kg/min | | | |
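
The output contains both an `Energie` and an `Énergie` column, presumably because the model did not always spell the key the same way. A minimal sketch of one way to merge such variants before building the dataframe, assuming that keys differing only by accents, case, or spacing mean the same thing. `normalize_key` and `merge_key_variants` are hypothetical helpers, not part of the original notebook.

```python
import unicodedata


def normalize_key(key):
    """Strip accents, collapse whitespace and lowercase so key variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", key)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.split()).casefold()


def merge_key_variants(record):
    """Merge keys that differ only by accents, case, or spacing.

    When two variants collide, the first value that is neither empty nor
    'N/A' is kept. Note that the resulting keys are the normalized ones.
    """
    merged = {}
    for key, value in record.items():
        normalized = normalize_key(key)
        if normalized not in merged or merged[normalized] in ("", "N/A"):
            merged[normalized] = value
    return merged
```

Applied to each parsed response, for example `[merge_key_variants(r) for r in parsed_responses]`, this would yield a single `energie` column, at the cost of losing the original accented column names.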


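Finally, we export the dataframe to Excel for some manual cleaning 😊. The output above shows why it is needed: units are not written consistently (for example "162" versus "160 pas/min") and some columns are duplicated.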

In [None]:
df.to_excel('/content/drive/My Drive/data_output.xlsx', index=False)
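
Part of the manual cleaning could be scripted. The sketch below assumes values formatted like those in the output above (French decimal commas, spaces as thousands separators, a trailing unit). `parse_french_number` and `duration_to_seconds` are hypothetical helpers, and paces such as `5'48"/km` are not handled.

```python
import re


def parse_french_number(text):
    """Return the first number in a string like '6,60 km' or '6 210 pas' as a float.

    Returns None when the text contains no digits (for example 'N/A').
    """
    if text is None:
        return None
    match = re.search(r"\d[\d\s\u00a0\u202f]*(?:[.,]\d+)?", str(text))
    if match is None:
        return None
    # Drop thousands separators (regular or non-breaking spaces), then use a decimal point
    digits = re.sub(r"[\s\u00a0\u202f]", "", match.group(0)).replace(",", ".")
    return float(digits)


def duration_to_seconds(text):
    """Convert 'HH:MM:SS' to a number of seconds, or None if the format differs."""
    match = re.fullmatch(r"(\d+):(\d{2}):(\d{2})", str(text).strip())
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

These could be applied column by column, for example `df["Distance totale"].map(parse_french_number)` or `df["Durée totale"].map(duration_to_seconds)`.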