## Upwork project summary from client
My daughter day care has an app where they post photos everyday but they need to be downloaded one at a time. Seeing if there is a way to write a script to download all media from the app. Would be great to capture these memories

App is called “my bright day” likely 1000 plus photos and video

- download both video and pictures (high resolution version not thumbnail)
- Ideally have photo have dates in file_name so can be sorted by time
- Upload to a Google Drive folder and share
- ideally provide instructions so i can repeat the process later for future photos.


## How to set up
This notebook provides the solution for the project. To run it, you need Python **3.10** installed on your local machine, along with the third-party packages the code imports: `requests`, `Pillow`, `numpy`, and `python-dateutil`. I recommend running the notebook in VS Code (download it [here](https://code.visualstudio.com/download)). If you need help or want to run it another way, message me and we can set up a consultation session.

## How this notebook is organized
This notebook groups each functional block into a documented step. Follow the instructions in each step and run the code.

## Technical details
Photos and videos are stored in Google Cloud Storage (GCS) and can be retrieved by sending a request to a **signed URL** (a GCS URL with a time-limited token). To get the signed URL, the script calls **My Bright Day's media API**, which takes the media ID and the My Bright Day authentication key. The media IDs come from **My Bright Day's snapshot API**, which returns the media information for a given date range. With this knowledge, the script offers 2 use cases:
 - Download all media, going back to the earliest date (this project's ask)
 - Download media for a particular date range.
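
The lookup chain described above (date range, then media IDs, then signed URLs) can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `fetch_json` is a hypothetical callable standing in for an authenticated GET that returns parsed JSON (in the notebook this role is played by `requests.get(url, headers={"x-api-key": api_key})` followed by `.json()`), and `fake_fetch_json` returns made-up sample data so the sketch runs without network access. The URL shapes and JSON field names are the ones used in the notebook's functions.

```python
BASE_URL = "https://mybrightday.brighthorizons.com/api/v2"


def daily_reports_url(dependent_id, start, end):
    """URL of the snapshot (daily report) API for one child and date range."""
    return f"{BASE_URL}/dependent/{dependent_id}/daily_reports?start={start}&end={end}"


def media_url(media_id):
    """URL of the media API for one media ID."""
    return f"{BASE_URL}/media/{media_id}"


def resolve_signed_urls(fetch_json, dependent_id, start, end):
    """Return (capture_time, mime_type, signed_url) for every snapshot in the range."""
    resolved = []
    for day in fetch_json(daily_reports_url(dependent_id, start, end)):
        for snapshot in day["snapshot_entries"]:
            media = fetch_json(media_url(snapshot["attachment_id"]))
            resolved.append(
                (snapshot["capture_time"], media["mime_type"], media["signed_url"])
            )
    return resolved


def fake_fetch_json(url):
    """Hypothetical stand-in for the real API; all values below are made up."""
    if "daily_reports" in url:
        return [
            {
                "for_date": "2023-07-10",
                "snapshot_entries": [
                    {"attachment_id": "abc", "capture_time": "2023-07-10T09:30:00"}
                ],
            }
        ]
    return {
        "mime_type": "image/jpeg",
        "signed_url": "https://storage.example/abc?token=made-up",
    }


print(resolve_signed_urls(fake_fetch_json, "child-1", "2023-07-01", "2023-07-31"))
```

Passing the fetch function in as a parameter keeps the chain easy to test offline; the final download step is then a plain GET on each signed URL before its token expires.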

You can specify the root folder for downloads. Within this folder, the script creates one subfolder per day, named in "YYYY-mm-dd" format. Each media file is named after its capture time. Currently only the **jpeg** and **mp4** formats are supported.
The download process produces a report that contains the metadata for each download (media ID, capture time, media type, success or failure, failure reason).
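
The folder layout and file naming described above could be expressed as a small helper like the one below. This is a sketch under stated assumptions: `build_media_path` and `EXTENSIONS` are hypothetical names that do not appear in the notebook, and the sample capture-time string is made up (the real format comes from the API).

```python
import os

# MIME types the notebook supports, mapped to the file extension it uses.
EXTENSIONS = {"image/jpeg": ".jpg", "video/mp4": ".mp4"}


def build_media_path(root, for_date, capture_time, mime_type):
    """Return <root>/<YYYY-mm-dd>/<capture_time><ext>, or None if unsupported."""
    extension = EXTENSIONS.get(mime_type)
    if extension is None:
        return None  # caller can record "unsupported media type" in the report
    return os.path.join(root, for_date, f"{capture_time}{extension}")


# Made-up sample values, for illustration only.
print(build_media_path("all_media", "2023-07-10", "2023-07-10T09-30-00", "image/jpeg"))
print(build_media_path("all_media", "2023-07-10", "2023-07-10T09-31-00", "image/gif"))
```

Because each file name starts with the capture time, sorting a day's folder by name also sorts it chronologically.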

The script uses multiple threads to speed up the download. The actual number of worker threads depends on the number of CPU cores in your environment (the core count plus 4, capped at 32).
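
A minimal sketch of this threading pattern is shown below. It is not the notebook's implementation: the notebook splits the days into contiguous chunks with `np.array_split`, whereas this sketch deals items round-robin using only the standard library. The helper names (`pick_worker_count`, `split_into_chunks`, `run_in_threads`) are hypothetical, and the `or 1` guard covers the case where `os.cpu_count()` returns `None`.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def pick_worker_count():
    """Core count plus 4, capped at 32; falls back to 1 core if unknown."""
    return min(32, (os.cpu_count() or 1) + 4)


def split_into_chunks(items, chunk_count):
    """Deal items round-robin into at most chunk_count non-empty chunks."""
    chunks = [items[i::chunk_count] for i in range(chunk_count)]
    return [chunk for chunk in chunks if chunk]


def run_in_threads(worker, items, worker_count=None):
    """Run worker(chunk) on each chunk in a thread pool and merge the results."""
    worker_count = worker_count or pick_worker_count()
    results = []
    with ThreadPoolExecutor(max_workers=worker_count) as executor:
        futures = [
            executor.submit(worker, chunk)
            for chunk in split_into_chunks(items, worker_count)
        ]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker
            results.extend(future.result())
    return results


# Toy worker standing in for the per-day download function.
doubled = run_in_threads(lambda chunk: [x * 2 for x in chunk], list(range(10)), 3)
print(sorted(doubled))
```

Threads suit this workload because each download spends most of its time waiting on the network, so results arrive in completion order rather than submission order.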

For the script to authenticate, you need to log in to My Bright Day from a laptop and get the API key. Instructions are provided in the section below, close to the actual code.

If you have any questions, please let me know so we can schedule a consultation session.

## Step 1: Execute the below cell 
It'll make all the functions available for later use.

In [6]:
import os
from datetime import datetime
import requests
from io import BytesIO
from PIL import Image
import numpy as np
from concurrent.futures import ThreadPoolExecutor, as_completed


def get_dependent_from_guardian(api_key, guardian_id):
    """
    Get dependent ID from the guardian ID.
    """
    dependent_url = f"https://mybrightday.brighthorizons.com/api/v2/dependents/guardian/{guardian_id}"
    header={"x-api-key" : api_key}
    response = requests.get(dependent_url, headers=header)
    response.raise_for_status()
    
    return response.json()[0]['id']


def get_video_from_signed_url(signed_url, video_name, save_path):
    response = requests.get(signed_url)
    # Fail on HTTP errors instead of saving an error page as an .mp4 file
    response.raise_for_status()
    with open(os.path.join(save_path, video_name), "wb") as f_out:
        f_out.write(response.content)
    
def get_image_from_signed_url(signed_url, image_name, save_path, save=True):
    response = requests.get(signed_url)
    response.raise_for_status()
    im = BytesIO(response.content)
    img = Image.open(im)
    img = img.convert('RGB')
    if save:
        img.save(os.path.join(save_path, image_name))
    else:
        return img


def get_media_metadata_from_media_id(api_key, media_id):
    mbd_url = f"https://mybrightday.brighthorizons.com/api/v2/media/{media_id}"
    header={"x-api-key" : api_key}
    response = requests.get(mbd_url, headers=header)
    response.raise_for_status()
    
    main_image_url = response.json()['signed_url']
    media_type = response.json()['mime_type']
    return main_image_url, media_type

def get_daily_report(api_key, dependent_id, start_date, end_date=None):
    if not end_date:
        end_date = datetime.today().strftime('%Y-%m-%d')
    daily_report_url = f"https://mybrightday.brighthorizons.com/api/v2/dependent/{dependent_id}/daily_reports?start={start_date}&end={end_date}"
    header={"x-api-key" : api_key}
    response = requests.get(daily_report_url, headers=header)
    response.raise_for_status()
    
    return response.json()

def get_media_for_days(api_key, days, base_path):
    """
    days: a list of metadata from get_daily_report
    """
    all_report = []
    for day in days:
        date = day['for_date']
        #os.makedirs(base_path, exist_ok=True)
        os.makedirs(os.path.join(base_path, date), exist_ok=True)

        snapshots = day['snapshot_entries']
        report = []
        for snapshot in snapshots:
            media_id = snapshot['attachment_id']
            capture_time = snapshot['capture_time']
            report_snapshot = {"media_id":media_id, "capture_time": capture_time}
            try:
                media_url, media_type = get_media_metadata_from_media_id(api_key, media_id)
                report_snapshot["media_type"] = media_type
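            except Exception as exc:
                report_snapshot['state']='failed'
                report_snapshot['reason']=f'cannot get media metadata: {exc}'
                # Record the failure before skipping to the next snapshot
                report.append(report_snapshot)
                continue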

            if media_type == "image/jpeg":
                try:
                    get_image_from_signed_url(media_url, f"{capture_time}.jpg", os.path.join(base_path, date), True)
                    report_snapshot['state']='success'
                except Exception as exc:
                    report_snapshot['state']='failed'
                    report_snapshot['reason']= str(exc)  # keep the report JSON-serializable
            elif media_type == "video/mp4":
                try:
                    get_video_from_signed_url(media_url, f"{capture_time}.mp4", os.path.join(base_path, date))
                    report_snapshot['state']='success'
                except Exception as exc:
                    report_snapshot['state']='failed'
                    report_snapshot['reason']= str(exc)  # keep the report JSON-serializable
            else:
                report_snapshot['state']='failed'
                report_snapshot['reason']='unsupported media type'
            report.append(report_snapshot)
        all_report.extend(report)
    return all_report

def get_media_for_days_m(api_key, days, base_path):
    all_report_m = []
    # os.cpu_count() can return None, so fall back to 1 core
    worker_count = min(32, (os.cpu_count() or 1) + 4)
    chunked_days = np.array_split(days, worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as executor:
        future_to_days = {executor.submit(get_media_for_days, api_key, days, base_path): days for days in chunked_days}
        for future in as_completed(future_to_days):
            try:
                report = future.result()
                all_report_m.extend(report)
            except Exception as exc:
                print(f'Encountered an exception: {exc}')
    return all_report_m

def get_stats_on_report(all_report):
    success_count, failed_count=0,0
    for report in all_report:
        if report['state'] == "success":
            success_count+=1 
        else:
            failed_count += 1
    print(f"A total of {success_count} downloads succeeded and {failed_count} downloads failed.")
    

## Step 2: Log in to the My Bright Day website from Chrome on your laptop (not cellphone)

## Step 3: Find your guardian ID from the URL
The URL should look like:

https://mybrightdayapp.brighthorizons.com/home/{guardian_id}/activity-feed

The ID between home and activity-feed is your guardian ID
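
If you prefer not to pick the ID out by eye, the same extraction can be sketched in code. `extract_guardian_id` is a hypothetical helper, not part of the notebook, it assumes the URL has exactly the shape shown above, and the ID in the example is made up.

```python
from urllib.parse import urlparse


def extract_guardian_id(url):
    """Return the path segment between 'home' and 'activity-feed'."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    # Expected shape: ["home", "<guardian_id>", "activity-feed"]
    if len(parts) >= 3 and parts[0] == "home" and parts[2] == "activity-feed":
        return parts[1]
    raise ValueError(f"URL does not look like a My Bright Day home URL: {url}")


# Made-up ID, for illustration only.
print(extract_guardian_id(
    "https://mybrightdayapp.brighthorizons.com/home/abc123def456/activity-feed"
))
```

Paste the URL from your browser's address bar as the argument and copy the printed value into `guardian_id`.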

## Step 4: Get the api key
- While you are on the My Bright Day home page, right-click and choose Inspect to bring up the developer tools.
- Navigate to the Network tab.
- In the Filter box, type selections.
- In the bottom-right panel, under the Headers tab, in Request Headers, there is a key called X-Api-Key.
- See this image for an example:
![image](get_api_key.png)

Set the X-Api-Key value as `api_key` and the guardian ID as `guardian_id` in the cell below. Also set the save directory (`save_directory`).

In [5]:
api_key= "15b037bc-799c-4558-ae3f-35a30972e758"
guardian_id = "6214d4027193ddd5b1c937c2"

save_directory = "all_media_12"

## Now start to download the media!
This assumes you have one child. The script below downloads all of the child's media generated during the `month_lookback` months leading up to `last_month`, which is set to today by default. Execute the cell and wait patiently; it tends to run for a while. To give you an idea, 1000 media files take 6 minutes on an M1 MacBook Pro.
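
The month-by-month date windows that this cell walks through can be sketched with the standard library alone. The cell itself uses `dateutil.relativedelta`; the hypothetical helpers below (`minus_one_month`, `month_windows`) are assumed to mirror its behavior for a one-month step, including clamping to the last day of a shorter month.

```python
import calendar
from datetime import date


def minus_one_month(d):
    """Same day in the previous month, clamped to that month's last day."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def month_windows(end, months):
    """Return (start, end) date strings for each one-month window, newest first."""
    windows = []
    for _ in range(months):
        start = minus_one_month(end)
        windows.append((start.isoformat(), end.isoformat()))
        end = start
    return windows


for start_str, end_str in month_windows(date(2023, 7, 11), 3):
    print(f"get daily snapshot between {start_str} and {end_str}.")
```

Note that consecutive windows share a boundary date, so if the API treats both ends of the range as inclusive (an assumption, not something the notebook confirms), that day is requested twice.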

In [16]:
from dateutil.relativedelta import relativedelta
import json 

child_id = get_dependent_from_guardian(api_key, guardian_id)
# Number of months to look back from today
month_lookback = 12
last_month = datetime.today()

all_report = []

while month_lookback > 0:
    this_month = last_month
    this_month_str = this_month.strftime('%Y-%m-%d')
    last_month= this_month - relativedelta(months=1)
    last_month_str = last_month.strftime('%Y-%m-%d')
    print(f"get daily snapshot between {last_month_str} and {this_month_str}.")
    days = get_daily_report(api_key, child_id, last_month_str, this_month_str)
    report = get_media_for_days_m(api_key, days, save_directory)
    all_report.extend(report)
    month_lookback -=1

get_stats_on_report(all_report)

with open("report.json", "w") as outfile:
    json.dump(all_report, outfile)

get daily snapshot between 2023-06-11 and 2023-07-11.
daily report url:https://mybrightday.brighthorizons.com/api/v2/dependent/62154527be6f1b5ca640916b/daily_reports?start=2023-06-11&end=2023-07-11
get daily snapshot between 2023-05-11 and 2023-06-11.
daily report url:https://mybrightday.brighthorizons.com/api/v2/dependent/62154527be6f1b5ca640916b/daily_reports?start=2023-05-11&end=2023-06-11
get daily snapshot between 2023-04-11 and 2023-05-11.
daily report url:https://mybrightday.brighthorizons.com/api/v2/dependent/62154527be6f1b5ca640916b/daily_reports?start=2023-04-11&end=2023-05-11
Encountered an exception: [Errno 17] File exists: 'save_directory/2023-05-11'
get daily snapshot between 2023-03-11 and 2023-04-11.
daily report url:https://mybrightday.brighthorizons.com/api/v2/dependent/62154527be6f1b5ca640916b/daily_reports?start=2023-03-11&end=2023-04-11
Encountered an exception: [Errno 17] File exists: 'save_directory/2023-04-11'
get daily snapshot between 2023-02-11 and 2023-03-11

## Another option to download
This is the one-click approach. It downloads all the media, going all the way back to the beginning of the history. No parameters need to be set for this path. The loop stops at the first one-month window that returns no daily reports.

In [None]:
from dateutil.relativedelta import relativedelta
import json 

child_id = get_dependent_from_guardian(api_key, guardian_id)
last_month = datetime.today()

all_report = []

while True:
    this_month = last_month
    this_month_str = this_month.strftime('%Y-%m-%d')
    last_month= this_month - relativedelta(months=1)
    last_month_str = last_month.strftime('%Y-%m-%d')
    print(f"get daily snapshot between {last_month_str} and {this_month_str}.")
    days = get_daily_report(api_key, child_id, last_month_str, this_month_str)
    if len(days) == 0:
        break
    report = get_media_for_days_m(api_key, days, save_directory)
    all_report.extend(report)

get_stats_on_report(all_report)

with open("report.json", "w") as outfile:
    json.dump(all_report, outfile)
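
Once `report.json` has been written, it can be summarized by state and failure reason with a short sketch like the one below. `load_report` and `summarize_report` are hypothetical helpers, not part of the notebook. The field names (`state`, `reason`) follow the report entries the download functions build, and the sample entries are made up.

```python
import json
from collections import Counter


def load_report(path="report.json"):
    """Read the report written by the download cell."""
    with open(path) as infile:
        return json.load(infile)


def summarize_report(entries):
    """Count entries per state, and failed entries per reason."""
    states = Counter(entry.get("state", "unknown") for entry in entries)
    reasons = Counter(
        str(entry.get("reason")) for entry in entries if entry.get("state") == "failed"
    )
    return states, reasons


# Made-up entries, for illustration only.
sample = [
    {"media_id": "a", "state": "success"},
    {"media_id": "b", "state": "success"},
    {"media_id": "c", "state": "failed", "reason": "unsupported media type"},
]
states, reasons = summarize_report(sample)
print(dict(states))
print(dict(reasons))
```

To run it on a real download, replace `sample` with `load_report()`; the per-reason counts show which failures are worth retrying.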

In [53]:
# Single threaded execution for debugging

api_key= "15b037bc-799c-4558-ae3f-35a30972e758"
guardian_id = "6214d4027193ddd5b1c937c2"
child_id = get_dependent_from_guardian(api_key, guardian_id)

days = get_daily_report(api_key, child_id, start_date="2022-06-10", end_date="2022-07-01")
all_report = []
for day in days:
    # save_directory is set in the configuration cell above
    report = get_media_for_days(api_key, [day], save_directory)
    all_report.extend(report)