# Download Reports from the YouTube Reporting API

Use this Colab to download all available reports from the YouTube Reporting API to your Google Drive. You can choose the CMS (content owner) that you want to download reports for. You will need to run this Colab once for each content owner that you have.

In order to use this Colab you will need the following information:
1. Your Content Owner ID. This is a 22-character ID that you can find in the URL of your content owner. **If you plan on downloading reports with revenue numbers, you will need to have permission to view revenue for the content owner.**

2. A service account key that has access to your CMS

    a. Create a Google Cloud Project if you don't have one

    b. Enable the YouTube APIs in the [Google Cloud Console](https://support.google.com/googleapi/answer/6158841?hl=en).

    c. Create a service account in the IAM section and a service account key for it using the [following tutorial](https://cloud.google.com/iam/docs/keys-create-delete).

    d. Add your service account as an admin in your CMS using the [following tutorial](https://support.google.com/youtube/answer/4524878?hl=en). The email address to invite is the one associated with your service account, ending with iam.gserviceaccount.com


3. Choose whether the reports should be decompressed after download or kept as gzip files. The script downloads every report as a gzip file so that reports of any size can be handled. The `DECOMPRESSION` parameter controls whether each file is then decompressed to a CSV. It defaults to `True` (box checked), so the files are decompressed. Set it to `False` (uncheck the box) if you want to keep the gzip files.

Make sure you have the YouTube Reporting API enabled on your Google Cloud Project. Instructions [here](https://support.google.com/googleapi/answer/6158841?hl=en&scdeb=scapi&sjid=5729436496856850200-EU).

Please enter your content owner ID in the fields below on the right.
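
A mistyped ID only shows up later as an API error, so it can help to check the value before running the remaining cells. The sketch below uses a hypothetical helper, `looks_like_content_owner_id`, which is not part of this notebook. It checks the 22-character length mentioned above; the allowed character set (letters, digits, `-` and `_`) is an assumption.

```python
import re

# Assumption: IDs are 22 characters drawn from letters, digits, '-' and '_'.
_CONTENT_OWNER_ID_RE = re.compile(r'[A-Za-z0-9_-]{22}')


def looks_like_content_owner_id(value):
  """Return True if value has the expected 22-character shape."""
  return bool(_CONTENT_OWNER_ID_RE.fullmatch(value.strip()))


# The 12-character placeholder fails the check, a 22-character value passes.
print(looks_like_content_owner_id('a1b2c3d4e5f6'))
print(looks_like_content_owner_id('AbCdEfGhIjKlMnOpQrStUv'))
```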


In [None]:

global CONTENT_OWNER_ID
global DECOMPRESSION


# Content Owner ID
CONTENT_OWNER_ID = 'a1b2c3d4e5f6' #@param {type:"string"}

# File Decompression settings
DECOMPRESSION = True # @param {type:"boolean"}

## API Authentication

The code below will authenticate you to the reporting API.

The authentication process gives rights to the following scopes:
- yt-analytics-monetary.readonly
- yt-analytics.readonly
- cloud-platform.read-only


In [None]:
!pip3 install 'google-api-python-client==2.84.0'
!pip3 install 'google-auth==2.17.3'
!pip3 install 'google-auth-httplib2==0.1.1'
!pip3 install 'google-auth-oauthlib==1.0.0'

In [None]:
"""YouTube Reporting API Authentication"""
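import json

# Imported here so this cell does not depend on the import cell further down.
from google.colab import files
from google.oauth2 import service_account
from googleapiclient.discovery import build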



SCOPES = ['https://www.googleapis.com/auth/yt-analytics-monetary.readonly',
          'https://www.googleapis.com/auth/yt-analytics.readonly',
          'https://www.googleapis.com/auth/cloud-platform.read-only']
API_SERVICE_NAME = 'youtubereporting'
API_VERSION = 'v1'


# Authorize the request and store authorization credentials.
def get_authenticated_service():
  service_account_upload = files.upload()
  service_account_json = json.loads(next(iter(service_account_upload.values())))
  credentials = service_account.Credentials.from_service_account_info(
        service_account_json, scopes=SCOPES)
  print('Success!')
  return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)

## Mounting your Google Drive

The `mount()` function in Colab mounts your Google Drive account to the Colab runtime. This means that you can access all of the files in your Google Drive from within Colab, and you can also save files from Colab to your Google Drive.

Please note this step will redirect to an authentication step, and it might take a few minutes if you have a lot of files in Drive.


This step is optional: if you don't execute it, the files will be downloaded to the local disk of the Colab runtime.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Import the required packages for the report download



In [None]:
# Standard library
import gzip
import os
import re
import shutil
import tempfile
import time

# Third-party
import requests

# Colab and Google API client libraries
from google.colab import files
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload

The following code retrieves all available bulk and system-managed reports in one go and saves them to your Google Drive.

Below are all the functions that will call the different methods of the reporting API.

The last three functions, `list_reporting_jobs()`, `retrieve_reports()` and `download_report()`, call the `jobs.list` method to list the jobs, the `jobs.reports.list` method to list each job's reports, and the `media.download` method to download each report file from its download URL, respectively.
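
The listing functions read a single page of results. Responses from `jobs.list` and `jobs.reports.list` can carry a `nextPageToken`, so a content owner with many jobs or reports may need to follow it. Here is a minimal sketch of that loop, using a hypothetical `collect_all()` helper and a fake in-memory page function in place of a real API client:

```python
def collect_all(fetch_page, items_key):
  """Follow nextPageToken until the last page and return every item.

  fetch_page is any callable that takes a page token (None for the
  first page) and returns a response dict.
  """
  items = []
  page_token = None
  while True:
    response = fetch_page(page_token)
    items.extend(response.get(items_key, []))
    page_token = response.get('nextPageToken')
    if not page_token:
      return items


# Fake two-page response standing in for the API (illustration only).
_FAKE_PAGES = {
    None: {'jobs': [{'id': 'job-1'}], 'nextPageToken': 'page-2'},
    'page-2': {'jobs': [{'id': 'job-2'}]},
}

all_jobs = collect_all(lambda token: _FAKE_PAGES[token], 'jobs')
print([job['id'] for job in all_jobs])
```

With the real client, `fetch_page` would wrap a `jobs().list(...).execute()` call and pass the token as `pageToken`.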

In [None]:
def remove_empty_kwargs(**kwargs):
  """Remove keyword arguments that are not set."""
  good_kwargs = {}
  if kwargs is not None:
    for key, value in kwargs.items():
      if value:
        good_kwargs[key] = value
  return good_kwargs


def list_reporting_jobs(youtube_reporting, **kwargs):
  """Return reporting jobs from the YouTube Reporting API's jobs.list()
     method."""
  # Drop keyword arguments that have no value (for example, an empty
  # onBehalfOfContentOwner).
  kwargs = remove_empty_kwargs(**kwargs)
  res = []
  # Retrieve the reporting jobs for the user (or content owner).
  results = youtube_reporting.jobs().list(**kwargs).execute()

  if 'jobs' in results and results['jobs']:
    jobs = results['jobs']
    for job in jobs:
      res.append({
          'id':job['id'],
          'name': job['name'],
          'reportTypeId': job['reportTypeId']
          })
    return res
  else:
    print('No jobs found')


def retrieve_reports(youtube_reporting, **kwargs):
  """Return reports created by a job with the YouTube Reporting API's
     reports.list() method."""
  # Drop keyword arguments that have no value (for example, an empty
  # onBehalfOfContentOwner).
  kwargs = remove_empty_kwargs(**kwargs)

  # Retrieve available reports for the selected job.
  results = youtube_reporting.jobs().reports().list(**kwargs).execute()

  if 'reports' in results and results['reports']:
    return results['reports']


# Call the YouTube Reporting API's media.download method to download the report.
def download_report(youtube_reporting, report_url, local_file, decompression):
  """Download locally the report contained in the report url. Returns None.
  The local file does not contain the suffix. If decompression = True then
  the local file is in csv, otherwise in gzip"""
  request = youtube_reporting.media().download(
    resourceName=' '
  )
  request.uri = report_url
  fh = tempfile.NamedTemporaryFile()
  # Stream/download the report in a single request.
  downloader = MediaIoBaseDownload(fh, request, chunksize=-1)
  done = False
  while done is False:
    try:
      status, done = downloader.next_chunk()
    # Throttle and retry if the quota limit is exceeded (HTTP 429).
    # The Google API client raises HttpError, not a requests exception.
    except HttpError as e:
      if e.resp.status == 429:
        print('429 error detected. Sleeping for 60 seconds...')
        time.sleep(60)
        continue
      else:
        raise
    if status:
      print(f'Download {int(status.progress() * 100)}%.')
  print('Download Complete for '+ local_file)
  # The downloaded payload now sits in the temporary file fh.
  # Detect gzip data by its two-byte magic number, then rewind so the
  # payload is read from the start.
  fh.seek(0)
  is_gzip = fh.read(2) == b'\x1f\x8b'
  fh.seek(0)
  if decompression:
    # Save as plain CSV, decompressing on the fly if needed.
    src = gzip.open(fh) if is_gzip else fh
    with open(local_file + '.csv', 'wb') as f_out:
      shutil.copyfileobj(src, f_out)
  elif is_gzip:
    # Already compressed: copy the bytes unchanged.
    with open(local_file + '.csv.gz', 'wb') as f_out:
      shutil.copyfileobj(fh, f_out)
  else:
    # Not compressed: compress while saving.
    with gzip.open(local_file + '.csv.gz', 'wb') as f_out:
      shutil.copyfileobj(fh, f_out)
  # Closing the temporary file also deletes it.
  fh.close()

The functions above list the jobs, retrieve the reports and download them, so the three need to be combined to download reports in bulk. The function below downloads all the reports currently available in the Reporting API for your content owner.

An exhaustive list of the system-managed reports that are available in the API can be found [here](https://developers.google.com/youtube/reporting/v1/reports/system_managed/reports).

An exhaustive list of the bulk reports that are available in the API can be found [here](https://developers.google.com/youtube/reporting/v1/reports/content_owner_reports).
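
In `bulk_download()`, each file is named from the content owner ID, the job's `reportTypeId` and the report's `startTime`, with colons replaced by underscores. Here is a small sketch of that naming rule. The helper name `build_report_prefix` and the sample values are illustrative assumptions.

```python
def build_report_prefix(directory_path, content_owner_id, report_type_id,
                        start_time):
  """Build the path prefix (no extension) used for one report file."""
  safe_start = start_time.replace(':', '_')
  return f'{directory_path}{content_owner_id}_{report_type_id}_{safe_start}'


# Sample values only; a real ID and start time come from the API.
example = build_report_prefix('/content/drive/My Drive/reports/',
                              'AbCdEfGhIjKlMnOpQrStUv',
                              'content_owner_basic_a3',
                              '2023-05-01T07:00:00Z')
print(example)
```

The `.csv` or `.csv.gz` suffix is added later by `download_report()`, depending on the decompression setting.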

In [None]:
def bulk_download(youtube_reporting, content_owner_id, directory_path,
                   decompression=True):
  """Bulk downloads all reports in a CMS and returns None. The path is only a
  prefix, the suffix will depend on whether the file is compressed or not."""
  print('Authenticated successfully.')
  try:

    # First retrieve the jobs that correspond to all the reports you can download
    jobs = list_reporting_jobs(youtube_reporting,
                            onBehalfOfContentOwner=content_owner_id,
                            includeSystemManaged=True)

    if jobs:
      print(f'Now treating {str(len(jobs))} jobs')
      init_job = 0
      for job in jobs:
        init_job += 1
        # Each job will contain a list of reports
        reports = retrieve_reports(youtube_reporting,
                          jobId=job['id'],
                  onBehalfOfContentOwner=content_owner_id)
        print(f'Treating Job {str(init_job)}')
        if reports :
          init_report = 0
          for report in reports:
            init_report+=1
            print(f'Treating report {str(init_report)} of job {str(init_job)}')
            if report:
              # download each report in a compression .gz to make sure
              # every size can be handled.
              full_path = f'{directory_path}{content_owner_id}_' \
                          f'{job["reportTypeId"]}_'\
                          f'{re.sub(r":", "_", report["startTime"])}'
              try:
                download_report(youtube_reporting, report['downloadUrl'],
                              full_path, decompression)
              except HttpError as e:
                if e.resp.status == 429:
                    print("HTTP error 429 (Rate Limit Exceeded) occurred. "
                          "Waiting for 60 seconds...")
                    time.sleep(60)  # Sleep for 60 seconds
                    continue
                else:
                  print(f'An HTTP error {e.resp.status} occurred:\n{e.content}')

        else:
          print('No reports')
    else:
      print('No jobs')
  except HttpError as e:
    print(f'An HTTP error {e.resp.status} occurred:\n{e.content}')


## Time to download the reports!

If you have mounted your Google Drive, the reports are saved under the mount point `/content/drive`. The cell below uses `/content/drive/My Drive/reports/` as the path prefix for every downloaded file.

If you chose to export your reports to your Google Drive (default), the reports will be saved into a `reports` folder in your Drive. If you don't already have a `reports` folder, the code below will create it for you.

**If you did not mount your Google Drive:** the reports will be downloaded to the local disk of the Colab runtime. In that case, change `directory_path` in the cell below so that it does not point to your Drive folder.
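
One way to avoid editing the path by hand is to pick the target folder based on whether the Drive mount point exists. This is only a sketch: `choose_reports_dir` and its parameters are assumptions, not part of this notebook, and the demonstration runs against a temporary folder rather than a real Drive mount.

```python
import os
import tempfile


def choose_reports_dir(drive_root, local_root):
  """Return a reports folder path ending in a separator, creating it."""
  base = drive_root if os.path.isdir(drive_root) else local_root
  reports_dir = os.path.join(base, 'reports')
  os.makedirs(reports_dir, exist_ok=True)
  # The download code concatenates this path with the file name,
  # so keep a trailing separator.
  return os.path.join(reports_dir, '')


with tempfile.TemporaryDirectory() as tmp:
  # This Drive path does not exist, so the local fallback is used.
  missing_drive = os.path.join(tmp, 'drive', 'My Drive')
  chosen = choose_reports_dir(missing_drive, tmp)
  expected = os.path.join(tmp, 'reports', '')
  created = os.path.isdir(chosen)
  print(chosen == expected, created)
```

In Colab, the call could look like `choose_reports_dir('/content/drive/My Drive', '/content')`.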

First you will authenticate to the API. This will require you to locate the json key of the service account that has access to your CMS. The execution of the cell below will open up a prompt that will allow you to upload the json file.

The `bulk_download()` function downloads the reports as gzip files so that reports of any size can be handled. Its `decompression` parameter controls whether each report is decompressed to a CSV right after download. It defaults to `True`, so the files are decompressed. If you set `DECOMPRESSION` to `False` at the beginning of the notebook, the reports are kept in gzip format.
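
As an illustration of what the decompression step does, the sketch below writes a small gzip file and expands it to CSV with the standard `gzip` and `shutil` modules. The file names and sample rows are made up for the example.

```python
import gzip
import os
import shutil
import tempfile

sample_rows = b'date,video_id,views\n20230501,abc123,42\n'

with tempfile.TemporaryDirectory() as tmp:
  gz_path = os.path.join(tmp, 'sample_report.csv.gz')
  csv_path = os.path.join(tmp, 'sample_report.csv')

  # Write a compressed file, standing in for a downloaded report.
  with gzip.open(gz_path, 'wb') as f_out:
    f_out.write(sample_rows)

  # Stream-decompress it to a plain CSV without loading it all in memory.
  with gzip.open(gz_path, 'rb') as f_in, open(csv_path, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

  with open(csv_path, 'rb') as f_in:
    restored = f_in.read()

# True when the decompressed CSV matches the original rows.
print(restored == sample_rows)
```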

In [None]:
# Authenticate, then create the reports folder and change directory to it.

youtube_reporting = get_authenticated_service()
directory_path = '/content/drive/My Drive/reports/'
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
os.chdir(directory_path)
prefix = directory_path

bulk_download(youtube_reporting,
              CONTENT_OWNER_ID,
              directory_path,
              decompression=DECOMPRESSION)

## Let's make sure it worked!

Now that your reports have finished downloading, you can use the command below to view the reports that you've downloaded.
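
If you prefer a Python view over `ls`, a sketch like the following lists the report files with their sizes. `list_reports` is a hypothetical helper, and it assumes the `.csv` and `.csv.gz` suffixes used by `download_report()`. The demonstration runs on a temporary folder with fake files.

```python
import os
import tempfile


def list_reports(directory):
  """Return (file name, size in bytes) for each report file, sorted by name."""
  entries = []
  for name in sorted(os.listdir(directory)):
    if name.endswith(('.csv', '.csv.gz')):
      path = os.path.join(directory, name)
      entries.append((name, os.path.getsize(path)))
  return entries


# Demonstration on a throwaway folder with two fake report files.
with tempfile.TemporaryDirectory() as tmp:
  for name, payload in [('b.csv', b'12345'), ('a.csv.gz', b'12'),
                        ('notes.txt', b'x')]:
    with open(os.path.join(tmp, name), 'wb') as f_out:
      f_out.write(payload)
  found = list_reports(tmp)

print(found)
```

In this notebook, `list_reports('.')` would cover the reports folder, since the download cell changes directory into it.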

In [None]:
!ls