# Prerequisites for getting permission to download the Waymo Open Dataset

1. Visit the [Waymo Open Dataset website](https://waymo.com/open/).
2. Click the "Download" button. If you have never downloaded the dataset before, you will be prompted to accept the license and sign up for an account.
3. After signing up, scroll down to the bottom of the page, where you will see three options: Motion Dataset, Perception Dataset with maps, and Perception Dataset Modular without maps. We use the Perception Dataset Modular without maps.
4. Click the corresponding "Download" button.
5. This redirects you to Google Cloud, where you will have to sign in again.
6. You should now be at a URL like this: https://console.cloud.google.com/storage/browser/waymo_open_dataset_v_2_0_1;tab=objects
7. If you can see the list of files, then you have the appropriate permissions.

You can now run the commands below to download the dataset splits. You do not need the entire dataset; download only the splits you need.

### Installing the Google Cloud SDK

In [1]:
import platform
import shutil
import subprocess

# Skip installation if the gcloud CLI is already on PATH.
if shutil.which('gcloud') is not None:
    print('Google Cloud SDK is already installed.')
else:
    print('Google Cloud SDK is not installed.')

    system = platform.system()
    if system == 'Windows':
        # No automated install on Windows; point to the official instructions instead.
        print('Installing Google Cloud SDK on Windows is a bit tricky, please refer to https://cloud.google.com/sdk/docs/install#windows')
    elif system == 'Darwin':
        print('Installing Google Cloud SDK on mac assuming that you have Homebrew installed')
        subprocess.run(['brew', 'install', '--cask', 'google-cloud-sdk'])
    elif system == 'Linux':
        print('Installing Google Cloud SDK on Linux (Debian/Ubuntu) assuming that you have snap installed')
        # Purge any apt package named gsutil first, presumably so it cannot conflict with the SDK's gsutil.
        subprocess.run(['sudo', 'apt-get', 'purge', '--auto-remove', 'gsutil'])
        subprocess.run(['sudo', 'snap', 'install', 'google-cloud-sdk', '--classic'])

Google Cloud SDK is not installed.
Installing Google Cloud SDK on mac assuming that you have Homebrew installed



==> Downloading https://raw.githubusercontent.com/Homebrew/homebrew-cask/cc2831cb174ade41dd9da27cb58f06ebdd740fd4/Casks/g/google-cloud-sdk.rb
==> Downloading https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-502.0.0-darwin-arm.tar.gz
All dependencies satisfied.
==> Installing Cask google-cloud-sdk
==> Running installer script 'google-cloud-sdk/install.sh'



Your current Google Cloud CLI version is: 502.0.0
The latest available version is: 502.0.0

To install or remove components at your current SDK version [502.0.0], run:
  $ gcloud components install COMPONENT_ID
  $ gcloud components remove COMPONENT_ID

To update your SDK installation to the latest version [502.0.0], run:
  $ gcloud components update

==> Source [/opt/homebrew/share/google-cloud-sdk/completion.zsh.inc] in your profile to enable shell command completion for gcloud.
==> Source [/opt/homebrew/share/google-cloud-sdk/path.zsh.inc] in your profile to add the Google Cloud SDK command line tools to your $PATH.


Welcome to the Google Cloud CLI!
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                    Components                                                   │
├───────────────┬──────────────────────────────────────────────────────┬──────────────────────────────┬───────────┤
│     Status    │                         Name                         │              ID              │    Size   │
├───────────────┼──────────────────────────────────────────────────────┼──────────────────────────────┼───────────┤
│ Not Installed │ App Engine Go Extensions                             │ app-engine-go                │   4.5 MiB │
│ Not Installed │ Appctl                                               │ appctl                       │  18.5 MiB │
│ Not Installed │ Artifact Registry Go Module Package Helper           │ package-go-module            │   < 1 MiB │
│ Not Installed │ Cloud Bigtable Comma

### Logging into Google Cloud

In [2]:
!gcloud auth login

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=epN8nrblYEwV1hrmxkdjboxx7yCvUk&access_type=offline&code_challenge=ACyC94HGhAFHXAQT6Jp-M3GhudULrNWDUmToJ3scf1U&code_challenge_method=S256


You are now logged in as [k.e@berkeley.edu].
Your current project is [prompt-ad-game].  You can change this setting by running:
  $ gcloud config set project PROJECT_ID


To take a quick anonymous survey, run:
  $ gcloud survey
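
Once you are logged in, the "can you see the list of files" check from the prerequisites can also be done from Python. The following is a minimal sketch, not part of the original notebook: the helper name `can_list_bucket` and its `run`/`which` parameters are hypothetical, and it assumes that `gsutil ls` exiting with status 0 means your account has read access to the bucket.

```python
import shutil
import subprocess

# Bucket name taken from the console URL in the prerequisites.
BUCKET = "gs://waymo_open_dataset_v_2_0_1"


def can_list_bucket(bucket=BUCKET, run=subprocess.run, which=shutil.which):
    """Return True if `gsutil ls <bucket>` exits successfully.

    `run` and `which` are injectable (hypothetical parameters) so the
    check can be exercised without a real gsutil installation.
    """
    if which("gsutil") is None:
        # gsutil is not on PATH, so the SDK install step has not taken effect yet.
        return False
    result = run(["gsutil", "ls", bucket], capture_output=True, text=True)
    return result.returncode == 0
```

Calling `can_list_bucket()` with no arguments runs the real command. A `False` result may mean the license has not been accepted for the logged-in account, or simply that `gsutil` is not on your `PATH` yet.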



### Downloading the splits
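
Because only some splits are usually needed, it can help to build the `gsutil` command from a list of split names and check free disk space first. The sketch below is an illustration, not part of the original notebook: the function names are hypothetical, and the sizes are the approximate figures quoted in the download cell. The optional `-o "GSUtil:parallel_process_count=1"` flag is the workaround that `gsutil` itself suggests for multiprocessing problems on macOS.

```python
import shutil

BUCKET = "gs://waymo_open_dataset_v_2_0_1"

# Approximate split sizes in GiB, as quoted in the download cell.
SPLIT_SIZES_GIB = {
    "testing": 9.6,
    "testing_location": 79.3,
    "training": 463.4,
    "validation": 119.8,
}


def required_gib(splits):
    """Return the approximate total size of the given splits in GiB."""
    return sum(SPLIT_SIZES_GIB[s] for s in splits)


def has_enough_space(splits, path="."):
    """Return True if the filesystem holding `path` has room for the splits."""
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= required_gib(splits)


def build_download_command(splits, dest, single_process=False):
    """Return the gsutil argument list that copies `splits` into `dest`."""
    unknown = [s for s in splits if s not in SPLIT_SIZES_GIB]
    if unknown:
        raise ValueError(f"Unknown split(s): {unknown}")
    cmd = ["gsutil"]
    if single_process:
        # Top-level option, so it must come before the `cp` subcommand.
        cmd += ["-o", "GSUtil:parallel_process_count=1"]
    cmd += ["-m", "cp", "-r"]
    cmd += [f"{BUCKET}/{s}" for s in splits]
    cmd.append(dest)
    return cmd
```

For example, `subprocess.run(build_download_command(["testing"], "../data/waymo_open_dataset_v_2_0_1"), check=True)` would copy only the testing split, provided the destination directory already exists and `has_enough_space(["testing"], "../data")` returns `True`.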

In [16]:
# Uncomment the subset of data you want to download.
# Approximate sizes: testing 9.6 GiB, testing_location 79.3 GiB,
# training 463.4 GiB, validation 119.8 GiB.

# !gsutil -m cp -r \
#   "gs://waymo_open_dataset_v_2_0_1/testing" \
#   "gs://waymo_open_dataset_v_2_0_1/testing_location" \
#   "gs://waymo_open_dataset_v_2_0_1/training" \
#   "gs://waymo_open_dataset_v_2_0_1/validation" \
#   .

!mkdir -p ../data/waymo_open_dataset_v_2_0_1
!gsutil -m cp -r \
  "gs://waymo_open_dataset_v_2_0_1/testing" \
  .

!mv testing ../data/waymo_open_dataset_v_2_0_1

print('Download complete.')

INFO 1202 23:27:28.328846 retry_util.py] Retrying request, attempt #1...
If you experience problems with multiprocessing on MacOS, they might be related to https://bugs.python.org/issue33725. You can disable multiprocessing by editing your .boto config or by adding the following flag to your command: `-o "GSUtil:parallel_process_count=1"`. Note that multithreading is still available even if you disable multiprocessing.

Copying gs://waymo_open_dataset_v_2_0_1/testing/camera_box/10504764403039842352_460_000_480_000.parquet...
Copying gs://waymo_open_dataset_v_2_0_1/testing/camera_box/11987368976578218644_1340_000_1360_000.parquet...
Copying gs://waymo_open_dataset_v_2_0_1/testing/camera_box/14188689528137485670_2660_000_2680_000.parquet...
Copying gs://waymo_open_dataset_v_2_0_1/testing/camera_box/10980133015080705026_780_000_800_000.parquet...
Copying gs://waymo_open_dataset_v_2_0_1/testing/camera_box/14737335824319407706_1980_000_2000_000.parquet...
Copying gs://waymo_open_dataset_v_2