-
Notifications
You must be signed in to change notification settings - Fork 2
/
index.html.md.erb
238 lines (154 loc) · 9.95 KB
/
index.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
---
title: Use Google Cloud Platform
weight: 1
last_reviewed_on: 2024-04-24
review_in: 6 months
---
# Use Google Cloud Platform
Google Cloud Platform (GCP) is a suite of cloud computing services. GDS maintains a central account.
## Get access to GCP
You already have access to GCP through your GDS Google Account.
You can use GCP through:
- the cloud console
- the command line on your local machine
- code that runs on your local machine
- code that runs on GCP
You will need specific roles and permissions to use each of the GCP services.
### Get a specific role or permissions to use GCP
A role is a set of permissions. You should only have the specific role or permissions you need to use the GCP.
See the [Google Cloud documentation on understanding roles](https://cloud.google.com/iam/docs/understanding-roles) and the [IAM permissions reference](https://cloud.google.com/iam/docs/permissions-reference) for more information.
If your code fails to run, check the error message to see if there was a role or permission error.
If you need to run code unsupervised, that is without logging into your Google account, then you need a service account. [It's unlikely that you will need a service account for personal use](https://cloud.google.com/iam/docs/best-practices-for-using-and-managing-service-accounts#development).
[Contact the Data Engineering community on Slack](https://gds.slack.com/archives/C032AH1D6MU) to ask for a role, permission or service account.
## Use GCP through the cloud console
You can use GCP from the cloud console by opening one of the following URLs in a web browser:
- the [cloud console](https://console.cloud.google.com/)
- the [cloud shell](https://console.cloud.google.com/home/dashboard?cloudshell=true)
## Use GCP through the command line on your local machine
You can use the [Google Cloud CLI](https://cloud.google.com/sdk/gcloud) to:
- interact directly with GCP
- run code that interacts with GCP
1. [Install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install-sdk#mac) on your local machine.
1. Run `gcloud init` in the command line to configure the Google Cloud CLI. Authorise the Google Cloud CLI to use your Google account when prompted in a separate window.
1. Once configuration is complete, you and your code are authenticated for an hour. You can refresh your authentication by running `gcloud auth login` in the command line.
## Use GCP through code that runs on your local machine
1. Follow the [documentation on using GCP through the command line on your local machine](#use-gcp-through-the-command-line-on-your-local-machine).
1. Run `gcloud auth login --update-adc` in the command line.
Authorise gcloud to use your Google account when prompted in a separate window.
Running this command also creates a file in a location that it prints at the command line, `~/.config/gcloud/application_default_credentials.json`.
Code that you run will use this file to authenticate itself on GCP.
### Check that you can now use GCP through code that runs on your local machine
You should check that you can now use GCP through code that runs on your local machine.
If you cannot, this is likely to be an authentication issue.
1. Get the __BigQuery Job User__ role for your GDS Google account by [contacting the Data Engineering community on Slack](https://gds.slack.com/archives/C032AH1D6MU).
1. Install the [BigQuery API client library for Python](https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python) by running `pip install --upgrade google-cloud-bigquery` in the command line.
You should do this in a new [python virtual environment](https://gds-way.cloudapps.digital/manuals/programming-languages/python/python.html#environments).
1. Run `gcloud auth login --update-adc`. The flag `--update-adc` allows code that you run to use your credentials.
1. Create a file called `bigquery.py` that contains the following code:
```python
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client(project="govuk-bigquery-analytics")
query = """
SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE state = 'TX'
GROUP BY name, state
ORDER BY total_people DESC
LIMIT 20
"""
query_job = client.query(query) # Make an API request.
print("The query data:")
for row in query_job:
# You can access row values by field name or index.
print("name={}, count={}".format(row[0], row["total_people"]))
```
1. Run `python bigquery.py` in the command line. You should see the following output:
```text
The query data:
name=James, count=272793
name=John, count=235139
name=Michael, count=225320
name=Robert, count=220399
name=David, count=219028
name=Mary, count=209893
name=William, count=173092
name=Jose, count=157362
name=Christopher, count=144196
name=Maria, count=131056
name=Charles, count=126509
name=Daniel, count=117470
name=Richard, count=109888
name=Juan, count=109808
name=Jennifer, count=98696
name=Joshua, count=90679
name=Elizabeth, count=90465
name=Joseph, count=89097
name=Matthew, count=88464
name=Joe, count=87977
```
If you do not see this output, then you currently cannot use GCP through code that runs on your local machine.
[Contact the Data Engineering community on Slack](https://gds.slack.com/archives/C032AH1D6MU) to ask for help.
### Access files on Google Drive
1. Follow the [documentation on using GCP through code that runs on your local machine](#use-gcp-through-the-command-line-on-your-local-machine).
1. Run `gcloud auth login --enable-gdrive-access --update-adc` in the command line. The flag `--enable-gdrive-access` allows code to access files on Google Drive.
1. Install the legacy Google API Python Client by running `pip install google-api-python-client` in the command line.
You should do this in a new [python virtual environment](https://gds-way.cloudapps.digital/manuals/programming-languages/python/python.html#environments).
No newer client library supports Google Drive, Sheets or Docs.
1. Create a file called `sheets.py` that contains the following code:
```python
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from pprint import pprint
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument("-s", "--sheet-id", dest="SHEET_ID",
help="ID of a Google Sheet")
parser.add_argument("-r", "--range", dest="RANGE",
help="Range of cells in A1 or R1C1 notation")
args = parser.parse_args()
# The ID and range of a sample spreadsheet.
# SHEET_ID = '1ThBiDsAtMhfwvPXW39siZkZPdFS3FW_IGgQuET4YjnM'
# RANGE = 'scores!A:E'
try:
service = build('sheets', 'v4')
# Call the Sheets API
result = service.spreadsheets().values().get(
spreadsheetId=args.SHEET_ID, range=args.RANGE).execute()
rows = result.get('values', [])
if not rows:
print('No data found.')
# print('{0} rows retrieved.'.format(len(rows)))
pprint(rows)
except HttpError as err:
print(err)
```
1. Run the code in this file by running `python sheets.py -s=SHEET_ID -r=RANGE` in the command line, where:
- `SHEET_ID` is the ID of a sheet that that you can access
- `RANGE` is the name of a tab and a range of cells, such as `Sheet1\!A1:E5`.
You will probably need to escape the exclamation mark with a backslash, as shown in the example.
Running this code prints the values of some cells of this file sheet in the command line.
## Use GCP through code that runs on GCP
You can use the GCP through code that runs on the GCP itself.
This is the same as running code unsupervised or automatically as part of a service, without needing to log into your Google account.
Therefore you need a service account. [Contact the Data Engineering community on Slack](https://gds.slack.com/archives/C032AH1D6MU) to ask for a service account, saying what [roles and permissions]() it requires.
[Contact the Data Engineering community on Slack](https://gds.slack.com/archives/C032AH1D6MU) to ask them to attach the service account to the resource in GCP that the code will run in.
You should not need to modify your code, because this code will automatically find and use the service account credentials.
### Access files on Google Drive, such as Sheets and Docs, using a service account
In Google Drive, share the files with the email address of the service account.
## Impersonate a service account on your local device
1. [Contact the Data Engineering community on Slack](https://gds.slack.com/archives/C032AH1D6MU) to get a service account that grants your personal account the role __Service Account Token Creator__, which includes the permission __iam.serviceAccounts.getAccessToken__.
Note the email address of the service account.
1. Follow the [documentation on using GCP through the command line on your local machine](#use-gcp-through-the-command-line-on-your-local-machine).
1. Run `gcloud auth application-default login --impersonate-service-account=EMAIL_ADDRESS` on the command line, where `EMAIL_ADDRESS` is the email of the service account, ending in `.iam.gserviceaccount.com`.
The Google Cloud client libraries will now use the service account.
## Use GCP from code that runs on Google Colab
Paste and run the following code into a cell in a Colab notebook.
```py
from google.colab import auth
# Authenticate the user - follow the link and the prompts to get an authentication token
auth.authenticate_user()
```
## Avoid using a credentials file
[You should not use a credentials file](https://cloud.google.com/blog/products/identity-security/how-to-authenticate-service-accounts-to-help-keep-applications-secure) because they pose a security risk.
The Google Cloud client libraries will automatically find the credentials that it needs, whether you are running code on your local device or in GCP itself.