<a href="https://colab.research.google.com/github/BengiNouri/Assignments/blob/main/Eksterne_data_Lokale_filer%2C_Drev%2C_Sheets_og_Cloud_Storage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook provides recipes for loading and saving data from external sources.

# Local file system

## Uploading files from your local file system

<code>files.upload</code> returns a dictionary of the files that were uploaded.
The dictionary is keyed by file name, and the values are the uploaded data as bytes.

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
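
Because the values in the returned dictionary are raw bytes, they usually need decoding before use. The sketch below assumes a UTF-8 encoded CSV upload. The `uploaded_example` dictionary is a hypothetical stand-in for the result of `files.upload()`, so the code runs without a browser.

```python
import csv
import io

# Hypothetical stand-in for the dict returned by files.upload():
# keys are file names, values are the uploaded bytes.
uploaded_example = {'example.csv': b'name,score\nada,3\ngrace,5\n'}

parsed = {}
for filename, data in uploaded_example.items():
  # Assumption: the file is UTF-8 encoded text.
  text = data.decode('utf-8')
  parsed[filename] = list(csv.DictReader(io.StringIO(text)))

print(parsed['example.csv'])
```

Binary uploads such as images should skip the decoding step and be passed on as bytes.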

## Downloading files to your local file system

<code>files.download</code> triggers a browser download of the file to your local computer.


In [None]:
from google.colab import files

with open('example.txt', 'w') as f:
  f.write('some content')

files.download('example.txt')

# Google Drive

You can access files in Drive in a number of ways, including:
- Mounting your Google Drive in the runtime's virtual machine
- Using a wrapper around the API such as <a href="https://docs.iterative.ai/PyDrive2/">PyDrive2</a>
- Using the <a href="https://developers.google.com/drive/v3/web/about-sdk">native REST API</a>

Examples of each are below.

## Mounting Google Drive locally

The examples below show how to mount your Google Drive on your runtime using an authorization code, and how to write and read files there. Once they have been executed, you will be able to see the new file (<code>foo.txt</code>) at <a href="https://drive.google.com/">https://drive.google.com/</a>.

This only supports reading, writing, and moving files. To programmatically modify sharing settings or other metadata, use one of the other options below.

<strong>Note:</strong> When you use the "Mount Drive" button in the file browser, no authorization codes are needed for notebooks that have only been edited by the current user.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code
Enter your authorization code:
··········
Mounted at /content/drive


In [None]:
with open('/content/drive/My Drive/foo.txt', 'w') as f:
  f.write('Hello Google Drive!')
!cat /content/drive/My\ Drive/foo.txt

Hello Google Drive!

In [None]:
drive.flush_and_unmount()
print('All changes made in this colab session should now be visible in Drive.')

All changes made in this colab session should now be visible in Drive.
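
Once mounted, Drive behaves like an ordinary directory, so the standard library is enough for the read, write, and move operations mentioned above. This is a minimal sketch: a temporary directory stands in for `/content/drive/My Drive` so the code runs anywhere, and the `archive` folder name is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: in Colab, replace this with Path('/content/drive/My Drive').
drive_root = Path(tempfile.mkdtemp())

# Write a file, then read it back.
source = drive_root / 'foo.txt'
source.write_text('Hello Google Drive!')
content = source.read_text()

# Move the file into a (hypothetical) subfolder.
archive = drive_root / 'archive'
archive.mkdir(exist_ok=True)
moved = Path(shutil.move(str(source), str(archive / 'foo.txt')))

print(content)
print(sorted(p.name for p in archive.iterdir()))
```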


## PyDrive2

The examples below demonstrate authentication and file upload/download using PyDrive2. More examples are available in the <a href="https://docs.iterative.ai/PyDrive2/">PyDrive2 documentation</a>.

In [None]:
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

Authenticate and create the PyDrive2 client.


In [None]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

Create and upload a text file.


In [None]:
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))

Uploaded file with ID 14vDAdqp7BSCQnoougmgylBexIr2AQx2T


Load a file by ID and print its contents.


In [None]:
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))

Downloaded content "Sample upload file content"


## Drive REST API

To use the Drive API, we must first authenticate and construct an API client.


In [None]:
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

With this client, we can use any of the functions in the <a href="https://developers.google.com/drive/v3/reference/">Google Drive API reference</a>. Examples follow.


### Creating a new Drive file with data from Python

First, create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Upload it using the <a href="https://developers.google.com/drive/v3/reference/files/create"><code>files.create</code></a> method. Further details on uploading files are available in the <a href="https://developers.google.com/drive/v3/web/manage-uploads">developer documentation</a>.

In [None]:
from googleapiclient.http import MediaFileUpload

file_metadata = {
  'name': 'Sample file',
  'mimeType': 'text/plain'
}
media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)
created = drive_service.files().create(body=file_metadata,
                                       media_body=media,
                                       fields='id').execute()
print('File ID: {}'.format(created.get('id')))

File ID: 1Cw9CqiyU6zbXFD9ViPZu_3yX-sYF4W17


After executing the cell above, you will see a new file named "Sample file" at <a href="https://drive.google.com/">https://drive.google.com/</a>.

### Downloading data from a Drive file into Python

Download the file we uploaded above.

In [None]:
file_id = created.get('id')

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()

downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))

Downloaded file contents are: b'my sample file'
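
The `downloaded.seek(0)` call matters because writing to a `BytesIO` buffer leaves its position at the end of the data. The sketch below shows the effect using only the standard library. The bytes written here are a stand-in for the chunks that `MediaIoBaseDownload` writes.

```python
import io

buffer = io.BytesIO()
# Stand-in for the downloader writing chunks into the buffer.
buffer.write(b'my sample file')

# The position is now at the end, so read() finds nothing left.
without_rewind = buffer.read()

# Rewind to the start before reading, as the cell above does.
buffer.seek(0)
with_rewind = buffer.read()

# getvalue() returns the whole buffer regardless of position.
text = buffer.getvalue().decode('utf-8')
print(without_rewind, with_rewind, text)
```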


To download a different file, set <code>file_id</code> above to the ID of that file, which will look like "1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz".
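
File IDs are often copied out of a sharing link. The helper below is a hypothetical convenience and not part of any Google library. It assumes the two common link shapes `https://drive.google.com/file/d/<id>/...` and `https://drive.google.com/open?id=<id>`, and returns `None` for anything else.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_drive_file_id(url):
  """Returns the file ID from a Drive link, or None if not recognized."""
  parsed = urlparse(url)
  # Shape 1 (assumed): the ID follows a /d/ path segment.
  match = re.search(r'/d/([A-Za-z0-9_-]+)', parsed.path)
  if match:
    return match.group(1)
  # Shape 2 (assumed): the ID is passed as the `id` query parameter.
  ids = parse_qs(parsed.query).get('id')
  return ids[0] if ids else None


example_id = extract_drive_file_id(
    'https://drive.google.com/file/d/1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz/view')
print(example_id)
```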

# Google Sheets

Our examples below use the open-source <a href="https://github.com/burnash/gspread"><code>gspread</code></a> library to interact with Google Sheets.

Import the library, authenticate, and create the interface to Sheets.

In [None]:
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

Below is a small set of <code>gspread</code> examples. Additional examples are available on the <a href="https://github.com/burnash/gspread#more-examples"><code>gspread</code></a> GitHub page.

## Creating a new sheet with data from Python

In [None]:
sh = gc.create('My cool spreadsheet')

After executing the cell above, you will see a new spreadsheet named "My cool spreadsheet" at <a href="https://sheets.google.com/">https://sheets.google.com</a>.

Open our new sheet and add some random data.

In [None]:
worksheet = gc.open('My cool spreadsheet').sheet1

cell_list = worksheet.range('A1:C2')

import random
for cell in cell_list:
  cell.value = random.randint(1, 10)

worksheet.update_cells(cell_list)

{'spreadsheetId': '1dsQeN0YzXuM387l_CuyEbsYzL2ew9TJFzR-E-RQnwxs',
 'updatedCells': 6,
 'updatedColumns': 3,
 'updatedRange': 'Sheet1!A1:C2',
 'updatedRows': 2}

## Downloading data from a sheet into Python as a Pandas DataFrame

Read back the random data that we inserted above and convert the result into a <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html">Pandas DataFrame</a>.

In [None]:
worksheet = gc.open('My cool spreadsheet').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)

import pandas as pd
pd.DataFrame.from_records(rows)

[['6', '3', '4'], ['7', '2', '1']]


   0  1  2
0  6  3  4
1  7  2  1
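
Note that `get_all_values` returns every cell as a string, and the sample sheet has no header row. The sketch below shows one way to tidy the rows with plain Python before building the DataFrame. The column names are hypothetical.

```python
# Rows as returned by worksheet.get_all_values() in the example above.
rows = [['6', '3', '4'], ['7', '2', '1']]

# Hypothetical column names; the sample sheet has no header row.
columns = ['a', 'b', 'c']

# Convert each cell to int and pair it with its column name.
records = [
    {name: int(value) for name, value in zip(columns, row)}
    for row in rows
]
print(records)
```

The resulting list of dictionaries can be passed to `pd.DataFrame.from_records(records)` to get named, numeric columns.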


# Google Cloud Storage (GCS)

To use Colaboratory with GCS, you need to create a <a href="https://cloud.google.com/storage/docs/projects">Google Cloud project</a> or use an existing one.

Specify your project ID below:

In [None]:
project_id = 'Your_project_ID_here'

Files in GCS are stored in <a href="https://cloud.google.com/storage/docs/buckets">buckets</a>.

Bucket names must be globally unique, so we generate one here.

In [None]:
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())
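
Besides being globally unique, bucket names must follow the GCS naming rules. The check below is a simplified sketch of those rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit) and does not cover every restriction in the documentation. The function name is hypothetical.

```python
import re
import uuid

# Simplified pattern: 3-63 chars, lowercase alphanumerics at both ends,
# with dashes, underscores, and dots allowed in between.
_BUCKET_NAME = re.compile(r'[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]')


def looks_like_valid_bucket_name(name):
  """Returns True if name passes the simplified naming check."""
  return _BUCKET_NAME.fullmatch(name) is not None


candidate = 'colab-sample-bucket-' + str(uuid.uuid1())
print(candidate, looks_like_valid_bucket_name(candidate))
```

The generated name is 56 characters long (a 20-character prefix plus a 36-character UUID), which leaves little room for a longer prefix.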

To access GCS, we must authenticate.

In [None]:
from google.colab import auth
auth.authenticate_user()

GCS can be accessed via the <code>gsutil</code> command-line utility or via the native Python API.

## `gsutil`

First, we configure <code>gsutil</code> to use the project we specified above by using <code>gcloud</code>.

In [None]:
!gcloud config set project {project_id}

Updated property [core/project].


Create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Make a bucket to which we will upload the file (<a href="https://cloud.google.com/storage/docs/gsutil/commands/mb">documentation</a>).

In [None]:
!gsutil mb gs://{bucket_name}

Creating gs://colab-sample-bucket-44971372-baaf-11e7-ae30-0242ac110002/...


Copy the file to our new bucket (<a href="https://cloud.google.com/storage/docs/gsutil/commands/cp">documentation</a>).

In [None]:
!gsutil cp /tmp/to_upload.txt gs://{bucket_name}/

Copying file:///tmp/to_upload.txt [Content-Type=text/plain]...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       


Dump the contents of our newly copied file to make sure everything worked (<a href="https://cloud.google.com/storage/docs/gsutil/commands/cat">documentation</a>).


In [None]:
!gsutil cat gs://{bucket_name}/to_upload.txt

my sample file

In [None]:
# @markdown Once the upload has finished, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Finally, we download the file we just uploaded in the example above. It is as simple as reversing the order of the arguments in the <code>gsutil cp</code> command.

In [None]:
!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt

# Print the result to make sure the transfer worked.
!cat /tmp/gsutil_download.txt

Copying gs://colab-sample-bucket483f20dc-baaf-11e7-ae30-0242ac110002/to_upload.txt...
/ [1 files][   14.0 B/   14.0 B]                                                
Operation completed over 1 objects/14.0 B.                                       
my sample file
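
The `!gsutil` lines rely on the notebook substituting Python variables such as `{bucket_name}` into the shell command. Outside a notebook cell, the same commands can be assembled as argument lists. The sketch below only builds and prints them; passing one of the lists to `subprocess.run` would execute it, assuming `gsutil` is installed. The helper name and the bucket name are hypothetical.

```python
import shlex


def gsutil_cp(source, destination):
  """Builds the argument list for `gsutil cp source destination`."""
  return ['gsutil', 'cp', source, destination]


example_bucket = 'colab-sample-bucket-example'  # hypothetical bucket name
remote = 'gs://{}/to_upload.txt'.format(example_bucket)

upload_cmd = gsutil_cp('/tmp/to_upload.txt', remote)
# Downloading is the same command with the arguments swapped.
download_cmd = gsutil_cp(remote, '/tmp/gsutil_download.txt')

print(shlex.join(upload_cmd))
print(shlex.join(download_cmd))
```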

## Python API

These snippets are based on <a href="https://github.com/GoogleCloudPlatform/storage-file-transfer-json-python/blob/master/chunked_transfer.py">a larger example</a> that shows additional uses of the API.

First, we create the service client.

In [None]:
from googleapiclient.discovery import build
gcs_service = build('storage', 'v1')

Create a local file to upload.

In [None]:
with open('/tmp/to_upload.txt', 'w') as f:
  f.write('my sample file')

print('/tmp/to_upload.txt contains:')
!cat /tmp/to_upload.txt

/tmp/to_upload.txt contains:
my sample file

Create a bucket in the project specified above.

In [None]:
# Use a different globally unique bucket name from the gsutil example above.
import uuid
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())

body = {
  'name': bucket_name,
  # For a full list of locations, see:
  # https://cloud.google.com/storage/docs/bucket-locations
  'location': 'us',
}
gcs_service.buckets().insert(project=project_id, body=body).execute()
print('Done')

Done


Upload the file to our newly created bucket.

In [None]:
from googleapiclient.http import MediaFileUpload

media = MediaFileUpload('/tmp/to_upload.txt',
                        mimetype='text/plain',
                        resumable=True)

request = gcs_service.objects().insert(bucket=bucket_name,
                                       name='to_upload.txt',
                                       media_body=media)

response = None
while response is None:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, response = request.next_chunk()

print('Upload complete')

Upload complete
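
For larger files, the progress object ignored above can be used to report how far the transfer has come. In `googleapiclient`, `next_chunk()` returns a progress object with a `progress()` method while an upload is under way, and `None` in its place once the upload finishes. The sketch below mimics that behavior with two hypothetical stand-in classes (`FakeProgress` and `FakeRequest`) so that it runs without network access. With a real request, only the loop is needed.

```python
class FakeProgress:
  """Hypothetical stand-in for googleapiclient's upload progress object."""

  def __init__(self, sent, total):
    self.sent = sent
    self.total = total

  def progress(self):
    # Fraction of the transfer completed, between 0.0 and 1.0.
    return self.sent / self.total


class FakeRequest:
  """Hypothetical stand-in for a resumable upload request."""

  def __init__(self, total, chunk_size):
    self.total = total
    self.chunk_size = chunk_size
    self.sent = 0

  def next_chunk(self):
    self.sent = min(self.sent + self.chunk_size, self.total)
    if self.sent < self.total:
      return FakeProgress(self.sent, self.total), None
    # Final chunk: no progress object, only the response.
    return None, {'name': 'to_upload.txt'}


fake_request = FakeRequest(total=100, chunk_size=40)
reported = []
response = None
while response is None:
  status, response = fake_request.next_chunk()
  if status is not None:
    reported.append(round(status.progress() * 100))
    print('Uploaded {}%'.format(reported[-1]))

print('Upload complete:', response['name'])
```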


In [None]:
# @markdown Once the upload has finished, the data will appear in the Cloud Console storage browser for your project:
print('https://console.cloud.google.com/storage/browser?project=' + project_id)

https://console.cloud.google.com/storage/browser?project=Your_project_ID_here


Download the file we just uploaded.

In [None]:
from googleapiclient.http import MediaIoBaseDownload

with open('/tmp/downloaded_from_gcs.txt', 'wb') as f:
  request = gcs_service.objects().get_media(bucket=bucket_name,
                                            object='to_upload.txt')
  media = MediaIoBaseDownload(f, request)

  done = False
  while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = media.next_chunk()

print('Download complete')

Download complete


Inspect the downloaded file.


In [None]:
!cat /tmp/downloaded_from_gcs.txt

my sample file