- I have worked on GroupIntoBatches for the dataset gas_retail.csv
- My Google Colab Notebook on GroupIntoBatches
- Demonstration Video link
- Python
- Apache Beam
- Google Colaboratory
- Install Apache Beam using the command below.
!pip install apache-beam
- Next, install the required dependencies using the command below.
!pip install apache-beam[gcp,aws,test,docs]
- The following command lists the files in the working directory.
! ls
- First, sign in to Google Drive and Google Colab with the same credentials, then upload the .csv file to your Google Drive account.
- Import the .csv file into Google Colab.
# Code to read a CSV file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate E-mail ID
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Get the file from Drive using its file ID
downloaded = drive.CreateFile({'id':'1b73yN7MjGytqSP5wimYAQmtByOvGGe8Y'}) # replace with the ID of the file you want to access
downloaded.GetContentFile('gas_retail.csv') # download the file contents to the Colab VM
- Command to display the result (Beam writes sharded output, hence the -00000-of-00001 suffix).
! cat results.txt-00000-of-00001
- For installation of Apache Beam.
- For installing the required dependencies and libraries.
- Program for GroupIntoBatches.
- For importing the file into Colaboratory.
- For displaying the list of files.
- For the output of the file.