<p style="margin-top: 22px; font-size:40px; text-align: left; font-weight: bold;  line-height: 100%;">
    Tutorial: How to access files on Google Drive or Cloud Storage</p>

**Objective**<br>
Present functions for loading files from Google services (Sheets, Drive and Cloud Storage).

In [None]:
# Libraries to install
!pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
!pip install gspread
!pip install pandas
!pip install PyDrive2
!pip install pyarrow

# Accessing publicly available files
To access publicly available files in Google Drive or Google Sheets, use the functions imported in the next cell.

In [1]:
from gcloud_data_analysis_functions import (
    read_public_sheets,
    read_public_file_from_gdrive,
)

## Google Sheets (Public files)
To load a sheet from a public Google Spreadsheet, you first need to generate a download URL:
1. Open your Spreadsheet file;
2. Click File > Share > Publish to web;
3. In the Link option, select the desired sheet;
4. For the file type, select Comma-separated values (.csv);
5. Copy the newly created link and paste it into the cell below.
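A published CSV link can usually be read directly by pandas. The sketch below is only a guess at what a helper such as `read_public_sheets` might do, not its actual implementation. The names `is_published_csv_link` and `load_published_sheet` are hypothetical, and the check assumes that "Publish to web" links are served from `docs.google.com` and carry an `output=csv` query parameter.

```python
from urllib.parse import parse_qs, urlparse


def is_published_csv_link(url: str) -> bool:
    """Return True if the URL looks like a 'Publish to web' CSV link."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Assumption: published CSV links live on docs.google.com and
    # include an `output=csv` query parameter.
    return parsed.netloc == "docs.google.com" and query.get("output") == ["csv"]


def load_published_sheet(url: str):
    """Load a published sheet into a DataFrame (hypothetical helper)."""
    import pandas as pd  # imported lazily so the URL check works without pandas

    if not is_published_csv_link(url):
        raise ValueError("Expected a 'Publish to web' link with output=csv")
    return pd.read_csv(url)
```

Checking the link first gives a clearer error than letting pandas try to parse an HTML page, which is what an ordinary "edit" link would return.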

In [2]:
print(read_public_sheets.__doc__)

Read a publicly available Google Sheets file as a Pandas Dataframe.

    Parameters
    ----------
    file_url : str
        URL address of the spreadsheet CSV file.

    Returns
    -------
    pd.DataFrame
        Dataframe loaded from the CSV address.

    


In [None]:
df_sheet_public = read_public_sheets(
    "<Google Sheets URL link>"
)
df_sheet_public

## Google Drive (Public files)
To make a file on Google Drive publicly accessible:
1. Right-click the file and choose "Share";
2. Under "Get link", click "Change to anyone with the link";
3. Copy the generated link.
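A share link points to a preview page rather than to the file itself, so a reader function has to turn it into a download URL first. The sketch below shows one common way to do this. It is an assumption about the approach, not the code behind `read_public_file_from_gdrive`; the helper names are hypothetical, and the `uc?export=download` endpoint may return a confirmation page instead of the file for large files.

```python
import re

# Assumption: file IDs consist of letters, digits, hyphens and underscores.
_FILE_ID_PATTERNS = (
    re.compile(r"/d/([A-Za-z0-9_-]+)"),      # .../file/d/<id>/view
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),  # ...?id=<id>
)


def extract_drive_file_id(file_url: str) -> str:
    """Extract the file ID from a Google Drive share link."""
    for pattern in _FILE_ID_PATTERNS:
        match = pattern.search(file_url)
        if match:
            return match.group(1)
    raise ValueError(f"No file ID found in URL: {file_url}")


def to_direct_download_url(file_url: str) -> str:
    """Build a direct-download URL from a share link."""
    file_id = extract_drive_file_id(file_url)
    return f"https://drive.google.com/uc?export=download&id={file_id}"
```

With such a helper, a public CSV could be loaded with `pd.read_csv(to_direct_download_url(link))`.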

In [5]:
print(read_public_file_from_gdrive.__doc__)

Read public files stored in a Google Drive.

    Parameters
    ----------
    file_url : str
        Google Drive file URL.

    file_format : str
        Type of file format: 'csv', 'xlsx' or 'json'.

    Returns
    -------
    Union[pd.DataFrame, Dict]
        Dataframe (for 'csv' or 'xlsx') or Dictionary ('json') from Google
        Drive file.

    


In [7]:
file_url_xlsx = "<Google Drive xlsx file link>"

In [8]:
read_public_file_from_gdrive(
    file_url_xlsx,
    "xlsx",
)

Unnamed: 0,RM,LSTAT,PTRATIO,MEDV
0,6575.00,4.98,15.3,504000.0
1,6421.00,9.14,17.8,453600.0
2,7185.00,4.03,17.8,728700.0
3,6998.00,2.94,18.7,701400.0
4,7147.00,5.33,18.7,760200.0
...,...,...,...,...
484,6593.00,9.67,21.0,470400.0
485,6.12,9.08,21.0,432600.0
486,6976.00,5.64,21.0,501900.0
487,6794.00,6.48,21.0,462000.0


In [9]:
file_url_csv = (
    "<Google Drive csv file link>"
)

In [10]:
read_public_file_from_gdrive(
    file_url_csv,
    "csv",
)

Unnamed: 0,RM,LSTAT,PTRATIO,MEDV
0,6.575,4.98,15.3,504000.0
1,6.421,9.14,17.8,453600.0
2,7.185,4.03,17.8,728700.0
3,6.998,2.94,18.7,701400.0
4,7.147,5.33,18.7,760200.0
...,...,...,...,...
484,6.593,9.67,21.0,470400.0
485,6.120,9.08,21.0,432600.0
486,6.976,5.64,21.0,501900.0
487,6.794,6.48,21.0,462000.0


In [11]:
file_url_json = (
    "<Google Drive json file link>"
)

In [12]:
read_public_file_from_gdrive(
    file_url_json,
    "json",
)

{'1': 'b', '2': 'c'}

# Accessing private files

In [13]:
from gcloud_data_analysis_functions import (
    read_private_sheets,
    read_file_from_gcloud_storage,
)

## Google Sheets (Private files)
To access private files, you need to [generate Google Cloud credentials](https://docs.gspread.org/en/latest/oauth2.html) to connect to the services described below.
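For orientation, here is a minimal sketch of how a private sheet can be read with gspread and a service-account key. It is not the implementation of `read_private_sheets`: the function names are hypothetical, and it assumes the credentials file is a service-account JSON key and that the spreadsheet has been shared with that service account.

```python
def select_worksheet(spreadsheet, worksheet=0):
    """Pick a worksheet by position (int) or by title (str)."""
    if isinstance(worksheet, int):
        return spreadsheet.get_worksheet(worksheet)
    return spreadsheet.worksheet(worksheet)


def load_private_sheet(credentials_json, sheet_url, worksheet=0):
    """Read one worksheet of a private spreadsheet into a DataFrame."""
    # Imported lazily so the selection logic above has no dependencies.
    import gspread
    import pandas as pd

    # Assumption: `credentials_json` is the path to a service-account key.
    client = gspread.service_account(filename=credentials_json)
    spreadsheet = client.open_by_url(sheet_url)
    # get_all_records() uses the first row as column names.
    records = select_worksheet(spreadsheet, worksheet).get_all_records()
    return pd.DataFrame(records)
```

Accepting either an index or a title mirrors the "index or name" wording of the `worksheet` parameter documented for `read_private_sheets`.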

In [15]:
print(read_private_sheets.__doc__)

Read a private Google Sheets file as a Pandas Dataframe.

    Parameters
    ----------
    credentials_json : str
        Path to JSON file with GCloud Credentials.

    sheet_url : str
        Spreadsheet URL address.

    worksheet : int (default=0)
        Index or name for the target worksheet.

    Returns
    -------
    pd.DataFrame
        Dataframe loaded from the spreadsheet.

    


In [14]:
# Define the path to your Google Cloud Credentials
gcloud_credentials = "<Path to Google Cloud Credentials json>"

In [16]:
sheet_url = "<Google Sheet link>"

In [17]:
read_private_sheets(gcloud_credentials, sheet_url, 0)

Unnamed: 0,RM,LSTAT,PTRATIO,MEDV
0,6.575,4.98,15.3,504000
1,6.421,9.14,17.8,453600
2,7.185,4.03,17.8,728700
3,6.998,2.94,18.7,701400
4,7.147,5.33,18.7,760200
...,...,...,...,...
484,6.593,9.67,21,470400
485,6.12,9.08,21,432600
486,6.976,5.64,21,501900
487,6.794,6.48,21,462000


## Google Drive (Private files)
To generate the credentials needed to access private files on Google Drive, follow the steps described in the [PyDrive2 documentation](https://docs.iterative.ai/PyDrive2/quickstart/).

In [13]:
from pydrive2.auth import GoogleAuth
from gcloud_data_analysis_functions import read_private_file_from_gdrive, read_file_from_gcloud_storage

In [None]:
gauth = GoogleAuth()
gauth.LocalWebserverAuth()

In [8]:
file_url_csv = "<Google Drive csv link>"
file_url_json = "<Google Drive json link>"
file_url_xlsx = "<Google Drive xlsx link>"
file_url_parquet = "<Google Drive parquet link>"

In [None]:
read_private_file_from_gdrive(file_url_json, "json", gauth)

In [None]:
read_private_file_from_gdrive(file_url_csv, "csv", gauth)

In [None]:
read_private_file_from_gdrive(file_url_xlsx, "xlsx", gauth)

In [None]:
read_private_file_from_gdrive(file_url_parquet, "parquet", gauth)
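The calls above pass an authenticated `GoogleAuth` object to the helper. As a rough illustration of what can be done with it, the sketch below downloads a JSON file through a PyDrive2 `GoogleDrive` object and parses it. This is an assumption about the general approach, not the code of `read_private_file_from_gdrive`; `read_drive_json` is a hypothetical name, and it expects a file ID rather than a full URL.

```python
import json
import os
import tempfile


def read_drive_json(drive, file_id: str):
    """Download a JSON file from Google Drive and return the parsed object.

    `drive` is expected to behave like pydrive2.drive.GoogleDrive.
    """
    drive_file = drive.CreateFile({"id": file_id})
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "download.json")
        # GetContentFile saves the remote content to a local path.
        drive_file.GetContentFile(local_path)
        with open(local_path, encoding="utf-8") as handle:
            return json.load(handle)
```

A real call would build the drive object from the authenticated session, for example `from pydrive2.drive import GoogleDrive` followed by `read_drive_json(GoogleDrive(gauth), "<file id>")`.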

## Google Cloud Storage

To access files stored in Google Cloud Storage, you need to [generate a JSON file with your credentials](https://developers.google.com/identity/protocols/oauth2?hl=en).
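To make the flow concrete, the sketch below downloads an object with the `google-cloud-storage` client and converts the bytes according to the file format. It is an illustration under assumptions, not the implementation of `read_file_from_gcloud_storage`: the function names are hypothetical, the `google-cloud-storage` package is not in the install list above and would need to be installed separately, and the credentials file is assumed to be a service-account JSON key.

```python
import io
import json


def parse_downloaded_bytes(raw: bytes, file_format: str):
    """Convert downloaded bytes into a Python object based on the format."""
    if file_format == "json":
        return json.loads(raw.decode("utf-8"))
    if file_format == "txt":
        return raw.decode("utf-8")
    if file_format in ("csv", "xlsx", "parquet"):
        import pandas as pd  # lazy import: only tabular formats need pandas

        readers = {
            "csv": pd.read_csv,
            "xlsx": pd.read_excel,        # needs an Excel engine such as openpyxl
            "parquet": pd.read_parquet,   # needs pyarrow or fastparquet
        }
        return readers[file_format](io.BytesIO(raw))
    raise ValueError(f"Unsupported file format: {file_format}")


def download_blob_bytes(file_name, gcp_bucket, gcp_project, gcp_credentials_file):
    """Download one object from a bucket as raw bytes."""
    from google.cloud import storage  # assumption: google-cloud-storage is installed

    client = storage.Client.from_service_account_json(
        gcp_credentials_file, project=gcp_project
    )
    return client.bucket(gcp_bucket).blob(file_name).download_as_bytes()
```

Keeping the parsing separate from the download makes the format handling easy to check without network access or credentials.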

In [14]:
print(read_file_from_gcloud_storage.__doc__)

Read file from Google Cloud Storage into a specific Python object.

    Parameters
    ----------
    file_name : str
        String with the name of the target file.

    file_format : str
        File format can be 'csv', 'xlsx', 'parquet', 'json' or 'txt'.

    gcp_bucket : str
        String with bucket name.

    gcp_project : str (default="jeitto-datascience")
        String with the name of the project in GCP.

    gcp_credentials_file : str
        Path to the JSON file with GCP credentials.

    Returns
    -------
    Union[pd.DataFrame, Dict, str]
        The specified object generated from the target file.

    


In [None]:
read_file_from_gcloud_storage(
    "<File name>",
    "<File format>",
    "<Bucket name>",
    "<Project Name>",
    "<Path to GCloud Credentials>",
)