<a href="https://colab.research.google.com/github/dcdesmond/colab-notebooks/blob/master/Tutorial_Loading_CSV_from_Drive_Link.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This tutorial shows how to load a .csv file into a Colab notebook from a Google Drive shareable link. The link may be one that someone else has shared with you, or one you copied from one of your own files. The material is adapted from [Get Started: 3 Ways to Load CSV files into Colab](https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92). I wrote this notebook for my own reference and to share with anyone who may find it helpful.


First, make sure `pandas` is imported; the step that reads the downloaded file into a DataFrame depends on it.

```python
import pandas as pd
```

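The following cell installs PyDrive, which lets the notebook access files in Google Drive, and then authenticates you. You need a Google account, and you must complete the authentication prompt that appears when the cell runs (depending on the Colab version, this may ask you to open a link and paste a verification code into the cell output).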

```python
# Install PyDrive quietly (-q), upgrading it if it is already present (-U)
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate the Colab user, then create a PyDrive client
# that reuses those credentials.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
```

Next, store the shareable link to the .csv file you want to import in a variable.

```python
link = 'https://drive.google.com/open?id=1v433bCEsM0389afinfzQaUiOjoifgEcx'  # the shareable link
```

This link points to a .csv of enrollment data from a Udacity course. You can replace it with any other Google Drive link to a .csv file, as long as the link has the same `open?id=<file ID>` form, because the next step relies on that format.

Next, extract the file ID, which is everything after `open?id=` in the link, and print it to confirm that it matches the ID in the link above.

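```python
# Split the link at '=' into the prefix and the file ID
# (this only works when the link contains exactly one '=')
fluff, id = link.split('=')
print(id)  # verify that this is everything after '='
```

```
1v433bCEsM0389afinfzQaUiOjoifgEcx
```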
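The `split('=')` approach assumes the link contains exactly one `=`. Google Drive also produces links of the form `https://drive.google.com/file/d/<ID>/view?usp=sharing`, for which `split('=')` would return `sharing` instead of the ID. Below is a minimal sketch of a more tolerant extractor using only the standard library. `extract_drive_file_id` is a hypothetical helper, and the two link shapes it handles are assumptions rather than an exhaustive list of Drive URL formats.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_drive_file_id(link: str) -> str:
    """Return the file ID from a Google Drive link (hypothetical helper).

    Assumed link shapes (not exhaustive):
      https://drive.google.com/open?id=<ID>
      https://drive.google.com/file/d/<ID>/view?usp=sharing
    """
    parsed = urlparse(link)

    # Shape 1: the ID is the value of the `id` query parameter
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]

    # Shape 2: the ID is the path segment right after /file/d/
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)

    raise ValueError(f"Could not find a file ID in {link!r}")


print(extract_drive_file_id(
    "https://drive.google.com/open?id=1v433bCEsM0389afinfzQaUiOjoifgEcx"
))
```

The returned string can be passed to `drive.CreateFile({'id': ...})` in place of `id`.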


If the ID is correct, the next cell downloads a copy of the linked file into the Colab file system. The file is saved in your _current_ working directory (by default `/content` in Colab), so change to the directory where you want the file before running the cell. To give the file a different name, replace `g_u_enrollments` with the name you want, and use the same name when reading the file with pandas afterwards.

```python
downloaded = drive.CreateFile({'id': id})  # PyDrive file object for the linked file
downloaded.GetContentFile('g_u_enrollments.csv')  # download its contents to a local file
```
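Because `GetContentFile` receives a relative filename here, where the file lands depends on the working directory. A minimal standard-library sketch for checking that location and changing it before downloading is shown below; the temporary directory is only a stand-in for whatever folder you actually want (in a notebook, the `%cd` magic does the same job as `os.chdir`).

```python
import os
import tempfile

filename = "g_u_enrollments.csv"  # the name passed to GetContentFile()

# A relative filename is resolved against the current working directory,
# which in Colab is /content unless you change it.
print("Will be saved to:", os.path.abspath(filename))

# To save somewhere else, change directory before downloading.
# tempfile.mkdtemp() stands in for the folder you actually want.
target_dir = tempfile.mkdtemp()
os.chdir(target_dir)
print("Will now be saved to:", os.path.abspath(filename))
```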

For data analysis, we usually want the data in a pandas DataFrame, which the following code creates. A DataFrame lets us inspect and manipulate the data.

```python
df3 = pd.read_csv('g_u_enrollments.csv')
# The dataset is now stored in a pandas DataFrame
```

To see the top of the .csv that was imported, run the following cell.

```python
df3.head()  # shows the first 5 rows of the dataset
```

| Unnamed: 0 | account_key | status | join_date | cancel_date | days_to_cancel | is_udacity | is_canceled |
|---|---|---|---|---|---|---|---|
| 0 | 448 | canceled | 2014-11-10 | 2015-01-14 | 65.0 | True | True |
| 1 | 448 | canceled | 2014-11-05 | 2014-11-10 | 5.0 | True | True |
| 2 | 448 | canceled | 2015-01-27 | 2015-01-27 | 0.0 | True | True |
| 3 | 448 | canceled | 2014-11-10 | 2014-11-10 | 0.0 | True | True |
| 4 | 448 | current | 2015-03-10 | | | True | False |
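The `Unnamed: 0` column in the output above is typical of a CSV written by `DataFrame.to_csv()` with its default `index=True`: the index is saved as a first column with a blank header, and `read_csv` names that column `Unnamed: 0`. The minimal sketch below presumably applies here. It rebuilds the five rows shown above in memory, assuming the leading 0 to 4 in each row are that column's values, and reads them with `index_col=0` so the column becomes the index instead. The `parse_dates` argument is an optional extra, not part of the original tutorial.

```python
import io

import pandas as pd

# The rows from the output above, written the way DataFrame.to_csv()
# saves an unnamed index (blank first header cell). Assumption: the
# original file has this layout.
sample_csv = """,account_key,status,join_date,cancel_date,days_to_cancel,is_udacity,is_canceled
0,448,canceled,2014-11-10,2015-01-14,65.0,True,True
1,448,canceled,2014-11-05,2014-11-10,5.0,True,True
2,448,canceled,2015-01-27,2015-01-27,0.0,True,True
3,448,canceled,2014-11-10,2014-11-10,0.0,True,True
4,448,current,2015-03-10,,,True,False
"""

# Without index_col, pandas names the blank first column "Unnamed: 0"
raw = pd.read_csv(io.StringIO(sample_csv))
print(raw.columns[0])

# index_col=0 uses that column as the index; parse_dates converts the
# date strings to datetimes (blank cells become NaT)
df = pd.read_csv(
    io.StringIO(sample_csv),
    index_col=0,
    parse_dates=["join_date", "cancel_date"],
)
print(df.dtypes)
```

With the real file, the equivalent call would be `pd.read_csv('g_u_enrollments.csv', index_col=0)`.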




---

