<a href="https://colab.research.google.com/github/anjaleeDS/TECH26_F24/blob/main/Common_Methods_for_Uploading_Data_into_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Frequently used ways of getting data**

The code below works only if the files and folders it tries to read exist in your own Google Drive, so change the paths and file names to match yours. The most common sources are:

1. A file in your Google Drive, usually a .csv file
2. A URL that ends in .csv
3. A Google Sheet



#**NOTE ON HOW TO USE**

Use this notebook as a resource. Copy and paste a code section into your own notebook to use it.


---

If you run this notebook in its entirety with ***Runtime > Run All***, Google Drive will sometimes refuse access because another mount is still active.

When this happens, go to ***Runtime > Restart Session***. This clears any open connections that are causing the error message.

##1 Getting data from vega_datasets

vega_datasets is a Python package that provides a collection of sample datasets for use with pandas. Because these datasets do not change, they are frequently used for testing and demonstrations.


In [None]:
# Get your data from vega_datasets, a popular collection of sample datasets
# that is freely available to everyone.

import pandas as pd
from vega_datasets import data

# In a notebook only the last expression in a cell is displayed,
# so wrap these two calls in print() if you want to see their results.
data.list_datasets() # what datasets are in vega_datasets?
len(data.list_datasets()) # how many was that?

df = data.cars()
df

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA
...,...,...,...,...,...,...,...,...,...
401,ford mustang gl,27.0,4,140.0,86.0,2790,15.6,1982-01-01,USA
402,vw pickup,44.0,4,97.0,52.0,2130,24.6,1982-01-01,Europe
403,dodge rampage,32.0,4,135.0,84.0,2295,11.6,1982-01-01,USA
404,ford ranger,28.0,4,120.0,79.0,2625,18.6,1982-01-01,USA


##2 Getting your code to look at your Google Drive

After connecting to your Google Drive, make sure you are in the right directory before reading the file you want.

To see if you are in the correct directory, use ```%ls``` to print the list of files in the current directory.

To change the directory, use ```%cd <directory_name>``` to move into that directory.


In [None]:
import pandas as pd
from google.colab import drive

drive.mount('gdrive/', force_remount=True)
# %ls

# The folder icon in the left sidebar of the Colab screen opens a file browser
# where you can find the path to your own folder.
%cd gdrive/MyDrive/Data-Visualization-Stanford/TECH 26 F24/
%ls



Mounted at gdrive/
/content/gdrive/MyDrive/Data-Visualization-Stanford/TECH 26 F24
 amazon_prime_titles.csv
 API_NY.GDP.MKTP.CD_DS2_en_csv_v2_31795.csv
 Code/
'Copy of amenities.gsheet'
'Copy of Mobile_Food_Facility_Permit_20240123.csv'
 geological_earthquakes.csv
'Homework Submissions TECH 26.gsheet'
 Lectures/
 MBA.csv
 Mobile_Food_Facility_Permit_20240123.csv
'Published Stanford Workplace Violence Prevention Plan.gdoc'
'TECH26 Survival Guide Intro to Visualization with Python.gslides'
'Untitled document.gdoc'
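
The same check can be done in plain Python instead of with the `%ls` magic. The sketch below is only an illustration: `list_csv_files` is a hypothetical helper name, not part of Colab or pandas, and it simply filters the current folder down to the .csv files so that they are easier to spot among the other Drive files.

```python
import os


def list_csv_files(folder="."):
    """Return the sorted names of the .csv files in `folder` (hypothetical helper)."""
    return sorted(
        name for name in os.listdir(folder) if name.lower().endswith(".csv")
    )


# Show where the notebook is currently looking and which CSV files it can see.
print("Current directory:", os.getcwd())
print("CSV files here:", list_csv_files())
```

If the file you expect is missing from the list, you are probably in a different directory; use `%cd` to move to the right one.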


##3 Getting your data from a .csv URL

This works for URLs that point directly to a .csv file, for example: https://www.mydomain.com/somelist-of-shows.csv





In [None]:
# get your data from a URL

import pandas as pd

url = "https://raw.githubusercontent.com/anjaleeDS/TECH26_F24/main/tvshows.csv"
df = pd.read_csv(url)

df.head(3)

Unnamed: 0,index,id,title,type,description,release_year,age_certification,runtime,imdb_id,imdb_score,imdb_votes
0,0,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,113,tt0075314,8.3,795222.0
1,1,tm127384,Monty Python and the Holy Grail,MOVIE,"King Arthur, accompanied by his squire, recrui...",1975,PG,91,tt0071853,8.2,530877.0
2,2,tm70993,Life of Brian,MOVIE,"Brian Cohen is an average young Jewish man, bu...",1979,R,94,tt0079470,8.0,392419.0
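
A common stumbling block is copying the address of a GitHub page that displays a file rather than the address of the raw file itself. As a rough sketch, the hypothetical helper `to_raw_github_url` below rewrites a `github.com/.../blob/...` page address into the `raw.githubusercontent.com` form used in the cell above. The page address in the example is an assumption, inferred from the raw URL in this notebook.

```python
from urllib.parse import urlparse


def to_raw_github_url(url):
    """Convert a github.com 'blob' page URL to its raw.githubusercontent.com form.

    URLs that are not GitHub blob pages are returned unchanged.
    """
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc == "github.com" and len(segments) >= 5 and segments[2] == "blob":
        user, repo, _, *rest = segments
        return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])
    return url


# Assumed page address for the file read in the cell above.
page_url = "https://github.com/anjaleeDS/TECH26_F24/blob/main/tvshows.csv"
print(to_raw_github_url(page_url))
```

The resulting address can be passed to `pd.read_csv` in the same way as the `url` variable above.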


##4 Getting your data from a gsheet inside Google Drive

Requirements:
- The file must be a Google Sheet (type gsheet)
- It must be accessible in your Google Drive
- It must have a unique name within your Google Drive

In [None]:
# Get your data from a gsheet ANYWHERE in your Google Drive.
# This comes directly from Google's code snippets icon on the lower left.
# The icon looks like this: < >

import pandas as pd
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

worksheet = gc.open('amenities').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
# print(rows)

# Convert to a DataFrame and render.
# Note: the sheet's header row ends up as row 0 of the data, not as column names.
my_gsheet = pd.DataFrame.from_records(rows)
my_gsheet.head(3)

Unnamed: 0,0,1,2,3
0,unified_id,month,hot_tub,pool
1,AIR10052559,2022-12,1,0
2,AIR10178668,2022-12,0,0
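
In the output above, the sheet's header (`unified_id`, `month`, ...) appears as data row 0 and the columns are named 0 to 3. One possible way to fix this is to use the first row as the column names, as sketched below. The `rows` list here is a small stand-in copied from the output above so the example runs without a Google login, and the name `sheet_df` is only illustrative. The sketch assumes the values arrive as strings, as they appear in that output.

```python
import pandas as pd

# Stand-in for worksheet.get_all_values(): a list of rows, header first.
rows = [
    ["unified_id", "month", "hot_tub", "pool"],
    ["AIR10052559", "2022-12", "1", "0"],
    ["AIR10178668", "2022-12", "0", "0"],
]

# Use the first row as column names and the remaining rows as data.
sheet_df = pd.DataFrame(rows[1:], columns=rows[0])

# The values are text, so convert the flag columns to integers.
sheet_df[["hot_tub", "pool"]] = sheet_df[["hot_tub", "pool"]].astype(int)

print(sheet_df.dtypes)
```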


##5 Getting data from a .CSV file that lives inside your Google Drive

- Take a look at the menu with the little folder icon on the left side of the Colab screen. Once your Drive is mounted, this file browser shows the directories inside your Google Drive.

In [None]:
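# Open a CSV file (NOT a gsheet file) in your Google Drive.
# This is also from the code snippets icon on the lower left.
import pandas as pd
from google.colab import drive

# Mount at an absolute path so the mount point does not depend on the
# current directory (an earlier %cd may have moved you into the Drive).
drive.mount('/content/gdrive', force_remount=True)

# This is the folder in my Drive where I keep my csv file. You'll want to find
# where you keep your .csv file inside your own Google Drive.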


%cd /content/gdrive/MyDrive/Data-Visualization-Stanford/TECH 26 F24/
filename = 'amazon_prime_titles.csv'
df = pd.read_csv(filename)
df.head()
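
An alternative to `%cd` is to pass the full path of the file to `pd.read_csv`, so the result does not depend on the current directory. The sketch below illustrates the idea with a temporary folder standing in for a Drive folder. The file name `example_titles.csv` and its contents are made up for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for a Drive folder such as /content/gdrive/MyDrive/<your folder>.
data_dir = Path(tempfile.mkdtemp())

# Write a tiny made-up CSV so the example runs anywhere.
csv_path = data_dir / "example_titles.csv"
csv_path.write_text("title,release_year\nShow A,2019\nShow B,2021\n")

# Read by full path; the current working directory does not matter.
example_df = pd.read_csv(csv_path)
print(example_df.shape)
```

In Colab, `data_dir` would instead point at the folder inside your mounted Drive.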

##6 Getting a data set from a website that allows you to download the file

*Steps*

1. Find and go to the site with the dataset.
2. Download the dataset.
3. Find the downloaded file on your computer.
4. Go to Google Drive, click the "+" button on the upper left, and select Upload File.
5. Be sure that the file is of type .csv.
6. Use the instructions in section 5, Getting data from a .CSV file that lives inside your Google Drive, to read your file.

Keep track of where the uploaded file is in your Google Drive so you can go to the right directory.
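
If you lose track of where an uploaded file ended up, a short search can locate it. The sketch below is one way to do that. `find_file` is a hypothetical helper, and `/content/gdrive/MyDrive` is assumed to be where your Drive is mounted. Searching a large Drive this way can be slow.

```python
from pathlib import Path


def find_file(root, filename):
    """Return every file under `root` whose name equals `filename` (hypothetical helper)."""
    return sorted(p for p in Path(root).rglob(filename) if p.is_file())


# Assumed mount point; adjust it to match your own drive.mount() call.
drive_root = Path("/content/gdrive/MyDrive")

if drive_root.exists():
    for match in find_file(drive_root, "MBA.csv"):
        print(match)
else:
    print("Drive is not mounted at", drive_root)
```

The parent folder of a printed path is the directory to pass to `%cd`.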


In [None]:
# open a file downloaded from Kaggle.com or any other dataset source

import pandas as pd
from google.colab import drive

# Mount at an absolute path so it works regardless of the current directory.
drive.mount('/content/gdrive', force_remount=True)
%cd /content/gdrive/MyDrive/Data-Visualization-Stanford/TECH 26 F24/

filename = 'MBA.csv'

df = pd.read_csv(filename)
df.head(4)
# Original file from https://www.kaggle.com/datasets/taweilo/mba-admission-dataset