# Google Colab
As I promised, I figured out the file import protocols for **colab.research.google.com**.

### NOTE This Link Only Works Using Colab Jupyter Notebooks:

[Open in Colab](https://chrome.google.com/webstore/detail/open-in-colab/) is a Chrome extension that lets you open notebooks from GitHub in Colab, then save them to your Google Drive.

## Saving Notebooks To GitHub or Drive

Any time you open a GitHub hosted notebook in Colab, it opens a new editable view of the notebook. You can run and modify the notebook without worrying about overwriting the source.

If you would like to save your changes from within Colab, you can use the File menu to save the modified notebook either to Google Drive or back to GitHub. Choose **File→Save a copy in Drive** or **File→Save a copy to GitHub** and follow the resulting prompts. Saving a Colab notebook to GitHub requires giving Colab permission to push the commit to your repository.

After you have installed ***Open in Colab*** let's test it with this week's notebook:

[GALA Coding Club 15.ipynb](https://github.com/BrianArbuckle/GALA/blob/master/GALA_Coding_Club_15.ipynb)

Now you can save your own copy, which will be stored in your Google Drive.

# Files and File Paths

I have glossed over this topic, because we have been importing ZIP Code data from a local source that I have always created. Starting to store our own data is a crucial next step to working with the data in our project.

Variables are a great way to store data while your program or notebook is running, but if you want
your data to persist even after your program has finished, you need to save it to a file.  

Think of the large shape files that we imported from the US Census data. We are in the process of reducing the data to a smaller, 'workable' data frame containing just the unique zip codes. We will need to save this reduced data frame if we want to continue working with the data.


### Properties
A file has two main properties: a **filename** (typically written as one word) and a **path**. The path specifies the location of a file on the computer. The part of the filename after the last period is called the file’s **extension** and tells you the file’s type. **project.docx** is a Word document, and the first zip code data that we imported, **free-zipcode-database-Primary.csv**, is a csv, or comma-separated values, file.
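As a small illustration of these properties, the sketch below splits the path used in this notebook into its folder, filename, and extension. It uses functions from the standard library's `os.path` module; the variable names are only for illustration.

```python
import os

# The relative path used later in this notebook.
csv_path = "data/free-zipcode-database-Primary.csv"

folder = os.path.dirname(csv_path)         # the path part: 'data'
filename = os.path.basename(csv_path)      # 'free-zipcode-database-Primary.csv'
stem, extension = os.path.splitext(filename)

print(folder)
print(filename)
print(extension)                           # '.csv'
```

Note that `os.path.splitext` keeps the leading period as part of the extension.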

Let's look at the import portion of our notebook from last week, *GALA Coding Club 14*. If you run this, it will not work, as you do not have a local copy of the file:

In [0]:
import pandas as pd

zip_df = pd.read_csv("data/free-zipcode-database-Primary.csv",index_col=0)

Take note of the *data* before our csv and the shape file. 'data' is a ***folder***, or, as it is more often called in computer programming, a **directory**. Also notice the slash.

### Backslash on Windows and Forward Slash on OS X and Linux
On Windows, paths are written using backslashes (\\) as the separator between folder names. OS X and Linux, however, use the forward slash (/) as their path separator. If you want your programs to work on all operating systems, you will have to write your Python scripts to handle both cases. We will not need to worry about this for a long time, but the good news is that there is a module that helps with this process, called the **<code>os</code>** module. In addition to handling slashes, the os module does other fantastic things, which we will dive into now. First, as we have for other modules or *libraries*, we need to import the **os** module.

In [0]:
import os
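As a minimal sketch of how the os module helps with separators, `os.path.join` builds a path using whichever slash the current operating system expects. The folder and file names are the ones used elsewhere in this notebook.

```python
import os

# Join the folder and file name with the separator for this operating system.
csv_path = os.path.join("data", "free-zipcode-database-Primary.csv")

print(os.sep)    # the separator itself: '/' on OS X and Linux, '\\' on Windows
print(csv_path)
```

On Colab, which runs Linux, this gives the same `data/free-zipcode-database-Primary.csv` string that we typed by hand.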

### Jupyter Notebooks and Directories. 

Finding our current working location can be quite important when writing a program. The nice thing when working with Jupyter Notebooks is that our working directory, unless we change it, is the directory the notebook was started in. Let's confirm that with **<code>os.getcwd()</code>**.

In [0]:
os.getcwd()

'/content'

In GALA Coding Club 14 we listed all the various files needed when importing the single <code>shape</code>.  We did this with the **<code>os.listdir()</code>**.

In [0]:
os.listdir()

['.config', 'sample_data']

We can also create a new directory with **<code>os.mkdir()</code>**, which needs a string argument for the name of the folder / directory. Since we will need the data files for our project, let's create a new directory called data.<br>
PLEASE RUN IF THIS IS THE FIRST TIME USING THIS NOTEBOOK:

In [0]:
os.mkdir('data')
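`os.mkdir()` raises a `FileExistsError` if the directory is already there, which is why the cell above should only be run once. One alternative, sketched below, is `os.makedirs()` with `exist_ok=True`, which can be run any number of times. The sketch works inside a temporary scratch folder (an assumption made so it does not touch your real files); in the notebook you could pass `'data'` directly.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    data_dir = os.path.join(scratch, "data")

    # Safe to call repeatedly: no error if the directory already exists.
    os.makedirs(data_dir, exist_ok=True)
    os.makedirs(data_dir, exist_ok=True)
    created = os.path.isdir(data_dir)

    # os.mkdir, by contrast, complains when the directory is already there.
    try:
        os.mkdir(data_dir)
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

print(created)
print(mkdir_failed)
```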

We can also change the working directory with the **<code>os.chdir()</code>** to the data folder:

In [0]:
os.chdir('data')

Let's check inside with os.listdir() again.

In [0]:
os.listdir()

[]

Look what happens when we try to change the directory to `data` again:

In [0]:
os.chdir('data')

FileNotFoundError: ignored

# Absolute vs. Relative Paths
There are two ways to specify a file path.

* An **absolute path**, which always begins with the root folder. On my computer, absolute paths start with '/Users'.<br>
* In Colab an **absolute path** works the same way, but with a twist. Your "root" folder technically belongs to the computer that is assigned to you when you start a new Colab notebook. When you close the notebook and start it again, you get a new computer. So it is important to note that we treat '/content' as our root directory.<br>
* A **relative path**, which is relative to the program’s current working directory.

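By changing the directory to 'data' without giving the full path, Python assumes that the path is relative to the current working directory. That is why the second call failed: we were already inside 'data', and there is no second 'data' folder inside it.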
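The difference can be sketched with `os.path.isabs()` and `os.path.abspath()`, which show how the same relative name, 'data', resolves to different absolute paths depending on the working directory. The sketch runs in a temporary scratch folder (an assumption, so it leaves your own files alone) and returns to the starting directory when it is done.

```python
import os
import tempfile

start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as scratch:
    try:
        os.chdir(scratch)
        os.mkdir("data")

        is_absolute = os.path.isabs("data")     # a bare name is a relative path
        from_parent = os.path.abspath("data")   # <scratch>/data

        os.chdir("data")
        from_inside = os.path.abspath("data")   # <scratch>/data/data
        inner_exists = os.path.exists("data")   # no such folder in here
    finally:
        # Always go back, so the scratch folder can be cleaned up.
        os.chdir(start_dir)

print(is_absolute)
print(from_parent)
print(from_inside)
print(inner_exists)
```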

For now, since it is easy to do in Jupyter Notebooks, we will be using the relative path.  Before we do that, we need to change back to our main folder:

    '/content'
    
We could use the absolute path, but there is a quicker way to move to the directory that contains the current directory, and that is '..':

In [0]:
os.chdir('..')

In [0]:
os.listdir()

['.config', 'data', 'sample_data']

# Getting Data Into Colab

Once again we will use a module, this time the **<code>urllib</code>** module. All we need from it is a submodule called **<code>request</code>** and a single function, **<code>urlretrieve</code>**. We will assign the link to the csv file to the variable **url**:

In [0]:
from urllib.request import urlretrieve

#Assign url of file: url
url = "http://federalgovernmentzipcodes.us/free-zipcode-database-Primary.csv"

# Save file locally

We will use the **<code>urlretrieve</code>** function, which we call with two arguments. The first is the web address of the file that we want to copy. The second is the destination path. Notice that I have **<code>data/</code>** in the path name: since we changed back to our main directory, <code>/content</code>, and created the new directory 'data' there, this relative path points into that folder.

In [0]:
urlretrieve(url, 'data/free-zipcode-database-Primary.csv')

('data/free-zipcode-database-Primary.csv',
 <http.client.HTTPMessage at 0x7fb494b3e518>)

Now let's list the contents of the data directory. This time, we will not change our working directory, but simply pass the 'data' folder name as the argument to **<code>os.listdir()</code>**.

In [0]:
os.listdir('data')

['free-zipcode-database-Primary.csv']

### Now we can use the data (This is the code from above)

In [0]:
import pandas as pd

zip_df = pd.read_csv("data/free-zipcode-database-Primary.csv",index_col=0)

In [16]:
zip_df.head()

| Zipcode | ZipCodeType | City | State | LocationType | Lat | Long | Location | Decommisioned | TaxReturnsFiled | EstimatedPopulation | TotalWages |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 705 | STANDARD | AIBONITO | PR | PRIMARY | 18.14 | -66.26 | NA-US-PR-AIBONITO | False | NaN | NaN | NaN |
| 610 | STANDARD | ANASCO | PR | PRIMARY | 18.28 | -67.14 | NA-US-PR-ANASCO | False | NaN | NaN | NaN |
| 611 | PO BOX | ANGELES | PR | PRIMARY | 18.28 | -66.79 | NA-US-PR-ANGELES | False | NaN | NaN | NaN |
| 612 | STANDARD | ARECIBO | PR | PRIMARY | 18.45 | -66.73 | NA-US-PR-ARECIBO | False | NaN | NaN | NaN |
| 601 | STANDARD | ADJUNTAS | PR | PRIMARY | 18.16 | -66.72 | NA-US-PR-ADJUNTAS | False | NaN | NaN | NaN |


# Save the data once

We have been using the <code>free-zipcode-database-Primary.csv</code> in many of our notebooks, but we only need to copy it and save it to our data folder once.
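One way to make "save it once" automatic is to check whether the file is already in the data folder before downloading. The sketch below is only an illustration: `download_once` is a hypothetical helper name, not something from urllib or from our earlier notebooks.

```python
import os
from urllib.request import urlretrieve


def download_once(url, dest_path):
    """Download url to dest_path unless the file is already there.

    Returns True if a download happened, False if it was skipped.
    (Hypothetical helper, written for this example.)
    """
    if os.path.exists(dest_path):
        return False

    # Make sure the destination folder (for example 'data') exists first.
    dest_dir = os.path.dirname(dest_path)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)

    urlretrieve(url, dest_path)
    return True


# In this notebook it could be called like this:
# download_once(url, 'data/free-zipcode-database-Primary.csv')
```

Running the call a second time in the same session skips the download, because the file is already in place.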