# Importing Data into Your Colab Notebooks
There are several ways to import data, such as a csv, into your notebook. We will discuss 3 ways.  
- Directly from a URL.
- Importing from your local machine.
- By mounting your Google Drive.

In all 3 methods, we will use pandas to read the files after they are imported.  

Under More Resources in Colab's documentation, there is a link ("Loading data:...") to a notebook that describes ways to load your data into Colab. It covers the methods we will use, plus other ways of loading data such as PyDrive, the REST API, and Google Cloud Storage. We will not practice these, but this is the link if you are interested.  
[Loading data: Drive, Sheets, and Google Cloud Storage](https://colab.research.google.com/notebooks/io.ipynb)





## Directly from a URL  
You can import directly from a URL if the **raw** csv data is available at that URL.  
This is not common. But if the data is available as raw data from a URL, this is the simplest way to import it.  
The example below is one of the csv examples that we already looked at in class.  
If you open the link in your browser, you can see the raw csv data.


In [None]:
# Start by importing pyplot from matplotlib as plt, and import pandas as pd. Write both imports on the blank lines below.


# Now we will assign the url link as a string to the variable named url.
# And use the read_csv method in pandas to assign the csv information to the variable we named data.
# Uncomment the next 2 lines
# url = 'https://raw.githubusercontent.com/rashida048/Datasets/master/movie_dataset.csv'
# data = pd.read_csv(url)

# Now that our data has been read by pandas, we can reference columns of data using data['name at the top of the column']
# Use ctrl-right click to go to the url above to see the csv data in another window and you will see the column names I am using below.
# Uncomment the next 2 lines and run the code.
# budget = data['budget']
# revenue = data['revenue']

# We are going to run this as a scatter plot.
plt.scatter(budget, revenue)

plt.title('Movie Data')
plt.xlabel('Budget')
plt.ylabel('Revenue')

plt.show()
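
As a rough sketch of what pandas does with raw csv data, the example below reads a few made-up rows from an in-memory string instead of a real URL. The titles and numbers are invented for illustration, and the variable names `raw_csv` and `sample` are only placeholders. `pd.read_csv` accepts a URL string in the same position as the buffer used here.

```python
import io
import pandas as pd

# Made-up stand-in for the raw csv text you would see at a URL.
raw_csv = "title,budget,revenue\nMovie A,100,250\nMovie B,50,40\n"

# pd.read_csv accepts a URL string, a file path, or a file-like object.
# A StringIO buffer stands in for the URL so this sketch runs offline.
sample = pd.read_csv(io.StringIO(raw_csv))

# The first row of the csv becomes the column names.
print(list(sample.columns))

# A column is referenced by the name at the top of that column.
print(sample['budget'].tolist())
```

Printing the column names first is a quick way to check the spelling and capitalization you need inside the square brackets.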


### Another URL Example
This is another example of importing from a URL that I included as an extra reference for you.  
This example comes from Corey Schafer's tutorial.

In [None]:
#@title
import pandas as pd
from matplotlib import pyplot as plt

url = 'https://raw.githubusercontent.com/CoreyMSchafer/code_snippets/master/Python/Matplotlib/05-Fill_Betweens/data.csv'
data = pd.read_csv(url)
ages = data['Age']
dev_salaries = data['All_Devs']
py_salaries = data['Python']
js_salaries = data['JavaScript']

plt.plot(ages, dev_salaries, color='#444444',
         linestyle='--', label='All Devs')

plt.plot(ages, py_salaries, label='Python')

overall_median = 57287

plt.fill_between(ages, py_salaries, dev_salaries, where=(py_salaries > dev_salaries),
                 interpolate=True, alpha=0.25, label='Above Avg')

plt.fill_between(ages, py_salaries, dev_salaries, where=(py_salaries <= dev_salaries),
                 interpolate=True, color='red', alpha=0.25, label='Below Avg')
plt.legend()

plt.title('Median Salary (USD) by Age')
plt.xlabel('Ages')
plt.ylabel('Median Salary (USD)')

plt.tight_layout()

plt.show()

## Importing from Your Local Machine
**Files that are uploaded directly into a Colab notebook are not saved when the session ends, so you will need to upload the files each time you start a new session.**

You can upload files in 2 ways:
- Using the Files Tab  
- Using code in your code cell

### Using the Files Tab
On the left side of the window, there is a button that looks like a file.  
If you click that file button, there is a button that will let you "Upload to session storage".  
You can click that button and choose the file you want to upload from the local machine.  
After the file is uploaded to the session storage, you can access it in your code.


In the Google Classroom Assignment, I included Howard_Data.csv.  
- Download that csv to your computer, then upload it to this session.

You will get a pop-up warning you that the csv will not be saved at the end of the session.


Now that the csv is in your session storage, go through the code below to create a histogram.

**To open the file, we will need the path to the file.  
To copy the path, click on the 3 dots on the right side of the file name.**
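
As a minimal sketch of how a copied path is used, the example below writes a small made-up csv to a temporary folder and reads it back by its path. The file name `example_scores.csv` and its contents are invented for illustration. In Colab, the path would be the string you copied, such as '/content/Howard_Data.csv'.

```python
import os
import tempfile
import pandas as pd

# Create a small made-up csv in a temporary folder so the sketch runs anywhere.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example_scores.csv')
with open(path, 'w') as f:
    f.write('score\n91\n67\n78\n')

# pd.read_csv accepts the path string directly.
# It also accepts an open file object from a with open statement.
scores = pd.read_csv(path)
print(scores['score'].tolist())
```

If the path is mistyped, pandas raises a FileNotFoundError, so copying the path from the Files Tab is safer than typing it by hand.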


In [None]:
from matplotlib import pyplot as plt
import pandas as pd

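# Matplotlib 3.6 renamed the 'seaborn' style to 'seaborn-v0_8', and the old name no longer works in recent versions.
plt.style.use('seaborn-v0_8')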

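# The next 2 lines of code set the resolution (dots per inch) of the graph on screen and in saved image files.
# A higher figure.dpi also makes the graph appear larger in the notebook.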
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi'] = 300

# In your Codecademy lesson, you learned to use with open to access files.
# We will use with open to access the Howard_Data csv.
# To identify the file, we will need the path. (in this case, '/content/Howard_Data.csv')
# ***To copy the path, click on the 3 dots to the right of the file name.***
with open('/content/Howard_Data.csv') as file:
  data = pd.read_csv(file)

# In histograms, we need to assign bins to set the divisions of the data.
bins = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110]

# This is the line of code for the histogram.
# The edgecolor puts a line of color around the columns so they will be easier to see.
plt.hist(data, bins=bins, color='#daf01a', edgecolor='black')

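# Here I used the max() and min() methods to find the highest and lowest scores.
# data.max() gives one value per column, so a second .max() reduces it to a single number.
# The names highest and lowest avoid overwriting Python's built-in max() and min().
highest = int(data.max().max())
lowest = int(data.min().min())

# Here I added vertical lines at 75, the highest score, and the lowest score.
# I also added labels to these lines, so the labels can be added to the legend.
plt.axvline(75, color='green', label='Passing: 75', linewidth=4)
plt.axvline(highest, color='blue', label=('Highest Score: '+str(highest)), linewidth=4)
plt.axvline(lowest, color='red', label=('Lowest Score: '+str(lowest)), linewidth=4)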

# This adds the legend to the graph.
# The loc argument sets the location of the legend.
# These x, y values are fractions of the plot area and place the lower-left corner of the legend.
plt.legend(loc=(0.15, 0.8))

plt.title('Biology Test Scores')
plt.xlabel('Scores')
plt.ylabel('Number of Students')

plt.tight_layout()
plt.show()
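
To see how the bins divide the data, here is a small sketch that counts made-up scores with `np.histogram`, which follows the same binning rule that `plt.hist` uses. The scores are invented for illustration and are not the Howard_Data values.

```python
import numpy as np

# Made-up test scores, used only to illustrate how bins work.
scores = [55, 62, 68, 75, 75, 81, 88, 93, 99, 104]

# Ten edges make nine bins: 20-30, 30-40, and so on up to 100-110.
bins = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110]

# counts[i] is the number of scores between edges[i] and edges[i + 1].
counts, edges = np.histogram(scores, bins=bins)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f'{left}-{right}: {count}')
```

Each bin includes its left edge but not its right edge, except the last bin, which includes both. A score of exactly 70 would therefore be counted in the 70-80 bin.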

### Using Code
We can also use code that Google wrote to upload files.  
Run the code cell below and you will see how it works.  
You can choose the file you want to upload after you click the button in the output cell.  

(You can also probably use this to automate uploading your files when you run your code for the first time.)

**The downside of this method is the same as with the Files Tab: the uploaded files are not saved when the session ends, so you will need to upload them again in each new session.**

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
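
The `uploaded` variable is a dictionary that maps each file name to the file's contents as bytes. As a possible next step, the sketch below turns a dictionary of that shape into pandas DataFrames. The helper name `frames_from_upload` and the `fake_upload` dictionary are made up for illustration, so the sketch runs even outside Colab.

```python
import io
import pandas as pd

def frames_from_upload(uploaded):
    """Turn a {file name: bytes} dictionary into {file name: DataFrame} for csv files."""
    frames = {}
    for name, content in uploaded.items():
        if name.lower().endswith('.csv'):
            # BytesIO wraps the raw bytes so pandas can read them like a file.
            frames[name] = pd.read_csv(io.BytesIO(content))
    return frames

# Stand-in for the dictionary returned by files.upload(); the contents are made up.
fake_upload = {'scores.csv': b'score\n88\n75\n'}

frames = frames_from_upload(fake_upload)
print(frames['scores.csv'])
```

In Colab, you would pass the real `uploaded` dictionary instead of `fake_upload`.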

## Mounting Google Drive
Mounting your Google Drive to the notebook allows you to pull any file directly from your Drive to use in the notebook.

**The downside of this method is that it mounts your whole Drive, so it may slow your processing speeds.**  
For that reason, **I would not recommend this method.**

### Using the Files Tab
You may have noticed that the Files Tab has a button that looks like a file with the Drive icon on it.  
If you click that button, it will mount your Drive.  
(You will get a pop-up telling you that this may take a while.)

### Using Code
You can also use the code below to mount your Drive.  
This code can be run in its own cell or as part of a code cell you are working in.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Once your Google Drive is mounted, you can copy the path to any file in your Drive and use it in a `with open` statement to access the data in the file.
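
As a minimal sketch, assuming the Drive was mounted at '/content/drive' as in the cell above, files at the top level of My Drive appear under '/content/drive/MyDrive'. The file location in `DRIVE_CSV` and the helper name `read_drive_csv` are hypothetical, so replace the path with the one you copy from the Files Tab.

```python
from pathlib import Path
import pandas as pd

# Hypothetical location: Howard_Data.csv saved at the top level of My Drive.
DRIVE_CSV = Path('/content/drive/MyDrive/Howard_Data.csv')

def read_drive_csv(path):
    """Read a csv from a mounted Drive path, or return None if it is not there."""
    path = Path(path)
    if not path.exists():
        # This usually means the Drive is not mounted yet or the path is mistyped.
        print(f'File not found: {path}')
        return None
    return pd.read_csv(path)

data = read_drive_csv(DRIVE_CSV)
```

Checking that the file exists first gives a clearer message than the error pandas would raise when the Drive has not been mounted.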