import pandas as pd 
import env
import os

# Methods of Data Acquisition

### `read_clipboard`: 
- When you have data copied to your clipboard, you can read it into a data frame with `pd.read_clipboard`. This is useful for quickly pulling data out of a spreadsheet.

<br>

### `read_excel`: 
- This function can be used to create a data frame based on the contents of an Excel spreadsheet.

<br>

### `read_csv`: 
- Read from a local CSV file or from the cloud (Google Sheets or AWS S3).

<br>

### `read_sql(sql_query, connection_url)`: 
- Read data from a database using a SQL query. The required drivers must be installed, and you must provide a specially formatted connection URL string.

    >To talk to a MySQL database:
    >
    >` python -m pip install pymysql mysql-connector`
    <br>
    >The connection url string:
    >
    >` mysql+pymysql://{USER}:{PASSWORD}@{HOST}/{DATABASE_NAME}`
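
As a minimal sketch of how that URL string could be assembled, the helper below fills in the template. The environment variable names `DB_USER`, `DB_PASSWORD`, and `DB_HOST` are assumptions made for illustration, not part of the lesson; they are only consulted when a value is not passed in directly. Percent-encoding the user name and password is a precaution so that characters such as `@` cannot break the URL.

```python
import os
from urllib.parse import quote_plus


def get_connection(db, user=None, password=None, host=None):
    """Return a mysql+pymysql connection URL for the database named `db`."""
    # Fall back to environment variables (hypothetical names) when a
    # credential is not supplied as an argument.
    user = user if user is not None else os.environ["DB_USER"]
    password = password if password is not None else os.environ["DB_PASSWORD"]
    host = host if host is not None else os.environ["DB_HOST"]
    # Percent-encode user and password so special characters stay valid in a URL.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"


# Example with placeholder credentials; nothing is connected to here.
url = get_connection("titanic_db", user="some_user", password="p@ss", host="localhost")
print(url)
```

The resulting string is what `pd.read_sql` would receive as its connection argument.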

___
# Source: Clipboard

Navigate to Google Classroom > Classwork > Data

- Scroll down to Classification Lesson - students.csv 
- Double click to Open
- [Cmd][A]
- [Cmd][C]

Or find a table (not image) of data like: <a href = "https://www.testmasters.net/PsatAbout/Scoring-Scale">PSAT Scoring Scale</a>
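
Once the data is on the clipboard, the call itself is just `pd.read_clipboard()`. Because that call depends on whatever is currently copied, the sketch below uses a made-up, tab-separated string as a stand-in for the clipboard contents. `read_clipboard` passes the copied text on to `read_csv`, so parsing the stand-in text with `read_csv` should behave the same way.

```python
import io

import pandas as pd

# Made-up stand-in for cells copied from a spreadsheet:
# tab-separated values, one row per line.
copied_text = "student_id\texam1\texam2\n1\t100\t90\n2\t98\t93\n"

# With real clipboard contents the call would simply be:
#     df = pd.read_clipboard()
# Here the same text is parsed from an in-memory buffer instead.
df = pd.read_csv(io.StringIO(copied_text), sep=r"\s+")
print(df.head())
```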

___
# Source: A Shared Google Sheet
1. Get the shareable link URL: https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER

2. Turn that into a CSV export URL: 
    - Replace `/edit` with `/export`; 
    - Add `format=csv` to the beginning of the query string. 
    
        https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER

3. Pass it to `pd.read_csv`, which can take a URL.
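
The URL rewrite in step 2 can be sketched as a small helper. The function name `to_csv_export_url` is hypothetical, and the sketch assumes the share link ends in `/edit` and carries the sheet id as `gid` in either the fragment or the query string.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def to_csv_export_url(share_url):
    """Turn a Google Sheets share link into a CSV export link."""
    parts = urlsplit(share_url)

    # The gid may sit in the fragment (#gid=...) or the query (?gid=...).
    gid = None
    for chunk in (parts.fragment, parts.query):
        values = parse_qs(chunk).get("gid")
        if values:
            gid = values[0]
            break

    # Replace the trailing /edit with /export.
    path = parts.path
    if path.endswith("/edit"):
        path = path[: -len("/edit")] + "/export"

    # format=csv goes first, followed by the gid when one was found.
    query = "format=csv" + (f"&gid={gid}" if gid is not None else "")
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


share_url = "https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER"
csv_url = to_csv_export_url(share_url)
print(csv_url)

# With a real, publicly shared sheet, step 3 would then be:
#     df = pd.read_csv(csv_url)
```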

___
# Source: CSV (Hosted or Local)

#### Hosted:

___
# Source: SQL
Create a dataframe from the `passengers` table in the MySQL database `titanic_db`.

<div class="alert alert-danger" role="alert">
    <div class="row vertical-align">
        <div class="col-xs-1 text-center">
            <i class="fa fa-exclamation-triangle fa-2x"></i>
        </div>
        <div class="col-xs-11">
                <strong> Remember:</strong>
            Be sure to add <b>env.py</b> to your <b>.gitignore</b> before pushing, so that your credentials never reach the remote repository.
        </div>
    </div>
</div>

<div class="alert alert-danger" role="alert">
    <div class="row vertical-align">
        <div class="col-xs-1 text-center">
            <i class="fa fa-exclamation-triangle fa-2x"></i>
        </div>
        <div class="col-xs-11">
<strong>Database Credentials</strong>
<br>
It's a bad idea to store your database access credentials (i.e. your username and password) in plaintext in your source code. There are many ways to manage secrets like this, but a simple one is to store the values in a Python file that is <b>not</b> committed along with the rest of your source code.
<br>
This is what we have done with the <code>env</code> module.
<br>
<br>
Another option is to store the credentials in environment variables and read them with the <code>os</code> module.
        </div>
    </div>
</div>


#### We will create a function that we can reference later to acquire the data:

#### Store this function in a file named `acquire.py`

___
# Caching Your Data
Because data acquisition can take time, it is common practice to cache the data **locally** in a `.csv` file.

1. Do whatever you need to do to produce the dataframe that you need.
    - For example, `df = pd.read_sql('SELECT * FROM passengers', get_connection('titanic_db'))`
    - Or your dataframe could include joins, multiple data sources, etc.
    
<br>

2. Next, use `df.to_csv("titanic.csv")` to write that dataframe to a file.
<br>

3. In your data acquisition function:
    - First check whether the CSV file exists
    - If it does, read from the CSV file
    - Otherwise get "fresh" data from MySQL
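
The three steps above can be sketched as one generic helper. The name `get_cached_data` and its `fetch_fresh` parameter are hypothetical: `fetch_fresh` stands for any zero-argument function that produces the dataframe, such as one that runs the `pd.read_sql` query. Since `df.to_csv` writes the index as the first column by default, the sketch reads it back with `index_col=0` to avoid picking up an extra unnamed column.

```python
import os

import pandas as pd


def get_cached_data(fetch_fresh, filename="titanic.csv"):
    """Return the dataframe cached in `filename`, fetching it first if needed."""
    if os.path.isfile(filename):
        # Cache hit: the first column holds the index that to_csv wrote out.
        return pd.read_csv(filename, index_col=0)

    # Cache miss: acquire fresh data, then save it for next time.
    df = fetch_fresh()
    df.to_csv(filename)
    return df


# Possible usage, assuming a get_connection helper and database access:
#     df = get_cached_data(
#         lambda: pd.read_sql("SELECT * FROM passengers", get_connection("titanic_db"))
#     )
```

Passing the acquisition step in as a function keeps the caching logic separate from the query, so the same helper could wrap other data sources.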


***
Let's work through the function creation!