## Data Wrangling
---
### Data Acquisition and Data Preparation

__Learning Objectives__:
1. Learn an efficient and effective workflow for data acquisition
>- Understand why `.py` files and Jupyter notebooks are used together in practice.
>- Learn how to use `.py` files in data acquisition.
>- Learn best practices for moving through the data science workflow.
1. Practice querying a database using `SQL` and `Pandas`
1. Practice cleaning data to use for EDA and Modeling
1. Apply and reinforce concepts using these datasets:
- `mall_customers`
- `employees`
- `sakila`
- `world`

### Data Acquisition Prerequisites
---
These datasets are in a SQL database. To access them, I'll need a few things.


<div class="alert alert-block alert-danger">This is the equivalent of "Check for your keys, phone, and wallet BEFORE you leave the house!" Before you start acquiring data, know EXACTLY what you need from a database/dataset: database name, table name, columns, filters for subsets of data, etc. If you don't, you'll waste time. Be effective the FIRST time; planning goes a long way.</div>

__Tools__:
1. Jupyter Notebook
1. SQL GUI
1. SQL Programming
1. Python Programming
1. Pandas Library



__Data Acquisition Setup__:
1. SQL database access credentials
1. The name of the database
1. The name of the table inside the database
1. The data I need from that table(s)
1. The SQL to retrieve that data
1. The code to load the data in a local environment
1. How to save data from a database in a local environment

> Some steps will take longer or shorter than others, but practice them all.
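
The first items in the setup list usually live in a small `.py` module, so credentials never appear in the notebook itself. Below is a minimal sketch of what a helper like `get_connection` might look like. It assumes a MySQL server reached through the `pymysql` driver and credentials named `user`, `password`, and `host`; all of these are assumptions, since the real `acquire.py` and `env.py` are not shown here.

```python
def get_connection(db, user="some_user", password="some_password",
                   host="db.example.com"):
    """Build a SQLAlchemy-style connection URL for the database `db`.

    The default credentials are placeholders (assumptions). In practice
    they would be imported from a git-ignored `env.py`, not written here.
    """
    return f"mysql+pymysql://{user}:{password}@{host}/{db}"


# Only the database name changes from one dataset to the next.
url = get_connection("mall_customers")
print(url)
```

A URL string like this can be passed as the `con` argument of `pd.read_sql`, provided SQLAlchemy and the driver are installed.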

### Data Preparation Prerequisites
---
Once the data is acquired, it needs to be cleaned before EDA and modeling. To prepare it, I'll need a few things.


<div class="alert alert-block alert-danger">Before you start preparing your data, know EXACTLY what actions you need to take: shaping data, removing nulls, filling missing values, casting datatypes, encoding variables, formatting row/column/value names, etc. If you don't, you'll waste time. Be effective the FIRST time; planning goes a long way.</div>

__Tools__:
1. Jupyter Notebook
1. Python Programming
1. Pandas Library

__Data Preparation Setup__:
1. Necessities
> Some steps will take longer or shorter than others, but practice them all.
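
The preparation actions named above (formatting names, removing nulls, filling missing values, casting datatypes, encoding variables) can be sketched on a toy table. The data and column names below are invented for illustration and do not come from any of the lesson's datasets.

```python
import pandas as pd

# Toy data, invented for illustration only.
raw = pd.DataFrame({
    "Customer Name": ["Ann", "Bob", None, "Dee"],
    "Age": ["34", "42", "29", None],
    "Member": ["yes", "no", "yes", "no"],
})

# Format column names: lowercase, underscores instead of spaces.
df = raw.rename(columns=lambda c: c.lower().replace(" ", "_"))

# Remove rows missing a value that cannot reasonably be filled in.
df = df.dropna(subset=["customer_name"])

# Cast text to numbers, then fill the remaining gap with the median.
df["age"] = pd.to_numeric(df["age"])
df["age"] = df["age"].fillna(df["age"].median()).astype(int)

# Encode a two-category column as 0/1.
df["member"] = (df["member"] == "yes").astype(int)

print(df)
```

The order matters here: the row with no name is dropped first, so its age does not influence the median used to fill the remaining gap.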

```python
# Import pandas and connect to database with access credentials
# to begin data acquisition.
import pandas as pd

import env
from acquire import get_connection
```

# `mall_customers` dataset
---
- Data Acquisition
1. Return __all rows__ from the `customers` table in the `mall_customers` database.
    - No specifics, just acquire the data.

- Data Preparation

### Data Acquisition



```python
# Prebuilt function to connect to a SQL database.

# Database with the name `mall_customers` is accessed.
connection = get_connection('mall_customers')

# All rows from the `customers` table are requested.
sql_query = 'select * from customers;'

# Use pandas to send the query over the connection and
# return all data from the `customers` table in the
# `mall_customers` database as a DataFrame.
df_mall_customers = pd.read_sql(sql=sql_query, con=connection)
```

```python
# Sweet. After the data is loaded into your notebook,
# SAVE IT to your local environment.

# Only run this code once. It will create a CSV file of the
# `customers` table in your current working directory.

df_mall_customers.to_csv('mall_customers.csv', index=False)
```
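
One way to avoid having to remember "only run this once" is to check for the local file before querying. The sketch below is one possible pattern, not the lesson's own code: `load_or_cache` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real database query so the example runs without a server.

```python
import os
import tempfile

import pandas as pd


def load_or_cache(csv_path, fetch):
    """Return the data at `csv_path`; call `fetch()` only if the file is missing."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    df = fetch()
    df.to_csv(csv_path, index=False)
    return df


# Stand-in for the real query; records how many times it was called.
calls = []


def fake_fetch():
    calls.append(1)
    return pd.DataFrame({"customer_id": [1, 2], "age": [19, 21]})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mall_customers.csv")
    first = load_or_cache(path, fake_fetch)   # runs the query, writes the CSV
    second = load_or_cache(path, fake_fetch)  # reads the CSV, no query

print(len(calls))
```

In a real workflow, `fetch` would wrap the `pd.read_sql` call, so rerunning the notebook reads the CSV instead of hitting the database again.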

```python
# Load in a local copy of mall_customers data.
df_mall_customers = pd.read_csv('mall_customers.csv')
```

```python
df_mall_customers.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   customer_id     200 non-null    int64 
 1   gender          200 non-null    object
 2   age             200 non-null    int64 
 3   annual_income   200 non-null    int64 
 4   spending_score  200 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 7.9+ KB
```

###### Data Acquisition Analysis
1. Encode `gender`.
1. Drop `customer_id`.
1. Define our target variable: `spending_score`.
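
The three steps above can be sketched on a few hypothetical rows that share the column names shown by `.info()`. The values, the `"Male"`/`"Female"` labels, and the `is_female` column name are assumptions made for illustration; check the real `gender` values before encoding.

```python
import pandas as pd

# Hypothetical rows with the same columns as `df_mall_customers`.
sample = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "gender": ["Male", "Female", "Female"],
    "age": [19, 21, 20],
    "annual_income": [15, 15, 16],
    "spending_score": [39, 81, 6],
})

# 1. Encode gender as a 0/1 column.
sample["is_female"] = (sample["gender"] == "Female").astype(int)

# 2. Drop the identifier and the original text column.
sample = sample.drop(columns=["customer_id", "gender"])

# 3. Separate the target from the features.
X = sample.drop(columns=["spending_score"])
y = sample["spending_score"]

print(X.columns.tolist())
```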

### Data Preparation

###### Data Preparation Analysis
1. 
1. 
1. 

# `employees` dataset
---
- Data Acquisition
- Data Preparation

### Data Acquisition

###### Data Acquisition Analysis
1. 
1. 
1. 

### Data Preparation

###### Data Preparation Analysis
1. 
1. 
1. 

# `sakila` dataset
---
- Data Acquisition
- Data Preparation

### Data Acquisition

###### Data Acquisition Analysis
1. 
1. 
1. 

### Data Preparation

###### Data Preparation Analysis
1. 
1. 
1. 

# `world` dataset
---
- Data Acquisition
- Data Preparation

### Data Acquisition

###### Data Acquisition Analysis
1. 
1. 
1. 

### Data Preparation

###### Data Preparation Analysis
1. 
1. 
1. 

# Summary TIL, TIP