# 5. Egress and Persistence

Now that you've engineered your features, manipulated your data and applied classifications to it, how do you save the result somewhere?

For the most part, this is the same as the Acquisition step in reverse...

In [None]:
import pandas as pd
import os

## To File

In [None]:
df = pd.read_csv("../data/CompleteUG_clean.csv")
df.head()

In [None]:
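# Create the output folder; exist_ok=True avoids an error when the cell is re-run
os.makedirs("../data/output", exist_ok=True)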

In [None]:
# To CSV

df.to_csv("../data/output/Example.csv", index=False)  # index=False leaves the row index out of the file

In [None]:
# To Parquet (requires the pyarrow or fastparquet package)

df.to_parquet("../data/output/Example.parquet", index=None)
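
As a quick check that writing really is "Acquisition in reverse", the sketch below writes a small DataFrame to CSV and reads it straight back. The example data and the temporary folder are assumptions made for illustration; they are not part of the course dataset.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data; the notebook itself uses CompleteUG_clean.csv.
example = pd.DataFrame({"course": ["Maths", "Physics"], "students": [120, 85]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Example.csv")
    example.to_csv(path, index=False)  # writer
    restored = pd.read_csv(path)       # matching reader

# True when the values and dtypes survived the round trip
print(restored.equals(example))
```

Text formats such as CSV do not store dtypes, so columns like dates may need to be parsed again on read; binary formats such as Parquet keep the schema with the data.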

## Output to SQL Server

Pandas DataFrames have a `to_sql` method that writes directly to a database, but pandas does not manage database connections itself.

We use SQLAlchemy to create the connection (an engine) to SQL Server, and pyodbc as the ODBC driver package it talks through.


In [None]:
!pip install sqlalchemy pyodbc


In [None]:
#import sqlalchemy
#import pyodbc
from sqlalchemy import create_engine
import urllib.parse  # import the submodule explicitly; a bare "import urllib" is not guaranteed to load it


In [None]:
# Create a SQLAlchemy engine; the ODBC connection string is URL-encoded first
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};SERVER=.\\SQL2017;DATABASE=Sandbox;Trusted_Connection=yes")

engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
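
The encoding step matters because the ODBC string contains characters (`;`, `=`, `{`, spaces, backslashes) that are not safe inside a URL query parameter. The sketch below uses only the standard library to show what `quote_plus` does to the same connection string; it builds the URL but does not connect to anything.

```python
from urllib.parse import quote_plus, unquote_plus

# Same connection string as above, split over several lines for readability
odbc_str = (
    "DRIVER={SQL Server Native Client 11.0};"
    "SERVER=.\\SQL2017;"
    "DATABASE=Sandbox;"
    "Trusted_Connection=yes"
)

params = quote_plus(odbc_str)  # percent-encodes reserved characters, spaces become "+"
url = f"mssql+pyodbc:///?odbc_connect={params}"
print(url)

# Decoding returns the original string, so nothing is lost in the encoding
assert unquote_plus(params) == odbc_str
```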

In [None]:
# Persist data to the database (with the default if_exists="fail", this raises ValueError if the table already exists)

df.to_sql(name="CompleteUG", con=engine)

In [None]:
# ...and if the table exists, append more data...

df.to_sql(name="CompleteUG", con=engine, if_exists="append")

In [None]:
# or simply replace...

df.to_sql(name="CompleteUG", con=engine, if_exists="replace")
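
If you don't have a SQL Server instance to hand, the three `if_exists` modes can be tried against an in-memory SQLite database instead. This is a minimal sketch: the table name `demo_table` and the data are made up for illustration, and it passes a standard-library `sqlite3` connection to `to_sql` rather than a SQLAlchemy engine.

```python
import sqlite3

import pandas as pd

# Hypothetical two-row table used only for this demonstration
demo = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
conn = sqlite3.connect(":memory:")

# First write creates the table
demo.to_sql("demo_table", conn, index=False)

# A second write with the default if_exists="fail" raises ValueError
try:
    demo.to_sql("demo_table", conn, index=False)
    raised = False
except ValueError:
    raised = True

count_sql = "SELECT COUNT(*) AS n FROM demo_table"

# "append" adds the rows to the existing table
demo.to_sql("demo_table", conn, index=False, if_exists="append")
rows_after_append = int(pd.read_sql(count_sql, conn)["n"].iloc[0])

# "replace" drops the table and writes it again from scratch
demo.to_sql("demo_table", conn, index=False, if_exists="replace")
rows_after_replace = int(pd.read_sql(count_sql, conn)["n"].iloc[0])

conn.close()
print(raised, rows_after_append, rows_after_replace)
```

Note that `"replace"` drops and recreates the table, so any indexes or column types defined on the database side are lost along with the old rows.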

## ...and more!

In [None]:
# and more...
# Each row is a tuple: a set ({...}) would lose the column order and drop duplicate values
formats = [
    ("text", "CSV", "read_csv", "to_csv"),
    ("text", "JSON", "read_json", "to_json"),
    ("text", "HTML", "read_html", "to_html"),
    ("text", "Local clipboard", "read_clipboard", "to_clipboard"),
    ("binary", "MS Excel", "read_excel", "to_excel"),
    ("binary", "HDF5 Format", "read_hdf", "to_hdf"),
    ("binary", "Feather Format", "read_feather", "to_feather"),
    ("binary", "Parquet Format", "read_parquet", "to_parquet"),
    ("binary", "Msgpack", "read_msgpack", "to_msgpack"),  # removed in pandas 1.0
    ("binary", "Stata", "read_stata", "to_stata"),
    ("binary", "SAS", "read_sas", ""),  # read-only: pandas has no SAS writer
    ("binary", "Python Pickle Format", "read_pickle", "to_pickle"),
    ("SQL", "SQL", "read_sql", "to_sql"),
    ("SQL", "Google Big Query", "read_gbq", "to_gbq"),
]
cols = ["Format Type", "Data Description", "Reader", "Writer"]
pd.DataFrame(formats, columns=cols)

Source: http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html