In [37]:
# INTRODUCTION TO FINANCIAL DATABASES
# In finance, vast amounts of data are generated and stored daily.
# Because of the speed, size, and dynamic nature of the data, companies need tools to store and access the data in an organized way.
# They do this through repositories known as DATABASES.
# Companies use financial databases in particular to build all sorts of financial applications and APIs.
# With financial databases, companies can store, organize, and analyze large amounts of data.
# SQL is a language that was designed to help manage and analyze data within databases.
# We can also use it with Python to build applications that interact with databases.
# SQL has become a standard language for people in roles that involve working with data, such as data engineers, data analysts, and data modelers.

In [38]:
# DATABASES vs. DATAFRAMES
# To understand databases, we'll first revisit CSV files and DataFrames.
# So far, besides when APIs were the main source of data in Module 5, most of the data provided has been TABULAR DATA, or CSV files.
# TABULAR DATA is good for storing small amounts of data that don't change much.
# However, as you've observed, financial data is dynamic and varied, so text files alone aren't sufficient for analyzing or working with it.
# Financial technologists needed more flexible ways to access and work with data.
# In Python, a data container called a DataFrame fills that need.
# A DataFrame stores the rows and columns of CSV data in a container that many tools can access.
# These tools help us manipulate and analyze the data. 
# Databases are thus like DataFrames in that they're designed to store, access, and analyze data.
# Unlike DataFrames, however, databases exist outside Python.

In [39]:
# TYPES OF DATABASES
# Databases offer a wide range of benefits, which include the ability to handle vast amounts of data and spread them across many computers and networks.
# This distributed environment of computers and networks is often referred to as THE CLOUD.
# While many types of databases exist, we'll focus on relational databases.
# This type of database stores its data in a table format.
# The best way to learn how to interact with a database is to use Python and SQL.
# In this lesson, we'll work with a database called SQLite, which we can interact with via Python.
# However, you can apply the skills that you learn to any relational database, including:
    # 1. PostgreSQL
    # 2. MySQL
    # 3. Cloud Databases

In [40]:
# ACCESSING FINANCIAL DATABASES WITH PYTHON
# In a professional FinTech environment, you'll likely access databases via the firm's computer network or the internet.
# The latter will likely be the case if a cloud infrastructure, like Amazon Web Services (AWS) or Microsoft Azure, stores the database.
# However, it's possible to mimic the functionality of a financial database on your local computer.
# To do this, you can use the SQLAlchemy Python library.
# This library facilitates the communication between a Python program and a database that you create from a Jupyter notebook.
# As always, you begin by importing the library.

In [41]:
# IMPORT SQLALCHEMY
# Because databases exist outside Python, we need to import a new tool that allows us to communicate with those databases.

# IMPORTANT
# The examples in this lesson use the SQLite database, but the same principles apply to any relational database.

# Import the dependencies:
import pandas as pd
import sqlalchemy

In [42]:
# CREATE A DATABASE ENGINE
# An ENGINE is a tool that can communicate with our database.
# Think of it as a smart connection that knows the DIALECT, or type, of database that we have and how to connect and interact with it.
# First, we need to define a connection string to connect to our database.
# The CONNECTION STRING tells the engine the type of database that we're using and other important connection details, like a password, server name, port number, or IP address.
# We'll assign a value to the connection string that creates a temporary SQLite database, as the following code shows:

# Create a temporary sqlite database:
database_connection_string = 'sqlite:///'

# Now that we have our connection string, we'll pass it to the `create_engine` function:
engine = sqlalchemy.create_engine(database_connection_string)
engine

# Now we can use the engine to do all sorts of things. We'll start by reviewing the tables in the database.

Engine(sqlite:///)
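
# The connection string described above can carry a user name, password, server name, port number, and database name.
# The sketch below only parses a few connection strings to show those parts; it doesn't open any connection.
# The credentials, hosts, and database names are hypothetical placeholders, not real servers.

```python
from sqlalchemy.engine.url import make_url

# Hypothetical connection strings for three database dialects.
examples = [
    "sqlite:///",                                          # temporary SQLite database
    "postgresql://analyst:secret@localhost:5432/market",   # placeholder PostgreSQL server
    "mysql://analyst:secret@db.example.com:3306/market",   # placeholder MySQL server
]

# make_url only parses the text; no database driver or server is needed.
parsed = [make_url(connection_string) for connection_string in examples]

for url in parsed:
    # The dialect, server name, port number, and database name of each string.
    print(url.get_backend_name(), url.host, url.port, url.database)
```

# Passing any of these strings to `create_engine` would follow the same pattern as the SQLite example, assuming the matching database driver is installed.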

In [43]:
# VIEW AND CREATE TABLES
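# To view the tables in the database, we ask the engine's inspector for the table names.
# (The older `engine.table_names()` call is deprecated and was removed in SQLAlchemy 2.0.)
sqlalchemy.inspect(engine).get_table_names()
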
[]

In [44]:
# As you can see, the output is empty brackets. 
# This indicates that no tables exist, which makes sense because we haven't created any yet.
# Let's create a Pandas DataFrame to represent a table.
# We'll name the DataFrame `stocks_dataframe` and give it two columns, AAPL and GOOG, that contain fictional pricing data:
stocks_dataframe = pd.DataFrame({'AAPL': [1, 2], 'GOOG': [3, 4]})
stocks_dataframe

   AAPL  GOOG
0     1     3
1     2     4


In [45]:
# Next, we'll use this DataFrame to create a table of data in the SQLite database.
# To do so, we'll use the `to_sql` function. 
# This function accepts a table name and a database engine and then converts the Pandas DataFrame to a database table:
stocks_dataframe.to_sql('stocks_dataframe', engine)

2
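
# The `2` above appears to be the number of rows written (recent versions of Pandas return the row count from `to_sql`).
# As a hedged check, the sketch below rebuilds the same table and counts its rows with a SQL statement.
# The `row_count` name is introduced here only for illustration.

```python
import pandas as pd
import sqlalchemy

# Rebuild the temporary database and table from this lesson.
engine = sqlalchemy.create_engine("sqlite:///")
stocks_dataframe = pd.DataFrame({"AAPL": [1, 2], "GOOG": [3, 4]})
stocks_dataframe.to_sql("stocks_dataframe", engine)

# Count the rows directly in the database.
with engine.connect() as connection:
    row_count = connection.execute(
        sqlalchemy.text("SELECT COUNT(*) FROM stocks_dataframe")
    ).scalar()

print(row_count)  # 2, one row per DataFrame row
```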

In [46]:
# Now, let's query the database again to make sure that the table now exists.
# (As before, we use the inspector because `engine.table_names()` was removed in SQLAlchemy 2.0.)
sqlalchemy.inspect(engine).get_table_names()

['stocks_dataframe']
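
# Beyond table names, the same inspector can describe the columns inside a table.
# This is a minimal sketch that rebuilds the lesson's table first; the `column_names` variable is introduced only for illustration.

```python
import pandas as pd
import sqlalchemy

# Rebuild the temporary database and table from this lesson.
engine = sqlalchemy.create_engine("sqlite:///")
stocks_dataframe = pd.DataFrame({"AAPL": [1, 2], "GOOG": [3, 4]})
stocks_dataframe.to_sql("stocks_dataframe", engine)

# Ask the inspector for the columns of the table.
inspector = sqlalchemy.inspect(engine)
columns = inspector.get_columns("stocks_dataframe")
column_names = [column["name"] for column in columns]

for column in columns:
    # Each entry is a dictionary that includes the column name and its SQL type.
    print(column["name"], column["type"])
```

# The list includes an `index` column in addition to AAPL and GOOG, because `to_sql` stores the DataFrame index by default.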

In [47]:
# IMPORTANT:
# People often use the term QUERY when referring to a SQL database.
# A database query can refer to either a select query or an action query.
# A SELECT QUERY requests information from a database.
# An ACTION QUERY requests that the database apply an action - such as an addition, update, or deletion - to its information.
# People also use the term query to refer to any SQL statement that the database engine acts on.
# Specific queries will be covered later in this module.
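
# As a preview, here is a minimal sketch of both kinds of query run through SQLAlchemy.
# The `prices` table and its values are hypothetical and exist only for this example.

```python
import sqlalchemy
from sqlalchemy import text

engine = sqlalchemy.create_engine("sqlite:///")

with engine.begin() as connection:
    # Setup: create a small table to work with.
    connection.execute(text("CREATE TABLE prices (ticker TEXT, price REAL)"))

    # Action queries: add, update, and delete information.
    connection.execute(text("INSERT INTO prices VALUES ('AAPL', 1.0), ('GOOG', 3.0)"))
    connection.execute(text("UPDATE prices SET price = 2.0 WHERE ticker = 'AAPL'"))
    connection.execute(text("DELETE FROM prices WHERE ticker = 'GOOG'"))

    # Select query: request information without changing it.
    result = connection.execute(text("SELECT ticker, price FROM prices"))
    rows = [tuple(row) for row in result]

print(rows)  # [('AAPL', 2.0)]
```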

In [48]:
# Do you wonder where the database is and how to get data from it?
# Currently, your SQLite database resides in your computer's memory - but only temporarily.
# As soon as you close your Jupyter notebook, the database will disappear.
# If you want to save the database to access it in the future, you can specify a file name, such as:
    # `sqlite:///mydatabase.db`
# This will save the database information to a file named `mydatabase.db`, which will reside on your computer's hard drive.
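
# The sketch below illustrates that persistence: one engine writes a table to a file, and a second, separate engine still finds it.
# It assumes a throwaway temporary folder so that the example cleans up after itself; in practice you would choose your own path.

```python
import os
import tempfile

import pandas as pd
import sqlalchemy

# Hypothetical location: a temporary folder that is deleted when the block ends.
with tempfile.TemporaryDirectory() as folder:
    database_path = os.path.join(folder, "mydatabase.db")
    connection_string = f"sqlite:///{database_path}"

    # First engine: write the table to the file, then release the connection.
    first_engine = sqlalchemy.create_engine(connection_string)
    pd.DataFrame({"AAPL": [1, 2], "GOOG": [3, 4]}).to_sql(
        "stocks_dataframe", first_engine, index=False
    )
    first_engine.dispose()

    # Second engine: a fresh connection to the same file still sees the table.
    second_engine = sqlalchemy.create_engine(connection_string)
    table_names = sqlalchemy.inspect(second_engine).get_table_names()
    second_engine.dispose()

print(table_names)  # ['stocks_dataframe']
```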

In [49]:
# READ A TABLE INTO A DATAFRAME
# The `stocks_dataframe` table in our database originated from a Pandas DataFrame of fictional stock prices that we also called `stocks_dataframe`.
# Now, we will act as if we're observing the `stocks_dataframe` table in the database for the first time.
# How do we create a Pandas DataFrame from this table?
# We can read any database table into a Pandas DataFrame by using the `read_sql_table` function and specifying the name of the table.
# We also need to tell Pandas to use our engine as the connection to the database by setting the `con=engine` parameter:
sql_stocks_df = pd.read_sql_table('stocks_dataframe', con=engine)
sql_stocks_df

   index  AAPL  GOOG
0      0     1     3
1      1     2     4


In [50]:
# Notice the index column.
# When we originally created the database table, we didn't tell Pandas to leave out the index of the DataFrame.
# So, it created a column in the database table to store the index.
# While this can prove helpful if the index contains important information (such as a date or customer ID number),
    # the information in this specific index column doesn't add any value.
# We can remove the index column by setting the `index` parameter of the `to_sql` function to `False`.
# We'll also add an `if_exists` parameter to the `to_sql` function and set its value to `replace`.
# This removes the existing `stocks_dataframe` table from the database and replaces it with the new, index-free version.
# If we don't set this parameter, we'll receive an error message indicating that a table with the name `stocks_dataframe` already exists.

# *IMPORTANT*
# The `if_exists` parameter tells the `to_sql` function how to behave if a table with the same name already exists in the database.
# The `if_exists` parameter has three options:
    # 1. The `replace` value: This removes the existing table and replaces it with the new values.
    # 2. The `fail` value: This produces an error message that alerts us to the existence of the original table, and the function takes no other action. This is the default value.
    # 3. The `append` value: This adds the new values to the existing table.
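
# The three options above can be compared side by side.
# This is a minimal sketch that rebuilds the lesson's table; the `fail_raised`, `rows_after_append`, and `rows_after_replace` names are introduced only for illustration.

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite:///")
stocks_dataframe = pd.DataFrame({"AAPL": [1, 2], "GOOG": [3, 4]})

# First write: the table doesn't exist yet, so any option works.
stocks_dataframe.to_sql("stocks_dataframe", engine, index=False)

# `fail` (the default): writing again raises an error and changes nothing.
try:
    stocks_dataframe.to_sql("stocks_dataframe", engine, index=False, if_exists="fail")
    fail_raised = False
except ValueError:
    fail_raised = True

# `append`: the two new rows are added to the two existing rows.
stocks_dataframe.to_sql("stocks_dataframe", engine, index=False, if_exists="append")
rows_after_append = len(pd.read_sql_table("stocks_dataframe", con=engine))

# `replace`: the table is dropped and rebuilt from the DataFrame.
stocks_dataframe.to_sql("stocks_dataframe", engine, index=False, if_exists="replace")
rows_after_replace = len(pd.read_sql_table("stocks_dataframe", con=engine))

print(fail_raised, rows_after_append, rows_after_replace)  # True 4 2
```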

# Here's the code to remove the index column:
stocks_dataframe.to_sql('stocks_dataframe', engine, index=False, if_exists='replace')
sql_stocks_df = pd.read_sql_table('stocks_dataframe', con=engine)
sql_stocks_df

   AAPL  GOOG
0     1     3
1     2     4
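
# When the index does contain important information, such as a date, we may want to keep it instead of dropping it.
# The sketch below assumes a hypothetical `date` index on the same fictional prices and restores it with the `index_col` parameter of `read_sql_table`.
# The dates and the `dated_stocks` table name are made up for illustration.

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite:///")

# Fictional prices with a meaningful, named index.
dated_stocks = pd.DataFrame(
    {"AAPL": [1, 2], "GOOG": [3, 4]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)
dated_stocks.index.name = "date"

# With the default `index=True`, the index is stored as a `date` column.
dated_stocks.to_sql("dated_stocks", engine)

# `index_col` turns that column back into the DataFrame index when reading.
restored = pd.read_sql_table(
    "dated_stocks", con=engine, index_col="date", parse_dates=["date"]
)
print(restored)
```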
