In [1]:
import pandas as pd

## Extracting data from parquet files

One of the most common ways to ingest data from a source system is to read it from a file, such as a CSV file. As datasets have grown, the need for more efficient file formats has led to column-oriented file types, such as parquet files.

In this exercise, you'll practice extracting data from a parquet file.

### Instructions
    - Read the parquet file at the path "sales_data.parquet" into a pandas DataFrame.
    - Check the data types of the DataFrame's columns by printing them.
    - Output the shape of the DataFrame, as well as its head.

In [None]:
import pandas as pd

# Read the sales data into a DataFrame
sales_data = pd.read_parquet("sales_data.parquet", engine="fastparquet")

# Check the data types of the DataFrame's columns
print(sales_data.dtypes)

# Print the shape of the DataFrame, as well as the head
print(sales_data.shape)
print(sales_data.head())

## Pulling data from SQL databases

SQL databases are among the most widely used data storage tools in the world. Many companies have dedicated teams responsible for creating and maintaining these databases, which typically store data crucial to day-to-day operations. These SQL databases are commonly used as source systems for a wide range of data pipelines.

For this exercise, pandas has been imported as pd. Best of luck!

### Instructions
    - Update the connection URI to create a connection engine for the sales database, using sqlalchemy.
    - Query all rows and columns of the sales table and output the results.
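
A connection URI packs several pieces of information into one string: the database dialect and driver, the credentials, the host and port, and the database name. As a rough illustration, the sketch below splits the URI used in this exercise into those parts with the standard library's `urlsplit`. This is only a way to see the anatomy of the string; sqlalchemy does its own parsing, and the `uri_fields` dictionary is a hypothetical name introduced here.

```python
from urllib.parse import urlsplit

connection_uri = "postgresql+psycopg2://repl:password@localhost:5432/sales"

parts = urlsplit(connection_uri)

# The scheme has the form "dialect+driver"
dialect, _, driver = parts.scheme.partition("+")

uri_fields = {
    "dialect": dialect,
    "driver": driver,
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
}

for name, value in uri_fields.items():
    print(f"{name}: {value}")
```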

In [None]:
import sqlalchemy

# Create a connection to the sales database
db_engine = sqlalchemy.create_engine("postgresql+psycopg2://repl:password@localhost:5432/sales")

# Query the sales table
raw_sales_data = pd.read_sql("SELECT * FROM sales", db_engine)
print(raw_sales_data)
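
The query above needs a running PostgreSQL server. To try the same `pd.read_sql` pattern without one, the sketch below swaps in an in-memory SQLite database through Python's built-in `sqlite3` module, whose connections `pd.read_sql` also accepts. The table schema and rows are hypothetical and only stand in for the real sales table.

```python
import sqlite3

import pandas as pd

# Hypothetical rows standing in for the real sales table
sample_rows = [
    (1, "widget", 3, 9.99),
    (2, "gadget", 1, 24.50),
    (3, "widget", 2, 9.99),
]

# Create an in-memory SQLite database and populate a sales table
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER, product TEXT, quantity INTEGER, unit_price REAL)"
)
connection.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", sample_rows)
connection.commit()

# Query all rows and columns, as in the exercise
raw_sales_data = pd.read_sql("SELECT * FROM sales", connection)
print(raw_sales_data)

connection.close()
```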