# Midterm Project: ETL and Data Warehousing
Madelyn Khoury (mgk5ybb) and Tiara Allard

DS 2002 Spring 2023

### Design and Strategy

We chose bank transactions as the core business process of our data warehouse and designed the database schema around them. The schema is shown in the README of our GitHub project.

Our schema stores information about bank transactions, the bank accounts and users involved in them, transaction dates, and transaction locations. We could not find a single database or dataset containing all of this information, so we combined data from several sources. This had the added benefit of meeting the project requirement to import data from multiple kinds of sources.

We combined several dummy (randomly generated) datasets to build a complete data warehouse. First, we loaded bank transaction and account information from a .csv file stored on our local filesystem. Then, we generated user data from an API and linked it to the accounts and transactions. Finally, we imported location information from the Northwind MySQL database to represent regions in which banking transactions might have occurred.
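
As a rough illustration of the linking step, the sketch below attaches generated users to accounts by cycling through the available user IDs and then joining on that key. The column names (`account_id`, `user_id`, `balance`, `first_name`) and the round-robin assignment are assumptions made for this example, not necessarily the approach used in the project.

```python
import pandas as pd

# Hypothetical stand-ins for the account data (from the .csv file) and the
# user data (from the API). All column names here are assumptions.
accounts = pd.DataFrame({
    "account_id": [101, 102, 103],
    "balance": [250.0, 90.5, 1200.0],
})
users = pd.DataFrame({
    "user_id": [1, 2],
    "first_name": ["Ada", "Grace"],
})

# Assign each account a user by cycling through the available user IDs.
accounts["user_id"] = [
    users["user_id"].iloc[i % len(users)] for i in range(len(accounts))
]

# Join the user attributes onto the accounts via the shared key.
linked = accounts.merge(users, on="user_id", how="left")
print(linked)
```

Because the users are randomly generated, any deterministic assignment would serve equally well here; the only requirement is that every account ends up with a valid `user_id`.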

After processing the data and computing useful fields, we formatted it into our fact and dimension tables in the final data warehouse.
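
The split into fact and dimension tables can be sketched as follows. This is a minimal example under assumed column names (`transaction_id`, `amount`, `city`, `country`) and an assumed surrogate key `location_key`; the actual tables in our warehouse follow the schema in the README.

```python
import pandas as pd

# Hypothetical flat table of processed transactions (column names assumed).
flat = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "amount": [20.0, 35.5, 12.25],
    "city": ["Seattle", "London", "Seattle"],
    "country": ["USA", "UK", "USA"],
})

# Dimension table: one row per distinct location, with a surrogate key.
dim_location = flat[["city", "country"]].drop_duplicates().reset_index(drop=True)
dim_location.insert(0, "location_key", dim_location.index + 1)

# Fact table: keep the measures and replace the descriptive location
# columns with a foreign key into the dimension table.
fact_transactions = flat.merge(dim_location, on=["city", "country"], how="left")[
    ["transaction_id", "location_key", "amount"]
]

print(dim_location)
print(fact_transactions)
```

Keeping descriptive attributes in the dimension table means each location is stored once, while the fact table stays narrow and holds only keys and numeric measures.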

### Imports and Helper Functions

In [2]:
import json
import pandas as pd
import requests
import os
import datetime
import matplotlib.pyplot as plt

def get_api_response(url, headers, params, response_type):
    """Send a GET request and return the body as parsed JSON or a DataFrame.

    If the request fails, an error-message string is returned instead of
    raising an exception.
    """
    try:
        # Without a timeout, requests can wait indefinitely for a response.
        response = requests.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()

    except requests.exceptions.HTTPError as errh:
        return "An Http Error occurred: " + repr(errh)
    except requests.exceptions.ConnectionError as errc:
        return "An Error Connecting to the API occurred: " + repr(errc)
    except requests.exceptions.Timeout as errt:
        return "A Timeout Error occurred: " + repr(errt)
    except requests.exceptions.RequestException as err:
        return "An Unknown Error occurred: " + repr(err)

    if response_type == 'json':
        # Return the parsed JSON (dicts and lists)
        result = response.json()
    elif response_type == 'dataframe':
        # Flatten nested JSON into columns with dotted names (e.g. address.state)
        result = pd.json_normalize(response.json())
    else:
        result = "Unsupported response_type: " + repr(response_type)

    return result
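
Because the helper reports failures by returning a string rather than raising, code that calls it has to check what it received before treating the result as data. A minimal sketch of such a check is shown below; `unwrap_api_result` is a hypothetical name introduced for this example and is not part of the notebook.

```python
import pandas as pd

def unwrap_api_result(result):
    """Raise if `result` is an error-message string; otherwise return it unchanged.

    Hypothetical helper: assumes that a plain string result always signals
    a failure, which holds for the 'dataframe' response type.
    """
    if isinstance(result, str):
        raise RuntimeError(result)
    return result

# A DataFrame passes through untouched.
ok = unwrap_api_result(pd.DataFrame({"id": [1]}))

# An error string is turned into an exception the caller can handle.
try:
    unwrap_api_result("A Timeout Error occurred: example")
except RuntimeError as exc:
    message = str(exc)
    print("Request failed:", message)
```

Failing early this way avoids confusing downstream errors, such as trying to select columns from what is actually an error message.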

### Importing Data From API

We've chosen to use the `users` endpoint from random-data-api.com, which randomly generates data for a set of users. This will populate the Users table in our data warehouse.

In [7]:
size = 5 # only get info on 5 users for now
url = "https://random-data-api.com/api/v2/users"
querystring = {"size":size}
headers = None

# Get information from the users API endpoint.
# Note: with "dataframe", the result is a flattened pandas DataFrame, not raw JSON.
users_json = get_api_response(url, headers, querystring, "dataframe")
users_json


| | id | uid | password | first_name | last_name | username | email | avatar | gender | phone_number | ... | address.zip_code | address.state | address.country | address.coordinates.lat | address.coordinates.lng | credit_card.cc_number | subscription.plan | subscription.status | subscription.payment_method | subscription.term |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1356 | f402ab30-70a7-4aa6-afc2-f875827ebd75 | 4oMzQWbmgp | Randy | Grant | randy.grant | randy.grant@email.com | https://robohash.org/esteumperspiciatis.png?si... | Polygender | +251 862.003.5280 x3399 | ... | 89974-1241 | Georgia | United States | -13.866172 | -88.814594 | 4451027778589 | Basic | Blocked | Apple Pay | Monthly |
| 1 | 2752 | f3486f90-6abf-4e4a-86d0-93f28ecbe982 | V4R5cy1N3v | Katherin | Batz | katherin.batz | katherin.batz@email.com | https://robohash.org/nequeatest.png?size=300x3... | Male | +236 1-193-626-5899 x3115 | ... | 59001-1931 | New York | United States | -12.687525 | 150.051687 | 6771-8960-5042-0764 | Bronze | Idle | Alipay | Payment in advance |
| 2 | 9162 | eafbb5a2-91b9-43ee-9a32-c1502c372d99 | jh4neOcg0H | Christian | O'Conner | christian.o'conner | christian.o'conner@email.com | https://robohash.org/perspiciatisaliquamenim.p... | Female | +48 209-250-0233 x1375 | ... | 73444-4693 | Arizona | United States | 69.483611 | 102.687136 | 6771-8930-6423-0842 | Standard | Active | Cash | Monthly |
| 3 | 372 | 43fa0dad-0813-48cf-923d-5bdfbf753e5a | 8e1z0CXTOt | Domingo | Kreiger | domingo.kreiger | domingo.kreiger@email.com | https://robohash.org/enimdoloredolor.png?size=... | Polygender | +1-868 (524) 203-8066 | ... | 61945 | Louisiana | United States | -21.620289 | -119.800074 | 5382-6597-7546-7491 | Platinum | Idle | Debit card | Annual |
| 4 | 6556 | 38b45b85-b6c8-42b8-85c6-3054e53ea0ea | bqOvuRZGm1 | Corrinne | Halvorson | corrinne.halvorson | corrinne.halvorson@email.com | https://robohash.org/officiaeumatque.png?size=... | Bigender | +689 1-711-411-3668 x6053 | ... | 53425-9105 | Oregon | United States | 12.491353 | -109.209017 | 5155-6765-6192-2484 | Silver | Idle | Apple Pay | Full subscription |
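
The dotted column names in this output (for example `address.coordinates.lat`) come from `pd.json_normalize`, which flattens nested JSON objects into one column per leaf value. The sketch below shows this on a single hand-written record and then selects a few columns for a users table. The record's values and the choice and renaming of columns are assumptions for illustration only.

```python
import pandas as pd

# A hand-written record shaped like one user from the API (values invented).
record = {
    "id": 1,
    "first_name": "Ada",
    "address": {
        "state": "Virginia",
        "coordinates": {"lat": 38.03, "lng": -78.48},
    },
}

# Nested keys are joined with "." to form the flattened column names.
users_df = pd.json_normalize([record])
print(list(users_df.columns))

# Keep only the columns wanted for a users table (selection is an assumption)
# and give them simpler names.
users_dim = users_df[["id", "first_name", "address.state"]].rename(
    columns={"id": "user_id", "address.state": "state"}
)
print(users_dim)
```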
