# Midterm Project: ETL and Data Warehousing
Madelyn Khoury (mgk5ybb) and Tiara Allard

DS 2002 Spring 2023

### Design and Strategy

We chose bank transactions as the core business process of our data warehouse and designed our database schema around them. The schema diagram is in the README of our GitHub project.

Our schema stores information about bank transactions, the bank accounts and users involved in them, transaction dates, and transaction locations. We could not find a single dataset containing all of this information, so we combined data from several sources. This also allowed us to meet the project requirement of importing data from multiple sources.

We combined several dummy or randomly generated datasets to build a complete data warehouse. First, we loaded bank transaction and account information from an Excel file (`bank.xlsx`) stored on our local filesystem. Then, we generated user data from an API and linked it to the accounts and transactions. Finally, we imported location information from the Northwind MySQL database to represent regions in which banking transactions might have occurred.

After processing the data and computing useful fields, we formatted it into our fact and dimension tables in the final data warehouse.
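As a rough illustration of what "fact and dimension tables" means here, the sketch below builds a small date dimension and a transaction fact table with pandas. The sample rows, the column names (`account_no`, `deposit_amt`, `withdrawal_amt`, `net_amt`, `date_key`), and the table names are assumptions made for this example, not the schema actually used in the project.

```python
import pandas as pd

# Hypothetical sample rows; values are illustrative only.
transactions = pd.DataFrame({
    "account_no": ["A1", "A1", "B2"],
    "date": pd.to_datetime(["2017-06-29", "2017-07-05", "2017-06-29"]),
    "deposit_amt": [1000.0, 0.0, 250.0],
    "withdrawal_amt": [0.0, 400.0, 0.0],
})

# Date dimension: one row per distinct calendar date, with a surrogate key.
dim_date = (
    transactions[["date"]]
    .drop_duplicates()
    .sort_values("date")
    .reset_index(drop=True)
)
dim_date["date_key"] = dim_date.index + 1
dim_date["year"] = dim_date["date"].dt.year
dim_date["month"] = dim_date["date"].dt.month

# Fact table: swap the raw date for its surrogate key and add a computed measure.
fact_transactions = transactions.merge(dim_date[["date", "date_key"]], on="date")
fact_transactions["net_amt"] = (
    fact_transactions["deposit_amt"] - fact_transactions["withdrawal_amt"]
)
fact_transactions = fact_transactions.drop(columns="date")

print(dim_date)
print(fact_transactions)
```

Keeping descriptive attributes (year, month) in the dimension and only keys and measures in the fact table is the usual star-schema split; the same pattern would apply to account, user, and location dimensions.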

### Imports and Helper Functions

In [15]:
import sys
!{sys.executable} -m pip install openpyxl



In [2]:
import json
import pandas as pd
import requests
import os
import datetime
import matplotlib.pyplot as plt

def get_api_response(url, headers, params, response_type):
    """Send a GET request and return the body as parsed JSON or a DataFrame.

    On a request failure, returns a string describing the error instead of raising.
    """
    try:
        # Without an explicit timeout, requests can wait indefinitely for a response.
        response = requests.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()

    except requests.exceptions.HTTPError as errh:
        return "An Http Error occurred: " + repr(errh)
    except requests.exceptions.ConnectionError as errc:
        return "An Error Connecting to the API occurred: " + repr(errc)
    except requests.exceptions.Timeout as errt:
        return "A Timeout Error occurred: " + repr(errt)
    except requests.exceptions.RequestException as err:
        return "An Unknown Error occurred: " + repr(err)

    if response_type == 'json':
        # result = json.dumps(response.json(), sort_keys=True, indent=4)
        result = response.json()
    elif response_type == 'dataframe':
        # Flatten nested JSON objects into dotted column names
        result = pd.json_normalize(response.json())
    else:
        # Reached when response_type is neither 'json' nor 'dataframe'
        result = "An unhandled error has occurred!"

    return result
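
Because `get_api_response` returns a message string rather than raising when a request fails, calling code has to tell an error apart from real data. The sketch below shows one way to do that. The helper `is_error_result` is hypothetical (it is not part of the notebook), and the two sample values are stand-ins, so no network call is made.

```python
import pandas as pd


def is_error_result(result):
    """Hypothetical helper: True if the value is one of the error message strings."""
    return isinstance(result, str)


# Stand-ins for the two kinds of return value.
failed = "A Timeout Error occurred: ReadTimeout()"
succeeded = pd.DataFrame([{"id": 1, "first_name": "Zona"}])

for result in (failed, succeeded):
    if is_error_result(result):
        print("Request failed:", result)
    else:
        print("Received", len(result), "row(s)")
```

This check assumes that a successful response is never itself a plain string, which holds for the `'dataframe'` response type and for JSON bodies that are objects or arrays.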

### Importing Data From Local File System

The core bank transaction information that we use comes from a dataset on Kaggle (https://www.kaggle.com/datasets/apoorvwatsky/bank-transaction-data). We downloaded the data as an Excel file (`bank.xlsx`) and import it from the local filesystem for use in our data warehouse.
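
If the notebook's working directory is not the folder that holds the file, the path can be resolved explicitly before reading. This is a minimal sketch: `DATA_DIR` is a hypothetical variable, and it assumes the workbook sits in the current working directory unless changed.

```python
from pathlib import Path

import pandas as pd

# Assumption: the workbook sits in the current working directory; adjust DATA_DIR otherwise.
DATA_DIR = Path.cwd()
data_path = (DATA_DIR / "bank.xlsx").resolve()

if data_path.exists():
    # Reading .xlsx files with pandas requires the openpyxl package.
    transaction_info = pd.read_excel(data_path)
else:
    print(f"File not found: {data_path}")
```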

In [16]:
"""
If you don't have Jupyter file explorer settings configured to start in the notebook's current directory,
you'll need to replace this with the absolute path to the file
"""
transaction_info = pd.read_excel('bank.xlsx')

In [17]:
transaction_info

|  | Account No | DATE | TRANSACTION DETAILS | CHQ.NO. | VALUE DATE | WITHDRAWAL AMT | DEPOSIT AMT | BALANCE AMT | . |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 409000611074' | 2017-06-29 | TRF FROM Indiaforensic SERVICES |  | 2017-06-29 |  | 1000000.0 | 1.000000e+06 | . |
| 1 | 409000611074' | 2017-07-05 | TRF FROM Indiaforensic SERVICES |  | 2017-07-05 |  | 1000000.0 | 2.000000e+06 | . |
| 2 | 409000611074' | 2017-07-18 | FDRL/INTERNAL FUND TRANSFE |  | 2017-07-18 |  | 500000.0 | 2.500000e+06 | . |
| 3 | 409000611074' | 2017-08-01 | TRF FRM Indiaforensic SERVICES |  | 2017-08-01 |  | 3000000.0 | 5.500000e+06 | . |
| 4 | 409000611074' | 2017-08-16 | FDRL/INTERNAL FUND TRANSFE |  | 2017-08-16 |  | 500000.0 | 6.000000e+06 | . |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 116196 | 409000362497' | 2019-03-05 | TRF TO 1196428 Indiaforensic SE |  | 2019-03-05 | 117934.30 |  | -1.901902e+09 | . |
| 116197 | 409000362497' | 2019-03-05 | FDRL/INTERNAL FUND TRANSFE |  | 2019-03-05 |  | 300000.0 | -1.901602e+09 | . |
| 116198 | 409000362497' | 2019-03-05 | FDRL/INTERNAL FUND TRANSFE |  | 2019-03-05 |  | 300000.0 | -1.901302e+09 | . |
| 116199 | 409000362497' | 2019-03-05 | IMPS 05-03-20194C |  | 2019-03-05 | 109868.65 |  | -1.901412e+09 | . |
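
The output above shows a few quirks in the raw data: account numbers end with an apostrophe, there is a column named `.`, and each row leaves either the withdrawal or the deposit amount empty. The sketch below shows one possible way to tidy these with pandas. It runs on a hypothetical two-row sample with the same column names and is not necessarily the cleaning applied in this project.

```python
import pandas as pd

# Hypothetical two-row sample using the column names from the imported data.
sample = pd.DataFrame({
    "Account No": ["409000611074'", "409000362497'"],
    "WITHDRAWAL AMT": [None, 117934.30],
    "DEPOSIT AMT": [1000000.0, None],
    ".": [".", "."],
})

# Drop the placeholder column named "."
cleaned = sample.drop(columns=".")

# Remove the trailing apostrophe from each account number
cleaned["Account No"] = cleaned["Account No"].str.rstrip("'")

# Treat a missing withdrawal or deposit as an amount of zero
amount_cols = ["WITHDRAWAL AMT", "DEPOSIT AMT"]
cleaned[amount_cols] = cleaned[amount_cols].fillna(0.0)

print(cleaned)
```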


### Importing Data From API

We've chosen to use the `users` endpoint from random-data-api.com, which randomly generates data for a set of users. This will populate the Users table in our data warehouse.

In [10]:
size = 5 # only get info on 5 users for now
url = "https://random-data-api.com/api/v2/users"
querystring = {"size":size}
headers = None

# Get information from users API endpoint
users_json = get_api_response(url, headers, querystring, "dataframe")
users_json


|  | id | uid | password | first_name | last_name | username | email | avatar | gender | phone_number | ... | address.zip_code | address.state | address.country | address.coordinates.lat | address.coordinates.lng | credit_card.cc_number | subscription.plan | subscription.status | subscription.payment_method | subscription.term |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7736 | 67ff8957-aa2b-4838-bf2c-040efa010b2a | cdOST4kaFs | Zona | Abbott | zona.abbott | zona.abbott@email.com | https://robohash.org/architectoadipiscidolorum... | Bigender | +240 326.231.5883 x44560 | ... | 18175-2541 | Pennsylvania | United States | 58.961226 | -45.302817 | 4997677902168 | Premium | Blocked | Cash | Annual |
| 1 | 580 | ed9a010d-83f7-4ecb-b3f7-81b2a7808337 | R8QDasqTkJ | Jonathon | Stanton | jonathon.stanton | jonathon.stanton@email.com | https://robohash.org/nihilveniamculpa.png?size... | Female | +502 199.707.3598 x729 | ... | 09467 | Maryland | United States | 46.56055 | 165.426246 | 4633225112030 | Basic | Active | Money transfer | Monthly |
| 2 | 1123 | cd9201cf-8857-414b-9dc9-6295af86341c | uG9qBHZJsn | Marth | Fritsch | marth.fritsch | marth.fritsch@email.com | https://robohash.org/maioresvoluptatevitae.png... | Female | +268 1-969-998-8562 | ... | 85096 | Vermont | United States | -41.173682 | -142.810297 | 6771-8990-3665-9505 | Business | Pending | Alipay | Annual |
| 3 | 7041 | 7968406d-8e5b-4c61-b6f5-722a863b908c | O17pBVQR9q | Celina | Haley | celina.haley | celina.haley@email.com | https://robohash.org/euminquis.png?size=300x30... | Genderfluid | +387 (955) 767-3513 x92649 | ... | 19736-1961 | Florida | United States | 37.446838 | -86.115315 | 5242-3734-2890-5812 | Essential | Active | Google Pay | Monthly |
| 4 | 6497 | 7709d336-859b-4ab8-886e-210b50a3351f | XPb9hckL30 | Eliseo | Bruen | eliseo.bruen | eliseo.bruen@email.com | https://robohash.org/rationequisquamrerum.png?... | Agender | +356 750.380.4363 x0998 | ... | 61575 | South Dakota | United States | -23.3595 | -100.98436 | 6771-8955-0082-6463 | Basic | Pending | WeChat Pay | Payment in advance |
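
The dotted column names in this output (for example `address.coordinates.lat`) come from `pd.json_normalize`, which flattens nested JSON objects into one column per leaf value. The sketch below demonstrates this on a hand-written record that only imitates the shape of a user object and is not real API output. The final column selection and the `users_dim` name are illustrative assumptions, not the project's actual Users table.

```python
import pandas as pd

# Hand-written record imitating the nested shape of a user object.
record = {
    "id": 1,
    "first_name": "Zona",
    "address": {
        "state": "Pennsylvania",
        "coordinates": {"lat": 58.96, "lng": -45.30},
    },
}

# Nested keys are joined with "." to form flat column names.
flat = pd.json_normalize([record])
print(list(flat.columns))

# Illustrative selection of a few columns for a users dimension.
users_dim = flat[["id", "first_name", "address.state"]].rename(
    columns={"address.state": "state"}
)
print(users_dim)
```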
