To view the Zeppelin notebook shown in the demo without installing the Docker image, go to https://www.zepl.com/explore and paste the following link into the search bar: https://raw.githubusercontent.com/abeasock/open_source_demo/master/assets/loans.json
This repository contains the Dockerfile and assets used to build the Docker image shown during the open source demo.
The Dockerfile installs the following tools:
- Java
- Anaconda (Python 2.7)
- Apache Spark 2.2.0
- Apache Zeppelin
- H2O Sparkling Water
- Apache Superset
The assets folder contains:
- loans_updated.zip - contains loans_updated.csv (data prepped for Superset)
- flask_deployment_demo - folder containing the Flask app used to deploy the model built in loans.json
- superset_dashboard_loans.pickle - the dashboard built in Superset to visualize the loan data
- create_database.sh - shell script that unzips loans_updated.csv, creates the SQLite3 database lending_club.db, and runs create_table.sql to create the SQL table
- create_table.sql - creates the SQL table loans_v3 from loans_updated.csv
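The CSV-to-SQLite step described above can be sketched in Python for readers who want to see the idea without opening the shell script. This is a minimal illustration, not the contents of create_database.sh or create_table.sql: the function name `load_csv_into_sqlite` is hypothetical, it assumes Python 3 on the host, and it stores every column as TEXT because the real column types are not listed here.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table_name="loans_v3"):
    """Create table_name in db_path from a CSV file that has a header row.

    Assumption: every column is stored as TEXT. The real create_table.sql
    may declare different column types.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote identifiers so unusual column names do not break the SQL.
        columns = ", ".join(
            '"{}" TEXT'.format(name.replace('"', '""')) for name in header
        )
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute('DROP TABLE IF EXISTS "{}"'.format(table_name))
                conn.execute('CREATE TABLE "{}" ({})'.format(table_name, columns))
                conn.executemany(
                    'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders),
                    (row for row in reader if row),  # skip blank lines
                )
            count = conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(table_name)
            ).fetchone()[0]
        finally:
            conn.close()
    return count
```

Calling `load_csv_into_sqlite("loans_updated.csv", "lending_club.db")` would return the number of rows loaded, assuming the CSV has already been unzipped into the current directory.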
Additional Files:
- run_open_source_demo.bat - file to quick start the Docker image on Windows
- run_open_source_demo.sh - file to quick start the Docker image on Mac
Data:
The data used for the project is too large to upload to GitHub. The CSV and data dictionary are available for download on Kaggle:
Kaggle Lending Club Data. This is the CSV used in the loans Zeppelin notebook.
These instructions assume you have Docker installed (https://www.docker.com/) and basic knowledge of how to use it.
Steps to build a Docker image:
- Clone this repo
git clone https://github.com/abeasock/open_source_demo.git
or
Manually download by clicking the "Clone or Download" button above and then "Download Zip".
Unzip the folder and place it in the desired location. I renamed the folder "open_source_demo".
- Build the image
cd <directory containing the Dockerfile>
docker build -t open_source_demo .
Note: A path is a mandatory argument for the build command. I used "." because I navigated in the command line to the directory containing the Dockerfile in the previous step. I used the -t option to tag the image.
- Run the Docker image
cd <directory containing the run_open_source_demo file>
run_open_source_demo.bat (Windows) or run_open_source_demo.sh (Mac)
Zeppelin: http://localhost:19090
Superset: http://localhost:18088
Flask: http://localhost:15555
H2O Flow: http://localhost:54321
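To confirm that the container is up and the ports above are mapped, a quick check from the host can help. The sketch below assumes Python 3 on the host; the function names are hypothetical, and it only tests whether a TCP connection succeeds, not whether the service behind the port is healthy. Note that the Flask port only responds after the Flask app has been started.

```python
import socket

# Host ports listed above; the names are only labels for the result.
SERVICES = {
    "Zeppelin": 19090,
    "Superset": 18088,
    "Flask": 15555,
    "H2O Flow": 54321,
}


def is_port_open(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


def check_services(services=SERVICES):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: is_port_open(port) for name, port in services.items()}
```

`check_services()` returns a dictionary such as `{"Zeppelin": True, ...}`, which makes it easy to see which service has not started yet.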
To import loans.json, open Zeppelin in the browser and click "Import Note" on the home page. In the pop-up that appears, click "Add from URL", then enter a name for the note and the URL https://raw.githubusercontent.com/abeasock/open_source_demo/master/assets/loans.json
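The note can also be inspected outside Zeppelin, since it is a plain JSON file. The sketch below assumes Python 3 and assumes the file follows the usual Zeppelin note layout, with a top-level "name" and a "paragraphs" list whose entries carry a "text" field. The function names are hypothetical.

```python
import json
import urllib.request

NOTE_URL = (
    "https://raw.githubusercontent.com/abeasock/open_source_demo/"
    "master/assets/loans.json"
)


def summarize_note(note):
    """Return (note name, first line of each non-empty paragraph).

    Assumes the Zeppelin note layout: {"name": ..., "paragraphs": [{"text": ...}]}.
    """
    first_lines = []
    for paragraph in note.get("paragraphs", []):
        text = (paragraph.get("text") or "").strip()
        if text:
            first_lines.append(text.splitlines()[0])
    return note.get("name"), first_lines


def fetch_note(url=NOTE_URL, timeout=30.0):
    """Download a note and parse it as JSON (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `summarize_note(fetch_note())` would list the first line of each paragraph, which in Zeppelin is typically the interpreter directive.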
The Flask app can be started by running the following command:
/spark/bin/spark-submit --py-files /sparkling-water-2.2.6/py/build/dist/h2o_pysparkling_2.2-2.2.6.zip /assets/flask_deployment_demo/loan_app_demo.py
After running the command, open a browser and go to http://localhost:15555/
Username and password are set in the Dockerfile:
Username: Admin
Password: Admin
To access the Superset dashboard built for the loans data, you will need to follow the steps below:
- In the Docker container, execute the command:
/assets/create_database.sh
This will create a SQLite3 database named lending_club.db with a table named loans_v3.
- Open Superset and log in.
- Add lending_club.db to Superset by clicking "Sources" in the top banner > "Databases" > the plus sign in the upper right corner to add a new database. This will open the "Edit Database" page. Fill in:
Database: lending_club
SQLAlchemy URI: sqlite:////assets/lending_club.db
Click "Test Connection". A message will pop up if your connection is successful.
Click "Save" at the bottom.
- Add the table loans_v3 to Superset by clicking "Sources" in the top banner > "Tables" > the plus sign in the upper right corner to add a new table. This will open the "Add Table" page. Fill in:
Database: lending_club
Table Name: loans_v3
A message should appear on the page confirming that the table was created.
- Add the saved dashboard used in the demo by clicking "Manage" in the top banner > "Import Dashboards", choosing the file open_source_demo/assets/superset_dashboard_loans.pickle (from the location where the repository was downloaded locally), and clicking "Upload".
The dashboard for Lending Club should now be available under Dashboards. Click on it to view (it may take a few minutes for all of the slices in the dashboard to load).
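If "Test Connection" fails in the steps above, two things are worth checking: the SQLAlchemy URI and the presence of the table. In a SQLite URI, "sqlite:///" is followed by the file path, so an absolute path such as /assets/lending_club.db yields four slashes in total. The sketch below assumes Python 3 and a POSIX-style path (as inside the container); the helper names are hypothetical.

```python
import os
import sqlite3


def sqlite_uri(db_path):
    """Build a SQLAlchemy-style SQLite URI for an absolute POSIX path."""
    abs_path = os.path.abspath(db_path)
    # "sqlite:///" plus a path starting with "/" gives four slashes.
    return "sqlite:///" + abs_path


def table_exists(db_path, table_name):
    """Return True if db_path is an existing SQLite file containing table_name."""
    if not os.path.isfile(db_path):
        # Avoid sqlite3.connect here: it would create an empty database file.
        return False
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        ).fetchone()
    finally:
        conn.close()
    return row is not None
```

Inside the container, `table_exists("/assets/lending_club.db", "loans_v3")` returning False suggests that create_database.sh has not been run yet or did not complete.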
Preview of dashboard: