initial commit: modified scala etls to accept Fannie Mae data (#191)
* initial commit: modified scala etls to accept Fannie Mae data
* updated pyspark etls to consume raw mortgage data
* updated pyspark application docs
* updated scala spark application docs
* updated MortgageETL.ipynb notebook
* fixed accuracy issue in scala and python ETLs
* tested MortgageETL
* added scala notebook etl
* updated MortgageETL+XGBoost.ipynb
* fix bugs in docs
* update docs to reflect fannie mae data
* updated ipynb files with future links
* link updated
* remove maxPartitionBytes
* added code to save train and test datasets
* removed incompatibleOps
* resolved spark/rapids configs
* added instructions to download dataset
* modified readme files to reflect config changes
* fixed a bug in utility Mortgage.scala
* fixed scala application bus
* fixed python spark application bugs
* added cpu etl section in readMe
* fixed scala notebooks
* fixed python notebooks
* added step to run on CPU in scala notebook etl
* fixed cv scala notebook bug
* improve documentation
* read data from disk before random split

Each change above is Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
1 parent 8c4809c, commit ac355c0
Showing 24 changed files with 2,228 additions and 1,531 deletions.
Binary file not shown.
@@ -0,0 +1,22 @@
# How to download the Mortgage dataset

## Steps to download the data

1. Go to the [Fannie Mae](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data) website.
2. Click on [Single-Family Loan Performance Data](https://datadynamics.fanniemae.com/data-dynamics/?&_ga=2.181456292.2043790680.1657122341-289272350.1655822609#/reportMenu;category=HP).
    * Register as a new user if you are using the website for the first time.
    * Use your credentials to log in.
3. Select [HP](https://datadynamics.fanniemae.com/data-dynamics/#/reportMenu;category=HP).
4. Click on **Download Data** and choose *Single-Family Loan Performance Data*.
5. You will find a table of 'Acquisition and Performance' files sorted by year and quarter. Click a file to download it, e.g. `2017Q1.zip`.
6. Unzip the downloaded file to extract the CSV file, e.g. `2017Q1.csv`.
7. Copy only the CSV files to a new folder for the ETL to read.
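
Steps 6 and 7 can also be scripted. The following is a minimal sketch, not part of the repository: the function name `extract_csvs` and the folder names in the usage note are hypothetical, and it assumes each downloaded archive is a regular `.zip` file that contains one or more `.csv` members.

```python
import shutil
import zipfile
from pathlib import Path


def extract_csvs(download_dir, etl_input_dir):
    """Copy every .csv member of each .zip in download_dir into etl_input_dir.

    Hypothetical helper for illustration; both folder arguments are assumptions.
    Returns the list of CSV paths that were written.
    """
    src = Path(download_dir)
    dst = Path(etl_input_dir)
    dst.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(src.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.infolist():
                name = Path(member.filename).name
                # Skip directories and anything that is not a CSV file.
                if member.is_dir() or not name.lower().endswith(".csv"):
                    continue
                target = dst / name
                # Write the file flat into the target folder so the ETL
                # input folder contains only CSV files, with no subfolders.
                with zf.open(member) as fin, open(target, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
                extracted.append(target)
    return extracted
```

For example, `extract_csvs("downloads", "mortgage_csv")` would read `downloads/2017Q1.zip` and write `mortgage_csv/2017Q1.csv`; the second folder is then the one to point the ETL at.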

## Notes
1. Refer to the [Loan Performance Data Tutorial](https://capitalmarkets.fanniemae.com/media/9066/display) for more details.
2. *Single-Family Loan Performance Data* has two components. The Mortgage ETL requires only the first one (the primary dataset):
    * Primary Dataset: Acquisition and Performance Files
    * HARP Dataset
3. See the [Resources](https://datadynamics.fanniemae.com/data-dynamics/#/resources/HP) section to learn more about the dataset.