failed-bank-dataset

This project builds a dataset from the FDIC Failed Bank List. The dataset is published on Kaggle as failed-bank-list.

Installation

  1. Clone the Repository:

    git clone https://github.com/RonMallory/failed-bank-dataset.git
    cd failed-bank-dataset
  2. Set up with Poetry:

    Ensure you have Poetry installed, then run:

    poetry install

    This command installs the dependencies specified in pyproject.toml.

Pre-Commit Hooks

This project uses pre-commit to maintain code quality and consistency. Install the hooks locally with:

    pre-commit install

Publishing Kaggle Dataset

The initial dataset is published manually; subsequent updates are published by GitHub Actions.

Initial Publish

  1. Create an API token from your Kaggle account settings. This produces the kaggle.json file used later.
  2. Run the following command to generate the dataset.csv and kaggle-metadata.json files:

    poetry run python src/main.py
  3. Run the following command to publish the dataset to Kaggle:

    kaggle datasets create -p ./data
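
For orientation, the metadata file generated in step 2 can be sketched as below. This is a minimal illustration, not the code in src/main.py: the function name `write_metadata`, its parameters, and the `CC0-1.0` license value are assumptions made for the example. The Kaggle CLI documentation describes a metadata file named dataset-metadata.json, so the file name is left as a parameter in case this project uses a different one.

```python
import json
from pathlib import Path


def write_metadata(data_dir, owner, slug, title, filename="dataset-metadata.json"):
    """Write a minimal Kaggle dataset metadata file and return its path.

    Hypothetical helper: the field values are placeholders, and the license
    name is an assumption, not this project's actual dataset license.
    """
    metadata = {
        "title": title,
        # Kaggle identifies a dataset as "<owner>/<slug>".
        "id": f"{owner}/{slug}",
        "licenses": [{"name": "CC0-1.0"}],
    }
    path = Path(data_dir) / filename
    # Create the data directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

Calling `write_metadata("./data", "your-username", "failed-bank-list", "Failed Bank List")` would place the file next to dataset.csv, which is the directory passed to `kaggle datasets create -p`.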

Updating the Dataset with GitHub Actions

  1. Using the values in the kaggle.json file created during Initial Publish, create two GitHub secrets named KAGGLE_USERNAME and KAGGLE_KEY.
  2. Once a pull request has been approved and merged into the main branch, the GitHub Action runs and updates the dataset.
    1. The ci.yml workflow uses the commit message to annotate the new dataset version with the changes made.
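
The update flow above can be approximated in Python as follows. This is a sketch under stated assumptions rather than the contents of ci.yml: the helper names are hypothetical, and it assumes the workflow exposes the two secrets as the environment variables KAGGLE_USERNAME and KAGGLE_KEY and then calls the Kaggle CLI's `kaggle datasets version` command with the commit message as the version note.

```python
import json
import os
import subprocess

REQUIRED_VARS = ("KAGGLE_USERNAME", "KAGGLE_KEY")


def secrets_from_kaggle_json(text):
    """Map the contents of kaggle.json to the two secret names."""
    token = json.loads(text)
    return {"KAGGLE_USERNAME": token["username"], "KAGGLE_KEY": token["key"]}


def missing_credentials(env):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_version_command(data_dir, message):
    """Build the CLI call that uploads a new dataset version."""
    return ["kaggle", "datasets", "version", "-p", data_dir, "-m", message]


def publish_update(data_dir, message):
    """Fail early on missing credentials, then run the Kaggle CLI."""
    missing = missing_credentials(os.environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    subprocess.run(build_version_command(data_dir, message), check=True)
```

Passing the command as a list avoids shell quoting problems when the commit message contains spaces or quotes. `publish_update("./data", "Refresh failed bank list")` would run the upload only when both variables are set.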

Contributing

  1. Fork the project.
  2. Create a branch based on the DSLP strategy: git checkout -b feature/new-feature
  3. Commit your changes: git commit -am 'Add new feature'
  4. Push to the branch: git push origin feature/new-feature
  5. Submit a pull request against the appropriate DSLP branch.

License

This project is licensed under the MIT License.
