Add default config file to src/ #37

Closed
ohenrik opened this issue Jul 8, 2016 · 3 comments

Comments

@ohenrik
Contributor

ohenrik commented Jul 8, 2016

Hi

Should we add a src/config.py or src/settings.py file? I believe this would make it easier to get paths to folders etc. in make_data.py for example.

# src/config.py
""" Storing config variables and other settings"""
import os
from os.path import abspath, dirname, join
from dotenv import load_dotenv
import inspect

dotenv_path = join(dirname(__file__), '../.env')
load_dotenv(dotenv_path)

class ParamConfig:
    """Config variables for the project"""
    def __init__(self):
        self.kaggle_username = os.environ.get("KAGGLE_USERNAME")
        self.kaggle_password = os.environ.get("KAGGLE_PASSWORD")
        self.config_dir = dirname(abspath(inspect.getfile(inspect.currentframe())))
        self.root_dir = dirname(self.config_dir)

        # Data directories
        self.data_dir = os.path.join(self.root_dir, 'data')
        self.raw_data_dir = os.path.join(self.data_dir, 'raw')
        self.processed_data_dir = os.path.join(self.data_dir, 'processed')

config = ParamConfig()

I can then import the config variable like so:

# Selective excerpt from src/data/make_data.py as an example
import logging
from os import path

import pandas as pd

from src.config import config

def main(output_zip=False):
    """Create data!"""
    logger = logging.getLogger(__name__)
    logger.info('making final data set from raw data')

    # compression = 'gzip' if output_zip is True else

    # Read raw data (auto unzipping files!)
    train_sales = pd.read_csv(path.join(config.raw_data_dir, 'train.csv.zip'))
    test_sales = pd.read_csv(path.join(config.raw_data_dir, 'test.csv.zip'))
    stores = pd.read_csv(path.join(config.raw_data_dir, 'store.csv.zip'),
                         dtype={'CompetitionOpenSinceYear': str,
                                'CompetitionOpenSinceMonth': str,
                                'Promo2SinceWeek': str,
                                'Promo2SinceYear': str,})

However, note that importing the settings this way also requires changing the Makefile from this:

data:
    python src/data/make_dataset.py

to this:

data:
    python -m src.data.make_dataset

I'm not sure whether this has any downsides. An alternative is to add src and/or the settings file to the Python path.
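
A minimal sketch of that Python-path alternative, assuming the script sits at src/data/make_dataset.py so that the repository root is two levels above the script's folder. The helper name add_project_root_to_path is hypothetical and not part of the template:

```python
import sys
from pathlib import Path


def add_project_root_to_path(script_file: str, levels_up: int = 2) -> Path:
    """Put the directory `levels_up` levels above the script's folder on sys.path."""
    # parents[0] is the script's folder, parents[1] is src/, parents[2] is the root
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

Calling add_project_root_to_path(__file__) at the top of the script, before any `from src... import ...` line, would let the Makefile keep invoking the script by file path. The trade-off is that every entry-point script has to repeat the call.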

I'm still learning both Python and data science, so please bear with me if what I'm suggesting or my code is silly :)

@pjbull
Collaborator

pjbull commented Jul 9, 2016

Hey @ohenrik thanks for the thoughts. Generally, we want anything that varies from machine to machine to be stored in the environment. Check out the 12-factor app's section on configuration for more reasons why.

Projects with a large number of settings do grow into having a separate importable settings or config module (e.g., a django application). My inclination is to keep things simple by default and not roll our own settings module that would need development/support/documentation at the base level of the cookiecutter.

Happy to consider if there are really compelling use cases for needing a config.py as a default, but I think environment variables will cover most of them.
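
A minimal sketch of the environment-only approach, assuming settings are read directly with os.environ and no settings module is kept. The helper name get_setting is hypothetical. KAGGLE_USERNAME comes from the example earlier in this thread, and DATA_DIR is an illustrative variable name only:

```python
import os
from typing import Mapping, Optional


def get_setting(name: str, default: Optional[str] = None,
                environ: Mapping[str, str] = os.environ) -> str:
    """Return an environment variable, or the default; fail loudly if neither is set."""
    value = environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Example usage (commented out because it depends on the machine's environment):
# username = get_setting("KAGGLE_USERNAME")             # required, no default
# data_dir = get_setting("DATA_DIR", default="data")    # hypothetical variable
```

Passing a plain dict as environ makes the helper easy to exercise without touching the real environment.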

@epogrebnyak

epogrebnyak commented Mar 12, 2018

Hope the question fits here... what about a helper module or src/settings.py to access the data folder?

In src/data/make_data.py you have:

# not used in this stub but often useful for finding various files
project_dir = os.path.join(os.path.dirname(__file__), os.pardir, os.pardir)

I use something like the following to locate the repo root folder:

from pathlib import Path
DATA_PATH = Path(__file__).parents[2] / 'data'

def make_data_path(folder: str, file_name: str) -> str:
    folder = DATA_PATH / folder
    if not folder.exists():
        folder.mkdir(parents=True)
    return str(folder / file_name)

Since the data directory structure is already in the template, maybe the path to it should be in src/settings.py too? To me it seems like a good convenience feature.

@isms
Collaborator

isms commented Apr 15, 2019

Closing as possible in the future, but based on participation in this issue it's not a commonly desired feature (or is easily slotted in on a per-project basis).

@isms closed this as completed Apr 15, 2019
4 participants