AravindhPandiyan/Text-processing

text_classification

Tools used in this project

Project Structure

.
├── config                      
│   ├── main.yaml                   # Main configuration file
│   ├── model                       # Configurations for training model
│   │   └── model1.yaml             # First variation of parameters to train the model
│   └── process                     # Configurations for processing data
│       └── process1.yaml           # First variation of parameters to process the data
├── data            
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   └── raw                         # raw data
├── docs                            # documentation for your project
├── .gitignore                      # files that should not be committed to Git
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies for poetry
├── README.md                       # describe your project
├── requirements.txt                # dependencies for pip
└── src                             # store source code
    ├── __init__.py                 # make src a Python package
    ├── process.py                  # process data before training model
    ├── train_model.py              # train model
    └── utils.py                    # store helper functions

Set up the environment

  1. Install Poetry
  2. Activate the virtual environment:
       poetry shell
  3. Install dependencies:
     • To install all dependencies from pyproject.toml, run:
         poetry install
     • To install only production dependencies, run:
         poetry install --only main
     • To install a new package, run:
         poetry add <package-name>

View and alter configurations

To view the configurations associated with a Python script, run the following command:

python src/process.py --help

Output:

process is powered by Hydra.

  == Configuration groups ==
  Compose your configuration from those groups (group=option)

model: model1
process: process1


  == Config ==
  Override anything in the config (foo.bar=value)

process:
  use_columns: sentence
  batch_size: 16
model:
  name: Logistic regression
  parameters:
    steps: 200
data:
  raw:
    train: ../data/raw/train.parquet
    val: ../data/raw/val.parquet

  processed:
    train: ../data/processed/train.parquet
    val: ../data/processed/val.parquet

  final: ../data/final/metrics.csv

To alter the configurations associated with a Python script from the command line, run the following:

python src/process.py data.raw.train=sample2.csv
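
A command-line override is a dotted path into the nested configuration followed by `=` and the new value. The sketch below illustrates that idea with a plain dictionary that mirrors the `--help` output above. It is not Hydra's implementation, and `CONFIG` and `apply_override` are hypothetical names used only for illustration.

```python
from copy import deepcopy
from typing import Any

# Plain-dict copy of the configuration printed by `--help` (illustrative only).
CONFIG: dict[str, Any] = {
    "process": {"use_columns": "sentence", "batch_size": 16},
    "model": {"name": "Logistic regression", "parameters": {"steps": 200}},
    "data": {
        "raw": {
            "train": "../data/raw/train.parquet",
            "val": "../data/raw/val.parquet",
        },
        "processed": {
            "train": "../data/processed/train.parquet",
            "val": "../data/processed/val.parquet",
        },
        "final": "../data/final/metrics.csv",
    },
}


def apply_override(config: dict[str, Any], override: str) -> dict[str, Any]:
    """Return a copy of `config` with one 'dotted.key=value' override applied.

    Hypothetical helper: the value is kept as a string, whereas Hydra
    also parses numbers, booleans and lists.
    """
    key_path, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"expected 'key=value', got {override!r}")

    *parents, leaf = key_path.split(".")
    updated = deepcopy(config)  # leave the caller's config untouched

    # Walk down to the mapping that holds the final key.
    node = updated
    for part in parents:
        node = node[part]  # raises KeyError if the path does not exist

    if leaf not in node:
        raise KeyError(f"{key_path!r} is not an existing configuration key")
    node[leaf] = raw_value
    return updated


if __name__ == "__main__":
    new_config = apply_override(CONFIG, "data.raw.train=sample2.csv")
    print(new_config["data"]["raw"])
```

Only the addressed leaf changes; sibling keys such as `data.raw.val` keep their values. The sketch rejects keys that are not already in the configuration, which mirrors how Hydra requires a `+` prefix to add a new key.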

Auto-generate API documentation

To auto-generate API documentation for your project, run:

make docs_save
