EasyML

A small domain-specific language and Python interpreter for loading tabular data, doing basic cleaning, and training scikit-learn models from one of two fixed pairs (classification or regression). Built as a final project for a programming class.

What it is

EasyML runs .ezml scripts line by line. A script sets a file path, loads a CSV or Excel file, cleans its rows, optionally fits a classification or regression model (each task chooses from a fixed pair of models), and exports datasets or models to exported/. It is a learning exercise, not a production ML system.
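To make "line by line" concrete, here is a minimal sketch of how a one-statement-per-line interpreter of this kind can be structured. It is not EasyML's actual code: `run_script`, `Handler`, and `record` are hypothetical names, and the handlers only record which keyword they saw instead of loading data or training models. The only behavior taken from this README is the whitespace tokenization with `line.split()`.

```python
from typing import Callable

# Hypothetical handler type: receives the whitespace-split tokens of one statement.
Handler = Callable[[list[str]], None]


def run_script(text: str, handlers: dict[str, Handler]) -> None:
    """Execute a script one statement per line, dispatching on the first token."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()  # whitespace splitting, as the README describes
        if not tokens:
            continue  # ignore blank lines
        handler = handlers.get(tokens[0])
        if handler is None:
            raise ValueError(f"line {lineno}: unknown statement {tokens[0]!r}")
        handler(tokens)


# Demo: record each statement keyword instead of doing real work.
seen: list[str] = []


def record(tokens: list[str]) -> None:
    seen.append(tokens[0])


handlers = {kw: record for kw in ("DATAPATH", "DATASET", "CLEAN", "MODEL", "DOWNLOAD")}
run_script("DATAPATH myPath = 'data/myfile.csv'\nDATASET myDf = LOAD myPath\n\nCLEAN myDf\n", handlers)
print(seen)  # ['DATAPATH', 'DATASET', 'CLEAN']
```

A dispatch table keyed on the first token keeps each statement type in its own function, so adding a keyword means adding one handler rather than another branch in a long if/elif chain.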

Tech stack

Python 3.11, pandas, scikit-learn (linear/logistic regression, decision tree classifier, random forest regressor), joblib, openpyxl (Excel). Dependencies are pinned in environment.yml and requirements.txt.

Prerequisites

Miniconda or Anaconda with conda on your PATH
Your own CSV/XLSX data if you run the samples (see trainingData/README.md)

Setup

Clone this repository and cd into the project root (the directory that contains easyML.py and environment.yml).
Create the environment:
```
conda env create -f environment.yml
```

Activate it:

```
conda activate undergrad-archive--easyml
```

Create an output directory (the first DOWNLOAD fails if it is missing):
```
mkdir -p exported
```

Usage

```
conda activate undergrad-archive--easyml
cd /path/to/EasyML
python easyML.py your_script.ezml
```

Classification example (PREDICT_CAT, target in COLUMN J):
```
DATAPATH myPath = 'data/myfile.csv'
DATASET myDf = LOAD myPath
CLEAN myDf
MODEL myModel = PREDICT_CAT(myDf, COLUMN J)
DOWNLOAD DATASET myDf
DOWNLOAD MODEL myModel
```

Regression example (PREDICT_NUM, target in COLUMN I):
```
DATAPATH myPath = 'data/myfile.csv'
DATASET myDf = LOAD myPath
CLEAN myDf
MODEL myModel = PREDICT_NUM(myDf, COLUMN I)
DOWNLOAD DATASET myDf
DOWNLOAD MODEL myModel
```
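The MODEL statement in these examples packs four pieces of information into one line: the model name, the task, the dataset, and the target column. The sketch below pulls them apart with a regular expression. This is an illustrative alternative, not how EasyML parses it (EasyML splits on whitespace); `MODEL_RE` and `parse_model_statement` are hypothetical names.

```python
import re

# Hypothetical pattern for the MODEL statement as written in the examples:
#   MODEL <name> = PREDICT_CAT|PREDICT_NUM(<dataset>, COLUMN <letter>)
MODEL_RE = re.compile(
    r"^MODEL\s+(?P<name>\w+)\s*=\s*"
    r"(?P<task>PREDICT_CAT|PREDICT_NUM)\(\s*(?P<dataset>\w+)\s*,\s*"
    r"COLUMN\s+(?P<column>[A-Z]+)\s*\)\s*$"
)


def parse_model_statement(line: str) -> dict[str, str]:
    """Return the parts of a MODEL line, or raise ValueError if it doesn't match."""
    match = MODEL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a MODEL statement: {line!r}")
    parts = match.groupdict()
    # PREDICT_CAT means classification, PREDICT_NUM means regression.
    parts["task"] = "classification" if parts["task"] == "PREDICT_CAT" else "regression"
    return parts


print(parse_model_statement("MODEL myModel = PREDICT_CAT(myDf, COLUMN J)"))
# {'name': 'myModel', 'task': 'classification', 'dataset': 'myDf', 'column': 'J'}
```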

Sample scripts

sample1.ezml — Loads running.xlsx, cleans, exports the dataset only.
sample2.ezml — Loads Titanic.csv, cleans, trains classification on COLUMN J, exports dataset and model.
sample3.ezml — Loads housing.csv, cleans, trains regression on COLUMN I, exports dataset and model.

Known limitations

One statement per line, tokenized with whitespace splitting (line.split()), so file paths containing spaces are fragile. Targets are given as COLUMN letters (A = 0, B = 1, …). Use PREDICT_CAT for classification and PREDICT_NUM for regression.
No unit tests or CI, and little error handling beyond detecting a missing script file.
No datasets in the repo; samples assume files exist at the given DATAPATH (see trainingData/README.md for formats and public sources).
exported/ is not created automatically.
Model choice and metrics are simplistic; multiclass targets and messy real-world tables can still hit unhandled edge cases.
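Two of the limitations above are easy to see in a few lines of Python. The sketch maps a COLUMN letter to a 0-based index (A = 0, so J = 9 and I = 8), and shows how whitespace splitting breaks a quoted path that contains a space. `column_index` is a hypothetical helper; its handling of letters past Z (AA = 26, spreadsheet style) is an assumption, since the README only states A = 0. `shlex.split` is shown as a possible quote-aware alternative, not something EasyML uses.

```python
import shlex


def column_index(letter: str) -> int:
    """Map a column letter to a 0-based index: A -> 0, B -> 1, ..., Z -> 25.

    Letters past Z use spreadsheet-style numbering (AA -> 26). That part is an
    assumption; the README only states A = 0.
    """
    letters = letter.strip().upper()
    if not letters:
        raise ValueError("empty column letter")
    index = 0
    for ch in letters:
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column letter: {letter!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


print(column_index("J"), column_index("I"))  # 9 8

# Plain whitespace splitting breaks a quoted path that contains a space...
line = "DATAPATH myPath = 'my data/file.csv'"
print(line.split())
# ['DATAPATH', 'myPath', '=', "'my", "data/file.csv'"]

# ...while a quote-aware tokenizer keeps it as one token (quotes removed).
print(shlex.split(line))
# ['DATAPATH', 'myPath', '=', 'my data/file.csv']
```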

License

Apache License 2.0 — see LICENSE.
