Project 21 is an AutoML engine that aims to make data scientists' lives a lot easier by automating the process of generating, tuning, and testing the best model for their dataset. You provide the data; we provide the best model, along with the expertise required to get you started in the field of ML.

Get started now! [Note: Project 21 is a work in progress. This repo will undergo major changes in the coming months.]
This project requires the following dependencies to be installed beforehand:

- Python 3.8 or above
- Node.js v10.19.0
- npm 6.14.4
- MongoDB Community 4.4.7
- pip 20.0.2
To run the project on Ubuntu by manually typing the commands one by one, do the following:

- Get a copy of this project locally and install the dependencies:

  - Fork this repository.
  - Clone your fork, putting the URL of your forked repo in place of `<url>`:

    ```sh
    git clone <url>
    ```

  - Once you have the cloned copy locally, `cd` into the project folder.
  - Create a virtual environment inside the project folder:

    ```sh
    python3 -m venv venv
    ```

  - Activate the virtual environment:

    ```sh
    source venv/bin/activate
    ```

  - Once the environment is activated, install the Python dependencies:

    ```sh
    pip3 install -r requirements.txt
    ```

  - Now install the React dependencies:

    ```sh
    cd Frontend/pr21/
    npm i
    ```

- Once all dependencies are installed, go to the project root folder and run the following commands:

  - Activate the virtual environment, if not done already:

    ```sh
    source venv/bin/activate
    ```

  - Start the database server (MongoDB Community Server), entering your password if prompted:

    ```sh
    sudo systemctl start mongod
    ```

  - Start the backend server (from the root folder):

    ```sh
    python3 api.py
    ```

  - Start the frontend server (from the root folder; this can be done in a new terminal window):

    ```sh
    cd Frontend/pr21/
    npm start
    ```

- Go to `localhost:3000` and use Project21 for your needs!
To run the project on Ubuntu using shell scripts, do the following:

- Run the installation shell script:

  ```sh
  ./application-install-ubuntu.sh
  ```

- Start the frontend, backend, and database servers:

  ```sh
  ./application-startup-ubuntu.sh
  ```

- Go to `localhost:3000` and use Project21 for your needs!
To run the project on Windows by manually typing the commands one by one, do the following:

- Get a copy of this project locally and install the dependencies:

  - Fork this repository.
  - Clone your fork, putting the URL of your forked repo in place of `<url>`:

    ```sh
    git clone <url>
    ```

  - Once you have the cloned copy locally, `cd` into the project folder.
  - Create a virtual environment inside the project folder:

    ```sh
    python -m venv venv
    ```

  - Activate the virtual environment:

    ```sh
    .\venv\Scripts\activate
    ```

  - Now install the React dependencies:

    ```sh
    cd Frontend\pr21\
    npm i
    ```

  - Once the environment is activated, install the Python dependencies:

    ```sh
    pip install -r requirements.txt
    ```

- Once all dependencies are installed, go to the project root folder and run the following commands:

  - Activate the virtual environment, if not done already:

    ```sh
    .\venv\Scripts\activate
    ```

  - Start the database server by running MongoDB Community Edition (it can be downloaded from the MongoDB website).
  - Start the backend server (from the root folder):

    ```sh
    python api.py
    ```

  - Start the frontend server (from the root folder; this can be done in a new terminal window):

    ```sh
    cd Frontend\pr21\
    npm start
    ```

- Go to `localhost:3000` and use Project21 for your needs!
To run the project on Windows using shell scripts, do the following:

- Run the installation shell script:

  ```sh
  .\application-install-windows.sh
  ```

- Start the frontend, backend, and database servers:

  ```sh
  .\application-startup-windows.sh
  ```

- Go to `localhost:3000` and use Project21 for your needs!
Note: In case you run into any errors or bugs, please raise an issue on the GitHub page. Contributions are gladly welcomed!
- Task: a task includes the problem type (classification, regression, seq2seq), a pointer to the data, and the evaluation metric to be used to build a model.
- Data: data holds the raw content along with meta information about the type of data (text, images, tabular, etc.) and its characteristics (size, target, names, how to process, etc.).
- Model: either a machine learning, time series, or deep learning model that learns the relations in the data.
- Model Universe: a collection of models, their hyperparameters, and the tasks for which each model is to be considered.
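A minimal Python sketch of how these abstractions could fit together; the class names and fields below are illustrative assumptions, not the actual TwentyOne API:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical shapes for the abstractions described above.

@dataclass
class Data:
    path: str        # pointer to the raw content
    kind: str        # "text", "images", "tabular", ...
    target: str      # name of the target column, if any

@dataclass
class Task:
    problem_type: str  # "classification", "regression", "seq2seq"
    data: Data
    metric: str        # evaluation metric used to rank candidate models

@dataclass
class ModelSpec:
    name: str          # e.g. "RandomForestClassifier"
    hyperparams: dict
    tasks: List[str]   # problem types this model is considered for

# The model universe is simply the collection of candidate model specs.
MODEL_UNIVERSE: List[ModelSpec] = [
    ModelSpec("RandomForestClassifier", {"n_estimators": 100}, ["classification"]),
    ModelSpec("XGBRegressor", {"max_depth": 6}, ["regression"]),
]
```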
The top-level architecture shows how things work in TwentyOne.
- TwentyOne is designed to leverage transfer learning as much as possible. For many problems, the data requirement of 21 is minimal, which saves a lot of time, effort, and cost in data collection. Model training time is also greatly reduced.
- Although TwentyOne aims to be an AutoML engine, it can also be used as an augmented ML engine to help data scientists develop models quickly. This brings the best of both worlds, leading to "Real Intelligence = Artificial Intelligence + Human Intelligence".
- 21 builds robust models.
- 21 requires little user interaction.
- The cost of development is minimal (mainly compute cost; the rest is reduced).
- 21 provides data security and privacy.
- 21 takes less time to build a model.
- Retraining can be done on the same project to improve the models.
In 21, users can build four categories of machine learning models:
- CLASSIFICATION
- REGRESSION
- CLUSTERING
- TIME SERIES
One job of machine learning algorithms is to recognize objects and separate them into categories. This process is called classification, and it helps us segregate vast quantities of data into discrete values, i.e. distinct outputs such as 0/1, True/False, or a pre-defined output label class.
Models Used:
- Logistic Regression
- Random Forest Classifier
- Decision Tree
- XGBoost
- GaussianNB
- K-NN
- Polynomial
- SVC

Performance of the classification models is evaluated using precision, recall, F1-score, and accuracy.
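As an illustration only (not the project's actual code), the sketch below trains one of the listed classifiers with scikit-learn and reports these metrics:

```python
# Illustrative sketch: train a Random Forest classifier and evaluate it
# with precision, recall, F1-score, and accuracy.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# classification_report prints precision, recall, F1-score, and accuracy.
print(classification_report(y_test, clf.predict(X_test)))
```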
Regression is a supervised learning technique that helps find the correlation between variables and enables us to predict a continuous output variable from one or more predictor variables.

Regression fits a line or curve through the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimized.
Models Used:
- Logistic Regression
- Random Forest Regression
- Decision Tree
- XGBoost
- GaussianNB
- K-NN
- Polynomial
- SVR
- MultinomialNB
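For illustration only (again, not TwentyOne's internal code), a regressor from the list can be fitted and scored with scikit-learn like this:

```python
# Illustrative sketch: fit a Random Forest regressor on synthetic data
# and measure how close the continuous predictions are to the targets.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train, y_train)

pred = reg.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("R^2:", r2_score(y_test, pred))
```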
Clustering is an unsupervised learning technique that can be used on data where the target variable is unknown. It forms groups of similar data points on different bases.
Models Used:
- K-Means
- K-Modes
- DBSCAN
- BIRCH
- OPTICS
- K-NN
- Agglomerative
- Mean Shift
- Affinity Propagation
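A small scikit-learn sketch (illustrative only) showing two of the listed algorithms grouping unlabeled data:

```python
# Illustrative sketch: cluster unlabeled points with K-Means and DBSCAN.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
print("K-Means labels:", kmeans.fit_predict(X)[:10])

dbscan = DBSCAN(eps=1.5, min_samples=5)
print("DBSCAN labels:", dbscan.fit_predict(X)[:10])
```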
Time series algorithms are used on data that has a dependency on time, for example sales, finance, or temperature data.
Models Used:
- AR
- MA
- ARMA
- ARIMA
- SARIMA
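As a hedged example of one of these models (using statsmodels, which may differ from what 21 uses internally), an ARIMA model can be fitted on a synthetic monthly series and used to forecast ahead:

```python
# Illustrative sketch: fit an ARIMA model on a synthetic monthly series
# and forecast the next 12 steps.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2020-01-01", periods=48, freq="MS")
y = pd.Series(100 + np.cumsum(np.random.default_rng(42).normal(1, 5, 48)), index=idx)

# order=(p, d, q): AR lags, differencing, MA lags. (1, 1, 1) is a common
# starting point; SARIMA adds a seasonal_order on top of this.
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.forecast(steps=12))
```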
This repository is maintained by Shivaramkrs and curl team members, with contributions from Nikhil Agarwal, Paarth S Barkur, Rishabh Bhatt, Pooja BS, and Shubham Kumar Shaw.