<td>
   <a target="_blank" href="https://www.clarifai.com/" ><img src="https://upload.wikimedia.org/wikipedia/commons/b/bc/Clarifai_Logo_FC_Web.png" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Clarifai/examples/blob/main/models/model_train/text-classification_training.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Colab"></a>
</td>

# Models



Clarifai offers a range of powerful model types, each designed to generate meaningful outputs based on user-specific inputs and AI tasks.

There is a wide variety of models that can be used as standalone solutions or as building blocks for your own custom business solutions.



Clarifai Models are the recommended starting point for many users because they train very quickly when you customize them with the "embedding-classifier" (Transfer Learning Classifier) model type.

In many cases, however, accuracy and the ability to carefully target a solution take priority over speed and ease of use. You may also need a model to learn new features that existing Clarifai Models do not recognize. For these cases, you can "deep fine-tune" your custom models and integrate them directly into your workflows.

You might consider deep training if you have:

- A custom tailored dataset
- Accurate labels
- Expertise and time to fine-tune models

_______
On the [Clarifai Community](https://clarifai.com/explore) explore page, click the [Models](https://clarifai.com/explore/models) tab to search for and access the models available for everyone to use.

This notebook demonstrates model training for the **text-classifier** model type with the **HF_GPTNeo_125m_lora** template.

# Getting Started

### Installation

In [None]:
! pip install clarifai

In [1]:
import os
os.environ["CLARIFAI_PAT"] = "PAT" # replace with your own PAT key here

*Note: See this guide to obtain your [PAT](https://docs.clarifai.com/clarifai-basics/authentication/personal-access-tokens)*
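
The cell above sets `CLARIFAI_PAT` because this notebook relies on the SDK reading the token from that environment variable. If the placeholder `"PAT"` is left in place, later calls fail with an authentication error far from the real cause. A minimal sketch of a fail-fast check; `get_clarifai_pat` is a hypothetical helper, not part of the Clarifai SDK.

```python
import os


def get_clarifai_pat(env_var: str = "CLARIFAI_PAT", placeholder: str = "PAT") -> str:
    """Return the PAT from the environment, or raise if it is unset or still the placeholder."""
    pat = os.environ.get(env_var, "").strip()
    if not pat or pat == placeholder:
        raise RuntimeError(f"Set {env_var} to your personal access token before running the notebook.")
    return pat


if __name__ == "__main__":
    try:
        pat = get_clarifai_pat()
        print("PAT found, length", len(pat))
    except RuntimeError as err:
        print(err)
```

To avoid hardcoding the token in the notebook, you could instead assign `os.environ["CLARIFAI_PAT"] = getpass.getpass()` (from the standard `getpass` module) before running this check.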

### For Colab
To access the data files from the Clarifai examples repo, clone the repo:

In [None]:
!git clone https://github.com/Clarifai/examples.git
%cd /content/examples/models/model_train

## TEXT-CLASSIFIER

**Input: Text**

**Output: Concepts**

A text classifier is a deep fine-tuned model that automatically categorizes text into predefined categories, or concepts. This is a common task in natural language processing (NLP), with a wide range of applications including sentiment analysis, spam detection, and topic categorization.
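
The template used later in this notebook sets `problem_type: multi_label_classification`. In Hugging Face Transformers, that problem type trains against a multi-hot target per example: a vector over the concept list with 1 for every concept that applies and 0 otherwise. A minimal sketch of that encoding in plain Python; `multi_hot` is a hypothetical helper for illustration, not how Clarifai prepares data internally.

```python
from typing import Iterable, List, Sequence


def multi_hot(labels: Iterable[str], concepts: Sequence[str]) -> List[int]:
    """Encode the labels of one example as a 0/1 vector aligned with `concepts`."""
    index = {concept: i for i, concept in enumerate(concepts)}
    vector = [0] * len(concepts)
    for label in labels:
        if label not in index:
            raise KeyError(f"unknown concept: {label!r}")
        vector[index[label]] = 1
    return vector


concepts = ["pos", "neg"]
print(multi_hot(["neg"], concepts))         # a negative review
print(multi_hot(["pos", "neg"], concepts))  # multi-label allows both at once
```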

### Creating an App

In [2]:
from clarifai.client.user import User
# replace "user_id" with your own user ID
client = User(user_id="user_id")

In [3]:
# replace "app_id" with an ID for the new app
app = client.create_app(app_id="app_id", base_workflow="Universal")

### Uploading Classification Dataset

#### Preview of Data

In [17]:
CSV_PATH = os.path.join(os.getcwd().split('/models/model_train')[0],'datasets/upload/data/imdb.csv')
CSV_PATH

'/Users/adithyansukumar/work/ml_training_error/examples/datasets/upload/data/imdb.csv'

In [18]:
import pandas as pd
data = pd.read_csv(CSV_PATH)
data.head(5)

|   | input | concepts |
|---|---|---|
| 0 | Now, I won't deny that when I purchased this o... | neg |
| 1 | The saddest thing about this "tribute" is that... | neg |
| 2 | Last night I decided to watch the prequel or s... | neg |
| 3 | I have to admit that i liked the first half of... | neg |
| 4 | I was not impressed about this film especially... | neg |
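
Before uploading, it can help to check that the CSV has the `input` and `concepts` columns seen in the preview and to count rows per label, since a heavily unbalanced dataset affects what the classifier learns. A minimal sketch using only the standard library; `summarize_labels` is a hypothetical helper, and it assumes one label per row as in the preview.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_labels(lines: Iterable[str], text_col: str = "input", label_col: str = "concepts") -> Counter:
    """Count records per label in a CSV laid out like imdb.csv."""
    reader = csv.DictReader(lines)
    missing = {text_col, label_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    counts = Counter()
    for record_num, row in enumerate(reader, start=1):
        if not (row[text_col] or "").strip():
            raise ValueError(f"record {record_num} has empty text")
        counts[(row[label_col] or "").strip()] += 1
    return counts


# a tiny in-memory sample with the same header as imdb.csv
sample = [
    ",input,concepts\n",
    '0,"Now, I won\'t deny that...",neg\n',
    "1,A wonderful film,pos\n",
]
print(summarize_labels(sample))
```

On the real file, pass an open file object, for example `with open(CSV_PATH, newline="", encoding="utf-8") as f: print(summarize_labels(f))` (UTF-8 encoding is an assumption about `imdb.csv`).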


#### Upload Data

In [19]:
dataset = app.create_dataset(dataset_id="text_dataset")
dataset.upload_from_csv(csv_path=CSV_PATH,input_type='text',csv_type='raw', labels=True)

Uploading inputs: 100%|██████████| 2/2 [00:13<00:00,  6.51s/it]


### List Trainable Model Types

In [20]:
app.list_trainable_model_types()

['visual-classifier',
 'visual-detector',
 'visual-segmenter',
 'visual-embedder',
 'clusterer',
 'text-classifier',
 'embedding-classifier',
 'text-to-text']

### Create a Model

In [None]:
MODEL_ID = "model_text_classifier"
MODEL_TYPE_ID = "text-classifier"
model = app.create_model(model_id=MODEL_ID, model_type_id=MODEL_TYPE_ID)

### List Templates for the Model Type

Templates let you choose the specific architecture of your neural network and define a set of hyperparameters that control how your model learns during fine-tuning.

In [8]:
model.list_training_templates()

['HF_GPTNeo_125m_lora',
 'HF_GPTNeo_2p7b_lora',
 'HF_Llama_2_13b_chat_GPTQ_lora',
 'HF_Llama_2_7b_chat_GPTQ_lora',
 'HF_Mistral_7b_instruct_GPTQ_lora',
 'HuggingFace_AdvancedConfig']

### Save params
Save the parameters for the chosen model template.

In [9]:
model_params = model.get_params(template='HF_GPTNeo_125m_lora')

In [10]:
model_params

{'dataset_id': '',
 'dataset_version_id': '',
 'concepts': [],
 'train_params': {'invalid_data_tolerance_percent': 5.0,
  'template': 'HF_GPTNeo_125m_lora',
  'model_config': {'pretrained_model_name': 'EleutherAI/gpt-neo-125m',
   'torch_dtype': 'torch.float32',
   'problem_type': 'multi_label_classification'},
  'peft_config': {'r': 16.0,
   'peft_type': 'LORA',
   'task_type': 'SEQ_CLS',
   'lora_dropout': 0.1,
   'inference_mode': False,
   'lora_alpha': 16.0},
  'tokenizer_config': {},
  'trainer_config': {'auto_find_batch_size': True, 'num_train_epochs': 1.0}},
 'inference_params': {'select_concepts': []}}
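
The `peft_config` above configures LoRA (low-rank adaptation) with `r=16` and `lora_alpha=16`. LoRA keeps a weight matrix W of shape d_out × d_in frozen and trains a low-rank update B·A, where B is d_out × r and A is r × d_in; in the Hugging Face PEFT library that update is scaled by `lora_alpha / r`. The sketch below counts trainable parameters for one adapted matrix. It assumes a 768 × 768 matrix (768 being the hidden size of GPT-Neo 125M, stated here as an assumption); the params do not show which modules the template adapts, so this is not a total for the whole model.

```python
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in the trainable factors B (d_out x r) and A (r x d_in)."""
    return d_out * r + r * d_in


def lora_scaling(lora_alpha: float, r: float) -> float:
    """Factor applied to B @ A before it is added to the frozen weight."""
    return lora_alpha / r


hidden = 768          # assumed hidden size of EleutherAI/gpt-neo-125m
r, alpha = 16, 16.0   # values from peft_config above

full = hidden * hidden
lora = lora_trainable_params(hidden, hidden, r)
print(f"full matrix: {full:,} params, LoRA factors: {lora:,} params ({lora / full:.2%})")
print(f"scaling alpha / r = {lora_scaling(alpha, r)}")
```

With `lora_alpha` equal to `r`, the scaling is 1.0, so raising `r` increases adapter capacity without also changing the update's scale only if `lora_alpha` is raised with it.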

#### Get param info

In [11]:
print(model.get_param_info(param = 'concepts'))

{'fieldType': 'ARRAY_OF_CONCEPTS', 'description': 'List of concepts you want this model to predict from any existing concepts in your app.', 'required': True, 'param': 'concepts'}


### Update params
Note: You can edit the params in the saved YAML file, or update them directly with `model.update_params()`.

In [22]:
concepts = [concept.id for concept in app.list_concepts()]

In [23]:
# concept IDs created from the "pos"/"neg" labels during upload (compare with `concepts` above)
model.update_params(dataset_id='text_dataset', concepts=["id-pos", "id-neg"])
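
The concept IDs passed above (`id-pos`, `id-neg`) differ from the raw labels in the CSV (`pos`, `neg`). A minimal sketch that reproduces the pattern visible in this notebook, an `id-` prefix added to the label, so the list can be built from the labels instead of typed by hand. `label_to_concept_id` is hypothetical, and stripping whitespace from multi-word labels is an assumption, not confirmed SDK behavior; cross-check the result against the `concepts` list returned by `app.list_concepts()`.

```python
from typing import List


def label_to_concept_id(label: str, prefix: str = "id-") -> str:
    """Hypothetical: mirror the 'pos' -> 'id-pos' pattern seen in this notebook.

    Removing whitespace inside the label is an assumption.
    """
    return prefix + "".join(label.split())


def concept_ids(labels: List[str]) -> List[str]:
    """Map CSV labels to concept IDs, keeping order and dropping duplicates."""
    return list(dict.fromkeys(label_to_concept_id(label) for label in labels))


print(concept_ids(["pos", "neg", "neg"]))
```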

In [24]:
model.training_params

{'dataset_id': 'text_dataset',
 'dataset_version_id': '',
 'concepts': ['id-pos', 'id-neg'],
 'train_params': {'invalid_data_tolerance_percent': 5.0,
  'template': 'HF_GPTNeo_125m_lora',
  'model_config': {'pretrained_model_name': 'EleutherAI/gpt-neo-125m',
   'torch_dtype': 'torch.float32',
   'problem_type': 'multi_label_classification'},
  'peft_config': {'r': 16.0,
   'peft_type': 'LORA',
   'task_type': 'SEQ_CLS',
   'lora_dropout': 0.1,
   'inference_mode': False,
   'lora_alpha': 16.0},
  'tokenizer_config': {},
  'trainer_config': {'auto_find_batch_size': True, 'num_train_epochs': 1.0}},
 'inference_params': {'select_concepts': []}}
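
Comparing the two nested dicts by eye is error-prone. A minimal sketch that flattens a params dict into dotted paths and reports only the keys whose values changed; `flatten` and `diff_params` are hypothetical helpers, and the example uses a trimmed excerpt of the two outputs printed above. To apply it to the live objects, take `copy.deepcopy(model_params)` before calling `update_params`, since whether the SDK returns a shared dict is not shown here.

```python
from typing import Any, Dict, Tuple


def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {'a': {'b': 1}} into {'a.b': 1}; empty dicts are kept as leaves."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_params(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_path: (old, new)} for every value that differs."""
    a, b = flatten(before), flatten(after)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


# trimmed excerpts of model_params and model.training_params shown above
before = {"dataset_id": "", "concepts": [],
          "train_params": {"trainer_config": {"num_train_epochs": 1.0}}}
after = {"dataset_id": "text_dataset", "concepts": ["id-pos", "id-neg"],
         "train_params": {"trainer_config": {"num_train_epochs": 1.0}}}
print(diff_params(before, after))
```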

### Start Model Training

In [25]:
model_version_id = model.train()

### Check Model Training Status

In [26]:
import time

# poll every 2 minutes until training either succeeds or fails
while True:
    status = model.training_status(version_id=model_version_id, training_logs=False)
    if status.code == 21106:  # MODEL_TRAINING_FAILED
        print(status)
        break
    elif status.code == 21100:  # MODEL_TRAINED
        print(status)
        break
    else:
        print("Current Status:", status)
        print("Waiting---")
        time.sleep(120)

Current Status: code: MODEL_QUEUED_FOR_TRAINING
description: "Model is currently in queue for training."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Training stage in progress: 0/1 complete."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Training stage in progress: 0/1 complete."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Training stage in progress: 0/1 complete."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Deployment stage in progress: 0/1 complete."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Deployment stage in progress: 0/1 complete."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Deployment stage in progress: 0/1 complete."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Deployment stage in progress: 1/2 complete."

Waiting---
Current Status: code: MODEL_TRAINING
description: "Deployment stage in progress: 1/2 complete."

Waiting---
Current Status: code: MOD
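
The polling loop above runs forever if training neither succeeds nor fails. A minimal sketch of the same idea with an overall timeout; `wait_for_status` is a hypothetical helper, and the status source is passed in as a function so the loop can be tested without a Clarifai account.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    fetch_code: Callable[[], int],
    done_codes: Iterable[int],
    interval_s: float = 120.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call fetch_code until it returns one of done_codes, or raise TimeoutError."""
    done = set(done_codes)
    deadline = clock() + timeout_s
    while True:
        code = fetch_code()
        if code in done:
            return code
        if clock() >= deadline:
            raise TimeoutError(f"still at status {code} after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    fake = iter([0, 0, 21100])  # two in-progress polls, then MODEL_TRAINED (simulated)
    print(wait_for_status(lambda: next(fake), {21100, 21106}, interval_s=0))
```

With the notebook's objects, the call would look like `wait_for_status(lambda: model.training_status(version_id=model_version_id, training_logs=False).code, {21100, 21106})`, returning whichever final code was reached.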

## Model Prediction
Predict with the trained model.
Note: Refer to this [notebook](https://github.com/Clarifai/examples/blob/main/models/model_predict.ipynb) for more info on model prediction.

In [27]:
TEXT = b"This is a great place to work"
model_prediction = model.predict_by_bytes(TEXT, input_type="text")

# Get the output
print('Input: ',TEXT)
for concept in model_prediction.outputs[0].data.concepts:
    print(concept.id,':',round(concept.value,2))

Input:  b'This is a great place to work'
id-neg : 0.08
id-pos : 0.05
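
The two scores above do not sum to 1. The template's `problem_type: multi_label_classification` makes a Hugging Face sequence-classification model train with a per-concept binary cross-entropy loss, so each concept is scored independently with a sigmoid rather than competing in a softmax. How Clarifai post-processes the scores is an assumption here. The sketch contrasts the two on hypothetical logits (not the model's actual outputs) and shows a simple threshold rule; `concepts_above` is a hypothetical helper.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def concepts_above(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Concepts whose score reaches the threshold, highest score first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [concept for concept, score in ranked if score >= threshold]


logits = {"id-pos": -2.0, "id-neg": -1.5}  # hypothetical raw model outputs
multi = {c: sigmoid(z) for c, z in logits.items()}
single = dict(zip(logits, softmax(list(logits.values()))))

print("sigmoid:", {c: round(s, 2) for c, s in multi.items()}, "sum =", round(sum(multi.values()), 2))
print("softmax:", {c: round(s, 2) for c, s in single.items()}, "sum =", round(sum(single.values()), 2))
print("above 0.5:", concepts_above(multi))
```

Under a sigmoid, low scores for every concept (as in the output above, where neither exceeds 0.5) mean the model is not confident about any concept, which a softmax would hide by forcing the scores to sum to 1.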


## Note

- This notebook is a demo to get started with model training on the Clarifai Platform using the Python SDK.
- For better model accuracy, use your own data and try different templates and hyperparameters.

## Clarifai Resources

**Website**: [https://www.clarifai.com](https://www.clarifai.com/)

**Demo**: [https://clarifai.com/demo](https://clarifai.com/demo)

**Sign up for a free Account**: [https://clarifai.com/signup](https://clarifai.com/signup)

**Developer Guide**: [https://docs.clarifai.com](https://docs.clarifai.com/)

**Clarifai Community**: [https://clarifai.com/explore](https://clarifai.com/explore)

**Python SDK Docs**: [https://docs.clarifai.com/python-sdk/api-reference](https://docs.clarifai.com/python-sdk/api-reference)

---