Skip to content

Irfan-S-1/Tradeshift-Text-Classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tradeshift-Text-Classification

The data contains features extracted from text similar to the one shown below.

You have to create a ML model that predict the probability that a piece of text belongs to a particular class.

Data extraction Fro the documents nGrams have been extracted, Each row in the Train.csvcorresponds to one such nGram.

Features For a given nGram several features have been extracted (145). These features have been saved in the train.csvand test.csv. They have parsing, spatial, content and relative information. • Content: The cryptographic hash of the raw text. • Parsing: nGram is a number, text, alphanumeric, etc. • Spatial: Position and size of the nGram • Relational: details of text nearby the nGram The feature values can be:

• Numbers. Continuous/discrete numerical values. • Boolean. The values include YES (true) or NO (false). • Categorical. Values within a finite set of possible values.

Labels This are the labels corresponding to the probability that the current sample belongs to the given class. This is multilabel problem and hence a given sample can belong to more than one class.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published