Tradeshift-Text-Classification

The data contains features extracted from text similar to the one shown below.

You have to create a ML model that predict the probability that a piece of text belongs to a particular class.

Data extraction Fro the documents nGrams have been extracted, Each row in the Train.csvcorresponds to one such nGram.

Features For a given nGram several features have been extracted (145). These features have been saved in the train.csvand test.csv. They have parsing, spatial, content and relative information. • Content: The cryptographic hash of the raw text. • Parsing: nGram is a number, text, alphanumeric, etc. • Spatial: Position and size of the nGram • Relational: details of text nearby the nGram The feature values can be:

• Numbers. Continuous/discrete numerical values. • Boolean. The values include YES (true) or NO (false). • Categorical. Values within a finite set of possible values.

Labels This are the labels corresponding to the probability that the current sample belongs to the given class. This is multilabel problem and hence a given sample can belong to more than one class.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
Tradeshift text classification.ipynb		Tradeshift text classification.ipynb
test.csv		test.csv
train.csv		train.csv
trainLabels.csv		trainLabels.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tradeshift-Text-Classification

About

Releases

Packages

Languages

Irfan-S-1/Tradeshift-Text-Classification

Folders and files

Latest commit

History

Repository files navigation

Tradeshift-Text-Classification

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages