a) Specifying the Research Question: Build a text classification model that classifies a given text input as written in English or in Dutch.
b) Defining the Metric for Success: Build a classification model with an accuracy score of at least 85%.
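
As a rough illustration of this success metric, the sketch below computes accuracy as the fraction of correct predictions and compares it with the 85% target. The labels, predictions, and helper names (`accuracy`, `majority_baseline`) are hypothetical and are not taken from the project's data or code. The sketch also shows why accuracy needs roughly balanced classes to be informative: always predicting the most common label scores only 50% on a balanced set but 90% on a 90/10 split.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common label."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return accuracy(y_true, [most_common_label] * len(y_true))


TARGET = 0.85  # success threshold from the metric definition

# Hypothetical label sets, invented for illustration only.
balanced = ["en"] * 50 + ["nl"] * 50
imbalanced = ["en"] * 90 + ["nl"] * 10

# Made-up predictions for the balanced set: 5 errors in each class.
predictions = ["en"] * 45 + ["nl"] * 5 + ["nl"] * 45 + ["en"] * 5

score = accuracy(balanced, predictions)
meets_target = score >= TARGET

print(f"accuracy: {score:.2f}, meets target: {meets_target}")
print(f"majority baseline (balanced): {majority_baseline(balanced):.2f}")
print(f"majority baseline (imbalanced): {majority_baseline(imbalanced):.2f}")
```

On a balanced dataset, a score above the target therefore reflects real discrimination between the two languages rather than a skewed label distribution.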
c) Understanding the Context: You work as a Computational Linguist for a global firm, collaborating with Engineers and Researchers in Assistant and Research & Machine Intelligence to develop language understanding models that improve our ability to understand and generate natural language.
d) Recording the Experimental Design: Business Understanding, Data Exploration, Data Preparation, Data Modeling and Evaluation.
- We could rely on accuracy as a metric because our dataset was balanced.
- The AdaBoost classifier was the leading model.
- Our best-performing model was the AdaBoost classifier. To improve on it, we can try other text-processing techniques that better prepare the data for model fitting. We can also try different vectorization techniques, other machine learning models, hyperparameter tuning, and sampling a balanced dataset.
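
Two of the improvements listed above, a different vectorization technique and hyperparameter tuning, could be combined along the following lines. This is a minimal sketch assuming scikit-learn is available. The toy sentences, the choice of character n-gram TF-IDF features, and the parameter grid are illustrative assumptions, not the setup used in the project.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy sentences invented for illustration; the real dataset is not shown here.
texts = [
    "the weather is nice today",
    "where is the train station",
    "i would like a cup of coffee",
    "she reads a book every evening",
    "we are going to the market",
    "this is a very good idea",
    "het weer is vandaag mooi",
    "waar is het treinstation",
    "ik wil graag een kopje koffie",
    "zij leest elke avond een boek",
    "wij gaan naar de markt",
    "dit is een heel goed idee",
]
labels = ["en"] * 6 + ["nl"] * 6  # balanced: six examples per language

# Character n-grams within word boundaries feed the AdaBoost classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb")),
    ("clf", AdaBoostClassifier(random_state=0)),
])

# A deliberately small, assumed grid; a real search would cover more values.
param_grid = {
    "tfidf__ngram_range": [(1, 2), (1, 3)],
    "clf__n_estimators": [25, 50],
}

# 2-fold stratified cross-validation, scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)

# Predict the language of two unseen sentences.
predicted = search.predict(["good morning everyone", "goedemorgen allemaal"])
print(predicted)
```

With so few sentences the scores carry no weight. The point is the structure: putting the vectorizer and the classifier in one pipeline lets the grid search tune both together and refits the vectorizer inside each cross-validation fold.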