ZirvedaAytimur/Natural-Language-Processing-NLP-

Natural Language Processing

Natural Language Processing, commonly known as NLP, is a subfield of artificial intelligence and linguistics. It studies how computers can process natural languages such as Turkish, English, German and French, so that they can analyze, understand and derive meaning from human language in a useful way. This repository contains NLP examples. Turkish explanations of some examples are available on my YouTube channel, and the videos are linked in the example descriptions.

  • Text Basics - This folder contains basic text-handling examples, including examples that work with PDF files and basic file operations.
  • N-Gram Analysis - Two N-gram analysis examples in Turkish. Two accompanying YouTube videos (in Turkish).
  • Spacy Basics - spaCy is one of the most widely used NLP libraries. This folder contains several beginner-level spaCy examples.
  • Tokenizer - There are examples of tokenization in this folder.
  • Stemming - Stemming removes suffixes from a word to reduce it to its stem (root form). You can find both English and Turkish examples here.
  • Lemmatization - Lemmatization uses a vocabulary and a morphological analysis of words, normally aiming to remove only inflectional endings and to return the base or dictionary form of a word, known as the lemma. YouTube video (in Turkish).
  • Stop Words - A stop word is a very common word (such as “the”, “a”, “an”, “in”) that is usually filtered out before further processing; search engines, for example, are often programmed to ignore such words both when indexing entries and when retrieving results for a query.
  • Phrase Matching and Vocabulary - In this folder, there are examples that identify and label specific phrases that match patterns we can define ourselves.
  • Part of Speech Tagging - Examples of part-of-speech (POS) taggers, with a comparison of taggers on a Turkish corpus. YouTube video (in Turkish).
  • Latent Semantic Analysis - Latent Semantic Analysis (LSA) is a statistical technique for determining the relationships between a collection of documents and the terms they contain, by capturing the semantic relationships between those terms. The example here uses a Turkish corpus. YouTube video (in Turkish).
  • Simple LESK Algorithm - The Lesk definition, on which the Lesk algorithm is based, is “measure overlap between sense definitions for all words in context”. Here I designed my own simple Lesk algorithm for Turkish homonyms. YouTube video (in Turkish).
  • Named Entity Recognition - Named entity recognition (NER), also called entity identification or entity extraction, is an NLP technique that automatically identifies named entities in a text and classifies them into predefined categories. Entities can be names of people, organizations, locations, times, quantities, monetary values, percentages, and more.
  • Sentence Segmentation - Sentence tokenization (also called sentence segmentation) is the problem of dividing a string of written language into its component sentences.
  • Lexical Similarity - In this example, misspelled words in a Turkish corpus are detected with the Zemberek library, and possible corrections are suggested. YouTube video (in Turkish).
  • Semantic Similarity - In this example, Doc2Vec is used to find the sentence in a text that is most similar to a given sentence. YouTube video (in Turkish).
  • ChatBot with Flask - In this example, a chatbot built with Flask answers frequently asked questions about artificial intelligence. YouTube video (in Turkish).
  • Machine Translation - I built an English-to-Turkish translation model by passing the data through several processing stages. YouTube video (in Turkish).
  • Text Summarization - This example summarizes a text taken from a website using NLP techniques. YouTube video (in Turkish).
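The N-Gram Analysis item above counts sequences of n consecutive tokens. Below is a minimal, dependency-free Python sketch of that idea. It assumes simple whitespace tokenization and a made-up sample sentence, and it is not the code in the repository's N-Gram Analysis folder.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: whitespace tokenization; real Turkish text usually needs a proper tokenizer.
text = "bu bir test bu bir deneme"
tokens = text.split()
bigrams = ngrams(tokens, 2)

# Count how often each bigram occurs and show the most frequent one.
print(Counter(bigrams).most_common(1))  # [(('bu', 'bir'), 2)]
```

Counting n-grams with `Counter` is enough for frequency analysis; probabilistic n-gram language models build on these same counts.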
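The Tokenizer and Stop Words items above can be combined into a small preprocessing step. The sketch below is only an illustration: the regex tokenizer and the tiny `STOP_WORDS` set are hypothetical choices, while libraries such as spaCy ship much larger, language-specific stop-word lists.

```python
import re

# Hypothetical, hand-picked English stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "in", "is", "of", "on"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (\\w+ also matches Turkish letters)."""
    return re.findall(r"\w+", text.lower())


def remove_stop_words(text):
    """Tokenize the text and drop every token found in STOP_WORDS."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


print(remove_stop_words("The cat is in the garden."))  # ['cat', 'garden']
```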
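The Phrase Matching and Vocabulary item above labels spans that match user-defined phrases. spaCy provides a `PhraseMatcher` for this purpose; the sketch below is a plain-Python approximation, assuming lowercase exact token matching, and is not the repository's implementation.

```python
import re


def match_phrases(text, phrases):
    """Return (start, end, phrase) token spans where any phrase occurs in text."""
    tokens = re.findall(r"\w+", text.lower())
    # Turn each phrase into a tuple of lowercase tokens; skip empty phrases.
    patterns = [tuple(p.lower().split()) for p in phrases if p.split()]
    matches = []
    for i in range(len(tokens)):
        for pattern in patterns:
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                matches.append((i, i + len(pattern), " ".join(pattern)))
    return matches


text = "Machine learning and natural language processing overlap with machine learning research."
print(match_phrases(text, ["machine learning", "natural language processing"]))
```

Each match is reported as a half-open token span, so `(3, 6, ...)` covers tokens 3, 4 and 5.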
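The Simple LESK Algorithm item above quotes the Lesk idea of measuring overlap between sense definitions and the context. A minimal sketch of the common "simplified Lesk" variant follows. The sense labels and glosses are hypothetical and written in English for readability; the repository's own version targets Turkish homonyms.

```python
import re


def words(text):
    """Lowercased set of word tokens in a string."""
    return set(re.findall(r"\w+", text.lower()))


def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most words with the context.

    senses maps a sense label to its gloss (definition). On ties, the first
    sense in the dict wins because only a strictly larger overlap replaces it.
    """
    ctx = words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(ctx & words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical sense inventory for the ambiguous word "bank".
senses = {
    "bank (finance)": "an institution that accepts deposits of money and makes loans",
    "bank (river)": "the sloping land beside a body of water such as a river",
}
print(simplified_lesk("I sat on the bank of the river and watched the water.", senses))
```

Function words such as "the" and "of" also count toward the overlap here; filtering stop words first usually makes the choice more reliable.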
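The Sentence Segmentation item above splits text into sentences. A naive rule-based sketch, assuming that a sentence ends at `.`, `!` or `?` followed by whitespace and an uppercase letter (Turkish uppercase letters included):

```python
import re

# Split point: after . ! or ?, across whitespace, before an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÇĞİÖŞÜ])")


def split_sentences(text):
    """Naively split text into sentences using the boundary rule above."""
    return [part for part in _BOUNDARY.split(text.strip()) if part]


print(split_sentences("NLP is fun. It has many uses! Does it work? yes."))
# ['NLP is fun.', 'It has many uses!', 'Does it work? yes.']
```

This rule is deliberately simple: it wrongly splits after abbreviations such as "Dr." and misses sentences that start with a lowercase letter, which is why trained segmenters (for example, the one in spaCy) are usually preferred.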
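The Lexical Similarity item above uses the Zemberek library. As a library-free illustration of the same task, the sketch below substitutes plain Levenshtein edit distance to suggest corrections from a vocabulary. The vocabulary and the `max_distance=2` threshold are hypothetical, and this is not how Zemberek works internally.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, max_distance=2):
    """Vocabulary words within max_distance edits of word, closest first."""
    scored = sorted((levenshtein(word, v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance]


vocabulary = ["kitap", "kitaplar", "kalem", "defter"]  # hypothetical word list
print(suggest("ktap", vocabulary))  # ['kitap']
```

Edit distance ignores Turkish morphology, so a real spell checker would typically combine it with morphological analysis of the kind Zemberek provides.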
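The Semantic Similarity item above finds the most similar sentence with Doc2Vec. The sketch below swaps in a much simpler bag-of-words cosine similarity so that it runs without gensim. Note that it only measures word overlap, not meaning learned from data as Doc2Vec does, and the example sentences are made up.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Bag-of-words term counts for a sentence."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def most_similar(query, sentences):
    """Return the sentence with the highest cosine similarity to the query."""
    q = vectorize(query)
    return max(sentences, key=lambda s: cosine(q, vectorize(s)))


sentences = ["The cat sat on the mat.", "Stock prices fell sharply today.", "A dog sat on the rug."]
print(most_similar("the cat is on the mat", sentences))  # The cat sat on the mat.
```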
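The Text Summarization item above does not say which technique the example uses. One common baseline is frequency-based extractive summarization: score each sentence by the average corpus frequency of its words and keep the top-scoring sentences. A minimal sketch under that assumption, with a made-up input text:

```python
import re
from collections import Counter


def summarize(text, n_sentences=1, stop_words=frozenset()):
    """Extractive summary: keep the n highest-scoring sentences in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in stop_words]
    freq = Counter(words)

    def score(sentence):
        # Average word frequency, so long sentences are not automatically favoured.
        tokens = [w for w in re.findall(r"\w+", sentence.lower()) if w not in stop_words]
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in top)


text = "NLP helps computers understand language. Cats sleep a lot. NLP models process language."
print(summarize(text, 1, {"a"}))  # NLP models process language.
```

Averaging instead of summing word frequencies is a design choice; summing tends to pick the longest sentences regardless of content.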