Sparser - Introduction To Parsers

Terminology

Let's define the niche terms so that the wordy paragraphs further down make sense to normal people. It is important to note that these terms come from the linguistic sciences of human spoken and written language, but they are adapted to apply to computer programming languages in nearly identical ways.

  • lexer - A lexer is simply a scanner. This means it is a small utility that analyzes language as atomic fragments. With spoken language humans use lexical analysis to break down collections of sounds and form them into words. With written language humans use lexical analysis against letters, spaces, and punctuation to form words, statements, and sections. In computer programming lexers are utilities that analyze atomic units of input, typically characters from text strings, and form larger pieces with more precise meaning like keywords, comments, and numbers.
  • lexeme - A language rule necessary to make use of the given language. In some spoken languages tone is an important rule that imposes unique meaning and context upon words. In various other spoken languages rhythm, social context, speed of delivery, gender, and other rules impose qualities upon language that influence how the language is deciphered. The necessary rules often change from language to language. This concept applies very directly to programming languages. As an example, a language like XML makes use of concepts and structures not available or relevant to a language like JavaScript.
  • lexicon - An inventory of the various language rules, or lexemes, necessary to make sense of a communication. The list of such rules for a given language is that language's lexicon. Lexicons influence how a language is used and how thoughts are expressed in the given language.
  • parser - A parser is a higher order utility that typically uses a lexer to gather pieces of a language and puts those pieces into categories and structures to form knowledge. In spoken language a lexer may put sounds together to form words, but a parser puts those words together to form a thought. A parser has to account for the order, structure, and context of its language pieces in accordance with a language's lexicon for that language to make sense. Consider these two statements as examples: "My dog house is rough." against "My house dog is rough." The only difference is the order of two words, which provides very different meaning. The lexical analysis for those two statements would produce the same words, but they are parsed differently.
  • syntax - Syntax is a collection of rules that impose constraints on parsing. As an analogy, syntax is to a parser as lexeme is to a lexer. Syntax helps keep the expressiveness of a language regular so that a receiver may better understand the communication of a transmitter, or so that a transmitter may have higher confidence their message is properly interpreted by a receiver. For programming languages, syntax determines whether an instance of code makes sense to a computer's parser: whether the parser interprets the code, throws an error, or is allowed to guess at the intended instructions.
  • grammar - Grammars are higher order rules that define structure and context. In written language grammar typically refers to word order, conjugation
  • compiler - A compiler transforms an artifact expressed in one language into a different language. Compilers are higher order utilities that frequently use parsers, but otherwise have nothing to do with parsing. This term is included only because many software developers use the terms compiler and parser interchangeably, which is incorrect. A compiler is not a parser, even when it relies on one.
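
To make the split between lexer and parser concrete, here is a minimal sketch in Python for a toy arithmetic language. The language, the token format, and the names `lex` and `parse` are assumptions made for illustration; they are not part of Sparser. The lexer groups characters into typed tokens, and the parser arranges those tokens into a nested structure according to order and operator precedence.

```python
import re

# Hypothetical token pattern for a toy language: integers and four operators.
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|([-+*/]))")

def lex(text):
    """Lexer: scan characters and group them into (type, value) tokens."""
    tokens = []
    position = 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise SyntaxError(f"unexpected character at index {position}")
        number, operator = match.groups()
        if number is not None:
            tokens.append(("number", number))
        else:
            tokens.append(("operator", operator))
        position = match.end()
    return tokens

def parse(tokens):
    """Parser: arrange tokens into a nested tree using order and precedence."""
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else (None, None)

    def advance():
        nonlocal index
        token = tokens[index]
        index += 1
        return token

    def factor():
        kind, value = peek()
        if kind != "number":
            raise SyntaxError(f"expected a number, found {value!r}")
        advance()
        return int(value)

    def term():
        # Multiplication and division bind tighter than addition and subtraction.
        node = factor()
        while peek() in (("operator", "*"), ("operator", "/")):
            _, operator = advance()
            node = (operator, node, factor())
        return node

    def expression():
        node = term()
        while peek() in (("operator", "+"), ("operator", "-")):
            _, operator = advance()
            node = (operator, node, term())
        return node

    tree = expression()
    if index != len(tokens):
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

print(parse(lex("8 - 2")))      # ('-', 8, 2)
print(parse(lex("2 - 8")))      # ('-', 2, 8)
print(parse(lex("1 + 2 * 3")))  # ('+', 1, ('*', 2, 3))
```

The inputs `8 - 2` and `2 - 8` contain exactly the same tokens and differ only in order, much like the two dog house statements, yet the parser produces different structures for them.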

Lexical Scope

A well known computer science term, lexical scope, was not included in the definitions above. This is intentional. Scope is the area of availability for a given reference or code unit. That has everything to do with where things are declared in code, how they are declared, and from where they are referenced. Scope resolution, even if similarly named, has nothing to do with parsing or lexical analysis. The names are similar because they come from similar sources in linguistics research.

Lexical scope is a type of scope mechanism in language design whereby a scope is a structure that can be referenced as though it were an atomic unit. The benefit is that the very boundaries that define the availability of a reference can be passed between code units like any other reference, and a scope can contain child scopes no differently than a structure contains child references. This feature is called lexical scope, similar to lexical analysis, because the idea is to analyze the code in a lexical manner and compose certain structures as though they were semi-atomic word units, much like phrases in a spoken statement, whereby context is extended to the represented collection of pieces. This feature is not unique to programming, as evidenced by languages like Swahili whose grammars emphasize extensible semantics.
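
A closure is one common way to see a scope travel with a reference. The following is a small sketch in Python, chosen only for illustration; the names `make_counter` and `increment` are hypothetical. Each call to `make_counter` creates a new scope, and the returned function carries that scope with it wherever it is passed.

```python
def make_counter(start):
    """Create a scope holding `count` and return a function bound to it."""
    count = start

    def increment(step=1):
        # `count` is resolved by where `increment` is written (inside
        # make_counter), not by where the function is later called.
        nonlocal count
        count += step
        return count

    return increment

count = 100  # a separate, module-level `count`; the closures never touch it

first = make_counter(0)
second = make_counter(10)
print(first(), first(), second(5))  # 1 2 15
print(count)                        # 100
```

The two counters do not interfere with each other or with the outer `count`, because each name is resolved against the scope in which the code was declared.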

Blurry Boundaries

Don't get too married to the definitions above. The boundaries of different levels in the process of parsing and compiling vary wildly by application. In the case of this application the lexers actually complete all the lexical analysis and half the parsing step, according to the definitions above. The lexer files analyze, describe, and in some cases even modify the produced tokens. The remainder of the parsing step, structure and context, is abstracted away and fully automated by the application logic as determined by the identified types for a given parsed token.
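
The division of labor described above might look something like the following sketch. It is written in Python for a toy brace-delimited language, and the record format and function names are assumptions for illustration, not Sparser's actual data format or API. The lexer attaches a type and a structural hint to every token, and a generic step then derives the nesting from those types alone, without knowing anything about the language.

```python
def lex_with_structure(text):
    """Hypothetical lexer that also records structural hints for each token.

    Each record holds the token text, a type, and `begin`: the index of the
    record that opened the enclosing block (-1 at the top level). For an
    "end" record, `begin` points at the matching "start" record.
    """
    records = []
    open_blocks = []  # indexes of "start" records that are not yet closed
    for token in text.replace("{", " { ").replace("}", " } ").split():
        parent = open_blocks[-1] if open_blocks else -1
        if token == "{":
            records.append({"token": token, "type": "start", "begin": parent})
            open_blocks.append(len(records) - 1)
        elif token == "}":
            if not open_blocks:
                raise SyntaxError("unmatched closing brace")
            opener = open_blocks.pop()
            records.append({"token": token, "type": "end", "begin": opener})
        else:
            records.append({"token": token, "type": "word", "begin": parent})
    if open_blocks:
        raise SyntaxError("unclosed opening brace")
    return records

def build_tree(records):
    """Language-agnostic step: derive nesting purely from the recorded types."""
    root = []
    stack = [root]
    for record in records:
        if record["type"] == "start":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif record["type"] == "end":
            stack.pop()
        else:
            stack[-1].append(record["token"])
    return root

records = lex_with_structure("a { b { c } d } e")
print(build_tree(records))  # ['a', ['b', ['c'], 'd'], 'e']
```

Because `build_tree` reads only the `type` field, the same function would work for any lexer that emits records in this hypothetical shape.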

What Parsing Gets You

Parsing decomposes a large body of input into small, usable pieces for another system, whether that system is reading written language, understanding speech, or making use of computer programming code. Common applications that immediately benefit from parsers are applications dedicated to analyzing code: compilers (translators), linters, code beautifiers, minifiers, and so forth.
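
As a rough illustration of that reuse, the sketch below feeds one token list to two different consumers, a minifier and a beautifier. It is written in Python for a toy language with braces and semicolons; the language and the names `tokenize`, `minify`, and `beautify` are hypothetical and do not reflect how any particular tool is implemented.

```python
def tokenize(source):
    """Split a toy brace-and-semicolon language into tokens."""
    for symbol in "{};":
        source = source.replace(symbol, f" {symbol} ")
    return source.split()

def is_word_char(character):
    return character.isalnum() or character == "_"

def minify(tokens):
    """Emit the tokens with only the whitespace needed to keep words apart."""
    output = ""
    for token in tokens:
        if output and is_word_char(output[-1]) and is_word_char(token[0]):
            output += " "
        output += token
    return output

def beautify(tokens, indent="  "):
    """Emit one statement per line, indented by block depth."""
    lines, current, depth = [], [], 0
    for token in tokens:
        if token == "{":
            lines.append(indent * depth + " ".join(current + ["{"]))
            current = []
            depth += 1
        elif token == ";":
            lines.append(indent * depth + " ".join(current) + ";")
            current = []
        elif token == "}":
            if current:  # flush a statement that lacked a semicolon
                lines.append(indent * depth + " ".join(current))
                current = []
            depth -= 1
            lines.append(indent * depth + "}")
        else:
            current.append(token)
    if current:
        lines.append(indent * depth + " ".join(current))
    return "\n".join(lines)

tokens = tokenize("if x {y;   z;}")
print(minify(tokens))    # if x{y;z;}
print(beautify(tokens))
```

Neither consumer inspects the original text. Both work from the token list alone, which is the practical benefit of doing the decomposition once.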