Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularization of library #746

Closed
Nalinc opened this issue May 2, 2016 · 33 comments
Closed

Modularization of library #746

Nalinc opened this issue May 2, 2016 · 33 comments
Labels
L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue proposal

Comments

@Nalinc
Copy link

Nalinc commented May 2, 2016

There is a rising demand to move towards modularization of *marked * and separate different components in different files(grammar-rules, lexers, parsers and renderers). This though not have been mentioned explicitly, is evident from issues #743, #717, #704.

A properly modularized code would make maintenance easy and also allow addition of new grammar, implementation of custom parser, renderer and would address separation of concerns properly.

Like may be we can have an /src folder to fiddle with the code in development and use /lib for deliverable file.

@joshbruce joshbruce added help wanted L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue question labels Dec 3, 2017
@joshbruce
Copy link
Member

@Nalinc: See #956

This idea is intriguing to me, primarily from the perspective of making it easier for the library to communicate what and how it works from an architectural perspective.

I don't know enough about Marked to do this work myself. What would you propose as a full folder structure given what's already here?

@Nalinc
Copy link
Author

Nalinc commented Dec 4, 2017

hey @joshbruce

I worked on marked a bit in past and even changed some of its functionalities based on my needs back then. One thing I found annoying was keeping track of things in the same file even if I was working on a different component. Here's what I did back then:

  • I moved each of the primary components in a separate file in /src. This way I was left with core.js, grammar.js, lexer.js, parser.js and renderer.js in /src folder.
  • Each of these components was written in IIFEs so they weren't messing with global scope.
  • Finally, I had a grunt task that'd combine these components from /src and create a bundled file in /dist`

I found it easy because having independent components in different files, I could easily change/extend one component without worrying about others. The code became smaller, readable and more manageable too. I feel comfortable with marked and am open to make a PR on this.
Regarding #956, is there an exiting card for this or should I make one and move it to "under consideration"?

@joshbruce
Copy link
Member

#420

#660

@joshbruce joshbruce added this to the 0.5.0 - Architecture and extensibility milestone Dec 25, 2017
@joshbruce
Copy link
Member

@Nalinc: Sorry it's taken a moment - I totally didn't see (or catch) your comments and questions at the bottom.

That architecture sounds about right to me; given what I've seen in the issues and PRs. Let me tag in a few of the other recently active community members to weigh in as well (@UziTech, @Feder1co5oave, @KostyaTretyak). There's not an existing card beyond this one; so, congrats on that. :)

Regarding submitting a PR, this is where I would like more feedback from the community. Our focus for the 0.4.0 milestone is to try and get functional issues taken care of. We have this card marked for the 0.5.0 milestone...but, if we can do it without much negative impact to cleaning up the other issues - I think it could actually help make things a bit more obvious on what's causing some of the other issues. I also know @UziTech is trying to make some improvements to all things testing, which will help the issue house cleaning and worthwhile for the 0.4.0 milestone as well.

@joshbruce
Copy link
Member

joshbruce commented Jan 28, 2018

Been thinking about this a lot lately - blaming the day job as I'm a lead architect and trying to build up my chops in the software architecture practice. Mainly thinking about the open/closed principle and a way to make marked easy to extend while maintaining speed.

Would like to know the purpose of each of these, as I can't really discern them from looking at the code:

  • core.js - ??
  • grammar.js - ??
  • lexer.js - ??
  • parser.js - ??
  • renderer.js - ??

Most of the flavors of Markdown I've seen don't really go too far afield from the original Markdown specification - what I believe we call "pedantic". Looking at this as an object-oriented problem to solve, we might end up with something like the following as a class inheritance tree (based on specification):

  • Pedantic (Markdown.pl)
    • CommonMark
      • GFM (from what I understand from comments made by @Feder1co5oave - GFM is more like an extension of the CommonMark spec)

So, if we make "Pedantic" the default - it can be its own class. Then CommonMark can extend Pedantic - overriding what needs overriding and adding anew what needs adding. The GitHubFlavored can extend CommonMark...and so on. At which point, adding in new flavors should be relatively easy (famous last words). CriticMarkup, for example, would extend Markdown.

Of course, this introduces an interesting concept - combining specs in an ad-hoc fashion. So, instead of using inheritance to extend a base class (because Inheritance is the Base Class of Evil) we could have renderers for each of the specs - each with only the parts that are different from the others. So, using GFM would include the stuff from CommonMark and Pedantic - without inheritance - it would be through dependency or some other type of inclusion.

But, again, I'm not sure what the purpose of each of the recommended JS files is. What is the difference between a parser and a renderer and a lexer??

Also, what about allowing consumers to skip certain parsing steps (a text editor that says, "Don't allow headers," which would no longer need to go through the header check and parse steps)?? What about changing the order of parsing, does that affect performance or output at all??

@joshbruce
Copy link
Member

joshbruce commented Feb 10, 2018

  1. To ES6 or not to ES6 is one question on the table.
  2. I also learned more about JS recently - not in my usual language stack or selection (I know, weird, but I'm learning - lol). If we went totally prototypal based on specification - that could also be interesting totally making sh*t up here to demonstrate a non-ES6 concept (no fake classes):
Daring = {}

Daring.prototype.paragraph = function(markdown) {
  // whatever this is against the param :)
  // /^([^\n]+(?:\n?(?!hr|heading|lheading| {0,3}>|tag)[^\n]+)+)/
  return result;
}

CommonMark = {}

CommonMark.prototype.paragraph = function(markdown) {
  return Daring.paragraph(markdown);
}

GFM = {}

GFM.prototype.paragraph = function(markdown) {
  return Daring.paragraph(markdown);
}

(This could be horrible JS, btw)

I should be able to override all the things at that point, yeah? (Hyper-extensibility.) We capture the exact differences between each of the specifications. I mean, unless I'm just a complete idiot, which is possible - this move would resolve most of the "feature request" problems that got us into hot water in the first place.

Having said that, I have no idea how slow it would be - but Marked isn't the fastest kid on the block anymore anyway.

We could also bail on the the configuration options for choosing which flavor, because you're always doing it directly - then divide the architecture by flavor to minimize compilation and package size - for example, Daring would be the lightest weight solution for client-side...but have the least amount of extended features. GFM would require CommonMark - override and add where need be.

Add builds for minified versions of each flavor to the npm build script @Feder1co5oave just finished.

Granted I still don't know what the difference is between the lexer, the parser, and all that jazz. But, I mean, if we're going to be doing breaking changes anyway, might as well reconsider our lives a bit on the way. lol

Again, just spitballing from the sidelines based on things I've done in other languages to solve similar problems.

Tagging a gentleman by the name of @colinalford - He's been helping me with my JS, is wicked smart, and should probably not add this to the list of things he's already doing. rofl

@joshbruce
Copy link
Member

We can also do it in such a way that the changes aren't breaking for those who haven't modified or extend it - I think. Leave the same entry API as the entry point for the facade to do everything else.

@UziTech
Copy link
Member

UziTech commented Feb 10, 2018

The question is not really "To ES6 or not to ES6" because I sure don't want to stick with ES5 forever. It's more of a question of when to ES6.

ES6+ has lots of improvements for code readability and modularization. We will want to keep an eye on the benchmarks though.

P.S. ES2018 will come with named capture groups which would make the RegExp heavy code much more readable.

@joshbruce
Copy link
Member

@UziTech: How so? Can you drop a small code sample? Pseudo is good enough as long as it does es6 things. :)

@UziTech
Copy link
Member

UziTech commented Feb 10, 2018

I think before we start modularizing the code base we need to do some serious work on automating testing and benchmarking in node and a browser environment like chrome headless or phantomjs

@joshbruce
Copy link
Member

That's fair. Before that (or during it), we should probably keep working on the fixing known issues thing as well - this is just future visioning.

What do you propose we need? What stops us from getting any of them? (CI against PRs, for example.) Maybe a ticket on this? I don't think we have one yet.

@joshbruce
Copy link
Member

I'll flag that one as part of 0.4.0 milestone to make sure we don't go out of order.

@UziTech
Copy link
Member

UziTech commented Feb 10, 2018

@joshbruce Here is the same code with ES6 classes as your example above:

class Daring {
  paragraph(markdown) {
    // whatever this is against the param :)
    // /^([^\n]+(?:\n?(?!hr|heading|lheading| {0,3}>|tag)[^\n]+)+)/
    return result;
  }
}

class CommonMark extends Daring {}

class GFM extends Daring {}

Much less boilerplate code

@UziTech
Copy link
Member

UziTech commented Feb 10, 2018

As far as automated testing is concerned, it would be nice if @chjj could set up CI for PRs on Travis for mac and linux and AppVeyor for windows. Both are free for open source projects.

I could create the CI scripts to automatically run the tests on multiple versions of node and different browser engines including benchmarks.

That would help a lot when modularizing the code base to prevent accidentally introducing bugs or performance hits.

@joshbruce
Copy link
Member

Yep. This is where @colinalford really comes into play for this conversation because he knows more about it than me but the class system in JS under the hood isn't true classing is it? Not like the ones we find in Swift, PHP, Objective-C?

I mean every object in JS is not only a definition (a class) but an instance as well; so, by boilerplate are we talking about the "prototype" piece? And functions are first class citizens, yeah?

Inheritance of classes can get tricky from the perspective that I can't just inherit parts and pieces (https://youtu.be/bIhUE5uUFOA). And, if this is what makes it easier to modularize, I'm not sure we need/want ES6 for most of it.

This is also what makes the prototype system interesting...I can inherit just the things I want - the banana without the gorilla holding it. Similar to traits in PHP, for example.


Agreed on the automated CI piece.

Is that it; or, is there more we could/should be doing on that score?

(I don't know how we can do the CI thing and actually maintain the repo being under @chjj . I mean it took us a lot of work to get to having me able to publish things, thanks for that. Does GitHub have a "I would like to take over this thing" like NPM does?)

@UziTech
Copy link
Member

UziTech commented Feb 10, 2018

Right, the class syntax is just hiding all the boilerplate "prototype" pieces. The code is doing the same thing just much easier to read.

We can just copy parts if we need to:

class CommonMark {
  // Daring has many other things but
  // we only want the paragraph
  paragraph(markdown) {
    return Daring.paragraph(markdown);
  }
}

again same as the ES5 code, just cleaner.

ES6 has many benefits over ES5 especially engine optimizations but since we will be transpiling it to ES5 anyways developer sanity will be the only benefit we will get until we can drop older browser/node versions.

That being said, keeping up with current ES versions will help us get those optimizations when we can drop the old version since we won't have to do an entire rewrite. All we will have to do is change our transpiler configuration.

@UziTech
Copy link
Member

UziTech commented Feb 10, 2018

Once the class fields proposal lands in ES it will be even easier:

class CommonMark {
  // Daring has many other things but
  // we only want the paragraph
  paragraph = Daring.paragraph;
}

@joshbruce
Copy link
Member

joshbruce commented Feb 11, 2018

@UziTech: Fair. And I agree. Now the next question, not a recommendation, just want to cover the base:

What about TypeScript?

It adds keywords that do not exist in the ES spec as of today, but might in the future; could the same argument be made and does it hold the same value?

@colinalford
Copy link

This is a great Crockford video, and he advocates what he calls a class free paradigm starting 42:50: https://youtu.be/DePE0ffiMf4

No need for classes, no need to worry about inheritance. Write a constructor, add the parts you want from other classes, and return the new object. Don't have to wait for the new spec, either. Very cool.

I am not the biggest fan of classes in JavaScript since it's a false construct. I write Typescript everyday for an Angular project, though, and I don't think it really matters given you understand the underlying JavaScript.

Re: using Typescript, I really like it for the type system. It has been incredibly helpful organizing my code. However, it's kind of a pain to setup and adds a bunch of complexity with the build. I don't think I would advocate adding it for such a small project -- ~1300 LOC seems manageable without a type system.

@UziTech
Copy link
Member

UziTech commented Feb 11, 2018

@KostyaTretyak has done a pretty good job of rewriting marked using TypeScript in marked-ts

I'm not opposed to TypeScript but I feel like that would be an extra barrier to getting contributors. Anyone who would want to submit a pull request would need to learn TypeScript.

The class-free method seems clever, which is usually not a good thing in the long run. Almost all code smells were once clever hacks.

My vote is for sticking with modern standards. It seems like the easiest way to not fall behind in the future.

@colinalford
Copy link

@UziTech I hear your concerns about being clever, but it is generally accepted in OOP that we should prefer composition over inheritance. The class free approach means using only composition while ignoring inheritance.

There has been a lot of debate in the community since the introduction of classes about whether we should even use them or not. I'm not sure I would say its a modern standard just because its part of the modern spec.

But that's JS in a nutshell. Its not like we ever had a PEP like Python or whatever the PHP style guide is. Probably Crockford's Good Parts is the closest thing in the past, and AirBNB's style guide is pretty popular nowadays, so maybe pick that (they prefer classes to prototypes, btw).

IMO as long as everyone understands how Javascript really works and that classes don't actually exist in JS, then it is a nice syntax to organize your code. My fears are based around Java/C# devs working on ES6/Typescript projects for the first time without learning the differences between the languages.

Also, here's a really good article from Dan Abramov titled "How to Use Classes and Sleep At Night": https://medium.com/@dan_abramov/how-to-use-classes-and-sleep-at-night-9af8de78ccb4

@colinalford
Copy link

@joshbruce Actually, that's a good recommendation. Use AirBnb's Style Guide for the project.

@joshbruce
Copy link
Member

@colinalford: See PHP-fig - PSRs 1 and 2 specifically - https://www.php-fig.org :)

Agreed on the AirBnB style guide. The USWDS also started using it. It definitely seems to be becoming a more accepted standard. Having said that, as I'm sure you already know about me, ultimately it's up to the people actually maintaining the majority of the codebase. ;)

(@Feder1co5oave and @UziTech, AirBnB is definitely worth looking into.)

I think that's my only issue using the class keyword in JS - it seems like a lie and hides the more functional-like compositional approach of JS. So, it's like JS is trying to be more class-like (PHP, Java, et al), while I'm trying to make PHP feel more functional...the world has gone topsy-turvy I tell you!

@UziTech: This is what I'm referring to on that score. https://github.com/8fold/php-html-component/blob/master/tests/ComponentTest.php

@joshbruce
Copy link
Member

@colinalford: Having said that, we are talking linters:

#999 - #1020

@UziTech
Copy link
Member

UziTech commented Feb 11, 2018

@colinalford sticking with modern specs is what I meant by modern standards. Specs aren't always the best way to do things but they are the way that most people know and the starting point of future specs. If we want to keep this project from becoming stale again we will want to keep up with the specs.

I'm not married to the javascript class either but I think this:

class Markdown {
  paragraph() {
    // code...
  }
}

is easier to read and write than:

function Markdown(){}

Markdown.prototype.paragraph = function () {
  // code...
}

If we want something to be a javascript "class" we might as well call it so.

The class syntax is not really why I want to use ES6+ that was just an easy example of making the code easier to read.

ES6+ also comes with:

  • Block scoped variables (i.e. let, const)
  • Default parameters
  • Destructuring
  • Template literals
  • Arrow functions
  • Async/Await
  • and many more...

Again, since we are compiling to ES5 in the build, the only benefit to any of this is ease of development.

Being that this is a community project with lots of people working on it (as opposed to a single person writing most of the code) remember that which ever code style we pick will need to be learned and written by anyone who submits a PR. Sticking with modern specs seems to be the easiest way to be sure that most people are able to easily help with the project.

@joshbruce
Copy link
Member

@UziTech: Agreed on the easier to write; however, it seems a bit more awkward to cherry pick functionality - but I suppose, technically, they're static methods.

Also, agree with @colinalford, that as long as everyone understands what's going on under the hood re classes, it's not that big of a deal.

I do appreciate a lot of what ES6 offers. I also appreciate the readonly brought in by typescript (as the let and const are not the same as what we see in something like Swift).

I might have to call a bit on the "lots of people working on it" piece a bit. Most of the PRs submitted since taking over have been from 2 people. Most of the "contributions" have been in the form of issues - not fixes - by other people...this, in my opinion, is what made Marked go stale after reading the issues. People just asking for more and more stuff and people either letting it through (not acting like the hardcore filter @chjj used to be when he was still participating) and the other contributors and Chris seeming to walk away - it just became too much to sustain - but some folks didn't want to walk away when we put the call out - it's got potential.

To be honest, if you (specifically) and @Feder1co5oave hadn't showed up, I was actually going to start systematically shutting it down and probably closing the project altogether - after having been made an owner of the package.

The world doesn't need another dead NPM package to choose from based on its popularity by being one of the first. #956 - for future reference while working on other things.

@UziTech
Copy link
Member

UziTech commented Feb 12, 2018

It will be much easier to cherry pick once private methods are introduced.

"lots" may have been an exaggeration but my point was that this is a project where the only way it will last very long is if it is easy to hand off to the next person that wants to maintain it. I don't think there would have been two or three people willing to take over this project from Chris if it was written in Elm or ClojureScript.

While I enjoy writing TypeScript, I will never use it for one of my projects unless the project is meant for the TypeScript community (like an angular tool).

@joshbruce
Copy link
Member

Not entirely sure, but it seems like #1077 and the push for a faster move to the 1.0 is clearing scrutiny. It got me thinking about the transition period between 1.0 (marked with current APIs and whatnot) and 2.0 (all the architecture and deprecation changes ever) and backward compatibility while actually making the changes. We might be able to maintain marked.js as an entry point that then references other things based on the setup options.

marked.js
core.js - rules common across supported flavors
grammar.js - maybe not necessary at root
lexer.js - same
parser.js - same
renderer.js - same
/DaringFireball
  core.js, grammar.js, lexer.js, parser.js, renderer.js
/CommonMark
  core.js, grammar.js, lexer.js, parser.js, renderer.js
/GFM
  core.js, grammar.js, lexer.js, parser.js, renderer.js
/[some other flavor]
  core.js, grammar.js, lexer.js, parser.js, renderer.js

Then we could compile minified versions for each flavor to allow users to take only the functionality they need into their projects. We could also modify the way the option settings change things in marked.js itself.

pedantic: true - use DaringFireball indirectly
gfm: true - use GFM indirectly (if both are set to true, use non-pedantic)

// Create more generic setting
flavor: daring|gfm|common-mark|[some other flavor]

Doing the update to the docs has me seeing a lot of setting options I'm curious if we need or want as well. gfm: true, tables: false, for example, why not use CommonMark instead?

@joshbruce
Copy link
Member

#1347

@styfle
Copy link
Member

styfle commented Sep 29, 2018

@joshbruce @UziTech This ticket along with #1288 and #1347 seems to be requesting the same thing. What do you think of closing the duplicates?

I would imagine we still want to focus on spec compliance before tearing things apart to modularize, and probably wait for ESM in Node.js to go stable.

Since this ticket appears to be the oldest, should we close the other two and get all this discussion in one place?

@joshbruce
Copy link
Member

I'm down as long as the overarching thread stays connected.

Definitely sticking to spec compliance for now: https://github.com/markedjs/marked/milestones - doesn't really change until 1.x or 2.x.

I'm not sure we should close #1288 and #1347 to be honest. I'd almost want to keep the newest open, to be honest.

@saragarmee
Copy link

I'm fine with closing #1347
I'm the one that opened it. I've completed the refactor and it seems to be working, feel free to pull whatever you like from my fork if it's helpful in any way. I did have some questions at the end of my admittedly lengthy work through that I really would love to see answered if possible though.

@styfle
Copy link
Member

styfle commented Oct 4, 2018

Closing this ticket as a duplicate of #1288

@styfle styfle closed this as completed Oct 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue proposal
Projects
None yet
Development

No branches or pull requests

6 participants