Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More flexible design for generator #14

Closed
yannisgu opened this issue Oct 20, 2014 · 20 comments
Closed

More flexible design for generator #14

yannisgu opened this issue Oct 20, 2014 · 20 comments

Comments

@yannisgu
Copy link
Contributor

When implementing the PR #13 I realized that the current design is not that flexible. We should discuss about some enhancements in the design 😄

Problems of the current design I see (personal opinion):

  • To add new features, code-changes must be performed at different places deep in the code. There is no simple possibility to extend the code
  • Not perfectly testable
  • Tightly coupled to HTML: Since a document provided by roslyn is translated directly to HTML it is not easy to add a search-index with rich metadata.

Proposed high-level design

  • Code analysis and html (and index) creation are two completly different steps
  • The code analysis creates an simple object model from the code which is then passed to the transformers
  • The model could like similar to this:

Model

  • There are two transformers
    • HTML: generates the HTML and writes the JSON documents
    • Indexing: updates the search index (e.g with Lucene.Net)

Thoughts?

@MarcinJuraszek
Copy link
Contributor

Looks much better than just concatenating strings :)

  • Do you want to tight simplified code structure model and internal links between read tokens within the same model? I have a bit of concerns about that. I think the linking should be done as part of HTML transform pass. It would also allow you to run code analysis on multiple files in parallel, as you don't have to have any knowledge from other files while doing that. It would be just simple transformation between Roslyn syntax tree into your own model.
  • Do you want to capture between related files, e.g. aspx and aspx.cs with the same name? If so, you'd need a way to relate two Document instances together.

But I was wondering if you need your simplified model at all. I think you can run your transformers on Roslyn Syntax Tree directly. Just structure it better than before :)

@JoshVarty
Copy link
Contributor

The current design certainly leaves a lot to be desired, and was thrown together as a proof of concept to gauge whether or not others would be interested in this project. It appears they are, so I agree we should rethink the design.

I think it's a good idea to use a simplified model instead of running the transformers on Roslyn's syntax tree. One of my long term goals for this project is to support languages beyond C# and VB .Net. If we can define a simple model, then the only language-dependent functionality would be the creation of this model. We could use the HTML and search transformers on any model, regardless of the underlying language.

@yannisgu
Copy link
Contributor Author

Thank you for your inputs @MarcinJuraszek and @JoshVarty

  • I think the extra model is it worth. You are right @MarcinJuraszek a object link across documents breaks possibility for parallelism (or add extra complexity). Maybe we can agree that the FullName of a token is the key. So we can create links based on string-keys.
  • Kind of relation between between documents would be great. Not sure what is better a property ParentPath in document (contains the path of the parent file/folder) or a property Parent referencing to the parent documenet-object.

ParentPath(Edit):

Parent(Edit):

@MarcinJuraszek
Copy link
Contributor

@JoshVarty Right, I forgot about that. Simplified model is definitely better if you're planning to get support for languages beyond Roslyn project.

@yannisgu I link the Parent approach. It seems to be much more flexible. However, I don't like the name. `IFileSystemObject' suggests it's file system related structure. When actually it's Solution/Project-only relationship. For aspx and aspx.cs files there is no File System structure, they both live in the same folder. VS shows the relationship because it's aware of it.

Also, if you're planning on moving forward with other languages, you should remember that for some languages (e.g. F#) order of files/folders in project/solution is important. Adding it to the model now seems like a good idea.

@yannisgu
Copy link
Contributor Author

I just can't find a good name 😞 IProjectObject, but not every language has a project system? Or IItem seems a bit too general? Anyone have a good name?

To handle order we probably need a property Childern

@JoshVarty
Copy link
Contributor

What sort of relationship are you trying to establish between aspx and aspx.cs files with the same name?

@AmadeusW
Copy link
Member

@yannisgu & @MarcinJuraszek I agree with the high level design. It will take advantages of single responsibility principle. The various transformers may execute asynchronously if HTML transformer finished and directs the web user to the html site for the project. It will be huge for testability.
As for the interface name, how about language agnostic ICodeItem or if we also represent directories, IProjectObject or IProjectItem?

@JoshVarty I think Document <-> Document, because these two files are logically connected. Similarly, .cs, .xaml.cs, and underlying .g.cs are partial classes that live in documents that have a relationship. I think that it is in scope of Document to handle logical connection, while proposed IToBeDetermined would handle physical location. Toughts?

@JoshVarty
Copy link
Contributor

@AmadeusW

You're right. Now that I think about it, the relationship would need to be established if we were to properly display the files in the treeview. (That is display the backing .cs and .g.cs. files as children of their corresponding document the same way Visual Studio's Solution Explorer does)

@yannisgu
Copy link
Contributor Author

Ok. I think we agree now all on the high-level design. The last discussion point about this design is about how to handle physical/logical relation between documents and folders.

So I hope those are the last diagrams I add to this post:

a) We handle physical and logical relation as the same thing, since a document cannot have at the same time a parent-folder and a parent-document:

or b) we separate those things, since they are not directly related

@AmadeusW
Copy link
Member

I like option a.
I'd just like to add a property to the overarching Model: Version information. Sooner or later we'll find ourselves having to update the model, we should know what version is that model from so that we can process it properly.

@JoshVarty
Copy link
Contributor

@yannisgu I prefer approach A.

One final note: TokenLink should probably be renamed to SymbolLink as we're not actually linking to a single token.

@yannisgu
Copy link
Contributor Author

Ok so go for IProjectItem.
@JoshVarty ok let's rename it

@AmadeusW hmm don't get that.. In which situation the version with which you created the model don't match the version in which you are consuming it?

@AmadeusW
Copy link
Member

@yannisgu ah, is the model a transient structure that exists solely for a short period of time between user selects a github repo and we finish processing it?

@yannisgu
Copy link
Contributor Author

Yep, that's what I thought.

So, how to proceed here? This change will probably break everything, so maybe all PRs should be closed first?

JoshVarty added a commit that referenced this issue Oct 26, 2014
Based off the design discussed in #14
@JoshVarty
Copy link
Contributor

After implementing the inital design, I think only one question remains.

How should the SymbolLink (formerly TokenLink) represent the referenced type?

If we use fully qualified name, the HTML transformer will need to know where every type is defined (ie. the document and line number, so it can build the correct link).This seems like something that would be beyond the scope of the HTML transformer.

Maybe we include: Fully qualified name, containing document (or document path) and line number?

@yannisgu
Copy link
Contributor Author

See: #14 (comment)

Do you want to tight simplified code structure model and internal links between read tokens within the same model? I have a bit of concerns about that. I think the linking should be done as part of HTML transform pass. It would also allow you to run code analysis on multiple files in parallel, as you don't have to have any knowledge from other files while doing that. It would be just simple transformation between Roslyn syntax tree into your own model.

The linking should be done by the HtmlTransformer. In fact the HtmlTransformer know about the whole model since it transforms the entire model. The code to find the correct document/line should not be to extensive with LINQ.

@JoshVarty
Copy link
Contributor

Ok, I'm a little concerned about performance, but we can see once we've gotten there.

I've started rewriting the DocumentWalker to generate the DocumentModels. One more question: How should we represent trivia? Should we have a Leading and Trailing trivia field for each token?

JoshVarty added a commit that referenced this issue Nov 10, 2014
Based off the design discussed in #14
@AmadeusW
Copy link
Member

Just a reminder: @JoshVarty remember to support folders which are in the repo but come before the .sln file. Currently they are not in the model, and we don't account for them (but we should)

@JoshVarty
Copy link
Contributor

What are the opinions on the first three transformers?

I've created a base abstract class AbstractWorkspaceVisitor that all transformers inherit from. It takes care of visiting the projects and folders properly.

Based on their needs transformers may override:

  1. VisitWorkspace
  2. VisitProjectItem
  3. VisitFolder
  4. VisitDocument

This allows transformers to process only the pieces of the WorkspaceModel that they're interested in. (ie. A search index transformer likely does not need to visit Folders when building its index.)

If everyone is roughly satisfied with this approach, I'll go ahead and close this Issue. Also, I think @MarcinJuraszek was right about immutability. There's no needs for a lot of the WorkspaceModel items to have setters. However, I'll create a separate issue for that.

@MarcinJuraszek
Copy link
Contributor

Wow, I think I missed all the work that was happening in the workspaceModel branch ...

Great job guys! I'll take a look into all of this tonight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants