Add Typescript support #2
Sounds interesting! The first thing you need to look into is how to collect all source information from TypeScript code. Is there already a library for generating and iterating an Abstract Syntax Tree? What information does it provide? Definitions of types, variables, functions and where they are referenced, including their exact locations in the source code? Once you have a prototype for that, we can think about storing everything with SourcetrailDB. |
Hello Eberhard! I will have to do some research to answer all of your questions, but I know that TypeScript already has a language service that adheres to the Language Server Protocol, which is what I would probably start with. Alternatively, the TypeScript compiler has an API for parsing and working with the AST. From a quick test, it looks like it understands React (JSX) syntax natively as well, so that's awesome. The AST API can definitely extract source code positions, types, etc.; it's pretty thorough. The hard part will probably be the interop with C++ and Sourcetrail. It may be as simple as pointing C++ at an external tool and parsing the output. That would probably be slow to run, but fast to implement. I can start looking into that over the next couple of days, and this weekend will be a good time for me to really dive into it. |
Sounds good to me! Note that we are using SWIG for the Python binding used in our Python Indexer. The SWIG page states that JavaScript is also supported for bindings, which might work for you. Anyway, let's worry about this later. We had a quick look at the Language Server Protocol some time ago and were not sure if it supplies all the necessary information. Its usage also seems limited to single requests, which is not ideal for Sourcetrail indexing, but can work if the indexer is written like a crawler. That said, we have never taken a deep look into LSP so far. |
I don't know too many details of the LSP; I just knew it was what VS Code and a couple of other editors use to easily add language support for things like renaming, jumping around the code, etc. If it doesn't provide enough information, I can understand why you would have gone with a different solution. I can confirm experimentally that the TypeScript library that JavaScript code uses to compile TypeScript comes with a wealth of information, including types, source code positions, expressions, and more. It should be more than enough for Sourcetrail to build an index. |
@bheklilr, did you make any progress on this one? Otherwise I would start working on making the SourcetrailDB interface callable from Javascript. |
@mlangkabel I looked more into the Typescript side of things (just research) to see how to extract the index data from an entire project, and I'm still working on that. I didn't get as much time this weekend to work on it as I would have liked, but there still has been some progress. I'll try to get some of my notes committed tonight. |
I've got the skeleton of a project set up following https://github.com/CoatiSoftware/SourcetrailPythonIndexer; it's currently in my repos at https://github.com/bheklilr/SourcetrailTypescriptIndexer. I've gotta get back to work right now, but I'll update it later tonight with more info on what I've found out. I also discovered that I won't be able to do any real development on my work laptop (even though this is justifiably work-related), because I need a C++ compiler to do the FFI between Node and the DB library. It looks like it won't be too difficult to get working using https://github.com/node-ffi/node-ffi: I can just load a .dll or .so and build the interop layer between JS and SourcetrailDB. I'm planning on building this project in JS, even though it's for TS. That will simplify the build process, since TypeScript just compiles to JS. After looking at the Python code, I don't expect this to be a particularly large project. |
Thanks for the update! For the Python interop layer we are using SWIG, so the SWIG interface definition already exists. This morning I briefly looked into generating SWIG-based JavaScript bindings http://www.swig.org/Doc3.0/Javascript.html. It should be possible, but when generating that layer one has to specify a target engine (-jsc, -v8, or -node). I guess node-ffi only allows generating bindings that work with the node engine, but that would also be fine. One more thing to watch out for is the licensing of the external dependencies and tools we use when creating these extensions: for node-ffi it is looking good :) |
@mlangkabel I'll be sure to go through and check the licensing on the dependencies. Is there any particular restriction I need to keep in mind? I'll have to evaluate which route will be best for developing the bindings. If I remember correctly SWIG is more of an auto-generated set of bindings that get compiled into an extension module (at least for Python), while node-ffi dynamically calls into a shared lib at runtime. There are probably more performance concerns there. As for which JS engine to use, it'll certainly be node. |
For licensing: the libraries and tools we use just need to be compatible with the Apache license of this project. I had a further look at the SWIG-based Node bindings, and they are not as easy to set up as expected: it looks like SWIG's support for more recent versions of Node is broken. |
I now have bindings for TypeScript/Node.js at https://github.com/LouisStAmour/sourcetraildb -- not published to NPM, not much in the way of unit tests, but "it works!" if you have CMake installed. I've only tested it with the VS Build Tools on Windows, so I have no idea how portable the CMake instructions are; file paths may need adjustment. As for the code itself ... it could be cleaned up. There's a lot of copy and paste in the CPP code, and there's no guarantee that the TypeScript declaration actually matches the CPP implementation. Ideally I'd code-generate everything with a custom script from an AST of the original CPP SDK or similar, or use the TS declaration to generate the CPP bindings somehow, instead of manually testing values and setting errors. Additionally, I'm thinking of changing the wrapper or TS API from returning success booleans to returning void and throwing exceptions. I think that's more the pattern expected of TS code, and it's annoying to need wrapper code to fail hard on an error when an exception would do it for you. I might do it upstream too, as it was a bit annoying not to be able to use CPP's excellent exception support and to return std::pairs with success booleans from conversion functions instead. Finally, I'm not convinced the TS API works exactly the way I want it to. I tried to simplify creating NameElement objects with a constructor that can take either just a name or 3 parameters: prefix, name, and postfix. But it still feels odd to create both NameHierarchy and NameElement objects; ideally there'd be some kind of overload to make simple use cases easier. You can see the ported CPP example as cpp_example.ts in the repo. I'm a bit tired after spending all weekend researching the best way to write native code for Node, and ending up at CMake.js with node-addon-api over N-API, plus a TypeScript declaration for ease of use.
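On the NameElement/NameHierarchy point, here is a minimal sketch of an overloaded constructor plus a shortcut for plain name paths. The class shapes, `fromNames` and `child` are hypothetical and are not the API of the sourcetraildb bindings. `toString` joins only the names and leaves out prefixes and postfixes.

```typescript
// Hypothetical sketch: echoes the idea of SourcetrailDB's NameHierarchy and
// NameElement, but these shapes and helper names are assumptions.
class NameElement {
  readonly prefix: string;
  readonly name: string;
  readonly postfix: string;

  // Just a name, e.g. for a namespace or class.
  constructor(name: string);
  // Prefix, name and postfix, e.g. for a method signature.
  constructor(prefix: string, name: string, postfix: string);
  constructor(first: string, second?: string, third?: string) {
    if (second === undefined) {
      this.prefix = "";
      this.name = first;
      this.postfix = "";
    } else {
      this.prefix = first;
      this.name = second;
      this.postfix = third ?? "";
    }
  }
}

class NameHierarchy {
  readonly delimiter: string;
  readonly elements: readonly NameElement[];

  constructor(delimiter: string, elements: readonly NameElement[]) {
    this.delimiter = delimiter;
    this.elements = elements;
  }

  // Shortcut for the common case: a plain path of names such as api.MyType.
  static fromNames(delimiter: string, ...names: string[]): NameHierarchy {
    return new NameHierarchy(delimiter, names.map((n) => new NameElement(n)));
  }

  // Returns a new hierarchy with one more element; the original is unchanged.
  child(element: NameElement): NameHierarchy {
    return new NameHierarchy(this.delimiter, [...this.elements, element]);
  }

  // Joins only the names; prefixes and postfixes are left out.
  toString(): string {
    return this.elements.map((e) => e.name).join(this.delimiter);
  }
}

const myType = NameHierarchy.fromNames(".", "api", "MyType");
const myMethod = myType.child(new NameElement("void", "my_method", "() const"));
console.log(myMethod.toString()); // api.MyType.my_method
```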
If you're looking for native indexing support in Sourcetrail for TypeScript, you won't find it here yet. Maybe I'll have a proof of concept up by next weekend, but writing the bindings was supposed to be the easy part. Now I need to map either TypeScript or Babel types (probably Babel's, because it's generally more compatible with raw source) to the various calls and Kinds used by this API. Using a tree walker with Babel, I suspect, will actually be easier than using this API correctly with Sourcetrail -- it might take a few iterations of trial and error before things turn out exactly as they should... For now, given the beta nature of these bindings, I don't feel comfortable publishing them to NPM yet; just point your package.json to the git repo if you want to try it. Once I've got things working for TypeScript, I'd like to turn my attention to C# support via Roslyn's AST next. That might be as easy as compiling the C++ code with a managed C++ compiler, or it might require SWIG bindings; we'll see. I've recently worked at a couple of companies on a JS/TS/C#-heavy stack, and I miss Sourcetrail from when I worked on primarily Java codebases, so this is my way of trying to get that back! JS or TS bindings will probably prove very flexible language-wise, as any AST listed at https://astexplorer.net/ has JS bindings. It's a pretty long list thanks to the number of languages that compile to JavaScript or have semantic support in JS-based IDEs like VS Code, including, in alphabetical order: CSS, GraphQL, Graphviz, Handlebars (templating), HTML, ICU MessageFormat, JavaScript (of course), JSON, Lua, Markdown, MDX, PHP, Regexp, Scala, Solidity, SQL, WebIDL, and YAML. Now, how much value will you get out of Markdown or Regexp support with this kind of tool? Probably not much. But it's possible.
Me, I'll be trying to map JSX or TSX to JS symbols, but beyond that I'm not sure how usefully Sourcetrail could integrate with a document-based syntax like HTML without importing an entire ontology of JS DOM node types from MDN or W3C specs, and even then... the dynamic nature of most template languages (and JS itself) means it can be hard to know with 100% certainty what's going on. But it's a step in the right direction... |
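A rough sketch of the tree-walker idea: walk a Babel/ESTree-style tree depth-first and record the declarations whose node type has a known Sourcetrail symbol kind. The `AstNode` shape and the `kindByNodeType` table are simplified assumptions. A real indexer would also need source locations and references, which this sketch leaves out.

```typescript
// Minimal ESTree/Babel-like node shape (an assumption; real Babel nodes
// carry many more fields such as loc, start and end).
interface AstNode {
  type: string;
  name?: string;
  id?: AstNode | null;
  body?: AstNode | AstNode[];
}

// Hypothetical mapping from Babel node types to Sourcetrail SymbolKind names.
const kindByNodeType: Record<string, string> = {
  ClassDeclaration: "CLASS",
  FunctionDeclaration: "FUNCTION",
  TSInterfaceDeclaration: "INTERFACE",
};

interface FoundSymbol {
  kind: string;
  name: string;
}

function isAstNode(value: unknown): value is AstNode {
  return typeof value === "object" && value !== null &&
    typeof (value as { type?: unknown }).type === "string";
}

// Depth-first walk: record declarations we know how to map, then visit every
// property that holds a node or an array of nodes.
function walk(node: AstNode, out: FoundSymbol[]): void {
  const kind: string | undefined = kindByNodeType[node.type];
  const declName = node.id?.name;
  if (kind !== undefined && declName !== undefined) {
    out.push({ kind, name: declName });
  }
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) {
      for (const child of value) {
        if (isAstNode(child)) walk(child, out);
      }
    } else if (isAstNode(value)) {
      walk(value, out);
    }
  }
}

const program: AstNode = {
  type: "Program",
  body: [
    { type: "ClassDeclaration", id: { type: "Identifier", name: "TodoStore" }, body: { type: "ClassBody", body: [] } },
    { type: "FunctionDeclaration", id: { type: "Identifier", name: "render" }, body: { type: "BlockStatement", body: [] } },
  ],
};

const found: FoundSymbol[] = [];
walk(program, found);
console.log(found.map((s) => `${s.kind} ${s.name}`).join(", ")); // CLASS TodoStore, FUNCTION render
```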
@LouisStAmour: Wow, you are awesome! Thank you for making this! We've just checked out your repository and got the example to run without any issue. So this is already looking really nice! Unfortunately we don't have a TypeScript expert on our team, so we cannot really give you any advice on the TypeScript side of things, but if you say that exceptions are the TypeScript way, go for it! It should be very straightforward to turn the bool that the C++ code returns into an exception on the TypeScript side. When you start creating the actual TypeScript indexer, please take a look at our SourcetrailPythonIndexer repository. The SourcetrailPythonIndexer can be run from within Sourcetrail via the For the SourcetrailPythonIndexer we started by creating a deep indexer that traverses the AST and for every symbol that is referenced (e.g. the line Right now the TypeScript bindings for SourcetrailDB are located in a separate repository. We discussed this internally and we would really like to the main SourcetrailDB repository. So if you want to, you could
With that setup we would be able to keep the Typescript bindings up to date with all upcoming changes to the database. |
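Converting the boolean results into exceptions on the TypeScript side can be done with one small helper. This sketch assumes a binding whose calls return `false` on failure and expose the reason through a `getLastError()`-style method, as the C++ writer does. The `BoolApi` interface and the `FakeApi` class are hypothetical stand-ins, not the actual bindings.

```typescript
// Hypothetical binding shape: success is reported as a boolean and the
// reason is fetched separately. This interface is an assumption.
interface BoolApi {
  open(dbPath: string): boolean;
  getLastError(): string;
}

// Turn a `false` result into an exception carrying the last error message.
function check(ok: boolean, api: BoolApi, what: string): void {
  if (!ok) {
    throw new Error(`${what} failed: ${api.getLastError()}`);
  }
}

// Fake implementation, only so the helper can be exercised.
class FakeApi implements BoolApi {
  private lastError = "";

  open(dbPath: string): boolean {
    if (!dbPath.endsWith(".srctrldb")) {
      this.lastError = `not a database path: ${dbPath}`;
      return false;
    }
    return true;
  }

  getLastError(): string {
    return this.lastError;
  }
}

const api = new FakeApi();
check(api.open("project.srctrldb"), api, "open"); // succeeds, nothing thrown

let failure = "";
try {
  check(api.open("project.txt"), api, "open");
} catch (e) {
  failure = (e as Error).message;
}
console.log(failure); // open failed: not a database path: project.txt
```

Because the arguments are evaluated before `check` runs, `getLastError()` already sees the error set by the failing `open` call.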
Fantastic, thanks. I'll look at doing that, though I might not have time until later in the week. Regarding deep introspection, I'm pretty sure Babel's AST is relatively shallow, more like a parser's, as it works file by file. It has the benefit of understanding the most real-world source code, though, because most code goes through Babel's transformations, and sometimes Babel is used to transform code directly into what the browser will run, though usually there are additional steps. TypeScript's compiler, on the other hand, goes through an entire project and builds out incredible detail on which symbols reference which types, where they were defined, etc., some of which is mentioned here https://levelup.gitconnected.com/writing-a-custom-typescript-ast-transformer-731e2b0b66e6 and so, ideally, you'd chain Babel and TypeScript together, with source maps preserving the lookup from the transformed Babel code back to the original TS code. The catch is that if the symbols don't match one-to-one, the TypeScript version will likely contain code that wasn't present in the initial Babel one. And given how many Babel transformations there are to support, folks might have to pick one or the other to start. TypeScript's compiler is not at all extensible, but it does have JSX support: https://www.typescriptlang.org/docs/handbook/jsx.html Here's further documentation on the TypeScript binder: https://basarat.gitbooks.io/typescript/docs/compiler/binder.html I'll probably need to read and re-read the TypeScript binder docs to build the indexer, and so the first projects we'd support would be ones that the |
I figured the hard part was interfacing with the compiler rather than refactoring the Node bindings to merge with this repo, so I got started on an indexer command-line tool, using ts-command-line and some compiler code samples. There's no readme yet, as it's still set up for local development (package.json points to a local path for the sourcetraildb module) and it's not yet complete, but I figured I'd share where I'm at. It looks like TypeScript's compiler prefers getting all project files at once rather than going one by one, and has a compiler mode where it's passed a single Anyway, you can have a look at https://github.com/LouisStAmour/SourcetrailTypescriptIndexer/ for my progress. My goal for tonight is to get the basics done so I can run the indexer directly and have it produce a Sourcetrail DB for some todomvc examples: https://github.com/tastejs/todomvc After that, I'll worry about more advanced TS projects and helping refactor the code for future maintenance... |
Okay. So ... I'm finding Sourcetrail's model slightly hard to map to TypeScript/ECMAScript's model. Not impossible, I'm just having to wrap my head around it. I do wish the graphics at https://www.sourcetrail.com/documentation/#Nodes (all the way down to the Graph Legend, etc.) were mapped to SourcetrailDB representations more clearly. If I understand it correctly, in the following scenario: I should have packages for "typescript", "@types/react", "csstype" (under a "node_modules" package?) and then, obviously, a file for each source file. It's unclear to me whether I should use relative or absolute paths, though I suppose that will become obvious as I go. It'd be nice if I could add "folder" types, maybe, and keep node modules separate. The idea in JavaScript modules is that a module can be uniquely identified by the full file path it came from, and it looks like TypeScript's compiler automatically tracks symbols back to where they came from. I'm not sure whether it automatically resolves source map files, though; probably not. It probably sticks to .d.ts files and falls back to the minified JS if it can infer types from it. If I wanted to make it easier to display node_modules code, or have it prefer showing the original source via source maps, I'd have to do the math myself. If source maps aren't available, we may end up with a reference to code that's next to impossible to display without "de-minification" to re-introduce line breaks into the JS. But ... I feel like I'm missing something in mapping TS types to Sourcetrail types. Sometimes in TypeScript you have "internal" or non-exported symbols, and other times you have exported symbols. Making it more confusing, sometimes exported symbols are callable but "undocumented" because they're not emitted into .d.ts files. And I can't help but feel like I'm just scratching the surface of the complexity here. I kind of wish some of this were more clearly expressible in the SourcetrailDB API.
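For grouping dependencies by package, one option is to derive the package from the resolved file path, since a module is identified by the path it came from. `packageNameFromPath` below is a hypothetical helper and not part of TypeScript's API. It uses the last `node_modules` segment, so a nested dependency is attributed to the innermost package, and it treats `@scope/name` as a single package.

```typescript
// Hypothetical helper: map a resolved file path to the npm package it
// belongs to, so node_modules symbols could be grouped per package.
function packageNameFromPath(filePath: string): string | undefined {
  const parts = filePath.split(/[\\/]/); // accept both / and \ separators
  const idx = parts.lastIndexOf("node_modules");
  if (idx === -1) return undefined; // project source, not a dependency
  const first = parts[idx + 1];
  if (!first) return undefined;
  if (first.startsWith("@")) {
    // Scoped package: the name spans two path segments.
    const second = parts[idx + 2];
    return second ? `${first}/${second}` : undefined;
  }
  return first;
}

console.log(packageNameFromPath("/app/node_modules/@types/react/index.d.ts")); // @types/react
console.log(packageNameFromPath("/app/node_modules/csstype/index.d.ts")); // csstype
console.log(packageNameFromPath("/app/src/App.tsx")); // undefined
```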
For instance, I'd probably want to see which functions are exported and which ones aren't. Then there are things that seem hard to map. TypeScript has union types... https://www.typescriptlang.org/docs/handbook/advanced-types.html But unlike CPP unions, TypeScript union types don't have members of their own; they aren't actually distinct in that sense: https://docs.microsoft.com/en-us/cpp/cpp/unions?view=vs-2019 I can probably narrow down what type something is based on the source, as TS does a lot of inference magic on its own, but I'm not sure where to include details like what type a variable is at a given moment versus how it was originally defined. An object is inherently dynamic and is sometimes used like a hash map or dictionary... Confusingly, an array behaves similarly: if you assign a property with a non-integer key to it, the value is stored as an ordinary object property rather than an array element, even though the object was initialized as an Array! I guess where I'm going with this is that it can be near impossible to pin a variable down to a certain value in practice: a variable defined in one place will change as it's used elsewhere. So I'm not sure how best to represent this in the UI. There's even an "EvolvingArray" type: https://stackoverflow.com/questions/52804806/how-does-typescript-infer-element-type-for-array-literals Some of the oddness comes from TypeScript having to handle the fact that it's a dynamic language (JS) underneath, and TypeScript was designed to be pragmatic. Assigning types to a dynamic language leads to quite a bit of complexity. For instance, you can write as many signatures for a function or method as you like, as long as there's a single implementation whose signature is flexible enough to handle every type given: https://www.typescriptlang.org/docs/handbook/functions.html#overloads I don't think this will cause a problem in practice, but it does make for odd reading at times.
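Here are the two features mentioned above in plain TypeScript. The first is a parameter whose declared type is a union but whose type is narrowed inside each branch. The second is an overloaded function with a single implementation. These only illustrate the language behavior; how Sourcetrail should record either one is still the open question. The function names are made up.

```typescript
// Union type: `value` is declared as string | number[], but inside each
// branch TypeScript narrows it. An indexer has to decide whether to report
// the declared union or the narrowed type at a given location.
function describeValue(value: string | number[]): string {
  if (typeof value === "string") {
    return `string of length ${value.length}`; // narrowed to string
  }
  return `array of ${value.length} numbers`; // narrowed to number[]
}

// Overloads: several signatures, exactly one implementation, whose signature
// must be compatible with every overload.
function padLeft(value: string, width: number): string;
function padLeft(value: number, width: number): string;
function padLeft(value: string | number, width: number): string {
  let text = String(value);
  while (text.length < width) {
    text = " " + text;
  }
  return text;
}

console.log(describeValue("abc")); // string of length 3
console.log(describeValue([1, 2])); // array of 2 numbers
console.log(`[${padLeft(7, 3)}]`); // [  7]
```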
The concept of .d.ts files can probably be compared to CPP header files, except they can't contain implementations; they're only declarations for code implemented elsewhere. Sometimes you have the .js source files on disk, and sometimes they're mapped to runtime-provided APIs (typescript/lib). Given how important generics are to TypeScript, I'm wondering if I should instead be looking at a more complete C++ example: https://www.typescriptlang.org/docs/handbook/generics.html I'd assume I need to map TypeReference nodes from TS generics to TYPE_PARAMETER symbols? Then you have the oddballs due to JS scoping. When you use All I can say is, TypeScript has ... a lot of syntax to map: https://github.com/microsoft/TypeScript/blob/master/lib/typescript.d.ts#L78 And that's not even counting having to interpret flags, decorators, and modifiers: https://github.com/microsoft/TypeScript/blob/master/lib/typescript.d.ts#L496 It does look like most of what I want to pay attention to are resolved TS symbols and symbol tables, but those are in turn attached to the nodes in the syntax tree, and themselves have a lot of flags: https://github.com/microsoft/TypeScript/blob/master/lib/typescript.d.ts#L2164 Also, Symbol interfaces or Symbol objects are not to be confused with UniqueESSymbolType https://github.com/microsoft/TypeScript/blob/master/lib/typescript.d.ts#L2346 or ES2015 Symbols. This had me stumped for a while, as I didn't catch the difference. Also, if all the above wasn't enough ... I don't think it affects anything in Sourcetrail yet, but you can use literals as types. So a string might be assigned the type Then there's namespacing, a thing unique to TypeScript which pre-dates modern ES6 module named import/export. It has some fun concepts too, like aliases: https://www.typescriptlang.org/docs/handbook/namespaces.html#aliases Note that an alias can be independently modified ... which is similar to how an object in JS can be modified independently of its class.
Should these possible object modifications make their own classes? How should they be exposed to Sourcetrail, or shown to users? In conclusion -- I know I'm overthinking this. But there's a lot of syntax to take in, and I know if I were writing the UI from scratch, I'd want to present more of it to the user, particularly (ideally) what type an object or symbol is determined to be at that moment. but I'm not sure if the only way I can do that is to say it's re-declared when the type changes? Short of mapping JS back to V8 C++ primitives, I'm not sure how to model some of this... |
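For reference, a literal type and a generic function look like this. Whether `T` should become a TYPE_PARAMETER symbol is the assumption raised above, and this example does not confirm it. `MoveDirection`, `move` and `firstOrDefault` are made-up names.

```typescript
// Literal types: MoveDirection is one of two specific strings, not any string.
// Calling move(0, "left") would be a compile-time error.
type MoveDirection = "up" | "down";

function move(position: number, direction: MoveDirection): number {
  return direction === "up" ? position + 1 : position - 1;
}

// Generics: T is a type parameter, the kind of symbol that would presumably
// be recorded as a TYPE_PARAMETER in Sourcetrail (an assumption).
function firstOrDefault<T>(items: readonly T[], fallback: T): T {
  return items.length > 0 ? items[0] : fallback;
}

console.log(move(0, "up")); // 1
console.log(firstOrDefault<string>([], "none")); // none
console.log(firstOrDefault([3, 4], 0)); // 3 (T inferred as number)
```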
And all this without mentioning that, of course, you can have, and regularly do have, multiple kinds of syntax and files in the same JS/TS project, sometimes even in the same file, including JSON, JavaScript, CSS, HTML, XML, TypeScript, JSX/TSX, Markdown, SCSS, and references to assets (images, fonts, web links). At various points you could build a graph from CSS class references in HTML and JS to CSS selectors, and back to HTML and JS, etc. And that's not to mention all the git history we're probably ignoring right now. ;-) I don't know how deeply Sourcetrail could be customized, but it'd be nice to organize the source by my own graph, such as by router routes in an MVC framework or React-Router. Ideally, I'd start from the app server initialization and HTML serving, and move straight into CSS import graphs, JS graphs, through frameworks like React and Redux to window.fetch/XMLHttpRequests and API calls, exposed API endpoints, then back-end code and SQL statements, to the database model. That would, of course, be the "holy grail" of visualizations, and probably at the start only available to programmers looking at an AST and trying to make automated sense of a restricted syntax used in a particular codebase... I'm not saying it would be required here, but it would be a "nice-to-have" if the existing AST parsers were opened up for further customization, perhaps through a walker API and an extended SDK? An alternative might be https://docs.microsoft.com/en-us/visualstudio/modeling/directed-graph-markup-language-dgml-reference?view=vs-2019 but as I don't have VS Enterprise, I haven't played with it much. And it feels like we're getting dangerously close to OWL/RDF/ontology territory here... GraphML is also probably similar. Finally, I'm not looking forward to writing up the syntax highlighting.
Ideally, we'd have taken a VS Code approach and reused TextMate bundles (iirc) since they're very common and relatively portable, though not always "smart". The only other option I can think of would be to integrate an LSP and ideally it would provide syntax highlighting. I'm not sure how "smart" these get either. |
Hi @LouisStAmour, thank you very much for the update. It took me a while to get through all of the things that you mentioned but here is my answer and advice. I hope I haven't missed too many of the things you addressed. When we started out working on the Python indexer we thought it would be difficult because of the dynamic nature of Python, but tackling TypeScript seems to take "difficult" to a whole new level. Yes, you are overthinking this ;) Don't try to focus too much on the language and on how the language is read by the compiler. Instead try to think about how the code is read by a user and about what will be useful to know for the user.
If you haven't read it already, make sure to read our language extension guide. Maybe it helps to answer some of your open questions. In general:
|
Thanks for your quick reply! Learning the difference between globals and locals really helps. It’s the kind of info I need to understand how and why I would pick one description over another. https://github.com/microsoft/TypeScript/wiki/Architectural-Overview is quite useful too. I’m surprised it took me this long to find it, and that I got so far without it. I’m going to have a look at the language server to see if it has any good examples of converting from the AST to something more human-readable and useful. I’m also going to focus much more on the symbol attribute attached to nodes. It appears to be the symbolic representation TS uses to determine whether types match and are appropriate, so it will likely give us the most information on what an AST node actually means. Second to that is reporting types, and as they can get complicated, I’ll try to narrow the type just like TS does, to the best guess of what type the variable is at that moment. I’m still not sure whether, to represent union types, I’ll have to report a variable as more than one type somehow, but I will assume for now that the API is flexible enough to handle whatever I need to toss at it. It’s a good reminder to report just the information you would want to know if someone pointed at a node in the AST and asked “what’s this?” The idea that you want to focus less on makes sense. I can just imagine a lookup for alone is near useless and muddies the water when looking for usage of T. An ideal approach would be looking for the top-level generic, and if there are too many results, filtering the list by . Similarly, if looking for T, you could narrow further by T[], Array<T>, or [T] ... yes, those are all 3 ways of specifying an array of T in TypeScript. The first two are interchangeable. The last is the most specific: it's a tuple type that expects exactly one element, of type T. |
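Here are the three array spellings side by side. The type alias names are made up for the illustration. `T[]` and `Array<T>` are the same type, while `[T]` is a one-element tuple that is still assignable to `T[]`:

```typescript
type ListA<T> = T[]; // array of T
type ListB<T> = Array<T>; // the same type as T[]
type Single<T> = [T]; // tuple: exactly one element of type T

const letters: ListA<string> = ["x", "y"];
const same: ListB<string> = letters; // assignable: T[] and Array<T> are identical
const one: Single<string> = ["only"]; // ["x", "y"] would be a compile error here
const widened: ListA<string> = one; // a one-element tuple is still a string[]

console.log(same.length, one.length, widened[0]); // 2 1 only
```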
Well, I figured I'd go back to basics and try to figure out Sourcetrail's ERD: how the various "kinds" relate to one another and to the graph's representation. I ended up with https://github.com/LouisStAmour/SourcetrailTypescriptIndexer/blob/master/docs/SourcetrailDB%20Schema.pdf where, to summarize, a table named "element" is used to track unique IDs for pretty much every other table. Elements have many components (ambiguous markers/internal data?) and many source_locations (Source Range + Location Kind), while a source location can be mapped to one or more elements. What is an element, then? The primary kinds are nodes and edges. Nodes are files or symbols; edges connect two nodes. Nodes are unique in that they can be named, while edges are distinguished by their types, sources, and targets. Multiple edges of different types can connect the same nodes (symbols or files). Files have contents, symbols have kinds and definition kinds, and edges have reference kinds. Special types of edges include membership and aggregation; I haven't traced these back to the relevant calls yet. Actually, my next goal was to look at which kinds make sense for TypeScript and which don't, as well as to map some of the API calls to table inserts. Other types of elements besides nodes and edges are errors and local_symbols, which have source locations and may be ambiguous, but are otherwise detached from the graph. (That's right, a local_symbol is not a symbol.) The source file for the diagram is a GraphML file, edited interactively in yEd and based on a GraphML file created in DBeaver. |
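To check the reading of the schema above, here is a simplified in-memory model of it. Nodes and edges share one element id sequence, and locations attach to any element through a many-to-many occurrence list. The class and field names are illustrative. They are not SourcetrailDB's actual tables or storage code.

```typescript
// Simplified in-memory reading of the schema; names are illustrative.
type ElementId = number;

interface NodeRow { id: ElementId; name: string; }
interface EdgeRow { id: ElementId; kind: string; sourceId: ElementId; targetId: ElementId; }
interface SourceLocation {
  fileId: ElementId;
  startLine: number;
  startCol: number;
  endLine: number;
  endCol: number;
  kind: string;
}
interface Occurrence { elementId: ElementId; location: SourceLocation; }

class MiniStorage {
  private nextId: ElementId = 1;
  readonly nodes = new Map<ElementId, NodeRow>();
  readonly edges = new Map<ElementId, EdgeRow>();
  private readonly occurrences: Occurrence[] = [];

  // Nodes and edges draw their ids from the same element sequence.
  private newElement(): ElementId {
    return this.nextId++;
  }

  addNode(name: string): ElementId {
    const id = this.newElement();
    this.nodes.set(id, { id, name });
    return id;
  }

  addEdge(kind: string, sourceId: ElementId, targetId: ElementId): ElementId {
    const id = this.newElement();
    this.edges.set(id, { id, kind, sourceId, targetId });
    return id;
  }

  // Many-to-many: an element can have many locations, and the same location
  // can be recorded for several elements.
  addOccurrence(elementId: ElementId, location: SourceLocation): void {
    this.occurrences.push({ elementId, location });
  }

  locationsOf(elementId: ElementId): SourceLocation[] {
    return this.occurrences
      .filter((o) => o.elementId === elementId)
      .map((o) => o.location);
  }
}

const db = new MiniStorage();
const fileId = db.addNode("/src/app.ts");
const myTypeId = db.addNode("MyType");
const baseId = db.addNode("BaseType");
const inheritance = db.addEdge("INHERITANCE", myTypeId, baseId);
db.addOccurrence(inheritance, {
  fileId, startLine: 12, startCol: 14, endLine: 12, endCol: 21, kind: "TOKEN",
});
console.log(inheritance, db.locationsOf(inheritance).length); // 4 1
```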
Wow, this is awesome. Everything you deduced is correct. Here is some more information:
|
Two things I've noticed, before I go back to my, um, visual modelling: 1. After a "clear", most of my tables were cleared, but the element numbering shows 73 to 90, and the element table has entries from 1 to 90. 2. Local symbols and errors might not be intentionally part of the table, but the code certainly creates a relationship: https://github.com/CoatiSoftware/SourcetrailDB/blob/master/core/src/DatabaseStorage.cpp#L479 and https://github.com/CoatiSoftware/SourcetrailDB/blob/master/core/src/DatabaseStorage.cpp#L524 |
Oh right, sorry for the misinformation. I still remembered an older state of the database. Looks like we already fixed that (previously, the error table and the local_symbol table stored location information on their own). |
Okay, I've updated the above diagram with TypeScript API calls. I might have to double-check to make sure I didn't miss any. It looks like some of the Location Kinds can't be set (yet), such as "fulltext search" or "screen search"? Presumably these are used internally, perhaps for screen state restoration, or as a form of prioritization. I'll say that, compared to the other calls, it feels a bit odd that there are so many location API calls. Given the pattern of the others, it might be nicer to have a SymbolLocationKind class and be able to pass in Token (presumably), Scope, Qualifier, or Signature. Not a big deal once you work it out mentally. The real odd one is:
It's like a From my perspective, you record a reference, then record the reference location. Actually, maybe I'm thinking of this backwards. Perhaps a better question is: when is a location not required, or not desired? It would seem to me that 99% of the time, you'd want to track the location of everything added to the system. Because locations are attached to elements and everything we track is an element, it follows that everything has a location. The only things that might not need a location, for obvious reasons, are files, and it might follow that the only reason files are nodes is that in some languages there might be a strong one-to-one mapping between -- actually wait, no, that doesn't make sense. Why are files considered nodes? Ah. Because you can have edges between files, or references from one file to another. Though ... you can't set that up in the API, so the fact that files are a type of node is an implementation detail as far as the ERD is concerned. Okay, let me see if I can make that clearer in my diagram somehow. Edit: I've updated the diagram so that files are now closer to source locations and ambiguous references live closer to edges, and I re-arranged the spacing so reference locations live between references and source locations, without overlapping lines. Ideally, the symbol location APIs would live between symbols and source locations, but it's confusing if I keep the database schema's mapping of files to nodes when the API layer breaks that and only maps files to source locations. Second edit: I've updated the diagram a second time to move the symbol source location APIs between symbols and source locations, and manually touched up the label placement for ERD edges. |
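The SymbolLocationKind idea could be prototyped as a thin wrapper over the existing per-kind calls. The method names below mirror SourcetrailDBWriter's C++ location methods. The TypeScript `LocationWriter` interface, the `recordLocation` helper and the string kinds are assumptions for illustration:

```typescript
// Hypothetical: one entry point taking a location kind, dispatching to the
// per-kind calls. Only the method names mirror the C++ writer.
type SymbolLocationKind = "token" | "scope" | "signature" | "qualifier";

interface SourceRange {
  fileId: number;
  startLine: number;
  startCol: number;
  endLine: number;
  endCol: number;
}

interface LocationWriter {
  recordSymbolLocation(symbolId: number, range: SourceRange): void;
  recordSymbolScopeLocation(symbolId: number, range: SourceRange): void;
  recordSymbolSignatureLocation(symbolId: number, range: SourceRange): void;
  recordQualifierLocation(symbolId: number, range: SourceRange): void;
}

function recordLocation(
  writer: LocationWriter,
  kind: SymbolLocationKind,
  symbolId: number,
  range: SourceRange,
): void {
  switch (kind) {
    case "token":
      writer.recordSymbolLocation(symbolId, range);
      break;
    case "scope":
      writer.recordSymbolScopeLocation(symbolId, range);
      break;
    case "signature":
      writer.recordSymbolSignatureLocation(symbolId, range);
      break;
    case "qualifier":
      writer.recordQualifierLocation(symbolId, range);
      break;
  }
}

// Recording fake to show which underlying call each kind reaches.
const calls: string[] = [];
const fakeWriter: LocationWriter = {
  recordSymbolLocation: (id) => { calls.push(`token:${id}`); },
  recordSymbolScopeLocation: (id) => { calls.push(`scope:${id}`); },
  recordSymbolSignatureLocation: (id) => { calls.push(`signature:${id}`); },
  recordQualifierLocation: (id) => { calls.push(`qualifier:${id}`); },
};

const range: SourceRange = { fileId: 1, startLine: 15, startCol: 10, endLine: 15, endCol: 18 };
recordLocation(fakeWriter, "token", 7, range);
recordLocation(fakeWriter, "scope", 7, range);
console.log(calls.join(", ")); // token:7, scope:7
```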
Now I only need to make sure I understand how the different kinds map to the Sourcetrail UI, again. I have a better idea of why you record certain things: the API is meant to be a bit lazy, in that you'd probably try to record as little as possible. And while you can say that a file imports another file, that isn't covered by the API documentation. So ... I'm not sure how I should handle |
As an update, the TypeScript indexer is underway, and I've got it looping through files. But I was getting bogged down in all the error-checking statements and in maintaining so many temporary IDs, so I decided to clean up the TypeScript API and came up with a builder syntax: https://github.com/LouisStAmour/sourcetraildb/blob/master/src/builder.ts (This might not be final; I'll have to add documentation and may clean it up further as I use it.) Here's cpp_example.ts converted to the new builder syntax. It throws exceptions instead of returning success booleans, which lets each call return the builder itself, so objects and their properties can be declared through chaining: (Edit: Re-reading the builder syntax, the part that concerns me most is the createSymbol parameters. They’re not reading quite like English to me, but I’m not yet sure whether the extra complexity of another builder and an extra function call or two is worth it.)
import SourcetrailDB, { ReferenceKind, SymbolKind } from "./src/builder";
// assumption: the database and source file paths are passed on the command line
const dbPath = process.argv[2];
const sourcePath = process.argv[3];
// open database by passing .srctrldb or .srctrldb_tmp path
SourcetrailDB.openAndClear(dbPath, writer => {
  console.log("Starting Indexing...");
  // record source file by passing its absolute path
  const file = writer.createFile(sourcePath).asLanguage("cpp"); // for syntax highlighting
  // record atomic source range for multi line comment
  writer.recordAtomicSourceRange(file.at(2, 1, 6, 3));
  // record namespace "api"
  const namespace = writer
    .createSymbol(".", "api")
    .explicitly()
    .ofType(SymbolKind.NAMESPACE)
    .atLocation(file.at(8, 11, 8, 13))
    .withScope(file.at(8, 1, 24, 1));
  // record class "MyType"
  const className = namespace
    .createChildSymbol("MyType")
    .explicitly()
    .ofType(SymbolKind.CLASS)
    .atLocation(file.at(11, 7, 11, 12))
    .withScope(file.at(11, 1, 22, 1)); // gets highlight when active
  // record inheritance reference to "BaseType"
  writer
    .createSymbol(".", "BaseType")
    .isReferencedBy(className, ReferenceKind.INHERITANCE)
    .atLocation(file.at(12, 14, 12, 21));
  // add child method "void my_method() const"
  const method = className
    .createChildSymbol("void", "my_method", "() const")
    .explicitly()
    .ofType(SymbolKind.METHOD)
    .atLocation(file.at(15, 10, 15, 18))
    .withScope(file.at(15, 5, 21, 5)) // gets highlight when active
    .withSignature(file.at(15, 5, 15, 45)); // used in tooltip
  // record usage of parameter type "bool"
  writer
    .createSymbol(".", "bool")
    .isReferencedBy(method, ReferenceKind.TYPE_USAGE)
    .atLocation(file.at(15, 20, 15, 23));
  // record parameter "do_send_signal"
  writer
    .createLocalSymbol("do_send_signal")
    .atLocation(file.at(15, 25, 15, 38))
    .atLocation(file.at(17, 13, 17, 26));
  // record source range of "Client" as qualifier location
  const qualifier = writer
    .createSymbol(".", "Client")
    .withQualifier(file.at(19, 13, 19, 18));
  // record function call reference to "send_signal()"
  qualifier
    .createChildSymbol("", "send_signal", "()")
    .ofType(SymbolKind.FUNCTION)
    .isReferencedBy(method, ReferenceKind.CALL)
    .atLocation(file.at(19, 21, 19, 31));
  // record error
  writer.recordError(
    'Really? You missed that ";" again? (intentional error)',
    file.at(22, 1, 22, 1)
  );
});
console.log("done!"); |
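To illustrate why a builder removes most of the error checking and temporary IDs, here is a minimal sketch of the pattern on its own. `FakeWriter` is a hypothetical in-memory stand-in for a low-level writer in the style assumed for the C++ API, where each record call returns an ID and 0 signals failure. `SymbolBuilder` and its methods are also hypothetical, only loosely modelled on `createSymbol`/`createChildSymbol`/`isReferencedBy` from the example above; this is not the actual builder.ts.

```typescript
// Hypothetical low-level writer: record calls return a numeric ID, 0 means failure.
class FakeWriter {
  private nextId = 1;
  readonly symbols = new Map<number, string>();
  readonly references: Array<[number, number, string]> = [];

  recordSymbol(name: string): number {
    if (name.length === 0) return 0; // simulate an error
    const id = this.nextId++;
    this.symbols.set(id, name);
    return id;
  }

  recordReference(contextId: number, referencedId: number, kind: string): number {
    if (!this.symbols.has(contextId) || !this.symbols.has(referencedId)) return 0;
    this.references.push([contextId, referencedId, kind]);
    return this.nextId++;
  }
}

// Fluent wrapper: turns "0 means error" into exceptions and keeps the ID
// inside the object, so callers no longer juggle temporary IDs.
class SymbolBuilder {
  constructor(
    private readonly writer: FakeWriter,
    readonly id: number,
    readonly name: string,
  ) {}

  static create(writer: FakeWriter, name: string): SymbolBuilder {
    const id = writer.recordSymbol(name);
    if (id === 0) throw new Error(`recordSymbol failed for "${name}"`);
    return new SymbolBuilder(writer, id, name);
  }

  createChild(name: string): SymbolBuilder {
    return SymbolBuilder.create(this.writer, `${this.name}.${name}`);
  }

  // Records "this symbol is referenced by context" and returns `this` for chaining.
  isReferencedBy(context: SymbolBuilder, kind: string): this {
    if (this.writer.recordReference(context.id, this.id, kind) === 0) {
      throw new Error(`recordReference failed: ${context.name} -> ${this.name}`);
    }
    return this;
  }
}

const writer = new FakeWriter();
const api = SymbolBuilder.create(writer, "api"); // id 1
const myType = api.createChild("MyType"); // id 2, named "api.MyType"
SymbolBuilder.create(writer, "BaseType").isReferencedBy(myType, "INHERITANCE"); // id 3
console.log(writer.references); // one INHERITANCE reference from id 2 to id 3
```

The trade-off of this design is that a failed call now aborts the whole indexing callback unless someone catches the exception, whereas error codes let each call site decide how to recover.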
This little project is teaching me so much I didn't know or had forgotten about JavaScript. For instance, I was looking up why I couldn't refer to a name for my classes in a certain location and discovered: class expressions! https://www.typescriptlang.org/docs/handbook/release-notes/typescript-1-6.html#class-expressions
let Point = class {
constructor(public x: number, public y: number) { }
public length() {
return Math.sqrt(this.x * this.x + this.y * this.y);
}
};
var p = new Point(3, 4); // p has anonymous class type
console.log(p.length());
At times like these, I wonder if an AST is simply more work than a simple graph of symbols to declarations... In this case, the class doesn't have a name; the variable the class is assigned to does. What the how in the .... I'm reminded of anonymous functions, or anonymous classes in Java. Searching around, I found CoatiSoftware/Sourcetrail#189 for C++, which highlights that a user-friendly way to handle this is to add the members to the graph as if they belong to the parent or to the assigned variable. That makes sense to me: in the graph it's easier to think of an anonymous class as a class with a "namespace", or as a class with a variable name that isn't global to the current scope. It ... would complicate variable assignment, I suppose. Sometimes a variable assignment would be a class definition. Joy. :) |
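The approach from CoatiSoftware/Sourcetrail#189 (attach an anonymous class's members to the variable it is assigned to) can be sketched over a deliberately tiny AST. The `ClassExpression` and `VariableDeclaration` shapes below are hypothetical simplifications, not the TypeScript compiler's node types, and `qualifiedMemberNames` is a hypothetical helper.

```typescript
// Hypothetical, simplified AST shapes (NOT the TypeScript compiler's types).
interface ClassExpression {
  kind: "ClassExpression";
  members: string[]; // member names, including parameter properties
}

interface VariableDeclaration {
  kind: "VariableDeclaration";
  name: string;
  initializer?: ClassExpression;
}

// The class has no name of its own, so its members borrow the variable's
// name (optionally under an enclosing scope) and still get a parent node.
function qualifiedMemberNames(decl: VariableDeclaration, scope: string[] = []): string[] {
  const cls = decl.initializer;
  if (cls === undefined) return [];
  const prefix = [...scope, decl.name].join(".");
  return cls.members.map((member) => `${prefix}.${member}`);
}

// let Point = class { constructor(public x, public y) {} length() {...} };
const pointDecl: VariableDeclaration = {
  kind: "VariableDeclaration",
  name: "Point",
  initializer: { kind: "ClassExpression", members: ["x", "y", "length"] },
};

console.log(qualifiedMemberNames(pointDecl).join(", ")); // prints Point.x, Point.y, Point.length
```

Passing a `scope` array covers the "class with a namespace" view: the same class expression declared inside a namespace or function simply gets a longer prefix.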
Okay, so I'm making progress, but it feels like I'm doing it the hardest way possible, mostly because I don't know any better yet. I've copied over the language server's code for building the outline of a source file, but I've replaced each call that builds the outline with a debugger breakpoint, and as the AST walker hits each breakpoint, I manually inspect the match to figure out what to do. So far I haven't gotten past the global definitions in the lib .d.ts files, so I've currently restricted my matches to those files. I'm creating a lot of GLOBAL_VARIABLE symbols, but in the process I feel like my graph isn't as rich as it could be.

Once I'm done with globals, I'll reach a point where I'm creating node names that mix filenames ("/" delimiters) and JS/TS variable scope ("." delimiters), and I'm not yet sure how I'll handle that. These files live in packages/folders, which might be nice to navigate if I hard-code one level. If I use multiple levels, or write the filename into the scope or name of the variable, well: normally in TS we don't consider files to be modules (we don't display filenames as part of variable names), but we treat them like modules when editing. If not in ES6 modules mode, we'd probably fall back to globals or local function scope. The way we tend to think of it is "symbolName defined in file path", the same way you'd "import (name or alias) from (file)".

There's also a difference between an exported symbol and one that isn't. So it would make sense that only exported symbols are scoped to the file, while other symbols are considered local. But local symbols aren't rich enough to be part of the graph: local symbols can't have edges, you can't record calls to them, and they're not nodes.
So if an exported function is complicated enough to call multiple local functions, and maybe a local class, there's no easy way to define this in Sourcetrail right now outside of adding extra noise to the Name Hierarchy in the form of file paths to make TS symbols unique to the files they're defined in... |
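One way to get the "file path as extra name-hierarchy noise" option described above is to keep a single "." delimiter and put the whole file path into the first name element, so its "/" characters are element text rather than a second delimiter. The sketch below is a hypothetical illustration of that idea: `NameElement`, `NameHierarchy`, `fileScopedName` and `displayName` are assumed names, loosely modelled on the delimiter argument of `createSymbol` and the prefix/name/postfix arguments of `createChildSymbol` in the builder example.

```typescript
// Simplified model of a name hierarchy: one delimiter for the whole path,
// and elements with prefix/name/postfix parts.
interface NameElement {
  prefix: string;
  name: string;
  postfix: string;
}

interface NameHierarchy {
  delimiter: string;
  elements: NameElement[];
}

// Hypothetical helper: the file path becomes ONE element, followed by the
// JS/TS scope chain. Exported and non-exported top-level symbols both get a
// unique, file-scoped node, so calls to file-local helpers can become edges.
function fileScopedName(filePath: string, scopes: string[], symbol: string): NameHierarchy {
  const element = (name: string): NameElement => ({ prefix: "", name, postfix: "" });
  return {
    delimiter: ".",
    elements: [element(filePath), ...scopes.map((s) => element(s)), element(symbol)],
  };
}

// Joins element names the way a flat display would.
function displayName(h: NameHierarchy): string {
  return h.elements.map((e) => e.name).join(h.delimiter);
}

const exportedFn = fileScopedName("src/util/format.ts", [], "formatDate");
const localHelper = fileScopedName("src/util/format.ts", [], "pad");
const nestedFn = fileScopedName("src/util/format.ts", ["formatDate"], "inner");

console.log(displayName(exportedFn)); // prints src/util/format.ts.formatDate
console.log(displayName(localHelper)); // prints src/util/format.ts.pad
console.log(displayName(nestedFn)); // prints src/util/format.ts.formatDate.inner
```

The printed names also show the noise in question: the `.ts` extension reads like one more scope level, and every symbol carries its full path.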
A quick update: now that Sourcetrail is open source, this should speed up the work I was doing above, as I'll have a better idea of how everything fits together, and maybe I can suggest parts of a PR to better support TypeScript in Sourcetrail's UI/graph types. While I do like the consistency of the existing types, it's worth trying to import a slightly more complicated graph to see how it plays out visually and where the seams are that need to be ironed out. |
Sorry for keeping quiet for a while. We had a lot of work going on with open-sourcing Sourcetrail. I don't know what you have decided to do now, but from what it sounds like: if you have a way of indexing TS code that reads in multiple files at once, use that. At least that sounds like an approach capable of resolving relations between symbols. Looking at the screenshot from above: I'm not familiar with TS, but to me the code looks like it contains only definitions. No usages, calls, etc. there. So it is just natural that the graph looks this way ;) Maybe just take a look at our Java indexer (it is much easier to understand than the C++ indexer). In the Java indexer the For example take a look at boolean AstVisitor.visit(SimpleName node). Here we just get an object of type
This "converts" the AST node to a
Regarding file and symbol name delimiters: don't worry about this now. Just use a I hope this helps ;) |
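The visitor idea described above, roughly "visit each name, resolve it to the declaration it refers to, and record a reference from the enclosing symbol", can be sketched in TypeScript over a hypothetical two-node AST. This is not the Java indexer's code and not a real TypeScript AST. Resolution here is purely lexical by name (innermost scope first), whereas a real indexer would rely on the compiler's type checker to find the declaration a name refers to. `AstNode`, `CallReference` and `collectCalls` are all hypothetical names.

```typescript
// Hypothetical, minimal AST: function declarations (with bodies) and call sites.
type AstNode =
  | { kind: "function"; name: string; body: AstNode[] }
  | { kind: "call"; callee: string };

interface CallReference {
  from: string; // qualified name of the enclosing function
  to: string; // qualified name of the resolved declaration
}

function collectCalls(program: AstNode[]): CallReference[] {
  // Pass 1: record every function declaration under its qualified name.
  const declared = new Set<string>();
  const collectDecls = (nodes: AstNode[], scope: string[]): void => {
    for (const n of nodes) {
      if (n.kind === "function") {
        const path = [...scope, n.name];
        declared.add(path.join("."));
        collectDecls(n.body, path);
      }
    }
  };
  collectDecls(program, []);

  // Lexical lookup: try the innermost scope first, then each outer scope.
  const resolve = (callee: string, context: string[]): string | undefined => {
    for (let i = context.length; i >= 0; i--) {
      const candidate = [...context.slice(0, i), callee].join(".");
      if (declared.has(candidate)) return candidate;
    }
    return undefined;
  };

  // Pass 2: visit call sites and record an edge from the enclosing function.
  const refs: CallReference[] = [];
  const visit = (n: AstNode, context: string[]): void => {
    if (n.kind === "function") {
      for (const child of n.body) visit(child, [...context, n.name]);
    } else if (context.length > 0) {
      const target = resolve(n.callee, context);
      if (target !== undefined) refs.push({ from: context.join("."), to: target });
    }
  };
  for (const n of program) visit(n, []);
  return refs;
}

const program: AstNode[] = [
  {
    kind: "function",
    name: "formatDate",
    body: [
      { kind: "function", name: "pad", body: [] },
      { kind: "call", callee: "pad" },
      { kind: "call", callee: "log" }, // never declared, so it stays unresolved
    ],
  },
  { kind: "function", name: "main", body: [{ kind: "call", callee: "formatDate" }] },
];

for (const ref of collectCalls(program)) console.log(`${ref.from} -> ${ref.to}`);
// prints:
// formatDate -> formatDate.pad
// main -> formatDate
```

Because it sees the whole program before resolving calls, this also illustrates the point about reading multiple files at once: a call can only become an edge if its declaration has already been collected. The unresolved `log` call is where one might record an external or unknown symbol instead of dropping it.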
Thanks! Sorry for the radio silence on my end too :) Things suddenly got busy at work and I had folks requesting C# and XSLT analysis, so I had to drop the TS analysis temporarily. I will get back to it, but possibly not until the holiday break. Instead I'm learning all this Roslyn and XSLT 1.0, fun! ;-) Might contribute to #17 when I get the chance. |
I am interested in helping with the TypeScriptIndexer. I'm not very experienced with TypeScript and C++, but I see this as a step toward learning both in more detail! @LouisStAmour I tried running your indexer, but I was only successful when indexing via the command line. When I try this inside the project settings (Custom Command), it does not work. When I start indexing, it either returns error code 1 when run via:
or when run without that:
When I run the command from the error message on the command line, I get no error and the database is built 🤔. Any suggestions about what I am doing wrong? Or is the indexer not yet at the stage where it can be used inside the project settings? Maybe @mlangkabel also has suggestions about how to enter the command correctly? |
Has there been any movement on this front? |
I'm also interested. |
I would be interested in adding TypeScript support to SourcetrailDB. This would also lay the groundwork for JavaScript support.
Specifically, I would like to ensure that React code is supported as well.