
How to identify technologies ?

Helielzel edited this page Jun 14, 2023 · 14 revisions

To produce good diagrams, we need to automatically identify the technologies used in model elements that rely on a package manager (Maven, npm, pip, etc.). How can we do that?

Experiment 1 : Gathering opinions (interviews, forums)

To gather ideas and opinions, we put the question to the company's developers and posted it on several forums: Reddit, Discord, Stack Overflow (failed) and développez.com (still awaiting validation). Below is a summary of what was said.

In general, the devs we interviewed didn't have a clear opinion, but they had some interesting ideas. They recommended looking at the following sites and tools to see which technologies they list:

  • Stack Overflow tech tags
  • Stacks from stackshare.io
  • TechEmpower
  • GitHub/Google advanced search, Reddit
  • ChatGPT
  • GitLab Auto DevOps and Red Hat OpenShift auto-detection of technologies

As for the devs interviewed directly, the most common response was that, for an architecture doc, the names of the language(s) and frameworks are enough. That is relevant, of course, but not precise enough for what we need.

=============================

#Forum message template

Hello everyone !

I'm working on a Maven project called aadarchi. It's a Maven archetype allowing you to easily create your agile architecture documentation using a mix of C4, Asciidoc and PlantUML.

So far it's been mostly focused on Java projects, and we're currently trying to make it work for JS, or maybe even Python. But there's a catch ! In order to provide good diagrams, we need to be able to automatically identify technologies in model elements for which a package manager (Maven, NPM, pip...) is used. But how ? How can we decide what is really a "technology", which deserves to be detected in a project (for example in package.json files) and used in its architecture documentation ? Of course, we already thought about "just the language and framework", but we need some advice...

So : What are your thoughts ?

Thank you !

--> Question flagged "OFF TOPIC" and immediately closed.

Experiment 2 : Scraping ?

A company dev suggested using a scraping script (Python, Scrapy) to collect the list of technologies from relevant sites. In his view, it was worth trying on https://stackshare.io/ or maybe https://techdetector.de/welcome

Experiment 3 : How do GitLab Auto DevOps and other tools auto-detect technologies ?

When I asked devs on Slack, I got this answer: "You might find some answers by looking at the way Gitlab Auto DevOps or Red Hat OpenShift "auto detects" the underlying language and technologies of a project." ~S.R

https://about.gitlab.com/stages-devops-lifecycle/auto-devops/
https://techdetector.de/welcome
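
One way to read "auto detects": buildpack-style tools usually infer the language by checking for well-known files at the project root (a pom.xml suggests Maven, a package.json suggests npm, and so on). The sketch below illustrates that idea in Python. The `MARKER_FILES` mapping and the function name are our own assumptions for illustration, not the actual implementation of GitLab Auto DevOps or OpenShift.

```python
from pathlib import Path

# Assumed mapping from a marker file to the technology it suggests.
# This list is illustrative only and would need to be extended.
MARKER_FILES = {
    "pom.xml": "Java (Maven)",
    "package.json": "JavaScript (npm)",
    "requirements.txt": "Python (pip)",
}


def detect_by_marker_files(project_dir):
    """Return the sorted technologies whose marker file exists at the project root."""
    root = Path(project_dir)
    return sorted(
        tech for name, tech in MARKER_FILES.items() if (root / name).is_file()
    )
```

This only tells us which package manager is in play, which matches the "language and framework is enough" answer from the interviews. It does not say which dependencies inside the manifest deserve to be called technologies.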

Experiment 4 : Scraping mvnrepository and querying Stack Overflow

In the end, we decided that scraping was a good solution, since it lets us collect as many technologies as possible using criteria that are as objective as possible. We did it more thoroughly than first suggested, on two different sites: mvnrepository and Stack Overflow. We used two tools, Scrapy (Python) for mvnrepository and a REST API for Stack Overflow, and filtered technologies according to their popularity.
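
The popularity filter can be sketched as follows. This is a minimal illustration, not the script we actually ran. It assumes the Stack Overflow data comes from the Stack Exchange API `/tags` route, whose JSON response contains an `items` list of objects with `name` and `count` fields. The function names, the threshold and the sample values are hypothetical.

```python
import json


def popular_tags(api_response, min_count):
    """Keep the names of tags whose question count is at least min_count."""
    return {
        item["name"]
        for item in api_response.get("items", [])
        if item.get("count", 0) >= min_count
    }


def detect_technologies(package_json_text, known_technologies):
    """Return the declared npm dependencies that appear in known_technologies."""
    manifest = json.loads(package_json_text)
    declared = set(manifest.get("dependencies", {}))
    declared |= set(manifest.get("devDependencies", {}))
    return sorted(name for name in declared if name.lower() in known_technologies)


# Made-up sample data, shaped like a /tags response (counts are not real).
sample_response = {
    "items": [
        {"name": "express", "count": 900},
        {"name": "lodash", "count": 500},
        {"name": "left-pad", "count": 3},
    ]
}
sample_package_json = """
{
  "dependencies": {"express": "^4.18.0", "left-pad": "1.3.0"},
  "devDependencies": {"lodash": "^4.17.21"}
}
"""

known = popular_tags(sample_response, min_count=100)
print(detect_technologies(sample_package_json, known))
```

One caveat with this approach: tag names and package names do not always match (for example, a library may be tagged under a different name than the one it is published under), so a real implementation would probably need an alias table.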

===============

Experiment name

What do we want to validate ?

Is there any preexisting solution or research ?

Does this solution or research completely answer our needs ?

How will we validate this hypothesis ?

When is the experiment a success ?

What are the results ?