# Internal Dependencies

### References
- [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html)
- [Calculate metrics](https://101.jqassistant.org/calculate-metrics/index.html)
- [Neo4j Python Driver](https://neo4j.com/docs/api/python-driver/current)

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plot
from neo4j import GraphDatabase

In [2]:
# Please set the environment variable "NEO4J_INITIAL_PASSWORD" in your shell 
# before starting jupyter notebook to provide the password for the user "neo4j". 
# It is not recommended to hardcode the password into jupyter notebook for security reasons.

driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", os.environ.get("NEO4J_INITIAL_PASSWORD")))
driver.verify_connectivity()

In [3]:
def get_cypher_query_from_file(cypherFileName):
    # Read the whole Cypher file into a single string.
    with open(cypherFileName) as file:
        return file.read()

In [4]:
def query_cypher_to_data_frame(filename : str, limit: int = 10_000):
    cypher_query_template = "{query}\nLIMIT {row_limit}"
    cypher_query = get_cypher_query_from_file(filename)
    cypher_query = cypher_query_template.format(query = cypher_query, row_limit = limit)
    records, summary, keys = driver.execute_query(cypher_query)
    return pd.DataFrame([r.values() for r in records], columns=keys)

In [5]:
def query_first_non_empty_cypher_to_data_frame(*filenames : str, limit: int = 10_000):
    """
    Executes the Cypher queries of the given files and returns the first result that is not empty.
    If all given file names result in empty results, the last (empty) result will be returned.
    By additionally specifying "limit=", the "LIMIT" keyword will be appended to the query so that only the first results get returned.
    """    
    result=pd.DataFrame()
    for filename in filenames:
        result=query_cypher_to_data_frame(filename, limit)
        if not result.empty:
            return result
    return result

In [6]:
# The following cell uses the built-in %%html "magic" to override the CSS style for tables with a much smaller font size.
# This is especially needed for the PDF export of tables with multiple columns.

In [7]:
%%html
<style>
/* CSS style for smaller dataframe tables. */
.dataframe th {
    font-size: 8px;
}
.dataframe td {
    font-size: 8px;
}
</style>

In [8]:
# Pandas DataFrame Display Configuration
pd.set_option('display.max_colwidth', 300)

## 1 - Modules

Lists the modules this notebook is based on. Different sort orders help to find modules by their characteristics and keep larger code bases manageable, where the list of all modules gets very long.

Only the top 30 entries are shown. The whole table can be found in the following CSV report:  
`List_all_Typescript_modules`

In [9]:
internalModules = query_cypher_to_data_frame("../cypher/Internal_Dependencies/List_all_Typescript_modules.cypher")

### Table 1a - Top 30 modules with the highest element count

In [10]:
# Sort by number of elements descending
internalModules.sort_values(by=['numberOfElements','moduleName'], ascending=[False, True]).reset_index(drop=True).head(30)

Unnamed: 0,rootProjectName,moduleName,numberOfElements,numberOfGitCommits,incomingDependencies,outgoingDependencies
0,react-router-6.28.2,react-router-dom,62,423,26,308
1,react-router-6.28.2,react-router-native,17,154,10,48
2,react-router-6.28.2,react-router,7,251,0,34
3,react-router-6.28.2,todos,7,7,0,0
4,react-router-6.28.2,server,6,164,0,76
5,react-router-6.28.2,snkrs,4,4,0,0
6,react-router-6.28.2,data,3,8,0,0
7,react-router-6.28.2,images,2,5,0,0
8,react-router-6.28.2,images,2,4,0,0
9,react-router-6.28.2,App,1,6,0,1


### Table 1b - Top 30 modules with the highest number of incoming dependencies

The following table lists the top 30 internal modules that are used the most by other modules (highest count of incoming dependencies, highest in-degree).
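As a rough illustration of in-degree and out-degree, the following sketch counts incoming and outgoing edges from a plain edge list. The edges are taken from the module pairs that appear in Table 3b. The variable names are assumptions made for this example, and the `incomingDependencies` and `outgoingDependencies` columns in the tables come from the Cypher query, which may count dependencies on a finer level than whole modules.

```python
from collections import Counter

# Illustrative module-level edges: (using module, used module).
edges = [
    ("server", "react-router-dom"),
    ("react-router", "react-router-dom"),
    ("react-router-native", "react-router-dom"),
    ("react-router-dom", "react-router-native"),
]

# Out-degree: how many edges start at a module.
out_degree = Counter(source for source, _ in edges)
# In-degree: how many edges end at a module.
in_degree = Counter(target for _, target in edges)

for module in sorted(set(in_degree) | set(out_degree)):
    print(module, "in:", in_degree[module], "out:", out_degree[module])
```

A `Counter` returns 0 for modules without any edge in the given direction, so modules that are never used simply show an in-degree of 0.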

In [11]:
# Sort by number of incoming dependencies descending
internalModules.sort_values(by=['incomingDependencies','moduleName'], ascending=[False, True]).reset_index(drop=True).head(30)

Unnamed: 0,rootProjectName,moduleName,numberOfElements,numberOfGitCommits,incomingDependencies,outgoingDependencies
0,react-router-6.28.2,react-router-dom,62,423,26,308
1,react-router-6.28.2,react-router-native,17,154,10,48
2,react-router-6.28.2,App,1,6,0,1
3,react-router-6.28.2,auth,1,5,0,0
4,react-router-6.28.2,auth,1,4,0,0
5,react-router-6.28.2,data,3,8,0,0
6,react-router-6.28.2,images,2,5,0,0
7,react-router-6.28.2,images,2,4,0,0
8,react-router-6.28.2,react-router,7,251,0,34
9,react-router-6.28.2,server,6,164,0,76


### Table 1c - Top 30 modules with the highest number of outgoing dependencies

The following table lists the top 30 internal modules that depend on the highest number of other modules (highest count of outgoing dependencies, highest out-degree).

In [12]:
# Sort by number of outgoing dependencies descending
internalModules.sort_values(by=['outgoingDependencies','moduleName'], ascending=[False, True]).reset_index(drop=True).head(30)

Unnamed: 0,rootProjectName,moduleName,numberOfElements,numberOfGitCommits,incomingDependencies,outgoingDependencies
0,react-router-6.28.2,react-router-dom,62,423,26,308
1,react-router-6.28.2,server,6,164,0,76
2,react-router-6.28.2,react-router-native,17,154,10,48
3,react-router-6.28.2,react-router,7,251,0,34
4,react-router-6.28.2,App,1,6,0,1
5,react-router-6.28.2,auth,1,5,0,0
6,react-router-6.28.2,auth,1,4,0,0
7,react-router-6.28.2,data,3,8,0,0
8,react-router-6.28.2,images,2,5,0,0
9,react-router-6.28.2,images,2,4,0,0


### Table 1d - Top 30 modules with the lowest element count

In [13]:
# Sort by number of elements ascending
internalModules.sort_values(by=['numberOfElements','moduleName'], ascending=[True, True]).reset_index(drop=True).head(30)

Unnamed: 0,rootProjectName,moduleName,numberOfElements,numberOfGitCommits,incomingDependencies,outgoingDependencies
0,react-router-6.28.2,App,1,6,0,1
1,react-router-6.28.2,auth,1,5,0,0
2,react-router-6.28.2,auth,1,4,0,0
3,react-router-6.28.2,images,2,5,0,0
4,react-router-6.28.2,images,2,4,0,0
5,react-router-6.28.2,data,3,8,0,0
6,react-router-6.28.2,snkrs,4,4,0,0
7,react-router-6.28.2,server,6,164,0,76
8,react-router-6.28.2,react-router,7,251,0,34
9,react-router-6.28.2,todos,7,7,0,0


### Table 1e - Top 30 modules with the lowest number of incoming dependencies

The following table lists the top 30 internal modules that are used the least by other modules (lowest count of incoming dependencies, lowest in-degree).

In [14]:
# Sort by number of incoming dependencies ascending
internalModules.sort_values(by=['incomingDependencies','moduleName'], ascending=[True, True]).reset_index(drop=True).head(30)

Unnamed: 0,rootProjectName,moduleName,numberOfElements,numberOfGitCommits,incomingDependencies,outgoingDependencies
0,react-router-6.28.2,App,1,6,0,1
1,react-router-6.28.2,auth,1,5,0,0
2,react-router-6.28.2,auth,1,4,0,0
3,react-router-6.28.2,data,3,8,0,0
4,react-router-6.28.2,images,2,5,0,0
5,react-router-6.28.2,images,2,4,0,0
6,react-router-6.28.2,react-router,7,251,0,34
7,react-router-6.28.2,server,6,164,0,76
8,react-router-6.28.2,snkrs,4,4,0,0
9,react-router-6.28.2,todos,7,7,0,0


### Table 1f - Top 30 modules with the lowest number of outgoing dependencies

The following table lists the top 30 internal modules that depend on the lowest number of other modules (lowest count of outgoing dependencies, lowest out-degree).

In [15]:
# Sort by number of outgoing dependencies ascending
internalModules.sort_values(by=['outgoingDependencies','moduleName'], ascending=[True, True]).reset_index(drop=True).head(30)

Unnamed: 0,rootProjectName,moduleName,numberOfElements,numberOfGitCommits,incomingDependencies,outgoingDependencies
0,react-router-6.28.2,auth,1,5,0,0
1,react-router-6.28.2,auth,1,4,0,0
2,react-router-6.28.2,data,3,8,0,0
3,react-router-6.28.2,images,2,5,0,0
4,react-router-6.28.2,images,2,4,0,0
5,react-router-6.28.2,snkrs,4,4,0,0
6,react-router-6.28.2,todos,7,7,0,0
7,react-router-6.28.2,App,1,6,0,1
8,react-router-6.28.2,react-router,7,251,0,34
9,react-router-6.28.2,react-router-native,17,154,10,48


## 2 - Cyclic Dependencies

Cyclic dependencies occur when one module uses elements of another module and vice versa.
These dependencies can lead to problems when one of these modules needs to be changed.
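
The idea of a two-module cycle can be sketched without a graph database. The following minimal example assumes a set of module-level edges (the edge data and the function name `find_two_module_cycles` are made up for illustration) and reports every pair of modules that depend on each other. It is not the Cypher query used by this notebook.

```python
# Illustrative module-level edges: (using module, used module).
dependencies = {
    ("react-router-dom", "react-router-native"),
    ("react-router-native", "react-router-dom"),
    ("server", "react-router-dom"),
}

def find_two_module_cycles(edges):
    """Return every pair of modules that depend on each other, each pair once."""
    return sorted(
        (source, target)
        for source, target in edges
        # "source < target" keeps only one orientation of each mutual pair.
        if source < target and (target, source) in edges
    )

cycles = find_two_module_cycles(dependencies)
print(cycles)
```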

### Table 2a - Cyclic Dependencies Overview

Shows the top 40 cyclic dependencies, with the most promising ones to resolve listed first. The ranking relates the number of forward dependencies (first cycle participant to second cycle participant) to the number of backward dependencies (second cycle participant back to first cycle participant). The higher this ratio (approaching 1), the easier it should be to resolve the cycle by focusing on the few backward dependencies.

Only the top 40 entries are shown. The whole table can be found in the following CSV report:  
`Cyclic_Dependencies_for_Typescript`

**Columns:**
- *projectFileName* identifies the project of the first participant of the cycle
- *moduleName* identifies the module of the first participant of the cycle
- *dependentProjectFileName* identifies the project of the second participant of the cycle
- *dependentModulePathName* identifies the module of the second participant of the cycle
- *forwardToBackwardBalance* is between 0 and 1. High for many forward and few backward dependencies.
- *numberForward* contains the number of dependencies from the first participant of the cycle to the second one
- *numberBackward* contains the number of dependencies from the second participant of the cycle back to the first one
- *forwardDependencyExamples* lists some forward dependencies in the text format "type1 -> type2"
- *backwardDependencyExamples* lists the backward dependencies in the format "type1 <- type2" that are recommended to get resolved
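
The exact formula for *forwardToBackwardBalance* is defined in the Cypher query and is not shown here. One formula that stays between 0 and 1 and reproduces the value in the table below (11 forward, 7 backward, balance 0.222222) is the absolute difference divided by the sum. The following sketch assumes that formula; the function name is hypothetical.

```python
def forward_to_backward_balance(number_forward: int, number_backward: int) -> float:
    """Assumed formula: |forward - backward| / (forward + backward)."""
    total = number_forward + number_backward
    if total == 0:
        return 0.0  # No dependencies in either direction.
    return abs(number_forward - number_backward) / total

balance = forward_to_backward_balance(11, 7)
print(round(balance, 6))  # 0.222222
```

Under this assumption, a cycle with 20 forward and 1 backward dependency gets a balance of about 0.9, which marks it as a good candidate: removing a single backward dependency would break the cycle.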

In [16]:
cyclic_dependencies = query_cypher_to_data_frame("../cypher/Cyclic_Dependencies/Cyclic_Dependencies_for_Typescript.cypher")
cyclic_dependencies.head(40)

Unnamed: 0,projectFileName,moduleName,dependentProjectFileName,dependentModulePathName,forwardToBackwardBalance,numberForward,numberBackward,forwardDependencyExamples,backwardDependencyExamples
0,react-router-dom,./index.tsx,react-router-native,./index.tsx,0.222222,11,7,"[useFormAction->useLocation, NavLink->useLocation, useLinkClickHandler->useLocation, useScrollRestoration->useLocation, useSearchParams->useLocation, SetURLSearchParams->NavigateOptions, LinkProps->To, useLinkClickHandler->To, useViewTransitionState->To]","[To<-LinkProps, useNavigate<-useLinkPressHandler, To<-useLinkPressHandler, useNavigate<-useDeepLinking, useNavigate<-useSearchParams, useLocation<-useSearchParams, NavigateOptions<-SetURLSearchParams]"


### Table 2b - Cyclic Dependencies Break Down

Lists modules with cyclic dependencies, with every dependency in a separate row, sorted by the most promising dependency first.

Only the top 40 entries are shown. The whole table can be found in the following CSV report:  
`Cyclic_Dependencies_Breakdown_for_Typescript`

**Columns in addition to Table 2a:**
- *dependency* shows the cycle dependency in the text format "type1 -> type2" (forward) or "type2<-type1" (backward)

In [17]:
cyclic_dependencies_breakdown = query_cypher_to_data_frame("../cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_for_Typescript.cypher",limit=40)
cyclic_dependencies_breakdown

Unnamed: 0,projectFileName,moduleName,dependentProjectFileName,dependentModulePathName,dependency,forwardToBackwardBalance,numberForward,numberBackward
0,react-router-dom,./index.tsx,react-router-native,./index.tsx,To<-LinkProps,0.222222,11,7
1,react-router-dom,./index.tsx,react-router-native,./index.tsx,useNavigate<-useLinkPressHandler,0.222222,11,7
2,react-router-dom,./index.tsx,react-router-native,./index.tsx,To<-useLinkPressHandler,0.222222,11,7
3,react-router-dom,./index.tsx,react-router-native,./index.tsx,useNavigate<-useDeepLinking,0.222222,11,7
4,react-router-dom,./index.tsx,react-router-native,./index.tsx,useNavigate<-useSearchParams,0.222222,11,7
5,react-router-dom,./index.tsx,react-router-native,./index.tsx,useLocation<-useSearchParams,0.222222,11,7
6,react-router-dom,./index.tsx,react-router-native,./index.tsx,NavigateOptions<-SetURLSearchParams,0.222222,11,7
7,react-router-dom,./index.tsx,react-router-native,./index.tsx,useFormAction->useLocation,0.222222,11,7
8,react-router-dom,./index.tsx,react-router-native,./index.tsx,NavLink->useLocation,0.222222,11,7
9,react-router-dom,./index.tsx,react-router-native,./index.tsx,useLinkClickHandler->useLocation,0.222222,11,7


### Table 2c - Cyclic Dependencies Break Down - Backward Dependencies Only

Lists modules with cyclic dependencies, with every dependency in a separate row, sorted by the most promising dependency first. This table only contains the backward dependencies from the second participant of the cycle back to the first one, which are the most promising to resolve.

Only the top 40 entries are shown. The whole table can be found in the following CSV report:  
`Cyclic_Dependencies_Breakdown_BackwardOnly_for_Typescript`

In [18]:
cyclic_dependencies_breakdown_backward = query_cypher_to_data_frame("../cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher",limit=40)
cyclic_dependencies_breakdown_backward

Unnamed: 0,projectFileName,moduleName,dependentProjectFileName,dependentModulePathName,dependency,forwardToBackwardBalance,numberForward,numberBackward
0,react-router-dom,./index.tsx,react-router-native,./index.tsx,To<-LinkProps,0.222222,11,7
1,react-router-dom,./index.tsx,react-router-native,./index.tsx,useNavigate<-useLinkPressHandler,0.222222,11,7
2,react-router-dom,./index.tsx,react-router-native,./index.tsx,To<-useLinkPressHandler,0.222222,11,7
3,react-router-dom,./index.tsx,react-router-native,./index.tsx,useNavigate<-useDeepLinking,0.222222,11,7
4,react-router-dom,./index.tsx,react-router-native,./index.tsx,useNavigate<-useSearchParams,0.222222,11,7
5,react-router-dom,./index.tsx,react-router-native,./index.tsx,useLocation<-useSearchParams,0.222222,11,7
6,react-router-dom,./index.tsx,react-router-native,./index.tsx,NavigateOptions<-SetURLSearchParams,0.222222,11,7


## 3 - Module Usage

### Table 3a - Elements that are used by multiple modules

This table shows the top 40 elements that are used by the highest number of different modules. The whole table can be found in the CSV report `WidelyUsedTypescriptElements`.


In [19]:
elements_used_by_many_modules=query_cypher_to_data_frame("../cypher/Internal_Dependencies/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher", limit=40)
elements_used_by_many_modules

Unnamed: 0,fullQualifiedDependentElementName,dependentElementModuleName,dependentElementName,dependentElementLabels,numberOfUsingModules
0,"""@remix-run/router"".NavigateOptions",router,NavigateOptions,ExternalDeclaration,2
1,"""@remix-run/router"".Router",router,Router,ExternalDeclaration,2
2,"""@remix-run/router"".To",router,To,ExternalDeclaration,2
3,"""@remix-run/router"".useLocation",router,useLocation,ExternalDeclaration,2
4,"""@remix-run/router"".useNavigate",router,useNavigate,ExternalDeclaration,2
5,"""@remix-run/router"".DataStrategyFunction",router,DataStrategyFunction,ExternalDeclaration,1
6,"""@remix-run/router"".RouteObject",router,RouteObject,ExternalDeclaration,1
7,"""@remix-run/router"".RouteObject",router,RouteObject,ExternalDeclaration,1
8,"""@remix-run/router"".Router",router,Router,ExternalDeclaration,1


### Table 3b - Modules that use only a few elements of another module

This table shows the top 30 modules that use only a few elements of another module, compared to all elements that module contains.
The whole table can be found in the CSV report `ModuleElementsUsageTypescript`.
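
The *elementUsagePercentage* values in the table below are consistent with dividing the number of used elements by the number of all elements of the dependent module, expressed as a fraction between 0 and 1 rather than as a percentage. The following sketch assumes that relationship; the function name is hypothetical, and the actual calculation happens in the Cypher query.

```python
def element_usage_fraction(used_elements: int, all_elements: int) -> float:
    """Share of a module's elements that another module actually uses (0 to 1)."""
    if all_elements == 0:
        return 0.0  # Avoid a division by zero for empty modules.
    return used_elements / all_elements

# Values taken from the table: (dependentElementsCount, dependentModuleElementsCount)
server_uses_dom = element_usage_fraction(2, 62)
dom_uses_native = element_usage_fraction(4, 17)

print(round(server_uses_dom, 6))  # 0.032258
print(round(dom_uses_native, 6))  # 0.235294
```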

In [20]:
used_packages_of_dependent_artifact=query_cypher_to_data_frame("../cypher/Internal_Dependencies/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher",limit=30)
used_packages_of_dependent_artifact

Unnamed: 0,sourceModuleName,dependentModuleName,dependentElementsCount,dependentModuleElementsCount,elementUsagePercentage,dependentElementFullNameExamples,dependentElementNameExamples
0,server,react-router-dom,2,62,0.032258,"[""@remix-run/router"".RouteObject, ""@remix-run/router"".Router]","[RouteObject, Router]"
1,react-router,react-router-dom,2,62,0.032258,"[""@remix-run/router"".DataStrategyFunction, ""@remix-run/router"".Router]","[DataStrategyFunction, Router]"
2,react-router-native,react-router-dom,4,62,0.064516,"[""@remix-run/router"".useNavigate, ""@remix-run/router"".NavigateOptions, ""@remix-run/router"".useLocation, ""@remix-run/router"".To]","[useNavigate, NavigateOptions, useLocation, To]"
3,react-router-dom,react-router-native,4,17,0.235294,"[""@remix-run/router"".useLocation, ""@remix-run/router"".NavigateOptions, ""@remix-run/router"".To, ""@remix-run/router"".useNavigate]","[useLocation, NavigateOptions, To, useNavigate]"


### Table 3c - Distance distribution between dependent files

This table shows the distribution of the file directory distance between dependent files. Intuitively, the distance is the fewest number of change directory commands needed to navigate from a file to a dependency it uses. The distances are aggregated to show how many dependent files are in the same directory, how many are just one change directory command apart, and so on.
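
The notebook derives this distance from the graph (shortest path along containment relationships). As a file-system approximation of the same intuition, the following sketch counts the directory steps up to the deepest shared directory and back down again. The file paths and the function name are assumptions for illustration only and do not reflect the analyzed project's actual layout.

```python
from collections import Counter
from pathlib import PurePosixPath

def change_directory_distance(user_file: str, provider_file: str) -> int:
    """Fewest directory changes to get from one file's directory to the other's."""
    user_dirs = PurePosixPath(user_file).parent.parts
    provider_dirs = PurePosixPath(provider_file).parent.parts
    shared = 0
    for user_dir, provider_dir in zip(user_dirs, provider_dirs):
        if user_dir != provider_dir:
            break
        shared += 1
    # Steps up to the shared directory plus steps down to the target directory.
    return (len(user_dirs) - shared) + (len(provider_dirs) - shared)

# Hypothetical (using file, used file) pairs.
file_dependencies = [
    ("packages/react-router-dom/server.tsx", "packages/react-router-dom/index.tsx"),
    ("packages/react-router-native/index.tsx", "packages/react-router-dom/index.tsx"),
    ("packages/react-router-dom/index.tsx", "packages/react-router-native/index.tsx"),
    ("examples/basic/src/App.tsx", "packages/react-router-dom/index.tsx"),
]

# Aggregate: how many dependencies exist for each distance.
distribution = Counter(
    change_directory_distance(user, provider) for user, provider in file_dependencies
)
print(sorted(distribution.items()))
```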

In [21]:
query_first_non_empty_cypher_to_data_frame("../cypher/Internal_Dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher",
                                           "../cypher/Internal_Dependencies/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher", limit=20)

Unnamed: 0,dependency.fileDistanceAsFewestChangeDirectoryCommands,numberOfDependencies,numberOfDependencyUsers,numberOfDependencyProviders,examples
0,0,1,1,1,[./server.tsx uses ./index.tsx]
1,4,4,4,2,"[./index.ts uses ./index.tsx, ./index.tsx uses ./index.tsx, ./server.tsx uses ./index.tsx, ./index.tsx uses ./index.tsx]"
