# Introduction to Software Analytics with Jupyter Notebooks

* Jupyter Notebooks are JSON documents
   * they consist of input and output cells
   * input cells have a type, e.g. Markdown or Code
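
The JSON structure can be inspected with nothing but the standard library. The following is a minimal sketch: the notebook content is hand-written for illustration and only assumes the general nbformat 4 layout (a top-level `cells` list whose entries carry a `cell_type`, a `source`, and, for code cells, `outputs`).

```python
import json

# A hand-written, minimal notebook (illustrative content, nbformat 4 layout assumed).
notebook_json = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["print('hello')"]}
  ]
}
"""

notebook = json.loads(notebook_json)

# Every cell declares its type; only code cells carry an "outputs" list.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
code_cells = [cell for cell in notebook["cells"] if cell["cell_type"] == "code"]

print(cell_types)
print(len(code_cells), "code cell(s), outputs:", code_cells[0]["outputs"])
```

The same approach should work on a real `.ipynb` file by replacing the string with the result of `open(path).read()`.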

* a kernel, in this case a Python 3 kernel, is running in the background of each Notebook
   * this allows writing and executing Python code directly inside the Notebook
   * therefore, Jupyter Notebooks are interactive
   * libraries like Pandas and Plotly facilitate the analysis and visualization of data

* extensions increase the usability of cells, for example by adding language integrations
   * the 'cypher' extension allows connecting to a Neo4j database and writing queries directly inside a cell

In [None]:
%load_ext cypher
%config CypherMagic.uri='http://neo4j:neo@localhost:7474/db/data'

In [None]:
%%cypher 
MATCH (t:Type:Java)
WHERE t.fqn STARTS WITH "com.salesmanager"
RETURN count(t)

* Python libraries allow for easy data visualization
   * Plotly and Pygal provide many different types of diagrams and are well-documented
   * integration of D3.js (by embedding HTML) is possible, but difficult
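
To give an impression of what the HTML embedding for D3.js involves, here is a rough sketch, not a recommended implementation. It wraps a small D3.js page in an `<iframe srcdoc=...>` so that the script is isolated from the Notebook's own JavaScript. The function name `d3_bar_chart_iframe`, the data shape, and the choice of the D3 v7 build from d3js.org are assumptions made for this example. Whether the frame renders also depends on the frontend and on the notebook being trusted.

```python
import html
import json


def d3_bar_chart_iframe(data, width=420, height=200):
    """Return an <iframe> snippet that draws a simple D3.js bar chart.

    `data` is a list of {"name": str, "value": number} dicts
    (a hypothetical shape chosen for this sketch).
    """
    # The page is a plain HTML document; the data is injected as a JSON literal.
    page = f"""<!DOCTYPE html>
<html><body>
<script src="https://d3js.org/d3.v7.min.js"></script>
<script>
const data = {json.dumps(data)};
const max = d3.max(data, d => d.value);
d3.select("body").selectAll("div").data(data).join("div")
  .style("background", "steelblue")
  .style("color", "white")
  .style("margin", "2px")
  .style("width", d => (d.value / max * 100) + "%")
  .text(d => d.name + ": " + d.value);
</script>
</body></html>"""
    # srcdoc is an attribute value, so the whole page must be HTML-escaped.
    escaped = html.escape(page, quote=True)
    return (
        f'<iframe srcdoc="{escaped}" width="{width}" height="{height}" '
        f'frameborder="0"></iframe>'
    )


# Illustrative data only, not taken from the analyzed project.
snippet = d3_bar_chart_iframe([{"name": "example-a", "value": 3},
                               {"name": "example-b", "value": 5}])

# Inside a notebook, the snippet could then be displayed with:
#   from IPython.display import HTML
#   HTML(snippet)
```

Even this small chart needs escaping, data serialization, and hand-written JavaScript, which illustrates why Plotly or Pygal are usually the more convenient choice.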

In [None]:
import plotly.express as px

In [None]:
artifactSize = %cypher MATCH (a:Java:Main:Artifact)-[:CONTAINS]->(t:Type) \
                       WHERE a.group STARTS WITH "com.shopizer"           \
                       RETURN a.name AS  Artifact,                        \
                              count(DISTINCT t) AS Classes

df = artifactSize.get_dataframe()
fig = px.pie(df, values='Classes', names='Artifact', title='Sizes of the artifacts')
fig.show()

* the graph can also be enriched with additional information, e.g. by adding labels to nodes

In [None]:
%%cypher
// Label all Shopizer artifacts and the nodes they contain as :Shopizer
MATCH (artifact:Main:Artifact{group: "com.shopizer"})
SET artifact:Shopizer
WITH artifact
MATCH (artifact)-[:CONTAINS]->(c)
SET c:Shopizer
RETURN artifact.name AS Artifact, 
       count(DISTINCT c) AS ContentCount
ORDER BY artifact.name