# Graph Lab

## Header information:

  - Author #1: Hamna Malik (malikh32@mcmaster.ca)
  - Author #2: Julian Cecchini (cecchinj@mcmaster.ca)
  - Gitlab URL: http://gitlab.cas.mcmaster.ca/...
  - Avenue to Learn group name: Group 53

## Week 1 Justification


It is important that information extraction be completely separate from graphing, which is why the cvsReader was created. It opens a CSV file and builds a Python dictionary to store its contents. These dictionaries can then be used by the rest of the program, while the graph implementation stays isolated from the extraction process. This enforces the single-responsibility principle: the cvsReader's one and only responsibility is to extract information from CSV files. We felt that the cvsReader should not inherit from a base class, because a differently structured CSV file would need completely new methods. The reader would have to be rewritten either way, so inheritance would bring no benefit.
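
The following is a minimal sketch of the extraction step. It assumes a reader that maps each CSV row to a dictionary keyed by an identifying column. `read_csv_rows` and the sample columns are hypothetical, and the project's cvsReader may structure its output differently. The point it illustrates is that this step produces plain dictionaries and knows nothing about graphs.

```python
import csv
import io
from typing import Dict, Iterable


def read_csv_rows(lines: Iterable[str], key_label: str) -> Dict[str, Dict[str, str]]:
    """Map each CSV row to a dict, keyed by the value in column key_label.

    Hypothetical helper: it only extracts data and knows nothing about
    nodes, edges, or graphs.
    """
    rows: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(lines):
        rows[row[key_label]] = dict(row)
    return rows


# In-memory sample so the sketch runs anywhere; with a real file use:
#   with open(path, newline="") as f: read_csv_rows(f, "id")
sample = io.StringIO("id,name,zone\n1,Station A,1\n2,Station B,2\n")
stations = read_csv_rows(sample, "id")
print(stations["2"])  # {'id': '2', 'name': 'Station B', 'zone': '2'}
```
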

We also have a PathListGenerator, a simple script that holds the CSV file paths. The paths are collected into a single dictionary, written on a single line, that can be received and analyzed by the analyzer. This makes it significantly easier to change the CSV files, since all that is required is changing the file paths. Because everything lives in one file, there is no need to read or modify the actual graph code.
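
To make the "one dictionary, one line" idea concrete, here is an illustrative path list. The required keys are the ones PathListGenerator prints in the coding example. The file paths and column labels are placeholders, not the project's real files. `missing_keys` is a hypothetical helper that checks a hand-edited dictionary before it is handed to the graph code.

```python
# Placeholder paths and labels; only the key names come from the document.
PATH_LIST = {"nodePath": "data/stations.csv", "edgePath": "data/connections.csv", "nodeID": "id", "edgeNodeLabel1": "station1", "edgeNodeLabel2": "station2"}

# Required keys, in the order PathListGenerator documents them
REQUIRED_KEYS = ("nodePath", "edgePath", "nodeID", "edgeNodeLabel1", "edgeNodeLabel2")


def missing_keys(path_list: dict) -> list:
    """Return the required keys that are absent from a path-list dictionary."""
    return [key for key in REQUIRED_KEYS if key not in path_list]


print(missing_keys(PATH_LIST))  # []
```

Switching datasets then means editing only the string values in `PATH_LIST`.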

The node attributes were designed to be generic and applicable to every graph that will be generated, so the class does not need to be rewritten each time. Instead, all details specific to a given CSV file are stored in an attribute we call extraInfo. This is a dictionary that can be extended or changed to best suit the data obtained. This was deliberate, because creating a completely new node class for every different graph would be inconvenient and poor design. A superclass was another option, but it would require rewriting parts of the node class unnecessarily. Each node also has a unique nodeID attribute, allowing easy identification.
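
The sketch below shows one way to express that generic node. The attribute names `nodeID` and `extraInfo` come from the text. The dataclass form and the sample values are assumptions made for illustration. One class serves every dataset, and only the contents of `extraInfo` differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """Generic node: one fixed identifier plus free-form, dataset-specific data."""
    nodeID: str
    extraInfo: Dict[str, Any] = field(default_factory=dict)


# The same class fits two unrelated (hypothetical) datasets
station = Node("11", {"name": "Station A", "zone": "1"})
router = Node("r7", {"ports": 24})

# Dataset-specific details are added without touching the class
station.extraInfo["rail"] = "0"
```
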

The edge class follows the same approach, with the additional attributes uniqueValues and weight. A separate class for the weight was possible, but there is no real need for one, since the edge can simply contain it. If another weight type is needed (e.g. a station weight), the edge class provides a changeWeight function. Each edge ID is the node 1 number, the node 2 number, and the weight concatenated together. As a result, the edge ID alone tells us which two nodes the edge connects and what its weight is. The uniqueValues attribute makes edges easier to access. It is necessary when two edges have identical node1, node2, and weight values, because a unique value is then appended to the end of the ID to differentiate them. uniqueValues can also carry information about the edge. For example, it can record that a subway route has accessibility accommodations.
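
Here is a sketch of the edge ID scheme described above, assuming the ID is derived from the edge's fields. Unlike plain concatenation, this sketch joins the parts with a `-` separator. That is an assumption added so that, for example, nodes "1"/"23" and "12"/"3" cannot produce the same ID. In this sketch, `changeWeight` simply replaces the value; the project's version may instead switch between weight types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edge:
    node1: str
    node2: str
    weight: float
    uniqueValues: List[str] = field(default_factory=list)

    @property
    def edgeID(self) -> str:
        # node1, node2 and weight, plus any unique values that tell otherwise
        # identical edges apart. The "-" separator is an assumption of this sketch.
        return "-".join([self.node1, self.node2, str(self.weight), *self.uniqueValues])

    def changeWeight(self, new_weight: float) -> None:
        # Hypothetical behaviour: overwrite the stored weight
        self.weight = new_weight


a = Edge("1", "23", 2.0)
b = Edge("1", "23", 2.0, ["accessible"])  # same endpoints and weight, still distinct
print(a.edgeID)  # 1-23-2.0
print(b.edgeID)  # 1-23-2.0-accessible

c = Edge("4", "5", 1.0)
c.changeWeight(3.5)
```

Because `edgeID` is computed on demand, it always reflects the current weight.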

To initialize the nodes and edges, we have collector methods that assemble the node and edge objects that make up a graph. This also enforces the single-responsibility principle, because their only job is to build these objects. The graph class is responsible for maintaining the graph. To keep it single-responsibility, we did not want it to also break the information down from dictionaries into objects, so the collector methods do this instead. For example, the CollectNode method treats nodeID as an integral attribute of the object and adds the rest of the information in the dictionary to the extraInfo attribute. This way, when the cvsReader creates a dictionary from a CSV file, the information is passed to CollectNodes, where it is sorted into node objects.
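
As a rough illustration of the collector step, the function below turns reader output into node objects. It pulls out the identifier as the integral attribute and puts everything else into `extraInfo`. `collect_nodes` is a hypothetical stand-in for the project's CollectNode/CollectNodes. It assumes the reader yields one dictionary per CSV row.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class Node:
    nodeID: str
    extraInfo: Dict[str, Any] = field(default_factory=dict)


def collect_nodes(rows: Iterable[Dict[str, Any]], id_label: str) -> Dict[str, Node]:
    """Turn reader output (one dict per CSV row) into Node objects keyed by ID."""
    nodes: Dict[str, Node] = {}
    for row in rows:
        info = dict(row)                      # copy so the reader's data is untouched
        node_id = info.pop(id_label)          # the identifier becomes the integral attribute
        nodes[node_id] = Node(node_id, info)  # everything else goes into extraInfo
    return nodes


rows = [{"id": "1", "name": "A", "zone": "1"}, {"id": "2", "name": "B", "zone": "2"}]
nodes = collect_nodes(rows, "id")
print(nodes["2"].extraInfo)  # {'name': 'B', 'zone': '2'}
```
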

Once all the objects are created, they need to be assembled into the actual graph. We implemented a Factory Pattern between UndirectedGraph and another class called GraphUpdater. GraphUpdater handles the graph's initialization as well as later permutations and changes to it, while UndirectedGraph manages the edges, nodes, and adjacency list. With this design, nodes and edges can be added and removed regardless of where the information comes from. GraphUpdater initializes the cvsReader and the collectors and makes them work together as a pipeline that builds the graph. Each time you want to update the graph, pass it to GraphUpdater and call update(). If we want a specific graph, like tubeGraph, we can add tubeGraph as a child of UndirectedGraph and add a tubeGraphUpdater as a child of GraphUpdater. This enforces the open-closed principle. The design is open to extension through new graph types but closed to modification, since UndirectedGraph and GraphUpdater are not modified when a new graph type is added. As many graph types as a situation requires can be added, and the graph implementation will still function.
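
The sketch below shows the extension-by-subclassing structure. It is heavily simplified and assumes the updater receives ready-made edge pairs instead of running the cvsReader and collectors. `TubeGraph` and `TubeGraphUpdater` extend the base classes without editing them. The self-loop filter is a made-up specialisation, and the constructors do not match the project's `UndirectedGraph({}, {}, {})` signature.

```python
from typing import Dict, Iterable, List, Tuple


class UndirectedGraph:
    """Stores nodes in an adjacency list; knows nothing about CSV files."""

    def __init__(self) -> None:
        self.adj: Dict[str, List[str]] = {}

    def add_node(self, node_id: str) -> None:
        self.adj.setdefault(node_id, [])

    def add_edge(self, u: str, v: str) -> None:
        self.add_node(u)
        self.add_node(v)
        self.adj[u].append(v)
        self.adj[v].append(u)


class GraphUpdater:
    """Feeds prepared edge data into a graph through update()."""

    def __init__(self, graph: UndirectedGraph, edges: Iterable[Tuple[str, str]]) -> None:
        self.graph = graph
        self.edges = list(edges)

    def update(self) -> UndirectedGraph:
        for u, v in self.edges:
            self.graph.add_edge(u, v)
        return self.graph


class TubeGraph(UndirectedGraph):
    """Extension point: tube-specific graph behaviour would go here."""


class TubeGraphUpdater(GraphUpdater):
    def update(self) -> UndirectedGraph:
        # Hypothetical specialisation: drop self-loops, then reuse the base pipeline
        self.edges = [(u, v) for u, v in self.edges if u != v]
        return super().update()


g = TubeGraphUpdater(TubeGraph(), [("1", "2"), ("2", "3"), ("3", "3")]).update()
print(g.adj)  # {'1': ['2'], '2': ['1', '3'], '3': ['2']}
```
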

We also added a feature for updating large graphs after additional nodes are added. For example, if a graph has 100,000 nodes and we want to add another 1,000, we can call the updateNodes() method on its own instead of updating everything. It adds only the new nodes to the existing graph, which saves the time that would be spent updating all 101,000 nodes.
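
A minimal sketch of the incremental update follows. It assumes the graph is a plain adjacency dictionary. `update_nodes` is a hypothetical stand-in for updateNodes(), and its optional `new_edges` argument is an assumption for attaching the new nodes. Its cost depends on the size of the new data, not the size of the existing graph.

```python
from typing import Dict, Iterable, List, Tuple


def update_nodes(adj: Dict[str, List[str]], new_nodes: Iterable[str],
                 new_edges: Iterable[Tuple[str, str]] = ()) -> int:
    """Add only new nodes (and optionally their edges) to an existing adjacency list.

    Existing entries are never rebuilt. Returns the number of nodes inserted.
    """
    added = 0
    for node in new_nodes:
        if node not in adj:
            adj[node] = []
            added += 1
    for u, v in new_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return added


graph = {str(i): [] for i in range(100_000)}  # the existing large graph
inserted = update_nodes(graph, (str(i) for i in range(100_000, 101_000)), [("0", "100000")])
print(inserted, len(graph))  # 1000 101000
```
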

Separate from the graph implementation, we also have a MetricExtractor class. Its functions can be called on specific graphs to extract information such as the average degree and the number of nodes.
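
The metrics named above can be computed from an adjacency list with a few lines each. The functions below are hypothetical counterparts to MetricExtractor's methods, assuming an undirected graph stored as `{node: [neighbours]}`. In that representation each edge appears twice, so the average degree is 2E / N.

```python
from collections import Counter
from typing import Dict, List


def num_nodes(adj: Dict[str, List[str]]) -> int:
    return len(adj)


def num_edges(adj: Dict[str, List[str]]) -> int:
    # Each undirected edge is listed under both of its endpoints
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def average_degree(adj: Dict[str, List[str]]) -> float:
    # For an undirected graph this equals 2E / N
    return 2 * num_edges(adj) / num_nodes(adj) if adj else 0.0


def degree_histogram(adj: Dict[str, List[str]]) -> Dict[int, int]:
    # Maps degree -> number of nodes with that degree, sorted by degree
    return dict(sorted(Counter(len(nbrs) for nbrs in adj.values()).items()))


adj = {"1": ["2"], "2": ["1", "3"], "3": ["2"]}
print(num_nodes(adj), num_edges(adj), average_degree(adj))  # 3 2 1.3333333333333333
print(degree_histogram(adj))  # {1: 2, 2: 1}
```
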

For Dijkstra's algorithm, we ran into the dilemma of not being able to combine weights while also relaxing edges, which meant we could not quantify the total time along a path. Therefore, the algorithm is run once per weight prioritization, and all results are listed at the end for the client to choose from. This was necessary because it is impossible to predict how much time any given customer is willing to sacrifice for fewer line transfers. It also seemed unnecessary to add line changes as a weighted value, since they cannot be statically defined. Instead, we put a wrapper on top of Dijkstra and A* that counts the line changes along the returned path and reports that count. We chose this over adding line transfers as a heuristic for A* because we felt that could produce suboptimal solutions in exchange for fewer lines overall. We simply present the best solutions to the customer, who decides what to prioritize in their journey.
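
The sketch below shows the "one run per weight, then report line changes" approach. It uses a textbook heap-based Dijkstra (Python's `heapq`). The edge format `(neighbour, {label: weight}, line)`, the weight labels, and the helper names are all assumptions for illustration. A* is not shown, and the project's ShortestPath API differs.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical edge format: (neighbour, weights by label, line name)
Graph = Dict[str, List[Tuple[str, Dict[str, float], str]]]


def add_edge(g: Graph, u: str, v: str, weights: Dict[str, float], line: str) -> None:
    g.setdefault(u, []).append((v, weights, line))
    g.setdefault(v, []).append((u, weights, line))


def dijkstra(g: Graph, src: str, dst: str, label: str) -> List[Tuple[str, str]]:
    """Shortest path by one weight label; returns [(node, line used to reach it)]."""
    dist = {src: 0.0}
    edge_to: Dict[str, Tuple[str, str]] = {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale queue entry
        for v, weights, line in g[u]:
            nd = d + weights[label]
            if nd < dist.get(v, float("inf")):  # relax the edge
                dist[v] = nd
                edge_to[v] = (u, line)
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return []
    path, node = [], dst
    while node != src:
        prev, line = edge_to[node]
        path.append((node, line))
        node = prev
    return [(src, "")] + path[::-1]


def line_changes(path: List[Tuple[str, str]]) -> int:
    lines = [line for _, line in path[1:]]
    return sum(1 for a, b in zip(lines, lines[1:]) if a != b)


def compare_routes(g: Graph, src: str, dst: str, labels: List[str]) -> Dict[str, Tuple[List[str], int]]:
    """Wrapper: one run per weight label, each reported with its line-change count."""
    results = {}
    for label in labels:
        path = dijkstra(g, src, dst, label)
        results[label] = ([n for n, _ in path], line_changes(path))
    return results


g: Graph = {}
add_edge(g, "A", "B", {"time": 2, "distance": 5}, "red")
add_edge(g, "B", "C", {"time": 2, "distance": 5}, "blue")
add_edge(g, "A", "D", {"time": 3, "distance": 1}, "red")
add_edge(g, "D", "C", {"time": 3, "distance": 1}, "red")
routes = compare_routes(g, "A", "C", ["time", "distance"])
print(routes)  # {'time': (['A', 'B', 'C'], 1), 'distance': (['A', 'D', 'C'], 0)}
```

The customer sees both options: the faster route with one line change and the route on a single line.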

For benchmarking, we have a KPI class whose methods can be called within each algorithm. It is generic, so it never needs to be changed. Applicable classes (like node and edge) keep track, within themselves, of the number of times the … These classes are "KPI participants", meaning that they have a giveKPI function. When a KPI object is made, it must be given a KPI participant, and the KPI class will link the participant with … This information can then be accessed within the KPI method and combined with a calculated execution time for benchmarking.
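
A minimal sketch of the participant/KPI relationship follows. It assumes participants count how often an algorithm visits them. What exactly the project counts is not stated above, so that choice is an assumption, as are the `visit` and `measure` names. Only `giveKPI` comes from the text. Timing uses `time.perf_counter` from the standard library.

```python
import time
from typing import Callable, Dict


class KPIParticipant:
    """Hypothetical base: an object that counts how often it is visited."""

    def __init__(self) -> None:
        self._visits = 0

    def visit(self) -> None:
        self._visits += 1

    def giveKPI(self) -> int:
        return self._visits


class KPI:
    """Links one participant to a timed run and reports both numbers."""

    def __init__(self, participant: KPIParticipant) -> None:
        self.participant = participant

    def measure(self, algorithm: Callable[[], object]) -> Dict[str, float]:
        before = self.participant.giveKPI()
        start = time.perf_counter()
        algorithm()
        elapsed = time.perf_counter() - start
        # Visits made during this run only, plus wall-clock seconds
        return {"count": self.participant.giveKPI() - before, "seconds": elapsed}


class Node(KPIParticipant):
    pass


node = Node()
report = KPI(node).measure(lambda: [node.visit() for _ in range(5)])
print(report["count"])  # 5
```
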

In terms of how the work was done, the two of us usually discussed different design options and ideas together. We would break down the problem, brainstorm ideas, and discuss how to execute them. After these decisions were made collaboratively, Julian took charge of implementing them with support from Hamna. For example, Julian wrote the shortest path algorithm, and Hamna wrote the priority queue he used in it. Julian was responsible for most of the algorithm coding, and once it was complete, Hamna took charge of the UML diagram and of writing up the design justifications we had discussed. Improvements and alterations throughout the project were also discussed and developed together.


![Week 1 UML diagram](...\l1-graph-lab\UML_Week1.pdf)

```python
# Coding Example
# UndirectedGraph, GraphUpdater, ShortestPath, MetricExtractor and the heuristics
# (BaseHeuristic, EuclideanForTube, StationCount) come from this project's modules;
# their imports are not shown here.
from random import randrange


def PathListGenerator():
    """Interactively build the dictionary that GraphUpdater needs."""
    paths = {}
    print("""------------------------------------------------------------------------------------------------------------""")
    print("This function helps develop the dictionary necessary for GraphUpdater, but it is not necessary to use it directly")
    print("The format of the dictionary needed for GraphUpdater that will be generated here is as follows:")
    print("{nodePath: str, edgePath: str, nodeID: str, edgeNodeLabel1: str, edgeNodeLabel2: str}")
    print("Optional Arguments: {weightLabels: list[str], uniqueValues: list[str], additionalPaths: dict[str : str] }")
    print("Please check UndirectedGraph documentation to obtain further information.")
    print("\n")
    print("Generation shall now begin...")

    # Required entries
    paths['nodePath'] = input("Enter path to csv containing nodes for graph\n")
    paths['edgePath'] = input("Enter path to csv containing edges for graph\n")
    paths['nodeID'] = input("Enter label for row/column containing values that can uniquely identify nodes\n")
    paths['edgeNodeLabel1'] = input("Enter label for row/column containing values for one end of an edge\n")
    paths['edgeNodeLabel2'] = input("Enter label for row/column containing values for the other end of an edge\n")

    print("Necessary values are done. Proceeding to optional arguments.")
    print("If you wish to skip over an argument, enter \"s\" ")

    # Optional entries: comma-separated labels, or "s" to skip
    inp = input("Enter label(s) for row/column containing values for edge weights, separated by commas but no spaces\n")
    paths['weightLabel'] = None if inp == "s" else inp.split(',')
    inp = input("Enter label(s) for row/column containing values unique to edges, separated by commas but no spaces\n")
    paths['uniqueValues'] = [] if inp == "s" else inp.split(',')

    paths['additionalPaths'] = {}
    print("Enter title of csv and corresponding paths for additional features of the graph in the following format:")
    inp = input("title/path, title/path, ...:\n")

    # Compare with != (value), not "is not" (identity), when checking for the skip marker
    if inp != 's':
        for item in inp.split(','):
            # Split on the first "/" only, so a path containing "/" stays intact
            title, path = item.strip().split('/', 1)
            paths['additionalPaths'][title] = path

    print("Dictionary has been generated, now returning")
    print("""------------------------------------------------------------------------------------------------------------""")
    return paths


# Keep the returned dictionary; GraphUpdater needs it
generatedDict = PathListGenerator()

# Create the graph
g = UndirectedGraph({}, {}, {})
u = GraphUpdater(g, generatedDict)
u.update()

# Shortest-path algorithms
sp = ShortestPath(g)

dijk = sp.Dikjstra
edgeTo, distTo = dijk('24', '222', [['time']], [BaseHeuristic])
edgeTo, distTo = dijk('59', '250', [['time']], [EuclideanForTube])
edgeTo, distTo = dijk('197', '250', [['time']], [StationCount])

astar = sp.Astar
edgeTo, distTo = astar('24', '222', [['time']], [BaseHeuristic])
edgeTo, distTo = astar('197', '250', [['time']], [EuclideanForTube])
edgeTo, distTo = astar('197', '250', [['time']], [StationCount])

# Metric extractor
extractor = MetricExtractor(g)

# Total number of nodes in the graph
num_nodes = extractor.get_num_nodes()

# Total number of edges in the graph
num_edges = extractor.get_num_edges()

# Degree of a random node (assumes nodes can be indexed 0..num_nodes-1)
rand_node = randrange(num_nodes)
node_deg = extractor.get_degree(rand_node)

# Weight of a random edge (same indexing assumption)
rand_edge = randrange(num_edges)
edge_weight = extractor.get_weight(rand_edge)

# Average weight of all edges
avg_w = extractor.get_ave_weight()

# Average degree per node for all nodes
avg_deg = extractor.get_ave_degree()

# Histogram of the number of nodes per degree in the graph
histogram = extractor.histogram()
```
