__version__ = '1.1.2'
__author__ = "Avinash Kak (kak@purdue.edu)"
__date__ = '2023-February-2'
__url__ = 'https://engineering.purdue.edu/kak/distCGP/ComputationalGraphPrimer-1.1.2.html'
__copyright__ = "(C) 2023 Avinash Kak. Python Software Foundation."
__doc__ = '''
ComputationalGraphPrimer.py
Version: ''' + __version__ + '''
Author: Avinash Kak (kak@purdue.edu)
Date: ''' + __date__ + '''
@title
CHANGE LOG:
Version 1.1.2:
Fixes a couple of additional bugs, one dealing with the misalignment of
variables and parameters in the backpropagation logic for the one-neuron
case, and the other with the example script "verify_with_torchnn.py" not
working on account of the previous changes to the module code. Thanks to
all who discovered these bugs.
Version 1.1.1:
Includes a bug fix in the parser for the expressions that define a
multi-layer neural network. The bug was causing the input nodes to be
replicated when parsing the expressions at each node in the second layer of
such a network. Many thanks to the student who discovered the bug.
Version 1.1.0:
The network visualization code in this version figures out the network
layout from the specification of the network in a user's script.
Previously, I had hand coded the visualization code for the specific
networks in the Examples directory of the distribution.
Version 1.0.9:
I have added to the in-code documentation to make it easier to understand
the implementation code. And wherever appropriate, I have also included
pointers to my Week 3 slides for the DL class at Purdue for more detailed
explanations of the logic implemented in a code fragment. Additionally, I
have incorporated better graph and neural-network display routines in the
module.
Version 1.0.8:
To make the code in the Example scripts easier to understand, the random
training data generator now returns the data that you can subsequently
supply to the training function. (See the scripts in the Examples directory
for what I mean.) I have also provided additional comments in the main
class file to help understand the code better.
Version 1.0.7:
I have fixed some documentation errors in this version and also cleaned up
the code by deleting functions not used in the demos. The basic codebase
remains the same as in 1.0.6.
Version 1.0.6:
This version includes a demonstration of how to extend PyTorch's Autograd
class if you wish to customize how the learnable parameters are updated
during backprop on the basis of the data conditions discovered during the
forward propagation. Previously this material was in the DLStudio module.
Version 1.0.5:
I have been experimenting with different ideas for increasing the tutorial
appeal of this module. (That is the reason for the jump in the version
number from 1.0.2 to the current 1.0.5.) The previous public version
provided a simple demonstration of how one could forward propagate data
through a DAG (Directed Acyclic Graph) while at the same time computing the partial
derivatives that would be needed subsequently during the backpropagation
step for updating the values of the learnable parameters. In 1.0.2, my goal
was just to illustrate what was meant by a DAG and how to use such a
representation for forward data flow and backward parameter update. Since I
had not incorporated any nonlinearities in such networks, there was
obviously no real learning taking place. That fact was made evident by a
plot of training loss versus iterations.
To remedy this shortcoming of the previous public-release version, the
current version introduces two special cases of networks --- a one-neuron
network and a multi-neuron network --- for experimenting with forward
propagation of data while calculating the partial derivatives needed later,
followed by backpropagation of the prediction errors for updating the values
of the learnable parameters. In both cases, I have used the Sigmoid
activation function at the nodes. The partial derivatives that are
calculated in the forward direction are based on analytical formulas at both
the pre-activation point for data aggregation and the post-activation point.
The forward and backward calculations incorporate smoothing of the
prediction errors and the derivatives over a batch as required by stochastic
gradient descent.
Version 1.0.2:
This version reflects the change in the name of the module that was
initially released under the name CompGraphPrimer with version 1.0.1.
@title
INTRODUCTION:
This module was created with a modest goal in mind: its purpose being merely
to serve as a prelude to discussing automatic calculation of the gradients
as the input training data forward propagates through a network of nodes
that form a directed acyclic graph (DAG).
Most students taking classes on deep learning focus on just using the tools
provided by platforms such as PyTorch without any understanding of how the
tools really work. Consider, for example, Autograd --- a module that is at
the heart of PyTorch --- for automatic calculation of the Jacobian of the
output of a layer with respect to all the learnable parameters in the layer.
With no effort on the part of the programmer, and through the functionality
built into the "torch.Tensor" class, the Autograd module keeps track of a
tensor through all calculations involving the tensor and computes the
partial derivatives of the output of the layer with respect to the
parameters stored in the tensor. These derivatives are subsequently used to
update the values of the learnable parameters during the backpropagation
step.
This Primer has the following three goals:
1) To introduce you to forward and backward dataflows in a Directed Acyclic
Graph (DAG).
2) To extend the material developed for the first goal with simple examples
of neural networks for demonstrating the forward and backward
dataflows for the purpose of updating the learnable parameters. This
part of the module also includes a comparison of the performance of
such networks with those constructed using torch.nn components.
3) To explain how the behavior of PyTorch's Autograd class can be customized
to your specific data needs by extending that class.
GOAL 1:
The first goal of this Primer is to introduce you to forward and backward
dataflows in a general DAG. The acronym DAG stands for Directed Acyclic
Graph. Just for the educational value of playing with dataflows in DAGs,
this module allows you to create a DAG of variables with a statement like
expressions = ['xx=xa^2',
'xy=ab*xx+ac*xa',
'xz=bc*xx+xy',
'xw=cd*xx+xz^3']
where we assume that a symbolic name that beings with the letter 'x' is a
variable, all other symbolic names being learnable parameters, and where we
use '^' for exponentiation. The four expressions shown above contain five
variables --- 'xx', 'xa', 'xy', 'xz', and 'xw' --- and four learnable
parameters: 'ab', 'ac', 'bc', and 'cd'. The DAG that is generated by these
expressions looks like:
________________________________
/ \
/ \
/ xx=xa**2 v
xa --------------> xx -----------------> xy xy = ab*xx + ac*xa
| \ |
| \ |
| \ |
| \ |
| \_____________ |
| | |
| V V
\ xz
\ / xz = bc*xx + xy
\ /
-----> xw <----
xw = cd*xx + xz
By the way, you can call 'display_network2()' on an instance of
ComputationalGraphPrimer to make a much better looking plot of the network
graph for any DAG created by the sort of expressions shown above.
In the DAG shown above, the variable 'xa' is an independent variable since
it has no incoming arcs, and 'xw' is an output variable since it has no
outgoing arcs. A DAG of the sort shown above is represented in
ComputationalGraphPrimer by two dictionaries: 'depends_on' and 'leads_to'.
Here is what the 'depends_on' dictionary would look like for the DAG shown
above:
depends_on['xx'] = ['xa']
depends_on['xy'] = ['xa', 'xx']
depends_on['xz'] = ['xx', 'xy']
depends_on['xw'] = ['xx', 'xz']
Something like "depends_on['xx'] = ['xa']" is best read as "the vertex 'xx'
depends on the vertex 'xa'." Similarly, the "depends_on['xz'] = ['xx',
'xy']" is best read aloud as "the vertex 'xz' depends on the vertices 'xx'
and 'xy'." And so on.
Whereas the 'depends_on' dictionary is a complete description of a DAG, for
programming convenience, ComputationalGraphPrimer also maintains another
representation for the same graph, as provided by the 'leads_to' dictionary.
This dictionary for the same graph as shown above would be:
leads_to['xa'] = ['xx', 'xy']
leads_to['xx'] = ['xy', 'xz', 'xw']
leads_to['xy'] = ['xz']
leads_to['xz'] = ['xw']
The "leads_to[xa] = [xx]" is best read as "the outgoing edge at the node
'xa' leads to the node 'xx'." Along the same lines, the "leads_to['xx'] =
['xy', 'xz', 'xw']" is best read as "the outgoing edges at the vertex 'xx'
lead to the vertices 'xy', 'xz', and 'xw'.
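If you want to see how such dictionaries can be constructed, the following
sketch (a minimal illustration under the same naming convention, not the
module's actual parser) builds both dictionaries from the expressions:
import re
expressions = ['xx=xa^2', 'xy=ab*xx+ac*xa', 'xz=bc*xx+xy', 'xw=cd*xx+xz^3']
depends_on = {}
for exp in expressions:
    left, right = exp.split('=')
    ## a token that starts with 'x' is a variable; all others are learnable params
    depends_on[left] = [t for t in re.findall(r'[a-zA-Z]+', right) if t.startswith('x')]
leads_to = {}
for node, parents in depends_on.items():
    for parent in parents:
        leads_to.setdefault(parent, []).append(node)
print(depends_on)   ## {'xx': ['xa'], 'xy': ['xx', 'xa'], 'xz': ['xx', 'xy'], 'xw': ['xx', 'xz']}
print(leads_to)     ## {'xa': ['xx', 'xy'], 'xx': ['xy', 'xz', 'xw'], 'xy': ['xz'], 'xz': ['xw']}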
Given a computational graph like the one shown above, we are faced with the
following questions: (1) How to propagate the information from the
independent nodes --- that we can refer to as the input nodes --- to the
output nodes, these being the nodes with only incoming edges? (2) As the
information flows in the forward direction, meaning from the input nodes to
the output nodes, is it possible to estimate the partial derivatives that
apply to each link in the graph? And, finally, (3) Given a scalar value at
an output node (which could be the loss estimated at that node), can the
partial derivatives estimated during the forward pass be used to
backpropagate the loss?
Consider, for example, the directed link between the node 'xy' and node
'xz'. As a variable, the value of 'xz' is calculated through the formula
"xz = bc*xx + xy". In the forward propagation of information, we estimate
the value of 'xz' from currently known values for the learnable parameter
'bc' and the variables 'xx' and 'xy'. In addition to the value of the
variable at the node 'xz', we are also interested in the value of the
partial derivative of 'xz' with respect to the other variables that it
depends on --- 'xx' and 'xy' --- and also with respect to the parameter it
depends on, 'bc'. For the calculation of the derivatives, we have a
choice: We can either do a bit of computer algebra and figure out that the
partial of 'xz' with respect to 'xx' is equal to the current value for
'bc'. Or, we can use the small finite difference method for doing the
same, which means that (1) we calculate the value of 'xz' for the current
value of 'xx', on the one hand, and, on the other, for 'xx' plus a delta;
(2) take the difference of the two; and (3) divide the difference by the
delta. The ComputationalGraphPrimer module uses the finite-difference method
for estimating the partial derivatives.
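As a quick illustration of the finite-difference estimate for the link
between 'xx' and 'xz' (the function name and the argument values below are
hypothetical):
def partial_xz_wrt_xx(bc, xx, xy, delta=1e-4):
    ## evaluate xz = bc*xx + xy at xx and at xx + delta, take the difference,
    ## and divide by delta; for this expression the estimate agrees with the
    ## analytical answer, namely the current value of 'bc'
    xz_at_xx     = bc * xx + xy
    xz_at_bumped = bc * (xx + delta) + xy
    return (xz_at_bumped - xz_at_xx) / delta
print(partial_xz_wrt_xx(bc=2.0, xx=3.0, xy=5.0))    ## approximately 2.0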
Since we have two different types of partial derivatives, namely the partial
of a variable with respect to another variable and the partial of a variable
with respect to a learnable parameter, ComputationalGraphPrimer uses two
different dictionaries for storing these partials during each forward pass.
Partials of variables with respect to other variables as encountered during
forward propagation are stored in the dictionary "partial_var_to_var" and
the partials of the variables with respect to the learnable parameters are
stored in the dictionary "partial_var_to_param". At the end of each forward
pass, the relevant partials extracted from these dictionaries are used to
estimate the gradients of the loss with respect to the learnable
parameters, as illustrated in the implementation of the method
train_on_all_data().
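Schematically, for the parameter 'bc' in the DAG shown earlier, the gradient
of the loss is assembled from the stored partials by the chain rule:
partial(Loss)/partial(bc) =
partial(Loss)/partial(xw) * partial(xw)/partial(xz) * partial(xz)/partial(bc)
where the second factor would be read out of "partial_var_to_var" and the
third out of "partial_var_to_param".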
While the exercise mentioned above is good for appreciating data flows in a
general DAG, you've got to realize that, with today's algorithms, it would
be impossible to carry out any learning in a general DAG. A general DAG
with millions of learnable parameters would not lend itself to a fast
calculation of the partial derivatives that are needed during the
backpropagation step. Since the exercise described above is just to get
you thinking about data flows in DAGs and nothing else, I have not bothered
to include any activation functions in the DAG demonstration code in this
Primer.
GOAL 2:
That brings us to the second major goal of this Primer module:
To provide examples of simple neural structures in which the required
partial derivatives are calculated during forward data propagation and
subsequently used for parameter update during the backpropagation of
loss.
In order to become familiar with how this is done in the module, your best
place to start would be the following two scripts in the Examples directory
of the distribution:
one_neuron_classifier.py
multi_neuron_classifier.py
The first script, "one_neuron_classifier.py", invokes the following
function from the module:
run_training_loop_one_neuron_model()
This function, in turn, calls the following functions, the first for
forward propagation of the data, and the second for the backpropagation of
loss and updating of the parameter values:
forward_prop_one_neuron_model()
backprop_and_update_params_one_neuron_model()
The data that is forward propagated to the output node is subject to
Sigmoid activation. The derivatives that are calculated during forward
propagation of the data include the partial 'output vs. input' derivatives
for the Sigmoid nonlinearity. The backpropagation step implemented in the
second of the two functions listed above includes averaging the partial
derivatives and the prediction errors over a batch of training samples, as
required by SGD.
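For reference, here are the Sigmoid and its 'output vs. input' derivative in
the form used by such a forward pass (a generic sketch, not the module's
exact code):
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def sigmoid_deriv(z):
    ## the derivative can be expressed in terms of the output itself:
    ## d(sigmoid)/dz = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)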
The second demo script in the Examples directory,
"multi_neuron_classifier.py" creates a neural network with a hidden layer
and an output layer. The nodes in the hidden layer and the node in the
output layer are all subject to Sigmoid activation. This script invokes
the following function of the module:
run_training_loop_multi_neuron_model()
And this function, in turn, calls upon the following two functions, the
first for forward propagating the data and the second for the
backpropagation of loss and updating of the parameters:
forward_prop_multi_neuron_model()
backprop_and_update_params_multi_neuron_model()
In contrast with the one-neuron demo, in this case, the batch-based data
that is output by the forward function is sent directly to the backprop
function. It then becomes the job of the backprop function to do the
averaging needed for SGD.
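In schematic form, the batch-based smoothing amounts to averaging the
per-sample gradient estimates before taking a step (hypothetical names; the
actual update logic is in the two backprop functions listed above):
import numpy as np
def sgd_step(param_value, per_sample_grads, learning_rate):
    ## average the gradient estimates over the batch, as SGD requires,
    ## and step against the averaged gradient
    avg_grad = float(np.mean(per_sample_grads))
    return param_value - learning_rate * avg_grad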
In the Examples directory, you will also find the following script:
verify_with_torchnn.py
The idea for this script is to serve as a check on the performance of the
main demo scripts "one_neuron_classifier.py" and
"multi_neuron_classifier.py". Note that you cannot expect the performance
of my one-neuron and multi-neuron scripts to match what you would get from
similar networks constructed with components drawn from "torch.nn". One
primary reason for that is that "torch.nn" based code uses state-of-the-art
optimization of the steps in the parameter hyperplane, which is not the case
with my demo scripts. Nonetheless, a comparison with the "torch.nn" results
is important for checking the general trend of how the training loss varies
with the iterations. That is, if the "torch.nn" based script showed
decreasing loss (indicating that learning was taking place) while that was
not the case with my one-neuron and multi-neuron scripts, that would
indicate that perhaps I had made an error either in the computation of the
partial derivatives during the forward propagation of the data, or in how I
used the derivatives for updating the parameters.
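For a sense of what the "torch.nn" counterparts look like, a one-neuron
network amounts to a single linear unit followed by Sigmoid (a sketch only;
the module's own versions are the inner classes OneNeuronNet and
MultiNeuronNet described later):
import torch
## assumes 4 input features, chosen here only for illustration
one_neuron = torch.nn.Sequential(
    torch.nn.Linear(4, 1),
    torch.nn.Sigmoid()
)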
GOAL 3:
The goal here is to show how to extend PyTorch's Autograd class if you want
to endow it with additional functionality. Let's say that you wish for some
data condition to be remembered during the forward propagation of the data
through a network and then use that condition to alter in some manner how
the parameters would be updated during backpropagation of the prediction
errors. This can be accomplished by subclassing from Autograd and
incorporating the desired behavior in the subclass. How you can extend
Autograd is demonstrated by the inner class AutogradCustomization in
this module. Your starting point for understanding what this class does
would be the script
extending_autograd.py
in the Examples directory of the distribution.
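To fix ideas, extending Autograd follows PyTorch's standard recipe of
subclassing torch.autograd.Function. The generic sketch below (an
illustration only, not the module's AutogradCustomization class) remembers a
condition from the forward pass and uses it to alter the backpropagated
gradient:
import torch
class GatedFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ## remember a data condition discovered during the forward pass
        ctx.save_for_backward(x)
        return x.clamp(min=0)
    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        ## use the remembered condition to modify the gradient flowing back
        grad = grad_output.clone()
        grad[x < 0] = 0
        return grad
y = GatedFunction.apply(torch.randn(3, requires_grad=True))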
@title
INSTALLATION:
The ComputationalGraphPrimer class was packaged using setuptools. For
installation, execute the following command in the source directory (this is
the directory that contains the setup.py file after you have downloaded and
uncompressed the package):
sudo python3 setup.py install
On Linux distributions, this will install the module file at a location that
looks like
/usr/local/lib/python3.8/dist-packages/
If you do not have root access, you have the option of working directly off
the directory in which you downloaded the software by simply placing the
following statements at the top of your scripts that use the
ComputationalGraphPrimer class:
import sys
sys.path.append( "pathname_to_ComputationalGraphPrimer_directory" )
To uninstall the module, delete the source directory, locate where the
ComputationalGraphPrimer module was installed with "locate
ComputationalGraphPrimer", and delete those files. As mentioned above, the
full pathname to the installed version is likely to look like
"/usr/local/lib/python3.8/dist-packages/".
If you want to carry out a non-standard install of the
ComputationalGraphPrimer module, look up the on-line information on Distutils
by pointing your browser to
http://docs.python.org/dist/dist.html
@title
USAGE:
Construct an instance of the ComputationalGraphPrimer class as follows:
from ComputationalGraphPrimer import *
cgp = ComputationalGraphPrimer(
expressions = ['xx=xa^2',
'xy=ab*xx+ac*xa',
'xz=bc*xx+xy',
'xw=cd*xx+xz^3'],
output_vars = ['xw'],
dataset_size = 10000,
learning_rate = 1e-6,
grad_delta = 1e-4,
display_loss_how_often = 1000,
)
cgp.parse_expressions()
cgp.display_network2()
cgp.gen_gt_dataset(vals_for_learnable_params = {'ab':1.0, 'bc':2.0, 'cd':3.0, 'ac':4.0})
cgp.train_on_all_data()
cgp.plot_loss()
@title
CONSTRUCTOR PARAMETERS:
batch_size: Introduced in Version 1.0.5 for demonstrating forward
propagation of the input data while calculating the partial
derivatives needed during backpropagation of loss. For SGD,
updating the parameters involves smoothing the derivatives
over the training samples in a batch. Hence the need for
batch_size as a constructor parameter.
dataset_size: Although the networks created by an arbitrary set of
expressions are not likely to allow for any true learning of
the parameters, nonetheless the ComputationalGraphPrimer
allows for the computation of the loss at the output nodes
and backpropagation of the loss to the other nodes. To
demonstrate this, we need a ground-truth set of input/output
values for given value for the learnable parameters. The
constructor parameter 'dataset_size' refers to how many of
these 'input/output' pairs would be generated for such
experiments.
For the one-neuron and multi-neuron demos introduced in
Version 1.0.5, the constructor parameter dataset_size refers
to how many tuples of randomly generated data should be made
available for learning. The size of each data tuple is
deduced from the first expression in the list made
available to the module through the parameter 'expressions'
described below.
display_loss_how_often: This controls how often you will see the result of
the calculations being carried out in the computational
graph. Let's say you are experimenting with 10,000
input/output samples for propagation in the network: if you
set this constructor option to 1000, you will see the
partial derivatives and the values for the learnable
parameters every 1000 passes through the graph.
expressions: These expressions define the computational graph. The
expressions are based on the following assumptions: (1) any
variable name must start with the letter 'x'; (2) a symbolic
name that does not start with 'x' is a learnable parameter;
(3) exponentiation operator is '^'; (4) the symbols '*',
'+', and '-' carry their usual arithmetic meanings.
grad_delta: This constructor option sets the value of the delta to be used
for estimating the partial derivatives with the finite
difference method.
layers_config: Introduced in Version 1.0.5 for the multi-neuron demo. Its
value is a list of nodes in each layer of the network. Note
that I consider the input to the neural network as a layer
unto itself. Therefore, if the value of the parameter
num_layers is 3, the list you supply for layers_config must
have three numbers in it.
learning_rate: Carries the usual meaning for updating the values of the
learnable parameters based on the gradients of the
output of a layer with respect to those parameters.
num_layers: Introduced in Version 1.0.5 for the multi-neuron demo. It
is merely a convenience parameter that indicates the
number of layers in your multi-neuron network. For the
purpose of counting layers, I consider the input as a
layer unto itself.
one_neuron_model: Introduced in Version 1.0.5. This boolean parameter is
needed only when you are constructing a one-neuron demo. I
needed this constructor parameter for some conditional
evaluations in the "parse_expressions()" method of the
module. I use that expression parser for both the older
demos and the new demo based on the one-neuron model.
output_vars: Although the parser has the ability to figure out which nodes
in the computational graph represent the output variables
--- these being nodes with no outgoing arcs --- you are
allowed to designate the specific output variables you are
interested in through this constructor parameter.
training_iterations: Carries the expected meaning.
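To see several of these parameters in action together, here is a
hypothetical construction of a three-layer network (the parameter values are
for illustration only; see the Examples scripts for the actual invocations):
cgp = ComputationalGraphPrimer(
    num_layers = 3,
    layers_config = [4, 2, 1],     ## the input is counted as a layer unto itself
    expressions = ['xw=ap*xp+aq*xq+ar*xr+as*xs',
                   'xz=bp*xp+bq*xq+br*xr+bs*xs',
                   'xo=cp*xw+cq*xz'],
    output_vars = ['xo'],
    dataset_size = 5000,
    learning_rate = 1e-3,
    training_iterations = 40000,
    batch_size = 8,
    display_loss_how_often = 100,
)
cgp.parse_multi_layer_expressions()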
@title
PUBLIC METHODS:
(1) backprop_and_update_params_one_neuron_model():
Introduced in Version 1.0.5. This method is called by
run_training_loop_one_neuron_model() for backpropagating the loss
and updating the values of the learnable parameters.
(2) backprop_and_update_params_multi_neuron_model():
Introduced in Version 1.0.5. This method is called by
run_training_loop_multi_neuron_model() for backpropagating the
loss and updating the values of the learnable parameters.
(3) display_network2():
This method calls on the networkx module to construct a visual
display of the computational graph.
(4) forward_propagate_one_input_sample_with_partial_deriv_calc():
This method is used for pushing the input data forward through a
general DAG and at the same time computing the partial derivatives that
would be needed during backpropagation for updating the values of
the learnable parameters.
(5) forward_prop_one_neuron_model():
Introduced in Version 1.0.5. This function propagates the input
data through a one-neuron network. The data aggregated at the
neuron is subject to a Sigmoid activation. The function also
calculates the partial derivatives needed during backprop.
(6) forward_prop_multi_neuron_model():
Introduced in Version 1.0.5. This function does the same thing as
the previous function, except that it is intended for a multi-layer
neural network. The pre-activation values at each neuron are
subject to the Sigmoid nonlinearity. At the same time, the partial
derivatives are calculated and stored away for use during backprop.
(7) gen_gt_dataset():
This method generates the training data for a general graph of
nodes in a DAG. For random values at the input nodes, it
calculates the values at the output nodes assuming certain given
values for the learnable parameters in the network. If it were
possible to carry out learning in such a network, the goal would
be to see if the values of those parameters would be learned
automatically as in a neural network.
(8) gen_training_data():
Introduced in Version 1.0.5. This function generates training data
for the scripts "one_neuron_classifier.py",
"multi_neuron_classifier.py" and "verify_with_torchnn.py" scripts
in the Examples directory of the distribution. The data
corresponds to two classes defined by two different multi-variate
distributions. The dimensionality of the data is determined
entirely by how many nodes are found by the expression parser in
the list of expressions that define the network.
(9) parse_expressions():
This method parses the expressions provided and constructs a DAG
from them for the variables and the parameters in the expressions.
It is based on the convention that the names of all variables
begin with the character 'x', with all other symbolic names being
treated as learnable parameters.
(10) parse_multi_layer_expressions():
Introduced in Version 1.0.5. Whereas the previous method,
parse_expressions(), works well for creating a general DAG and for
the one-neuron model, it is not meant to capture the layer based
structure of a neural network. Hence this method.
(11) run_training_loop_one_neuron_model():
Introduced in Version 1.0.5. This is the main function in the
module for the demo based on the one-neuron model. The demo
consists of propagating the input values forward, aggregating them
at the neuron, and subjecting the result to Sigmoid activation.
All the partial derivatives needed for updating the link weights
are calculated during the forward propagation. This includes the
derivatives of the output vis-a-vis the input at the Sigmoid
activation. Subsequently, during backpropagation of the loss, the
parameter values are updated using the derivatives stored away
during forward propagation.
(12) run_training_loop_multi_neuron_model():
Introduced in Version 1.0.5. This is the main function for the
demo based on a multi-layer neural network. As each batch of
training data is pushed through the network, the partial derivatives
of the output at each layer are computed with respect to the
parameters. This calculation includes computing the partial
derivatives at the output of the activation function with respect
to its input. Subsequently, during backpropagation, batch-based
smoothing is first applied to the derivatives and the prediction
errors stored away during forward propagation, in order to comply
with the needs of SGD, and the values of the learnable parameters
are then updated.
(13) run_training_with_torchnn():
Introduced in Version 1.0.5. The purpose of this function is to
use comparable network components from the torch.nn module in
order to "authenticate" the performance of the handcrafted
one-neuron and the multi-neuron models in this module. All that
is meant by "authentication" here is that if the torch.nn based
networks show the training loss decreasing with iterations, you
would expect the one-neuron and the multi-neuron models to show
similar results. This function contains the following inner classes:
class OneNeuronNet( torch.nn.Module )
class MultiNeuronNet( torch.nn.Module )
that define networks similar to the handcrafted one-neuron and
multi-neuron networks of this module.
(14) train_on_all_data():
The purpose of this function is to call forward propagation and
backpropagation functions of the module for the demo based on
arbitrary DAGs.
(15) plot_loss():
This is only used by the DAG-based demonstration code in the
module. The training functions introduced in Version 1.0.5 have
embedded code for plotting the loss as a function of iterations.
@title
THE Examples DIRECTORY:
The Examples directory of the distribution contains the following
scripts:
1. graph_based_dataflow.py
This demonstrates forward propagation of input data and
backpropagation in a general DAG (Directed Acyclic Graph).
The forward propagation involves estimating the partial
derivatives that would subsequently be used for "updating" the
learnable parameters during backpropagation. Since I have not
incorporated any activations in the DAG, you cannot really
expect any real learning to take place in this demo. The
purpose of this demo is just to illustrate what is meant by a
DAG and how information can flow forwards and backwards in
such a network.
2. one_neuron_classifier.py
This script demonstrates the one-neuron model in the module.
The goal is to show forward propagation of data through the
neuron (which includes the Sigmoid activation), while
calculating the partial derivatives needed during the
backpropagation step for updating the parameters.
3. multi_neuron_classifier.py
This script generalizes what is demonstrated by the one-neuron
model to a multi-layer neural network. This script
demonstrates saving the partial-derivative information
calculated during the forward propagation through a
multi-layer neural network and using that information for
backpropagating the loss and for updating the values of the
learnable parameters.
4. verify_with_torchnn.py
The purpose of this script is just to verify that the results
obtained with the scripts "one_neuron_classifier.py" and
"multi_neuron_classifier.py" are along the expected lines.
That is, if similar networks constructed with the torch.nn
module show the training loss decreasing with iterations, you
would expect similar learning behavior from the scripts
"one_neuron_classifier.py" and "multi_neuron_classifier.py".
5. extending_autograd.py
This provides a demo example of the recommended approach for
giving additional functionality to Autograd. See the
explanation in the doc section associated with the inner
class AutogradCustomization of this module for further info.
@title
BUGS:
Please notify the author if you encounter any bugs. When sending email,
please place the string 'Computational Graph Primer' in the subject line to
get past the author's spam filter.
@title
ACKNOWLEDGMENTS:
Akshita Kamsali's help with improving the quality of the network graph
visualization code is much appreciated. Akshita is working on her Ph.D. in
the Robot Vision Lab at Purdue.
@title
ABOUT THE AUTHOR:
The author, Avinash Kak, is a professor of Electrical and Computer
Engineering at Purdue University. For all issues related to this
module, contact the author at kak@purdue.edu. If you send email, please
place the string "ComputationalGraphPrimer" in your subject line to get
past the author's spam filter.
@title
COPYRIGHT:
Python Software Foundation License
Copyright 2023 Avinash Kak
@endofdocs
'''
import sys,os,os.path
import numpy as np
import re
import operator
import itertools
import math
import random
import torch
from collections import deque
import copy
import matplotlib.pyplot as plt
import networkx as nx
class Exp:
'''
With CGP, you can handcraft a neural network (actually you can handcraft any DAG) by designating
the nodes and the links between them with expressions like
expressions = [ 'xx=xa^2',
'xy=ab*xx+ac*xa',
'xz=bc*xx+xy',
'xw=cd*xx+xz^3' ]
In these expressions, names beginning with 'x' denote the nodes in the DAG, and all other names (such as
'ab', 'ac', 'bc') designate the learnable parameters. The variable on the
left of the '=' symbol is considered to be the dependent_var and those on the right are, as you guessed,
the right_vars. Since the learnable parameters are always on the right of the equality sign, we refer
to them as right_params in what is shown below. The expressions are parsed by the parser function in
CGP, which outputs an instance of the Exp class for each expression. The list shown above has 4
expressions for creating a DAG; of course, you can have any number of them.
'''
def __init__(self, exp, body, dependent_var, right_vars, right_params):
self.exp = exp
self.body = body
self.dependent_var = dependent_var
self.right_vars = right_vars
self.right_params = right_params
#______________________________ ComputationalGraphPrimer Class Definition ________________________________
class ComputationalGraphPrimer(object):
def __init__(self, *args, **kwargs ):
if args:
raise ValueError(
'''ComputationalGraphPrimer constructor can only be called with keyword arguments for
the following keywords: expressions, output_vars, dataset_size, grad_delta,
learning_rate, display_loss_how_often, one_neuron_model, training_iterations,
batch_size, num_layers, layers_config, epochs, for_verification_only and debug''')
expressions = output_vars = dataset_size = grad_delta = display_loss_how_often = learning_rate = None
one_neuron_model = training_iterations = batch_size = num_layers = layers_config = epochs = for_verification_only = debug = None
if 'one_neuron_model' in kwargs : one_neuron_model = kwargs.pop('one_neuron_model')
if 'batch_size' in kwargs : batch_size = kwargs.pop('batch_size')
if 'num_layers' in kwargs : num_layers = kwargs.pop('num_layers')
if 'layers_config' in kwargs : layers_config = kwargs.pop('layers_config')
if 'expressions' in kwargs : expressions = kwargs.pop('expressions')
if 'output_vars' in kwargs : output_vars = kwargs.pop('output_vars')
if 'dataset_size' in kwargs : dataset_size = kwargs.pop('dataset_size')
if 'learning_rate' in kwargs : learning_rate = kwargs.pop('learning_rate')
if 'training_iterations' in kwargs : training_iterations = kwargs.pop('training_iterations')
if 'grad_delta' in kwargs : grad_delta = kwargs.pop('grad_delta')
if 'display_loss_how_often' in kwargs : display_loss_how_often = kwargs.pop('display_loss_how_often')
if 'epochs' in kwargs : epochs = kwargs.pop('epochs')
if 'for_verification_only' in kwargs : for_verification_only = kwargs.pop('for_verification_only')
if 'debug' in kwargs : debug = kwargs.pop('debug')
if len(kwargs) != 0: raise ValueError('''You have provided unrecognizable keyword args''')
self.one_neuron_model = True if one_neuron_model is not None else False
if training_iterations:
self.training_iterations = training_iterations
self.batch_size = batch_size if batch_size else 4
self.num_layers = num_layers
self.for_verification_only = for_verification_only
if layers_config:
self.layers_config = layers_config
if expressions:
self.expressions = expressions
if output_vars:
self.output_vars = output_vars
if dataset_size:
self.dataset_size = dataset_size
if learning_rate:
self.learning_rate = learning_rate
else:
self.learning_rate = 1e-6
if grad_delta:
self.grad_delta = grad_delta
else:
self.grad_delta = 1e-4
if display_loss_how_often:
self.display_loss_how_often = display_loss_how_often
if dataset_size:
self.dataset_input_samples = {i : None for i in range(dataset_size)}
self.true_output_vals = {i : None for i in range(dataset_size)}
self.vals_for_learnable_params = None
self.epochs = epochs
if debug:
self.debug = debug
else:
self.debug = 0
self.gradient_of_loss = None
self.gradients_for_learnable_params = None
self.expressions_dict = {}
self.LOSS = [] ## loss values for all iterations of training
self.all_vars = set()
if (one_neuron_model is True) or (num_layers is None) or (for_verification_only is True):
self.independent_vars = []
self.learnable_params = []
else:
self.independent_vars = set()
self.learnable_params = set()
self.dependent_vars = {}
self.depends_on = {} ## See Introduction for the meaning of this
self.leads_to = {} ## See Introduction for the meaning of this
self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
def parse_expressions(self):
'''
This method creates a DAG from a set of expressions that involve variables and learnable
parameters. The expressions are based on the assumption that a symbolic name that starts
with the letter 'x' is a variable, with all other symbolic names being learnable parameters.
The computational graph is represented by two dictionaries, 'depends_on' and 'leads_to'.
To illustrate the meaning of the dictionaries, something like "depends_on['xz']" would be
set to a list of all other variables whose outgoing arcs end in the node 'xz'. So
something like "depends_on['xz']" is best read as "node 'xz' depends on ...." where the
dots stand for the array of nodes that is the value of "depends_on['xz']". On the other
hand, the 'leads_to' dictionary has the opposite meaning. That is, something like
"leads_to['xz']" is set to the array of nodes at the ends of all the arcs that emanate
from 'xz'.
'''
self.exp_objects = []
self.var_to_int_labels = {} ## needed for DAG visualization with networkx
self.var_to_var_param = {}
node_int_label = 0 ## initialize int labels
for exp in self.expressions:
left,right = exp.split('=')
self.expressions_dict[left] = right
self.depends_on[left] = []
parts = re.findall('([a-zA-Z]+)', right)
parts_in_pairs_dict = { parts[2*i+1] : parts[2*i] for i in range(len(parts)//2) } ## maps each right-side var to its param, for DAG visualization
self.var_to_var_param[left] = parts_in_pairs_dict ## for DAG visualization
right_vars = []
right_params = []
for part in parts:
if part.startswith('x'):
self.all_vars.add(part)
self.depends_on[left].append(part)
right_vars.append(part)
if part not in self.var_to_int_labels:    ## avoid relabeling a var seen in an earlier expression
    self.var_to_int_labels[part] = node_int_label
    node_int_label += 1
else:
if (self.one_neuron_model is True) or (self.for_verification_only is True):
self.learnable_params.append(part)
else:
self.learnable_params.add(part)
right_params.append(part)
self.var_to_int_labels[left] = node_int_label
node_int_label += 1       ## advance the label so the next new node gets a unique int
self.all_vars.add(left)
exp_obj = Exp(exp, right, left, right_vars, right_params)
self.exp_objects.append(exp_obj)
if self.debug:
print("\n\nall variables: %s" % str(self.all_vars))
print("\n\nlearnable params: %s" % str(self.learnable_params))
print("\n\ndependencies: %s" % str(self.depends_on))
print("\n\nexpressions dict: %s" % str(self.expressions_dict))
print("\n\nvar_to_var_param dict: ", self.var_to_var_param)
print("\n\nnode to int labels: ", self.var_to_int_labels)
for var in self.all_vars:
if var not in self.depends_on: # that is, var is not a key in the depends_on dict
if (self.one_neuron_model is True) or (self.for_verification_only is True):
self.independent_vars.append(var)
else:
self.independent_vars.add(var)
self.input_size = len(self.independent_vars)
if self.debug:
print("\n\nindependent vars: %s" % str(self.independent_vars))
self.dependent_vars = [var for var in self.all_vars if var not in self.independent_vars]
self.output_size = len(self.dependent_vars)
self.leads_to = {var : set() for var in self.all_vars}
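## invert the 'depends_on' relation: if node k depends on var, then var leads to k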
for k,v in self.depends_on.items():
for var in v:
self.leads_to[var].add(k)
if self.debug:
print("\n\nleads_to dictionary: %s" % str(self.leads_to))
def parse_multi_layer_expressions(self):
'''
This method is a modification of the previous expression parser and meant specifically
for the case when a given set of expressions are supposed to define a multi-layer neural
network. The naming conventions for the variables, which designate the nodes in the layers
of the network, and the learnable parameters remain the same as in the previous function.
'''
self.exp_objects = []
self.var_to_int_labels = {} ## needed for DAG visualization with networkx
self.var_to_var_param = {}
node_int_label = 0 ## initialize int labels
self.layer_expressions = { i : [] for i in range(1,self.num_layers) }
self.layer_exp_objects = { i : [] for i in range(1,self.num_layers) }
## A deque is a double-ended queue in which elements can be inserted and deleted at both ends.
all_expressions = deque(self.expressions)
for layer_index in range(self.num_layers - 1):
for node_index in range(self.layers_config[layer_index+1]):
self.layer_expressions[layer_index+1].append( all_expressions.popleft() )
print("\n\nself.layer_expressions: ", self.layer_expressions)
self.layer_vars = {i : [] for i in range(self.num_layers)} # layer indexing starts at 0
self.layer_params = {i : [] for i in range(1,self.num_layers)} # layer indexing starts at 1
for layer_index in range(1,self.num_layers):
for exp in self.layer_expressions[layer_index]:
left,right = exp.split('=')
self.expressions_dict[left] = right
self.depends_on[left] = []
parts = re.findall('([a-zA-Z]+)', right)
parts_in_pairs_dict = { parts[2*i+1] : parts[2*i] for i in range(len(parts)//2) }
self.var_to_var_param[left] = parts_in_pairs_dict
right_vars = []
right_params = []
for part in parts:
if part.startswith('x'):
self.all_vars.add(part)
self.depends_on[left].append(part)
right_vars.append(part)
if part not in self.var_to_int_labels:
self.var_to_int_labels[part] = node_int_label
node_int_label += 1
else:
self.learnable_params.add(part)
right_params.append(part)
self.layer_vars[layer_index-1] = right_vars
self.layer_vars[layer_index].append(left)
self.layer_params[layer_index].append(right_params)
self.var_to_int_labels[left] = node_int_label
node_int_label += 1
self.all_vars.add(left)
exp_obj = Exp(exp, right, left, right_vars, right_params)
self.layer_exp_objects[layer_index].append(exp_obj)
if self.debug:
print("\n\n[layer index: %d] all variables: %s" % (layer_index, str(self.all_vars)))
print("\n\n[layer index: %d] learnable params: %s" % (layer_index, str(self.learnable_params)))
print("\n\n[layer index: %d] dependencies: %s" % (layer_index, str(self.depends_on)))
print("\n\n[layer index: %d] expressions dict: %s" % (layer_index, str(self.expressions_dict)))
print("\n\n[layer index: %d] var_to_var_param dict: %s" % (layer_index, str(self.var_to_var_param)))
print("\n\n[layer index: %d] node to int labels: %s" % (layer_index, str(self.var_to_int_labels)))
for var in self.all_vars:
if var not in self.depends_on: # that is, var is not a key in the depends_on dict
self.independent_vars.add(var)
self.input_size = len(self.independent_vars)
if self.debug:
print("\n\n[layer index: %d] independent vars: %s" % (layer_index, str(self.independent_vars)))
self.dependent_vars = [var for var in self.all_vars if var not in self.independent_vars]
self.output_size = len(self.dependent_vars)
self.leads_to = {var : set() for var in self.all_vars}
for k,v in self.depends_on.items():
for var in v:
self.leads_to[var].add(k)
if self.debug:
print("\n\n[layer index: %d] leads_to dictionary: %s" % (layer_index, str(self.leads_to)))
print("\n\n[Final] independent vars: %s" % str(self.independent_vars))
print("\n\n[Final] self.layer_vars: ", self.layer_vars)
print("\n\n[Final] self.layer_params: ", self.layer_params)