<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Blog Name</title>
<subtitle>Blog subtitle</subtitle>
<id>http://blog.url.com/posts</id>
<link href="http://blog.url.com/posts"/>
<link href="http://blog.url.com/feed.xml" rel="self"/>
<updated>2020-11-27T11:00:00-05:00</updated>
<author>
<name>Blog Author</name>
</author>
<entry>
<title>Concourse CI - Lessons</title>
<link rel="alternate" href="http://blog.url.com/posts/2020/11/28/building-oci-images-in-concourse/"/>
<id>http://blog.url.com/posts/2020/11/28/building-oci-images-in-concourse/</id>
<published>2020-11-27T11:00:00-05:00</published>
<updated>2020-12-06T12:02:19-05:00</updated>
<author>
<name>Article Author</name>
</author>
<content type="html"><p>I run a mid-sized <a href="https://concourse-ci.org/">Concourse CI</a> cluster for Tulip that runs ~3000 fairly resource-intensive builds weekly.
I&rsquo;ve encountered a fair share of stability issues with it, some from lack of experience, some from real issues,
but overall, my experience with it has been fairly positive. I can&rsquo;t speak for
Github Actions, TravisCI or CircleCI, but my experience has been vastly better than with Jenkins (another popular CI/CD tool).
Concourse is open-source and continuously improving, with fairly frequent releases and good core contributing members. It also helps with not getting
locked into a specific platform such as Github Actions. I&rsquo;m actually surprised more people aren&rsquo;t on board with it,
which is one of the reasons that prompted me to write this series.</p>
<p>Over the next couple posts or so, I&rsquo;ll be talking about some random topics related to Concourse. They might help with your decision to onboard
(or skip) Concourse. This first one might read like a rant on the issues of Concourse, but it really isn&rsquo;t :P</p>
<p>To clarify, they&rsquo;re more like lessons I&rsquo;ve learned about Concourse and how some tweaks might help with smoothing the running of a cluster.</p>
<h3>Infrastructure</h3>
<p>We use <a href="https://github.com/EngineerBetter/control-tower">EngineerBetter/control-tower</a>, formerly known as Concourse-Up, for the initial setup.
The initial setup is fairly effortless (generally speaking, without deviation).
On top of that, we do most of the custom configuration for Concourse via <a href="https://bosh.io/docs/">Bosh</a>.
Bosh is also in charge of provisioning the different components, such as the Prometheus, ATC/web and worker nodes.
The cluster is also essentially self-healing thanks to Bosh cloud-checks; any terminated or deleted instance will automatically be replaced.</p>
<h3>Resources</h3>
<p>I&rsquo;ll be talking about resources in the next few sections, so here&rsquo;s a quick primer.</p>
<p>They&rsquo;re like versioned artifacts backed by an external source of truth. To interface with that external source, there are &ldquo;plugins&rdquo; called <code>resource_types</code>.
There are <a href="https://resource-types.concourse-ci.org/">a bunch of community-built resource types</a>, and they&rsquo;re an important contributor to Concourse&rsquo;s flexibility imo.</p>
<p>For example:</p>
<ul>
<li>the <a href="https://github.com/concourse/git-resource">git-resource</a> tracks commits in a Git repo</li>
<li><a href="https://github.com/concourse/registry-image-resource">registry-image</a> would manage images for docker registries.</li>
</ul>
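<p>As a quick sketch, a pipeline might declare a git resource like this (the resource name, repo URI and branch are placeholders):</p>
<pre class="highlight plaintext"><code>resources:
- name: my-repo
  type: git        # a built-in resource_type
  source:
    uri: https://github.com/example/repo.git
    branch: main
</code></pre>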
<h3>Triggering builds for Pull Requests</h3>
<p>This is a really common use case for CI/CD. Every time a pull request is updated with a new commit in Github, a build is triggered to
do a range of tasks, from simple go lints and unit tests, to building artifacts, to full-scale integration tests. This flow is achieved through webhook
events from Github.</p>
<p>The receiver of those webhook events is a <a href="https://github.com/telia-oss/github-pr-resource">github-pr-resource</a> <code>resource_type</code> (or similar forks like
<a href="https://github.com/digitalocean/github-pr-resource">digitalocean&rsquo;s</a>).</p>
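<p>A minimal sketch of such a resource declaration might look like the following (the resource name, repository and credential variables are placeholders; check the resource&rsquo;s README for the exact parameters your fork supports):</p>
<pre class="highlight plaintext"><code>resources:
- name: my-pr
  type: pull-request           # the github-pr-resource resource_type
  check_every: 24h             # rely mostly on webhooks instead of polling
  webhook_token: ((webhook-token))
  source:
    repository: example/repo
    access_token: ((github-token))
</code></pre>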
<p>You might imagine that a pipeline can be triggered immediately after Concourse interprets the webhook event. It&rsquo;s worth clarifying
that this concept of triggering a pipeline is incorrect; that&rsquo;s not how it works in Concourse. Pipelines are basically just sets of jobs,
and jobs are all independently scheduled. New builds are created by the scheduler when it detects that a job&rsquo;s dependent (e.g. <code>trigger: true</code>)
resources have changed.</p>
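<p>In pipeline terms, a job that builds whenever a new version of a resource appears might be sketched like this (the job, resource and task names are hypothetical):</p>
<pre class="highlight plaintext"><code>jobs:
- name: test-pr
  plan:
  - get: my-pr
    trigger: true    # the scheduler creates a build when a new version is detected
  - task: unit-tests
    file: my-pr/ci/unit.yml
</code></pre>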
<p>So, what really happens after it processes a webhook event?</p>
<p>The <code>pr-resource</code> queues a <code>check</code> that reaches out to Github to query for open pull-request updates
through their GraphQL API (filtering from the latest update in a previous pull). From there, it updates the list of versions
for the resource and relies on the scheduler to do the rest, as mentioned above.</p>
<p>To be honest, this flow isn&rsquo;t one of the strong points of Concourse. It is somewhat awkward - leading to the perception that it is slower to
trigger builds than other popular CI systems, among other concerns, such as rate-limiting (if you have too many open pull-requests at one time).</p>
<p>For me though, I&rsquo;ll say that this setup has worked acceptably well.</p>
<p>And, this has been acknowledged as such by the core members and listed as a primary focus in the <a href="https://blog.concourse-ci.org/core-roadmap-towards-v10/">v10 roadmap</a>, which I&rsquo;m pretty excited about!</p>
<h3><code>default_check_interval</code> / <code>check_recycle_period</code> / Github Ratelimiting <a id="intervals" href="#intervals"></a></h3>
<p>The default (Bosh) setting for the <a href="https://concourse-ci.org/concourse-web.html">web node</a>&rsquo;s <code>default_check_interval</code> is 1 minute. This means that for every resource you define, you&rsquo;ll be running a check every minute,
hitting whatever API might be required. For example, for a <a href="https://github.com/concourse/git-resource">git-resource</a> that hits Github, each call counts towards the rate-limit that Github sets.
It&rsquo;s fairly high at 5000 per hour, but it is still exhaustible if you&rsquo;re not careful!</p>
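<p>One mitigation is to relax the interval per resource with <code>check_every</code>, e.g. for a repo that doesn&rsquo;t need minute-level freshness (the resource name and URI here are placeholders):</p>
<pre class="highlight plaintext"><code>resources:
- name: my-repo
  type: git
  check_every: 10m   # override the cluster-wide 1m default for this resource
  source:
    uri: https://github.com/example/repo.git
</code></pre>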
<p>Relatedly, there is another setting in Bosh, for the web/scheduler node, <code>check_recycle_period</code> - which decides
how often the containers for resource checks are garbage-collected. The default is 6 hours.</p>
<p>Don&rsquo;t make the mistake (like me!) of drastically reducing this GC interval, even if there are containers used for checks lying around doing nothing.
It depends on the implementation of the particular Concourse resource, but in my case, the <a href="https://github.com/concourse/git-resource">git-resource</a> would re-init and re-query
the history of versions, consuming unnecessary calls to Github, which led to us getting rate-limited occasionally.</p>
<p>YMMV, but if you&rsquo;re using this resource, consider leaving it at a high enough interval to take advantage of the caching!</p>
<h3>Container Placement Strategy</h3>
<p>We have resource-intensive jobs (across different pipelines) that can be triggered at the same time. When that happens, our cluster occasionally runs into
resource-starvation issues.</p>
<p>I&rsquo;ve tried the experimental feature <code>limit-active-tasks</code> - a <code>container_placement_strategy</code> that limits the number of tasks per worker. In my opinion,
that does not work well for clusters with varying types of workloads. It inevitably ends up blocking tasks that may not be resource-intensive.
One example is the periodic resource check; worse, at times it might only let light tasks through while blocking heavier tasks that would still fit fine.</p>
<p>You can also do <code>volume-locality</code>, <code>random</code> and <code>fewest-build-containers</code> placements. We&rsquo;ve ultimately gone with <code>fewest-build-containers</code> because
we have CPU-intensive tasks, but I think every workload / situation is probably different and this is one of those settings
to consider tweaking when setting Concourse up or if you&rsquo;re seeing load-imbalance.</p>
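<p>For reference, this is configured on the web node; in Bosh manifest terms it&rsquo;s roughly:</p>
<pre class="highlight plaintext"><code># web node property (or CONCOURSE_CONTAINER_PLACEMENT_STRATEGY when running the binary directly)
container_placement_strategy: fewest-build-containers
</code></pre>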
<p>Sidenote: I believe this issue of load-imbalance is also going to be addressed in v10 as well!</p>
<h3>Resource Allocation</h3>
<p>This is obviously deeply related to the section above. If you run smaller nodes and can&rsquo;t have multiple (heavy) jobs run at the same time, you do have
a number of knobs to help you restrict these.</p>
<p>You can control the number of concurrent builds per job with <code>max_in_flight</code> (or <code>serial: true</code> for 1) at the job definition level.
If you would like all jobs that belong to some specific category to run serially, you can put them in the same serial group; jobs that share a group never run at the same time.</p>
<pre class="highlight yaml"><code><span class="na">jobs</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">job-a</span>
<span class="na">serial_groups</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">some-tag</span><span class="pi">]</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">job-b</span>
<span class="na">serial_groups</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">some-tag</span><span class="pi">,</span> <span class="nv">some-other-tag</span><span class="pi">]</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">job-c</span>
<span class="na">serial_groups</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">some-other-tag</span><span class="pi">]</span>
</code></pre>
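<p>And a quick sketch of <code>max_in_flight</code> at the job level (the job name is hypothetical):</p>
<pre class="highlight plaintext"><code>jobs:
- name: heavy-integration
  max_in_flight: 2   # at most two builds of this job run concurrently
</code></pre>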
<p>Also, it&rsquo;s probably prudent to define a default cpu/memory allocation with these <a href="https://bosh.io/jobs/web?source=github.com/concourse/concourse-bosh-release&amp;version=6.6.0#p%3ddefault_task_cpu_limit">Bosh settings</a> and then override each task with <code>container_limits</code>,
to avoid any rogue jobs spinning out of control. Anecdotally, I had jobs that pegged and took down 4xlarge nodes; to be fair, they were erlang/beam jobs, which
are notorious for the amount of resources they demand.</p>
<pre class="highlight yaml"><code><span class="c1"># Bosh -&gt; web node defaults</span>
<span class="c1"># Applied to any task that does not set its own container_limits</span>
<span class="na">default_task_cpu_limit</span><span class="pi">:</span> <span class="s">256</span>
<span class="na">default_task_memory_limit</span><span class="pi">:</span> <span class="s">4GB</span>
</code></pre><pre class="highlight yaml"><code><span class="c1"># Override at the Job level</span>
<span class="na">container_limits</span><span class="pi">:</span>
<span class="na">cpu</span><span class="pi">:</span> <span class="s">256</span>
<span class="na">memory</span><span class="pi">:</span> <span class="s">1GB</span>
</code></pre>
<p>Do note that the CPU defined here is not the number of cores but the CPU shares. I believe the Concourse / system processes running on each node
default to <code>512</code>, so using <code>256</code> slightly lowers the priority of user-level jobs so that important system processes don&rsquo;t get starved.</p>
<h3>(Storage) Volumes / Baggage Claims</h3>
<p>I&rsquo;ve discovered that when the volumes choke up (IOPS or otherwise), Concourse baggage-claim (GC of volumes and such) seems to fail rather silently,
and you start having containers fail to schedule within the time limit.</p>
<p>I only really realized this when we went from many small (xlarge) EC2 nodes to fewer 4xlarge nodes and had our EBS volumes&rsquo; IOPS constantly
pegged by certain IO-intensive jobs. It was extremely surprising how much IOPS we needed (thanks, yarn). Many of our performance issues went away once this was fixed.</p>
<p>I encourage people who are facing issues to double-check this in their cluster as well.</p>
<h3>Overlay vs btrfs</h3>
<p>Concourse ships with btrfs by default. There are obviously things that btrfs does that overlay doesn&rsquo;t, but it has stability issues. The problem set and
trade-offs are clearly laid out in <a href="https://github.com/concourse/concourse/issues/1045">this github issue</a>, so I won&rsquo;t rehash them.</p>
<p>One thing I&rsquo;ll say though: I encourage people to switch over to overlay for most use cases.</p>
<h3>Next</h3>
<p>Again, this might have read like a rant about the problems, but it really is more a list of things I&rsquo;ve learned from running our cluster. To be honest,
a lot of these are surrounding issues that are not Concourse-specific per se. And it is extremely positive, in my opinion, that the core team acknowledges
some of the real issues (in a system that still works reasonably well) and puts real work towards them for v10.</p>
<p>In the next few posts, I&rsquo;ll go over in more technical detail how you might do certain things, like:</p>
<ul>
<li>building from PRs</li>
<li>building docker OCI images in Concourse</li>
<li>running docker-in-docker on overlay</li>
<li>running a registry-mirror and using it in Concourse</li>
</ul>
</content>
</entry>
<entry>
<title>Monitoring Stack in Kubernetes, with Prometheus</title>
<link rel="alternate" href="http://blog.url.com/posts/2019/08/01/prometheus-monitoring-in-kubernetes/"/>
<id>http://blog.url.com/posts/2019/08/01/prometheus-monitoring-in-kubernetes/</id>
<published>2019-07-31T12:00:00-04:00</published>
<updated>2020-11-19T22:43:44-05:00</updated>
<author>
<name>Article Author</name>
</author>
<content type="html"><p>For the past year or so, I&rsquo;ve been working on DevOps at Tulip.
It&rsquo;s a fairly big change in direction but quite frankly, it&rsquo;s been a refreshing experience!</p>
<p>One of the first projects was to build a monitoring system for a number of different components
in our kubernetes cluster: various microservices, the main monolith application, our ingress controller,
and the health of the cluster itself. I thought Prometheus fit fairly well with what we wanted, so we
went ahead with that. I would say that it has served us pretty well!</p>
<h3>Prometheus</h3>
<p>Some context about Prometheus: it is a pull-based monitoring system that is sometimes compared to Nagios.
It&rsquo;s not event-based, so applications do not report each individual event to Prometheus as it happens (unlike, say, SegmentIO).
Also, since it&rsquo;s pull-based, we have to define all its targets (to scrape) in advance. Contrary to certain arguments, I actually think
this is a plus. It is more tedious and harder to set up, but it is also harder to get into a situation where it becomes a blackbox and
you have no idea what you&rsquo;re dumping into it. It&rsquo;s also easier to detect whether a target is actually down versus a push-based system.</p>
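<p>Defining targets up front looks roughly like this in Prometheus&rsquo;s own configuration (the job name and target are placeholders; with the Operator, described below, this config is generated for you):</p>
<pre class="highlight plaintext"><code>scrape_configs:
- job_name: my-service
  scrape_interval: 15s
  static_configs:
  - targets: ['my-service:9090']   # Prometheus pulls /metrics from here
</code></pre>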
<h3>Prometheus Operator</h3>
<p><a href="https://github.com/coreos/prometheus-operator">Prometheus Operator</a> is an open-source tool
that makes deploying a Prometheus stack (AlertManager, Grafana, Prometheus) so much easier than hand-crafting the entire stack.
It generates a whole lot of boilerplate and pretty much reduces the entire deployment down to native
kubernetes declarations and YAML.</p>
<p>If you&rsquo;re familiar with Kubernetes, then you&rsquo;ve probably heard of custom resource definitions, or
CRDs for short. Think of them as definitions of objects, like pods, deployments or daemonsets,
that the cluster can understand and act on if needed. For the purpose of deploying a monitoring stack,
Prometheus Operator introduces 3 new CRDs - <code>Prometheus</code>, <code>AlertManager</code> and <code>ServiceMonitor</code> - and a controller
that is in charge of deploying and configuring the respective services in the Kubernetes cluster.</p>
<p>For example: if a <code>Prometheus</code> CRD like the one below is present in the cluster, the prometheus-operator controller
would create a matching deployment of Prometheus in the kubernetes cluster, which in this case would also link
up with the AlertManager of that name in the <code>monitoring</code> namespace.</p>
<pre class="highlight plaintext"><code>apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
...
name: clustermon-prometheus-oper-prometheus
namespace: monitoring
spec:
alerting:
alertmanagers:
- name: clustermon-prometheus-oper-alertmanager
namespace: monitoring
pathPrefix: /
port: web
baseImage: quay.io/prometheus/prometheus
...
</code></pre>
<h3>kube-prometheus</h3>
<p>kube-prometheus used to be a set of contrib helm charts that utilized the capabilities of the Prometheus Operator to deploy
an entire monitoring stack (with some assumptions and defaults, ofc). It has since been absorbed into the main <a href="https://github.com/helm/charts/tree/master/stable/prometheus-operator">helm charts</a>
and moved to the official stable chart repository.</p>
<p>There are various exporters included such as: kube-dns, kube-state-metrics, node-exporter and many others
that are necessary to monitor the health of a Kubernetes cluster (and more). You can find the full list <a href="https://github.com/helm/charts/tree/master/stable/prometheus-operator/templates/exporters">here</a>.
It also has a simple set of kubernetes-mixins for Grafana as well (if you choose to install that).</p>
<h3>Overview</h3>
<p>This section gives a general idea of the components involved.</p>
<p>An important implementation decision that I&rsquo;d like to point out is that Grafana is (mostly) stateless. Any new dashboards or changes
need to be committed as code; in general I think this conforms better to the <code>infrastructure-as-code</code> kind of ideology, which makes
it much easier to replicate the same infrastructure across multiple clouds / regions.</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/monitoring-stack.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/monitoring-stack.png" title="Monitoring Stack" alt="Monitoring Stack" /></a></p>
<h3>Custom Helm Chart</h3>
<p>You can find a stripped-down version of the things I&rsquo;ll talk about in this repository: <a href="https://github.com/aranair/k8s-prometheus-operator-helm-example">k8s-prometheus-operator-helm-example</a></p>
<p>Note: The contents are all based on prometheus-operator helm chart <code>5.10.5</code>.</p>
<pre class="highlight yaml"><code><span class="c1"># requirements.yaml</span>
<span class="na">dependencies</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">prometheus-operator</span>
<span class="na">version</span><span class="pi">:</span> <span class="s">5.10.5</span>
<span class="na">repository</span><span class="pi">:</span> <span class="s">https://kubernetes-charts.storage.googleapis.com/</span>
</code></pre>
<p>This part of the chart is responsible for loading <code>*.json</code> dashboard configurations exported from Grafana and creating
them as individual configmaps in Kubernetes. Grafana&rsquo;s config-reloader then reads them and reconfigures Grafana.</p>
<pre class="highlight yaml"><code><span class="c1"># templates/dashboards-configmap.yaml</span>
<span class="pi">{{</span><span class="nv">- $files</span> <span class="pi">:</span><span class="nv">= .Files.Glob "dashboards/*.json"</span> <span class="pi">}}</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMapList</span>
<span class="na">items</span><span class="pi">:</span>
<span class="pi">{{</span><span class="nv">- range $path</span><span class="pi">,</span> <span class="nv">$fileContents</span> <span class="pi">:</span><span class="nv">= $files</span> <span class="pi">}}</span>
<span class="pi">{{</span><span class="nv">- $dashboardName</span> <span class="pi">:</span><span class="nv">= regexReplaceAll "(^.*/)(.*)\\.json$" $path "$</span><span class="pi">{</span><span class="nv">2</span><span class="pi">}</span><span class="s2">"</span><span class="nv"> </span><span class="s">}}</span>
<span class="s">-</span><span class="nv"> </span><span class="s">apiVersion:</span><span class="nv"> </span><span class="s">v1</span>
<span class="s">kind:</span><span class="nv"> </span><span class="s">ConfigMap</span>
<span class="s">metadata:</span>
<span class="s">name:</span><span class="nv"> </span><span class="s">{{</span><span class="nv"> </span><span class="s">printf</span><span class="nv"> </span><span class="s">"</span><span class="err">%</span><span class="nv">s-%s" (include "prometheus-operator.fullname" $) $dashboardName | trunc 63 | trimSuffix "-"</span> <span class="pi">}}</span>
<span class="na">labels</span><span class="pi">:</span>
<span class="na">grafana_dashboard</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1"</span>
<span class="na">app</span><span class="pi">:</span> <span class="pi">{{</span> <span class="nv">template "prometheus-operator.name" $</span> <span class="pi">}}</span><span class="s">-grafana</span>
<span class="pi">{{</span> <span class="nv">include "prometheus-operator.labels" $ | indent 6</span> <span class="pi">}}</span>
<span class="na">data</span><span class="pi">:</span>
<span class="pi">{{</span> <span class="nv">$dashboardName</span> <span class="pi">}}</span><span class="s">.json</span><span class="pi">:</span> <span class="pi">|-</span>
<span class="pi">{{</span> <span class="nv">$.Files.Get $path | indent 6</span><span class="pi">}}</span>
<span class="pi">{{</span><span class="nv">- end</span> <span class="pi">}}</span>
</code></pre>
<h3>Prometheus</h3>
<p>Prometheus rocks a TSDB for data storage, so the instance that the pod runs on needs to have a huge volume attached to it.
In my setup, I&rsquo;ve chosen to run Prometheus on a node by itself, with no other pods scheduled on it. I do this by setting up taints
on a particular node and having Prometheus selectively schedule onto that node and tolerate those taints. Normal pods without that
toleration would then simply refuse to schedule on it.</p>
<p>(This is slightly different in the example app above.)</p>
<pre class="highlight plaintext"><code># values.yaml
prometheus:
prometheusSpec:
# So that Prometheus can schedule onto the node with this taint
# Other pods will not have this toleration and won't schedule on it
tolerations:
- key: "dedicated"
operator: "Exists"
effect: "NoSchedule"
- key: "dedicated"
operator: "Exists"
effect: "NoExecute"
# The Prometheus node
nodeSelector:
node: prometheus
# This PV is named in my case, but you can also just do a dynamic claim template like here:
# https://github.com/aranair/k8s-prometheus-operator-helm-example/blob/master/clustermon/values.yaml#L181-L189
storageSpec:
volumeClaimTemplate:
spec:
volumeName: prometheus-pv
selector:
matchLabels:
app: prometheus-pv
resources:
requests:
storage: 1500Gi
selector: {}
</code></pre>
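<p>For completeness, the matching taint on the node itself might look like this (the node name and taint value here are placeholders; the same taint can also be applied with <code>kubectl taint</code>):</p>
<pre class="highlight plaintext"><code># Node spec fragment carrying the taint that the tolerations above match
apiVersion: v1
kind: Node
metadata:
  name: prometheus-node
spec:
  taints:
  - key: dedicated
    value: prometheus
    effect: NoSchedule
</code></pre>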
<p>If you take a look at the prometheus-operator helm chart&rsquo;s default <code>values.yaml</code> file, you will find just about any configuration you can think of.</p>
<h3>Monitoring Custom Services</h3>
<p>The <code>ServiceMonitor</code> CRD from the prometheus-operators is used to describe the set of targets to be monitored by Prometheus; the
controller would automatically generate the Prometheus config needed.</p>
<p>For example, a <code>ServiceMonitor</code> for monitoring <a href="https://traefik.io">Traefik</a>, our ingress controller, would look something like this:</p>
<pre class="highlight plaintext"><code>additionalServiceMonitors:
- name: traefik-monitor
  namespace: monitoring
  selector:
    matchLabels:
      app: traefik # this should be the selector for the Service
  namespaceSelector:
    matchNames:
    - kube-system # Which namespace to look for the Service in
  endpoints:
  - basicAuth: # Take creds from secret named traefik-monitor-metrics-auth
      password:
        name: traefik-monitor-metrics-auth
        key: password
      username:
        name: traefik-monitor-metrics-auth
        key: user
    port: metrics
    interval: 10s
</code></pre>
<p>These would show up as targets in the prometheus deployment, e.g.</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-in-prometheus.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-in-prometheus.png" title="Traefik Targets Prometheus" alt="Traefik Targets Prometheus" /></a></p>
<p>You can then use PromQL to query things, like the average number of open connections per second over 5-minute look-back windows (then extrapolate to 5 minutes by multiplying by 300).</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-chart-prometheus.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-chart-prometheus.png" title="Traefik Avg backend open connections" alt="Traefik Avg backend open connections" /></a></p>
<p>Charting isn&rsquo;t the best in Prometheus, but to be fair, that&rsquo;s not really Prometheus&rsquo;s primary function.
It can get you what you need eventually; it just takes more effort than it should.</p>
<h3>Grafana</h3>
<p>Grafana fills that gap; with this setup, a Grafana instance is automatically set up with Prometheus targeted as a data source.
So generally what I&rsquo;ll do is experiment in Prometheus with PromQL, then port the query over to a Grafana dashboard with proper
variables and timeframes, then export it as json and check that into our git repository. Over time, we have developed
quite a number of dashboards that monitor many of the services in our cluster (as well as many good default mixins provided
out of the box).</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/grafana-dashboards.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/grafana-dashboards.png" title="Grafana Dashboards" alt="Grafana Dashboards" /></a></p>
<p>One example is shown below, where it displays the total CPU/RAM usage; we can also click to drill down to each individual pod.</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-dashboard.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-dashboard.png" title="Traefik k8s mixin" alt="Traefik CPU/RAM" /></a></p>
<p>This next one is a dashboard that I built to monitor the health of Traefik, looking at the number of times it&rsquo;s had to hot-reload
configurations, latencies, and other useful metrics. We also track the Apdex, for example, for both entrypoints and backends.</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-custom.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/traefik-custom.png" title="Traefik Custom" alt="Traefik Custom" /></a></p>
<h3>Prometheus Rules</h3>
<p>Prometheus rules can be defined in PromQL; these are primarily alerts that you might want the system to flag.
There are many built-in rules that come along with the default installation.</p>
<p>Like when the kube api pods&rsquo; error rate is high:</p>
<pre class="highlight plaintext"><code>alert: KubeAPIErrorsHigh
expr: sum(rate(apiserver_request_count{code=~&quot;^(?:5..)$&quot;,job=&quot;apiserver&quot;}[5m]))
  / sum(rate(apiserver_request_count{job=&quot;apiserver&quot;}[5m])) * 100 &gt; 3
for: 10m
labels:
  severity: critical
annotations:
  message: API server is returning errors for {{ $value }}% of requests.
  runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh
</code></pre>
<p>Or like when there are pods in CrashLoopBackOff:</p>
<pre class="highlight plaintext"><code>alert: KubePodCrashLooping
expr: rate(kube_pod_container_status_restarts_total{job="kube-state-metrics"}[15m])
* 60 * 5 &gt; 0
for: 1h
labels:
severity: critical
annotations:
message: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }})
is restarting {{ printf "%.2f" $value }} times / 5 minutes.
runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping
</code></pre>
<p>But you can also define your own; we have quite a number of custom rules.
As an example, there is an alert that fires when there are more than 10 failed etcd proposals
in the past 10 minutes, which might indicate stability issues with the etcd cluster.</p>
<pre class="highlight yaml"><code>additionalPrometheusRules:
- name: custom-alerts
  groups:
  - name: generic.rules
    rules:
    - alert: EtcdFailedProposals
      expr: increase(etcd_server_proposal_failed_total[10m]) &gt; 10
      labels:
        severity: warning
        group: tulip
      annotations:
        summary: "etcd failed proposals"
        description: "{{ $labels.instance }} failed etcd proposals over the past 10 minutes has increased. May signal etcd cluster instability"
</code></pre>
<p>Or when a specific pod has restarted X number of times:</p>
<pre class="highlight yaml"><code>...
- name: generic.rules
  rules:
  - alert: TraefikPodCrashLooping
    expr: round(increase(kube_pod_container_status_restarts_total{pod=~"traefik-.*"}[5m])) &gt; 5
    labels:
      severity: critical
      group: tulip
    annotations:
      summary: "Traefik pod is restarting frequently"
      description: "Traefik pod {{$labels.pod}} has restarted {{$value}} times in the last 5 mins"
</code></pre>
<p>When these alerts fire, you can see them in Prometheus directly; they are also sent off to AlertManager if one is linked up with Prometheus.</p>
<h3>AlertManager</h3>
<p>AlertManager can be configured to send to Slack, VictorOps, PagerDuty, or various other sorts of alerting systems.</p>
<pre class="highlight plaintext"><code>alertmanager:
  config:
    global:
      smtp_auth_username: ''
      smtp_auth_password: ''
      victorops_api_key: ''
      victorops_api_url: ''
</code></pre>
<p>In our setup, I configured it to post to Slack whenever there is a <code>warning</code> level alert, and to VictorOps whenever there is a <code>critical</code> level alert.</p>
<pre class="highlight plaintext"><code>route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
  # This can be used to route specific types of alerts to specific teams.
  routes:
  - match:
      alertname: DeadMansSwitch
    receiver: 'null'
  - match:
      alertname: TargetDown
    receiver: 'null'
  - match:
      severity: warning
      group: custom
    group_by: ['namespace']
    receiver: 'slack'
  - match:
      severity: critical
      group: custom
    group_by: ['namespace']
    receiver: 'victorops'
receivers:
- name: 'null'
- name: 'sysadmins-email'
  email_configs:
  - to: 'sysadmin@example.com'
- name: 'slack'
  slack_configs:
  - username: 'Prometheus'
    send_resolved: true
    api_url: ''
    title: '[{{ .Status | toUpper }}] Warning Alert'
    text: &gt;-
      {{ template "slack.techops.text" . }}
- name: 'victorops'
  victorops_configs:
  - routing_key: 'routing_key'
    message_type: '{{ .CommonLabels.severity }}'
    entity_display_name: '{{ .CommonAnnotations.summary }}'
    state_message: &gt;-
      {{ template "slack.techops.text" . }}
    api_url: ''
    api_key: ''
</code></pre>
<p>Generally speaking, <code>warning</code> alerts indicate some level of degraded service that may self-recover, such as when a node goes down and its pods auto-reschedule;
they can also be situations that are not time-critical and do not need immediate intervention. <code>critical</code> alerts are reserved for mission-critical services
or infrastructure, where a slow recovery can cascade into bigger problems. These page someone and are resolved as quickly as we can manage.</p>
<p>Example of an alert that has gone off in AlertManager:</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/alert-manager.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/alert-manager.png" title="Example AlertManager Alert" alt="Example AlertManager Alert" /></a></p>
<p>Slack Alert:</p>
<p><a href="https://homan.s3-ap-southeast-1.amazonaws.com/blog/prometheus-slack-alert.png"><img src="https://homan.s3-ap-southeast-1.amazonaws.com/blog/prometheus-slack-alert.png" title="Example Slack Alert" alt="Example Slack Alert" /></a></p>
<p>From here, you can define inhibition rules, which suppress certain alerts while another alert is firing; or silences, which mute
alerts that match a given set of labels.</p>
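<p>As a sketch (the label choices here are illustrative, not taken from our actual config), an inhibition rule that mutes <code>warning</code> alerts while a matching <code>critical</code> alert is firing would look like this:</p>
<pre class="highlight plaintext"><code>inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Only inhibit when these labels agree between the source and target alerts.
  equal: ['alertname', 'namespace']
</code></pre>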
<h3>Wrap-up</h3>
<p>Together, I think the three components form a rather well-rounded monitoring stack for k8s infrastructure and service metrics.
Down the road, the next big extension would be to spin up federated clusters to monitor different AWS regions and/or clusters.</p>
<p>PS: Here&rsquo;s the repo that has a simplified version of everything I&rsquo;ve talked about above:
<a href="https://github.com/aranair/k8s-prometheus-operator-helm-example">k8s-prometheus-operator-helm-example</a>. And feel free to let me know in the
comments section below if you have any questions or run into any issues playing with that example.</p>
</content>
</entry>
<entry>
<title>Programming with the Modbus RTU & TCP/IP Protocol</title>
<link rel="alternate" href="http://blog.url.com/posts/2017/10/30/programming-with-modbus-rtu-tcp-protocol/"/>
<id>http://blog.url.com/posts/2017/10/30/programming-with-modbus-rtu-tcp-protocol/</id>
<published>2017-10-29T12:00:00-04:00</published>
<updated>2020-11-19T22:43:42-05:00</updated>
<author>
<name>Article Author</name>
</author>
<content type="html"><p>Today&rsquo;s post probably has a very different audience- modbus protocol; it&rsquo;s nowhere near the web projects that I&rsquo;ve been
doing so far but definitely something I&rsquo;m super interested in. This project mostly works with the <a href="http://www.simplymodbus.ca/FAQ.htm">modbus protocol</a>,
which is an open, communication protocol used for transmitting information over serial lines between hardware devices.
Given that IoT is becoming more and more relevant and that the modbus protocol, while old, is still a very commonly used
protocol in the IoT world. So, I hope people will find this post interesting, or even useful if you&rsquo;re attempting something
similar.</p>
<h3>Backstory</h3>
<p>The backstory of the project is that I needed a program to read some data from a spindle, as well as control it through an
inverter (the Hitachi WJ200) over the <a href="http://www.simplymodbus.ca/FAQ.htm">Modbus</a> RTU protocol. At the same time, it also needs to relay some of this
information to a <a href="https://www.kepware.com/en-us/products/kepserverex/">Kepware server</a> that acts as both a Modbus TCP/IP slave and an <a href="https://opcfoundation.org/about/opc-technologies/opc-ua/">OPC/UA</a> server.
This, in turn, allows communication with other OPC/UA clients.</p>
<p>The project was initially developed and tested on OSX Sierra 10.12.6, but it was eventually compiled and run on a Windows 10 machine
so that the program could talk to Kepware over Modbus TCP directly, instead of needing two machines: one Linux/OSX box plus external cabling
to a Windows machine (Kepware only runs on Windows).</p>
<p>You can find the reference code here: <a href="https://github.com/aranair/modbus_adapter">https://github.com/aranair/modbus_adapter</a>.</p>
<h3>Simplified Demo</h3>
<p>If you&rsquo;re just here to find some sample code that runs a Modbus client and server, you can check out the <code>simplified</code> branch
from the repo above. The master and slave code should work with each other.</p>
<h3>Setup</h3>
<p>The hardware setup looks roughly like this:</p>
<p>Spindle &lt;&gt; hitachi wj200 &lt;&gt; USB/COM converter &lt;&gt; C program &lt;&gt; Kepware &lt;&gt; OPC/UA</p>
<p>In this post though, I&rsquo;ll focus on the first part of the setup (from the left), up to the C program. The C program
was written and tested on my Mac first, so I&rsquo;ll talk a little bit about that. In the next post, I&rsquo;ll shift the focus to
Kepware and how I compiled the same program on Windows 10 (which turned out to be harder than I expected because of
some dependencies I used).</p>
<h3>Modbus Masters vs Slaves</h3>
<p>I am not going to go into the details of the Modbus protocol; you can head over <a href="http://www.simplymodbus.ca/FAQ.htm">here</a> if you want a quick overview of
the actual protocol, e.g. how <code>write_registers</code> and <code>write_coil</code> work. But I&rsquo;d like to talk about something I was
initially confused about.</p>
<p>It is the concept of masters, slaves, clients and servers in Modbus. The two sets of terms are sometimes
used interchangeably in documentation, which makes it harder to remember which is which, at
least for me. So, before moving ahead with the rest of the stuff, I should probably define them here so that
it&rsquo;s less confusing for the unfortunate souls who might read on, lol.</p>
<h4>Master / Client</h4>
<p>The master in a modbus network is the brain that is in charge of controlling devices. They can read and write to
slaves (devices). The concept of master and slave is <a href="https://en.wikipedia.org/wiki/Master/slave_(technology)">pretty common</a> in software engineering, so I
won&rsquo;t elaborate more here.</p>
<p>However, in the case of the Modbus protocol, the master is also called the client, and physical
devices, such as the inverter above, are considered servers, or slaves. The master is the
one that initiates the connection to the slaves; I had assumed it was the other way around.</p>
<p>What remains the same is that there can only be one master in a single Modbus RTU network. (You can
have multiple masters in a Modbus TCP/IP network though, I think.)</p>
<h4>Slaves / Servers</h4>
<p>The slaves are the physical devices that you&rsquo;re communicating with. They&rsquo;re also called servers. They
accept connections from the masters.</p>
<h3>Multiple Modbus Masters?</h3>
<p>For each of the connections defined in <a href="https://github.com/aranair/modbus_adapter/tree/master/config.cfg">config.cfg</a>, I created a Modbus connection.
In this case, one speaks the RTU protocol over COM3, and the other speaks TCP/IP.</p>
<p>My spindle was obviously a slave, and it accepts connections / commands from a master. But, I also needed live
information from the spindle at the windows machine with Kepware. At first, I was hoping that I could achieve
that by having a single Modbus slave to multiple Modbus masters (program + kepware). Unfortunately, that isn&rsquo;t
possible, at least over Modbus RTU.</p>
<p>To get around that, I got my program to issue commands to the spindle as a master, while periodically polling
the required data from it and relaying that information, again as a master, to another slave: the Kepware instance.</p>
<p>Essentially, my program initiates and maintains two separate Modbus connections as a master.</p>
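<p>A minimal sketch of that dual-master arrangement with libmodbus looks something like the following; note that the port name, baud rate, slave ID and register addresses here are placeholders, not the actual values from my config:</p>
<pre class="highlight plaintext"><code>#include &lt;errno.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;modbus.h&gt;

int main(void)
{
    /* Master #1: RTU connection to the inverter (placeholder port/baud). */
    modbus_t *rtu = modbus_new_rtu("COM3", 115200, 'N', 8, 1);
    modbus_set_slave(rtu, 1);

    /* Master #2: TCP connection to the Kepware slave. */
    modbus_t *tcp = modbus_new_tcp("127.0.0.1", 502);

    if (modbus_connect(rtu) == -1 || modbus_connect(tcp) == -1) {
        fprintf(stderr, "Connection failed: %s\n", modbus_strerror(errno));
        return 1;
    }

    uint16_t regs[2];
    for (;;) {
        /* Poll the spindle over RTU, then relay the values over TCP. */
        if (modbus_read_registers(rtu, 0x1001, 2, regs) == 2)
            modbus_write_registers(tcp, 0x0000, 2, regs);
        /* polling interval / error handling elided */
    }
}
</code></pre>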
<h3>libconfig</h3>
<p>With regards to config file setup in my C program, coming from Ruby and the web environment, YAML seemed like a
natural choice. But I soon learned that that&rsquo;s not the case in C. I&rsquo;m not sure what the de-facto solution here is,
or whether people use config files at all, but I eventually settled on <code>libconfig</code>. It was fairly simple to use and
the interface was semi-clean I guess, even if a little convoluted.</p>
<p>It provides you a way to define nested lists and hashes.</p>
<pre class="highlight plaintext"><code>connections = (
{
type = "rtu";
rtu_port = "COM3";
baud = 115200;
},
{
type = "tcp";
ip = "127.0.0.1";
port = 502;
}
);
</code></pre>
<p>You can then read these from the program with something like:</p>
<pre class="highlight plaintext"><code>setting = config_lookup(&amp;cfg, "connections");
int connections_count = config_setting_length(setting);
conn_arr = (struct ModbusConn *) malloc(sizeof(struct ModbusConn) * connections_count);

const char *type;
for (i = 0; i &lt; connections_count; i++) {
  config_setting_t *connection = config_setting_get_elem(setting, i);
  config_setting_lookup_string(connection, "type", &amp;type);
  ...
}
</code></pre>
<p>I know, it is a little long if you&rsquo;re coming from ruby since all of those would be a single line of code.
But hey, at least I&rsquo;ve managed to encapsulate all the config stuff into <a href="https://github.com/aranair/modbus_adapter/tree/master/config.h">config.h</a>.
From the main program, I just need to search/reference it for the configs!</p>
<pre class="highlight plaintext"><code>struct ModbusDevice *plc = get_device(config, "hitachiwj200");
struct ModbusDevice *kep = get_device(config, "kepware");
</code></pre>
<h3>libmodbus</h3>
<p>The library that I was using to establish connections and construct the bytes to send to the devices was <a href="https://github.com/stephane/libmodbus">libmodbus</a>,
a library in C.</p>
<p>The gist of it is, you establish a connection.</p>
<pre class="highlight c"><code><span class="k">if</span> <span class="p">(</span><span class="n">modbus_connect</span><span class="p">(</span><span class="n">ctx</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Connection failed: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">modbus_strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">));</span>
<span class="n">modbus_free</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</code></pre>
<p>And from there, by addressing directly to the register/coil memory locations you can set or read information through the
protocol.</p>
<pre class="highlight c"><code><span class="kt">void</span> <span class="nf">set_coil</span><span class="p">(</span><span class="n">modbus_t</span> <span class="o">*</span><span class="n">ctx</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">addr_offset</span><span class="p">,</span> <span class="n">bool</span> <span class="n">setting</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Setting coil to %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">setting</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">modbus_write_bit</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">addr_offset</span><span class="p">,</span> <span class="n">setting</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Failed to write to coil: %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">modbus_strerror</span><span class="p">(</span><span class="n">errno</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre>
<p>The library implements all of the commands the protocol provides. You can read more about the commands at <a href="http://www.simplymodbus.ca/FAQ.htm">SimplyModbus</a>.
Each of the commands can be represented via some bytes (as with all things CS, lol).</p>
<p>For instance, the <code>modbus_read_registers</code> method in <a href="https://github.com/stephane/libmodbus">libmodbus</a>, is essentially <code>Read Holding Registers</code>
on <a href="http://www.simplymodbus.ca/FC03.htm">this page</a>. The library helps you take care of</p>
<ul>
<li>the slave address (which you do have to set beforehand),</li>
<li>the function code (that represents read_registers) and</li>
<li>the CRC.</li>
</ul>
<p>You also have to manually pass in the rest of the parameters such as the memory location and
the number of registers requested.</p>
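<p>For example, a call like the following (the register offset and count are placeholders) maps to a single <code>Read Holding Registers</code> request on the wire:</p>
<pre class="highlight plaintext"><code>uint16_t dest[2];

/* Reads 2 holding registers starting at offset 0; libmodbus fills in the
   slave address (set earlier via modbus_set_slave), the function code 0x03
   and the CRC. */
if (modbus_read_registers(ctx, 0, 2, dest) == -1)
    fprintf(stderr, "Read failed: %s\n", modbus_strerror(errno));
</code></pre>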
<h3>Tricky Memory Addressing</h3>
<p>And this got a little tricky for me.</p>
<p>Each type of register / coil also has its own designated memory range, and the exact numbering depends on the implementation of the library.
For instance, holding register addresses might start from 40001 or 400001 depending on which library you use, and this
is obviously quite a source of problems.</p>
<p>Something I found useful with libmodbus is that it takes care of the leading digit of the memory address for you, based on
which type you are accessing. You could address a holding register at offset 0 with libmodbus and I believe it would automatically map
that to the appropriate memory address, say 400001, in the byte stream of the request it sends out to the slave.</p>
<p>Do note that different libraries might implement this differently, and it can be a particularly sneaky source of errors.</p>
<h3>Configuring Kepware</h3>
<p>I&rsquo;m (also) not going to go into too much detail on the configuration of Kepware, since the vast majority of you who
happen to read this article will not be paying Kepware&rsquo;s price tag. But I think it&rsquo;s enough to say that
it is a piece of software bundling multiple drivers and UIs that allow devices which might
speak different protocols, such as Modbus or OPC/UA (and a million others), to talk to each other without
needing another piece of software to translate.</p>
<p>For the purpose of this project, it was set up on a Windows machine to host a Modbus slave
that accepts connections from my program, receive the data over Modbus TCP/IP, and store the
streamed byte data in an internal register that is universally accessible by Kepware&rsquo;s other services, e.g.
the OPC/UA driver.</p>
<h3>Virtual Serial Ports Via Pseudo Terminal</h3>
<p>The above sections kinda ran through the setup that I built. This section is mostly on a quick way to run it locally
without needing a COM port connected to the actual device at first. I found it troublesome to have to test my program
with the actual spindle/hardware connected all the time so I looked for a way to simulate the Modbus RTU locally.</p>
<p>So far, I&rsquo;ve found that the pseudo terminal works pretty well, except when it randomly stops emitting the stream
data mysteriously, heh. But a restart of the socat setup below usually fixes that.</p>
<p>I used virtual serial ports to test the program using the steps below:</p>
<pre class="highlight plaintext"><code>$ brew install socat
$ socat -d -d pty,raw,echo=0 pty,raw,echo=0 # to get two pseudo terminals assigned.
$ cat &lt; /dev/ttys035
$ echo "Test" &gt; /dev/ttys037 # on a separate terminal
</code></pre>
<p><a href="http://www.dest-unreach.org/socat/doc/socat.html">Socat</a> is a CLI tool that establishes two bi-directional byte streams and allows a
transfer of data between them. The commands in the snippet above, in combination, set up the byte stream across
<code>/dev/ttys035</code> and <code>/dev/ttys037</code> (pseudo terminals) so that any data sent from one end is transmitted
over to the other.</p>
<p>In other words, I could then get my program, which acts as a Modbus RTU master, to connect directly to <code>/dev/ttys035</code>
while a Modbus RTU slave listens on the other end, and they can talk to each other in the Modbus protocol flawlessly.</p>
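<p>Pointing the master at the pseudo terminal is then just a matter of passing the pty path as the device name, the same call that would normally target a real COM port (the pty numbers will differ on every socat run):</p>
<pre class="highlight plaintext"><code>/* Same call as with real hardware, but aimed at the socat pty. */
modbus_t *ctx = modbus_new_rtu("/dev/ttys035", 115200, 'N', 8, 1);
</code></pre>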
<h3>Wrapping Up</h3>
<p>I hope this helps anyone out there who is trying to achieve the same thing and like me, doesn&rsquo;t have a clue how or where
to begin.</p>
<p>Anyway, after finishing development of the program on my MacBook, I eventually had to move it to a Windows machine running Windows 10.
Despite the fact that C is relatively well-supported on Windows (it&rsquo;s basically just compiling to machine code), I had quite
a hard time compiling it because of all the DLL hoops that Windows makes you jump through, and some issues surrounding
certain dependencies the program had. I did get everything to compile in MSVS 2017 eventually, but I think I&rsquo;ll leave that story
to Part 2. If you wanna skip ahead, the project files can be found in the <a href="https://github.com/aranair/modbus_adapter/tree/master/win32">win32 folder</a>!</p>
</content>
</entry>
<entry>
<title>Golang Telegram Bot - Migrations, Cronjobs & Refactors</title>
<link rel="alternate" href="http://blog.url.com/posts/2017/08/20/golang-telegram-bot-migrations-cronjobs-and-refactors/"/>
<id>http://blog.url.com/posts/2017/08/20/golang-telegram-bot-migrations-cronjobs-and-refactors/</id>
<published>2017-08-19T12:00:00-04:00</published>
<updated>2020-11-19T22:43:42-05:00</updated>
<author>
<name>Article Author</name>
</author>
<content type="html"><p>This post is kind of a continuation of the previous posts about my Golang Telegram bot, so if you
haven&rsquo;t seen those yet, it&rsquo;s probably better to start with them first: <a href="https://aranair.github.io/posts/2016/12/25/how-to-set-up-golang-telegram-bot-with-webhooks/">part 1</a> and <a href="https://aranair.github.io/posts/2017/01/21/how-i-deployed-golang-bot-on-digital-ocean/">part 2</a>. I
basically wanted my Telegram bot to be able to remember dated / timed reminders and send messages to
notify me when the time comes (like a calendar). Furthermore, just to force me to complete the tasks
quickly, I also make it repeat the notifications until they&rsquo;re cleared.</p>
<h3>Code Organization</h3>
<p>Something I&rsquo;ve never really gotten right in Golang is code organization.
I find it hard to decide where each piece belongs; it almost feels like a naming kind of problem to me,
and I wish there was a little more convention around this, or a generally accepted framework for thinking about how
to arrange things.</p>
<p>When I realised I needed the web-app (for responding to messages/commands) and the timer-app (for
periodically checking the time and sending overdue reminders etc) to run at the same time,
a couple of questions came up:</p>
<ul>
<li>Are these 2 related? (For which the answer is yes - configs, db, handlers)</li>
<li>Should these two be separate git repos? (No, because of the previous question)</li>
<li>Can they be run with just one &lsquo;app&rsquo;? (No, reasons in another section)</li>
<li>They are logically separate &lsquo;apps&rsquo;, so where should each <code>main.go</code> live?</li>
<li>How do I organise the shared packages and shared configurations?</li>
<li>How do I structure it such that my Dockerfile and docker-compose configs don&rsquo;t require massive
changes? Or even better, can they be shared? (Yes)</li>
</ul>
<p>While researching, I came across this <a href="https://medium.com/@benbjohnson/structuring-applications-in-go-3b04be4ff091">blog post</a> that talks about code organization in
Golang in general and thinking about the application from the perspective of a library, which
all made a ton of sense to me. Head over there to check it out if you&rsquo;re in the same situation as
me.</p>
<h3>CMD Folder</h3>
<p>Anyway, one of the things recommended there is to use a <code>cmd</code> folder to contain
the main runnable packages (those that actually need a <code>main.go</code>), thereby removing <code>main.go</code>
from the root folder. It also satisfies my other criterion of not needing to change my Docker
setup drastically, so that&rsquo;s all good.</p>
<p>Shared packages are left untouched under the root folder so that logically they&rsquo;re like libraries
and exist in some sort of a common area and they can also be easily imported in the timer/webapp packages.</p>
<p>The general structure comes up to something like this:</p>
<pre class="highlight plaintext"><code>remindbot/
cmd/
timer/
main.go
...
webapp/
main.go
...
config/
commands/
handlers/
migrations/
</code></pre>
<h3>Cron / Scheduled Task</h3>
<p>I needed a cron that runs perpetually and schedules a task every 5 minutes.</p>
<p>I feel that this cron job and my webapp should be somewhat separated. While they are related
in terms of configs, commands and databases, I felt that they have two rather different
responsibilities.</p>
<p>I could use a single app, with background tasks or threads running the cron that does exactly
what the timer app does but I&rsquo;ve done them in a way that they run in separate containers,
almost like microservices. I feel that that is a better way of representing the clear distinction
of their responsibilities.</p>
<p>I use <a href="https://github.com/jasonlvhit/gocron">gocron</a> to run a function in a shared package every 5 minutes but if you look at the
code inside, you probably can do without the package if you&rsquo;re afraid of adding dependencies.</p>
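<p>As a sketch of that (the function name here is made up; the real bot calls into a shared package that checks the database for overdue reminders), the dependency-free version with a plain <code>time.Ticker</code> looks something like this:</p>
<pre class="highlight plaintext"><code>package main

import (
    "fmt"
    "time"
)

// checkReminders stands in for the shared-package function; the real bot
// queries the DB for overdue reminders and sends Telegram messages.
func checkReminders() {
    fmt.Println("checking for overdue reminders")
}

func main() {
    // gocron equivalent: gocron.Every(5).Minutes().Do(checkReminders)
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        checkReminders()
    }
}
</code></pre>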
<h3>Migrations</h3>
<p>I needed to make changes to the database schema; I don&rsquo;t think there is a de-facto package for handling
that? There are a couple of options out there, like goose for example.</p>
<p>I ended up using <a href="https://github.com/rubenv/sql-migrate">rubenv/sql-migrate</a> though; goose was slightly finicky for me, YMMV.
Migrations are also run manually for now since I don&rsquo;t foresee many of them happening, but if they start to
become more frequent, I would definitely move them out into a separate Docker container that runs
briefly on every deploy.</p>
<h3>Docker Setup</h3>
<p>There were minimal changes to my Dockerfile and docker-compose config files.</p>
<p>For the <code>docker-compose.yml</code>, I&rsquo;ve added a <code>base</code> key that builds the Dockerfile in the root
folder. And then each of the other 2 services would just define a different entrypoint. I could
also have two separate Dockerfiles but at this point I think they&rsquo;re still similar enough to just
have one Dockerfile.</p>
<pre class="highlight yaml"><code>version: '2'
services:
  base:
    build: .
  hazel:
    extends: base
    ports:
    - "8080:8080"
    expose:
    - "8080"
    volumes:
    - /var/data:/var/data
    entrypoint:
    - webapp
  timer:
    extends: base
    volumes:
    - /var/data:/var/data
    entrypoint:
    - timer
</code></pre>
<p>I&rsquo;ve also set up <a href="https://github.com/tools/godep">Godep</a> to deal with external package version control. It does a simple job:
save the external packages into the vendor folder so that they can be restored easily the next time.</p>
<p>That way, the Dockerfile has just one package to grab (Godep itself), which then restores all the packages locally
instead of fetching each of them via <code>go get</code>. Other than the Godep steps and moving the entrypoint
into the docker-compose config, the Dockerfile basically remains unchanged.</p>
<pre class="highlight plaintext"><code>FROM golang:1.6
ADD configs.toml /go/bin/
ADD . /go/src/github.com/aranair/remindbot
WORKDIR /go/src/github.com/aranair/remindbot
RUN go get github.com/tools/godep
RUN godep restore
RUN go install ./...
WORKDIR /go/bin/
</code></pre>
<h3>Next Iterations</h3>
<ul>
<li>I want to be able to use &ldquo;today&rdquo; / &ldquo;tomorrow&rdquo; / &ldquo;next week&rdquo; instead of having to put in a date
manually; this probably just means better datetime parsing.</li>
<li>Ideally, I also want a snooze function, where you can postpone the notifications by X number of
hours.</li>
</ul>
</content>
</entry>
<entry>
<title>Building a Python CLI Stock Ticker with Urwid</title>
<link rel="alternate" href="http://blog.url.com/posts/2017/06/28/building-a-python-cli-stock-ticker-with-urwid/"/>
<id>http://blog.url.com/posts/2017/06/28/building-a-python-cli-stock-ticker-with-urwid/</id>
<published>2017-06-27T12:00:00-04:00</published>
<updated>2020-11-19T22:43:42-05:00</updated>
<author>
<name>Article Author</name>
</author>
<content type="html"><p>A bit of context - I do some investing in equities on the side and I&rsquo;ve always wanted to build a simple stock ticker in the form of a
CLI app that runs in my terminal setup. There were a few out there but none that would show
just the information I needed, in a minimalistic fashion. And I thought it would be a fun project
for me since I don&rsquo;t have much prior experience building a CLI app. </p>
<p>So last weekend, I decided to build one for fun! Here is a quick screenshot of it running, and
you can find the code over at <a href="https://github.com/aranair/rtscli">https://github.com/aranair/rtscli</a>.</p>
<p><a href="https://raw.githubusercontent.com/aranair/rtscli/master/rtscli-demo.png"><img src="https://raw.githubusercontent.com/aranair/rtscli/master/rtscli-demo.png" alt="Demo" /></a></p>
<h3>Python &amp; CLI Libraries</h3>
<p>I&rsquo;ve been starting to work with Python recently - due to the data-related work at Pocketmath.
So language-wise, Python was a natural choice. But honestly, many other languages offer packages that
can achieve the same result or more - like the <a href="http://tldp.org/HOWTO/NCURSES-Programming-HOWTO/intro.html#WHATIS">ncurses</a> library in C.</p>
<p>But for Python, I found a number of different libraries for CLIs:</p>
<ul>
<li><a href="https://docs.python.org/2/howto/curses.html">curses</a></li>
<li><a href="http://urwid.org/">urwid</a></li>
<li><a href="http://click.pocoo.org/5/">click</a></li>
<li><a href="https://github.com/thomasballinger/curtsies">curtsies</a></li>
</ul>
<h3>Urwid</h3>
<p>Eventually I went with <a href="http://urwid.org/">urwid</a> because it seemed the easiest to jump in and get started with instantly.
Urwid is an alternative to the <a href="https://docs.python.org/2/howto/curses.html">curses</a> library in Python: it implements a layer
on top of the usual boilerplate, which turned out to be really productive for me.</p>
<h3>Stock Ticker - Details</h3>
<p>Okay, this section mainly describes some of the functionality I wanted and, strictly
speaking, has nothing to do with urwid, Python, or code, so feel free to skip it if you&rsquo;re not into this ;)</p>
<p>The basic functionality I wanted was:</p>
<ul>
<li>Read a list of stock tickers that contain the following:
<ul>
<li>Name of the Stock</li>
<li>Symbol (google symbol - <code>HKG:0005</code> e.g.)</li>
<li>Buy-in price</li>
<li>Number of shares held</li>
</ul></li>
<li>Display key information per stock:
<ul>
<li>Change (day)</li>
<li>% Change (day)</li>
<li>Gain (overall)</li>
<li>% Gain (overall)</li>
</ul></li>
<li>Display a portfolio wide change</li>
</ul>
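<p>The per-stock numbers above are simple arithmetic over the buy-in price, the share count, and the day&rsquo;s prices. A minimal sketch of that calculation (the function name and inputs here are my own illustration, not rtscli&rsquo;s actual code):</p>

```python
def stock_metrics(buy_price, shares, prev_close, current):
    """Compute the four per-stock figures shown in the ticker.

    Hypothetical helper: names and signature are illustrative only.
    """
    change = current - prev_close                        # day change, absolute
    pct_change = change / prev_close * 100               # day change, percent
    gain = (current - buy_price) * shares                # overall gain, absolute
    pct_gain = (current - buy_price) / buy_price * 100   # overall gain, percent
    return change, pct_change, gain, pct_gain

# e.g. bought 100 shares at $10.00; closed yesterday at $12.00, now at $12.60
change, pct_change, gain, pct_gain = stock_metrics(10.0, 100, 12.0, 12.6)
```

<p>The portfolio-wide change is then just the sum of the per-stock <code>gain</code> values across all positions.</p>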
<h3>Implementation - MainLoop</h3>
<p>I imagined the app to be a long-running CLI that continuously accepts commands, while at the same time
pulling the stock information at an interval, on top of re-painting the information on screen. That
can be modelled with a loop - a <code>MainLoop</code> as urwid calls it.</p>
<pre class="highlight python"><code><span class="kn">import</span> <span class="nn">urwid</span>
<span class="n">main_loop</span> <span class="o">=</span> <span class="n">urwid</span><span class="o">.</span><span class="n">MainLoop</span><span class="p">(</span><span class="n">layout</span><span class="p">,</span> <span class="n">palette</span><span class="p">,</span> <span class="n">unhandled_input</span><span class="o">=</span><span class="n">handle_input</span><span class="p">)</span>
<span class="n">main_loop</span><span class="o">.</span><span class="n">set_alarm_in</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">refresh</span><span class="p">)</span>
<span class="n">main_loop</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</code></pre>
<p>The code above instantiates a <code>MainLoop</code> which ties together a display module, some widgets
and an event loop. Quoting the documentation: <em>It handles passing input from the display module to the
widgets, rendering the widgets and passing the rendered canvas to the display module to be drawn.</em> </p>
<p><strong>I think of it as a controller of sorts.</strong></p>
<h3>Implementation - Refresh Mechanism</h3>
<p><code>set_alarm_in</code> is like <code>setTimeout</code> in the JavaScript world; here it calls the <code>refresh</code> method immediately.
Inside the refresh method I set another alarm that goes off in <code>10s</code>, which is effectively
telling it to do one data pull from Google Finance every 10 seconds.</p>
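<p>Stripped of urwid itself, the re-arming pattern looks like this; <code>FakeLoop</code> is a hypothetical stand-in for urwid&rsquo;s <code>MainLoop</code>, just to make the flow visible:</p>

```python
# FakeLoop is a hypothetical stand-in for urwid.MainLoop: it only
# records scheduled alarms instead of running a real event loop.
class FakeLoop:
    def __init__(self):
        self.alarms = []

    def set_alarm_in(self, delay, callback):
        # urwid's real method schedules callback(loop, user_data)
        # to fire after `delay` seconds
        self.alarms.append((delay, callback))

def refresh(loop, data=None):
    # ...pull quotes and repaint the screen here...
    loop.set_alarm_in(10, refresh)  # re-arm: fire again in 10 seconds

loop = FakeLoop()
loop.set_alarm_in(0, refresh)      # fire immediately on startup
delay, callback = loop.alarms.pop(0)
callback(loop)                     # simulate the alarm firing once
```

<p>Each firing of <code>refresh</code> schedules the next one, so the data pull keeps recurring without any explicit <code>while</code> loop of its own.</p>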
<pre class="highlight python"><code><span class="k">def</span> <span class="nf">refresh</span><span class="p">(</span><span class="n">_loop</span><span class="p">,</span> <span class="n">_data</span><span class="p">):</span>