more blog post fix up since backup from ghost
chekkan committed Apr 12, 2022
1 parent 1ea59de commit c195c7d
Showing 9 changed files with 480 additions and 231 deletions.
20 changes: 18 additions & 2 deletions _drafts/migrate-blog-to-jekyll-and-hosted-on-github-pages.markdown
@@ -54,18 +54,33 @@
jekyll's default post format `yyyy-mm-dd-title` urls. Therefore, I had to
manually set the `permalink` front matter attribute. This wasn't so bad as I
only had around 34 blog posts.

Also, posts with links to other posts ended up with a `__GHOST_URL__` prefix,
e.g. `[Part 2 - Setting up Kibana Service](__GHOST_URL__/setting-up-elasticsearch-cluster-on-kubernetes-part-2-kibana/)`.
Make sure to find and replace these manually with `post_url` liquid tags,
e.g. `[Part 2 - Setting up Kibana Service]({{ '{%' }} post_url 2018-02-13-setting-up-elasticsearch-cluster-on-kubernetes-part-2-kibana %})`.
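
For a larger migration, a rough Ruby sketch along these lines could do the bulk
of the replacement automatically; it assumes each Ghost slug matches a Jekyll
post file name and is illustrative only, not the exact process I followed:

```ruby
#!/usr/bin/env ruby
# Rough sketch: rewrite __GHOST_URL__ links into post_url liquid tags,
# assuming each Ghost slug matches a post file named yyyy-mm-dd-<slug>.
require "pathname"

posts = Pathname.glob("_posts/*.{md,markdown}")

# Map "some-slug" => "yyyy-mm-dd-some-slug" using the post file names.
slug_map = posts.each_with_object({}) do |path, map|
  basename = path.basename(".*").to_s
  map[basename.sub(/\A\d{4}-\d{2}-\d{2}-/, "")] = basename
end

posts.each do |path|
  text = path.read
  updated = text.gsub(%r{__GHOST_URL__/([\w-]+)/?}) do
    name = slug_map[Regexp.last_match(1)]
    # Leave the link untouched if no matching post file exists.
    name ? "{% post_url #{name} %}" : Regexp.last_match(0)
  end
  path.write(updated) unless updated == text
end
```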

### Code blocks
I also had to add the correct `highlight` language tag to my code blocks in
existing posts.

```
{% raw %}
{% highlight csharp %}
...
public int Add(int a, int b) => a + b;
{% endhighlight %}
{% endraw %}
```

I later found out that jekyll can also make use of the
[GitHub Fenced Code Blocks][gh_code] syntax.

<pre class='code'>
<code>``` javascript
const add = (a, b) => a + b;
```</code>
</pre>

### Images
[jekyll_ghost_importer][1] also imports images as html tags with a fixed width
and height, which looks stretched out in Jekyll with the minima theme. I had to
@@ -81,4 +96,5 @@
have to stop migrating the blog and stick to a platform for a substantial
amount of time. And I am hopeful that this might be it.


[1]: <https://github.com/eloyesp/jekyll_ghost_importer>
[gh_code]: <https://help.github.com/articles/creating-and-highlighting-code-blocks/>
@@ -1,6 +1,7 @@
---
layout: post
title: Release Management Service and VSTS Team Build
permalink: release-management-service-and-team-build
date: '2017-07-25 22:38:00'
tags:
- vsts
@@ -9,13 +10,20 @@
- azure-devops
---

I’ve started working on a new project at my company and it is a massive project
compared to anything I have done previously. We are not short on technologies
in this project, starting with an Angular web site all the way to using
Elasticsearch. One of the challenges we faced daily was releasing all the
different systems that are part of the project into dev, test, staging and
production.

This is a blog post explaining the different challenges I faced, going through
each challenge step by step.

Let me be clear about the different technologies and environments used in this
walkthrough:

1. Visual Studio Team Services - At the time of this writing, the latest
   release for Visual Studio Online ([13th April 2016][apr-13-vso]) had come
   out.
2. GIT - for the source control
3. Angular with Typescript on the front end.
4. ASP.NET WebAPI 2 on the server
@@ -24,13 +32,28 @@
7. Elastic Search
8. SQL Server

The challenge was how to automate the full deployment of all these various
technologies into different environments in a way that was repeatable and
delivered our code predictably.

My experience before starting this with regards to continuous delivery and
operations was next to none.

At first, I had spent some time looking into PowerShell DSC. It seemed
promising, as VSTS at the time had the Release Management Application, which
only supported PowerShell DSC. It wasn’t too long before I ran into some
difficulties with PowerShell DSC. To name a few: when I first started, there
weren’t a lot of built-in resources that I could use. There were a few I could
get from the interwebs, but it was difficult to get them downloaded onto the
machine itself, such as the IIS site resources, the xWebSite resource, the
xWebAdministrator resource, etc.

The error messages provided by PowerShell DSC were really difficult to
diagnose, and finding answers through Google searches was not straightforward
either. I finally decided that PowerShell DSC wasn’t the way forward and that
we needed to rely on something else to get the job done.

I had spent a few weeks going down this route before I learned that Microsoft
was bringing release management into their cloud web portal.

[apr-13-vso]: <https://www.visualstudio.com/en-us/news/2016-apr-13-vso>
@@ -1,6 +1,7 @@
---
layout: post
title: Ingesting data from Oracle DB into Elasticsearch with Logstash
permalink: ingesting-data-into-elasticsearch-with-logstash
date: '2017-07-30 00:19:00'
tags:
- elasticsearch
@@ -10,81 +11,127 @@
- ansible
---

An alternative to Logstash was the [Elasticsearch JDBC tool][es_jdbc], which
at the time of writing used port `9300` for transferring data. There was talk
of not exposing this port externally in future releases of Elasticsearch, and
hence we went with Logstash.

## Setup

- The way we have set up the Logstash and Elasticsearch cluster at present is
  by using [Ansible][ansible].
- We have one VM with Logstash installed which can connect to the Elasticsearch
  cluster.
- The [ReadonlyRest][ror] plugin is used for managing access to our cluster.
- The [JDBC plugin][jdbc] is used to query for the data, with the
  [elasticsearch output plugin][esop] indexing it.
- A cron job schedules Logstash to run once every hour.

As of Logstash version 5.0, there is an option to enable
[http compression][htcomp] for **requests**, so make sure to take advantage of
this; we saw a reduction of up to 10-fold in data size.

## Updates

There were two options for getting the updates from the Oracle DB whilst using
the JDBC input plugin. **Option 1:** Modify the job which inserts or updates
each table that we are ingesting to also set a `lastupdated` field. The script
that runs on our schedule of once every hour would then query the Elasticsearch
index for the `max_date` on the index and pass it to the SQL that's run by the
Logstash JDBC plugin. **Option 2:** Use the `sql_last_value` plugin parameter,
which persists the `sql_last_value` value in a metadata file stored at the
configured `last_run_metadata_path`. Upon query execution, this file is updated
with the current value of `sql_last_value`. In our case, this meant that we
needed an insert or update timestamp in our table.
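
For illustration, a minimal sketch of Option 2 using the JDBC input plugin
could look like the following; the connection details, paths and the
`lastupdated` column are placeholders rather than our actual configuration.

{% highlight ruby %}

input {
  jdbc {
    # placeholder connection details
    jdbc_connection_string => "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    jdbc_user => "${DB_USER}"
    jdbc_password => "${DB_PASSWORD}"
    jdbc_driver_library => "/opt/logstash/ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    # only pick up rows changed since the last successful run
    statement => "SELECT * FROM my_view WHERE lastupdated > :sql_last_value"
    use_column_value => true
    tracking_column => "lastupdated"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/var/lib/logstash/.jdbc_last_run"
  }
}

{% endhighlight %}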

The primary key in the Oracle DB table is used as the document id in
Elasticsearch. This means that each updated row will correctly override the
existing document in Elasticsearch.

{% highlight ruby %}

output {
elasticsearch {
hosts => ${HOST_STRING}
index => "${ES_INDEX}"
document_id => "%{${ES_DOC_ID}}"
document_type => "${INDEX_TYPE}"
flush_size => 1000
http_compression => true
}
}

{% endhighlight %}

## Transform data

Make use of filters in order to do basic data transformations.

### Transform table column value to object

{% highlight ruby %}

mutate {
rename => { "address.line1" => "[address][line1]" }
rename => { "address.line2" => "[address][line2]" }
}

{% endhighlight %}

### Convert comma-delimited field to array of strings

{% highlight ruby %}

ruby {
  init => "require 'csv'"
  code => "['urls'].each { |type|
    if event.include?(type) then
      if event.get(type) == nil || event.get(type) == 'null' then
        event.remove(type)
      else
        # bin data if not valid CSV
        begin
          event.set(type, CSV.parse(event.get(type))[0])
        rescue
          event.remove(type)
        end
      end
    end
  }"
}

{% endhighlight %}

## Improvements

The setup described in this article doesn’t work well if we also need to
remove deleted entries. Consider using a column in the view to indicate whether
a row was removed or not, but that only works for “soft-deletes” in the
database.
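
A possible sketch (purely illustrative, assuming the view exposes an
`is_deleted` flag column) would be to route flagged rows to a delete action in
the elasticsearch output.

{% highlight ruby %}

filter {
  # assumed is_deleted column from the view; 1 means the row was soft-deleted
  if [is_deleted] == 1 {
    mutate { add_field => { "[@metadata][action]" => "delete" } }
  } else {
    mutate { add_field => { "[@metadata][action]" => "index" } }
  }
}

output {
  elasticsearch {
    hosts => ${HOST_STRING}
    index => "${ES_INDEX}"
    document_id => "%{${ES_DOC_ID}}"
    action => "%{[@metadata][action]}"
  }
}

{% endhighlight %}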

Another option is to move towards using a message bus or queuing system for
ingestion. One project by LinkedIn that caught my attention, which supports
Oracle DB as a source for ingestion, was [databus][lkndb]. But I haven’t
managed to get it set up locally (poor documentation at the time of writing).

A full re-index is currently a manual process, even though we have a script to
perform it.

## Further Reading

- 📖 [bottled water: real-time integration of postgresql and kafka](https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/)
- 📖 [data pipeline evolution at linkedin on a few pictures](http://getindata.com/data-pipeline-evolution-at-linkedin-on-a-few-pictures)
- 🎥 [change data capture: the magic wand we forgot](https://www.youtube.com/watch?v=ZAZJqEKUl3U)

_Image credit:_

- [https://flic.kr/p/8wuFEJ](https://flic.kr/p/8wuFEJ)
- [https://creativecommons.org/licenses/by-nc/2.0/](https://creativecommons.org/licenses/by-nc/2.0/)

[es_jdbc]: <https://github.com/jprante/elasticsearch-jdbc>
[ansible]: <https://www.ansible.com/>
[ror]: <https://readonlyrest.com/>
[jdbc]: <https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html>
[esop]: <https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html>
[htcomp]: <https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#_http_compression>
[lkndb]: <https://github.com/linkedin/databus>
@@ -1,7 +1,8 @@
---
layout: post
title: Access response headers in HTTP Fetch API with Serverless Framework and
AWS Lambda
permalink: http-fetch-response-headers-with-serverless-and-aws-lambda
date: '2017-11-30 00:23:00'
tags:
- aws
@@ -11,47 +12,62 @@
- node-js
---

In order to access response headers such as `Location` in the HTTP Fetch API
whilst using the Serverless Framework and AWS Lambda functions with CORS
enabled, you need to do the following.

Make sure `cors` is set to `true` in `serverless.yml`:

{% highlight yaml %}

postUsers:
handler: handler.postUsers
events:
- http:
path: users
method: post
cors: true

{% endhighlight %}

Make sure you are returning the following response headers:

{% highlight javascript %}

callback(null, {
statusCode: 201,
headers: {
"Access-Control-Allow-Origin": "*",
// Required for cookies, authorization headers with HTTPS
"access-control-allow-credentials": true,
"access-control-allow-headers": "Location",
"access-control-expose-headers": "Location",
Location: id
}
});

{% endhighlight %}

Now, you can access the `location` header from `fetch`.

{% highlight javascript %}

this.httpClient
.fetch("/users", {
method: "post",
body: json({ username: "chekkan" })
})
.then(res => {
return res.headers.get("location");
});

{% endhighlight %}

**References:**

- [Serverless.yml CORS](https://serverless.com/framework/docs/providers/aws/events/apigateway#enabling-cors)
- [Access-Control-Allow-Headers - Mozilla Developer](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Headers)

_Photo by [Paul Buffington](https://unsplash.com/photos/Lwe2hbm5XKk?utmsource=unsplash&utmmedium=referral&utmcontent=creditCopyText)
on [Unsplash](https://unsplash.com/?utmsource=unsplash&utmmedium=referral&utmcontent=creditCopyText)_
