Questions #10

RobStallion · 2018-12-11T11:42:36Z

@nelsonic just opening this issue as a place to ask some questions I have at the moment. I can split each question into its own issue if needed, or add them to an FAQ section in the readme if you feel that would be helpful.

Questions

1.

In issue #1 you mention shortening the URL from:

location-app.com/venues/123e4567-e89b-12d3-a456-426655440000

to

location-app.com/sw1x

How would we ensure that this URL is always unique if we had millions of URLs (like YouTube, for example)? Four characters does not seem long enough for that to be possible.

2.

In issue #1 you mention:

I suggest that by using a hash of the content as the Primary Key, Ecto (or PostgreSQL) would "reject" the insert request as a "duplicate" and we would not waste space in the database/table with dupe data.

My understanding of the above is that we would take all the content from the form submission and hash it generating a string (which would be used as the primary key). The same content would generate the same string so we could easily tell if it existed or not (I think this is similar to how hash tables work under the hood (thanks for suggesting that book btw 😉)). If my understanding of this is correct then my questions are...

How would we link the change to the original? Would we still be using an Alog approach, where we have an :entry_id to keep track of this?

Would the long-term plan be to just track the changes and link to the original file? (I believe this is similar to how both git and IPFS work.)


nelsonic · 2018-12-11T15:58:49Z

Hi @RobStallion, thanks for posting these good questions as an issue. 🎉

In future, consider opening separate issues for each question you have, for clarity 💭
and making the question the title of the issue for SEO benefits 🔍
and making the life of the "next dev" easier when they have similar questions ... 😉

Answers

1. Uniqueness?

cid will be universally unique.

That needs to be clear to everyone from the first line of the readme.
(if it's not, then it's "my bad" and I need to "fix" it ...)

We will be using the SHA-256 hash, for which no collision has been found to date. 256-bit hashes are widely used in cryptocurrencies (Bitcoin uses SHA-256). Enough "smart people" have done the homework on this for us not to worry about it.

The "math" is covered in: https://crypto.stackexchange.com/questions/39641/what-are-the-odds-of-collisions-for-a-hash-function-with-256-bit-output

This is the best video on 256 bit hash collision probability:

https://youtu.be/S9JGmA5_unY

This video does not cover the "Birthday Paradox"; see: nelsonic/nelsonic.github.io#576
But again, for the purposes of this answer, and indeed any project we are likely to work on in our lifetime,
when dealing with 256-bit hashes the chance of a "birthday attack" creating a collision is "ignorable".
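To put rough numbers on this (and on the 4-character URL worry in question 1), here is a sketch of the standard birthday-bound approximation p ≈ 1 − e^(−n(n−1)/(2N)) for n IDs drawn uniformly from N possible values. It assumes hash outputs behave like uniformly random values; the 58-symbol alphabet for the short prefix (as in base58) is an illustrative assumption, not something the app has settled on, and `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(n: int, space: int) -> float:
    """Approximate chance that at least two of n random IDs collide.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    math.expm1 keeps very small probabilities from rounding to 0.0.
    """
    x = n * (n - 1) / (2 * space)
    return -math.expm1(-x)


# Full 256-bit hash vs. a 4-character prefix over an assumed 58-symbol alphabet.
for label, space in [("sha256", 2**256), ("4 chars base58", 58**4)]:
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"{label:>15} n={n:>13,}: p ~= {collision_probability(n, space):.3g}")
```

For the full hash the probabilities stay astronomically small even at a billion records, whereas a 4-character prefix already has roughly a 4% chance of a collision at 1,000 IDs and is essentially certain to collide at a million, which is why short IDs need a central store to check them.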

`cid` means we have the `<option>` to store content on IPFS

We need to make it clear that using a cid as the unique identifier for a record
means we can optionally store content on IPFS for redundancy/decentralisation.
However, for the purposes of building our Apps in 2019, we are NOT going to even try to build "D-Apps",
because there is no way of maintaining privacy for private/personal content on IPFS without pre-storage encryption,
which in turn implies storing encryption/decryption keys somewhere centrally.
Since something will need to be stored centrally anyway, we might as well store the data centrally
to reduce query time and request latency in any App(s) we build.

Note: the reason I haven't previously "proclaimed" in dwyl/technology-stack#67
that "all our apps will be distributed by default from now on",
is because the Application building "story" is incomplete on IPFS/IPLD.
There is no way of deleting old data that people no longer want to exist: ipfs-inactive/faq#9
This means that if someone says something hurtful or untrue, they cannot "retract" it
to stop it perpetuating on the network ... so decentralisation and content replication can be harmful!
One of the original principles of IPFS was the "permanent web".
I'm fairly certain that most users will not like the idea of "losing control" over their data,
and indeed this is incompatible with EU law: https://en.wikipedia.org/wiki/Right_to_be_forgotten
So we are going to use cid as a means of ensuring uniqueness in our DB records,
and we will use the concept of prev for versioning. See the examples below.
But we are not going to store textual data on IPFS for the foreseeable future,
until "Filecoin" is fully operational and we have a guarantee that our data will not disappear.
We can still use cid 100% independently of IPFS and when the ecosystem "matures",
we can offer users of our application(s) (Time, Tudo, ALT, etc.) the option to "back up" their data to IPFS!
For now, ignore the existence of IPFS and focus purely on using cid to replace entry_id in Alog.

Uniqueness in a Phoenix-based Web Application

In a given web app, there will be a PostgreSQL database that will store the data.
Each item of content will have a cid.
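A minimal sketch of the "hash of the content as the primary key" idea, assuming the record is serialised as JSON with sorted keys before hashing. It produces a plain hex SHA-256 digest, not a real IPFS CID (which wraps the digest with multihash/multicodec prefixes and a multibase encoding), and an in-memory dict stands in for the PostgreSQL table. `content_id` and `HomesTable` are hypothetical names.

```python
import hashlib
import json


def content_id(content: dict) -> str:
    """Return a hex SHA-256 digest of the record's content.

    Keys are sorted so the same content always serialises, and therefore
    hashes, identically regardless of field order.
    """
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class HomesTable:
    """Toy stand-in for a table whose primary key is the content hash."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def insert(self, content: dict) -> tuple[str, bool]:
        """Insert content; return (key, inserted). inserted is False for a duplicate."""
        key = content_id(content)
        if key in self.rows:
            # Same content -> same key, so the insert is rejected as a duplicate,
            # much as a PRIMARY KEY constraint would reject it in PostgreSQL.
            return key, False
        self.rows[key] = content
        return key, True


table = HomesTable()
k1, first = table.insert({"address": "Wayne Manor, 1007 Mountain Drive, Gotham"})
k2, again = table.insert({"address": "Wayne Manor, 1007 Mountain Drive, Gotham"})
print(first, again, k1 == k2, len(table.rows))  # True False True 1
```

Sorting the keys matters: without it, two submissions with identical fields in a different order would hash differently and the duplicate would slip through.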

Imagine that we are building a "home rental" website, "restful-bed-and-healthy-breakfast.com",
which has the short domain bnb.com. Here is an example (simplified) "homes" table:

| `inserted` | `cid`(PK)¹ | `address` | `slug` | `prev` |
|---|---|---|---|---|
| 1541609554 | hdyk80sgPeAX | Wayne Manor, 1007 Mountain Drive, Gotham | hdy | null |
| 1541618643 | HvTlGsEX88Nc | Wayne Manor, 1007 Mountain Drive, Gotham | waynemanor | hdyk80sgPeAX |
| 1541628987 | pN7hWNuqJ6J | Wayne Manor, 1007 Mountain Drive, Gotham, USA | waynemanor | HvTlGsEX88Nc |

The first row is the "creation" of the entry for "Wayne Manor".
At this point the URL would be: bnb.com/hdy corresponding to the first 3 letters of the cid.
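How many leading characters of the cid to use isn't specified here. One option, sketched as a hypothetical `short_id` helper, is to take the shortest prefix (with a minimum length) that no other stored cid shares. This only works because the short URL service can see every cid in its central database.

```python
def short_id(cid: str, existing_cids: list[str], min_len: int = 2) -> str:
    """Return the shortest prefix of cid (at least min_len chars) that no other stored cid shares.

    Hypothetical helper: it relies on a central store that knows every cid.
    """
    others = [c for c in existing_cids if c != cid]
    for length in range(min_len, len(cid) + 1):
        prefix = cid[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return cid  # no shorter prefix is unambiguous, so fall back to the full cid


homes = ["hdyk80sgPeAX", "HvTlGsEX88Nc", "pN7hWNuqJ6J"]
print(short_id("hdyk80sgPeAX", homes, min_len=3))  # hdy
```

A prefix that is unique today can become ambiguous once a later cid starts with the same letters, so it seems sensible to store the assigned short id (as the `slug` column does) rather than recompute it on every request.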

The second row is when the listing owner updates the slug to be a more friendly waynemanor
so the URL is more human memorable and SEO friendly: bnb.com/waynemanor
The URL may be longer but it's more memorable and thus people may prefer it.

Notice how the value of prev refers to the cid of the previous version of the record?
That's how we do versioning in a cid-based web app (see below).

As this data will be stored "centrally" by a PostgreSQL database, the DB can be responsible for ensuring that the slug field is not duplicated. We will need to run a "SELECT" query before inserting any record that has a slug, to confirm that the user inserting the data has the access rights to update the row with that slug, but we will clarify those "access control" details later. For now, let's stick with the simplified version.

In the third row, we added "USA" to the address, which changed the content and thus created a new cid. The prev refers to the previous version of the record (before "USA" was added). The slug has not changed, so the URL is still the same: bnb.com/waynemanor
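A sketch of creating the next version of a record, using a hypothetical `new_version` helper. Two assumptions go beyond the table above: the cid is a hex SHA-256 of the JSON-serialised content, and the prev link is hashed together with the content (similar to how a git commit's hash covers its parent), so reverting to an earlier value still produces a fresh cid. The insert timestamp is deliberately left out of the hash so identical content still hashes identically.

```python
import hashlib
import json
import time
from typing import Optional


def new_version(content: dict, prev: Optional[str] = None) -> dict:
    """Build a row whose cid hashes the content together with its prev link."""
    payload = json.dumps({"content": content, "prev": prev}, sort_keys=True)
    cid = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # "inserted" is metadata only; it is not part of the hashed payload.
    return {"inserted": int(time.time()), "cid": cid, "prev": prev, **content}


before = new_version({"address": "Wayne Manor, 1007 Mountain Drive, Gotham",
                      "slug": "waynemanor"})
after = new_version({"address": "Wayne Manor, 1007 Mountain Drive, Gotham, USA",
                     "slug": "waynemanor"}, prev=before["cid"])
print(after["prev"] == before["cid"], after["cid"] != before["cid"])  # True True
```

Only the content changed, so the slug (and therefore the URL) stays the same while the cid moves on.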

2. Updating Content

The updated version of the content would be linked to the previous version using a prev field, the way it happens in IPFS, Ethereum and Bitcoin (so it will be familiar to people),
i.e. prev: previous_cid. An example of updating an address:

| `inserted` | `cid`(PK)¹ | `name` | `address` | `prev` |
|---|---|---|---|---|
| 1541609554 | gVSTedHFGBetxy | Bruce Wayne | 1007 Mountain Drive, Gotham | null |
| 1541618643 | smnELuCmEaX42 | Bruce Wayne | Rua Goncalo Afonso, Vila Madalena, Sao Paulo, 05436-100, Brazil | gVSTedHFGBetxy |

When a row does not have a prev value, we know it is the first time that content has been inserted into the database. When a prev value is defined in a row, we know this is a new version of previously inserted content and we can "traverse the tree" to see all previous versions.

¹: all cid values truncated for brevity.
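"Traversing the tree" back from a given version can be sketched as following prev links until a row with no prev is reached. This hypothetical `history` helper works on an in-memory dict keyed by cid (using the truncated cids from the table above); in PostgreSQL the same walk could be done with a `WITH RECURSIVE` query.

```python
from typing import Optional


def history(rows_by_cid: dict[str, dict], cid: str) -> list[str]:
    """Follow prev links from cid back to the first insert, newest first."""
    chain: list[str] = []
    current: Optional[str] = cid
    while current is not None:
        # Guard against malformed data; with hash-based cids a cycle should not occur.
        if current in chain:
            raise ValueError(f"cycle detected at {current}")
        chain.append(current)
        current = rows_by_cid[current]["prev"]
    return chain


rows = {
    "gVSTedHFGBetxy": {"prev": None},
    "smnELuCmEaX42": {"prev": "gVSTedHFGBetxy"},
}
print(history(rows, "smnELuCmEaX42"))  # ['smnELuCmEaX42', 'gVSTedHFGBetxy']
```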

@RobStallion please let me know if this answers your questions. 🤔
If not, please help identify the remaining confusion. Thanks. 👍

RobStallion · 2018-12-11T17:24:58Z

@nelsonic Those are amazing thank you. Super super helpful. 👍

nelsonic · 2018-12-11T20:24:15Z

@RobStallion do you want to convert these questions & answers into "FAQ.md" and create a PR? 😉

RobStallion · 2019-01-22T11:06:57Z

@nelsonic Will do 👍


RobStallion · 2019-01-26T15:57:39Z

The following lines added to the readme in #16 answer my first question...

The reason we can abbreviate the URL to just gV is because our SHORT URL service has a centralised Database/store. If we wanted to run a decentralised content addressing system, we would simply link to the full cid: dwyl.co/gVSTedHFGBetxyYib9mBQsjtZj4dJjQe

RobStallion · 2019-01-27T11:05:34Z

Closing as @nelsonic has answered my questions and they have been added to the readme

nelsonic added the question and starter labels Dec 11, 2018

nelsonic assigned RobStallion Dec 20, 2018

nelsonic mentioned this issue Jan 22, 2019

What are CIDs? #13

Open

RobStallion added the in-progress label Jan 26, 2019

RobStallion added a commit that referenced this issue Jan 26, 2019

moves real world examples and adds link to relevant reading #10

c654362

RobStallion added a commit that referenced this issue Jan 26, 2019

adds faqs section to readme with questions on uniqueness and updating records #10

15c9a63

RobStallion mentioned this issue Jan 26, 2019

Faqs #20

Merged

RobStallion added awaiting-review and removed in-progress labels Jan 26, 2019

RobStallion closed this as completed Jan 27, 2019
