Some sort of data store will be needed #41
At the moment, the subscriber counts in the channels.yml file are hardcoded. It'd make more sense to have them updated dynamically by a script that runs periodically... if this subscriber number is meant to be used for anything (eg: sorting channels by # of subscribers).
We can use a data store (eg: flat file, sqlite, etc) that can update certain data dynamically. This will also need to be considered for #22.
To add this we'd need a backend (can be written in Node.js) that interacts with the data store and provides a simple API to the frontend, for example serving channels data, playlist data, etc. #22 could also use the API to post submissions in a pending review state.
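For illustration, here is a minimal sketch of the "pending review state" idea on top of SQLite. It is written in Python rather than Node.js purely for brevity, and the table name `submissions`, its columns, and the function names are all assumptions, not anything that exists in the repo.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the SQLite store and create the (hypothetical) submissions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        " id INTEGER PRIMARY KEY,"
        " channel_url TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def submit(conn, channel_url):
    """Record a public submission; every new row starts as 'pending'."""
    cur = conn.execute(
        "INSERT INTO submissions (channel_url) VALUES (?)", (channel_url,)
    )
    conn.commit()
    return cur.lastrowid


def review(conn, submission_id, approve):
    """A moderator marks a submission as approved or rejected."""
    status = "approved" if approve else "rejected"
    conn.execute(
        "UPDATE submissions SET status = ? WHERE id = ?", (status, submission_id)
    )
    conn.commit()


def list_by_status(conn, status):
    """Return (id, channel_url) pairs for one status, oldest first."""
    rows = conn.execute(
        "SELECT id, channel_url FROM submissions WHERE status = ? ORDER BY id",
        (status,),
    )
    return rows.fetchall()
```

An HTTP layer would sit in front of these functions; only rows that reach `approved` would ever be served to the frontend.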
@soliveira-vouga Thanks for this. We are in agreement this needs to be updated dynamically, however there should be no reason to implement any other data store than the yml files inside the data folder.
Additional lambda services can be set up to commit directly to master which would trigger a rebuild of the website, or even set up a pull request so that a human can confirm before merging.
Channel subscriber data, for example, could be updated daily, whereas watch/like counts on videos might be updated every few hours. This would mean there is no additional load on the user's browser and no delay in retrieving data. It also means we could sort natively on the data.
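As a rough sketch of what the periodic update step could look like once the channel records have been loaded from the yml files: the `slug` key and both function names are hypothetical, and fetching the fresh counts from the video platform is deliberately left out.

```python
def apply_subscriber_counts(channels, counts):
    """Return new channel records with refreshed subscriber counts.

    `channels` is a list of dicts as loaded from the data file and
    `counts` maps a channel's (hypothetical) `slug` to its latest count.
    Channels missing from `counts` keep their previous value.
    """
    updated = []
    for channel in channels:
        record = dict(channel)  # copy, so the input list is left untouched
        if record["slug"] in counts:
            record["subscribers"] = counts[record["slug"]]
        updated.append(record)
    return updated


def sort_by_subscribers(channels):
    """Order channels from most to fewest subscribers."""
    return sorted(channels, key=lambda c: c.get("subscribers", 0), reverse=True)
```

Because the counts end up in the data files themselves, the sort can happen at build time rather than in the visitor's browser.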
I'm not sure how much you've worked with static site generators; they definitely take a bit to wrap your head around. The more conversations we can have about these topics the better. I had a good chat with @Murodese last night to go over this exact issue. Right now he's looking into making executables that can add/edit channels/videos, which could then be leveraged by the updating service.
I've used Jekyll in the past. While it's great for simple blogs or static websites, it is limited to that. It cannot handle dynamic forms. Something as simple as a contact form requires a third party service. I also wouldn't use it if one of the requirements is that data is constantly updated via a script running on a cronjob. Here we're actually adding more overhead than simply using a database (eg: a flat file db). If you're going to use lambda then why not use S3 as your data storage? If you're going to use lambda, then you'll need to store git credentials somewhere for it to have permission to push to master. Same goes for whatever technology you decide to use for the user submission form. You will now need to handle git credentials safely which is additional overhead
Also, with git as the data store you'll have additional problems when the public submits content via a webform. How will you handle their commits? Will they go to a separate branch? How will you keep that branch in sync with master? What if there is a conflict between the branches? How will you select which submissions are to be included and which ones are not? You can use
You'll already need a backend to handle form data (ex: taking the form data and committing it to git) so why not use that backend to store the data in a proper db?
Personally, i have mixed feelings about this (and i've experimented with both approaches). The security-related issues @soliveira-vouga brought up are very real, but i think the main problem with git-as-db is that there is no built-in ACL and the past can be rewritten (that is not true of all VCS, just git). So giving a bot access to a repository can have disastrous consequences if the bot gets compromised.
In the end you're trying to synchronise your static website (breadtube) with external services. So you're trying to build a coherent state, but static websites are mostly "stateless" information-sharing tools, where you need to rely on external tools (whether manual curating, your forge, or some other application) to populate your data folders (handle the state). But most software forges were not built with this purpose in mind so you end up either giving full access to your repo to some external service (ouch) or you end up keeping the content on your build machine (with proper backups) and not using git for this.
Approach #1 : 3rd party service managing your repo
That's what Netlify does with its "CMS" and "Forms" features. Basically, they run a service managing identity and permissions and let a bot of theirs open/merge PRs (using PR titles as a means of keeping track) when you ask it to through their web interface
So the only way you're not reinventing the wheel (a super complex wheel) going in this direction is by using Netlify, who are as far as i know the only people who have such a usable solution. But do we trust them, their financial interests, and their thousands of lines of ugly NodeJS code controlling our git repository as single source of truth?
Approach #2: your data folder is curated locally
You can add some parts of your data folder to
Even if your data is stored locally, you need a way for remote users to suggest content so you really want some form of Web API taking care of authentication, permissions management (ACL), moderation... then you can expose a web "app" (HTML forms + CSS, no JS needed) and a CLI client for your API to propose content.
Note: i did not mention any external database in regards to this API because we already have a flat-file database. But of course it could be swapped for SQLite or whatever suits better, and then an export script can generate the data files accordingly. Personally, i don't really see the point but it's feasible and fits within this second approach.
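A minimal sketch of such an export script, assuming a hypothetical `channels` table with `name`, `url` and `subscribers` columns. It writes JSON only because the standard library can produce it without extra dependencies; whether the site generator reads JSON data files, or needs yml output instead, is an assumption to check.

```python
import json
import sqlite3


def export_channels(conn, out_path):
    """Dump the (hypothetical) channels table to a data file.

    Rows are ordered by name so that repeated exports of the same data
    produce identical files, which keeps diffs small.
    """
    rows = conn.execute(
        "SELECT name, url, subscribers FROM channels ORDER BY name"
    ).fetchall()
    records = [
        {"name": name, "url": url, "subscribers": subscribers}
        for name, url, subscribers in rows
    ]
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2, sort_keys=True)
        handle.write("\n")
    return len(records)
```

The database stays the source of truth and the data folder becomes a build artifact, regenerated whenever the export runs.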
Approach #3: taking the best of both worlds?
Maybe an intermediate approach could be to use a local source of truth, but handle it as a separate VCS repository (such as git). This way you get integrity and history from the VCS while not giving your bot/API access to the whole website repository. Plus, it's notably easier to have your scripts maintain the repo as you won't have to deal with merges and incorrect states and whatnot (the original source of truth is on your build server and git is only there to keep history/backups).
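To make this more concrete, here is a small sketch of a script that snapshots a local data repository after each update. The function name and the injectable `run` parameter are my own assumptions (the latter only so the logic can be exercised without a real repository); the git commands themselves are ordinary `status`, `add` and `commit`.

```python
import subprocess


def snapshot(repo_dir, message, run=subprocess.run):
    """Commit the current state of the data repo if anything changed.

    Returns True when a commit was made and False when the working tree
    was already clean. There are no branches or merges to handle,
    because this repo only records the history of the build server's data.
    """
    status = run(
        ["git", "-C", repo_dir, "status", "--porcelain"],
        check=True,
        capture_output=True,
        text=True,
    )
    if not status.stdout.strip():
        return False  # nothing to record
    run(["git", "-C", repo_dir, "add", "--all"], check=True)
    run(["git", "-C", repo_dir, "commit", "-m", message], check=True)
    return True
```

The bot running this never needs credentials for the website repository. At worst, a compromised bot damages the data repo, which can be restored from backups.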
This third approach i have never really experimented with so far. Do you have any criticism/feedback? Is it worth trying out?
EDIT: Added some formatting. Also, sorry for the long comment. I hope it brings interesting conversation and solutions to our problems :)
Another really helpful comment that I completely missed. I’m going to spend this week curating issues and scoping out what work we have now.
At this point the original issue here has been fixed: there is now a script to pull all the subscriber counts. I run it daily and submit a pull request.
I think that using the pull request model for bots will allow us to avoid issues of compromised services overwriting history (they could only submit to a branch).
We still have the same ultimate problem of having a for-profit business in our business, though something like GitLab could also solve this.
As noted in #43 (comment), being able to distribute a static site on IPFS etc. is the sort of huge benefit I see as justifying the cost of having systems which manage content addition/maintenance.
I think continuing the discussion on this is going to be really important, though I’m really happy to have a script to help us maintain the data store as it currently is.