How data ingestion will work with minimal system overhead:
Every article load will send a request to a backend js file, passing it the article URL and an object containing the noun-matched items.
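Roughly what that per-article call could look like (the `/ingest.js` endpoint, the `data-noun` attributes, and the payload shape are all placeholders; I'm including the title here since the database will want it later):

```js
// Rough sketch of the per-article call. Assumes the noun matcher has already
// marked its matches in the DOM with data-noun attributes, and that the
// backend endpoint lives at /ingest.js -- both are placeholders.
const canonical = document.querySelector('link[rel="canonical"]');
const nouns = Array.from(document.querySelectorAll('[data-noun]'),
                         el => el.dataset.noun);

fetch('/ingest.js', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: canonical ? canonical.href : location.href, // prefer the canonical URL
    title: document.title,
    nouns
  })
});
```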
The backend script will hash the URL and look for a file named after that hash in its filesystem. If the file doesn't exist, it will create it, write the noun-matched items and the submitting URL to it, and append the hash to a list of queued hashes.
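A minimal sketch of that step, assuming a Node backend; the spool directory and queue file are just stand-ins for wherever the real script keeps its state:

```js
// Minimal sketch of the backend step (Node.js). The spool directory and
// queue file paths are stand-ins.
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

const SPOOL = '/var/spool/ingest';

function recordSubmission(payload) {            // payload = { url, title, nouns }
  const hash = crypto.createHash('sha1').update(payload.url).digest('hex');
  const file = path.join(SPOOL, hash + '.json');

  // Only the first submission for a URL does any work; repeat loads of the
  // same article cost one existence check and nothing else.
  if (!fs.existsSync(file)) {
    fs.writeFileSync(file, JSON.stringify(payload));
    fs.appendFileSync(path.join(SPOOL, 'queue.txt'), hash + '\n');
  }
}
```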
A daemon will watch the queue list and process the noun-matched items in each file, adding them to the database.
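Something like this for the daemon, as a rough polling sketch; the paths and the `mentions` table (sketched further down) are assumptions:

```js
// Rough polling sketch of the daemon. Paths and the mentions table are
// assumptions; a real daemon would also lock the queue file while draining it.
const fs = require('fs');
const path = require('path');
const Database = require('better-sqlite3');

const SPOOL = '/var/spool/ingest';
const QUEUE = path.join(SPOOL, 'queue.txt');

const db = new Database('mentions.db');
const insert = db.prepare(
  `INSERT OR IGNORE INTO mentions (proper_noun, url, title, first_seen)
   VALUES (?, ?, ?, date('now'))`
);

setInterval(() => {
  if (!fs.existsSync(QUEUE)) return;

  const hashes = fs.readFileSync(QUEUE, 'utf8').split('\n').filter(Boolean);
  fs.writeFileSync(QUEUE, '');   // claim everything currently queued

  for (const hash of hashes) {
    const file = path.join(SPOOL, hash + '.json');
    const { url, title, nouns } = JSON.parse(fs.readFileSync(file, 'utf8'));
    for (const noun of nouns) insert.run(noun, url, title || null);
  }
}, 60 * 1000);
```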
The resulting per-noun article lists will also be available in javascript form at a similar URL.
Once a day those pages will be assembled and published by a cronjob running a backend script that pulls the article information from the database and writes it to flat files.
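A sketch of that daily publish step (same hypothetical `mentions` table; the output paths and file naming are guesses):

```js
// Daily publish step, e.g. from cron: 0 4 * * * node publish.js
// Assumes the hypothetical mentions table; output paths are guesses.
const fs = require('fs');
const Database = require('better-sqlite3');

const db = new Database('mentions.db');
const nouns = db.prepare('SELECT DISTINCT proper_noun FROM mentions').all();

for (const { proper_noun } of nouns) {
  const rows = db.prepare(
    'SELECT url, title, first_seen FROM mentions WHERE proper_noun = ?'
  ).all(proper_noun);

  const slug = proper_noun.toLowerCase().replace(/\W+/g, '-');
  // Flat file for the page itself, plus the same list in javascript form at a
  // sibling URL so any page can pull it in with a <script> tag.
  fs.writeFileSync(`public/${slug}.json`, JSON.stringify(rows));
  fs.writeFileSync(`public/${slug}.js`,
                   'var mentions = ' + JSON.stringify(rows) + ';');
}
```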
The database should include the proper noun, the article title, and the date the mention first appeared. The date field is questionable, since older articles that only get ingested now will be assigned a new date rather than their original publication date.
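One possible shape for that table, with made-up names; `first_seen` is set at ingest time, which is exactly where the old-article/new-date problem comes from:

```js
// One possible shape for the table described above (all names are assumptions).
const Database = require('better-sqlite3');
const db = new Database('mentions.db');

db.exec(`
  CREATE TABLE IF NOT EXISTS mentions (
    proper_noun TEXT NOT NULL,           -- player or team name
    url         TEXT NOT NULL,           -- canonical article URL
    title       TEXT,                    -- article title
    first_seen  TEXT,                    -- date the mention was first ingested
    PRIMARY KEY (proper_noun, url)
  )
`);
```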
The idea is to begin compiling, for each player and team, a database of the stories (canonical URLs only) they are mentioned in.
That way, once we have this list, we can do something useful with it.