How to handle unique resources, e.g. phd thesis #3

cjrd · 2013-04-13T19:46:38Z

How should we handle unique resources? Currently, we store general resources in the resources.txt file and reference them via the source tag in the node content. This avoids rewriting the same resource for each node, e.g. Bishop's PRML. But what if we want to cite a specific web page or publication? Should we add the web page to resources.txt and reference it via the source tag?

rgrosse · 2013-04-13T21:38:50Z

In the convolutional_nets node, I have an entry for the original LeCun paper, which lists the title, author, and so on. Right now, it gets ignored, because it lists

source: paper

and there's currently no entry for "paper" in the database.

Here's what I'm roughly envisioning: currently, we have a field in resources.txt called "resource_type." For each resource type, we would have associated templates which determine how it's rendered in HTML or plaintext. We would then have a dummy entry in resources.txt which is something like:

key: paper
resource_type: paper

which just says to look for the template for "paper." The templates would be stored in the content repository. Any thoughts on this proposal?
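A rough sketch of how this template lookup could work, in Python. Everything here is an assumption made for illustration: the template strings, the `TEMPLATES` table, and the `render` helper are hypothetical, and `string.Template` merely stands in for whatever interpolation syntax would actually be chosen.

```python
from string import Template

# Hypothetical templates, one per resource_type (not taken from the real repository).
TEMPLATES = {
    "paper": Template("$authors. $title."),
    "book": Template("$title, by $authors"),
}

def render(entry, resource_types):
    """Render a node's resource entry using the template for its resource type."""
    # 'source' names an entry in resources.txt, which supplies the resource_type.
    resource_type = resource_types[entry["source"]]
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return TEMPLATES[resource_type].safe_substitute(entry)

# The dummy entry from the proposal: key "paper" maps to resource_type "paper".
resource_types = {"paper": "paper"}
entry = {
    "source": "paper",
    "title": "Gradient-based learning applied to document recognition",
    "authors": "LeCun, Bottou, Bengio, and Haffner",
}
print(render(entry, resource_types))
```

Adding a new resource type would then only mean adding one template and one dummy entry, without touching the rendering code.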

cjrd · 2013-04-14T11:21:14Z

Templates sound like a good idea, but logically these should probably be stored with the frontend content (since they'll be HTML templates that are independent of the content itself).

One concern with this overall approach, though, is that we could have the same paper/thesis/resource repeated in a number of nodes, all with

source: paper
link: paper.com

But then if paper.com changes to paper.edu we would have to update each node instead of just updating global resources.txt.

Also we would have to specify the "free" tag in the node/resources.txt file (not all papers are freely available, especially in non-CS fields).

Here's an alternative idea:
Every unique resource has a unique entry in the global resources.txt file, meaning the LeCun paper would have:

key: lecunpaper
title: Gradient-based learning applied to document recognition
location: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
resource_type: paper
free: 1

and convolutional_nets/resources.txt would have the entry

source: lecunpaper
location: section II.A
authors: Yann LeCun and Leon Bottou and Yoshua Bengio and Patrick Haffner
mark: star

This way, each node's resources entry is a pointer to the global resource, which holds the information that multiple nodes can share (title of the resource, URL, free tag, etc.) without repeating it in every node entry.
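To make the pointer idea concrete, here is a minimal Python sketch. It assumes that entries are blocks of `field: value` lines separated by blank lines; the thread does not spell out the file format, so the `parse_entries` helper is hypothetical.

```python
def parse_entries(text):
    """Parse blank-line-separated blocks of 'field: value' lines into dicts."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only, so URLs in the value stay intact.
            field, _, value = line.partition(":")
            entry[field.strip()] = value.strip()
        entries.append(entry)
    return entries

GLOBAL_RESOURCES = """\
key: lecunpaper
title: Gradient-based learning applied to document recognition
location: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
resource_type: paper
free: 1
"""

NODE_RESOURCES = """\
source: lecunpaper
location: section II.A
mark: star
"""

# Index the global file by key so a node entry's 'source' acts like a foreign key.
global_by_key = {e["key"]: e for e in parse_entries(GLOBAL_RESOURCES)}
node_entry = parse_entries(NODE_RESOURCES)[0]
resource = global_by_key[node_entry["source"]]
print(resource["title"], "->", node_entry["location"])
```

Note that `location` means a URL in the global entry and a section in the node entry, so the two dictionaries are kept separate here rather than merged.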

[FYI: I just described a simple text-based relational database]

This could be automated in a web form, where a new global resource is created if no previous resource matched the location tag, which should be a unique identifier for resources. We could also suggest that the resource already exists if the title matched a previously entered title.
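The check behind such a form could look roughly like the following Python sketch. The `submit_resource` function, its status strings, and the `normalize` helper are all hypothetical names; the only parts taken from the proposal are that the location acts as the unique identifier and that a matching title triggers a suggestion.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())

def submit_resource(resources, new):
    """Return (status, entry): reuse on location match, warn on title match, else create."""
    # The location is treated as the unique identifier for a resource.
    for existing in resources:
        if existing["location"] == new["location"]:
            return "exists", existing
    # A matching title only produces a suggestion; the user decides what to do.
    for existing in resources:
        if normalize(existing["title"]) == normalize(new["title"]):
            return "possible_duplicate", existing
    resources.append(new)
    return "created", new

resources = []
paper = {
    "key": "lecunpaper",
    "title": "Gradient-based learning applied to document recognition",
    "location": "http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf",
}
print(submit_resource(resources, paper)[0])        # first submission creates the entry
print(submit_resource(resources, dict(paper))[0])  # same location is recognized
```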

Additional note: Manually entering resources into the resources.txt file may work for the moment, but it will quickly become unmanageable. Think of scanning through hundreds (thousands?) of resources to make sure I'm not repeating a certain book or thesis entry, searching for the correct source key when I want to use a resource (did I call it lecunpaper or lecun_paper?), or trying to find an old entry of mine in order to make updates. Sure, after entering a resource I'll probably remember where/how I entered it for a few weeks, but I'd like a good system for finding the same entry next year, or for finding an entry that someone else made.

Hmm, I can't seem to shake the feeling that a flat file database will be very difficult to maintain -- I'll open a new discussion for this, though.

rgrosse · 2013-04-14T14:43:02Z

I know it feels like templates should generally be part of the view, but I'd still argue for keeping them with the content for purposes of modularity. In particular:

Let's say we have a regular contributor who works a lot with the content repository. They should be able to add a new resource type without understanding/modifying the code repository as well.
If someone wants to create an entirely separate content database for another field (e.g. biology), it may have its own set of resource types, such as review papers. They should be able to create their own content repository while using the same server code. (In principle, we should be able to have one content repository which includes every field, but let's still not write off the possibility of separate content repositories.)
Keeping the templates in the view introduces a logical dependency between the content and server repositories.

As far as whether to add individual papers to the global resources.txt, it's really a matter of how unique they are. My guess is that most papers will correspond only to a single concept in the graph, so it would be easier to keep all the information in the individual content nodes. But then, there are a number of review papers (e.g. the Wainwright & Jordan tutorial) which cover a lot of topics, so it would be worth giving them their own entry in the global resources.txt.

To make it easier to edit the text files, another option would be just to write tools that help with that. This would preserve the benefits of flat files, especially the ability to collaborate through Github. The tools could take the form of standalone programs for editing the text which provide autocomplete. Or it might take the form of emacs/vim plugins.

cjrd · 2013-04-14T15:12:53Z

Let's say we have a regular contributor who works a lot with the content repository. They should be able to add a new resource type without understanding/modifying the code repository as well.

True, currently the code depends on the content but not vice-versa, so a content developer can work without understanding the code but a coder has to understand the content.

But the templates will probably also incorporate CSS/javascript. Should we place template specific CSS/javascript with the content as well or make a list of valid CSS/javascript that can be used in a template? What about general CSS/javascript that's used throughout the display? I agree with your other points.

As far as whether to add individual papers to the global resources.txt, it's really a matter of how unique they are.

Yes, but I think we should also aim for consistency, e.g. the resource links and free-tag should always be in the global resources.txt file.

The tools could take the form of standalone programs for editing the text which provide autocomplete. Or it might take the form of emacs/vim plugins.

Any reason to develop standalone programs or vim/emacs plugins instead of incorporating these features into the browser?

rgrosse · 2013-04-14T15:37:28Z

Let's say we have a regular contributor who works a lot with the content
repository. They should be able to add a new resource type without
understanding/modifying the code repository as well.

True, currently the code depends on the content but not vice-versa, so a
content developer can work without understanding the code but a coder has
to understand the content.

But the templates will probably also incorporate CSS/javascript. Should we
place template specific CSS/javascript with the content as well or make a
list of valid CSS/javascript that can be used in a template? What about
general CSS/javascript that's used throughout the display? I agree with
your other points.

Why would templates involve Javascript? I wouldn't expect them to involve
anything more than simple HTML tags such as or . We'd need to define
some way to interpolate the field values, but that shouldn't have to be too
complicated.

As far as whether to add individual papers to the global resources.txt,
it's really a matter of how unique they are.

Yes, but I think we should also aim for consistency, e.g. the resource
links and free-tag should always be in the global resources.txt file.

Another way to handle this, which might be more consistent, would be to
simply have the global resources.txt give a set of default attributes
associated with each resource. Then, the "source: gpml" field in a node's
resource entry would simply tell it to substitute in the corresponding
attributes from the global entry. That way, the two attribute sets will be
concatenated, and the view won't have to worry about whether the individual
fields came from the global resources file or the node-specific one.
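A small Python sketch of this default-attribute idea follows. The field values are placeholders, the `resolve` helper is hypothetical, and the rule that node-specific fields override the global defaults on a name clash is an assumption that the thread does not state explicitly.

```python
# Hypothetical global entry; the field values are placeholders, not real data.
GLOBAL = {
    "gpml": {
        "title": "Example Title",
        "resource_type": "book",
        "free": "1",
    }
}

def resolve(node_entry, global_resources):
    """Merge a node's resource entry over the defaults from its global entry."""
    defaults = global_resources.get(node_entry.get("source"), {})
    # Later keys win, so node-specific fields override the global defaults.
    return {**defaults, **node_entry}

merged = resolve({"source": "gpml", "location": "Section 2.2", "mark": "star"}, GLOBAL)
print(merged)
```

The view then receives a single dictionary and never needs to know which file a field came from.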

The tools could take the form of standalone programs for editing the
text which provide autocomplete. Or it might take the form of emacs/vim
plugins.

Any reason to develop standalone programs or vim/emacs plugins instead of
incorporating these features into the browser?

To avoid reinventing the wheel. The text editors already have lots of
features that people like, so there's no sense in forcing everyone to use a
common web form interface.

cjrd · 2013-04-14T16:33:41Z

Why would templates involve Javascript? I wouldn't expect them to involve
anything more than simple HTML tags such as or . We'd need to define
some way to interpolate the field values, but that shouldn't have to be too
complicated.

I guess that depends where/how you want to use the template. The current resources display for the knowledge map uses quite a bit of CSS and a little bit of JavaScript in the [additional info] link. But feel free to rewrite this so that it only uses basic HTML tags.

Another way to handle this, which might be more consistent, would be to
simply have the global resources.txt give a set of default attributes
associated with each resource. Then, the "source: gpml" field in a node's
resource entry would simply tell it to substitute in the corresponding
attributes from the global entry. That way, the two attribute sets will be
concatenated, and the view won't have to worry about whether the individual
fields came from the global resources file or the node-specific one.

Sounds good to me

To avoid reinventing the wheel. The text editors already have lots of
features that people like, so there's no sense in forcing everyone to use a
common web form interface.

I don't think using the browser is "reinventing the wheel." Data entry via a browser is not a new concept, and there are lots of highly developed libraries, e.g. jquery, that provide a host of robust features we could use to craft a nice IDE. But yes, my favorite text editor is more comfortable than a browser, and it would be nice to improve it for editing kmaps.

That being said, the bottleneck for agfk will be getting users to contribute content. Telling someone to clone our content git repository, create and edit six text files for each node, then send us a pull request with the content is a pretty big overhead (and adding that they should use our emacs plugin doesn't help much in this regard). And quite frankly, I doubt more than a handful of individuals in CS fields would contribute. So I feel we should spend a lot of time developing the browser interface to the content. This way, users can simply click a button ("add node"), fill in the data, and have it immediately available. We can even incorporate realtime visualization of the graph they're creating. The point is that we want to entice users into contributing, not make them jump through hoops.

rgrosse · 2013-04-14T19:04:43Z

It's not a matter of browser vs. text editor, and it could be that
libraries like jquery turn out to be the best way to construct a GUI for
editing the text files. There's a whole ecosystem already built up around
text files, including text editors, UNIX command line tools, Git, Github,
etc. In order to replace text files, we'd have to reimplement a lot of
functionality associated with each of these in order to make it as usable
and intuitive.

Assuming we go with the two-tiered system, it's already possible for people
to make relatively self-contained contributions (adding stuff for
individual nodes) through the web interface. It shouldn't be too hard to
set it up so you can add new nodes this way either. The only obstacle is
for making complex changes like splitting nodes, and this is going to be
tricky no matter how we handle it. I'm sure we can make a graphical
interface that's easier to use than text files, but I doubt our version 1.0
will be.

Now, if people in other fields get excited about this and are willing to
put in a lot of time to build up whole maps, I agree we should do
everything we can to make things easy for them. This is certainly a problem
we'd like to have. But I think we can worry about this when the time comes.
Let's just make it clear that we're happy to talk to them to figure out
what would be easiest for them. Then we'd be able to iterate with actual
users. Hopefully the combination of their feedback and the experience of
contributors under the current format will let us design an interface
that's simpler and more intuitive than whatever we'd come up with now.

cjrd · 2013-04-17T12:33:06Z

I agree that we shouldn't focus on allowing complex changes through the web interface at this time.

My argument is that, given the choice between developing a backend tool to accomplish a given task and developing frontend capabilities to accomplish the same task, we should focus on the latter if they require the same amount of effort. For instance, entering new resources into resources.txt is currently not set up that well. We have to manually check that a source is unique, both by name and content. We could either (i) write a python script or emacs extension that parses the resources.txt file and checks that the resource is unique or (ii) use a frontend form to submit resources that performs the same verification. Both of these tools require roughly the same amount of effort to build, but option (ii) should be easier for the end user: simply fill out the required fields and press "send". This tool can be used without forking our repository, downloading an emacs extension, or running a python script.

I believe that in the long run, content editing should take place mostly from the web interface, simply because most users won't want to download/understand our entire project in order to contribute. So it makes sense to start developing these frontend tools now. This way we can debug the interfaces, begin seeing what type of functionality works well, and provide the eventual users with a well-polished interface. I agree that we should work with outsiders to improve this interface (when the time comes, that is), but presenting a user with a good prototype and asking "how can we improve this" is better IMO than asking "what do you imagine to be a good interface".

rgrosse · 2013-04-17T15:25:19Z

I think we both agree that we eventually want all the content editing to be
done in a way that's more convenient than just editing text files. But it's
going to be a while before the GUI is robust enough to replace the text
version completely. We'll tackle a lot of the same questions anyway as we
work on the user content editing forms, and at some point it'll become
clear that it's time to get rid of the text and replace everything with
that interface. Before that happens, I think the two-tiered system is a
way to get something up and running quickly which has 90% of the desired
functionality. Whenever there are bugs or missing features in the
submission form -- and there will be -- we have the text DB to fall back
to.

In terms of your example, editing resources.txt requires pressing Ctrl-S to
search for the key and the name of the textbook. We'd still have to do
something analogous through the online form, e.g. search for the resource
name to see if it exists already. (And the search would have to be flexible
enough to account for variations on the title. E.g., "Coursera: Neural
Networks" would have to turn up "Coursera course on neural networks.") This
could be slightly more convenient if done right, but probably not a huge
time saver.
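For illustration, one simple way to get that kind of flexibility is to compare word sets rather than exact strings. The Python sketch below is an assumption about how such a search might be built: `search_titles`, `tokens`, and the 0.6 threshold are hypothetical choices, not part of any existing tool.

```python
import re

def tokens(text):
    """Split text into a set of lowercase alphanumeric words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_titles(query, titles, threshold=0.6):
    """Return titles containing at least `threshold` of the query's words, best first."""
    query_tokens = tokens(query)
    scored = []
    for title in titles:
        # Fraction of the query's words that also appear in this title.
        overlap = len(query_tokens & tokens(title)) / max(len(query_tokens), 1)
        if overlap >= threshold:
            scored.append((overlap, title))
    return [title for overlap, title in sorted(scored, reverse=True)]

titles = [
    "Coursera course on neural networks",
    "Pattern Recognition and Machine Learning",
]
print(search_titles("Coursera: Neural Networks", titles))
```

Word overlap handles reordering and extra filler words, but it would still miss abbreviations and misspellings, so this only covers part of the variation described above.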

cjrd · 2013-04-23T11:50:25Z

Yes, you're right. I realize we're nowhere near orienting kmaps towards "mass appeal" and we probably shouldn't focus too much on mass usability at this time; it's premature. We can reevaluate this issue as agfk evolves. I would certainly like to eventually do some iterative "focus group" type studies to build the content editing frontend.

rgrosse closed this as completed Aug 24, 2013
