flexible import of csv exports #127

Open
mfrasca opened this Issue Apr 14, 2016 · 5 comments


mfrasca commented Apr 14, 2016

From @mfrasca on December 17, 2015 13:10

@RoDuth writes in #55:

When I say csv dump I'm more thinking about the kind of output you can get from the reporting system (where you only output selected fields from the search results, etc.); it's just that there is no way of importing this at this point. (e.g. if I have a batch of seedlings from a known collection source and I use half of them for a new accession for us and give the other half to another garden, I can pull our records up in Bauble and generate a report which provides our accession number, all the collection data and any other fields that may be of relevance to the receiving garden. Then when they get it, if it is in some universal format such as csv, they could import it into their database when they generate their accessions.) And in this context I say csv only as it is so universally used. It's more the scenario that interests me, and from our discussion the other day I think I remember you talking along the same lines in regard to the use of JSON. I have absolutely no JavaScript or JSON understanding so won't proffer a solution here, but is there a way to make the two work together? I think most systems are able to produce a basic csv - can this be transformed, say using a script for instance, into JSON? Anyway, I'm out of my depth here, you get the point. Just looking for as universal a solution as possible.

Copied from original issue: Bauble/bauble.classic#222

mfrasca commented Apr 14, 2016

Ross @RoDuth, this is the reason why I often repeat that a group of gardens should adopt Bauble and temporarily hire a programmer.
the JSON import-export tool came out of the JBQ period; it has been useful to them, and I'm using it extensively to import data into existing databases, but it does not import-export all the fields of all the database objects: only the ones the JBQ was interested in at the time we migrated their data. see for example #206.
how about you give me an example csv export like the one you mention, then I can tell you more. if the fields are already included in the current JSON export, maybe it's just half an hour of work, or less than one day, just the time to write an external script. if they're not included, it will take more than a day but less than a week.

mfrasca commented Apr 14, 2016

From @RoDuth on December 18, 2015 6:48

I agree, it would be ideal to have a group of gardens. I'm not sure I am of much use in that regard; we have shared material with several gardens in the past, but not repeatedly with any particular garden, and I think they are all settled on the systems they use anyway.

I will see what I receive from ANBG and hopefully get it to you soon. I might also see if I can get another example from one of the gardens that uses BGRecorder (assuming they can provide it).

mfrasca commented Jul 26, 2016

taken from an email to @RoDuth, replying about producing the JSON stuff...

you read in a record,

you make sure it's in dictionary form, not just an array, (if you're reading csv files, there's an option for it in the standard csv library reader)
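
a minimal sketch of these first two steps, untested, assuming the export sits in a file called export.csv (just a placeholder name) with a header row; csv.DictReader from the standard library hands back each row as a dictionary keyed by column name:

```
import csv

# placeholder file name; each row comes back as a dict keyed by the header row
with open('export.csv') as f:
    records = list(csv.DictReader(f))
```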

you will need a format string for:

  • family
  • genus
  • species
  • location
  • accession
  • plant

something that becomes a legal JavaScript representation of an object once you do this in Python: format_string % record

for each format string, you populate its fields with the fields from the dictionary (the record you read),

you accumulate the strings into sets, one set per type as listed above, and you are using sets so you are automatically saved from duplicates,

then you save the strings into a single json file, in the order of the list above. (you will need a leading [, a trailing ], and commas separating strings, and no comma after the last record, before the trailing ])
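
a rough, untested sketch of the accumulate-and-write part; FORMAT_STRINGS is a hypothetical dict mapping each of the six types above to its format string, records is the list of row dictionaries read earlier, and import.json is a placeholder output name:

```
# one set per object type, kept in the import order listed above;
# using sets means duplicate strings are dropped automatically
TYPES = ['family', 'genus', 'species', 'location', 'accession', 'plant']
objects = dict((t, set()) for t in TYPES)

for record in records:
    for t in TYPES:
        objects[t].add(FORMAT_STRINGS[t] % record)

# a single JSON array: leading '[', objects separated by commas,
# no comma after the last one, then the trailing ']'
with open('import.json', 'w') as out:
    out.write('[\n')
    out.write(',\n'.join(s for t in TYPES for s in sorted(objects[t])))
    out.write('\n]\n')
```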

ghini should be able to read it; if not, you edit the file by hand and correct the script.

possibly (I'm guessing, building it now, you may check), these could be the format strings for family, genus, species:

{"epithet": "%(family)s", "object": "taxon", "rank": "family"}

{"author": "%(genus_author)s", "epithet": "%(genus)s", "ht-epithet": "%(family)s", "ht-rank": "family", "object": "taxon", "rank": "genus"}

{"epithet": "%(species_epithet)s", "ht-epithet": "%(genus)s", "ht-rank": "genus", "hybrid": %(is_species_hybrid), "object": "taxon", "rank": "species", "author": "%(species_author)s"},

you feed all of them with the same dictionary, in which I would make sure I have 'species' as the binomial name, and 'species_epithet' as the species epithet. then species_author, genus_author, location, is_species_hybrid, is_genus_hybrid, and I guess you can figure out the rest. the accession needs to refer to the complete species binomial name.
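
for example, if the export has a single column holding the binomial (I'm calling it 'binomial' here, a guess at the header), the extra keys could be filled in with something like this, untested:

```
def complete(record):
    # 'binomial' is a guessed column name; split it into genus and epithet,
    # and keep the complete binomial under 'species' as said above
    parts = record.get('binomial', '').split()
    record['genus'] = parts[0] if parts else ''
    record['species_epithet'] = parts[1] if len(parts) > 1 else ''
    record['species'] = record.get('binomial', '')
    return record
```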

as said: 'untested', it's just an idea.

it might be cleaner to work differently, using the json standard library module, but this sounds very simple to me and you do not really need to work with JSON, it's just Python strings.

you might have trouble with 'false/False' and 'true/True', as I'm afraid JavaScript will only accept the lower case versions. So maybe you first convert the Python boolean values into the correct JavaScript string representation:

```
for k in ['is_species_hybrid', 'is_genus_hybrid']:
    row[k] = ("%s" % row[k]).lower()
```

I'm noting it here so we all have these notes and can work on it. flexibility can come later.

mfrasca added a commit that referenced this issue Aug 4, 2016

mfrasca commented Sep 14, 2016

A garden in NL would adopt Ghini once this issue is solved.

mfrasca added the prio:high label and removed the prio:low label Sep 16, 2016

mfrasca commented Dec 23, 2016

one more request, this time from Colombia. so let's just do this and see where we end up.
the data is relatively well structured; the only trouble is with the binomial name, which packs several fields together, and with the vernacular field, which is used to hold the family when the plant is not identified any better.

if all goes well, the header is (each field prefixed with P-plant, S-species, A-Accession, or multiple letters if I'm not so sure):
A-accession_no, S-vernacular, S-binomial+authorship, P-DBH, P-Height, S-habit, P-altitude, P-easting, P-northing, SP-ecosystem, SAP-notes
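
just to make the prefixes concrete, a guessed column-to-object mapping, purely illustrative (the real header decides the names):

```
# guessed mapping from export column to the object it mostly feeds
COLUMN_TARGETS = {
    'accession_no': 'accession',
    'vernacular': 'species',
    'binomial+authorship': 'species',
    'DBH': 'plant',
    'Height': 'plant',
    'habit': 'species',
    'altitude': 'plant',
    'easting': 'plant',
    'northing': 'plant',
    'ecosystem': 'species',   # SP: species or plant
    'notes': 'accession',     # SAP: species, accession or plant
}
```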

if not all goes well (maybe 10% of the cases), the binomial name may:

  • contain the identification qualifier, written either cf or cf.
  • lack the authorship
  • contain a question mark
  • contain the vernacular name (and the vernacular column stays empty)
  • contain an incomplete identification (the genus plus one of 'sp', 'sp1', 'sp.1', 'sp 1', etc.)
  • be empty, in which case the vernacular holds either a vernacular name or a family name.
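
a rough, untested sketch of how that binomial column could be pulled apart (the key names are my own, not the real export header):

```
import re

# matches 'sp', 'sp1', 'sp.1', and the first token of 'sp 1', etc.
SP_TOKEN = re.compile(r'^sp\.?\d*$', re.IGNORECASE)

def parse_binomial(binomial):
    # question marks are simply stripped; an empty binomial comes back
    # all-empty (the vernacular column then holds a vernacular or family
    # name); a vernacular name hiding in this column lands in 'genus'
    # and needs manual review
    result = {'genus': '', 'epithet': '', 'qualifier': '', 'author': ''}
    words = binomial.replace('?', ' ').split()
    if not words:
        return result
    result['genus'] = words.pop(0)
    if words and words[0].rstrip('.').lower() == 'cf':
        result['qualifier'] = 'cf.'
        words.pop(0)
    if words and SP_TOKEN.match(words[0]):
        # incomplete identification: genus plus 'sp', 'sp1', 'sp.1', 'sp 1', ...
        words.pop(0)
        if words and words[0].isdigit():
            words.pop(0)
    elif words:
        result['epithet'] = words.pop(0)
    # whatever remains is taken as the authorship, when present
    result['author'] = ' '.join(words)
    return result
```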

mfrasca added a commit that referenced this issue Dec 23, 2016

mfrasca added a commit that referenced this issue Jun 1, 2017

mfrasca added the JBQ label Aug 16, 2017

mfrasca added a commit that referenced this issue Aug 30, 2017

RoDuth pushed a commit to RoDuth/ghini.desktop that referenced this issue Sep 8, 2017

RoDuth pushed a commit to RoDuth/ghini.desktop that referenced this issue Dec 30, 2017

RoDuth referenced this issue in RoDuth/ghini.desktop Aug 14, 2018

make csv imports work with multiple empty files
Previously they had failed if an empty table (csv file) had a
dependency on another of the empty tables that had not been committed
yet (e.g. prop_cutting_rooted, prop_cutting).  Also, this way you get a
warning that you are about to replace the current table (it may contain
data).

mfrasca removed the prio:high label Aug 21, 2018

mfrasca added a commit that referenced this issue Nov 17, 2018

mfrasca added a commit that referenced this issue Nov 18, 2018
