Incorporate preprocessing reporting into trade module #67

Closed
malexan opened this issue Jan 23, 2017 · 7 comments

malexan commented Jan 23, 2017

@malexan:

I suggest the view below. Messages will be displayed in the R session in interactive mode and simultaneously appended to a text file. The file will be accessible to any (authorised) SWS user.
The priority prefix (TRACE, INFO, WARN, ERROR, etc.) and the time stamp are optional. Reports for different situations (text files for users or interactive sessions for R developers) can have different levels of detail (all messages, or only warnings and errors) and different formats (time stamps, etc.).

INFO [2016-12-12 10:33:25] Archive file: ~/ce_combinednomenclature_unlogged_2013.csv.gz
INFO [2016-12-12 10:33:25] Trade data source: Eurostat 
INFO [2016-12-12 10:33:25] Start reading the archive file 
INFO [2016-12-12 10:35:55] Trade data records total: 11318256 
INFO [2016-12-12 10:35:55] HS chapters to select: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 33, 35, 38, 40, 41, 42, 43, 50, 51, 52, 53
INFO [2016-12-12 10:35:58] Records after filtering by HS-chapters: 2468146
INFO [2016-12-12 10:36:01] Records after filtering by stat regime (ES only): 2388041 
INFO [2016-12-12 10:36:15] Records after removing duplicates: 145860
INFO [2016-12-12 10:36:15] Records after filtering out nonnumeric reporters: 139304 
INFO [2016-12-12 10:36:16] Records after filtering out nonnumeric HS: 138096 
INFO [2016-12-12 10:36:16] Records after filtering out HS shorter than 6: 136139 
INFO [2016-12-12 10:36:17] Records after filtering out HS outside agri intervals: 90722
INFO [2016-12-12 11:21:15] Reports in /home/sas/tmp/Rtmph47dEx/faoreports/20161212104537MSK/
INFO [2016-12-12 11:21:46] Multi and no link:
  totalrecsmulti totalnolink  propmulti   propnolink
1           2352          11 0.02592535 0.0001212495

Besides such reports, data sets from all important processing stages are saved as csv files. I've sent you a similar file with unmapped HS->FCL links before.
The report and the corresponding csv files are saved in a separate folder (with a time stamp in its name), so every run of the module is traced by its own set of report files.
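
As a rough illustration of the proposal above (messages shown in an interactive session and appended to a text file, with an optional time stamp and a level threshold), here is a minimal base-R sketch. The names `rprt_log` and `rprt_levels` are hypothetical and are not part of the trade module; the module itself calls `flog.info()`, which this sketch does not use so that it runs without extra packages.

```r
# Hypothetical helper, for illustration only: not the module's actual logger.
rprt_levels <- c(TRACE = 1, INFO = 2, WARN = 3, ERROR = 4)

rprt_log <- function(msg, level = "INFO", file = NULL,
                     threshold = "INFO", timestamp = TRUE) {
  # Skip messages whose priority is below the chosen threshold
  if (rprt_levels[[level]] < rprt_levels[[threshold]]) return(invisible(NULL))
  stamp <- if (timestamp) format(Sys.time(), "[%Y-%m-%d %H:%M:%S] ") else ""
  line <- paste0(level, " ", stamp, msg)
  # Show the message in the R session only when it is interactive
  if (interactive()) message(line)
  # Append the same line to the report file, if one is given
  if (!is.null(file)) cat(line, "\n", sep = "", file = file, append = TRUE)
  invisible(line)
}

# Toy data (made up) to show logging of record counts after a filter step
report_file <- tempfile(fileext = ".txt")
records <- data.frame(hs = c("010110", "0101", "AB0110"),
                      stringsAsFactors = FALSE)
rprt_log(sprintf("Trade data records total: %d", nrow(records)),
         file = report_file)
records <- records[nchar(records$hs) >= 6, , drop = FALSE]
rprt_log(sprintf("Records after filtering out HS shorter than 6: %d",
                 nrow(records)),
         file = report_file)
# A TRACE message is below the default INFO threshold, so it is not written
rprt_log("Debug detail", level = "TRACE", file = report_file)
```

Passing `threshold = "WARN"` would give the shorter user-facing report, while `threshold = "TRACE"` would keep every message for developers.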

Carola:

This log report is very useful, but the csv file will be essential (it is not very reader-friendly :-) ).

The report should contain more information (number and list of reporting countries, reported trade flow by country, number of items and records by country, etc.) to give a more detailed snapshot of the input data.
Question: will there be two separate reports? One for UNSD data and one for Eurostat data?

  totalrecsmulti totalnolink  propmulti   propnolink
1           2352          11 0.02592535 0.0001212495

I need an explanation of this table.
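
One possible reading of the table, inferred only from the numbers in the log and not confirmed anywhere in this thread: the two proportions match the two counts divided by the 90722 records left after the last filtering step. What "multi" and "no link" refer to (presumably HS->FCL links) remains an assumption. A small R check of the arithmetic:

```r
# Values copied from the log above
records_final  <- 90722  # records after filtering out HS outside agri intervals
totalrecsmulti <- 2352
totalnolink    <- 11

# Assumed definition: share of the remaining records in each category
propmulti  <- totalrecsmulti / records_final
propnolink <- totalnolink / records_final

print(signif(c(propmulti = propmulti, propnolink = propnolink), 7))
```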

@chrMongeau:

(Though I agree on the low user-friendliness of the output: I think team B/C would like to have pivot/dynamic tables, graphs, and other bells and whistles, but that is just an impression, i.e., a wild guess.)

@malexan malexan self-assigned this Jan 23, 2017

malexan commented Jan 23, 2017

@chrMongeau have you run the trade module recently? If so, did you get a file with the [tiny] report?

@chrMongeau

@malexan I haven't updated the package locally. I'll try to do it today (or possibly tomorrow).

malexan commented Jan 25, 2017

@chrMongeau:

I merged the reporting branch into master, but I see that flog.info() is used only at the very beginning of the file and not at all after loading esdata. I did not find the code with which you obtained the log in #67.

A folder was to be created in R_SWS_SHARE_PATH, with a name of the form

file.path(
  Sys.getenv("R_SWS_SHARE_PATH"),
  SWS_USER,
  paste0("tradereport_",
         format(Sys.time(), "%Y%m%d%H%M%S%Z")))

and with report.txt inside it.

See details in the code.

The report contains just a few lines (I want to test file writing in that SWS share folder).
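
A minimal sketch of how such a folder and a small report.txt could be created, for anyone who wants to try it outside SWS. The fallback to `tempdir()` and the value of `SWS_USER` are assumptions made so that the snippet runs anywhere; the actual module code may differ.

```r
# Assumption: fall back to tempdir() when R_SWS_SHARE_PATH is unset or empty
share_path <- Sys.getenv("R_SWS_SHARE_PATH", unset = tempdir())
if (!nzchar(share_path)) share_path <- tempdir()

# Placeholder value; in SWS the user name comes from the session
SWS_USER <- "example_user"

# Same naming scheme as described above: tradereport_<timestamp>
report_dir <- file.path(
  share_path,
  SWS_USER,
  paste0("tradereport_",
         format(Sys.time(), "%Y%m%d%H%M%S%Z")))

# Create the folder (and any missing parents), then write a short test report
dir.create(report_dir, recursive = TRUE, showWarnings = FALSE)
report_path <- file.path(report_dir, "report.txt")
writeLines("INFO Test line written to the share folder", report_path)
```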

malexan commented Jan 25, 2017

I merged with the hs6agrifilter#66 branch, so main.R now has two runs of filterHS6FAOinterest(): one for esdata and one for tldata.

The function is in the R subdirectory but is not exported, as I failed to run devtools::document().

@malexan malexan added the reports label Feb 2, 2017

malexan commented Feb 21, 2017

@carola-f:

Below is a list of problematic countries. Could you map the HS codes of these countries?
It is an important question because missing or incomplete trade data could explain the anomalies in the final Food Balance Sheets results. This is true in particular for:

  • 122 Lesotho
  • 147 Namibia
  • 188 Saint Kitts and Nevis
  • 191 Saint Vincent and the Grenadines
  • 193 Sao Tome and Principe

To get such statistics systematically, the report needs to include sections that allow us to check:

  1. Summary of the original trade data sets for every reporter
  2. Summary of the results of transformations per reporter at every major step
  3. Final summary per country.

To keep the report compact, the summary for each country should fit on one line.
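
A possible shape for such a one-line-per-reporter summary, sketched in base R. The function name `rprt_summary_by_reporter`, the column names, the flow coding and the toy records are all assumptions for illustration; only the reporter codes come from the list above.

```r
# Hypothetical trade records; column names and values are made up
trade <- data.frame(
  reporter = c(122, 122, 147, 147, 147, 188),
  flow     = c(1, 2, 1, 1, 2, 1),  # assumed coding: 1 = import, 2 = export
  hs       = c("010110", "020120", "010110", "100190", "100190", "040210"),
  stringsAsFactors = FALSE
)

# One row per reporter: record count, distinct HS items, records per flow
rprt_summary_by_reporter <- function(data) {
  parts <- split(data, data$reporter)
  rows <- lapply(parts, function(d) {
    data.frame(
      reporter = d$reporter[1],
      records  = nrow(d),
      items    = length(unique(d$hs)),
      imports  = sum(d$flow == 1),
      exports  = sum(d$flow == 2)
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

summary_tbl <- rprt_summary_by_reporter(trade)
print(summary_tbl)
```

Calling the same function on the data set saved after each major step would give the per-step summaries from item 2.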

malexan commented Feb 21, 2017

On the other hand, it would be useful to be able to extract walk-through results for a specific country.

Either we run the module for a specific subset of reporters (up to the mirroring step), or we keep a place (an R list, for example) where we store all intermediate results for everything and then extract the pieces of interest in any order and in any output format.
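
The second option could look roughly like the sketch below: a named list holds a snapshot of the data after each step, and a helper pulls out the walk-through for one reporter. The stage names, the helper `rprt_walkthrough` and the toy data are hypothetical.

```r
# Toy input (made up); reporter 122 has a duplicate row and a short HS code
raw <- data.frame(
  reporter = c(122, 122, 122, 147, 188),
  hs       = c("010110", "010110", "0101", "100190", "040210"),
  stringsAsFactors = FALSE
)

# Store a snapshot after each processing step in one named list
stages <- list(raw = raw)
stages$hs6   <- stages$raw[nchar(stages$raw$hs) >= 6, ]  # drop HS shorter than 6
stages$dedup <- unique(stages$hs6)                       # remove duplicate rows

# Record counts for one reporter at every stored stage
rprt_walkthrough <- function(stages, reporter_code) {
  counts <- vapply(stages,
                   function(d) sum(d$reporter == reporter_code),
                   integer(1))
  data.frame(stage = names(stages), records = unname(counts),
             stringsAsFactors = FALSE)
}

walk_122 <- rprt_walkthrough(stages, 122)
print(walk_122)
```

Keeping full snapshots costs memory on the real data volumes, so storing only per-reporter counts at each stage may be the more practical variant.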

malexan commented Feb 22, 2017

We need to make decisions on several development questions:

  1. How to store the data-processing algorithms used to generate reports.
  2. Whether we need to store calculations prepared for reports and, if so, how to store them.

After experimenting with mixing core module code and reporting instructions, I decided it is better to split core processing from report generation.

In further development we can give an advanced user the possibility of modifying reports without having to change the core module. So reporting instructions should be stored and maintained separately from the module code.

Reporting features could be shared with other SWS modules. So the reporting source code should be developed with a possible split into a standalone R package in mind.

I suggest giving all reporting functions a common prefix, like str_ in the stringr package. We can use the rprt_ prefix.

All results of calculations that can be used again during the session should be stored in one list. Later the list can be saved alongside the text of the report and the csv files. We can use the name rprtdata.
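
To make the suggestion concrete, here is a minimal sketch of an rprtdata list with two rprt_-prefixed helpers. The helper names `rprt_store` and `rprt_save`, the file name rprtdata.rds and the example folder are hypothetical; the two stored numbers are taken from the log at the top of this issue.

```r
# One session-wide list for report results, as suggested above
rprtdata <- list()

# Hypothetical helper: add or replace a named result in the store
rprt_store <- function(store, name, value) {
  store[[name]] <- value
  store
}

# Hypothetical helper: save the whole list next to the report files
rprt_save <- function(store, dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, "rprtdata.rds")
  saveRDS(store, path)
  path
}

rprtdata <- rprt_store(rprtdata, "records_total", 11318256)
rprtdata <- rprt_store(rprtdata, "records_after_hs_chapters", 2468146)

# Assumption: a temporary folder stands in for the real report folder
saved <- rprt_save(rprtdata, file.path(tempdir(), "tradereport_example"))
restored <- readRDS(saved)
```

Because the list is saved as a single file, a report can be regenerated or reformatted later without rerunning the core module.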
