Make vma.py support XLS files #2
Comments
It would be great if a single Excel file for a solution could supply VMA (issue #2), customadoption/adoptiondata/etc (issue #3), plus allow additional sheets for various and sundry stuff specific to the solution which doesn't benefit from being done within Python. A way to do this would be to check if a sheet named "VMA" or "VMA Data" exists, and use it if it does. |
Let me see if I've got this right. Currently I think we have
And what's being proposed is (I think)
Is this correct, that the new files are intended to be in addition to the existing files? Or should something be replaced? Also, are the long-form/short-form column names what is intended? I've been assuming that the new Could you also clarify what you mean by supporting multiple VMA definitions within one file? For example, right now in
|
At this point I'd recommend maximizing for the human usage: long-form column names, in an XLSX file, with multiple VMAs defined within one "Variable Meta-analysis" sheet essentially identical to what is currently in the XLSM file. (In earlier comments I misremembered this as being named "VMA Data".) A number of the existing XLSM files use their "Variable Meta-analysis" sheet for calculations like:
We did not consider any of these uses when we first created the CSV files; we just had tools/vma_xls_extract.py fetch the final computed values and write them to CSV. So using the cars example, I think it would be great if we could:
A couple of other notes:
|
Also: it would be nice if model/VMA.py continued to support use of a solution/<name>/vma_data/VMA_NAME.csv file, so that we don't have to change all of the solutions at once. In actual usage I would expect that if a VMA.xlsx file is present, there would probably be no CSV files for VMAs in that solution, and all of the VMAs would be defined in the XLSX file. |
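A minimal sketch of the fallback described in the comment above, for discussion only. The function name `find_vma_source`, the placement of `VMA.xlsx` directly in the solution directory, and the choice to prefer the workbook over a CSV when both exist are all assumptions made for illustration; none of this is existing code in model/vma.py.

```python
from pathlib import Path


def find_vma_source(solution_dir, vma_name):
    """Return the file that should back a VMA (hypothetical helper).

    Assumption: a workbook named VMA.xlsx in the solution directory takes
    precedence; otherwise fall back to the per-VMA CSV file under vma_data/.
    """
    solution_dir = Path(solution_dir)

    workbook = solution_dir / "VMA.xlsx"
    if workbook.is_file():
        return workbook

    csv_file = solution_dir / "vma_data" / f"{vma_name}.csv"
    if csv_file.is_file():
        return csv_file

    raise FileNotFoundError(
        f"no VMA.xlsx or vma_data/{vma_name}.csv found in {solution_dir}"
    )
```

With this shape, solutions that still ship only CSV files keep working unchanged, and a solution can be migrated by adding a workbook without touching its callers.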
Okay, I've made an example Do you know of any examples that have non-empty "Variable Meta-Analysis" sheets? All the |
The Excel files checked into the repository (and later deleted) are the Public versions. Some of the data used in the Drawdown models is licensed and has restrictions on redistribution. The Public models copy the VMA information to a Variable Meta-AnalysisDD sheet but remove the raw data, to avoid redistributing it publicly. The DD tab retains the Mean and standard deviation computed from the VMA data, which is sufficient for the model to run and generate results but avoids redistributing the licensed data. I placed two of the full files in https://drive.google.com/drive/folders/16ToiESaPkpz8Z-Hda2r2SuDIIoKXp1ik and made it available to anyone with the link. The CSP file is from solution/concentratedsolar, the SolarPVUtility file is from solution/solarpvutil. I'll leave them there for a while, long enough to retrieve them to work on this issue, though I'll need to take them down later. |
Thanks, you can delete them now. I'm finding it hard to make time to work on this, but I'd like to keep trying. |
I believe this is closed by #126 |
model/vma.py implements Variable Meta-Analysis (VMA), which produces a single value for an input variable, such as the yield for soybeans or the cost of a megawatt from natural gas power plants. Researchers at Project Drawdown vet data sources as inputs, ideally several per variable, and vma.py collects these sources and produces a single resulting value. This is typically the mean, though the mean plus or minus a multiple of the standard deviation is also common.
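As a rough illustration of the calculation described above, the sketch below reduces a list of source values to a mean and to the mean plus or minus a multiple of the standard deviation. The function name `vma_summary` and the sample numbers are made up, and the use of the population standard deviation is an assumption; this issue does not say which variant vma.py uses, nor does the sketch cover any weighting or outlier handling.

```python
import statistics


def vma_summary(values, sd_multiple=1.0):
    """Return (mean, high, low) for a list of source values (hypothetical helper)."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation; the real model may differ.
    sd = statistics.pstdev(values)
    return mean, mean + sd_multiple * sd, mean - sd_multiple * sd


# Made-up source values, chosen so the arithmetic is easy to check by hand.
sources = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, high, low = vma_summary(sources)
print(mean, high, low)  # mean is 5.0, sd is 2.0, so high is 7.0 and low is 3.0
```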
Right now in the __init__ method we call VMA._read_csv, which reads in a CSV file. The existing CSV files were produced via the vma_xls_extract.py code generator from the original Drawdown Excel models.
This issue concerns adding support for XLS files for VMA input, to allow researchers to more easily perform data normalizations like currency or unit conversion.
The desired steps are:
In vma.py VMA::__init__, check the file extension of the data source and add support for *.xlsx and *.xlsm.
For CSV we require a separate file for each VMA. For Excel we want to support multiple VMA definitions within one file, so that the researcher can implement the needed conversions once rather than keeping copies in multiple files.
Therefore, the code should open the Excel file and then search for the definition of its VMA. Searching for the name of the VMA within the first sheet of the workbook, and figuring out where the VMA definition is below that, is preferred.
advanced_controls.py, which instantiates VMA objects, knows the human-readable Title of the VMA it is looking for. VMA::__init__ does not currently receive the Title as a parameter, but it can be added.
Please do not add a default value; the codebase is small enough that we can update all existing callers to pass in a proper value. If the backing file is CSV, the title argument may simply go unused.
Add unit tests to model/tests/test_vma.py. Add an Excel file for use in the test in model/tests/data.
Please use Pandas read_excel() and ensure it works with the 'xlrd' backend, as we already have dependencies on xlrd in the tree. We do not currently have any dependencies on other Excel+Python packages like openpyxl, and would prefer not to add new dependencies without a really good reason.
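The first few steps above could be sketched roughly as follows. This is not the implementation in vma.py: the helper names (`source_kind`, `find_title`, `rows_below_title`), the sample column headers, and the assumed sheet layout (a title cell, then a header row and data rows, ended by a blank row) are all hypothetical. The sheet is represented as a plain list of rows so the search logic can be shown without a workbook; in real code the grid would come from Pandas read_excel(), where empty cells appear as NaN and the blank-cell check would need adapting.

```python
from pathlib import Path

EXCEL_SUFFIXES = {".xlsx", ".xlsm"}


def source_kind(filename):
    """Classify a VMA data source by file extension (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return "csv"
    if suffix in EXCEL_SUFFIXES:
        return "excel"
    raise ValueError(f"unsupported VMA data source: {filename}")


def find_title(rows, title):
    """Return (row, col) of the first cell matching title, or None."""
    wanted = title.strip().lower()
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if isinstance(cell, str) and cell.strip().lower() == wanted:
                return r, c
    return None


def rows_below_title(rows, title):
    """Return the rows after the title row, up to the first blank row.

    Assumption: a VMA definition ends at the first fully empty row.
    """
    found = find_title(rows, title)
    if found is None:
        raise KeyError(f"VMA title not found: {title}")
    block = []
    for row in rows[found[0] + 1:]:
        if all(cell is None or cell == "" for cell in row):
            break
        block.append(row)
    return block


# Made-up sheet contents with two VMA definitions in one sheet.
rows = [
    ["Current Adoption", None, None],
    ["Source", "Value", "Units"],
    ["Source A", 10.0, "TWh"],
    [None, None, None],
    ["First Cost", None, None],
    ["Source", "Value", "Units"],
    ["Source B", 1500.0, "US$/kW"],
    ["Source C", 1700.0, "US$/kW"],
]
block = rows_below_title(rows, "First Cost")
```

Passing the title explicitly, with no default, mirrors the request above: a CSV-backed VMA can ignore it, while an Excel-backed VMA needs it to locate its definition.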