
Meetings, assignments, verbal discussions and questions, including those about math and programming #3

Closed
rainersachs opened this issue Aug 15, 2017 · 73 comments
Labels
Labels: discussion (Thread for the purposes of managing assignments and workflow.)


@rainersachs
Collaborator

I suggest we consider every file except today's upload by me and the Rmd file as obsolescent and try to transfer their information into the file I just uploaded. I forgot to ask Edward how to merge files within GitHub; I have done some of that (painfully) with RStudio. Once we are sure a particular file is obsolescent, let's rename it to contain the word OBSOLETE. I think we can already do that to every file except two: the one I just committed that ends in GH.R (for GitHub) and Mark's file. Mark: please change your filename to something more informative ending in GH.R, e.g. OurIDERs_vs.2017ccHazardGH.R.

@rainersachs
Collaborator Author

Edward: Thanks for the general style comments in HGSynergyMain.R. I only just found them and will read them soon. Can you open an issue for advice, move the style comments there, add the comment you made today about the command that freezes everything above a buggy line while you play with that line, and update the advice periodically?
As best I can tell, if we can somehow merge the information in HGSynergyMain.R plus the two files I just renamed to end in GH.R we are almost done except for 95% CI on I(d). But merging without losing information or introducing bugs looks like it could be a very nasty job to me. Do you know how to merge within GitHub? If so I think the best way might be for the two of us to do it together in a long session. Otherwise it would, I think, take either one of us alone something like 20 hours to do it.

Edward: please close the other two issues unless you have reason to keep them open.

@eghuang eghuang closed this as completed Aug 22, 2017
@eghuang eghuang reopened this Aug 22, 2017
@eghuang
Member

eghuang commented Aug 22, 2017

Ray: I think we may have to do this manually. There are two cases of easy merges that I know of:

  1. File A and file B have code that are mutually irrelevant, i.e. nothing in either script affects the behavior of the other. We can just copy and paste the whole script in this case.
  2. File B is an updated version of file A, i.e. file A's script is obsolete. We can just copy B over A or delete A.

Our merge obviously falls into neither of these categories, and I am not aware of a way for GitHub to know which script we wish to keep or discard between our files. I will try merging the two today and tomorrow, and if that does not work then we can work on it together.

@rainersachs
Collaborator Author

rainersachs commented Aug 23, 2017 via email

@eghuang
Member

eghuang commented Aug 24, 2017

I've merged the scripts into one file, HGsynergyMain_merge.R, such that there are no errors in running the code and the plots are the same as they were when the code was in separate files. I've also removed redundant code between the files and left comments with the tag #egh where values differ between files (e.g. phi <- 3e3 #egh phi <- 1000 in HGsynergyHZE_GH.R).

@rainersachs
Collaborator Author

rainersachs commented Aug 24, 2017 via email

@rainersachs
Collaborator Author

Edward (and Mark). HGsynergyMain_merge.R runs nicely on my machine. Commenting is really clear. Thanks.

I will next work on trying to eliminate redundancies and obsolete parts from that script. For example, I think we ended up with two inconsistent models for the fast light ions, with the one on lines 134-144 and line 195 obsolete (though functional). And my comments need to be brought up to date, not to mention my needing to try to follow your style tips.

I suggest you and Mark next work on the last unexplored and hardest part: Monte-Carlo simulations for MIXDER 95% CI using variance-covariance matrices. Assume a mixture of HZE only (no light, fast, low-LET ions until I have cleaned them up a little and eliminated the obsolete version, and you guys or I have solved the bugs mentioned in the comment on line 293 for the version that is not obsolete).

Mark (and Edward). I think we may be able to use the theory of functions of a complex variable in a non-trivial way in connection with defining an IDER via an ODE initial value problem dE/dd = F(E), E(0) = 0, where F(z) is a function of a complex variable z = x + iy = E + iy that has no singularities on the non-negative part of the real axis and whose restriction to that domain is real (e.g. any polynomial in z with real coefficients). The idea is to see if there are relations between the locations of the zeros and the behavior of the IDER. For example F = 1 + z^2 has zeros only on the imaginary axis, and the resulting IDER has the unpleasant property that E reaches +infinity at the finite dose d = pi/2; is that just a coincidence? In the 19th century people got a lot of mileage out of looking for the location of zeros (and, of course, for the location of singularities). I'll write this up for you guys if I ever have time.
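As a quick numerical sanity check on the F = 1 + z^2 example: restricted to the real axis, the initial value problem dE/dd = 1 + E^2, E(0) = 0 has exact solution E = tan(d), which blows up at d = pi/2. A hand-rolled base-R integrator reproduces this (the helper name rk4_solve is my own illustration, not code from our scripts):

```r
# Numerical check of the slope generator F = 1 + E^2 on real doses:
# the exact solution of dE/dd = 1 + E^2, E(0) = 0 is E = tan(d),
# so E reaches +infinity at the finite dose d = pi/2.
rk4_solve <- function(f, E0, d_max, n_steps) {
  # Classic 4th-order Runge-Kutta for one autonomous ODE dE/dd = f(E).
  h <- d_max / n_steps
  E <- E0
  for (i in seq_len(n_steps)) {
    k1 <- f(E)
    k2 <- f(E + h * k1 / 2)
    k3 <- f(E + h * k2 / 2)
    k4 <- f(E + h * k3)
    E <- E + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  E
}

slope <- function(E) 1 + E^2
E_at_1 <- rk4_solve(slope, 0, 1, 1000)            # tracks tan(1) ~ 1.5574
E_near_blowup <- rk4_solve(slope, 0, 1.55, 8000)  # already huge near pi/2
```

The blow-up shows up numerically as E exploding as the dose approaches pi/2 from below.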

@rainersachs
Collaborator Author

The correction needed for beta and lambda was in line 231, not line 247. I think the file is OK now.

@rainersachs
Collaborator Author

I would like to call a meeting. I am available most times, 7 days a week except Saturdays AM and Monday Sept. 11 PM. Mark and Edward please agree on a time and let me know.

I tentatively decided on a low LET model, and just uploaded a file (merge2) which I think has everything we have done to date and no redundancies, assuming that low LET model. However, to make a final decision on the low LET model I need Mark to calculate information coefficients (Akaike and Bayesian) and compare with 17Cuc. I think Edward should work on Monte Carlo calculations of 95% confidence intervals for HZE MIXDERs. These will be useful even if we later change the low LET model because our HZE model is already decided. They will also act as templates for calculating more general MIXDERs, and they will ensure Edward gets involved in very specific details of our particular calculations.

@eghuang
Member

eghuang commented Aug 28, 2017

I contacted Mark yesterday and will let you know the meeting time as soon as I can.

@eghuang
Member

eghuang commented Sep 2, 2017

How about next Wednesday, 2pm at Strada?

@rainersachs
Collaborator Author

Wed 2 at Strada is fine.

Mark: Are you getting these issues messages? Do you plan to sign your learning contract?

@rainersachs rainersachs changed the title obsolete files. Meetings, assignments, verbal discussions and questions, including those about math and programming Sep 5, 2017
@rainersachs
Collaborator Author

Hi:

We meet Wed. the 6th, 2 PM at the Strada. I think we had better meet weekly during the semester to become more focused, so please bring your schedules.

Mark: please come prepared to report how far you have gotten on the following assignment which we discussed earlier and try to ask enough questions to make sure you know how to carry out the assignment within the next few weeks. The assignment is the following: Study the theory behind and implementation of calculating information criteria (ICs), especially Akaike and Bayesian. Calculate them for our low LET IDER in merge2 on Edward’s web site. Calculate them for the NTE1 and NTE2 low LET IDERs in 17Cuc the same way (hold background and alpha_lambda fixed; make the IDERs be zero for dose 0; calibrate their 2 parameters using only the non-zero dose data). Compare. Once this is done you will be asked to repeat the calculation for HZE IDERs. About midway in the semester I will ask you to give a talk to our HG group on the theory of IC. If you are at a loss to decipher the assignment as stated, please try to review the relevant terminology and relevant lines in merge2. In addition, if you haven’t changed your mind, please read and sign your learning contract.
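For concreteness, here is a minimal sketch of how AIC and BIC come out of an nls() fit in base R. The toy one-parameter dose-response model and the synthetic data below are stand-ins of my own, not our HG IDERs or data:

```r
# Toy example: fit a one-parameter dose-response curve by nonlinear
# least squares and extract the information criteria. Synthetic data.
set.seed(1)
dose <- seq(0.1, 5, length.out = 40)
prevalence <- 1 - exp(-0.7 * dose) + rnorm(40, sd = 0.02)
fit <- nls(prevalence ~ 1 - exp(-k * dose), start = list(k = 0.5))

aic <- AIC(fit)  # -2 * log-likelihood + 2 * (number of parameters)
bic <- BIC(fit)  # -2 * log-likelihood + log(n) * (number of parameters)
```

Candidate IDERs calibrated to the same data can then be ranked by these values (smaller is better); BIC penalizes extra parameters more heavily than AIC once n is at least 8.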

Mark and Edward: I had a very strong new URAP applicant, Yimin Lin, for this semester. I decided to put him in charge of the theory and implementation of error analysis based on Monte Carlo sampling of variance-covariance matrices, which we eventually need for 95% I(d) CI. He will also report to us on the theory during the semester.

Edward: I would like you to be in charge of quality control and testing of the programs during the semester and eventually report to us on that. Also, let's take a chance that Mark's IC answers will not be unfavorable. So please try to extend merge2 to the case of a mixture involving N ≥ 1 HZE and one low LET ion, using the IDERs in merge2.

See you guys Wed! Ray

@rainersachs
Collaborator Author

Minutes of meeting 9/6/17. Mark Ebert, Edward Huang, and Ray Sachs met at the Cafe Strada for an hour.
We agreed on the following plan for the semester. All 4 of us will work on a script that will be able to apply synergy theory to the new Harderian gland (HG) mixed GCR radiation field data that will be available in some months, 18 months after the actual experiments due to tumorigenesis lag time. Concurrently, Mark will be in charge of breeding information coefficients (IC) and eventually explaining them to our pod or the whole URAP class, Yimin will be in charge of breeding our variance-covariance matrices in R, caring for them, feeding them, and showcasing them. Edward will be in charge of debugging, testing and quality control of our program(s). I reiterated that if time permits there are many additional instructive and useful calculations and ideas to pursue, which are, however, of lower priority.
We decided the three of us will meet again 11:20 Thursday the 14th at the Strada. We are hoping that we can find times bunched in such a way that I can meet with each of you, including Yimin, individually for a half hour or so, and we can also have shorter 4-way or at least 3-way meetings and student pairwise meetings on the same day. If Yimin can make it Thursday mornings that will work. If we cannot find times then I will continue to meet Edward and Mark Thursday mornings weekly, meet Yimin Fridays at 2:30, and we will arrange occasional 4-way meetings at other times. Edward has locations where 4-way meetings are convenient and I suggest from now on he be responsible for all meeting organization.
Edward explained some R commands, notably browser(). He will continue to add to the other issue, on style. We discussed what Mark needs to do to find the IC of immediate interest.
Please make sure Yimin and Mark have access to these issues and get notifications when comments are added. Any material I have that is of interest to our entire 4-pod will only appear on this repository from now on.
Please post any additions or corrections to these minutes here.

Thanks!

@rainersachs
Collaborator Author

In my minutes of the Wed. 9/6/17 meeting I forgot to add an additional assignment that Edward and I agreed on. At the moment merge2 has a function to calculate baselines for mixtures of any number N of HZE ions and one to calculate baselines for a mixture of one HZE with one low LET ion. The latter should be extended to mixtures with N HZE ions and one low LET ion. Maybe we only need one R function to calculate simple and incremental effect additivity baselines for N ≥ 1 HZE and either 0 or 1 low LET ion.
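A minimal sketch of what such a single N-component baseline function might look like, in base R with a plain Euler integrator: incremental effect additivity integrates dE/dd equal to the dose-fraction-weighted sum of the component slopes at the current effect E. The function name iea_baseline and the toy slope functions below are my own illustrations, not code from merge2:

```r
# Sketch of one IEA baseline function for N components (HZE and/or low
# LET alike): integrate dE/dd = sum_i r_i * F_i(E) by Euler's method,
# where F_i(E) is component i's slope at effect E and r_i its dose
# fraction. iea_baseline and the toy slopes are illustrative only.
iea_baseline <- function(slope_funcs, ratios, d_max, n_steps = 2000) {
  stopifnot(abs(sum(ratios) - 1) < 1e-8)  # dose fractions must sum to 1
  h <- d_max / n_steps
  E <- 0
  for (i in seq_len(n_steps)) {
    # Mixture slope: weighted sum of component slopes at the current E.
    dEdd <- sum(ratios * vapply(slope_funcs, function(F) F(E), numeric(1)))
    E <- E + h * dEdd
  }
  E
}

# Two toy linear IDERs E = a * d; their slope at any effect is just a.
F1 <- function(E) 0.3
F2 <- function(E) 0.1
mix_effect <- iea_baseline(list(F1, F2), c(0.5, 0.5), d_max = 1)
# For linear IDERs, IEA reduces to the weighted slope, here 0.2 per unit dose.
```

The same function handles N = 1 (one component reproduces its own IDER) and any mixture size, which is the point of collapsing the two existing functions into one.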

@rainersachs
Collaborator Author

Yimin and I met Friday. His main programming assignment for the semester is writing code to calculate 95% CI for I(d) baseline MIXDERs. He will start with the three-ion HZE mixture defined in line 77 ff. of the HGsynergy_merge2.R code. For that mixture he will use the variance-covariance matrix determined by nls( ) regression in the code. He will use appropriate R functions that are already in the relevant packages. During the first few weeks he will be mainly concerned with getting the calculation for this specific example to work. Later in the semester he will generalize the calculation and also go more deeply into understanding the math/stat behind the packaged R functions.

Some logistic items that resulted from the meeting are the following. Weekly 4-way meetings are not feasible. Yimin and I will meet Fridays around 2 in my office. Edward, Mark, and I will meet Thursday mornings. In addition we will arrange at least one 4-way meeting sometime within the next month and one 6-way meeting with the other pod sometime during the semester.
Yimin and Edward will both be working on HGsynergy_merge2.R; please coordinate, e.g. by using GitHub pull requests.
My phone numbers are: 510-658-5790 for most times; 510-206-7483 only when I am already on the way to one of our meetings.

I think the project is moving forward well. Thanks to all 3 of you.

@rainersachs
Collaborator Author

Yimin: See you tomorrow 2:30, my office. I downloaded improved versions of the previous paper and of a nice improvement Edward made on the code. But if you prefer you can keep working on the earlier code and reading the earlier paper -- once we have one Monte Carlo CI estimate, we will be able to generalize pretty easily, I think.

@yiminllin
Collaborator

Hi guys, I just pushed the confidence interval code to the branch "ConfidenceInterval". The only changes I made to the original code are adding new code, commenting out the plotting code, and moving obsolete files to a folder. I also added some comments for readability. If the code works well I will merge it into the main branch. This is just the first version, so if there are any issues please tell me. I will try to implement a naive version of calculating CI by next week. Have a nice weekend.

@rainersachs
Collaborator Author

errorMessages.docx
This .docx file mentions some issues with Yimin's confidence interval code. In brief, it becomes abnormally slow, so slow it stopped altogether after 87 Monte Carlo iterations, and it has problems with dose intervals being "too small". However, it did produce a graph which looked plausible.

Also, while stumbling around I made a superfluous branch for this repository. Yimin, please delete it.

@yiminllin
Collaborator

Hi Professor,
Stopping after 87 iterations is expected because 87 iterations means 87 dose points rather than 87 Monte Carlo runs, which means we have made 87 × 500 Monte Carlo samples. The step size also seems weird to me, and I think the reason behind it is the "deSolve" package we use to solve the ODE. In order to get an accurate result, the ODE solver just takes arbitrarily small step sizes. I will look into this issue later.

@rainersachs
Collaborator Author

Oh. I see. Thanks for your prompt reply. That sounds less bad than I thought it was.
I agree ode() is one of the problems. It is adaptive, so it is presumably taking small steps already near dose zero.
But the slowing down seems more typical of the behavior when some vector has been initialized to a certain length and the program keeps adding information at indices bigger than that length.
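The symptom matches a common R pitfall: growing a vector with c() inside a loop reallocates it on every iteration and slows down roughly quadratically, while preallocating to the final length stays fast. A toy illustration (the function names are mine, not from our scripts):

```r
# Slow pattern: the results vector is copied and extended every iteration.
grow_fill <- function(n) {
  results <- numeric(0)
  for (i in seq_len(n)) results <- c(results, i^2)
  results
}

# Fast pattern: preallocate to the known final length, then fill by index.
prealloc_fill <- function(n) {
  results <- numeric(n)
  for (i in seq_len(n)) results[i] <- i^2
  results
}
# Both return the same vector; only the preallocated one scales well.
```

If the CI loop appends each dose point's results with c() or rbind(), switching to a preallocated vector or matrix would explain and fix the progressive slowdown.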

No hurry!!

Ray

@yiminllin
Collaborator

Hi guys, I just added a few lines to implement the naive method to calculate CI (consider each parameter separately), and the graph seems Okay to me. I pushed the code to the master branch directly so hopefully I did not mess up the commits...

@rainersachs
Collaborator Author

Hi:
Minutes for week of Sept. 11, 2017
I met Thursday with Mark at the Strada, Edward over the phone, and Friday with Yimin at my office. It is possible that, as a result of the subsequent calculations by Edward and by Yimin, we already have code which addresses all the major topics that might arise during the whole project, with the exception of calculating information coefficients (IC) and comparing to earlier models. I am confident that the IC can be calculated, so we may be finished as far as possible fatal obstacles are concerned. If so, we still have a whole lot of work to do: eradicating bugs; cleaning up the code in many ways; cleaning up the commenting and variable names in many ways; adding models that consider only targeted effects (TE), not both NTE and TE (TE models are simpler than the TE+NTE model we are working with now, and simpler than the NTE models in 17Cuc, which are actually TE+NTE); writing a report; cleaning up GitHub; understanding the math and motivations behind the R programs; etc. Just that might take all semester, but all of it except perhaps eradicating bugs can clearly be done; none of it except the bugs is likely to be fatal.
However the confidence interval (CI) part runs so slowly on my computer that I have not had a chance to judge if there are mistakes in the code that allow it to run but give the wrong answers. I will try to run the code overnight, tomorrow night if I have time, and then see if I can do some checks using just the environment without re-running the CI parts.

Agenda for week of September 18. Programming and GitHub: Mark is working on ICs; Yimin, if he has still more time than he has already spent, is working on improving the CI part of our program; Edward, if he has still more time than he has already spent, should work on checking his MIXDER results and/or on cleaning up GitHub and giving a protocol for its use. I don't use GitHub correctly as regards folders and pulls and pushes, and have already added superfluous stuff that I don't know how to get rid of.
All of us should continue to study the relevant literature as time permits.

Outlook:
Quite possibly all the rest will be plain sailing, tedious at times, but all amenable to systematic improvements. But we cannot be sure of that until the code runs faster than it does now; so some chance of running into an obstacle that requires drastic major changes in our approach remains. On balance we are ahead of where I thought we would be at this point in the semester.

@rainersachs
Collaborator Author

Meeting minutes:

Edward and I met at the Strada today for about 90 minutes, discussed a lot of details, discussed over-all plans for the next 9 months, and got a whole lot done.

At Edward's request I put a .pdf copy of the paper submitted today, SynergyRR, in Edward's repository.

Edward's MIXDER program seems to run well. If Yimin's program, to be discussed tomorrow, works as well, all the rest of the project will almost certainly be feasible and we can start to plan our paper.

Assignment for next week:
Edward will see how Yimin's code runs on his computer. He will produce a temporary, truncated version of Merge2.R which omits Yimin's code so I can run it on my computer quickly and start to implement my quality control assignment. He will do a lot of work on GitHub, e.g. teaching us how to run .Rmd programs on GitHub if they won't run on our own computers. Mark will continue to program IC calculations. Yimin's assignments will be posted after he and I meet tomorrow.

@eghuang
Member

eghuang commented Sep 21, 2017

@rainersachs I have created a new file doseExploration.R for your quality control tests which omits Yimin's CI calculations. I also ran all of merge2 and reproduced my plots without issue, so I'm not sure where you're getting an error. Perhaps try reverting your merge2.R to the current version?

Also, a quick note on R Markdown: .Rmd files are rendered by GitHub by default, so we don't need to actually run anything. For example, this is an Rmd vignette written by a colleague:

https://github.com/cmerow/meteR/blob/master/vignettes/meteR_vignette.Rmd

GitHub shows the output of the file by default, and you can also view the code itself by clicking "raw". I will update this post or make new posts as I make changes to the repository.

UPDATES:

  1. Yimin's code also seems to run very quickly for me, < 10 s, so I'm not sure what's causing the long runtimes on your machine.

  2. I cleaned up the repository a bit. There are a few files that I didn't touch (Mark's files).

@rainersachs
Collaborator Author

I think for the time being we may need to keep Yimin's HZE CI script separate from HGsynergyMain_merge2.R, and I just uploaded a file which on my computer implements that separation.
More generally, we need a protocol to avoid stepping on each other's toes. I suggest we aim in this repository for a protocol which allows only Edward to commit files to the main branch. Yimin and I should have to ask his permission (e.g. as the reviewer) via GitHub mechanisms like making a branch and a pull request, which I am in the process of trying to learn to use efficiently.

@rainersachs
Collaborator Author

At our meeting Yimin pointed out that his CI calculation can probably be sped up a lot by using a single set of 4 parameters for each of 500 MIXDERs instead of generating new parameters at each dose point of one MIXDER. That is anyway the correct approach in principle: if we have misestimated the parameters then we need a single better set, and that set will apply to every dose point. Speeding up would be a big plus.
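A sketch of that speedup in base R: draw all parameter sets up front from the fitted variance-covariance matrix via a Cholesky factor (equivalent to MASS::mvrnorm), evaluate each full curve once with its own fixed parameter set, then take pointwise quantiles. The toy two-parameter model and all the names and numbers here are illustrative stand-ins, not our actual IDERs or fitted values:

```r
# One parameter set per Monte Carlo curve (not per dose point), then
# pointwise 2.5% and 97.5% quantiles across the 500 curves.
set.seed(1)
mu    <- c(a = 0.5, b = 0.1)             # pretend fitted estimates
Sigma <- matrix(c(0.010, 0.002,          # pretend variance-covariance
                  0.002, 0.001), 2, 2)   # matrix, e.g. vcov() of an nls fit
n_mc  <- 500
doses <- seq(0, 5, length.out = 50)

# All 500 parameter draws at once: rows of z %*% chol(Sigma) are N(0, Sigma).
z      <- matrix(rnorm(n_mc * 2), n_mc, 2)
params <- sweep(z %*% chol(Sigma), 2, mu, "+")

toy_ider <- function(d, a, b) a * d + b * d^2
curves <- apply(params, 1, function(p) toy_ider(doses, p[1], p[2]))  # 50 x 500

ci <- apply(curves, 1, quantile, probs = c(0.025, 0.975))  # 2 x 50 CI band
```

Because each curve is computed in one pass, the ODE solver (or here the toy model) runs 500 times total instead of 500 times per dose point.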

Both Yimin and Edward have emphasized that we might as well do version control manually instead of insisting on the use of GitHub machinery designed for much bigger programs, with many more collaborators, and much more stringent deadlines. So I withdraw my previous comments on needing to use branches and reviewers; everybody can commit directly (but please not indiscriminately -- in case of doubt, ask Edward) to Edward's main branch and we will be able to reconcile discrepancies by hand.

@eghuang
Member

eghuang commented Sep 25, 2017

Based on the methods sections described in Ray's recent synergy theory paper (a draft is located in the folder misc_materials), it appears that the script is approaching its final edits. I will begin cleaning up and organizing merge2.R with respect to several objectives:

  1. The raw script and its calculations should be very readable to researchers in this field who are at least superficially acquainted with R.
  2. The script should reflect good coding practice and style.
  3. The script should clearly reflect principles of reproducible science and chronologically follow our own methodology.
  4. The script should be easily grafted to an Rmarkdown file if we choose to do so.

@rainersachs rainersachs reopened this Nov 26, 2017
@yiminllin
Collaborator

Hi guys,
I finished implementing the IC part. I implemented it based on Dae's code. Honestly, I don't fully get the theory behind the information criterion stuff, so I just took the formulas for AIC and BIC on nonlinear least squares for granted. The code works, but maybe there are some bugs I didn't notice. I also experimented with different backgrounds (y0) as Rainer suggested. I didn't update my plotting script (time is limited), and I plan to organize them over this winter.
I believe I will not have time over RRR week to do extra work for this project (exams to review and a presentation to prepare). Anyway, it has been a great semester for me to work on this project!
Good luck on the final~ Happy studying.

@rainersachs
Collaborator Author

# Over All Plan for Spring Semester
Starting Jan 1 we should carry out the following steps.

  1. Make a master file for the main calculations. Thereafter: this will be the only master version; it will live on GitHub; all changes to it should go through me; all other versions should only be in sandboxes in our own computers or, if any other versions are in GitHub, they must be clearly labelled in their filename as subordinate, obsolete, temporary, or some similar word.

    Yimin: in our latest meetings you described a whole lot of very useful and interesting results. It is time to make those available to the whole pod. I tried my best to avoid using your script until you are satisfied with it, but that really didn't work during vacation. It cost me time and confusion. I keep getting new information from NASA colleagues or new papers; I keep getting new ideas to try out; I have to put the corresponding changes somewhere; I ended up often making the same change or tries in various different scripts, introducing inadvertent inconsistencies; also, I was sometimes afraid to delete stuff because I was not sure whether I would inadvertently delete information found nowhere else. We will be making a big push at the start of the semester and it is time to be more systematic about the results we already have and all the comparatively minor changes we will still, in all likelihood, need.

    As soon as we have a master, even though it will still be quite preliminary, we can start on other jobs, and eventually get to some jobs that are actual new research, not just routine.

  2. Clean the master enough to make correcting and extending it a bit more convenient. Cleaning it entirely can wait till much later.

  3. Test the master. All members of the mouse pod should try to find bugs and misunderstandings. I will leave line-by-line checks to you guys. Instead I will pick lots of cases where I can independently calculate, or think I can guess, results or trends, and see if the script gives figures which show the expected features.

  4. Clean up and add enough figures for drafts of the LSSR paper. This will be stepwise: get what seem to be enough figures; write the results section of the paper; change some figures, delete some, add some, write a second draft of Results; etc.

  5. Do actual research and write new chunks of the master. We still need code for the TE models. Yimin already has some important robustness (actually, it turned out, lack of robustness) results but we need more sensitivity results. Other forays into unknown territory will emerge as we write the paper. All we know at present is that we have enough for a paper; optimizing the paper remains to be done and will undoubtedly involve some additional calculating and coding.

  6. Have some meetings with some of the chromosome aberration pod to discuss issues common to both pods.

  7. Start to think about another paper. That will involve brand new calculations. Mark may be starting on some possible ones.

@yiminllin
Collaborator

Hi guys,
Hope you are enjoying your vacation. I just finished organizing plottingYimin.R, and I believe the code is cleaner and more readable. I encapsulated the important calculations into some main functions, namely plottingHelper, multiplotHelper, and CIHelper. You can treat them as black boxes; these three functions simply do all the calculation and plotting required. The code is much cleaner to work with now, and you can check and modify it easily. I did not add detailed comments, only brief ones, because I believe the code is largely self-explanatory. Some important notes: I output the plots into "~/Desktop/plots/", and you can change that by changing the parameter of plottingHelper, or simply by doing a substitution in the .R file. Also, I didn't modify the dataframe (the 195, 197 to 180 stuff), but the code can easily be adapted to this change when needed. Generated .eps files are in this GitHub folder: plots/updatedVersion/.
There will be some minor issues (e.g. tick marks, size of gaps, text on axis), and I will fix them in the future.

@rainersachs
Collaborator Author

Happy new year!

Our main HG pod job spring semester 2018, which I outlined 7 days ago above, will be getting figures for, and then writing, an LSSR paper. The elegant code Yimin posted 5 days ago probably shows that no roadblocks remain in principle, so we may be able to finish this job this semester by a lot of work on the details. I say "probably" because I cannot yet run the code on my computer to check, for a variety of reasons. But I think when Edward gets back we should be able to implement the key first step of the main HG pod job: making Yimin's code on GitHub the dominant master file and subordinating all other files to it. I wrote Edward a detailed plan for this.

Meanwhile I cooked up a more mathematical (and speculative) project that is completely independent of all other URAP projects to date and uses the theory of functions of a complex variable. Yimin said that he might work on the project sometimes as a break from programming the main HG project. I have just posted a .pdf file describing the first few steps. I am sending Yimin two auxiliary .docx files by email because I suspect you guys (like me at the moment) may not find .docx files on GitHub useful (let me know if you can do something with them and help me figure out how to do it).

Looking forward to seeing you soon, Ray

@rainersachs
Collaborator Author

Regarding the functions of a complex variable optional project here are a few further comments. Pretty much all I know about the project is in Section A5 of IEA_optional_complex variable_assignment.pdf that I put in the misc_materials GitHub folder yesterday. That section A5 contains the following.

  1. Some scattered comments and results on incremental effect additivity (IEA) behavior for mixtures. These comments and results are relevant to this optional project only insofar as they motivate the choice and interpretation of IDERs and slope generators, as illustrated in item 2 below. The project concerns IDERs and slope generators. If we can clarify those, using the IDERs in IEA synergy theory will be a follow-up project.

  2. A5 gives some key examples of slope generators which are polynomials all of whose zeros in the complex plane are simple (the Taylor series a_1 z + a_2 z^2 + ... in the neighborhood of the zero has a_1 non-zero).
    For this case a full classification of the following form can and should be carried out, for IDER types characterized via the locations of the zeros of the slope generator in the complex plane:
    (a) The IDER for non-negative (real) doses is not defined for large doses because it approaches ±infinity at a finite dose. This is the case we have to avoid. It occurs, for example, if all the zeros are located on the imaginary axis but not at the origin.
    (b) The IDER is defined for all non-negative doses; it approaches infinity for large doses. This is fine. It occurs for example if the slope generator is 1+z, positive real at the origin and having one zero on the negative real axis.
    (b') The IDER is defined for all non-negative doses; it approaches -infinity for large doses. This is also OK, not as an IDER by itself but as one component of a mixture, provided there are other strongly positive components in the mixture. The -infinity case occurs for example if the slope generator is z-1, negative real at the origin and having one zero on the positive real axis.
    (c) and (c') The IDER is defined for all non-negative doses; it approaches a finite (real) constant C for large doses. This is also OK. For example if the only slope generator zero is located on the positive real axis and the slope generator is real and positive at the origin, then C is positive.
    (d) The IDER is zero for all non-negative doses. This occurs iff the slope generator is zero at the origin.
    Surprisingly, it is a very interesting case, not, of course, as an IDER by itself, but as a component of a mixture some of whose components have positive IDERs.

  3. A5 also contains some examples where the slope generator is a polynomial some of whose zeros have a_1 = 0. It also contains a few examples where the slope generator is not a polynomial. Basically we want to find enough slope generators to cover all cases of interest in practice (which will most probably require some non-polynomial slope generators). But we need some restrictions which ensure that none of the resulting IDERs become infinite at a finite dose. Perhaps focusing attention on the poles and zeros of the slope generators will give us an elegant collection of slope generators. I have started to think about slope generators such as A*cos(z) + B, where A and B are real. These can have an infinite number of zeros in the complex plane. Also, slope generators that have poles are candidates, as are multi-valued slope generators that need Riemann sheets for their detailed analysis.

  4. A5 also contains some comments on and applications of the qualitative theory of differential equations. Since the optional project is concerned with only one IDER at a time, I expected this to be trivial, but that is actually not the case. The reason is that when IDERs are defined in terms of an AIVP we usually cannot solve for the IDER explicitly. R ODE integrators should be enough to indicate what we need to know about the qualitative properties of such IDERs, but it is better to have proofs using the qualitative theory. Here is an example. Suppose dE/dd = 2+cos(E). Does E reach +infinity at a finite dose?
    Of course not: 2+cos(E) <= 3 < 4, and dE/dd = 4 with E(0)=0 has solution E=4d, which is defined for all doses. So the solution to dE/dd=2+cos(E) must stay smaller than 4d and thus never reaches an infinite value at finite dose.
    But giving the proof, or using numerical integration to show this, is a pain in the neck, and for less trivial cases the result may not be obvious. We need a book on the qualitative theory of ODE (AKA dynamics) that presents in simple form the relevant results and proofs about attractors etc. even in the 1-ODE case, not just as a special case of many coupled ODE, or an infinite number. There must be such books. Finding one may be tedious.
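The bound above is easy to check numerically. Here is a minimal base-R sketch (the step size and dose range are arbitrary illustrative choices, not project values) that integrates the AIVP dE/dd = 2+cos(E), E(0)=0, with Euler steps and confirms the solution never overtakes the bounding line E = 4d:

```r
# Euler integration of the AIVP dE/dd = 2 + cos(E), E(0) = 0.
# Arbitrary step size and dose range, for illustration only.
h <- 1e-3                       # step size in Gy
doses <- seq(0, 10, by = h)
E <- numeric(length(doses))
for (i in seq_along(doses)[-1]) {
  E[i] <- E[i - 1] + h * (2 + cos(E[i - 1]))  # Euler step
}

# The slope is at most 3 < 4, so the numeric solution should stay below 4d:
all(E <= 4 * doses)
# Since the slope is also at least 1, E still grows without bound,
# but only linearly -- it never blows up at a finite dose.
```

This is of course only numerical evidence for one case; the point in the text stands that a clean proof from the qualitative theory is preferable.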

@rainersachs
Collaborator Author

Our plan continues to be making Yimin's code the one and only master script as soon as possible.

But thinking a bit more about this, what I need next is code that I can run in a sandbox in my own RStudio on my home computer. I want to be able to write code that generates almost any kind of plot for IDERs or mixture calculations except ribbon plots. It should show plots in my sandbox RStudio plot panel and not in any separate files. Then I can quickly reject lots of plots I try; the ones I like best I can turn into .eps files just by exporting, then refine them further in Adobe Illustrator, tweak the plotting code, iterate the procedure, and finally, when I am satisfied, tell Yimin or Edward to put them into the ggplot part of the master file. To do that I would need a chunk that is only for me and should be commented out by everyone else. It would have to use plot, not ggplot. It should be as idiot-proof and as cranky-computer-proof as possible, since it would be run by the former on the latter.

I think if Edward or Yimin could add such a chunk as close to the start of Yimin's master as possible, and put in it 3 or 4 representative plots that translate ggplot into plot I could probably write all the rest just by looking at Yimin's other ggplot files and translating them into plot. I would not have to ask either Yimin or Edward to constantly be sending me .eps files for tweaking. Only when we are closer to finished would my chunk be erased and all the information be in ggplots in the master.
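As a concrete illustration of the kind of translation meant here (the toy IDER and all names below are hypothetical, not our actual models), a ggplot2 curve and its base-graphics equivalent might look like this:

```r
# Hypothetical toy IDER, used only to illustrate ggplot2 -> base plot translation.
ider <- function(d) 1 - exp(-3 * d)   # toy one-agent dose-effect relation
d <- seq(0, 1, by = 0.01)             # dose grid in Gy

# ggplot2 version, roughly as it might appear in the master script:
# library(ggplot2)
# ggplot(data.frame(d = d, E = ider(d)), aes(x = d, y = E)) +
#   geom_line() + labs(x = "Dose (Gy)", y = "Effect")

# Base-graphics translation for the sandbox chunk; shows up in the RStudio
# plot panel and can be exported to .eps by hand.
plot(d, ider(d), type = "l", xlab = "Dose (Gy)", ylab = "Effect")
lines(d, 0.5 * ider(2 * d), lty = 2)  # e.g. overlay a second toy curve
legend("bottomright", legend = c("IDER", "variant"), lty = c(1, 2))
```

Three or four such paired examples at the top of the master should be enough of a template for the rest.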

Is that feasible?

In haste, Ray

@rainersachs
Collaborator Author

Hi Yimin:

Your code through line 276 runs and gives me almost all the functionality I need.
Remaining improvements needed in order of decreasing importance:

  1. A few examples of ggplot2 translated into plot and put somewhere before line 276
  2. Hardest: make I(d) run up to dose 1 Gy whenever possible. This means avoiding the error messages uniroot sometimes gives when the dose points are not chosen just right. See if you can find doses that give error messages from uniroot. Then see if you can fix them, or whether there is a maximum dose < 1 Gy above which nothing works.
  3. Give me easier control over the dose ranges of calculations and plots.
  4. Cosmetics but not functionality
  5. Optional: give me the ability to run Monte Carlo for 2 or 10 sample paths rather than 1000
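On item 2: uniroot's stock failure ("f() values at end points not of opposite sign") appears whenever the search interval does not bracket the root. A hedged base-R sketch of the failure and two possible fixes; the effect function here is a toy stand-in, not our actual I(d), and whether extendInt is appropriate for our script is Yimin's call:

```r
# Toy cumulative-effect function standing in for I(d) - target;
# purely illustrative, not project code.
f <- function(d, target) (1 - exp(-5 * d)) - target

# Fails: both endpoints give f < 0, so no root is bracketed in [0, 0.1].
res <- tryCatch(uniroot(f, interval = c(0, 0.1), target = 0.9),
                error = function(e) conditionMessage(e))
# res now holds the "not of opposite sign" error message.

# Fix 1: widen the interval so the root (about 0.46 Gy) is bracketed.
root1 <- uniroot(f, interval = c(0, 1), target = 0.9)$root

# Fix 2: let uniroot extend the interval itself; "upX" tells it the
# function crosses zero from below (a documented uniroot argument).
root2 <- uniroot(f, interval = c(0, 0.1), target = 0.9,
                 extendInt = "upX")$root
```

If neither fix works at some dose, that dose is a real candidate for the "maximum dose < 1 Gy above which nothing works" mentioned above.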

@rainersachs
Collaborator Author

Yimin will see if reducing the number of dose points for a ribbon plot to as few as 10 or 20 speeds up the master script without changing the plots markedly, except perhaps at doses so low (i.e. < 0.01 Gy) that they are not noticeable on most of our plots and lie below almost every data point.

@rainersachs
Collaborator Author

Next week Claire will continue to study Yimin's code line by line. Yimin will work on some plots and also see about speeding up the code. See you guys tomorrow! Edward is waiting for when Yimin's code is a bit more settled.

@rainersachs
Collaborator Author

rainersachs commented Feb 17, 2018

  1. The HG pod got a lot done this week, including Yimin's speeding up of the Monte Carlo calculations. Most of you will write minutes of our meetings and I will comment on those when they appear on GitHub.

  2. Sunday night 2/18/2018 Yimin's code (plotting_Yimin.R) on GitHub will become the master file and be turned over to Edward for quality control. Yimin may or may not have time to add a few last-minute changes before then; either way is fine. After Sunday all changes to this master should go through Edward, who will work on debugging it and, with Yimin's help, adding various bells and whistles. Yimin and Edward should cooperate closely on this.

  3. We are postulating 2 types of HZE models, NTE (i.e. TE-NTE since TE are always assumed and always dominate at large doses) and TE (i.e. TE-only, with NTE assumed negligible at all doses). At the moment we only have graphs, variance-covariance matrices, and Monte Carlo CI calculations for our NTE HZE models. Eventually we will need graphs etc. for our TE only models as well. This will be relatively easy and can be postponed because it will be needed only for the "major" report/paper about a year from now, not for the "minor" paper/report that should be ready, with luck, this semester.

  4. The HG pod minor paper is progressing well. I hope to have Title, Abstract, Introduction, and Methods sections ready in a couple of weeks. I will then send it to you guys for information and for your criticisms.

@Mebert314
Collaborator

We went over my results on the sham mixture principle and m&m in Lam’s independent action theory in the cases where the IDERs (n=2):
a) Started at 0, and
b) Started at 0, ended at finite value (in this case 1)
With n=2, case (a) is identical to simple effect additivity. I will check if it holds for n>2.
For case (b), with IDERs E1=1-exp(-d) and E2=1-(d+1)^-2, we discussed my results showing that: E1 satisfies sham (e.g. 50-50 mixture), E2 doesn’t satisfy sham, and that m&m doesn’t hold.
From these examples, it seems that Lam won’t help us prove that sham implies m&m, so we should drop it for now.
Moving forward, my assignments are to compare Lam with the other synergy theories, and see if/how/when we can use it. Also, check if Lam simplifies to simple effect in cases where the IDERs:
i) Start at 0, go to infinity
ii) Start at –infinity, go to 0.
with n>2 (I suspect that it doesn’t).

@rainersachs
Collaborator Author

rainersachs commented Feb 19, 2018

As of now, the plotting Yimin script on GitHub is turned over to Edward for quality control. All changes, major or minor, should go through him to avoid confusion. The GitHub script, perhaps under a different name, is now the master file for the project.

3 immediate jobs are the following. 1) Yimin's changes that sped up the Monte Carlo calculation should be added to the script. 2) The LET values L in the databases for Fe56 at 600 MeV/u have to be changed from values like 173 or 193 or 196 to all be 185 keV/micron; this change has to be implemented throughout the rest of the script wherever appropriate. 3) All the output of the high LET NTE model, like figures, variance-covariance matrices, p-values from regression summaries, etc., has to be duplicated for the high LET TE model. This third job can actually be postponed for a while because it is not needed for our next (minor) paper, only for the major paper to be started later. But the other 2 are needed for the minor paper, and I will soon have a version of that with more details on what output is needed.
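Job 2 reduces to a one-line data-frame update once the data are no longer hardcoded. A sketch; the data-frame and column names below are assumptions for illustration, not necessarily the names in the script:

```r
# Hypothetical illustration of job 2: force all Fe56 600 MeV/u rows to
# LET = 185 keV/micron. Data and column names are placeholders.
dfr <- data.frame(ion  = c("Fe56", "Fe56", "Si28"),
                  MeVu = c(600, 600, 260),
                  LET  = c(173, 193, 70))

dfr$LET[dfr$ion == "Fe56" & dfr$MeVu == 600] <- 185

dfr$LET  # Fe56 rows are now both 185; the Si28 row is untouched
```

The "implemented throughout the rest of the script" part is the real work: any place the old values 173/193/196 were hardcoded downstream must instead read from this corrected column.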

Edward: I will keep my sandbox synched with the master by hand. When commenting the script you need not worry much about comments that relate the script to the radiobiology literature or our upcoming papers. When you are close to finished, let me know and I will send you a version that includes those kinds of comments. I will have to ask you to add various ggplot2 figures to the master as I gradually write more of the next paper (the minor one). More on that when we next meet.

@eghuang
Member

eghuang commented Feb 19, 2018

I've been looking through Yimin's script this weekend. Several items I've noticed:

  1. The plotting script lacks the updated names and rewritten modeling functions I added to HGSynergyMain.R last fall. I will add these as I merge our two files.
  2. Hardcoded variables (i.e. hardcoded data pipelines). This makes it very difficult for us to make changes to the data or to have other researchers examine/run our code.
  3. Lots of repeated code, which has led to a total script length of ~1200 lines; I'm not yet convinced that this is necessary, but I may be wrong.
  4. Lack of commenting in the plotting functions: this is not of prime importance, but it will delay my familiarization with the script as we use it in the future to produce plots for the TE models.

Items 1 and 2 are of much greater importance than 3 and 4. Since the plotting is not very relevant to the "computational implementation" of the script, I am not sure whether refactoring the plotting code should be prioritized. Whatever the case, I will be forced to refactor at least a few parts of Yimin's code as I combine his plotting code with my updated data and model code since variable names and data are inconsistent between the two. This minor refactoring will take care of item 2 on Ray's latest post (i.e. updating the LET data).
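On item 2, the usual cure is to load the data from a checked-in .csv rather than constructing data frames inline, so other researchers can swap in their own data without touching the code. A minimal sketch; the file name and columns are placeholders, not the repo's real data file:

```r
# Sketch of replacing hardcoded data with a file-based pipeline.
# The file and its columns are placeholders for illustration.
path <- tempfile(fileext = ".csv")   # stand-in for a checked-in data file
write.csv(data.frame(dose = c(0, 0.2, 0.4),
                     HG   = c(0.02, 0.15, 0.30)),
          path, row.names = FALSE)

# One obvious, documented place where the data enter the script:
hg_data <- read.csv(path)
str(hg_data)   # other groups edit the .csv, not the R code
```

This also makes Ray's LET correction a data edit rather than a code edit.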

ACTION ITEMS:

Yimin - If convenient, please add additional comments to your main three plotting functions to clarify what certain blocks are doing (especially obscure code from other libraries or esoteric code from Stackoverflow threads). This is of lesser importance than pushing your Monte Carlo changes as described by Ray above.
Ray and Yimin - Thoughts and comments on the above would be appreciated.

@rainersachs
Collaborator Author

Hi:
As regards your item 1, I have a minor suggestion which you can ignore or accept as you choose. Instead of using all lower-case names, allow upper case only if a well-established acronym is involved.

  1. As regards 2. I agree that we have to change that. We are hoping other groups will not only use but also modify the code to test their own opinions on modeling and to correct our possible coding mistakes and our inevitable conceptual mistakes. We should make that easy for them.

  2. As regards 3, Yimin left a lot of extra stuff in at my request.

  3. Your 4 is not of high priority. However, you and I will have to work out some way I can get figures as we go. I'm thinking the following, but you may have better suggestions: before putting any changes on GitHub, you test that basic functionality remains as regards my using the data frame and being able to add plots of my own choosing (in my local sandbox only, not in the master; your master should allow this but not even mention it, let alone implement it; none of those plots will use the Monte Carlo part of the script). Sporadically I will send you a draft figure, you will add a ggplot2 figure or a placeholder to the master, and I will make the figure in my sandbox. Figures which use the MC part of the master I will leave entirely to you and just keep tweaking the .eps versions you send me by email.

  4. An hour ago I finished the penultimate draft of the Title, Abstract, Introduction, and Methods sections of the minor paper. I will sleep on it and send it around before we meet this week. I think you should read the result fairly carefully before you get too far into revising the master script.

  5. I think we might be able to submit the minor paper this semester and am making that my top priority. Your top URAP priorities should be, and as far as I can see are, different. With a comparatively small increment of effort we should I think be able to reduce this discrepancy to almost nothing.

  6. As I keep saying, I think URAP delays are much preferable to letting URAP interfere with a student's real academic or personal life.

@rainersachs
Collaborator Author

While writing the minor paper I ran into a discrepancy. For the minor paper I want to use the LET values for individual ions as given in the papers by Alpen (for the 20th century data) and by Chang (for the recent data). However, the experimental methods were not quite the same: Alpen measured and used the LETs the ion tracks have when entering the accelerator beam; Chang measured and used the somewhat higher LETs the ion tracks have when they enter the mouse, after being slowed down by intervening matter. In prospective major papers a year or so from now this discrepancy will have to be dealt with somehow; at the moment I don't know what method will be chosen. Because it concerns physical parameters of the data rather than parameters obtained by regression during modeling, the ultimate authority over what method to use is in principle the experimentalists, not us. In the meantime we should keep in mind that we may eventually have to change some parameters in the data input for our scripts (in addition to adding new data, and perhaps correcting recording errors in the old data, which are easier to deal with because they do not involve differences in experimental techniques).

@rainersachs
Collaborator Author

I figured out the way we should handle the discrepancy I discussed here 5 hours ago. It will probably be acceptable to everyone who studies any of our reports or papers. I will just implement it even for the minor paper, and thereby put the whole issue permanently to rest. It will cost me about 8 hours, and it could be up to a month before I find the time. But there are already manipulations in the code where we use physics equations to fill in a few extra columns of dfr given some of the other columns; these have nothing to do with any actual data but merely depend on physics quantities like Z, LET, and MeV/u. The method of correcting for the LET discrepancy due to the different Alpen vs. Chang experimental protocols is very similar to the method already used to generate those extra physics columns. Instead of generating extra physics columns, we will correct the LET column for the Alpen vs. Chang discrepancy. Edward will certainly need to incorporate the present physics number manipulations somewhere (in R or .xlsx or .csv manipulations?), so basically he will just need to eventually add one extra calculation of the same type. So nothing relevant will have to wait for my detailed implementation.
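A sketch of what that one extra calculation might look like, in the same vectorized style as the existing physics-column manipulations. Everything here is a placeholder: the correction function, the factor, the column names, and the protocol flag are all hypothetical until the real physics correction is worked out:

```r
# Placeholder sketch: convert entrance-LET (Alpen protocol) toward at-mouse
# LET (Chang protocol). correct_LET and its factor are hypothetical stand-ins
# for the real physics calculation, which is still to be implemented.
correct_LET <- function(LET_entrance) {
  LET_entrance * 1.05   # placeholder factor only, NOT the real correction
}

dfr <- data.frame(ion      = c("Fe56", "Ti48"),
                  LET      = c(185, 107),
                  protocol = c("Alpen", "Chang"))

needs_fix <- dfr$protocol == "Alpen"          # only Alpen rows get corrected
dfr$LET[needs_fix] <- correct_LET(dfr$LET[needs_fix])
```

Once Ray supplies the actual correction, only the body of the placeholder function changes; the rest of the pipeline is untouched.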

I hope ;-)>>

@eghuang eghuang added the discussion Thread for the purposes of managing assignments and workflow. label Feb 26, 2018
@rainersachs
Collaborator Author

Hi Yimin:
do you want to meet this Thursday or skip this Thursday? In any case Edward may want to ask you for some details at some point and you, Mark, and Claire have the separate complex variable project to work on.

@yiminllin
Collaborator

How about I arrange a meeting with Edward this week (or by email)? And we could meet next Thursday after I talk with Edward.

@rainersachs
Collaborator Author

Sounds good to me! Edward may not yet have a lot to ask, but I think it will be useful if you and he can discuss plans in person or by email. See you Thursday 8 March around 2 or 2:15. -ray

@rainersachs
Collaborator Author

Hi Yimin:

Unfortunately I have an urgent telecon just scheduled on another matter tomorrow and have to cancel our 2:15 meeting tomorrow. See you Thursday the 16th!

@rainersachs
Collaborator Author

Hi Peter:

Sorry that I only realized while talking to Hada how severe the confounding by complex CA in the human lymphocyte data set is. I think we just have to write off the human lymphocytes as a sunken cost at least for the time being. On the bright side:
(a) We can focus on the fibroblast data set, where such confounding is almost absent. Since Hada now tentatively plans to add more fibroblast experiments, the fibroblast data set will probably become substantially more informative.
(b) The developments inadvertently illustrated a typical feature of research. There are always lots of mistakes, including mistakes in judgement like the one I made, from which one can learn. Cranks also make lots of mistakes, but won't admit, even to themselves, that they are wrong. So they can't learn from their mistakes and thus never get anywhere.

@eghuang
Member

eghuang commented Mar 15, 2018

This thread is quite long, as noted by Ray. I am archiving the thread. New updates and discussion should be directed to: #7

@eghuang eghuang closed this as completed Mar 15, 2018