Developing Guidance & Documentation for clim-recal #42

Closed
8 of 9 tasks
dingaaling opened this issue Aug 3, 2023 · 20 comments

@dingaaling
Collaborator

dingaaling commented Aug 3, 2023

Plan:

  • create vision for clean pipeline (see commit 89b70a5)
  • added sphinx base for future documentation to branch extend_documentation
  • split existing content into main pipeline walk-through (visible in readme) and internal document
  • create "narrative walk-through" for pipeline
    still to do
  • change all paths to be either azure specific or dummy general purpose
  • change all steps to use just one metric, city and run as example
  • add Griff's azure doc to INTERNAL.md
  • add contributors
  • fix section links
@aranas
Collaborator

aranas commented Aug 10, 2023

Before we create a nice-looking website using mkdocs, I would like to start by mapping out the individual elements of guidance & documentation more closely by reworking the README. I suggest that the README contains some guidance (with links to further info where it gets too deep), and that this guidance is clearly split by user group, e.g. non-climate scientists and more expert researchers, because they will have different goals when interacting with the project. Here is a draft; maybe we can discuss it together later today.

Structure

  • Intro (what is it, for whom)
  • ToC
  • Quick start guide (download repo, install dependencies & run small-scale example)
  • small-scale example as notebook
  • Guidance
    • For non-climate scientists (why BC, which BCs brief taxonomy viz, how to decide/flowchart)
    • For expert researchers (detailed technical guides, eg code examples & BC tutorials; how to contribute)
  • Documentation (divided into python & R pipelines?)
    • Installation & setup
    • Where to get data from & data format (eventually MO open data portal)
    • functions docstrings
    • FAQs
  • Research (review, references)
  • License & contributors

Resources for good README

@aranas
Collaborator

aranas commented Aug 10, 2023

The part about downloading data from Azure is only for internal info, right? Or will this be outward-facing info?
https://github.com/alan-turing-institute/clim-recal#accessing-the-pre-downloadedpre-processed-data

@gmingas
Collaborator

gmingas commented Aug 10, 2023

The part about downloading data from Azure is only for internal info, right? Or will this be outward-facing info? https://github.com/alan-turing-institute/clim-recal#accessing-the-pre-downloadedpre-processed-data

Yes, this is for internal info and could be moved to another readme file or somewhere else, possibly with a link to it from the main readme.

@RuthBowyer
Collaborator

RuthBowyer commented Aug 10, 2023

I think this could be useful for our partners if this is how we share the data with them (which I think is still tbc?). Just in case you were unaware, see issues 37 and 38.

@dingaaling
Collaborator Author

The overview you've drafted looks great, @aranas!

Two additional example resources I'd add as README reference points for us are BIG-bench and EleutherAI's lm-evaluation-harness. These are two examples of creating standardised resources for evaluating and comparing LLMs on a range of tasks. I think BIG-bench is better documented atm and probably the better source of inspo for us, but the eval-harness is also going through a major refactor atm.

Beyond the README, another useful reference point is how they document tasks in a summary table for Big-Bench and task-table for the eval-harness. I recommend we add that as a priority so we (and any users!) have a standard map/naming we can use to refer to our different BC methods.

@aranas
Collaborator

aranas commented Sep 12, 2023

another example of a benchmark style repo but closer to home: https://github.com/duncanwp/ClimateBench

@dingaaling
Collaborator Author

dingaaling commented Sep 21, 2023

@gmingas prioritisation feedback: Guidance (e.g. step by step) of how to use the pipeline via CLI or notebook

@RuthBowyer feedback: we're still trying to figure out who the users are

@aranas "narrative around the pipeline"

@aranas
Collaborator

aranas commented Oct 11, 2023

@RuthBowyer atm, according to the analysis flowchart, the Cropping_Rasters_to_three_cities.R script takes the resampled files and extracts data for three cities before passing them on to further preprocessing (splitting into test & validate and eventually applying bias correction).
For me to include this in the pipeline walk-through, could you provide the relevant R specs, e.g. version and environment files specifying packages?

@aranas
Collaborator

aranas commented Oct 11, 2023

@RuthBowyer, I am wondering should we host the shapefiles on the GitHub repo to make this more accessible? they don't seem very big.

Else, I will need the source for this shapefile:
NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 -- this shapefile used for defining regions and cutting, also London -- this one also used for chopping up LCAT data

@aranas
Collaborator

aranas commented Oct 11, 2023

For the analysis walk-through I will provide shell commands to execute the full pipeline end to end for one example. I think it would suffice to illustrate this with one metric, one city, one run, rather than including the loops.
Would you agree or should the walk-through include the loops?

@gmingas
Collaborator

gmingas commented Oct 11, 2023

For the analysis walk-through I will provide shell commands to execute full pipeline end to end for one. I think it would suffice to illustrate this with one metric, one city, one run, rather than including the loops. Would you agree or should the walk-through include the loops?

Totally agree, just one combination is enough. And the script with the loops will be available in the codebase too.
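
The difference between the single-combination walk-through and the looping script can be sketched roughly as follows. This is only an illustration in Python: the metric, city and run names and the `run_pipeline` function are hypothetical placeholders, not the project's actual identifiers or commands.

```python
from itertools import product

# Hypothetical placeholder values; the real metrics, cities and runs
# are defined by the project, not by this sketch.
METRICS = ["metric_a", "metric_b"]
CITIES = ["city_a", "city_b", "city_c"]
RUNS = ["run_01", "run_02"]


def run_pipeline(metric: str, city: str, run: str) -> str:
    """Stand-in for one end-to-end pipeline run; it only returns a label."""
    return f"{metric}/{city}/{run}"


def walkthrough_example() -> list[str]:
    """What the walk-through would show: a single combination."""
    return [run_pipeline(METRICS[0], CITIES[0], RUNS[0])]


def full_loops() -> list[str]:
    """What the looping script would cover: every combination."""
    return [run_pipeline(m, c, r) for m, c, r in product(METRICS, CITIES, RUNS)]
```

The walk-through case is simply one element of the full product, so readers who follow the single example can still map it onto the looping script in the codebase.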

@gmingas
Collaborator

gmingas commented Oct 11, 2023

@RuthBowyer, I am wondering should we host the shapefiles on the GitHub repo to make this more accessible? they don't seem very big.

Else, I will need the source for this shapefile: NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 -- this shapefile used for defining regions and cutting, also London -- this one also used for chopping up LCAT data

I would support including them in the repo if there are no licensing issues.

@RuthBowyer
Collaborator

Yep sounds good - all downloaded from OA sources but might need to double check the licenses on the sites

@griff-rees
Collaborator

I've added ticket #42 for configuring how the documentation is rendered and maintained.

@griff-rees
Collaborator

griff-rees commented Oct 17, 2023

I've added some screenshots from using quarto in #42. Great to get a sense if people like that option (and sorry I think I commented on #56 thinking it was this one, my bad).

@aranas
Collaborator

aranas commented Oct 18, 2023

Yep sounds good - all downloaded from OA sources but might need to double check the licenses on the sites

I will create a new issue for this @RuthBowyer

@aranas aranas mentioned this issue Oct 18, 2023
4 tasks
@aranas
Collaborator

aranas commented Oct 18, 2023

I think I have now written all the sections that I wanted to complete, so I will open PR #62 for review. Please feel free to comment / fix / close while I am on A/L this week.

Some open questions / comments from my side:

  • R version specs need to be added to requirements
  • (line 123 in guidance) Currently the R script logic at this point is quite different from the Python scripts, as all paths and variables are hardcoded in the script. Either I adjust the guidance (by instructing users to change the paths in the R script, or by pasting R code with dummy paths directly into the guidance), or we adjust the cropping R script.
    • for simplicity in guidance I chose to remove some flags that had defaults and were not crucial conceptually for the analysis (eg multiprocessing option), but feel free to add those back in if you think they are needed
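
For the hardcoded-paths question above, one possible direction is to expose the paths as command-line arguments with dummy defaults. The sketch below is in Python rather than R, and the argument names and default paths are purely hypothetical; it does not reflect the actual interface of the cropping script.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options replace paths that would otherwise be hardcoded."""
    parser = argparse.ArgumentParser(description="Hypothetical cropping step")
    # All option names and defaults are illustrative dummy values.
    parser.add_argument("--input-dir", type=Path, default=Path("/path/to/resampled"))
    parser.add_argument("--shapefile", type=Path, default=Path("/path/to/shapefile.shp"))
    parser.add_argument("--output-dir", type=Path, default=Path("/path/to/cropped"))
    return parser


def parse_paths(argv: list[str]) -> dict[str, Path]:
    """Parse the given argument list and return the three paths as a dict."""
    args = build_parser().parse_args(argv)
    return {
        "input_dir": args.input_dir,
        "shapefile": args.shapefile,
        "output_dir": args.output_dir,
    }
```

With this shape, the guidance could show dummy paths as defaults while internal users override them on the command line, for example `parse_paths(["--output-dir", "out"])`.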

@gmingas
Collaborator

gmingas commented Oct 26, 2023

As discussed today: @RuthBowyer, when you are back could you please have a look at this and add the documentation parts relevant to the R pipeline, e.g. the R packages list? The recommendation is to do this in a PR separate from #62.

@gmingas
Collaborator

gmingas commented Oct 26, 2023

@aranas to talk to @griff-rees to decide on easiest approach for merging the quarto and guidance branches (quarto branch already has merged main branch, which was challenging)

@griff-rees
Collaborator

@gmingas the merge was managed in #72

@gmingas gmingas closed this as completed Nov 2, 2023

5 participants