ENH adding a mock function #43

floracharbo · 2022-01-05T15:42:59Z

I have added the mock function flora_mock_func() in run.py, which is called at the end of the run function. This has no use; I am just saying hi.

DBCerigo · 2022-01-06T08:50:51Z

Yo, so here's my 2c on it all :)

First off, nice one opening this. That's a lot of new tings all at once already, so big ups.

Second, my suggestion now would be to try to get this PR so that it is passing the CI. You see the Some checks were not successful above? That needs to be a green tick.

(Soz if you already know all this.) The CI (Continuos Integration) is all the checks and tests that get ran on the code base to assert and give confidence that a) the code will do what we want it to do (this comes from the unit tests and mypy and the linters), and b) the code follows the standards and conventions we want it to (this comes from the linters and the formatters).

Something to keep in mind is that the first PR is always the hardest by a long long way. They'll be a done of new tools and things going on that you likely won't yet have encountered, so they'll be a lot a lot to learn. So progress on the actual PR will be slow, but progress on ones own knowledge and development will be very fast. So be patient with it and do best to avoid trying to tackle everything at once as it might become overwhelming - eat the elephant one spoon at a time (is what my brother always says).

So back to the CI.
Step 1: You need to have your development environment properly setup locally. I didn't yet add how to do this to readme yet which is my bad, but the instructs would be:

create a virtual environment using python -m venv .venv
activate your virtual environment using source .venv/bin/activate
install all the stuff using the Makefile convenience function make install

The Makefile is a really important and central file to any/most projects that contain one. So reading that file is always a good place to start when working on any new project. The Makefile is used a convenience tool, to define common terminal commands that developers will want to repeatedly run during writing code, and the Makefile makes it easy and fast for the developer to do that. The most important ones for you at the start in this project are lint test and format. Hopefully the names are indicative of what they do! But do read the file to get full understanding.

The most important Makefile command is the top one in the file, and the reason for this is that if you run just make without any command given it will default to running the top command. Developers will most likely make the top command, the one which runs when you run just make to but "run all the CI checks and tests etc.". This is because that's the most common task one wants to run while developing. In our case the top command (call all but the name isn't important, it could just as well be called ci for CI.) runs lint and test which together constitute our CI. If make doesn't "pass" (i.e. you get errors or you don't get to the end of it or a test fails and you don't get "all green lights") then the code isn't "right" in some way or other and your PR won't pass the automated CI running by Github and the project won't merge you're code (until you've fixed it). So you should be running make like all all the time. This is your main feedback for if you're code is right or not. I run it so often that I have an alias that I can type m into my terminal and it runs make.

So the workflow when developing should look like:

change/write some code
leave it in staging (i.e. git staging, don't commit it)
run make and see if everything passes
- if it doesn't pass then fix the complaints and retry make
- if it does pass then continue
add the code you want for your next single commit to staging: one of my favourite git command is git add -up, you can check out the manual for what it does (hint: it just gives way more control as to what you're staging), but if you get in the habit of always using that and never ever using git add . your commits will become much better straight away.
now I would run git diff --staged, this shows me the diffs to the files that I've staged so I'm actually about to commit, and I read it all and check it is al good and proper
now I run make again! just to be sure
now I git commit and add a good message

Ok, so that could be a lot at once already, but here's some more important stuff.

Formatting:

the project using code formatters, these you can run on the entire code base and they shuffle everything around and make it confirm to a fully defined standard, this makes the code base very consistent and much easier to read, and avoids developers having discussions and wasting their time arguing about what format is better or worse - now no discussion, you just let the formatters do it and whatever they do is "right".
To run formatting run make format
When you run make, it will use the formatters to check that the formatting is correct

On your commit messages:

I hadn't seen the commit codes before, but if Numpy uses them then that's all good! Always a good way of deciding if you want to follow some convention of use some tool is to go look at the big open source projects that you respect and like (ref: https://numpy.org/devdocs/dev/development_workflow.html#writing-the-commit-message)

On you commit history:

This is probably going to be the biggest fucker to get clocked.
Right now, your git history, though well intentioned, is actually bad. The reasoning for this is that in reality you didn't want/need to make 3 commits - you didn't need to because you are actually only trying to make 1 minor addition. The reason there is 3 commits is because there was mistaking in the prior commits, and so commits 2 and 3 or actually fixing commit 1. As a review I don't want/need to see the history of the mistakes a developer has made, I just want to the history of the intended and good changes to code base.
So, the thing to start thinking about and reading about and learning is: How to edit/rewrite your git commit history
- IMO and from experience this is surprisingly tricky concept to implement in practice, so take your time about it, experiment a lot, and read around a lot about it.
So the intended commit history for this PR should just be 1 commit with the addition of your function.
Some hints/refs:
- git commit --amend is a good one, git commit --amend --no-edit is an even better one imo, which I use enough that I have an aliase of git ciane!
- Understanding and being a witch at using git rebase ... is a really crucial skill. But rebase is quiet a conceptual fucker at the start (for me at least). I think the biggest/most important mind shift was:
- At the start I thought of a commit as a like a save point or check point, and that each commit contained ALL the code in it. If you think like this then reordering commits or moving them around makes no sense, as how do you reorder check points?
- Commits are save points, they are patches, a commit is only the diffs, not all the code. All the code of built up by applying all the patches for all the commits in order. Once you have it firming in your mind that a commit is only the patch, only the few changes, i.e. "+ def some_func(): on line 28, + df1.merge(df2) on line 29", then reordering and generally slicing and dicing commits up which way anyway becomes a tenable concept.
This one article taught me more about git than anything else out there https://medium.com/@porteneuve/getting-solid-at-git-rebase-vs-merge-4fa1a48c53aa it's long and dense though, so don't expect to clock it in one sitting. I was working on it for a few months and often referred back to it for a long while until everything got super locked in.

Last nit: git commit message should be in the imperative:

So not "adding a mock function", but instead "add a mock function"

Oh ye, and in the top right of this page, you can pick/assign reviewers, if you wanna put some people (i.e. me, Ali if you want his thoughts also if you're lucky for him to give them!).

…dardisation

DBCerigo · 2022-01-10T12:00:12Z

Yo, hope it's ok for me to reply to those questions on here (if not then do go ahead and delete this comment once you've read/made a copy of it!):

The remaining problems stemmed from the fact that I had only checked linting using pylint rather than the MakeFile. Interestingly, pylint was giving me the green light, whereas flake8/black did not.

Yea so "linting" as a concept isn't an exactly defined thing: projects enforce all kinds of different varieties and flavours of linting using different tools with different standards. The thing to focus on instead of "check it passes linting" is to think "check does the project's CI run successfully on my changes", that will include the linting setup specific to that project and possibility all manor of other stuff (tests, type checking, building the documentation, building the package ready for release, etc.).

I am sorry I made the commit history even more confusing while trying to clean it up - it seems I have recommitted old changes by you and Alex. I understand the mistake I have made but do not want to attempt fixing it and risk erasing your work from my PR in the process.

No sorries needed! Everything you're doing on your fork floracharbo/kotsu is totally 100% separate and sandboxed from what's going on in datavaluepeople/kotsu. So those Alex commits aren't the same commits, they're your copy of those commits. So don't at all feel worried or restrained in what you're trying and experimenting with because you're worried you might break something here - you won't. And if you get your local repo in a weird state then just rm it and re-clone it to start afresh!
If you've got the history all in muddle, you can start a new branch off the old clean main - sometimes this is easier then trying to unravel something tricky one has accidentally made.

When I push my changes to my fork, I notice that it automatically updates my PR on your datavaluepeople fork. Is this desirable? Should I find workarounds to play around with the commit history etc on my fork only, before updating the PR all in one go so that you are not notified every time?

This is very much desirable yes. For instance, if you open a PR, then I request some changes, and you go make those changes and push them, then we definitely want those changes to be in that PR!
That said, if you're experimenting/working on a PR or doing further work on a PR, and it is not yet ready for review, then you don't have to push the changes, you can just leave them on your local machine, until you're happy with them, and then push them. Pushing after every commit is not really a habit to want to maintain; pushing after every meaningful chunk of work that have been checked passes the CI and would be worthy of someone elses time to look at is fine though (this might be an hour of work or a day's work). Or, if you're worried about having work you've done only saved on your local machine and thus open to data lose if you're machine suddenly implodes (or more likely you spill a cup of tea on it) then you can make a new branch to do your dev work on (a separate branch from the one this PR is from), push to that all the time and only when you're done rebase this PR branch on that work branch to get it to catch up and then push the PR branch. Thus you get the constant saving to the cloud (GitHub cloud) while not making GitHub notifications of "updated PR" every 5 mins :)

I do not think I have permission to assign reviewers. So this is your nudge to please review my pull request when you have time :)

Oh my bad - I'm still learning stuff too! :)

Hope this helps!

floracharbo · 2022-01-12T15:18:54Z

Ok, Dan,
I am not sure how useful that is as you probably will not accept the pull request anyways as this is just a mock function, but I have made a separate pull request with one clean commit. Thanks for your comments and I will close this PR.

DBCerigo requested review from DBCerigo and removed request for DBCerigo January 5, 2022 19:21

DBCerigo and others added 5 commits January 6, 2022 15:38

Update changelog for v0.2.0 release

9de3908

support passing run_params to model (n.b. typing now nonspecific)

dde7ab1

handle artefacts artefacts directory creation and trailing slash stan…

2a07483

…dardisation

Add additions for PR to changelog

2f47d88

Update changelog for v0.2.1 release

86478de

floracharbo force-pushed the shiny-new-feature branch from 383c69c to 86478de Compare January 6, 2022 14:51

Add mock congratulatory function

4c98f71

floracharbo force-pushed the shiny-new-feature branch from 6dd7dc0 to 4c98f71 Compare January 6, 2022 15:12

DBCerigo requested review from DBCerigo and ali-tny January 10, 2022 11:56

floracharbo closed this Jan 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH adding a mock function #43

ENH adding a mock function #43

floracharbo commented Jan 5, 2022

DBCerigo commented Jan 6, 2022 •

edited

DBCerigo commented Jan 10, 2022 •

edited

floracharbo commented Jan 12, 2022

ENH adding a mock function #43

ENH adding a mock function #43

Conversation

floracharbo commented Jan 5, 2022

DBCerigo commented Jan 6, 2022 • edited

DBCerigo commented Jan 10, 2022 • edited

floracharbo commented Jan 12, 2022

DBCerigo commented Jan 6, 2022 •

edited

DBCerigo commented Jan 10, 2022 •

edited