From f2ec879928fb9f4cb173262645ceb97de8e69188 Mon Sep 17 00:00:00 2001 From: Jason DeBacker Date: Tue, 19 May 2026 11:04:30 -0400 Subject: [PATCH 1/6] Expand tutorial content and migrate book config --- README.md | 8 +- book/{_config.yml => _config.yml.jb1.bak} | 0 book/{_toc.yml => _toc.yml.jb1.bak} | 0 book/content/glossary.md | 38 ++-- book/content/intro.md | 68 +++---- book/content/using/AdvancedGit.md | 90 ++++++++- book/content/using/Attribution.md | 68 ++++++- book/content/using/GitHubBasics.md | 129 +++++++++++- book/content/using/GitHubWorkflow.md | 235 ++++++++++++++++++++-- book/content/using/IssuePRthreads.md | 71 ++++++- book/content/using/MergeConflicts.md | 99 ++++++++- book/myst.yml | 49 +++++ 12 files changed, 775 insertions(+), 80 deletions(-) rename book/{_config.yml => _config.yml.jb1.bak} (100%) rename book/{_toc.yml => _toc.yml.jb1.bak} (100%) create mode 100644 book/myst.yml diff --git a/README.md b/README.md index 67320f0..c74e103 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ [![PSL incubating](https://img.shields.io/badge/PSL-incubating-ff69b4.svg)](https://www.PSLmodels.org) -# Git and GitHub Use, Collaboration, and Workflow (*IN PROGRESS*) -This repository houses the source code and content files for the open access, *Git and GitHub Use, Collaboration, and Workflow* book tutorial and training that many contributors to the [PSLmodels](https://github.com/PSLmodels) community use. This project uses [Jupyter Book](https://jupyterbook.org/intro.html) 0.7.3 to create the HTML and Jupyter notebook forms of the tutorial. This public GitHub repository hosts all the source code for the book, and the compiled Jupyter book is available at the GitHub page for this repository [https://pslmodels.github.io/Git-Tutorial](https://pslmodels.github.io/Git-Tutorial), up to the most recent commit to the master branch. We hope to add tutorial videos at some point in the future. This project is maintained by [Richard W. 
Evans](https://sites.google.com/site/rickecon/) and [Jason DeBacker](https://www.jasondebacker.com/). +# Git and GitHub Use, Collaboration, and Workflow +This repository houses the source code and content files for the open access, *Git and GitHub Use, Collaboration, and Workflow* book tutorial and training used by many contributors in the [PSLmodels](https://github.com/PSLmodels) community. This project uses [Jupyter Book](https://jupyterbook.org/intro.html) to create the published HTML version of the tutorial. The source lives in this GitHub repository, and the compiled book is available at [https://pslmodels.github.io/Git-Tutorial](https://pslmodels.github.io/Git-Tutorial). This project is maintained by [Richard W. Evans](https://sites.google.com/site/rickecon/) and [Jason DeBacker](https://www.jasondebacker.com/). ## Contributing to the book @@ -13,7 +13,7 @@ From your fork of this repository, you can generate your own version of the book ### Setting up the virtual environment -The virtual environment specifications are defined in the [`environment.yml`]() file. If you have not set up the conda environment, navigate to your `Git-Tutorial` repository folder in your terminal on your local machine and execute the following two commands. If you have already created the conda environment, then simply activate it using the second command below (skip the first command). +The virtual environment specifications are defined in the [`environment.yml`](environment.yml) file. If you have not set up the conda environment, navigate to your `Git-Tutorial` repository folder in your terminal on your local machine and execute the following two commands. If you have already created the conda environment, then simply activate it using the second command below (skip the first command). ```bash conda env create -f environment.yml @@ -45,4 +45,4 @@ jb clean --all ./book ## Notes -Put notes here. +This repository is actively maintained and contributions are welcome. 
The most helpful contributions usually improve tutorial clarity, correct command examples, add beginner-friendly explanations, or expand the hands-on workflow chapters. diff --git a/book/_config.yml b/book/_config.yml.jb1.bak similarity index 100% rename from book/_config.yml rename to book/_config.yml.jb1.bak diff --git a/book/_toc.yml b/book/_toc.yml.jb1.bak similarity index 100% rename from book/_toc.yml rename to book/_toc.yml.jb1.bak diff --git a/book/content/glossary.md b/book/content/glossary.md index b0cbdaf..67dd6e1 100644 --- a/book/content/glossary.md +++ b/book/content/glossary.md @@ -6,19 +6,19 @@ application programming interface An application programming interface or API is the medium, method, and rules through which a user interacts with software. The API includes a medium which can be a {term}`command line interface` on a specific {term}`local` terminal or a {term}`graphical user interface`. The API also defines the commands through which a user interacts with the software. benevolent dictator - TODO: Make *benevolent dictator* entry... + In open-source software, a benevolent dictator is an informal term for a person who has broad authority over the direction of a project and final say over which changes are accepted. In many modern projects, decision-making is shared across several maintainers rather than centered on one individual. Bitbucket *Bitbucket* or [*Bitbucket.org*](https://bitbucket.org/) is a {term}`cloud` {term}`source code management service` platform designed to enable scalable, efficient, and secure version controlled collaboration by linking {term}`local` {term}`Git` version controlled software development by developers. Bitkeeper - TODO: Put Bitkeeper definition here... + BitKeeper was a proprietary version control system that played an important role in the history of Linux kernel development and influenced the design of {term}`Git`. Box, Inc. - TODO: Box Inc. definition... University file sharing company... + Box, Inc. 
is a cloud file storage and file sharing service used by many universities and businesses. It supports file synchronization and sharing, but it is not a full version control platform like GitHub. branch - TODO: define branch + A branch is a named line of development in a Git repository. Branches allow a user to work on one set of changes independently from another set of changes. centralized version control system A centralized version control system or CVCS is an approach to version control in which all the files in a {term}`repository` as well as the change history (content and timing) are located on a central {term}`remote` server. User's check out versions of files from the repository and check them back in, creating new change history on the central server. @@ -30,22 +30,22 @@ cloud Cloud can be a descriptor or a noun. As a descriptor, cloud refers to computational resources, such as servers, that are accessed remotely via the internet. As a noun, remote computational resources and storage can be referred to generically as "the cloud". command line interface - TODO: A *command line interface* or CLI... + A command line interface or CLI is a text-based way of interacting with software by typing commands into a terminal. commit - TODO: *Commit* can be a verb or a noun. Define commit... + Commit can be a verb or a noun in Git. As a verb, to commit means to record a set of staged changes in the repository history. As a noun, a commit is that recorded snapshot together with its author, timestamp, and message. continuous integration - Continuous integration or continuous integration unit testing is... + Continuous integration or CI is the practice of automatically running checks, such as tests, builds, or style validation, whenever changes are proposed or merged. 
distributed version control system A *distributed version control system* or DVCS is {term}`version control system` software on any computer, {term}`local` or {term}`remote`, that tracks the entire history of changes to a {term}`repository` and coordinates and organizes collaboration among multiple users. It is distributed in the sense that multiple {term}`clone`s of a single {term}`remote` repository have the same full history of that repository. Dropbox - TODO: define Dropbox + Dropbox is a cloud file storage and synchronization service. It is useful for sharing files, but it does not provide the same history, branching, and code review features as Git and GitHub. fork - TODO: define fork + A fork is a copy of a remote repository created under a different GitHub account or organization. Forks are central to many open-source workflows because they let contributors propose changes without direct write access to the upstream repository. Git *Git* is {term}`open source` {term}`version control system` software with capability designed to also operate as {term}`distributed version control system` (DVCS) software that resides on your local computer and tracks changes and the history of changes to all the files in a directory or {term}`repository`. See the Git website [https://git-scm.com/](https://git-scm.com/) and the [Git Wikipedia entry](https://en.wikipedia.org/wiki/Git) {cite}`GitWiki2020` for more information. @@ -54,25 +54,25 @@ GitHub *GitHub* or [*GitHub.com*](https://github.com/) is a {term}`cloud` {term}`source code management service` platform designed to enable scalable, efficient, and secure version controlled collaboration by linking {term}`local` {term}`Git` version controlled software development by users. *GitHub*'s main business footprint is hosting a collection of millions of version controlled code repositories. 
In addition to being a platform for {term}`distributed version control system` (DVCS), *GitHub*'s primary features include code review, project management, {term}`continuous integration` {term}`unit testing`, {term}`GitHub actions`, and associated web page (GitHub pages) and documentation hosting and deployment. GitHub actions - GitHub actions + GitHub Actions is GitHub's automation system for running workflows such as tests, builds, deployments, and repository maintenance tasks. GitLab - TODO: define *GitLab*... + GitLab is a source code hosting and collaboration platform similar to GitHub. It supports Git repositories, issue tracking, merge requests, CI, and other project management features. Google Docs - TODO: define Google Docs + Google Docs is a cloud-based collaborative word-processing application. It is useful for shared writing, but it is not a substitute for Git-based version control of code repositories. Google Drive - TODO: define Google Drive + Google Drive is a cloud storage and file synchronization platform from Google. It can store and share project files, but it does not provide Git-style branching, merging, and commit history. graphical user interface - A graphical user interface or GUI... + A graphical user interface or GUI is a visual way of interacting with software through windows, buttons, menus, icons, and other on-screen elements. integrated development environment - Integrated development environment or IDE is a software application that comsolidates many of the functions of software development under one program. IDE's often include a code editor, object memory and identification, debugger, and build automation tools. (See [IDE Wikipedia entry](https://en.wikipedia.org/wiki/Integrated_development_environment) {cite}`GitIDE2020`.) + Integrated development environment or IDE is a software application that consolidates many of the functions of software development under one program. 
IDEs often include a code editor, object memory and identification, debugger, and build automation tools. (See [IDE Wikipedia entry](https://en.wikipedia.org/wiki/Integrated_development_environment) {cite}`GitIDE2020`.) Linux - TODO: write Linux description... + Linux is a family of open-source operating systems whose development history is closely tied to the history of Git. local *Local* is a descriptor that refers to files that reside or operations that are performed on a user's machine to which he or she has direct access without using the internet. @@ -81,13 +81,13 @@ local version control system A *local version control system* or LVCS is the simplest and most common approach to VCS. LVCS stores all the changes to the files in a {term}`repository` locally on the user's machine as a series of changes or deltas in the files. This is the approach taken by Apple's Time Machine backup software as most software that includes an "undo" function. merge - TODO: create *merge* entry... + Merge can be a verb or a noun. To merge is to combine changes from one branch into another branch. A merge may happen automatically or may require human conflict resolution. open source *Open source* is a descriptor that is usually applied to software or computer code projects, but can also be applied to any project based upon or represented by digital files. An open source project is one in which the source code is freely available to be downloaded and used and in which collaboration, improvements, and changes to the code are encouraged. The free download and use (outward direction) aspect of *open source* is often emphasized. But the collaboration and improvement contribution (inward direction) aspect is at least as important. {term}`Git` and {term}`GitHub` have enabled efficient and scalable collaboration to a degree not seen in other collaborative workflows. pull request - TODO: define *pull request*... 
+ A pull request or PR is a GitHub request asking maintainers to review and merge a proposed set of changes from one branch into another branch. remote *Remote* is a descriptor that refers to files that reside or operations that are carried out on a server to which a user has access using the internet. @@ -99,7 +99,7 @@ source code management service A *source code management service* is a {term}`cloud` platform that hosts computer code files and provides either {term}`centralized version control system` (CVCS) or {term}`distributed version control system`. As the central hub of either CVCS or DVCS, the source code management service provides the platform and rules for distributed code collaboration. Leading examples are {term}`GitHub` and {term}`Bitbucket`. unit testing - Unit testing is... + Unit testing is the practice of writing and running tests for small, individual parts of a code base to confirm that each part behaves as expected. version control system *Version control system* or version control software or VCS is software that records changes to a set of files, including the order in which the changes were made and the content of those changes, in such a way that previous versions can be recalled or restored. diff --git a/book/content/intro.md b/book/content/intro.md index b1895ea..d4092b2 100644 --- a/book/content/intro.md +++ b/book/content/intro.md @@ -3,13 +3,16 @@ This book is published online at [https://pslmodels.github.io/Git-Tutorial/](https://pslmodels.github.io/Git-Tutorial/) using [Jupyter Book](https://jupyterbook.org/intro.html) as its publishing engine. The source code for this book is publicly available in the GitHub repository [https://github.com/PSLmodels/Git-Tutorial](https://github.com/PSLmodels/Git-Tutorial), and users are encouraged to submit pull requests with contributions and corrections. 
All content is licensed under the [Creative Commons Attribution Non Commercial Share Alike 3.0 license](https://creativecommons.org/licenses/by-nc-sa/3.0/). -This Git and GitHub tutorial grew out of training needs that constantly recurred and overlapped in our collaborative circles. We would need to train any new collaborators on [Policy Simulation Library](https://github.com/PSLmodels) incubated projects on the Git and GitHub workflow we used. Since 2013, the authors have been training students, faculty, policy makers, other researchers to collaborate on research and code using Git and GitHub. Until now, the materials we would use to train our collaborators were pieced together from a number of great sources. This book represents the consolidation of those resources and our experience together in one place. +This Git and GitHub tutorial grew out of training needs that constantly recurred and overlapped in our collaborative circles. We repeatedly needed to help new collaborators contribute to [Policy Simulation Library](https://github.com/PSLmodels) repositories using a workflow that was safe, transparent, and scalable. Since 2013, the authors have trained students, faculty, policy makers, and researchers to collaborate on research and code using Git and GitHub. This book collects those lessons in one place. + +Two warnings that an experienced Git user should always give a new entrant to version-controlled collaboration are the following. -Two warnings that a seasoned Git and GitHub user should always give a new entrant to this type of version control and code collaboration are the following. * The learning curve is steep. * The workflow initially is not intuitive. -These two obstacles seem to work together to make this form of collaboration harder than the sum of their parts initially. 
However, once you begin collaborating on open source projects or on large-group academic or research projects, you start to see the value of all the different steps, methods, and safeguards invoved with using Git and GitHub. {numref}`Figure %s <GitFlowDiag>` below is a diagram of the main pieces and actions in the primary workflow that we advocate in this book. You will notice that a version of this figure is the main image for the book and is also the `favicon` for the tabs of the web pages of the online book. This figure of a Git and GitHub workflow diagram looks complicated, but these actions will become second nature. And following this workflow will save the collaborators time in the long-run. +Those obstacles are real, but they are temporary. Once you begin collaborating on open-source projects or on large-group academic and research projects, the purpose of the workflow becomes much clearer. The extra steps create safeguards around shared work: they make changes reviewable, preserve a precise history, and reduce the risk that one person's mistake will disrupt everyone else's work. + +{numref}`Figure %s <GitFlowDiag>` below is a diagram of the main pieces and actions in the primary workflow that we advocate in this book. At first glance, it looks busy. That is normal. The main goal of this tutorial is to help the reader connect the commands, the GitHub interface, and the underlying logic of that workflow so that the process becomes manageable and then routine. ```{figure} ../_static/lecture_specific/intro/GitFlowDiag.png --- @@ -20,10 +23,9 @@ name: GitFlowDiag Flow diagram of Git and GitHub workflow ``` - ## Brief definitions -Before we move on with the introduction and the rest of this book, we want to give the reader a quick reference, definition, and comparison of Git and GitHub. We will spend entire chapters on these two topics in {ref}`chap_VCgitHist` and {ref}`chap_GitHubHist`. But we want to give a brief reference here at the beginning. 
A full {ref}`chap_glossary` is included in the Appendix of this book. +Before we move on, we give a short comparison of Git and GitHub. We spend more time on these topics in {ref}`chap_VCgitHist` and {ref}`chap_GitHubHist`, and a fuller set of definitions appears in the Appendix glossary. ```{admonition} Definition: Repository :class: note @@ -32,63 +34,61 @@ A {term}`repository` or "repo" is a directory containing files that are tracked ```{admonition} Definition: Git :class: note -{term}`Git` is an {term}`open source` {term}`distributed version control system` (DVCS) software that resides on your local computer and tracks changes and the history of changes to all the files in a directory or {term}`repository`. See the Git website [https://git-scm.com/](https://git-scm.com/) and the [Git Wikipedia entry](https://en.wikipedia.org/wiki/Git) {cite}`GitWiki2020` for more information. +{term}`Git` is {term}`open source` version control software that resides on your local computer and tracks changes and the history of changes to all the files in a directory or {term}`repository`. It also provides commands for moving and integrating those changes across related repositories. ``` ```{admonition} Definition: GitHub :class: note -{term}`GitHub` or [*GitHub.com*](https://github.com/) is a {term}`cloud` {term}`source code management service` platform designed to enable scalable, efficient, and secure version controlled collaboration by linking {term}`local` {term}`Git` version controlled software development by users. *GitHub*'s main business footprint is hosting a collection of millions of version controlled code repositories. In addition to being a platform for {term}`distributed version control system` (DVCS), *GitHub*'s primary features include code review, project management, {term}`continuous integration` {term}`unit testing`, {term}`GitHub actions`, and associated web page (GitHub pages) and documentation hosting and deployment. 
+{term}`GitHub` is a {term}`cloud` {term}`source code management service` built around Git repositories. GitHub hosts remote repositories and adds tools for code review, project management, {term}`continuous integration`, pull requests, issue tracking, and documentation hosting. ``` -To be clear at the outset, Git is the version control software that resides on your local computer. It's main functionalities are to track changes in the files in specified directories. But Git also has some functionality to interact with remote repositories. The ineraction between Git and GitHub creates an ideal environment and platform for scaleable collaboration on code among large teams. +To be clear at the outset, Git is the version control software that lives on your machine, while GitHub is the collaboration platform that helps groups coordinate their Git-based work. The interaction between the two creates a practical environment for large-scale collaboration. ## Wide usage -Every year in November, GitHub publishes are report entitled, "The State of the Octoverse", in which they detail the growth and developments in the GitHub community in the most recent year. The most recent [State of the Octoverse](https://github.blog/2019-11-06-the-state-of-the-octoverse-2019/) was published on November 6, 2019 and covered developments from October 1, 2018 to September 30, 2019. Some interesting statistics from that report are the following. -* more than 40 million GitHub user accounts -* more than 100 million code repositories - -Alternatives to GitHub include [GitLab](https://about.gitlab.com/), [Bitbucket](https://bitbucket.org/). Other alternatives are documented in [this June 2020 post](https://www.softwaretestinghelp.com/github-alternatives/) by Software Testing Help. But GitHub has the largest user base and largest number of repositories. +GitHub has become one of the dominant collaboration platforms for open-source software and many research software projects. 
That widespread use matters for beginners because it means that the skills learned here transfer to a large number of projects, organizations, and workplaces. +Alternatives to GitHub include [GitLab](https://about.gitlab.com/) and [Bitbucket](https://bitbucket.org/). The details of the interface differ across platforms, but the core Git concepts discussed in this book carry over. ## Other great Git and GitHub resources -Prior to writing this book, the authors pieced together training materials from some great resources. We highlight these resources here both to document what many of our ideas are built upon and to recommend resources that should be used concurrently with this book. -* *Pro Git* {cite}`ChaconStraub2020` is the free official textbook of the Git software open source project. You can access this book online at [https://git-scm.com/book/en/v2](https://git-scm.com/book/en/v2) or you can buy a hard copy from [Amazon.com](https://www.amazon.com/Pro-Git-Scott-Chacon/dp/1484200772?ie=UTF8&camp=1789&creative=9325&creativeASIN=1430218339&linkCode=as2&tag=git-sfconservancy-20). This book focuses on the documentation of the Git software and not as much on its interaction with third party repository hosting services like GitHub. *Pro Git* has translations in 13 languages besides English and has started translations in 16 other languages. -* QuantEcon's "[Git, GitHub, and Version Control](https://julia.quantecon.org/more_julia/version_control.html)" article {cite}`SoodEtAl2020`. This article makes a quick run through Git setup, workflows, and merge conflicts. This article also has some nice exercises to work through at the end. -* Kate Hudson's (the developer, not the actress) "[Flight Rules for Git](https://github.com/k88hudson/git-flight-rules/)" is a how-to guide for actions with Git. Think of this as a Git FAQ. All the links are in the [README.md](https://github.com/k88hudson/git-flight-rules/blob/master/README.md) of this GitHub repository. 
-* GitHub's [Learning Lab](https://lab.github.com/) site offers tutorials for using Git and GitHub in the form of different learning paths (sequences of tutorials) and individual tutorials. -* [Bitbucket Git tutorials](https://www.atlassian.com/git/tutorials). Bitbucket is the next closest competitor to GitHub, so these tutorials focus on using Git to interact with Bitbucket, which is a little bit different from GitHub. However, the Git tutorials are good. -* Katie Sylor-Miller and Julia Evans [*Oh Shit, Git!*](https://wizardzines.com/zines/oh-shit-git/) is a book in which each chapter is dedicated to key mistakes a Git user will make. This playful e-book costs $10 and can be printed as a PDF. The authors have not used this book, but it looks fantastic. If you prefer or need a cleaner book title, it is also published as [*Dangit, Git!*](https://gumroad.com/l/dangit-git) for the same price. -* Data School's Justin Markham wrote a nice web page entitled, "[Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/)" dated June 11, 2020. This is a nice general version of what most contributor guides require. The advantages of this page are that it is highly visual and includes links to other tutorial material. -* The [Git Cheatsheet](https://ndpsoftware.com/git-cheatsheet.html) by NDP Software is an interactive html page that gives the relevant git commands in the five areas of stash, workspace, index, local repository, and upstream repository, as well as how those commands flow into the other areas' commands. -* This [git-pretty flowchart](http://justinhileman.info/article/git-pretty/git-pretty.png) by Justin Hileman is equal parts tongue-and-cheek and realistic heuristic for deciding what to do when you have a particular problem in Git. -* [Git Koans](https://stevelosh.com/blog/2013/04/git-koans/) by Steve Losh, posted April 8, 2013. 
A Koan rhetorical device from Zen Buddhism in the form of a story or dialogue that moves the listener or reader toward understanding and enlightenment. Losh's Git koans are a clever way to illustrate some of the conventions used in the Git API ({term}`application programming interface`). +Prior to writing this book, the authors pieced together training materials from a number of great resources. We highlight them here both to acknowledge those influences and to recommend companion references. + +* *Pro Git* {cite}`ChaconStraub2020` is the free official textbook of the Git software open source project. You can access it online at [https://git-scm.com/book/en/v2](https://git-scm.com/book/en/v2). +* QuantEcon's "[Git, GitHub, and Version Control](https://julia.quantecon.org/more_julia/version_control.html)" article {cite}`SoodEtAl2020` provides a concise introduction with helpful exercises. +* Kate Hudson's "[Flight Rules for Git](https://github.com/k88hudson/git-flight-rules/)" is an excellent Git troubleshooting reference. +* [Bitbucket Git tutorials](https://www.atlassian.com/git/tutorials) explain many Git concepts clearly, even when their examples use Bitbucket rather than GitHub. +* [Git Koans](https://stevelosh.com/blog/2013/04/git-koans/) by Steve Losh presents Git ideas through short conceptual exercises. ## Open source, Policy Simulation Library, research, and collaboration -Finish this preface with a big-picture discussion of where Git and GitHub fit in the broader open source, policy, research, collaboration setting. Discussion of PSLmodels goals, scalable collaboration, precise attribution, open source ethos, heirarchical protection of code, testing, documentation, best practices, harmonization of convention, ease of participation, modularity, crowd sourcing contributions. +This tutorial is shaped by the needs of collaborative open-source research. PSLmodels repositories are not just software products. 
They are also part of a broader research and policy workflow in which code, documentation, assumptions, and review history all matter. +In that setting, Git and GitHub provide several important benefits. -## Outline of book and exercises +* They make contributions easier to review before they are accepted. +* They preserve a precise history of who changed what and why. +* They support testing and documentation alongside code. +* They let maintainers protect the main branch while still welcoming outside contributions. +* They lower the cost of onboarding new contributors because the workflow and discussion are visible. + +The PSLmodels workflow emphasizes these goals. A contributor is encouraged to work on a branch, open a pull request, discuss the change in public, pass automated checks, and merge only after review. That process may feel slower at first, but it scales much better than informal file sharing or email-based patch exchange. -In the online version of this book, the table of contents can be toggled to be visible or not visible by clicking on the arrow at the upper-left corner of the main text column of each page. The book is divided into six parts, each of which has sub chapters. +## Outline of book and exercises -Introduction +In the online version of this book, the table of contents can be toggled visible or hidden by clicking the arrow at the upper-left corner of the main text column. The book is divided into several parts covering background, hands-on Git and GitHub use, repository management practices, and editor configuration. ```{tableofcontents} ``` -The contribution of this book to the large body of Git and GitHub references is to bring together in one place the tools and specific instruction for a beginning to move up the Git learning curve as quickly as possible. This is the only resource we know of that combines Git and GitHub functionality and usage with the tools of open source repository management. 
This is the guide that we wish we had had when we were learning to collaborate with Git and GitHub. - -We have also include exercises in some of the chapters. Chapters ? have some of the most important exercises. These are meant to give the user hands on experience with the issues that often come up in collaborating with Git and GitHub. +The main contribution of this book is to bring together beginner Git usage, GitHub collaboration, and open-source repository management in one place. We especially aim to help the reader make the jump from "I have heard of Git" to "I can contribute safely to a shared repository." +Several chapters include examples and practical exercises. The most important exercises are those that ask the reader to create a branch, make a commit, open a pull request, and resolve a simple conflict, because those tasks mirror the real situations that arise in collaborative work. ## About the authors -[**Richard W. Evans**](https://sites.google.com/site/rickecon/) is Advisory Board Visiting Fellow at Rice University's [Baker Institute for Public Policy](https://www.bakerinstitute.org/). He also holds appointments as Director of the [Open Source Economics Laboratory](https://www.oselab.org/), Nonresident Fellow at the [Tax Policy Center](https://www.taxpolicycenter.org/) of the Urban Institute and Brookings Institution, President of [Open Research Group, Inc.](https://www.openrg.com/), and Senior Editor at the [Center for Growth and Opportunity](https://www.thecgo.org/) at Utah State University. Evans specializes in macroeconomics, public economics, and computational economics. Rick was previously Associate Director and Senior Lecturer in the [M.A. Program in Computational Social Science](https://macss.uchicago.edu/) at the University of Chicago from 2016 to 2020 and a Fellow at the [Becker Friedman Institute](https://bfi.uchicago.edu/) at the University of Chicago from 2016 to 2019. 
He was the Co-Founder and Co-Director of the BYU Macroeconomics and Computational Laboratory at Brigham Young University from 2012 to 2016. He was Assistant Professor in the BYU Economics Department from 2008 to 2016. +[**Richard W. Evans**](https://sites.google.com/site/rickecon/) is Senior Economist at the [Abundance Institute](https://www.abundance.institute/), Director of the [Open Source Economics Laboratory](https://www.oselab.org/), and President of [Open Research Group, Inc.](https://www.openrg.com/). Evans specializes in macroeconomics, public economics, and computational economics. Rick was previously Associate Director and Senior Lecturer in the [M.A. Program in Computational Social Science](https://macss.uchicago.edu/) at the University of Chicago from 2016 to 2020 and a Fellow at the [Becker Friedman Institute](https://bfi.uchicago.edu/) at the University of Chicago from 2016 to 2019. He was the Co-Founder and Co-Director of the BYU Macroeconomics and Computational Laboratory at Brigham Young University from 2012 to 2016. He was Assistant Professor in the BYU Economics Department from 2008 to 2016. After receiving a B.A. in economics from Brigham Young University in 1998, he began his economic career as a Research Economist at Thredgold Economic Associates in Salt Lake City, Utah, providing state and national economic analysis for Zions Bank and their operations in eight western states. Rick later received an M.A. in Public Policy from Brigham Young University in and Ph.D. in Economics from the University of Texas at Austin. He has also spent time as a researcher at the Joint Economic Committee of the U.S. Congress, the Federal Reserve Bank of Dallas, Utah Economic Council, and as an economic consultant. Rick’s current research focuses on building large-scale, open-source, dynamic general equilibrium macroeconomic models of tax policy and providing web applications and training to allow non-experts to use these models for policy analysis. 
Rick is a core maintainer of the [OG-USA](https://github.com/PSLmodels/OG-USA) open source large-scale overlapping generations macroeconomic model of U.S. fiscal policy, and has provided macroeconomic modeling consulting services to the European Commission, World Bank, and Indian Ministry of Finance. -[**Jason DeBacker**](https://jasondebacker.com/) is Associate Professor in the Department of Economics at the Darla Moore School of Business at the University of South Carolina and Nonresident Fellow at the [Tax Policy Center](https://www.taxpolicycenter.org/) of the Urban Institute and Brookings Institution. His research interests lie in the areas of public finance and macroeconomics. He has published papers on these topics in the *Journal of Financial Economics*, *Journal of -Law and Economics*, the *Journal of Public Economics*, the *Brookings Papers on Economic Activity*, and other outlets. From 2009 to 2012, Jason worked as a financial economist in the Office of Tax Analysis at the U.S. Department of the Treasury. He earned a Ph.D. in economics from the University of Texas at Austin. +[**Jason DeBacker**](https://jasondebacker.com/) is Associate Professor in the Department of Economics at the Darla Moore School of Business at the University of South Carolina and Nonresident Fellow at the [Tax Policy Center](https://www.taxpolicycenter.org/) of the Urban Institute and Brookings Institution. His research interests lie in the areas of public finance and macroeconomics. He has published papers on these topics in the *Journal of Financial Economics*, *Journal of Law and Economics*, the *Journal of Public Economics*, the *Brookings Papers on Economic Activity*, and other outlets. From 2009 to 2012, Jason worked as a financial economist in the Office of Tax Analysis at the U.S. Department of the Treasury. He earned a Ph.D. in economics from the University of Texas at Austin. 
diff --git a/book/content/using/AdvancedGit.md b/book/content/using/AdvancedGit.md index cdc39b8..e9d603f 100644 --- a/book/content/using/AdvancedGit.md +++ b/book/content/using/AdvancedGit.md @@ -1,7 +1,91 @@ (chap_advgit)= # Advanced Git and GitHub -Create branch from someone else's branch, submit pull request to someone else's branch, reset, rebase. +This chapter collects several topics that are very useful once you are comfortable with the standard fork, branch, commit, push, and pull request workflow. These tools are powerful, but they are best learned after the beginner workflow already feels familiar. -* git blame -* [git-fame](https://github.com/casperdcl/git-fame). Great package that outputs statistics on contributors including author, lines of code, files, distribution (stats), sorted by most contributions. +## Branching from someone else's branch + +Sometimes you want to continue work that started on another contributor's branch. There are a few ways to do this, but the safest approach is usually: + +1. fetch the relevant remote branch +2. create your own local branch from it +3. push your new branch to your own fork + +For example: + +```bash +git fetch upstream contributor-branch +git checkout -b continue-contributor-work FETCH_HEAD +git push origin continue-contributor-work +``` + +This keeps your work under your own branch name and avoids accidental pushes to someone else's branch. + +## Opening a pull request based on another branch + +Not every pull request has to target `main`. Sometimes a repository may ask you to build on top of an open feature branch. In that case, the pull request base branch may be another contributor's branch rather than the default branch. 
+ +This can be helpful when: + +* a large feature is being built in stages +* a maintainer asks you to contribute to work already in progress +* several related PRs need to merge in a specific order + +When doing this, be extra clear in the PR description about which branch your work depends on. + +## `git reset` + +`git reset` moves branch history or unstages changes, depending on how it is used. Because it can rewrite history, it deserves care. + +Common safe uses include: + +```bash +git reset HEAD path/to/file +git reset --soft HEAD~1 +``` + +These are often used to: + +* unstage a file +* redo the most recent commit while keeping the changes + +Avoid more destructive forms until you are confident you understand the consequences. + +## `git rebase` + +Rebasing reapplies commits on top of a different base commit. This is often used to produce a cleaner, more linear history. + +For example, to replay your feature branch on top of updated `main`: + +```bash +git checkout my-feature-branch +git rebase main +``` + +Rebasing is especially helpful before opening a PR or when a maintainer prefers a linear project history. However, rebasing rewrites commit history. If you have already pushed the branch, updating GitHub may require a force-push: + +```bash +git push --force-with-lease origin my-feature-branch +``` + +Use `--force-with-lease`, not plain `--force`, because it adds a useful safety check. + +## `git blame` + +`git blame` shows which commit most recently changed each line of a file. It is a powerful way to understand the history behind a particular piece of code or documentation. + +```bash +git blame path/to/file +``` + +This command is most useful when used with curiosity rather than blame in the ordinary sense of the word. It helps you answer questions like: + +* When was this line introduced? +* Who might know the background of this code? +* Which commit should I read next? 
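
As a minimal sketch of the questions above, the following script builds a throwaway repository and uses `git blame` to find the commit behind one line. The file name, commit messages, and author identity are hypothetical placeholders, not part of any real project.

```bash
# Build a throwaway repository so the example does not touch real work.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# First commit: a file with two lines.
printf 'line one\nline two\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes file"

# Second commit: change only the second line.
printf 'line one\nline two, revised\n' > notes.txt
git commit -q -am "Revise second line"

# Show the last commit that touched each line of the file.
git blame notes.txt

# Restrict blame to line 2 and keep the abbreviated commit hash.
blame_hash="$(git blame -L 2,2 notes.txt | cut -d' ' -f1)"

# Read the commit behind that line: message, author, and files touched.
git show --stat "$blame_hash"
```

The `-L 2,2` option limits the output to a line range, which is usually more practical than reading blame output for a whole file.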
+ +## Contribution summaries + +Some projects use tools that summarize who contributed what. One example is [git-fame](https://github.com/casperdcl/git-fame), which reports contributor statistics such as lines changed, files touched, and contribution shares. + +These tools can be interesting and occasionally useful for project maintenance, but they should never replace careful reading of commit history, issues, and PR discussions. diff --git a/book/content/using/Attribution.md b/book/content/using/Attribution.md index 0f700a7..284fcb3 100644 --- a/book/content/using/Attribution.md +++ b/book/content/using/Attribution.md @@ -1,7 +1,67 @@ (chap_attrib)= # Attribution -Go through different ways that Git and GitHub clearly and precisely attribute code to authors. -* academic sense of attribution, much better than author list in a journal article -* GitHub page is used more than resume in tech hiring decisions -* `git blame` +One of the strengths of Git and GitHub is that they provide much more precise attribution than many older collaboration methods. In a healthy repository, you can usually tell not only who wrote a line of code, but also when it was added, what issue or pull request motivated it, and what discussion surrounded the change. + +## Why attribution matters + +Attribution is important for several reasons. + +* It gives contributors visible credit for their work. +* It helps maintainers know who may understand a particular part of the repository. +* It supports reproducibility by showing when and why a change entered the project. +* It creates a richer record than a static author list on a paper or report. + +This is especially valuable in open-source research collaboration, where software, documentation, and analysis may evolve over many years and across many contributors. + +## Commits as attribution records + +Every Git commit records an author, a timestamp, and a message. 
If commit messages are informative, the history becomes a useful project narrative rather than a list of opaque snapshots. + +A good commit history helps answer questions such as: + +* Who introduced this change? +* What problem was this commit trying to solve? +* Was this part of a larger refactor or a one-line fix? + +## Pull requests as attribution plus review history + +A pull request records more than authorship. It also preserves: + +* the branch where the work happened +* the commits that made up the change +* review comments +* requests for revision +* links to related issues +* CI results at the time of review + +That makes a PR thread one of the best places to understand the history and reasoning behind a change. + +## `git blame` + +The command `git blame` shows which commit last changed each line of a file. + +```bash +git blame path/to/file +``` + +This is often the fastest way to identify the commit that introduced or revised a line you are trying to understand. From there, you can inspect the commit message or find the related pull request on GitHub. + +## Attribution in an academic and research setting + +In many academic settings, attribution is compressed into an author list, acknowledgments section, or changelog. Git and GitHub provide a much finer-grained record. + +They can show: + +* which contributor wrote or revised a section +* which contributor fixed a bug +* which reviewer requested an important change +* which maintainer ultimately approved the result + +That level of detail can be very helpful for onboarding, collaboration, and transparency. + +## A note of caution + +Attribution tools are powerful, but they should be interpreted carefully. A line-based tool such as `git blame` identifies the most recent editor of a line, not necessarily the only person responsible for the underlying idea or design. + +Use Git history as one source of evidence, and read related commits, issues, and PRs for fuller context. 
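
The caution above can be illustrated with a small sketch. In this throwaway repository, one hypothetical author introduces a line and a second hypothetical author only re-indents it. `git blame` then points to the second author, while a `git log -S` search, which lists commits that changed how often a string occurs, still finds the commit that introduced the text.

```bash
# Throwaway repository; file name, authors, and values are hypothetical.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# The first author introduces the line.
echo "rate = 0.21" > params.txt
git add params.txt
git -c user.name="First Author" -c user.email="first@example.com" \
    commit -q -m "Add rate parameter"

# The second author only re-indents the same line.
echo "    rate = 0.21" > params.txt
git -c user.name="Second Author" -c user.email="second@example.com" \
    commit -q -am "Indent parameter"

# Blame reports the most recent editor of the line.
blame_author="$(git blame --porcelain params.txt | sed -n 's/^author //p')"

# The string search lists only commits that added or removed the text.
origin_author="$(git log -S"rate = 0.21" --format=%an -- params.txt)"

echo "blame says:  $blame_author"
echo "log -S says: $origin_author"
```

Neither command is the full story on its own, which is why reading the related commits and PR discussion remains worthwhile.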
diff --git a/book/content/using/GitHubBasics.md b/book/content/using/GitHubBasics.md index a863455..2320706 100644 --- a/book/content/using/GitHubBasics.md +++ b/book/content/using/GitHubBasics.md @@ -1,6 +1,131 @@ (chap_basics)= # Git and GitHub basics -Create, clone, fork, remote, branch, push, pull, pull request. +This chapter introduces the key concepts that appear again and again in the rest of the book. A newcomer can learn Git commands by memorization, but it is much easier to build good habits if you first understand where your files live and which tool is responsible for which part of the workflow. -Include a discussion of `git pull` vs. `git pull --ff-only` vs. `git pull --rebase`. A good blog post is "[Why You Should Use git pull –ff-only](https://blog.sffc.xyz/post/185195398930/why-you-should-use-git-pull-ff-only)" by Shane at ssfc's Tech Blog. +## Git versus GitHub + +{term}`Git` is the version control software that runs on your local machine. It tracks changes to files, records those changes in {term}`commit`s, and helps you move changes between related copies of a {term}`repository`. + +{term}`GitHub` is a website and collaboration platform built around Git repositories. GitHub hosts remote repositories and adds tools for code review, pull requests, issue tracking, access control, documentation hosting, and automated testing. + +In short: + +* Git tracks and moves changes. +* GitHub helps people collaborate around those changes. + +## Repository, local clone, origin, and upstream + +When you work on an open-source repository, there are usually three important places where the same project exists. + +1. The main repository, often owned by an organization such as `PSLmodels`. This is usually called the {term}`upstream` repository. +2. Your personal copy of that repository on GitHub. This is called your {term}`fork`. +3. The copy on your computer created with `git clone`. This is your local repository. 
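
The three places listed above can be imitated entirely on one machine. This is only a sketch: local bare repositories stand in for the two GitHub copies, and every path and name is a hypothetical placeholder.

```bash
# Throwaway sandbox for the demonstration.
work="$(mktemp -d)"
cd "$work"

# Local bare repositories play the role of the two GitHub copies.
git init -q --bare upstream.git              # stands in for PSLmodels/project
git clone -q --bare upstream.git fork.git    # stands in for yourname/project

# Cloning the fork creates the local repository.
git clone -q fork.git project
cd project

# Register the main repository as a second remote.
git remote add upstream "$work/upstream.git"

# List each remote name with the location it points to.
git remote -v
```

With a real project, the two local paths would be replaced by the GitHub URLs of the main repository and of your fork.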
+ +When you clone your fork, Git automatically names that remote repository `origin`. In a fork-based workflow, you usually add the main organization repository as a second remote named `upstream`. + +```{admonition} Typical naming convention +:class: note +For a repository `https://github.com/PSLmodels/project`, a beginner often works with the following: + +* `upstream` = `https://github.com/PSLmodels/project` +* `origin` = `https://github.com/yourname/project` +* local repository = directory `project` on your computer +``` + +## Branches + +A {term}`branch` is a named line of development. Branches let you work on one change without disturbing another. + +For example, you might keep your local `main` branch synchronized with the latest tested code from `upstream/main`, while making your actual edits on a branch such as `fix-typo-intro` or `add-ci-docs`. + +This is one of the most important beginner habits: + +* keep `main` clean +* create a new branch for each distinct piece of work +* open a pull request from that branch + +## The basic command cycle + +Most day-to-day Git work follows the same pattern: + +1. Update your local `main` branch from `upstream`. +2. Create a new branch from `main`. +3. Edit files. +4. Use `git status` to see what changed. +5. Use `git add` to stage the changes you want in the next commit. +6. Use `git commit -m "..."` to record a snapshot. +7. Use `git push origin your-branch-name` to upload the branch to GitHub. +8. Open a pull request on GitHub. + +We walk through this process in detail in {ref}`chap_workflow`. + +## `git add` and `git commit` + +New users often confuse the staging area with a commit. + +* `git add` puts selected file changes into the staging area. +* `git commit` records the staged changes as a new commit in your local history. + +That two-step process is useful because it lets you decide exactly which changes belong together in a commit. 
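
A short sketch of the two-step process, using a throwaway repository with hypothetical file names: two files are edited, but only one is staged, so only that one enters the commit.

```bash
# Throwaway repository; names and contents are hypothetical.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Two unrelated edits in the working directory.
echo "fix a typo" > docs.md
echo "draft idea" > scratch.md

# Stage only the file that belongs in this commit.
git add docs.md

# Short status: "A" marks a staged new file, "??" marks an untracked file.
git status --short

# Record the staged change as a commit in local history.
git commit -q -m "Fix typo in docs"

# List the files recorded in the commit; scratch.md is not among them.
committed_files="$(git ls-tree -r --name-only HEAD)"
echo "$committed_files"
```

The unstaged file stays in the working directory untouched, ready to be committed separately or discarded later.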
+ +## `git fetch`, `git pull`, and `git push` + +These commands all move information between repositories, but they do different things. + +* `git fetch upstream` downloads new history from a remote without changing your working files. +* `git pull` is shorthand for `git fetch` followed by an integration step, usually a merge or rebase. +* `git push origin branch-name` uploads your local commits to a remote repository. + +Because `git pull` performs two actions at once, beginners should understand what kind of integration behavior they want before using it automatically. + +## `git pull` versus `git pull --ff-only` versus `git pull --rebase` + +These three commands all begin by fetching new commits from the remote. They differ in what they do next. + +### `git pull` + +Plain `git pull` fetches changes and then integrates them using your current pull strategy. On many systems that means a merge. If your local branch and the remote branch have both moved forward, Git may create a merge commit. + +This is convenient, but it can surprise beginners because a simple "get me up to date" command may create an extra commit. + +### `git pull --ff-only` + +This is the safest default for many beginners. A fast-forward-only pull succeeds only if Git can move your branch pointer forward without creating a merge commit. + +If your branch has diverged from the remote, Git stops and tells you. That pause is helpful because it forces you to make an intentional decision about how to integrate your work. + +### `git pull --rebase` + +This command fetches remote commits and then reapplies your local commits on top of them. The resulting history is often linear and tidy, but rebasing rewrites commit history on your local branch. + +Rebasing is extremely useful, but it is best used when you understand what it means to replay commits and when it is safe to force-push updated history. 
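
The difference between these options is easiest to see when two histories have diverged. The sketch below sets that situation up locally, with a bare repository standing in for the remote and two clones standing in for two contributors. All names and paths are hypothetical, and the script assumes a default Git configuration with no custom pull settings.

```bash
# Throwaway sandbox; every path and name below is hypothetical.
work="$(mktemp -d)"
cd "$work"

# Identity for the demo commits, set through environment variables.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

# A seed repository with one commit, published as a bare "remote".
git init -q seed
(cd seed && echo "start" > README.md && git add README.md \
    && git commit -q -m "Initial commit")
git clone -q --bare seed remote.git

# Two clones stand in for two contributors.
git clone -q remote.git alice
git clone -q remote.git bob

# Alice commits and pushes, so the remote branch moves forward.
(cd alice && echo "a" > a.txt && git add a.txt \
    && git commit -q -m "Alice change" && git push -q)

# Bob commits locally without pulling first, so the histories diverge.
cd bob
echo "b" > b.txt
git add b.txt
git commit -q -m "Bob change"

# A fast-forward-only pull declines to integrate diverged history.
if git pull -q --ff-only; then
    ff_result="fast-forwarded"
else
    ff_result="refused"
fi
echo "ff-only pull: $ff_result"

# A rebase pull replays Bob's commit on top of Alice's commit.
git pull -q --rebase
git log --oneline

# Count merge commits and total commits in the resulting history.
merge_count="$(git rev-list --merges --count HEAD)"
commit_count="$(git rev-list --count HEAD)"
echo "merge commits: $merge_count, total commits: $commit_count"
```

In this setup the fast-forward-only pull stops without changing Bob's branch, and the rebase pull leaves a linear history with no merge commit. Because Bob's commit was never pushed, rewriting it here requires no force-push.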
+ +```{admonition} Good beginner default +:class: tip +If you are still learning Git, `git pull --ff-only` is often a good habit for your local `main` branch. It avoids accidental merge commits and makes divergence visible right away. +``` + +## Pull requests + +A {term}`pull request` or PR is a GitHub request asking maintainers to review and merge a branch. A PR is not just a code diff. It is also the place where collaborators discuss design choices, request changes, run automated tests, and document why a change was made. + +In most PSLmodels-style workflows: + +* you open a PR from your branch on your fork +* the PR targets the main branch of the upstream repository +* review comments lead to additional commits on the same branch +* the PR is merged only after discussion and testing pass + +## Good beginner habits + +The following habits prevent many common problems. + +* Run `git status` often. +* Commit related changes together in small chunks. +* Write commit messages that explain the change. +* Open pull requests before your branch grows too large. +* Ask for help before resolving a confusing conflict by trial and error. +* Avoid working directly on `main`. + +The next chapter puts these ideas together into a step-by-step collaborative workflow. diff --git a/book/content/using/GitHubWorkflow.md b/book/content/using/GitHubWorkflow.md index 294fd4c..c3b5c93 100644 --- a/book/content/using/GitHubWorkflow.md +++ b/book/content/using/GitHubWorkflow.md @@ -1,21 +1,232 @@ (chap_workflow)= # Git and GitHub Collaborative Workflow -Git and GitHub can be a lot to wrap one's head around at the beginning. There's a lot of new jargon and one can get lost thinking about the various places that version of the files of kept (locally, in the `origin` repo, in the `upstream` repo). To help with this, we'll step through the typical GitHub workflow followed by most projects in the Policy Simulation Library. 
Throughout this, we'll reference the diagram below, which provides a visual representation of the GitHub workflow described. +Git and GitHub can feel confusing at first because you are managing changes in more than one place at the same time: your working files, your local Git history, your fork on GitHub, and the main repository maintained by a project. The goal of this chapter is to slow that process down and show the standard fork-and-pull-request workflow used by many repositories in the Policy Simulation Library (PSLmodels) organization. + +Throughout this chapter, we refer to the main project repository as `upstream` and your personal fork as `origin`. ![Git Diagram](GitFlowDiag.png) -We begin assuming that you have installed and configured Git on your machine and setup a GitHub account. Now assume that you want to start working with a project at `https://github.com/PSLmodels/project`. The step-by-step approach to this is as follows: +## The workflow at a glance + +In most PSLmodels repositories, a new contributor follows this cycle: + +1. Fork the upstream repository on GitHub. +2. Clone the fork to your computer. +3. Add the upstream repository as a second remote. +4. Keep your local `main` branch synchronized with `upstream/main`. +5. Create a feature branch for your work. +6. Make changes, stage them, and commit them locally. +7. Push the branch to your fork on GitHub. +8. Open a pull request from your branch to the upstream repository. +9. Respond to review comments by adding more commits to the same branch. +10. After the pull request is merged, update your local `main` branch and delete the temporary feature branch. + +## Step 1: Fork the main repository + +Suppose you want to contribute to `https://github.com/PSLmodels/project`. + +Go to that repository on GitHub and click the "Fork" button. 
GitHub will create a copy of the repository under your own account, for example: + +* upstream repository: `https://github.com/PSLmodels/project` +* your fork: `https://github.com/yourname/project` + +Your fork gives you a place where you can push branches even if you do not have direct write access to the upstream repository. + +## Step 2: Clone your fork + +Open a terminal and move to a directory where you want the project to live. Then clone your fork: + +```bash +git clone https://github.com/yourname/project.git +cd project +``` + +At this point, Git has already created one remote for you: + +```bash +git remote -v +``` + +That remote is usually named `origin`, and it points to your fork. + +## Step 3: Add the upstream remote + +Because you cloned your fork rather than the main PSLmodels repository, you still need to tell Git where the upstream project lives: + +```bash +git remote add upstream https://github.com/PSLmodels/project.git +git remote -v +``` + +After this, you should see both `origin` and `upstream`. + +```{admonition} Why this matters +:class: note +Beginners often think `origin` is a special word that means "the main repository." It does not. `origin` is simply the default name Git gives to the remote you cloned from. In a fork-based workflow, `origin` is usually your fork and `upstream` is the main project repository. +``` + +## Step 4: Keep local `main` synchronized with upstream + +Before starting new work, update your local `main` branch so that it matches the latest tested code from the main project. + +```bash +git checkout main +git fetch upstream +git merge upstream/main +git push origin main +``` + +Some older repositories may still use `master` instead of `main`. If so, replace `main` with `master` in the commands above. + +Many contributors prefer `git pull --ff-only upstream main` instead of separate `fetch` and `merge` commands. Both approaches are fine once you understand what they do. 
We separate them here because it makes the sequence easier to understand. + +## Step 5: Create a feature branch + +Do not work directly on `main`. Create a branch for the change you want to make: + +```bash +git checkout -b fix-typo-intro +``` + +Choose a branch name that briefly describes the task. Good names make it easier for reviewers and for your future self to understand what the branch is for. + +## Step 6: Edit, inspect, stage, and commit + +Now make your changes in the files you want to update. While working, use `git status` often: + +```bash +git status +``` + +When you are ready to record part of your work, stage the relevant file changes: + +```bash +git add path/to/file +``` + +Then commit the staged changes: + +```bash +git commit -m "Clarify glossary definitions for branch and commit" +``` + +You can repeat the cycle of edit, `git add`, and `git commit` as many times as needed. + +```{admonition} A commit records staged changes +:class: tip +The most common beginner mistake here is to think that `git commit` takes a file path. Normally it does not. First stage files with `git add`, then create the commit with `git commit -m "message"`. +``` + +## Step 7: Push the branch to your fork + +Once your branch contains commits you want backed up or reviewed, push it to your fork: + +```bash +git push origin fix-typo-intro +``` + +The first time you push a new branch, GitHub will usually show a banner offering to open a pull request. + +## Step 8: Open a pull request + +Open the pull request from your branch on your fork to the upstream repository's main branch. + +In practice, this usually means: + +* base repository: `PSLmodels/project` +* base branch: `main` +* compare repository: `yourname/project` +* compare branch: `fix-typo-intro` + +Write a PR title that states the change clearly. 
In the description, explain: + +* what you changed +* why you changed it +* anything a reviewer should pay special attention to + +If the repository has a PR template, fill it out carefully. That template usually reflects the maintainers' expectations. + +## Step 9: Respond to review and CI feedback + +Opening the pull request is not the end of the process. In most collaborative repositories, three things happen next. + +### Review comments + +Maintainers or other contributors may ask questions or request revisions. Make those changes locally on the same branch, then commit and push again: + +```bash +git add path/to/file +git commit -m "Address PR feedback on workflow example" +git push origin fix-typo-intro +``` + +The pull request updates automatically. + +### Automated tests + +Many repositories run continuous integration checks on every pull request. These checks might run unit tests, style checks, builds, or documentation validation. If a CI check fails, inspect the failure and update your branch until the checks pass. + +### Keeping the branch current + +If the upstream repository changes while your PR is open, you may need to update your branch: + +```bash +git checkout main +git fetch upstream +git merge upstream/main +git checkout fix-typo-intro +git merge main +``` + +More advanced contributors may prefer rebasing here, but merging `main` into your feature branch is often simpler for beginners. + +## Step 10: After the pull request is merged + +Once your PR is merged, clean up your local repository. 
+ +First, update your local `main` branch: + +```bash +git checkout main +git fetch upstream +git merge upstream/main +git push origin main +``` + +Then delete the feature branch locally: + +```bash +git branch -d fix-typo-intro +``` + +You can also delete the branch from your fork on GitHub, either through the GitHub interface or with: + +```bash +git push origin --delete fix-typo-intro +``` + +## What PSLmodels-style workflow is trying to protect + +This workflow may seem elaborate for a one-line fix, but each part serves a purpose. + +* Forks let anyone propose changes without giving everyone write access to the main repository. +* Branches isolate one task from another. +* Pull requests create a clear place for review and discussion. +* CI checks help catch bugs before they reach the default branch. +* Keeping `main` clean ensures you always have a reliable branch to build from. + +Those safeguards are especially valuable in research and policy-model repositories, where code, documentation, and results often need to be reproducible and carefully reviewed. + +## Common beginner mistakes -1. Make a fork of the main repository. To do this, you navigate to the GitHub repository of the project, `https://github.com/PSLmodels/project` and click "Fork" in the top right of the screen. Select the destination for the fork (which you will probably want to be your GitHub profile). This will create a copy of the project on the Internet at `https://github.com/yourname/project`. We will refer to `https://github.com/PSLmodels/project` as the "upstream" repository and your fork, `https://github.com/yourname/project`, as the "origin" repository. -2. Make a copy of the new fork on your local machine. To do this, you will click on the green "Code" button on the webpage of your fork. Copy that url that is shown there. Next, open your command prompt/terminal on your computer and navigate to a directory where you would like to save this repository. 
Once there, type `git clone https://github.com/username/project.git`, where the url is what you just copied and pasted into your terminal. Hit "return" and you will see the files being downloaded onto your computer into a new folder called `project`. Now you have a copy of all the files for the project on your computer and are ready to work with them. -3. Once ready to start editing files, we'll want to create a new branch to work on. Note that once you cloned the repository, one branch was created, this one is called `master` (or `main`). We want to keep the `master`/`main` branch synced to the upstream repository, ensuring that it has tested, working code. We can create a new branch off of the `master` branch by (1) ensuring we are on the `master` branch with the command `git checkout master` and then (2) checking out a new branch with `git checkout -b new_branch_name`. -4. If sometime has gone by since we got files from the upstream branch, we can check to make sure we have the latest files by doing a `git fetch upstream`. -5. After fetching the changes, we can add them to our local files with the command `git merge upstream/master`. -6. If changes were found, we can also sync our remote fork with a `git push origin branch_name`. -7. Once we make edits to local files, we will want to add our changes to the Git history. We do this by `git add path/filename`. -8. After adding our changes to the history, we will want to commit them so that they are included in our next push to the remote repository. Do this by `git commit path/filename`. -9. When we have made some set of changes, we may want to push them to our remote fork to (1) back them up, (2) more easily share them with others, and/or (3) to set up a Pull Request to change the code in the upstream repository. Do do this, we'll do `git push origin new_branch_name`. -10. 
After we've made all the changes we want to the code base (which may entail cycling through steps 4-9 several times), we will want to open a pull request. The easiest way to do this is to navigate to the upstream repo at `https://github.com/PSLmodels/project`, then click on the "Pull requests" tab. If you have pushed your changes to your origin, you should see a green button that says "Compare & pull request". Click then to open a pull request. Enter a descriptive title and more description in the dialogue box below. This will help project maintainers understand and review your code changes. If everything looks good, it'll get merged into the code base. If not, maintainers will offer helpful feedback to address any issues and you can revise the code by committing and pushing new changes as in steps 7-9. +The following problems happen often and are normal parts of learning Git. +* Working directly on `main` instead of a feature branch. +* Forgetting to add the `upstream` remote. +* Committing too many unrelated changes together. +* Pushing to the wrong branch. +* Opening a PR from `main` instead of from the feature branch. +* Trying to fix a failed CI check on GitHub without reproducing the problem locally. +The next chapters cover some of these topics in more detail, especially merge conflicts and pull request discussions. diff --git a/book/content/using/IssuePRthreads.md b/book/content/using/IssuePRthreads.md index f0f036b..913eff6 100644 --- a/book/content/using/IssuePRthreads.md +++ b/book/content/using/IssuePRthreads.md @@ -1,4 +1,73 @@ (chap_threads)= # Issue Threads and PR Threads -Go through the value of Issue threads and pull request (PR) threads. +GitHub is not just a place to store code. It is also the record of why a project changed, who discussed the change, what concerns were raised, and how the final decision was made. Two of the most important parts of that record are issue threads and pull request threads. 
+ +## Issue threads + +An issue thread is the right place to discuss a problem, idea, bug report, or proposed enhancement before code is merged. + +Issues are especially useful for: + +* reporting a bug +* proposing a new feature +* asking whether a change would be welcome +* documenting design decisions +* giving newcomers a place to ask clarifying questions + +For new contributors, issues reduce wasted effort. A quick issue can confirm that a problem is real, that a maintainer agrees with the direction, and that nobody else is already solving the same thing. + +## Pull request threads + +A pull request thread is where a proposed code change is reviewed. Unlike an issue, a PR thread is attached to a concrete set of commits and file diffs. + +PR threads are where collaborators: + +* review the implementation +* ask for revisions +* discuss tradeoffs +* link related issues +* confirm that tests and documentation are adequate + +The PR thread becomes part of the long-term project history, which is valuable when future contributors want to understand why a change was accepted. + +## How issues and PRs work together + +In many healthy repositories, the flow looks like this: + +1. An issue identifies a problem or desired improvement. +2. Someone opens a branch and implements a fix. +3. A pull request references the issue. +4. Review discussion happens in the PR. +5. Once merged, the issue is closed. + +This structure keeps problem definition and code review related, but distinct. + +## Good habits in issue threads + +* Use a clear title. +* Include steps to reproduce a bug when relevant. +* Explain the expected behavior and the actual behavior. +* Link to related PRs, commits, or outside references. +* Be specific about what help you need. + +## Good habits in PR threads + +* Explain what changed and why. +* Keep PRs focused on one main task. +* Respond to review comments directly in the thread. 
+* Push follow-up commits to the same branch rather than opening a new PR for each revision. +* Say when you intentionally did not make a suggested change and explain why. + +## Why this matters in PSLmodels-style collaboration + +Many PSLmodels repositories support research, teaching, and policy analysis. In that setting, the discussion around a change can be almost as important as the final code itself. + +A good issue or PR thread helps future contributors answer questions such as: + +* Why was this modeling choice made? +* Was an alternative approach considered? +* Did maintainers expect follow-up work? +* Were there known limitations at the time of merge? + +That kind of written context makes collaboration more scalable and lowers the cost of onboarding new contributors. diff --git a/book/content/using/MergeConflicts.md b/book/content/using/MergeConflicts.md index 616b8c4..847c1fa 100644 --- a/book/content/using/MergeConflicts.md +++ b/book/content/using/MergeConflicts.md @@ -1,7 +1,104 @@ (chap_mergeconfl)= # Merge Conflicts and File Diffs -Go through merge conflict examples +A merge conflict happens when Git cannot safely combine two sets of changes on its own. This often sounds frightening to beginners, but a conflict is not a disaster. It is simply Git stopping and asking a human to decide which version of the text should be kept. + +## What a conflict usually means + +Most conflicts happen because two branches changed the same lines of the same file, or because one branch changed a file that another branch deleted or renamed. + +Typical situations include: + +* two contributors editing the same paragraph in a documentation file +* one contributor renaming a function while another edits calls to that function +* a long-lived branch falling behind `main` + +## Why Git stops + +Git can merge many independent changes automatically. 
For example, if you edit one file and another contributor edits a different file, Git usually has no trouble combining those changes. + +But if Git sees competing edits to the same section, it does not guess. It pauses so that you can inspect the result. + +## A simple conflict workflow + +Suppose you are on your feature branch and want to merge in the latest changes from `main`: + +```bash +git checkout my-feature-branch +git merge main +``` + +If Git reports a conflict, use: + +```bash +git status +``` + +Git will list the files that need attention. + +Open one of those files and you may see markers like these: + +```text +<<<<<<< HEAD +Your version of the text +======= +Incoming version of the text +>>>>>>> main +``` + +These markers show the competing edits. Your job is to rewrite that section so that the file contains the correct final version and none of the conflict markers remain. + +## Resolving the conflict + +After editing the file into its final state: + +```bash +git add path/to/conflicted-file +git commit +``` + +That commit completes the merge. + +If several files are conflicted, resolve each one, stage each resolved file, and then make the merge commit after all conflicts are cleared. + +## Good habits during conflict resolution + +* Read the surrounding code or text, not just the marked lines. +* Decide what the final intended result should be. +* Run tests or rebuild documentation after resolving conflicts. +* Use `git diff` to inspect the final merged version before committing. +* Ask for help if you are not sure which side should win. + +## File diffs + +A {term}`diff` is a comparison showing how one version of a file differs from another. Diffs are the basic unit of code review on GitHub and one of the best tools for understanding a merge conflict. + +Useful commands include: + +```bash +git diff +git diff main..my-feature-branch +git diff --staged +``` + +These help you answer three different questions: + +* What have I changed but not staged? 
+* What have I staged for the next commit? +* How does my branch differ from another branch? + +## Conflict prevention + +You cannot eliminate conflicts entirely, but you can reduce them. + +* Keep branches focused and short-lived. +* Sync your local `main` branch with `upstream/main` regularly. +* Merge or rebase against current `main` before a branch drifts too far. +* Communicate with collaborators when multiple people are editing the same files. + +## Keep calm + +The key lesson is that a merge conflict is a request for judgment, not proof that something has gone wrong beyond repair. Include {numref}`Figure %s ` in this discussion with respect to proper handling of merge conflicts versus complete burndowns. diff --git a/book/myst.yml b/book/myst.yml new file mode 100644 index 0000000..bf91364 --- /dev/null +++ b/book/myst.yml @@ -0,0 +1,49 @@ +# See docs at: https://mystmd.org/guide/frontmatter +version: 1 + +project: + id: a725c955-45c9-432f-96b8-2e1f3ced43fb + title: Git and GitHub Use, Collaboration, and Workflow + authors: + - name: Richard W. 
Evans + - name: Jason DeBacker + github: PSLmodels/Git-Tutorial + toc: + - file: content/intro.md + - title: Git and GitHub Background and History + children: + - file: content/background/VCgitHistory.md + - file: content/background/GitHubHistory.md + - file: content/background/GoogleDropboxBox.md + - title: Using Git and GitHub + children: + - file: content/using/installing_git.md + - file: content/using/git_config.md + - file: content/using/create_GitHub_acct.md + - file: content/using/GitHubBasics.md + - file: content/using/GitHubWorkflow.md + - file: content/using/MergeConflicts.md + - file: content/using/IssuePRthreads.md + - file: content/using/Attribution.md + - file: content/using/AdvancedGit.md + - file: content/using/GitCheatSheet.md + - title: Open Source Repository Management + children: + - file: content/repomgt/license.md + - file: content/repomgt/testing_CI.md + - file: content/repomgt/documentation.md + - file: content/repomgt/virt_env.md + - title: Text Editor Git Configuration + children: + - file: content/txteditor/TextEditorsIntro.md + - file: content/txteditor/VScode.md + - title: Appendix + children: + - file: content/glossary.md + - file: content/bibliography.md + +site: + template: book-theme + options: + favicon: _static/logo/favicon.ico + logo: _static/logo/jb_git_tutorial_logo.png From e0c1d4ab036b9d8235edf28a755f7c34adcaf932 Mon Sep 17 00:00:00 2001 From: Jason DeBacker Date: Tue, 19 May 2026 12:02:33 -0400 Subject: [PATCH 2/6] rename from .bak --- book/{_config.yml.jb1.bak => config.yml} | 0 book/{_toc.yml.jb1.bak => toc.yml} | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename book/{_config.yml.jb1.bak => config.yml} (100%) rename book/{_toc.yml.jb1.bak => toc.yml} (100%) diff --git a/book/_config.yml.jb1.bak b/book/config.yml similarity index 100% rename from book/_config.yml.jb1.bak rename to book/config.yml diff --git a/book/_toc.yml.jb1.bak b/book/toc.yml similarity index 100% rename from book/_toc.yml.jb1.bak rename to 
book/toc.yml From 8980d8c24726c04c2b5794715ebcd4a6f1579042 Mon Sep 17 00:00:00 2001 From: Jason DeBacker Date: Tue, 19 May 2026 12:02:39 -0400 Subject: [PATCH 3/6] update jb version --- environment.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.yml b/environment.yml index 0bd3a5f..c481e65 100644 --- a/environment.yml +++ b/environment.yml @@ -11,4 +11,4 @@ dependencies: - pip - matplotlib - pip: - - jupyter-book>=0.7.4 + - jupyter-book>=2.0.0 From 41c8fc1bfadbe0b55e03f8fa852dfa8a0f4fdf7e Mon Sep 17 00:00:00 2001 From: Jason DeBacker Date: Tue, 19 May 2026 12:11:26 -0400 Subject: [PATCH 4/6] remove jb 1 files --- book/config.yml | 46 ---------------------------------------------- book/toc.yml | 37 ------------------------------------- 2 files changed, 83 deletions(-) delete mode 100644 book/config.yml delete mode 100644 book/toc.yml diff --git a/book/config.yml b/book/config.yml deleted file mode 100644 index ec5bb07..0000000 --- a/book/config.yml +++ /dev/null @@ -1,46 +0,0 @@ -#################################################### -# Book settings -title : Git and GitHub Use, Collaboration, and Workflow -author : Richard W. Evans and Jason DeBacker -copyright : '2020' -logo : '_static/logo/jb_git_tutorial_logo.png' - -#################################################### -# Execution settings -execute: - execute_notebooks : cache - -#################################################### -# HTML-specific settings -html: - favicon : "_static/logo/favicon.ico" # generated at https://www.favicon.cc/ - use_edit_page_button : true # Whether to add an "edit this page" button to pages. If `true`, repository information in repository: must be filled in - use_repository_button : true # Whether to add a link to your repository button - use_issues_button : true # Whether to add an "open an issue" button - extra_navbar : Powered by Jupyter Book # Will be displayed underneath the left navbar. 
- # extra_footer : "" # Will be displayed underneath the footer. - # google_analytics_id : "" # A GA id that can be used to track book views. - home_page_in_navbar : true # Whether to include your home page in the left Navigation Bar - baseurl : "https://pslmodels.github.io/Git-Tutorial/" # The base URL where your book will be hosted. Used for creating image previews and social links. e.g.: https://mypage.com/mybook/ - -#################################################### -# Launch button settings -launch_buttons: - notebook_interface : 'classic' # The interface interactive links will activate ["classic", "jupyterlab"] - # binderhub_url : https://mybinder.org # The URL of the BinderHub (e.g., https://mybinder.org) - # jupyterhub_url : "" # The URL of the JupyterHub (e.g., https://datahub.berkeley.edu) - # thebelab : false # Add a thebelab button to pages (requires the repository to run on Binder) - -#################################################### -# Information about where the book exists on the web -repository: - url : https://github.com/PSLmodels/Git-Tutorial # The URL to your book's repository - path_to_book : 'book' # A path to your book's folder, relative to the repository root - branch : master # Which branch of the repository should be used when creating links - -#################################################### -# LaTeX information -latex: - latex_engine : 'xelatex' - latex_documents: - targetname : book.tex diff --git a/book/toc.yml b/book/toc.yml deleted file mode 100644 index c81b93b..0000000 --- a/book/toc.yml +++ /dev/null @@ -1,37 +0,0 @@ -- file: content/intro - -- part: Git and GitHub Background and History - chapters: - - file: content/background/VCgitHistory - - file: content/background/GitHubHistory - - file: content/background/GoogleDropboxBox - -- part: Using Git and GitHub - chapters: - - file: content/using/installing_git - - file: content/using/git_config - - file: content/using/create_GitHub_acct - - file: content/using/GitHubBasics 
- - file: content/using/GitHubWorkflow - - file: content/using/MergeConflicts - - file: content/using/IssuePRthreads - - file: content/using/Attribution - - file: content/using/AdvancedGit - - file: content/using/GitCheatSheet - -- part: Open Source Repository Management - chapters: - - file: content/repomgt/license - - file: content/repomgt/testing_CI - - file: content/repomgt/documentation - - file: content/repomgt/virt_env - -- part: Text Editor Git Configuration - chapters: - - file: content/txteditor/TextEditorsIntro - - file: content/txteditor/VScode - -- part: Appendix - chapters: - - file: content/glossary - - file: content/bibliography From ad7ee84c4aadc63acf389fe0cfb509a05ee076f9 Mon Sep 17 00:00:00 2001 From: Jason DeBacker Date: Tue, 19 May 2026 12:14:23 -0400 Subject: [PATCH 5/6] update workflow for JB 2.0 --- .github/workflows/check_jupyterbook.yml | 5 +++-- .github/workflows/deploy_jupyterbook.yml | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/.github/workflows/check_jupyterbook.yml b/.github/workflows/check_jupyterbook.yml index baf368a..2976c93 100644 --- a/.github/workflows/check_jupyterbook.yml +++ b/.github/workflows/check_jupyterbook.yml @@ -15,10 +15,11 @@ jobs: with: activate-environment: jb-git-tutorial environment-file: environment.yml - python-version: 3.7 + python-version: 3.13 auto-activate-base: false - name: Build # Build Jupyter Book shell: bash -l {0} run: | - jb build ./book \ No newline at end of file + cd book + jupyter-book build \ No newline at end of file diff --git a/.github/workflows/deploy_jupyterbook.yml b/.github/workflows/deploy_jupyterbook.yml index fce452d..80771cf 100644 --- a/.github/workflows/deploy_jupyterbook.yml +++ b/.github/workflows/deploy_jupyterbook.yml @@ -18,14 +18,14 @@ jobs: with: activate-environment: jb-git-tutorial environment-file: environment.yml - python-version: 3.7 + python-version: 3.13 auto-activate-base: false - name: Build # Build Jupyter Book shell: bash -l {0} run: | 
cd book - jb build . + jupyter-book build - name: Deploy uses: JamesIves/github-pages-deploy-action@releases/v3 From c8901b492a5c932b5f3a0c69c0ff84028928327a Mon Sep 17 00:00:00 2001 From: Jason DeBacker Date: Tue, 19 May 2026 17:46:17 -0400 Subject: [PATCH 6/6] Expand setup and repository management chapters --- README.md | 11 +-- book/content/repomgt/documentation.md | 95 +++++++++++++++++++- book/content/repomgt/license.md | 78 ++++++++++++++++- book/content/repomgt/testing_CI.md | 105 ++++++++++++++++++++++- book/content/repomgt/virt_env.md | 93 +++++++++++++++++++- book/content/using/GitCheatSheet.md | 33 +++---- book/content/using/create_GitHub_acct.md | 47 +++++++++- book/content/using/git_config.md | 105 ++++++++++++++++++++--- book/content/using/installing_git.md | 36 +++++++- 9 files changed, 563 insertions(+), 40 deletions(-) diff --git a/README.md b/README.md index c74e103..8573304 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ [![PSL incubating](https://img.shields.io/badge/PSL-incubating-ff69b4.svg)](https://www.PSLmodels.org) # Git and GitHub Use, Collaboration, and Workflow -This repository houses the source code and content files for the open access, *Git and GitHub Use, Collaboration, and Workflow* book tutorial and training used by many contributors in the [PSLmodels](https://github.com/PSLmodels) community. This project uses [Jupyter Book](https://jupyterbook.org/intro.html) to create the published HTML version of the tutorial. The source lives in this GitHub repository, and the compiled book is available at [https://pslmodels.github.io/Git-Tutorial](https://pslmodels.github.io/Git-Tutorial). This project is maintained by [Richard W. Evans](https://sites.google.com/site/rickecon/) and [Jason DeBacker](https://www.jasondebacker.com/). 
+This repository houses the source code and content files for the open access, *Git and GitHub Use, Collaboration, and Workflow* book tutorial and training used by many contributors in the [PSLmodels](https://github.com/PSLmodels) community. This project uses [Jupyter Book](https://jupyterbook.org/intro.html) and MyST to create the published HTML version of the tutorial. The source lives in this GitHub repository, and the compiled book is available at [https://pslmodels.github.io/Git-Tutorial](https://pslmodels.github.io/Git-Tutorial). This project is maintained by [Richard W. Evans](https://sites.google.com/site/rickecon/) and [Jason DeBacker](https://www.jasondebacker.com/). ## Contributing to the book @@ -23,16 +23,17 @@ conda activate jb-git-tutorial ### Building a Jupyter Book -Run the following command in your terminal: +Run the following commands in your terminal from the repository root: ```bash -jb build ./book +cd book +jupyter book build --all ``` If you would like to work with a clean build, you can empty the build folder by running: ```bash -jb clean ./book +jupyter book clean ``` If jupyter execution is cached, this command will not delete the cached folder. @@ -40,7 +41,7 @@ If jupyter execution is cached, this command will not delete the cached folder. To remove the build folder (including `cached` executables), you can run: ```bash -jb clean --all ./book +jupyter book clean --all ``` ## Notes diff --git a/book/content/repomgt/documentation.md b/book/content/repomgt/documentation.md index 61453a3..298ea3e 100644 --- a/book/content/repomgt/documentation.md +++ b/book/content/repomgt/documentation.md @@ -1,6 +1,97 @@ (chap_docum)= # Documentation -Code documentation standards, README.md, Jupyter Book, actual books/manual, Sphinx +Good documentation lowers the cost of using, reviewing, and contributing to a repository.
In many projects, documentation quality is one of the main factors that determines whether a newcomer becomes a contributor or gives up early. -The [Jupyter Book](https://jupyterbook.org/intro.html) documentation has a section in the [MyST Markdown Overview](https://jupyterbook.org/content/myst.html) chapter suggesting that if you are using VS Code to modify markdown files, you should download the [vscode MyST markdown extension](https://marketplace.visualstudio.com/items?itemName=ExecutableBookProject.myst-highlight) that provides highlighting and other features. \ No newline at end of file +## Why documentation matters + +Documentation helps different audiences answer different questions. + +* Users want to know what the project does and how to run it. +* Contributors want to know how to set up the environment, run tests, and submit changes. +* Maintainers want project knowledge to live in the repository rather than only in their heads. + +Documentation is part of the product, not an afterthought. + +## The role of the README + +For many repositories, `README.md` is the first page a newcomer sees. A good README usually includes: + +* what the repository is for +* how to install or set it up +* how to run the project +* how to contribute +* where to find more detailed documentation + +If the README grows too large, that is often a sign the project is ready for fuller documentation rather than a sign the README should be abandoned. + +## Layered documentation + +Healthy projects often use several layers of documentation. + +* README for quick orientation +* contributor guide for workflow and setup +* API or developer docs for code-level detail +* tutorials or examples for common tasks +* changelog or release notes for project history + +Each layer serves a different purpose. + +## Documentation for code and research repositories + +In research-oriented open-source work, documentation may need to cover more than software usage. 
It may also need to explain: + +* modeling assumptions +* data inputs +* expected outputs +* limitations and known caveats +* how results should be interpreted + +This kind of context is often essential for reproducibility. + +## Documentation tooling + +Projects use many different documentation tools. Common options include: + +* plain Markdown in the repository +* Jupyter Book +* Sphinx +* notebooks and example galleries + +The best tool is often the one the team will maintain consistently. + +This repository uses Jupyter Book and MyST Markdown. If you are editing MyST files in VS Code, the [MyST extension](https://marketplace.visualstudio.com/items?itemName=ExecutableBookProject.myst-highlight) can make the authoring experience smoother. + +## What good contributor documentation should cover + +A newcomer should not need tribal knowledge to contribute. At minimum, contributor documentation should explain: + +* how to create the project environment +* how to run tests +* how to build docs, if applicable +* how to format or lint code +* the expected pull request workflow + +## Treat documentation as part of review + +One good maintainer habit is to ask during review: does this change require documentation updates? + +Examples include: + +* a new feature +* a changed command-line interface +* a new dependency +* a workflow change +* a changed result or interpretation in a research project + +If the answer is yes, the docs should usually change in the same pull request. + +## Good habits + +* Keep examples current. +* Prefer concrete commands over vague prose. +* Write for a reader who does not already know the project. +* Document the supported setup path explicitly. +* Fix stale docs quickly once discovered. + +Documentation quality compounds over time. A small improvement today often saves many future contributors from the same confusion. 
diff --git a/book/content/repomgt/license.md b/book/content/repomgt/license.md index c9c6876..a16e7d7 100644 --- a/book/content/repomgt/license.md +++ b/book/content/repomgt/license.md @@ -1,4 +1,80 @@ (chap_license)= # Open Source License Options -Discuss open source license options as well as private repository options. Benefits and drawbacks of each. +Choosing a license is one of the most important early decisions for an open-source repository. A license tells other people what they are allowed to do with the code and what obligations come with that use. + +## Why a license matters + +Without a license, a public GitHub repository may be visible, but others usually do not have clear legal permission to reuse, modify, or redistribute the code. + +A clear license: + +* makes reuse expectations explicit +* reduces uncertainty for contributors and users +* helps organizations decide whether they can adopt the project +* clarifies how improvements can flow back into the community + +## Common open-source license families + +Open-source licenses often fall into two broad categories. + +### Permissive licenses + +Permissive licenses such as MIT, BSD, and Apache 2.0 usually allow broad reuse with relatively few obligations beyond preserving notices and, in some cases, including the license text. + +These licenses are often chosen when the project wants to maximize downstream adoption. + +### Copyleft licenses + +Copyleft licenses such as GPL require derivative works to remain under compatible open-source terms when distributed. These licenses are often chosen when maintainers want changes to remain open in downstream redistributions. + +## Questions maintainers should ask + +Before choosing a license, it helps to ask: + +* Do we want the broadest possible reuse? +* Do we want derivative works to remain open? +* Are there funder, employer, or institutional requirements? +* Are we combining code with dependencies that have license compatibility constraints? 
+ +## A practical beginner rule + +If you are contributing to an existing repository, do not choose a new license on your own. Follow the license already used by the project and ask maintainers before making changes to licensing files. + +If you are starting a new repository, choose a standard well-known license rather than inventing custom terms. + +## Private repositories + +Not every repository is open source. Some projects are private because they contain sensitive data, internal tools, unpublished research, or code that an organization is not ready to release. + +A private repository can still benefit from all the workflow practices described in this book: + +* branches +* pull requests +* code review +* CI +* issue tracking + +The difference is access control, not the basic collaboration model. + +## Documentation matters too + +Once a project chooses a license, that decision should be easy to find. + +Good practice includes: + +* a top-level `LICENSE` file +* a short note in the `README.md` +* consistent statements in package metadata when relevant + +## The social side of licensing + +Licensing is partly legal, but it is also cultural. A project's license signals something about how it wants to participate in the open-source ecosystem. + +For collaborative research repositories, it is often worth pairing the code license with clear guidance on: + +* attribution expectations +* contribution workflow +* data or documentation licensing, if different from the code + +That combination makes the repository easier to understand and safer to contribute to. diff --git a/book/content/repomgt/testing_CI.md b/book/content/repomgt/testing_CI.md index 63801f6..6b23870 100644 --- a/book/content/repomgt/testing_CI.md +++ b/book/content/repomgt/testing_CI.md @@ -1,4 +1,107 @@ (chap_testCI)= # Continuous Integration (CI) and Unit Testing -Give a tutorial on how to set up CI and coverage in a repository. Test driven development (TDD), references. 
Differences among unit testing, regression testing, and CI. +As repositories grow, contributors need more than good intentions to keep the default branch working. They need automatic checks that run consistently for every proposed change. This is the role of {term}`continuous integration`, usually abbreviated as CI. + +## What CI does + +Continuous integration is the practice of running automated checks whenever code is pushed or a pull request is opened or updated. In GitHub-based projects, these checks are often run with GitHub Actions, but the basic idea is broader than any one platform. + +Typical CI jobs include: + +* running unit tests +* running style or lint checks +* building documentation +* checking that notebooks or examples execute successfully +* measuring code coverage + +The purpose of CI is not to replace human review. It is to catch routine problems quickly and consistently so reviewers can focus on design, correctness, and maintainability. + +## Unit testing, regression testing, and CI + +These ideas are related, but they are not identical. + +### Unit testing + +Unit tests check small, individual pieces of the code base. A unit test might verify that one function returns the expected output for a known input. + +### Regression testing + +Regression tests check that behavior that used to work still works after a change. A regression test may be a unit test, but it can also be a larger end-to-end or integration-style test that guards against reintroducing a known bug. + +### Continuous integration + +CI is the automation framework that runs tests and other checks whenever the repository changes. CI is the delivery mechanism, not the test itself. + +## Why this matters in collaborative repositories + +In a shared repository, a broken default branch creates friction for everyone. CI lowers that risk by making sure that each pull request is evaluated under the same rules. 
+ +For PSLmodels-style collaboration, CI is especially valuable because repositories often combine code, documentation, examples, and research outputs. A pull request may look harmless in one file while breaking something important elsewhere. + +## What a healthy CI setup looks like + +A good CI setup is usually: + +* fast enough that contributors will actually pay attention to it +* reliable enough that failures usually mean something real +* broad enough to test the most important project behavior +* visible in pull requests so contributors and reviewers can act on results + +For a small project, that might mean only a few checks. For a mature project, it might involve several operating systems, multiple Python versions, documentation builds, and separate slow-running jobs. + +## A simple CI path for a Python project + +Many Python repositories start with something like the following: + +1. install project dependencies +2. run the test suite +3. run formatting or linting checks +4. optionally collect coverage + +If the project publishes documentation, another good early step is to add a documentation build check so broken examples or malformed Markdown are caught before merge. + +## Code coverage + +Coverage tools measure how much of your code is exercised by the test suite. Coverage is useful, but it should be interpreted carefully. + +Coverage can answer: + +* Did our tests execute this function at all? +* Which files have very little test attention? + +Coverage cannot answer: + +* Are the tests meaningful? +* Do the tests assert the right behavior? +* Are the most important edge cases covered? + +High coverage is not the same thing as high quality. Still, coverage reports can be useful for spotting neglected parts of a code base. + +## Test-driven development + +Test-driven development, or TDD, is the practice of writing a failing test before implementing the code that makes it pass. Some teams use TDD heavily, while others use it selectively. 
+ +Even if a project does not follow TDD strictly, it is still a strong habit to add or update tests whenever behavior changes. + +## Good contributor habits around CI + +* Run relevant tests locally before pushing, when practical. +* Read the CI failure output rather than guessing. +* Keep fixes for a failing check on the same pull request branch. +* Treat flaky tests as real maintenance problems. +* Make sure documentation changes are tested if the repo has docs automation. + +## Good maintainer habits around CI + +* Keep required checks clear and documented. +* Avoid adding slow or brittle checks without strong benefit. +* Make failure messages readable. +* Update CI when the supported environment changes. +* Keep secrets and deployment credentials out of general-purpose workflows. + +## Start simple + +Beginners sometimes think a project needs an elaborate CI system before it can benefit from automation. In practice, even a single workflow that installs dependencies and runs tests can dramatically improve collaboration quality. + +The best CI setup is usually one that the team understands, trusts, and maintains. diff --git a/book/content/repomgt/virt_env.md b/book/content/repomgt/virt_env.md index bfcc7ef..1140273 100644 --- a/book/content/repomgt/virt_env.md +++ b/book/content/repomgt/virt_env.md @@ -1,4 +1,95 @@ (chap_virtenv)= # Virtual Environments -Give tutorial on setting up virtual environments with Conda. Discuss also Python's native virtual environments as well as Docker. +One of the first practical problems new contributors encounter is that the same repository may behave differently on different machines. Virtual environments help reduce that problem by isolating project dependencies from the rest of the system. + +## Why virtual environments matter + +Without an isolated environment, installing a package for one project can interfere with another project. 
Version conflicts become harder to diagnose, and reproducing another contributor's setup becomes much less reliable. + +Virtual environments help by: + +* keeping project dependencies together +* reducing accidental conflicts with global packages +* making installation steps more reproducible +* making onboarding easier for new contributors + +## Conda environments + +Many scientific Python projects use Conda because it manages both Python packages and non-Python dependencies well. + +If a repository includes an `environment.yml` file, the usual workflow is: + +```bash +conda env create -f environment.yml +conda activate jb-git-tutorial +``` + +If the environment already exists and the specification changed, contributors often update it with: + +```bash +conda env update -f environment.yml --prune +``` + +The `--prune` option removes packages that are no longer listed in the environment specification. + +## Python's built-in virtual environments + +Some repositories prefer Python's built-in `venv` tool instead of Conda. A common pattern looks like this: + +```bash +python -m venv .venv +source .venv/bin/activate +pip install -r requirements.txt +``` + +This approach is lightweight and widely available, though it may require more manual handling of non-Python system dependencies. + +## Which approach should a project choose? + +There is no single best choice for every repository. + +Conda is often a good fit when: + +* the project has compiled dependencies +* contributors work across different operating systems +* the repository includes scientific or numeric libraries + +`venv` is often a good fit when: + +* the dependency stack is relatively simple +* the project is pure Python +* the team wants a minimal default setup + +The important thing is to document one recommended path clearly. + +## Reproducibility and lock-in + +An environment file helps, but it does not guarantee perfect reproducibility forever. 
Package indexes change, upstream packages release new versions, and operating-system differences remain relevant. + +For that reason, it is good practice to: + +* keep dependency files under version control +* update them intentionally +* document the supported Python version +* mention platform-specific steps when needed + +## Docker + +Some projects go one step further and package the whole runtime in Docker. This can be useful when: + +* the environment is complex +* the project includes services or system packages that are hard to install locally +* exact reproducibility matters a great deal + +Docker is powerful, but it adds complexity. For many beginner-friendly repositories, a documented Conda or `venv` setup is the better starting point. + +## Good repository habits + +* Include one clearly documented environment setup path. +* Keep dependency specifications current. +* Avoid requiring contributors to guess which packages they need. +* Explain whether tests or docs require extra dependencies. +* Note when a project supports both Conda and `venv`, but recommend one default path. + +For collaborative work, a virtual environment is not just a convenience. It is part of the repository's social contract: it helps contributors arrive at the same working setup with less trial and error. 
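
As a quick check that one of the setup paths in this chapter took effect, the sketch below reports which isolated environment is active in the current shell. It relies on two environment variables: `CONDA_DEFAULT_ENV`, which `conda activate` sets, and `VIRTUAL_ENV`, which a `venv` activation script sets. The function name `describe_env` is a hypothetical helper introduced only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: report which isolated environment, if any, is active.
# describe_env is a hypothetical helper name, not part of any tool.
describe_env() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    # Set by a venv's activate script; holds the environment's path.
    echo "venv: ${VIRTUAL_ENV}"
  elif [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    # Set by `conda activate`; holds the environment's name.
    echo "conda: ${CONDA_DEFAULT_ENV}"
  else
    echo "none"
  fi
}

describe_env
```

If this prints `none` right after you expected to activate an environment, the activation command most likely ran in a different terminal or shell session.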
diff --git a/book/content/using/GitCheatSheet.md b/book/content/using/GitCheatSheet.md index 6e59daf..8ed93ab 100644 --- a/book/content/using/GitCheatSheet.md +++ b/book/content/using/GitCheatSheet.md @@ -4,19 +4,20 @@ About 99% of the commands you'll type in `git` are summarized in the table below: -| Functionality | Git Command | -|-------------------------------------------------------------|------------------------------------------------------------------| -| See active branch and uncommitted changes for tracked files | `git status -uno` | -| Change branch | `git checkout ` | -| Create new branch and change to it | `git checkout -b ` | -| Track file or latest changes to file | `git add ` | -| Commit changes to branch | `git commit -m "message describing changes" ` | -| Push committed changes to remote branch | `git push origin ` | -| Merge changes from master into development branch | `(change working branch to master, then…) git merge ` | -| Merge changes from development branch into master | (change to development branch, then…) `git merge master` | -| List current tags | `git tag` | -| Create a new tag | `git tag -a v -m "message with new tag"` | -| Pull changes from remote repo onto local machine | `git fetch upstream` | -| Merge changes from remote into active local branch | `git merge upstream/` | -| Clone a remote repository | `git clone ` | - +| Functionality | Git Command | +|-------------------------------------------------------------|----------------------------------------------------------------------| +| See active branch and uncommitted changes | `git status` | +| Change branch | `git checkout ` | +| Create new branch and change to it | `git checkout -b ` | +| Stage file changes | `git add ` | +| Commit staged changes | `git commit -m "message describing changes"` | +| Push committed changes to remote branch | `git push origin ` | +| Update local information from upstream | `git fetch upstream` | +| Merge upstream main into local main | `git 
checkout main` then `git merge upstream/main` | +| Merge local main into your feature branch | `git checkout ` then `git merge main` | +| List current tags | `git tag` | +| Create a new tag | `git tag -a v -m "message with new tag"` | +| Fast-forward local branch from a remote when possible | `git pull --ff-only` | +| See unstaged changes | `git diff` | +| See staged changes | `git diff --staged` | +| Clone a remote repository | `git clone ` | diff --git a/book/content/using/create_GitHub_acct.md b/book/content/using/create_GitHub_acct.md index 56d7c2e..fef6057 100644 --- a/book/content/using/create_GitHub_acct.md +++ b/book/content/using/create_GitHub_acct.md @@ -1,6 +1,49 @@ (chap_gitacct)= # Create GitHub account -Follow [these instructions](https://docs.github.com/en/github/getting-started-with-github/signing-up-for-a-new-github-account) at Github.com to create an account. +To collaborate on GitHub-hosted repositories, you need a GitHub account. If you do not yet have one, start with GitHub's official instructions: -Most likely, the free organization account will be the right place to start for you. We recommend choosing a username suitable for a professional setting, as this will be your public profile on GitHub. +[Sign up for a GitHub account](https://docs.github.com/en/github/getting-started-with-github/signing-up-for-a-new-github-account) + +## Choose a professional username + +Your username becomes part of your public GitHub identity and appears in repository URLs, pull requests, and issue discussions. It is usually worth choosing a name you would be comfortable showing on a resume or in a professional collaboration. + +## Start with a personal account + +Most new contributors should begin with a personal GitHub account. Organizations such as PSLmodels can then invite that account into repositories or teams when appropriate. 
+ +## Turn on two-factor authentication + +If you plan to contribute regularly, enabling two-factor authentication is a very good idea. Many organizations require it, and it significantly improves account security. + +## Set up your profile + +A fully polished GitHub profile is not required before making your first contribution, but a few simple steps help collaborators know who you are: + +* add your name +* add a short bio if you want +* add a profile picture +* optionally link a website or institution + +## Decide how you will authenticate from the command line + +Creating the GitHub account is not enough by itself. To push from your local machine, you also need an authentication method for Git operations. + +The two most common choices are: + +* HTTPS with a credential manager or personal access token +* SSH keys + +Both approaches work. What matters most is that you configure one of them and test it before you are in the middle of opening your first pull request. + +## Good first account steps for contributors + +After creating your account: + +1. confirm your email address +2. enable two-factor authentication +3. set up command-line authentication +4. visit a repository you care about and practice forking it + +That sequence makes the jump from "I have an account" to "I am ready to contribute" much smoother. diff --git a/book/content/using/git_config.md b/book/content/using/git_config.md index 8cf9170..117f655 100644 --- a/book/content/using/git_config.md +++ b/book/content/using/git_config.md @@ -1,31 +1,114 @@ (chap_gitconfig)= # Setting up Git with git config -Show how to set up Git and make it your own using git config. +One reason Git feels personal to experienced users is that it can be configured to match your preferred workflow. Some settings are essential for everyone, while others are quality-of-life improvements that become more valuable as you use Git more often. 
-To view all of your settings, you can type the following into your computers terminal: +## Inspect your current configuration -``` +To view your active settings and where they came from, run: + +```bash git config --list --show-origin ``` -When getting set up, it's important to enter your credentials so that `git` on your local machine is linked to your account on GitHub. You'll do this by first entering your name: +This is useful because Git settings can come from multiple scopes: -``` +* system-wide configuration +* user-level global configuration +* repository-level local configuration + +For most personal setup, you will want `--global`. + +## Set your name and email + +The first essential settings are your name and email, because Git records them in each commit. + +```bash git config --global user.name "Your Name" +git config --global user.email yourname@example.com ``` -Then, you'll enter your email (using the email that you used to register your account on GitHub.com): +Use the email address you want associated with your GitHub contributions. If your GitHub account uses a privacy-protecting noreply address, you may prefer that instead of a personal email. + +## Set your default editor + +Git will sometimes open a text editor for commit messages, merge commits, or rebase instructions. Set an editor you are comfortable using. + +For example, to use `vim`: + +```bash +git config --global core.editor vim ``` -git config --global user.email yourname@example.com + +If you prefer VS Code: + +```bash +git config --global core.editor "code --wait" ``` -You can also set your default text editor for use with `git` by following the example below, which makes `vim` the default: +We discuss editor integration further in {ref}`chap_vscode`. +## Choose a pull strategy + +One helpful early choice is how `git pull` should behave. Many beginners are surprised when a pull creates a merge commit. 
If you want Git to update only when a fast-forward is possible, you can set: + +```bash +git config --global pull.ff only ``` -git config --global core.editor vim + +This is a good protective default for many new users. + +Contributors who prefer rebasing local work onto updated remote history sometimes choose: + +```bash +git config --global pull.rebase true +``` + +Either choice is better than leaving the behavior mysterious. + +## Set the default branch name for new repositories + +Many repositories now use `main` as the default branch name. To make newly initialized repositories match that convention: + +```bash +git config --global init.defaultBranch main +``` + +## Credential helpers and authentication + +Git needs a way to authenticate when you push to GitHub. That is usually handled through either HTTPS with a credential helper or SSH keys. + +If you use HTTPS, a credential helper can save you from entering credentials repeatedly. The exact setup varies by operating system, but GitHub's authentication guides explain the options clearly. + +If you use SSH, you will generate an SSH key pair, add the public key to GitHub, and clone or update remote URLs using the SSH form of the repository address. + +```{admonition} Important note +:class: note +GitHub no longer accepts account passwords for command-line Git operations over HTTPS. If you use HTTPS, you will typically authenticate with a personal access token through a credential manager. +``` + +## Quality-of-life aliases + +As you grow more comfortable with Git, you may want a few aliases for common commands. For example: + +```bash +git config --global alias.st status +git config --global alias.co checkout +git config --global alias.br branch ``` -We have some discussion in the {ref}`chap_vscode` Chapter about setting up VS Code as a `mergetool` and `difftool`. +Aliases are optional. They are useful only if they make your workflow clearer rather than more cryptic. 
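
If you want to review or undo aliases later, the following sketch may help. It works in a throwaway repository (the `scratch` directory is a hypothetical location created with `mktemp`) and uses repository-level settings, so your global configuration is not touched. To apply the same commands to your user-level configuration, drop `-C "$scratch"` and replace `--local` with `--global`.

```bash
#!/usr/bin/env bash
# Sketch: define, list, and remove aliases in a scratch repository.
scratch=$(mktemp -d)   # hypothetical throwaway location
git init -q "$scratch"

# Define two repository-level aliases.
git -C "$scratch" config --local alias.st status
git -C "$scratch" config --local alias.br branch

# List every alias stored in this repository's configuration.
aliases=$(git -C "$scratch" config --local --get-regexp '^alias\.')
echo "$aliases"

# Remove an alias that turned out to be more cryptic than helpful.
git -C "$scratch" config --local --unset alias.br
remaining=$(git -C "$scratch" config --local --get-regexp '^alias\.')
echo "$remaining"
```

Listing aliases from time to time is a simple way to keep the set small enough that you still remember what each one does.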
+ +## Keep configuration understandable + +Git is extremely configurable, which is both a strength and a trap. A beginner should start with a small, understandable configuration and expand only as needed. + +A sensible starting set is: + +* `user.name` +* `user.email` +* `core.editor` +* `pull.ff only` or `pull.rebase true` +* `init.defaultBranch main` -For more information on configuing `git`, see the full instructions from `git` [here](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup). +For more information on configuring Git, see the official setup guide in *Pro Git* [here](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup). diff --git a/book/content/using/installing_git.md b/book/content/using/installing_git.md index 396debf..af45a33 100644 --- a/book/content/using/installing_git.md +++ b/book/content/using/installing_git.md @@ -1,4 +1,38 @@ (chap_installgit)= # Installing Git -Detailed instructions to install Git are provided at the [Git-SCM website](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git). \ No newline at end of file +Before you can use GitHub effectively from your local machine, you need Git installed in your terminal environment. + +## Check whether Git is already installed + +On many machines, Git is already available. You can check with: + +```bash +git --version +``` + +If that command prints a version number, Git is installed. If not, you will need to install it. + +## Use the official installation instructions + +The most reliable installation instructions are maintained by the Git project itself: + +[Git installation guide](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) + +That page covers macOS, Windows, and Linux. + +## After installation + +Installing Git is only the first step. After Git is available, most contributors should immediately continue with: + +1. setting `user.name` and `user.email` +2. choosing an editor +3. 
setting up authentication for GitHub + +Those next steps are covered in {ref}`chap_gitconfig` and {ref}`chap_gitacct`. + +## A beginner sanity check + +After installation, verify that Git runs from the same terminal where you plan to do your development work. This matters because some users have multiple terminals, shells, or Python environments and assume Git is available everywhere when it is not. + +Once `git --version` works and Git can be found in your normal shell, you are ready to continue.
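
The sanity check above can be sketched as a small shell function. This is a minimal illustration rather than an official procedure, and `check_git` is a hypothetical helper name. It uses the shell built-in `command -v` to ask where the current shell finds `git`, then prints the version.

```bash
#!/usr/bin/env bash
# Sketch: confirm that Git is reachable from the shell you are using now.
# check_git is a hypothetical helper name used only for illustration.
check_git() {
  if command -v git >/dev/null 2>&1; then
    # Show which executable this shell resolves, then its version.
    echo "found: $(command -v git)"
    git --version
  else
    echo "git not found on PATH in this shell"
    return 1
  fi
}

check_git
```

Running the same check in each terminal you use for development shows whether they all resolve the same Git executable.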