Add project planning weeks of work averaging script #3412

Merged
merged 4 commits into from Nov 29, 2023
33 changes: 32 additions & 1 deletion utilities/project_planning/README.md
@@ -40,7 +40,7 @@ for that project.

The input file is an Excel spreadsheet which looks like the following:

![Excel spreadsheet](./_docs/example_spreadsheet_screenshot.png)
![Excel spreadsheet](./_docs/example_effort_spreadsheet.png)

The input file should have one "sheet" per voter, with each sheet's title being
the member's name. Each sheet should be a copy of the first sheet, named
@@ -51,3 +51,34 @@ The output is two box plots, one for effort and one for impact, which look like
the following:

![Box plot for effort](./_docs/example_effort.png)

## Average Weeks of Work Calculation

In addition to voting on effort and impact, maintainers also vote on the number
of weeks a prospective project might take. Instructions provided to the
maintainers are as follows:

> **Instructions**:
>
> Provide each project in the sheet a value for each category. The scales
> aren't a perfect, measurable thing, so use your best judgement and instinct.
> A notes field is also provided; please use this for notes to yourself after
> all the values are combined when discussion occurs. All of the projects also
> link back to the description provided for them by the project author.
>
> \# of IPs is defined for you based on the initial project plan, and total
> weeks is calculated automatically. Please fill out how many weeks of work you
> believe the implementation of the project would take one person working on it
> full time. You can use fractional values like 0.5 or 2.5.

The `calculate_average_weeks_of_work.py` script ingests the output of the voting
and produces two CSV files: one for average weeks and one for weighted average
weeks. The weighted average uses the confidence value as the weight. Because not
all projects had votes in all 3 confidence levels, the calculation only counts
the weights of the confidence levels that actually received votes.
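
As a rough illustration of the conditional weighting described above, here is a
minimal pure-Python sketch. It is not the script's pandas implementation: the
`conditional_weighted_average` helper and the votes are made up for this
example. Votes are averaged per confidence level, and only the levels that
received votes contribute their weight to the denominator.

```python
from collections import defaultdict


def conditional_weighted_average(votes):
    """Hypothetical helper: votes is a list of (weeks, confidence) pairs."""
    by_confidence = defaultdict(list)
    for weeks, confidence in votes:
        by_confidence[confidence].append(weeks)

    # Average the votes within each confidence level that received votes
    level_averages = {
        level: sum(values) / len(values) for level, values in by_confidence.items()
    }

    # Weight each level's average by its confidence value, then divide by the
    # sum of the weights that are actually present
    weighted_sum = sum(level * avg for level, avg in level_averages.items())
    total_weight = sum(level_averages.keys())
    return weighted_sum / total_weight


# Made-up votes for one project: nobody voted with confidence 3
votes = [(8, 2), (10, 2), (12, 1)]
result = conditional_weighted_average(votes)
print(result)  # (2 * 9 + 1 * 12) / (2 + 1) = 10.0
```

One difference worth noting: the script decides whether a confidence level is
present by checking that its average is greater than zero, whereas this sketch
checks whether any vote used that level.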

The input file is an Excel spreadsheet which looks like the following:

![Excel spreadsheet](_docs/example_weeks_spreadsheet.png)

The output CSVs will have two columns: the project name and the computed weeks.
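
A file with that two-column shape can be read back with the standard library.
The sketch below is illustrative only: the header names and the project rows are
made up, and the real header depends on how pandas labels the columns when the
script writes the file.

```python
import csv
import io

# Hypothetical file contents; header names and rows are invented for this example
example_csv = "Name,weeks\nProject A,4\nProject B,12\n"

reader = csv.reader(io.StringIO(example_csv))
header = next(reader)  # Skip the header row
weeks_by_project = {name: int(weeks) for name, weeks in reader}
print(weeks_by_project)
```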
95 changes: 95 additions & 0 deletions utilities/project_planning/calculate_average_weeks_of_work.py
@@ -0,0 +1,95 @@
"""
Script for generating average weeks of work based on estimations made by all team
members.

See the README for more information.
"""
from pathlib import Path

import click
import pandas as pd
import sheet_utils


INPUT_FILE = Path(__file__).parent / "data" / "week_votes.xlsx"
OUTPUT_PATH = Path(__file__).parent / "output"

COLUMN_WEEKS = "Total weeks"
COLUMN_CONFIDENCE = "Confidence (1-3)"
SKIP_SHEETS = {
    # Ignore the template sheet
    "Template",
    # Design hours were provided separately from dev hours and will be used elsewhere
    "Francisco",
}


def calculate_weighted_average(
    weeks: pd.DataFrame, confidence: pd.DataFrame
) -> pd.Series:
    """
    Use the confidence values provided to compute a weighted average, first by averaging
    all values that share the same confidence score, then by performing a conditional
    weighted average. The weighting is conditional because not all projects have votes
    across all 3 confidence values (for instance, a large complex project only having
    confidence values 2 and 1).
    """
    # Average the 3 votes, 2 votes and 1 votes individually
    high_confidence_avg = weeks[confidence == 3].mean(axis=1).fillna(0)
    med_confidence_avg = weeks[confidence == 2].mean(axis=1).fillna(0)
    low_confidence_avg = weeks[confidence == 1].mean(axis=1).fillna(0)

    # Mask for determining weighted sum: a confidence level only contributes its
    # weight when its average is non-zero
    high_mask = (high_confidence_avg > 0) * 3
    med_mask = (med_confidence_avg > 0) * 2
    low_mask = (low_confidence_avg > 0) * 1

    # Compute the weighted average across rows, using the sum (if it exists!) of
    # weights across that row
    weighted_average_weeks = (
        high_confidence_avg * 3 + med_confidence_avg * 2 + low_confidence_avg * 1
    ) / (high_mask + med_mask + low_mask)
    return weighted_average_weeks.round().astype(int)


def _write_file(data: pd.Series, output: Path, filename: str) -> None:
    """Small wrapper for writing the computed series for this script."""
    path = output / filename
    print(f"Writing file to {path}")
    data.to_csv(path, header=True)


@click.command()
@click.option(
    "--output",
    help="Output directory",
    type=click.Path(path_type=Path),
    default=OUTPUT_PATH,
)
@click.option(
    "--input-file",
    help="Input Excel document to use",
    type=click.Path(path_type=Path),
    default=INPUT_FILE,
)
def main(output: Path, input_file: Path):
    # Ensure the output folder exists
    output.mkdir(parents=True, exist_ok=True)

    frames, projects = sheet_utils.read_file(input_file)
    members = list(set(frames.keys()) - SKIP_SHEETS)

    weeks = sheet_utils.get_columns_by_members(frames, members, projects, COLUMN_WEEKS)
    confidence = sheet_utils.get_columns_by_members(
        frames, members, projects, COLUMN_CONFIDENCE
    )

    average_weeks = weeks.mean(axis=1).round().astype(int)
    weighted_average_weeks = calculate_weighted_average(weeks, confidence)

    # Write into the directory given by --output rather than always using the default
    _write_file(average_weeks, output, "average_weeks.csv")
    _write_file(weighted_average_weeks, output, "weighted_average_weeks.csv")


if __name__ == "__main__":
    main()
45 changes: 12 additions & 33 deletions utilities/project_planning/graph_project_voting.py
@@ -10,6 +10,7 @@
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd
import sheet_utils


INPUT_FILE = Path(__file__).parent / "data" / "votes.xlsx"
@@ -20,23 +21,6 @@
COLUMN_CONFIDENCE = "Confidence (1-3)"


def get_columns_by_members(
    frames: dict[str, pd.DataFrame],
    members: list[str],
    projects: pd.Series,
    column: str,
):
    """
    Create a new DataFrame which pulls out the provided column from each of the member
    sheets, and sets the index to the project names.
    """
    data = pd.DataFrame([frames[name][column] for name in members], index=members)
    # The data is transposed here because the DataFrame constructor creates a DataFrame
    # with the projects as the columns, and the members as the index, whereas we want
    # the projects as the index.
    return data.T.set_index(projects)


def plot_votes(
    data: pd.DataFrame, color_by: pd.Series, column: str, year: int, output_path: Path
):
@@ -106,27 +90,22 @@ def main(output: Path, input_file: Path):
    # Ensure the output folder exists
    output.mkdir(parents=True, exist_ok=True)

    print(f"Reading input file: {input_file}")
    # Read the input file
    frames = pd.read_excel(
        input_file,
        # Include all sheets
        sheet_name=None,
        # Skip the first 5 rows, which are the instructional text
        skiprows=5,
        # Use the first row as the header
        header=0,
    )
    # Pull the project names out of the template sheet
    projects = frames["Template"]["Name"]
    frames, projects = sheet_utils.read_file(input_file)

    # Use the name of the frames as the list of voting members
    members = list(frames.keys())[1:]
    # This is planning for the *next* year, e.g. one beyond the current one
    planning_year = datetime.now().year + 1

    effort = get_columns_by_members(frames, members, projects, COLUMN_EFFORT)
    impact = get_columns_by_members(frames, members, projects, COLUMN_IMPACT)
    confidence = get_columns_by_members(frames, members, projects, COLUMN_CONFIDENCE)
    effort = sheet_utils.get_columns_by_members(
        frames, members, projects, COLUMN_EFFORT
    )
    impact = sheet_utils.get_columns_by_members(
        frames, members, projects, COLUMN_IMPACT
    )
    confidence = sheet_utils.get_columns_by_members(
        frames, members, projects, COLUMN_CONFIDENCE
    )
    average_confidence = confidence.mean(axis=1)

    plot_votes(effort, average_confidence, COLUMN_EFFORT, planning_year, output)
45 changes: 45 additions & 0 deletions utilities/project_planning/sheet_utils.py
@@ -0,0 +1,45 @@
from pathlib import Path

import pandas as pd


def read_file(input_file: Path) -> tuple[dict[str, pd.DataFrame], pd.Series]:
    """
    Read a structured input Excel document and return the various frames and a
    list of projects from it. This makes several assumptions about the format of the
    file, namely that 5 rows should be skipped, that the document has a "Template" sheet
    that can be used to determine the project list, and that the project names
    are in a column called "Name".
    """
    print(f"Reading input file: {input_file}")
    # Read the input file
    frames = pd.read_excel(
        input_file,
        # Include all sheets
        sheet_name=None,
        # Skip the first 5 rows, which are the instructional text
        skiprows=5,
        # Use the first row as the header
        header=0,
    )
    # Pull the project names out of the template sheet
    projects = frames["Template"]["Name"]

    return frames, projects


def get_columns_by_members(
    frames: dict[str, pd.DataFrame],
    members: list[str],
    projects: pd.Series,
    column: str,
):
    """
    Create a new DataFrame which pulls out the provided column from each of the member
    sheets, and sets the index to the project names.
    """
    data = pd.DataFrame([frames[name][column] for name in members], index=members)
    # The data is transposed here because the DataFrame constructor creates a DataFrame
    # with the projects as the columns, and the members as the index, whereas we want
    # the projects as the index.
    return data.T.set_index(projects)
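
The transposition in `get_columns_by_members` can be pictured without pandas.
The sketch below uses plain lists with made-up member names, project names, and
votes (all hypothetical): each member contributes one row of values, and
`zip(*rows)` flips those rows so that each project ends up with one value per
member.

```python
# Hypothetical data: one list of votes per member, in project order
members = ["alice", "bob"]
projects = ["Project A", "Project B", "Project C"]
votes_by_member = {
    "alice": [2, 5, 1],
    "bob": [3, 4, 1],
}

# One row per member, mirroring the DataFrame built from the member sheets
rows = [votes_by_member[name] for name in members]

# Transpose so that projects become the keys and members the inner keys
by_project = {
    project: dict(zip(members, column))
    for project, column in zip(projects, zip(*rows))
}
print(by_project["Project B"])  # {'alice': 5, 'bob': 4}
```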