WordPress · AetherUnbound · Nov 29, 2023 · Nov 28, 2023 · Nov 28, 2023 · Nov 29, 2023
@@ -40,7 +40,7 @@ for that project.
 
 The input file is an Excel spreadsheet which looks like the following:
 
-![Excel spreadsheet](./_docs/example_spreadsheet_screenshot.png)
+![Excel spreadsheet](./_docs/example_effort_spreadsheet.png)
 
 The input file should have one "sheet" per voter, with each sheet's title being
 the member's name. Each sheet should be a copy of the first sheet, named
@@ -51,3 +51,34 @@ The output is two box plots, one for effort and one for impact, which look like
 the following:
 
 ![Box plot for effort](./_docs/example_effort.png)
+
+## Average Weeks of Work Calculation
+
+In addition to voting on effort and impact, maintainers also vote on the number
+of weeks a prospective project might take. Instructions provided to the
+maintainers are as follows:
+
+> **Instructions**:
+>
+> Provide each project in the sheet a value for each category. The scales
+> aren't a perfect, measurable thing, so use your best judgement and instinct.
+> A notes field is also provided, please use this for notes to yourself after
+> all the values are combined when discussion occurs. All of the projects also
+> link back to the description provided for them by the project author.
+>
+> \# of IPs is defined for you based on the initial project plan, and total
+> weeks is calculated automatically. Please fill out how many weeks of work you
+> believe the implementation of the project would take one person working on it
+> full time. You can use fractional values like 0.5 or 2.5.
+
+The script ingests the output of the voting and produces two CSV files: one for
+average weeks and one for weighted average weeks. The weighted average uses the
+confidence value as the weight. Because not all projects had votes in all 3
+confidence levels, the weights are summed only over the levels that received
+votes for a given project.
+
+The input file is an Excel spreadsheet which looks like the following:
+
+![Excel spreadsheet](_docs/example_weeks_spreadsheet.png)
+
+The output CSVs will have two columns: the project name and the computed weeks.
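As a rough illustration of the conditional weighting described in the README text above, here is a minimal sketch on made-up data. The member names, project names, and vote values are hypothetical; the masking approach mirrors the one in `calculate_weighted_average`, but this is not the script itself.

```python
import pandas as pd

# Hypothetical votes: rows are projects, columns are members.
weeks = pd.DataFrame(
    {"alice": [4.0, 10.0], "bob": [6.0, 14.0], "carol": [5.0, 20.0]},
    index=["Project A", "Project B"],
)
confidence = pd.DataFrame(
    {"alice": [3, 2], "bob": [3, 1], "carol": [1, 1]},
    index=["Project A", "Project B"],
)

numerator = 0
denominator = 0
for weight in (3, 2, 1):
    # Average the weeks voted at this confidence level; 0 if nobody used it
    level_avg = weeks[confidence == weight].mean(axis=1).fillna(0)
    numerator = numerator + level_avg * weight
    # Only count this weight for projects that had votes at this level
    denominator = denominator + (level_avg > 0) * weight

weighted = numerator / denominator
print(weighted)
```

Here Project A has no confidence-2 votes, so it works out to (5 × 3 + 5 × 1) / (3 + 1) = 5.0, while Project B has no confidence-3 votes and works out to (10 × 2 + 17 × 1) / (2 + 1) ≈ 12.33, which the script's rounding would turn into 12. A project with no usable votes at any level would produce a zero denominator, which this sketch does not handle.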
@@ -0,0 +1,95 @@
+"""
+Script for generating average weeks of work based on estimations made by all team
+members.
+
+See the README for more information.
+"""
+from pathlib import Path
+
+import click
+import pandas as pd
+import sheet_utils
+
+
+INPUT_FILE = Path(__file__).parent / "data" / "week_votes.xlsx"
+OUTPUT_PATH = Path(__file__).parent / "output"
+
+COLUMN_WEEKS = "Total weeks"
+COLUMN_CONFIDENCE = "Confidence (1-3)"
+SKIP_SHEETS = {
+    # Ignore the template sheet
+    "Template",
+    # Design hours were provided separately from dev hours and will be used elsewhere
+    "Francisco",
+}
+
+
+def calculate_weighted_average(
+    weeks: pd.DataFrame, confidence: pd.DataFrame
+) -> pd.Series:
+    """
+    Use the confidence values provided to compute a weighted average, first by averaging
+    all values that share the same confidence score, then by performing a conditional
+    weighted average. The weighting is conditional because not all projects have votes
+    across all 3 confidence values (for instance, a large complex project only having
+    confidence values 2 and 1).
+    """
+    # Average the weeks voted at confidence 3, 2 and 1 separately (0 if no votes)
+    high_confidence_avg = weeks[confidence == 3].mean(axis=1).fillna(0)
+    med_confidence_avg = weeks[confidence == 2].mean(axis=1).fillna(0)
+    low_confidence_avg = weeks[confidence == 1].mean(axis=1).fillna(0)
+
+    # Mask for determining weighted sum
+    high_mask = (high_confidence_avg > 0) * 3
+    med_mask = (med_confidence_avg > 0) * 2
+    low_mask = (low_confidence_avg > 0) * 1
+
+    # Compute the weighted average across rows, using the sum (if it exists!) of
+    # weights across that row
+    weighted_average_weeks = (
+        high_confidence_avg * 3 + med_confidence_avg * 2 + low_confidence_avg * 1
+    ) / (high_mask + med_mask + low_mask)
+    return weighted_average_weeks.round().astype(int)
+
+
+def _write_file(data: pd.Series, output_path: Path, filename: str) -> None:
+    """Small wrapper for writing results into the requested output directory."""
+    path = output_path / filename
+    print(f"Writing file to {path}")
+    data.to_csv(path, header=True)
+
+
+@click.command()
+@click.option(
+    "--output",
+    help="Output directory",
+    type=click.Path(path_type=Path),
+    default=OUTPUT_PATH,
+)
+@click.option(
+    "--input-file",
+    help="Input Excel document to use",
+    type=click.Path(path_type=Path),
+    default=INPUT_FILE,
+)
+def main(output: Path, input_file: Path):
+    # Ensure the output folder exists
+    output.mkdir(parents=True, exist_ok=True)
+
+    frames, projects = sheet_utils.read_file(input_file)
+    members = list(set(frames.keys()) - SKIP_SHEETS)
+
+    weeks = sheet_utils.get_columns_by_members(frames, members, projects, COLUMN_WEEKS)
+    confidence = sheet_utils.get_columns_by_members(
+        frames, members, projects, COLUMN_CONFIDENCE
+    )
+
+    average_weeks = weeks.mean(axis=1).round().astype(int)
+    weighted_average_weeks = calculate_weighted_average(weeks, confidence)
+
+    # Write into the directory given by --output rather than the module default
+    _write_file(average_weeks, output, "average_weeks.csv")
+    _write_file(weighted_average_weeks, output, "weighted_average_weeks.csv")
+
+
+if __name__ == "__main__":
+    main()
@@ -10,6 +10,7 @@
 import matplotlib.colors as mcolors
 import matplotlib.pyplot as plt
 import pandas as pd
+import sheet_utils
 
 
 INPUT_FILE = Path(__file__).parent / "data" / "votes.xlsx"
@@ -20,23 +21,6 @@
 COLUMN_CONFIDENCE = "Confidence (1-3)"
 
 
-def get_columns_by_members(
-    frames: dict[str, pd.DataFrame],
-    members: list[str],
-    projects: pd.Series,
-    column: str,
-):
-    """
-    Create a new DataFrame which pulls out the provided column from each of the member
-    sheets, and sets the index to the project names.
-    """
-    data = pd.DataFrame([frames[name][column] for name in members], index=members)
-    # The data is transposed here because the DataFrame constructor creates a DataFrame
-    # with the projects as the columns, and the members as the index, whereas we want
-    # the projects as the index.
-    return data.T.set_index(projects)
-
-
 def plot_votes(
     data: pd.DataFrame, color_by: pd.Series, column: str, year: int, output_path: Path
 ):
@@ -106,27 +90,22 @@ def main(output: Path, input_file: Path):
     # Ensure the output folder exists
     output.mkdir(parents=True, exist_ok=True)
 
-    print(f"Reading input file: {input_file}")
-    # Read the input file
-    frames = pd.read_excel(
-        input_file,
-        # Include all sheets
-        sheet_name=None,
-        # Skip the first 5 rows, which are the instructional text
-        skiprows=5,
-        # Use the first row as the header
-        header=0,
-    )
-    # Pull the project names out of the template sheet
-    projects = frames["Template"]["Name"]
+    frames, projects = sheet_utils.read_file(input_file)
+
     # Use the name of the frames as the list of voting members
     members = list(frames.keys())[1:]
     # This is planning for the *next* year, e.g. one beyond the current one
     planning_year = datetime.now().year + 1
 
-    effort = get_columns_by_members(frames, members, projects, COLUMN_EFFORT)
-    impact = get_columns_by_members(frames, members, projects, COLUMN_IMPACT)
-    confidence = get_columns_by_members(frames, members, projects, COLUMN_CONFIDENCE)
+    effort = sheet_utils.get_columns_by_members(
+        frames, members, projects, COLUMN_EFFORT
+    )
+    impact = sheet_utils.get_columns_by_members(
+        frames, members, projects, COLUMN_IMPACT
+    )
+    confidence = sheet_utils.get_columns_by_members(
+        frames, members, projects, COLUMN_CONFIDENCE
+    )
     average_confidence = confidence.mean(axis=1)
 
     plot_votes(effort, average_confidence, COLUMN_EFFORT, planning_year, output)

@@ -0,0 +1,45 @@
+from pathlib import Path
+
+import pandas as pd
+
+
+def read_file(input_file: Path) -> tuple[dict[str, pd.DataFrame], pd.Series]:
+    """
+    Read a structured input Excel document and return the various frames and a
+    list of projects from it. This makes several assumptions about the format of the
+    file, namely that 5 rows should be skipped, that the document has a "Template" sheet
+    that can be used to determine the project list, and that the project names
+    are in a column called "Name".
+    """
+    print(f"Reading input file: {input_file}")
+    # Read the input file
+    frames = pd.read_excel(
+        input_file,
+        # Include all sheets
+        sheet_name=None,
+        # Skip the first 5 rows, which are the instructional text
+        skiprows=5,
+        # Use the first row as the header
+        header=0,
+    )
+    # Pull the project names out of the template sheet
+    projects = frames["Template"]["Name"]
+
+    return frames, projects
+
+
+def get_columns_by_members(
+    frames: dict[str, pd.DataFrame],
+    members: list[str],
+    projects: pd.Series,
+    column: str,
+):
+    """
+    Create a new DataFrame which pulls out the provided column from each of the member
+    sheets, and sets the index to the project names.
+    """
+    data = pd.DataFrame([frames[name][column] for name in members], index=members)
+    # The data is transposed here because the DataFrame constructor creates a DataFrame
+    # with the projects as the columns, and the members as the index, whereas we want
+    # the projects as the index.
+    return data.T.set_index(projects)
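To make the transpose step in `get_columns_by_members` more concrete, the following sketch runs the same two lines on tiny in-memory frames. The member names, project names, and values are hypothetical stand-ins for what `pd.read_excel` would return from a real workbook.

```python
import pandas as pd

# Hypothetical stand-ins for two member sheets
frames = {
    "alice": pd.DataFrame(
        {"Name": ["Project A", "Project B"], "Total weeks": [4, 10]}
    ),
    "bob": pd.DataFrame(
        {"Name": ["Project A", "Project B"], "Total weeks": [6, 14]}
    ),
}
members = ["alice", "bob"]
projects = frames["alice"]["Name"]

# One row per member, one column per positional row label (0, 1)
data = pd.DataFrame(
    [frames[name]["Total weeks"] for name in members], index=members
)

# Transpose so each row is a project, then label the rows with project names
result = data.T.set_index(projects)
print(result)
```

After the transpose, each row is a project and each column is a member, so row-wise operations such as `result.mean(axis=1)` give one value per project.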