
The Impact Statement is split across pages #13

Open
earlng opened this issue Feb 9, 2021 · 2 comments

earlng (Owner) commented Feb 9, 2021

Describe the bug
The Broader Impact Statement (BIS) is split across multiple pages, and the code only pulls in the content before the page break, because in the XML the page break appears as a <section> break.

To Reproduce
Papers:

  • 285baacbdf8fda1de94b19282acd23e2

Expected behavior
Grab the entire impact statement across pages.

Possible Fix
Pages seem to be marked by an <outsider> tag; it might be possible to write the code so that it continues to scrape the <section> immediately following an <outsider> tag (see the sketch below).
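A minimal sketch of that idea, assuming the tree is parsed with xml.etree.ElementTree and that the continuation sections are siblings of the first one; the function name and layout here are my assumptions, not the current code:

    import xml.etree.ElementTree as ET

    def collect_across_pages(parent, start_index):
        # Collect text from the <section> at start_index, plus any <section>
        # reached by skipping over an <outsider type="page_nr"> marker.
        children = list(parent)
        parts = []
        i = start_index
        while i < len(children):
            child = children[i]
            if child.tag == "section":
                parts.append("".join(child.itertext()).strip())
            elif not (child.tag == "outsider" and child.get("type") == "page_nr"):
                break  # anything other than a page marker ends the statement
            i += 1
        return "\n".join(parts)

    # usage (path and index are illustrative):
    # root = ET.parse("285baacbdf8fda1de94b19282acd23e2.xml").getroot()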

earlng (Owner, Author) commented Feb 9, 2021

Looking at 285baacbdf8fda1de94b19282acd23e2:

      <section class="DoCO:Section">
        <h1 class="DoCO:SectionTitle" id="145" page="10" column="1">Broader Impact</h1>
        <section class="DoCO:Section">
          <h2 class="DoCO:SectionTitle" id="146" confidence="possible" page="10" column="1">Exploring Memory-Computation Trade-offs in RL</h2>
          <region class="DoCO:TextChunk" id="151" page="10" column="1">Reinforcement learning policies have enjoyed remarkable success in recent years, in particular in the context of large-scale game playing. These results, however, mask the high underlying costs in terms of computational resources and training time that the demonstrations requires [<xref ref-type="bibr" rid="R36" id="147" class="deo:Reference">36</xref>, <xref ref-type="bibr" rid="R26" id="148" class="deo:Reference">26</xref>, <xref ref-type="bibr" rid="R27" id="149" class="deo:Reference">27</xref>, <xref ref-type="bibr" rid="R35" id="150" class="deo:Reference">35</xref>]. For example, the AlphaGo Zero algorithm that mastered Chess and Go from scratch trained their algorithm over 72 hours using 4 TPUs and 64 GPUs. These results, while highlighting the intrinsic power in reinforcement learning algorithms, are computationally infeasible for applying algorithms to RL tasks in computing systems. As an example, RL approaches have received much interest in several of the following problems:</region>
          <region class="DoCO:TextChunk" id="159" confidence="possible" page="10" column="1">• Memory Management : Many computing systems have two sources of memory; on-chip memory which is fast but limited, and off-chip memory which has low bandwidth and suffers from high latency. Designing memory controllers for these system require a scheduling policy to adapt to changes in workload and memory reference streams, ensuring consistency in the memory, and controlling for long-term consequences of scheduling decisions [<xref ref-type="bibr" rid="R1" id="152" class="deo:Reference">1</xref>, <xref ref-type="bibr" rid="R2" id="153" class="deo:Reference">2</xref>, <xref ref-type="bibr" rid="R8" id="154" class="deo:Reference">8</xref>]. • Online Resource Allocation : Cloud-based clusters for high performance computing must decide how to allocate computing resources to different users or tasks with highly variable demand. Controllers for these systems must make decisions online to manage the trade-offs between computation cost, server costs, and delay in job-completions. Recent work has studied RL algorithms for such problems [<xref ref-type="bibr" rid="R15" id="155" class="deo:Reference">15</xref>, <xref ref-type="bibr" rid="R23" id="156" class="deo:Reference">23</xref>, <xref ref-type="bibr" rid="R28" id="157" class="deo:Reference">28</xref>, <xref ref-type="bibr" rid="R22" id="158" class="deo:Reference">22</xref>].</region>
          <region class="DoCO:TextChunk" id="160" page="10" column="1">Common to all of these examples are computation and storage limitations on the devices used for the controller.</region>
          <region class="DoCO:TextChunk" id="161" confidence="possible" page="10" column="1">• Limited Memory : On chip memory is expensive and off-chip memory access has low- bandwidth. As any reinforcement learning algorithm requires memory to store estimates of relevant quantities - RL algorithms for computing systems must manage their computational requirements. • Power Consumption : Many applications require low-power consumption for executing RL policies on general computing platforms. • Latency Requirements : Many problems for computing systems (e.g. memory management) have strict latency quality of service requirements that limits reinforcement learning algorithms to execute their policy quickly.</region>
          <region class="DoCO:TextChunk" id="167" page="10" column="1">Our algorithm A DA MB takes a first step towards designing efficient reinforcement learning algorithms for continuous (or large finite) spaces, where efficient means both low-regret, but also low storage and computation complexity (see <xref ref-type="table" rid="T1" id="162" class="deo:Reference">Table 1</xref>). A DA MB is motivated by recent algorithms for reinforcement learning on memory constrained devices which use a technique called cerebellar model articulation controller (CMAC). This technique uses a random-discretizations of the space at various levels of coarseness [<xref ref-type="bibr" rid="R15" id="163" class="deo:Reference">15</xref>]. Moreover, heuristic algorithms which use discretizations (either fixed or adaptive) have been extensively studied on various tasks [<xref ref-type="bibr" rid="R32" id="164" class="deo:Reference">32</xref>, <xref ref-type="bibr" rid="R39" id="165" class="deo:Reference">39</xref>, <xref ref-type="bibr" rid="R22" id="166" class="deo:Reference">22</xref>]. We are able to show that our algorithm achieves good dependence with respect to K on all three dimensions (regret, computation, and storage complexity). With future work we hope to determine problem specific guarantees, exhibiting how these adaptive partitioning algorithms are able to extract structure common in computing systems problems.</region>
          <outsider class="DoCO:TextBox" type="page_nr" id="168" page="10" column="1">10</outsider>
        </section>

Although I don't think this is a good example of an impact statement spanning multiple pages (it doesn't actually seem to), I think this issue would be resolved by the proposed fix in #10.

Regardless of what page the impact statement spills over to, it should still be part of the same <section>, so if we grab all the text in the relevant section, we should find everything.
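A hedged sketch of that approach; the placeholder path and the text-content lookup are illustrative, not the repo's actual code:

    import xml.etree.ElementTree as ET

    root = ET.parse("paper.xml").getroot()  # "paper.xml" is a placeholder
    impact = root.find(".//section[h1='Broader Impact']")  # text-content match
    if impact is not None:
        # itertext() walks the whole subtree, so text inside nested
        # <section> elements is collected regardless of page breaks
        statement = " ".join("".join(impact.itertext()).split())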

earlng (Owner, Author) commented Feb 16, 2021

It is not resolved by be5e2a9, because in the specific example of 285baacbdf8fda1de94b19282acd23e2 the element that immediately follows the Broader Impact header is another <section>, which doesn't satisfy the condition:

if child.itertext() != "" and (child.attrib["class"] == "DoCO:TextChunk" or child.attrib["class"] == "DoCO:TextBox")
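Incidentally, if this is xml.etree.ElementTree or lxml, child.itertext() returns an iterator, so the != "" comparison is always true; the emptiness check probably needs a join as well. A hedged sketch of one way to handle the nested case, recursing into child <section> elements (gather_text is a hypothetical helper, not the repo's function):

    def gather_text(node, parts):
        # Collect TextChunk/TextBox text, descending into nested <section>s.
        # Note: <outsider type="page_nr"> is also a DoCO:TextBox, so page
        # numbers may need to be filtered out separately.
        for child in node:
            cls = child.attrib.get("class", "")
            text = "".join(child.itertext()).strip()
            if text and cls in ("DoCO:TextChunk", "DoCO:TextBox"):
                parts.append(text)
            elif child.tag == "section":
                gather_text(child, parts)  # descend instead of skipping
        return parts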
