Update README.md
EthanC111 committed Sep 20, 2023
Parent: 1cd3c10 · Commit: c4bab55
Showing 1 changed file with 5 additions and 5 deletions: README.md
@@ -144,11 +144,11 @@ We demonstrate that Abel-70B not only achieves SOTA on the GSM8k and MATH datasets
<img src="./fig/MATH_comparison.png">

## Limitations
- * **Overfitting**: Although we have conducted robustness analysis, and generative AI for mathematics is inherently fragile (often requiring advanced decoding strategies such as majority voting, sketched below), excessive reliance on constructing SFT samples to boost performance can still lead the model towards overfitting. (Overfitting is not the primary concern of this project, however: even a model overfit on various augmented training data struggles to achieve favorable results on complex mathematical reasoning test sets such as MATH.) Nevertheless, we still need to perform more extensive robustness analysis (https://github.com/GAIR-NLP/abel/issues/9), actively explore training methods that can turn the model into a mathematical polymath, and conduct a more comprehensive cross-domain generalization analysis.
- * **Generalization**: A good mathematical model should not be limited to solving problems from the GSM8K and MATH datasets; it should handle a variety of problem types, including those that assess different knowledge domains and require different response formats (e.g., multiple-choice, true/false, proofs, arithmetic). The current model cannot yet generalize to these diverse scenarios (https://github.com/GAIR-NLP/abel/issues/10).
- * **Universality**: Ultimately, we anticipate that the mathematical reasoning abilities enabled by large models will be integrated into chatbots for domains such as medicine, law, physics, and chemistry. Incorporating the power of a strong mathematical model into other models is key to achieving AGI, and this capability is still missing from the current model (https://github.com/GAIR-NLP/abel/issues/11).
- * **Multilinguality**: Constraints from the training data and the base model limit the current model's ability to respond in languages other than English (https://github.com/GAIR-NLP/abel/issues/12).
- * **Advanced techniques**: The current model focuses primarily on SFT; advanced techniques such as reward models, RLHF (Reinforcement Learning from Human Feedback), and tool use have not yet been explored (https://github.com/GAIR-NLP/abel/issues/13, https://github.com/GAIR-NLP/abel/issues/14).
+ * **Overfitting**: Although we have conducted robustness analysis, and generative AI for mathematics is inherently fragile (often requiring advanced decoding strategies such as majority voting, sketched below), excessive reliance on constructing SFT samples to boost performance can still lead the model towards overfitting. (Overfitting is not the primary concern of this project, however: even a model overfit on various augmented training data struggles to achieve favorable results on complex mathematical reasoning test sets such as MATH.) Nevertheless, we still need to perform more extensive robustness analysis (https://github.com/GAIR-NLP/abel/issues/1), actively explore training methods that can turn the model into a mathematical polymath, and conduct a more comprehensive cross-domain generalization analysis.
+ * **Generalization**: A good mathematical model should not be limited to solving problems from the GSM8K and MATH datasets; it should handle a variety of problem types, including those that assess different knowledge domains and require different response formats (e.g., multiple-choice, true/false, proofs, arithmetic). The current model cannot yet generalize to these diverse scenarios (https://github.com/GAIR-NLP/abel/issues/2).
+ * **Universality**: Ultimately, we anticipate that the mathematical reasoning abilities enabled by large models will be integrated into chatbots for domains such as medicine, law, physics, and chemistry. Incorporating the power of a strong mathematical model into other models is key to achieving AGI, and this capability is still missing from the current model (https://github.com/GAIR-NLP/abel/issues/3).
+ * **Multilinguality**: Constraints from the training data and the base model limit the current model's ability to respond in languages other than English (https://github.com/GAIR-NLP/abel/issues/4).
+ * **Advanced techniques**: The current model focuses primarily on SFT; advanced techniques such as reward models, RLHF (Reinforcement Learning from Human Feedback), and tool use have not yet been explored (https://github.com/GAIR-NLP/abel/issues/5, https://github.com/GAIR-NLP/abel/issues/6).

We have created a [list of issues](https://github.com/GAIR-NLP/abel/issues) to track these limitations and potential solutions. Your opinions and comments are always welcome.
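
The **Overfitting** item above mentions majority voting as a decoding strategy for fragile mathematical generation. Below is a minimal sketch of that idea (often called self-consistency): sample several solutions per question, extract each final answer, and return the most frequent one. The `sample_solution` helper and the answer-extraction regex are hypothetical stand-ins for a real model call and a real answer parser, not part of the abel codebase.

```python
# Minimal majority-voting (self-consistency) decoding sketch.
from collections import Counter
import re

def sample_solution(question: str) -> str:
    """Hypothetical stand-in: returns one sampled chain-of-thought solution."""
    raise NotImplementedError("replace with a real sampling-based model call")

def extract_answer(solution: str) -> str | None:
    """Assume the final numeric answer is the last number in the text."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution)
    return numbers[-1] if numbers else None

def majority_vote(question: str, k: int = 16) -> str | None:
    """Sample k solutions and return the most common final answer."""
    answers = [extract_answer(sample_solution(question)) for _ in range(k)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None
```

Independent samples tend to disagree on flawed reasoning paths but converge on correct final answers, so the modal answer is usually more reliable than a single greedy decode.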
