You should use PySR (symbolic regression) to find scaling laws. Here's an example: better_scaling_law_llama2.py
Source for the scaling dataset: https://arxiv.org/abs/2309.16039
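For readers who want to try the idea without the dataset, here is a minimal sketch of the workflow. It is not the contents of better_scaling_law_llama2.py. The data is synthetic and generated from a made-up law (it is not the paper's data or the paper's law), and the helper names `make_synthetic_losses` and `fit_scaling_law`, the model sizes, the context sizes, and the variable names `params_b` and `ctx_k` are all assumptions chosen for illustration. The PySR import sits inside the fitting function, so the data helper runs even where PySR is not installed.

```python
import random


def make_synthetic_losses(seed=0):
    """Return (rows, losses) sampled from a MADE-UP law, for illustration only:
    loss = 2 + 5 * params_b**-0.3 + 3 * ctx_k**-0.5, plus small Gaussian noise.
    """
    rng = random.Random(seed)
    rows, losses = [], []
    for params_b in (1.0, 7.0, 13.0, 34.0, 70.0):    # hypothetical sizes, billions of parameters
        for ctx_k in (2.0, 4.0, 8.0, 16.0, 32.0):    # hypothetical contexts, thousands of tokens
            loss = 2.0 + 5.0 * params_b ** -0.3 + 3.0 * ctx_k ** -0.5
            rows.append([params_b, ctx_k])
            losses.append(loss + rng.gauss(0.0, 0.001))
    return rows, losses


def fit_scaling_law(rows, losses, niterations=40):
    """Run a symbolic regression over both inputs at once and return the fitted model."""
    # Imported lazily so the data helper above works without numpy or PySR.
    import numpy as np
    from pysr import PySRRegressor

    model = PySRRegressor(
        niterations=niterations,                     # search budget; an arbitrary choice here
        binary_operators=["+", "-", "*", "/", "^"],  # "^" lets the search express power laws
        maxsize=20,                                  # cap on expression complexity
    )
    # One fit over both columns, i.e. a simultaneous fit in parameters and context.
    model.fit(np.array(rows), np.array(losses), variable_names=["params_b", "ctx_k"])
    return model
```

To use it, call `model = fit_scaling_law(*make_synthetic_losses())`, then inspect `model.equations_` for the candidate equations at each complexity and `model.sympy()` for the one PySR selects. Whether the search recovers the generating form depends on the noise level and the search budget.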
Their law:
The laws PySR can find automatically (a simultaneous fit for number of parameters and context size):