Readme for OpenAI Leaderboard Analysis

This project is a Python script that analyzes how model performance on the Hugging Face Open LLM Leaderboard has changed over time. It scrapes the commit dates from each model's Hugging Face page and plots the leaderboard scores against those dates. The leaderboard can be found at this URL: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

Inputs

The main input for this script is an Excel file named raw_leaderboard.xlsx. This file should contain a sheet named Sheet1 with a column called Model, which holds the names of the models on the leaderboard along with hyperlinks to the Hugging Face page for each model.
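As a rough illustration of how such a file can be read, the sketch below collects the hyperlink attached to each cell in the Model column using openpyxl. The function name extract_model_links is hypothetical, and the sketch assumes the links are stored as cell hyperlinks on the Model cells themselves; the actual script may read them differently.

```python
from openpyxl import load_workbook


def extract_model_links(path, sheet_name="Sheet1", column_name="Model"):
    """Return {model name: hyperlink target or None} for one worksheet column.

    Hypothetical helper: assumes the header is in row 1 and that each
    hyperlink is attached to the cell holding the model name.
    """
    # Load in the default (non-read-only) mode so cell hyperlinks are available.
    workbook = load_workbook(path)
    sheet = workbook[sheet_name]

    # Find the 1-based index of the requested column from the header row.
    header = [cell.value for cell in sheet[1]]
    col_idx = header.index(column_name) + 1

    links = {}
    for (cell,) in sheet.iter_rows(min_row=2, min_col=col_idx, max_col=col_idx):
        if cell.value is None:
            continue  # skip empty rows
        links[str(cell.value)] = cell.hyperlink.target if cell.hyperlink else None
    return links
```

Calling extract_model_links("raw_leaderboard.xlsx") would return a dictionary keyed by model name. Models whose cells carry no hyperlink map to None, which makes them easy to report or drop.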

The Excel file is created manually by copying the leaderboard table from the URL above. The table includes each model's name and its scores on four different challenges: ARC, HellaSwag, MMLU, and TruthfulQA.

Outputs

The output is a PNG image file named leaderboard_scores.png. This image file contains a plot of the models' scores over time. It includes plots for the four different challenges and compares the scores of the models with the scores of GPT-3.5 and GPT-4.
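A plot of this kind could be produced along the following lines. This is only a sketch: the function name plot_scores, the 2x2 panel layout, and the shape of the arguments are assumptions, and the GPT-3.5 and GPT-4 reference scores must be supplied by the caller because this README does not list them.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; remove to display with plt.show()
import matplotlib.pyplot as plt

CHALLENGES = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]


def plot_scores(dates, best_scores, reference_scores, out_path="leaderboard_scores.png"):
    """Draw one panel per challenge and save the figure as a PNG.

    dates: sorted list of datetime.date values
    best_scores: dict mapping challenge -> list of scores, one per date
    reference_scores: dict mapping a label (e.g. "GPT-4") -> {challenge: score}
    """
    fig, axes = plt.subplots(2, 2, figsize=(12, 8), sharex=True)
    for ax, challenge in zip(axes.flat, CHALLENGES):
        # Step plot: the best score stays flat until a better model appears.
        ax.step(dates, best_scores[challenge], where="post", label="Best leaderboard model")
        # Horizontal dashed line for each reference model.
        for i, (label, scores) in enumerate(reference_scores.items(), start=1):
            ax.axhline(scores[challenge], linestyle="--", color=f"C{i}", label=label)
        ax.set_title(challenge)
        ax.set_ylabel("Score")
        ax.legend()
    fig.autofmt_xdate()
    fig.tight_layout()
    fig.savefig(out_path)
    return fig
```

A step plot is used here because the best available score only changes on dates when a new model appears.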

Procedure

The script first reads the Excel file and extracts the hyperlinks from the Model column.
Then it uses the requests and BeautifulSoup libraries to scrape the text from each model's commit history page on Hugging Face.
After scraping the text, it extracts the dates of the commits using regular expressions.
It standardizes the dates and then uses them to restructure the DataFrame to include only the highest available scores for each challenge on each date.
Finally, it plots the scores over time, saves the plot as a PNG image file, and displays it.
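The date extraction step can be sketched as follows. This is a minimal illustration using only the standard library, assuming that commit dates appear in the scraped text in a form like "May 24, 2023"; the real page text and the script's own regular expression may differ, and the script uses dateparser rather than the manual month lookup shown here. The function names are hypothetical.

```python
import re
from datetime import date

MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

# Assumed date format: abbreviated or full month name, day, comma, four-digit year.
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r")[a-z]*\.? (\d{1,2}), (\d{4})\b"
)


def extract_commit_dates(page_text):
    """Return every date found in the text as a datetime.date, in order of appearance."""
    dates = []
    for month, day, year in DATE_PATTERN.findall(page_text):
        try:
            dates.append(date(int(year), MONTHS[month], int(day)))
        except ValueError:
            continue  # skip impossible dates such as "Feb 30, 2023"
    return dates


def first_commit_date(page_text):
    """Return the earliest date in the text, or None if no date was found."""
    dates = extract_commit_dates(page_text)
    return min(dates) if dates else None
```

Returning None when no date is found gives the caller a simple way to identify models whose commit history could not be read.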

Please note that the first commit on the model page is used as the benchmark for the model's availability. This may not perfectly correspond to when the model became available on the leaderboard.
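Once each model has a first commit date, the "highest available score on each date" from the procedure can be read as a running maximum over time. The sketch below shows one way to compute that with pandas. The column name first_commit_date and the function name best_scores_by_date are hypothetical, and the running-maximum interpretation is an assumption about what the script does.

```python
import pandas as pd

CHALLENGES = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]


def best_scores_by_date(df, date_column="first_commit_date"):
    """Return the best score seen so far for each challenge, indexed by date.

    df is assumed to have one row per model, a date column, and one
    numeric column per challenge.
    """
    # Best score among the models that first appeared on each date.
    daily_best = df.groupby(date_column)[CHALLENGES].max().sort_index()
    # Carry the best score forward so each date reflects all earlier models too.
    return daily_best.cummax()
```

The result has one row per distinct date, so it can be passed straight to a plotting function.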

Future Expansion

Two potential ways to expand this project are:

Using Selenium to automatically pull the leaderboard table from the leaderboard site instead of manually copying and pasting it into an Excel file.
Using Selenium to pull date information for the models that require interaction (clicking) on the model page before the commit histories become available. (The models currently getting dropped are: 'llama-65b', 'llama-30b', 'bigcode/starcoderplus', 'llama-13b', 'alpaca-13b', 'llama-7b', and 'alessandropalla/instruct_gpt2'.)

Please be aware that these extensions would introduce more complexity to the script and would require additional dependencies.

Dependencies

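This project requires the following third-party Python libraries:

pandas
openpyxl
requests
beautifulsoup4 (imported as bs4)
dateparser
matplotlib

It also uses the re and datetime modules, which are part of the Python standard library and do not need to be installed.

You can install the third-party libraries using pip:

pip install pandas openpyxl requests beautifulsoup4 dateparser matplotlib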
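To confirm that everything is importable before running the script, a small check like the one below can help. It is a sketch using only the standard library; the function name missing_modules is hypothetical, and the list uses import names, so BeautifulSoup appears as bs4.

```python
from importlib.util import find_spec

# Import names of the modules the script relies on.
REQUIRED_MODULES = [
    "pandas", "openpyxl", "requests", "bs4",
    "dateparser", "matplotlib", "re", "datetime",
]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

For example, print(missing_modules(REQUIRED_MODULES)) prints an empty list when every dependency is available, and otherwise lists the modules that still need to be installed.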

Execution

To execute the script, navigate to the directory containing the script and the input Excel file, and run the script using a Python interpreter.

python leaderboard_analysis.py

The script will output the leaderboard_scores.png image file in the same directory.

For any issues, please refer to the original code and accompanying blog post.
