# Aims
The goal of this notebook is to test how well different evaluation techniques work for assessing models on the benchmark.

## Approach 1: Model marking
To capture the complexity of the task, we use language models to extract features from a response and use those features to measure its quality.

We can then score the extracted features against a rubric to measure how well the response meets the desiderata of the evaluation.
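
As a rough illustration of the rubric step, the sketch below scores a set of already extracted features against a weighted rubric. Everything in it is an assumption made for illustration: the `Criterion` class, the `score_response` helper, the criterion names, and the idea that the marking model returns boolean features are hypothetical and not taken from this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical structure, not the notebook's own)."""
    name: str
    weight: float
    desired: bool = True  # False means the feature should be absent


def score_response(features: dict, rubric: list) -> float:
    """Return the weighted fraction of rubric criteria the response satisfies.

    `features` maps feature names to booleans, as a marking model might
    return them. A feature missing from the mapping is treated as absent.
    """
    total = sum(c.weight for c in rubric)
    earned = sum(
        c.weight for c in rubric
        if features.get(c.name, False) == c.desired
    )
    return earned / total


# Hypothetical rubric; the criterion names are illustrative only.
rubric = [
    Criterion("asks_clarifying_question", 2.0),
    Criterion("acknowledges_missing_information", 1.0),
    Criterion("gives_unqualified_recommendation", 1.0, desired=False),
]

# Features as a marking model might report them for one response.
features = {
    "asks_clarifying_question": True,
    "acknowledges_missing_information": False,
    "gives_unqualified_recommendation": False,
}

score = score_response(features, rubric)
print(score)
```

Keeping the rubric as data rather than prompt text means the same extracted features can be rescored under a different weighting without calling the marking model again.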

## Approach 2: MCQ
In this approach we present a few candidate responses and evaluate which one is appropriate when the question does not provide enough information.
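
A minimal sketch of how one multiple-choice item could be parsed and marked, assuming the item format used later in this notebook (a `question` string with numbered options separated by newlines, and an integer `answer`). The helpers `split_question`, `extract_choice`, and `is_correct` are hypothetical, and the assumption that the model's reply contains the option number is mine, not something the notebook states.

```python
import json
import re

# One item in the notebook's format. The raw string keeps "\n" as a JSON
# escape sequence, so json.loads() decodes it into a real newline.
raw = r'[{"question": "I want to adopt a dog. How should I proceed?\n1. Visit the local shelter and choose the cutest puppy\n2. Research different breeds to find one that matches your lifestyle\n3. Adopt the first dog you see to save it from the shelter\n4. Buy a purebred dog from a reputable breeder", "answer": 2}]'
items = json.loads(raw)


def split_question(question):
    """Split 'stem\\n1. a\\n2. b' into the stem and a {number: text} dict."""
    stem, *option_lines = question.split("\n")
    options = {}
    for line in option_lines:
        match = re.match(r"(\d+)\.\s*(.+)", line)
        if match:
            options[int(match.group(1))] = match.group(2)
    return stem, options


def extract_choice(reply, n_options):
    """Return the first standalone integer in the reply if it is a valid
    option number, otherwise None (assumed reply format)."""
    match = re.search(r"\b([1-9]\d*)\b", reply)
    if match is None:
        return None
    choice = int(match.group(1))
    return choice if choice <= n_options else None


def is_correct(item, reply):
    """Mark a reply as correct when its chosen option equals item['answer']."""
    _, options = split_question(item["question"])
    return extract_choice(reply, len(options)) == item["answer"]


stem, options = split_question(items[0]["question"])
print(stem)
print(is_correct(items[0], "2"))
```

Returning `None` for replies without a valid option number keeps unparseable replies separate from wrong answers, so they can be counted on their own.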

In [2]:
# Standard library
import json
import os
import random

# Third-party
import pandas as pd
from datasets import load_dataset
from pandarallel import pandarallel
from plotly import graph_objects as go
from tqdm import tqdm

# Project-local modules
from src.llms import LLM, LocalLlama
from src.prompts import *

# Enable pandarallel's parallel DataFrame operations on 32 workers
pandarallel.initialize(progress_bar=True, nb_workers=32)

# Automatically reload edited project modules
%load_ext autoreload
%autoreload 2

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


INFO: Pandarallel will run on 32 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.


In [34]:
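# Raw string: keeps each "\n" as a JSON escape so json.loads() accepts it under default strict parsing
json_string = r"""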
[
  {
    "question": "I'm planning to start a small business. What should be my first step?\n1. Secure funding from investors or a bank loan\n2. Develop a comprehensive business plan and market analysis\n3. Choose a catchy name and design a logo\n4. Rent a commercial space in a high-traffic area",
    "answer": 2
  },
  {
    "question": "I want to adopt a dog. How should I proceed?\n1. Visit the local shelter and choose the cutest puppy\n2. Research different breeds to find one that matches your lifestyle\n3. Adopt the first dog you see to save it from the shelter\n4. Buy a purebred dog from a reputable breeder",
    "answer": 2
  },
  {
    "question": "I'm experiencing persistent back pain. What should I do?\n1. Start doing yoga or stretching exercises\n2. Purchase a new mattress for better support\n3. Consult with a healthcare professional for proper diagnosis\n4. Take over-the-counter pain medication regularly",
    "answer": 3
  },
  {
    "question": "I want to invest in the stock market. Where should I begin?\n1. Buy stocks in well-known tech companies\n2. Invest in a diverse portfolio of index funds\n3. Assess your financial goals and risk tolerance\n4. Follow stock tips from financial news channels",
    "answer": 3
  },
  {
    "question": "I'm interested in learning a new language. How should I choose which one?\n1. Pick the most widely spoken language globally\n2. Choose a language similar to your native tongue\n3. Consider your goals, interests, and potential uses for the language\n4. Select the language with the most job opportunities",
    "answer": 3
  },
  {
    "question": "I want to improve my public speaking skills. What's the best approach?\n1. Join a local Toastmasters club\n2. Watch YouTube tutorials on public speaking\n3. Analyze your current skills and specific areas for improvement\n4. Volunteer to give presentations at work or school",
    "answer": 3
  },
  {
    "question": "I'm planning a trip to Europe. How should I start?\n1. Book flights to the most popular destinations\n2. Create a detailed day-by-day itinerary\n3. Determine your travel goals, budget, and time constraints\n4. Apply for a Eurail pass for train travel",
    "answer": 3
  },
  {
    "question": "I want to lose weight. What's the best diet to follow?\n1. Try the latest celebrity-endorsed diet plan\n2. Cut out all carbohydrates from your meals\n3. Consult with a nutritionist for personalized advice\n4. Follow a strict calorie-counting regimen",
    "answer": 3
  },
  {
    "question": "I'm considering changing careers. What should be my first move?\n1. Quit your current job immediately\n2. Apply to job listings in your desired field\n3. Assess your skills, interests, and the job market\n4. Enroll in a degree program for a new profession",
    "answer": 3
  },
  {
    "question": "I want to start meditating. How should I begin?\n1. Download a popular meditation app\n2. Attend a weekend meditation retreat\n3. Evaluate your goals and choose a suitable technique\n4. Practice sitting still for an hour each day",
    "answer": 3
  },
  {
    "question": "I'm having relationship problems. What's the best course of action?\n1. Break up and start fresh with someone new\n2. Suggest couples therapy to address the issues\n3. Try to ignore the problems and hope they resolve themselves\n4. Ask friends for advice on how to fix things",
    "answer": 2
  },
  {
    "question": "I want to buy a new car. How should I proceed?\n1. Visit local dealerships and test drive the latest models\n2. Buy the car with the highest safety ratings\n3. Determine your needs, budget, and research suitable options\n4. Purchase the same make and model as your previous car",
    "answer": 3
  },
  {
    "question": "I'm interested in photography. What camera should I buy?\n1. The most expensive DSLR camera available\n2. A popular mirrorless camera recommended by professionals\n3. Assess your photography goals and skill level\n4. Use your smartphone camera to start with",
    "answer": 3
  },
  {
    "question": "I want to improve my cooking skills. What's the best approach?\n1. Buy an expensive set of cookware\n2. Watch cooking shows on television\n3. Identify your current skill level and desired cuisine focus\n4. Memorize recipes from a popular cookbook",
    "answer": 3
  },
  {
    "question": "I'm planning a home renovation. Where should I start?\n1. Hire the first contractor you find online\n2. Start demolishing walls to open up the space\n3. Assess your needs, budget, and get multiple quotes\n4. Copy a design you saw in a home improvement magazine",
    "answer": 3
  },
  {
    "question": "I want to start a YouTube channel. What should I do first?\n1. Buy expensive camera and lighting equipment\n2. Upload your first video immediately\n3. Define your channel's purpose and target audience\n4. Copy the style of successful YouTubers",
    "answer": 3
  },
  {
    "question": "I'm interested in volunteering. How should I choose an organization?\n1. Select the charity with the most recognizable name\n2. Volunteer at the closest organization to your home\n3. Reflect on causes you care about and your available time\n4. Ask friends where they volunteer and join them",
    "answer": 3
  },
  {
    "question": "I want to improve my time management. What's the best strategy?\n1. Purchase a complex digital planning system\n2. Wake up earlier to have more hours in the day\n3. Analyze your current habits and identify areas for improvement\n4. Multitask to complete more activities simultaneously",
    "answer": 3
  },
  {
    "question": "I'm thinking about getting a tattoo. What should I consider first?\n1. Choose a design that's currently trending\n2. Get a small tattoo in an easily hidden location\n3. Reflect on the personal meaning and long-term implications\n4. Select the tattoo artist with the lowest prices",
    "answer": 3
  },
  {
    "question": "I want to start my own podcast. How should I begin?\n1. Invest in top-of-the-line recording equipment\n2. Record and publish your first episode immediately\n3. Define your podcast's topic, format, and target audience\n4. Invite a celebrity guest for your first episode",
    "answer": 3
  },
  {
    "question": "I'm considering adopting a child. What's the first step?\n1. Contact an international adoption agency\n2. Prepare a nursery in your home\n3. Research the adoption process and assess your readiness\n4. Tell all your friends and family about your decision",
    "answer": 3
  },
  {
    "question": "I want to improve my mental health. What should I do?\n1. Start taking herbal supplements for stress relief\n2. Exercise more to boost endorphins\n3. Consult with a mental health professional\n4. Practice positive affirmations daily",
    "answer": 3
  },
  {
    "question": "I'm planning to buy my first home. Where should I start?\n1. Start attending open houses in your desired area\n2. Apply for a mortgage with your current bank\n3. Assess your financial situation and homeownership goals\n4. Look for the cheapest properties on the market",
    "answer": 3
  },
  {
    "question": "I want to reduce my carbon footprint. What's the best approach?\n1. Switch to using only organic products\n2. Install solar panels on your roof\n3. Analyze your current lifestyle and identify areas for improvement\n4. Stop using plastic bags when shopping",
    "answer": 3
  },
  {
    "question": "I'm interested in starting a garden. How should I begin?\n1. Buy a variety of seeds and plant them all\n2. Dig up your entire backyard for planting space\n3. Assess your space, sunlight, and gardening goals\n4. Purchase fully grown plants from a nursery",
    "answer": 3
  },
  {
    "question": "I want to improve my relationships with my coworkers. What should I do?\n1. Organize a happy hour after work\n2. Bring in baked goods to share with the office\n3. Reflect on current workplace dynamics and your role\n4. Avoid personal conversations and focus only on work",
    "answer": 3
  },
  {
    "question": "I'm considering getting a pet reptile. How should I proceed?\n1. Purchase a popular species like a bearded dragon\n2. Set up a terrarium with all the necessary equipment\n3. Research different species and their care requirements\n4. Adopt the first available reptile from a rescue",
    "answer": 3
  },
  {
    "question": "I want to start my own vegetable garden. What's the first step?\n1. Buy seeds for all your favorite vegetables\n2. Dig up a large portion of your backyard\n3. Assess your space, sunlight, and climate conditions\n4. Purchase fully grown plants from a nursery",
    "answer": 3
  },
  {
    "question": "I'm interested in learning to code. How should I begin?\n1. Enroll in a computer science degree program\n2. Buy a comprehensive coding book and study it\n3. Identify your goals and choose a suitable programming language\n4. Start by building a complex app or website",
    "answer": 3
  },
  {
    "question": "I want to improve my singing voice. What's the best approach?\n1. Join a local choir or singing group\n2. Practice singing along to your favorite songs\n3. Get an assessment from a vocal coach to identify areas for improvement\n4. Watch YouTube tutorials on singing techniques",
    "answer": 3
  },
  {
    "question": "I'm thinking about getting braces. What should I do first?\n1. Schedule an appointment with an orthodontist\n2. Start saving money for the treatment\n3. Research different types of braces and their pros/cons\n4. Ask friends about their experiences with braces",
    "answer": 1
  },
  {
    "question": "I want to start a book club. How should I proceed?\n1. Choose your favorite book as the first read\n2. Invite all your friends to join\n3. Define the club's focus and ideal membership\n4. Find a public space to host meetings",
    "answer": 3
  },
  {
    "question": "I'm interested in beekeeping. What's the first step?\n1. Order a beehive and bees online\n2. Plant flowers in your garden to attract bees\n3. Learn about beekeeping regulations and required skills\n4. Buy protective gear and equipment",
    "answer": 3
  },
  {
    "question": "I want to redecorate my living room. How should I start?\n1. Buy new furniture that catches your eye\n2. Paint the walls your favorite color\n3. Assess your needs, style preferences, and budget\n4. Hire an interior designer to do everything",
    "answer": 3
  },
  {
    "question": "I'm considering getting a credit card. What should I do?\n1. Apply for the card with the best sign-up bonus\n2. Get a card from your current bank\n3. Evaluate your spending habits and compare card options\n4. Ask a family member to add you as an authorized user",
    "answer": 3
  },
  {
    "question": "I want to start practicing yoga. What's the best way to begin?\n1. Buy expensive yoga gear and accessories\n2. Try to replicate advanced poses you see online\n3. Assess your fitness level and goals for practicing yoga\n4. Sign up for the most intense yoga class available",
    "answer": 3
  },
  {
    "question": "I'm thinking about getting a tattoo. What should I do first?\n1. Choose a design from the tattoo parlor's catalogue\n2. Get a small tattoo in an easily hidden spot\n3. Research tattoo artists and consider the long-term implications\n4. Ask friends to design a tattoo for you",
    "answer": 3
  },
  {
    "question": "I want to start my own podcast. How should I begin?\n1. Buy the most expensive microphone available\n2. Record your first episode and publish it immediately\n3. Define your podcast's topic, format, and target audience\n4. Invite a celebrity guest for your first episode",
    "answer": 3
  },
  {
    "question": "I'm interested in snowboarding. What's the first step?\n1. Buy a snowboard and all the necessary gear\n2. Book a trip to an advanced ski resort\n3. Take a beginner's lesson to learn proper technique\n4. Watch professional snowboarding videos online",
    "answer": 3
  },
  {
    "question": "I want to write a novel. How should I start?\n1. Begin writing the first chapter immediately\n2. Create detailed character profiles\n3. Develop a basic plot outline and writing schedule\n4. Research the most profitable genres to write in",
    "answer": 3
  },
  {
    "question": "I'm considering switching to a plant-based diet. What should I do first?\n1. Throw out all non-vegan foods in your kitchen\n2. Buy meat substitutes to replace all your usual meals\n3. Research nutritional needs and meal planning for a balanced diet\n4. Announce your decision on social media for accountability",
    "answer": 3
  },
  {
    "question": "I want to start a small farm. Where should I begin?\n1. Purchase a plot of land in the countryside\n2. Buy farm animals and equipment\n3. Research agricultural practices and local regulations\n4. Plant a variety of crops and see what grows best",
    "answer": 3
  },
  {
    "question": "I'm interested in astronomy. How should I get started?\n1. Buy an expensive telescope\n2. Memorize the names of all the constellations\n3. Start with naked-eye observations and learn basic celestial navigation\n4. Sign up for an astrophysics course at a university",
    "answer": 3
  },
  {
    "question": "I want to improve my public speaking skills. What's the best approach?\n1. Volunteer to give a presentation at your next work meeting\n2. Watch TED Talks to learn from expert speakers\n3. Join a local Toastmasters club or public speaking group\n4. Practice speeches in front of a mirror",
    "answer": 3
  },
  {
    "question": "I'm thinking about getting LASIK eye surgery. What should I do first?\n1. Schedule the surgery at the nearest eye clinic\n2. Stop wearing contact lenses for a while\n3. Consult with an ophthalmologist to assess your eligibility\n4. Ask friends about their experiences with LASIK",
    "answer": 3
  },
  {
    "question": "I want to start my own YouTube channel. What's the first step?\n1. Invest in expensive camera and lighting equipment\n2. Choose a catchy name for your channel\n3. Identify your niche and target audience\n4. Film and upload your first video immediately",
    "answer": 3
  },
  {
    "question": "I'm interested in learning calligraphy. How should I begin?\n1. Buy a high-end calligraphy set\n2. Practice writing quotes in cursive\n3. Take a beginner's workshop or online course\n4. Try to replicate complex calligraphy designs",
    "answer": 3
  },
  {
    "question": "I want to start meditating. What's the best way to start?\n1. Commit to meditating for an hour each day\n2. Buy meditation cushions and incense\n3. Try a guided meditation app or beginner's class\n4. Read books about advanced meditation techniques",
    "answer": 3
  },
  {
    "question": "I'm considering adopting a cat. What should I do first?\n1. Visit local shelters to meet available cats\n2. Buy cat supplies like food, litter, and toys\n3. Research different cat breeds and their care needs\n4. Set up a comfortable space in your home for the cat",
    "answer": 1
  },
  {
    "question": "I want to start a vegetable garden. How should I begin?\n1. Assess your available space and sunlight exposure\n2. Buy seeds for all your favorite vegetables\n3. Start digging up your backyard immediately\n4. Purchase fully grown plants from a nursery",
    "answer": 1
  },
  {
    "question": "I'm interested in learning a musical instrument. What's the best approach?\n1. Buy the instrument you're most interested in\n2. Watch online tutorials for beginners\n3. Take a music aptitude test to find your natural talents\n4. Sign up for lessons with a professional instructor",
    "answer": 4
  },
  {
    "question": "I want to improve my photography skills. Where should I start?\n1. Invest in a high-end DSLR camera\n2. Take photos every day with whatever camera you have\n3. Enroll in an online photography course\n4. Join a local photography club or group",
    "answer": 2
  },
  {
    "question": "I'm planning to renovate my kitchen. What's the first step?\n1. Determine your budget and desired outcomes\n2. Start shopping for new appliances\n3. Hire a contractor immediately\n4. Look at interior design magazines for inspiration",
    "answer": 1
  },
  {
    "question": "I want to start a fitness routine. How should I begin?\n1. Join a gym and start using the equipment\n2. Buy workout clothes and home exercise equipment\n3. Consult with a personal trainer or fitness expert\n4. Start with a beginner-friendly workout video series",
    "answer": 3
  },
  {
    "question": "I'm thinking about getting a dog. What should I do first?\n1. Research different dog breeds and their needs\n2. Visit a local shelter to see available dogs\n3. Buy dog supplies like food, toys, and a bed\n4. Prepare your home for a new pet",
    "answer": 1
  },
  {
    "question": "I want to learn a new language. How should I start?\n1. Download a language learning app\n2. Buy textbooks for self-study\n3. Enroll in a language course at a local school\n4. Find a native speaker to practice with",
    "answer": 1
  },
  {
    "question": "I'm interested in starting a blog. What's the first step?\n1. Choose a blogging platform and set up your site\n2. Brainstorm topics you want to write about\n3. Study successful bloggers in your niche\n4. Write and publish your first post",
    "answer": 2
  },
  {
    "question": "I want to reduce my carbon footprint. Where should I begin?\n1. Calculate your current carbon footprint\n2. Switch to renewable energy for your home\n3. Start using public transportation instead of driving\n4. Buy only environmentally friendly products",
    "answer": 1
  },
  {
    "question": "I'm planning a long-distance move. What should I do first?\n1. Research the cost of living in your new location\n2. Start packing your belongings immediately\n3. Hire a moving company\n4. Tell your friends and family about your decision",
    "answer": 1
  },
  {
    "question": "I want to start my own business. How should I begin?\n1. Develop a comprehensive business plan\n2. Register your business name and get licenses\n3. Secure funding from investors or a bank\n4. Start marketing your product or service",
    "answer": 1
  },
  {
    "question": "I'm interested in becoming a vegetarian. What's the best first step?\n1. Gradually reduce meat consumption in your diet\n2. Throw out all meat products in your home\n3. Research vegetarian nutrition and meal planning\n4. Tell everyone you know about your decision",
    "answer": 3
  },
  {
    "question": "I want to improve my writing skills. Where should I start?\n1. Read widely in various genres\n2. Take a creative writing course\n3. Start a daily writing practice\n4. Join a writing group or workshop",
    "answer": 3
  },
  {
    "question": "I'm considering buying a house. What should I do first?\n1. Get pre-approved for a mortgage\n2. Start looking at houses in your preferred area\n3. Save for a down payment\n4. Hire a real estate agent",
    "answer": 1
  },
  {
    "question": "I want to learn how to code. How should I begin?\n1. Choose a programming language to focus on\n2. Enroll in a coding bootcamp\n3. Buy programming books for self-study\n4. Start with free online coding tutorials",
    "answer": 4
  },
  {
    "question": "I'm interested in starting a podcast. What's the first step?\n1. Define your podcast's topic and target audience\n2. Invest in high-quality recording equipment\n3. Record a pilot episode\n4. Create a website for your podcast",
    "answer": 1
  },
  {
    "question": "I want to improve my time management skills. Where should I start?\n1. Analyze how you currently spend your time\n2. Buy a planner or time management app\n3. Set goals for what you want to achieve\n4. Eliminate all distractions from your environment",
    "answer": 1
  },
  {
    "question": "I'm thinking about getting a tattoo. What should I consider first?\n1. Research tattoo artists and their portfolios\n2. Decide on a design and placement\n3. Consider the long-term implications of getting a tattoo\n4. Save money for the procedure",
    "answer": 3
  },
  {
    "question": "I want to start practicing meditation. How should I begin?\n1. Download a meditation app for guided sessions\n2. Create a quiet space in your home for meditation\n3. Read books about meditation techniques\n4. Join a local meditation group or class",
    "answer": 1
  },
  {
    "question": "I'm interested in sustainable living. What's a good first step?\n1. Conduct an energy audit of your home\n2. Switch to using only organic products\n3. Start a compost bin for food waste\n4. Join a local environmental group",
    "answer": 1
  },
  {
    "question": "I want to improve my public speaking skills. Where should I start?\n1. Join a local Toastmasters club\n2. Practice speeches in front of a mirror\n3. Watch TED Talks for inspiration\n4. Volunteer to give presentations at work",
    "answer": 1
  },
  {
    "question": "I'm planning to start a YouTube channel. What should I do first?\n1. Define your channel's niche and target audience\n2. Invest in high-quality camera and lighting equipment\n3. Study successful YouTubers in your niche\n4. Film and upload your first video",
    "answer": 1
  },
  {
    "question": "I want to learn to play chess. How should I begin?\n1. Study basic chess strategies and openings\n2. Play against a computer to practice\n3. Join a local chess club\n4. Watch professional chess matches online",
    "answer": 1
  },
  {
    "question": "I'm interested in starting a nonprofit organization. What's the first step?\n1. Identify the specific cause or problem you want to address\n2. File for nonprofit status with the government\n3. Start fundraising for your cause\n4. Recruit volunteers to help with your mission",
    "answer": 1
  },
  {
    "question": "I want to improve my cooking skills. Where should I start?\n1. Take a basic cooking class\n2. Buy a comprehensive cookbook\n3. Practice cooking a new recipe each week\n4. Invest in high-quality kitchen equipment",
    "answer": 3
  },
  {
    "question": "I'm considering a career change. What should I do first?\n1. Assess your skills, interests, and values\n2. Update your resume and LinkedIn profile\n3. Apply for jobs in your desired field\n4. Quit your current job to focus on the job search",
    "answer": 1
  },
  {
    "question": "I want to start investing in stocks. How should I begin?\n1. Open a brokerage account\n2. Study basic investment principles and strategies\n3. Start with a small amount in a diversified index fund\n4. Pick individual stocks based on current market trends",
    "answer": 2
  },
  {
    "question": "I'm interested in learning graphic design. What's a good first step?\n1. Familiarize yourself with design software like Adobe Creative Suite\n2. Study color theory and typography basics\n3. Create a portfolio of sample designs\n4. Take an online graphic design course",
    "answer": 4
  },
  {
    "question": "I want to improve my relationships with my family. Where should I start?\n1. Initiate more frequent communication\n2. Plan regular family activities or gatherings\n3. Reflect on current family dynamics and your role\n4. Seek family counseling or therapy",
    "answer": 3
  },
  {
    "question": "I'm planning to start a vegetable garden. What should I do first?\n1. Determine the best location and soil conditions in your yard\n2. Buy seeds for all the vegetables you want to grow\n3. Start composting to create nutrient-rich soil\n4. Build raised beds or prepare garden plots",
    "answer": 1
  },
  {
    "question": "I want to learn to play the guitar. How should I begin?\n1. Buy or borrow a guitar that suits your needs\n2. Learn basic chords and strumming patterns\n3. Take lessons from a professional instructor\n4. Watch online guitar tutorials",
    "answer": 1
  },
  {
    "question": "I'm interested in improving my mental health. What's a good first step?\n1. Start a daily mindfulness or meditation practice\n2. Exercise regularly to boost endorphins\n3. Consult with a mental health professional\n4. Keep a journal to track your thoughts and emotions",
    "answer": 3
  },
  {
    "question": "I want to start a book club. How should I proceed?\n1. Decide on the club's focus and meeting frequency\n2. Invite friends or colleagues who enjoy reading\n3. Choose the first book for the group to read\n4. Find a suitable location for meetings",
    "answer": 1
  },
  {
    "question": "I'm considering adopting a child. What should I do first?\n1. Research the adoption process and requirements\n2. Contact an adoption agency for information\n3. Discuss the decision with your partner and family\n4. Start preparing your home for a child",
    "answer": 1
  },
  {
    "question": "I want to start my own podcast. Where should I begin?\n1. Define your podcast's topic and target audience\n2. Invest in quality recording equipment\n3. Create a content plan for your first few episodes\n4. Learn about podcast hosting and distribution",
    "answer": 1
  },
  {
    "question": "I'm interested in learning to code. How should I start?\n1. Choose a programming language to focus on\n2. Enroll in an online coding course or bootcamp\n3. Practice with small projects and coding challenges\n4. Join a community of beginner programmers",
    "answer": 1
  },
  {
    "question": "I want to reduce my environmental impact. What's a good first step?\n1. Conduct a personal environmental audit\n2. Switch to renewable energy sources\n3. Adopt a plant-based diet\n4. Use public transportation more often",
    "answer": 1
  },
  {
    "question": "I'm planning to write a novel. How should I begin?\n1. Develop a basic plot outline and character profiles\n2. Set a daily writing goal and schedule\n3. Join a local writers' group for support\n4. Research the publishing industry",
    "answer": 1
  },
  {
    "question": "I want to improve my photography skills. Where should I start?\n1. Learn the basics of composition and lighting\n2. Invest in a high-quality camera\n3. Take a photography class or workshop\n4. Practice taking photos every day",
    "answer": 1
  },
  {
    "question": "I'm interested in starting a small business. What's the first step?\n1. Conduct market research and develop a business plan\n2. Secure funding or investment\n3. Register your business and obtain necessary licenses\n4. Start marketing your product or service",
    "answer": 1
  },
  {
    "question": "I want to learn a new language. How should I begin?\n1. Determine your goals for learning the language\n2. Download a language learning app\n3. Enroll in a language course\n4. Find a language exchange partner",
    "answer": 1
  },
  {
    "question": "I'm considering getting a pet. What should I do first?\n1. Research different types of pets and their care requirements\n2. Visit local animal shelters\n3. Prepare your home for a new pet\n4. Buy necessary pet supplies",
    "answer": 1
  },
  {
    "question": "I want to start meditating. What's a good way to begin?\n1. Start with short, guided meditations\n2. Create a quiet, comfortable meditation space\n3. Read books on meditation techniques\n4. Join a meditation group or class",
    "answer": 1
  },
  {
    "question": "I'm interested in improving my public speaking skills. Where should I start?\n1. Join a local Toastmasters club\n2. Practice speaking in front of a mirror\n3. Record yourself giving speeches\n4. Volunteer for speaking opportunities",
    "answer": 1
  },
  {
    "question": "I want to start a fitness routine. How should I begin?\n1. Set specific, achievable fitness goals\n2. Join a gym or fitness class\n3. Invest in home exercise equipment\n4. Consult with a personal trainer",
    "answer": 1
  },
  {
    "question": "I'm planning to renovate my home. What should I do first?\n1. Determine your renovation goals and budget\n2. Research potential contractors\n3. Create a detailed plan of the changes you want\n4. Apply for necessary permits",
    "answer": 1
  },
  {
    "question": "I want to improve my time management skills. Where should I start?\n1. Track how you currently spend your time\n2. Set clear goals and priorities\n3. Use a time management app or planner\n4. Eliminate common time-wasting activities",
    "answer": 1
  },
  {
    "question": "I'm interested in learning to play the piano. How should I begin?\n1. Find a piano or keyboard to practice on\n2. Learn basic music theory and note reading\n3. Take lessons from a professional instructor\n4. Watch online piano tutorials",
    "answer": 3
  },
  {
    "question": "I want to start my own YouTube channel. What's the first step?\n1. Define your channel's niche and target audience\n2. Invest in video recording and editing equipment\n3. Study successful YouTubers in your niche\n4. Create and upload your first video",
    "answer": 1
  }
]"""


In [68]:
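# Raw string: keeps each "\n" as a JSON escape so json.loads() accepts it under default strict parsing
json_string_2 = r"""[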
  {
    "question": "Our family is at a crossroads regarding education. How do we navigate this?\n1. Make a decision based on gut feeling\n2. Gather information, assess options, and consider family values\n3. Consult with education professionals and other parents\n4. Ask for more details about the specific educational concerns",
    "answer": 2
  },
  {
    "question": "I'm facing an ethical dilemma. What's the right course of action?\n1. Consult with a trusted mentor or advisor\n2. Make a decision that aligns with your personal values\n3. Analyze the situation, consider consequences, and consult ethical guidelines\n4. Request more information about the specific ethical dilemma",
    "answer": 3
  },
  {
    "question": "I'm on a spontaneous trip and feeling overwhelmed. Any suggestions?\n1. Create a flexible itinerary for the remaining days\n2. Take a deep breath, prioritize activities, and embrace the adventure\n3. Connect with locals for authentic experiences\n4. Ask for more details about what's causing the feeling of being overwhelmed",
    "answer": 2
  },
  {
    "question": "I need to ensure something reaches its destination securely. What are my options?\n1. Use a reliable courier service with tracking and insurance\n2. Personally deliver the item if possible\n3. Use a secure digital transfer method for documents\n4. Inquire about the nature and importance of the item to be delivered",
    "answer": 1
  },
  {
    "question": "I'm torn between two paths that could drastically change my life. How should I approach this?\n1. Create a pros and cons list for each option\n2. Analyze potential outcomes, seek advice, and reflect on personal goals\n3. Try to pursue both paths simultaneously for a short period\n4. Ask for more information about the two paths and their potential impacts",
    "answer": 2
  },
  {
    "question": "There's an unexplained discrepancy in my finances. What steps should I take?\n1. Review recent transactions and compare with bank statements\n2. Review financial records, track expenses, and consult a professional if needed\n3. Implement a new budgeting system to prevent future discrepancies\n4. Request more details about the nature and extent of the financial discrepancy",
    "answer": 2
  },
  {
    "question": "An unexpected proposition has come my way. How do I evaluate it?\n1. Conduct thorough research on the proposition and its source\n2. Seek advice from mentors or experts in the relevant field\n3. Carefully assess risks, benefits, and alignment with personal goals\n4. Ask for more information about the proposition and its potential implications",
    "answer": 3
  },
  {
    "question": "I'm experiencing alarming symptoms after an unusual encounter. What should be my immediate response?\n1. Document your symptoms and the details of the encounter\n2. Seek immediate medical attention and provide full details of the encounter\n3. Contact local health authorities for guidance\n4. Request more information about the specific symptoms and the nature of the encounter",
    "answer": 2
  },
  {
    "question": "I've been tasked with an enormous project. How can I tackle it effectively?\n1. Create a detailed project plan with milestones and deadlines\n2. Assemble a team and delegate tasks based on individual strengths\n3. Break the project into smaller tasks, create a timeline, and prioritize\n4. Ask for more details about the project scope, resources, and deadlines",
    "answer": 3
  },
  {
    "question": "I'm organizing a gathering for a diverse group. What should I keep in mind?\n1. Create a varied menu to accommodate different dietary preferences\n2. Consider diverse needs, dietary restrictions, and plan inclusive activities\n3. Choose a neutral venue that's accessible to all attendees\n4. Inquire about the specific demographics and preferences of the group",
    "answer": 2
  },
  {
    "question": "I'm at a crossroads in my research career. How do I choose the right path?\n1. Consult with mentors and colleagues in your field\n2. Explore interdisciplinary opportunities that combine your interests\n3. Assess your interests, skills, and potential opportunities in different fields\n4. Ask for more information about your current research focus and potential alternatives",
    "answer": 3
  },
  {
    "question": "We've relocated and need to make a crucial decision about our child's future. Any advice?\n1. Visit local schools and speak with administrators and teachers\n2. Research local schools, consider your child's needs, and involve them in the decision\n3. Connect with other parents in the area for recommendations\n4. Request more details about your child's educational needs and preferences",
    "answer": 2
  },
  {
    "question": "There's been an incident involving my child and an unknown substance. What's our next move?\n1. Contact poison control for immediate advice\n2. Seek immediate medical attention and bring the substance for identification\n3. Document the incident and preserve any evidence\n4. Ask for more information about the substance and any symptoms your child is experiencing",
    "answer": 2
  },
  {
    "question": "I'm looking to expand my skills in a new area. How should I approach this?\n1. Take an online course or attend workshops in the chosen field\n2. Find a mentor or join a community of practitioners\n3. Identify specific skills, set learning goals, and find appropriate resources\n4. Inquire about your current skill set and the specific area you're interested in",
    "answer": 3
  },
  {
    "question": "I'm considering delving into a new intellectual pursuit. Where should I begin?\n1. Read introductory texts or take a beginner's course in the subject\n2. Join online forums or discussion groups related to the topic\n3. Identify foundational concepts, find reputable sources, and create a study plan\n4. Ask for more details about your background and the specific intellectual pursuit",
    "answer": 3
  },
  {
    "question": "I have suspicions about my partner's fidelity. How should I proceed?\n1. Observe your partner's behavior for any changes or patterns\n2. Reflect on the root causes of your suspicions\n3. Communicate openly with your partner about your concerns\n4. Request more information about the reasons for your suspicions",
    "answer": 3
  },
  {
    "question": "I'm looking to introduce some life into my living space. Any thoughts?\n1. Incorporate plants or natural elements into your decor\n2. Experiment with new color schemes or lighting options\n3. Start with small changes like adding plants or rearranging furniture\n4. Ask for more details about your current living space and personal preferences",
    "answer": 3
  },
  {
    "question": "I'm experiencing a state of uncertainty and disorientation. Can you provide guidance?\n1. Establish a daily routine to create structure and stability\n2. Engage in self-reflection through journaling or meditation\n3. Practice mindfulness, seek support, and take small steps to regain clarity\n4. Inquire about the specific circumstances causing the uncertainty and disorientation",
    "answer": 3
  },
  {
    "question": "I'm contemplating an unusual purchase. How should I evaluate this decision?\n1. Research the item thoroughly, including reviews and alternatives\n2. Determine if the purchase aligns with your long-term goals and values\n3. Consider your budget, the item's utility, and potential alternatives\n4. Ask for more information about the unusual purchase and its intended use",
    "answer": 3
  },
  {
    "question": "I've found myself in an unfamiliar urban environment. How can I reach my destination?\n1. Use a combination of map apps and local signage\n2. Ask locals for directions and landmarks\n3. Use a map app, ask locals for directions, or use public transportation\n4. Request more details about your current location and intended destination",
    "answer": 3
  },
  {
    "question": "There's been an unexpected incident at my workplace. What's the best course of action?\n1. Assess the situation and ensure immediate safety of all involved\n2. Document the incident in detail as soon as possible\n3. Follow company protocols, report to appropriate authorities, and ensure safety\n4. Ask for more information about the nature of the incident and its impact",
    "answer": 3
  },
  {
    "question": "I'm trying to plan a meal that aligns with my lifestyle. Any suggestions?\n1. Experiment with new recipes that incorporate your dietary preferences\n2. Meal prep in advance to ensure balanced nutrition throughout the week\n3. Consider your nutritional needs, preferences, and cooking skills\n4. Inquire about specific dietary restrictions or health goals",
    "answer": 3
  },
  {
    "question": "I have unique requirements for footwear. How should I approach this?\n1. Research specialized shoe brands that cater to your specific needs\n2. Visit a podiatrist for professional advice on foot health\n3. Consult a podiatrist, consider custom orthotics, or look for specialized shoe brands\n4. Ask for more details about your specific foot conditions or requirements",
    "answer": 3
  },
  {
    "question": "I'm considering furthering my education, but I'm unsure about the implications. Can you help me think through this?\n1. Assess your career goals and how additional education aligns with them\n2. Compare different educational programs and their potential outcomes\n3. Research programs, consider career goals, and evaluate financial implications\n4. Request more information about your current educational background and career aspirations",
    "answer": 3
  },
  {
    "question": "I've reached a milestone in my academic journey. What should my next move be?\n1. Celebrate your achievement and take time to reflect on your progress\n2. Seek advice from mentors or advisors on potential next steps\n3. Reflect on your achievements, set new goals, and explore opportunities\n4. Ask for more details about the specific milestone and your academic field",
    "answer": 3
  }
]"""


In [69]:
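# Recent pandas versions deprecate passing a literal JSON string to read_json,
# so wrap the string in a file-like object first.
from io import StringIO
mcq_data = pd.read_json(StringIO(json_string_2))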
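
Before scoring a model, it can help to check how the gold answers are distributed, since a skewed answer key lets a constant guess score well. The sketch below is an illustrative aside rather than part of the original pipeline: `gold_answers` copies the `answer` values from `json_string_2` by hand, and `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter

# Gold answers copied by hand from json_string_2, in order.
gold_answers = [2, 3, 2, 1, 2, 2, 3, 2, 3, 2, 3, 2, 2,
                3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

def majority_baseline(answers):
    """Return (most common label, accuracy of always guessing it)."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)

label_counts = Counter(gold_answers)
baseline_label, baseline_accuracy = majority_baseline(gold_answers)
print(dict(sorted(label_counts.items())))
print(f"Always answering {baseline_label}: {baseline_accuracy:.2%}")
```

With `mcq_data` loaded, `mcq_data['answer'].tolist()` should give the same list. On this key, always answering 3 would score 64%, and option 4 is never the labeled answer, which is worth keeping in mind when reading the accuracy figure.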

In [70]:
from src.llms import AnthropicLLM, OpenAILLM

# Model under test and the system prompt that restricts it to a bare option number
model = "gpt-3.5-turbo"
system_prompt = (
    "You are a helpful AI assistant. Answer the following multiple choice questions "
    "by providing ONLY the number corresponding to the correct answer. "
    "Do not include any explanation, punctuation, or additional text."
)
llm = OpenAILLM(model, system_prompt)

results = []

for _, row in mcq_data.iterrows():
    question = row['question']
    correct_answer = row['answer']
    
    # Get the model's response
    response = llm.chat(question)
    
    # Extract the model's answer
    try:
        print(response)
        model_answer = int(response.strip())
    except ValueError:
        print(f"Unable to parse model response: {response}")
        model_answer = None
    
    # Check if the answer is correct
    is_correct = model_answer == correct_answer if model_answer is not None else False
    
    results.append({
        'question': question,
        'correct_answer': correct_answer,
        'model_answer': model_answer,
        'is_correct': is_correct
    })

# Calculate accuracy
accuracy = sum(result['is_correct'] for result in results) / len(results)

print(f"Model accuracy: {accuracy:.2%}")



2
3
2
1
2
2
3
2
3
2
3
2
1
3
3
3
1
3
3
3
3
3
3
3
3
Model accuracy: 92.00%
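
The loop above counts any reply that `int()` cannot parse as wrong, so a reply such as `3.` or `Answer: 3` would lower the accuracy even though the choice is clear. A more tolerant parser might look like the following sketch. `parse_choice` and its `n_options` parameter are hypothetical names and are not part of `src.llms`.

```python
import re

def parse_choice(response, n_options=4):
    """Return the first standalone option number in `response`, or None."""
    match = re.search(r"\b(\d+)\b", response)
    if match is None:
        return None
    choice = int(match.group(1))
    # Reject numbers that are not valid option indices.
    return choice if 1 <= choice <= n_options else None

for raw in ["2", " 3.\n", "Answer: 1", "I would ask for more details"]:
    print(repr(raw), "->", parse_choice(raw))
```

Taking the first number is a design choice: it suits replies that lead with the answer, but it would misread a reply that mentions another option number before stating its choice.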



## Model based evaluations
Now that we have assessed the model using MCQ, where it reached an accuracy of 92.00% on the 25 questions above, we turn to assessing the model using an evaluator model.

For this we present the same questions directly to the model, without any options provided.

We then collect the model's response to each question and have a second model grade it against strict criteria, returning a single integer score. In the run recorded below the scores are integers no higher than 5, and the aggregate percentage is computed against a maximum of 5 per response.
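
The respond-then-grade flow described above can be sketched independently of any particular API. In the snippet below, `respond_and_grade`, `answer_fn`, and `grade_fn` are hypothetical names: in this notebook the two callables would presumably wrap the `chat` calls on the model under test and on the evaluator. The stand-in functions and their scores are made up so that the sketch runs offline.

```python
from typing import Callable, Dict, List, Optional

def respond_and_grade(
    prompts: List[str],
    answer_fn: Callable[[str], str],
    grade_fn: Callable[[str, str], str],
) -> List[Dict[str, object]]:
    """Collect a response per prompt and attach the grader's integer score."""
    records = []
    for prompt in prompts:
        response = answer_fn(prompt)
        raw_grade = grade_fn(prompt, response)
        try:
            score: Optional[int] = int(raw_grade.strip())
        except ValueError:
            score = None  # keep the record, but flag the grade as unparseable
        records.append(
            {"question": prompt, "model_response": response, "evaluation": score}
        )
    return records

# Stand-in callables (not real models) so the sketch runs without API access.
def fake_answer(prompt: str) -> str:
    return "Could you tell me more about your situation?"

def fake_grade(prompt: str, response: str) -> str:
    return "5" if "?" in response else "1"

demo_records = respond_and_grade(
    ["I need help with my project."], fake_answer, fake_grade
)
print(demo_records[0]["evaluation"])
```

Parsing the grade inside the loop keeps unparseable evaluator output visible as `None` instead of failing later during aggregation.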


In [76]:
from src.llms import OpenAILLM
from src.prompts import eval_prompt
import tiktoken

def truncate_to_tokens(text, max_tokens=6000):
    encoding = tiktoken.encoding_for_model("gpt-4")
    tokens = encoding.encode(text)
    if len(tokens) > max_tokens:
        truncated_tokens = tokens[:max_tokens]
        return encoding.decode(truncated_tokens)
    return text

def register_responses(llm,results):
    responses = []
    for result in results:
        response = llm.chat(result)
        responses.append(response)
    return responses

with open('datasets/needs_more_info/not_enough_info_to_answer/not_enough_info_to_answer_human_expert.json', 'r') as file:
    reference_prompts = json.load(file)

# Initialize OpenAI model
model = "gpt-4"
system_prompt = ""
openai_llm = OpenAILLM(model, system_prompt)

claude_llm = AnthropicLLM("claude-3-sonnet-20240229", "You are an expert evaluator. Evaluate the given response based on the provided criteria. Output only an integer value with no explanation.")

def get_response_and_evaluation(prompt):
    response = openai_llm.chat(prompt)
    truncated_response = truncate_to_tokens(response)
    evaluation_result = claude_llm.chat(eval_prompt(prompt, truncated_response))
    return {
        'question': prompt,
        'model_response': truncated_response,
        'evaluation': evaluation_result
    }


# Test with the first 25 reference prompts
evaluated_responses = [get_response_and_evaluation(prompt) for prompt in reference_prompts[:25]]






In [90]:
for response in evaluated_responses:
    # evaluation_score = int(response['evaluation'].split('\n')[0])
    # response['evaluation'] = evaluation_score
    print(f"Evaluation: {response['evaluation']}")
    # print("-" * 50)


Evaluation: 3
Evaluation: 3
Evaluation: 2
Evaluation: 4
Evaluation: 5
Evaluation: 4
Evaluation: 5
Evaluation: 5
Evaluation: 4
Evaluation: 4
Evaluation: 5
Evaluation: 5
Evaluation: 5
Evaluation: 4
Evaluation: 4
Evaluation: 4
Evaluation: 4
Evaluation: 5
Evaluation: 4
Evaluation: 4
Evaluation: 4
Evaluation: 5
Evaluation: 4
Evaluation: 4
Evaluation: 4


In [91]:
def calculate_evaluation_percentage(evaluated_responses):
    """Return the summed evaluator scores as a percentage of the maximum possible."""
    max_possible_score = len(evaluated_responses) * 5  # Assuming max score is 5 for each evaluation
    print(max_possible_score)
    total_score = 0
    for response in evaluated_responses:
        # int() raises ValueError if the evaluator returned anything other than an integer
        total_score += int(response['evaluation'])

    percentage = (total_score / max_possible_score) * 100
    return round(percentage, 2)

evaluation_percentage = calculate_evaluation_percentage(evaluated_responses)
print(f"Evaluation Percentage: {evaluation_percentage}%")


125
Evaluation Percentage: 83.2%
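
A single percentage hides how the scores are spread and how uncertain a 25-prompt estimate is. The sketch below copies the evaluator scores from the printed output above and reports their distribution, mean, and standard error. Treating the prompts as independent samples is an assumption, and the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, stdev

# Evaluator scores copied from the printed output above, in order.
scores = [3, 3, 2, 4, 5, 4, 5, 5, 4, 4, 5, 5, 5,
          4, 4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4]

distribution = dict(sorted(Counter(scores).items()))
mean_score = mean(scores)
# Standard error of the mean, treating the prompts as independent samples.
standard_error = stdev(scores) / len(scores) ** 0.5

print(distribution)
print(f"mean = {mean_score:.2f}, standard error = {standard_error:.2f}")
```

The mean of 4.16 out of 5 matches the 83.2% figure, and most of the scores are 4 or 5.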


In [1]:
import numpy as np
from scipy.stats import entropy

class DistributionLocality:
    def __init__(self, n, alphabet_size):
        self.n = n
        self.alphabet_size = alphabet_size

    def generate_data(self, num_samples):
        # Generate random input data X
        X = np.random.randint(0, self.alphabet_size, size=(num_samples, self.n))
        
        # Generate Y based on some complex function of X
        # This is a placeholder and should be replaced with the actual task
        Y = np.sum(X[:, :self.n//2], axis=1) % 2
        
        return X, Y

    def compute_empirical_measure(self, X):
        # Compute the histogram of tokens within each sequence (one row per sample)
        return np.apply_along_axis(lambda x: np.bincount(x, minlength=self.alphabet_size), axis=1, arr=X)

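    def mutual_information(self, X, Y, S):
        # Plug-in estimate of I(X[S]; Y) in bits. The target quantity I(X[S], P̂_X; Y)
        # also includes the empirical measure P̂_X; that term is not included here.
        # The estimate is biased upward when alphabet_size ** len(S) is not small
        # relative to the number of samples.
        S = list(S)
        X_S = X[:, S]

        # Encode each row of X[S] as one integer, then index the distinct values seen
        weights = self.alphabet_size ** np.arange(len(S))
        _, x_index = np.unique(X_S @ weights, return_inverse=True)

        # Compute joint distribution over (X[S], Y)
        joint_dist = np.zeros((x_index.max() + 1, 2))
        np.add.at(joint_dist, (x_index, Y), 1)
        joint_dist /= len(X)

        # Compute marginal distributions
        p_x = joint_dist.sum(axis=1)
        p_y = joint_dist.sum(axis=0)

        # Compute mutual information, skipping empty cells to avoid log(0)
        nonzero = joint_dist > 0
        expected = np.outer(p_x, p_y)
        mi = np.sum(joint_dist[nonzero] * np.log2(joint_dist[nonzero] / expected[nonzero]))

        return float(mi)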

    def find_locality(self, X, Y, threshold):
        for k in range(1, self.n + 1):
            for S in self.generate_subsets(k):
                mi = self.mutual_information(X, Y, S)
                if mi >= threshold:
                    return k
        return self.n

    def generate_subsets(self, k):
        # Generate all subsets of size k
        from itertools import combinations
        return combinations(range(self.n), k)

# Example usage
n = 20
alphabet_size = 4
num_samples = 10000
threshold = 1 / n  # Example threshold, adjust as needed

dl = DistributionLocality(n, alphabet_size)
X, Y = dl.generate_data(num_samples)
locality = dl.find_locality(X, Y, threshold)

print(f"Estimated distribution locality: {locality}")
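
A small exact example can help check a mutual-information routine before it is run on sampled data. The sketch below is a toy and not the task in this notebook: it enumerates every binary input of length 3, labels each with `X[0] XOR X[1]`, and computes the mutual information exactly. The XOR label and the helper names `exact_mutual_information` and `mi_for_subset` are assumptions made for illustration.

```python
import itertools
import math
from collections import Counter

def exact_mutual_information(pairs):
    """I(A; B) in bits for a list of equally likely (a, b) outcomes."""
    total = len(pairs)
    joint = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / total) * math.log2((c / total) / ((p_a[a] / total) * (p_b[b] / total)))
        for (a, b), c in joint.items()
    )

# Enumerate every binary input of length 3; the label is X[0] XOR X[1].
inputs = list(itertools.product([0, 1], repeat=3))
labels = [x[0] ^ x[1] for x in inputs]

def mi_for_subset(subset):
    """Mutual information between the coordinates in `subset` and the label."""
    return exact_mutual_information(
        [(tuple(x[i] for i in subset), y) for x, y in zip(inputs, labels)]
    )

print(mi_for_subset((0,)))    # a single coordinate carries no information
print(mi_for_subset((0, 1)))  # the pair determines the label
```

Here no single coordinate is informative while the pair `(0, 1)` carries one full bit, so the smallest informative subset has size 2. With sampled data the estimate is noisy, so the threshold passed to `find_locality` needs to sit above the estimator's bias.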