ct-gpt4-openai-vs-azure-openai

Comparing GPT-4 API results between OpenAI and Azure OpenAI Services

NOTE: Work and documentation in progress

About

We've been seeing inconsistent and less-than-desirable results from Azure OpenAI Services when working with our own data.

We wanted to determine whether the poor results were exclusive to working with our own data or whether something is fundamentally different when working with GPT-4 models via Azure OpenAI vs. OpenAI directly.

This repo is a place to test and compare the results of the two services using different system prompts and some general questions.

Running Locally

Prerequisites

  • Node.js >= 18.x (with npm >= v8.x)
  • Azure Subscription
    • Azure OpenAI access enabled
    • GPT-4 access enabled
    • Azure OpenAI Services deployed along with a GPT-4 model
      • API key for deployed service
  • OpenAI API key

Getting Started

  1. Clone repo

    • git clone https://github.com/CU-CommunityApps/ct-gpt4-openai-vs-azure-openai.git
  2. Enter directory

    • cd ct-gpt4-openai-vs-azure-openai
  3. Install dependencies

    • npm install
  4. Copy .env.example to .env and then replace necessary values

    • cp .env.example .env
    • see comments in .env.example for more info
  5. Run the demos:

    • npm run demo
  6. Example results:

      Run ID: 40724c27-8878-424e-b313-636819da7230
      Prompt
      [
        {
          role: "system",
          content: "You are a helpful AI assistant."
        }, {
          role: "user",
          content: "Why is the sky blue?"
        }
      ]
      Results
      OpenAI
      {
        role: "assistant",
        content: "The sky appears blue because of a process called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it is made to change direction and spread out or scatter by tiny molecules in the air. Blue light waves are shorter and scatter more than other colors because they travel in smaller, tighter waves. This scattered blue light is what we see when we look up at the sky."
      }
      5.834 seconds
      Azure OpenAI
      {
        role: "assistant",
        content: "The sky appears blue because of a process called Rayleigh scattering. When the sun's light enters our atmosphere, it is made up of different colors that combined appear white. These colors are all scattered in various directions by the gases and particles in Earth's atmosphere. Blue light being shorter in wavelength is scattered more than other colors because it travels in smaller, shorter waves. This is why we see a blue sky most of the time.\n"
      }
      4.175 seconds
    
    
      Prompt
      [
        {
          role: "system",
          content: "You are a helpful AI assistant."
        }, {
          role: "user",
          content: "Why are the dinosaurs extinct?"
        }
      ]
      Results
      OpenAI
      {
        role: "assistant",
        content: "The dinosaurs went extinct about 65 million years ago, at the end of the Cretaceous period, a fact often attributed to a cataclysmic asteroid impact near what is now Chicxulub in Mexico. The impact would have caused tsunamis, earthquakes, and also released substantial amounts of dust into the atmosphere, blocking sunlight and causing drastic global cooling. This would disrupt plant life and ultimately lead to a massive disruption in the food chain.\n\nIn addition to this theory, there are others that suggest intense volcanic activity could have significantly altered the global climate around this same time—or it may have been a combination of both. It's not definitively clear-cut but the most widely supported theory is the asteroid impact. This event ended up wiping out approximately 75% of all species on Earth. It's important to note however that birds are technically considered dinosaurs by scientists because they evolved from theropod dinosaurs."
      }
      13.009 seconds
      Azure OpenAI
      {
        role: "assistant",
        content: "The most widely accepted theory about the dinosaurs' extinction is that a giant asteroid or comet impacted Earth around 65 million years ago. This event, which occurred near present-day Chicxulub, Mexico, likely caused massive fires, tsunamis, intense heat from the explosion followed by a \"nuclear winter\" effect. The heat and blast would have killed many creatures instantly, and the dust and particles thrown into the atmosphere would have blocked sunlight, resulting in a collapse of plant life. This would in turn affect herbivores and then predators due to lack of food.\n\nHowever, some scientists also argue that massive volcanic activity in what is now India may have contributed to the extinction event through releasing large amounts of dust and sulphur into the atmosphere with similar effects.\n\nIt's believed the combination of these dramatic changes to their environment made survival for many dinosaur species impossible. It’s important to note though that not all life died out during this time - mammals, birds (which are considered modern descendants of dinosaurs), amphibians and others managed to survive and evolve."
      }
      10.225 seconds
    
    
      Prompt
      [
        {
          role: "system",
          content: "You are an AI programming assistant. Follow the user's requirements carefully and to the letter. First, think step-by-step and describe your plan for what to build in pseudocode, written out in great detail. Then, output the code in a single code block. Minimize any other prose."
        },
        {
          role: "user",
          content: "Solve the FizzBuzz problem using modern style JavaScript."
        }
      ]
      Results
      OpenAI
      {
        role: "assistant",
        content: "Pseudocode:\n\n1. Loop from 1 to 100 (both inclusive)\n2. If the loop index is divisible evenly by 3 and 5, print 'FizzBuzz'\n3. Else if the loop index is divisible evenly by only 3, print 'Fizz'\n4. Else if the loop index is divisible evenly by only 5, print 'Buzz'\n5. Else, just print the loop index\n6. End Loop\n\nCode:\n\n```javascript\nfor(let i = 1; i <= 100; i++) {\n    if(i % 15 === 0) {\n        console.log('FizzBuzz');\n    } else if(i % 3 === 0) {\n        console.log('Fizz');\n    } else if(i % 5 === 0) {\n        console.log('Buzz');\n    } else {\n        console.log(i);\n    }\n}\n```\n"
      }
      13.110 seconds
      Azure OpenAI
      {
        role: "assistant",
        content: "Step-by-step plan in pseudocode:\n1. Declare a for loop from 1 to 100 (both inclusive).\n2. Inside the loop, check three conditions using control flow for each number.\n3. - If the number is divisible by both 3 and 5, print \"FizzBuzz\".\n4. - If the number is only divisible by 3, print \"Fizz\".\n5. - If the number is only divisible by 5, print \"Buzz\".\n6. - If it's not divisible by either of these, then just print the number itself.\n\nJavaScript code:\n\n```javascript\nfor(let i = 1; i <= 100; i++) {\n    let output = '';\n    if(i % 3 === 0) output += 'Fizz';\n    if(i % 5 === 0) output += 'Buzz';\n    console.log(output || i);\n}\n```\nIn this code block, looping takes place from numbers one to one hundred inclusive (Step-1). Within each iteration of this loop, first there's an empty string assigned to `output`. Then it's checked whether `i` is divisible by three—if it is `Fizz` gets appended to `output` (Step-2). The same goes for five and appending `Buzz`. At last if neither of those conditions matched i.e., if `output` remains empty string—then that implies the current number isn't divisible by either three or five (Step-4), which in turn means you should simply log out the current number (`i`) instead of any string message, which is what happens where it says `console.log(output || i);`. In case a non-empty message exists in `output`, that'd be logged out because that would mean some divisibility condition has been met as per our FizzBuzz ruleset above defined at steps-2 through -4 respectively.\n"
      }
      17.377 seconds
    
    
      Prompt
      [
        {
          role: "system",
          content: "You are a helpful AI research assistant. Answer the user's queries. You should be brutally honest rather than polite. You should take as much time as you need to think and then provide comprehensive responses instead of quick answers. Please use and have opinions, and make it clear that they are opinions. As much as possible and practical, please provide references. Answer in markdown format."
        },
        {
          role: "user",
          content: "Does the diversity of a group of researchers affect their performance?"
        }
      ]
      Results
      OpenAI
      {
        role: "assistant",
        content: "Yes, the diversity of a group of researchers can affect their performance in various ways. This is a widely researched topic and there are many studies indicating both positive and negative aspects of workforce diversity.\n\n### Positive Impact:\n\n1. **Innovation and Creativity**: Diversity in a group could mean differences in race, gender, nationality, academic discipline or professional background, age, etc. Such differences may bring new perspectives to the table that stimulate discussion and spur creativity. This point is well-documented in multiple leadership studies ([Homan et al., 2007](https://www.sciencedirect.com/science/article/pii/S1048984307001166), [Phillips et al., 2008](https://journals.sagepub.com/doi/10.1177/0149206308321544)).\n\n2. **Better problem-solving capabilities**: Groups with diverse members can potentially have better problem-solving capabilities because they see issues from different angles due to their different backgrounds ([Page, 2007](http://www.leppanencontemporary.com/pdf/The_Difference.pdf)).\n\n3. **Improved decision making**: Diverse teams are likely to consider more alternative solutions compared to homogeneous ones hence contributing towards enhanced decision making ([Cox & Blake, 1991](https://journals.aom.org/doi/abs/10.5465/amr.1991.4278938)).\n\n### Negative Impact:\n \n1. **Communication barriers and conflict**: Differences in culture or language can lead to misunderstandings or conflicts within diverse groups if not managed properly ([Stahl et al., 2010](https://onlinelibrary.wiley.com/doi/full/10.1002/job.674) , [Jehn et al., 1999](https://journals.sagepub.com/doi/abs/10.1177/014920639902500303)).\n\n2. **Social loafing**: This is a phenomenon in which some group members reduce their effort, relying on the work from other members of the group. 
Evidence suggests that social loafing might be higher in diverse groups ([Latané et al., 1979](https://journals.sagepub.com/doi/10.1177/014616727900500407)).\n\nIn my opinion, the potential advantages of diversity—innovation, problem-solving capabilities and decision making—are crucial for research where novel ideas are the primary catalysts for progress. However, it's important to take note of potential downsides and manage these carefully.\n\nPlease note that this overview doesn't cover all possible impacts of diversity on group performance. The impact can also vary depending on how 'diversity' is defined and measured in different contexts. It's a complex topic with many influencing factors.\n"
      }
      43.414 seconds
      Azure OpenAI
      {
        role: "assistant",
        content: "Based on the available data, diversity within a team of researchers can indeed influence their performance, generally in a positive way. This finding stems from several key factors related to the concept of cognitive diversity.\n\n1. **Problem-Solving**: More diverse teams have been found to be better at problem-solving. According to various studies such as Hong and Page (2004), a group with diverse sets of identities, experiences and backgrounds can bring wider range of ideas and solutions to a specific problem[^1^]. \n\n2. **Innovation**: Diverse teams are often more innovative, according to landmark research from Richard Freeman and Wei Huang. Their study published in 2015 examined roughly 2.5 million scientific papers published over 20 years and found that papers written by ethnically diverse research teams received more citations than those written by homogenous teams[^2^].\n\n3. **Avoids Groupthink**: A lack of diversity can lead to groupthink, where conformity pressure leads the group to make collective decisions that may not always be optimal[^3^]. Diversity can reduce the risk of this occurring.\n\nAlthough these studies suggest positive impacts arising from diversity, it's important to note that directly correlating diversity with improved performance isn't fully straightforward immediately because successful research also depends on multiple factors like research facility quality, funding availability and individual abilities.\n\nMoreover, diversity may also introduce challenges such as communication issues due to differences in cultural backgrounds or languages barriers. Nevertheless, most evidence suggests that the benefits often outweigh downside hence making case for diverse research groups stronger.\n\nKeep in mind this is an opinion informed by current literature. 
The effects of diversity on team performance may evolve as newer studies become available or societal factors change.\n\n[^1^]: Hong L., & Page S.E., 2004,\"Groups Of Diverse Problem Solvers Can Outperform Groups Of High-Ability Problem Solvers\", Proceedings Of The National Academy Of Sciences Of The United States Of America\n\n[^2^]: Freeman R.B., & Huang W., 2015,\"Collaborating With People Like Me: Ethnic Co-Authorship Within The US\", National Bureau Of Economic Research\n\n[^3^]: Janis I.L., 1972 \"Victims of Groupthink: Psychological Study of Foreign-policy Decisions and Fiascoes.\", Houghton Mifflin"
      }
      38.474 seconds
    
    
      Results as JSON: ./results/40724c27-8878-424e-b313-636819da7230.json
      Total Execution Time: 145.658 seconds
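
For readers who want a feel for what a comparison like the one above involves, here is a minimal sketch. It is not the code in this repo. The function names (`buildOpenAIRequest`, `buildAzureRequest`, `timed`, `compare`), the placeholder keys, the resource and deployment names, and the `api-version` value are all assumptions made for illustration. The sketch shows the two differences that matter when calling the same model through both services (OpenAI uses a Bearer token and takes the model name in the body; Azure OpenAI uses an `api-key` header and takes the deployment name in the URL), plus a small helper that sends the same messages to each provider in turn and records the elapsed time. Stub providers stand in for real network calls so it runs without credentials.

```javascript
// Sketch only: nothing below is taken from this repo's source.

// Build a chat-completions request for the OpenAI REST API.
// Auth is a Bearer token; the model name goes in the JSON body.
function buildOpenAIRequest({ apiKey, model, messages }) {
  return {
    url: 'https://api.openai.com/v1/chat/completions',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Build the equivalent request for Azure OpenAI.
// Auth is an `api-key` header; the deployment name goes in the URL.
function buildAzureRequest({ apiKey, resource, deployment, apiVersion, messages }) {
  const base = `https://${resource}.openai.azure.com/openai/deployments/${deployment}`;
  return {
    url: `${base}/chat/completions?api-version=${apiVersion}`,
    headers: { 'Content-Type': 'application/json', 'api-key': apiKey },
    body: JSON.stringify({ messages }),
  };
}

// Call one provider function and report its reply with elapsed seconds.
async function timed(name, ask, messages) {
  const start = performance.now();
  const message = await ask(messages);
  const seconds = Number(((performance.now() - start) / 1000).toFixed(3));
  return { name, message, seconds };
}

// Send the same messages to each provider, one after another.
async function compare(providers, messages) {
  const results = [];
  for (const [name, ask] of Object.entries(providers)) {
    results.push(await timed(name, ask, messages));
  }
  return results;
}

// Usage with hypothetical stub providers (no network, no real keys).
const messages = [
  { role: 'system', content: 'You are a helpful AI assistant.' },
  { role: 'user', content: 'Why is the sky blue?' },
];
const stubProvider = (label) => async () => ({
  role: 'assistant',
  content: `${label} stub answer`,
});

compare(
  { OpenAI: stubProvider('OpenAI'), 'Azure OpenAI': stubProvider('Azure OpenAI') },
  messages,
).then((results) => console.log(results));
```

To call the real services, a provider function could pass the built `url`, `headers`, and `body` to `fetch` with `method: 'POST'` and return `choices[0].message` from the JSON response. Keeping the request builders separate from the timing helper makes it easy to confirm that both services receive identical `messages`.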