<a href="https://colab.research.google.com/github/tanmaypilla/AIEarthHack/blob/main/evaluator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [13]:
import pandas as pd
import numpy as np
import csv


## Dataset Processing

We first load the dataset of project ideas into a list of rows so that each idea can be accessed by position. We then build a baseline model that scores and filters the ideas using the GPT-3.5 API.
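If ideas need to be looked up by ID rather than by position, a dictionary keyed on the first column is one option. The sketch below is only an illustration: the helper name `index_ideas`, the header names, and the sample rows are assumptions, since the notebook does not print the CSV header. The column order (ID, problem, solution) follows how the notebook indexes each row.

```python
import csv
import io

def index_ideas(csv_file):
    """Return {idea_id: (problem, solution)} from an open CSV file object."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    ideas = {}
    for row in reader:
        # Assumed layout: column 0 = id, column 1 = problem, column 2 = solution
        ideas[row[0]] = (row[1].strip(), row[2].strip())
    return ideas

# Tiny in-memory stand-in for the real dataset (made-up rows)
sample = io.StringIO(
    "id,problem,solution\n"
    "1,Too much waste ,Reuse parts\n"
    "2,Dirty energy,Wind power\n"
)
ideas = index_ideas(sample)
print(ideas["2"])
```

For the real file, the same function could be called inside the `with open(...)` block in place of the manual loop.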

In [14]:
rows = []

with open('./data/AI_EarthHack_Dataset_Small.csv', encoding = 'latin-1') as file:
  csv_reader = csv.reader(file)
  header = next(csv_reader)
  for row in csv_reader:
    rows.append(row)

print(rows[0])

['1', 'The construction industry is indubitably one of the significant contributors to global waste, contributing approximately 1.3 billion tons of waste annually, exerting significant pressure on our landfills and natural resources. Traditional construction methods entail single-use designs that require frequent demolitions, leading to resource depletion and wastage.   ', "Herein, we propose an innovative approach to mitigate this problem: Modular Construction. This method embraces recycling and reuse, taking a significant stride towards a circular economy.   Modular construction involves utilizing engineered components in a manufacturing facility that are later assembled on-site. These components are designed for easy disassembling, enabling them to be reused in diverse projects, thus significantly reducing waste and conserving resources.  Not only does this method decrease construction waste by up to 90%, but it also decreases construction time by 30-50%, optimizing both environment

In [15]:
%pip install openai

Note: you may need to restart the kernel to use updated packages.





In [16]:
import os
from openai import OpenAI

# Read the API key from the environment instead of hard-coding it in the notebook
api_key = os.environ["OPENAI_API_KEY"]
client = OpenAI(api_key=api_key)

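## Baseline Model
The baseline model asks GPT-3.5 to score each solution from 0 to 20 on five metrics (Market Potential, Scalability, Feasibility, Maturity Stage, Technological Innovation) and to sum those scores into a combined score out of 100.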

In [17]:
#Baseline model with five base metrics, each scored out of 20:
#Market Potential, Scalability, Feasibility, Maturity Stage, Technological Innovation
baseline_metrics  = {'Market Potential':20, 'Scalability':20,'Feasibility':20,'Maturity Stage':20,'Technological Innovation':20}

def get_number_tokens(res, tokens):
    for token in tokens: 
        if token.isdigit():
            res.append(token)

def generate_baseline_results(idx, problem, solution):
      messages = [
        {
            "role": "system",
            "content": '''You are an AI-powered decision-support tool used to evaluate innovative circular economy business opportunities.
              You are given a problem statement and a solution. Here are a few important metrics you need to evaluate these solutions on, 
              Metrics : Market Potential, Scalability, Feasibility, Maturity Stage, Technological Innovation. Follow these steps for the output :
              Step 1 : For each metric, you provide a score for the solution between 0 and 20. The higher the score, the better the solution.
              Step 2 : You must create a combined score, by aggregating (sum of) all the individual scores from the metrics above. This score should be between 0 and 100.
              Ensure each criterion is given equal weight and is scored out of 20. Ensure that the output is always on one line. Ensure that the output is in exactly the same format 
              as the example, with the same number of spaces and punctuation. You do not have to show your reasoning for the scores.''',
        },
        {
            "role": "user",
            "content": '''Problem Statement : The construction industry is indubitably one of the significant contributors to global waste, contributing approximately 1.3 billion tons of waste annually, exerting significant pressure on our landfills and natural resources. Traditional construction methods entail single-use designs that require frequent demolitions, leading to resource depletion and wastage.
                          Solution : Herein, we propose an innovative approach to mitigate this problem: Modular Construction. This method embraces recycling and reuse, taking a significant stride towards a circular economy. Modular construction involves utilizing engineered components in a manufacturing facility that are later assembled on-site. These components are designed for easy disassembling, enabling them to be reused in diverse projects, thus significantly reducing waste and conserving resources. Not only does this method decrease construction waste by up to 90%, but it also decreases construction time by 30-50%, optimizing both environmental and financial efficiency. This reduction in time corresponds to substantial financial savings for businesses. Moreover, the modular approach allows greater flexibility, adapting to changing needs over time. We believe, by adopting modular construction, the industry can transit from a 'take, make and dispose' model to a more sustainable 'reduce, reuse, and recycle' model, driving the industry towards a more circular and sustainable future. The feasibility of this concept is already being proven in markets around the globe, indicating its potential for scalability and real-world application.''',
        },
        {
            "role": "assistant",
            "content": "Market Potential: 15 Scalability: 12 Feasibility: 19 Maturity Stage: 14 Technological Innovation: 11 Combined Score: 71",
        },
        {
            "role": "user",
            "content": "Problem Statement : " + problem + " Solution : " + solution,
        }
      ]
      
      res = client.chat.completions.create(
          model = "gpt-3.5-turbo",
          messages = messages
      )
      msg = res.choices[0].message.content
      print(idx, msg)
      # Split on any whitespace so stray newlines or double spaces in the reply do not shift the indices below
      tokens = msg.split()
      result = [idx, problem, solution, tokens[2], tokens[4], tokens[6], tokens[9], tokens[12], tokens[15]]
      return result

def baseline_model(model_data):
    fieldnames=['Index', 'Problem', 'Solution', 'Market Potential', 'Scalability', 'Feasibility','Maturity Stage','Technological Innovation', 'Combined Score']

    for row in rows:
      baseline_row = generate_baseline_results(row[0], row[1], row[2])
      model_data.append(baseline_row)
    # Sort by combined score (last column), highest first; compare as integers rather than strings
    model_data.sort(key=lambda x: int(x[-1]), reverse=True)

    with open('./data/baseline_results.csv','w', newline = '', encoding = 'latin-1') as file:
      writer = csv.writer(file)
      writer.writerow(fieldnames)
      writer.writerows(model_data)

baseline_model_data = []
baseline_model(baseline_model_data)


1 Market Potential: 15 
Scalability: 12 
Feasibility: 19 
Maturity Stage: 14 
Technological Innovation: 11 
Combined Score: 71
2 Market Potential: 18 Scalability: 16 Feasibility: 17 Maturity Stage: 13 Technological Innovation: 19 Combined Score: 83
3 Market Potential: 18 Scalability: 16 Feasibility: 17 Maturity Stage: 14 Technological Innovation: 16 Combined Score: 81
4 Market Potential: 16 Scalability: 18 Feasibility: 17 Maturity Stage: 13 Technological Innovation: 14 Combined Score: 78
5 Market Potential: 18 Scalability: 17 Feasibility: 16 Maturity Stage: 13 Technological Innovation: 19 Combined Score: 83
6 Market Potential: 14 Scalability: 17 Feasibility: 12 Maturity Stage: 8 Technological Innovation: 16 Combined Score: 67
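
The replies above follow a fixed `Metric: score` pattern, so they can also be parsed by name instead of by token position. The following is a minimal sketch under that assumption; `parse_scores` and `combined_is_consistent` are hypothetical helpers, not part of the notebook. The second helper checks that the model's combined score really equals the sum of the five metric scores.

```python
import re

METRICS = ["Market Potential", "Scalability", "Feasibility",
           "Maturity Stage", "Technological Innovation"]

def parse_scores(reply):
    """Map each 'Name: number' pair in a model reply to an int score."""
    pairs = re.findall(r"([A-Za-z ]+?):\s*(\d+)", reply)
    return {name.strip(): int(value) for name, value in pairs}

def combined_is_consistent(scores):
    """True if the five metric scores sum to the reported combined score."""
    return sum(scores[m] for m in METRICS) == scores["Combined Score"]

# Reply for idea 2, copied from the output above
reply = ("Market Potential: 18 Scalability: 16 Feasibility: 17 "
         "Maturity Stage: 13 Technological Innovation: 19 Combined Score: 83")
scores = parse_scores(reply)
print(scores["Feasibility"], combined_is_consistent(scores))  # prints: 17 True
```

Parsing by name keeps working when the reply contains line breaks, as in the output for idea 1.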


## Categorization
We ask GPT to group the problem statements into categories and to report how many problems fall into each category.

In [86]:
import re

def generate_categories(categories):
    # Build one block of text listing every problem statement, prefixed by its index
    data = ""
    for row in rows:
        data += row[0]
        data += "\nProblem: "
        data += row[1]
        data += "\n"

    messages = [
        {"role": "system",
         "content": '''You are going to categorize the following problems into categories relevant to strengthening the circular economy.
                    I want you to only tell me the category name and the number of problems that fit in that category.
                    Output Format - Category name: number of problems in that category. Ensure that the output is in one line always.
                    Ensure that each category is separated by a comma.'''
        },
        # Send the problem statements themselves; without this message the model has nothing to categorize
        {"role": "user", "content": data},
    ]
    res = client.chat.completions.create(
          model = "gpt-3.5-turbo",
          messages = messages
      )
    message = res.choices[0].message.content
    # Each item looks like "Category name: count"; split it into a name and an integer count
    for token in re.split('; |, ', message):
        name, _, count = token.rpartition(':')
        categories[name.strip()] = int(count)
    print(categories)
    print(message)

categories = {}
generate_categories(categories)

{'Recycling': 3, 'Waste Management': 1, 'Renewable Energy': 2, 'Sustainable Agriculture': 2, 'Sustainable Manufacturing': 1}
Recycling: 3, Waste Management: 1, Renewable Energy: 2, Sustainable Agriculture: 2, Sustainable Manufacturing: 1
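
Because the model reports the counts itself, a cheap sanity check is to total them and compare the result with the number of problems that were sent. The sketch below assumes the comma-separated `Name: count` format shown above; `category_counts` is a hypothetical helper name.

```python
def category_counts(reply):
    """Parse 'Name: n, Name: n, ...' into {name: n} with integer counts."""
    counts = {}
    for item in reply.split(","):
        name, _, number = item.rpartition(":")
        counts[name.strip()] = int(number)
    return counts

# Reply copied from the output above
reply = ("Recycling: 3, Waste Management: 1, Renewable Energy: 2, "
         "Sustainable Agriculture: 2, Sustainable Manufacturing: 1")
counts = category_counts(reply)
print(sum(counts.values()))  # prints: 9
```

In the notebook, the total could be compared against `len(rows)` to see whether every problem was assigned to exactly one category.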


## Visualizations

In [91]:
def visualization(categories):
    import plotly.graph_objects as go
    keys = list(categories.keys())
    values = list(categories.values())

    fig = go.Figure(data=[go.Bar(x=keys, y=values)])
    fig.update_layout(title_text='Category Distribution', xaxis_title='Categories', yaxis_title='Frequency')
    fig.show()

visualization(categories)

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

## User-Based Model
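
The user-based model rescales each baseline metric score by that metric's weight divided by the average weight, then sums the rescaled scores. A minimal sketch of that arithmetic follows. The helper name `scale_by_weights` is hypothetical, and floor division on integers stands in for the notebook's `int(...)` truncation of a float product, which is an assumption made here to avoid float rounding surprises.

```python
def scale_by_weights(scores, weights):
    """Scale each score by weight / mean(weights) and append the total."""
    total_weight = sum(weights)
    n = len(weights)
    # score * (weight / (total_weight / n)) == score * weight * n / total_weight
    scaled = [s * w * n // total_weight for s, w in zip(scores, weights)]
    return scaled + [sum(scaled)]

# Scores for idea 1 from the baseline output; the weights are made up for illustration
print(scale_by_weights([15, 12, 19, 14, 11], [100, 50, 50, 50, 0]))
# prints: [30, 12, 19, 14, 0, 75]
```

With equal weights every ratio is 1, so the scores and the combined total are unchanged.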

In [23]:
#User-based model
print(baseline_model_data)

def user_model():
    # Keep asking until the model replies with a list of numbers rather than prose
    while True:
        print('''Tell us about yourself and explain what type of investments you are looking for\n
              e.g. I'm a young investor looking to make a big profit. I have a large amount of money to invest, am willing to try anything for a big profit margin, and need a return within the next 10 years\n''')
        intro = input()

        chat_completion = client.chat.completions.create(
            messages=[
                { "role": "system", "content": "You are a decision-support tool. Given an investor profile, assign each of the following metrics a weight from 1 to 100 based on how relevant it is to that investor: Market Potential, Scalability, Feasibility, Maturity Stage, Technological Innovation" },
                { "role": "user", "content": "I am a Venture Capital Analyst looking for start-ups, I am looking for safe investments and I would need my investment to pay off in 3-5 years."},
                { "role": "assistant", "content": "23, 90, 63, 74, 9" },
                { "role": "user", "content": intro}
            ],
            model="gpt-3.5-turbo",
        )
        weights = chat_completion.choices[0].message.content

        if weights[0].isdigit(): # a reply that starts with a digit is assumed to be the list of weights
            break

    values = [int(x) for x in weights.split(', ')]
    return values

def calculateScore():
    average = sum(user_weights) / len(user_weights)

    weightedScore = []

    # Visit every baseline row, including the first one
    for row in baseline_model_data:
        # Columns 3-7 hold the five metric scores; scale each by its weight relative to the average weight
        scaled = [int(int(row[y + 3]) * (user_weights[y] / average)) for y in range(5)]
        scaled.append(sum(scaled))
        weightedScore.append(scaled)

    return weightedScore


user_weights = user_model()
weighted_scores = calculateScore()
print(weighted_scores)

[['2', 'I\'m sure you, like me, are feeling the heat - literally! With World Health Organization declaring climate change as ""the greatest threat to global health in the 21st century"", we\'re in a race against time to move away from fossil fuels to more efficient, less polluting electrical power. But as we take bold leaps into a green future with electric cars and heating, we\'re confronted with a new puzzle - generating enough electrical power without using fossil fuels!  ', 'Imagine standing on a green hill, not a single towering, noisy windmill in sight, and yet, you\'re surrounded by wind power generation! Using existing, yet under-utilized technology, I propose a revolutionary approach to harness wind energy on a commercial scale, without those ""monstrously large and environmentally damaging windmills"". With my idea, we could start construction tomorrow and give our electrical grid the jolt it needs, creating a future where clean, quiet and efficient energy isn\'t a dream, but